-File: lispref.info, Node: Internationalization Terminology, Next: Charsets, Up: MULE
-
-Internationalization Terminology
-================================
-
- In internationalization terminology, a string of text is divided up
-into "characters", which are the printable units that make up the text.
-A single character is (for example) a capital `A', the number `2', a
-Katakana character, a Kanji ideograph (an "ideograph" is a "picture"
-character, such as those used in Japanese Kanji, Chinese Hanzi, and
-Korean Hanja; typically there are thousands of such ideographs in each
-language), etc. The basic property of a character is its shape. Note
-that the same character may be drawn by two different people (or in two
-different fonts) in slightly different ways, although the basic shape
-will be the same.
-
- In some cases, the differences will be significant enough that it is
-actually possible to identify two or more distinct shapes that both
-represent the same character. For example, the lowercase letters `a'
-and `g' each have two distinct possible shapes - the `a' can optionally
-have a curved tail projecting off the top, and the `g' can be formed
-either of two loops, or of one loop and a tail hanging off the bottom.
-Such distinct possible shapes of a character are called "glyphs". The
-important characteristic of two glyphs making up the same character is
-that the choice between one or the other is purely stylistic and has no
-linguistic effect on a word (this is the reason why a capital `A' and
-lowercase `a' are different characters rather than different glyphs -
-e.g. `Aspen' is a city while `aspen' is a kind of tree).
-
- Note that "character" and "glyph" are used differently here than
-elsewhere in XEmacs.
-
- A "character set" is simply a set of related characters. ASCII, for
-example, is a set of 94 characters (or 128, if you count non-printing
-characters). Other character sets are ISO8859-1 (ASCII plus various
-accented characters and other international symbols), JISX0201 (ASCII,
-more or less, plus half-width Katakana), JISX0208 (Japanese Kanji),
-JISX0212 (a second set of less-used Japanese Kanji), GB2312 (Mainland
-Chinese Hanzi), etc.
-
- Every character set has one or more "orderings", which can be viewed
-as a way of assigning a number (or set of numbers) to each character in
-the set. For most character sets, there is a standard ordering, and in
-fact all of the character sets mentioned above define a particular
-ordering. ASCII, for example, places letters in their "natural" order,
-puts uppercase letters before lowercase letters, numbers before
-letters, etc. Note that for many of the Asian character sets, there is
-no natural ordering of the characters. The actual orderings are based
-on one or more salient characteristics, of which there are many to
-choose from - e.g. number of strokes, common radicals, phonetic
-ordering, etc.
-
- The numbers assigned to any particular character are called
-the character's "position codes". The number of position codes
-required to index a particular character in a character set is called
-the "dimension" of the character set. ASCII, being a relatively small
-character set, is of dimension one, and each character in the set is
-indexed using a single position code, in the range 0 through 127 (if
-non-printing characters are included) or 33 through 126 (if only the
-printing characters are considered). JISX0208, i.e. Japanese Kanji,
-has thousands of characters, and is of dimension two - every character
-is indexed by two position codes, each in the range 33 through 126.
-(Note that the choice of the range here is somewhat arbitrary.
-Although a character set such as JISX0208 defines an _ordering_ of all
-its characters, it does not define the actual mapping between numbers
-and characters. You could just as easily index the characters in
-JISX0208 using numbers in the range 0 through 93, 1 through 94, 2
-through 95, etc. The reason for the actual range chosen is so that the
-position codes match up with the actual values used in the common
-encodings.)
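- To make the notion of dimension concrete, the following Emacs Lisp
-sketch folds a pair of JISX0208 position codes into a single zero-based
-number by shifting each code into the range 0 through 93 and treating
-the pair as a two-digit base-94 number. The function name is
-hypothetical and not part of XEmacs, and the result is only one of the
-many possible numberings described above, not an official one.

```elisp
(defun my-jisx0208-linear-index (row col)
  "Return a zero-based index for the JISX0208 position codes ROW and COL.
Both codes are assumed to lie in the range 33 through 126."
  (unless (and (<= 33 row 126) (<= 33 col 126))
    (error "Position code out of range: %S %S" row col))
  ;; Shift both codes into 0..93, then treat ROW as the high base-94 digit.
  (+ (* (- row 33) 94) (- col 33)))

;; (my-jisx0208-linear-index 48 33) => 1410
```

-With 94 possible values per position code, the indices run from 0
-through 8835, that is, 94 * 94 = 8836 slots in all.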
-
- An "encoding" is a way of numerically representing characters from
-one or more character sets as a stream of like-sized numerical values
-called "words"; typically these are 8-bit, 16-bit, or 32-bit
-quantities. If an encoding encompasses only one character set, then the
-position codes for the characters in that character set could be used
-directly. (This is the case with ASCII, and as a result, most people do
-not understand the difference between a character set and an encoding.)
-This is not possible, however, if more than one character set is to be
-used in the encoding. For example, printed Japanese text typically
-requires characters from multiple character sets - ASCII, JISX0208, and
-JISX0212, to be specific. Each of these is indexed using one or more
-position codes in the range 33 through 126, so the position codes could
-not be used directly or there would be no way to tell which character
-was meant. Different Japanese encodings handle this differently - JIS
-uses special escape characters to denote different character sets; EUC
-sets the high bit of the position codes for JISX0208 and JISX0212, and
-puts a special extra byte before each JISX0212 character; etc. (JIS,
-EUC, and most of the other encodings you will encounter are 7-bit or
-8-bit encodings. There is one common 16-bit encoding, which is Unicode;
-this strives to represent all the world's characters in a single large
-character set. 32-bit encodings are generally used internally in
-programs to simplify the code that manipulates them; however, they are
-not much used externally because they are not very space-efficient.)
-
- Encodings are classified as either "modal" or "non-modal". In a
-"modal encoding", there are multiple states that the encoding can be in,
-and the interpretation of the values in the stream depends on the
-current global state of the encoding. Special values in the encoding,
-called "escape sequences", are used to change the global state. JIS,
-for example, is a modal encoding. The bytes `ESC $ B' indicate that,
-from then on, bytes are to be interpreted as position codes for
-JISX0208, rather than as ASCII. This effect is cancelled using the
-bytes `ESC ( B', which mean "switch from whatever the current state is
-to ASCII". To switch to JISX0212, the escape sequence `ESC $ ( D' is used.
-(Note that here, as is common, the escape sequences do in fact begin
-with `ESC'. This is not necessarily the case, however.)
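- The following sketch shows how a reader of JIS text has to carry the
-current state from one byte to the next. It recognizes only the three
-escape sequences mentioned above and returns each character as a list
-of a charset label followed by its position codes. The function names
-and charset labels are hypothetical; it is an illustration, not the
-XEmacs decoder.

```elisp
(defun my-jis-prefix-p (prefix bytes)
  "Return non-nil if the list BYTES begins with the list PREFIX."
  (while (and prefix bytes (= (car prefix) (car bytes)))
    (setq prefix (cdr prefix)
          bytes (cdr bytes)))
  (null prefix))

(defun my-jis-decode (bytes)
  "Decode BYTES, a list of 7-bit JIS byte values, into a list of characters.
Each character is a list (CHARSET CODE...).  Only ESC $ B, ESC $ ( D and
ESC ( B are recognized; malformed input is not detected."
  (let ((state 'ascii)     ; the global state of the modal encoding
        (result nil))
    (while bytes
      (cond
       ;; ESC $ ( D: from now on, pairs of bytes are JISX0212 codes.
       ((my-jis-prefix-p '(27 36 40 68) bytes)
        (setq state 'japanese-jisx0212
              bytes (nthcdr 4 bytes)))
       ;; ESC $ B: from now on, pairs of bytes are JISX0208 codes.
       ((my-jis-prefix-p '(27 36 66) bytes)
        (setq state 'japanese-jisx0208
              bytes (nthcdr 3 bytes)))
       ;; ESC ( B: switch back to ASCII.
       ((my-jis-prefix-p '(27 40 66) bytes)
        (setq state 'ascii
              bytes (nthcdr 3 bytes)))
       ;; In the ASCII state, each byte is one character.
       ((eq state 'ascii)
        (setq result (cons (list 'ascii (car bytes)) result)
              bytes (cdr bytes)))
       ;; In a two-byte state, consume two position codes.
       (t
        (setq result (cons (list state (car bytes) (car (cdr bytes))) result)
              bytes (nthcdr 2 bytes)))))
    (nreverse result)))
```

-Given the bytes `A ESC $ B 48 33 ESC ( B B', this returns
-`((ascii 65) (japanese-jisx0208 48 33) (ascii 66))'. The byte 48 is a
-JISX0208 position code only because `ESC $ B' precedes it; after
-`ESC ( B' the same byte would be the ASCII digit `0'.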
-
- A "non-modal encoding" has no global state that extends past the
-character currently being interpreted. EUC, for example, is a
-non-modal encoding. Characters in JISX0208 are encoded by setting the
-high bit of the position codes, and characters in JISX0212 are encoded
-by doing the same but also prefixing the character with the byte 0x8F.
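- A minimal sketch of the EUC rules just described, assuming each
-character is already known as a charset label plus its position codes
-(the labels and the function name are hypothetical). It covers only
-ASCII, JISX0208 and JISX0212; other character sets that EUC variants
-may carry are not handled.

```elisp
(defun my-euc-jp-encode-char (charset &rest codes)
  "Return the list of EUC bytes for the character with position CODES in CHARSET.
Only `ascii', `japanese-jisx0208' and `japanese-jisx0212' are handled."
  (cond
   ;; ASCII: the position code is used directly, with the high bit clear.
   ((eq charset 'ascii) codes)
   ;; JISX0208: set the high bit of each of the two position codes.
   ((eq charset 'japanese-jisx0208)
    (mapcar (lambda (c) (logior c #x80)) codes))
   ;; JISX0212: the same, but prefixed with the byte 0x8F.
   ((eq charset 'japanese-jisx0212)
    (cons #x8F (mapcar (lambda (c) (logior c #x80)) codes)))
   (t (error "Unsupported charset: %S" charset))))

;; (my-euc-jp-encode-char 'japanese-jisx0208 48 33) => (176 161), i.e. #xB0 #xA1
```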
-
- The advantage of a modal encoding is that it is generally more
-space-efficient, and is easily extensible because there are essentially
-an arbitrary number of escape sequences that can be created. The
-disadvantage, however, is that it is much more difficult to work with
-if it is not being processed in a sequential manner. In the non-modal
-EUC encoding, for example, the byte 0x41 always refers to the letter
-`A'; whereas in JIS, it could either be the letter `A', or one of the
-two position codes in a JISX0208 character, or one of the two position
-codes in a JISX0212 character. Determining exactly which one is meant
-could be difficult and time-consuming if the previous bytes in the
-string have not already been processed.
-
- Non-modal encodings are further divided into "fixed-width" and
-"variable-width" formats. A fixed-width encoding always uses the same
-number of words per character, whereas a variable-width encoding does
-not. EUC is a good example of a variable-width encoding: one to three
-bytes are used per character, depending on the character set. 16-bit
-and 32-bit encodings are nearly always fixed-width, and this is in fact
-one of the main reasons for using an encoding with a larger word size.
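-The main advantage of a fixed-width encoding is that the position of
-any character can be computed directly from its index, without
-examining the characters that precede it. The advantages of
-variable-width encodings are that they are generally more
-space-efficient and allow for compatibility with existing 8-bit
-encodings such as ASCII.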
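- The sketch below illustrates the cost of variable width in EUC: the
-length of each character is known only from its first byte, so
-counting or locating characters means walking the bytes from the start.
-The function names are hypothetical, and only the three character sets
-discussed above are considered.

```elisp
(defun my-euc-jp-char-length (lead)
  "Return the number of bytes in the EUC character whose first byte is LEAD."
  (cond ((< lead #x80) 1)   ; ASCII: high bit clear
        ((= lead #x8F) 3)   ; 0x8F prefix plus two JISX0212 bytes
        (t 2)))             ; JISX0208: two bytes with the high bit set

(defun my-euc-jp-count-chars (bytes)
  "Count the characters in BYTES, a list of EUC byte values.
Each character's length depends on its first byte, so the list is
scanned sequentially from the beginning."
  (let ((count 0))
    (while bytes
      (setq bytes (nthcdr (my-euc-jp-char-length (car bytes)) bytes)
            count (1+ count)))
    count))
```

-For example, the seven bytes `65 #xB0 #xA1 #x8F #xA2 #xAF 66' hold
-four characters: one ASCII, one JISX0208, one JISX0212 and one ASCII.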
-
- Note that the bytes in an 8-bit encoding are often referred to as
-"octets" rather than simply as bytes. This terminology dates back to
-the days before 8-bit bytes were universal, when some computers had
-9-bit bytes, others had 10-bit bytes, etc.
+File: lispref.info, Node: Time of Day, Next: Time Conversion, Prev: User Identification, Up: System Interface
+
+Time of Day
+===========
+
+ This section explains how to determine the current time and the time
+zone.
+
+ - Function: current-time-string &optional time-value
+ This function returns the current time and date as a
+ human-readable string. The format of the string is unvarying;
+ the number of characters used for each part is always the same, so
+ you can reliably use `substring' to extract pieces of it. It is
+ wise to count the characters from the beginning of the string
+ rather than from the end, as additional information may be added
+ at the end.
+
+ The argument TIME-VALUE, if given, specifies a time to format
+ instead of the current time. The argument should be a list whose
+ first two elements are integers. Thus, you can use times obtained
+ from `current-time' (see below) and from `file-attributes' (*note
+ File Attributes::).
+
+ (current-time-string)
+ => "Wed Oct 14 22:21:05 1987"
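+ As a sketch of the `substring' approach, the following helper (a
+ hypothetical name, not part of XEmacs) pulls the month, the time of
+ day and the year out of the string, counting offsets from the
+ beginning as recommended above. The offsets assume the layout shown
+ in the example.

```elisp
(defun my-current-time-string-fields (&optional time-value)
  "Return (MONTH TIME YEAR) extracted from `current-time-string'.
TIME-VALUE is passed through unchanged.  The offsets assume the fixed
layout \"Wed Oct 14 22:21:05 1987\"."
  (let ((s (current-time-string time-value)))
    (list (substring s 4 7)      ; month abbreviation, e.g. "Oct"
          (substring s 11 19)    ; time of day, "HH:MM:SS"
          (substring s 20 24)))) ; four-digit year
```

+ For the example string above, these offsets yield "Oct", "22:21:05"
+ and "1987".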
+
+ - Function: current-time
+ This function returns the system's time value as a list of three
+ integers: `(HIGH LOW MICROSEC)'. The integers HIGH and LOW
+ combine to give the number of seconds since 00:00 UTC, January 1, 1970,
+ which is HIGH * 2**16 + LOW.
+
+ The third element, MICROSEC, gives the microseconds since the
+ start of the current second (or 0 for systems that return time
+ only with a resolution of one second).
+
+ The first two elements can be compared with file time values such
+ as you get with the function `file-attributes'. *Note File
+ Attributes::.
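+ Since HIGH and LOW must be recombined before the value can be used
+ as a single number, the following hypothetical helper computes
+ HIGH * 2**16 + LOW plus the microseconds as a floating-point number of
+ seconds. Floating point is a choice made for this sketch, on the
+ assumption that the combined value might not fit in a Lisp integer on
+ some builds. A missing third element is treated as 0, so two-element
+ file times are accepted as well.

```elisp
(defun my-time-value-to-float (time-value)
  "Convert TIME-VALUE, a list (HIGH LOW [MICROSEC]), to seconds as a float."
  (let ((high (nth 0 time-value))
        (low (nth 1 time-value))
        (usec (or (nth 2 time-value) 0)))   ; absent in two-element times
    ;; HIGH * 2**16 + LOW, computed in floating point to avoid overflow.
    (+ (* high 65536.0) low (/ usec 1e6))))
```

+ For example, `(my-time-value-to-float (current-time))' gives the
+ current time as one number.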
+
+ - Function: current-time-zone &optional time-value
+ This function returns a list describing the time zone that the
+ user is in.
+
+ The value has the form `(OFFSET NAME)'. Here OFFSET is an integer
+ giving the number of seconds ahead of UTC (east of Greenwich). A
+ negative value means west of Greenwich. The second element, NAME,
+ is a string giving the name of the time zone. Both elements
+ change when daylight saving time begins or ends; if the user has
+ specified a time zone that does not use a seasonal time
+ adjustment, then the value is constant through time.
+
+ If the operating system doesn't supply all the information
+ necessary to compute the value, both elements of the list are
+ `nil'.
+
+ The argument TIME-VALUE, if given, specifies a time to analyze
+ instead of the current time. The argument should be a cons cell
+ containing two integers, or a list whose first two elements are
+ integers. Thus, you can use times obtained from `current-time'
+ (see above) and from `file-attributes' (*note File Attributes::).
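+ As an illustration, the following hypothetical helper turns the
+ OFFSET element into the conventional `+HHMM' or `-HHMM' notation,
+ returning `nil' when the offset is unavailable.

```elisp
(defun my-utc-offset-string (offset)
  "Format OFFSET, in seconds east of UTC, as \"+HHMM\" or \"-HHMM\".
Return nil if OFFSET is nil."
  (when offset
    (let* ((sign (if (< offset 0) "-" "+"))
           (secs (abs offset))               ; avoid % on negative numbers
           (hours (/ secs 3600))
           (minutes (/ (% secs 3600) 60)))
      (format "%s%02d%02d" sign hours minutes))))
```

+ For example, `(my-utc-offset-string (car (current-time-zone)))'
+ yields a string such as `+0530' for a zone five and a half hours east
+ of Greenwich.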