This is ../info/internals.info, produced by makeinfo version 4.0b from internals/internals.texi. INFO-DIR-SECTION XEmacs Editor START-INFO-DIR-ENTRY * Internals: (internals). XEmacs Internals Manual. END-INFO-DIR-ENTRY Copyright (C) 1992 - 1996 Ben Wing. Copyright (C) 1996, 1997 Sun Microsystems. Copyright (C) 1994 - 1998 Free Software Foundation. Copyright (C) 1994, 1995 Board of Trustees, University of Illinois. Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies. Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the Foundation. Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided also that the section entitled "GNU General Public License" is included exactly as in the original, and provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that the section entitled "GNU General Public License" may be included in a translation approved by the Free Software Foundation instead of in the original English.  File: internals.info, Node: The Text in a Buffer, Next: Buffer Lists, Prev: Introduction to Buffers, Up: Buffers and Textual Representation The Text in a Buffer ==================== The text in a buffer consists of a sequence of zero or more characters. 
A "character" is an integer that logically represents a letter, number, space, or other unit of text. Most of the characters that you will typically encounter belong to the ASCII set of characters, but there are also characters for various sorts of accented letters, special symbols, Chinese and Japanese ideograms (i.e. Kanji, Katakana, etc.), Cyrillic and Greek letters, etc. The actual number of possible characters is quite large. For now, we can view a character as some non-negative integer that has some shape that defines how it typically appears (e.g. as an uppercase A). (The exact way in which a character appears depends on the font used to display the character.) The internal type of characters in the C code is an `Emchar'; this is just an `int', but using a symbolic type makes the code clearer. Between every character in a buffer is a "buffer position" or "character position". We can speak of the character before or after a particular buffer position, and when you insert a character at a particular position, all characters after that position end up at new positions. When we speak of the character "at" a position, we really mean the character after the position. (This schizophrenia between a buffer position being "between" a character and "on" a character is rampant in Emacs.) Buffer positions are numbered starting at 1. This means that position 1 is before the first character, and position 0 is not valid. If there are N characters in a buffer, then buffer position N+1 is after the last one, and position N+2 is not valid. The internal makeup of the Emchar integer varies depending on whether we have compiled with MULE support. If not, the Emchar integer is an 8-bit integer with possible values from 0 - 255. 0 - 127 are the standard ASCII characters, while 128 - 255 are the characters from the ISO-8859-1 character set. 
If we have compiled with MULE support, an Emchar is a 19-bit integer, with the various bits having meanings according to a complex scheme that will be detailed later. The characters numbered 0 - 255 still have the same meanings as for the non-MULE case, though. Internally, the text in a buffer is represented in a fairly simple fashion: as a contiguous array of bytes, with a "gap" of some size in the middle. Although the gap is of some substantial size in bytes, there is no text contained within it: From the perspective of the text in the buffer, it does not exist. The gap logically sits at some buffer position, between two characters (or possibly at the beginning or end of the buffer). Insertion of text in a buffer at a particular position is always accomplished by first moving the gap to that position (i.e. through some block moving of text), then writing the text into the beginning of the gap, thereby shrinking the gap. If the gap shrinks down to nothing, a new gap is created. (What actually happens is that a new gap is "created" at the end of the buffer's text, which requires nothing more than changing a couple of indices; then the gap is "moved" to the position where the insertion needs to take place by moving up in memory all the text after that position.) Similarly, deletion occurs by moving the gap to the place where the text is to be deleted, and then simply expanding the gap to include the deleted text. ("Expanding" and "shrinking" the gap as just described means just that the internal indices that keep track of where the gap is located are changed.) Note that the total amount of memory allocated for a buffer text never decreases while the buffer is live. Therefore, if you load up a 20-megabyte file and then delete all but one character, there will be a 20-megabyte gap, which won't get any smaller (except by inserting characters back again). 
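
The gap mechanism described above can be illustrated with a toy gap buffer in C. This is only a minimal sketch under stated assumptions, not the XEmacs implementation: the names `GapBuf', `gb_insert', `gb_delete' and so on are hypothetical, one character is assumed to equal one byte (the non-MULE case), and the 16 bytes of slack added when the gap is recreated is an arbitrary choice.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy gap buffer; none of these names come from XEmacs. */
typedef struct {
    unsigned char *mem; /* text before the gap, the gap, text after it */
    size_t gap_start;   /* 0-based offset of the first gap byte */
    size_t gap_size;    /* number of bytes in the gap */
    size_t text_len;    /* number of text bytes, gap excluded */
} GapBuf;

static int gb_init(GapBuf *b, size_t gap) {
    b->mem = malloc(gap ? gap : 1);
    if (!b->mem) return -1;
    b->gap_start = 0;
    b->gap_size = gap;
    b->text_len = 0;
    return 0;
}

/* Move the gap so that it begins at 0-based text offset `off'. */
static void gb_move_gap(GapBuf *b, size_t off) {
    assert(off <= b->text_len);
    if (off < b->gap_start)        /* shift text up, past the gap */
        memmove(b->mem + off + b->gap_size, b->mem + off,
                b->gap_start - off);
    else if (off > b->gap_start)   /* shift text down, into the gap */
        memmove(b->mem + b->gap_start,
                b->mem + b->gap_start + b->gap_size,
                off - b->gap_start);
    b->gap_start = off;
}

/* "Create" a bigger gap at the end of the text. */
static int gb_make_gap(GapBuf *b, size_t extra) {
    unsigned char *p;
    gb_move_gap(b, b->text_len);
    p = realloc(b->mem, b->text_len + b->gap_size + extra);
    if (!p) return -1;
    b->mem = p;
    b->gap_size += extra;
    return 0;
}

/* Insert n bytes at 1-based buffer position pos (valid: 1 .. N+1). */
static int gb_insert(GapBuf *b, size_t pos, const char *s, size_t n) {
    if (pos < 1 || pos > b->text_len + 1) return -1;
    if (b->gap_size < n && gb_make_gap(b, n - b->gap_size + 16) != 0)
        return -1;
    gb_move_gap(b, pos - 1);
    memcpy(b->mem + b->gap_start, s, n);  /* write into the gap start */
    b->gap_start += n;                    /* ... thereby shrinking it */
    b->gap_size -= n;
    b->text_len += n;
    return 0;
}

/* Delete n bytes starting at 1-based position pos: only indices change. */
static int gb_delete(GapBuf *b, size_t pos, size_t n) {
    if (pos < 1 || n > b->text_len || pos > b->text_len - n + 1) return -1;
    gb_move_gap(b, pos - 1);
    b->gap_size += n;   /* the gap swallows the deleted text */
    b->text_len -= n;
    return 0;
}

/* Character "at" (i.e. after) 1-based position pos, or -1 if invalid. */
static int gb_char_at(const GapBuf *b, size_t pos) {
    size_t i;
    if (pos < 1 || pos > b->text_len) return -1;
    i = pos - 1;
    if (i >= b->gap_start) i += b->gap_size;  /* skip over the gap */
    return b->mem[i];
}
```

In this sketch the sum `text_len + gap_size' never decreases after a deletion, which mirrors the point that memory allocated for buffer text is not given back while the buffer is live.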
Once the buffer is killed, the memory allocated for the buffer text will be freed, but it will still be sitting on the heap, taking up virtual memory, and will not be released back to the operating system. (However, if you have compiled XEmacs with rel-alloc, the situation is different. In this case, the space _will_ be released back to the operating system. However, this tends to result in a noticeable speed penalty.) Astute readers may notice that the text in a buffer is represented as an array of _bytes_, while (at least in the MULE case) an Emchar is a 19-bit integer, which clearly cannot fit in a byte. This means (of course) that the text in a buffer uses a different representation from an Emchar: specifically, the 19-bit Emchar becomes a series of one to four bytes. The conversion between these two representations is complex and will be described later. In the non-MULE case, everything is very simple: An Emchar is an 8-bit value, which fits neatly into one byte. If we are given a buffer position and want to retrieve the character at that position, we need to follow these steps: 1. Pretend there's no gap, and convert the buffer position into a "byte index" that indexes to the appropriate byte in the buffer's stream of textual bytes. By convention, byte indices begin at 1, just like buffer positions. In the non-MULE case, byte indices and buffer positions are identical, since one character equals one byte. 2. Convert the byte index into a "memory index", which takes the gap into account. The memory index is a direct index into the block of memory that stores the text of a buffer. This basically just involves checking to see if the byte index is past the gap, and if so, adding the size of the gap to it. By convention, memory indices begin at 1, just like buffer positions and byte indices, and when referring to the position that is "at" the gap, we always use the memory position at the _beginning_, not at the end, of the gap. 3. 
Fetch the appropriate bytes at the determined memory position. 4. Convert these bytes into an Emchar. In the non-Mule case, (3) and (4) boil down to a simple one-byte memory access. Note that we have defined three types of positions in a buffer: 1. "buffer positions" or "character positions", typedef `Bufpos' 2. "byte indices", typedef `Bytind' 3. "memory indices", typedef `Memind' All three typedefs are just `int's, but defining them this way makes things a lot clearer. Most code works with buffer positions. In particular, all Lisp code that refers to text in a buffer uses buffer positions. Lisp code does not know that byte indices or memory indices exist. Finally, we have a typedef for the bytes in a buffer. This is a `Bufbyte', which is an unsigned char. Referring to them as Bufbytes underscores the fact that we are working with a string of bytes in the internal Emacs buffer representation rather than in one of a number of possible alternative representations (e.g. EUC-encoded text, etc.).  File: internals.info, Node: Buffer Lists, Next: Markers and Extents, Prev: The Text in a Buffer, Up: Buffers and Textual Representation Buffer Lists ============ Recall earlier that buffers are "permanent" objects, i.e. that they remain around until explicitly deleted. This entails that there is a list of all the buffers in existence. This list is actually an assoc-list (mapping from the buffer's name to the buffer) and is stored in the global variable `Vbuffer_alist'. The order of the buffers in the list is important: the buffers are ordered approximately from most-recently-used to least-recently-used. Switching to a buffer using `switch-to-buffer', `pop-to-buffer', etc. and switching windows using `other-window', etc. usually brings the new current buffer to the front of the list. `switch-to-buffer', `other-buffer', etc. look at the beginning of the list to find an alternative buffer to suggest. 
You can also explicitly move a buffer to the end of the list using `bury-buffer'. In addition to the global ordering in `Vbuffer_alist', each frame has its own ordering of the list. These lists always contain the same elements as in `Vbuffer_alist' although possibly in a different order. `buffer-list' normally returns the list for the selected frame. This allows you to work in separate frames without things interfering with each other. The standard way to look up a buffer given a name is `get-buffer', and the standard way to create a new buffer is `get-buffer-create', which looks up a buffer with a given name, creating a new one if necessary. These operations correspond exactly with the symbol operations `intern-soft' and `intern', respectively. You can also force a new buffer to be created using `generate-new-buffer', which takes a name and (if necessary) makes a unique name from this by appending a number, and then creates the buffer. This is basically like the symbol operation `gensym'.  File: internals.info, Node: Markers and Extents, Next: Bufbytes and Emchars, Prev: Buffer Lists, Up: Buffers and Textual Representation Markers and Extents =================== Among the things associated with a buffer are things that are logically attached to certain buffer positions. This can be used to keep track of a buffer position when text is inserted and deleted, so that it remains at the same spot relative to the text around it; to assign properties to particular sections of text; etc. There are two such objects that are useful in this regard: they are "markers" and "extents". A "marker" is simply a flag placed at a particular buffer position, which is moved around as text is inserted and deleted. Markers are used for all sorts of purposes, such as the `mark' that is the other end of textual regions to be cut, copied, etc. 
An "extent" is similar to two markers plus some associated properties, and is used to keep track of regions in a buffer as text is inserted and deleted, and to add properties (e.g. fonts) to particular regions of text. The external interface of extents is explained elsewhere. The important thing here is that markers and extents simply contain buffer positions in them as integers, and every time text is inserted or deleted, these positions must be updated. In order to minimize the amount of shuffling that needs to be done, the positions in markers and extents (there's one per marker, two per extent) are stored in Meminds. This means that they only need to be moved when the text is physically moved in memory; since the gap structure tries to minimize this, it also minimizes the number of marker and extent indices that need to be adjusted. Look in `insdel.c' for the details of how this works. One other important distinction is that markers are "temporary" while extents are "permanent". This means that markers disappear as soon as there are no more pointers to them, and correspondingly, there is no way to determine what markers are in a buffer if you are just given the buffer. Extents remain in a buffer until they are detached (which could happen as a result of text being deleted) or the buffer is deleted, and primitives do exist to enumerate the extents in a buffer.  File: internals.info, Node: Bufbytes and Emchars, Next: The Buffer Object, Prev: Markers and Extents, Up: Buffers and Textual Representation Bufbytes and Emchars ==================== Not yet documented.  File: internals.info, Node: The Buffer Object, Prev: Bufbytes and Emchars, Up: Buffers and Textual Representation The Buffer Object ================= Buffers contain fields not directly accessible by the Lisp programmer. We describe them here, naming them by the names used in the C code. Many are accessible indirectly in Lisp programs via Lisp primitives. 
`name' The buffer name is a string that names the buffer. It is guaranteed to be unique. *Note Buffer Names: (lispref)Buffer Names. `save_modified' This field contains the time when the buffer was last saved, as an integer. *Note Buffer Modification: (lispref)Buffer Modification. `modtime' This field contains the modification time of the visited file. It is set when the file is written or read. Every time the buffer is written to the file, this field is compared to the modification time of the file. *Note Buffer Modification: (lispref)Buffer Modification. `auto_save_modified' This field contains the time when the buffer was last auto-saved. `last_window_start' This field contains the `window-start' position in the buffer as of the last time the buffer was displayed in a window. `undo_list' This field points to the buffer's undo list. *Note Undo: (lispref)Undo. `syntax_table_v' This field contains the syntax table for the buffer. *Note Syntax Tables: (lispref)Syntax Tables. `downcase_table' This field contains the conversion table for converting text to lower case. *Note Case Tables: (lispref)Case Tables. `upcase_table' This field contains the conversion table for converting text to upper case. *Note Case Tables: (lispref)Case Tables. `case_canon_table' This field contains the conversion table for canonicalizing text for case-folding search. *Note Case Tables: (lispref)Case Tables. `case_eqv_table' This field contains the equivalence table for case-folding search. *Note Case Tables: (lispref)Case Tables. `display_table' This field contains the buffer's display table, or `nil' if it doesn't have one. *Note Display Tables: (lispref)Display Tables. `markers' This field contains the chain of all markers that currently point into the buffer. Deletion of text in the buffer, and motion of the buffer's gap, must check each of these markers and perhaps update it. *Note Markers: (lispref)Markers. 
`backed_up' This field is a flag that tells whether a backup file has been made for the visited file of this buffer. `mark' This field contains the mark for the buffer. The mark is a marker, hence it is also included on the list `markers'. *Note The Mark: (lispref)The Mark. `mark_active' This field is non-`nil' if the buffer's mark is active. `local_var_alist' This field contains the association list describing the variables local in this buffer, and their values, with the exception of local variables that have special slots in the buffer object. (Those slots are omitted from this table.) *Note Buffer-Local Variables: (lispref)Buffer-Local Variables. `modeline_format' This field contains a Lisp object which controls how to display the mode line for this buffer. *Note Modeline Format: (lispref)Modeline Format. `base_buffer' This field holds the buffer's base buffer (if it is an indirect buffer), or `nil'.  File: internals.info, Node: MULE Character Sets and Encodings, Next: The Lisp Reader and Compiler, Prev: Buffers and Textual Representation, Up: Top MULE Character Sets and Encodings ********************************* Recall that there are two primary ways that text is represented in XEmacs. The "buffer" representation sees the text as a series of bytes (Bufbytes), with a variable number of bytes used per character. The "character" representation sees the text as a series of integers (Emchars), one per character. The character representation is a cleaner representation from a theoretical standpoint, and is thus used in many cases when lots of manipulations on a string need to be done. However, the buffer representation is the standard representation used in both Lisp strings and buffers, and because of this, it is the "default" representation that text comes in. The reason for using this representation is that it's compact and is compatible with ASCII. 
* Menu:

* Character Sets::
* Encodings::
* Internal Mule Encodings::
* CCL::


File: internals.info,  Node: Character Sets,  Next: Encodings,  Prev: MULE Character Sets and Encodings,  Up: MULE Character Sets and Encodings

Character Sets
==============

   A character set (or "charset") is an ordered set of characters.  A
particular character in a charset is indexed using one or more
"position codes", which are non-negative integers.  The number of
position codes needed to identify a particular character in a charset is
called the "dimension" of the charset.  In XEmacs/Mule, all charsets
have dimension 1 or 2, and the size of all charsets (except for a few
special cases) is either 94, 96, 94 by 94, or 96 by 96.  The range of
position codes used to index characters from any of these types of
character sets is as follows:

     Charset type            Position code 1         Position code 2
     ------------------------------------------------------------
     94                      33 - 126                N/A
     96                      32 - 127                N/A
     94x94                   33 - 126                33 - 126
     96x96                   32 - 127                32 - 127

   Note that in the above cases position codes do not start at an
expected value such as 0 or 1.  The reason for this will become clear
later.

   For example, Latin-1 is a 96-character charset, and JISX0208 (the
Japanese national character set) is a 94x94-character charset.

   [Note that, although the ranges above define the _valid_ position
codes for a charset, some of the slots in a particular charset may in
fact be empty.  This is the case for JISX0208, for example, where (e.g.)
all the slots whose first position code is in the range 117 - 126 are
empty.]

   There are three charsets that do not follow the above rules.  All of
them have one dimension, and have ranges of position codes as follows:

     Charset name            Position code 1
     ------------------------------------
     ASCII                   0 - 127
     Control-1               0 - 31
     Composite               0 - some large number

   (The upper bound of the position code for composite characters has
not yet been determined, but it will probably be at least 16,383.)
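
For the four standard charset types in the first table, checking whether a set of position codes is valid reduces to a range test per dimension. The following is a minimal sketch; the function names are hypothetical, and it deliberately ignores the three special charsets (ASCII, Control-1, Composite) and any empty slots within a particular charset.

```c
#include <assert.h>

/* Hypothetical helper: is `pc' a valid position code for a standard
   charset with `chars' (94 or 96) characters per dimension? */
static int valid_position_code(int chars, int pc) {
    assert(chars == 94 || chars == 96);
    if (chars == 94)
        return pc >= 33 && pc <= 126;
    return pc >= 32 && pc <= 127;
}

/* Check one (dimension 1) or two (dimension 2) position codes.
   pc2 is ignored for dimension-1 charsets. */
static int valid_char(int chars, int dimension, int pc1, int pc2) {
    assert(dimension == 1 || dimension == 2);
    if (!valid_position_code(chars, pc1))
        return 0;
    return dimension == 1 || valid_position_code(chars, pc2);
}
```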

   ASCII is the union of two subsidiary character sets: Printing-ASCII
(the printing ASCII character set, consisting of position codes 33 -
126, like for a standard 94-character charset) and Control-ASCII (the
non-printing characters that would appear in a binary file with codes 0
- 32 and 127).

   Control-1 contains the non-printing characters that would appear in a
binary file with codes 128 - 159.

   Composite contains characters that are generated by overstriking one
or more characters from other charsets.

   Note that some characters in ASCII, and all characters in Control-1,
are "control" (non-printing) characters.  These have no printed
representation but instead control some other function of the printing
(e.g. TAB, or 9, moves the current character position to the next tab
stop).  All other characters in all charsets are "graphic" (printing)
characters.

   When a binary file is read in, the bytes in the file are assigned to
character sets as follows:

     Bytes             Character set           Range
     --------------------------------------------------
     0 - 127           ASCII                   0 - 127
     128 - 159         Control-1               0 - 31
     160 - 255         Latin-1                 32 - 127

   This is a bit ad-hoc but gets the job done.


File: internals.info,  Node: Encodings,  Next: Internal Mule Encodings,  Prev: Character Sets,  Up: MULE Character Sets and Encodings

Encodings
=========

   An "encoding" is a way of numerically representing characters from
one or more character sets.  If an encoding only encompasses one
character set, then the position codes for the characters in that
character set could be used directly.  This is not possible, however, if
more than one character set is to be used in the encoding.

   For example, the conversion detailed above between bytes in a binary
file and characters is effectively an encoding that encompasses the
three character sets ASCII, Control-1, and Latin-1 in a stream of 8-bit
bytes.
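
The binary-file conversion can be written down directly from the byte-range table. This is a small sketch only; the type and function names (`BinCharset', `BinChar', `classify_binary_byte') are made up for illustration and do not appear in the XEmacs sources.

```c
/* Hypothetical names, for illustration only. */
typedef enum { CS_ASCII, CS_CONTROL_1, CS_LATIN_1 } BinCharset;

typedef struct {
    BinCharset charset;
    int position_code;
} BinChar;

/* Map one byte of a binary file to (charset, position code). */
static BinChar classify_binary_byte(unsigned char byte) {
    BinChar c;
    if (byte < 128) {            /* 0 - 127   -> ASCII      0 - 127 */
        c.charset = CS_ASCII;
        c.position_code = byte;
    } else if (byte < 160) {     /* 128 - 159 -> Control-1  0 - 31 */
        c.charset = CS_CONTROL_1;
        c.position_code = byte - 128;
    } else {                     /* 160 - 255 -> Latin-1    32 - 127 */
        c.charset = CS_LATIN_1;
        c.position_code = byte - 128;
    }
    return c;
}
```

Note that the Control-1 and Latin-1 branches subtract the same offset; only the charset differs, which is what makes the scheme an encoding of three charsets in one 8-bit stream.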

   Thus, an encoding can be viewed as a way of encoding characters from
a specified group of character sets using a stream of bytes, each of
which contains a fixed number of bits (but not necessarily 8, as in the
common usage of "byte").

   Here are descriptions of a couple of common encodings:

* Menu:

* Japanese EUC (Extended Unix Code)::
* JIS7::


File: internals.info,  Node: Japanese EUC (Extended Unix Code),  Next: JIS7,  Prev: Encodings,  Up: Encodings

Japanese EUC (Extended Unix Code)
---------------------------------

   This encompasses the character sets Printing-ASCII,
Japanese-JISX0201-Kana (half-width katakana, the right half of
JISX0201), Japanese-JISX0208, and Japanese-JISX0212.  It uses 8-bit
bytes.

   Note that Printing-ASCII and Japanese-JISX0201-Kana are 94-character
charsets, while Japanese-JISX0208 and Japanese-JISX0212 are
94x94-character charsets.

   The encoding is as follows:

     Character set               Representation (PC=position-code)
     -------------               --------------
     Printing-ASCII              PC1
     Japanese-JISX0201-Kana      0x8E       | PC1 + 0x80
     Japanese-JISX0208           PC1 + 0x80 | PC2 + 0x80
     Japanese-JISX0212           0x8F       | PC1 + 0x80 | PC2 + 0x80


File: internals.info,  Node: JIS7,  Prev: Japanese EUC (Extended Unix Code),  Up: Encodings

JIS7
----

   This encompasses the character sets Printing-ASCII,
Japanese-JISX0201-Roman (the left half of JISX0201; this character set
is very similar to Printing-ASCII and is a 94-character charset),
Japanese-JISX0208, and Japanese-JISX0201-Kana.  It uses 7-bit bytes.

   Unlike Japanese EUC, this is a "modal" encoding, which means that
there are multiple states that the encoding can be in, which affect how
the bytes are to be interpreted.  Special sequences of bytes (called
"escape sequences") are used to change states.

   The encoding is as follows:

     Character set               Representation (PC=position-code)
     -------------               --------------
     Printing-ASCII              PC1
     Japanese-JISX0201-Roman     PC1
     Japanese-JISX0201-Kana      PC1
     Japanese-JISX0208           PC1 PC2


     Escape sequence   ASCII equivalent   Meaning
     ---------------   ----------------   -------
     0x1B 0x28 0x4A    ESC ( J            invoke Japanese-JISX0201-Roman
     0x1B 0x28 0x49    ESC ( I            invoke Japanese-JISX0201-Kana
     0x1B 0x24 0x42    ESC $ B            invoke Japanese-JISX0208
     0x1B 0x28 0x42    ESC ( B            invoke Printing-ASCII

   Initially, Printing-ASCII is invoked.


File: internals.info,  Node: Internal Mule Encodings,  Next: CCL,  Prev: Encodings,  Up: MULE Character Sets and Encodings

Internal Mule Encodings
=======================

   In XEmacs/Mule, each character set is assigned a unique number,
called a "leading byte".  This is used in the encodings of a character.
Leading bytes are in the range 0x80 - 0xFF (except for ASCII, which has
a leading byte of 0), although some leading bytes are reserved.

   Charsets whose leading byte is in the range 0x80 - 0x9F are called
"official" and are used for built-in charsets.  Other charsets are
called "private" and have leading bytes in the range 0xA0 - 0xFF; these
are user-defined charsets.

   More specifically:

     Character set           Leading byte
     -------------           ------------
     ASCII                   0
     Composite               0x80
     Dimension-1 Official    0x81 - 0x8D
                               (0x8E is free)
     Control-1               0x8F
     Dimension-2 Official    0x90 - 0x99
                               (0x9A - 0x9D are free;
                                0x9E and 0x9F are reserved)
     Dimension-1 Private     0xA0 - 0xEF
     Dimension-2 Private     0xF0 - 0xFF

   There are two internal encodings for characters in XEmacs/Mule.  One
is called "string encoding" and is an 8-bit encoding that is used for
representing characters in a buffer or string.  It uses 1 to 4 bytes per
character.  The other is called "character encoding" and is a 19-bit
encoding that is used for representing characters individually in a
variable.

   (In the following descriptions, we'll ignore composite characters for
the moment.
We also give a general (structural) overview first, followed later by
the exact details.)

* Menu:

* Internal String Encoding::
* Internal Character Encoding::


File: internals.info,  Node: Internal String Encoding,  Next: Internal Character Encoding,  Prev: Internal Mule Encodings,  Up: Internal Mule Encodings

Internal String Encoding
------------------------

   ASCII characters are encoded using their position code directly.
Other characters are encoded using their leading byte followed by their
position code(s) with the high bit set.  Characters in private character
sets have their leading byte prefixed with a "leading byte prefix",
which is either 0x9E or 0x9F.  (No character sets are ever assigned
these leading bytes.)  Specifically:

     Character set           Encoding (PC=position-code, LB=leading-byte)
     -------------           --------
     ASCII                   PC1  |
     Control-1               LB   | PC1 + 0xA0 |
     Dimension-1 official    LB   | PC1 + 0x80 |
     Dimension-1 private     0x9E | LB         | PC1 + 0x80 |
     Dimension-2 official    LB   | PC1 + 0x80 | PC2 + 0x80 |
     Dimension-2 private     0x9F | LB         | PC1 + 0x80 | PC2 + 0x80

   The basic characteristic of this encoding is that the first byte of
every character is in the range 0x00 - 0x9F, and the second and
following bytes of every character are in the range 0xA0 - 0xFF.  This
means that it is impossible to get out of sync, or more specifically:

  1. Given any byte position, the beginning of the character it is
     within can be determined in constant time.

  2. Given any byte position at the beginning of a character, the
     beginning of the next character can be determined in constant time.

  3. Given any byte position at the beginning of a character, the
     beginning of the previous character can be determined in constant
     time.

  4. Textual searches can simply treat encoded strings as if they were
     encoded in a one-byte-per-character fashion rather than the actual
     multi-byte encoding.

   None of the standard non-modal encodings meet all of these
conditions.
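
Properties (1) to (3) follow from the byte ranges alone: a byte below 0xA0 always starts a character, and since a character occupies at most four bytes, each scan below inspects a bounded number of bytes. The sketch uses hypothetical function names and 0-based byte offsets into a well-formed string; it is an illustration of the property, not the code XEmacs uses.

```c
#include <stddef.h>

/* A byte in 0x00 - 0x9F begins a character; 0xA0 - 0xFF continues one. */
static int is_first_byte(unsigned char b) {
    return b < 0xA0;
}

/* (1) Offset of the first byte of the character containing offset i. */
static size_t char_start(const unsigned char *s, size_t i) {
    while (i > 0 && !is_first_byte(s[i]))
        i--;
    return i;
}

/* (2) Offset of the character following the one that starts at i
   (returns len when i starts the last character). */
static size_t next_char(const unsigned char *s, size_t len, size_t i) {
    i++;
    while (i < len && !is_first_byte(s[i]))
        i++;
    return i;
}

/* (3) Offset of the character preceding the one that starts at i.
   The caller must ensure i > 0. */
static size_t prev_char(const unsigned char *s, size_t i) {
    return char_start(s, i - 1);
}
```

For example, in the byte sequence 0x61, 0x81 0xC1, 0x92 0xB0 0xA1, 0x62 (an ASCII character, a dimension-1 official character, a dimension-2 official character, and another ASCII character), `char_start' applied to offset 5 yields 3, the leading byte of the dimension-2 character.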
For example, EUC satisfies only (2) and (3), while Shift-JIS and Big5
(not yet described) satisfy only (2).  (All non-modal encodings must
satisfy (2), in order to be unambiguous.)


File: internals.info,  Node: Internal Character Encoding,  Prev: Internal String Encoding,  Up: Internal Mule Encodings

Internal Character Encoding
---------------------------

   One 19-bit word represents a single character.  The word is
separated into three fields:

     Bit number:   18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
                   <------------> <------------------> <------------------>
     Field:              1                  2                    3

   Note that fields 2 and 3 hold 7 bits each, while field 1 holds 5
bits.

     Character set           Field 1         Field 2         Field 3
     -------------           -------         -------         -------
     ASCII                      0               0              PC1
        range:                                                   (00 - 7F)
     Control-1                  0               1              PC1
        range:                                                   (00 - 1F)
     Dimension-1 official       0            LB - 0x80         PC1
        range:                                    (01 - 0D)      (20 - 7F)
     Dimension-1 private        0            LB - 0x80         PC1
        range:                                    (20 - 6F)      (20 - 7F)
     Dimension-2 official    LB - 0x8F         PC1             PC2
        range:                    (01 - 0A)       (20 - 7F)      (20 - 7F)
     Dimension-2 private     LB - 0xE1         PC1             PC2
        range:                    (0F - 1E)       (20 - 7F)      (20 - 7F)
     Composite                 0x1F             ?               ?

   Note that character codes 0 - 255 are the same as the "binary
encoding" described above.


File: internals.info,  Node: CCL,  Prev: Internal Mule Encodings,  Up: MULE Character Sets and Encodings

CCL
===

     CCL PROGRAM SYNTAX:
          CCL_PROGRAM := (CCL_MAIN_BLOCK
                          [ CCL_EOF_BLOCK ])

          CCL_MAIN_BLOCK := CCL_BLOCK
          CCL_EOF_BLOCK := CCL_BLOCK

          CCL_BLOCK := STATEMENT | (STATEMENT [STATEMENT ...])
          STATEMENT :=
                  SET | IF | BRANCH | LOOP | REPEAT | BREAK
                  | READ | WRITE

          SET := (REG = EXPRESSION) | (REG SELF_OP EXPRESSION)
                 | INT-OR-CHAR

          EXPRESSION := ARG | (EXPRESSION OP ARG)

          IF := (if EXPRESSION CCL_BLOCK CCL_BLOCK)
          BRANCH := (branch EXPRESSION CCL_BLOCK [CCL_BLOCK ...])
          LOOP := (loop STATEMENT [STATEMENT ...])
          BREAK := (break)
          REPEAT := (repeat)
                  | (write-repeat [REG | INT-OR-CHAR | string])
                  | (write-read-repeat REG [INT-OR-CHAR | string | ARRAY]?)
          READ := (read REG) | (read REG REG)
                  | (read-if REG ARITH_OP ARG CCL_BLOCK CCL_BLOCK)
                  | (read-branch REG CCL_BLOCK [CCL_BLOCK ...])
          WRITE := (write REG) | (write REG REG)
                  | (write INT-OR-CHAR) | (write STRING) | STRING
                  | (write REG ARRAY)
          END := (end)

          REG := r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7
          ARG := REG | INT-OR-CHAR
          OP :=   + | - | * | / | % | & | '|' | ^ | << | >> | <8 | >8 | //
                  | < | > | == | <= | >= | !=
          SELF_OP :=
                  += | -= | *= | /= | %= | &= | '|=' | ^= | <<= | >>=
          ARRAY := '[' INT-OR-CHAR ... ']'
          INT-OR-CHAR := INT | CHAR

     MACHINE CODE:

     The machine code consists of a vector of 32-bit words.
     The first such word specifies the start of the EOF section of the
     code; this is the code executed to handle any stuff that needs to
     be done (e.g. designating back to ASCII and left-to-right mode)
     after all other encoded/decoded data has been written out.  This is
     not used for charset CCL programs.

     REGISTER: 0..7  -- referred by RRR or rrr

     OPERATOR BIT FIELD (27-bit): XXXXXXXXXXXXXXX RRR TTTTT
             TTTTT (5-bit): operator type
             RRR (3-bit): register number
             XXXXXXXXXXXXXXX (15-bit):
                     CCCCCCCCCCCCCCC: constant or address
                     000000000000rrr: register number

     AAAAA:  00000 +
             00001 -
             00010 *
             00011 /
             00100 %
             00101 &
             00110 |
             00111 ~

             01000 <<
             01001 >>
             01010 <8
             01011 >8
             01100 //
             01101 not used
             01110 not used
             01111 not used

             10000 <
             10001 >
             10010 ==
             10011 <=
             10100 >=
             10101 !=

     OPERATORS:      TTTTT RRR XX..

     SetCS:          00000 RRR C...C      RRR = C...C
     SetCL:          00001 RRR .....      RRR = c...c
                     c.............c
     SetR:           00010 RRR ..rrr      RRR = rrr
     SetA:           00011 RRR ..rrr      RRR = array[rrr]
                     C.............C      size of array = C...C
                     c.............c      contents = c...c

     Jump:           00100 000 c...c      jump to c...c
     JumpCond:       00101 RRR c...c      if (!RRR) jump to c...c
     WriteJump:      00110 RRR c...c      Write1 RRR, jump to c...c
     WriteReadJump:  00111 RRR c...c      Write1, Read1 RRR, jump to c...c
     WriteCJump:     01000 000 c...c      Write1 C...C, jump to c...c
                     C...C
     WriteCReadJump: 01001 RRR c...c      Write1 C...C, Read1 RRR,
                     C.............C      and jump to c...c
     WriteSJump:     01010 000 c...c      WriteS, jump to c...c
                     C.............C
                     S.............S
                     ...
     WriteSReadJump: 01011 RRR c...c      WriteS, Read1 RRR, jump to c...c
                     C.............C
                     S.............S
                     ...
     WriteAReadJump: 01100 RRR c...c      WriteA, Read1 RRR, jump to c...c
                     C.............C      size of array = C...C
                     c.............c      contents = c...c
                     ...
     Branch:         01101 RRR C...C      if (RRR >= 0 && RRR < C..)
                     c.............c      branch to (RRR+1)th address
     Read1:          01110 RRR ...        read 1-byte to RRR
     Read2:          01111 RRR ..rrr      read 2-byte to RRR and rrr
     ReadBranch:     10000 RRR C...C      Read1 and Branch
                     c.............c
                     ...
     Write1:         10001 RRR .....      write 1-byte RRR
     Write2:         10010 RRR ..rrr      write 2-byte RRR and rrr
     WriteC:         10011 000 .....      write 1-char C...CC
                     C.............C
     WriteS:         10100 000 .....      write C..-byte of string
                     C.............C
                     S.............S
                     ...
     WriteA:         10101 RRR .....      write array[RRR]
                     C.............C      size of array = C...C
                     c.............c      contents = c...c
                     ...
     End:            10110 000 .....      terminate the execution

     SetSelfCS:      10111 RRR C...C      RRR AAAAA= C...C
                     ..........AAAAA
     SetSelfCL:      11000 RRR .....      RRR AAAAA= c...c
                     c.............c
                     ..........AAAAA
     SetSelfR:       11001 RRR ..Rrr      RRR AAAAA= rrr
                     ..........AAAAA
     SetExprCL:      11010 RRR ..Rrr      RRR = rrr AAAAA c...c
                     c.............c
                     ..........AAAAA
     SetExprR:       11011 RRR ..rrr      RRR = rrr AAAAA Rrr
                     ............Rrr
                     ..........AAAAA
     JumpCondC:      11100 RRR c...c      if !(RRR AAAAA C..) jump to c...c
                     C.............C
                     ..........AAAAA
     JumpCondR:      11101 RRR c...c      if !(RRR AAAAA rrr) jump to c...c
                     ............rrr
                     ..........AAAAA
     ReadJumpCondC:  11110 RRR c...c      Read1 and JumpCondC
                     C.............C
                     ..........AAAAA
     ReadJumpCondR:  11111 RRR c...c      Read1 and JumpCondR
                     ............rrr
                     ..........AAAAA


File: internals.info,  Node: The Lisp Reader and Compiler,  Next: Lstreams,  Prev: MULE Character Sets and Encodings,  Up: Top

The Lisp Reader and Compiler
****************************

   Not yet documented.


File: internals.info,  Node: Lstreams,  Next: Consoles; Devices; Frames; Windows,  Prev: The Lisp Reader and Compiler,  Up: Top

Lstreams
********

   An "lstream" is an internal Lisp object that provides a generic
buffering stream implementation.  Conceptually, you send data to the
stream or read data from the stream, not caring what's on the other end
of the stream.  The other end could be another stream, a file
descriptor, a stdio stream, a fixed block of memory, a reallocating
block of memory, etc.  The main purpose of the stream is to provide a
standard interface and to do buffering.  Macros are defined to read or
write characters, so the calling functions do not have to worry about
blocking data together in order to achieve efficiency.

* Menu:

* Creating an Lstream::         Creating an lstream object.
* Lstream Types::               Different sorts of things that are streamed.
* Lstream Functions::           Functions for working with lstreams.
* Lstream Methods::             Creating new lstream types.


File: internals.info,  Node: Creating an Lstream,  Next: Lstream Types,  Prev: Lstreams,  Up: Lstreams

Creating an Lstream
===================

   Lstreams come in different types, depending on what is being
interfaced to.  Although the primitive for creating new lstreams is
`Lstream_new()', generally you do not call this directly.  Instead, you
call some type-specific creation function, which creates the lstream
and initializes it as appropriate for the particular type.

   All lstream creation functions take a MODE argument, specifying what
mode the lstream should be opened as.  This controls whether the
lstream is for input or output, and optionally whether data should be
blocked up in units of MULE characters.  Note that some types of
lstreams can only be opened for input; others only for output; and
others can be opened either way.  #### Richard Mlynarik thinks that
there should be a strict separation between input and output streams,
and he's probably right.

   MODE is a string, one of

`"r"'
     Open for reading.

`"w"'
     Open for writing.

`"rc"'
     Open for reading, but "read" never returns partial MULE characters.

`"wc"'
     Open for writing, but never writes partial MULE characters.


File: internals.info,  Node: Lstream Types,  Next: Lstream Functions,  Prev: Creating an Lstream,  Up: Lstreams

Lstream Types
=============

   stdio

   filedesc

   lisp-string

   fixed-buffer

   resizing-buffer

   dynarr

   lisp-buffer

   print

   decoding

   encoding


File: internals.info,  Node: Lstream Functions,  Next: Lstream Methods,  Prev: Lstream Types,  Up: Lstreams

Lstream Functions
=================

 - Function: Lstream * Lstream_new (Lstream_implementation *IMP, const
          char *MODE)
     Allocate and return a new Lstream.  This function is not really
     meant to be called directly; rather, each stream type should
     provide its own stream creation function, which creates the stream
     and performs any other necessary initialization (e.g. opening a
     file).

 - Function: void Lstream_set_buffering (Lstream *LSTR,
          Lstream_buffering BUFFERING, int BUFFERING_SIZE)
     Change the buffering of a stream.  See `lstream.h'.  By default the
     buffering is `STREAM_BLOCK_BUFFERED'.

 - Function: int Lstream_flush (Lstream *LSTR)
     Flush out any pending unwritten data in the stream.  Clear any
     buffered input data.  Returns 0 on success, -1 on error.

 - Macro: int Lstream_putc (Lstream *STREAM, int C)
     Write out one byte to the stream.  This is a macro and so it is
     very efficient.
     The C argument is only evaluated once but the STREAM argument is
     evaluated more than once.  Returns 0 on success, -1 on error.

 - Macro: int Lstream_getc (Lstream *STREAM)
     Read one byte from the stream.  This is a macro and so it is very
     efficient.  The STREAM argument is evaluated more than once.
     Return value is -1 for EOF or error.

 - Macro: void Lstream_ungetc (Lstream *STREAM, int C)
     Push one byte back onto the input queue.  This will be the next
     byte read from the stream.  Any number of bytes can be pushed back
     and will be read in the reverse order they were pushed back--most
     recent first.  (This is necessary for consistency--if there are a
     number of bytes that have been unread and I read and unread a
     byte, it needs to be the first to be read again.)  This is a macro
     and so it is very efficient.  The C argument is only evaluated
     once but the STREAM argument is evaluated more than once.

 - Function: int Lstream_fputc (Lstream *STREAM, int C)
 - Function: int Lstream_fgetc (Lstream *STREAM)
 - Function: void Lstream_fungetc (Lstream *STREAM, int C)
     Function equivalents of the above macros.

 - Function: ssize_t Lstream_read (Lstream *STREAM, void *DATA, size_t
          SIZE)
     Read up to SIZE bytes from the stream into DATA.  Return the number
     of bytes read.  0 means EOF.  -1 means an error occurred and no
     bytes were read.

 - Function: ssize_t Lstream_write (Lstream *STREAM, void *DATA, size_t
          SIZE)
     Write SIZE bytes of DATA to the stream.  Return the number of
     bytes written.  -1 means an error occurred and no bytes were
     written.

 - Function: void Lstream_unread (Lstream *STREAM, void *DATA, size_t
          SIZE)
     Push back SIZE bytes of DATA onto the input queue.  The next call
     to `Lstream_read()' with the same size will read the same bytes
     back.  Note that this will be the case even if there is other
     pending unread data.

 - Function: int Lstream_close (Lstream *STREAM)
     Close the stream.  All data will be flushed out.

 - Function: void Lstream_reopen (Lstream *STREAM)
     Reopen a closed stream.
     This enables I/O on it again.  This is not meant to be called
     except from a wrapper routine that reinitializes variables and
     such--the close routine may well have freed some necessary storage
     structures, for example.

 - Function: void Lstream_rewind (Lstream *STREAM)
     Rewind the stream to the beginning.


File: internals.info,  Node: Lstream Methods,  Prev: Lstream Functions,  Up: Lstreams

Lstream Methods
===============

 - Lstream Method: ssize_t reader (Lstream *STREAM, unsigned char
          *DATA, size_t SIZE)
     Read some data from the stream's end and store it into DATA, which
     can hold SIZE bytes.  Return the number of bytes read.  A return
     value of 0 means no bytes can be read at this time.  This may be
     because of an EOF, or because there is a granularity greater than
     one byte that the stream imposes on the returned data, and SIZE is
     less than this granularity.  (This will happen frequently for
     streams that need to return whole characters, because
     `Lstream_read()' calls the reader function repeatedly until it has
     the number of bytes it wants or until 0 is returned.)  The lstream
     functions do not treat a 0 return as EOF or do anything special;
     however, the calling function will interpret any 0 it gets back as
     EOF.  This will normally not happen unless the caller calls
     `Lstream_read()' with a very small size.

     This function can be `NULL' if the stream is output-only.

 - Lstream Method: ssize_t writer (Lstream *STREAM, const unsigned char
          *DATA, size_t SIZE)
     Send some data to the stream's end.  Data to be sent is in DATA
     and is SIZE bytes.  Return the number of bytes sent.  This
     function can send and return fewer bytes than is passed in; in
     that case, the function will just be called again until there is
     no data left or 0 is returned.  A return value of 0 means that no
     more data can be currently stored, but there is no error; the data
     will be squirreled away until the writer can accept data.
     (This is useful, e.g., if you're dealing with a non-blocking file
     descriptor and are getting `EWOULDBLOCK' errors.)

     This function can be `NULL' if the stream is input-only.

 - Lstream Method: int rewinder (Lstream *STREAM)
     Rewind the stream.  If this is `NULL', the stream is not seekable.

 - Lstream Method: int seekable_p (Lstream *STREAM)
     Indicate whether this stream is seekable--i.e. it can be rewound.
     This method is ignored if the stream does not have a rewind
     method.  If this method is not present, the result is determined
     by whether a rewind method is present.

 - Lstream Method: int flusher (Lstream *STREAM)
     Perform any additional operations necessary to flush the data in
     this stream.

 - Lstream Method: int pseudo_closer (Lstream *STREAM)

 - Lstream Method: int closer (Lstream *STREAM)
     Perform any additional operations necessary to close this stream
     down.  May be `NULL'.  This function is called when
     `Lstream_close()' is called or when the stream is
     garbage-collected.  When this function is called, all pending data
     in the stream will already have been written out.

 - Lstream Method: Lisp_Object marker (Lisp_Object LSTREAM, void
          (*MARKFUN) (Lisp_Object))
     Mark this object for garbage collection.  Same semantics as a
     standard `Lisp_Object' marker.  This function can be `NULL'.


File: internals.info,  Node: Consoles; Devices; Frames; Windows,  Next: The Redisplay Mechanism,  Prev: Lstreams,  Up: Top

Consoles; Devices; Frames; Windows
**********************************

* Menu:

* Introduction to Consoles; Devices; Frames; Windows::
* Point::
* Window Hierarchy::
* The Window Object::


File: internals.info,  Node: Introduction to Consoles; Devices; Frames; Windows,  Next: Point,  Prev: Consoles; Devices; Frames; Windows,  Up: Consoles; Devices; Frames; Windows

Introduction to Consoles; Devices; Frames; Windows
==================================================

   A window-system window that you see on the screen is called a
"frame" in Emacs terminology.
Each frame is subdivided into one or more non-overlapping panes, called
(confusingly) "windows".  Each window displays the text of a buffer in
it.  (See above on Buffers.)  Note that buffers and windows are
independent entities: Two or more windows can be displaying the same
buffer (potentially in different locations), and a buffer can be
displayed in no windows.

   A single display screen that contains one or more frames is called a
"display"; the Lisp object that represents it is called a "device".
Under most circumstances, there is only one display.  However, more
than one display can exist, for example if you have a "multi-headed"
console, i.e. one with a single keyboard but multiple displays.
(Typically in such a situation, the various displays act like one large
display, in that the mouse is only in one of them at a time, and moving
the mouse off one moves it into another.)  In some cases, the different
displays will have different characteristics, e.g. one color and one
mono.

   XEmacs can display frames on multiple displays.  It can even deal
simultaneously with frames on multiple keyboards (called "consoles" in
XEmacs terminology).  Here is one case where this might be useful: You
are using XEmacs on your workstation at work, and leave it running.
Then you go home and dial in on a TTY line, and you can use the
already-running XEmacs process to display another frame on your local
TTY.

   Thus, there is a hierarchy console -> display -> frame -> window.
There is a separate Lisp object type for each of these four concepts.
Furthermore, there is logically a "selected console", "selected
display", "selected frame", and "selected window".  Each of these
objects is distinguished in various ways, such as being the default
object for various functions that act on objects of that type.  Note
that every containing object remembers the "selected" object among the
objects that it contains: e.g.
not only is there a selected window, but every frame remembers the last
window in it that was selected, and changing the selected frame causes
the remembered window within it to become the selected window.  Similar
relationships apply for consoles to devices and devices to frames.


File: internals.info,  Node: Point,  Next: Window Hierarchy,  Prev: Introduction to Consoles; Devices; Frames; Windows,  Up: Consoles; Devices; Frames; Windows

Point
=====

   Recall that every buffer has a current insertion position, called
"point".  Now, two or more windows may be displaying the same buffer,
and the text cursor in the two windows (i.e. `point') can be in two
different places.  You may ask, how can that be, since each buffer has
only one value of `point'?  The answer is that each window also has a
value of `point' that is squirreled away in it.  There is only one
selected window, and the value of `point' in its buffer is the one
that corresponds to that window.  When the selected window is changed
from one window to another displaying the same buffer, the old value of
`point' is stored into the old window's "point" and the value of
`point' from the new window is retrieved and made the value of `point'
in the buffer.  This means that `window-point' for the selected window
is potentially inaccurate, and if you want to retrieve the correct
value of `point' for a window, you must special-case on the selected
window and retrieve the buffer's point instead.  This is related to why
`save-window-excursion' does not save the selected window's value of
`point'.
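
   The bookkeeping described above can be modelled in a few lines of C.
The following is a minimal sketch, not the XEmacs implementation: the
`toy_' structures, their fields, and the functions are all
hypothetical names chosen for illustration, and the model assumes the
swap is done on every change of selected window.  It shows why the
point stored in the selected window can be stale, and why an accurate
lookup has to special-case the selected window.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy types; these are NOT the XEmacs structures. */
struct toy_buffer {
    int point;                  /* the buffer's single value of point */
};

struct toy_window {
    struct toy_buffer *buffer;  /* buffer displayed in this window */
    int saved_point;            /* per-window point; may be stale while
                                   this window is the selected one */
};

static struct toy_window *toy_selected_window = NULL;

/* Select NEW_WIN: stash the buffer's point in the old window, then
   load the new window's saved point into its buffer. */
static void
toy_select_window(struct toy_window *new_win)
{
    struct toy_window *old = toy_selected_window;
    if (old != NULL)
        old->saved_point = old->buffer->point;
    new_win->buffer->point = new_win->saved_point;
    toy_selected_window = new_win;
}

/* Accurate point for any window: special-case the selected window
   and read the buffer's point instead of the saved copy. */
static int
toy_window_point(const struct toy_window *w)
{
    if (w == toy_selected_window)
        return w->buffer->point;
    return w->saved_point;
}
```

   With two toy windows on one buffer, moving the cursor in the
selected window changes only the buffer's point; the window's saved
copy is brought up to date only when another window is selected.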