1 This is ../info/internals.info, produced by makeinfo version 4.0b from
2 internals/internals.texi.
4 INFO-DIR-SECTION XEmacs Editor
6 * Internals: (internals). XEmacs Internals Manual.
9 Copyright (C) 1992 - 1996 Ben Wing. Copyright (C) 1996, 1997 Sun
10 Microsystems. Copyright (C) 1994 - 1998 Free Software Foundation.
11 Copyright (C) 1994, 1995 Board of Trustees, University of Illinois.
13 Permission is granted to make and distribute verbatim copies of this
14 manual provided the copyright notice and this permission notice are
15 preserved on all copies.
17 Permission is granted to copy and distribute modified versions of
18 this manual under the conditions for verbatim copying, provided that the
19 entire resulting derived work is distributed under the terms of a
20 permission notice identical to this one.
22 Permission is granted to copy and distribute translations of this
23 manual into another language, under the above conditions for modified
24 versions, except that this permission notice may be stated in a
25 translation approved by the Foundation.
27 Permission is granted to copy and distribute modified versions of
28 this manual under the conditions for verbatim copying, provided also
29 that the section entitled "GNU General Public License" is included
30 exactly as in the original, and provided that the entire resulting
31 derived work is distributed under the terms of a permission notice
32 identical to this one.
34 Permission is granted to copy and distribute translations of this
35 manual into another language, under the above conditions for modified
36 versions, except that the section entitled "GNU General Public License"
37 may be included in a translation approved by the Free Software
38 Foundation instead of in the original English.
41 File: internals.info, Node: The Text in a Buffer, Next: Buffer Lists, Prev: Introduction to Buffers, Up: Buffers and Textual Representation
The Text in a Buffer
====================
46 The text in a buffer consists of a sequence of zero or more
47 characters. A "character" is an integer that logically represents a
48 letter, number, space, or other unit of text. Most of the characters
49 that you will typically encounter belong to the ASCII set of characters,
50 but there are also characters for various sorts of accented letters,
special symbols, Chinese and Japanese characters (e.g. Kanji and
Katakana), Cyrillic and Greek letters, etc. The actual number of possible
53 characters is quite large.
55 For now, we can view a character as some non-negative integer that
56 has some shape that defines how it typically appears (e.g. as an
57 uppercase A). (The exact way in which a character appears depends on the
58 font used to display the character.) The internal type of characters in
59 the C code is an `Emchar'; this is just an `int', but using a symbolic
60 type makes the code clearer.
62 Between every character in a buffer is a "buffer position" or
63 "character position". We can speak of the character before or after a
64 particular buffer position, and when you insert a character at a
65 particular position, all characters after that position end up at new
66 positions. When we speak of the character "at" a position, we really
mean the character after the position. (A buffer position is thus
treated sometimes as lying "between" two characters and sometimes as
being "on" a character.)
71 Buffer positions are numbered starting at 1. This means that
72 position 1 is before the first character, and position 0 is not valid.
73 If there are N characters in a buffer, then buffer position N+1 is
74 after the last one, and position N+2 is not valid.
76 The internal makeup of the Emchar integer varies depending on whether
77 we have compiled with MULE support. If not, the Emchar integer is an
78 8-bit integer with possible values from 0 - 255. 0 - 127 are the
79 standard ASCII characters, while 128 - 255 are the characters from the
80 ISO-8859-1 character set. If we have compiled with MULE support, an
81 Emchar is a 19-bit integer, with the various bits having meanings
82 according to a complex scheme that will be detailed later. The
83 characters numbered 0 - 255 still have the same meanings as for the
84 non-MULE case, though.
86 Internally, the text in a buffer is represented in a fairly simple
87 fashion: as a contiguous array of bytes, with a "gap" of some size in
88 the middle. Although the gap is of some substantial size in bytes,
89 there is no text contained within it: From the perspective of the text
90 in the buffer, it does not exist. The gap logically sits at some buffer
91 position, between two characters (or possibly at the beginning or end of
92 the buffer). Insertion of text in a buffer at a particular position is
93 always accomplished by first moving the gap to that position (i.e.
94 through some block moving of text), then writing the text into the
95 beginning of the gap, thereby shrinking the gap. If the gap shrinks
96 down to nothing, a new gap is created. (What actually happens is that a
97 new gap is "created" at the end of the buffer's text, which requires
98 nothing more than changing a couple of indices; then the gap is "moved"
99 to the position where the insertion needs to take place by moving up in
100 memory all the text after that position.) Similarly, deletion occurs
101 by moving the gap to the place where the text is to be deleted, and
102 then simply expanding the gap to include the deleted text.
("Expanding" and "shrinking" the gap as just described means only that
the internal indices that keep track of where the gap is located are
changed.)
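The gap mechanism described above can be sketched in C roughly as
follows. This is only an illustration under stated assumptions, not the
code XEmacs uses: the type `GapBuf' and the functions `gb_init',
`gb_move_gap', `gb_insert', `gb_delete' and `gb_byte_at' are
hypothetical names, offsets are 0-based byte offsets rather than
1-based positions, and the gap is enlarged whenever it is too small for
an insertion rather than only when it has shrunk to nothing.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical gap buffer; the names are illustrative, not XEmacs's. */
typedef struct {
    unsigned char *data; /* text before the gap, the gap, text after it */
    size_t gap_start;    /* offset of the first byte of the gap */
    size_t gap_size;     /* number of unused bytes in the gap */
    size_t text_len;     /* number of text bytes, not counting the gap */
} GapBuf;

/* Start with empty text and a gap of `gap` bytes.  Returns 0 on failure. */
static int gb_init(GapBuf *b, size_t gap) {
    b->data = malloc(gap ? gap : 1);
    b->gap_start = 0;
    b->gap_size = gap;
    b->text_len = 0;
    return b->data != NULL;
}

/* Move the gap to text offset `pos` by block-moving the text in between. */
static void gb_move_gap(GapBuf *b, size_t pos) {
    assert(pos <= b->text_len);
    if (pos < b->gap_start)      /* shift text up, to just after the gap */
        memmove(b->data + pos + b->gap_size, b->data + pos,
                b->gap_start - pos);
    else if (pos > b->gap_start) /* shift text down, into the old gap */
        memmove(b->data + b->gap_start,
                b->data + b->gap_start + b->gap_size, pos - b->gap_start);
    b->gap_start = pos;
}

/* Insert `n` bytes at text offset `pos`.  Returns 0 if memory runs out. */
static int gb_insert(GapBuf *b, size_t pos, const void *src, size_t n) {
    assert(pos <= b->text_len);
    if (n > b->gap_size) {
        /* Gap too small: put it at the end of the text and enlarge it. */
        size_t want = b->text_len + n + 64;
        unsigned char *p;
        gb_move_gap(b, b->text_len);
        p = realloc(b->data, want);
        if (p == NULL)
            return 0;
        b->data = p;
        b->gap_size = want - b->text_len;
    }
    gb_move_gap(b, pos);
    memcpy(b->data + b->gap_start, src, n); /* write into start of gap */
    b->gap_start += n;                      /* which shrinks the gap */
    b->gap_size -= n;
    b->text_len += n;
    return 1;
}

/* Delete `n` bytes starting at text offset `pos`. */
static void gb_delete(GapBuf *b, size_t pos, size_t n) {
    assert(pos + n <= b->text_len);
    gb_move_gap(b, pos);
    b->gap_size += n; /* the deleted bytes simply become part of the gap */
    b->text_len -= n;
}

/* Fetch the text byte at offset `pos`, skipping over the gap. */
static unsigned char gb_byte_at(const GapBuf *b, size_t pos) {
    assert(pos < b->text_len);
    return b->data[pos < b->gap_start ? pos : pos + b->gap_size];
}
```

Note that `gb_delete' moves no text beyond what `gb_move_gap' requires;
it only changes two counters, which is the point of the design.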
107 Note that the total amount of memory allocated for a buffer text
108 never decreases while the buffer is live. Therefore, if you load up a
109 20-megabyte file and then delete all but one character, there will be a
110 20-megabyte gap, which won't get any smaller (except by inserting
111 characters back again). Once the buffer is killed, the memory allocated
112 for the buffer text will be freed, but it will still be sitting on the
113 heap, taking up virtual memory, and will not be released back to the
114 operating system. (However, if you have compiled XEmacs with rel-alloc,
115 the situation is different. In this case, the space _will_ be released
116 back to the operating system. However, this tends to result in a
117 noticeable speed penalty.)
119 Astute readers may notice that the text in a buffer is represented as
120 an array of _bytes_, while (at least in the MULE case) an Emchar is a
121 19-bit integer, which clearly cannot fit in a byte. This means (of
122 course) that the text in a buffer uses a different representation from
123 an Emchar: specifically, the 19-bit Emchar becomes a series of one to
124 four bytes. The conversion between these two representations is complex
125 and will be described later.
127 In the non-MULE case, everything is very simple: An Emchar is an
128 8-bit value, which fits neatly into one byte.
130 If we are given a buffer position and want to retrieve the character
131 at that position, we need to follow these steps:
133 1. Pretend there's no gap, and convert the buffer position into a
134 "byte index" that indexes to the appropriate byte in the buffer's
135 stream of textual bytes. By convention, byte indices begin at 1,
just like buffer positions. In the non-MULE case, byte indices
and buffer positions are identical, since one character equals one
byte.
140 2. Convert the byte index into a "memory index", which takes the gap
141 into account. The memory index is a direct index into the block of
142 memory that stores the text of a buffer. This basically just
143 involves checking to see if the byte index is past the gap, and if
144 so, adding the size of the gap to it. By convention, memory
145 indices begin at 1, just like buffer positions and byte indices,
and when referring to the position that is "at" the gap, we always
use the memory position at the _beginning_, not at the end, of the
gap.
150 3. Fetch the appropriate bytes at the determined memory position.
152 4. Convert these bytes into an Emchar.
In the non-MULE case, (3) and (4) boil down to a simple one-byte
memory access.
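For the non-MULE case, the four steps above might look like the
following sketch. The typedef names `Bufpos', `Bytind', `Memind',
`Bufbyte' and `Emchar' come from this manual; the struct `TextSketch',
its fields, and the function names are assumptions made for the example
and do not reproduce the actual XEmacs definitions.

```c
#include <assert.h>

typedef int Bufpos, Bytind, Memind, Emchar;
typedef unsigned char Bufbyte;

/* Hypothetical view of a buffer's text; all indices are 1-based. */
typedef struct {
    const Bufbyte *beg; /* start of the memory block (text plus gap) */
    Bytind gpt;         /* byte index at which the gap sits */
    int gap_size;       /* size of the gap in bytes */
    int text_bytes;     /* number of text bytes, not counting the gap */
} TextSketch;

/* Step 1: without MULE, one character is one byte. */
static Bytind bufpos_to_bytind(Bufpos pos) { return pos; }

/* Step 2: a position "at" the gap maps to the beginning of the gap,
   so only indices strictly past the gap are shifted. */
static Memind bytind_to_memind(const TextSketch *t, Bytind bi) {
    return bi > t->gpt ? bi + t->gap_size : bi;
}

/* Steps 3 and 4: the character "at" a position is the one after it, so
   a fetch at the gap position must read from just past the gap. */
static Emchar char_at(const TextSketch *t, Bufpos pos) {
    Bytind bi = bufpos_to_bytind(pos);
    Memind mi;
    assert(bi >= 1 && bi <= t->text_bytes);
    mi = bi >= t->gpt ? bi + t->gap_size : bi;
    return t->beg[mi - 1]; /* memory indices begin at 1 */
}
```

With the text "abcd" stored as `a b <3-byte gap> c d' and the gap at
byte index 3, `char_at' for position 3 reads the byte after the gap
(`c'), while `bytind_to_memind' for the same index returns 3, the
beginning of the gap.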
157 Note that we have defined three types of positions in a buffer:
159 1. "buffer positions" or "character positions", typedef `Bufpos'
161 2. "byte indices", typedef `Bytind'
163 3. "memory indices", typedef `Memind'
165 All three typedefs are just `int's, but defining them this way makes
166 things a lot clearer.
168 Most code works with buffer positions. In particular, all Lisp code
169 that refers to text in a buffer uses buffer positions. Lisp code does
170 not know that byte indices or memory indices exist.
172 Finally, we have a typedef for the bytes in a buffer. This is a
173 `Bufbyte', which is an unsigned char. Referring to them as Bufbytes
174 underscores the fact that we are working with a string of bytes in the
175 internal Emacs buffer representation rather than in one of a number of
176 possible alternative representations (e.g. EUC-encoded text, etc.).
179 File: internals.info, Node: Buffer Lists, Next: Markers and Extents, Prev: The Text in a Buffer, Up: Buffers and Textual Representation
Buffer Lists
============
Recall from earlier that buffers are "permanent" objects, i.e. that they
185 remain around until explicitly deleted. This entails that there is a
186 list of all the buffers in existence. This list is actually an
187 assoc-list (mapping from the buffer's name to the buffer) and is stored
188 in the global variable `Vbuffer_alist'.
190 The order of the buffers in the list is important: the buffers are
191 ordered approximately from most-recently-used to least-recently-used.
192 Switching to a buffer using `switch-to-buffer', `pop-to-buffer', etc.
193 and switching windows using `other-window', etc. usually brings the
194 new current buffer to the front of the list. `switch-to-buffer',
195 `other-buffer', etc. look at the beginning of the list to find an
196 alternative buffer to suggest. You can also explicitly move a buffer
197 to the end of the list using `bury-buffer'.
199 In addition to the global ordering in `Vbuffer_alist', each frame
200 has its own ordering of the list. These lists always contain the same
201 elements as in `Vbuffer_alist' although possibly in a different order.
`buffer-list' normally returns the list for the selected frame. This
allows you to work in separate frames without things interfering with
each other.
206 The standard way to look up a buffer given a name is `get-buffer',
207 and the standard way to create a new buffer is `get-buffer-create',
208 which looks up a buffer with a given name, creating a new one if
209 necessary. These operations correspond exactly with the symbol
210 operations `intern-soft' and `intern', respectively. You can also
211 force a new buffer to be created using `generate-new-buffer', which
212 takes a name and (if necessary) makes a unique name from this by
213 appending a number, and then creates the buffer. This is basically
214 like the symbol operation `gensym'.
217 File: internals.info, Node: Markers and Extents, Next: Bufbytes and Emchars, Prev: Buffer Lists, Up: Buffers and Textual Representation
Markers and Extents
===================
222 Among the things associated with a buffer are things that are
223 logically attached to certain buffer positions. This can be used to
224 keep track of a buffer position when text is inserted and deleted, so
225 that it remains at the same spot relative to the text around it; to
assign properties to particular sections of text; etc. Two kinds of
objects are useful in this regard: "markers" and "extents".
230 A "marker" is simply a flag placed at a particular buffer position,
231 which is moved around as text is inserted and deleted. Markers are
232 used for all sorts of purposes, such as the `mark' that is the other
233 end of textual regions to be cut, copied, etc.
235 An "extent" is similar to two markers plus some associated
236 properties, and is used to keep track of regions in a buffer as text is
237 inserted and deleted, and to add properties (e.g. fonts) to particular
regions of text. The external interface of extents is explained
elsewhere.
241 The important thing here is that markers and extents simply contain
242 buffer positions in them as integers, and every time text is inserted or
243 deleted, these positions must be updated. In order to minimize the
244 amount of shuffling that needs to be done, the positions in markers and
extents (there's one per marker, two per extent) are stored as Meminds.
246 This means that they only need to be moved when the text is physically
247 moved in memory; since the gap structure tries to minimize this, it also
248 minimizes the number of marker and extent indices that need to be
249 adjusted. Look in `insdel.c' for the details of how this works.
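To see why storing Meminds keeps the bookkeeping small, consider the
following sketch of what has to happen to marker positions when the gap
moves. It is an illustration only: the function name and its arguments
are hypothetical, memory offsets are 0-based here for brevity, and the
real logic in `insdel.c' differs in detail. A marker that sits "at" the
gap is assumed to hold the offset of the beginning of the gap.

```c
#include <assert.h>
#include <stddef.h>

/* Adjust marker memory offsets after the gap moves from offset old_gap
   to offset new_gap.  Only markers attached to the text that was
   physically block-moved need to change. */
static void adjust_markers_for_gap_move(size_t *markers, size_t n,
                                        size_t old_gap, size_t new_gap,
                                        size_t gap_size) {
    size_t i;
    for (i = 0; i < n; i++) {
        size_t m = markers[i];
        if (new_gap < old_gap) {
            /* Text in [new_gap, old_gap) moved up by gap_size. */
            if (m > new_gap && m <= old_gap)
                markers[i] = m + gap_size;
        } else if (new_gap > old_gap) {
            /* Text just after the old gap moved down by gap_size. */
            if (m > old_gap + gap_size && m <= new_gap + gap_size)
                markers[i] = m - gap_size;
        }
    }
}
```

Markers outside the moved block are left untouched, so an insertion
near the current gap position adjusts few markers or none.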
251 One other important distinction is that markers are "temporary"
252 while extents are "permanent". This means that markers disappear as
253 soon as there are no more pointers to them, and correspondingly, there
254 is no way to determine what markers are in a buffer if you are just
255 given the buffer. Extents remain in a buffer until they are detached
256 (which could happen as a result of text being deleted) or the buffer is
257 deleted, and primitives do exist to enumerate the extents in a buffer.
260 File: internals.info, Node: Bufbytes and Emchars, Next: The Buffer Object, Prev: Markers and Extents, Up: Buffers and Textual Representation
268 File: internals.info, Node: The Buffer Object, Prev: Bufbytes and Emchars, Up: Buffers and Textual Representation
The Buffer Object
=================
273 Buffers contain fields not directly accessible by the Lisp
274 programmer. We describe them here, naming them by the names used in
the C code. Many are accessible indirectly in Lisp programs via Lisp
primitives.
279 The buffer name is a string that names the buffer. It is
guaranteed to be unique. *Note Buffer Names: (lispref)Buffer
Names.
284 This field contains the time when the buffer was last saved, as an
285 integer. *Note Buffer Modification: (lispref)Buffer Modification.
288 This field contains the modification time of the visited file. It
289 is set when the file is written or read. Every time the buffer is
290 written to the file, this field is compared to the modification
time of the file. *Note Buffer Modification: (lispref)Buffer
Modification.
295 This field contains the time when the buffer was last auto-saved.
298 This field contains the `window-start' position in the buffer as of
299 the last time the buffer was displayed in a window.
This field points to the buffer's undo list. *Note Undo:
(lispref)Undo.
306 This field contains the syntax table for the buffer. *Note Syntax
307 Tables: (lispref)Syntax Tables.
310 This field contains the conversion table for converting text to
311 lower case. *Note Case Tables: (lispref)Case Tables.
314 This field contains the conversion table for converting text to
315 upper case. *Note Case Tables: (lispref)Case Tables.
318 This field contains the conversion table for canonicalizing text
319 for case-folding search. *Note Case Tables: (lispref)Case Tables.
322 This field contains the equivalence table for case-folding search.
323 *Note Case Tables: (lispref)Case Tables.
326 This field contains the buffer's display table, or `nil' if it
327 doesn't have one. *Note Display Tables: (lispref)Display Tables.
330 This field contains the chain of all markers that currently point
331 into the buffer. Deletion of text in the buffer, and motion of
332 the buffer's gap, must check each of these markers and perhaps
333 update it. *Note Markers: (lispref)Markers.
336 This field is a flag that tells whether a backup file has been
337 made for the visited file of this buffer.
340 This field contains the mark for the buffer. The mark is a marker,
hence it is also included on the list `markers'. *Note The Mark:
(lispref)The Mark.
345 This field is non-`nil' if the buffer's mark is active.
348 This field contains the association list describing the variables
349 local in this buffer, and their values, with the exception of
350 local variables that have special slots in the buffer object.
351 (Those slots are omitted from this table.) *Note Buffer-Local
352 Variables: (lispref)Buffer-Local Variables.
355 This field contains a Lisp object which controls how to display
356 the mode line for this buffer. *Note Modeline Format:
357 (lispref)Modeline Format.
This field holds the buffer's base buffer (if it is an indirect
buffer).
364 File: internals.info, Node: MULE Character Sets and Encodings, Next: The Lisp Reader and Compiler, Prev: Buffers and Textual Representation, Up: Top
366 MULE Character Sets and Encodings
367 *********************************
369 Recall that there are two primary ways that text is represented in
370 XEmacs. The "buffer" representation sees the text as a series of bytes
371 (Bufbytes), with a variable number of bytes used per character. The
372 "character" representation sees the text as a series of integers
373 (Emchars), one per character. The character representation is a cleaner
374 representation from a theoretical standpoint, and is thus used in many
375 cases when lots of manipulations on a string need to be done. However,
376 the buffer representation is the standard representation used in both
377 Lisp strings and buffers, and because of this, it is the "default"
378 representation that text comes in. The reason for using this
379 representation is that it's compact and is compatible with ASCII.
385 * Internal Mule Encodings::
389 File: internals.info, Node: Character Sets, Next: Encodings, Prev: MULE Character Sets and Encodings, Up: MULE Character Sets and Encodings
Character Sets
==============
394 A character set (or "charset") is an ordered set of characters. A
395 particular character in a charset is indexed using one or more
396 "position codes", which are non-negative integers. The number of
397 position codes needed to identify a particular character in a charset is
398 called the "dimension" of the charset. In XEmacs/Mule, all charsets
399 have dimension 1 or 2, and the size of all charsets (except for a few
400 special cases) is either 94, 96, 94 by 94, or 96 by 96. The range of
401 position codes used to index characters from any of these types of
402 character sets is as follows:
Charset type            Position code 1         Position code 2
------------------------------------------------------------
94                      33 - 126                N/A
96                      32 - 127                N/A
94x94                   33 - 126                33 - 126
96x96                   32 - 127                32 - 127
411 Note that in the above cases position codes do not start at an
expected value such as 0 or 1. The reason for this will become clear
later.
415 For example, Latin-1 is a 96-character charset, and JISX0208 (the
416 Japanese national character set) is a 94x94-character charset.
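The ranges can be expressed as a small validity check. The sketch below
assumes the four standard charset sizes named above, with position
codes 33 - 126 for 94-based charsets and 32 - 127 for 96-based ones;
the enum and function names are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tags for the four standard charset sizes. */
typedef enum { CS_94, CS_96, CS_94x94, CS_96x96 } CharsetType;

/* Number of position codes needed to index a character. */
static int charset_dimension(CharsetType t) {
    return (t == CS_94 || t == CS_96) ? 1 : 2;
}

/* Is `pc` a valid single position code for this type of charset? */
static bool pc_in_range(CharsetType t, int pc) {
    bool is94 = (t == CS_94 || t == CS_94x94);
    return is94 ? (pc >= 33 && pc <= 126) : (pc >= 32 && pc <= 127);
}

/* Check one or two position codes; pc2 is ignored for dimension 1. */
static bool valid_position_codes(CharsetType t, int pc1, int pc2) {
    if (!pc_in_range(t, pc1))
        return false;
    return charset_dimension(t) == 1 || pc_in_range(t, pc2);
}
```

A valid pair of position codes may still name an empty slot; the check
only tests the ranges.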
418 [Note that, although the ranges above define the _valid_ position
419 codes for a charset, some of the slots in a particular charset may in
420 fact be empty. This is the case for JISX0208, for example, where (e.g.)
all the slots whose first position code is in the range 117 - 126 are
empty.]
424 There are three charsets that do not follow the above rules. All of
425 them have one dimension, and have ranges of position codes as follows:
Charset name            Position code 1
------------------------------------
ASCII                   0 - 127
Control-1               0 - 31
Composite               0 - some large number
433 (The upper bound of the position code for composite characters has
434 not yet been determined, but it will probably be at least 16,383).
436 ASCII is the union of two subsidiary character sets: Printing-ASCII
437 (the printing ASCII character set, consisting of position codes 33 -
438 126, like for a standard 94-character charset) and Control-ASCII (the
non-printing characters that would appear in a binary file with codes 0
- 32 and 127).
442 Control-1 contains the non-printing characters that would appear in a
443 binary file with codes 128 - 159.
445 Composite contains characters that are generated by overstriking one
446 or more characters from other charsets.
448 Note that some characters in ASCII, and all characters in Control-1,
are "control" (non-printing) characters. These have no printed
representation but instead control some other function of the printing
(e.g. TAB, code 9, moves the current character position to the next tab
stop). All other characters in all charsets are "graphic" (printing)
characters.
455 When a binary file is read in, the bytes in the file are assigned to
456 character sets as follows:
458 Bytes Character set Range
459 --------------------------------------------------
460 0 - 127 ASCII 0 - 127
461 128 - 159 Control-1 0 - 31
462 160 - 255 Latin-1 32 - 127
464 This is a bit ad-hoc but gets the job done.
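The table translates directly into code. The following sketch uses
invented names (`BinCharset', `BinChar', `classify_binary_byte') and is
not taken from the XEmacs sources.

```c
#include <assert.h>

typedef enum { CHARSET_ASCII, CHARSET_CONTROL_1, CHARSET_LATIN_1 } BinCharset;

typedef struct {
    BinCharset charset;
    int position_code;
} BinChar;

/* Assign one byte of a binary file to a charset and position code. */
static BinChar classify_binary_byte(unsigned char b) {
    BinChar c;
    if (b <= 127) {            /* bytes 0 - 127: ASCII, codes 0 - 127 */
        c.charset = CHARSET_ASCII;
        c.position_code = b;
    } else if (b <= 159) {     /* bytes 128 - 159: Control-1, 0 - 31 */
        c.charset = CHARSET_CONTROL_1;
        c.position_code = b - 128;
    } else {                   /* bytes 160 - 255: Latin-1, 32 - 127 */
        c.charset = CHARSET_LATIN_1;
        c.position_code = b - 128;
    }
    return c;
}
```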
467 File: internals.info, Node: Encodings, Next: Internal Mule Encodings, Prev: Character Sets, Up: MULE Character Sets and Encodings
Encodings
=========
472 An "encoding" is a way of numerically representing characters from
473 one or more character sets. If an encoding only encompasses one
474 character set, then the position codes for the characters in that
475 character set could be used directly. This is not possible, however, if
476 more than one character set is to be used in the encoding.
478 For example, the conversion detailed above between bytes in a binary
479 file and characters is effectively an encoding that encompasses the
three character sets ASCII, Control-1, and Latin-1 in a stream of 8-bit
bytes.
483 Thus, an encoding can be viewed as a way of encoding characters from
484 a specified group of character sets using a stream of bytes, each of
485 which contains a fixed number of bits (but not necessarily 8, as in the
486 common usage of "byte").
488 Here are descriptions of a couple of common encodings:
492 * Japanese EUC (Extended Unix Code)::
496 File: internals.info, Node: Japanese EUC (Extended Unix Code), Next: JIS7, Prev: Encodings, Up: Encodings
498 Japanese EUC (Extended Unix Code)
499 ---------------------------------
This encompasses the character sets Printing-ASCII,
Japanese-JISX0201-Kana (half-width katakana, the right half of
JISX0201), and Japanese-JISX0208. It uses 8-bit bytes.
505 Note that Printing-ASCII and Japanese-JISX0201-Kana are 94-character
506 charsets, while Japanese-JISX0208 is a 94x94-character charset.
508 The encoding is as follows:
Character set            Representation (PC=position-code)
-------------            --------------
Printing-ASCII           PC1
Japanese-JISX0201-Kana   0x8E       | PC1 + 0x80
Japanese-JISX0208        PC1 + 0x80 | PC2 + 0x80
Japanese-JISX0212        0x8F       | PC1 + 0x80 | PC2 + 0x80
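As an illustration, the following sketch encodes one character from
Printing-ASCII, Japanese-JISX0201-Kana, or Japanese-JISX0208 into
Japanese EUC. The enum and function names are invented for the example,
Japanese-JISX0212 is left out, and Printing-ASCII is assumed to be
written as its position code unchanged, as is usual for EUC.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    EUC_PRINTING_ASCII,
    EUC_JISX0201_KANA,
    EUC_JISX0208
} EucCharset;

/* Position codes of 94- and 94x94-character charsets are 33 - 126. */
static int is_pc94(int pc) { return pc >= 33 && pc <= 126; }

/* Encode one character into `out` (room for 2 bytes).  Returns the
   number of bytes written, or 0 if a position code is out of range.
   pc2 is ignored for the one-dimensional charsets. */
static size_t euc_encode(EucCharset cs, int pc1, int pc2,
                         unsigned char *out) {
    if (!is_pc94(pc1))
        return 0;
    switch (cs) {
    case EUC_PRINTING_ASCII:
        out[0] = (unsigned char) pc1;
        return 1;
    case EUC_JISX0201_KANA:
        out[0] = 0x8E;
        out[1] = (unsigned char) (pc1 + 0x80);
        return 2;
    case EUC_JISX0208:
        if (!is_pc94(pc2))
            return 0;
        out[0] = (unsigned char) (pc1 + 0x80);
        out[1] = (unsigned char) (pc2 + 0x80);
        return 2;
    }
    return 0;
}
```

For example, the JISX0208 character with position codes 0x24 0x22
(hiragana A) comes out as the two bytes 0xA4 0xA2.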
518 File: internals.info, Node: JIS7, Prev: Japanese EUC (Extended Unix Code), Up: Encodings
JIS7
----
523 This encompasses the character sets Printing-ASCII,
524 Japanese-JISX0201-Roman (the left half of JISX0201; this character set
525 is very similar to Printing-ASCII and is a 94-character charset),
526 Japanese-JISX0208, and Japanese-JISX0201-Kana. It uses 7-bit bytes.
528 Unlike Japanese EUC, this is a "modal" encoding, which means that
529 there are multiple states that the encoding can be in, which affect how
530 the bytes are to be interpreted. Special sequences of bytes (called
531 "escape sequences") are used to change states.
533 The encoding is as follows:
Character set            Representation (PC=position-code)
-------------            --------------
Printing-ASCII           PC1
Japanese-JISX0201-Roman  PC1
Japanese-JISX0201-Kana   PC1
Japanese-JISX0208        PC1 PC2
Escape sequence   ASCII equivalent   Meaning
---------------   ----------------   -------
0x1B 0x28 0x4A    ESC ( J            invoke Japanese-JISX0201-Roman
0x1B 0x28 0x49    ESC ( I            invoke Japanese-JISX0201-Kana
0x1B 0x24 0x42    ESC $ B            invoke Japanese-JISX0208
0x1B 0x28 0x42    ESC ( B            invoke Printing-ASCII
550 Initially, Printing-ASCII is invoked.
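Because the encoding is modal, a decoder has to carry state from one
byte to the next. The following sketch shows the idea for the four
escape sequences listed above. All names are invented for the example,
every byte other than ESC is treated as a position code, and any other
escape sequence is rejected; a real decoder would have to be more
forgiving.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { J7_ASCII, J7_ROMAN, J7_KANA, J7_JISX0208 } Jis7Charset;

typedef struct {
    Jis7Charset charset;
    int pc1, pc2; /* pc2 is 0 for the one-dimensional charsets */
} Jis7Char;

/* Decode `n` bytes of JIS7 into `out` (capacity `cap`).  Returns the
   number of characters decoded, or -1 on malformed input or overflow. */
static int jis7_decode(const unsigned char *in, size_t n,
                       Jis7Char *out, size_t cap) {
    Jis7Charset state = J7_ASCII; /* initially Printing-ASCII is invoked */
    size_t i = 0;
    int count = 0;
    while (i < n) {
        Jis7Char c;
        if (in[i] == 0x1B) {      /* escape sequence: change state */
            unsigned char a, b;
            if (i + 2 >= n)
                return -1;
            a = in[i + 1];
            b = in[i + 2];
            if (a == 0x28 && b == 0x4A)      state = J7_ROMAN;
            else if (a == 0x28 && b == 0x49) state = J7_KANA;
            else if (a == 0x24 && b == 0x42) state = J7_JISX0208;
            else if (a == 0x28 && b == 0x42) state = J7_ASCII;
            else return -1;
            i += 3;
            continue;
        }
        if ((size_t) count >= cap)
            return -1;
        c.charset = state;
        c.pc1 = in[i];
        c.pc2 = 0;
        if (state == J7_JISX0208) { /* two position codes per character */
            if (i + 1 >= n)
                return -1;
            c.pc2 = in[i + 1];
            i += 2;
        } else {
            i += 1;
        }
        out[count++] = c;
    }
    return count;
}
```

Note that the same byte (0x42, say) is the letter B in the ASCII state
but half of a two-byte character in the JISX0208 state.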
553 File: internals.info, Node: Internal Mule Encodings, Next: CCL, Prev: Encodings, Up: MULE Character Sets and Encodings
555 Internal Mule Encodings
556 =======================
558 In XEmacs/Mule, each character set is assigned a unique number,
559 called a "leading byte". This is used in the encodings of a character.
560 Leading bytes are in the range 0x80 - 0xFF (except for ASCII, which has
561 a leading byte of 0), although some leading bytes are reserved.
563 Charsets whose leading byte is in the range 0x80 - 0x9F are called
564 "official" and are used for built-in charsets. Other charsets are
565 called "private" and have leading bytes in the range 0xA0 - 0xFF; these
566 are user-defined charsets.
Character set           Leading byte
-------------           ------------
Dimension-1 Official    0x81 - 0x8D
Dimension-2 Official    0x90 - 0x99
                          (0x9A - 0x9D are free;
                           0x9E and 0x9F are reserved)
Dimension-1 Private     0xA0 - 0xEF
Dimension-2 Private     0xF0 - 0xFF
583 There are two internal encodings for characters in XEmacs/Mule. One
584 is called "string encoding" and is an 8-bit encoding that is used for
585 representing characters in a buffer or string. It uses 1 to 4 bytes per
character. The other is called "character encoding" and is a 19-bit
encoding that is used for representing individual characters.
590 (In the following descriptions, we'll ignore composite characters for
591 the moment. We also give a general (structural) overview first,
592 followed later by the exact details.)
596 * Internal String Encoding::
597 * Internal Character Encoding::
600 File: internals.info, Node: Internal String Encoding, Next: Internal Character Encoding, Prev: Internal Mule Encodings, Up: Internal Mule Encodings
602 Internal String Encoding
603 ------------------------
605 ASCII characters are encoded using their position code directly.
606 Other characters are encoded using their leading byte followed by their
607 position code(s) with the high bit set. Characters in private character
608 sets have their leading byte prefixed with a "leading byte prefix",
609 which is either 0x9E or 0x9F. (No character sets are ever assigned these
610 leading bytes.) Specifically:
Character set           Encoding (PC=position-code, LB=leading-byte)
-------------           --------
ASCII                          PC1
Control-1                      LB   | PC1 + 0xA0
Dimension-1 official           LB   | PC1 + 0x80
Dimension-1 private     0x9E | LB   | PC1 + 0x80
Dimension-2 official           LB   | PC1 + 0x80 | PC2 + 0x80
Dimension-2 private     0x9F | LB   | PC1 + 0x80 | PC2 + 0x80
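A sketch of the string encoding for the official and private charsets
follows. The function names are invented; the dimension of a charset is
derived from the leading-byte ranges given earlier, and ASCII and
Control-1 are left out because they do not follow the general pattern.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char Bufbyte;

/* Dimension implied by a leading byte, or 0 if the byte is not in one
   of the official or private ranges listed in the table above. */
static int lb_dimension(int lb) {
    if (lb >= 0x81 && lb <= 0x8D) return 1; /* dimension-1 official */
    if (lb >= 0x90 && lb <= 0x99) return 2; /* dimension-2 official */
    if (lb >= 0xA0 && lb <= 0xEF) return 1; /* dimension-1 private */
    if (lb >= 0xF0 && lb <= 0xFF) return 2; /* dimension-2 private */
    return 0;
}

/* Encode one character into `out` (room for 4 bytes).  Returns the
   number of bytes written, or 0 for an unsupported leading byte.
   pc2 is ignored for dimension-1 charsets. */
static size_t string_encode(int lb, int pc1, int pc2, Bufbyte *out) {
    int dim = lb_dimension(lb);
    size_t n = 0;
    if (dim == 0)
        return 0;
    if (lb >= 0xA0)                    /* private: leading byte prefix */
        out[n++] = (dim == 1) ? 0x9E : 0x9F;
    out[n++] = (Bufbyte) lb;
    out[n++] = (Bufbyte) (pc1 + 0x80); /* position codes, high bit set */
    if (dim == 2)
        out[n++] = (Bufbyte) (pc2 + 0x80);
    return n;
}
```

The longest result, four bytes, is produced for a dimension-2 private
charset.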
The basic characteristic of this encoding is that the first byte of
every character is in the range 0x00 - 0x9F, and the second and
following bytes of every character are in the range 0xA0 - 0xFF. This
624 means that it is impossible to get out of sync, or more specifically:
626 1. Given any byte position, the beginning of the character it is
627 within can be determined in constant time.
629 2. Given any byte position at the beginning of a character, the
630 beginning of the next character can be determined in constant time.
3. Given any byte position at the beginning of a character, the
beginning of the previous character can be determined in constant
time.
4. Textual searches can simply treat encoded strings as if they were
encoded in a one-byte-per-character fashion rather than the actual
multi-byte encoding.
641 conditions. For example, EUC satisfies only (2) and (3), while
642 Shift-JIS and Big5 (not yet described) satisfy only (2). (All non-modal
643 encodings must satisfy (2), in order to be unambiguous.)
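Properties (1) to (3) follow from the byte ranges alone, as the
following sketch shows. The function names are invented for the
example. Each loop runs at most three times, because a character
occupies at most four bytes; that is what makes the operations constant
time.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char Bufbyte;

/* First bytes are 0x00 - 0x9F; all following bytes are 0xA0 - 0xFF. */
static int is_first_byte(Bufbyte b) { return b <= 0x9F; }

/* (1) Offset of the start of the character containing byte `i`. */
static size_t char_start(const Bufbyte *s, size_t i) {
    while (i > 0 && !is_first_byte(s[i]))
        i--;
    return i;
}

/* (2) Offset of the next character, given the start `i` of one;
   returns `len` when `i` starts the last character. */
static size_t next_char(const Bufbyte *s, size_t len, size_t i) {
    i++;
    while (i < len && !is_first_byte(s[i]))
        i++;
    return i;
}

/* (3) Offset of the previous character, given the start `i` of one. */
static size_t prev_char(const Bufbyte *s, size_t i) {
    assert(i > 0);
    i--;
    while (i > 0 && !is_first_byte(s[i]))
        i--;
    return i;
}
```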
646 File: internals.info, Node: Internal Character Encoding, Prev: Internal String Encoding, Up: Internal Mule Encodings
648 Internal Character Encoding
649 ---------------------------
651 One 19-bit word represents a single character. The word is
652 separated into three fields:
Bit number:   18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
              <------------> <------------------> <------------------>
Field:              1                  2                    3
Note that fields 2 and 3 hold 7 bits each, while field 1 holds 5
bits.
Character set           Field 1         Field 2         Field 3
-------------           -------         -------         -------
Dimension-1 official       0            LB - 0x80         PC1
                range:                  (01 - 0D)       (20 - 7F)
Dimension-1 private        0            LB - 0x80         PC1
                range:                  (20 - 6F)       (20 - 7F)
Dimension-2 official    LB - 0x8F         PC1             PC2
                range:  (01 - 0A)       (20 - 7F)       (20 - 7F)
Dimension-2 private     LB - 0xE1         PC1             PC2
                range:  (0F - 1E)       (20 - 7F)       (20 - 7F)
677 Note that character codes 0 - 255 are the same as the "binary
678 encoding" described above.
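The field layout can be sketched as follows for the official and
private charsets. The typedef `Emchar' is the one used in this manual;
the function names are invented for the example, and ASCII, Control-1
and composite characters are not handled.

```c
#include <assert.h>

typedef int Emchar;

/* Pack three fields: field 1 in bits 18-14, field 2 in bits 13-7,
   field 3 in bits 6-0. */
static Emchar make_emchar(int f1, int f2, int f3) {
    assert(f1 >= 0 && f1 < 32 && f2 >= 0 && f2 < 128 && f3 >= 0 && f3 < 128);
    return (f1 << 14) | (f2 << 7) | f3;
}

static int emchar_field1(Emchar c) { return (c >> 14) & 0x1F; }
static int emchar_field2(Emchar c) { return (c >> 7) & 0x7F; }
static int emchar_field3(Emchar c) { return c & 0x7F; }

/* Build an Emchar from a leading byte and position codes, following
   the table above.  Returns -1 for an unsupported leading byte.
   pc2 is ignored for dimension-1 charsets. */
static Emchar emchar_from_charset(int lb, int pc1, int pc2) {
    if ((lb >= 0x81 && lb <= 0x8D) || (lb >= 0xA0 && lb <= 0xEF))
        return make_emchar(0, lb - 0x80, pc1);    /* dimension 1 */
    if (lb >= 0x90 && lb <= 0x99)
        return make_emchar(lb - 0x8F, pc1, pc2);  /* dim-2 official */
    if (lb >= 0xF0 && lb <= 0xFF)
        return make_emchar(lb - 0xE1, pc1, pc2);  /* dim-2 private */
    return -1;
}
```

Under this layout a dimension-1 charset with leading byte 0x81 yields
character codes 160 - 255, which fits the statement that codes 0 - 255
keep their binary meaning.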
681 File: internals.info, Node: CCL, Prev: Internal Mule Encodings, Up: MULE Character Sets and Encodings
CCL
===
CCL_PROGRAM := (CCL_MAIN_BLOCK
                [ CCL_EOF_BLOCK ])
CCL_MAIN_BLOCK := CCL_BLOCK
CCL_EOF_BLOCK := CCL_BLOCK
CCL_BLOCK := STATEMENT | (STATEMENT [STATEMENT ...])
STATEMENT :=
        SET | IF | BRANCH | LOOP | REPEAT | BREAK
        | READ | WRITE
698 SET := (REG = EXPRESSION) | (REG SELF_OP EXPRESSION)
701 EXPRESSION := ARG | (EXPRESSION OP ARG)
703 IF := (if EXPRESSION CCL_BLOCK CCL_BLOCK)
704 BRANCH := (branch EXPRESSION CCL_BLOCK [CCL_BLOCK ...])
LOOP := (loop STATEMENT [STATEMENT ...])
BREAK := (break)
REPEAT := (repeat)
        | (write-repeat [REG | INT-OR-CHAR | string])
        | (write-read-repeat REG [INT-OR-CHAR | string | ARRAY]?)
710 READ := (read REG) | (read REG REG)
711 | (read-if REG ARITH_OP ARG CCL_BLOCK CCL_BLOCK)
712 | (read-branch REG CCL_BLOCK [CCL_BLOCK ...])
713 WRITE := (write REG) | (write REG REG)
714 | (write INT-OR-CHAR) | (write STRING) | STRING
718 REG := r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7
719 ARG := REG | INT-OR-CHAR
OP := + | - | * | / | % | & | '|' | ^ | << | >> | <8 | >8 | //
        | < | > | == | <= | >= | !=
SELF_OP :=
        += | -= | *= | /= | %= | &= | '|=' | ^= | <<= | >>=
724 ARRAY := '[' INT-OR-CHAR ... ']'
725 INT-OR-CHAR := INT | CHAR
729 The machine code consists of a vector of 32-bit words.
730 The first such word specifies the start of the EOF section of the code;
731 this is the code executed to handle any stuff that needs to be done
732 (e.g. designating back to ASCII and left-to-right mode) after all
733 other encoded/decoded data has been written out. This is not used for
734 charset CCL programs.
REGISTER: 0..7  -- referred to as RRR or rrr
OPERATOR BIT FIELD (27-bit): XXXXXXXXXXXXXXX RRR TTTTT
        TTTTT (5-bit): operator type
        RRR (3-bit): register number
        XXXXXXXXXXXXXXX (15-bit):
                CCCCCCCCCCCCCCC: constant or address
                000000000000rrr: register number
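Packing and unpacking such a word might look like the sketch below. It
assumes that the fields are laid out exactly as written above, from
most significant to least significant: the 15-bit constant or address,
then the 3-bit register number, then the 5-bit operator type in the
lowest bits. The names `CclInsn', `ccl_encode' and `ccl_decode' are
invented for the example.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned op;  /* TTTTT: operator type, 5 bits */
    unsigned reg; /* RRR: register number, 3 bits */
    unsigned arg; /* X...X: constant, address, or second register */
} CclInsn;

/* Pack the three fields into one code word (assumed bit layout). */
static uint32_t ccl_encode(unsigned op, unsigned reg, unsigned arg) {
    assert(op < 32 && reg < 8 && arg < 0x8000);
    return ((uint32_t) arg << 8) | ((uint32_t) reg << 5) | op;
}

/* Split a code word back into its three fields. */
static CclInsn ccl_decode(uint32_t word) {
    CclInsn in;
    in.op = word & 0x1F;
    in.reg = (word >> 5) & 0x7;
    in.arg = (word >> 8) & 0x7FFF;
    return in;
}
```

For an operator whose argument is a second register (the `..rrr'
forms), the register number is simply the low three bits of `arg'.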
770 OPERATORS:      TTTTT RRR XX..
772 SetCS:          00000 RRR C...C   RRR = C...C
773 SetCL:          00001 RRR .....   RRR = c...c
775 SetR:           00010 RRR ..rrr   RRR = rrr
776 SetA:           00011 RRR ..rrr   RRR = array[rrr]
777                 C.............C   size of array = C...C
778                 c.............c   contents = c...c
780 Jump:           00100 000 c...c   jump to c...c
781 JumpCond:       00101 RRR c...c   if (!RRR) jump to c...c
782 WriteJump:      00110 RRR c...c   Write1 RRR, jump to c...c
783 WriteReadJump:  00111 RRR c...c   Write1, Read1 RRR, jump to c...c
784 WriteCJump:     01000 000 c...c   Write1 C...C, jump to c...c
786 WriteCReadJump: 01001 RRR c...c   Write1 C...C, Read1 RRR,
787                 C.............C   and jump to c...c
788 WriteSJump:     01010 000 c...c   WriteS, jump to c...c
792 WriteSReadJump: 01011 RRR c...c   WriteS, Read1 RRR, jump to c...c
796 WriteAReadJump: 01100 RRR c...c   WriteA, Read1 RRR, jump to c...c
797                 C.............C   size of array = C...C
798                 c.............c   contents = c...c
800 Branch:         01101 RRR C...C   if (RRR >= 0 && RRR < C..)
801                 c.............c   branch to (RRR+1)th address
802 Read1:          01110 RRR ...     read 1-byte to RRR
803 Read2:          01111 RRR ..rrr   read 2-byte to RRR and rrr
804 ReadBranch:     10000 RRR C...C   Read1 and Branch
807 Write1:         10001 RRR .....   write 1-byte RRR
808 Write2:         10010 RRR ..rrr   write 2-byte RRR and rrr
809 WriteC:         10011 000 .....   write 1-char C...CC
811 WriteS:         10100 000 .....   write C..-byte of string
815 WriteA:         10101 RRR .....   write array[RRR]
816                 C.............C   size of array = C...C
817                 c.............c   contents = c...c
819 End:            10110 000 .....   terminate the execution
821 SetSelfCS:      10111 RRR C...C   RRR AAAAA= C...C
823 SetSelfCL:      11000 RRR .....   RRR AAAAA= c...c
826 SetSelfR:       11001 RRR ..Rrr   RRR AAAAA= rrr
828 SetExprCL:      11010 RRR ..Rrr   RRR = rrr AAAAA c...c
831 SetExprR:       11011 RRR ..rrr   RRR = rrr AAAAA Rrr
834 JumpCondC:      11100 RRR c...c   if !(RRR AAAAA C..) jump to c...c
837 JumpCondR:      11101 RRR c...c   if !(RRR AAAAA rrr) jump to c...c
840 ReadJumpCondC:  11110 RRR c...c   Read1 and JumpCondC
843 ReadJumpCondR:  11111 RRR c...c   Read1 and JumpCondR
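
As a reading aid for the operator table above, the 5-bit TTTTT codes can be turned into a lookup table. The following is only a sketch: the names `ccl_op_names', `ccl_op_name' and `ccl_op_code' are hypothetical and do not come from the XEmacs sources, and the table does not say where TTTTT sits inside an instruction word, so the sketch assumes the 5-bit value has already been extracted.

```c
#include <stddef.h>
#include <string.h>

/* Mnemonics indexed by the 5-bit operator code (the TTTTT column).
   Hypothetical table built from the listing above; it assumes the
   caller has already extracted the 5-bit value from the word. */
static const char *const ccl_op_names[32] = {
  "SetCS",          "SetCL",          "SetR",          "SetA",
  "Jump",           "JumpCond",       "WriteJump",     "WriteReadJump",
  "WriteCJump",     "WriteCReadJump", "WriteSJump",    "WriteSReadJump",
  "WriteAReadJump", "Branch",         "Read1",         "Read2",
  "ReadBranch",     "Write1",         "Write2",        "WriteC",
  "WriteS",         "WriteA",         "End",           "SetSelfCS",
  "SetSelfCL",      "SetSelfR",       "SetExprCL",     "SetExprR",
  "JumpCondC",      "JumpCondR",      "ReadJumpCondC", "ReadJumpCondR"
};

/* Return the mnemonic for a 5-bit operator code, or NULL if the
   value does not fit in 5 bits. */
static const char *ccl_op_name(unsigned op)
{
  return op < 32 ? ccl_op_names[op] : NULL;
}

/* Reverse lookup: return the operator code for a mnemonic, or -1 if
   the name is not in the table. */
static int ccl_op_code(const char *name)
{
  int i;
  for (i = 0; i < 32; i++)
    if (strcmp(ccl_op_names[i], name) == 0)
      return i;
  return -1;
}
```

For instance, `ccl_op_name(0x16)' yields the `End' operator (10110), and `ccl_op_code("Branch")' yields 13 (01101).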
848 File: internals.info, Node: The Lisp Reader and Compiler, Next: Lstreams, Prev: MULE Character Sets and Encodings, Up: Top
850 The Lisp Reader and Compiler
851 ****************************
856 File: internals.info, Node: Lstreams, Next: Consoles; Devices; Frames; Windows, Prev: The Lisp Reader and Compiler, Up: Top
861 An "lstream" is an internal Lisp object that provides a generic
862 buffering stream implementation. Conceptually, you send data to the
863 stream or read data from the stream, not caring what's on the other end
864 of the stream. The other end could be another stream, a file
865 descriptor, a stdio stream, a fixed block of memory, a reallocating
866 block of memory, etc. The main purpose of the stream is to provide a
867 standard interface and to do buffering. Macros are defined to read or
868 write characters, so the calling functions do not have to worry about
869 blocking data together in order to achieve efficiency.
873 * Creating an Lstream:: Creating an lstream object.
874 * Lstream Types:: Different sorts of things that are streamed.
875 * Lstream Functions:: Functions for working with lstreams.
876 * Lstream Methods:: Creating new lstream types.
879 File: internals.info, Node: Creating an Lstream, Next: Lstream Types, Prev: Lstreams, Up: Lstreams
884 Lstreams come in different types, depending on what is being
885 interfaced to. Although the primitive for creating new lstreams is
886 `Lstream_new()', generally you do not call this directly. Instead, you
887 call some type-specific creation function, which creates the lstream
888 and initializes it as appropriate for the particular type.
890 All lstream creation functions take a MODE argument, specifying what
891 mode the lstream should be opened as. This controls whether the
892 lstream is for input or output, and optionally whether data should be
893 blocked up in units of MULE characters. Note that some types of
894 lstreams can only be opened for input; others only for output; and
895 others can be opened either way. #### Richard Mlynarik thinks that
896 there should be a strict separation between input and output streams,
897 and he's probably right.
899 MODE is a string, one of
908 Open for reading, but "read" never returns partial MULE characters.
911 Open for writing, but never writes partial MULE characters.
914 File: internals.info, Node: Lstream Types, Next: Lstream Functions, Prev: Creating an Lstream, Up: Lstreams
939 File: internals.info, Node: Lstream Functions, Next: Lstream Methods, Prev: Lstream Types, Up: Lstreams
944 - Function: Lstream * Lstream_new (Lstream_implementation *IMP, const
946 Allocate and return a new Lstream. This function is not really
947 meant to be called directly; rather, each stream type should
948 provide its own stream creation function, which creates the stream
949 and does any other necessary creation stuff (e.g. opening a file).
951 - Function: void Lstream_set_buffering (Lstream *LSTR,
952 Lstream_buffering BUFFERING, int BUFFERING_SIZE)
953 Change the buffering of a stream. See `lstream.h'. By default the
954 buffering is `STREAM_BLOCK_BUFFERED'.
956 - Function: int Lstream_flush (Lstream *LSTR)
957 Flush out any pending unwritten data in the stream. Clear any
958 buffered input data. Returns 0 on success, -1 on error.
960 - Macro: int Lstream_putc (Lstream *STREAM, int C)
961 Write out one byte to the stream. This is a macro and so it is
962 very efficient. The C argument is only evaluated once but the
963 STREAM argument is evaluated more than once. Returns 0 on
964 success, -1 on error.
966 - Macro: int Lstream_getc (Lstream *STREAM)
967 Read one byte from the stream. This is a macro and so it is very
968 efficient. The STREAM argument is evaluated more than once.
969 Return value is -1 for EOF or error.
971 - Macro: void Lstream_ungetc (Lstream *STREAM, int C)
972 Push one byte back onto the input queue. This will be the next
973 byte read from the stream. Any number of bytes can be pushed back
974 and will be read in the reverse order they were pushed back--most
975 recent first. (This is necessary for consistency--if there are a
976 number of bytes that have been unread and I read and unread a
977 byte, it needs to be the first to be read again.) This is a macro
978 and so it is very efficient. The C argument is only evaluated
979 once but the STREAM argument is evaluated more than once.
981 - Function: int Lstream_fputc (Lstream *STREAM, int C)
982 - Function: int Lstream_fgetc (Lstream *STREAM)
983 - Function: void Lstream_fungetc (Lstream *STREAM, int C)
984 Function equivalents of the above macros.
986 - Function: ssize_t Lstream_read (Lstream *STREAM, void *DATA, size_t
988 Read SIZE bytes of DATA from the stream. Return the number of
989 bytes read. 0 means EOF. -1 means an error occurred and no bytes were read.
992 - Function: ssize_t Lstream_write (Lstream *STREAM, void *DATA, size_t
994 Write SIZE bytes of DATA to the stream. Return the number of
995 bytes written. -1 means an error occurred and no bytes were written.
998 - Function: void Lstream_unread (Lstream *STREAM, void *DATA, size_t
1000 Push back SIZE bytes of DATA onto the input queue. The next call
1001 to `Lstream_read()' with the same size will read the same bytes
1002 back. Note that this will be the case even if there is other
1003 pending unread data.
1005 - Function: int Lstream_close (Lstream *STREAM)
1006 Close the stream. All data will be flushed out.
1008 - Function: void Lstream_reopen (Lstream *STREAM)
1009 Reopen a closed stream. This enables I/O on it again. This is not
1010 meant to be called except from a wrapper routine that reinitializes
1011 variables and such--the close routine may well have freed some
1012 necessary storage structures, for example.
1014 - Function: void Lstream_rewind (Lstream *STREAM)
1015 Rewind the stream to the beginning.
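
The push-back rules described for `Lstream_ungetc()' and `Lstream_unread()' can be illustrated with a toy input stream. This is a minimal sketch, not the real lstream code: `struct toy_in' and the `toy_*' functions are hypothetical names, the stream is assumed to sit on a fixed block of memory, and the push-back area is assumed to be a small fixed-size stack.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_UNGET_MAX 64

/* Hypothetical toy input stream: a fixed block of memory plus a
   push-back stack.  This is not the real Lstream structure. */
struct toy_in {
  const unsigned char *data;           /* underlying bytes */
  size_t len, pos;                     /* size of DATA, read offset */
  unsigned char unget[TOY_UNGET_MAX];  /* push-back stack */
  size_t n_unget;                      /* bytes currently on the stack */
};

/* Read one byte.  Pushed-back bytes are returned first, most recent
   first.  Returns -1 when no data is left. */
static int toy_getc(struct toy_in *s)
{
  if (s->n_unget > 0)
    return s->unget[--s->n_unget];
  if (s->pos < s->len)
    return s->data[s->pos++];
  return -1;
}

/* Push one byte back; it becomes the next byte read. */
static void toy_ungetc(struct toy_in *s, int c)
{
  assert(s->n_unget < TOY_UNGET_MAX);
  s->unget[s->n_unget++] = (unsigned char) c;
}

/* Push back SIZE bytes so that they are read back in their original
   order: the last byte is pushed first, leaving the first on top. */
static void toy_unread(struct toy_in *s, const unsigned char *buf,
                       size_t size)
{
  while (size > 0)
    toy_ungetc(s, buf[--size]);
}
```

With this layout, two single-byte push-backs come back in reverse order, while a block pushed back with `toy_unread' comes back in its original order even if other unread data is already pending underneath it.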
1018 File: internals.info, Node: Lstream Methods, Prev: Lstream Functions, Up: Lstreams
1023 - Lstream Method: ssize_t reader (Lstream *STREAM, unsigned char
1025 Read some data from the stream's end and store it into DATA, which
1026 can hold SIZE bytes. Return the number of bytes read. A return
1027 value of 0 means no bytes can be read at this time. This may be
1028 because of an EOF, or because there is a granularity greater than
1029 one byte that the stream imposes on the returned data, and SIZE is
1030 less than this granularity. (This will happen frequently for
1031 streams that need to return whole characters, because
1032 `Lstream_read()' calls the reader function repeatedly until it has
1033 the number of bytes it wants or until 0 is returned.) The lstream
1034 functions do not treat a 0 return as EOF or do anything special;
1035 however, the calling function will interpret any 0 it gets back as
1036 EOF. This will normally not happen unless the caller calls
1037 `Lstream_read()' with a very small size.
1039 This function can be `NULL' if the stream is output-only.
1041 - Lstream Method: ssize_t writer (Lstream *STREAM, const unsigned char
1043 Send some data to the stream's end. Data to be sent is in DATA
1044 and is SIZE bytes. Return the number of bytes sent. This
1045 function can send and return fewer bytes than is passed in; in that
1046 case, the function will just be called again until there is no
1047 data left or 0 is returned. A return value of 0 means that no
1048 more data can be currently stored, but there is no error; the data
1049 will be squirreled away until the writer can accept data. (This is
1050 useful, e.g., if you're dealing with a non-blocking file
1051 descriptor and are getting `EWOULDBLOCK' errors.) This function
1052 can be `NULL' if the stream is input-only.
1054 - Lstream Method: int rewinder (Lstream *STREAM)
1055 Rewind the stream. If this is `NULL', the stream is not seekable.
1057 - Lstream Method: int seekable_p (Lstream *STREAM)
1058 Indicate whether this stream is seekable--i.e. it can be rewound.
1059 This method is ignored if the stream does not have a rewind
1060 method. If this method is not present, the result is determined
1061 by whether a rewind method is present.
1063 - Lstream Method: int flusher (Lstream *STREAM)
1064 Perform any additional operations necessary to flush the data in this stream.
1067 - Lstream Method: int pseudo_closer (Lstream *STREAM)
1069 - Lstream Method: int closer (Lstream *STREAM)
1070 Perform any additional operations necessary to close this stream
1071 down. May be `NULL'. This function is called when
1072 `Lstream_close()' is called or when the stream is
1073 garbage-collected. When this function is called, all pending data
1074 in the stream will already have been written out.
1076 - Lstream Method: Lisp_Object marker (Lisp_Object LSTREAM, void
1077 (*MARKFUN) (Lisp_Object))
1078 Mark this object for garbage collection. Same semantics as a
1079 standard `Lisp_Object' marker. This function can be `NULL'.
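
The contract of the `writer' method (it may accept fewer bytes than offered, and a return of 0 means "no room right now, but no error") can be sketched with a toy method table. Everything named `toy_*' here is hypothetical; the real method table and driver live in `lstream.h' and `lstream.c' and differ in detail. In particular, this sketch simply reports how many bytes were accepted, whereas the text above says the real code keeps the remainder until the writer can accept it.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t */

struct toy_stream;

/* Hypothetical method table with only a writer method. */
struct toy_methods {
  ssize_t (*writer)(struct toy_stream *s, const unsigned char *data,
                    size_t size);
};

/* Hypothetical stream whose far end is a fixed block of memory. */
struct toy_stream {
  const struct toy_methods *imp;
  unsigned char sink[8];   /* the far end */
  size_t used;             /* bytes already stored in SINK */
  size_t chunk;            /* most bytes the writer takes per call */
};

/* Writer that accepts at most CHUNK bytes per call and returns 0
   once the sink is full. */
static ssize_t toy_writer(struct toy_stream *s, const unsigned char *data,
                          size_t size)
{
  size_t room = sizeof s->sink - s->used;
  size_t n = size < s->chunk ? size : s->chunk;
  if (n > room)
    n = room;
  memcpy(s->sink + s->used, data, n);
  s->used += n;
  return (ssize_t) n;
}

/* Driver: call the writer repeatedly until no data is left, the
   writer returns 0, or an error occurs.  Returns the number of bytes
   accepted, or -1 if an error occurred before any byte was sent. */
static ssize_t toy_write_all(struct toy_stream *s,
                             const unsigned char *data, size_t size)
{
  size_t done = 0;
  while (done < size) {
    ssize_t n = s->imp->writer(s, data + done, size - done);
    if (n < 0)
      return done > 0 ? (ssize_t) done : -1;
    if (n == 0)
      break;               /* cannot accept more for now */
    done += (size_t) n;
  }
  return (ssize_t) done;
}

static const struct toy_methods toy_imp = { toy_writer };
```

A stream set up with `chunk' equal to 3 takes a 5-byte write in two writer calls; once the 8-byte sink is full, further writes return 0 rather than an error.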
1082 File: internals.info, Node: Consoles; Devices; Frames; Windows, Next: The Redisplay Mechanism, Prev: Lstreams, Up: Top
1084 Consoles; Devices; Frames; Windows
1085 **********************************
1089 * Introduction to Consoles; Devices; Frames; Windows::
1091 * Window Hierarchy::
1092 * The Window Object::
1095 File: internals.info, Node: Introduction to Consoles; Devices; Frames; Windows, Next: Point, Prev: Consoles; Devices; Frames; Windows, Up: Consoles; Devices; Frames; Windows
1097 Introduction to Consoles; Devices; Frames; Windows
1098 ==================================================
1100 A window-system window that you see on the screen is called a
1101 "frame" in Emacs terminology. Each frame is subdivided into one or
1102 more non-overlapping panes, called (confusingly) "windows". Each
1103 window displays the text of a buffer in it. (See above on Buffers.) Note
1104 that buffers and windows are independent entities: Two or more windows
1105 can be displaying the same buffer (potentially in different locations),
1106 and a buffer can be displayed in no windows.
1108 A single display screen that contains one or more frames is called a
1109 "display". Under most circumstances, there is only one display.
1110 However, more than one display can exist, for example if you have a
1111 "multi-headed" console, i.e. one with a single keyboard but multiple
1112 displays. (Typically in such a situation, the various displays act like
1113 one large display, in that the mouse is only in one of them at a time,
1114 and moving the mouse off of one moves it into another.) In some cases,
1115 the different displays will have different characteristics, e.g. one
1118 XEmacs can display frames on multiple displays. It can even deal
1119 simultaneously with frames on multiple keyboards (called "consoles" in
1120 XEmacs terminology). Here is one case where this might be useful: You
1121 are using XEmacs on your workstation at work, and leave it running.
1122 Then you go home and dial in on a TTY line, and you can use the
1123 already-running XEmacs process to display another frame on your local
1126 Thus, there is a hierarchy console -> display -> frame -> window.
1127 There is a separate Lisp object type for each of these four concepts (the object representing a display is called a "device").
1128 Furthermore, there is logically a "selected console", "selected
1129 display", "selected frame", and "selected window". Each of these
1130 objects is distinguished in various ways, such as being the default
1131 object for various functions that act on objects of that type. Note
1132 that every containing object remembers the "selected" object among the
1133 objects that it contains: e.g. not only is there a selected window, but
1134 every frame remembers the last window in it that was selected, and
1135 changing the selected frame causes the remembered window within it to
1136 become the selected window. Similar relationships apply for consoles
1137 to devices and devices to frames.
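
The way each container remembers its own selected child can be sketched with four small structures. This is only an illustration under stated assumptions: the `toy_*' names are hypothetical, the real console, device, frame and window objects carry far more state, and the middle level is named `toy_device' here for what the text above also calls a display.

```c
#include <stddef.h>

/* Hypothetical, stripped-down containment hierarchy.  Each container
   remembers the child that was last selected within it. */
struct toy_window  { int id; };
struct toy_frame   { struct toy_window *selected_window; };
struct toy_device  { struct toy_frame  *selected_frame;  };
struct toy_console { struct toy_device *selected_device; };

/* The one global piece of state: the selected console. */
static struct toy_console *toy_selected_console;

/* The selected window is found by following the remembered selection
   at every level, starting from the selected console. */
static struct toy_window *toy_selected_window(void)
{
  struct toy_console *c = toy_selected_console;
  if (c == NULL || c->selected_device == NULL
      || c->selected_device->selected_frame == NULL)
    return NULL;
  return c->selected_device->selected_frame->selected_window;
}

/* Selecting a frame only changes the device's remembered frame; the
   window that frame remembers becomes the selected window. */
static void toy_select_frame(struct toy_device *d, struct toy_frame *f)
{
  d->selected_frame = f;
}
```

Switching from one frame to another and back therefore restores whichever window was last selected in the first frame, without any extra bookkeeping.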
1140 File: internals.info, Node: Point, Next: Window Hierarchy, Prev: Introduction to Consoles; Devices; Frames; Windows, Up: Consoles; Devices; Frames; Windows
1145 Recall that every buffer has a current insertion position, called
1146 "point". Now, two or more windows may be displaying the same buffer,
1147 and the text cursor in the two windows (i.e. `point') can be in two
1148 different places. You may ask, how can that be, since each buffer has
1149 only one value of `point'? The answer is that each window also has a
1150 value of `point' that is squirreled away in it. There is only one
1151 selected window, and the value of "point" in that buffer corresponds to
1152 that window. When the selected window is changed from one window to
1153 another displaying the same buffer, the old value of `point' is stored
1154 into the old window's "point" and the value of `point' from the new
1155 window is retrieved and made the value of `point' in the buffer. This
1156 means that `window-point' for the selected window is potentially
1157 inaccurate, and if you want to retrieve the correct value of `point'
1158 for a window, you must special-case on the selected window and retrieve
1159 the buffer's point instead. This is related to why
1160 `save-window-excursion' does not save the selected window's value of