X-Git-Url: http://git.chise.org/gitweb/?p=chise%2Fxemacs-chise.git.1;a=blobdiff_plain;f=info%2Finternals.info-2;h=41bd915ef1c7078211f5cbade18518aaf5332763;hp=805e7efc2d1bcf7ea943f82926b47218f98da148;hb=79d2db7d65205bc85d471590726d0cf3af5598e0;hpb=de1ec4b272dfa3f9ef2c9ae28a9ba67170d24da5 diff --git a/info/internals.info-2 b/info/internals.info-2 index 805e7ef..41bd915 100644 --- a/info/internals.info-2 +++ b/info/internals.info-2 @@ -1,4 +1,4 @@ -This is ../info/internals.info, produced by makeinfo version 4.0 from +This is ../info/internals.info, produced by makeinfo version 4.6 from internals/internals.texi. INFO-DIR-SECTION XEmacs Editor @@ -7,8 +7,9 @@ START-INFO-DIR-ENTRY END-INFO-DIR-ENTRY Copyright (C) 1992 - 1996 Ben Wing. Copyright (C) 1996, 1997 Sun -Microsystems. Copyright (C) 1994 - 1998 Free Software Foundation. -Copyright (C) 1994, 1995 Board of Trustees, University of Illinois. +Microsystems. Copyright (C) 1994 - 1998, 2002, 2003 Free Software +Foundation. Copyright (C) 1994, 1995 Board of Trustees, University of +Illinois. Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are @@ -38,1052 +39,2819 @@ may be included in a translation approved by the Free Software Foundation instead of in the original English.  -File: internals.info, Node: The XEmacs Object System (Abstractly Speaking), Next: How Lisp Objects Are Represented in C, Prev: XEmacs From the Inside, Up: Top - -The XEmacs Object System (Abstractly Speaking) -********************************************** - - At the heart of the Lisp interpreter is its management of objects. -XEmacs Lisp contains many built-in objects, some of which are simple -and others of which can be very complex; and some of which are very -common, and others of which are rarely used or are only used -internally. 
(Since the Lisp allocation system, with its automatic -reclamation of unused storage, is so much more convenient than -`malloc()' and `free()', the C code makes extensive use of it in its -internal operations.) - - The basic Lisp objects are - -`integer' - 28 or 31 bits of precision, or 60 or 63 bits on 64-bit machines; - the reason for this is described below when the internal Lisp - object representation is described. - -`float' - Same precision as a double in C. - -`cons' - A simple container for two Lisp objects, used to implement lists - and most other data structures in Lisp. - -`char' - An object representing a single character of text; chars behave - like integers in many ways but are logically considered text - rather than numbers and have a different read syntax. (the read - syntax for a char contains the char itself or some textual - encoding of it--for example, a Japanese Kanji character might be - encoded as `^[$(B#&^[(B' using the ISO-2022 encoding - standard--rather than the numerical representation of the char; - this way, if the mapping between chars and integers changes, which - is quite possible for Kanji characters and other extended - characters, the same character will still be created. Note that - some primitives confuse chars and integers. The worst culprit is - `eq', which makes a special exception and considers a char to be - `eq' to its integer equivalent, even though in no other case are - objects of two different types `eq'. The reason for this - monstrosity is compatibility with existing code; the separation of - char from integer came fairly recently.) - -`symbol' - An object that contains Lisp objects and is referred to by name; - symbols are used to implement variables and named functions and to - provide the equivalent of preprocessor constants in C. 
- -`vector' - A one-dimensional array of Lisp objects providing constant-time - access to any of the objects; access to an arbitrary object in a - vector is faster than for lists, but the operations that can be - done on a vector are more limited. - -`string' - Self-explanatory; behaves much like a vector of chars but has a - different read syntax and is stored and manipulated more compactly. - -`bit-vector' - A vector of bits; similar to a string in spirit. - -`compiled-function' - An object containing compiled Lisp code, known as "byte code". - -`subr' - A Lisp primitive, i.e. a Lisp-callable function implemented in C. - - Note that there is no basic "function" type, as in more powerful -versions of Lisp (where it's called a "closure"). XEmacs Lisp does not -provide the closure semantics implemented by Common Lisp and Scheme. -The guts of a function in XEmacs Lisp are represented in one of four -ways: a symbol specifying another function (when one function is an -alias for another), a list (whose first element must be the symbol -`lambda') containing the function's source code, a compiled-function -object, or a subr object. (In other words, given a symbol specifying -the name of a function, calling `symbol-function' to retrieve the -contents of the symbol's function cell will return one of these types -of objects.) - - XEmacs Lisp also contains numerous specialized objects used to -implement the editor: +File: internals.info, Node: Introduction to Symbols, Next: Obarrays, Up: Symbols and Variables -`buffer' - Stores text like a string, but is optimized for insertion and - deletion and has certain other properties that can be set. +Introduction to Symbols +======================= -`frame' - An object with various properties whose displayable representation - is a "window" in window-system parlance. - -`window' - A section of a frame that displays the contents of a buffer; often - called a "pane" in window-system parlance. 
- -`window-configuration' - An object that represents a saved configuration of windows in a - frame. - -`device' - An object representing a screen on which frames can be displayed; - equivalent to a "display" in the X Window System and a "TTY" in - character mode. - -`face' - An object specifying the appearance of text or graphics; it has - properties such as font, foreground color, and background color. - -`marker' - An object that refers to a particular position in a buffer and - moves around as text is inserted and deleted to stay in the same - relative position to the text around it. - -`extent' - Similar to a marker but covers a range of text in a buffer; can - also specify properties of the text, such as a face in which the - text is to be displayed, whether the text is invisible or - unmodifiable, etc. - -`event' - Generated by calling `next-event' and contains information - describing a particular event happening in the system, such as the - user pressing a key or a process terminating. - -`keymap' - An object that maps from events (described using lists, vectors, - and symbols rather than with an event object because the mapping - is for classes of events, rather than individual events) to - functions to execute or other events to recursively look up; the - functions are described by name, using a symbol, or using lists to - specify the function's code. - -`glyph' - An object that describes the appearance of an image (e.g. pixmap) - on the screen; glyphs can be attached to the beginning or end of - extents and in some future version of XEmacs will be able to be - inserted directly into a buffer. - -`process' - An object that describes a connection to an externally-running - process. - - There are some other, less-commonly-encountered general objects: - -`hash-table' - An object that maps from an arbitrary Lisp object to another - arbitrary Lisp object, using hashing for fast lookup. 
- -`obarray' - A limited form of hash-table that maps from strings to symbols; - obarrays are used to look up a symbol given its name and are not - actually their own object type but are kludgily represented using - vectors with hidden fields (this representation derives from GNU - Emacs). - -`specifier' - A complex object used to specify the value of a display property; a - default value is given and different values can be specified for - particular frames, buffers, windows, devices, or classes of device. - -`char-table' - An object that maps from chars or classes of chars to arbitrary - Lisp objects; internally char tables use a complex nested-vector - representation that is optimized to the way characters are - represented as integers. - -`range-table' - An object that maps from ranges of integers to arbitrary Lisp - objects. - - And some strange special-purpose objects: - -`charset' -`coding-system' - Objects used when MULE, or multi-lingual/Asian-language, support is - enabled. - -`color-instance' -`font-instance' -`image-instance' - An object that encapsulates a window-system resource; instances are - mostly used internally but are exposed on the Lisp level for - cleanness of the specifier model and because it's occasionally - useful for Lisp program to create or query the properties of - instances. - -`subwindow' - An object that encapsulate a "subwindow" resource, i.e. a - window-system child window that is drawn into by an external - process; this object should be integrated into the glyph system - but isn't yet, and may change form when this is done. - -`tooltalk-message' -`tooltalk-pattern' - Objects that represent resources used in the ToolTalk interprocess - communication protocol. - -`toolbar-button' - An object used in conjunction with the toolbar. 
- - And objects that are only used internally: - -`opaque' - A generic object for encapsulating arbitrary memory; this allows - you the generality of `malloc()' and the convenience of the Lisp - object system. - -`lstream' - A buffering I/O stream, used to provide a unified interface to - anything that can accept output or provide input, such as a file - descriptor, a stdio stream, a chunk of memory, a Lisp buffer, a - Lisp string, etc.; it's a Lisp object to make its memory - management more convenient. - -`char-table-entry' - Subsidiary objects in the internal char-table representation. - -`extent-auxiliary' -`menubar-data' -`toolbar-data' - Various special-purpose objects that are basically just used to - encapsulate memory for particular subsystems, similar to the more - general "opaque" object. - -`symbol-value-forward' -`symbol-value-buffer-local' -`symbol-value-varalias' -`symbol-value-lisp-magic' - Special internal-only objects that are placed in the value cell of - a symbol to indicate that there is something special with this - variable - e.g. it has no value, it mirrors another variable, or - it mirrors some C variable; there is really only one kind of - object, called a "symbol-value-magic", but it is sort-of halfway - kludged into semi-different object types. +A symbol is basically just an object with four fields: a name (a +string), a value (some Lisp object), a function (some Lisp object), and +a property list (usually a list of alternating keyword/value pairs). +What makes symbols special is that there is usually only one symbol with +a given name, and the symbol is referred to by name. This makes a +symbol a convenient way of calling up data by name, i.e. of implementing +variables. (The variable's value is stored in the "value slot".) +Similarly, functions are referenced by name, and the definition of the +function is stored in a symbol's "function slot". This means that +there can be a distinct function and variable with the same name. 
The +property list is used as a more general mechanism of associating +additional values with particular names, and once again the namespace is +independent of the function and variable namespaces. - Some types of objects are "permanent", meaning that once created, -they do not disappear until explicitly destroyed, using a function such -as `delete-buffer', `delete-window', `delete-frame', etc. Others will -disappear once they are not longer used, through the garbage collection -mechanism. Buffers, frames, windows, devices, and processes are among -the objects that are permanent. Note that some objects can go both -ways: Faces can be created either way; extents are normally permanent, -but detached extents (extents not referring to any text, as happens to -some extents when the text they are referring to is deleted) are -temporary. Note that some permanent objects, such as faces and coding -systems, cannot be deleted. Note also that windows are unique in that -they can be _undeleted_ after having previously been deleted. (This -happens as a result of restoring a window configuration.) - - Note that many types of objects have a "read syntax", i.e. a way of -specifying an object of that type in Lisp code. When you load a Lisp -file, or type in code to be evaluated, what really happens is that the -function `read' is called, which reads some text and creates an object -based on the syntax of that text; then `eval' is called, which possibly -does something special; then this loop repeats until there's no more -text to read. (`eval' only actually does something special with -symbols, which causes the symbol's value to be returned, similar to -referencing a variable; and with conses [i.e. lists], which cause a -function invocation. All other values are returned unchanged.) 
+ +File: internals.info, Node: Obarrays, Next: Symbol Values, Prev: Introduction to Symbols, Up: Symbols and Variables + +Obarrays +======== + +The identity of symbols with their names is accomplished through a +structure called an obarray, which is just a poorly-implemented hash +table mapping from strings to symbols whose name is that string. (I say +"poorly implemented" because an obarray appears in Lisp as a vector +with some hidden fields rather than as its own opaque type. This is an +Emacs Lisp artifact that should be fixed.) + + Obarrays are implemented as a vector of some fixed size (which should +be a prime for best results), where each "bucket" of the vector +contains one or more symbols, threaded through a hidden `next' field in +the symbol. Lookup of a symbol in an obarray, and adding a symbol to +an obarray, is accomplished through standard hash-table techniques. + + The standard Lisp function for working with symbols and obarrays is +`intern'. This looks up a symbol in an obarray given its name; if it's +not found, a new symbol is automatically created with the specified +name, added to the obarray, and returned. This is what happens when the +Lisp reader encounters a symbol (or more precisely, encounters the name +of a symbol) in some text that it is reading. There is a standard +obarray called `obarray' that is used for this purpose, although the +Lisp programmer is free to create his own obarrays and `intern' symbols +in them. + + Note that, once a symbol is in an obarray, it stays there until +something is done about it, and the standard obarray `obarray' always +stays around, so once you use any particular variable name, a +corresponding symbol will stay around in `obarray' until you exit +XEmacs. + + Note that `obarray' itself is a variable, and as such there is a +symbol in `obarray' whose name is `"obarray"' and which contains +`obarray' as its value. 
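The bucket-and-chain layout described above can be illustrated with a small C sketch. This is a toy model under stated assumptions, not the XEmacs implementation: the names `toy_symbol`, `toy_obarray`, `toy_hash`, `toy_intern_soft`, and `toy_intern` are hypothetical, the hash function is arbitrary, and only the name and the hidden `next` link of a symbol are modelled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy symbol: only the name and the hidden chain link are modelled. */
struct toy_symbol {
    char *name;
    struct toy_symbol *next;   /* next symbol in the same bucket */
};

#define TOY_OBARRAY_SIZE 31    /* a prime, as the text recommends */

/* Toy obarray: a fixed-size vector of bucket heads. */
struct toy_obarray {
    struct toy_symbol *buckets[TOY_OBARRAY_SIZE];
};

/* Arbitrary string hash, chosen only for brevity. */
unsigned toy_hash(const char *s)
{
    unsigned h = 0;
    while (*s)
        h = h * 31u + (unsigned char)*s++;
    return h % TOY_OBARRAY_SIZE;
}

/* In the spirit of intern-soft: look up only, NULL if absent. */
struct toy_symbol *toy_intern_soft(struct toy_obarray *ob, const char *name)
{
    struct toy_symbol *sym = ob->buckets[toy_hash(name)];
    for (; sym != NULL; sym = sym->next)
        if (strcmp(sym->name, name) == 0)
            return sym;
    return NULL;
}

/* In the spirit of intern: look up, creating the symbol if absent. */
struct toy_symbol *toy_intern(struct toy_obarray *ob, const char *name)
{
    struct toy_symbol *sym = toy_intern_soft(ob, name);
    if (sym != NULL)
        return sym;

    unsigned h = toy_hash(name);
    sym = malloc(sizeof *sym);
    assert(sym != NULL);
    sym->name = malloc(strlen(name) + 1);
    assert(sym->name != NULL);
    strcpy(sym->name, name);

    /* Thread the new symbol onto the front of its bucket's chain. */
    sym->next = ob->buckets[h];
    ob->buckets[h] = sym;
    return sym;
}
```

In this sketch, interning the same name twice in one obarray returns the same pointer, which is the "identity of symbols with their names" in miniature. Interning that name in a second obarray yields a distinct symbol with an identical name.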
+ + Note also that this call to `intern' occurs only when in the Lisp +reader, not when the code is executed (at which point the symbol is +already around, stored as such in the definition of the function). + + You can create your own obarray using `make-vector' (this is +horrible but is an artifact) and intern symbols into that obarray. +Doing that will result in two or more symbols with the same name. +However, at most one of these symbols is in the standard `obarray': You +cannot have two symbols of the same name in any particular obarray. +Note that you cannot add a symbol to an obarray in any fashion other +than using `intern': i.e. you can't take an existing symbol and put it +in an existing obarray. Nor can you change the name of an existing +symbol. (Since obarrays are vectors, you can violate the consistency of +things by storing directly into the vector, but let's ignore that +possibility.) + + Usually symbols are created by `intern', but if you really want, you +can explicitly create a symbol using `make-symbol', giving it some +name. The resulting symbol is not in any obarray (i.e. it is +"uninterned"), and you can't add it to any obarray. Therefore its +primary purpose is as a symbol to use in macros to avoid namespace +pollution. It can also be used as a carrier of information, but cons +cells could probably be used just as well. + + You can also use `intern-soft' to look up a symbol but not create a +new one, and `unintern' to remove a symbol from an obarray. This +returns the removed symbol. (Remember: You can't put the symbol back +into any obarray.) Finally, `mapatoms' maps over all of the symbols in +an obarray. - The read syntax + +File: internals.info, Node: Symbol Values, Prev: Obarrays, Up: Symbols and Variables + +Symbol Values +============= + +The value field of a symbol normally contains a Lisp object. However, +a symbol can be "unbound", meaning that it logically has no value. 
+This is internally indicated by storing a special Lisp object, called +"the unbound marker" and stored in the global variable `Qunbound'. The +unbound marker is of a special Lisp object type called +"symbol-value-magic". It is impossible for the Lisp programmer to +directly create or access any object of this type. + + *You must not let any "symbol-value-magic" object escape to the Lisp +level.* Printing any of these objects will cause the message `INTERNAL +EMACS BUG' to appear as part of the print representation. (You may see +this normally when you call `debug_print()' from the debugger on a Lisp +object.) If you let one of these objects escape to the Lisp level, you +will violate a number of assumptions contained in the C code and make +the unbound marker not function right. + + When a symbol is created, its value field (and function field) are +set to `Qunbound'. The Lisp programmer can restore these conditions +later using `makunbound' or `fmakunbound', and can query to see whether +the value or function fields are "bound" (i.e. have a value other than +`Qunbound') using `boundp' and `fboundp'. The fields are set to a +normal Lisp object using `set' (or `setq') and `fset'. + + Other symbol-value-magic objects are used as special markers to +indicate variables that have non-normal properties. This includes any +variables that are tied into C variables (setting the variable magically +sets some global variable in the C code, and likewise for retrieving the +variable's value), variables that magically tie into slots in the +current buffer, variables that are buffer-local, etc. The +symbol-value-magic object is stored in the value cell in place of a +normal object, and the code to retrieve a symbol's value (i.e. +`symbol-value') knows how to do special things with them. This means +that you should not just fetch the value cell directly if you want a +symbol's value.
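The role of the unbound marker can be sketched in C as follows. This is a simplified illustration, not the code in `symbols.c`: the type `toy_object`, the marker `toy_unbound_marker`, and the `toy_*` functions are all hypothetical names, and the other kinds of symbol-value-magic object (forwarding to C variables, buffer-local values) are not modelled at all.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a Lisp object. */
typedef struct toy_object { int payload; } toy_object;

/* One distinguished object plays the part of the unbound marker
   (compare Qunbound in the text). */
toy_object toy_unbound_marker;
#define TOY_QUNBOUND (&toy_unbound_marker)

/* Only the value and function cells of a symbol are modelled. */
struct toy_symbol_cells {
    toy_object *value;
    toy_object *function;
};

/* A freshly created symbol has both cells set to the marker. */
void toy_init_symbol(struct toy_symbol_cells *s)
{
    s->value = TOY_QUNBOUND;
    s->function = TOY_QUNBOUND;
}

/* In the spirit of boundp: is the value cell something other
   than the marker? */
int toy_boundp(const struct toy_symbol_cells *s)
{
    return s->value != TOY_QUNBOUND;
}

/* In the spirit of set: store a normal object.  The marker itself
   must never be handed in from outside. */
void toy_set(struct toy_symbol_cells *s, toy_object *v)
{
    assert(v != TOY_QUNBOUND);
    s->value = v;
}

/* In the spirit of makunbound: restore the initial condition. */
void toy_makunbound(struct toy_symbol_cells *s)
{
    s->value = TOY_QUNBOUND;
}

/* In the spirit of symbol-value: never let the marker escape.
   This sketch returns NULL where real code would signal an error. */
toy_object *toy_symbol_value(const struct toy_symbol_cells *s)
{
    return toy_boundp(s) ? s->value : NULL;
}
```

The point of routing every read through `toy_symbol_value` rather than reading the cell directly is the same as in the text: the accessor is the one place that knows the cell may hold something that is not a normal object.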
+ + The exact workings of this are rather complex and involved and are +well-documented in comments in `buffer.c', `symbols.c', and `lisp.h'. - 17297 + +File: internals.info, Node: Buffers and Textual Representation, Next: MULE Character Sets and Encodings, Prev: Symbols and Variables, Up: Top - converts to an integer whose value is 17297. +Buffers and Textual Representation +********************************** - 1.983e-4 +* Menu: - converts to a float whose value is 1.983e-4, or .0001983. +* Introduction to Buffers:: A buffer holds a block of text such as a file. +* The Text in a Buffer:: Representation of the text in a buffer. +* Buffer Lists:: Keeping track of all buffers. +* Markers and Extents:: Tagging locations within a buffer. +* Bufbytes and Emchars:: Representation of individual characters. +* The Buffer Object:: The Lisp object corresponding to a buffer. - ?b + +File: internals.info, Node: Introduction to Buffers, Next: The Text in a Buffer, Up: Buffers and Textual Representation - converts to a char that represents the lowercase letter b. +Introduction to Buffers +======================= - ?^[$(B#&^[(B +A buffer is logically just a Lisp object that holds some text. In +this, it is like a string, but a buffer is optimized for frequent +insertion and deletion, while a string is not. Furthermore: + + 1. Buffers are "permanent" objects, i.e. once you create them, they + remain around, and need to be explicitly deleted before they go + away. + + 2. Each buffer has a unique name, which is a string. Buffers are + normally referred to by name. In this respect, they are like + symbols. + + 3. Buffers have a default insertion position, called "point". + Inserting text (unless you explicitly give a position) goes at + point, and moves point forward past the text. This is what is + going on when you type text into Emacs. + + 4. Buffers have lots of extra properties associated with them. + + 5. Buffers can be "displayed". 
What this means is that there exist a + number of "windows", which are objects that correspond to some + visible section of your display, and each window has an associated + buffer, and the current contents of the buffer are shown in that + section of the display. The redisplay mechanism (which takes care + of doing this) knows how to look at the text of a buffer and come + up with some reasonable way of displaying this. Many of the + properties of a buffer control how the buffer's text is displayed. + + 6. One buffer is distinguished and called the "current buffer". It is + stored in the variable `current_buffer'. Buffer operations operate + on this buffer by default. When you are typing text into a + buffer, the buffer you are typing into is always `current_buffer'. + Switching to a different window changes the current buffer. Note + that Lisp code can temporarily change the current buffer using + `set-buffer' (often enclosed in a `save-excursion' so that the + former current buffer gets restored when the code is finished). + However, calling `set-buffer' will NOT cause a permanent change in + the current buffer. The reason for this is that the top-level + event loop sets `current_buffer' to the buffer of the selected + window, each time it finishes executing a user command. + + Make sure you understand the distinction between "current buffer" +and "buffer of the selected window", and the distinction between +"point" of the current buffer and "window-point" of the selected +window. (This latter distinction is explained in detail in the section +on windows.) - (where `^[' actually is an `ESC' character) converts to a particular -Kanji character when using an ISO2022-based coding system for input. 
-(To decode this goo: `ESC' begins an escape sequence; `ESC $ (' is a -class of escape sequences meaning "switch to a 94x94 character set"; -`ESC $ ( B' means "switch to Japanese Kanji"; `#' and `&' collectively -index into a 94-by-94 array of characters [subtract 33 from the ASCII -value of each character to get the corresponding index]; `ESC (' is a -class of escape sequences meaning "switch to a 94 character set"; `ESC -(B' means "switch to US ASCII". It is a coincidence that the letter -`B' is used to denote both Japanese Kanji and US ASCII. If the first -`B' were replaced with an `A', you'd be requesting a Chinese Hanzi -character from the GB2312 character set.) + +File: internals.info, Node: The Text in a Buffer, Next: Buffer Lists, Prev: Introduction to Buffers, Up: Buffers and Textual Representation + +The Text in a Buffer +==================== + +The text in a buffer consists of a sequence of zero or more characters. +A "character" is an integer that logically represents a letter, +number, space, or other unit of text. Most of the characters that you +will typically encounter belong to the ASCII set of characters, but +there are also characters for various sorts of accented letters, +special symbols, Chinese and Japanese ideograms (i.e. Kanji, Katakana, +etc.), Cyrillic and Greek letters, etc. The actual number of possible +characters is quite large. + + For now, we can view a character as some non-negative integer that +has some shape that defines how it typically appears (e.g. as an +uppercase A). (The exact way in which a character appears depends on the +font used to display the character.) The internal type of characters in +the C code is an `Emchar'; this is just an `int', but using a symbolic +type makes the code clearer. + + Between every character in a buffer is a "buffer position" or +"character position". 
We can speak of the character before or after a +particular buffer position, and when you insert a character at a +particular position, all characters after that position end up at new +positions. When we speak of the character "at" a position, we really +mean the character after the position. (This schizophrenia between a +buffer position being "between" a character and "on" a character is +rampant in Emacs.) + + Buffer positions are numbered starting at 1. This means that +position 1 is before the first character, and position 0 is not valid. +If there are N characters in a buffer, then buffer position N+1 is +after the last one, and position N+2 is not valid. + + The internal makeup of the Emchar integer varies depending on whether +we have compiled with MULE support. If not, the Emchar integer is an +8-bit integer with possible values from 0 - 255. 0 - 127 are the +standard ASCII characters, while 128 - 255 are the characters from the +ISO-8859-1 character set. If we have compiled with MULE support, an +Emchar is a 19-bit integer, with the various bits having meanings +according to a complex scheme that will be detailed later. The +characters numbered 0 - 255 still have the same meanings as for the +non-MULE case, though. + + Internally, the text in a buffer is represented in a fairly simple +fashion: as a contiguous array of bytes, with a "gap" of some size in +the middle. Although the gap is of some substantial size in bytes, +there is no text contained within it: From the perspective of the text +in the buffer, it does not exist. The gap logically sits at some buffer +position, between two characters (or possibly at the beginning or end of +the buffer). Insertion of text in a buffer at a particular position is +always accomplished by first moving the gap to that position (i.e. +through some block moving of text), then writing the text into the +beginning of the gap, thereby shrinking the gap. If the gap shrinks +down to nothing, a new gap is created. 
(What actually happens is that a +new gap is "created" at the end of the buffer's text, which requires +nothing more than changing a couple of indices; then the gap is "moved" +to the position where the insertion needs to take place by moving up in +memory all the text after that position.) Similarly, deletion occurs +by moving the gap to the place where the text is to be deleted, and +then simply expanding the gap to include the deleted text. +("Expanding" and "shrinking" the gap as just described means just that +the internal indices that keep track of where the gap is located are +changed.) + + Note that the total amount of memory allocated for a buffer text +never decreases while the buffer is live. Therefore, if you load up a +20-megabyte file and then delete all but one character, there will be a +20-megabyte gap, which won't get any smaller (except by inserting +characters back again). Once the buffer is killed, the memory allocated +for the buffer text will be freed, but it will still be sitting on the +heap, taking up virtual memory, and will not be released back to the +operating system. (However, if you have compiled XEmacs with rel-alloc, +the situation is different. In this case, the space _will_ be released +back to the operating system. However, this tends to result in a +noticeable speed penalty.) + + Astute readers may notice that the text in a buffer is represented as +an array of _bytes_, while (at least in the MULE case) an Emchar is a +19-bit integer, which clearly cannot fit in a byte. This means (of +course) that the text in a buffer uses a different representation from +an Emchar: specifically, the 19-bit Emchar becomes a series of one to +four bytes. The conversion between these two representations is complex +and will be described later. + + In the non-MULE case, everything is very simple: An Emchar is an +8-bit value, which fits neatly into one byte. 
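The gap mechanism described above can be sketched in C for the simple one-byte-per-character case. This is a minimal illustration, not the code in `insdel.c`: the structure `toy_gapbuf` and the `toy_gb_*` functions are hypothetical names, positions are 1-based as in the text, and the amount of slack added when the gap is regrown (16 bytes) is an arbitrary choice.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy gap buffer, one byte per character (the non-MULE case). */
struct toy_gapbuf {
    unsigned char *mem;  /* text before gap, the gap, text after gap */
    size_t gap_start;    /* 0-based offset of the first gap byte */
    size_t gap_size;     /* number of unused bytes in the gap */
    size_t text_len;     /* number of text bytes, excluding the gap */
};

void toy_gb_init(struct toy_gapbuf *b, size_t gap)
{
    assert(gap > 0);
    b->mem = malloc(gap);
    assert(b->mem != NULL);
    b->gap_start = 0;
    b->gap_size = gap;
    b->text_len = 0;
}

/* Move the gap so that it sits at 1-based buffer position pos. */
void toy_gb_move_gap(struct toy_gapbuf *b, size_t pos)
{
    assert(pos >= 1 && pos - 1 <= b->text_len);
    size_t target = pos - 1;
    if (target < b->gap_start) {
        /* Text between target and the gap moves to the far side. */
        memmove(b->mem + target + b->gap_size, b->mem + target,
                b->gap_start - target);
    } else if (target > b->gap_start) {
        /* Text just after the gap moves down to the near side. */
        memmove(b->mem + b->gap_start,
                b->mem + b->gap_start + b->gap_size,
                target - b->gap_start);
    }
    b->gap_start = target;
}

/* Insert n bytes at position pos by writing into the start of the gap. */
void toy_gb_insert(struct toy_gapbuf *b, size_t pos, const char *s, size_t n)
{
    assert(pos >= 1 && pos - 1 <= b->text_len);
    if (n > b->gap_size) {
        /* Gap too small: open a fresh gap at the end of the text. */
        size_t new_gap = n + 16;
        toy_gb_move_gap(b, b->text_len + 1);
        b->mem = realloc(b->mem, b->text_len + new_gap);
        assert(b->mem != NULL);
        b->gap_size = new_gap;
    }
    toy_gb_move_gap(b, pos);
    memcpy(b->mem + b->gap_start, s, n);
    b->gap_start += n;   /* the gap shrinks from the front */
    b->gap_size -= n;
    b->text_len += n;
}

/* Delete n characters starting at pos by widening the gap over them. */
void toy_gb_delete(struct toy_gapbuf *b, size_t pos, size_t n)
{
    assert(pos >= 1 && pos - 1 + n <= b->text_len);
    toy_gb_move_gap(b, pos);
    b->gap_size += n;    /* only the bookkeeping changes */
    b->text_len -= n;
}

/* The character "at" (that is, after) position pos, skipping the gap. */
unsigned char toy_gb_char_at(const struct toy_gapbuf *b, size_t pos)
{
    assert(pos >= 1 && pos - 1 < b->text_len);
    size_t off = pos - 1;
    if (off >= b->gap_start)
        off += b->gap_size;
    return b->mem[off];
}
```

Note that `toy_gb_delete` never copies or frees anything. The deleted bytes simply become part of the gap, which is why, as the text says, the memory allocated for a buffer's text does not shrink while the buffer is live.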
+ + If we are given a buffer position and want to retrieve the character +at that position, we need to follow these steps: + + 1. Pretend there's no gap, and convert the buffer position into a + "byte index" that indexes to the appropriate byte in the buffer's + stream of textual bytes. By convention, byte indices begin at 1, + just like buffer positions. In the non-MULE case, byte indices + and buffer positions are identical, since one character equals one + byte. + + 2. Convert the byte index into a "memory index", which takes the gap + into account. The memory index is a direct index into the block of + memory that stores the text of a buffer. This basically just + involves checking to see if the byte index is past the gap, and if + so, adding the size of the gap to it. By convention, memory + indices begin at 1, just like buffer positions and byte indices, + and when referring to the position that is "at" the gap, we always + use the memory position at the _beginning_, not at the end, of the + gap. + + 3. Fetch the appropriate bytes at the determined memory position. + + 4. Convert these bytes into an Emchar. + + In the non-Mule case, (3) and (4) boil down to a simple one-byte +memory access. + + Note that we have defined three types of positions in a buffer: + + 1. "buffer positions" or "character positions", typedef `Bufpos' + + 2. "byte indices", typedef `Bytind' + + 3. "memory indices", typedef `Memind' + + All three typedefs are just `int's, but defining them this way makes +things a lot clearer. + + Most code works with buffer positions. In particular, all Lisp code +that refers to text in a buffer uses buffer positions. Lisp code does +not know that byte indices or memory indices exist. + + Finally, we have a typedef for the bytes in a buffer. This is a +`Bufbyte', which is an unsigned char. 
Referring to them as Bufbytes +underscores the fact that we are working with a string of bytes in the +internal Emacs buffer representation rather than in one of a number of +possible alternative representations (e.g. EUC-encoded text, etc.). - "foobar" + +File: internals.info, Node: Buffer Lists, Next: Markers and Extents, Prev: The Text in a Buffer, Up: Buffers and Textual Representation + +Buffer Lists +============ + +Recall earlier that buffers are "permanent" objects, i.e. that they +remain around until explicitly deleted. This entails that there is a +list of all the buffers in existence. This list is actually an +assoc-list (mapping from the buffer's name to the buffer) and is stored +in the global variable `Vbuffer_alist'. + + The order of the buffers in the list is important: the buffers are +ordered approximately from most-recently-used to least-recently-used. +Switching to a buffer using `switch-to-buffer', `pop-to-buffer', etc. +and switching windows using `other-window', etc. usually brings the +new current buffer to the front of the list. `switch-to-buffer', +`other-buffer', etc. look at the beginning of the list to find an +alternative buffer to suggest. You can also explicitly move a buffer +to the end of the list using `bury-buffer'. + + In addition to the global ordering in `Vbuffer_alist', each frame +has its own ordering of the list. These lists always contain the same +elements as in `Vbuffer_alist' although possibly in a different order. +`buffer-list' normally returns the list for the selected frame. This +allows you to work in separate frames without things interfering with +each other. + + The standard way to look up a buffer given a name is `get-buffer', +and the standard way to create a new buffer is `get-buffer-create', +which looks up a buffer with a given name, creating a new one if +necessary. These operations correspond exactly with the symbol +operations `intern-soft' and `intern', respectively. 
You can also +force a new buffer to be created using `generate-new-buffer', which +takes a name and (if necessary) makes a unique name from this by +appending a number, and then creates the buffer. This is basically +like the symbol operation `gensym'. + + +File: internals.info, Node: Markers and Extents, Next: Bufbytes and Emchars, Prev: Buffer Lists, Up: Buffers and Textual Representation + +Markers and Extents +=================== + +Among the things associated with a buffer are things that are logically +attached to certain buffer positions. This can be used to keep track +of a buffer position when text is inserted and deleted, so that it +remains at the same spot relative to the text around it; to assign +properties to particular sections of text; etc. There are two such +objects that are useful in this regard: they are "markers" and +"extents". + + A "marker" is simply a flag placed at a particular buffer position, +which is moved around as text is inserted and deleted. Markers are +used for all sorts of purposes, such as the `mark' that is the other +end of textual regions to be cut, copied, etc. + + An "extent" is similar to two markers plus some associated +properties, and is used to keep track of regions in a buffer as text is +inserted and deleted, and to add properties (e.g. fonts) to particular +regions of text. The external interface of extents is explained +elsewhere. + + The important thing here is that markers and extents simply contain +buffer positions in them as integers, and every time text is inserted or +deleted, these positions must be updated. In order to minimize the +amount of shuffling that needs to be done, the positions in markers and +extents (there's one per marker, two per extent) are stored in Meminds. +This means that they only need to be moved when the text is physically +moved in memory; since the gap structure tries to minimize this, it also +minimizes the number of marker and extent indices that need to be +adjusted. 
Look in `insdel.c' for the details of how this works. + + One other important distinction is that markers are "temporary" +while extents are "permanent". This means that markers disappear as +soon as there are no more pointers to them, and correspondingly, there +is no way to determine what markers are in a buffer if you are just +given the buffer. Extents remain in a buffer until they are detached +(which could happen as a result of text being deleted) or the buffer is +deleted, and primitives do exist to enumerate the extents in a buffer. - converts to a string. + +File: internals.info, Node: Bufbytes and Emchars, Next: The Buffer Object, Prev: Markers and Extents, Up: Buffers and Textual Representation - foobar +Bufbytes and Emchars +==================== - converts to a symbol whose name is `"foobar"'. This is done by -looking up the string equivalent in the global variable `obarray', -whose contents should be an obarray. If no symbol is found, a new -symbol with the name `"foobar"' is automatically created and added to -`obarray'; this process is called "interning" the symbol. +Not yet documented. - (foo . bar) + +File: internals.info, Node: The Buffer Object, Prev: Bufbytes and Emchars, Up: Buffers and Textual Representation + +The Buffer Object +================= + +Buffers contain fields not directly accessible by the Lisp programmer. +We describe them here, naming them by the names used in the C code. +Many are accessible indirectly in Lisp programs via Lisp primitives. + +`name' + The buffer name is a string that names the buffer. It is + guaranteed to be unique. *Note Buffer Names: (lispref)Buffer + Names. + +`save_modified' + This field contains the time when the buffer was last saved, as an + integer. *Note Buffer Modification: (lispref)Buffer Modification. + +`modtime' + This field contains the modification time of the visited file. It + is set when the file is written or read. 
Every time the buffer is + written to the file, this field is compared to the modification + time of the file. *Note Buffer Modification: (lispref)Buffer + Modification. + +`auto_save_modified' + This field contains the time when the buffer was last auto-saved. + +`last_window_start' + This field contains the `window-start' position in the buffer as of + the last time the buffer was displayed in a window. + +`undo_list' + This field points to the buffer's undo list. *Note Undo: + (lispref)Undo. + +`syntax_table_v' + This field contains the syntax table for the buffer. *Note Syntax + Tables: (lispref)Syntax Tables. + +`downcase_table' + This field contains the conversion table for converting text to + lower case. *Note Case Tables: (lispref)Case Tables. + +`upcase_table' + This field contains the conversion table for converting text to + upper case. *Note Case Tables: (lispref)Case Tables. + +`case_canon_table' + This field contains the conversion table for canonicalizing text + for case-folding search. *Note Case Tables: (lispref)Case Tables. + +`case_eqv_table' + This field contains the equivalence table for case-folding search. + *Note Case Tables: (lispref)Case Tables. + +`display_table' + This field contains the buffer's display table, or `nil' if it + doesn't have one. *Note Display Tables: (lispref)Display Tables. + +`markers' + This field contains the chain of all markers that currently point + into the buffer. Deletion of text in the buffer, and motion of + the buffer's gap, must check each of these markers and perhaps + update it. *Note Markers: (lispref)Markers. + +`backed_up' + This field is a flag that tells whether a backup file has been + made for the visited file of this buffer. + +`mark' + This field contains the mark for the buffer. The mark is a marker, + hence it is also included on the list `markers'. *Note The Mark: + (lispref)The Mark. + +`mark_active' + This field is non-`nil' if the buffer's mark is active. 
+ +`local_var_alist' + This field contains the association list describing the variables + local in this buffer, and their values, with the exception of + local variables that have special slots in the buffer object. + (Those slots are omitted from this table.) *Note Buffer-Local + Variables: (lispref)Buffer-Local Variables. + +`modeline_format' + This field contains a Lisp object which controls how to display + the mode line for this buffer. *Note Modeline Format: + (lispref)Modeline Format. + +`base_buffer' + This field holds the buffer's base buffer (if it is an indirect + buffer), or `nil'. - converts to a cons cell containing the symbols `foo' and `bar'. + +File: internals.info, Node: MULE Character Sets and Encodings, Next: The Lisp Reader and Compiler, Prev: Buffers and Textual Representation, Up: Top + +MULE Character Sets and Encodings +********************************* + +Recall that there are two primary ways that text is represented in +XEmacs. The "buffer" representation sees the text as a series of bytes +(Bufbytes), with a variable number of bytes used per character. The +"character" representation sees the text as a series of integers +(Emchars), one per character. The character representation is a cleaner +representation from a theoretical standpoint, and is thus used in many +cases when lots of manipulations on a string need to be done. However, +the buffer representation is the standard representation used in both +Lisp strings and buffers, and because of this, it is the "default" +representation that text comes in. The reason for using this +representation is that it's compact and is compatible with ASCII. - (1 a 2.5) +* Menu: - converts to a three-element list containing the specified objects -(note that a list is actually a set of nested conses; see the XEmacs -Lisp Reference). 
+* Character Sets:: +* Encodings:: +* Internal Mule Encodings:: +* CCL:: - [1 a 2.5] + +File: internals.info, Node: Character Sets, Next: Encodings, Up: MULE Character Sets and Encodings + +Character Sets +============== + +A character set (or "charset") is an ordered set of characters. A +particular character in a charset is indexed using one or more +"position codes", which are non-negative integers. The number of +position codes needed to identify a particular character in a charset is +called the "dimension" of the charset. In XEmacs/Mule, all charsets +have dimension 1 or 2, and the size of all charsets (except for a few +special cases) is either 94, 96, 94 by 94, or 96 by 96. The range of +position codes used to index characters from any of these types of +character sets is as follows: + + Charset type Position code 1 Position code 2 + ------------------------------------------------------------ + 94 33 - 126 N/A + 96 32 - 127 N/A + 94x94 33 - 126 33 - 126 + 96x96 32 - 127 32 - 127 + + Note that in the above cases position codes do not start at an +expected value such as 0 or 1. The reason for this will become clear +later. + + For example, Latin-1 is a 96-character charset, and JISX0208 (the +Japanese national character set) is a 94x94-character charset. + + [Note that, although the ranges above define the _valid_ position +codes for a charset, some of the slots in a particular charset may in +fact be empty. This is the case for JISX0208, for example, where (e.g.) +all the slots whose first position code is in the range 118 - 127 are +empty.] + + There are three charsets that do not follow the above rules. All of +them have one dimension, and have ranges of position codes as follows: + + Charset name Position code 1 + ------------------------------------ + ASCII 0 - 127 + Control-1 0 - 31 + Composite 0 - some large number + + (The upper bound of the position code for composite characters has +not yet been determined, but it will probably be at least 16,383). 
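The position-code ranges given in the first table of this node for the four regular charset types can be checked with a few lines of C. This is only a sketch: the enum and both function names are invented for illustration and do not come from the XEmacs sources, and the three special charsets (ASCII, Control-1, Composite) are deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical names for the four regular charset types. */
enum charset_type { CS_94, CS_96, CS_94x94, CS_96x96 };

/* True if PC is a valid position code along one dimension:
   33 - 126 for a 94-wide dimension, 32 - 127 for a 96-wide one. */
static bool
pc_in_range (bool is_96, int pc)
{
  return is_96 ? (pc >= 32 && pc <= 127) : (pc >= 33 && pc <= 126);
}

/* Check the position codes of a character in a regular charset.
   PC2 is ignored for dimension-1 charsets. */
static bool
position_codes_valid (enum charset_type type, int pc1, int pc2)
{
  bool is_96 = (type == CS_96 || type == CS_96x96);
  bool dim2 = (type == CS_94x94 || type == CS_96x96);

  if (!pc_in_range (is_96, pc1))
    return false;
  return !dim2 || pc_in_range (is_96, pc2);
}
```

With these definitions, `position_codes_valid (CS_94, 32, 0)' is false, since 32 lies outside 33 - 126, while `position_codes_valid (CS_96x96, 32, 127)' is true. A slot that passes this check may still be empty in a particular charset, as noted above for JISX0208.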
+
+   ASCII is the union of two subsidiary character sets: Printing-ASCII
+(the printing ASCII character set, consisting of position codes 33 -
+126, as for a standard 94-character charset) and Control-ASCII (the
+non-printing characters that would appear in a binary file with codes 0
+- 32 and 127).
+
+   Control-1 contains the non-printing characters that would appear in a
+binary file with codes 128 - 159.
+
+   Composite contains characters that are generated by overstriking one
+or more characters from other charsets.
+
+   Note that some characters in ASCII, and all characters in Control-1,
+are "control" (non-printing) characters.  These have no printed
+representation but instead control some other function of the printing
+(e.g. TAB, code 9, moves the current character position to the next tab
+stop).  All other characters in all charsets are "graphic" (printing)
+characters.
+
+   When a binary file is read in, the bytes in the file are assigned to
+character sets as follows:
+
+     Bytes          Character set           Range
+     --------------------------------------------------
+     0 - 127        ASCII                   0 - 127
+     128 - 159      Control-1               0 - 31
+     160 - 255      Latin-1                 32 - 127
+
+   This is a bit ad-hoc but gets the job done.
-   converts to a three-element vector containing the specified objects.
+
+File: internals.info, Node: Encodings, Next: Internal Mule Encodings, Prev: Character Sets, Up: MULE Character Sets and Encodings
-     #[... ... ... ...]
+Encodings
+=========
-   converts to a compiled-function object (the actual contents are not
-shown since they are not relevant here; look at a file that ends with
-`.elc' for examples).
+An "encoding" is a way of numerically representing characters from one
+or more character sets.  If an encoding only encompasses one character
+set, then the position codes for the characters in that character set
+could be used directly.  This is not possible, however, if more than
+one character set is to be used in the encoding.
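The byte-to-charset assignment tabulated at the end of the previous node can be written as a small C function. This is only a sketch: the enum, the struct, and the function name are invented for illustration and are not taken from the XEmacs sources.

```c
/* Hypothetical identifiers for the three charsets involved. */
enum binary_charset { BIN_ASCII, BIN_CONTROL_1, BIN_LATIN_1 };

/* A charset together with a (dimension-1) position code. */
struct charset_pos
{
  enum binary_charset charset;
  int position_code;
};

/* Map one byte of a binary file to a charset and position code,
   following the Bytes / Character set / Range table. */
static struct charset_pos
classify_binary_byte (unsigned char byte)
{
  struct charset_pos result;

  if (byte <= 127)
    {
      result.charset = BIN_ASCII;
      result.position_code = byte;        /* 0 - 127 */
    }
  else if (byte <= 159)
    {
      result.charset = BIN_CONTROL_1;
      result.position_code = byte - 128;  /* 0 - 31 */
    }
  else
    {
      result.charset = BIN_LATIN_1;
      result.position_code = byte - 128;  /* 32 - 127 */
    }
  return result;
}
```

For instance, byte 233 comes out as Latin-1 with position code 105, and byte 133 as Control-1 with position code 5.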
-     #*01110110
+   For example, the conversion detailed above between bytes in a binary
+file and characters is effectively an encoding that encompasses the
+three character sets ASCII, Control-1, and Latin-1 in a stream of 8-bit
+bytes.
-   converts to a bit-vector.
+   Thus, an encoding can be viewed as a way of encoding characters from
+a specified group of character sets using a stream of bytes, each of
+which contains a fixed number of bits (but not necessarily 8, as in the
+common usage of "byte").
-     #s(hash-table ... ...)
+   Here are descriptions of a couple of common encodings:
-   converts to a hash table (the actual contents are not shown).
+* Menu:
-     #s(range-table ... ...)
+* Japanese EUC (Extended Unix Code)::
+* JIS7::
-   converts to a range table (the actual contents are not shown).
+
+File: internals.info, Node: Japanese EUC (Extended Unix Code), Next: JIS7, Up: Encodings
-     #s(char-table ... ...)
+Japanese EUC (Extended Unix Code)
+---------------------------------
-   converts to a char table (the actual contents are not shown).
+This encompasses the character sets Printing-ASCII, Japanese-JISX0208,
+and Japanese-JISX0201-Kana (half-width katakana, the right half of
+JISX0201).  It uses 8-bit bytes.
-   Note that the `#s()' syntax is the general syntax for structures,
-which are not really implemented in XEmacs Lisp but should be.
+   Note that Printing-ASCII and Japanese-JISX0201-Kana are 94-character
+charsets, while Japanese-JISX0208 is a 94x94-character charset.
-   When an object is printed out (using `print' or a related function),
-the read syntax is used, so that the same object can be read in again.
+   The encoding is as follows:
-   The other objects do not have read syntaxes, usually because it does
-not really make sense to create them in this fashion (i.e. processes,
-where it doesn't make sense to have a subprocess created as a side
-effect of reading some Lisp code), or because they can't be created at
-all (e.g. subrs).
Permanent objects, as a rule, do not have a read
-syntax; nor do most complex objects, which contain too much state to be
-easily initialized through a read syntax.
+     Character set            Representation (PC=position-code)
+     -------------            --------------
+     Printing-ASCII           PC1
+     Japanese-JISX0201-Kana   0x8E       | PC1 + 0x80
+     Japanese-JISX0208        PC1 + 0x80 | PC2 + 0x80
+     Japanese-JISX0212        0x8F       | PC1 + 0x80 | PC2 + 0x80
 
-File: internals.info, Node: How Lisp Objects Are Represented in C, Next: Rules When Writing New C Code, Prev: The XEmacs Object System (Abstractly Speaking), Up: Top
+File: internals.info, Node: JIS7, Prev: Japanese EUC (Extended Unix Code), Up: Encodings
+
+JIS7
+----
-How Lisp Objects Are Represented in C
-*************************************
+This encompasses the character sets Printing-ASCII,
+Japanese-JISX0201-Roman (the left half of JISX0201; this character set
+is very similar to Printing-ASCII and is a 94-character charset),
+Japanese-JISX0208, and Japanese-JISX0201-Kana.  It uses 7-bit bytes.
-   Lisp objects are represented in C using a 32-bit or 64-bit machine
-word (depending on the processor; i.e. DEC Alphas use 64-bit Lisp
-objects and most other processors use 32-bit Lisp objects).  The
-representation stuffs a pointer together with a tag, as follows:
+   Unlike Japanese EUC, this is a "modal" encoding, which means that
+there are multiple states that the encoding can be in, which affect how
+the bytes are to be interpreted.  Special sequences of bytes (called
+"escape sequences") are used to change states.
- [ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ] - [ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ] + The encoding is as follows: + + Character set Representation (PC=position-code) + ------------- -------------- + Printing-ASCII PC1 + Japanese-JISX0201-Roman PC1 + Japanese-JISX0201-Kana PC1 + Japanese-JISX0208 PC1 PC2 + - <---------------------------------------------------------> <-> - a pointer to a structure, or an integer tag - - A tag of 00 is used for all pointer object types, a tag of 10 is used -for characters, and the other two tags 01 and 11 are joined together to -form the integer object type. This representation gives us 31 bit -integers and 30 bit characters, while pointers are represented directly -without any bit masking or shifting. This representation, though, -assumes that pointers to structs are always aligned to multiples of 4, -so the lower 2 bits are always zero. - - Lisp objects use the typedef `Lisp_Object', but the actual C type -used for the Lisp object can vary. It can be either a simple type -(`long' on the DEC Alpha, `int' on other machines) or a structure whose -fields are bit fields that line up properly (actually, a union of -structures is used). Generally the simple integral type is preferable -because it ensures that the compiler will actually use a machine word -to represent the object (some compilers will use more general and less -efficient code for unions and structs even if they can fit in a machine -word). The union type, however, has the advantage of stricter type -checking. If you accidentally pass an integer where a Lisp object is -desired, you get a compile error. The choice of which type to use is -determined by the preprocessor constant `USE_UNION_TYPE' which is -defined via the `--use-union-type' option to `configure'. - - Various macros are used to convert between Lisp_Objects and the -corresponding C type. 
Macros of the form `XINT()', `XCHAR()', -`XSTRING()', `XSYMBOL()', do any required bit shifting and/or masking -and cast it to the appropriate type. `XINT()' needs to be a bit tricky -so that negative numbers are properly sign-extended. Since integers -are stored left-shifted, if the right-shift operator does an arithmetic -shift (i.e. it leaves the most-significant bit as-is rather than -shifting in a zero, so that it mimics a divide-by-two even for negative -numbers) the shift to remove the tag bit is enough. This is the case -on all the systems we support. - - Note that when `ERROR_CHECK_TYPECHECK' is defined, the converter -macros become more complicated--they check the tag bits and/or the type -field in the first four bytes of a record type to ensure that the -object is really of the correct type. This is great for catching places -where an incorrect type is being dereferenced--this typically results -in a pointer being dereferenced as the wrong type of structure, with -unpredictable (and sometimes not easily traceable) results. - - There are similar `XSETTYPE()' macros that construct a Lisp object. -These macros are of the form `XSETTYPE (LVALUE, RESULT)', i.e. they -have to be a statement rather than just used in an expression. The -reason for this is that standard C doesn't let you "construct" a -structure (but GCC does). Granted, this sometimes isn't too -convenient; for the case of integers, at least, you can use the -function `make_int()', which constructs and _returns_ an integer Lisp -object. Note that the `XSETTYPE()' macros are also affected by -`ERROR_CHECK_TYPECHECK' and make sure that the structure is of the -right type in the case of record types, where the type is contained in -the structure. - - The C programmer is responsible for *guaranteeing* that a -Lisp_Object is the correct type before using the `XTYPE' macros. This -is especially important in the case of lists. 
Use `XCAR' and `XCDR' if -a Lisp_Object is certainly a cons cell, else use `Fcar()' and `Fcdr()'. -Trust other C code, but not Lisp code. On the other hand, if XEmacs -has an internal logic error, it's better to crash immediately, so -sprinkle `assert()'s and "unreachable" `abort()'s liberally about the -source code. Where performance is an issue, use `type_checking_assert', -`bufpos_checking_assert', and `gc_checking_assert', which do nothing -unless the corresponding configure error checking flag was specified. + Escape sequence ASCII equivalent Meaning + --------------- ---------------- ------- + 0x1B 0x28 0x4A ESC ( J invoke Japanese-JISX0201-Roman + 0x1B 0x28 0x49 ESC ( I invoke Japanese-JISX0201-Kana + 0x1B 0x24 0x42 ESC $ B invoke Japanese-JISX0208 + 0x1B 0x28 0x42 ESC ( B invoke Printing-ASCII + + Initially, Printing-ASCII is invoked.  -File: internals.info, Node: Rules When Writing New C Code, Next: A Summary of the Various XEmacs Modules, Prev: How Lisp Objects Are Represented in C, Up: Top +File: internals.info, Node: Internal Mule Encodings, Next: CCL, Prev: Encodings, Up: MULE Character Sets and Encodings -Rules When Writing New C Code -***************************** +Internal Mule Encodings +======================= - The XEmacs C Code is extremely complex and intricate, and there are -many rules that are more or less consistently followed throughout the -code. Many of these rules are not obvious, so they are explained here. -It is of the utmost importance that you follow them. If you don't, -you may get something that appears to work, but which will crash in odd -situations, often in code far away from where the actual breakage is. +In XEmacs/Mule, each character set is assigned a unique number, called a +"leading byte". This is used in the encodings of a character. Leading +bytes are in the range 0x80 - 0xFF (except for ASCII, which has a +leading byte of 0), although some leading bytes are reserved. 
+ + Charsets whose leading byte is in the range 0x80 - 0x9F are called +"official" and are used for built-in charsets. Other charsets are +called "private" and have leading bytes in the range 0xA0 - 0xFF; these +are user-defined charsets. + + More specifically: + + Character set Leading byte + ------------- ------------ + ASCII 0 + Composite 0x80 + Dimension-1 Official 0x81 - 0x8D + (0x8E is free) + Control-1 0x8F + Dimension-2 Official 0x90 - 0x99 + (0x9A - 0x9D are free; + 0x9E and 0x9F are reserved) + Dimension-1 Private 0xA0 - 0xEF + Dimension-2 Private 0xF0 - 0xFF + + There are two internal encodings for characters in XEmacs/Mule. One +is called "string encoding" and is an 8-bit encoding that is used for +representing characters in a buffer or string. It uses 1 to 4 bytes per +character. The other is called "character encoding" and is a 19-bit +encoding that is used for representing characters individually in a +variable. + + (In the following descriptions, we'll ignore composite characters for +the moment. We also give a general (structural) overview first, +followed later by the exact details.) * Menu: -* General Coding Rules:: -* Writing Lisp Primitives:: -* Writing Good Comments:: -* Adding Global Lisp Variables:: -* Proper Use of Unsigned Types:: -* Coding for Mule:: -* Techniques for XEmacs Developers:: +* Internal String Encoding:: +* Internal Character Encoding::  -File: internals.info, Node: General Coding Rules, Next: Writing Lisp Primitives, Up: Rules When Writing New C Code +File: internals.info, Node: Internal String Encoding, Next: Internal Character Encoding, Up: Internal Mule Encodings + +Internal String Encoding +------------------------ + +ASCII characters are encoded using their position code directly. Other +characters are encoded using their leading byte followed by their +position code(s) with the high bit set. 
Characters in private character
+sets have their leading byte prefixed with a "leading byte prefix",
+which is either 0x9E or 0x9F.  (No character sets are ever assigned these
+leading bytes.)  Specifically:
+
+     Character set           Encoding (PC=position-code, LB=leading-byte)
+     -------------           --------
+     ASCII                   PC1  |
+     Control-1               LB   | PC1 + 0xA0 |
+     Dimension-1 official    LB   | PC1 + 0x80 |
+     Dimension-1 private     0x9E | LB         | PC1 + 0x80 |
+     Dimension-2 official    LB   | PC1 + 0x80 | PC2 + 0x80 |
+     Dimension-2 private     0x9F | LB         | PC1 + 0x80 | PC2 + 0x80
+
+   The basic characteristic of this encoding is that the first byte of
+all characters is in the range 0x00 - 0x9F, and the second and
+following bytes of all characters are in the range 0xA0 - 0xFF.  This
+means that it is impossible to get out of sync, or more specifically:
+
+  1. Given any byte position, the beginning of the character it is
+     within can be determined in constant time.
+
+  2. Given any byte position at the beginning of a character, the
+     beginning of the next character can be determined in constant time.
+
+  3. Given any byte position at the beginning of a character, the
+     beginning of the previous character can be determined in constant
+     time.
+
+  4. Textual searches can simply treat encoded strings as if they were
+     encoded in a one-byte-per-character fashion rather than the actual
+     multi-byte encoding.
+
+   None of the standard non-modal encodings meet all of these
+conditions.  For example, EUC satisfies only (2) and (3), while
+Shift-JIS and Big5 (not yet described) satisfy only (2).  (All non-modal
+encodings must satisfy (2), in order to be unambiguous.)
-General Coding Rules
-====================
-
-   The C code is actually written in a dialect of C called "Clean C",
-meaning that it can be compiled, mostly warning-free, with either a C or
-C++ compiler.  Coding in Clean C has several advantages over plain C.
-C++ compilers are more nit-picking, and a number of coding errors have
-been found by compiling with C++.
The ability to use both C and C++
-tools means that a greater variety of development tools are available to
-the developer.
-
-   Every module includes `<config.h>' (angle brackets so that
-`--srcdir' works correctly; `config.h' may or may not be in the same
-directory as the C sources) and `lisp.h'.  `config.h' must always be
-included before any other header files (including system header files)
-to ensure that certain tricks played by various `s/' and `m/' files
-work out correctly.
-
-   When including header files, always use angle brackets, not double
-quotes, except when the file to be included is always in the same
-directory as the including file.  If either file is a generated file,
-then that is not likely to be the case.  In order to understand why we
-have this rule, imagine what happens when you do a build in the source
-directory using `./configure' and another build in another directory
-using `../work/configure'.  There will be two different `config.h'
-files.  Which one will be used if you `#include "config.h"'?
-
-   Almost every module contains a `syms_of_*()' function and a
-`vars_of_*()' function.  The former declares any Lisp primitives you
-have defined and defines any symbols you will be using.  The latter
-declares any global Lisp variables you have added and initializes global
-C variables in the module.  *Important*: There are stringent
-requirements on exactly what can go into these functions.  See the
-comment in `emacs.c'.  The reason for this is to avoid obscure unwanted
-interactions during initialization.  If you don't follow these rules,
-you'll be sorry!  If you want to do anything that isn't allowed, create
-a `complex_vars_of_*()' function for it.  Doing this is tricky, though:
-you have to make sure your function is called at the right time so that
-all the initialization dependencies work out.
-
-   Declare each function of these kinds in `symsinit.h'.  Make sure
-it's called in the appropriate place in `emacs.c'.
You never need to -include `symsinit.h' directly, because it is included by `lisp.h'. - - *All global and static variables that are to be modifiable must be -declared uninitialized.* This means that you may not use the "declare -with initializer" form for these variables, such as `int some_variable -= 0;'. The reason for this has to do with some kludges done during the -dumping process: If possible, the initialized data segment is re-mapped -so that it becomes part of the (unmodifiable) code segment in the -dumped executable. This allows this memory to be shared among multiple -running XEmacs processes. XEmacs is careful to place as much constant -data as possible into initialized variables during the `temacs' phase. - - *Please note:* This kludge only works on a few systems nowadays, and -is rapidly becoming irrelevant because most modern operating systems -provide "copy-on-write" semantics. All data is initially shared -between processes, and a private copy is automatically made (on a -page-by-page basis) when a process first attempts to write to a page of -memory. - - Formerly, there was a requirement that static variables not be -declared inside of functions. This had to do with another hack along -the same vein as what was just described: old USG systems put -statically-declared variables in the initialized data space, so those -header files had a `#define static' declaration. (That way, the -data-segment remapping described above could still work.) This fails -badly on static variables inside of functions, which suddenly become -automatic variables; therefore, you weren't supposed to have any of -them. This awful kludge has been removed in XEmacs because - - 1. almost all of the systems that used this kludge ended up having to - disable the data-segment remapping anyway; - - 2. the only systems that didn't were extremely outdated ones; - - 3. this hack completely messed up inline functions. - - The C source code makes heavy use of C preprocessor macros. 
One -popular macro style is: - - #define FOO(var, value) do { \ - Lisp_Object FOO_value = (value); \ - ... /* compute using FOO_value */ \ - (var) = bar; \ - } while (0) - - The `do {...} while (0)' is a standard trick to allow FOO to have -statement semantics, so that it can safely be used within an `if' -statement in C, for example. Multiple evaluation is prevented by -copying a supplied argument into a local variable, so that -`FOO(var,fun(1))' only calls `fun' once. - - Lisp lists are popular data structures in the C code as well as in -Elisp. There are two sets of macros that iterate over lists. -`EXTERNAL_LIST_LOOP_N' should be used when the list has been supplied -by the user, and cannot be trusted to be acyclic and `nil'-terminated. -A `malformed-list' or `circular-list' error will be generated if the -list being iterated over is not entirely kosher. `LIST_LOOP_N', on the -other hand, is faster and less safe, and can be used only on trusted -lists. - - Related macros are `GET_EXTERNAL_LIST_LENGTH' and `GET_LIST_LENGTH', -which calculate the length of a list, and in the case of -`GET_EXTERNAL_LIST_LENGTH', validating the properness of the list. The -macros `EXTERNAL_LIST_LOOP_DELETE_IF' and `LIST_LOOP_DELETE_IF' delete -elements from a lisp list satisfying some predicate. + +File: internals.info, Node: Internal Character Encoding, Prev: Internal String Encoding, Up: Internal Mule Encodings + +Internal Character Encoding +--------------------------- + +One 19-bit word represents a single character. The word is separated +into three fields: + + Bit number: 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00 + <------------> <------------------> <------------------> + Field: 1 2 3 + + Note that fields 2 and 3 hold 7 bits each, while field 1 holds 5 +bits. 
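The three-field layout just described, together with the per-charset rules in the table that follows, can be sketched in C. The typedef, macro, and function names below are invented for illustration and are not the ones used in the XEmacs sources; composite characters are not handled, and the leading-byte ranges are those tabulated in the Internal Mule Encodings node.

```c
/* Illustrative stand-in for the real typedef; only the low 19 bits
   are used here. */
typedef int Emchar_sketch;

/* Field 1 = bits 18-14 (5 bits), field 2 = bits 13-7 (7 bits),
   field 3 = bits 6-0 (7 bits). */
#define SKETCH_FIELD1(ch) (((ch) >> 14) & 0x1F)
#define SKETCH_FIELD2(ch) (((ch) >> 7) & 0x7F)
#define SKETCH_FIELD3(ch) ((ch) & 0x7F)

/* Pack three field values into one 19-bit character code. */
static Emchar_sketch
sketch_from_fields (int f1, int f2, int f3)
{
  return (f1 << 14) | (f2 << 7) | f3;
}

/* Build a character code from a leading byte LB and position codes.
   PC2 is ignored for dimension-1 charsets.  Returns -1 for leading
   bytes this sketch does not handle (composite, free, reserved). */
static Emchar_sketch
sketch_make_char (int lb, int pc1, int pc2)
{
  if (lb == 0x00)                           /* ASCII */
    return sketch_from_fields (0, 0, pc1);
  if (lb == 0x8F)                           /* Control-1 */
    return sketch_from_fields (0, 1, pc1);
  if ((lb >= 0x81 && lb <= 0x8D)            /* dimension-1 official */
      || (lb >= 0xA0 && lb <= 0xEF))        /* dimension-1 private */
    return sketch_from_fields (0, lb - 0x80, pc1);
  if (lb >= 0x90 && lb <= 0x99)             /* dimension-2 official */
    return sketch_from_fields (lb - 0x8F, pc1, pc2);
  if (lb >= 0xF0 && lb <= 0xFF)             /* dimension-2 private */
    return sketch_from_fields (lb - 0xE1, pc1, pc2);
  return -1;
}
```

Under these rules a Control-1 character with position code 5 gets the code 133, which is the same value as the byte it would have in a binary file.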
+ + Character set Field 1 Field 2 Field 3 + ------------- ------- ------- ------- + ASCII 0 0 PC1 + range: (00 - 7F) + Control-1 0 1 PC1 + range: (00 - 1F) + Dimension-1 official 0 LB - 0x80 PC1 + range: (01 - 0D) (20 - 7F) + Dimension-1 private 0 LB - 0x80 PC1 + range: (20 - 6F) (20 - 7F) + Dimension-2 official LB - 0x8F PC1 PC2 + range: (01 - 0A) (20 - 7F) (20 - 7F) + Dimension-2 private LB - 0xE1 PC1 PC2 + range: (0F - 1E) (20 - 7F) (20 - 7F) + Composite 0x1F ? ? + + Note that character codes 0 - 255 are the same as the "binary +encoding" described above.  -File: internals.info, Node: Writing Lisp Primitives, Next: Writing Good Comments, Prev: General Coding Rules, Up: Rules When Writing New C Code +File: internals.info, Node: CCL, Prev: Internal Mule Encodings, Up: MULE Character Sets and Encodings -Writing Lisp Primitives -======================= +CCL +=== - Lisp primitives are Lisp functions implemented in C. The details of -interfacing the C function so that Lisp can call it are handled by a few -C macros. The only way to really understand how to write new C code is -to read the source, but we can explain some things here. - - An example of a special form is the definition of `prog1', from -`eval.c'. (An ordinary function would have the same general -appearance.) - - DEFUN ("prog1", Fprog1, 1, UNEVALLED, 0, /* - Similar to `progn', but the value of the first form is returned. - \(prog1 FIRST BODY...): All the arguments are evaluated sequentially. - The value of FIRST is saved during evaluation of the remaining args, - whose values are discarded. 
- */ - (args)) - { - /* This function can GC */ - REGISTER Lisp_Object val, form, tail; - struct gcpro gcpro1; + CCL PROGRAM SYNTAX: + CCL_PROGRAM := (CCL_MAIN_BLOCK + [ CCL_EOF_BLOCK ]) - val = Feval (XCAR (args)); + CCL_MAIN_BLOCK := CCL_BLOCK + CCL_EOF_BLOCK := CCL_BLOCK - GCPRO1 (val); + CCL_BLOCK := STATEMENT | (STATEMENT [STATEMENT ...]) + STATEMENT := + SET | IF | BRANCH | LOOP | REPEAT | BREAK + | READ | WRITE - LIST_LOOP_3 (form, XCDR (args), tail) - Feval (form); + SET := (REG = EXPRESSION) | (REG SELF_OP EXPRESSION) + | INT-OR-CHAR - UNGCPRO; - return val; - } - - Let's start with a precise explanation of the arguments to the -`DEFUN' macro. Here is a template for them: - - DEFUN (LNAME, FNAME, MIN_ARGS, MAX_ARGS, INTERACTIVE, /* - DOCSTRING - */ - (ARGLIST)) - -LNAME - This string is the name of the Lisp symbol to define as the - function name; in the example above, it is `"prog1"'. - -FNAME - This is the C function name for this function. This is the name - that is used in C code for calling the function. The name is, by - convention, `F' prepended to the Lisp name, with all dashes (`-') - in the Lisp name changed to underscores. Thus, to call this - function from C code, call `Fprog1'. Remember that the arguments - are of type `Lisp_Object'; various macros and functions for - creating values of type `Lisp_Object' are declared in the file - `lisp.h'. - - Primitives whose names are special characters (e.g. `+' or `<') - are named by spelling out, in some fashion, the special character: - e.g. `Fplus()' or `Flss()'. Primitives whose names begin with - normal alphanumeric characters but also contain special characters - are spelled out in some creative way, e.g. `let*' becomes - `FletX()'. - - Each function also has an associated structure that holds the data - for the subr object that represents the function in Lisp. 
This - structure conveys the Lisp symbol name to the initialization - routine that will create the symbol and store the subr object as - its definition. The C variable name of this structure is always - `S' prepended to the FNAME. You hardly ever need to be aware of - the existence of this structure, since `DEFUN' plus `DEFSUBR' - takes care of all the details. - -MIN_ARGS - This is the minimum number of arguments that the function - requires. The function `prog1' allows a minimum of one argument. - -MAX_ARGS - This is the maximum number of arguments that the function accepts, - if there is a fixed maximum. Alternatively, it can be `UNEVALLED', - indicating a special form that receives unevaluated arguments, or - `MANY', indicating an unlimited number of evaluated arguments (the - C equivalent of `&rest'). Both `UNEVALLED' and `MANY' are macros. - If MAX_ARGS is a number, it may not be less than MIN_ARGS and it - may not be greater than 8. (If you need to add a function with - more than 8 arguments, use the `MANY' form. Resist the urge to - edit the definition of `DEFUN' in `lisp.h'. If you do it anyways, - make sure to also add another clause to the switch statement in - `primitive_funcall().') - -INTERACTIVE - This is an interactive specification, a string such as might be - used as the argument of `interactive' in a Lisp function. In the - case of `prog1', it is 0 (a null pointer), indicating that `prog1' - cannot be called interactively. A value of `""' indicates a - function that should receive no arguments when called - interactively. - -DOCSTRING - This is the documentation string. It is written just like a - documentation string for a function defined in Lisp; in - particular, the first line should be a single sentence. 
Note how - the documentation string is enclosed in a comment, none of the - documentation is placed on the same lines as the comment-start and - comment-end characters, and the comment-start characters are on - the same line as the interactive specification. `make-docfile', - which scans the C files for documentation strings, is very - particular about what it looks for, and will not properly extract - the doc string if it's not in this exact format. - - In order to make both `etags' and `make-docfile' happy, make sure - that the `DEFUN' line contains the LNAME and FNAME, and that the - comment-start characters for the doc string are on the same line - as the interactive specification, and put a newline directly after - them (and before the comment-end characters). - -ARGLIST - This is the comma-separated list of arguments to the C function. - For a function with a fixed maximum number of arguments, provide a - C argument for each Lisp argument. In this case, unlike regular C - functions, the types of the arguments are not declared; they are - simply always of type `Lisp_Object'. - - The names of the C arguments will be used as the names of the - arguments to the Lisp primitive as displayed in its documentation, - modulo the same concerns described above for `F...' names (in - particular, underscores in the C arguments become dashes in the - Lisp arguments). - - There is one additional kludge: A trailing `_' on the C argument is - discarded when forming the Lisp argument. This allows C language - reserved words (like `default') or global symbols (like `dirname') - to be used as argument names without compiler warnings or errors. - - A Lisp function with MAX_ARGS = `UNEVALLED' is a "special form"; - its arguments are not evaluated. Instead it receives one argument - of type `Lisp_Object', a (Lisp) list of the unevaluated arguments, - conventionally named `(args)'. - - When a Lisp function has no upper limit on the number of arguments, - specify MAX_ARGS = `MANY'. 
In this case its implementation in C - actually receives exactly two arguments: the number of Lisp - arguments (an `int') and the address of a block containing their - values (a `Lisp_Object *'). In this case only are the C types - specified in the ARGLIST: `(int nargs, Lisp_Object *args)'. - - Within the function `Fprog1' itself, note the use of the macros -`GCPRO1' and `UNGCPRO'. `GCPRO1' is used to "protect" a variable from -garbage collection--to inform the garbage collector that it must look -in that variable and regard the object pointed at by its contents as an -accessible object. This is necessary whenever you call `Feval' or -anything that can directly or indirectly call `Feval' (this includes -the `QUIT' macro!). At such a time, any Lisp object that you intend to -refer to again must be protected somehow. `UNGCPRO' cancels the -protection of the variables that are protected in the current function. -It is necessary to do this explicitly. - - The macro `GCPRO1' protects just one local variable. If you want to -protect two, use `GCPRO2' instead; repeating `GCPRO1' will not work. -Macros `GCPRO3' and `GCPRO4' also exist. - - These macros implicitly use local variables such as `gcpro1'; you -must declare these explicitly, with type `struct gcpro'. Thus, if you -use `GCPRO2', you must declare `gcpro1' and `gcpro2'. - - Note also that the general rule is "caller-protects"; i.e. you are -only responsible for protecting those Lisp objects that you create. Any -objects passed to you as arguments should have been protected by whoever -created them, so you don't in general have to protect them. - - In particular, the arguments to any Lisp primitive are always -automatically `GCPRO'ed, when called "normally" from Lisp code or -bytecode. So only a few Lisp primitives that are called frequently from -C code, such as `Fprogn' protect their arguments as a service to their -caller. You don't need to protect your arguments when writing a new -`DEFUN'. 
- - `GCPRO'ing is perhaps the trickiest and most error-prone part of -XEmacs coding. It is *extremely* important that you get this right and -use a great deal of discipline when writing this code. *Note -`GCPRO'ing: GCPROing, for full details on how to do this. - - What `DEFUN' actually does is declare a global structure of type -`Lisp_Subr' whose name begins with capital `SF' and which contains -information about the primitive (e.g. a pointer to the function, its -minimum and maximum allowed arguments, a string describing its Lisp -name); `DEFUN' then begins a normal C function declaration using the -`F...' name. The Lisp subr object that is the function definition of a -primitive (i.e. the object in the function slot of the symbol that -names the primitive) actually points to this `SF' structure; when -`Feval' encounters a subr, it looks in the structure to find out how to -call the C function. - - Defining the C function is not enough to make a Lisp primitive -available; you must also create the Lisp symbol for the primitive (the -symbol is "interned"; *note Obarrays::) and store a suitable subr -object in its function cell. (If you don't do this, the primitive won't -be seen by Lisp code.) The code looks like this: - - DEFSUBR (FNAME); - -Here FNAME is the same name you used as the second argument to `DEFUN'. - - This call to `DEFSUBR' should go in the `syms_of_*()' function at -the end of the module. If no such function exists, create it and make -sure to also declare it in `symsinit.h' and call it from the -appropriate spot in `main()'. *Note General Coding Rules::. - - Note that C code cannot call functions by name unless they are -defined in C. The way to call a function written in Lisp from C is to -use `Ffuncall', which embodies the Lisp function `funcall'. Since the -Lisp function `funcall' accepts an unlimited number of arguments, in C -it takes two: the number of Lisp-level arguments, and a one-dimensional -array containing their values. 
The first Lisp-level argument is the -Lisp function to call, and the rest are the arguments to pass to it. -Since `Ffuncall' can call the evaluator, you must protect pointers from -garbage collection around the call to `Ffuncall'. (However, `Ffuncall' -explicitly protects all of its parameters, so you don't have to protect -any pointers passed as parameters to it.) - - The C functions `call0', `call1', `call2', and so on, provide handy -ways to call a Lisp function conveniently with a fixed number of -arguments. They work by calling `Ffuncall'. - - `eval.c' is a very good file to look through for examples; `lisp.h' -contains the definitions for important macros and functions. + EXPRESSION := ARG | (EXPRESSION OP ARG) + + IF := (if EXPRESSION CCL_BLOCK CCL_BLOCK) + BRANCH := (branch EXPRESSION CCL_BLOCK [CCL_BLOCK ...]) + LOOP := (loop STATEMENT [STATEMENT ...]) + BREAK := (break) + REPEAT := (repeat) + | (write-repeat [REG | INT-OR-CHAR | string]) + | (write-read-repeat REG [INT-OR-CHAR | string | ARRAY]?) + READ := (read REG) | (read REG REG) + | (read-if REG ARITH_OP ARG CCL_BLOCK CCL_BLOCK) + | (read-branch REG CCL_BLOCK [CCL_BLOCK ...]) + WRITE := (write REG) | (write REG REG) + | (write INT-OR-CHAR) | (write STRING) | STRING + | (write REG ARRAY) + END := (end) + + REG := r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7 + ARG := REG | INT-OR-CHAR + OP := + | - | * | / | % | & | '|' | ^ | << | >> | <8 | >8 | // + | < | > | == | <= | >= | != + SELF_OP := + += | -= | *= | /= | %= | &= | '|=' | ^= | <<= | >>= + ARRAY := '[' INT-OR-CHAR ... ']' + INT-OR-CHAR := INT | CHAR + + MACHINE CODE: + + The machine code consists of a vector of 32-bit words. + The first such word specifies the start of the EOF section of the code; + this is the code executed to handle any stuff that needs to be done + (e.g. designating back to ASCII and left-to-right mode) after all + other encoded/decoded data has been written out. This is not used for + charset CCL programs. 
+ + REGISTER: 0..7 -- referred by RRR or rrr + + OPERATOR BIT FIELD (27-bit): XXXXXXXXXXXXXXX RRR TTTTT + TTTTT (5-bit): operator type + RRR (3-bit): register number + XXXXXXXXXXXXXXXX (15-bit): + CCCCCCCCCCCCCCC: constant or address + 000000000000rrr: register number + + AAAA: 00000 + + 00001 - + 00010 * + 00011 / + 00100 % + 00101 & + 00110 | + 00111 ~ + + 01000 << + 01001 >> + 01010 <8 + 01011 >8 + 01100 // + 01101 not used + 01110 not used + 01111 not used + + 10000 < + 10001 > + 10010 == + 10011 <= + 10100 >= + 10101 != + + OPERATORS: TTTTT RRR XX.. + + SetCS: 00000 RRR C...C RRR = C...C + SetCL: 00001 RRR ..... RRR = c...c + c.............c + SetR: 00010 RRR ..rrr RRR = rrr + SetA: 00011 RRR ..rrr RRR = array[rrr] + C.............C size of array = C...C + c.............c contents = c...c + + Jump: 00100 000 c...c jump to c...c + JumpCond: 00101 RRR c...c if (!RRR) jump to c...c + WriteJump: 00110 RRR c...c Write1 RRR, jump to c...c + WriteReadJump: 00111 RRR c...c Write1, Read1 RRR, jump to c...c + WriteCJump: 01000 000 c...c Write1 C...C, jump to c...c + C...C + WriteCReadJump: 01001 RRR c...c Write1 C...C, Read1 RRR, + C.............C and jump to c...c + WriteSJump: 01010 000 c...c WriteS, jump to c...c + C.............C + S.............S + ... + WriteSReadJump: 01011 RRR c...c WriteS, Read1 RRR, jump to c...c + C.............C + S.............S + ... + WriteAReadJump: 01100 RRR c...c WriteA, Read1 RRR, jump to c...c + C.............C size of array = C...C + c.............c contents = c...c + ... + Branch: 01101 RRR C...C if (RRR >= 0 && RRR < C..) + c.............c branch to (RRR+1)th address + Read1: 01110 RRR ... read 1-byte to RRR + Read2: 01111 RRR ..rrr read 2-byte to RRR and rrr + ReadBranch: 10000 RRR C...C Read1 and Branch + c.............c + ... + Write1: 10001 RRR ..... write 1-byte RRR + Write2: 10010 RRR ..rrr write 2-byte RRR and rrr + WriteC: 10011 000 ..... write 1-char C...CC + C.............C + WriteS: 10100 000 ..... 
write C..-byte of string + C.............C + S.............S + ... + WriteA: 10101 RRR ..... write array[RRR] + C.............C size of array = C...C + c.............c contents = c...c + ... + End: 10110 000 ..... terminate the execution + + SetSelfCS: 10111 RRR C...C RRR AAAAA= C...C + ..........AAAAA + SetSelfCL: 11000 RRR ..... RRR AAAAA= c...c + c.............c + ..........AAAAA + SetSelfR: 11001 RRR ..Rrr RRR AAAAA= rrr + ..........AAAAA + SetExprCL: 11010 RRR ..Rrr RRR = rrr AAAAA c...c + c.............c + ..........AAAAA + SetExprR: 11011 RRR ..rrr RRR = rrr AAAAA Rrr + ............Rrr + ..........AAAAA + JumpCondC: 11100 RRR c...c if !(RRR AAAAA C..) jump to c...c + C.............C + ..........AAAAA + JumpCondR: 11101 RRR c...c if !(RRR AAAAA rrr) jump to c...c + ............rrr + ..........AAAAA + ReadJumpCondC: 11110 RRR c...c Read1 and JumpCondC + C.............C + ..........AAAAA + ReadJumpCondR: 11111 RRR c...c Read1 and JumpCondR + ............rrr + ..........AAAAA  -File: internals.info, Node: Writing Good Comments, Next: Adding Global Lisp Variables, Prev: Writing Lisp Primitives, Up: Rules When Writing New C Code - -Writing Good Comments -===================== - - Comments are a lifeline for programmers trying to understand tricky -code. In general, the less obvious it is what you are doing, the more -you need a comment, and the more detailed it needs to be. You should -always be on guard when you're writing code for stuff that's tricky, and -should constantly be putting yourself in someone else's shoes and asking -if that person could figure out without much difficulty what's going -on. (Assume they are a competent programmer who understands the -essentials of how the XEmacs code is structured but doesn't know much -about the module you're working on or any algorithms you're using.) If -you're not sure whether they would be able to, add a comment. Always -err on the side of more comments, rather than less. 
- - Generally, when making comments, there is no need to attribute them -with your name or initials. This especially goes for small, -easy-to-understand, non-opinionated ones. Also, comments indicating -where, when, and by whom a file was changed are _strongly_ discouraged, -and in general will be removed as they are discovered. This is exactly -what `ChangeLogs' are there for. However, it can occasionally be -useful to mark exactly where (but not when or by whom) changes are -made, particularly when making small changes to a file imported from -elsewhere. These marks help when later on a newer version of the file -is imported and the changes need to be merged. (If everything were -always kept in CVS, there would be no need for this. But in practice, -this often doesn't happen, or the CVS repository is later on lost or -unavailable to the person doing the update.) - - When putting in an explicit opinion in a comment, you should -_always_ attribute it with your name, and optionally the date. This -also goes for long, complex comments explaining in detail the workings -of something - by putting your name there, you make it possible for -someone who has questions about how that thing works to determine who -wrote the comment so they can write to them. Preferably, use your -actual name and not your initials, unless your initials are generally -recognized (e.g. `jwz'). You can use only your first name if it's -obvious who you are; otherwise, give first and last name. If you're -not a regular contributor, you might consider putting your email -address in - it may be in the ChangeLog, but after awhile ChangeLogs -have a tendency of disappearing or getting muddled. (E.g. your comment -may get copied somewhere else or even into another program, and -tracking down the proper ChangeLog may be very difficult.) 
- - If you come across an opinion that is not or no longer valid, or you -come across any comment that no longer applies but you want to keep it -around, enclose it in `[[ ' and ` ]]' marks and add a comment -afterwards explaining why the preceding comment is no longer valid. Put -your name on this comment, as explained above. - - Just as comments are a lifeline to programmers, incorrect comments -are death. If you come across an incorrect comment, *immediately* -correct it or flag it as incorrect, as described in the previous -paragraph. Whenever you work on a section of code, _always_ make sure -to update any comments to be correct - or, at the very least, flag them -as incorrect. - - To indicate a "todo" or other problem, use four pound signs - i.e. -`####'. +File: internals.info, Node: The Lisp Reader and Compiler, Next: Lstreams, Prev: MULE Character Sets and Encodings, Up: Top + +The Lisp Reader and Compiler +**************************** + +Not yet documented.  -File: internals.info, Node: Adding Global Lisp Variables, Next: Proper Use of Unsigned Types, Prev: Writing Good Comments, Up: Rules When Writing New C Code - -Adding Global Lisp Variables -============================ - - Global variables whose names begin with `Q' are constants whose -value is a symbol of a particular name. The name of the variable should -be derived from the name of the symbol using the same rules as for Lisp -primitives. These variables are initialized using a call to -`defsymbol()' in the `syms_of_*()' function. (This call interns a -symbol, sets the C variable to the resulting Lisp object, and calls -`staticpro()' on the C variable to tell the garbage-collection -mechanism about this variable. What `staticpro()' does is add a -pointer to the variable to a large global array; when -garbage-collection happens, all pointers listed in the array are used -as starting points for marking Lisp objects. 
This is important because -it's quite possible that the only current reference to the object is -the C variable. In the case of symbols, the `staticpro()' doesn't -matter all that much because the symbol is contained in `obarray', -which is itself `staticpro()'ed. However, it's possible that a naughty -user could do something like uninterning the symbol out of `obarray' or -even setting `obarray' to a different value [although this is likely to -make XEmacs crash!].) - - *Please note:* It is potentially deadly if you declare a `Q...' -variable in two different modules. The two calls to `defsymbol()' are -no problem, but some linkers will complain about multiply-defined -symbols. The most insidious aspect of this is that often the link will -succeed anyway, but then the resulting executable will sometimes crash -in obscure ways during certain operations! To avoid this problem, -declare any symbols with common names (such as `text') that are not -obviously associated with this particular module in the module -`general.c'. - - Global variables whose names begin with `V' are variables that -contain Lisp objects. The convention here is that all global variables -of type `Lisp_Object' begin with `V', and all others don't (including -integer and boolean variables that have Lisp equivalents). Most of the -time, these variables have equivalents in Lisp, but some don't. Those -that do are declared this way by a call to `DEFVAR_LISP()' in the -`vars_of_*()' initializer for the module. What this does is create a -special "symbol-value-forward" Lisp object that contains a pointer to -the C variable, intern a symbol whose name is as specified in the call -to `DEFVAR_LISP()', and set its value to the symbol-value-forward Lisp -object; it also calls `staticpro()' on the C variable to tell the -garbage-collection mechanism about the variable. 
When `eval' (or -actually `symbol-value') encounters this special object in the process -of retrieving a variable's value, it follows the indirection to the C -variable and gets its value. `setq' does similar things so that the C -variable gets changed. - - Whether or not you `DEFVAR_LISP()' a variable, you need to -initialize it in the `vars_of_*()' function; otherwise it will end up -as all zeroes, which is the integer 0 (_not_ `nil'), and this is -probably not what you want. Also, if the variable is not -`DEFVAR_LISP()'ed, *you must call* `staticpro()' on the C variable in -the `vars_of_*()' function. Otherwise, the garbage-collection -mechanism won't know that the object in this variable is in use, and -will happily collect it and reuse its storage for another Lisp object, -and you will be the one who's unhappy when you can't figure out how -your variable got overwritten. +File: internals.info, Node: Lstreams, Next: Consoles; Devices; Frames; Windows, Prev: The Lisp Reader and Compiler, Up: Top + +Lstreams +******** + +An "lstream" is an internal Lisp object that provides a generic +buffering stream implementation. Conceptually, you send data to the +stream or read data from the stream, not caring what's on the other end +of the stream. The other end could be another stream, a file +descriptor, a stdio stream, a fixed block of memory, a reallocating +block of memory, etc. The main purpose of the stream is to provide a +standard interface and to do buffering. Macros are defined to read or +write characters, so the calling functions do not have to worry about +blocking data together in order to achieve efficiency. + +* Menu: + +* Creating an Lstream:: Creating an lstream object. +* Lstream Types:: Different sorts of things that are streamed. +* Lstream Functions:: Functions for working with lstreams. +* Lstream Methods:: Creating new lstream types. 
+ + +File: internals.info, Node: Creating an Lstream, Next: Lstream Types, Up: Lstreams + +Creating an Lstream +=================== + +Lstreams come in different types, depending on what is being interfaced +to. Although the primitive for creating new lstreams is +`Lstream_new()', generally you do not call this directly. Instead, you +call some type-specific creation function, which creates the lstream +and initializes it as appropriate for the particular type. + + All lstream creation functions take a MODE argument, specifying what +mode the lstream should be opened as. This controls whether the +lstream is for input and output, and optionally whether data should be +blocked up in units of MULE characters. Note that some types of +lstreams can only be opened for input; others only for output; and +others can be opened either way. #### Richard Mlynarik thinks that +there should be a strict separation between input and output streams, +and he's probably right. + + MODE is a string, one of + +`"r"' + Open for reading. + +`"w"' + Open for writing. + +`"rc"' + Open for reading, but "read" never returns partial MULE characters. + +`"wc"' + Open for writing, but never writes partial MULE characters.  -File: internals.info, Node: Proper Use of Unsigned Types, Next: Coding for Mule, Prev: Adding Global Lisp Variables, Up: Rules When Writing New C Code +File: internals.info, Node: Lstream Types, Next: Lstream Functions, Prev: Creating an Lstream, Up: Lstreams + +Lstream Types +============= + +stdio + +filedesc -Proper Use of Unsigned Types -============================ +lisp-string - Avoid using `unsigned int' and `unsigned long' whenever possible. -Unsigned types are viral - any arithmetic or comparisons involving -mixed signed and unsigned types are automatically converted to -unsigned, which is almost certainly not what you want. Many subtle and -hard-to-find bugs are created by careless use of unsigned types. 
In -general, you should almost _never_ use an unsigned type to hold a -regular quantity of any sort. The only exceptions are +fixed-buffer - 1. When there's a reasonable possibility you will actually need all - 32 or 64 bits to store the quantity. +resizing-buffer - 2. When calling existing API's that require unsigned types. In this - case, you should still do all manipulation using signed types, and - do the conversion at the very threshold of the API call. +dynarr - 3. In existing code that you don't want to modify because you don't - maintain it. +lisp-buffer - 4. In bit-field structures. +print - Other reasonable uses of `unsigned int' and `unsigned long' are -representing non-quantities - e.g. bit-oriented flags and such. +decoding + +encoding + + +File: internals.info, Node: Lstream Functions, Next: Lstream Methods, Prev: Lstream Types, Up: Lstreams + +Lstream Functions +================= + + - Function: Lstream * Lstream_new (Lstream_implementation *IMP, const + char *MODE) + Allocate and return a new Lstream. This function is not really + meant to be called directly; rather, each stream type should + provide its own stream creation function, which creates the stream + and does any other necessary creation stuff (e.g. opening a file). + + - Function: void Lstream_set_buffering (Lstream *LSTR, + Lstream_buffering BUFFERING, int BUFFERING_SIZE) + Change the buffering of a stream. See `lstream.h'. By default the + buffering is `STREAM_BLOCK_BUFFERED'. + + - Function: int Lstream_flush (Lstream *LSTR) + Flush out any pending unwritten data in the stream. Clear any + buffered input data. Returns 0 on success, -1 on error. + + - Macro: int Lstream_putc (Lstream *STREAM, int C) + Write out one byte to the stream. This is a macro and so it is + very efficient. The C argument is only evaluated once but the + STREAM argument is evaluated more than once. Returns 0 on + success, -1 on error. 
+ + - Macro: int Lstream_getc (Lstream *STREAM) + Read one byte from the stream. This is a macro and so it is very + efficient. The STREAM argument is evaluated more than once. + Return value is -1 for EOF or error. + + - Macro: void Lstream_ungetc (Lstream *STREAM, int C) + Push one byte back onto the input queue. This will be the next + byte read from the stream. Any number of bytes can be pushed back + and will be read in the reverse order they were pushed back--most + recent first. (This is necessary for consistency--if there are a + number of bytes that have been unread and I read and unread a + byte, it needs to be the first to be read again.) This is a macro + and so it is very efficient. The C argument is only evaluated + once but the STREAM argument is evaluated more than once. + + - Function: int Lstream_fputc (Lstream *STREAM, int C) + - Function: int Lstream_fgetc (Lstream *STREAM) + - Function: void Lstream_fungetc (Lstream *STREAM, int C) + Function equivalents of the above macros. + + - Function: ssize_t Lstream_read (Lstream *STREAM, void *DATA, size_t + SIZE) + Read SIZE bytes of DATA from the stream. Return the number of + bytes read. 0 means EOF. -1 means an error occurred and no bytes + were read. + + - Function: ssize_t Lstream_write (Lstream *STREAM, void *DATA, size_t + SIZE) + Write SIZE bytes of DATA to the stream. Return the number of + bytes written. -1 means an error occurred and no bytes were + written. + + - Function: void Lstream_unread (Lstream *STREAM, void *DATA, size_t + SIZE) + Push back SIZE bytes of DATA onto the input queue. The next call + to `Lstream_read()' with the same size will read the same bytes + back. Note that this will be the case even if there is other + pending unread data. + + - Function: int Lstream_close (Lstream *STREAM) + Close the stream. All data will be flushed out. + + - Function: void Lstream_reopen (Lstream *STREAM) + Reopen a closed stream. This enables I/O on it again. 
This is not + meant to be called except from a wrapper routine that reinitializes + variables and such--the close routine may well have freed some + necessary storage structures, for example. + + - Function: void Lstream_rewind (Lstream *STREAM) + Rewind the stream to the beginning.  -File: internals.info, Node: Coding for Mule, Next: Techniques for XEmacs Developers, Prev: Proper Use of Unsigned Types, Up: Rules When Writing New C Code +File: internals.info, Node: Lstream Methods, Prev: Lstream Functions, Up: Lstreams -Coding for Mule +Lstream Methods =============== - Although Mule support is not compiled by default in XEmacs, many -people are using it, and we consider it crucial that new code works -correctly with multibyte characters. This is not hard; it is only a -matter of following several simple user-interface guidelines. Even if -you never compile with Mule, with a little practice you will find it -quite easy to code Mule-correctly. + - Lstream Method: ssize_t reader (Lstream *STREAM, unsigned char + *DATA, size_t SIZE) + Read some data from the stream's end and store it into DATA, which + can hold SIZE bytes. Return the number of bytes read. A return + value of 0 means no bytes can be read at this time. This may be + because of an EOF, or because there is a granularity greater than + one byte that the stream imposes on the returned data, and SIZE is + less than this granularity. (This will happen frequently for + streams that need to return whole characters, because + `Lstream_read()' calls the reader function repeatedly until it has + the number of bytes it wants or until 0 is returned.) The lstream + functions do not treat a 0 return as EOF or do anything special; + however, the calling function will interpret any 0 it gets back as + EOF. This will normally not happen unless the caller calls + `Lstream_read()' with a very small size. + + This function can be `NULL' if the stream is output-only. 
+ + - Lstream Method: ssize_t writer (Lstream *STREAM, const unsigned char + *DATA, size_t SIZE) + Send some data to the stream's end. Data to be sent is in DATA + and is SIZE bytes. Return the number of bytes sent. This + function can send and return fewer bytes than is passed in; in that + case, the function will just be called again until there is no + data left or 0 is returned. A return value of 0 means that no + more data can be currently stored, but there is no error; the data + will be squirreled away until the writer can accept data. (This is + useful, e.g., if you're dealing with a non-blocking file + descriptor and are getting `EWOULDBLOCK' errors.) This function + can be `NULL' if the stream is input-only. + + - Lstream Method: int rewinder (Lstream *STREAM) + Rewind the stream. If this is `NULL', the stream is not seekable. + + - Lstream Method: int seekable_p (Lstream *STREAM) + Indicate whether this stream is seekable--i.e. it can be rewound. + This method is ignored if the stream does not have a rewind + method. If this method is not present, the result is determined + by whether a rewind method is present. + + - Lstream Method: int flusher (Lstream *STREAM) + Perform any additional operations necessary to flush the data in + this stream. + + - Lstream Method: int pseudo_closer (Lstream *STREAM) + + - Lstream Method: int closer (Lstream *STREAM) + Perform any additional operations necessary to close this stream + down. May be `NULL'. This function is called when + `Lstream_close()' is called or when the stream is + garbage-collected. When this function is called, all pending data + in the stream will already have been written out. + + - Lstream Method: Lisp_Object marker (Lisp_Object LSTREAM, void + (*MARKFUN) (Lisp_Object)) + Mark this object for garbage collection. Same semantics as a + standard `Lisp_Object' marker. This function can be `NULL'. 
+ + +File: internals.info, Node: Consoles; Devices; Frames; Windows, Next: The Redisplay Mechanism, Prev: Lstreams, Up: Top + +Consoles; Devices; Frames; Windows +********************************** + +* Menu: + +* Introduction to Consoles; Devices; Frames; Windows:: +* Point:: +* Window Hierarchy:: +* The Window Object:: + + +File: internals.info, Node: Introduction to Consoles; Devices; Frames; Windows, Next: Point, Up: Consoles; Devices; Frames; Windows + +Introduction to Consoles; Devices; Frames; Windows +================================================== + +A window-system window that you see on the screen is called a "frame" +in Emacs terminology. Each frame is subdivided into one or more +non-overlapping panes, called (confusingly) "windows". Each window +displays the text of a buffer in it. (See above on Buffers.) Note that +buffers and windows are independent entities: Two or more windows can +be displaying the same buffer (potentially in different locations), and +a buffer can be displayed in no windows. + + A single display screen that contains one or more frames is called a +"display". Under most circumstances, there is only one display. +However, more than one display can exist, for example if you have a +"multi-headed" console, i.e. one with a single keyboard but multiple +displays. (Typically in such a situation, the various displays act like +one large display, in that the mouse is only in one of them at a time, +and moving the mouse off of one moves it into another.) In some cases, +the different displays will have different characteristics, e.g. one +color and one mono. + + XEmacs can display frames on multiple displays. It can even deal +simultaneously with frames on multiple keyboards (called "consoles" in +XEmacs terminology). Here is one case where this might be useful: You +are using XEmacs on your workstation at work, and leave it running. 
+Then you go home and dial in on a TTY line, and you can use the +already-running XEmacs process to display another frame on your local +TTY. + + Thus, there is a hierarchy console -> display -> frame -> window. +There is a separate Lisp object type for each of these four concepts. +Furthermore, there is logically a "selected console", "selected +display", "selected frame", and "selected window". Each of these +objects is distinguished in various ways, such as being the default +object for various functions that act on objects of that type. Note +that every containing object remembers the "selected" object among the +objects that it contains: e.g. not only is there a selected window, but +every frame remembers the last window in it that was selected, and +changing the selected frame causes the remembered window within it to +become the selected window. Similar relationships apply for consoles +to devices and devices to frames. + + +File: internals.info, Node: Point, Next: Window Hierarchy, Prev: Introduction to Consoles; Devices; Frames; Windows, Up: Consoles; Devices; Frames; Windows + +Point +===== + +Recall that every buffer has a current insertion position, called +"point". Now, two or more windows may be displaying the same buffer, +and the text cursor in the two windows (i.e. `point') can be in two +different places. You may ask, how can that be, since each buffer has +only one value of `point'? The answer is that each window also has a +value of `point' that is squirreled away in it. There is only one +selected window, and the value of "point" in that buffer corresponds to +that window. When the selected window is changed from one window to +another displaying the same buffer, the old value of `point' is stored +into the old window's "point" and the value of `point' from the new +window is retrieved and made the value of `point' in the buffer. 
This +means that `window-point' for the selected window is potentially +inaccurate, and if you want to retrieve the correct value of `point' +for a window, you must special-case on the selected window and retrieve +the buffer's point instead. This is related to why +`save-window-excursion' does not save the selected window's value of +`point'. + + +File: internals.info, Node: Window Hierarchy, Next: The Window Object, Prev: Point, Up: Consoles; Devices; Frames; Windows + +Window Hierarchy +================ + +If a frame contains multiple windows (panes), they are always created +by splitting an existing window along the horizontal or vertical axis. +Terminology is a bit confusing here: to "split a window horizontally" +means to create two side-by-side windows, i.e. to make a _vertical_ cut +in a window. Likewise, to "split a window vertically" means to create +two windows, one above the other, by making a _horizontal_ cut. + + If you split a window and then split again along the same axis, you +will end up with a number of panes all arranged along the same axis. +The precise way in which the splits were made should not be important, +and this is reflected internally. Internally, all windows are arranged +in a tree, consisting of two types of windows, "combination" windows +(which have children, and are covered completely by those children) and +"leaf" windows, which have no children and are visible. Every +combination window has two or more children, all arranged along the same +axis. There are (logically) two subtypes of windows, depending on +whether their children are horizontally or vertically arrayed. There is +always one root window, which is either a leaf window (if the frame +contains only one window) or a combination window (if the frame contains +more than one window). 
In the latter case, the root window will have +two or more children, either horizontally or vertically arrayed, and +each of those children will be either a leaf window or another +combination window. + + Here are some rules: + + 1. Horizontal combination windows can never have children that are + horizontal combination windows; same for vertical. + + 2. Only leaf windows can be split (obviously) and this splitting does + one of two things: (a) turns the leaf window into a combination + window and creates two new leaf children, or (b) turns the leaf + window into one of the two new leaves and creates the other leaf. + Rule (1) dictates which of these two outcomes happens. + + 3. Every combination window must have at least two children. + + 4. Leaf windows can never become combination windows. They can be + deleted, however. If this results in a violation of (3), the + parent combination window also gets deleted. + + 5. All functions that accept windows must be prepared to accept + combination windows, and do something sane (e.g. signal an error + if so). Combination windows _do_ escape to the Lisp level. + + 6. All windows have three fields governing their contents: these are + "hchild" (a list of horizontally-arrayed children), "vchild" (a + list of vertically-arrayed children), and "buffer" (the buffer + contained in a leaf window). Exactly one of these will be + non-`nil'. Remember that "horizontally-arrayed" means + "side-by-side" and "vertically-arrayed" means "one above the + other". + + 7. Leaf windows also have markers in their `start' (the first buffer + position displayed in the window) and `pointm' (the window's + stashed value of `point'--see above) fields, while combination + windows have `nil' in these fields. + + 8. The list of children for a window is threaded through the `next' + and `prev' fields of each child window. + + 9. *Deleted windows can be undeleted*. 
This happens as a result of + restoring a window configuration, and is unlike frames, displays, + and consoles, which, once deleted, can never be restored. + Deleting a window does nothing except set a special `dead' bit to + 1 and clear out the `next', `prev', `hchild', and `vchild' fields, + for GC purposes. + + 10. Most frames actually have two top-level windows--one for the + minibuffer and one (the "root") for everything else. The modeline + (if present) separates these two. The `next' field of the root + points to the minibuffer, and the `prev' field of the minibuffer + points to the root. The other `next' and `prev' fields are `nil', + and the frame points to both of these windows. Minibuffer-less + frames have no minibuffer window, and the `next' and `prev' of the + root window are `nil'. Minibuffer-only frames have no root + window, and the `next' of the minibuffer window is `nil' but the + `prev' points to itself. (#### This is an artifact that should be + fixed.) + + +File: internals.info, Node: The Window Object, Prev: Window Hierarchy, Up: Consoles; Devices; Frames; Windows + +The Window Object +================= + +Windows have the following accessible fields: + +`frame' + The frame that this window is on. + +`mini_p' + Non-`nil' if this window is a minibuffer window. + +`buffer' + The buffer that the window is displaying. This may change often + during the life of the window. + +`dedicated' + Non-`nil' if this window is dedicated to its buffer. + +`pointm' + This is the value of point in the current buffer when this window + is selected; when it is not selected, it retains its previous + value. + +`start' + The position in the buffer that is the first character to be + displayed in the window. + +`force_start' + If this flag is non-`nil', it says that the window has been + scrolled explicitly by the Lisp program. 
This affects what the + next redisplay does if point is off the screen: instead of + scrolling the window to show the text around point, it moves point + to a location that is on the screen. + +`last_modified' + The `modified' field of the window's buffer, as of the last time a + redisplay completed in this window. + +`last_point' + The buffer's value of point, as of the last time a redisplay + completed in this window. + +`left' + This is the left-hand edge of the window, measured in columns. + (The leftmost column on the screen is column 0.) + +`top' + This is the top edge of the window, measured in lines. (The top + line on the screen is line 0.) + +`height' + The height of the window, measured in lines. + +`width' + The width of the window, measured in columns. + +`next' + This is the window that is the next in the chain of siblings. It + is `nil' in a window that is the rightmost or bottommost of a + group of siblings. + +`prev' + This is the window that is the previous in the chain of siblings. + It is `nil' in a window that is the leftmost or topmost of a group + of siblings. + +`parent' + Internally, XEmacs arranges windows in a tree; each group of + siblings has a parent window whose area includes all the siblings. + This field points to a window's parent. + + Parent windows do not display buffers, and play little role in + display except to shape their child windows. Emacs Lisp programs + usually have no access to the parent windows; they operate on the + windows at the leaves of the tree, which actually display buffers. + +`hscroll' + This is the number of columns that the display in the window is + scrolled horizontally to the left. Normally, this is 0. + +`use_time' + This is the last time that the window was selected. The function + `get-lru-window' uses this field. + +`display_table' + The window's display table, or `nil' if none is specified for it. + +`update_mode_line' + Non-`nil' means this window's mode line needs to be updated. 
+ +`base_line_number' + The line number of a certain position in the buffer, or `nil'. + This is used for displaying the line number of point in the mode + line. + +`base_line_pos' + The position in the buffer for which the line number is known, or + `nil' meaning none is known. + +`region_showing' + If the region (or part of it) is highlighted in this window, this + field holds the mark position that made one end of that region. + Otherwise, this field is `nil'. + + +File: internals.info, Node: The Redisplay Mechanism, Next: Extents, Prev: Consoles; Devices; Frames; Windows, Up: Top - Note that these guidelines are not necessarily tied to the current -Mule implementation; they are also a good idea to follow on the grounds -of code generalization for future I18N work. +The Redisplay Mechanism +*********************** + +The redisplay mechanism is one of the most complicated sections of +XEmacs, especially from a conceptual standpoint. This is doubly so +because, unlike for the basic aspects of the Lisp interpreter, the +computer science theories of how to efficiently handle redisplay are not +well-developed. + + When working with the redisplay mechanism, remember the Golden Rules +of Redisplay: + + 1. It Is Better To Be Correct Than Fast. + + 2. Thou Shalt Not Run Elisp From Within Redisplay. + + 3. It Is Better To Be Fast Than Not To Be. 
* Menu: -* Character-Related Data Types:: -* Working With Character and Byte Positions:: -* Conversion to and from External Data:: -* General Guidelines for Writing Mule-Aware Code:: -* An Example of Mule-Aware Code:: +* Critical Redisplay Sections:: +* Line Start Cache:: +* Redisplay Piece by Piece::  -File: internals.info, Node: Character-Related Data Types, Next: Working With Character and Byte Positions, Up: Coding for Mule +File: internals.info, Node: Critical Redisplay Sections, Next: Line Start Cache, Up: The Redisplay Mechanism + +Critical Redisplay Sections +=========================== + +Within this section, we are defenseless and assume that the following +cannot happen: + + 1. garbage collection + + 2. Lisp code evaluation + + 3. frame size changes + + We ensure (3) by calling `hold_frame_size_changes()', which will +cause any pending frame size changes to get put on hold till after the +end of the critical section. (1) follows automatically if (2) is met. +#### Unfortunately, there are some places where Lisp code can be called +within this section. We need to remove them. + + If `Fsignal()' is called during this critical section, we will +`abort()'. + + If garbage collection is called during this critical section, we +simply return. #### We should abort instead. + + #### If a frame-size change does occur we should probably actually +be preempting redisplay. + + +File: internals.info, Node: Line Start Cache, Next: Redisplay Piece by Piece, Prev: Critical Redisplay Sections, Up: The Redisplay Mechanism + +Line Start Cache +================ + +The traditional scrolling code in Emacs breaks in a variable height +world. It depends on the key assumption that the number of lines that +can be displayed at any given time is fixed. This led to a complete +separation of the scrolling code from the redisplay code. In order to +fully support variable height lines, the scrolling code must actually be +tightly integrated with redisplay. 
Only redisplay can determine how +many lines will be displayed on a screen for any given starting point. + + What is ideally wanted is a complete list of the starting buffer +position for every possible display line of a buffer along with the +height of that display line. Maintaining such a full list would be very +expensive. We settle for having it include information for all areas +which we happen to generate anyhow (i.e. the region currently being +displayed) and for those areas we need to work with. + + In order to ensure that the cache accurately represents what +redisplay would actually show, it is necessary to invalidate it in many +situations. If the buffer changes, the starting positions may no longer +be correct. If a face or an extent has changed then the line heights +may have altered. These events happen frequently enough that the cache +can end up being constantly disabled. With this potentially constant +invalidation when is the cache ever useful? + + Even if the cache is invalidated before every single usage, it is +necessary. Scrolling often requires knowledge about display lines which +are actually above or below the visible region. The cache provides a +convenient light-weight method of storing this information for multiple +display regions. This knowledge is necessary for the scrolling code to +always obey the First Golden Rule of Redisplay. + + If the cache already contains all of the information that the +scrolling routines happen to need so that it doesn't have to go +generate it, then we are able to obey the Third Golden Rule of +Redisplay. The first thing we do to help out the cache is to always +add the displayed region. This region had to be generated anyway, so +the cache ends up getting the information basically for free. In those +cases where a user is simply scrolling around viewing a buffer there is +a high probability that this is sufficient to always provide the needed +information. 
The second thing we can do is be smart about invalidating +the cache. + + TODO--Be smart about invalidating the cache. Potential places: + + * Insertions at end-of-line which don't cause line-wraps do not + alter the starting positions of any display lines. These types of + buffer modifications should not invalidate the cache. This is + actually a large optimization for redisplay speed as well. + + * Buffer modifications frequently only affect the display of lines + at and below where they occur. In these situations we should only + invalidate the part of the cache starting at where the + modification occurs. + + In case you're wondering, the Second Golden Rule of Redisplay is not +applicable. + + +File: internals.info, Node: Redisplay Piece by Piece, Prev: Line Start Cache, Up: The Redisplay Mechanism + +Redisplay Piece by Piece +======================== + +As you can begin to see redisplay is complex and also not well +documented. Chuck no longer works on XEmacs so this section is my take +on the workings of redisplay. + + Redisplay happens in three phases: + + 1. Determine desired display in area that needs redisplay. + Implemented by `redisplay.c' + + 2. Compare desired display with current display Implemented by + `redisplay-output.c' + + 3. Output changes Implemented by `redisplay-output.c', + `redisplay-x.c', `redisplay-msw.c' and `redisplay-tty.c' + + Steps 1 and 2 are device-independent and relatively complex. Step 3 +is mostly device-dependent. + + Determining the desired display + + Display attributes are stored in `display_line' structures. Each +`display_line' consists of a set of `display_block''s and each +`display_block' contains a number of `rune''s. Generally dynarr's of +`display_line''s are held by each window representing the current +display and the desired display. + + The `display_line' structures are tightly tied to buffers which +presents a problem for redisplay as this connection is bogus for the +modeline. 
Hence the `display_line' generation routines are duplicated +for generating the modeline. This means that the modeline display code +has many bugs that the standard redisplay code does not. + + The guts of `display_line' generation are in `create_text_block', +which creates a single display line for the desired locale. This +incrementally parses the characters on the current line and generates +redisplay structures for each. + + Gutter redisplay is different. Because the data to display is stored +in a string we cannot use `create_text_block'. Instead we use +`create_text_string_block' which performs the same function as +`create_text_block' but for strings. Many of the complexities of +`create_text_block' to do with cursor handling and selective display +have been removed. + + +File: internals.info, Node: Extents, Next: Faces, Prev: The Redisplay Mechanism, Up: Top + +Extents +******* + +* Menu: + +* Introduction to Extents:: Extents are ranges over text, with properties. +* Extent Ordering:: How extents are ordered internally. +* Format of the Extent Info:: The extent information in a buffer or string. +* Zero-Length Extents:: A weird special case. +* Mathematics of Extent Ordering:: A rigorous foundation. +* Extent Fragments:: Cached information useful for redisplay. + + +File: internals.info, Node: Introduction to Extents, Next: Extent Ordering, Up: Extents + +Introduction to Extents +======================= + +Extents are regions over a buffer, with a start and an end position +denoting the region of the buffer included in the extent. In addition, +either end can be closed or open, meaning that the endpoint is or is +not logically included in the extent. Insertion of a character at a +closed endpoint causes the character to go inside the extent; insertion +at an open endpoint causes the character to go outside. 
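The open/closed endpoint rule can be made concrete with a small model. The sketch below is an assumption-laden illustration, not the code in `extents.c': `toy_extent' and `toy_extent_note_insertion' are hypothetical names, and positions are modelled as plain character positions rather than memory indices.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical extent.  Positions name the gaps between characters,
   so the extent covers characters start .. end-1. */
struct toy_extent {
  long start, end;
  bool start_open, end_open;
};

/* Adjust E for one character inserted at position POS.
   An endpoint moves right when the insertion is strictly before it,
   or exactly at it and the new character must stay outside:
   an open start pushes the start past the character, and a closed
   end stretches the end to take the character in. */
static void toy_extent_note_insertion(struct toy_extent *e, long pos)
{
  assert(e->start <= e->end);
  /* Open-open zero-length extents are not modelled here. */
  assert(!(e->start == e->end && e->start_open && e->end_open));

  bool shift_start = pos < e->start || (pos == e->start && e->start_open);
  bool shift_end = pos < e->end || (pos == e->end && !e->end_open);

  if (shift_start) e->start++;
  if (shift_end) e->end++;
}
```

For an extent covering 2..5 with both endpoints closed, an insertion at 2 leaves the start in place and moves the end to 6, so the character lands inside. With both endpoints open, the same insertion moves the extent to 3..6 and the character stays outside.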
-Character-Related Data Types + Extent endpoints are stored using memory indices (see `insdel.c'), +to minimize the amount of adjusting that needs to be done when +characters are inserted or deleted. + + (Formerly, extent endpoints at the gap could be either before or +after the gap, depending on the open/closedness of the endpoint. The +intent of this was to make it so that insertions would automatically go +inside or out of extents as necessary with no further work needing to +be done. It didn't work out that way, however, and just ended up +complexifying and buggifying all the rest of the code.) + + +File: internals.info, Node: Extent Ordering, Next: Format of the Extent Info, Prev: Introduction to Extents, Up: Extents + +Extent Ordering +=============== + +Extents are compared using memory indices. There are two orderings for +extents and both orders are kept current at all times. The normal or +"display" order is as follows: + + Extent A is ``less than'' extent B, + that is, earlier in the display order, + if: A-start < B-start, + or if: A-start = B-start, and A-end > B-end + + So if two extents begin at the same position, the larger of them is +the earlier one in the display order (`EXTENT_LESS' is true). + + For the e-order, the same thing holds: + + Extent A is ``less than'' extent B in e-order, + that is, later in the buffer, + if: A-end < B-end, + or if: A-end = B-end, and A-start > B-start + + So if two extents end at the same position, the smaller of them is +the earlier one in the e-order (`EXTENT_E_LESS' is true). + + The display order and the e-order are complementary orders: any +theorem about the display order also applies to the e-order if you swap +all occurrences of "display order" and "e-order", "less than" and +"greater than", and "extent start" and "extent end". 
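The two orderings defined above translate directly into a pair of comparison predicates. This is a minimal sketch over a hypothetical `ord_extent' struct; the functions `toy_display_less' and `toy_e_less' are illustrative and only mirror the stated definitions, they are not the `EXTENT_LESS' and `EXTENT_E_LESS' macros themselves.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical extent with just the two indices the orderings use. */
struct ord_extent {
  long start, end;
};

/* Display order: sort by increasing start; on a tie, the extent
   with the larger end (the longer one) comes first. */
static bool toy_display_less(const struct ord_extent *a,
                             const struct ord_extent *b)
{
  return a->start < b->start
      || (a->start == b->start && a->end > b->end);
}

/* E-order: sort by increasing end; on a tie, the extent with the
   larger start (the shorter one) comes first. */
static bool toy_e_less(const struct ord_extent *a,
                       const struct ord_extent *b)
{
  return a->end < b->end
      || (a->end == b->end && a->start > b->start);
}
```

Both predicates are strict: an extent is never less than itself, and two extents with identical endpoints compare equal in both orders.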
+ + +File: internals.info, Node: Format of the Extent Info, Next: Zero-Length Extents, Prev: Extent Ordering, Up: Extents + +Format of the Extent Info +========================= + +An extent-info structure consists of a list of the buffer or string's +extents and a "stack of extents" that lists all of the extents over a +particular position. The stack-of-extents info is used for +optimization purposes--it basically caches some info that might be +expensive to compute. Certain otherwise hard computations are easy +given the stack of extents over a particular position, and if the stack +of extents over a nearby position is known (because it was calculated +at some prior point in time), it's easy to move the stack of extents to +the proper position. + + Given that the stack of extents is an optimization, and given that +it requires memory, a string's stack of extents is wiped out each time +a garbage collection occurs. Therefore, any time you retrieve the +stack of extents, it might not be there. If you need it to be there, +use the `_force' version. + + Similarly, a string may or may not have an extent_info structure. +(Generally it won't if there haven't been any extents added to the +string.) So use the `_force' version if you need the extent_info +structure to be there. + + A list of extents is maintained as a double gap array: one gap array +is ordered by start index (the "display order") and the other is +ordered by end index (the "e-order"). Note that positions in an extent +list should logically be conceived of as referring _to_ a particular +extent (as is the norm in programs) rather than sitting between two +extents. Note also that callers of these functions should not be aware +of the fact that the extent list is implemented as an array, except for +the fact that positions are integers (this should be generalized to +handle integers and linked list equally well). 
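The "might not be there" versus `_force' distinction is a common lazy-allocation pattern. The sketch below only illustrates that pattern: `toy_string', `toy_info' and the three functions are hypothetical names, and the real extent-info accessors may differ in signature and behavior.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-string extent information. */
struct toy_info {
  int extent_count;
};

/* Hypothetical string object; info stays NULL until it is needed. */
struct toy_string {
  struct toy_info *info;
};

/* Plain accessor: may return NULL, so callers must check. */
static struct toy_info *toy_string_info(struct toy_string *s)
{
  return s->info;
}

/* "_force" accessor: creates the structure on demand, so the
   result is always usable. */
static struct toy_info *toy_string_info_force(struct toy_string *s)
{
  if (s->info == NULL) {
    s->info = calloc(1, sizeof *s->info);
    if (s->info == NULL)
      abort();               /* out of memory; nothing sensible to do */
  }
  return s->info;
}

/* Models a cache being wiped, as happens to a stack of extents
   when a garbage collection occurs. */
static void toy_string_drop_info(struct toy_string *s)
{
  free(s->info);
  s->info = NULL;
}
```

Code that can cope with missing information uses the plain accessor and tests for NULL; code that is about to add or consult data uses the forcing variant.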
+ + +File: internals.info, Node: Zero-Length Extents, Next: Mathematics of Extent Ordering, Prev: Format of the Extent Info, Up: Extents + +Zero-Length Extents +=================== + +Extents can be zero-length, and will end up that way if their endpoints +are explicitly set that way or if their detachable property is `nil' +and all the text in the extent is deleted. (The exception is open-open +zero-length extents, which are barred from existing because there is no +sensible way to define their properties. Deletion of the text in an +open-open extent causes it to be converted into a closed-open extent.) +Zero-length extents are primarily used to represent annotations, and +behave as follows: + + 1. Insertion at the position of a zero-length extent expands the + extent if both endpoints are closed; goes after the extent if it + is closed-open; and goes before the extent if it is open-closed. + + 2. Deletion of a character on a side of a zero-length extent whose + corresponding endpoint is closed causes the extent to be detached + if it is detachable; if the extent is not detachable or the + corresponding endpoint is open, the extent remains in the buffer, + moving as necessary. + + Note that closed-open, non-detachable zero-length extents behave +exactly like markers and that open-closed, non-detachable zero-length +extents behave like the "point-type" marker in Mule. + + +File: internals.info, Node: Mathematics of Extent Ordering, Next: Extent Fragments, Prev: Zero-Length Extents, Up: Extents + +Mathematics of Extent Ordering +============================== + +The extents in a buffer are ordered by "display order" because that is +that order that the redisplay mechanism needs to process them in. The +e-order is an auxiliary ordering used to facilitate operations over +extents. The operations that can be performed on the ordered list of +extents in a buffer are + + 1. Locate where an extent would go if inserted into the list. + + 2. Insert an extent into the list. 
+ + 3. Remove an extent from the list. + + 4. Map over all the extents that overlap a range. + + (4) requires being able to determine the first and last extents that +overlap a range. + + NOTE: "overlap" is used as follows: + + * two ranges overlap if they have at least one point in common. + Whether the endpoints are open or closed makes a difference here. + + * a point overlaps a range if the point is contained within the + range; this is equivalent to treating a point P as the range [P, + P]. + + * In the case of an _extent_ overlapping a point or range, the extent + is normally treated as having closed endpoints. This applies + consistently in the discussion of stacks of extents and such below. + Note that this definition of overlap is not necessarily consistent + with the extents that `map-extents' maps over, since `map-extents' + sometimes pays attention to whether the endpoints of an extent + are open or closed. But for our purposes, it greatly simplifies + things to treat all extents as having closed endpoints. + + First, define >, <, <=, etc. as applied to extents to mean +comparison according to the display order. Comparison between an +extent E and an index I means comparison between E and the range [I, I]. + + Also define e>, e<, e<=, etc. to mean comparison according to the +e-order. + + For any range R, define R(0) to be the starting index of the range +and R(1) to be the ending index of the range. + + For any extent E, define E(next) to be the extent directly following +E, and E(prev) to be the extent directly preceding E. Assume E(next) +and E(prev) can be determined from E in constant time. (This is +because we store the extent list as a doubly linked list.) + + Similarly, define E(e-next) and E(e-prev) to be the extents directly +following and preceding E in the e-order. + + Now: + + Let R be a range. Let F be the first extent overlapping R. Let L +be the last extent overlapping R. + + Theorem 1: R(1) lies between L and L(next), i.e.
L <= R(1) < L(next). + + This follows easily from the definition of display order. The basic +reason that this theorem applies is that the display order sorts by +increasing starting index. + + Therefore, we can determine L just by looking at where we would +insert R(1) into the list, and if we know F and are moving forward over +extents, we can easily determine when we've hit L by comparing the +extent we're at to R(1). + + Theorem 2: F(e-prev) e< [1, R(0)] e<= F. + + This is the analog of Theorem 1, and applies because the e-order +sorts by increasing ending index. + + Therefore, F can be found in the same amount of time as operation +(1), i.e. the time that it takes to locate where an extent would go if +inserted into the e-order list. + + If the lists were stored as balanced binary trees, then operation (1) +would take logarithmic time, which is usually quite fast. However, +currently they're stored as simple doubly-linked lists, and instead we +do some caching to try to speed things up. + + Define a "stack of extents" (or "SOE") as the set of extents +(ordered in the display order) that overlap an index I, together with +the SOE's "previous" extent, which is an extent that precedes I in the +e-order. (Hopefully there will not be very many extents between I and +the previous extent.) + + Now: + + Let I be an index, let S be the stack of extents on I, let F be the +first extent in S, and let P be S's previous extent. + + Theorem 3: The first extent in S is the first extent that overlaps +any range [I, J]. + + Proof: Any extent that overlaps [I, J] but does not include I must +have a start index > I, and thus be greater than any extent in S. + + Therefore, finding the first extent that overlaps a range R is the +same as finding the first extent that overlaps R(0). + + Theorem 4: Let I2 be an index such that I2 > I, and let F2 be the +first extent that overlaps I2. Then, either F2 is in S or F2 is +greater than any extent in S. 
+ + Proof: If F2 does not include I then its start index is greater than +I and thus it is greater than any extent in S, including F. Otherwise, +F2 includes I and thus is in S, and thus F2 >= F. + + +File: internals.info, Node: Extent Fragments, Prev: Mathematics of Extent Ordering, Up: Extents + +Extent Fragments +================ + +Imagine that the buffer is divided up into contiguous, non-overlapping +"runs" of text such that no extent starts or ends within a run (extents +that abut the run don't count). + + An extent fragment is a structure that holds data about the run that +contains a particular buffer position (if the buffer position is at the +junction of two runs, the run after the position is used)--the +beginning and end of the run, a list of all of the extents in that run, +the "merged face" that results from merging all of the faces +corresponding to those extents, the begin and end glyphs at the +beginning of the run, etc. This is the information that redisplay needs +in order to display this run. + + Extent fragments have to be very quick to update to a new buffer +position when moving linearly through the buffer. They rely on the +stack-of-extents code, which does the heavy-duty algorithmic work of +determining which extents overly a particular position. + + +File: internals.info, Node: Faces, Next: Glyphs, Prev: Extents, Up: Top + +Faces +***** + +Not yet documented. + + +File: internals.info, Node: Glyphs, Next: Specifiers, Prev: Faces, Up: Top + +Glyphs +****** + +Glyphs are graphical elements that can be displayed in XEmacs buffers or +gutters. We use the term graphical element here in the broadest possible +sense since glyphs can be as mundane as text or as arcane as a native +tab widget. + + In XEmacs, glyphs represent the uninstantiated state of graphical +elements, i.e. 
they hold all the information necessary to produce an +image on-screen but the image need not exist at this stage, and multiple +screen images can be instantiated from a single glyph. + + Glyphs are lazily instantiated by calling one of the glyph +functions. This usually occurs within redisplay when `Fglyph_height' is +called. Instantiation causes an image-instance to be created and +cached. This cache is on a per-device basis for all glyphs except +widget-glyphs, and on a per-window basis for widget-glyphs. The +caching is done by `image_instantiate' and is necessary because it is +generally possible to display an image-instance in multiple domains. +For instance, if we create a Pixmap, we can actually display this on +multiple windows - even though we only need a single Pixmap instance to +do this. If caching were not done then it would be necessary to create +image-instances for every displayable occurrence of a glyph - and every +usage - and this would be extremely memory and CPU intensive. + + Widget-glyphs (a.k.a. native widgets) are not cached in this way. +This is because widget-glyph image-instances on screen are toolkit +windows, and thus cannot be reused in multiple XEmacs domains. Thus +widget-glyphs are cached on an XEmacs window basis. + + Any action on a glyph first consults the cache before actually +instantiating a widget. + +Glyph Instantiation +=================== + +Glyph instantiation is a hairy topic and requires some explanation. The +guts of glyph instantiation are contained within `image_instantiate'. A +glyph contains an image which is a specifier. When a glyph function - +for instance `Fglyph_height' - asks for a property of the glyph that +can only be determined from its instantiated state, then the glyph +image is instantiated and an image instance created. The instantiation +process is governed by the specifier code and goes through a series of +steps: + + * Validation.
Instantiation of image instances happens dynamically - + often within the guts of redisplay. Thus it is often not feasible + to catch instantiator errors at instantiation time. Instead the + instantiator is validated at the time it is added to the image + specifier. This function is defined by `image_validate' and at a + simple level validates keyword value pairs. + + * Duplication. The specifier code by default takes a copy of the + instantiator. This is reasonable for most specifiers but in the + case of widget-glyphs can be problematic, since some of the + properties in the instantiator - for instance callbacks - could + cause infinite recursion in the copying process. Thus the image + code defines a function - `image_copy_instantiator' - which will + selectively copy values. This is controlled by the way that a + keyword is defined either using `IIFORMAT_VALID_KEYWORD' or + `IIFORMAT_VALID_NONCOPY_KEYWORD'. Note that the image caching and + redisplay code relies on instantiator copying to ensure that + current and new instantiators are actually different rather than + referring to the same thing. + + * Normalization. Once the instantiator has been copied it must be + converted into a form that is viable at instantiation time. This + can involve no changes at all, but typically involves things like + converting file names to the actual data. This function is defined + by `image_going_to_add' and `normalize_image_instantiator'. + + * Instantiation. When an image instance is actually required for + display it is instantiated using `image_instantiate'. This + involves calling instantiate methods that are specific to the type + of image being instantiated. + + The final instantiation phase also involves a number of steps. In +order to understand these we need to describe a number of concepts. + + An image is instantiated in a "domain", where a domain can be any +one of a device, frame, window or image-instance. 
The domain gives the +image-instance context and identity, and properties that affect the +appearance of the image-instance may be different for the same glyph +instantiated in different domains. An example is the face used to +display the image-instance. + + Although an image is instantiated in a particular domain the +instantiation domain is not necessarily the domain in which the +image-instance is cached. For example a pixmap can be instantiated in a +window but actually be cached on a per-device basis. The domain in which +the image-instance is actually cached is called the "governing-domain". +A governing-domain is currently either a device or a window. +Widget-glyphs and text-glyphs have a window as a governing-domain; all +other image-instances have a device as the governing-domain. The +governing domain for an image-instance is determined using the +governing_domain image-instance method. + +Widget-Glyphs +============= + +Widget-Glyphs in the MS-Windows Environment +=========================================== + +To Do + +Widget-Glyphs in the X Environment +================================== + +Widget-glyphs under X make heavy use of lwlib (*note Lucid Widget +Library::) for manipulating the native toolkit objects. This is +primarily so that different toolkits can be supported for +widget-glyphs, just as they are supported for features such as menubars +etc. + + Lwlib is extremely poorly documented and quite hairy so here is my +understanding of what goes on. + + Lwlib maintains a set of widget_instances which mirror the +hierarchical state of Xt widgets. I think this is so that widgets can +be updated and manipulated generically by the lwlib library. For +instance update_one_widget_instance can cope with multiple types of +widget and multiple types of toolkit. Each element in the widget +hierarchy is updated from its corresponding widget_instance by walking +the widget_instance tree recursively.
+ + This has desirable properties such as lw_modify_all_widgets which is +called from `glyphs-x.c' and updates all the properties of a widget +without having to know what the widget is or what toolkit it is from. +Unfortunately this also has hairy properties such as making the lwlib +code quite complex. And of course lwlib has to know at some level what +the widget is and how to set its properties. + + +File: internals.info, Node: Specifiers, Next: Menus, Prev: Glyphs, Up: Top + +Specifiers +********** + +Not yet documented. + + +File: internals.info, Node: Menus, Next: Subprocesses, Prev: Specifiers, Up: Top + +Menus +***** + +A menu is set by setting the value of the variable `current-menubar' +(which may be buffer-local) and then calling `set-menubar-dirty-flag' +to signal a change. This will cause the menu to be redrawn at the next +redisplay. The format of the data in `current-menubar' is described in +`menubar.c'. + + Internally the data in current-menubar is parsed into a tree of +`widget_value's' (defined in `lwlib.h'); this is accomplished by the +recursive function `menu_item_descriptor_to_widget_value()', called by +`compute_menubar_data()'. Such a tree is deallocated using +`free_widget_value()'. + + `update_screen_menubars()' is one of the external entry points. +This checks to see, for each screen, if that screen's menubar needs to +be updated. This is the case if + + 1. `set-menubar-dirty-flag' was called since the last redisplay. + (This function sets the C variable menubar_has_changed.) + + 2. The buffer displayed in the screen has changed. + + 3. The screen has no menubar currently displayed. + + `set_screen_menubar()' is called for each such screen. This +function calls `compute_menubar_data()' to create the tree of +widget_value's, then calls `lw_create_widget()', +`lw_modify_all_widgets()', and/or `lw_destroy_all_widgets()' to create +the X-Toolkit widget associated with the menu. 
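The three conditions that `update_screen_menubars()' checks for each screen amount to a simple predicate. The following is only a minimal sketch of that test under stated assumptions: the `demo_screen' structure, its fields, the variable `demo_menubar_has_changed' (a stand-in for the C variable menubar_has_changed mentioned above, whose real type is not given here) and the function `demo_menubar_needs_update' are all hypothetical names invented for illustration, not the actual declarations in `menubar.c'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified screen record; the real XEmacs structure
   is not reproduced here. */
typedef struct demo_screen
{
  const void *displayed_buffer;  /* buffer currently shown in the screen */
  const void *menubar_buffer;    /* buffer the menubar was last built for */
  bool has_menubar;              /* is a menubar currently displayed? */
} demo_screen;

/* Stand-in for the flag set by `set-menubar-dirty-flag' (assumed int). */
int demo_menubar_has_changed = 0;

/* Return true if any of the three update conditions holds. */
bool
demo_menubar_needs_update (const demo_screen *s)
{
  assert (s != NULL);
  return demo_menubar_has_changed != 0                  /* condition 1 */
         || s->displayed_buffer != s->menubar_buffer    /* condition 2 */
         || !s->has_menubar;                            /* condition 3 */
}
```

In this sketch a caller would loop over all screens, rebuild the menubar for each screen where the predicate is true, and then clear the dirty flag; how the real code records "the buffer has changed" is an assumption here.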
+ + `update_psheets()', the other external entry point, actually changes +the menus being displayed. It uses the widgets fixed by +`update_screen_menubars()' and calls various X functions to ensure that +the menus are displayed properly. + + The menubar widget is set up so that `pre_activate_callback()' is +called when the menu is first selected (i.e. mouse button goes down), +and `menubar_selection_callback()' is called when an item is selected. +`pre_activate_callback()' calls the function in activate-menubar-hook, +which can change the menubar (this is described in `menubar.c'). If +the menubar is changed, `set_screen_menubars()' is called. +`menubar_selection_callback()' enqueues a menu event, putting in it a +function to call (either `eval' or `call-interactively') and its +argument, which is the callback function or form given in the menu's +description. + + +File: internals.info, Node: Subprocesses, Next: Interface to the X Window System, Prev: Menus, Up: Top + +Subprocesses +************ + +The fields of a process are: + +`name' + A string, the name of the process. + +`command' + A list containing the command arguments that were used to start + this process. + +`filter' + A function used to accept output from the process instead of a + buffer, or `nil'. + +`sentinel' + A function called whenever the process receives a signal, or `nil'. + +`buffer' + The associated buffer of the process. + +`pid' + An integer, the Unix process ID. + +`childp' + A flag, non-`nil' if this is really a child process. It is `nil' + for a network connection. + +`mark' + A marker indicating the position of the end of the last output + from this process inserted into the buffer. This is often but not + always the end of the buffer. + +`kill_without_query' + If this is non-`nil', killing XEmacs while this process is still + running does not ask for confirmation about killing the process. 
+ +`raw_status_low' +`raw_status_high' + These two fields record 16 bits each of the process status + returned by the `wait' system call. + +`status' + The process status, as `process-status' should return it. + +`tick' +`update_tick' + If these two fields are not equal, a change in the status of the + process needs to be reported, either by running the sentinel or by + inserting a message in the process buffer. + +`pty_flag' + Non-`nil' if communication with the subprocess uses a PTY; `nil' + if it uses a pipe. + +`infd' + The file descriptor for input from the process. + +`outfd' + The file descriptor for output to the process. + +`subtty' + The file descriptor for the terminal that the subprocess is using. + (On some systems, there is no need to record this, so the value is + `-1'.) + +`tty_name' + The name of the terminal that the subprocess is using, or `nil' if + it is using pipes. + + +File: internals.info, Node: Interface to the X Window System, Next: Index, Prev: Subprocesses, Up: Top + +Interface to the X Window System +******************************** + +Mostly undocumented. + +* Menu: + +* Lucid Widget Library:: An interface to various widget sets. + + +File: internals.info, Node: Lucid Widget Library, Up: Interface to the X Window System + +Lucid Widget Library +==================== + +Lwlib is extremely poorly documented and quite hairy. The author(s) +blame that on X, Xt, and Motif, with some justice, but also sufficient +hypocrisy to avoid drawing the obvious conclusion about their own work. + + The Lucid Widget Library is composed of two more or less independent +pieces. The first, as the name suggests, is a set of widgets. These +widgets are intended to resemble and improve on widgets provided in the +Motif toolkit but not in the Athena widgets, including menubars and +scrollbars. 
Recent additions by Andy Piper integrate some "modern" +widgets by Edward Falk, including checkboxes, radio buttons, progress +gauges, and index tab controls (aka notebooks). + + The second piece of the Lucid widget library is a generic interface +to several toolkits for X (including Xt, the Athena widget set, and +Motif, as well as the Lucid widgets themselves) so that core XEmacs +code need not know which widget set has been used to build the +graphical user interface. + +* Menu: + +* Generic Widget Interface:: The lwlib generic widget interface. +* Scrollbars:: +* Menubars:: +* Checkboxes and Radio Buttons:: +* Progress Bars:: +* Tab Controls:: + + +File: internals.info, Node: Generic Widget Interface, Next: Scrollbars, Up: Lucid Widget Library + +Generic Widget Interface +------------------------ + +In general in any toolkit a widget may be a composite object. In Xt, +all widgets have an X window that they manage, but typically a complex +widget will have widget children, each of which manages a subwindow of +the parent widget's X window. These children may themselves be +composite widgets. Thus a widget is actually a tree or hierarchy of +widgets. + + For each toolkit widget, lwlib maintains a tree of `widget_values' +which mirror the hierarchical state of Xt widgets (including Motif, +Athena, 3D Athena, and Falk's widget sets). Each `widget_value' has a +`contents' member, which points to the head of a linked list of its +children. The linked list of siblings is chained through the `next' +member of `widget_value'. + + +-----------+ + | composite | + +-----------+ + | + | contents + V + +-------+ next +-------+ next +-------+ + | child |----->| child |----->| child | + +-------+ +-------+ +-------+ + | + | contents + V + +-------------+ next +-------------+ + | grand child |----->| grand child | + +-------------+ +-------------+ + + The `widget_value' hierarchy of a composite widget with two simple + children and one composite child.
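The `contents'/`next' linkage pictured above can be sketched in C roughly as follows. This is an illustration only, not the real declaration from `lwlib.h': the structure `demo_widget_value', its `name' field, and the helpers `demo_new', `demo_count', `demo_free' and `demo_build_tree' are hypothetical names invented for this example; only the `contents' and `next' members come from the description. Placing the grandchildren under the middle child is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for lwlib's widget_value. */
typedef struct demo_widget_value
{
  const char *name;                     /* illustrative label only */
  struct demo_widget_value *contents;   /* head of the list of children */
  struct demo_widget_value *next;       /* next sibling */
} demo_widget_value;

/* Allocate one node with no children and no siblings. */
demo_widget_value *
demo_new (const char *name)
{
  demo_widget_value *wv = calloc (1, sizeof *wv);
  assert (wv != NULL);
  wv->name = name;
  return wv;
}

/* Count a node, its later siblings, and all of their descendants. */
int
demo_count (const demo_widget_value *wv)
{
  int n = 0;
  for (; wv != NULL; wv = wv->next)
    n += 1 + demo_count (wv->contents);
  return n;
}

/* Free a node, its later siblings, and all of their descendants. */
void
demo_free (demo_widget_value *wv)
{
  while (wv != NULL)
    {
      demo_widget_value *next = wv->next;
      demo_free (wv->contents);
      free (wv);
      wv = next;
    }
}

/* Build the hierarchy from the diagram: one composite with three
   children, one of which is itself a composite with two children. */
demo_widget_value *
demo_build_tree (void)
{
  demo_widget_value *root = demo_new ("composite");
  demo_widget_value *c1 = demo_new ("child 1");
  demo_widget_value *c2 = demo_new ("child 2");
  demo_widget_value *c3 = demo_new ("child 3");

  root->contents = c1;
  c1->next = c2;
  c2->next = c3;

  c2->contents = demo_new ("grand child 1");
  c2->contents->next = demo_new ("grand child 2");
  return root;
}
```

A traversal in this style (follow `next' across siblings, recurse through `contents') is the kind of walk that lets one routine visit every element without knowing what kind of widget each node stands for.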
+ + The `widget_instance' structure maintains the inverse view of the +tree. As for the `widget_value', siblings are chained through the +`next' member. However, rather than naming children, the +`widget_instance' tree links to parents. + + +-----------+ + | composite | + +-----------+ + A + | parent + | + +-------+ next +-------+ next +-------+ + | child |----->| child |----->| child | + +-------+ +-------+ +-------+ + A + | parent + | + +-------------+ next +-------------+ + | grand child |----->| grand child | + +-------------+ +-------------+ + + The `widget_instance' hierarchy of a composite widget with two simple + children and one composite child. + + This permits widgets derived from different toolkits to be updated +and manipulated generically by the lwlib library. For instance +`update_one_widget_instance' can cope with multiple types of widget and +multiple types of toolkit. Each element in the widget hierarchy is +updated from its corresponding `widget_value' by walking the +`widget_value' tree. This has desirable properties. For example, +`lw_modify_all_widgets' is called from `glyphs-x.c' and updates all the +properties of a widget without having to know what the widget is or +what toolkit it is from. Unfortunately this also has its hairy +properties; the lwlib code is quite complex. And of course lwlib has to +know at some level what the widget is and how to set its properties. + + The `widget_instance' structure also contains a pointer to the root +of its tree.
Widget instances are further confi + + +File: internals.info, Node: Scrollbars, Next: Menubars, Prev: Generic Widget Interface, Up: Lucid Widget Library + +Scrollbars +---------- + + +File: internals.info, Node: Menubars, Next: Checkboxes and Radio Buttons, Prev: Scrollbars, Up: Lucid Widget Library + +Menubars +-------- + + +File: internals.info, Node: Checkboxes and Radio Buttons, Next: Progress Bars, Prev: Menubars, Up: Lucid Widget Library + +Checkboxes and Radio Buttons ---------------------------- - First, let's review the basic character-related datatypes used by -XEmacs. Note that the separate `typedef's are not mandatory in the -current implementation (all of them boil down to `unsigned char' or -`int'), but they improve clarity of code a great deal, because one -glance at the declaration can tell the intended use of the variable. - -`Emchar' - An `Emchar' holds a single Emacs character. - - Obviously, the equality between characters and bytes is lost in - the Mule world. Characters can be represented by one or more - bytes in the buffer, and `Emchar' is the C type large enough to - hold any character. - - Without Mule support, an `Emchar' is equivalent to an `unsigned - char'. - -`Bufbyte' - The data representing the text in a buffer or string is logically - a set of `Bufbyte's. - - XEmacs does not work with the same character formats all the time; - when reading characters from the outside, it decodes them to an - internal format, and likewise encodes them when writing. - `Bufbyte' (in fact `unsigned char') is the basic unit of XEmacs - internal buffers and strings format. A `Bufbyte *' is the type - that points at text encoded in the variable-width internal - encoding. - - One character can correspond to one or more `Bufbyte's. In the - current Mule implementation, an ASCII character is represented by - the same `Bufbyte', and other characters are represented by a - sequence of two or more `Bufbyte's. 
- - Without Mule support, there are exactly 256 characters, implicitly - Latin-1, and each character is represented using one `Bufbyte', and - there is a one-to-one correspondence between `Bufbyte's and - `Emchar's. - -`Bufpos' -`Charcount' - A `Bufpos' represents a character position in a buffer or string. - A `Charcount' represents a number (count) of characters. - Logically, subtracting two `Bufpos' values yields a `Charcount' - value. Although all of these are `typedef'ed to `EMACS_INT', we - use them in preference to `EMACS_INT' to make it clear what sort - of position is being used. - - `Bufpos' and `Charcount' values are the only ones that are ever - visible to Lisp. - -`Bytind' -`Bytecount' - A `Bytind' represents a byte position in a buffer or string. A - `Bytecount' represents the distance between two positions, in - bytes. The relationship between `Bytind' and `Bytecount' is the - same as the relationship between `Bufpos' and `Charcount'. - -`Extbyte' -`Extcount' - When dealing with the outside world, XEmacs works with `Extbyte's, - which are equivalent to `unsigned char'. Obviously, an `Extcount' - is the distance between two `Extbyte's. Extbytes and Extcounts - are not all that frequent in XEmacs code. + +File: internals.info, Node: Progress Bars, Next: Tab Controls, Prev: Checkboxes and Radio Buttons, Up: Lucid Widget Library + +Progress Bars +------------- + + +File: internals.info, Node: Tab Controls, Prev: Progress Bars, Up: Lucid Widget Library + +Tab Controls +------------ + + +File: internals.info, Node: Index, Prev: Interface to the X Window System, Up: Top + +Index +***** + +* Menu: + +* allocation from frob blocks: Allocation from Frob Blocks. +* allocation of objects in XEmacs Lisp: Allocation of Objects in XEmacs Lisp. +* allocation, introduction to: Introduction to Allocation. +* allocation, low-level: Low-level allocation. +* Amdahl Corporation: XEmacs. +* Andreessen, Marc: XEmacs. 
+* asynchronous subprocesses: Modules for Interfacing with the Operating System. +* bars, progress: Progress Bars. +* Baur, Steve: XEmacs. +* Benson, Eric: Lucid Emacs. +* binding; the specbinding stack; unwind-protects, dynamic: Dynamic Binding; The specbinding Stack; Unwind-Protects. +* bindings, evaluation; stack frames;: Evaluation; Stack Frames; Bindings. +* bit vector: Bit Vector. +* bridge, playing: XEmacs From the Outside. +* Buchholz, Martin: XEmacs. +* Bufbyte: Character-Related Data Types. +* Bufbytes and Emchars: Bufbytes and Emchars. +* buffer lists: Buffer Lists. +* buffer object, the: The Buffer Object. +* buffer, the text in a: The Text in a Buffer. +* buffers and textual representation: Buffers and Textual Representation. +* buffers, introduction to: Introduction to Buffers. +* Bufpos: Character-Related Data Types. +* building, XEmacs from the perspective of: XEmacs From the Perspective of Building. +* buttons, checkboxes and radio: Checkboxes and Radio Buttons. +* byte positions, working with character and: Working With Character and Byte Positions. +* Bytecount: Character-Related Data Types. +* bytecount_to_charcount: Working With Character and Byte Positions. +* Bytind: Character-Related Data Types. +* C code, rules when writing new: Rules When Writing New C Code. +* C vs. Lisp: The Lisp Language. +* callback routines, the event stream: The Event Stream Callback Routines. +* caller-protects (GCPRO rule): Writing Lisp Primitives. +* case table: Modules for Other Aspects of the Lisp Interpreter and Object System. +* catch and throw: Catch and Throw. +* CCL: CCL. +* character and byte positions, working with: Working With Character and Byte Positions. +* character encoding, internal: Internal Character Encoding. +* character sets: Character Sets. +* character sets and encodings, Mule: MULE Character Sets and Encodings. +* character-related data types: Character-Related Data Types. +* characters, integers and: Integers and Characters. 
+* Charcount: Character-Related Data Types. +* charcount_to_bytecount: Working With Character and Byte Positions. +* charptr_emchar: Working With Character and Byte Positions. +* charptr_n_addr: Working With Character and Byte Positions. +* checkboxes and radio buttons: Checkboxes and Radio Buttons. +* closer: Lstream Methods. +* closure: The XEmacs Object System (Abstractly Speaking). +* code, an example of Mule-aware: An Example of Mule-Aware Code. +* code, general guidelines for writing Mule-aware: General Guidelines for Writing Mule-Aware Code. +* code, rules when writing new C: Rules When Writing New C Code. +* coding conventions: A Reader's Guide to XEmacs Coding Conventions. +* coding for Mule: Coding for Mule. +* coding rules, general: General Coding Rules. +* coding rules, naming: A Reader's Guide to XEmacs Coding Conventions. +* command builder, dispatching events; the: Dispatching Events; The Command Builder. +* comments, writing good: Writing Good Comments. +* Common Lisp: The Lisp Language. +* compact_string_chars: compact_string_chars. +* compiled function: Compiled Function. +* compiler, the Lisp reader and: The Lisp Reader and Compiler. +* cons: Cons. +* conservative garbage collection: GCPROing. +* consoles; devices; frames; windows: Consoles; Devices; Frames; Windows. +* consoles; devices; frames; windows, introduction to: Introduction to Consoles; Devices; Frames; Windows. +* control flow modules, editor-level: Editor-Level Control Flow Modules. +* conversion to and from external data: Conversion to and from External Data. +* converting events: Converting Events. +* copy-on-write: General Coding Rules. +* creating Lisp object types: Techniques for XEmacs Developers. +* critical redisplay sections: Critical Redisplay Sections. +* data dumping: Data dumping. +* data types, character-related: Character-Related Data Types. +* DEC_CHARPTR: Working With Character and Byte Positions. 
+* developers, techniques for XEmacs: Techniques for XEmacs Developers. +* devices; frames; windows, consoles;: Consoles; Devices; Frames; Windows. +* devices; frames; windows, introduction to consoles;: Introduction to Consoles; Devices; Frames; Windows. +* Devin, Matthieu: Lucid Emacs. +* dispatching events; the command builder: Dispatching Events; The Command Builder. +* display order of extents: Mathematics of Extent Ordering. +* display-related Lisp objects, modules for other: Modules for other Display-Related Lisp Objects. +* displayable Lisp objects, modules for the basic: Modules for the Basic Displayable Lisp Objects. +* dumping: Dumping. +* dumping address allocation: Address allocation. +* dumping and its justification, what is: Dumping. +* dumping data descriptions: Data descriptions. +* dumping object inventory: Object inventory. +* dumping overview: Overview. +* dumping phase: Dumping phase. +* dumping, data: Data dumping. +* dumping, file loading: Reloading phase. +* dumping, object relocation: Reloading phase. +* dumping, pointers: Pointers dumping. +* dumping, putting back the pdump_opaques: Reloading phase. +* dumping, putting back the pdump_root_objects and pdump_weak_object_chains: Reloading phase. +* dumping, putting back the pdump_root_struct_ptrs: Reloading phase. +* dumping, reloading phase: Reloading phase. +* dumping, remaining issues: Remaining issues. +* dumping, reorganize the hash tables: Reloading phase. +* dumping, the header: The header. +* dynamic array: Low-Level Modules. +* dynamic binding; the specbinding stack; unwind-protects: Dynamic Binding; The specbinding Stack; Unwind-Protects. +* dynamic scoping: The Lisp Language. +* dynamic types: The Lisp Language. +* editing operations, modules for standard: Modules for Standard Editing Operations. +* Emacs 19, GNU: GNU Emacs 19. +* Emacs 20, GNU: GNU Emacs 20. +* Emacs, a history of: A History of Emacs. +* Emchar: Character-Related Data Types. 
+* Emchars, Bufbytes and: Bufbytes and Emchars. +* encoding, internal character: Internal Character Encoding. +* encoding, internal string: Internal String Encoding. +* encodings, internal Mule: Internal Mule Encodings. +* encodings, Mule: Encodings. +* encodings, Mule character sets and: MULE Character Sets and Encodings. +* Energize: Lucid Emacs. +* Epoch <1>: XEmacs. +* Epoch: Lucid Emacs. +* error checking: Techniques for XEmacs Developers. +* EUC (Extended Unix Code), Japanese: Japanese EUC (Extended Unix Code). +* evaluation: Evaluation. +* evaluation; stack frames; bindings: Evaluation; Stack Frames; Bindings. +* event gathering mechanism, specifics of the: Specifics of the Event Gathering Mechanism. +* event loop functions, other: Other Event Loop Functions. +* event loop, events and the: Events and the Event Loop. +* event stream callback routines, the: The Event Stream Callback Routines. +* event, specifics about the Lisp object: Specifics About the Emacs Event. +* events and the event loop: Events and the Event Loop. +* events, converting: Converting Events. +* events, introduction to: Introduction to Events. +* events, main loop: Main Loop. +* events; the command builder, dispatching: Dispatching Events; The Command Builder. +* Extbyte: Character-Related Data Types. +* Extcount: Character-Related Data Types. +* Extended Unix Code, Japanese EUC: Japanese EUC (Extended Unix Code). +* extent fragments: Extent Fragments. +* extent info, format of the: Format of the Extent Info. +* extent mathematics: Mathematics of Extent Ordering. +* extent ordering <1>: Mathematics of Extent Ordering. +* extent ordering: Extent Ordering. +* extents: Extents. +* extents, display order: Mathematics of Extent Ordering. +* extents, introduction to: Introduction to Extents. +* extents, markers and: Markers and Extents. +* extents, zero-length: Zero-Length Extents. +* external data, conversion to and from: Conversion to and from External Data. 
+* external widget: Modules for Interfacing with X Windows. +* faces: Faces. +* file system, modules for interfacing with the: Modules for Interfacing with the File System. +* flusher: Lstream Methods. +* fragments, extent: Extent Fragments. +* frames; windows, consoles; devices;: Consoles; Devices; Frames; Windows. +* frames; windows, introduction to consoles; devices;: Introduction to Consoles; Devices; Frames; Windows. +* Free Software Foundation: A History of Emacs. +* frob blocks, allocation from: Allocation from Frob Blocks. +* FSF: A History of Emacs. +* FSF Emacs <1>: GNU Emacs 20. +* FSF Emacs: GNU Emacs 19. +* function, compiled: Compiled Function. +* garbage collection: Garbage Collection. +* garbage collection - step by step: Garbage Collection - Step by Step. +* garbage collection protection <1>: GCPROing. +* garbage collection protection: Writing Lisp Primitives. +* garbage collection, conservative: GCPROing. +* garbage collection, invocation: Invocation. +* garbage_collect_1: garbage_collect_1. +* gc_sweep: gc_sweep. +* GCPROing: GCPROing. +* global Lisp variables, adding: Adding Global Lisp Variables. +* glyph instantiation: Glyphs. +* glyphs: Glyphs. +* GNU Emacs 19: GNU Emacs 19. +* GNU Emacs 20: GNU Emacs 20. +* Gosling, James <1>: The Lisp Language. +* Gosling, James: Through Version 18. +* Great Usenet Renaming: Through Version 18. +* Hackers (Steven Levy): A History of Emacs. +* header files, inline functions: Techniques for XEmacs Developers. +* hierarchy of windows: Window Hierarchy. +* history of Emacs, a: A History of Emacs. +* Illinois, University of: XEmacs. +* INC_CHARPTR: Working With Character and Byte Positions. +* inline functions: Techniques for XEmacs Developers. +* inline functions, headers: Techniques for XEmacs Developers. +* inside, XEmacs from the: XEmacs From the Inside. +* instantiation, glyph: Glyphs. +* integers and characters: Integers and Characters. +* interactive: Modules for Standard Editing Operations. 
+* interfacing with the file system, modules for: Modules for Interfacing with the File System. +* interfacing with the operating system, modules for: Modules for Interfacing with the Operating System. +* interfacing with X Windows, modules for: Modules for Interfacing with X Windows. +* internal character encoding: Internal Character Encoding. +* internal Mule encodings: Internal Mule Encodings. +* internal string encoding: Internal String Encoding. +* internationalization, modules for: Modules for Internationalization. +* interning: The XEmacs Object System (Abstractly Speaking). +* interpreter and object system, modules for other aspects of the Lisp: Modules for Other Aspects of the Lisp Interpreter and Object System. +* ITS (Incompatible Timesharing System): A History of Emacs. +* Japanese EUC (Extended Unix Code): Japanese EUC (Extended Unix Code). +* Java: The Lisp Language. +* Java vs. Lisp: The Lisp Language. +* JIS7: JIS7. +* Jones, Kyle: XEmacs. +* Kaplan, Simon: XEmacs. +* Levy, Steven: A History of Emacs. +* library, Lucid Widget: Lucid Widget Library. +* line start cache: Line Start Cache. +* Lisp interpreter and object system, modules for other aspects of the: Modules for Other Aspects of the Lisp Interpreter and Object System. +* Lisp language, the: The Lisp Language. +* Lisp modules, basic: Basic Lisp Modules. +* Lisp object types, creating: Techniques for XEmacs Developers. +* Lisp objects are represented in C, how: How Lisp Objects Are Represented in C. +* Lisp objects, allocation of in XEmacs: Allocation of Objects in XEmacs Lisp. +* Lisp objects, modules for other display-related: Modules for other Display-Related Lisp Objects. +* Lisp objects, modules for the basic displayable: Modules for the Basic Displayable Lisp Objects. +* Lisp primitives, writing: Writing Lisp Primitives. +* Lisp reader and compiler, the: The Lisp Reader and Compiler. +* Lisp vs. C: The Lisp Language. +* Lisp vs. Java: The Lisp Language. 
+* low-level allocation: Low-level allocation. +* low-level modules: Low-Level Modules. +* lrecords: lrecords. +* lstream: Modules for Interfacing with the File System. +* lstream functions: Lstream Functions. +* lstream methods: Lstream Methods. +* lstream types: Lstream Types. +* lstream, creating an: Creating an Lstream. +* Lstream_close: Lstream Functions. +* Lstream_fgetc: Lstream Functions. +* Lstream_flush: Lstream Functions. +* Lstream_fputc: Lstream Functions. +* Lstream_fungetc: Lstream Functions. +* Lstream_getc: Lstream Functions. +* Lstream_new: Lstream Functions. +* Lstream_putc: Lstream Functions. +* Lstream_read: Lstream Functions. +* Lstream_reopen: Lstream Functions. +* Lstream_rewind: Lstream Functions. +* Lstream_set_buffering: Lstream Functions. +* Lstream_ungetc: Lstream Functions. +* Lstream_unread: Lstream Functions. +* Lstream_write: Lstream Functions. +* lstreams: Lstreams. +* Lucid Emacs: Lucid Emacs. +* Lucid Inc.: Lucid Emacs. +* Lucid Widget Library: Lucid Widget Library. +* macro hygiene: Techniques for XEmacs Developers. +* main loop: Main Loop. +* mark and sweep: Garbage Collection. +* mark method <1>: lrecords. +* mark method: Modules for Other Aspects of the Lisp Interpreter and Object System. +* mark_object: mark_object. +* marker <1>: Lstream Methods. +* marker: Marker. +* markers and extents: Markers and Extents. +* mathematics of extent ordering: Mathematics of Extent Ordering. +* MAX_EMCHAR_LEN: Working With Character and Byte Positions. +* menubars: Menubars. +* menus: Menus. +* merging attempts: XEmacs. +* MIT: A History of Emacs. +* Mlynarik, Richard: GNU Emacs 19. +* modules for interfacing with the file system: Modules for Interfacing with the File System. +* modules for interfacing with the operating system: Modules for Interfacing with the Operating System. +* modules for interfacing with X Windows: Modules for Interfacing with X Windows. +* modules for internationalization: Modules for Internationalization. 
+* modules for other aspects of the Lisp interpreter and object system: Modules for Other Aspects of the Lisp Interpreter and Object System. +* modules for other display-related Lisp objects: Modules for other Display-Related Lisp Objects. +* modules for regression testing: Modules for Regression Testing. +* modules for standard editing operations: Modules for Standard Editing Operations. +* modules for the basic displayable Lisp objects: Modules for the Basic Displayable Lisp Objects. +* modules for the redisplay mechanism: Modules for the Redisplay Mechanism. +* modules, a summary of the various XEmacs: A Summary of the Various XEmacs Modules. +* modules, basic Lisp: Basic Lisp Modules. +* modules, editor-level control flow: Editor-Level Control Flow Modules. +* modules, low-level: Low-Level Modules. +* MS-Windows environment, widget-glyphs in the: Glyphs. +* Mule character sets and encodings: MULE Character Sets and Encodings. +* Mule encodings: Encodings. +* Mule encodings, internal: Internal Mule Encodings. +* MULE merged XEmacs appears: XEmacs. +* Mule, coding for: Coding for Mule. +* Mule-aware code, an example of: An Example of Mule-Aware Code. +* Mule-aware code, general guidelines for writing: General Guidelines for Writing Mule-Aware Code. +* NAS: Modules for Interfacing with the Operating System. +* native sound: Modules for Interfacing with the Operating System. +* network connections: Modules for Interfacing with the Operating System. +* network sound: Modules for Interfacing with the Operating System. +* Niksic, Hrvoje: XEmacs. +* obarrays: Obarrays. +* object system (abstractly speaking), the XEmacs: The XEmacs Object System (Abstractly Speaking). +* object system, modules for other aspects of the Lisp interpreter and: Modules for Other Aspects of the Lisp Interpreter and Object System. +* object types, creating Lisp: Techniques for XEmacs Developers. +* object, the buffer: The Buffer Object. +* object, the window: The Window Object. 
+* objects are represented in C, how Lisp: How Lisp Objects Are Represented in C.
+* objects in XEmacs Lisp, allocation of: Allocation of Objects in XEmacs Lisp.
+* objects, modules for the basic displayable Lisp: Modules for the Basic Displayable Lisp Objects.
+* operating system, modules for interfacing with the: Modules for Interfacing with the Operating System.
+* outside, XEmacs from the: XEmacs From the Outside.
+* pane: Modules for the Basic Displayable Lisp Objects.
+* permanent objects: The XEmacs Object System (Abstractly Speaking).
+* pi, calculating: XEmacs From the Outside.
+* point: Point.
+* pointers dumping: Pointers dumping.
+* positions, working with character and byte: Working With Character and Byte Positions.
+* primitives, writing Lisp: Writing Lisp Primitives.
+* progress bars: Progress Bars.
+* protection, garbage collection: GCPROing.
+* pseudo_closer: Lstream Methods.
+* Purify: Techniques for XEmacs Developers.
+* Quantify: Techniques for XEmacs Developers.
+* radio buttons, checkboxes and: Checkboxes and Radio Buttons.
+* read syntax: The XEmacs Object System (Abstractly Speaking).
+* read-eval-print: XEmacs From the Outside.
+* reader: Lstream Methods.
+* reader and compiler, the Lisp: The Lisp Reader and Compiler.
+* reader's guide: A Reader's Guide to XEmacs Coding Conventions.
+* redisplay mechanism, modules for the: Modules for the Redisplay Mechanism.
+* redisplay mechanism, the: The Redisplay Mechanism.
+* redisplay piece by piece: Redisplay Piece by Piece.
+* redisplay sections, critical: Critical Redisplay Sections.
+* regression testing, modules for: Modules for Regression Testing.
+* reloading phase: Reloading phase.
+* relocating allocator: Low-Level Modules.
+* rename to XEmacs: XEmacs.
+* represented in C, how Lisp objects are: How Lisp Objects Are Represented in C.
+* rewinder: Lstream Methods.
+* RMS: A History of Emacs.
+* scanner: Modules for Other Aspects of the Lisp Interpreter and Object System.
+* scoping, dynamic: The Lisp Language.
+* scrollbars: Scrollbars.
+* seekable_p: Lstream Methods.
+* selections: Modules for Interfacing with X Windows.
+* set_charptr_emchar: Working With Character and Byte Positions.
+* Sexton, Harlan: Lucid Emacs.
+* sound, native: Modules for Interfacing with the Operating System.
+* sound, network: Modules for Interfacing with the Operating System.
+* SPARCWorks: XEmacs.
+* specbinding stack; unwind-protects, dynamic binding; the: Dynamic Binding; The specbinding Stack; Unwind-Protects.
+* special forms, simple: Simple Special Forms.
+* specifiers: Specifiers.
+* stack frames; bindings, evaluation;: Evaluation; Stack Frames; Bindings.
+* Stallman, Richard: A History of Emacs.
+* string: String.
+* string encoding, internal: Internal String Encoding.
+* subprocesses: Subprocesses.
+* subprocesses, asynchronous: Modules for Interfacing with the Operating System.
+* subprocesses, synchronous: Modules for Interfacing with the Operating System.
+* Sun Microsystems: XEmacs.
+* sweep_bit_vectors_1: sweep_bit_vectors_1.
+* sweep_lcrecords_1: sweep_lcrecords_1.
+* sweep_strings: sweep_strings.
+* symbol: Symbol.
+* symbol values: Symbol Values.
+* symbols and variables: Symbols and Variables.
+* symbols, introduction to: Introduction to Symbols.
+* synchronous subprocesses: Modules for Interfacing with the Operating System.
+* tab controls: Tab Controls.
+* taxes, doing: XEmacs From the Outside.
+* techniques for XEmacs developers: Techniques for XEmacs Developers.
+* TECO: A History of Emacs.
+* temporary objects: The XEmacs Object System (Abstractly Speaking).
+* testing, regression: Regression Testing XEmacs.
+* text in a buffer, the: The Text in a Buffer.
+* textual representation, buffers and: Buffers and Textual Representation.
+* Thompson, Chuck: XEmacs.
+* throw, catch and: Catch and Throw.
+* types, dynamic: The Lisp Language.
+* types, lstream: Lstream Types.
+* types, proper use of unsigned: Proper Use of Unsigned Types.
+* University of Illinois: XEmacs.
+* unsigned types, proper use of: Proper Use of Unsigned Types.
+* unwind-protects, dynamic binding; the specbinding stack;: Dynamic Binding; The specbinding Stack; Unwind-Protects.
+* values, symbol: Symbol Values.
+* variables, adding global Lisp: Adding Global Lisp Variables.
+* variables, symbols and: Symbols and Variables.
+* vector: Vector.
+* vector, bit: Bit Vector.
+* version 18, through: Through Version 18.
+* version 19, GNU Emacs: GNU Emacs 19.
+* version 20, GNU Emacs: GNU Emacs 20.
+* widget interface, generic: Generic Widget Interface.
+* widget library, Lucid: Lucid Widget Library.
+* widget-glyphs: Glyphs.
+* widget-glyphs in the MS-Windows environment: Glyphs.
+* widget-glyphs in the X environment: Glyphs.
+* Win-Emacs: XEmacs.
+* window (in Emacs): Modules for the Basic Displayable Lisp Objects.
+* window hierarchy: Window Hierarchy.
+* window object, the: The Window Object.
+* window point internals: The Window Object.
+* windows, consoles; devices; frames;: Consoles; Devices; Frames; Windows.
+* windows, introduction to consoles; devices; frames;: Introduction to Consoles; Devices; Frames; Windows.
+* Wing, Ben: XEmacs.
+* writer: Lstream Methods.
+* writing good comments: Writing Good Comments.
+* writing Lisp primitives: Writing Lisp Primitives.
+* writing Mule-aware code, general guidelines for: General Guidelines for Writing Mule-Aware Code.
+* writing new C code, rules when: Rules When Writing New C Code.
+* X environment, widget-glyphs in the: Glyphs.
+* X Window System, interface to the: Interface to the X Window System.
+* X Windows, modules for interfacing with: Modules for Interfacing with X Windows.
+* XEmacs: XEmacs.
+* XEmacs from the inside: XEmacs From the Inside.
+* XEmacs from the outside: XEmacs From the Outside.
+* XEmacs from the perspective of building: XEmacs From the Perspective of Building.
+* XEmacs goes it alone: XEmacs.
+* XEmacs object system (abstractly speaking), the: The XEmacs Object System (Abstractly Speaking).
+* Zawinski, Jamie: Lucid Emacs.
+* zero-length extents: Zero-Length Extents.
+