+\1f
+File: internals.info, Node: The XEmacs Object System (Abstractly Speaking), Next: How Lisp Objects Are Represented in C, Prev: XEmacs From the Inside, Up: Top
+
+The XEmacs Object System (Abstractly Speaking)
+**********************************************
+
+At the heart of the Lisp interpreter is its management of objects.
+XEmacs Lisp contains many built-in objects, some of which are simple
+and others of which can be very complex; and some of which are very
+common, and others of which are rarely used or are only used
+internally. (Since the Lisp allocation system, with its automatic
+reclamation of unused storage, is so much more convenient than
+`malloc()' and `free()', the C code makes extensive use of it in its
+internal operations.)
+
+ The basic Lisp objects are
+
+`integer'
+ 28 or 31 bits of precision, or 60 or 63 bits on 64-bit machines;
+ the reason for this is described below when the internal Lisp
+ object representation is described.
+
+`float'
+ Same precision as a double in C.
+
+`cons'
+ A simple container for two Lisp objects, used to implement lists
+ and most other data structures in Lisp.
+
+`char'
+ An object representing a single character of text; chars behave
+ like integers in many ways but are logically considered text
+ rather than numbers and have a different read syntax. (The read
+ syntax for a char contains the char itself or some textual
+ encoding of it--for example, a Japanese Kanji character might be
+ encoded as `^[$(B#&^[(B' using the ISO-2022 encoding
+ standard--rather than the numerical representation of the char;
+ this way, if the mapping between chars and integers changes, which
+ is quite possible for Kanji characters and other extended
+ characters, the same character will still be created. Note that
+ some primitives confuse chars and integers. The worst culprit is
+ `eq', which makes a special exception and considers a char to be
+ `eq' to its integer equivalent, even though in no other case are
+ objects of two different types `eq'. The reason for this
+ monstrosity is compatibility with existing code; the separation of
+ char from integer came fairly recently.)
+
+`symbol'
+ An object that contains Lisp objects and is referred to by name;
+ symbols are used to implement variables and named functions and to
+ provide the equivalent of preprocessor constants in C.
+
+`vector'
+ A one-dimensional array of Lisp objects providing constant-time
+ access to any of the objects; access to an arbitrary object in a
+ vector is faster than for lists, but the operations that can be
+ done on a vector are more limited.
+
+`string'
+ Self-explanatory; behaves much like a vector of chars but has a
+ different read syntax and is stored and manipulated more compactly.
+
+`bit-vector'
+ A vector of bits; similar to a string in spirit.
+
+`compiled-function'
+ An object containing compiled Lisp code, known as "byte code".
+
+`subr'
+ A Lisp primitive, i.e. a Lisp-callable function implemented in C.
+
+ Note that there is no basic "function" type, as in more powerful
+versions of Lisp (where it's called a "closure"). XEmacs Lisp does not
+provide the closure semantics implemented by Common Lisp and Scheme.
+The guts of a function in XEmacs Lisp are represented in one of four
+ways: a symbol specifying another function (when one function is an
+alias for another), a list (whose first element must be the symbol
+`lambda') containing the function's source code, a compiled-function
+object, or a subr object. (In other words, given a symbol specifying
+the name of a function, calling `symbol-function' to retrieve the
+contents of the symbol's function cell will return one of these types
+of objects.)
+
+ XEmacs Lisp also contains numerous specialized objects used to
+implement the editor:
+
+`buffer'
+ Stores text like a string, but is optimized for insertion and
+ deletion and has certain other properties that can be set.
+
+`frame'
+ An object with various properties whose displayable representation
+ is a "window" in window-system parlance.
+
+`window'
+ A section of a frame that displays the contents of a buffer; often
+ called a "pane" in window-system parlance.
+
+`window-configuration'
+ An object that represents a saved configuration of windows in a
+ frame.
+
+`device'
+ An object representing a screen on which frames can be displayed;
+ equivalent to a "display" in the X Window System and a "TTY" in
+ character mode.
+
+`face'
+ An object specifying the appearance of text or graphics; it has
+ properties such as font, foreground color, and background color.
+
+`marker'
+ An object that refers to a particular position in a buffer and
+ moves around as text is inserted and deleted to stay in the same
+ relative position to the text around it.
+
+`extent'
+ Similar to a marker but covers a range of text in a buffer; can
+ also specify properties of the text, such as a face in which the
+ text is to be displayed, whether the text is invisible or
+ unmodifiable, etc.
+
+`event'
+ Generated by calling `next-event' and contains information
+ describing a particular event happening in the system, such as the
+ user pressing a key or a process terminating.
+
+`keymap'
+ An object that maps from events (described using lists, vectors,
+ and symbols rather than with an event object because the mapping
+ is for classes of events, rather than individual events) to
+ functions to execute or other events to recursively look up; the
+ functions are described by name, using a symbol, or using lists to
+ specify the function's code.
+
+`glyph'
+ An object that describes the appearance of an image (e.g. pixmap)
+ on the screen; glyphs can be attached to the beginning or end of
+ extents and in some future version of XEmacs will be able to be
+ inserted directly into a buffer.
+
+`process'
+ An object that describes a connection to an externally-running
+ process.
+
+ There are some other, less-commonly-encountered general objects:
+
+`hash-table'
+ An object that maps from an arbitrary Lisp object to another
+ arbitrary Lisp object, using hashing for fast lookup.
+
+`obarray'
+ A limited form of hash-table that maps from strings to symbols;
+ obarrays are used to look up a symbol given its name and are not
+ actually their own object type but are kludgily represented using
+ vectors with hidden fields (this representation derives from GNU
+ Emacs).
+
+`specifier'
+ A complex object used to specify the value of a display property; a
+ default value is given and different values can be specified for
+ particular frames, buffers, windows, devices, or classes of device.
+
+`char-table'
+ An object that maps from chars or classes of chars to arbitrary
+ Lisp objects; internally char tables use a complex nested-vector
+ representation that is optimized to the way characters are
+ represented as integers.
+
+`range-table'
+ An object that maps from ranges of integers to arbitrary Lisp
+ objects.
+
+ And some strange special-purpose objects:
+
+`charset'
+`coding-system'
+ Objects used when MULE, or multi-lingual/Asian-language, support is
+ enabled.
+
+`color-instance'
+`font-instance'
+`image-instance'
+ Objects that encapsulate a window-system resource; instances are
+ mostly used internally but are exposed on the Lisp level for
+ cleanness of the specifier model and because it's occasionally
+ useful for a Lisp program to create or query the properties of
+ instances.
+
+`subwindow'
+ An object that encapsulates a "subwindow" resource, i.e. a
+ window-system child window that is drawn into by an external
+ process; this object should be integrated into the glyph system
+ but isn't yet, and may change form when this is done.
+
+`tooltalk-message'
+`tooltalk-pattern'
+ Objects that represent resources used in the ToolTalk interprocess
+ communication protocol.
+
+`toolbar-button'
+ An object used in conjunction with the toolbar.
+
+ And objects that are only used internally:
+
+`opaque'
+ A generic object for encapsulating arbitrary memory; this allows
+ you the generality of `malloc()' and the convenience of the Lisp
+ object system.
+
+`lstream'
+ A buffering I/O stream, used to provide a unified interface to
+ anything that can accept output or provide input, such as a file
+ descriptor, a stdio stream, a chunk of memory, a Lisp buffer, a
+ Lisp string, etc.; it's a Lisp object to make its memory
+ management more convenient.
+
+`char-table-entry'
+ Subsidiary objects in the internal char-table representation.
+
+`extent-auxiliary'
+`menubar-data'
+`toolbar-data'
+ Various special-purpose objects that are basically just used to
+ encapsulate memory for particular subsystems, similar to the more
+ general "opaque" object.
+
+`symbol-value-forward'
+`symbol-value-buffer-local'
+`symbol-value-varalias'
+`symbol-value-lisp-magic'
+ Special internal-only objects that are placed in the value cell of
+ a symbol to indicate that there is something special with this
+ variable - e.g. it has no value, it mirrors another variable, or
+ it mirrors some C variable; there is really only one kind of
+ object, called a "symbol-value-magic", but it is sort-of halfway
+ kludged into semi-different object types.
+
+ Some types of objects are "permanent", meaning that once created,
+they do not disappear until explicitly destroyed, using a function such
+as `delete-buffer', `delete-window', `delete-frame', etc. Others will
+disappear once they are no longer used, through the garbage collection
+mechanism. Buffers, frames, windows, devices, and processes are among
+the objects that are permanent. Note that some objects can go both
+ways: Faces can be created either way; extents are normally permanent,
+but detached extents (extents not referring to any text, as happens to
+some extents when the text they are referring to is deleted) are
+temporary. Note that some permanent objects, such as faces and coding
+systems, cannot be deleted. Note also that windows are unique in that
+they can be _undeleted_ after having previously been deleted. (This
+happens as a result of restoring a window configuration.)
+
+ Note that many types of objects have a "read syntax", i.e. a way of
+specifying an object of that type in Lisp code. When you load a Lisp
+file, or type in code to be evaluated, what really happens is that the
+function `read' is called, which reads some text and creates an object
+based on the syntax of that text; then `eval' is called, which possibly
+does something special; then this loop repeats until there's no more
+text to read. (`eval' only actually does something special with
+symbols, which causes the symbol's value to be returned, similar to
+referencing a variable; and with conses [i.e. lists], which cause a
+function invocation. All other values are returned unchanged.)
+
+ The read syntax
+
+ 17297
+
+ converts to an integer whose value is 17297.
+
+ 1.983e-4
+
+ converts to a float whose value is 1.983e-4, or .0001983.
+
+ ?b
+
+ converts to a char that represents the lowercase letter b.
+
+ ?^[$(B#&^[(B
+
+ (where `^[' actually is an `ESC' character) converts to a particular
+Kanji character when using an ISO2022-based coding system for input.
+(To decode this goo: `ESC' begins an escape sequence; `ESC $ (' is a
+class of escape sequences meaning "switch to a 94x94 character set";
+`ESC $ ( B' means "switch to Japanese Kanji"; `#' and `&' collectively
+index into a 94-by-94 array of characters [subtract 33 from the ASCII
+value of each character to get the corresponding index]; `ESC (' is a
+class of escape sequences meaning "switch to a 94 character set"; `ESC
+(B' means "switch to US ASCII". It is a coincidence that the letter
+`B' is used to denote both Japanese Kanji and US ASCII. If the first
+`B' were replaced with an `A', you'd be requesting a Chinese Hanzi
+character from the GB2312 character set.)
+
+ "foobar"
+
+ converts to a string.
+
+ foobar
+
+ converts to a symbol whose name is `"foobar"'. This is done by
+looking up the string equivalent in the global variable `obarray',
+whose contents should be an obarray. If no symbol is found, a new
+symbol with the name `"foobar"' is automatically created and added to
+`obarray'; this process is called "interning" the symbol.
+
+ (foo . bar)
+
+ converts to a cons cell containing the symbols `foo' and `bar'.
+
+ (1 a 2.5)
+
+ converts to a three-element list containing the specified objects
+(note that a list is actually a set of nested conses; see the XEmacs
+Lisp Reference).
+
+ [1 a 2.5]
+
+ converts to a three-element vector containing the specified objects.
+
+ #[... ... ... ...]
+
+ converts to a compiled-function object (the actual contents are not
+shown since they are not relevant here; look at a file that ends with
+`.elc' for examples).
+
+ #*01110110
+
+ converts to a bit-vector.
+
+ #s(hash-table ... ...)
+
+ converts to a hash table (the actual contents are not shown).
+
+ #s(range-table ... ...)
+
+ converts to a range table (the actual contents are not shown).
+
+ #s(char-table ... ...)
+
+ converts to a char table (the actual contents are not shown).
+
+ Note that the `#s()' syntax is the general syntax for structures,
+which are not really implemented in XEmacs Lisp but should be.
+
+ When an object is printed out (using `print' or a related function),
+the read syntax is used, so that the same object can be read in again.
+
+ The other objects do not have read syntaxes, usually because it does
+not really make sense to create them in this fashion (e.g. processes,
+where it doesn't make sense to have a subprocess created as a side
+effect of reading some Lisp code), or because they can't be created at
+all (e.g. subrs). Permanent objects, as a rule, do not have a read
+syntax; nor do most complex objects, which contain too much state to be
+easily initialized through a read syntax.
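+
+ The interning step described above for the symbol read syntax can be
+sketched in C. This is only an illustration, not the XEmacs
+implementation: the names `toy_symbol', `toy_obarray', `toy_hash' and
+`toy_intern' are hypothetical, and a fixed-size chained hash table
+stands in for the real obarray.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy symbol: only a name and a hash-chain link. */
struct toy_symbol {
    char *name;
    struct toy_symbol *next;
};

#define TOY_OBARRAY_SIZE 64

/* Stand-in for the global `obarray': a fixed-size array of buckets. */
struct toy_symbol *toy_obarray[TOY_OBARRAY_SIZE];

/* Simple string hash, reduced to a bucket index. */
unsigned toy_hash(const char *s)
{
    unsigned h = 5381;
    while (*s)
        h = h * 33u + (unsigned char)*s++;
    return h % TOY_OBARRAY_SIZE;
}

/* Return the unique symbol named NAME, creating it on first use. */
struct toy_symbol *toy_intern(const char *name)
{
    unsigned h = toy_hash(name);
    struct toy_symbol *sym;

    for (sym = toy_obarray[h]; sym != NULL; sym = sym->next)
        if (strcmp(sym->name, name) == 0)
            return sym;                 /* already interned */

    sym = malloc(sizeof *sym);
    if (sym == NULL)
        abort();
    sym->name = malloc(strlen(name) + 1);
    if (sym->name == NULL)
        abort();
    strcpy(sym->name, name);
    sym->next = toy_obarray[h];         /* add to the front of the bucket */
    toy_obarray[h] = sym;
    return sym;
}
```

+
+ Reading the text `foobar' twice corresponds to calling
+`toy_intern ("foobar")' twice; both calls return the same pointer,
+which is why two occurrences of a symbol name yield the same object.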
+
+\1f
+File: internals.info, Node: How Lisp Objects Are Represented in C, Next: Rules When Writing New C Code, Prev: The XEmacs Object System (Abstractly Speaking), Up: Top
+
+How Lisp Objects Are Represented in C
+*************************************
+
+Lisp objects are represented in C using a 32-bit or 64-bit machine word
+(depending on the processor; e.g. DEC Alphas use 64-bit Lisp objects and
+most other processors use 32-bit Lisp objects). The representation
+stuffs a pointer together with a tag, as follows:
+
+ [ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ]
+ [ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ]
+
+   <---------------------------------------------------------> <->
+             a pointer to a structure, or an integer            tag
+
+ A tag of 00 is used for all pointer object types, a tag of 10 is used
+for characters, and the other two tags 01 and 11 are joined together to
+form the integer object type. This representation gives us 31-bit
+integers and 30-bit characters, while pointers are represented directly
+without any bit masking or shifting. This representation, though,
+assumes that pointers to structs are always aligned to multiples of 4,
+so the lower 2 bits are always zero.
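+
+ The tagging scheme just described can be modelled in a few lines of
+C. This is a sketch under stated assumptions, not the real `lisp.h'
+code: `toy_object' and all `toy_' functions are hypothetical names,
+and `uintptr_t' stands in for the machine word.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t toy_object;   /* hypothetical stand-in for Lisp_Object */

/* A sample heap-style record; structs are assumed 4-byte aligned. */
struct toy_record { int type; };
struct toy_record toy_example_record;

/* Low two bits: 00 = pointer, 10 = character, 01 and 11 = integer. */
int toy_is_int(toy_object o)     { return (o & 1u) == 1u; }
int toy_is_char(toy_object o)    { return (o & 3u) == 2u; }
int toy_is_pointer(toy_object o) { return (o & 3u) == 0u; }

/* Integers keep all but one bit: the value is shifted left by one. */
toy_object toy_make_int(intptr_t n)
{
    return ((uintptr_t)n << 1) | 1u;
}

/* Characters lose two bits to the tag 10. */
toy_object toy_make_char(uint32_t c)
{
    return ((uintptr_t)c << 2) | 2u;
}

/* Pointers are stored as-is: no masking or shifting. */
toy_object toy_wrap_pointer(void *p)
{
    return (uintptr_t)p;
}

uint32_t toy_char_value(toy_object o) { return (uint32_t)(o >> 2); }
void *toy_pointer_value(toy_object o) { return (void *)o; }
```

+
+ Because both integer tags end in a 1 bit, a single test of the lowest
+bit identifies an integer, and the integer 5 is stored as the word 11.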
+
+ Lisp objects use the typedef `Lisp_Object', but the actual C type
+used for the Lisp object can vary. It can be either a simple type
+(`long' on the DEC Alpha, `int' on other machines) or a structure whose
+fields are bit fields that line up properly (actually, a union of
+structures is used). The choice of which type to use is determined by
+the preprocessor constant `USE_UNION_TYPE' which is defined via the
+`--use-union-type' option to `configure'.
+
+ Generally the simple integral type is preferable because it ensures
+that the compiler will actually use a machine word to represent the
+object (some compilers will use more general and less efficient code
+for unions and structs even if they can fit in a machine word). The
+union type, however, has the advantage of stricter _static_ type
+checking. Places where a `Lisp_Object' is mistakenly passed to a
+routine expecting an `int' (or vice-versa), or where a check is written
+`if (foo)' (instead of `if (!NILP (foo))'), will be flagged as errors.
+(None of these lead to the expected results! `Qnil' is not represented
+as 0 (so `if (foo)' will *ALWAYS* be true for a `Lisp_Object'), and the
+representation of an integer as a `Lisp_Object' is not just the
+integer's numeric value, but usually 2x the integer +/- 1.)
+
+ There used to be a claim that the union type simplified debugging.
+There may have been a grain of truth to this pre-19.8, when there was no
+`lrecord' type and all objects had a separate type appearing in the
+tag. Nowadays, however, there is no debugging gain, and in fact
+frequent debugging *_loss_*, since many debuggers don't handle unions
+very well, and usually there is no way to directly specify a union from
+a debugging prompt.
+
+ Furthermore, release builds should *_not_* be done with union type
+because (a) you may get less efficiency, with compilers that can't
+figure out how to optimize the union into a machine word; (b) even
+worse, the union type often triggers miscompilation, especially when
+combined with Mule and error-checking. This has been the case at
+various times when using GCC and MS VC, at least with `--pdump'.
+Therefore, be warned!
+
+ As of 2002 4Q, miscompilation is known to happen with current
+versions of *Microsoft VC++* and *GCC in combination with Mule, pdump,
+and KKCC* (no error checking).
+
+ Various macros are used to convert between Lisp_Objects and the
+corresponding C type. Macros of the form `XINT()', `XCHAR()',
+`XSTRING()', `XSYMBOL()', do any required bit shifting and/or masking
+and cast it to the appropriate type. `XINT()' needs to be a bit tricky
+so that negative numbers are properly sign-extended. Since integers
+are stored left-shifted, if the right-shift operator does an arithmetic
+shift (i.e. it leaves the most-significant bit as-is rather than
+shifting in a zero, so that it mimics a divide-by-two even for negative
+numbers) the shift to remove the tag bit is enough. This is the case
+on all the systems we support.
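+
+ The sign-extension point can be made concrete with a small sketch.
+The names `toy_encode_int', `toy_xint_shift' and `toy_xint_logical'
+are hypothetical, and the encoding assumed here is "value shifted left
+by one, with a 1 in the lowest bit". Note that in C the result of
+right-shifting a negative signed value is implementation-defined; the
+sketch assumes, as the text does, a compiler that shifts
+arithmetically.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoding: value in the upper bits, tag bit 1 at the bottom. */
uintptr_t toy_encode_int(intptr_t n)
{
    return ((uintptr_t)n << 1) | 1u;
}

/* Decode with a signed (arithmetic) right shift.  The top bit is
   replicated, so negative numbers stay negative.  This relies on
   implementation-defined behaviour that common compilers provide. */
intptr_t toy_xint_shift(uintptr_t obj)
{
    return (intptr_t)obj >> 1;
}

/* Decode with an unsigned (logical) right shift.  A zero is shifted
   in at the top, so negative numbers come back as large positive ones. */
intptr_t toy_xint_logical(uintptr_t obj)
{
    return (intptr_t)(obj >> 1);
}
```

+
+ With an arithmetic shift, -5 survives the round trip; with a logical
+shift it does not, which is why an `XINT()'-style macro has to be
+careful about which kind of shift it gets.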
+
+ Note that when `ERROR_CHECK_TYPECHECK' is defined, the converter
+macros become more complicated--they check the tag bits and/or the type
+field in the first four bytes of a record type to ensure that the
+object is really of the correct type. This is great for catching places
+where an incorrect type is being dereferenced--this typically results
+in a pointer being dereferenced as the wrong type of structure, with
+unpredictable (and sometimes not easily traceable) results.
+
+ There are similar `XSETTYPE()' macros that construct a Lisp object.
+These macros are of the form `XSETTYPE (LVALUE, RESULT)', i.e. they
+have to be a statement rather than just used in an expression. The
+reason for this is that standard C doesn't let you "construct" a
+structure (but GCC does). Granted, this sometimes isn't too
+convenient; for the case of integers, at least, you can use the
+function `make_int()', which constructs and _returns_ an integer Lisp
+object. Note that the `XSETTYPE()' macros are also affected by
+`ERROR_CHECK_TYPECHECK' and make sure that the structure is of the
+right type in the case of record types, where the type is contained in
+the structure.
+
+ The C programmer is responsible for *guaranteeing* that a
+Lisp_Object is the correct type before using the `XTYPE' macros. This
+is especially important in the case of lists. Use `XCAR' and `XCDR' if
+a Lisp_Object is certainly a cons cell, else use `Fcar()' and `Fcdr()'.
+Trust other C code, but not Lisp code. On the other hand, if XEmacs
+has an internal logic error, it's better to crash immediately, so
+sprinkle `assert()'s and "unreachable" `abort()'s liberally about the
+source code. Where performance is an issue, use `type_checking_assert',
+`bufpos_checking_assert', and `gc_checking_assert', which do nothing
+unless the corresponding configure error checking flag was specified.
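+
+ The difference between the unchecked `XCAR' style and the checked
+`Fcar()' style can be sketched as follows. Everything here is a toy
+model with hypothetical names (`toy_object', `TOY_XCAR', `toy_Fcar',
+`toy_error_count'); in particular, a counter stands in for signalling
+a Lisp error.

```c
#include <assert.h>
#include <stddef.h>

enum toy_type { TOY_NIL, TOY_INT, TOY_CONS };

struct toy_object {
    enum toy_type type;
    long value;                     /* used when type == TOY_INT  */
    struct toy_object *car, *cdr;   /* used when type == TOY_CONS */
};

int toy_error_count;   /* stand-in for signalling a Lisp error */

/* Sample objects: nil, the integer 1, and the cons (1 . nil). */
struct toy_object toy_nil  = { TOY_NIL,  0, NULL, NULL };
struct toy_object toy_one  = { TOY_INT,  1, NULL, NULL };
struct toy_object toy_pair = { TOY_CONS, 0, &toy_one, &toy_nil };

/* Unchecked, in the style of an X macro: the caller guarantees
   that OBJ is a cons.  Applied to anything else it reads garbage. */
#define TOY_XCAR(obj) ((obj)->car)

/* Checked, in the style of an F function: safe on untrusted data. */
struct toy_object *toy_Fcar(struct toy_object *obj)
{
    if (obj->type == TOY_NIL)
        return obj;                 /* (car nil) is nil */
    if (obj->type != TOY_CONS) {
        toy_error_count++;          /* real code would signal an error */
        return NULL;
    }
    return TOY_XCAR(obj);           /* type has now been verified */
}
```

+
+ The checked function pays for a type test on every call; the macro
+does not, which is why it is reserved for objects whose type the C
+code has already established.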
+
+\1f
+File: internals.info, Node: Rules When Writing New C Code, Next: Regression Testing XEmacs, Prev: How Lisp Objects Are Represented in C, Up: Top
+
+Rules When Writing New C Code
+*****************************
+
+The XEmacs C code is extremely complex and intricate, and there are many
+rules that are more or less consistently followed throughout the code.
+Many of these rules are not obvious, so they are explained here. It is
+of the utmost importance that you follow them. If you don't, you may
+get something that appears to work, but which will crash in odd
+situations, often in code far away from where the actual breakage is.
+
+* Menu:
+
+* A Reader's Guide to XEmacs Coding Conventions::
+* General Coding Rules::
+* Writing Lisp Primitives::
+* Writing Good Comments::
+* Adding Global Lisp Variables::
+* Proper Use of Unsigned Types::
+* Coding for Mule::
+* Techniques for XEmacs Developers::
+
+\1f
+File: internals.info, Node: A Reader's Guide to XEmacs Coding Conventions, Next: General Coding Rules, Up: Rules When Writing New C Code
+
+A Reader's Guide to XEmacs Coding Conventions
+=============================================
+
+Of course the low-level implementation language of XEmacs is C, but much
+of that uses the Lisp engine to do its work. However, because the code
+is "inside" of the protective containment shell around the "reactor
+core," you'll see lots of complex "plumbing" needed to do the work and
+"safety mechanisms," whose failure results in a meltdown. This section
+provides a quick overview (or review) of the various components of the
+implementation of Lisp objects.
+
+ Two typographic conventions help to identify C objects that implement
+Lisp objects. The first is that capitalized identifiers, especially
+beginning with the letters `Q', `V', `F', and `S', for C variables and
+functions, and C macros beginning with the letter `X', are used to
+implement Lisp. The second is that where Lisp uses the hyphen `-' in
+symbol names, the corresponding C identifiers use the underscore `_'.
+Of course, since XEmacs Lisp contains interfaces to many external
+libraries, those external names will follow the coding conventions
+their authors chose, and may overlap the "XEmacs name space." However,
+these cases are usually pretty obvious.
+
+ All Lisp objects are handled indirectly. The `Lisp_Object' type is
+usually a pointer to a structure, except for a very small number of
+types with immediate representations (currently characters and
+integers). However, these types cannot be directly operated on in C
+code, either, so they can also be considered indirect. Types that do
+not have an immediate representation always have a C typedef
+`Lisp_TYPE' for a corresponding structure.
+
+ In older code, it was common practice to pass around pointers to
+`Lisp_TYPE', but this is now deprecated in favor of using `Lisp_Object'
+for all function arguments and return values that are Lisp objects.
+The `XTYPE' macro is used to extract the pointer and cast it to
+`(Lisp_TYPE *)' for the desired type.
+
+ *Convention*: macros whose names begin with `X' operate on
+`Lisp_Object's and do no type-checking. Many such macros are type
+extractors, but others implement Lisp operations in C (_e.g._, `XCAR'
+implements the Lisp `car' function). These are unsafe, and must only
+be used where types of all data have already been checked. Such macros
+are only applied to `Lisp_Object's. In internal implementations where
+the pointer has already been converted, the structure is operated on
+directly using the C `->' member access operator.
+
+ The `TYPEP', `CHECK_TYPE', and `CONCHECK_TYPE' macros are used to
+test types. The first returns a Boolean value, and the latter two signal
+errors. (The `CONCHECK' variety allows execution to be CONtinued under
+some circumstances, thus the name.) Functions which expect to be
+passed user data invariably call `CHECK' macros on arguments.
+
+ There are many types of specialized Lisp objects implemented in C,
+but the most pervasive type is the "symbol". Symbols are used as
+identifiers, variables, and functions.
+
+ *Convention*: Global variables whose names begin with `Q' are
+constants whose value is a symbol. The name of the variable should be
+derived from the name of the symbol using the same rules as for Lisp
+primitives. Such variables allow the C code to check whether a
+particular `Lisp_Object' is equal to a given symbol. Symbols are Lisp
+objects, so these variables may be passed to Lisp primitives. (An
+alternative to the use of `Q...' variables is to call the `intern'
+function at initialization in the `vars_of_MODULE' function, which is
+hardly less efficient.)
+
+ *Convention*: Global variables whose names begin with `V' are
+variables that contain Lisp objects. The convention here is that all
+global variables of type `Lisp_Object' begin with `V', and no others do
+(not even integer and boolean variables that have Lisp equivalents).
+Most of the time, these variables have equivalents in Lisp, which are
+defined via the `DEFVAR' family of macros, but some don't. Since the
+variable's value is a `Lisp_Object', it can be passed to Lisp
+primitives.
+
+ The implementation of Lisp primitives is more complex.
+*Convention*: Global variables with names beginning with `S' contain a
+structure that allows the Lisp engine to identify and call a C
+function. In modern versions of XEmacs, these identifiers are almost
+always completely hidden in the `DEFUN' and `SUBR' macros, but you will
+encounter them if you look at very old versions of XEmacs or at GNU
+Emacs. *Convention*: Functions with names beginning with `F' implement
+Lisp primitives. Of course all their arguments and their return values
+must be Lisp_Objects. (This is hidden in the `DEFUN' macro.)
+
+\1f
+File: internals.info, Node: General Coding Rules, Next: Writing Lisp Primitives, Prev: A Reader's Guide to XEmacs Coding Conventions, Up: Rules When Writing New C Code
+
+General Coding Rules
+====================
+
+The C code is actually written in a dialect of C called "Clean C",
+meaning that it can be compiled, mostly warning-free, with either a C or
+C++ compiler. Coding in Clean C has several advantages over plain C.
+C++ compilers are more nit-picking, and a number of coding errors have
+been found by compiling with C++. The ability to use both C and C++
+tools means that a greater variety of development tools is available to
+the developer.
+
+ Every module includes `<config.h>' (angle brackets so that
+`--srcdir' works correctly; `config.h' may or may not be in the same
+directory as the C sources) and `lisp.h'. `config.h' must always be
+included before any other header files (including system header files)
+to ensure that certain tricks played by various `s/' and `m/' files
+work out correctly.
+
+ When including header files, always use angle brackets, not double
+quotes, except when the file to be included is always in the same
+directory as the including file. If either file is a generated file,
+then that is not likely to be the case. In order to understand why we
+have this rule, imagine what happens when you do a build in the source
+directory using `./configure' and another build in another directory
+using `../work/configure'. There will be two different `config.h'
+files. Which one will be used if you `#include "config.h"'?
+
+ Almost every module contains a `syms_of_*()' function and a
+`vars_of_*()' function. The former declares any Lisp primitives you
+have defined and defines any symbols you will be using. The latter
+declares any global Lisp variables you have added and initializes global
+C variables in the module. *Important*: There are stringent
+requirements on exactly what can go into these functions. See the
+comment in `emacs.c'. The reason for this is to avoid obscure unwanted
+interactions during initialization. If you don't follow these rules,
+you'll be sorry! If you want to do anything that isn't allowed, create
+a `complex_vars_of_*()' function for it. Doing this is tricky, though:
+you have to make sure your function is called at the right time so that
+all the initialization dependencies work out.
+
+ Declare each function of these kinds in `symsinit.h'. Make sure
+it's called in the appropriate place in `emacs.c'. You never need to
+include `symsinit.h' directly, because it is included by `lisp.h'.
+
+ *All global and static variables that are to be modifiable must be
+declared uninitialized.* This means that you may not use the "declare
+with initializer" form for these variables, such as `int some_variable
+= 0;'. The reason for this has to do with some kludges done during the
+dumping process: If possible, the initialized data segment is re-mapped
+so that it becomes part of the (unmodifiable) code segment in the
+dumped executable. This allows this memory to be shared among multiple
+running XEmacs processes. XEmacs is careful to place as much constant
+data as possible into initialized variables during the `temacs' phase.
+
+ *Please note:* This kludge only works on a few systems nowadays, and
+is rapidly becoming irrelevant because most modern operating systems
+provide "copy-on-write" semantics. All data is initially shared
+between processes, and a private copy is automatically made (on a
+page-by-page basis) when a process first attempts to write to a page of
+memory.
+
+ Formerly, there was a requirement that static variables not be
+declared inside of functions. This had to do with another hack in the
+same vein as what was just described: old USG systems put
+statically-declared variables in the initialized data space, so the
+header files for those systems had a `#define static' declaration.
+(That way, the data-segment remapping described above could still
+work.) This fails badly on static variables inside of functions, which
+suddenly become automatic variables; therefore, you weren't supposed to
+have any of them. This awful kludge has been removed in XEmacs because
+
+ 1. almost all of the systems that used this kludge ended up having to
+ disable the data-segment remapping anyway;
+
+ 2. the only systems that didn't were extremely outdated ones;
+
+ 3. this hack completely messed up inline functions.
+
+ The C source code makes heavy use of C preprocessor macros. One
+popular macro style is:
+
+ #define FOO(var, value) do {            \
+   Lisp_Object FOO_value = (value);      \
+   ... /* compute using FOO_value */     \
+   (var) = bar;                          \
+ } while (0)
+
+ The `do {...} while (0)' is a standard trick to allow FOO to have
+statement semantics, so that it can safely be used within an `if'
+statement in C, for example. Multiple evaluation is prevented by
+copying a supplied argument into a local variable, so that
+`FOO(var,fun(1))' only calls `fun' once.
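+
+ Both properties of this macro style can be checked with a small
+self-contained sketch. The names `TOY_SQUARE', `TOY_SQUARE_UNSAFE',
+`toy_fun' and `toy_call_count' are hypothetical and chosen only for
+this illustration.

```c
#include <assert.h>

int toy_call_count;   /* counts calls to toy_fun, for demonstration */

int toy_fun(int x)
{
    toy_call_count++;
    return x + 1;
}

/* Unsafe: VALUE appears twice, so its side effects happen twice. */
#define TOY_SQUARE_UNSAFE(var, value) ((var) = (value) * (value))

/* Safe: VALUE is evaluated once into a local variable, and the
   do/while wrapper makes the whole body a single statement. */
#define TOY_SQUARE(var, value) do {                  \
    int TOY_SQUARE_value = (value);                  \
    (var) = TOY_SQUARE_value * TOY_SQUARE_value;     \
} while (0)

/* The macro is used as the body of an `if' with an `else' branch;
   without the do/while wrapper this would not parse as intended. */
int toy_demo_safe(int flag)
{
    int result = 0;
    toy_call_count = 0;
    if (flag)
        TOY_SQUARE(result, toy_fun(1));
    else
        result = -1;
    return result;
}

int toy_demo_unsafe(void)
{
    int result = 0;
    toy_call_count = 0;
    TOY_SQUARE_UNSAFE(result, toy_fun(1));
    return result;
}
```

+
+ After `toy_demo_safe (1)' the counter shows a single call to
+`toy_fun'; after `toy_demo_unsafe ()' it shows two, even though both
+compute the same result here.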
+
+ Lisp lists are popular data structures in the C code as well as in
+Elisp. There are two sets of macros that iterate over lists.
+`EXTERNAL_LIST_LOOP_N' should be used when the list has been supplied
+by the user, and cannot be trusted to be acyclic and `nil'-terminated.
+A `malformed-list' or `circular-list' error will be generated if the
+list being iterated over is not entirely kosher. `LIST_LOOP_N', on the
+other hand, is faster and less safe, and can be used only on trusted
+lists.
+
+ Related macros are `GET_EXTERNAL_LIST_LENGTH' and `GET_LIST_LENGTH',
+which calculate the length of a list; `GET_EXTERNAL_LIST_LENGTH' also
+validates the properness of the list. The macros
+`EXTERNAL_LIST_LOOP_DELETE_IF' and `LIST_LOOP_DELETE_IF' delete
+elements satisfying some predicate from a Lisp list.
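+
+ The distinction between trusted and untrusted list traversal can be
+sketched in plain C. This is not how the XEmacs macros are written;
+it is a toy model with hypothetical names (`toy_obj',
+`toy_list_length', `toy_external_list_length') that uses the
+well-known tortoise-and-hare technique to detect cycles, and returns
+error codes where the real macros would signal `malformed-list' or
+`circular-list'.

```c
#include <assert.h>
#include <stddef.h>

enum toy_kind { TOY_NIL, TOY_ATOM, TOY_CONS };

struct toy_obj {
    enum toy_kind kind;
    struct toy_obj *cdr;   /* meaningful only when kind == TOY_CONS */
};

#define TOY_MALFORMED (-1)
#define TOY_CIRCULAR  (-2)

/* Sample data: a proper two-element list, a dotted pair, a two-cell ring. */
struct toy_obj toy_proper[3] = {
    { TOY_CONS, &toy_proper[1] }, { TOY_CONS, &toy_proper[2] }, { TOY_NIL, NULL }
};
struct toy_obj toy_dotted[2] = {
    { TOY_CONS, &toy_dotted[1] }, { TOY_ATOM, NULL }
};
struct toy_obj toy_ring[2] = {
    { TOY_CONS, &toy_ring[1] }, { TOY_CONS, &toy_ring[0] }
};

/* Trusted variant: assumes a proper, nil-terminated list.
   On a circular list this would loop forever. */
long toy_list_length(const struct toy_obj *list)
{
    long n = 0;
    for (; list->kind == TOY_CONS; list = list->cdr)
        n++;
    return n;
}

/* Untrusted variant: the hare advances one cell per step, the
   tortoise one cell every second step.  If the hare ever lands on
   the tortoise, the list must contain a cycle. */
long toy_external_list_length(const struct toy_obj *list)
{
    const struct toy_obj *hare = list, *tortoise = list;
    long n = 0;

    while (hare->kind == TOY_CONS) {
        hare = hare->cdr;
        n++;
        if (n % 2 == 0)
            tortoise = tortoise->cdr;
        if (hare->kind == TOY_CONS && hare == tortoise)
            return TOY_CIRCULAR;
    }
    /* A proper list ends in nil; anything else is a dotted list. */
    return hare->kind == TOY_NIL ? n : TOY_MALFORMED;
}
```

+
+ The untrusted variant costs one extra pointer and comparison per
+cell, which is the price paid for being safe on lists supplied by the
+user.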
+
+\1f
+File: internals.info, Node: Writing Lisp Primitives, Next: Writing Good Comments, Prev: General Coding Rules, Up: Rules When Writing New C Code
+
+Writing Lisp Primitives
+=======================
+
+Lisp primitives are Lisp functions implemented in C. The details of
+interfacing the C function so that Lisp can call it are handled by a few
+C macros. The only way to really understand how to write new C code is
+to read the source, but we can explain some things here.
+
+ An example of a special form is the definition of `prog1', from
+`eval.c'. (An ordinary function would have the same general
+appearance.)
+
+ DEFUN ("prog1", Fprog1, 1, UNEVALLED, 0, /*
+ Similar to `progn', but the value of the first form is returned.
+ \(prog1 FIRST BODY...): All the arguments are evaluated sequentially.
+ The value of FIRST is saved during evaluation of the remaining args,
+ whose values are discarded.
+ */
+ (args))
+ {
+ /* This function can GC */
+ REGISTER Lisp_Object val, form, tail;
+ struct gcpro gcpro1;
+
+ val = Feval (XCAR (args));
+
+ GCPRO1 (val);
+
+ LIST_LOOP_3 (form, XCDR (args), tail)
+ Feval (form);
+
+ UNGCPRO;
+ return val;
+ }
+
+ Let's start with a precise explanation of the arguments to the
+`DEFUN' macro. Here is a template for them:
+
+ DEFUN (LNAME, FNAME, MIN_ARGS, MAX_ARGS, INTERACTIVE, /*
+ DOCSTRING
+ */
+ (ARGLIST))
+
+LNAME
+ This string is the name of the Lisp symbol to define as the
+ function name; in the example above, it is `"prog1"'.
+
+FNAME
+ This is the C function name for this function. This is the name
+ that is used in C code for calling the function. The name is, by
+ convention, `F' prepended to the Lisp name, with all dashes (`-')
+ in the Lisp name changed to underscores. Thus, to call this
+ function from C code, call `Fprog1'. Remember that the arguments
+ are of type `Lisp_Object'; various macros and functions for
+ creating values of type `Lisp_Object' are declared in the file
+ `lisp.h'.
+
+ Primitives whose names are special characters (e.g. `+' or `<')
+ are named by spelling out, in some fashion, the special character:
+ e.g. `Fplus()' or `Flss()'. Primitives whose names begin with
+ normal alphanumeric characters but also contain special characters
+ are spelled out in some creative way, e.g. `let*' becomes
+ `FletX()'.
+
+ Each function also has an associated structure that holds the data
+ for the subr object that represents the function in Lisp. This
+ structure conveys the Lisp symbol name to the initialization
+ routine that will create the symbol and store the subr object as
+ its definition. The C variable name of this structure is always
+ `S' prepended to the FNAME. You hardly ever need to be aware of
+ the existence of this structure, since `DEFUN' plus `DEFSUBR'
+ takes care of all the details.
+
+MIN_ARGS
+ This is the minimum number of arguments that the function
+ requires. The function `prog1' allows a minimum of one argument.
+
+MAX_ARGS
+ This is the maximum number of arguments that the function accepts,
+ if there is a fixed maximum. Alternatively, it can be `UNEVALLED',
+ indicating a special form that receives unevaluated arguments, or
+ `MANY', indicating an unlimited number of evaluated arguments (the
+ C equivalent of `&rest'). Both `UNEVALLED' and `MANY' are macros.
+ If MAX_ARGS is a number, it may not be less than MIN_ARGS and it
+ may not be greater than 8. (If you need to add a function with
+ more than 8 arguments, use the `MANY' form. Resist the urge to
+ edit the definition of `DEFUN' in `lisp.h'. If you do it anyway,
+ make sure to also add another clause to the switch statement in
+ `primitive_funcall().')
+
+INTERACTIVE
+ This is an interactive specification, a string such as might be
+ used as the argument of `interactive' in a Lisp function. In the
+ case of `prog1', it is 0 (a null pointer), indicating that `prog1'
+ cannot be called interactively. A value of `""' indicates a
+ function that should receive no arguments when called
+ interactively.
+
+DOCSTRING
+ This is the documentation string. It is written just like a
+ documentation string for a function defined in Lisp; in
+ particular, the first line should be a single sentence. Note how
+ the documentation string is enclosed in a comment, none of the
+ documentation is placed on the same lines as the comment-start and
+ comment-end characters, and the comment-start characters are on
+ the same line as the interactive specification. `make-docfile',
+ which scans the C files for documentation strings, is very
+ particular about what it looks for, and will not properly extract
+ the doc string if it's not in this exact format.
+
+ In order to make both `etags' and `make-docfile' happy, make sure
+ that the `DEFUN' line contains the LNAME and FNAME, and that the
+ comment-start characters for the doc string are on the same line
+ as the interactive specification, and put a newline directly after
+ them (and before the comment-end characters).
+
+ARGLIST
+ This is the comma-separated list of arguments to the C function.
+ For a function with a fixed maximum number of arguments, provide a
+ C argument for each Lisp argument. In this case, unlike regular C
+ functions, the types of the arguments are not declared; they are
+ simply always of type `Lisp_Object'.
+
+ The names of the C arguments will be used as the names of the
+ arguments to the Lisp primitive as displayed in its documentation,
+ modulo the same concerns described above for `F...' names (in
+ particular, underscores in the C arguments become dashes in the
+ Lisp arguments).
+
+ There is one additional kludge: A trailing `_' on the C argument is
+ discarded when forming the Lisp argument. This allows C language
+ reserved words (like `default') or global symbols (like `dirname')
+ to be used as argument names without compiler warnings or errors.
+
+ A Lisp function with MAX_ARGS = `UNEVALLED' is a "special form";
+ its arguments are not evaluated. Instead it receives one argument
+ of type `Lisp_Object', a (Lisp) list of the unevaluated arguments,
+ conventionally named `(args)'.
+
+ When a Lisp function has no upper limit on the number of arguments,
+ specify MAX_ARGS = `MANY'. In this case its implementation in C
+ actually receives exactly two arguments: the number of Lisp
+ arguments (an `int') and the address of a block containing their
+ values (a `Lisp_Object *'). In this case only are the C types
+ specified in the ARGLIST: `(int nargs, Lisp_Object *args)'.
+
+
+ Within the function `Fprog1' itself, note the use of the macros
+`GCPRO1' and `UNGCPRO'. `GCPRO1' is used to "protect" a variable from
+garbage collection--to inform the garbage collector that it must look
+in that variable and regard the object pointed at by its contents as an
+accessible object. This is necessary whenever you call `Feval' or
+anything that can directly or indirectly call `Feval' (this includes
+the `QUIT' macro!). At such a time, any Lisp object that you intend to
+refer to again must be protected somehow. `UNGCPRO' cancels the
+protection of the variables that are protected in the current function.
+It is necessary to do this explicitly.
+
+ The macro `GCPRO1' protects just one local variable. If you want to
+protect two, use `GCPRO2' instead; repeating `GCPRO1' will not work.
+Macros `GCPRO3' and `GCPRO4' also exist.
+
+ These macros implicitly use local variables such as `gcpro1'; you
+must declare these explicitly, with type `struct gcpro'. Thus, if you
+use `GCPRO2', you must declare `gcpro1' and `gcpro2'.
+
+ Note also that the general rule is "caller-protects"; i.e. you are
+only responsible for protecting those Lisp objects that you create. Any
+objects passed to you as arguments should have been protected by whoever
+created them, so you don't in general have to protect them.
+
+ In particular, the arguments to any Lisp primitive are always
+automatically `GCPRO'ed, when called "normally" from Lisp code or
+bytecode. So only a few Lisp primitives that are called frequently from
+C code, such as `Fprogn', protect their arguments as a service to their
+caller. You don't need to protect your arguments when writing a new
+`DEFUN'.
+
+ `GCPRO'ing is perhaps the trickiest and most error-prone part of
+XEmacs coding. It is *extremely* important that you get this right and
+use a great deal of discipline when writing this code. *Note
+`GCPRO'ing: GCPROing, for full details on how to do this.
+
+ What `DEFUN' actually does is declare a global structure of type
+`Lisp_Subr' whose name begins with capital `SF' and which contains
+information about the primitive (e.g. a pointer to the function, its
+minimum and maximum allowed arguments, a string describing its Lisp
+name); `DEFUN' then begins a normal C function declaration using the
+`F...' name. The Lisp subr object that is the function definition of a
+primitive (i.e. the object in the function slot of the symbol that
+names the primitive) actually points to this `SF' structure; when
+`Feval' encounters a subr, it looks in the structure to find out how to
+call the C function.
+
+ Defining the C function is not enough to make a Lisp primitive
+available; you must also create the Lisp symbol for the primitive (the
+symbol is "interned"; *note Obarrays::) and store a suitable subr
+object in its function cell. (If you don't do this, the primitive won't
+be seen by Lisp code.) The code looks like this:
+
+ DEFSUBR (FNAME);
+
+Here FNAME is the same name you used as the second argument to `DEFUN'.
+
+ This call to `DEFSUBR' should go in the `syms_of_*()' function at
+the end of the module. If no such function exists, create it and make
+sure to also declare it in `symsinit.h' and call it from the
+appropriate spot in `main()'. *Note General Coding Rules::.
+
+ Note that C code cannot call functions by name unless they are
+defined in C. The way to call a function written in Lisp from C is to
+use `Ffuncall', which embodies the Lisp function `funcall'. Since the
+Lisp function `funcall' accepts an unlimited number of arguments, in C
+it takes two: the number of Lisp-level arguments, and a one-dimensional
+array containing their values. The first Lisp-level argument is the
+Lisp function to call, and the rest are the arguments to pass to it.
+Since `Ffuncall' can call the evaluator, you must protect pointers from
+garbage collection around the call to `Ffuncall'. (However, `Ffuncall'
+explicitly protects all of its parameters, so you don't have to protect
+any pointers passed as parameters to it.)
+
+ The C functions `call0', `call1', `call2', and so on, provide a
+convenient way to call a Lisp function with a fixed number of
+arguments. They work by calling `Ffuncall'.
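+
+ The calling convention described above (a count plus an array whose
+first element is the function) can be sketched with toy types. All
+names here (`toy_value', `toy_funcall', `toy_call2', `toy_plus') are
+hypothetical; real code uses `Lisp_Object', `Ffuncall' and `call2', and
+must also deal with garbage-collection protection, which this sketch
+ignores.

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_value toy_value;
typedef toy_value (*toy_subr) (int nargs, toy_value *args);

struct toy_value
{
  toy_subr fn;   /* non-NULL when the value is a callable primitive */
  long num;      /* payload for plain numbers */
};

/* A MANY-style primitive: receives a count and a block of values. */
toy_value toy_plus (int nargs, toy_value *args)
{
  toy_value sum = { NULL, 0 };
  int i;
  for (i = 0; i < nargs; i++)
    sum.num += args[i].num;
  return sum;
}

/* Like funcall: args[0] is the function, the rest are its arguments. */
toy_value toy_funcall (int nargs, toy_value *args)
{
  assert (nargs >= 1 && args[0].fn != NULL);
  return args[0].fn (nargs - 1, args + 1);
}

/* Fixed-arity convenience wrapper: packs everything into an array
   and delegates to toy_funcall. */
toy_value toy_call2 (toy_value fn, toy_value arg0, toy_value arg1)
{
  toy_value args[3];
  args[0] = fn;
  args[1] = arg0;
  args[2] = arg1;
  return toy_funcall (3, args);
}
```

+ The wrapper exists purely for convenience: the caller does not have
+to build the argument array by hand.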
+
+ `eval.c' is a very good file to look through for examples; `lisp.h'
+contains the definitions for important macros and functions.
+
+\1f
+File: internals.info, Node: Writing Good Comments, Next: Adding Global Lisp Variables, Prev: Writing Lisp Primitives, Up: Rules When Writing New C Code
+
+Writing Good Comments
+=====================
+
+Comments are a lifeline for programmers trying to understand tricky
+code. In general, the less obvious it is what you are doing, the more
+you need a comment, and the more detailed it needs to be. You should
+always be on guard when you're writing code for stuff that's tricky, and
+should constantly be putting yourself in someone else's shoes and asking
+if that person could figure out without much difficulty what's going
+on. (Assume they are a competent programmer who understands the
+essentials of how the XEmacs code is structured but doesn't know much
+about the module you're working on or any algorithms you're using.) If
+you're not sure whether they would be able to, add a comment. Always
+err on the side of more comments, rather than fewer.
+
+ Generally, when making comments, there is no need to attribute them
+with your name or initials. This especially goes for small,
+easy-to-understand, non-opinionated ones. Also, comments indicating
+where, when, and by whom a file was changed are _strongly_ discouraged,
+and in general will be removed as they are discovered. This is exactly
+what `ChangeLogs' are there for. However, it can occasionally be
+useful to mark exactly where (but not when or by whom) changes are
+made, particularly when making small changes to a file imported from
+elsewhere. These marks help when later on a newer version of the file
+is imported and the changes need to be merged. (If everything were
+always kept in CVS, there would be no need for this. But in practice,
+this often doesn't happen, or the CVS repository is later on lost or
+unavailable to the person doing the update.)
+
+ When putting in an explicit opinion in a comment, you should
+_always_ attribute it with your name, and optionally the date. This
+also goes for long, complex comments explaining in detail the workings
+of something - by putting your name there, you make it possible for
+someone who has questions about how that thing works to determine who
+wrote the comment so they can write to them. Preferably, use your
+actual name and not your initials, unless your initials are generally
+recognized (e.g. `jwz'). You can use only your first name if it's
+obvious who you are; otherwise, give first and last name. If you're
+not a regular contributor, you might consider putting your email
+address in - it may be in the ChangeLog, but after a while ChangeLogs
+have a tendency of disappearing or getting muddled. (E.g. your comment
+may get copied somewhere else or even into another program, and
+tracking down the proper ChangeLog may be very difficult.)
+
+ If you come across an opinion that is not or no longer valid, or you
+come across any comment that no longer applies but you want to keep it
+around, enclose it in `[[ ' and ` ]]' marks and add a comment
+afterwards explaining why the preceding comment is no longer valid. Put
+your name on this comment, as explained above.
+
+ Just as comments are a lifeline to programmers, incorrect comments
+are death. If you come across an incorrect comment, *immediately*
+correct it or flag it as incorrect, as described in the previous
+paragraph. Whenever you work on a section of code, _always_ make sure
+to update any comments to be correct - or, at the very least, flag them
+as incorrect.
+
+ To indicate a "todo" or other problem, use four pound signs - i.e.
+`####'.
+
+\1f
+File: internals.info, Node: Adding Global Lisp Variables, Next: Proper Use of Unsigned Types, Prev: Writing Good Comments, Up: Rules When Writing New C Code
+
+Adding Global Lisp Variables
+============================
+
+Global variables whose names begin with `Q' are constants whose value
+is a symbol of a particular name. The name of the variable should be
+derived from the name of the symbol using the same rules as for Lisp
+primitives. These variables are initialized using a call to
+`defsymbol()' in the `syms_of_*()' function. (This call interns a
+symbol, sets the C variable to the resulting Lisp object, and calls
+`staticpro()' on the C variable to tell the garbage-collection
+mechanism about this variable. What `staticpro()' does is add a
+pointer to the variable to a large global array; when
+garbage-collection happens, all pointers listed in the array are used
+as starting points for marking Lisp objects. This is important because
+it's quite possible that the only current reference to the object is
+the C variable. In the case of symbols, the `staticpro()' doesn't
+matter all that much because the symbol is contained in `obarray',
+which is itself `staticpro()'ed. However, it's possible that a naughty
+user could do something like uninterning the symbol out of `obarray' or
+even setting `obarray' to a different value [although this is likely to
+make XEmacs crash!].)
+
+ *Please note:* It is potentially deadly if you declare a `Q...'
+variable in two different modules. The two calls to `defsymbol()' are
+no problem, but some linkers will complain about multiply-defined
+symbols. The most insidious aspect of this is that often the link will
+succeed anyway, but then the resulting executable will sometimes crash
+in obscure ways during certain operations!
+
+ To avoid this problem, declare any symbols with common names (such as
+`text') that are not obviously associated with this particular module
+in the file `general-slots.h'. The "-slots" suffix indicates that this
+is a file that is included multiple times in `general.c'. Redefinition
+of preprocessor macros allows the effects to be different in each
+context, so this is actually more convenient and less error-prone than
+doing it in your module.
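+
+ The multiple-inclusion technique can be sketched in a single file.
+This is an illustration under stated assumptions: the list macro
+`TOY_SLOTS' stands in for the contents of a "-slots" header, and the
+`Qtoy_...' variables and both functions are hypothetical. Plain C
+strings are used in place of symbols so that the example is
+self-contained.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for the contents of a "-slots" file: one entry per symbol.
   In a real build this list would live in its own header and be
   #include'd repeatedly; a list macro keeps the sketch in one file. */
#define TOY_SLOTS(SLOT) \
  SLOT (text)           \
  SLOT (warning)        \
  SLOT (value)

/* First expansion: define one variable per slot. */
#define SLOT(name) const char *Qtoy_##name;
TOY_SLOTS (SLOT)
#undef SLOT

/* Second expansion: initialize every variable from the same list, so
   a slot cannot be defined without also being initialized. */
void toy_syms_of_general (void)
{
#define SLOT(name) Qtoy_##name = #name;
  TOY_SLOTS (SLOT)
#undef SLOT
}

/* Third expansion: count the slots. */
int toy_slot_count (void)
{
  int n = 0;
#define SLOT(name) n++;
  TOY_SLOTS (SLOT)
#undef SLOT
  return n;
}
```

+ Because each symbol is listed exactly once, there is a single
+definition for the linker to see, however many modules use it.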
+
+ Global variables whose names begin with `V' are variables that
+contain Lisp objects. The convention here is that all global variables
+of type `Lisp_Object' begin with `V', and all others don't (including
+integer and boolean variables that have Lisp equivalents). Most of the
+time, these variables have equivalents in Lisp, but some don't. Those
+that do are declared this way by a call to `DEFVAR_LISP()' in the
+`vars_of_*()' initializer for the module. What this does is create a
+special "symbol-value-forward" Lisp object that contains a pointer to
+the C variable, intern a symbol whose name is as specified in the call
+to `DEFVAR_LISP()', and set its value to the symbol-value-forward Lisp
+object; it also calls `staticpro()' on the C variable to tell the
+garbage-collection mechanism about the variable. When `eval' (or
+actually `symbol-value') encounters this special object in the process
+of retrieving a variable's value, it follows the indirection to the C
+variable and gets its value. `setq' does similar things so that the C
+variable gets changed.
+
+ Whether or not you `DEFVAR_LISP()' a variable, you need to
+initialize it in the `vars_of_*()' function; otherwise it will end up
+as all zeroes, which is the integer 0 (_not_ `nil'), and this is
+probably not what you want. Also, if the variable is not
+`DEFVAR_LISP()'ed, *you must call* `staticpro()' on the C variable in
+the `vars_of_*()' function. Otherwise, the garbage-collection
+mechanism won't know that the object in this variable is in use, and
+will happily collect it and reuse its storage for another Lisp object,
+and you will be the one who's unhappy when you can't figure out how
+your variable got overwritten.
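+
+ Why registering the address of the variable matters can be shown with
+a toy marker. Everything below (`struct toy_object', `toy_staticpro',
+`toy_mark_from_roots') is a hypothetical model of the mechanism
+described above, not the actual XEmacs collector.

```c
#include <assert.h>
#include <stddef.h>

struct toy_object
{
  int marked;
  struct toy_object *next;   /* another object this one refers to */
};

#define TOY_MAX_ROOTS 16

/* The global array of roots: pointers to variables, not to objects. */
static struct toy_object **toy_roots[TOY_MAX_ROOTS];
static int toy_root_count;

/* Record the address of VAR so that whatever it holds at collection
   time is treated as reachable. */
void toy_staticpro (struct toy_object **var)
{
  assert (toy_root_count < TOY_MAX_ROOTS);
  toy_roots[toy_root_count++] = var;
}

/* Mark phase: start from the current contents of every root. */
void toy_mark_from_roots (void)
{
  int i;
  for (i = 0; i < toy_root_count; i++)
    {
      struct toy_object *obj = *toy_roots[i];
      for (; obj != NULL && !obj->marked; obj = obj->next)
        obj->marked = 1;
    }
}
```

+ Since the root array stores the variable's address, a value assigned
+to the variable after registration is still found; an object that is
+referenced only from an unregistered variable is never marked.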
+
+\1f
+File: internals.info, Node: Proper Use of Unsigned Types, Next: Coding for Mule, Prev: Adding Global Lisp Variables, Up: Rules When Writing New C Code
+
+Proper Use of Unsigned Types
+============================
+
+Avoid using `unsigned int' and `unsigned long' whenever possible.
+Unsigned types are viral - in arithmetic or comparisons that mix signed
+and unsigned operands of the same width, the signed operand is
+automatically converted to unsigned, which is almost certainly not what
+you want. Many subtle and hard-to-find bugs are created by careless use
+of unsigned types. In general, you should almost _never_ use an
+unsigned type to hold a
+regular quantity of any sort. The only exceptions are
+
+ 1. When there's a reasonable possibility you will actually need all
+ 32 or 64 bits to store the quantity.
+
+ 2. When calling existing APIs that require unsigned types. In this
+ case, you should still do all manipulation using signed types, and
+ do the conversion at the very threshold of the API call.
+
+ 3. In existing code that you don't want to modify because you don't
+ maintain it.
+
+ 4. In bit-field structures.
+
+ Other reasonable uses of `unsigned int' and `unsigned long' are
+representing non-quantities - e.g. bit-oriented flags and such.
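+
+ A small sketch of how a mixed expression goes wrong, and of doing the
+conversion once at the boundary instead. The function names are
+hypothetical, and the sketch assumes the usual case where `long' is at
+least as wide as `int'.

```c
#include <assert.h>

/* Mixed arithmetic: DELTA is converted to unsigned before the
   addition, so a negative sum wraps around to a huge positive value
   before the division happens. */
long half_sum_mixed (unsigned int base, int delta)
{
  return (base + delta) / 2;
}

/* Signed arithmetic: convert the unsigned input once, at the
   boundary, and do all further manipulation with signed types. */
long half_sum_signed (unsigned int base, int delta)
{
  long signed_base = (long) base;
  return (signed_base + delta) / 2;
}
```

+ Called with a base of 2 and a delta of -4, the signed version yields
+-1, while the mixed version yields a large positive number.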
+
+\1f
+File: internals.info, Node: Coding for Mule, Next: Techniques for XEmacs Developers, Prev: Proper Use of Unsigned Types, Up: Rules When Writing New C Code
+
+Coding for Mule
+===============
+
+Although Mule support is not compiled by default in XEmacs, many people
+are using it, and we consider it crucial that new code works correctly
+with multibyte characters. This is not hard; it is only a matter of
+following several simple coding guidelines. Even if you never
+compile with Mule, with a little practice you will find it quite easy
+to code Mule-correctly.
+
+ Note that these guidelines are not necessarily tied to the current
+Mule implementation; they are also a good idea to follow on the grounds
+of code generalization for future I18N work.
+
+* Menu:
+
+* Character-Related Data Types::
+* Working With Character and Byte Positions::
+* Conversion to and from External Data::
+* General Guidelines for Writing Mule-Aware Code::
+* An Example of Mule-Aware Code::
+
+\1f
+File: internals.info, Node: Character-Related Data Types, Next: Working With Character and Byte Positions, Up: Coding for Mule
+
+Character-Related Data Types
+----------------------------
+
+First, let's review the basic character-related datatypes used by
+XEmacs. Note that the separate `typedef's are not mandatory in the
+current implementation (all of them boil down to `unsigned char' or
+`int'), but they improve clarity of code a great deal, because one
+glance at the declaration can tell the intended use of the variable.
+
+`Emchar'
+ An `Emchar' holds a single Emacs character.
+
+ Obviously, the equality between characters and bytes is lost in
+ the Mule world. Characters can be represented by one or more
+ bytes in the buffer, and `Emchar' is the C type large enough to
+ hold any character.
+
+ Without Mule support, an `Emchar' is equivalent to an `unsigned
+ char'.
+
+`Bufbyte'
+ The data representing the text in a buffer or string is logically
+ a set of `Bufbyte's.
+
+ XEmacs does not work with the same character formats all the time;
+ when reading characters from the outside, it decodes them to an
+ internal format, and likewise encodes them when writing.
+ `Bufbyte' (in fact `unsigned char') is the basic unit of the XEmacs
+ internal buffer and string format. A `Bufbyte *' is the type
+ that points at text encoded in the variable-width internal
+ encoding.
+
+ One character can correspond to one or more `Bufbyte's. In the
+ current Mule implementation, an ASCII character is represented by
+ a single `Bufbyte' with the same value, and other characters are
+ represented by a sequence of two or more `Bufbyte's.
+
+ Without Mule support, there are exactly 256 characters, implicitly
+ Latin-1, and each character is represented using one `Bufbyte', and
+ there is a one-to-one correspondence between `Bufbyte's and
+ `Emchar's.
+
+`Bufpos'
+`Charcount'
+ A `Bufpos' represents a character position in a buffer or string.
+ A `Charcount' represents a number (count) of characters.
+ Logically, subtracting two `Bufpos' values yields a `Charcount'
+ value. Although all of these are `typedef'ed to `EMACS_INT', we
+ use them in preference to `EMACS_INT' to make it clear what sort
+ of position is being used.
+
+ `Bufpos' and `Charcount' values are the only ones that are ever
+ visible to Lisp.
+
+`Bytind'
+`Bytecount'
+ A `Bytind' represents a byte position in a buffer or string. A
+ `Bytecount' represents the distance between two positions, in
+ bytes. The relationship between `Bytind' and `Bytecount' is the
+ same as the relationship between `Bufpos' and `Charcount'.
+
+`Extbyte'
+`Extcount'
+ When dealing with the outside world, XEmacs works with `Extbyte's,
+ which are equivalent to `unsigned char'. Obviously, an `Extcount'
+ is the distance between two `Extbyte's. Extbytes and Extcounts
+ are not all that frequent in XEmacs code.
+
+\1f
+File: internals.info, Node: Working With Character and Byte Positions, Next: Conversion to and from External Data, Prev: Character-Related Data Types, Up: Coding for Mule
+
+Working With Character and Byte Positions
+-----------------------------------------
+
+Now that we have defined the basic character-related types, we can look
+at the macros and functions designed for work with them and for
+conversion between them. Most of these macros are defined in
+`buffer.h', and we don't discuss all of them here, but only the most
+important ones. Examining the existing code is the best way to learn
+about them.
+
+`MAX_EMCHAR_LEN'
+ This preprocessor constant is the maximum number of buffer bytes
+ needed to represent an Emacs character in the variable width
+ internal encoding. It is useful when allocating temporary strings
+ to hold a known number of characters. For instance:
+
+ {
+ Charcount cclen;
+ ...
+ {
+ /* Allocate space for CCLEN characters. */
+ Bufbyte *buf = (Bufbyte *)alloca (cclen * MAX_EMCHAR_LEN);
+ ...
+
+ If you followed the previous section, you can guess that,
+ logically, multiplying a `Charcount' value by `MAX_EMCHAR_LEN'
+ produces a `Bytecount' value.
+
+ In the current Mule implementation, `MAX_EMCHAR_LEN' equals 4.
+ Without Mule, it is 1.
+
+`charptr_emchar'
+`set_charptr_emchar'
+ The `charptr_emchar' macro takes a `Bufbyte' pointer and returns
+ the `Emchar' stored at that position. If it were a function, its
+ prototype would be:
+
+ Emchar charptr_emchar (Bufbyte *p);
+
+ `set_charptr_emchar' stores an `Emchar' to the specified byte
+ position. It returns the number of bytes stored:
+
+ Bytecount set_charptr_emchar (Bufbyte *p, Emchar c);
+
+ It is important to note that `set_charptr_emchar' is safe only for
+ appending a character at the end of a buffer, not for overwriting a
+ character in the middle. This is because the width of characters
+ varies, and `set_charptr_emchar' cannot resize the string if it
+ writes, say, a two-byte character where a single-byte character
+ used to reside.
+
+ A typical use of `set_charptr_emchar' can be demonstrated by this
+ example, which copies characters from buffer BUF to a temporary
+ string of Bufbytes pointed to by P.
+
+ {
+ Bufpos pos;
+ for (pos = beg; pos < end; pos++)
+ {
+ Emchar c = BUF_FETCH_CHAR (buf, pos);
+ /* Store C at P, then advance P past the bytes written. */
+ p += set_charptr_emchar (p, c);
+ }
+ }
+
+ Note how `set_charptr_emchar' is used to store the `Emchar' and
+ advance the pointer at the same time.
+
+`INC_CHARPTR'
+`DEC_CHARPTR'
+ These two macros increment and decrement a `Bufbyte' pointer,
+ respectively. They will adjust the pointer by the appropriate
+ number of bytes according to the byte length of the character
+ stored there. Both macros assume that the memory address is
+ located at the beginning of a valid character.
+
+ Without Mule support, `INC_CHARPTR (p)' and `DEC_CHARPTR (p)'
+ simply expand to `p++' and `p--', respectively.
+
+`bytecount_to_charcount'
+ Given a pointer to a text string and a length in bytes, return the
+ equivalent length in characters.
+
+ Charcount bytecount_to_charcount (Bufbyte *p, Bytecount bc);
+
+`charcount_to_bytecount'
+ Given a pointer to a text string and a length in characters,
+ return the equivalent length in bytes.
+
+ Bytecount charcount_to_bytecount (Bufbyte *p, Charcount cc);
+
+`charptr_n_addr'
+ Return a pointer to the beginning of the character offset CC (in
+ characters) from P.
+
+ Bufbyte *charptr_n_addr (Bufbyte *p, Charcount cc);
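+
+ The relationship between byte counts and character counts can be
+sketched with toy versions of these functions. Note the assumptions:
+the XEmacs internal encoding is not UTF-8; UTF-8 is used here only as a
+familiar variable-width encoding, the text is assumed to be well
+formed, and every `toy_'/`Toy_' name is hypothetical.

```c
#include <assert.h>

typedef unsigned char Toy_Bufbyte;
typedef long Toy_Bytecount;
typedef long Toy_Charcount;

/* Byte length of the UTF-8 character whose first byte is LEAD. */
Toy_Bytecount toy_char_length (Toy_Bufbyte lead)
{
  if (lead < 0x80)
    return 1;
  if ((lead & 0xE0) == 0xC0)
    return 2;
  if ((lead & 0xF0) == 0xE0)
    return 3;
  return 4;
}

/* Count the characters stored in the BC bytes starting at P. */
Toy_Charcount toy_bytecount_to_charcount (const Toy_Bufbyte *p,
                                          Toy_Bytecount bc)
{
  const Toy_Bufbyte *end = p + bc;
  Toy_Charcount cc = 0;
  while (p < end)
    {
      p += toy_char_length (*p);   /* step over one whole character */
      cc++;
    }
  return cc;
}

/* Count the bytes occupied by the first CC characters at P. */
Toy_Bytecount toy_charcount_to_bytecount (const Toy_Bufbyte *p,
                                          Toy_Charcount cc)
{
  const Toy_Bufbyte *start = p;
  while (cc-- > 0)
    p += toy_char_length (*p);
  return (Toy_Bytecount) (p - start);
}

/* Address of the character CC characters after P. */
const Toy_Bufbyte *toy_charptr_n_addr (const Toy_Bufbyte *p,
                                       Toy_Charcount cc)
{
  return p + toy_charcount_to_bytecount (p, cc);
}
```

+ Both conversions have to scan the text, which is why code should keep
+track of whether a given number is a byte count or a character count
+rather than converting back and forth.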
+
+\1f
+File: internals.info, Node: Conversion to and from External Data, Next: General Guidelines for Writing Mule-Aware Code, Prev: Working With Character and Byte Positions, Up: Coding for Mule
+
+Conversion to and from External Data
+------------------------------------
+
+When an external function, such as a C library function, returns a
+`char' pointer, you should almost never treat it as `Bufbyte'. This is
+because these returned strings may contain 8-bit characters which can be
+misinterpreted by XEmacs, and cause a crash. Likewise, when exporting
+a piece of internal text to the outside world, you should always
+convert it to an appropriate external encoding, lest the internal stuff
+(such as the infamous \201 characters) leak out.
+
+ The interface to conversion between the internal and external
+representations of text is the set of conversion macros defined in
+`buffer.h'. There used to be a fixed set of external formats supported
+by these macros, but now any coding system can be used with them.
+The coding system alias mechanism is used to create the
+following logical coding systems, which replace the fixed external
+formats. The `dontusethis-set-symbol-value-handler' mechanism was
+enhanced to make this possible (more work on that is needed, such as
+removing the `dontusethis-' prefix).
+
+`Qbinary'
+ This is the simplest format and is what we use in the absence of a
+ more appropriate format. This converts according to the `binary'
+ coding system:
+
+ a. On input, bytes 0-255 are converted into (implicitly Latin-1)
+ characters 0-255. A non-Mule XEmacs doesn't really know about
+ different character sets and the fonts to display them, so
+ the bytes can be treated as text in different 1-byte
+ encodings by simply setting the appropriate fonts. So in a
+ sense, non-Mule XEmacs is a multi-lingual editor if, for
+ example, different fonts are used to display text in
+ different buffers, faces, or windows. The specifier
+ mechanism gives the user complete control over this kind of
+ behavior.
+
+ b. On output, characters 0-255 are converted into bytes 0-255
+ and other characters are converted into `~'.
+
+`Qfile_name'
+ Format used for filenames. This is user-definable via either the
+ `file-name-coding-system' or `pathname-coding-system' (now
+ obsolete) variables.
+
+`Qnative'
+ Format used for the external Unix environment--`argv[]', stuff
+ from `getenv()', stuff from the `/etc/passwd' file, etc.
+ Currently this is the same as Qfile_name. The two should be
+ distinguished for clarity and possible future separation.
+
+`Qctext'
+ Compound-text format. This is the standard X11 format used for
+ data stored in properties, selections, and the like. This is an
+ 8-bit no-lock-shift ISO2022 coding system. This is a real coding
+ system, unlike Qfile_name, which is user-definable.
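+
+ The two conversion rules listed under `Qbinary' above can be
+sketched as a pair of toy functions. This only models the mapping
+itself; the names `Toy_Emchar', `toy_binary_decode' and
+`toy_binary_encode' are hypothetical, and real code goes through the
+conversion macros rather than anything like this.

```c
#include <assert.h>
#include <stddef.h>

typedef int Toy_Emchar;   /* wide enough for any character code */

/* Input direction: bytes 0-255 become characters 0-255. */
size_t toy_binary_decode (const unsigned char *src, size_t n,
                          Toy_Emchar *dst)
{
  size_t i;
  for (i = 0; i < n; i++)
    dst[i] = src[i];
  return n;
}

/* Output direction: characters 0-255 become bytes 0-255, and any
   other character is replaced by `~'. */
size_t toy_binary_encode (const Toy_Emchar *src, size_t n,
                          unsigned char *dst)
{
  size_t i;
  for (i = 0; i < n; i++)
    dst[i] = (src[i] >= 0 && src[i] <= 255)
      ? (unsigned char) src[i]
      : (unsigned char) '~';
  return n;
}
```

+ The output direction is lossy for characters outside 0-255, which is
+one reason to choose a more appropriate coding system when one is
+known.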
+
+ There are two fundamental macros to convert between external and
+internal format.
+
+ `TO_INTERNAL_FORMAT' converts external data to internal format, and
+`TO_EXTERNAL_FORMAT' converts the other way around. The arguments each
+of these receives are a source type, a source, a sink type, a sink, and
+a coding system (or a symbol naming a coding system).
+
+ A typical call looks like
+ TO_EXTERNAL_FORMAT (LISP_STRING, str, C_STRING_MALLOC, ptr, Qfile_name);
+
+ which means that the contents of the Lisp string `str' are written
+to a malloc'ed memory area which will be pointed to by `ptr' after the
+macro completes. The conversion will be done using the `file-name'
+coding system, which will be controlled by the user indirectly by
+setting or binding the variable `file-name-coding-system'.
+
+ Some sources and sinks require two C variables to specify. We use
+some preprocessor magic to allow different source and sink types, and
+even different numbers of arguments to specify different types of
+sources and sinks.
+
+ So we can have a call that looks like
+ TO_INTERNAL_FORMAT (DATA, (ptr, len),
+ MALLOC, (ptr, len),
+ coding_system);
+
+ The parenthesized argument pairs are required to make the
+preprocessor magic work.
+
+ Here are the different source and sink types:
+
+``DATA, (ptr, len),''
+ input data is a fixed buffer of size LEN at address PTR
+
+``ALLOCA, (ptr, len),''
+ output data is placed in an alloca()ed buffer of size LEN pointed
+ to by PTR
+
+``MALLOC, (ptr, len),''
+ output data is in a malloc()ed buffer of size LEN pointed to by PTR
+
+``C_STRING_ALLOCA, ptr,''
+ equivalent to `ALLOCA, (ptr, len_ignored)' on output
+
+``C_STRING_MALLOC, ptr,''
+ equivalent to `MALLOC, (ptr, len_ignored)' on output
+
+``C_STRING, ptr,''
+ equivalent to `DATA, (ptr, strlen (ptr) + 1)' on input
+
+``LISP_STRING, string,''
+ input or output is a Lisp_Object of type string
+
+``LISP_BUFFER, buffer,''
+ output is written to `(point)' in lisp buffer BUFFER
+
+``LISP_LSTREAM, lstream,''
+ input or output is a Lisp_Object of type lstream
+
+``LISP_OPAQUE, object,''
+ input or output is a Lisp_Object of type opaque
+
+ Often, the data is being converted to a '\0'-byte-terminated string,
+which is the format required by many external system C APIs. For these
+purposes, a source type of `C_STRING' or a sink type of
+`C_STRING_ALLOCA' or `C_STRING_MALLOC' is appropriate. Otherwise, we
+should try to keep XEmacs '\0'-byte-clean, which means using (ptr, len)
+pairs.
+
+ The sinks to be specified must be lvalues, unless they are the lisp
+object types `LISP_LSTREAM' or `LISP_BUFFER'.
+
+ For the sink types `ALLOCA' and `C_STRING_ALLOCA', the resulting
+text is stored in a stack-allocated buffer, which is automatically
+freed on returning from the function. However, the sink types `MALLOC'
+and `C_STRING_MALLOC' return `xmalloc()'ed memory. The caller is
+responsible for freeing this memory using `xfree()'.
+
+ Note that it doesn't make sense for `LISP_STRING' to be a source for
+`TO_INTERNAL_FORMAT' or a sink for `TO_EXTERNAL_FORMAT'. You'll get an
+assertion failure if you try.
+
+\1f
+File: internals.info, Node: General Guidelines for Writing Mule-Aware Code, Next: An Example of Mule-Aware Code, Prev: Conversion to and from External Data, Up: Coding for Mule
+
+General Guidelines for Writing Mule-Aware Code
+----------------------------------------------
+
+This section contains some general guidance on how to write Mule-aware
+code, as well as some pitfalls you should avoid.
+
+_Never use `char' and `char *'._
+ In XEmacs, the use of `char' and `char *' is almost always a
+ mistake. If you want to manipulate an Emacs character from "C",
+ use `Emchar'. If you want to examine a specific octet in the
+ internal format, use `Bufbyte'. If you want a Lisp-visible
+ character, use a `Lisp_Object' and `make_char'. If you want a
+ pointer to move through the internal text, use `Bufbyte *'. Also
+ note that you almost certainly do not need `Emchar *'.
+
+_Be careful not to confuse `Charcount', `Bytecount', and `Bufpos'._
+ The whole point of using different types is to avoid confusion
+ about the use of certain variables. Lest this effect be
+ nullified, you need to be careful about using the right types.
+
+_Always convert external data_
+ It is extremely important to always convert external data, because
+ XEmacs can crash if unexpected 8-bit sequences are copied to its
+ internal buffers literally.
+
+ This means that when a system function, such as `readdir', returns
+ a string, you may need to convert it using one of the conversion
+ macros described in the previous section, before passing it
+ further to Lisp.
+
+ Actually, most of the basic system functions that accept
+ '\0'-terminated string arguments, like `stat()' and `open()', have
+ been *encapsulated* so that they `always' do internal to
+ external conversion themselves. This means you must pass
+ internally encoded data, typically the `XSTRING_DATA' of a
+ Lisp_String to these functions. This is actually a design bug,
+ since it unexpectedly changes the semantics of the system
+ functions. A better design would be to provide separate versions
+ of these system functions that accepted Lisp_Objects which were
+ lisp strings in place of their current `char *' arguments.
+
+ int stat_lisp (Lisp_Object path, struct stat *buf); /* Implement me */
+
+ Also note that many internal functions, such as `make_string',
+ accept Bufbytes, which removes the need for them to convert the
+ data they receive. This increases efficiency because that way
+ external data needs to be decoded only once, when it is read.
+ After that, it is passed around in internal format.
+
+\1f
+File: internals.info, Node: An Example of Mule-Aware Code, Prev: General Guidelines for Writing Mule-Aware Code, Up: Coding for Mule
+
+An Example of Mule-Aware Code
+-----------------------------
+
+As an example of Mule-aware code, we will analyze the `string'
+function, which conses up a Lisp string from the character arguments it
+receives. Here is the definition, pasted from `alloc.c':
+
+ DEFUN ("string", Fstring, 0, MANY, 0, /*
+ Concatenate all the argument characters and make the result a string.
+ */
+ (int nargs, Lisp_Object *args))
+ {
+ Bufbyte *storage = alloca_array (Bufbyte, nargs * MAX_EMCHAR_LEN);
+ Bufbyte *p = storage;
+
+ for (; nargs; nargs--, args++)
+ {
+ Lisp_Object lisp_char = *args;
+ CHECK_CHAR_COERCE_INT (lisp_char);
+ p += set_charptr_emchar (p, XCHAR (lisp_char));
+ }
+ return make_string (storage, p - storage);
+ }
+
+ Now we can analyze the source line by line.
+
+ The resulting string contains exactly as many characters as there are
+arguments to the function. This is why we allocate `MAX_EMCHAR_LEN' *
+NARGS bytes on the stack, i.e. the worst-case number of bytes for NARGS
+`Emchar's to fit in the string.
+
+ Then, the loop checks that each element is a character, converting
+integers in the process. Like many other functions in XEmacs, this
+function silently accepts integers where characters are expected, for
+historical and compatibility reasons. In new code, unless you know
+that you need this coercion, the stricter `CHECK_CHAR' will suffice.
+`XCHAR (lisp_char)' extracts the `Emchar' from the `Lisp_Object', and
+`set_charptr_emchar' stores it at `p' and returns the number of bytes
+written, which is then used to advance `p'.
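+
+ The allocate-for-the-worst-case, then advance-by-bytes-written pattern
+can be sketched outside XEmacs as follows. This is only an
+illustration under stated assumptions: it uses UTF-8 as a stand-in
+for the internal format (which is a different encoding), it uses
+`malloc()' instead of `alloca_array', and `store_char',
+`encode_chars' and `MAX_ENCODED_LEN' are hypothetical names.
+
```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Worst-case number of bytes for one character in UTF-8, which stands
   in for the real internal encoding here. */
#define MAX_ENCODED_LEN 4

/* Store CH at P in UTF-8 and return the number of bytes written (1-4).
   Assumes CH <= 0x10FFFF; surrogates are not rejected in this sketch. */
size_t
store_char (unsigned char *p, unsigned long ch)
{
  if (ch < 0x80)
    {
      p[0] = (unsigned char) ch;
      return 1;
    }
  if (ch < 0x800)
    {
      p[0] = (unsigned char) (0xC0 | (ch >> 6));
      p[1] = (unsigned char) (0x80 | (ch & 0x3F));
      return 2;
    }
  if (ch < 0x10000)
    {
      p[0] = (unsigned char) (0xE0 | (ch >> 12));
      p[1] = (unsigned char) (0x80 | ((ch >> 6) & 0x3F));
      p[2] = (unsigned char) (0x80 | (ch & 0x3F));
      return 3;
    }
  p[0] = (unsigned char) (0xF0 | ((ch >> 18) & 0x07));
  p[1] = (unsigned char) (0x80 | ((ch >> 12) & 0x3F));
  p[2] = (unsigned char) (0x80 | ((ch >> 6) & 0x3F));
  p[3] = (unsigned char) (0x80 | (ch & 0x3F));
  return 4;
}

/* Encode N characters into a malloc()ed buffer sized for the worst
   case, and store the number of bytes actually used in *NBYTES.
   Returns NULL if memory runs out.  The caller frees the result. */
unsigned char *
encode_chars (const unsigned long *chars, size_t n, size_t *nbytes)
{
  unsigned char *storage = malloc (n * MAX_ENCODED_LEN + 1);
  unsigned char *p = storage;
  size_t i;

  if (storage == NULL)
    return NULL;
  for (i = 0; i < n; i++)
    p += store_char (p, chars[i]);   /* advance by bytes written */
  *nbytes = (size_t) (p - storage);
  return storage;
}
```
+
+ Note that the byte count (`p - storage') and the character count (N)
+differ as soon as any character needs more than one byte, which is
+why the two kinds of count should never be mixed up.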
+
+ Other instructive examples of correct coding under Mule can be found
+all over the XEmacs code. For starters, I recommend
+`Fnormalize_menu_item_name' in `menubar.c'. After you have understood
+this section of the manual and studied the examples, you can proceed to
+write new Mule-aware code.
+
+\1f
+File: internals.info, Node: Techniques for XEmacs Developers, Prev: Coding for Mule, Up: Rules When Writing New C Code
+
+Techniques for XEmacs Developers
+================================
+
+To make a purified XEmacs, do: `make puremacs'. To make a quantified
+XEmacs, do: `make quantmacs'.
+
+ You simply can't dump Quantified and Purified images (unless using
+the portable dumper). Purify gets confused when XEmacs frees memory in
+one process that was allocated in a _different_ process on a different
+machine! Instead, run it like so:
+ temacs -batch -l loadup.el run-temacs XEMACS-ARGS...
+
+ Before you go through the trouble, are you compiling with all
+debugging and error-checking off? If not, try that first. Be warned
+that while Quantify is directly responsible for quite a few
+optimizations which have been made to XEmacs, doing a run which
+generates results which can be acted upon is not necessarily a trivial
+task.
+
+ Also, if you're still willing to do some runs make sure you configure
+with the `--quantify' flag. That will keep Quantify from starting to
+record data until after the loadup is completed and will shut off
+recording right before it shuts down (which generates enough bogus data
+to throw most results off). It also enables three additional elisp
+commands: `quantify-start-recording-data',
+`quantify-stop-recording-data' and `quantify-clear-data'.
+
+ If you want to make XEmacs faster, target your favorite slow
+benchmark, run a profiler like Quantify, `gprof', or `tcov', and figure
+out where the cycles are going. In many cases you can localize the
+problem (because a particular new feature or even a single patch
+elicited it). Don't hesitate to use brute force techniques like a
+global counter incremented at strategic places, especially in
+combination with other performance indications (_e.g._, degree of
+buffer fragmentation into extents).
+
+ Specific projects:
+
+ * Make the garbage collector faster. Figure out how to write an
+ incremental garbage collector.
+
+ * Write a compiler that takes bytecode and spits out C code.
+ Unfortunately, you will then need a C compiler and a more fully
+ developed module system.
+
+ * Speed up redisplay.
+
+ * Speed up syntax highlighting. It was suggested that "maybe moving
+ some of the syntax highlighting capabilities into C would make a
+ difference." Wrong idea, I think. When processing one large file
+ a particular low-level routine was being called 40 _million_ times
+ simply for _one_ call to `newline-and-indent'. Syntax
+ highlighting needs to be rewritten to use a reliable, fast parser,
+ then to trust the pre-parsed structure, and only do
+ re-highlighting locally to a text change. Modern machines are
+ fast enough to implement such parsers in Lisp; but no machine will
+ ever be fast enough to deal with quadratic (or worse) algorithms!
+
+ * Implement tail recursion in Emacs Lisp (hard!).
+
+ Unfortunately, Emacs Lisp is slow, and is going to stay slow.
+Function calls in elisp are especially expensive. Iterating over a
+long list is going to be 30 times faster implemented in C than in Elisp.
+
+ Heavily used small code fragments need to be fast. The traditional
+way to implement such code fragments in C is with macros. But macros
+in C have well-known pitfalls.
+
+ Macro arguments that are repeatedly evaluated may suffer from
+repeated side effects or suboptimal performance.
+
+ Variable names used in macros may collide with caller's variables,
+causing (at least) unwanted compiler warnings.
+
+ In order to solve these problems, and maintain statement semantics,
+one should use the `do { ... } while (0)' trick while trying to
+reference macro arguments exactly once using local variables.
+
+ Let's take a look at this poor macro definition:
+
+ #define MARK_OBJECT(obj) \
+ if (!marked_p (obj)) mark_object (obj), did_mark = 1
+
+ This macro evaluates its argument twice, and also fails if used like
+this:
+ if (flag) MARK_OBJECT (obj); else do_something();
+
+ A much better definition is
+
+ #define MARK_OBJECT(obj) do { \
+ Lisp_Object mo_obj = (obj); \
+ if (!marked_p (mo_obj)) \
+ { \
+ mark_object (mo_obj); \
+ did_mark = 1; \
+ } \
+ } while (0)
+
+ Notice the elimination of double evaluation by using the local
+variable with the obscure name. Writing safe and efficient macros
+requires great care. One problem with macros cannot be portably worked
+around: since a C block has no value, a macro used as an expression
+rather than a statement cannot use the techniques just described to
+avoid multiple evaluation.
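+
+ The difference between the two macro styles can be observed directly.
+The following is a small self-contained sketch, not XEmacs code: the
+helpers `next_item', `item_marked_p' and `mark_item' and both macros
+are hypothetical stand-ins for the real marking functions. A counter
+records how often the macro argument is evaluated.
+
```c
#include <assert.h>

int evaluations;   /* how often the macro argument was evaluated */
int marks;         /* how often mark_item() was called */

/* Hypothetical argument expression with a visible side effect. */
int next_item (void) { evaluations++; return 7; }

/* Hypothetical stand-ins for marked_p() and mark_object(). */
int item_marked_p (int item) { (void) item; return 0; }
void mark_item (int item) { (void) item; marks++; }

/* Poor style: ITEM is evaluated twice, and the bare `if' would
   capture a following `else'. */
#define MARK_ITEM_UNSAFE(item) \
  if (!item_marked_p (item)) mark_item (item)

/* Better style: ITEM is evaluated exactly once into a local variable,
   and the whole macro behaves like a single statement. */
#define MARK_ITEM_SAFE(item) do {       \
  int mi_item = (item);                 \
  if (!item_marked_p (mi_item))         \
    mark_item (mi_item);                \
} while (0)

/* Return how many times the argument of the unsafe macro ran. */
int
count_evaluations_unsafe (void)
{
  evaluations = 0;
  MARK_ITEM_UNSAFE (next_item ());
  return evaluations;
}

/* The safe macro can sit in an if/else without braces.  Returns the
   evaluation count when FLAG is set, and -1 when the else branch ran. */
int
count_evaluations_safe (int flag)
{
  evaluations = 0;
  if (flag)
    MARK_ITEM_SAFE (next_item ());
  else
    evaluations = -1;
  return evaluations;
}
```
+
+ Here `count_evaluations_unsafe ()' returns 2, because the argument is
+evaluated once for the test and once for the call, while
+`count_evaluations_safe (1)' returns 1.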
+
+ In most cases where a macro has function semantics, an inline
+function is a better implementation technique. Modern compiler
+optimizers tend to inline functions even if they have no `inline'
+keyword, and configure magic ensures that the `inline' keyword can be
+safely used as an additional compiler hint. Inline functions used in a
+single `.c' file are easy. The function must already be defined to be
+`static'. Just add another `inline' keyword to the definition.
+
+ inline static int
+ heavily_used_small_function (int arg)
+ {
+ ...
+ }
+
+ Inline functions in header files are trickier, because we would like
+to make the following optimization if the function is _not_ inlined
+(for example, because we're compiling for debugging). We would like the
+function to be defined externally exactly once, and each calling
+translation unit would create an external reference to the function,
+instead of including a definition of the inline function in the object
+code of every translation unit that uses it. This optimization is
+currently only available for gcc. But you don't have to worry about the
+trickiness; just define your inline functions in header files using this
+pattern:
+
+ INLINE_HEADER int
+ i_used_to_be_a_crufty_macro_but_look_at_me_now (int arg);
+ INLINE_HEADER int
+ i_used_to_be_a_crufty_macro_but_look_at_me_now (int arg)
+ {
+ ...
+ }
+
+ The declaration right before the definition is to prevent warnings
+when compiling with `gcc -Wmissing-declarations'. I consider issuing
+this warning for inline functions a gcc bug, but the gcc maintainers
+disagree.
+
+ Every header which contains inline functions, either directly by
+using `INLINE_HEADER' or indirectly by using `DECLARE_LRECORD', must be
+added to `inline.c''s includes to make the optimization described above
+work. (Optimization note: if all INLINE_HEADER functions are in fact
+inlined in all translation units, then the linker can just discard
+`inline.o', since it contains only unreferenced code).
+
+ To get started debugging XEmacs, take a look at the `.gdbinit' and
+`.dbxrc' files in the `src' directory. See the section in the XEmacs
+FAQ on How to Debug an XEmacs problem with a debugger.
+
+ After making source code changes, run `make check' to ensure that
+you haven't introduced any regressions. If you want to make XEmacs more
+reliable, please improve the test suite in `tests/automated'.
+
+ Did you make sure you didn't introduce any new compiler warnings?
+
+ Before submitting a patch, please try compiling at least once with
+
+ configure --with-mule --use-union-type --error-checking=all
+
+ Here are things to know when you create a new source file:
+
+ * All `.c' files should `#include <config.h>' first. Almost all
+ `.c' files should `#include "lisp.h"' second.
+
+ * Generated header files should be included using the `#include
+ <...>' syntax, not the `#include "..."' syntax. The generated
+ headers are:
+
+ `config.h sheap-adjust.h paths.h Emacs.ad.h'
+
+ The basic rule is that you should assume builds using `--srcdir'
+ and the `#include <...>' syntax needs to be used when the
+ to-be-included generated file is in a potentially different
+ directory _at compile time_. The non-obvious C rule is that
+ `#include "..."' means to search for the included file in the same
+ directory as the including file, _not_ in the current directory.
+
+ * Header files should _not_ include `<config.h>' and `"lisp.h"'. It
+ is the responsibility of the `.c' files that use it to do so.
+
+
+ Here is a checklist of things to do when creating a new lisp object
+type named FOO:
+
+ 1. create FOO.h
+
+ 2. create FOO.c
+
+ 3. add definitions of `syms_of_FOO', etc. to `FOO.c'
+
+ 4. add declarations of `syms_of_FOO', etc. to `symsinit.h'
+
+ 5. add calls to `syms_of_FOO', etc. to `emacs.c'
+
+ 6. add definitions of macros like `CHECK_FOO' and `FOOP' to `FOO.h'
+
+ 7. add the new type index to `enum lrecord_type'
+
+ 8. add a DEFINE_LRECORD_IMPLEMENTATION call to `FOO.c'
+
+ 9. add an INIT_LRECORD_IMPLEMENTATION call to `syms_of_FOO' in `FOO.c'
+
+\1f
+File: internals.info, Node: Regression Testing XEmacs, Next: A Summary of the Various XEmacs Modules, Prev: Rules When Writing New C Code, Up: Top
+
+Regression Testing XEmacs
+*************************
+
+The source directory `tests/automated' contains XEmacs' automated test
+suite. The usual way of running all the tests is running `make check'
+from the top-level source directory.
+
+ The test suite is unfinished and it's still lacking some essential
+features. It is nevertheless recommended that you run the tests to
+confirm that XEmacs behaves correctly.
+
+ If you want to run a specific test case, you can do it from the
+command-line like this:
+
+ $ xemacs -batch -l test-harness.elc -f batch-test-emacs TEST-FILE
+
+ If something goes wrong, you can run the test suite interactively by
+loading `test-harness.el' into a running XEmacs and typing `M-x
+test-emacs-test-file RET <filename> RET'. You will see a log of passed
+and failed tests, which should allow you to investigate the source of
+the error and ultimately fix the bug.
+
+ Adding a new test file is trivial: just create a new file in
+`tests/automated' and it will be run. There is no need to byte-compile
+any of the files in this directory--the test-harness will take care of
+any necessary byte-compilation.
+
+ Look at the existing test cases for examples of how to write test
+cases. It all boils down to your imagination and judicious use of the
+macros `Assert', `Check-Error', `Check-Error-Message', and
+`Check-Message'.
+
+ Here's a simple example checking case-sensitive and case-insensitive
+comparisons from `case-tests.el'.
+
+ (with-temp-buffer
+ (insert "Test Buffer")
+ (let ((case-fold-search t))
+ (goto-char (point-min))
+ (Assert (eq (search-forward "test buffer" nil t) 12))
+ (goto-char (point-min))
+ (Assert (eq (search-forward "Test buffer" nil t) 12))
+ (goto-char (point-min))
+ (Assert (eq (search-forward "Test Buffer" nil t) 12))
+
+ (setq case-fold-search nil)
+ (goto-char (point-min))
+ (Assert (not (search-forward "test buffer" nil t)))
+ (goto-char (point-min))
+ (Assert (not (search-forward "Test buffer" nil t)))
+ (goto-char (point-min))
+ (Assert (eq (search-forward "Test Buffer" nil t) 12))))
+
+ This example could be inserted in a file in `tests/automated', and
+it would be a complete test, automatically executed when you run `make
+check' after building XEmacs. More complex tests may require
+substantial temporary scaffolding to create the environment that elicits
+the bugs, but the top-level Makefile and `test-harness.el' handle the
+running and collection of results from the `Assert', `Check-Error',
+`Check-Error-Message', and `Check-Message' macros.
+
+ In general, you should avoid using functionality from packages in
+your tests, because you can't be sure that everyone will have the
+required package. However, if you've got a test that works, by all
+means add it. Simply wrap the test in an appropriate conditional, add a
+notice that the test was skipped, and update the `skipped-test-reasons'
+hash table. Here's an example from `syntax-tests.el':
+
+ ;; Test forward-comment at buffer boundaries
+ (with-temp-buffer
+
+ ;; try to use exactly what you need: featurep, boundp, fboundp
+ (if (not (fboundp 'c-mode))
+
+ ;; We should provide a standard function for this boilerplate,
+ ;; probably called `Skip-Test' -- check for that API with C-h f
+ (let* ((reason "c-mode unavailable")
+ (count (gethash reason skipped-test-reasons)))
+ (puthash reason (if (null count) 1 (1+ count))
+ skipped-test-reasons)
+ (Print-Skip "comment and parse-partial-sexp tests" reason))
+
+ ;; and here's the test code
+ (c-mode)
+ (insert "// comment\n")
+ (forward-comment -2)
+ (Assert (eq (point) (point-min)))
+ (let ((point (point)))
+ (insert "/* comment */")
+ (goto-char point)
+ (forward-comment 2)
+ (Assert (eq (point) (point-max)))
+ (parse-partial-sexp point (point-max)))))
+
+ `Skip-Test' is intended for use with features that are normally
+present in typical configurations. For truly optional features, or
+tests that apply to one of several alternative implementations (e.g., to
+GTK widgets, but not Athena, Motif, MS Windows, or Carbon), simply
+silently omit the test.
+
+\1f
+File: internals.info, Node: A Summary of the Various XEmacs Modules, Next: Allocation of Objects in XEmacs Lisp, Prev: Regression Testing XEmacs, Up: Top
+
+A Summary of the Various XEmacs Modules
+***************************************
+
+This is accurate as of XEmacs 20.0.
+
+* Menu:
+
+* Low-Level Modules::
+* Basic Lisp Modules::
+* Modules for Standard Editing Operations::
+* Editor-Level Control Flow Modules::
+* Modules for the Basic Displayable Lisp Objects::
+* Modules for other Display-Related Lisp Objects::
+* Modules for the Redisplay Mechanism::
+* Modules for Interfacing with the File System::
+* Modules for Other Aspects of the Lisp Interpreter and Object System::
+* Modules for Interfacing with the Operating System::
+* Modules for Interfacing with X Windows::
+* Modules for Internationalization::
+* Modules for Regression Testing::
+
+\1f
+File: internals.info, Node: Low-Level Modules, Next: Basic Lisp Modules, Up: A Summary of the Various XEmacs Modules
+
+Low-Level Modules
+=================
+
+ config.h
+
+ This is automatically generated from `config.h.in' based on the
+results of configure tests and user-selected optional features and
+contains preprocessor definitions specifying the nature of the
+environment in which XEmacs is being compiled.
+
+ paths.h
+
+ This is automatically generated from `paths.h.in' based on supplied
+configure values, and allows for non-standard installed configurations
+of the XEmacs directories. It's currently broken, though.
+
+ emacs.c
+ signal.c
+
+ `emacs.c' contains `main()' and other code that performs the most
+basic environment initializations and handles shutting down the XEmacs
+process (this includes `kill-emacs', the normal way that XEmacs is
+exited; `dump-emacs', which is used during the build process to write
+out the XEmacs executable; `run-emacs-from-temacs', which can be used
+to start XEmacs directly when temacs has finished loading all the Lisp
+code; and emergency code to handle crashes [XEmacs tries to auto-save
+all files before it crashes]).
+
+ Low-level code that directly interacts with the Unix signal
+mechanism, however, is in `signal.c'. Note that this code does not
+handle system dependencies in interfacing to signals; that is handled
+using the `syssignal.h' header file, described in section J below.
+
+ unexaix.c
+ unexalpha.c
+ unexapollo.c
+ unexconvex.c
+ unexec.c
+ unexelf.c
+ unexelfsgi.c
+ unexencap.c
+ unexenix.c
+ unexfreebsd.c
+ unexfx2800.c
+ unexhp9k3.c
+ unexhp9k800.c
+ unexmips.c
+ unexnext.c
+ unexsol2.c
+ unexsunos4.c
+
+ These modules contain code dumping out the XEmacs executable on
+various different systems. (This process is highly machine-specific and
+requires intimate knowledge of the executable format and the memory map
+of the process.) Only one of these modules is actually used; this is
+chosen by `configure'.
+
+ ecrt0.c
+ lastfile.c
+ pre-crt0.c
+
+ These modules are used in conjunction with the dump mechanism. On
+some systems, an alternative version of the C startup code (the actual
+code that receives control from the operating system when the process is
+started, and which calls `main()') is required so that the dumping
+process works properly; `ecrt0.c' provides this.
+
+ `pre-crt0.c' and `lastfile.c' should be the very first and very last
+file linked, respectively. (Actually, this is not really true.
+`lastfile.c' should be after all Emacs modules whose initialized data
+should be made constant, and before all other Emacs files and all
+libraries. In particular, the allocation modules `gmalloc.c',
+`alloca.c', etc. are normally placed past `lastfile.c', and all of the
+files that implement Xt widget classes _must_ be placed after
+`lastfile.c' because they contain various structures that must be
+statically initialized and into which Xt writes at various times.)
+`pre-crt0.c' and `lastfile.c' contain exported symbols that are used to
+determine the start and end of XEmacs' initialized data space when
+dumping.
+
+ alloca.c
+ free-hook.c
+ getpagesize.h
+ gmalloc.c
+ malloc.c
+ mem-limits.h
+ ralloc.c
+ vm-limit.c
+
+ These handle basic C allocation of memory. `alloca.c' is an
+emulation of the stack allocation function `alloca()' on machines that
+lack this. (XEmacs makes extensive use of `alloca()' in its code.)
+
+ `gmalloc.c' and `malloc.c' are two implementations of the standard C
+functions `malloc()', `realloc()' and `free()'. They are often used in
+place of the standard system-provided `malloc()' because they usually
+provide a much faster implementation, at the expense of additional
+memory use. `gmalloc.c' is a newer implementation that is much more
+memory-efficient for large allocations than `malloc.c', and should
+always be preferred if it works. (At one point, `gmalloc.c' didn't work
+on some systems where `malloc.c' worked; but this should be fixed now.)
+
+ `ralloc.c' is the "relocating allocator". It provides functions
+similar to `malloc()', `realloc()' and `free()' that allocate memory
+that can be dynamically relocated in memory. The advantage of this is
+that allocated memory can be shuffled around to place all the free
+memory at the end of the heap, and the heap can then be shrunk,
+releasing the memory back to the operating system. The use of this can
+be controlled with the configure option `--rel-alloc'; if enabled,
+memory allocated for buffers will be relocatable, so that if a very
+large file is visited and the buffer is later killed, the memory can be
+released to the operating system. (The disadvantage of this mechanism
+is that it can be very slow. On systems with the `mmap()' system call,
+the XEmacs version of `ralloc.c' uses this to move memory around
+without actually having to block-copy it, which can speed things up;
+but it can still cause noticeable performance degradation.)
+
+ `free-hook.c' contains some debugging functions for checking for
+invalid arguments to `free()'.
+
+ `vm-limit.c' contains some functions that warn the user when memory
+is getting low. These are callback functions that are called by
+`gmalloc.c' and `malloc.c' at appropriate times.
+
+ `getpagesize.h' provides a uniform interface for retrieving the size
+of a page in virtual memory. `mem-limits.h' provides a uniform
+interface for retrieving the total amount of available virtual memory.
+Both are similar in spirit to the `sys*.h' files described in section
+J, below.
+
+ blocktype.c
+ blocktype.h
+ dynarr.c
+
+ These implement a couple of basic C data types to facilitate memory
+allocation. The `Blocktype' type efficiently manages the allocation of
+fixed-size blocks by minimizing the number of times that `malloc()' and
+`free()' are called. It allocates memory in large chunks, subdivides
+the chunks into blocks of the proper size, and returns the blocks as
+requested. When blocks are freed, they are placed onto a linked list,
+so they can be efficiently reused. This data type is not much used in
+XEmacs currently, because it's a fairly new addition.
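+
+ The general technique described here (one large `malloc()' per chunk,
+with freed blocks kept on a linked list) can be sketched as follows.
+This is a simplified illustration, not the actual `Blocktype'
+interface: `struct pool', `pool_init', `pool_alloc', `pool_free' and
+`BLOCKS_PER_CHUNK' are hypothetical names. The sketch never returns
+chunks to the system and only guarantees pointer alignment.
+
```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BLOCKS_PER_CHUNK 64

/* A free block stores the link to the next free block in its own
   memory, so the free list needs no extra storage. */
union free_block
{
  union free_block *next;
};

struct pool
{
  size_t block_size;            /* rounded up so that a link fits */
  union free_block *free_list;  /* blocks ready for reuse */
  size_t chunks_allocated;      /* number of malloc() calls so far */
};

void
pool_init (struct pool *pool, size_t block_size)
{
  size_t unit = sizeof (union free_block);

  /* Round up to a multiple of the link size, so that every block in a
     chunk stays pointer-aligned. */
  pool->block_size = (block_size + unit - 1) / unit * unit;
  if (pool->block_size == 0)
    pool->block_size = unit;
  pool->free_list = NULL;
  pool->chunks_allocated = 0;
}

/* Return one block, or NULL if memory runs out. */
void *
pool_alloc (struct pool *pool)
{
  union free_block *block;

  if (pool->free_list == NULL)
    {
      /* Out of free blocks: get one large chunk and subdivide it. */
      char *chunk = malloc (pool->block_size * BLOCKS_PER_CHUNK);
      size_t i;

      if (chunk == NULL)
        return NULL;
      pool->chunks_allocated++;
      for (i = 0; i < BLOCKS_PER_CHUNK; i++)
        {
          block = (union free_block *) (chunk + i * pool->block_size);
          block->next = pool->free_list;
          pool->free_list = block;
        }
    }
  block = pool->free_list;
  pool->free_list = block->next;
  return block;
}

/* Put a block obtained from pool_alloc() back onto the free list. */
void
pool_free (struct pool *pool, void *ptr)
{
  union free_block *block = ptr;

  block->next = pool->free_list;
  pool->free_list = block;
}
```
+
+ Allocating two blocks, freeing the first and allocating again hands
+back the freed block, and `malloc()' has still been called only once.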
+
+ The `Dynarr' type implements a "dynamic array", which is similar to
+a standard C array but has no fixed limit on the number of elements it
+can contain. Dynamic arrays can hold elements of any type, and when
+you add a new element, the array automatically resizes itself if it
+isn't big enough. Dynarrs are extensively used in the redisplay
+mechanism.
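+
+ A minimal sketch of the growing-array idea is shown below. It is not
+the `Dynarr' interface: the real type handles elements of any type,
+whereas this illustration is restricted to `int', and `struct
+int_array', `int_array_add' and `int_array_free' are hypothetical
+names. The capacity doubles whenever the array is full, so adding an
+element takes constant time on average.
+
```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct int_array
{
  int *items;        /* storage, or NULL while empty */
  size_t length;     /* number of elements in use */
  size_t capacity;   /* number of elements allocated */
};

/* Append VALUE, growing the storage if necessary.  Returns 1 on
   success and 0 if memory ran out (the array is then unchanged). */
int
int_array_add (struct int_array *array, int value)
{
  if (array->length == array->capacity)
    {
      size_t new_capacity = array->capacity ? array->capacity * 2 : 8;
      int *new_items = realloc (array->items,
                                new_capacity * sizeof (int));

      if (new_items == NULL)
        return 0;
      array->items = new_items;
      array->capacity = new_capacity;
    }
  array->items[array->length++] = value;
  return 1;
}

/* Release the storage and reset the array to the empty state. */
void
int_array_free (struct int_array *array)
{
  free (array->items);
  array->items = NULL;
  array->length = 0;
  array->capacity = 0;
}
```
+
+ An array must start out as `{ NULL, 0, 0 }'. After adding 100
+elements, this sketch has grown through capacities 8, 16, 32, 64 and
+128.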
+
+ inline.c
+
+ This module is used in connection with inline functions (available in
+some compilers). Often, inline functions need to have a corresponding
+non-inline function that does the same thing. This module is where they
+reside. It contains no actual code, but defines some special flags that
+cause inline functions defined in header files to be rendered as actual
+functions. It then includes all header files that contain any inline
+function definitions, so that each one gets a real function equivalent.
+
+ debug.c
+ debug.h
+
+ These functions provide a system for doing internal consistency
+checks during code development. This system is not currently used;
+instead the simpler `assert()' macro is used along with the various
+checks provided by the `--error-check-*' configuration options.
+
+ universe.h
+
+ This is not currently used.
+
+\1f
+File: internals.info, Node: Basic Lisp Modules, Next: Modules for Standard Editing Operations, Prev: Low-Level Modules, Up: A Summary of the Various XEmacs Modules
+
+Basic Lisp Modules
+==================
+
+ lisp-disunion.h
+ lisp-union.h
+ lisp.h
+ lrecord.h
+ symsinit.h
+
+ These are the basic header files for all XEmacs modules. Each module
+includes `lisp.h', which brings the other header files in. `lisp.h'
+contains the definitions of the structures and extractor and
+constructor macros for the basic Lisp objects and various other basic
+definitions for the Lisp environment, as well as some general-purpose
+definitions (e.g. `min()' and `max()'). `lisp.h' includes either
+`lisp-disunion.h' or `lisp-union.h', depending on whether
+`USE_UNION_TYPE' is defined. These files define the typedef of the
+Lisp object itself (as described above) and the low-level macros that
+hide the actual implementation of the Lisp object. All extractor and
+constructor macros for particular types of Lisp objects are defined in
+terms of these low-level macros.
+
+ As a general rule, all typedefs should go into the typedefs section
+of `lisp.h' rather than into a module-specific header file even if the
+structure is defined elsewhere. This allows function prototypes that
+use the typedef to be placed into other header files. Forward structure
+declarations (i.e. a simple declaration like `struct foo;' where the
+structure itself is defined elsewhere) should be placed into the
+typedefs section as necessary.
+
+ `lrecord.h' contains the basic structures and macros that implement
+all record-type Lisp objects--i.e. all objects whose type is a field in
+their C structure, which includes all objects except the few most basic
+ones.
+
+ `lisp.h' contains prototypes for most of the exported functions in
+the various modules. Lisp primitives defined using `DEFUN' that need
+to be called by C code should be declared using `EXFUN'. Other
+function prototypes should be placed either into the appropriate
+section of `lisp.h', or into a module-specific header file, depending
+on how general-purpose the function is and whether it has
+special-purpose argument types requiring definitions not in `lisp.h'.
+All initialization functions are prototyped in `symsinit.h'.
+
+ alloc.c
+
+ The large module `alloc.c' implements all of the basic allocation and
+garbage collection for Lisp objects. The most commonly used Lisp
+objects are allocated in chunks, similar to the Blocktype data type
+described above; others are allocated in individually `malloc()'ed
+blocks. This module provides the foundation on which all other aspects
+of the Lisp environment sit, and is the first module initialized at
+startup.
+
+ Note that `alloc.c' provides a series of generic functions that are
+not dependent on any particular object type, and interfaces to
+particular types of objects using a standardized interface of
+type-specific methods. This scheme is a fundamental principle of
+object-oriented programming and is heavily used throughout XEmacs. The
+great advantage of this is that it allows for a clean separation of
+functionality into different modules--new classes of Lisp objects, new
+event interfaces, new device types, new stream interfaces, etc. can be
+added transparently without affecting code anywhere else in XEmacs.
+Because the different subsystems are divided into general and specific
+code, adding a new subtype within a subsystem will in general not
+require changes to the generic subsystem code or affect any of the other
+subtypes in the subsystem; this provides a great deal of robustness to
+the XEmacs code.
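+
+ The split between generic code and type-specific methods can be
+sketched in C with a table of function pointers. This is only an
+illustration of the principle, not the actual XEmacs structures:
+`struct object', `struct object_methods' and all functions below are
+hypothetical names. The generic function `total_size' works for any
+type that supplies a method table, so a new type can be added without
+touching it.
+
```c
#include <assert.h>
#include <stddef.h>

struct object;

/* Hypothetical per-type method table. */
struct object_methods
{
  const char *name;
  int (*size_in_bytes) (const struct object *obj);
};

/* Every object records its type as a pointer to its method table. */
struct object
{
  const struct object_methods *methods;
  int payload_length;   /* only meaningful for variable-size objects */
};

/* Type-specific method for objects of fixed size. */
int
fixed_size (const struct object *obj)
{
  (void) obj;
  return (int) sizeof (struct object);
}

/* Type-specific method for objects that carry a payload. */
int
variable_size (const struct object *obj)
{
  return (int) sizeof (struct object) + obj->payload_length;
}

const struct object_methods fixed_methods = { "fixed", fixed_size };
const struct object_methods variable_methods
  = { "variable", variable_size };

/* Generic code: knows nothing about particular types and simply
   dispatches through each object's method table. */
int
total_size (const struct object *objects, size_t count)
{
  int total = 0;
  size_t i;

  for (i = 0; i < count; i++)
    total += objects[i].methods->size_in_bytes (&objects[i]);
  return total;
}
```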
+
+ eval.c
+ backtrace.h
+
+ This module contains all of the functions to handle the flow of
+control. This includes the mechanisms of defining functions, calling
+functions, traversing stack frames, and binding variables; the control
+primitives and other special forms such as `while', `if', `eval',
+`let', `and', `or', `progn', etc.; handling of non-local exits,
+unwind-protects, and exception handlers; entering the debugger; methods
+for the subr Lisp object type; etc. It does _not_ include the `read'
+function, the `print' function, or the handling of symbols and obarrays.
+
+ `backtrace.h' contains some structures related to stack frames and
+the flow of control.
+
+ lread.c
+
+ This module implements the Lisp reader and the `read' function,
+which converts text into Lisp objects, according to the read syntax of
+the objects, as described above. This is similar to the parser that is
+a part of all compilers.
+
+ print.c
+
+ This module implements the Lisp print mechanism and the `print'
+function and related functions. This is the inverse of the Lisp reader
+- it converts Lisp objects to a printed, textual representation.
+(Hopefully something that can be read back in using `read' to get an
+equivalent object.)
+
+ general.c
+ symbols.c
+ symeval.h
+
+ `symbols.c' implements the handling of symbols, obarrays, and
+retrieving the values of symbols. Much of the code is devoted to
+handling the special "symbol-value-magic" objects that define special
+types of variables--this includes buffer-local variables, variable
+aliases, variables that forward into C variables, etc. This module is
+initialized extremely early (right after `alloc.c'), because it is here
+that the basic symbols `t' and `nil' are created, and those symbols are
+used everywhere throughout XEmacs.
+
+ `symeval.h' contains the definitions of symbol structures and the
+`DEFVAR_LISP()' and related macros for declaring variables.
+
+ data.c
+ floatfns.c
+ fns.c
+
+ These modules implement the methods and standard Lisp primitives for
+all the basic Lisp object types other than symbols (which are described
+above). `data.c' contains all the predicates (primitives that return
+whether an object is of a particular type); the integer arithmetic
+functions; and the basic accessor and mutator primitives for the various
+object types. `fns.c' contains all the standard predicates for working
+with sequences (where, abstractly speaking, a sequence is an ordered set
+of objects, and can be represented by a list, string, vector, or
+bit-vector); it also contains `equal', perhaps on the grounds that the
+bulk of the operation of `equal' is comparing sequences. `floatfns.c'
+contains methods and primitives for floats and floating-point
+arithmetic.
+
+ bytecode.c
+ bytecode.h
+
+ `bytecode.c' implements the byte-code interpreter and
+compiled-function objects, and `bytecode.h' contains associated
+structures. Note that the byte-code _compiler_ is written in Lisp.
+
+\1f
+File: internals.info, Node: Modules for Standard Editing Operations, Next: Editor-Level Control Flow Modules, Prev: Basic Lisp Modules, Up: A Summary of the Various XEmacs Modules
+
+Modules for Standard Editing Operations
+=======================================
+
+ buffer.c
+ buffer.h
+ bufslots.h
+
+ `buffer.c' implements the "buffer" Lisp object type. This includes
+functions that create and destroy buffers; retrieve buffers by name or
+by other properties; manipulate lists of buffers (remember that buffers
+are permanent objects and stored in various ordered lists); retrieve or
+change buffer properties; etc. It also contains the definitions of all
+the built-in buffer-local variables (which can be viewed as buffer
+properties). It does _not_ contain code to manipulate buffer-local
+variables (that's in `symbols.c', described above); or code to
+manipulate the text in a buffer.
+
+ `buffer.h' defines the structures associated with a buffer and the
+various macros for retrieving text from a buffer and special buffer
+positions (e.g. `point', the default location for text insertion). It
+also contains macros for working with buffer positions and converting
+between their representations as character offsets and as byte offsets
+(under MULE, they are different, because characters can be multi-byte).
+It is one of the largest header files.
+
+ `bufslots.h' defines the fields in the buffer structure that
+correspond to the built-in buffer-local variables. It is its own
+header file because it is included many times in `buffer.c', as a way
+of iterating over all the built-in buffer-local variables.
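+
+ A minimal sketch of this repeated-inclusion idiom follows. It is not
+the actual contents of `bufslots.h': the slot names, default values, and
+every `toy_'-prefixed identifier are hypothetical, and a single list
+macro stands in for the separately included header so that the example
+fits in one file.
+
```c
#include <stddef.h>

/* Hypothetical slot list.  In the scheme described above the list would
   live in its own header and be re-included under different macro
   definitions; here one list macro plays that role. */
#define TOY_BUFFER_SLOTS(X) \
  X(fill_column, 70)        \
  X(tab_width, 8)           \
  X(left_margin, 0)

/* Expansion 1: declare one structure field per slot. */
struct toy_buffer {
#define TOY_DECLARE_FIELD(name, dflt) int name;
  TOY_BUFFER_SLOTS(TOY_DECLARE_FIELD)
#undef TOY_DECLARE_FIELD
};

/* Expansion 2: reset every slot to its default value. */
static void toy_reset_buffer(struct toy_buffer *b)
{
#define TOY_RESET_FIELD(name, dflt) b->name = (dflt);
  TOY_BUFFER_SLOTS(TOY_RESET_FIELD)
#undef TOY_RESET_FIELD
}

/* Expansion 3: build a table of slot names as strings. */
static const char *const toy_slot_names[] = {
#define TOY_SLOT_NAME(name, dflt) #name,
  TOY_BUFFER_SLOTS(TOY_SLOT_NAME)
#undef TOY_SLOT_NAME
};

/* Number of slots in the list, derived from the name table. */
static size_t toy_slot_count(void)
{
  return sizeof toy_slot_names / sizeof toy_slot_names[0];
}
```
+
+ Adding a slot to the list updates the structure, the reset code, and
+the name table together, which is the point of keeping the list in one
+place.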
+
+ insdel.c
+ insdel.h
+
+ `insdel.c' contains low-level functions for inserting and deleting
+text in a buffer, keeping track of changed regions for use by
+redisplay, and calling any before-change and after-change functions
+that may have been registered for the buffer. It also contains the
+actual functions that convert between byte offsets and character
+offsets.
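+
+ The following sketch illustrates why the two kinds of offsets differ
+and how one can be converted to the other by scanning. It is only an
+illustration: it assumes UTF-8 as a stand-in multi-byte encoding
+(which is not the encoding described in this manual), and the
+`toy_'-prefixed functions are hypothetical, not the functions in
+`insdel.c'.
+
```c
#include <stddef.h>

/* True if BYTE begins a character, i.e. is not a UTF-8 continuation
   byte.  UTF-8 is an assumption made for this sketch only. */
static int toy_starts_char(unsigned char byte)
{
  return (byte & 0xC0) != 0x80;
}

/* Return the byte offset at which character number CHARPOS begins.
   Returns LEN if CHARPOS is at or beyond the end of the text. */
static size_t toy_char_to_byte(const unsigned char *text, size_t len,
                               size_t charpos)
{
  size_t bytepos;
  size_t chars_seen = 0;

  for (bytepos = 0; bytepos < len; bytepos++) {
    if (toy_starts_char(text[bytepos])) {
      if (chars_seen == charpos)
        return bytepos;
      chars_seen++;
    }
  }
  return len;
}

/* Return the number of characters that begin before byte offset
   BYTEPOS (clamped to LEN). */
static size_t toy_byte_to_char(const unsigned char *text, size_t len,
                               size_t bytepos)
{
  size_t i;
  size_t count = 0;

  if (bytepos > len)
    bytepos = len;
  for (i = 0; i < bytepos; i++)
    if (toy_starts_char(text[i]))
      count++;
  return count;
}
```
+
+ Both conversions here are linear scans, which is the simplest thing
+that works for a short example.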
+
+ `insdel.h' contains associated headers.
+
+ marker.c
+
+ This module implements the "marker" Lisp object type, which
+conceptually is a pointer to a text position in a buffer that moves
+around as text is inserted and deleted, so as to remain in the same
+relative position. This module doesn't actually move the markers around
+- that's handled in `insdel.c'. This module just creates them and
+implements the primitives for working with them. As markers are simple
+objects, this does not entail much.
+
+ Note that the standard arithmetic primitives (e.g. `+') accept
+markers in place of integers and automatically substitute the value of
+`marker-position' for the marker, i.e. an integer describing the
+current buffer position of the marker.
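+
+ To make the idea of "remaining in the same relative position"
+concrete, here is a small sketch of how marker positions might be
+adjusted when text is inserted or deleted. The structure and function
+names are hypothetical, and the choice that a marker exactly at the
+insertion point stays put is an assumption of the sketch, not a
+statement about what `insdel.c' does.
+
```c
#include <stddef.h>

/* Hypothetical marker: just a character position in some buffer. */
struct toy_marker {
  size_t pos;
};

/* LEN characters were inserted at position AT.  Markers after AT move
   forward by LEN; markers at or before AT are left alone (an
   assumption of this sketch). */
static void toy_adjust_for_insert(struct toy_marker *m, size_t n,
                                  size_t at, size_t len)
{
  size_t i;
  for (i = 0; i < n; i++)
    if (m[i].pos > at)
      m[i].pos += len;
}

/* The range [FROM, TO) was deleted.  Markers at or after TO move back
   by the deleted length; markers inside the range collapse to FROM. */
static void toy_adjust_for_delete(struct toy_marker *m, size_t n,
                                  size_t from, size_t to)
{
  size_t i;
  for (i = 0; i < n; i++) {
    if (m[i].pos >= to)
      m[i].pos -= to - from;
    else if (m[i].pos > from)
      m[i].pos = from;
  }
}
```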
+
+ extents.c
+ extents.h
+
+ This module implements the "extent" Lisp object type, which is like
+a marker that works over a range of text rather than a single position.
+Extents are also much more complex and powerful than markers and have a
+more efficient (and more algorithmically complex) implementation. The
+implementation is described in detail in comments in `extents.c'.
+
+ The code in `extents.c' works closely with `insdel.c' so that
+extents are properly moved around as text is inserted and deleted.
+There is also code in `extents.c' that provides information needed by
+the redisplay mechanism for efficient operation. (Remember that extents
+can have display properties that affect [sometimes drastically, as in
+the `invisible' property] the display of the text they cover.)
+
+ editfns.c
+
+ `editfns.c' contains the standard Lisp primitives for working with a
+buffer's text, and calls the low-level functions in `insdel.c'. It
+also contains primitives for working with `point' (the default buffer
+insertion location).
+
+ `editfns.c' also contains functions for retrieving various
+characteristics from the external environment: the current time, the
+process ID of the running XEmacs process, the name of the user who ran
+this XEmacs process, etc. It's not clear why this code is in
+`editfns.c'.
+
+ callint.c
+ cmds.c
+ commands.h
+
+ These modules implement the basic "interactive" commands, i.e.
+user-callable functions. Commands, as opposed to other functions, have
+special ways of getting their parameters interactively (by querying the
+user), as opposed to having them passed in a normal function
+invocation. Many commands are not really meant to be called from other
+Lisp functions, because they modify global state in a way that's often
+undesired as part of other Lisp functions.
+
+ `callint.c' implements the mechanism for querying the user for
+parameters and calling interactive commands. The bulk of this module is
+code that parses the interactive spec that is supplied with an
+interactive command.
+
+ `cmds.c' implements the basic, most commonly used editing commands:
+commands to move around the current buffer and insert and delete
+characters. These commands are implemented using the Lisp primitives
+defined in `editfns.c'.
+
+ `commands.h' contains associated structure definitions and
+prototypes.
+
+ regex.c
+ regex.h
+ search.c
+
+ `search.c' implements the Lisp primitives for searching for text in
+a buffer, and some of the low-level algorithms for doing this. In
+particular, the fast fixed-string Boyer-Moore search algorithm is
+implemented in `search.c'. The low-level algorithms for doing
+regular-expression searching, however, are implemented in `regex.c' and
+`regex.h'. These two modules are largely independent of XEmacs, and
+are similar to (and based upon) the regular-expression routines used in
+`grep' and other GNU utilities.
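+
+ For readers unfamiliar with this family of algorithms, the sketch
+below shows the Horspool simplification of Boyer-Moore, which keeps
+only the "bad character" shift table. It is not the code in
+`search.c'; the function name is hypothetical and the full Boyer-Moore
+algorithm uses additional shift rules.
+
```c
#include <limits.h>
#include <stddef.h>

/* Return the index of the first occurrence of PAT (length M) in TEXT
   (length N), or (size_t)-1 if there is none.  An empty pattern
   matches at index 0. */
static size_t toy_horspool_search(const unsigned char *text, size_t n,
                                  const unsigned char *pat, size_t m)
{
  size_t shift[UCHAR_MAX + 1];
  size_t i, pos;

  if (m == 0)
    return 0;
  if (m > n)
    return (size_t)-1;

  /* By default a mismatch lets us skip a whole pattern length. */
  for (i = 0; i <= UCHAR_MAX; i++)
    shift[i] = m;
  /* For bytes occurring in the pattern (except its last position),
     shift so that their rightmost occurrence lines up with the text. */
  for (i = 0; i + 1 < m; i++)
    shift[pat[i]] = m - 1 - i;

  pos = 0;
  while (pos <= n - m) {
    size_t j = m;
    /* Compare the window against the pattern from right to left. */
    while (j > 0 && text[pos + j - 1] == pat[j - 1])
      j--;
    if (j == 0)
      return pos;
    /* Skip ahead based on the text byte under the pattern's last slot. */
    pos += shift[text[pos + m - 1]];
  }
  return (size_t)-1;
}
```
+
+ The speed comes from the shift table: on a mismatch the search window
+can often jump forward by the full pattern length instead of by one
+position.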
+
+ doprnt.c
+
+ `doprnt.c' implements formatted-string processing, similar to the
+`printf()' function in C.
+
+ undo.c
+
+ This module implements the undo mechanism for tracking buffer
+changes. Most of this could be implemented in Lisp.
+
+\1f
+File: internals.info, Node: Editor-Level Control Flow Modules, Next: Modules for the Basic Displayable Lisp Objects, Prev: Modules for Standard Editing Operations, Up: A Summary of the Various XEmacs Modules
+
+Editor-Level Control Flow Modules
+=================================
+
+ event-Xt.c
+ event-msw.c
+ event-stream.c
+ event-tty.c
+ events-mod.h
+ gpmevent.c
+ gpmevent.h
+ events.c
+ events.h
+
+ These implement the handling of events (user input and other system
+notifications).
+
+ `events.c' and `events.h' define the "event" Lisp object type and
+primitives for manipulating it.
+
+ `event-stream.c' implements the basic functions for working with
+event queues, dispatching an event by looking it up in relevant keymaps
+and such, and handling timeouts; this includes the primitives
+`next-event' and `dispatch-event', as well as related primitives such
+as `sit-for', `sleep-for', and `accept-process-output'.
+(`event-stream.c' is one of the hairiest and trickiest modules in
+XEmacs. Beware! You can easily mess things up here.)
+
+ `event-Xt.c' and `event-tty.c' implement the low-level interfaces
+onto retrieving events from Xt (the X toolkit) and from TTY's (using
+`read()' and `select()'), respectively. The event interface enforces a
+clean separation between the specific code for interfacing with the
+operating system and the generic code for working with events, by
+defining an API of basic, low-level event methods; `event-Xt.c' and
+`event-tty.c' are two different implementations of this API. To add
+support for a new operating system (e.g. NeXTstep), one merely needs to
+provide another implementation of those API functions.
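+
+ The separation described above can be pictured as a table of function
+pointers that generic code calls without knowing which implementation
+is behind it. The sketch below is only an illustration of that shape:
+the structure, the method names, and the "canned" backend are all
+hypothetical and are not the actual event-method API.
+
```c
#include <stddef.h>

/* Hypothetical event: a kind tag and a payload. */
struct toy_event {
  int kind;
  int data;
};

/* Hypothetical method table that each backend fills in. */
struct toy_event_methods {
  const char *name;
  int (*event_pending_p)(void *state);
  void (*next_event)(void *state, struct toy_event *out);
};

/* One backend: replays a fixed array of key codes. */
struct toy_canned_state {
  const int *keys;
  size_t count;
  size_t next;
};

static int toy_canned_pending(void *state)
{
  struct toy_canned_state *s = state;
  return s->next < s->count;
}

static void toy_canned_next(void *state, struct toy_event *out)
{
  struct toy_canned_state *s = state;
  out->kind = 1;                 /* 1 = key press in this sketch */
  out->data = s->keys[s->next++];
}

static const struct toy_event_methods toy_canned_methods = {
  "canned", toy_canned_pending, toy_canned_next
};

/* Generic code: drains all pending events through the method table,
   adds each payload to *TOTAL, and returns the number of events. */
static int toy_drain_events(const struct toy_event_methods *m,
                            void *state, long *total)
{
  int count = 0;
  struct toy_event ev;

  while (m->event_pending_p(state)) {
    m->next_event(state, &ev);
    *total += ev.data;
    count++;
  }
  return count;
}
```
+
+ Supporting another source of events would mean supplying another
+filled-in method table; `toy_drain_events' would not change.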
+
+ Note that the choice of whether to use `event-Xt.c' or `event-tty.c'
+is made at compile time! Or at the very latest, it is made at startup
+time. `event-Xt.c' handles events for _both_ X and TTY frames;
+`event-tty.c' is only used when X support is not compiled into XEmacs.
+The reason for this is that there is only one event loop in XEmacs:
+thus, it needs to be able to receive events from all different kinds of
+frames.
+
+ keymap.c
+ keymap.h
+
+ `keymap.c' and `keymap.h' define the "keymap" Lisp object type and
+associated methods and primitives. (Remember that keymaps are objects
+that associate event descriptions with functions to be called to
+"execute" those events; `dispatch-event' looks up events in the
+relevant keymaps.)
+
+ cmdloop.c
+
+ `cmdloop.c' contains functions that implement the actual editor
+command loop--i.e. the event loop that cyclically retrieves and
+dispatches events. This code is also rather tricky, just like
+`event-stream.c'.
+
+ macros.c
+ macros.h
+
+ These two modules contain the basic code for defining keyboard
+macros. These functions don't actually do much; most of the code that
+handles keyboard macros is mixed in with the event-handling code in
+`event-stream.c'.
+
+ minibuf.c
+
+ This contains some miscellaneous code related to the minibuffer
+(most of the minibuffer code was moved into Lisp by Richard Mlynarik).
+This includes the primitives for completion (although filename
+completion is in `dired.c'), the lowest-level interface to the
+minibuffer (if the command loop were cleaned up, this too could be in
+Lisp), and code for dealing with the echo area (this, too, was mostly
+moved into Lisp, and the only code remaining is code to call out to
+Lisp or provide simple bootstrapping implementations early in temacs,
+before the echo-area Lisp code is loaded).
+
+\1f
+File: internals.info, Node: Modules for the Basic Displayable Lisp Objects, Next: Modules for other Display-Related Lisp Objects, Prev: Editor-Level Control Flow Modules, Up: A Summary of the Various XEmacs Modules
+
+Modules for the Basic Displayable Lisp Objects
+==============================================
+
+ console-msw.c
+ console-msw.h
+ console-stream.c
+ console-stream.h
+ console-tty.c
+ console-tty.h
+ console-x.c
+ console-x.h
+ console.c
+ console.h
+
+ These modules implement the "console" Lisp object type. A console
+contains multiple display devices, but only one keyboard and mouse.
+Most of the time, a console will contain exactly one device.
+
+ Consoles are the top of a Lisp object inclusion hierarchy. Consoles
+contain devices, which contain frames, which contain windows.
+
+ device-msw.c
+ device-tty.c
+ device-x.c
+ device.c
+ device.h
+
+ These modules implement the "device" Lisp object type. This
+abstracts a particular screen or connection on which frames are
+displayed. As with Lisp objects, event interfaces, and other
+subsystems, the device code is separated into a generic component that
+contains a standardized interface (in the form of a set of methods) onto
+particular device types.
+
+ The device subsystem defines all the methods and provides method
+services for not only device operations but also for the frame, window,
+menubar, scrollbar, toolbar, and other displayable-object subsystems.
+The reason for this is that all of these subsystems have the same
+subtypes (X, TTY, NeXTstep, Microsoft Windows, etc.) as devices do.
+
+ frame-msw.c
+ frame-tty.c
+ frame-x.c
+ frame.c
+ frame.h
+
+ Each device contains one or more frames in which objects (e.g. text)
+are displayed. A frame corresponds to a window in the window system;
+usually this is a top-level window but it could potentially be one of a
+number of overlapping child windows within a top-level window, using the
+MDI (Multiple Document Interface) protocol in Microsoft Windows or a
+similar scheme.
+
+ The `frame-*' files implement the "frame" Lisp object type and
+provide the generic and device-type-specific operations on frames (e.g.
+raising, lowering, resizing, moving, etc.).
+
+ window.c
+ window.h
+
+ Each frame consists of one or more non-overlapping "windows" (better
+known as "panes" in standard window-system terminology) in which a
+buffer's text can be displayed. Windows can also have scrollbars
+displayed around their edges.
+
+ `window.c' and `window.h' implement the "window" Lisp object type
+and provide code to manage windows. Since windows have no associated
+resources in the window system (the window system knows only about the
+frame; no child windows or anything are used for XEmacs windows), there
+is no device-type-specific code here; all of that code is part of the
+redisplay mechanism or the code for particular object types such as
+scrollbars.
+
+\1f
+File: internals.info, Node: Modules for other Display-Related Lisp Objects, Next: Modules for the Redisplay Mechanism, Prev: Modules for the Basic Displayable Lisp Objects, Up: A Summary of the Various XEmacs Modules
+
+Modules for other Display-Related Lisp Objects
+==============================================
+
+ faces.c
+ faces.h
+
+ bitmaps.h
+ glyphs-eimage.c
+ glyphs-msw.c
+ glyphs-msw.h
+ glyphs-widget.c
+ glyphs-x.c
+ glyphs-x.h
+ glyphs.c
+ glyphs.h
+
+ objects-msw.c
+ objects-msw.h
+ objects-tty.c
+ objects-tty.h
+ objects-x.c
+ objects-x.h
+ objects.c
+ objects.h
+
+ menubar-msw.c
+ menubar-msw.h
+ menubar-x.c
+ menubar.c
+ menubar.h
+
+ scrollbar-msw.c
+ scrollbar-msw.h
+ scrollbar-x.c
+ scrollbar-x.h
+ scrollbar.c
+ scrollbar.h
+
+ toolbar-msw.c
+ toolbar-x.c
+ toolbar.c
+ toolbar.h
+
+ font-lock.c
+
+ This file provides C support for syntax highlighting--i.e.
+highlighting different syntactic constructs of a source file in
+different colors, for easy reading. The C support is provided so that
+this is fast.
+
+ As of 21.4.10, bugs introduced at the very end of the 21.2 series in
+the "syntax properties" code were fixed, and highlighting is acceptably
+quick again. However, presumably more improvements are possible, and
+the places to look are probably here, in the defun-traversing code, and
+in `syntax.c', in the comment-traversing code.
+
+ dgif_lib.c
+ gif_err.c
+ gif_lib.h
+ gifalloc.c
+
+ These modules decode GIF-format image files, for use with glyphs.
+These files were removed due to Unisys patent infringement concerns.
+
+\1f
+File: internals.info, Node: Modules for the Redisplay Mechanism, Next: Modules for Interfacing with the File System, Prev: Modules for other Display-Related Lisp Objects, Up: A Summary of the Various XEmacs Modules
+
+Modules for the Redisplay Mechanism
+===================================
+
+ redisplay-output.c
+ redisplay-msw.c
+ redisplay-tty.c
+ redisplay-x.c
+ redisplay.c
+ redisplay.h
+
+ These files provide the redisplay mechanism. As with many other
+subsystems in XEmacs, there is a clean separation between the general
+and device-specific support.
+
+ `redisplay.c' contains the bulk of the redisplay engine. These
+functions update the redisplay structures (which describe how the screen
+is to appear) to reflect any changes made to the state of any
+displayable objects (buffer, frame, window, etc.) since the last time
+that redisplay was called. These functions are highly optimized to
+avoid doing more work than necessary (since redisplay is called
+extremely often and is potentially a huge time sink), and depend heavily
+on notifications from the objects themselves that changes have occurred,
+so that redisplay doesn't explicitly have to check each possible object.
+The redisplay mechanism also contains a great deal of caching to further
+speed things up; some of this caching is contained within the various
+displayable objects.
+
+ `redisplay-output.c' goes through the redisplay structures and
+converts them into calls to device-specific methods to actually output
+the screen changes.
+
+ `redisplay-x.c' and `redisplay-tty.c' are two implementations of
+these redisplay output methods, for X frames and TTY frames,
+respectively.
+
+ indent.c
+
+ This module contains various functions and Lisp primitives for
+converting between buffer positions and screen positions. These
+functions call the redisplay mechanism to do most of the work, and then
+examine the redisplay structures to get the necessary information. This
+module needs work.
+
+ termcap.c
+ terminfo.c
+ tparam.c
+
+ These files contain functions for working with the termcap
+(BSD-style) and terminfo (System V style) databases of terminal
+capabilities and escape sequences, used when XEmacs is displaying in a
+TTY.
+
+ cm.c
+ cm.h
+
+ These files provide some miscellaneous TTY-output functions and
+should probably be merged into `redisplay-tty.c'.
+
+\1f
+File: internals.info, Node: Modules for Interfacing with the File System, Next: Modules for Other Aspects of the Lisp Interpreter and Object System, Prev: Modules for the Redisplay Mechanism, Up: A Summary of the Various XEmacs Modules
+
+Modules for Interfacing with the File System
+============================================
+
+ lstream.c
+ lstream.h
+
+ These modules implement the "stream" Lisp object type. This is an
+internal-only Lisp object that implements a generic buffering stream.
+The idea is to provide a uniform interface onto all sources and sinks of
+data, including file descriptors, stdio streams, chunks of memory, Lisp
+buffers, Lisp strings, etc. That way, I/O functions can be written to
+the stream interface and can transparently handle all possible sources
+and sinks. (For example, the `read' function can read data from a
+file, a string, a buffer, or even a function that is called repeatedly
+to return data, without worrying about where the data is coming from or
+what-size chunks it is returned in.)
+
+ Note that in the C code, streams are called "lstreams" (for "Lisp
+streams") to distinguish them from other kinds of streams, e.g. stdio
+streams and C++ I/O streams.
+
+ Similar to other subsystems in XEmacs, lstreams are separated into
+generic functions and a set of methods for the different types of
+lstreams. `lstream.c' provides implementations of many different types
+of streams; others are provided, e.g., in `file-coding.c'.
+
+ fileio.c
+
+ This implements the basic primitives for interfacing with the file
+system. This includes primitives for reading files into buffers,
+writing buffers into files, checking for the presence or accessibility
+of files, canonicalizing file names, etc. Note that these primitives
+are usually not invoked directly by the user: There is a great deal of
+higher-level Lisp code that implements the user commands such as
+`find-file' and `save-buffer'. This is similar to the distinction
+between the lower-level primitives in `editfns.c' and the higher-level
+user commands in `cmds.c' and `simple.el'.
+
+ filelock.c
+
+ This file provides functions for detecting clashes between different
+processes (e.g. XEmacs and some external process, or two different
+XEmacs processes) modifying the same file. (XEmacs can optionally use
+the `lock/' subdirectory to provide a form of "locking" between
+different XEmacs processes.) This module is also used by the low-level
+functions in `insdel.c' to ensure that, if the first modification is
+being made to a buffer whose corresponding file has been externally
+modified, the user is made aware of this so that the buffer can be
+synched up with the external changes if necessary.
+
+ filemode.c
+
+ This file provides some miscellaneous functions that construct a
+`rwxr-xr-x'-type permissions string (as might appear in an `ls'-style
+directory listing) given the information returned by the `stat()'
+system call.
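+
+ As an illustration of the kind of conversion involved, the sketch
+below builds the nine permission characters from a `mode_t' value
+using the standard POSIX mode-bit macros. The function name is
+hypothetical, and the sketch deliberately ignores the file-type
+character and the set-user-ID, set-group-ID, and sticky bits.
+
```c
#include <sys/types.h>
#include <sys/stat.h>

/* Fill OUT with the nine permission characters for MODE followed by a
   terminating NUL.  OUT must have room for at least 10 bytes. */
static void toy_mode_string(mode_t mode, char out[10])
{
  static const mode_t bits[9] = {
    S_IRUSR, S_IWUSR, S_IXUSR,   /* owner */
    S_IRGRP, S_IWGRP, S_IXGRP,   /* group */
    S_IROTH, S_IWOTH, S_IXOTH    /* others */
  };
  static const char letters[] = "rwxrwxrwx";
  int i;

  for (i = 0; i < 9; i++)
    out[i] = (mode & bits[i]) ? letters[i] : '-';
  out[9] = '\0';
}
```
+
+ Given the `st_mode' field filled in by `stat()', the same function
+produces the string for a real file.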
+
+ dired.c
+ ndir.h
+
+ These files implement the XEmacs interface to directory searching.
+This includes a number of primitives for determining the files in a
+directory and for doing filename completion. (Remember that generic
+completion is handled by a different mechanism, in `minibuf.c'.)
+
+ `ndir.h' is a header file used for the directory-searching emulation
+functions provided in `sysdep.c' (see section J below), for systems
+that don't provide any directory-searching functions. (On those
+systems, directories can be read directly as files, and parsed.)
+
+ realpath.c
+
+ This file provides an implementation of the `realpath()' function
+for expanding symbolic links, on systems that don't implement it or have
+a broken implementation.
+
+\1f
+File: internals.info, Node: Modules for Other Aspects of the Lisp Interpreter and Object System, Next: Modules for Interfacing with the Operating System, Prev: Modules for Interfacing with the File System, Up: A Summary of the Various XEmacs Modules
+
+Modules for Other Aspects of the Lisp Interpreter and Object System
+===================================================================
+
+ elhash.c
+ elhash.h
+ hash.c
+ hash.h
+
+ These files provide two implementations of hash tables. Files
+`hash.c' and `hash.h' provide a generic C implementation of hash tables
+which can stand independently of XEmacs. Files `elhash.c' and
+`elhash.h' provide a separate implementation of hash tables that can
+store only Lisp objects and knows about Lispy things like garbage
+collection; these files implement the "hash-table" Lisp object type.
+
+ specifier.c
+ specifier.h
+
+ This module implements the "specifier" Lisp object type. This is
+primarily used for displayable properties, and allows for values that
+are specific to a particular buffer, window, frame, device, or device
+class, as well as for a default value. This is used, for example,
+to control the height of the horizontal scrollbar or the appearance of
+the `default', `bold', or other faces. The specifier object consists
+of a number of specifications, each of which maps from a buffer,
+window, etc. to a value. The function `specifier-instance' looks up a
+value given a window (from which a buffer, frame, and device can be
+derived).
+
+ chartab.c
+ chartab.h
+ casetab.c
+
+ `chartab.c' and `chartab.h' implement the "char table" Lisp object
+type, which maps from characters or certain sorts of character ranges
+to Lisp objects. The implementation of this object type is optimized
+for the internal representation of characters. Char tables come in
+different types, which affect the allowed object types to which a
+character can be mapped and also dictate certain other properties of
+the char table.
+
+ `casetab.c' implements one sort of char table, the "case table",
+which maps characters to other characters of possibly different case.
+These are used by XEmacs to implement case-changing primitives and to
+do case-insensitive searching.
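+
+ The use of a case table for case-insensitive comparison can be
+sketched as follows. This is a simplified stand-in, assuming
+single-byte characters and a table filled from the C library's
+`tolower()'; the structure and function names are hypothetical and do
+not reflect the char-table representation used by `casetab.c'.
+
```c
#include <ctype.h>

/* Hypothetical case table: maps each single-byte character to its
   lower-case equivalent. */
struct toy_case_table {
  unsigned char down[256];
};

/* Fill the table using the C library's notion of lower case. */
static void toy_init_case_table(struct toy_case_table *t)
{
  int c;
  for (c = 0; c < 256; c++)
    t->down[c] = (unsigned char) tolower(c);
}

/* Return 1 if A and B are equal once both are mapped through the
   table, 0 otherwise. */
static int toy_equal_ignoring_case(const struct toy_case_table *t,
                                   const char *a, const char *b)
{
  while (*a && *b) {
    if (t->down[(unsigned char) *a] != t->down[(unsigned char) *b])
      return 0;
    a++;
    b++;
  }
  /* Equal only if both strings ended at the same time. */
  return *a == *b;
}
```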
+
+ syntax.c
+ syntax.h
+
+ This module implements "syntax tables", another sort of char table
+that maps characters into syntax classes that define the syntax of these
+characters (e.g. a parenthesis belongs to a class of `open' characters
+that have corresponding `close' characters and can be nested). This
+module also implements the Lisp "scanner", a set of primitives for
+scanning over text based on syntax tables. This is used, for example,
+to find the matching parenthesis in a command such as `forward-sexp',
+and by `font-lock.c' to locate quoted strings, comments, etc.
+
+ Syntax codes are implemented as bitfields in an int. Bits 0-6
+contain the syntax code itself, bit 7 is a special prefix flag used for
+Lisp, and bits 16-23 contain comment syntax flags. From the Lisp
+programmer's point of view, there are 11 flags: 2 styles X 2 characters
+X {start, end} flags for two-character comment delimiters, 2 style
+flags for one-character comment delimiters, and the prefix flag.
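+
+ The bit layout just described can be expressed with a few masks. The
+sketch below follows the bit positions given above (bits 0-6, bit 7,
+and bits 16-23), but the macro and function names are hypothetical, the
+meaning of individual comment-flag bits is not modelled, and a fixed
+32-bit unsigned type is used in place of `int' to keep the shifts
+well defined.
+
```c
#include <stdint.h>

#define TOY_SYNTAX_CODE_MASK     UINT32_C(0x0000007F)  /* bits 0-6 */
#define TOY_SYNTAX_PREFIX_FLAG   UINT32_C(0x00000080)  /* bit 7 */
#define TOY_COMMENT_FLAGS_SHIFT  16
#define TOY_COMMENT_FLAGS_MASK   UINT32_C(0x00FF0000)  /* bits 16-23 */

/* Pack a syntax class code, the prefix flag, and eight comment flag
   bits into one value. */
static uint32_t toy_make_syntax(uint32_t code, int prefix,
                                uint32_t comment_flags)
{
  uint32_t s = code & TOY_SYNTAX_CODE_MASK;
  if (prefix)
    s |= TOY_SYNTAX_PREFIX_FLAG;
  s |= (comment_flags << TOY_COMMENT_FLAGS_SHIFT) & TOY_COMMENT_FLAGS_MASK;
  return s;
}

/* Extract the syntax class code (bits 0-6). */
static uint32_t toy_syntax_code(uint32_t s)
{
  return s & TOY_SYNTAX_CODE_MASK;
}

/* Return 1 if the prefix flag (bit 7) is set, 0 otherwise. */
static int toy_syntax_prefix_p(uint32_t s)
{
  return (s & TOY_SYNTAX_PREFIX_FLAG) != 0;
}

/* Extract the eight comment flag bits (bits 16-23). */
static uint32_t toy_comment_flags(uint32_t s)
{
  return (s & TOY_COMMENT_FLAGS_MASK) >> TOY_COMMENT_FLAGS_SHIFT;
}
```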
+
+ Internally, however, the characters used in multi-character
+delimiters will have non-comment-character syntax classes (_e.g._, the
+`/' in C's `/*' comment-start delimiter has "punctuation" (here meaning
+"operator-like") class in C modes). Thus a mixed comment style,
+such as C++'s `//' to end of line, is represented by giving `/' the
+"punctuation" class and the "style b first character of start sequence"
+and "style b second character of start sequence" flags. The fact that
+class is _not_ punctuation allows the syntax scanner to recognize that
+this is a multi-character delimiter. The `newline' character is given
+(single-character) "comment-end" _class_ and the "style b first
+character of end sequence" _flag_. The "comment-end" class allows the
+scanner to determine that no second character is needed to terminate
+the comment.
+
+ casefiddle.c
+
+ This module implements various Lisp primitives for upcasing,
+downcasing and capitalizing strings or regions of buffers.
+
+ rangetab.c
+
+ This module implements the "range table" Lisp object type, which
+provides for a mapping from ranges of integers to arbitrary Lisp
+objects.
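+
+ One simple way to picture such a mapping is a sorted array of
+non-overlapping ranges searched by bisection, as in the sketch below.
+This is an assumption made for illustration (including the choice of
+inclusive end points and of strings as the stored values); it does not
+describe the data structure actually used in `rangetab.c'.
+
```c
#include <stddef.h>

/* Hypothetical range entry: every integer from FIRST to LAST
   (inclusive) maps to VALUE. */
struct toy_range {
  long first;
  long last;
  const char *value;
};

/* Look up KEY in TAB, which must be sorted by FIRST and contain no
   overlapping ranges.  Returns the mapped value, or NULL if KEY falls
   in no range. */
static const char *toy_range_lookup(const struct toy_range *tab,
                                    size_t n, long key)
{
  size_t lo = 0;
  size_t hi = n;

  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (key < tab[mid].first)
      hi = mid;
    else if (key > tab[mid].last)
      lo = mid + 1;
    else
      return tab[mid].value;
  }
  return NULL;
}
```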
+
+ opaque.c
+ opaque.h
+
+ This module implements the "opaque" Lisp object type, an
+internal-only Lisp object that encapsulates an arbitrary block of memory
+so that it can be managed by the Lisp allocation system. To create an
+opaque object, you call `make_opaque()', passing a pointer to a block
+of memory. An object is created that is big enough to hold the memory,
+which is copied into the object's storage. The object will then stick
+around as long as you keep pointers to it, after which it will be
+automatically reclaimed.
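+
+ The copy-into-the-object behavior can be sketched as below. This is
+a toy analogue only: `toy_make_opaque' and `struct toy_opaque' are
+hypothetical, the storage comes from `malloc()' rather than from the
+Lisp allocator, and the caller must `free()' the result because there
+is no garbage collector in the sketch.
+
```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical opaque object: a size followed by the copied bytes
   (C99 flexible array member). */
struct toy_opaque {
  size_t size;
  unsigned char data[];
};

/* Allocate an object big enough to hold SIZE bytes and copy them in
   from PTR.  Returns NULL if allocation fails. */
static struct toy_opaque *toy_make_opaque(const void *ptr, size_t size)
{
  struct toy_opaque *op = malloc(sizeof *op + size);

  if (op == NULL)
    return NULL;
  op->size = size;
  if (size > 0)
    memcpy(op->data, ptr, size);
  return op;
}
```
+
+ Because the bytes are copied, later changes to the original block of
+memory do not affect the object's contents.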
+
+ Opaque objects can also have an arbitrary "mark method" associated
+with them, in case the block of memory contains other Lisp objects that
+need to be marked for garbage-collection purposes. (If you need other
+object methods, such as a finalize method, you should just go ahead and
+create a new Lisp object type--it's not hard.)
+
+ abbrev.c
+
+ This module provides a few primitives for doing dynamic
+abbreviation expansion. In XEmacs, most of the code for this has been
+moved into Lisp. Some C code remains for speed and because the
+primitive `self-insert-command' (which is executed for all
+self-inserting characters) hooks into the abbrev mechanism.
+(`self-insert-command' is itself in C only for speed.)
+
+ doc.c
+
+ This module provides primitives for retrieving the documentation
+strings of functions and variables. These documentation strings contain
+certain special markers that get dynamically expanded (e.g. a
+reverse-lookup is performed on some named functions to retrieve their
+current key bindings). Some documentation strings (in particular, for
+the built-in primitives and pre-loaded Lisp functions) are stored
+externally in a file `DOC' in the `lib-src/' directory and need to be
+fetched from that file. (Part of the build stage involves building this
+file, and another part involves constructing an index for this file and
+embedding it into the executable, so that the functions in `doc.c' do
+not have to search the entire `DOC' file to find the appropriate
+documentation string.)
+
+ md5.c
+
+ This module provides a Lisp primitive that implements the MD5
+secure hashing scheme, used to create a large hash value of a string of
+data such that the data cannot be derived from the hash value. This is
+used for various security applications on the Internet.
+
+\1f
+File: internals.info, Node: Modules for Interfacing with the Operating System, Next: Modules for Interfacing with X Windows, Prev: Modules for Other Aspects of the Lisp Interpreter and Object System, Up: A Summary of the Various XEmacs Modules
+
+Modules for Interfacing with the Operating System
+=================================================
+
+ callproc.c
+ process.c
+ process.h
+
+ These modules allow XEmacs to spawn and communicate with subprocesses
+and network connections.
+
+ `callproc.c' implements (through the `call-process' primitive) what
+are called "synchronous subprocesses". This means that XEmacs runs a
+program, waits till it's done, and retrieves its output. A typical
+example might be calling the `ls' program to get a directory listing.
+
+ `process.c' and `process.h' implement "asynchronous subprocesses".
+This means that XEmacs starts a program and then continues normally,
+not waiting for the process to finish. Data can be sent to the process
+or retrieved from it as it's running. This is used for the `shell'
+command (which provides a front end onto a shell program such as
+`csh'), the mail and news readers implemented in XEmacs, etc. The
+result of calling `start-process' to start a subprocess is a process
+object, a particular kind of object used to communicate with the
+subprocess. You can send data to the process by passing the process
+object and the data to `send-process', and you can specify what happens
+to data retrieved from the process by setting properties of the process
+object. (When the process sends data, XEmacs receives a process event,
+which says that there is data ready. When `dispatch-event' is called
+on this event, it reads the data from the process and does something
+with it, as specified by the process object's properties. Typically,
+this means inserting the data into a buffer or calling a function.)
+Another property of the process object is called the "sentinel", which
+is a function that is called when the process terminates.
+
+ Process objects are also used for network connections (connections
+to a process running on another machine). Network connections are
+started with `open-network-stream' but otherwise work just like
+subprocesses.
+
+ sysdep.c
+ sysdep.h
+
+ These modules implement most of the low-level, messy operating-system
+interface code. This includes various device control (ioctl) operations
+for file descriptors, TTY's, pseudo-terminals, etc. (usually this stuff
+is fairly system-dependent; thus the name of this module), and emulation
+of standard library functions and system calls on systems that don't
+provide them or have broken versions.
+
+ sysdir.h
+ sysfile.h
+ sysfloat.h
+ sysproc.h
+ syspwd.h
+ syssignal.h
+ systime.h
+ systty.h
+ syswait.h
+
+ These header files provide consistent interfaces onto
+system-dependent header files and system calls. The idea is that,
+instead of including a standard header file like `<sys/param.h>' (which
+may or may not exist on various systems) or having to worry about
+whether all systems provide a particular preprocessor constant, or
+having to deal with the four different paradigms for manipulating
+signals, you just include the appropriate `sys*.h' header file, which
+includes all the right system header files, defines any missing
+preprocessor constants, provides a uniform interface onto system calls,
+etc.
+
+ `sysdir.h' provides a uniform interface onto directory-querying
+functions. (In some cases, this is in conjunction with emulation
+functions in `sysdep.c'.)
+
+ `sysfile.h' includes all the necessary header files for standard
+system calls (e.g. `read()'), ensures that all necessary `open()' and
+`stat()' preprocessor constants are defined, and possibly (usually)
+substitutes sugared versions of `read()', `write()', etc. that
+automatically restart interrupted I/O operations.
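+
+ A "sugared" `read()' of the kind mentioned above might look like the
+following sketch, which retries when the call is interrupted by a
+signal. The wrapper name is hypothetical; only the POSIX `read()' call
+and the `EINTR' error code are taken as given.
+
```c
#include <errno.h>
#include <unistd.h>

/* Like read(), but if the call fails because a signal interrupted it
   (errno == EINTR), try again instead of reporting the failure. */
static ssize_t toy_read_restarting(int fd, void *buf, size_t count)
{
  ssize_t n;

  do {
    n = read(fd, buf, count);
  } while (n < 0 && errno == EINTR);
  return n;
}
```
+
+ Callers can then treat a negative return value as a real error
+without having to check for interruption themselves.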
+
+ `sysfloat.h' includes the necessary header files for floating-point
+operations.
+
+ `sysproc.h' includes the necessary header files for calling
+`select()', `fork()', `execve()', socket operations, and the like, and
+ensures that the `FD_*()' macros for descriptor-set manipulations are
+available.
+
+ `syspwd.h' includes the necessary header files for obtaining
+information from `/etc/passwd' (the functions are emulated under VMS).
+
+ `syssignal.h' includes the necessary header files for
+signal-handling and provides a uniform interface onto the different
+signal-handling and signal-blocking paradigms.
+
+ `systime.h' includes the necessary header files and provides uniform
+interfaces for retrieving the time of day, setting file
+access/modification times, getting the amount of time used by the XEmacs
+process, etc.
+
+ `systty.h' buffers against the infinitude of different ways of
+controlling TTY's.
+
+ `syswait.h' provides a uniform way of retrieving the exit status
+from a `wait()'ed-on process (some systems use a union, others use an
+int).
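+
+ On POSIX systems, the same kind of uniformity is available from the
+`<sys/wait.h>' macros. The sketch below uses those macros together
+with a hypothetical helper, `exit_code_of_child'; it does not
+reproduce the actual definitions in `syswait.h'.
+
```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with CODE and return
   the exit code as the parent sees it, or -1 on any failure. */
static int
exit_code_of_child (int code)
{
  int status;
  pid_t pid = fork ();

  assert (pid >= 0);
  if (pid == 0)
    _exit (code);               /* child: exit at once */

  if (waitpid (pid, &status, 0) != pid)
    return -1;

  /* The macros hide how the status word is laid out. */
  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```
+
+ The caller never looks at the raw status value, which is the point of
+having one interface.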
+
+ hpplay.c
+ libsst.c
+ libsst.h
+ libst.h
+ linuxplay.c
+ nas.c
+ sgiplay.c
+ sound.c
+ sunplay.c
+
+ These files implement the ability to play various sounds on some
+types of computers. You have to configure your XEmacs with sound
+support in order to get this capability.
+
+ `sound.c' provides the generic interface. It implements various
+Lisp primitives and variables that let you specify which sounds should
+be played in certain conditions. (The conditions are identified by
+symbols, which are passed to `ding' to make a sound. Various standard
+functions call this function at certain times; if sound support does
+not exist, a simple beep results.)
+
+ `sgiplay.c', `sunplay.c', `hpplay.c', and `linuxplay.c' interface to
+the machine's speaker on various kinds of machines. This is
+called "native" sound.
+
+ `nas.c' interfaces to a computer somewhere else on the network using
+the NAS (Network Audio Server) protocol, playing sounds on that
+machine. This allows you to run XEmacs on a remote machine, with its
+display set to your local machine, and have the sounds be made on your
+local machine, provided that you have a NAS server running on your local
+machine.
+
+ `libsst.c', `libsst.h', and `libst.h' provide some additional
+functions for playing sound on a Sun SPARC but are not currently in use.
+
+ tooltalk.c
+ tooltalk.h
+
+ These two modules implement an interface to the ToolTalk protocol,
+which is an interprocess communication protocol implemented on some
+versions of Unix. ToolTalk is a high-level protocol that allows
+processes to register themselves as providers of particular services;
+other processes can then request a service without knowing or caring
+exactly who is providing the service. It is similar in spirit to the
+DDE protocol provided under Microsoft Windows. ToolTalk is a part of
+the new CDE (Common Desktop Environment) specification and is used to
+connect the parts of the SPARCWorks development environment.
+
+ getloadavg.c
+
+ This module provides the ability to retrieve the system's current
+load average. (The way to do this is highly system-specific,
+unfortunately, and requires a lot of special-case code.)
+
+ sunpro.c
+
+ This module provides a small amount of code used internally at Sun to
+keep statistics on the usage of XEmacs.
+
+ broken-sun.h
+ strcmp.c
+ strcpy.c
+ sunOS-fix.c
+
+ These files provide replacement functions and prototypes to fix
+numerous bugs in early releases of SunOS 4.1.
+
+ hftctl.c
+
+ This module provides some terminal-control code necessary on
+versions of AIX prior to 4.1.
+
+\1f
+File: internals.info, Node: Modules for Interfacing with X Windows, Next: Modules for Internationalization, Prev: Modules for Interfacing with the Operating System, Up: A Summary of the Various XEmacs Modules
+
+Modules for Interfacing with X Windows
+======================================
+
+ Emacs.ad.h
+
+ A file generated from `Emacs.ad', which contains XEmacs-supplied
+fallback resources (so that XEmacs has pretty defaults).
+
+ EmacsFrame.c
+ EmacsFrame.h
+ EmacsFrameP.h
+
+ These modules implement an Xt widget class that encapsulates a frame.
+This is for ease in integrating with Xt. The EmacsFrame widget covers
+the entire X window except for the menubar; the scrollbars are
+positioned on top of the EmacsFrame widget.
+
+ *Warning:* Abandon hope, all ye who enter here. This code took an
+ungodly amount of time to get right, and is likely to fall apart
+mercilessly at the slightest change. Such is life under Xt.
+
+ EmacsManager.c
+ EmacsManager.h
+ EmacsManagerP.h
+
+ These modules implement a simple Xt manager (i.e. composite) widget
+class that simply lets its children set whatever geometry they want.
+It's amazing that Xt doesn't provide this standardly, but on second
+thought, it makes sense, considering how amazingly broken Xt is.
+
+ EmacsShell-sub.c
+ EmacsShell.c
+ EmacsShell.h
+ EmacsShellP.h
+
+ These modules implement two Xt widget classes that are subclasses of
+the TopLevelShell and TransientShell classes. This is necessary to deal
+with more brokenness that Xt has sadistically thrust onto the backs of
+developers.
+
+ xgccache.c
+ xgccache.h
+
+ These modules provide functions for maintenance and caching of GC's
+(graphics contexts) under the X Window System. This code is junky and
+needs to be rewritten.
+
+ select-msw.c
+ select-x.c
+ select.c
+ select.h
+
+ These modules provide an interface to the X Window System's concept of
+"selections", the standard way for X applications to communicate with
+each other.
+
+ xintrinsic.h
+ xintrinsicp.h
+ xmmanagerp.h
+ xmprimitivep.h
+
+ These header files are similar in spirit to the `sys*.h' files and
+buffer against different implementations of Xt and Motif.
+
+ * `xintrinsic.h' should be included in place of `<Intrinsic.h>'.
+
+ * `xintrinsicp.h' should be included in place of `<IntrinsicP.h>'.
+
+ * `xmmanagerp.h' should be included in place of `<XmManagerP.h>'.
+
+ * `xmprimitivep.h' should be included in place of `<XmPrimitiveP.h>'.
+
+ xmu.c
+ xmu.h
+
+ These files provide an emulation of the Xmu library for those systems
+(e.g. HPUX) that don't provide it as a standard part of X.
+
+ ExternalClient-Xlib.c
+ ExternalClient.c
+ ExternalClient.h
+ ExternalClientP.h
+ ExternalShell.c
+ ExternalShell.h
+ ExternalShellP.h
+ extw-Xlib.c
+ extw-Xlib.h
+ extw-Xt.c
+ extw-Xt.h
+
+ These files provide the "external widget" interface, which allows an
+XEmacs frame to appear as a widget in another application. To do this,
+you have to configure with `--external-widget'.
+
+ `ExternalShell*' provides the server (XEmacs) side of the connection.
+
+ `ExternalClient*' provides the client (other application) side of
+the connection. These files are not compiled into XEmacs but are
+compiled into libraries that are then linked into your application.
+
+ `extw-*' is common code that is used for both the client and server.
+
+ Don't touch this code; something is liable to break if you do.
+
+\1f
+File: internals.info, Node: Modules for Internationalization, Next: Modules for Regression Testing, Prev: Modules for Interfacing with X Windows, Up: A Summary of the Various XEmacs Modules
+
+Modules for Internationalization
+================================
+
+ mule-canna.c
+ mule-ccl.c
+ mule-charset.c
+ mule-charset.h
+ file-coding.c
+ file-coding.h
+ mule-mcpath.c
+ mule-mcpath.h
+ mule-wnnfns.c
+ mule.c
+
+ These files implement the MULE (Asian-language) support. Note that
+MULE actually provides a general interface for all sorts of languages,
+not just Asian languages (although they are generally the most
+complicated to support). This code is still in beta.
+
+ `mule-charset.*' and `file-coding.*' provide the heart of the XEmacs
+MULE support. `mule-charset.*' implements the "charset" Lisp object
+type, which encapsulates a character set (an ordered one- or
+two-dimensional set of characters, such as US ASCII or JISX0208 Japanese
+Kanji).
+
+ `file-coding.*' implements the "coding-system" Lisp object type,
+which encapsulates a method of converting between different encodings.
+An encoding is a representation of a stream of characters, possibly
+from multiple character sets, using a stream of bytes or words, and
+defines (e.g.) which escape sequences are used to specify particular
+character sets, how the indices for a character are converted into bytes
+(sometimes this involves setting the high bit; sometimes complicated
+rearranging of the values takes place, as in the Shift-JIS encoding),
+etc.
+
+ `mule-ccl.c' provides the CCL (Code Conversion Language)
+interpreter. CCL is similar in spirit to Lisp byte code and is used to
+implement converters for custom encodings.
+
+ `mule-canna.c' and `mule-wnnfns.c' implement interfaces to external
+programs used to implement the Canna and WNN input methods,
+respectively. This is currently in beta.
+
+ `mule-mcpath.c' provides some functions to allow for pathnames
+containing extended characters. This code is fragmentary, obsolete, and
+completely non-working. Instead, `pathname-coding-system' is used to
+specify conversions of names of files and directories. The standard C
+I/O functions like `open()' are wrapped so that conversion occurs
+automatically.
+
+ `mule.c' provides a few miscellaneous things that should probably be
+elsewhere.
+
+ intl.c
+
+ This provides some miscellaneous internationalization code for
+implementing message translation and interfacing to the Ximp input
+method. None of this code is currently working.
+
+ iso-wide.h
+
+ This contains leftover code from an earlier implementation of
+Asian-language support, and is not currently used.
+
+\1f
+File: internals.info, Node: Modules for Regression Testing, Prev: Modules for Internationalization, Up: A Summary of the Various XEmacs Modules
+
+Modules for Regression Testing
+==============================
+
+ test-harness.el
+ base64-tests.el
+ byte-compiler-tests.el
+ case-tests.el
+ ccl-tests.el
+ c-tests.el
+ database-tests.el
+ extent-tests.el
+ hash-table-tests.el
+ lisp-tests.el
+ md5-tests.el
+ mule-tests.el
+ regexp-tests.el
+ symbol-tests.el
+ syntax-tests.el
+
+ `test-harness.el' defines the macros `Assert', `Check-Error',
+`Check-Error-Message', and `Check-Message'. The other files are test
+files, testing various XEmacs modules.
+
+\1f
+File: internals.info, Node: Allocation of Objects in XEmacs Lisp, Next: Dumping, Prev: A Summary of the Various XEmacs Modules, Up: Top
+
+Allocation of Objects in XEmacs Lisp
+************************************
+
+* Menu:
+
+* Introduction to Allocation::
+* Garbage Collection::
+* GCPROing::
+* Garbage Collection - Step by Step::
+* Integers and Characters::
+* Allocation from Frob Blocks::
+* lrecords::
+* Low-level allocation::
+* Cons::
+* Vector::
+* Bit Vector::
+* Symbol::
+* Marker::
+* String::
+* Compiled Function::
+
+\1f
+File: internals.info, Node: Introduction to Allocation, Next: Garbage Collection, Up: Allocation of Objects in XEmacs Lisp
+
+Introduction to Allocation
+==========================
+
+Emacs Lisp, like all Lisps, has garbage collection. This means that
+the programmer never has to explicitly free (destroy) an object; it
+happens automatically when the object becomes inaccessible. Most
+experts agree that garbage collection is a necessity in a modern,
+high-level language. Its omission from C stems from the fact that C was
+originally designed to be a nice abstract layer on top of assembly
+language, for writing kernels and basic system utilities rather than
+large applications.
+
+ Lisp objects can be created by any of a number of Lisp primitives.
+Most object types have one or a small number of basic primitives for
+creating objects. For conses, the basic primitive is `cons'; for
+vectors, the primitives are `make-vector' and `vector'; for symbols,
+the primitives are `make-symbol' and `intern'; etc. Some Lisp objects,
+especially those that are primarily used internally, have no
+corresponding Lisp primitives. Every Lisp object, though, has at least
+one C primitive for creating it.
+
+ Recall (*note How Lisp Objects Are Represented in C::) that a Lisp
+object, as stored in a 32-bit or 64-bit word, has a few tag bits, and a
+"value" that occupies the remainder of the bits. We can separate the
+different Lisp object types into three broad categories:
+
+ * (a) Those for which the value directly represents the contents of
+ the Lisp object. Only two types are in this category: integers and
+ characters. No special allocation or garbage collection is
+ necessary for such objects. Lisp objects of these types do not
+ need to be `GCPRO'ed.
+
+ In the remaining two categories, the type is stored in the object
+itself. The tag for all such objects is the generic "lrecord"
+(Lisp_Type_Record) tag. The first bytes of the object's structure are
+an integer (actually a char) characterising the object's type and some
+flags, in particular the mark bit used for garbage collection. A
+structure describing the type is accessible through the
+lrecord_implementation_table indexed with said integer. This structure
+includes the method pointers and a pointer to a string naming the type.
+
+ * (b) Those lrecords that are allocated in frob blocks (*note
+ Allocation from Frob Blocks::). This category covers the objects
+ that are most common and relatively small: conses, strings, subrs,
+ floats, compiled functions, symbols, extents, events, and markers.
+ With the cleanup of frob blocks done in 19.12, it's not terribly
+ hard to add more objects to this category, but it's a bit trickier
+ than adding an object type to category (c) (esp. if the object
+ needs a finalization method), and is not likely to save much space
+ unless the object is small and there are many of them. (In fact,
+ if there are very few of them, it might actually waste space.)
+
+ * (c) Those lrecords that are individually `malloc()'ed. These are
+ called "lcrecords". All other types are in this category. Adding
+ a new type to this category is comparatively easy, and all types
+ added since 19.8 (when the current allocation scheme was devised,
+ by Richard Mlynarik), with the exception of the character type,
+ have been in this category.
+
+ Note that bit vectors are a bit of a special case. They are simple
+lrecords as in category (b), but are individually `malloc()'ed like
+vectors. You can basically view them as exactly like vectors except
+that their type is stored in lrecord fashion rather than in
+directly-tagged fashion.
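+
+ To make category (a) concrete, here is a toy C model of a tagged
+word. The two-bit tag layout, the tag values, and all `toy_' names are
+assumptions made for illustration; they do not reproduce the real
+XEmacs representation.
+
```c
#include <assert.h>
#include <stdint.h>

/* Toy layout, assumed for illustration: the two low bits of a word
   hold the tag, the remaining bits hold the value. */
typedef intptr_t toy_object;

enum toy_tag { TOY_TAG_RECORD = 0, TOY_TAG_INT = 1, TOY_TAG_CHAR = 2 };

static enum toy_tag
toy_tag_of (toy_object obj)
{
  return (enum toy_tag) ((uintptr_t) obj & 3);
}

/* An integer lives entirely inside the word: nothing is allocated,
   so there is nothing for the garbage collector to track. */
static toy_object
toy_make_int (intptr_t value)
{
  /* The value must fit in the bits left over after the tag. */
  assert (value >= INTPTR_MIN / 4 && value <= INTPTR_MAX / 4);
  return value * 4 + TOY_TAG_INT;
}

static intptr_t
toy_int_value (toy_object obj)
{
  assert (toy_tag_of (obj) == TOY_TAG_INT);
  return (obj - TOY_TAG_INT) / 4;   /* exact: inverse of toy_make_int */
}
```
+
+ Losing two bits to the tag is why, in this toy model, an integer has
+fewer bits of precision than the machine word.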
+
+\1f
+File: internals.info, Node: Garbage Collection, Next: GCPROing, Prev: Introduction to Allocation, Up: Allocation of Objects in XEmacs Lisp
+
+Garbage Collection
+==================
+
+Garbage collection is simple in theory but tricky to implement. Emacs
+Lisp uses the oldest garbage collection method, called "mark and
+sweep". Garbage collection begins by starting with all accessible
+locations (i.e. all variables and other slots where Lisp objects might
+occur) and recursively traversing all objects accessible from those
+slots, marking each one that is found. We then go through all of
+memory, freeing each object that is not marked and unmarking each
+object that is marked. Note that "all of memory" means all currently
+allocated objects. Traversing all these objects means traversing all
+frob blocks, all vectors (which are chained in one big list), and all
+lcrecords (which are likewise chained).
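+
+ The mark and sweep phases described above can be sketched in a few
+lines of C. This is a toy model over a fixed array of cons-like cells;
+the `toy_' names and the heap layout are invented for illustration and
+are far simpler than the real code in `alloc.c'.
+
```c
#include <assert.h>
#include <stddef.h>

/* Toy heap for illustration only: every object is a cons-like cell. */
struct toy_cell
{
  struct toy_cell *car, *cdr;   /* NULL plays the role of nil */
  int in_use;                   /* allocated and not yet freed */
  int marked;                   /* set by the mark phase */
};

#define TOY_HEAP_SIZE 8
static struct toy_cell toy_heap[TOY_HEAP_SIZE];

/* Allocate a cell from the first free slot; NULL if the heap is full. */
static struct toy_cell *
toy_cons (struct toy_cell *car, struct toy_cell *cdr)
{
  size_t i;

  for (i = 0; i < TOY_HEAP_SIZE; i++)
    if (!toy_heap[i].in_use)
      {
        toy_heap[i].car = car;
        toy_heap[i].cdr = cdr;
        toy_heap[i].in_use = 1;
        toy_heap[i].marked = 0;
        return &toy_heap[i];
      }
  return NULL;
}

/* Mark phase: recursively mark everything reachable from CELL. */
static void
toy_mark (struct toy_cell *cell)
{
  if (cell == NULL || cell->marked)
    return;
  cell->marked = 1;
  toy_mark (cell->car);
  toy_mark (cell->cdr);
}

/* Sweep phase: free unmarked cells and unmark the survivors.
   Returns the number of cells freed. */
static int
toy_sweep (void)
{
  size_t i;
  int freed = 0;

  for (i = 0; i < TOY_HEAP_SIZE; i++)
    {
      if (!toy_heap[i].in_use)
        continue;
      if (toy_heap[i].marked)
        toy_heap[i].marked = 0;
      else
        {
          toy_heap[i].in_use = 0;
          freed++;
        }
    }
  return freed;
}

/* One full collection starting from the given roots. */
static int
toy_collect (struct toy_cell **roots, size_t nroots)
{
  size_t i;

  assert (roots != NULL || nroots == 0);
  for (i = 0; i < nroots; i++)
    toy_mark (roots[i]);
  return toy_sweep ();
}
```
+
+ A cell survives `toy_collect' only if it is reachable from one of the
+roots; everything else is returned to the free pool.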
+
+ Garbage collection can be invoked explicitly by calling
+`garbage-collect' but is also called automatically by `eval', once a
+certain amount of memory has been allocated since the last garbage
+collection (according to `gc-cons-threshold').
+
+\1f
+File: internals.info, Node: GCPROing, Next: Garbage Collection - Step by Step, Prev: Garbage Collection, Up: Allocation of Objects in XEmacs Lisp
+
+`GCPRO'ing
+==========
+
+`GCPRO'ing is one of the ugliest and trickiest parts of Emacs
+internals. The basic idea is that whenever garbage collection occurs,
+all in-use objects must be reachable somehow or other from one of the
+roots of accessibility. The roots of accessibility are:
+
+ 1. All objects that have been `staticpro()'d or
+ `staticpro_nodump()'ed. This is used for any global C variables
+ that hold Lisp objects. A call to `staticpro()' happens implicitly
+ as a result of any symbols declared with `defsymbol()' and any
+ variables declared with `DEFVAR_FOO()'. You need to explicitly
+ call `staticpro()' (in the `vars_of_foo()' method of a module) for
+ other global C variables holding Lisp objects. (This typically
+ includes internal lists and such things.) Use
+ `staticpro_nodump()' only in the rare cases when you do not want
+ the pointed-to variable to be saved at dump time but would rather
+ recompute it at startup.
+
+ Note that `obarray' is one of the `staticpro()'d things.
+ Therefore, all functions and variables get marked through this.
+
+ 2. Any shadowed bindings that are sitting on the `specpdl' stack.
+
+ 3. Any objects sitting in currently active (Lisp) stack frames,
+ catches, and condition cases.
+
+ 4. A couple of special-case places where active objects are located.
+
+ 5. Anything currently marked with `GCPRO'.
+
+ Marking with `GCPRO' is necessary because some C functions (quite a
+lot, in fact) allocate objects during their operation. Quite
+frequently, there will be no other pointer to the object while the
+function is running, and if a garbage collection occurs and the object
+needs to be referenced again, bad things will happen. The solution is
+to mark those objects with `GCPRO'. Unfortunately this is easy to
+forget, and there is basically no way around this problem. Here are
+some rules, though:
+
+ 1. For every `GCPRON', there have to be declarations of `struct gcpro
+ gcpro1, gcpro2', etc.
+
+ 2. You _must_ `UNGCPRO' anything that's `GCPRO'ed, and you _must not_
+ `UNGCPRO' if you haven't `GCPRO'ed. Getting either of these wrong
+ will lead to crashes, often in completely random places unrelated
+ to where the problem lies.
+
+ 3. The way this actually works is that all currently active `GCPRO's
+ are chained through the `struct gcpro' local variables, with the
+ variable `gcprolist' pointing to the head of the list and the nth
+ local `gcpro' variable pointing to the first `gcpro' variable in
+ the next enclosing stack frame. Each `GCPRO'ed thing is an
+ lvalue, and the `struct gcpro' local variable contains a pointer to
+ this lvalue. This is why things will mess up badly if you don't
+ pair up the `GCPRO's and `UNGCPRO's--you will end up with
+ `gcprolist's containing pointers to `struct gcpro's or local
+ `Lisp_Object' variables in no-longer-active stack frames.
+
+ 4. It is actually possible for a single `struct gcpro' to protect a
+ contiguous array of any number of values, rather than just a
+ single lvalue. To effect this, call `GCPRON' as usual on the
+ first object in the array and then set `gcproN.nvars'.
+
+ 5. *Strings are relocated.* What this means in practice is that the
+ pointer obtained using `XSTRING_DATA()' is liable to change at any
+ time, and you should never keep it around past any function call,
+ or pass it as an argument to any function that might cause a
+ garbage collection. This is why a number of functions accept
+ either a "non-relocatable" `char *' pointer or a relocatable Lisp
+ string, and only access the Lisp string's data at the very last
+ minute. In some cases, you may end up having to `alloca()' some
+ space and copy the string's data into it.
+
+ 6. By convention, if you have to nest `GCPRO's, use `NGCPRON' (along
+ with `struct gcpro ngcpro1, ngcpro2', etc.), `NNGCPRON', etc.
+ This avoids compiler warnings about shadowed locals.
+
+ 7. It is _always_ better to err on the side of extra `GCPRO's rather
+ than too few. The extra cycles spent on this are almost never
+ going to make a whit of difference in the speed of anything.
+
+ 8. The general rule to follow is that caller, not callee, `GCPRO's.
+ That is, you should not have to explicitly `GCPRO' any Lisp objects
+ that are passed in as parameters.
+
+ One exception to this rule is if you ever plan to change the
+ parameter value, and store a new object in it. In that case, you
+ _must_ `GCPRO' the parameter, because otherwise the new object
+ will not be protected.
+
+ So, if you create any Lisp objects (remember, this happens in all
+ sorts of circumstances, e.g. with `Fcons()', etc.), you are
+ responsible for `GCPRO'ing them, unless you are _absolutely sure_
+ that there's no possibility that a garbage-collection can occur
+ while you need to use the object. Even then, consider `GCPRO'ing.
+
+ 9. A garbage collection can occur whenever anything calls `Feval', or
+ whenever a QUIT can occur where execution can continue past this.
+ (Remember, this is almost anywhere.)
+
+ 10. If you have the _least smidgeon of doubt_ about whether you need
+ to `GCPRO', you should `GCPRO'.
+
+ 11. Beware of `GCPRO'ing something that is uninitialized. If you have
+ any shade of doubt about this, initialize all your variables to
+ `Qnil'.
+
+ 12. Be careful of traps, like calling `Fcons()' in the argument to
+ another function. By the "caller protects" law, you should be
+ `GCPRO'ing the newly-created cons, but you aren't. A certain
+ number of functions that are commonly called on freshly created
+ stuff (e.g. `nconc2()', `Fsignal()'), break the "caller protects"
+ law and go ahead and `GCPRO' their arguments so as to simplify
+ things, but make sure to check whether it's OK whenever doing
+ something like this.
+
+ 13. Once again, remember to `GCPRO'! Bugs resulting from insufficient
+ `GCPRO'ing are intermittent and extremely difficult to track down,
+ often showing up in crashes inside of `garbage-collect' or in
+ weirdly corrupted objects or even in incorrect values in a totally
+ different section of code.
+
+ If you don't understand whether to `GCPRO' in a particular instance,
+ask on the mailing lists. A general hint is that `prog1' is the
+canonical example.
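+
+ The chaining described in rule 3 can be modeled in a few lines of C.
+The following is a toy sketch rather than the real macros: the `toy_'
+names are hypothetical, and `toy_count_protected' merely stands in for
+a collector walking the list.
+
```c
#include <assert.h>
#include <stddef.h>

typedef long toy_object;        /* stand-in for Lisp_Object */

struct toy_gcpro
{
  struct toy_gcpro *next;       /* next enclosing protection record */
  toy_object *var;              /* address of the protected lvalue */
  int nvars;                    /* number of consecutive values at var */
};

/* Head of the chain of currently active protection records. */
static struct toy_gcpro *toy_gcprolist;

/* Both macros expect a local `struct toy_gcpro toy_gcpro1' in scope. */
#define TOY_GCPRO1(v) \
  (toy_gcpro1.next = toy_gcprolist, toy_gcpro1.var = &(v), \
   toy_gcpro1.nvars = 1, toy_gcprolist = &toy_gcpro1)

#define TOY_UNGCPRO (toy_gcprolist = toy_gcpro1.next)

/* What a collector would do: walk the chain and visit every
   protected value.  Here we only count them. */
static int
toy_count_protected (void)
{
  struct toy_gcpro *p;
  int count = 0;

  for (p = toy_gcprolist; p != NULL; p = p->next)
    {
      assert (p->var != NULL);
      count += p->nvars;
    }
  return count;
}

/* Protect a local, run something that might collect, then unprotect. */
static int
toy_protected_during_call (void)
{
  toy_object obj = 42;
  struct toy_gcpro toy_gcpro1;
  int seen;

  TOY_GCPRO1 (obj);
  seen = toy_count_protected ();   /* obj is on the chain here */
  TOY_UNGCPRO;                     /* must pair with TOY_GCPRO1 */
  return seen;
}
```
+
+ If `TOY_UNGCPRO' were omitted, `toy_gcprolist' would keep pointing at
+`toy_gcpro1' after the function returned, that is, into a dead stack
+frame. This is the failure mode that rule 3 warns about.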
+
+ Given the extremely error-prone nature of the `GCPRO' scheme, and
+the difficulty of tracking down the resulting bugs, it should be
+considered a deficiency in the XEmacs code. A solution to this problem
+would involve implementing so-called "conservative" garbage collection
+for the C stack. That involves looking through all of stack memory and
+treating anything that looks like a reference to an object as a
+reference. This will result in a few objects not getting collected when
+they should, but it obviates the need for `GCPRO'ing, and allows garbage
+collection to happen at any point at all, such as during object
+allocation.
+
+\1f
+File: internals.info, Node: Garbage Collection - Step by Step, Next: Integers and Characters, Prev: GCPROing, Up: Allocation of Objects in XEmacs Lisp
+
+Garbage Collection - Step by Step
+=================================
+
+* Menu:
+
+* Invocation::
+* garbage_collect_1::
+* mark_object::
+* gc_sweep::
+* sweep_lcrecords_1::
+* compact_string_chars::
+* sweep_strings::
+* sweep_bit_vectors_1::
+
+\1f
+File: internals.info, Node: Invocation, Next: garbage_collect_1, Up: Garbage Collection - Step by Step
+
+Invocation
+----------
+
+The first thing that anyone should know about garbage collection is
+when and how the garbage collector is invoked. One might think that this
+could happen every time new memory is allocated, i.e. whenever new
+objects are created, but this is _not_ the case. Instead, we have the
+following situation:
+
+ The entry point of any process of garbage collection is an invocation
+of the function `garbage_collect_1' in file `alloc.c'. The invocation
+can occur _explicitly_ by calling the function `Fgarbage_collect' (in
+addition this function provides information about the freed memory), or
+can occur _implicitly_ in four different situations:
+ 1. In function `main_1' in file `emacs.c'. This function is called at
+ each startup of XEmacs. The garbage collection is invoked after all
+ initial creations are completed, but only if a special internal
+ error-checking constant `ERROR_CHECK_GC' is defined.
+
+ 2. In function `disksave_object_finalization' in file `alloc.c'. The
+ only purpose of this function is to clear from memory those objects
+ that need not be stored with XEmacs when we dump out an
+ executable. This is only done by `Fdump_emacs' or by
+ `Fdump_emacs_data' (both in `emacs.c'). The actual clearing is
+ accomplished by making these objects unreachable and starting a
+ garbage collection. The function is only used while building
+ XEmacs.
+
+ 3. In function `Feval / eval' in file `eval.c'. Each time the
+ well-known and often-used function `eval' is called to evaluate a
+ form, one of the first things that can happen is a call of
+ `garbage_collect_1'. There exist three global variables,
+ `consing_since_gc' (counts the created cons-cells since the last
+ garbage collection), `gc_cons_threshold' (a specified threshold
+ after which a garbage collection occurs) and `always_gc'. If
+ `always_gc' is set or if the threshold is exceeded, the garbage
+ collection will start.
+
+ 4. In function `Ffuncall / funcall' in file `eval.c'. This function
+ evaluates calls of elisp functions and works analogously to `Feval'.
+
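+ The check made on entry to `Feval' and `Ffuncall' can be pictured
+roughly as follows. This is a simplified sketch: the `toy_' names are
+hypothetical stand-ins for the variables named above, and the real
+functions do much more.
+
```c
#include <assert.h>

/* Hypothetical stand-ins for consing_since_gc, gc_cons_threshold
   and always_gc. */
static long toy_consing_since_gc;
static long toy_gc_cons_threshold = 100;
static int toy_always_gc;

static int toy_gc_runs;         /* how many collections have run */

/* Stand-in for the collector: it only counts runs and resets the
   allocation counter, as the real one does when it finishes. */
static void
toy_garbage_collect (void)
{
  toy_gc_runs++;
  toy_consing_since_gc = 0;
}

/* Called by the allocator for every new object. */
static void
toy_note_allocation (long amount)
{
  assert (amount >= 0);
  toy_consing_since_gc += amount;
}

/* The test made at the start of evaluating a form or calling a
   function: collect only if forced or if the threshold is exceeded. */
static void
toy_eval_entry (void)
{
  if (toy_always_gc || toy_consing_since_gc > toy_gc_cons_threshold)
    toy_garbage_collect ();
}
```
+
+ Note that allocation itself never triggers a collection in this
+model; only the next pass through `toy_eval_entry' does.
+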
+ The upshot is that garbage collection can occur basically anywhere
+that `Feval' or `Ffuncall' is used, either directly or through another
+function. Since calls to these two functions are hidden in various
+other functions, many calls to `garbage_collect_1' are not obviously
+foreseeable and are therefore unexpected. Places worth remembering
+where they are used include various elisp forms, such as `or', `and',
+`if', `cond', `while', `setq', etc., miscellaneous `gui_item_...'
+functions, everything related to `eval' (`Feval_buffer', `call0', ...)
+and `Fsignal'. The latter is used to handle signals, such as the ones
+raised by every `QUIT' macro triggered after pressing Ctrl-g.
+
+\1f
+File: internals.info, Node: garbage_collect_1, Next: mark_object, Prev: Invocation, Up: Garbage Collection - Step by Step
+
+`garbage_collect_1'
+-------------------
+
+We can now describe exactly what happens after the invocation takes
+place.
+ 1. There are several cases in which the garbage collector returns
+ immediately: when we are already garbage collecting
+ (`gc_in_progress'), when the garbage collection is somehow
+ forbidden (`gc_currently_forbidden'), when we are currently
+ displaying something (`in_display') or when we are preparing for
+ the armageddon of the whole system (`preparing_for_armageddon').
+
+ 2. Next the correct frame in which to put all the output occurring
+ during garbage collecting is determined. In order to be able to
+ restore the old display's state after displaying the message, some
+ data about the current cursor position has to be saved. The
+ variables `pre_gc_cursor' and `cursor_changed' take care of that.
+
+ 3. The state of `gc_currently_forbidden' must be restored after the
+ garbage collection, no matter what happens during the process. We
+ accomplish this by `record_unwind_protect'ing the suitable function
+ `restore_gc_inhibit' together with the current value of
+ `gc_currently_forbidden'.
+
+ 4. If we are currently running an interactive XEmacs session, the
+ next step is simply to show the garbage collector's cursor/message.
+
+ 5. The following steps are the intrinsic steps of the garbage
+ collector, therefore `gc_in_progress' is set.
+
+ 6. For debugging purposes, it is possible to copy the current C stack
+ frame. However, this seems to be a currently unused feature.
+
+ 7. Before actually starting to go over all live objects, references to
+ objects that are no longer used are pruned. We only have to do
+ this for events (`clear_event_resource') and for specifiers
+ (`cleanup_specifiers').
+
+ 8. Now the mark phase begins and marks all accessible elements. In
+ order to start from all slots that serve as roots of
+ accessibility, the function `mark_object' is called for each root
+ individually to go out from there to mark all reachable objects.
+ All roots that are traversed are shown in their processed order:
+ * all constant symbols and static variables that are registered
+ via `staticpro' in the dynarr `staticpros'. *Note Adding
+ Global Lisp Variables::.
+
+ * all Lisp objects that are created in C functions and that
+ must be protected from being freed. They are registered in
+ the global list `gcprolist'. *Note GCPROing::.
+
+ * all local variables (i.e. their name fields `symbol' and old
+ values `old_values') that are bound during the evaluation by
+ the Lisp engine. They are stored in `specbinding' structs
+ pushed on a stack called `specpdl'. *Note Dynamic Binding;
+ The specbinding Stack; Unwind-Protects::.
+
+ * all catch blocks that the Lisp engine encounters during the
+ evaluation cause the creation of structs `catchtag' inserted
+ in the list `catchlist'. Their tag (`tag') and value (`val')
+ fields are freshly created objects and therefore have to be
+ marked. *Note Catch and Throw::.
+
+ * every function application pushes new structs `backtrace' on
+ the call stack of the Lisp engine (`backtrace_list'). The
+ unique parts that have to be marked are the fields for each
+ function (`function') and all their arguments (`args').
+ *Note Evaluation::.
+
+ * all objects that are used by the redisplay engine that must
+ not be freed are marked by a special function called
+ `mark_redisplay' (in `redisplay.c').
+
+ * all objects created for profiling purposes are allocated by C
+ functions instead of using the Lisp allocation mechanisms. In
+ order to preserve the right ones during the sweep phase, they
+ also have to be marked manually. That is done by the function
+ `mark_profiling_info'.
+
+ 9. Hash tables in XEmacs are a special kind of object that makes use
+ of a concept often called 'weak pointers'. To make a long story
+ short, pointers of this kind are not followed when the live
+ objects are determined during garbage collection. Any object
+ referenced only by weak pointers is collected anyway, and the
+ reference to it is cleared. Hash tables use weak pointers in
+ different ways, which gives rise to different types of hash
+ tables, namely 'non-weak', 'weak', 'key-weak' and 'value-weak'
+ (internally also 'key-car-weak' and 'value-car-weak') hash tables,
+ each clearing entries depending on different conditions. More
+ information can be found in the documentation of the function
+ `make-hash-table'.
+
+ Because there are complicated dependency rules about when and what
+ to mark while processing weak hash tables, the standard `marker'
+ method is only active if it is marking non-weak hash tables. As
+ soon as a weak component is in the table, the hash table entries
+ are ignored while marking. Instead, they are marked separately by
+ the function `finish_marking_weak_hash_tables'. This function
+ iterates over each hash table entry `hentries' for each weak hash
+ table in `Vall_weak_hash_tables'. Depending on the type of a
+ table, the appropriate action is performed. If a table is acting
+ as `HASH_TABLE_KEY_WEAK' and a key is already marked, everything
+ reachable from the `value' component is marked. If it is acting as
+ a `HASH_TABLE_VALUE_WEAK' and the value component is already
+ marked, marking starts only from the `key' component. If it is a
+ `HASH_TABLE_KEY_CAR_WEAK' and the car of the key entry is already
+ marked, we mark both the `key' and `value' components. Finally, if
+ the table is of the type `HASH_TABLE_VALUE_CAR_WEAK' and the car of
+ the value component is already marked, again both the `key' and
+ the `value' components get marked.
+
+ Weak lists are lists with comparable properties. They come in
+ different types, called `simple', `assoc', `key-assoc' and
+ `value-assoc'. You can find further details about them in the
+ description of the function `make-weak-list'. The scheme of their
+ marking is similar: all weak lists are listed in `Vall_weak_lists',
+ therefore we iterate over them. The marking is advanced until we
+ hit an already marked pair. Then we know that during a former run
+ all the rest has been marked completely. Again, depending on the
+ type of the weak list, our jobs differ. If it is a
+ `WEAK_LIST_SIMPLE' and the elem is marked, we mark the `cons'
+ part. If it is a `WEAK_LIST_ASSOC' and the elem is not a pair or
+ is a pair with both car and cdr marked, we mark the `cons' and the
+ `elem'. If it is a `WEAK_LIST_KEY_ASSOC' and the elem is not a
+ pair or is a pair with a marked car, we mark the `cons' and the
+ `elem'. Finally, if it is a `WEAK_LIST_VALUE_ASSOC' and the elem
+ is not a pair or is a pair with a marked cdr, we mark both the
+ `cons' and the `elem'.
+
+ Marking objects reachable from weak hash tables and weak lists can
+ mark other objects, which in turn may require further marking of
+ other weak objects. Both finishing functions are therefore rerun
+ for as long as they mark any previously unmarked objects.
+
+ 10. After completing the special marking for the weak hash tables and
+ for the weak lists, all entries that point to objects that are
+ going to be swept in the further process are useless, and
+ therefore have to be removed from the table or the list.
+
+ The function `prune_weak_hash_tables' does the job for weak hash
+ tables. Completely unmarked hash tables are removed from the list
+ `Vall_weak_hash_tables'. The other ones are treated more carefully
+ by scanning over all entries and removing an entry as soon as one
+ of its components `key' and `value' is unmarked.
+
+ The same idea applies to the weak lists. It is accomplished by
+ `prune_weak_lists': An unmarked list is pruned from
+ `Vall_weak_lists' immediately. A marked list is treated more
+ carefully by going over it and removing just the unmarked pairs.
+
+ 11. The function `prune_specifiers' checks all specifiers listed in
+ `Vall_specifiers' and removes the unmarked ones from the list.
+
+ 12. All syntax tables are stored in a list called
+ `Vall_syntax_tables'. The function `prune_syntax_tables' walks
+ through it and unlinks the tables that are unmarked.
+
+ 13. Next comes the actual sweeping, which is done by the function
+ `gc_sweep' and makes up the bulk of the remaining work.
+
+ 14. First, all the variables with respect to garbage collection are
+ reset. `consing_since_gc' - the counter of the created cells since
+ the last garbage collection - is set back to 0, and
+ `gc_in_progress' is not `true' anymore.
+
+ 15. In case the session is interactive, the displayed cursor and
+ message are removed again.
+
+ 16. The state of `gc_inhibit' is restored to the former value by
+ unwinding the stack.
+
+ 17. A small memory reserve is always held back that can be reached by
+ `breathing_space'. If nothing more is left, we create a new reserve
+ and exit.
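+
+ The repeated marking in step 9 and the pruning in step 10 can be
+sketched as follows for a single key-weak table. This is only a toy
+model under stated assumptions: `toy_obj', `toy_entry' and both
+functions are hypothetical names that do not occur in `alloc.c', and
+"marking everything reachable from the value" is reduced to setting one
+flag.
+

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: an object with a mark bit and one entry of a
   key-weak table.  Neither name comes from the XEmacs sources.  */
struct toy_obj { bool marked; };
struct toy_entry { struct toy_obj *key, *value; };

/* One pass over the table: if a key is marked, mark its value.
   Returns true if anything was freshly marked.  */
static bool
toy_finish_marking (struct toy_entry *e, size_t n)
{
  bool did_mark = false;
  for (size_t i = 0; i < n; i++)
    if (e[i].key->marked && !e[i].value->marked)
      {
        e[i].value->marked = true;
        did_mark = true;
      }
  return did_mark;
}

/* Rerun the pass until it marks nothing new, then drop every entry
   whose key or value is still unmarked.  Returns the number of
   entries that remain.  */
static size_t
toy_mark_and_prune (struct toy_entry *e, size_t n)
{
  size_t live = 0;
  while (toy_finish_marking (e, n))
    ;
  for (size_t i = 0; i < n; i++)
    if (e[i].key->marked && e[i].value->marked)
      e[live++] = e[i];
  return live;
}
```

+ A single pass is not enough whenever the value of one entry is the
+key of an entry that was visited earlier in the same pass, which is why
+the loop runs until nothing changes.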
+
+\1f
+File: internals.info, Node: mark_object, Next: gc_sweep, Prev: garbage_collect_1, Up: Garbage Collection - Step by Step
+
+`mark_object'
+-------------
+
+The first thing that is checked while marking an object is whether the
+object is a real Lisp object `Lisp_Type_Record' or just an integer or a
+character. Integers and characters are the only two types that are
+stored directly - without another level of indirection, and therefore
+they don't have to be marked and collected. *Note How Lisp Objects Are
+Represented in C::.
+
+ The second case is the one we have to handle: we are dealing with a
+pointer to a Lisp object. Even then, there are three situations in
+which nothing has to be done while marking. The object is read only,
+which prevents it from being garbage collected and hence from being
+marked (`C_READONLY_RECORD_HEADER_P'). The object in question is
+already marked and need not be marked a second time (checked by
+`MARKED_RECORD_HEADER_P'). Or it is a special, unmarkable object
+(`UNMARKABLE_RECORD_HEADER_P'; apparently, these are objects that sit
+in some const space and can therefore not be marked, see
+`this_one_is_unmarkable' in `alloc.c').
+
+ Now the actual marking can take place. We use the macro
+`MARK_RECORD_HEADER' to mark the object itself (actually the special
+flag in the lrecord header), and call its special marker "method"
+`marker' if available. The marker method marks every other object that
+is in reach from our current object. Note that these marker methods
+should not call `mark_object' recursively, but instead should return
+the next object from where further marking has to be performed.
+
+ In case another object was returned, as mentioned before, we
+reiterate the whole `mark_object' process beginning with this next
+object.
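+
+ A minimal sketch of this loop, assuming a toy object with a single
+successor: the marker returns the next object and the marking function
+iterates instead of recursing, so a long chain needs no extra stack.
+The names `toy_object', `toy_mark_next' and `toy_mark_object' are
+hypothetical and only illustrate the control flow.
+

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_object;
typedef struct toy_object *(*toy_marker_fn) (struct toy_object *);

/* Hypothetical object: mark and read-only flags, a marker method and
   one reference to another object.  */
struct toy_object
{
  bool marked;
  bool read_only;
  toy_marker_fn marker;      /* may be NULL */
  struct toy_object *next;
};

/* Marker "method": does not recurse, just hands back the object from
   which marking has to continue.  */
static struct toy_object *
toy_mark_next (struct toy_object *obj)
{
  return obj->next;
}

static void
toy_mark_object (struct toy_object *obj)
{
  while (obj != NULL)
    {
      if (obj->read_only || obj->marked)
        return;                      /* nothing to do for this object */
      obj->marked = true;            /* cf. MARK_RECORD_HEADER */
      if (obj->marker == NULL)
        return;
      obj = obj->marker (obj);       /* reiterate with the next object */
    }
}
```

+ Because the mark flag is tested before anything else happens, the
+loop also terminates on circular structures.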
+
+\1f
+File: internals.info, Node: gc_sweep, Next: sweep_lcrecords_1, Prev: mark_object, Up: Garbage Collection - Step by Step
+
+`gc_sweep'
+----------
+
+The job of this function is to free all unmarked records from memory. As
+we know, there are different types of objects implemented and managed,
+and consequently different ways to free them from memory. *Note
+Introduction to Allocation::.
+
+ We start with all objects stored through `lcrecords'. All bulkier
+objects are allocated and handled using that scheme of `lcrecords'.
+Each object is `malloc'ed separately instead of placing it in one of
+the contiguous frob blocks. The types that are currently stored using
+`alloc_lcrecord' and `make_lcrecord_list' are:
+vectors, buffers, char-table, char-table-entry, console, weak-list,
+database, device, ldap, hash-table, command-builder, extent-auxiliary,
+extent-info, face, coding-system, frame, image-instance, glyph,
+popup-data, gui-item, keymap, charset, color_instance, font_instance,
+opaque, opaque-list, process, range-table, specifier,
+symbol-value-buffer-local, symbol-value-lisp-magic,
+symbol-value-varalias, toolbar-button, tooltalk-message,
+tooltalk-pattern, window, and window-configuration. We take care of
+them first in order to be able to handle and to finalize items stored
+in them more easily. The function `sweep_lcrecords_1', described
+below, does the whole job for us. For a description of the internals:
+*Note lrecords::.
+
+ Our next candidates are the other objects that behave quite
+differently from everything else: the strings. They consist of two
+parts: a fixed-size portion (`struct Lisp_String') holding the string's
+length, its property list and a pointer to the second part, and the
+actual string data, which is stored in string-chars blocks comparable to
+frob blocks. In these blocks, the data is not only freed, but the holes
+are also compacted, i.e. all strings are relocated together. *Note
+String::. This compacting phase is performed by the function
+`compact_string_chars'; the actual sweeping, done by the function
+`sweep_strings', is described below.
+
+ After that, the other types are swept step by step using the functions
+`sweep_conses', `sweep_bit_vectors_1', `sweep_compiled_functions',
+`sweep_floats', `sweep_symbols', `sweep_extents', `sweep_markers' and
+`sweep_events'. They are the fixed-size types cons, float,
+compiled-function, symbol, marker, extent, and event, stored in
+so-called "frob blocks", and therefore we can basically do the same for
+objects of every type, using the same macros, defined specifically to
+handle everything with respect to fixed-size blocks. The only fixed-size
+type that is not handled here is the fixed-size portion of strings,
+because we took special care of them earlier.
+
+ The only big exception is bit vectors, which are stored differently
+and are therefore treated differently by the function
+`sweep_bit_vectors_1' described later.
+
+ First, we need some brief information about how these fixed-size
+types are managed in general, in order to understand how the sweeping
+is done. They all have a fixed size, and are therefore stored in big
+blocks of memory - allocated at once - that can hold a certain number
+of objects of one type. The macro `DECLARE_FIXED_TYPE_ALLOC' creates
+the suitable structures for every type. More precisely, we have the
+block struct (holding a pointer to the previous block `prev' and the
+objects in `block[]'), a pointer to the current block
+(`current_..._block') and its last index (`current_..._block_index'),
+and a pointer to the free list that will be created. There is also a
+macro `FIXED_TYPE_FROM_BLOCK', plus some related macros, used to
+obtain a new object, either from the free list
+(`ALLOCATE_FIXED_TYPE_1') if there is an unused object of that type
+stored, or by allocating a completely new block using
+`ALLOCATE_FIXED_TYPE_FROM_BLOCK'.
+
+ The rest works as follows: all of them define a macro `UNMARK_...'
+that is used to unmark the object. They define a macro
+`ADDITIONAL_FREE_...' that defines additional work that has to be done
+when converting an object from in use to not in use (so far, only
+markers use it in order to unchain them). Then, they all call the macro
+`SWEEP_FIXED_TYPE_BLOCK' instantiated with their type name and their
+struct name.
+
+ This call in particular does the following: we go over all blocks,
+starting with the current one and moving towards the oldest. For each
+block, we look at every object in it. If the object is already freed
+(checked with `FREE_STRUCT_P' using the first pointer of the object), or
+if it is set to read only (`C_READONLY_RECORD_HEADER_P'), nothing must
+be done. If it is unmarked (checked with `MARKED_RECORD_HEADER_P'), it
+is put in the free list and set free (using the macro
+`FREE_FIXED_TYPE'); otherwise it stays in the block, but is unmarked
+(by `UNMARK_...'). While going through one block, we note if the whole
+block is empty. If so, the whole block is freed (using `xfree') and the
+free list state is set to the state it had before handling this block.
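+
+ The block walk can be sketched as below. This is a simplified model
+and not the real `SWEEP_FIXED_TYPE_BLOCK': all `toy_' names are
+hypothetical, read-only objects and the unused tail of the current
+block are left out, and, as an assumption made for brevity, the free
+list is rebuilt from scratch so that already free objects are simply
+threaded onto it again.
+

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define TOY_PER_BLOCK 4          /* hypothetical; real blocks hold more */

struct toy_obj
{
  bool in_use, marked;
  struct toy_obj *next_free;     /* free-list chain */
};

struct toy_block
{
  struct toy_block *prev;        /* next older block */
  struct toy_obj block[TOY_PER_BLOCK];
};

static struct toy_block *toy_current_block;
static struct toy_obj *toy_free_list;

static void
toy_sweep (void)
{
  struct toy_block **link = &toy_current_block;
  toy_free_list = NULL;                   /* assumption: rebuilt here */
  while (*link)
    {
      struct toy_block *b = *link;
      struct toy_obj *saved = toy_free_list;  /* state before this block */
      bool block_empty = true;
      for (int i = 0; i < TOY_PER_BLOCK; i++)
        {
          struct toy_obj *o = &b->block[i];
          if (o->in_use && o->marked)
            {
              o->marked = false;          /* survivor: only unmark it */
              block_empty = false;
              continue;
            }
          o->in_use = false;              /* garbage or already free */
          o->next_free = toy_free_list;
          toy_free_list = o;
        }
      if (block_empty)
        {
          *link = b->prev;                /* unlink the whole block ... */
          toy_free_list = saved;          /* ... and forget its chunks */
          free (b);
        }
      else
        link = &b->prev;
    }
}
```

+ Restoring the saved free list before freeing an empty block is what
+keeps the list from pointing into released memory.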
+
+\1f
+File: internals.info, Node: sweep_lcrecords_1, Next: compact_string_chars, Prev: gc_sweep, Up: Garbage Collection - Step by Step
+
+`sweep_lcrecords_1'
+-------------------
+
+After nullifying the complete lcrecord statistics, we go over all
+lcrecords two separate times. They are all chained together in a list
+with a head called `all_lcrecords'.
+
+ The first loop calls for each object its `finalizer' method, but only
+in the case that it is not read only (`C_READONLY_RECORD_HEADER_P'), it
+is not marked (`MARKED_RECORD_HEADER_P'), it is not already in a free
+list (list of freed objects, field `free') and finally it owns a
+finalizer method.
+
+ The second loop actually frees the appropriate objects by iterating
+through the whole list again. In case an object is read only or marked,
+it has to persist; otherwise it is manually freed by calling `xfree'.
+During this loop, the lcrecord statistics are kept up to date by
+calling `tick_lcrecord_stats' with the right arguments.
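+
+ The two loops can be sketched like this, assuming a much reduced
+header. `toy_lcrecord', `toy_all_lcrecords' and the counting finalizer
+are hypothetical; the free-list check and the statistics are omitted.
+

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical header; the real one is `struct lcrecord_header'.  */
struct toy_lcrecord
{
  struct toy_lcrecord *next;                  /* chain of all records */
  bool marked, read_only;
  void (*finalizer) (struct toy_lcrecord *);  /* may be NULL */
};

static struct toy_lcrecord *toy_all_lcrecords;
static int toy_finalized;      /* counts finalizer calls, for the demo */

static void
toy_count_finalizer (struct toy_lcrecord *r)
{
  (void) r;
  toy_finalized++;
}

static void
toy_sweep_lcrecords (void)
{
  /* First loop: finalize every record that is about to be freed.
     Nothing has been freed yet at this point.  */
  for (struct toy_lcrecord *r = toy_all_lcrecords; r; r = r->next)
    if (!r->read_only && !r->marked && r->finalizer)
      r->finalizer (r);

  /* Second loop: unlink and free dead records, unmark survivors.  */
  struct toy_lcrecord **link = &toy_all_lcrecords;
  while (*link)
    {
      struct toy_lcrecord *r = *link;
      if (r->read_only || r->marked)
        {
          r->marked = false;
          link = &r->next;
        }
      else
        {
          *link = r->next;
          free (r);
        }
    }
}
```

+ Splitting the work into two loops means that every finalizer runs
+while all other records, dead or alive, are still allocated.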
+
+\1f
+File: internals.info, Node: compact_string_chars, Next: sweep_strings, Prev: sweep_lcrecords_1, Up: Garbage Collection - Step by Step
+
+`compact_string_chars'
+----------------------
+
+The purpose of this function is to compact all the data parts of the
+strings that are held in so-called `string_chars_block', i.e. the
+strings that do not exceed a certain maximal length.
+
+ The procedure with which this is done is as follows. We keep two
+positions in the `string_chars_block's using two pointer/integer
+pairs, namely `from_sb'/`from_pos' and `to_sb'/`to_pos'. They stand for
+the positions from where and to where the string currently being
+handled is copied.
+
+ While going over all chained `string_chars_block's and the strings
+held in them, starting at `first_string_chars_block', both pointers are
+advanced and eventually a string is copied from `from_sb' to `to_sb',
+depending on the status of the strings pointed at.
+
+ More precisely, we can distinguish between the following actions.
+ * The string at `from_sb''s position could be marked as free, which
+ is indicated by an invalid value in the pointer that should point
+ back to the fixed-size string object, and which is checked by
+ `FREE_STRUCT_P'. In this case, the `from_sb'/`from_pos' pair is
+ advanced to the next string, and nothing has to be copied.
+
+ * Also, if a string object itself is unmarked, nothing has to be
+ copied. We likewise advance the `from_sb'/`from_pos' pair as
+ described above.
+
+ * In all other cases, we have a marked string at hand. The string
+ data must be moved from the from-position to the to-position. In
+ case there is not enough space in the current `to_sb' block, we
+ advance this pointer to the beginning of the next block before
+ copying. In case the from and to positions are different, we
+ perform the actual copying using the library function `memmove'.
+
+ After compacting, the pointer to the current `string_chars_block',
+sitting in `current_string_chars_block', is reset to the last block to
+which we moved a string, i.e. `to_sb', and all remaining blocks (we
+know that they just carry garbage) are explicitly `xfree'd.
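+
+ The core of the compaction can be sketched on a single buffer. This
+is a deliberate simplification: the real function walks a chain of
+blocks and finds each string through a back pointer stored next to its
+data, whereas the hypothetical `toy_string' descriptors below are kept
+in a separate array sorted by offset.
+

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor for the data of one string in a buffer.  */
struct toy_string
{
  size_t offset;   /* where the data currently lives */
  size_t size;     /* number of bytes */
  bool marked;     /* does the string survive this collection? */
};

/* Slide the data of all marked strings towards the start of BUF.
   STRINGS must be sorted by offset.  Returns the number of bytes
   that are in use afterwards.  */
static size_t
toy_compact (char *buf, struct toy_string *strings, size_t n)
{
  size_t to_pos = 0;
  for (size_t i = 0; i < n; i++)
    {
      struct toy_string *s = &strings[i];
      if (!s->marked)
        continue;                     /* garbage: only "from" advances */
      if (s->offset != to_pos)        /* copy only if the data moves */
        memmove (buf + to_pos, buf + s->offset, s->size);
      s->offset = to_pos;             /* record the new location */
      to_pos += s->size;
    }
  return to_pos;
}
```

+ `memmove' rather than `memcpy' is needed because source and
+destination may overlap when a string moves by less than its own
+length.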
+
+\1f
+File: internals.info, Node: sweep_strings, Next: sweep_bit_vectors_1, Prev: compact_string_chars, Up: Garbage Collection - Step by Step
+
+`sweep_strings'
+---------------
+
+The sweeping for the fixed-size string objects is essentially the same
+as it is for all other fixed-size types. As before, the freeing into
+the suitable free list is done by using the macro
+`SWEEP_FIXED_TYPE_BLOCK' after defining the right macros
+`UNMARK_string' and `ADDITIONAL_FREE_string'. These two definitions are
+a little bit special compared to the ones used for the other fixed-size
+types.
+
+ `UNMARK_string' is defined the same way, except for some additional
+code used for updating the bookkeeping information.
+
+ For strings, `ADDITIONAL_FREE_string' has to do something in
+addition: in case the string was not allocated in a
+`string_chars_block' because it exceeded the maximal length, and was
+therefore `malloc'ed separately, we now also `xfree' it explicitly.
+
+\1f
+File: internals.info, Node: sweep_bit_vectors_1, Prev: sweep_strings, Up: Garbage Collection - Step by Step
+
+`sweep_bit_vectors_1'
+---------------------
+
+Bit vectors are also one of the rare types that are `malloc'ed
+individually. Consequently, while sweeping, all bit vectors that are
+no longer needed must be freed by hand. This is done in the expected
+way: since they are all registered in a list called
+`all_bit_vectors', all elements of that list are traversed, all
+unmarked bit vectors are unlinked and freed by calling `xfree', and
+all remaining ones are unmarked. In addition, the bookkeeping
+information used for the garbage collector's output is updated.
+
+\1f
+File: internals.info, Node: Integers and Characters, Next: Allocation from Frob Blocks, Prev: Garbage Collection - Step by Step, Up: Allocation of Objects in XEmacs Lisp
+
+Integers and Characters
+=======================
+
+Integer and character Lisp objects are created from integers using the
+macros `XSETINT()' and `XSETCHAR()' or the equivalent functions
+`make_int()' and `make_char()'. (These are actually macros on most
+systems.) These functions basically just do some moving of bits
+around, since the integral value of the object is stored directly in
+the `Lisp_Object'.
+
+ `XSETINT()' and the like will truncate values given to them that are
+too big; i.e. you won't get the value you expected but the tag bits
+will at least be correct.
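+
+ The truncation can be demonstrated with a toy encoding. The layout
+used here (a 32-bit word whose lowest bit is the integer tag, leaving
+31 bits for the value) is an assumption made for this sketch; the
+`toy_' names are hypothetical and are not the real `XSETINT()' or
+`make_int()'.
+

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t toy_lisp_object;   /* hypothetical 32-bit Lisp word */

#define TOY_INT_TAG 1u              /* assumed: low bit set = integer */

static toy_lisp_object
toy_make_int (int32_t value)
{
  /* The unsigned shift silently drops the top bit of VALUE.  */
  return ((uint32_t) value << 1) | TOY_INT_TAG;
}

static int
toy_intp (toy_lisp_object obj)
{
  return (obj & TOY_INT_TAG) != 0;
}

static int32_t
toy_xint (toy_lisp_object obj)
{
  uint32_t bits = obj >> 1;         /* the 31 payload bits */
  /* Sign-extend from bit 30 without relying on signed shifts.  */
  return (int32_t) (bits ^ 0x40000000u) - 0x40000000;
}
```

+ In this model 2^30 is the smallest positive value that no longer
+fits: the resulting object still carries a valid integer tag, but it
+reads back as -2^30.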
+
+\1f
+File: internals.info, Node: Allocation from Frob Blocks, Next: lrecords, Prev: Integers and Characters, Up: Allocation of Objects in XEmacs Lisp
+
+Allocation from Frob Blocks
+===========================
+
+The uninitialized memory required by a `Lisp_Object' of a particular
+type is allocated using `ALLOCATE_FIXED_TYPE()'. This only occurs
+inside of the lowest-level object-creating functions in `alloc.c':
+`Fcons()', `make_float()', `Fmake_byte_code()', `Fmake_symbol()',
+`allocate_extent()', `allocate_event()', `Fmake_marker()', and
+`make_uninit_string()'. The idea is that, for each type, there are a
+number of frob blocks (each 2K in size); each frob block is divided up
+into object-sized chunks. Each frob block will have some of these
+chunks that are currently assigned to objects, and perhaps some that are
+free. (If a frob block has nothing but free chunks, it is freed at the
+end of the garbage collection cycle.) The free chunks are stored in a
+free list, which is chained by storing a pointer in the first four bytes
+of the chunk. (Except for the free chunks at the end of the last frob
+block, which are handled using an index which points past the end of the
+last-allocated chunk in the last frob block.) `ALLOCATE_FIXED_TYPE()'
+first tries to retrieve a chunk from the free list; if that fails, it
+calls `ALLOCATE_FIXED_TYPE_FROM_BLOCK()', which looks at the end of the
+last frob block for space, and creates a new frob block if there is
+none. (There are actually two versions of these macros, one of which is
+more defensive but less efficient and is used for error-checking.)
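+
+ The allocation order described above (free list first, then the tail
+of the last block, then a new block) can be sketched as follows. The
+`toy_' names and the block size of four chunks are hypothetical; the
+real macros are generated per type and also keep allocation counters.
+

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_CHUNKS_PER_BLOCK 4   /* hypothetical; real blocks are 2K */

union toy_chunk
{
  union toy_chunk *next_free;    /* valid only while the chunk is free */
  double payload;                /* stands in for the real object */
};

struct toy_frob_block
{
  struct toy_frob_block *prev;
  union toy_chunk chunks[TOY_CHUNKS_PER_BLOCK];
};

static struct toy_frob_block *toy_current_block;
static int toy_block_index = TOY_CHUNKS_PER_BLOCK;  /* no block yet */
static union toy_chunk *toy_free_list;

static union toy_chunk *
toy_allocate (void)
{
  if (toy_free_list)                       /* 1. reuse a freed chunk */
    {
      union toy_chunk *c = toy_free_list;
      toy_free_list = c->next_free;
      return c;
    }
  if (toy_block_index == TOY_CHUNKS_PER_BLOCK)  /* 2. last block full */
    {
      struct toy_frob_block *b = malloc (sizeof *b);
      if (!b)
        abort ();
      b->prev = toy_current_block;
      toy_current_block = b;
      toy_block_index = 0;
    }
  /* 3. take the next chunk past the last-allocated one */
  return &toy_current_block->chunks[toy_block_index++];
}

static void
toy_free (union toy_chunk *c)
{
  c->next_free = toy_free_list;  /* the link lives inside the chunk */
  toy_free_list = c;
}
```

+ Storing the free-list link inside the free chunk itself is what makes
+the free list cost no extra memory.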
+
+\1f
+File: internals.info, Node: lrecords, Next: Low-level allocation, Prev: Allocation from Frob Blocks, Up: Allocation of Objects in XEmacs Lisp
+
+lrecords
+========
+
+[see `lrecord.h']
+
+ All lrecords have at the beginning of their structure a `struct
+lrecord_header'. This just contains a type number and some flags,
+including the mark bit. All builtin type numbers are defined as
+constants in `enum lrecord_type', to allow the compiler to generate
+more efficient code for `TYPEP'. The type number, through the
+`lrecord_implementations_table', gives access to a `struct
+lrecord_implementation', which is a structure containing method pointers
+and such. There is one of these for each type, and it is a global,
+constant, statically-declared structure that is declared in the
+`DEFINE_LRECORD_IMPLEMENTATION()' macro.
+
+ Simple lrecords (of type (b) above) just have a `struct
+lrecord_header' at their beginning. lcrecords, however, actually have a
+`struct lcrecord_header'. This, in turn, has a `struct lrecord_header'
+at its beginning, so sanity is preserved; but it also has a pointer
+used to chain all lcrecords together, and a special ID field used to
+distinguish one lcrecord from another. (This field is used only for
+debugging and could be removed, but the space gain is not significant.)
+
+ Simple lrecords are created using `ALLOCATE_FIXED_TYPE()', just like
+for other frob blocks. The only change is that the implementation
+pointer must be initialized correctly. (The implementation structure for
+an lrecord, or rather the pointer to it, is named `lrecord_float',
+`lrecord_extent', `lrecord_buffer', etc.)
+
+ lcrecords are created using `alloc_lcrecord()'. This takes a size
+to allocate and an implementation pointer. (The size needs to be passed
+because some lcrecords, such as window configurations, are of variable
+size.) This basically just `malloc()'s the storage, initializes the
+`struct lcrecord_header', and chains the lcrecord onto the head of the
+list of all lcrecords, which is stored in the variable `all_lcrecords'.
+The calls to `alloc_lcrecord()' generally occur in the lowest-level
+allocation function for each lrecord type.
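+
+ The nesting of the two headers and the chaining done on allocation
+can be sketched as below. The field names, the bit widths and
+`toy_alloc_lcrecord' are assumptions for illustration; see `lrecord.h'
+for the real definitions.
+

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical simplifications of the headers in lrecord.h.  */
struct toy_lrecord_header
{
  unsigned int type : 8;     /* index into the implementations table */
  unsigned int mark : 1;
};

struct toy_lcrecord_header
{
  struct toy_lrecord_header lheader;   /* must come first */
  struct toy_lcrecord_header *next;    /* chain of all lcrecords */
  unsigned int uid;                    /* debugging aid */
};

static struct toy_lcrecord_header *toy_all_lcrecords;
static unsigned int toy_uid_counter;

/* SIZE is passed in because some record types vary in size.  */
static void *
toy_alloc_lcrecord (size_t size, unsigned int type)
{
  struct toy_lcrecord_header *h;
  assert (size >= sizeof *h);
  h = calloc (1, size);
  if (!h)
    abort ();
  h->lheader.type = type;
  h->uid = toy_uid_counter++;
  h->next = toy_all_lcrecords;         /* push onto the head */
  toy_all_lcrecords = h;
  return h;
}

/* Example record type: header first, payload after it.  */
struct toy_window_config
{
  struct toy_lcrecord_header header;
  int n_windows;
  int saved[1];                        /* variable-size tail */
};
```

+ Because each header is the first member of the structure that
+contains it, a pointer to the record can also be read as a pointer to
+its `toy_lrecord_header', which is how generic code gets at the type
+number and the mark bit.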
+
+ Whenever you create an lrecord, you need to call either
+`DEFINE_LRECORD_IMPLEMENTATION()' or
+`DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()'. This needs to be specified
+in a `.c' file, at the top level. What this actually does is define
+and initialize the implementation structure for the lrecord. (And
+possibly declares a function `error_check_foo()' that implements the
+`XFOO()' macro when error-checking is enabled.) The arguments to the
+macros are the actual type name (this is used to construct the C
+variable name of the lrecord implementation structure and related
+structures using the `##' macro concatenation operator), a string that
+names the type on the Lisp level (this may not be the same as the C
+type name; typically, the C type name has underscores, while the Lisp
+string has dashes), various method pointers, and the name of the C
+structure that contains the object. The methods are used to
+encapsulate type-specific information about the object, such as how to
+print it or mark it for garbage collection, so that it's easy to add
+new object types without having to add a specific case for each new
+type in a bunch of different places.
+
+ The difference between `DEFINE_LRECORD_IMPLEMENTATION()' and
+`DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()' is that the former is used
+for fixed-size object types and the latter is for variable-size object
+types. Most object types are fixed-size; some complex types, however
+(e.g. window configurations), are variable-size. Variable-size object
+types have an extra method, which is called to determine the actual
+size of a particular object of that type. (Currently this is only used
+for keeping allocation statistics.)
+
+ For the purpose of keeping allocation statistics, the allocation
+engine keeps a list of all the different types that exist. Note that,
+since `DEFINE_LRECORD_IMPLEMENTATION()' is a macro that is specified at
+top-level, there is no way for it to initialize the global data
+structures containing type information, like
+`lrecord_implementations_table'. For this reason a call to
+`INIT_LRECORD_IMPLEMENTATION' must be added to the same source file
+containing `DEFINE_LRECORD_IMPLEMENTATION', but instead of to the top
+level, to one of the init functions, typically `syms_of_FOO.c'.
+`INIT_LRECORD_IMPLEMENTATION' must be called before an object of this
+type is used.
+
+ The type number is also used to index into an array holding the
+number of objects of each type and the total memory allocated for
+objects of that type. The statistics in this array are computed during
+the sweep stage. These statistics are returned by the call to
+`garbage-collect'.
+
+ Note that for every type defined with a `DEFINE_LRECORD_*()' macro,
+there needs to be a `DECLARE_LRECORD_IMPLEMENTATION()' somewhere in a
+`.h' file, and this `.h' file needs to be included by `inline.c'.
+
+ Furthermore, there should generally be a set of `XFOOBAR()',
+`FOOBARP()', etc. macros in a `.h' (or occasionally `.c') file. To
+create one of these, copy an existing model and modify as necessary.
+
+ *Please note:* If you define an lrecord in an external
+dynamically-loaded module, you must use `DECLARE_EXTERNAL_LRECORD',
+`DEFINE_EXTERNAL_LRECORD_IMPLEMENTATION', and
+`DEFINE_EXTERNAL_LRECORD_SEQUENCE_IMPLEMENTATION' instead of the
+non-EXTERNAL forms. These macros will dynamically add new type numbers
+to the global enum that records them, whereas the non-EXTERNAL forms
+assume that the programmer has already inserted the correct type numbers
+into the enum's code at compile-time.
+
+ The various methods in the lrecord implementation structure are:
+
+ 1. A "mark" method. This is called during the marking stage and
+ passed a function pointer (usually the `mark_object()' function),
+ which is used to mark an object. All Lisp objects that are
+ contained within the object need to be marked by applying this
+ function to them. The mark method should also return a Lisp
+ object, which should be either `nil' or an object to mark. (This
+ can be used in lieu of calling `mark_object()' on the object, to
+ reduce the recursion depth, and consequently should be the most
+ heavily nested sub-object, such as a long list.)
+
+ *Please note:* When the mark method is called, garbage collection
+ is in progress, and special precautions need to be taken when
+ accessing objects; see section (B) above.
+
+ If your mark method does not need to do anything, it can be `NULL'.
+
+ 2. A "print" method. This is called to create a printed
+ representation of the object, whenever `princ', `prin1', or the
+ like is called. It is passed the object, a stream to which the
+ output is to be directed, and an `escapeflag' which indicates
+ whether the object's printed representation should be "escaped" so
+ that it is readable. (This corresponds to the difference between
+ `princ' and `prin1'.) Basically, "escaped" means that strings will
+ have quotes around them and confusing characters in the strings
+ such as quotes, backslashes, and newlines will be backslashed; and
+ that special care will be taken to make symbols print in a
+ readable fashion (e.g. symbols that look like numbers will be
+ backslashed). Other readable objects should perhaps pass
+ `escapeflag' on when sub-objects are printed, so that readability
+ is preserved when necessary (or if not, always pass in a 1 for
+ `escapeflag'). Non-readable objects should in general ignore
+ `escapeflag', except that some use it as an indication that more
+ verbose output should be given.
+
+ Sub-objects are printed using `print_internal()', which takes
+ exactly the same arguments as are passed to the print method.
+
+ Literal C strings should be printed using `write_c_string()', or
+ `write_string_1()' for non-null-terminated strings.
+
+ Functions that do not have a readable representation should check
+ the `print_readably' flag and signal an error if it is set.
+
+ If you specify NULL for the print method, the
+ `default_object_printer()' will be used.
+
+ 3. A "finalize" method. This is called at the beginning of the sweep
+ stage on lcrecords that are about to be freed, and should be used
+ to perform any extra object cleanup. This typically involves
+ freeing any extra `malloc()'ed memory associated with the object,
+ releasing any operating-system and window-system resources
+ associated with the object (e.g. pixmaps, fonts), etc.
+
+ The finalize method can be NULL if nothing needs to be done.
+
+ WARNING #1: The finalize method is also called at the end of the
+ dump phase; this time with the for_disksave parameter set to
+ non-zero. The object is _not_ about to disappear, so you have to
+ make sure to _not_ free any extra `malloc()'ed memory if you're
+ going to need it later. (Also, signal an error if there are any
+ operating-system and window-system resources here, because they
+ can't be dumped.)
+
+ Finalize methods should, as a rule, set to zero any pointers after
+ they've been freed, and check to make sure pointers are not zero
+ before freeing. Although I'm pretty sure that finalize methods
+ are not called twice on the same object (except for the
+ `for_disksave' proviso), we've gotten nastily burned in some cases
+ by not doing this.
+
+ WARNING #2: The finalize method is _only_ called for lcrecords,
+ _not_ for simple lrecords. If you need a finalize method for
+ simple lrecords, you have to stick it in the
+ `ADDITIONAL_FREE_foo()' macro in `alloc.c'.
+
+ WARNING #3: Things are in an _extremely_ bizarre state when
+ `ADDITIONAL_FREE_foo()' is called, so you have to be incredibly
+ careful when writing one of these functions. See the comment in
+ `gc_sweep()'. If you ever have to add one of these, consider
+ using an lcrecord or dealing with the problem in a different
+ fashion.
+
+ 4. An "equal" method. This compares the two objects for similarity,
+ when `equal' is called. It should compare the contents of the
+ objects in some reasonable fashion. It is passed the two objects
+ and a "depth" value, which is used to catch circular objects. To
+ compare sub-Lisp-objects, call `internal_equal()' and bump the
+ depth value by one. If this value gets too high, a
+ `circular-object' error will be signaled.
+
+ If this is NULL, objects are `equal' only when they are `eq', i.e.
+ identical.
+
+ 5. A "hash" method. This is used to hash objects when they are to be
+ compared with `equal'. The rule here is that if two objects are
+ `equal', they _must_ hash to the same value; i.e. your hash
+ function should use some subset of the sub-fields of the object
+ that are compared in the "equal" method. If you specify this
+ method as `NULL', the object's pointer will be used as the hash,
+ which will _fail_ if the object has an `equal' method, so don't do
+ this.
+
+ To hash a sub-Lisp-object, call `internal_hash()'. Bump the depth
+ by one, just like in the "equal" method.
+
+ To convert a Lisp object directly into a hash value (using its
+ pointer), use `LISP_HASH()'. This is what happens when the hash
+ method is NULL.
+
+ To hash two or more values together into a single value, use
+ `HASH2()', `HASH3()', `HASH4()', etc.
+
+ 6. "getprop", "putprop", "remprop", and "plist" methods. These are
+ used for object types that have properties. I don't feel like
+ documenting them here. If you create one of these objects, you
+ have to use different macros to define them, i.e.
+ `DEFINE_LRECORD_IMPLEMENTATION_WITH_PROPS()' or
+ `DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION_WITH_PROPS()'.
+
+ 7. A "size_in_bytes" method, when the object is of variable-size.
+ (i.e. declared with a `_SEQUENCE_IMPLEMENTATION' macro.) This
+ should simply return the object's size in bytes, exactly as you
+ might expect. For an example, see the methods for window
+ configurations and opaques.
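+
+ The relationship between the "equal" and "hash" methods can be shown
+with a reduced method table. Everything here is hypothetical:
+`toy_implementation' has only two methods, `toy_point' is an invented
+type, and `toy_internal_equal' merely imitates the dispatch and the
+depth bump described above.
+

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, much reduced method table.  */
struct toy_implementation
{
  const char *name;
  bool (*equal) (const void *a, const void *b, int depth);
  unsigned long (*hash) (const void *obj, int depth);
};

struct toy_point { int x, y; const char *label; };

/* equal compares x and y only; label is deliberately ignored.  */
static bool
toy_point_equal (const void *a, const void *b, int depth)
{
  const struct toy_point *p = a, *q = b;
  (void) depth;
  return p->x == q->x && p->y == q->y;
}

/* hash uses only fields that equal looks at, so two equal objects
   always get the same hash.  The mixing itself is arbitrary.  */
static unsigned long
toy_point_hash (const void *obj, int depth)
{
  const struct toy_point *p = obj;
  (void) depth;
  return (unsigned long) p->x * 31UL + (unsigned long) p->y;
}

static const struct toy_implementation toy_point_impl =
  { "toy-point", toy_point_equal, toy_point_hash };

/* Generic code dispatches through the table; without an equal
   method, objects are equal only when they are identical.  */
static bool
toy_internal_equal (const struct toy_implementation *impl,
                    const void *a, const void *b, int depth)
{
  if (a == b)
    return true;
  return impl->equal ? impl->equal (a, b, depth + 1) : false;
}
```

+ Had the hash included `label', two points that compare `equal' could
+land in different hash buckets, which is exactly the failure the rule
+for the "hash" method rules out.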
+
+\1f
+File: internals.info, Node: Low-level allocation, Next: Cons, Prev: lrecords, Up: Allocation of Objects in XEmacs Lisp
+
+Low-level allocation
+====================
+
+Memory that you want to allocate directly should be allocated using
+`xmalloc()' rather than `malloc()'. This implements error-checking on
+the return value, and once upon a time did some more vital stuff (i.e.
+`BLOCK_INPUT', which is no longer necessary). Free using `xfree()',
+and realloc using `xrealloc()'. Note that `xmalloc()' will do a
+non-local exit if the memory can't be allocated. (Many functions,
+however, do not expect this, and thus XEmacs will likely crash if this
+happens. *This is a bug.* If you can, you should strive to make your
+function handle this OK. However, it's difficult in the general
+circumstance, perhaps requiring extra unwind-protects and such.)
+
+ Note that XEmacs provides two separate replacements for the standard
+`malloc()' library function. These are called "old GNU malloc"
+(`malloc.c') and "new GNU malloc" (`gmalloc.c'), respectively. New GNU
+malloc is better in pretty much every way than old GNU malloc, and
+should be used if possible. (It used to be that on some systems, the
+old one worked but the new one didn't. I think this was due
+specifically to a bug in SunOS, which the new one now works around; so
+I don't think the old one ever has to be used any more.) The primary
+difference between both of these mallocs and the standard system malloc
+is that they are much faster, at the expense of increased space. The
+basic idea is that memory is allocated in fixed chunks of powers of
+two. This allows for basically constant malloc time, since the various
+chunks can just be kept on a number of free lists. (The standard system
+malloc typically allocates arbitrary-sized chunks and has to spend some
+time, sometimes a significant amount of time, walking the heap looking
+for a free block to use and cleaning things up.) The new GNU malloc
+improves on things by allocating large objects in chunks of 4096 bytes
+rather than in ever larger powers of two, which results in ever larger
+wastage. There is a slight speed loss here, but it's of doubtful
+significance.
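+
+ The difference in wastage can be estimated with a small calculation.
+The smallest chunk size of 8 bytes and the cutoff at which a request
+counts as "large" are assumptions of this sketch and are not taken
+from `malloc.c' or `gmalloc.c'; only the 4096-byte chunk comes from
+the text above.
+

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE 4096u   /* chunk size for large objects */

/* Old scheme: always round up to the next power of two.  */
static size_t
toy_old_size (size_t request)
{
  size_t size = 8;                 /* assumed smallest chunk */
  while (size < request)
    size *= 2;
  return size;
}

/* New scheme: powers of two for small requests, whole 4096-byte
   chunks for large ones.  */
static size_t
toy_new_size (size_t request)
{
  if (request <= TOY_PAGE / 2)     /* assumed cutoff for "large" */
    return toy_old_size (request);
  return (request + TOY_PAGE - 1) / TOY_PAGE * TOY_PAGE;
}
```

+ For a request of 70000 bytes the first function yields 131072 and the
+second 73728, so the overhead drops from roughly 87% to about 5%.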
+
+ NOTE: Apparently there is a third-generation GNU malloc that is
+significantly better than the new GNU malloc, and should probably be
+included in XEmacs.
+
+ There is also the relocating allocator, `ralloc.c'. This actually
+moves blocks of memory around so that the `sbrk()' pointer can be
+shrunk and virtual memory released back to the system. On some systems,
+this is a big win. On all systems, it causes a noticeable (and
+sometimes huge) speed penalty, so I turn it off by default. `ralloc.c'
+only works with the new GNU malloc in `gmalloc.c'. There are also two
+versions of `ralloc.c', one of which uses `mmap()' rather than block
+copies to move
+data around. This purports to be faster, although that depends on the
+amount of data that would have had to be block copied and the
+system-call overhead for `mmap()'. I don't know exactly how this
+works, except that the relocating-allocation routines are pretty much
+used only for the memory allocated for a buffer, which is the biggest
+consumer of space, esp. of space that may get freed later.
+
+ Note that the GNU mallocs have some "memory warning" facilities.
+XEmacs taps into them and issues a warning through the standard warning
+system, when memory gets to 75%, 85%, and 95% full. (On some systems,
+the memory warnings are not functional.)
+
+ Allocated memory that is going to be used to make a Lisp object is
+created using `allocate_lisp_storage()'. This just calls `xmalloc()'.
+It used to verify that the pointer to the memory can fit into a Lisp
+word, before the current Lisp object representation was introduced.
+`allocate_lisp_storage()' is called by `alloc_lcrecord()',
+`ALLOCATE_FIXED_TYPE()', and the vector and bit-vector creation
+routines. These routines also call `INCREMENT_CONS_COUNTER()' at the
+appropriate times; this keeps statistics on how much memory is
+allocated, so that garbage-collection can be invoked when the threshold
+is reached.
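+
+ The division of labor described above can be pictured with a minimal
+sketch. All names here (`toy_allocate_storage()', `toy_alloc_record()',
+`toy_maybe_gc()', the counter and the threshold value) are hypothetical
+stand-ins chosen for illustration; they are not the real XEmacs
+routines, and the real collector obviously does far more than bump a
+counter.
+
```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical statistics, standing in for the consing counter. */
size_t toy_bytes_since_gc = 0;
size_t toy_gc_threshold = 4096;   /* assumed value, for illustration only */
int toy_gc_runs = 0;

/* The low-level allocator just obtains raw memory. */
void *toy_allocate_storage(size_t size)
{
  void *p = malloc(size);
  assert(p != NULL);
  return p;
}

/* A typical caller: allocate first, then account for the allocation. */
void *toy_alloc_record(size_t size)
{
  void *p = toy_allocate_storage(size);
  toy_bytes_since_gc += size;
  return p;
}

/* Called at safe points: collect once enough memory has been consed. */
void toy_maybe_gc(void)
{
  if (toy_bytes_since_gc >= toy_gc_threshold) {
    toy_gc_runs++;            /* a real collector would mark and sweep here */
    toy_bytes_since_gc = 0;
  }
}
```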
+
+\1f
+File: internals.info, Node: Cons, Next: Vector, Prev: Low-level allocation, Up: Allocation of Objects in XEmacs Lisp
+
+Cons
+====
+
+Conses are allocated in standard frob blocks. The only thing to note
+is that conses can be explicitly freed using `free_cons()' and
+associated functions `free_list()' and `free_alist()'. This
+immediately puts the conses onto the cons free list, and decrements the
+statistics on memory allocation appropriately. This is used to good
+effect by some extremely commonly-used code, to avoid generating extra
+objects and thereby triggering GC sooner. However, you have to be
+_extremely_ careful when doing this. If you mess this up, you will get
+BADLY BURNED, and it has happened before.
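+
+ The explicit-free path can be sketched as follows. This is only an
+illustration under simplifying assumptions: `toy_cons',
+`toy_alloc_cons()', `toy_free_cons()' and the counter are hypothetical
+names, and cells come straight from `malloc()' rather than from frob
+blocks.
+
```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a cons cell; not the real XEmacs struct. */
typedef struct toy_cons {
  void *car;
  struct toy_cons *cdr;   /* doubles as the free-list link while the cell is free */
} toy_cons;

toy_cons *toy_cons_free_list = NULL;  /* cells available for reuse */
long toy_consing_since_gc = 0;        /* bytes allocated since the last GC */

/* Allocate a cell, preferring the free list over fresh memory. */
toy_cons *toy_alloc_cons(void *car, toy_cons *cdr)
{
  toy_cons *c = toy_cons_free_list;
  if (c != NULL) {
    toy_cons_free_list = c->cdr;      /* pop the free list */
  } else {
    c = malloc(sizeof *c);
    assert(c != NULL);
  }
  c->car = car;
  c->cdr = cdr;
  toy_consing_since_gc += (long) sizeof *c;
  return c;
}

/* Explicitly free a cell: push it on the free list and undo the statistics.
   The caller must guarantee that nothing else still points to the cell. */
void toy_free_cons(toy_cons *c)
{
  c->car = NULL;
  c->cdr = toy_cons_free_list;
  toy_cons_free_list = c;
  toy_consing_since_gc -= (long) sizeof *c;
}
```
+
+ The danger mentioned above is visible here: if any live object still
+refers to a cell passed to `toy_free_cons()', the next allocation will
+silently hand the same memory to someone else.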
+
+\1f
+File: internals.info, Node: Vector, Next: Bit Vector, Prev: Cons, Up: Allocation of Objects in XEmacs Lisp
+
+Vector
+======
+
+As mentioned above, each vector is `malloc()'ed individually, and all
+are threaded through the variable `all_vectors'. Vectors are marked
+strangely during garbage collection, by kludging the size field. Note
+that the `struct Lisp_Vector' is declared with its `contents' field
+being a _stretchy_ array of one element. It is actually `malloc()'ed
+with the right size, however, and access to any element through the
+`contents' array works fine.
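+
+ The "stretchy array" idiom can be sketched like this. The struct
+below is a hypothetical simplification (`toy_vector', `toy_object' and
+`toy_all_vectors' are made-up names, and the field layout is assumed),
+shown only to make the size computation concrete.
+
```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef void *toy_object;   /* hypothetical stand-in for Lisp_Object */

/* Hypothetical layout using a trailing one-element array. */
struct toy_vector {
  size_t size;
  struct toy_vector *next;    /* chain of all vectors */
  toy_object contents[1];     /* really `size' elements long */
};

struct toy_vector *toy_all_vectors = NULL;

struct toy_vector *toy_make_vector(size_t n)
{
  /* Header up to `contents', plus room for n elements (at least one). */
  size_t bytes = offsetof(struct toy_vector, contents)
                 + (n > 0 ? n : 1) * sizeof(toy_object);
  struct toy_vector *v = malloc(bytes);
  assert(v != NULL);
  v->size = n;
  for (size_t i = 0; i < n; i++)
    v->contents[i] = NULL;
  v->next = toy_all_vectors;  /* thread onto the global chain */
  toy_all_vectors = v;
  return v;
}
```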
+
+\1f
+File: internals.info, Node: Bit Vector, Next: Symbol, Prev: Vector, Up: Allocation of Objects in XEmacs Lisp
+
+Bit Vector
+==========
+
+Bit vectors work exactly like vectors, except for more complicated code
+to access an individual bit, and except for the fact that bit vectors
+are lrecords while vectors are not. (The only difference here is that
+there's an lrecord implementation pointer at the beginning and the tag
+field in bit vector Lisp words is "lrecord" rather than "vector".)
+
+\1f
+File: internals.info, Node: Symbol, Next: Marker, Prev: Bit Vector, Up: Allocation of Objects in XEmacs Lisp
+
+Symbol
+======
+
+Symbols are also allocated in frob blocks. Symbols in the awful
+horrible obarray structure are chained through their `next' field.
+
+ Remember that `intern' looks up a symbol in an obarray, creating one
+if necessary.
+
+\1f
+File: internals.info, Node: Marker, Next: String, Prev: Symbol, Up: Allocation of Objects in XEmacs Lisp
+
+Marker
+======
+
+Markers are allocated in frob blocks, as usual. They are kept in a
+buffer unordered, but in a doubly-linked list so that they can easily
+be removed. (Formerly this was a singly-linked list, but in some cases
+garbage collection took an extraordinarily long time due to the O(N^2)
+time required to remove lots of markers from a buffer.) Markers are
+removed from a buffer in the finalize stage, in
+`ADDITIONAL_FREE_marker()'.
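+
+ The reason a doubly-linked list helps is that a marker can unlink
+itself without walking the list. A minimal sketch, using hypothetical
+`toy_marker' and `toy_buffer' structures rather than the real ones:
+
```c
#include <assert.h>
#include <stddef.h>

struct toy_marker {
  struct toy_marker *prev, *next;
  long position;
};

struct toy_buffer {
  struct toy_marker *markers;   /* head of the unordered marker list */
};

/* Attach at the head; order does not matter. */
void toy_attach_marker(struct toy_buffer *b, struct toy_marker *m)
{
  m->prev = NULL;
  m->next = b->markers;
  if (b->markers != NULL)
    b->markers->prev = m;
  b->markers = m;
}

/* Detach in constant time: only the two neighbors are touched. */
void toy_detach_marker(struct toy_buffer *b, struct toy_marker *m)
{
  assert(b != NULL && m != NULL);
  if (m->prev != NULL) {
    m->prev->next = m->next;
  } else {
    assert(b->markers == m);    /* no predecessor means m is the head */
    b->markers = m->next;
  }
  if (m->next != NULL)
    m->next->prev = m->prev;
  m->prev = m->next = NULL;
}
```
+
+ With a singly-linked list, each removal would first have to search
+for the predecessor, which is where the O(N^2) behavior came from when
+many markers were freed at once.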
+
+\1f
+File: internals.info, Node: String, Next: Compiled Function, Prev: Marker, Up: Allocation of Objects in XEmacs Lisp
+
+String
+======
+
+As mentioned above, strings are a special case. A string is logically
+two parts, a fixed-size object (containing the length, property list,
+and a pointer to the actual data), and the actual data in the string.
+The fixed-size object is a `struct Lisp_String' and is allocated in
+frob blocks, as usual. The actual data is stored in special
+"string-chars blocks", which are 8K blocks of memory.
+Currently-allocated strings are simply laid end to end in these
+string-chars blocks, with a pointer back to the `struct Lisp_String'
+stored before each string in the string-chars block. When a new string
+needs to be allocated, the remaining space at the end of the last
+string-chars block is used if there's enough, and a new string-chars
+block is created otherwise.
+
+ There are never any holes in the string-chars blocks due to the
+string compaction and relocation that happens at the end of garbage
+collection. During the sweep stage of garbage collection, when objects
+are reclaimed, the garbage collector goes through all string-chars
+blocks, looking for unused strings. Each chunk of string data is
+preceded by a pointer to the corresponding `struct Lisp_String', which
+indicates both whether the string is used and how big the string is,
+i.e. how to get to the next chunk of string data. Holes are compressed
+by block-copying the next string into the empty space and relocating the
+pointer stored in the corresponding `struct Lisp_String'. *This means
+you have to be careful with strings in your code.* See the section
+above on `GCPRO'ing.
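+
+ The compaction step can be sketched as follows, for a single block
+only. This is a simplified model under stated assumptions: `toy_string'
+carries an explicit `live' flag in place of the real mark bit, the
+padding rule is assumed to be pointer alignment, and all function names
+are hypothetical.
+
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_string {
  size_t size;   /* length of the data in bytes */
  int live;      /* nonzero if the string survived marking */
  char *data;    /* points into the string-chars block */
};

#define TOY_ALIGN (sizeof(struct toy_string *))

/* A chunk is a back pointer followed by the data, padded for alignment. */
size_t toy_chunk_size(size_t len)
{
  size_t padded = (len + TOY_ALIGN - 1) / TOY_ALIGN * TOY_ALIGN;
  return sizeof(struct toy_string *) + padded;
}

/* Lay a new string at the end of the block; returns the new fill level. */
size_t toy_append(char *block, size_t used, struct toy_string *s,
                  const char *text)
{
  assert(s != NULL && text != NULL);
  memcpy(block + used, &s, sizeof s);          /* back pointer */
  s->data = block + used + sizeof s;
  memcpy(s->data, text, s->size);
  return used + toy_chunk_size(s->size);
}

/* Slide live chunks down over dead ones and fix up the data pointers. */
size_t toy_compact(char *block, size_t used)
{
  size_t from = 0, to = 0;
  while (from < used) {
    struct toy_string *owner;
    memcpy(&owner, block + from, sizeof owner);
    size_t chunk = toy_chunk_size(owner->size);
    if (owner->live) {
      if (to != from) {
        memmove(block + to, block + from, chunk);
        owner->data = block + to + sizeof owner;   /* relocate */
      }
      to += chunk;
    }
    from += chunk;
  }
  return to;   /* new fill level; everything past it is free */
}
```
+
+ Note how `owner->data' changes during compaction: any raw pointer to
+string data held by C code across a garbage collection would be left
+dangling.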
+
+ Note that there is one situation not handled: a string that is too
+big to fit into a string-chars block. Such strings, called "big
+strings", are all `malloc()'ed as their own block. (#### Although it
+would make more sense for the threshold for big strings to be somewhat
+lower, e.g. 1/2 or 1/4 the size of a string-chars block. It seems that
+this was indeed the case formerly--indeed, the threshold was set at
+1/8--but Mly forgot about this when rewriting things for 19.8.)
+
+ Note also that the string data in string-chars blocks is padded as
+necessary so that proper alignment constraints on the `struct
+Lisp_String' back pointers are maintained.
+
+ Finally, strings can be resized. This happens in Mule when a
+character is substituted with a different-length character, or during
+modeline frobbing. (You could also export this to Lisp, but it's not
+done so currently.) Resizing a string is a potentially tricky process.
+If the change is small enough that the padding can absorb it, nothing
+other than a simple memory move needs to be done. Keep in mind,
+however, that the string can't shrink too much because the offset to the
+next string in the string-chars block is computed by looking at the
+length and rounding to the nearest multiple of four or eight. If the
+string would shrink or expand beyond the correct padding, new string
+data needs to be allocated at the end of the last string-chars block and
+the data moved appropriately. This leaves some dead string data, which
+is marked by putting a special marker of 0xFFFFFFFF in the `struct
+Lisp_String' pointer before the data (there's no real `struct
+Lisp_String' to point to and relocate), and storing the size of the dead
+string data (which would normally be obtained from the now-non-existent
+`struct Lisp_String') at the beginning of the dead string data gap.
+The string compactor recognizes this special 0xFFFFFFFF marker and
+handles it correctly.
+
+\1f
+File: internals.info, Node: Compiled Function, Prev: String, Up: Allocation of Objects in XEmacs Lisp
+
+Compiled Function
+=================
+
+Not yet documented.
+
+\1f
+File: internals.info, Node: Dumping, Next: Events and the Event Loop, Prev: Allocation of Objects in XEmacs Lisp, Up: Top
+
+Dumping
+*******
+
+What is dumping and its justification
+=====================================
+
+The C code of XEmacs is just a Lisp engine with a lot of built-in
+primitives useful for writing an editor. The editor itself is written
+mostly in Lisp, and represents around 100K lines of code. Loading and
+executing the initialization of all this code takes a bit of time (five
+to ten times the usual startup time of current xemacs) and requires
+having all the lisp source files around. Having to reload them each
+time the editor is started would not be acceptable.
+
+ The traditional solution to this problem is called dumping: the build
+process first creates the lisp engine under the name `temacs', then
+runs it until it has finished loading and initializing all the lisp
+code, and eventually creates a new executable called `xemacs' including
+both the object code in `temacs' and all the contents of the memory
+after the initialization.
+
+ This solution, while working, has a huge problem: the creation of the
+new executable from the actual contents of memory is an extremely
+system-specific process, quite error-prone, and which interferes with a
+lot of system libraries (like malloc). It is even getting worse
+nowadays with libraries using constructors which are automatically
+called when the program is started (even before main()) which tend to
+crash when they are called multiple times, once before dumping and once
+after (IRIX 6.x libz.so pulls in some C++ image libraries thru
+dependencies which have this problem). Writing the dumper is also one
+of the most difficult parts of porting XEmacs to a new operating system.
+Basically, `dumping' is an operation that is just not officially
+supported on many operating systems.
+
+ The aim of the portable dumper is to solve the same problem as the
+system-specific dumper, that is to be able to reload quickly, using only
+a small number of files, the fully initialized lisp part of the editor,
+without any system-specific hacks.
+
+* Menu:
+
+* Overview::
+* Data descriptions::
+* Dumping phase::
+* Reloading phase::
+* Remaining issues::
+
+\1f
+File: internals.info, Node: Overview, Next: Data descriptions, Up: Dumping
+
+Overview
+========
+
+The portable dumping system has to:
+
+ 1. At dump time, write all initialized, non-quickly-rebuildable data
+ to a file [Note: currently named `xemacs.dmp', but the name will
+ change], along with all the information needed for reloading.
+
+ 2. When starting xemacs, reload the dump file, relocate it to its new
+ starting address if needed, and reinitialize all pointers to this
+ data. Also, rebuild all the quickly rebuildable data.
+
+\1f
+File: internals.info, Node: Data descriptions, Next: Dumping phase, Prev: Overview, Up: Dumping
+
+Data descriptions
+=================
+
+The most complex task of the dumper is to be able to write lisp objects
+(lrecords) and C structs to disk and reload them at a different address,
+updating all the pointers they include in the process. This is done by
+using external data descriptions that give information about the layout
+of the structures in memory.
+
+ The specification of these descriptions is in lrecord.h. A
+description of an lrecord is an array of struct lrecord_description.
+Each of these structs includes a type, an offset in the structure and
+some optional parameters depending on the type. For instance, here is
+the string description:
+
+ static const struct lrecord_description string_description[] = {
+ { XD_BYTECOUNT, offsetof (Lisp_String, size) },
+ { XD_OPAQUE_DATA_PTR, offsetof (Lisp_String, data), XD_INDIRECT(0, 1) },
+ { XD_LISP_OBJECT, offsetof (Lisp_String, plist) },
+ { XD_END }
+ };
+
+ The first line indicates a member of type Bytecount, which is used by
+the next, indirect directive. The second means "there is a pointer to
+some opaque data in the field `data'". The length of said data is
+given by the expression `XD_INDIRECT(0, 1)', which means "the value in
+the 0th line of the description (welcome to C) plus one". The third
+line means "there is a Lisp_Object member `plist' in the Lisp_String
+structure". `XD_END' then ends the description.
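+
+ To make the indirect length computation concrete, here is a small
+sketch of how a description walker might resolve it. Everything below
+is a hypothetical simplification: the `toy_' types, the explicit
+`indirect_line' and `indirect_delta' fields, and the use of `long' for
+the byte count are assumptions made for the example, not the actual
+definitions in lrecord.h.
+
```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the description machinery. */
enum toy_xd_type {
  TOY_XD_BYTECOUNT,
  TOY_XD_OPAQUE_DATA_PTR,
  TOY_XD_LISP_OBJECT,
  TOY_XD_END
};

struct toy_description {
  enum toy_xd_type type;
  size_t offset;         /* offset of the member in the structure */
  int indirect_line;     /* which description line holds the count */
  long indirect_delta;   /* constant added to that count */
};

struct toy_string {
  long size;
  char *data;
  void *plist;
};

const struct toy_description toy_string_description[] = {
  { TOY_XD_BYTECOUNT,       offsetof(struct toy_string, size),  0, 0 },
  { TOY_XD_OPAQUE_DATA_PTR, offsetof(struct toy_string, data),  0, 1 },
  { TOY_XD_LISP_OBJECT,     offsetof(struct toy_string, plist), 0, 0 },
  { TOY_XD_END, 0, 0, 0 }
};

/* Length of the opaque data described by line `line': the value of the
   member named by the indirect line, plus the constant delta. */
long toy_opaque_length(const void *obj, const struct toy_description *desc,
                       int line)
{
  assert(desc[line].type == TOY_XD_OPAQUE_DATA_PTR);
  const struct toy_description *counter = &desc[desc[line].indirect_line];
  assert(counter->type == TOY_XD_BYTECOUNT);
  long count = *(const long *) ((const char *) obj + counter->offset);
  return count + desc[line].indirect_delta;
}
```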
+
+ This gives us all the information we need to move around what is
+pointed to by a structure (C or lrecord) and, by transitivity,
+everything that it points to. The only missing information for dumping
+is the size of the structure. For lrecords, this is part of the
+lrecord_implementation, so we don't need to duplicate it. For C
+structures we use a struct struct_description, which includes a size
+field and a pointer to an associated array of lrecord_description.
+
+\1f
+File: internals.info, Node: Dumping phase, Next: Reloading phase, Prev: Data descriptions, Up: Dumping
+
+Dumping phase
+=============
+
+Dumping is done by calling the function pdump() (in dumper.c) which is
+invoked from Fdump_emacs (in emacs.c). This function performs a number
+of tasks.
+
+* Menu:
+
+* Object inventory::
+* Address allocation::
+* The header::
+* Data dumping::
+* Pointers dumping::
+
+\1f
+File: internals.info, Node: Object inventory, Next: Address allocation, Up: Dumping phase
+
+Object inventory
+----------------
+
+The first task is to build the list of the objects to dump. This
+includes:
+
+ * lisp objects
+
+ * C structures
+
+ We end up with one `pdump_entry_list_elmt' per object group (arrays
+of C structs are kept together) which includes a pointer to the first
+object of the group, the per-object size and the count of objects in the
+group, along with some other information which is initialized later.
+
+ These entries are linked together in `pdump_entry_list' structures
+and can be enumerated thru either:
+
+ 1. the `pdump_object_table', an array of `pdump_entry_list', one per
+ lrecord type, indexed by type number.
+
+ 2. the `pdump_opaque_data_list', used for the opaque data which does
+ not include pointers, and hence does not need descriptions.
+
+ 3. the `pdump_struct_table', which is a vector of
+ `struct_description'/`pdump_entry_list' pairs, used for non-opaque
+ C structures.
+
+ This uses a marking strategy similar to the garbage collector. Some
+differences though:
+
+ 1. We do not use the mark bit (which does not exist for C structures
+ anyway); we use a big hash table instead.
+
+ 2. We do not use the mark function of lrecords but instead rely on the
+ external descriptions. This happens essentially because we need to
+ follow pointers to C structures and opaque data in addition to
+ Lisp_Object members.
+
+ This is done by `pdump_register_object()', which handles Lisp_Object
+variables, and `pdump_register_struct()' which handles C structures,
+which both delegate the description management to
+`pdump_register_sub()'.
+
+ The hash table doubles as a map object to pdump_entry_list_elmt (i.e.
+allows us to look up a pdump_entry_list_elmt with the object it points
+to). Entries are added with `pdump_add_entry()' and looked up with
+`pdump_get_entry()'. There is no need for entry removal. The hash
+value is computed quite simply from the object pointer by
+`pdump_make_hash()'.
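+
+ A pointer-keyed table with insertion and lookup but no removal can be
+very simple. The following sketch uses open addressing with linear
+probing; the table size, the hash formula and all `toy_' names are
+assumptions made for illustration and are not taken from dumper.c.
+
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_HASH_SIZE 1024   /* assumed; must exceed the number of entries */

struct toy_entry {
  const void *obj;      /* the object this entry describes (the key) */
  size_t size;
  size_t save_offset;   /* filled in later, during address allocation */
};

struct toy_entry *toy_table[TOY_HASH_SIZE];

/* Hash the address; the low bits are dropped since they are mostly zero. */
size_t toy_make_hash(const void *obj)
{
  return (size_t) (((uintptr_t) obj >> 3) % TOY_HASH_SIZE);
}

/* Returns the entry for obj, or NULL if obj was never registered. */
struct toy_entry *toy_get_entry(const void *obj)
{
  size_t pos = toy_make_hash(obj);
  while (toy_table[pos] != NULL) {
    if (toy_table[pos]->obj == obj)
      return toy_table[pos];
    pos = (pos + 1) % TOY_HASH_SIZE;
  }
  return NULL;
}

/* Registers an entry; assumes the table never fills up. */
void toy_add_entry(struct toy_entry *e)
{
  size_t pos = toy_make_hash(e->obj);
  while (toy_table[pos] != NULL) {
    assert(toy_table[pos]->obj != e->obj);   /* register each object once */
    pos = (pos + 1) % TOY_HASH_SIZE;
  }
  toy_table[pos] = e;
}
```
+
+ In this model, "already marked" simply means that `toy_get_entry()'
+returns non-NULL, which is why no separate mark bit is needed.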
+
+ The roots for the marking are:
+
+ 1. the `staticpro''ed variables (there is a special
+ `staticpro_nodump()' call for protected variables we do not want
+ to dump).
+
+ 2. the variables registered via `dump_add_root_object' (`staticpro()'
+ is equivalent to `staticpro_nodump()' + `dump_add_root_object()').
+
+ 3. the variables registered via `dump_add_root_struct_ptr', each of
+ which points to a C structure.
+
+ This does not include the GCPRO'ed variables, the specbinds, the
+catchtags, the backlist, the redisplay or the profiling info, since we
+do not want to rebuild the actual chain of lisp calls which leads up to
+the dump-emacs call, only the global variables.
+
+ Weak lists and weak hash tables are dumped as if they were their
+non-weak equivalent (without changing their type, of course). This has
+not yet been a problem.
+
+\1f
+File: internals.info, Node: Address allocation, Next: The header, Prev: Object inventory, Up: Dumping phase
+
+Address allocation
+------------------
+
+The next step is to allocate the offsets of each of the objects in the
+final dump file. This is done by `pdump_allocate_offset()' which is
+called indirectly by `pdump_scan_by_alignment()'.
+
+ The strategy to deal with alignment problems uses these facts:
+
+ 1. real world alignment requirements are powers of two.
+
+ 2. the C compiler is required to adjust the size of a struct so that
+ you can have an array of them next to each other. This means you
+ can get an upper bound on the alignment requirements of a given
+ structure by looking at which power of two its size is a multiple of.
+
+ 3. the non-variant part of variable size lrecords has an alignment
+ requirement of 4.
+
+ Hence, for each lrecord type, C struct type or opaque data block the
+alignment requirement is computed as a power of two, with a minimum of
+2^2 for lrecords. `pdump_scan_by_alignment()' then scans all the
+`pdump_entry_list_elmt''s, the ones with the highest requirements
+first. This ensures the best packing.
+
+ The maximum alignment requirement we take into account is 2^8.
+
+ `pdump_allocate_offset()' only has to do a linear allocation,
+starting at offset 256 (this leaves room for the header and keeps the
+alignments happy).
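+
+ The alignment bound and the linear allocation can be sketched as
+follows. The function names are hypothetical, and the round-up in
+`toy_allocate_offset()' is a conservative assumption added for the
+example; when entries are visited from the strictest alignment down, it
+only matters for sizes that are not a multiple of their own alignment.
+
```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ALIGN   256   /* 2^8, the largest alignment considered */
#define TOY_BASE_OFFSET 256   /* first offset after the header */

/* Upper bound on the alignment: the largest power of two that divides
   the size, capped at 2^8 and raised to 2^2 for lrecords. */
size_t toy_alignment_bound(size_t size, int is_lrecord)
{
  assert(size > 0);
  size_t align = size & (~size + 1);   /* isolate the lowest set bit */
  if (align > TOY_MAX_ALIGN)
    align = TOY_MAX_ALIGN;
  if (is_lrecord && align < 4)
    align = 4;
  return align;
}

/* Linear allocation: round the cursor up, hand out the offset, advance. */
size_t toy_allocate_offset(size_t *cursor, size_t size, size_t align)
{
  size_t offset = (*cursor + align - 1) & ~(align - 1);
  *cursor = offset + size;
  return offset;
}
```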
+
+\1f
+File: internals.info, Node: The header, Next: Data dumping, Prev: Address allocation, Up: Dumping phase
+
+The header
+----------
+
+The next step creates the file and writes a header with a signature and
+some random information in it. The `reloc_address' field, which
+indicates at which address the file should be loaded if we want to avoid
+post-reload relocation, is set to 0. It then seeks to offset 256 (base
+offset for the objects).
+
+\1f
+File: internals.info, Node: Data dumping, Next: Pointers dumping, Prev: The header, Up: Dumping phase
+
+Data dumping
+------------
+
+The data is dumped in the same order as the addresses were allocated by
+`pdump_dump_data()', called from `pdump_scan_by_alignment()'. This
+function copies the data to a temporary buffer, relocates all pointers
+in the object to the addresses allocated in step Address Allocation,
+and writes it to the file. Using the same order means that, if we are
+careful with lrecords whose size is not a multiple of 4, we are assured
+that the object is always written at the offset in the file allocated
+in step Address Allocation.
+
+\1f
+File: internals.info, Node: Pointers dumping, Prev: Data dumping, Up: Dumping phase
+
+Pointers dumping
+----------------
+
+A bunch of tables needed to properly reassign the global pointers are
+then written. They are:
+
+ 1. the pdump_root_struct_ptrs dynarr
+
+ 2. the pdump_opaques dynarr
+
+ 3. a vector of all the offsets to the objects in the file that
+ include a description (for faster relocation at reload time)
+
+ 4. the pdump_root_objects and pdump_weak_object_chains dynarrs.
+
+ For each of the dynarrs we write both the pointer to the variables
+and the relocated offset of the object they point to. Since these
+variables are global, the pointers are still valid when restarting the
+program and are used to regenerate the global pointers.
+
+ The `pdump_weak_object_chains' dynarr is a special case. The
+variables it points to are the head of weak linked lists of lisp objects
+of the same type. Not all objects of this list are dumped so the
+relocated pointer we associate with them points to the first dumped
+object of the list, or Qnil if none is available. This is also the
+reason why they are not used as roots for the purpose of object
+enumeration.
+
+ Some very important information like the `staticpros' and
+`lrecord_implementations_table' is handled indirectly using
+`dump_add_opaque' or `dump_add_root_struct_ptr'.
+
+ This is the end of the dumping part.
+
+\1f
+File: internals.info, Node: Reloading phase, Next: Remaining issues, Prev: Dumping phase, Up: Dumping
+
+Reloading phase
+===============
+
+File loading
+------------
+
+The file is mmap'ed in memory (which ensures a PAGESIZE alignment, at
+least 4096), or if mmap is unavailable or fails, a 256-byte-aligned
+malloc is done and the file is loaded.
+
+ Some variables are reinitialized from the values found in the header.
+
+ The difference between the actual loading address and the
+reloc_address is computed and will be used for all the relocations.
+
+Putting back the pdump_opaques
+------------------------------
+
+The memory contents are restored in the obvious and trivial way.
+
+Putting back the pdump_root_struct_ptrs
+---------------------------------------
+
+The variables pointed to by pdump_root_struct_ptrs in the dump phase are
+reset to the right relocated object addresses.
+
+Object relocation
+-----------------
+
+All the objects are relocated using their description and their offset
+by `pdump_reloc_one'. This step is unnecessary if the reloc_address is
+equal to the file loading address.
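+
+ The arithmetic behind relocation is just the addition of a constant
+difference to every stored pointer. A minimal sketch, assuming a
+hypothetical `toy_node' object with a single pointer field stored as an
+integer address (the real code finds the pointer fields through the
+data descriptions instead):
+
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dumped object; `next' holds an address that is valid
   only if the image is loaded at reloc_address. 0 means "no pointer". */
struct toy_node {
  uintptr_t next;
  int value;
};

void toy_relocate(struct toy_node *nodes, size_t count,
                  uintptr_t reloc_address, uintptr_t load_address)
{
  assert(nodes != NULL || count == 0);
  /* Unsigned arithmetic wraps, so this works in either direction. */
  uintptr_t delta = load_address - reloc_address;
  if (delta == 0)
    return;                    /* loaded where expected: nothing to do */
  for (size_t i = 0; i < count; i++)
    if (nodes[i].next != 0)
      nodes[i].next += delta;
}
```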
+
+Putting back the pdump_root_objects and pdump_weak_object_chains
+----------------------------------------------------------------
+
+Same as Putting back the pdump_root_struct_ptrs.
+
+Reorganize the hash tables
+--------------------------
+
+Since some of the hash values in the lisp hash tables are
+address-dependent, their layout is now wrong. So we go through each of
+them and have them resorted by calling `pdump_reorganize_hash_table'.
+
+\1f
+File: internals.info, Node: Remaining issues, Prev: Reloading phase, Up: Dumping
+
+Remaining issues
+================
+
+The build process will have to start a post-dump xemacs, ask it for the
+loading address (which will, hopefully, always be the same between
+different xemacs invocations) and relocate the file to the new address.
+This way the object relocation phase will not have to be done, which
+means no writes in the objects and that, because of the use of mmap, the
+dumped data will be shared between all the xemacs running on the
+computer.
+
+ Some executable signature will be necessary to ensure that a given
+dump file is really associated with a given executable, or random
+crashes will occur. Maybe a random number set at compile or configure
+time thru a define. This will also allow for having
+differently-compiled xemacsen on the same system (mule and no-mule
+come to mind).
+
+ The DOC file contents should probably end up in the dump file.
+
+\1f
+File: internals.info, Node: Events and the Event Loop, Next: Evaluation; Stack Frames; Bindings, Prev: Dumping, Up: Top
+
+Events and the Event Loop
+*************************
+
+* Menu:
+
+* Introduction to Events::
+* Main Loop::
+* Specifics of the Event Gathering Mechanism::
+* Specifics About the Emacs Event::
+* The Event Stream Callback Routines::
+* Other Event Loop Functions::
+* Converting Events::
+* Dispatching Events; The Command Builder::
+
+\1f
+File: internals.info, Node: Introduction to Events, Next: Main Loop, Up: Events and the Event Loop
+
+Introduction to Events
+======================
+
+An event is an object that encapsulates information about an
+interesting occurrence in the operating system. Events are generated
+either by user action, direct (e.g. typing on the keyboard or moving
+the mouse) or indirect (moving another window, thereby generating an
+expose event on an Emacs frame), or as a result of some other typically
+asynchronous action happening, such as output from a subprocess being
+ready or a timer expiring. Events come into the system in an
+asynchronous fashion (typically through a callback being called) and
+are converted into a synchronous event queue (first-in, first-out) in a
+process that we will call "collection".
+
+ Note that each application has its own event queue. (It is
+immaterial whether the collection process directly puts the events in
+the proper application's queue, or puts them into a single system
+queue, which is later split up.)
+
+ The most basic level of event collection is done by the operating
+system or window system. Typically, XEmacs does its own event
+collection as well. Often there are multiple layers of collection in
+XEmacs, with events from various sources being collected into a queue,
+which is then combined with other sources to go into another queue
+(i.e. a second level of collection), with perhaps another level on top
+of this, etc.
+
+ XEmacs has its own types of events (called "Emacs events"), which
+provide an abstract layer on top of the system-dependent nature of the
+most basic events that are received. Part of the complex nature of the
+XEmacs event collection process involves converting from the
+operating-system events into the proper Emacs events--there may not be
+a one-to-one correspondence.
+
+ Emacs events are documented in `events.h'; I'll discuss them later.
+
+\1f
+File: internals.info, Node: Main Loop, Next: Specifics of the Event Gathering Mechanism, Prev: Introduction to Events, Up: Events and the Event Loop
+
+Main Loop
+=========
+
+The "command loop" is the top-level loop that the editor is always
+running. It loops endlessly, calling `next-event' to retrieve an event
+and `dispatch-event' to execute it. `dispatch-event' does the
+appropriate thing with non-user events (process, timeout, magic, eval,
+mouse motion); this involves calling a Lisp handler function, redrawing
+a newly-exposed part of a frame, reading subprocess output, etc. For
+user events, `dispatch-event' looks up the event in relevant keymaps or
+menubars; when a full key sequence or menubar selection is reached, the
+appropriate function is executed. `dispatch-event' may have to keep
+state across calls; this is done in the "command-builder" structure
+associated with each console (remember, there's usually only one
+console), and the engine that looks up keystrokes and constructs full
+key sequences is called the "command builder". This is documented
+elsewhere.
+
+ The guts of the command loop are in `command_loop_1()'. This
+function doesn't catch errors, though--that's the job of
+`command_loop_2()', which is a condition-case (i.e. error-trapping)
+wrapper around `command_loop_1()'. `command_loop_1()' never returns,
+but may get thrown out of.
+
+ When an error occurs, `cmd_error()' is called, which usually invokes
+the Lisp error handler in `command-error'; however, a default error
+handler is provided if `command-error' is `nil' (e.g. during startup).
+The purpose of the error handler is simply to display the error message
+and do associated cleanup; it does not need to throw anywhere. When
+the error handler finishes, the condition-case in `command_loop_2()'
+will finish and `command_loop_2()' will reinvoke `command_loop_1()'.
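+
+ The relationship between the two functions can be modeled with
+`setjmp()' and `longjmp()'. This is only an analogy under stated
+assumptions: the real code uses the Lisp condition-case machinery, the
+"events" here are plain integers with negative values standing for
+errors, and the toy loop returns when its input runs out, which the
+real command loop never does. All `toy_' names are hypothetical.
+
```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

jmp_buf toy_error_handler;      /* established by toy_command_loop_2() */
const int *toy_events;          /* pending "events"; negative means error */
size_t toy_event_count, toy_event_pos;
int toy_commands_run, toy_errors_reported;

/* Inner loop: no error handling of its own. */
void toy_command_loop_1(void)
{
  while (toy_event_pos < toy_event_count) {
    int event = toy_events[toy_event_pos++];
    if (event < 0)
      longjmp(toy_error_handler, 1);   /* "signal an error" */
    toy_commands_run++;                /* "execute the command" */
  }
}

/* Error-trapping wrapper: report the error, then restart the inner loop. */
void toy_command_loop_2(void)
{
  assert(toy_events != NULL || toy_event_count == 0);
  for (;;) {
    if (setjmp(toy_error_handler) == 0) {
      toy_command_loop_1();
      return;                  /* toy only: input exhausted */
    }
    toy_errors_reported++;     /* the "error handler" ran; loop again */
  }
}
```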
+
+ `command_loop_2()' is invoked from three places: from
+`initial_command_loop()' (called from `main()' at the end of internal
+initialization), from the Lisp function `recursive-edit', and from
+`call_command_loop()'.
+
+ `call_command_loop()' is called when a macro is started and when the
+minibuffer is entered; normal termination of the macro or minibuffer
+causes a throw out of the recursive command loop. (The throw goes to
+`execute-kbd-macro' for macros and `exit' for minibuffers. Note also
+that the low-level minibuffer-entering function,
+`read-minibuffer-internal', provides its own error handling and does
+not need `command_loop_2()''s error encapsulation; so it tells
+`call_command_loop()' to invoke `command_loop_1()' directly.)
+
+ Note that both read-minibuffer-internal and recursive-edit set up a
+catch for `exit'; this is why `abort-recursive-edit', which throws to
+this catch, exits out of either one.
+
+ `initial_command_loop()', called from `main()', sets up a catch for
+`top-level' when invoking `command_loop_2()', allowing functions to
+throw all the way to the top level if they really need to. Before
+invoking `command_loop_2()', `initial_command_loop()' calls
+`top_level_1()', which handles all of the startup stuff (creating the
+initial frame, handling the command-line options, loading the user's
+`.emacs' file, etc.). The function that actually does this is in Lisp
+and is pointed to by the variable `top-level'; normally this function is
+`normal-top-level'. `top_level_1()' is just an error-handling wrapper
+similar to `command_loop_2()'. Note also that `initial_command_loop()'
+sets up a catch for `top-level' when invoking `top_level_1()', just
+like when it invokes `command_loop_2()'.
+
+\1f
+File: internals.info, Node: Specifics of the Event Gathering Mechanism, Next: Specifics About the Emacs Event, Prev: Main Loop, Up: Events and the Event Loop
+
+Specifics of the Event Gathering Mechanism
+==========================================
+
+Here is an approximate diagram of the collection processes at work in
+XEmacs, under TTY's (TTY's are simpler than X so we'll look at this
+first):
+
+ asynch. asynch. asynch. asynch. [Collectors in
+ kbd events kbd events process process the OS]
+ | | output output
+ | | | |
+ | | | | SIGINT, [signal handlers
+ | | | | SIGQUIT, in XEmacs]
+ V V V V SIGWINCH,
+ file file file file SIGALRM
+ desc. desc. desc. desc. |
+ (TTY) (TTY) (pipe) (pipe) |
+ | | | | fake timeouts
+ | | | | file |
+ | | | | desc. |
+ | | | | (pipe) |
+ | | | | | |
+ | | | | | |
+ | | | | | |
+ V V V V V V
+ ------>-----------<----------------<----------------
+ |
+ |
+ | [collected using select() in emacs_tty_next_event()
+ | and converted to the appropriate Emacs event]
+ |
+ |
+ V (above this line is TTY-specific)
+ Emacs -----------------------------------------------
+ event (below this line is the generic event mechanism)
+ |
+ |
+ was there if not, call
+ a SIGINT? emacs_tty_next_event()
+ | |
+ | |
+ | |
+ V V
+ --->------<----
+ |
+ | [collected in event_stream_next_event();
+ | SIGINT is converted using maybe_read_quit_event()]
+ V
+ Emacs
+ event
+ |
+ \---->------>----- maybe_kbd_translate() ---->---\
+ |
+ |
+ |
+ command event queue |
+ if not from command
+ (contains events that were event queue, call
+ read earlier but not processed, event_stream_next_event()
+ typically when waiting in a |
+ sit-for, sleep-for, etc. for |
+ a particular event to be received) |
+ | |
+ | |
+ V V
+ ---->------------------------------------<----
+ |
+ | [collected in
+ | next_event_internal()]
+ |
+ unread- unread- event from |
+ command- command- keyboard else, call
+ events event macro next_event_internal()
+ | | | |
+ | | | |
+ | | | |
+ V V V V
+ --------->----------------------<------------
+ |
+ | [collected in `next-event', which may loop
+ | more than once if the event it gets is on
+ | a dead frame, device, etc.]
+ |
+ |
+ V
+ feed into top-level event loop,
+ which repeatedly calls `next-event'
+ and then dispatches the event
+ using `dispatch-event'
+
+ Notice the separation between TTY-specific and generic event
+mechanism. When using the Xt-based event loop, the TTY-specific stuff
+is replaced but the rest stays the same.
+
+ It's also important to realize that only one kind of
+system-specific event loop can be operating at a time, and it must be
+able to receive all kinds of events simultaneously. For the two existing
+event loops (implemented in `event-tty.c' and `event-Xt.c',
+respectively), the TTY event loop _only_ handles TTY consoles, while
+the Xt event loop handles _both_ TTY and X consoles. This situation is
+different from all of the output handlers, where you simply have one
+per console type.
+
+ Here's the Xt Event Loop Diagram (notice that below a certain point,
+it's the same as the above diagram):
+
+ asynch. asynch. asynch. asynch. [Collectors in
+ kbd kbd process process the OS]
+ events events output output
+ | | | |
+ | | | | asynch. asynch. [Collectors in the
+ | | | | X X OS and X Window System]
+ | | | | events events
+ | | | | | |
+ | | | | | |
+ | | | | | | SIGINT, [signal handlers
+ | | | | | | SIGQUIT, in XEmacs]
+ | | | | | | SIGWINCH,
+ | | | | | | SIGALRM
+ | | | | | | |
+ | | | | | | |
+ | | | | | | | timeouts
+ | | | | | | | |
+ | | | | | | | |
+ | | | | | | V |
+ V V V V V V fake |
+ file file file file file file file |
+ desc. desc. desc. desc. desc. desc. desc. |
+ (TTY) (TTY) (pipe) (pipe) (socket) (socket) (pipe) |
+ | | | | | | | |
+ | | | | | | | |
+ | | | | | | | |
+ V V V V V V V V
+ --->----------------------------------------<---------<------
+ | | |
+ | | |[collected using select() in
+ | | | _XtWaitForSomething(), called
+ | | | from XtAppProcessEvent(), called
+ | | | in emacs_Xt_next_event();
+ | | | dispatched to various callbacks]
+ | | |
+ | | |
+ emacs_Xt_ p_s_callback(), | [popup_selection_callback]
+ event_handler() x_u_v_s_callback(),| [x_update_vertical_scrollbar_
+ | x_u_h_s_callback(),| callback]
+ | search_callback() | [x_update_horizontal_scrollbar_
+ | | | callback]
+ | | |
+ | | |
+ enqueue_Xt_ signal_special_ |
+ dispatch_event() Xt_user_event() |
+ [maybe multiple | |
+ times, maybe 0 | |
+ times] | |
+ | enqueue_Xt_ |
+ | dispatch_event() |
+ | | |
+ | | |
+ V V |
+ -->----------<-- |
+ | |
+ | |
+ dispatch Xt_what_callback()
+ event sets flags
+ queue |
+ | |
+ | |
+ | |
+ | |
+ ---->-----------<--------
+ |
+ |
+ | [collected and converted as appropriate in
+ | emacs_Xt_next_event()]
+ |
+ |
+ V (above this line is Xt-specific)
+ Emacs ------------------------------------------------
+ event (below this line is the generic event mechanism)
+ |
+ |
+ was there if not, call
+ a SIGINT? emacs_Xt_next_event()
+ | |
+ | |
+ | |
+ V V
+ --->-------<----
+ |
+ | [collected in event_stream_next_event();
+ | SIGINT is converted using maybe_read_quit_event()]
+ V
+ Emacs
+ event
+ |
+ \---->------>----- maybe_kbd_translate() -->-----\
+ |
+ |
+ |
+ command event queue |
+ if not from command
+ (contains events that were event queue, call
+ read earlier but not processed, event_stream_next_event()
+ typically when waiting in a |
+ sit-for, sleep-for, etc. for |
+ a particular event to be received) |
+ | |
+ | |
+ V V
+ ---->----------------------------------<------
+ |
+ | [collected in
+ | next_event_internal()]
+ |
+ unread- unread- event from |
+ command- command- keyboard else, call
+ events event macro next_event_internal()
+ | | | |
+ | | | |
+ | | | |
+ V V V V
+ --------->----------------------<------------
+ |
+ | [collected in `next-event', which may loop
+ | more than once if the event it gets is on
+ | a dead frame, device, etc.]
+ |
+ |
+ V
+ feed into top-level event loop,
+ which repeatedly calls `next-event'
+ and then dispatches the event
+ using `dispatch-event'
+
+\1f
+File: internals.info, Node: Specifics About the Emacs Event, Next: The Event Stream Callback Routines, Prev: Specifics of the Event Gathering Mechanism, Up: Events and the Event Loop
+
+Specifics About the Emacs Event
+===============================
+
+\1f
+File: internals.info, Node: The Event Stream Callback Routines, Next: Other Event Loop Functions, Prev: Specifics About the Emacs Event, Up: Events and the Event Loop
+
+The Event Stream Callback Routines
+==================================
+
+\1f
+File: internals.info, Node: Other Event Loop Functions, Next: Converting Events, Prev: The Event Stream Callback Routines, Up: Events and the Event Loop
+
+Other Event Loop Functions
+==========================
+
+`detect_input_pending()' and `input-pending-p' look for input by
+calling `event_stream->event_pending_p' and looking in
+`[V]unread-command-event' and the `command_event_queue' (they do not
+check for an executing keyboard macro, though).
+
+ `discard-input' cancels any pending command events (and any keyboard
+macros currently executing), and puts all other pending events onto the
+`command_event_queue'. The code contains a comment about a "race
+condition" here, which is not a good sign.
+
+ `next-command-event' and `read-char' are higher-level interfaces to
+`next-event'. `next-command-event' gets the next "command" event (i.e.
+keypress, mouse event, menu selection, or scrollbar action), calling
+`dispatch-event' on any others. `read-char' calls `next-command-event'
+and uses `event_to_character()' to return the character equivalent.
+With the right kind of input method support, it is possible for
+`(read-char)' to return a Kanji character.
+
+\1f
+File: internals.info, Node: Converting Events, Next: Dispatching Events; The Command Builder, Prev: Other Event Loop Functions, Up: Events and the Event Loop
+
+Converting Events
+=================
+
+`character_to_event()', `event_to_character()', `event-to-character',
+and `character-to-event' convert between characters and keypress events
+corresponding to the characters. If the event was not a keypress,
+`event_to_character()' returns -1 and `event-to-character' returns
+`nil'. These functions convert between character representation and
+the split-up event representation (keysym plus mod keys).
+
+\1f
+File: internals.info, Node: Dispatching Events; The Command Builder, Prev: Converting Events, Up: Events and the Event Loop
+
+Dispatching Events; The Command Builder
+=======================================
+
+Not yet documented.
+
+\1f
+File: internals.info, Node: Evaluation; Stack Frames; Bindings, Next: Symbols and Variables, Prev: Events and the Event Loop, Up: Top
+
+Evaluation; Stack Frames; Bindings
+**********************************
+
+* Menu:
+
+* Evaluation::
+* Dynamic Binding; The specbinding Stack; Unwind-Protects::
+* Simple Special Forms::
+* Catch and Throw::
+
+\1f
+File: internals.info, Node: Evaluation, Next: Dynamic Binding; The specbinding Stack; Unwind-Protects, Up: Evaluation; Stack Frames; Bindings
+
+Evaluation
+==========
+
+`Feval()' evaluates the form (a Lisp object) that is passed to it.
+Note that evaluation is only non-trivial for two types of objects:
+symbols and conses. A symbol is evaluated simply by calling
+`symbol-value' on it and returning the value.
+
+ Evaluating a cons means calling a function. First, `eval' checks to
+see if garbage-collection is necessary, and calls `garbage_collect_1()'
+if so. It then increases the evaluation depth by 1 (`lisp_eval_depth',
+which is always less than `max_lisp_eval_depth') and adds an element to
+the linked list of `struct backtrace''s (`backtrace_list'). Each such
+structure contains a pointer to the function being called plus a list
+of the function's arguments. Originally these values are stored
+unevalled, and as they are evaluated, the backtrace structure is
+updated. Garbage collection pays attention to the objects pointed to
+in the backtrace structures (garbage collection might happen while a
+function is being called or while an argument is being evaluated, and
+there could easily be no other references to the arguments in the
+argument list; once an argument is evaluated, however, the unevalled
+version is not needed by eval, and so the backtrace structure is
+changed).
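+
+ The bookkeeping described above can be sketched in C roughly as
+follows. This is only an illustration: the `_sketch' names, the plain
+C field types, and the limit of 500 are assumptions made here, not the
+real definitions, which hold Lisp objects.
+
```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for `struct backtrace'. */
struct backtrace_sketch
{
  struct backtrace_sketch *next;  /* record of the enclosing call */
  const char *function;           /* function being called */
  int nargs;                      /* number of recorded arguments */
  int evalled;                    /* nonzero once args are evaluated */
};

static struct backtrace_sketch *backtrace_list_sketch = NULL;
static int eval_depth_sketch = 0;

/* Assumed limit, standing in for max_lisp_eval_depth. */
enum { MAX_EVAL_DEPTH_SKETCH = 500 };

/* Enter a call: bump the depth and link a record at the head of the
   list.  Returns 0 on success, or -1 if the depth would no longer be
   less than the limit (the real code signals an error instead). */
int
push_backtrace_sketch (struct backtrace_sketch *bt, const char *fn, int nargs)
{
  if (eval_depth_sketch + 1 >= MAX_EVAL_DEPTH_SKETCH)
    return -1;
  eval_depth_sketch++;
  bt->next = backtrace_list_sketch;
  bt->function = fn;
  bt->nargs = nargs;
  bt->evalled = 0;              /* arguments start out unevalled */
  backtrace_list_sketch = bt;
  return 0;
}

/* Leave a call: unlink the current record and reduce the depth. */
void
pop_backtrace_sketch (void)
{
  backtrace_list_sketch = backtrace_list_sketch->next;
  eval_depth_sketch--;
}
```
+
+ Each record lives in the C stack frame of the call it describes, so
+pushing and popping never allocates memory.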
+
+ At this point, the function to be called is determined by looking at
+the car of the cons (if this is a symbol, its function definition is
+retrieved and the process repeated). The function should then consist
+of either a `Lisp_Subr' (built-in function written in C), a
+`Lisp_Compiled_Function' object, or a cons whose car is one of the
+symbols `autoload', `macro' or `lambda'.
+
+ If the function is a `Lisp_Subr', the Lisp object points to a
+`struct Lisp_Subr' (created by `DEFUN()'), which contains a pointer to
+the C function, a minimum and maximum number of arguments (or possibly
+the special constants `MANY' or `UNEVALLED'), a pointer to the symbol
+referring to that subr, and a couple of other things. If the subr
+wants its arguments `UNEVALLED', they are passed raw as a list.
+Otherwise, an array of evaluated arguments is created and put into the
+backtrace structure, and either passed whole (`MANY') or each argument
+is passed as a C argument.
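+
+ A minimal sketch of that three-way choice, in C. The numeric
+encodings chosen for `MANY' and `UNEVALLED', the `_sketch' names, and
+the wrong-argument-count result are assumptions for illustration; the
+text above does not specify them.
+
```c
/* Assumed encodings for the special argument-count constants. */
#define MANY_SKETCH      (-1)
#define UNEVALLED_SKETCH (-2)

/* Hypothetical cut-down version of the argument-count fields. */
struct subr_sketch
{
  int min_args;
  int max_args;   /* a count, or MANY_SKETCH / UNEVALLED_SKETCH */
};

enum call_style_sketch
{
  PASS_RAW_LIST,    /* UNEVALLED: hand over the unevaluated list */
  PASS_ARRAY,       /* MANY: hand over the array of evaluated args */
  PASS_SEPARATE,    /* fixed count: one C argument per Lisp argument */
  WRONG_ARG_COUNT   /* assumed error case */
};

/* Decide how a subr's arguments are delivered, given how many
   arguments appear in the call. */
enum call_style_sketch
choose_call_style_sketch (const struct subr_sketch *subr, int nargs)
{
  if (subr->max_args == UNEVALLED_SKETCH)
    return PASS_RAW_LIST;
  if (nargs < subr->min_args)
    return WRONG_ARG_COUNT;
  if (subr->max_args == MANY_SKETCH)
    return PASS_ARRAY;
  if (nargs > subr->max_args)
    return WRONG_ARG_COUNT;
  return PASS_SEPARATE;
}
```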
+
+ If the function is a `Lisp_Compiled_Function',
+`funcall_compiled_function()' is called. If the function is a lambda
+expression, `funcall_lambda()' is called. If the function is a macro,
+[..... fill in] is done. If the function is an autoload,
+`do_autoload()' is called to load the definition and then eval starts
+over [explain this more].
+
+ When `Feval()' exits, the evaluation depth is reduced by one, the
+debugger is called if appropriate, and the current backtrace structure
+is removed from the list.
+
+ Both `funcall_compiled_function()' and `funcall_lambda()' need to go
+through the list of formal parameters to the function and bind them to
+the actual arguments, checking for `&rest' and `&optional' symbols in
+the formal parameters and making sure the number of actual arguments is
+correct. `funcall_compiled_function()' can do this a little more
+efficiently, since the formal parameter list can be checked for sanity
+when the compiled function object is created.
+
+ `funcall_lambda()' simply calls `Fprogn()' to execute the body of the
+lambda expression.
+
+ `funcall_compiled_function()' calls the real byte-code interpreter
+`execute_optimized_program()' on the byte-code instructions, which are
+converted into an internal form for faster execution.
+
+ When a compiled function is executed for the first time by
+`funcall_compiled_function()', or during the dump phase of building
+XEmacs, the byte-code instructions are converted from a `Lisp_String'
+(which is inefficient to access, especially in the presence of MULE)
+into a `Lisp_Opaque' object containing an array of unsigned char, which
+can be directly executed by the byte-code interpreter. At this time
+the byte code is also analyzed for validity and transformed into a more
+optimized form, so that `execute_optimized_program()' can really fly.
+
+ Here are some of the optimizations performed by the internal
+byte-code transformer:
+ 1. References to the `constants' array are checked for out-of-range
+ indices, so that the byte interpreter doesn't have to.
+
+ 2. References to the `constants' array that will be used as a Lisp
+ variable are checked for being correct non-constant (i.e. not `t',
+ `nil', or `keywordp') symbols, so that the byte interpreter
+ doesn't have to.
+
+ 3. The maximum number of variable bindings in the byte-code is
+ pre-computed, so that space on the `specpdl' stack can be
+ pre-reserved once for the whole function execution.
+
+ 4. All byte-code jumps are relative to the current program counter
+ instead of the start of the program, thereby saving a register.
+
+ 5. One-byte relative jumps are converted from the byte-code form of
+ unsigned chars offset by 127 to machine-friendly signed chars.
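+
+ Two of the items above are small enough to show in miniature. The
+following C sketch assumes only what the list states (an offset of 127
+for one-byte relative jumps, and a one-time range check on constants
+indices); the function names are hypothetical and the real transformer
+does considerably more.
+
```c
/* Item 5: decode a one-byte relative jump operand, assuming the
   byte-code form is an unsigned char offset by 127.  The result is
   the signed distance to jump. */
int
decode_relative_jump_sketch (unsigned char encoded)
{
  return (int) encoded - 127;
}

/* Item 1: validate a constants-array index once, ahead of time, so
   the interpreter loop does not need to.  Returns 1 if in range. */
int
constant_index_ok_sketch (int index, int constants_length)
{
  return index >= 0 && index < constants_length;
}
```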
+
+ Of course, this transformation of the `instructions' should not be
+visible to the user, so `Fcompiled_function_instructions()' needs to
+know how to convert the optimized opaque object back into a Lisp string
+that is identical to the original string from the `.elc' file.
+(Actually, the resulting string may (rarely) contain slightly
+different, yet equivalent, byte code.)
+
+ `Ffuncall()' implements Lisp `funcall'. `(funcall fun x1 x2 x3
+...)' is equivalent to `(eval (list fun (quote x1) (quote x2) (quote
+x3) ...))'. `Ffuncall()' contains its own code to do the evaluation,
+however, and is very similar to `Feval()'.
+
+ From the performance point of view, it is worth knowing that most of
+the time in Lisp evaluation is spent executing `Lisp_Subr' and
+`Lisp_Compiled_Function' objects via `Ffuncall()' (not `Feval()').
+
+ `Fapply()' implements Lisp `apply', which is very similar to
+`funcall' except that if the last argument is a list, the result is the
+same as if each of the arguments in the list had been passed separately.
+`Fapply()' does some business to expand the last argument if it's a
+list, then calls `Ffuncall()' to do the work.
+
+ `apply1()', `call0()', `call1()', `call2()', and `call3()' call a
+function, passing it the argument(s) given (the arguments are given as
+separate C arguments rather than being passed as an array). `apply1()'
+uses `Fapply()' while the others use `Ffuncall()' to do the real work.
+
+\1f
+File: internals.info, Node: Dynamic Binding; The specbinding Stack; Unwind-Protects, Next: Simple Special Forms, Prev: Evaluation, Up: Evaluation; Stack Frames; Bindings
+
+Dynamic Binding; The specbinding Stack; Unwind-Protects
+=======================================================
+
+ struct specbinding
+ {
+   Lisp_Object symbol;
+   Lisp_Object old_value;
+   Lisp_Object (*func) (Lisp_Object); /* for unwind-protect */
+ };
+
+ `struct specbinding' is used for local-variable bindings and
+unwind-protects. `specpdl' holds an array of `struct specbinding''s,
+`specpdl_ptr' points to the beginning of the free bindings in the
+array, `specpdl_size' specifies the total number of binding slots in
+the array, and `max_specpdl_size' specifies the maximum number of
+bindings the array can be expanded to hold. `grow_specpdl()' increases
+the size of the `specpdl' array, multiplying its size by 2 but never
+exceeding `max_specpdl_size' (except that if this number is less than
+400, it is first set to 400).
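+
+ The growth policy can be sketched as a small C function. This is an
+illustration under stated assumptions: the function name is
+hypothetical, and returning 0 when the array is already at its maximum
+stands in for whatever error the real `grow_specpdl()' reports.
+
```c
/* Compute the new size for a specpdl-like array of `size' slots.
   `*max_size' plays the role of max_specpdl_size and is raised to
   400 first if it is smaller.  Returns the new size, or 0 if the
   array cannot grow any further (assumed error convention). */
int
grown_specpdl_size_sketch (int size, int *max_size)
{
  if (*max_size < 400)
    *max_size = 400;
  if (size >= *max_size)
    return 0;
  size *= 2;                    /* double ...                  */
  if (size > *max_size)
    size = *max_size;           /* ... but never past the cap  */
  return size;
}
```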
+
+ `specbind()' binds a symbol to a value and is used for local
+variables and `let' forms. The symbol and its old value (which might
+be `Qunbound', indicating no prior value) are recorded in the specpdl
+array, and `specpdl_ptr' is advanced by 1.
+
+ `record_unwind_protect()' implements an "unwind-protect", which,
+when placed around a section of code, ensures that some specified
+cleanup routine will be executed even if the code exits abnormally
+(e.g. through a `throw' or quit). `record_unwind_protect()' simply
+adds a new specbinding to the `specpdl' array and stores the
+appropriate information in it. The cleanup routine can either be a C
+function, which is stored in the `func' field, or a `progn' form, which
+is stored in the `old_value' field.
+
+ `unbind_to()' removes specbindings from the `specpdl' array until
+the specified position is reached. Each specbinding can be one of
+three types:
+
+ 1. an unwind-protect with a C cleanup function (`func' is not 0, and
+ `old_value' holds an argument to be passed to the function);
+
+ 2. an unwind-protect with a Lisp form (`func' is 0, `symbol' is
+ `nil', and `old_value' holds the form to be executed with
+ `Fprogn()'); or
+
+ 3. a local-variable binding (`func' is 0, `symbol' is not `nil', and
+ `old_value' holds the old value, which is stored as the symbol's
+ value).
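+
+ The three cases can be made concrete with a toy version of the
+stack. In this C sketch, integers stand in for Lisp objects, a null
+`symbol' pointer plays the role of `nil', the stack has a fixed size,
+and the Lisp-form case is left as a no-op because the sketch has no
+evaluator. All `_sketch' names are hypothetical.
+
```c
#include <stddef.h>

typedef int value_sketch;                 /* stand-in for Lisp_Object */
struct var_sketch { value_sketch value; };  /* stand-in for a symbol */

struct binding_sketch
{
  struct var_sketch *symbol;              /* NULL plays the role of nil */
  value_sketch old_value;
  void (*func) (value_sketch);            /* C cleanup, or 0 */
};

#define STACK_MAX_SKETCH 16
static struct binding_sketch stack_sketch[STACK_MAX_SKETCH];
static int depth_sketch = 0;              /* index of first free slot */
static int cleanup_total_sketch = 0;      /* observed by the sample cleanup */

/* Sample C cleanup routine used to demonstrate case 1. */
void
add_to_total_sketch (value_sketch v)
{
  cleanup_total_sketch += v;
}

/* Bind SYM to NEW_VALUE, remembering the old value (case 3). */
int
specbind_sketch (struct var_sketch *sym, value_sketch new_value)
{
  if (depth_sketch == STACK_MAX_SKETCH)
    return -1;
  stack_sketch[depth_sketch].symbol = sym;
  stack_sketch[depth_sketch].old_value = sym->value;
  stack_sketch[depth_sketch].func = 0;
  depth_sketch++;
  sym->value = new_value;
  return 0;
}

/* Record a C cleanup function and its argument (case 1). */
int
record_unwind_protect_sketch (void (*fn) (value_sketch), value_sketch arg)
{
  if (depth_sketch == STACK_MAX_SKETCH)
    return -1;
  stack_sketch[depth_sketch].symbol = NULL;
  stack_sketch[depth_sketch].old_value = arg;
  stack_sketch[depth_sketch].func = fn;
  depth_sketch++;
  return 0;
}

/* Pop entries until only COUNT remain, undoing each one. */
void
unbind_to_sketch (int count)
{
  while (depth_sketch > count)
    {
      struct binding_sketch *b = &stack_sketch[--depth_sketch];
      if (b->func)
        b->func (b->old_value);           /* case 1: C cleanup */
      else if (b->symbol == NULL)
        ;                                 /* case 2: would run the form */
      else
        b->symbol->value = b->old_value;  /* case 3: restore variable */
    }
}
```
+
+ Because entries are undone newest-first, nested bindings of the same
+variable are restored in the right order.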
+
+\1f
+File: internals.info, Node: Simple Special Forms, Next: Catch and Throw, Prev: Dynamic Binding; The specbinding Stack; Unwind-Protects, Up: Evaluation; Stack Frames; Bindings
+
+Simple Special Forms
+====================
+
+`or', `and', `if', `cond', `progn', `prog1', `prog2', `setq', `quote',
+`function', `let*', `let', `while'
+
+ All of these are very simple and work as expected, calling `Feval()'
+or `Fprogn()' as necessary and (in the case of `let' and `let*') using
+`specbind()' to create bindings and `unbind_to()' to undo the bindings
+when finished.
+
+ Note that, with the exception of `Fprogn()', these functions are
+typically called in real life only in interpreted code, since the byte
+compiler knows how to convert calls to these functions directly into
+byte code.
+
+\1f
+File: internals.info, Node: Catch and Throw, Prev: Simple Special Forms, Up: Evaluation; Stack Frames; Bindings
+
+Catch and Throw
+===============
+
+ struct catchtag
+ {
+   Lisp_Object tag;
+   Lisp_Object val;
+   struct catchtag *next;
+   struct gcpro *gcpro;
+   jmp_buf jmp;
+   struct backtrace *backlist;
+   int lisp_eval_depth;
+   int pdlcount;
+ };
+
+ `catch' is a Lisp special form that places a catch around a body of
+code. A catch is a means of non-local exit from the code. When a catch
+is created, a tag is specified; executing a `throw' to this tag exits
+from the body of code caught with this tag, and the value of the
+`catch' form is the value given in the call to `throw'. If there is no
+such call, the code is executed normally.
+
+ Information pertaining to a catch is held in a `struct catchtag',
+which is placed at the head of a linked list pointed to by `catchlist'.
+`internal_catch()' is passed a C function to call (`Fprogn()' when
+Lisp `catch' is called) and arguments to give it, and places a catch
+around the function. Each `struct catchtag' is held in the stack frame
+of the `internal_catch()' instance that created the catch.
+
+ `internal_catch()' is fairly straightforward. It stores into the
+`struct catchtag' the tag name and the current values of
+`backtrace_list', `lisp_eval_depth', `gcprolist', and the offset into
+the `specpdl' array, sets a jump point with `_setjmp()' (storing the
+jump point into the `struct catchtag'), and calls the function.
+Control will return to `internal_catch()' either when the function
+exits normally or through a `_longjmp()' to this jump point. In the
+latter case, `throw' will store the value to be returned into the
+`struct catchtag' before jumping. When it's done, `internal_catch()'
+removes the `struct catchtag' from the catchlist and returns the proper
+value.
+
+ `Fthrow()' goes up through the catchlist until it finds one with a
+matching tag. It then calls `unbind_catch()' to restore everything to
+what it was when the appropriate catch was set, stores the return value
+in the `struct catchtag', and jumps (with `_longjmp()') to its jump
+point.
+
+ `unbind_catch()' removes all catches from the catchlist until it
+finds the correct one. Some of the catches might have been placed for
+error-trapping, and if so, the appropriate entries on the handlerlist
+must be removed (see "errors"). `unbind_catch()' also restores the
+values of `gcprolist', `backtrace_list', and `lisp_eval_depth', and
+calls `unbind_to()' to undo any specbindings created since the catch.
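+
+ The control flow of `internal_catch()' and `Fthrow()' can be
+sketched with a stripped-down catch list. This is an illustration
+only: tags and values are plain integers, the standard `setjmp()' and
+`longjmp()' are used in place of `_setjmp()' and `_longjmp()', the
+saved GC, backtrace and specpdl state is omitted, and returning -1
+from a throw with no matching catch is an assumption. All `_sketch'
+names are hypothetical.
+
```c
#include <setjmp.h>
#include <stddef.h>

struct catchtag_sketch
{
  int tag;
  volatile int val;              /* written by the throw before jumping */
  struct catchtag_sketch *next;
  jmp_buf jmp;
};

static struct catchtag_sketch *catchlist_sketch = NULL;

/* Place a catch for TAG around the call FUNC (ARG).  The catchtag
   lives in this function's own stack frame. */
int
internal_catch_sketch (int tag, int (*func) (int), int arg)
{
  struct catchtag_sketch c;
  c.tag = tag;
  c.val = 0;
  c.next = catchlist_sketch;
  catchlist_sketch = &c;
  if (setjmp (c.jmp) == 0)
    c.val = func (arg);          /* normal exit */
  /* otherwise we arrived via longjmp and the throw filled in c.val */
  catchlist_sketch = c.next;     /* remove our catch */
  return c.val;
}

/* Find the innermost catch for TAG, discard the catches inside it,
   store VALUE, and jump.  Returns -1 only if no catch matches. */
int
throw_sketch (int tag, int value)
{
  struct catchtag_sketch *c;
  for (c = catchlist_sketch; c != NULL; c = c->next)
    if (c->tag == tag)
      {
        catchlist_sketch = c;    /* the unbind_catch() step, in miniature */
        c->val = value;
        longjmp (c->jmp, 1);
      }
  return -1;
}

/* Sample bodies used to exercise the sketch. */
int plain_body_sketch (int arg) { return arg + 1; }

int
throwing_body_sketch (int arg)
{
  throw_sketch (7, arg * 2);
  return -1;                     /* reached only if tag 7 is uncaught */
}

int
nested_body_sketch (int arg)
{
  /* An inner catch for a different tag; a throw to tag 7 skips it. */
  return internal_catch_sketch (8, throwing_body_sketch, arg) + 1000;
}
```
+
+ In the nested case the throw to tag 7 unwinds past the inner catch
+for tag 8, so the code following the inner `internal_catch_sketch()'
+call never runs.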
+
+\1f
+File: internals.info, Node: Symbols and Variables, Next: Buffers and Textual Representation, Prev: Evaluation; Stack Frames; Bindings, Up: Top
+
+Symbols and Variables
+*********************
+
+* Menu:
+
+* Introduction to Symbols::
+* Obarrays::
+* Symbol Values::
+