+++ /dev/null
-This is Info file ../../info/internals.info, produced by Makeinfo
-version 1.68 from the input file internals.texi.
-
-INFO-DIR-SECTION XEmacs Editor
-START-INFO-DIR-ENTRY
-* Internals: (internals). XEmacs Internals Manual.
-END-INFO-DIR-ENTRY
-
- Copyright (C) 1992 - 1996 Ben Wing. Copyright (C) 1996, 1997 Sun
-Microsystems. Copyright (C) 1994 - 1998 Free Software Foundation.
-Copyright (C) 1994, 1995 Board of Trustees, University of Illinois.
-
- Permission is granted to make and distribute verbatim copies of this
-manual provided the copyright notice and this permission notice are
-preserved on all copies.
-
- Permission is granted to copy and distribute modified versions of
-this manual under the conditions for verbatim copying, provided that the
-entire resulting derived work is distributed under the terms of a
-permission notice identical to this one.
-
- Permission is granted to copy and distribute translations of this
-manual into another language, under the above conditions for modified
-versions, except that this permission notice may be stated in a
-translation approved by the Foundation.
-
- Permission is granted to copy and distribute modified versions of
-this manual under the conditions for verbatim copying, provided also
-that the section entitled "GNU General Public License" is included
-exactly as in the original, and provided that the entire resulting
-derived work is distributed under the terms of a permission notice
-identical to this one.
-
- Permission is granted to copy and distribute translations of this
-manual into another language, under the above conditions for modified
-versions, except that the section entitled "GNU General Public License"
-may be included in a translation approved by the Free Software
-Foundation instead of in the original English.
-
-\1f
-File: internals.info, Node: The XEmacs Object System (Abstractly Speaking), Next: How Lisp Objects Are Represented in C, Prev: XEmacs From the Inside, Up: Top
-
-The XEmacs Object System (Abstractly Speaking)
-**********************************************
-
- At the heart of the Lisp interpreter is its management of objects.
-XEmacs Lisp contains many built-in objects, some of which are simple
-and others of which can be very complex; and some of which are very
-common, and others of which are rarely used or are only used
-internally. (Since the Lisp allocation system, with its automatic
-reclamation of unused storage, is so much more convenient than
-`malloc()' and `free()', the C code makes extensive use of it in its
-internal operations.)
-
- The basic Lisp objects are
-
-`integer'
- 28 or 31 bits of precision, or 60 or 63 bits on 64-bit machines;
- the reason for this is described below when the internal Lisp
- object representation is described.
-
-`float'
- Same precision as a double in C.
-
-`cons'
- A simple container for two Lisp objects, used to implement lists
- and most other data structures in Lisp.
-
-`char'
- An object representing a single character of text; chars behave
- like integers in many ways but are logically considered text
- rather than numbers and have a different read syntax. (The read
- syntax for a char contains the char itself or some textual
- encoding of it - for example, a Japanese Kanji character might be
- encoded as `^[$(B#&^[(B' using the ISO-2022 encoding standard -
- rather than the numerical representation of the char; this way, if
- the mapping between chars and integers changes, which is quite
- possible for Kanji characters and other extended characters, the
- same character will still be created. Note that some primitives
- confuse chars and integers. The worst culprit is `eq', which
- makes a special exception and considers a char to be `eq' to its
- integer equivalent, even though in no other case are objects of two
- different types `eq'. The reason for this monstrosity is
- compatibility with existing code; the separation of char from
- integer came fairly recently.)
-
-`symbol'
- An object that contains Lisp objects and is referred to by name;
- symbols are used to implement variables and named functions and to
- provide the equivalent of preprocessor constants in C.
-
-`vector'
- A one-dimensional array of Lisp objects providing constant-time
- access to any of the objects; access to an arbitrary object in a
- vector is faster than for lists, but the operations that can be
- done on a vector are more limited.
-
-`string'
- Self-explanatory; behaves much like a vector of chars but has a
- different read syntax and is stored and manipulated more compactly.
-
-`bit-vector'
- A vector of bits; similar to a string in spirit.
-
-`compiled-function'
- An object containing compiled Lisp code, known as "byte code".
-
-`subr'
- A Lisp primitive, i.e. a Lisp-callable function implemented in C.
-
- Note that there is no basic "function" type, as in more powerful
-versions of Lisp (where it's called a "closure"). XEmacs Lisp does not
-provide the closure semantics implemented by Common Lisp and Scheme.
-The guts of a function in XEmacs Lisp are represented in one of four
-ways: a symbol specifying another function (when one function is an
-alias for another), a list (whose first element must be the symbol
-`lambda') containing the function's source code, a compiled-function
-object, or a subr object. (In other words, given a symbol specifying
-the name of a function, calling `symbol-function' to retrieve the
-contents of the symbol's function cell will return one of these types
-of objects.)
-
- XEmacs Lisp also contains numerous specialized objects used to
-implement the editor:
-
-`buffer'
- Stores text like a string, but is optimized for insertion and
- deletion and has certain other properties that can be set.
-
-`frame'
- An object with various properties whose displayable representation
- is a "window" in window-system parlance.
-
-`window'
- A section of a frame that displays the contents of a buffer; often
- called a "pane" in window-system parlance.
-
-`window-configuration'
- An object that represents a saved configuration of windows in a
- frame.
-
-`device'
- An object representing a screen on which frames can be displayed;
- equivalent to a "display" in the X Window System and a "TTY" in
- character mode.
-
-`face'
- An object specifying the appearance of text or graphics; it has
- properties such as font, foreground color, and background color.
-
-`marker'
- An object that refers to a particular position in a buffer and
- moves around as text is inserted and deleted to stay in the same
- relative position to the text around it.
-
-`extent'
- Similar to a marker but covers a range of text in a buffer; can
- also specify properties of the text, such as a face in which the
- text is to be displayed, whether the text is invisible or
- unmodifiable, etc.
-
-`event'
- Generated by calling `next-event' and contains information
- describing a particular event happening in the system, such as the
- user pressing a key or a process terminating.
-
-`keymap'
- An object that maps from events (described using lists, vectors,
- and symbols rather than with an event object because the mapping
- is for classes of events, rather than individual events) to
- functions to execute or other events to recursively look up; the
- functions are described by name, using a symbol, or using lists to
- specify the function's code.
-
-`glyph'
- An object that describes the appearance of an image (e.g. pixmap)
- on the screen; glyphs can be attached to the beginning or end of
- extents and in some future version of XEmacs will be able to be
- inserted directly into a buffer.
-
-`process'
- An object that describes a connection to an externally-running
- process.
-
- There are some other, less-commonly-encountered general objects:
-
-`hash-table'
- An object that maps from an arbitrary Lisp object to another
- arbitrary Lisp object, using hashing for fast lookup.
-
-`obarray'
- A limited form of hash-table that maps from strings to symbols;
- obarrays are used to look up a symbol given its name and are not
- actually their own object type but are kludgily represented using
- vectors with hidden fields (this representation derives from GNU
- Emacs).
-
-`specifier'
- A complex object used to specify the value of a display property; a
- default value is given and different values can be specified for
- particular frames, buffers, windows, devices, or classes of device.
-
-`char-table'
- An object that maps from chars or classes of chars to arbitrary
- Lisp objects; internally char tables use a complex nested-vector
- representation that is optimized to the way characters are
- represented as integers.
-
-`range-table'
- An object that maps from ranges of integers to arbitrary Lisp
- objects.
-
- And some strange special-purpose objects:
-
-`charset'
-`coding-system'
- Objects used when MULE, or multi-lingual/Asian-language, support is
- enabled.
-
-`color-instance'
-`font-instance'
-`image-instance'
- Objects that encapsulate a window-system resource; instances are
- mostly used internally but are exposed on the Lisp level for
- cleanness of the specifier model and because it's occasionally
- useful for Lisp programs to create or query the properties of
- instances.
-
-`subwindow'
- An object that encapsulates a "subwindow" resource, i.e. a
- window-system child window that is drawn into by an external
- process; this object should be integrated into the glyph system
- but isn't yet, and may change form when this is done.
-
-`tooltalk-message'
-`tooltalk-pattern'
- Objects that represent resources used in the ToolTalk interprocess
- communication protocol.
-
-`toolbar-button'
- An object used in conjunction with the toolbar.
-
- And objects that are only used internally:
-
-`opaque'
- A generic object for encapsulating arbitrary memory; this allows
- you the generality of `malloc()' and the convenience of the Lisp
- object system.
-
-`lstream'
- A buffering I/O stream, used to provide a unified interface to
- anything that can accept output or provide input, such as a file
- descriptor, a stdio stream, a chunk of memory, a Lisp buffer, a
- Lisp string, etc.; it's a Lisp object to make its memory
- management more convenient.
-
-`char-table-entry'
- Subsidiary objects in the internal char-table representation.
-
-`extent-auxiliary'
-`menubar-data'
-`toolbar-data'
- Various special-purpose objects that are basically just used to
- encapsulate memory for particular subsystems, similar to the more
- general "opaque" object.
-
-`symbol-value-forward'
-`symbol-value-buffer-local'
-`symbol-value-varalias'
-`symbol-value-lisp-magic'
- Special internal-only objects that are placed in the value cell of
- a symbol to indicate that there is something special with this
- variable - e.g. it has no value, it mirrors another variable, or
- it mirrors some C variable; there is really only one kind of
- object, called a "symbol-value-magic", but it is sort-of halfway
- kludged into semi-different object types.
-
- Some types of objects are "permanent", meaning that once created,
-they do not disappear until explicitly destroyed, using a function such
-as `delete-buffer', `delete-window', `delete-frame', etc. Others will
-disappear once they are no longer used, through the garbage collection
-mechanism. Buffers, frames, windows, devices, and processes are among
-the objects that are permanent. Note that some objects can go both
-ways: Faces can be created either way; extents are normally permanent,
-but detached extents (extents not referring to any text, as happens to
-some extents when the text they are referring to is deleted) are
-temporary. Note that some permanent objects, such as faces and coding
-systems, cannot be deleted. Note also that windows are unique in that
-they can be *undeleted* after having previously been deleted. (This
-happens as a result of restoring a window configuration.)
-
- Note that many types of objects have a "read syntax", i.e. a way of
-specifying an object of that type in Lisp code. When you load a Lisp
-file, or type in code to be evaluated, what really happens is that the
-function `read' is called, which reads some text and creates an object
-based on the syntax of that text; then `eval' is called, which possibly
-does something special; then this loop repeats until there's no more
-text to read. (`eval' only actually does something special with
-symbols, which causes the symbol's value to be returned, similar to
-referencing a variable; and with conses [i.e. lists], which cause a
-function invocation. All other values are returned unchanged.)
-
- The read syntax
-
- 17297
-
- converts to an integer whose value is 17297.
-
- 1.983e-4
-
- converts to a float whose value is 1.983e-4, or .0001983.
-
- ?b
-
- converts to a char that represents the lowercase letter b.
-
- ?^[$(B#&^[(B
-
- (where `^[' actually is an `ESC' character) converts to a particular
-Kanji character when using an ISO2022-based coding system for input.
-(To decode this goo: `ESC' begins an escape sequence; `ESC $ (' is a
-class of escape sequences meaning "switch to a 94x94 character set";
-`ESC $ ( B' means "switch to Japanese Kanji"; `#' and `&' collectively
-index into a 94-by-94 array of characters [subtract 33 from the ASCII
-value of each character to get the corresponding index]; `ESC (' is a
-class of escape sequences meaning "switch to a 94 character set"; `ESC
-(B' means "switch to US ASCII". It is a coincidence that the letter
-`B' is used to denote both Japanese Kanji and US ASCII. If the first
-`B' were replaced with an `A', you'd be requesting a Chinese Hanzi
-character from the GB2312 character set.)
-
- "foobar"
-
- converts to a string.
-
- foobar
-
- converts to a symbol whose name is `"foobar"'. This is done by
-looking up the string equivalent in the global variable `obarray',
-whose contents should be an obarray. If no symbol is found, a new
-symbol with the name `"foobar"' is automatically created and added to
-`obarray'; this process is called "interning" the symbol.
-
- (foo . bar)
-
- converts to a cons cell containing the symbols `foo' and `bar'.
-
- (1 a 2.5)
-
- converts to a three-element list containing the specified objects
-(note that a list is actually a set of nested conses; see the XEmacs
-Lisp Reference).
-
- [1 a 2.5]
-
- converts to a three-element vector containing the specified objects.
-
- #[... ... ... ...]
-
- converts to a compiled-function object (the actual contents are not
-shown since they are not relevant here; look at a file that ends with
-`.elc' for examples).
-
- #*01110110
-
- converts to a bit-vector.
-
- #s(hash-table ... ...)
-
- converts to a hash table (the actual contents are not shown).
-
- #s(range-table ... ...)
-
- converts to a range table (the actual contents are not shown).
-
- #s(char-table ... ...)
-
- converts to a char table (the actual contents are not shown).
-
- Note that the `#s()' syntax is the general syntax for structures,
-which are not really implemented in XEmacs Lisp but should be.
-
- When an object is printed out (using `print' or a related function),
-the read syntax is used, so that the same object can be read in again.
-
- The other objects do not have read syntaxes, usually because it does
-not really make sense to create them in this fashion (e.g. processes,
-where it doesn't make sense to have a subprocess created as a side
-effect of reading some Lisp code), or because they can't be created at
-all (e.g. subrs). Permanent objects, as a rule, do not have a read
-syntax; nor do most complex objects, which contain too much state to be
-easily initialized through a read syntax.
-
-\1f
-File: internals.info, Node: How Lisp Objects Are Represented in C, Next: Rules When Writing New C Code, Prev: The XEmacs Object System (Abstractly Speaking), Up: Top
-
-How Lisp Objects Are Represented in C
-*************************************
-
- Lisp objects are represented in C using a 32-bit or 64-bit machine
-word (depending on the processor; i.e. DEC Alphas use 64-bit Lisp
-objects and most other processors use 32-bit Lisp objects). The
-representation stuffs a pointer together with a tag, as follows:
-
- [ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ]
- [ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ]
-
- <---> ^ <------------------------------------------------------>
- tag | a pointer to a structure, or an integer
- |
- mark bit
-
- The tag describes the type of the Lisp object. For integers and
-chars, the lower 28 bits contain the value of the integer or char; for
-all others, the lower 28 bits contain a pointer. The mark bit is used
-during garbage-collection, and is always 0 when garbage collection is
-not happening. (The way that garbage collection works, basically, is
-that it loops over all places where Lisp objects could exist - this
-includes all global variables in C that contain Lisp objects [including
-`Vobarray', the C equivalent of `obarray'; through this, all Lisp
-variables will get marked], plus various other places - and recursively
-scans through the Lisp objects, marking each object it finds by setting
-the mark bit. Then it goes through the lists of all objects allocated,
-freeing the ones that are not marked and turning off the mark bit of
-the ones that are marked.)
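-
- The layout in the diagram above can be sketched in plain C. This is
-only an illustration, not the XEmacs source: the `sketch_' names are
-hypothetical, and the field positions (tag in bits 31-29, mark bit in
-bit 28, value in bits 27-0) are read off the diagram.
-
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field positions, taken from the diagram:
   bits 31-29 = tag, bit 28 = mark bit, bits 27-0 = value. */
#define SKETCH_VALUE_BITS 28
#define SKETCH_TAG_SHIFT  29
#define SKETCH_VALUE_MASK ((UINT32_C(1) << SKETCH_VALUE_BITS) - 1)
#define SKETCH_MARK_MASK  (UINT32_C(1) << SKETCH_VALUE_BITS)

typedef uint32_t sketch_object;   /* stands in for a 32-bit Lisp_Object */

/* Pack a 3-bit tag and a 28-bit value; the mark bit starts out as 0. */
static sketch_object sketch_make(uint32_t tag, uint32_t value)
{
  return ((tag & 7u) << SKETCH_TAG_SHIFT) | (value & SKETCH_VALUE_MASK);
}

static uint32_t sketch_tag(sketch_object o)   { return o >> SKETCH_TAG_SHIFT; }
static uint32_t sketch_value(sketch_object o) { return o & SKETCH_VALUE_MASK; }
static int sketch_marked(sketch_object o)     { return (o & SKETCH_MARK_MASK) != 0; }

/* The collector sets the mark bit while tracing and clears it afterwards. */
static sketch_object sketch_mark(sketch_object o)   { return o | SKETCH_MARK_MASK; }
static sketch_object sketch_unmark(sketch_object o) { return o & ~SKETCH_MARK_MASK; }
```
-
- With this layout, `sketch_make (5, 1234)' yields a word whose tag
-reads back as 5 and whose value reads back as 1234; marking and
-unmarking the word changes neither field.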
-
- Lisp objects use the typedef `Lisp_Object', but the actual C type
-used for the Lisp object can vary. It can be either a simple type
-(`long' on the DEC Alpha, `int' on other machines) or a structure whose
-fields are bit fields that line up properly (actually, a union of
-structures is used). Generally the simple integral type is preferable
-because it ensures that the compiler will actually use a machine word
-to represent the object (some compilers will use more general and less
-efficient code for unions and structs even if they can fit in a machine
-word). The union type, however, has the advantage of stricter type
-checking (if you accidentally pass an integer where a Lisp object is
-desired, you get a compile error), and it makes it easier to decode
-Lisp objects when debugging. The choice of which type to use is
-determined by the preprocessor constant `USE_UNION_TYPE' which is
-defined via the `--use-union-type' option to `configure'.
-
- Note that there are only eight types that the tag can represent, but
-many more actual types than this. This is handled by having one of the
-tag types specify a meta-type called a "record"; for all such objects,
-the first four bytes of the pointed-to structure indicate what the
-actual type is.
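-
- As a rough sketch of the "record" idea, every record structure can
-begin with a small header that names its actual type. The text above
-only says that the first four bytes identify the type; the integer
-type code, the enum values, and all `sketch_' names below are
-assumptions made for illustration.
-
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record subtypes; the real set is much larger. */
enum sketch_record_type { SKETCH_BUFFER = 1, SKETCH_MARKER = 2 };

/* Assumed header: four bytes at the start of every record. */
struct sketch_header { uint32_t type; };

struct sketch_marker {
  struct sketch_header header;   /* must be the first member */
  long position;
};

/* Given a pointer to any record, inspect the header to learn its type. */
static int sketch_record_is(const void *record, enum sketch_record_type t)
{
  return ((const struct sketch_header *) record)->type == (uint32_t) t;
}
```
-
- Because the header is the first member, a pointer to any record can
-be examined through `struct sketch_header' before it is cast to the
-specific structure type.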
-
- Note also that having 28 bits for pointers and integers restricts a
-lot of things to 256 megabytes of memory. (Basically, enough pointers
-and indices and whatnot get stuffed into Lisp objects that the total
-amount of memory used by XEmacs can't grow above 256 megabytes. In
-older versions of XEmacs and GNU Emacs, the tag was 5 bits wide,
-allowing for 32 types, which was more than the actual number of types
-that existed at the time, and no "record" type was necessary. However,
-this limited the editor to 64 megabytes total, which some users who
-edited large files might conceivably exceed.)
-
- Also, note that there is an implicit assumption here that all
-pointers are low enough that the top bits are all zero and can just be
-chopped off. On standard machines that allocate memory from the bottom
-up (and give each process its own address space), this works fine. Some
-machines, however, put the data space somewhere else in memory (e.g.
-beginning at 0x80000000). Those machines cope by defining
-`DATA_SEG_BITS' in the corresponding `m/' or `s/' file to the proper
-mask. Then, pointers retrieved from Lisp objects are automatically
-OR'ed with this value prior to being used.
-
- A corollary of the previous paragraph is that *(pointers to)
-stack-allocated structures cannot be put into Lisp objects*. The stack
-is generally located near the top of memory; if you put such a pointer
-into a Lisp object, it will get its top bits chopped off, and you will
-lose.
-
- Actually, there's an alternative representation of a `Lisp_Object',
-invented by Kyle Jones, that is used when the `--use-minimal-tagbits'
-option to `configure' is used. In this case the 2 lower bits are used
-for the tag bits. This representation assumes that pointers to structs
-are always aligned to multiples of 4, so the lower 2 bits are always
-zero.
-
- [ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ]
- [ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ]
-
- <---------------------------------------------------------> <->
- a pointer to a structure, or an integer tag
-
- A tag of 00 is used for all pointer object types, a tag of 10 is used
-for characters, and the other two tags 01 and 11 are joined together to
-form the integer object type. The markbit is moved to part of the
-structure being pointed at (integers and chars do not need to be marked,
-since no memory is allocated). This representation has these
-advantages:
-
- 1. 31 bits can be used for Lisp Integers.
-
- 2. *Any* pointer can be represented directly, and no bit masking
- operations are necessary.
-
- The disadvantages are:
-
- 1. An extra level of indirection is needed when accessing the object
- types that were not record types. So checking whether a Lisp
- object is a cons cell becomes a slower operation.
-
- 2. Mark bits can no longer be stored directly in Lisp objects, so
- another place for them must be found. This means that a cons cell
- requires more memory than merely room for 2 lisp objects, leading
- to extra memory use.
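-
- The low-bit tagging scheme can be sketched as follows. The tag
-values (00 pointer, 10 char, 01 and 11 integer) come from the text
-above; the `sketch_' names are hypothetical, and the code assumes
-two's-complement integers and pointers aligned to multiples of 4.
-
```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t sketch_word;   /* stands in for Lisp_Object */

/* Tags 01 and 11 both mean "integer", so only the low bit is tested. */
static int sketch_is_int(sketch_word w)     { return (w & 1) == 1; }
static int sketch_is_char(sketch_word w)    { return (w & 3) == 2; }
static int sketch_is_pointer(sketch_word w) { return (w & 3) == 0; }

/* Integers keep every bit except the lowest one. */
static sketch_word sketch_from_int(intptr_t n)
{
  return ((sketch_word) n << 1) | 1;
}

/* The word is odd, so (w - 1) / 2 is exact and avoids relying on
   the behavior of >> for negative values. */
static intptr_t sketch_to_int(sketch_word w)
{
  return ((intptr_t) w - 1) / 2;
}

static sketch_word sketch_from_char(uint32_t c)
{
  return ((sketch_word) c << 2) | 2;
}

static uint32_t sketch_to_char(sketch_word w)
{
  return (uint32_t) (w >> 2);
}

/* Pointers are stored unchanged; their two low bits are already 00. */
static sketch_word sketch_from_pointer(void *p)
{
  sketch_word w = (sketch_word) p;
  assert((w & 3) == 0);   /* requires 4-byte alignment */
  return w;
}

static void *sketch_to_pointer(sketch_word w) { return (void *) w; }
```
-
- Note that extracting a pointer needs no masking at all, which is the
-second advantage listed above.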
-
- Various macros are used to construct Lisp objects and extract the
-components. Macros of the form `XINT()', `XCHAR()', `XSTRING()',
-`XSYMBOL()', etc. mask out the pointer/integer field and cast it to the
-appropriate type. All of the macros that construct pointers will `OR'
-with `DATA_SEG_BITS' if necessary. `XINT()' needs to be a bit tricky
-so that negative numbers are properly sign-extended: Usually it does
-this by shifting the number four bits to the left and then four bits to
-the right. This assumes that the right-shift operator does an
-arithmetic shift (i.e. it leaves the most-significant bit as-is rather
-than shifting in a zero, so that it mimics a divide-by-two even for
-negative numbers). Not all machines/compilers do this, and on the ones
-that don't, a more complicated definition is selected by defining
-`EXPLICIT_SIGN_EXTEND'.
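-
- The two styles of sign extension can be sketched like this for the
-28-bit value field of a 32-bit word. These are not the actual
-`XINT()' definitions: the function names are hypothetical, and the
-explicit variant is just one possible way to avoid depending on an
-arithmetic right shift.
-
```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_VALUE_BITS 28
#define SKETCH_VALUE_MASK ((UINT32_C(1) << SKETCH_VALUE_BITS) - 1)

/* Shift-based version: push the tag and mark bits off the top, then
   shift back down.  Correct only if >> on a negative int32_t is an
   arithmetic shift, which C leaves implementation-defined. */
static int32_t sketch_xint_shift(uint32_t word)
{
  return (int32_t) (word << 4) >> 4;
}

/* Explicit version: isolate the 28-bit field, then flip and subtract
   its sign bit.  No right shift of a negative value is involved. */
static int32_t sketch_xint_explicit(uint32_t word)
{
  uint32_t value = word & SKETCH_VALUE_MASK;
  uint32_t sign = UINT32_C(1) << (SKETCH_VALUE_BITS - 1);
  return (int32_t) (value ^ sign) - (int32_t) sign;
}
```
-
- On a compiler with arithmetic right shifts, both functions return
-the same result for every input word.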
-
- Note that when `ERROR_CHECK_TYPECHECK' is defined, the extractor
-macros become more complicated - they check the tag bits and/or the
-type field in the first four bytes of a record type to ensure that the
-object is really of the correct type. This is great for catching places
-where an incorrect type is being dereferenced - this typically results
-in a pointer being dereferenced as the wrong type of structure, with
-unpredictable (and sometimes not easily traceable) results.
-
- There are similar `XSETTYPE()' macros that construct a Lisp object.
-These macros are of the form `XSETTYPE (LVALUE, RESULT)', i.e. they
-have to be a statement rather than just used in an expression. The
-reason for this is that standard C doesn't let you "construct" a
-structure (but GCC does). Granted, this sometimes isn't too convenient;
-for the case of integers, at least, you can use the function
-`make_int()', which constructs and *returns* an integer Lisp object.
-Note that the `XSETTYPE()' macros are also affected by
-`ERROR_CHECK_TYPECHECK' and make sure that the structure is of the
-right type in the case of record types, where the type is contained in
-the structure.
-
- The C programmer is responsible for *guaranteeing* that a
-Lisp_Object is of the correct type before using the `XTYPE' macros.
-This is especially important in the case of lists. Use `XCAR' and
-`XCDR' if a Lisp_Object is certainly a cons cell, else use `Fcar()' and
-`Fcdr()'. Trust other C code, but not Lisp code. On the other hand,
-if XEmacs has an internal logic error, it's better to crash
-immediately, so sprinkle "unreachable" `abort()'s liberally about the
-source code.
-
-\1f
-File: internals.info, Node: Rules When Writing New C Code, Next: A Summary of the Various XEmacs Modules, Prev: How Lisp Objects Are Represented in C, Up: Top
-
-Rules When Writing New C Code
-*****************************
-
- The XEmacs C Code is extremely complex and intricate, and there are
-many rules that are more or less consistently followed throughout the
-code. Many of these rules are not obvious, so they are explained here.
-It is of the utmost importance that you follow them. If you don't,
-you may get something that appears to work, but which will crash in odd
-situations, often in code far away from where the actual breakage is.
-
-* Menu:
-
-* General Coding Rules::
-* Writing Lisp Primitives::
-* Adding Global Lisp Variables::
-* Coding for Mule::
-* Techniques for XEmacs Developers::
-
-\1f
-File: internals.info, Node: General Coding Rules, Next: Writing Lisp Primitives, Up: Rules When Writing New C Code
-
-General Coding Rules
-====================
-
- The C code is actually written in a dialect of C called "Clean C",
-meaning that it can be compiled, mostly warning-free, with either a C or
-C++ compiler. Coding in Clean C has several advantages over plain C.
-C++ compilers are more nit-picking, and a number of coding errors have
-been found by compiling with C++. The ability to use both C and C++
-tools means that a greater variety of development tools are available to
-the developer.
-
- Almost every module contains a `syms_of_*()' function and a
-`vars_of_*()' function. The former declares any Lisp primitives you
-have defined and defines any symbols you will be using. The latter
-declares any global Lisp variables you have added and initializes global
-C variables in the module. For each such function, declare it in
-`symsinit.h' and make sure it's called in the appropriate place in
-`emacs.c'. *Important*: There are stringent requirements on exactly
-what can go into these functions. See the comment in `emacs.c'. The
-reason for this is to avoid obscure unwanted interactions during
-initialization. If you don't follow these rules, you'll be sorry! If
-you want to do anything that isn't allowed, create a
-`complex_vars_of_*()' function for it. Doing this is tricky, though:
-You have to make sure your function is called at the right time so that
-all the initialization dependencies work out.
-
- Every module includes `<config.h>' (angle brackets so that
-`--srcdir' works correctly; `config.h' may or may not be in the same
-directory as the C sources) and `lisp.h'. `config.h' must always be
-included before any other header files (including system header files)
-to ensure that certain tricks played by various `s/' and `m/' files
-work out correctly.
-
- *All global and static variables that are to be modifiable must be
-declared uninitialized.* This means that you may not use the "declare
-with initializer" form for these variables, such as `int some_variable
-= 0;'. The reason for this has to do with some kludges done during the
-dumping process: If possible, the initialized data segment is re-mapped
-so that it becomes part of the (unmodifiable) code segment in the
-dumped executable. This allows this memory to be shared among multiple
-running XEmacs processes. XEmacs is careful to place as much constant
-data as possible into initialized variables (in particular, into what's
-called the "pure space" - see below) during the `temacs' phase.
-
- *Please note:* This kludge only works on a few systems nowadays, and
-is rapidly becoming irrelevant because most modern operating systems
-provide "copy-on-write" semantics. All data is initially shared
-between processes, and a private copy is automatically made (on a
-page-by-page basis) when a process first attempts to write to a page of
-memory.
-
- Formerly, there was a requirement that static variables not be
-declared inside of functions. This had to do with another hack in
-the same vein as what was just described: old USG systems put
-statically-declared variables in the initialized data space, so those
-header files had a `#define static' declaration. (That way, the
-data-segment remapping described above could still work.) This fails
-badly on static variables inside of functions, which suddenly become
-automatic variables; therefore, you weren't supposed to have any of
-them. This awful kludge has been removed in XEmacs because
-
- 1. almost all of the systems that used this kludge ended up having to
- disable the data-segment remapping anyway;
-
- 2. the only systems that didn't were extremely outdated ones;
-
- 3. this hack completely messed up inline functions.
-
- The C source code makes heavy use of C preprocessor macros. One
-popular macro style is:
-
- #define FOO(var, value) do { \
- Lisp_Object FOO_value = (value); \
- ... /* compute using FOO_value */ \
- (var) = bar; \
- } while (0)
-
- The `do {...} while (0)' is a standard trick to allow FOO to have
-statement semantics, so that it can safely be used within an `if'
-statement in C, for example. Multiple evaluation is prevented by
-copying a supplied argument into a local variable, so that
-`FOO(var,fun(1))' only calls `fun' once.
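-
- A small self-contained sketch of this macro style follows. The
-macro and function names are made up for illustration; the counter
-exists only to show that the argument is evaluated once.
-
```c
#include <assert.h>

static int sketch_calls;   /* how many times sketch_next() has run */

static int sketch_next(void) { return ++sketch_calls; }

/* Statement-like macro: `value' is copied into a local variable, so
   it is evaluated exactly once even though it is used twice. */
#define SKETCH_SET_DOUBLE(var, value) do {      \
  int sketch_value_ = (value);                  \
  (var) = sketch_value_ + sketch_value_;        \
} while (0)

/* The macro call plus its semicolon forms one statement, so it can
   sit in an unbraced `if' that is followed by `else'. */
static int sketch_demo(int flag)
{
  int result = 0;
  if (flag)
    SKETCH_SET_DOUBLE (result, sketch_next ());
  else
    result = -1;
  return result;
}
```
-
- Had the macro body been a bare `{ ... }' block, the semicolon after
-the macro call would end the `if' statement and the following `else'
-would be a syntax error.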
-
- Lisp lists are popular data structures in the C code as well as in
-Elisp. There are two sets of macros that iterate over lists.
-`EXTERNAL_LIST_LOOP_N' should be used when the list has been supplied
-by the user, and cannot be trusted to be acyclic and nil-terminated. A
-`malformed-list' or `circular-list' error will be generated if the list
-being iterated over is not entirely kosher. `LIST_LOOP_N', on the
-other hand, is faster and less safe, and can be used only on trusted
-lists.
-
- Related macros are `GET_EXTERNAL_LIST_LENGTH' and `GET_LIST_LENGTH',
-which calculate the length of a list; `GET_EXTERNAL_LIST_LENGTH' also
-validates that the list is proper. The macros
-`EXTERNAL_LIST_LOOP_DELETE_IF' and `LIST_LOOP_DELETE_IF' delete from a
-Lisp list the elements that satisfy some predicate.
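-
- To make concrete what validating an untrusted list involves, here is
-a minimal sketch of a length function that detects both improper and
-circular lists. It does not reproduce the XEmacs macros: the object
-model, the `sketch_' names, and the use of tortoise-and-hare cycle
-detection are all assumptions chosen for illustration.
-
```c
#include <assert.h>
#include <stddef.h>

/* Toy object model: nil, a cons cell, or an integer. */
enum sketch_kind { SKETCH_NIL, SKETCH_CONS, SKETCH_INT };

struct sketch_obj {
  enum sketch_kind kind;
  struct sketch_obj *car, *cdr;   /* meaningful only for SKETCH_CONS */
};

enum { SKETCH_MALFORMED = -1, SKETCH_CIRCULAR = -2 };

static void sketch_set_cons(struct sketch_obj *cell,
                            struct sketch_obj *car, struct sketch_obj *cdr)
{
  cell->kind = SKETCH_CONS;
  cell->car = car;
  cell->cdr = cdr;
}

/* Return the length of LIST, or a negative code if the list is not
   nil-terminated or loops back on itself. */
static long sketch_external_list_length(const struct sketch_obj *list)
{
  const struct sketch_obj *hare = list, *tortoise = list;
  long len = 0;

  while (hare->kind == SKETCH_CONS) {
    hare = hare->cdr;
    len++;
    /* The tortoise advances at half speed; in a cycle the hare
       eventually lands on it. */
    if ((len & 1) == 0) {
      tortoise = tortoise->cdr;
      if (hare == tortoise)
        return SKETCH_CIRCULAR;
    }
  }
  return hare->kind == SKETCH_NIL ? len : SKETCH_MALFORMED;
}
```
-
- A loop over a trusted list can skip both checks and simply follow
-`cdr' pointers until it reaches nil, which is why the unchecked
-variants are faster.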
-
-\1f
-File: internals.info, Node: Writing Lisp Primitives, Next: Adding Global Lisp Variables, Prev: General Coding Rules, Up: Rules When Writing New C Code
-
-Writing Lisp Primitives
-=======================
-
- Lisp primitives are Lisp functions implemented in C. The details of
-interfacing the C function so that Lisp can call it are handled by a few
-C macros. The only way to really understand how to write new C code is
-to read the source, but we can explain some things here.
-
- An example of a special form is the definition of `prog1', from
-`eval.c'. (An ordinary function would have the same general
-appearance.)
-
- DEFUN ("prog1", Fprog1, 1, UNEVALLED, 0, /*
- Similar to `progn', but the value of the first form is returned.
- \(prog1 FIRST BODY...): All the arguments are evaluated sequentially.
- The value of FIRST is saved during evaluation of the remaining args,
- whose values are discarded.
- */
- (args))
- {
- /* This function can GC */
- REGISTER Lisp_Object val, form, tail;
- struct gcpro gcpro1;
-
- val = Feval (XCAR (args));
-
- GCPRO1 (val);
-
- LIST_LOOP_3 (form, XCDR (args), tail)
- Feval (form);
-
- UNGCPRO;
- return val;
- }
-
- Let's start with a precise explanation of the arguments to the
-`DEFUN' macro. Here is a template for them:
-
- DEFUN (LNAME, FNAME, MIN_ARGS, MAX_ARGS, INTERACTIVE, /*
- DOCSTRING
- */
- (ARGLIST))
-
-LNAME
- This string is the name of the Lisp symbol to define as the
- function name; in the example above, it is `"prog1"'.
-
-FNAME
- This is the C function name for this function. This is the name
- that is used in C code for calling the function. The name is, by
- convention, `F' prepended to the Lisp name, with all dashes (`-')
- in the Lisp name changed to underscores. Thus, to call this
- function from C code, call `Fprog1'. Remember that the arguments
- are of type `Lisp_Object'; various macros and functions for
- creating values of type `Lisp_Object' are declared in the file
- `lisp.h'.
-
- Primitives whose names are special characters (e.g. `+' or `<')
- are named by spelling out, in some fashion, the special character:
- e.g. `Fplus()' or `Flss()'. Primitives whose names begin with
- normal alphanumeric characters but also contain special characters
- are spelled out in some creative way, e.g. `let*' becomes
- `FletX()'.
-
- Each function also has an associated structure that holds the data
- for the subr object that represents the function in Lisp. This
- structure conveys the Lisp symbol name to the initialization
- routine that will create the symbol and store the subr object as
- its definition. The C variable name of this structure is always
- `S' prepended to the FNAME. You hardly ever need to be aware of
- the existence of this structure, since `DEFUN' plus `DEFSUBR'
- takes care of all the details.
-
-MIN_ARGS
- This is the minimum number of arguments that the function
- requires. The function `prog1' allows a minimum of one argument.
-
-MAX_ARGS
- This is the maximum number of arguments that the function accepts,
- if there is a fixed maximum. Alternatively, it can be `UNEVALLED',
- indicating a special form that receives unevaluated arguments, or
- `MANY', indicating an unlimited number of evaluated arguments (the
- C equivalent of `&rest'). Both `UNEVALLED' and `MANY' are macros.
- If MAX_ARGS is a number, it may not be less than MIN_ARGS and it
- may not be greater than 8. (If you need to add a function with
- more than 8 arguments, use the `MANY' form. Resist the urge to
- edit the definition of `DEFUN' in `lisp.h'. If you do it anyway,
- make sure to also add another clause to the switch statement in
- `primitive_funcall()'.)
-
-INTERACTIVE
- This is an interactive specification, a string such as might be
- used as the argument of `interactive' in a Lisp function. In the
- case of `prog1', it is 0 (a null pointer), indicating that `prog1'
- cannot be called interactively. A value of `""' indicates a
- function that should receive no arguments when called
- interactively.
-
-DOCSTRING
- This is the documentation string. It is written just like a
- documentation string for a function defined in Lisp; in
- particular, the first line should be a single sentence. Note how
- the documentation string is enclosed in a comment, none of the
- documentation is placed on the same lines as the comment-start and
- comment-end characters, and the comment-start characters are on
- the same line as the interactive specification. `make-docfile',
- which scans the C files for documentation strings, is very
- particular about what it looks for, and will not properly extract
- the doc string if it's not in this exact format.
-
- In order to make both `etags' and `make-docfile' happy, make sure
- that the `DEFUN' line contains the LNAME and FNAME, and that the
- comment-start characters for the doc string are on the same line
- as the interactive specification, and put a newline directly after
- them (and before the comment-end characters).
-
-ARGLIST
- This is the comma-separated list of arguments to the C function.
- For a function with a fixed maximum number of arguments, provide a
- C argument for each Lisp argument. In this case, unlike regular C
- functions, the types of the arguments are not declared; they are
- simply always of type `Lisp_Object'.
-
- The names of the C arguments will be used as the names of the
- arguments to the Lisp primitive as displayed in its documentation,
- modulo the same concerns described above for `F...' names (in
- particular, underscores in the C arguments become dashes in the
- Lisp arguments).
-
- There is one additional kludge: A trailing `_' on the C argument is
- discarded when forming the Lisp argument. This allows C language
- reserved words (like `default') or global symbols (like `dirname')
- to be used as argument names without compiler warnings or errors.
-
- A Lisp function with MAX_ARGS = `UNEVALLED' is a "special form";
- its arguments are not evaluated. Instead it receives one argument
- of type `Lisp_Object', a (Lisp) list of the unevaluated arguments,
- conventionally named `(args)'.
-
- When a Lisp function has no upper limit on the number of arguments,
- specify MAX_ARGS = `MANY'. In this case its implementation in C
- actually receives exactly two arguments: the number of Lisp
- arguments (an `int') and the address of a block containing their
- values (a `Lisp_Object *'). In this case only are the C types
- specified in the ARGLIST: `(int nargs, Lisp_Object *args)'.
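-
- The naming conventions described under FNAME and ARGLIST can be
-sketched in plain C. This is only an illustration of the rules as
-stated above: the helper names `toy_c_name_from_lisp_name' and
-`toy_lisp_arg_from_c_arg' are hypothetical and do not exist in the
-XEmacs sources, and names containing special characters (such as `+'
-or `let*') are spelled out by hand and are not handled here.
-
```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: build the conventional C name for a Lisp
   primitive by prepending 'F' and turning each dash into an
   underscore.  Returns 0 on success, -1 if OUT is too small.  */
int
toy_c_name_from_lisp_name (const char *lisp_name, char *out,
                           size_t out_size)
{
  size_t len = strlen (lisp_name);
  size_t i;

  if (out_size < len + 2)       /* 'F' + name + terminating NUL */
    return -1;

  out[0] = 'F';
  for (i = 0; i < len; i++)
    out[i + 1] = (lisp_name[i] == '-') ? '_' : lisp_name[i];
  out[len + 1] = '\0';
  return 0;
}

/* Hypothetical helper: derive the documented Lisp argument name from
   a C argument name.  One trailing underscore is dropped, then the
   remaining underscores become dashes.  Returns 0 on success, -1 if
   OUT is too small.  */
int
toy_lisp_arg_from_c_arg (const char *c_arg, char *out, size_t out_size)
{
  size_t len = strlen (c_arg);
  size_t i;

  if (len > 0 && c_arg[len - 1] == '_')
    len--;                      /* the trailing-underscore kludge */
  if (out_size < len + 1)
    return -1;

  for (i = 0; i < len; i++)
    out[i] = (c_arg[i] == '_') ? '-' : c_arg[i];
  out[len] = '\0';
  return 0;
}
```
-
- Under these assumptions, a made-up Lisp name such as
-`my-new-primitive' maps to `Fmy_new_primitive', and the C arguments
-`default_' and `buffer_name' are shown as `default' and `buffer-name'.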
-
- Within the function `Fprog1' itself, note the use of the macros
-`GCPRO1' and `UNGCPRO'. `GCPRO1' is used to "protect" a variable from
-garbage collection--to inform the garbage collector that it must look
-in that variable and regard the object pointed at by its contents as an
-accessible object. This is necessary whenever you call `Feval' or
-anything that can directly or indirectly call `Feval' (this includes
-the `QUIT' macro!). At such a time, any Lisp object that you intend to
-refer to again must be protected somehow. `UNGCPRO' cancels the
-protection of the variables that are protected in the current function.
-It is necessary to do this explicitly.
-
- The macro `GCPRO1' protects just one local variable. If you want to
-protect two, use `GCPRO2' instead; repeating `GCPRO1' will not work.
-Macros `GCPRO3' and `GCPRO4' also exist.
-
- These macros implicitly use local variables such as `gcpro1'; you
-must declare these explicitly, with type `struct gcpro'. Thus, if you
-use `GCPRO2', you must declare `gcpro1' and `gcpro2'.
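-
- One way to picture why the `gcpro1', `gcpro2' locals are needed is
-the following self-contained model. It is a sketch under assumptions,
-not the XEmacs implementation: `Toy_Object', `struct toy_gcpro',
-`toy_gcprolist' and the `TOY_...' macros are all invented for this
-example. Each protected variable gets a record that is linked into a
-chain a collector could walk.
-
```c
#include <stddef.h>

typedef long Toy_Object;        /* stand-in for Lisp_Object */

/* One record per protected local variable.  */
struct toy_gcpro
{
  struct toy_gcpro *next;       /* record that was protected earlier */
  Toy_Object *var;              /* address of the protected variable */
};

/* Head of the chain; a collector would start from here.  */
struct toy_gcpro *toy_gcprolist = NULL;

/* Like the macros in the text, these use locals named gcpro1 and
   gcpro2 that the caller must declare.  */
#define TOY_GCPRO1(v)                           \
  do {                                          \
    gcpro1.next = toy_gcprolist;                \
    gcpro1.var = &(v);                          \
    toy_gcprolist = &gcpro1;                    \
  } while (0)

#define TOY_GCPRO2(v1, v2)                      \
  do {                                          \
    gcpro1.next = toy_gcprolist;                \
    gcpro1.var = &(v1);                         \
    gcpro2.next = &gcpro1;                      \
    gcpro2.var = &(v2);                         \
    toy_gcprolist = &gcpro2;                    \
  } while (0)

/* Unlink every record this function pushed.  */
#define TOY_UNGCPRO (toy_gcprolist = gcpro1.next)

/* Count the records currently on the chain.  */
int
toy_count_protected (void)
{
  int n = 0;
  struct toy_gcpro *p;

  for (p = toy_gcprolist; p != NULL; p = p->next)
    n++;
  return n;
}

/* Protect two locals, report how many records that added, and
   unprotect them again before returning.  */
int
toy_protect_two (void)
{
  Toy_Object a = 1, b = 2;
  struct toy_gcpro gcpro1, gcpro2;
  int before = toy_count_protected ();
  int during;

  TOY_GCPRO2 (a, b);
  during = toy_count_protected ();
  TOY_UNGCPRO;

  return during - before;
}
```
-
- In this model, a second `TOY_GCPRO1' in the same function would
-overwrite `gcpro1', dropping the first variable and linking the record
-to itself, which is one way to see why a single `GCPRO2' is required
-instead. Forgetting `TOY_UNGCPRO' would leave the chain pointing at
-stack memory that is no longer valid after the function returns.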
-
- Note also that the general rule is "caller-protects"; i.e. you are
-only responsible for protecting those Lisp objects that you create. Any
-objects passed to you as arguments should have been protected by whoever
-created them, so you don't in general have to protect them.
-
- In particular, the arguments to any Lisp primitive are always
-automatically `GCPRO'ed, when called "normally" from Lisp code or
-bytecode. So only a few Lisp primitives that are called frequently from
-C code, such as `Fprogn', protect their arguments as a service to their
-caller. You don't need to protect your arguments when writing a new
-`DEFUN'.
-
- `GCPRO'ing is perhaps the trickiest and most error-prone part of
-XEmacs coding. It is *extremely* important that you get this right and
-use a great deal of discipline when writing this code. *Note
-`GCPRO'ing: GCPROing, for full details on how to do this.
-
- What `DEFUN' actually does is declare a global structure of type
-`Lisp_Subr' whose name begins with capital `SF' and which contains
-information about the primitive (e.g. a pointer to the function, its
-minimum and maximum allowed arguments, a string describing its Lisp
-name); `DEFUN' then begins a normal C function declaration using the
-`F...' name. The Lisp subr object that is the function definition of a
-primitive (i.e. the object in the function slot of the symbol that
-names the primitive) actually points to this `SF' structure; when
-`Feval' encounters a subr, it looks in the structure to find out how to
-call the C function.
-
- Defining the C function is not enough to make a Lisp primitive
-available; you must also create the Lisp symbol for the primitive (the
-symbol is "interned"; *note Obarrays::.) and store a suitable subr
-object in its function cell. (If you don't do this, the primitive won't
-be seen by Lisp code.) The code looks like this:
-
- DEFSUBR (FNAME);
-
-Here FNAME is the same name you used as the second argument to `DEFUN'.
-
- This call to `DEFSUBR' should go in the `syms_of_*()' function at
-the end of the module. If no such function exists, create it and make
-sure to also declare it in `symsinit.h' and call it from the
-appropriate spot in `main()'. *Note General Coding Rules::.
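-
- The division of labor between the per-primitive structure and the
-registration step can be modeled in a few lines of standalone C. This
-is a simplified sketch, not the real `Lisp_Subr' or `DEFSUBR': the
-names `struct toy_subr', `toy_defsubr', `toy_lookup',
-`toy_call_by_name' and `toy_syms_of_example' are hypothetical, and
-every toy primitive here takes its arguments as an array, which the
-real macros do only for `MANY'.
-
```c
#include <stddef.h>
#include <string.h>

typedef long Toy_Object;        /* stand-in for Lisp_Object */

/* Toy counterpart of the per-primitive structure: the Lisp name, the
   argument limits, and a pointer to the C function.  */
struct toy_subr
{
  const char *name;
  int min_args;
  int max_args;
  Toy_Object (*fn) (int nargs, Toy_Object *args);
};

#define TOY_MAX_SUBRS 16
struct toy_subr *toy_function_table[TOY_MAX_SUBRS];
int toy_subr_count = 0;

/* Toy counterpart of the registration step: make the primitive
   findable by name.  Returns 0 on success, -1 if the table is full.  */
int
toy_defsubr (struct toy_subr *subr)
{
  if (toy_subr_count >= TOY_MAX_SUBRS)
    return -1;
  toy_function_table[toy_subr_count++] = subr;
  return 0;
}

/* Find a registered primitive, or return NULL.  */
struct toy_subr *
toy_lookup (const char *name)
{
  int i;

  for (i = 0; i < toy_subr_count; i++)
    if (strcmp (toy_function_table[i]->name, name) == 0)
      return toy_function_table[i];
  return NULL;
}

/* Look the primitive up, check the argument count against the limits
   stored in its structure, and call it.  Returns 0 on success, -1 if
   the name is unknown or the argument count is out of range.  */
int
toy_call_by_name (const char *name, int nargs, Toy_Object *args,
                  Toy_Object *result)
{
  struct toy_subr *subr = toy_lookup (name);

  if (subr == NULL || nargs < subr->min_args || nargs > subr->max_args)
    return -1;
  *result = subr->fn (nargs, args);
  return 0;
}

/* A sample primitive and its structure, named with the S prefix.  */
Toy_Object
Ftoy_add (int nargs, Toy_Object *args)
{
  Toy_Object sum = 0;
  int i;

  for (i = 0; i < nargs; i++)
    sum += args[i];
  return sum;
}

struct toy_subr SFtoy_add = { "toy-add", 1, 8, Ftoy_add };

/* Toy counterpart of a syms_of_*() function.  */
void
toy_syms_of_example (void)
{
  toy_defsubr (&SFtoy_add);
}
```
-
- Until `toy_syms_of_example' has run, `toy_lookup ("toy-add")' finds
-nothing even though `Ftoy_add' is compiled in, which mirrors the point
-that defining the C function alone does not make a primitive visible.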
-
- Note that C code cannot call functions by name unless they are
-defined in C. The way to call a function written in Lisp from C is to
-use `Ffuncall', which embodies the Lisp function `funcall'. Since the
-Lisp function `funcall' accepts an unlimited number of arguments, in C
-it takes two: the number of Lisp-level arguments, and a one-dimensional
-array containing their values. The first Lisp-level argument is the
-Lisp function to call, and the rest are the arguments to pass to it.
-Since `Ffuncall' can call the evaluator, you must protect pointers from
-garbage collection around the call to `Ffuncall'. (However, `Ffuncall'
-explicitly protects all of its parameters, so you don't have to protect
-any pointers passed as parameters to it.)
-
- The C functions `call0', `call1', `call2', and so on, provide a
-convenient way to call a Lisp function with a fixed number of
-arguments. They work by calling `Ffuncall'.
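-
- The calling convention just described (a count plus an array whose
-first element is the function) can be tried out with a small
-standalone model. Everything in it is an assumption made for
-illustration: `Toy_Object', `toy_funcall', `toy_call2' and `toy_sum'
-are invented names, and the toy does no garbage collection or error
-checking.
-
```c
#include <stddef.h>

/* A toy object that is either a number or a callable function.  */
typedef struct toy_object Toy_Object;
typedef Toy_Object (*Toy_Function) (int nargs, Toy_Object *args);

struct toy_object
{
  Toy_Function fn;              /* non-NULL when the object is callable */
  long num;                     /* payload when it is a number */
};

Toy_Object
toy_make_number (long n)
{
  Toy_Object obj;

  obj.fn = NULL;
  obj.num = n;
  return obj;
}

Toy_Object
toy_make_function (Toy_Function fn)
{
  Toy_Object obj;

  obj.fn = fn;
  obj.num = 0;
  return obj;
}

/* args[0] is the function to call; args[1] .. args[nargs - 1] are
   the arguments passed on to it.  */
Toy_Object
toy_funcall (int nargs, Toy_Object *args)
{
  return args[0].fn (nargs - 1, args + 1);
}

/* Fixed-arity wrapper: pack the function and its two arguments into
   an array and hand them to toy_funcall.  */
Toy_Object
toy_call2 (Toy_Object fn, Toy_Object arg0, Toy_Object arg1)
{
  Toy_Object args[3];

  args[0] = fn;
  args[1] = arg0;
  args[2] = arg1;
  return toy_funcall (3, args);
}

/* Sample callee: add up all of its numeric arguments.  */
Toy_Object
toy_sum (int nargs, Toy_Object *args)
{
  long total = 0;
  int i;

  for (i = 0; i < nargs; i++)
    total += args[i].num;
  return toy_make_number (total);
}
```
-
- The wrapper exists only to spare the caller from building the array
-by hand; it adds no behavior of its own.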
-
- `eval.c' is a very good file to look through for examples; `lisp.h'
-contains the definitions for important macros and functions.
-
-\1f
-File: internals.info, Node: Adding Global Lisp Variables, Next: Coding for Mule, Prev: Writing Lisp Primitives, Up: Rules When Writing New C Code
-
-Adding Global Lisp Variables
-============================
-
- Global variables whose names begin with `Q' are constants whose
-value is a symbol of a particular name. The name of the variable should
-be derived from the name of the symbol using the same rules as for Lisp
-primitives. These variables are initialized using a call to
-`defsymbol()' in the `syms_of_*()' function. (This call interns a
-symbol, sets the C variable to the resulting Lisp object, and calls
-`staticpro()' on the C variable to tell the garbage-collection
-mechanism about this variable. What `staticpro()' does is add a
-pointer to the variable to a large global array; when
-garbage-collection happens, all pointers listed in the array are used
-as starting points for marking Lisp objects. This is important because
-it's quite possible that the only current reference to the object is
-the C variable. In the case of symbols, the `staticpro()' doesn't
-matter all that much because the symbol is contained in `obarray',
-which is itself `staticpro()'ed. However, it's possible that a naughty
-user could do something like uninterning the symbol out of `obarray' or
-even setting `obarray' to a different value [although this is likely to
-make XEmacs crash!].)
-
- *Please note:* It is potentially deadly if you declare a `Q...'
-variable in two different modules. The two calls to `defsymbol()' are
-no problem, but some linkers will complain about multiply-defined
-symbols. The most insidious aspect of this is that often the link will
-succeed anyway, but then the resulting executable will sometimes crash
-in obscure ways during certain operations! To avoid this problem,
-declare any symbols with common names (such as `text') that are not
-obviously associated with this particular module in the module
-`general.c'.
-
- Global variables whose names begin with `V' are variables that
-contain Lisp objects. The convention here is that all global variables
-of type `Lisp_Object' begin with `V', and all others don't (including
-integer and boolean variables that have Lisp equivalents). Most of the
-time, these variables have equivalents in Lisp, but some don't. Those
-that do are declared this way by a call to `DEFVAR_LISP()' in the
-`vars_of_*()' initializer for the module. What this does is create a
-special "symbol-value-forward" Lisp object that contains a pointer to
-the C variable, intern a symbol whose name is as specified in the call
-to `DEFVAR_LISP()', and set its value to the symbol-value-forward Lisp
-object; it also calls `staticpro()' on the C variable to tell the
-garbage-collection mechanism about the variable. When `eval' (or
-actually `symbol-value') encounters this special object in the process
-of retrieving a variable's value, it follows the indirection to the C
-variable and gets its value. `setq' does similar things so that the C
-variable gets changed.
-
- Whether or not you `DEFVAR_LISP()' a variable, you need to
-initialize it in the `vars_of_*()' function; otherwise it will end up
-as all zeroes, which is the integer 0 (*not* `nil'), and this is
-probably not what you want. Also, if the variable is not
-`DEFVAR_LISP()'ed, *you must call* `staticpro()' on the C variable in
-the `vars_of_*()' function. Otherwise, the garbage-collection
-mechanism won't know that the object in this variable is in use, and
-will happily collect it and reuse its storage for another Lisp object,
-and you will be the one who's unhappy when you can't figure out how
-your variable got overwritten.
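-
- The effect of forgetting `staticpro()' can be shown with a small
-standalone model of the root array described above. This is a sketch
-only: `toy_staticpro', `toy_mark_roots', `toy_vars_of_example' and
-the two `Vtoy_...' variables are hypothetical names, and the "heap
-objects" are plain structs with a mark flag.
-
```c
#include <stddef.h>

/* Toy heap object carrying only a mark flag.  */
struct toy_obj
{
  int marked;
};
typedef struct toy_obj *Toy_Object;    /* stand-in for Lisp_Object */

/* The root array holds the addresses of C variables, not their
   current values, so later assignments are still seen.  */
#define TOY_MAX_ROOTS 32
Toy_Object *toy_roots[TOY_MAX_ROOTS];
int toy_root_count = 0;

/* Register a C variable as a starting point for marking.
   Returns 0 on success, -1 if the array is full.  */
int
toy_staticpro (Toy_Object *var)
{
  if (toy_root_count >= TOY_MAX_ROOTS)
    return -1;
  toy_roots[toy_root_count++] = var;
  return 0;
}

/* Mark whatever object each registered variable points at right now.  */
void
toy_mark_roots (void)
{
  int i;

  for (i = 0; i < toy_root_count; i++)
    {
      Toy_Object obj = *toy_roots[i];
      if (obj != NULL)
        obj->marked = 1;
    }
}

Toy_Object Vtoy_registered;     /* registered below */
Toy_Object Vtoy_forgotten;      /* deliberately never registered */

/* Toy counterpart of a vars_of_*() function: initialize both
   variables, but register only one of them.  */
void
toy_vars_of_example (Toy_Object a, Toy_Object b)
{
  toy_staticpro (&Vtoy_registered);
  Vtoy_registered = a;
  Vtoy_forgotten = b;
}
```
-
- After `toy_mark_roots' runs, only the object held by
-`Vtoy_registered' is marked; a collector built on this model would
-treat the object in `Vtoy_forgotten' as garbage.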
-
-\1f
-File: internals.info, Node: Coding for Mule, Next: Techniques for XEmacs Developers, Prev: Adding Global Lisp Variables, Up: Rules When Writing New C Code
-
-Coding for Mule
-===============
-
- Although Mule support is not compiled by default in XEmacs, many
-people are using it, and we consider it crucial that new code works
-correctly with multibyte characters. This is not hard; it is only a
-matter of following several simple user-interface guidelines. Even if
-you never compile with Mule, with a little practice you will find it
-quite easy to code Mule-correctly.
-
- Note that these guidelines are not necessarily tied to the current
-Mule implementation; they are also a good idea to follow on the grounds
-of code generalization for future I18N work.
-
-* Menu:
-
-* Character-Related Data Types::
-* Working With Character and Byte Positions::
-* Conversion to and from External Data::
-* General Guidelines for Writing Mule-Aware Code::
-* An Example of Mule-Aware Code::
-
-\1f
-File: internals.info, Node: Character-Related Data Types, Next: Working With Character and Byte Positions, Up: Coding for Mule
-
-Character-Related Data Types
-----------------------------
-
- First, let's review the basic character-related datatypes used by
-XEmacs. Note that the separate `typedef's are not mandatory in the
-current implementation (all of them boil down to `unsigned char' or
-`int'), but they improve clarity of code a great deal, because one
-glance at the declaration can tell the intended use of the variable.
-
-`Emchar'
- An `Emchar' holds a single Emacs character.
-
- Obviously, the equality between characters and bytes is lost in
- the Mule world. Characters can be represented by one or more
- bytes in the buffer, and `Emchar' is the C type large enough to
- hold any character.
-
- Without Mule support, an `Emchar' is equivalent to an `unsigned
- char'.
-
-`Bufbyte'
- The data representing the text in a buffer or string is logically
- a set of `Bufbyte's.
-
- XEmacs does not work with the same character formats all the time;
- when reading characters from the outside, it decodes them to an
- internal format, and likewise encodes them when writing.
- `Bufbyte' (in fact `unsigned char') is the basic unit of the
- internal format of XEmacs buffers and strings.
-
- One character can correspond to one or more `Bufbyte's. In the
- current implementation, an ASCII character is represented by a
- single `Bufbyte' of the same value, and extended characters are
- represented by a sequence of `Bufbyte's.
-
- Without Mule support, a `Bufbyte' is equivalent to an `Emchar'.
-
-`Bufpos'
-`Charcount'
- A `Bufpos' represents a character position in a buffer or string.
- A `Charcount' represents a number (count) of characters.
- Logically, subtracting two `Bufpos' values yields a `Charcount'
- value. Although all of these are `typedef'ed to `int', we use
- them in preference to `int' to make it clear what sort of position
- is being used.
-
- `Bufpos' and `Charcount' values are the only ones that are ever
- visible to Lisp.
-
-`Bytind'
-`Bytecount'
- A `Bytind' represents a byte position in a buffer or string. A
- `Bytecount' represents the distance between two positions in bytes.
- The relationship between `Bytind' and `Bytecount' is the same as
- the relationship between `Bufpos' and `Charcount'.
-
-`Extbyte'
-`Extcount'
- When dealing with the outside world, XEmacs works with `Extbyte's,
- which are equivalent to `unsigned char'. An `Extcount' is the
- distance between two `Extbyte's. `Extbyte's and `Extcount's are
- not all that common in XEmacs code.
-
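- The difference between counting characters and counting bytes can be
-made concrete with a short standalone sketch. It rests on loud
-assumptions: the multibyte encoding used here is UTF-8, chosen only
-as a familiar stand-in and not claimed to be the XEmacs internal
-format; the typedefs restate what the text above says about the
-underlying C types; and the `toy_...' functions are hypothetical.
-
```c
typedef unsigned char Bufbyte;  /* per the text: unsigned char */
typedef int Bytecount;          /* per the text: boils down to int */
typedef int Charcount;          /* per the text: typedef'ed to int */

/* In the stand-in encoding (UTF-8), every byte of a character after
   the first has the bit pattern 10xxxxxx.  */
int
toy_is_continuation (Bufbyte b)
{
  return (b & 0xC0) == 0x80;
}

/* Number of characters stored in the first LEN bytes of TEXT.  */
Charcount
toy_bytecount_to_charcount (const Bufbyte *text, Bytecount len)
{
  Charcount chars = 0;
  Bytecount i;

  for (i = 0; i < len; i++)
    if (!toy_is_continuation (text[i]))
      chars++;
  return chars;
}

/* Number of bytes occupied by the first COUNT characters of TEXT,
   or LEN if TEXT holds fewer than COUNT characters.  */
Bytecount
toy_charcount_to_bytecount (const Bufbyte *text, Bytecount len,
                            Charcount count)
{
  Bytecount i = 0;

  while (i < len && count > 0)
    {
      i++;                      /* first byte of the character */
      while (i < len && toy_is_continuation (text[i]))
        i++;                    /* its remaining bytes, if any */
      count--;
    }
  return i;
}
```
-
- Keeping the two kinds of count in differently named types, as the
-text recommends, makes it harder to pass a byte count where a
-character count is expected, even though both are plain `int's to
-the compiler.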