@c %**end of header
@ifinfo
+@dircategory XEmacs Editor
+@direntry
+* Internals: (internals). XEmacs Internals Manual.
+@end direntry
Copyright @copyright{} 1992 - 1996 Ben Wing.
Copyright @copyright{} 1996, 1997 Sun Microsystems.
-Copyright @copyright{} 1994, 1995 Free Software Foundation.
+Copyright @copyright{} 1994 - 1998 Free Software Foundation.
Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois.
@titlepage
@title XEmacs Internals Manual
-@subtitle Version 1.1, March 1997
+@subtitle Version 1.3, August 1999
@author Ben Wing
@author Martin Buchholz
+@author Hrvoje Niksic
+@author Matthias Neubauer
+@author Olivier Galibert
@page
@vskip 0pt plus 1fill
@noindent
Copyright @copyright{} 1992 - 1996 Ben Wing. @*
-Copyright @copyright{} 1996 Sun Microsystems, Inc. @*
-Copyright @copyright{} 1994 Free Software Foundation. @*
+Copyright @copyright{} 1996, 1997 Sun Microsystems, Inc. @*
+Copyright @copyright{} 1994 - 1998 Free Software Foundation. @*
Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois.
@sp 2
-Version 1.1 @*
-March, 1997.@*
+Version 1.3 @*
+August 1999.@*
Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
* Rules When Writing New C Code::
* A Summary of the Various XEmacs Modules::
* Allocation of Objects in XEmacs Lisp::
+* Dumping::
* Events and the Event Loop::
* Evaluation; Stack Frames; Bindings::
* Symbols and Variables::
* Consoles; Devices; Frames; Windows::
* The Redisplay Mechanism::
* Extents::
-* Faces and Glyphs::
+* Faces::
+* Glyphs::
* Specifiers::
* Menus::
* Subprocesses::
-* Interface to X Windows::
-* Index:: Index including concepts, functions, variables,
- and other terms.
+* Interface to the X Window System::
+* Index::
- --- The Detailed Node Listing ---
+@detailmenu
-Here are other nodes that are inferiors of those already listed,
-mentioned here so you can get to them in one step:
+--- The Detailed Node Listing ---
A History of Emacs
* Through Version 18:: Unification prevails.
* Lucid Emacs:: One version 19 Emacs.
* GNU Emacs 19:: The other version 19 Emacs.
+* GNU Emacs 20:: The other version 20 Emacs.
* XEmacs:: The continuation of Lucid Emacs.
Rules When Writing New C Code
* General Coding Rules::
* Writing Lisp Primitives::
* Adding Global Lisp Variables::
+* Coding for Mule::
* Techniques for XEmacs Developers::
+Coding for Mule
+
+* Character-Related Data Types::
+* Working With Character and Byte Positions::
+* Conversion to and from External Data::
+* General Guidelines for Writing Mule-Aware Code::
+* An Example of Mule-Aware Code::
+
A Summary of the Various XEmacs Modules
* Low-Level Modules::
* Introduction to Allocation::
* Garbage Collection::
* GCPROing::
+* Garbage Collection - Step by Step::
* Integers and Characters::
* Allocation from Frob Blocks::
* lrecords::
* Low-level allocation::
-* Pure Space::
* Cons::
* Vector::
* Bit Vector::
* Symbol::
* Marker::
* String::
-* Bytecode::
+* Compiled Function::
+
+Garbage Collection - Step by Step
+
+* Invocation::
+* garbage_collect_1::
+* mark_object::
+* gc_sweep::
+* sweep_lcrecords_1::
+* compact_string_chars::
+* sweep_strings::
+* sweep_bit_vectors_1::
+
+Dumping
+
+* Overview::
+* Data descriptions::
+* Dumping phase::
+* Reloading phase::
+
+Dumping phase
+
+* Object inventory::
+* Address allocation::
+* The header::
+* Data dumping::
+* Pointers dumping::
Events and the Event Loop
* Character Sets::
* Encodings::
* Internal Mule Encodings::
+* CCL::
Encodings
* Internal String Encoding::
* Internal Character Encoding::
-The Lisp Reader and Compiler
-
Lstreams
+* Creating an Lstream:: Creating an lstream object.
+* Lstream Types:: Different sorts of things that are streamed.
+* Lstream Functions:: Functions for working with lstreams.
+* Lstream Methods:: Creating new lstream types.
+
Consoles; Devices; Frames; Windows
* Introduction to Consoles; Devices; Frames; Windows::
* Point::
* Window Hierarchy::
+* The Window Object::
The Redisplay Mechanism
* Critical Redisplay Sections::
* Line Start Cache::
+* Redisplay Piece by Piece::
Extents
* Extent Ordering:: How extents are ordered internally.
* Format of the Extent Info:: The extent information in a buffer or string.
* Zero-Length Extents:: A weird special case.
-* Mathematics of Extent Ordering:: A rigorous foundation.
+* Mathematics of Extent Ordering:: A rigorous foundation.
* Extent Fragments:: Cached information useful for redisplay.
-Faces and Glyphs
-
-Specifiers
-
-Menus
-
-Subprocesses
-
-Interface to X Windows
-
+@end detailmenu
@end menu
@node A History of Emacs, XEmacs From the Outside, Top, Top
* XEmacs:: The continuation of Lucid Emacs.
@end menu
-@node Through Version 18
+@node Through Version 18, Lucid Emacs, A History of Emacs, A History of Emacs
@section Through Version 18
@cindex Gosling, James
@cindex Great Usenet Renaming
version 18.59 released October 31, 1992.
@end itemize
-@node Lucid Emacs
+@node Lucid Emacs, GNU Emacs 19, Through Version 18, A History of Emacs
@section Lucid Emacs
@cindex Lucid Emacs
@cindex Lucid Inc.
version 20.4 released February 28, 1998.
@end itemize
-@node GNU Emacs 19
+@node GNU Emacs 19, GNU Emacs 20, Lucid Emacs, A History of Emacs
@section GNU Emacs 19
@cindex GNU Emacs 19
@cindex FSF Emacs
working on and using GNU Emacs for a long time (back as far as version
16 or 17).
-@node GNU Emacs 20
+@node GNU Emacs 20, XEmacs, GNU Emacs 19, A History of Emacs
@section GNU Emacs 20
@cindex GNU Emacs 20
@cindex FSF Emacs
version 20.3 released August 19, 1998.
@end itemize
-@node XEmacs
+@node XEmacs, , GNU Emacs 20, A History of Emacs
@section XEmacs
@cindex XEmacs
displayable representations, and XEmacs provides a function
@code{redisplay()} that ensures that the display of all such objects
matches their internal state. Most of the time, a standard Lisp
-environment is in a @dfn{read-eval-print} loop -- i.e. ``read some Lisp
+environment is in a @dfn{read-eval-print} loop---i.e. ``read some Lisp
code, execute it, and print the results''. XEmacs has a similar loop:
@itemize @bullet
executed; this prints out the error and continues.) Routines can also
specify cleanup code (called an @dfn{unwind-protect}) that will be
called when control exits from a block of code, no matter how that exit
-occurs -- i.e. even if a function deeply nested below it causes a
+occurs---i.e. even if a function deeply nested below it causes a
non-local exit back to the top level.
Note that this facility has appeared in some recent vintages of C, in
you declared. This is actually considered a bug in Emacs Lisp and in
all other early dialects of Lisp, and was corrected in Common Lisp. (In
Common Lisp, you can still declare dynamically scoped variables if you
-want to -- they are sometimes useful -- but variables by default are
+want to---they are sometimes useful---but variables by default are
@dfn{lexically scoped} as in C.)
@end enumerate
providing the increased compile-time error-checking of static typing.
@end enumerate
+The Java language also has some negative attributes:
+
+@enumerate
+@item
+Java uses the edit/compile/run model of software development. This
+makes it hard to use interactively. For example, to use Java like
+@code{bc}, it is necessary to write a special-purpose, albeit tiny,
+application. In Emacs Lisp, a calculator comes built-in without any
+effort: one can always just type an expression in the @code{*scratch*}
+buffer.
+@item
+Java tries too hard to enforce, not merely enable, portability, making
+ordinary access to standard OS facilities painful. Java has an
+@dfn{agenda}. I think this is why @code{chdir} is not part of standard
+Java, which is inexcusable.
+@end enumerate
+
+Unfortunately, there is no perfect language. Static typing allows a
+compiler to catch programmer errors and produce more efficient code, but
+makes programming more tedious and less fun. For the foreseeable future,
+an Ideal Editing and Programming Environment (and that is what XEmacs
+aspires to) will be programmable in multiple languages: high level ones
+like Lisp for user customization and prototyping, and lower level ones
+for infrastructure and industrial-strength applications. If I had my
+way, XEmacs would be friendly towards the Python, Scheme, C++, and ML
+communities, among others. But there are serious technical difficulties
+to achieving that goal.
+
+The word @dfn{application} in the previous paragraph was used
+intentionally. XEmacs implements an API for programs written in Lisp
+that makes it a full-fledged application platform, very much like an OS
+inside the real OS.
+
@node XEmacs From the Perspective of Building, XEmacs From the Inside, The Lisp Language, Top
@chapter XEmacs From the Perspective of Building
- The heart of XEmacs is the Lisp environment, which is written in C.
+The heart of XEmacs is the Lisp environment, which is written in C.
This is contained in the @file{src/} subdirectory. Underneath
@file{src/} are two subdirectories of header files: @file{s/} (header
files for particular operating systems) and @file{m/} (header files for
identified for the particular environment in which XEmacs is being
built.
- XEmacs also contains a great deal of Lisp code. This implements the
-operations that make XEmacs useful as an editor as well as just a
-Lisp environment, and also contains many add-on packages that allow
-XEmacs to browse directories, act as a mail and Usenet news reader,
-compile Lisp code, etc. There is actually more Lisp code than
-C code associated with XEmacs, but much of the Lisp code is
-peripheral to the actual operation of the editor. The Lisp code
-all lies in subdirectories underneath the @file{lisp/} directory.
+XEmacs also contains a great deal of Lisp code. This implements the
+operations that make XEmacs useful as an editor as well as just a Lisp
+environment, and also contains many add-on packages that allow XEmacs to
+browse directories, act as a mail and Usenet news reader, compile Lisp
+code, etc. There is actually more Lisp code than C code associated with
+XEmacs, but much of the Lisp code is peripheral to the actual operation
+of the editor. The Lisp code all lies in subdirectories underneath the
+@file{lisp/} directory.
- The @file{lwlib/} directory contains C code that implements a
+The @file{lwlib/} directory contains C code that implements a
generalized interface onto different X widget toolkits and also
implements some widgets of its own that behave like Motif widgets but
are faster, free, and in some cases more powerful. The code in this
directory compiles into a library and is mostly independent from XEmacs.
- The @file{etc/} directory contains various data files associated with
+The @file{etc/} directory contains various data files associated with
XEmacs. Some of them are actually read by XEmacs at startup; others
merely contain useful information of various sorts.
- The @file{lib-src/} directory contains C code for various auxiliary
+The @file{lib-src/} directory contains C code for various auxiliary
programs that are used in connection with XEmacs. Some of them are used
during the build process; others are used to perform certain functions
that cannot conveniently be placed in the XEmacs executable (e.g. the
@file{gnuclient} program, which allows an external script to communicate
with a running XEmacs process).
- The @file{man/} directory contains the sources for the XEmacs
+The @file{man/} directory contains the sources for the XEmacs
documentation. It is mostly in a form called Texinfo, which can be
converted into either a printed document (by passing it through @TeX{})
or into on-line documentation called @dfn{info files}.
- The @file{info/} directory contains the results of formatting the
-XEmacs documentation as @dfn{info files}, for on-line use. These files
-are used when you enter the Info system using @kbd{C-h i} or through the
+The @file{info/} directory contains the results of formatting the XEmacs
+documentation as @dfn{info files}, for on-line use. These files are
+used when you enter the Info system using @kbd{C-h i} or through the
Help menu.
- The @file{dynodump/} directory contains auxiliary code used to build
+The @file{dynodump/} directory contains auxiliary code used to build
XEmacs on Solaris platforms.
- The other directories contain various miscellaneous code and
-information that is not normally used or needed.
-
- The first step of building involves running the @file{configure}
-program and passing it various parameters to specify any optional
-features you want and compiler arguments and such, as described in the
-@file{INSTALL} file. This determines what the build environment is,
-chooses the appropriate @file{s/} and @file{m/} file, and runs a series
-of tests to determine many details about your environment, such as which
-library functions are available and exactly how they work. (The
-@file{s/} and @file{m/} files only contain information that cannot be
-conveniently detected in this fashion.) The reason for running these
-tests is that it allows XEmacs to be compiled on a much wider variety of
-platforms than those that the XEmacs developers happen to be familiar
-with, including various sorts of hybrid platforms. This is especially
-important now that many operating systems give you a great deal of
-control over exactly what features you want installed, and allow for
-easy upgrading of parts of a system without upgrading the rest. It
+The other directories contain various miscellaneous code and information
+that is not normally used or needed.
+
+The first step of building involves running the @file{configure} program
+and passing it various parameters to specify any optional features you
+want and compiler arguments and such, as described in the @file{INSTALL}
+file. This determines what the build environment is, chooses the
+appropriate @file{s/} and @file{m/} file, and runs a series of tests to
+determine many details about your environment, such as which library
+functions are available and exactly how they work. The reason for
+running these tests is that it allows XEmacs to be compiled on a much
+wider variety of platforms than those that the XEmacs developers happen
+to be familiar with, including various sorts of hybrid platforms. This
+is especially important now that many operating systems give you a great
+deal of control over exactly what features you want installed, and allow
+for easy upgrading of parts of a system without upgrading the rest. It
would be impossible to pre-determine and pre-specify the information for
all possible configurations.
- When configure is done running, it generates @file{Makefile}s and the
-file @file{src/config.h} (which describes the features of your system)
-from template files. You then run @file{make}, which compiles the
-auxiliary code and programs in @file{lib-src/} and @file{lwlib/} and the
-main XEmacs executable in @file{src/}. The result of compiling and
-linking is an executable called @file{temacs}, which is @emph{not} the
-final XEmacs executable. @file{temacs} by itself is not intended to
-function as an editor or even display any windows on the screen, and if
-you simply run it, it will exit immediately. The @file{Makefile} runs
-@file{temacs} with certain options that cause it to initialize itself,
-read in a number of basic Lisp files, and then dump itself out into a
-new executable called @file{xemacs}. This new executable has been
-pre-initialized and contains pre-digested Lisp code that is necessary
-for the editor to function (this includes most basic Lisp functions,
-e.g. @code{not}, that can be defined in terms of other Lisp primitives;
-some initialization code that is called when certain objects, such as
-frames, are created; and all of the standard keybindings and code for
-the actions they result in). This executable, @file{xemacs}, is the
-executable that you run to use the XEmacs editor.
+In fact, the @file{s/} and @file{m/} files are basically @emph{evil},
+since they contain unmaintainable platform-specific hard-coded
+information. XEmacs has been moving in the direction of having all
+system-specific information be determined dynamically by
+@file{configure}. Perhaps someday we can @code{rm -rf src/s src/m}.
+
+When configure is done running, it generates @file{Makefile}s and
+@file{GNUmakefile}s and the file @file{src/config.h} (which describes
+the features of your system) from template files. You then run
+@file{make}, which compiles the auxiliary code and programs in
+@file{lib-src/} and @file{lwlib/} and the main XEmacs executable in
+@file{src/}. The result of compiling and linking is an executable
+called @file{temacs}, which is @emph{not} the final XEmacs executable.
+@file{temacs} by itself is not intended to function as an editor or even
+display any windows on the screen, and if you simply run it, it will
+exit immediately. The @file{Makefile} runs @file{temacs} with certain
+options that cause it to initialize itself, read in a number of basic
+Lisp files, and then dump itself out into a new executable called
+@file{xemacs}. This new executable has been pre-initialized and
+contains pre-digested Lisp code that is necessary for the editor to
+function (this includes most basic editing functions,
+e.g. @code{kill-line}, that can be defined in terms of other Lisp
+primitives; some initialization code that is called when certain
+objects, such as frames, are created; and all of the standard
+keybindings and code for the actions they result in). This executable,
+@file{xemacs}, is the executable that you run to use the XEmacs editor.
Although @file{temacs} is not intended to be run as an editor, it can,
by using the incantation @code{temacs -batch -l loadup.el run-temacs}.
@node XEmacs From the Inside, The XEmacs Object System (Abstractly Speaking), XEmacs From the Perspective of Building, Top
@chapter XEmacs From the Inside
- Internally, XEmacs is quite complex, and can be very confusing. To
+Internally, XEmacs is quite complex, and can be very confusing. To
simplify things, it can be useful to think of XEmacs as containing an
event loop that ``drives'' everything, and a number of other subsystems,
such as a Lisp engine and a redisplay mechanism. Each of these other
state. The flow of control continually passes in and out of these
different subsystems in the course of normal operation of the editor.
- It is important to keep in mind that, most of the time, the editor is
+It is important to keep in mind that, most of the time, the editor is
``driven'' by the event loop. Except during initialization and batch
mode, all subsystems are entered directly or indirectly through the
event loop, and ultimately, control exits out of all subsystems back up
to the event loop, and starting another iteration of the event loop
occurs once each keystroke, mouse motion, etc.
- If you're trying to understand a particular subsystem (other than the
+If you're trying to understand a particular subsystem (other than the
event loop), think of it as a ``daemon'' process or ``servant'' that is
responsible for one particular aspect of a larger system, and
periodically receives commands or environment changes that cause it to
When the Lisp initialization code is done, the C code enters the event
loop, and stays there for the duration of the XEmacs process. The code
-for the event loop is contained in @file{keyboard.c}, and is called
+for the event loop is contained in @file{cmdloop.c}, and is called
@code{Fcommand_loop_1()}. Note that this event loop could very well be
written in Lisp, and in fact a Lisp version exists; but apparently,
doing this makes XEmacs run noticeably slower.
@table @code
@item integer
-28 bits of precision, or 60 bits on 64-bit machines; the reason for this
-is described below when the internal Lisp object representation is
-described.
+28 or 31 bits of precision, or 60 or 63 bits on 64-bit machines; the
+reason for this is described below when the internal Lisp object
+representation is described.
@item float
Same precision as a double in C.
@item cons
An object representing a single character of text; chars behave like
integers in many ways but are logically considered text rather than
numbers and have a different read syntax. (the read syntax for a char
-contains the char itself or some textual encoding of it -- for example,
+contains the char itself or some textual encoding of it---for example,
a Japanese Kanji character might be encoded as @samp{^[$(B#&^[(B} using the
-ISO-2022 encoding standard -- rather than the numerical representation
+ISO-2022 encoding standard---rather than the numerical representation
of the char; this way, if the mapping between chars and integers
changes, which is quite possible for Kanji characters and other extended
characters, the same character will still be created. Note that some
@item string
Self-explanatory; behaves much like a vector of chars
but has a different read syntax and is stored and manipulated
-more compactly and efficiently.
+more compactly.
@item bit-vector
A vector of bits; similar to a string in spirit.
@item compiled-function
-An object describing compiled Lisp code, known as @dfn{byte code}.
+An object containing compiled Lisp code, known as @dfn{byte code}.
@item subr
-An object describing a Lisp primitive.
+A Lisp primitive, i.e. a Lisp-callable function implemented in C.
@end table
@cindex closure
- Note that there is no basic ``function'' type, as in more powerful
+Note that there is no basic ``function'' type, as in more powerful
versions of Lisp (where it's called a @dfn{closure}). XEmacs Lisp does
not provide the closure semantics implemented by Common Lisp and Scheme.
The guts of a function in XEmacs Lisp are represented in one of four
ways: a symbol specifying another function (when one function is an
-alias for another), a list containing the function's source code, a
-bytecode object, or a subr object. (In other words, given a symbol
-specifying the name of a function, calling @code{symbol-function} to
-retrieve the contents of the symbol's function cell will return one of
-these types of objects.)
+alias for another), a list (whose first element must be the symbol
+@code{lambda}) containing the function's source code, a
+compiled-function object, or a subr object. (In other words, given a
+symbol specifying the name of a function, calling @code{symbol-function}
+to retrieve the contents of the symbol's function cell will return one
+of these types of objects.)
- XEmacs Lisp also contains numerous specialized objects used to
-implement the editor:
+XEmacs Lisp also contains numerous specialized objects used to implement
+the editor:
@table @code
@item buffer
equivalent to a @dfn{display} in the X Window System and a @dfn{TTY} in
character mode.
@item face
-An object specifying the appearance of text or graphics; it contains
-characteristics such as font, foreground color, and background color.
+An object specifying the appearance of text or graphics; it has
+properties such as font, foreground color, and background color.
@item marker
An object that refers to a particular position in a buffer and moves
around as text is inserted and deleted to stay in the same relative
There are some other, less-commonly-encountered general objects:
@table @code
-@item hashtable
+@item hash-table
An object that maps from an arbitrary Lisp object to another arbitrary
Lisp object, using hashing for fast lookup.
@item obarray
-A limited form of hashtable that maps from strings to symbols; obarrays
+A limited form of hash-table that maps from strings to symbols; obarrays
are used to look up a symbol given its name and are not actually their
own object type but are kludgily represented using vectors with hidden
fields (this representation derives from GNU Emacs).
communication protocol.
@item toolbar-button
An object used in conjunction with the toolbar.
-@item x-resource
-An object that encapsulates certain miscellaneous resources in the X
-window system, used only when Epoch support is enabled.
@end table
And objects that are only used internally:
-@table @asis
+@table @code
@item opaque
A generic object for encapsulating arbitrary memory; this allows you the
generality of @code{malloc()} and the convenience of the Lisp object
1.983e-4
@end example
-converts to a float whose value is 1983.23e-4, or .0001983.
+converts to a float whose value is 1.983e-4, or .0001983.
@example
?b
(where @samp{^[} actually is an @samp{ESC} character) converts to a
particular Kanji character when using an ISO2022-based coding system for
-input. (To decode this gook: @samp{ESC} begins an escape sequence;
+input. (To decode this goo: @samp{ESC} begins an escape sequence;
@samp{ESC $ (} is a class of escape sequences meaning ``switch to a
94x94 character set''; @samp{ESC $ ( B} means ``switch to Japanese
Kanji''; @samp{#} and @samp{&} collectively index into a 94-by-94 array
@code{obarray}, whose contents should be an obarray. If no symbol
is found, a new symbol with the name @code{"foobar"} is automatically
created and added to @code{obarray}; this process is called
-@dfn{interning} the symbol.
+@dfn{interning} the symbol.
@cindex interning
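Interning amounts to a hash lookup that falls back to creating the symbol. The sketch below illustrates only that idea. Its structures (@code{sketch_symbol}, @code{sketch_obarray}) and functions are hypothetical, and they are much simpler than the real obarray, which, as described above, is a vector with hidden fields. The bucket count and string hash are arbitrary choices for illustration. The lookup-only helper is loosely modeled on the Lisp function @code{intern-soft}.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified structures; not the XEmacs obarray layout. */
struct sketch_symbol {
    char *name;
    struct sketch_symbol *next;      /* next symbol in the same bucket */
};

#define SKETCH_OBARRAY_SIZE 211      /* arbitrary prime bucket count */

struct sketch_obarray {
    struct sketch_symbol *buckets[SKETCH_OBARRAY_SIZE];
};

/* Simple multiplicative string hash, reduced to a bucket index. */
static size_t sketch_hash(const char *s)
{
    size_t h = 0;
    while (*s)
        h = h * 31 + (unsigned char) *s++;
    return h % SKETCH_OBARRAY_SIZE;
}

/* Look NAME up without creating anything; NULL if absent. */
static struct sketch_symbol *
sketch_intern_soft(struct sketch_obarray *ob, const char *name)
{
    struct sketch_symbol *sym;
    for (sym = ob->buckets[sketch_hash(name)]; sym; sym = sym->next)
        if (strcmp(sym->name, name) == 0)
            return sym;
    return NULL;
}

/* Intern NAME: return the existing symbol, or create one and push it
   onto its bucket's chain.  Returns NULL only if memory runs out. */
static struct sketch_symbol *
sketch_intern(struct sketch_obarray *ob, const char *name)
{
    struct sketch_symbol *sym;
    size_t bucket;

    assert(name != NULL);
    sym = sketch_intern_soft(ob, name);
    if (sym)
        return sym;

    sym = malloc(sizeof *sym);
    if (!sym)
        return NULL;
    sym->name = malloc(strlen(name) + 1);
    if (!sym->name) {
        free(sym);
        return NULL;
    }
    strcpy(sym->name, name);

    bucket = sketch_hash(name);
    sym->next = ob->buckets[bucket];
    ob->buckets[bucket] = sym;
    return sym;
}
```

Interning the same name twice returns the same symbol. This is what lets the reader map every occurrence of @code{foobar} in source code to one object.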
@example
converts to a bit-vector.
@example
+#s(hash-table ... ...)
+@end example
+
+converts to a hash table (the actual contents are not shown).
+
+@example
#s(range-table ... ...)
@end example
@end example
converts to a char table (the actual contents are not shown).
-(Note that the #s syntax is the general syntax for structures,
-which are not really implemented in XEmacs Lisp but should be.)
- When an object is printed out (using @code{print} or a related
+Note that the @code{#s()} syntax is the general syntax for structures,
+which are not really implemented in XEmacs Lisp but should be.
+
+When an object is printed out (using @code{print} or a related
function), the read syntax is used, so that the same object can be read
in again.
- The other objects do not have read syntaxes, usually because it does
-not really make sense to create them in this fashion (i.e. processes,
-where it doesn't make sense to have a subprocess created as a side
-effect of reading some Lisp code), or because they can't be created at
-all (e.g. subrs). Permanent objects, as a rule, do not have a read
-syntax; nor do most complex objects, which contain too much state to be
-easily initialized through a read syntax.
+The other objects do not have read syntaxes, usually because it does not
+really make sense to create them in this fashion (i.e. processes, where
+it doesn't make sense to have a subprocess created as a side effect of
+reading some Lisp code), or because they can't be created at all
+(e.g. subrs). Permanent objects, as a rule, do not have a read syntax;
+nor do most complex objects, which contain too much state to be easily
+initialized through a read syntax.
@node How Lisp Objects Are Represented in C, Rules When Writing New C Code, The XEmacs Object System (Abstractly Speaking), Top
@chapter How Lisp Objects Are Represented in C
- Lisp objects are represented in C using a 32- or 64-bit machine word
+Lisp objects are represented in C using a 32-bit or 64-bit machine word
(depending on the processor; i.e. DEC Alphas use 64-bit Lisp objects and
most other processors use 32-bit Lisp objects). The representation
stuffs a pointer together with a tag, as follows:
[ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ]
[ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ]
- ^ <---> <------------------------------------------------------>
- | tag a pointer to a structure, or an integer
- |
- `---> mark bit
-@end example
-
- The tag describes the type of the Lisp object. For integers and
-chars, the lower 28 bits contain the value of the integer or char; for
-all others, the lower 28 bits contain a pointer. The mark bit is used
-during garbage-collection, and is always 0 when garbage collection is
-not happening. Many macros that extract out parts of a Lisp object
-expect that the mark bit is 0, and will produce incorrect results if
-it's not. (The way that garbage collection works, basically, is that it
-loops over all places where Lisp objects could exist -- this includes
-all global variables in C that contain Lisp objects [including
-@code{Vobarray}, the C equivalent of @code{obarray}; through this, all
-Lisp variables will get marked], plus various other places -- and
-recursively scans through the Lisp objects, marking each object it finds
-by setting the mark bit. Then it goes through the lists of all objects
-allocated, freeing the ones that are not marked and turning off the
-mark bit of the ones that are marked.)
-
- Lisp objects use the typedef @code{Lisp_Object}, but the actual C type
+ <---------------------------------------------------------> <->
+ a pointer to a structure, or an integer tag
+@end example
+
+A tag of 00 is used for all pointer object types, a tag of 10 is used
+for characters, and the other two tags, 01 and 11, are joined together
+to form the integer object type: any object whose lowest bit is 1 is an
+integer. This representation gives us 31-bit integers and 30-bit
+characters, while pointers are represented directly without any bit
+masking or shifting. This representation, though, assumes that pointers
+to structs are always aligned to multiples of 4, so the lower 2 bits are
+always zero.
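The tag layout above can be sketched in portable C. This is a minimal illustration, not the actual XEmacs macros. All names (@code{sketch_make_int}, @code{sketch_xint}, and so on) are hypothetical. The word type is assumed to be @code{uintptr_t}, so on a 64-bit host the integers get 63 bits rather than 31. @code{sketch_xint} sign-extends explicitly rather than relying on an arithmetic right shift, because C leaves the result of right-shifting a negative signed value implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; a sketch of the 2-bit tag layout, not lisp.h. */
typedef uintptr_t sketch_object;

#define SKETCH_TAG_MASK    ((uintptr_t) 3)  /* the low two bits */
#define SKETCH_TAG_POINTER ((uintptr_t) 0)  /* 00: aligned struct pointer */
#define SKETCH_TAG_CHAR    ((uintptr_t) 2)  /* 10: character */
/* 01 and 11 (low bit set): integer */

static int sketch_intp(sketch_object obj)
{
    return (obj & 1) != 0;
}

static int sketch_charp(sketch_object obj)
{
    return (obj & SKETCH_TAG_MASK) == SKETCH_TAG_CHAR;
}

static int sketch_pointerp(sketch_object obj)
{
    return (obj & SKETCH_TAG_MASK) == SKETCH_TAG_POINTER;
}

/* Integers occupy every bit above the low tag bit.  VALUE must fit in
   one bit less than a word (31 bits on a 32-bit host). */
static sketch_object sketch_make_int(intptr_t value)
{
    return ((uintptr_t) value << 1) | 1;
}

/* Shift the tag bit out, then sign-extend by hand. */
static intptr_t sketch_xint(sketch_object obj)
{
    uintptr_t field = obj >> 1;                          /* logical shift */
    intptr_t half = (intptr_t) ((UINTPTR_MAX >> 2) + 1); /* field's sign bit */

    if (field & (uintptr_t) half)
        return (intptr_t) field - half - half;  /* subtract 2 * half */
    return (intptr_t) field;
}

/* Characters sit above the two tag bits (30 bits on a 32-bit host). */
static sketch_object sketch_make_char(uint32_t c)
{
    return ((uintptr_t) c << 2) | SKETCH_TAG_CHAR;
}

static uint32_t sketch_xchar(sketch_object obj)
{
    return (uint32_t) (obj >> 2);
}

/* Pointers are stored unchanged, which only works if the low two bits
   are already zero, i.e. the structure is 4-byte aligned. */
static sketch_object sketch_make_pointer(void *p)
{
    assert(((uintptr_t) p & SKETCH_TAG_MASK) == 0);
    return (uintptr_t) p;
}

static void *sketch_xpointer(sketch_object obj)
{
    return (void *) obj;
}
```

Integers get one more bit than characters because a single tag bit is enough to recognize them. Characters and pointers share the two remaining two-bit patterns.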
+
+Lisp objects use the typedef @code{Lisp_Object}, but the actual C type
used for the Lisp object can vary. It can be either a simple type
(@code{long} on the DEC Alpha, @code{int} on other machines) or a
structure whose fields are bit fields that line up properly (actually, a
-union of structures that's used). Generally the simple integral type is
+union of structures is used). Generally the simple integral type is
preferable because it ensures that the compiler will actually use a
machine word to represent the object (some compilers will use more
general and less efficient code for unions and structs even if they can
fit in a machine word). The union type, however, has the advantage of
-stricter type checking (if you accidentally pass an integer where a Lisp
-object is desired, you get a compile error), and it makes it easier to
-decode Lisp objects when debugging. The choice of which type to use is
-determined by the presence or absence of the preprocessor constant
-@code{USE_UNION_TYPE}.
-
-@cindex record type
- Note that there are only eight types that the tag can represent,
-but many more actual types than this. This is handled by having
-one of the tag types specify a meta-type called a @dfn{record};
-for all such objects, the first four bytes of the pointed-to
-structure indicate what the actual type is.
-
- Note also that having 28 bits for pointers and integers restricts a
-lot of things to 256 megabytes of memory. (Basically, enough pointers
-and indices and whatnot get stuffed into Lisp objects that the total
-amount of memory used by XEmacs can't grow above 256 megabytes. In
-older versions of XEmacs and GNU Emacs, the tag was 5 bits wide,
-allowing for 32 types, which was more than the actual number of types
-that existed at the time, and no ``record'' type was necessary.
-However, this limited the editor to 64 megabytes total, which some users
-who edited large files might conceivably exceed.)
-
- Also, note that there is an implicit assumption here that all pointers
-are low enough that the top bits are all zero and can just be chopped
-off. On standard machines that allocate memory from the bottom up (and
-give each process its own address space), this works fine. Some
-machines, however, put the data space somewhere else in memory
-(e.g. beginning at 0x80000000). Those machines cope by defining
-@code{DATA_SEG_BITS} in the corresponding @file{m/} or @file{s/} file to
-the proper mask. Then, pointers retrieved from Lisp objects are
-automatically OR'ed with this value prior to being used.
-
- A corollary of the previous paragraph is that @strong{(pointers to)
-stack-allocated structures cannot be put into Lisp objects}. The stack
-is generally located near the top of memory; if you put such a pointer
-into a Lisp object, it will get its top bits chopped off, and you will
-lose.
-
- Various macros are used to construct Lisp objects and extract the
-components. Macros of the form @code{XINT()}, @code{XCHAR()},
-@code{XSTRING()}, @code{XSYMBOL()}, etc. mask out the pointer/integer
-field and cast it to the appropriate type. All of the macros that
-construct pointers will @code{OR} with @code{DATA_SEG_BITS} if
-necessary. @code{XINT()} needs to be a bit tricky so that negative
-numbers are properly sign-extended: Usually it does this by shifting the
-number four bits to the left and then four bits to the right. This
-assumes that the right-shift operator does an arithmetic shift (i.e. it
-leaves the most-significant bit as-is rather than shifting in a zero, so
-that it mimics a divide-by-two even for negative numbers). Not all
-machines/compilers do this, and on the ones that don't, a more
-complicated definition is selected by defining
-@code{EXPLICIT_SIGN_EXTEND}.
-
- Note that when @code{ERROR_CHECK_TYPECHECK} is defined, the extractor
-macros become more complicated -- they check the tag bits and/or the
+stricter type checking. If you accidentally pass an integer where a Lisp
+object is desired, you get a compile error. The choice of which type
+to use is determined by the preprocessor constant @code{USE_UNION_TYPE},
+which is defined via the @code{--use-union-type} option to
+@code{configure}.
+
+Various macros are used to convert between @code{Lisp_Object}s and the
+corresponding C types. Macros of the form @code{XINT()}, @code{XCHAR()},
+@code{XSTRING()}, @code{XSYMBOL()}, etc.@: do any required bit shifting
+and/or masking and cast the result to the appropriate type.
+@code{XINT()} needs to be a bit tricky so that negative numbers are
+properly sign-extended. Since integers are stored left-shifted, if the
+right-shift operator does an arithmetic shift (i.e.@: it leaves the
+most-significant bit as-is rather than shifting in a zero, so that it
+mimics a divide-by-two even for negative numbers), the shift that
+removes the tag bit is enough. This is the case on all the systems we
+support.
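+
+As an illustration of the shift trick, here is a minimal, self-contained
+C sketch. It assumes a hypothetical layout in which the low bit is the
+tag and a 1 there marks an integer. The names @code{sketch_make_int},
+@code{sketch_xint} and @code{SKETCH_INT_TAG} are made up for this
+example and are not XEmacs's actual definitions.
+
+@example
+#include <stdint.h>
+
+/* Hypothetical layout: the low bit is the tag, 1 means "integer".  */
+#define SKETCH_INT_TAG 1
+
+typedef intptr_t sketch_object;
+
+/* Construct: shift left one bit and set the tag.  The shift is done
+   on an unsigned type because left-shifting a negative signed value
+   is undefined in C; converting the result back to a signed type is
+   implementation-defined, and GCC and Clang wrap around.  */
+static sketch_object
+sketch_make_int (intptr_t value)
+@{
+  return (sketch_object) (((uintptr_t) value << 1) | SKETCH_INT_TAG);
+@}
+
+/* Extract: an arithmetic right shift drops the tag bit and copies the
+   sign bit, so negative values come back intact.  Right-shifting a
+   negative signed value is implementation-defined in C; GCC and Clang
+   shift arithmetically.  */
+static intptr_t
+sketch_xint (sketch_object obj)
+@{
+  return obj >> 1;
+@}
+@end example
+
+With this layout, @code{sketch_xint (sketch_make_int (-5))} yields
+@code{-5}. The price is that the usable integer range loses one bit.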
+
+Note that when @code{ERROR_CHECK_TYPECHECK} is defined, the converter
+macros become more complicated---they check the tag bits and/or the
type field in the first four bytes of a record type to ensure that the
object is really of the correct type. This is great for catching places
-where an incorrect type is being dereferenced -- this typically results
+where an incorrect type is being dereferenced---this typically results
in a pointer being dereferenced as the wrong type of structure, with
unpredictable (and sometimes not easily traceable) results.
- There are similar @code{XSET@var{TYPE}()} macros that construct a Lisp object.
-These macros are of the form @code{XSET@var{TYPE} (@var{lvalue}, @var{result})},
-i.e. they have to be a statement rather than just used in an expression.
-The reason for this is that standard C doesn't let you ``construct'' a
-structure (but GCC does). Granted, this sometimes isn't too convenient;
-for the case of integers, at least, you can use the function
-@code{make_int()}, which constructs and @emph{returns} an integer
-Lisp object. Note that the @code{XSET@var{TYPE}()} macros are also
-affected by @code{ERROR_CHECK_TYPECHECK} and make sure that the
-structure is of the right type in the case of record types, where the
-type is contained in the structure.
+There are similar @code{XSET@var{TYPE}()} macros that construct a Lisp
+object. These macros are of the form @code{XSET@var{TYPE}
+(@var{lvalue}, @var{result})}, i.e.@: they must be used as statements
+rather than inside expressions. The reason for this is that C89 doesn't
+let you ``construct'' a structure inside an expression (GCC does, as an
+extension). Granted, this sometimes isn't too convenient. For integers,
+at least, you can use the function @code{make_int()}, which constructs
+and @emph{returns} an integer Lisp object. Note that the
+@code{XSET@var{TYPE}()} macros are also affected by
+@code{ERROR_CHECK_TYPECHECK}. For record types, where the type is
+contained in the structure, they check that the structure is of the
+right type.
+
+The C programmer is responsible for @strong{guaranteeing} that a
+Lisp_Object is of the correct type before using the @code{X@var{TYPE}}
+macros. This is especially important in the case of lists. Use
+@code{XCAR} and @code{XCDR} if a Lisp_Object is certainly a cons cell;
+otherwise use @code{Fcar()} and @code{Fcdr()}. Trust other C code, but
+not Lisp code. On the other hand, if XEmacs has an internal logic
+error, it's better to crash immediately, so sprinkle @code{assert()}s
+and ``unreachable'' @code{abort()}s liberally throughout the source
+code. Where performance is an issue, use @code{type_checking_assert},
+@code{bufpos_checking_assert}, and @code{gc_checking_assert}. These do
+nothing unless the corresponding configure error-checking flag was
+specified.
@node Rules When Writing New C Code, A Summary of the Various XEmacs Modules, How Lisp Objects Are Represented in C, Top
@chapter Rules When Writing New C Code
- The XEmacs C Code is extremely complex and intricate, and there are
-many rules that are more or less consistently followed throughout the code.
+The XEmacs C Code is extremely complex and intricate, and there are many
+rules that are more or less consistently followed throughout the code.
Many of these rules are not obvious, so they are explained here. It is
-of the utmost importance that you follow them. If you don't, you may get
-something that appears to work, but which will crash in odd situations,
-often in code far away from where the actual breakage is.
+of the utmost importance that you follow them. If you don't, you may
+get something that appears to work, but which will crash in odd
+situations, often in code far away from where the actual breakage is.
@menu
* General Coding Rules::
* Techniques for XEmacs Developers::
@end menu
-@node General Coding Rules
+@node General Coding Rules, Writing Lisp Primitives, Rules When Writing New C Code, Rules When Writing New C Code
@section General Coding Rules
- Almost every module contains a @code{syms_of_*()} function and a
-@code{vars_of_*()} function. The former declares any Lisp primitives
-you have defined and defines any symbols you will be using. The latter
-declares any global Lisp variables you have added and initializes global
-C variables in the module. For each such function, declare it in
-@file{symsinit.h} and make sure it's called in the appropriate place in
-@file{emacs.c}. @strong{Important}: There are stringent requirements on
-exactly what can go into these functions. See the comment in
-@file{emacs.c}. The reason for this is to avoid obscure unwanted
-interactions during initialization. If you don't follow these rules,
-you'll be sorry! If you want to do anything that isn't allowed, create
-a @code{complex_vars_of_*()} function for it. Doing this is tricky,
-though: You have to make sure your function is called at the right time
-so that all the initialization dependencies work out.
-
- Every module includes @file{<config.h>} (angle brackets so that
+The C code is actually written in a dialect of C called @dfn{Clean C},
+meaning that it can be compiled, mostly warning-free, with either a C or
+C++ compiler. Coding in Clean C has several advantages over plain C.
+C++ compilers are more nit-picking, and a number of coding errors have
+been found by compiling with C++. The ability to use both C and C++
+tools also means that a greater variety of development tools is
+available to the developer.
+
+Every module includes @file{<config.h>} (angle brackets so that
@samp{--srcdir} works correctly; @file{config.h} may or may not be in
the same directory as the C sources) and @file{lisp.h}. @file{config.h}
-should always be included before any other header files (including
+must always be included before any other header files (including
system header files) to ensure that certain tricks played by various
@file{s/} and @file{m/} files work out correctly.
- @strong{All global and static variables that are to be modifiable must
-be declared uninitialized.} This means that you may not use the ``declare
-with initializer'' form for these variables, such as @code{int
+When including header files, always use angle brackets, not double
+quotes, except when the file to be included is always in the same
+directory as the including file. If either file is a generated file,
+then that is not likely to be the case. In order to understand why we
+have this rule, imagine what happens when you do a build in the source
+directory using @samp{./configure} and another build in another
+directory using @samp{../work/configure}. There will be two different
+@file{config.h} files. Which one will be used if you @samp{#include
+"config.h"}? With GCC and most other compilers, a double-quoted include
+searches the directory of the including file first. Both builds would
+therefore pick up the @file{config.h} in the source directory.
+
+Almost every module contains a @code{syms_of_*()} function and a
+@code{vars_of_*()} function. The former declares any Lisp primitives
+you have defined and defines any symbols you will be using. The latter
+declares any global Lisp variables you have added and initializes global
+C variables in the module. @strong{Important}: There are stringent
+requirements on exactly what can go into these functions. See the
+comment in @file{emacs.c}. The reason for this is to avoid obscure
+unwanted interactions during initialization. If you don't follow these
+rules, you'll be sorry! If you want to do anything that isn't allowed,
+create a @code{complex_vars_of_*()} function for it. Doing this is
+tricky, though: you have to make sure your function is called at the
+right time so that all the initialization dependencies work out.
+
+Declare each function of these kinds in @file{symsinit.h}. Make sure
+it's called in the appropriate place in @file{emacs.c}. You never need
+to include @file{symsinit.h} directly, because it is included by
+@file{lisp.h}.
+
+@strong{All global and static variables that are to be modifiable must
+be declared uninitialized.} This means that you may not use the
+``declare with initializer'' form for these variables, such as @code{int
some_variable = 0;}. The reason for this has to do with some kludges
done during the dumping process: If possible, the initialized data
segment is re-mapped so that it becomes part of the (unmodifiable) code
segment in the dumped executable. This allows this memory to be shared
among multiple running XEmacs processes. XEmacs is careful to place as
-much constant data as possible into initialized variables (in
-particular, into what's called the @dfn{pure space} -- see below) during
-the @file{temacs} phase.
+much constant data as possible into initialized variables during the
+@file{temacs} phase.
@cindex copy-on-write
- @strong{Please note:} This kludge only works on a few systems
-nowadays, and is rapidly becoming irrelevant because most modern
-operating systems provide @dfn{copy-on-write} semantics. All data is
-initially shared between processes, and a private copy is automatically
-made (on a page-by-page basis) when a process first attempts to write to
-a page of memory.
-
- Formerly, there was a requirement that static variables not be
-declared inside of functions. This had to do with another hack along
-the same vein as what was just described: old USG systems put
-statically-declared variables in the initialized data space, so those
-header files had a @code{#define static} declaration. (That way, the
-data-segment remapping described above could still work.) This fails
-badly on static variables inside of functions, which suddenly become
-automatic variables; therefore, you weren't supposed to have any of
-them. This awful kludge has been removed in XEmacs because
+@strong{Please note:} This kludge only works on a few systems nowadays,
+and is rapidly becoming irrelevant because most modern operating systems
+provide @dfn{copy-on-write} semantics. All data is initially shared
+between processes, and a private copy is automatically made (on a
+page-by-page basis) when a process first attempts to write to a page of
+memory.
+
+Formerly, there was a requirement that static variables not be declared
+inside of functions. This had to do with another hack along the same
+vein as what was just described: old USG systems put statically-declared
+variables in the initialized data space, so those header files had a
+@code{#define static} declaration. (That way, the data-segment remapping
+described above could still work.) This fails badly on static variables
+inside of functions, which suddenly become automatic variables;
+therefore, you weren't supposed to have any of them. This awful kludge
+has been removed in XEmacs because
@enumerate
@item
this hack completely messed up inline functions.
@end enumerate
-@node Writing Lisp Primitives
+The C source code makes heavy use of C preprocessor macros. One popular
+macro style is:
+
+@example
+#define FOO(var, value) do @{ \
+ Lisp_Object FOO_value = (value); \
+ ... /* compute using FOO_value */ \
+ (var) = bar; \
+@} while (0)
+@end example
+
+The @code{do @{...@} while (0)} is a standard trick to allow FOO to have
+statement semantics, so that it can safely be used within an @code{if}
+statement in C, for example. Multiple evaluation is prevented by
+copying a supplied argument into a local variable, so that
+@code{FOO(var,fun(1))} only calls @code{fun} once.
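+
+The following self-contained sketch shows both properties. The macro
+@code{DOUBLE_PLUS_ONE} and the function @code{next_value} are
+hypothetical. The macro follows the same pattern as @code{FOO}, naming
+its local variable after the macro, as @code{FOO_value} does.
+
+@example
+static int calls = 0;
+
+/* Hypothetical helper with a side effect, so that repeated
+   evaluation of the macro argument would be visible.  */
+static int
+next_value (int x)
+@{
+  calls++;
+  return x * 2;
+@}
+
+/* The argument is evaluated exactly once, into a local variable, and
+   do @{ ... @} while (0) makes the expansion a single statement.  */
+#define DOUBLE_PLUS_ONE(var, value) do @{      \
+  int DOUBLE_PLUS_ONE_value = (value);         \
+  (var) = DOUBLE_PLUS_ONE_value                \
+          + DOUBLE_PLUS_ONE_value + 1;         \
+@} while (0)
+@end example
+
+Because the expansion is one statement, @code{if (c) DOUBLE_PLUS_ONE (r,
+next_value (3)); else r = 0;} compiles as intended. When @code{c} is
+true, @code{next_value} runs only once and @code{r} becomes 13.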
+
+Lisp lists are popular data structures in the C code as well as in
+Elisp. There are two sets of macros that iterate over lists.
+@code{EXTERNAL_LIST_LOOP_@var{n}} should be used when the list has been
+supplied by the user, and cannot be trusted to be acyclic and
+@code{nil}-terminated. A @code{malformed-list} or @code{circular-list} error
+will be generated if the list being iterated over is not entirely
+kosher. @code{LIST_LOOP_@var{n}}, on the other hand, is faster and less
+safe, and can be used only on trusted lists.
+
+Related macros are @code{GET_EXTERNAL_LIST_LENGTH} and
+@code{GET_LIST_LENGTH}, which calculate the length of a list. In the
+case of @code{GET_EXTERNAL_LIST_LENGTH}, the macro also validates the
+properness of the list. The macros @code{EXTERNAL_LIST_LOOP_DELETE_IF}
+and @code{LIST_LOOP_DELETE_IF} delete elements satisfying some
+predicate from a Lisp list.
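+
+To make the difference concrete, here is a minimal sketch of the kind of
+check a length computation on an untrusted list has to perform. It
+uses a toy cons type and the tortoise-and-hare cycle detection
+technique. The names are hypothetical, and XEmacs's own macros are not
+claimed to be implemented this way.
+
+@example
+#include <stddef.h>
+
+/* Toy cons cell standing in for a Lisp cons.  A NULL cdr plays the
+   role of nil; a cell with is_cons == 0 models a non-list atom.  */
+struct toy_cell
+@{
+  int is_cons;
+  struct toy_cell *cdr;
+@};
+
+#define TOY_CIRCULAR  (-1)
+#define TOY_MALFORMED (-2)
+
+/* Length of an untrusted list.  The hare advances one cell per step,
+   the tortoise every other step; if they meet, the list is circular.  */
+static long
+toy_external_list_length (struct toy_cell *list)
+@{
+  struct toy_cell *tortoise = list;
+  struct toy_cell *hare = list;
+  long len = 0;
+
+  while (hare != NULL)
+    @{
+      if (!hare->is_cons)
+        return TOY_MALFORMED;   /* not nil-terminated */
+      hare = hare->cdr;
+      len++;
+      if (len % 2 == 0)
+        @{
+          tortoise = tortoise->cdr;
+          if (hare == tortoise)
+            return TOY_CIRCULAR;
+        @}
+    @}
+  return len;
+@}
+@end example
+
+A trusted-list loop in the style of @code{LIST_LOOP_@var{n}} would skip
+both checks and simply follow @code{cdr} pointers until it reaches nil.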
+
+@node Writing Lisp Primitives, Adding Global Lisp Variables, General Coding Rules, Rules When Writing New C Code
@section Writing Lisp Primitives
- Lisp primitives are Lisp functions implemented in C. The details of
+Lisp primitives are Lisp functions implemented in C. The details of
interfacing the C function so that Lisp can call it are handled by a few
C macros. The only way to really understand how to write new C code is
to read the source, but we can explain some things here.
- An example of a special form is the definition of @code{or}, from
+An example of a special form is the definition of @code{prog1}, from
@file{eval.c}. (An ordinary function would have the same general
appearance.)
@cindex garbage collection protection
@smallexample
@group
-DEFUN ("or", For, 0, UNEVALLED, 0, /*
-Eval args until one of them yields non-nil, then return that value.
-The remaining args are not evalled at all.
-If all args return nil, return nil.
+DEFUN ("prog1", Fprog1, 1, UNEVALLED, 0, /*
+Similar to `progn', but the value of the first form is returned.
+\(prog1 FIRST BODY...): All the arguments are evaluated sequentially.
+The value of FIRST is saved during evaluation of the remaining args,
+whose values are discarded.
*/
(args))
@{
/* This function can GC */
- Lisp_Object val = Qnil;
+ REGISTER Lisp_Object val, form, tail;
struct gcpro gcpro1;
- GCPRO1 (args);
+ val = Feval (XCAR (args));
- while (!NILP (args))
- @{
- val = Feval (XCAR (args));
- if (!NILP (val))
- break;
- args = XCDR (args);
- @}
+ GCPRO1 (val);
+
+ LIST_LOOP_3 (form, XCDR (args), tail)
+ Feval (form);
UNGCPRO;
return val;
@code{DEFUN} macro. Here is a template for them:
@example
-DEFUN (@var{lname}, @var{fname}, @var{min}, @var{max}, @var{interactive}, /*
-@var{docstring}
-*/
- (@var{arglist}) )
+@group
+DEFUN (@var{lname}, @var{fname}, @var{min_args}, @var{max_args}, @var{interactive}, /*
+@var{docstring}
+*/
+ (@var{arglist}))
+@end group
@end example
@table @var
@item lname
This string is the name of the Lisp symbol to define as the function
-name; in the example above, it is @code{"or"}.
+name; in the example above, it is @code{"prog1"}.
@item fname
This is the C function name for this function. This is the name that is
used in C code for calling the function. The name is, by convention,
@samp{F} prepended to the Lisp name, with all dashes (@samp{-}) in the
Lisp name changed to underscores. Thus, to call this function from C
-code, call @code{For}. Remember that the arguments are of type
+code, call @code{Fprog1}. Remember that the arguments are of type
@code{Lisp_Object}; various macros and functions for creating values of
type @code{Lisp_Object} are declared in the file @file{lisp.h}.
create the symbol and store the subr object as its definition. The C
variable name of this structure is always @samp{S} prepended to the
@var{fname}. You hardly ever need to be aware of the existence of this
-structure.
+structure, since @code{DEFUN} plus @code{DEFSUBR} takes care of all the
+details.
-@item min
+@item min_args
This is the minimum number of arguments that the function requires. The
-function @code{or} allows a minimum of zero arguments.
+function @code{prog1} allows a minimum of one argument.
-@item max
+@item max_args
This is the maximum number of arguments that the function accepts, if
there is a fixed maximum. Alternatively, it can be @code{UNEVALLED},
indicating a special form that receives unevaluated arguments, or
@code{MANY}, indicating an unlimited number of evaluated arguments (the
-equivalent of @code{&rest}). Both @code{UNEVALLED} and @code{MANY} are
-macros. If @var{max} is a number, it may not be less than @var{min} and
-it may not be greater than 8. (If you need to add a function with
-more than 8 arguments, either use the @code{MANY} form or edit the
-definition of @code{DEFUN} in @file{lisp.h}. If you do the latter,
-make sure to also add another clause to the switch statement in
-@code{primitive_funcall().})
+C equivalent of @code{&rest}). Both @code{UNEVALLED} and @code{MANY}
+are macros. If @var{max_args} is a number, it may not be less than
+@var{min_args} and it may not be greater than 8. (If you need to add a
+function with more than 8 arguments, use the @code{MANY} form. Resist
+the urge to edit the definition of @code{DEFUN} in @file{lisp.h}. If
+you do it anyway, make sure to also add another clause to the switch
+statement in @code{primitive_funcall()}.)
@item interactive
This is an interactive specification, a string such as might be used as
the argument of @code{interactive} in a Lisp function. In the case of
-@code{or}, it is 0 (a null pointer), indicating that @code{or} cannot be
-called interactively. A value of @code{""} indicates a function that
-should receive no arguments when called interactively.
+@code{prog1}, it is 0 (a null pointer), indicating that @code{prog1}
+cannot be called interactively. A value of @code{""} indicates a
+function that should receive no arguments when called interactively.
@item docstring
This is the documentation string. It is written just like a
documentation strings, is very particular about what it looks for, and
will not properly extract the doc string if it's not in this exact format.
-You are free to put the various arguments to @code{DEFUN} on separate
-lines to avoid overly long lines. However, make sure to put the
-comment-start characters for the doc string on the same line as the
-interactive specification, and put a newline directly after them (and
-before the comment-end characters).
+In order to make both @file{etags} and @file{make-docfile} happy, make
+sure of three things. The @code{DEFUN} line must contain the
+@var{lname} and @var{fname}. The comment-start characters for the doc
+string must be on the same line as the interactive specification. A
+newline must directly follow them, and another must directly precede
+the comment-end characters.
@item arglist
This is the comma-separated list of arguments to the C function. For a
function with a fixed maximum number of arguments, provide a C argument
for each Lisp argument. In this case, unlike regular C functions, the
types of the arguments are not declared; they are simply always of type
-@code{Lisp_Object}.
+@code{Lisp_Object}.
The names of the C arguments will be used as the names of the arguments
to the Lisp primitive as displayed in its documentation, modulo the same
@code{dirname}) to be used as argument names without compiler warnings
or errors.
-A Lisp function with @w{@var{max} = @code{UNEVALLED}} is a
+A Lisp function with @w{@var{max_args} = @code{UNEVALLED}} is a
@w{@dfn{special form}}; its arguments are not evaluated. Instead it
receives one argument of type @code{Lisp_Object}, a (Lisp) list of the
unevaluated arguments, conventionally named @code{(args)}.
When a Lisp function has no upper limit on the number of arguments,
-specify @w{@var{max} = @code{MANY}}. In this case its implementation in
+specify @w{@var{max_args} = @code{MANY}}. In this case its implementation in
C actually receives exactly two arguments: the number of Lisp arguments
(an @code{int}) and the address of a block containing their values (a
@w{@code{Lisp_Object *}}). In this case only are the C types specified
@end table
- Within the function @code{For} itself, note the use of the macros
+Within the function @code{Fprog1} itself, note the use of the macros
@code{GCPRO1} and @code{UNGCPRO}. @code{GCPRO1} is used to ``protect''
a variable from garbage collection---to inform the garbage collector
-that it must look in that variable and regard its contents as an
-accessible object. This is necessary whenever you call @code{Feval} or
-anything that can directly or indirectly call @code{Feval} (this
-includes the @code{QUIT} macro!). At such a time, any Lisp object that
-you intend to refer to again must be protected somehow. @code{UNGCPRO}
-cancels the protection of the variables that are protected in the
-current function. It is necessary to do this explicitly.
-
- The macro @code{GCPRO1} protects just one local variable. If you want
+that it must look in that variable and regard the object pointed at by
+its contents as an accessible object. This is necessary whenever you
+call @code{Feval} or anything that can directly or indirectly call
+@code{Feval} (this includes the @code{QUIT} macro!). At such a time,
+any Lisp object that you intend to refer to again must be protected
+somehow. @code{UNGCPRO} cancels the protection of the variables that
+are protected in the current function. It is necessary to do this
+explicitly.
+
+The macro @code{GCPRO1} protects just one local variable. If you want
to protect two, use @code{GCPRO2} instead; repeating @code{GCPRO1} will
not work. Macros @code{GCPRO3} and @code{GCPRO4} also exist.
- These macros implicitly use local variables such as @code{gcpro1}; you
+These macros implicitly use local variables such as @code{gcpro1}; you
must declare these explicitly, with type @code{struct gcpro}. Thus, if
you use @code{GCPRO2}, you must declare @code{gcpro1} and @code{gcpro2}.
@cindex caller-protects (@code{GCPRO} rule)
- Note also that the general rule is @dfn{caller-protects}; i.e. you
-are only responsible for protecting those Lisp objects that you create.
-Any objects passed to you as parameters should have been protected
-by whoever created them, so you don't in general have to protect them.
-@code{For} is an exception; it protects its parameters to provide
-extra assurance against Lisp primitives elsewhere that are incorrectly
-written, and against malicious self-modifying code. There are a few
-other standard functions that also do this.
-
-@code{GCPRO}ing is perhaps the trickiest and most error-prone part
-of XEmacs coding. It is @strong{extremely} important that you get this
+Note also that the general rule is @dfn{caller-protects}; i.e. you are
+only responsible for protecting those Lisp objects that you create. Any
+objects passed to you as arguments should have been protected by whoever
+created them, so you don't in general have to protect them.
+
+In particular, the arguments to any Lisp primitive are always
+automatically @code{GCPRO}ed when it is called ``normally'' from Lisp
+code or bytecode. Only a few Lisp primitives that are called frequently
+from C code, such as @code{Fprogn}, protect their arguments as a service
+to their callers. You don't need to protect your arguments when writing
+a new @code{DEFUN}.
+
+@code{GCPRO}ing is perhaps the trickiest and most error-prone part of
+XEmacs coding. It is @strong{extremely} important that you get this
right and use a great deal of discipline when writing this code.
@xref{GCPROing, ,@code{GCPRO}ing}, for full details on how to do this.
- What @code{DEFUN} actually does is declare a global structure of
-type @code{Lisp_Subr} whose name begins with capital @samp{SF} and
-which contains information about the primitive (e.g. a pointer to the
+What @code{DEFUN} actually does is declare a global structure of type
+@code{Lisp_Subr} whose name begins with capital @samp{SF} and which
+contains information about the primitive (e.g. a pointer to the
function, its minimum and maximum allowed arguments, a string describing
-its Lisp name); @code{DEFUN} then begins a normal C function
-declaration using the @code{F...} name. The Lisp subr object that is
-the function definition of a primitive (i.e. the object in the function
-slot of the symbol that names the primitive) actually points to this
-@samp{SF} structure; when @code{Feval} encounters a subr, it looks in the
+its Lisp name); @code{DEFUN} then begins a normal C function declaration
+using the @code{F...} name. The Lisp subr object that is the function
+definition of a primitive (i.e. the object in the function slot of the
+symbol that names the primitive) actually points to this @samp{SF}
+structure; when @code{Feval} encounters a subr, it looks in the
structure to find out how to call the C function.
- Defining the C function is not enough to make a Lisp primitive
+Defining the C function is not enough to make a Lisp primitive
available; you must also create the Lisp symbol for the primitive (the
symbol is @dfn{interned}; @pxref{Obarrays}) and store a suitable subr
object in its function cell. (If you don't do this, the primitive won't
DEFSUBR (@var{fname});
@end example
-@noindent
-Here @var{fname} is the name you used as the second argument to
+@noindent
+Here @var{fname} is the same name you used as the second argument to
@code{DEFUN}.
- This call to @code{DEFSUBR} should go in the @code{syms_of_*()}
-function at the end of the module. If no such function exists, create
-it and make sure to also declare it in @file{symsinit.h} and call it
-from the appropriate spot in @code{main()}. @xref{General Coding
-Rules}.
+This call to @code{DEFSUBR} should go in the @code{syms_of_*()} function
+at the end of the module. If no such function exists, create it and
+make sure to also declare it in @file{symsinit.h} and call it from the
+appropriate spot in @code{main()}. @xref{General Coding Rules}.
- Note that C code cannot call functions by name unless they are defined
+Note that C code cannot call functions by name unless they are defined
in C. The way to call a function written in Lisp from C is to use
@code{Ffuncall}, which embodies the Lisp function @code{funcall}. Since
the Lisp function @code{funcall} accepts an unlimited number of
pass to it. Since @code{Ffuncall} can call the evaluator, you must
protect pointers from garbage collection around the call to
@code{Ffuncall}. (However, @code{Ffuncall} explicitly protects all of
-its parameters, so you don't have to protect any pointers passed
-as parameters to it.)
+its parameters, so you don't have to protect any pointers passed as
+parameters to it.)
- The C functions @code{call0}, @code{call1}, @code{call2}, and so on,
+The C functions @code{call0}, @code{call1}, @code{call2}, and so on,
provide handy ways to call a Lisp function conveniently with a fixed
number of arguments. They work by calling @code{Ffuncall}.
- @file{eval.c} is a very good file to look through for examples;
-@file{lisp.h} contains the definitions for some important macros and
+@file{eval.c} is a very good file to look through for examples;
+@file{lisp.h} contains the definitions for important macros and
functions.
-@node Adding Global Lisp Variables
+@node Adding Global Lisp Variables, Coding for Mule, Writing Lisp Primitives, Rules When Writing New C Code
@section Adding Global Lisp Variables
- Global variables whose names begin with @samp{Q} are constants whose
+Global variables whose names begin with @samp{Q} are constants whose
value is a symbol of a particular name. The name of the variable should
be derived from the name of the symbol using the same rules as for Lisp
primitives. These variables are initialized using a call to
Lisp object, and you will be the one who's unhappy when you can't figure
out how your variable got overwritten.
-@node Coding for Mule
+@node Coding for Mule, Techniques for XEmacs Developers, Adding Global Lisp Variables, Rules When Writing New C Code
@section Coding for Mule
@cindex Coding for Mule
@menu
* Character-Related Data Types::
* Working With Character and Byte Positions::
-* Conversion of External Data::
+* Conversion to and from External Data::
* General Guidelines for Writing Mule-Aware Code::
* An Example of Mule-Aware Code::
@end menu
-@node Character-Related Data Types
+@node Character-Related Data Types, Working With Character and Byte Positions, Coding for Mule, Coding for Mule
@subsection Character-Related Data Types
-First, we will list the basic character-related datatypes used by
-XEmacs. Note that the separate @code{typedef}s are not required for the
-code to work (all of them boil down to @code{unsigned char} or
+First, let's review the basic character-related datatypes used by
+XEmacs. Note that the separate @code{typedef}s are not mandatory in the
+current implementation (all of them boil down to @code{unsigned char} or
+an integer type such as @code{int} or @code{EMACS_INT}), but they improve
+clarity of code a great deal, because one
glance at the declaration can tell the intended use of the variable.
The data representing the text in a buffer or string is logically a set
of @code{Bufbyte}s.
-XEmacs does not work with character formats all the time; when reading
-characters from the outside, it decodes them to an internal format, and
-likewise encodes them when writing. @code{Bufbyte} (in fact
+XEmacs does not work with the same character formats all the time. When
+reading characters from the outside, it decodes them to an internal
+format, and it likewise encodes them when writing. @code{Bufbyte} (in
+fact @code{unsigned char}) is the basic unit of the internal format of
+XEmacs buffers and strings. A @code{Bufbyte *} is the type that points
+at text encoded in the variable-width internal encoding.
One character can correspond to one or more @code{Bufbyte}s. In the
-current implementation, an ASCII character is represented by the same
-@code{Bufbyte}, and extended characters are represented by a sequence of
-@code{Bufbyte}s.
+current Mule implementation, an ASCII character is represented by the
+same @code{Bufbyte}, and other characters are represented by a sequence
+of two or more @code{Bufbyte}s.
-Without Mule support, a @code{Bufbyte} is equivalent to an
-@code{Emchar}.
+Without Mule support, there are exactly 256 characters, implicitly
+Latin-1. Each character is represented using one @code{Bufbyte}, so
+there is a one-to-one correspondence between @code{Bufbyte}s and
+@code{Emchar}s.
@item Bufpos
@itemx Charcount
+@cindex Bufpos
+@cindex Charcount
A @code{Bufpos} represents a character position in a buffer or string.
A @code{Charcount} represents a number (count) of characters.
Logically, subtracting two @code{Bufpos} values yields a
@code{Charcount} value. Although all of these are @code{typedef}ed to
-@code{int}, we use them in preference to @code{int} to make it clear
-what sort of position is being used.
+@code{EMACS_INT}, we use them in preference to @code{EMACS_INT} to make
+it clear what sort of position is being used.
@code{Bufpos} and @code{Charcount} values are the only ones that are
ever visible to Lisp.
@item Bytind
@itemx Bytecount
+@cindex Bytind
+@cindex Bytecount
A @code{Bytind} represents a byte position in a buffer or string. A
-@code{Bytecount} represents the distance between two positions in bytes.
+@code{Bytecount} represents the distance between two positions, in bytes.
The relationship between @code{Bytind} and @code{Bytecount} is the same
as the relationship between @code{Bufpos} and @code{Charcount}.
@item Extbyte
@itemx Extcount
+@cindex Extbyte
+@cindex Extcount
When dealing with the outside world, XEmacs works with @code{Extbyte}s,
which are equivalent to @code{unsigned char}. Obviously, an
@code{Extcount} is the distance between two @code{Extbyte}s. Extbytes
and Extcounts are not all that frequent in XEmacs code.
@end table
-@node Working With Character and Byte Positions
+@node Working With Character and Byte Positions, Conversion to and from External Data, Character-Related Data Types, Coding for Mule
@subsection Working With Character and Byte Positions
Now that we have defined the basic character-related types, we can look
@table @code
@item MAX_EMCHAR_LEN
-This preprocessor constant is the maximum number of buffer bytes per
-Emacs character, i.e. the byte length of an @code{Emchar}. It is useful
-when allocating temporary strings to keep a known number of characters.
-For instance:
+@cindex MAX_EMCHAR_LEN
+This preprocessor constant is the maximum number of buffer bytes needed
+to represent an Emacs character in the variable-width internal encoding.
+It is useful when allocating temporary strings to keep a known number of
+characters. For instance:
@example
@group
...
@{
/* Allocate place for @var{cclen} characters. */
- Bufbyte *tmp_buf = (Bufbyte *)alloca (cclen * MAX_EMCHAR_LEN);
+ Bufbyte *buf = (Bufbyte *)alloca (cclen * MAX_EMCHAR_LEN);
...
@end group
@end example
If you followed the previous section, you can guess that, logically,
-multiplying a @code{Charcount} value with @code{MAX_EMCHAR_LEN} produces
+multiplying a @code{Charcount} value by @code{MAX_EMCHAR_LEN} produces
a @code{Bytecount} value.
In the current Mule implementation, @code{MAX_EMCHAR_LEN} equals 4.
Without Mule, it is 1.
@item charptr_emchar
-@item set_charptr_emchar
-@code{charptr_emchar} macro takes a @code{Bufbyte} pointer and returns
-the underlying @code{Emchar}. If it were a function, its prototype
-would be:
+@itemx set_charptr_emchar
+@cindex charptr_emchar
+@cindex set_charptr_emchar
+The @code{charptr_emchar} macro takes a @code{Bufbyte} pointer and
+returns the @code{Emchar} stored at that position. If it were a
+function, its prototype would be:
@example
Emchar charptr_emchar (Bufbyte *p);
@item INC_CHARPTR
@itemx DEC_CHARPTR
+@cindex INC_CHARPTR
+@cindex DEC_CHARPTR
These two macros increment and decrement a @code{Bufbyte} pointer,
-respectively. The pointer needs to be correctly positioned at the
-beginning of a valid character position.
+respectively. They will adjust the pointer by the appropriate number of
+bytes according to the byte length of the character stored there. Both
+macros assume that the memory address is located at the beginning of a
+valid character.
Without Mule support, @code{INC_CHARPTR (p)} and @code{DEC_CHARPTR (p)}
simply expand to @code{p++} and @code{p--}, respectively.
@item bytecount_to_charcount
+@cindex bytecount_to_charcount
Given a pointer to a text string and a length in bytes, return the
equivalent length in characters.
@end example
@item charcount_to_bytecount
+@cindex charcount_to_bytecount
Given a pointer to a text string and a length in characters, return the
equivalent length in bytes.
@end example
@item charptr_n_addr
+@cindex charptr_n_addr
Return a pointer to the beginning of the character offset @var{cc} (in
characters) from @var{p}.
@end example
@end table
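
To make the relationship between these operations concrete, here is a
minimal, self-contained sketch. It is @emph{not} the XEmacs
implementation: it uses UTF-8 as a stand-in variable-width encoding (the
real Mule internal encoding is different), replaces the macros with
hypothetical @code{demo_}-prefixed functions that return new pointers
instead of modifying their arguments, and assumes well-formed text
whose pointers start on character boundaries.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the XEmacs typedefs. */
typedef unsigned char Bufbyte;
typedef int Emchar;
typedef long Bytecount;
typedef long Charcount;

/* UTF-8 needs at most 4 bytes per character, like Mule's MAX_EMCHAR_LEN. */
#define DEMO_MAX_EMCHAR_LEN 4

/* Byte length of the character whose first byte is C (UTF-8 rules). */
static int
demo_char_len (Bufbyte c)
{
  return c < 0x80 ? 1 : c < 0xE0 ? 2 : c < 0xF0 ? 3 : 4;
}

/* Like charptr_emchar: decode the character starting at P. */
static Emchar
demo_charptr_emchar (const Bufbyte *p)
{
  int len = demo_char_len (p[0]);
  Emchar c;
  int i;

  assert ((p[0] & 0xC0) != 0x80);   /* P must start a character */
  if (len == 1)
    return p[0];
  c = p[0] & (0x7F >> len);         /* payload bits of the lead byte */
  for (i = 1; i < len; i++)
    c = (c << 6) | (p[i] & 0x3F);   /* 6 payload bits per trailing byte */
  return c;
}

/* Like set_charptr_emchar: store C at P and return the bytes written. */
static Bytecount
demo_set_charptr_emchar (Bufbyte *p, Emchar c)
{
  static const Bufbyte lead[5] = { 0, 0x00, 0xC0, 0xE0, 0xF0 };
  int len = c < 0x80 ? 1 : c < 0x800 ? 2 : c < 0x10000 ? 3 : 4;
  int i;

  for (i = len - 1; i > 0; i--, c >>= 6)
    p[i] = (Bufbyte) (0x80 | (c & 0x3F));   /* trailing bytes, last first */
  p[0] = (Bufbyte) (lead[len] | c);         /* remaining high bits */
  return len;
}

/* Like INC_CHARPTR: skip over one whole character. */
static const Bufbyte *
demo_inc_charptr (const Bufbyte *p)
{
  return p + demo_char_len (*p);
}

/* Like DEC_CHARPTR: back up over continuation bytes (10xxxxxx). */
static const Bufbyte *
demo_dec_charptr (const Bufbyte *p)
{
  do
    p--;
  while ((*p & 0xC0) == 0x80);
  return p;
}

static Charcount
demo_bytecount_to_charcount (const Bufbyte *p, Bytecount bc)
{
  const Bufbyte *end = p + bc;
  Charcount cc = 0;

  for (; p < end; p = demo_inc_charptr (p))
    cc++;
  return cc;
}

static Bytecount
demo_charcount_to_bytecount (const Bufbyte *p, Charcount cc)
{
  const Bufbyte *q = p;

  for (; cc > 0; cc--)
    q = demo_inc_charptr (q);
  return q - p;
}

static const Bufbyte *
demo_charptr_n_addr (const Bufbyte *p, Charcount cc)
{
  return p + demo_charcount_to_bytecount (p, cc);
}
```

Because UTF-8 also needs at most 4 bytes per character,
@code{DEMO_MAX_EMCHAR_LEN} plays the same role as @code{MAX_EMCHAR_LEN}
when sizing a temporary buffer. Stepping backwards works here only
because continuation bytes can be told apart from the first byte of a
character.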
-@node Conversion of External Data
-@subsection Conversion of External Data
+@node Conversion to and from External Data, General Guidelines for Writing Mule-Aware Code, Working With Character and Byte Positions, Coding for Mule
+@subsection Conversion to and from External Data
When an external function, such as a C library function, returns a
-@code{char} pointer, you should never treat it as @code{Bufbyte}. This
-is because these returned strings may contain 8bit characters which can
-be misinterpreted by XEmacs, and cause a crash. Instead, you should use
-a conversion macro. Many different conversion macros are defined in
-@file{buffer.h}, so I will try to order them logically, by direction and
-by format.
-
-Thus the basic conversion macros are @code{GET_CHARPTR_INT_DATA_ALLOCA}
-and @code{GET_CHARPTR_EXT_DATA_ALLOCA}. The former is used to convert
-external data to internal format, and the latter is used to convert the
-other way around. The arguments each of these receives are @var{ptr}
-(pointer to the text in external format), @var{len} (length of texts in
-bytes), @var{fmt} (format of the external text), @var{ptr_out} (lvalue
-to which new text should be copied), and @var{len_out} (lvalue which
-will be assigned the length of the internal text in bytes). The
-resulting text is stored to a stack-allocated buffer. If the text
-doesn't need changing, these macros will do nothing, except for setting
-@var{len_out}.
-
-Currently meaningful formats are @code{FORMAT_BINARY},
-@code{FORMAT_FILENAME}, @code{FORMAT_OS}, and @code{FORMAT_CTEXT}.
-
-The two macros above take many arguments which makes them unwieldy. For
-this reason, several convenience macros are defined with obvious
-functionality, but accepting less arguments:
+@code{char} pointer, you should almost never treat it as @code{Bufbyte}.
+This is because these returned strings may contain 8-bit characters
+that XEmacs can misinterpret, causing a crash. Likewise, when
+exporting a piece of internal text to the outside world, you should
+always convert it to an appropriate external encoding, lest the internal
+stuff (such as the infamous \201 characters) leak out.
+
+The interface to conversion between the internal and external
+representations of text consists of the numerous conversion macros
+defined in @file{buffer.h}. There used to be a fixed set of external
+formats supported by these macros, but now any coding system can be
+used. The coding system alias mechanism is used to create the following
+logical coding systems, which replace the fixed external formats. The
+@code{dontusethis-set-symbol-value-handler} mechanism was enhanced to
+make this possible (more work on that is needed, such as removing the
+@code{dontusethis-} prefix).
+
+@table @code
+@item Qbinary
+This is the simplest format and is what we use in the absence of a more
+appropriate format. This converts according to the @code{binary} coding
+system:
+
+@enumerate a
+@item
+On input, bytes 0--255 are converted into (implicitly Latin-1)
+characters 0--255. A non-Mule XEmacs doesn't really know about
+different character sets and the fonts to display them, so the bytes can
+be treated as text in different 1-byte encodings by simply setting the
+appropriate fonts. So in a sense, a non-Mule XEmacs is a multi-lingual
+editor if, for example, different fonts are used to display text in
+different buffers, faces, or windows. The specifier mechanism gives the
+user complete control over this kind of behavior.
+@item
+On output, characters 0--255 are converted into bytes 0--255 and other
+characters are converted into `~'.
+@end enumerate
+
+@item Qfile_name
+Format used for filenames. This is user-definable via either the
+@code{file-name-coding-system} or @code{pathname-coding-system} (now
+obsolete) variables.
+
+@item Qnative
+Format used for the external Unix environment---@code{argv[]}, stuff
+from @code{getenv()}, stuff from the @file{/etc/passwd} file, etc.
+Currently this is the same as Qfile_name. The two should be
+distinguished for clarity and possible future separation.
+
+@item Qctext
+Compound Text format. This is the standard X11 format used for data
+stored in properties, selections, and the like. This is an 8-bit
+no-lock-shift ISO 2022 coding system. Unlike @code{Qfile_name}, which
+is a user-definable alias, this is a real coding system.
+@end table
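
The @code{binary} rules described above are simple enough to sketch
directly. This is a minimal illustration, assuming characters are plain
@code{int} code points and external bytes are @code{unsigned char}; the
function names are hypothetical, and the real conversion is performed by
the coding system machinery rather than by code like this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the XEmacs types. */
typedef int Emchar;            /* a character code */
typedef unsigned char Extbyte; /* one byte of external data */

/* Binary decoding: byte N becomes character N (implicitly Latin-1).
   Returns the number of characters written to DST. */
static size_t
demo_binary_decode (const Extbyte *src, size_t len, Emchar *dst)
{
  size_t i;

  assert (len == 0 || (src != NULL && dst != NULL));
  for (i = 0; i < len; i++)
    dst[i] = src[i];
  return len;
}

/* Binary encoding: characters 0..255 become the byte with the same
   value; every other character becomes '~'.  Returns bytes written. */
static size_t
demo_binary_encode (const Emchar *src, size_t len, Extbyte *dst)
{
  size_t i;

  assert (len == 0 || (src != NULL && dst != NULL));
  for (i = 0; i < len; i++)
    dst[i] = (src[i] >= 0 && src[i] <= 255) ? (Extbyte) src[i] : '~';
  return len;
}
```

Encoding the characters @code{A}, U+00E9, and U+3042 yields the bytes
@code{0x41}, @code{0xE9}, and @code{~}: the third character has no
single-byte equivalent and is replaced.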
+
+There are two fundamental macros to convert between external and
+internal format.
+
+@code{TO_INTERNAL_FORMAT} converts external data to internal format, and
+@code{TO_EXTERNAL_FORMAT} converts the other way around. The arguments
+each of these receives are a source type, a source, a sink type, a sink,
+and a coding system (or a symbol naming a coding system).
+
+A typical call looks like
+@example
+TO_EXTERNAL_FORMAT (LISP_STRING, str, C_STRING_MALLOC, ptr, Qfile_name);
+@end example
+
+which means that the contents of the Lisp string @code{str} are written
+to a malloc'ed memory area, which will be pointed to by @code{ptr} once
+the conversion is done. The conversion will be done using the
+@code{file-name} coding system, which the user controls indirectly by
+setting or binding the variable @code{file-name-coding-system}.
+
+Some sources and sinks require two C variables to specify. We use some
+preprocessor magic to allow different source and sink types, and even
+different numbers of arguments to specify different types of sources and
+sinks.
+
+So we can have a call that looks like
+@example
+TO_INTERNAL_FORMAT (DATA, (ptr, len),
+ MALLOC, (ptr, len),
+ coding_system);
+@end example
+
+The parenthesized argument pairs are required to make the preprocessor
+magic work.
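
The parenthesized-pair trick can be shown in isolation. The following
is a hypothetical, much simplified analogue, not the actual macros in
@file{buffer.h}: the source type name is pasted onto a prefix to select
a per-type macro, and a @code{(ptr, len)} pair travels through the outer
macro as a single argument until the selected macro unpacks it. The
``conversion'' is just a byte copy into @code{malloc()}ed memory.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a conversion routine: copies LEN bytes of
   SRC into a new malloc()ed, '\0'-terminated buffer stored in *OUT and
   returns LEN.  No actual decoding or encoding takes place. */
static size_t
demo_convert_bytes (const char *src, size_t len, char **out)
{
  char *buf;

  assert (src != NULL || len == 0);
  buf = malloc (len + 1);
  if (buf == NULL)
    abort ();
  memcpy (buf, src, len);
  buf[len] = '\0';
  *out = buf;
  return len;
}

/* One macro per source type, each expanding to "pointer, length".
   (DEMO_SRC_C_STRING evaluates its argument twice; fine for a sketch.) */
#define DEMO_SRC_C_STRING(p)   (p), strlen (p)
#define DEMO_SRC_DATA(pair)    DEMO_PAIR_ARGS pair
#define DEMO_PAIR_ARGS(p, n)   (p), (n)

/* The source type is pasted onto DEMO_SRC_ to pick a macro above;
   a parenthesized (ptr, len) pair passes through as ONE argument. */
#define DEMO_CONVERT(src_type, src, sink) \
  demo_convert_bytes (DEMO_SRC_##src_type (src), &(sink))
```

For example, @code{DEMO_CONVERT (DATA, (raw, sizeof raw), out)} expands
to @code{demo_convert_bytes ((raw), (sizeof raw), &(out))}. With a
@code{DATA} source, an embedded '\0' byte is copied like any other byte,
whereas a @code{C_STRING} source stops at the first '\0' because its
length comes from @code{strlen()}.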
+
+Here are the different source and sink types:
@table @code
-@item GET_C_CHARPTR_EXT_DATA_ALLOCA
-@itemx GET_C_CHARPTR_INT_DATA_ALLOCA
-These two macros work on ``C char pointers'', which are zero-terminated,
-and thus do not need @var{len} or @var{len_out} parameters.
-
-@item GET_STRING_EXT_DATA_ALLOCA
-@itemx GET_C_STRING_EXT_DATA_ALLOCA
-These two macros work on Lisp strings, thus also not needing a @var{len}
-parameter. However, @code{GET_STRING_EXT_DATA_ALLOCA} still provides a
-@var{len_out} parameter. Note that for Lisp strings only one conversion
-direction makes sense.
-
-@item GET_C_CHARPTR_EXT_BINARY_DATA_ALLOCA
-@itemx GET_C_CHARPTR_EXT_FILENAME_DATA_ALLOCA
-@itemx GET_C_CHARPTR_EXT_CTEXT_DATA_ALLOCA
-@itemx ...
-These macros are a combination of the above, but with the @var{fmt}
-argument encoded into the name of the macro.
+@item @code{DATA, (ptr, len),}
+Input data is a fixed buffer of size @var{len} at address @var{ptr}.
+@item @code{ALLOCA, (ptr, len),}
+Output data is placed in an @code{alloca()}ed buffer of size @var{len}
+pointed to by @var{ptr}.
+@item @code{MALLOC, (ptr, len),}
+Output data is in a @code{malloc()}ed buffer of size @var{len} pointed
+to by @var{ptr}.
+@item @code{C_STRING_ALLOCA, ptr,}
+Equivalent to @code{ALLOCA, (ptr, len_ignored)} on output.
+@item @code{C_STRING_MALLOC, ptr,}
+Equivalent to @code{MALLOC, (ptr, len_ignored)} on output.
+@item @code{C_STRING, ptr,}
+Equivalent to @code{DATA, (ptr, strlen (ptr) + 1)} on input.
+@item @code{LISP_STRING, string,}
+Input or output is a @code{Lisp_Object} of type string.
+@item @code{LISP_BUFFER, buffer,}
+Output is written to @code{(point)} in Lisp buffer @var{buffer}.
+@item @code{LISP_LSTREAM, lstream,}
+Input or output is a @code{Lisp_Object} of type lstream.
+@item @code{LISP_OPAQUE, object,}
+Input or output is a @code{Lisp_Object} of type opaque.
@end table
-@node General Guidelines for Writing Mule-Aware Code
+Often, the data is being converted to a '\0'-byte-terminated string,
+which is the format required by many external system C APIs. For these
+purposes, a source type of @code{C_STRING} or a sink type of
+@code{C_STRING_ALLOCA} or @code{C_STRING_MALLOC} is appropriate.
+Otherwise, we should try to keep XEmacs '\0'-byte-clean, which means
+using (ptr, len) pairs.
+
+The sinks to be specified must be lvalues, unless they are the lisp
+object types @code{LISP_LSTREAM} or @code{LISP_BUFFER}.
+
+For the sink types @code{ALLOCA} and @code{C_STRING_ALLOCA}, the
+resulting text is stored in a stack-allocated buffer, which is
+automatically freed on returning from the function. However, the sink
+types @code{MALLOC} and @code{C_STRING_MALLOC} return @code{xmalloc()}ed
+memory. The caller is responsible for freeing this memory using
+@code{xfree()}.
+
+Note that it doesn't make sense for @code{LISP_STRING} to be a source
+for @code{TO_INTERNAL_FORMAT} or a sink for @code{TO_EXTERNAL_FORMAT}.
+You'll get an assertion failure if you try.
+
+
+@node General Guidelines for Writing Mule-Aware Code, An Example of Mule-Aware Code, Conversion to and from External Data, Coding for Mule
@subsection General Guidelines for Writing Mule-Aware Code
This section contains some general guidance on how to write Mule-aware
almost certainly do not need @code{Emchar *}.
@item Be careful not to confuse @code{Charcount}, @code{Bytecount}, and @code{Bufpos}.
-The whole point of using different types is to avoid confusion about the
-use of certain variables. Lest this effect be nullified, you need to be
+The whole point of using different types is to avoid confusion about the
+use of certain variables. Lest this effect be nullified, you need to be
careful about using the right types.
@item Always convert external data
It is extremely important to always convert external data, because
-XEmacs can crash if unexpected 8bit sequences are copied to its internal
+XEmacs can crash if unexpected 8-bit sequences are copied to its internal
buffers literally.
This means that when a system function, such as @code{readdir}, returns
-a string, you need to convert it using one of the conversion macros
+a string, you may need to convert it using one of the conversion macros
described in the previous section, before passing it further to Lisp.
-In the case of @code{readdir}, you would use the
-@code{GET_C_CHARPTR_INT_FILENAME_DATA_ALLOCA} macro.
+
+Actually, most of the basic system functions that accept '\0'-terminated
+string arguments, like @code{stat()} and @code{open()}, have been
+@strong{encapsulated} so that they @emph{always} do internal-to-external
+conversion themselves. This means you must pass internally encoded data,
+typically the @code{XSTRING_DATA} of a Lisp string, to these functions.
+This is actually a design bug, since it unexpectedly changes the
+semantics of the system functions. A better design would be to provide
+separate versions of these system functions that accept Lisp strings
+(as @code{Lisp_Object}s) in place of their current @code{char *}
+arguments.
+
+@example
+int stat_lisp (Lisp_Object path, struct stat *buf); /* Implement me */
+@end example
Also note that many internal functions, such as @code{make_string},
accept Bufbytes, which removes the need for them to convert the data
passed around in internal format.
@end table
-@node An Example of Mule-Aware Code
+@node An Example of Mule-Aware Code, , General Guidelines for Writing Mule-Aware Code, Coding for Mule
@subsection An Example of Mule-Aware Code
-As an example of Mule-aware code, we shall will analyze the
-@code{string} function, which conses up a Lisp string from the character
-arguments it receives. Here is the definition, pasted from
-@code{alloc.c}:
+As an example of Mule-aware code, we will analyze the @code{string}
+function, which conses up a Lisp string from the character arguments it
+receives. Here is the definition, pasted from @code{alloc.c}:
@example
@group
@code{set_charptr_emchar} stores it to storage, increasing @code{p} in
the process.
-Other instructing examples of correct coding under Mule can be found all
-over XEmacs code. For starters, I recommend
+Other instructive examples of correct coding under Mule can be found all
+over the XEmacs code. For starters, I recommend
@code{Fnormalize_menu_item_name} in @file{menubar.c}. After you have
understood this section of the manual and studied the examples, you can
proceed to write new Mule-aware code.
-@node Techniques for XEmacs Developers
+@node Techniques for XEmacs Developers, , Coding for Mule, Rules When Writing New C Code
@section Techniques for XEmacs Developers
+To make a purified XEmacs, do: @code{make puremacs}.
To make a quantified XEmacs, do: @code{make quantmacs}.
-You simply can't dump Quantified and Purified images. Run the image
-like so: @code{quantmacs -batch -l loadup.el run-temacs -q}.
+You simply can't dump Quantified and Purified images (unless using the
+portable dumper). Purify gets confused when XEmacs frees memory in one
+process that was allocated in a @emph{different} process on a different
+machine! Run it like so:
+@example
+temacs -batch -l loadup.el run-temacs @var{xemacs-args...}
+@end example
Before you go through the trouble, are you compiling with all
-debugging and error-checking off? If not try that first. Be warned
+debugging and error-checking off? If not, try that first. Be warned
that while Quantify is directly responsible for quite a few
optimizations which have been made to XEmacs, doing a run which
generates results which can be acted upon is not necessarily a trivial
commands: @code{quantify-start-recording-data},
@code{quantify-stop-recording-data} and @code{quantify-clear-data}.
-To get started debugging XEmacs, take a look at the @file{gdbinit} and
-@file{dbxrc} files in the @file{src} directory.
-@xref{Q2.1.15 - How to Debug an XEmacs problem with a debugger,,,
-xemacs-faq, XEmacs FAQ}.
+If you want to make XEmacs faster, target your favorite slow benchmark,
+run a profiler like Quantify, @code{gprof}, or @code{tcov}, and figure
+out where the cycles are going. Specific projects:
+
+@itemize @bullet
+@item
+Make the garbage collector faster. Figure out how to write an
+incremental garbage collector.
+@item
+Write a compiler that takes bytecode and spits out C code.
+Unfortunately, you will then need a C compiler and a more fully
+developed module system.
+@item
+Speed up redisplay.
+@item
+Speed up syntax highlighting. Maybe moving some of the syntax
+highlighting capabilities into C would make a difference.
+@item
+Implement tail recursion in Emacs Lisp (hard!).
+@end itemize
+
+Unfortunately, Emacs Lisp is slow, and is going to stay slow. Function
+calls in elisp are especially expensive. Iterating over a long list is
+going to be 30 times faster implemented in C than in Elisp.
+
+Heavily used small code fragments need to be fast. The traditional way
+to implement such code fragments in C is with macros. But C macros have
+well-known pitfalls:
+
+@itemize @bullet
+@item
+Macro arguments that are repeatedly evaluated may suffer from repeated
+side effects or suboptimal performance.
+@item
+Variable names used in macros may collide with the caller's variables,
+causing (at least) unwanted compiler warnings.
+@end itemize
+
+In order to solve these problems, and maintain statement semantics, one
+should use the @code{do @{ ... @} while (0)} trick while trying to
+reference macro arguments exactly once using local variables.
+
+Let's take a look at this poor macro definition:
+
+@example
+#define MARK_OBJECT(obj) \
+ if (!marked_p (obj)) mark_object (obj), did_mark = 1
+@end example
+
+This macro evaluates its argument twice, and also fails if used like
+this, because the @code{else} binds to the @code{if} inside the macro:
+@example
+ if (flag) MARK_OBJECT (obj); else do_something();
+@end example
+
+A much better definition is
+
+@example
+#define MARK_OBJECT(obj) do @{ \
+ Lisp_Object mo_obj = (obj); \
+ if (!marked_p (mo_obj)) \
+ @{ \
+ mark_object (mo_obj); \
+ did_mark = 1; \
+ @} \
+@} while (0)
+@end example
+
+Notice the elimination of double evaluation by using the local variable
+with the obscure name. Writing safe and efficient macros requires great
+care. One problem with macros cannot be portably worked around: since a
+C block has no value, a macro used as an expression rather than a
+statement cannot use the techniques just described to avoid
+multiple evaluation.
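
The two definitions of @code{MARK_OBJECT} above can be compared in a
small harness. In this sketch, objects are small integers rather than
@code{Lisp_Object}s, @code{marked_p} and @code{mark_object} are toy
stand-ins for the real functions, and @code{counted} is a hypothetical
helper that records how often an argument expression is evaluated.

```c
#include <assert.h>

/* Toy stand-ins: objects are small ints rather than Lisp_Objects. */
#define DEMO_NOBJS 8
static int marked[DEMO_NOBJS];
static int did_mark;
static int eval_count;    /* how often an argument expression ran */

static int
marked_p (int obj)
{
  assert (obj >= 0 && obj < DEMO_NOBJS);
  return marked[obj];
}

static void
mark_object (int obj)
{
  assert (obj >= 0 && obj < DEMO_NOBJS);
  marked[obj] = 1;
}

/* Hypothetical helper: an argument expression with a side effect. */
static int
counted (int obj)
{
  eval_count++;
  return obj;
}

/* The poor definition: OBJ is evaluated twice if it is unmarked. */
#define BAD_MARK_OBJECT(obj) \
  if (!marked_p (obj)) mark_object (obj), did_mark = 1

/* The better definition: OBJ is evaluated exactly once. */
#define GOOD_MARK_OBJECT(obj) do {  \
  int mo_obj = (obj);               \
  if (!marked_p (mo_obj))           \
    {                               \
      mark_object (mo_obj);         \
      did_mark = 1;                 \
    }                               \
} while (0)
```

For an unmarked object, @code{BAD_MARK_OBJECT (counted (1))} calls
@code{counted} twice, once in the test and once in the call to
@code{mark_object}, while @code{GOOD_MARK_OBJECT (counted (2))} calls it
exactly once.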
+
+In most cases where a macro has function semantics, an inline function
+is a better implementation technique. Modern compiler optimizers tend
+to inline functions even if they have no @code{inline} keyword, and
+configure magic ensures that the @code{inline} keyword can be safely
+used as an additional compiler hint. Inline functions used in a single
+@file{.c} file are easy. The function should already be defined as
+@code{static}; just add the @code{inline} keyword to the definition.
+
+@example
+inline static int
+heavily_used_small_function (int arg)
+@{
+ ...
+@}
+@end example
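
For a macro that must yield a value, the benefit of an inline function
is easy to observe. This sketch compares a hypothetical expression macro
@code{MAX_MACRO} with an equivalent @code{static inline} function;
@code{counted} is again a hypothetical helper that counts argument
evaluations.

```c
#include <assert.h>

static int eval_count;    /* how often an argument expression ran */

/* Hypothetical helper: an argument expression with a side effect. */
static int
counted (int x)
{
  eval_count++;
  return x;
}

/* Expression macro: the larger argument is evaluated a second time. */
#define MAX_MACRO(a, b) ((a) > (b) ? (a) : (b))

/* Inline replacement: each argument is evaluated exactly once, and the
   call is still an expression that yields a value. */
static inline int
max_int (int a, int b)
{
  return a > b ? a : b;
}
```

@code{MAX_MACRO (counted (3), counted (1))} calls @code{counted} three
times, because the winning argument is evaluated again after the
comparison; @code{max_int (counted (3), counted (1))} calls it twice and
can still be used anywhere an expression is allowed.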
+
+Inline functions in header files are trickier, because we would like to
+make the following optimization if the function is @emph{not} inlined
+(for example, because we're compiling for debugging). We would like the
+function to be defined externally exactly once, and each calling
+translation unit would create an external reference to the function,
+instead of including a definition of the inline function in the object
+code of every translation unit that uses it. This optimization is
+currently only available for gcc. But you don't have to worry about the
+trickiness; just define your inline functions in header files using this
+pattern:
+
+@example
+INLINE_HEADER int
+i_used_to_be_a_crufty_macro_but_look_at_me_now (int arg);
+INLINE_HEADER int
+i_used_to_be_a_crufty_macro_but_look_at_me_now (int arg)
+@{
+ ...
+@}
+@end example
+
+The declaration right before the definition is to prevent warnings when
+compiling with @code{gcc -Wmissing-declarations}. I consider issuing
+this warning for inline functions a gcc bug, but the gcc maintainers disagree.
+
+Every header which contains inline functions, either directly by using
+@code{INLINE_HEADER} or indirectly by using @code{DECLARE_LRECORD} must
+be added to @file{inline.c}'s includes to make the optimization
+described above work. (Optimization note: if all @code{INLINE_HEADER}
+functions are in fact inlined in all translation units, then the linker
+can just discard @code{inline.o}, since it contains only unreferenced
+code.)
+
+To get started debugging XEmacs, take a look at the @file{.gdbinit} and
+@file{.dbxrc} files in the @file{src} directory. See the section in the
+XEmacs FAQ on How to Debug an XEmacs problem with a debugger.
+
+After making source code changes, run @code{make check} to ensure that
+you haven't introduced any regressions. If you want to make XEmacs more
+reliable, please improve the test suite in @file{tests/automated}.
+
+Did you make sure you didn't introduce any new compiler warnings?
+
+Before submitting a patch, please try compiling at least once with
+
+@example
+configure --with-mule --with-union-type --error-checking=all
+@end example
Here are things to know when you create a new source file:
@itemize @bullet
@item
-All .c files should @code{#include <config.h>} first. Almost all .c
-files should @code{#include "lisp.h"} second.
+All @file{.c} files should @code{#include <config.h>} first. Almost all
+@file{.c} files should @code{#include "lisp.h"} second.
@item
-Generated header files should be included using the @code{<>} syntax,
-not the @code{""} syntax. The generated headers are:
+Generated header files should be included using the @code{#include <...>} syntax,
+not the @code{#include "..."} syntax. The generated headers are:
-config.h puresize-adjust.h sheap-adjust.h paths.h Emacs.ad.h
+@file{config.h}, @file{sheap-adjust.h}, @file{paths.h}, and @file{Emacs.ad.h}.
The basic rule is that you should assume builds using @code{--srcdir}
-and the @code{<>} syntax needs to be used when the to-be-included
-generated file is in a potentially different directory
-@emph{at compile time}.
-
-@item
-Header files should not include <config.h> and "lisp.h". It is the
-responsibility of the .c files that use it to do so.
-
-@item
-If the header uses INLINE, either directly or though DECLARE_LRECORD,
-then it must be added to inline.c's includes.
+and the @code{#include <...>} syntax needs to be used when the
+to-be-included generated file is in a potentially different directory
+@emph{at compile time}. The non-obvious C rule is that @code{#include "..."}
+means to search for the included file in the same directory as the
+including file, @emph{not} in the current directory.
@item
-Try compiling at least once with
+Header files should @emph{not} include @code{<config.h>} and
+@code{"lisp.h"}. It is the responsibility of the @file{.c} files that
+include a header to do so.
-@example
-gcc --with-mule --with-union-type --error-checking=all
-@end example
@end itemize
+Here is a checklist of things to do when creating a new lisp object type
+named @var{foo}:
+
+@enumerate
+@item
+create @file{@var{foo}.h}
+@item
+create @file{@var{foo}.c}
+@item
+add definitions of @code{syms_of_@var{foo}}, etc. to @file{@var{foo}.c}
+@item
+add declarations of @code{syms_of_@var{foo}}, etc. to @file{symsinit.h}
+@item
+add calls to @code{syms_of_@var{foo}}, etc. to @file{emacs.c}
+@item
+add definitions of macros like @code{CHECK_@var{FOO}} and
+@code{@var{FOO}P} to @file{@var{foo}.h}
+@item
+add the new type index to @code{enum lrecord_type}
+@item
+add a @code{DEFINE_LRECORD_IMPLEMENTATION} call to @file{@var{foo}.c}
+@item
+add an @code{INIT_LRECORD_IMPLEMENTATION} call to
+@code{syms_of_@var{foo}} in @file{@var{foo}.c}
+@end enumerate
+
@node A Summary of the Various XEmacs Modules, Allocation of Objects in XEmacs Lisp, Rules When Writing New C Code, Top
@chapter A Summary of the Various XEmacs Modules
* Modules for Internationalization::
@end menu
-@node Low-Level Modules
+@node Low-Level Modules, Basic Lisp Modules, A Summary of the Various XEmacs Modules, A Summary of the Various XEmacs Modules
@section Low-Level Modules
@example
- size name
-------- ---------------------
- 18150 config.h
+config.h
@end example
This is automatically generated from @file{config.h.in} based on the
@example
- 2347 paths.h
+paths.h
@end example
This is automatically generated from @file{paths.h.in} based on supplied
@example
- 47878 emacs.c
- 20239 signal.c
+emacs.c
+signal.c
@end example
@file{emacs.c} contains @code{main()} and other code that performs the most
@example
- 23458 unexaix.c
- 9893 unexalpha.c
- 11302 unexapollo.c
- 16544 unexconvex.c
- 31967 unexec.c
- 30959 unexelf.c
- 35791 unexelfsgi.c
- 3207 unexencap.c
- 7276 unexenix.c
- 20539 unexfreebsd.c
- 1153 unexfx2800.c
- 13432 unexhp9k3.c
- 11049 unexhp9k800.c
- 9165 unexmips.c
- 8981 unexnext.c
- 1673 unexsol2.c
- 19261 unexsunos4.c
+unexaix.c
+unexalpha.c
+unexapollo.c
+unexconvex.c
+unexec.c
+unexelf.c
+unexelfsgi.c
+unexencap.c
+unexenix.c
+unexfreebsd.c
+unexfx2800.c
+unexhp9k3.c
+unexhp9k800.c
+unexmips.c
+unexnext.c
+unexsol2.c
+unexsunos4.c
@end example
These modules contain code dumping out the XEmacs executable on various
@example
- 15715 crt0.c
- 1484 lastfile.c
- 1115 pre-crt0.c
+ecrt0.c
+lastfile.c
+pre-crt0.c
@end example
These modules are used in conjunction with the dump mechanism. On some
@example
- 14786 alloca.c
- 16678 free-hook.c
- 1692 getpagesize.h
- 41936 gmalloc.c
- 25141 malloc.c
- 3802 mem-limits.h
- 39011 ralloc.c
- 3436 vm-limit.c
+alloca.c
+free-hook.c
+getpagesize.h
+gmalloc.c
+malloc.c
+mem-limits.h
+ralloc.c
+vm-limit.c
@end example
These handle basic C allocation of memory. @file{alloca.c} is an emulation of
fixed now.)
@cindex relocating allocator
-@file{ralloc.c} is the @dfn{relocating allocator}. It provides functions
-similar to @code{malloc()}, @code{realloc()} and @code{free()} that allocate
-memory that can be dynamically relocated in memory. The advantage of
-this is that allocated memory can be shuffled around to place all the
-free memory at the end of the heap, and the heap can then be shrunk,
-releasing the memory back to the operating system. The use of this can
-be controlled with the configure option @code{--rel-alloc}; if enabled, memory allocated for
-buffers will be relocatable, so that if a very large file is visited and
-the buffer is later killed, the memory can be released to the operating
-system. (The disadvantage of this mechanism is that it can be very
-slow. On systems with the @code{mmap()} system call, the XEmacs version
-of @file{ralloc.c} uses this to move memory around without actually having to
-block-copy it, which can speed things up; but it can still cause
-noticeable performance degradation.)
+@file{ralloc.c} is the @dfn{relocating allocator}. It provides
+functions similar to @code{malloc()}, @code{realloc()} and @code{free()}
+that allocate memory that can be dynamically relocated in memory. The
+advantage of this is that allocated memory can be shuffled around to
+place all the free memory at the end of the heap, and the heap can then
+be shrunk, releasing the memory back to the operating system. The use
+of this can be controlled with the configure option @code{--rel-alloc};
+if enabled, memory allocated for buffers will be relocatable, so that if
+a very large file is visited and the buffer is later killed, the memory
+can be released to the operating system. (The disadvantage of this
+mechanism is that it can be very slow. On systems with the
+@code{mmap()} system call, the XEmacs version of @file{ralloc.c} uses
+this to move memory around without actually having to block-copy it,
+which can speed things up; but it can still cause noticeable performance
+degradation.)
@file{free-hook.c} contains some debugging functions for checking for invalid
arguments to @code{free()}.
@example
- 2659 blocktype.c
- 1410 blocktype.h
- 7194 dynarr.c
- 2671 dynarr.h
+blocktype.c
+blocktype.h
+dynarr.c
@end example
These implement a couple of basic C data types to facilitate memory
@example
- 2058 inline.c
+inline.c
@end example
This module is used in connection with inline functions (available in
@example
- 6489 debug.c
- 2267 debug.h
+debug.c
+debug.h
@end example
These functions provide a system for doing internal consistency checks
@example
- 1643 prefix-args.c
-@end example
-
-This is actually the source for a small, self-contained program
-used during building.
-
-
-@example
- 904 universe.h
+universe.h
@end example
This is not currently used.
-@node Basic Lisp Modules
+@node Basic Lisp Modules, Modules for Standard Editing Operations, Low-Level Modules, A Summary of the Various XEmacs Modules
@section Basic Lisp Modules
@example
- size name
-------- ---------------------
- 70167 emacsfns.h
- 6305 lisp-disunion.h
- 7086 lisp-union.h
- 54929 lisp.h
- 14235 lrecord.h
- 10728 symsinit.h
+lisp-disunion.h
+lisp-union.h
+lisp.h
+lrecord.h
+symsinit.h
@end example
These are the basic header files for all XEmacs modules. Each module
As a general rule, all typedefs should go into the typedefs section of
@file{lisp.h} rather than into a module-specific header file even if the
structure is defined elsewhere. This allows function prototypes that
-use the typedef to be placed into @file{emacsfns.h}. Forward structure
+use the typedef to be placed into other header files. Forward structure
declarations (i.e. a simple declaration like @code{struct foo;} where
the structure itself is defined elsewhere) should be placed into the
typedefs section as necessary.
@file{lrecord.h} contains the basic structures and macros that implement
-all record-type Lisp objects -- i.e. all objects whose type is a field
+all record-type Lisp objects---i.e. all objects whose type is a field
in their C structure, which includes all objects except the few most
basic ones.
-@file{emacsfns.h} contains prototypes for most of the exported functions
-in the various modules. (In particular, prototypes for Lisp primitives
-should always go into this header file. Prototypes for other functions
-can either go here or in a module-specific header file, depending on how
-general-purpose the function is and whether it has special-purpose
-argument types requiring definitions not in @file{lisp.h}.) All
-initialization functions are prototyped in @file{symsinit.h}.
+@file{lisp.h} contains prototypes for most of the exported functions in
+the various modules. Lisp primitives defined using @code{DEFUN} that
+need to be called by C code should be declared using @code{EXFUN}.
+Other function prototypes should be placed either into the appropriate
+section of @file{lisp.h} or into a module-specific header file,
+depending on how general-purpose the function is and whether it has
+special-purpose argument types requiring definitions not in
+@file{lisp.h}. All initialization functions are prototyped in
+@file{symsinit.h}.
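As a rough illustration of the @code{DEFUN}/@code{EXFUN} split, the sketch below declares a primitive once in header style and defines it separately. It assumes a much-simplified @code{Lisp_Object} (a plain @code{long}) and an @code{EXFUN} macro that turns an argument count into a prototype; the real macros in @file{lisp.h} do more, and the primitive @code{Fmax2} is hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-in: the real Lisp_Object is a tagged word or union. */
typedef long Lisp_Object;

/* Simplified EXFUN: paste the argument count onto EXFUN_ to select a
   parameter list, then emit an ordinary C prototype. */
#define EXFUN_0 void
#define EXFUN_1 Lisp_Object
#define EXFUN_2 Lisp_Object, Lisp_Object
#define EXFUN(sym, max_args) Lisp_Object sym (EXFUN_##max_args)

/* The declaration that would live in lisp.h, visible to all C files. */
EXFUN (Fmax2, 2);

/* The definition, standing in for what DEFUN produces in the defining
   module (the real DEFUN also creates the Lisp-visible subr object). */
Lisp_Object
Fmax2 (Lisp_Object a, Lisp_Object b)
{
  return a > b ? a : b;
}
```

Any C file that sees the @code{EXFUN} declaration can then call @code{Fmax2 (a, b)} directly, like any other C function.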
@example
- 120478 alloc.c
- 1029 pure.c
- 2506 puresize.h
+alloc.c
@end example
The large module @file{alloc.c} implements all of the basic allocation and
type-specific methods. This scheme is a fundamental principle of
object-oriented programming and is heavily used throughout XEmacs. The
great advantage of this is that it allows for a clean separation of
-functionality into different modules -- new classes of Lisp objects, new
+functionality into different modules---new classes of Lisp objects, new
event interfaces, new device types, new stream interfaces, etc. can be
added transparently without affecting code anywhere else in XEmacs.
Because the different subsystems are divided into general and specific
subtypes in the subsystem; this provides a great deal of robustness to
the XEmacs code.
-@cindex pure space
-@file{pure.c} contains the declaration of the @dfn{purespace} array.
-Pure space is a hack used to place some constant Lisp data into the code
-segment of the XEmacs executable, even though the data needs to be
-initialized through function calls. (See above in section VIII for more
-info about this.) During startup, certain sorts of data is
-automatically copied into pure space, and other data is copied manually
-in some of the basic Lisp files by calling the function @code{purecopy},
-which copies the object if possible (this only works in temacs, of
-course) and returns the new object. In particular, while temacs is
-executing, the Lisp reader automatically copies all compiled-function
-objects that it reads into pure space. Since compiled-function objects
-are large, are never modified, and typically comprise the majority of
-the contents of a compiled-Lisp file, this works well. While XEmacs is
-running, any attempt to modify an object that resides in pure space
-causes an error. Objects in pure space are never garbage collected --
-almost all of the time, they're intended to be permanent, and in any
-case you can't write into pure space to set the mark bits.
-
-@file{puresize.h} contains the declaration of the size of the pure space
-array. This depends on the optional features that are compiled in, any
-extra purespace requested by the user at compile time, and certain other
-factors (e.g. 64-bit machines need more pure space because their Lisp
-objects are larger). The smallest size that suffices should be used, so
-that there's no wasted space. If there's not enough pure space, you
-will get an error during the build process, specifying how much more
-pure space is needed.
-
-
-
-@example
- 122243 eval.c
- 2305 backtrace.h
+
+@example
+eval.c
+backtrace.h
@end example
This module contains all of the functions to handle the flow of control.
@example
- 64949 lread.c
+lread.c
@end example
This module implements the Lisp reader and the @code{read} function,
@example
- 40900 print.c
+print.c
@end example
This module implements the Lisp print mechanism and the @code{print}
@example
- 4518 general.c
- 60220 symbols.c
- 9966 symeval.h
+general.c
+symbols.c
+symeval.h
@end example
@file{symbols.c} implements the handling of symbols, obarrays, and
retrieving the values of symbols. Much of the code is devoted to
handling the special @dfn{symbol-value-magic} objects that define
-special types of variables -- this includes buffer-local variables,
+special types of variables---this includes buffer-local variables,
variable aliases, variables that forward into C variables, etc. This
module is initialized extremely early (right after @file{alloc.c}),
because it is here that the basic symbols @code{t} and @code{nil} are
@example
- 48973 data.c
- 25694 floatfns.c
- 71049 fns.c
+data.c
+floatfns.c
+fns.c
@end example
These modules implement the methods and standard Lisp primitives for all
@example
- 23555 bytecode.c
- 3358 bytecode.h
+bytecode.c
+bytecode.h
@end example
-@file{bytecode.c} implements the byte-code interpreter, and @file{bytecode.h} contains
-associated structures. Note that the byte-code @emph{compiler} is
-written in Lisp.
+@file{bytecode.c} implements the byte-code interpreter and
+compiled-function objects, and @file{bytecode.h} contains associated
+structures. Note that the byte-code @emph{compiler} is written in Lisp.
-@node Modules for Standard Editing Operations
+@node Modules for Standard Editing Operations, Editor-Level Control Flow Modules, Basic Lisp Modules, A Summary of the Various XEmacs Modules
@section Modules for Standard Editing Operations
@example
- size name
-------- ---------------------
- 82900 buffer.c
- 60964 buffer.h
- 6059 bufslots.h
+buffer.c
+buffer.h
+bufslots.h
@end example
@file{buffer.c} implements the @dfn{buffer} Lisp object type. This
@example
- 79888 insdel.c
- 6103 insdel.h
+insdel.c
+insdel.h
@end example
@file{insdel.c} contains low-level functions for inserting and deleting text in
@example
- 10975 marker.c
+marker.c
@end example
This module implements the @dfn{marker} Lisp object type, which
@example
- 193714 extents.c
- 15686 extents.h
+extents.c
+extents.h
@end example
This module implements the @dfn{extent} Lisp object type, which is like
@example
- 60155 editfns.c
+editfns.c
@end example
@file{editfns.c} contains the standard Lisp primitives for working with
@example
- 26081 callint.c
- 12577 cmds.c
- 2749 commands.h
+callint.c
+cmds.c
+commands.h
@end example
@cindex interactive
@example
- 194863 regex.c
- 18968 regex.h
- 79800 search.c
+regex.c
+regex.h
+search.c
@end example
@file{search.c} implements the Lisp primitives for searching for text in
@example
- 20476 doprnt.c
+doprnt.c
@end example
@file{doprnt.c} implements formatted-string processing, similar to
@example
- 15372 undo.c
+undo.c
@end example
This module implements the undo mechanism for tracking buffer changes.
-@node Editor-Level Control Flow Modules
+@node Editor-Level Control Flow Modules, Modules for the Basic Displayable Lisp Objects, Modules for Standard Editing Operations, A Summary of the Various XEmacs Modules
@section Editor-Level Control Flow Modules
@example
- size name
-------- ---------------------
- 84546 event-Xt.c
- 121483 event-stream.c
- 6658 event-tty.c
- 49271 events.c
- 14459 events.h
+event-Xt.c
+event-msw.c
+event-stream.c
+event-tty.c
+events-mod.h
+gpmevent.c
+gpmevent.h
+events.c
+events.h
@end example
These implement the handling of events (user input and other system
@example
- 129583 keymap.c
- 2621 keymap.h
+keymap.c
+keymap.h
@end example
@file{keymap.c} and @file{keymap.h} define the @dfn{keymap} Lisp object
@example
- 25212 keyboard.c
+cmdloop.c
@end example
-@file{keyboard.c} contains functions that implement the actual editor
-command loop -- i.e. the event loop that cyclically retrieves and
+@file{cmdloop.c} contains functions that implement the actual editor
+command loop---i.e. the event loop that cyclically retrieves and
dispatches events. This code is also rather tricky, just like
@file{event-stream.c}.
@example
- 9973 macros.c
- 1397 macros.h
+macros.c
+macros.h
@end example
These two modules contain the basic code for defining keyboard macros.
@example
- 23234 minibuf.c
+minibuf.c
@end example
This contains some miscellaneous code related to the minibuffer (most of
-@node Modules for the Basic Displayable Lisp Objects
+@node Modules for the Basic Displayable Lisp Objects, Modules for other Display-Related Lisp Objects, Editor-Level Control Flow Modules, A Summary of the Various XEmacs Modules
@section Modules for the Basic Displayable Lisp Objects
@example
- size name
-------- ---------------------
- 985 device-ns.h
- 6454 device-stream.c
- 1196 device-stream.h
- 9526 device-tty.c
- 8660 device-tty.h
- 43798 device-x.c
- 11667 device-x.h
- 26056 device.c
- 22993 device.h
+console-msw.c
+console-msw.h
+console-stream.c
+console-stream.h
+console-tty.c
+console-tty.h
+console-x.c
+console-x.h
+console.c
+console.h
+@end example
+
+These modules implement the @dfn{console} Lisp object type. A console
+can contain multiple display devices, but only one keyboard and mouse.
+Most of the time, a console contains exactly one device.
+
+Consoles are the top of a Lisp object inclusion hierarchy: consoles
+contain devices, which contain frames, which contain windows.
+
+
+
+@example
+device-msw.c
+device-tty.c
+device-x.c
+device.c
+device.h
@end example
These modules implement the @dfn{device} Lisp object type. This
@example
- 934 frame-ns.h
- 2303 frame-tty.c
- 69205 frame-x.c
- 5976 frame-x.h
- 68175 frame.c
- 15080 frame.h
+frame-msw.c
+frame-tty.c
+frame-x.c
+frame.c
+frame.h
@end example
Each device contains one or more frames in which objects (e.g. text) are
@example
- 160783 window.c
- 15974 window.h
+window.c
+window.h
@end example
@cindex window (in Emacs)
-@node Modules for other Display-Related Lisp Objects
+@node Modules for other Display-Related Lisp Objects, Modules for the Redisplay Mechanism, Modules for the Basic Displayable Lisp Objects, A Summary of the Various XEmacs Modules
@section Modules for other Display-Related Lisp Objects
@example
- size name
-------- ---------------------
- 54397 faces.c
- 15173 faces.h
+faces.c
+faces.h
@end example
@example
- 4961 bitmaps.h
- 954 glyphs-ns.h
- 105345 glyphs-x.c
- 4288 glyphs-x.h
- 72102 glyphs.c
- 16356 glyphs.h
+bitmaps.h
+glyphs-eimage.c
+glyphs-msw.c
+glyphs-msw.h
+glyphs-widget.c
+glyphs-x.c
+glyphs-x.h
+glyphs.c
+glyphs.h
@end example
@example
- 952 objects-ns.h
- 9971 objects-tty.c
- 1465 objects-tty.h
- 32326 objects-x.c
- 2806 objects-x.h
- 31944 objects.c
- 6809 objects.h
+objects-msw.c
+objects-msw.h
+objects-tty.c
+objects-tty.h
+objects-x.c
+objects-x.h
+objects.c
+objects.h
@end example
@example
- 57511 menubar-x.c
- 11243 menubar.c
+menubar-msw.c
+menubar-msw.h
+menubar-x.c
+menubar.c
+menubar.h
@end example
@example
- 25012 scrollbar-x.c
- 2554 scrollbar-x.h
- 26954 scrollbar.c
- 2778 scrollbar.h
+scrollbar-msw.c
+scrollbar-msw.h
+scrollbar-x.c
+scrollbar-x.h
+scrollbar.c
+scrollbar.h
@end example
@example
- 23117 toolbar-x.c
- 43456 toolbar.c
- 4280 toolbar.h
+toolbar-msw.c
+toolbar-x.c
+toolbar.c
+toolbar.h
@end example
@example
- 25070 font-lock.c
+font-lock.c
@end example
-This file provides C support for syntax highlighting -- i.e.
+This file provides C support for syntax highlighting---i.e.
highlighting different syntactic constructs of a source file in
different colors, for easy reading. The C support is provided so that
this is fast.
@example
- 32180 dgif_lib.c
- 3999 gif_err.c
- 10697 gif_lib.h
- 9371 gifalloc.c
+dgif_lib.c
+gif_err.c
+gif_lib.h
+gifalloc.c
@end example
These modules decode GIF-format image files, for use with glyphs.
+These files were removed due to Unisys patent infringement concerns.
-@node Modules for the Redisplay Mechanism
+@node Modules for the Redisplay Mechanism, Modules for Interfacing with the File System, Modules for other Display-Related Lisp Objects, A Summary of the Various XEmacs Modules
@section Modules for the Redisplay Mechanism
@example
- size name
-------- ---------------------
- 38692 redisplay-output.c
- 40835 redisplay-tty.c
- 65069 redisplay-x.c
- 234142 redisplay.c
- 17026 redisplay.h
+redisplay-output.c
+redisplay-msw.c
+redisplay-tty.c
+redisplay-x.c
+redisplay.c
+redisplay.h
@end example
These files provide the redisplay mechanism. As with many other
@example
- 14129 indent.c
+indent.c
@end example
This module contains various functions and Lisp primitives for
@example
- 14754 termcap.c
- 2141 terminfo.c
- 7253 tparam.c
+termcap.c
+terminfo.c
+tparam.c
@end example
These files contain functions for working with the termcap (BSD-style)
@example
- 10869 cm.c
- 5876 cm.h
+cm.c
+cm.h
@end example
These files provide some miscellaneous TTY-output functions and should
-@node Modules for Interfacing with the File System
+@node Modules for Interfacing with the File System, Modules for Other Aspects of the Lisp Interpreter and Object System, Modules for the Redisplay Mechanism, A Summary of the Various XEmacs Modules
@section Modules for Interfacing with the File System
@example
- size name
-------- ---------------------
- 43362 lstream.c
- 14240 lstream.h
+lstream.c
+lstream.h
@end example
These modules implement the @dfn{stream} Lisp object type. This is an
Similar to other subsystems in XEmacs, lstreams are separated into
generic functions and a set of methods for the different types of
lstreams. @file{lstream.c} provides implementations of many different
-types of streams; others are provided, e.g., in @file{mule-coding.c}.
+types of streams; others are provided, e.g., in @file{file-coding.c}.
@example
- 126926 fileio.c
+fileio.c
@end example
This implements the basic primitives for interfacing with the file
@example
- 10960 filelock.c
+filelock.c
@end example
This file provides functions for detecting clashes between different
@example
- 4527 filemode.c
+filemode.c
@end example
This file provides some miscellaneous functions that construct a
@example
- 22855 dired.c
- 2094 ndir.h
+dired.c
+ndir.h
@end example
These files implement the XEmacs interface to directory searching. This
@example
- 4311 realpath.c
+realpath.c
@end example
This file provides an implementation of the @code{realpath()} function
-@node Modules for Other Aspects of the Lisp Interpreter and Object System
+@node Modules for Other Aspects of the Lisp Interpreter and Object System, Modules for Interfacing with the Operating System, Modules for Interfacing with the File System, A Summary of the Various XEmacs Modules
@section Modules for Other Aspects of the Lisp Interpreter and Object System
@example
- size name
-------- ---------------------
- 22290 elhash.c
- 2454 elhash.h
- 12169 hash.c
- 3369 hash.h
+elhash.c
+elhash.h
+hash.c
+hash.h
@end example
-These files implement the @dfn{hashtable} Lisp object type.
+These files provide two implementations of hash tables. Files
@file{hash.c} and @file{hash.h} provide a generic C implementation of
-hash tables (which can stand independently of XEmacs), and
-@file{elhash.c} and @file{elhash.h} provide a Lisp interface onto the C
-hash tables using the hashtable Lisp object type.
-
+hash tables which can stand independently of XEmacs. Files
+@file{elhash.c} and @file{elhash.h} provide a separate implementation of
+hash tables that can store only Lisp objects, are aware of Lisp-specific
+concerns such as garbage collection, and implement the @dfn{hash-table}
+Lisp object type.
@example
- 95691 specifier.c
- 11167 specifier.h
+specifier.c
+specifier.h
@end example
This module implements the @dfn{specifier} Lisp object type. This is
@example
- 43058 chartab.c
- 6503 chartab.h
- 9918 casetab.c
+chartab.c
+chartab.h
+casetab.c
@end example
@file{chartab.c} and @file{chartab.h} implement the @dfn{char table}
@example
- 49593 syntax.c
- 10200 syntax.h
+syntax.c
+syntax.h
@end example
@cindex scanner
@example
- 10438 casefiddle.c
+casefiddle.c
@end example
This module implements various Lisp primitives for upcasing, downcasing
@example
- 20234 rangetab.c
+rangetab.c
@end example
This module implements the @dfn{range table} Lisp object type, which
@example
- 3201 opaque.c
- 2206 opaque.h
+opaque.c
+opaque.h
@end example
This module implements the @dfn{opaque} Lisp object type, an
with them, in case the block of memory contains other Lisp objects that
need to be marked for garbage-collection purposes. (If you need other
object methods, such as a finalize method, you should just go ahead and
-create a new Lisp object type -- it's not hard.)
+create a new Lisp object type---it's not hard.)
@example
- 8783 abbrev.c
+abbrev.c
@end example
This function provides a few primitives for doing dynamic abbreviation
@example
- 21934 doc.c
+doc.c
@end example
This function provides primitives for retrieving the documentation
@example
- 13197 md5.c
+md5.c
@end example
This function provides a Lisp primitive that implements the MD5 secure
-@node Modules for Interfacing with the Operating System
+@node Modules for Interfacing with the Operating System, Modules for Interfacing with X Windows, Modules for Other Aspects of the Lisp Interpreter and Object System, A Summary of the Various XEmacs Modules
@section Modules for Interfacing with the Operating System
@example
- size name
-------- ---------------------
- 33533 callproc.c
- 89697 process.c
- 4663 process.h
+callproc.c
+process.c
+process.h
@end example
These modules allow XEmacs to spawn and communicate with subprocesses
@example
- 136029 sysdep.c
- 5986 sysdep.h
+sysdep.c
+sysdep.h
@end example
These modules implement most of the low-level, messy operating-system
@example
- 3605 sysdir.h
- 6708 sysfile.h
- 2027 sysfloat.h
- 2918 sysproc.h
- 745 syspwd.h
- 7643 syssignal.h
- 6892 systime.h
- 12477 systty.h
- 3487 syswait.h
+sysdir.h
+sysfile.h
+sysfloat.h
+sysproc.h
+syspwd.h
+syssignal.h
+systime.h
+systty.h
+syswait.h
@end example
These header files provide consistent interfaces onto system-dependent
@example
- 7940 hpplay.c
- 10920 libsst.c
- 1480 libsst.h
- 3260 libst.h
- 15355 linuxplay.c
- 15849 nas.c
- 19133 sgiplay.c
- 15411 sound.c
- 7358 sunplay.c
+hpplay.c
+libsst.c
+libsst.h
+libst.h
+linuxplay.c
+nas.c
+sgiplay.c
+sound.c
+sunplay.c
@end example
These files implement the ability to play various sounds on some types
@example
- 44368 tooltalk.c
- 2137 tooltalk.h
+tooltalk.c
+tooltalk.h
@end example
These two modules implement an interface to the ToolTalk protocol, which
@example
- 22695 getloadavg.c
+getloadavg.c
@end example
This module provides the ability to retrieve the system's current load
@example
- 148520 energize.c
- 6896 energize.h
-@end example
-
-This module provides code to interface to an Energize server (when
-XEmacs is used as part of Lucid's Energize development environment) and
-provides some other Energize-specific functions. Much of the code in
-this module should be made more general-purpose and moved elsewhere, but
-is no longer very relevant now that Lucid is defunct. It also hasn't
-worked since version 19.12, since nobody has been maintaining it.
-
-
-
-@example
- 2861 sunpro.c
+sunpro.c
@end example
This module provides a small amount of code used internally at Sun to
@example
- 5548 broken-sun.h
- 3468 strcmp.c
- 2179 strcpy.c
- 1650 sunOS-fix.c
+broken-sun.h
+strcmp.c
+strcpy.c
+sunOS-fix.c
@end example
These files provide replacement functions and prototypes to fix numerous
@example
- 11669 hftctl.c
+hftctl.c
@end example
This module provides some terminal-control code necessary on versions of
-@example
- 1776 acldef.h
- 1602 chpdef.h
- 9032 uaf.h
- 105 vlimit.h
- 7145 vms-pp.c
- 1158 vms-pwd.h
- 26532 vmsfns.c
- 6038 vmsmap.c
- 695 vmspaths.h
- 17482 vmsproc.c
- 469 vmsproc.h
-@end example
-
-All of these files are used for VMS support, which has never worked in
-XEmacs.
-
-
-
-@example
- 28316 msdos.c
- 1472 msdos.h
-@end example
-
-These modules are used for MS-DOS support, which does not work in
-XEmacs.
-
-
-
-@node Modules for Interfacing with X Windows
+@node Modules for Interfacing with X Windows, Modules for Internationalization, Modules for Interfacing with the Operating System, A Summary of the Various XEmacs Modules
@section Modules for Interfacing with X Windows
@example
- size name
-------- ---------------------
- 3196 Emacs.ad.h
+Emacs.ad.h
@end example
A file generated from @file{Emacs.ad}, which contains XEmacs-supplied
@example
- 24242 EmacsFrame.c
- 6979 EmacsFrame.h
- 3351 EmacsFrameP.h
+EmacsFrame.c
+EmacsFrame.h
+EmacsFrameP.h
@end example
These modules implement an Xt widget class that encapsulates a frame.
@example
- 8178 EmacsManager.c
- 1967 EmacsManager.h
- 1895 EmacsManagerP.h
+EmacsManager.c
+EmacsManager.h
+EmacsManagerP.h
@end example
These modules implement a simple Xt manager (i.e. composite) widget
@example
- 13188 EmacsShell-sub.c
- 4588 EmacsShell.c
- 2180 EmacsShell.h
- 3133 EmacsShellP.h
+EmacsShell-sub.c
+EmacsShell.c
+EmacsShell.h
+EmacsShellP.h
@end example
These modules implement two Xt widget classes that are subclasses of
@example
- 9673 xgccache.c
- 1111 xgccache.h
+xgccache.c
+xgccache.h
@end example
These modules provide functions for maintenance and caching of GC's
@example
- 69181 xselect.c
+select-msw.c
+select-x.c
+select.c
+select.h
@end example
@cindex selections
@example
- 929 xintrinsic.h
- 1038 xintrinsicp.h
- 1579 xmmanagerp.h
- 1585 xmprimitivep.h
+xintrinsic.h
+xintrinsicp.h
+xmmanagerp.h
+xmprimitivep.h
@end example
These header files are similar in spirit to the @file{sys*.h} files and buffer
@example
- 16930 xmu.c
- 936 xmu.h
+xmu.c
+xmu.h
@end example
These files provide an emulation of the Xmu library for those systems
@example
- 4201 ExternalClient-Xlib.c
- 18083 ExternalClient.c
- 2035 ExternalClient.h
- 2104 ExternalClientP.h
- 22684 ExternalShell.c
- 1709 ExternalShell.h
- 1971 ExternalShellP.h
- 2478 extw-Xlib.c
- 1481 extw-Xlib.h
- 6565 extw-Xt.c
- 1430 extw-Xt.h
+ExternalClient-Xlib.c
+ExternalClient.c
+ExternalClient.h
+ExternalClientP.h
+ExternalShell.c
+ExternalShell.h
+ExternalShellP.h
+extw-Xlib.c
+extw-Xlib.h
+extw-Xt.c
+extw-Xt.h
@end example
@cindex external widget
-@example
- 31014 epoch.c
-@end example
-
-This file provides some additional, Epoch-compatible, functionality for
-interfacing to the X Window System.
-
-
-
-@node Modules for Internationalization
+@node Modules for Internationalization, , Modules for Interfacing with X Windows, A Summary of the Various XEmacs Modules
@section Modules for Internationalization
@example
- size name
-------- ---------------------
- 42836 mule-canna.c
- 16737 mule-ccl.c
- 41080 mule-charset.c
- 30176 mule-charset.h
- 146844 mule-coding.c
- 16588 mule-coding.h
- 6996 mule-mcpath.c
- 2899 mule-mcpath.h
- 57158 mule-wnnfns.c
- 3351 mule.c
+mule-canna.c
+mule-ccl.c
+mule-charset.c
+mule-charset.h
+file-coding.c
+file-coding.h
+mule-mcpath.c
+mule-mcpath.h
+mule-wnnfns.c
+mule.c
@end example
These files implement the MULE (Asian-language) support. Note that MULE
just Asian languages (although they are generally the most complicated
to support). This code is still in beta.
-@file{mule-charset.*} and @file{mule-coding.*} provide the heart of the
+@file{mule-charset.*} and @file{file-coding.*} provide the heart of the
XEmacs MULE support. @file{mule-charset.*} implements the @dfn{charset}
Lisp object type, which encapsulates a character set (an ordered one- or
two-dimensional set of characters, such as US ASCII or JISX0208 Japanese
-Kanji).
+Kanji).
-@file{mule-coding.*} implements the @dfn{coding-system} Lisp object
+@file{file-coding.*} implements the @dfn{coding-system} Lisp object
type, which encapsulates a method of converting between different
encodings. An encoding is a representation of a stream of characters,
possibly from multiple character sets, using a stream of bytes or words,
@example
- 9400 intl.c
+intl.c
@end example
This provides some miscellaneous internationalization code for
@example
- 1764 iso-wide.h
+iso-wide.h
@end example
This contains leftover code from an earlier implementation of
-@node Allocation of Objects in XEmacs Lisp, Events and the Event Loop, A Summary of the Various XEmacs Modules, Top
+@node Allocation of Objects in XEmacs Lisp, Dumping, A Summary of the Various XEmacs Modules, Top
@chapter Allocation of Objects in XEmacs Lisp
@menu
* Introduction to Allocation::
* Garbage Collection::
* GCPROing::
+* Garbage Collection - Step by Step::
* Integers and Characters::
* Allocation from Frob Blocks::
* lrecords::
* Low-level allocation::
-* Pure Space::
* Cons::
* Vector::
* Bit Vector::
* Symbol::
* Marker::
* String::
-* Bytecode::
+* Compiled Function::
@end menu
-@node Introduction to Allocation
+@node Introduction to Allocation, Garbage Collection, Allocation of Objects in XEmacs Lisp, Allocation of Objects in XEmacs Lisp
@section Introduction to Allocation
Emacs Lisp, like all Lisps, has garbage collection. This means that
have no corresponding Lisp primitives. Every Lisp object, though,
has at least one C primitive for creating it.
- Recall from section (VII) that a Lisp object, as stored in a 32-bit
-or 64-bit word, has a mark bit, a few tag bits, and a ``value'' that
-occupies the remainder of the bits. We can separate the different
-Lisp object types into four broad categories:
+ Recall from section (VII) that a Lisp object, as stored in a 32-bit or
+64-bit word, has a few tag bits, and a ``value'' that occupies the
+remainder of the bits. We can separate the different Lisp object types
+into three broad categories:
@itemize @bullet
@item
(a) Those for whom the value directly represents the contents of the
Lisp object. Only two types are in this category: integers and
characters. No special allocation or garbage collection is necessary
-for such objects. Lisp objects of these types do not need to be
+for such objects. Lisp objects of these types do not need to be
@code{GCPRO}ed.
@end itemize
- In the remaining three categories, the value is a pointer to a
-structure.
-
-@itemize @bullet
-@item
-@cindex frob block
-(b) Those for whom the tag directly specifies the type. Recall that
-there are only three tag bits; this means that at most five types can be
-specified this way. The most commonly-used types are stored in this
-format; this includes conses, strings, vectors, and sometimes symbols.
-With the exception of vectors, objects in this category are allocated in
-@dfn{frob blocks}, i.e. large blocks of memory that are subdivided into
-individual objects. This saves a lot on malloc overhead, since there
-are typically quite a lot of these objects around, and the objects are
-small. (A cons, for example, occupies 8 bytes on 32-bit machines -- 4
-bytes for each of the two objects it contains.) Vectors are individually
-@code{malloc()}ed since they are of variable size. (It would be
-possible, and desirable, to allocate vectors of certain small sizes out
-of frob blocks, but it isn't currently done.) Strings are handled
-specially: Each string is allocated in two parts, a fixed size structure
-containing a length and a data pointer, and the actual data of the
-string. The former structure is allocated in frob blocks as usual, and
-the latter data is stored in @dfn{string chars blocks} and is relocated
-during garbage collection to eliminate holes.
-@end itemize
-
In the remaining two categories, the type is stored in the object
itself. The tag for all such objects is the generic @dfn{lrecord}
-(Lisp_Record) tag. The first four bytes (or eight, for 64-bit machines)
-of the object's structure are a pointer to a structure that describes
-the object's type, which includes method pointers and a pointer to a
-string naming the type. Note that it's possible to save some space by
-using a one- or two-byte tag, rather than a four- or eight-byte pointer
-to store the type, but it's not clear it's worth making the change.
+(Lisp_Type_Record) tag. The first bytes of the object's structure are an
+integer (actually a char) identifying the object's type, together with
+some flags, in particular the mark bit used for garbage collection. A
+structure describing the type is accessible through
+@code{lrecord_implementation_table}, indexed by that integer. This
+structure includes the method pointers and a pointer to a string naming
+the type.
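The type-dispatch scheme just described can be sketched as follows. This is a simplified, hypothetical model: the header layout, the single @code{size_in_bytes} method, and the two example types are assumptions chosen for illustration, not the actual declarations in @file{lrecord.h}.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Per-type description: a name plus method pointers (only one here). */
struct lrecord_implementation
{
  const char *name;
  size_t (*size_in_bytes) (const void *obj);
};

/* Header at the start of every lrecord: a small type index plus flags,
   including the mark bit used by the garbage collector. */
struct lrecord_header
{
  unsigned int type : 8;     /* index into lrecord_implementation_table */
  unsigned int mark : 1;     /* set during the mark phase */
  unsigned int unused : 23;  /* room for other flags */
};

enum lrecord_type
{
  lrecord_type_float,
  lrecord_type_marker,
  lrecord_type_count
};

struct lisp_float
{
  struct lrecord_header lheader;  /* must be the first member */
  double value;
};

static size_t
float_size_in_bytes (const void *obj)
{
  (void) obj;
  return sizeof (struct lisp_float);
}

/* One entry per type, indexed by the header's type field. */
static const struct lrecord_implementation
lrecord_implementation_table[lrecord_type_count] =
{
  [lrecord_type_float]  = { "float",  float_size_in_bytes },
  [lrecord_type_marker] = { "marker", NULL },
};

/* Find the type description of any lrecord from its header alone. */
static const struct lrecord_implementation *
implementation_of (const void *obj)
{
  const struct lrecord_header *h = obj;
  assert (h->type < lrecord_type_count);
  return &lrecord_implementation_table[h->type];
}

/* True if OBJ's type description carries the given name. */
static int
lrecord_type_name_is (const void *obj, const char *name)
{
  return strcmp (implementation_of (obj)->name, name) == 0;
}
```

Because the mark bit lives in a separate bit-field, setting it during garbage collection leaves the type index, and therefore the table lookup, unchanged.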
@itemize @bullet
@item
-(c) Those lrecords that are allocated in frob blocks (see above). This
+(b) Those lrecords that are allocated in @dfn{frob blocks}, i.e. large
+blocks of memory that are subdivided into individual objects. This
includes the objects that are most common and relatively small, and
-includes floats, bytecodes, symbols (when not in category (b)), extents,
-events, and markers. With the cleanup of frob blocks done in 19.12,
-it's not terribly hard to add more objects to this category, but it's a
-bit trickier than adding an object type to type (d) (esp. if the object
-needs a finalization method), and is not likely to save much space
-unless the object is small and there are many of them. (In fact, if
-there are very few of them, it might actually waste space.)
-@item
-(d) Those lrecords that are individually @code{malloc()}ed. These are
+includes conses, strings, subrs, floats, compiled functions, symbols,
+extents, events, and markers. With the cleanup of frob blocks done in
+19.12, it's not terribly hard to add more objects to this category, but
+it's a bit trickier than adding an object type to type (c) (esp. if the
+object needs a finalization method), and is not likely to save much
+space unless the object is small and there are many of them. (In fact,
+if there are very few of them, it might actually waste space.)
+@item
+(c) Those lrecords that are individually @code{malloc()}ed. These are
called @dfn{lcrecords}. All other types are in this category. Adding a
new type to this category is comparatively easy, and all types added
since 19.8 (when the current allocation scheme was devised, by Richard
@end itemize
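The idea behind category (b) can be sketched in a few lines of C: fixed-size objects are carved out of large blocks, and objects freed by the sweep phase are threaded onto a free list, so @code{malloc()} is called once per block rather than once per object. The names, the cons-like object, and the block size below are hypothetical; the real code in @file{alloc.c} differs and handles many object types.

```c
#include <assert.h>
#include <stdlib.h>

struct cons_cell            /* stand-in for a small Lisp object */
{
  void *car;
  void *cdr;
};

/* A slot holds either a live cell or, once freed, a free-list link. */
union cons_slot
{
  struct cons_cell cell;
  union cons_slot *next_free;
};

#define CONSES_PER_BLOCK 1000   /* assumed block size */

struct cons_block
{
  struct cons_block *prev;      /* all blocks are chained for sweeping */
  union cons_slot slots[CONSES_PER_BLOCK];
};

static struct cons_block *current_block;
static int used_in_current_block = CONSES_PER_BLOCK; /* forces first malloc */
static union cons_slot *free_list;
static int blocks_allocated;

static struct cons_cell *
alloc_cons (void)
{
  if (free_list)                /* reuse a slot freed by the sweep */
    {
      union cons_slot *s = free_list;
      free_list = s->next_free;
      return &s->cell;
    }
  if (used_in_current_block == CONSES_PER_BLOCK)
    {
      /* Current block is full: one malloc() buys a whole block. */
      struct cons_block *b = malloc (sizeof *b);
      assert (b != NULL);
      b->prev = current_block;
      current_block = b;
      used_in_current_block = 0;
      blocks_allocated++;
    }
  return &current_block->slots[used_in_current_block++].cell;
}

/* What the sweep phase would do with an unmarked cell. */
static void
free_cons (struct cons_cell *c)
{
  union cons_slot *s = (union cons_slot *) c;
  s->next_free = free_list;
  free_list = s;
}
```

Storing the free-list link inside the dead object itself means the free list costs no extra memory.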
Note that bit vectors are a bit of a special case. They are
-simple lrecords as in category (c), but are individually @code{malloc()}ed
+simple lrecords as in category (b), but are individually @code{malloc()}ed
like vectors. You can basically view them as exactly like vectors
except that their type is stored in lrecord fashion rather than
in directly-tagged fashion.
- Note that FSF Emacs redesigned their object system in 19.29 to follow
-a similar scheme. However, given RMS's expressed dislike for data
-abstraction, the FSF scheme is not nearly as clean or as easy to
-extend. (FSF calls items of type (c) @code{Lisp_Misc} and items of type
-(d) @code{Lisp_Vectorlike}, with separate tags for each, although
-@code{Lisp_Vectorlike} is also used for vectors.)
-@node Garbage Collection
+@node Garbage Collection, GCPROing, Introduction to Allocation, Allocation of Objects in XEmacs Lisp
@section Garbage Collection
@cindex garbage collection
all vectors (which are chained in one big list), and all
lcrecords (which are likewise chained).
- Note that, when an object is marked, the mark has to occur
-inside of the object's structure, rather than in the 32-bit
-@code{Lisp_Object} holding the object's pointer; i.e. you can't just
-set the pointer's mark bit. This is because there may be many
-pointers to the same object. This means that the method of
-marking an object can differ depending on the type. The
-different marking methods are approximately as follows:
-
-@enumerate
-@item
-For conses, the mark bit of the car is set.
-@item
-For strings, the mark bit of the string's plist is set.
-@item
-For symbols when not lrecords, the mark bit of the
-symbol's plist is set.
-@item
-For vectors, the length is negated after adding 1.
-@item
-For lrecords, the pointer to the structure describing
-the type is changed (see below).
-@item
-Integers and characters do not need to be marked, since
-no allocation occurs for them.
-@end enumerate
-
- The details of this are in the @code{mark_object()} function.
+ Garbage collection can be invoked explicitly by calling
+@code{garbage-collect} but is also called automatically by @code{eval},
+once a certain amount of memory has been allocated since the last
+garbage collection (according to @code{gc-cons-threshold}).
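A minimal sketch of the automatic trigger, assuming a byte counter that every allocation increases and that the evaluator checks at a point where collecting is safe. The variable and function names and the threshold value are assumptions for illustration; at the Lisp level the threshold corresponds to @code{gc-cons-threshold}.

```c
#include <assert.h>
#include <stddef.h>

static size_t consing_since_gc;            /* bytes allocated since last GC */
static size_t gc_cons_threshold = 500000;  /* assumed value */
static int gc_count;

static void
garbage_collect (void)
{
  /* The mark and sweep phases would run here. */
  gc_count++;
  consing_since_gc = 0;
}

/* Every allocator reports how many bytes it handed out. */
static void
note_allocation (size_t nbytes)
{
  consing_since_gc += nbytes;
}

/* Checked by the evaluator before evaluating a form. */
static void
maybe_gc (void)
{
  if (consing_since_gc > gc_cons_threshold)
    garbage_collect ();
}
```

Checking only at such points, rather than inside every allocator, keeps collection away from code that holds unprotected intermediate objects.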
- Note that any code that operates during garbage collection has
-to be especially careful because of the fact that some objects
-may be marked and as such may not look like they normally do.
-In particular:
-@itemize @bullet
-Some object pointers may have their mark bit set. This will make
-@code{FOOBARP()} predicates fail. Use @code{GC_FOOBARP()} to deal with
-this.
-@item
-Even if you clear the mark bit, @code{FOOBARP()} will still fail
-for lrecords because the implementation pointer has been
-changed (see below). @code{GC_FOOBARP()} will correctly deal with
-this.
-@item
-Vectors have their size field munged, so anything that
-looks at this field will fail.
-@item
-Note that @code{XFOOBAR()} macros @emph{will} work correctly on object
-pointers with their mark bit set, because the logical shift operations
-that remove the tag also remove the mark bit.
-@end itemize
-
- Finally, note that garbage collection can be invoked explicitly
-by calling @code{garbage-collect} but is also called automatically
-by @code{eval}, once a certain amount of memory has been allocated
-since the last garbage collection (according to @code{gc-cons-threshold}).
-
-@node GCPROing
+@node GCPROing, Garbage Collection - Step by Step, Garbage Collection, Allocation of Objects in XEmacs Lisp
@section @code{GCPRO}ing
@code{GCPRO}ing is one of the ugliest and trickiest parts of Emacs
@enumerate
@item
-All objects that have been @code{staticpro()}d. This is used for
-any global C variables that hold Lisp objects. A call to
-@code{staticpro()} happens implicitly as a result of any symbols
-declared with @code{defsymbol()} and any variables declared with
-@code{DEFVAR_FOO()}. You need to explicitly call @code{staticpro()}
-(in the @code{vars_of_foo()} method of a module) for other global
-C variables holding Lisp objects. (This typically includes
-internal lists and such things.)
+All objects that have been @code{staticpro()}d or
+@code{staticpro_nodump()}d. This is used for any global C variables
+that hold Lisp objects. A call to @code{staticpro()} happens implicitly
+as a result of any symbols declared with @code{defsymbol()} and any
+variables declared with @code{DEFVAR_FOO()}. You need to explicitly
+call @code{staticpro()} (in the @code{vars_of_foo()} method of a module)
+for other global C variables holding Lisp objects. (This typically
+includes internal lists and such things.) Use
+@code{staticpro_nodump()} only in the rare cases when you do not want
+the pointed-to variable to be saved at dump time but rather recomputed
+at startup.
Note that @code{obarray} is one of the @code{staticpro()}d things.
Therefore, all functions and variables get marked through this.
in the next enclosing stack frame. Each @code{GCPRO}ed thing is an
lvalue, and the @code{struct gcpro} local variable contains a pointer to
this lvalue. This is why things will mess up badly if you don't pair up
-the @code{GCPRO}s and @code{UNGCPRO}s -- you will end up with
+the @code{GCPRO}s and @code{UNGCPRO}s---you will end up with
@code{gcprolist}s containing pointers to @code{struct gcpro}s or local
@code{Lisp_Object} variables in no-longer-active stack frames.
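The following minimal sketch in plain C makes the pairing rule concrete by modelling the @code{gcprolist} chain. It is modelled loosely on the description above and is not copied from @file{lisp.h}. The stand-ins are:
@itemize @bullet
@item
@code{Lisp_Object} is a plain @code{void *}.
@item
Only a hypothetical one-variable @code{GCPRO1}/@code{UNGCPRO} pair is shown.
@item
@code{count_protected()} and @code{example()} are made-up helpers.
@end itemize

```c
#include <assert.h>
#include <stddef.h>

typedef void *Lisp_Object;   /* stand-in for the real tagged type */

/* One link of the chain: the address of the protected locals plus the
   previous link, which lives in an enclosing stack frame. */
struct gcpro
{
  struct gcpro *next;
  Lisp_Object *var;          /* address of the first protected local */
  int nvars;                 /* number of locals starting at var */
};

struct gcpro *gcprolist = NULL;   /* a root for the mark phase */

/* Hypothetical one-variable versions; gcpro1 must be a local. */
#define GCPRO1(v)                                                 \
  (gcpro1.next = gcprolist, gcpro1.var = &(v), gcpro1.nvars = 1,  \
   gcprolist = &gcpro1)
#define UNGCPRO (gcprolist = gcpro1.next)

/* Count the locals the mark phase would visit through the chain. */
int
count_protected (void)
{
  int n = 0;
  for (struct gcpro *p = gcprolist; p != NULL; p = p->next)
    n += p->nvars;
  return n;
}

int
example (Lisp_Object arg)
{
  struct gcpro gcpro1;
  int inside;

  GCPRO1 (arg);              /* push: arg is now a root */
  inside = count_protected ();
  UNGCPRO;                   /* pop: must pair with GCPRO1 */
  return inside;
}
```

If the @code{UNGCPRO} in @code{example()} were omitted, @code{gcprolist} would keep pointing into a dead stack frame. That is exactly the failure described above.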
it obviates the need for @code{GCPRO}ing, and allows garbage collection
to happen at any point at all, such as during object allocation.
-@node Integers and Characters
+@node Garbage Collection - Step by Step, Integers and Characters, GCPROing, Allocation of Objects in XEmacs Lisp
+@section Garbage Collection - Step by Step
+@cindex garbage collection step by step
+
+@menu
+* Invocation::
+* garbage_collect_1::
+* mark_object::
+* gc_sweep::
+* sweep_lcrecords_1::
+* compact_string_chars::
+* sweep_strings::
+* sweep_bit_vectors_1::
+@end menu
+
+@node Invocation, garbage_collect_1, Garbage Collection - Step by Step, Garbage Collection - Step by Step
+@subsection Invocation
+@cindex garbage collection, invocation
+
+The first thing anyone should know about garbage collection is when and
+how the garbage collector is invoked. One might think that this could
+happen every time new memory is allocated, e.g.@: whenever new objects
+are created, but this is @emph{not} the case. Instead, the situation is
+as follows:
+
+The entry point of any process of garbage collection is an invocation
+of the function @code{garbage_collect_1} in file @code{alloc.c}. The
+invocation can occur @emph{explicitly} by calling the function
+@code{Fgarbage_collect} (in addition this function provides information
+about the freed memory), or can occur @emph{implicitly} in four different
+situations:
+@enumerate
+@item
+In function @code{main_1} in file @code{emacs.c}. This function is called
+at each startup of XEmacs. The garbage collection is invoked after all
+initial creations are completed, but only if the internal
+error-checking constant @code{ERROR_CHECK_GC} is defined.
+@item
+In function @code{disksave_object_finalization} in file
+@code{alloc.c}. The only purpose of this function is to clear from
+memory the objects that need not be stored with XEmacs when we dump out
+an executable. It is called only by @code{Fdump_emacs} and
+@code{Fdump_emacs_data} (both in @code{emacs.c}). The actual clearing
+is accomplished by making these objects unreachable and starting a
+garbage collection. The function is only used while building XEmacs.
+@item
+In function @code{Feval / eval} in file @code{eval.c}. Each time the
+well-known and frequently used function @code{eval} is called to
+evaluate a form, one of the first things that can happen is a call of
+@code{garbage_collect_1}. Three global variables control this:
+@code{consing_since_gc} (the amount of Lisp data, in bytes, allocated
+since the last garbage collection), @code{gc_cons_threshold} (the
+threshold after which a garbage collection occurs) and
+@code{always_gc}. If @code{always_gc} is set or if the threshold is
+exceeded, the garbage collection starts.
+@item
+In function @code{Ffuncall / funcall} in file @code{eval.c}. This
+function evaluates calls of elisp functions and triggers garbage
+collection in the same way as @code{Feval}.
+@end enumerate
+
+The upshot is that garbage collection can occur almost anywhere
+@code{Feval} or @code{Ffuncall} is used, either directly or through
+another function. Because calls to these two functions are hidden in
+various other functions, many calls to @code{garbage_collect_1} are not
+obvious and therefore come unexpectedly. Places worth remembering
+include various elisp special forms such as @code{or}, @code{and},
+@code{if}, @code{cond}, @code{while} and @code{setq}, miscellaneous
+@code{gui_item_...} functions, everything related to @code{eval}
+(@code{Feval_buffer}, @code{call0}, ...) and @code{Fsignal}. The latter
+is used to handle signals, for example the ones raised by every
+@code{QUIT} macro triggered after pressing @kbd{C-g}.
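The following sketch illustrates the implicit trigger. It assumes, as the names suggest, that @code{consing_since_gc} is a byte counter compared against @code{gc_cons_threshold}. The functions @code{maybe_gc()} and @code{note_allocation()} are hypothetical, the threshold value is arbitrary, and the stand-in collector only resets the counter.

```c
#include <assert.h>

/* Hypothetical globals named after the ones in the text. */
long consing_since_gc = 0;      /* assumed: bytes of Lisp data since last GC */
long gc_cons_threshold = 500000;
int always_gc = 0;
int gc_count = 0;               /* number of collections, for illustration */

/* Stand-in for the real collector: it only resets the counter. */
void
garbage_collect_1 (void)
{
  gc_count++;
  consing_since_gc = 0;
}

/* What each allocation is assumed to do: bump the counter. */
void
note_allocation (long nbytes)
{
  consing_since_gc += nbytes;
}

/* The check made near the start of eval/funcall in this sketch. */
void
maybe_gc (void)
{
  if (always_gc || consing_since_gc > gc_cons_threshold)
    garbage_collect_1 ();
}
```

Nothing is collected at allocation time. Memory use can overshoot the threshold until the next @code{eval} or @code{funcall} performs the check.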
+
+@node garbage_collect_1, mark_object, Invocation, Garbage Collection - Step by Step
+@subsection @code{garbage_collect_1}
+@cindex @code{garbage_collect_1}
+
+We can now describe exactly what happens after the invocation takes
+place.
+@enumerate
+@item
+There are several cases in which the garbage collector returns
+immediately: when we are already garbage collecting
+(@code{gc_in_progress}), when garbage collection is forbidden
+(@code{gc_currently_forbidden}), when we are currently displaying something
+(@code{in_display}) or when we are preparing for the armageddon of the
+whole system (@code{preparing_for_armageddon}).
+@item
+Next, the frame that receives all output produced during garbage
+collection is determined. To restore the display's state after the
+message has been shown, some data about the current cursor position
+has to be saved. The variables @code{pre_gc_cursor} and
+@code{cursor_changed} take care of that.
+@item
+The state of @code{gc_currently_forbidden} must be restored after
+the garbage collection, no matter what happens during the process. We
+accomplish this by @code{record_unwind_protect}ing the suitable function
+@code{restore_gc_inhibit} together with the current value of
+@code{gc_currently_forbidden}.
+@item
+If XEmacs is running as an interactive session, the next step is simply
+to show the garbage collector's cursor/message.
+@item
+The following steps are the intrinsic steps of the garbage collector,
+therefore @code{gc_in_progress} is set.
+@item
+For debugging purposes, it is possible to copy the current C stack
+frame. However, this seems to be a currently unused feature.
+@item
+Before actually starting to go over all live objects, references to
+objects that are no longer used are pruned. We only have to do this for events
+(@code{clear_event_resource}) and for specifiers
+(@code{cleanup_specifiers}).
+@item
+Now the mark phase begins and marks all accessible elements. To start
+from all slots that serve as roots of accessibility, the function
+@code{mark_object} is called for each root individually and marks all
+objects reachable from there. The roots are traversed in the following
+order:
+@itemize @bullet
+@item
+all constant symbols and static variables that are registered via
+@code{staticpro} in the dynarr @code{staticpros}.
+@xref{Adding Global Lisp Variables}.
+@item
+all Lisp objects that are created in C functions and that must be
+protected from being freed. They are registered in the global
+list @code{gcprolist}.
+@xref{GCPROing}.
+@item
+all local variables (i.e.@: their name fields @code{symbol} and old
+values @code{old_value}) that are bound during evaluation by the Lisp
+engine. They are stored in @code{specbinding} structs pushed on a stack
+called @code{specpdl}.
+@xref{Dynamic Binding; The specbinding Stack; Unwind-Protects}.
+@item
+all catch blocks that the Lisp engine encounters during evaluation
+create @code{catchtag} structs that are inserted in the list
+@code{catchlist}. Their tag (@code{tag}) and value (@code{val}) fields
+are freshly created objects and therefore have to be marked.
+@xref{Catch and Throw}.
+@item
+every function application pushes a new @code{backtrace} struct
+on the call stack of the Lisp engine (@code{backtrace_list}). The only
+parts that have to be marked are the function (@code{function}) and
+all its arguments (@code{args}).
+@xref{Evaluation}.
+@item
+all objects that are used by the redisplay engine that must not be freed
+are marked by a special function called @code{mark_redisplay} (in
+@code{redisplay.c}).
+@item
+all objects created for profiling purposes are allocated by C functions
+instead of using the Lisp allocation mechanisms. To keep them from being
+freed during the sweep phase, they also have to be marked manually.
+That is done by the function @code{mark_profiling_info}.
+@end itemize
+@item
+Hash tables are one of the special kinds of objects in XEmacs that can
+make use of a concept often called ``weak pointers''.
+In short, such pointers are not followed when the garbage collector
+determines which objects are live.
+Any object referenced only by weak pointers is collected
+anyway, and the reference to it is cleared. Hash tables use weak
+pointers in different patterns, which correspond to different types of
+hash tables, namely ``non-weak'', ``weak'', ``key-weak'' and
+``value-weak'' (internally also ``key-car-weak'' and ``value-car-weak'')
+hash tables, each clearing entries under different conditions. More
+information can be found in the documentation of the function
+@code{make-hash-table}.
+
+Because there are complicated dependency rules about when and what to
+mark while processing weak hash tables, the standard @code{marker}
+method only marks the entries of non-weak hash tables. As soon as
+a table has a weak component, its entries are ignored
+while marking. Instead, they are marked separately by the
+function @code{finish_marking_weak_hash_tables}. This function iterates
+over each hash table entry @code{hentries} for each weak hash table in
+@code{Vall_weak_hash_tables}. Depending on the type of a table, the
+appropriate action is performed.
+If a table is acting as @code{HASH_TABLE_KEY_WEAK} and a key is already
+marked, everything reachable from the corresponding @code{value}
+component is marked. If it is acting as @code{HASH_TABLE_VALUE_WEAK}
+and the value component is already marked, everything reachable from
+the @code{key} component is marked.
+If it is a @code{HASH_TABLE_KEY_CAR_WEAK} and the car
+of the key entry is already marked, we mark both the @code{key} and
+@code{value} components.
+Finally, if the table is of the type @code{HASH_TABLE_VALUE_CAR_WEAK}
+and the car of the value component is already marked, again both the
+@code{key} and the @code{value} components get marked.
+
+Similarly, there are lists with comparable properties, called weak
+lists. They come in several types, called
+@code{simple}, @code{assoc}, @code{key-assoc} and
+@code{value-assoc}. You can find further details about them in the
+documentation of the function @code{make-weak-list}. The scheme of their
+marking is similar: all weak lists are listed in @code{Vall_weak_lists},
+and we iterate over them. Marking advances until we hit an already
+marked pair; at that point we know that the rest of the list was
+already marked completely during a former run. Again, the work depends
+on the type of the weak list. If it is a @code{WEAK_LIST_SIMPLE}
+and the elem is marked, we mark the @code{cons} part. If it is a
+@code{WEAK_LIST_ASSOC} and the elem is either not a pair or a pair with
+both car and cdr marked, we mark the @code{cons} and the @code{elem}.
+If it is a @code{WEAK_LIST_KEY_ASSOC} and the elem is either not a pair
+or a pair with a marked car, we mark the @code{cons} and the
+@code{elem}. Finally, if it is a @code{WEAK_LIST_VALUE_ASSOC} and the
+elem is either not a pair or a pair with a marked cdr, we mark both
+the @code{cons} and the @code{elem}.
+
+Marking objects reachable from weak hash tables and weak lists can mark
+further objects, which may in turn require further marking of other
+weak objects. Therefore both finishing functions are called repeatedly
+for as long as they mark previously unmarked objects.
+
+@item
+After the special marking of the weak hash tables and weak lists is
+complete, all entries that point to objects that are going to be swept
+later are useless, and therefore have to be removed from the table or
+the list.
+
+The function @code{prune_weak_hash_tables} does the job for weak hash
+tables. Unmarked hash tables themselves are removed from the list
+@code{Vall_weak_hash_tables}. The other ones are treated more carefully
+by scanning over all entries and removing each entry whose @code{key}
+or @code{value} component is unmarked.
+
+The same idea applies to the weak lists. It is accomplished by
+@code{prune_weak_lists}: An unmarked list is pruned from
+@code{Vall_weak_lists} immediately. A marked list is treated more
+carefully by going over it and removing just the unmarked pairs.
+
+@item
+The function @code{prune_specifiers} checks all specifiers listed
+in @code{Vall_specifiers} and removes the unmarked ones from the list.
+
+@item
+All syntax tables are stored in a list called
+@code{Vall_syntax_tables}. The function @code{prune_syntax_tables} walks
+through it and unlinks the tables that are unmarked.
+
+@item
+Next comes the sweep phase proper, which is performed by the function
+@code{gc_sweep} (@pxref{gc_sweep}).
+@item
+Afterwards, all the variables related to garbage collection are
+reset. @code{consing_since_gc}, the allocation counter since the last
+garbage collection, is set back to 0, and @code{gc_in_progress} is
+cleared.
+@item
+In case the session is interactive, the displayed cursor and message are
+removed again.
+@item
+The state of @code{gc_currently_forbidden} is restored to its former
+value by unwinding the stack.
+@item
+A small memory reserve, reachable through @code{breathing_space}, is
+always held back. If this reserve has been used up, a new one is
+allocated before the function returns.
+@end enumerate
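The repeated marking of weak tables is a fixpoint computation. The sketch below is not the XEmacs code and covers only a key-weak table. Its simplifications:
@itemize @bullet
@item
It uses a hypothetical object type with a mark bit and a single outgoing reference.
@item
@code{mark()}, @code{finish_marking()} and @code{prune()} are illustrative names.
@end itemize
One pass may not be enough: an entry can become live only after a later entry has marked its key.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object: a mark bit and at most one outgoing reference. */
struct obj
{
  bool marked;
  struct obj *ref;
};

/* Mark O and everything reachable from it; return how many objects
   were newly marked. */
int
mark (struct obj *o)
{
  int n = 0;
  for (; o != NULL && !o->marked; o = o->ref)
    {
      o->marked = true;
      n++;
    }
  return n;
}

struct entry { struct obj *key, *value; };

/* One pass over a key-weak table: a value is kept alive only through
   a key that is already marked. */
int
finish_marking_key_weak (struct entry *e, size_t n)
{
  int newly = 0;
  for (size_t i = 0; i < n; i++)
    if (e[i].key->marked)
      newly += mark (e[i].value);
  return newly;
}

/* Repeat passes until one of them marks nothing new. */
void
finish_marking (struct entry *e, size_t n)
{
  while (finish_marking_key_weak (e, n) > 0)
    ;
}

/* Remove entries whose key or value stayed unmarked; return the new
   number of entries. */
size_t
prune (struct entry *e, size_t n)
{
  size_t kept = 0;
  for (size_t i = 0; i < n; i++)
    if (e[i].key->marked && e[i].value->marked)
      e[kept++] = e[i];
  return kept;
}
```

With the entries @code{@{b, c@}}, @code{@{a, b@}} and @code{@{d, a@}}, where only @code{a} is a root, the first pass marks @code{b} and the second pass marks @code{c}. Pruning then drops the entry keyed by @code{d}.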
+
+@node mark_object, gc_sweep, garbage_collect_1, Garbage Collection - Step by Step
+@subsection @code{mark_object}
+@cindex @code{mark_object}
+
+The first thing that is checked while marking an object is whether the
+object is a real Lisp object (@code{Lisp_Type_Record}) or just an integer
+or a character. Integers and characters are the only two types that are
+stored directly, without another level of indirection, and therefore they
+don't have to be marked and collected.
+@xref{How Lisp Objects Are Represented in C}.
+
+The second case, a pointer to a Lisp object, is the one we have to
+handle. Even then, there are three situations in which nothing needs to
+be done while marking:
+@itemize @bullet
+@item
+The object is read-only, which prevents it from being garbage
+collected, i.e.@: marked (@code{C_READONLY_RECORD_HEADER}).
+@item
+The object is already marked and need not be marked a second time
+(checked by @code{MARKED_RECORD_HEADER_P}).
+@item
+The object is a special, unmarkable object
+(@code{UNMARKABLE_RECORD_HEADER_P}). Apparently these are objects that
+sit in some const space and therefore cannot be marked; see
+@code{this_one_is_unmarkable} in @code{alloc.c}.
+@end itemize
+
+Now the actual marking can take place. We use the macro
+@code{MARK_RECORD_HEADER} once to mark the object itself (actually the
+special flag in the lrecord header), and call its special marker
+``method'' @code{marker} if available. The marker method marks every
+other object that is reachable from the current object. Note that a
+marker method should not call @code{mark_object} recursively for the
+last object it reaches. Instead, it should return that object as the
+one from which further marking has to be performed.
+
+In case another object was returned, as mentioned before, we reiterate
+the whole @code{mark_object} process beginning with this next object.
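The following sketch shows why a marker returns an object instead of recursing: marking a long list then needs only a loop, not one C stack frame per cons. The types are hypothetical simplifications:
@itemize @bullet
@item
Conses are the only record type.
@item
A @code{NULL} pointer plays the role of @code{nil}.
@item
The unmarkable check is left out.
@end itemize

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical header: just the flags mark_object looks at. */
struct header { bool marked; bool readonly; };

struct cons;
/* A value is an immediate integer (no storage) or a pointer;
   a NULL pointer stands in for nil in this sketch. */
struct value { bool is_int; long i; struct cons *ptr; };

struct cons { struct header h; struct value car, cdr; };

void mark_object (struct value v);

/* Marker "method" for conses: mark the car recursively and hand the
   cdr back to the caller instead of recursing on it. */
struct value
mark_cons (struct cons *c)
{
  mark_object (c->car);
  return c->cdr;
}

void
mark_object (struct value v)
{
  for (;;)
    {
      if (v.is_int || v.ptr == NULL)
        return;                       /* immediates need no marking */
      struct header *h = &v.ptr->h;
      if (h->readonly || h->marked)
        return;                       /* nothing to do */
      h->marked = true;               /* MARK_RECORD_HEADER analogue */
      v = mark_cons (v.ptr);          /* iterate instead of recursing */
    }
}
```

Recursion depth now grows with the depth of the car nesting, not with the length of the list.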
+
+@node gc_sweep, sweep_lcrecords_1, mark_object, Garbage Collection - Step by Step
+@subsection @code{gc_sweep}
+@cindex @code{gc_sweep}
+
+The job of this function is to free all unmarked records from memory. As
+we know, there are different types of objects implemented and managed, and
+consequently different ways to free them from memory.
+@xref{Introduction to Allocation}.
+
+We start with all objects stored through @code{lcrecords}. All
+bulkier objects are allocated and handled using the @code{lcrecords}
+scheme: each object is @code{malloc}ed separately
+instead of being placed in one of the contiguous frob blocks. The types
+that are currently stored
+using @code{lcrecords}'s @code{alloc_lcrecord} and
+@code{make_lcrecord_list} are: vectors, buffers,
+char-table, char-table-entry, console, weak-list, database, device,
+ldap, hash-table, command-builder, extent-auxiliary, extent-info, face,
+coding-system, frame, image-instance, glyph, popup-data, gui-item,
+keymap, charset, color_instance, font_instance, opaque, opaque-list,
+process, range-table, specifier, symbol-value-buffer-local,
+symbol-value-lisp-magic, symbol-value-varalias, toolbar-button,
+tooltalk-message, tooltalk-pattern, window, and window-configuration. We
+take care of them first
+in order to be able to handle and finalize the items stored in them more
+easily. The function @code{sweep_lcrecords_1}, described below, does
+the whole job for us.
+@xref{lrecords}, for a description of the internals.
+
+Our next candidates are the objects that behave quite differently
+from everything else: the strings. They consist of two parts, a
+fixed-size portion (@code{struct Lisp_String}) holding the string's
+length, its property list and a pointer to the second part, and the
+actual string data, which is stored in string-chars blocks comparable to
+frob blocks. In these blocks, the data of dead strings is not only
+freed, but the blocks are also compacted, i.e.@: the data of all live
+strings is relocated so that the holes are closed.
+@xref{String}. This compacting phase is performed by the function
+@code{compact_string_chars}; the actual sweeping is done by the function
+@code{sweep_strings}. Both are described below.
+
+After that, the other types are swept step by step using the functions
+@code{sweep_conses}, @code{sweep_compiled_functions},
+@code{sweep_floats}, @code{sweep_symbols}, @code{sweep_extents},
+@code{sweep_markers} and @code{sweep_events}. They handle the fixed-size
+types cons, float, compiled-function, symbol, marker, extent, and event,
+all stored in so-called ``frob blocks''. We can therefore do basically
+the same for every type, using the same macros, which are defined
+specifically to handle everything related to fixed-size blocks. The
+only fixed-size type that is not handled here is the fixed-size portion
+of strings, because we took special care of them earlier.
+
+The one big exception is bit vectors, which are stored differently and
+are therefore treated differently by the function
+@code{sweep_bit_vectors_1}, described later.
+
+First, some brief background on how
+these fixed-size types are managed in general, in order to understand
+how the sweeping is done. They all have a fixed size and are therefore
+stored in big blocks of memory, allocated at once, that can hold a
+certain number of objects of one type. The macro
+@code{DECLARE_FIXED_TYPE_ALLOC} creates the suitable structures for
+every type. More precisely, we have the block struct
+(holding a pointer to the previous block @code{prev} and the
+objects in @code{block[]}), a pointer to the current block
+(@code{current_..._block}) and its last index
+(@code{current_..._block_index}), and a pointer to the free list that
+will be created. In addition, a macro @code{FIXED_TYPE_FROM_BLOCK} and
+some related macros exist that are used to obtain a new object. The
+object comes either from the free list (@code{ALLOCATE_FIXED_TYPE_1}),
+if there is an unused object of that type, or from the current block,
+allocating a completely new block if necessary
+(@code{ALLOCATE_FIXED_TYPE_FROM_BLOCK}).
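The following sketch shows the allocation side for one hypothetical fixed-size type @code{fobj}, with a deliberately tiny block size. It mirrors the structures listed above: a block with a @code{prev} pointer, the current block and its index, and a free list. The names and the @code{next_free} link are assumptions, not the macro expansions from @file{alloc.c}.

```c
#include <assert.h>
#include <stdlib.h>

#define OBJS_PER_BLOCK 4   /* tiny, for illustration only */

/* Hypothetical fixed-size object; while free, next_free chains it. */
struct fobj { struct fobj *next_free; double payload; };

struct fobj_block
{
  struct fobj_block *prev;              /* previous (older) block */
  struct fobj block[OBJS_PER_BLOCK];
};

static struct fobj_block *current_fobj_block = NULL;
static int current_fobj_block_index = OBJS_PER_BLOCK;  /* "full" */
static struct fobj *fobj_free_list = NULL;

struct fobj *
allocate_fobj (void)
{
  if (fobj_free_list != NULL)           /* reuse a swept object first */
    {
      struct fobj *o = fobj_free_list;
      fobj_free_list = o->next_free;
      return o;
    }
  if (current_fobj_block_index == OBJS_PER_BLOCK)
    {                                   /* current block exhausted */
      struct fobj_block *b = malloc (sizeof *b);
      if (b == NULL)
        return NULL;
      b->prev = current_fobj_block;
      current_fobj_block = b;
      current_fobj_block_index = 0;
    }
  return &current_fobj_block->block[current_fobj_block_index++];
}

/* What sweeping does with each dead object: push it on the list. */
void
free_fobj (struct fobj *o)
{
  o->next_free = fobj_free_list;
  fobj_free_list = o;
}
```

Objects never move once allocated, so pointers to them stay valid. The price is that a block can only be returned to @code{malloc} when every object in it is dead.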
+
+The rest works as follows: all of them define a
+macro @code{UNMARK_...} that is used to unmark the object. They define a
+macro @code{ADDITIONAL_FREE_...} that defines additional work that has
+to be done when converting an object from in use to not in use (so far,
+only markers use it in order to unchain them). Then, they all call
+the macro @code{SWEEP_FIXED_TYPE_BLOCK} instantiated with their type name
+and their struct name.
+
+This call in particular does the following: we go over all blocks,
+starting with the current one and moving towards the oldest.
+For each block, we look at every object in it. If the object is already
+freed (checked with @code{FREE_STRUCT_P} using the first pointer of the
+object), or if it is
+set to read only (@code{C_READONLY_RECORD_HEADER_P}), nothing must be
+done. If it is unmarked (checked with @code{MARKED_RECORD_HEADER_P}), it
+is put in the free list and set free (using the macro
+@code{FREE_FIXED_TYPE}); otherwise it stays in the block, but is unmarked
+(by @code{UNMARK_...}). While going through one block, we note whether
+the whole block is empty. If so, the whole block is freed (using
+@code{xfree}) and the free list state is set to the state it had before
+handling this block.
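The block sweep can be sketched as follows for a hypothetical object type. The sketch makes three simplifications:
@itemize @bullet
@item
Read-only objects are left out.
@item
The free list is rebuilt from scratch on each sweep.
@item
The newest block is never released, so that allocation can continue from it.
@end itemize

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define OBJS_PER_BLOCK 4

/* Hypothetical object with the states the sweep distinguishes. */
struct sobj
{
  bool marked;
  bool is_free;                 /* stands in for FREE_STRUCT_P */
  struct sobj *next_free;
};

struct sblock
{
  struct sblock *prev;          /* next older block */
  struct sobj obj[OBJS_PER_BLOCK];
};

struct sobj *free_list;

/* Sweep the chain starting at *CURRENT, newest block first.
   Return the number of live (marked) objects found. */
int
sweep_blocks (struct sblock **current)
{
  int live = 0;
  struct sblock **link = current;

  free_list = NULL;                     /* rebuilt from scratch */
  while (*link != NULL)
    {
      struct sblock *b = *link;
      struct sobj *saved = free_list;   /* free list before this block */
      bool empty = true;

      for (int i = 0; i < OBJS_PER_BLOCK; i++)
        {
          struct sobj *o = &b->obj[i];
          if (o->marked)
            {
              o->marked = false;        /* survivor: just unmark it */
              empty = false;
              live++;
            }
          else
            {
              o->is_free = true;        /* garbage or already free */
              o->next_free = free_list;
              free_list = o;
            }
        }

      if (empty && link != current)
        {
          free_list = saved;            /* drop this block's objects */
          *link = b->prev;              /* unlink the block ... */
          free (b);                     /* ... and release it */
        }
      else
        link = &b->prev;
    }
  return live;
}
```

Restoring @code{saved} is what keeps the free list from pointing into a block that has just been released.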
+
+@node sweep_lcrecords_1, compact_string_chars, gc_sweep, Garbage Collection - Step by Step
+@subsection @code{sweep_lcrecords_1}
+@cindex @code{sweep_lcrecords_1}
+
+After resetting all lcrecord statistics, we go over all lcrecords
+twice. They are all chained together in a list with a head called
+@code{all_lcrecords}.
+
+The first loop calls each object's @code{finalizer} method, but only
+if the object is not read only
+(@code{C_READONLY_RECORD_HEADER_P}), it is not marked
+(@code{MARKED_RECORD_HEADER_P}), it is not already in a free list (list of
+freed objects, field @code{free}) and finally it has a finalizer
+method.
+
+The second loop actually frees the appropriate objects by iterating
+through the whole list again. If an object is read only or marked, it
+has to persist; otherwise it is freed manually by calling
+@code{xfree}. During this loop, the lcrecord statistics are kept up to
+date by calling @code{tick_lcrecord_stats} with the right arguments.
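The following sketch shows the two passes over a hypothetical @code{lcrecord} chain, with @code{free} standing in for @code{xfree}. Running all finalizers before anything is freed means that each finalizer still sees every record in its allocated state. The field names and @code{count_finalizer()} are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical lcrecord: individually malloc'ed and chained. */
struct lcrecord
{
  struct lcrecord *next;
  bool marked, readonly, is_free;
  void (*finalizer) (struct lcrecord *);   /* may be NULL */
};

int finalized;   /* counts finalizer calls, for illustration */

void
count_finalizer (struct lcrecord *r)
{
  (void) r;
  finalized++;
}

void
sweep_lcrecords (struct lcrecord **all)
{
  /* Pass 1: finalize dead records while all records still exist. */
  for (struct lcrecord *r = *all; r != NULL; r = r->next)
    if (!r->readonly && !r->marked && !r->is_free && r->finalizer)
      r->finalizer (r);

  /* Pass 2: unlink and free dead records, unmark survivors. */
  struct lcrecord **link = all;
  while (*link != NULL)
    {
      struct lcrecord *r = *link;
      if (r->readonly || r->marked)
        {
          r->marked = false;
          link = &r->next;
        }
      else
        {
          *link = r->next;
          free (r);
        }
    }
}
```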
+
+@node compact_string_chars, sweep_strings, sweep_lcrecords_1, Garbage Collection - Step by Step
+@subsection @code{compact_string_chars}
+@cindex @code{compact_string_chars}
+
+The purpose of this function is to compact all the data parts of the
+strings that are held in so-called @code{string_chars_block}s, i.e.@: the
+strings that do not exceed a certain maximal length.
+
+The procedure is as follows. We keep two
+positions in the @code{string_chars_block}s using two pointer/integer
+pairs, namely @code{from_sb}/@code{from_pos} and
+@code{to_sb}/@code{to_pos}. They denote the source and the destination
+position for copying the string currently being handled.
+
+While going over all chained @code{string_chars_block}s and their
+strings, starting at @code{first_string_chars_block}, both positions
+are advanced, and a string is copied from @code{from_sb} to
+@code{to_sb} if necessary, depending on the status of the string at hand.
+
+More precisely, we can distinguish between the following actions.
+@itemize @bullet
+@item
+The string at @code{from_sb}'s position could be marked as free, which
+is indicated by an invalid value in the pointer that should point back
+to the fixed-size string object, and which is checked by
+@code{FREE_STRUCT_P}. In this case, @code{from_sb}/@code{from_pos}
+is advanced to the next string, and nothing has to be copied.
+@item
+Also, if a string object itself is unmarked, nothing has to be
+copied. We likewise advance the @code{from_sb}/@code{from_pos}
+pair as described above.
+@item
+In all other cases, we have a marked string at hand. The string data
+must be moved from the from-position to the to-position. If there is
+not enough space left in the current @code{to_sb} block, we advance
+this pointer to the beginning of the next block before copying. If the
+from and to positions are different, we perform the
+actual copying using the library function @code{memmove}.
+@end itemize
+
+After compacting, the pointer to the current
+@code{string_chars_block}, stored in @code{current_string_chars_block},
+is reset to the last block to which we moved a string,
+i.e.@: @code{to_sb}, and all remaining blocks (which now hold only
+garbage) are explicitly @code{xfree}d.
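The compaction step can be sketched for a single data block, so the advance of @code{to_sb} to the next block is omitted. The layout is hypothetical:
@itemize @bullet
@item
Each chunk starts with a header holding a back pointer to its owner and the data size.
@item
The back pointer is @code{NULL} once the owner has been freed.
@item
@code{append()} only builds test data.
@end itemize

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size part of a string. */
struct string { bool marked; char *data; };

/* Header in front of each chunk of character data. */
struct chunk_hdr { struct string *owner; size_t size; };
#define HDR sizeof (struct chunk_hdr)

/* Build test data: append TEXT, owned by S (NULL = already freed). */
size_t
append (char *buf, size_t fill, struct string *s, const char *text)
{
  struct chunk_hdr h = { s, strlen (text) + 1 };
  memcpy (buf + fill, &h, HDR);
  memcpy (buf + fill + HDR, text, h.size);
  if (s != NULL)
    s->data = buf + fill + HDR;
  return fill + HDR + h.size;
}

/* Slide live chunks toward the start of BUF; return the new fill. */
size_t
compact (char *buf, size_t fill)
{
  size_t from = 0, to = 0;
  while (from < fill)
    {
      struct chunk_hdr h;
      memcpy (&h, buf + from, HDR);     /* header may be unaligned */
      size_t len = HDR + h.size;
      if (h.owner != NULL && h.owner->marked)
        {
          if (from != to)
            memmove (buf + to, buf + from, len);
          h.owner->data = buf + to + HDR;   /* fix the forward pointer */
          to += len;
        }
      from += len;                      /* dead chunks are skipped */
    }
  return to;
}
```

The back pointer is what makes compaction possible: after moving the data, the owner's @code{data} field must be updated.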
+
+@node sweep_strings, sweep_bit_vectors_1, compact_string_chars, Garbage Collection - Step by Step
+@subsection @code{sweep_strings}
+@cindex @code{sweep_strings}
+
+The sweeping of the fixed-size string objects is essentially the same
+as for all other fixed-size types. As before, the freeing
+into the suitable free list is done by using the macro
+@code{SWEEP_FIXED_TYPE_BLOCK} after defining the right macros
+@code{UNMARK_string} and @code{ADDITIONAL_FREE_string}. These two
+definitions are a little bit special compared to the ones used
+for the other fixed-size types.
+
+@code{UNMARK_string} is defined the same way, except for some additional
+code used for updating the bookkeeping information.
+
+For strings, @code{ADDITIONAL_FREE_string} has to do something in
+addition: if the string data was not allocated in a
+@code{string_chars_block} because it exceeded the maximal length, and
+was therefore @code{malloc}ed separately, we must also @code{xfree}
+it explicitly.
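The extra work of @code{ADDITIONAL_FREE_string} reduces to a size check. In this sketch, the cut-off @code{BIG_STRING_THRESHOLD} stands in for the maximal length mentioned above. Both it and @code{big_string_p()} are hypothetical, and @code{free} stands in for @code{xfree}.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical cut-off: longer data gets its own malloc'ed buffer. */
#define BIG_STRING_THRESHOLD 64

struct string { size_t size; char *data; };

bool
big_string_p (const struct string *s)
{
  return s->size >= BIG_STRING_THRESHOLD;
}

/* Extra work when an unmarked string is swept.  Small strings keep
   their data in a shared block, reclaimed by compaction; big ones own
   a separate buffer that must be released here.  Returns 1 if a
   buffer was freed. */
int
additional_free_string (struct string *s)
{
  if (big_string_p (s))
    {
      free (s->data);
      s->data = NULL;
      return 1;
    }
  return 0;
}
```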
+
+@node sweep_bit_vectors_1, , sweep_strings, Garbage Collection - Step by Step
+@subsection @code{sweep_bit_vectors_1}
+@cindex @code{sweep_bit_vectors_1}
+
+Bit vectors are also one of the rare types that are @code{malloc}ed
+individually. Consequently, while sweeping, all bit vectors that are no
+longer needed must be freed by hand. This is done the expected way:
+since they are all registered in a list called
+@code{all_bit_vectors}, all elements of that list are traversed,
+each unmarked bit vector is unlinked and freed by calling @code{xfree},
+and each surviving one is unmarked.
+In addition, the bookkeeping information used for the garbage
+collector's output is updated.
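Unlinking from @code{all_bit_vectors} while walking it is easiest with a pointer to the previous link, as in the following sketch. The struct layout is hypothetical, @code{free} stands in for @code{xfree}, and the return value is a simple count in place of the real bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical bit vector header; each one is malloc'ed by itself
   and chained into a global list. */
struct bit_vector
{
  struct bit_vector *next;
  bool marked;
};

struct bit_vector *all_bit_vectors;

/* Free unmarked bit vectors, unmark the rest; return how many
   were freed. */
int
sweep_bit_vectors (void)
{
  int freed = 0;
  struct bit_vector **link = &all_bit_vectors;
  while (*link != NULL)
    {
      struct bit_vector *v = *link;
      if (v->marked)
        {
          v->marked = false;      /* survivor */
          link = &v->next;
        }
      else
        {
          *link = v->next;        /* unlink ... */
          free (v);               /* ... and release */
          freed++;
        }
    }
  return freed;
}
```

Because @code{link} always addresses the pointer that refers to the current element, the head of the list needs no special case.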
+
+@node Integers and Characters, Allocation from Frob Blocks, Garbage Collection - Step by Step, Allocation of Objects in XEmacs Lisp
@section Integers and Characters
Integer and character Lisp objects are created from integers using the
are too big; i.e. you won't get the value you expected but the tag bits
will at least be correct.
-@node Allocation from Frob Blocks
+@node Allocation from Frob Blocks, lrecords, Integers and Characters, Allocation of Objects in XEmacs Lisp
@section Allocation from Frob Blocks
The uninitialized memory required by a @code{Lisp_Object} of a particular type
none. (There are actually two versions of these macros, one of which is
more defensive but less efficient and is used for error-checking.)
-@node lrecords
+@node lrecords, Low-level allocation, Allocation from Frob Blocks, Allocation of Objects in XEmacs Lisp
@section lrecords
[see @file{lrecord.h}]
All lrecords have at the beginning of their structure a @code{struct
-lrecord_header}. This just contains a pointer to a @code{struct
+lrecord_header}. This just contains a type number and some flags,
+including the mark bit. All builtin type numbers are defined as
+constants in @code{enum lrecord_type}, to allow the compiler to generate
+more efficient code for @code{@var{type}P}. The type number, through the
+@code{lrecord_implementation_table}, gives access to a @code{struct
lrecord_implementation}, which is a structure containing method pointers
and such. There is one of these for each type, and it is a global,
constant, statically-declared structure that is declared in the
-@code{DEFINE_LRECORD_IMPLEMENTATION()} macro. (This macro actually
-declares an array of two @code{struct lrecord_implementation}
-structures. The first one contains all the standard method pointers,
-and is used in all normal circumstances. During garbage collection,
-however, the lrecord is @dfn{marked} by bumping its implementation
-pointer by one, so that it points to the second structure in the array.
-This structure contains a special indication in it that it's a
-@dfn{marked-object} structure: the finalize method is the special
-function @code{this_marks_a_marked_record()}, and all other methods are
-null pointers. At the end of garbage collection, all lrecords will
-either be reclaimed or unmarked by decrementing their implementation
-pointers, so this second structure pointer will never remain past
-garbage collection.
-
- Simple lrecords (of type (c) above) just have a @code{struct
+@code{DEFINE_LRECORD_IMPLEMENTATION()} macro.
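The following is a miniature of the header/table scheme, with two hypothetical types. Because the header stores a small integer, a type predicate compiles to a single comparison, and the method table is reached by indexing. The bit-field widths, the one-field implementation struct and the @code{RECORD_TYPEP} macro are assumptions of this sketch.

```c
#include <assert.h>
#include <string.h>

enum lrecord_type
{
  lrecord_type_cons,
  lrecord_type_vector,
  lrecord_type_count
};

/* Cut down to a single field for illustration. */
struct lrecord_implementation { const char *name; };

struct lrecord_header
{
  unsigned int type : 8;   /* index into the table below */
  unsigned int mark : 1;   /* the mark bit lives in the header */
};

static const struct lrecord_implementation lrecord_cons = { "cons" };
static const struct lrecord_implementation lrecord_vector = { "vector" };

/* Type number -> implementation structure. */
const struct lrecord_implementation *
lrecord_implementation_table[lrecord_type_count] =
{
  [lrecord_type_cons] = &lrecord_cons,
  [lrecord_type_vector] = &lrecord_vector,
};

/* A type predicate is a plain integer comparison. */
#define RECORD_TYPEP(h, t) ((h)->type == lrecord_type_##t)

const char *
type_name (const struct lrecord_header *h)
{
  return lrecord_implementation_table[h->type]->name;
}
```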
+
+ Simple lrecords (of type (b) above) just have a @code{struct
lrecord_header} at their beginning. lcrecords, however, actually have a
@code{struct lcrecord_header}. This, in turn, has a @code{struct
lrecord_header} at its beginning, so sanity is preserved; but it also
Whenever you create an lrecord, you need to call either
@code{DEFINE_LRECORD_IMPLEMENTATION()} or
@code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()}. This needs to be
-specified in a C file, at the top level. What this actually does is
-define and initialize the implementation structure for the lrecord. (And
-possibly declares a function @code{error_check_foo()} that implements
-the @code{XFOO()} macro when error-checking is enabled.) The arguments
-to the macros are the actual type name (this is used to construct the C
-variable name of the lrecord implementation structure and related
-structures using the @samp{##} macro concatenation operator), a string
-that names the type on the Lisp level (this may not be the same as the C
-type name; typically, the C type name has underscores, while the Lisp
-string has dashes), various method pointers, and the name of the C
-structure that contains the object. The methods are used to encapsulate
-type-specific information about the object, such as how to print it or
-mark it for garbage collection, so that it's easy to add new object
-types without having to add a specific case for each new type in a bunch
-of different places.
+specified in a @file{.c} file, at the top level. What this actually
+does is define and initialize the implementation structure for the
+lrecord. (It may also declare a function @code{error_check_foo()} that
+implements the @code{XFOO()} macro when error-checking is enabled.) The
+arguments to the macros are the actual type name (this is used to
+construct the C variable name of the lrecord implementation structure
+and related structures using the @samp{##} macro concatenation
+operator), a string that names the type on the Lisp level (this may not
+be the same as the C type name; typically, the C type name has
+underscores, while the Lisp string has dashes), various method pointers,
+and the name of the C structure that contains the object. The methods
+are used to encapsulate type-specific information about the object, such
+as how to print it or mark it for garbage collection, so that it's easy
+to add new object types without having to add a specific case for each
+new type in a bunch of different places.
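+
+As an illustration only, here is a stand-alone C model of such a macro.
+It is not the real XEmacs definition. @code{DEFINE_TOY_IMPLEMENTATION},
+@code{struct toy_implementation} and @code{toy_print_fn} are
+hypothetical, and the real macros take more method pointers. The model
+shows how @samp{##} derives the name of the implementation structure
+from the C type name, while the Lisp-level name (with dashes) is passed
+separately as a string.
+
+@example
+#include <stdio.h>
+#include <stddef.h>
+
+/* Hypothetical method type: print an object to a stream. */
+typedef void (*toy_print_fn) (const void *obj, FILE *out);
+
+/* Hypothetical, much-reduced implementation structure. */
+struct toy_implementation
+@{
+  const char *lisp_name;   /* name at the Lisp level, with dashes */
+  toy_print_fn printer;    /* one type-specific method */
+  size_t size;             /* size of the C structure */
+@};
+
+/* ## builds the variable name toy_implementation_<c_name>. */
+#define DEFINE_TOY_IMPLEMENTATION(name, c_name, printer, structtype) \
+  const struct toy_implementation toy_implementation_##c_name =     \
+    @{ name, printer, sizeof (structtype) @}
+
+struct window_config @{ int nwindows; @};
+
+static void
+print_window_config (const void *obj, FILE *out)
+@{
+  const struct window_config *wc = obj;
+  fprintf (out, "#<window-configuration %d>\n", wc->nwindows);
+@}
+
+/* Defines toy_implementation_window_config at top level. */
+DEFINE_TOY_IMPLEMENTATION ("window-configuration", window_config,
+                           print_window_config, struct window_config);
+
+int
+main (void)
+@{
+  struct window_config wc = @{ 3 @};
+  toy_implementation_window_config.printer (&wc, stdout);
+  return 0;
+@}
+@end example
+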
The difference between @code{DEFINE_LRECORD_IMPLEMENTATION()} and
@code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()} is that the former is
For the purpose of keeping allocation statistics, the allocation
engine keeps a list of all the different types that exist. Note that,
since @code{DEFINE_LRECORD_IMPLEMENTATION()} is a macro that is
-specified at top-level, there is no way for it to add to the list of all
-existing types. What happens instead is that each implementation
-structure contains in it a dynamically assigned number that is
-particular to that type. (Or rather, it contains a pointer to another
-structure that contains this number. This evasiveness is done so that
-the implementation structure can be declared const.) In the sweep stage
-of garbage collection, each lrecord is examined to see if its
-implementation structure has its dynamically-assigned number set. If
-not, it must be a new type, and it is added to the list of known types
-and a new number assigned. The number is used to index into an array
-holding the number of objects of each type and the total memory
-allocated for objects of that type. The statistics in this array are
-also computed during the sweep stage. These statistics are returned by
-the call to @code{garbage-collect} and are printed out at the end of the
-loadup phase.
+specified at top-level, there is no way for it to initialize the global
+data structures containing type information, such as
+@code{lrecord_implementations_table}. For this reason, a call to
+@code{INIT_LRECORD_IMPLEMENTATION} must be added to the same source file
+that contains @code{DEFINE_LRECORD_IMPLEMENTATION}. The call goes not at
+top level but inside one of the init functions, typically
+@code{syms_of_@var{foo}()}. @code{INIT_LRECORD_IMPLEMENTATION} must be
+called before any object of this type is used.
+
+The type number that identifies each lrecord type is also used to
+index into an array holding the number of objects of each type and the
+total memory allocated for objects of that type. The statistics in this
+array are computed during the sweep stage and are returned by the call
+to @code{garbage-collect}.
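+
+The following stand-alone C sketch models this idea under simplifying
+assumptions. The names @code{toy_implementations_table},
+@code{toy_init_implementation()} and the statistics arrays are
+hypothetical stand-ins for @code{lrecord_implementations_table},
+@code{INIT_LRECORD_IMPLEMENTATION} and the real sweep-stage bookkeeping.
+Registration happens in an init function rather than at top level, and
+the type number then indexes both the table and the statistics.
+
+@example
+#include <stddef.h>
+
+enum toy_type @{ TOY_TYPE_CONS, TOY_TYPE_VECTOR, TOY_TYPE_COUNT @};
+
+struct toy_impl
+@{
+  const char *name;
+  size_t size;
+@};
+
+/* Filled in by an init function, not by a top-level definition. */
+static const struct toy_impl *toy_implementations_table[TOY_TYPE_COUNT];
+
+/* Per-type statistics gathered during the sweep. */
+static size_t toy_live_count[TOY_TYPE_COUNT];
+static size_t toy_live_bytes[TOY_TYPE_COUNT];
+
+static void
+toy_init_implementation (enum toy_type type, const struct toy_impl *impl)
+@{
+  toy_implementations_table[type] = impl;
+@}
+
+/* Called for each object that survives the sweep. */
+static void
+toy_count_survivor (enum toy_type type)
+@{
+  toy_live_count[type]++;
+  toy_live_bytes[type] += toy_implementations_table[type]->size;
+@}
+
+static const struct toy_impl toy_cons_impl = @{ "cons", 16 @};
+
+int
+main (void)
+@{
+  /* In XEmacs this registration would typically be in syms_of_foo(). */
+  toy_init_implementation (TOY_TYPE_CONS, &toy_cons_impl);
+  toy_count_survivor (TOY_TYPE_CONS);
+  toy_count_survivor (TOY_TYPE_CONS);
+  return toy_live_bytes[TOY_TYPE_CONS] == 32 ? 0 : 1;
+@}
+@end example
+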
Note that for every type defined with a @code{DEFINE_LRECORD_*()}
macro, there needs to be a @code{DECLARE_LRECORD_IMPLEMENTATION()}
file. To create one of these, copy an existing model and modify as
necessary.
+ @strong{Please note:} If you define an lrecord in an external
+dynamically-loaded module, you must use @code{DECLARE_EXTERNAL_LRECORD},
+@code{DEFINE_EXTERNAL_LRECORD_IMPLEMENTATION}, and
+@code{DEFINE_EXTERNAL_LRECORD_SEQUENCE_IMPLEMENTATION} instead of the
+non-EXTERNAL forms. These macros assign new type numbers at run time,
+when the module is loaded, whereas the non-EXTERNAL forms assume that
+the programmer has already inserted the correct type numbers into the
+global enum that records them at compile time.
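+
+Here is a hypothetical model of why run-time numbering is needed. The
+built-in types occupy the enum values that are known when XEmacs is
+compiled, so a type defined in a module loaded later can only get the
+next free number at load time. The names below are illustrative and are
+not the real declarations.
+
+@example
+#include <stdio.h>
+
+/* Built-in types, fixed at compile time (hypothetical). */
+enum toy_lrecord_type
+@{
+  toy_lrecord_type_cons,
+  toy_lrecord_type_string,
+  toy_lrecord_type_last_built_in
+@};
+
+/* Next free number for types defined by dynamically loaded modules. */
+static int toy_next_type = toy_lrecord_type_last_built_in;
+
+/* What an EXTERNAL-style definition has to do when its module loads. */
+static int
+toy_assign_external_type (void)
+@{
+  return toy_next_type++;
+@}
+
+int
+main (void)
+@{
+  int first = toy_assign_external_type ();
+  int second = toy_assign_external_type ();
+  printf ("%d %d\n", first, second);   /* prints "2 3" */
+  return 0;
+@}
+@end example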
+
The various methods in the lrecord implementation structure are:
@enumerate
a function pointer (usually the @code{mark_object()} function), which is
used to mark an object. All Lisp objects that are contained within the
object need to be marked by applying this function to them. The mark
-method should also return a Lisp object, which should be either nil or
+method should also return a Lisp object, which should be either @code{nil} or
an object to mark. (This can be used in lieu of calling
@code{mark_object()} on the object, to reduce the recursion depth, and
consequently should be the most heavily nested sub-object, such as a
For an example, see the methods for window configurations and opaques.
@end enumerate
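+
+The recursion-saving trick of the mark method can be modeled in
+stand-alone C. This is a sketch under simplifying assumptions and is not
+the real @code{mark_object()}. A hypothetical @code{struct toy_obj} has
+two slots, and @code{NULL} plays the role of @code{nil}. The mark method
+marks one slot recursively and returns the other, which the caller then
+processes in a loop instead of by recursion.
+
+@example
+#include <stddef.h>
+
+/* Toy object with two sub-object slots; NULL stands for nil. */
+struct toy_obj
+@{
+  int marked;
+  struct toy_obj *car;
+  struct toy_obj *cdr;
+@};
+
+static void toy_mark_object (struct toy_obj *obj);
+
+/* Mark method: mark the car recursively and return the cdr, the
+   sub-object along which lists nest most deeply. */
+static struct toy_obj *
+toy_mark_cons (struct toy_obj *obj)
+@{
+  toy_mark_object (obj->car);
+  return obj->cdr;
+@}
+
+static void
+toy_mark_object (struct toy_obj *obj)
+@{
+  /* Looping on the returned object means a long cdr chain does not
+     increase the recursion depth. */
+  while (obj != NULL && !obj->marked)
+    @{
+      obj->marked = 1;
+      obj = toy_mark_cons (obj);
+    @}
+@}
+
+int
+main (void)
+@{
+  struct toy_obj c3 = @{ 0, NULL, NULL @};
+  struct toy_obj c2 = @{ 0, NULL, &c3 @};
+  struct toy_obj c1 = @{ 0, &c3, &c2 @};
+  toy_mark_object (&c1);
+  return (c1.marked && c2.marked && c3.marked) ? 0 : 1;
+@}
+@end example
+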
-@node Low-level allocation
+@node Low-level allocation, Cons, lrecords, Allocation of Objects in XEmacs Lisp
@section Low-level allocation
Memory that you want to allocate directly should be allocated using
(On some systems, the memory warnings are not functional.)
Allocated memory that is going to be used to make a Lisp object
-is created using @code{allocate_lisp_storage()}. This calls @code{xmalloc()}
-but also verifies that the pointer to the memory can fit into
-a Lisp word (remember that some bits are taken away for a type
-tag and a mark bit). If not, an error is issued through @code{memory_full()}.
-@code{allocate_lisp_storage()} is called by @code{alloc_lcrecord()},
-@code{ALLOCATE_FIXED_TYPE()}, and the vector and bit-vector creation
-routines. These routines also call @code{INCREMENT_CONS_COUNTER()} at the
-appropriate times; this keeps statistics on how much memory is
-allocated, so that garbage-collection can be invoked when the
-threshold is reached.
-
-@node Pure Space
-@section Pure Space
-
- Not yet documented.
-
-@node Cons
+is created using @code{allocate_lisp_storage()}. This just calls
+@code{xmalloc()}. Before the current Lisp object representation was
+introduced, it also verified that the pointer to the memory could fit
+into a Lisp word. @code{allocate_lisp_storage()} is called by
+@code{alloc_lcrecord()}, @code{ALLOCATE_FIXED_TYPE()}, and the vector
+and bit-vector creation routines. These routines also call
+@code{INCREMENT_CONS_COUNTER()} at the appropriate times; this keeps
+statistics on how much memory is allocated, so that garbage collection
+can be invoked when the threshold is reached.
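+
+The counting done by @code{INCREMENT_CONS_COUNTER()} can be modeled as
+a byte counter compared against a threshold. In this hypothetical
+sketch, @code{toy_consing_since_gc}, @code{toy_gc_cons_threshold} and
+@code{toy_allocate_lisp_storage()} are illustrative names. The threshold
+value is arbitrary, and plain @code{malloc()} stands in for
+@code{xmalloc()}.
+
+@example
+#include <stdlib.h>
+#include <stddef.h>
+
+static size_t toy_consing_since_gc;             /* bytes since last GC */
+static size_t toy_gc_cons_threshold = 500000;   /* arbitrary value */
+
+static void
+toy_garbage_collect (void)
+@{
+  /* A real collector would mark and sweep here. */
+  toy_consing_since_gc = 0;
+@}
+
+/* Allocation path: get the memory, then count it. */
+static void *
+toy_allocate_lisp_storage (size_t size)
+@{
+  void *p = malloc (size);
+  if (p != NULL)
+    toy_consing_since_gc += size;
+  return p;
+@}
+
+/* Checked at a safe point, e.g. before evaluating a form. */
+static void
+toy_maybe_gc (void)
+@{
+  if (toy_consing_since_gc > toy_gc_cons_threshold)
+    toy_garbage_collect ();
+@}
+
+int
+main (void)
+@{
+  void *p = toy_allocate_lisp_storage (600000);
+  toy_maybe_gc ();
+  free (p);
+  return toy_consing_since_gc == 0 ? 0 : 1;
+@}
+@end example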
+
+@node Cons, Vector, Low-level allocation, Allocation of Objects in XEmacs Lisp
@section Cons
Conses are allocated in standard frob blocks. The only thing to
If you mess this up, you will get BADLY BURNED, and it has happened
before.
-@node Vector
+@node Vector, Bit Vector, Cons, Allocation of Objects in XEmacs Lisp
@section Vector
As mentioned above, each vector is @code{malloc()}ed individually, and
is actually @code{malloc()}ed with the right size, however, and access
to any element through the @code{contents} array works fine.
-@node Bit Vector
+@node Bit Vector, Symbol, Vector, Allocation of Objects in XEmacs Lisp
@section Bit Vector
Bit vectors work exactly like vectors, except for more complicated
tag field in bit vector Lisp words is ``lrecord'' rather than
``vector''.)
-@node Symbol
+@node Symbol, Marker, Bit Vector, Allocation of Objects in XEmacs Lisp
@section Symbol
- Symbols are also allocated in frob blocks. Note that the code
-exists for symbols to be either lrecords (category (c) above)
-or simple types (category (b) above), and are lrecords by
-default (I think), although there is no good reason for this.
-
- Note that symbols in the awful horrible obarray structure are
-chained through their @code{next} field.
+ Symbols are also allocated in frob blocks. Symbols in the awful
+horrible obarray structure are chained through their @code{next} field.
Remember that @code{intern} looks up a symbol in an obarray, creating
one if necessary.
-@node Marker
+@node Marker, String, Symbol, Allocation of Objects in XEmacs Lisp
@section Marker
Markers are allocated in frob blocks, as usual. They are kept
markers from a buffer.) Markers are removed from a buffer in
the finalize stage, in @code{ADDITIONAL_FREE_marker()}.
-@node String
+@node String, Compiled Function, Marker, Allocation of Objects in XEmacs Lisp
@section String
As mentioned above, strings are a special case. A string is logically
strings}, are all @code{malloc()}ed as their own block. (#### Although it
would make more sense for the threshold for big strings to be somewhat
lower, e.g. 1/2 or 1/4 the size of a string-chars block. It seems that
-this was indeed the case formerly -- indeed, the threshold was set at
-1/8 -- but Mly forgot about this when rewriting things for 19.8.)
+this was indeed the case formerly---indeed, the threshold was set at
+1/8---but Mly forgot about this when rewriting things for 19.8.)
Note also that the string data in string-chars blocks is padded as
necessary so that proper alignment constraints on the @code{struct
The string compactor recognizes this special 0xFFFFFFFF marker and
handles it correctly.
-@node Bytecode
-@section Bytecode
+@node Compiled Function, , String, Allocation of Objects in XEmacs Lisp
+@section Compiled Function
Not yet documented.
-@node Events and the Event Loop, Evaluation; Stack Frames; Bindings, Allocation of Objects in XEmacs Lisp, Top
+
+@node Dumping, Events and the Event Loop, Allocation of Objects in XEmacs Lisp, Top
+@chapter Dumping
+
+@section What is dumping and its justification
+
+The C code of XEmacs is just a Lisp engine with a lot of built-in
+primitives useful for writing an editor. The editor itself is written
+mostly in Lisp, and represents around 100K lines of code. Loading and
+executing the initialization of all this code takes a good deal of time
+(five to ten times the usual startup time of current XEmacs) and
+requires having all the Lisp source files around. Having to reload them
+each time the editor is started would not be acceptable.
+
+The traditional solution to this problem is called dumping. The build
+process first creates the Lisp engine under the name @file{temacs}, then
+runs it until it has finished loading and initializing all the Lisp
+code, and finally creates a new executable called @file{xemacs}
+containing both the object code in @file{temacs} and the entire contents
+of memory after the initialization.
+
+Although this solution works, it has a huge problem: creating the new
+executable from the actual contents of memory is an extremely
+system-specific and error-prone process, and it interferes with many
+system libraries (like @code{malloc}). It is getting even worse nowadays
+with libraries that use constructors. These are called automatically
+when the program starts (even before @code{main()}) and tend to crash
+when they are called multiple times, once before dumping and once after.
+For example, IRIX 6.x @file{libz.so} pulls in, through its dependencies,
+some C++ image libraries that have this problem. Writing the dumper is
+also one of the most difficult parts of porting XEmacs to a new
+operating system. Basically, `dumping' is an operation that is just not
+officially supported on many operating systems.
+
+The aim of the portable dumper is to solve the same problem as the
+system-specific dumper, namely to quickly reload the fully initialized
+Lisp part of the editor from a small number of files, but without any
+system-specific hacks.
+
+@menu
+* Overview::
+* Data descriptions::
+* Dumping phase::
+* Reloading phase::
+* Remaining issues::
+@end menu
+
+@node Overview, Data descriptions, Dumping, Dumping
+@section Overview
+
+The portable dumping system has to:
+
+@enumerate
+@item
+At dump time, write all initialized, non-quickly-rebuildable data to a
+file [Note: currently named @file{xemacs.dmp}, but the name will
+change], along with all the information needed for reloading.
+
+@item
+When starting XEmacs, reload the dump file, relocate it to its new
+starting address if needed, and reinitialize all pointers to this
+data. Also, rebuild all the quickly rebuildable data.
+@end enumerate
+
+@node Data descriptions, Dumping phase, Overview, Dumping
+@section Data descriptions
+
+The most complex task of the dumper is to be able to write Lisp objects
+(lrecords) and C structs to disk and reload them at a different address,
+updating all the pointers they include in the process. This is done by
+using external data descriptions that give information about the layout
+of the structures in memory.
+
+The specification of these descriptions is in @file{lrecord.h}. A
+description of an lrecord is an array of @code{struct
+lrecord_description}. Each of these structs includes a type, an offset
+in the structure and some optional parameters depending on the type.
+For instance, here is the string description:
+
+@example
+static const struct lrecord_description string_description[] = @{
+ @{ XD_BYTECOUNT, offsetof (Lisp_String, size) @},
+ @{ XD_OPAQUE_DATA_PTR, offsetof (Lisp_String, data), XD_INDIRECT(0, 1) @},
+ @{ XD_LISP_OBJECT, offsetof (Lisp_String, plist) @},
+ @{ XD_END @}
+@};
+@end example
+
+The first line indicates a member of type @code{Bytecount}, which is
+used by the next, indirect directive. The second means ``there is a
+pointer to some opaque data in the field @code{data}''. The length of
+said data is given by the expression @code{XD_INDIRECT(0, 1)}, which
+means ``the value in the 0th line of the description (welcome to C) plus
+one''. The third line means ``there is a @code{Lisp_Object} member
+@code{plist} in the @code{Lisp_String} structure''. @code{XD_END} then
+ends the description.
+
+This gives us all the information we need to move around what is pointed
+to by a structure (C or lrecord) and, by transitivity, everything that
+it points to. The only missing information for dumping is the size of
+the structure. For lrecords, this is part of the
+@code{lrecord_implementation}, so we don't need to duplicate it. For C
+structures we use a @code{struct struct_description}, which includes a
+size field and a pointer to an associated array of
+@code{lrecord_description}.
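+
+The following is a reduced, stand-alone model of how a description lets
+generic code find the pointers inside a structure. Only two hypothetical
+directives are modeled: @code{TOY_XD_PTR} for a pointer member and
+@code{TOY_XD_END}. The real @code{struct lrecord_description} has many
+more, including counts computed with @code{XD_INDIRECT}. The walker adds
+a fixed delta to every described pointer, which is the kind of fix-up
+needed when data moves to a different address.
+
+@example
+#include <stddef.h>
+#include <stdint.h>
+
+enum toy_xd_type @{ TOY_XD_END, TOY_XD_PTR @};
+
+struct toy_description
+@{
+  enum toy_xd_type type;
+  size_t offset;            /* offset of the member in the structure */
+@};
+
+/* Hypothetical C structure with one pointer member. */
+struct toy_pair
+@{
+  long count;
+  char *name;
+@};
+
+static const struct toy_description toy_pair_description[] = @{
+  @{ TOY_XD_PTR, offsetof (struct toy_pair, name) @},
+  @{ TOY_XD_END, 0 @}
+@};
+
+/* Shift every pointer member listed in DESC by DELTA bytes.  Unsigned
+   arithmetic wraps, so a "negative" delta also works. */
+static void
+toy_relocate (void *obj, const struct toy_description *desc,
+              uintptr_t delta)
+@{
+  for (; desc->type != TOY_XD_END; desc++)
+    @{
+      char **slot = (char **) ((char *) obj + desc->offset);
+      *slot = (char *) ((uintptr_t) *slot + delta);
+    @}
+@}
+
+int
+main (void)
+@{
+  static char old_area[8], new_area[8];
+  struct toy_pair pair = @{ 1, old_area + 2 @};
+  toy_relocate (&pair, toy_pair_description,
+                (uintptr_t) new_area - (uintptr_t) old_area);
+  return pair.name == new_area + 2 ? 0 : 1;
+@}
+@end example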
+
+@node Dumping phase, Reloading phase, Data descriptions, Dumping
+@section Dumping phase
+
+Dumping is done by calling the function @code{pdump()} (in
+@file{dumper.c}), which is invoked from @code{Fdump_emacs()} (in
+@file{emacs.c}). This function performs a number of tasks.
+
+@menu
+* Object inventory::
+* Address allocation::
+* The header::
+* Data dumping::
+* Pointers dumping::
+@end menu
+
+@node Object inventory, Address allocation, Dumping phase, Dumping phase
+@subsection Object inventory
+
+The first task is to build the list of the objects to dump. This
+includes:
+
+@itemize @bullet
+@item Lisp objects
+@item C structures
+@end itemize
+
+We end up with one @code{pdump_entry_list_elmt} per object group (arrays
+of C structs are kept together) which includes a pointer to the first
+object of the group, the per-object size and the count of objects in the
+group, along with some other information which is initialized later.
+
+These entries are linked together in @code{pdump_entry_list} structures
+and can be enumerated through any of:
+
+@enumerate
+@item
+the @code{pdump_object_table}, an array of @code{pdump_entry_list}, one
+per lrecord type, indexed by type number.
+
+@item
+the @code{pdump_opaque_data_list}, used for the opaque data which does
+not include pointers, and hence does not need descriptions.
+
+@item
+the @code{pdump_struct_table}, which is a vector of
+@code{struct_description}/@code{pdump_entry_list} pairs, used for
+non-opaque C structures.
+@end enumerate
+
+This uses a marking strategy similar to that of the garbage collector,
+with some differences:
+
+@enumerate
+@item
+We do not use the mark bit (which does not exist for C structures
+anyway); we use a big hash table instead.
+
+@item
+We do not use the mark function of lrecords but instead rely on the
+external descriptions. This is essentially because we need to follow
+pointers to C structures and opaque data in addition to
+@code{Lisp_Object} members.
+@end enumerate
+
+This is done by @code{pdump_register_object()}, which handles
+@code{Lisp_Object} variables, and @code{pdump_register_struct()}, which
+handles C structures. Both delegate the description management to
+@code{pdump_register_sub()}.
+
+The hash table doubles as a map from objects to their
+@code{pdump_entry_list_elmt} (i.e.@: it allows us to look up the
+@code{pdump_entry_list_elmt} of a given object). Entries are added with
+@code{pdump_add_entry()} and looked up with @code{pdump_get_entry()}.
+There is no need for entry removal. The hash value is computed quite
+simply from the object pointer by @code{pdump_make_hash()}.
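+
+The table can be pictured as an open-addressing hash table keyed on
+object addresses. This stand-alone sketch makes several assumptions for
+illustration: the table size, the linear-probing scheme and all the
+@code{toy_} names are invented, and the table is assumed never to fill
+up. As in the dumper, entries are never removed.
+
+@example
+#include <stddef.h>
+#include <stdint.h>
+
+#define TOY_HASH_SIZE 1024   /* a power of two */
+
+struct toy_entry
+@{
+  const void *obj;           /* the object this entry describes */
+  size_t size;               /* per-object size */
+  long offset;               /* dump file offset, assigned later */
+@};
+
+static struct toy_entry toy_table[TOY_HASH_SIZE];
+
+/* Low address bits are often zero because of alignment, so drop a
+   few of them before masking. */
+static size_t
+toy_make_hash (const void *obj)
+@{
+  return ((uintptr_t) obj >> 3) & (TOY_HASH_SIZE - 1);
+@}
+
+static struct toy_entry *
+toy_get_entry (const void *obj)
+@{
+  size_t i = toy_make_hash (obj);
+  while (toy_table[i].obj != NULL)
+    @{
+      if (toy_table[i].obj == obj)
+        return &toy_table[i];
+      i = (i + 1) & (TOY_HASH_SIZE - 1);   /* linear probing */
+    @}
+  return NULL;
+@}
+
+/* The caller checks toy_get_entry () first, so OBJ is not present. */
+static void
+toy_add_entry (const void *obj, size_t size)
+@{
+  size_t i = toy_make_hash (obj);
+  while (toy_table[i].obj != NULL)
+    i = (i + 1) & (TOY_HASH_SIZE - 1);
+  toy_table[i].obj = obj;
+  toy_table[i].size = size;
+  toy_table[i].offset = -1;
+@}
+
+int
+main (void)
+@{
+  static double a, b;
+  if (toy_get_entry (&a) == NULL)
+    toy_add_entry (&a, sizeof a);
+  return (toy_get_entry (&a) != NULL && toy_get_entry (&b) == NULL) ? 0 : 1;
+@}
+@end example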
+
+The roots for the marking are:
+
+@enumerate
+@item
+the @code{staticpro}'ed variables (there is a special @code{staticpro_nodump()}
+call for protected variables we do not want to dump).
+
+@item
+the variables registered via @code{dump_add_root_object}
+(@code{staticpro()} is equivalent to @code{staticpro_nodump()} +
+@code{dump_add_root_object()}).
+
+@item
+the variables registered via @code{dump_add_root_struct_ptr}, each of
+which points to a C structure.
+@end enumerate
+
+This does not include the GCPRO'ed variables, the specbinds, the
+catchtags, the backlist, the redisplay or the profiling info. We do not
+want to rebuild the actual chain of Lisp calls that led to the
+@code{dump-emacs} call, only the global variables.
+
+Weak lists and weak hash tables are dumped as if they were their
+non-weak equivalent (without changing their type, of course). This has
+not yet been a problem.
+
+@node Address allocation, The header, Object inventory, Dumping phase
+@subsection Address allocation
+
+
+The next step is to allocate the offsets of each of the objects in the
+final dump file. This is done by @code{pdump_allocate_offset()} which
+is called indirectly by @code{pdump_scan_by_alignment()}.
+
+The strategy to deal with alignment problems uses these facts:
+
+@enumerate
+@item
+real world alignment requirements are powers of two.
+
+@item
+the C compiler is required to adjust the size of a struct so that you
+can have an array of them next to each other. This means that the
+largest power of two dividing the size of a structure is an upper bound
+on its alignment requirement.
+
+@item
+the non-variant part of variable size lrecords has an alignment
+requirement of 4.
+@end enumerate
+
+Hence, for each lrecord type, C struct type or opaque data block the
+alignment requirement is computed as a power of two, with a minimum of
+2^2 for lrecords. @code{pdump_scan_by_alignment()} then scans all the
+@code{pdump_entry_list_elmt}'s, the ones with the highest requirements
+first. This ensures the best packing.
+
+The maximum alignment requirement we take into account is 2^8.
+
+@code{pdump_allocate_offset()} only has to do a linear allocation,
+starting at offset 256 (this leaves room for the header and keeps the
+alignments happy).
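+
+The alignment rule and the linear allocation can be written down
+directly. This is a stand-alone sketch of the computation described
+above, not the code of @code{pdump_allocate_offset()}, and the function
+names are hypothetical. It takes the largest power of two dividing the
+object size, capped at 2^8, with a floor of 2^2 for lrecords, and it
+allocates offsets linearly from 256.
+
+@example
+#include <stddef.h>
+
+#define TOY_MAX_ALIGN_LOG2 8           /* 2^8 = 256 */
+
+/* log2 of the alignment bound implied by SIZE: the largest power of
+   two dividing SIZE, capped at 2^8, raised to 2^2 for lrecords. */
+static int
+toy_align_log2 (size_t size, int is_lrecord)
+@{
+  int k = 0;
+  while (k < TOY_MAX_ALIGN_LOG2 && size % ((size_t) 2 << k) == 0)
+    k++;
+  if (is_lrecord && k < 2)
+    k = 2;
+  return k;
+@}
+
+static size_t toy_next_offset = 256;   /* room for the header */
+
+/* Objects are handed out from the highest alignment down; as long as
+   each size is a multiple of its own alignment, the running offset
+   stays suitably aligned without any padding. */
+static size_t
+toy_allocate_offset (size_t size)
+@{
+  size_t offset = toy_next_offset;
+  toy_next_offset += size;
+  return offset;
+@}
+
+int
+main (void)
+@{
+  size_t a = toy_allocate_offset (24);   /* alignment 2^3 */
+  size_t b = toy_allocate_offset (12);   /* alignment 2^2 */
+  return (toy_align_log2 (24, 0) == 3 && toy_align_log2 (6, 1) == 2
+          && a == 256 && b == 280) ? 0 : 1;
+@}
+@end example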
+
+@node The header, Data dumping, Address allocation, Dumping phase
+@subsection The header
+
+The next step creates the file and writes a header with a signature and
+other miscellaneous information in it. The @code{reloc_address} field, which
+indicates at which address the file should be loaded if we want to avoid
+post-reload relocation, is set to 0. It then seeks to offset 256 (base
+offset for the objects).
+
+@node Data dumping, Pointers dumping, The header, Dumping phase
+@subsection Data dumping
+
+The data is dumped by @code{pdump_dump_data()}, called from
+@code{pdump_scan_by_alignment()}, in the same order in which the
+addresses were allocated. This function copies the data to a temporary
+buffer, relocates all pointers in the object to the addresses allocated
+in the Address Allocation step, and writes it to the file. Using the
+same order means that, if we are careful with lrecords whose size is not
+a multiple of 4, we are assured that each object is always written at
+the offset in the file allocated in the Address Allocation step.
+
+@node Pointers dumping, , Data dumping, Dumping phase
+@subsection Pointers dumping
+
+Several tables needed to properly reassign the global pointers are
+then written. They are:
+
+@enumerate
+@item
+the @code{pdump_root_struct_ptrs} dynarr
+@item
+the @code{pdump_opaques} dynarr
+@item
+a vector of all the offsets to the objects in the file that include a
+description (for faster relocation at reload time)
+@item
+the @code{pdump_root_objects} and @code{pdump_weak_object_chains} dynarrs.
+@end enumerate
+
+For each of the dynarrs we write both the pointer to the variables and
+the relocated offset of the object they point to. Since these variables
+are global, the pointers are still valid when restarting the program and
+are used to regenerate the global pointers.
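+
+Global variables live at the same address in every run of the same
+executable, so each saved root only needs the address of the variable
+and the file offset of its target. Here is a stand-alone sketch of both
+halves, using hypothetical names rather than those in @file{dumper.c}:
+
+@example
+#include <stddef.h>
+
+/* One saved root: where the global lives, and where its target is. */
+struct toy_root
+@{
+  void **varaddr;    /* address of a global pointer variable */
+  size_t offset;     /* offset in the dump file of the target object */
+@};
+
+/* Dump time: record the variable and its target's allocated offset. */
+static struct toy_root
+toy_save_root (void **varaddr, size_t target_offset)
+@{
+  struct toy_root r;
+  r.varaddr = varaddr;
+  r.offset = target_offset;
+  return r;
+@}
+
+/* Reload time: BASE is where the dump file was loaded in memory. */
+static void
+toy_restore_root (const struct toy_root *r, char *base)
+@{
+  *r->varaddr = base + r->offset;
+@}
+
+static void *toy_global_ptr;           /* stands for a dumped global */
+
+int
+main (void)
+@{
+  static char loaded_file[512];        /* pretend this is the loaded file */
+  struct toy_root r = toy_save_root (&toy_global_ptr, 300);
+  toy_global_ptr = NULL;               /* the value is lost on restart */
+  toy_restore_root (&r, loaded_file);
+  return toy_global_ptr == (void *) (loaded_file + 300) ? 0 : 1;
+@}
+@end example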
+
+The @code{pdump_weak_object_chains} dynarr is a special case. The
+variables it points to are the heads of weak linked lists of Lisp objects
+of the same type. Not all objects of these lists are dumped, so the
+relocated pointer we associate with them points to the first dumped
+object of the list, or @code{Qnil} if none is available. This is also the
+reason why they are not used as roots for the purpose of object
+enumeration.
+
+Some very important information like the @code{staticpros} and
+@code{lrecord_implementations_table} are handled indirectly using
+@code{dump_add_opaque} or @code{dump_add_root_struct_ptr}.
+
+This is the end of the dumping part.
+
+@node Reloading phase, Remaining issues, Dumping phase, Dumping
+@section Reloading phase
+
+@subsection File loading
+
+The file is @code{mmap}'ed into memory (which ensures @code{PAGESIZE}
+alignment, at least 4096 bytes). If @code{mmap} is unavailable or fails,
+a 256-byte-aligned block is @code{malloc}'ed and the file is read into
+it.
+
+Some variables are reinitialized from the values found in the header.
+
+The difference between the actual loading address and the
+@code{reloc_address} is computed and will be used for all the
+relocations.
+
+
+@subsection Putting back the pdump_opaques
+
+The memory contents are restored in the obvious and trivial way.
+
+
+@subsection Putting back the pdump_root_struct_ptrs
+
+The variables pointed to by @code{pdump_root_struct_ptrs} in the dump
+phase are reset to the right relocated object addresses.
+
+
+@subsection Object relocation
+
+All the objects are relocated using their description and their offset
+by @code{pdump_reloc_one}. This step is unnecessary if the
+@code{reloc_address} is equal to the file loading address.
+
+
+@subsection Putting back the pdump_root_objects and pdump_weak_object_chains
+
+This is done the same way as putting back the @code{pdump_root_struct_ptrs}.
+
+
+@subsection Reorganize the hash tables
+
+Since some of the hash values in the Lisp hash tables are
+address-dependent, their layout is now wrong. We therefore go through
+each of them and have them reorganized by calling
+@code{pdump_reorganize_hash_table}.
+
+@node Remaining issues, , Reloading phase, Dumping
+@section Remaining issues
+
+The build process will have to start a post-dump XEmacs, ask it for the
+loading address (which will, hopefully, always be the same between
+different XEmacs invocations) and relocate the file to that address.
+This way the object relocation phase will not have to be done. That
+means there will be no writes to the objects and, because of the use of
+@code{mmap}, the dumped data will be shared between all the XEmacs
+processes running on the computer.
+
+Some executable signature will be necessary to ensure that a given dump
+file is really associated with a given executable, or random crashes
+will occur. This could be a random number set at compile or configure
+time through a @code{#define}. It would also allow for having
+differently-compiled XEmacsen on the same system (Mule and non-Mule
+come to mind).
+
+The DOC file contents should probably end up in the dump file.
+
+
+@node Events and the Event Loop, Evaluation; Stack Frames; Bindings, Dumping, Top
@chapter Events and the Event Loop
@menu
* Dispatching Events; The Command Builder::
@end menu
-@node Introduction to Events
+@node Introduction to Events, Main Loop, Events and the Event Loop, Events and the Event Loop
@section Introduction to Events
An event is an object that encapsulates information about an
nature of the most basic events that are received. Part of the
complex nature of the XEmacs event collection process involves
converting from the operating-system events into the proper
-Emacs events -- there may not be a one-to-one correspondence.
+Emacs events---there may not be a one-to-one correspondence.
Emacs events are documented in @file{events.h}; I'll discuss them
later.
-@node Main Loop
+@node Main Loop, Specifics of the Event Gathering Mechanism, Introduction to Events, Events and the Event Loop
@section Main Loop
The @dfn{command loop} is the top-level loop that the editor is always
This is documented elsewhere.
The guts of the command loop are in @code{command_loop_1()}. This
-function doesn't catch errors, though -- that's the job of
+function doesn't catch errors, though---that's the job of
@code{command_loop_2()}, which is a condition-case (i.e. error-trapping)
wrapper around @code{command_loop_1()}. @code{command_loop_1()} never
returns, but may get thrown out of.
invoking @code{top_level_1()}, just like when it invokes
@code{command_loop_2()}.
-@node Specifics of the Event Gathering Mechanism
+@node Specifics of the Event Gathering Mechanism, Specifics About the Emacs Event, Main Loop, Events and the Event Loop
@section Specifics of the Event Gathering Mechanism
Here is an approximate diagram of the collection processes
@noindent
@example
- asynch. asynch. asynch. asynch. [Collectors in
-kbd events kbd events process process the OS]
- | | output output
- | | | |
- | | | | SIGINT, [signal handlers
- | | | | SIGQUIT, in XEmacs]
+ asynch. asynch. asynch. asynch. [Collectors in
+kbd events kbd events process process the OS]
+ | | output output
+ | | | |
+ | | | | SIGINT, [signal handlers
+ | | | | SIGQUIT, in XEmacs]
V V V V SIGWINCH,
file file file file SIGALRM
desc. desc. desc. desc. |
| | | | | |
V V V V V V
------>-----------<----------------<----------------
- |
- |
- | [collected using select() in emacs_tty_next_event()
- | and converted to the appropriate Emacs event]
- |
- |
- V (above this line is TTY-specific)
- Emacs ------------------------------------------------
- event (below this line is the generic event mechanism)
- |
- |
-was there if not, call
-a SIGINT? emacs_tty_next_event()
- | |
- | |
- | |
- V V
- --->-------<----
+ |
+ |
+ | [collected using select() in emacs_tty_next_event()
+ | and converted to the appropriate Emacs event]
+ |
+ |
+ V (above this line is TTY-specific)
+ Emacs -----------------------------------------------
+ event (below this line is the generic event mechanism)
+ |
+ |
+was there if not, call
+a SIGINT? emacs_tty_next_event()
+ | |
+ | |
+ | |
+ V V
+ --->------<----
|
- | [collected in event_stream_next_event();
- | SIGINT is converted using maybe_read_quit_event()]
+ | [collected in event_stream_next_event();
+ | SIGINT is converted using maybe_read_quit_event()]
V
Emacs
event
|
|
command event queue |
- if not from command
- (contains events that were event queue, call
- read earlier but not processed, event_stream_next_event()
+ if not from command
+ (contains events that were event queue, call
+ read earlier but not processed, event_stream_next_event()
typically when waiting in a |
sit-for, sleep-for, etc. for |
a particular event to be received) |
V V
---->------------------------------------<----
|
- | [collected in
- | next_event_internal()]
+ | [collected in
+ | next_event_internal()]
|
unread- unread- event from |
command- command- keyboard else, call
@example
asynch. asynch. asynch. asynch. [Collectors in
kbd kbd process process the OS]
-events events output output
- | | | |
- | | | | asynch. asynch. [Collectors in the
- | | | | X X OS and X Window System]
- | | | | events events
+events events output output
+ | | | |
+ | | | | asynch. asynch. [Collectors in the
+ | | | | X X OS and X Window System]
+ | | | | events events
| | | | | |
| | | | | |
- | | | | | | SIGINT, [signal handlers
- | | | | | | SIGQUIT, in XEmacs]
- | | | | | | SIGWINCH,
- | | | | | | SIGALRM
- | | | | | | |
- | | | | | | |
- | | | | | | | timeouts
+ | | | | | | SIGINT, [signal handlers
+ | | | | | | SIGQUIT, in XEmacs]
+ | | | | | | SIGWINCH,
+ | | | | | | SIGALRM
+ | | | | | | |
+ | | | | | | |
+ | | | | | | | timeouts
| | | | | | | |
| | | | | | | |
| | | | | | V |
- V V V V V V fake |
- file file file file file file file |
- desc. desc. desc. desc. desc. desc. desc. |
- (TTY) (TTY) (pipe) (pipe) (socket) (socket) (pipe) |
+ V V V V V V fake |
+ file file file file file file file |
+ desc. desc. desc. desc. desc. desc. desc. |
+ (TTY) (TTY) (pipe) (pipe) (socket) (socket) (pipe) |
| | | | | | | |
| | | | | | | |
| | | | | | | |
- V V V V V V V V
+ V V V V V V V V
--->----------------------------------------<---------<------
| | |
- | | | [collected using select() in
- | | | _XtWaitForSomething(), called
- | | | from XtAppProcessEvent(), called
- | | | in emacs_Xt_next_event();
- | | | dispatched to various callbacks]
+ | | |[collected using select() in
+ | | | _XtWaitForSomething(), called
+ | | | from XtAppProcessEvent(), called
+ | | | in emacs_Xt_next_event();
+ | | | dispatched to various callbacks]
| | |
| | |
- emacs_Xt_ p_s_callback(), | [popup_selection_callback]
- event_handler() x_u_v_s_callback(),| [x_update_vertical_scrollbar_
- | x_u_h_s_callback(),| callback]
- | search_callback() | [x_update_horizontal_scrollbar_
- | | | callback]
+ emacs_Xt_ p_s_callback(), | [popup_selection_callback]
+ event_handler() x_u_v_s_callback(),| [x_update_vertical_scrollbar_
+ | x_u_h_s_callback(),| callback]
+ | search_callback() | [x_update_horizontal_scrollbar_
+ | | | callback]
| | |
| | |
enqueue_Xt_ signal_special_ |
-->----------<-- |
| |
| |
- dispatch Xt_what_callback()
+ dispatch Xt_what_callback()
event sets flags
queue |
| |
| |
| |
---->-----------<--------
- |
+ |
|
| [collected and converted as appropriate in
| emacs_Xt_next_event()]
- |
- |
- V (above this line is Xt-specific)
- Emacs ------------------------------------------------
- event (below this line is the generic event mechanism)
+ |
+ |
+ V (above this line is Xt-specific)
+ Emacs ------------------------------------------------
+ event (below this line is the generic event mechanism)
|
|
was there if not, call
|
|
command event queue |
- if not from command
- (contains events that were event queue, call
- read earlier but not processed, event_stream_next_event()
+ if not from command
+ (contains events that were event queue, call
+ read earlier but not processed, event_stream_next_event()
typically when waiting in a |
sit-for, sleep-for, etc. for |
a particular event to be received) |
V V
---->----------------------------------<------
|
- | [collected in
- | next_event_internal()]
+ | [collected in
+ | next_event_internal()]
|
unread- unread- event from |
command- command- keyboard else, call
using `dispatch-event'
@end example
-@node Specifics About the Emacs Event
+@node Specifics About the Emacs Event, The Event Stream Callback Routines, Specifics of the Event Gathering Mechanism, Events and the Event Loop
@section Specifics About the Emacs Event
-@node The Event Stream Callback Routines
+@node The Event Stream Callback Routines, Other Event Loop Functions, Specifics About the Emacs Event, Events and the Event Loop
@section The Event Stream Callback Routines
-@node Other Event Loop Functions
+@node Other Event Loop Functions, Converting Events, The Event Stream Callback Routines, Events and the Event Loop
@section Other Event Loop Functions
@code{detect_input_pending()} and @code{input-pending-p} look for
the right kind of input method support, it is possible for (read-char)
to return a Kanji character.
-@node Converting Events
+@node Converting Events, Dispatching Events; The Command Builder, Other Event Loop Functions, Events and the Event Loop
@section Converting Events
@code{character_to_event()}, @code{event_to_character()},
between character representation and the split-up event representation
(keysym plus mod keys).
-@node Dispatching Events; The Command Builder
+@node Dispatching Events; The Command Builder, , Converting Events, Events and the Event Loop
@section Dispatching Events; The Command Builder
Not yet documented.
* Catch and Throw::
@end menu
-@node Evaluation
+@node Evaluation, Dynamic Binding; The specbinding Stack; Unwind-Protects, Evaluation; Stack Frames; Bindings, Evaluation; Stack Frames; Bindings
@section Evaluation
@code{Feval()} evaluates the form (a Lisp object) that is passed to
it. Note that evaluation is only non-trivial for two types of objects:
symbols and conses. A symbol is evaluated simply by calling
-symbol-value on it and returning the value.
+@code{symbol-value} on it and returning the value.
Evaluating a cons means calling a function. First, @code{Feval()} checks
whether garbage collection is necessary, and calls
-@code{Fgarbage_collect()} if so. It then increases the evaluation depth
-by 1 (@code{lisp_eval_depth}, which is always less than @code{max_lisp_eval_depth}) and adds an
-element to the linked list of @code{struct backtrace}'s
-(@code{backtrace_list}). Each such structure contains a pointer to the
-function being called plus a list of the function's arguments.
-Originally these values are stored unevalled, and as they are evaluated,
-the backtrace structure is updated. Garbage collection pays attention
-to the objects pointed to in the backtrace structures (garbage
-collection might happen while a function is being called or while an
-argument is being evaluated, and there could easily be no other
-references to the arguments in the argument list; once an argument is
-evaluated, however, the unevalled version is not needed by eval, and so
-the backtrace structure is changed).
-
- At this point, the function to be called is determined by looking at
+@code{garbage_collect_1()} if so. It then increases the evaluation
+depth by 1 (@code{lisp_eval_depth}, which is not allowed to exceed
+@code{max_lisp_eval_depth}) and adds an element to the linked list of
+@code{struct backtrace} records (@code{backtrace_list}). Each such
+record contains a pointer to the function being called plus a list of
+the function's arguments. Initially these arguments are stored
+unevaluated; as they are evaluated, the backtrace structure is updated.
+Garbage collection pays attention to the objects pointed to in the
+backtrace structures. Garbage collection might happen while a function
+is being called or while an argument is being evaluated, and there
+could easily be no other references to the arguments in the argument
+list. Once an argument is evaluated, however, eval no longer needs the
+unevaluated version, and so the backtrace structure is changed.
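The bookkeeping described above can be modeled in a few lines of C: bump the depth, link a backtrace record, and undo both on exit. This is a minimal sketch under simplifying assumptions, not the code in @file{eval.c}. All names are hypothetical, a string stands in for the function object, and an argument count stands in for the argument list. Where the sketch returns 0 at the depth limit, the real evaluator signals a Lisp error instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of the per-call bookkeeping in
   Feval(); these are not XEmacs's real declarations. */
struct sketch_backtrace
{
  struct sketch_backtrace *next;   /* the caller's record */
  const char *function;            /* stands in for the function object */
  int nargs;                       /* stands in for the argument list */
};

static struct sketch_backtrace *sketch_backtrace_list = NULL;
static int sketch_eval_depth = 0;
static int sketch_max_eval_depth = 3;

/* Enter a call: bump the depth and push FRAME onto the list.
   Returns 0, changing nothing, if the depth limit would be exceeded. */
static int
sketch_enter_call (struct sketch_backtrace *frame, const char *fn, int nargs)
{
  if (sketch_eval_depth + 1 > sketch_max_eval_depth)
    return 0;
  sketch_eval_depth++;
  frame->function = fn;
  frame->nargs = nargs;
  frame->next = sketch_backtrace_list;
  sketch_backtrace_list = frame;
  return 1;
}

/* Leave a call: pop FRAME (which must be the newest record) and
   restore the depth. */
static void
sketch_exit_call (struct sketch_backtrace *frame)
{
  assert (sketch_backtrace_list == frame);
  sketch_backtrace_list = frame->next;
  sketch_eval_depth--;
}
```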
+
+At this point, the function to be called is determined by looking at
the car of the cons (if this is a symbol, its function definition is
retrieved and the process repeated). The function should then consist
-of either a @code{Lisp_Subr} (built-in function), a
-@code{Lisp_Compiled_Function} object, or a cons whose car is the symbol
-@code{autoload}, @code{macro} or @code{lambda}.
+of either a @code{Lisp_Subr} (built-in function written in C), a
+@code{Lisp_Compiled_Function} object, or a cons whose car is one of the
+symbols @code{autoload}, @code{macro} or @code{lambda}.
If the function is a @code{Lisp_Subr}, the Lisp object points to a
@code{struct Lisp_Subr} (created by @code{DEFUN()}), which contains a
pointer to the C function, a minimum and maximum number of arguments
-(possibly the special constants @code{MANY} or @code{UNEVALLED}), a
+(or possibly the special constants @code{MANY} or @code{UNEVALLED}), a
pointer to the symbol referring to that subr, and a couple of other
things. If the subr wants its arguments @code{UNEVALLED}, they are
passed raw as a list. Otherwise, an array of evaluated arguments is
created and put into the backtrace structure, and then either the whole
array is passed (@code{MANY}) or each argument is passed as a separate
C argument.
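The two calling conventions for evaluated arguments can be sketched as follows. This is a hypothetical model, not XEmacs's @code{struct Lisp_Subr}. Plain @code{int}s stand in for Lisp objects and 0 stands in for @code{nil}. The special @code{MANY} and @code{UNEVALLED} values are modeled as negative constants whose real values may differ. Only fixed-arity subrs of up to two arguments are handled.

```c
#include <assert.h>

/* Hypothetical model of a subr descriptor; not XEmacs's real struct. */
typedef int obj;

enum { SKETCH_MANY = -2, SKETCH_UNEVALLED = -1 };

struct sketch_subr
{
  int min_args;
  int max_args;                        /* count, SKETCH_MANY or SKETCH_UNEVALLED */
  obj (*fn_many) (int nargs, obj *args);   /* used for SKETCH_MANY */
  obj (*fn2) (obj, obj);                   /* used when max_args == 2 */
};

/* Call SUBR on already-evaluated ARGS.  A MANY subr receives the whole
   array; a fixed-arity subr receives each argument as a separate C
   argument, with missing optional arguments passed as nil (0 here). */
static obj
sketch_call_subr (const struct sketch_subr *subr, int nargs, obj *args)
{
  assert (subr->max_args != SKETCH_UNEVALLED);  /* those get the raw list */
  assert (nargs >= subr->min_args);
  if (subr->max_args == SKETCH_MANY)
    return subr->fn_many (nargs, args);
  /* This sketch only models fixed-arity subrs taking up to two args. */
  assert (subr->max_args == 2 && nargs <= 2);
  return subr->fn2 (nargs > 0 ? args[0] : 0, nargs > 1 ? args[1] : 0);
}

/* Demo subr bodies. */
static obj
plus_many (int nargs, obj *args)
{
  obj sum = 0;
  for (int i = 0; i < nargs; i++)
    sum += args[i];
  return sum;
}

static obj
minus2 (obj a, obj b)
{
  return a - b;
}
```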
- If the function is a @code{Lisp_Compiled_Function} object or a lambda,
-@code{apply_lambda()} is called. If the function is a macro,
-[..... fill in] is done. If the function is an autoload,
+If the function is a @code{Lisp_Compiled_Function},
+@code{funcall_compiled_function()} is called. If the function is a
+lambda expression, @code{funcall_lambda()} is called. If the function is a
+macro, [..... fill in] is done. If the function is an autoload,
@code{do_autoload()} is called to load the definition and then eval
starts over [explain this more].
- When @code{Feval} exits, the evaluation depth is reduced by one, the
+When @code{Feval()} exits, the evaluation depth is reduced by one, the
debugger is called if appropriate, and the current backtrace structure
is removed from the list.
- @code{apply_lambda()} is passed a function, a list of arguments, and a
-flag indicating whether to evaluate the arguments. It creates an array
-of (possibly) evaluated arguments and fixes up the backtrace structure,
-just like eval does. Then it calls @code{funcall_lambda()}.
+Both @code{funcall_compiled_function()} and @code{funcall_lambda()} need
+to go through the list of formal parameters to the function and bind
+them to the actual arguments, checking for @code{&rest} and
+@code{&optional} symbols in the formal parameters and making sure the
+number of actual arguments is correct.
+@code{funcall_compiled_function()} can do this a little more
+efficiently, since the formal parameter list can be checked for sanity
+when the compiled function object is created.
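Here is a minimal sketch of the parameter-binding step, assuming the formal parameter list is given as an array of names. All names are hypothetical. @code{int}s stand in for Lisp objects, 0 stands in for @code{nil}, and a @code{&rest} parameter is bound to the count of remaining arguments rather than to a real list. Where the sketch returns -1, the interpreter would signal an error.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical binding record: a formal parameter name and its value. */
struct sketch_binding { const char *name; int value; };

/* Bind FORMALS to ARGS, honoring &optional and &rest.  Returns the
   number of bindings written to OUT, or -1 for a wrong number of
   arguments. */
static int
sketch_bind_formals (const char **formals, int nformals,
                     const int *args, int nargs,
                     struct sketch_binding *out)
{
  int optional = 0, nbound = 0, argi = 0;
  for (int i = 0; i < nformals; i++)
    {
      if (strcmp (formals[i], "&optional") == 0)
        {
          optional = 1;
          continue;
        }
      if (strcmp (formals[i], "&rest") == 0)
        {
          /* The next formal takes all remaining arguments. */
          assert (i + 1 < nformals);
          out[nbound].name = formals[i + 1];
          out[nbound].value = nargs - argi;
          return nbound + 1;
        }
      if (argi < nargs)
        out[nbound].value = args[argi++];
      else if (optional)
        out[nbound].value = 0;      /* missing optional arg is nil */
      else
        return -1;                  /* too few arguments */
      out[nbound++].name = formals[i];
    }
  return argi < nargs ? -1 : nbound;  /* -1 if too many arguments */
}
```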
+
+@code{funcall_lambda()} simply calls @code{Fprogn} to execute the code
+in the body of the lambda expression.
+
+@code{funcall_compiled_function()} calls the real byte-code interpreter
+@code{execute_optimized_program()} on the byte-code instructions, which
+are converted into an internal form for faster execution.
+
+When a compiled function is executed for the first time by
+@code{funcall_compiled_function()}, or during the dump phase of building
+XEmacs, the byte-code instructions are converted from a
+@code{Lisp_String} (which is inefficient to access, especially in the
+presence of MULE) into a @code{Lisp_Opaque} object containing an array
+of unsigned char, which can be directly executed by the byte-code
+interpreter. At this time the byte code is also analyzed for validity
+and transformed into a more optimized form, so that
+@code{execute_optimized_program()} can really fly.
+
+Here are some of the optimizations performed by the internal byte-code
+transformer:
+@enumerate
+@item
+References to the @code{constants} array are checked for out-of-range
+indices, so that the byte interpreter doesn't have to.
+@item
+References to the @code{constants} array that will be used as Lisp
+variables are checked to be non-constant symbols (i.e. not @code{t},
+@code{nil}, or a keyword), so that the byte interpreter doesn't have
+to.
+@item
+The maximum number of variable bindings in the byte-code is
+pre-computed, so that space on the @code{specpdl} stack can be
+pre-reserved once for the whole function execution.
+@item
+All byte-code jumps are made relative to the current program counter
+instead of the start of the program, thereby saving a register.
+@item
+One-byte relative jumps are converted from the byte-code form of unsigned
+chars offset by 127 to machine-friendly signed chars.
+@end enumerate
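The last two transformations are simple arithmetic, sketched below with hypothetical helper names. The sketch assumes a relative displacement is measured from whatever position the interpreter treats as the current program counter. Which position that is (for example, before or after the operand bytes) is not settled by the text above.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: recover the displacement of a one-byte relative
   jump whose operand is stored as an unsigned char biased by 127. */
static int
sketch_decode_short_jump (unsigned char operand)
{
  return (int) operand - 127;            /* range -127 .. 128 */
}

/* Hypothetical helper: store that displacement as a signed char, a form
   the interpreter can add to its program counter directly.  Returns 0
   if it does not fit (only operand 255, i.e. +128, fails). */
static int
sketch_encode_signed_jump (unsigned char operand, signed char *out)
{
  int disp = sketch_decode_short_jump (operand);
  if (disp < SCHAR_MIN || disp > SCHAR_MAX)
    return 0;
  *out = (signed char) disp;
  return 1;
}

/* Hypothetical helper: turn an absolute jump target (an offset from the
   start of the program) into a displacement from PC. */
static int
sketch_absolute_to_relative (int target, int pc)
{
  return target - pc;
}
```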
- @code{funcall_lambda()} goes through the formal arguments to the
-function and binds them to the actual arguments, checking for
-@code{&rest} and @code{&optional} symbols in the formal arguments and
-making sure the number of actual arguments is correct. Then either
-@code{progn} or @code{byte-code} is called to actually execute the body
-and return a value.
+Of course, this transformation of the @code{instructions} should not be
+visible to the user, so @code{Fcompiled_function_instructions()} needs
+to know how to convert the optimized opaque object back into a Lisp
+string that is identical to the original string from the @file{.elc}
+file. (Actually, the resulting string may, in rare cases, contain
+slightly different, yet equivalent, byte code.)
- @code{Ffuncall()} implements Lisp @code{funcall}. @code{(funcall fun
+@code{Ffuncall()} implements Lisp @code{funcall}. @code{(funcall fun
x1 x2 x3 ...)} is equivalent to @code{(eval (list fun (quote x1) (quote
x2) (quote x3) ...))}. @code{Ffuncall()} contains its own code to do
-the evaluation, however, and is almost identical to eval.
+the evaluation, however, and is very similar to @code{Feval()}.
- @code{Fapply()} implements Lisp @code{apply}, which is very similar to
+From the performance point of view, it is worth knowing that most of the
+time in Lisp evaluation is spent executing @code{Lisp_Subr} and
+@code{Lisp_Compiled_Function} objects via @code{Ffuncall()} (not
+@code{Feval()}).
+
+@code{Fapply()} implements Lisp @code{apply}, which is very similar to
@code{funcall} except that its last argument is a list, and the result is
the same as if each element of that list had been passed as a separate
argument. @code{Fapply()} expands the last argument into individual
arguments, then calls @code{Ffuncall()} to do the work.
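The sketch below shows what @code{Fapply()} does with its arguments. It assumes the final list has already been converted to a C array; the real function walks a Lisp list. All names are hypothetical and @code{int}s stand in for Lisp objects. In Lisp terms, @code{(apply '+ 1 2 '(3 4 5))} calls @code{+} with the five arguments 1 through 5.

```c
#include <assert.h>

/* Hypothetical function type: receives an argument count and array,
   like a MANY-style call through funcall. */
typedef int (*sketch_fn) (int nargs, const int *args);

/* Spread the elements of LIST after the FIXED arguments, then make a
   single funcall-style call with the combined array. */
static int
sketch_apply (sketch_fn fn, int nfixed, const int *fixed,
              int nlist, const int *list)
{
  int spread[64];
  assert (nfixed + nlist <= 64);
  for (int i = 0; i < nfixed; i++)
    spread[i] = fixed[i];
  for (int i = 0; i < nlist; i++)
    spread[nfixed + i] = list[i];   /* each list element becomes an arg */
  return fn (nfixed + nlist, spread);
}

/* Demo function: sum of all arguments. */
static int
sketch_sum (int nargs, const int *args)
{
  int s = 0;
  for (int i = 0; i < nargs; i++)
    s += args[i];
  return s;
}
```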
- @code{apply1()}, @code{call0()}, @code{call1()}, @code{call2()}, and
+@code{apply1()}, @code{call0()}, @code{call1()}, @code{call2()}, and
@code{call3()} call a function, passing it the argument(s) given (the
arguments are given as separate C arguments rather than being passed as
-an array). @code{apply1()} uses @code{apply} while the others use
-@code{funcall}.
+an array). @code{apply1()} uses @code{Fapply()} while the others use
+@code{Ffuncall()} to do the real work.
-@node Dynamic Binding; The specbinding Stack; Unwind-Protects
+@node Dynamic Binding; The specbinding Stack; Unwind-Protects, Simple Special Forms, Evaluation, Evaluation; Stack Frames; Bindings
@section Dynamic Binding; The specbinding Stack; Unwind-Protects
@example
struct specbinding
@{
- Lisp_Object symbol, old_value;
+ Lisp_Object symbol;
+ Lisp_Object old_value;
Lisp_Object (*func) (Lisp_Object); /* for unwind-protect */
@};
@end example
the symbol's value).
@end enumerate
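The following simplified model of the specbinding stack shows how dynamic bindings and unwind-protects share one LIFO stack. It is a sketch, not the XEmacs implementation. A ``symbol'' is just a pointer to an @code{int} value cell and all names are hypothetical. As an assumption of this sketch, an unwind-protect entry reuses the @code{old_value} slot to hold its cleanup function's argument. A @code{let} would push one binding per variable and unwind back to the saved depth on exit.

```c
#include <assert.h>

#define SKETCH_SPECPDL_SIZE 32

/* Hypothetical specbinding entry: either a saved value to restore or a
   cleanup function to run. */
struct sketch_specbinding
{
  int *symbol;              /* value cell; NULL for unwind-protect */
  int old_value;            /* saved value, or the cleanup's argument */
  void (*func) (int);       /* cleanup function, for unwind-protect */
};

static struct sketch_specbinding sketch_specpdl[SKETCH_SPECPDL_SIZE];
static int sketch_specpdl_depth = 0;

/* Dynamically bind SYMBOL to NEW_VALUE, saving the old value. */
static void
sketch_specbind (int *symbol, int new_value)
{
  struct sketch_specbinding *b;
  assert (sketch_specpdl_depth < SKETCH_SPECPDL_SIZE);
  b = &sketch_specpdl[sketch_specpdl_depth++];
  b->symbol = symbol;
  b->old_value = *symbol;
  b->func = 0;
  *symbol = new_value;
}

/* Arrange for FUNC (ARG) to run when this entry is unwound. */
static void
sketch_record_unwind_protect (void (*func) (int), int arg)
{
  struct sketch_specbinding *b;
  assert (sketch_specpdl_depth < SKETCH_SPECPDL_SIZE);
  b = &sketch_specpdl[sketch_specpdl_depth++];
  b->symbol = 0;
  b->old_value = arg;
  b->func = func;
}

/* Undo entries above COUNT, most recent first. */
static void
sketch_unbind_to (int count)
{
  while (sketch_specpdl_depth > count)
    {
      struct sketch_specbinding *b = &sketch_specpdl[--sketch_specpdl_depth];
      if (b->func)
        b->func (b->old_value);
      else
        *b->symbol = b->old_value;
    }
}

/* Demo cleanup used in the usage example. */
static int sketch_cleanup_arg = -1;
static void sketch_note_cleanup (int arg) { sketch_cleanup_arg = arg; }
```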
-@node Simple Special Forms
+@node Simple Special Forms, Catch and Throw, Dynamic Binding; The specbinding Stack; Unwind-Protects, Evaluation; Stack Frames; Bindings
@section Simple Special Forms
@code{or}, @code{and}, @code{if}, @code{cond}, @code{progn},
@code{prog1}, @code{prog2}, @code{setq}, @code{quote}, @code{function},
@code{let*}, @code{let}, @code{while}
- All of these are very simple and work as expected, calling
+All of these are very simple and work as expected, calling
@code{Feval()} or @code{Fprogn()} as necessary and (in the case of
@code{let} and @code{let*}) using @code{specbind()} to create bindings
-and @code{unbind_to()} to undo the bindings when finished. Note that
-these functions do a lot of @code{GCPRO}ing to protect their arguments
-from garbage collection because they call @code{Feval()} (@pxref{Garbage
-Collection}).
+and @code{unbind_to()} to undo the bindings when finished.
+
+Note that, with the exception of @code{Fprogn}, these functions are
+typically called in real life only in interpreted code, since the byte
+compiler knows how to convert calls to these functions directly into
+byte code.
-@node Catch and Throw
+@node Catch and Throw, , Simple Special Forms, Evaluation; Stack Frames; Bindings
@section Catch and Throw
@example
* Symbol Values::
@end menu
-@node Introduction to Symbols
+@node Introduction to Symbols, Obarrays, Symbols and Variables, Symbols and Variables
@section Introduction to Symbols
A symbol is basically just an object with four fields: a name (a
additional values with particular names, and once again the namespace is
independent of the function and variable namespaces.
-@node Obarrays
+@node Obarrays, Symbol Values, Introduction to Symbols, Symbols and Variables
@section Obarrays
The identity of symbols with their names is accomplished through a
into any obarray.) Finally, @code{mapatoms} maps over all of the symbols
in an obarray.
-@node Symbol Values
+@node Symbol Values, , Obarrays, Symbols and Variables
@section Symbol Values
The value field of a symbol normally contains a Lisp object. However,
* The Buffer Object:: The Lisp object corresponding to a buffer.
@end menu
-@node Introduction to Buffers
+@node Introduction to Buffers, The Text in a Buffer, Buffers and Textual Representation, Buffers and Textual Representation
@section Introduction to Buffers
A buffer is logically just a Lisp object that holds some text.
gets restored when the code is finished). However, calling
@code{set-buffer} will NOT cause a permanent change in the current
buffer. The reason for this is that the top-level event loop sets
-@code{current_buffer} to the buffer of the selected window, each time
+@code{current_buffer} to the buffer of the selected window, each time
it finishes executing a user command.
@end enumerate
window. (This latter distinction is explained in detail in the section
on windows.)
-@node The Text in a Buffer
+@node The Text in a Buffer, Buffer Lists, Introduction to Buffers, Buffers and Textual Representation
@section The Text in a Buffer
The text in a buffer consists of a sequence of zero or more
number of possible alternative representations (e.g. EUC-encoded text,
etc.).
-@node Buffer Lists
+@node Buffer Lists, Markers and Extents, The Text in a Buffer, Buffers and Textual Representation
@section Buffer Lists
Recall from earlier that buffers are @dfn{permanent} objects, i.e. that
a unique name from this by appending a number, and then creates the
buffer. This is basically like the symbol operation @code{gensym}.
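Here is a sketch of the naming step, assuming the usual @samp{NAME<N>} convention with @var{N} counting up from 2. The helper names are hypothetical, and a plain array of strings stands in for the buffer list.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical lookup standing in for a search of the buffer list. */
static int
sketch_name_taken (const char *name, const char **existing, int n)
{
  for (int i = 0; i < n; i++)
    if (strcmp (existing[i], name) == 0)
      return 1;
  return 0;
}

/* Write a unique name into OUT: NAME itself if unused, otherwise the
   first free NAME<2>, NAME<3>, ...  OUTSIZE is assumed large enough
   that the suffix is never truncated. */
static void
sketch_generate_new_buffer_name (const char *name, const char **existing,
                                 int n, char *out, size_t outsize)
{
  if (!sketch_name_taken (name, existing, n))
    {
      snprintf (out, outsize, "%s", name);
      return;
    }
  for (int count = 2; ; count++)
    {
      snprintf (out, outsize, "%s<%d>", name, count);
      if (!sketch_name_taken (out, existing, n))
        return;
    }
}
```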
-@node Markers and Extents
+@node Markers and Extents, Bufbytes and Emchars, Buffer Lists, Buffers and Textual Representation
@section Markers and Extents
Among the things associated with a buffer are things that are
buffer positions in them as integers, and every time text is inserted or
deleted, these positions must be updated. In order to minimize the
amount of shuffling that needs to be done, the positions in markers and
-extents (there's one per marker, two per extent) and stored in Meminds.
+extents (there's one per marker, two per extent) are stored in Meminds.
This means that they only need to be moved when the text is physically
moved in memory; since the gap structure tries to minimize this, it also
minimizes the number of marker and extent indices that need to be
(which could happen as a result of text being deleted) or the buffer is
deleted, and primitives do exist to enumerate the extents in a buffer.
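The relationship between positions and meminds can be sketched for a simple gap buffer. This model is hypothetical. It uses 0-based positions (XEmacs buffer positions start at 1) and assumes one byte per character, ignoring MULE. It shows why a stored memind changes only for text that physically moves when the gap does.

```c
#include <assert.h>

/* Hypothetical gap-buffer geometry (one byte per character). */
struct sketch_gap_buffer
{
  int gap_start;   /* memory index where the gap begins */
  int gap_size;    /* number of unused bytes in the gap */
};

/* Position -> memory index: positions at or after the gap are shifted
   by the gap's size. */
static int
sketch_pos_to_memind (const struct sketch_gap_buffer *b, int pos)
{
  return pos < b->gap_start ? pos : pos + b->gap_size;
}

/* Memory index -> position; MEMIND must not point inside the gap. */
static int
sketch_memind_to_pos (const struct sketch_gap_buffer *b, int memind)
{
  assert (memind < b->gap_start || memind >= b->gap_start + b->gap_size);
  return memind < b->gap_start ? memind : memind - b->gap_size;
}

/* Adjust a stored memind when the gap moves from OLD_START to
   NEW_START.  Only text between the two gap positions is copied in
   memory, so only meminds in that range change. */
static int
sketch_adjust_memind_for_gap_move (int memind, int old_start, int new_start,
                                   int gap_size)
{
  if (new_start < old_start && memind >= new_start && memind < old_start)
    return memind + gap_size;   /* text moved up, past the gap */
  if (new_start > old_start && memind >= old_start + gap_size
      && memind < new_start + gap_size)
    return memind - gap_size;   /* text moved down, below the gap */
  return memind;
}
```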
-@node Bufbytes and Emchars
+@node Bufbytes and Emchars, The Buffer Object, Markers and Extents, Buffers and Textual Representation
@section Bufbytes and Emchars
Not yet documented.
-@node The Buffer Object
+@node The Buffer Object, , Bufbytes and Emchars, Buffers and Textual Representation
@section The Buffer Object
Buffers contain fields not directly accessible by the Lisp programmer.
@table @code
@item name
The buffer name is a string that names the buffer. It is guaranteed to
-be unique. @xref{Buffer Names,,, lispref, XEmacs Lisp Programmer's
+be unique. @xref{Buffer Names,,, lispref, XEmacs Lisp Reference
Manual}.
@item save_modified
This field contains the time when the buffer was last saved, as an
-integer. @xref{Buffer Modification,,, lispref, XEmacs Lisp Programmer's
+integer. @xref{Buffer Modification,,, lispref, XEmacs Lisp Reference
Manual}.
@item modtime
This field contains the modification time of the visited file. It is
set when the file is written or read. Every time the buffer is written
to the file, this field is compared to the modification time of the
-file. @xref{Buffer Modification,,, lispref, XEmacs Lisp Programmer's
+file. @xref{Buffer Modification,,, lispref, XEmacs Lisp Reference
Manual}.
@item auto_save_modified
@item undo_list
This field points to the buffer's undo list. @xref{Undo,,, lispref,
-XEmacs Lisp Programmer's Manual}.
+XEmacs Lisp Reference Manual}.
@item syntax_table_v
This field contains the syntax table for the buffer. @xref{Syntax
-Tables,,, lispref, XEmacs Lisp Programmer's Manual}.
+Tables,,, lispref, XEmacs Lisp Reference Manual}.
@item downcase_table
This field contains the conversion table for converting text to lower
-case. @xref{Case Tables,,, lispref, XEmacs Lisp Programmer's Manual}.
+case. @xref{Case Tables,,, lispref, XEmacs Lisp Reference Manual}.
@item upcase_table
This field contains the conversion table for converting text to upper
-case. @xref{Case Tables,,, lispref, XEmacs Lisp Programmer's Manual}.
+case. @xref{Case Tables,,, lispref, XEmacs Lisp Reference Manual}.
@item case_canon_table
This field contains the conversion table for canonicalizing text for
case-folding search. @xref{Case Tables,,, lispref, XEmacs Lisp
-Programmer's Manual}.
+Reference Manual}.
@item case_eqv_table
This field contains the equivalence table for case-folding search.
-@xref{Case Tables,,, lispref, XEmacs Lisp Programmer's Manual}.
+@xref{Case Tables,,, lispref, XEmacs Lisp Reference Manual}.
@item display_table
This field contains the buffer's display table, or @code{nil} if it
doesn't have one. @xref{Display Tables,,, lispref, XEmacs Lisp
-Programmer's Manual}.
+Reference Manual}.
@item markers
This field contains the chain of all markers that currently point into
the buffer. Deletion of text in the buffer, and motion of the buffer's
gap, must check each of these markers and perhaps update it.
-@xref{Markers,,, lispref, XEmacs Lisp Programmer's Manual}.
+@xref{Markers,,, lispref, XEmacs Lisp Reference Manual}.
@item backed_up
This field is a flag that tells whether a backup file has been made for
@item mark
This field contains the mark for the buffer. The mark is a marker,
hence it is also included on the list @code{markers}. @xref{The Mark,,,
-lispref, XEmacs Lisp Programmer's Manual}.
+lispref, XEmacs Lisp Reference Manual}.
@item mark_active
This field is non-@code{nil} if the buffer's mark is active.
in this buffer, and their values, with the exception of local variables
that have special slots in the buffer object. (Those slots are omitted
from this table.) @xref{Buffer-Local Variables,,, lispref, XEmacs Lisp
-Programmer's Manual}.
+Reference Manual}.
@item modeline_format
This field contains a Lisp object which controls how to display the mode
line for this buffer. @xref{Modeline Format,,, lispref, XEmacs Lisp
-Programmer's Manual}.
+Reference Manual}.
@item base_buffer
This field holds the buffer's base buffer (if it is an indirect buffer),
* CCL::
@end menu
-@node Character Sets
+@node Character Sets, Encodings, MULE Character Sets and Encodings, MULE Character Sets and Encodings
@section Character Sets
A character set (or @dfn{charset}) is an ordered set of characters. A
This is a bit ad-hoc but gets the job done.
-@node Encodings
+@node Encodings, Internal Mule Encodings, Character Sets, MULE Character Sets and Encodings
@section Encodings
An @dfn{encoding} is a way of numerically representing characters from
* JIS7::
@end menu
-@node Japanese EUC (Extended Unix Code)
+@node Japanese EUC (Extended Unix Code), JIS7, Encodings, Encodings
@subsection Japanese EUC (Extended Unix Code)
-This encompasses the character sets Printing-ASCII, Japanese-JISSX0201,
+This encompasses the character sets Printing-ASCII, Japanese-JISX0201,
and Japanese-JISX0208-Kana (half-width katakana, the right half of
JISX0201). It uses 8-bit bytes.
@end example
-@node JIS7
+@node JIS7, , Japanese EUC (Extended Unix Code), Encodings
@subsection JIS7
This encompasses the character sets Printing-ASCII,
Initially, Printing-ASCII is invoked.
-@node Internal Mule Encodings
+@node Internal Mule Encodings, CCL, Encodings, MULE Character Sets and Encodings
@section Internal Mule Encodings
In XEmacs/Mule, each character set is assigned a unique number, called a
* Internal Character Encoding::
@end menu
-@node Internal String Encoding
+@node Internal String Encoding, Internal Character Encoding, Internal Mule Encodings, Internal Mule Encodings
@subsection Internal String Encoding
ASCII characters are encoded using their position code directly. Other
Shift-JIS and Big5 (not yet described) satisfy only (2). (All
non-modal encodings must satisfy (2), in order to be unambiguous.)
-@node Internal Character Encoding
+@node Internal Character Encoding, , Internal String Encoding, Internal Mule Encodings
@subsection Internal Character Encoding
One 19-bit word represents a single character. The word is
Note that character codes 0 - 255 are the same as the ``binary encoding''
described above.
-@node CCL
+@node CCL, , Internal Mule Encodings, MULE Character Sets and Encodings
@section CCL
@example
CCL PROGRAM SYNTAX:
- CCL_PROGRAM := (CCL_MAIN_BLOCK
- [ CCL_EOF_BLOCK ])
-
- CCL_MAIN_BLOCK := CCL_BLOCK
- CCL_EOF_BLOCK := CCL_BLOCK
-
- CCL_BLOCK := STATEMENT | (STATEMENT [STATEMENT ...])
- STATEMENT :=
- SET | IF | BRANCH | LOOP | REPEAT | BREAK
- | READ | WRITE
-
- SET := (REG = EXPRESSION) | (REG SELF_OP EXPRESSION)
- | INT-OR-CHAR
-
- EXPRESSION := ARG | (EXPRESSION OP ARG)
-
- IF := (if EXPRESSION CCL_BLOCK CCL_BLOCK)
- BRANCH := (branch EXPRESSION CCL_BLOCK [CCL_BLOCK ...])
- LOOP := (loop STATEMENT [STATEMENT ...])
- BREAK := (break)
- REPEAT := (repeat)
- | (write-repeat [REG | INT-OR-CHAR | string])
- | (write-read-repeat REG [INT-OR-CHAR | string | ARRAY]?)
- READ := (read REG) | (read REG REG)
- | (read-if REG ARITH_OP ARG CCL_BLOCK CCL_BLOCK)
- | (read-branch REG CCL_BLOCK [CCL_BLOCK ...])
- WRITE := (write REG) | (write REG REG)
- | (write INT-OR-CHAR) | (write STRING) | STRING
- | (write REG ARRAY)
- END := (end)
-
- REG := r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7
- ARG := REG | INT-OR-CHAR
- OP := + | - | * | / | % | & | '|' | ^ | << | >> | <8 | >8 | //
- | < | > | == | <= | >= | !=
- SELF_OP :=
- += | -= | *= | /= | %= | &= | '|=' | ^= | <<= | >>=
- ARRAY := '[' INT-OR-CHAR ... ']'
- INT-OR-CHAR := INT | CHAR
+ CCL_PROGRAM := (CCL_MAIN_BLOCK
+ [ CCL_EOF_BLOCK ])
+
+ CCL_MAIN_BLOCK := CCL_BLOCK
+ CCL_EOF_BLOCK := CCL_BLOCK
+
+ CCL_BLOCK := STATEMENT | (STATEMENT [STATEMENT ...])
+ STATEMENT :=
+ SET | IF | BRANCH | LOOP | REPEAT | BREAK
+ | READ | WRITE
+
+ SET := (REG = EXPRESSION) | (REG SELF_OP EXPRESSION)
+ | INT-OR-CHAR
+
+ EXPRESSION := ARG | (EXPRESSION OP ARG)
+
+ IF := (if EXPRESSION CCL_BLOCK CCL_BLOCK)
+ BRANCH := (branch EXPRESSION CCL_BLOCK [CCL_BLOCK ...])
+ LOOP := (loop STATEMENT [STATEMENT ...])
+ BREAK := (break)
+ REPEAT := (repeat)
+ | (write-repeat [REG | INT-OR-CHAR | string])
+ | (write-read-repeat REG [INT-OR-CHAR | string | ARRAY]?)
+ READ := (read REG) | (read REG REG)
+ | (read-if REG ARITH_OP ARG CCL_BLOCK CCL_BLOCK)
+ | (read-branch REG CCL_BLOCK [CCL_BLOCK ...])
+ WRITE := (write REG) | (write REG REG)
+ | (write INT-OR-CHAR) | (write STRING) | STRING
+ | (write REG ARRAY)
+ END := (end)
+
+ REG := r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7
+ ARG := REG | INT-OR-CHAR
+ OP := + | - | * | / | % | & | '|' | ^ | << | >> | <8 | >8 | //
+ | < | > | == | <= | >= | !=
+ SELF_OP :=
+ += | -= | *= | /= | %= | &= | '|=' | ^= | <<= | >>=
+ ARRAY := '[' INT-OR-CHAR ... ']'
+ INT-OR-CHAR := INT | CHAR
MACHINE CODE:
other encoded/decoded data has been written out. This is not used for
charset CCL programs.
-REGISTER: 0..7 -- refered by RRR or rrr
+REGISTER: 0..7 -- referred to by RRR or rrr
OPERATOR BIT FIELD (27-bit): XXXXXXXXXXXXXXX RRR TTTTT
TTTTT (5-bit): operator type
CCCCCCCCCCCCCCC: constant or address
000000000000rrr: register number
-AAAA: 00000 +
- 00001 -
- 00010 *
- 00011 /
- 00100 %
- 00101 &
- 00110 |
+AAAAA: 00000 +
+ 00001 -
+ 00010 *
+ 00011 /
+ 00100 %
+ 00101 &
+ 00110 |
00111 ~
01000 <<
01110 not used
01111 not used
- 10000 <
- 10001 >
+ 10000 <
+ 10001 >
10010 ==
10011 <=
10100 >=
OPERATORS: TTTTT RRR XX..
-SetCS: 00000 RRR C...C RRR = C...C
-SetCL: 00001 RRR ..... RRR = c...c
+SetCS: 00000 RRR C...C RRR = C...C
+SetCL: 00001 RRR ..... RRR = c...c
c.............c
-SetR: 00010 RRR ..rrr RRR = rrr
-SetA: 00011 RRR ..rrr RRR = array[rrr]
- C.............C size of array = C...C
- c.............c contents = c...c
-
-Jump: 00100 000 c...c jump to c...c
-JumpCond: 00101 RRR c...c if (!RRR) jump to c...c
-WriteJump: 00110 RRR c...c Write1 RRR, jump to c...c
-WriteReadJump: 00111 RRR c...c Write1, Read1 RRR, jump to c...c
-WriteCJump: 01000 000 c...c Write1 C...C, jump to c...c
+SetR: 00010 RRR ..rrr RRR = rrr
+SetA: 00011 RRR ..rrr RRR = array[rrr]
+ C.............C size of array = C...C
+ c.............c contents = c...c
+
+Jump: 00100 000 c...c jump to c...c
+JumpCond: 00101 RRR c...c if (!RRR) jump to c...c
+WriteJump: 00110 RRR c...c Write1 RRR, jump to c...c
+WriteReadJump: 00111 RRR c...c Write1, Read1 RRR, jump to c...c
+WriteCJump: 01000 000 c...c Write1 C...C, jump to c...c
C...C
-WriteCReadJump: 01001 RRR c...c Write1 C...C, Read1 RRR,
- C.............C and jump to c...c
-WriteSJump: 01010 000 c...c WriteS, jump to c...c
+WriteCReadJump: 01001 RRR c...c Write1 C...C, Read1 RRR,
+ C.............C and jump to c...c
+WriteSJump: 01010 000 c...c WriteS, jump to c...c
C.............C
S.............S
...
-WriteSReadJump: 01011 RRR c...c WriteS, Read1 RRR, jump to c...c
+WriteSReadJump: 01011 RRR c...c WriteS, Read1 RRR, jump to c...c
C.............C
S.............S
...
-WriteAReadJump: 01100 RRR c...c WriteA, Read1 RRR, jump to c...c
- C.............C size of array = C...C
- c.............c contents = c...c
+WriteAReadJump: 01100 RRR c...c WriteA, Read1 RRR, jump to c...c
+ C.............C size of array = C...C
+ c.............c contents = c...c
...
-Branch: 01101 RRR C...C if (RRR >= 0 && RRR < C..)
- c.............c branch to (RRR+1)th address
-Read1: 01110 RRR ... read 1-byte to RRR
-Read2: 01111 RRR ..rrr read 2-byte to RRR and rrr
-ReadBranch: 10000 RRR C...C Read1 and Branch
+Branch: 01101 RRR C...C if (RRR >= 0 && RRR < C..)
+ c.............c branch to (RRR+1)th address
+Read1: 01110 RRR ... read 1-byte to RRR
+Read2: 01111 RRR ..rrr read 2-byte to RRR and rrr
+ReadBranch: 10000 RRR C...C Read1 and Branch
c.............c
...
-Write1: 10001 RRR ..... write 1-byte RRR
-Write2: 10010 RRR ..rrr write 2-byte RRR and rrr
-WriteC: 10011 000 ..... write 1-char C...CC
+Write1: 10001 RRR ..... write 1-byte RRR
+Write2: 10010 RRR ..rrr write 2-byte RRR and rrr
+WriteC: 10011 000 ..... write 1-char C...C
C.............C
-WriteS: 10100 000 ..... write C..-byte of string
+WriteS: 10100 000 ..... write C..-byte of string
C.............C
S.............S
...
-WriteA: 10101 RRR ..... write array[RRR]
- C.............C size of array = C...C
- c.............c contents = c...c
+WriteA: 10101 RRR ..... write array[RRR]
+ C.............C size of array = C...C
+ c.............c contents = c...c
...
-End: 10110 000 ..... terminate the execution
+End: 10110 000 ..... terminate the execution
-SetSelfCS: 10111 RRR C...C RRR AAAAA= C...C
+SetSelfCS: 10111 RRR C...C RRR AAAAA= C...C
..........AAAAA
-SetSelfCL: 11000 RRR ..... RRR AAAAA= c...c
+SetSelfCL: 11000 RRR ..... RRR AAAAA= c...c
c.............c
..........AAAAA
-SetSelfR: 11001 RRR ..Rrr RRR AAAAA= rrr
+SetSelfR: 11001 RRR ..Rrr RRR AAAAA= rrr
..........AAAAA
-SetExprCL: 11010 RRR ..Rrr RRR = rrr AAAAA c...c
+SetExprCL: 11010 RRR ..Rrr RRR = rrr AAAAA c...c
c.............c
..........AAAAA
-SetExprR: 11011 RRR ..rrr RRR = rrr AAAAA Rrr
+SetExprR: 11011 RRR ..rrr RRR = rrr AAAAA Rrr
............Rrr
..........AAAAA
-JumpCondC: 11100 RRR c...c if !(RRR AAAAA C..) jump to c...c
+JumpCondC: 11100 RRR c...c if !(RRR AAAAA C..) jump to c...c
C.............C
..........AAAAA
-JumpCondR: 11101 RRR c...c if !(RRR AAAAA rrr) jump to c...c
+JumpCondR: 11101 RRR c...c if !(RRR AAAAA rrr) jump to c...c
............rrr
..........AAAAA
-ReadJumpCondC: 11110 RRR c...c Read1 and JumpCondC
+ReadJumpCondC: 11110 RRR c...c Read1 and JumpCondC
C.............C
..........AAAAA
-ReadJumpCondR: 11111 RRR c...c Read1 and JumpCondR
+ReadJumpCondR: 11111 RRR c...c Read1 and JumpCondR
............rrr
..........AAAAA
@end example
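The sketch below decodes an instruction word according to the field layout drawn above: TTTTT in the low 5 bits, RRR in the next 3, and the constant or address above that. It also implements a few of the AAAAA operations. The names are hypothetical and only codes shown in the table are handled. Division by zero returns 0 purely to keep the sketch total. This is not the actual CCL interpreter.

```c
#include <assert.h>

/* Hypothetical decoded form of one CCL instruction word. */
struct sketch_ccl_insn
{
  unsigned op;         /* TTTTT: operator type */
  unsigned reg;        /* RRR: register number */
  unsigned operand;    /* remaining high bits: constant, address, ... */
};

static struct sketch_ccl_insn
sketch_ccl_decode (unsigned long word)
{
  struct sketch_ccl_insn insn;
  insn.op = (unsigned) (word & 0x1F);
  insn.reg = (unsigned) ((word >> 5) & 0x7);
  insn.operand = (unsigned) (word >> 8);
  return insn;
}

/* Apply an AAAAA code from the table above; unlisted codes give 0. */
static int
sketch_ccl_arith (unsigned aaaaa, int a, int b)
{
  switch (aaaaa)
    {
    case 0x00: return a + b;
    case 0x01: return a - b;
    case 0x02: return a * b;
    case 0x03: return b ? a / b : 0;
    case 0x04: return b ? a % b : 0;
    case 0x05: return a & b;
    case 0x06: return a | b;
    case 0x10: return a < b;
    case 0x11: return a > b;
    case 0x12: return a == b;
    case 0x13: return a <= b;
    case 0x14: return a >= b;
    default:   return 0;
    }
}
```

For example, @code{SetCS r2 = 100} would be the word @code{(100 << 8) | (2 << 5) | 0}.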
* Lstream Methods:: Creating new lstream types.
@end menu
-@node Creating an Lstream
+@node Creating an Lstream, Lstream Types, Lstreams, Lstreams
@section Creating an Lstream
Lstreams come in different types, depending on what is being interfaced
Open for writing, but never writes partial MULE characters.
@end table
-@node Lstream Types
+@node Lstream Types, Lstream Functions, Creating an Lstream, Lstreams
@section Lstream Types
@table @asis
@item encoding
@end table
-@node Lstream Functions
+@node Lstream Functions, Lstream Methods, Lstream Types, Lstreams
@section Lstream Functions
-@deftypefun {Lstream *} Lstream_new (Lstream_implementation *@var{imp}, CONST char *@var{mode})
+@deftypefun {Lstream *} Lstream_new (Lstream_implementation *@var{imp}, const char *@var{mode})
Allocate and return a new Lstream. This function is not really meant to
be called directly; rather, each stream type should provide its own
stream creation function, which creates the stream and does any other
@deftypefn Macro void Lstream_ungetc (Lstream *@var{stream}, int @var{c})
Push one byte back onto the input queue. This will be the next byte
read from the stream. Any number of bytes can be pushed back and will
-be read in the reverse order they were pushed back -- most recent
-first. (This is necessary for consistency -- if there are a number of
+be read in the reverse order they were pushed back---most recent
+first. (This is necessary for consistency: if a number of bytes
have been unread and you then read and unread a byte, that byte needs
to be the first to be read again.) This is a macro and so it is very
efficient. The @var{c} argument is only evaluated once but the @var{stream}
Function equivalents of the above macros.
@end deftypefun
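The pushback rule can be modeled with a small stack that is drained before new input is read. This is a hypothetical sketch, not the lstream implementation. The names, the fixed 16-byte pushback limit, and the in-memory source are all assumptions.

```c
#include <assert.h>

/* Hypothetical stream with an in-memory source and a pushback stack. */
struct sketch_stream
{
  const unsigned char *src;    /* stands in for the stream's real source */
  int src_len, src_pos;
  unsigned char unget[16];     /* pushed-back bytes, newest on top */
  int unget_count;
};

static void
sketch_ungetc (struct sketch_stream *s, int c)
{
  assert (s->unget_count < 16);
  s->unget[s->unget_count++] = (unsigned char) c;
}

/* Return the next byte, or -1 at end of input.  Pushed-back bytes are
   returned first, most recent first. */
static int
sketch_getc (struct sketch_stream *s)
{
  if (s->unget_count > 0)
    return s->unget[--s->unget_count];
  if (s->src_pos < s->src_len)
    return s->src[s->src_pos++];
  return -1;
}
```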
-@deftypefun int Lstream_read (Lstream *@var{stream}, void *@var{data}, int @var{size})
+@deftypefun ssize_t Lstream_read (Lstream *@var{stream}, void *@var{data}, size_t @var{size})
Read @var{size} bytes of @var{data} from the stream. Return the number
of bytes read. 0 means EOF. -1 means an error occurred and no bytes
were read.
@end deftypefun
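Because 0 means end of file and -1 means an error, a caller that wants the whole stream loops until it sees one of those values. This sketch is hypothetical. @code{sketch_read_fn} stands in for @code{Lstream_read()} with an opaque stream pointer, and the demo source deliberately returns short reads.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t */

/* Hypothetical reader with the same return convention as Lstream_read():
   bytes read, 0 at EOF, -1 on error. */
typedef ssize_t (*sketch_read_fn) (void *stream, void *data, size_t size);

/* Read until EOF into BUF (capacity CAP).  Returns the total number of
   bytes, or -1 on error or if BUF fills before EOF is seen. */
static ssize_t
sketch_read_all (sketch_read_fn read_fn, void *stream,
                 unsigned char *buf, size_t cap)
{
  size_t total = 0;
  for (;;)
    {
      if (total == cap)
        return -1;
      ssize_t n = read_fn (stream, buf + total, cap - total);
      if (n < 0)
        return -1;                /* error */
      if (n == 0)
        return (ssize_t) total;   /* EOF */
      total += (size_t) n;
    }
}

/* Demo source that hands out at most 3 bytes per call. */
struct sketch_chunked { const char *text; size_t pos; };

static ssize_t
sketch_chunked_read (void *stream, void *data, size_t size)
{
  struct sketch_chunked *c = stream;
  size_t left = strlen (c->text) - c->pos;
  size_t n = left < size ? left : size;
  if (n > 3)
    n = 3;
  memcpy (data, c->text + c->pos, n);
  c->pos += n;
  return (ssize_t) n;
}
```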
-@deftypefun int Lstream_write (Lstream *@var{stream}, void *@var{data}, int @var{size})
+@deftypefun ssize_t Lstream_write (Lstream *@var{stream}, void *@var{data}, size_t @var{size})
Write @var{size} bytes of @var{data} to the stream. Return the number
of bytes written. -1 means an error occurred and no bytes were written.
@end deftypefun
-@deftypefun void Lstream_unread (Lstream *@var{stream}, void *@var{data}, int @var{size})
+@deftypefun void Lstream_unread (Lstream *@var{stream}, void *@var{data}, size_t @var{size})
Push back @var{size} bytes of @var{data} onto the input queue. The next
call to @code{Lstream_read()} with the same size will read the same
bytes back. Note that this will be the case even if there is other
@deftypefun void Lstream_reopen (Lstream *@var{stream})
Reopen a closed stream. This enables I/O on it again. This is not
meant to be called except from a wrapper routine that reinitializes
-variables and such -- the close routine may well have freed some
+variables and such---the close routine may well have freed some
necessary storage structures, for example.
@end deftypefun
Rewind the stream to the beginning.
@end deftypefun
-@node Lstream Methods
+@node Lstream Methods, , Lstream Functions, Lstreams
@section Lstream Methods
-@deftypefn {Lstream Method} int reader (Lstream *@var{stream}, unsigned char *@var{data}, int @var{size})
+@deftypefn {Lstream Method} ssize_t reader (Lstream *@var{stream}, unsigned char *@var{data}, size_t @var{size})
Read some data from the stream's end and store it into @var{data}, which
can hold @var{size} bytes. Return the number of bytes read. A return
value of 0 means no bytes can be read at this time. This may be because
This function can be @code{NULL} if the stream is output-only.
@end deftypefn
-@deftypefn {Lstream Method} int writer (Lstream *@var{stream}, CONST unsigned char *@var{data}, int @var{size})
+@deftypefn {Lstream Method} ssize_t writer (Lstream *@var{stream}, const unsigned char *@var{data}, size_t @var{size})
Send some data to the stream's end. Data to be sent is in @var{data}
and is @var{size} bytes. Return the number of bytes sent. This
function can send and return fewer bytes than is passed in; in that
@end deftypefn
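A writer may accept fewer bytes than it is given, so its caller has to keep calling it with the unsent remainder. In this sketch, @code{sink_writer}, @code{send_all} and the three-byte limit are hypothetical. Treating a return of 0 as ``cannot accept more right now'' is an assumption, not documented Lstream behavior.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>  /* ssize_t */

/* Hypothetical writer that accepts at most MAX_CHUNK bytes per call,
   imitating a writer method that sends fewer bytes than requested. */
#define MAX_CHUNK 3

struct sink {
  unsigned char buf[64];
  size_t used;
};

static ssize_t
sink_writer (struct sink *s, const unsigned char *data, size_t size)
{
  size_t room = sizeof s->buf - s->used;
  if (size > MAX_CHUNK)
    size = MAX_CHUNK;           /* partial write */
  if (size > room)
    size = room;                /* sink (nearly) full */
  memcpy (s->buf + s->used, data, size);
  s->used += size;
  return (ssize_t) size;
}

/* Keep calling the writer with the unsent remainder.  A return of 0
   (assumed here to mean "no room now") or -1 stops the loop; the number
   of bytes actually sent is returned so the caller can retry later. */
static size_t
send_all (struct sink *s, const unsigned char *data, size_t size)
{
  size_t sent = 0;
  while (sent < size)
    {
      ssize_t n = sink_writer (s, data + sent, size - sent);
      if (n <= 0)
        break;
      sent += (size_t) n;
    }
  return sent;
}
```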
@deftypefn {Lstream Method} int seekable_p (Lstream *@var{stream})
-Indicate whether this stream is seekable -- i.e. it can be rewound.
+Indicate whether this stream is seekable---i.e. it can be rewound.
This method is ignored if the stream does not have a rewind method. If
this method is not present, the result is determined by whether a rewind
method is present.
* The Window Object::
@end menu
-@node Introduction to Consoles; Devices; Frames; Windows
+@node Introduction to Consoles; Devices; Frames; Windows, Point, Consoles; Devices; Frames; Windows, Consoles; Devices; Frames; Windows
@section Introduction to Consoles; Devices; Frames; Windows
A window-system window that you see on the screen is called a
Thus, there is a hierarchy console -> device -> frame -> window.
There is a separate Lisp object type for each of these four concepts.
-Furthermore, there is logically a @dfn{selected console},
+Furthermore, there is logically a @dfn{selected console},
@dfn{selected device}, @dfn{selected frame}, and @dfn{selected window}.
Each of these objects is distinguished in various ways, such as being the
default object for various functions that act on objects of that type.
-Note that every containing object rememembers the ``selected'' object
+Note that every containing object remembers the ``selected'' object
among the objects that it contains: e.g. not only is there a selected
window, but every frame remembers the last window in it that was
selected, and changing the selected frame causes the remembered window
within it to become the selected window. Similar relationships apply
for consoles to devices and devices to frames.
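The ``remembered selection'' rule can be pictured with a few pointers. This is an illustration under simplifying assumptions. The structures and the @code{select_window} and @code{select_frame} functions below are hypothetical. They store plain C pointers rather than Lisp objects, and they leave out consoles and the selected device.

```c
/* Hypothetical, heavily simplified containers: each one remembers the
   child that was last selected inside it. */
struct window { int id; };

struct frame {
  struct window *selected_window;   /* remembered per frame */
};

struct device {
  struct frame *selected_frame;     /* remembered per device */
};

/* The globally selected frame and window. */
static struct frame *selected_frame;
static struct window *selected_window;

/* Selecting a window records it in its frame; if that frame is the
   selected one, the window also becomes the selected window. */
static void
select_window (struct frame *f, struct window *w)
{
  f->selected_window = w;
  if (f == selected_frame)
    selected_window = w;
}

/* Selecting a frame makes the window it remembers the selected window. */
static void
select_frame (struct device *d, struct frame *f)
{
  d->selected_frame = f;
  selected_frame = f;
  selected_window = f->selected_window;
}
```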
-@node Point
+@node Point, Window Hierarchy, Introduction to Consoles; Devices; Frames; Windows, Consoles; Devices; Frames; Windows
@section Point
Recall that every buffer has a current insertion position, called
buffer's point instead. This is related to why @code{save-window-excursion}
does not save the selected window's value of @code{point}.
-@node Window Hierarchy
+@node Window Hierarchy, The Window Object, Point, Consoles; Devices; Frames; Windows
@section Window Hierarchy
@cindex window hierarchy
@cindex hierarchy of windows
these are @dfn{hchild} (a list of horizontally-arrayed children),
@dfn{vchild} (a list of vertically-arrayed children), and @dfn{buffer}
(the buffer contained in a leaf window). Exactly one of
-these will be non-nil. Remember that @dfn{horizontally-arrayed}
+these will be non-@code{nil}. Remember that @dfn{horizontally-arrayed}
means ``side-by-side'' and @dfn{vertically-arrayed} means
``one above the other''.
@item
Leaf windows also have markers in their @code{start} (the
first buffer position displayed in the window) and @code{pointm}
-(the window's stashed value of @code{point} -- see above) fields,
-while combination windows have nil in these fields.
+(the window's stashed value of @code{point}---see above) fields,
+while combination windows have @code{nil} in these fields.
@item
The list of children for a window is threaded through the
GC purposes.
@item
-Most frames actually have two top-level windows -- one for the
+Most frames actually have two top-level windows---one for the
minibuffer and one (the @dfn{root}) for everything else. The modeline
(if present) separates these two. The @code{next} field of the root
points to the minibuffer, and the @code{prev} field of the minibuffer
artifact that should be fixed.)
@end enumerate
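The rules above fully determine how to walk the tree. The following sketch uses a hypothetical C structure with plain pointers where the real window object stores Lisp objects. It counts the leaf windows under a window: it descends into whichever of @code{hchild} and @code{vchild} is set and follows the @code{next} chain.

```c
#include <stddef.h>

struct buffer;                 /* opaque here */

/* Hypothetical, simplified window node.  In a combination window exactly
   one of HCHILD or VCHILD is non-NULL; in a leaf window BUFFER is set.
   Siblings are chained through NEXT/PREV. */
struct window {
  struct window *parent;
  struct window *hchild;       /* first horizontally-arrayed child */
  struct window *vchild;       /* first vertically-arrayed child */
  struct buffer *buffer;       /* non-NULL only in leaf windows */
  struct window *next, *prev;
};

/* Count the leaf windows under W by walking each child list. */
static int
count_leaf_windows (const struct window *w)
{
  const struct window *child;
  int n = 0;

  if (w->buffer)
    return 1;                  /* a leaf window */
  child = w->hchild ? w->hchild : w->vchild;
  for (; child; child = child->next)
    n += count_leaf_windows (child);
  return n;
}
```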
-@node The Window Object
+@node The Window Object, , Window Hierarchy, Consoles; Devices; Frames; Windows
@section The Window Object
Windows have the following accessible fields:
@menu
* Critical Redisplay Sections::
* Line Start Cache::
+* Redisplay Piece by Piece::
@end menu
-@node Critical Redisplay Sections
+@node Critical Redisplay Sections, Line Start Cache, The Redisplay Mechanism, The Redisplay Mechanism
@section Critical Redisplay Sections
@cindex critical redisplay sections
#### If a frame-size change does occur we should probably
actually be preempting redisplay.
-@node Line Start Cache
+@node Line Start Cache, Redisplay Piece by Piece, Critical Redisplay Sections, The Redisplay Mechanism
@section Line Start Cache
@cindex line start cache
is sufficient to always provide the needed information. The second
thing we can do is be smart about invalidating the cache.
- TODO -- Be smart about invalidating the cache. Potential places:
+ TODO---Be smart about invalidating the cache. Potential places:
@itemize @bullet
@item
In case you're wondering, the Second Golden Rule of Redisplay is not
applicable.
-@node Extents, Faces and Glyphs, The Redisplay Mechanism, Top
+@node Redisplay Piece by Piece, , Line Start Cache, The Redisplay Mechanism
+@section Redisplay Piece by Piece
+@cindex Redisplay Piece by Piece
+
+As you can begin to see, redisplay is complex and also not well
+documented. Chuck no longer works on XEmacs, so this section is my take
+on the workings of redisplay.
+
+Redisplay happens in three phases:
+
+@enumerate
+@item
+Determine the desired display in the area that needs redisplay.
+Implemented by @file{redisplay.c}.
+@item
+Compare the desired display with the current display.
+Implemented by @file{redisplay-output.c}.
+@item
+Output the changes. Implemented by @file{redisplay-output.c},
+@file{redisplay-x.c}, @file{redisplay-msw.c} and @file{redisplay-tty.c}.
+@end enumerate
+
+Steps 1 and 2 are device-independent and relatively complex. Step 3 is
+mostly device-dependent.
+
+@subsection Determining the desired display
+
+Display attributes are stored in @code{display_line} structures. Each
+@code{display_line} consists of a set of @code{display_block}s, and each
+@code{display_block} contains a number of @code{rune}s. Each window
+generally holds dynarrs of @code{display_line}s representing both the
+current display and the desired display.
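Here is a minimal picture of the line/block/rune nesting. The field names (@code{width}, @code{ch}, @code{nrunes}, @code{nblocks}) are hypothetical, and ordinary C arrays stand in for dynarrs. The real structures carry far more information. This sketch only shows how the three levels nest, by summing the rune widths of one line.

```c
#include <stddef.h>

/* Hypothetical, much-simplified redisplay structures; plain arrays with
   explicit counts stand in for XEmacs dynarrs. */
struct rune {
  int width;                   /* pixel width of this element */
  unsigned char ch;            /* character displayed (text runes only) */
};

struct display_block {
  struct rune *runes;
  size_t nrunes;
};

struct display_line {
  struct display_block *blocks;
  size_t nblocks;
};

/* Total pixel width of a line: the sum, over its blocks, of the widths
   of their runes. */
static int
display_line_width (const struct display_line *dl)
{
  int width = 0;
  size_t i, j;
  for (i = 0; i < dl->nblocks; i++)
    for (j = 0; j < dl->blocks[i].nrunes; j++)
      width += dl->blocks[i].runes[j].width;
  return width;
}
```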
+
+The @code{display_line} structures are tightly tied to buffers, which
+presents a problem for redisplay because this connection is bogus for the
+modeline. Hence the @code{display_line} generation routines are
+duplicated for generating the modeline. This means that the modeline
+display code has many bugs that the standard redisplay code does not.
+
+The guts of @code{display_line} generation are in
+@code{create_text_block}, which creates a single display line for the
+desired locale. This incrementally parses the characters on the current
+line and generates redisplay structures for each.
+
+Gutter redisplay is different. Because the data to display is stored in
+a string, we cannot use @code{create_text_block}. Instead we use
+@code{create_text_string_block}, which performs the same function as
+@code{create_text_block} but for strings. Much of the complexity of
+@code{create_text_block} relating to cursor handling and selective
+display has been removed.
+
+@node Extents, Faces, The Redisplay Mechanism, Top
@chapter Extents
@menu
* Extent Ordering:: How extents are ordered internally.
* Format of the Extent Info:: The extent information in a buffer or string.
* Zero-Length Extents:: A weird special case.
-* Mathematics of Extent Ordering:: A rigorous foundation.
+* Mathematics of Extent Ordering:: A rigorous foundation.
* Extent Fragments:: Cached information useful for redisplay.
@end menu
-@node Introduction to Extents
+@node Introduction to Extents, Extent Ordering, Extents, Extents
@section Introduction to Extents
Extents are regions over a buffer, with a start and an end position
however, and just ended up complexifying and buggifying all the
rest of the code.)
-@node Extent Ordering
+@node Extent Ordering, Format of the Extent Info, Introduction to Extents, Extents
@section Extent Ordering
Extents are compared using memory indices. There are two orderings
or @dfn{display} order is as follows:
@example
-Extent A is ``less than'' extent B, that is, earlier in the display order,
-if: A-start < B-start,
-or if: A-start = B-start, and A-end > B-end
+Extent A is ``less than'' extent B,
+that is, earlier in the display order,
+ if: A-start < B-start,
+ or if: A-start = B-start, and A-end > B-end
@end example
So if two extents begin at the same position, the larger of them is the
For the e-order, the same thing holds:
@example
-Extent A is ``less than'' extent B in e-order, that is, later in the buffer,
-if: A-end < B-end,
-or if: A-end = B-end, and A-start > B-start
+Extent A is ``less than'' extent B in e-order,
+that is, later in the buffer,
+ if: A-end < B-end,
+ or if: A-end = B-end, and A-start > B-start
@end example
So if two extents end at the same position, the smaller of them is the
all occurrences of ``display order'' and ``e-order'', ``less than'' and
``greater than'', and ``extent start'' and ``extent end''.
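The two orderings translate directly into three-way comparison functions. The @code{struct extent} below is a hypothetical reduction of an extent to its two memory indices. The functions return negative, zero or positive values, in the style of @code{qsort} comparators.

```c
/* Hypothetical extent: only its endpoints, as memory indices. */
struct extent { long start, end; };

/* Display order: A < B if A starts earlier, or if both start at the same
   place and A ends later (the larger extent comes first). */
static int
extent_display_cmp (const struct extent *a, const struct extent *b)
{
  if (a->start != b->start)
    return a->start < b->start ? -1 : 1;
  if (a->end != b->end)
    return a->end > b->end ? -1 : 1;
  return 0;
}

/* E-order: the same rule with the roles of start and end swapped:
   A < B if A ends earlier, or if both end at the same place and A
   starts later (the smaller extent comes first). */
static int
extent_e_cmp (const struct extent *a, const struct extent *b)
{
  if (a->end != b->end)
    return a->end < b->end ? -1 : 1;
  if (a->start != b->start)
    return a->start > b->start ? -1 : 1;
  return 0;
}
```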
-@node Format of the Extent Info
+@node Format of the Extent Info, Zero-Length Extents, Extent Ordering, Extents
@section Format of the Extent Info
An extent-info structure consists of a list of the buffer or string's
extents and a @dfn{stack of extents} that lists all of the extents over
a particular position. The stack-of-extents info is used for
-optimization purposes -- it basically caches some info that might
+optimization purposes---it basically caches some info that might
be expensive to compute. Certain otherwise hard computations are easy
given the stack of extents over a particular position, and if the
stack of extents over a nearby position is known (because it was
array, except for the fact that positions are integers (this should be
generalized to handle integers and linked list equally well).
-@node Zero-Length Extents
+@node Zero-Length Extents, Mathematics of Extent Ordering, Format of the Extent Info, Extents
@section Zero-Length Extents
Extents can be zero-length, and will end up that way if their endpoints
-are explicitly set that way or if their detachable property is nil
+are explicitly set that way or if their detachable property is @code{nil}
and all the text in the extent is deleted. (The exception is open-open
zero-length extents, which are barred from existing because there is
no sensible way to define their properties. Deletion of the text in
exactly like markers and that open-closed, non-detachable zero-length
extents behave like the ``point-type'' marker in Mule.
-@node Mathematics of Extent Ordering
+@node Mathematics of Extent Ordering, Extent Fragments, Zero-Length Extents, Extents
@section Mathematics of Extent Ordering
@cindex extent mathematics
@cindex mathematics of extents
@math{S}, including @math{F}. Otherwise, @math{F2} includes @math{I}
and thus is in @math{S}, and thus @math{F2 >= F}.
-@node Extent Fragments
+@node Extent Fragments, , Mathematics of Extent Ordering, Extents
@section Extent Fragments
@cindex extent fragment
An extent fragment is a structure that holds data about the run that
contains a particular buffer position (if the buffer position is at the
-junction of two runs, the run after the position is used) -- the
+junction of two runs, the run after the position is used)---the
beginning and end of the run, a list of all of the extents in that run,
the @dfn{merged face} that results from merging all of the faces
corresponding to those extents, the begin and end glyphs at the
stack-of-extents code, which does the heavy-duty algorithmic work of
determining which extents overlie a particular position.
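The rule that a position at the junction of two runs belongs to the run after it is the half-open interval convention. The sketch below is not how XEmacs finds a run; that is the job of the stack-of-extents code. It is a hypothetical binary search over sorted, strictly increasing run boundaries, and it only illustrates the junction rule.

```c
#include <stddef.h>

/* Hypothetical helper: BOUNDS holds N strictly increasing positions at
   which some extent begins or ends, so consecutive entries delimit runs
   [BOUNDS[i], BOUNDS[i+1]).  Return the index of the run containing POS.
   Because runs are half-open, a POS exactly on a junction belongs to the
   run that starts there.  Returns -1 if POS lies outside all runs. */
static long
find_run (const long *bounds, size_t n, long pos)
{
  size_t lo = 0, hi;

  if (n < 2 || pos < bounds[0] || pos >= bounds[n - 1])
    return -1;
  hi = n - 1;                  /* invariant: bounds[lo] <= pos < bounds[hi] */
  while (hi - lo > 1)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (bounds[mid] <= pos)
        lo = mid;
      else
        hi = mid;
    }
  return (long) lo;
}
```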
-@node Faces and Glyphs, Specifiers, Extents, Top
-@chapter Faces and Glyphs
+@node Faces, Glyphs, Extents, Top
+@chapter Faces
Not yet documented.
-@node Specifiers, Menus, Faces and Glyphs, Top
+@node Glyphs, Specifiers, Faces, Top
+@chapter Glyphs
+
+Glyphs are graphical elements that can be displayed in XEmacs buffers or
+gutters. We use the term graphical element here in the broadest possible
+sense, since glyphs can be as mundane as text or as arcane as a native
+tab widget.
+
+In XEmacs, glyphs represent the uninstantiated state of graphical
+elements, i.e.@: they hold all the information necessary to produce an
+image on-screen, but the image need not exist at this stage, and multiple
+screen images can be instantiated from a single glyph.
+
+Glyphs are lazily instantiated by calling one of the glyph
+functions. This usually occurs within redisplay when
+@code{Fglyph_height} is called. Instantiation causes an image-instance
+to be created and cached. This cache is kept per device for all glyphs
+except widget-glyphs, and per window for widget-glyphs. The
+caching is done by @code{image_instantiate} and is necessary because it
+is generally possible to display an image-instance in multiple
+domains. For instance, if we create a Pixmap, we can display it
+in multiple windows, even though we only need a single Pixmap
+instance to do this. Without caching, it would be necessary
+to create image-instances for every displayable occurrence of a glyph,
+and for every usage, which would be extremely memory- and CPU-intensive.
+
+Widget-glyphs (a.k.a.@: native widgets) are not cached in this way. This is
+because widget-glyph image-instances on screen are toolkit windows, and
+thus cannot be reused in multiple XEmacs domains. Instead, widget-glyphs
+are cached per XEmacs window.
+
+Any action on a glyph first consults the cache before actually
+instantiating a widget.
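The caching policy can be sketched as a lookup keyed on the glyph and a domain. The domain is the device for ordinary glyphs and the window for widget-glyphs. Everything here is hypothetical: @code{struct glyph}, @code{instantiate}, the fixed-size table with no eviction, and the integer standing in for an image instance. None of it is the interface of @code{image_instantiate}.

```c
#include <stddef.h>

/* Hypothetical sketch of the caching policy described above: ordinary
   image instances are cached per device, widget instances per window. */
enum glyph_kind { GLYPH_IMAGE, GLYPH_WIDGET };

struct glyph { enum glyph_kind kind; };

struct cache_entry {
  const struct glyph *glyph;
  const void *domain;          /* a device or a window, depending on kind */
  int instance;                /* stands in for an image instance */
};

#define CACHE_SIZE 16
static struct cache_entry cache[CACHE_SIZE];
static size_t cache_used;
static int instances_created;

/* Return the instance of G for display in WINDOW on DEVICE, creating and
   caching a new one only if no suitable instance exists yet. */
static int
instantiate (const struct glyph *g, const void *device, const void *window)
{
  const void *domain = g->kind == GLYPH_WIDGET ? window : device;
  size_t i;

  for (i = 0; i < cache_used; i++)
    if (cache[i].glyph == g && cache[i].domain == domain)
      return cache[i].instance;             /* cache hit */

  if (cache_used == CACHE_SIZE)
    return -1;                              /* sketch: no eviction */
  cache[cache_used].glyph = g;
  cache[cache_used].domain = domain;
  cache[cache_used].instance = ++instances_created;
  return cache[cache_used++].instance;
}
```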
+
+@section Widget-Glyphs in the MS-Windows Environment
+
+To Do
+
+@section Widget-Glyphs in the X Environment
+
+Widget-glyphs under X make heavy use of lwlib (@pxref{Lucid Widget
+Library}) for manipulating the native toolkit objects. This is primarily
+so that different toolkits can be supported for widget-glyphs, just as
+they are supported for features such as menubars.
+
+@node Specifiers, Menus, Glyphs, Top
@chapter Specifiers
Not yet documented.
its argument, which is the callback function or form given in the menu's
description.
-@node Subprocesses, Interface to X Windows, Menus, Top
+@node Subprocesses, Interface to the X Window System, Menus, Top
@chapter Subprocesses
The fields of a process are:
or @code{nil} if it is using pipes.
@end table
-@node Interface to X Windows, Index, Subprocesses, Top
-@chapter Interface to X Windows
+@node Interface to the X Window System, Index, Subprocesses, Top
+@chapter Interface to the X Window System
-Not yet documented.
+Mostly undocumented.
+
+@menu
+* Lucid Widget Library:: An interface to various widget sets.
+@end menu
+
+@node Lucid Widget Library, , , Interface to the X Window System
+@section Lucid Widget Library
+
+Lwlib is extremely poorly documented and quite hairy. The author(s)
+blame that on X, Xt, and Motif, with some justice, but also sufficient
+hypocrisy to avoid drawing the obvious conclusion about their own work.
+
+The Lucid Widget Library is composed of two more or less independent
+pieces. The first, as the name suggests, is a set of widgets. These
+widgets are intended to resemble and improve on widgets provided in the
+Motif toolkit but not in the Athena widgets, including menubars and
+scrollbars. Recent additions by Andy Piper integrate some ``modern''
+widgets by Edward Falk, including checkboxes, radio buttons, progress
+gauges, and index tab controls (aka notebooks).
+
+The second piece of the Lucid widget library is a generic interface to
+several toolkits for X (including Xt, the Athena widget set, and Motif,
+as well as the Lucid widgets themselves) so that core XEmacs code need
+not know which widget set has been used to build the graphical user
+interface.
+
+@menu
+* Generic Widget Interface:: The lwlib generic widget interface.
+* Scrollbars::
+* Menubars::
+* Checkboxes and Radio Buttons::
+* Progress Bars::
+* Tab Controls::
+@end menu
+
+@node Generic Widget Interface, Scrollbars, , Lucid Widget Library
+@subsection Generic Widget Interface
+
+In general, in any toolkit a widget may be a composite object. In Xt,
+all widgets have an X window that they manage, but typically a complex
+widget will have widget children, each of which manages a subwindow of
+the parent widget's X window. These children may themselves be
+composite widgets. Thus a widget is actually a tree or hierarchy of
+widgets.
+
+For each toolkit widget, lwlib maintains a tree of @code{widget_value}s
+which mirror the hierarchical state of Xt widgets (including Motif,
+Athena, 3D Athena, and Falk's widget sets). Each @code{widget_value}
+has a @code{contents} member, which points to the head of a linked list of
+its children. The linked list of siblings is chained through the
+@code{next} member of @code{widget_value}.
+
+@example
+   +-----------+
+   | composite |
+   +-----------+
+         |
+         | contents
+         V
+     +-------+ next +-------+ next +-------+
+     | child |----->| child |----->| child |
+     +-------+      +-------+      +-------+
+                                       |
+                                       | contents
+                                       V
+                                +-------------+ next +-------------+
+                                | grand child |----->| grand child |
+                                +-------------+      +-------------+
+
+The @code{widget_value} hierarchy of a composite widget with two simple
+children and one composite child.
+@end example
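Because of the @code{contents}/@code{next} layout, a depth-first walk needs only a loop over siblings and a recursive call for children. The structure below is a stripped-down, hypothetical stand-in for lwlib's @code{widget_value}. It contains only the fields discussed here, and ``visiting'' a node just records its name.

```c
#include <stddef.h>
#include <string.h>

/* Stripped-down, hypothetical stand-in for lwlib's widget_value:
   CONTENTS points to the first child, NEXT to the following sibling. */
typedef struct widget_value {
  const char *name;
  struct widget_value *contents;
  struct widget_value *next;
} widget_value;

/* Pre-order walk: visit VAL, then its children, then its later siblings.
   "Visiting" appends the name and a space to OUT (capacity CAP). */
static void
walk_widget_values (const widget_value *val, char *out, size_t cap)
{
  for (; val; val = val->next)
    {
      if (strlen (out) + strlen (val->name) + 2 <= cap)
        {
          strcat (out, val->name);
          strcat (out, " ");
        }
      walk_widget_values (val->contents, out, cap);
    }
}
```

Built to match the diagram, the walk visits the composite first, then its three children in order, then the grandchildren of the composite child.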
+
+The @code{widget_instance} structure maintains the inverse view of the
+tree. As with the @code{widget_value}, siblings are chained through the
+@code{next} member. However, rather than naming children, the
+@code{widget_instance} tree links to parents.
+
+@example
+   +-----------+
+   | composite |
+   +-----------+
+         A
+         | parent
+         |
+     +-------+ next +-------+ next +-------+
+     | child |----->| child |----->| child |
+     +-------+      +-------+      +-------+
+                                       A
+                                       | parent
+                                       |
+                                +-------------+ next +-------------+
+                                | grand child |----->| grand child |
+                                +-------------+      +-------------+
+
+The @code{widget_instance} hierarchy of a composite widget with two
+simple children and one composite child.
+@end example
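Walking upward through @code{parent} links is correspondingly simple. As before, this @code{widget_instance} is a hypothetical reduction to the two links described in the text, not lwlib's actual declaration.

```c
#include <stddef.h>

/* Hypothetical stand-in for lwlib's widget_instance, reduced to the two
   links described above: NEXT chains siblings, PARENT points up. */
typedef struct widget_instance {
  const char *name;
  struct widget_instance *parent;
  struct widget_instance *next;
} widget_instance;

/* Follow PARENT links up to the top of the tree. */
static const widget_instance *
instance_root (const widget_instance *inst)
{
  while (inst->parent)
    inst = inst->parent;
  return inst;
}

/* Number of PARENT links between INST and the root (the root has 0). */
static int
instance_depth (const widget_instance *inst)
{
  int depth = 0;
  for (; inst->parent; inst = inst->parent)
    depth++;
  return depth;
}
```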
+
+This permits widgets derived from different toolkits to be updated and
+manipulated generically by the lwlib library. For instance,
+@code{update_one_widget_instance} can cope with multiple types of widget
+and multiple types of toolkit. Each element in the widget hierarchy is
+updated from its corresponding @code{widget_value} by walking the
+@code{widget_value} tree. This has desirable properties. For example,
+@code{lw_modify_all_widgets} is called from @file{glyphs-x.c} and
+updates all the properties of a widget without having to know what the
+widget is or what toolkit it is from. Unfortunately this also has its
+hairy properties; the lwlib code is quite complex. And of course lwlib has
+to know at some level what the widget is and how to set its properties.
+
+The @code{widget_instance} structure also contains a pointer to the root
+of its tree. Widget instances are further confi
+
+
+@node Scrollbars, Menubars, Generic Widget Interface, Lucid Widget Library
+@subsection Scrollbars
+
+@node Menubars, Checkboxes and Radio Buttons, Scrollbars, Lucid Widget Library
+@subsection Menubars
+
+@node Checkboxes and Radio Buttons, Progress Bars, Menubars, Lucid Widget Library
+@subsection Checkboxes and Radio Buttons
+
+@node Progress Bars, Tab Controls, Checkboxes and Radio Buttons, Lucid Widget Library
+@subsection Progress Bars
+
+@node Tab Controls, , Progress Bars, Lucid Widget Library
+@subsection Tab Controls
@include index.texi
@c That's all
@bye
-