Sync with r21_2_36.

diff --git a/info/internals.info-5 b/info/internals.info-5

index f5cbeb4..56833c5 100644
--- a/info/internals.info-5
+++ b/info/internals.info-5
@@ -1,9 +1,9 @@
-This is ../info/internals.info, produced by makeinfo version 3.12s from
+This is ../info/internals.info, produced by makeinfo version 4.0 from
  internals/internals.texi.
  
  INFO-DIR-SECTION XEmacs Editor
  START-INFO-DIR-ENTRY
-* Internals: (internals).      XEmacs Internals Manual.
+* Internals: (internals).       XEmacs Internals Manual.
  END-INFO-DIR-ENTRY
  
     Copyright (C) 1992 - 1996 Ben Wing.  Copyright (C) 1996, 1997 Sun
@@ -38,222 +38,6 @@ may be included in a translation approved by the Free Software
  Foundation instead of in the original English.
  
  \1f
-File: internals.info,  Node: garbage_collect_1,  Next: mark_object,  Prev: Invocation,  Up: Garbage Collection - Step by Step
-
-`garbage_collect_1'
--------------------
-
-   We can now describe exactly what happens after the invocation takes
-place.
-  1. There are several cases in which the garbage collector is left
-     immediately: when we are already garbage collecting
-     (`gc_in_progress'), when the garbage collection is somehow
-     forbidden (`gc_currently_forbidden'), when we are currently
-     displaying something (`in_display') or when we are preparing for
-     the armageddon of the whole system (`preparing_for_armageddon').
-
-  2. Next the correct frame in which to put all the output occurring
-     during garbage collecting is determined. In order to be able to
-     restore the old display's state after displaying the message, some
-     data about the current cursor position has to be saved. The
-     variables `pre_gc_curser' and `cursor_changed' take care of that.
-
-  3. The state of `gc_currently_forbidden' must be restored after the
-     garbage collection, no matter what happens during the process. We
-     accomplish this by `record_unwind_protect'ing the suitable function
-     `restore_gc_inhibit' together with the current value of
-     `gc_currently_forbidden'.
-
-  4. If we are concurrently running an interactive xemacs session, the
-     next step is simply to show the garbage collector's cursor/message.
-
-  5. The following steps are the intrinsic steps of the garbage
-     collector, therefore `gc_in_progress' is set.
-
-  6. For debugging purposes, it is possible to copy the current C stack
-     frame. However, this seems to be a currently unused feature.
-
-  7. Before actually starting to go over all live objects, references to
-     objects that are no longer used are pruned. We only have to do
-     this for events (`clear_event_resource') and for specifiers
-     (`cleanup_specifiers').
-
-  8. Now the mark phase begins and marks all accessible elements. In
-     order to start from all slots that serve as roots of
-     accessibility, the function `mark_object' is called for each root
-     individually to go out from there to mark all reachable objects.
-     All roots that are traversed are shown in their processed order:
-        * all constant symbols and static variables that are registered
-          via `staticpro' in the array `staticvec'.  *Note Adding
-          Global Lisp Variables::.
-
-        * all Lisp objects that are created in C functions and that
-          must be protected from freeing them. They are registered in
-          the global list `gcprolist'.  *Note GCPROing::.
-
-        * all local variables (i.e. their name fields `symbol' and old
-          values `old_values') that are bound during the evaluation by
-          the Lisp engine. They are stored in `specbinding' structs
-          pushed on a stack called `specpdl'.  *Note Dynamic Binding;
-          The specbinding Stack; Unwind-Protects::.
-
-        * all catch blocks that the Lisp engine encounters during the
-          evaluation cause the creation of structs `catchtag' inserted
-          in the list `catchlist'. Their tag (`tag') and value (`val'
-          fields are freshly created objects and therefore have to be
-          marked.  *Note Catch and Throw::.
-
-        * every function application pushes new structs `backtrace' on
-          the call stack of the Lisp engine (`backtrace_list'). The
-          unique parts that have to be marked are the fields for each
-          function (`function') and all their arguments (`args').
-          *Note Evaluation::.
-
-        * all objects that are used by the redisplay engine that must
-          not be freed are marked by a special function called
-          `mark_redisplay' (in `redisplay.c').
-
-        * all objects created for profiling purposes are allocated by C
-          functions instead of using the lisp allocation mechanisms. In
-          order to receive the right ones during the sweep phase, they
-          also have to be marked manually. That is done by the function
-          `mark_profiling_info'
-
-  9. Hash tables in Xemacs belong to a kind of special objects that
-     make use of a concept often called 'weak pointers'.  To make a
-     long story short, these kind of pointers are not followed during
-     the estimation of the live objects during garbage collection.  Any
-     object referenced only by weak pointers is collected anyway, and
-     the reference to it is cleared. In hash tables there are different
-     usage patterns of them, manifesting in different types of hash
-     tables, namely 'non-weak', 'weak', 'key-weak' and 'value-weak'
-     (internally also 'key-car-weak' and 'value-car-weak') hash tables,
-     each clearing entries depending on different conditions. More
-     information can be found in the documentation to the function
-     `make-hash-table'.
-
-     Because there are complicated dependency rules about when and what
-     to mark while processing weak hash tables, the standard `marker'
-     method is only active if it is marking non-weak hash tables. As
-     soon as a weak component is in the table, the hash table entries
-     are ignored while marking. Instead their marking is done each
-     separately by the function `finish_marking_weak_hash_tables'. This
-     function iterates over each hash table entry `hentries' for each
-     weak hash table in `Vall_weak_hash_tables'. Depending on the type
-     of a table, the appropriate action is performed.  If a table is
-     acting as `HASH_TABLE_KEY_WEAK', and a key already marked,
-     everything reachable from the `value' component is marked. If it is
-     acting as a `HASH_TABLE_VALUE_WEAK' and the value component is
-     already marked, the marking starts beginning only from the `key'
-     component.  If it is a `HASH_TABLE_KEY_CAR_WEAK' and the car of
-     the key entry is already marked, we mark both the `key' and
-     `value' components.  Finally, if the table is of the type
-     `HASH_TABLE_VALUE_CAR_WEAK' and the car of the value components is
-     already marked, again both the `key' and the `value' components
-     get marked.
-
-     Again, there are lists with comparable properties called weak
-     lists. There exist different peculiarities of their types called
-     `simple', `assoc', `key-assoc' and `value-assoc'. You can find
-     further details about them in the description to the function
-     `make-weak-list'. The scheme of their marking is similar: all weak
-     lists are listed in `Qall_weak_lists', therefore we iterate over
-     them. The marking is advanced until we hit an already marked pair.
-     Then we know that during a former run all the rest has been marked
-     completely. Again, depending on the special type of the weak list,
-     our jobs differ. If it is a `WEAK_LIST_SIMPLE' and the elem is
-     marked, we mark the `cons' part. If it is a `WEAK_LIST_ASSOC' and
-     not a pair or a pair with both marked car and cdr, we mark the
-     `cons' and the `elem'. If it is a `WEAK_LIST_KEY_ASSOC' and not a
-     pair or a pair with a marked car of the elem, we mark the `cons'
-     and the `elem'. Finally, if it is a `WEAK_LIST_VALUE_ASSOC' and
-     not a pair or a pair with a marked cdr of the elem, we mark both
-     the `cons' and the `elem'.
-
-     Since, by marking objects in reach from weak hash tables and weak
-     lists, other objects could get marked, this perhaps implies
-     further marking of other weak objects, both finishing functions
-     are redone as long as yet unmarked objects get freshly marked.
-
- 10. After completing the special marking for the weak hash tables and
-     for the weak lists, all entries that point to objects that are
-     going to be swept in the further process are useless, and
-     therefore have to be removed from the table or the list.
-
-     The function `prune_weak_hash_tables' does the job for weak hash
-     tables. Totally unmarked hash tables are removed from the list
-     `Vall_weak_hash_tables'. The other ones are treated more carefully
-     by scanning over all entries and removing one as soon as one of
-     the components `key' and `value' is unmarked.
-
-     The same idea applies to the weak lists. It is accomplished by
-     `prune_weak_lists': An unmarked list is pruned from
-     `Vall_weak_lists' immediately. A marked list is treated more
-     carefully by going over it and removing just the unmarked pairs.
-
- 11. The function `prune_specifiers' checks all listed specifiers held
-     in `Vall_speficiers' and removes the ones from the lists that are
-     unmarked.
-
- 12. All syntax tables are stored in a list called
-     `Vall_syntax_tables'. The function `prune_syntax_tables' walks
-     through it and unlinks the tables that are unmarked.
-
- 13. Next, we will attack the complete sweeping - the function
-     `gc_sweep' which holds the predominance.
-
- 14. First, all the variables with respect to garbage collection are
-     reset. `consing_since_gc' - the counter of the created cells since
-     the last garbage collection - is set back to 0, and
-     `gc_in_progress' is not `true' anymore.
-
- 15. In case the session is interactive, the displayed cursor and
-     message are removed again.
-
- 16. The state of `gc_inhibit' is restored to the former value by
-     unwinding the stack.
-
- 17. A small memory reserve is always held back that can be reached by
-     `breathing_space'. If nothing more is left, we create a new reserve
-     and exit.
-
-\1f
-File: internals.info,  Node: mark_object,  Next: gc_sweep,  Prev: garbage_collect_1,  Up: Garbage Collection - Step by Step
-
-`mark_object'
--------------
-
-   The first thing that is checked while marking an object is whether
-the object is a real Lisp object `Lisp_Type_Record' or just an integer
-or a character. Integers and characters are the only two types that are
-stored directly - without another level of indirection, and therefore
-they don´t have to be marked and collected.  *Note How Lisp Objects Are
-Represented in C::.
-
-   The second case is the one we have to handle. It is the one when we
-are dealing with a pointer to a Lisp object. But, there exist also three
-possibilities, that prevent us from doing anything while marking: The
-object is read only which prevents it from being garbage collected,
-i.e. marked (`C_READONLY_RECORD_HEADER'). The object in question is
-already marked, and need not be marked for the second time (checked by
-`MARKED_RECORD_HEADER_P'). If it is a special, unmarkable object
-(`UNMARKABLE_RECORD_HEADER_P', apparently, these are objects that sit
-in some CONST space, and can therefore not be marked, see
-`this_one_is_unmarkable' in `alloc.c').
-
-   Now, the actual marking is feasible. We do so by once using the macro
-`MARK_RECORD_HEADER' to mark the object itself (actually the special
-flag in the lrecord header), and calling its special marker "method"
-`marker' if available. The marker method marks every other object that
-is in reach from our current object. Note, that these marker methods
-should not call `mark_object' recursively, but instead should return
-the next object from where further marking has to be performed.
-
-   In case another object was returned, as mentioned before, we
-reiterate the whole `mark_object' process beginning with this next
-object.
-
-\1f
  File: internals.info,  Node: gc_sweep,  Next: sweep_lcrecords_1,  Prev: mark_object,  Up: Garbage Collection - Step by Step
  
  `gc_sweep'
@@ -268,7 +52,7 @@ and managed, and consequently different ways to free them from memory.
  objects are allocated and handled using that scheme of `lcrecords'.
  Each object is `malloc'ed separately instead of placing it in one of
  the contiguous frob blocks. All types that are currently stored using
-`lcrecords'´s  `alloc_lcrecord' and `make_lcrecord_list' are the types:
+`lcrecords''s  `alloc_lcrecord' and `make_lcrecord_list' are the types:
  vectors, buffers, char-table, char-table-entry, console, weak-list,
  database, device, ldap, hash-table, command-builder, extent-auxiliary,
  extent-info, face, coding-system, frame, image-instance, glyph,
@@ -284,7 +68,7 @@ the internals: *Note lrecords::.
  
     Our next candidates are the other objects that behave quite
  differently than everything else: the strings. They consists of two
-parts, a fixed-size portion (`struct Lisp_string') holding the string's
+parts, a fixed-size portion (`struct Lisp_String') holding the string's
  length, its property list and a pointer to the second part, and the
  actual string data, which is stored in string-chars blocks comparable to
  frob blocks. In this block, the data is not only freed, but also a
@@ -501,24 +285,17 @@ lrecords
     [see `lrecord.h']
  
     All lrecords have at the beginning of their structure a `struct
-lrecord_header'.  This just contains a pointer to a `struct
+lrecord_header'.  This just contains a type number and some flags,
+including the mark bit.  All builtin type numbers are defined as
+constants in `enum lrecord_type', to allow the compiler to generate
+more efficient code for `TYPEP'.  The type number, through the
+`lrecord_implementations_table', gives access to a `struct
  lrecord_implementation', which is a structure containing method pointers
  and such.  There is one of these for each type, and it is a global,
  constant, statically-declared structure that is declared in the
-`DEFINE_LRECORD_IMPLEMENTATION()' macro. (This macro actually declares
-an array of two `struct lrecord_implementation' structures.  The first
-one contains all the standard method pointers, and is used in all
-normal circumstances.  During garbage collection, however, the lrecord
-is "marked" by bumping its implementation pointer by one, so that it
-points to the second structure in the array.  This structure contains a
-special indication in it that it's a "marked-object" structure: the
-finalize method is the special function `this_marks_a_marked_record()',
-and all other methods are null pointers.  At the end of garbage
-collection, all lrecords will either be reclaimed or unmarked by
-decrementing their implementation pointers, so this second structure
-pointer will never remain past garbage collection.
-
-   Simple lrecords (of type (c) above) just have a `struct
+`DEFINE_LRECORD_IMPLEMENTATION()' macro.
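The lookup from header to implementation structure can be sketched roughly
as follows.  This is a simplified illustration and not the XEmacs source:
every name prefixed with `toy_' or `TOY_', the field widths and the sizes
in the table are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type numbers; the real ones live in `enum lrecord_type'. */
enum toy_lrecord_type { TOY_TYPE_CONS, TOY_TYPE_STRING, TOY_TYPE_COUNT };

/* Stand-in for the per-type structure of method pointers and such. */
struct toy_lrecord_implementation {
  const char *name;    /* Lisp-level name of the type */
  size_t static_size;  /* size of the C structure (made-up values below) */
};

/* Stand-in for the header: a type number and some flags. */
struct toy_lrecord_header {
  unsigned int type : 8;  /* index into the implementation table */
  unsigned int mark : 1;  /* mark bit used by the garbage collector */
};

static const struct toy_lrecord_implementation toy_cons_impl = { "cons", 16 };
static const struct toy_lrecord_implementation toy_string_impl = { "string", 24 };

/* One global, constant entry per type, indexed by type number. */
static const struct toy_lrecord_implementation *
toy_implementation_table[TOY_TYPE_COUNT] = { &toy_cons_impl, &toy_string_impl };

/* Go from a header to its implementation structure via the type number. */
static const struct toy_lrecord_implementation *
toy_header_implementation (const struct toy_lrecord_header *h)
{
  assert (h->type < TOY_TYPE_COUNT);
  return toy_implementation_table[h->type];
}

/* Type check in the spirit of `TYPEP': compare against a constant. */
static int
toy_typep (const struct toy_lrecord_header *h, enum toy_lrecord_type t)
{
  return h->type == (unsigned int) t;
}
```

Because the type check compares a small integer against a compile-time
constant, it does not need to dereference any pointer to the implementation
structure.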
+
+   Simple lrecords (of type (b) above) just have a `struct
  lrecord_header' at their beginning.  lcrecords, however, actually have a
  `struct lcrecord_header'.  This, in turn, has a `struct lrecord_header'
  at its beginning, so sanity is preserved; but it also has a pointer
@@ -544,20 +321,21 @@ allocation function for each lrecord type.
     Whenever you create an lrecord, you need to call either
  `DEFINE_LRECORD_IMPLEMENTATION()' or
  `DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()'.  This needs to be specified
-in a C file, at the top level.  What this actually does is define and
-initialize the implementation structure for the lrecord. (And possibly
-declares a function `error_check_foo()' that implements the `XFOO()'
-macro when error-checking is enabled.)  The arguments to the macros are
-the actual type name (this is used to construct the C variable name of
-the lrecord implementation structure and related structures using the
-`##' macro concatenation operator), a string that names the type on the
-Lisp level (this may not be the same as the C type name; typically, the
-C type name has underscores, while the Lisp string has dashes), various
-method pointers, and the name of the C structure that contains the
-object.  The methods are used to encapsulate type-specific information
-about the object, such as how to print it or mark it for garbage
-collection, so that it's easy to add new object types without having to
-add a specific case for each new type in a bunch of different places.
+in a `.c' file, at the top level.  What this actually does is define
+and initialize the implementation structure for the lrecord. (And
+possibly declares a function `error_check_foo()' that implements the
+`XFOO()' macro when error-checking is enabled.)  The arguments to the
+macros are the actual type name (this is used to construct the C
+variable name of the lrecord implementation structure and related
+structures using the `##' macro concatenation operator), a string that
+names the type on the Lisp level (this may not be the same as the C
+type name; typically, the C type name has underscores, while the Lisp
+string has dashes), various method pointers, and the name of the C
+structure that contains the object.  The methods are used to
+encapsulate type-specific information about the object, such as how to
+print it or mark it for garbage collection, so that it's easy to add
+new object types without having to add a specific case for each new
+type in a bunch of different places.
  
     The difference between `DEFINE_LRECORD_IMPLEMENTATION()' and
  `DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()' is that the former is used
@@ -571,21 +349,20 @@ for keeping allocation statistics.)
     For the purpose of keeping allocation statistics, the allocation
  engine keeps a list of all the different types that exist.  Note that,
  since `DEFINE_LRECORD_IMPLEMENTATION()' is a macro that is specified at
-top-level, there is no way for it to add to the list of all existing
-types.  What happens instead is that each implementation structure
-contains in it a dynamically assigned number that is particular to that
-type. (Or rather, it contains a pointer to another structure that
-contains this number.  This evasiveness is done so that the
-implementation structure can be declared const.) In the sweep stage of
-garbage collection, each lrecord is examined to see if its
-implementation structure has its dynamically-assigned number set.  If
-not, it must be a new type, and it is added to the list of known types
-and a new number assigned.  The number is used to index into an array
-holding the number of objects of each type and the total memory
-allocated for objects of that type.  The statistics in this array are
-also computed during the sweep stage.  These statistics are returned by
-the call to `garbage-collect' and are printed out at the end of the
-loadup phase.
+top-level, there is no way for it to initialize the global data
+structures containing type information, like
+`lrecord_implementations_table'.  For this reason a call to
+`INIT_LRECORD_IMPLEMENTATION' must be added to the same source file
+containing `DEFINE_LRECORD_IMPLEMENTATION', but instead of to the top
+level, to one of the init functions, typically `syms_of_FOO.c'.
+`INIT_LRECORD_IMPLEMENTATION' must be called before an object of this
+type is used.
+
+   The type number is also used to index into an array holding the
+number of objects of each type and the total memory allocated for
+objects of that type.  The statistics in this array are computed during
+the sweep stage.  These statistics are returned by the call to
+`garbage-collect'.
  
     Note that for every type defined with a `DEFINE_LRECORD_*()' macro,
  there needs to be a `DECLARE_LRECORD_IMPLEMENTATION()' somewhere in a
@@ -595,6 +372,15 @@ there needs to be a `DECLARE_LRECORD_IMPLEMENTATION()' somewhere in a
  `FOOBARP()', etc. macros in a `.h' (or occasionally `.c') file.  To
  create one of these, copy an existing model and modify as necessary.
  
+   *Please note:* If you define an lrecord in an external
+dynamically-loaded module, you must use `DECLARE_EXTERNAL_LRECORD',
+`DEFINE_EXTERNAL_LRECORD_IMPLEMENTATION', and
+`DEFINE_EXTERNAL_LRECORD_SEQUENCE_IMPLEMENTATION' instead of the
+non-EXTERNAL forms. These macros will dynamically add new type numbers
+to the global enum that records them, whereas the non-EXTERNAL forms
+assume that the programmer has already inserted the correct type numbers
+into the enum's code at compile-time.
+
     The various methods in the lrecord implementation structure are:
  
    1. A "mark" method.  This is called during the marking stage and
@@ -723,7 +509,7 @@ create one of these, copy an existing model and modify as necessary.
       configurations and opaques.
  
  \1f
-File: internals.info,  Node: Low-level allocation,  Next: Pure Space,  Prev: lrecords,  Up: Allocation of Objects in XEmacs Lisp
+File: internals.info,  Node: Low-level allocation,  Next: Cons,  Prev: lrecords,  Up: Allocation of Objects in XEmacs Lisp
  
  Low-level allocation
  ====================
@@ -784,10 +570,9 @@ system, when memory gets to 75%, 85%, and 95% full.  (On some systems,
  the memory warnings are not functional.)
  
     Allocated memory that is going to be used to make a Lisp object is
-created using `allocate_lisp_storage()'.  This calls `xmalloc()' but
-also verifies that the pointer to the memory can fit into a Lisp word
-(remember that some bits are taken away for a type tag and a mark bit).
-If not, an error is issued through `memory_full()'.
+created using `allocate_lisp_storage()'.  This just calls `xmalloc()'.
+It used to verify that the pointer to the memory can fit into a Lisp
+word, before the current Lisp object representation was introduced.
  `allocate_lisp_storage()' is called by `alloc_lcrecord()',
  `ALLOCATE_FIXED_TYPE()', and the vector and bit-vector creation
  routines.  These routines also call `INCREMENT_CONS_COUNTER()' at the
@@ -796,15 +581,7 @@ allocated, so that garbage-collection can be invoked when the threshold
  is reached.
  
  \1f
-File: internals.info,  Node: Pure Space,  Next: Cons,  Prev: Low-level allocation,  Up: Allocation of Objects in XEmacs Lisp
-
-Pure Space
-==========
-
-   Not yet documented.
-
-\1f
-File: internals.info,  Node: Cons,  Next: Vector,  Prev: Pure Space,  Up: Allocation of Objects in XEmacs Lisp
+File: internals.info,  Node: Cons,  Next: Vector,  Prev: Low-level allocation,  Up: Allocation of Objects in XEmacs Lisp
  
  Cons
  ====
@@ -851,13 +628,8 @@ File: internals.info,  Node: Symbol,  Next: Marker,  Prev: Bit Vector,  Up: Allo
  Symbol
  ======
  
-   Symbols are also allocated in frob blocks.  Note that the code
-exists for symbols to be either lrecords (category (c) above) or simple
-types (category (b) above), and are lrecords by default (I think),
-although there is no good reason for this.
-
-   Note that symbols in the awful horrible obarray structure are
-chained through their `next' field.
+   Symbols are also allocated in frob blocks.  Symbols in the awful
+horrible obarray structure are chained through their `next' field.
  
     Remember that `intern' looks up a symbol in an obarray, creating one
  if necessary.
@@ -913,8 +685,8 @@ big to fit into a string-chars block.  Such strings, called "big
  strings", are all `malloc()'ed as their own block. (#### Although it
  would make more sense for the threshold for big strings to be somewhat
  lower, e.g. 1/2 or 1/4 the size of a string-chars block.  It seems that
-this was indeed the case formerly - indeed, the threshold was set at
-1/8 - but Mly forgot about this when rewriting things for 19.8.)
+this was indeed the case formerly--indeed, the threshold was set at
+1/8--but Mly forgot about this when rewriting things for 19.8.)
  
     Note also that the string data in string-chars blocks is padded as
  necessary so that proper alignment constraints on the `struct
@@ -949,58 +721,374 @@ Compiled Function
     Not yet documented.
  
  \1f
-File: internals.info,  Node: Events and the Event Loop,  Next: Evaluation; Stack Frames; Bindings,  Prev: Allocation of Objects in XEmacs Lisp,  Up: Top
+File: internals.info,  Node: Dumping,  Next: Events and the Event Loop,  Prev: Allocation of Objects in XEmacs Lisp,  Up: Top
+
+Dumping
+*******
+
+What is dumping and its justification
+=====================================
+
+   The C code of XEmacs is just a Lisp engine with a lot of built-in
+primitives useful for writing an editor.  The editor itself is written
+mostly in Lisp, and represents around 100K lines of code.  Loading and
+executing the initialization of all this code takes a bit of time (five
+to ten times the usual startup time of current xemacs) and requires
+having all the lisp source files around.  Having to reload them each
+time the editor is started would not be acceptable.
+
+   The traditional solution to this problem is called dumping: the build
+process first creates the lisp engine under the name `temacs', then
+runs it until it has finished loading and initializing all the lisp
+code, and eventually creates a new executable called `xemacs' including
+both the object code in `temacs' and all the contents of the memory
+after the initialization.
+
+   This solution, while working, has a huge problem: the creation of the
+new executable from the actual contents of memory is an extremely
+system-specific process, quite error-prone, and which interferes with a
+lot of system libraries (like malloc).  It is even getting worse
+nowadays with libraries using constructors which are automatically
+called when the program is started (even before main()) which tend to
+crash when they are called multiple times, once before dumping and once
+after (IRIX 6.x libz.so pulls in some C++ image libraries through
+dependencies which have this problem).  Writing the dumper is also one
+of the most difficult parts of porting XEmacs to a new operating system.
+Basically, `dumping' is an operation that is just not officially
+supported on many operating systems.
+
+   The aim of the portable dumper is to solve the same problem as the
+system-specific dumper, that is to be able to reload quickly, using only
+a small number of files, the fully initialized lisp part of the editor,
+without any system-specific hacks.
+
+* Menu:
+
+* Overview::
+* Data descriptions::
+* Dumping phase::
+* Reloading phase::
+* Remaining issues::
+
+\1f
+File: internals.info,  Node: Overview,  Next: Data descriptions,  Prev: Dumping,  Up: Dumping
+
+Overview
+========
+
+   The portable dumping system has to:
+
+  1. At dump time, write all initialized, non-quickly-rebuildable data
+     to a file [Note: currently named `xemacs.dmp', but the name will
+     change], along with all the information needed for reloading.
+
+  2. When starting xemacs, reload the dump file, relocate it to its new
+     starting address if needed, and reinitialize all pointers to this
+     data.  Also, rebuild all the quickly rebuildable data.
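The relocation in step 2 can be pictured with a small sketch.  It assumes,
purely for illustration, that the dump stores addresses that were valid
at the old base address and that they are rebased on reload; the `toy_'
names are hypothetical and nothing here is taken from dumper.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: translate one address that was valid when the
   image lived at old_base so that it is valid at new_base.  A stored
   value of 0 stands for a null pointer and is left alone. */
static uintptr_t
toy_relocate (uintptr_t stored, uintptr_t old_base, uintptr_t new_base)
{
  if (stored == 0)
    return 0;
  assert (stored >= old_base);
  return stored - old_base + new_base;
}

/* Relocate, in place, every address in a table of n stored addresses. */
static void
toy_relocate_all (uintptr_t *addrs, size_t n,
                  uintptr_t old_base, uintptr_t new_base)
{
  size_t i;
  for (i = 0; i < n; i++)
    addrs[i] = toy_relocate (addrs[i], old_base, new_base);
}
```

If the file happens to be loaded at the same address it was dumped from,
the two bases are equal and every address is left unchanged, which is the
"if needed" case in the list above.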
+
+\1f
+File: internals.info,  Node: Data descriptions,  Next: Dumping phase,  Prev: Overview,  Up: Dumping
+
+Data descriptions
+=================
+
+   The most complex task of the dumper is to be able to write lisp
+objects (lrecords) and C structs to disk and reload them at a different
+address, updating all the pointers they include in the process.  This
+is done by using external data descriptions that give information about
+the layout of the structures in memory.
+
+   The specification of these descriptions is in lrecord.h.  A
+description of an lrecord is an array of struct lrecord_description.
+Each of these structs includes a type, an offset in the structure and
+some optional parameters depending on the type.  For instance, here is
+the string description:
+
+     static const struct lrecord_description string_description[] = {
+       { XD_BYTECOUNT,         offsetof (Lisp_String, size) },
+       { XD_OPAQUE_DATA_PTR,   offsetof (Lisp_String, data), XD_INDIRECT(0, 1) },
+       { XD_LISP_OBJECT,       offsetof (Lisp_String, plist) },
+       { XD_END }
+     };
+
+   The first line indicates a member of type Bytecount, which is used by
+the next, indirect directive.  The second means "there is a pointer to
+some opaque data in the field `data'".  The length of said data is
+given by the expression `XD_INDIRECT(0, 1)', which means "the value in
+the 0th line of the description (welcome to C) plus one".  The third
+line means "there is a Lisp_Object member `plist' in the Lisp_String
+structure".  `XD_END' then ends the description.
+
+   This gives us all the information we need to move around what is
+pointed to by a structure (C or lrecord) and, by transitivity,
+everything that it points to.  The only missing information for dumping
+is the size of the structure.  For lrecords, this is part of the
+lrecord_implementation, so we don't need to duplicate it.  For C
+structures we use a struct struct_description, which includes a size
+field and a pointer to an associated array of lrecord_description.
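How a description lets generic code find the pointers inside a structure
can be sketched as below.  This is a toy model under stated assumptions:
the `toy_'/`TOY_' names and the three description types are invented for
the example and are much simpler than the real `XD_*' directives.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced set of description types. */
enum toy_xd_type { TOY_XD_INT, TOY_XD_POINTER, TOY_XD_END };

/* One line of a description: what the member is and where it lives. */
struct toy_description {
  enum toy_xd_type type;
  size_t offset;
};

/* Example structure; pointer members are `void *' to keep the walk simple. */
struct toy_node {
  int size;
  void *next;
  void *data;
};

static const struct toy_description toy_node_description[] = {
  { TOY_XD_INT,     offsetof (struct toy_node, size) },
  { TOY_XD_POINTER, offsetof (struct toy_node, next) },
  { TOY_XD_POINTER, offsetof (struct toy_node, data) },
  { TOY_XD_END,     0 }
};

/* Walk the description and count the non-null pointer members of obj.
   A dumper would register or relocate each pointer instead of counting. */
static int
toy_count_live_pointers (const void *obj, const struct toy_description *desc)
{
  int count = 0;
  assert (obj != NULL && desc != NULL);
  for (; desc->type != TOY_XD_END; desc++)
    if (desc->type == TOY_XD_POINTER)
      {
        void *const *field =
          (void *const *) ((const char *) obj + desc->offset);
        if (*field != NULL)
          count++;
      }
  return count;
}
```

The walker knows nothing about `struct toy_node' itself; it only follows
the offsets in the description, which is what allows one routine to
handle every described type.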
+
+\1f
+File: internals.info,  Node: Dumping phase,  Next: Reloading phase,  Prev: Data descriptions,  Up: Dumping
+
+Dumping phase
+=============
  
-Events and the Event Loop
-*************************
+   Dumping is done by calling the function pdump() (in dumper.c) which
+is invoked from Fdump_emacs (in emacs.c).  This function performs a
+number of tasks.
  
  * Menu:
  
-* Introduction to Events::
-* Main Loop::
-* Specifics of the Event Gathering Mechanism::
-* Specifics About the Emacs Event::
-* The Event Stream Callback Routines::
-* Other Event Loop Functions::
-* Converting Events::
-* Dispatching Events; The Command Builder::
+* Object inventory::
+* Address allocation::
+* The header::
+* Data dumping::
+* Pointers dumping::
+
+\1f
+File: internals.info,  Node: Object inventory,  Next: Address allocation,  Prev: Dumping phase,  Up: Dumping phase
+
+Object inventory
+----------------
+
+   The first task is to build the list of the objects to dump.  This
+includes:
+
+   * lisp objects
+
+   * C structures
+
+   We end up with one `pdump_entry_list_elmt' per object group (arrays
+of C structs are kept together) which includes a pointer to the first
+object of the group, the per-object size and the count of objects in the
+group, along with some other information which is initialized later.
+
+   These entries are linked together in `pdump_entry_list' structures
+and can be enumerated through either:
+
+  1. the `pdump_object_table', an array of `pdump_entry_list', one per
+     lrecord type, indexed by type number.
+
+  2. the `pdump_opaque_data_list', used for the opaque data which does
+     not include pointers, and hence does not need descriptions.
+
+  3. the `pdump_struct_table', which is a vector of
+     `struct_description'/`pdump_entry_list' pairs, used for non-opaque
+     C structures.
+
+   This uses a marking strategy similar to the garbage collector.  Some
+differences though:
+
+  1. We do not use the mark bit (which does not exist for C structures
+     anyway), we use a big hash table instead.
+
+  2. We do not use the mark function of lrecords but instead rely on the
+     external descriptions.  This happens essentially because we need to
+     follow pointers to C structures and opaque data in addition to
+     Lisp_Object members.
+
+   This is done by `pdump_register_object', which handles Lisp_Object
+variables, and `pdump_register_struct', which handles C structures;
+both delegate the description management to `pdump_register_sub'.
+
+   The hash table doubles as a map from object to pdump_entry_list_elmt
+(i.e. it allows us to look up a pdump_entry_list_elmt with the object it
+points to).  Entries are added with `pdump_add_entry()' and looked up with
+`pdump_get_entry()'.  There is no need for entry removal.  The hash
+value is computed quite basically from the object pointer by
+`pdump_make_hash()'.
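+
+   A minimal sketch of such a pointer-keyed table follows.  All the
+`sketch_' names, the field layout, the table size and the hash formula
+are assumptions made for illustration; they are not the actual
+definitions behind `pdump_add_entry()' and `pdump_get_entry()'.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for pdump_entry_list_elmt; fields are assumed. */
typedef struct sketch_entry {
    struct sketch_entry *next;  /* next entry in the same bucket */
    const void *obj;            /* first object of the group */
    size_t size;                /* per-object size */
    int count;                  /* number of objects in the group */
} sketch_entry;

#define SKETCH_HASH_SIZE 1021   /* arbitrary prime, not the real size */

static sketch_entry *sketch_table[SKETCH_HASH_SIZE];

/* The hash is derived directly from the object pointer. */
static unsigned sketch_make_hash(const void *obj)
{
    return (unsigned)(((uintptr_t)obj >> 3) % SKETCH_HASH_SIZE);
}

/* Return the entry for obj, or NULL if obj was never registered.
   A non-NULL result plays the role of a set mark bit. */
static sketch_entry *sketch_get_entry(const void *obj)
{
    sketch_entry *e = sketch_table[sketch_make_hash(obj)];
    while (e != NULL && e->obj != obj)
        e = e->next;
    return e;
}

/* Register obj; an already registered object keeps its entry.
   Entries are never removed. */
static sketch_entry *sketch_add_entry(const void *obj, size_t size,
                                      int count)
{
    sketch_entry *e = sketch_get_entry(obj);
    unsigned h;

    if (e != NULL)
        return e;
    e = malloc(sizeof *e);
    if (e == NULL)
        return NULL;
    h = sketch_make_hash(obj);
    e->obj = obj;
    e->size = size;
    e->count = count;
    e->next = sketch_table[h];
    sketch_table[h] = e;
    return e;
}
```

+   Looking an object up before following its pointers is what stops
+the traversal from visiting the same object twice.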
+
+   The roots for the marking are:
+
+  1. the `staticpro''ed variables (there is a special
+     `staticpro_nodump()' call for protected variables we do not want
+     to dump).
+
+  2. the `pdump_wire''d variables (`staticpro' is equivalent to
+     `staticpro_nodump()' + `pdump_wire()').
+
+  3. the `dumpstruct''ed variables, which point to C structures.
+
+   This does not include the GCPRO'ed variables, the specbinds, the
+catchtags, the backlist, the redisplay or the profiling info, since we
+do not want to rebuild the actual chain of Lisp calls which leads up to
+the `dump-emacs' call, only the global variables.
+
+   Weak lists and weak hash tables are dumped as if they were their
+non-weak equivalent (without changing their type, of course).  This has
+not yet been a problem.
+
+\1f
+File: internals.info,  Node: Address allocation,  Next: The header,  Prev: Object inventory,  Up: Dumping phase
+
+Address allocation
+------------------
+
+   The next step is to allocate the offsets of each of the objects in
+the final dump file.  This is done by `pdump_allocate_offset()' which
+is called indirectly by `pdump_scan_by_alignment()'.
+
+   The strategy to deal with alignment problems uses these facts:
+
+  1. real world alignment requirements are powers of two.
+
+  2. the C compiler is required to adjust the size of a struct so that
+     you can have an array of them next to each other.  This means you
+     can get an upper bound on the alignment requirement of a given
+     structure by finding the largest power of two that divides its
+     size.
+
+  3. the non-variant part of variable size lrecords has an alignment
+     requirement of 4.
+
+   Hence, for each lrecord type, C struct type or opaque data block the
+alignment requirement is computed as a power of two, with a minimum of
+2^2 for lrecords.  `pdump_scan_by_alignment()' then scans all the
+`pdump_entry_list_elmt''s, the ones with the highest requirements
+first.  This ensures the best packing.
+
+   The maximum alignment requirement we take into account is 2^8.
+
+   `pdump_allocate_offset()' only has to do a linear allocation,
+starting at offset 256 (this leaves room for the header and keeps the
+alignments happy).
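+
+   The reasoning above can be sketched as follows.  The `sketch_'
+names and the group structure are hypothetical, and the sketch ignores
+the 2^2 minimum applied to lrecords; it only illustrates why scanning
+from the highest alignment down needs no padding: each group's total
+size is a multiple of its own alignment, so the running offset stays
+aligned for every later, less demanding group.

```c
#include <stddef.h>

#define SKETCH_MAX_ALIGN_SHIFT 8  /* largest requirement considered: 2^8 */
#define SKETCH_BASE_OFFSET 256    /* first object offset, after the header */

/* Hypothetical object group: per-object size, object count, and the
   file offset assigned to the first object. */
typedef struct {
    size_t size;
    int count;
    size_t offset;
} sketch_group;

/* Exponent of the largest power of two dividing size, capped at 8.
   This is an upper bound on the alignment the type can require. */
static int sketch_align_shift(size_t size)
{
    int shift = 0;

    if (size == 0)
        return 0;
    while (shift < SKETCH_MAX_ALIGN_SHIFT
           && (size & ((size_t)1 << shift)) == 0)
        shift++;
    return shift;
}

/* Assign offsets linearly, most demanding alignment first.
   Returns the offset just past the last object. */
static size_t sketch_scan_by_alignment(sketch_group *g, int n)
{
    size_t cur = SKETCH_BASE_OFFSET;
    int shift, i;

    for (shift = SKETCH_MAX_ALIGN_SHIFT; shift >= 0; shift--)
        for (i = 0; i < n; i++)
            if (sketch_align_shift(g[i].size) == shift) {
                g[i].offset = cur;
                cur += g[i].size * (size_t)g[i].count;
            }
    return cur;
}
```

+   For example, a 24-byte structure yields an exponent of 3 (24 is a
+multiple of 8 but not of 16), so its group is placed after all groups
+whose sizes are multiples of 16 or more.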
+
+\1f
+File: internals.info,  Node: The header,  Next: Data dumping,  Prev: Address allocation,  Up: Dumping phase
+
+The header
+----------
+
+   The next step creates the file and writes a header with a signature
+and some miscellaneous information in it (number of staticpro, number
+of assigned lrecord types, etc.).  The reloc_address field, which
+indicates at which address the file should be loaded if we want to
+avoid post-reload relocation, is set to 0.  It then seeks to offset 256
+(base offset for the objects).
+
+\1f
+File: internals.info,  Node: Data dumping,  Next: Pointers dumping,  Prev: The header,  Up: Dumping phase
+
+Data dumping
+------------
+
+   The data is dumped by `pdump_dump_data()', called from
+`pdump_scan_by_alignment()', in the same order as the addresses were
+allocated.  This function copies the data to a temporary buffer,
+relocates all pointers in the object to the addresses allocated in the
+Address allocation step, and writes it to the file.  Using the same
+order means that, if we are careful with lrecords whose size is not a
+multiple of 4, we are assured that the object is always written at the
+offset in the file allocated in the Address allocation step.
+
+\1f
+File: internals.info,  Node: Pointers dumping,  Prev: Data dumping,  Up: Dumping phase
+
+Pointers dumping
+----------------
+
+   A number of tables needed to properly reassign the global pointers
+are then written.  They are:
+
+  1. the staticpro array
+
+  2. the dumpstruct array
+
+  3. the lrecord_implementations_table array
+
+  4. a vector of all the offsets to the objects in the file that
+     include a description (for faster relocation at reload time)
+
+  5. the pdump_wired and pdump_wired_list arrays
+
+   For each of the arrays we write both the pointer to the variables and
+the relocated offset of the object they point to.  Since these variables
+are global, the pointers are still valid when restarting the program and
+are used to regenerate the global pointers.
+
+   The `pdump_wired_list' array is a special case.  The variables it
+points to are the heads of weak linked lists of Lisp objects of the
+same type.  Not all objects of such a list are dumped, so the relocated
+pointer we associate with each variable points to the first dumped
+object of its list, or Qnil if none is available.  This is also the
+reason why they are not used as roots for the purpose of object
+enumeration.
+
+   This is the end of the dumping part.
+
+\1f
+File: internals.info,  Node: Reloading phase,  Next: Remaining issues,  Prev: Dumping phase,  Up: Dumping
+
+Reloading phase
+===============
+
+File loading
+------------
+
+   The file is mmap'ed in memory (which ensures a PAGESIZE alignment, at
+least 4096) or, if mmap is unavailable or fails, a 256-byte-aligned
+malloc is done and the file is loaded into it.
+
+   Some variables are reinitialized from the values found in the header.
+
+   The difference between the actual loading address and the
+reloc_address is computed and will be used for all the relocations.
+
+Putting back the staticvec
+--------------------------
+
+   The staticvec array is memcpy'd from the file and the variables it
+points to are reset to the relocated objects' addresses.
+
+Putting back the dumpstructed variables
+---------------------------------------
+
+   The variables pointed to by dumpstruct in the dump phase are reset to
+the right relocated object addresses.
+
+lrecord_implementations_table
+-----------------------------
+
+   The lrecord_implementations_table is reset to its dump time state and
+the right lrecord_type_index values are put in.
+
+Object relocation
+-----------------
+
+   All the objects are relocated using their description and their
+offset by `pdump_reloc_one'.  This step is unnecessary if the
+reloc_address is equal to the file loading address.
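+
+   The arithmetic can be sketched as below.  This is not the code of
+`pdump_reloc_one'; the `sketch_' names are hypothetical, the list of
+pointer slot offsets stands in for a real description, and leaving
+null pointers untouched is an assumption of the sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Difference between the address the file was actually loaded at and
   the address it was prepared for (the reloc_address header field). */
static intptr_t sketch_reloc_delta(const void *load_address,
                                   uintptr_t reloc_address)
{
    return (intptr_t)((uintptr_t)load_address - reloc_address);
}

/* Add delta to every pointer slot of one object.  slot_offsets holds
   the byte offsets of the pointer fields inside the object. */
static void sketch_reloc_one(char *obj, const size_t *slot_offsets,
                             int nslots, intptr_t delta)
{
    int i;

    if (delta == 0)
        return;  /* loaded at reloc_address: nothing to rewrite */
    for (i = 0; i < nslots; i++) {
        uintptr_t v;

        memcpy(&v, obj + slot_offsets[i], sizeof v);
        if (v != 0)  /* assumption: null pointers stay null */
            v += (uintptr_t)delta;
        memcpy(obj + slot_offsets[i], &v, sizeof v);
    }
}
```

+   With reloc_address set to 0, as written by the dumping phase, each
+stored pointer is simply an offset into the file, and the delta is the
+loading address itself.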
+
+Putting back the pdump_wire and pdump_wire_list variables
+---------------------------------------------------------
+
+   Same as Putting back the dumpstructed variables.
+
+Reorganize the hash tables
+--------------------------
+
+   Since some of the hash values in the Lisp hash tables are
+address-dependent, their layout is now wrong.  So we go through each of
+them and have it reorganized by calling `pdump_reorganize_hash_table'.
  
  \1f
-File: internals.info,  Node: Introduction to Events,  Next: Main Loop,  Up: Events and the Event Loop
-
-Introduction to Events
-======================
-
-   An event is an object that encapsulates information about an
-interesting occurrence in the operating system.  Events are generated
-either by user action, direct (e.g. typing on the keyboard or moving
-the mouse) or indirect (moving another window, thereby generating an
-expose event on an Emacs frame), or as a result of some other typically
-asynchronous action happening, such as output from a subprocess being
-ready or a timer expiring.  Events come into the system in an
-asynchronous fashion (typically through a callback being called) and
-are converted into a synchronous event queue (first-in, first-out) in a
-process that we will call "collection".
-
-   Note that each application has its own event queue. (It is
-immaterial whether the collection process directly puts the events in
-the proper application's queue, or puts them into a single system
-queue, which is later split up.)
-
-   The most basic level of event collection is done by the operating
-system or window system.  Typically, XEmacs does its own event
-collection as well.  Often there are multiple layers of collection in
-XEmacs, with events from various sources being collected into a queue,
-which is then combined with other sources to go into another queue
-(i.e. a second level of collection), with perhaps another level on top
-of this, etc.
-
-   XEmacs has its own types of events (called "Emacs events"), which
-provides an abstract layer on top of the system-dependent nature of the
-most basic events that are received.  Part of the complex nature of the
-XEmacs event collection process involves converting from the
-operating-system events into the proper Emacs events - there may not be
-a one-to-one correspondence.
-
-   Emacs events are documented in `events.h'; I'll discuss them later.
+File: internals.info,  Node: Remaining issues,  Prev: Reloading phase,  Up: Dumping
+
+Remaining issues
+================
+
+   The build process will have to start a post-dump xemacs, ask it for
+the loading address (which will, hopefully, always be the same between
+different xemacs invocations) and relocate the file to the new address.
+This way the object relocation phase will not have to be done, which
+means no writes to the objects and, because of the use of mmap, that
+the dumped data will be shared between all the xemacs processes running
+on the computer.
+
+   Some executable signature will be necessary to ensure that a given
+dump file is really associated with a given executable, or random
+crashes will occur.  This could be a random number set at compile or
+configure time through a define.  This will also allow for having
+differently-compiled xemacsen on the same system (mule and no-mule
+come to mind).
+
+   The DOC file contents should probably end up in the dump file.