X-Git-Url: http://git.chise.org/gitweb/?a=blobdiff_plain;ds=sidebyside;f=info%2Finternals.info-5;h=56833c546b317c36e0c7bc6890aa3f0463c5e028;hb=d8bd7eee3147c839d3c74d1823c139cd54867a75;hp=d64e8dd86d000e1dc18559c25e14c411b37eb6a3;hpb=7d6edaefa00e7b7e102354283824a4f6a721b71a;p=chise%2Fxemacs-chise.git- diff --git a/info/internals.info-5 b/info/internals.info-5 index d64e8dd..56833c5 100644 --- a/info/internals.info-5 +++ b/info/internals.info-5 @@ -3,7 +3,7 @@ internals/internals.texi. INFO-DIR-SECTION XEmacs Editor START-INFO-DIR-ENTRY -* Internals: (internals). XEmacs Internals Manual. +* Internals: (internals). XEmacs Internals Manual. END-INFO-DIR-ENTRY Copyright (C) 1992 - 1996 Ben Wing. Copyright (C) 1996, 1997 Sun @@ -38,222 +38,6 @@ may be included in a translation approved by the Free Software Foundation instead of in the original English.  -File: internals.info, Node: garbage_collect_1, Next: mark_object, Prev: Invocation, Up: Garbage Collection - Step by Step - -`garbage_collect_1' -------------------- - - We can now describe exactly what happens after the invocation takes -place. - 1. There are several cases in which the garbage collector is left - immediately: when we are already garbage collecting - (`gc_in_progress'), when the garbage collection is somehow - forbidden (`gc_currently_forbidden'), when we are currently - displaying something (`in_display') or when we are preparing for - the armageddon of the whole system (`preparing_for_armageddon'). - - 2. Next the correct frame in which to put all the output occurring - during garbage collecting is determined. In order to be able to - restore the old display's state after displaying the message, some - data about the current cursor position has to be saved. The - variables `pre_gc_curser' and `cursor_changed' take care of that. - - 3. The state of `gc_currently_forbidden' must be restored after the - garbage collection, no matter what happens during the process. 
We - accomplish this by `record_unwind_protect'ing the suitable function - `restore_gc_inhibit' together with the current value of - `gc_currently_forbidden'. - - 4. If we are concurrently running an interactive xemacs session, the - next step is simply to show the garbage collector's cursor/message. - - 5. The following steps are the intrinsic steps of the garbage - collector, therefore `gc_in_progress' is set. - - 6. For debugging purposes, it is possible to copy the current C stack - frame. However, this seems to be a currently unused feature. - - 7. Before actually starting to go over all live objects, references to - objects that are no longer used are pruned. We only have to do - this for events (`clear_event_resource') and for specifiers - (`cleanup_specifiers'). - - 8. Now the mark phase begins and marks all accessible elements. In - order to start from all slots that serve as roots of - accessibility, the function `mark_object' is called for each root - individually to go out from there to mark all reachable objects. - All roots that are traversed are shown in their processed order: - * all constant symbols and static variables that are registered - via `staticpro' in the array `staticvec'. *Note Adding - Global Lisp Variables::. - - * all Lisp objects that are created in C functions and that - must be protected from freeing them. They are registered in - the global list `gcprolist'. *Note GCPROing::. - - * all local variables (i.e. their name fields `symbol' and old - values `old_values') that are bound during the evaluation by - the Lisp engine. They are stored in `specbinding' structs - pushed on a stack called `specpdl'. *Note Dynamic Binding; - The specbinding Stack; Unwind-Protects::. - - * all catch blocks that the Lisp engine encounters during the - evaluation cause the creation of structs `catchtag' inserted - in the list `catchlist'. Their tag (`tag') and value (`val' - fields are freshly created objects and therefore have to be - marked. 
*Note Catch and Throw::. - - * every function application pushes new structs `backtrace' on - the call stack of the Lisp engine (`backtrace_list'). The - unique parts that have to be marked are the fields for each - function (`function') and all their arguments (`args'). - *Note Evaluation::. - - * all objects that are used by the redisplay engine that must - not be freed are marked by a special function called - `mark_redisplay' (in `redisplay.c'). - - * all objects created for profiling purposes are allocated by C - functions instead of using the lisp allocation mechanisms. In - order to receive the right ones during the sweep phase, they - also have to be marked manually. That is done by the function - `mark_profiling_info' - - 9. Hash tables in XEmacs belong to a kind of special objects that - make use of a concept often called 'weak pointers'. To make a - long story short, these kind of pointers are not followed during - the estimation of the live objects during garbage collection. Any - object referenced only by weak pointers is collected anyway, and - the reference to it is cleared. In hash tables there are different - usage patterns of them, manifesting in different types of hash - tables, namely 'non-weak', 'weak', 'key-weak' and 'value-weak' - (internally also 'key-car-weak' and 'value-car-weak') hash tables, - each clearing entries depending on different conditions. More - information can be found in the documentation to the function - `make-hash-table'. - - Because there are complicated dependency rules about when and what - to mark while processing weak hash tables, the standard `marker' - method is only active if it is marking non-weak hash tables. As - soon as a weak component is in the table, the hash table entries - are ignored while marking. Instead their marking is done each - separately by the function `finish_marking_weak_hash_tables'. 
This - function iterates over each hash table entry `hentries' for each - weak hash table in `Vall_weak_hash_tables'. Depending on the type - of a table, the appropriate action is performed. If a table is - acting as `HASH_TABLE_KEY_WEAK', and a key already marked, - everything reachable from the `value' component is marked. If it is - acting as a `HASH_TABLE_VALUE_WEAK' and the value component is - already marked, the marking starts beginning only from the `key' - component. If it is a `HASH_TABLE_KEY_CAR_WEAK' and the car of - the key entry is already marked, we mark both the `key' and - `value' components. Finally, if the table is of the type - `HASH_TABLE_VALUE_CAR_WEAK' and the car of the value components is - already marked, again both the `key' and the `value' components - get marked. - - Again, there are lists with comparable properties called weak - lists. There exist different peculiarities of their types called - `simple', `assoc', `key-assoc' and `value-assoc'. You can find - further details about them in the description to the function - `make-weak-list'. The scheme of their marking is similar: all weak - lists are listed in `Qall_weak_lists', therefore we iterate over - them. The marking is advanced until we hit an already marked pair. - Then we know that during a former run all the rest has been marked - completely. Again, depending on the special type of the weak list, - our jobs differ. If it is a `WEAK_LIST_SIMPLE' and the elem is - marked, we mark the `cons' part. If it is a `WEAK_LIST_ASSOC' and - not a pair or a pair with both marked car and cdr, we mark the - `cons' and the `elem'. If it is a `WEAK_LIST_KEY_ASSOC' and not a - pair or a pair with a marked car of the elem, we mark the `cons' - and the `elem'. Finally, if it is a `WEAK_LIST_VALUE_ASSOC' and - not a pair or a pair with a marked cdr of the elem, we mark both - the `cons' and the `elem'. 
- - Since, by marking objects in reach from weak hash tables and weak - lists, other objects could get marked, this perhaps implies - further marking of other weak objects, both finishing functions - are redone as long as yet unmarked objects get freshly marked. - - 10. After completing the special marking for the weak hash tables and - for the weak lists, all entries that point to objects that are - going to be swept in the further process are useless, and - therefore have to be removed from the table or the list. - - The function `prune_weak_hash_tables' does the job for weak hash - tables. Totally unmarked hash tables are removed from the list - `Vall_weak_hash_tables'. The other ones are treated more carefully - by scanning over all entries and removing one as soon as one of - the components `key' and `value' is unmarked. - - The same idea applies to the weak lists. It is accomplished by - `prune_weak_lists': An unmarked list is pruned from - `Vall_weak_lists' immediately. A marked list is treated more - carefully by going over it and removing just the unmarked pairs. - - 11. The function `prune_specifiers' checks all listed specifiers held - in `Vall_speficiers' and removes the ones from the lists that are - unmarked. - - 12. All syntax tables are stored in a list called - `Vall_syntax_tables'. The function `prune_syntax_tables' walks - through it and unlinks the tables that are unmarked. - - 13. Next, we will attack the complete sweeping - the function - `gc_sweep' which holds the predominance. - - 14. First, all the variables with respect to garbage collection are - reset. `consing_since_gc' - the counter of the created cells since - the last garbage collection - is set back to 0, and - `gc_in_progress' is not `true' anymore. - - 15. In case the session is interactive, the displayed cursor and - message are removed again. - - 16. The state of `gc_inhibit' is restored to the former value by - unwinding the stack. - - 17. 
A small memory reserve is always held back that can be reached by - `breathing_space'. If nothing more is left, we create a new reserve - and exit. - - -File: internals.info, Node: mark_object, Next: gc_sweep, Prev: garbage_collect_1, Up: Garbage Collection - Step by Step - -`mark_object' -------------- - - The first thing that is checked while marking an object is whether -the object is a real Lisp object `Lisp_Type_Record' or just an integer -or a character. Integers and characters are the only two types that are -stored directly - without another level of indirection, and therefore -they don't have to be marked and collected. *Note How Lisp Objects Are -Represented in C::. - - The second case is the one we have to handle. It is the one when we -are dealing with a pointer to a Lisp object. But, there exist also three -possibilities, that prevent us from doing anything while marking: The -object is read only which prevents it from being garbage collected, -i.e. marked (`C_READONLY_RECORD_HEADER'). The object in question is -already marked, and need not be marked for the second time (checked by -`MARKED_RECORD_HEADER_P'). If it is a special, unmarkable object -(`UNMARKABLE_RECORD_HEADER_P', apparently, these are objects that sit -in some CONST space, and can therefore not be marked, see -`this_one_is_unmarkable' in `alloc.c'). - - Now, the actual marking is feasible. We do so by once using the macro -`MARK_RECORD_HEADER' to mark the object itself (actually the special -flag in the lrecord header), and calling its special marker "method" -`marker' if available. The marker method marks every other object that -is in reach from our current object. Note, that these marker methods -should not call `mark_object' recursively, but instead should return -the next object from where further marking has to be performed. - - In case another object was returned, as mentioned before, we -reiterate the whole `mark_object' process beginning with this next -object. 
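
   The iterative marking described in the `mark_object' node above can be sketched as follows. This is only a minimal illustration under stated assumptions: `toy_object', `toy_mark_link' and `toy_mark_object' are hypothetical names, and the two flags merely stand in for the mark flag in the lrecord header and for the read-only and unmarkable cases. It is not the code in `alloc.c'.

```c
#include <assert.h>
#include <stddef.h>

struct toy_object;

/* A marker "method": returns the next object to mark, or NULL. */
typedef struct toy_object *(*toy_marker_fn) (struct toy_object *);

/* Hypothetical stand-in for an lrecord; not the real XEmacs layout. */
struct toy_object
{
  int marked;               /* stands in for the mark flag in the header */
  int unmarkable;           /* stands in for read-only/unmarkable objects */
  toy_marker_fn marker;     /* may be NULL when there is nothing to follow */
  struct toy_object *link;  /* the single reference this toy type holds */
};

/* Marker for the toy type: hand the reference back instead of recursing. */
struct toy_object *
toy_mark_link (struct toy_object *obj)
{
  return obj->link;
}

/* Mark OBJ and everything reachable from it; return how many objects
   were newly marked.  The loop replaces the recursive call.  */
int
toy_mark_object (struct toy_object *obj)
{
  int newly_marked = 0;

  while (obj != NULL)
    {
      /* Nothing to do for unmarkable or already marked objects. */
      if (obj->unmarkable || obj->marked)
        break;

      obj->marked = 1;
      newly_marked++;

      /* Continue with whatever the marker method returns. */
      obj = obj->marker ? obj->marker (obj) : NULL;
    }
  return newly_marked;
}
```

   Because the marker hands back the next object instead of calling the marking function itself, a long chain of objects is walked in a loop and does not deepen the C stack.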
- - File: internals.info, Node: gc_sweep, Next: sweep_lcrecords_1, Prev: mark_object, Up: Garbage Collection - Step by Step `gc_sweep' @@ -284,7 +68,7 @@ the internals: *Note lrecords::. Our next candidates are the other objects that behave quite differently than everything else: the strings. They consists of two -parts, a fixed-size portion (`struct Lisp_string') holding the string's +parts, a fixed-size portion (`struct Lisp_String') holding the string's length, its property list and a pointer to the second part, and the actual string data, which is stored in string-chars blocks comparable to frob blocks. In this block, the data is not only freed, but also a @@ -501,24 +285,17 @@ lrecords [see `lrecord.h'] All lrecords have at the beginning of their structure a `struct -lrecord_header'. This just contains a pointer to a `struct +lrecord_header'. This just contains a type number and some flags, +including the mark bit. All builtin type numbers are defined as +constants in `enum lrecord_type', to allow the compiler to generate +more efficient code for `TYPEP'. The type number, thru the +`lrecord_implementation_table', gives access to a `struct lrecord_implementation', which is a structure containing method pointers and such. There is one of these for each type, and it is a global, constant, statically-declared structure that is declared in the -`DEFINE_LRECORD_IMPLEMENTATION()' macro. (This macro actually declares -an array of two `struct lrecord_implementation' structures. The first -one contains all the standard method pointers, and is used in all -normal circumstances. During garbage collection, however, the lrecord -is "marked" by bumping its implementation pointer by one, so that it -points to the second structure in the array. This structure contains a -special indication in it that it's a "marked-object" structure: the -finalize method is the special function `this_marks_a_marked_record()', -and all other methods are null pointers. 
At the end of garbage -collection, all lrecords will either be reclaimed or unmarked by -decrementing their implementation pointers, so this second structure -pointer will never remain past garbage collection. - - Simple lrecords (of type (c) above) just have a `struct +`DEFINE_LRECORD_IMPLEMENTATION()' macro. + + Simple lrecords (of type (b) above) just have a `struct lrecord_header' at their beginning. lcrecords, however, actually have a `struct lcrecord_header'. This, in turn, has a `struct lrecord_header' at its beginning, so sanity is preserved; but it also has a pointer @@ -544,20 +321,21 @@ allocation function for each lrecord type. Whenever you create an lrecord, you need to call either `DEFINE_LRECORD_IMPLEMENTATION()' or `DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()'. This needs to be specified -in a C file, at the top level. What this actually does is define and -initialize the implementation structure for the lrecord. (And possibly -declares a function `error_check_foo()' that implements the `XFOO()' -macro when error-checking is enabled.) The arguments to the macros are -the actual type name (this is used to construct the C variable name of -the lrecord implementation structure and related structures using the -`##' macro concatenation operator), a string that names the type on the -Lisp level (this may not be the same as the C type name; typically, the -C type name has underscores, while the Lisp string has dashes), various -method pointers, and the name of the C structure that contains the -object. The methods are used to encapsulate type-specific information -about the object, such as how to print it or mark it for garbage -collection, so that it's easy to add new object types without having to -add a specific case for each new type in a bunch of different places. +in a `.c' file, at the top level. What this actually does is define +and initialize the implementation structure for the lrecord. 
(And +possibly declares a function `error_check_foo()' that implements the +`XFOO()' macro when error-checking is enabled.) The arguments to the +macros are the actual type name (this is used to construct the C +variable name of the lrecord implementation structure and related +structures using the `##' macro concatenation operator), a string that +names the type on the Lisp level (this may not be the same as the C +type name; typically, the C type name has underscores, while the Lisp +string has dashes), various method pointers, and the name of the C +structure that contains the object. The methods are used to +encapsulate type-specific information about the object, such as how to +print it or mark it for garbage collection, so that it's easy to add +new object types without having to add a specific case for each new +type in a bunch of different places. The difference between `DEFINE_LRECORD_IMPLEMENTATION()' and `DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()' is that the former is used @@ -571,21 +349,20 @@ for keeping allocation statistics.) For the purpose of keeping allocation statistics, the allocation engine keeps a list of all the different types that exist. Note that, since `DEFINE_LRECORD_IMPLEMENTATION()' is a macro that is specified at -top-level, there is no way for it to add to the list of all existing -types. What happens instead is that each implementation structure -contains in it a dynamically assigned number that is particular to that -type. (Or rather, it contains a pointer to another structure that -contains this number. This evasiveness is done so that the -implementation structure can be declared const.) In the sweep stage of -garbage collection, each lrecord is examined to see if its -implementation structure has its dynamically-assigned number set. If -not, it must be a new type, and it is added to the list of known types -and a new number assigned. 
The number is used to index into an array -holding the number of objects of each type and the total memory -allocated for objects of that type. The statistics in this array are -also computed during the sweep stage. These statistics are returned by -the call to `garbage-collect' and are printed out at the end of the -loadup phase. +top-level, there is no way for it to initialize the global data +structures containing type information, like +`lrecord_implementations_table'. For this reason a call to +`INIT_LRECORD_IMPLEMENTATION' must be added to the same source file +containing `DEFINE_LRECORD_IMPLEMENTATION', but instead of to the top +level, to one of the init functions, typically `syms_of_FOO.c'. +`INIT_LRECORD_IMPLEMENTATION' must be called before an object of this +type is used. + + The type number is also used to index into an array holding the +number of objects of each type and the total memory allocated for +objects of that type. The statistics in this array are computed during +the sweep stage. These statistics are returned by the call to +`garbage-collect'. Note that for every type defined with a `DEFINE_LRECORD_*()' macro, there needs to be a `DECLARE_LRECORD_IMPLEMENTATION()' somewhere in a @@ -595,6 +372,15 @@ there needs to be a `DECLARE_LRECORD_IMPLEMENTATION()' somewhere in a `FOOBARP()', etc. macros in a `.h' (or occasionally `.c') file. To create one of these, copy an existing model and modify as necessary. + *Please note:* If you define an lrecord in an external +dynamically-loaded module, you must use `DECLARE_EXTERNAL_LRECORD', +`DEFINE_EXTERNAL_LRECORD_IMPLEMENTATION', and +`DEFINE_EXTERNAL_LRECORD_SEQUENCE_IMPLEMENTATION' instead of the +non-EXTERNAL forms. These macros will dynamically add new type numbers +to the global enum that records them, whereas the non-EXTERNAL forms +assume that the programmer has already inserted the correct type numbers +into the enum's code at compile-time. 
+ The various methods in the lrecord implementation structure are: 1. A "mark" method. This is called during the marking stage and @@ -723,7 +509,7 @@ create one of these, copy an existing model and modify as necessary. configurations and opaques.  -File: internals.info, Node: Low-level allocation, Next: Pure Space, Prev: lrecords, Up: Allocation of Objects in XEmacs Lisp +File: internals.info, Node: Low-level allocation, Next: Cons, Prev: lrecords, Up: Allocation of Objects in XEmacs Lisp Low-level allocation ==================== @@ -784,10 +570,9 @@ system, when memory gets to 75%, 85%, and 95% full. (On some systems, the memory warnings are not functional.) Allocated memory that is going to be used to make a Lisp object is -created using `allocate_lisp_storage()'. This calls `xmalloc()' but -also verifies that the pointer to the memory can fit into a Lisp word -(remember that some bits are taken away for a type tag and a mark bit). -If not, an error is issued through `memory_full()'. +created using `allocate_lisp_storage()'. This just calls `xmalloc()'. +It used to verify that the pointer to the memory can fit into a Lisp +word, before the current Lisp object representation was introduced. `allocate_lisp_storage()' is called by `alloc_lcrecord()', `ALLOCATE_FIXED_TYPE()', and the vector and bit-vector creation routines. These routines also call `INCREMENT_CONS_COUNTER()' at the @@ -796,15 +581,7 @@ allocated, so that garbage-collection can be invoked when the threshold is reached.  -File: internals.info, Node: Pure Space, Next: Cons, Prev: Low-level allocation, Up: Allocation of Objects in XEmacs Lisp - -Pure Space -========== - - Not yet documented. 
- - -File: internals.info, Node: Cons, Next: Vector, Prev: Pure Space, Up: Allocation of Objects in XEmacs Lisp +File: internals.info, Node: Cons, Next: Vector, Prev: Low-level allocation, Up: Allocation of Objects in XEmacs Lisp Cons ==== @@ -851,13 +628,8 @@ File: internals.info, Node: Symbol, Next: Marker, Prev: Bit Vector, Up: Allo Symbol ====== - Symbols are also allocated in frob blocks. Note that the code -exists for symbols to be either lrecords (category (c) above) or simple -types (category (b) above), and are lrecords by default (I think), -although there is no good reason for this. - - Note that symbols in the awful horrible obarray structure are -chained through their `next' field. + Symbols are also allocated in frob blocks. Symbols in the awful +horrible obarray structure are chained through their `next' field. Remember that `intern' looks up a symbol in an obarray, creating one if necessary. @@ -913,8 +685,8 @@ big to fit into a string-chars block. Such strings, called "big strings", are all `malloc()'ed as their own block. (#### Although it would make more sense for the threshold for big strings to be somewhat lower, e.g. 1/2 or 1/4 the size of a string-chars block. It seems that -this was indeed the case formerly - indeed, the threshold was set at -1/8 - but Mly forgot about this when rewriting things for 19.8.) +this was indeed the case formerly--indeed, the threshold was set at +1/8--but Mly forgot about this when rewriting things for 19.8.) Note also that the string data in string-chars blocks is padded as necessary so that proper alignment constraints on the `struct @@ -949,58 +721,374 @@ Compiled Function Not yet documented.  
-File: internals.info, Node: Events and the Event Loop, Next: Evaluation; Stack Frames; Bindings, Prev: Allocation of Objects in XEmacs Lisp, Up: Top +File: internals.info, Node: Dumping, Next: Events and the Event Loop, Prev: Allocation of Objects in XEmacs Lisp, Up: Top + +Dumping +******* + +What is dumping and its justification +===================================== + + The C code of XEmacs is just a Lisp engine with a lot of built-in +primitives useful for writing an editor. The editor itself is written +mostly in Lisp, and represents around 100K lines of code. Loading and +executing the initialization of all this code takes a bit a time (five +to ten times the usual startup time of current xemacs) and requires +having all the lisp source files around. Having to reload them each +time the editor is started would not be acceptable. + + The traditional solution to this problem is called dumping: the build +process first creates the lisp engine under the name `temacs', then +runs it until it has finished loading and initializing all the lisp +code, and eventually creates a new executable called `xemacs' including +both the object code in `temacs' and all the contents of the memory +after the initialization. + + This solution, while working, has a huge problem: the creation of the +new executable from the actual contents of memory is an extremely +system-specific process, quite error-prone, and which interferes with a +lot of system libraries (like malloc). It is even getting worse +nowadays with libraries using constructors which are automatically +called when the program is started (even before main()) which tend to +crash when they are called multiple times, once before dumping and once +after (IRIX 6.x libz.so pulls in some C++ image libraries thru +dependencies which have this problem). Writing the dumper is also one +of the most difficult parts of porting XEmacs to a new operating system. 
+Basically, `dumping' is an operation that is just not officially +supported on many operating systems. + + The aim of the portable dumper is to solve the same problem as the +system-specific dumper, that is to be able to reload quickly, using only +a small number of files, the fully initialized lisp part of the editor, +without any system-specific hacks. + +* Menu: + +* Overview:: +* Data descriptions:: +* Dumping phase:: +* Reloading phase:: +* Remaining issues:: + + +File: internals.info, Node: Overview, Next: Data descriptions, Prev: Dumping, Up: Dumping + +Overview +======== + + The portable dumping system has to: + + 1. At dump time, write all initialized, non-quickly-rebuildable data + to a file [Note: currently named `xemacs.dmp', but the name will + change], along with all informations needed for the reloading. + + 2. When starting xemacs, reload the dump file, relocate it to its new + starting address if needed, and reinitialize all pointers to this + data. Also, rebuild all the quickly rebuildable data. + + +File: internals.info, Node: Data descriptions, Next: Dumping phase, Prev: Overview, Up: Dumping + +Data descriptions +================= + + The more complex task of the dumper is to be able to write lisp +objects (lrecords) and C structs to disk and reload them at a different +address, updating all the pointers they include in the process. This +is done by using external data descriptions that give information about +the layout of the structures in memory. + + The specification of these descriptions is in lrecord.h. A +description of an lrecord is an array of struct lrecord_description. +Each of these structs include a type, an offset in the structure and +some optional parameters depending on the type. 
For instance, here is +the string description: + + static const struct lrecord_description string_description[] = { + { XD_BYTECOUNT, offsetof (Lisp_String, size) }, + { XD_OPAQUE_DATA_PTR, offsetof (Lisp_String, data), XD_INDIRECT(0, 1) }, + { XD_LISP_OBJECT, offsetof (Lisp_String, plist) }, + { XD_END } + }; + + The first line indicates a member of type Bytecount, which is used by +the next, indirect directive. The second means "there is a pointer to +some opaque data in the field `data'". The length of said data is +given by the expression `XD_INDIRECT(0, 1)', which means "the value in +the 0th line of the description (welcome to C) plus one". The third +line means "there is a Lisp_Object member `plist' in the Lisp_String +structure". `XD_END' then ends the description. + + This gives us all the information we need to move around what is +pointed to by a structure (C or lrecord) and, by transitivity, +everything that it points to. The only missing information for dumping +is the size of the structure. For lrecords, this is part of the +lrecord_implementation, so we don't need to duplicate it. For C +structures we use a struct struct_description, which includes a size +field and a pointer to an associated array of lrecord_description. + + +File: internals.info, Node: Dumping phase, Next: Reloading phase, Prev: Data descriptions, Up: Dumping + +Dumping phase +============= -Events and the Event Loop -************************* + Dumping is done by calling the function pdump() (in dumper.c) which +is invoked from Fdump_emacs (in emacs.c). This function performs a +number of tasks. 
* Menu: -* Introduction to Events:: -* Main Loop:: -* Specifics of the Event Gathering Mechanism:: -* Specifics About the Emacs Event:: -* The Event Stream Callback Routines:: -* Other Event Loop Functions:: -* Converting Events:: -* Dispatching Events; The Command Builder:: +* Object inventory:: +* Address allocation:: +* The header:: +* Data dumping:: +* Pointers dumping:: + + +File: internals.info, Node: Object inventory, Next: Address allocation, Prev: Dumping phase, Up: Dumping phase + +Object inventory +---------------- + + The first task is to build the list of the objects to dump. This +includes: + + * lisp objects + + * C structures + + We end up with one `pdump_entry_list_elmt' per object group (arrays +of C structs are kept together) which includes a pointer to the first +object of the group, the per-object size and the count of objects in the +group, along with some other information which is initialized later. + + These entries are linked together in `pdump_entry_list' structures +and can be enumerated thru either: + + 1. the `pdump_object_table', an array of `pdump_entry_list', one per + lrecord type, indexed by type number. + + 2. the `pdump_opaque_data_list', used for the opaque data which does + not include pointers, and hence does not need descriptions. + + 3. the `pdump_struct_table', which is a vector of + `struct_description'/`pdump_entry_list' pairs, used for non-opaque + C structures. + + This uses a marking strategy similar to the garbage collector. Some +differences though: + + 1. We do not use the mark bit (which does not exist for C structures + anyway), we use a big hash table instead. + + 2. We do not use the mark function of lrecords but instead rely on the + external descriptions. This happens essentially because we need to + follow pointers to C structures and opaque data in addition to + Lisp_Object members. 
+ + This is done by `pdump_register_object', which handles Lisp_Object +variables, and pdump_register_struct which handles C structures, which +both delegate the description management to pdump_register_sub. + + The hash table doubles as a map object to pdump_entry_list_elmt (i.e. +allows us to look up a pdump_entry_list_elmt with the object it points +to). Entries are added with `pdump_add_entry()' and looked up with +`pdump_get_entry()'. There is no need for entry removal. The hash +value is computed quite basically from the object pointer by +`pdump_make_hash()'. + + The roots for the marking are: + + 1. the `staticpro''ed variables (there is a special + `staticpro_nodump()' call for protected variables we do not want + to dump). + + 2. the `pdump_wire''d variables (`staticpro' is equivalent to + `staticpro_nodump()' + `pdump_wire()'). + + 3. the `dumpstruct''ed variables, which points to C structures. + + This does not include the GCPRO'ed variables, the specbinds, the +catchtags, the backlist, the redisplay or the profiling info, since we +do not want to rebuild the actual chain of lisp calls which end up to +the dump-emacs call, only the global variables. + + Weak lists and weak hash tables are dumped as if they were their +non-weak equivalent (without changing their type, of course). This has +not yet been a problem. + + +File: internals.info, Node: Address allocation, Next: The header, Prev: Object inventory, Up: Dumping phase + +Address allocation +------------------ + + The next step is to allocate the offsets of each of the objects in +the final dump file. This is done by `pdump_allocate_offset()' which +is called indirectly by `pdump_scan_by_alignment()'. + + The strategy to deal with alignment problems uses these facts: + + 1. real world alignment requirements are powers of two. + + 2. the C compiler is required to adjust the size of a struct so that + you can have an array of them next to each other. 
This means you + can have a upper bound of the alignment requirements of a given + structure by looking at which power of two its size is a multiple. + + 3. the non-variant part of variable size lrecords has an alignment + requirement of 4. + + Hence, for each lrecord type, C struct type or opaque data block the +alignment requirement is computed as a power of two, with a minimum of +2^2 for lrecords. `pdump_scan_by_alignment()' then scans all the +`pdump_entry_list_elmt''s, the ones with the highest requirements +first. This ensures the best packing. + + The maximum alignment requirement we take into account is 2^8. + + `pdump_allocate_offset()' only has to do a linear allocation, +starting at offset 256 (this leaves room for the header and keep the +alignments happy). + + +File: internals.info, Node: The header, Next: Data dumping, Prev: Address allocation, Up: Dumping phase + +The header +---------- + + The next step creates the file and writes a header with a signature +and some random informations in it (number of staticpro, number of +assigned lrecord types, etc...). The reloc_address field, which +indicates at which address the file should be loaded if we want to +avoid post-reload relocation, is set to 0. It then seeks to offset 256 +(base offset for the objects). + + +File: internals.info, Node: Data dumping, Next: Pointers dumping, Prev: The header, Up: Dumping phase + +Data dumping +------------ + + The data is dumped in the same order as the addresses were allocated +by `pdump_dump_data()', called from `pdump_scan_by_alignment()'. This +function copies the data to a temporary buffer, relocates all pointers +in the object to the addresses allocated in step Address Allocation, +and writes it to the file. Using the same order means that, if we are +careful with lrecords whose size is not a multiple of 4, we are ensured +that the object is always written at the offset in the file allocated +in step Address Allocation. 
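
   The alignment rule and the linear offset allocation described in the Address allocation node can be sketched as follows. This is a minimal illustration, not the code in dumper.c: `toy_group', `toy_align_log2' and `toy_allocate_offsets' are hypothetical names, and rounding each object size up to its alignment is an assumption made here to cover lrecords whose size is not a multiple of 4.

```c
#include <assert.h>
#include <stddef.h>

enum { TOY_MAX_ALIGN_LOG2 = 8, TOY_BASE_OFFSET = 256 };

/* Hypothetical stand-in for one group of objects to dump. */
struct toy_group
{
  size_t size;     /* per-object size in bytes */
  size_t count;    /* number of objects in the group */
  int is_lrecord;  /* nonzero for lrecords (minimum alignment 2^2) */
  size_t offset;   /* filled in by toy_allocate_offsets */
};

/* Exponent of the largest power of two dividing SIZE, capped at 2^8.
   This is an upper bound on the alignment the type can require.  */
int
toy_align_log2 (size_t size, int is_lrecord)
{
  int log2 = 0;

  while (log2 < TOY_MAX_ALIGN_LOG2 && size != 0 && (size & 1) == 0)
    {
      size >>= 1;
      log2++;
    }
  if (is_lrecord && log2 < 2)
    log2 = 2;
  return log2;
}

/* Assign file offsets, highest alignment first, starting at offset 256.
   Returns the first unused offset.  */
size_t
toy_allocate_offsets (struct toy_group *groups, size_t n)
{
  size_t cur = TOY_BASE_OFFSET;
  int align;
  size_t i;

  for (align = TOY_MAX_ALIGN_LOG2; align >= 0; align--)
    for (i = 0; i < n; i++)
      if (toy_align_log2 (groups[i].size, groups[i].is_lrecord) == align)
        {
          size_t a = (size_t) 1 << align;
          /* Assumption: pad each object up to its alignment. */
          size_t padded = (groups[i].size + a - 1) & ~(a - 1);

          groups[i].offset = cur;
          cur += padded * groups[i].count;
        }
  return cur;
}
```

   Scanning from the highest requirement down means every group starts at an offset that is already a multiple of its own alignment, so no padding between groups is needed.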
+ + +File: internals.info, Node: Pointers dumping, Prev: Data dumping, Up: Dumping phase + +Pointers dumping +---------------- + + Several tables needed to properly reassign the global pointers are +then written. They are: + + 1. the staticpro array + + 2. the dumpstruct array + + 3. the lrecord_implementation_table array + + 4. a vector of all the offsets to the objects in the file that + include a description (for faster relocation at reload time) + + 5. the pdump_wired and pdump_wired_list arrays + + For each of the arrays we write both the pointer to the variables and +the relocated offset of the object they point to. Since these variables +are global, the pointers are still valid when restarting the program and +are used to regenerate the global pointers. + + The `pdump_wired_list' array is a special case. The variables it +points to are the heads of weak linked lists of lisp objects of the same +type. Not all objects of these lists are dumped, so the relocated pointer +we associate with each head points to the first dumped object of the list, or +Qnil if none is available. This is also the reason why they are not +used as roots for the purpose of object enumeration. + + This is the end of the dumping part. + + +File: internals.info, Node: Reloading phase, Next: Remaining issues, Prev: Dumping phase, Up: Dumping + +Reloading phase +=============== + +File loading +------------ + + The file is mmap'ed into memory (which ensures a PAGESIZE alignment, at +least 4096), or, if mmap is unavailable or fails, a 256-byte-aligned +malloc is done and the file is loaded. + + Some variables are reinitialized from the values found in the header. + + The difference between the actual loading address and the +reloc_address is computed and will be used for all the relocations. + +Putting back the staticvec +-------------------------- + + The staticvec array is memcpy'd from the file and the variables it +points to are reset to the relocated object addresses.
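The relocation arithmetic used when putting the global variables back can be sketched in C as follows. This is a minimal illustration, assuming each dumped table entry pairs the address of a global variable with the dump-time address of the object it pointed to; the names `root_entry', `reloc_delta' and `restore_roots' are hypothetical and do not come from the XEmacs sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry, as written in the pointers dumping step. */
typedef struct {
    void    **variable; /* address of the global variable (valid after restart) */
    uintptr_t value;    /* object address, expressed relative to reloc_address */
} root_entry;

/* Difference between the actual loading address and reloc_address.
   Unsigned arithmetic is used so that the subtraction wraps instead of
   overflowing when the load address is below reloc_address. */
static uintptr_t reloc_delta(const void *load_address, uintptr_t reloc_address)
{
    return (uintptr_t)load_address - reloc_address;
}

/* Reset every recorded variable to the relocated address of its object. */
static void restore_roots(const root_entry *table, size_t n, uintptr_t delta)
{
    for (size_t i = 0; i < n; i++)
        *table[i].variable = (void *)(table[i].value + delta);
}
```

With reloc_address set to 0, as the header step does, the stored values are plain file offsets and the delta is simply the loading address. If the file happens to be loaded exactly at reloc_address, the delta is 0 and the stored values are already correct.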
+ +Putting back the dumpstructed variables +--------------------------------------- + + The variables pointed to by dumpstruct in the dump phase are reset to +the right relocated object addresses. + +lrecord_implementations_table +----------------------------- + + The lrecord_implementations_table is reset to its dump time state and +the right lrecord_type_index values are put in. + +Object relocation +----------------- + + All the objects are relocated using their description and their +offset by `pdump_reloc_one'. This step is unnecessary if the +reloc_address is equal to the file loading address. + +Putting back the pdump_wire and pdump_wire_list variables +--------------------------------------------------------- + + Same as Putting back the dumpstructed variables. + +Reorganize the hash tables +-------------------------- + + Since some of the hash values in the lisp hash tables are +address-dependent, their layout is now wrong. So we go through each of +them and have them resorted by calling `pdump_reorganize_hash_table'.  -File: internals.info, Node: Introduction to Events, Next: Main Loop, Up: Events and the Event Loop - -Introduction to Events -====================== - - An event is an object that encapsulates information about an -interesting occurrence in the operating system. Events are generated -either by user action, direct (e.g. typing on the keyboard or moving -the mouse) or indirect (moving another window, thereby generating an -expose event on an Emacs frame), or as a result of some other typically -asynchronous action happening, such as output from a subprocess being -ready or a timer expiring. Events come into the system in an -asynchronous fashion (typically through a callback being called) and -are converted into a synchronous event queue (first-in, first-out) in a -process that we will call "collection". - - Note that each application has its own event queue. 
(It is -immaterial whether the collection process directly puts the events in -the proper application's queue, or puts them into a single system -queue, which is later split up.) - - The most basic level of event collection is done by the operating -system or window system. Typically, XEmacs does its own event -collection as well. Often there are multiple layers of collection in -XEmacs, with events from various sources being collected into a queue, -which is then combined with other sources to go into another queue -(i.e. a second level of collection), with perhaps another level on top -of this, etc. - - XEmacs has its own types of events (called "Emacs events"), which -provides an abstract layer on top of the system-dependent nature of the -most basic events that are received. Part of the complex nature of the -XEmacs event collection process involves converting from the -operating-system events into the proper Emacs events - there may not be -a one-to-one correspondence. - - Emacs events are documented in `events.h'; I'll discuss them later. +File: internals.info, Node: Remaining issues, Prev: Reloading phase, Up: Dumping + +Remaining issues +================ + + The build process will have to start a post-dump xemacs, ask it for the +loading address (which will hopefully always be the same between +different xemacs invocations) and relocate the file to the new address. +This way the object relocation phase will not have to be done, which +means no writes in the objects and that, because of the use of mmap, the +dumped data will be shared between all the xemacs processes running on the +computer. + + Some executable signature will be necessary to ensure that a given +dump file is really associated with a given executable, or random +crashes will occur. One option is a random number set at compile or configure +time through a define. This will also allow for having +differently-compiled xemacsen on the same system (mule and no-mule +come to mind).
+ + The DOC file contents should probably end up in the dump file.