This is ../info/internals.info, produced by makeinfo version 4.0b from internals/internals.texi. INFO-DIR-SECTION XEmacs Editor START-INFO-DIR-ENTRY * Internals: (internals). XEmacs Internals Manual. END-INFO-DIR-ENTRY Copyright (C) 1992 - 1996 Ben Wing. Copyright (C) 1996, 1997 Sun Microsystems. Copyright (C) 1994 - 1998 Free Software Foundation. Copyright (C) 1994, 1995 Board of Trustees, University of Illinois. Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies. Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the Foundation. Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided also that the section entitled "GNU General Public License" is included exactly as in the original, and provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that the section entitled "GNU General Public License" may be included in a translation approved by the Free Software Foundation instead of in the original English.  
File: internals.info, Node: Modules for Interfacing with the File System, Next: Modules for Other Aspects of the Lisp Interpreter and Object System, Prev: Modules for the Redisplay Mechanism, Up: A Summary of the Various XEmacs Modules Modules for Interfacing with the File System ============================================ lstream.c lstream.h These modules implement the "stream" Lisp object type. This is an internal-only Lisp object that implements a generic buffering stream. The idea is to provide a uniform interface onto all sources and sinks of data, including file descriptors, stdio streams, chunks of memory, Lisp buffers, Lisp strings, etc. That way, I/O functions can be written to the stream interface and can transparently handle all possible sources and sinks. (For example, the `read' function can read data from a file, a string, a buffer, or even a function that is called repeatedly to return data, without worrying about where the data is coming from or what-size chunks it is returned in.) Note that in the C code, streams are called "lstreams" (for "Lisp streams") to distinguish them from other kinds of streams, e.g. stdio streams and C++ I/O streams. Similar to other subsystems in XEmacs, lstreams are separated into generic functions and a set of methods for the different types of lstreams. `lstream.c' provides implementations of many different types of streams; others are provided, e.g., in `file-coding.c'. fileio.c This implements the basic primitives for interfacing with the file system. This includes primitives for reading files into buffers, writing buffers into files, checking for the presence or accessibility of files, canonicalizing file names, etc. Note that these primitives are usually not invoked directly by the user: There is a great deal of higher-level Lisp code that implements the user commands such as `find-file' and `save-buffer'. 
This is similar to the distinction between the lower-level primitives in `editfns.c' and the higher-level user commands in `commands.c' and `simple.el'. filelock.c This file provides functions for detecting clashes between different processes (e.g. XEmacs and some external process, or two different XEmacs processes) modifying the same file. (XEmacs can optionally use the `lock/' subdirectory to provide a form of "locking" between different XEmacs processes.) This module is also used by the low-level functions in `insdel.c' to ensure that, if the first modification is being made to a buffer whose corresponding file has been externally modified, the user is made aware of this so that the buffer can be synched up with the external changes if necessary. filemode.c This file provides some miscellaneous functions that construct a `rwxr-xr-x'-type permissions string (as might appear in an `ls'-style directory listing) given the information returned by the `stat()' system call. dired.c ndir.h These files implement the XEmacs interface to directory searching. This includes a number of primitives for determining the files in a directory and for doing filename completion. (Remember that generic completion is handled by a different mechanism, in `minibuf.c'.) `ndir.h' is a header file used for the directory-searching emulation functions provided in `sysdep.c' (see section J below), for systems that don't provide any directory-searching functions. (On those systems, directories can be read directly as files, and parsed.) realpath.c This file provides an implementation of the `realpath()' function for expanding symbolic links, on systems that don't implement it or have a broken implementation.  
File: internals.info,  Node: Modules for Other Aspects of the Lisp Interpreter and Object System,  Next: Modules for Interfacing with the Operating System,  Prev: Modules for Interfacing with the File System,  Up: A Summary of the Various XEmacs Modules

Modules for Other Aspects of the Lisp Interpreter and Object System
===================================================================

     elhash.c
     elhash.h
     hash.c
     hash.h

These files provide two implementations of hash tables. Files `hash.c' and `hash.h' provide a generic C implementation of hash tables which can stand independently of XEmacs. Files `elhash.c' and `elhash.h' provide a separate implementation of hash tables that can store only Lisp objects and that knows about Lispy things like garbage collection; they implement the "hash-table" Lisp object type.

     specifier.c
     specifier.h

This module implements the "specifier" Lisp object type. This is primarily used for displayable properties, and allows for values that are specific to a particular buffer, window, frame, device, or device class, as well as for a default value. This is used, for example, to control the height of the horizontal scrollbar or the appearance of the `default', `bold', or other faces. The specifier object consists of a number of specifications, each of which maps from a buffer, window, etc. to a value. The function `specifier-instance' looks up a value given a window (from which a buffer, frame, and device can be derived).

     chartab.c
     chartab.h
     casetab.c

`chartab.c' and `chartab.h' implement the "char table" Lisp object type, which maps from characters or certain sorts of character ranges to Lisp objects. The implementation of this object type is optimized for the internal representation of characters. Char tables come in different types, which affect the allowed object types to which a character can be mapped and also dictate certain other properties of the char table.
`casetab.c' implements one sort of char table, the "case table", which maps characters to other characters of possibly different case. These are used by XEmacs to implement case-changing primitives and to do case-insensitive searching. syntax.c syntax.h This module implements "syntax tables", another sort of char table that maps characters into syntax classes that define the syntax of these characters (e.g. a parenthesis belongs to a class of `open' characters that have corresponding `close' characters and can be nested). This module also implements the Lisp "scanner", a set of primitives for scanning over text based on syntax tables. This is used, for example, to find the matching parenthesis in a command such as `forward-sexp', and by `font-lock.c' to locate quoted strings, comments, etc. casefiddle.c This module implements various Lisp primitives for upcasing, downcasing and capitalizing strings or regions of buffers. rangetab.c This module implements the "range table" Lisp object type, which provides for a mapping from ranges of integers to arbitrary Lisp objects. opaque.c opaque.h This module implements the "opaque" Lisp object type, an internal-only Lisp object that encapsulates an arbitrary block of memory so that it can be managed by the Lisp allocation system. To create an opaque object, you call `make_opaque()', passing a pointer to a block of memory. An object is created that is big enough to hold the memory, which is copied into the object's storage. The object will then stick around as long as you keep pointers to it, after which it will be automatically reclaimed. Opaque objects can also have an arbitrary "mark method" associated with them, in case the block of memory contains other Lisp objects that need to be marked for garbage-collection purposes. (If you need other object methods, such as a finalize method, you should just go ahead and create a new Lisp object type--it's not hard.) 
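The copy-into-the-object behavior described for opaque objects can be sketched as follows. This is a toy model in plain C, not the XEmacs implementation: the names `toy_opaque' and `toy_make_opaque' are hypothetical, the signature is not that of the real `make_opaque()', and since the toy has no garbage collector the caller releases the object with `free()'.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an opaque object: a small header followed
   by storage that holds a private copy of the caller's memory block. */
struct toy_opaque
{
  size_t size;                          /* number of bytes in data[] */
  void (*mark) (struct toy_opaque *);   /* optional mark method, or NULL */
  unsigned char data[];                 /* C99 flexible array member */
};

/* Allocate an object big enough for SIZE bytes and copy SRC into it.
   Returns NULL if allocation fails. */
static struct toy_opaque *
toy_make_opaque (const void *src, size_t size)
{
  struct toy_opaque *op = malloc (sizeof *op + size);

  if (op == NULL)
    return NULL;
  assert (src != NULL || size == 0);
  op->size = size;
  op->mark = NULL;
  if (size > 0)
    memcpy (op->data, src, size);
  return op;
}
```

Because the data is copied, later changes to the caller's block do not affect the object. In the real system the object's lifetime is governed by garbage collection rather than by an explicit `free()'.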
     abbrev.c

This module provides a few primitives for doing dynamic abbreviation expansion. In XEmacs, most of the code for this has been moved into Lisp. Some C code remains for speed and because the primitive `self-insert-command' (which is executed for all self-inserting characters) hooks into the abbrev mechanism. (`self-insert-command' is itself in C only for speed.)

     doc.c

This module provides primitives for retrieving the documentation strings of functions and variables. These documentation strings contain certain special markers that get dynamically expanded (e.g. a reverse-lookup is performed on some named functions to retrieve their current key bindings). Some documentation strings (in particular, for the built-in primitives and pre-loaded Lisp functions) are stored externally in a file `DOC' in the `lib-src/' directory and need to be fetched from that file. (Part of the build stage involves building this file, and another part involves constructing an index for this file and embedding it into the executable, so that the functions in `doc.c' do not have to search the entire `DOC' file to find the appropriate documentation string.)

     md5.c

This module provides a Lisp primitive that implements the MD5 secure hashing scheme, used to create a large (128-bit) hash value of a string of data such that the data cannot be derived from the hash value. This is used for various security applications on the Internet.

File: internals.info,  Node: Modules for Interfacing with the Operating System,  Next: Modules for Interfacing with X Windows,  Prev: Modules for Other Aspects of the Lisp Interpreter and Object System,  Up: A Summary of the Various XEmacs Modules

Modules for Interfacing with the Operating System
=================================================

     callproc.c
     process.c
     process.h

These modules allow XEmacs to spawn and communicate with subprocesses and network connections.
`callproc.c' implements (through the `call-process' primitive) what are called "synchronous subprocesses". This means that XEmacs runs a program, waits till it's done, and retrieves its output. A typical example might be calling the `ls' program to get a directory listing. `process.c' and `process.h' implement "asynchronous subprocesses". This means that XEmacs starts a program and then continues normally, not waiting for the process to finish. Data can be sent to the process or retrieved from it as it's running. This is used for the `shell' command (which provides a front end onto a shell program such as `csh'), the mail and news readers implemented in XEmacs, etc. The result of calling `start-process' to start a subprocess is a process object, a particular kind of object used to communicate with the subprocess. You can send data to the process by passing the process object and the data to `send-process', and you can specify what happens to data retrieved from the process by setting properties of the process object. (When the process sends data, XEmacs receives a process event, which says that there is data ready. When `dispatch-event' is called on this event, it reads the data from the process and does something with it, as specified by the process object's properties. Typically, this means inserting the data into a buffer or calling a function.) Another property of the process object is called the "sentinel", which is a function that is called when the process terminates. Process objects are also used for network connections (connections to a process running on another machine). Network connections are started with `open-network-stream' but otherwise work just like subprocesses. sysdep.c sysdep.h These modules implement most of the low-level, messy operating-system interface code. This includes various device control (ioctl) operations for file descriptors, TTY's, pseudo-terminals, etc. 
(usually this stuff is fairly system-dependent; thus the name of this module), and emulation of standard library functions and system calls on systems that don't provide them or have broken versions.

     sysdir.h
     sysfile.h
     sysfloat.h
     sysproc.h
     syspwd.h
     syssignal.h
     systime.h
     systty.h
     syswait.h

These header files provide consistent interfaces onto system-dependent header files and system calls. The idea is that, instead of including a standard header file like `' (which may or may not exist on various systems), or having to worry about whether all systems provide a particular preprocessor constant, or having to deal with the four different paradigms for manipulating signals, you just include the appropriate `sys*.h' header file, which includes all the right system header files, defines any missing preprocessor constants, provides a uniform interface onto system calls, etc.

`sysdir.h' provides a uniform interface onto directory-querying functions. (In some cases, this is in conjunction with emulation functions in `sysdep.c'.)

`sysfile.h' includes all the necessary header files for standard system calls (e.g. `read()'), ensures that all necessary `open()' and `stat()' preprocessor constants are defined, and possibly (usually) substitutes sugared versions of `read()', `write()', etc. that automatically restart interrupted I/O operations.

`sysfloat.h' includes the necessary header files for floating-point operations.

`sysproc.h' includes the necessary header files for calling `select()', `fork()', `execve()', socket operations, and the like, and ensures that the `FD_*()' macros for descriptor-set manipulations are available.

`syspwd.h' includes the necessary header files for obtaining information from `/etc/passwd' (the functions are emulated under VMS).

`syssignal.h' includes the necessary header files for signal-handling and provides a uniform interface onto the different signal-handling and signal-blocking paradigms.
`systime.h' includes the necessary header files and provides uniform interfaces for retrieving the time of day, setting file access/modification times, getting the amount of time used by the XEmacs process, etc.

`systty.h' buffers against the infinitude of different ways of controlling TTY's.

`syswait.h' provides a uniform way of retrieving the exit status from a `wait()'ed-on process (some systems use a union, others use an int).

     hpplay.c
     libsst.c
     libsst.h
     libst.h
     linuxplay.c
     nas.c
     sgiplay.c
     sound.c
     sunplay.c

These files implement the ability to play various sounds on some types of computers. You have to configure your XEmacs with sound support in order to get this capability.

`sound.c' provides the generic interface. It implements various Lisp primitives and variables that let you specify which sounds should be played in certain conditions. (The conditions are identified by symbols, which are passed to `ding' to make a sound. Various standard functions call this function at certain times; if sound support does not exist, a simple beep results.)

`sgiplay.c', `sunplay.c', `hpplay.c', and `linuxplay.c' interface to the machine's speaker for different kinds of machines. This is called "native" sound.

`nas.c' interfaces to a computer somewhere else on the network using the NAS (Network Audio Server) protocol, playing sounds on that machine. This allows you to run XEmacs on a remote machine, with its display set to your local machine, and have the sounds be made on your local machine, provided that you have a NAS server running on your local machine.

`libsst.c', `libsst.h', and `libst.h' provide some additional functions for playing sound on a Sun SPARC but are not currently in use.

     tooltalk.c
     tooltalk.h

These two modules implement an interface to the ToolTalk protocol, which is an interprocess communication protocol implemented on some versions of Unix.
ToolTalk is a high-level protocol that allows processes to register themselves as providers of particular services; other processes can then request a service without knowing or caring exactly who is providing the service. It is similar in spirit to the DDE protocol provided under Microsoft Windows. ToolTalk is a part of the new CDE (Common Desktop Environment) specification and is used to connect the parts of the SPARCWorks development environment. getloadavg.c This module provides the ability to retrieve the system's current load average. (The way to do this is highly system-specific, unfortunately, and requires a lot of special-case code.) sunpro.c This module provides a small amount of code used internally at Sun to keep statistics on the usage of XEmacs. broken-sun.h strcmp.c strcpy.c sunOS-fix.c These files provide replacement functions and prototypes to fix numerous bugs in early releases of SunOS 4.1. hftctl.c This module provides some terminal-control code necessary on versions of AIX prior to 4.1.  File: internals.info, Node: Modules for Interfacing with X Windows, Next: Modules for Internationalization, Prev: Modules for Interfacing with the Operating System, Up: A Summary of the Various XEmacs Modules Modules for Interfacing with X Windows ====================================== Emacs.ad.h A file generated from `Emacs.ad', which contains XEmacs-supplied fallback resources (so that XEmacs has pretty defaults). EmacsFrame.c EmacsFrame.h EmacsFrameP.h These modules implement an Xt widget class that encapsulates a frame. This is for ease in integrating with Xt. The EmacsFrame widget covers the entire X window except for the menubar; the scrollbars are positioned on top of the EmacsFrame widget. *Warning:* Abandon hope, all ye who enter here. This code took an ungodly amount of time to get right, and is likely to fall apart mercilessly at the slightest change. Such is life under Xt. 
EmacsManager.c EmacsManager.h EmacsManagerP.h These modules implement a simple Xt manager (i.e. composite) widget class that simply lets its children set whatever geometry they want. It's amazing that Xt doesn't provide this standardly, but on second thought, it makes sense, considering how amazingly broken Xt is. EmacsShell-sub.c EmacsShell.c EmacsShell.h EmacsShellP.h These modules implement two Xt widget classes that are subclasses of the TopLevelShell and TransientShell classes. This is necessary to deal with more brokenness that Xt has sadistically thrust onto the backs of developers. xgccache.c xgccache.h These modules provide functions for maintenance and caching of GC's (graphics contexts) under the X Window System. This code is junky and needs to be rewritten. select-msw.c select-x.c select.c select.h This module provides an interface to the X Window System's concept of "selections", the standard way for X applications to communicate with each other. xintrinsic.h xintrinsicp.h xmmanagerp.h xmprimitivep.h These header files are similar in spirit to the `sys*.h' files and buffer against different implementations of Xt and Motif. * `xintrinsic.h' should be included in place of `'. * `xintrinsicp.h' should be included in place of `'. * `xmmanagerp.h' should be included in place of `'. * `xmprimitivep.h' should be included in place of `'. xmu.c xmu.h These files provide an emulation of the Xmu library for those systems (i.e. HPUX) that don't provide it as a standard part of X. ExternalClient-Xlib.c ExternalClient.c ExternalClient.h ExternalClientP.h ExternalShell.c ExternalShell.h ExternalShellP.h extw-Xlib.c extw-Xlib.h extw-Xt.c extw-Xt.h These files provide the "external widget" interface, which allows an XEmacs frame to appear as a widget in another application. To do this, you have to configure with `--external-widget'. `ExternalShell*' provides the server (XEmacs) side of the connection. 
`ExternalClient*' provides the client (other application) side of the connection. These files are not compiled into XEmacs but are compiled into libraries that are then linked into your application. `extw-*' is common code that is used for both the client and server. Don't touch this code; something is liable to break if you do.  File: internals.info, Node: Modules for Internationalization, Prev: Modules for Interfacing with X Windows, Up: A Summary of the Various XEmacs Modules Modules for Internationalization ================================ mule-canna.c mule-ccl.c mule-charset.c mule-charset.h file-coding.c file-coding.h mule-mcpath.c mule-mcpath.h mule-wnnfns.c mule.c These files implement the MULE (Asian-language) support. Note that MULE actually provides a general interface for all sorts of languages, not just Asian languages (although they are generally the most complicated to support). This code is still in beta. `mule-charset.*' and `file-coding.*' provide the heart of the XEmacs MULE support. `mule-charset.*' implements the "charset" Lisp object type, which encapsulates a character set (an ordered one- or two-dimensional set of characters, such as US ASCII or JISX0208 Japanese Kanji). `file-coding.*' implements the "coding-system" Lisp object type, which encapsulates a method of converting between different encodings. An encoding is a representation of a stream of characters, possibly from multiple character sets, using a stream of bytes or words, and defines (e.g.) which escape sequences are used to specify particular character sets, how the indices for a character are converted into bytes (sometimes this involves setting the high bit; sometimes complicated rearranging of the values takes place, as in the Shift-JIS encoding), etc. `mule-ccl.c' provides the CCL (Code Conversion Language) interpreter. CCL is similar in spirit to Lisp byte code and is used to implement converters for custom encodings. 
`mule-canna.c' and `mule-wnnfns.c' implement interfaces to external programs used to implement the Canna and WNN input methods, respectively. This is currently in beta. `mule-mcpath.c' provides some functions to allow for pathnames containing extended characters. This code is fragmentary, obsolete, and completely non-working. Instead, PATHNAME-CODING-SYSTEM is used to specify conversions of names of files and directories. The standard C I/O functions like `open()' are wrapped so that conversion occurs automatically. `mule.c' provides a few miscellaneous things that should probably be elsewhere. intl.c This provides some miscellaneous internationalization code for implementing message translation and interfacing to the Ximp input method. None of this code is currently working. iso-wide.h This contains leftover code from an earlier implementation of Asian-language support, and is not currently used.  File: internals.info, Node: Allocation of Objects in XEmacs Lisp, Next: Dumping, Prev: A Summary of the Various XEmacs Modules, Up: Top Allocation of Objects in XEmacs Lisp ************************************ * Menu: * Introduction to Allocation:: * Garbage Collection:: * GCPROing:: * Garbage Collection - Step by Step:: * Integers and Characters:: * Allocation from Frob Blocks:: * lrecords:: * Low-level allocation:: * Cons:: * Vector:: * Bit Vector:: * Symbol:: * Marker:: * String:: * Compiled Function::  File: internals.info, Node: Introduction to Allocation, Next: Garbage Collection, Prev: Allocation of Objects in XEmacs Lisp, Up: Allocation of Objects in XEmacs Lisp Introduction to Allocation ========================== Emacs Lisp, like all Lisps, has garbage collection. This means that the programmer never has to explicitly free (destroy) an object; it happens automatically when the object becomes inaccessible. Most experts agree that garbage collection is a necessity in a modern, high-level language. 
Its omission from C stems from the fact that C was originally designed to be a nice abstract layer on top of assembly language, for writing kernels and basic system utilities rather than large applications. Lisp objects can be created by any of a number of Lisp primitives. Most object types have one or a small number of basic primitives for creating objects. For conses, the basic primitive is `cons'; for vectors, the primitives are `make-vector' and `vector'; for symbols, the primitives are `make-symbol' and `intern'; etc. Some Lisp objects, especially those that are primarily used internally, have no corresponding Lisp primitives. Every Lisp object, though, has at least one C primitive for creating it. Recall from section (VII) that a Lisp object, as stored in a 32-bit or 64-bit word, has a few tag bits, and a "value" that occupies the remainder of the bits. We can separate the different Lisp object types into three broad categories: * (a) Those for whom the value directly represents the contents of the Lisp object. Only two types are in this category: integers and characters. No special allocation or garbage collection is necessary for such objects. Lisp objects of these types do not need to be `GCPRO'ed. In the remaining two categories, the type is stored in the object itself. The tag for all such objects is the generic "lrecord" (Lisp_Type_Record) tag. The first bytes of the object's structure are an integer (actually a char) characterising the object's type and some flags, in particular the mark bit used for garbage collection. A structure describing the type is accessible thru the lrecord_implementation_table indexed with said integer. This structure includes the method pointers and a pointer to a string naming the type. * (b) Those lrecords that are allocated in frob blocks (see above). This includes the objects that are most common and relatively small, and includes conses, strings, subrs, floats, compiled functions, symbols, extents, events, and markers. 
With the cleanup of frob blocks done in 19.12, it's not terribly hard to add more objects to this category, but it's a bit trickier than adding an object type to type (c) (especially if the object needs a finalization method), and is not likely to save much space unless the object is small and there are many of them. (In fact, if there are very few of them, it might actually waste space.)

   * (c) Those lrecords that are individually `malloc()'ed. These are called "lcrecords". All other types are in this category. Adding a new type to this category is comparatively easy, and all types added since 19.8 (when the current allocation scheme was devised, by Richard Mlynarik), with the exception of the character type, have been in this category.

Note that bit vectors are a bit of a special case. They are simple lrecords as in category (b), but are individually `malloc()'ed like vectors. You can basically view them as exactly like vectors except that their type is stored in lrecord fashion rather than in directly-tagged fashion.

File: internals.info,  Node: Garbage Collection,  Next: GCPROing,  Prev: Introduction to Allocation,  Up: Allocation of Objects in XEmacs Lisp

Garbage Collection
==================

Garbage collection is simple in theory but tricky to implement. Emacs Lisp uses the oldest garbage collection method, called "mark and sweep". Garbage collection begins by starting with all accessible locations (i.e. all variables and other slots where Lisp objects might occur) and recursively traversing all objects accessible from those slots, marking each one that is found. We then go through all of memory, freeing each object that is not marked and unmarking each object that is marked. Note that "all of memory" means all currently allocated objects. Traversing all these objects means traversing all frob blocks, all vectors (which are chained in one big list), and all lcrecords (which are likewise chained).
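The mark-and-sweep cycle described above can be sketched in a few lines of C. This is a toy model under stated assumptions, not the XEmacs collector: every object is a cons-like cell with two references, all objects are chained in a single allocation list (loosely like the chaining of vectors and lcrecords), and the roots are passed in as an explicit array. All names (`toy_obj', `toy_alloc', `toy_mark', `toy_sweep', `toy_gc') are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_obj
{
  int marked;                   /* mark bit, set during the mark phase */
  struct toy_obj *car, *cdr;    /* references to other objects, or NULL */
  struct toy_obj *next_alloc;   /* chain of all allocated objects */
};

static struct toy_obj *all_objects;   /* head of the allocation chain */
static size_t live_count;             /* number of objects not yet freed */

static struct toy_obj *
toy_alloc (struct toy_obj *car, struct toy_obj *cdr)
{
  struct toy_obj *o = malloc (sizeof *o);

  assert (o != NULL);
  o->marked = 0;
  o->car = car;
  o->cdr = cdr;
  o->next_alloc = all_objects;
  all_objects = o;
  live_count++;
  return o;
}

/* Mark phase: mark everything reachable from O.  Recurses on the car
   and iterates on the cdr; the mark bit also stops cycles. */
static void
toy_mark (struct toy_obj *o)
{
  while (o != NULL && !o->marked)
    {
      o->marked = 1;
      toy_mark (o->car);
      o = o->cdr;
    }
}

/* Sweep phase: free unmarked objects, unmark the survivors. */
static void
toy_sweep (void)
{
  struct toy_obj **link = &all_objects;

  while (*link != NULL)
    {
      struct toy_obj *o = *link;

      if (o->marked)
        {
          o->marked = 0;
          link = &o->next_alloc;
        }
      else
        {
          *link = o->next_alloc;
          free (o);
          live_count--;
        }
    }
}

/* One full collection, starting from an explicit array of roots. */
static void
toy_gc (struct toy_obj **roots, size_t nroots)
{
  size_t i;

  for (i = 0; i < nroots; i++)
    toy_mark (roots[i]);
  toy_sweep ();
}
```

After `toy_gc' returns, only objects reachable from the given roots remain on the allocation chain, and all mark bits are clear again, ready for the next collection.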
Garbage collection can be invoked explicitly by calling `garbage-collect' but is also called automatically by `eval', once a certain amount of memory has been allocated since the last garbage collection (according to `gc-cons-threshold').

File: internals.info,  Node: GCPROing,  Next: Garbage Collection - Step by Step,  Prev: Garbage Collection,  Up: Allocation of Objects in XEmacs Lisp

`GCPRO'ing
==========

`GCPRO'ing is one of the ugliest and trickiest parts of Emacs internals. The basic idea is that whenever garbage collection occurs, all in-use objects must be reachable somehow or other from one of the roots of accessibility. The roots of accessibility are:

  1. All objects that have been `staticpro()'d or `staticpro_nodump()'ed. This is used for any global C variables that hold Lisp objects. A call to `staticpro()' happens implicitly as a result of any symbols declared with `defsymbol()' and any variables declared with `DEFVAR_FOO()'. You need to explicitly call `staticpro()' (in the `vars_of_foo()' method of a module) for other global C variables holding Lisp objects. (This typically includes internal lists and such things.) Use `staticpro_nodump()' only in the rare cases when you do not want the pointed-to variable to be saved at dump time but want it recomputed at startup instead.

     Note that `obarray' is one of the `staticpro()'d things. Therefore, all functions and variables get marked through this.

  2. Any shadowed bindings that are sitting on the `specpdl' stack.

  3. Any objects sitting in currently active (Lisp) stack frames, catches, and condition cases.

  4. A couple of special-case places where active objects are located.

  5. Anything currently marked with `GCPRO'.

Marking with `GCPRO' is necessary because some C functions (quite a lot, in fact) allocate objects during their operation.
Quite frequently, there will be no other pointer to the object while the function is running, and if a garbage collection occurs and the object needs to be referenced again, bad things will happen. The solution is to mark those objects with `GCPRO'. Unfortunately this is easy to forget, and there is basically no way around this problem. Here are some rules, though: 1. For every `GCPRON', there have to be declarations of `struct gcpro gcpro1, gcpro2', etc. 2. You _must_ `UNGCPRO' anything that's `GCPRO'ed, and you _must not_ `UNGCPRO' if you haven't `GCPRO'ed. Getting either of these wrong will lead to crashes, often in completely random places unrelated to where the problem lies. 3. The way this actually works is that all currently active `GCPRO's are chained through the `struct gcpro' local variables, with the variable `gcprolist' pointing to the head of the list and the nth local `gcpro' variable pointing to the first `gcpro' variable in the next enclosing stack frame. Each `GCPRO'ed thing is an lvalue, and the `struct gcpro' local variable contains a pointer to this lvalue. This is why things will mess up badly if you don't pair up the `GCPRO's and `UNGCPRO's--you will end up with `gcprolist's containing pointers to `struct gcpro's or local `Lisp_Object' variables in no-longer-active stack frames. 4. It is actually possible for a single `struct gcpro' to protect a contiguous array of any number of values, rather than just a single lvalue. To effect this, call `GCPRON' as usual on the first object in the array and then set `gcproN.nvars'. 5. *Strings are relocated.* What this means in practice is that the pointer obtained using `XSTRING_DATA()' is liable to change at any time, and you should never keep it around past any function call, or pass it as an argument to any function that might cause a garbage collection. 
     This is why a number of functions accept either a "non-relocatable" `char *' pointer or a relocatable Lisp string, and only access the Lisp string's data at the very last minute. In some cases, you may end up having to `alloca()' some space and copy the string's data into it.

  6. By convention, if you have to nest `GCPRO's, use `NGCPRON' (along with `struct gcpro ngcpro1, ngcpro2', etc.), `NNGCPRON', etc. This avoids compiler warnings about shadowed locals.

  7. It is _always_ better to err on the side of extra `GCPRO's rather than too few. The extra cycles spent on this are almost never going to make a whit of difference in the speed of anything.

  8. The general rule to follow is that caller, not callee, `GCPRO's. That is, you should not have to explicitly `GCPRO' any Lisp objects that are passed in as parameters.

     One exception to this rule is if you ever plan to change the parameter value, and store a new object in it. In that case, you _must_ `GCPRO' the parameter, because otherwise the new object will not be protected.

     So, if you create any Lisp objects (remember, this happens in all sorts of circumstances, e.g. with `Fcons()', etc.), you are responsible for `GCPRO'ing them, unless you are _absolutely sure_ that there's no possibility that a garbage collection can occur while you need to use the object. Even then, consider `GCPRO'ing.

  9. A garbage collection can occur whenever anything calls `Feval', or whenever a QUIT can occur where execution can continue past this. (Remember, this is almost anywhere.)

 10. If you have the _least smidgeon of doubt_ about whether you need to `GCPRO', you should `GCPRO'.

 11. Beware of `GCPRO'ing something that is uninitialized. If you have any shade of doubt about this, initialize all your variables to `Qnil'.

 12. Be careful of traps, like calling `Fcons()' in the argument to another function. By the "caller protects" law, you should be `GCPRO'ing the newly-created cons, but you aren't.
     A certain number of functions that are commonly called on freshly created stuff (e.g. `nconc2()', `Fsignal()') break the "caller protects" law and go ahead and `GCPRO' their arguments so as to simplify things, but make sure to check whether it's OK whenever doing something like this.

 13. Once again, remember to `GCPRO'! Bugs resulting from insufficient `GCPRO'ing are intermittent and extremely difficult to track down, often showing up in crashes inside of `garbage-collect' or in weirdly corrupted objects or even in incorrect values in a totally different section of code.

Given the extremely error-prone nature of the `GCPRO' scheme, and the difficulty of tracking down the resulting bugs, it should be considered a deficiency in the XEmacs code. A solution to this problem would involve implementing so-called "conservative" garbage collection for the C stack. That involves looking through all of stack memory and treating anything that looks like a reference to an object as a reference. This will result in a few objects not getting collected when they should, but it obviates the need for `GCPRO'ing, and allows garbage collection to happen at any point at all, such as during object allocation.


File: internals.info, Node: Garbage Collection - Step by Step, Next: Integers and Characters, Prev: GCPROing, Up: Allocation of Objects in XEmacs Lisp

Garbage Collection - Step by Step
=================================

* Menu:

* Invocation::
* garbage_collect_1::
* mark_object::
* gc_sweep::
* sweep_lcrecords_1::
* compact_string_chars::
* sweep_strings::
* sweep_bit_vectors_1::


File: internals.info, Node: Invocation, Next: garbage_collect_1, Prev: Garbage Collection - Step by Step, Up: Garbage Collection - Step by Step

Invocation
----------

The first thing that anyone should know about garbage collection is when and how the garbage collector is invoked. One might think that this could happen every time new memory is allocated, e.g. when new objects are created, but this is _not_ the case.
Instead, we have the following situation:

The entry point of any process of garbage collection is an invocation of the function `garbage_collect_1' in file `alloc.c'. The invocation can occur _explicitly_ by calling the function `Fgarbage_collect' (which additionally reports information about the freed memory), or can occur _implicitly_ in four different situations:

  1. In function `main_1' in file `emacs.c'. This function is called at each startup of xemacs. The garbage collection is invoked after all initial creations are completed, but only if the special internal error-checking constant `ERROR_CHECK_GC' is defined.

  2. In function `disksave_object_finalization' in file `alloc.c'. The only purpose of this function is to clear from memory the objects which need not be stored with xemacs when we dump out an executable. This is only done by `Fdump_emacs' or by `Fdump_emacs_data' (both in `emacs.c'). The actual clearing is accomplished by making these objects unreachable and starting a garbage collection. The function is only used while building xemacs.

  3. In function `Feval / eval' in file `eval.c'. Each time the well-known and often-used function eval is called to evaluate a form, one of the first things that can happen is a call to `garbage_collect_1'. There exist three global variables, `consing_since_gc' (counts the created cons-cells since the last garbage collection), `gc_cons_threshold' (a specified threshold after which a garbage collection occurs) and `always_gc'. If `always_gc' is set or if the threshold is exceeded, the garbage collection will start.

  4. In function `Ffuncall / funcall' in file `eval.c'. This function evaluates calls of elisp functions and behaves like `Feval' in this respect.

The upshot is that garbage collection can basically occur wherever `Feval' or `Ffuncall' is used - either directly or through another function.
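The threshold test described for `Feval' above can be sketched as follows. This is only an illustration under stated assumptions, not the code from `eval.c': the three globals carry the names used in the text, while `garbage_collect_1_stub', `note_allocation', `eval_entry_check', `gc_runs', `gc_trigger_demo' and the threshold value are hypothetical names and numbers introduced for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the three globals named in the text. */
static size_t consing_since_gc = 0;      /* allocation since the last GC */
static size_t gc_cons_threshold = 1000;  /* arbitrary value for this sketch */
static bool always_gc = false;

static int gc_runs = 0;  /* hypothetical: counts simulated collections */

/* Hypothetical stand-in for garbage_collect_1(); like the real
   function, it resets the allocation counter when it finishes. */
static void garbage_collect_1_stub(void)
{
  gc_runs++;
  consing_since_gc = 0;
}

/* Hypothetical allocator hook: allocating only bumps the counter
   and never starts a collection by itself. */
static void note_allocation(size_t amount)
{
  consing_since_gc += amount;
}

/* Hypothetical check made on entry to the evaluator. */
static void eval_entry_check(void)
{
  if (always_gc || consing_since_gc > gc_cons_threshold)
    garbage_collect_1_stub();
}

/* Usage: returns the number of simulated collections. */
int gc_trigger_demo(void)
{
  note_allocation(600);
  eval_entry_check();
  assert(gc_runs == 0);   /* still below the threshold */

  note_allocation(600);
  assert(gc_runs == 0);   /* allocation alone never collects */
  eval_entry_check();
  assert(gc_runs == 1);   /* threshold exceeded at the next check */

  always_gc = true;
  eval_entry_check();
  assert(gc_runs == 2);   /* always_gc forces a collection */
  always_gc = false;
  return gc_runs;
}
```

The point of the sketch is that the collector runs at evaluator entry points rather than inside the allocator, which is why C code only has to worry about collection across calls that can reach `Feval' or `Ffuncall'.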
Since calls to these two functions are hidden in various other functions, many calls to `garbage_collect_1' are not obviously foreseeable, and therefore unexpected. Instances where they are used that are worth remembering are various elisp commands, such as `or', `and', `if', `cond', `while', `setq', etc., miscellaneous `gui_item_...' functions, everything related to `eval' (`Feval_buffer', `call0', ...) and inside `Fsignal'. The latter is used to handle signals, such as the ones raised by every `QUIT'-macro triggered after pressing Ctrl-g.


File: internals.info, Node: garbage_collect_1, Next: mark_object, Prev: Invocation, Up: Garbage Collection - Step by Step

`garbage_collect_1'
-------------------

We can now describe exactly what happens after the invocation takes place.

  1. There are several cases in which the garbage collector returns immediately: when we are already garbage collecting (`gc_in_progress'), when the garbage collection is somehow forbidden (`gc_currently_forbidden'), when we are currently displaying something (`in_display') or when we are preparing for the armageddon of the whole system (`preparing_for_armageddon').

  2. Next the correct frame in which to put all the output occurring during garbage collecting is determined. In order to be able to restore the old display's state after displaying the message, some data about the current cursor position has to be saved. The variables `pre_gc_cursor' and `cursor_changed' take care of that.

  3. The state of `gc_currently_forbidden' must be restored after the garbage collection, no matter what happens during the process. We accomplish this by `record_unwind_protect'ing the suitable function `restore_gc_inhibit' together with the current value of `gc_currently_forbidden'.

  4. If we are currently running an interactive xemacs session, the next step is simply to show the garbage collector's cursor/message.

  5. The following steps are the intrinsic steps of the garbage collector, therefore `gc_in_progress' is set.

  6. For debugging purposes, it is possible to copy the current C stack frame. However, this seems to be a currently unused feature.

  7. Before actually starting to go over all live objects, references to objects that are no longer used are pruned. We only have to do this for events (`clear_event_resource') and for specifiers (`cleanup_specifiers').

  8. Now the mark phase begins and marks all accessible elements. In order to start from all slots that serve as roots of accessibility, the function `mark_object' is called for each root individually to go out from there to mark all reachable objects. The roots are listed here in the order in which they are processed:

        * all constant symbols and static variables that are registered via `staticpro' in the dynarr `staticpros'. *Note Adding Global Lisp Variables::.

        * all Lisp objects that are created in C functions and that must be protected from being freed. They are registered in the global list `gcprolist'. *Note GCPROing::.

        * all local variables (i.e. their name fields `symbol' and old values `old_values') that are bound during the evaluation by the Lisp engine. They are stored in `specbinding' structs pushed on a stack called `specpdl'. *Note Dynamic Binding; The specbinding Stack; Unwind-Protects::.

        * all catch blocks that the Lisp engine encounters during the evaluation cause the creation of structs `catchtag' inserted in the list `catchlist'. Their tag (`tag') and value (`val') fields are freshly created objects and therefore have to be marked. *Note Catch and Throw::.

        * every function application pushes new structs `backtrace' on the call stack of the Lisp engine (`backtrace_list'). The unique parts that have to be marked are the fields for each function (`function') and all their arguments (`args'). *Note Evaluation::.
        * all objects that are used by the redisplay engine that must not be freed are marked by a special function called `mark_redisplay' (in `redisplay.c').

        * all objects created for profiling purposes are allocated by C functions instead of using the lisp allocation mechanisms. In order to preserve them during the sweep phase, they also have to be marked manually. That is done by the function `mark_profiling_info'.

  9. Hash tables in XEmacs belong to a kind of special objects that make use of a concept often called 'weak pointers'. To make a long story short, pointers of this kind are not followed when the live objects are determined during garbage collection. Any object referenced only by weak pointers is collected anyway, and the reference to it is cleared. In hash tables there are different usage patterns of them, manifesting in different types of hash tables, namely 'non-weak', 'weak', 'key-weak' and 'value-weak' (internally also 'key-car-weak' and 'value-car-weak') hash tables, each clearing entries depending on different conditions. More information can be found in the documentation of the function `make-hash-table'.

     Because there are complicated dependency rules about when and what to mark while processing weak hash tables, the standard `marker' method is only active if it is marking non-weak hash tables. As soon as a weak component is in the table, the hash table entries are ignored while marking. Instead, they are marked separately by the function `finish_marking_weak_hash_tables'. This function iterates over each hash table entry `hentries' for each weak hash table in `Vall_weak_hash_tables'. Depending on the type of a table, the appropriate action is performed. If a table is acting as `HASH_TABLE_KEY_WEAK', and a key is already marked, everything reachable from the `value' component is marked. If it is acting as a `HASH_TABLE_VALUE_WEAK' and the value component is already marked, marking proceeds from the `key' component only.
     If it is a `HASH_TABLE_KEY_CAR_WEAK' and the car of the key entry is already marked, we mark both the `key' and `value' components. Finally, if the table is of the type `HASH_TABLE_VALUE_CAR_WEAK' and the car of the value component is already marked, again both the `key' and the `value' components get marked.

     There are also lists with comparable properties, called weak lists. They come in several types, called `simple', `assoc', `key-assoc' and `value-assoc'. You can find further details about them in the description of the function `make-weak-list'. The scheme of their marking is similar: all weak lists are listed in `Vall_weak_lists', therefore we iterate over them. The marking is advanced until we hit an already marked pair. Then we know that during a former run all the rest has been marked completely. Again, what is done depends on the type of the weak list. If it is a `WEAK_LIST_SIMPLE' and the elem is marked, we mark the `cons' part. If it is a `WEAK_LIST_ASSOC' and not a pair or a pair with both marked car and cdr, we mark the `cons' and the `elem'. If it is a `WEAK_LIST_KEY_ASSOC' and not a pair or a pair with a marked car of the elem, we mark the `cons' and the `elem'. Finally, if it is a `WEAK_LIST_VALUE_ASSOC' and not a pair or a pair with a marked cdr of the elem, we mark both the `cons' and the `elem'.

     Marking objects reachable from weak hash tables and weak lists can cause other objects to become marked, which may in turn require further marking of other weak objects. Both finishing functions are therefore repeated for as long as previously unmarked objects get freshly marked.

 10. After completing the special marking for the weak hash tables and for the weak lists, all entries that point to objects that are going to be swept in the further process are useless, and therefore have to be removed from the table or the list.

     The function `prune_weak_hash_tables' does the job for weak hash tables.
     Totally unmarked hash tables are removed from the list `Vall_weak_hash_tables'. The other ones are treated more carefully by scanning over all entries and removing an entry as soon as one of its components `key' and `value' is unmarked.

     The same idea applies to the weak lists. It is accomplished by `prune_weak_lists': An unmarked list is pruned from `Vall_weak_lists' immediately. A marked list is treated more carefully by going over it and removing just the unmarked pairs.

 11. The function `prune_specifiers' checks all listed specifiers held in `Vall_specifiers' and removes the ones from the lists that are unmarked.

 12. All syntax tables are stored in a list called `Vall_syntax_tables'. The function `prune_syntax_tables' walks through it and unlinks the tables that are unmarked.

 13. Next comes the actual sweeping, which is carried out by the function `gc_sweep' and accounts for the bulk of the work.

 14. Afterwards, the variables related to garbage collection are reset. `consing_since_gc' - the counter of the created cells since the last garbage collection - is set back to 0, and `gc_in_progress' is cleared.

 15. In case the session is interactive, the displayed cursor and message are removed again.

 16. The state of `gc_currently_forbidden' is restored to its former value by unwinding the stack.

 17. A small memory reserve is always held back that can be reached by `breathing_space'. If nothing more is left, we create a new reserve and exit.


File: internals.info, Node: mark_object, Next: gc_sweep, Prev: garbage_collect_1, Up: Garbage Collection - Step by Step

`mark_object'
-------------

The first thing that is checked while marking an object is whether the object is a real Lisp object (`Lisp_Type_Record') or just an integer or a character. Integers and characters are the only two types that are stored directly - without another level of indirection, and therefore they don't have to be marked and collected. *Note How Lisp Objects Are Represented in C::.
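The distinction between directly stored values and real Lisp objects can be illustrated with a small tagged-word sketch. The layout here (two low tag bits, the specific tag values) and all `toy_' names are assumptions made for this illustration only; the layout XEmacs actually uses is the one described in the node referenced above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tagged word: the low two bits say what the rest means. */
typedef uintptr_t toy_lisp_object;

enum toy_tag {
  TOY_TAG_RECORD = 0,  /* word is a pointer to a heap object */
  TOY_TAG_INT    = 1,  /* word holds an integer directly */
  TOY_TAG_CHAR   = 2   /* word holds a character directly */
};

#define TOY_TAG_BITS 2
#define TOY_TAG_MASK ((uintptr_t) 3)

static toy_lisp_object toy_make_int(intptr_t n)
{
  return ((uintptr_t) n << TOY_TAG_BITS) | TOY_TAG_INT;
}

static toy_lisp_object toy_make_char(uint32_t c)
{
  return ((uintptr_t) c << TOY_TAG_BITS) | TOY_TAG_CHAR;
}

static toy_lisp_object toy_make_record(void *p)
{
  /* The pointer must be aligned so that its tag bits are free. */
  assert(((uintptr_t) p & TOY_TAG_MASK) == 0);
  return (uintptr_t) p;
}

/* Only pointers to heap objects have anything to mark; integers and
   characters live entirely inside the word itself. */
static bool toy_needs_marking(toy_lisp_object obj)
{
  return (obj & TOY_TAG_MASK) == TOY_TAG_RECORD;
}
```

With such a representation the first test in a marking routine is a single mask-and-compare, and integers and characters never reach the rest of the marking logic.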
The second case is the one we have to handle. It is the one when we are dealing with a pointer to a Lisp object. However, there are three cases in which nothing is done while marking: the object is read-only, which exempts it from garbage collection and hence from marking (`C_READONLY_RECORD_HEADER'); the object in question is already marked, and need not be marked a second time (checked by `MARKED_RECORD_HEADER_P'); or it is a special, unmarkable object (`UNMARKABLE_RECORD_HEADER_P'; apparently, these are objects that sit in some const space, and can therefore not be marked, see `this_one_is_unmarkable' in `alloc.c').

Now, the actual marking is feasible. We do so by first using the macro `MARK_RECORD_HEADER' to mark the object itself (actually the special flag in the lrecord header), and then calling its special marker "method" `marker' if available. The marker method marks every other object that is in reach from our current object. Note that these marker methods should not call `mark_object' recursively, but instead should return the next object from where further marking has to be performed.

In case another object was returned, as mentioned before, we reiterate the whole `mark_object' process beginning with this next object.
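The control flow just described can be sketched with a toy object model. This is a simplified illustration, not the code from `alloc.c': the struct, its fields and every `toy_' name are hypothetical, each object has at most one outgoing reference, and a null pointer stands in for "no further object to mark".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_object;

/* A marker "method" returns the next object to continue with. */
typedef struct toy_object *(*toy_marker)(struct toy_object *);

struct toy_object {
  bool readonly;            /* plays the role of the read-only check */
  bool unmarkable;          /* plays the role of the unmarkable check */
  bool marked;              /* plays the role of the mark flag */
  toy_marker marker;        /* may be NULL: nothing else to mark */
  struct toy_object *next;  /* the single outgoing reference */
};

/* Marker for chain-like objects: hands back the next object
   instead of calling toy_mark_object() recursively. */
static struct toy_object *toy_chain_marker(struct toy_object *obj)
{
  return obj->next;
}

void toy_mark_object(struct toy_object *obj)
{
  while (obj != NULL) {
    /* The three cases in which nothing is done. */
    if (obj->readonly || obj->marked || obj->unmarkable)
      return;

    obj->marked = true;          /* mark the object itself */
    if (obj->marker == NULL)
      return;
    obj = obj->marker(obj);      /* iterate with the returned object */
  }
}

/* Usage: a three-element cycle is fully marked and the loop ends,
   because the already-marked check stops the second visit. */
int toy_demo_cycle(void)
{
  struct toy_object c = { false, false, false, toy_chain_marker, NULL };
  struct toy_object b = { false, false, false, toy_chain_marker, &c };
  struct toy_object a = { false, false, false, toy_chain_marker, &b };
  c.next = &a;

  toy_mark_object(&a);
  assert(a.marked && b.marked && c.marked);
  return a.marked + b.marked + c.marked;
}
```

Returning the next object rather than recursing keeps the C stack depth constant along long chains, which is the reason given in the text for the iteration.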