This is Info file ../../info/internals.info, produced by Makeinfo version 1.68 from the input file internals.texi. INFO-DIR-SECTION XEmacs Editor START-INFO-DIR-ENTRY * Internals: (internals). XEmacs Internals Manual. END-INFO-DIR-ENTRY Copyright (C) 1992 - 1996 Ben Wing. Copyright (C) 1996, 1997 Sun Microsystems. Copyright (C) 1994 - 1998 Free Software Foundation. Copyright (C) 1994, 1995 Board of Trustees, University of Illinois. Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies. Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the Foundation. Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided also that the section entitled "GNU General Public License" is included exactly as in the original, and provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that the section entitled "GNU General Public License" may be included in a translation approved by the Free Software Foundation instead of in the original English.  
File: internals.info, Node: Working With Character and Byte Positions, Next: Conversion to and from External Data, Prev: Character-Related Data Types, Up: Coding for Mule

Working With Character and Byte Positions
-----------------------------------------

   Now that we have defined the basic character-related types, we can
look at the macros and functions designed for work with them and for
conversion between them.  Most of these macros are defined in
`buffer.h', and we don't discuss all of them here, but only the most
important ones.  Examining the existing code is the best way to learn
about them.

`MAX_EMCHAR_LEN'
     This preprocessor constant is the maximum number of buffer bytes
     per Emacs character, i.e. the byte length of an `Emchar'.  It is
     useful when allocating temporary strings to keep a known number of
     characters.  For instance:

          {
            Charcount cclen;
            ...
            {
              /* Allocate place for CCLEN characters. */
              Bufbyte *buf = (Bufbyte *)alloca (cclen * MAX_EMCHAR_LEN);
          ...

     If you followed the previous section, you can guess that,
     logically, multiplying a `Charcount' value with `MAX_EMCHAR_LEN'
     produces a `Bytecount' value.

     In the current Mule implementation, `MAX_EMCHAR_LEN' equals 4.
     Without Mule, it is 1.

`charptr_emchar'
`set_charptr_emchar'
     The `charptr_emchar' macro takes a `Bufbyte' pointer and returns
     the `Emchar' stored at that position.  If it were a function, its
     prototype would be:

          Emchar charptr_emchar (Bufbyte *p);

     `set_charptr_emchar' stores an `Emchar' to the specified byte
     position.  It returns the number of bytes stored:

          Bytecount set_charptr_emchar (Bufbyte *p, Emchar c);

     It is important to note that `set_charptr_emchar' is safe only for
     appending a character at the end of a buffer, not for overwriting a
     character in the middle.  This is because the width of characters
     varies, and `set_charptr_emchar' cannot resize the string if it
     writes, say, a two-byte character where a single-byte character
     used to reside.
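     The append-only restriction can be illustrated with a small
     self-contained sketch.  Everything in it is a hypothetical stand-in:
     `ToyByte', `ToyChar', `TOY_MAX_CHAR_LEN', `toy_set_charptr' and
     `toy_append_all' are not XEmacs names, and UTF-8 is used only as a
     convenient variable-width encoding; the real Mule internal encoding
     is different.  What carries over is the pattern: reserve the worst
     case, then advance the pointer by whatever each store returns.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT the XEmacs definitions. */
typedef unsigned char ToyByte;   /* plays the role of Bufbyte */
typedef unsigned long ToyChar;   /* plays the role of Emchar */
#define TOY_MAX_CHAR_LEN 4       /* worst-case bytes per character */

/* Store C at P and return the number of bytes written.  UTF-8 is an
   assumption made for this sketch only. */
static size_t
toy_set_charptr (ToyByte *p, ToyChar c)
{
  assert (c <= 0x10FFFFUL);
  if (c < 0x80UL)
    {
      p[0] = (ToyByte) c;
      return 1;
    }
  if (c < 0x800UL)
    {
      p[0] = (ToyByte) (0xC0 | (c >> 6));
      p[1] = (ToyByte) (0x80 | (c & 0x3F));
      return 2;
    }
  if (c < 0x10000UL)
    {
      p[0] = (ToyByte) (0xE0 | (c >> 12));
      p[1] = (ToyByte) (0x80 | ((c >> 6) & 0x3F));
      p[2] = (ToyByte) (0x80 | (c & 0x3F));
      return 3;
    }
  p[0] = (ToyByte) (0xF0 | (c >> 18));
  p[1] = (ToyByte) (0x80 | ((c >> 12) & 0x3F));
  p[2] = (ToyByte) (0x80 | ((c >> 6) & 0x3F));
  p[3] = (ToyByte) (0x80 | (c & 0x3F));
  return 4;
}

/* Append N characters to OUT, which must have room for
   N * TOY_MAX_CHAR_LEN bytes.  Returns the number of bytes used. */
static size_t
toy_append_all (ToyByte *out, const ToyChar *chars, size_t n)
{
  ToyByte *p = out;
  size_t i;

  for (i = 0; i < n; i++)
    p += toy_set_charptr (p, chars[i]);   /* store and advance */
  return (size_t) (p - out);
}
```

     Because the three characters `a', U+00E9 and U+20AC occupy one, two
     and three bytes in this toy encoding, storing a wider character over
     a narrower one in the middle of the string would clobber the bytes
     of its neighbor; appending never has that problem.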
     A typical use of `set_charptr_emchar' can be demonstrated by this
     example, which copies characters from buffer BUF to a temporary
     string of Bufbytes.  P is assumed to point into that temporary
     string.

          {
            Bufpos pos;
            for (pos = beg; pos < end; pos++)
              {
                Emchar c = BUF_FETCH_CHAR (buf, pos);
                p += set_charptr_emchar (p, c);
              }
          }

     Note how `set_charptr_emchar' is used to store the `Emchar' and
     advance the pointer P at the same time.

`INC_CHARPTR'
`DEC_CHARPTR'
     These two macros increment and decrement a `Bufbyte' pointer,
     respectively.  They will adjust the pointer by the appropriate
     number of bytes according to the byte length of the character
     stored there.  Both macros assume that the memory address is
     located at the beginning of a valid character.

     Without Mule support, `INC_CHARPTR (p)' and `DEC_CHARPTR (p)'
     simply expand to `p++' and `p--', respectively.

`bytecount_to_charcount'
     Given a pointer to a text string and a length in bytes, return the
     equivalent length in characters.

          Charcount bytecount_to_charcount (Bufbyte *p, Bytecount bc);

`charcount_to_bytecount'
     Given a pointer to a text string and a length in characters,
     return the equivalent length in bytes.

          Bytecount charcount_to_bytecount (Bufbyte *p, Charcount cc);

`charptr_n_addr'
     Return a pointer to the beginning of the character offset CC (in
     characters) from P.

          Bufbyte *charptr_n_addr (Bufbyte *p, Charcount cc);

File: internals.info, Node: Conversion to and from External Data, Next: General Guidelines for Writing Mule-Aware Code, Prev: Working With Character and Byte Positions, Up: Coding for Mule

Conversion to and from External Data
------------------------------------

   When an external function, such as a C library function, returns a
`char' pointer, you should almost never treat it as `Bufbyte'.  This is
because these returned strings may contain 8-bit characters which can
be misinterpreted by XEmacs, and cause a crash.
Likewise, when exporting a piece of internal text to the outside world,
you should always convert it to an appropriate external encoding, lest
the internal stuff (such as the infamous \201 characters) leak out.

   The interface to conversion between the internal and external
representations of text is the set of conversion macros defined in
`buffer.h'.  Before looking at them, we'll look at the external formats
supported by these macros.

   Currently meaningful formats are `FORMAT_BINARY', `FORMAT_FILENAME',
`FORMAT_OS', and `FORMAT_CTEXT'.  Here is a description of these.

`FORMAT_BINARY'
     Binary format.  This is the simplest format and is what we use in
     the absence of a more appropriate format.  This converts according
     to the `binary' coding system:

       a. On input, bytes 0-255 are converted into characters 0-255.

       b. On output, characters 0-255 are converted into bytes 0-255
          and other characters are converted into `X'.

`FORMAT_FILENAME'
     Format used for filenames.  In the original Mule, this is
     user-definable with the `pathname-coding-system' variable.  For
     the moment, we just use the `binary' coding system.

`FORMAT_OS'
     Format used for the external Unix environment--`argv[]', stuff
     from `getenv()', stuff from the `/etc/passwd' file, etc.

     Perhaps should be the same as FORMAT_FILENAME.

`FORMAT_CTEXT'
     Compound-text format.  This is the standard X format used for data
     stored in properties, selections, and the like.  This is an 8-bit
     no-lock-shift ISO2022 coding system.

   The macros to convert between these formats and the internal format,
and vice versa, follow.

`GET_CHARPTR_INT_DATA_ALLOCA'
`GET_CHARPTR_EXT_DATA_ALLOCA'
     These two are the most basic conversion macros.
     `GET_CHARPTR_INT_DATA_ALLOCA' converts external data to internal
     format, and `GET_CHARPTR_EXT_DATA_ALLOCA' converts the other way
     around.
     The arguments each of these receives are PTR (pointer to the text
     in external format), LEN (length of the text in bytes), FMT
     (format of the external text), PTR_OUT (lvalue to which new text
     should be copied), and LEN_OUT (lvalue which will be assigned the
     length of the internal text in bytes).  The resulting text is
     stored to a stack-allocated buffer.  If the text doesn't need
     changing, these macros will do nothing, except for setting LEN_OUT.

     The macros above take many arguments, which makes them unwieldy.
     For this reason, a number of convenience macros are defined with
     obvious functionality, but accepting fewer arguments.  The general
     rule is that macros with `INT' in their name convert text to
     internal Emacs representation, whereas the `EXT' macros convert to
     external representation.

`GET_C_CHARPTR_INT_DATA_ALLOCA'
`GET_C_CHARPTR_EXT_DATA_ALLOCA'
     As their names imply, these macros work on C char pointers, which
     are zero-terminated, and thus do not need LEN or LEN_OUT
     parameters.

`GET_STRING_EXT_DATA_ALLOCA'
`GET_C_STRING_EXT_DATA_ALLOCA'
     These two macros convert a Lisp string into an external
     representation.  The difference between them is that
     `GET_STRING_EXT_DATA_ALLOCA' stores its output to a generic
     string, providing LEN_OUT, the length of the resulting external
     string.  On the other hand, `GET_C_STRING_EXT_DATA_ALLOCA' assumes
     that the caller will be satisfied with the output string being
     zero-terminated.

     Note that for Lisp strings only one conversion direction makes
     sense.

`GET_C_CHARPTR_EXT_BINARY_DATA_ALLOCA'
`GET_CHARPTR_EXT_BINARY_DATA_ALLOCA'
`GET_STRING_BINARY_DATA_ALLOCA'
`GET_C_STRING_BINARY_DATA_ALLOCA'
`GET_C_CHARPTR_EXT_FILENAME_DATA_ALLOCA'
`...'
     These macros convert internal text to a specific external
     representation, with the external format being encoded into the
     name of the macro.  Note that the `GET_STRING_...' and
     `GET_C_STRING...' macros lack the `EXT' tag, because they only
     make sense in that direction.
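   The `FORMAT_BINARY' rule described earlier in this node is simple
enough to sketch in a few lines of standalone C.  The names `ToyChar',
`toy_binary_encode' and `toy_binary_decode' are hypothetical and exist
only for this illustration; the sketch works on an array of character
codes rather than on real internal text, and it does not use or imitate
the `..._DATA_ALLOCA' macros themselves.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long ToyChar;   /* hypothetical stand-in for Emchar */

/* Output direction: characters 0-255 become the same byte value,
   every other character becomes `X'. */
static void
toy_binary_encode (const ToyChar *in, size_t n, unsigned char *out)
{
  size_t i;

  assert (in != NULL && out != NULL);
  for (i = 0; i < n; i++)
    out[i] = in[i] <= 255UL ? (unsigned char) in[i] : (unsigned char) 'X';
}

/* Input direction: bytes 0-255 become characters 0-255. */
static void
toy_binary_decode (const unsigned char *in, size_t n, ToyChar *out)
{
  size_t i;

  assert (in != NULL && out != NULL);
  for (i = 0; i < n; i++)
    out[i] = (ToyChar) in[i];
}
```

   Note that the output direction loses information: once a character
outside the range 0-255 has been written as `X', decoding cannot recover
it, which is why binary is described as the format of last resort.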
`GET_C_CHARPTR_INT_BINARY_DATA_ALLOCA'
`GET_CHARPTR_INT_BINARY_DATA_ALLOCA'
`GET_C_CHARPTR_INT_FILENAME_DATA_ALLOCA'
`...'
     These macros convert external text of a specific format to its
     internal representation, with the external format being encoded
     into the name of the macro.

File: internals.info, Node: General Guidelines for Writing Mule-Aware Code, Next: An Example of Mule-Aware Code, Prev: Conversion to and from External Data, Up: Coding for Mule

General Guidelines for Writing Mule-Aware Code
----------------------------------------------

   This section contains some general guidance on how to write
Mule-aware code, as well as some pitfalls you should avoid.

*Never use `char' and `char *'.*
     In XEmacs, the use of `char' and `char *' is almost always a
     mistake.  If you want to manipulate an Emacs character from C,
     use `Emchar'.  If you want to examine a specific octet in the
     internal format, use `Bufbyte'.  If you want a Lisp-visible
     character, use a `Lisp_Object' and `make_char'.  If you want a
     pointer to move through the internal text, use `Bufbyte *'.  Also
     note that you almost certainly do not need `Emchar *'.

*Be careful not to confuse `Charcount', `Bytecount', and `Bufpos'.*
     The whole point of using different types is to avoid confusion
     about the use of certain variables.  Lest this effect be
     nullified, you need to be careful about using the right types.

*Always convert external data*
     It is extremely important to always convert external data, because
     XEmacs can crash if unexpected 8-bit sequences are copied to its
     internal buffers literally.

     This means that when a system function, such as `readdir', returns
     a string, you need to convert it using one of the conversion macros
     described in the previous section, before passing it further to
     Lisp.  In the case of `readdir', you would use the
     `GET_C_CHARPTR_INT_FILENAME_DATA_ALLOCA' macro.
     Also note that many internal functions, such as `make_string',
     accept Bufbytes, which removes the need for them to convert the
     data they receive.  This increases efficiency because that way
     external data needs to be decoded only once, when it is read.
     After that, it is passed around in internal format.

File: internals.info, Node: An Example of Mule-Aware Code, Prev: General Guidelines for Writing Mule-Aware Code, Up: Coding for Mule

An Example of Mule-Aware Code
-----------------------------

   As an example of Mule-aware code, we shall analyze the `string'
function, which conses up a Lisp string from the character arguments it
receives.  Here is the definition, pasted from `alloc.c':

     DEFUN ("string", Fstring, 0, MANY, 0, /*
     Concatenate all the argument characters and make the result a string.
     */
            (int nargs, Lisp_Object *args))
     {
       Bufbyte *storage = alloca_array (Bufbyte, nargs * MAX_EMCHAR_LEN);
       Bufbyte *p = storage;

       for (; nargs; nargs--, args++)
         {
           Lisp_Object lisp_char = *args;
           CHECK_CHAR_COERCE_INT (lisp_char);
           p += set_charptr_emchar (p, XCHAR (lisp_char));
         }
       return make_string (storage, p - storage);
     }

   Now we can analyze the source line by line.

   Obviously, the string will contain as many characters as there are
arguments to the function.  This is why we allocate `MAX_EMCHAR_LEN' *
NARGS bytes on the stack, i.e. the worst-case number of bytes for NARGS
`Emchar's to fit in the string.

   Then, the loop checks that each element is a character, converting
integers in the process.  Like many other functions in XEmacs, this
function silently accepts integers where characters are expected, for
historical and compatibility reasons.  Unless you have a specific
reason to accept integers as well, `CHECK_CHAR' will suffice in your
own code.  `XCHAR (lisp_char)' extracts the `Emchar' from the
`Lisp_Object', and `set_charptr_emchar' stores it to storage,
increasing `p' in the process.

   Other instructive examples of correct coding under Mule can be found
all over the XEmacs code.
For starters, I recommend `Fnormalize_menu_item_name' in `menubar.c'.
After you have understood this section of the manual and studied the
examples, you can proceed to write new Mule-aware code.

File: internals.info, Node: Techniques for XEmacs Developers, Prev: Coding for Mule, Up: Rules When Writing New C Code

Techniques for XEmacs Developers
================================

   To make a quantified XEmacs, do: `make quantmacs'.

   You simply can't dump Quantified and Purified images.  Run the image
like so: `quantmacs -batch -l loadup.el run-temacs XEMACS-ARGS...'.

   Before you go through the trouble, are you compiling with all
debugging and error-checking off?  If not, try that first.  Be warned
that while Quantify is directly responsible for quite a few
optimizations which have been made to XEmacs, doing a run which
generates results which can be acted upon is not necessarily a trivial
task.

   Also, if you're still willing to do some runs, make sure you
configure with the `--quantify' flag.  That will keep Quantify from
starting to record data until after the loadup is completed and will
shut off recording right before it shuts down (which generates enough
bogus data to throw most results off).  It also enables three
additional elisp commands: `quantify-start-recording-data',
`quantify-stop-recording-data' and `quantify-clear-data'.

   If you want to make XEmacs faster, target your favorite slow
benchmark, run a profiler like Quantify, `gprof', or `tcov', and figure
out where the cycles are going.  Specific projects:

   * Make the garbage collector faster.  Figure out how to write an
     incremental garbage collector.

   * Write a compiler that takes bytecode and spits out C code.
     Unfortunately, you will then need a C compiler and a more fully
     developed module system.

   * Speed up redisplay.

   * Speed up syntax highlighting.  Maybe moving some of the syntax
     highlighting capabilities into C would make a difference.

   * Implement tail recursion in Emacs Lisp (hard!).
   Unfortunately, Emacs Lisp is slow, and is going to stay slow.
Function calls in elisp are especially expensive.  Iterating over a
long list is going to be 30 times faster implemented in C than in Elisp.

   To get started debugging XEmacs, take a look at the `gdbinit' and
`dbxrc' files in the `src' directory.  *Note Q2.1.15 - How to Debug an
XEmacs problem with a debugger: (xemacs-faq)Q2.1.15 - How to Debug an
XEmacs problem with a debugger.

   After making source code changes, run `make check' to ensure that
you haven't introduced any regressions.  If you're feeling ambitious,
you can try to improve the test suite in `tests/automated'.

   Here are things to know when you create a new source file:

   * All `.c' files should `#include <config.h>' first.  Almost all
     `.c' files should `#include "lisp.h"' second.

   * Generated header files should be included using the `#include
     <...>' syntax, not the `#include "..."' syntax.  The generated
     headers are:

     `config.h puresize-adjust.h sheap-adjust.h paths.h Emacs.ad.h'

     The basic rule is that you should assume builds using `--srcdir';
     the `#include <...>' syntax needs to be used when the
     to-be-included generated file is in a potentially different
     directory *at compile time*.  The non-obvious C rule is that
     `#include "..."' means to search for the included file in the same
     directory as the including file, *not* in the current directory.

   * Header files should *not* include `<config.h>' and `"lisp.h"'.  It
     is the responsibility of the `.c' files that use it to do so.

   * If the header uses `INLINE', either directly or through
     `DECLARE_LRECORD', then it must be added to `inline.c''s includes.

   * Try compiling at least once with

          gcc --with-mule --with-union-type --error-checking=all

   * Did I mention that you should run the test suite?
make check  File: internals.info, Node: A Summary of the Various XEmacs Modules, Next: Allocation of Objects in XEmacs Lisp, Prev: Rules When Writing New C Code, Up: Top A Summary of the Various XEmacs Modules *************************************** This is accurate as of XEmacs 20.0. * Menu: * Low-Level Modules:: * Basic Lisp Modules:: * Modules for Standard Editing Operations:: * Editor-Level Control Flow Modules:: * Modules for the Basic Displayable Lisp Objects:: * Modules for other Display-Related Lisp Objects:: * Modules for the Redisplay Mechanism:: * Modules for Interfacing with the File System:: * Modules for Other Aspects of the Lisp Interpreter and Object System:: * Modules for Interfacing with the Operating System:: * Modules for Interfacing with X Windows:: * Modules for Internationalization::  File: internals.info, Node: Low-Level Modules, Next: Basic Lisp Modules, Up: A Summary of the Various XEmacs Modules Low-Level Modules ================= config.h This is automatically generated from `config.h.in' based on the results of configure tests and user-selected optional features and contains preprocessor definitions specifying the nature of the environment in which XEmacs is being compiled. paths.h This is automatically generated from `paths.h.in' based on supplied configure values, and allows for non-standard installed configurations of the XEmacs directories. It's currently broken, though. emacs.c signal.c `emacs.c' contains `main()' and other code that performs the most basic environment initializations and handles shutting down the XEmacs process (this includes `kill-emacs', the normal way that XEmacs is exited; `dump-emacs', which is used during the build process to write out the XEmacs executable; `run-emacs-from-temacs', which can be used to start XEmacs directly when temacs has finished loading all the Lisp code; and emergency code to handle crashes [XEmacs tries to auto-save all files before it crashes]). 
Low-level code that directly interacts with the Unix signal mechanism, however, is in `signal.c'. Note that this code does not handle system dependencies in interfacing to signals; that is handled using the `syssignal.h' header file, described in section J below. unexaix.c unexalpha.c unexapollo.c unexconvex.c unexec.c unexelf.c unexelfsgi.c unexencap.c unexenix.c unexfreebsd.c unexfx2800.c unexhp9k3.c unexhp9k800.c unexmips.c unexnext.c unexsol2.c unexsunos4.c These modules contain code dumping out the XEmacs executable on various different systems. (This process is highly machine-specific and requires intimate knowledge of the executable format and the memory map of the process.) Only one of these modules is actually used; this is chosen by `configure'. crt0.c lastfile.c pre-crt0.c These modules are used in conjunction with the dump mechanism. On some systems, an alternative version of the C startup code (the actual code that receives control from the operating system when the process is started, and which calls `main()') is required so that the dumping process works properly; `crt0.c' provides this. `pre-crt0.c' and `lastfile.c' should be the very first and very last file linked, respectively. (Actually, this is not really true. `lastfile.c' should be after all Emacs modules whose initialized data should be made constant, and before all other Emacs files and all libraries. In particular, the allocation modules `gmalloc.c', `alloca.c', etc. are normally placed past `lastfile.c', and all of the files that implement Xt widget classes *must* be placed after `lastfile.c' because they contain various structures that must be statically initialized and into which Xt writes at various times.) `pre-crt0.c' and `lastfile.c' contain exported symbols that are used to determine the start and end of XEmacs' initialized data space when dumping. alloca.c free-hook.c getpagesize.h gmalloc.c malloc.c mem-limits.h ralloc.c vm-limit.c These handle basic C allocation of memory. 
`alloca.c' is an emulation of the stack allocation function `alloca()' on machines that lack this. (XEmacs makes extensive use of `alloca()' in its code.) `gmalloc.c' and `malloc.c' are two implementations of the standard C functions `malloc()', `realloc()' and `free()'. They are often used in place of the standard system-provided `malloc()' because they usually provide a much faster implementation, at the expense of additional memory use. `gmalloc.c' is a newer implementation that is much more memory-efficient for large allocations than `malloc.c', and should always be preferred if it works. (At one point, `gmalloc.c' didn't work on some systems where `malloc.c' worked; but this should be fixed now.) `ralloc.c' is the "relocating allocator". It provides functions similar to `malloc()', `realloc()' and `free()' that allocate memory that can be dynamically relocated in memory. The advantage of this is that allocated memory can be shuffled around to place all the free memory at the end of the heap, and the heap can then be shrunk, releasing the memory back to the operating system. The use of this can be controlled with the configure option `--rel-alloc'; if enabled, memory allocated for buffers will be relocatable, so that if a very large file is visited and the buffer is later killed, the memory can be released to the operating system. (The disadvantage of this mechanism is that it can be very slow. On systems with the `mmap()' system call, the XEmacs version of `ralloc.c' uses this to move memory around without actually having to block-copy it, which can speed things up; but it can still cause noticeable performance degradation.) `free-hook.c' contains some debugging functions for checking for invalid arguments to `free()'. `vm-limit.c' contains some functions that warn the user when memory is getting low. These are callback functions that are called by `gmalloc.c' and `malloc.c' at appropriate times. 
`getpagesize.h' provides a uniform interface for retrieving the size of a page in virtual memory. `mem-limits.h' provides a uniform interface for retrieving the total amount of available virtual memory. Both are similar in spirit to the `sys*.h' files described in section J, below. blocktype.c blocktype.h dynarr.c These implement a couple of basic C data types to facilitate memory allocation. The `Blocktype' type efficiently manages the allocation of fixed-size blocks by minimizing the number of times that `malloc()' and `free()' are called. It allocates memory in large chunks, subdivides the chunks into blocks of the proper size, and returns the blocks as requested. When blocks are freed, they are placed onto a linked list, so they can be efficiently reused. This data type is not much used in XEmacs currently, because it's a fairly new addition. The `Dynarr' type implements a "dynamic array", which is similar to a standard C array but has no fixed limit on the number of elements it can contain. Dynamic arrays can hold elements of any type, and when you add a new element, the array automatically resizes itself if it isn't big enough. Dynarrs are extensively used in the redisplay mechanism. inline.c This module is used in connection with inline functions (available in some compilers). Often, inline functions need to have a corresponding non-inline function that does the same thing. This module is where they reside. It contains no actual code, but defines some special flags that cause inline functions defined in header files to be rendered as actual functions. It then includes all header files that contain any inline function definitions, so that each one gets a real function equivalent. debug.c debug.h These functions provide a system for doing internal consistency checks during code development. This system is not currently used; instead the simpler `assert()' macro is used along with the various checks provided by the `--error-check-*' configuration options. 
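     The chunk-and-free-list scheme described for `Blocktype' above can
     be sketched as follows.  This is a minimal illustration under
     stated assumptions, not the code in `blocktype.c': the names
     `toy_blocktype', `toy_blocktype_init', `toy_blocktype_alloc',
     `toy_blocktype_free' and the chunk size of 64 blocks are all
     invented here, blocks are only aligned to the size of a pointer,
     and chunks are never returned to the system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_BLOCKS_PER_CHUNK 64   /* arbitrary choice for this sketch */

/* A free block is reused to hold the link to the next free block. */
typedef struct toy_free_node
{
  struct toy_free_node *next;
} toy_free_node;

typedef struct
{
  size_t block_size;          /* rounded-up size of one block */
  toy_free_node *free_list;   /* blocks ready to be handed out */
  size_t chunks;              /* how many times malloc() was called */
} toy_blocktype;

static void
toy_blocktype_init (toy_blocktype *bt, size_t elsize)
{
  size_t unit = sizeof (toy_free_node);

  /* Every block must be able to hold a free-list link. */
  if (elsize < unit)
    elsize = unit;
  bt->block_size = (elsize + unit - 1) / unit * unit;
  bt->free_list = NULL;
  bt->chunks = 0;
}

static void *
toy_blocktype_alloc (toy_blocktype *bt)
{
  toy_free_node *node;

  if (bt->free_list == NULL)
    {
      /* One malloc() call yields many blocks. */
      char *chunk = (char *) malloc (bt->block_size * TOY_BLOCKS_PER_CHUNK);
      size_t i;

      if (chunk == NULL)
        return NULL;
      bt->chunks++;
      for (i = 0; i < TOY_BLOCKS_PER_CHUNK; i++)
        {
          node = (toy_free_node *) (chunk + i * bt->block_size);
          node->next = bt->free_list;
          bt->free_list = node;
        }
    }
  node = bt->free_list;
  bt->free_list = node->next;
  return node;
}

static void
toy_blocktype_free (toy_blocktype *bt, void *block)
{
  toy_free_node *node = (toy_free_node *) block;

  assert (block != NULL);
  node->next = bt->free_list;   /* push back for later reuse */
  bt->free_list = node;
}
```

     In this sketch a freed block is the first one handed out again, and
     `malloc()' is called once per 64 allocations at most, which is the
     saving the description above refers to.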
prefix-args.c This is actually the source for a small, self-contained program used during building. universe.h This is not currently used.  File: internals.info, Node: Basic Lisp Modules, Next: Modules for Standard Editing Operations, Prev: Low-Level Modules, Up: A Summary of the Various XEmacs Modules Basic Lisp Modules ================== emacsfns.h lisp-disunion.h lisp-union.h lisp.h lrecord.h symsinit.h These are the basic header files for all XEmacs modules. Each module includes `lisp.h', which brings the other header files in. `lisp.h' contains the definitions of the structures and extractor and constructor macros for the basic Lisp objects and various other basic definitions for the Lisp environment, as well as some general-purpose definitions (e.g. `min()' and `max()'). `lisp.h' includes either `lisp-disunion.h' or `lisp-union.h', depending on whether `USE_UNION_TYPE' is defined. These files define the typedef of the Lisp object itself (as described above) and the low-level macros that hide the actual implementation of the Lisp object. All extractor and constructor macros for particular types of Lisp objects are defined in terms of these low-level macros. As a general rule, all typedefs should go into the typedefs section of `lisp.h' rather than into a module-specific header file even if the structure is defined elsewhere. This allows function prototypes that use the typedef to be placed into other header files. Forward structure declarations (i.e. a simple declaration like `struct foo;' where the structure itself is defined elsewhere) should be placed into the typedefs section as necessary. `lrecord.h' contains the basic structures and macros that implement all record-type Lisp objects - i.e. all objects whose type is a field in their C structure, which includes all objects except the few most basic ones. `lisp.h' contains prototypes for most of the exported functions in the various modules. 
Lisp primitives defined using `DEFUN' that need to be called by C code
should be declared using `EXFUN'.  Other function prototypes should be
placed either into the appropriate section of `lisp.h', or into a
module-specific header file, depending on how general-purpose the
function is and whether it has special-purpose argument types requiring
definitions not in `lisp.h'.  All initialization functions are
prototyped in `symsinit.h'.

     alloc.c
     pure.c
     puresize.h

   The large module `alloc.c' implements all of the basic allocation and
garbage collection for Lisp objects.  The most commonly used Lisp
objects are allocated in chunks, similar to the Blocktype data type
described above; others are allocated in individually `malloc()'ed
blocks.  This module provides the foundation on which all other aspects
of the Lisp environment sit, and is the first module initialized at
startup.

   Note that `alloc.c' provides a series of generic functions that are
not dependent on any particular object type, and interfaces to
particular types of objects using a standardized interface of
type-specific methods.  This scheme is a fundamental principle of
object-oriented programming and is heavily used throughout XEmacs.  The
great advantage of this is that it allows for a clean separation of
functionality into different modules - new classes of Lisp objects, new
event interfaces, new device types, new stream interfaces, etc. can be
added transparently without affecting code anywhere else in XEmacs.
Because the different subsystems are divided into general and specific
code, adding a new subtype within a subsystem will in general not
require changes to the generic subsystem code or affect any of the other
subtypes in the subsystem; this provides a great deal of robustness to
the XEmacs code.

   `pure.c' contains the declaration of the "purespace" array.
Pure space is a hack used to place some constant Lisp data into the code
segment of the XEmacs executable, even though the data needs to be
initialized through function calls.  (See above in section VIII for more
info about this.)  During startup, certain sorts of data are
automatically copied into pure space, and other data is copied manually
in some of the basic Lisp files by calling the function `purecopy',
which copies the object if possible (this only works in temacs, of
course) and returns the new object.  In particular, while temacs is
executing, the Lisp reader automatically copies all compiled-function
objects that it reads into pure space.  Since compiled-function objects
are large, are never modified, and typically comprise the majority of
the contents of a compiled-Lisp file, this works well.  While XEmacs is
running, any attempt to modify an object that resides in pure space
causes an error.  Objects in pure space are never garbage collected -
almost all of the time, they're intended to be permanent, and in any
case you can't write into pure space to set the mark bits.

   `puresize.h' contains the declaration of the size of the pure space
array.  This depends on the optional features that are compiled in, any
extra purespace requested by the user at compile time, and certain other
factors (e.g. 64-bit machines need more pure space because their Lisp
objects are larger).  The smallest size that suffices should be used, so
that there's no wasted space.  If there's not enough pure space, you
will get an error during the build process, specifying how much more
pure space is needed.

     eval.c
     backtrace.h

   This module contains all of the functions to handle the flow of
control.
This includes the mechanisms of defining functions, calling functions, traversing stack frames, and binding variables; the control primitives and other special forms such as `while', `if', `eval', `let', `and', `or', `progn', etc.; handling of non-local exits, unwind-protects, and exception handlers; entering the debugger; methods for the subr Lisp object type; etc. It does *not* include the `read' function, the `print' function, or the handling of symbols and obarrays. `backtrace.h' contains some structures related to stack frames and the flow of control. lread.c This module implements the Lisp reader and the `read' function, which converts text into Lisp objects, according to the read syntax of the objects, as described above. This is similar to the parser that is a part of all compilers. print.c This module implements the Lisp print mechanism and the `print' function and related functions. This is the inverse of the Lisp reader - it converts Lisp objects to a printed, textual representation. (Hopefully something that can be read back in using `read' to get an equivalent object.) general.c symbols.c symeval.h `symbols.c' implements the handling of symbols, obarrays, and retrieving the values of symbols. Much of the code is devoted to handling the special "symbol-value-magic" objects that define special types of variables - this includes buffer-local variables, variable aliases, variables that forward into C variables, etc. This module is initialized extremely early (right after `alloc.c'), because it is here that the basic symbols `t' and `nil' are created, and those symbols are used everywhere throughout XEmacs. `symeval.h' contains the definitions of symbol structures and the `DEFVAR_LISP()' and related macros for declaring variables. data.c floatfns.c fns.c These modules implement the methods and standard Lisp primitives for all the basic Lisp object types other than symbols (which are described above). 

`data.c' contains all the predicates (primitives that return whether an object is of a particular type); the integer arithmetic functions; and the basic accessor and mutator primitives for the various object types.

`fns.c' contains all the standard predicates for working with sequences (where, abstractly speaking, a sequence is an ordered set of objects, and can be represented by a list, string, vector, or bit-vector); it also contains `equal', perhaps on the grounds that the bulk of the operation of `equal' is comparing sequences.

`floatfns.c' contains methods and primitives for floats and floating-point arithmetic.

     bytecode.c
     bytecode.h

`bytecode.c' implements the byte-code interpreter and compiled-function objects, and `bytecode.h' contains associated structures. Note that the byte-code *compiler* is written in Lisp.


File: internals.info, Node: Modules for Standard Editing Operations, Next: Editor-Level Control Flow Modules, Prev: Basic Lisp Modules, Up: A Summary of the Various XEmacs Modules

Modules for Standard Editing Operations
=======================================

     buffer.c
     buffer.h
     bufslots.h

`buffer.c' implements the "buffer" Lisp object type. This includes functions that create and destroy buffers; retrieve buffers by name or by other properties; manipulate lists of buffers (remember that buffers are permanent objects and stored in various ordered lists); retrieve or change buffer properties; etc. It also contains the definitions of all the built-in buffer-local variables (which can be viewed as buffer properties). It does *not* contain code to manipulate buffer-local variables (that's in `symbols.c', described above); or code to manipulate the text in a buffer.

`buffer.h' defines the structures associated with a buffer and the various macros for retrieving text from a buffer and special buffer positions (e.g. `point', the default location for text insertion).
It also contains macros for working with buffer positions and converting between their representations as character offsets and as byte offsets (under MULE, they are different, because characters can be multi-byte). It is one of the largest header files.

`bufslots.h' defines the fields in the buffer structure that correspond to the built-in buffer-local variables. It is its own header file because it is included many times in `buffer.c', as a way of iterating over all the built-in buffer-local variables.

     insdel.c
     insdel.h

`insdel.c' contains low-level functions for inserting and deleting text in a buffer, keeping track of changed regions for use by redisplay, and calling any before-change and after-change functions that may have been registered for the buffer. It also contains the actual functions that convert between byte offsets and character offsets.

`insdel.h' contains associated headers.

     marker.c

This module implements the "marker" Lisp object type, which conceptually is a pointer to a text position in a buffer that moves around as text is inserted and deleted, so as to remain in the same relative position. This module doesn't actually move the markers around - that's handled in `insdel.c'. This module just creates them and implements the primitives for working with them. As markers are simple objects, this does not entail much.

Note that the standard arithmetic primitives (e.g. `+') accept markers in place of integers and automatically substitute the value of `marker-position' for the marker, i.e. an integer describing the current buffer position of the marker.

     extents.c
     extents.h

This module implements the "extent" Lisp object type, which is like a marker that works over a range of text rather than a single position. Extents are also much more complex and powerful than markers and have a more efficient (and more algorithmically complex) implementation. The implementation is described in detail in comments in `extents.c'.

The code in `extents.c' works closely with `insdel.c' so that extents are properly moved around as text is inserted and deleted. There is also code in `extents.c' that provides information needed by the redisplay mechanism for efficient operation. (Remember that extents can have display properties that affect [sometimes drastically, as in the `invisible' property] the display of the text they cover.)

     editfns.c

`editfns.c' contains the standard Lisp primitives for working with a buffer's text, and calls the low-level functions in `insdel.c'. It also contains primitives for working with `point' (the default buffer insertion location).

`editfns.c' also contains functions for retrieving various characteristics from the external environment: the current time, the process ID of the running XEmacs process, the name of the user who ran this XEmacs process, etc. It's not clear why this code is in `editfns.c'.

     callint.c
     cmds.c
     commands.h

These modules implement the basic "interactive" commands, i.e. user-callable functions. Commands, as opposed to other functions, have special ways of getting their parameters interactively (by querying the user), as opposed to having them passed in a normal function invocation. Many commands are not really meant to be called from other Lisp functions, because they modify global state in a way that's often undesired as part of other Lisp functions.

`callint.c' implements the mechanism for querying the user for parameters and calling interactive commands. The bulk of this module is code that parses the interactive spec that is supplied with an interactive command.

`cmds.c' implements the basic, most commonly used editing commands: commands to move around the current buffer and insert and delete characters. These commands are implemented using the Lisp primitives defined in `editfns.c'.

`commands.h' contains associated structure definitions and prototypes.
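
To give a feel for what parsing an interactive spec involves, here is a minimal sketch. It is not taken from `callint.c'. It assumes the familiar string form of a spec, in which each argument is described by a code letter followed by a prompt, with descriptors separated by newlines (e.g. "sName: \nnCount: "). The names `spec_arg' and `parse_interactive_spec' are hypothetical, and prefix flags and all of the actual argument reading are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record for one argument descriptor in an interactive
   spec.  None of these names come from the XEmacs sources. */
struct spec_arg
{
  char code;           /* code letter, e.g. 's' (string) or 'n' (number) */
  const char *prompt;  /* start of the prompt text inside the spec */
  size_t prompt_len;   /* prompt length in bytes, newline excluded */
};

/* Split SPEC into at most MAX descriptors and return how many were
   stored in OUT.  Simplification: prefix flags are not handled, and
   empty descriptors (two newlines in a row) are skipped. */
size_t
parse_interactive_spec (const char *spec, struct spec_arg *out, size_t max)
{
  size_t n = 0;

  assert (spec != NULL && out != NULL);

  while (*spec != '\0' && n < max)
    {
      /* A descriptor runs up to the next newline or the end of SPEC. */
      const char *end = strchr (spec, '\n');
      if (end == NULL)
        end = spec + strlen (spec);

      if (end == spec)
        {
          /* Empty descriptor: step over the newline. */
          spec = end + 1;
          continue;
        }

      out[n].code = spec[0];
      out[n].prompt = spec + 1;
      out[n].prompt_len = (size_t) (end - (spec + 1));
      n++;

      spec = (*end == '\n') ? end + 1 : end;
    }
  return n;
}
```

A caller would then switch on each `code' to decide how to obtain the argument (read a string, read a number, use the region, and so on) before invoking the command with the collected values.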

     regex.c
     regex.h
     search.c

`search.c' implements the Lisp primitives for searching for text in a buffer, and some of the low-level algorithms for doing this. In particular, the fast fixed-string Boyer-Moore search algorithm is implemented in `search.c'. The low-level algorithms for doing regular-expression searching, however, are implemented in `regex.c' and `regex.h'. These two modules are largely independent of XEmacs, and are similar to (and based upon) the regular-expression routines used in `grep' and other GNU utilities.

     doprnt.c

`doprnt.c' implements formatted-string processing, similar to the `printf()' function in C.

     undo.c

This module implements the undo mechanism for tracking buffer changes. Most of this could be implemented in Lisp.


File: internals.info, Node: Editor-Level Control Flow Modules, Next: Modules for the Basic Displayable Lisp Objects, Prev: Modules for Standard Editing Operations, Up: A Summary of the Various XEmacs Modules

Editor-Level Control Flow Modules
=================================

     event-Xt.c
     event-stream.c
     event-tty.c
     events.c
     events.h

These implement the handling of events (user input and other system notifications).

`events.c' and `events.h' define the "event" Lisp object type and primitives for manipulating it.

`event-stream.c' implements the basic functions for working with event queues, dispatching an event by looking it up in relevant keymaps and such, and handling timeouts; this includes the primitives `next-event' and `dispatch-event', as well as related primitives such as `sit-for', `sleep-for', and `accept-process-output'. (`event-stream.c' is one of the hairiest and trickiest modules in XEmacs. Beware! You can easily mess things up here.)

`event-Xt.c' and `event-tty.c' implement the low-level interfaces onto retrieving events from Xt (the X toolkit) and from TTY's (using `read()' and `select()'), respectively.

The event interface enforces a clean separation between the specific code for interfacing with the operating system and the generic code for working with events, by defining an API of basic, low-level event methods; `event-Xt.c' and `event-tty.c' are two different implementations of this API. To add support for a new operating system (e.g. NeXTstep), one merely needs to provide another implementation of those API functions.

Note that the choice of whether to use `event-Xt.c' or `event-tty.c' is made at compile time! Or at the very latest, it is made at startup time. `event-Xt.c' handles events for *both* X and TTY frames; `event-tty.c' is only used when X support is not compiled into XEmacs. The reason for this is that there is only one event loop in XEmacs: thus, it needs to be able to receive events from all different kinds of frames.

     keymap.c
     keymap.h

`keymap.c' and `keymap.h' define the "keymap" Lisp object type and associated methods and primitives. (Remember that keymaps are objects that associate event descriptions with functions to be called to "execute" those events; `dispatch-event' looks up events in the relevant keymaps.)

     keyboard.c

`keyboard.c' contains functions that implement the actual editor command loop - i.e. the event loop that cyclically retrieves and dispatches events. This code is also rather tricky, just like `event-stream.c'.

     macros.c
     macros.h

These two modules contain the basic code for defining keyboard macros. These functions don't actually do much; most of the code that handles keyboard macros is mixed in with the event-handling code in `event-stream.c'.

     minibuf.c

This contains some miscellaneous code related to the minibuffer (most of the minibuffer code was moved into Lisp by Richard Mlynarik).
This includes the primitives for completion (although filename completion is in `dired.c'), the lowest-level interface to the minibuffer (if the command loop were cleaned up, this too could be in Lisp), and code for dealing with the echo area (this, too, was mostly moved into Lisp, and the only code remaining is code to call out to Lisp or provide simple bootstrapping implementations early in temacs, before the echo-area Lisp code is loaded).


File: internals.info, Node: Modules for the Basic Displayable Lisp Objects, Next: Modules for other Display-Related Lisp Objects, Prev: Editor-Level Control Flow Modules, Up: A Summary of the Various XEmacs Modules

Modules for the Basic Displayable Lisp Objects
==============================================

     device-ns.h
     device-stream.c
     device-stream.h
     device-tty.c
     device-tty.h
     device-x.c
     device-x.h
     device.c
     device.h

These modules implement the "device" Lisp object type. This abstracts a particular screen or connection on which frames are displayed. As with Lisp objects, event interfaces, and other subsystems, the device code is separated into a generic component that contains a standardized interface (in the form of a set of methods) onto particular device types.

The device subsystem defines all the methods and provides method services for not only device operations but also for the frame, window, menubar, scrollbar, toolbar, and other displayable-object subsystems. The reason for this is that all of these subsystems have the same subtypes (X, TTY, NeXTstep, Microsoft Windows, etc.) as devices do.

     frame-ns.h
     frame-tty.c
     frame-x.c
     frame-x.h
     frame.c
     frame.h

Each device contains one or more frames in which objects (e.g. text) are displayed. A frame corresponds to a window in the window system; usually this is a top-level window but it could potentially be one of a number of overlapping child windows within a top-level window, using the MDI (Multiple Document Interface) protocol in Microsoft Windows or a similar scheme.

The `frame-*' files implement the "frame" Lisp object type and provide the generic and device-type-specific operations on frames (e.g. raising, lowering, resizing, moving, etc.).

     window.c
     window.h

Each frame consists of one or more non-overlapping "windows" (better known as "panes" in standard window-system terminology) in which a buffer's text can be displayed. Windows can also have scrollbars displayed around their edges.

`window.c' and `window.h' implement the "window" Lisp object type and provide code to manage windows. Since windows have no associated resources in the window system (the window system knows only about the frame; no child windows or anything are used for XEmacs windows), there is no device-type-specific code here; all of that code is part of the redisplay mechanism or the code for particular object types such as scrollbars.


File: internals.info, Node: Modules for other Display-Related Lisp Objects, Next: Modules for the Redisplay Mechanism, Prev: Modules for the Basic Displayable Lisp Objects, Up: A Summary of the Various XEmacs Modules

Modules for other Display-Related Lisp Objects
==============================================

     faces.c
     faces.h

     bitmaps.h
     glyphs-ns.h
     glyphs-x.c
     glyphs-x.h
     glyphs.c
     glyphs.h

     objects-ns.h
     objects-tty.c
     objects-tty.h
     objects-x.c
     objects-x.h
     objects.c
     objects.h

     menubar-x.c
     menubar.c

     scrollbar-x.c
     scrollbar-x.h
     scrollbar.c
     scrollbar.h

     toolbar-x.c
     toolbar.c
     toolbar.h

     font-lock.c

This file provides C support for syntax highlighting - i.e. highlighting different syntactic constructs of a source file in different colors, for easy reading. The C support is provided so that this is fast.

     dgif_lib.c
     gif_err.c
     gif_lib.h
     gifalloc.c

These modules decode GIF-format image files, for use with glyphs.
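
The "generic component plus a set of methods per device type" arrangement used by the device, frame and related subsystems can be illustrated with a small method table. The following is only a sketch of the general technique, not the actual XEmacs declarations: every name in it (`device_methods', `x_device_methods', `tty_device_methods', and the two methods) is hypothetical, and the "X" text width simply pretends that every character is 8 pixels wide.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical method table; one instance exists per device type. */
struct device_methods
{
  const char *type_name;
  int (*frame_raise) (int frame_id);     /* optional; may be NULL */
  int (*text_width) (const char *text);  /* required */
};

/* "X" implementation: pretend a fixed 8-pixel-wide font. */
static int x_frame_raise (int frame_id) { return frame_id >= 0; }
static int x_text_width (const char *text) { return 8 * (int) strlen (text); }

/* "TTY" implementation: width is in columns; frames cannot be raised. */
static int tty_text_width (const char *text) { return (int) strlen (text); }

const struct device_methods x_device_methods =
  { "x", x_frame_raise, x_text_width };
const struct device_methods tty_device_methods =
  { "tty", NULL, tty_text_width };

/* Generic code: it knows only the table, never the device type. */
int
device_text_width (const struct device_methods *m, const char *text)
{
  assert (m != NULL && m->text_width != NULL);
  return m->text_width (text);
}

/* A missing optional method is treated as "nothing to do" (returns 0). */
int
device_raise_frame (const struct device_methods *m, int frame_id)
{
  assert (m != NULL);
  return m->frame_raise != NULL ? m->frame_raise (frame_id) : 0;
}
```

With this shape, supporting a further device type amounts to supplying one more table; the generic callers stay unchanged.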


File: internals.info, Node: Modules for the Redisplay Mechanism, Next: Modules for Interfacing with the File System, Prev: Modules for other Display-Related Lisp Objects, Up: A Summary of the Various XEmacs Modules

Modules for the Redisplay Mechanism
===================================

     redisplay-output.c
     redisplay-tty.c
     redisplay-x.c
     redisplay.c
     redisplay.h

These files provide the redisplay mechanism. As with many other subsystems in XEmacs, there is a clean separation between the general and device-specific support.

`redisplay.c' contains the bulk of the redisplay engine. These functions update the redisplay structures (which describe how the screen is to appear) to reflect any changes made to the state of any displayable objects (buffer, frame, window, etc.) since the last time that redisplay was called. These functions are highly optimized to avoid doing more work than necessary (since redisplay is called extremely often and is potentially a huge time sink), and depend heavily on notifications from the objects themselves that changes have occurred, so that redisplay doesn't explicitly have to check each possible object. The redisplay mechanism also contains a great deal of caching to further speed things up; some of this caching is contained within the various displayable objects.

`redisplay-output.c' goes through the redisplay structures and converts them into calls to device-specific methods to actually output the screen changes. `redisplay-x.c' and `redisplay-tty.c' are two implementations of these redisplay output methods, for X frames and TTY frames, respectively.

     indent.c

This module contains various functions and Lisp primitives for converting between buffer positions and screen positions. These functions call the redisplay mechanism to do most of the work, and then examine the redisplay structures to get the necessary information. This module needs work.
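
The reliance of `redisplay.c' on change notifications, mentioned above, is an instance of the common "dirty flag" technique. The sketch below shows that technique in isolation and makes no claim about the real data structures: the flag names, `note_change' and `redisplay_sketch' are all hypothetical, and the "work" done per flag is reduced to counting.

```c
#include <assert.h>

/* Hypothetical change flags.  An object that is modified sets the
   corresponding flag instead of being polled later by redisplay. */
enum change_flag
{
  CHANGED_BUFFER  = 1 << 0,
  CHANGED_FACES   = 1 << 1,
  CHANGED_WINDOWS = 1 << 2
};

static unsigned int pending_changes;

/* Called by the code that modifies an object. */
void
note_change (enum change_flag what)
{
  pending_changes |= (unsigned int) what;
}

/* Called very often.  Returns the number of kinds of state that had
   to be recomputed; 0 means the fast path was taken. */
int
redisplay_sketch (void)
{
  int recomputed = 0;

  if (pending_changes == 0)
    return 0;                   /* nothing was reported: do no work */

  if (pending_changes & CHANGED_BUFFER)
    recomputed++;               /* would rebuild affected text lines */
  if (pending_changes & CHANGED_FACES)
    recomputed++;               /* would refresh cached face data */
  if (pending_changes & CHANGED_WINDOWS)
    recomputed++;               /* would recompute window layout */

  pending_changes = 0;          /* everything reported is now handled */
  return recomputed;
}
```

The point of the design is that the cost of a call is proportional to what was reported as changed, not to the number of objects that exist.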

     termcap.c
     terminfo.c
     tparam.c

These files contain functions for working with the termcap (BSD-style) and terminfo (System V style) databases of terminal capabilities and escape sequences, used when XEmacs is displaying in a TTY.

     cm.c
     cm.h

These files provide some miscellaneous TTY-output functions and should probably be merged into `redisplay-tty.c'.