1 This is ../info/internals.info, produced by makeinfo version 4.0 from
2 internals/internals.texi.
4 INFO-DIR-SECTION XEmacs Editor
6 * Internals: (internals). XEmacs Internals Manual.
9 Copyright (C) 1992 - 1996 Ben Wing. Copyright (C) 1996, 1997 Sun
10 Microsystems. Copyright (C) 1994 - 1998 Free Software Foundation.
11 Copyright (C) 1994, 1995 Board of Trustees, University of Illinois.
13 Permission is granted to make and distribute verbatim copies of this
14 manual provided the copyright notice and this permission notice are
15 preserved on all copies.
17 Permission is granted to copy and distribute modified versions of
18 this manual under the conditions for verbatim copying, provided that the
19 entire resulting derived work is distributed under the terms of a
20 permission notice identical to this one.
22 Permission is granted to copy and distribute translations of this
23 manual into another language, under the above conditions for modified
24 versions, except that this permission notice may be stated in a
25 translation approved by the Foundation.
27 Permission is granted to copy and distribute modified versions of
28 this manual under the conditions for verbatim copying, provided also
29 that the section entitled "GNU General Public License" is included
30 exactly as in the original, and provided that the entire resulting
31 derived work is distributed under the terms of a permission notice
32 identical to this one.
34 Permission is granted to copy and distribute translations of this
35 manual into another language, under the above conditions for modified
36 versions, except that the section entitled "GNU General Public License"
37 may be included in a translation approved by the Free Software
38 Foundation instead of in the original English.
41 File: internals.info, Node: The XEmacs Object System (Abstractly Speaking), Next: How Lisp Objects Are Represented in C, Prev: XEmacs From the Inside, Up: Top
43 The XEmacs Object System (Abstractly Speaking)
44 **********************************************
46 At the heart of the Lisp interpreter is its management of objects.
47 XEmacs Lisp contains many built-in objects, some of which are simple
48 and others of which can be very complex; and some of which are very
49 common, and others of which are rarely used or are only used
50 internally. (Since the Lisp allocation system, with its automatic
51 reclamation of unused storage, is so much more convenient than
`malloc()' and `free()', the C code makes extensive use of it in its
own operation.)
55 The basic Lisp objects are
`integer'
58 28 or 31 bits of precision, or 60 or 63 bits on 64-bit machines;
59 the reason for this is described below when the internal Lisp
60 object representation is described.
`float'
63 Same precision as a double in C.
`cons'
66 A simple container for two Lisp objects, used to implement lists
67 and most other data structures in Lisp.
`char'
70 An object representing a single character of text; chars behave
71 like integers in many ways but are logically considered text
72 rather than numbers and have a different read syntax. (the read
73 syntax for a char contains the char itself or some textual
74 encoding of it--for example, a Japanese Kanji character might be
75 encoded as `^[$(B#&^[(B' using the ISO-2022 encoding
76 standard--rather than the numerical representation of the char;
77 this way, if the mapping between chars and integers changes, which
78 is quite possible for Kanji characters and other extended
79 characters, the same character will still be created. Note that
80 some primitives confuse chars and integers. The worst culprit is
81 `eq', which makes a special exception and considers a char to be
82 `eq' to its integer equivalent, even though in no other case are
83 objects of two different types `eq'. The reason for this
84 monstrosity is compatibility with existing code; the separation of
85 char from integer came fairly recently.)
`symbol'
88 An object that contains Lisp objects and is referred to by name;
89 symbols are used to implement variables and named functions and to
90 provide the equivalent of preprocessor constants in C.
`vector'
93 A one-dimensional array of Lisp objects providing constant-time
94 access to any of the objects; access to an arbitrary object in a
95 vector is faster than for lists, but the operations that can be
96 done on a vector are more limited.
`string'
99 Self-explanatory; behaves much like a vector of chars but has a
100 different read syntax and is stored and manipulated more compactly.
`bit-vector'
103 A vector of bits; similar to a string in spirit.
`compiled-function'
106 An object containing compiled Lisp code, known as "byte code".
`subr'
109 A Lisp primitive, i.e. a Lisp-callable function implemented in C.
111 Note that there is no basic "function" type, as in more powerful
112 versions of Lisp (where it's called a "closure"). XEmacs Lisp does not
113 provide the closure semantics implemented by Common Lisp and Scheme.
114 The guts of a function in XEmacs Lisp are represented in one of four
115 ways: a symbol specifying another function (when one function is an
116 alias for another), a list (whose first element must be the symbol
117 `lambda') containing the function's source code, a compiled-function
118 object, or a subr object. (In other words, given a symbol specifying
119 the name of a function, calling `symbol-function' to retrieve the
contents of the symbol's function cell will return one of these types
of objects.)
123 XEmacs Lisp also contains numerous specialized objects used to
124 implement the editor:
`buffer'
127 Stores text like a string, but is optimized for insertion and
128 deletion and has certain other properties that can be set.
`frame'
131 An object with various properties whose displayable representation
132 is a "window" in window-system parlance.
`window'
135 A section of a frame that displays the contents of a buffer; often
136 called a "pane" in window-system parlance.
138 `window-configuration'
An object that represents a saved configuration of windows in a
frame.
`device'
143 An object representing a screen on which frames can be displayed;
equivalent to a "display" in the X Window System and a "TTY" in
character mode.
`face'
148 An object specifying the appearance of text or graphics; it has
149 properties such as font, foreground color, and background color.
`marker'
152 An object that refers to a particular position in a buffer and
153 moves around as text is inserted and deleted to stay in the same
154 relative position to the text around it.
`extent'
157 Similar to a marker but covers a range of text in a buffer; can
158 also specify properties of the text, such as a face in which the
text is to be displayed, whether the text is invisible or
unmodifiable, etc.
`event'
163 Generated by calling `next-event' and contains information
164 describing a particular event happening in the system, such as the
165 user pressing a key or a process terminating.
`keymap'
168 An object that maps from events (described using lists, vectors,
169 and symbols rather than with an event object because the mapping
170 is for classes of events, rather than individual events) to
171 functions to execute or other events to recursively look up; the
172 functions are described by name, using a symbol, or using lists to
173 specify the function's code.
`glyph'
176 An object that describes the appearance of an image (e.g. pixmap)
177 on the screen; glyphs can be attached to the beginning or end of
178 extents and in some future version of XEmacs will be able to be
179 inserted directly into a buffer.
`process'
An object that describes a connection to an externally-running
process.
185 There are some other, less-commonly-encountered general objects:
`hash-table'
188 An object that maps from an arbitrary Lisp object to another
189 arbitrary Lisp object, using hashing for fast lookup.
`obarray'
192 A limited form of hash-table that maps from strings to symbols;
193 obarrays are used to look up a symbol given its name and are not
194 actually their own object type but are kludgily represented using
vectors with hidden fields (this representation derives from GNU
Emacs).
`specifier'
199 A complex object used to specify the value of a display property; a
200 default value is given and different values can be specified for
201 particular frames, buffers, windows, devices, or classes of device.
`char-table'
204 An object that maps from chars or classes of chars to arbitrary
205 Lisp objects; internally char tables use a complex nested-vector
206 representation that is optimized to the way characters are
207 represented as integers.
`range-table'
An object that maps from ranges of integers to arbitrary Lisp
objects.
213 And some strange special-purpose objects:
217 Objects used when MULE, or multi-lingual/Asian-language, support is
223 An object that encapsulates a window-system resource; instances are
224 mostly used internally but are exposed on the Lisp level for
225 cleanness of the specifier model and because it's occasionally
useful for Lisp programs to create or query the properties of
instances.
`subwindow'
An object that encapsulates a "subwindow" resource, i.e. a
231 window-system child window that is drawn into by an external
232 process; this object should be integrated into the glyph system
233 but isn't yet, and may change form when this is done.
237 Objects that represent resources used in the ToolTalk interprocess
238 communication protocol.
241 An object used in conjunction with the toolbar.
243 And objects that are only used internally:
246 A generic object for encapsulating arbitrary memory; this allows
247 you the generality of `malloc()' and the convenience of the Lisp
251 A buffering I/O stream, used to provide a unified interface to
252 anything that can accept output or provide input, such as a file
253 descriptor, a stdio stream, a chunk of memory, a Lisp buffer, a
254 Lisp string, etc.; it's a Lisp object to make its memory
255 management more convenient.
258 Subsidiary objects in the internal char-table representation.
263 Various special-purpose objects that are basically just used to
264 encapsulate memory for particular subsystems, similar to the more
265 general "opaque" object.
267 `symbol-value-forward'
268 `symbol-value-buffer-local'
269 `symbol-value-varalias'
270 `symbol-value-lisp-magic'
271 Special internal-only objects that are placed in the value cell of
272 a symbol to indicate that there is something special with this
273 variable - e.g. it has no value, it mirrors another variable, or
274 it mirrors some C variable; there is really only one kind of
275 object, called a "symbol-value-magic", but it is sort-of halfway
276 kludged into semi-different object types.
278 Some types of objects are "permanent", meaning that once created,
279 they do not disappear until explicitly destroyed, using a function such
280 as `delete-buffer', `delete-window', `delete-frame', etc. Others will
disappear once they are no longer used, through the garbage collection
282 mechanism. Buffers, frames, windows, devices, and processes are among
283 the objects that are permanent. Note that some objects can go both
284 ways: Faces can be created either way; extents are normally permanent,
285 but detached extents (extents not referring to any text, as happens to
286 some extents when the text they are referring to is deleted) are
287 temporary. Note that some permanent objects, such as faces and coding
288 systems, cannot be deleted. Note also that windows are unique in that
289 they can be _undeleted_ after having previously been deleted. (This
290 happens as a result of restoring a window configuration.)
292 Note that many types of objects have a "read syntax", i.e. a way of
293 specifying an object of that type in Lisp code. When you load a Lisp
294 file, or type in code to be evaluated, what really happens is that the
295 function `read' is called, which reads some text and creates an object
296 based on the syntax of that text; then `eval' is called, which possibly
297 does something special; then this loop repeats until there's no more
298 text to read. (`eval' only actually does something special with
299 symbols, which causes the symbol's value to be returned, similar to
300 referencing a variable; and with conses [i.e. lists], which cause a
301 function invocation. All other values are returned unchanged.)
The read syntax

     17297

converts to an integer whose value is 17297.

     1.983e-4

converts to a float whose value is 1.983e-4, or .0001983.

     ?b

converts to a char that represents the lowercase letter b.

     ?^[$(B#&^[(B

(where `^[' actually is an `ESC' character) converts to a particular
320 Kanji character when using an ISO2022-based coding system for input.
321 (To decode this goo: `ESC' begins an escape sequence; `ESC $ (' is a
322 class of escape sequences meaning "switch to a 94x94 character set";
323 `ESC $ ( B' means "switch to Japanese Kanji"; `#' and `&' collectively
324 index into a 94-by-94 array of characters [subtract 33 from the ASCII
325 value of each character to get the corresponding index]; `ESC (' is a
326 class of escape sequences meaning "switch to a 94 character set"; `ESC
327 (B' means "switch to US ASCII". It is a coincidence that the letter
328 `B' is used to denote both Japanese Kanji and US ASCII. If the first
329 `B' were replaced with an `A', you'd be requesting a Chinese Hanzi
330 character from the GB2312 character set.)
334 converts to a string.
338 converts to a symbol whose name is `"foobar"'. This is done by
339 looking up the string equivalent in the global variable `obarray',
340 whose contents should be an obarray. If no symbol is found, a new
341 symbol with the name `"foobar"' is automatically created and added to
342 `obarray'; this process is called "interning" the symbol.
346 converts to a cons cell containing the symbols `foo' and `bar'.
350 converts to a three-element list containing the specified objects
351 (note that a list is actually a set of nested conses; see the XEmacs
356 converts to a three-element vector containing the specified objects.
360 converts to a compiled-function object (the actual contents are not
361 shown since they are not relevant here; look at a file that ends with
362 `.elc' for examples).
366 converts to a bit-vector.
368 #s(hash-table ... ...)
370 converts to a hash table (the actual contents are not shown).
372 #s(range-table ... ...)
374 converts to a range table (the actual contents are not shown).
376 #s(char-table ... ...)
378 converts to a char table (the actual contents are not shown).
380 Note that the `#s()' syntax is the general syntax for structures,
381 which are not really implemented in XEmacs Lisp but should be.
383 When an object is printed out (using `print' or a related function),
384 the read syntax is used, so that the same object can be read in again.
386 The other objects do not have read syntaxes, usually because it does
387 not really make sense to create them in this fashion (i.e. processes,
388 where it doesn't make sense to have a subprocess created as a side
389 effect of reading some Lisp code), or because they can't be created at
390 all (e.g. subrs). Permanent objects, as a rule, do not have a read
391 syntax; nor do most complex objects, which contain too much state to be
392 easily initialized through a read syntax.
395 File: internals.info, Node: How Lisp Objects Are Represented in C, Next: Rules When Writing New C Code, Prev: The XEmacs Object System (Abstractly Speaking), Up: Top
397 How Lisp Objects Are Represented in C
398 *************************************
400 Lisp objects are represented in C using a 32-bit or 64-bit machine
401 word (depending on the processor; i.e. DEC Alphas use 64-bit Lisp
402 objects and most other processors use 32-bit Lisp objects). The
403 representation stuffs a pointer together with a tag, as follows:
     [ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ]
     [ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ]

       <---> ^ <----------------------------------------------------->
        tag  |       a pointer to a structure, or an integer
             |
             `---> mark bit
413 The tag describes the type of the Lisp object. For integers and
414 chars, the lower 28 bits contain the value of the integer or char; for
415 all others, the lower 28 bits contain a pointer. The mark bit is used
416 during garbage-collection, and is always 0 when garbage collection is
417 not happening. (The way that garbage collection works, basically, is
418 that it loops over all places where Lisp objects could exist--this
419 includes all global variables in C that contain Lisp objects [including
420 `Vobarray', the C equivalent of `obarray'; through this, all Lisp
421 variables will get marked], plus various other places--and recursively
422 scans through the Lisp objects, marking each object it finds by setting
423 the mark bit. Then it goes through the lists of all objects allocated,
424 freeing the ones that are not marked and turning off the mark bit of
425 the ones that are marked.)
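As a rough illustration of the layout in the diagram (3 tag bits, 1 mark
bit, 28 value bits), the packing and unpacking can be sketched as follows.
All of the `toy_' and `TOY_' names are hypothetical and are not the macros
from `lisp.h'; the sketch only models the 32-bit case.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the 32-bit layout in the diagram above (all names here
   are hypothetical): bits 31-29 tag, bit 28 mark bit, bits 27-0 value. */
#define TOY_VALBITS  28
#define TOY_VALMASK  ((UINT32_C (1) << TOY_VALBITS) - 1)
#define TOY_MARKBIT  (UINT32_C (1) << TOY_VALBITS)
#define TOY_TAGSHIFT (TOY_VALBITS + 1)

typedef uint32_t toy_object;

/* Pack a 3-bit tag and a 28-bit value; the mark bit starts out clear. */
toy_object
toy_make (unsigned tag, uint32_t value)
{
  assert (tag < 8 && value <= TOY_VALMASK);
  return ((uint32_t) tag << TOY_TAGSHIFT) | value;
}

unsigned toy_tag (toy_object obj)   { return obj >> TOY_TAGSHIFT; }
uint32_t toy_value (toy_object obj) { return obj & TOY_VALMASK; }
int toy_marked (toy_object obj)     { return (obj & TOY_MARKBIT) != 0; }

/* What a mark phase would do to a reachable object, and what the
   sweep phase would undo afterwards. */
toy_object toy_mark (toy_object obj)   { return obj | TOY_MARKBIT; }
toy_object toy_unmark (toy_object obj) { return obj & ~TOY_MARKBIT; }
```

Setting and clearing the mark bit leaves both the tag and the value
untouched, which is why the bit can be expected to be 0 outside of garbage
collection.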
427 Lisp objects use the typedef `Lisp_Object', but the actual C type
428 used for the Lisp object can vary. It can be either a simple type
429 (`long' on the DEC Alpha, `int' on other machines) or a structure whose
430 fields are bit fields that line up properly (actually, a union of
431 structures is used). Generally the simple integral type is preferable
432 because it ensures that the compiler will actually use a machine word
433 to represent the object (some compilers will use more general and less
434 efficient code for unions and structs even if they can fit in a machine
435 word). The union type, however, has the advantage of stricter type
436 checking (if you accidentally pass an integer where a Lisp object is
437 desired, you get a compile error), and it makes it easier to decode
438 Lisp objects when debugging. The choice of which type to use is
439 determined by the preprocessor constant `USE_UNION_TYPE' which is
440 defined via the `--use-union-type' option to `configure'.
442 Note that there are only eight types that the tag can represent, but
443 many more actual types than this. This is handled by having one of the
444 tag types specify a meta-type called a "record"; for all such objects,
the first four bytes of the pointed-to structure indicate what the
actual type is.
448 Note also that having 28 bits for pointers and integers restricts a
449 lot of things to 256 megabytes of memory. (Basically, enough pointers
450 and indices and whatnot get stuffed into Lisp objects that the total
451 amount of memory used by XEmacs can't grow above 256 megabytes. In
452 older versions of XEmacs and GNU Emacs, the tag was 5 bits wide,
453 allowing for 32 types, which was more than the actual number of types
454 that existed at the time, and no "record" type was necessary. However,
455 this limited the editor to 64 megabytes total, which some users who
456 edited large files might conceivably exceed.)
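The two limits quoted above follow directly from the bit counts: a 32-bit
word minus the tag bits minus one mark bit leaves the number of address
bits. A small sketch of that arithmetic (the function name is made up for
this illustration):

```c
#include <assert.h>

/* Bytes addressable when `tagbits' tag bits and one mark bit are
   taken out of a 32-bit word.  Hypothetical helper, for illustration. */
unsigned long
toy_addressable_bytes (int tagbits)
{
  assert (tagbits >= 0 && tagbits < 31);
  return 1UL << (32 - tagbits - 1);
}
```

With a 3-bit tag this gives 2^28 bytes (256 megabytes); with the older
5-bit tag it gives 2^26 bytes (64 megabytes).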
458 Also, note that there is an implicit assumption here that all
459 pointers are low enough that the top bits are all zero and can just be
460 chopped off. On standard machines that allocate memory from the bottom
461 up (and give each process its own address space), this works fine. Some
462 machines, however, put the data space somewhere else in memory (e.g.
463 beginning at 0x80000000). Those machines cope by defining
464 `DATA_SEG_BITS' in the corresponding `m/' or `s/' file to the proper
465 mask. Then, pointers retrieved from Lisp objects are automatically
466 OR'ed with this value prior to being used.
468 A corollary of the previous paragraph is that *(pointers to)
469 stack-allocated structures cannot be put into Lisp objects*. The stack
470 is generally located near the top of memory; if you put such a pointer
471 into a Lisp object, it will get its top bits chopped off, and you will
474 Actually, there's an alternative representation of a `Lisp_Object',
475 invented by Kyle Jones, that is used when the `--use-minimal-tagbits'
476 option to `configure' is used. In this case the 2 lower bits are used
477 for the tag bits. This representation assumes that pointers to structs
are always aligned to multiples of 4, so the lower 2 bits are always
zero.
     [ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ]
     [ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ]

       <---------------------------------------------------------> <->
               a pointer to a structure, or an integer             tag
487 A tag of 00 is used for all pointer object types, a tag of 10 is used
488 for characters, and the other two tags 01 and 11 are joined together to
489 form the integer object type. The markbit is moved to part of the
490 structure being pointed at (integers and chars do not need to be marked,
since no memory is allocated). This representation has these
advantages:
494 1. 31 bits can be used for Lisp Integers.
496 2. _Any_ pointer can be represented directly, and no bit masking
497 operations are necessary.
499 The disadvantages are:
501 1. An extra level of indirection is needed when accessing the object
502 types that were not record types. So checking whether a Lisp
503 object is a cons cell becomes a slower operation.
505 2. Mark bits can no longer be stored directly in Lisp objects, so
506 another place for them must be found. This means that a cons cell
requires more memory than merely room for 2 Lisp objects, leading
to extra overhead.
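The minimal-tagbits scheme can be sketched as below, assuming only what the
text states: tag 00 for pointers, tag 10 for chars, and tags 01 and 11 (that
is, any word whose lowest bit is set) for integers. The `toy_' names are
hypothetical, and `uintptr_t' stands in for the machine word, so on a 64-bit
machine integers get 63 bits rather than 31.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t toy_object;   /* hypothetical stand-in for Lisp_Object */

/* Type tests look only at the low two bits. */
int toy_intp (toy_object obj)     { return (obj & 1) == 1; }  /* 01, 11 */
int toy_charp (toy_object obj)    { return (obj & 3) == 2; }  /* 10 */
int toy_pointerp (toy_object obj) { return (obj & 3) == 0; }  /* 00 */

/* Integers keep all but one bit of the word. */
toy_object toy_make_int (intptr_t n) { return ((uintptr_t) n << 1) | 1; }

/* Assumes >> on a negative value is an arithmetic shift. */
intptr_t toy_xint (toy_object obj) { return (intptr_t) obj >> 1; }

toy_object toy_make_char (uint32_t c) { return ((uintptr_t) c << 2) | 2; }
uint32_t toy_xchar (toy_object obj)   { return (uint32_t) (obj >> 2); }

/* A suitably aligned pointer already ends in 00, so it is stored as-is
   and read back without any masking. */
toy_object
toy_make_pointer (void *p)
{
  assert (((uintptr_t) p & 3) == 0);
  return (uintptr_t) p;
}

void *toy_xpointer (toy_object obj) { return (void *) obj; }
```

Note that nothing in the word says which kind of structure a pointer refers
to; that information has to live in the structure itself, which is the
extra indirection listed among the disadvantages.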
510 Various macros are used to construct Lisp objects and extract the
511 components. Macros of the form `XINT()', `XCHAR()', `XSTRING()',
512 `XSYMBOL()', etc. mask out the pointer/integer field and cast it to the
513 appropriate type. All of the macros that construct pointers will `OR'
514 with `DATA_SEG_BITS' if necessary. `XINT()' needs to be a bit tricky
515 so that negative numbers are properly sign-extended: Usually it does
516 this by shifting the number four bits to the left and then four bits to
517 the right. This assumes that the right-shift operator does an
518 arithmetic shift (i.e. it leaves the most-significant bit as-is rather
519 than shifting in a zero, so that it mimics a divide-by-two even for
520 negative numbers). Not all machines/compilers do this, and on the ones
521 that don't, a more complicated definition is selected by defining
522 `EXPLICIT_SIGN_EXTEND'.
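The two ways of sign-extending a 28-bit field can be sketched like this.
The function names are hypothetical and the code is not the actual
definition of `XINT()'; the second function is merely one portable way to
do what a definition selected by `EXPLICIT_SIGN_EXTEND' would have to do.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_VALBITS 28
#define TOY_VALMASK ((UINT32_C (1) << TOY_VALBITS) - 1)

/* Store an integer in the low 28 bits of a word (tag bits left at 0). */
uint32_t
toy_int_field (int32_t n)
{
  assert (n >= -(INT32_C (1) << (TOY_VALBITS - 1))
          && n < (INT32_C (1) << (TOY_VALBITS - 1)));
  return (uint32_t) n & TOY_VALMASK;
}

/* Shift-based extraction: move the field to the top of the word, then
   shift it back down.  Relies on >> of a negative value being an
   arithmetic shift, which C leaves up to the implementation. */
int32_t
toy_xint_shift (uint32_t word)
{
  return (int32_t) (word << (32 - TOY_VALBITS)) >> (32 - TOY_VALBITS);
}

/* Explicit extraction: no assumption about how >> treats negatives. */
int32_t
toy_xint_explicit (uint32_t word)
{
  uint32_t field = word & TOY_VALMASK;
  uint32_t sign = UINT32_C (1) << (TOY_VALBITS - 1);
  /* Flipping the sign bit and subtracting it back maps the upper half
     of the field's range onto the negative numbers. */
  return (int32_t) (field ^ sign) - (int32_t) sign;
}
```

Both functions ignore whatever sits in the top four bits of the word, so
they give the same answer whether or not tag and mark bits are present.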
524 Note that when `ERROR_CHECK_TYPECHECK' is defined, the extractor
525 macros become more complicated--they check the tag bits and/or the type
526 field in the first four bytes of a record type to ensure that the
527 object is really of the correct type. This is great for catching places
528 where an incorrect type is being dereferenced--this typically results
529 in a pointer being dereferenced as the wrong type of structure, with
530 unpredictable (and sometimes not easily traceable) results.
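A checked extractor of the kind described might look roughly like the
following sketch. The record types, the `TOY_XCONS' macro and the
`TOY_ERROR_CHECK_TYPECHECK' switch are all invented for the illustration
(and an `enum' is not guaranteed to be exactly four bytes); only the idea
of checking a type field at the start of the structure comes from the text.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record types; the type code sits at the start of each
   structure, as the text describes for record objects. */
enum toy_type { TOY_CONS, TOY_STRING };

struct toy_header { enum toy_type type; };
struct toy_cons   { struct toy_header header; int car; };
struct toy_string { struct toy_header header; const char *data; };

int
toy_consp (const void *obj)
{
  assert (obj != NULL);
  return ((const struct toy_header *) obj)->type == TOY_CONS;
}

/* Checked extractor: crash at the point of misuse instead of handing
   back a pointer to the wrong kind of structure. */
struct toy_cons *
toy_xcons_checked (void *obj)
{
  if (!toy_consp (obj))
    abort ();
  return (struct toy_cons *) obj;
}

#ifdef TOY_ERROR_CHECK_TYPECHECK
# define TOY_XCONS(obj) toy_xcons_checked (obj)
#else
# define TOY_XCONS(obj) ((struct toy_cons *) (obj))
#endif
```

With the switch undefined, `TOY_XCONS' is a bare cast and costs nothing;
with it defined, every use pays for one comparison.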
532 There are similar `XSETTYPE()' macros that construct a Lisp object.
533 These macros are of the form `XSETTYPE (LVALUE, RESULT)', i.e. they
534 have to be a statement rather than just used in an expression. The
535 reason for this is that standard C doesn't let you "construct" a
536 structure (but GCC does). Granted, this sometimes isn't too convenient;
537 for the case of integers, at least, you can use the function
538 `make_int()', which constructs and _returns_ an integer Lisp object.
539 Note that the `XSETTYPE()' macros are also affected by
540 `ERROR_CHECK_TYPECHECK' and make sure that the structure is of the
541 right type in the case of record types, where the type is contained in
544 The C programmer is responsible for *guaranteeing* that a
Lisp_Object is of the correct type before using the `XTYPE' macros.
546 This is especially important in the case of lists. Use `XCAR' and
547 `XCDR' if a Lisp_Object is certainly a cons cell, else use `Fcar()' and
548 `Fcdr()'. Trust other C code, but not Lisp code. On the other hand,
549 if XEmacs has an internal logic error, it's better to crash
immediately, so sprinkle "unreachable" `abort()'s liberally about the
source code.
554 File: internals.info, Node: Rules When Writing New C Code, Next: A Summary of the Various XEmacs Modules, Prev: How Lisp Objects Are Represented in C, Up: Top
556 Rules When Writing New C Code
557 *****************************
559 The XEmacs C Code is extremely complex and intricate, and there are
560 many rules that are more or less consistently followed throughout the
561 code. Many of these rules are not obvious, so they are explained here.
562 It is of the utmost importance that you follow them. If you don't,
563 you may get something that appears to work, but which will crash in odd
564 situations, often in code far away from where the actual breakage is.
568 * General Coding Rules::
569 * Writing Lisp Primitives::
570 * Adding Global Lisp Variables::
572 * Techniques for XEmacs Developers::
575 File: internals.info, Node: General Coding Rules, Next: Writing Lisp Primitives, Up: Rules When Writing New C Code
General Coding Rules
====================
580 The C code is actually written in a dialect of C called "Clean C",
581 meaning that it can be compiled, mostly warning-free, with either a C or
582 C++ compiler. Coding in Clean C has several advantages over plain C.
583 C++ compilers are more nit-picking, and a number of coding errors have
584 been found by compiling with C++. The ability to use both C and C++
585 tools means that a greater variety of development tools are available to
588 Almost every module contains a `syms_of_*()' function and a
589 `vars_of_*()' function. The former declares any Lisp primitives you
590 have defined and defines any symbols you will be using. The latter
591 declares any global Lisp variables you have added and initializes global
592 C variables in the module. For each such function, declare it in
593 `symsinit.h' and make sure it's called in the appropriate place in
594 `emacs.c'. *Important*: There are stringent requirements on exactly
595 what can go into these functions. See the comment in `emacs.c'. The
596 reason for this is to avoid obscure unwanted interactions during
597 initialization. If you don't follow these rules, you'll be sorry! If
598 you want to do anything that isn't allowed, create a
599 `complex_vars_of_*()' function for it. Doing this is tricky, though:
600 You have to make sure your function is called at the right time so that
601 all the initialization dependencies work out.
603 Every module includes `<config.h>' (angle brackets so that
604 `--srcdir' works correctly; `config.h' may or may not be in the same
605 directory as the C sources) and `lisp.h'. `config.h' must always be
606 included before any other header files (including system header files)
607 to ensure that certain tricks played by various `s/' and `m/' files
610 When including header files, always use angle brackets, not double
611 quotes, except when the file to be included is in the same directory as
612 the including file. If either file is a generated file, then that is
613 not likely to be the case. In order to understand why we have this
614 rule, imagine what happens when you do a build in the source directory
615 using `./configure' and another build in another directory using
616 `../work/configure'. There will be two different `config.h' files.
617 Which one will be used if you `#include "config.h"'?
619 *All global and static variables that are to be modifiable must be
620 declared uninitialized.* This means that you may not use the "declare
621 with initializer" form for these variables, such as `int some_variable
622 = 0;'. The reason for this has to do with some kludges done during the
623 dumping process: If possible, the initialized data segment is re-mapped
624 so that it becomes part of the (unmodifiable) code segment in the
625 dumped executable. This allows this memory to be shared among multiple
626 running XEmacs processes. XEmacs is careful to place as much constant
627 data as possible into initialized variables (in particular, into what's
628 called the "pure space"--see below) during the `temacs' phase.
630 *Please note:* This kludge only works on a few systems nowadays, and
631 is rapidly becoming irrelevant because most modern operating systems
632 provide "copy-on-write" semantics. All data is initially shared
633 between processes, and a private copy is automatically made (on a
page-by-page basis) when a process first attempts to write to a page of
memory.
637 Formerly, there was a requirement that static variables not be
638 declared inside of functions. This had to do with another hack along
639 the same vein as what was just described: old USG systems put
640 statically-declared variables in the initialized data space, so those
641 header files had a `#define static' declaration. (That way, the
642 data-segment remapping described above could still work.) This fails
643 badly on static variables inside of functions, which suddenly become
644 automatic variables; therefore, you weren't supposed to have any of
645 them. This awful kludge has been removed in XEmacs because
647 1. almost all of the systems that used this kludge ended up having to
648 disable the data-segment remapping anyway;
650 2. the only systems that didn't were extremely outdated ones;
652 3. this hack completely messed up inline functions.
654 The C source code makes heavy use of C preprocessor macros. One
655 popular macro style is:
     #define FOO(var, value) do {            \
       Lisp_Object FOO_value = (value);      \
       ... /* compute using FOO_value */     \
     } while (0)
663 The `do {...} while (0)' is a standard trick to allow FOO to have
664 statement semantics, so that it can safely be used within an `if'
665 statement in C, for example. Multiple evaluation is prevented by
666 copying a supplied argument into a local variable, so that
667 `FOO(var,fun(1))' only calls `fun' once.
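A compilable instance of this macro style is sketched below. The macro and
function names are made up, and plain `int' is used in place of
`Lisp_Object' so that the example stands alone.

```c
#include <assert.h>

/* Hypothetical macro in the style described above: the argument is
   copied into a local once, then used twice. */
#define STORE_SQUARE(var, value) do {                   \
  int STORE_SQUARE_value = (value);                     \
  (var) = STORE_SQUARE_value * STORE_SQUARE_value;      \
} while (0)

/* Counts calls to toy_fun(); declared without an initializer, in
   keeping with the rule about modifiable globals. */
int toy_calls;

int
toy_fun (int x)
{
  assert (x >= 0);
  toy_calls++;
  return x + 2;
}

int
toy_demo (int flag)
{
  int result = -1;
  /* Because of do { ... } while (0), the macro plus its trailing
     semicolon is a single statement, so the `else' still pairs with
     this `if'. */
  if (flag)
    STORE_SQUARE (result, toy_fun (1));
  else
    result = 0;
  return result;
}
```

Had the macro body been a bare `{ ... }' block, the semicolon after the
macro call would have ended the `if' statement and the `else' would not
compile.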
669 Lisp lists are popular data structures in the C code as well as in
670 Elisp. There are two sets of macros that iterate over lists.
671 `EXTERNAL_LIST_LOOP_N' should be used when the list has been supplied
672 by the user, and cannot be trusted to be acyclic and nil-terminated. A
673 `malformed-list' or `circular-list' error will be generated if the list
674 being iterated over is not entirely kosher. `LIST_LOOP_N', on the
675 other hand, is faster and less safe, and can be used only on trusted
678 Related macros are `GET_EXTERNAL_LIST_LENGTH' and `GET_LIST_LENGTH',
which calculate the length of a list and, in the case of
`GET_EXTERNAL_LIST_LENGTH', also validate the properness of the list. The
macros `EXTERNAL_LIST_LOOP_DELETE_IF' and `LIST_LOOP_DELETE_IF' delete
elements satisfying some predicate from a Lisp list.
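The difference between the trusted and the untrusted flavor can be
illustrated on a toy cons structure. This is only a sketch: the names are
hypothetical, `NULL' plays the role of nil, a return value of -1 stands in
for signalling `circular-list', and the tortoise-and-hare check is one
common technique rather than necessarily what the XEmacs macros do.
Dotted (malformed) lists cannot occur with this toy type and are left out.

```c
#include <assert.h>
#include <stddef.h>

struct toy_cons { int car; struct toy_cons *cdr; };

/* Trusted version: assumes the list is acyclic and nil-terminated.
   On a circular list this would loop forever. */
long
toy_list_length (const struct toy_cons *list)
{
  long len = 0;
  for (; list != NULL; list = list->cdr)
    len++;
  return len;
}

/* Untrusted version: a second pointer (the tortoise) follows at half
   speed; if the leading pointer ever meets it, the list is circular. */
long
toy_external_list_length (const struct toy_cons *list)
{
  const struct toy_cons *hare = list;
  const struct toy_cons *tortoise = list;
  long len = 0;

  while (hare != NULL)
    {
      hare = hare->cdr;
      len++;
      if (len % 2 == 0)
        tortoise = tortoise->cdr;
      if (hare != NULL && hare == tortoise)
        return -1;              /* circular list detected */
    }
  return len;
}
```

The safe version does a little extra work per element, which is the price
paid for being able to accept lists handed in by the user.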
685 File: internals.info, Node: Writing Lisp Primitives, Next: Adding Global Lisp Variables, Prev: General Coding Rules, Up: Rules When Writing New C Code
687 Writing Lisp Primitives
688 =======================
690 Lisp primitives are Lisp functions implemented in C. The details of
691 interfacing the C function so that Lisp can call it are handled by a few
692 C macros. The only way to really understand how to write new C code is
693 to read the source, but we can explain some things here.
695 An example of a special form is the definition of `prog1', from
696 `eval.c'. (An ordinary function would have the same general
     DEFUN ("prog1", Fprog1, 1, UNEVALLED, 0, /*
     Similar to `progn', but the value of the first form is returned.
     \(prog1 FIRST BODY...): All the arguments are evaluated sequentially.
     The value of FIRST is saved during evaluation of the remaining args,
     whose values are discarded.
     */
            (args))
     {
       /* This function can GC */
       REGISTER Lisp_Object val, form, tail;
       struct gcpro gcpro1;
       val = Feval (XCAR (args));
       GCPRO1 (val);
       LIST_LOOP_3 (form, XCDR (args), tail)
         Feval (form);
       UNGCPRO;
       return val;
     }
722 Let's start with a precise explanation of the arguments to the
723 `DEFUN' macro. Here is a template for them:
     DEFUN (LNAME, FNAME, MIN_ARGS, MAX_ARGS, INTERACTIVE, /*
     DOCSTRING
     */
        (ARGLIST))

LNAME
731 This string is the name of the Lisp symbol to define as the
732 function name; in the example above, it is `"prog1"'.
FNAME
735 This is the C function name for this function. This is the name
736 that is used in C code for calling the function. The name is, by
737 convention, `F' prepended to the Lisp name, with all dashes (`-')
738 in the Lisp name changed to underscores. Thus, to call this
739 function from C code, call `Fprog1'. Remember that the arguments
are of type `Lisp_Object'; various macros and functions for
creating values of type `Lisp_Object' are declared in the file
`lisp.h'.
744 Primitives whose names are special characters (e.g. `+' or `<')
745 are named by spelling out, in some fashion, the special character:
746 e.g. `Fplus()' or `Flss()'. Primitives whose names begin with
747 normal alphanumeric characters but also contain special characters
748 are spelled out in some creative way, e.g. `let*' becomes
751 Each function also has an associated structure that holds the data
752 for the subr object that represents the function in Lisp. This
753 structure conveys the Lisp symbol name to the initialization
754 routine that will create the symbol and store the subr object as
755 its definition. The C variable name of this structure is always
756 `S' prepended to the FNAME. You hardly ever need to be aware of
757 the existence of this structure, since `DEFUN' plus `DEFSUBR'
758 takes care of all the details.

MIN_ARGS
761 This is the minimum number of arguments that the function
762 requires. The function `prog1' allows a minimum of one argument.

MAX_ARGS
765 This is the maximum number of arguments that the function accepts,
766 if there is a fixed maximum. Alternatively, it can be `UNEVALLED',
767 indicating a special form that receives unevaluated arguments, or
768 `MANY', indicating an unlimited number of evaluated arguments (the
769 C equivalent of `&rest'). Both `UNEVALLED' and `MANY' are macros.
770 If MAX_ARGS is a number, it may not be less than MIN_ARGS and it
771 may not be greater than 8. (If you need to add a function with
more than 8 arguments, use the `MANY' form. Resist the urge to
edit the definition of `DEFUN' in `lisp.h'. If you do it anyway,
make sure to also add another clause to the switch statement in
`primitive_funcall()'.)

INTERACTIVE
778 This is an interactive specification, a string such as might be
779 used as the argument of `interactive' in a Lisp function. In the
780 case of `prog1', it is 0 (a null pointer), indicating that `prog1'
cannot be called interactively. A value of `""' indicates a
function that should receive no arguments when called
interactively.

DOCSTRING
786 This is the documentation string. It is written just like a
787 documentation string for a function defined in Lisp; in
788 particular, the first line should be a single sentence. Note how
789 the documentation string is enclosed in a comment, none of the
790 documentation is placed on the same lines as the comment-start and
791 comment-end characters, and the comment-start characters are on
792 the same line as the interactive specification. `make-docfile',
793 which scans the C files for documentation strings, is very
794 particular about what it looks for, and will not properly extract
795 the doc string if it's not in this exact format.
797 In order to make both `etags' and `make-docfile' happy, make sure
798 that the `DEFUN' line contains the LNAME and FNAME, and that the
799 comment-start characters for the doc string are on the same line
800 as the interactive specification, and put a newline directly after
801 them (and before the comment-end characters).

ARGLIST
804 This is the comma-separated list of arguments to the C function.
805 For a function with a fixed maximum number of arguments, provide a
806 C argument for each Lisp argument. In this case, unlike regular C
807 functions, the types of the arguments are not declared; they are
808 simply always of type `Lisp_Object'.
810 The names of the C arguments will be used as the names of the
811 arguments to the Lisp primitive as displayed in its documentation,
modulo the same concerns described above for `F...' names (in
particular, underscores in the C arguments become dashes in the
Lisp arguments).
816 There is one additional kludge: A trailing `_' on the C argument is
817 discarded when forming the Lisp argument. This allows C language
818 reserved words (like `default') or global symbols (like `dirname')
819 to be used as argument names without compiler warnings or errors.
821 A Lisp function with MAX_ARGS = `UNEVALLED' is a "special form";
822 its arguments are not evaluated. Instead it receives one argument
823 of type `Lisp_Object', a (Lisp) list of the unevaluated arguments,
824 conventionally named `(args)'.
826 When a Lisp function has no upper limit on the number of arguments,
827 specify MAX_ARGS = `MANY'. In this case its implementation in C
828 actually receives exactly two arguments: the number of Lisp
829 arguments (an `int') and the address of a block containing their
830 values (a `Lisp_Object *'). In this case only are the C types
831 specified in the ARGLIST: `(int nargs, Lisp_Object *args)'.
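
The two calling conventions described under ARGLIST can be modelled
in plain C without the XEmacs headers. The following is only a rough
sketch: `Lisp_Object' is stood in for by a `long', and the names
`toy_max2' and `toy_sum_many' are hypothetical, not XEmacs primitives.

```c
#include <assert.h>

/* Stand-in for the real Lisp_Object type (assumption: the real type
   comes from the XEmacs headers and is not a plain long). */
typedef long Lisp_Object;

/* Fixed maximum of two arguments: one C parameter per Lisp argument,
   each of type Lisp_Object. */
static Lisp_Object
toy_max2 (Lisp_Object a, Lisp_Object b)
{
  return a > b ? a : b;
}

/* MANY-style: the function receives the number of arguments and the
   address of a block holding their values. */
static Lisp_Object
toy_sum_many (int nargs, Lisp_Object *args)
{
  Lisp_Object total = 0;
  int i;

  assert (nargs >= 0);
  for (i = 0; i < nargs; i++)
    total += args[i];
  return total;
}
```

A caller of the second form builds the block itself, for example an
array of three values passed as `toy_sum_many (3, block)'.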
833 Within the function `Fprog1' itself, note the use of the macros
834 `GCPRO1' and `UNGCPRO'. `GCPRO1' is used to "protect" a variable from
835 garbage collection--to inform the garbage collector that it must look
836 in that variable and regard the object pointed at by its contents as an
837 accessible object. This is necessary whenever you call `Feval' or
838 anything that can directly or indirectly call `Feval' (this includes
839 the `QUIT' macro!). At such a time, any Lisp object that you intend to
840 refer to again must be protected somehow. `UNGCPRO' cancels the
841 protection of the variables that are protected in the current function.
842 It is necessary to do this explicitly.
844 The macro `GCPRO1' protects just one local variable. If you want to
845 protect two, use `GCPRO2' instead; repeating `GCPRO1' will not work.
846 Macros `GCPRO3' and `GCPRO4' also exist.
848 These macros implicitly use local variables such as `gcpro1'; you
849 must declare these explicitly, with type `struct gcpro'. Thus, if you
850 use `GCPRO2', you must declare `gcpro1' and `gcpro2'.
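
To see why the `gcproN' variables must be declared and why the
protection must be cancelled explicitly, it can help to model the
mechanism as a chain of stack-allocated links. The following is a toy
sketch under stated assumptions, not the definitions from the XEmacs
sources: the field layout of `struct gcpro', the `TOY_' macros, and
`toy_gcprolist' are all chosen here for illustration.

```c
#include <stddef.h>

typedef long Lisp_Object;       /* stand-in type (assumption) */

/* One link in the chain of protected variables (assumed layout). */
struct gcpro
{
  struct gcpro *next;           /* link registered before this one */
  Lisp_Object *var;             /* address of the protected variable */
  int nvars;                    /* number of variables at that address */
};

/* Head of the chain; a collector would walk it to find live objects. */
static struct gcpro *toy_gcprolist = NULL;

/* The macros refer to locals named gcpro1, gcpro2, ... which the
   calling function has to declare itself. */
#define TOY_GCPRO2(v1, v2)                                        \
  do {                                                            \
    gcpro1.next = toy_gcprolist; gcpro1.var = &(v1); gcpro1.nvars = 1; \
    gcpro2.next = &gcpro1;       gcpro2.var = &(v2); gcpro2.nvars = 1; \
    toy_gcprolist = &gcpro2;                                      \
  } while (0)

/* Restore the chain to its state before this function protected anything. */
#define TOY_UNGCPRO do { toy_gcprolist = gcpro1.next; } while (0)

/* Count the variables currently reachable through the chain. */
static int
toy_count_protected (void)
{
  int n = 0;
  struct gcpro *p;

  for (p = toy_gcprolist; p != NULL; p = p->next)
    n += p->nvars;
  return n;
}

/* Protect two locals, report how many variables are protected at that
   point, and leave the chain empty again on return. */
static int
toy_protect_two (void)
{
  Lisp_Object a = 1, b = 2;
  struct gcpro gcpro1, gcpro2;
  int seen;

  TOY_GCPRO2 (a, b);
  seen = toy_count_protected ();
  TOY_UNGCPRO;
  return seen;
}
```

In this model the links live in the caller's stack frame, so leaving
the function without `TOY_UNGCPRO' would leave the chain pointing at
dead storage. A one-variable macro written the same way would reuse
the single `gcpro1' link, which is one way to picture why repeating
`GCPRO1' cannot protect two variables.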
852 Note also that the general rule is "caller-protects"; i.e. you are
853 only responsible for protecting those Lisp objects that you create. Any
854 objects passed to you as arguments should have been protected by whoever
855 created them, so you don't in general have to protect them.
In particular, the arguments to any Lisp primitive are always
automatically `GCPRO'ed, when called "normally" from Lisp code or
bytecode. So only a few Lisp primitives that are called frequently from
C code, such as `Fprogn', protect their arguments as a service to their
caller. You don't need to protect your arguments when writing a new
`DEFUN'.
864 `GCPRO'ing is perhaps the trickiest and most error-prone part of
865 XEmacs coding. It is *extremely* important that you get this right and
866 use a great deal of discipline when writing this code. *Note
867 `GCPRO'ing: GCPROing, for full details on how to do this.
869 What `DEFUN' actually does is declare a global structure of type
870 `Lisp_Subr' whose name begins with capital `SF' and which contains
871 information about the primitive (e.g. a pointer to the function, its
872 minimum and maximum allowed arguments, a string describing its Lisp
873 name); `DEFUN' then begins a normal C function declaration using the
874 `F...' name. The Lisp subr object that is the function definition of a
875 primitive (i.e. the object in the function slot of the symbol that
876 names the primitive) actually points to this `SF' structure; when
877 `Feval' encounters a subr, it looks in the structure to find out how to
880 Defining the C function is not enough to make a Lisp primitive
881 available; you must also create the Lisp symbol for the primitive (the
882 symbol is "interned"; *note Obarrays::) and store a suitable subr
883 object in its function cell. (If you don't do this, the primitive won't
be seen by Lisp code.) The code looks like this:

     DEFSUBR (FNAME);

888 Here FNAME is the same name you used as the second argument to `DEFUN'.
890 This call to `DEFSUBR' should go in the `syms_of_*()' function at
891 the end of the module. If no such function exists, create it and make
892 sure to also declare it in `symsinit.h' and call it from the
893 appropriate spot in `main()'. *Note General Coding Rules::.
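
The division of labour between `DEFUN' and `DEFSUBR' can be pictured
with a small stand-alone model. Everything below is hypothetical and
greatly simplified: `struct toy_subr', `TOY_DEFUN', `TOY_DEFSUBR',
the lookup table and the `toy-double' primitive are invented for this
sketch and are not the real `Lisp_Subr' machinery.

```c
#include <stddef.h>
#include <string.h>

typedef long Lisp_Object;       /* stand-in type (assumption) */

/* Simplified stand-in for the per-primitive structure. */
struct toy_subr
{
  const char *lisp_name;
  int min_args, max_args;
  Lisp_Object (*fn) (Lisp_Object);
};

/* Declare the S-prefixed structure, then begin an ordinary C function
   definition under the F name. */
#define TOY_DEFUN(lname, Fname, min, max)                          \
  static Lisp_Object Fname (Lisp_Object);                          \
  static struct toy_subr S##Fname = { lname, min, max, Fname };    \
  static Lisp_Object Fname

/* A tiny table standing in for the obarray. */
static struct toy_subr *toy_table[16];
static int toy_table_len = 0;

static void
toy_register (struct toy_subr *subr)
{
  if (toy_table_len < 16)
    toy_table[toy_table_len++] = subr;
}

/* Make the primitive visible under its Lisp name. */
#define TOY_DEFSUBR(Fname) toy_register (&S##Fname)

static struct toy_subr *
toy_lookup (const char *lisp_name)
{
  int i;

  for (i = 0; i < toy_table_len; i++)
    if (strcmp (toy_table[i]->lisp_name, lisp_name) == 0)
      return toy_table[i];
  return NULL;
}

TOY_DEFUN ("toy-double", Ftoy_double, 1, 1) (Lisp_Object n)
{
  return 2 * n;
}

/* Counterpart of a syms_of_*() function for this toy module. */
static void
syms_of_toy (void)
{
  TOY_DEFSUBR (Ftoy_double);
}
```

Until `syms_of_toy ()' runs, `toy_lookup ("toy-double")' finds
nothing even though `Ftoy_double' already exists as a C function,
which mirrors the point that defining the C function alone is not
enough.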
895 Note that C code cannot call functions by name unless they are
896 defined in C. The way to call a function written in Lisp from C is to
897 use `Ffuncall', which embodies the Lisp function `funcall'. Since the
898 Lisp function `funcall' accepts an unlimited number of arguments, in C
899 it takes two: the number of Lisp-level arguments, and a one-dimensional
900 array containing their values. The first Lisp-level argument is the
901 Lisp function to call, and the rest are the arguments to pass to it.
902 Since `Ffuncall' can call the evaluator, you must protect pointers from
903 garbage collection around the call to `Ffuncall'. (However, `Ffuncall'
904 explicitly protects all of its parameters, so you don't have to protect
905 any pointers passed as parameters to it.)
907 The C functions `call0', `call1', `call2', and so on, provide handy
908 ways to call a Lisp function conveniently with a fixed number of
909 arguments. They work by calling `Ffuncall'.
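
The relationship between a `funcall'-style entry point and the
fixed-arity helpers can be sketched as follows. This is a
self-contained toy, assuming a made-up object representation; the
names `toy_funcall', `toy_call2' and `toy_add' are hypothetical and
only mimic the shape of `Ffuncall' and `call2'.

```c
#include <assert.h>
#include <stddef.h>

/* Toy object: either a number or a callable (assumed representation). */
typedef struct toy_object Lisp_Object;
struct toy_object
{
  long number;
  Lisp_Object (*fn) (int nargs, Lisp_Object *args);   /* NULL if not callable */
};

static Lisp_Object
toy_make_number (long n)
{
  Lisp_Object obj;
  obj.number = n;
  obj.fn = NULL;
  return obj;
}

static Lisp_Object
toy_make_function (Lisp_Object (*fn) (int, Lisp_Object *))
{
  Lisp_Object obj;
  obj.number = 0;
  obj.fn = fn;
  return obj;
}

/* Two C-level arguments: a count and an array.  args[0] is the
   function to call; the remaining elements are passed to it. */
static Lisp_Object
toy_funcall (int nargs, Lisp_Object *args)
{
  assert (nargs >= 1 && args[0].fn != NULL);
  return args[0].fn (nargs - 1, args + 1);
}

/* Fixed-arity convenience wrapper that works by building the array
   and calling toy_funcall. */
static Lisp_Object
toy_call2 (Lisp_Object fn, Lisp_Object arg1, Lisp_Object arg2)
{
  Lisp_Object args[3];

  args[0] = fn;
  args[1] = arg1;
  args[2] = arg2;
  return toy_funcall (3, args);
}

/* A sample callee that adds up all of its arguments. */
static Lisp_Object
toy_add (int nargs, Lisp_Object *args)
{
  long sum = 0;
  int i;

  for (i = 0; i < nargs; i++)
    sum += args[i].number;
  return toy_make_number (sum);
}
```

The wrapper exists purely for convenience: it spares the caller from
building the argument array by hand.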
911 `eval.c' is a very good file to look through for examples; `lisp.h'
912 contains the definitions for important macros and functions.
915 File: internals.info, Node: Adding Global Lisp Variables, Next: Coding for Mule, Prev: Writing Lisp Primitives, Up: Rules When Writing New C Code
917 Adding Global Lisp Variables
918 ============================
920 Global variables whose names begin with `Q' are constants whose
921 value is a symbol of a particular name. The name of the variable should
922 be derived from the name of the symbol using the same rules as for Lisp
923 primitives. These variables are initialized using a call to
924 `defsymbol()' in the `syms_of_*()' function. (This call interns a
925 symbol, sets the C variable to the resulting Lisp object, and calls
926 `staticpro()' on the C variable to tell the garbage-collection
927 mechanism about this variable. What `staticpro()' does is add a
928 pointer to the variable to a large global array; when
929 garbage-collection happens, all pointers listed in the array are used
930 as starting points for marking Lisp objects. This is important because
931 it's quite possible that the only current reference to the object is
932 the C variable. In the case of symbols, the `staticpro()' doesn't
933 matter all that much because the symbol is contained in `obarray',
934 which is itself `staticpro()'ed. However, it's possible that a naughty
935 user could do something like uninterning the symbol out of `obarray' or
936 even setting `obarray' to a different value [although this is likely to
937 make XEmacs crash!].)
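
The description of `staticpro()' above can be mimicked in a few lines
of stand-alone C. The sketch is an assumption-laden model, not the
XEmacs implementation: the object type, the fixed array size and the
names `toy_staticpro' and `toy_mark_roots' are invented here.

```c
#include <assert.h>
#include <stddef.h>

/* Toy heap object with a mark bit; Lisp_Object is a pointer to it
   (assumed representation). */
struct toy_object
{
  int marked;
};
typedef struct toy_object *Lisp_Object;

/* Global array of pointers to C variables that hold Lisp objects. */
#define TOY_MAX_ROOTS 8
static Lisp_Object *toy_roots[TOY_MAX_ROOTS];
static int toy_nroots = 0;

/* Record the address of the variable, not its current contents. */
static void
toy_staticpro (Lisp_Object *varaddress)
{
  assert (toy_nroots < TOY_MAX_ROOTS);
  toy_roots[toy_nroots++] = varaddress;
}

/* Start of a collection: whatever the registered variables point at
   right now is used as a starting point for marking. */
static void
toy_mark_roots (void)
{
  int i;

  for (i = 0; i < toy_nroots; i++)
    if (*toy_roots[i] != NULL)
      (*toy_roots[i])->marked = 1;
}
```

Because the array holds the address of the variable, an object
assigned to the variable after registration is still found when
marking starts; an object referenced only from an unregistered
variable is not.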
939 *Please note:* It is potentially deadly if you declare a `Q...'
940 variable in two different modules. The two calls to `defsymbol()' are
941 no problem, but some linkers will complain about multiply-defined
942 symbols. The most insidious aspect of this is that often the link will
943 succeed anyway, but then the resulting executable will sometimes crash
944 in obscure ways during certain operations! To avoid this problem,
945 declare any symbols with common names (such as `text') that are not
946 obviously associated with this particular module in the module
949 Global variables whose names begin with `V' are variables that
950 contain Lisp objects. The convention here is that all global variables
951 of type `Lisp_Object' begin with `V', and all others don't (including
952 integer and boolean variables that have Lisp equivalents). Most of the
953 time, these variables have equivalents in Lisp, but some don't. Those
954 that do are declared this way by a call to `DEFVAR_LISP()' in the
955 `vars_of_*()' initializer for the module. What this does is create a
956 special "symbol-value-forward" Lisp object that contains a pointer to
957 the C variable, intern a symbol whose name is as specified in the call
958 to `DEFVAR_LISP()', and set its value to the symbol-value-forward Lisp
959 object; it also calls `staticpro()' on the C variable to tell the
960 garbage-collection mechanism about the variable. When `eval' (or
961 actually `symbol-value') encounters this special object in the process
962 of retrieving a variable's value, it follows the indirection to the C
963 variable and gets its value. `setq' does similar things so that the C
964 variable gets changed.
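
The indirection through a forwarding object can be pictured with the
following toy model. It is a sketch only: `struct toy_symbol', its
`forward' field, and the functions `toy_symbol_value' and `toy_set'
are hypothetical simplifications, not the real symbol-value-forward
code.

```c
#include <stddef.h>

typedef long Lisp_Object;       /* stand-in type (assumption) */

/* A symbol either stores its value directly or forwards to a C variable. */
struct toy_symbol
{
  const char *name;
  Lisp_Object value;            /* used when forward is NULL */
  Lisp_Object *forward;         /* address of a C variable, or NULL */
};

/* The C variable that the rest of the C code reads and writes. */
static Lisp_Object Vtoy_limit = 10;

/* What a DEFVAR_LISP-like call would set up for "toy-limit". */
static struct toy_symbol toy_limit_symbol = { "toy-limit", 0, &Vtoy_limit };

/* Retrieving the value follows the indirection to the C variable. */
static Lisp_Object
toy_symbol_value (const struct toy_symbol *sym)
{
  return sym->forward != NULL ? *sym->forward : sym->value;
}

/* Setting the value writes through to the C variable as well. */
static void
toy_set (struct toy_symbol *sym, Lisp_Object newval)
{
  if (sym->forward != NULL)
    *sym->forward = newval;
  else
    sym->value = newval;
}
```

Both directions stay in sync because there is only one storage
location: a change made from the Lisp side is visible in
`Vtoy_limit', and an assignment to `Vtoy_limit' in C is what the Lisp
side reads next.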
966 Whether or not you `DEFVAR_LISP()' a variable, you need to
967 initialize it in the `vars_of_*()' function; otherwise it will end up
968 as all zeroes, which is the integer 0 (_not_ `nil'), and this is
969 probably not what you want. Also, if the variable is not
970 `DEFVAR_LISP()'ed, *you must call* `staticpro()' on the C variable in
971 the `vars_of_*()' function. Otherwise, the garbage-collection
972 mechanism won't know that the object in this variable is in use, and
973 will happily collect it and reuse its storage for another Lisp object,
974 and you will be the one who's unhappy when you can't figure out how
975 your variable got overwritten.
978 File: internals.info, Node: Coding for Mule, Next: Techniques for XEmacs Developers, Prev: Adding Global Lisp Variables, Up: Rules When Writing New C Code

Coding for Mule
===============

983 Although Mule support is not compiled by default in XEmacs, many
984 people are using it, and we consider it crucial that new code works
985 correctly with multibyte characters. This is not hard; it is only a
986 matter of following several simple user-interface guidelines. Even if
987 you never compile with Mule, with a little practice you will find it
988 quite easy to code Mule-correctly.
990 Note that these guidelines are not necessarily tied to the current
991 Mule implementation; they are also a good idea to follow on the grounds
992 of code generalization for future I18N work.

* Menu:

996 * Character-Related Data Types::
997 * Working With Character and Byte Positions::
998 * Conversion to and from External Data::
999 * General Guidelines for Writing Mule-Aware Code::
1000 * An Example of Mule-Aware Code::
1003 File: internals.info, Node: Character-Related Data Types, Next: Working With Character and Byte Positions, Up: Coding for Mule
1005 Character-Related Data Types
1006 ----------------------------
1008 First, let's review the basic character-related datatypes used by
1009 XEmacs. Note that the separate `typedef's are not mandatory in the
1010 current implementation (all of them boil down to `unsigned char' or
1011 `int'), but they improve clarity of code a great deal, because one
1012 glance at the declaration can tell the intended use of the variable.

`Emchar'
1015 An `Emchar' holds a single Emacs character.
1017 Obviously, the equality between characters and bytes is lost in
1018 the Mule world. Characters can be represented by one or more
bytes in the buffer, and `Emchar' is the C type large enough to
hold any character.

Without Mule support, an `Emchar' is equivalent to an `unsigned
char'.

`Bufbyte'
1026 The data representing the text in a buffer or string is logically
1027 a set of `Bufbyte's.
XEmacs does not use a single character format throughout; when
reading characters from the outside, it decodes them to an
internal format, and likewise encodes them when writing.
`Bufbyte' (in fact `unsigned char') is the basic unit of the
internal format used by XEmacs buffers and strings.

One character can correspond to one or more `Bufbyte's. In the
current implementation, an ASCII character is represented by a
single `Bufbyte' of the same value, and extended characters are
represented by a sequence of `Bufbyte's.
1040 Without Mule support, a `Bufbyte' is equivalent to an `Emchar'.

`Bufpos'
`Charcount'
1044 A `Bufpos' represents a character position in a buffer or string.
1045 A `Charcount' represents a number (count) of characters.
Logically, subtracting two `Bufpos' values yields a `Charcount'
value. Although both of these are `typedef'ed to `int', we use
them in preference to `int' to make it clear what sort of position
is being used.

`Bufpos' and `Charcount' values are the only ones that are ever
visible to Lisp.

`Bytind'
`Bytecount'
1056 A `Bytind' represents a byte position in a buffer or string. A
1057 `Bytecount' represents the distance between two positions in bytes.
1058 The relationship between `Bytind' and `Bytecount' is the same as
1059 the relationship between `Bufpos' and `Charcount'.

`Extbyte'
`Extcount'
1063 When dealing with the outside world, XEmacs works with `Extbyte's,
1064 which are equivalent to `unsigned char'. Obviously, an `Extcount'
1065 is the distance between two `Extbyte's. Extbytes and Extcounts
1066 are not all that frequent in XEmacs code.
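
The difference between counting characters and counting bytes is
easiest to see on a concrete buffer. The sketch below declares its
own `typedef's in the spirit of the ones described above and uses a
deliberately made-up encoding rule (any byte of 0xA0 or above
continues the previous character); that rule and the `toy_'
functions are assumptions for illustration and should not be read as
the actual Mule internal format.

```c
#include <assert.h>

typedef unsigned char Bufbyte;  /* unit of internal text */
typedef int Charcount;          /* a number of characters */
typedef int Bytecount;          /* a number of bytes */

/* Toy rule (assumption): a byte below 0xA0 starts a new character,
   a byte of 0xA0 or above continues the previous one. */
static int
toy_starts_char (Bufbyte b)
{
  return b < 0xA0;
}

/* How many characters are stored in the first LEN bytes of TEXT? */
static Charcount
toy_bytecount_to_charcount (const Bufbyte *text, Bytecount len)
{
  Charcount chars = 0;
  Bytecount i;

  assert (len >= 0);
  for (i = 0; i < len; i++)
    if (toy_starts_char (text[i]))
      chars++;
  return chars;
}

/* How many bytes do the first NCHARS characters of TEXT occupy? */
static Bytecount
toy_charcount_to_bytecount (const Bufbyte *text, Bytecount len,
                            Charcount nchars)
{
  Bytecount i = 0;
  Charcount seen = 0;

  assert (len >= 0 && nchars >= 0);
  while (i < len)
    {
      if (toy_starts_char (text[i]))
        {
          if (seen == nchars)
            break;              /* reached the start of the next character */
          seen++;
        }
      i++;
    }
  return i;
}
```

With distinct type names, a signature such as the second one states
at a glance which quantities are byte counts and which are character
counts, even though both are plain `int's underneath.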