This is Info file ../../info/internals.info, produced by Makeinfo
version 1.68 from the input file internals.texi.

INFO-DIR-SECTION XEmacs Editor
START-INFO-DIR-ENTRY
* Internals: (internals).       XEmacs Internals Manual.
END-INFO-DIR-ENTRY

   Copyright (C) 1992 - 1996 Ben Wing. Copyright (C) 1996, 1997 Sun
Microsystems. Copyright (C) 1994 - 1998 Free Software Foundation.
Copyright (C) 1994, 1995 Board of Trustees, University of Illinois.

   Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

   Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided that the
entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

   Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that this permission notice may be stated in a
translation approved by the Foundation.

   Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided also
that the section entitled "GNU General Public License" is included
exactly as in the original, and provided that the entire resulting
derived work is distributed under the terms of a permission notice
identical to this one.

   Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that the section entitled "GNU General Public License"
may be included in a translation approved by the Free Software
Foundation instead of in the original English.

File: internals.info,  Node: Main Loop,  Next: Specifics of the Event Gathering Mechanism,  Prev: Introduction to Events,  Up: Events and the Event Loop

Main Loop
=========

   The "command loop" is the top-level loop that the editor is always
running. It loops endlessly, calling `next-event' to retrieve an event
and `dispatch-event' to execute it. `dispatch-event' does the
appropriate thing with non-user events (process, timeout, magic, eval,
mouse motion); this involves calling a Lisp handler function, redrawing
a newly-exposed part of a frame, reading subprocess output, etc. For
user events, `dispatch-event' looks up the event in relevant keymaps or
menubars; when a full key sequence or menubar selection is reached, the
appropriate function is executed. `dispatch-event' may have to keep
state across calls; this is done in the "command-builder" structure
associated with each console (remember, there's usually only one
console), and the engine that looks up keystrokes and constructs full
key sequences is called the "command builder". This is documented
elsewhere.

   The guts of the command loop are in `command_loop_1()'. This
function doesn't catch errors, though - that's the job of
`command_loop_2()', which is a condition-case (i.e. error-trapping)
wrapper around `command_loop_1()'. `command_loop_1()' never returns,
but may get thrown out of.

   When an error occurs, `cmd_error()' is called, which usually invokes
the Lisp error handler in `command-error'; however, a default error
handler is provided if `command-error' is `nil' (e.g. during startup).
The purpose of the error handler is simply to display the error message
and do associated cleanup; it does not need to throw anywhere. When the
error handler finishes, the condition-case in `command_loop_2()' will
finish and `command_loop_2()' will reinvoke `command_loop_1()'.

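   The relationship between the two functions can be sketched in plain
C. This is only an illustration under stated assumptions, not the
XEmacs source: every `toy_' name is hypothetical, events come from a
scripted array, and `setjmp()'/`longjmp()' stand in for the real
condition-case. Unlike the real `command_loop_1()', the toy inner loop
returns when its script runs out.

```c
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical stand-ins for this sketch; none of these names exist
   in XEmacs. */
enum toy_event { TOY_KEY, TOY_ERROR, TOY_END };

static jmp_buf toy_error_trap;     /* plays the role of the condition-case */
static size_t toy_pos;             /* index of the next scripted event */
static int toy_errors_handled;     /* how often the error handler ran */
static int toy_events_dispatched;  /* how many events ran to completion */

/* Like cmd_error(): report and clean up; it does not throw anywhere. */
static void toy_cmd_error(void)
{
  toy_errors_handled++;
}

/* Like command_loop_1(): fetch and dispatch, with no error trapping. */
static void toy_command_loop_1(const enum toy_event *script)
{
  for (;;) {
    enum toy_event ev = script[toy_pos++];  /* stands in for next-event */
    if (ev == TOY_END)
      return;                               /* toy-only way out */
    if (ev == TOY_ERROR)
      longjmp(toy_error_trap, 1);           /* an error is signalled */
    toy_events_dispatched++;                /* stands in for dispatch-event */
  }
}

/* Like command_loop_2(): trap errors, run the handler, reinvoke. */
static void toy_command_loop_2(const enum toy_event *script)
{
  toy_pos = 0;
  toy_errors_handled = 0;
  toy_events_dispatched = 0;
  if (setjmp(toy_error_trap) != 0)
    toy_cmd_error();            /* we arrive here after each error */
  toy_command_loop_1(script);   /* first invocation or reinvocation */
}
```

   Running `toy_command_loop_2()' on a script with two `TOY_ERROR'
entries calls the handler twice and still dispatches every `TOY_KEY'
entry, because the inner loop is restarted after each error.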
   `command_loop_2()' is invoked from three places: from
`initial_command_loop()' (called from `main()' at the end of internal
initialization), from the Lisp function `recursive-edit', and from
`call_command_loop()'.

   `call_command_loop()' is called when a macro is started and when the
minibuffer is entered; normal termination of the macro or minibuffer
causes a throw out of the recursive command loop. (To
`execute-kbd-macro' for macros and `exit' for minibuffers. Note also
that the low-level minibuffer-entering function,
`read-minibuffer-internal', provides its own error handling and does
not need `command_loop_2()''s error encapsulation; so it tells
`call_command_loop()' to invoke `command_loop_1()' directly.)

   Note that both `read-minibuffer-internal' and `recursive-edit' set
up a catch for `exit'; this is why `abort-recursive-edit', which throws
to this catch, exits out of either one.

   `initial_command_loop()', called from `main()', sets up a catch for
`top-level' when invoking `command_loop_2()', allowing functions to
throw all the way to the top level if they really need to. Before
invoking `command_loop_2()', `initial_command_loop()' calls
`top_level_1()', which handles all of the startup stuff (creating the
initial frame, handling the command-line options, loading the user's
`.emacs' file, etc.). The function that actually does this is in Lisp
and is pointed to by the variable `top-level'; normally this function
is `normal-top-level'. `top_level_1()' is just an error-handling
wrapper similar to `command_loop_2()'. Note also that
`initial_command_loop()' sets up a catch for `top-level' when invoking
`top_level_1()', just like when it invokes `command_loop_2()'.

File: internals.info,  Node: Specifics of the Event Gathering Mechanism,  Next: Specifics About the Emacs Event,  Prev: Main Loop,  Up: Events and the Event Loop

Specifics of the Event Gathering Mechanism
==========================================

   Here is an approximate diagram of the collection processes at work
in XEmacs, under TTY's (TTY's are simpler than X so we'll look at this
first):

 asynch.     asynch.     asynch.   asynch.              [Collectors in
kbd events  kbd events   process   process                 the OS]
    |           |        output    output
    |           |           |         |
    |           |           |         |     SIGINT,    [signal handlers
    |           |           |         |     SIGQUIT,      in XEmacs]
    V           V           V         V     SIGWINCH,
   file        file        file      file   SIGALRM
  desc.       desc.       desc.     desc.      |
  (TTY)       (TTY)      (pipe)    (pipe)      |
    |           |           |         |      fake     timeouts
    |           |           |         |      file         |
    |           |           |         |      desc.        |
    |           |           |         |     (pipe)        |
    |           |           |         |        |          |
    V           V           V         V        V          V
    ------>-----------<----------------<-------------------
                |
                |
                | [collected using select() in emacs_tty_next_event()
                |  and converted to the appropriate Emacs event]
                |
                V          (above this line is TTY-specific)
              Emacs -----------------------------------------------
              event (below this line is the generic event mechanism)
                |
                |
was there     if not, call
a SIGINT?  emacs_tty_next_event()
    |           |
    |           |
    V           V
    ---->---<----
          |
          |     [collected in event_stream_next_event();
          |      SIGINT is converted using maybe_read_quit_event()]
          V
        Emacs
        event
          |
          \---->------>----- maybe_kbd_translate() ---->---\
                                                           |
                                                           |
     command event queue                                   |
                                                  if not from command
  (contains events that were                       event queue, call
  read earlier but not processed,              event_stream_next_event()
  typically when waiting in a                              |
  sit-for, sleep-for, etc. for                             |
  a particular event to be received)                       |
               |                                           |
               |                                           |
               V                                           V
               ---->-----------------------------------<----
                                               |
                                               | [collected in
                                               |  next_event_internal()]
                                               |
 unread-     unread-       event from          |
 command-    command-       keyboard      else, call
 events      event           macro   next_event_internal()
   |           |               |               |
   |           |               |               |
   V           V               V               V
   --------->----------------------<------------
                     |
                     |      [collected in `next-event', which may loop
                     |       more than once if the event it gets is on
                     |       a dead frame, device, etc.]
                     |
                     V
      feed into top-level event loop,
    which repeatedly calls `next-event'
       and then dispatches the event
          using `dispatch-event'

   Notice the separation between the TTY-specific and the generic
event mechanism. When using the Xt-based event loop, the TTY-specific
stuff is replaced but the rest stays the same.

   It's also important to realize that only one kind of system-specific
event loop can be operating at a time, and it must be able to receive
all kinds of events simultaneously. For the two existing event loops
(implemented in `event-tty.c' and `event-Xt.c', respectively), the TTY
event loop *only* handles TTY consoles, while the Xt event loop handles
*both* TTY and X consoles. This situation is different from all of the
output handlers, where you simply have one per console type.

   Here's the Xt Event Loop Diagram (notice that below a certain point,
it's the same as the above diagram):

asynch. asynch.  asynch.  asynch.              [Collectors in
  kbd     kbd    process  process                 the OS]
events  events   output   output
   |       |        |        |
   |       |        |        |     asynch.  asynch.  [Collectors in the
   |       |        |        |        X        X      OS and X Window System]
   |       |        |        |     events   events
   |       |        |        |        |        |
   |       |        |        |        |        |    SIGINT,   [signal handlers
   |       |        |        |        |        |    SIGQUIT,     in XEmacs]
   |       |        |        |        |        |    SIGWINCH,
   |       |        |        |        |        |    SIGALRM
   |       |        |        |        |        |       |
   |       |        |        |        |        |       |     timeouts
   |       |        |        |        |        |       |         |
   |       |        |        |        |        |       V         |
   V       V        V        V        V        V     fake        |
  file    file     file     file     file     file   file        |
 desc.   desc.    desc.    desc.    desc.    desc.   desc.       |
 (TTY)   (TTY)   (pipe)   (pipe)  (socket) (socket) (pipe)       |
   |       |        |        |        |        |       |         |
   |       |        |        |        |        |       |         |
   V       V        V        V        V        V       V         V
   --->----------------------------------------<---------<--------
        |               |               |
        |               |               |[collected using select() in
        |               |               | _XtWaitForSomething(), called
        |               |               | from XtAppProcessEvent(), called
        |               |               | in emacs_Xt_next_event();
        |               |               | dispatched to various callbacks]
        |               |               |
    emacs_Xt_     p_s_callback(),       | [popup_selection_callback]
 event_handler()  x_u_v_s_callback(),   | [x_update_vertical_scrollbar_
        |         x_u_h_s_callback(),   |  callback]
        |         search_callback()     | [x_update_horizontal_scrollbar_
        |               |               |  callback]
        |               |               |
   enqueue_Xt_    signal_special_       |
 dispatch_event() Xt_user_event()       |
 [maybe multiple        |               |
  times, maybe 0        |               |
  times]                |               |
        |          enqueue_Xt_          |
        |        dispatch_event()       |
        |               |               |
        V               V               |
        --->---------<---               |
                |                       |
                |                       |
            dispatch           Xt_what_callback()
              event                sets flags
              queue                     |
                |                       |
                |                       |
                ---->-----------<--------
                     |
                     |     [collected and converted as appropriate in
                     |            emacs_Xt_next_event()]
                     |
                     V          (above this line is Xt-specific)
                   Emacs ------------------------------------------------
                   event (below this line is the generic event mechanism)
                     |
                     |
     was there     if not, call
     a SIGINT?  emacs_Xt_next_event()
         |           |
         |           |
         V           V
         ---->---<----
               |
               |     [collected in event_stream_next_event();
               |      SIGINT is converted using maybe_read_quit_event()]
               V
             Emacs
             event
               |
               \---->------>----- maybe_kbd_translate() -->-----\
                                                                |
                                                                |
          command event queue                                   |
                                                       if not from command
       (contains events that were                       event queue, call
       read earlier but not processed,              event_stream_next_event()
       typically when waiting in a                              |
       sit-for, sleep-for, etc. for                             |
       a particular event to be received)                       |
                    |                                           |
                    |                                           |
                    V                                           V
                    ---->-----------------------------------<----
                                                    |
                                                    | [collected in
                                                    |  next_event_internal()]
                                                    |
      unread-     unread-       event from          |
      command-    command-       keyboard      else, call
      events      event           macro   next_event_internal()
        |           |               |               |
        |           |               |               |
        V           V               V               V
        --------->----------------------<------------
                          |
                          |      [collected in `next-event', which may loop
                          |       more than once if the event it gets is on
                          |       a dead frame, device, etc.]
                          |
                          V
           feed into top-level event loop,
         which repeatedly calls `next-event'
            and then dispatches the event
               using `dispatch-event'


File: internals.info,  Node: Specifics About the Emacs Event,  Next: The Event Stream Callback Routines,  Prev: Specifics of the Event Gathering Mechanism,  Up: Events and the Event Loop

Specifics About the Emacs Event
===============================


File: internals.info,  Node: The Event Stream Callback Routines,  Next: Other Event Loop Functions,  Prev: Specifics About the Emacs Event,  Up: Events and the Event Loop

The Event Stream Callback Routines
==================================


File: internals.info,  Node: Other Event Loop Functions,  Next: Converting Events,  Prev: The Event Stream Callback Routines,  Up: Events and the Event Loop

Other Event Loop Functions
==========================

   `detect_input_pending()' and `input-pending-p' look for input by
calling `event_stream->event_pending_p' and looking in
`[V]unread-command-event' and the `command_event_queue' (they do not
check for an executing keyboard macro, though).

   `discard-input' cancels any command events pending (and any keyboard
macros currently executing), and puts the others onto the
`command_event_queue'. There is a comment about a "race condition",
which is not a good sign.

   `next-command-event' and `read-char' are higher-level interfaces to
`next-event'. `next-command-event' gets the next "command" event (i.e.
keypress, mouse event, menu selection, or scrollbar action), calling
`dispatch-event' on any others.
`read-char' calls `next-command-event' and uses `event_to_character()'
to return the character equivalent. With the right kind of input method
support, it is possible for `(read-char)' to return a Kanji character.


File: internals.info,  Node: Converting Events,  Next: Dispatching Events; The Command Builder,  Prev: Other Event Loop Functions,  Up: Events and the Event Loop

Converting Events
=================

   `character_to_event()', `event_to_character()',
`event-to-character', and `character-to-event' convert between
characters and keypress events corresponding to the characters. If the
event was not a keypress, `event_to_character()' returns -1 and
`event-to-character' returns `nil'. These functions convert between
character representation and the split-up event representation (keysym
plus mod keys).


File: internals.info,  Node: Dispatching Events; The Command Builder,  Prev: Converting Events,  Up: Events and the Event Loop

Dispatching Events; The Command Builder
=======================================

   Not yet documented.


File: internals.info,  Node: Evaluation; Stack Frames; Bindings,  Next: Symbols and Variables,  Prev: Events and the Event Loop,  Up: Top

Evaluation; Stack Frames; Bindings
**********************************

* Menu:

* Evaluation::
* Dynamic Binding; The specbinding Stack; Unwind-Protects::
* Simple Special Forms::
* Catch and Throw::


File: internals.info,  Node: Evaluation,  Next: Dynamic Binding; The specbinding Stack; Unwind-Protects,  Up: Evaluation; Stack Frames; Bindings

Evaluation
==========

   `Feval()' evaluates the form (a Lisp object) that is passed to it.
Note that evaluation is only non-trivial for two types of objects:
symbols and conses. A symbol is evaluated simply by calling
`symbol-value' on it and returning the value.

   Evaluating a cons means calling a function. First, `eval' checks to
see if garbage-collection is necessary, and calls `garbage_collect_1()'
if so. It then increases the evaluation depth by 1 (`lisp_eval_depth',
which is always less than `max_lisp_eval_depth') and adds an element to
the linked list of `struct backtrace''s (`backtrace_list'). Each such
structure contains a pointer to the function being called plus a list
of the function's arguments. Originally these values are stored
unevalled, and as they are evaluated, the backtrace structure is
updated. Garbage collection pays attention to the objects pointed to in
the backtrace structures (garbage collection might happen while a
function is being called or while an argument is being evaluated, and
there could easily be no other references to the arguments in the
argument list; once an argument is evaluated, however, the unevalled
version is not needed by eval, and so the backtrace structure is
changed).

   At this point, the function to be called is determined by looking at
the car of the cons (if this is a symbol, its function definition is
retrieved and the process repeated). The function should then consist
of either a `Lisp_Subr' (built-in function written in C), a
`Lisp_Compiled_Function' object, or a cons whose car is one of the
symbols `autoload', `macro' or `lambda'.

   If the function is a `Lisp_Subr', the lisp object points to a
`struct Lisp_Subr' (created by `DEFUN()'), which contains a pointer to
the C function, a minimum and maximum number of arguments (or possibly
the special constants `MANY' or `UNEVALLED'), a pointer to the symbol
referring to that subr, and a couple of other things. If the subr wants
its arguments `UNEVALLED', they are passed raw as a list. Otherwise, an
array of evaluated arguments is created and put into the backtrace
structure, and either passed whole (`MANY') or each argument is passed
as a C argument.

   If the function is a `Lisp_Compiled_Function',
`funcall_compiled_function()' is called. If the function is a lambda
list, `funcall_lambda()' is called. If the function is a macro, [.....
fill in] is done.
If the function is an autoload, `do_autoload()' is called to load the
definition and then eval starts over [explain this more].

   When `Feval()' exits, the evaluation depth is reduced by one, the
debugger is called if appropriate, and the current backtrace structure
is removed from the list.

   Both `funcall_compiled_function()' and `funcall_lambda()' need to go
through the list of formal parameters to the function and bind them to
the actual arguments, checking for `&rest' and `&optional' symbols in
the formal parameters and making sure the number of actual arguments is
correct. `funcall_compiled_function()' can do this a little more
efficiently, since the formal parameter list can be checked for sanity
when the compiled function object is created.

   `funcall_lambda()' simply calls `Fprogn' to execute the code in the
lambda list.

   `funcall_compiled_function()' calls the real byte-code interpreter
`execute_optimized_program()' on the byte-code instructions, which are
converted into an internal form for faster execution.

   When a compiled function is executed for the first time by
`funcall_compiled_function()', or when it is `Fpurecopy()'ed during the
dump phase of building XEmacs, the byte-code instructions are converted
from a `Lisp_String' (which is inefficient to access, especially in the
presence of MULE) into a `Lisp_Opaque' object containing an array of
unsigned char, which can be directly executed by the byte-code
interpreter. At this time the byte code is also analyzed for validity
and transformed into a more optimized form, so that
`execute_optimized_program()' can really fly.

   Here are some of the optimizations performed by the internal
byte-code transformer:

  1. References to the `constants' array are checked for out-of-range
     indices, so that the byte interpreter doesn't have to.

  2. References to the `constants' array that will be used as a Lisp
     variable are checked for being correct non-constant (i.e. not `t',
     `nil', or `keywordp') symbols, so that the byte interpreter
     doesn't have to.

  3. The maximum number of variable bindings in the byte-code is
     pre-computed, so that space on the `specpdl' stack can be
     pre-reserved once for the whole function execution.

  4. All byte-code jumps are relative to the current program counter
     instead of the start of the program, thereby saving a register.

  5. One-byte relative jumps are converted from the byte-code form of
     unsigned chars offset by 127 to machine-friendly signed chars.

   Of course, this transformation of the `instructions' should not be
visible to the user, so `Fcompiled_function_instructions()' needs to
know how to convert the optimized opaque object back into a Lisp string
that is identical to the original string from the `.elc' file.
(Actually, the resulting string may (rarely) contain slightly
different, yet equivalent, byte code.)

   `Ffuncall()' implements Lisp `funcall'. `(funcall fun x1 x2 x3
...)' is equivalent to `(eval (list fun (quote x1) (quote x2) (quote
x3) ...))'. `Ffuncall()' contains its own code to do the evaluation,
however, and is very similar to `Feval()'.

   From the performance point of view, it is worth knowing that most of
the time in Lisp evaluation is spent executing `Lisp_Subr' and
`Lisp_Compiled_Function' objects via `Ffuncall()' (not `Feval()').

   `Fapply()' implements Lisp `apply', which is very similar to
`funcall' except that if the last argument is a list, the result is the
same as if each of the arguments in the list had been passed
separately. `Fapply()' does some business to expand the last argument
if it's a list, then calls `Ffuncall()' to do the work.

   `apply1()', `call0()', `call1()', `call2()', and `call3()' call a
function, passing it the argument(s) given (the arguments are given as
separate C arguments rather than being passed as an array). `apply1()'
uses `Fapply()' while the others use `Ffuncall()' to do the real work.

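   The way `apply' reduces to `funcall' can be sketched in a few lines
of C. This is a simplified illustration, not the real `Fapply()': the
`toy_' names, the use of `long' values in place of `Lisp_Object', the
fixed `TOY_MAX_ARGS' limit, and the silent truncation past that limit
are all assumptions made for the example.

```c
#include <stddef.h>

#define TOY_MAX_ARGS 16   /* arbitrary limit for this sketch */

/* Toy function type: receives its arguments as an array, in the
   spirit of a MANY subr. */
typedef long (*toy_fn)(int nargs, const long *args);

/* Toy list cell standing in for a cons; NULL plays the role of nil. */
struct toy_cons {
  long car;
  const struct toy_cons *cdr;
};

/* In the spirit of Ffuncall(): call FN with exactly the arguments given. */
static long toy_funcall(toy_fn fn, int nargs, const long *args)
{
  return fn(nargs, args);
}

/* In the spirit of Fapply(): copy the leading arguments, spread the
   trailing list into separate arguments, then let toy_funcall() do
   the work.  Arguments beyond TOY_MAX_ARGS are silently dropped. */
static long toy_apply(toy_fn fn, int nfixed, const long *fixed,
                      const struct toy_cons *last)
{
  long args[TOY_MAX_ARGS];
  int n = 0;
  int i;

  for (i = 0; i < nfixed && n < TOY_MAX_ARGS; i++)
    args[n++] = fixed[i];
  for (; last != NULL && n < TOY_MAX_ARGS; last = last->cdr)
    args[n++] = last->car;
  return toy_funcall(fn, n, args);
}

/* Example callee: adds up all of its arguments. */
static long toy_sum(int nargs, const long *args)
{
  long total = 0;
  int i;

  for (i = 0; i < nargs; i++)
    total += args[i];
  return total;
}
```

   With leading arguments 1 and 2 and a trailing list holding 3 and 4,
`toy_apply()' ends up calling `toy_sum' with four separate arguments,
which mirrors `(apply '+ 1 2 '(3 4))'.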

File: internals.info,  Node: Dynamic Binding; The specbinding Stack; Unwind-Protects,  Next: Simple Special Forms,  Prev: Evaluation,  Up: Evaluation; Stack Frames; Bindings

Dynamic Binding; The specbinding Stack; Unwind-Protects
=======================================================

     struct specbinding
     {
       Lisp_Object symbol;
       Lisp_Object old_value;
       Lisp_Object (*func) (Lisp_Object); /* for unwind-protect */
     };

   `struct specbinding' is used for local-variable bindings and
unwind-protects. `specpdl' holds an array of `struct specbinding''s,
`specpdl_ptr' points to the beginning of the free bindings in the
array, `specpdl_size' specifies the total number of binding slots in
the array, and `max_specpdl_size' specifies the maximum number of
bindings the array can be expanded to hold. `grow_specpdl()' increases
the size of the `specpdl' array, multiplying its size by 2 but never
exceeding `max_specpdl_size' (except that if this number is less than
400, it is first set to 400).

   `specbind()' binds a symbol to a value and is used for local
variables and `let' forms. The symbol and its old value (which might be
`Qunbound', indicating no prior value) are recorded in the specpdl
array, and `specpdl_ptr' is advanced by one slot.

   `record_unwind_protect()' implements an "unwind-protect", which,
when placed around a section of code, ensures that some specified
cleanup routine will be executed even if the code exits abnormally
(e.g. through a `throw' or quit). `record_unwind_protect()' simply adds
a new specbinding to the `specpdl' array and stores the appropriate
information in it. The cleanup routine can either be a C function,
which is stored in the `func' field, or a `progn' form, which is stored
in the `old_value' field.

   `unbind_to()' removes specbindings from the `specpdl' array until
the specified position is reached. Each specbinding can be one of three
types:

  1. an unwind-protect with a C cleanup function (`func' is not 0, and
     `old_value' holds an argument to be passed to the function);

  2. an unwind-protect with a Lisp form (`func' is 0, `symbol' is
     `nil', and `old_value' holds the form to be executed with
     `Fprogn()'); or

  3. a local-variable binding (`func' is 0, `symbol' is not `nil', and
     `old_value' holds the old value, which is stored as the symbol's
     value).


File: internals.info,  Node: Simple Special Forms,  Next: Catch and Throw,  Prev: Dynamic Binding; The specbinding Stack; Unwind-Protects,  Up: Evaluation; Stack Frames; Bindings

Simple Special Forms
====================

   `or', `and', `if', `cond', `progn', `prog1', `prog2', `setq',
`quote', `function', `let*', `let', `while'

   All of these are very simple and work as expected, calling `Feval()'
or `Fprogn()' as necessary and (in the case of `let' and `let*') using
`specbind()' to create bindings and `unbind_to()' to undo the bindings
when finished.

   Note that, with the exception of `Fprogn', these functions are
typically called in real life only in interpreted code, since the byte
compiler knows how to convert calls to these functions directly into
byte code.


File: internals.info,  Node: Catch and Throw,  Prev: Simple Special Forms,  Up: Evaluation; Stack Frames; Bindings

Catch and Throw
===============

     struct catchtag
     {
       Lisp_Object tag;
       Lisp_Object val;
       struct catchtag *next;
       struct gcpro *gcpro;
       jmp_buf jmp;
       struct backtrace *backlist;
       int lisp_eval_depth;
       int pdlcount;
     };

   `catch' is a Lisp function that places a catch around a body of
code. A catch is a means of non-local exit from the code. When a catch
is created, a tag is specified, and executing a `throw' to this tag
will exit from the body of code caught with this tag, and its value
will be the value given in the call to `throw'. If there is no such
call, the code will be executed normally.

   Information pertaining to a catch is held in a `struct catchtag',
which is placed at the head of a linked list pointed to by `catchlist'.
`internal_catch()' is passed a C function to call (`Fprogn()' when Lisp
`catch' is called) and arguments to give it, and places a catch around
the function. Each `struct catchtag' is held in the stack frame of the
`internal_catch()' instance that created the catch.

   `internal_catch()' is fairly straightforward. It stores into the
`struct catchtag' the tag name and the current values of
`backtrace_list', `lisp_eval_depth', `gcprolist', and the offset into
the `specpdl' array, sets a jump point with `_setjmp()' (storing the
jump point into the `struct catchtag'), and calls the function. Control
will return to `internal_catch()' either when the function exits
normally or through a `_longjmp()' to this jump point. In the latter
case, `throw' will store the value to be returned into the `struct
catchtag' before jumping. When it's done, `internal_catch()' removes
the `struct catchtag' from the catchlist and returns the proper value.

   `Fthrow()' goes up through the catchlist until it finds one with a
matching tag. It then calls `unbind_catch()' to restore everything to
what it was when the appropriate catch was set, stores the return value
in the `struct catchtag', and jumps (with `_longjmp()') to its jump
point.

   `unbind_catch()' removes all catches from the catchlist until it
finds the correct one. Some of the catches might have been placed for
error-trapping, and if so, the appropriate entries on the handlerlist
must be removed (see "errors"). `unbind_catch()' also restores the
values of `gcprolist', `backtrace_list', and `lisp_eval_depth', and
calls `unbind_to()' to undo any specbindings created since the catch.

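   The mechanism can be sketched with standard `setjmp()' and
`longjmp()'. The sketch below is a reduced model under stated
assumptions, not the XEmacs code: all `toy_' names are hypothetical,
tags and values are plain `int's, the specpdl is reduced to a depth
counter, and the `gcprolist', backtrace and handlerlist bookkeeping is
left out. A throw with no matching catch simply aborts here.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

/* Reduced model of struct catchtag. */
struct toy_catchtag {
  int tag;
  volatile int val;            /* written by toy_throw() before jumping */
  struct toy_catchtag *next;
  jmp_buf jmp;
  int pdlcount;                /* toy specpdl depth when the catch was set */
};

static struct toy_catchtag *toy_catchlist;  /* innermost catch first */
static int toy_specpdl_depth;               /* stands in for the specpdl */

/* In the spirit of internal_catch(): the catchtag lives in this stack
   frame and is linked at the head of the catchlist while FUNC runs. */
static int toy_internal_catch(int tag, int (*func)(int), int arg)
{
  struct toy_catchtag c;

  c.tag = tag;
  c.val = 0;
  c.next = toy_catchlist;
  c.pdlcount = toy_specpdl_depth;
  toy_catchlist = &c;

  if (setjmp(c.jmp) == 0)
    c.val = func(arg);         /* normal exit: use the function's value */
  /* otherwise a throw already stored its value in c.val */

  toy_catchlist = c.next;      /* remove this catch from the list */
  return c.val;
}

/* In the spirit of Fthrow(): find the matching catch, unwind, jump. */
static void toy_throw(int tag, int val)
{
  struct toy_catchtag *c;

  for (c = toy_catchlist; c != NULL; c = c->next)
    if (c->tag == tag) {
      toy_catchlist = c;                 /* drop inner catches */
      toy_specpdl_depth = c->pdlcount;   /* undo later bindings */
      c->val = val;
      longjmp(c->jmp, 1);
    }
  abort();                     /* no such catch in this toy model */
}

/* Example bodies: toy_inner makes a "binding" and throws to tag 1,
   skipping the tag-2 catch that toy_outer placed around it. */
static int toy_inner(int x) { toy_specpdl_depth++; toy_throw(1, x * 2); return -1; }
static int toy_outer(int x) { return toy_internal_catch(2, toy_inner, x) + 100; }
static int toy_plain(int x) { return x + 1; }
```

   Calling `toy_internal_catch (1, toy_outer, 21)' returns the thrown
value rather than anything computed by `toy_outer', and afterwards both
the catchlist and the binding depth are back where they started.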

File: internals.info,  Node: Symbols and Variables,  Next: Buffers and Textual Representation,  Prev: Evaluation; Stack Frames; Bindings,  Up: Top

Symbols and Variables
*********************

* Menu:

* Introduction to Symbols::
* Obarrays::
* Symbol Values::


File: internals.info,  Node: Introduction to Symbols,  Next: Obarrays,  Up: Symbols and Variables

Introduction to Symbols
=======================

   A symbol is basically just an object with four fields: a name (a
string), a value (some Lisp object), a function (some Lisp object), and
a property list (usually a list of alternating keyword/value pairs).
What makes symbols special is that there is usually only one symbol
with a given name, and the symbol is referred to by name. This makes a
symbol a convenient way of calling up data by name, i.e. of
implementing variables. (The variable's value is stored in the "value
slot".) Similarly, functions are referenced by name, and the definition
of the function is stored in a symbol's "function slot". This means
that there can be a distinct function and variable with the same name.
The property list is used as a more general mechanism of associating
additional values with particular names, and once again the namespace
is independent of the function and variable namespaces.


File: internals.info,  Node: Obarrays,  Next: Symbol Values,  Prev: Introduction to Symbols,  Up: Symbols and Variables

Obarrays
========

   The identity of symbols with their names is accomplished through a
structure called an obarray, which is just a poorly-implemented hash
table mapping from strings to symbols whose name is that string. (I say
"poorly implemented" because an obarray appears in Lisp as a vector
with some hidden fields rather than as its own opaque type. This is an
Emacs Lisp artifact that should be fixed.)

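   As a rough illustration of the idea, here is a minimal hash table
of symbols in C, with each bucket threaded through a `next' field in
the symbol. It is a sketch under stated assumptions rather than the
XEmacs implementation: the `toy_' names, the table size of 13, and the
hash function are all invented for the example.

```c
#include <stdlib.h>
#include <string.h>

#define TOY_OBARRAY_SIZE 13   /* a small prime, chosen arbitrarily */

struct toy_symbol {
  char *name;
  struct toy_symbol *next;    /* next symbol in the same bucket */
};

struct toy_obarray {
  struct toy_symbol *buckets[TOY_OBARRAY_SIZE];
};

/* Simple string hash reduced to a bucket index (an assumption; the
   real hash function is not described here). */
static unsigned toy_hash(const char *s)
{
  unsigned h = 0;

  while (*s)
    h = h * 31u + (unsigned char) *s++;
  return h % TOY_OBARRAY_SIZE;
}

/* In the spirit of intern-soft: look up NAME, never create. */
static struct toy_symbol *toy_intern_soft(const struct toy_obarray *ob,
                                          const char *name)
{
  struct toy_symbol *sym;

  for (sym = ob->buckets[toy_hash(name)]; sym != NULL; sym = sym->next)
    if (strcmp(sym->name, name) == 0)
      return sym;
  return NULL;
}

/* In the spirit of intern: look up NAME, creating and adding a new
   symbol if none exists.  Returns NULL only if allocation fails. */
static struct toy_symbol *toy_intern(struct toy_obarray *ob,
                                     const char *name)
{
  struct toy_symbol *sym = toy_intern_soft(ob, name);
  unsigned h;

  if (sym != NULL)
    return sym;
  sym = malloc(sizeof *sym);
  if (sym == NULL)
    return NULL;
  sym->name = malloc(strlen(name) + 1);
  if (sym->name == NULL) {
    free(sym);
    return NULL;
  }
  strcpy(sym->name, name);
  h = toy_hash(name);
  sym->next = ob->buckets[h];   /* push onto the front of the bucket */
  ob->buckets[h] = sym;
  return sym;
}
```

   Interning the same name twice in one table yields the same symbol
object, while interning it in a second table yields a distinct symbol
with the same name.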
   Obarrays are implemented as a vector of some fixed size (which
should be a prime for best results), where each "bucket" of the vector
contains one or more symbols, threaded through a hidden `next' field in
the symbol. Lookup of a symbol in an obarray, and adding a symbol to an
obarray, is accomplished through standard hash-table techniques.

   The standard Lisp function for working with symbols and obarrays is
`intern'. This looks up a symbol in an obarray given its name; if it's
not found, a new symbol is automatically created with the specified
name, added to the obarray, and returned. This is what happens when the
Lisp reader encounters a symbol (or more precisely, encounters the name
of a symbol) in some text that it is reading. There is a standard
obarray called `obarray' that is used for this purpose, although the
Lisp programmer is free to create his own obarrays and `intern' symbols
in them.

   Note that, once a symbol is in an obarray, it stays there until
something is done about it, and the standard obarray `obarray' always
stays around, so once you use any particular variable name, a
corresponding symbol will stay around in `obarray' until you exit
XEmacs.

   Note that `obarray' itself is a variable, and as such there is a
symbol in `obarray' whose name is `"obarray"' and which contains
`obarray' as its value.

   Note also that this call to `intern' occurs only when in the Lisp
reader, not when the code is executed (at which point the symbol is
already around, stored as such in the definition of the function).

   You can create your own obarray using `make-vector' (this is
horrible but is an artifact) and intern symbols into that obarray.
Doing that will result in two or more symbols with the same name.
However, at most one of these symbols is in the standard `obarray': You
cannot have two symbols of the same name in any particular obarray.
Note that you cannot add a symbol to an obarray in any fashion other
than using `intern': i.e. you can't take an existing symbol and put it
in an existing obarray. Nor can you change the name of an existing
symbol. (Since obarrays are vectors, you can violate the consistency of
things by storing directly into the vector, but let's ignore that
possibility.)

   Usually symbols are created by `intern', but if you really want, you
can explicitly create a symbol using `make-symbol', giving it some
name. The resulting symbol is not in any obarray (i.e. it is
"uninterned"), and you can't add it to any obarray. Therefore its
primary purpose is as a symbol to use in macros to avoid namespace
pollution. It can also be used as a carrier of information, but cons
cells could probably be used just as well.

   You can also use `intern-soft' to look up a symbol but not create a
new one, and `unintern' to remove a symbol from an obarray. This
returns the removed symbol. (Remember: You can't put the symbol back
into any obarray.) Finally, `mapatoms' maps over all of the symbols in
an obarray.


File: internals.info,  Node: Symbol Values,  Prev: Obarrays,  Up: Symbols and Variables

Symbol Values
=============

   The value field of a symbol normally contains a Lisp object.
However, a symbol can be "unbound", meaning that it logically has no
value. This is internally indicated by storing a special Lisp object,
called "the unbound marker" and stored in the global variable
`Qunbound'. The unbound marker is of a special Lisp object type called
"symbol-value-magic". It is impossible for the Lisp programmer to
directly create or access any object of this type.

   *You must not let any "symbol-value-magic" object escape to the Lisp
level.* Printing any of these objects will cause the message `INTERNAL
EMACS BUG' to appear as part of the print representation. (You may see
this normally when you call `debug_print()' from the debugger on a Lisp
object.)
If you let one of these objects escape to the Lisp level, you will
violate a number of assumptions contained in the C code and make the
unbound marker not function right.

   When a symbol is created, its value field (and function field) are
set to `Qunbound'. The Lisp programmer can restore these conditions
later using `makunbound' or `fmakunbound', and can query to see whether
the value or function fields are "bound" (i.e. have a value other than
`Qunbound') using `boundp' and `fboundp'. The fields are set to a
normal Lisp object using `set' (or `setq') and `fset'.

   Other symbol-value-magic objects are used as special markers to
indicate variables that have non-normal properties. This includes any
variables that are tied into C variables (setting the variable
magically sets some global variable in the C code, and likewise for
retrieving the variable's value), variables that magically tie into
slots in the current buffer, variables that are buffer-local, etc. The
symbol-value-magic object is stored in the value cell in place of a
normal object, and the code to retrieve a symbol's value (i.e.
`symbol-value') knows how to do special things with them. This means
that you should not just fetch the value cell directly if you want a
symbol's value.

   The exact workings of this are rather complex and involved and are
well-documented in comments in `buffer.c', `symbols.c', and `lisp.h'.


File: internals.info,  Node: Buffers and Textual Representation,  Next: MULE Character Sets and Encodings,  Prev: Symbols and Variables,  Up: Top

Buffers and Textual Representation
**********************************

* Menu:

* Introduction to Buffers::     A buffer holds a block of text such as a file.
* The Text in a Buffer::        Representation of the text in a buffer.
* Buffer Lists::                Keeping track of all buffers.
* Markers and Extents::         Tagging locations within a buffer.
* Bufbytes and Emchars::        Representation of individual characters.
* The Buffer Object::           The Lisp object corresponding to a buffer.

File: internals.info, Node: Introduction to Buffers, Next: The Text in a Buffer, Up: Buffers and Textual Representation

Introduction to Buffers
=======================

A buffer is logically just a Lisp object that holds some text. In this, it is like a string, but a buffer is optimized for frequent insertion and deletion, while a string is not. Furthermore:

  1. Buffers are "permanent" objects, i.e. once you create them, they remain around, and need to be explicitly deleted before they go away.

  2. Each buffer has a unique name, which is a string. Buffers are normally referred to by name. In this respect, they are like symbols.

  3. Buffers have a default insertion position, called "point". Inserting text (unless you explicitly give a position) goes at point, and moves point forward past the text. This is what is going on when you type text into Emacs.

  4. Buffers have lots of extra properties associated with them.

  5. Buffers can be "displayed". What this means is that there exist a number of "windows", which are objects that correspond to some visible section of your display, and each window has an associated buffer, and the current contents of the buffer are shown in that section of the display. The redisplay mechanism (which takes care of doing this) knows how to look at the text of a buffer and come up with some reasonable way of displaying this. Many of the properties of a buffer control how the buffer's text is displayed.

  6. One buffer is distinguished and called the "current buffer". It is stored in the variable `current_buffer'. Buffer operations operate on this buffer by default. When you are typing text into a buffer, the buffer you are typing into is always `current_buffer'. Switching to a different window changes the current buffer. Note that Lisp code can temporarily change the current buffer using `set-buffer' (often enclosed in a `save-excursion' so that the former current buffer gets restored when the code is finished).
However, calling `set-buffer' will NOT cause a permanent change in the current buffer. The reason for this is that the top-level event loop sets `current_buffer' to the buffer of the selected window, each time it finishes executing a user command.

Make sure you understand the distinction between "current buffer" and "buffer of the selected window", and the distinction between "point" of the current buffer and "window-point" of the selected window. (This latter distinction is explained in detail in the section on windows.)

File: internals.info, Node: The Text in a Buffer, Next: Buffer Lists, Prev: Introduction to Buffers, Up: Buffers and Textual Representation

The Text in a Buffer
====================

The text in a buffer consists of a sequence of zero or more characters. A "character" is an integer that logically represents a letter, number, space, or other unit of text. Most of the characters that you will typically encounter belong to the ASCII set of characters, but there are also characters for various sorts of accented letters, special symbols, Chinese and Japanese ideograms (i.e. Kanji, Katakana, etc.), Cyrillic and Greek letters, etc. The actual number of possible characters is quite large.

For now, we can view a character as some non-negative integer that has some shape that defines how it typically appears (e.g. as an uppercase A). (The exact way in which a character appears depends on the font used to display the character.) The internal type of characters in the C code is an `Emchar'; this is just an `int', but using a symbolic type makes the code clearer.

Between every character in a buffer is a "buffer position" or "character position". We can speak of the character before or after a particular buffer position, and when you insert a character at a particular position, all characters after that position end up at new positions. When we speak of the character "at" a position, we really mean the character after the position.
(This schizophrenia between a buffer position being "between" characters and "on" a character is rampant in Emacs.)

Buffer positions are numbered starting at 1. This means that position 1 is before the first character, and position 0 is not valid. If there are N characters in a buffer, then buffer position N+1 is after the last one, and position N+2 is not valid.

The internal makeup of the Emchar integer varies depending on whether we have compiled with MULE support. If not, the Emchar integer is an 8-bit integer with possible values from 0 - 255. 0 - 127 are the standard ASCII characters, while 128 - 255 are the characters from the ISO-8859-1 character set. If we have compiled with MULE support, an Emchar is a 19-bit integer, with the various bits having meanings according to a complex scheme that will be detailed later. The characters numbered 0 - 255 still have the same meanings as for the non-MULE case, though.

Internally, the text in a buffer is represented in a fairly simple fashion: as a contiguous array of bytes, with a "gap" of some size in the middle. Although the gap is of some substantial size in bytes, there is no text contained within it: From the perspective of the text in the buffer, it does not exist. The gap logically sits at some buffer position, between two characters (or possibly at the beginning or end of the buffer).

Insertion of text in a buffer at a particular position is always accomplished by first moving the gap to that position (i.e. through some block moving of text), then writing the text into the beginning of the gap, thereby shrinking the gap. If the gap shrinks down to nothing, a new gap is created. (What actually happens is that a new gap is "created" at the end of the buffer's text, which requires nothing more than changing a couple of indices; then the gap is "moved" to the position where the insertion needs to take place by moving up in memory all the text after that position.)
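
The insertion procedure just described can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the XEmacs implementation: the names `toy_gapbuf', `toy_move_gap', `toy_insert' and `toy_contents' are hypothetical, one character is taken to be one byte (the non-MULE case), offsets inside the sketch are 0-based while `toy_insert' takes a 1-based buffer position, and the 64 bytes of slack added when the gap is recreated is an arbitrary choice.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy gap buffer: the text lives in `data', with an unused
   gap of `gap_size' bytes beginning at 0-based offset `gap_start'. */
typedef struct {
    unsigned char *data;
    size_t text_len;   /* bytes of text, gap excluded */
    size_t gap_start;  /* 0-based offset at which the gap begins */
    size_t gap_size;   /* bytes currently in the gap */
} toy_gapbuf;

/* Move the gap so that it begins at 0-based text offset `pos'. */
void toy_move_gap(toy_gapbuf *b, size_t pos)
{
    assert(pos <= b->text_len);
    if (pos < b->gap_start) {
        /* Text in [pos, gap_start) moves up to sit after the gap. */
        memmove(b->data + pos + b->gap_size, b->data + pos,
                b->gap_start - pos);
    } else if (pos > b->gap_start) {
        /* Text just after the gap moves down to sit before it. */
        memmove(b->data + b->gap_start,
                b->data + b->gap_start + b->gap_size,
                pos - b->gap_start);
    }
    b->gap_start = pos;
}

/* Insert `len' bytes of `s' at 1-based buffer position `bufpos'.
   Returns 0 on success, -1 on a bad position or allocation failure. */
int toy_insert(toy_gapbuf *b, size_t bufpos, const char *s, size_t len)
{
    if (bufpos < 1 || bufpos > b->text_len + 1)
        return -1;
    if (len == 0)
        return 0;
    if (len > b->gap_size) {
        /* Gap too small: recreate it at the end of the text, where
           growing it needs no text movement beyond the reallocation. */
        size_t new_gap = len + 64;   /* arbitrary slack (assumption) */
        unsigned char *p;
        toy_move_gap(b, b->text_len);
        p = realloc(b->data, b->text_len + new_gap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->gap_size = new_gap;
    }
    toy_move_gap(b, bufpos - 1);            /* gap now sits at bufpos */
    memcpy(b->data + b->gap_start, s, len); /* write into gap's start */
    b->gap_start += len;                    /* the gap shrinks */
    b->gap_size -= len;
    b->text_len += len;
    return 0;
}

/* Copy the text, skipping the gap, into `out' as a NUL-terminated
   string; `out' must have room for text_len + 1 bytes. */
void toy_contents(const toy_gapbuf *b, char *out)
{
    memcpy(out, b->data, b->gap_start);
    memcpy(out + b->gap_start, b->data + b->gap_start + b->gap_size,
           b->text_len - b->gap_start);
    out[b->text_len] = '\0';
}
```
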
Similarly, deletion occurs by moving the gap to the place where the text is to be deleted, and then simply expanding the gap to include the deleted text. ("Expanding" and "shrinking" the gap as just described means just that the internal indices that keep track of where the gap is located are changed.)

Note that the total amount of memory allocated for a buffer text never decreases while the buffer is live. Therefore, if you load up a 20-megabyte file and then delete all but one character, there will be a 20-megabyte gap, which won't get any smaller (except by inserting characters back again). Once the buffer is killed, the memory allocated for the buffer text will be freed, but it will still be sitting on the heap, taking up virtual memory, and will not be released back to the operating system. (However, if you have compiled XEmacs with rel-alloc, the situation is different. In this case, the space *will* be released back to the operating system. However, this tends to result in a noticeable speed penalty.)

Astute readers may notice that the text in a buffer is represented as an array of *bytes*, while (at least in the MULE case) an Emchar is a 19-bit integer, which clearly cannot fit in a byte. This means (of course) that the text in a buffer uses a different representation from an Emchar: specifically, the 19-bit Emchar becomes a series of one to four bytes. The conversion between these two representations is complex and will be described later.

In the non-MULE case, everything is very simple: An Emchar is an 8-bit value, which fits neatly into one byte.

If we are given a buffer position and want to retrieve the character at that position, we need to follow these steps:

  1. Pretend there's no gap, and convert the buffer position into a "byte index" that indexes to the appropriate byte in the buffer's stream of textual bytes. By convention, byte indices begin at 1, just like buffer positions.
In the non-MULE case, byte indices and buffer positions are identical, since one character equals one byte.

  2. Convert the byte index into a "memory index", which takes the gap into account. The memory index is a direct index into the block of memory that stores the text of a buffer. This basically just involves checking to see if the byte index is past the gap, and if so, adding the size of the gap to it. By convention, memory indices begin at 1, just like buffer positions and byte indices, and when referring to the position that is "at" the gap, we always use the memory position at the *beginning*, not at the end, of the gap.

  3. Fetch the appropriate bytes at the determined memory position.

  4. Convert these bytes into an Emchar.

In the non-MULE case, (3) and (4) boil down to a simple one-byte memory access.

Note that we have defined three types of positions in a buffer:

  1. "buffer positions" or "character positions", typedef `Bufpos'

  2. "byte indices", typedef `Bytind'

  3. "memory indices", typedef `Memind'

All three typedefs are just `int's, but defining them this way makes things a lot clearer.

Most code works with buffer positions. In particular, all Lisp code that refers to text in a buffer uses buffer positions. Lisp code does not know that byte indices or memory indices exist.

Finally, we have a typedef for the bytes in a buffer. This is a `Bufbyte', which is an unsigned char. Referring to them as Bufbytes underscores the fact that we are working with a string of bytes in the internal Emacs buffer representation rather than in one of a number of possible alternative representations (e.g. EUC-encoded text, etc.).
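
The four lookup steps above can be sketched for the non-MULE case as follows. This is an illustrative sketch under stated assumptions, not the code in the XEmacs sources: the typedefs are declared locally to mirror those named in the text, while `struct toy_buftext', its fields, and the `toy_' functions are hypothetical. The extra adjustment in `toy_char_at' is this sketch's reading of the convention that the position "at" the gap maps to the beginning of the gap: the byte that follows that position lies on the far side of the gap.

```c
#include <assert.h>

/* Local stand-ins for the typedefs named in the text. */
typedef int Bufpos;             /* buffer (character) position */
typedef int Bytind;             /* byte index */
typedef int Memind;             /* memory index */
typedef int Emchar;             /* a character */
typedef unsigned char Bufbyte;  /* one byte of buffer text */

/* Hypothetical layout of a buffer's text; field names are illustrative. */
struct toy_buftext {
    const Bufbyte *mem;  /* block of memory holding the text plus gap */
    Bytind gap_pos;      /* 1-based byte index at which the gap sits */
    int gap_size;        /* size of the gap in bytes */
    int nchars;          /* number of characters of text */
};

/* Step 1: without MULE, one character equals one byte. */
Bytind toy_bufpos_to_bytind(Bufpos pos)
{
    return (Bytind) pos;
}

/* Step 2: a byte index past the gap is shifted by the gap size; the
   position "at" the gap maps to the beginning of the gap. */
Memind toy_bytind_to_memind(const struct toy_buftext *t, Bytind bi)
{
    return bi > t->gap_pos ? bi + t->gap_size : bi;
}

/* Steps 3 and 4: fetch the character "at" (that is, after) `pos'. */
Emchar toy_char_at(const struct toy_buftext *t, Bufpos pos)
{
    Memind mi;
    assert(pos >= 1 && pos <= t->nchars);
    mi = toy_bytind_to_memind(t, toy_bufpos_to_bytind(pos));
    if (mi == t->gap_pos)
        mi += t->gap_size;  /* the byte after this position is past the gap */
    return (Emchar) t->mem[mi - 1];  /* memory indices are 1-based */
}
```

With the text "abcd" stored as the bytes `a b _ _ _ c d' (a three-byte gap at buffer position 3), position 3 maps to memory index 3 as a position, but the character at position 3 is fetched from memory index 6.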