This is Info file ../../info/internals.info, produced by Makeinfo
version 1.68 from the input file internals.texi.

INFO-DIR-SECTION XEmacs Editor
* Internals: (internals).       XEmacs Internals Manual.

Copyright (C) 1992 - 1996 Ben Wing. Copyright (C) 1996, 1997 Sun
Microsystems. Copyright (C) 1994 - 1998 Free Software Foundation.
Copyright (C) 1994, 1995 Board of Trustees, University of Illinois.

Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided that the
entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that this permission notice may be stated in a
translation approved by the Foundation.

Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided also
that the section entitled "GNU General Public License" is included
exactly as in the original, and provided that the entire resulting
derived work is distributed under the terms of a permission notice
identical to this one.

Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that the section entitled "GNU General Public License"
may be included in a translation approved by the Free Software
Foundation instead of in the original English.

File: internals.info, Node: Main Loop, Next: Specifics of the Event Gathering Mechanism, Prev: Introduction to Events, Up: Events and the Event Loop

Main Loop
=========

The "command loop" is the top-level loop that the editor is always
running. It loops endlessly, calling `next-event' to retrieve an event
and `dispatch-event' to execute it. `dispatch-event' does the
appropriate thing with non-user events (process, timeout, magic, eval,
mouse motion); this involves calling a Lisp handler function, redrawing
a newly-exposed part of a frame, reading subprocess output, etc. For
user events, `dispatch-event' looks up the event in relevant keymaps or
menubars; when a full key sequence or menubar selection is reached, the
appropriate function is executed. `dispatch-event' may have to keep
state across calls; this is done in the "command-builder" structure
associated with each console (remember, there's usually only one
console), and the engine that looks up keystrokes and constructs full
key sequences is called the "command builder". This is documented
elsewhere.

The guts of the command loop are in `command_loop_1()'. This
function doesn't catch errors, though; that's the job of
`command_loop_2()', which is a condition-case (i.e. error-trapping)
wrapper around `command_loop_1()'. `command_loop_1()' never returns,
but it may be thrown out of.

When an error occurs, `cmd_error()' is called, which usually invokes
the Lisp error handler in `command-error'; however, a default error
handler is provided if `command-error' is `nil' (e.g. during startup).
The purpose of the error handler is simply to display the error message
and do associated cleanup; it does not need to throw anywhere. When
the error handler finishes, the condition-case in `command_loop_2()'
will finish and `command_loop_2()' will reinvoke `command_loop_1()'.
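
As a rough illustration of how `command_loop_2()' keeps `command_loop_1()' alive, here is a minimal C sketch. It is not the XEmacs code: `setjmp()' and `longjmp()' stand in for the Lisp condition-case and for the throw out of the loop, a scripted event array stands in for `next-event', and every `toy_' name is hypothetical.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical scripted input standing in for `next-event'. */
enum toy_event { TOY_EV_KEY, TOY_EV_ERROR, TOY_EV_EXIT };

static const enum toy_event *toy_input;
static jmp_buf toy_error_jmp;   /* stands in for the condition-case */
static jmp_buf toy_exit_jmp;    /* stands in for a catch around the loop */
static int toy_commands_run, toy_errors_reported;

/* Like cmd_error(): report and clean up; it does not throw anywhere. */
static void toy_cmd_error (void)
{
  toy_errors_reported++;
}

/* Like command_loop_1(): loops forever and never returns normally. */
static void toy_command_loop_1 (void)
{
  for (;;)
    {
      enum toy_event ev = *toy_input++;
      if (ev == TOY_EV_ERROR)
        longjmp (toy_error_jmp, 1);   /* an error is signaled */
      if (ev == TOY_EV_EXIT)
        longjmp (toy_exit_jmp, 1);    /* a throw out of the loop */
      toy_commands_run++;             /* "dispatch" a command */
    }
}

/* Like command_loop_2(): trap errors, report them, then re-enter. */
static void toy_command_loop_2 (void)
{
  for (;;)
    {
      if (setjmp (toy_error_jmp) == 0)
        toy_command_loop_1 ();
      else
        toy_cmd_error ();
    }
}

/* Run SCRIPT through the loop; return the number of commands run. */
int toy_run (const enum toy_event *script)
{
  toy_input = script;
  toy_commands_run = toy_errors_reported = 0;
  if (setjmp (toy_exit_jmp) == 0)
    toy_command_loop_2 ();
  return toy_commands_run;
}
```

With the script KEY, ERROR, KEY, KEY, EXIT the sketch runs three commands and reports one error: the error handler merely records the error and returns, and the enclosing loop calls `toy_command_loop_1()' again.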

`command_loop_2()' is invoked from three places: from
`initial_command_loop()' (called from `main()' at the end of internal
initialization), from the Lisp function `recursive-edit', and from
`call_command_loop()'.

`call_command_loop()' is called when a macro is started and when the
minibuffer is entered; normal termination of the macro or minibuffer
causes a throw out of the recursive command loop (to
`execute-kbd-macro' for macros and to `exit' for minibuffers). Note
also that the low-level minibuffer-entering function,
`read-minibuffer-internal', provides its own error handling and does
not need `command_loop_2()''s error encapsulation, so it tells
`call_command_loop()' to invoke `command_loop_1()' directly.

Note that both `read-minibuffer-internal' and `recursive-edit' set up
a catch for `exit'; this is why `abort-recursive-edit', which throws to
this catch, exits out of either one.

`initial_command_loop()', called from `main()', sets up a catch for
`top-level' when invoking `command_loop_2()', allowing functions to
throw all the way to the top level if they really need to. Before
invoking `command_loop_2()', `initial_command_loop()' calls
`top_level_1()', which handles all of the startup work (creating the
initial frame, handling the command-line options, loading the user's
`.emacs' file, etc.). The function that actually does this is written
in Lisp and is pointed to by the variable `top-level'; normally this
function is `normal-top-level'. `top_level_1()' is just an
error-handling wrapper similar to `command_loop_2()'. Note also that
`initial_command_loop()' sets up a catch for `top-level' when invoking
`top_level_1()', just like when it invokes `command_loop_2()'.

File: internals.info, Node: Specifics of the Event Gathering Mechanism, Next: Specifics About the Emacs Event, Prev: Main Loop, Up: Events and the Event Loop

Specifics of the Event Gathering Mechanism
==========================================

Here is an approximate diagram of the collection processes at work
in XEmacs under TTYs (TTYs are simpler than X, so we'll look at them
first):

     [Collectors in the OS]
       asynch. kbd events ------> file desc. (TTY)
       asynch. kbd events ------> file desc. (TTY)
       asynch. process output --> file desc. (pipe)
       asynch. process output --> file desc. (pipe)

     [signal handlers in XEmacs]
       SIGINT, SIGQUIT, SIGALRM
                  |
            fake timeouts

                  |   (all of the above)
                  V
       [collected using select() in emacs_tty_next_event()
        and converted to the appropriate Emacs event]
                  |
                  V          (above this line is TTY-specific)
             Emacs event  ------------------------------------------
                  |          (below this line is the generic event mechanism)
                  V
       was there a SIGINT?  If not, call emacs_tty_next_event().
       [collected in event_stream_next_event();
        SIGINT is converted using maybe_read_quit_event()]
                  |
                  V
            maybe_kbd_translate()
                  |
                  V
       command event queue (contains events that were read earlier
       but not processed, typically when waiting in a sit-for,
       sleep-for, etc. for a particular event to be received);
       if not in the event queue, call event_stream_next_event()
       [collected in next_event_internal()]
                  |
                  V
       unread-command-events, unread-command-event, event from
       keyboard macro; else, call next_event_internal()
       [collected in `next-event', which may loop more than once if
        the event it gets is on a dead frame, device, etc.]
                  |
                  V
       feed into top-level event loop, which repeatedly calls
       `next-event' and then dispatches the event using
       `dispatch-event'

Notice the separation between the TTY-specific and the generic event
mechanism. When using the Xt-based event loop, the TTY-specific part
is replaced but the rest stays the same.

It's also important to realize that only one kind of system-specific
event loop can be operating at a time, and it must be able to receive
all kinds of events simultaneously. Of the two existing event loops
(implemented in `event-tty.c' and `event-Xt.c', respectively), the TTY
event loop *only* handles TTY consoles, while the Xt event loop handles
*both* TTY and X consoles. This situation is different from all of the
output handlers, where you simply have one per console/device type.

Here's the Xt Event Loop Diagram (notice that below a certain point,
it's the same as the above diagram):

     [Collectors in the OS]
       asynch. kbd events ------> file desc. (TTY)
       asynch. kbd events ------> file desc. (TTY)
       asynch. process output --> file desc. (pipe)
       asynch. process output --> file desc. (pipe)

     [Collectors in the OS and X Window System]
       asynch. X events --------> file desc. (socket)
       asynch. X events --------> file desc. (socket)

     [signal handlers in XEmacs]
       SIGINT, SIGQUIT, SIGWINCH, ... --> file desc. (pipe)

     timeouts

                  |   (all of the above)
                  V
       [collected using select() in _XtWaitForSomething(), called
        from XtAppProcessEvent(), called in emacs_Xt_next_event();
        dispatched to various callbacks]
                  |
                  V
       emacs_Xt_event_handler() ------> enqueue_Xt_dispatch_event()

       p_s_callback()       [popup_selection_callback]
       x_u_v_s_callback()   [x_update_vertical_scrollbar_callback]
       x_u_h_s_callback()   [x_update_horizontal_scrollbar_callback]
       search_callback()
                            ------> signal_special_Xt_user_event()

       Xt_what_callback()
                  |
                  |   (all three paths)
                  V
       [collected and converted as appropriate in
        emacs_Xt_next_event()]
                  |
                  V          (above this line is Xt-specific)
             Emacs event  ------------------------------------------
                  |          (below this line is the generic event mechanism)
                  V
       was there a SIGINT?  If not, call emacs_Xt_next_event().
       [collected in event_stream_next_event();
        SIGINT is converted using maybe_read_quit_event()]
                  |
                  V
            maybe_kbd_translate()
                  |
                  V
       command event queue (contains events that were read earlier
       but not processed, typically when waiting in a sit-for,
       sleep-for, etc. for a particular event to be received);
       if not in the event queue, call event_stream_next_event()
       [collected in next_event_internal()]
                  |
                  V
       unread-command-events, unread-command-event, event from
       keyboard macro; else, call next_event_internal()
       [collected in `next-event', which may loop more than once if
        the event it gets is on a dead frame, device, etc.]
                  |
                  V
       feed into top-level event loop, which repeatedly calls
       `next-event' and then dispatches the event using
       `dispatch-event'
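
The priority order at the bottom of both diagrams can be sketched as follows. This is a hypothetical model, not XEmacs code: each event source is reduced to a queue of integers, `unread-command-event' is modeled as a queue holding at most one event, and reading the diagram's sources in the order listed as a priority order is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy event source: a fixed queue of integer "events". */
struct toy_source
{
  const int *items;
  size_t n, pos;
};

/* Pop the next event from S into *OUT; return 0 if S is empty. */
static int toy_pop (struct toy_source *s, int *out)
{
  if (s->pos >= s->n)
    return 0;
  *out = s->items[s->pos++];
  return 1;
}

struct toy_sources
{
  struct toy_source unread_command_events; /* `unread-command-events' */
  struct toy_source unread_command_event;  /* `unread-command-event' (at most 1) */
  struct toy_source kbd_macro;             /* executing keyboard macro */
  struct toy_source command_event_queue;   /* read earlier, not yet processed */
  struct toy_source event_stream;          /* event_stream_next_event() */
};

/* Like next_event_internal(): the command event queue first, then the
   system-specific event stream. */
static int toy_next_event_internal (struct toy_sources *s, int *out)
{
  return toy_pop (&s->command_event_queue, out)
         || toy_pop (&s->event_stream, out);
}

/* Like `next-event': unread events, then a keyboard macro, else
   next_event_internal().  Returns 0 if every source is empty. */
int toy_next_event (struct toy_sources *s, int *out)
{
  return toy_pop (&s->unread_command_events, out)
         || toy_pop (&s->unread_command_event, out)
         || toy_pop (&s->kbd_macro, out)
         || toy_next_event_internal (s, out);
}
```

Unlike the real `next-event', the sketch returns 0 instead of blocking when nothing is available, and it does not loop to skip events on dead frames or devices.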

File: internals.info, Node: Specifics About the Emacs Event, Next: The Event Stream Callback Routines, Prev: Specifics of the Event Gathering Mechanism, Up: Events and the Event Loop

Specifics About the Emacs Event
===============================

File: internals.info, Node: The Event Stream Callback Routines, Next: Other Event Loop Functions, Prev: Specifics About the Emacs Event, Up: Events and the Event Loop

The Event Stream Callback Routines
==================================

File: internals.info, Node: Other Event Loop Functions, Next: Converting Events, Prev: The Event Stream Callback Routines, Up: Events and the Event Loop

Other Event Loop Functions
==========================

`detect_input_pending()' and `input-pending-p' look for input by
calling `event_stream->event_pending_p' and looking in
`[V]unread-command-event' and the `command_event_queue' (they do not
check for an executing keyboard macro, though).

`discard-input' cancels any command events pending (and any keyboard
macros currently executing), and puts the others onto the
`command_event_queue'. There is a comment about a "race condition",
which is not a good sign.

`next-command-event' and `read-char' are higher-level interfaces to
`next-event'. `next-command-event' gets the next "command" event (i.e.
keypress, mouse event, menu selection, or scrollbar action), calling
`dispatch-event' on any others. `read-char' calls `next-command-event'
and uses `event_to_character()' to return the character equivalent.
With the right kind of input method support, it is possible for
`(read-char)' to return a Kanji character.
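
A minimal sketch of the relationship between `next-command-event' and `read-char' described above, assuming a hypothetical toy event type. The counter stands in for `dispatch-event', and returning -1 for a command event with no character equivalent is a simplification of this sketch, not the behavior of the real `read-char'.

```c
#include <assert.h>

/* Hypothetical toy events for this sketch only. */
enum toy_kind { TOY_KEYPRESS, TOY_MENU_SELECTION, TOY_TIMEOUT, TOY_PROCESS };

struct toy_ev
{
  enum toy_kind kind;
  int character;            /* meaningful for TOY_KEYPRESS only */
};

static int toy_dispatched;  /* non-command events handed to "dispatch-event" */

static int toy_is_command_event (const struct toy_ev *ev)
{
  return ev->kind == TOY_KEYPRESS || ev->kind == TOY_MENU_SELECTION;
}

/* Like `next-command-event': dispatch non-command events until a
   command event arrives, then return it. */
struct toy_ev toy_next_command_event (const struct toy_ev **stream)
{
  for (;;)
    {
      struct toy_ev ev = *(*stream)++;   /* stands in for `next-event' */
      if (toy_is_command_event (&ev))
        return ev;
      toy_dispatched++;                  /* stands in for `dispatch-event' */
    }
}

/* Like `read-char': the character equivalent of the next command event,
   or -1 when there is none (a simplification of this sketch). */
int toy_read_char (const struct toy_ev **stream)
{
  struct toy_ev ev = toy_next_command_event (stream);
  return ev.kind == TOY_KEYPRESS ? ev.character : -1;
}
```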

File: internals.info, Node: Converting Events, Next: Dispatching Events; The Command Builder, Prev: Other Event Loop Functions, Up: Events and the Event Loop

Converting Events
=================

`character_to_event()', `event_to_character()',
`event-to-character', and `character-to-event' convert between
characters and keypress events corresponding to the characters. If the
event was not a keypress, `event_to_character()' returns -1 and
`event-to-character' returns `nil'. These functions convert between
the character representation and the split-up event representation
(keysym plus modifiers).
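
To make the split representation concrete, here is a hypothetical C sketch of the round trip between a character and a keysym-plus-modifiers event. The structure layout, the `TOY_' names, and the restriction to plain characters and C-a through C-z are assumptions for illustration; the sketch ignores any special-case keysyms that the real functions may handle.

```c
#include <assert.h>

/* Hypothetical, simplified event layout for this sketch only. */
enum toy_event_type { TOY_KEY_PRESS, TOY_BUTTON_PRESS };
#define TOY_MOD_CONTROL 1u

struct toy_event
{
  enum toy_event_type type;
  int keysym;               /* here simply the base character, e.g. 'a' */
  unsigned modifiers;       /* bit mask of TOY_MOD_* flags */
};

/* Split a character into keysym plus modifier bits. */
void toy_character_to_event (int c, struct toy_event *ev)
{
  ev->type = TOY_KEY_PRESS;
  if (c >= 1 && c <= 26)                 /* C-a .. C-z */
    {
      ev->keysym = 'a' + (c - 1);
      ev->modifiers = TOY_MOD_CONTROL;
    }
  else
    {
      ev->keysym = c;
      ev->modifiers = 0;
    }
}

/* Recombine the parts; like event_to_character(), return -1 when the
   event is not a keypress or has no character equivalent. */
int toy_event_to_character (const struct toy_event *ev)
{
  if (ev->type != TOY_KEY_PRESS)
    return -1;
  if (ev->modifiers & TOY_MOD_CONTROL)
    return (ev->keysym >= 'a' && ev->keysym <= 'z')
           ? ev->keysym - 'a' + 1 : -1;
  return ev->keysym;
}
```

For example, character 3 (C-c) becomes keysym `c' with the control bit set, and converting that event back yields 3 again.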

File: internals.info, Node: Dispatching Events; The Command Builder, Prev: Converting Events, Up: Events and the Event Loop

Dispatching Events; The Command Builder
=======================================

File: internals.info, Node: Evaluation; Stack Frames; Bindings, Next: Symbols and Variables, Prev: Events and the Event Loop, Up: Top

Evaluation; Stack Frames; Bindings
**********************************

* Menu:

* Evaluation::
* Dynamic Binding; The specbinding Stack; Unwind-Protects::
* Simple Special Forms::
* Catch and Throw::

File: internals.info, Node: Evaluation, Next: Dynamic Binding; The specbinding Stack; Unwind-Protects, Up: Evaluation; Stack Frames; Bindings

Evaluation
==========

`Feval()' evaluates the form (a Lisp object) that is passed to it.
Note that evaluation is only non-trivial for two types of objects:
symbols and conses. A symbol is evaluated simply by calling
`symbol-value' on it and returning the value.

Evaluating a cons means calling a function. First, `eval' checks to
see if garbage collection is necessary, and calls `garbage_collect_1()'
if so. It then increases the evaluation depth by 1 (`lisp_eval_depth',
which is always less than `max_lisp_eval_depth') and adds an element to
the linked list of `struct backtrace''s (`backtrace_list'). Each such
structure contains a pointer to the function being called plus a list
of the function's arguments. Originally these values are stored
unevalled, and as they are evaluated, the backtrace structure is
updated. Garbage collection pays attention to the objects pointed to
in the backtrace structures (garbage collection might happen while a
function is being called or while an argument is being evaluated, and
there could easily be no other references to the arguments in the
argument list; once an argument is evaluated, however, the unevalled
version is not needed by eval, and so the backtrace structure is
changed).

At this point, the function to be called is determined by looking at
the car of the cons (if this is a symbol, its function definition is
retrieved and the process repeated). The function should then consist
of either a `Lisp_Subr' (built-in function written in C), a
`Lisp_Compiled_Function' object, or a cons whose car is one of the
symbols `autoload', `macro' or `lambda'.

If the function is a `Lisp_Subr', the Lisp object points to a
`struct Lisp_Subr' (created by `DEFUN()'), which contains a pointer to
the C function, a minimum and maximum number of arguments (or possibly
the special constants `MANY' or `UNEVALLED'), a pointer to the symbol
referring to that subr, and a couple of other things. If the subr
wants its arguments `UNEVALLED', they are passed raw as a list.
Otherwise, an array of evaluated arguments is created and put into the
backtrace structure, and either passed whole (`MANY') or each argument
is passed as a C argument.

If the function is a `Lisp_Compiled_Function',
`funcall_compiled_function()' is called. If the function is a lambda
list, `funcall_lambda()' is called. If the function is a macro, the
macro's expander function is applied to the unevaluated arguments and
the resulting expansion is evaluated in place of the original form. If
the function is an autoload, `do_autoload()' is called to load the
definition and then eval starts over, this time finding the newly
loaded definition.

When `Feval()' exits, the evaluation depth is reduced by one, the
debugger is called if appropriate, and the current backtrace structure
is removed from the list.

Both `funcall_compiled_function()' and `funcall_lambda()' need to go
through the list of formal parameters to the function and bind them to
the actual arguments, checking for `&rest' and `&optional' symbols in
the formal parameters and making sure the number of actual arguments is
correct. `funcall_compiled_function()' can do this a little more
efficiently, since the formal parameter list can be checked for sanity
when the compiled function object is created.
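
The parameter-list check described above can be sketched in C as follows. Formal parameters are represented here as plain strings purely for illustration (the real code works on Lisp symbols), and the function names are hypothetical. The sketch only computes the accepted argument range; it does not perform the bindings.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch: compute the accepted argument range from a formal
   parameter list such as {"a", "&optional", "b", "&rest", "c"}.
   *MAX_ARGS is set to -1 when &rest makes the count unbounded. */
void toy_arg_range (const char *const *params, int nparams,
                    int *min_args, int *max_args)
{
  int optional = 0, rest = 0, i;

  *min_args = *max_args = 0;
  for (i = 0; i < nparams; i++)
    {
      if (strcmp (params[i], "&optional") == 0)
        optional = 1;
      else if (strcmp (params[i], "&rest") == 0)
        rest = 1;
      else if (!rest)
        {
          if (!optional)
            ++*min_args;      /* required parameter */
          ++*max_args;        /* required or optional parameter */
        }
    }
  if (rest)
    *max_args = -1;
}

/* Check an actual argument count against the computed range. */
int toy_arg_count_ok (int nargs, int min_args, int max_args)
{
  return nargs >= min_args && (max_args < 0 || nargs <= max_args);
}
```

Because a compiled function's parameter list cannot change, a range like this can be computed once when the object is created, which is the efficiency advantage mentioned above for `funcall_compiled_function()'.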

`funcall_lambda()' simply calls `Fprogn' to execute the code in the
body of the function.

`funcall_compiled_function()' calls the real byte-code interpreter
`execute_optimized_program()' on the byte-code instructions, which are
converted into an internal form for faster execution.

When a compiled function is executed for the first time by
`funcall_compiled_function()', or when it is `Fpurecopy()'ed during the
dump phase of building XEmacs, the byte-code instructions are converted
from a `Lisp_String' (which is inefficient to access, especially in the
presence of MULE) into a `Lisp_Opaque' object containing an array of
unsigned char, which can be directly executed by the byte-code
interpreter. At this time the byte code is also analyzed for validity
and transformed into a more optimized form, so that
`execute_optimized_program()' can really fly.

Here are some of the optimizations performed by the internal
byte-code transformer:

  1. References to the `constants' array are checked for out-of-range
     indices, so that the byte interpreter doesn't have to.

  2. References to the `constants' array that will be used as a Lisp
     variable are checked for being correct non-constant (i.e. not `t',
     `nil', or `keywordp') symbols, so that the byte interpreter
     doesn't have to.

  3. The maximum number of variable bindings in the byte code is
     pre-computed, so that space on the `specpdl' stack can be
     pre-reserved once for the whole function execution.

  4. All byte-code jumps are relative to the current program counter
     instead of the start of the program, thereby saving a register.

  5. One-byte relative jumps are converted from the byte-code form of
     unsigned chars offset by 127 to machine-friendly signed chars.
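
Items 4 and 5 can be illustrated with a small hypothetical decoder. The only facts taken from the list above are the 127 bias and the use of a program-counter-relative jump; the function names are invented, and exactly which byte the offset is measured from is left out of this sketch.

```c
#include <assert.h>

/* Hypothetical helper: the byte-code stores a one-byte relative jump as
   an unsigned char biased by 127; convert it to a signed offset. */
signed char toy_decode_rel_jump (unsigned char operand)
{
  return (signed char) (operand - 127);   /* 127 -> 0, 130 -> 3, 120 -> -7 */
}

/* With a program-counter-relative jump, taking the jump is a single
   addition; no separate pointer to the program start is needed. */
const unsigned char *toy_take_rel_jump (const unsigned char *pc,
                                        signed char offset)
{
  return pc + offset;
}
```

Doing the subtraction once, during the transformation, means the interpreter's inner loop only ever sees ready-to-use signed offsets.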

Of course, this transformation of the `instructions' should not be
visible to the user, so `Fcompiled_function_instructions()' needs to
know how to convert the optimized opaque object back into a Lisp string
that is identical to the original string from the `.elc' file.
(Actually, the resulting string may, in rare cases, contain slightly
different, yet equivalent, byte code.)

`Ffuncall()' implements Lisp `funcall'. `(funcall fun x1 x2 x3
...)' is equivalent to `(eval (list fun (quote x1) (quote x2) (quote
x3) ...))'. `Ffuncall()' contains its own code to do the evaluation,
however, and is very similar to `Feval()'.

From the performance point of view, it is worth knowing that most of
the time in Lisp evaluation is spent executing `Lisp_Subr' and
`Lisp_Compiled_Function' objects via `Ffuncall()' (not `Feval()').

`Fapply()' implements Lisp `apply', which is very similar to
`funcall' except that if the last argument is a list, the result is the
same as if each of the elements of that list had been passed as a
separate argument. `Fapply()' does some business to expand the last
argument if it's a list, then calls `Ffuncall()' to do the work.
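
As a sketch of the argument spreading that `Fapply()' performs before handing off to `Ffuncall()', here is a hypothetical C version that uses plain integers and a toy cons cell in place of Lisp objects.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a Lisp cons cell holding an integer. */
struct toy_cons
{
  int car;
  const struct toy_cons *cdr;   /* NULL plays the role of nil */
};

/* Copy the leading arguments into OUT, then spread the elements of the
   final list argument after them, as `apply' does before the call.
   Returns the total argument count, or -1 if OUT is too small. */
int toy_spread_args (const int *leading, int nleading,
                     const struct toy_cons *last, int *out, int out_size)
{
  int n = 0, i;

  for (i = 0; i < nleading; i++)
    {
      if (n == out_size)
        return -1;
      out[n++] = leading[i];
    }
  for (; last != NULL; last = last->cdr)
    {
      if (n == out_size)
        return -1;
      out[n++] = last->car;
    }
  return n;
}
```

So `(apply f 1 2 '(3 4))' ends up as the four-element argument array 1, 2, 3, 4, which would then be handed to the equivalent of `Ffuncall()'.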

`apply1()', `call0()', `call1()', `call2()', and `call3()' call a
function, passing it the argument(s) given (the arguments are given as
separate C arguments rather than being passed as an array). `apply1()'
uses `Fapply()' while the others use `Ffuncall()' to do the real work.

File: internals.info, Node: Dynamic Binding; The specbinding Stack; Unwind-Protects, Next: Simple Special Forms, Prev: Evaluation, Up: Evaluation; Stack Frames; Bindings

Dynamic Binding; The specbinding Stack; Unwind-Protects
=======================================================

     struct specbinding
     {
       Lisp_Object symbol;
       Lisp_Object old_value;
       Lisp_Object (*func) (Lisp_Object); /* for unwind-protect */
     };

`struct specbinding' is used for local-variable bindings and
unwind-protects. `specpdl' holds an array of `struct specbinding''s,
`specpdl_ptr' points to the beginning of the free bindings in the
array, `specpdl_size' specifies the total number of binding slots in
the array, and `max_specpdl_size' specifies the maximum number of
bindings the array can be expanded to hold. `grow_specpdl()' increases
the size of the `specpdl' array, multiplying its size by 2 but never
exceeding `max_specpdl_size' (except that if this number is less than
400, it is first set to 400).

`specbind()' binds a symbol to a value and is used for local
variables and `let' forms. The symbol and its old value (which might
be `Qunbound', indicating no prior value) are recorded in the specpdl
array, and `specpdl_ptr' is advanced by one slot.

`record_unwind_protect()' implements an "unwind-protect", which,
when placed around a section of code, ensures that some specified
cleanup routine will be executed even if the code exits abnormally
(e.g. through a `throw' or quit). `record_unwind_protect()' simply
adds a new specbinding to the `specpdl' array and stores the
appropriate information in it. The cleanup routine can either be a C
function, which is stored in the `func' field, or a `progn' form, which
is stored in the `old_value' field.

`unbind_to()' removes specbindings from the `specpdl' array until
the specified position is reached. Each specbinding can be one of
three types:

  1. an unwind-protect with a C cleanup function (`func' is not 0, and
     `old_value' holds an argument to be passed to the function);

  2. an unwind-protect with a Lisp form (`func' is 0, `symbol' is
     `nil', and `old_value' holds the form to be executed with
     `Fprogn()'); or

  3. a local-variable binding (`func' is 0, `symbol' is not `nil', and
     `old_value' holds the old value, which is stored as the symbol's
     value).
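
The following C sketch models the specpdl stack with plain integers in place of Lisp objects. It is a simplified illustration, not the XEmacs implementation: the `toy_' names are hypothetical, the stack depth is kept as an index rather than as a `specpdl_ptr' pointer, the initial size of 16 is arbitrary, and case 2 of the list above (a Lisp form run with `Fprogn()') is omitted.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy model: Lisp objects are ints, a "symbol" is an index
   into toy_symbol_values, and TOY_NO_SYMBOL plays the role of nil. */
#define TOY_NO_SYMBOL (-1)
int toy_symbol_values[8];

struct toy_specbinding
{
  int symbol;           /* TOY_NO_SYMBOL for unwind-protects */
  int old_value;        /* old value, or the cleanup function's argument */
  void (*func) (int);   /* C cleanup function, or NULL */
};

struct toy_specbinding *toy_specpdl;    /* the binding stack */
int toy_specpdl_size;                   /* slots allocated */
int toy_specpdl_depth;                  /* slots in use */
int toy_max_specpdl_size = 600;

/* Like grow_specpdl(): double the size, never past the maximum, which
   is first raised to 400 if it is smaller than that. */
static void toy_grow_specpdl (void)
{
  int new_size;
  struct toy_specbinding *p;

  if (toy_max_specpdl_size < 400)
    toy_max_specpdl_size = 400;
  new_size = toy_specpdl_size ? 2 * toy_specpdl_size : 16;
  if (new_size > toy_max_specpdl_size)
    new_size = toy_max_specpdl_size;
  assert (new_size > toy_specpdl_size);  /* the real code signals an error */
  p = realloc (toy_specpdl, new_size * sizeof *p);
  assert (p != NULL);
  toy_specpdl = p;
  toy_specpdl_size = new_size;
}

static struct toy_specbinding *toy_push (void)
{
  if (toy_specpdl_depth == toy_specpdl_size)
    toy_grow_specpdl ();
  return &toy_specpdl[toy_specpdl_depth++];
}

/* Like specbind(): save the old value, then install the new one. */
void toy_specbind (int symbol, int value)
{
  struct toy_specbinding *b = toy_push ();
  b->symbol = symbol;
  b->old_value = toy_symbol_values[symbol];
  b->func = NULL;
  toy_symbol_values[symbol] = value;
}

/* Like record_unwind_protect() with a C cleanup function. */
void toy_record_unwind_protect (void (*func) (int), int arg)
{
  struct toy_specbinding *b = toy_push ();
  b->symbol = TOY_NO_SYMBOL;
  b->old_value = arg;
  b->func = func;
}

/* Like unbind_to(): pop entries until the depth is back to COUNT. */
void toy_unbind_to (int count)
{
  while (toy_specpdl_depth > count)
    {
      struct toy_specbinding *b = &toy_specpdl[--toy_specpdl_depth];
      if (b->func)
        b->func (b->old_value);                          /* case 1 */
      else if (b->symbol != TOY_NO_SYMBOL)
        toy_symbol_values[b->symbol] = b->old_value;     /* case 3 */
    }
}

/* Example cleanup function for the usage below. */
int toy_last_cleanup_arg;
void toy_example_cleanup (int arg) { toy_last_cleanup_arg = arg; }
```

A caller records the current depth, makes bindings and unwind-protects, and later calls `toy_unbind_to()' with the saved depth; bindings are undone and cleanups run in reverse order of creation.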

File: internals.info, Node: Simple Special Forms, Next: Catch and Throw, Prev: Dynamic Binding; The specbinding Stack; Unwind-Protects, Up: Evaluation; Stack Frames; Bindings

Simple Special Forms
====================

`or', `and', `if', `cond', `progn', `prog1', `prog2', `setq',
`quote', `function', `let*', `let', `while'

All of these are very simple and work as expected, calling `Feval()'
or `Fprogn()' as necessary and (in the case of `let' and `let*') using
`specbind()' to create bindings and `unbind_to()' to undo the bindings
when finished.

Note that, with the exception of `Fprogn', these functions are
typically called in real life only in interpreted code, since the byte
compiler knows how to convert calls to these functions directly into
byte code.
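
File: internals.info, Node: Catch and Throw, Prev: Simple Special Forms, Up: Evaluation; Stack Frames; Bindings

Catch and Throw
===============

     struct catchtag
     {
       Lisp_Object tag;
       Lisp_Object val;
       struct catchtag *next;
       struct gcpro *gcpro;
       jmp_buf jmp;
       struct backtrace *backlist;
       int lisp_eval_depth;
       int pdlcount;
     };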

`catch' is a Lisp special form that places a catch around a body of
code. A catch is a means of non-local exit from the code. When a catch
is created, a tag is specified, and executing a `throw' to this tag
will exit from the body of code caught with this tag, and its value will
be the value given in the call to `throw'. If there is no such call,
the code will be executed normally.

Information pertaining to a catch is held in a `struct catchtag',
which is placed at the head of a linked list pointed to by `catchlist'.
`internal_catch()' is passed a C function to call (`Fprogn()' when
Lisp `catch' is called) and arguments to give it, and places a catch
around the function. Each `struct catchtag' is held in the stack frame
of the `internal_catch()' instance that created the catch.

`internal_catch()' is fairly straightforward. It stores into the
`struct catchtag' the tag name and the current values of
`backtrace_list', `lisp_eval_depth', `gcprolist', and the offset into
the `specpdl' array, sets a jump point with `_setjmp()' (storing the
jump point into the `struct catchtag'), and calls the function.
Control will return to `internal_catch()' either when the function
exits normally or through a `_longjmp()' to this jump point. In the
latter case, `throw' will store the value to be returned into the
`struct catchtag' before jumping. When it's done, `internal_catch()'
removes the `struct catchtag' from the catchlist and returns the proper
value.

`Fthrow()' goes up through the catchlist until it finds one with a
matching tag. It then calls `unbind_catch()' to restore everything to
what it was when the appropriate catch was set, stores the return value
in the `struct catchtag', and jumps (with `_longjmp()') to its jump
point.

`unbind_catch()' removes all catches from the catchlist until it
finds the correct one. Some of the catches might have been placed for
error-trapping, and if so, the appropriate entries on the handlerlist
must be removed (see "errors"). `unbind_catch()' also restores the
values of `gcprolist', `backtrace_list', and `lisp_eval_depth', and
calls `unbind_to()' to undo any specbindings created since the catch.
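
Here is a minimal C sketch of the catch/throw mechanism using the standard `setjmp()' and `longjmp()'. Tags and values are plain integers, and only the catchlist and the return value are restored; the real code also restores the backtrace list, evaluation depth, GC protection list and specbindings. All `toy_' names are hypothetical. The `val' field is declared `volatile' because it is modified between the `setjmp()' and the `longjmp()'.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical, reduced catchtag: tags and values are ints. */
struct toy_catchtag
{
  int tag;
  volatile int val;            /* written between setjmp() and longjmp() */
  struct toy_catchtag *next;
  jmp_buf jmp;
};

struct toy_catchtag *toy_catchlist;

/* Like internal_catch(): call FUNC (ARG) with a catch for TAG around it. */
int toy_internal_catch (int tag, int (*func) (int), int arg)
{
  struct toy_catchtag c;

  c.tag = tag;
  c.next = toy_catchlist;
  toy_catchlist = &c;
  if (setjmp (c.jmp) == 0)
    c.val = func (arg);        /* normal exit */
  /* Normal exit or thrown to: remove the catch and return its value. */
  toy_catchlist = c.next;
  return c.val;
}

/* Like Fthrow(): find the innermost matching catch, discard the catches
   inside it (the unbind_catch() step), store VALUE and jump.  The real
   code signals an error when no catch matches; this sketch returns 0. */
int toy_throw (int tag, int value)
{
  struct toy_catchtag *c;

  for (c = toy_catchlist; c != NULL; c = c->next)
    if (c->tag == tag)
      {
        toy_catchlist = c;
        c->val = value;
        longjmp (c->jmp, 1);
      }
  return 0;
}

/* Example bodies used in the usage below. */
int toy_body_returns (int x) { return x + 1; }
int toy_body_throws (int x) { toy_throw (42, x * 10); return -1; }
int toy_body_nested (int x) { return toy_internal_catch (7, toy_body_throws, x); }
```

In the nested case, the throw to tag 42 skips over the inner catch for tag 7, which is discarded along the way, and the outer `toy_internal_catch()' returns the thrown value.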

File: internals.info, Node: Symbols and Variables, Next: Buffers and Textual Representation, Prev: Evaluation; Stack Frames; Bindings, Up: Top

Symbols and Variables
*********************

* Menu:

* Introduction to Symbols::
* Obarrays::
* Symbol Values::

File: internals.info, Node: Introduction to Symbols, Next: Obarrays, Up: Symbols and Variables

Introduction to Symbols
=======================

A symbol is basically just an object with four fields: a name (a
string), a value (some Lisp object), a function (some Lisp object), and
a property list (usually a list of alternating keyword/value pairs).
What makes symbols special is that there is usually only one symbol with
a given name, and the symbol is referred to by name. This makes a
symbol a convenient way of calling up data by name, i.e. of implementing
variables. (The variable's value is stored in the "value slot".)
Similarly, functions are referenced by name, and the definition of the
function is stored in a symbol's "function slot". This means that
there can be a distinct function and variable with the same name. The
property list is used as a more general mechanism of associating
additional values with particular names, and once again the namespace is
independent of the function and variable namespaces.

File: internals.info, Node: Obarrays, Next: Symbol Values, Prev: Introduction to Symbols, Up: Symbols and Variables

Obarrays
========

The identity of symbols with their names is accomplished through a
structure called an obarray, which is just a poorly-implemented hash
table mapping from strings to symbols whose name is that string. (I say
"poorly implemented" because an obarray appears in Lisp as a vector
with some hidden fields rather than as its own opaque type. This is an
Emacs Lisp artifact that should be fixed.)

Obarrays are implemented as a vector of some fixed size (which should
be a prime for best results), where each "bucket" of the vector
contains zero or more symbols, threaded through a hidden `next' field in
the symbol. Lookup of a symbol in an obarray, and adding a symbol to
an obarray, is accomplished through standard hash-table techniques.

The standard Lisp function for working with symbols and obarrays is
`intern'. This looks up a symbol in an obarray given its name; if it's
not found, a new symbol is automatically created with the specified
name, added to the obarray, and returned. This is what happens when the
Lisp reader encounters a symbol (or more precisely, encounters the name
of a symbol) in some text that it is reading. There is a standard
obarray called `obarray' that is used for this purpose, although the
Lisp programmer is free to create their own obarrays and `intern'
symbols into them.
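
A minimal C sketch of an obarray as described above: a fixed-size vector of buckets, each a chain of symbols threaded through a hidden `next' field. The size 211 (a prime), the string hash, and the `toy_' names are assumptions for illustration, not what XEmacs uses.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_OBARRAY_SIZE 211          /* a prime, as recommended above */

struct toy_symbol
{
  char *name;
  struct toy_symbol *next;            /* hidden chain within one bucket */
};

struct toy_obarray
{
  struct toy_symbol *buckets[TOY_OBARRAY_SIZE];
};

static unsigned toy_hash (const char *s)
{
  unsigned h = 0;
  while (*s)
    h = h * 31 + (unsigned char) *s++;
  return h % TOY_OBARRAY_SIZE;
}

/* Like `intern-soft': look up NAME, returning NULL if absent. */
struct toy_symbol *toy_intern_soft (struct toy_obarray *ob, const char *name)
{
  struct toy_symbol *sym;
  for (sym = ob->buckets[toy_hash (name)]; sym != NULL; sym = sym->next)
    if (strcmp (sym->name, name) == 0)
      return sym;
  return NULL;
}

/* Like `intern': look up NAME, creating and adding a symbol if absent. */
struct toy_symbol *toy_intern (struct toy_obarray *ob, const char *name)
{
  struct toy_symbol *sym = toy_intern_soft (ob, name);
  unsigned h;

  if (sym != NULL)
    return sym;
  sym = malloc (sizeof *sym);
  assert (sym != NULL);
  sym->name = malloc (strlen (name) + 1);
  assert (sym->name != NULL);
  strcpy (sym->name, name);
  h = toy_hash (name);
  sym->next = ob->buckets[h];         /* push onto the bucket's chain */
  ob->buckets[h] = sym;
  return sym;
}
```

Interning the same name twice in one obarray returns the same symbol, while interning it in a second obarray creates a distinct symbol with the same name, which matches the behavior described in this node.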

Note that, once a symbol is in an obarray, it stays there until
something is done about it, and the standard obarray `obarray' always
stays around, so once you use any particular variable name, a
corresponding symbol will stay around in `obarray' until you exit
XEmacs.

Note that `obarray' itself is a variable, and as such there is a
symbol in `obarray' whose name is `"obarray"' and which contains
`obarray' as its value.

Note also that this call to `intern' occurs only when in the Lisp
reader, not when the code is executed (at which point the symbol is
already around, stored as such in the definition of the function).

You can create your own obarray using `make-vector' (this is
horrible but is an artifact) and intern symbols into that obarray.
Doing that can result in two or more symbols with the same name.
However, at most one of these symbols is in the standard `obarray': you
cannot have two symbols of the same name in any particular obarray.
Note that you cannot add a symbol to an obarray in any fashion other
than using `intern': i.e. you can't take an existing symbol and put it
in an existing obarray. Nor can you change the name of an existing
symbol. (Since obarrays are vectors, you can violate the consistency of
things by storing directly into the vector, but let's ignore that
possibility.)

Usually symbols are created by `intern', but if you really want, you
can explicitly create a symbol using `make-symbol', giving it some
name. The resulting symbol is not in any obarray (i.e. it is
"uninterned"), and you can't add it to any obarray. Therefore its
primary purpose is as a symbol to use in macros to avoid namespace
pollution. It can also be used as a carrier of information, but cons
cells could probably be used just as well.

You can also use `intern-soft' to look up a symbol but not create a
new one, and `unintern' to remove a symbol from an obarray. This
returns the removed symbol. (Remember: you can't put the symbol back
into any obarray.) Finally, `mapatoms' maps over all of the symbols in
an obarray.
771 File: internals.info, Node: Symbol Values, Prev: Obarrays, Up: Symbols and Variables
776 The value field of a symbol normally contains a Lisp object.
777 However, a symbol can be "unbound", meaning that it logically has no
778 value. This is internally indicated by storing a special Lisp object,
779 called "the unbound marker" and stored in the global variable
780 `Qunbound'. The unbound marker is of a special Lisp object type called
781 "symbol-value-magic". It is impossible for the Lisp programmer to
782 directly create or access any object of this type.
784 *You must not let any "symbol-value-magic" object escape to the Lisp
785 level.* Printing any of these objects will cause the message `INTERNAL
786 EMACS BUG' to appear as part of the print representation. (You may see
787 this normally when you call `debug_print()' from the debugger on a Lisp
788 object.) If you let one of these objects escape to the Lisp level, you
789 will violate a number of assumptions contained in the C code and make
790 the unbound marker not function right.
When a symbol is created, its value field and function field are
both set to `Qunbound'. The Lisp programmer can restore these conditions
later using `makunbound' or `fmakunbound', and can query whether the
value or function fields are "bound" (i.e. have a value other than
`Qunbound') using `boundp' and `fboundp'. The fields are set to a
normal Lisp object using `set' (or `setq') and `fset'.
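The following sketch models the unbound marker as one distinguished object whose address is compared against a symbol's value and function cells. The names `toy_object', `toy_symbol', `TOY_QUNBOUND', `toy_boundp', and friends are hypothetical, and the real `Lisp_Object' representation is quite different; the point is only that "unbound" means "the cell holds the special marker".

```c
#include <assert.h>

/* Hypothetical stand-in for a Lisp object: a pointer to a payload. */
typedef struct toy_object { int dummy; } toy_object;
typedef toy_object *toy_lisp;

/* One distinguished object playing the role of `Qunbound'. */
static toy_object toy_unbound_marker;
#define TOY_QUNBOUND (&toy_unbound_marker)

typedef struct {
  toy_lisp value;      /* value cell */
  toy_lisp function;   /* function cell */
} toy_symbol;

/* A freshly created symbol has neither a value nor a function. */
static void toy_symbol_init (toy_symbol *sym)
{
  sym->value = TOY_QUNBOUND;
  sym->function = TOY_QUNBOUND;
}

static int toy_boundp (const toy_symbol *sym)
{
  return sym->value != TOY_QUNBOUND;
}

static int toy_fboundp (const toy_symbol *sym)
{
  return sym->function != TOY_QUNBOUND;
}

/* Analogue of `makunbound': put the marker back in the value cell. */
static void toy_makunbound (toy_symbol *sym)
{
  sym->value = TOY_QUNBOUND;
}

/* Analogue of `set'.  Toy invariant: Lisp code can never supply the
   marker itself, since it must not escape to the Lisp level. */
static void toy_set (toy_symbol *sym, toy_lisp v)
{
  assert (v != TOY_QUNBOUND);
  sym->value = v;
}
```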
Other symbol-value-magic objects are used as special markers to
indicate variables that have unusual properties. These include
variables that are tied into C variables (setting the variable magically
sets some global variable in the C code, and likewise for retrieving the
variable's value), variables that magically tie into slots in the
current buffer, variables that are buffer-local, etc. The
symbol-value-magic object is stored in the value cell in place of a
normal object, and the code to retrieve a symbol's value (i.e.
`symbol-value') knows how to handle these objects specially. This means
that you should not fetch the value cell directly if you want a
symbol's value.
The exact workings of this are complex, and are documented in detail
in comments in `buffer.c', `symbols.c', and `lisp.h'.
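As a rough illustration of why the value cell must not be read directly, here is a sketch of a value cell that either holds an ordinary value or a "magic" forwarding object pointing at a C variable. It covers only the C-variable case, uses `long' in place of Lisp objects, and every name in it (`toy_value_cell', `toy_symbol_value', `toy_set', `toy_c_global') is hypothetical; the real mechanism in `symbols.c' handles many more cases.

```c
#include <assert.h>

/* Hypothetical value cell: an ordinary value, or a "magic" object
   that forwards reads and writes to a C variable. */
typedef enum { TOY_NORMAL, TOY_MAGIC_FORWARD_INT } toy_cell_kind;

typedef struct {
  toy_cell_kind kind;
  union {
    long normal;        /* ordinary value, kept in the cell itself */
    long *c_variable;   /* magic: the cell points at a C global */
  } u;
} toy_value_cell;

/* Analogue of `symbol-value': never hand back the raw magic object,
   always follow it to the real value. */
static long toy_symbol_value (const toy_value_cell *cell)
{
  switch (cell->kind)
    {
    case TOY_NORMAL:
      return cell->u.normal;
    case TOY_MAGIC_FORWARD_INT:
      return *cell->u.c_variable;
    }
  assert (0 && "unknown value-cell kind");
  return 0;
}

/* Analogue of `set': writing a forwarded variable updates the C global. */
static void toy_set (toy_value_cell *cell, long v)
{
  switch (cell->kind)
    {
    case TOY_NORMAL:
      cell->u.normal = v;
      return;
    case TOY_MAGIC_FORWARD_INT:
      *cell->u.c_variable = v;
      return;
    }
  assert (0 && "unknown value-cell kind");
}

/* A hypothetical C global that Lisp code would see as a variable. */
static long toy_c_global = 70;
```

Code that read `cell->u.normal' directly on a forwarded cell would get garbage, which is the toy version of the warning above.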
815 File: internals.info, Node: Buffers and Textual Representation, Next: MULE Character Sets and Encodings, Prev: Symbols and Variables, Up: Top
817 Buffers and Textual Representation
818 **********************************
822 * Introduction to Buffers:: A buffer holds a block of text such as a file.
823 * The Text in a Buffer:: Representation of the text in a buffer.
824 * Buffer Lists:: Keeping track of all buffers.
825 * Markers and Extents:: Tagging locations within a buffer.
826 * Bufbytes and Emchars:: Representation of individual characters.
827 * The Buffer Object:: The Lisp object corresponding to a buffer.
830 File: internals.info, Node: Introduction to Buffers, Next: The Text in a Buffer, Up: Buffers and Textual Representation
832 Introduction to Buffers
833 =======================
835 A buffer is logically just a Lisp object that holds some text. In
836 this, it is like a string, but a buffer is optimized for frequent
837 insertion and deletion, while a string is not. Furthermore:
1. Buffers are "permanent" objects, i.e. once you create them, they
remain around, and need to be explicitly deleted before they go
away.
2. Each buffer has a unique name, which is a string. Buffers are
normally referred to by name. In this respect, they are like
symbols.
847 3. Buffers have a default insertion position, called "point".
848 Inserting text (unless you explicitly give a position) goes at
849 point, and moves point forward past the text. This is what is
850 going on when you type text into Emacs.
852 4. Buffers have lots of extra properties associated with them.
5. Buffers can be "displayed". What this means is that there are a
number of "windows", which are objects that correspond to some
856 visible section of your display, and each window has an associated
857 buffer, and the current contents of the buffer are shown in that
858 section of the display. The redisplay mechanism (which takes care
859 of doing this) knows how to look at the text of a buffer and come
860 up with some reasonable way of displaying this. Many of the
861 properties of a buffer control how the buffer's text is displayed.
863 6. One buffer is distinguished and called the "current buffer". It is
864 stored in the variable `current_buffer'. Buffer operations operate
865 on this buffer by default. When you are typing text into a
866 buffer, the buffer you are typing into is always `current_buffer'.
867 Switching to a different window changes the current buffer. Note
868 that Lisp code can temporarily change the current buffer using
869 `set-buffer' (often enclosed in a `save-excursion' so that the
870 former current buffer gets restored when the code is finished).
871 However, calling `set-buffer' will NOT cause a permanent change in
872 the current buffer. The reason for this is that the top-level
873 event loop sets `current_buffer' to the buffer of the selected
874 window, each time it finishes executing a user command.
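The interaction between `set-buffer' and the top-level loop can be sketched as follows. The structures and functions (`toy_buffer', `toy_window', `toy_current_buffer', `toy_selected_window', `toy_run_command') are hypothetical simplifications; they show only that a command may change the current buffer freely, and that the loop resets it to the selected window's buffer once the command finishes.

```c
#include <assert.h>

/* Hypothetical, heavily simplified buffer and window objects. */
typedef struct { const char *name; } toy_buffer;
typedef struct { toy_buffer *buffer; } toy_window;

static toy_buffer *toy_current_buffer;   /* plays the role of current_buffer */
static toy_window *toy_selected_window;

/* Analogue of `set-buffer': changes only the current buffer. */
static void toy_set_buffer (toy_buffer *b)
{
  toy_current_buffer = b;
}

/* A command that switches to another buffer to do its work. */
static void toy_command_visiting (toy_buffer *other)
{
  toy_set_buffer (other);
  /* ... operate on OTHER ... */
}

/* One iteration of a toy top-level loop: run the command, then make
   the selected window's buffer current again. */
static void toy_run_command (void (*cmd) (toy_buffer *), toy_buffer *arg)
{
  cmd (arg);
  toy_current_buffer = toy_selected_window->buffer;
}
```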
876 Make sure you understand the distinction between "current buffer"
877 and "buffer of the selected window", and the distinction between
878 "point" of the current buffer and "window-point" of the selected
window. (This latter distinction is explained in detail in the section
on windows.)
883 File: internals.info, Node: The Text in a Buffer, Next: Buffer Lists, Prev: Introduction to Buffers, Up: Buffers and Textual Representation
888 The text in a buffer consists of a sequence of zero or more
889 characters. A "character" is an integer that logically represents a
890 letter, number, space, or other unit of text. Most of the characters
891 that you will typically encounter belong to the ASCII set of characters,
but there are also characters for various sorts of accented letters,
special symbols, Chinese and Japanese characters (e.g. Kanji and
Katakana), Cyrillic and Greek letters, etc. The actual number of
possible characters is quite large.
897 For now, we can view a character as some non-negative integer that
898 has some shape that defines how it typically appears (e.g. as an
899 uppercase A). (The exact way in which a character appears depends on the
900 font used to display the character.) The internal type of characters in
901 the C code is an `Emchar'; this is just an `int', but using a symbolic
902 type makes the code clearer.
Between every two adjacent characters in a buffer (and at the beginning
and end of the buffer) is a "buffer position" or "character position".
We can speak of the character before or after a particular buffer
position, and when you insert a character at a particular position, all
characters after that position end up at new positions. When we speak
of the character "at" a position, we really mean the character after
the position. (This ambiguity between a buffer position being "between"
two characters and "on" a character is rampant in Emacs.)
913 Buffer positions are numbered starting at 1. This means that
914 position 1 is before the first character, and position 0 is not valid.
915 If there are N characters in a buffer, then buffer position N+1 is
916 after the last one, and position N+2 is not valid.
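The numbering rules above amount to two small checks. This sketch uses hypothetical helpers on a plain C string with no gap: a position is valid if it lies in 1 to N+1, and the character "at" a position exists only for positions 1 to N, since position N+1 has nothing after it.

```c
#include <assert.h>

/* Buffer positions run from 1 to N+1 for a buffer of N characters. */
static int toy_valid_position (long pos, long nchars)
{
  return pos >= 1 && pos <= nchars + 1;
}

/* Character "at" POS, i.e. the character after it.  TEXT is an
   ordinary 0-based C array (no gap), so position P maps to TEXT[P-1]. */
static char toy_char_at (const char *text, long nchars, long pos)
{
  assert (pos >= 1 && pos <= nchars);
  return text[pos - 1];
}
```

For a three-character buffer "abc", positions 1 through 4 are valid, position 1 is before `a', and position 4 is after `c'.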
918 The internal makeup of the Emchar integer varies depending on whether
919 we have compiled with MULE support. If not, the Emchar integer is an
8-bit value in the range 0 - 255: values 0 - 127 are the standard ASCII
characters, while values 128 - 255 are the upper half of the ISO-8859-1
(Latin-1) character set. If we have compiled with MULE support, an
923 Emchar is a 19-bit integer, with the various bits having meanings
924 according to a complex scheme that will be detailed later. The
925 characters numbered 0 - 255 still have the same meanings as for the
926 non-MULE case, though.
928 Internally, the text in a buffer is represented in a fairly simple
929 fashion: as a contiguous array of bytes, with a "gap" of some size in
930 the middle. Although the gap is of some substantial size in bytes,
931 there is no text contained within it: From the perspective of the text
932 in the buffer, it does not exist. The gap logically sits at some buffer
933 position, between two characters (or possibly at the beginning or end of
934 the buffer). Insertion of text in a buffer at a particular position is
935 always accomplished by first moving the gap to that position (i.e.
936 through some block moving of text), then writing the text into the
937 beginning of the gap, thereby shrinking the gap. If the gap shrinks
938 down to nothing, a new gap is created. (What actually happens is that a
939 new gap is "created" at the end of the buffer's text, which requires
940 nothing more than changing a couple of indices; then the gap is "moved"
941 to the position where the insertion needs to take place by moving up in
942 memory all the text after that position.) Similarly, deletion occurs
943 by moving the gap to the place where the text is to be deleted, and
944 then simply expanding the gap to include the deleted text.
("Expanding" and "shrinking" the gap as just described means just that
the internal indices that keep track of where the gap is located are
changed.)
Note that the total amount of memory allocated for a buffer's text
950 never decreases while the buffer is live. Therefore, if you load up a
951 20-megabyte file and then delete all but one character, there will be a
952 20-megabyte gap, which won't get any smaller (except by inserting
953 characters back again). Once the buffer is killed, the memory allocated
954 for the buffer text will be freed, but it will still be sitting on the
955 heap, taking up virtual memory, and will not be released back to the
956 operating system. (However, if you have compiled XEmacs with rel-alloc,
957 the situation is different. In this case, the space *will* be released
958 back to the operating system. However, this tends to result in a
959 noticeable speed penalty.)
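The gap mechanism described above can be sketched as a minimal gap buffer for the non-MULE case (one byte per character). All names (`toy_gapbuf', `toy_move_gap', `toy_insert', `toy_delete', and so on) are hypothetical, and the sketch uses 0-based offsets internally while exposing 1-based buffer positions; it is not the code in `insdel.c'.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Text lives in [0, gap_start) and [gap_end, size); the gap
   [gap_start, gap_end) holds no text. */
typedef struct {
  unsigned char *data;
  size_t size;       /* bytes allocated; never decreases */
  size_t gap_start;  /* 0-based offset of the first gap byte */
  size_t gap_end;    /* 0-based offset just past the gap */
} toy_gapbuf;

static size_t toy_length (const toy_gapbuf *b)
{
  return b->size - (b->gap_end - b->gap_start);
}

static void toy_init (toy_gapbuf *b, size_t initial)
{
  b->data = malloc (initial);
  assert (b->data != NULL);
  b->size = initial;
  b->gap_start = 0;
  b->gap_end = initial;
}

/* Move the gap so that it sits before 0-based text offset OFF, by
   block-moving the text between the old and new gap locations. */
static void toy_move_gap (toy_gapbuf *b, size_t off)
{
  assert (off <= toy_length (b));
  if (off < b->gap_start)
    {
      size_t n = b->gap_start - off;   /* bytes moving to after the gap */
      memmove (b->data + b->gap_end - n, b->data + off, n);
      b->gap_start -= n;
      b->gap_end -= n;
    }
  else if (off > b->gap_start)
    {
      size_t n = off - b->gap_start;   /* bytes moving to before the gap */
      memmove (b->data + b->gap_start, b->data + b->gap_end, n);
      b->gap_start += n;
      b->gap_end += n;
    }
}

/* If the gap is too small, create a bigger one at the end of the text. */
static void toy_ensure_gap (toy_gapbuf *b, size_t needed)
{
  size_t new_size;
  unsigned char *p;
  if (b->gap_end - b->gap_start >= needed)
    return;
  toy_move_gap (b, toy_length (b));    /* old gap now ends at b->size */
  new_size = b->size + needed + 64;    /* arbitrary extra slack */
  p = realloc (b->data, new_size);
  assert (p != NULL);
  b->data = p;
  b->size = new_size;
  b->gap_end = new_size;
}

/* Insert N bytes at 1-based position POS: move the gap there, then
   write into the beginning of the gap, shrinking it. */
static void toy_insert (toy_gapbuf *b, long pos, const char *s, size_t n)
{
  assert (pos >= 1 && (size_t) pos <= toy_length (b) + 1);
  toy_ensure_gap (b, n);
  toy_move_gap (b, (size_t) pos - 1);
  memcpy (b->data + b->gap_start, s, n);
  b->gap_start += n;
}

/* Delete text between 1-based positions FROM and TO: move the gap to
   FROM, then expand it over the deleted text.  Nothing is freed. */
static void toy_delete (toy_gapbuf *b, long from, long to)
{
  assert (1 <= from && from <= to && (size_t) to <= toy_length (b) + 1);
  toy_move_gap (b, (size_t) from - 1);
  b->gap_end += (size_t) (to - from);
}

/* Copy the text (without the gap) into OUT as a C string. */
static void toy_contents (const toy_gapbuf *b, char *out)
{
  memcpy (out, b->data, b->gap_start);
  memcpy (out + b->gap_start, b->data + b->gap_end, b->size - b->gap_end);
  out[toy_length (b)] = '\0';
}
```

As in the text, deletion only widens the gap, so `size' never goes down while the buffer is live.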
961 Astute readers may notice that the text in a buffer is represented as
962 an array of *bytes*, while (at least in the MULE case) an Emchar is a
963 19-bit integer, which clearly cannot fit in a byte. This means (of
964 course) that the text in a buffer uses a different representation from
965 an Emchar: specifically, the 19-bit Emchar becomes a series of one to
966 four bytes. The conversion between these two representations is complex
967 and will be described later.
969 In the non-MULE case, everything is very simple: An Emchar is an
970 8-bit value, which fits neatly into one byte.
972 If we are given a buffer position and want to retrieve the character
973 at that position, we need to follow these steps:
975 1. Pretend there's no gap, and convert the buffer position into a
976 "byte index" that indexes to the appropriate byte in the buffer's
977 stream of textual bytes. By convention, byte indices begin at 1,
978 just like buffer positions. In the non-MULE case, byte indices
979 and buffer positions are identical, since one character equals one
982 2. Convert the byte index into a "memory index", which takes the gap
983 into account. The memory index is a direct index into the block of
984 memory that stores the text of a buffer. This basically just
985 involves checking to see if the byte index is past the gap, and if
986 so, adding the size of the gap to it. By convention, memory
987 indices begin at 1, just like buffer positions and byte indices,
988 and when referring to the position that is "at" the gap, we always
use the memory position at the *beginning*, not at the end, of the
gap.
992 3. Fetch the appropriate bytes at the determined memory position.
994 4. Convert these bytes into an Emchar.
In the non-MULE case, (3) and (4) boil down to a simple one-byte
memory access.
999 Note that we have defined three types of positions in a buffer:
1001 1. "buffer positions" or "character positions", typedef `Bufpos'
1003 2. "byte indices", typedef `Bytind'
1005 3. "memory indices", typedef `Memind'
1007 All three typedefs are just `int's, but defining them this way makes
1008 things a lot clearer.
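The four retrieval steps can be sketched for the non-MULE case using the typedef names above. Only `Bufpos', `Bytind', `Memind', `Bufbyte', and `Emchar' come from the text; `toy_buffer_text', its fields, and the helper functions are hypothetical. The sketch follows the stated convention that a byte index exactly at the gap maps to the *beginning* of the gap, and therefore has to skip over the gap explicitly when fetching the character after such a position.

```c
#include <assert.h>

typedef int Bufpos;
typedef int Bytind;
typedef int Memind;
typedef unsigned char Bufbyte;
typedef int Emchar;

/* Hypothetical non-MULE buffer text: memory index 1 is text[0]. */
typedef struct {
  Bufbyte *text;
  Bytind gpt;      /* byte index of the position where the gap sits */
  int gap_size;    /* size of the gap in bytes */
} toy_buffer_text;

/* Step 1: without MULE, a byte index equals the buffer position. */
static Bytind toy_bufpos_to_bytind (Bufpos pos)
{
  return (Bytind) pos;
}

/* Step 2: a byte index at the gap maps to the beginning of the gap;
   indices past the gap are shifted by the gap's size. */
static Memind toy_bytind_to_memind (const toy_buffer_text *t, Bytind bi)
{
  return bi > t->gpt ? (Memind) (bi + t->gap_size) : (Memind) bi;
}

/* Steps 3 and 4: fetch the byte after POS and treat it as an Emchar. */
static Emchar toy_char_at (const toy_buffer_text *t, Bufpos pos)
{
  Bytind bi;
  Memind mi;
  assert (pos >= 1);
  bi = toy_bufpos_to_bytind (pos);
  mi = toy_bytind_to_memind (t, bi);
  if (bi == t->gpt)          /* the character after the gap is past it */
    mi += t->gap_size;
  return (Emchar) t->text[mi - 1];   /* memory indices start at 1 */
}
```

With the text "abcd" stored as `a', `b', a three-byte gap, then `c', `d', the gap sits at byte index 3, so `toy_char_at' at position 3 returns `c' from memory index 6.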
1010 Most code works with buffer positions. In particular, all Lisp code
1011 that refers to text in a buffer uses buffer positions. Lisp code does
1012 not know that byte indices or memory indices exist.
1014 Finally, we have a typedef for the bytes in a buffer. This is a
1015 `Bufbyte', which is an unsigned char. Referring to them as Bufbytes
1016 underscores the fact that we are working with a string of bytes in the
1017 internal Emacs buffer representation rather than in one of a number of
1018 possible alternative representations (e.g. EUC-encoded text, etc.).