1 \input texinfo @c -*-texinfo-*-
3 @setfilename ../../info/internals.info
4 @settitle XEmacs Internals Manual
8 @dircategory XEmacs Editor
10 * Internals: (internals). XEmacs Internals Manual.
13 Copyright @copyright{} 1992 - 1996 Ben Wing.
14 Copyright @copyright{} 1996, 1997 Sun Microsystems.
15 Copyright @copyright{} 1994 - 1998 Free Software Foundation.
16 Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois.
19 Permission is granted to make and distribute verbatim copies of this
20 manual provided the copyright notice and this permission notice are
21 preserved on all copies.
24 Permission is granted to process this file through TeX and print the
25 results, provided the printed document carries copying permission notice
26 identical to this one except for the removal of this paragraph (this
27 paragraph not being relevant to the printed manual).
30 Permission is granted to copy and distribute modified versions of this
31 manual under the conditions for verbatim copying, provided that the
32 entire resulting derived work is distributed under the terms of a
33 permission notice identical to this one.
35 Permission is granted to copy and distribute translations of this manual
36 into another language, under the above conditions for modified versions,
37 except that this permission notice may be stated in a translation
38 approved by the Foundation.
40 Permission is granted to copy and distribute modified versions of this
41 manual under the conditions for verbatim copying, provided also that the
42 section entitled ``GNU General Public License'' is included exactly as
43 in the original, and provided that the entire resulting derived work is
distributed under the terms of a permission notice identical to this one.
47 Permission is granted to copy and distribute translations of this manual
48 into another language, under the above conditions for modified versions,
49 except that the section entitled ``GNU General Public License'' may be
50 included in a translation approved by the Free Software Foundation
51 instead of in the original English.
61 @setchapternewpage odd
65 @title XEmacs Internals Manual
66 @subtitle Version 1.4, March 2001
69 @author Martin Buchholz
71 @author Matthias Neubauer
72 @author Olivier Galibert
77 Copyright @copyright{} 1992 - 1996, 2001 Ben Wing. @*
78 Copyright @copyright{} 1996, 1997 Sun Microsystems, Inc. @*
79 Copyright @copyright{} 1994 - 1998 Free Software Foundation. @*
80 Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois.
86 Permission is granted to make and distribute verbatim copies of this
87 manual provided the copyright notice and this permission notice are
88 preserved on all copies.
90 Permission is granted to copy and distribute modified versions of this
91 manual under the conditions for verbatim copying, provided also that the
92 section entitled ``GNU General Public License'' is included
93 exactly as in the original, and provided that the entire resulting
94 derived work is distributed under the terms of a permission notice
95 identical to this one.
97 Permission is granted to copy and distribute translations of this manual
98 into another language, under the above conditions for modified versions,
99 except that the section entitled ``GNU General Public License'' may be
100 included in a translation approved by the Free Software Foundation
101 instead of in the original English.
105 @node Top, A History of Emacs, (dir), (dir)
108 This Info file contains v1.4 of the XEmacs Internals Manual, March 2001.
112 * A History of Emacs:: Times, dates, important events.
113 * XEmacs From the Outside:: A broad conceptual overview.
114 * The Lisp Language:: An overview.
115 * XEmacs From the Perspective of Building::
116 * XEmacs From the Inside::
117 * The XEmacs Object System (Abstractly Speaking)::
118 * How Lisp Objects Are Represented in C::
119 * Rules When Writing New C Code::
120 * Regression Testing XEmacs::
121 * A Summary of the Various XEmacs Modules::
122 * Allocation of Objects in XEmacs Lisp::
124 * Events and the Event Loop::
125 * Evaluation; Stack Frames; Bindings::
126 * Symbols and Variables::
127 * Buffers and Textual Representation::
128 * MULE Character Sets and Encodings::
129 * The Lisp Reader and Compiler::
131 * Consoles; Devices; Frames; Windows::
132 * The Redisplay Mechanism::
139 * Interface to the X Window System::
144 --- The Detailed Node Listing ---
148 * Through Version 18:: Unification prevails.
149 * Lucid Emacs:: One version 19 Emacs.
150 * GNU Emacs 19:: The other version 19 Emacs.
151 * GNU Emacs 20:: The other version 20 Emacs.
152 * XEmacs:: The continuation of Lucid Emacs.
154 Rules When Writing New C Code
156 * General Coding Rules::
157 * Writing Lisp Primitives::
158 * Adding Global Lisp Variables::
160 * Techniques for XEmacs Developers::
164 * Character-Related Data Types::
165 * Working With Character and Byte Positions::
166 * Conversion to and from External Data::
167 * General Guidelines for Writing Mule-Aware Code::
168 * An Example of Mule-Aware Code::
170 Regression Testing XEmacs
172 A Summary of the Various XEmacs Modules
174 * Low-Level Modules::
175 * Basic Lisp Modules::
176 * Modules for Standard Editing Operations::
177 * Editor-Level Control Flow Modules::
178 * Modules for the Basic Displayable Lisp Objects::
179 * Modules for other Display-Related Lisp Objects::
180 * Modules for the Redisplay Mechanism::
181 * Modules for Interfacing with the File System::
182 * Modules for Other Aspects of the Lisp Interpreter and Object System::
183 * Modules for Interfacing with the Operating System::
184 * Modules for Interfacing with X Windows::
185 * Modules for Internationalization::
186 * Modules for Regression Testing::
188 Allocation of Objects in XEmacs Lisp
190 * Introduction to Allocation::
191 * Garbage Collection::
193 * Garbage Collection - Step by Step::
194 * Integers and Characters::
195 * Allocation from Frob Blocks::
197 * Low-level allocation::
204 * Compiled Function::
206 Garbage Collection - Step by Step
209 * garbage_collect_1::
212 * sweep_lcrecords_1::
213 * compact_string_chars::
215 * sweep_bit_vectors_1::
220 * Data descriptions::
227 * Address allocation::
232 Events and the Event Loop
234 * Introduction to Events::
236 * Specifics of the Event Gathering Mechanism::
237 * Specifics About the Emacs Event::
238 * The Event Stream Callback Routines::
239 * Other Event Loop Functions::
240 * Converting Events::
241 * Dispatching Events; The Command Builder::
243 Evaluation; Stack Frames; Bindings
246 * Dynamic Binding; The specbinding Stack; Unwind-Protects::
247 * Simple Special Forms::
250 Symbols and Variables
252 * Introduction to Symbols::
256 Buffers and Textual Representation
258 * Introduction to Buffers:: A buffer holds a block of text such as a file.
259 * The Text in a Buffer:: Representation of the text in a buffer.
260 * Buffer Lists:: Keeping track of all buffers.
261 * Markers and Extents:: Tagging locations within a buffer.
262 * Bufbytes and Emchars:: Representation of individual characters.
263 * The Buffer Object:: The Lisp object corresponding to a buffer.
265 MULE Character Sets and Encodings
269 * Internal Mule Encodings::
274 * Japanese EUC (Extended Unix Code)::
277 Internal Mule Encodings
279 * Internal String Encoding::
280 * Internal Character Encoding::
284 * Creating an Lstream:: Creating an lstream object.
285 * Lstream Types:: Different sorts of things that are streamed.
286 * Lstream Functions:: Functions for working with lstreams.
287 * Lstream Methods:: Creating new lstream types.
289 Consoles; Devices; Frames; Windows
291 * Introduction to Consoles; Devices; Frames; Windows::
294 * The Window Object::
296 The Redisplay Mechanism
298 * Critical Redisplay Sections::
300 * Redisplay Piece by Piece::
304 * Introduction to Extents:: Extents are ranges over text, with properties.
305 * Extent Ordering:: How extents are ordered internally.
306 * Format of the Extent Info:: The extent information in a buffer or string.
307 * Zero-Length Extents:: A weird special case.
308 * Mathematics of Extent Ordering:: A rigorous foundation.
309 * Extent Fragments:: Cached information useful for redisplay.
314 @node A History of Emacs, XEmacs From the Outside, Top, Top
315 @chapter A History of Emacs
316 @cindex history of Emacs, a
317 @cindex Emacs, a history of
318 @cindex Hackers (Steven Levy)
320 @cindex ITS (Incompatible Timesharing System)
321 @cindex Stallman, Richard
326 @cindex Free Software Foundation
328 XEmacs is a powerful, customizable text editor and development
329 environment. It began as Lucid Emacs, which was in turn derived from
330 GNU Emacs, a program written by Richard Stallman of the Free Software
331 Foundation. GNU Emacs dates back to the 1970's, and was modelled
332 after a package called ``Emacs'', written in 1976, that was a set of
333 macros on top of TECO, an old, old text editor written at MIT on the
334 DEC PDP 10 under one of the earliest time-sharing operating systems,
335 ITS (Incompatible Timesharing System). (ITS dates back well before
336 Unix.) ITS, TECO, and Emacs were products of a group of people at MIT
337 who called themselves ``hackers'', who shared an idealistic belief
338 system about the free exchange of information and were fanatical in
339 their devotion to and time spent with computers. (The hacker
340 subculture dates back to the late 1950's at MIT and is described in
341 detail in Steven Levy's book @cite{Hackers}. This book also includes
342 a lot of information about Stallman himself and the development of
343 Lisp, a programming language developed at MIT that underlies Emacs.)
346 * Through Version 18:: Unification prevails.
347 * Lucid Emacs:: One version 19 Emacs.
348 * GNU Emacs 19:: The other version 19 Emacs.
349 * GNU Emacs 20:: The other version 20 Emacs.
350 * XEmacs:: The continuation of Lucid Emacs.
353 @node Through Version 18
354 @section Through Version 18
355 @cindex version 18, through
356 @cindex Gosling, James
357 @cindex Great Usenet Renaming
The history of the earliest versions of GNU Emacs is unclear, but from
the middle of 1985 onward it is well documented. A time line follows:
364 GNU Emacs version 15 (15.34) was released sometime in 1984 or 1985 and
365 shared some code with a version of Emacs written by James Gosling (the
366 same James Gosling who later created the Java language).
368 GNU Emacs version 16 (first released version was 16.56) was released on
369 July 15, 1985. All Gosling code was removed due to potential copyright
370 problems with the code.
372 version 16.57: released on September 16, 1985.
374 versions 16.58, 16.59: released on September 17, 1985.
376 version 16.60: released on September 19, 1985. These later version 16's
377 incorporated patches from the net, esp. for getting Emacs to work under
380 version 17.36 (first official v17 release) released on December 20,
381 1985. Included a TeX-able user manual. First official unpatched
382 version that worked on vanilla System V machines.
384 version 17.43 (second official v17 release) released on January 25,
387 version 17.45 released on January 30, 1986.
389 version 17.46 released on February 4, 1986.
391 version 17.48 released on February 10, 1986.
393 version 17.49 released on February 12, 1986.
395 version 17.55 released on March 18, 1986.
397 version 17.57 released on March 27, 1986.
399 version 17.58 released on April 4, 1986.
401 version 17.61 released on April 12, 1986.
403 version 17.63 released on May 7, 1986.
405 version 17.64 released on May 12, 1986.
407 version 18.24 (a beta version) released on October 2, 1986.
409 version 18.30 (a beta version) released on November 15, 1986.
411 version 18.31 (a beta version) released on November 23, 1986.
413 version 18.32 (a beta version) released on December 7, 1986.
415 version 18.33 (a beta version) released on December 12, 1986.
417 version 18.35 (a beta version) released on January 5, 1987.
419 version 18.36 (a beta version) released on January 21, 1987.
421 January 27, 1987: The Great Usenet Renaming. net.emacs is now
424 version 18.37 (a beta version) released on February 12, 1987.
426 version 18.38 (a beta version) released on March 3, 1987.
428 version 18.39 (a beta version) released on March 14, 1987.
430 version 18.40 (a beta version) released on March 18, 1987.
432 version 18.41 (the first ``official'' release) released on March 22,
435 version 18.45 released on June 2, 1987.
437 version 18.46 released on June 9, 1987.
439 version 18.47 released on June 18, 1987.
441 version 18.48 released on September 3, 1987.
443 version 18.49 released on September 18, 1987.
445 version 18.50 released on February 13, 1988.
447 version 18.51 released on May 7, 1988.
449 version 18.52 released on September 1, 1988.
451 version 18.53 released on February 24, 1989.
453 version 18.54 released on April 26, 1989.
455 version 18.55 released on August 23, 1989. This is the earliest version
456 that is still available by FTP.
458 version 18.56 released on January 17, 1991.
460 version 18.57 released late January, 1991.
462 version 18.58 released ?????.
464 version 18.59 released October 31, 1992.
474 Lucid Emacs was developed by the (now-defunct) Lucid Inc., a maker of
475 C++ and Lisp development environments. It began when Lucid decided they
476 wanted to use Emacs as the editor and cornerstone of their C++
477 development environment (called ``Energize''). They needed many features
478 that were not available in the existing version of GNU Emacs (version
479 18.5something), in particular good and integrated support for GUI
480 elements such as mouse support, multiple fonts, multiple window-system
481 windows, etc. A branch of GNU Emacs called Epoch, written at the
482 University of Illinois, existed that supplied many of these features;
483 however, Lucid needed more than what existed in Epoch. At the time, the
484 Free Software Foundation was working on version 19 of Emacs (this was
485 sometime around 1991), which was planned to have similar features, and
486 so Lucid decided to work with the Free Software Foundation. Their plan
487 was to add features that they needed, and coordinate with the FSF so
488 that the features would get included back into Emacs version 19.
490 Delays in the release of version 19 occurred, however (resulting in it
491 finally being released more than a year after what was initially
492 planned), and Lucid encountered unexpected technical resistance in
493 getting their changes merged back into version 19, so they decided to
494 release their own version of Emacs, which became Lucid Emacs 19.0.
496 @cindex Zawinski, Jamie
497 @cindex Sexton, Harlan
499 @cindex Devin, Matthieu
500 The initial authors of Lucid Emacs were Matthieu Devin, Harlan Sexton,
501 and Eric Benson, and the work was later taken over by Jamie Zawinski,
502 who became ``Mr. Lucid Emacs'' for many releases.
504 A time line for Lucid Emacs is
508 version 19.0 shipped with Energize 1.0, April 1992.
510 version 19.1 released June 4, 1992.
512 version 19.2 released June 19, 1992.
514 version 19.3 released September 9, 1992.
516 version 19.4 released January 21, 1993.
518 version 19.5 was a repackaging of 19.4 with a few bug fixes and
519 shipped with Energize 2.0. Never released to the net.
521 version 19.6 released April 9, 1993.
523 version 19.7 was a repackaging of 19.6 with a few bug fixes and
524 shipped with Energize 2.1. Never released to the net.
526 version 19.8 released September 6, 1993.
528 version 19.9 released January 12, 1994.
530 version 19.10 released May 27, 1994.
532 version 19.11 (first XEmacs) released September 13, 1994.
534 version 19.12 released June 23, 1995.
536 version 19.13 released September 1, 1995.
538 version 19.14 released June 23, 1996.
540 version 20.0 released February 9, 1997.
542 version 19.15 released March 28, 1997.
544 version 20.1 (not released to the net) April 15, 1997.
546 version 20.2 released May 16, 1997.
548 version 19.16 released October 31, 1997.
550 version 20.3 (the first stable version of XEmacs 20.x) released November 30,
553 version 20.4 released February 28, 1998.
555 version 21.1.2 released May 14, 1999. (The version naming scheme was
556 changed at this point: [a] the second version number is odd for stable
557 versions, even for beta versions; [b] a third version number is added,
558 replacing the "beta xxx" ending for beta versions and allowing for
559 periodic maintenance releases for stable versions. Therefore, 21.0 was
560 never "officially" released; similarly for 21.2, etc.)
562 version 21.1.3 released June 26, 1999.
564 version 21.1.4 released July 8, 1999.
566 version 21.1.6 released August 14, 1999. (There was no 21.1.5.)
568 version 21.1.7 released September 26, 1999.
570 version 21.1.8 released November 2, 1999.
572 version 21.1.9 released February 13, 2000.
574 version 21.1.10 released May 7, 2000.
576 version 21.1.10a released June 24, 2000.
578 version 21.1.11 released July 18, 2000.
580 version 21.1.12 released August 5, 2000.
582 version 21.1.13 released January 7, 2001.
584 version 21.1.14 released January 27, 2001.
588 @section GNU Emacs 19
590 @cindex Emacs 19, GNU
591 @cindex version 19, GNU Emacs
594 About a year after the initial release of Lucid Emacs, the FSF
595 released a beta of their version of Emacs 19 (referred to here as ``GNU
596 Emacs''). By this time, the current version of Lucid Emacs was
597 19.6. (Strangely, the first released beta from the FSF was GNU Emacs
598 19.7.) A time line for GNU Emacs version 19 is
602 version 19.8 (beta) released May 27, 1993.
604 version 19.9 (beta) released May 27, 1993.
606 version 19.10 (beta) released May 30, 1993.
608 version 19.11 (beta) released June 1, 1993.
610 version 19.12 (beta) released June 2, 1993.
612 version 19.13 (beta) released June 8, 1993.
614 version 19.14 (beta) released June 17, 1993.
616 version 19.15 (beta) released June 19, 1993.
618 version 19.16 (beta) released July 6, 1993.
620 version 19.17 (beta) released late July, 1993.
622 version 19.18 (beta) released August 9, 1993.
624 version 19.19 (beta) released August 15, 1993.
626 version 19.20 (beta) released November 17, 1993.
628 version 19.21 (beta) released November 17, 1993.
630 version 19.22 (beta) released November 28, 1993.
632 version 19.23 (beta) released May 17, 1994.
634 version 19.24 (beta) released May 16, 1994.
636 version 19.25 (beta) released June 3, 1994.
638 version 19.26 (beta) released September 11, 1994.
640 version 19.27 (beta) released September 14, 1994.
642 version 19.28 (first ``official'' release) released November 1, 1994.
644 version 19.29 released June 21, 1995.
646 version 19.30 released November 24, 1995.
648 version 19.31 released May 25, 1996.
650 version 19.32 released July 31, 1996.
652 version 19.33 released August 11, 1996.
654 version 19.34 released August 21, 1996.
656 version 19.34b released September 6, 1996.
659 @cindex Mlynarik, Richard
660 In some ways, GNU Emacs 19 was better than Lucid Emacs; in some ways,
661 worse. Lucid soon began incorporating features from GNU Emacs 19 into
662 Lucid Emacs; the work was mostly done by Richard Mlynarik, who had been
663 working on and using GNU Emacs for a long time (back as far as version
667 @section GNU Emacs 20
669 @cindex Emacs 20, GNU
670 @cindex version 20, GNU Emacs
On February 2, 1997, work began on integrating Mule into GNU Emacs. The
first release was made in September of that year.
A time line for Emacs 20 is
680 version 20.1 released September 17, 1997.
682 version 20.2 released September 20, 1997.
684 version 20.3 released August 19, 1998.
691 @cindex Sun Microsystems
692 @cindex University of Illinois
693 @cindex Illinois, University of
695 @cindex Andreessen, Marc
697 @cindex Buchholz, Martin
698 @cindex Kaplan, Simon
700 @cindex Thompson, Chuck
703 @cindex Amdahl Corporation
704 Around the time that Lucid was developing Energize, Sun Microsystems
705 was developing their own development environment (called ``SPARCWorks'')
706 and also decided to use Emacs. They joined forces with the Epoch team
707 at the University of Illinois and later with Lucid. The maintainer of
708 the last-released version of Epoch was Marc Andreessen, but he dropped
709 out and the Epoch project, headed by Simon Kaplan, lured Chuck Thompson
710 away from a system administration job to become the primary Lucid Emacs
711 author for Epoch and Sun. Chuck's area of specialty became the
712 redisplay engine (he replaced the old Lucid Emacs redisplay engine with
713 a ported version from Epoch and then later rewrote it from scratch).
714 Sun also hired Ben Wing (the author of Win-Emacs, a port of Lucid Emacs
715 to Microsoft Windows 3.1) in 1993, for what was initially a one-month
716 contract to fix some event problems but later became a many-year
717 involvement, punctuated by a six-month contract with Amdahl Corporation.
719 @cindex rename to XEmacs
In 1994, Sun and Lucid agreed to rename Lucid Emacs to XEmacs (a name
that favored neither company); the first release called XEmacs was
722 version 19.11. In June 1994, Lucid folded and Jamie quit to work for
723 the newly formed Mosaic Communications Corp., later Netscape
724 Communications Corp. (co-founded by the same Marc Andreessen, who had
725 quit his Epoch job to work on a graphical browser for the World Wide
Web). Chuck then became the primary maintainer of XEmacs, and put out
727 versions 19.11 through 19.14 in conjunction with Ben. For 19.12 and
728 19.13, Chuck added the new redisplay and many other display improvements
729 and Ben added MULE support (support for Asian and other languages) and
730 redesigned most of the internal Lisp subsystems to better support the
731 MULE work and the various other features being added to XEmacs. After
732 19.14 Chuck retired as primary maintainer and Steve Baur stepped in.
734 @cindex MULE merged XEmacs appears
735 Soon after 19.13 was released, work began in earnest on the MULE
736 internationalization code and the source tree was divided into two
737 development paths. The MULE version was initially called 19.20, but was
738 soon renamed to 20.0. In 1996 Martin Buchholz of Sun Microsystems took
739 over the care and feeding of it and worked on it in parallel with the
740 19.14 development that was occurring at the same time. After much work
741 by Martin, it was decided to release 20.0 ahead of 19.15 in February
742 1997. The source tree remained divided until 20.2 when the version 19
743 source was finally retired at version 19.16.
746 @cindex Buchholz, Martin
748 @cindex Niksic, Hrvoje
749 @cindex XEmacs goes it alone
In 1997, Sun finally dropped all pretense of support for XEmacs and
Martin Buchholz left the company in November. Since then (and, because
Steve Baur was never paid to work on XEmacs, for most of the previous
year as well), XEmacs has existed solely on the contributions of
volunteers from the free software community. Since 1997, Hrvoje Niksic
and Kyle Jones have figured prominently in XEmacs development.
757 @cindex merging attempts
758 Many attempts have been made to merge XEmacs and GNU Emacs, but they
759 have consistently failed.
761 A more detailed history is contained in the XEmacs About page.
763 A time line for XEmacs is
767 version 19.11 (first XEmacs) released September 13, 1994.
769 version 19.12 released June 23, 1995.
771 version 19.13 released September 1, 1995.
773 version 19.14 released June 23, 1996.
775 version 20.0 released February 9, 1997.
777 version 19.15 released March 28, 1997.
779 version 20.1 (not released to the net) April 15, 1997.
781 version 20.2 released May 16, 1997.
783 version 19.16 released October 31, 1997.
785 version 20.3 (the first stable version of XEmacs 20.x) released November 30,
788 version 20.4 released February 28, 1998.
790 version 21.0.60 released December 10, 1998. (The version naming scheme was
791 changed at this point: [a] the second version number is odd for stable
792 versions, even for beta versions; [b] a third version number is added,
793 replacing the "beta xxx" ending for beta versions and allowing for
794 periodic maintenance releases for stable versions. Therefore, 21.0 was
795 never "officially" released; similarly for 21.2, etc.)
797 version 21.0.61 released January 4, 1999.
799 version 21.0.63 released February 3, 1999.
801 version 21.0.64 released March 1, 1999.
803 version 21.0.65 released March 5, 1999.
805 version 21.0.66 released March 12, 1999.
807 version 21.0.67 released March 25, 1999.
809 version 21.1.2 released May 14, 1999. (This is the followup to 21.0.67.
810 The second version number was bumped to indicate the beginning of the
813 version 21.1.3 released June 26, 1999.
815 version 21.1.4 released July 8, 1999.
817 version 21.1.6 released August 14, 1999. (There was no 21.1.5.)
819 version 21.1.7 released September 26, 1999.
821 version 21.1.8 released November 2, 1999.
823 version 21.1.9 released February 13, 2000.
825 version 21.1.10 released May 7, 2000.
827 version 21.1.10a released June 24, 2000.
829 version 21.1.11 released July 18, 2000.
831 version 21.1.12 released August 5, 2000.
833 version 21.1.13 released January 7, 2001.
835 version 21.1.14 released January 27, 2001.
837 version 21.2.9 released February 3, 1999.
839 version 21.2.10 released February 5, 1999.
841 version 21.2.11 released March 1, 1999.
843 version 21.2.12 released March 5, 1999.
845 version 21.2.13 released March 12, 1999.
847 version 21.2.14 released May 14, 1999.
849 version 21.2.15 released June 4, 1999.
851 version 21.2.16 released June 11, 1999.
853 version 21.2.17 released June 22, 1999.
855 version 21.2.18 released July 14, 1999.
857 version 21.2.19 released July 30, 1999.
859 version 21.2.20 released November 10, 1999.
861 version 21.2.21 released November 28, 1999.
863 version 21.2.22 released November 29, 1999.
865 version 21.2.23 released December 7, 1999.
867 version 21.2.24 released December 14, 1999.
869 version 21.2.25 released December 24, 1999.
871 version 21.2.26 released December 31, 1999.
873 version 21.2.27 released January 18, 2000.
875 version 21.2.28 released February 7, 2000.
877 version 21.2.29 released February 16, 2000.
879 version 21.2.30 released February 21, 2000.
881 version 21.2.31 released February 23, 2000.
883 version 21.2.32 released March 20, 2000.
885 version 21.2.33 released May 1, 2000.
887 version 21.2.34 released May 28, 2000.
889 version 21.2.35 released July 19, 2000.
891 version 21.2.36 released October 4, 2000.
893 version 21.2.37 released November 14, 2000.
895 version 21.2.38 released December 5, 2000.
897 version 21.2.39 released December 31, 2000.
899 version 21.2.40 released January 8, 2001.
901 version 21.2.41 released January 17, 2001.
903 version 21.2.42 released January 20, 2001.
905 version 21.2.43 released January 26, 2001.
907 version 21.2.44 released February 8, 2001.
909 version 21.2.45 released February 23, 2001.
911 version 21.2.46 released March 21, 2001.
914 @node XEmacs From the Outside, The Lisp Language, A History of Emacs, Top
915 @chapter XEmacs From the Outside
916 @cindex XEmacs from the outside
917 @cindex outside, XEmacs from the
918 @cindex read-eval-print
920 XEmacs appears to the outside world as an editor, but it is really a
921 Lisp environment. At its heart is a Lisp interpreter; it also
922 ``happens'' to contain many specialized object types (e.g. buffers,
923 windows, frames, events) that are useful for implementing an editor.
924 Some of these objects (in particular windows and frames) have
925 displayable representations, and XEmacs provides a function
926 @code{redisplay()} that ensures that the display of all such objects
927 matches their internal state. Most of the time, a standard Lisp
928 environment is in a @dfn{read-eval-print} loop---i.e. ``read some Lisp
929 code, execute it, and print the results''. XEmacs has a similar loop:
@itemize @bullet
@item
read an event
@item
dispatch the event (i.e. ``do it'')
@item
redisplay
@end itemize
940 Reading an event is done using the Lisp function @code{next-event},
941 which waits for something to happen (typically, the user presses a key
942 or moves the mouse) and returns an event object describing this.
943 Dispatching an event is done using the Lisp function
944 @code{dispatch-event}, which looks up the event in a keymap object (a
945 particular kind of object that associates an event with a Lisp function)
946 and calls that function. The function ``does'' what the user has
947 requested by changing the state of particular frame objects, buffer
948 objects, etc. Finally, @code{redisplay()} is called, which updates the
949 display to reflect those changes just made. Thus is an ``editor'' born.
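
The read/dispatch/redisplay cycle just described can be sketched in C.
This is a minimal illustration under stated assumptions, not the XEmacs
implementation: the @code{Event}, @code{Buffer}, and @code{Keymap}
structures, the canned input string, and the C functions
@code{next_event}, @code{dispatch_event}, @code{redisplay}, and
@code{command_loop} are all hypothetical stand-ins, named after the Lisp
functions only for readability.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* All types and functions here are hypothetical simplifications. */
enum { MAX_TEXT = 64, NUM_KEYS = 128 };

typedef struct { unsigned char key; } Event;   /* an "event object" */

typedef struct {
    char text[MAX_TEXT];   /* NUL-terminated contents */
    size_t len;
    int changed;           /* set by commands, cleared by redisplay */
} Buffer;

typedef void (*Command)(Buffer *buf, const Event *ev);

/* A keymap associates an event (here, a key code) with a function. */
typedef struct { Command bindings[NUM_KEYS]; } Keymap;

/* Command: append the typed character to the buffer. */
static void self_insert(Buffer *buf, const Event *ev)
{
    if (buf->len + 1 < MAX_TEXT) {
        buf->text[buf->len++] = (char) ev->key;
        buf->text[buf->len] = '\0';
        buf->changed = 1;
    }
}

/* Command: delete the last character, if any. */
static void delete_backward(Buffer *buf, const Event *ev)
{
    (void) ev;
    if (buf->len > 0) {
        buf->text[--buf->len] = '\0';
        buf->changed = 1;
    }
}

/* Read: take the next key from a canned input string.
   Returns 0 when the input is exhausted. */
static int next_event(const char **input, Event *ev)
{
    if (**input == '\0')
        return 0;
    ev->key = (unsigned char) *(*input)++;
    return 1;
}

/* Dispatch: look the event up in the keymap and call its function. */
static void dispatch_event(const Keymap *map, Buffer *buf, const Event *ev)
{
    Command cmd = ev->key < NUM_KEYS ? map->bindings[ev->key] : NULL;
    if (cmd)
        cmd(buf, ev);
}

/* Redisplay: make the "screen" match the buffer's internal state. */
static void redisplay(Buffer *buf, char *screen, size_t size)
{
    if (buf->changed && size > 0) {
        strncpy(screen, buf->text, size - 1);
        screen[size - 1] = '\0';
        buf->changed = 0;
    }
}

/* The loop itself: read an event, dispatch it, redisplay. */
static void command_loop(const char *input, const Keymap *map,
                         Buffer *buf, char *screen, size_t size)
{
    Event ev;
    assert(input && map && buf && screen);
    while (next_event(&input, &ev)) {
        dispatch_event(map, buf, &ev);
        redisplay(buf, screen, size);
    }
}
```

With @code{self_insert} bound to ordinary characters and
@code{delete_backward} bound to backspace, feeding the loop the keys
@kbd{a}, @kbd{b}, backspace, @kbd{c} leaves both the buffer and the
screen holding @samp{ac}. The commands only change internal state; the
screen is brought up to date separately, once per event.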
951 @cindex bridge, playing
953 @cindex pi, calculating
954 Note that you do not have to use XEmacs as an editor; you could just
955 as well make it do your taxes, compute pi, play bridge, etc. You'd just
956 have to write functions to do those operations in Lisp.
958 @node The Lisp Language, XEmacs From the Perspective of Building, XEmacs From the Outside, Top
959 @chapter The Lisp Language
960 @cindex Lisp language, the
963 @cindex Lisp vs. Java
964 @cindex Java vs. Lisp
965 @cindex dynamic scoping
966 @cindex scoping, dynamic
967 @cindex dynamic types
968 @cindex types, dynamic
971 @cindex Gosling, James
973 Lisp is a general-purpose language that is higher-level than C and in
974 many ways more powerful than C. Powerful dialects of Lisp such as
975 Common Lisp are probably much better languages for writing very large
976 applications than is C. (Unfortunately, for many non-technical
977 reasons C and its successor C++ have become the dominant languages for
978 application development. These languages are both inadequate for
979 extremely large applications, which is evidenced by the fact that newer,
980 larger programs are becoming ever harder to write and are requiring ever
more programmers despite great improvements in C development environments;
982 and by the fact that, although hardware speeds and reliability have been
983 growing at an exponential rate, most software is still generally
984 considered to be slow and buggy.)
986 The new Java language holds promise as a better general-purpose
987 development language than C. Java has many features in common with
988 Lisp that are not shared by C (this is not a coincidence, since
989 Java was designed by James Gosling, a former Lisp hacker). This
990 will be discussed more later.
For those used to C, here is a summary of the basic differences between
C and Lisp:
997 Lisp has an extremely regular syntax. Every function, expression,
998 and control statement is written in the form
1001 (@var{func} @var{arg1} @var{arg2} ...)
1004 This is as opposed to C, which writes functions as
1007 func(@var{arg1}, @var{arg2}, ...)
1010 but writes expressions involving operators as (e.g.)
1013 @var{arg1} + @var{arg2}
1016 and writes control statements as (e.g.)
1019 while (@var{expr}) @{ @var{statement1}; @var{statement2}; ... @}
1022 Lisp equivalents of the latter two would be
1025 (+ @var{arg1} @var{arg2} ...)
1031 (while @var{expr} @var{statement1} @var{statement2} ...)
1035 Lisp is a safe language. Assuming there are no bugs in the Lisp
1036 interpreter/compiler, it is impossible to write a program that ``core
1037 dumps'' or otherwise causes the machine to execute an illegal
1038 instruction. This is very different from C, where perhaps the most
1039 common outcome of a bug is exactly such a crash. A corollary of this is that
1040 the C operation of casting a pointer is impossible (and unnecessary) in
1041 Lisp, and that it is impossible to access memory outside the bounds of
1045 Programs and data are written in the same form. The
1046 parenthesis-enclosing form described above for statements is the same
1047 form used for the most common data type in Lisp, the list. Thus, it is
1048 possible to represent any Lisp program using Lisp data types, and for
1049 one program to construct Lisp statements and then dynamically
1050 @dfn{evaluate} them, or cause them to execute.
1053 All objects are @dfn{dynamically typed}. This means that part of every
1054 object is an indication of what type it is. A Lisp program can
1055 manipulate an object without knowing what type it is, and can query an
1056 object to determine its type. This means that, correspondingly,
1057 variables and function parameters can hold objects of any type and are
1058 not normally declared as being of any particular type. This is opposed
1059 to the @dfn{static typing} of C, where variables can hold exactly one
1060 type of object and must be declared as such, and objects do not contain
1061 an indication of their type because it's implicit in the variables they
1062 are stored in. It is possible in C to have a variable hold different
1063 types of objects (e.g. through the use of @code{void *} pointers or
1064 variable-argument functions), but the type information must then be
1065 passed explicitly in some other fashion, leading to additional program
1069 Allocated memory is automatically reclaimed when it is no longer in use.
1070 This operation is called @dfn{garbage collection} and involves looking
1071 through all variables to see what memory is being pointed to, and
1072 reclaiming any memory that is not pointed to and is thus
1073 ``inaccessible'' and out of use. This is as opposed to C, in which
1074 allocated memory must be explicitly reclaimed using @code{free()}. If
1075 you simply drop all pointers to memory without freeing it, it becomes
1076 ``leaked'' memory that still takes up space. Over a long period of
1077 time, this can cause your program to grow and grow until it runs out of
1081 Lisp has built-in facilities for handling errors and exceptions. In C,
1082 when an error occurs, usually either the program exits entirely or the
1083 routine in which the error occurs returns a value indicating this. If
1084 an error occurs in a deeply-nested routine, then every routine currently
1085 called must unwind itself normally and return an error value back up to
1086 the next routine. This means that every routine must explicitly check
1087 for an error in all the routines it calls; if it does not do so,
1088 unexpected and often random behavior results. This is an extremely
1089 common source of bugs in C programs. An alternative would be to do a
1090 non-local exit using @code{longjmp()}, but that is often very dangerous
1091 because the routines that were exited past had no opportunity to clean
1092 up after themselves and may leave things in an inconsistent state,
1093 causing a crash shortly afterwards.
1095 Lisp provides mechanisms to make such non-local exits safe. When an
1096 error occurs, a routine simply signals that an error of a particular
1097 class has occurred, and a non-local exit takes place. Any routine can
1098 trap errors occurring in routines it calls by registering an error
1099 handler for some or all classes of errors. (If no handler is registered,
1100 a default handler, generally installed by the top-level event loop, is
1101 executed; this prints out the error and continues.) Routines can also
1102 specify cleanup code (called an @dfn{unwind-protect}) that will be
1103 called when control exits from a block of code, no matter how that exit
1104 occurs---i.e. even if a function deeply nested below it causes a
1105 non-local exit back to the top level.
1107 Note that this facility has appeared in some recent vintages of C, in
1108 particular Visual C++ and other PC compilers written for the Microsoft
1112 In Emacs Lisp, local variables are @dfn{dynamically scoped}. This means
1113 that if you declare a local variable in a particular function, and then
1114 call another function, that subfunction can ``see'' the local variable
1115 you declared. This is actually considered a bug in Emacs Lisp and in
1116 all other early dialects of Lisp, and was corrected in Common Lisp. (In
1117 Common Lisp, you can still declare dynamically scoped variables if you
1118 want to---they are sometimes useful---but variables by default are
1119 @dfn{lexically scoped} as in C.)
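
C has no dynamically scoped local variables, but the effect described in
the last item above can be imitated by temporarily rebinding a global and
restoring it afterwards. The following is only a sketch under that
assumption: the names @code{fill_column}, @code{current_fill_column}, and
@code{with_fill_column} are invented for illustration and are not taken
from the XEmacs sources.

```c
/* Plays the role of a dynamically scoped variable (hypothetical name). */
static int fill_column = 70;

/* A "subfunction": it sees whatever binding is current when it is called. */
int current_fill_column(void)
{
    return fill_column;
}

/* Roughly what
     (let ((fill-column new_value)) (current-fill-column))
   does under dynamic scoping: rebind, call, then restore. */
int with_fill_column(int new_value)
{
    int saved = fill_column;       /* remember the outer binding */
    int seen;

    fill_column = new_value;       /* establish the temporary binding */
    seen = current_fill_column();  /* the callee observes new_value */
    fill_column = saved;           /* undo the binding on the way out */
    return seen;
}
```

Under lexical scoping the callee could not see the caller's binding at
all; here it can, because the binding lives in shared state for the
duration of the call.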
1122 For those familiar with Lisp, Emacs Lisp is modelled after MacLisp, an
1123 early dialect of Lisp developed at MIT (no relation to the Macintosh
1124 computer). There is a Common Lisp compatibility package available for
1125 Emacs that provides many of the features of Common Lisp.
1127 The Java language is derived in many ways from C, and shares a similar
1128 syntax, but has the following features in common with Lisp (and different
1133 Java is a safe language, like Lisp.
1135 Java provides garbage collection, like Lisp.
1137 Java has built-in facilities for handling errors and exceptions, like
1140 Java has a type system that combines the best advantages of both static
1141 and dynamic typing. Objects (except very simple types) are explicitly
1142 marked with their type, as in dynamic typing; but there is a hierarchy
1143 of types and functions are declared to accept only certain types, thus
1144 providing the increased compile-time error-checking of static typing.
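
To make ``explicitly marked with their type'' concrete, here is a
minimal C sketch of a value that carries its own type tag, together with
an operation that checks the tags at run time. Everything in it
(@code{toy_object}, @code{toy_add}, and the three-member tag set) is
hypothetical and chosen for illustration; it is not how XEmacs or Java
represents objects.

```c
/* Hypothetical tag set; a real Lisp system has many more types. */
typedef enum { TOY_INTEGER, TOY_FLOAT, TOY_STRING } toy_type;

typedef struct {
    toy_type type;              /* the indication of what type this is */
    union {
        long integer;
        double real;
        const char *string;
    } value;
} toy_object;

toy_object toy_make_integer(long n)
{
    toy_object o;
    o.type = TOY_INTEGER;
    o.value.integer = n;
    return o;
}

toy_object toy_make_float(double d)
{
    toy_object o;
    o.type = TOY_FLOAT;
    o.value.real = d;
    return o;
}

toy_object toy_make_string(const char *s)
{
    toy_object o;
    o.type = TOY_STRING;
    o.value.string = s;
    return o;
}

/* Query an object for its type, as a dynamically typed program can. */
const char *toy_type_name(toy_object o)
{
    switch (o.type) {
    case TOY_INTEGER: return "integer";
    case TOY_FLOAT:   return "float";
    case TOY_STRING:  return "string";
    }
    return "unknown";
}

static double toy_as_double(toy_object o)
{
    return o.type == TOY_INTEGER ? (double) o.value.integer : o.value.real;
}

/* Add two numbers, checking the tags at run time.  Returns 0 on
   success and -1 if either argument is not a number. */
int toy_add(toy_object a, toy_object b, toy_object *result)
{
    if (a.type == TOY_STRING || b.type == TOY_STRING)
        return -1;
    if (a.type == TOY_INTEGER && b.type == TOY_INTEGER)
        *result = toy_make_integer(a.value.integer + b.value.integer);
    else
        *result = toy_make_float(toy_as_double(a) + toy_as_double(b));
    return 0;
}
```

The type error in @code{toy_add} is only detected when the call is made;
a statically typed declaration would let the compiler reject the same
call before the program runs.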
1147 The Java language also has some negative attributes:
1151 Java uses the edit/compile/run model of software development. This
1152 makes it hard to use interactively. For example, to use Java like
1153 @code{bc} it is necessary to write a special purpose, albeit tiny,
1154 application. In Emacs Lisp, a calculator comes built-in without any
1155 effort; one can always just type an expression in the @code{*scratch*}
1158 Java tries too hard to enforce, not merely enable, portability, making
1159 ordinary access to standard OS facilities painful. Java has an
1160 @dfn{agenda}. I think this is why @code{chdir} is not part of standard
1161 Java, which is inexcusable.
1164 Unfortunately, there is no perfect language. Static typing allows a
1165 compiler to catch programmer errors and produce more efficient code, but
1166 makes programming more tedious and less fun. For the foreseeable future,
1167 an Ideal Editing and Programming Environment (and that is what XEmacs
1168 aspires to) will be programmable in multiple languages: high level ones
1169 like Lisp for user customization and prototyping, and lower level ones
1170 for infrastructure and industrial strength applications. If I had my
1171 way, XEmacs would be friendly towards the Python, Scheme, C++, ML, and
1172 other communities. But there are serious technical difficulties to
1173 achieving that goal.
1175 The word @dfn{application} in the previous paragraph was used
1176 intentionally. XEmacs implements an API for programs written in Lisp
1177 that makes it a full-fledged application platform, very much like an OS
1180 @node XEmacs From the Perspective of Building, XEmacs From the Inside, The Lisp Language, Top
1181 @chapter XEmacs From the Perspective of Building
1182 @cindex XEmacs from the perspective of building
1183 @cindex building, XEmacs from the perspective of
1185 The heart of XEmacs is the Lisp environment, which is written in C.
1186 This is contained in the @file{src/} subdirectory. Underneath
1187 @file{src/} are two subdirectories of header files: @file{s/} (header
1188 files for particular operating systems) and @file{m/} (header files for
1189 particular machine types). In practice the distinction between the two
1190 types of header files is blurred. These header files define or undefine
1191 certain preprocessor constants and macros to indicate particular
1192 characteristics of the associated machine or operating system. As part
1193 of the configure process, one @file{s/} file and one @file{m/} file are
1194 identified for the particular environment in which XEmacs is being
1197 XEmacs also contains a great deal of Lisp code. This implements the
1198 operations that make XEmacs useful as an editor as well as just a Lisp
1199 environment, and also contains many add-on packages that allow XEmacs to
1200 browse directories, act as a mail and Usenet news reader, compile Lisp
1201 code, etc. There is actually more Lisp code than C code associated with
1202 XEmacs, but much of the Lisp code is peripheral to the actual operation
1203 of the editor. The Lisp code all lies in subdirectories underneath the
1204 @file{lisp/} directory.
1206 The @file{lwlib/} directory contains C code that implements a
1207 generalized interface onto different X widget toolkits and also
1208 implements some widgets of its own that behave like Motif widgets but
1209 are faster, free, and in some cases more powerful. The code in this
1210 directory compiles into a library and is mostly independent from XEmacs.
1212 The @file{etc/} directory contains various data files associated with
1213 XEmacs. Some of them are actually read by XEmacs at startup; others
1214 merely contain useful information of various sorts.
1216 The @file{lib-src/} directory contains C code for various auxiliary
1217 programs that are used in connection with XEmacs. Some of them are used
1218 during the build process; others are used to perform certain functions
1219 that cannot conveniently be placed in the XEmacs executable (e.g. the
1220 @file{movemail} program for fetching mail out of @file{/var/spool/mail},
1221 which must be setgid to @file{mail} on many systems; and the
1222 @file{gnuclient} program, which allows an external script to communicate
1223 with a running XEmacs process).
1225 The @file{man/} directory contains the sources for the XEmacs
1226 documentation. It is mostly in a form called Texinfo, which can be
1227 converted into either a printed document (by passing it through @TeX{})
1228 or into on-line documentation called @dfn{info files}.
1230 The @file{info/} directory contains the results of formatting the XEmacs
1231 documentation as @dfn{info files}, for on-line use. These files are
1232 used when you enter the Info system using @kbd{C-h i} or through the
1235 The @file{dynodump/} directory contains auxiliary code used to build
1236 XEmacs on Solaris platforms.
1238 The other directories contain various miscellaneous code and information
1239 that is not normally used or needed.
1241 The first step of building involves running the @file{configure} program
1242 and passing it various parameters to specify any optional features you
1243 want and compiler arguments and such, as described in the @file{INSTALL}
1244 file. This determines what the build environment is, chooses the
1245 appropriate @file{s/} and @file{m/} file, and runs a series of tests to
1246 determine many details about your environment, such as which library
1247 functions are available and exactly how they work. The reason for
1248 running these tests is that it allows XEmacs to be compiled on a much
1249 wider variety of platforms than those that the XEmacs developers happen
1250 to be familiar with, including various sorts of hybrid platforms. This
1251 is especially important now that many operating systems give you a great
1252 deal of control over exactly what features you want installed, and allow
1253 for easy upgrading of parts of a system without upgrading the rest. It
1254 would be impossible to pre-determine and pre-specify the information for
1255 all possible configurations.
1257 In fact, the @file{s/} and @file{m/} files are basically @emph{evil},
1258 since they contain unmaintainable platform-specific hard-coded
1259 information. XEmacs has been moving in the direction of having all
1260 system-specific information be determined dynamically by
1261 @file{configure}. Perhaps someday we can @code{rm -rf src/s src/m}.
1263 When @file{configure} is done running, it generates @file{Makefile}s and
1264 @file{GNUmakefile}s and the file @file{src/config.h} (which describes
1265 the features of your system) from template files. You then run
1266 @file{make}, which compiles the auxiliary code and programs in
1267 @file{lib-src/} and @file{lwlib/} and the main XEmacs executable in
1268 @file{src/}. The result of compiling and linking is an executable
1269 called @file{temacs}, which is @emph{not} the final XEmacs executable.
1270 @file{temacs} by itself is not intended to function as an editor or even
1271 display any windows on the screen, and if you simply run it, it will
1272 exit immediately. The @file{Makefile} runs @file{temacs} with certain
1273 options that cause it to initialize itself, read in a number of basic
1274 Lisp files, and then dump itself out into a new executable called
1275 @file{xemacs}. This new executable has been pre-initialized and
1276 contains pre-digested Lisp code that is necessary for the editor to
1277 function (this includes most basic editing functions,
1278 e.g. @code{kill-line}, that can be defined in terms of other Lisp
1279 primitives; some initialization code that is called when certain
1280 objects, such as frames, are created; and all of the standard
1281 keybindings and code for the actions they result in). This executable,
1282 @file{xemacs}, is the executable that you run to use the XEmacs editor.
1284 Although @file{temacs} is not intended to be run as an editor, it can,
1285 by using the incantation @code{temacs -batch -l loadup.el run-temacs}.
1286 This is useful when the dumping procedure described above is broken, or
1287 when using certain program debugging tools such as Purify. These tools
1288 get mighty confused by the tricks played by the XEmacs build process,
1289 such as allocating memory in one process, and freeing it in the next.
1291 @node XEmacs From the Inside, The XEmacs Object System (Abstractly Speaking), XEmacs From the Perspective of Building, Top
1292 @chapter XEmacs From the Inside
1293 @cindex XEmacs from the inside
1294 @cindex inside, XEmacs from the
1296 Internally, XEmacs is quite complex, and can be very confusing. To
1297 simplify things, it can be useful to think of XEmacs as containing an
1298 event loop that ``drives'' everything, and a number of other subsystems,
1299 such as a Lisp engine and a redisplay mechanism. Each of these other
1300 subsystems exists simultaneously in XEmacs, and each has a certain
1301 state. The flow of control continually passes in and out of these
1302 different subsystems in the course of normal operation of the editor.
1304 It is important to keep in mind that, most of the time, the editor is
1305 ``driven'' by the event loop. Except during initialization and batch
1306 mode, all subsystems are entered directly or indirectly through the
1307 event loop, and ultimately, control exits out of all subsystems back up
1308 to the event loop. This cycle of entering a subsystem, exiting back out
1309 to the event loop, and starting another iteration of the event loop
1310 occurs once each keystroke, mouse motion, etc.
1312 If you're trying to understand a particular subsystem (other than the
1313 event loop), think of it as a ``daemon'' process or ``servant'' that is
1314 responsible for one particular aspect of a larger system, and
1315 periodically receives commands or environment changes that cause it to
1316 do something. Ultimately, these commands and environment changes are
1317 always triggered by the event loop. For example:
1321 The window and frame mechanism is responsible for keeping track of what
1322 windows and frames exist, what buffers are in them, etc. It is
1323 periodically given commands (usually from the user) to make a change to
1324 the current window/frame state: e.g. create a new frame, delete a
1328 The buffer mechanism is responsible for keeping track of what buffers
1329 exist and what text is in them. It is periodically given commands
1330 (usually from the user) to insert or delete text, create a buffer, etc.
1331 When it receives a text-change command, it notifies the redisplay
1335 The redisplay mechanism is responsible for making sure that windows and
1336 frames are displayed correctly. It is periodically told (by the event
1337 loop) to actually ``do its job'', i.e. snoop around and see what the
1338 current state of the environment (mostly of the currently-existing
1339 windows, frames, and buffers) is, and make sure that state matches
1340 what's actually displayed. It keeps lots and lots of information around
1341 (such as what is actually being displayed currently, and what the
1342 environment was last time it checked) so that it can minimize the work
1343 it has to do. It is also helped along in that whenever a relevant
1344 change to the environment occurs, the redisplay mechanism is told about
1345 this, so it has a pretty good idea of where it has to look to find
1346 possible changes and doesn't have to look everywhere.
1349 The Lisp engine is responsible for executing the Lisp code in which most
1350 user commands are written. It is entered through a call to @code{eval}
1351 or @code{funcall}, which occurs as a result of dispatching an event from
1352 the event loop. The functions it calls issue commands to the buffer
1353 mechanism, the window/frame subsystem, etc.
1356 The Lisp allocation subsystem is responsible for keeping track of Lisp
1357 objects. It is given commands from the Lisp engine to allocate objects,
1358 garbage collect, etc.
1363 The important idea here is that there are a number of independent
1364 subsystems each with its own responsibility and persistent state, just
1365 like different employees in a company, and each subsystem is
1366 periodically given commands from other subsystems. Commands can flow
1367 from any one subsystem to any other, but there is usually some sort of
1368 hierarchy, with all commands originating from the event subsystem.
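
The relationship between the event loop and the subsystems can be
sketched in a few lines of C. This is a toy model, not the XEmacs event
loop: the event types, the @code{toy_editor} structure, and all function
names are assumptions made for illustration. It shows only the shape of
the control flow: each subsystem keeps its own persistent state, the
buffer subsystem notifies redisplay of changes, and the loop
periodically tells redisplay to do its job.

```c
#include <stddef.h>

typedef enum { EV_KEY, EV_MOUSE_MOTION, EV_QUIT } toy_event_type;

typedef struct {
    toy_event_type type;
    int data;                   /* the character, for EV_KEY */
} toy_event;

/* Each subsystem has its own persistent state. */
typedef struct {
    int chars_inserted;
    int last_char;
    int needs_redisplay;        /* set when redisplay has been notified */
} toy_buffer_state;

typedef struct {
    int redisplays_done;
} toy_redisplay_state;

typedef struct {
    toy_buffer_state buffer;
    toy_redisplay_state redisplay;
} toy_editor;

void toy_editor_init(toy_editor *ed)
{
    ed->buffer.chars_inserted = 0;
    ed->buffer.last_char = 0;
    ed->buffer.needs_redisplay = 0;
    ed->redisplay.redisplays_done = 0;
}

/* A command to the buffer subsystem; it notifies redisplay. */
static void buffer_insert(toy_editor *ed, int ch)
{
    ed->buffer.chars_inserted++;
    ed->buffer.last_char = ch;
    ed->buffer.needs_redisplay = 1;
}

/* Redisplay does work only where it was told something changed. */
static void redisplay_if_needed(toy_editor *ed)
{
    if (ed->buffer.needs_redisplay) {
        ed->redisplay.redisplays_done++;
        ed->buffer.needs_redisplay = 0;
    }
}

/* Dispatch events until the queue is exhausted or EV_QUIT is seen.
   Returns the number of events dispatched. */
int toy_event_loop(toy_editor *ed, const toy_event *queue, size_t n)
{
    int dispatched = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (queue[i].type == EV_QUIT)
            break;
        if (queue[i].type == EV_KEY)
            buffer_insert(ed, queue[i].data);
        dispatched++;
        redisplay_if_needed(ed);  /* control returns to the loop each time */
    }
    return dispatched;
}
```

Note that a mouse-motion event in this model reaches no subsystem that
changes the buffer, so redisplay has nothing to do for it.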
1370 XEmacs is entered in @code{main()}, which is in @file{emacs.c}. When
1371 this is called the first time (in a properly-invoked @file{temacs}), it
1376 It does some very basic environment initializations, such as determining
1377 where it and its directories (e.g. @file{lisp/} and @file{etc/}) reside
1378 and setting up signal handlers.
1380 It initializes the entire Lisp interpreter.
1382 It sets the initial values of many built-in variables (including many
1383 variables that are visible to Lisp programs), such as the global keymap
1384 object and the built-in faces (a face is an object that describes the
1385 display characteristics of text). This involves creating Lisp objects
1386 and thus is dependent on step (2).
1388 It performs various other initializations that are relevant to the
1389 particular environment it is running in, such as retrieving environment
1390 variables, determining the current date and the user who is running the
1391 program, examining its standard input, creating any necessary file
1394 At this point, the C initialization is complete. A Lisp program that
1395 was specified on the command line (usually @file{loadup.el}) is called
1396 (@file{temacs} is normally invoked as @code{temacs -batch -l loadup.el dump}).
1397 @file{loadup.el} loads all of the other Lisp files that are needed for
1398 the operation of the editor, calls the @code{dump-emacs} function to
1399 write out @file{xemacs}, and then kills the @file{temacs} process.
1402 When @file{xemacs} is then run, it only redoes steps (1) and (4)
1403 above; all variables already contain the values they were set to when
1404 the executable was dumped, and all memory that was allocated with
1405 @code{malloc()} is still around. (XEmacs knows whether it is being run
1406 as @file{xemacs} or @file{temacs} because it sets the global variable
1407 @code{initialized} to 1 after step (4) above.) At this point,
1408 @file{xemacs} calls a Lisp function to do any further initialization,
1409 which includes parsing the command-line (the C code can only do limited
1410 command-line parsing, which includes looking for the @samp{-batch} and
1411 @samp{-l} flags and a few other flags that it needs to know about before
1412 initialization is complete), creating the first frame (or @dfn{window}
1413 in standard window-system parlance), running the user's init file
1414 (usually the file @file{.emacs} in the user's home directory), etc. The
1415 function to do this is usually called @code{normal-top-level};
1416 @file{loadup.el} tells the C code about this function by setting its
1417 name as the value of the Lisp variable @code{top-level}.
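
The role of the @code{initialized} flag can be illustrated with a small
C sketch. Only the variable name @code{initialized} and the numbering of
the four steps come from the description above; the function
@code{toy_startup} and the step constants are hypothetical, and calling
the function twice in one process merely imitates the effect of dumping
(nothing here actually writes out an executable).

```c
/* Stand-in for the global flag; in the real program its value is
   preserved in the dumped executable along with all other data. */
int initialized = 0;

/* Hypothetical names for the four steps listed above. */
enum {
    STEP_BASIC_ENV        = 1 << 0,  /* step (1) */
    STEP_LISP_INTERPRETER = 1 << 1,  /* step (2) */
    STEP_BUILTIN_VARS     = 1 << 2,  /* step (3) */
    STEP_RUNTIME_ENV      = 1 << 3   /* step (4) */
};

/* Returns a bitmask of the steps performed on this invocation. */
unsigned toy_startup(void)
{
    unsigned done = 0;

    done |= STEP_BASIC_ENV;              /* always redone */
    if (!initialized) {
        /* Only needed the first time: afterwards the dumped image
           already contains the results. */
        done |= STEP_LISP_INTERPRETER;
        done |= STEP_BUILTIN_VARS;
    }
    done |= STEP_RUNTIME_ENV;            /* always redone */

    initialized = 1;                     /* set after step (4) */
    return done;
}
```

The first call performs all four steps; any later call, standing in for
a run of the dumped @file{xemacs}, performs only steps (1) and (4).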
1419 When the Lisp initialization code is done, the C code enters the event
1420 loop, and stays there for the duration of the XEmacs process. The code
1421 for the event loop is contained in @file{cmdloop.c}, and is called
1422 @code{Fcommand_loop_1()}. Note that this event loop could very well be
1423 written in Lisp, and in fact a Lisp version exists; but apparently,
1424 doing this makes XEmacs run noticeably slower.
1426 Notice how much of the initialization is done in Lisp, not in C.
1427 In general, XEmacs tries to move as much code as is possible
1428 into Lisp. Code that remains in C is code that implements the
1429 Lisp interpreter itself, or code that needs to be very fast, or
1430 code that needs to do system calls or other such stuff that
1431 needs to be done in C, or code that needs to have access to
1432 ``forbidden'' structures. (One conscious aspect of the design of
1433 Lisp under XEmacs is a clean separation between the external
1434 interface to a Lisp object's functionality and its internal
1435 implementation. Part of this design is that Lisp programs
1436 are forbidden from accessing the contents of the object other
1437 than through using a standard API. In this respect, XEmacs Lisp
1438 is similar to modern Lisp dialects but differs from GNU Emacs,
1439 which tends to expose the implementation and allow Lisp
1440 programs to look at it directly. The major advantage of
1441 hiding the implementation is that it allows the implementation
1442 to be redesigned without affecting any Lisp programs, including
1443 those that might want to be ``clever'' by looking directly at
1444 the object's contents and possibly manipulating them.)
1446 Moving code into Lisp makes the code easier to debug and maintain and
1447 makes it much easier for people who are not XEmacs developers to
1448 customize XEmacs, because they can make a change with much less chance
1449 of obscure and unwanted interactions occurring than if they were to
1452 @node The XEmacs Object System (Abstractly Speaking), How Lisp Objects Are Represented in C, XEmacs From the Inside, Top
1453 @chapter The XEmacs Object System (Abstractly Speaking)
1454 @cindex XEmacs object system (abstractly speaking), the
1455 @cindex object system (abstractly speaking), the XEmacs
1457 At the heart of the Lisp interpreter is its management of objects.
1458 XEmacs Lisp contains many built-in objects, some of which are
1459 simple and others of which can be very complex; and some of which
1460 are very common, and others of which are rarely used or are only
1461 used internally. (Since the Lisp allocation system, with its
1462 automatic reclamation of unused storage, is so much more convenient
1463 than @code{malloc()} and @code{free()}, the C code makes extensive use of it
1464 in its internal operations.)
1466 The basic Lisp objects are
1470 28 or 31 bits of precision, or 60 or 63 bits on 64-bit machines; the
1471 reason for this is described below when the internal Lisp object
1472 representation is described.
1474 Same precision as a double in C.
1476 A simple container for two Lisp objects, used to implement lists and
1477 most other data structures in Lisp.
1479 An object representing a single character of text; chars behave like
1480 integers in many ways but are logically considered text rather than
1481 numbers and have a different read syntax. (The read syntax for a char
1482 contains the char itself or some textual encoding of it---for example,
1483 a Japanese Kanji character might be encoded as @samp{^[$(B#&^[(B} using the
1484 ISO-2022 encoding standard---rather than the numerical representation
1485 of the char; this way, if the mapping between chars and integers
1486 changes, which is quite possible for Kanji characters and other extended
1487 characters, the same character will still be created. Note that some
1488 primitives confuse chars and integers. The worst culprit is @code{eq},
1489 which makes a special exception and considers a char to be @code{eq} to
1490 its integer equivalent, even though in no other case are objects of two
1491 different types @code{eq}. The reason for this monstrosity is
1492 compatibility with existing code; the separation of char from integer
1493 came fairly recently.)
1495 An object that contains Lisp objects and is referred to by name;
1496 symbols are used to implement variables and named functions
1497 and to provide the equivalent of preprocessor constants in C.
1499 A one-dimensional array of Lisp objects providing constant-time access
1500 to any of the objects; access to an arbitrary object in a vector is
1501 faster than for lists, but the operations that can be done on a vector
1504 Self-explanatory; behaves much like a vector of chars
1505 but has a different read syntax and is stored and manipulated
1508 A vector of bits; similar to a string in spirit.
1509 @item compiled-function
1510 An object containing compiled Lisp code, known as @dfn{byte code}.
1512 A Lisp primitive, i.e. a Lisp-callable function implemented in C.
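
As an illustration of how a cons, ``a simple container for two Lisp
objects'', is enough to build lists, here is a toy C model. It is a
sketch under simplifying assumptions, not the XEmacs representation: the
names are invented, only integers and conses exist, and the empty list
is modelled as a null cons pointer rather than as a symbol.

```c
#include <stdlib.h>

typedef struct toy_cons toy_cons;

/* A toy Lisp object: an integer, or a pointer to a cons cell. */
typedef struct {
    enum { TOY_INT, TOY_CONS } type;
    union {
        long integer;
        toy_cons *cons;         /* NULL stands in for nil */
    } value;
} toy_obj;

struct toy_cons {
    toy_obj car;                /* first object */
    toy_obj cdr;                /* second object; the rest of a list */
};

toy_obj toy_int(long n)
{
    toy_obj o;
    o.type = TOY_INT;
    o.value.integer = n;
    return o;
}

toy_obj toy_nil(void)
{
    toy_obj o;
    o.type = TOY_CONS;
    o.value.cons = NULL;
    return o;
}

toy_obj toy_cons_new(toy_obj car, toy_obj cdr)
{
    toy_obj o;
    toy_cons *cell = malloc(sizeof *cell);

    if (cell == NULL)
        abort();
    cell->car = car;
    cell->cdr = cdr;
    o.type = TOY_CONS;
    o.value.cons = cell;
    return o;
}

/* Number of cells along the cdr chain. */
long toy_length(toy_obj list)
{
    long n = 0;

    while (list.type == TOY_CONS && list.value.cons != NULL) {
        n++;
        list = list.value.cons->cdr;
    }
    return n;
}

/* Fetch element INDEX (zero-based) by walking the chain; this takes
   time proportional to INDEX.  Returns 0 on success, -1 if the list
   is too short. */
int toy_nth(toy_obj list, long index, toy_obj *out)
{
    while (list.type == TOY_CONS && list.value.cons != NULL) {
        if (index == 0) {
            *out = list.value.cons->car;
            return 0;
        }
        index--;
        list = list.value.cons->cdr;
    }
    return -1;
}

/* Free the cells of a flat list.  C has no garbage collector, so the
   caller must do this by hand. */
void toy_free_list(toy_obj list)
{
    while (list.type == TOY_CONS && list.value.cons != NULL) {
        toy_cons *cell = list.value.cons;
        list = cell->cdr;
        free(cell);
    }
}
```

The linear walk in @code{toy_nth} is the reason a vector, with its
constant-time access, is preferable when elements are fetched by
position.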
1516 Note that there is no basic ``function'' type, as in more powerful
1517 versions of Lisp (where it's called a @dfn{closure}). XEmacs Lisp does
1518 not provide the closure semantics implemented by Common Lisp and Scheme.
1519 The guts of a function in XEmacs Lisp are represented in one of four
1520 ways: a symbol specifying another function (when one function is an
1521 alias for another), a list (whose first element must be the symbol
1522 @code{lambda}) containing the function's source code, a
1523 compiled-function object, or a subr object. (In other words, given a
1524 symbol specifying the name of a function, calling @code{symbol-function}
1525 to retrieve the contents of the symbol's function cell will return one
1526 of these types of objects.)
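
The four representations, and the way an alias leads to another
function, can be modelled as follows. This is a hypothetical sketch: the
type @code{toy_fn}, its field names, and @code{toy_resolve_function} are
invented here, and the model collapses ``a symbol whose function cell
holds another function'' into a single pointer.

```c
#include <stddef.h>

/* The four ways the guts of a function can be represented. */
typedef enum {
    FN_SYMBOL,        /* an alias naming another function */
    FN_LAMBDA_LIST,   /* a list whose first element is `lambda' */
    FN_COMPILED,      /* a compiled-function object */
    FN_SUBR           /* a primitive implemented in C */
} toy_fn_kind;

typedef struct toy_fn {
    toy_fn_kind kind;
    const struct toy_fn *alias_target;  /* used only for FN_SYMBOL */
} toy_fn;

/* Follow a chain of aliases until something other than a symbol is
   reached.  Returns NULL if the chain is broken or takes more than
   MAX_HOPS steps, which guards against circular aliases. */
const toy_fn *toy_resolve_function(const toy_fn *fn, int max_hops)
{
    while (fn != NULL && fn->kind == FN_SYMBOL) {
        if (max_hops-- <= 0)
            return NULL;
        fn = fn->alias_target;
    }
    return fn;
}
```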
1528 XEmacs Lisp also contains numerous specialized objects used to implement
1533 Stores text like a string, but is optimized for insertion and deletion
1534 and has certain other properties that can be set.
1536 An object with various properties whose displayable representation is a
1537 @dfn{window} in window-system parlance.
1539 A section of a frame that displays the contents of a buffer;
1540 often called a @dfn{pane} in window-system parlance.
1541 @item window-configuration
1542 An object that represents a saved configuration of windows in a frame.
1544 An object representing a screen on which frames can be displayed;
1545 equivalent to a @dfn{display} in the X Window System and a @dfn{TTY} in
1548 An object specifying the appearance of text or graphics; it has
1549 properties such as font, foreground color, and background color.
1551 An object that refers to a particular position in a buffer and moves
1552 around as text is inserted and deleted to stay in the same relative
1553 position to the text around it.
1555 Similar to a marker but covers a range of text in a buffer; can also
1556 specify properties of the text, such as a face in which the text is to
1557 be displayed, whether the text is invisible or unmodifiable, etc.
1559 Generated by calling @code{next-event} and contains information
1560 describing a particular event happening in the system, such as the user
1561 pressing a key or a process terminating.
1563 An object that maps from events (described using lists, vectors, and
1564 symbols rather than with an event object because the mapping is for
1565 classes of events, rather than individual events) to functions to
1566 execute or other events to recursively look up; the functions are
1567 described by name, using a symbol, or using lists to specify the
1570 An object that describes the appearance of an image (e.g. pixmap) on
1571 the screen; glyphs can be attached to the beginning or end of extents
1572 and in some future version of XEmacs will be able to be inserted
1573 directly into a buffer.
1575 An object that describes a connection to an externally-running process.
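
The marker entry in the table above says that a marker ``moves around as
text is inserted and deleted''. The arithmetic behind that can be
sketched in C as below. The function names are hypothetical, positions
are plain character offsets, and the treatment of text inserted exactly
at the marker (the marker stays in front of it) is an assumption of this
sketch rather than a statement about the XEmacs implementation.

```c
/* New position of a marker at POS after LEN characters are inserted
   at position AT. */
long toy_marker_after_insert(long pos, long at, long len)
{
    /* Assumption: an insertion exactly at the marker leaves the
       marker before the new text. */
    return (at < pos) ? pos + len : pos;
}

/* New position of a marker at POS after the text in [START, END)
   is deleted. */
long toy_marker_after_delete(long pos, long start, long end)
{
    if (pos <= start)
        return pos;                   /* deletion was entirely after it */
    if (pos >= end)
        return pos - (end - start);   /* shift left by the deleted length */
    return start;                     /* marker was inside the deleted text */
}
```

For example, a marker at position 10 moves to 13 when three characters
are inserted at position 5, and collapses to position 8 when the text
from 8 to 12 is deleted.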
1578 There are some other, less-commonly-encountered general objects:
1582 An object that maps from an arbitrary Lisp object to another arbitrary
1583 Lisp object, using hashing for fast lookup.
1585 A limited form of hash-table that maps from strings to symbols; obarrays
1586 are used to look up a symbol given its name and are not actually their
1587 own object type but are kludgily represented using vectors with hidden
1588 fields (this representation derives from GNU Emacs).
1590 A complex object used to specify the value of a display property; a
1591 default value is given and different values can be specified for
1592 particular frames, buffers, windows, devices, or classes of device.
1594 An object that maps from chars or classes of chars to arbitrary Lisp
1595 objects; internally char tables use a complex nested-vector
1596 representation that is optimized to the way characters are represented
1599 An object that maps from ranges of integers to arbitrary Lisp objects.
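
A range table's job, mapping ranges of integers to objects, can be shown
with a deliberately naive C sketch. The structure and function names are
hypothetical, strings stand in for arbitrary Lisp objects, ranges are
taken to be inclusive at both ends, and the lookup is a linear scan;
none of this is claimed to match the actual implementation.

```c
#include <stddef.h>

typedef struct {
    long first;          /* inclusive lower bound */
    long last;           /* inclusive upper bound */
    const char *value;   /* stand-in for an arbitrary Lisp object */
} toy_range_entry;

/* Return the value of the first range containing KEY, or NULL if no
   range contains it. */
const char *toy_range_lookup(const toy_range_entry *table, size_t n, long key)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (table[i].first <= key && key <= table[i].last)
            return table[i].value;
    return NULL;
}
```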
1602 And some strange special-purpose objects:
1606 @itemx coding-system
1607 Objects used when MULE, or multi-lingual/Asian-language, support is
1609 @item color-instance
1610 @itemx font-instance
1611 @itemx image-instance
1612 An object that encapsulates a window-system resource; instances are
1613 mostly used internally but are exposed on the Lisp level for cleanness
1614 of the specifier model and because it's occasionally useful for Lisp
1615 programs to create or query the properties of instances.
1617 An object that encapsulates a @dfn{subwindow} resource, i.e. a
1618 window-system child window that is drawn into by an external process;
1619 this object should be integrated into the glyph system but isn't yet,
1620 and may change form when this is done.
1621 @item tooltalk-message
1622 @itemx tooltalk-pattern
1623 Objects that represent resources used in the ToolTalk interprocess
1624 communication protocol.
1625 @item toolbar-button
1626 An object used in conjunction with the toolbar.
1629 And objects that are only used internally:
1633 A generic object for encapsulating arbitrary memory; this allows you the
1634 generality of @code{malloc()} and the convenience of the Lisp object
1637 A buffering I/O stream, used to provide a unified interface to anything
1638 that can accept output or provide input, such as a file descriptor, a
1639 stdio stream, a chunk of memory, a Lisp buffer, a Lisp string, etc.;
1640 it's a Lisp object to make its memory management more convenient.
1641 @item char-table-entry
1642 Subsidiary objects in the internal char-table representation.
1643 @item extent-auxiliary
1646 Various special-purpose objects that are basically just used to
1647 encapsulate memory for particular subsystems, similar to the more
1648 general ``opaque'' object.
1649 @item symbol-value-forward
1650 @itemx symbol-value-buffer-local
1651 @itemx symbol-value-varalias
1652 @itemx symbol-value-lisp-magic
1653 Special internal-only objects that are placed in the value cell of a
1654 symbol to indicate that there is something special with this variable --
1655 e.g. it has no value, it mirrors another variable, or it mirrors some C
1656 variable; there is really only one kind of object, called a
1657 @dfn{symbol-value-magic}, but it is sort-of halfway kludged into
1658 semi-different object types.
1661 @cindex permanent objects
1662 @cindex temporary objects
1663 Some types of objects are @dfn{permanent}, meaning that once created,
1664 they do not disappear until explicitly destroyed, using a function such
1665 as @code{delete-buffer}, @code{delete-window}, @code{delete-frame}, etc.
1666 Others will disappear once they are no longer used, through the garbage
1667 collection mechanism. Buffers, frames, windows, devices, and processes
1668 are among the objects that are permanent. Note that some objects can go
1669 both ways: Faces can be created either way; extents are normally
1670 permanent, but detached extents (extents not referring to any text, as
1671 happens to some extents when the text they are referring to is deleted)
1672 are temporary. Note that some permanent objects, such as faces and
1673 coding systems, cannot be deleted. Note also that windows are unique in
1674 that they can be @emph{undeleted} after having previously been
1675 deleted. (This happens as a result of restoring a window configuration.)
1678 Note that many types of objects have a @dfn{read syntax}, i.e. a way of
1679 specifying an object of that type in Lisp code. When you load a Lisp
1680 file, or type in code to be evaluated, what really happens is that the
1681 function @code{read} is called, which reads some text and creates an object
1682 based on the syntax of that text; then @code{eval} is called, which
1683 possibly does something special; then this loop repeats until there's
1684 no more text to read. (@code{eval} only actually does something special
1685 with symbols, which causes the symbol's value to be returned,
1686 similar to referencing a variable; and with conses [i.e. lists],
1687 which cause a function invocation. All other values are returned
1696 converts to an integer whose value is 17297.
1702 converts to a float whose value is 1.983e-4, or .0001983.
1708 converts to a char that represents the lowercase letter b.
1714 (where @samp{^[} actually is an @samp{ESC} character) converts to a
1715 particular Kanji character when using an ISO2022-based coding system for
1716 input. (To decode this goo: @samp{ESC} begins an escape sequence;
1717 @samp{ESC $ (} is a class of escape sequences meaning ``switch to a
1718 94x94 character set''; @samp{ESC $ ( B} means ``switch to Japanese
1719 Kanji''; @samp{#} and @samp{&} collectively index into a 94-by-94 array
1720 of characters [subtract 33 from the ASCII value of each character to get
1721 the corresponding index]; @samp{ESC (} is a class of escape sequences
1722 meaning ``switch to a 94 character set''; @samp{ESC (B} means ``switch
1723 to US ASCII''. It is a coincidence that the letter @samp{B} is used to
1724 denote both Japanese Kanji and US ASCII. If the first @samp{B} were
1725 replaced with an @samp{A}, you'd be requesting a Chinese Hanzi character
1726 from the GB2312 character set.)
1732 converts to a string.
1738 converts to a symbol whose name is @code{"foobar"}. This is done by
1739 looking up the string equivalent in the global variable
1740 @code{obarray}, whose contents should be an obarray. If no symbol
1741 is found, a new symbol with the name @code{"foobar"} is automatically
1742 created and added to @code{obarray}; this process is called
1743 @dfn{interning} the symbol.
1750 converts to a cons cell containing the symbols @code{foo} and @code{bar}.
1756 converts to a three-element list containing the specified objects
1757 (note that a list is actually a set of nested conses; see the
1758 XEmacs Lisp Reference).
1764 converts to a three-element vector containing the specified objects.
1770 converts to a compiled-function object (the actual contents are not
1771 shown since they are not relevant here; look at a file that ends with
1772 @file{.elc} for examples).
1778 converts to a bit-vector.
1781 #s(hash-table ... ...)
1784 converts to a hash table (the actual contents are not shown).
1787 #s(range-table ... ...)
1790 converts to a range table (the actual contents are not shown).
1793 #s(char-table ... ...)
1796 converts to a char table (the actual contents are not shown).
1798 Note that the @code{#s()} syntax is the general syntax for structures,
1799 which are not really implemented in XEmacs Lisp but should be.
1801 When an object is printed out (using @code{print} or a related
1802 function), the read syntax is used, so that the same object can be read
1805 The other objects do not have read syntaxes, usually because it does not
1806 really make sense to create them in this fashion (e.g. processes, where
1807 it doesn't make sense to have a subprocess created as a side effect of
1808 reading some Lisp code), or because they can't be created at all
1809 (e.g. subrs). Permanent objects, as a rule, do not have a read syntax;
1810 nor do most complex objects, which contain too much state to be easily
1811 initialized through a read syntax.
1813 @node How Lisp Objects Are Represented in C, Rules When Writing New C Code, The XEmacs Object System (Abstractly Speaking), Top
1814 @chapter How Lisp Objects Are Represented in C
1815 @cindex Lisp objects are represented in C, how
1816 @cindex objects are represented in C, how Lisp
1817 @cindex represented in C, how Lisp objects are
1819 Lisp objects are represented in C using a 32-bit or 64-bit machine word
1820 (depending on the processor; i.e. DEC Alphas use 64-bit Lisp objects and
1821 most other processors use 32-bit Lisp objects). The representation
1822 stuffs a pointer together with a tag, as follows:
1825 [ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ]
1826 [ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ]
1828 <---------------------------------------------------------> <->
1829 a pointer to a structure, or an integer tag
1832 A tag of 00 is used for all pointer object types, a tag of 10 is used
1833 for characters, and the other two tags 01 and 11 are joined together to
1834 form the integer object type. This representation gives us 31-bit
1835 integers and 30-bit characters, while pointers are represented directly
1836 without any bit masking or shifting. This representation, though,
1837 assumes that pointers to structs are always aligned to multiples of 4,
1838 so the lower 2 bits are always zero.
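As a rough illustration of the tag layout just described, the following
toy model packs and unpacks integers, characters, and pointers in a
single machine word. The @code{toy_} names are invented for this sketch
and are not the macros defined in @file{lisp.h}; the sketch also assumes
that right-shifting a negative value is an arithmetic shift.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the two-bit tag scheme; every name here is hypothetical. */
typedef intptr_t toy_object;

/* Low bits 00: pointer.  Low bits 10: character.  Low bit 1: integer. */
int toy_is_pointer (toy_object obj) { return (obj & 3) == 0; }
int toy_is_char (toy_object obj)    { return (obj & 3) == 2; }
int toy_is_int (toy_object obj)     { return (obj & 1) == 1; }

/* An integer is stored shifted left by one, with the low bit set. */
toy_object toy_make_int (intptr_t n)
{
  return (toy_object) (((uintptr_t) n << 1) | 1);
}

/* Assumption: >> on a negative value is an arithmetic shift. */
intptr_t toy_int_value (toy_object obj)
{
  assert (toy_is_int (obj));
  return obj >> 1;
}

/* A character is stored shifted left by two, with tag bits 10. */
toy_object toy_make_char (intptr_t c)
{
  return (toy_object) (((uintptr_t) c << 2) | 2);
}

intptr_t toy_char_value (toy_object obj)
{
  assert (toy_is_char (obj));
  return obj >> 2;
}

/* A pointer is stored unchanged; alignment keeps its low two bits zero. */
toy_object toy_make_pointer (void *p)
{
  assert (((uintptr_t) p & 3) == 0);
  return (toy_object) (uintptr_t) p;
}
```

In this model, @code{toy_int_value (toy_make_int (-5))} gives back -5,
because the arithmetic shift that drops the tag bit also preserves the
sign.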
1840 Lisp objects use the typedef @code{Lisp_Object}, but the actual C type
1841 used for the Lisp object can vary. It can be either a simple type
1842 (@code{long} on the DEC Alpha, @code{int} on other machines) or a
1843 structure whose fields are bit fields that line up properly (actually, a
1844 union of structures is used). Generally the simple integral type is
1845 preferable because it ensures that the compiler will actually use a
1846 machine word to represent the object (some compilers will use more
1847 general and less efficient code for unions and structs even if they can
1848 fit in a machine word). The union type, however, has the advantage of
1849 stricter type checking. If you accidentally pass an integer where a Lisp
1850 object is desired, you get a compile error. The choice of which type
1851 to use is determined by the preprocessor constant @code{USE_UNION_TYPE}
1852 which is defined via the @code{--use-union-type} option to
1855 Various macros are used to convert between Lisp_Objects and the
1856 corresponding C type. Macros of the form @code{XINT()}, @code{XCHAR()},
1857 @code{XSTRING()}, @code{XSYMBOL()}, do any required bit shifting and/or
1858 masking and cast the result to the appropriate type. @code{XINT()} needs to be
1859 a bit tricky so that negative numbers are properly sign-extended. Since
1860 integers are stored left-shifted, if the right-shift operator does an
1861 arithmetic shift (i.e. it leaves the most-significant bit as-is rather
1862 than shifting in a zero, so that it mimics a divide-by-two even for
1863 negative numbers) the shift to remove the tag bit is enough. This is
1864 the case on all the systems we support.
1866 Note that when @code{ERROR_CHECK_TYPECHECK} is defined, the converter
1867 macros become more complicated---they check the tag bits and/or the
1868 type field in the first four bytes of a record type to ensure that the
1869 object is really of the correct type. This is great for catching places
1870 where an incorrect type is being dereferenced---this typically results
1871 in a pointer being dereferenced as the wrong type of structure, with
1872 unpredictable (and sometimes not easily traceable) results.
1874 There are similar @code{XSET@var{TYPE}()} macros that construct a Lisp
1875 object. These macros are of the form @code{XSET@var{TYPE}
1876 (@var{lvalue}, @var{result})}, i.e. they have to be a statement rather
1877 than just used in an expression. The reason for this is that standard C
1878 doesn't let you ``construct'' a structure (but GCC does). Granted, this
1879 sometimes isn't too convenient; for the case of integers, at least, you
1880 can use the function @code{make_int()}, which constructs and
1881 @emph{returns} an integer Lisp object. Note that the
1882 @code{XSET@var{TYPE}()} macros are also affected by
1883 @code{ERROR_CHECK_TYPECHECK} and make sure that the structure is of the
1884 right type in the case of record types, where the type is contained in
1887 The C programmer is responsible for @strong{guaranteeing} that a
1888 Lisp_Object is the correct type before using the @code{X@var{TYPE}}
1889 macros. This is especially important in the case of lists. Use
1890 @code{XCAR} and @code{XCDR} if a Lisp_Object is certainly a cons cell,
1891 else use @code{Fcar()} and @code{Fcdr()}. Trust other C code, but not
1892 Lisp code. On the other hand, if XEmacs has an internal logic error,
1893 it's better to crash immediately, so sprinkle @code{assert()}s and
1894 ``unreachable'' @code{abort()}s liberally about the source code. Where
1895 performance is an issue, use @code{type_checking_assert},
1896 @code{bufpos_checking_assert}, and @code{gc_checking_assert}, which do
1897 nothing unless the corresponding configure error checking flag was
1900 @node Rules When Writing New C Code, Regression Testing XEmacs, How Lisp Objects Are Represented in C, Top
1901 @chapter Rules When Writing New C Code
1902 @cindex writing new C code, rules when
1903 @cindex C code, rules when writing new
1904 @cindex code, rules when writing new C
1906 The XEmacs C code is extremely complex and intricate, and there are many
1907 rules that are more or less consistently followed throughout the code.
1908 Many of these rules are not obvious, so they are explained here. It is
1909 of the utmost importance that you follow them. If you don't, you may
1910 get something that appears to work, but which will crash in odd
1911 situations, often in code far away from where the actual breakage is.
1914 * General Coding Rules::
1915 * Writing Lisp Primitives::
1916 * Writing Good Comments::
1917 * Adding Global Lisp Variables::
1918 * Proper Use of Unsigned Types::
1920 * Techniques for XEmacs Developers::
1923 @node General Coding Rules
1924 @section General Coding Rules
1925 @cindex coding rules, general
1927 The C code is actually written in a dialect of C called @dfn{Clean C},
1928 meaning that it can be compiled, mostly warning-free, with either a C or
1929 C++ compiler. Coding in Clean C has several advantages over plain C.
1930 C++ compilers are more nit-picking, and a number of coding errors have
1931 been found by compiling with C++. The ability to use both C and C++
1932 tools means that a greater variety of development tools are available to
1935 Every module includes @file{<config.h>} (angle brackets so that
1936 @samp{--srcdir} works correctly; @file{config.h} may or may not be in
1937 the same directory as the C sources) and @file{lisp.h}. @file{config.h}
1938 must always be included before any other header files (including
1939 system header files) to ensure that certain tricks played by various
1940 @file{s/} and @file{m/} files work out correctly.
1942 When including header files, always use angle brackets, not double
1943 quotes, except when the file to be included is always in the same
1944 directory as the including file. If either file is a generated file,
1945 then that is not likely to be the case. In order to understand why we
1946 have this rule, imagine what happens when you do a build in the source
1947 directory using @samp{./configure} and another build in another
1948 directory using @samp{../work/configure}. There will be two different
1949 @file{config.h} files. Which one will be used if you @samp{#include
1952 Almost every module contains a @code{syms_of_*()} function and a
1953 @code{vars_of_*()} function. The former declares any Lisp primitives
1954 you have defined and defines any symbols you will be using. The latter
1955 declares any global Lisp variables you have added and initializes global
1956 C variables in the module. @strong{Important}: There are stringent
1957 requirements on exactly what can go into these functions. See the
1958 comment in @file{emacs.c}. The reason for this is to avoid obscure
1959 unwanted interactions during initialization. If you don't follow these
1960 rules, you'll be sorry! If you want to do anything that isn't allowed,
1961 create a @code{complex_vars_of_*()} function for it. Doing this is
1962 tricky, though: you have to make sure your function is called at the
1963 right time so that all the initialization dependencies work out.
1965 Declare each function of these kinds in @file{symsinit.h}. Make sure
1966 it's called in the appropriate place in @file{emacs.c}. You never need
1967 to include @file{symsinit.h} directly, because it is included by
1970 @strong{All global and static variables that are to be modifiable must
1971 be declared uninitialized.} This means that you may not use the
1972 ``declare with initializer'' form for these variables, such as @code{int
1973 some_variable = 0;}. The reason for this has to do with some kludges
1974 done during the dumping process: If possible, the initialized data
1975 segment is re-mapped so that it becomes part of the (unmodifiable) code
1976 segment in the dumped executable. This allows this memory to be shared
1977 among multiple running XEmacs processes. XEmacs is careful to place as
1978 much constant data as possible into initialized variables during the
1979 @file{temacs} phase.
1981 @cindex copy-on-write
1982 @strong{Please note:} This kludge only works on a few systems nowadays,
1983 and is rapidly becoming irrelevant because most modern operating systems
1984 provide @dfn{copy-on-write} semantics. All data is initially shared
1985 between processes, and a private copy is automatically made (on a
1986 page-by-page basis) when a process first attempts to write to a page of
1989 Formerly, there was a requirement that static variables not be declared
1990 inside of functions. This had to do with another hack along the same
1991 vein as what was just described: old USG systems put statically-declared
1992 variables in the initialized data space, so those header files had a
1993 @code{#define static} declaration. (That way, the data-segment remapping
1994 described above could still work.) This fails badly on static variables
1995 inside of functions, which suddenly become automatic variables;
1996 therefore, you weren't supposed to have any of them. This awful kludge
1997 has been removed in XEmacs because
2001 almost all of the systems that used this kludge ended up having
2002 to disable the data-segment remapping anyway;
2004 the only systems that didn't were extremely outdated ones;
2006 this hack completely messed up inline functions.
2009 The C source code makes heavy use of C preprocessor macros. One popular
2013 #define FOO(var, value) do @{ \
2014 Lisp_Object FOO_value = (value); \
2015 ... /* compute using FOO_value */ \
2020 The @code{do @{...@} while (0)} is a standard trick to allow FOO to have
2021 statement semantics, so that it can safely be used within an @code{if}
2022 statement in C, for example. Multiple evaluation is prevented by
2023 copying a supplied argument into a local variable, so that
2024 @code{FOO(var,fun(1))} only calls @code{fun} once.
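The difference that the local copy makes can be seen in a small
self-contained sketch. The macros @code{DOUBLE_NAIVE} and
@code{DOUBLE_ONCE} and the helper functions are hypothetical, and they
use plain @code{int} values rather than Lisp objects.

```c
#include <assert.h>

/* Hypothetical counter that makes repeated evaluation visible. */
int call_count;

int next_value (void)
{
  call_count++;
  return 21;
}

/* Naive version: the argument expression is expanded twice. */
#define DOUBLE_NAIVE(var, value) do {               \
  (var) = (value) + (value);                        \
} while (0)

/* Careful version: the argument is copied into a local variable first. */
#define DOUBLE_ONCE(var, value) do {                \
  int DOUBLE_ONCE_value = (value);                  \
  (var) = DOUBLE_ONCE_value + DOUBLE_ONCE_value;    \
} while (0)

/* Each macro behaves as one statement, so this if/else parses as written. */
int demo (int use_careful)
{
  int result;

  call_count = 0;
  if (use_careful)
    DOUBLE_ONCE (result, next_value ());
  else
    DOUBLE_NAIVE (result, next_value ());
  return result;
}
```

Here @code{demo (1)} calls @code{next_value} once, while @code{demo (0)}
calls it twice, although both return the same result.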
2026 Lisp lists are popular data structures in the C code as well as in
2027 Elisp. There are two sets of macros that iterate over lists.
2028 @code{EXTERNAL_LIST_LOOP_@var{n}} should be used when the list has been
2029 supplied by the user, and cannot be trusted to be acyclic and
2030 @code{nil}-terminated. A @code{malformed-list} or @code{circular-list} error
2031 will be generated if the list being iterated over is not entirely
2032 kosher. @code{LIST_LOOP_@var{n}}, on the other hand, is faster and less
2033 safe, and can be used only on trusted lists.
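The trade-off between the two families of macros can be pictured with
toy cons cells. The sketch below is not the XEmacs implementation: the
structure and function names are invented, @code{NULL} stands in for
@code{nil}, and the cycle check shown (a slow pointer trailing a fast
one) is merely one common technique for the job.

```c
#include <assert.h>
#include <stddef.h>

/* Toy list cell; NULL plays the role of nil.  Hypothetical type. */
struct toy_cons
{
  int car;
  struct toy_cons *cdr;
};

/* Fast length for trusted lists: never returns on a circular list. */
long trusted_length (const struct toy_cons *list)
{
  long n = 0;

  for (; list != NULL; list = list->cdr)
    n++;
  return n;
}

/* Checked length: returns -1 for a circular list instead of looping.
   The slow pointer advances one cell for every two cells of the fast
   pointer, so the two can only meet if the list loops back on itself. */
long checked_length (const struct toy_cons *list)
{
  const struct toy_cons *fast = list;
  const struct toy_cons *slow = list;
  long n = 0;

  while (fast != NULL)
    {
      fast = fast->cdr;
      n++;
      if (n % 2 == 0)
        slow = slow->cdr;
      if (fast != NULL && fast == slow)
        return -1;
    }
  return n;
}
```

The checked version does extra work on every step, which is the price
paid for being safe on lists of unknown origin.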
2035 Related macros are @code{GET_EXTERNAL_LIST_LENGTH} and
2036 @code{GET_LIST_LENGTH}, which calculate the length of a list;
2037 @code{GET_EXTERNAL_LIST_LENGTH} also checks that the list is proper.
2038 The macros @code{EXTERNAL_LIST_LOOP_DELETE_IF} and
2039 @code{LIST_LOOP_DELETE_IF} delete elements from a Lisp list satisfying some
2042 @node Writing Lisp Primitives
2043 @section Writing Lisp Primitives
2044 @cindex writing Lisp primitives
2045 @cindex Lisp primitives, writing
2046 @cindex primitives, writing Lisp
2048 Lisp primitives are Lisp functions implemented in C. The details of
2049 interfacing the C function so that Lisp can call it are handled by a few
2050 C macros. The only way to really understand how to write new C code is
2051 to read the source, but we can explain some things here.
2053 An example of a special form is the definition of @code{prog1}, from
2054 @file{eval.c}. (An ordinary function would have the same general
2057 @cindex garbage collection protection
2060 DEFUN ("prog1", Fprog1, 1, UNEVALLED, 0, /*
2061 Similar to `progn', but the value of the first form is returned.
2062 \(prog1 FIRST BODY...): All the arguments are evaluated sequentially.
2063 The value of FIRST is saved during evaluation of the remaining args,
2064 whose values are discarded.
2068 /* This function can GC */
2069 REGISTER Lisp_Object val, form, tail;
2070 struct gcpro gcpro1;
2072 val = Feval (XCAR (args));
2076 LIST_LOOP_3 (form, XCDR (args), tail)
2085 Let's start with a precise explanation of the arguments to the
2086 @code{DEFUN} macro. Here is a template for them:
2090 DEFUN (@var{lname}, @var{fname}, @var{min_args}, @var{max_args}, @var{interactive}, /*
2099 This string is the name of the Lisp symbol to define as the function
2100 name; in the example above, it is @code{"prog1"}.
2103 This is the C function name for this function. This is the name that is
2104 used in C code for calling the function. The name is, by convention,
2105 @samp{F} prepended to the Lisp name, with all dashes (@samp{-}) in the
2106 Lisp name changed to underscores. Thus, to call this function from C
2107 code, call @code{Fprog1}. Remember that the arguments are of type
2108 @code{Lisp_Object}; various macros and functions for creating values of
2109 type @code{Lisp_Object} are declared in the file @file{lisp.h}.
2111 Primitives whose names are special characters (e.g. @code{+} or
2112 @code{<}) are named by spelling out, in some fashion, the special
2113 character: e.g. @code{Fplus()} or @code{Flss()}. Primitives whose names
2114 begin with normal alphanumeric characters but also contain special
2115 characters are spelled out in some creative way, e.g. @code{let*}
2116 becomes @code{FletX()}.
2118 Each function also has an associated structure that holds the data for
2119 the subr object that represents the function in Lisp. This structure
2120 conveys the Lisp symbol name to the initialization routine that will
2121 create the symbol and store the subr object as its definition. The C
2122 variable name of this structure is always @samp{S} prepended to the
2123 @var{fname}. You hardly ever need to be aware of the existence of this
2124 structure, since @code{DEFUN} plus @code{DEFSUBR} takes care of all the
2128 This is the minimum number of arguments that the function requires. The
2129 function @code{prog1} allows a minimum of one argument.
2132 This is the maximum number of arguments that the function accepts, if
2133 there is a fixed maximum. Alternatively, it can be @code{UNEVALLED},
2134 indicating a special form that receives unevaluated arguments, or
2135 @code{MANY}, indicating an unlimited number of evaluated arguments (the
2136 C equivalent of @code{&rest}). Both @code{UNEVALLED} and @code{MANY}
2137 are macros. If @var{max_args} is a number, it may not be less than
2138 @var{min_args} and it may not be greater than 8. (If you need to add a
2139 function with more than 8 arguments, use the @code{MANY} form. Resist
2140 the urge to edit the definition of @code{DEFUN} in @file{lisp.h}. If
2141 you do it anyway, make sure to also add another clause to the switch
2142 statement in @code{primitive_funcall().})
2145 This is an interactive specification, a string such as might be used as
2146 the argument of @code{interactive} in a Lisp function. In the case of
2147 @code{prog1}, it is 0 (a null pointer), indicating that @code{prog1}
2148 cannot be called interactively. A value of @code{""} indicates a
2149 function that should receive no arguments when called interactively.
2152 This is the documentation string. It is written just like a
2153 documentation string for a function defined in Lisp; in particular, the
2154 first line should be a single sentence. Note how the documentation
2155 string is enclosed in a comment, none of the documentation is placed on
2156 the same lines as the comment-start and comment-end characters, and the
2157 comment-start characters are on the same line as the interactive
2158 specification. @file{make-docfile}, which scans the C files for
2159 documentation strings, is very particular about what it looks for, and
2160 will not properly extract the doc string if it's not in this exact format.
2162 In order to make both @file{etags} and @file{make-docfile} happy, make
2163 sure that the @code{DEFUN} line contains the @var{lname} and
2164 @var{fname}, and that the comment-start characters for the doc string
2165 are on the same line as the interactive specification, and put a newline
2166 directly after them (and before the comment-end characters).
2169 This is the comma-separated list of arguments to the C function. For a
2170 function with a fixed maximum number of arguments, provide a C argument
2171 for each Lisp argument. In this case, unlike regular C functions, the
2172 types of the arguments are not declared; they are simply always of type
2175 The names of the C arguments will be used as the names of the arguments
2176 to the Lisp primitive as displayed in its documentation, modulo the same
2177 concerns described above for @code{F...} names (in particular,
2178 underscores in the C arguments become dashes in the Lisp arguments).
2180 There is one additional kludge: A trailing @samp{_} on the C argument is
2181 discarded when forming the Lisp argument. This allows C language
2182 reserved words (like @code{default}) or global symbols (like
2183 @code{dirname}) to be used as argument names without compiler warnings
2186 A Lisp function with @w{@var{max_args} = @code{UNEVALLED}} is a
2187 @w{@dfn{special form}}; its arguments are not evaluated. Instead it
2188 receives one argument of type @code{Lisp_Object}, a (Lisp) list of the
2189 unevaluated arguments, conventionally named @code{(args)}.
2191 When a Lisp function has no upper limit on the number of arguments,
2192 specify @w{@var{max_args} = @code{MANY}}. In this case its implementation in
2193 C actually receives exactly two arguments: the number of Lisp arguments
2194 (an @code{int}) and the address of a block containing their values (a
2195 @w{@code{Lisp_Object *}}). Only in this case are the C types specified
2196 in the @var{arglist}: @w{@code{(int nargs, Lisp_Object *args)}}.
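The shape of a @code{MANY} argument list can be imitated outside XEmacs
as follows. This is only a sketch: @code{toy_object} is a hypothetical
stand-in for @code{Lisp_Object}, and @code{toy_plus} is an ordinary C
function rather than something defined with @code{DEFUN}.

```c
#include <assert.h>

/* Hypothetical stand-in for Lisp_Object. */
typedef long toy_object;

/* Receives a count and the address of a block holding that many values. */
toy_object toy_plus (int nargs, toy_object *args)
{
  toy_object sum = 0;
  int i;

  for (i = 0; i < nargs; i++)
    sum += args[i];
  return sum;
}
```

A caller collects the values into an array and passes the array together
with its length; a call with zero arguments is legal and simply never
touches the block.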
2200 Within the function @code{Fprog1} itself, note the use of the macros
2201 @code{GCPRO1} and @code{UNGCPRO}. @code{GCPRO1} is used to ``protect''
2202 a variable from garbage collection---to inform the garbage collector
2203 that it must look in that variable and regard the object pointed at by
2204 its contents as an accessible object. This is necessary whenever you
2205 call @code{Feval} or anything that can directly or indirectly call
2206 @code{Feval} (this includes the @code{QUIT} macro!). At such a time,
2207 any Lisp object that you intend to refer to again must be protected
2208 somehow. @code{UNGCPRO} cancels the protection of the variables that
2209 are protected in the current function. It is necessary to do this
2212 The macro @code{GCPRO1} protects just one local variable. If you want
2213 to protect two, use @code{GCPRO2} instead; repeating @code{GCPRO1} will
2214 not work. Macros @code{GCPRO3} and @code{GCPRO4} also exist.
2216 These macros implicitly use local variables such as @code{gcpro1}; you
2217 must declare these explicitly, with type @code{struct gcpro}. Thus, if
2218 you use @code{GCPRO2}, you must declare @code{gcpro1} and @code{gcpro2}.
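One way to picture why the @code{gcpro@var{n}} variables must be
declared, and why repeating @code{GCPRO1} cannot protect two variables,
is a toy model in which each record is a node in a linked list kept on
the C stack. Everything below is an illustrative assumption (the
@code{toy_} and @code{TOY_} names, the structure layout, the list head);
it is not a copy of the definitions in @file{lisp.h}.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for Lisp_Object. */
typedef long toy_object;

/* One protection record, living in the protecting function's stack frame. */
struct toy_gcpro
{
  struct toy_gcpro *next;   /* record that was current before this one */
  toy_object *var;          /* address of the protected variable */
};

/* Head of the chain a collector would walk; starts out as NULL. */
struct toy_gcpro *toy_gcprolist;

/* These macros refer to locals named gcpro1 and gcpro2 in the caller. */
#define TOY_GCPRO1(v) do {          \
  gcpro1.next = toy_gcprolist;      \
  gcpro1.var = &(v);                \
  toy_gcprolist = &gcpro1;          \
} while (0)

#define TOY_GCPRO2(v1, v2) do {     \
  gcpro1.next = toy_gcprolist;      \
  gcpro1.var = &(v1);               \
  gcpro2.next = &gcpro1;            \
  gcpro2.var = &(v2);               \
  toy_gcprolist = &gcpro2;          \
} while (0)

/* Restore the chain to what it was before this function protected anything. */
#define TOY_UNGCPRO() (toy_gcprolist = gcpro1.next)

/* Count the variables that are protected right now. */
int toy_protected_count (void)
{
  int n = 0;
  struct toy_gcpro *p;

  for (p = toy_gcprolist; p != NULL; p = p->next)
    n++;
  return n;
}

int toy_demo (void)
{
  toy_object a = 1, b = 2;
  struct toy_gcpro gcpro1, gcpro2;
  int seen;

  TOY_GCPRO2 (a, b);
  seen = toy_protected_count ();
  TOY_UNGCPRO ();
  return seen;
}
```

In this model a second @code{TOY_GCPRO1} in the same function would
overwrite the single @code{gcpro1} record and make it point at itself,
which is why a separate macro with a second record is needed for two
variables.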
2220 @cindex caller-protects (@code{GCPRO} rule)
2221 Note also that the general rule is @dfn{caller-protects}; i.e. you are
2222 only responsible for protecting those Lisp objects that you create. Any
2223 objects passed to you as arguments should have been protected by whoever
2224 created them, so you don't in general have to protect them.
2226 In particular, the arguments to any Lisp primitive are always
2227 automatically @code{GCPRO}ed, when called ``normally'' from Lisp code or
2228 bytecode. So only a few Lisp primitives that are called frequently from
2229 C code, such as @code{Fprogn}, protect their arguments as a service to
2230 their caller. You don't need to protect your arguments when writing a
2233 @code{GCPRO}ing is perhaps the trickiest and most error-prone part of
2234 XEmacs coding. It is @strong{extremely} important that you get this
2235 right and use a great deal of discipline when writing this code.
2236 @xref{GCPROing, ,@code{GCPRO}ing}, for full details on how to do this.
2238 What @code{DEFUN} actually does is declare a global structure of type
2239 @code{Lisp_Subr} whose name begins with capital @samp{SF} and which
2240 contains information about the primitive (e.g. a pointer to the
2241 function, its minimum and maximum allowed arguments, a string describing
2242 its Lisp name); @code{DEFUN} then begins a normal C function declaration
2243 using the @code{F...} name. The Lisp subr object that is the function
2244 definition of a primitive (i.e. the object in the function slot of the
2245 symbol that names the primitive) actually points to this @samp{SF}
2246 structure; when @code{Feval} encounters a subr, it looks in the
2247 structure to find out how to call the C function.
2249 Defining the C function is not enough to make a Lisp primitive
2250 available; you must also create the Lisp symbol for the primitive (the
2251 symbol is @dfn{interned}; @pxref{Obarrays}) and store a suitable subr
2252 object in its function cell. (If you don't do this, the primitive won't
2253 be seen by Lisp code.) The code looks like this:
2256 DEFSUBR (@var{fname});
2260 Here @var{fname} is the same name you used as the second argument to
2263 This call to @code{DEFSUBR} should go in the @code{syms_of_*()} function
2264 at the end of the module. If no such function exists, create it and
2265 make sure to also declare it in @file{symsinit.h} and call it from the
2266 appropriate spot in @code{main()}. @xref{General Coding Rules}.
2268 Note that C code cannot call functions by name unless they are defined
2269 in C. The way to call a function written in Lisp from C is to use
2270 @code{Ffuncall}, which embodies the Lisp function @code{funcall}. Since
2271 the Lisp function @code{funcall} accepts an unlimited number of
2272 arguments, in C it takes two: the number of Lisp-level arguments, and a
2273 one-dimensional array containing their values. The first Lisp-level
2274 argument is the Lisp function to call, and the rest are the arguments to
2275 pass to it. Since @code{Ffuncall} can call the evaluator, you must
2276 protect pointers from garbage collection around the call to
2277 @code{Ffuncall}. (However, @code{Ffuncall} explicitly protects all of
2278 its parameters, so you don't have to protect any pointers passed as
2281 The C functions @code{call0}, @code{call1}, @code{call2}, and so on,
2282 provide handy ways to call a Lisp function conveniently with a fixed
2283 number of arguments. They work by calling @code{Ffuncall}.
2285 @file{eval.c} is a very good file to look through for examples;
2286 @file{lisp.h} contains the definitions for important macros and
2289 @node Writing Good Comments
2290 @section Writing Good Comments
2291 @cindex writing good comments
2292 @cindex comments, writing good
2294 Comments are a lifeline for programmers trying to understand tricky
2295 code. In general, the less obvious it is what you are doing, the more
2296 you need a comment, and the more detailed it needs to be. You should
2297 always be on guard when you're writing code for stuff that's tricky, and
2298 should constantly be putting yourself in someone else's shoes and asking
2299 if that person could figure out without much difficulty what's going
2300 on. (Assume they are a competent programmer who understands the
2301 essentials of how the XEmacs code is structured but doesn't know much
2302 about the module you're working on or any algorithms you're using.) If
2303 you're not sure whether they would be able to, add a comment. Always
2304 err on the side of more comments, rather than fewer.
2306 Generally, when making comments, there is no need to attribute them with
2307 your name or initials. This especially goes for small,
2308 easy-to-understand, non-opinionated ones. Also, comments indicating
2309 where, when, and by whom a file was changed are @emph{strongly}
2310 discouraged, and in general will be removed as they are discovered.
2311 This is exactly what @file{ChangeLogs} are there for. However, it can
2312 occasionally be useful to mark exactly where (but not when or by whom)
2313 changes are made, particularly when making small changes to a file
2314 imported from elsewhere. These marks help when later on a newer version
2315 of the file is imported and the changes need to be merged. (If
2316 everything were always kept in CVS, there would be no need for this.
2317 But in practice, this often doesn't happen, or the CVS repository is
2318 later on lost or unavailable to the person doing the update.)
2320 When putting in an explicit opinion in a comment, you should
2321 @emph{always} attribute it with your name, and optionally the date.
2322 This also goes for long, complex comments explaining in detail the
2323 workings of something -- by putting your name there, you make it
2324 possible for someone who has questions about how that thing works to
2325 determine who wrote the comment so they can write to them. Preferably,
2326 use your actual name and not your initials, unless your initials are
2327 generally recognized (e.g. @samp{jwz}). You can use only your first
2328 name if it's obvious who you are; otherwise, give first and last name.
2329 If you're not a regular contributor, you might consider putting your
2330 email address in -- it may be in the ChangeLog, but after a while
2331 ChangeLogs have a tendency of disappearing or getting
2332 muddled. (E.g. your comment may get copied somewhere else or even into
2333 another program, and tracking down the proper ChangeLog may be very
2336 If you come across an opinion that is not or no longer valid, or you
2337 come across any comment that no longer applies but you want to keep it
2338 around, enclose it in @samp{[[ } and @samp{ ]]} marks and add a comment
2339 afterwards explaining why the preceding comment is no longer valid. Put
2340 your name on this comment, as explained above.
2342 Just as comments are a lifeline to programmers, incorrect comments are
2343 death. If you come across an incorrect comment, @strong{immediately}
2344 correct it or flag it as incorrect, as described in the previous
2345 paragraph. Whenever you work on a section of code, @emph{always} make
2346 sure to update any comments to be correct -- or, at the very least, flag
2349 To indicate a ``todo'' or other problem, use four pound signs --
2352 @node Adding Global Lisp Variables
2353 @section Adding Global Lisp Variables
2354 @cindex global Lisp variables, adding
2355 @cindex variables, adding global Lisp
2357 Global variables whose names begin with @samp{Q} are constants whose
2358 value is a symbol of a particular name. The name of the variable should
2359 be derived from the name of the symbol using the same rules as for Lisp
2360 primitives. These variables are initialized using a call to
2361 @code{defsymbol()} in the @code{syms_of_*()} function. (This call
2362 interns a symbol, sets the C variable to the resulting Lisp object, and
2363 calls @code{staticpro()} on the C variable to tell the
2364 garbage-collection mechanism about this variable. What
2365 @code{staticpro()} does is add a pointer to the variable to a large
2366 global array; when garbage-collection happens, all pointers listed in
2367 the array are used as starting points for marking Lisp objects. This is
2368 important because it's quite possible that the only current reference to
2369 the object is the C variable. In the case of symbols, the
2370 @code{staticpro()} doesn't matter all that much because the symbol is
2371 contained in @code{obarray}, which is itself @code{staticpro()}ed.
2372 However, it's possible that a naughty user could do something like
2373 uninterning the symbol out of @code{obarray} or even setting
2374 @code{obarray} to a different value [although this is likely to make
2377 @strong{Please note:} It is potentially deadly if you declare a
2378 @samp{Q...} variable in two different modules. The two calls to
2379 @code{defsymbol()} are no problem, but some linkers will complain about
2380 multiply-defined symbols. The most insidious aspect of this is that
2381 often the link will succeed anyway, but then the resulting executable
2382 will sometimes crash in obscure ways during certain operations! To
2383 avoid this problem, declare any symbols with common names (such as
2384 @code{text}) that are not obviously associated with this particular
2385 module in the module @file{general.c}.
2387 Global variables whose names begin with @samp{V} are variables that
2388 contain Lisp objects. The convention here is that all global variables
2389 of type @code{Lisp_Object} begin with @samp{V}, and all others don't
2390 (including integer and boolean variables that have Lisp
2391 equivalents). Most of the time, these variables have equivalents in
2392 Lisp, but some don't. Those that do are declared this way by a call to
2393 @code{DEFVAR_LISP()} in the @code{vars_of_*()} initializer for the
2394 module. What this does is create a special @dfn{symbol-value-forward}
2395 Lisp object that contains a pointer to the C variable, intern a symbol
2396 whose name is as specified in the call to @code{DEFVAR_LISP()}, and set
2397 its value to the symbol-value-forward Lisp object; it also calls
2398 @code{staticpro()} on the C variable to tell the garbage-collection
2399 mechanism about the variable. When @code{eval} (or actually
2400 @code{symbol-value}) encounters this special object in the process of
2401 retrieving a variable's value, it follows the indirection to the C
2402 variable and gets its value. @code{setq} does similar things so that
2403 the C variable gets changed.
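The forwarding idea described above can be sketched in plain C. This is a minimal model under stated assumptions, not the XEmacs implementation: the names @code{toy_symbol}, @code{toy_symbol_value}, @code{toy_set} and @code{Vtoy_limit} are hypothetical, and a plain @code{long} stands in for @code{Lisp_Object}.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for Lisp_Object; the real type is richer. */
typedef long toy_object;

/* A toy symbol either holds its value directly or forwards to a C
   variable, mimicking the role of a symbol-value-forward object. */
struct toy_symbol
{
  const char *name;
  toy_object value;      /* used when forward == NULL */
  toy_object *forward;   /* points at the C variable, or NULL */
};

/* Reading follows the indirection when one is present. */
toy_object
toy_symbol_value (const struct toy_symbol *sym)
{
  assert (sym != NULL);
  return sym->forward ? *sym->forward : sym->value;
}

/* Writing goes through the same indirection, so the C variable changes. */
void
toy_set (struct toy_symbol *sym, toy_object val)
{
  assert (sym != NULL);
  if (sym->forward)
    *sym->forward = val;
  else
    sym->value = val;
}

/* The C-level variable, named with the V prefix convention. */
toy_object Vtoy_limit = 10;

/* What a DEFVAR_LISP-like registration would set up in this model. */
struct toy_symbol toy_limit_symbol = { "toy-limit", 0, &Vtoy_limit };
```

In this model, a write through @code{toy_set} is visible in @code{Vtoy_limit}, and a direct assignment to @code{Vtoy_limit} is visible through @code{toy_symbol_value}; both sides share one storage location.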
2405 Whether or not you @code{DEFVAR_LISP()} a variable, you need to
2406 initialize it in the @code{vars_of_*()} function; otherwise it will end
2407 up as all zeroes, which is the integer 0 (@emph{not} @code{nil}), and
2408 this is probably not what you want. Also, if the variable is not
2409 @code{DEFVAR_LISP()}ed, @strong{you must call} @code{staticpro()} on the
2410 C variable in the @code{vars_of_*()} function. Otherwise, the
2411 garbage-collection mechanism won't know that the object in this variable
2412 is in use, and will happily collect it and reuse its storage for another
2413 Lisp object, and you will be the one who's unhappy when you can't figure
2414 out how your variable got overwritten.
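To see why an unregistered variable is dangerous, consider the following toy collector. It is only a sketch: the heap, the root array, and the names @code{toy_staticpro}, @code{toy_alloc} and @code{toy_collect} are all hypothetical, and the real collector marks whole object graphs rather than single cells.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_HEAP_SIZE 8
#define TOY_MAX_ROOTS 8

/* A toy heap cell; `live' says whether the cell is currently allocated. */
struct toy_cell
{
  int live;
  int marked;
  int payload;
};

struct toy_cell toy_heap[TOY_HEAP_SIZE];

/* Root set: addresses of C variables that hold pointers into the heap. */
struct toy_cell **toy_roots[TOY_MAX_ROOTS];
int toy_root_count;

/* Hypothetical analogue of staticpro(): remember the variable's address. */
void
toy_staticpro (struct toy_cell **var)
{
  assert (toy_root_count < TOY_MAX_ROOTS);
  toy_roots[toy_root_count++] = var;
}

/* Allocate the first free cell, or return NULL when the heap is full. */
struct toy_cell *
toy_alloc (int payload)
{
  int i;
  for (i = 0; i < TOY_HEAP_SIZE; i++)
    if (!toy_heap[i].live)
      {
        toy_heap[i].live = 1;
        toy_heap[i].marked = 0;
        toy_heap[i].payload = payload;
        return &toy_heap[i];
      }
  return NULL;
}

/* Mark everything referenced from the roots, then free the rest.
   Returns the number of cells reclaimed. */
int
toy_collect (void)
{
  int i, freed = 0;
  for (i = 0; i < toy_root_count; i++)
    if (*toy_roots[i])
      (*toy_roots[i])->marked = 1;
  for (i = 0; i < TOY_HEAP_SIZE; i++)
    {
      if (toy_heap[i].live && !toy_heap[i].marked)
        {
          toy_heap[i].live = 0;
          freed++;
        }
      toy_heap[i].marked = 0;
    }
  return freed;
}
```

A variable that was never passed to @code{toy_staticpro} keeps pointing at its cell after a collection, but the next @code{toy_alloc} may hand that same cell to someone else. That is the "overwritten variable" symptom in miniature.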
2416 @node Proper Use of Unsigned Types
2417 @section Proper Use of Unsigned Types
2418 @cindex unsigned types, proper use of
2419 @cindex types, proper use of unsigned
2421 Avoid using @code{unsigned int} and @code{unsigned long} whenever
possible. Unsigned types are viral -- in any arithmetic or comparison
that mixes signed and unsigned operands, the signed operand is typically converted to
2424 unsigned, which is almost certainly not what you want. Many subtle and
2425 hard-to-find bugs are created by careless use of unsigned types. In
2426 general, you should almost @emph{never} use an unsigned type to hold a
2427 regular quantity of any sort. The only exceptions are
2431 When there's a reasonable possibility you will actually need all 32 or
2432 64 bits to store the quantity.
When calling existing APIs that require unsigned types. In this case,
2435 you should still do all manipulation using signed types, and do the
2436 conversion at the very threshold of the API call.
2438 In existing code that you don't want to modify because you don't
2441 In bit-field structures.
2444 Other reasonable uses of @code{unsigned int} and @code{unsigned long}
2445 are representing non-quantities -- e.g. bit-oriented flags and such.
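Two small, self-contained cases illustrate how unsigned quantities go wrong. The function names are invented for this sketch; the behavior shown follows from the usual arithmetic conversions of standard C.

```c
#include <assert.h>

/* Buggy: with an unsigned length the subtraction wraps around when
   len == 0, so the bound becomes a huge value instead of -1. */
int
index_in_range_unsigned (unsigned int len, unsigned int i)
{
  return i <= len - 1;
}

/* Safer: signed quantities behave like ordinary integers. */
int
index_in_range_signed (int len, int i)
{
  assert (len >= 0);
  return i >= 0 && i <= len - 1;
}

/* Mixed comparison: the signed operand is converted to unsigned,
   so -1 becomes UINT_MAX and does not compare less than 1. */
int
minus_one_less_than_one_mixed (void)
{
  int s = -1;
  unsigned int u = 1;
  return s < u;
}
```

With an empty sequence (@code{len} of 0), the unsigned version accepts index 5 as valid, while the signed version rejects it. The mixed comparison reports that -1 is not less than 1.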
2447 @node Coding for Mule
2448 @section Coding for Mule
2449 @cindex coding for Mule
2450 @cindex Mule, coding for
2452 Although Mule support is not compiled by default in XEmacs, many people
2453 are using it, and we consider it crucial that new code works correctly
2454 with multibyte characters. This is not hard; it is only a matter of
2455 following several simple user-interface guidelines. Even if you never
2456 compile with Mule, with a little practice you will find it quite easy
2457 to code Mule-correctly.
2459 Note that these guidelines are not necessarily tied to the current Mule
2460 implementation; they are also a good idea to follow on the grounds of
2461 code generalization for future I18N work.
2464 * Character-Related Data Types::
2465 * Working With Character and Byte Positions::
2466 * Conversion to and from External Data::
2467 * General Guidelines for Writing Mule-Aware Code::
2468 * An Example of Mule-Aware Code::
2471 @node Character-Related Data Types
2472 @subsection Character-Related Data Types
2473 @cindex character-related data types
2474 @cindex data types, character-related
2476 First, let's review the basic character-related datatypes used by
2477 XEmacs. Note that the separate @code{typedef}s are not mandatory in the
2478 current implementation (all of them boil down to @code{unsigned char} or
2479 @code{int}), but they improve clarity of code a great deal, because one
2480 glance at the declaration can tell the intended use of the variable.
2485 An @code{Emchar} holds a single Emacs character.
2487 Obviously, the equality between characters and bytes is lost in the Mule
2488 world. Characters can be represented by one or more bytes in the
2489 buffer, and @code{Emchar} is the C type large enough to hold any
2492 Without Mule support, an @code{Emchar} is equivalent to an
2493 @code{unsigned char}.
2497 The data representing the text in a buffer or string is logically a set
2500 XEmacs does not work with the same character formats all the time; when
2501 reading characters from the outside, it decodes them to an internal
2502 format, and likewise encodes them when writing. @code{Bufbyte} (in fact
@code{unsigned char}) is the basic unit of the XEmacs internal buffer
and string format. A @code{Bufbyte *} is the type that points at text
2505 encoded in the variable-width internal encoding.
2507 One character can correspond to one or more @code{Bufbyte}s. In the
current Mule implementation, an ASCII character is represented by a
single @code{Bufbyte} of the same value, and other characters are represented by a sequence
2510 of two or more @code{Bufbyte}s.
2512 Without Mule support, there are exactly 256 characters, implicitly
2513 Latin-1, and each character is represented using one @code{Bufbyte}, and
2514 there is a one-to-one correspondence between @code{Bufbyte}s and
2521 A @code{Bufpos} represents a character position in a buffer or string.
2522 A @code{Charcount} represents a number (count) of characters.
2523 Logically, subtracting two @code{Bufpos} values yields a
2524 @code{Charcount} value. Although all of these are @code{typedef}ed to
2525 @code{EMACS_INT}, we use them in preference to @code{EMACS_INT} to make
2526 it clear what sort of position is being used.
2528 @code{Bufpos} and @code{Charcount} values are the only ones that are
2529 ever visible to Lisp.
2535 A @code{Bytind} represents a byte position in a buffer or string. A
2536 @code{Bytecount} represents the distance between two positions, in bytes.
2537 The relationship between @code{Bytind} and @code{Bytecount} is the same
2538 as the relationship between @code{Bufpos} and @code{Charcount}.
2544 When dealing with the outside world, XEmacs works with @code{Extbyte}s,
2545 which are equivalent to @code{unsigned char}. Obviously, an
2546 @code{Extcount} is the distance between two @code{Extbyte}s. Extbytes
2547 and Extcounts are not all that frequent in XEmacs code.
2550 @node Working With Character and Byte Positions
2551 @subsection Working With Character and Byte Positions
2552 @cindex character and byte positions, working with
2553 @cindex byte positions, working with character and
2554 @cindex positions, working with character and byte
2556 Now that we have defined the basic character-related types, we can look
at the macros and functions designed for working with them and for
2558 conversion between them. Most of these macros are defined in
2559 @file{buffer.h}, and we don't discuss all of them here, but only the
2560 most important ones. Examining the existing code is the best way to
2564 @item MAX_EMCHAR_LEN
2565 @cindex MAX_EMCHAR_LEN
This preprocessor constant is the maximum number of buffer bytes needed to
represent an Emacs character in the variable-width internal encoding.
It is useful when allocating temporary strings to hold a known number of
characters. For instance:
    /* Allocate space for @var{cclen} characters. */
2578 Bufbyte *buf = (Bufbyte *)alloca (cclen * MAX_EMCHAR_LEN);
2583 If you followed the previous section, you can guess that, logically,
multiplying a @code{Charcount} value by @code{MAX_EMCHAR_LEN} produces
2585 a @code{Bytecount} value.
2587 In the current Mule implementation, @code{MAX_EMCHAR_LEN} equals 4.
2588 Without Mule, it is 1.
2590 @item charptr_emchar
2591 @itemx set_charptr_emchar
2592 @cindex charptr_emchar
2593 @cindex set_charptr_emchar
2594 The @code{charptr_emchar} macro takes a @code{Bufbyte} pointer and
2595 returns the @code{Emchar} stored at that position. If it were a
2596 function, its prototype would be:
2599 Emchar charptr_emchar (Bufbyte *p);
2602 @code{set_charptr_emchar} stores an @code{Emchar} to the specified byte
2603 position. It returns the number of bytes stored:
2606 Bytecount set_charptr_emchar (Bufbyte *p, Emchar c);
2609 It is important to note that @code{set_charptr_emchar} is safe only for
2610 appending a character at the end of a buffer, not for overwriting a
2611 character in the middle. This is because the width of characters
2612 varies, and @code{set_charptr_emchar} cannot resize the string if it
2613 writes, say, a two-byte character where a single-byte character used to
2616 A typical use of @code{set_charptr_emchar} can be demonstrated by this
2617 example, which copies characters from buffer @var{buf} to a temporary
  for (pos = beg; pos < end; pos++)
    @{
      Emchar c = BUF_FETCH_CHAR (buf, pos);
      p += set_charptr_emchar (p, c);
    @}
Note how @code{set_charptr_emchar} is used to store the @code{Emchar}
and advance the pointer @code{p} at the same time.
2640 These two macros increment and decrement a @code{Bufbyte} pointer,
2641 respectively. They will adjust the pointer by the appropriate number of
2642 bytes according to the byte length of the character stored there. Both
2643 macros assume that the memory address is located at the beginning of a
2646 Without Mule support, @code{INC_CHARPTR (p)} and @code{DEC_CHARPTR (p)}
2647 simply expand to @code{p++} and @code{p--}, respectively.
2649 @item bytecount_to_charcount
2650 @cindex bytecount_to_charcount
2651 Given a pointer to a text string and a length in bytes, return the
2652 equivalent length in characters.
2655 Charcount bytecount_to_charcount (Bufbyte *p, Bytecount bc);
2658 @item charcount_to_bytecount
2659 @cindex charcount_to_bytecount
2660 Given a pointer to a text string and a length in characters, return the
2661 equivalent length in bytes.
2664 Bytecount charcount_to_bytecount (Bufbyte *p, Charcount cc);
2667 @item charptr_n_addr
2668 @cindex charptr_n_addr
2669 Return a pointer to the beginning of the character offset @var{cc} (in
2670 characters) from @var{p}.
2673 Bufbyte *charptr_n_addr (Bufbyte *p, Charcount cc);
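The relationship between these helpers is easier to see in a toy model. The sketch below uses UTF-8 purely as a familiar variable-width encoding; it is @emph{not} the Mule internal encoding. All @code{toy_} names and typedefs are hypothetical stand-ins for the XEmacs macros and types of similar name.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char toy_bufbyte;   /* stand-in for Bufbyte */
typedef long toy_charcount;          /* stand-in for Charcount */
typedef long toy_bytecount;          /* stand-in for Bytecount */

/* Byte length of the character starting at P, read from its lead byte.
   UTF-8 is used here only as an example of a variable-width encoding. */
toy_bytecount
toy_char_bytes (const toy_bufbyte *p)
{
  if (*p < 0x80) return 1;
  if ((*p & 0xE0) == 0xC0) return 2;
  if ((*p & 0xF0) == 0xE0) return 3;
  assert ((*p & 0xF8) == 0xF0);   /* P must sit on a character boundary */
  return 4;
}

/* Analogue of bytecount_to_charcount: count characters in BC bytes. */
toy_charcount
toy_bytecount_to_charcount (const toy_bufbyte *p, toy_bytecount bc)
{
  const toy_bufbyte *end = p + bc;
  toy_charcount cc = 0;
  while (p < end)
    {
      p += toy_char_bytes (p);   /* what INC_CHARPTR does in spirit */
      cc++;
    }
  return cc;
}

/* Analogue of charptr_n_addr: address of the character CC characters
   after P. */
const toy_bufbyte *
toy_charptr_n_addr (const toy_bufbyte *p, toy_charcount cc)
{
  while (cc-- > 0)
    p += toy_char_bytes (p);
  return p;
}

/* Analogue of charcount_to_bytecount: bytes spanned by CC characters. */
toy_bytecount
toy_charcount_to_bytecount (const toy_bufbyte *p, toy_charcount cc)
{
  return (toy_bytecount) (toy_charptr_n_addr (p, cc) - p);
}
```

Note that every conversion between counts has to walk the text, which is why the distinction between character and byte quantities matters for both correctness and cost.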
2677 @node Conversion to and from External Data
2678 @subsection Conversion to and from External Data
2679 @cindex conversion to and from external data
2680 @cindex external data, conversion to and from
2682 When an external function, such as a C library function, returns a
@code{char} pointer, you should almost never treat it as a @code{Bufbyte}
pointer. This is because these returned strings may contain 8-bit
characters which can be misinterpreted by XEmacs, and cause a crash. Likewise, when
2686 exporting a piece of internal text to the outside world, you should
2687 always convert it to an appropriate external encoding, lest the internal
2688 stuff (such as the infamous \201 characters) leak out.
The interface for converting between the internal and external
representations of text is the set of conversion macros defined in
@file{buffer.h}. There used to be a fixed set of external formats
supported by these macros, but now any coding system can be used with
them. The coding system alias mechanism is used to create the
following logical coding systems, which replace the fixed external
formats. The @code{dontusethis-set-symbol-value-handler} mechanism was
enhanced to make this possible (more work on that is needed, such as
removing the @code{dontusethis-} prefix).
2702 This is the simplest format and is what we use in the absence of a more
2703 appropriate format. This converts according to the @code{binary} coding
2708 On input, bytes 0--255 are converted into (implicitly Latin-1)
2709 characters 0--255. A non-Mule xemacs doesn't really know about
2710 different character sets and the fonts to display them, so the bytes can
2711 be treated as text in different 1-byte encodings by simply setting the
2712 appropriate fonts. So in a sense, non-Mule xemacs is a multi-lingual
2713 editor if, for example, different fonts are used to display text in
2714 different buffers, faces, or windows. The specifier mechanism gives the
2715 user complete control over this kind of behavior.
2717 On output, characters 0--255 are converted into bytes 0--255 and other
2718 characters are converted into `~'.
2722 Format used for filenames. This is user-definable via either the
2723 @code{file-name-coding-system} or @code{pathname-coding-system} (now
2724 obsolete) variables.
2727 Format used for the external Unix environment---@code{argv[]}, stuff
2728 from @code{getenv()}, stuff from the @file{/etc/passwd} file, etc.
2729 Currently this is the same as Qfile_name. The two should be
2730 distinguished for clarity and possible future separation.
Compound-text format. This is the standard X11 format used for data
2734 stored in properties, selections, and the like. This is an 8-bit
2735 no-lock-shift ISO2022 coding system. This is a real coding system,
2736 unlike Qfile_name, which is user-definable.
2739 There are two fundamental macros to convert between external and
2742 @code{TO_INTERNAL_FORMAT} converts external data to internal format, and
2743 @code{TO_EXTERNAL_FORMAT} converts the other way around. The arguments
2744 each of these receives are a source type, a source, a sink type, a sink,
2745 and a coding system (or a symbol naming a coding system).
2747 A typical call looks like
2749 TO_EXTERNAL_FORMAT (LISP_STRING, str, C_STRING_MALLOC, ptr, Qfile_name);
2752 which means that the contents of the lisp string @code{str} are written
2753 to a malloc'ed memory area which will be pointed to by @code{ptr}, after
2754 the function returns. The conversion will be done using the
2755 @code{file-name} coding system, which will be controlled by the user
2756 indirectly by setting or binding the variable
2757 @code{file-name-coding-system}.
2759 Some sources and sinks require two C variables to specify. We use some
2760 preprocessor magic to allow different source and sink types, and even
2761 different numbers of arguments to specify different types of sources and
2764 So we can have a call that looks like
2766 TO_INTERNAL_FORMAT (DATA, (ptr, len),
2771 The parenthesized argument pairs are required to make the preprocessor
2774 Here are the different source and sink types:
2777 @item @code{DATA, (ptr, len),}
2778 input data is a fixed buffer of size @var{len} at address @var{ptr}
2779 @item @code{ALLOCA, (ptr, len),}
2780 output data is placed in an alloca()ed buffer of size @var{len} pointed to by @var{ptr}
2781 @item @code{MALLOC, (ptr, len),}
2782 output data is in a malloc()ed buffer of size @var{len} pointed to by @var{ptr}
2783 @item @code{C_STRING_ALLOCA, ptr,}
equivalent to @code{ALLOCA, (ptr, len_ignored)} on output
@item @code{C_STRING_MALLOC, ptr,}
equivalent to @code{MALLOC, (ptr, len_ignored)} on output
2787 @item @code{C_STRING, ptr,}
2788 equivalent to @code{DATA, (ptr, strlen (ptr) + 1)} on input
2789 @item @code{LISP_STRING, string,}
2790 input or output is a Lisp_Object of type string
2791 @item @code{LISP_BUFFER, buffer,}
2792 output is written to @code{(point)} in lisp buffer @var{buffer}
2793 @item @code{LISP_LSTREAM, lstream,}
2794 input or output is a Lisp_Object of type lstream
2795 @item @code{LISP_OPAQUE, object,}
2796 input or output is a Lisp_Object of type opaque
2799 Often, the data is being converted to a '\0'-byte-terminated string,
2800 which is the format required by many external system C APIs. For these
2801 purposes, a source type of @code{C_STRING} or a sink type of
2802 @code{C_STRING_ALLOCA} or @code{C_STRING_MALLOC} is appropriate.
2803 Otherwise, we should try to keep XEmacs '\0'-byte-clean, which means
2804 using (ptr, len) pairs.
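The difference between a '\0'-terminated view and a (ptr, len) view of the same bytes can be shown with a short sketch. The @code{struct toy_data} type and both functions are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A (ptr, len) pair, as opposed to a '\0'-terminated C string. */
struct toy_data
{
  const unsigned char *ptr;
  size_t len;
};

/* Count occurrences of BYTE, trusting the explicit length. */
size_t
toy_count_byte (struct toy_data d, unsigned char byte)
{
  size_t i, n = 0;
  assert (d.ptr != NULL || d.len == 0);
  for (i = 0; i < d.len; i++)
    if (d.ptr[i] == byte)
      n++;
  return n;
}

/* What a strlen()-based view of the same bytes would see. */
size_t
toy_c_string_length (struct toy_data d)
{
  assert (d.ptr != NULL);
  return strlen ((const char *) d.ptr);
}
```

For data containing an embedded zero byte, the @code{strlen}-based view silently stops early, while the (ptr, len) view sees everything.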
2806 The sinks to be specified must be lvalues, unless they are the lisp
2807 object types @code{LISP_LSTREAM} or @code{LISP_BUFFER}.
2809 For the sink types @code{ALLOCA} and @code{C_STRING_ALLOCA}, the
2810 resulting text is stored in a stack-allocated buffer, which is
2811 automatically freed on returning from the function. However, the sink
2812 types @code{MALLOC} and @code{C_STRING_MALLOC} return @code{xmalloc()}ed
2813 memory. The caller is responsible for freeing this memory using
2816 Note that it doesn't make sense for @code{LISP_STRING} to be a source
2817 for @code{TO_INTERNAL_FORMAT} or a sink for @code{TO_EXTERNAL_FORMAT}.
2818 You'll get an assertion failure if you try.
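As a concrete picture of what a conversion does at the boundary, here is a sketch of the simple byte-for-character mapping described earlier in this section for the binary format. It does not use the real conversion macros; @code{toy_emchar}, @code{toy_binary_decode} and @code{toy_binary_encode} are hypothetical names, and an array of integers stands in for internal text.

```c
#include <assert.h>
#include <stddef.h>

typedef int toy_emchar;   /* stand-in for Emchar: one character code */

/* Decode: each external byte 0-255 becomes character 0-255. */
void
toy_binary_decode (const unsigned char *src, size_t len, toy_emchar *dst)
{
  size_t i;
  assert (len == 0 || (src != NULL && dst != NULL));
  for (i = 0; i < len; i++)
    dst[i] = src[i];
}

/* Encode: characters 0-255 become bytes; anything else becomes '~'. */
void
toy_binary_encode (const toy_emchar *src, size_t len, unsigned char *dst)
{
  size_t i;
  assert (len == 0 || (src != NULL && dst != NULL));
  for (i = 0; i < len; i++)
    dst[i] = (src[i] >= 0 && src[i] <= 255) ? (unsigned char) src[i] : '~';
}
```

Both directions take an explicit length, so zero bytes in the data pass through unchanged.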
2821 @node General Guidelines for Writing Mule-Aware Code
2822 @subsection General Guidelines for Writing Mule-Aware Code
2823 @cindex writing Mule-aware code, general guidelines for
2824 @cindex Mule-aware code, general guidelines for writing
2825 @cindex code, general guidelines for writing Mule-aware
2827 This section contains some general guidance on how to write Mule-aware
2828 code, as well as some pitfalls you should avoid.
2831 @item Never use @code{char} and @code{char *}.
2832 In XEmacs, the use of @code{char} and @code{char *} is almost always a
2833 mistake. If you want to manipulate an Emacs character from ``C'', use
2834 @code{Emchar}. If you want to examine a specific octet in the internal
2835 format, use @code{Bufbyte}. If you want a Lisp-visible character, use a
2836 @code{Lisp_Object} and @code{make_char}. If you want a pointer to move
2837 through the internal text, use @code{Bufbyte *}. Also note that you
2838 almost certainly do not need @code{Emchar *}.
2840 @item Be careful not to confuse @code{Charcount}, @code{Bytecount}, and @code{Bufpos}.
2841 The whole point of using different types is to avoid confusion about the
2842 use of certain variables. Lest this effect be nullified, you need to be
2843 careful about using the right types.
2845 @item Always convert external data
2846 It is extremely important to always convert external data, because
XEmacs can crash if unexpected 8-bit sequences are copied to its internal
2850 This means that when a system function, such as @code{readdir}, returns
2851 a string, you may need to convert it using one of the conversion macros
described in the previous section, before passing it further to Lisp.
2854 Actually, most of the basic system functions that accept '\0'-terminated
2855 string arguments, like @code{stat()} and @code{open()}, have been
@strong{encapsulated} so that they @emph{always} do internal to
external conversion themselves. This means you must pass internally
encoded data, typically the @code{XSTRING_DATA} of a Lisp_String, to
2859 these functions. This is actually a design bug, since it unexpectedly
2860 changes the semantics of the system functions. A better design would be
2861 to provide separate versions of these system functions that accepted
2862 Lisp_Objects which were lisp strings in place of their current
2863 @code{char *} arguments.
2866 int stat_lisp (Lisp_Object path, struct stat *buf); /* Implement me */
2869 Also note that many internal functions, such as @code{make_string},
2870 accept Bufbytes, which removes the need for them to convert the data
2871 they receive. This increases efficiency because that way external data
2872 needs to be decoded only once, when it is read. After that, it is
2873 passed around in internal format.
2876 @node An Example of Mule-Aware Code
2877 @subsection An Example of Mule-Aware Code
2878 @cindex code, an example of Mule-aware
2879 @cindex Mule-aware code, an example of
2881 As an example of Mule-aware code, we will analyze the @code{string}
2882 function, which conses up a Lisp string from the character arguments it
receives. Here is the definition, pasted from @file{alloc.c}:
DEFUN ("string", Fstring, 0, MANY, 0, /*
Concatenate all the argument characters and make the result a string.
*/
       (int nargs, Lisp_Object *args))
@{
  Bufbyte *storage = alloca_array (Bufbyte, nargs * MAX_EMCHAR_LEN);
  Bufbyte *p = storage;

  for (; nargs; nargs--, args++)
    @{
      Lisp_Object lisp_char = *args;
      CHECK_CHAR_COERCE_INT (lisp_char);
      p += set_charptr_emchar (p, XCHAR (lisp_char));
    @}
  return make_string (storage, p - storage);
@}
2906 Now we can analyze the source line by line.
Obviously, the string will contain as many characters as there are
arguments to the function. This is why we allocate
@code{MAX_EMCHAR_LEN} * @var{nargs} bytes on the stack, i.e. the
worst-case number of bytes for @var{nargs} @code{Emchar}s to fit in the
string.
2913 Then, the loop checks that each element is a character, converting
2914 integers in the process. Like many other functions in XEmacs, this
2915 function silently accepts integers where characters are expected, for
2916 historical and compatibility reasons. Unless you know what you are
2917 doing, @code{CHECK_CHAR} will also suffice. @code{XCHAR (lisp_char)}
2918 extracts the @code{Emchar} from the @code{Lisp_Object}, and
2919 @code{set_charptr_emchar} stores it to storage, increasing @code{p} in
2922 Other instructive examples of correct coding under Mule can be found all
2923 over the XEmacs code. For starters, I recommend
2924 @code{Fnormalize_menu_item_name} in @file{menubar.c}. After you have
2925 understood this section of the manual and studied the examples, you can
proceed to write new Mule-aware code.
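The same pattern (allocate for the worst case, then store and advance in one step) can be tried outside XEmacs. The following sketch assumes UTF-8 in place of the internal encoding and @code{malloc} in place of @code{alloca_array}; @code{toy_set_char}, @code{toy_string} and @code{TOY_MAX_CHAR_LEN} are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <stddef.h>

#define TOY_MAX_CHAR_LEN 4   /* worst-case bytes per character here */

/* Store character C at P in UTF-8 and return the number of bytes
   written, mirroring how set_charptr_emchar reports its length. */
size_t
toy_set_char (unsigned char *p, unsigned long c)
{
  assert (c <= 0x10FFFF);
  if (c < 0x80)
    {
      p[0] = (unsigned char) c;
      return 1;
    }
  if (c < 0x800)
    {
      p[0] = (unsigned char) (0xC0 | (c >> 6));
      p[1] = (unsigned char) (0x80 | (c & 0x3F));
      return 2;
    }
  if (c < 0x10000)
    {
      p[0] = (unsigned char) (0xE0 | (c >> 12));
      p[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
      p[2] = (unsigned char) (0x80 | (c & 0x3F));
      return 3;
    }
  p[0] = (unsigned char) (0xF0 | (c >> 18));
  p[1] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
  p[2] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
  p[3] = (unsigned char) (0x80 | (c & 0x3F));
  return 4;
}

/* Concatenate NCHARS characters into freshly malloc()ed storage.
   Stores the byte length in *OUT_LEN; the caller frees the result. */
unsigned char *
toy_string (const unsigned long *chars, size_t nchars, size_t *out_len)
{
  /* Worst case first (+1 so a zero-length request is still valid). */
  unsigned char *storage = malloc (nchars * TOY_MAX_CHAR_LEN + 1);
  unsigned char *p = storage;
  size_t i;

  if (!storage)
    return NULL;
  for (i = 0; i < nchars; i++)
    p += toy_set_char (p, chars[i]);   /* store and advance together */
  *out_len = (size_t) (p - storage);
  return storage;
}
```

The final length is computed from the pointer difference, just as @code{p - storage} is in the @code{string} function, so it reflects the bytes actually written rather than the worst-case allocation.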
2928 @node Techniques for XEmacs Developers
2929 @section Techniques for XEmacs Developers
2930 @cindex techniques for XEmacs developers
2931 @cindex developers, techniques for XEmacs
2935 To make a purified XEmacs, do: @code{make puremacs}.
2936 To make a quantified XEmacs, do: @code{make quantmacs}.
2938 You simply can't dump Quantified and Purified images (unless using the
portable dumper). Purify gets confused when XEmacs frees memory in one
process that was allocated in a @emph{different} process on a different
machine! Run it like so:
2943 temacs -batch -l loadup.el run-temacs @var{xemacs-args...}
2946 @cindex error checking
2947 Before you go through the trouble, are you compiling with all
2948 debugging and error-checking off? If not, try that first. Be warned
2949 that while Quantify is directly responsible for quite a few
2950 optimizations which have been made to XEmacs, doing a run which
2951 generates results which can be acted upon is not necessarily a trivial
Also, if you're still willing to do some runs, make sure you configure
2955 with the @samp{--quantify} flag. That will keep Quantify from starting
2956 to record data until after the loadup is completed and will shut off
2957 recording right before it shuts down (which generates enough bogus data
2958 to throw most results off). It also enables three additional elisp
2959 commands: @code{quantify-start-recording-data},
2960 @code{quantify-stop-recording-data} and @code{quantify-clear-data}.
2962 If you want to make XEmacs faster, target your favorite slow benchmark,
2963 run a profiler like Quantify, @code{gprof}, or @code{tcov}, and figure
2964 out where the cycles are going. In many cases you can localize the
2965 problem (because a particular new feature or even a single patch
2966 elicited it). Don't hesitate to use brute force techniques like a
2967 global counter incremented at strategic places, especially in
2968 combination with other performance indications (@emph{e.g.}, degree of
2969 buffer fragmentation into extents).
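The global-counter technique needs very little machinery. In the sketch below, the counter and both functions are made up for illustration; the deliberately quadratic loop stands in for whatever routine is under suspicion.

```c
#include <assert.h>
#include <stddef.h>

/* Brute-force instrumentation: a global counter bumped in the routine
   under suspicion. */
long toy_compare_calls;

int
toy_compare (int a, int b)
{
  toy_compare_calls++;
  return a == b;
}

/* Deliberately quadratic duplicate check, to show how the counter
   exposes the growth in calls. */
int
toy_has_duplicate (const int *v, size_t n)
{
  size_t i, j;
  assert (v != NULL || n == 0);
  for (i = 0; i < n; i++)
    for (j = i + 1; j < n; j++)
      if (toy_compare (v[i], v[j]))
        return 1;
  return 0;
}
```

Resetting the counter, running the operation on inputs of two different sizes, and comparing the counts shows whether the number of calls grows linearly or worse. Here 10 distinct elements cost 45 comparisons and 100 cost 4950.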
2975 Make the garbage collector faster. Figure out how to write an
2976 incremental garbage collector.
2978 Write a compiler that takes bytecode and spits out C code.
2979 Unfortunately, you will then need a C compiler and a more fully
2980 developed module system.
2984 Speed up syntax highlighting. It was suggested that ``maybe moving some
2985 of the syntax highlighting capabilities into C would make a
2986 difference.'' Wrong idea, I think. When processing one large file a
2987 particular low-level routine was being called 40 @emph{million} times
2988 simply for @emph{one} call to @code{newline-and-indent}. Syntax
2989 highlighting needs to be rewritten to use a reliable, fast parser, then
2990 to trust the pre-parsed structure, and only do re-highlighting locally
2991 to a text change. Modern machines are fast enough to implement such
2992 parsers in Lisp; but no machine will ever be fast enough to deal with
2993 quadratic (or worse) algorithms!
2995 Implement tail recursion in Emacs Lisp (hard!).
2998 Unfortunately, Emacs Lisp is slow, and is going to stay slow. Function
2999 calls in elisp are especially expensive. Iterating over a long list is
3000 going to be 30 times faster implemented in C than in Elisp.
3002 Heavily used small code fragments need to be fast. The traditional way
3003 to implement such code fragments in C is with macros. But macros in C
3004 are known to be broken.
3006 @cindex macro hygiene
3007 Macro arguments that are repeatedly evaluated may suffer from repeated
3008 side effects or suboptimal performance.
3010 Variable names used in macros may collide with caller's variables,
3011 causing (at least) unwanted compiler warnings.
3013 In order to solve these problems, and maintain statement semantics, one
3014 should use the @code{do @{ ... @} while (0)} trick while trying to
3015 reference macro arguments exactly once using local variables.
3017 Let's take a look at this poor macro definition:
3020 #define MARK_OBJECT(obj) \
3021 if (!marked_p (obj)) mark_object (obj), did_mark = 1
3024 This macro evaluates its argument twice, and also fails if used like this:
3026 if (flag) MARK_OBJECT (obj); else do_something();
3029 A much better definition is
#define MARK_OBJECT(obj) do @{ \
  Lisp_Object mo_obj = (obj); \
  if (!marked_p (mo_obj))     \
    @{                         \
      mark_object (mo_obj);   \
      did_mark = 1;           \
    @}                         \
@} while (0)
3042 Notice the elimination of double evaluation by using the local variable
3043 with the obscure name. Writing safe and efficient macros requires great
3044 care. The one problem with macros that cannot be portably worked around
3045 is, since a C block has no value, a macro used as an expression rather
3046 than a statement cannot use the techniques just described to avoid
3047 multiple evaluation.
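The effect of double evaluation can be observed directly with an argument that has a side effect. The following self-contained sketch uses integers as stand-ins for Lisp objects; every @code{toy_} and @code{TOY_} name is hypothetical.

```c
#include <assert.h>

int toy_did_mark;
int toy_marked[4];

int
toy_marked_p (int obj)
{
  assert (obj >= 0 && obj < 4);
  return toy_marked[obj];
}

void
toy_mark_object (int obj)
{
  assert (obj >= 0 && obj < 4);
  toy_marked[obj] = 1;
}

/* Poor version: evaluates OBJ twice and misbehaves before `else'. */
#define TOY_MARK_POOR(obj) \
  if (!toy_marked_p (obj)) toy_mark_object (obj), toy_did_mark = 1

/* Better version: OBJ is evaluated once, and the whole expansion is a
   single statement thanks to do { ... } while (0). */
#define TOY_MARK_SAFE(obj) do {            \
  int tm_obj = (obj);                      \
  if (!toy_marked_p (tm_obj))              \
    {                                      \
      toy_mark_object (tm_obj);            \
      toy_did_mark = 1;                    \
    }                                      \
} while (0)

/* Mark the object at *CURSOR with the poor macro and return how far
   the cursor moved; the side effect in the argument runs twice. */
int
toy_advance_poor (int *cursor)
{
  int start = *cursor;
  TOY_MARK_POOR ((*cursor)++);
  return *cursor - start;
}

/* Same with the safe macro: the cursor moves exactly once. */
int
toy_advance_safe (int *cursor)
{
  int start = *cursor;
  TOY_MARK_SAFE ((*cursor)++);
  return *cursor - start;
}
```

With the poor macro the cursor advances by two, and the object that gets marked is not the one that was tested. With the safe macro the cursor advances by one and the tested object is the marked one.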
3049 @cindex inline functions
3050 In most cases where a macro has function semantics, an inline function
3051 is a better implementation technique. Modern compiler optimizers tend
3052 to inline functions even if they have no @code{inline} keyword, and
3053 configure magic ensures that the @code{inline} keyword can be safely
3054 used as an additional compiler hint. Inline functions used in a single
@file{.c} file are easy. The function must already be defined to be
3056 @code{static}. Just add another @code{inline} keyword to the
3061 heavily_used_small_function (int arg)
3067 Inline functions in header files are trickier, because we would like to
3068 make the following optimization if the function is @emph{not} inlined
3069 (for example, because we're compiling for debugging). We would like the
3070 function to be defined externally exactly once, and each calling
3071 translation unit would create an external reference to the function,
3072 instead of including a definition of the inline function in the object
3073 code of every translation unit that uses it. This optimization is
3074 currently only available for gcc. But you don't have to worry about the
3075 trickiness; just define your inline functions in header files using this
3080 i_used_to_be_a_crufty_macro_but_look_at_me_now (int arg);
3082 i_used_to_be_a_crufty_macro_but_look_at_me_now (int arg)
3088 The declaration right before the definition is to prevent warnings when
3089 compiling with @code{gcc -Wmissing-declarations}. I consider issuing
3090 this warning for inline functions a gcc bug, but the gcc maintainers disagree.
3092 @cindex inline functions, headers
3093 @cindex header files, inline functions
3094 Every header which contains inline functions, either directly by using
3095 @code{INLINE_HEADER} or indirectly by using @code{DECLARE_LRECORD} must
3096 be added to @file{inline.c}'s includes to make the optimization
3097 described above work. (Optimization note: if all INLINE_HEADER
3098 functions are in fact inlined in all translation units, then the linker
can just discard @code{inline.o}, since it contains only unreferenced code.)
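The declaration-then-definition shape for header inline functions can be sketched as follows. This is an illustration under assumptions: @code{TOY_INLINE_HEADER} is a hypothetical, simplified stand-in for @code{INLINE_HEADER}, defined here as @code{static inline}, and @code{toy_clamp} is an invented function.

```c
#include <assert.h>

/* Assumption: a simplified stand-in for the INLINE_HEADER macro that
   configure would normally arrange; `static inline' is a portable
   choice when no shared external definition is wanted. */
#ifndef TOY_INLINE_HEADER
# define TOY_INLINE_HEADER static inline
#endif

/* Declaration right before the definition, the pattern used to keep
   gcc -Wmissing-declarations quiet. */
TOY_INLINE_HEADER int toy_clamp (int value, int low, int high);
TOY_INLINE_HEADER int
toy_clamp (int value, int low, int high)
{
  assert (low <= high);
  if (value < low)
    return low;
  if (value > high)
    return high;
  return value;
}
```

Unlike a macro, the function evaluates each argument exactly once and can be used wherever an expression is expected.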
3101 To get started debugging XEmacs, take a look at the @file{.gdbinit} and
3102 @file{.dbxrc} files in the @file{src} directory. See the section in the
3103 XEmacs FAQ on How to Debug an XEmacs problem with a debugger.
3105 After making source code changes, run @code{make check} to ensure that
3106 you haven't introduced any regressions. If you want to make xemacs more
3107 reliable, please improve the test suite in @file{tests/automated}.
3109 Did you make sure you didn't introduce any new compiler warnings?
3111 Before submitting a patch, please try compiling at least once with
3114 configure --with-mule --use-union-type --error-checking=all
3117 Here are things to know when you create a new source file:
3121 All @file{.c} files should @code{#include <config.h>} first. Almost all
3122 @file{.c} files should @code{#include "lisp.h"} second.
3125 Generated header files should be included using the @code{#include <...>} syntax,
3126 not the @code{#include "..."} syntax. The generated headers are:
3128 @file{config.h sheap-adjust.h paths.h Emacs.ad.h}
The basic rule is that you should assume builds using @code{--srcdir},
so the @code{#include <...>} syntax must be used whenever the
to-be-included generated file may be in a different directory
@emph{at compile time}. The non-obvious C rule is that @code{#include "..."}
means to search for the included file first in the same directory as the
including file, @emph{not} in the current directory.
3138 Header files should @emph{not} include @code{<config.h>} and
3139 @code{"lisp.h"}. It is the responsibility of the @file{.c} files that
3144 @cindex Lisp object types, creating
3145 @cindex creating Lisp object types
3146 @cindex object types, creating Lisp
Here is a checklist of things to do when creating a new Lisp object type
3156 add definitions of @code{syms_of_@var{foo}}, etc. to @file{@var{foo}.c}
3158 add declarations of @code{syms_of_@var{foo}}, etc. to @file{symsinit.h}
3160 add calls to @code{syms_of_@var{foo}}, etc. to @file{emacs.c}
3162 add definitions of macros like @code{CHECK_@var{FOO}} and
3163 @code{@var{FOO}P} to @file{@var{foo}.h}
3165 add the new type index to @code{enum lrecord_type}
add a @code{DEFINE_LRECORD_IMPLEMENTATION} call to @file{@var{foo}.c}
add an @code{INIT_LRECORD_IMPLEMENTATION} call to @code{syms_of_@var{foo}}
3173 @node Regression Testing XEmacs, A Summary of the Various XEmacs Modules, Rules When Writing New C Code, Top
3174 @chapter Regression Testing XEmacs
3175 @cindex testing, regression
3177 The source directory @file{tests/automated} contains XEmacs' automated
3178 test suite. The usual way of running all the tests is running
3179 @code{make check} from the top-level source directory.
3181 The test suite is unfinished and it's still lacking some essential
3182 features. It is nevertheless recommended that you run the tests to
3183 confirm that XEmacs behaves correctly.
3185 If you want to run a specific test case, you can do it from the
3186 command-line like this:
3189 $ xemacs -batch -l test-harness.elc -f batch-test-emacs TEST-FILE
3192 If something goes wrong, you can run the test suite interactively by
3193 loading @file{test-harness.el} into a running XEmacs and typing
3194 @kbd{M-x test-emacs-test-file RET <filename> RET}. You will see a log of
3195 passed and failed tests, which should allow you to investigate the
3196 source of the error and ultimately fix the bug.
Adding a new test file is trivial: just create a new file in @file{tests/automated} and it
3199 will be run. There is no need to byte-compile any of the files in
3200 this directory---the test-harness will take care of any necessary
Look at the existing test cases for examples of how to write tests.
3204 It all boils down to your imagination and judicious use of the macros
3205 @code{Assert}, @code{Check-Error}, @code{Check-Error-Message}, and
3206 @code{Check-Message}.
3208 Here's a simple example checking case-sensitive and case-insensitive
3209 comparisons from @file{case-tests.el}.
(with-temp-buffer
  (insert "Test Buffer")
  (let ((case-fold-search t))
    (goto-char (point-min))
    (Assert (eq (search-forward "test buffer" nil t) 12))
    (goto-char (point-min))
    (Assert (eq (search-forward "Test buffer" nil t) 12))
    (goto-char (point-min))
    (Assert (eq (search-forward "Test Buffer" nil t) 12))

    (setq case-fold-search nil)
    (goto-char (point-min))
    (Assert (not (search-forward "test buffer" nil t)))
    (goto-char (point-min))
    (Assert (not (search-forward "Test buffer" nil t)))
    (goto-char (point-min))
    (Assert (eq (search-forward "Test Buffer" nil t) 12))))
3231 This example could be inserted in a file in @file{tests/automated}, and
3232 it would be a complete test, automatically executed when you run
3233 @kbd{make check} after building XEmacs. More complex tests may require
3234 substantial temporary scaffolding to create the environment that elicits
3235 the bugs, but the top-level Makefile and @file{test-harness.el} handle
3236 the running and collection of results from the @code{Assert},
3237 @code{Check-Error}, @code{Check-Error-Message}, and @code{Check-Message}
3240 In general, you should avoid using functionality from packages in your
3241 tests, because you can't be sure that everyone will have the required
package. However, if you've got a test that works, by all means add it.
Simply wrap the test in an appropriate conditional, add a notice that the
test was skipped, and update the @code{skipped-test-reasons} hash table.
3245 Here's an example from @file{syntax-tests.el}:
;; Test forward-comment at buffer boundaries
(with-temp-buffer

  ;; try to use exactly what you need: featurep, boundp, fboundp
  (if (not (fboundp 'c-mode))

      ;; We should provide a standard function for this boilerplate,
      ;; probably called `Skip-Test' -- check for that API with C-h f
      (let* ((reason "c-mode unavailable")
             (count (gethash reason skipped-test-reasons)))
        (puthash reason (if (null count) 1 (1+ count))
                 skipped-test-reasons)
        (Print-Skip "comment and parse-partial-sexp tests" reason))

    ;; and here's the test code
    (insert "// comment\n")
    (forward-comment -2)
    (Assert (eq (point) (point-min)))
    (let ((point (point)))
      (insert "/* comment */")
      (Assert (eq (point) (point-max)))
      (parse-partial-sexp point (point-max)))))
3275 @code{Skip-Test} is intended for use with features that are normally
3276 present in typical configurations. For truly optional features, or
tests that apply to one of several alternative implementations (e.g., to
3278 GTK widgets, but not Athena, Motif, MS Windows, or Carbon), simply
3279 silently omit the test.
3282 @node A Summary of the Various XEmacs Modules, Allocation of Objects in XEmacs Lisp, Regression Testing XEmacs, Top
3283 @chapter A Summary of the Various XEmacs Modules
3284 @cindex modules, a summary of the various XEmacs
3286 This is accurate as of XEmacs 20.0.
3289 * Low-Level Modules::
3290 * Basic Lisp Modules::
3291 * Modules for Standard Editing Operations::
3292 * Editor-Level Control Flow Modules::
3293 * Modules for the Basic Displayable Lisp Objects::
3294 * Modules for other Display-Related Lisp Objects::
3295 * Modules for the Redisplay Mechanism::
3296 * Modules for Interfacing with the File System::
3297 * Modules for Other Aspects of the Lisp Interpreter and Object System::
3298 * Modules for Interfacing with the Operating System::
3299 * Modules for Interfacing with X Windows::
3300 * Modules for Internationalization::
3301 * Modules for Regression Testing::
3304 @node Low-Level Modules
3305 @section Low-Level Modules
3306 @cindex low-level modules
3307 @cindex modules, low-level
3313 This is automatically generated from @file{config.h.in} based on the
3314 results of configure tests and user-selected optional features and
3315 contains preprocessor definitions specifying the nature of the
3316 environment in which XEmacs is being compiled.
3324 This is automatically generated from @file{paths.h.in} based on supplied
3325 configure values, and allows for non-standard installed configurations
3326 of the XEmacs directories. It's currently broken, though.
3335 @file{emacs.c} contains @code{main()} and other code that performs the most
3336 basic environment initializations and handles shutting down the XEmacs
3337 process (this includes @code{kill-emacs}, the normal way that XEmacs is
3338 exited; @code{dump-emacs}, which is used during the build process to
3339 write out the XEmacs executable; @code{run-emacs-from-temacs}, which can
3340 be used to start XEmacs directly when temacs has finished loading all
3341 the Lisp code; and emergency code to handle crashes [XEmacs tries to
3342 auto-save all files before it crashes]).
3344 Low-level code that directly interacts with the Unix signal mechanism,
3345 however, is in @file{signal.c}. Note that this code does not handle system
3346 dependencies in interfacing to signals; that is handled using the
@file{syssignal.h} header file, described below
(@pxref{Modules for Interfacing with the Operating System}).
These modules contain code for dumping out the XEmacs executable on
various systems. (This process is highly machine-specific and
3373 requires intimate knowledge of the executable format and the memory map
3374 of the process.) Only one of these modules is actually used; this is
3375 chosen by @file{configure}.
3385 These modules are used in conjunction with the dump mechanism. On some
3386 systems, an alternative version of the C startup code (the actual code
3387 that receives control from the operating system when the process is
3388 started, and which calls @code{main()}) is required so that the dumping
3389 process works properly; @file{crt0.c} provides this.
3391 @file{pre-crt0.c} and @file{lastfile.c} should be the very first and
3392 very last file linked, respectively. (Actually, this is not really true.
3393 @file{lastfile.c} should be after all Emacs modules whose initialized
3394 data should be made constant, and before all other Emacs files and all
3395 libraries. In particular, the allocation modules @file{gmalloc.c},
3396 @file{alloca.c}, etc. are normally placed past @file{lastfile.c}, and
3397 all of the files that implement Xt widget classes @emph{must} be placed
3398 after @file{lastfile.c} because they contain various structures that
3399 must be statically initialized and into which Xt writes at various
3400 times.) @file{pre-crt0.c} and @file{lastfile.c} contain exported symbols
3401 that are used to determine the start and end of XEmacs' initialized
3402 data space when dumping.
3417 These handle basic C allocation of memory. @file{alloca.c} is an emulation of
3418 the stack allocation function @code{alloca()} on machines that lack
3419 this. (XEmacs makes extensive use of @code{alloca()} in its code.)
3421 @file{gmalloc.c} and @file{malloc.c} are two implementations of the standard C
3422 functions @code{malloc()}, @code{realloc()} and @code{free()}. They are
3423 often used in place of the standard system-provided @code{malloc()}
3424 because they usually provide a much faster implementation, at the
3425 expense of additional memory use. @file{gmalloc.c} is a newer implementation
3426 that is much more memory-efficient for large allocations than @file{malloc.c},
3427 and should always be preferred if it works. (At one point, @file{gmalloc.c}
3428 didn't work on some systems where @file{malloc.c} worked; but this should be
3431 @cindex relocating allocator
3432 @file{ralloc.c} is the @dfn{relocating allocator}. It provides
3433 functions similar to @code{malloc()}, @code{realloc()} and @code{free()}
3434 that allocate memory that can be dynamically relocated in memory. The
3435 advantage of this is that allocated memory can be shuffled around to
3436 place all the free memory at the end of the heap, and the heap can then
3437 be shrunk, releasing the memory back to the operating system. The use
3438 of this can be controlled with the configure option @code{--rel-alloc};
3439 if enabled, memory allocated for buffers will be relocatable, so that if
3440 a very large file is visited and the buffer is later killed, the memory
3441 can be released to the operating system. (The disadvantage of this
3442 mechanism is that it can be very slow. On systems with the
3443 @code{mmap()} system call, the XEmacs version of @file{ralloc.c} uses
3444 this to move memory around without actually having to block-copy it,
3445 which can speed things up; but it can still cause noticeable performance
3448 @file{free-hook.c} contains some debugging functions for checking for invalid
3449 arguments to @code{free()}.
3451 @file{vm-limit.c} contains some functions that warn the user when memory is
3452 getting low. These are callback functions that are called by @file{gmalloc.c}
3453 and @file{malloc.c} at appropriate times.
3455 @file{getpagesize.h} provides a uniform interface for retrieving the size of a
3456 page in virtual memory. @file{mem-limits.h} provides a uniform interface for
3457 retrieving the total amount of available virtual memory. Both are
similar in spirit to the @file{sys*.h} files described below
(@pxref{Modules for Interfacing with the Operating System}).
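As a rough illustration of the kind of system call such a header wraps,
the sketch below queries the page size through POSIX @code{sysconf()}.
The function name @code{example_page_size} is hypothetical; it is not the
interface that @file{getpagesize.h} actually defines, and systems without
@code{sysconf()} would need a different fallback.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical wrapper: return the virtual-memory page size in bytes,
   using POSIX sysconf().  Not the real getpagesize.h interface. */
long
example_page_size (void)
{
  long size = sysconf (_SC_PAGESIZE);
  assert (size > 0);            /* sysconf() returns -1 on error */
  return size;
}
```

Callers would use the result for tasks such as rounding an allocation up
to a whole number of pages.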
3468 These implement a couple of basic C data types to facilitate memory
3469 allocation. The @code{Blocktype} type efficiently manages the
3470 allocation of fixed-size blocks by minimizing the number of times that
3471 @code{malloc()} and @code{free()} are called. It allocates memory in
3472 large chunks, subdivides the chunks into blocks of the proper size, and
3473 returns the blocks as requested. When blocks are freed, they are placed
3474 onto a linked list, so they can be efficiently reused. This data type
is not much used in XEmacs currently, because it's a fairly new addition.
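The allocation strategy described for @code{Blocktype} can be sketched as
follows. This is a minimal illustration, not the actual @code{Blocktype}
code: the names @code{block_pool}, @code{pool_init}, @code{pool_alloc},
@code{pool_free} and the chunk size of 64 are all assumptions made for
the example, and chunks are never returned to the system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BLOCKS_PER_CHUNK 64     /* hypothetical chunk size */

/* A freed block is reused to hold the free-list link. */
union free_link { union free_link *next; };

struct block_pool
{
  size_t block_size;            /* rounded-up size of each block */
  union free_link *free_list;   /* blocks ready for (re)use */
  size_t chunks_allocated;      /* number of malloc() calls so far */
};

void
pool_init (struct block_pool *pool, size_t block_size)
{
  /* Round up so every block can hold a link and is pointer-aligned
     (this sketch does not handle stricter alignment needs). */
  size_t align = sizeof (union free_link);
  if (block_size < align)
    block_size = align;
  pool->block_size = (block_size + align - 1) / align * align;
  pool->free_list = NULL;
  pool->chunks_allocated = 0;
}

void *
pool_alloc (struct block_pool *pool)
{
  union free_link *block;

  if (pool->free_list == NULL)
    {
      /* One malloc() call yields BLOCKS_PER_CHUNK blocks. */
      char *chunk = malloc (pool->block_size * BLOCKS_PER_CHUNK);
      size_t i;

      if (chunk == NULL)
        return NULL;
      pool->chunks_allocated++;
      for (i = 0; i < BLOCKS_PER_CHUNK; i++)
        {
          block = (union free_link *) (chunk + i * pool->block_size);
          block->next = pool->free_list;
          pool->free_list = block;
        }
    }
  block = pool->free_list;
  pool->free_list = block->next;
  return block;
}

void
pool_free (struct block_pool *pool, void *ptr)
{
  union free_link *block = ptr;

  assert (ptr != NULL);
  /* Push the block onto the free list; no call to free() is made. */
  block->next = pool->free_list;
  pool->free_list = block;
}
```

Because a freed block goes to the head of the list, the next request is
satisfied from it without touching @code{malloc()} at all.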
3478 @cindex dynamic array
3479 The @code{Dynarr} type implements a @dfn{dynamic array}, which is
3480 similar to a standard C array but has no fixed limit on the number of
3481 elements it can contain. Dynamic arrays can hold elements of any type,
3482 and when you add a new element, the array automatically resizes itself
if it isn't big enough. Dynarrs are extensively used in the redisplay mechanism.
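The automatic resizing can be sketched as below. This is an illustration
only: the real @code{Dynarr} handles elements of any type, whereas this
sketch is fixed to @code{int}, and the names @code{int_dynarr},
@code{int_dynarr_add} and @code{int_dynarr_free} as well as the doubling
growth policy are assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>

struct int_dynarr
{
  int *items;                   /* storage, or NULL while empty */
  size_t count;                 /* number of elements in use */
  size_t capacity;              /* number of elements allocated */
};

/* Append VALUE, growing the storage when it is full.
   Returns 0 on success, -1 if memory could not be obtained. */
int
int_dynarr_add (struct int_dynarr *arr, int value)
{
  assert (arr != NULL);
  if (arr->count == arr->capacity)
    {
      size_t new_capacity = arr->capacity ? arr->capacity * 2 : 8;
      int *new_items = realloc (arr->items, new_capacity * sizeof (int));

      if (new_items == NULL)
        return -1;              /* the old storage is still valid */
      arr->items = new_items;
      arr->capacity = new_capacity;
    }
  arr->items[arr->count++] = value;
  return 0;
}

void
int_dynarr_free (struct int_dynarr *arr)
{
  assert (arr != NULL);
  free (arr->items);
  arr->items = NULL;
  arr->count = arr->capacity = 0;
}
```

A zero-initialized structure is a valid empty array, so callers need no
separate constructor in this sketch.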
3492 This module is used in connection with inline functions (available in
3493 some compilers). Often, inline functions need to have a corresponding
3494 non-inline function that does the same thing. This module is where they
3495 reside. It contains no actual code, but defines some special flags that
3496 cause inline functions defined in header files to be rendered as actual
3497 functions. It then includes all header files that contain any inline
3498 function definitions, so that each one gets a real function equivalent.
3507 These functions provide a system for doing internal consistency checks
3508 during code development. This system is not currently used; instead the
3509 simpler @code{assert()} macro is used along with the various checks
3510 provided by the @samp{--error-check-*} configuration options.
3518 This is not currently used.
3522 @node Basic Lisp Modules
3523 @section Basic Lisp Modules
3524 @cindex Lisp modules, basic
3525 @cindex modules, basic Lisp
3535 These are the basic header files for all XEmacs modules. Each module
3536 includes @file{lisp.h}, which brings the other header files in.
3537 @file{lisp.h} contains the definitions of the structures and extractor
3538 and constructor macros for the basic Lisp objects and various other
3539 basic definitions for the Lisp environment, as well as some
3540 general-purpose definitions (e.g. @code{min()} and @code{max()}).
3541 @file{lisp.h} includes either @file{lisp-disunion.h} or
3542 @file{lisp-union.h}, depending on whether @code{USE_UNION_TYPE} is
3543 defined. These files define the typedef of the Lisp object itself (as
3544 described above) and the low-level macros that hide the actual
3545 implementation of the Lisp object. All extractor and constructor macros
3546 for particular types of Lisp objects are defined in terms of these
3549 As a general rule, all typedefs should go into the typedefs section of
3550 @file{lisp.h} rather than into a module-specific header file even if the
3551 structure is defined elsewhere. This allows function prototypes that
3552 use the typedef to be placed into other header files. Forward structure
3553 declarations (i.e. a simple declaration like @code{struct foo;} where
3554 the structure itself is defined elsewhere) should be placed into the
3555 typedefs section as necessary.
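The following sketch shows why this arrangement works in C: a forward
declaration plus typedef is enough for any header to prototype functions
taking a pointer to the structure, even though the structure body lives
in a single module. The names @code{widget}, @code{Widget_Info} and
@code{widget_width} are hypothetical and do not appear in XEmacs.

```c
#include <assert.h>
#include <stddef.h>

/* Would live in the shared typedefs section: no structure body here. */
struct widget;
typedef struct widget Widget_Info;

/* Any other header can now prototype functions that use the typedef. */
int widget_width (const Widget_Info *w);

/* Only the module that owns the type defines the structure itself. */
struct widget
{
  int width;
  int height;
};

int
widget_width (const Widget_Info *w)
{
  assert (w != NULL);
  return w->width;
}
```

Code that only passes pointers around never needs to see the structure
body, which keeps header dependencies small.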
3557 @file{lrecord.h} contains the basic structures and macros that implement
3558 all record-type Lisp objects---i.e. all objects whose type is a field
3559 in their C structure, which includes all objects except the few most
3562 @file{lisp.h} contains prototypes for most of the exported functions in
3563 the various modules. Lisp primitives defined using @code{DEFUN} that
3564 need to be called by C code should be declared using @code{EXFUN}.
Other function prototypes should be placed either into the appropriate
section of @file{lisp.h}, or into a module-specific header file,
depending on how general-purpose the function is and whether it has
special-purpose argument types requiring definitions not in
@file{lisp.h}. All initialization functions are prototyped in
@file{symsinit.h}.
3578 The large module @file{alloc.c} implements all of the basic allocation and
3579 garbage collection for Lisp objects. The most commonly used Lisp
3580 objects are allocated in chunks, similar to the Blocktype data type
3581 described above; others are allocated in individually @code{malloc()}ed
3582 blocks. This module provides the foundation on which all other aspects
3583 of the Lisp environment sit, and is the first module initialized at
3586 Note that @file{alloc.c} provides a series of generic functions that are
3587 not dependent on any particular object type, and interfaces to
3588 particular types of objects using a standardized interface of
3589 type-specific methods. This scheme is a fundamental principle of
3590 object-oriented programming and is heavily used throughout XEmacs. The
3591 great advantage of this is that it allows for a clean separation of
3592 functionality into different modules---new classes of Lisp objects, new
3593 event interfaces, new device types, new stream interfaces, etc. can be
3594 added transparently without affecting code anywhere else in XEmacs.
3595 Because the different subsystems are divided into general and specific
3596 code, adding a new subtype within a subsystem will in general not
3597 require changes to the generic subsystem code or affect any of the other
3598 subtypes in the subsystem; this provides a great deal of robustness to
3607 This module contains all of the functions to handle the flow of control.
3608 This includes the mechanisms of defining functions, calling functions,
3609 traversing stack frames, and binding variables; the control primitives
3610 and other special forms such as @code{while}, @code{if}, @code{eval},
3611 @code{let}, @code{and}, @code{or}, @code{progn}, etc.; handling of
3612 non-local exits, unwind-protects, and exception handlers; entering the
3613 debugger; methods for the subr Lisp object type; etc. It does
3614 @emph{not} include the @code{read} function, the @code{print} function,
3615 or the handling of symbols and obarrays.
3617 @file{backtrace.h} contains some structures related to stack frames and the
3626 This module implements the Lisp reader and the @code{read} function,
3627 which converts text into Lisp objects, according to the read syntax of
3628 the objects, as described above. This is similar to the parser that is
3629 a part of all compilers.
3637 This module implements the Lisp print mechanism and the @code{print}
3638 function and related functions. This is the inverse of the Lisp reader
3639 -- it converts Lisp objects to a printed, textual representation.
3640 (Hopefully something that can be read back in using @code{read} to get
3641 an equivalent object.)
3651 @file{symbols.c} implements the handling of symbols, obarrays, and
3652 retrieving the values of symbols. Much of the code is devoted to
3653 handling the special @dfn{symbol-value-magic} objects that define
3654 special types of variables---this includes buffer-local variables,
3655 variable aliases, variables that forward into C variables, etc. This
3656 module is initialized extremely early (right after @file{alloc.c}),
3657 because it is here that the basic symbols @code{t} and @code{nil} are
3658 created, and those symbols are used everywhere throughout XEmacs.
3660 @file{symeval.h} contains the definitions of symbol structures and the
3661 @code{DEFVAR_LISP()} and related macros for declaring variables.
3671 These modules implement the methods and standard Lisp primitives for all
3672 the basic Lisp object types other than symbols (which are described
3673 above). @file{data.c} contains all the predicates (primitives that return
3674 whether an object is of a particular type); the integer arithmetic
3675 functions; and the basic accessor and mutator primitives for the various
3676 object types. @file{fns.c} contains all the standard predicates for working
3677 with sequences (where, abstractly speaking, a sequence is an ordered set
3678 of objects, and can be represented by a list, string, vector, or
bit-vector); it also contains @code{equal}, perhaps on the grounds that
the bulk of the operation of @code{equal} is comparing sequences.
3681 @file{floatfns.c} contains methods and primitives for floats and floating-point
3691 @file{bytecode.c} implements the byte-code interpreter and
3692 compiled-function objects, and @file{bytecode.h} contains associated
3693 structures. Note that the byte-code @emph{compiler} is written in Lisp.
3698 @node Modules for Standard Editing Operations
3699 @section Modules for Standard Editing Operations
3700 @cindex modules for standard editing operations
3701 @cindex editing operations, modules for standard
3709 @file{buffer.c} implements the @dfn{buffer} Lisp object type. This
3710 includes functions that create and destroy buffers; retrieve buffers by
3711 name or by other properties; manipulate lists of buffers (remember that
3712 buffers are permanent objects and stored in various ordered lists);
3713 retrieve or change buffer properties; etc. It also contains the
3714 definitions of all the built-in buffer-local variables (which can be
3715 viewed as buffer properties). It does @emph{not} contain code to
3716 manipulate buffer-local variables (that's in @file{symbols.c}, described
3717 above); or code to manipulate the text in a buffer.
3719 @file{buffer.h} defines the structures associated with a buffer and the various
3720 macros for retrieving text from a buffer and special buffer positions
3721 (e.g. @code{point}, the default location for text insertion). It also
3722 contains macros for working with buffer positions and converting between
3723 their representations as character offsets and as byte offsets (under
3724 MULE, they are different, because characters can be multi-byte). It is
3725 one of the largest header files.
3727 @file{bufslots.h} defines the fields in the buffer structure that correspond to
3728 the built-in buffer-local variables. It is its own header file because
3729 it is included many times in @file{buffer.c}, as a way of iterating over all
3730 the built-in buffer-local variables.
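The general technique of expanding one list of slots several times is
sketched below. To keep the example in a single file, the list is a
macro, @code{EXAMPLE_SLOTS}, rather than a header that is included
repeatedly; that macro, the slot names, and every function here are
assumptions made for illustration and are not taken from
@file{bufslots.h}.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the slot header: one entry per slot. */
#define EXAMPLE_SLOTS(SLOT) \
  SLOT (tab_width)          \
  SLOT (fill_column)        \
  SLOT (left_margin)

/* Expansion 1: declare one structure field per slot. */
struct example_buffer
{
#define DECLARE_FIELD(name) int name;
  EXAMPLE_SLOTS (DECLARE_FIELD)
#undef DECLARE_FIELD
};

/* Expansion 2: build a table of slot names in the same order. */
static const char *const example_slot_names[] =
{
#define SLOT_NAME(name) #name,
  EXAMPLE_SLOTS (SLOT_NAME)
#undef SLOT_NAME
};

/* Expansion 3: visit every slot without naming any of them here. */
void
example_buffer_reset (struct example_buffer *buf, int value)
{
  assert (buf != NULL);
#define RESET_FIELD(name) buf->name = value;
  EXAMPLE_SLOTS (RESET_FIELD)
#undef RESET_FIELD
}

int
example_slot_count (void)
{
  return (int) (sizeof example_slot_names / sizeof example_slot_names[0]);
}

const char *
example_slot_name (int i)
{
  assert (i >= 0 && i < example_slot_count ());
  return example_slot_names[i];
}
```

Adding a slot then means touching only the list; the structure, the name
table and the reset function all pick it up automatically.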
3739 @file{insdel.c} contains low-level functions for inserting and deleting text in
3740 a buffer, keeping track of changed regions for use by redisplay, and
3741 calling any before-change and after-change functions that may have been
3742 registered for the buffer. It also contains the actual functions that
3743 convert between byte offsets and character offsets.
3745 @file{insdel.h} contains associated headers.
3753 This module implements the @dfn{marker} Lisp object type, which
3754 conceptually is a pointer to a text position in a buffer that moves
3755 around as text is inserted and deleted, so as to remain in the same
3756 relative position. This module doesn't actually move the markers around
3757 -- that's handled in @file{insdel.c}. This module just creates them and
3758 implements the primitives for working with them. As markers are simple
3759 objects, this does not entail much.
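What ``moving around as text is inserted and deleted'' means can be
sketched as follows. This is a simplified model and not the code in
@file{insdel.c}: the structure and function names are hypothetical,
positions are plain offsets, and a marker sitting exactly at the
insertion point is assumed to stay before the new text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical marker: just a position in the text. */
struct example_marker { size_t pos; };

/* LEN characters were inserted at position AT:
   markers after the insertion point shift right. */
void
markers_adjust_for_insert (struct example_marker *markers, size_t n,
                           size_t at, size_t len)
{
  size_t i;
  for (i = 0; i < n; i++)
    if (markers[i].pos > at)
      markers[i].pos += len;
}

/* The range [FROM, TO) was deleted: markers inside the range collapse
   to FROM, markers after it shift left by the deleted length. */
void
markers_adjust_for_delete (struct example_marker *markers, size_t n,
                           size_t from, size_t to)
{
  size_t i;
  assert (from <= to);
  for (i = 0; i < n; i++)
    {
      if (markers[i].pos >= to)
        markers[i].pos -= to - from;
      else if (markers[i].pos > from)
        markers[i].pos = from;
    }
}
```

For example, with markers at 2, 5 and 9, inserting three characters at
position 5 leaves them at 2, 5 and 12.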
3761 Note that the standard arithmetic primitives (e.g. @code{+}) accept
3762 markers in place of integers and automatically substitute the value of
3763 @code{marker-position} for the marker, i.e. an integer describing the
3764 current buffer position of the marker.
3773 This module implements the @dfn{extent} Lisp object type, which is like
3774 a marker that works over a range of text rather than a single position.
3775 Extents are also much more complex and powerful than markers and have a
3776 more efficient (and more algorithmically complex) implementation. The
3777 implementation is described in detail in comments in @file{extents.c}.
3779 The code in @file{extents.c} works closely with @file{insdel.c} so that
3780 extents are properly moved around as text is inserted and deleted.
3781 There is also code in @file{extents.c} that provides information needed
3782 by the redisplay mechanism for efficient operation. (Remember that
3783 extents can have display properties that affect [sometimes drastically,
3784 as in the @code{invisible} property] the display of the text they
3793 @file{editfns.c} contains the standard Lisp primitives for working with
3794 a buffer's text, and calls the low-level functions in @file{insdel.c}.
3795 It also contains primitives for working with @code{point} (the default
3796 buffer insertion location).
3798 @file{editfns.c} also contains functions for retrieving various
3799 characteristics from the external environment: the current time, the
3800 process ID of the running XEmacs process, the name of the user who ran
3801 this XEmacs process, etc. It's not clear why this code is in
3813 These modules implement the basic @dfn{interactive} commands,
3814 i.e. user-callable functions. Commands, as opposed to other functions,
3815 have special ways of getting their parameters interactively (by querying
3816 the user), as opposed to having them passed in a normal function
3817 invocation. Many commands are not really meant to be called from other
3818 Lisp functions, because they modify global state in a way that's often
3819 undesired as part of other Lisp functions.
3821 @file{callint.c} implements the mechanism for querying the user for
3822 parameters and calling interactive commands. The bulk of this module is
3823 code that parses the interactive spec that is supplied with an
3824 interactive command.
3826 @file{cmds.c} implements the basic, most commonly used editing commands:
3827 commands to move around the current buffer and insert and delete
3828 characters. These commands are implemented using the Lisp primitives
3829 defined in @file{editfns.c}.
3831 @file{commands.h} contains associated structure definitions and prototypes.
3841 @file{search.c} implements the Lisp primitives for searching for text in
3842 a buffer, and some of the low-level algorithms for doing this. In
3843 particular, the fast fixed-string Boyer-Moore search algorithm is
3844 implemented in @file{search.c}. The low-level algorithms for doing
3845 regular-expression searching, however, are implemented in @file{regex.c}
3846 and @file{regex.h}. These two modules are largely independent of
3847 XEmacs, and are similar to (and based upon) the regular-expression
3848 routines used in @file{grep} and other GNU utilities.
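To give a feel for why this family of algorithms is fast, here is a
sketch of the Horspool simplification of Boyer-Moore, which keeps only
the bad-character shift table. It is not the code in @file{search.c},
which is more elaborate; the function name @code{horspool_search} is
hypothetical, and the sketch works on plain NUL-terminated byte strings
with no case folding.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Return the offset of the first occurrence of PAT in TEXT, or -1.
   Boyer-Moore-Horspool: after each failed attempt, skip ahead by an
   amount that depends on the text byte aligned with the last pattern
   position. */
long
horspool_search (const char *text, const char *pat)
{
  size_t shift[UCHAR_MAX + 1];
  size_t n, m, i, pos;

  assert (text != NULL && pat != NULL);
  n = strlen (text);
  m = strlen (pat);
  if (m == 0)
    return 0;
  if (m > n)
    return -1;

  /* Bytes absent from the pattern allow a full-length skip. */
  for (i = 0; i <= UCHAR_MAX; i++)
    shift[i] = m;
  /* Otherwise skip to the rightmost occurrence before the last byte. */
  for (i = 0; i + 1 < m; i++)
    shift[(unsigned char) pat[i]] = m - 1 - i;

  pos = 0;
  while (pos + m <= n)
    {
      /* The comparison order does not affect correctness here. */
      if (memcmp (text + pos, pat, m) == 0)
        return (long) pos;
      pos += shift[(unsigned char) text[pos + m - 1]];
    }
  return -1;
}
```

Most failed attempts skip several positions at once, so long patterns
are typically found while inspecting only a fraction of the text.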
@file{doprnt.c} implements formatted-string processing, similar to
the @code{printf()} function in C.
3865 This module implements the undo mechanism for tracking buffer changes.
3866 Most of this could be implemented in Lisp.
3870 @node Editor-Level Control Flow Modules
3871 @section Editor-Level Control Flow Modules
3872 @cindex control flow modules, editor-level
3873 @cindex modules, editor-level control flow
3887 These implement the handling of events (user input and other system
3890 @file{events.c} and @file{events.h} define the @dfn{event} Lisp object
3891 type and primitives for manipulating it.
3893 @file{event-stream.c} implements the basic functions for working with
3894 event queues, dispatching an event by looking it up in relevant keymaps
3895 and such, and handling timeouts; this includes the primitives
3896 @code{next-event} and @code{dispatch-event}, as well as related
3897 primitives such as @code{sit-for}, @code{sleep-for}, and
3898 @code{accept-process-output}. (@file{event-stream.c} is one of the
3899 hairiest and trickiest modules in XEmacs. Beware! You can easily mess
@file{event-Xt.c} and @file{event-tty.c} implement the low-level
interfaces for retrieving events from Xt (the X toolkit) and from TTYs
(using @code{read()} and @code{select()}), respectively. The event
3905 interface enforces a clean separation between the specific code for
3906 interfacing with the operating system and the generic code for working
3907 with events, by defining an API of basic, low-level event methods;
3908 @file{event-Xt.c} and @file{event-tty.c} are two different
3909 implementations of this API. To add support for a new operating system
3910 (e.g. NeXTstep), one merely needs to provide another implementation of
3911 those API functions.
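The idea of one generic event layer with interchangeable low-level
implementations can be sketched with a table of function pointers. The
structure @code{example_event_methods}, its two members, and both toy
backends are assumptions made for illustration; the real event methods
in XEmacs have different names and signatures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical low-level event API: each backend supplies one table. */
struct example_event_methods
{
  const char *name;
  int (*event_pending_p) (void *backend_data);
  int (*next_event) (void *backend_data);
};

/* Backend 1: events come from an array of integer codes. */
struct array_source { const int *events; size_t count, next; };

static int
array_pending (void *data)
{
  struct array_source *s = data;
  return s->next < s->count;
}

static int
array_next (void *data)
{
  struct array_source *s = data;
  return s->events[s->next++];
}

const struct example_event_methods array_event_methods =
  { "array", array_pending, array_next };

/* Backend 2: events are the bytes of a string. */
struct string_source { const char *cursor; };

static int
string_pending (void *data)
{
  struct string_source *s = data;
  return *s->cursor != '\0';
}

static int
string_next (void *data)
{
  struct string_source *s = data;
  return (unsigned char) *s->cursor++;
}

const struct example_event_methods string_event_methods =
  { "string", string_pending, string_next };

/* Generic code: it sees only the method table, never a backend.
   Returns the number of events handled; stores the last one seen. */
size_t
example_drain_events (const struct example_event_methods *m,
                      void *backend_data, int *last_event)
{
  size_t handled = 0;

  assert (m != NULL && last_event != NULL);
  while (m->event_pending_p (backend_data))
    {
      *last_event = m->next_event (backend_data);
      handled++;
    }
  return handled;
}
```

Supporting another event source in this model means writing one more
table; @code{example_drain_events} does not change.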
3913 Note that the choice of whether to use @file{event-Xt.c} or
3914 @file{event-tty.c} is made at compile time! Or at the very latest, it
3915 is made at startup time. @file{event-Xt.c} handles events for
3916 @emph{both} X and TTY frames; @file{event-tty.c} is only used when X
3917 support is not compiled into XEmacs. The reason for this is that there
3918 is only one event loop in XEmacs: thus, it needs to be able to receive
3919 events from all different kinds of frames.
3928 @file{keymap.c} and @file{keymap.h} define the @dfn{keymap} Lisp object
3929 type and associated methods and primitives. (Remember that keymaps are
3930 objects that associate event descriptions with functions to be called to
3931 ``execute'' those events; @code{dispatch-event} looks up events in the
3940 @file{cmdloop.c} contains functions that implement the actual editor
3941 command loop---i.e. the event loop that cyclically retrieves and
3942 dispatches events. This code is also rather tricky, just like
3943 @file{event-stream.c}.
3952 These two modules contain the basic code for defining keyboard macros.
3953 These functions don't actually do much; most of the code that handles keyboard
3954 macros is mixed in with the event-handling code in @file{event-stream.c}.
3962 This contains some miscellaneous code related to the minibuffer (most of
3963 the minibuffer code was moved into Lisp by Richard Mlynarik). This
3964 includes the primitives for completion (although filename completion is
3965 in @file{dired.c}), the lowest-level interface to the minibuffer (if the
3966 command loop were cleaned up, this too could be in Lisp), and code for
3967 dealing with the echo area (this, too, was mostly moved into Lisp, and
3968 the only code remaining is code to call out to Lisp or provide simple
3969 bootstrapping implementations early in temacs, before the echo-area Lisp
3974 @node Modules for the Basic Displayable Lisp Objects
3975 @section Modules for the Basic Displayable Lisp Objects
3976 @cindex modules for the basic displayable Lisp objects
3977 @cindex displayable Lisp objects, modules for the basic
3978 @cindex Lisp objects, modules for the basic displayable
3979 @cindex objects, modules for the basic displayable Lisp
3994 These modules implement the @dfn{console} Lisp object type. A console
3995 contains multiple display devices, but only one keyboard and mouse.
3996 Most of the time, a console will contain exactly one device.
Consoles are the top of a Lisp object inclusion hierarchy. Consoles
3999 contain devices, which contain frames, which contain windows.
4011 These modules implement the @dfn{device} Lisp object type. This
4012 abstracts a particular screen or connection on which frames are
4013 displayed. As with Lisp objects, event interfaces, and other
4014 subsystems, the device code is separated into a generic component that
4015 contains a standardized interface (in the form of a set of methods) onto
4016 particular device types.
4018 The device subsystem defines all the methods and provides method
4019 services for not only device operations but also for the frame, window,
4020 menubar, scrollbar, toolbar, and other displayable-object subsystems.
4021 The reason for this is that all of these subsystems have the same
4022 subtypes (X, TTY, NeXTstep, Microsoft Windows, etc.) as devices do.
4034 Each device contains one or more frames in which objects (e.g. text) are
4035 displayed. A frame corresponds to a window in the window system;
4036 usually this is a top-level window but it could potentially be one of a
4037 number of overlapping child windows within a top-level window, using the
4038 MDI (Multiple Document Interface) protocol in Microsoft Windows or a
4041 The @file{frame-*} files implement the @dfn{frame} Lisp object type and
4042 provide the generic and device-type-specific operations on frames
4043 (e.g. raising, lowering, resizing, moving, etc.).
4052 @cindex window (in Emacs)
4054 Each frame consists of one or more non-overlapping @dfn{windows} (better
4055 known as @dfn{panes} in standard window-system terminology) in which a
4056 buffer's text can be displayed. Windows can also have scrollbars
4057 displayed around their edges.
4059 @file{window.c} and @file{window.h} implement the @dfn{window} Lisp
4060 object type and provide code to manage windows. Since windows have no
4061 associated resources in the window system (the window system knows only
4062 about the frame; no child windows or anything are used for XEmacs
4063 windows), there is no device-type-specific code here; all of that code
4064 is part of the redisplay mechanism or the code for particular object
4065 types such as scrollbars.
4069 @node Modules for other Display-Related Lisp Objects
4070 @section Modules for other Display-Related Lisp Objects
4071 @cindex modules for other display-related Lisp objects
4072 @cindex display-related Lisp objects, modules for other
4073 @cindex Lisp objects, modules for other display-related
4143 This file provides C support for syntax highlighting---i.e.
4144 highlighting different syntactic constructs of a source file in
4145 different colors, for easy reading. The C support is provided so that
4157 These modules decode GIF-format image files, for use with glyphs.
4158 These files were removed due to Unisys patent infringement concerns.
4162 @node Modules for the Redisplay Mechanism
4163 @section Modules for the Redisplay Mechanism
4164 @cindex modules for the redisplay mechanism
4165 @cindex redisplay mechanism, modules for the
4176 These files provide the redisplay mechanism. As with many other
4177 subsystems in XEmacs, there is a clean separation between the general
4178 and device-specific support.
4180 @file{redisplay.c} contains the bulk of the redisplay engine. These
4181 functions update the redisplay structures (which describe how the screen
4182 is to appear) to reflect any changes made to the state of any
4183 displayable objects (buffer, frame, window, etc.) since the last time
4184 that redisplay was called. These functions are highly optimized to
4185 avoid doing more work than necessary (since redisplay is called
4186 extremely often and is potentially a huge time sink), and depend heavily
4187 on notifications from the objects themselves that changes have occurred,
4188 so that redisplay doesn't explicitly have to check each possible object.
4189 The redisplay mechanism also contains a great deal of caching to further
4190 speed things up; some of this caching is contained within the various
4191 displayable objects.
4193 @file{redisplay-output.c} goes through the redisplay structures and converts
4194 them into calls to device-specific methods to actually output the screen
4197 @file{redisplay-x.c} and @file{redisplay-tty.c} are two implementations
4198 of these redisplay output methods, for X frames and TTY frames,
4207 This module contains various functions and Lisp primitives for
4208 converting between buffer positions and screen positions. These
4209 functions call the redisplay mechanism to do most of the work, and then
4210 examine the redisplay structures to get the necessary information. This
4221 These files contain functions for working with the termcap (BSD-style)
4222 and terminfo (System V style) databases of terminal capabilities and
4223 escape sequences, used when XEmacs is displaying in a TTY.
4232 These files provide some miscellaneous TTY-output functions and should
4233 probably be merged into @file{redisplay-tty.c}.
4237 @node Modules for Interfacing with the File System
4238 @section Modules for Interfacing with the File System
4239 @cindex modules for interfacing with the file system
4240 @cindex interfacing with the file system, modules for
4241 @cindex file system, modules for interfacing with the
4248 These modules implement the @dfn{stream} Lisp object type. This is an
4249 internal-only Lisp object that implements a generic buffering stream.
4250 The idea is to provide a uniform interface onto all sources and sinks of
4251 data, including file descriptors, stdio streams, chunks of memory, Lisp
4252 buffers, Lisp strings, etc. That way, I/O functions can be written to
4253 the stream interface and can transparently handle all possible sources
4254 and sinks. (For example, the @code{read} function can read data from a
4255 file, a string, a buffer, or even a function that is called repeatedly
4256 to return data, without worrying about where the data is coming from or
4257 what-size chunks it is returned in.)
4260 Note that in the C code, streams are called @dfn{lstreams} (for ``Lisp
4261 streams'') to distinguish them from other kinds of streams, e.g. stdio
4262 streams and C++ I/O streams.
4264 Similar to other subsystems in XEmacs, lstreams are separated into
4265 generic functions and a set of methods for the different types of
4266 lstreams. @file{lstream.c} provides implementations of many different
4267 types of streams; others are provided, e.g., in @file{file-coding.c}.
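
To make the generic-functions-plus-methods idea concrete, here is a
minimal sketch of a stream with a single reader method and one
memory-backed implementation. All names (@code{stream_sketch},
@code{stream_read}, and so on) are hypothetical and do not reflect the
real lstream interface; the reader deliberately returns small chunks to
show that the generic function hides the chunk size from its caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct stream_sketch;

/* Per-type methods; a real design would also have a writer, a closer, etc. */
struct stream_methods_sketch {
  /* Read up to SIZE bytes into BUF; return the count, or 0 at end of data. */
  size_t (*reader)(struct stream_sketch *s, char *buf, size_t size);
};

struct stream_sketch {
  const struct stream_methods_sketch *methods;
  const char *data;   /* used only by the memory-backed type */
  size_t len;
  size_t pos;
};

/* Memory-backed reader that hands out at most 3 bytes per call. */
static size_t memory_reader(struct stream_sketch *s, char *buf, size_t size)
{
  size_t avail = s->len - s->pos;
  size_t n = size < avail ? size : avail;
  if (n > 3)
    n = 3;
  memcpy(buf, s->data + s->pos, n);
  s->pos += n;
  return n;
}

static const struct stream_methods_sketch memory_methods = { memory_reader };

void stream_init_memory(struct stream_sketch *s, const char *data, size_t len)
{
  s->methods = &memory_methods;
  s->data = data;
  s->len = len;
  s->pos = 0;
}

/* Generic function: keeps calling the reader method until SIZE bytes
   have arrived or the source is exhausted. */
size_t stream_read(struct stream_sketch *s, char *buf, size_t size)
{
  size_t total = 0;
  while (total < size) {
    size_t n = s->methods->reader(s, buf + total, size - total);
    if (n == 0)
      break;
    total += n;
  }
  return total;
}
```

A file-descriptor or string-backed type would supply its own reader and
reuse @code{stream_read} unchanged.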
4275 This implements the basic primitives for interfacing with the file
4276 system. This includes primitives for reading files into buffers,
4277 writing buffers into files, checking for the presence or accessibility
4278 of files, canonicalizing file names, etc. Note that these primitives
4279 are usually not invoked directly by the user: There is a great deal of
4280 higher-level Lisp code that implements the user commands such as
4281 @code{find-file} and @code{save-buffer}. This is similar to the
4282 distinction between the lower-level primitives in @file{editfns.c} and
4283 the higher-level user commands in @file{commands.c} and
4292 This file provides functions for detecting clashes between different
4293 processes (e.g. XEmacs and some external process, or two different
4294 XEmacs processes) modifying the same file. (XEmacs can optionally use
4295 the @file{lock/} subdirectory to provide a form of ``locking'' between
4296 different XEmacs processes.) This module is also used by the low-level
4297 functions in @file{insdel.c} to ensure that, if the first modification
4298 is being made to a buffer whose corresponding file has been externally
4299 modified, the user is made aware of this so that the buffer can be
4300 synched up with the external changes if necessary.
4307 This file provides some miscellaneous functions that construct a
4308 @samp{rwxr-xr-x}-type permissions string (as might appear in an
4309 @file{ls}-style directory listing) given the information returned by the
4310 @code{stat()} system call.
4319 These files implement the XEmacs interface to directory searching. This
4320 includes a number of primitives for determining the files in a directory
4321 and for doing filename completion. (Remember that generic completion is
4322 handled by a different mechanism, in @file{minibuf.c}.)
4324 @file{ndir.h} is a header file used for the directory-searching
4325 emulation functions provided in @file{sysdep.c} (see section J below),
4326 for systems that don't provide any directory-searching functions. (On
4327 those systems, directories can be read directly as files, and parsed.)
4335 This file provides an implementation of the @code{realpath()} function
4336 for expanding symbolic links, on systems that don't implement it or have
4337 a broken implementation.
4341 @node Modules for Other Aspects of the Lisp Interpreter and Object System
4342 @section Modules for Other Aspects of the Lisp Interpreter and Object System
4343 @cindex modules for other aspects of the Lisp interpreter and object system
4344 @cindex Lisp interpreter and object system, modules for other aspects of the
4345 @cindex interpreter and object system, modules for other aspects of the Lisp
4346 @cindex object system, modules for other aspects of the Lisp interpreter and
4355 These files provide two implementations of hash tables. Files
4356 @file{hash.c} and @file{hash.h} provide a generic C implementation of
4357 hash tables which can stand independently of XEmacs. Files
4358 @file{elhash.c} and @file{elhash.h} provide a separate implementation of
4359 hash tables that can store only Lisp objects, and knows about Lispy
4360 things like garbage collection, and implement the @dfn{hash-table} Lisp
4369 This module implements the @dfn{specifier} Lisp object type. This is
4370 primarily used for displayable properties, and allows for values that
4371 are specific to a particular buffer, window, frame, device, or device
class, as well as for a default value. This is used, for example,
4373 to control the height of the horizontal scrollbar or the appearance of
4374 the @code{default}, @code{bold}, or other faces. The specifier object
4375 consists of a number of specifications, each of which maps from a
4376 buffer, window, etc. to a value. The function @code{specifier-instance}
4377 looks up a value given a window (from which a buffer, frame, and device
4387 @file{chartab.c} and @file{chartab.h} implement the @dfn{char table}
4388 Lisp object type, which maps from characters or certain sorts of
4389 character ranges to Lisp objects. The implementation of this object
4390 type is optimized for the internal representation of characters. Char
4391 tables come in different types, which affect the allowed object types to
4392 which a character can be mapped and also dictate certain other
4393 properties of the char table.
4396 @file{casetab.c} implements one sort of char table, the @dfn{case
4397 table}, which maps characters to other characters of possibly different
4398 case. These are used by XEmacs to implement case-changing primitives
4399 and to do case-insensitive searching.
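
The use of a case table for case-insensitive comparison can be sketched
as follows. This is a simplification under stated assumptions: the names
are hypothetical, the table covers only 8-bit characters and only the
ASCII letters, and a real char table handles the full character range
and stores arbitrary Lisp objects.

```c
#include <assert.h>

/* Hypothetical downcase table for 8-bit characters. */
struct case_table_sketch {
  unsigned char down[256];
};

/* Identity mapping everywhere except ASCII 'A'..'Z', which map to 'a'..'z'. */
void case_table_init_ascii(struct case_table_sketch *t)
{
  int c;
  for (c = 0; c < 256; c++)
    t->down[c] = (unsigned char) c;
  for (c = 'A'; c <= 'Z'; c++)
    t->down[c] = (unsigned char) (c - 'A' + 'a');
}

/* Return 1 if A and B are equal once both are mapped through the table. */
int case_insensitive_equal(const struct case_table_sketch *t,
                           const char *a, const char *b)
{
  while (*a && *b) {
    if (t->down[(unsigned char) *a] != t->down[(unsigned char) *b])
      return 0;
    a++;
    b++;
  }
  return *a == *b;   /* equal only if both strings ended together */
}
```

Swapping in a different table changes what counts as ``the same letter''
without touching the comparison loop.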
4409 This module implements @dfn{syntax tables}, another sort of char table
4410 that maps characters into syntax classes that define the syntax of these
4411 characters (e.g. a parenthesis belongs to a class of @samp{open}
4412 characters that have corresponding @samp{close} characters and can be
4413 nested). This module also implements the Lisp @dfn{scanner}, a set of
4414 primitives for scanning over text based on syntax tables. This is used,
4415 for example, to find the matching parenthesis in a command such as
4416 @code{forward-sexp}, and by @file{font-lock.c} to locate quoted strings,
4419 @c #### Break this out into a separate node somewhere!
4420 Syntax codes are implemented as bitfields in an int. Bits 0-6 contain
4421 the syntax code itself, bit 7 is a special prefix flag used for Lisp,
4422 and bits 16-23 contain comment syntax flags. From the Lisp programmer's
4423 point of view, there are 11 flags: 2 styles X 2 characters X @{start,
4424 end@} flags for two-character comment delimiters, 2 style flags for
4425 one-character comment delimiters, and the prefix flag.
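
The bit layout just described can be sketched with a few accessor
functions. The function names below are hypothetical, and the comment
flags are treated as an opaque 8-bit group because the assignment of
individual flags to bits 16-23 is not spelled out here.

```c
#include <assert.h>

/* Bits 0-6: the syntax class itself. */
int syntax_class(int code)
{
  return code & 0x7f;
}

/* Bit 7: the special prefix flag used for Lisp. */
int syntax_prefix_p(int code)
{
  return (code >> 7) & 1;
}

/* Bits 16-23: the comment syntax flags, returned as one 8-bit group. */
int syntax_comment_flags(int code)
{
  return (code >> 16) & 0xff;
}

/* Pack the three fields back into a single int. */
int make_syntax_code(int cls, int prefix, int comment_flags)
{
  return (cls & 0x7f)
    | ((prefix ? 1 : 0) << 7)
    | ((comment_flags & 0xff) << 16);
}
```

Packing everything into one int keeps a syntax-table lookup to a single
fetch followed by cheap masking.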
4427 Internally, however, the characters used in multi-character delimiters
4428 will have non-comment-character syntax classes (@emph{e.g.}, the
4429 @samp{/} in C's @samp{/*} comment-start delimiter has ``punctuation''
(here meaning ``operator-like'') class in C modes). Thus a mixed
comment style, such as C++'s @samp{//} to end of line, is represented by
giving @samp{/} the ``punctuation'' class and the ``style b first
character of start sequence'' and ``style b second character of start
sequence'' flags. The fact that the class is @emph{not} a comment class allows
the syntax scanner to recognize that this is a multi-character
4436 delimiter. The @samp{newline} character is given (single-character)
4437 ``comment-end'' @emph{class} and the ``style b first character of end
4438 sequence'' @emph{flag}. The ``comment-end'' class allows the scanner to
4439 determine that no second character is needed to terminate the comment.
4446 This module implements various Lisp primitives for upcasing, downcasing
4447 and capitalizing strings or regions of buffers.
4455 This module implements the @dfn{range table} Lisp object type, which
4456 provides for a mapping from ranges of integers to arbitrary Lisp
4466 This module implements the @dfn{opaque} Lisp object type, an
4467 internal-only Lisp object that encapsulates an arbitrary block of memory
4468 so that it can be managed by the Lisp allocation system. To create an
4469 opaque object, you call @code{make_opaque()}, passing a pointer to a
4470 block of memory. An object is created that is big enough to hold the
4471 memory, which is copied into the object's storage. The object will then
4472 stick around as long as you keep pointers to it, after which it will be
4473 automatically reclaimed.
4476 Opaque objects can also have an arbitrary @dfn{mark method} associated
4477 with them, in case the block of memory contains other Lisp objects that
4478 need to be marked for garbage-collection purposes. (If you need other
4479 object methods, such as a finalize method, you should just go ahead and
4480 create a new Lisp object type---it's not hard.)
This module provides a few primitives for doing dynamic abbreviation
4489 expansion. In XEmacs, most of the code for this has been moved into
4490 Lisp. Some C code remains for speed and because the primitive
4491 @code{self-insert-command} (which is executed for all self-inserting
4492 characters) hooks into the abbrev mechanism. (@code{self-insert-command}
4493 is itself in C only for speed.)
This module provides primitives for retrieving the documentation
4502 strings of functions and variables. These documentation strings contain
4503 certain special markers that get dynamically expanded (e.g. a
4504 reverse-lookup is performed on some named functions to retrieve their
4505 current key bindings). Some documentation strings (in particular, for
4506 the built-in primitives and pre-loaded Lisp functions) are stored
4507 externally in a file @file{DOC} in the @file{lib-src/} directory and
4508 need to be fetched from that file. (Part of the build stage involves
4509 building this file, and another part involves constructing an index for
4510 this file and embedding it into the executable, so that the functions in
4511 @file{doc.c} do not have to search the entire @file{DOC} file to find
4512 the appropriate documentation string.)
This module provides a Lisp primitive that implements the MD5 secure
4521 hashing scheme, used to create a large hash value of a string of data such that
4522 the data cannot be derived from the hash value. This is used for
4523 various security applications on the Internet.
4528 @node Modules for Interfacing with the Operating System
4529 @section Modules for Interfacing with the Operating System
4530 @cindex modules for interfacing with the operating system
4531 @cindex interfacing with the operating system, modules for
4532 @cindex operating system, modules for interfacing with the
4540 These modules allow XEmacs to spawn and communicate with subprocesses
4541 and network connections.
4543 @cindex synchronous subprocesses
4544 @cindex subprocesses, synchronous
4545 @file{callproc.c} implements (through the @code{call-process}
4546 primitive) what are called @dfn{synchronous subprocesses}. This means
4547 that XEmacs runs a program, waits till it's done, and retrieves its
4548 output. A typical example might be calling the @file{ls} program to get
4549 a directory listing.
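
The run, wait, and collect pattern of a synchronous subprocess can be
sketched in plain C. Note that this sketch uses the POSIX
@code{popen()} convenience function rather than whatever
@file{callproc.c} does internally, and the name
@code{run_synchronously} is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Run COMMAND through the shell, wait for it to finish, and store up to
   SIZE-1 bytes of its output in OUT (always NUL-terminated).
   Returns the status reported by pclose(), or -1 if the command could
   not be started. */
int run_synchronously(const char *command, char *out, size_t size)
{
  FILE *pipe;
  size_t used = 0;
  size_t n;
  char discard[256];

  assert(size > 0);
  pipe = popen(command, "r");
  if (pipe == NULL)
    return -1;

  /* Collect output until the buffer is full or the program is done. */
  while (used < size - 1
         && (n = fread(out + used, 1, size - 1 - used, pipe)) > 0)
    used += n;
  out[used] = '\0';

  /* Drain anything that did not fit, so the child can exit. */
  while (fread(discard, 1, sizeof discard, pipe) > 0)
    ;

  /* pclose() blocks until the child has terminated. */
  return pclose(pipe);
}
```

The caller does nothing else while the program runs, which is exactly
what distinguishes this from the asynchronous case.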
4551 @cindex asynchronous subprocesses
4552 @cindex subprocesses, asynchronous
4553 @file{process.c} and @file{process.h} implement @dfn{asynchronous
4554 subprocesses}. This means that XEmacs starts a program and then
4555 continues normally, not waiting for the process to finish. Data can be
4556 sent to the process or retrieved from it as it's running. This is used
4557 for the @code{shell} command (which provides a front end onto a shell
4558 program such as @file{csh}), the mail and news readers implemented in
4559 XEmacs, etc. The result of calling @code{start-process} to start a
4560 subprocess is a process object, a particular kind of object used to
4561 communicate with the subprocess. You can send data to the process by
4562 passing the process object and the data to @code{send-process}, and you
4563 can specify what happens to data retrieved from the process by setting
4564 properties of the process object. (When the process sends data, XEmacs
4565 receives a process event, which says that there is data ready. When
4566 @code{dispatch-event} is called on this event, it reads the data from
4567 the process and does something with it, as specified by the process
4568 object's properties. Typically, this means inserting the data into a
4569 buffer or calling a function.) Another property of the process object is
4570 called the @dfn{sentinel}, which is a function that is called when the
4573 @cindex network connections
4574 Process objects are also used for network connections (connections to a
4575 process running on another machine). Network connections are started
4576 with @code{open-network-stream} but otherwise work just like
4586 These modules implement most of the low-level, messy operating-system
4587 interface code. This includes various device control (ioctl) operations
4588 for file descriptors, TTY's, pseudo-terminals, etc. (usually this stuff
4589 is fairly system-dependent; thus the name of this module), and emulation
4590 of standard library functions and system calls on systems that don't
4591 provide them or have broken versions.
4607 These header files provide consistent interfaces onto system-dependent
4608 header files and system calls. The idea is that, instead of including a
4609 standard header file like @file{<sys/param.h>} (which may or may not
exist on various systems) or having to worry about whether all systems
4611 provide a particular preprocessor constant, or having to deal with the
4612 four different paradigms for manipulating signals, you just include the
4613 appropriate @file{sys*.h} header file, which includes all the right
system header files, defines any missing preprocessor constants,
4615 provides a uniform interface onto system calls, etc.
4617 @file{sysdir.h} provides a uniform interface onto directory-querying
4618 functions. (In some cases, this is in conjunction with emulation
4619 functions in @file{sysdep.c}.)
4621 @file{sysfile.h} includes all the necessary header files for standard
4622 system calls (e.g. @code{read()}), ensures that all necessary
4623 @code{open()} and @code{stat()} preprocessor constants are defined, and
4624 possibly (usually) substitutes sugared versions of @code{read()},
4625 @code{write()}, etc. that automatically restart interrupted I/O
4628 @file{sysfloat.h} includes the necessary header files for floating-point
4631 @file{sysproc.h} includes the necessary header files for calling
4632 @code{select()}, @code{fork()}, @code{execve()}, socket operations, and
4633 the like, and ensures that the @code{FD_*()} macros for descriptor-set
4634 manipulations are available.
4636 @file{syspwd.h} includes the necessary header files for obtaining
4637 information from @file{/etc/passwd} (the functions are emulated under
4640 @file{syssignal.h} includes the necessary header files for
4641 signal-handling and provides a uniform interface onto the different
4642 signal-handling and signal-blocking paradigms.
4644 @file{systime.h} includes the necessary header files and provides
4645 uniform interfaces for retrieving the time of day, setting file
4646 access/modification times, getting the amount of time used by the XEmacs
4649 @file{systty.h} buffers against the infinitude of different ways of
4652 @file{syswait.h} provides a uniform way of retrieving the exit status
4653 from a @code{wait()}ed-on process (some systems use a union, others use
4670 These files implement the ability to play various sounds on some types
4671 of computers. You have to configure your XEmacs with sound support in
4672 order to get this capability.
4674 @file{sound.c} provides the generic interface. It implements various
4675 Lisp primitives and variables that let you specify which sounds should
4676 be played in certain conditions. (The conditions are identified by
4677 symbols, which are passed to @code{ding} to make a sound. Various
4678 standard functions call this function at certain times; if sound support
does not exist, a simple beep results.)
4681 @cindex native sound
4682 @cindex sound, native
4683 @file{sgiplay.c}, @file{sunplay.c}, @file{hpplay.c}, and
4684 @file{linuxplay.c} interface to the machine's speaker for various
different kinds of machines. This is called @dfn{native} sound.
4687 @cindex sound, network
4688 @cindex network sound
4690 @file{nas.c} interfaces to a computer somewhere else on the network
4691 using the NAS (Network Audio Server) protocol, playing sounds on that
4692 machine. This allows you to run XEmacs on a remote machine, with its
4693 display set to your local machine, and have the sounds be made on your
4694 local machine, provided that you have a NAS server running on your local
4697 @file{libsst.c}, @file{libsst.h}, and @file{libst.h} provide some
4698 additional functions for playing sound on a Sun SPARC but are not
4708 These two modules implement an interface to the ToolTalk protocol, which
4709 is an interprocess communication protocol implemented on some versions
4710 of Unix. ToolTalk is a high-level protocol that allows processes to
4711 register themselves as providers of particular services; other processes
4712 can then request a service without knowing or caring exactly who is
4713 providing the service. It is similar in spirit to the DDE protocol
4714 provided under Microsoft Windows. ToolTalk is a part of the new CDE
4715 (Common Desktop Environment) specification and is used to connect the
4716 parts of the SPARCWorks development environment.
4724 This module provides the ability to retrieve the system's current load
4725 average. (The way to do this is highly system-specific, unfortunately,
4726 and requires a lot of special-case code.)
4734 This module provides a small amount of code used internally at Sun to
4735 keep statistics on the usage of XEmacs.
4746 These files provide replacement functions and prototypes to fix numerous
4747 bugs in early releases of SunOS 4.1.
4755 This module provides some terminal-control code necessary on versions of
4760 @node Modules for Interfacing with X Windows
4761 @section Modules for Interfacing with X Windows
4762 @cindex modules for interfacing with X Windows
4763 @cindex interfacing with X Windows, modules for
4764 @cindex X Windows, modules for interfacing with
4770 A file generated from @file{Emacs.ad}, which contains XEmacs-supplied
4771 fallback resources (so that XEmacs has pretty defaults).
4781 These modules implement an Xt widget class that encapsulates a frame.
4782 This is for ease in integrating with Xt. The EmacsFrame widget covers
4783 the entire X window except for the menubar; the scrollbars are
4784 positioned on top of the EmacsFrame widget.
4786 @strong{Warning:} Abandon hope, all ye who enter here. This code took
4787 an ungodly amount of time to get right, and is likely to fall apart
4788 mercilessly at the slightest change. Such is life under Xt.
4798 These modules implement a simple Xt manager (i.e. composite) widget
4799 class that simply lets its children set whatever geometry they want.
4800 It's amazing that Xt doesn't provide this standardly, but on second
4801 thought, it makes sense, considering how amazingly broken Xt is.
4811 These modules implement two Xt widget classes that are subclasses of
4812 the TopLevelShell and TransientShell classes. This is necessary to deal
4813 with more brokenness that Xt has sadistically thrust onto the backs of
4823 These modules provide functions for maintenance and caching of GC's
4824 (graphics contexts) under the X Window System. This code is junky and
4825 needs to be rewritten.
4837 This module provides an interface to the X Window System's concept of
4838 @dfn{selections}, the standard way for X applications to communicate
4850 These header files are similar in spirit to the @file{sys*.h} files and buffer
4851 against different implementations of Xt and Motif.
4855 @file{xintrinsic.h} should be included in place of @file{<Intrinsic.h>}.
4857 @file{xintrinsicp.h} should be included in place of @file{<IntrinsicP.h>}.
4859 @file{xmmanagerp.h} should be included in place of @file{<XmManagerP.h>}.
4861 @file{xmprimitivep.h} should be included in place of @file{<XmPrimitiveP.h>}.
4871 These files provide an emulation of the Xmu library for those systems
4872 (i.e. HPUX) that don't provide it as a standard part of X.
4877 ExternalClient-Xlib.c
4890 @cindex external widget
4891 These files provide the @dfn{external widget} interface, which allows an
4892 XEmacs frame to appear as a widget in another application. To do this,
4893 you have to configure with @samp{--external-widget}.
4895 @file{ExternalShell*} provides the server (XEmacs) side of the
4898 @file{ExternalClient*} provides the client (other application) side of
4899 the connection. These files are not compiled into XEmacs but are
4900 compiled into libraries that are then linked into your application.
4902 @file{extw-*} is common code that is used for both the client and server.
4904 Don't touch this code; something is liable to break if you do.
4908 @node Modules for Internationalization
4909 @section Modules for Internationalization
4910 @cindex modules for internationalization
4911 @cindex internationalization, modules for
4926 These files implement the MULE (Asian-language) support. Note that MULE
4927 actually provides a general interface for all sorts of languages, not
4928 just Asian languages (although they are generally the most complicated
4929 to support). This code is still in beta.
4931 @file{mule-charset.*} and @file{file-coding.*} provide the heart of the
4932 XEmacs MULE support. @file{mule-charset.*} implements the @dfn{charset}
4933 Lisp object type, which encapsulates a character set (an ordered one- or
4934 two-dimensional set of characters, such as US ASCII or JISX0208 Japanese
4937 @file{file-coding.*} implements the @dfn{coding-system} Lisp object
4938 type, which encapsulates a method of converting between different
4939 encodings. An encoding is a representation of a stream of characters,
4940 possibly from multiple character sets, using a stream of bytes or words,
4941 and defines (e.g.) which escape sequences are used to specify particular
4942 character sets, how the indices for a character are converted into bytes
4943 (sometimes this involves setting the high bit; sometimes complicated
4944 rearranging of the values takes place, as in the Shift-JIS encoding),
4947 @file{mule-ccl.c} provides the CCL (Code Conversion Language)
4948 interpreter. CCL is similar in spirit to Lisp byte code and is used to
4949 implement converters for custom encodings.
4951 @file{mule-canna.c} and @file{mule-wnnfns.c} implement interfaces to
4952 external programs used to implement the Canna and WNN input methods,
4953 respectively. This is currently in beta.
4955 @file{mule-mcpath.c} provides some functions to allow for pathnames
4956 containing extended characters. This code is fragmentary, obsolete, and
completely non-working. Instead, @code{pathname-coding-system} is used
4958 to specify conversions of names of files and directories. The standard
4959 C I/O functions like @samp{open()} are wrapped so that conversion occurs
4962 @file{mule.c} provides a few miscellaneous things that should probably
4971 This provides some miscellaneous internationalization code for
4972 implementing message translation and interfacing to the Ximp input
4973 method. None of this code is currently working.
4981 This contains leftover code from an earlier implementation of
4982 Asian-language support, and is not currently used.
4987 @node Modules for Regression Testing
4988 @section Modules for Regression Testing
4989 @cindex modules for regression testing
4990 @cindex regression testing, modules for
4995 byte-compiler-tests.el
5010 @file{test-harness.el} defines the macros @code{Assert},
5011 @code{Check-Error}, @code{Check-Error-Message}, and
5012 @code{Check-Message}. The other files are test files, testing various
5017 @node Allocation of Objects in XEmacs Lisp, Dumping, A Summary of the Various XEmacs Modules, Top
5018 @chapter Allocation of Objects in XEmacs Lisp
5019 @cindex allocation of objects in XEmacs Lisp
5020 @cindex objects in XEmacs Lisp, allocation of
5021 @cindex Lisp objects, allocation of in XEmacs
5024 * Introduction to Allocation::
5025 * Garbage Collection::
5027 * Garbage Collection - Step by Step::
5028 * Integers and Characters::
5029 * Allocation from Frob Blocks::
5031 * Low-level allocation::
5038 * Compiled Function::
5041 @node Introduction to Allocation
5042 @section Introduction to Allocation
5043 @cindex allocation, introduction to
5045 Emacs Lisp, like all Lisps, has garbage collection. This means that
5046 the programmer never has to explicitly free (destroy) an object; it
5047 happens automatically when the object becomes inaccessible. Most
5048 experts agree that garbage collection is a necessity in a modern,
5049 high-level language. Its omission from C stems from the fact that C was
5050 originally designed to be a nice abstract layer on top of assembly
5051 language, for writing kernels and basic system utilities rather than
5054 Lisp objects can be created by any of a number of Lisp primitives.
5055 Most object types have one or a small number of basic primitives
5056 for creating objects. For conses, the basic primitive is @code{cons};
5057 for vectors, the primitives are @code{make-vector} and @code{vector}; for
5058 symbols, the primitives are @code{make-symbol} and @code{intern}; etc.
5059 Some Lisp objects, especially those that are primarily used internally,
5060 have no corresponding Lisp primitives. Every Lisp object, though,
5061 has at least one C primitive for creating it.
5063 Recall from section (VII) that a Lisp object, as stored in a 32-bit or
5064 64-bit word, has a few tag bits, and a ``value'' that occupies the
5065 remainder of the bits. We can separate the different Lisp object types
5066 into three broad categories:
5070 (a) Those for whom the value directly represents the contents of the
5071 Lisp object. Only two types are in this category: integers and
5072 characters. No special allocation or garbage collection is necessary
5073 for such objects. Lisp objects of these types do not need to be
5077 In the remaining two categories, the type is stored in the object
5078 itself. The tag for all such objects is the generic @dfn{lrecord}
5079 (Lisp_Type_Record) tag. The first bytes of the object's structure are an
5080 integer (actually a char) characterising the object's type and some
5081 flags, in particular the mark bit used for garbage collection. A
structure describing the type is accessible through the
@code{lrecord_implementation_table} indexed with said integer. This structure
5084 includes the method pointers and a pointer to a string naming the type.
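
The relationship between the header and the implementation table can be
sketched as follows. Every name ending in @code{_sketch} is
hypothetical; the real header layout and table contents differ, and
only the idea of ``a small integer in the object indexes a table of type
descriptions'' is taken from the text above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical description of a type: a name plus method pointers. */
struct type_description_sketch {
  const char *name;
  void (*marker)(void *object);   /* mark method; NULL if none is needed */
};

/* Hypothetical header placed at the start of every such object. */
struct record_header_sketch {
  unsigned char type;   /* index into the table below */
  unsigned char mark;   /* mark bit used by the garbage collector */
};

enum { TYPE_CONS_SKETCH, TYPE_STRING_SKETCH, TYPE_COUNT_SKETCH };

static const struct type_description_sketch type_table_sketch[TYPE_COUNT_SKETCH] = {
  { "cons", NULL },
  { "string", NULL }
};

/* An object type: the header comes first, then the type-specific data. */
struct cons_sketch {
  struct record_header_sketch header;
  void *car;
  void *cdr;
};

/* Given any object that starts with the header, find its type name. */
const char *object_type_name(const void *object)
{
  const struct record_header_sketch *h = object;
  assert(h->type < TYPE_COUNT_SKETCH);
  return type_table_sketch[h->type].name;
}
```

Storing only a one-byte index in each object keeps the per-object
overhead small while still giving access to all of the type's methods.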
5088 (b) Those lrecords that are allocated in frob blocks (see above). This
5089 includes the objects that are most common and relatively small, and
5090 includes conses, strings, subrs, floats, compiled functions, symbols,
5091 extents, events, and markers. With the cleanup of frob blocks done in
5092 19.12, it's not terribly hard to add more objects to this category, but
5093 it's a bit trickier than adding an object type to type (c) (esp. if the
5094 object needs a finalization method), and is not likely to save much
5095 space unless the object is small and there are many of them. (In fact,
5096 if there are very few of them, it might actually waste space.)
5098 (c) Those lrecords that are individually @code{malloc()}ed. These are
5099 called @dfn{lcrecords}. All other types are in this category. Adding a
5100 new type to this category is comparatively easy, and all types added
5101 since 19.8 (when the current allocation scheme was devised, by Richard
Mlynarik), with the exception of the character type, have been in this category.
5106 Note that bit vectors are a bit of a special case. They are
5107 simple lrecords as in category (b), but are individually @code{malloc()}ed
5108 like vectors. You can basically view them as exactly like vectors
5109 except that their type is stored in lrecord fashion rather than
5110 in directly-tagged fashion.
5113 @node Garbage Collection
5114 @section Garbage Collection
5115 @cindex garbage collection
5117 @cindex mark and sweep
5118 Garbage collection is simple in theory but tricky to implement.
5119 Emacs Lisp uses the oldest garbage collection method, called
5120 @dfn{mark and sweep}. Garbage collection begins by starting with
5121 all accessible locations (i.e. all variables and other slots where
5122 Lisp objects might occur) and recursively traversing all objects
5123 accessible from those slots, marking each one that is found.
We then go through all of memory, freeing each object that is
not marked and unmarking each object that is marked. Note
5126 that ``all of memory'' means all currently allocated objects.
5127 Traversing all these objects means traversing all frob blocks,
5128 all vectors (which are chained in one big list), and all
5129 lcrecords (which are likewise chained).
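
The two phases described above can be illustrated with a toy collector.
This is only a minimal sketch and not the XEmacs implementation: the
names @code{toy_obj}, @code{toy_alloc}, @code{toy_mark},
@code{toy_sweep} and @code{toy_collect} are invented for the
illustration, and a single chain of allocated objects stands in for the
frob blocks, vector chain and lcrecord chain.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy heap object.  Every allocated object is chained through
   next_alloc so that the sweep phase can visit "all of memory". */
struct toy_obj {
    int marked;
    struct toy_obj *ref[2];       /* outgoing references, may be NULL */
    struct toy_obj *next_alloc;   /* chain of all allocated objects */
};

static struct toy_obj *all_objects = NULL;

static struct toy_obj *toy_alloc(struct toy_obj *a, struct toy_obj *b)
{
    struct toy_obj *o = malloc(sizeof *o);
    assert(o != NULL);
    o->marked = 0;
    o->ref[0] = a;
    o->ref[1] = b;
    o->next_alloc = all_objects;
    all_objects = o;
    return o;
}

/* Mark phase: recursively mark everything reachable from o. */
static void toy_mark(struct toy_obj *o)
{
    if (o == NULL || o->marked)
        return;
    o->marked = 1;
    toy_mark(o->ref[0]);
    toy_mark(o->ref[1]);
}

/* Sweep phase: free unmarked objects and unmark the survivors.
   Returns the number of objects freed. */
static int toy_sweep(void)
{
    int freed = 0;
    struct toy_obj **link = &all_objects;
    while (*link != NULL) {
        struct toy_obj *o = *link;
        if (o->marked) {
            o->marked = 0;
            link = &o->next_alloc;
        } else {
            *link = o->next_alloc;
            free(o);
            freed++;
        }
    }
    return freed;
}

/* One collection cycle with a single root (NULL means no root). */
static int toy_collect(struct toy_obj *root)
{
    toy_mark(root);
    return toy_sweep();
}
```

Note that an object referenced only by garbage is itself garbage: it is
never reached from a root, so it stays unmarked and is freed.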
5131 Garbage collection can be invoked explicitly by calling
5132 @code{garbage-collect} but is also called automatically by @code{eval},
5133 once a certain amount of memory has been allocated since the last
5134 garbage collection (according to @code{gc-cons-threshold}).
5138 @section @code{GCPRO}ing
5139 @cindex @code{GCPRO}ing
5140 @cindex garbage collection protection
5141 @cindex protection, garbage collection
5143 @code{GCPRO}ing is one of the ugliest and trickiest parts of Emacs
5144 internals. The basic idea is that whenever garbage collection
5145 occurs, all in-use objects must be reachable somehow or
5146 other from one of the roots of accessibility. The roots
5147 of accessibility are:
5151 All objects that have been @code{staticpro()}d or
5152 @code{staticpro_nodump()}ed. This is used for any global C variables
5153 that hold Lisp objects. A call to @code{staticpro()} happens implicitly
5154 as a result of any symbols declared with @code{defsymbol()} and any
5155 variables declared with @code{DEFVAR_FOO()}. You need to explicitly
5156 call @code{staticpro()} (in the @code{vars_of_foo()} method of a module)
for other global C variables holding Lisp objects. (This typically
includes internal lists and such things.) Use
@code{staticpro_nodump()} only in the rare cases when you do not want
the pointed-to variable to be saved at dump time but rather recomputed at
startup.
5163 Note that @code{obarray} is one of the @code{staticpro()}d things.
5164 Therefore, all functions and variables get marked through this.
5166 Any shadowed bindings that are sitting on the @code{specpdl} stack.
5168 Any objects sitting in currently active (Lisp) stack frames,
5169 catches, and condition cases.
5171 A couple of special-case places where active objects are
5174 Anything currently marked with @code{GCPRO}.
Marking with @code{GCPRO} is necessary because some C functions (quite
a lot, in fact) allocate objects during their operation. Quite
5179 frequently, there will be no other pointer to the object while the
5180 function is running, and if a garbage collection occurs and the object
5181 needs to be referenced again, bad things will happen. The solution is
5182 to mark those objects with @code{GCPRO}. Unfortunately this is easy to
forget, and there is basically no way around this problem. Here are some rules:
5188 For every @code{GCPRO@var{n}}, there have to be declarations of
5189 @code{struct gcpro gcpro1, gcpro2}, etc.
5192 You @emph{must} @code{UNGCPRO} anything that's @code{GCPRO}ed, and you
5193 @emph{must not} @code{UNGCPRO} if you haven't @code{GCPRO}ed. Getting
5194 either of these wrong will lead to crashes, often in completely random
5195 places unrelated to where the problem lies.
5198 The way this actually works is that all currently active @code{GCPRO}s
5199 are chained through the @code{struct gcpro} local variables, with the
5200 variable @samp{gcprolist} pointing to the head of the list and the nth
5201 local @code{gcpro} variable pointing to the first @code{gcpro} variable
5202 in the next enclosing stack frame. Each @code{GCPRO}ed thing is an
5203 lvalue, and the @code{struct gcpro} local variable contains a pointer to
5204 this lvalue. This is why things will mess up badly if you don't pair up
5205 the @code{GCPRO}s and @code{UNGCPRO}s---you will end up with
5206 @code{gcprolist}s containing pointers to @code{struct gcpro}s or local
5207 @code{Lisp_Object} variables in no-longer-active stack frames.
5210 It is actually possible for a single @code{struct gcpro} to
5211 protect a contiguous array of any number of values, rather than
5212 just a single lvalue. To effect this, call @code{GCPRO@var{n}} as usual on
5213 the first object in the array and then set @code{gcpro@var{n}.nvars}.
5216 @strong{Strings are relocated.} What this means in practice is that the
5217 pointer obtained using @code{XSTRING_DATA()} is liable to change at any
5218 time, and you should never keep it around past any function call, or
5219 pass it as an argument to any function that might cause a garbage
5220 collection. This is why a number of functions accept either a
5221 ``non-relocatable'' @code{char *} pointer or a relocatable Lisp string,
5222 and only access the Lisp string's data at the very last minute. In some
5223 cases, you may end up having to @code{alloca()} some space and copy the
5224 string's data into it.
By convention, if you have to nest @code{GCPRO}s, use @code{NGCPRO@var{n}}
5228 (along with @code{struct gcpro ngcpro1, ngcpro2}, etc.), @code{NNGCPRO@var{n}},
5229 etc. This avoids compiler warnings about shadowed locals.
5232 It is @emph{always} better to err on the side of extra @code{GCPRO}s
5233 rather than too few. The extra cycles spent on this are
5234 almost never going to make a whit of difference in the
The general rule to follow is that the caller, not the callee, @code{GCPRO}s.
5239 That is, you should not have to explicitly @code{GCPRO} any Lisp objects
5240 that are passed in as parameters.
5242 One exception from this rule is if you ever plan to change the parameter
5243 value, and store a new object in it. In that case, you @emph{must}
@code{GCPRO} the parameter, because otherwise the new object will not be protected.
5247 So, if you create any Lisp objects (remember, this happens in all sorts
5248 of circumstances, e.g. with @code{Fcons()}, etc.), you are responsible
5249 for @code{GCPRO}ing them, unless you are @emph{absolutely sure} that
5250 there's no possibility that a garbage-collection can occur while you
5251 need to use the object. Even then, consider @code{GCPRO}ing.
5254 A garbage collection can occur whenever anything calls @code{Feval}, or
5255 whenever a QUIT can occur where execution can continue past
5256 this. (Remember, this is almost anywhere.)
5259 If you have the @emph{least smidgeon of doubt} about whether
5260 you need to @code{GCPRO}, you should @code{GCPRO}.
5263 Beware of @code{GCPRO}ing something that is uninitialized. If you have
5264 any shade of doubt about this, initialize all your variables to @code{Qnil}.
5267 Be careful of traps, like calling @code{Fcons()} in the argument to
5268 another function. By the ``caller protects'' law, you should be
5269 @code{GCPRO}ing the newly-created cons, but you aren't. A certain
5270 number of functions that are commonly called on freshly created stuff
5271 (e.g. @code{nconc2()}, @code{Fsignal()}), break the ``caller protects''
5272 law and go ahead and @code{GCPRO} their arguments so as to simplify
things, but make sure to check whether it's OK whenever doing something like this.
5277 Once again, remember to @code{GCPRO}! Bugs resulting from insufficient
5278 @code{GCPRO}ing are intermittent and extremely difficult to track down,
5279 often showing up in crashes inside of @code{garbage-collect} or in
5280 weirdly corrupted objects or even in incorrect values in a totally
5281 different section of code.
5284 If you don't understand whether to @code{GCPRO} in a particular
instance, ask on the mailing lists. A general hint is that @code{prog1}
is the canonical example.
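
The chaining of @code{struct gcpro} variables through the C stack can be
modelled in a few lines. The following is a sketch only: the field
names @code{next}, @code{var} and @code{nvars} and the macro bodies
follow the classic Emacs definitions and are assumptions here, the exact
linking order in XEmacs may differ, and @code{Lisp_Object} is replaced
by a stand-in type. The helper @code{count_protected_slots} and the
functions @code{inner} and @code{outer} are invented for the
illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef void *Lisp_Object;          /* stand-in for the real type */

/* One link of the protection chain; it lives in a C stack frame. */
struct gcpro {
    struct gcpro *next;             /* previously active link */
    Lisp_Object *var;               /* address of first protected lvalue */
    int nvars;                      /* number of consecutive lvalues */
};

static struct gcpro *gcprolist = NULL;

#define GCPRO1(v)                                                       \
    do { gcpro1.next = gcprolist; gcpro1.var = &(v); gcpro1.nvars = 1;  \
         gcprolist = &gcpro1; } while (0)
#define GCPRO2(v1, v2)                                                  \
    do { gcpro1.next = gcprolist; gcpro1.var = &(v1); gcpro1.nvars = 1; \
         gcpro2.next = &gcpro1;   gcpro2.var = &(v2); gcpro2.nvars = 1; \
         gcprolist = &gcpro2; } while (0)
#define UNGCPRO (gcprolist = gcpro1.next)

/* What a collector would do: walk the chain and visit every slot. */
static int count_protected_slots(void)
{
    int n = 0;
    for (struct gcpro *g = gcprolist; g != NULL; g = g->next) {
        assert(g->nvars >= 1);
        n += g->nvars;
    }
    return n;
}

static int inner(void)
{
    Lisp_Object tmp[3] = { NULL, NULL, NULL };
    struct gcpro gcpro1;
    int seen;

    GCPRO1(tmp[0]);
    gcpro1.nvars = 3;               /* protect the whole array */
    seen = count_protected_slots();
    UNGCPRO;
    return seen;
}

static int outer(void)
{
    Lisp_Object a = NULL, b = NULL;
    struct gcpro gcpro1, gcpro2;
    int seen;

    GCPRO2(a, b);
    seen = inner();                 /* 2 slots here plus 3 in inner() */
    UNGCPRO;
    return seen;
}
```

If @code{inner} returned without its @code{UNGCPRO}, @code{gcprolist}
would keep pointing at a @code{struct gcpro} in a dead stack frame,
which is exactly the failure mode described above.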
5288 @cindex garbage collection, conservative
5289 @cindex conservative garbage collection
Given the extremely error-prone nature of the @code{GCPRO} scheme, and
the difficulty of tracking down the resulting bugs, it should be considered a deficiency
in the XEmacs code. A solution to this problem would involve
5293 implementing so-called @dfn{conservative} garbage collection for the C
5294 stack. That involves looking through all of stack memory and treating
5295 anything that looks like a reference to an object as a reference. This
5296 will result in a few objects not getting collected when they should, but
5297 it obviates the need for @code{GCPRO}ing, and allows garbage collection
5298 to happen at any point at all, such as during object allocation.
5300 @node Garbage Collection - Step by Step
5301 @section Garbage Collection - Step by Step
5302 @cindex garbage collection - step by step
5306 * garbage_collect_1::
5309 * sweep_lcrecords_1::
5310 * compact_string_chars::
5312 * sweep_bit_vectors_1::
5316 @subsection Invocation
5317 @cindex garbage collection, invocation
5319 The first thing that anyone should know about garbage collection is:
5320 when and how the garbage collector is invoked. One might think that this
5321 could happen every time new memory is allocated, e.g. new objects are
5322 created, but this is @emph{not} the case. Instead, we have the following
5325 The entry point of any process of garbage collection is an invocation
5326 of the function @code{garbage_collect_1} in file @code{alloc.c}. The
5327 invocation can occur @emph{explicitly} by calling the function
5328 @code{Fgarbage_collect} (in addition this function provides information
5329 about the freed memory), or can occur @emph{implicitly} in four different
5333 In function @code{main_1} in file @code{emacs.c}. This function is called
5334 at each startup of xemacs. The garbage collection is invoked after all
5335 initial creations are completed, but only if a special internal error
5336 checking-constant @code{ERROR_CHECK_GC} is defined.
5338 In function @code{disksave_object_finalization} in file
5339 @code{alloc.c}. The only purpose of this function is to clear the
5340 objects from memory which need not be stored with xemacs when we dump out
5341 an executable. This is only done by @code{Fdump_emacs} or by
5342 @code{Fdump_emacs_data} respectively (both in @code{emacs.c}). The
5343 actual clearing is accomplished by making these objects unreachable and
5344 starting a garbage collection. The function is only used while building
5347 In function @code{Feval / eval} in file @code{eval.c}. Each time the
well-known and often-used function @code{eval} is called to evaluate a form,
one of the first things that can happen is a call of
5350 @code{garbage_collect_1}. There exist three global variables,
5351 @code{consing_since_gc} (counts the created cons-cells since the last
5352 garbage collection), @code{gc_cons_threshold} (a specified threshold
5353 after which a garbage collection occurs) and @code{always_gc}. If
5354 @code{always_gc} is set or if the threshold is exceeded, the garbage
5355 collection will start.
In function @code{Ffuncall / funcall} in file @code{eval.c}. This
function evaluates calls of elisp functions and works analogously to
@code{Feval} with respect to garbage collection.
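
The threshold test performed on entry to @code{Feval} and
@code{Ffuncall} can be sketched as follows. The three variable names
are taken from the text; their types and initial values, and the
functions @code{toy_note_allocation}, @code{toy_eval_entry} and
@code{toy_garbage_collect}, are assumptions made for this illustration.

```c
#include <assert.h>

/* Names follow the text; types and initial values are assumptions. */
static long consing_since_gc = 0;
static long gc_cons_threshold = 1000;
static int always_gc = 0;
static int gc_runs = 0;

static void toy_garbage_collect(void)
{
    gc_runs++;
    consing_since_gc = 0;       /* the counter restarts after a collection */
}

/* Allocator stand-in: it only records consing and never collects. */
static void toy_note_allocation(long amount)
{
    assert(amount >= 0);
    consing_since_gc += amount;
}

/* Stand-in for the check near the top of Feval and Ffuncall. */
static void toy_eval_entry(void)
{
    if (always_gc || consing_since_gc > gc_cons_threshold)
        toy_garbage_collect();
}
```

The point of the split is that allocation itself never triggers a
collection in this model; only the next evaluation does.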
5362 The upshot is that garbage collection can basically occur everywhere
5363 @code{Feval}, respectively @code{Ffuncall}, is used - either directly or
5364 through another function. Since calls to these two functions are hidden
5365 in various other functions, many calls to @code{garbage_collect_1} are
5366 not obviously foreseeable, and therefore unexpected. Instances where
5367 they are used that are worth remembering are various elisp commands, as
5368 for example @code{or}, @code{and}, @code{if}, @code{cond}, @code{while},
5369 @code{setq}, etc., miscellaneous @code{gui_item_...} functions,
5370 everything related to @code{eval} (@code{Feval_buffer}, @code{call0},
5371 ...) and inside @code{Fsignal}. The latter is used to handle signals, as
5372 for example the ones raised by every @code{QUIT}-macro triggered after
5375 @node garbage_collect_1
5376 @subsection @code{garbage_collect_1}
5377 @cindex @code{garbage_collect_1}
We can now describe exactly what happens after the invocation takes place.
5383 There are several cases in which the garbage collector is left immediately:
5384 when we are already garbage collecting (@code{gc_in_progress}), when
5385 the garbage collection is somehow forbidden
5386 (@code{gc_currently_forbidden}), when we are currently displaying something
5387 (@code{in_display}) or when we are preparing for the armageddon of the
5388 whole system (@code{preparing_for_armageddon}).
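
The early exits amount to a guard at the top of the function. A
minimal sketch, assuming that the four flags named above are plain
integers (their real types are not stated here);
@code{toy_garbage_collect_1} and @code{collections_done} are invented
for the illustration:

```c
#include <assert.h>

static int gc_in_progress, gc_currently_forbidden;
static int in_display, preparing_for_armageddon;
static int collections_done;

/* Returns 1 if a collection was performed, 0 if it was refused. */
static int toy_garbage_collect_1(void)
{
    if (gc_in_progress || gc_currently_forbidden
        || in_display || preparing_for_armageddon)
        return 0;                       /* leave immediately */

    gc_in_progress = 1;
    /* A nested request issued while collecting must be refused. */
    int nested = toy_garbage_collect_1();
    assert(nested == 0);
    collections_done++;                 /* mark and sweep would go here */
    gc_in_progress = 0;
    return 1;
}
```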
5390 Next the correct frame in which to put
5391 all the output occurring during garbage collecting is determined. In
5392 order to be able to restore the old display's state after displaying the
5393 message, some data about the current cursor position has to be
5394 saved. The variables @code{pre_gc_cursor} and @code{cursor_changed} take
5397 The state of @code{gc_currently_forbidden} must be restored after
5398 the garbage collection, no matter what happens during the process. We
5399 accomplish this by @code{record_unwind_protect}ing the suitable function
5400 @code{restore_gc_inhibit} together with the current value of
5401 @code{gc_currently_forbidden}.
5403 If we are concurrently running an interactive xemacs session, the next step
5404 is simply to show the garbage collector's cursor/message.
5406 The following steps are the intrinsic steps of the garbage collector,
5407 therefore @code{gc_in_progress} is set.
5409 For debugging purposes, it is possible to copy the current C stack
5410 frame. However, this seems to be a currently unused feature.
5412 Before actually starting to go over all live objects, references to
5413 objects that are no longer used are pruned. We only have to do this for events
5414 (@code{clear_event_resource}) and for specifiers
5415 (@code{cleanup_specifiers}).
Now the mark phase begins and marks all accessible elements. The function
@code{mark_object} is called for each slot that serves as a root of
accessibility and marks all objects reachable from there. The roots are
traversed in the following order:
5425 all constant symbols and static variables that are registered via
5426 @code{staticpro}@ in the dynarr @code{staticpros}.
5427 @xref{Adding Global Lisp Variables}.
all Lisp objects that are created in C functions and that must be
protected from being freed. They are registered in the global
list @code{gcprolist}.
5434 all local variables (i.e. their name fields @code{symbol} and old
5435 values @code{old_values}) that are bound during the evaluation by the Lisp
5436 engine. They are stored in @code{specbinding} structs pushed on a stack
5437 called @code{specpdl}.
5438 @xref{Dynamic Binding; The specbinding Stack; Unwind-Protects}.
all catch blocks that the Lisp engine encounters during the evaluation
cause the creation of structs @code{catchtag} inserted in the list
@code{catchlist}. Their tag (@code{tag}) and value (@code{val}) fields
are freshly created objects and therefore have to be marked.
5444 @xref{Catch and Throw}.
5446 every function application pushes new structs @code{backtrace}
5447 on the call stack of the Lisp engine (@code{backtrace_list}). The unique
5448 parts that have to be marked are the fields for each function
5449 (@code{function}) and all their arguments (@code{args}).
5452 all objects that are used by the redisplay engine that must not be freed
5453 are marked by a special function called @code{mark_redisplay} (in
5454 @code{redisplay.c}).
5456 all objects created for profiling purposes are allocated by C functions
5457 instead of using the lisp allocation mechanisms. In order to receive the
5458 right ones during the sweep phase, they also have to be marked
5459 manually. That is done by the function @code{mark_profiling_info}
Hash tables in XEmacs belong to a kind of special object that
makes use of a concept often called ``weak pointers''.
To make a long story short, this kind of pointer is not followed
when determining the live objects during garbage collection.
Any object referenced only by weak pointers is collected
anyway, and the reference to it is cleared. In hash tables there are
5468 different usage patterns of them, manifesting in different types of hash
5469 tables, namely 'non-weak', 'weak', 'key-weak' and 'value-weak'
5470 (internally also 'key-car-weak' and 'value-car-weak') hash tables, each
5471 clearing entries depending on different conditions. More information can
5472 be found in the documentation to the function @code{make-hash-table}.
5474 Because there are complicated dependency rules about when and what to
5475 mark while processing weak hash tables, the standard @code{marker}
5476 method is only active if it is marking non-weak hash tables. As soon as
a weak component is in the table, the hash table entries are ignored
while marking. Instead, each entry is marked separately by the
function @code{finish_marking_weak_hash_tables}. This function iterates
over each hash table entry @code{hentries} for each weak hash table in
@code{Vall_weak_hash_tables}. Depending on the type of a table, the
appropriate action is performed.
If a table is acting as @code{HASH_TABLE_KEY_WEAK} and a key is already marked,
everything reachable from the @code{value} component is marked. If it is
acting as a @code{HASH_TABLE_VALUE_WEAK} and the value component is
already marked, everything reachable from the
@code{key} component is marked.
5488 If it is a @code{HASH_TABLE_KEY_CAR_WEAK} and the car
5489 of the key entry is already marked, we mark both the @code{key} and
5490 @code{value} components.
5491 Finally, if the table is of the type @code{HASH_TABLE_VALUE_CAR_WEAK}
5492 and the car of the value components is already marked, again both the
5493 @code{key} and the @code{value} components get marked.
Similarly, there are lists with comparable properties called weak
lists. They come in several types, called
@code{simple}, @code{assoc}, @code{key-assoc} and
@code{value-assoc}. You can find further details about them in the
description of the function @code{make-weak-list}. The scheme of their
marking is similar: all weak lists are listed in @code{Vall_weak_lists},
5501 therefore we iterate over them. The marking is advanced until we hit an
5502 already marked pair. Then we know that during a former run all
5503 the rest has been marked completely. Again, depending on the special
5504 type of the weak list, our jobs differ. If it is a @code{WEAK_LIST_SIMPLE}
5505 and the elem is marked, we mark the @code{cons} part. If it is a
5506 @code{WEAK_LIST_ASSOC} and not a pair or a pair with both marked car and
5507 cdr, we mark the @code{cons} and the @code{elem}. If it is a
5508 @code{WEAK_LIST_KEY_ASSOC} and not a pair or a pair with a marked car of
5509 the elem, we mark the @code{cons} and the @code{elem}. Finally, if it is
5510 a @code{WEAK_LIST_VALUE_ASSOC} and not a pair or a pair with a marked
5511 cdr of the elem, we mark both the @code{cons} and the @code{elem}.
Marking objects reachable from weak hash tables and weak lists
can mark further objects, which in turn may require more marking in
other weak objects. Both finishing functions are therefore repeated as long as
previously unmarked objects get freshly marked.
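
The need to repeat the finishing pass until nothing new is marked can be
seen in a small model of a key-weak table. This is a sketch under
simplifying assumptions: objects are reduced to a mark flag, marking a
value does not follow any further references, and the names
@code{toy_entry}, @code{toy_finish_marking} and
@code{toy_mark_and_prune} are invented for the illustration. The final
loop also shows the pruning rule for entries with an unmarked component.

```c
#include <assert.h>
#include <stddef.h>

struct toy_obj { int marked; };

struct toy_entry {            /* one entry of a key-weak table */
    struct toy_obj *key;
    struct toy_obj *value;
    int live;                 /* cleared when the entry is pruned */
};

/* For every entry whose key is marked, mark the value as well.
   Returns how many objects were freshly marked in this pass. */
static int toy_finish_marking(struct toy_entry *e, size_t n)
{
    int fresh = 0;
    for (size_t i = 0; i < n; i++) {
        assert(e[i].key != NULL && e[i].value != NULL);
        if (e[i].key->marked && !e[i].value->marked) {
            e[i].value->marked = 1;
            fresh++;
        }
    }
    return fresh;
}

/* Repeat until a pass marks nothing new, then drop dead entries.
   Returns the number of passes that were needed. */
static int toy_mark_and_prune(struct toy_entry *e, size_t n)
{
    int passes = 0;
    int fresh;
    do {
        fresh = toy_finish_marking(e, n);
        passes++;
    } while (fresh > 0);

    for (size_t i = 0; i < n; i++)
        e[i].live = e[i].key->marked && e[i].value->marked;
    return passes;
}
```

A value marked in one pass may be the key of an entry that was already
visited, so a single pass is not enough in general.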
5519 After completing the special marking for the weak hash tables and for the weak
5520 lists, all entries that point to objects that are going to be swept in
5521 the further process are useless, and therefore have to be removed from
5522 the table or the list.
5524 The function @code{prune_weak_hash_tables} does the job for weak hash
5525 tables. Totally unmarked hash tables are removed from the list
5526 @code{Vall_weak_hash_tables}. The other ones are treated more carefully
5527 by scanning over all entries and removing one as soon as one of
5528 the components @code{key} and @code{value} is unmarked.
5530 The same idea applies to the weak lists. It is accomplished by
5531 @code{prune_weak_lists}: An unmarked list is pruned from
5532 @code{Vall_weak_lists} immediately. A marked list is treated more
5533 carefully by going over it and removing just the unmarked pairs.
The function @code{prune_specifiers} checks all listed specifiers held
in @code{Vall_specifiers} and removes the ones from the lists that are unmarked.
5541 All syntax tables are stored in a list called
5542 @code{Vall_syntax_tables}. The function @code{prune_syntax_tables} walks
5543 through it and unlinks the tables that are unmarked.
Next comes the sweeping itself, most of which is done by the function
@code{gc_sweep}.
5549 First, all the variables with respect to garbage collection are
5550 reset. @code{consing_since_gc} - the counter of the created cells since
5551 the last garbage collection - is set back to 0, and
5552 @code{gc_in_progress} is not @code{true} anymore.
5554 In case the session is interactive, the displayed cursor and message are
5557 The state of @code{gc_inhibit} is restored to the former value by
5558 unwinding the stack.
5560 A small memory reserve is always held back that can be reached by
5561 @code{breathing_space}. If nothing more is left, we create a new reserve
5566 @subsection @code{mark_object}
5567 @cindex @code{mark_object}
5569 The first thing that is checked while marking an object is whether the
object is a real Lisp object (@code{Lisp_Type_Record}) or just an integer
5571 or a character. Integers and characters are the only two types that are
5572 stored directly - without another level of indirection, and therefore they
5573 don't have to be marked and collected.
5574 @xref{How Lisp Objects Are Represented in C}.
The second case is the one we have to handle: we are
dealing with a pointer to a Lisp object. Even then, there are three
situations in which nothing is done while marking. First, the
object is read only, which prevents it from being garbage collected,
i.e. marked (@code{C_READONLY_RECORD_HEADER}). Second, the object in question is
already marked and need not be marked a second time (checked by
@code{MARKED_RECORD_HEADER_P}). Third, it is a special, unmarkable object
(@code{UNMARKABLE_RECORD_HEADER_P}; apparently, these are objects that
sit in some const space and therefore cannot be marked, see
@code{this_one_is_unmarkable} in @code{alloc.c}).
Now, the actual marking is feasible. We do so by first using the macro
@code{MARK_RECORD_HEADER} to mark the object itself (actually the
special flag in the lrecord header), and then calling its special marker
``method'' @code{marker} if available. The marker method marks every
other object that is in reach from our current object. Note that these
marker methods should not call @code{mark_object} recursively, but
instead should return the next object from where further marking has to
be performed.
5596 In case another object was returned, as mentioned before, we reiterate
5597 the whole @code{mark_object} process beginning with this next object.
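
The effect of returning the next object instead of recursing can be
sketched with a chain of objects that each hold one reference. The
names @code{toy_obj}, @code{toy_mark_object} and @code{toy_mark_chain}
are invented for this illustration, and the read-only and mark flags are
plain fields rather than header bits. The depth counter only exists to
demonstrate that the marker never re-enters the marking function.

```c
#include <assert.h>
#include <stddef.h>

struct toy_obj;
typedef struct toy_obj *(*toy_marker)(struct toy_obj *);

struct toy_obj {
    int marked;
    int readonly;
    toy_marker marker;          /* may be NULL for leaf types */
    struct toy_obj *next;       /* the single outgoing reference */
};

static int toy_mark_depth = 0;  /* current nesting of toy_mark_object */

static void toy_mark_object(struct toy_obj *obj)
{
    toy_mark_depth++;
    assert(toy_mark_depth == 1);        /* the marker below never re-enters */
    while (obj != NULL && !obj->readonly && !obj->marked) {
        obj->marked = 1;
        /* Continue with whatever the marker hands back. */
        obj = obj->marker ? obj->marker(obj) : NULL;
    }
    toy_mark_depth--;
}

/* Marker method: no recursion, just return where to go on. */
static struct toy_obj *toy_mark_chain(struct toy_obj *obj)
{
    return obj->next;
}
```

With this scheme an arbitrarily long chain is marked in constant C stack
space, which a naive recursive marker would not achieve.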
5600 @subsection @code{gc_sweep}
5601 @cindex @code{gc_sweep}
5603 The job of this function is to free all unmarked records from memory. As
5604 we know, there are different types of objects implemented and managed, and
5605 consequently different ways to free them from memory.
5606 @xref{Introduction to Allocation}.
5608 We start with all objects stored through @code{lcrecords}. All
5609 bulkier objects are allocated and handled using that scheme of
5610 @code{lcrecords}. Each object is @code{malloc}ed separately
5611 instead of placing it in one of the contiguous frob blocks. All types
5612 that are currently stored
5613 using @code{lcrecords}'s @code{alloc_lcrecord} and
5614 @code{make_lcrecord_list} are the types: vectors, buffers,
5615 char-table, char-table-entry, console, weak-list, database, device,
5616 ldap, hash-table, command-builder, extent-auxiliary, extent-info, face,
5617 coding-system, frame, image-instance, glyph, popup-data, gui-item,
5618 keymap, charset, color_instance, font_instance, opaque, opaque-list,
5619 process, range-table, specifier, symbol-value-buffer-local,
5620 symbol-value-lisp-magic, symbol-value-varalias, toolbar-button,
5621 tooltalk-message, tooltalk-pattern, window, and window-configuration. We
take care of them in the first place
5623 in order to be able to handle and to finalize items stored in them more
5624 easily. The function @code{sweep_lcrecords_1} as described below is
5625 doing the whole job for us.
5626 For a description about the internals: @xref{lrecords}.
Our next candidates are the other objects that behave quite differently
from everything else: the strings. They consist of two parts, a
5630 fixed-size portion (@code{struct Lisp_String}) holding the string's
5631 length, its property list and a pointer to the second part, and the
5632 actual string data, which is stored in string-chars blocks comparable to
5633 frob blocks. In this block, the data is not only freed, but also a
5634 compression of holes is made, i.e. all strings are relocated together.
5635 @xref{String}. This compacting phase is performed by the function
5636 @code{compact_string_chars}, the actual sweeping by the function
5637 @code{sweep_strings} is described below.
After that, the other types are swept step by step using functions
@code{sweep_conses}, @code{sweep_bit_vectors_1},
@code{sweep_compiled_functions}, @code{sweep_floats},
@code{sweep_symbols}, @code{sweep_extents}, @code{sweep_markers} and
@code{sweep_events}. They are the fixed-size types cons, floats,
compiled-functions, symbol, marker, extent, and event stored in
so-called ``frob blocks'', and therefore we can basically do the same on
every type of object, using the same macros, especially defined only to
handle everything with respect to fixed-size blocks. The only fixed-size
type that is not handled here is the fixed-size portion of strings,
because we took special care of them earlier.
The only big exceptions are bit vectors, which are stored differently and
are therefore treated differently by the function @code{sweep_bit_vectors_1}
described below.
5655 At first, we need some brief information about how
5656 these fixed-size types are managed in general, in order to understand
how the sweeping is done. They all have a fixed size, and are therefore
5658 stored in big blocks of memory - allocated at once - that can hold a
5659 certain amount of objects of one type. The macro
5660 @code{DECLARE_FIXED_TYPE_ALLOC} creates the suitable structures for
every type. More precisely, we have the block struct
(holding a pointer to the previous block @code{prev} and the
objects in @code{block[]}), a pointer to the current block
(@code{current_..._block}) and its last index
(@code{current_..._block_index}), and a pointer to the free list that
will be created. There is also a macro @code{FIXED_TYPE_FROM_BLOCK} plus some
related macros that are used to obtain a new object, either from
the free list (@code{ALLOCATE_FIXED_TYPE_1}) if there is an unused object
of that type stored or by allocating a completely new block using
@code{ALLOCATE_FIXED_TYPE_FROM_BLOCK}.
5672 The rest works as follows: all of them define a
5673 macro @code{UNMARK_...} that is used to unmark the object. They define a
5674 macro @code{ADDITIONAL_FREE_...} that defines additional work that has
5675 to be done when converting an object from in use to not in use (so far,
5676 only markers use it in order to unchain them). Then, they all call
5677 the macro @code{SWEEP_FIXED_TYPE_BLOCK} instantiated with their type name
5678 and their struct name.
This call in particular does the following: we go over all blocks,
starting with the current one and moving towards the oldest.
For each block, we look at every object in it. If the object is already
freed (checked with @code{FREE_STRUCT_P} using the first pointer of the
object), or if it is
set to read only (@code{C_READONLY_RECORD_HEADER_P}), nothing must be
done. If it is unmarked (checked with @code{MARKED_RECORD_HEADER_P}), it
is put in the free list and set free (using the macro
@code{FREE_FIXED_TYPE}); otherwise it stays in the block, but is unmarked
(by @code{UNMARK_...}). While going through one block, we note if the
whole block is empty. If so, the whole block is freed (using
@code{xfree}) and the free list state is set to the state it had before
handling this block.
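
The block-wise sweep can be sketched for a single toy type. This is an
illustration under stated assumptions, not the real macro: all names
(@code{toy_cell}, @code{toy_block}, @code{toy_sweep_blocks} and so on)
are invented, every block is treated as full, read-only objects are left
out, and the sketch rebuilds the free list from scratch on each sweep,
re-chaining cells that were already free, so that no cell of a released
block can remain on it.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_PER_BLOCK 4

struct toy_cell {
    int marked;
    int is_free;
    struct toy_cell *next_free;
};

struct toy_block {
    struct toy_block *prev;             /* next older block */
    struct toy_cell cell[TOY_PER_BLOCK];
};

static struct toy_block *current_block = NULL;
static struct toy_cell *free_list = NULL;

static struct toy_block *toy_new_block(void)
{
    struct toy_block *b = calloc(1, sizeof *b);
    assert(b != NULL);
    b->prev = current_block;
    current_block = b;
    return b;
}

static void toy_put_on_free_list(struct toy_cell *c)
{
    c->is_free = 1;
    c->next_free = free_list;
    free_list = c;
}

/* Returns the number of whole blocks handed back to the system. */
static int toy_sweep_blocks(void)
{
    int blocks_freed = 0;
    struct toy_block **link = &current_block;

    free_list = NULL;                   /* rebuilt during the sweep */
    while (*link != NULL) {
        struct toy_block *b = *link;
        struct toy_cell *old_free_list = free_list;
        int empty = 1;

        for (int i = 0; i < TOY_PER_BLOCK; i++) {
            struct toy_cell *c = &b->cell[i];
            if (c->is_free || !c->marked) {
                toy_put_on_free_list(c);    /* dead or already free */
            } else {
                c->marked = 0;              /* survivor: just unmark */
                empty = 0;
            }
        }

        if (empty) {
            free_list = old_free_list;  /* forget the cells of this block */
            *link = b->prev;
            free(b);
            blocks_freed++;
        } else {
            link = &b->prev;
        }
    }
    return blocks_freed;
}
```

Restoring the saved free list before releasing an empty block is what
keeps dangling cell pointers off the free list.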
5694 @node sweep_lcrecords_1
5695 @subsection @code{sweep_lcrecords_1}
5696 @cindex @code{sweep_lcrecords_1}
5698 After nullifying the complete lcrecord statistics, we go over all
5699 lcrecords two separate times. They are all chained together in a list with
5700 a head called @code{all_lcrecords}.
The first loop calls for each object its @code{finalizer} method, but only
in the case that it is not read only
(@code{C_READONLY_RECORD_HEADER_P}), it is not marked
(@code{MARKED_RECORD_HEADER_P}), it is not already in a free list (list of
freed objects, field @code{free}) and finally it owns a finalizer
method.
5709 The second loop actually frees the appropriate objects again by iterating
5710 through the whole list. In case an object is read only or marked, it
5711 has to persist, otherwise it is manually freed by calling
5712 @code{xfree}. During this loop, the lcrecord statistics are kept up to
date by calling @code{tick_lcrecord_stats} with the right arguments.
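
The two loops can be sketched as follows. The structure
@code{toy_lcrecord} and the functions are invented for this
illustration; the free-list check and the statistics are left out, and
the mark and read-only flags are plain fields rather than header bits.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_lcrecord {
    struct toy_lcrecord *next;                  /* chain of all lcrecords */
    int marked;
    int readonly;
    void (*finalizer)(struct toy_lcrecord *);   /* may be NULL */
};

static struct toy_lcrecord *all_lcrecords = NULL;
static int finalizer_calls = 0;

static void toy_count_finalizer(struct toy_lcrecord *r)
{
    (void)r;
    finalizer_calls++;
}

static struct toy_lcrecord *toy_alloc_lcrecord(int marked, int with_finalizer)
{
    struct toy_lcrecord *r = malloc(sizeof *r);
    assert(r != NULL);
    r->marked = marked;
    r->readonly = 0;
    r->finalizer = with_finalizer ? toy_count_finalizer : NULL;
    r->next = all_lcrecords;
    all_lcrecords = r;
    return r;
}

/* Returns the number of records freed. */
static int toy_sweep_lcrecords(void)
{
    int freed = 0;
    struct toy_lcrecord *r;
    struct toy_lcrecord **link;

    /* Pass 1: run finalizers while every record is still intact. */
    for (r = all_lcrecords; r != NULL; r = r->next)
        if (!r->readonly && !r->marked && r->finalizer != NULL)
            r->finalizer(r);

    /* Pass 2: unlink and free the dead, unmark the survivors. */
    for (link = &all_lcrecords; *link != NULL; ) {
        r = *link;
        if (r->readonly || r->marked) {
            r->marked = 0;
            link = &r->next;
        } else {
            *link = r->next;
            free(r);
            freed++;
        }
    }
    return freed;
}
```

Running all finalizers before freeing anything means a finalizer can
still look at other dying records without touching freed memory.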
5715 @node compact_string_chars
5716 @subsection @code{compact_string_chars}
5717 @cindex @code{compact_string_chars}
5719 The purpose of this function is to compact all the data parts of the
5720 strings that are held in so-called @code{string_chars_block}, i.e. the
5721 strings that do not exceed a certain maximal length.
The procedure with which this is done is as follows. We are keeping two
positions in the @code{string_chars_block}s using two pointer/integer
pairs, namely @code{from_sb}/@code{from_pos} and
@code{to_sb}/@code{to_pos}. They give the position from which and the
position to which the string currently being handled is copied.
While going over all chained @code{string_chars_block}s and the strings
they hold, starting at @code{first_string_chars_block}, both pointers
are advanced and eventually a string is copied from @code{from_sb} to
@code{to_sb}, depending on the status of the pointed-at strings.
5734 More precisely, we can distinguish between the following actions.
5737 The string at @code{from_sb}'s position could be marked as free, which
5738 is indicated by an invalid pointer to the pointer that should point back
5739 to the fixed size string object, and which is checked by
5740 @code{FREE_STRUCT_P}. In this case, the @code{from_sb}/@code{from_pos}
5741 is advanced to the next string, and nothing has to be copied.
5743 Also, if a string object itself is unmarked, nothing has to be
5744 copied. We likewise advance the @code{from_sb}/@code{from_pos}
5745 pair as described above.
5747 In all other cases, we have a marked string at hand. The string data
5748 must be moved from the from-position to the to-position. In case
5749 there is not enough space in the current @code{to_sb} block, we advance
5750 this pointer to the beginning of the next block before copying. In case the
5751 from and to positions are different, we perform the
5752 actual copying using the library function @code{memmove}.
5755 After compacting, the pointer to the current
5756 @code{string_chars_block}, stored in @code{current_string_chars_block},
5757 is reset to the last block to which we moved a string,
5758 i.e. @code{to_sb}, and all remaining blocks (we know that they just
5759 carry garbage) are explicitly @code{xfree}d.
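
The two-position scheme described above can be illustrated with a minimal sketch. It is not the code from @file{alloc.c}: the block size, the record layout (a state byte and a length byte in front of the data) and all names are assumptions made for the example, and updating the data pointer in the fixed-size string object is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 16   /* hypothetical; real string-chars blocks are much larger */

enum { REC_FREE, REC_UNMARKED, REC_MARKED };

/* Each record is laid out as: state byte, length byte, then the data. */
struct block
{
  size_t used;                      /* bytes in use at the front of data[] */
  unsigned char data[BLOCK_SIZE];
};

/* Append a record to block B; returns 0 if it does not fit. */
static int
append_record (struct block *b, unsigned char state, const char *s)
{
  size_t len = strlen (s);
  if (len > 255 || b->used + 2 + len > BLOCK_SIZE)
    return 0;
  b->data[b->used] = state;
  b->data[b->used + 1] = (unsigned char) len;
  memcpy (&b->data[b->used + 2], s, len);
  b->used += 2 + len;
  return 1;
}

/* Slide all marked records towards the first block and return the
   number of blocks still in use afterwards. */
static size_t
compact_blocks (struct block *blocks, size_t nblocks)
{
  size_t from_b, to_b = 0, to_pos = 0, i;

  if (nblocks == 0)
    return 0;
  for (from_b = 0; from_b < nblocks; from_b++)
    {
      size_t used = blocks[from_b].used;
      size_t from_pos = 0;
      while (from_pos < used)
        {
          unsigned char *rec = &blocks[from_b].data[from_pos];
          size_t size = 2 + (size_t) rec[1];
          if (rec[0] == REC_MARKED)
            {
              if (to_pos + size > BLOCK_SIZE)
                {               /* no room: move on to the next target block */
                  blocks[to_b].used = to_pos;
                  to_b++;
                  to_pos = 0;
                }
              if (to_b != from_b || to_pos != from_pos)
                memmove (&blocks[to_b].data[to_pos], rec, size);
              to_pos += size;
            }
          from_pos += size;     /* free and unmarked records are skipped */
        }
    }
  blocks[to_b].used = to_pos;
  for (i = to_b + 1; i < nblocks; i++)
    blocks[i].used = 0;         /* these blocks only carry garbage now */
  return to_b + 1;
}
```

Because the to-position can never overtake the from-position, a record is always moved towards the front, and @code{memmove} copes with the case where source and destination overlap within one block.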
5762 @subsection @code{sweep_strings}
5763 @cindex @code{sweep_strings}
5765 The sweeping for the fixed sized string objects is essentially exactly
5766 the same as it is for all other fixed size types. As before, the freeing
5767 into the suitable free list is done by using the macro
5768 @code{SWEEP_FIXED_SIZE_BLOCK} after defining the right macros
5769 @code{UNMARK_string} and @code{ADDITIONAL_FREE_string}. These two
5770 definitions are a little bit special compared to the ones used
5771 for the other fixed size types.
5773 @code{UNMARK_string} is defined the same way as for the other types, except for some additional code
5774 that updates the bookkeeping information.
5776 For strings, @code{ADDITIONAL_FREE_string} has to do something in
5777 addition: if the string was not allocated in a
5778 @code{string_chars_block} because it exceeded the maximal length, and
5779 was therefore @code{malloc}ed separately, we also have to @code{xfree} it explicitly.
5782 @node sweep_bit_vectors_1
5783 @subsection @code{sweep_bit_vectors_1}
5784 @cindex @code{sweep_bit_vectors_1}
5786 Bit vectors are also one of the rare types that are @code{malloc}ed
5787 individually. Consequently, while sweeping, all bit vectors that are no
5788 longer needed must be freed by hand. This is done in
5789 the expected way: since they are all registered in a list called
5790 @code{all_bit_vectors}, all elements of that list are traversed,
5791 all unmarked bit vectors are unlinked and freed by calling @code{xfree}, and
5792 all remaining ones are unmarked.
5793 In addition, the bookkeeping information used for garbage
5794 collector's output purposes is updated.
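
As a rough illustration of such a sweep over individually allocated objects, here is a minimal sketch. The node type and the names @code{all_nodes}, @code{make_node} and @code{sweep_nodes} are invented for the example and only stand in for the real bit vector code.

```c
#include <assert.h>
#include <stdlib.h>

struct node
{
  struct node *next;
  int marked;
};

static struct node *all_nodes;  /* hypothetical stand-in for all_bit_vectors */

/* Allocate a node individually and register it in the global list. */
static struct node *
make_node (void)
{
  struct node *n = malloc (sizeof *n);
  if (!n)
    abort ();
  n->marked = 0;
  n->next = all_nodes;
  all_nodes = n;
  return n;
}

/* Free every unmarked node, clear the mark on the survivors, and
   return the number of nodes that were freed. */
static size_t
sweep_nodes (void)
{
  struct node **link = &all_nodes;
  size_t freed = 0;

  while (*link)
    {
      struct node *n = *link;
      if (n->marked)
        {
          n->marked = 0;        /* survivor: ready for the next collection */
          link = &n->next;
        }
      else
        {
          *link = n->next;      /* unlink before freeing */
          free (n);
          freed++;
        }
    }
  return freed;
}
```

The returned count plays the role of the bookkeeping information mentioned above.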
5796 @node Integers and Characters
5797 @section Integers and Characters
5798 @cindex integers and characters
5799 @cindex characters, integers and
5801 Integer and character Lisp objects are created from integers using the
5802 macros @code{XSETINT()} and @code{XSETCHAR()} or the equivalent
5803 functions @code{make_int()} and @code{make_char()}. (These are actually
5804 macros on most systems.) These functions basically just do some moving
5805 of bits around, since the integral value of the object is stored
5806 directly in the @code{Lisp_Object}.
5808 @code{XSETINT()} and the like will truncate values given to them that
5809 are too big; i.e. you won't get the value you expected but the tag bits
5810 will at least be correct.
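
The following sketch shows the general idea of storing an integer directly in a tagged word, and what truncation looks like. The 32-bit word, the single low tag bit and all function names are assumptions for illustration; the actual XEmacs representation and the real @code{XSETINT()} macro are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t lisp_word;     /* hypothetical 32-bit Lisp word */

#define INT_TAG  1u             /* hypothetical: lowest bit set means integer */
#define TAG_BITS 1

/* Pack V into a tagged word; bits that do not fit are silently lost. */
static lisp_word
make_int_sketch (int32_t v)
{
  return ((uint32_t) v << TAG_BITS) | INT_TAG;
}

static int
is_int_sketch (lisp_word w)
{
  return (w & INT_TAG) != 0;
}

/* Recover the 31-bit signed value stored in W. */
static int32_t
int_value_sketch (lisp_word w)
{
  uint32_t u = w >> TAG_BITS;
  if (w & 0x80000000u)
    {
      u |= 0x80000000u;                 /* sign-extend by hand */
      return -(int32_t) (~u) - 1;       /* portable two's complement decode */
    }
  return (int32_t) u;
}
```

With this layout only 31 bits are available for the value, so @code{make_int_sketch (1073741824)} yields a word that still carries the integer tag but decodes to a different number.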
5812 @node Allocation from Frob Blocks
5813 @section Allocation from Frob Blocks
5814 @cindex allocation from frob blocks
5815 @cindex frob blocks, allocation from
5817 The uninitialized memory required by a @code{Lisp_Object} of a particular type is allocated using
5819 @code{ALLOCATE_FIXED_TYPE()}. This only occurs inside of the
5820 lowest-level object-creating functions in @file{alloc.c}:
5821 @code{Fcons()}, @code{make_float()}, @code{Fmake_byte_code()},
5822 @code{Fmake_symbol()}, @code{allocate_extent()},
5823 @code{allocate_event()}, @code{Fmake_marker()}, and
5824 @code{make_uninit_string()}. The idea is that, for each type, there are
5825 a number of frob blocks (each 2K in size); each frob block is divided up
5826 into object-sized chunks. Each frob block will have some of these
5827 chunks that are currently assigned to objects, and perhaps some that are
5828 free. (If a frob block has nothing but free chunks, it is freed at the
5829 end of the garbage collection cycle.) The free chunks are stored in a
5830 free list, which is chained by storing a pointer in the first four bytes
5831 of the chunk. (Except for the free chunks at the end of the last frob
5832 block, which are handled using an index which points past the end of the
5833 last-allocated chunk in the last frob block.)
5834 @code{ALLOCATE_FIXED_TYPE()} first tries to retrieve a chunk from the
5835 free list; if that fails, it calls
5836 @code{ALLOCATE_FIXED_TYPE_FROM_BLOCK()}, which looks at the end of the
5837 last frob block for space, and creates a new frob block if there is
5838 none. (There are actually two versions of these macros, one of which is
5839 more defensive but less efficient and is used for error-checking.)
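
A minimal sketch of this allocation strategy follows: first the free list, then the unused tail of the last block, then a fresh block. The chunk type, the block size and the function names are assumptions for the example and do not reproduce the @code{ALLOCATE_FIXED_TYPE()} macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CHUNKS_PER_BLOCK 4      /* hypothetical; real frob blocks are 2K */

/* A chunk either holds an object or, while free, the free-list link. */
union chunk
{
  union chunk *next_free;
  double payload[2];
};

struct frob_block
{
  struct frob_block *prev;
  union chunk chunks[CHUNKS_PER_BLOCK];
};

static struct frob_block *current_block;
static size_t current_index = CHUNKS_PER_BLOCK; /* forces creation of a first block */
static union chunk *free_list;

static union chunk *
allocate_chunk (void)
{
  if (free_list)
    {                           /* reuse a previously freed chunk */
      union chunk *c = free_list;
      free_list = c->next_free;
      return c;
    }
  if (current_index == CHUNKS_PER_BLOCK)
    {                           /* last block is full: create a new one */
      struct frob_block *b = malloc (sizeof *b);
      if (!b)
        abort ();
      b->prev = current_block;
      current_block = b;
      current_index = 0;
    }
  return &current_block->chunks[current_index++];
}

static void
free_chunk (union chunk *c)
{
  c->next_free = free_list;     /* the link lives inside the free chunk itself */
  free_list = c;
}
```

Storing the link inside the free chunk is what lets the free list exist without any extra memory.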
5845 [see @file{lrecord.h}]
5847 All lrecords have at the beginning of their structure a @code{struct
5848 lrecord_header}. This just contains a type number and some flags,
5849 including the mark bit. All builtin type numbers are defined as
5850 constants in @code{enum lrecord_type}, to allow the compiler to generate
5851 more efficient code for @code{@var{type}P}. The type number, through the
5852 @code{lrecord_implementations_table}, gives access to a @code{struct
5853 lrecord_implementation}, which is a structure containing method pointers
5854 and such. There is one of these for each type, and it is a global,
5855 constant, statically-declared structure that is declared in the
5856 @code{DEFINE_LRECORD_IMPLEMENTATION()} macro.
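
The relationship between header, type number and implementation structure can be sketched as follows. All type and variable names here are hypothetical and much simpler than the real declarations in @file{lrecord.h}; the sketch only shows a type number in a common header being used to index a table of constant per-type structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum sketch_type { SKETCH_TYPE_POINT, SKETCH_TYPE_LABEL, SKETCH_TYPE_COUNT };

/* Common header at the start of every object. */
struct sketch_header
{
  unsigned int type : 8;
  unsigned int mark : 1;
};

/* Per-type information; the real structure mostly holds method pointers. */
struct sketch_implementation
{
  const char *name;
  size_t static_size;
};

struct point { struct sketch_header header; int x, y; };
struct label { struct sketch_header header; char text[16]; };

static const struct sketch_implementation point_impl
  = { "point", sizeof (struct point) };
static const struct sketch_implementation label_impl
  = { "label", sizeof (struct label) };

static const struct sketch_implementation *implementations_table[SKETCH_TYPE_COUNT];

/* The table is filled at run time, from an init function. */
static void
init_implementations (void)
{
  implementations_table[SKETCH_TYPE_POINT] = &point_impl;
  implementations_table[SKETCH_TYPE_LABEL] = &label_impl;
}

/* Any object can be mapped to its implementation through its header. */
static const struct sketch_implementation *
implementation_of (const void *obj)
{
  const struct sketch_header *h = obj;
  return implementations_table[h->type];
}
```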
5858 Simple lrecords (of type (b) above) just have a @code{struct
5859 lrecord_header} at their beginning. lcrecords, however, actually have a
5860 @code{struct lcrecord_header}. This, in turn, has a @code{struct
5861 lrecord_header} at its beginning, so sanity is preserved; but it also
5862 has a pointer used to chain all lcrecords together, and a special ID
5863 field used to distinguish one lcrecord from another. (This field is used
5864 only for debugging and could be removed, but the space gain is not significant.)
5867 Simple lrecords are created using @code{ALLOCATE_FIXED_TYPE()}, just
5868 like for other frob blocks. The only change is that the implementation
5869 pointer must be initialized correctly. (The implementation structure for
5870 an lrecord, or rather the pointer to it, is named @code{lrecord_float},
5871 @code{lrecord_extent}, @code{lrecord_buffer}, etc.)
5873 lcrecords are created using @code{alloc_lcrecord()}. This takes a
5874 size to allocate and an implementation pointer. (The size needs to be
5875 passed because some lcrecords, such as window configurations, are of
5876 variable size.) This basically just @code{malloc()}s the storage,
5877 initializes the @code{struct lcrecord_header}, and chains the lcrecord
5878 onto the head of the list of all lcrecords, which is stored in the
5879 variable @code{all_lcrecords}. The calls to @code{alloc_lcrecord()}
5880 generally occur in the lowest-level allocation function for each lrecord type.
5883 Whenever you create an lrecord, you need to call either
5884 @code{DEFINE_LRECORD_IMPLEMENTATION()} or
5885 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()}. This needs to be
5886 specified in a @file{.c} file, at the top level. What this actually
5887 does is define and initialize the implementation structure for the
5888 lrecord. (And possibly declares a function @code{error_check_foo()} that
5889 implements the @code{XFOO()} macro when error-checking is enabled.) The
5890 arguments to the macros are the actual type name (this is used to
5891 construct the C variable name of the lrecord implementation structure
5892 and related structures using the @samp{##} macro concatenation
5893 operator), a string that names the type on the Lisp level (this may not
5894 be the same as the C type name; typically, the C type name has
5895 underscores, while the Lisp string has dashes), various method pointers,
5896 and the name of the C structure that contains the object. The methods
5897 are used to encapsulate type-specific information about the object, such
5898 as how to print it or mark it for garbage collection, so that it's easy
5899 to add new object types without having to add a specific case for each
5900 new type in a bunch of different places.
5902 The difference between @code{DEFINE_LRECORD_IMPLEMENTATION()} and
5903 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()} is that the former is
5904 used for fixed-size object types and the latter is for variable-size
5905 object types. Most object types are fixed-size; some complex
5906 types, however (e.g. window configurations), are variable-size.
5907 Variable-size object types have an extra method, which is called
5908 to determine the actual size of a particular object of that type.
5909 (Currently this is only used for keeping allocation statistics.)
5911 For the purpose of keeping allocation statistics, the allocation
5912 engine keeps a list of all the different types that exist. Note that,
5913 since @code{DEFINE_LRECORD_IMPLEMENTATION()} is a macro that is
5914 specified at top-level, there is no way for it to initialize the global
5915 data structures containing type information, like
5916 @code{lrecord_implementations_table}. For this reason a call to
5917 @code{INIT_LRECORD_IMPLEMENTATION} must be added to the same source file
5918 that contains @code{DEFINE_LRECORD_IMPLEMENTATION}, not at the
5919 top level but inside one of the init functions, typically
5920 @code{syms_of_@var{foo}()}. @code{INIT_LRECORD_IMPLEMENTATION} must be
5921 called before an object of this type is used.
5923 The type number is also used to index into an array holding the number
5924 of objects of each type and the total memory allocated for objects of
5925 that type. The statistics in this array are computed during the sweep
5926 stage. These statistics are returned by the call to
5927 @code{garbage-collect}.
5929 Note that for every type defined with a @code{DEFINE_LRECORD_*()}
5930 macro, there needs to be a @code{DECLARE_LRECORD_IMPLEMENTATION()}
5931 somewhere in a @file{.h} file, and this @file{.h} file needs to be
5932 included by @file{inline.c}.
5934 Furthermore, there should generally be a set of @code{XFOOBAR()},
5935 @code{FOOBARP()}, etc. macros in a @file{.h} (or occasionally @file{.c})
5936 file. To create one of these, copy an existing model and modify as necessary.
5939 @strong{Please note:} If you define an lrecord in an external
5940 dynamically-loaded module, you must use @code{DECLARE_EXTERNAL_LRECORD},
5941 @code{DEFINE_EXTERNAL_LRECORD_IMPLEMENTATION}, and
5942 @code{DEFINE_EXTERNAL_LRECORD_SEQUENCE_IMPLEMENTATION} instead of the
5943 non-EXTERNAL forms. These macros will dynamically add new type numbers
5944 to the global enum that records them, whereas the non-EXTERNAL forms
5945 assume that the programmer has already inserted the correct type numbers
5946 into the enum's code at compile-time.
5948 The various methods in the lrecord implementation structure are:
5953 A @dfn{mark} method. This is called during the marking stage and passed
5954 a function pointer (usually the @code{mark_object()} function), which is
5955 used to mark an object. All Lisp objects that are contained within the
5956 object need to be marked by applying this function to them. The mark
5957 method should also return a Lisp object, which should be either @code{nil} or
5958 an object to mark. (This can be used in lieu of calling
5959 @code{mark_object()} on the object, to reduce the recursion depth, and
5960 consequently should be the most heavily nested sub-object, such as a
5963 @strong{Please note:} When the mark method is called, garbage collection
5964 is in progress, and special precautions need to be taken when accessing
5965 objects; see section (B) above.
5967 If your mark method does not need to do anything, it can be @code{NULL}.
5971 A @dfn{print} method. This is called to create a printed representation
5972 of the object, whenever @code{princ}, @code{prin1}, or the like is
5973 called. It is passed the object, a stream to which the output is to be
5974 directed, and an @code{escapeflag} which indicates whether the object's
5975 printed representation should be @dfn{escaped} so that it is
5976 readable. (This corresponds to the difference between @code{princ} and
5977 @code{prin1}.) Basically, @dfn{escaped} means that strings will have
5978 quotes around them and confusing characters in the strings such as
5979 quotes, backslashes, and newlines will be backslashed; and that special
5980 care will be taken to make symbols print in a readable fashion
5981 (e.g. symbols that look like numbers will be backslashed). Other
5982 readable objects should perhaps pass @code{escapeflag} on when
5983 sub-objects are printed, so that readability is preserved when necessary
5984 (or if not, always pass in a 1 for @code{escapeflag}). Non-readable
5985 objects should in general ignore @code{escapeflag}, except that some use
5986 it as an indication that more verbose output should be given.
5988 Sub-objects are printed using @code{print_internal()}, which takes
5989 exactly the same arguments as are passed to the print method.
5991 Literal C strings should be printed using @code{write_c_string()},
5992 or @code{write_string_1()} for non-null-terminated strings.
5994 Print methods for objects that do not have a readable representation should check the
5995 @code{print_readably} flag and signal an error if it is set.
5997 If you specify NULL for the print method, the
5998 @code{default_object_printer()} will be used.
6001 A @dfn{finalize} method. This is called at the beginning of the sweep
6002 stage on lcrecords that are about to be freed, and should be used to
6003 perform any extra object cleanup. This typically involves freeing any
6004 extra @code{malloc()}ed memory associated with the object, releasing any
6005 operating-system and window-system resources associated with the object
6006 (e.g. pixmaps, fonts), etc.
6008 The finalize method can be NULL if nothing needs to be done.
6010 WARNING #1: The finalize method is also called at the end of the dump
6011 phase; this time with the @code{for_disksave} parameter set to non-zero. The
6012 object is @emph{not} about to disappear, so you have to make sure
6013 @emph{not} to free any extra @code{malloc()}ed memory if you're going to
6014 need it later. (Also, signal an error if there are any operating-system
6015 and window-system resources here, because they can't be dumped.)
6017 Finalize methods should, as a rule, set to zero any pointers after
6018 they've been freed, and check to make sure pointers are not zero before
6019 freeing. Although I'm pretty sure that finalize methods are not called
6020 twice on the same object (except for the @code{for_disksave} proviso),
6021 we've gotten nastily burned in some cases by not doing this.
6023 WARNING #2: The finalize method is @emph{only} called for
6024 lcrecords, @emph{not} for simple lrecords. If you need a
6025 finalize method for simple lrecords, you have to stick
6026 it in the @code{ADDITIONAL_FREE_foo()} macro in @file{alloc.c}.
6028 WARNING #3: Things are in an @emph{extremely} bizarre state
6029 when @code{ADDITIONAL_FREE_foo()} is called, so you have to
6030 be incredibly careful when writing one of these functions.
6031 See the comment in @code{gc_sweep()}. If you ever have to add
6032 one of these, consider using an lcrecord or dealing with
6033 the problem in a different fashion.
6036 An @dfn{equal} method. This compares the two objects for similarity,
6037 when @code{equal} is called. It should compare the contents of the
6038 objects in some reasonable fashion. It is passed the two objects and a
6039 @dfn{depth} value, which is used to catch circular objects. To compare
6040 sub-Lisp-objects, call @code{internal_equal()} and bump the depth value
6041 by one. If this value gets too high, a @code{circular-object} error is signaled.
6044 If this is NULL, objects are @code{equal} only when they are @code{eq}.
6048 A @dfn{hash} method. This is used to hash objects when they are to be
6049 compared with @code{equal}. The rule here is that if two objects are
6050 @code{equal}, they @emph{must} hash to the same value; i.e. your hash
6051 function should use some subset of the sub-fields of the object that are
6052 compared in the ``equal'' method. If you specify this method as
6053 @code{NULL}, the object's pointer will be used as the hash, which will
6054 @emph{fail} if the object has an @code{equal} method, so don't do this.
6056 To hash a sub-Lisp-object, call @code{internal_hash()}. Bump the
6057 depth by one, just like in the ``equal'' method.
6059 To convert a Lisp object directly into a hash value (using
6060 its pointer), use @code{LISP_HASH()}. This is what happens when
6061 the hash method is NULL.
6063 To hash two or more values together into a single value, use
6064 @code{HASH2()}, @code{HASH3()}, @code{HASH4()}, etc.
6067 @dfn{getprop}, @dfn{putprop}, @dfn{remprop}, and @dfn{plist} methods.
6068 These are used for object types that have properties. I don't feel like
6069 documenting them here. If you create one of these objects, you have to
6070 use different macros to define them,
6071 i.e. @code{DEFINE_LRECORD_IMPLEMENTATION_WITH_PROPS()} or
6072 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION_WITH_PROPS()}.
6075 A @dfn{size_in_bytes} method, when the object is of variable size
6076 (i.e. declared with a @code{_SEQUENCE_IMPLEMENTATION} macro). This should
6077 simply return the object's size in bytes, exactly as you might expect.
6078 For an example, see the methods for window configurations and opaques.
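
To make the rule connecting the equal and hash methods concrete, here is a small sketch. The object type and all function names are invented for illustration, and @code{hash2_sketch} is merely a stand-in combiner, not the definition of @code{HASH2()}. The point is only that the hash uses exactly the fields that the comparison looks at.

```c
#include <assert.h>

/* Hypothetical object: `id' is a debugging field that equality ignores. */
struct sketch_range
{
  int id;
  int start;
  int end;
};

/* Stand-in for a two-value hash combiner. */
static unsigned long
hash2_sketch (unsigned long a, unsigned long b)
{
  return a * 31UL + b;
}

/* Equal method: compares the contents, not the identity. */
static int
range_equal (const struct sketch_range *x, const struct sketch_range *y)
{
  return x->start == y->start && x->end == y->end;
}

/* Hash method: built only from fields that range_equal compares, so
   objects that are equal always hash to the same value. */
static unsigned long
range_hash (const struct sketch_range *x)
{
  return hash2_sketch ((unsigned long) x->start, (unsigned long) x->end);
}
```

Had @code{range_hash} included the @code{id} field, two equal objects could land in different hash buckets, which is the failure described above for a NULL hash method combined with an equal method.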
6081 @node Low-level allocation
6082 @section Low-level allocation
6083 @cindex low-level allocation
6084 @cindex allocation, low-level
6086 Memory that you want to allocate directly should be allocated using
6087 @code{xmalloc()} rather than @code{malloc()}. This implements
6088 error-checking on the return value, and once upon a time did some more
6089 vital stuff (i.e. @code{BLOCK_INPUT}, which is no longer necessary).
6090 Free using @code{xfree()}, and realloc using @code{xrealloc()}. Note
6091 that @code{xmalloc()} will do a non-local exit if the memory can't be
6092 allocated. (Many functions, however, do not expect this, and thus XEmacs
6093 will likely crash if this happens. @strong{This is a bug.} If you can,
6094 you should strive to make your function handle this OK. However, it's
6095 difficult in the general circumstance, perhaps requiring extra
6096 unwind-protects and such.)
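
The behaviour of an allocator that never returns a null pointer, but leaves through a non-local exit instead, can be sketched with @code{setjmp()}/@code{longjmp()}. This is only an illustration of the control flow: the names are hypothetical, the underlying allocator is passed in as a parameter so that failure can be simulated, and the real @code{xmalloc()} uses the XEmacs error machinery rather than a bare @code{longjmp()}.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

typedef void *(*alloc_fn) (size_t);

static jmp_buf out_of_memory;   /* hypothetical recovery point */

/* Never returns NULL: on failure, control leaves through longjmp. */
static void *
checked_alloc (alloc_fn raw, size_t size)
{
  void *p = raw (size ? size : 1);
  if (!p)
    longjmp (out_of_memory, 1);
  return p;
}

/* An allocator that always fails, to exercise the failure path. */
static void *
always_fail (size_t size)
{
  (void) size;
  return NULL;
}

/* Returns 1 if the allocation succeeded, 0 if control came back
   through the non-local exit. */
static int
try_allocate (alloc_fn raw, size_t size)
{
  if (setjmp (out_of_memory) != 0)
    return 0;
  free (checked_alloc (raw, size));
  return 1;
}
```

A caller that holds other resources when the exit happens gets no chance to release them unless it has arranged for that in advance, which is why handling this case properly is difficult.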
6098 Note that XEmacs provides two separate replacements for the standard
6099 @code{malloc()} library function. These are called @dfn{old GNU malloc}
6100 (@file{malloc.c}) and @dfn{new GNU malloc} (@file{gmalloc.c}),
6101 respectively. New GNU malloc is better in pretty much every way than
6102 old GNU malloc, and should be used if possible. (It used to be that on
6103 some systems, the old one worked but the new one didn't. I think this
6104 was due specifically to a bug in SunOS, which the new one now works
6105 around; so I don't think the old one ever has to be used any more.) The
6106 primary difference between both of these mallocs and the standard system
6107 malloc is that they are much faster, at the expense of increased space.
6108 The basic idea is that memory is allocated in fixed chunks of powers of
6109 two. This allows for basically constant malloc time, since the various
6110 chunks can just be kept on a number of free lists. (The standard system
6111 malloc typically allocates arbitrary-sized chunks and has to spend some
6112 time, sometimes a significant amount of time, walking the heap looking
6113 for a free block to use and cleaning things up.) The new GNU malloc
6114 improves on things by allocating large objects in chunks of 4096 bytes
6115 rather than in ever larger powers of two, which results in ever larger
6116 wastage. There is a slight speed loss here, but it's of doubtful significance.)
6119 NOTE: Apparently there is a third-generation GNU malloc that is
6120 significantly better than the new GNU malloc, and should probably
6121 be included in XEmacs.
6123 There is also the relocating allocator, @file{ralloc.c}. This actually
6124 moves blocks of memory around so that the @code{sbrk()} pointer shrunk
6125 and virtual memory released back to the system. On some systems,
6126 this is a big win. On all systems, it causes a noticeable (and
6127 sometimes huge) speed penalty, so I turn it off by default.
6128 @file{ralloc.c} only works with the new GNU malloc in @file{gmalloc.c}.
6129 There are also two versions of @file{ralloc.c}, one that uses @code{mmap()}
6130 rather than block copies to move data around. This purports to
6131 be faster, although that depends on the amount of data that would
6132 have had to be block copied and the system-call overhead for
6133 @code{mmap()}. I don't know exactly how this works, except that the
6134 relocating-allocation routines are pretty much used only for
6135 the memory allocated for a buffer, which is the biggest consumer
6136 of space, esp. of space that may get freed later.
6138 Note that the GNU mallocs have some ``memory warning'' facilities.
6139 XEmacs taps into them and issues a warning through the standard
6140 warning system, when memory gets to 75%, 85%, and 95% full.
6141 (On some systems, the memory warnings are not functional.)
6143 Allocated memory that is going to be used to make a Lisp object
6144 is created using @code{allocate_lisp_storage()}. This just calls
6145 @code{xmalloc()}. It used to verify that the pointer to the memory can
6146 fit into a Lisp word, before the current Lisp object representation was
6147 introduced. @code{allocate_lisp_storage()} is called by
6148 @code{alloc_lcrecord()}, @code{ALLOCATE_FIXED_TYPE()}, and the vector
6149 and bit-vector creation routines. These routines also call
6150 @code{INCREMENT_CONS_COUNTER()} at the appropriate times; this keeps
6151 statistics on how much memory is allocated, so that garbage-collection
6152 can be invoked when the threshold is reached.
6158 Conses are allocated in standard frob blocks. The only thing to
6159 note is that conses can be explicitly freed using @code{free_cons()}
6160 and associated functions @code{free_list()} and @code{free_alist()}. This
6161 immediately puts the conses onto the cons free list, and decrements
6162 the statistics on memory allocation appropriately. This is used
6163 to good effect by some extremely commonly-used code, to avoid
6164 generating extra objects and thereby triggering GC sooner.
6165 However, you have to be @emph{extremely} careful when doing this.
6166 If you mess this up, you will get BADLY BURNED, and it has happened before.
6173 As mentioned above, each vector is @code{malloc()}ed individually, and
6174 all are threaded through the variable @code{all_vectors}. Vectors are
6175 marked strangely during garbage collection, by kludging the size field.
6176 Note that the @code{struct Lisp_Vector} is declared with its
6177 @code{contents} field being a @emph{stretchy} array of one element. It
6178 is actually @code{malloc()}ed with the right size, however, and access
6179 to any element through the @code{contents} array works fine.
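
The idea of a structure whose last member is allocated with the size actually needed can be sketched as below. The sketch uses a C99 flexible array member, which is the standard way to express what the one-element ``stretchy'' array does; the structure and function names are hypothetical and the element type is simplified to @code{int}.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct sketch_vector
{
  size_t size;
  int contents[];               /* C99 flexible array member */
};

/* Allocate a vector of N elements in one malloc call and fill it. */
static struct sketch_vector *
make_sketch_vector (size_t n, int init)
{
  struct sketch_vector *v
    = malloc (sizeof (struct sketch_vector) + n * sizeof (int));
  size_t i;

  if (!v)
    abort ();
  v->size = n;
  for (i = 0; i < n; i++)
    v->contents[i] = init;
  return v;
}
```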
6186 Bit vectors work exactly like vectors, except for more complicated
6187 code to access an individual bit, and except for the fact that bit
6188 vectors are lrecords while vectors are not. (The only difference here is
6189 that there's an lrecord implementation pointer at the beginning and the
6190 tag field in bit vector Lisp words is ``lrecord'' rather than ``vector''.)
6197 Symbols are also allocated in frob blocks. Symbols in the awful
6198 horrible obarray structure are chained through their @code{next} field.
6200 Remember that @code{intern} looks up a symbol in an obarray, creating one if necessary.
6207 Markers are allocated in frob blocks, as usual. They are kept
6208 in a buffer unordered, but in a doubly-linked list so that they
6209 can easily be removed. (Formerly this was a singly-linked list,
6210 but in some cases garbage collection took an extraordinarily
6211 long time due to the O(N^2) time required to remove lots of
6212 markers from a buffer.) Markers are removed from a buffer in
6213 the finalize stage, in @code{ADDITIONAL_FREE_marker()}.
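
The reason a doubly-linked list makes removal cheap is easy to see in a small sketch. The structures and function names are made up for the example; they only model a buffer that owns an unordered list of markers.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_marker
{
  struct sketch_marker *prev, *next;
  int position;
};

struct sketch_buffer
{
  struct sketch_marker *markers;    /* unordered list of markers */
};

/* Put M at the head of BUF's marker list. */
static void
attach_marker (struct sketch_buffer *buf, struct sketch_marker *m)
{
  m->prev = NULL;
  m->next = buf->markers;
  if (buf->markers)
    buf->markers->prev = m;
  buf->markers = m;
}

/* Remove M in constant time; no walk over the list is needed
   because M knows its own predecessor. */
static void
detach_marker (struct sketch_buffer *buf, struct sketch_marker *m)
{
  if (m->next)
    m->next->prev = m->prev;
  if (m->prev)
    m->prev->next = m->next;
  else
    buf->markers = m->next;
  m->prev = m->next = NULL;
}
```

With a singly-linked list, each removal would have to search for the predecessor, so removing N markers costs on the order of N^2 steps in the worst case.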
6219 As mentioned above, strings are a special case. A string is logically
6220 two parts, a fixed-size object (containing the length, property list,
6221 and a pointer to the actual data), and the actual data in the string.
6222 The fixed-size object is a @code{struct Lisp_String} and is allocated in
6223 frob blocks, as usual. The actual data is stored in special
6224 @dfn{string-chars blocks}, which are 8K blocks of memory.
6225 Currently-allocated strings are simply laid end to end in these
6226 string-chars blocks, with a pointer back to the @code{struct Lisp_String}
6227 stored before each string in the string-chars block. When a new string
6228 needs to be allocated, the remaining space at the end of the last
6229 string-chars block is used if there's enough, and a new string-chars
6230 block is created otherwise.
6232 There are never any holes in the string-chars blocks due to the string
6233 compaction and relocation that happens at the end of garbage collection.
6234 During the sweep stage of garbage collection, when objects are
6235 reclaimed, the garbage collector goes through all string-chars blocks,
6236 looking for unused strings. Each chunk of string data is preceded by a
6237 pointer to the corresponding @code{struct Lisp_String}, which indicates
6238 both whether the string is used and how big the string is, i.e. how to
6239 get to the next chunk of string data. Holes are compressed by
6240 block-copying the next string into the empty space and relocating the
6241 pointer stored in the corresponding @code{struct Lisp_String}.
6242 @strong{This means you have to be careful with strings in your code.}
6243 See the section above on @code{GCPRO}ing.
6245 Note that there is one situation not handled: a string that is too big
6246 to fit into a string-chars block. Such strings, called @dfn{big
6247 strings}, are all @code{malloc()}ed as their own block. (#### Although it
6248 would make more sense for the threshold for big strings to be somewhat
6249 lower, e.g. 1/2 or 1/4 the size of a string-chars block. It seems that
6250 this was indeed the case formerly---indeed, the threshold was set at
6251 1/8---but Mly forgot about this when rewriting things for 19.8.)
6253 Note also that the string data in string-chars blocks is padded as
6254 necessary so that proper alignment constraints on the @code{struct
6255 Lisp_String} back pointers are maintained.
6257 Finally, strings can be resized. This happens in Mule when a
6258 character is substituted with a different-length character, or during
6259 modeline frobbing. (You could also export this to Lisp, but it's not
6260 done so currently.) Resizing a string is a potentially tricky process.
6261 If the change is small enough that the padding can absorb it, nothing
6262 other than a simple memory move needs to be done. Keep in mind,
6263 however, that the string can't shrink too much because the offset to the
6264 next string in the string-chars block is computed by looking at the
6265 length and rounding to the nearest multiple of four or eight. If the
6266 string would shrink or expand beyond the correct padding, new string
6267 data needs to be allocated at the end of the last string-chars block and
6268 the data moved appropriately. This leaves some dead string data, which
6269 is marked by putting a special marker of 0xFFFFFFFF in the @code{struct
6270 Lisp_String} pointer before the data (there's no real @code{struct
6271 Lisp_String} to point to and relocate), and storing the size of the dead
6272 string data (which would normally be obtained from the now-non-existent
6273 @code{struct Lisp_String}) at the beginning of the dead string data gap.
6274 The string compactor recognizes this special 0xFFFFFFFF marker and
6275 handles it correctly.
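
The role of the padding in resizing can be sketched with a little arithmetic. This assumes, for illustration only, that string data is padded to a multiple of the pointer size; the helper names are hypothetical and the exact rounding used by XEmacs may differ.

```c
#include <assert.h>
#include <stddef.h>

#define ALIGNMENT sizeof (void *)   /* assumed alignment of the back pointers */

/* Space occupied by LEN bytes of string data once padding is added. */
static size_t
padded_size (size_t len)
{
  return (len + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
}

/* A resize can be done in place only if the padded size is unchanged,
   because the padded size determines where the next string starts. */
static int
resize_fits_in_place (size_t old_len, size_t new_len)
{
  return padded_size (old_len) == padded_size (new_len);
}
```

For example, growing a string from 1 to 2 bytes stays within the same padded size, while growing it past a multiple of the alignment does not and requires new string data to be allocated.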
6277 @node Compiled Function
6278 @section Compiled Function
6279 @cindex compiled function
6280 @cindex function, compiled
6285 @node Dumping, Events and the Event Loop, Allocation of Objects in XEmacs Lisp, Top
6289 @section What is dumping and its justification
6290 @cindex dumping and its justification, what is
6292 The C code of XEmacs is just a Lisp engine with a lot of built-in
6293 primitives useful for writing an editor. The editor itself is written
6294 mostly in Lisp, and represents around 100K lines of code. Loading and
6295 executing the initialization of all this code takes a bit of time (five
6296 to ten times the usual startup time of current xemacs) and requires
6297 having all the lisp source files around. Having to reload them each
6298 time the editor is started would not be acceptable.
6300 The traditional solution to this problem is called dumping: the build
6301 process first creates the lisp engine under the name @file{temacs}, then
6302 runs it until it has finished loading and initializing all the lisp
6303 code, and eventually creates a new executable called @file{xemacs}
6304 including both the object code in @file{temacs} and all the contents of
6305 the memory after the initialization.
6307 This solution, while working, has a huge problem: the creation of the
6308 new executable from the actual contents of memory is an extremely
6309 system-specific process, quite error-prone, and which interferes with a
6310 lot of system libraries (like malloc). It is even getting worse
6311 nowadays with libraries using constructors which are automatically
6312 called when the program is started (even before main()) which tend to
6313 crash when they are called multiple times, once before dumping and once
6314 after (IRIX 6.x libz.so pulls in some C++ image libraries thru
6315 dependencies which have this problem). Writing the dumper is also one
6316 of the most difficult parts of porting XEmacs to a new operating system.
6317 Basically, `dumping' is an operation that is just not officially
6318 supported on many operating systems.
6320 The aim of the portable dumper is to solve the same problem as the
6321 system-specific dumper, that is to be able to reload quickly, using only
6322 a small number of files, the fully initialized lisp part of the editor,
6323 without any system-specific hacks.
6327 * Data descriptions::
6330 * Remaining issues::
6335 @cindex dumping overview
6337 The portable dumping system has to:
6341 At dump time, write all initialized, non-quickly-rebuildable data to a
6342 file [Note: currently named @file{xemacs.dmp}, but the name will
6343 change], along with all the information needed for the reloading.
6346 When starting xemacs, reload the dump file, relocate it to its new
6347 starting address if needed, and reinitialize all pointers to this
6348 data. Also, rebuild all the quickly rebuildable data.
6351 @node Data descriptions
6352 @section Data descriptions
6353 @cindex dumping data descriptions
6355 The more complex task of the dumper is to be able to write lisp objects
6356 (lrecords) and C structs to disk and reload them at a different address,
6357 updating all the pointers they include in the process. This is done by
6358 using external data descriptions that give information about the layout
6359 of the structures in memory.
6361 The specification of these descriptions is in @file{lrecord.h}. A description
6362 of an lrecord is an array of @code{struct lrecord_description}. Each of these
6363 structs includes a type, an offset in the structure and some optional
6364 parameters depending on the type. For instance, here is the string description:
6368 static const struct lrecord_description string_description[] = @{
6369 @{ XD_BYTECOUNT, offsetof (Lisp_String, size) @},
6370 @{ XD_OPAQUE_DATA_PTR, offsetof (Lisp_String, data), XD_INDIRECT(0, 1) @},
6371 @{ XD_LISP_OBJECT, offsetof (Lisp_String, plist) @},
The first line indicates a member of type Bytecount, which is used by
the next, indirect directive. The second means ``there is a pointer to
some opaque data in the field @code{data}''. The length of said data is
given by the expression @code{XD_INDIRECT(0, 1)}, which means ``the value
in the 0th line of the description (welcome to C) plus one''. The third
line means ``there is a Lisp_Object member @code{plist} in the Lisp_String
structure''. @code{XD_END} then ends the description.
6384 This gives us all the information we need to move around what is pointed
6385 to by a structure (C or lrecord) and, by transitivity, everything that
6386 it points to. The only missing information for dumping is the size of
6387 the structure. For lrecords, this is part of the
6388 lrecord_implementation, so we don't need to duplicate it. For C
6389 structures we use a struct struct_description, which includes a size
6390 field and a pointer to an associated array of lrecord_description.
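As a rough illustration of how such a description can drive a traversal,
here is a minimal, self-contained sketch. All names in it
(@code{toy_field_desc}, @code{toy_string}, @code{toy_opaque_length},
@code{toy_count_pointer_fields}) are hypothetical simplifications made up
for this example; they are not the definitions found in @file{lrecord.h}.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the XD_* description types. */
enum toy_desc_type { TOY_END, TOY_BYTECOUNT, TOY_OPAQUE_DATA_PTR, TOY_LISP_OBJECT };

struct toy_field_desc {
  enum toy_desc_type type;
  size_t offset;     /* offset of the member inside the structure */
  int size_line;     /* for opaque data: description line holding the length */
  long size_delta;   /* ... plus this constant */
};

/* A toy structure laid out like the string example above. */
struct toy_string {
  long size;
  char *data;
  void *plist;
};

static const struct toy_field_desc toy_string_description[] = {
  { TOY_BYTECOUNT,       offsetof (struct toy_string, size),  0, 0 },
  { TOY_OPAQUE_DATA_PTR, offsetof (struct toy_string, data),  0, 1 },
  { TOY_LISP_OBJECT,     offsetof (struct toy_string, plist), 0, 0 },
  { TOY_END, 0, 0, 0 }
};

/* Length of the opaque data described by line `pos': the value stored in
   the referenced byte-count member plus the constant delta. */
long
toy_opaque_length (const void *obj, const struct toy_field_desc *desc, int pos)
{
  const struct toy_field_desc *ref = &desc[desc[pos].size_line];
  assert (ref->type == TOY_BYTECOUNT);
  return *(const long *) ((const char *) obj + ref->offset)
    + desc[pos].size_delta;
}

/* Count the members that hold something the dumper would have to follow. */
int
toy_count_pointer_fields (const struct toy_field_desc *desc)
{
  int n = 0;
  for (; desc->type != TOY_END; desc++)
    if (desc->type == TOY_OPAQUE_DATA_PTR || desc->type == TOY_LISP_OBJECT)
      n++;
  return n;
}
```

The point of the sketch is only that the walker never needs to know the
C type it is visiting: the offsets and the indirect length reference carry
everything required.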
6393 @section Dumping phase
6394 @cindex dumping phase
6396 Dumping is done by calling the function pdump() (in dumper.c) which is
6397 invoked from Fdump_emacs (in emacs.c). This function performs a number
6401 * Object inventory::
6402 * Address allocation::
6405 * Pointers dumping::
6408 @node Object inventory
6409 @subsection Object inventory
6410 @cindex dumping object inventory
6412 The first task is to build the list of the objects to dump. This
6420 We end up with one @code{pdump_entry_list_elmt} per object group (arrays
6421 of C structs are kept together) which includes a pointer to the first
6422 object of the group, the per-object size and the count of objects in the
6423 group, along with some other information which is initialized later.
6425 These entries are linked together in @code{pdump_entry_list} structures
6426 and can be enumerated thru either:
6430 the @code{pdump_object_table}, an array of @code{pdump_entry_list}, one
6431 per lrecord type, indexed by type number.
6434 the @code{pdump_opaque_data_list}, used for the opaque data which does
6435 not include pointers, and hence does not need descriptions.
6438 the @code{pdump_struct_table}, which is a vector of
6439 @code{struct_description}/@code{pdump_entry_list} pairs, used for
6440 non-opaque C structures.
6443 This uses a marking strategy similar to the garbage collector. Some
6448 We do not use the mark bit (which does not exist for C structures
6449 anyway); we use a big hash table instead.
6452 We do not use the mark function of lrecords but instead rely on the
6453 external descriptions. This happens essentially because we need to
6454 follow pointers to C structures and opaque data in addition to
6455 Lisp_Object members.
6458 This is done by @code{pdump_register_object()}, which handles Lisp_Object
6459 variables, and @code{pdump_register_struct()} which handles C structures,
6460 which both delegate the description management to @code{pdump_register_sub()}.
The hash table doubles as a map from objects to their
@code{pdump_entry_list_elmt} (i.e. it allows us to look up a
@code{pdump_entry_list_elmt} given the object it points
to). Entries are added with @code{pdump_add_entry()} and looked up with
6465 @code{pdump_get_entry()}. There is no need for entry removal. The hash
6466 value is computed quite simply from the object pointer by
6467 @code{pdump_make_hash()}.
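The following sketch shows one plausible shape for such an add-only,
pointer-keyed table. It assumes open addressing with linear probing and a
fixed table size; these choices, and every @code{toy_} name, are
assumptions made for illustration and may differ from what
@file{dumper.c} actually does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_HASH_SIZE 1024   /* hypothetical fixed size */

struct toy_entry {
  const void *obj;   /* key: address of the registered object */
  size_t size;       /* per-object size */
  int count;         /* number of objects in the group */
};

static struct toy_entry *toy_table[TOY_HASH_SIZE];
static size_t toy_used;

/* The hash depends only on the object pointer. */
size_t
toy_make_hash (const void *obj)
{
  return (size_t) (((uintptr_t) obj >> 3) % TOY_HASH_SIZE);
}

/* Return the entry registered for `obj', or NULL if there is none. */
struct toy_entry *
toy_get_entry (const void *obj)
{
  size_t pos = toy_make_hash (obj);
  while (toy_table[pos] != NULL)
    {
      if (toy_table[pos]->obj == obj)
        return toy_table[pos];
      pos = (pos + 1) % TOY_HASH_SIZE;
    }
  return NULL;
}

/* Register an entry.  Entries are never removed, so probing stays simple. */
void
toy_add_entry (struct toy_entry *e)
{
  size_t pos = toy_make_hash (e->obj);
  /* Keep at least one free slot so that probing always terminates. */
  assert (toy_used < TOY_HASH_SIZE - 1);
  while (toy_table[pos] != NULL)
    pos = (pos + 1) % TOY_HASH_SIZE;
  toy_table[pos] = e;
  toy_used++;
}
```

Because the same lookup answers both ``has this object been seen?'' and
``where is its entry?'', no separate mark bit is needed.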
6469 The roots for the marking are:
6473 the @code{staticpro}'ed variables (there is a special @code{staticpro_nodump()}
6474 call for protected variables we do not want to dump).
6477 the variables registered via @code{dump_add_root_object}
6478 (@code{staticpro()} is equivalent to @code{staticpro_nodump()} +
6479 @code{dump_add_root_object()}).
6482 the variables registered via @code{dump_add_root_struct_ptr}, each of
6483 which points to a C structure.
This does not include the GCPRO'ed variables, the specbinds, the
catchtags, the backlist, the redisplay or the profiling info, since we
do not want to rebuild the actual chain of lisp calls that leads up to
the dump-emacs call, only the global variables.
6491 Weak lists and weak hash tables are dumped as if they were their
6492 non-weak equivalent (without changing their type, of course). This has
6493 not yet been a problem.
6495 @node Address allocation
6496 @subsection Address allocation
6497 @cindex dumping address allocation
6500 The next step is to allocate the offsets of each of the objects in the
6501 final dump file. This is done by @code{pdump_allocate_offset()} which
6502 is called indirectly by @code{pdump_scan_by_alignment()}.
6504 The strategy to deal with alignment problems uses these facts:
6508 real world alignment requirements are powers of two.
the C compiler is required to adjust the size of a struct so that you
can have an array of them next to each other. This means you can obtain an
upper bound on the alignment requirement of a given structure by
finding the largest power of two that divides its size.
6517 the non-variant part of variable size lrecords has an alignment
6521 Hence, for each lrecord type, C struct type or opaque data block the
6522 alignment requirement is computed as a power of two, with a minimum of
6523 2^2 for lrecords. @code{pdump_scan_by_alignment()} then scans all the
6524 @code{pdump_entry_list_elmt}'s, the ones with the highest requirements
6525 first. This ensures the best packing.
6527 The maximum alignment requirement we take into account is 2^8.
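The alignment rule can be sketched in a few lines of C. The function
names below are hypothetical, and the rounding step in
@code{toy_allocate_offset()} is a safety net added for this example: when
objects are visited from the highest alignment requirement down and every
size is a multiple of its alignment, no padding should be needed.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ALIGN_SHIFT 8   /* 2^8 is the largest alignment considered */

/* Largest k (capped at 8) such that 2^k divides `size'. */
int
toy_align_shift (size_t size)
{
  int shift = 0;
  assert (size > 0);
  while (shift < TOY_MAX_ALIGN_SHIFT
         && (size & ((size_t) 1 << shift)) == 0)
    shift++;
  return shift;
}

/* Same, but with the minimum of 2^2 that applies to lrecords. */
int
toy_lrecord_align_shift (size_t size)
{
  int shift = toy_align_shift (size);
  return shift < 2 ? 2 : shift;
}

/* Linear allocation: round *cur up to a multiple of 2^shift, hand out
   that offset, and advance *cur past the object. */
size_t
toy_allocate_offset (size_t *cur, size_t size, int shift)
{
  size_t mask = ((size_t) 1 << shift) - 1;
  size_t offset = (*cur + mask) & ~mask;
  *cur = offset + size;
  return offset;
}
```

For instance, a 24-byte structure gets an alignment of 2^3, since 8 is
the largest power of two dividing 24.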
6529 @code{pdump_allocate_offset()} only has to do a linear allocation,
6530 starting at offset 256 (this leaves room for the header and keeps the
6534 @subsection The header
6535 @cindex dumping, the header
6537 The next step creates the file and writes a header with a signature and
6538 some random information in it. The @code{reloc_address} field, which
6539 indicates at which address the file should be loaded if we want to avoid
6540 post-reload relocation, is set to 0. It then seeks to offset 256 (base
6541 offset for the objects).
6544 @subsection Data dumping
6545 @cindex data dumping
6546 @cindex dumping, data
The data is dumped by @code{pdump_dump_data()}, called from
@code{pdump_scan_by_alignment()}, in the same order as the addresses were
allocated. This function copies the data to a temporary buffer, relocates all
pointers in the object to the addresses allocated in the address
allocation step, and writes it to the file. Using the same order means that,
if we are careful with lrecords whose size is not a multiple of 4, we
are assured that the object is always written at the offset in the file
allocated in the address allocation step.
6557 @node Pointers dumping
6558 @subsection Pointers dumping
6559 @cindex pointers dumping
6560 @cindex dumping, pointers
A bunch of tables needed to properly reassign the global pointers are
6563 then written. They are:
6567 the pdump_root_struct_ptrs dynarr
6569 the pdump_opaques dynarr
6571 a vector of all the offsets to the objects in the file that include a
6572 description (for faster relocation at reload time)
6574 the pdump_root_objects and pdump_weak_object_chains dynarrs.
6577 For each of the dynarrs we write both the pointer to the variables and
6578 the relocated offset of the object they point to. Since these variables
6579 are global, the pointers are still valid when restarting the program and
6580 are used to regenerate the global pointers.
6582 The @code{pdump_weak_object_chains} dynarr is a special case. The
6583 variables it points to are the head of weak linked lists of lisp objects
6584 of the same type. Not all objects of this list are dumped so the
6585 relocated pointer we associate with them points to the first dumped
6586 object of the list, or Qnil if none is available. This is also the
6587 reason why they are not used as roots for the purpose of object
6590 Some very important information like the @code{staticpros} and
6591 @code{lrecord_implementations_table} are handled indirectly using
6592 @code{dump_add_opaque} or @code{dump_add_root_struct_ptr}.
6594 This is the end of the dumping part.
6596 @node Reloading phase
6597 @section Reloading phase
6598 @cindex reloading phase
6599 @cindex dumping, reloading phase
6601 @subsection File loading
6602 @cindex dumping, file loading
The file is mmap'ed into memory (which ensures a PAGESIZE alignment, at
least 4096), or if mmap is unavailable or fails, a 256-byte-aligned
malloc is done and the file is loaded.
6608 Some variables are reinitialized from the values found in the header.
6610 The difference between the actual loading address and the reloc_address
6611 is computed and will be used for all the relocations.
6614 @subsection Putting back the pdump_opaques
6615 @cindex dumping, putting back the pdump_opaques
6617 The memory contents are restored in the obvious and trivial way.
6620 @subsection Putting back the pdump_root_struct_ptrs
6621 @cindex dumping, putting back the pdump_root_struct_ptrs
6623 The variables pointed to by pdump_root_struct_ptrs in the dump phase are
6624 reset to the right relocated object addresses.
6627 @subsection Object relocation
6628 @cindex dumping, object relocation
6630 All the objects are relocated using their description and their offset
6631 by @code{pdump_reloc_one}. This step is unnecessary if the
6632 reloc_address is equal to the file loading address.
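To make the relocation arithmetic concrete, here is a small sketch. It
assumes that each object's pointer members are known as a plain array of
offsets and that null pointers are left untouched; both are
simplifications chosen for this example, and @code{toy_reloc_delta()} and
@code{toy_reloc_one()} are hypothetical names, not the real
@code{pdump_reloc_one}.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Difference between where the dump was actually loaded and the address
   recorded in the header. */
ptrdiff_t
toy_reloc_delta (const void *load_address, uintptr_t reloc_address)
{
  return (ptrdiff_t) ((uintptr_t) load_address - reloc_address);
}

/* Add `delta' to every pointer member of `obj' listed in `ptr_offsets'.
   Assumes pointers and uintptr_t have the same size. */
void
toy_reloc_one (void *obj, const size_t *ptr_offsets, size_t n,
               ptrdiff_t delta)
{
  size_t i;
  assert (obj != NULL);
  for (i = 0; i < n; i++)
    {
      unsigned char *field = (unsigned char *) obj + ptr_offsets[i];
      uintptr_t value;
      memcpy (&value, field, sizeof value);
      if (value != 0)   /* leave null pointers alone */
        {
          value += (uintptr_t) delta;
          memcpy (field, &value, sizeof value);
        }
    }
}
```

When the delta is zero the loop changes nothing, which matches the remark
that the whole step can be skipped in that case.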
6635 @subsection Putting back the pdump_root_objects and pdump_weak_object_chains
6636 @cindex dumping, putting back the pdump_root_objects and pdump_weak_object_chains
6638 Same as Putting back the pdump_root_struct_ptrs.
6641 @subsection Reorganize the hash tables
6642 @cindex dumping, reorganize the hash tables
6644 Since some of the hash values in the lisp hash tables are
6645 address-dependent, their layout is now wrong. So we go through each of
6646 them and have them resorted by calling @code{pdump_reorganize_hash_table}.
6648 @node Remaining issues
6649 @section Remaining issues
6650 @cindex dumping, remaining issues
The build process will have to start a post-dump xemacs, ask it for the
loading address (which will, hopefully, always be the same between
6654 different xemacs invocations) and relocate the file to the new address.
6655 This way the object relocation phase will not have to be done, which
6656 means no writes in the objects and that, because of the use of mmap, the
6657 dumped data will be shared between all the xemacs running on the
6660 Some executable signature will be necessary to ensure that a given dump
6661 file is really associated with a given executable, or random crashes
6662 will occur. Maybe a random number set at compile or configure time thru
6663 a define. This will also allow for having differently-compiled xemacsen
on the same system (mule and no-mule come to mind).
6666 The DOC file contents should probably end up in the dump file.
6669 @node Events and the Event Loop, Evaluation; Stack Frames; Bindings, Dumping, Top
6670 @chapter Events and the Event Loop
6671 @cindex events and the event loop
6672 @cindex event loop, events and the
6675 * Introduction to Events::
6677 * Specifics of the Event Gathering Mechanism::
6678 * Specifics About the Emacs Event::
6679 * The Event Stream Callback Routines::
6680 * Other Event Loop Functions::
6681 * Converting Events::
6682 * Dispatching Events; The Command Builder::
6685 @node Introduction to Events
6686 @section Introduction to Events
6687 @cindex events, introduction to
6689 An event is an object that encapsulates information about an
6690 interesting occurrence in the operating system. Events are
6691 generated either by user action, direct (e.g. typing on the
6692 keyboard or moving the mouse) or indirect (moving another
6693 window, thereby generating an expose event on an Emacs frame),
6694 or as a result of some other typically asynchronous action happening,
6695 such as output from a subprocess being ready or a timer expiring.
6696 Events come into the system in an asynchronous fashion (typically
6697 through a callback being called) and are converted into a
6698 synchronous event queue (first-in, first-out) in a process that
6699 we will call @dfn{collection}.
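The idea of collection can be sketched as a small first-in, first-out
queue that asynchronous sources append to and a synchronous reader drains.
The types and functions below (@code{toy_event}, @code{toy_queue},
@code{toy_enqueue()}, @code{toy_dequeue()}) are invented for this
illustration and assume a fixed-capacity ring buffer; they do not
correspond to any XEmacs structure.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_QUEUE_CAP 16

struct toy_event {
  int type;   /* e.g. keypress, process output, timeout */
  int data;
};

struct toy_queue {
  struct toy_event items[TOY_QUEUE_CAP];
  int head;    /* index of the oldest event */
  int count;   /* number of queued events */
};

void
toy_queue_init (struct toy_queue *q)
{
  q->head = 0;
  q->count = 0;
}

/* Called by a collector; returns 0 if the queue is full. */
int
toy_enqueue (struct toy_queue *q, struct toy_event ev)
{
  assert (q != NULL);
  if (q->count == TOY_QUEUE_CAP)
    return 0;
  q->items[(q->head + q->count) % TOY_QUEUE_CAP] = ev;
  q->count++;
  return 1;
}

/* Called by the reader; returns 0 if no event is pending. */
int
toy_dequeue (struct toy_queue *q, struct toy_event *out)
{
  assert (q != NULL && out != NULL);
  if (q->count == 0)
    return 0;
  *out = q->items[q->head];
  q->head = (q->head + 1) % TOY_QUEUE_CAP;
  q->count--;
  return 1;
}
```

Events come out in exactly the order they went in, whatever the timing of
the sources that produced them.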
6701 Note that each application has its own event queue. (It is
6702 immaterial whether the collection process directly puts the
6703 events in the proper application's queue, or puts them into
6704 a single system queue, which is later split up.)
6706 The most basic level of event collection is done by the
6707 operating system or window system. Typically, XEmacs does
6708 its own event collection as well. Often there are multiple
6709 layers of collection in XEmacs, with events from various
6710 sources being collected into a queue, which is then combined
6711 with other sources to go into another queue (i.e. a second
6712 level of collection), with perhaps another level on top of
6715 XEmacs has its own types of events (called @dfn{Emacs events}),
which provide an abstract layer on top of the system-dependent
6717 nature of the most basic events that are received. Part of the
6718 complex nature of the XEmacs event collection process involves
6719 converting from the operating-system events into the proper
6720 Emacs events---there may not be a one-to-one correspondence.
6722 Emacs events are documented in @file{events.h}; I'll discuss them
6728 @cindex events, main loop
6730 The @dfn{command loop} is the top-level loop that the editor is always
6731 running. It loops endlessly, calling @code{next-event} to retrieve an
6732 event and @code{dispatch-event} to execute it. @code{dispatch-event} does
6733 the appropriate thing with non-user events (process, timeout,
6734 magic, eval, mouse motion); this involves calling a Lisp handler
6735 function, redrawing a newly-exposed part of a frame, reading
6736 subprocess output, etc. For user events, @code{dispatch-event}
6737 looks up the event in relevant keymaps or menubars; when a
6738 full key sequence or menubar selection is reached, the appropriate
6739 function is executed. @code{dispatch-event} may have to keep state
6740 across calls; this is done in the ``command-builder'' structure
6741 associated with each console (remember, there's usually only
6742 one console), and the engine that looks up keystrokes and
6743 constructs full key sequences is called the @dfn{command builder}.
6744 This is documented elsewhere.
6746 The guts of the command loop are in @code{command_loop_1()}. This
6747 function doesn't catch errors, though---that's the job of
6748 @code{command_loop_2()}, which is a condition-case (i.e. error-trapping)
6749 wrapper around @code{command_loop_1()}. @code{command_loop_1()} never
6750 returns, but may get thrown out of.
6752 When an error occurs, @code{cmd_error()} is called, which usually
6753 invokes the Lisp error handler in @code{command-error}; however, a
6754 default error handler is provided if @code{command-error} is @code{nil}
6755 (e.g. during startup). The purpose of the error handler is simply to
6756 display the error message and do associated cleanup; it does not need to
6757 throw anywhere. When the error handler finishes, the condition-case in
6758 @code{command_loop_2()} will finish and @code{command_loop_2()} will
6759 reinvoke @code{command_loop_1()}.
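The relationship between the two functions can be modelled with
@code{setjmp()} and @code{longjmp()} standing in for the condition-case
and the error throw. This is only an analogy under stated assumptions:
the @code{toy_} functions are hypothetical, the inner loop returns when it
runs out of work so that the sketch terminates, and an ``error'' is
raised artificially on every other command.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf toy_error_catch;   /* stands in for the condition-case */
static int toy_errors_handled;
static int toy_commands_left;

/* Hypothetical error signal: unwinds to the enclosing wrapper. */
void
toy_signal_error (void)
{
  longjmp (toy_error_catch, 1);
}

/* Inner loop: runs commands and does not catch errors itself. */
void
toy_command_loop_1 (void)
{
  while (toy_commands_left > 0)
    {
      toy_commands_left--;
      if (toy_commands_left % 2 == 0)
        toy_signal_error ();
    }
}

/* Error-trapping wrapper: after each error, run the handler and
   reinvoke the inner loop.  Returns the number of errors handled. */
int
toy_command_loop_2 (int commands)
{
  assert (commands >= 0);
  toy_commands_left = commands;
  toy_errors_handled = 0;
  for (;;)
    {
      if (setjmp (toy_error_catch) == 0)
        {
          toy_command_loop_1 ();
          return toy_errors_handled;
        }
      /* Error handler: report and clean up, then loop to re-enter. */
      toy_errors_handled++;
    }
}
```

The state that survives a throw is kept in static variables, since
automatic variables modified between @code{setjmp()} and @code{longjmp()}
have indeterminate values unless declared @code{volatile}.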
6761 @code{command_loop_2()} is invoked from three places: from
6762 @code{initial_command_loop()} (called from @code{main()} at the end of
6763 internal initialization), from the Lisp function @code{recursive-edit},
6764 and from @code{call_command_loop()}.
6766 @code{call_command_loop()} is called when a macro is started and when
6767 the minibuffer is entered; normal termination of the macro or minibuffer
6768 causes a throw out of the recursive command loop. (To
6769 @code{execute-kbd-macro} for macros and @code{exit} for minibuffers.
6770 Note also that the low-level minibuffer-entering function,
6771 @code{read-minibuffer-internal}, provides its own error handling and
6772 does not need @code{command_loop_2()}'s error encapsulation; so it tells
6773 @code{call_command_loop()} to invoke @code{command_loop_1()} directly.)
Note that both @code{read-minibuffer-internal} and @code{recursive-edit} set up a
6776 catch for @code{exit}; this is why @code{abort-recursive-edit}, which
6777 throws to this catch, exits out of either one.
6779 @code{initial_command_loop()}, called from @code{main()}, sets up a
6780 catch for @code{top-level} when invoking @code{command_loop_2()},
6781 allowing functions to throw all the way to the top level if they really
6782 need to. Before invoking @code{command_loop_2()},
6783 @code{initial_command_loop()} calls @code{top_level_1()}, which handles
6784 all of the startup stuff (creating the initial frame, handling the
6785 command-line options, loading the user's @file{.emacs} file, etc.). The
6786 function that actually does this is in Lisp and is pointed to by the
6787 variable @code{top-level}; normally this function is
6788 @code{normal-top-level}. @code{top_level_1()} is just an error-handling
6789 wrapper similar to @code{command_loop_2()}. Note also that
6790 @code{initial_command_loop()} sets up a catch for @code{top-level} when
6791 invoking @code{top_level_1()}, just like when it invokes
6792 @code{command_loop_2()}.
6794 @node Specifics of the Event Gathering Mechanism
6795 @section Specifics of the Event Gathering Mechanism
6796 @cindex event gathering mechanism, specifics of the
6798 Here is an approximate diagram of the collection processes
6799 at work in XEmacs, under TTY's (TTY's are simpler than X
6800 so we'll look at this first):
6804 asynch. asynch. asynch. asynch. [Collectors in
6805 kbd events kbd events process process the OS]
6808 | | | | SIGINT, [signal handlers
6809 | | | | SIGQUIT, in XEmacs]
6811 file file file file SIGALRM
6812 desc. desc. desc. desc. |
6813 (TTY) (TTY) (pipe) (pipe) |
6814 | | | | fake timeouts
6822 ------>-----------<----------------<----------------
6825 | [collected using select() in emacs_tty_next_event()
6826 | and converted to the appropriate Emacs event]
6829 V (above this line is TTY-specific)
6830 Emacs -----------------------------------------------
6831 event (below this line is the generic event mechanism)
6834 was there if not, call
6835 a SIGINT? emacs_tty_next_event()
6842 | [collected in event_stream_next_event();
6843 | SIGINT is converted using maybe_read_quit_event()]
6848 \---->------>----- maybe_kbd_translate() ---->---\
6852 command event queue |
6854 (contains events that were event queue, call
6855 read earlier but not processed, event_stream_next_event()
6856 typically when waiting in a |
6857 sit-for, sleep-for, etc. for |
6858 a particular event to be received) |
6862 ---->------------------------------------<----
6865 | next_event_internal()]
6867 unread- unread- event from |
6868 command- command- keyboard else, call
6869 events event macro next_event_internal()
6874 --------->----------------------<------------
6876 | [collected in `next-event', which may loop
6877 | more than once if the event it gets is on
6878 | a dead frame, device, etc.]
6882 feed into top-level event loop,
6883 which repeatedly calls `next-event'
6884 and then dispatches the event
6885 using `dispatch-event'
6888 Notice the separation between TTY-specific and generic event mechanism.
6889 When using the Xt-based event loop, the TTY-specific stuff is replaced
6890 but the rest stays the same.
It's also important to realize that only one kind of
system-specific event loop can be operating at a time, and it must be able
to receive all kinds of events simultaneously. For the two existing
6895 event loops (implemented in @file{event-tty.c} and @file{event-Xt.c},
6896 respectively), the TTY event loop @emph{only} handles TTY consoles,
6897 while the Xt event loop handles @emph{both} TTY and X consoles. This
6898 situation is different from all of the output handlers, where you simply
6899 have one per console type.
6901 Here's the Xt Event Loop Diagram (notice that below a certain point,
6902 it's the same as the above diagram):
6905 asynch. asynch. asynch. asynch. [Collectors in
6906 kbd kbd process process the OS]
6907 events events output output
6909 | | | | asynch. asynch. [Collectors in the
6910 | | | | X X OS and X Window System]
6911 | | | | events events
6914 | | | | | | SIGINT, [signal handlers
6915 | | | | | | SIGQUIT, in XEmacs]
6916 | | | | | | SIGWINCH,
6920 | | | | | | | timeouts
6925 file file file file file file file |
6926 desc. desc. desc. desc. desc. desc. desc. |
6927 (TTY) (TTY) (pipe) (pipe) (socket) (socket) (pipe) |
6932 --->----------------------------------------<---------<------
6934 | | |[collected using select() in
6935 | | | _XtWaitForSomething(), called
6936 | | | from XtAppProcessEvent(), called
6937 | | | in emacs_Xt_next_event();
6938 | | | dispatched to various callbacks]
6941 emacs_Xt_ p_s_callback(), | [popup_selection_callback]
6942 event_handler() x_u_v_s_callback(),| [x_update_vertical_scrollbar_
6943 | x_u_h_s_callback(),| callback]
6944 | search_callback() | [x_update_horizontal_scrollbar_
6948 enqueue_Xt_ signal_special_ |
6949 dispatch_event() Xt_user_event() |
6954 | dispatch_event() |
6961 dispatch Xt_what_callback()
6968 ---->-----------<--------
6971 | [collected and converted as appropriate in
6972 | emacs_Xt_next_event()]
6975 V (above this line is Xt-specific)
6976 Emacs ------------------------------------------------
6977 event (below this line is the generic event mechanism)
6980 was there if not, call
6981 a SIGINT? emacs_Xt_next_event()
6988 | [collected in event_stream_next_event();
6989 | SIGINT is converted using maybe_read_quit_event()]
6994 \---->------>----- maybe_kbd_translate() -->-----\
6998 command event queue |
7000 (contains events that were event queue, call
7001 read earlier but not processed, event_stream_next_event()
7002 typically when waiting in a |
7003 sit-for, sleep-for, etc. for |
7004 a particular event to be received) |
7008 ---->----------------------------------<------
7011 | next_event_internal()]
7013 unread- unread- event from |
7014 command- command- keyboard else, call
7015 events event macro next_event_internal()
7020 --------->----------------------<------------
7022 | [collected in `next-event', which may loop
7023 | more than once if the event it gets is on
7024 | a dead frame, device, etc.]
7028 feed into top-level event loop,
7029 which repeatedly calls `next-event'
7030 and then dispatches the event
7031 using `dispatch-event'
7034 @node Specifics About the Emacs Event
7035 @section Specifics About the Emacs Event
7036 @cindex event, specifics about the Lisp object
7038 @node The Event Stream Callback Routines
7039 @section The Event Stream Callback Routines
7040 @cindex event stream callback routines, the
7041 @cindex callback routines, the event stream
7043 @node Other Event Loop Functions
7044 @section Other Event Loop Functions
7045 @cindex event loop functions, other
7047 @code{detect_input_pending()} and @code{input-pending-p} look for
7048 input by calling @code{event_stream->event_pending_p} and looking in
7049 @code{[V]unread-command-event} and the @code{command_event_queue} (they
7050 do not check for an executing keyboard macro, though).
7052 @code{discard-input} cancels any command events pending (and any
7053 keyboard macros currently executing), and puts the others onto the
7054 @code{command_event_queue}. There is a comment about a ``race
7055 condition'', which is not a good sign.
7057 @code{next-command-event} and @code{read-char} are higher-level
7058 interfaces to @code{next-event}. @code{next-command-event} gets the
7059 next @dfn{command} event (i.e. keypress, mouse event, menu selection,
7060 or scrollbar action), calling @code{dispatch-event} on any others.
7061 @code{read-char} calls @code{next-command-event} and uses
7062 @code{event_to_character()} to return the character equivalent. With
the right kind of input method support, it is possible for @code{(read-char)}
7064 to return a Kanji character.
7066 @node Converting Events
7067 @section Converting Events
7068 @cindex converting events
7069 @cindex events, converting
7071 @code{character_to_event()}, @code{event_to_character()},
7072 @code{event-to-character}, and @code{character-to-event} convert between
7073 characters and keypress events corresponding to the characters. If the
7074 event was not a keypress, @code{event_to_character()} returns -1 and
7075 @code{event-to-character} returns @code{nil}. These functions convert
7076 between character representation and the split-up event representation
7077 (keysym plus mod keys).
7079 @node Dispatching Events; The Command Builder
7080 @section Dispatching Events; The Command Builder
7081 @cindex dispatching events; the command builder
7082 @cindex events; the command builder, dispatching
7083 @cindex command builder, dispatching events; the
7087 @node Evaluation; Stack Frames; Bindings, Symbols and Variables, Events and the Event Loop, Top
7088 @chapter Evaluation; Stack Frames; Bindings
7089 @cindex evaluation; stack frames; bindings
7090 @cindex stack frames; bindings, evaluation;
7091 @cindex bindings, evaluation; stack frames;
7095 * Dynamic Binding; The specbinding Stack; Unwind-Protects::
7096 * Simple Special Forms::
7104 @code{Feval()} evaluates the form (a Lisp object) that is passed to
7105 it. Note that evaluation is only non-trivial for two types of objects:
7106 symbols and conses. A symbol is evaluated simply by calling
7107 @code{symbol-value} on it and returning the value.
7109 Evaluating a cons means calling a function. First, @code{eval} checks
7110 to see if garbage-collection is necessary, and calls
7111 @code{garbage_collect_1()} if so. It then increases the evaluation
7112 depth by 1 (@code{lisp_eval_depth}, which is always less than
7113 @code{max_lisp_eval_depth}) and adds an element to the linked list of
7114 @code{struct backtrace}'s (@code{backtrace_list}). Each such structure
7115 contains a pointer to the function being called plus a list of the
7116 function's arguments. Originally these values are stored unevalled, and
7117 as they are evaluated, the backtrace structure is updated. Garbage
7118 collection pays attention to the objects pointed to in the backtrace
7119 structures (garbage collection might happen while a function is being
7120 called or while an argument is being evaluated, and there could easily
7121 be no other references to the arguments in the argument list; once an
7122 argument is evaluated, however, the unevalled version is not needed by
7123 eval, and so the backtrace structure is changed).
7125 At this point, the function to be called is determined by looking at
7126 the car of the cons (if this is a symbol, its function definition is
7127 retrieved and the process repeated). The function should then consist
7128 of either a @code{Lisp_Subr} (built-in function written in C), a
7129 @code{Lisp_Compiled_Function} object, or a cons whose car is one of the
7130 symbols @code{autoload}, @code{macro} or @code{lambda}.
7132 If the function is a @code{Lisp_Subr}, the lisp object points to a
7133 @code{struct Lisp_Subr} (created by @code{DEFUN()}), which contains a
7134 pointer to the C function, a minimum and maximum number of arguments
7135 (or possibly the special constants @code{MANY} or @code{UNEVALLED}), a
7136 pointer to the symbol referring to that subr, and a couple of other
7137 things. If the subr wants its arguments @code{UNEVALLED}, they are
7138 passed raw as a list. Otherwise, an array of evaluated arguments is
7139 created and put into the backtrace structure, and either passed whole
7140 (@code{MANY}) or each argument is passed as a C argument.
7142 If the function is a @code{Lisp_Compiled_Function},
7143 @code{funcall_compiled_function()} is called. If the function is a
7144 lambda list, @code{funcall_lambda()} is called. If the function is a
7145 macro, [..... fill in] is done. If the function is an autoload,
7146 @code{do_autoload()} is called to load the definition and then eval
7147 starts over [explain this more].
7149 When @code{Feval()} exits, the evaluation depth is reduced by one, the
7150 debugger is called if appropriate, and the current backtrace structure
7151 is removed from the list.
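The bookkeeping done on entry to and exit from an evaluation can be
sketched as follows. The names are hypothetical, the depth limit is set
artificially low, and where the real code would signal a Lisp error this
sketch simply returns 0.

```c
#include <assert.h>
#include <stddef.h>

struct toy_backtrace {
  struct toy_backtrace *next;   /* enclosing call */
  const char *function;         /* function being called */
};

struct toy_backtrace *toy_backtrace_list = NULL;
int toy_eval_depth = 0;
int toy_max_eval_depth = 3;     /* hypothetical, deliberately tiny limit */

/* Enter one level of evaluation.  Returns 0 (and changes nothing) if
   doing so would make the depth reach the maximum. */
int
toy_eval_enter (struct toy_backtrace *frame, const char *function)
{
  assert (frame != NULL);
  if (toy_eval_depth + 1 >= toy_max_eval_depth)
    return 0;
  toy_eval_depth++;
  frame->function = function;
  frame->next = toy_backtrace_list;
  toy_backtrace_list = frame;
  return 1;
}

/* Leave one level: reduce the depth and unlink the current frame. */
void
toy_eval_exit (void)
{
  assert (toy_backtrace_list != NULL && toy_eval_depth > 0);
  toy_backtrace_list = toy_backtrace_list->next;
  toy_eval_depth--;
}
```

Frames live on the C stack of the caller, so linking and unlinking them
costs only a couple of pointer assignments.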
7153 Both @code{funcall_compiled_function()} and @code{funcall_lambda()} need
7154 to go through the list of formal parameters to the function and bind
7155 them to the actual arguments, checking for @code{&rest} and
7156 @code{&optional} symbols in the formal parameters and making sure the
7157 number of actual arguments is correct.
7158 @code{funcall_compiled_function()} can do this a little more
7159 efficiently, since the formal parameter list can be checked for sanity
7160 when the compiled function object is created.
7162 @code{funcall_lambda()} simply calls @code{Fprogn} to execute the code
7165 @code{funcall_compiled_function()} calls the real byte-code interpreter
7166 @code{execute_optimized_program()} on the byte-code instructions, which
7167 are converted into an internal form for faster execution.
7169 When a compiled function is executed for the first time by
7170 @code{funcall_compiled_function()}, or during the dump phase of building
7171 XEmacs, the byte-code instructions are converted from a
7172 @code{Lisp_String} (which is inefficient to access, especially in the
7173 presence of MULE) into a @code{Lisp_Opaque} object containing an array
7174 of unsigned char, which can be directly executed by the byte-code
7175 interpreter. At this time the byte code is also analyzed for validity
7176 and transformed into a more optimized form, so that
7177 @code{execute_optimized_program()} can really fly.
7179 Here are some of the optimizations performed by the internal byte-code
7183 References to the @code{constants} array are checked for out-of-range
7184 indices, so that the byte interpreter doesn't have to.
7186 References to the @code{constants} array that will be used as a Lisp
7187 variable are checked for being correct non-constant (i.e. not @code{t},
7188 @code{nil}, or @code{keywordp}) symbols, so that the byte interpreter
7191 The maximum number of variable bindings in the byte-code is
7192 pre-computed, so that space on the @code{specpdl} stack can be
7193 pre-reserved once for the whole function execution.
7195 All byte-code jumps are relative to the current program counter instead
7196 of the start of the program, thereby saving a register.
7198 One-byte relative jumps are converted from the byte-code form of unsigned
7199 chars offset by 127 to machine-friendly signed chars.
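The last conversion can be sketched as a one-line transformation. The
function name is hypothetical, and the sketch assumes that the encoded
value 255 never occurs, since 255 - 127 would not fit in a signed char;
whether the real optimizer needs to handle that case is not stated here.

```c
#include <assert.h>
#include <limits.h>

/* Turn a byte-code relative jump operand (an unsigned char biased by
   127) into a signed displacement. */
signed char
toy_relative_jump (unsigned char encoded)
{
  int offset = (int) encoded - 127;
  /* Assumption of this sketch: the result fits in a signed char. */
  assert (offset >= SCHAR_MIN && offset <= SCHAR_MAX);
  return (signed char) offset;
}
```

With the bias removed once at conversion time, the interpreter can add the
operand to its program counter directly.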
7202 Of course, this transformation of the @code{instructions} should not be
7203 visible to the user, so @code{Fcompiled_function_instructions()} needs
7204 to know how to convert the optimized opaque object back into a Lisp
7205 string that is identical to the original string from the @file{.elc}
7206 file. (Actually, the resulting string may (rarely) contain slightly
7207 different, yet equivalent, byte code.)
7209 @code{Ffuncall()} implements Lisp @code{funcall}. @code{(funcall fun
7210 x1 x2 x3 ...)} is equivalent to @code{(eval (list fun (quote x1) (quote
7211 x2) (quote x3) ...))}. @code{Ffuncall()} contains its own code to do
7212 the evaluation, however, and is very similar to @code{Feval()}.
7214 From the performance point of view, it is worth knowing that most of the
7215 time in Lisp evaluation is spent executing @code{Lisp_Subr} and
7216 @code{Lisp_Compiled_Function} objects via @code{Ffuncall()} (not
7219 @code{Fapply()} implements Lisp @code{apply}, which is very similar to
7220 @code{funcall} except that if the last argument is a list, the result is the
7221 same as if each of the arguments in the list had been passed separately.
7222 @code{Fapply()} does some business to expand the last argument if it's a
7223 list, then calls @code{Ffuncall()} to do the work.
7225 @code{apply1()}, @code{call0()}, @code{call1()}, @code{call2()}, and
7226 @code{call3()} call a function, passing it the argument(s) given (the
7227 arguments are given as separate C arguments rather than being passed as
7228 an array). @code{apply1()} uses @code{Fapply()} while the others use
7229 @code{Ffuncall()} to do the real work.
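The relationship between the @code{callN()} helpers and @code{Ffuncall()} can be sketched with a toy model. Everything below is an illustrative assumption: @code{fc_object}, @code{fc_funcall} and @code{fc_call2} are hypothetical names, and the real @code{Lisp_Object} and @code{Ffuncall()} are considerably richer.

```c
#include <assert.h>

/* Toy stand-in for Lisp_Object (assumption, for illustration only).  */
typedef long fc_object;

/* A callable receives its arguments as a count plus an array.  */
typedef fc_object (*fc_function) (int nargs, fc_object *args);

/* Stand-in for Ffuncall(): here the function is kept separate from its
   arguments purely to keep the sketch short.  */
fc_object
fc_funcall (fc_function fun, int nargs, fc_object *args)
{
  return fun (nargs, args);
}

/* Analogue of call2(): gather two separate C arguments into an array,
   then let the funcall routine do the real work.  */
fc_object
fc_call2 (fc_function fun, fc_object arg1, fc_object arg2)
{
  fc_object args[2];
  args[0] = arg1;
  args[1] = arg2;
  return fc_funcall (fun, 2, args);
}

/* Example callee: sum all of its arguments.  */
fc_object
fc_sum (int nargs, fc_object *args)
{
  fc_object total = 0;
  int i;
  for (i = 0; i < nargs; i++)
    total += args[i];
  return total;
}
```

For example, @code{fc_call2 (fc_sum, 3, 4)} packs the two arguments into an array and yields their sum.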
7231 @node Dynamic Binding; The specbinding Stack; Unwind-Protects
7232 @section Dynamic Binding; The specbinding Stack; Unwind-Protects
7233 @cindex dynamic binding; the specbinding stack; unwind-protects
7234 @cindex binding; the specbinding stack; unwind-protects, dynamic
7235 @cindex specbinding stack; unwind-protects, dynamic binding; the
7236 @cindex unwind-protects, dynamic binding; the specbinding stack;
7242 Lisp_Object old_value;
7243 Lisp_Object (*func) (Lisp_Object); /* for unwind-protect */
7247 @code{struct specbinding} is used for local-variable bindings and
7248 unwind-protects. @code{specpdl} holds an array of @code{struct specbinding} entries,
7249 @code{specpdl_ptr} points to the beginning of the free bindings in the
7250 array, @code{specpdl_size} specifies the total number of binding slots
7251 in the array, and @code{max_specpdl_size} specifies the maximum number
7252 of bindings the array can be expanded to hold. @code{grow_specpdl()}
7253 increases the size of the @code{specpdl} array, multiplying its size by
7254 2 but never exceeding @code{max_specpdl_size} (except that if this
7255 number is less than 400, it is first set to 400).
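The growth policy just described can be sketched as a small size computation. This is one reading of the paragraph above, not the actual @code{grow_specpdl()} code: @code{next_specpdl_size} is a hypothetical helper, and returning 0 when the limit has been reached is an assumption standing in for whatever error handling the real function performs.

```c
#include <assert.h>

/* Hypothetical helper: compute the next size of the binding stack.
   *MAX_SIZE plays the role of max_specpdl_size.  Returns 0 when the
   stack is already at its limit (assumption: the real code would
   report an error instead).  */
int
next_specpdl_size (int current_size, int *max_size)
{
  if (*max_size < 400)
    *max_size = 400;            /* the limit is first raised to 400 */
  if (current_size >= *max_size)
    return 0;                   /* cannot grow any further */
  if (current_size * 2 > *max_size)
    return *max_size;           /* doubling is capped at the limit */
  return current_size * 2;
}
```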
7257 @code{specbind()} binds a symbol to a value and is used for local
7258 variables and @code{let} forms. The symbol and its old value (which
7259 might be @code{Qunbound}, indicating no prior value) are recorded in the
7260 specpdl array, and @code{specpdl_ptr} is advanced by one slot.
7262 @code{record_unwind_protect()} implements an @dfn{unwind-protect},
7263 which, when placed around a section of code, ensures that some specified
7264 cleanup routine will be executed even if the code exits abnormally
7265 (e.g. through a @code{throw} or quit). @code{record_unwind_protect()}
7266 simply adds a new specbinding to the @code{specpdl} array and stores the
7267 appropriate information in it. The cleanup routine can either be a C
7268 function, which is stored in the @code{func} field, or a @code{progn}
7269 form, which is stored in the @code{old_value} field.
7271 @code{unbind_to()} removes specbindings from the @code{specpdl} array
7272 until the specified position is reached. Each specbinding can be one of
7277 an unwind-protect with a C cleanup function (@code{func} is not 0, and
7278 @code{old_value} holds an argument to be passed to the function);
7280 an unwind-protect with a Lisp form (@code{func} is 0, @code{symbol}
7281 is @code{nil}, and @code{old_value} holds the form to be executed with
7282 @code{Fprogn()}); or
7284 a local-variable binding (@code{func} is 0, @code{symbol} is not
7285 @code{nil}, and @code{old_value} holds the old value, which is stored as
7286 the symbol's value).
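The interplay of @code{specbind()}, @code{record_unwind_protect()} and @code{unbind_to()} can be illustrated with a toy binding stack. The sketch below is an assumption-laden simplification: all @code{sp_} names are hypothetical, symbols are reduced to a single value cell, the stack has a fixed size with no overflow checking, and the Lisp-form variety of unwind-protect is left out because there is no evaluator here.

```c
#include <assert.h>
#include <stddef.h>

typedef long sp_object;                 /* toy stand-in for Lisp_Object */

struct sp_symbol { sp_object value; };  /* a symbol is just a value cell */

struct sp_binding
{
  struct sp_symbol *symbol;     /* NULL for an unwind-protect */
  sp_object old_value;          /* saved value, or cleanup argument */
  void (*func) (sp_object);     /* C cleanup function, or NULL */
};

#define SP_STACK_SIZE 16
struct sp_binding sp_stack[SP_STACK_SIZE];
int sp_depth;                   /* index of the first free slot */

sp_object sp_last_cleanup_arg;  /* records what the example cleanup saw */

/* Bind SYM to VALUE, remembering its old value on the stack.  */
void
sp_specbind (struct sp_symbol *sym, sp_object value)
{
  struct sp_binding *b = &sp_stack[sp_depth++];
  b->symbol = sym;
  b->old_value = sym->value;
  b->func = NULL;
  sym->value = value;
}

/* Arrange for FUNC (ARG) to run when the stack is unwound.  */
void
sp_record_unwind_protect (void (*func) (sp_object), sp_object arg)
{
  struct sp_binding *b = &sp_stack[sp_depth++];
  b->symbol = NULL;
  b->old_value = arg;
  b->func = func;
}

/* Pop entries until the stack is back at DEPTH, running cleanups and
   restoring old variable values along the way.  */
void
sp_unbind_to (int depth)
{
  while (sp_depth > depth)
    {
      struct sp_binding *b = &sp_stack[--sp_depth];
      if (b->func != NULL)
        b->func (b->old_value);
      else if (b->symbol != NULL)
        b->symbol->value = b->old_value;
    }
}

/* Example cleanup function used to observe the unwinding.  */
void
sp_note_cleanup (sp_object arg)
{
  sp_last_cleanup_arg = arg;
}
```

A caller typically remembers the current depth, creates its bindings, and later passes the remembered depth to the unbinding routine, which restores everything created in between in reverse order.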
7289 @node Simple Special Forms
7290 @section Simple Special Forms
7291 @cindex special forms, simple
7293 @code{or}, @code{and}, @code{if}, @code{cond}, @code{progn},
7294 @code{prog1}, @code{prog2}, @code{setq}, @code{quote}, @code{function},
7295 @code{let*}, @code{let}, @code{while}
7297 All of these are very simple and work as expected, calling
7298 @code{Feval()} or @code{Fprogn()} as necessary and (in the case of
7299 @code{let} and @code{let*}) using @code{specbind()} to create bindings
7300 and @code{unbind_to()} to undo the bindings when finished.
7302 Note that, with the exception of @code{Fprogn}, these functions are
7303 typically called in real life only in interpreted code, since the byte
7304 compiler knows how to convert calls to these functions directly into
7307 @node Catch and Throw
7308 @section Catch and Throw
7309 @cindex catch and throw
7310 @cindex throw, catch and
7317 struct catchtag *next;
7318 struct gcpro *gcpro;
7320 struct backtrace *backlist;
7321 int lisp_eval_depth;
7326 @code{catch} is a Lisp function that places a catch around a body of
7327 code. A catch is a means of non-local exit from the code. When a catch
7328 is created, a tag is specified, and executing a @code{throw} to this tag
7329 will exit from the body of code caught with this tag; the value of the
7330 @code{catch} is then the value given in the call to @code{throw}. If there is no such
7331 call, the code will be executed normally.
7333 Information pertaining to a catch is held in a @code{struct catchtag},
7334 which is placed at the head of a linked list pointed to by
7335 @code{catchlist}. @code{internal_catch()} is passed a C function to
7336 call (@code{Fprogn()} when Lisp @code{catch} is called) and arguments to
7337 give it, and places a catch around the function. Each @code{struct
7338 catchtag} is held in the stack frame of the @code{internal_catch()}
7339 instance that created the catch.
7341 @code{internal_catch()} is fairly straightforward. It stores into the
7342 @code{struct catchtag} the tag name and the current values of
7343 @code{backtrace_list}, @code{lisp_eval_depth}, @code{gcprolist}, and the
7344 offset into the @code{specpdl} array, sets a jump point with @code{_setjmp()}
7345 (storing the jump point into the @code{struct catchtag}), and calls the
7346 function. Control will return to @code{internal_catch()} either when
7347 the function exits normally or through a @code{_longjmp()} to this jump
7348 point. In the latter case, @code{throw} will store the value to be
7349 returned into the @code{struct catchtag} before jumping. When it's
7350 done, @code{internal_catch()} removes the @code{struct catchtag} from
7351 the catchlist and returns the proper value.
7353 @code{Fthrow()} goes up through the catchlist until it finds one with
7354 a matching tag. It then calls @code{unbind_catch()} to restore
7355 everything to what it was when the appropriate catch was set, stores the
7356 return value in the @code{struct catchtag}, and jumps (with
7357 @code{_longjmp()}) to its jump point.
7359 @code{unbind_catch()} removes all catches from the catchlist until it
7360 finds the correct one. Some of the catches might have been placed for
7361 error-trapping, and if so, the appropriate entries on the handlerlist
7362 must be removed (see ``errors''). @code{unbind_catch()} also restores
7363 the values of @code{gcprolist}, @code{backtrace_list}, and
7364 @code{lisp_eval_depth}, and calls @code{unbind_to()} to undo any specbindings
7365 created since the catch.
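The control flow of catch and throw can be sketched in portable C. This is a minimal model under several assumptions: the @code{ct_} names are hypothetical, tags and values are plain integers, standard @code{setjmp()}/@code{longjmp()} stand in for @code{_setjmp()}/@code{_longjmp()}, and none of the saved state (@code{gcprolist}, @code{backtrace_list}, the @code{specpdl} offset) is modelled.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

struct ct_catchtag
{
  int tag;                      /* tag identifying this catch */
  volatile int val;             /* value stored by a throw */
  struct ct_catchtag *next;     /* next outer catch */
  jmp_buf jmp;                  /* jump point for this catch */
};

struct ct_catchtag *ct_catchlist;   /* innermost catch first */

/* Call FUNC (ARG) with a catch for TAG around it.  The catchtag lives
   in this function's own stack frame.  */
int
ct_internal_catch (int tag, int (*func) (int), int arg)
{
  struct ct_catchtag c;
  int result;

  c.tag = tag;
  c.val = 0;
  c.next = ct_catchlist;
  ct_catchlist = &c;

  if (setjmp (c.jmp) == 0)
    result = func (arg);        /* normal return from the body */
  else
    result = c.val;             /* arrived here through a throw */

  ct_catchlist = c.next;        /* remove this catch from the list */
  return result;
}

/* Throw VALUE to the innermost catch for TAG.  Returns only if no
   such catch exists.  */
void
ct_throw (int tag, int value)
{
  struct ct_catchtag *c;
  for (c = ct_catchlist; c != NULL; c = c->next)
    if (c->tag == tag)
      {
        ct_catchlist = c;       /* discard inner catches */
        c->val = value;
        longjmp (c->jmp, 1);
      }
}

/* Example bodies.  */
int ct_body_normal (int arg) { return arg + 1; }
int ct_body_throws (int arg) { ct_throw (7, arg * 2); return -1; }
int ct_body_nested (int arg)
{
  return ct_internal_catch (8, ct_body_throws, arg);
}
```

The @code{val} field is declared @code{volatile} because it is modified between the @code{setjmp()} call and the @code{longjmp()}, and ISO C otherwise leaves such automatic objects indeterminate. In the nested example, the throw to tag 7 passes straight through the inner catch for tag 8.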
7368 @node Symbols and Variables, Buffers and Textual Representation, Evaluation; Stack Frames; Bindings, Top
7369 @chapter Symbols and Variables
7370 @cindex symbols and variables
7371 @cindex variables, symbols and
7374 * Introduction to Symbols::
7379 @node Introduction to Symbols
7380 @section Introduction to Symbols
7381 @cindex symbols, introduction to
7383 A symbol is basically just an object with four fields: a name (a
7384 string), a value (some Lisp object), a function (some Lisp object), and
7385 a property list (usually a list of alternating keyword/value pairs).
7386 What makes symbols special is that there is usually only one symbol with
7387 a given name, and the symbol is referred to by name. This makes a
7388 symbol a convenient way of calling up data by name, i.e. of implementing
7389 variables. (The variable's value is stored in the @dfn{value slot}.)
7390 Similarly, functions are referenced by name, and the definition of the
7391 function is stored in a symbol's @dfn{function slot}. This means that
7392 there can be a distinct function and variable with the same name. The
7393 property list is used as a more general mechanism of associating
7394 additional values with particular names, and once again the namespace is
7395 independent of the function and variable namespaces.
7401 The identity of symbols with their names is accomplished through a
7402 structure called an obarray, which is just a poorly-implemented hash
7403 table mapping from strings to symbols whose name is that string. (I say
7404 ``poorly implemented'' because an obarray appears in Lisp as a vector
7405 with some hidden fields rather than as its own opaque type. This is an
7406 Emacs Lisp artifact that should be fixed.)
7408 Obarrays are implemented as a vector of some fixed size (which should
7409 be a prime for best results), where each ``bucket'' of the vector
7410 contains one or more symbols, threaded through a hidden @code{next}
7411 field in the symbol. Lookup of a symbol in an obarray, and adding a
7412 symbol to an obarray, is accomplished through standard hash-table
7415 The standard Lisp function for working with symbols and obarrays is
7416 @code{intern}. This looks up a symbol in an obarray given its name; if
7417 it's not found, a new symbol is automatically created with the specified
7418 name, added to the obarray, and returned. This is what happens when the
7419 Lisp reader encounters a symbol (or more precisely, encounters the name
7420 of a symbol) in some text that it is reading. There is a standard
7421 obarray called @code{obarray} that is used for this purpose, although
7422 the Lisp programmer is free to create his own obarrays and @code{intern}
7425 Note that, once a symbol is in an obarray, it stays there until
7426 something is done about it, and the standard obarray @code{obarray}
7427 always stays around, so once you use any particular variable name, a
7428 corresponding symbol will stay around in @code{obarray} until you exit
7431 Note that @code{obarray} itself is a variable, and as such there is a
7432 symbol in @code{obarray} whose name is @code{"obarray"} and which
7433 contains @code{obarray} as its value.
7435 Note also that this call to @code{intern} occurs only when in the Lisp
7436 reader, not when the code is executed (at which point the symbol is
7437 already around, stored as such in the definition of the function).
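The bucket-and-chain structure of an obarray, together with the behaviour of @code{intern} and @code{intern-soft}, can be sketched as follows. This is a toy model, not the XEmacs implementation: the @code{ob_} names, the hash function and the table size of 17 are all illustrative assumptions, and symbols here carry only a name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define OB_SIZE 17              /* a small prime, chosen arbitrarily */

struct ob_symbol
{
  char *name;
  struct ob_symbol *next;       /* hidden chain within a bucket */
};

struct ob_symbol *ob_table[OB_SIZE];

/* Simple string hash (assumption; any reasonable hash would do).  */
static unsigned
ob_hash (const char *name)
{
  unsigned h = 0;
  while (*name)
    h = h * 31 + (unsigned char) *name++;
  return h % OB_SIZE;
}

/* Like intern-soft: look up NAME, returning NULL if it is absent.  */
struct ob_symbol *
ob_intern_soft (const char *name)
{
  struct ob_symbol *s;
  for (s = ob_table[ob_hash (name)]; s != NULL; s = s->next)
    if (strcmp (s->name, name) == 0)
      return s;
  return NULL;
}

/* Like intern: look up NAME, creating and adding the symbol if it is
   not there yet.  Returns NULL only if memory runs out.  */
struct ob_symbol *
ob_intern (const char *name)
{
  struct ob_symbol *s = ob_intern_soft (name);
  unsigned h;

  if (s != NULL)
    return s;
  s = malloc (sizeof *s);
  if (s == NULL)
    return NULL;
  s->name = malloc (strlen (name) + 1);
  if (s->name == NULL)
    {
      free (s);
      return NULL;
    }
  strcpy (s->name, name);
  h = ob_hash (name);
  s->next = ob_table[h];        /* push onto the bucket's chain */
  ob_table[h] = s;
  return s;
}
```

Interning the same name twice returns the same pointer, which is the property that lets a symbol be identified with its name.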
7439 You can create your own obarray using @code{make-vector} (this is
7440 horrible but is an artifact) and intern symbols into that obarray.
7441 Doing that will result in two or more symbols with the same name.
7442 However, at most one of these symbols is in the standard @code{obarray}:
7443 You cannot have two symbols of the same name in any particular obarray.
7444 Note that you cannot add a symbol to an obarray in any fashion other
7445 than using @code{intern}: i.e. you can't take an existing symbol and put
7446 it in an existing obarray. Nor can you change the name of an existing
7447 symbol. (Since obarrays are vectors, you can violate the consistency of
7448 things by storing directly into the vector, but let's ignore that
7451 Usually symbols are created by @code{intern}, but if you really want,
7452 you can explicitly create a symbol using @code{make-symbol}, giving it
7453 some name. The resulting symbol is not in any obarray (i.e. it is
7454 @dfn{uninterned}), and you can't add it to any obarray. Therefore its
7455 primary purpose is as a symbol to use in macros to avoid namespace
7456 pollution. It can also be used as a carrier of information, but cons
7457 cells could probably be used just as well.
7459 You can also use @code{intern-soft} to look up a symbol but not create
7460 a new one, and @code{unintern} to remove a symbol from an obarray. This
7461 returns the removed symbol. (Remember: You can't put the symbol back
7462 into any obarray.) Finally, @code{mapatoms} maps over all of the symbols
7466 @section Symbol Values
7467 @cindex symbol values
7468 @cindex values, symbol
7470 The value field of a symbol normally contains a Lisp object. However,
7471 a symbol can be @dfn{unbound}, meaning that it logically has no value.
7472 This is internally indicated by storing a special Lisp object, called
7473 @dfn{the unbound marker} and stored in the global variable
7474 @code{Qunbound}. The unbound marker is of a special Lisp object type
7475 called @dfn{symbol-value-magic}. It is impossible for the Lisp
7476 programmer to directly create or access any object of this type.
7478 @strong{You must not let any ``symbol-value-magic'' object escape to
7479 the Lisp level.} Printing any of these objects will cause the message
7480 @samp{INTERNAL EMACS BUG} to appear as part of the print representation.
7481 (You may see this normally when you call @code{debug_print()} from the
7482 debugger on a Lisp object.) If you let one of these objects escape to
7483 the Lisp level, you will violate a number of assumptions contained in
7484 the C code and make the unbound marker not function right.
7486 When a symbol is created, its value field (and function field) are set
7487 to @code{Qunbound}. The Lisp programmer can restore these conditions
7488 later using @code{makunbound} or @code{fmakunbound}, and can query to
7489 see whether the value or function fields are @dfn{bound} (i.e. have a
7490 value other than @code{Qunbound}) using @code{boundp} and
7491 @code{fboundp}. The fields are set to a normal Lisp object using
7492 @code{set} (or @code{setq}) and @code{fset}.
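The idea of a distinguished unbound marker can be sketched with a sentinel object. The model below is only an analogy: the @code{um_} names are hypothetical, and a unique address stands in for the real @code{Qunbound} object of type symbol-value-magic.

```c
#include <assert.h>
#include <stddef.h>

typedef const void *um_object;  /* toy stand-in for Lisp_Object */

/* The address of this variable serves as the unbound marker.  */
static const char um_unbound_marker = 0;
#define UM_UNBOUND ((um_object) &um_unbound_marker)

struct um_symbol
{
  um_object value;
  um_object function;
};

/* A freshly created symbol has both fields unbound.  */
void
um_init (struct um_symbol *s)
{
  s->value = UM_UNBOUND;
  s->function = UM_UNBOUND;
}

/* Analogues of boundp, set and makunbound for the value field.  */
int  um_boundp (const struct um_symbol *s) { return s->value != UM_UNBOUND; }
void um_set (struct um_symbol *s, um_object v) { s->value = v; }
void um_makunbound (struct um_symbol *s) { s->value = UM_UNBOUND; }
```

Code that hands values back to its callers would have to check for the marker first, which mirrors the rule that the unbound marker must never escape to the Lisp level.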
7494 Other symbol-value-magic objects are used as special markers to
7495 indicate variables that have non-normal properties. This includes any
7496 variables that are tied into C variables (setting the variable magically
7497 sets some global variable in the C code, and likewise for retrieving the
7498 variable's value), variables that magically tie into slots in the
7499 current buffer, variables that are buffer-local, etc. The
7500 symbol-value-magic object is stored in the value cell in place of
7501 a normal object, and the code to retrieve a symbol's value
7502 (i.e. @code{symbol-value}) knows how to do special things with them.
7503 This means that you should not just fetch the value cell directly if you
7504 want a symbol's value.
7506 The exact workings of this are rather complex and involved and are
7507 well-documented in comments in @file{buffer.c}, @file{symbols.c}, and
7510 @node Buffers and Textual Representation, MULE Character Sets and Encodings, Symbols and Variables, Top
7511 @chapter Buffers and Textual Representation
7512 @cindex buffers and textual representation
7513 @cindex textual representation, buffers and
7516 * Introduction to Buffers:: A buffer holds a block of text such as a file.
7517 * The Text in a Buffer:: Representation of the text in a buffer.
7518 * Buffer Lists:: Keeping track of all buffers.
7519 * Markers and Extents:: Tagging locations within a buffer.
7520 * Bufbytes and Emchars:: Representation of individual characters.
7521 * The Buffer Object:: The Lisp object corresponding to a buffer.
7524 @node Introduction to Buffers
7525 @section Introduction to Buffers
7526 @cindex buffers, introduction to
7528 A buffer is logically just a Lisp object that holds some text.
7529 In this, it is like a string, but a buffer is optimized for
7530 frequent insertion and deletion, while a string is not. Furthermore:
7534 Buffers are @dfn{permanent} objects, i.e. once you create them, they
7535 remain around, and need to be explicitly deleted before they go away.
7537 Each buffer has a unique name, which is a string. Buffers are
7538 normally referred to by name. In this respect, they are like
7541 Buffers have a default insertion position, called @dfn{point}.
7542 Inserting text (unless you explicitly give a position) goes at point,
7543 and moves point forward past the text. This is what is going on when
7544 you type text into Emacs.
7546 Buffers have lots of extra properties associated with them.
7548 Buffers can be @dfn{displayed}. What this means is that there
7549 exist a number of @dfn{windows}, which are objects that correspond
7550 to some visible section of your display, and each window has
7551 an associated buffer, and the current contents of the buffer
7552 are shown in that section of the display. The redisplay mechanism
7553 (which takes care of doing this) knows how to look at the
7554 text of a buffer and come up with some reasonable way of displaying
7555 this. Many of the properties of a buffer control how the
7556 buffer's text is displayed.
7558 One buffer is distinguished and called the @dfn{current buffer}. It is
7559 stored in the variable @code{current_buffer}. Buffer operations operate
7560 on this buffer by default. When you are typing text into a buffer, the
7561 buffer you are typing into is always @code{current_buffer}. Switching
7562 to a different window changes the current buffer. Note that Lisp code
7563 can temporarily change the current buffer using @code{set-buffer} (often
7564 enclosed in a @code{save-excursion} so that the former current buffer
7565 gets restored when the code is finished). However, calling
7566 @code{set-buffer} will NOT cause a permanent change in the current
7567 buffer. The reason for this is that the top-level event loop sets
7568 @code{current_buffer} to the buffer of the selected window, each time
7569 it finishes executing a user command.
7572 Make sure you understand the distinction between @dfn{current buffer}
7573 and @dfn{buffer of the selected window}, and the distinction between
7574 @dfn{point} of the current buffer and @dfn{window-point} of the selected
7575 window. (This latter distinction is explained in detail in the section
7578 @node The Text in a Buffer
7579 @section The Text in a Buffer
7580 @cindex text in a buffer, the
7581 @cindex buffer, the text in a
7583 The text in a buffer consists of a sequence of zero or more
7584 characters. A @dfn{character} is an integer that logically represents
7585 a letter, number, space, or other unit of text. Most of the characters
7586 that you will typically encounter belong to the ASCII set of characters,
7587 but there are also characters for various sorts of accented letters,
7588 special symbols, Chinese and Japanese characters (e.g. Kanji, Katakana,
7589 etc.), Cyrillic and Greek letters, etc. The actual number of possible
7590 characters is quite large.
7592 For now, we can view a character as some non-negative integer that
7593 has some shape that defines how it typically appears (e.g. as an
7594 uppercase A). (The exact way in which a character appears depends on the
7595 font used to display the character.) The internal type of characters in
7596 the C code is an @code{Emchar}; this is just an @code{int}, but using a
7597 symbolic type makes the code clearer.
7599 Between every character in a buffer is a @dfn{buffer position} or
7600 @dfn{character position}. We can speak of the character before or after
7601 a particular buffer position, and when you insert a character at a
7602 particular position, all characters after that position end up at new
7603 positions. When we speak of the character @dfn{at} a position, we
7604 really mean the character after the position. (This ambiguity
7605 between a buffer position being ``between'' characters and ``on'' a
7606 character is rampant in Emacs.)
7608 Buffer positions are numbered starting at 1. This means that
7609 position 1 is before the first character, and position 0 is not
7610 valid. If there are N characters in a buffer, then buffer
7611 position N+1 is after the last one, and position N+2 is not valid.
7613 The internal makeup of the Emchar integer varies depending on whether
7614 we have compiled with MULE support. If not, the Emchar integer is an
7615 8-bit integer with possible values from 0 - 255. 0 - 127 are the
7616 standard ASCII characters, while 128 - 255 are the characters from the
7617 ISO-8859-1 character set. If we have compiled with MULE support, an
7618 Emchar is a 19-bit integer, with the various bits having meanings
7619 according to a complex scheme that will be detailed later. The
7620 characters numbered 0 - 255 still have the same meanings as for the
7621 non-MULE case, though.
7623 Internally, the text in a buffer is represented in a fairly simple
7624 fashion: as a contiguous array of bytes, with a @dfn{gap} of some size
7625 in the middle. Although the gap is of some substantial size in bytes,
7626 there is no text contained within it: From the perspective of the text
7627 in the buffer, it does not exist. The gap logically sits at some buffer
7628 position, between two characters (or possibly at the beginning or end of
7629 the buffer). Insertion of text in a buffer at a particular position is
7630 always accomplished by first moving the gap to that position
7631 (i.e. through some block moving of text), then writing the text into the
7632 beginning of the gap, thereby shrinking the gap. If the gap shrinks
7633 down to nothing, a new gap is created. (What actually happens is that a
7634 new gap is ``created'' at the end of the buffer's text, which requires
7635 nothing more than changing a couple of indices; then the gap is
7636 ``moved'' to the position where the insertion needs to take place by
7637 moving up in memory all the text after that position.) Similarly,
7638 deletion occurs by moving the gap to the place where the text is to be
7639 deleted, and then simply expanding the gap to include the deleted text.
7640 (@dfn{Expanding} and @dfn{shrinking} the gap as just described means
7641 just that the internal indices that keep track of where the gap is
7642 located are changed.)
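The gap mechanism described above can be made concrete with a small sketch. It is deliberately simplified and rests on assumptions that do not hold for real buffers: the @code{gb_} names are hypothetical, the storage has a fixed capacity (so a full gap is reported as an error instead of being enlarged), offsets are 0-based byte offsets, and one character is one byte.

```c
#include <assert.h>
#include <string.h>

#define GB_CAPACITY 32

struct gap_buffer
{
  char text[GB_CAPACITY];
  int gap_start;        /* memory offset of the first byte of the gap */
  int gap_end;          /* memory offset just past the gap */
};

void
gb_init (struct gap_buffer *b)
{
  b->gap_start = 0;
  b->gap_end = GB_CAPACITY;     /* initially the gap is everything */
}

int
gb_length (const struct gap_buffer *b)
{
  return GB_CAPACITY - (b->gap_end - b->gap_start);
}

/* Move the gap so that it starts at text offset POS.  */
void
gb_move_gap (struct gap_buffer *b, int pos)
{
  if (pos < b->gap_start)
    {
      int n = b->gap_start - pos;       /* text to shift up past the gap */
      memmove (b->text + b->gap_end - n, b->text + pos, n);
      b->gap_start -= n;
      b->gap_end -= n;
    }
  else if (pos > b->gap_start)
    {
      int n = pos - b->gap_start;       /* text to shift down before it */
      memmove (b->text + b->gap_start, b->text + b->gap_end, n);
      b->gap_start += n;
      b->gap_end += n;
    }
}

/* Insert LEN bytes of S at POS by writing them into the start of the
   gap.  Returns 0 on success, -1 on bad arguments or a full gap.  */
int
gb_insert (struct gap_buffer *b, int pos, const char *s, int len)
{
  if (pos < 0 || len < 0 || pos > gb_length (b)
      || len > b->gap_end - b->gap_start)
    return -1;
  gb_move_gap (b, pos);
  memcpy (b->text + b->gap_start, s, len);
  b->gap_start += len;                  /* the gap shrinks */
  return 0;
}

/* Delete LEN bytes at POS by expanding the gap over them.  */
int
gb_delete (struct gap_buffer *b, int pos, int len)
{
  if (pos < 0 || len < 0 || pos + len > gb_length (b))
    return -1;
  gb_move_gap (b, pos);
  b->gap_end += len;                    /* only an index changes */
  return 0;
}

/* Copy the text without the gap into OUT (at least GB_CAPACITY + 1
   bytes), NUL-terminated.  */
void
gb_contents (const struct gap_buffer *b, char *out)
{
  memcpy (out, b->text, b->gap_start);
  memcpy (out + b->gap_start, b->text + b->gap_end,
          GB_CAPACITY - b->gap_end);
  out[gb_length (b)] = '\0';
}
```

Note that deletion copies no text at all once the gap is in place, and that repeated insertions at the same spot only move the gap once.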
7644 Note that the total amount of memory allocated for a buffer text never
7645 decreases while the buffer is live. Therefore, if you load up a
7646 20-megabyte file and then delete all but one character, there will be a
7647 20-megabyte gap, which won't get any smaller (except by inserting
7648 characters back again). Once the buffer is killed, the memory allocated
7649 for the buffer text will be freed, but it will still be sitting on the
7650 heap, taking up virtual memory, and will not be released back to the
7651 operating system. (However, if you have compiled XEmacs with rel-alloc,
7652 the situation is different. In this case, the space @emph{will} be
7653 released back to the operating system. However, this tends to result in a
7654 noticeable speed penalty.)
7656 Astute readers may notice that the text in a buffer is represented as
7657 an array of @emph{bytes}, while (at least in the MULE case) an Emchar is
7658 a 19-bit integer, which clearly cannot fit in a byte. This means (of
7659 course) that the text in a buffer uses a different representation from
7660 an Emchar: specifically, the 19-bit Emchar becomes a series of one to
7661 four bytes. The conversion between these two representations is complex
7662 and will be described later.
7664 In the non-MULE case, everything is very simple: An Emchar
7665 is an 8-bit value, which fits neatly into one byte.
7667 If we are given a buffer position and want to retrieve the
7668 character at that position, we need to follow these steps:
7672 Pretend there's no gap, and convert the buffer position into a @dfn{byte
7673 index} that indexes to the appropriate byte in the buffer's stream of
7674 textual bytes. By convention, byte indices begin at 1, just like buffer
7675 positions. In the non-MULE case, byte indices and buffer positions are
7676 identical, since one character equals one byte.
7678 Convert the byte index into a @dfn{memory index}, which takes the gap
7679 into account. The memory index is a direct index into the block of
7680 memory that stores the text of a buffer. This basically just involves
7681 checking to see if the byte index is past the gap, and if so, adding the
7682 size of the gap to it. By convention, memory indices begin at 1, just
7683 like buffer positions and byte indices, and when referring to the
7684 position that is @dfn{at} the gap, we always use the memory position at
7685 the @emph{beginning}, not at the end, of the gap.
7687 Fetch the appropriate bytes at the determined memory position.
7689 Convert these bytes into an Emchar.
7692 In the non-MULE case, (3) and (4) boil down to a simple one-byte
7695 Note that we have defined three types of positions in a buffer:
7699 @dfn{buffer positions} or @dfn{character positions}, typedef @code{Bufpos}
7701 @dfn{byte indices}, typedef @code{Bytind}
7703 @dfn{memory indices}, typedef @code{Memind}
7706 All three typedefs are just @code{int}s, but defining them this way makes
7707 things a lot clearer.
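The chain of conversions from buffer position to byte index to memory index can be sketched for the non-MULE case. Only the typedef names @code{Bufpos}, @code{Bytind} and @code{Memind} come from the text; the @code{pos_} structure and functions are hypothetical, and the sketch assumes one byte per character.

```c
#include <assert.h>

typedef int Bufpos;     /* character position, starting at 1 */
typedef int Bytind;     /* byte index, starting at 1 */
typedef int Memind;     /* memory index, starting at 1 */

/* Hypothetical description of a buffer's storage.  */
struct pos_text
{
  const unsigned char *mem;  /* text plus gap; mem[0] is memory index 1 */
  Bytind gap_pos;            /* byte index at which the gap sits */
  int gap_size;              /* size of the gap in bytes */
};

/* Non-MULE assumption: one character is exactly one byte.  */
Bytind
pos_bufpos_to_bytind (Bufpos pos)
{
  return pos;
}

/* Convert a position: only indices past the gap are shifted, so the
   position at the gap maps to the beginning of the gap.  */
Memind
pos_bytind_to_memind (const struct pos_text *t, Bytind ind)
{
  return ind > t->gap_pos ? ind + t->gap_size : ind;
}

/* Fetch the character after POS.  The character after the gap
   position itself is stored beyond the gap, hence >= here.  */
unsigned char
pos_fetch_char (const struct pos_text *t, Bufpos pos)
{
  Bytind ind = pos_bufpos_to_bytind (pos);
  Memind m = ind >= t->gap_pos ? ind + t->gap_size : ind;
  return t->mem[m - 1];
}
```

The two comparisons differ on purpose: a position that coincides with the gap is reported at the gap's beginning, whereas the character following that position is physically stored just after the gap's end.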
7709 Most code works with buffer positions. In particular, all Lisp code
7710 that refers to text in a buffer uses buffer positions. Lisp code does
7711 not know that byte indices or memory indices exist.
7713 Finally, we have a typedef for the bytes in a buffer. This is a
7714 @code{Bufbyte}, which is an unsigned char. Referring to them as
7715 Bufbytes underscores the fact that we are working with a string of bytes
7716 in the internal Emacs buffer representation rather than in one of a
7717 number of possible alternative representations (e.g. EUC-encoded text,
7721 @section Buffer Lists
7722 @cindex buffer lists
7724 Recall from earlier that buffers are @dfn{permanent} objects, i.e. that
7725 they remain around until explicitly deleted. This entails that there is
7726 a list of all the buffers in existence. This list is actually an
7727 assoc-list (mapping from the buffer's name to the buffer) and is stored
7728 in the global variable @code{Vbuffer_alist}.
7730 The order of the buffers in the list is important: the buffers are
7731 ordered approximately from most-recently-used to least-recently-used.
7732 Switching to a buffer using @code{switch-to-buffer},
7733 @code{pop-to-buffer}, etc. and switching windows using
7734 @code{other-window}, etc. usually brings the new current buffer to the
7735 front of the list. @code{switch-to-buffer}, @code{other-buffer},
7736 etc. look at the beginning of the list to find an alternative buffer to
7737 suggest. You can also explicitly move a buffer to the end of the list
7738 using @code{bury-buffer}.
7740 In addition to the global ordering in @code{Vbuffer_alist}, each frame
7741 has its own ordering of the list. These lists always contain the same
7742 elements as in @code{Vbuffer_alist} although possibly in a different
7743 order. @code{buffer-list} normally returns the list for the selected
7744 frame. This allows you to work in separate frames without things
7745 interfering with each other.
7747 The standard way to look up a buffer given a name is
7748 @code{get-buffer}, and the standard way to create a new buffer is
7749 @code{get-buffer-create}, which looks up a buffer with a given name,
7750 creating a new one if necessary. These operations correspond exactly
7751 to the symbol operations @code{intern-soft} and @code{intern},
7752 respectively. You can also force a new buffer to be created using
7753 @code{generate-new-buffer}, which takes a name and (if necessary) makes
7754 a unique name from this by appending a number, and then creates the
7755 buffer. This is basically like the symbol operation @code{gensym}.
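The three lookup and creation operations can be contrasted in a toy registry of buffer names. This sketch makes several assumptions: the @code{bl_} names are hypothetical, buffers are reduced to their names in a fixed-size table, the most-recently-used ordering is not modelled, and the @samp{<N>} suffix format used for uniquified names is assumed for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BL_MAX 8
#define BL_NAME_LEN 32

static char bl_names[BL_MAX][BL_NAME_LEN];
static int bl_count;

/* Like get-buffer: index of the buffer named NAME, or -1.  */
int
bl_get (const char *name)
{
  int i;
  for (i = 0; i < bl_count; i++)
    if (strcmp (bl_names[i], name) == 0)
      return i;
  return -1;
}

/* Like get-buffer-create: look up NAME, creating it if necessary.
   Returns -1 if the table is full or the name is too long.  */
int
bl_get_create (const char *name)
{
  int i = bl_get (name);
  if (i >= 0)
    return i;
  if (bl_count == BL_MAX || strlen (name) >= BL_NAME_LEN)
    return -1;
  strcpy (bl_names[bl_count], name);
  return bl_count++;
}

/* Like generate-new-buffer: always create a buffer, appending a
   number to NAME until the result is unused.  */
int
bl_generate_new (const char *name)
{
  char candidate[BL_NAME_LEN];
  int n;

  if (strlen (name) >= BL_NAME_LEN)
    return -1;
  strcpy (candidate, name);
  for (n = 2; bl_get (candidate) >= 0; n++)
    if (snprintf (candidate, sizeof candidate, "%s<%d>", name, n)
        >= (int) sizeof candidate)
      return -1;                /* uniquified name would not fit */
  return bl_get_create (candidate);
}

/* Name of the buffer at index I (I must be valid).  */
const char *
bl_name (int i)
{
  return bl_names[i];
}
```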
7757 @node Markers and Extents
7758 @section Markers and Extents
7759 @cindex markers and extents
7760 @cindex extents, markers and
7762 Among the things associated with a buffer are things that are
7763 logically attached to certain buffer positions. This can be used to
7764 keep track of a buffer position when text is inserted and deleted, so
7765 that it remains at the same spot relative to the text around it; to
7766 assign properties to particular sections of text; etc. There are two
7767 such objects that are useful in this regard: they are @dfn{markers} and
7770 A @dfn{marker} is simply a flag placed at a particular buffer
7771 position, which is moved around as text is inserted and deleted.
7772 Markers are used for all sorts of purposes, such as the @code{mark} that
7773 is the other end of textual regions to be cut, copied, etc.
7775 An @dfn{extent} is similar to two markers plus some associated
7776 properties, and is used to keep track of regions in a buffer as text is
7777 inserted and deleted, and to add properties (e.g. fonts) to particular
7778 regions of text. The external interface of extents is explained
7781 The important thing here is that markers and extents simply contain
7782 buffer positions in them as integers, and every time text is inserted or
7783 deleted, these positions must be updated. In order to minimize the
7784 amount of shuffling that needs to be done, the positions in markers and
7785 extents (there's one per marker, two per extent) are stored in Meminds.
7786 This means that they only need to be moved when the text is physically
7787 moved in memory; since the gap structure tries to minimize this, it also
7788 minimizes the number of marker and extent indices that need to be
7789 adjusted. Look in @file{insdel.c} for the details of how this works.
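The benefit of storing marker positions as Meminds can be seen in a small model. The sketch below is not the @file{insdel.c} code: the @code{mk_} names are hypothetical, only the gap bookkeeping is modelled, and the treatment of a marker that sits exactly at the insertion point is ignored.

```c
#include <assert.h>

typedef int Bytind;     /* byte index, starting at 1 */
typedef int Memind;     /* memory index, starting at 1 */

/* Hypothetical gap bookkeeping for one buffer.  */
struct mk_text
{
  Memind gap_start;     /* memory index of the beginning of the gap */
  int gap_size;         /* size of the gap in bytes */
};

/* Recover the byte index a stored memory index currently denotes.  */
Bytind
mk_memind_to_bytind (const struct mk_text *t, Memind m)
{
  return m > t->gap_start ? m - t->gap_size : m;
}

/* Model inserting NBYTES at the gap: the gap shrinks from the front,
   and no text after the gap moves in memory.  */
void
mk_insert_at_gap (struct mk_text *t, int nbytes)
{
  t->gap_start += nbytes;
  t->gap_size -= nbytes;
}
```

A marker stored past the gap keeps the same Memind across the insertion, yet the byte index it denotes grows by the number of bytes inserted. Had the marker stored a buffer position, every marker after the insertion point would have needed updating.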
7791 One other important distinction is that markers are @dfn{temporary}
7792 while extents are @dfn{permanent}. This means that markers disappear as
7793 soon as there are no more pointers to them, and correspondingly, there
7794 is no way to determine what markers are in a buffer if you are just
7795 given the buffer. Extents remain in a buffer until they are detached
7796 (which could happen as a result of text being deleted) or the buffer is
7797 deleted, and primitives do exist to enumerate the extents in a buffer.
7799 @node Bufbytes and Emchars
7800 @section Bufbytes and Emchars
7801 @cindex Bufbytes and Emchars
7802 @cindex Emchars, Bufbytes and
7806 @node The Buffer Object
7807 @section The Buffer Object
7808 @cindex buffer object, the
7809 @cindex object, the buffer
7811 Buffers contain fields not directly accessible by the Lisp programmer.
7812 We describe them here, naming them by the names used in the C code.
7813 Many are accessible indirectly in Lisp programs via Lisp primitives.
7817 The buffer name is a string that names the buffer. It is guaranteed to
7818 be unique. @xref{Buffer Names,,, lispref, XEmacs Lisp Reference
7822 This field contains the time when the buffer was last saved, as an
7823 integer. @xref{Buffer Modification,,, lispref, XEmacs Lisp Reference
7827 This field contains the modification time of the visited file. It is
7828 set when the file is written or read. Every time the buffer is written
7829 to the file, this field is compared to the modification time of the
7830 file. @xref{Buffer Modification,,, lispref, XEmacs Lisp Reference
7833 @item auto_save_modified
7834 This field contains the time when the buffer was last auto-saved.
7836 @item last_window_start
7837 This field contains the @code{window-start} position in the buffer as of
7838 the last time the buffer was displayed in a window.
7841 This field points to the buffer's undo list. @xref{Undo,,, lispref,
7842 XEmacs Lisp Reference Manual}.
7844 @item syntax_table_v
7845 This field contains the syntax table for the buffer. @xref{Syntax
7846 Tables,,, lispref, XEmacs Lisp Reference Manual}.
7848 @item downcase_table
7849 This field contains the conversion table for converting text to lower
7850 case. @xref{Case Tables,,, lispref, XEmacs Lisp Reference Manual}.
7853 This field contains the conversion table for converting text to upper
7854 case. @xref{Case Tables,,, lispref, XEmacs Lisp Reference Manual}.
7856 @item case_canon_table
7857 This field contains the conversion table for canonicalizing text for
7858 case-folding search. @xref{Case Tables,,, lispref, XEmacs Lisp
7861 @item case_eqv_table
7862 This field contains the equivalence table for case-folding search.
7863 @xref{Case Tables,,, lispref, XEmacs Lisp Reference Manual}.
7866 This field contains the buffer's display table, or @code{nil} if it
7867 doesn't have one. @xref{Display Tables,,, lispref, XEmacs Lisp
7871 This field contains the chain of all markers that currently point into
7872 the buffer. Deletion of text in the buffer, and motion of the buffer's
7873 gap, must check each of these markers and perhaps update it.
7874 @xref{Markers,,, lispref, XEmacs Lisp Reference Manual}.
7877 This field is a flag that tells whether a backup file has been made for
7878 the visited file of this buffer.
7881 This field contains the mark for the buffer. The mark is a marker,
7882 hence it is also included on the list @code{markers}. @xref{The Mark,,,
7883 lispref, XEmacs Lisp Reference Manual}.
7886 This field is non-@code{nil} if the buffer's mark is active.
7888 @item local_var_alist
7889 This field contains the association list describing the variables local
7890 in this buffer, and their values, with the exception of local variables
7891 that have special slots in the buffer object. (Those slots are omitted
from this table.) @xref{Buffer-Local Variables,,, lispref, XEmacs Lisp
Reference Manual}.
7895 @item modeline_format
7896 This field contains a Lisp object which controls how to display the mode
line for this buffer. @xref{Modeline Format,,, lispref, XEmacs Lisp
Reference Manual}.
7901 This field holds the buffer's base buffer (if it is an indirect buffer),
7905 @node MULE Character Sets and Encodings, The Lisp Reader and Compiler, Buffers and Textual Representation, Top
7906 @chapter MULE Character Sets and Encodings
7907 @cindex Mule character sets and encodings
7908 @cindex character sets and encodings, Mule
7909 @cindex encodings, Mule character sets and
7911 Recall that there are two primary ways that text is represented in
7912 XEmacs. The @dfn{buffer} representation sees the text as a series of
7913 bytes (Bufbytes), with a variable number of bytes used per character.
7914 The @dfn{character} representation sees the text as a series of integers
7915 (Emchars), one per character. The character representation is a cleaner
7916 representation from a theoretical standpoint, and is thus used in many
7917 cases when lots of manipulations on a string need to be done. However,
7918 the buffer representation is the standard representation used in both
7919 Lisp strings and buffers, and because of this, it is the ``default''
7920 representation that text comes in. The reason for using this
7921 representation is that it's compact and is compatible with ASCII.
7926 * Internal Mule Encodings::
7930 @node Character Sets
7931 @section Character Sets
7932 @cindex character sets
7934 A character set (or @dfn{charset}) is an ordered set of characters. A
7935 particular character in a charset is indexed using one or more
7936 @dfn{position codes}, which are non-negative integers. The number of
7937 position codes needed to identify a particular character in a charset is
7938 called the @dfn{dimension} of the charset. In XEmacs/Mule, all charsets
7939 have dimension 1 or 2, and the size of all charsets (except for a few
7940 special cases) is either 94, 96, 94 by 94, or 96 by 96. The range of
7941 position codes used to index characters from any of these types of
7942 character sets is as follows:
7945 Charset type Position code 1 Position code 2
7946 ------------------------------------------------------------
7949 94x94 33 - 126 33 - 126
7950 96x96 32 - 127 32 - 127
7953 Note that in the above cases position codes do not start at an
7954 expected value such as 0 or 1. The reason for this will become clear
7957 For example, Latin-1 is a 96-character charset, and JISX0208 (the
7958 Japanese national character set) is a 94x94-character charset.
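
As a rough illustration of the ranges above, the following C sketch
checks whether position codes are valid for a given type of charset.
The names @code{charset_type} and @code{position_codes_valid} are
hypothetical and are not taken from the XEmacs sources; the sketch
assumes only the 94- and 96-based ranges described in this section.

```c
#include <stdbool.h>

/* Hypothetical charset types; not the names used in the XEmacs sources. */
enum charset_type { CS_94, CS_96, CS_94x94, CS_96x96 };

/* Return true if PC lies in the position-code range for a charset
   whose per-dimension size is 94 (33 - 126) or 96 (32 - 127). */
static bool pc_in_range(int pc, bool is_96)
{
    return is_96 ? (pc >= 32 && pc <= 127) : (pc >= 33 && pc <= 126);
}

/* Validate position codes; PC2 is ignored for dimension-1 charsets. */
bool position_codes_valid(enum charset_type type, int pc1, int pc2)
{
    bool is_96 = (type == CS_96 || type == CS_96x96);
    bool dim2 = (type == CS_94x94 || type == CS_96x96);

    if (!pc_in_range(pc1, is_96))
        return false;
    return dim2 ? pc_in_range(pc2, is_96) : true;
}
```

Note that this checks only the @emph{valid} range; it says nothing
about whether a particular slot in a charset is actually occupied.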
7960 [Note that, although the ranges above define the @emph{valid} position
7961 codes for a charset, some of the slots in a particular charset may in
7962 fact be empty. This is the case for JISX0208, for example, where (e.g.)
all the slots whose first position code is in the range 118 - 126 are
7966 There are three charsets that do not follow the above rules. All of
7967 them have one dimension, and have ranges of position codes as follows:
7970 Charset name Position code 1
7971 ------------------------------------
7974 Composite 0 - some large number
7977 (The upper bound of the position code for composite characters has not
7978 yet been determined, but it will probably be at least 16,383).
7980 ASCII is the union of two subsidiary character sets: Printing-ASCII
7981 (the printing ASCII character set, consisting of position codes 33 -
7982 126, like for a standard 94-character charset) and Control-ASCII (the
7983 non-printing characters that would appear in a binary file with codes 0
7986 Control-1 contains the non-printing characters that would appear in a
7987 binary file with codes 128 - 159.
7989 Composite contains characters that are generated by overstriking one
7990 or more characters from other charsets.
7992 Note that some characters in ASCII, and all characters in Control-1,
7993 are @dfn{control} (non-printing) characters. These have no printed
representation but instead control some other aspect of printing
(e.g. TAB, code 9, moves the current character position to the next tab
7996 stop). All other characters in all charsets are @dfn{graphic}
7997 (printing) characters.
7999 When a binary file is read in, the bytes in the file are assigned to
8000 character sets as follows:
Bytes             Character set     Range
--------------------------------------------------
0 - 127           ASCII             0 - 127
128 - 159         Control-1         0 - 31
160 - 255         Latin-1           32 - 127
8010 This is a bit ad-hoc but gets the job done.
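
The byte assignment above can be sketched in C as follows. This is an
illustration only: the type and function names are hypothetical, and
the real decoding code in XEmacs is organized differently.

```c
#include <assert.h>

/* Hypothetical identifiers for the three charsets in the table above. */
enum binary_charset { BIN_ASCII, BIN_CONTROL_1, BIN_LATIN_1 };

struct charset_pos {
    enum binary_charset charset;
    int position_code;
};

/* Map one byte of a binary file to a (charset, position code) pair. */
struct charset_pos classify_binary_byte(int byte)
{
    struct charset_pos result;

    assert(byte >= 0 && byte <= 255);
    if (byte <= 127) {
        result.charset = BIN_ASCII;
        result.position_code = byte;        /* 0 - 127 */
    } else if (byte <= 159) {
        result.charset = BIN_CONTROL_1;
        result.position_code = byte - 128;  /* 0 - 31 */
    } else {
        result.charset = BIN_LATIN_1;
        result.position_code = byte - 128;  /* 32 - 127 */
    }
    return result;
}
```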
8014 @cindex encodings, Mule
8015 @cindex Mule encodings
8017 An @dfn{encoding} is a way of numerically representing characters from
8018 one or more character sets. If an encoding only encompasses one
8019 character set, then the position codes for the characters in that
8020 character set could be used directly. This is not possible, however, if
8021 more than one character set is to be used in the encoding.
8023 For example, the conversion detailed above between bytes in a binary
8024 file and characters is effectively an encoding that encompasses the
8025 three character sets ASCII, Control-1, and Latin-1 in a stream of 8-bit
8028 Thus, an encoding can be viewed as a way of encoding characters from a
8029 specified group of character sets using a stream of bytes, each of which
8030 contains a fixed number of bits (but not necessarily 8, as in the common
Here are descriptions of a couple of common encodings:
8037 * Japanese EUC (Extended Unix Code)::
8041 @node Japanese EUC (Extended Unix Code)
8042 @subsection Japanese EUC (Extended Unix Code)
8043 @cindex Japanese EUC (Extended Unix Code)
8044 @cindex EUC (Extended Unix Code), Japanese
8045 @cindex Extended Unix Code, Japanese EUC
This encompasses the character sets Printing-ASCII, Japanese-JISX0208,
and Japanese-JISX0201-Kana (half-width katakana, the right half of
JISX0201). It uses 8-bit bytes.
8051 Note that Printing-ASCII and Japanese-JISX0201-Kana are 94-character
8052 charsets, while Japanese-JISX0208 is a 94x94-character charset.
8054 The encoding is as follows:
Character set             Representation (PC=position-code)
-------------             --------------
Japanese-JISX0201-Kana    0x8E       | PC1 + 0x80
Japanese-JISX0208         PC1 + 0x80 | PC2 + 0x80
Japanese-JISX0212         0x8F       | PC1 + 0x80 | PC2 + 0x80
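
A minimal C sketch of this encoding, covering Printing-ASCII (whose
characters are represented by their position code alone),
Japanese-JISX0201-Kana, and Japanese-JISX0208, might look like the
following. The enum and the function @code{euc_jp_encode} are
hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical charset tags for this sketch. */
enum euc_charset { EUC_ASCII, EUC_JISX0201_KANA, EUC_JISX0208 };

/* Encode one character into OUT and return the number of bytes
   written (1 or 2).  PC2 is used only for the 94x94 charset. */
size_t euc_jp_encode(enum euc_charset cs, int pc1, int pc2,
                     unsigned char out[2])
{
    assert(pc1 >= 33 && pc1 <= 126);
    switch (cs) {
    case EUC_ASCII:
        out[0] = (unsigned char) pc1;            /* PC1 */
        return 1;
    case EUC_JISX0201_KANA:
        out[0] = 0x8E;                           /* single-shift byte */
        out[1] = (unsigned char) (pc1 + 0x80);
        return 2;
    case EUC_JISX0208:
        assert(pc2 >= 33 && pc2 <= 126);
        out[0] = (unsigned char) (pc1 + 0x80);
        out[1] = (unsigned char) (pc2 + 0x80);
        return 2;
    }
    return 0;  /* not reached for valid charset tags */
}
```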
8070 This encompasses the character sets Printing-ASCII,
8071 Japanese-JISX0201-Roman (the left half of JISX0201; this character set
8072 is very similar to Printing-ASCII and is a 94-character charset),
8073 Japanese-JISX0208, and Japanese-JISX0201-Kana. It uses 7-bit bytes.
8075 Unlike Japanese EUC, this is a @dfn{modal} encoding, which
8076 means that there are multiple states that the encoding can
8077 be in, which affect how the bytes are to be interpreted.
8078 Special sequences of bytes (called @dfn{escape sequences})
8079 are used to change states.
8081 The encoding is as follows:
Character set              Representation (PC=position-code)
-------------              --------------
Japanese-JISX0201-Roman    PC1
Japanese-JISX0201-Kana     PC1
Japanese-JISX0208          PC1 PC2
Escape sequence     ASCII equivalent     Meaning
---------------     ----------------     -------
0x1B 0x28 0x4A      ESC ( J              invoke Japanese-JISX0201-Roman
0x1B 0x28 0x49      ESC ( I              invoke Japanese-JISX0201-Kana
0x1B 0x24 0x42      ESC $ B              invoke Japanese-JISX0208
0x1B 0x28 0x42      ESC ( B              invoke Printing-ASCII
8100 Initially, Printing-ASCII is invoked.
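
To make the notion of a modal encoding concrete, here is a small C
sketch of a state machine that walks a JIS7 byte stream and counts the
characters in it. It assumes only the four escape sequences listed
above; the names @code{jis7_state} and @code{jis7_count_chars} are
hypothetical, and a real decoder would also have to produce the
characters and handle other escape sequences.

```c
#include <stddef.h>

enum jis7_state { JIS7_ASCII, JIS7_ROMAN, JIS7_KANA, JIS7_JISX0208 };

/* Apply the three-byte escape sequence starting at ESC (ESC[0] is
   0x1B).  Returns 0 on success, -1 for an unknown sequence. */
static int jis7_apply_escape(const unsigned char *esc,
                             enum jis7_state *state)
{
    if (esc[1] == 0x28 && esc[2] == 0x4A)
        *state = JIS7_ROMAN;
    else if (esc[1] == 0x28 && esc[2] == 0x49)
        *state = JIS7_KANA;
    else if (esc[1] == 0x24 && esc[2] == 0x42)
        *state = JIS7_JISX0208;
    else if (esc[1] == 0x28 && esc[2] == 0x42)
        *state = JIS7_ASCII;
    else
        return -1;
    return 0;
}

/* Count the characters in S, or return -1 on malformed input. */
long jis7_count_chars(const unsigned char *s, size_t len)
{
    enum jis7_state state = JIS7_ASCII;  /* initially Printing-ASCII */
    long count = 0;
    size_t i = 0;

    while (i < len) {
        if (s[i] == 0x1B) {
            if (len - i < 3 || jis7_apply_escape(s + i, &state) != 0)
                return -1;
            i += 3;
        } else {
            /* Only the 94x94 charset uses two bytes per character. */
            size_t width = (state == JIS7_JISX0208) ? 2 : 1;
            if (len - i < width)
                return -1;
            i += width;
            count++;
        }
    }
    return count;
}
```

The same byte value means different things depending on the state,
which is why a decoder cannot start in the middle of such a stream.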
8102 @node Internal Mule Encodings
8103 @section Internal Mule Encodings
8104 @cindex internal Mule encodings
8105 @cindex Mule encodings, internal
8106 @cindex encodings, internal Mule
8108 In XEmacs/Mule, each character set is assigned a unique number, called a
8109 @dfn{leading byte}. This is used in the encodings of a character.
8110 Leading bytes are in the range 0x80 - 0xFF (except for ASCII, which has
8111 a leading byte of 0), although some leading bytes are reserved.
8113 Charsets whose leading byte is in the range 0x80 - 0x9F are called
8114 @dfn{official} and are used for built-in charsets. Other charsets are
8115 called @dfn{private} and have leading bytes in the range 0xA0 - 0xFF;
8116 these are user-defined charsets.
8121 Character set Leading byte
8122 ------------- ------------
8125 Dimension-1 Official 0x81 - 0x8D
8128 Dimension-2 Official 0x90 - 0x99
8129 (0x9A - 0x9D are free;
8130 0x9E and 0x9F are reserved)
8131 Dimension-1 Private 0xA0 - 0xEF
8132 Dimension-2 Private 0xF0 - 0xFF
8135 There are two internal encodings for characters in XEmacs/Mule. One is
8136 called @dfn{string encoding} and is an 8-bit encoding that is used for
8137 representing characters in a buffer or string. It uses 1 to 4 bytes per
8138 character. The other is called @dfn{character encoding} and is a 19-bit
8139 encoding that is used for representing characters individually in a
8142 (In the following descriptions, we'll ignore composite characters for
8143 the moment. We also give a general (structural) overview first,
8144 followed later by the exact details.)
8147 * Internal String Encoding::
8148 * Internal Character Encoding::
8151 @node Internal String Encoding
8152 @subsection Internal String Encoding
8153 @cindex internal string encoding
8154 @cindex string encoding, internal
8155 @cindex encoding, internal string
8157 ASCII characters are encoded using their position code directly. Other
8158 characters are encoded using their leading byte followed by their
8159 position code(s) with the high bit set. Characters in private character
8160 sets have their leading byte prefixed with a @dfn{leading byte prefix},
8161 which is either 0x9E or 0x9F. (No character sets are ever assigned these
8162 leading bytes.) Specifically:
8165 Character set Encoding (PC=position-code, LB=leading-byte)
8166 ------------- --------
8168 Control-1 LB | PC1 + 0xA0 |
8169 Dimension-1 official LB | PC1 + 0x80 |
8170 Dimension-1 private 0x9E | LB | PC1 + 0x80 |
8171 Dimension-2 official LB | PC1 + 0x80 | PC2 + 0x80 |
8172 Dimension-2 private 0x9F | LB | PC1 + 0x80 | PC2 + 0x80
The basic characteristic of this encoding is that the first byte
of every character is in the range 0x00 - 0x9F, and the second and
following bytes of every character are in the range 0xA0 - 0xFF.
8178 This means that it is impossible to get out of sync, or more
8183 Given any byte position, the beginning of the character it is
8184 within can be determined in constant time.
8186 Given any byte position at the beginning of a character, the
8187 beginning of the next character can be determined in constant
8190 Given any byte position at the beginning of a character, the
8191 beginning of the previous character can be determined in constant
8194 Textual searches can simply treat encoded strings as if they
8195 were encoded in a one-byte-per-character fashion rather than
8196 the actual multi-byte encoding.
8199 None of the standard non-modal encodings meet all of these
8200 conditions. For example, EUC satisfies only (2) and (3), while
8201 Shift-JIS and Big5 (not yet described) satisfy only (2). (All
8202 non-modal encodings must satisfy (2), in order to be unambiguous.)
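
The first three properties follow directly from the byte ranges, as
the following C sketch suggests. It relies only on the rule that a
first byte is below 0xA0 and every other byte is 0xA0 or above; since
a character occupies at most four bytes, each loop runs at most three
times. The function names are hypothetical and are not the macros used
in the XEmacs sources.

```c
#include <assert.h>
#include <stddef.h>

/* True if B can only be the first byte of a character (0x00 - 0x9F). */
static int is_first_byte(unsigned char b)
{
    return b < 0xA0;
}

/* Property (1): back up from POS to the first byte of the character
   that contains it. */
size_t char_start(const unsigned char *text, size_t pos)
{
    while (pos > 0 && !is_first_byte(text[pos]))
        pos--;
    return pos;
}

/* Property (2): POS is the start of a character; return the start of
   the next one, or LEN if POS is the last character. */
size_t next_char(const unsigned char *text, size_t len, size_t pos)
{
    assert(pos < len && is_first_byte(text[pos]));
    pos++;
    while (pos < len && !is_first_byte(text[pos]))
        pos++;
    return pos;
}

/* Property (3): POS is the start of a character other than the first;
   return the start of the previous one. */
size_t prev_char(const unsigned char *text, size_t pos)
{
    assert(pos > 0);
    return char_start(text, pos - 1);
}
```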
8204 @node Internal Character Encoding
8205 @subsection Internal Character Encoding
8206 @cindex internal character encoding
8207 @cindex character encoding, internal
8208 @cindex encoding, internal character
8210 One 19-bit word represents a single character. The word is
8211 separated into three fields:
8214 Bit number: 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
8215 <------------> <------------------> <------------------>
8219 Note that fields 2 and 3 hold 7 bits each, while field 1 holds 5 bits.
8222 Character set Field 1 Field 2 Field 3
8223 ------------- ------- ------- -------
8228 Dimension-1 official 0 LB - 0x80 PC1
8229 range: (01 - 0D) (20 - 7F)
8230 Dimension-1 private 0 LB - 0x80 PC1
8231 range: (20 - 6F) (20 - 7F)
8232 Dimension-2 official LB - 0x8F PC1 PC2
8233 range: (01 - 0A) (20 - 7F) (20 - 7F)
8234 Dimension-2 private LB - 0xE1 PC1 PC2
8235 range: (0F - 1E) (20 - 7F) (20 - 7F)
Note that character codes 0 - 255 are the same as the ``binary encoding''
described above.
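
As a sketch of how the three fields combine, the following C code packs
a leading byte and position codes into a 19-bit character code for the
four rows shown in the table. The function names are hypothetical; the
XEmacs sources use their own macros for this.

```c
#include <assert.h>

/* Pack field 1 (5 bits), field 2 (7 bits), and field 3 (7 bits) into
   one 19-bit character code. */
static int pack_fields(int f1, int f2, int f3)
{
    assert(f1 >= 0 && f1 < 32);
    assert(f2 >= 0 && f2 < 128);
    assert(f3 >= 0 && f3 < 128);
    return (f1 << 14) | (f2 << 7) | f3;
}

/* Dimension-1 charsets, official or private: field 2 is LB - 0x80. */
int make_dim1_char(int lb, int pc1)
{
    return pack_fields(0, lb - 0x80, pc1);
}

/* Dimension-2 charsets: field 1 is LB - 0x8F for official leading
   bytes and LB - 0xE1 for private ones (0xF0 - 0xFF). */
int make_dim2_char(int lb, int pc1, int pc2)
{
    int f1 = (lb >= 0xF0) ? lb - 0xE1 : lb - 0x8F;
    return pack_fields(f1, pc1, pc2);
}
```

Because field 1 is zero for every dimension-1 charset, such characters
always fit in 14 bits.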
8248 CCL_PROGRAM := (CCL_MAIN_BLOCK
8251 CCL_MAIN_BLOCK := CCL_BLOCK
8252 CCL_EOF_BLOCK := CCL_BLOCK
8254 CCL_BLOCK := STATEMENT | (STATEMENT [STATEMENT ...])
8256 SET | IF | BRANCH | LOOP | REPEAT | BREAK
8259 SET := (REG = EXPRESSION) | (REG SELF_OP EXPRESSION)
8262 EXPRESSION := ARG | (EXPRESSION OP ARG)
8264 IF := (if EXPRESSION CCL_BLOCK CCL_BLOCK)
8265 BRANCH := (branch EXPRESSION CCL_BLOCK [CCL_BLOCK ...])
8266 LOOP := (loop STATEMENT [STATEMENT ...])
8269 | (write-repeat [REG | INT-OR-CHAR | string])
8270 | (write-read-repeat REG [INT-OR-CHAR | string | ARRAY]?)
8271 READ := (read REG) | (read REG REG)
8272 | (read-if REG ARITH_OP ARG CCL_BLOCK CCL_BLOCK)
8273 | (read-branch REG CCL_BLOCK [CCL_BLOCK ...])
8274 WRITE := (write REG) | (write REG REG)
8275 | (write INT-OR-CHAR) | (write STRING) | STRING
8279 REG := r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7
8280 ARG := REG | INT-OR-CHAR
8281 OP := + | - | * | / | % | & | '|' | ^ | << | >> | <8 | >8 | //
8282 | < | > | == | <= | >= | !=
8284 += | -= | *= | /= | %= | &= | '|=' | ^= | <<= | >>=
8285 ARRAY := '[' INT-OR-CHAR ... ']'
8286 INT-OR-CHAR := INT | CHAR
8290 The machine code consists of a vector of 32-bit words.
8291 The first such word specifies the start of the EOF section of the code;
8292 this is the code executed to handle any stuff that needs to be done
8293 (e.g. designating back to ASCII and left-to-right mode) after all
8294 other encoded/decoded data has been written out. This is not used for
8295 charset CCL programs.
REGISTER: 0..7 -- referred to by RRR or rrr
8299 OPERATOR BIT FIELD (27-bit): XXXXXXXXXXXXXXX RRR TTTTT
8300 TTTTT (5-bit): operator type
8301 RRR (3-bit): register number
XXXXXXXXXXXXXXX (15-bit):
8303 CCCCCCCCCCCCCCC: constant or address
8304 000000000000rrr: register number
8331 OPERATORS: TTTTT RRR XX..
8333 SetCS: 00000 RRR C...C RRR = C...C
8334 SetCL: 00001 RRR ..... RRR = c...c
8336 SetR: 00010 RRR ..rrr RRR = rrr
8337 SetA: 00011 RRR ..rrr RRR = array[rrr]
8338 C.............C size of array = C...C
8339 c.............c contents = c...c
8341 Jump: 00100 000 c...c jump to c...c
8342 JumpCond: 00101 RRR c...c if (!RRR) jump to c...c
8343 WriteJump: 00110 RRR c...c Write1 RRR, jump to c...c
8344 WriteReadJump: 00111 RRR c...c Write1, Read1 RRR, jump to c...c
8345 WriteCJump: 01000 000 c...c Write1 C...C, jump to c...c
8347 WriteCReadJump: 01001 RRR c...c Write1 C...C, Read1 RRR,
8348 C.............C and jump to c...c
8349 WriteSJump: 01010 000 c...c WriteS, jump to c...c
8353 WriteSReadJump: 01011 RRR c...c WriteS, Read1 RRR, jump to c...c
8357 WriteAReadJump: 01100 RRR c...c WriteA, Read1 RRR, jump to c...c
8358 C.............C size of array = C...C
8359 c.............c contents = c...c
8361 Branch: 01101 RRR C...C if (RRR >= 0 && RRR < C..)
8362 c.............c branch to (RRR+1)th address
8363 Read1: 01110 RRR ... read 1-byte to RRR
8364 Read2: 01111 RRR ..rrr read 2-byte to RRR and rrr
8365 ReadBranch: 10000 RRR C...C Read1 and Branch
8368 Write1: 10001 RRR ..... write 1-byte RRR
8369 Write2: 10010 RRR ..rrr write 2-byte RRR and rrr
8370 WriteC: 10011 000 ..... write 1-char C...CC
8372 WriteS: 10100 000 ..... write C..-byte of string
8376 WriteA: 10101 RRR ..... write array[RRR]
8377 C.............C size of array = C...C
8378 c.............c contents = c...c
8380 End: 10110 000 ..... terminate the execution
8382 SetSelfCS: 10111 RRR C...C RRR AAAAA= C...C
8384 SetSelfCL: 11000 RRR ..... RRR AAAAA= c...c
8387 SetSelfR: 11001 RRR ..Rrr RRR AAAAA= rrr
8389 SetExprCL: 11010 RRR ..Rrr RRR = rrr AAAAA c...c
8392 SetExprR: 11011 RRR ..rrr RRR = rrr AAAAA Rrr
8395 JumpCondC: 11100 RRR c...c if !(RRR AAAAA C..) jump to c...c
8398 JumpCondR: 11101 RRR c...c if !(RRR AAAAA rrr) jump to c...c
8401 ReadJumpCondC: 11110 RRR c...c Read1 and JumpCondC
8404 ReadJumpCondR: 11111 RRR c...c Read1 and JumpCondR
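
The operator bit field described earlier can be unpacked with a few
shifts and masks. The C sketch below assumes that the fields are
written most-significant first, so that TTTTT occupies the low five
bits of the word, RRR the next three, and the 15-bit operand the bits
above those. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* One decoded CCL operator word. */
struct ccl_insn {
    unsigned type;     /* TTTTT: operator type (5 bits) */
    unsigned reg;      /* RRR: register number (3 bits) */
    unsigned operand;  /* constant, address, or register (15 bits) */
};

/* Split a machine-code word into its three fields. */
struct ccl_insn ccl_decode(uint32_t word)
{
    struct ccl_insn insn;

    insn.type = word & 0x1F;
    insn.reg = (word >> 5) & 0x7;
    insn.operand = (word >> 8) & 0x7FFF;
    return insn;
}

/* Build a machine-code word from its three fields. */
uint32_t ccl_encode(unsigned type, unsigned reg, unsigned operand)
{
    return ((uint32_t) (operand & 0x7FFF) << 8)
         | ((uint32_t) (reg & 0x7) << 5)
         | (uint32_t) (type & 0x1F);
}
```

For example, under this assumption a @code{Write1} of register 3
(operator type 10001) is the word @code{(3 << 5) | 0x11}.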
8409 @node The Lisp Reader and Compiler, Lstreams, MULE Character Sets and Encodings, Top
8410 @chapter The Lisp Reader and Compiler
8411 @cindex Lisp reader and compiler, the
8412 @cindex reader and compiler, the Lisp
8413 @cindex compiler, the Lisp reader and
8417 @node Lstreams, Consoles; Devices; Frames; Windows, The Lisp Reader and Compiler, Top
8421 An @dfn{lstream} is an internal Lisp object that provides a generic
8422 buffering stream implementation. Conceptually, you send data to the
8423 stream or read data from the stream, not caring what's on the other end
8424 of the stream. The other end could be another stream, a file
8425 descriptor, a stdio stream, a fixed block of memory, a reallocating
8426 block of memory, etc. The main purpose of the stream is to provide a
8427 standard interface and to do buffering. Macros are defined to read or
8428 write characters, so the calling functions do not have to worry about
8429 blocking data together in order to achieve efficiency.
8432 * Creating an Lstream:: Creating an lstream object.
8433 * Lstream Types:: Different sorts of things that are streamed.
8434 * Lstream Functions:: Functions for working with lstreams.
8435 * Lstream Methods:: Creating new lstream types.
8438 @node Creating an Lstream
8439 @section Creating an Lstream
8440 @cindex lstream, creating an
8442 Lstreams come in different types, depending on what is being interfaced
8443 to. Although the primitive for creating new lstreams is
8444 @code{Lstream_new()}, generally you do not call this directly. Instead,
8445 you call some type-specific creation function, which creates the lstream
8446 and initializes it as appropriate for the particular type.
8448 All lstream creation functions take a @var{mode} argument, specifying
8449 what mode the lstream should be opened as. This controls whether the
lstream is for input or output, and optionally whether data should be
8451 blocked up in units of MULE characters. Note that some types of
8452 lstreams can only be opened for input; others only for output; and
8453 others can be opened either way. #### Richard Mlynarik thinks that
8454 there should be a strict separation between input and output streams,
8455 and he's probably right.
8457 @var{mode} is a string, one of
8465 Open for reading, but ``read'' never returns partial MULE characters.
8467 Open for writing, but never writes partial MULE characters.
8471 @section Lstream Types
8472 @cindex lstream types
8473 @cindex types, lstream
8484 @item resizing-buffer
8497 @node Lstream Functions
8498 @section Lstream Functions
8499 @cindex lstream functions
8501 @deftypefun {Lstream *} Lstream_new (Lstream_implementation *@var{imp}, const char *@var{mode})
8502 Allocate and return a new Lstream. This function is not really meant to
8503 be called directly; rather, each stream type should provide its own
8504 stream creation function, which creates the stream and does any other
8505 necessary creation stuff (e.g. opening a file).
8508 @deftypefun void Lstream_set_buffering (Lstream *@var{lstr}, Lstream_buffering @var{buffering}, int @var{buffering_size})
8509 Change the buffering of a stream. See @file{lstream.h}. By default the
8510 buffering is @code{STREAM_BLOCK_BUFFERED}.
8513 @deftypefun int Lstream_flush (Lstream *@var{lstr})
8514 Flush out any pending unwritten data in the stream. Clear any buffered
8515 input data. Returns 0 on success, -1 on error.
8518 @deftypefn Macro int Lstream_putc (Lstream *@var{stream}, int @var{c})
8519 Write out one byte to the stream. This is a macro and so it is very
8520 efficient. The @var{c} argument is only evaluated once but the @var{stream}
8521 argument is evaluated more than once. Returns 0 on success, -1 on
8525 @deftypefn Macro int Lstream_getc (Lstream *@var{stream})
8526 Read one byte from the stream. This is a macro and so it is very
8527 efficient. The @var{stream} argument is evaluated more than once. Return
8528 value is -1 for EOF or error.
8531 @deftypefn Macro void Lstream_ungetc (Lstream *@var{stream}, int @var{c})
8532 Push one byte back onto the input queue. This will be the next byte
8533 read from the stream. Any number of bytes can be pushed back and will
8534 be read in the reverse order they were pushed back---most recent
8535 first. (This is necessary for consistency---if there are a number of
8536 bytes that have been unread and I read and unread a byte, it needs to be
8537 the first to be read again.) This is a macro and so it is very
8538 efficient. The @var{c} argument is only evaluated once but the @var{stream}
8539 argument is evaluated more than once.
8542 @deftypefun int Lstream_fputc (Lstream *@var{stream}, int @var{c})
8543 @deftypefunx int Lstream_fgetc (Lstream *@var{stream})
8544 @deftypefunx void Lstream_fungetc (Lstream *@var{stream}, int @var{c})
8545 Function equivalents of the above macros.
8548 @deftypefun ssize_t Lstream_read (Lstream *@var{stream}, void *@var{data}, size_t @var{size})
8549 Read @var{size} bytes of @var{data} from the stream. Return the number
8550 of bytes read. 0 means EOF. -1 means an error occurred and no bytes
8554 @deftypefun ssize_t Lstream_write (Lstream *@var{stream}, void *@var{data}, size_t @var{size})
8555 Write @var{size} bytes of @var{data} to the stream. Return the number
8556 of bytes written. -1 means an error occurred and no bytes were written.
8559 @deftypefun void Lstream_unread (Lstream *@var{stream}, void *@var{data}, size_t @var{size})
8560 Push back @var{size} bytes of @var{data} onto the input queue. The next
8561 call to @code{Lstream_read()} with the same size will read the same
8562 bytes back. Note that this will be the case even if there is other
8563 pending unread data.
8566 @deftypefun int Lstream_close (Lstream *@var{stream})
8567 Close the stream. All data will be flushed out.
8570 @deftypefun void Lstream_reopen (Lstream *@var{stream})
8571 Reopen a closed stream. This enables I/O on it again. This is not
8572 meant to be called except from a wrapper routine that reinitializes
8573 variables and such---the close routine may well have freed some
8574 necessary storage structures, for example.
8577 @deftypefun void Lstream_rewind (Lstream *@var{stream})
8578 Rewind the stream to the beginning.
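
The push-back behavior of @code{Lstream_ungetc()} and
@code{Lstream_unread()} can be modeled with a simple stack. The
following self-contained C sketch is @emph{not} the lstream
implementation; it uses hypothetical names and a fixed-size array to
show why the most recently pushed byte is read first, and why a block
that is unread comes back in its original order.

```c
#include <assert.h>
#include <stddef.h>

#define PUSHBACK_MAX 64  /* arbitrary limit for this sketch */

/* A stack of pushed-back bytes; the top of the stack is read first. */
struct pushback {
    unsigned char bytes[PUSHBACK_MAX];
    size_t count;
};

/* Push one byte back; it becomes the next byte to be read. */
void pushback_ungetc(struct pushback *pb, int c)
{
    assert(pb->count < PUSHBACK_MAX);
    pb->bytes[pb->count++] = (unsigned char) c;
}

/* Push a block back, last byte first, so that it is re-read in its
   original order even if other unread data is pending. */
void pushback_unread(struct pushback *pb, const unsigned char *data,
                     size_t size)
{
    while (size > 0)
        pushback_ungetc(pb, data[--size]);
}

/* Return the next pushed-back byte, or -1 if there is none. */
int pushback_getc(struct pushback *pb)
{
    return pb->count > 0 ? pb->bytes[--pb->count] : -1;
}
```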
8581 @node Lstream Methods
8582 @section Lstream Methods
8583 @cindex lstream methods
8585 @deftypefn {Lstream Method} ssize_t reader (Lstream *@var{stream}, unsigned char *@var{data}, size_t @var{size})
8586 Read some data from the stream's end and store it into @var{data}, which
8587 can hold @var{size} bytes. Return the number of bytes read. A return
8588 value of 0 means no bytes can be read at this time. This may be because
8589 of an EOF, or because there is a granularity greater than one byte that
8590 the stream imposes on the returned data, and @var{size} is less than
8591 this granularity. (This will happen frequently for streams that need to
8592 return whole characters, because @code{Lstream_read()} calls the reader
8593 function repeatedly until it has the number of bytes it wants or until 0
8594 is returned.) The lstream functions do not treat a 0 return as EOF or
8595 do anything special; however, the calling function will interpret any 0
8596 it gets back as EOF. This will normally not happen unless the caller
8597 calls @code{Lstream_read()} with a very small size.
8599 This function can be @code{NULL} if the stream is output-only.
8602 @deftypefn {Lstream Method} ssize_t writer (Lstream *@var{stream}, const unsigned char *@var{data}, size_t @var{size})
8603 Send some data to the stream's end. Data to be sent is in @var{data}
8604 and is @var{size} bytes. Return the number of bytes sent. This
8605 function can send and return fewer bytes than is passed in; in that
8606 case, the function will just be called again until there is no data left
8607 or 0 is returned. A return value of 0 means that no more data can be
8608 currently stored, but there is no error; the data will be squirreled
8609 away until the writer can accept data. (This is useful, e.g., if you're
8610 dealing with a non-blocking file descriptor and are getting
8611 @code{EWOULDBLOCK} errors.) This function can be @code{NULL} if the
8612 stream is input-only.
8615 @deftypefn {Lstream Method} int rewinder (Lstream *@var{stream})
8616 Rewind the stream. If this is @code{NULL}, the stream is not seekable.
8619 @deftypefn {Lstream Method} int seekable_p (Lstream *@var{stream})
8620 Indicate whether this stream is seekable---i.e. it can be rewound.
8621 This method is ignored if the stream does not have a rewind method. If
8622 this method is not present, the result is determined by whether a rewind
8626 @deftypefn {Lstream Method} int flusher (Lstream *@var{stream})
8627 Perform any additional operations necessary to flush the data in this
8631 @deftypefn {Lstream Method} int pseudo_closer (Lstream *@var{stream})
8634 @deftypefn {Lstream Method} int closer (Lstream *@var{stream})
8635 Perform any additional operations necessary to close this stream down.
8636 May be @code{NULL}. This function is called when @code{Lstream_close()}
8637 is called or when the stream is garbage-collected. When this function
8638 is called, all pending data in the stream will already have been written
8642 @deftypefn {Lstream Method} Lisp_Object marker (Lisp_Object @var{lstream}, void (*@var{markfun}) (Lisp_Object))
8643 Mark this object for garbage collection. Same semantics as a standard
8644 @code{Lisp_Object} marker. This function can be @code{NULL}.
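
The contract of the reader method is easiest to see from the caller's
side. The sketch below shows, under simplifying assumptions, a loop of
the kind described for @code{Lstream_read()}: it calls a reader
repeatedly until the request is satisfied or the reader returns 0. The
names @code{reader_fn}, @code{read_fully}, @code{mem_source}, and
@code{mem_reader} are hypothetical, and the example reader
deliberately returns at most two bytes per call.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>  /* ssize_t */

/* Hypothetical reader signature, modeled on the reader method. */
typedef ssize_t (*reader_fn)(void *state, unsigned char *data,
                             size_t size);

/* Call READER until SIZE bytes have been read or it returns 0.
   Returns the number of bytes read, or -1 if an error occurred and
   no bytes were read. */
ssize_t read_fully(reader_fn reader, void *state, unsigned char *data,
                   size_t size)
{
    size_t total = 0;

    while (total < size) {
        ssize_t n = reader(state, data + total, size - total);
        if (n < 0)
            return total > 0 ? (ssize_t) total : -1;
        if (n == 0)
            break;  /* nothing more available right now */
        total += (size_t) n;
    }
    return (ssize_t) total;
}

/* Example source: a block of memory handed out two bytes at a time. */
struct mem_source {
    const unsigned char *bytes;
    size_t left;
};

ssize_t mem_reader(void *state, unsigned char *data, size_t size)
{
    struct mem_source *src = state;
    size_t n = size < 2 ? size : 2;

    if (n > src->left)
        n = src->left;
    memcpy(data, src->bytes, n);
    src->bytes += n;
    src->left -= n;
    return (ssize_t) n;
}
```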
8647 @node Consoles; Devices; Frames; Windows, The Redisplay Mechanism, Lstreams, Top
8648 @chapter Consoles; Devices; Frames; Windows
8649 @cindex consoles; devices; frames; windows
8650 @cindex devices; frames; windows, consoles;
8651 @cindex frames; windows, consoles; devices;
8652 @cindex windows, consoles; devices; frames;
8655 * Introduction to Consoles; Devices; Frames; Windows::
8657 * Window Hierarchy::
8658 * The Window Object::
8661 @node Introduction to Consoles; Devices; Frames; Windows
8662 @section Introduction to Consoles; Devices; Frames; Windows
8663 @cindex consoles; devices; frames; windows, introduction to
8664 @cindex devices; frames; windows, introduction to consoles;
8665 @cindex frames; windows, introduction to consoles; devices;
8666 @cindex windows, introduction to consoles; devices; frames;
8668 A window-system window that you see on the screen is called a
8669 @dfn{frame} in Emacs terminology. Each frame is subdivided into one or
8670 more non-overlapping panes, called (confusingly) @dfn{windows}. Each
8671 window displays the text of a buffer in it. (See above on Buffers.) Note
8672 that buffers and windows are independent entities: Two or more windows
8673 can be displaying the same buffer (potentially in different locations),
8674 and a buffer can be displayed in no windows.
8676 A single display screen that contains one or more frames is called
8677 a @dfn{display}. Under most circumstances, there is only one display.
8678 However, more than one display can exist, for example if you have
8679 a @dfn{multi-headed} console, i.e. one with a single keyboard but
8680 multiple displays. (Typically in such a situation, the various
8681 displays act like one large display, in that the mouse is only
8682 in one of them at a time, and moving the mouse off of one moves
8683 it into another.) In some cases, the different displays will
8684 have different characteristics, e.g. one color and one mono.
8686 XEmacs can display frames on multiple displays. It can even deal
8687 simultaneously with frames on multiple keyboards (called @dfn{consoles} in
8688 XEmacs terminology). Here is one case where this might be useful: You
8689 are using XEmacs on your workstation at work, and leave it running.
8690 Then you go home and dial in on a TTY line, and you can use the
8691 already-running XEmacs process to display another frame on your local
8694 Thus, there is a hierarchy console -> display -> frame -> window.
8695 There is a separate Lisp object type for each of these four concepts.
8696 Furthermore, there is logically a @dfn{selected console},
8697 @dfn{selected display}, @dfn{selected frame}, and @dfn{selected window}.
8698 Each of these objects is distinguished in various ways, such as being the
8699 default object for various functions that act on objects of that type.
8700 Note that every containing object remembers the ``selected'' object
8701 among the objects that it contains: e.g. not only is there a selected
8702 window, but every frame remembers the last window in it that was
8703 selected, and changing the selected frame causes the remembered window
within it to become the selected window. Similar relationships apply
for consoles to displays and displays to frames. (In the XEmacs sources
and at the Lisp level, a display is called a @dfn{device}.)
8711 Recall that every buffer has a current insertion position, called
8712 @dfn{point}. Now, two or more windows may be displaying the same buffer,
8713 and the text cursor in the two windows (i.e. @code{point}) can be in
8714 two different places. You may ask, how can that be, since each
8715 buffer has only one value of @code{point}? The answer is that each window
8716 also has a value of @code{point} that is squirreled away in it. There
is only one selected window, and the buffer's own value of @code{point}
always belongs to that window. When the selected window is changed
8719 from one window to another displaying the same buffer, the old
8720 value of @code{point} is stored into the old window's ``point'' and the
8721 value of @code{point} from the new window is retrieved and made the
8722 value of @code{point} in the buffer. This means that @code{window-point}
8723 for the selected window is potentially inaccurate, and if you
8724 want to retrieve the correct value of @code{point} for a window,
8725 you must special-case on the selected window and retrieve the
8726 buffer's point instead. This is related to why @code{save-window-excursion}
8727 does not save the selected window's value of @code{point}.
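
The exchange of @code{point} values can be sketched in C as follows.
This is a deliberately simplified model: the struct layouts and
function names are hypothetical, and positions are plain integers
here, whereas a real window stores its position in a marker.

```c
#include <stddef.h>

struct buffer {
    long point;             /* the buffer's own value of point */
};

struct window {
    struct buffer *buffer;  /* buffer displayed in this window */
    long pointm;            /* the window's stashed value of point */
};

static struct window *selected_window;

/* Make NEW_WIN the selected window, swapping point values. */
void select_window(struct window *new_win)
{
    /* Save the buffer's point into the window being deselected. */
    if (selected_window != NULL)
        selected_window->pointm = selected_window->buffer->point;

    selected_window = new_win;

    /* Restore the new window's stashed point into its buffer. */
    new_win->buffer->point = new_win->pointm;
}

/* Return the up-to-date value of point for window W.  The stashed
   value may be stale for the selected window, so that case reads the
   buffer instead. */
long window_point(const struct window *w)
{
    return w == selected_window ? w->buffer->point : w->pointm;
}
```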
8729 @node Window Hierarchy
8730 @section Window Hierarchy
8731 @cindex window hierarchy
8732 @cindex hierarchy of windows
8734 If a frame contains multiple windows (panes), they are always created
8735 by splitting an existing window along the horizontal or vertical axis.
8736 Terminology is a bit confusing here: to @dfn{split a window
8737 horizontally} means to create two side-by-side windows, i.e. to make a
8738 @emph{vertical} cut in a window. Likewise, to @dfn{split a window
8739 vertically} means to create two windows, one above the other, by making
8740 a @emph{horizontal} cut.
8742 If you split a window and then split again along the same axis, you
8743 will end up with a number of panes all arranged along the same axis.
8744 The precise way in which the splits were made should not be important,
8745 and this is reflected internally. Internally, all windows are arranged
8746 in a tree, consisting of two types of windows, @dfn{combination} windows
8747 (which have children, and are covered completely by those children) and
8748 @dfn{leaf} windows, which have no children and are visible. Every
8749 combination window has two or more children, all arranged along the same
axis. There are (logically) two subtypes of combination windows, depending on
8751 whether their children are horizontally or vertically arrayed. There is
8752 always one root window, which is either a leaf window (if the frame
8753 contains only one window) or a combination window (if the frame contains
8754 more than one window). In the latter case, the root window will have
8755 two or more children, either horizontally or vertically arrayed, and
8756 each of those children will be either a leaf window or another

Here are some rules:

@enumerate
@item
Horizontal combination windows can never have children that are
horizontal combination windows; the same holds for vertical combination
windows.

@item
Only leaf windows can be split (obviously) and this splitting does one
of two things: (a) turns the leaf window into a combination window and
creates two new leaf children, or (b) turns the leaf window into one of
the two new leaves and creates the other leaf. Rule (1) dictates which
of these two outcomes happens.

@item
Every combination window must have at least two children.

@item
Leaf windows can never become combination windows. They can be deleted,
however. If this results in a violation of (3), the parent combination
window also gets deleted.

@item
All functions that accept windows must be prepared to accept combination
windows, and do something sane (e.g. signal an error) when given one.
Combination windows @emph{do} escape to the Lisp level.

@item
All windows have three fields governing their contents:
these are @dfn{hchild} (a list of horizontally-arrayed children),
@dfn{vchild} (a list of vertically-arrayed children), and @dfn{buffer}
(the buffer contained in a leaf window). Exactly one of
these will be non-@code{nil}. Remember that @dfn{horizontally-arrayed}
means ``side-by-side'' and @dfn{vertically-arrayed} means
``one above the other''.

@item
Leaf windows also have markers in their @code{start} (the
first buffer position displayed in the window) and @code{pointm}
(the window's stashed value of @code{point}---see above) fields,
while combination windows have @code{nil} in these fields.

@item
The list of children for a window is threaded through the
@code{next} and @code{prev} fields of each child window.

@item
@strong{Deleted windows can be undeleted}. This happens as a result of
restoring a window configuration, and is unlike frames, displays, and
consoles, which, once deleted, can never be restored. Deleting a window
does nothing except set a special @code{dead} bit to 1 and clear out the
@code{next}, @code{prev}, @code{hchild}, and @code{vchild} fields, for
GC purposes.

@item
Most frames actually have two top-level windows---one for the
minibuffer and one (the @dfn{root}) for everything else. The modeline
(if present) separates these two. The @code{next} field of the root
points to the minibuffer, and the @code{prev} field of the minibuffer
points to the root. The other @code{next} and @code{prev} fields are
@code{nil}, and the frame points to both of these windows.
Minibuffer-less frames have no minibuffer window, and the @code{next}
and @code{prev} of the root window are @code{nil}. Minibuffer-only
frames have no root window, and the @code{next} of the minibuffer window
is @code{nil} but the @code{prev} points to itself. (#### This is an
artifact that should be fixed.)
@end enumerate
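
The field layout described by these rules can be modelled with a small C
sketch. The structure @code{window_node} and both functions are
hypothetical simplifications written for this illustration; the real
window object stores Lisp objects in these fields and has many more of
them. The sketch checks the ``exactly one of @code{hchild},
@code{vchild} and @code{buffer}'' rule and walks the sibling chain
through @code{next} to count leaf windows.

```c
#include <stddef.h>

/* Hypothetical, simplified window node. */
struct window_node {
    struct window_node *hchild;  /* first side-by-side child, or NULL */
    struct window_node *vchild;  /* first stacked child, or NULL */
    void *buffer;                /* displayed buffer (leaf only), or NULL */
    struct window_node *next;    /* next sibling, or NULL */
    struct window_node *prev;    /* previous sibling, or NULL */
};

/* Return 1 if exactly one of hchild, vchild and buffer is set. */
int window_contents_ok(const struct window_node *w)
{
    return (w->hchild != NULL) + (w->vchild != NULL)
           + (w->buffer != NULL) == 1;
}

/* Count the leaf windows in the subtree rooted at W.  Siblings of W
   itself are not visited, only the children threaded through next. */
int count_leaves(const struct window_node *w)
{
    const struct window_node *child = w->hchild ? w->hchild : w->vchild;
    int n = 0;

    if (child == NULL)
        return 1;                /* a leaf window */
    for (; child != NULL; child = child->next)
        n += count_leaves(child);
    return n;
}
```

Because only the children of a window are followed, the minibuffer
window hanging off the root's @code{next} field is not counted when the
walk starts at the root.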
8827 @node The Window Object
8828 @section The Window Object
8829 @cindex window object, the
8830 @cindex object, the window
8832 Windows have the following accessible fields:
8836 The frame that this window is on.
8839 Non-@code{nil} if this window is a minibuffer window.
8842 The buffer that the window is displaying. This may change often during
8843 the life of the window.
8846 Non-@code{nil} if this window is dedicated to its buffer.
8849 @cindex window point internals
8850 This is the value of point in the current buffer when this window is
8851 selected; when it is not selected, it retains its previous value.
The position in the buffer that is the first character to be displayed
in the window.
8858 If this flag is non-@code{nil}, it says that the window has been
8859 scrolled explicitly by the Lisp program. This affects what the next
8860 redisplay does if point is off the screen: instead of scrolling the
window to show the text around point, it moves point to a location that
is on the screen.
8865 The @code{modified} field of the window's buffer, as of the last time
8866 a redisplay completed in this window.
8869 The buffer's value of point, as of the last time
8870 a redisplay completed in this window.
8873 This is the left-hand edge of the window, measured in columns. (The
8874 leftmost column on the screen is @w{column 0}.)
8877 This is the top edge of the window, measured in lines. (The top line on
8878 the screen is @w{line 0}.)
8881 The height of the window, measured in lines.
8884 The width of the window, measured in columns.
8887 This is the window that is the next in the chain of siblings. It is
@code{nil} in a window that is the rightmost or bottommost of a group of
siblings.
8892 This is the window that is the previous in the chain of siblings. It is
@code{nil} in a window that is the leftmost or topmost of a group of
siblings.
8897 Internally, XEmacs arranges windows in a tree; each group of siblings has
8898 a parent window whose area includes all the siblings. This field points
8899 to a window's parent.
8901 Parent windows do not display buffers, and play little role in display
8902 except to shape their child windows. Emacs Lisp programs usually have
8903 no access to the parent windows; they operate on the windows at the
8904 leaves of the tree, which actually display buffers.
8907 This is the number of columns that the display in the window is scrolled
8908 horizontally to the left. Normally, this is 0.
8911 This is the last time that the window was selected. The function
8912 @code{get-lru-window} uses this field.
8915 The window's display table, or @code{nil} if none is specified for it.
8917 @item update_mode_line
8918 Non-@code{nil} means this window's mode line needs to be updated.
8920 @item base_line_number
8921 The line number of a certain position in the buffer, or @code{nil}.
8922 This is used for displaying the line number of point in the mode line.
8925 The position in the buffer for which the line number is known, or
8926 @code{nil} meaning none is known.
8928 @item region_showing
8929 If the region (or part of it) is highlighted in this window, this field
8930 holds the mark position that made one end of that region. Otherwise,
8931 this field is @code{nil}.
8934 @node The Redisplay Mechanism, Extents, Consoles; Devices; Frames; Windows, Top
8935 @chapter The Redisplay Mechanism
8936 @cindex redisplay mechanism, the
The redisplay mechanism is one of the most complicated sections of
XEmacs, especially from a conceptual standpoint. This is doubly so
because, unlike for the basic aspects of the Lisp interpreter, the
computer science theories of how to efficiently handle redisplay are not
well developed.

When working with the redisplay mechanism, remember the Golden Rules
of Redisplay:

@enumerate
@item
It Is Better To Be Correct Than Fast.
@item
Thou Shalt Not Run Elisp From Within Redisplay.
@item
It Is Better To Be Fast Than Not To Be.
@end enumerate

@menu
* Critical Redisplay Sections::
* Line Start Cache::
* Redisplay Piece by Piece::
@end menu
8962 @node Critical Redisplay Sections
8963 @section Critical Redisplay Sections
8964 @cindex redisplay sections, critical
8965 @cindex critical redisplay sections
Within this section, we are defenseless and assume that the
following cannot happen:

@enumerate
@item
garbage collection
@item
Lisp code evaluation
@item
frame size changes
@end enumerate
8979 We ensure (3) by calling @code{hold_frame_size_changes()}, which
8980 will cause any pending frame size changes to get put on hold
8981 till after the end of the critical section. (1) follows
8982 automatically if (2) is met. #### Unfortunately, there are
8983 some places where Lisp code can be called within this section.
8984 We need to remove them.
8986 If @code{Fsignal()} is called during this critical section, we
8987 will @code{abort()}.
8989 If garbage collection is called during this critical section,
8990 we simply return. #### We should abort instead.
8992 #### If a frame-size change does occur we should probably
8993 actually be preempting redisplay.
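
The behaviour described above can be modelled with a nesting counter.
The following C sketch is an assumption-laden illustration, not the
XEmacs implementation: every name in it is hypothetical (only
@code{hold_frame_size_changes()}, @code{Fsignal()} and @code{abort()}
are named in the text above), and frame sizes are reduced to two
integers.

```c
#include <stdlib.h>

/* Hypothetical state for the sketch. */
static int critical_depth;            /* > 0 while inside the section */
static int have_pending_resize;
static int pending_width, pending_height;
static int frame_width = 80, frame_height = 24;

void enter_redisplay_critical_section(void)
{
    critical_depth++;
}

/* Leaving the outermost section applies any resize put on hold. */
void exit_redisplay_critical_section(void)
{
    if (--critical_depth == 0 && have_pending_resize) {
        frame_width = pending_width;
        frame_height = pending_height;
        have_pending_resize = 0;
    }
}

/* Return 1 if the resize was applied at once, 0 if it was deferred. */
int request_frame_size(int width, int height)
{
    if (critical_depth > 0) {
        pending_width = width;
        pending_height = height;
        have_pending_resize = 1;
        return 0;
    }
    frame_width = width;
    frame_height = height;
    return 1;
}

/* Garbage collection simply returns (0) inside the section. */
int try_garbage_collect(void)
{
    return critical_depth == 0;
}

/* Signalling an error inside the section is fatal. */
void signal_error(void)
{
    if (critical_depth > 0)
        abort();
}
```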
8995 @node Line Start Cache
8996 @section Line Start Cache
8997 @cindex line start cache
8999 The traditional scrolling code in Emacs breaks in a variable height
9000 world. It depends on the key assumption that the number of lines that
9001 can be displayed at any given time is fixed. This led to a complete
9002 separation of the scrolling code from the redisplay code. In order to
9003 fully support variable height lines, the scrolling code must actually be
9004 tightly integrated with redisplay. Only redisplay can determine how
9005 many lines will be displayed on a screen for any given starting point.
9007 What is ideally wanted is a complete list of the starting buffer
9008 position for every possible display line of a buffer along with the
9009 height of that display line. Maintaining such a full list would be very
9010 expensive. We settle for having it include information for all areas
9011 which we happen to generate anyhow (i.e. the region currently being
9012 displayed) and for those areas we need to work with.
9014 In order to ensure that the cache accurately represents what redisplay
9015 would actually show, it is necessary to invalidate it in many
9016 situations. If the buffer changes, the starting positions may no longer
9017 be correct. If a face or an extent has changed then the line heights
9018 may have altered. These events happen frequently enough that the cache
can end up being constantly invalidated. With this potentially constant
invalidation, when is the cache ever useful?
9022 Even if the cache is invalidated before every single usage, it is
9023 necessary. Scrolling often requires knowledge about display lines which
9024 are actually above or below the visible region. The cache provides a
9025 convenient light-weight method of storing this information for multiple
9026 display regions. This knowledge is necessary for the scrolling code to
9027 always obey the First Golden Rule of Redisplay.
9029 If the cache already contains all of the information that the scrolling
9030 routines happen to need so that it doesn't have to go generate it, then
9031 we are able to obey the Third Golden Rule of Redisplay. The first thing
9032 we do to help out the cache is to always add the displayed region. This
9033 region had to be generated anyway, so the cache ends up getting the
9034 information basically for free. In those cases where a user is simply
9035 scrolling around viewing a buffer there is a high probability that this
9036 is sufficient to always provide the needed information. The second
9037 thing we can do is be smart about invalidating the cache.
TODO---Be smart about invalidating the cache. Potential places:

@itemize @bullet
@item
Insertions at end-of-line which don't cause line-wraps do not alter the
starting positions of any display lines. These types of buffer
modifications should not invalidate the cache. This is actually a large
optimization for redisplay speed as well.

@item
Buffer modifications frequently only affect the display of lines at and
below where they occur. In these situations we should only invalidate
the part of the cache starting at where the modification occurs.
@end itemize

In case you're wondering, the Second Golden Rule of Redisplay is not
applicable.
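
As a rough illustration of the idea, the following C sketch models a
cache of display-line starting positions and heights, answers the
question ``how many lines fit in a window of a given pixel height?'',
and performs the partial invalidation suggested in the TODO list. All
names, the fixed-size array and the integer positions are assumptions
made for this example; the real cache is organized differently.

```c
#include <stddef.h>

#define CACHE_MAX 64

/* One cached display line: where it starts and how tall it is. */
struct line_start {
    long start;     /* buffer position of the first character */
    int height;     /* pixel height of the display line */
};

struct line_start_cache {
    struct line_start lines[CACHE_MAX];  /* sorted by start */
    size_t count;
};

/* Number of cached lines, starting at index FIRST, that fit into
   PIXEL_HEIGHT.  With variable-height lines this cannot be computed
   from the window height alone. */
size_t lines_that_fit(const struct line_start_cache *c, size_t first,
                      int pixel_height)
{
    size_t n = 0;
    int used = 0;

    while (first + n < c->count
           && used + c->lines[first + n].height <= pixel_height) {
        used += c->lines[first + n].height;
        n++;
    }
    return n;
}

/* Partial invalidation: keep only the lines that end before POS,
   i.e. those whose following cached line starts at or before POS. */
void cache_invalidate_from(struct line_start_cache *c, long pos)
{
    size_t keep = 0;

    while (keep + 1 < c->count && c->lines[keep + 1].start <= pos)
        keep++;
    c->count = keep;
}
```

The invalidation is deliberately conservative: the line containing the
modification and everything after it are dropped, including the last
cached line, whose end is not known to the cache.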
9056 @node Redisplay Piece by Piece
9057 @section Redisplay Piece by Piece
9058 @cindex redisplay piece by piece
As you can begin to see, redisplay is complex and also not well
documented. Chuck no longer works on XEmacs, so this section is my take
on the workings of redisplay.

Redisplay happens in three phases:

@enumerate
@item
Determine desired display in area that needs redisplay.
Implemented by @code{redisplay.c}.
@item
Compare desired display with current display.
Implemented by @code{redisplay-output.c}.
@item
Output changes. Implemented by @code{redisplay-output.c},
@code{redisplay-x.c}, @code{redisplay-msw.c} and @code{redisplay-tty.c}.
@end enumerate

9078 Steps 1 and 2 are device-independent and relatively complex. Step 3 is
9079 mostly device-dependent.
9081 Determining the desired display
9083 Display attributes are stored in @code{display_line} structures. Each
9084 @code{display_line} consists of a set of @code{display_block}'s and each
9085 @code{display_block} contains a number of @code{rune}'s. Generally
9086 dynarr's of @code{display_line}'s are held by each window representing
9087 the current display and the desired display.
9089 The @code{display_line} structures are tightly tied to buffers which
9090 presents a problem for redisplay as this connection is bogus for the
9091 modeline. Hence the @code{display_line} generation routines are
9092 duplicated for generating the modeline. This means that the modeline
9093 display code has many bugs that the standard redisplay code does not.
9095 The guts of @code{display_line} generation are in
9096 @code{create_text_block}, which creates a single display line for the
9097 desired locale. This incrementally parses the characters on the current
9098 line and generates redisplay structures for each.
9100 Gutter redisplay is different. Because the data to display is stored in
9101 a string we cannot use @code{create_text_block}. Instead we use
9102 @code{create_text_string_block} which performs the same function as
9103 @code{create_text_block} but for strings. Many of the complexities of
9104 @code{create_text_block} to do with cursor handling and selective
9105 display have been removed.
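
The compare-and-output phases can be pictured with a toy model. In the
C sketch below, a display line is reduced to a fixed array of runes (the
intermediate @code{display_block} layer is omitted), and ``output'' is
modelled as copying a rune from the desired line into the current line.
The structures and the function @code{update_line} are hypothetical and
far simpler than the real ones.

```c
#include <stddef.h>

#define RUNES_PER_LINE 8

/* Hypothetical, simplified rune: a character plus a face index. */
struct rune_sketch {
    char ch;
    int face;
};

struct display_line_sketch {
    struct rune_sketch runes[RUNES_PER_LINE];
};

static int rune_equal(const struct rune_sketch *a,
                      const struct rune_sketch *b)
{
    return a->ch == b->ch && a->face == b->face;
}

/* Compare DESIRED with CURRENT and "output" only the runes that
   differ.  Returns the number of runes that had to be output. */
size_t update_line(struct display_line_sketch *current,
                   const struct display_line_sketch *desired)
{
    size_t i, output = 0;

    for (i = 0; i < RUNES_PER_LINE; i++) {
        if (!rune_equal(&current->runes[i], &desired->runes[i])) {
            current->runes[i] = desired->runes[i];  /* stand-in for drawing */
            output++;
        }
    }
    return output;
}
```

Running the update a second time with the same desired line outputs
nothing, which is the point of keeping both a current and a desired
display.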
9107 @node Extents, Faces, The Redisplay Mechanism, Top
9112 * Introduction to Extents:: Extents are ranges over text, with properties.
9113 * Extent Ordering:: How extents are ordered internally.
9114 * Format of the Extent Info:: The extent information in a buffer or string.
9115 * Zero-Length Extents:: A weird special case.
9116 * Mathematics of Extent Ordering:: A rigorous foundation.
9117 * Extent Fragments:: Cached information useful for redisplay.
9120 @node Introduction to Extents
9121 @section Introduction to Extents
9122 @cindex extents, introduction to
9124 Extents are regions over a buffer, with a start and an end position
9125 denoting the region of the buffer included in the extent. In
9126 addition, either end can be closed or open, meaning that the endpoint
9127 is or is not logically included in the extent. Insertion of a character
9128 at a closed endpoint causes the character to go inside the extent;
9129 insertion at an open endpoint causes the character to go outside.
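
The open/closed rule can be written as a small predicate. This is a
minimal sketch under stated assumptions: the structure
@code{extent_ends} and the function @code{insertion_goes_inside} are
invented for this example, and positions are plain integers rather than
memory indices.

```c
/* Hypothetical extent with explicit open/closed flags. */
struct extent_ends {
    long start, end;
    int start_open;   /* nonzero if the start endpoint is open */
    int end_open;     /* nonzero if the end endpoint is open */
};

/* Return 1 if a character inserted at POS ends up inside E. */
int insertion_goes_inside(const struct extent_ends *e, long pos)
{
    if (pos < e->start || pos > e->end)
        return 0;
    if (pos == e->start && e->start_open)
        return 0;
    if (pos == e->end && e->end_open)
        return 0;
    return 1;
}
```

For an extent whose start and end coincide, the same test says that the
inserted character goes inside only when both endpoints are closed.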
9131 Extent endpoints are stored using memory indices (see @file{insdel.c}),
9132 to minimize the amount of adjusting that needs to be done when
9133 characters are inserted or deleted.
9135 (Formerly, extent endpoints at the gap could be either before or
9136 after the gap, depending on the open/closedness of the endpoint.
9137 The intent of this was to make it so that insertions would
9138 automatically go inside or out of extents as necessary with no
9139 further work needing to be done. It didn't work out that way,
however, and just ended up complexifying and buggifying all the
rest of the code.)
9143 @node Extent Ordering
9144 @section Extent Ordering
9145 @cindex extent ordering
9147 Extents are compared using memory indices. There are two orderings
9148 for extents and both orders are kept current at all times. The normal
9149 or @dfn{display} order is as follows:
@example
Extent A is ``less than'' extent B,
    that is, earlier in the display order,
  if:    A-start < B-start,
  or if: A-start = B-start, and A-end > B-end
@end example

So if two extents begin at the same position, the larger of them is the
earlier one in the display order (@code{EXTENT_LESS} is true).

For the e-order, an analogous rule holds:

@example
Extent A is ``less than'' extent B in e-order,
    that is, earlier in the e-order,
  if:    A-end < B-end,
  or if: A-end = B-end, and A-start > B-start
@end example

9170 So if two extents end at the same position, the smaller of them is the
9171 earlier one in the e-order (@code{EXTENT_E_LESS} is true).
9173 The display order and the e-order are complementary orders: any
9174 theorem about the display order also applies to the e-order if you swap
9175 all occurrences of ``display order'' and ``e-order'', ``less than'' and
9176 ``greater than'', and ``extent start'' and ``extent end''.
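
Written out as C predicates, the two orderings look like this. The
structure @code{extent_range} and the function names are hypothetical
stand-ins chosen for this sketch; they mirror the rules stated above for
@code{EXTENT_LESS} and @code{EXTENT_E_LESS} but are not the actual
macros.

```c
/* Hypothetical extent reduced to its two endpoints. */
struct extent_range {
    long start, end;
};

/* Display order: earlier start first; for equal starts, the larger
   extent (later end) comes first. */
int extent_less(const struct extent_range *a, const struct extent_range *b)
{
    return a->start < b->start
           || (a->start == b->start && a->end > b->end);
}

/* E-order: earlier end first; for equal ends, the smaller extent
   (later start) comes first. */
int extent_e_less(const struct extent_range *a, const struct extent_range *b)
{
    return a->end < b->end
           || (a->end == b->end && a->start > b->start);
}
```

Swapping the roles of start and end and reversing the direction of the
tie-break turns one predicate into the other, which is the
complementarity described above.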
9178 @node Format of the Extent Info
9179 @section Format of the Extent Info
9180 @cindex extent info, format of the
9182 An extent-info structure consists of a list of the buffer or string's
9183 extents and a @dfn{stack of extents} that lists all of the extents over
9184 a particular position. The stack-of-extents info is used for
9185 optimization purposes---it basically caches some info that might
9186 be expensive to compute. Certain otherwise hard computations are easy
9187 given the stack of extents over a particular position, and if the
9188 stack of extents over a nearby position is known (because it was
9189 calculated at some prior point in time), it's easy to move the stack
9190 of extents to the proper position.
9192 Given that the stack of extents is an optimization, and given that
9193 it requires memory, a string's stack of extents is wiped out each
9194 time a garbage collection occurs. Therefore, any time you retrieve
9195 the stack of extents, it might not be there. If you need it to
9196 be there, use the @code{_force} version.
9198 Similarly, a string may or may not have an extent_info structure.
9199 (Generally it won't if there haven't been any extents added to the
9200 string.) So use the @code{_force} version if you need the extent_info
9201 structure to be there.
9203 A list of extents is maintained as a double gap array: one gap array
9204 is ordered by start index (the @dfn{display order}) and the other is
9205 ordered by end index (the @dfn{e-order}). Note that positions in an
9206 extent list should logically be conceived of as referring @emph{to} a
9207 particular extent (as is the norm in programs) rather than sitting
9208 between two extents. Note also that callers of these functions should
9209 not be aware of the fact that the extent list is implemented as an
9210 array, except for the fact that positions are integers (this should be
generalized to handle integers and linked lists equally well).
9213 @node Zero-Length Extents
9214 @section Zero-Length Extents
9215 @cindex zero-length extents
9216 @cindex extents, zero-length
9218 Extents can be zero-length, and will end up that way if their endpoints
9219 are explicitly set that way or if their detachable property is @code{nil}
9220 and all the text in the extent is deleted. (The exception is open-open
9221 zero-length extents, which are barred from existing because there is
9222 no sensible way to define their properties. Deletion of the text in
9223 an open-open extent causes it to be converted into a closed-open
9224 extent.) Zero-length extents are primarily used to represent
9225 annotations, and behave as follows:
9229 Insertion at the position of a zero-length extent expands the extent
9230 if both endpoints are closed; goes after the extent if it is closed-open;
9231 and goes before the extent if it is open-closed.
9234 Deletion of a character on a side of a zero-length extent whose
9235 corresponding endpoint is closed causes the extent to be detached if
9236 it is detachable; if the extent is not detachable or the corresponding
9237 endpoint is open, the extent remains in the buffer, moving as necessary.
9240 Note that closed-open, non-detachable zero-length extents behave
9241 exactly like markers and that open-closed, non-detachable zero-length
9242 extents behave like the ``point-type'' marker in Mule.
9244 @node Mathematics of Extent Ordering
9245 @section Mathematics of Extent Ordering
9246 @cindex mathematics of extent ordering
9247 @cindex extent mathematics
9248 @cindex extent ordering
9250 @cindex display order of extents
9251 @cindex extents, display order
The extents in a buffer are ordered by ``display order'' because that
is the order in which the redisplay mechanism needs to process them.
9254 The e-order is an auxiliary ordering used to facilitate operations
9255 over extents. The operations that can be performed on the ordered
9256 list of extents in a buffer are

@enumerate
@item
Locate where an extent would go if inserted into the list.
@item
Insert an extent into the list.
@item
Remove an extent from the list.
@item
Map over all the extents that overlap a range.
@end enumerate

9269 (4) requires being able to determine the first and last extents
9270 that overlap a range.
9272 NOTE: @dfn{overlap} is used as follows:
9276 two ranges overlap if they have at least one point in common.
9277 Whether the endpoints are open or closed makes a difference here.
a point overlaps a range if the point is contained within the
range; this is equivalent to treating a point @math{P} as the range
@math{[P, P]}.
9283 In the case of an @emph{extent} overlapping a point or range, the extent
9284 is normally treated as having closed endpoints. This applies
9285 consistently in the discussion of stacks of extents and such below.
9286 Note that this definition of overlap is not necessarily consistent with
9287 the extents that @code{map-extents} maps over, since @code{map-extents}
sometimes pays attention to whether the endpoints of an extent are open
9289 or closed. But for our purposes, it greatly simplifies things to treat
9290 all extents as having closed endpoints.
9293 First, define @math{>}, @math{<}, @math{<=}, etc. as applied to extents
9294 to mean comparison according to the display order. Comparison between
9295 an extent @math{E} and an index @math{I} means comparison between
9296 @math{E} and the range @math{[I, I]}.
9298 Also define @math{e>}, @math{e<}, @math{e<=}, etc. to mean comparison
9299 according to the e-order.
9301 For any range @math{R}, define @math{R(0)} to be the starting index of
9302 the range and @math{R(1)} to be the ending index of the range.
9304 For any extent @math{E}, define @math{E(next)} to be the extent directly
9305 following @math{E}, and @math{E(prev)} to be the extent directly
9306 preceding @math{E}. Assume @math{E(next)} and @math{E(prev)} can be
9307 determined from @math{E} in constant time. (This is because we store
9308 the extent list as a doubly linked list.)
9310 Similarly, define @math{E(e-next)} and @math{E(e-prev)} to be the
9311 extents directly following and preceding @math{E} in the e-order.
9315 Let @math{R} be a range.
9316 Let @math{F} be the first extent overlapping @math{R}.
9317 Let @math{L} be the last extent overlapping @math{R}.
9319 Theorem 1: @math{R(1)} lies between @math{L} and @math{L(next)},
9320 i.e. @math{L <= R(1) < L(next)}.
9322 This follows easily from the definition of display order. The
9323 basic reason that this theorem applies is that the display order
9324 sorts by increasing starting index.
9326 Therefore, we can determine @math{L} just by looking at where we would
9327 insert @math{R(1)} into the list, and if we know @math{F} and are moving
9328 forward over extents, we can easily determine when we've hit @math{L} by
9329 comparing the extent we're at to @math{R(1)}.
9332 Theorem 2: @math{F(e-prev) e< [1, R(0)] e<= F}.
9335 This is the analog of Theorem 1, and applies because the e-order
9336 sorts by increasing ending index.
9338 Therefore, @math{F} can be found in the same amount of time as
9339 operation (1), i.e. the time that it takes to locate where an extent
9340 would go if inserted into the e-order list.
9342 If the lists were stored as balanced binary trees, then operation (1)
9343 would take logarithmic time, which is usually quite fast. However,
9344 currently they're stored as simple doubly-linked lists, and instead we
9345 do some caching to try to speed things up.
9347 Define a @dfn{stack of extents} (or @dfn{SOE}) as the set of extents
9348 (ordered in the display order) that overlap an index @math{I}, together
9349 with the SOE's @dfn{previous} extent, which is an extent that precedes
9350 @math{I} in the e-order. (Hopefully there will not be very many extents
9351 between @math{I} and the previous extent.)
9355 Let @math{I} be an index, let @math{S} be the stack of extents on
9356 @math{I}, let @math{F} be the first extent in @math{S}, and let @math{P}
9357 be @math{S}'s previous extent.
9359 Theorem 3: The first extent in @math{S} is the first extent that overlaps
9360 any range @math{[I, J]}.
9362 Proof: Any extent that overlaps @math{[I, J]} but does not include
9363 @math{I} must have a start index @math{> I}, and thus be greater than
9364 any extent in @math{S}.
9366 Therefore, finding the first extent that overlaps a range @math{R} is
9367 the same as finding the first extent that overlaps @math{R(0)}.
9369 Theorem 4: Let @math{I2} be an index such that @math{I2 > I}, and let
9370 @math{F2} be the first extent that overlaps @math{I2}. Then, either
@math{F2} is in @math{S} or @math{F2} is greater than any extent in
@math{S}.
9374 Proof: If @math{F2} does not include @math{I} then its start index is
9375 greater than @math{I} and thus it is greater than any extent in
9376 @math{S}, including @math{F}. Otherwise, @math{F2} includes @math{I}
9377 and thus is in @math{S}, and thus @math{F2 >= F}.
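
Operation (4), mapping over the extents that overlap a range, can be
sketched in C under simplifying assumptions. The code below keeps the
extents in a plain array sorted in display order, treats all endpoints
as closed, and scans from the beginning of the array instead of using
the e-order or a stack of extents to locate the first overlapping
extent. The names @code{extent_span} and @code{map_overlapping} are
hypothetical.

```c
#include <stddef.h>

/* Hypothetical extent reduced to its two endpoints. */
struct extent_span {
    long start, end;
};

/* Call FN (if non-NULL) on every extent in EXT that overlaps the
   closed range [R0, R1].  EXT must be sorted in display order, that
   is, by increasing start.  Returns the number of overlapping extents. */
size_t map_overlapping(const struct extent_span *ext, size_t n,
                       long r0, long r1,
                       void (*fn)(const struct extent_span *, void *),
                       void *arg)
{
    size_t i, hits = 0;

    for (i = 0; i < n; i++) {
        if (ext[i].start > r1)
            break;      /* every later extent starts after R1 as well */
        if (ext[i].end >= r0) {
            hits++;
            if (fn != NULL)
                fn(&ext[i], arg);
        }
    }
    return hits;
}
```

The early exit relies only on the display order sorting by increasing
starting index; extents that start before the range but end before
@code{r0} are skipped rather than ending the scan.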
9379 @node Extent Fragments
9380 @section Extent Fragments
9381 @cindex extent fragments
9382 @cindex fragments, extent
9384 Imagine that the buffer is divided up into contiguous, non-overlapping
9385 @dfn{runs} of text such that no extent starts or ends within a run
9386 (extents that abut the run don't count).
9388 An extent fragment is a structure that holds data about the run that
9389 contains a particular buffer position (if the buffer position is at the
9390 junction of two runs, the run after the position is used)---the
9391 beginning and end of the run, a list of all of the extents in that run,
9392 the @dfn{merged face} that results from merging all of the faces
9393 corresponding to those extents, the begin and end glyphs at the
9394 beginning of the run, etc. This is the information that redisplay needs
9395 in order to display this run.
9397 Extent fragments have to be very quick to update to a new buffer
9398 position when moving linearly through the buffer. They rely on the
9399 stack-of-extents code, which does the heavy-duty algorithmic work of
determining which extents overlie a particular position.
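
To make the notion of a run concrete, here is a minimal C sketch that
finds the run containing a given position by brute force. It is an
illustration only: the names are hypothetical, positions are plain
integers, and the real code relies on the stack of extents instead of
scanning every extent.

```c
#include <stddef.h>

/* Hypothetical extent reduced to its two endpoints. */
struct extent_bounds {
    long start, end;
};

/* Narrow the run [*lo, *hi) around POS using one endpoint P.  An
   endpoint exactly at POS starts the run, so at a junction the run
   after the position is chosen. */
static void narrow_run(long p, long pos, long *lo, long *hi)
{
    if (p <= pos && p > *lo)
        *lo = p;
    if (p > pos && p < *hi)
        *hi = p;
}

/* Compute the run containing POS in a buffer spanning
   [BUF_START, BUF_END]: no extent starts or ends strictly inside it. */
void find_run(const struct extent_bounds *ext, size_t n,
              long buf_start, long buf_end, long pos,
              long *run_start, long *run_end)
{
    size_t i;

    *run_start = buf_start;
    *run_end = buf_end;
    for (i = 0; i < n; i++) {
        narrow_run(ext[i].start, pos, run_start, run_end);
        narrow_run(ext[i].end, pos, run_start, run_end);
    }
}
```

With extents covering 2 to 8 and 5 to 12 in a buffer spanning 0 to 20,
position 6 falls in the run from 5 to 8, and position 8, which sits at a
junction, falls in the run from 8 to 12.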
9402 @node Faces, Glyphs, Extents, Top
9408 @node Glyphs, Specifiers, Faces, Top
9412 Glyphs are graphical elements that can be displayed in XEmacs buffers or
9413 gutters. We use the term graphical element here in the broadest possible
9414 sense since glyphs can be as mundane as text or as arcane as a native
9417 In XEmacs, glyphs represent the uninstantiated state of graphical
9418 elements, i.e. they hold all the information necessary to produce an
9419 image on-screen but the image need not exist at this stage, and multiple
9420 screen images can be instantiated from a single glyph.
Glyphs are lazily instantiated by calling one of the glyph
functions. This usually occurs within redisplay when
@code{Fglyph_height} is called. Instantiation causes an image-instance
to be created and cached. This cache is on a per-device basis for all glyphs
except widget-glyphs, and on a per-window basis for widget-glyphs. The
caching is done by @code{image_instantiate} and is necessary because it
is generally possible to display an image-instance in multiple
domains. For instance, if we create a Pixmap, we can display it in
multiple windows even though we need only a single Pixmap
instance to do this. If caching were not done, it would be necessary
to create an image-instance for every displayable occurrence of a glyph,
and for every usage, which would be extremely memory- and CPU-intensive.
Widget-glyphs (a.k.a. native widgets) are not cached in this way. This is
9436 because widget-glyph image-instances on screen are toolkit windows, and
9437 thus cannot be reused in multiple XEmacs domains. Thus widget-glyphs are
9438 cached on an XEmacs window basis.
9440 Any action on a glyph first consults the cache before actually
9441 instantiating a widget.
9443 @section Glyph Instantiation
9444 @cindex glyph instantiation
9445 @cindex instantiation, glyph
Glyph instantiation is a hairy topic and requires some explanation. The
guts of glyph instantiation are contained within
@code{image_instantiate}. A glyph contains an image which is a
specifier. When a glyph function, for instance @code{Fglyph_height},
asks for a property of the glyph that can only be determined from its
instantiated state, the glyph image is instantiated and an image
instance created. The instantiation process is governed by the specifier
code and goes through a series of steps:

@enumerate
@item
Validation. Instantiation of image instances happens dynamically, often
within the guts of redisplay. Thus it is often not feasible to catch
instantiator errors at instantiation time. Instead the instantiator is
validated at the time it is added to the image specifier. This step
is implemented by @code{image_validate} and at a simple level validates
keyword value pairs.

@item
Duplication. The specifier code by default takes a copy of the
instantiator. This is reasonable for most specifiers but in the case of
widget-glyphs can be problematic, since some of the properties in the
instantiator, for instance callbacks, could cause infinite recursion
in the copying process. Thus the image code defines a function,
@code{image_copy_instantiator}, which will selectively copy values.
This is controlled by the way that a keyword is defined, either using
@code{IIFORMAT_VALID_KEYWORD} or
@code{IIFORMAT_VALID_NONCOPY_KEYWORD}. Note that the image caching and
redisplay code relies on instantiator copying to ensure that current and
new instantiators are actually different rather than referring to the
same thing.

@item
Normalization. Once the instantiator has been copied it must be
converted into a form that is viable at instantiation time. This can
involve no changes at all, but typically involves things like converting
file names to the actual data. This step is implemented by
@code{image_going_to_add} and @code{normalize_image_instantiator}.

@item
Instantiation. When an image instance is actually required for display
it is instantiated using @code{image_instantiate}. This involves calling
instantiate methods that are specific to the type of image being
instantiated.
@end enumerate

9490 The final instantiation phase also involves a number of steps. In order
9491 to understand these we need to describe a number of concepts.
An image is instantiated in a @dfn{domain}, where a domain can be any
one of a device, frame, window or image-instance. The domain gives the
image-instance its context and identity; properties that affect the
appearance of the image-instance may differ when the same glyph is
instantiated in different domains. An example is the face used to
display the image-instance.

Although an image is instantiated in a particular domain, the
instantiation domain is not necessarily the domain in which the
image-instance is cached. For example, a pixmap can be instantiated in a
window but actually be cached on a per-device basis. The domain in which
the image-instance is actually cached is called the
@dfn{governing-domain}. A governing-domain is currently either a device
or a window. Widget-glyphs and text-glyphs have a window as a
governing-domain; all other image-instances have a device as the
governing-domain. The governing domain for an image-instance is
determined using the @code{governing_domain} image-instance method.
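
The choice of governing domain can be summarized in a few lines of C.
This is a sketch under stated assumptions: the enumerations, structures
and function names are all hypothetical, and the real decision is made
by the per-format @code{governing_domain} method rather than by a
switch statement.

```c
/* Hypothetical classification of image-instances. */
enum image_kind { IMAGE_TEXT, IMAGE_PIXMAP, IMAGE_WIDGET };
enum domain_kind { DOMAIN_DEVICE, DOMAIN_WINDOW };

struct device_sketch { int id; };
struct window_domain { struct device_sketch *device; };

/* Widget-glyphs and text-glyphs are governed by a window; every other
   image-instance is governed by a device. */
enum domain_kind governing_domain_for(enum image_kind kind)
{
    switch (kind) {
    case IMAGE_TEXT:
    case IMAGE_WIDGET:
        return DOMAIN_WINDOW;
    default:
        return DOMAIN_DEVICE;
    }
}

/* Return the object whose cache holds an image-instance of KIND that
   was instantiated in window W. */
const void *cache_owner(enum image_kind kind, const struct window_domain *w)
{
    if (governing_domain_for(kind) == DOMAIN_WINDOW)
        return w;
    return w->device;
}
```

Two windows on the same device therefore share one cached pixmap
instance but each get their own widget instance.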
9511 @section Widget-Glyphs
9512 @cindex widget-glyphs
9514 @section Widget-Glyphs in the MS-Windows Environment
9515 @cindex widget-glyphs in the MS-Windows environment
9516 @cindex MS-Windows environment, widget-glyphs in the
9520 @section Widget-Glyphs in the X Environment
9521 @cindex widget-glyphs in the X environment
9522 @cindex X environment, widget-glyphs in the
9524 Widget-glyphs under X make heavy use of lwlib (@pxref{Lucid Widget
9525 Library}) for manipulating the native toolkit objects. This is primarily
9526 so that different toolkits can be supported for widget-glyphs, just as
9527 they are supported for features such as menubars etc.
9529 Lwlib is extremely poorly documented and quite hairy so here is my
9530 understanding of what goes on.
9532 Lwlib maintains a set of @code{widget_instance}s which mirror the hierarchical
9533 state of Xt widgets. I think this is so that widgets can be updated and
9534 manipulated generically by the lwlib library. For instance,
9535 @code{update_one_widget_instance} can cope with multiple types of widget and
9536 multiple types of toolkit. Each element in the widget hierarchy is updated
9537 from its corresponding widget_instance by walking the widget_instance
9540 This has desirable properties such as @code{lw_modify_all_widgets}, which is
9541 called from @file{glyphs-x.c} and updates all the properties of a widget
9542 without having to know what the widget is or what toolkit it is from.
9543 Unfortunately this also has hairy properties such as making the lwlib
9544 code quite complex. And of course lwlib has to know at some level what
9545 the widget is and how to set its properties.
9547 @node Specifiers, Menus, Glyphs, Top
9553 @node Menus, Subprocesses, Specifiers, Top
9557 A menu is set by setting the value of the variable
9558 @code{current-menubar} (which may be buffer-local) and then calling
9559 @code{set-menubar-dirty-flag} to signal a change. This will cause the
9560 menu to be redrawn at the next redisplay. The format of the data in
9561 @code{current-menubar} is described in @file{menubar.c}.
9563 Internally the data in @code{current-menubar} is parsed into a tree of
9564 @code{widget_value}s (defined in @file{lwlib.h}); this is accomplished
9565 by the recursive function @code{menu_item_descriptor_to_widget_value()},
9566 called by @code{compute_menubar_data()}. Such a tree is deallocated
9567 using @code{free_widget_value()}.
9569 @code{update_screen_menubars()} is one of the external entry points.
9570 This checks to see, for each screen, if that screen's menubar needs to
9571 be updated. This is the case if
9575 @code{set-menubar-dirty-flag} was called since the last redisplay. (This
9576 function sets the C variable @code{menubar_has_changed}.)
9578 The buffer displayed in the screen has changed.
9580 The screen has no menubar currently displayed.
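The three conditions above amount to a simple predicate, sketched here in C. The structure @code{sketch_screen}, its members, and the function @code{sketch_menubar_needs_update} are hypothetical names chosen for illustration; only the flag @code{menubar_has_changed} is taken from the text, and it is passed as a parameter here rather than read from a global.

```c
#include <stdbool.h>

/* Hypothetical per-screen state; the real screen structure is not
   described in this chapter. */
struct sketch_screen
{
  bool buffer_changed;  /* the buffer displayed in the screen has changed */
  bool has_menubar;     /* a menubar is currently displayed */
};

/* Return true if any of the three conditions listed above holds. */
bool
sketch_menubar_needs_update (const struct sketch_screen *s,
                             bool menubar_has_changed)
{
  return menubar_has_changed || s->buffer_changed || !s->has_menubar;
}
```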
9583 @code{set_screen_menubar()} is called for each such screen. This
9584 function calls @code{compute_menubar_data()} to create the tree of
9585 @code{widget_value}s, then calls @code{lw_create_widget()},
9586 @code{lw_modify_all_widgets()}, and/or @code{lw_destroy_all_widgets()}
9587 to create the X-Toolkit widget associated with the menu.
9589 @code{update_psheets()}, the other external entry point, actually
9590 changes the menus being displayed. It uses the widgets fixed by
9591 @code{update_screen_menubars()} and calls various X functions to ensure
9592 that the menus are displayed properly.
9594 The menubar widget is set up so that @code{pre_activate_callback()} is
9595 called when the menu is first selected (i.e. mouse button goes down),
9596 and @code{menubar_selection_callback()} is called when an item is
9597 selected. @code{pre_activate_callback()} calls the function in
9598 @code{activate-menubar-hook}, which can change the menubar (this is described
9599 in @file{menubar.c}). If the menubar is changed,
9600 @code{set_screen_menubar()} is called.
9601 @code{menubar_selection_callback()} enqueues a menu event, putting in it
9602 a function to call (either @code{eval} or @code{call-interactively}) and
9603 its argument, which is the callback function or form given in the menu's
9606 @node Subprocesses, Interface to the X Window System, Menus, Top
9607 @chapter Subprocesses
9608 @cindex subprocesses
9610 The fields of a process are:
9614 A string, the name of the process.
9617 A list containing the command arguments that were used to start this
9621 A function used to accept output from the process instead of a buffer,
9625 A function called whenever the process receives a signal, or @code{nil}.
9628 The associated buffer of the process.
9631 An integer, the Unix process @sc{id}.
9634 A flag, non-@code{nil} if this is really a child process.
9635 It is @code{nil} for a network connection.
9638 A marker indicating the position of the end of the last output from this
9639 process inserted into the buffer. This is often but not always the end
9642 @item kill_without_query
9643 If this is non-@code{nil}, killing XEmacs while this process is still
9644 running does not ask for confirmation about killing the process.
9646 @item raw_status_low
9647 @itemx raw_status_high
9648 These two fields record 16 bits each of the process status returned by
9649 the @code{wait} system call.
9652 The process status, as @code{process-status} should return it.
9656 If these two fields are not equal, a change in the status of the process
9657 needs to be reported, either by running the sentinel or by inserting a
9658 message in the process buffer.
9661 Non-@code{nil} if communication with the subprocess uses a @sc{pty};
9662 @code{nil} if it uses a pipe.
9665 The file descriptor for input from the process.
9668 The file descriptor for output to the process.
9671 The file descriptor for the terminal that the subprocess is using. (On
9672 some systems, there is no need to record this, so the value is
9676 The name of the terminal that the subprocess is using,
9677 or @code{nil} if it is using pipes.
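As an illustration of the @code{raw_status_low} and @code{raw_status_high} fields described above, the following C sketch splits a status word into two 16-bit halves and joins them again. It assumes a status that fits in 32 bits and assumes that the low field holds the least significant half; the type and function names are hypothetical and do not come from the XEmacs sources.

```c
#include <stdint.h>

/* Hypothetical pair of 16-bit halves of a process status word. */
struct sketch_raw_status
{
  uint16_t low;   /* assumed: least significant 16 bits */
  uint16_t high;  /* assumed: most significant 16 bits */
};

/* Split a 32-bit status into its two 16-bit halves. */
struct sketch_raw_status
sketch_split_status (uint32_t status)
{
  struct sketch_raw_status r;
  r.low = (uint16_t) (status & 0xFFFFu);
  r.high = (uint16_t) (status >> 16);
  return r;
}

/* Recombine the two halves into the original 32-bit status. */
uint32_t
sketch_join_status (struct sketch_raw_status r)
{
  return ((uint32_t) r.high << 16) | (uint32_t) r.low;
}
```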
9680 @node Interface to the X Window System, Index, Subprocesses, Top
9681 @chapter Interface to the X Window System
9682 @cindex X Window System, interface to the
9684 Mostly undocumented.
9687 * Lucid Widget Library:: An interface to various widget sets.
9690 @node Lucid Widget Library
9691 @section Lucid Widget Library
9692 @cindex Lucid Widget Library
9693 @cindex widget library, Lucid
9694 @cindex library, Lucid Widget
9696 Lwlib is extremely poorly documented and quite hairy. The author(s)
9697 blame that on X, Xt, and Motif, with some justice, but also sufficient
9698 hypocrisy to avoid drawing the obvious conclusion about their own work.
9700 The Lucid Widget Library is composed of two more or less independent
9701 pieces. The first, as the name suggests, is a set of widgets. These
9702 widgets are intended to resemble and improve on widgets provided in the
9703 Motif toolkit but not in the Athena widgets, including menubars and
9704 scrollbars. Recent additions by Andy Piper integrate some ``modern''
9705 widgets by Edward Falk, including checkboxes, radio buttons, progress
9706 gauges, and index tab controls (aka notebooks).
9708 The second piece of the Lucid widget library is a generic interface to
9709 several toolkits for X (including Xt, the Athena widget set, and Motif,
9710 as well as the Lucid widgets themselves) so that core XEmacs code need
9711 not know which widget set has been used to build the graphical user
9715 * Generic Widget Interface:: The lwlib generic widget interface.
9718 * Checkboxes and Radio Buttons::
9723 @node Generic Widget Interface
9724 @subsection Generic Widget Interface
9725 @cindex widget interface, generic
9727 In general in any toolkit a widget may be a composite object. In Xt,
9728 all widgets have an X window that they manage, but typically a complex
9729 widget will have widget children, each of which manages a subwindow of
9730 the parent widget's X window. These children may themselves be
9731 composite widgets. Thus a widget is actually a tree or hierarchy of
9734 For each toolkit widget, lwlib maintains a tree of @code{widget_value}s
9735 which mirror the hierarchical state of Xt widgets (including Motif,
9736 Athena, 3D Athena, and Falk's widget sets). Each @code{widget_value}
9737 has a @code{contents} member, which points to the head of a linked list of
9738 its children. The linked list of siblings is chained through the
9739 @code{next} member of @code{widget_value}.
9748 +-------+ next +-------+ next +-------+
9749 | child |----->| child |----->| child |
9750 +-------+ +-------+ +-------+
9754 +-------------+ next +-------------+
9755 | grand child |----->| grand child |
9756 +-------------+ +-------------+
9758 The @code{widget_value} hierarchy of a composite widget with two simple
9759 children and one composite child.
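The child and sibling links described above can be sketched in C as follows. The structure @code{sketch_widget_value} is a simplified, hypothetical stand-in that models only the @code{contents} and @code{next} members named in the text; the real @code{widget_value} in @file{lwlib.h} has more members, and @code{sketch_count_values} is an illustrative helper, not an lwlib function.

```c
#include <stddef.h>

/* Simplified stand-in for a widget_value: only the two links described
   in the text are modelled. */
struct sketch_widget_value
{
  const char *name;                      /* hypothetical label */
  struct sketch_widget_value *contents;  /* head of the list of children */
  struct sketch_widget_value *next;      /* next sibling */
};

/* Count a node, its later siblings, and all of their descendants by
   following next across siblings and contents down to children. */
size_t
sketch_count_values (const struct sketch_widget_value *wv)
{
  size_t n = 0;
  for (; wv != NULL; wv = wv->next)
    n += 1 + sketch_count_values (wv->contents);
  return n;
}
```

For the hierarchy in the diagram (one composite, three children, two grandchildren), calling the helper on the root would visit six nodes.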
9762 The @code{widget_instance} structure maintains the inverse view of the
9763 tree. As for the @code{widget_value}, siblings are chained through the
9764 @code{next} member. However, rather than naming children, the
9765 @code{widget_instance} tree links to parents.
9774 +-------+ next +-------+ next +-------+
9775 | child |----->| child |----->| child |
9776 +-------+ +-------+ +-------+
9780 +-------------+ next +-------------+
9781 | grand child |----->| grand child |
9782 +-------------+ +-------------+
9784 The @code{widget_instance} hierarchy of a composite widget with two simple
9785 children and one composite child.
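The inverse view can be sketched in the same way. The structure @code{sketch_widget_instance} below is hypothetical and models only the parent and sibling links described in the text; the helper functions are illustrative and are not part of lwlib.

```c
#include <stddef.h>

/* Simplified stand-in for a widget_instance: each node links upward to
   its parent and sideways to its next sibling. */
struct sketch_widget_instance
{
  struct sketch_widget_instance *parent;  /* NULL at the root */
  struct sketch_widget_instance *next;    /* next sibling */
};

/* Follow parent links until the root of the tree is reached. */
const struct sketch_widget_instance *
sketch_instance_root (const struct sketch_widget_instance *wi)
{
  while (wi != NULL && wi->parent != NULL)
    wi = wi->parent;
  return wi;
}

/* Number of parent links between a node and the root. */
size_t
sketch_instance_depth (const struct sketch_widget_instance *wi)
{
  size_t depth = 0;
  for (; wi != NULL && wi->parent != NULL; wi = wi->parent)
    depth++;
  return depth;
}
```

Because the links point upward, finding the root or the depth of a node is a simple loop, whereas enumerating children would require a search; this is the opposite trade-off from the @code{widget_value} tree.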
9788 This permits widgets derived from different toolkits to be updated and
9789 manipulated generically by the lwlib library. For instance
9790 @code{update_one_widget_instance} can cope with multiple types of widget
9791 and multiple types of toolkit. Each element in the widget hierarchy is
9792 updated from its corresponding @code{widget_value} by walking the
9793 @code{widget_value} tree. This has desirable properties. For example,
9794 @code{lw_modify_all_widgets} is called from @file{glyphs-x.c} and
9795 updates all the properties of a widget without having to know what the
9796 widget is or what toolkit it is from. Unfortunately this also has its
9797 hairy properties; the lwlib code is quite complex. And of course lwlib has
9798 to know at some level what the widget is and how to set its properties.
9800 The @code{widget_instance} structure also contains a pointer to the root
9801 of its tree. Widget instances are further confi
9805 @subsection Scrollbars
9809 @subsection Menubars
9812 @node Checkboxes and Radio Buttons
9813 @subsection Checkboxes and Radio Buttons
9814 @cindex checkboxes and radio buttons
9815 @cindex radio buttons, checkboxes and
9816 @cindex buttons, checkboxes and radio
9819 @subsection Progress Bars
9820 @cindex progress bars
9821 @cindex bars, progress
9824 @subsection Tab Controls
9825 @cindex tab controls
9829 @c Print the tables of contents