This is ../info/internals.info, produced by makeinfo version 4.8 from
internals/internals.texi.

INFO-DIR-SECTION XEmacs Editor
START-INFO-DIR-ENTRY
* Internals: (internals).       XEmacs Internals Manual.
END-INFO-DIR-ENTRY

   Copyright (C) 1992 - 1996 Ben Wing.  Copyright (C) 1996, 1997 Sun
Microsystems.  Copyright (C) 1994 - 1998, 2002, 2003 Free Software
Foundation.  Copyright (C) 1994, 1995 Board of Trustees, University of
Illinois.

   Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

   Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided that the
entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

   Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that this permission notice may be stated in a
translation approved by the Foundation.

   Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided also
that the section entitled "GNU General Public License" is included
exactly as in the original, and provided that the entire resulting
derived work is distributed under the terms of a permission notice
identical to this one.

   Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that the section entitled "GNU General Public License"
may be included in a translation approved by the Free Software
Foundation instead of in the original English.


File: internals.info,  Node: Top,  Next: A History of Emacs,  Prev: (dir),  Up: (dir)

   This Info file contains v1.4 of the XEmacs Internals Manual, March
2001.

* Menu:

* A History of Emacs::          Times, dates, important events.
* XEmacs From the Outside::     A broad conceptual overview.
* The Lisp Language::           An overview.
* XEmacs From the Perspective of Building::
* XEmacs From the Inside::
* The XEmacs Object System (Abstractly Speaking)::
* How Lisp Objects Are Represented in C::
* Rules When Writing New C Code::
* Regression Testing XEmacs::
* A Summary of the Various XEmacs Modules::
* Allocation of Objects in XEmacs Lisp::
* Dumping::
* Events and the Event Loop::
* Evaluation; Stack Frames; Bindings::
* Symbols and Variables::
* Buffers and Textual Representation::
* MULE Character Sets and Encodings::
* The Lisp Reader and Compiler::
* Lstreams::
* Consoles; Devices; Frames; Windows::
* The Redisplay Mechanism::
* Extents::
* Faces::
* Glyphs::
* Specifiers::
* Menus::
* Subprocesses::
* Interface to the X Window System::
* Index::


--- The Detailed Node Listing ---

A History of Emacs

* Through Version 18::          Unification prevails.
* Lucid Emacs::                 One version 19 Emacs.
* GNU Emacs 19::                The other version 19 Emacs.
* GNU Emacs 20::                The other version 20 Emacs.
* XEmacs::                      The continuation of Lucid Emacs.

Rules When Writing New C Code

* General Coding Rules::
* Writing Lisp Primitives::
* Adding Global Lisp Variables::
* Coding for Mule::
* Techniques for XEmacs Developers::

Coding for Mule

* Character-Related Data Types::
* Working With Character and Byte Positions::
* Conversion to and from External Data::
* General Guidelines for Writing Mule-Aware Code::
* An Example of Mule-Aware Code::

Regression Testing XEmacs

A Summary of the Various XEmacs Modules

* Low-Level Modules::
* Basic Lisp Modules::
* Modules for Standard Editing Operations::
* Editor-Level Control Flow Modules::
* Modules for the Basic Displayable Lisp Objects::
* Modules for other Display-Related Lisp Objects::
* Modules for the Redisplay Mechanism::
* Modules for Interfacing with the File System::
* Modules for Other Aspects of the Lisp Interpreter and Object System::
* Modules for Interfacing with the Operating System::
* Modules for Interfacing with X Windows::
* Modules for Internationalization::
* Modules for Regression Testing::

Allocation of Objects in XEmacs Lisp

* Introduction to Allocation::
* Garbage Collection::
* GCPROing::
* Garbage Collection - Step by Step::
* Integers and Characters::
* Allocation from Frob Blocks::
* lrecords::
* Low-level allocation::
* Cons::
* Vector::
* Bit Vector::
* Symbol::
* Marker::
* String::
* Compiled Function::

Garbage Collection - Step by Step

* Invocation::
* garbage_collect_1::
* mark_object::
* gc_sweep::
* sweep_lcrecords_1::
* compact_string_chars::
* sweep_strings::
* sweep_bit_vectors_1::

Dumping

* Overview::
* Data descriptions::
* Dumping phase::
* Reloading phase::

Dumping phase

* Object inventory::
* Address allocation::
* The header::
* Data dumping::
* Pointers dumping::

Events and the Event Loop

* Introduction to Events::
* Main Loop::
* Specifics of the Event Gathering Mechanism::
* Specifics About the Emacs Event::
* The Event Stream Callback Routines::
* Other Event Loop Functions::
* Converting Events::
* Dispatching Events; The Command Builder::

Evaluation; Stack Frames; Bindings

* Evaluation::
* Dynamic Binding; The specbinding Stack; Unwind-Protects::
* Simple Special Forms::
* Catch and Throw::

Symbols and Variables

* Introduction to Symbols::
* Obarrays::
* Symbol Values::

Buffers and Textual Representation

* Introduction to Buffers::     A buffer holds a block of text such as a file.
* The Text in a Buffer::        Representation of the text in a buffer.
* Buffer Lists::                Keeping track of all buffers.
* Markers and Extents::         Tagging locations within a buffer.
* Bufbytes and Emchars::        Representation of individual characters.
* The Buffer Object::           The Lisp object corresponding to a buffer.

MULE Character Sets and Encodings

* Character Sets::
* Encodings::
* Internal Mule Encodings::
* CCL::

Encodings

* Japanese EUC (Extended Unix Code)::
* JIS7::

Internal Mule Encodings

* Internal String Encoding::
* Internal Character Encoding::

Lstreams

* Creating an Lstream::         Creating an lstream object.
* Lstream Types::               Different sorts of things that are streamed.
* Lstream Functions::           Functions for working with lstreams.
* Lstream Methods::             Creating new lstream types.

Consoles; Devices; Frames; Windows

* Introduction to Consoles; Devices; Frames; Windows::
* Point::
* Window Hierarchy::
* The Window Object::

The Redisplay Mechanism

* Critical Redisplay Sections::
* Line Start Cache::
* Redisplay Piece by Piece::

Extents

* Introduction to Extents::     Extents are ranges over text, with properties.
* Extent Ordering::             How extents are ordered internally.
* Format of the Extent Info::   The extent information in a buffer or string.
* Zero-Length Extents::         A weird special case.
* Mathematics of Extent Ordering::  A rigorous foundation.
* Extent Fragments::            Cached information useful for redisplay.


File: internals.info,  Node: A History of Emacs,  Next: XEmacs From the Outside,  Prev: Top,  Up: Top

1 A History of Emacs
********************

XEmacs is a powerful, customizable text editor and development
environment.  It began as Lucid Emacs, which was in turn derived from
GNU Emacs, a program written by Richard Stallman of the Free Software
Foundation.  The lineage of GNU Emacs dates back to the 1970's: it was
modelled after a package called "Emacs", written in 1976, that was a
set of macros on top of TECO, an old, old text editor written at MIT
for the DEC PDP-10 under one of the earliest time-sharing operating
systems, ITS (Incompatible Timesharing System). (ITS dates back well
before Unix.)
ITS, TECO, and Emacs were products of a group of people at MIT who
called themselves "hackers", who shared an idealistic belief system
about the free exchange of information and were fanatical in their
devotion to and time spent with computers. (The hacker subculture dates
back to the late 1950's at MIT and is described in detail in Steven
Levy's book `Hackers'.  This book also includes a lot of information
about Stallman himself and the development of Lisp, a programming
language developed at MIT that underlies Emacs.)

* Menu:

* Through Version 18::          Unification prevails.
* Lucid Emacs::                 One version 19 Emacs.
* GNU Emacs 19::                The other version 19 Emacs.
* GNU Emacs 20::                The other version 20 Emacs.
* XEmacs::                      The continuation of Lucid Emacs.


File: internals.info,  Node: Through Version 18,  Next: Lucid Emacs,  Up: A History of Emacs

1.1 Through Version 18
======================

Although the history of the early versions of GNU Emacs is unclear, it
is well documented from the middle of 1985 onward.  A time line is:

   * GNU Emacs version 15 (15.34) was released sometime in 1984 or 1985
     and shared some code with a version of Emacs written by James
     Gosling (the same James Gosling who later created the Java
     language).

   * GNU Emacs version 16 (first released version was 16.56) was
     released on July 15, 1985.  All Gosling code was removed due to
     potential copyright problems with the code.

   * version 16.57: released on September 16, 1985.

   * versions 16.58, 16.59: released on September 17, 1985.

   * version 16.60: released on September 19, 1985.  These later
     version 16's incorporated patches from the net, especially for
     getting Emacs to work under System V.

   * version 17.36 (first official v17 release) released on December 20,
     1985.  Included a TeX-able user manual.  First official unpatched
     version that worked on vanilla System V machines.

   * version 17.43 (second official v17 release) released on January 25,
     1986.

   * version 17.45 released on January 30, 1986.

   * version 17.46 released on February 4, 1986.

   * version 17.48 released on February 10, 1986.

   * version 17.49 released on February 12, 1986.

   * version 17.55 released on March 18, 1986.

   * version 17.57 released on March 27, 1986.

   * version 17.58 released on April 4, 1986.

   * version 17.61 released on April 12, 1986.

   * version 17.63 released on May 7, 1986.

   * version 17.64 released on May 12, 1986.

   * version 18.24 (a beta version) released on October 2, 1986.

   * version 18.30 (a beta version) released on November 15, 1986.

   * version 18.31 (a beta version) released on November 23, 1986.

   * version 18.32 (a beta version) released on December 7, 1986.

   * version 18.33 (a beta version) released on December 12, 1986.

   * version 18.35 (a beta version) released on January 5, 1987.

   * version 18.36 (a beta version) released on January 21, 1987.

   * January 27, 1987: The Great Usenet Renaming.  net.emacs is now
     comp.emacs.

   * version 18.37 (a beta version) released on February 12, 1987.

   * version 18.38 (a beta version) released on March 3, 1987.

   * version 18.39 (a beta version) released on March 14, 1987.

   * version 18.40 (a beta version) released on March 18, 1987.

   * version 18.41 (the first "official" release) released on March 22,
     1987.

   * version 18.45 released on June 2, 1987.

   * version 18.46 released on June 9, 1987.

   * version 18.47 released on June 18, 1987.

   * version 18.48 released on September 3, 1987.

   * version 18.49 released on September 18, 1987.

   * version 18.50 released on February 13, 1988.

   * version 18.51 released on May 7, 1988.

   * version 18.52 released on September 1, 1988.

   * version 18.53 released on February 24, 1989.

   * version 18.54 released on April 26, 1989.

   * version 18.55 released on August 23, 1989.  This is the earliest
     version that is still available by FTP.

   * version 18.56 released on January 17, 1991.

   * version 18.57 released late January, 1991.

   * version 18.58 released ?????.

   * version 18.59 released October 31, 1992.


File: internals.info,  Node: Lucid Emacs,  Next: GNU Emacs 19,  Prev: Through Version 18,  Up: A History of Emacs

1.2 Lucid Emacs
===============

Lucid Emacs was developed by the (now-defunct) Lucid Inc., a maker of
C++ and Lisp development environments.  It began when Lucid decided they
wanted to use Emacs as the editor and cornerstone of their C++
development environment (called "Energize").  They needed many features
that were not available in the existing version of GNU Emacs (version
18.5something), in particular good and integrated support for GUI
elements such as mouse support, multiple fonts, multiple window-system
windows, etc.  A branch of GNU Emacs called Epoch, written at the
University of Illinois, existed that supplied many of these features;
however, Lucid needed more than what existed in Epoch.  At the time, the
Free Software Foundation was working on version 19 of Emacs (this was
sometime around 1991), which was planned to have similar features, and
so Lucid decided to work with the Free Software Foundation.  Their plan
was to add features that they needed, and coordinate with the FSF so
that the features would get included back into Emacs version 19.

   The release of version 19 was delayed, however (it finally appeared
more than a year later than initially planned), and Lucid encountered
unexpected technical resistance in getting their changes merged back
into version 19, so they decided to release their own version of Emacs,
which became Lucid Emacs 19.0.

   The initial authors of Lucid Emacs were Matthieu Devin, Harlan
Sexton, and Eric Benson, and the work was later taken over by Jamie
Zawinski, who became "Mr. Lucid Emacs" for many releases.

   A time line for Lucid Emacs is:

   * version 19.0 shipped with Energize 1.0, April 1992.

   * version 19.1 released June 4, 1992.

   * version 19.2 released June 19, 1992.

   * version 19.3 released September 9, 1992.

   * version 19.4 released January 21, 1993.

   * version 19.5 was a repackaging of 19.4 with a few bug fixes and
     shipped with Energize 2.0.  Never released to the net.

   * version 19.6 released April 9, 1993.

   * version 19.7 was a repackaging of 19.6 with a few bug fixes and
     shipped with Energize 2.1.  Never released to the net.

   * version 19.8 released September 6, 1993.

   * version 19.9 released January 12, 1994.

   * version 19.10 released May 27, 1994.

   * version 19.11 (first XEmacs) released September 13, 1994.

   * version 19.12 released June 23, 1995.

   * version 19.13 released September 1, 1995.

   * version 19.14 released June 23, 1996.

   * version 20.0 released February 9, 1997.

   * version 19.15 released March 28, 1997.

   * version 20.1 (not released to the net) April 15, 1997.

   * version 20.2 released May 16, 1997.

   * version 19.16 released October 31, 1997.

   * version 20.3 (the first stable version of XEmacs 20.x) released
     November 30, 1997.

   * version 20.4 released February 28, 1998.

   * version 21.1.2 released May 14, 1999. (The version naming scheme
     was changed at this point: [a] the second version number is odd
     for stable versions, even for beta versions; [b] a third version
     number is added, replacing the "beta xxx" ending for beta versions
     and allowing for periodic maintenance releases for stable
     versions.  Therefore, 21.0 was never "officially" released;
     similarly for 21.2, etc.)

   * version 21.1.3 released June 26, 1999.

   * version 21.1.4 released July 8, 1999.

   * version 21.1.6 released August 14, 1999. (There was no 21.1.5.)

   * version 21.1.7 released September 26, 1999.

   * version 21.1.8 released November 2, 1999.

   * version 21.1.9 released February 13, 2000.

   * version 21.1.10 released May 7, 2000.

   * version 21.1.10a released June 24, 2000.

   * version 21.1.11 released July 18, 2000.

   * version 21.1.12 released August 5, 2000.

   * version 21.1.13 released January 7, 2001.

   * version 21.1.14 released January 27, 2001.


File: internals.info,  Node: GNU Emacs 19,  Next: GNU Emacs 20,  Prev: Lucid Emacs,  Up: A History of Emacs

1.3 GNU Emacs 19
================

About a year after the initial release of Lucid Emacs, the FSF released
a beta of their version of Emacs 19 (referred to here as "GNU Emacs").
By this time, the current version of Lucid Emacs was 19.6. (Strangely,
the first released beta from the FSF was GNU Emacs 19.7.) A time line
for GNU Emacs version 19 is:

   * version 19.8 (beta) released May 27, 1993.

   * version 19.9 (beta) released May 27, 1993.

   * version 19.10 (beta) released May 30, 1993.

   * version 19.11 (beta) released June 1, 1993.

   * version 19.12 (beta) released June 2, 1993.

   * version 19.13 (beta) released June 8, 1993.

   * version 19.14 (beta) released June 17, 1993.

   * version 19.15 (beta) released June 19, 1993.

   * version 19.16 (beta) released July 6, 1993.

   * version 19.17 (beta) released late July, 1993.

   * version 19.18 (beta) released August 9, 1993.

   * version 19.19 (beta) released August 15, 1993.

   * version 19.20 (beta) released November 17, 1993.

   * version 19.21 (beta) released November 17, 1993.

   * version 19.22 (beta) released November 28, 1993.

   * version 19.23 (beta) released May 17, 1994.

   * version 19.24 (beta) released May 16, 1994.

   * version 19.25 (beta) released June 3, 1994.

   * version 19.26 (beta) released September 11, 1994.

   * version 19.27 (beta) released September 14, 1994.

   * version 19.28 (first "official" release) released November 1, 1994.

   * version 19.29 released June 21, 1995.

   * version 19.30 released November 24, 1995.

   * version 19.31 released May 25, 1996.

   * version 19.32 released July 31, 1996.

   * version 19.33 released August 11, 1996.

   * version 19.34 released August 21, 1996.

   * version 19.34b released September 6, 1996.

   In some ways, GNU Emacs 19 was better than Lucid Emacs; in some ways,
worse.  Lucid soon began incorporating features from GNU Emacs 19 into
Lucid Emacs; the work was mostly done by Richard Mlynarik, who had been
working on and using GNU Emacs for a long time (back as far as version
16 or 17).


File: internals.info,  Node: GNU Emacs 20,  Next: XEmacs,  Prev: GNU Emacs 19,  Up: A History of Emacs

1.4 GNU Emacs 20
================

On February 2, 1997, work began on GNU Emacs to integrate Mule.  The
first release was made in September of that year.

   A time line for Emacs 20 is:

   * version 20.1 released September 17, 1997.

   * version 20.2 released September 20, 1997.

   * version 20.3 released August 19, 1998.


File: internals.info,  Node: XEmacs,  Prev: GNU Emacs 20,  Up: A History of Emacs

1.5 XEmacs
==========

Around the time that Lucid was developing Energize, Sun Microsystems
was developing their own development environment (called "SPARCWorks")
and also decided to use Emacs.  They joined forces with the Epoch team
at the University of Illinois and later with Lucid.  The maintainer of
the last-released version of Epoch was Marc Andreessen, but he dropped
out and the Epoch project, headed by Simon Kaplan, lured Chuck Thompson
away from a system administration job to become the primary Lucid Emacs
author for Epoch and Sun.  Chuck's area of specialty became the
redisplay engine (he replaced the old Lucid Emacs redisplay engine with
a ported version from Epoch and then later rewrote it from scratch).
Sun also hired Ben Wing (the author of Win-Emacs, a port of Lucid Emacs
to Microsoft Windows 3.1) in 1993, for what was initially a one-month
contract to fix some event problems but later became a many-year
involvement, punctuated by a six-month contract with Amdahl Corporation.

   In 1994, Sun and Lucid agreed to rename Lucid Emacs to XEmacs (a name
that favored neither company); the first release called XEmacs was
version 19.11.  In June 1994, Lucid folded and Jamie Zawinski quit to
work for the newly formed Mosaic Communications Corp., later Netscape
Communications Corp. (co-founded by the same Marc Andreessen, who had
quit his Epoch job to work on a graphical browser for the World Wide
Web).  Chuck then became the primary maintainer of XEmacs, and put out
versions 19.11 through 19.14 in conjunction with Ben.  For 19.12 and
19.13, Chuck added the new redisplay and many other display improvements
and Ben added MULE support (support for Asian and other languages) and
redesigned most of the internal Lisp subsystems to better support the
MULE work and the various other features being added to XEmacs.  After
19.14 Chuck retired as primary maintainer and Steve Baur stepped in.

   Soon after 19.13 was released, work began in earnest on the MULE
internationalization code and the source tree was divided into two
development paths.  The MULE version was initially called 19.20, but was
soon renamed to 20.0.  In 1996 Martin Buchholz of Sun Microsystems took
over the care and feeding of it and worked on it in parallel with the
19.14 development that was occurring at the same time.  After much work
by Martin, it was decided to release 20.0 ahead of 19.15 in February
1997.  The source tree remained divided until 20.2 when the version 19
source was finally retired at version 19.16.

   In 1997, Sun finally dropped all pretense of support for XEmacs and
Martin Buchholz left the company in November.  Since then (and for most
of the preceding year, because Steve Baur was never paid to work on
XEmacs), XEmacs has relied solely on the contributions of volunteers
from the free software community.  Starting from 1997, Hrvoje Niksic and
Kyle Jones have figured prominently in XEmacs development.

   Many attempts have been made to merge XEmacs and GNU Emacs, but they
have consistently failed.

   A more detailed history is contained in the XEmacs About page.

   A time line for XEmacs is:

   * version 19.11 (first XEmacs) released September 13, 1994.

   * version 19.12 released June 23, 1995.

   * version 19.13 released September 1, 1995.

   * version 19.14 released June 23, 1996.

   * version 20.0 released February 9, 1997.

   * version 19.15 released March 28, 1997.

   * version 20.1 (not released to the net) April 15, 1997.

   * version 20.2 released May 16, 1997.

   * version 19.16 released October 31, 1997.

   * version 20.3 (the first stable version of XEmacs 20.x) released
     November 30, 1997.

   * version 20.4 released February 28, 1998.

   * version 21.0.60 released December 10, 1998. (The version naming
     scheme was changed at this point: [a] the second version number is
     odd for stable versions, even for beta versions; [b] a third
     version number is added, replacing the "beta xxx" ending for beta
     versions and allowing for periodic maintenance releases for stable
     versions.  Therefore, 21.0 was never "officially" released;
     similarly for 21.2, etc.)

   * version 21.0.61 released January 4, 1999.

   * version 21.0.63 released February 3, 1999.

   * version 21.0.64 released March 1, 1999.

   * version 21.0.65 released March 5, 1999.

   * version 21.0.66 released March 12, 1999.

   * version 21.0.67 released March 25, 1999.

   * version 21.1.2 released May 14, 1999. (This is the followup to
     21.0.67.  The second version number was bumped to indicate the
     beginning of the "stable" series.)

   * version 21.1.3 released June 26, 1999.

   * version 21.1.4 released July 8, 1999.

   * version 21.1.6 released August 14, 1999. (There was no 21.1.5.)

   * version 21.1.7 released September 26, 1999.

   * version 21.1.8 released November 2, 1999.

   * version 21.1.9 released February 13, 2000.

   * version 21.1.10 released May 7, 2000.

   * version 21.1.10a released June 24, 2000.

   * version 21.1.11 released July 18, 2000.

   * version 21.1.12 released August 5, 2000.

   * version 21.1.13 released January 7, 2001.

   * version 21.1.14 released January 27, 2001.

   * version 21.2.9 released February 3, 1999.

   * version 21.2.10 released February 5, 1999.

   * version 21.2.11 released March 1, 1999.

   * version 21.2.12 released March 5, 1999.

   * version 21.2.13 released March 12, 1999.

   * version 21.2.14 released May 14, 1999.

   * version 21.2.15 released June 4, 1999.

   * version 21.2.16 released June 11, 1999.

   * version 21.2.17 released June 22, 1999.

   * version 21.2.18 released July 14, 1999.

   * version 21.2.19 released July 30, 1999.

   * version 21.2.20 released November 10, 1999.

   * version 21.2.21 released November 28, 1999.

   * version 21.2.22 released November 29, 1999.

   * version 21.2.23 released December 7, 1999.

   * version 21.2.24 released December 14, 1999.

   * version 21.2.25 released December 24, 1999.

   * version 21.2.26 released December 31, 1999.

   * version 21.2.27 released January 18, 2000.

   * version 21.2.28 released February 7, 2000.

   * version 21.2.29 released February 16, 2000.

   * version 21.2.30 released February 21, 2000.

   * version 21.2.31 released February 23, 2000.

   * version 21.2.32 released March 20, 2000.

   * version 21.2.33 released May 1, 2000.

   * version 21.2.34 released May 28, 2000.

   * version 21.2.35 released July 19, 2000.

   * version 21.2.36 released October 4, 2000.

   * version 21.2.37 released November 14, 2000.

   * version 21.2.38 released December 5, 2000.

   * version 21.2.39 released December 31, 2000.

   * version 21.2.40 released January 8, 2001.

   * version 21.2.41 released January 17, 2001.

   * version 21.2.42 released January 20, 2001.

   * version 21.2.43 released January 26, 2001.

   * version 21.2.44 released February 8, 2001.

   * version 21.2.45 released February 23, 2001.

   * version 21.2.46 released March 21, 2001.


File: internals.info,  Node: XEmacs From the Outside,  Next: The Lisp Language,  Prev: A History of Emacs,  Up: Top

2 XEmacs From the Outside
*************************

XEmacs appears to the outside world as an editor, but it is really a
Lisp environment.  At its heart is a Lisp interpreter; it also
"happens" to contain many specialized object types (e.g. buffers,
windows, frames, events) that are useful for implementing an editor.
Some of these objects (in particular windows and frames) have
displayable representations, and XEmacs provides a function
`redisplay()' that ensures that the display of all such objects matches
their internal state.  Most of the time, a standard Lisp environment is
in a "read-eval-print" loop--i.e. "read some Lisp code, execute it, and
print the results".  XEmacs has a similar loop:

   * read an event

   * dispatch the event (i.e. "do it")

   * redisplay

   Reading an event is done using the Lisp function `next-event', which
waits for something to happen (typically, the user presses a key or
moves the mouse) and returns an event object describing this.
Dispatching an event is done using the Lisp function `dispatch-event',
which looks up the event in a keymap object (a particular kind of
object that associates an event with a Lisp function) and calls that
function.  The function "does" what the user has requested by changing
the state of particular frame objects, buffer objects, etc.  Finally,
`redisplay()' is called, which updates the display to reflect the
changes just made.  Thus is an "editor" born.
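
   The read/dispatch/redisplay cycle described above can be sketched as
a small, self-contained C model.  This is only an illustration under
stated assumptions, not the XEmacs implementation: every name in it
(`toy_editor', `toy_next_event', `toy_dispatch_event', `toy_redisplay',
and so on) is hypothetical, events come from a canned string rather
than from a window system, and the "keymap" is a single lookup function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy model of the loop: read an event, dispatch it, redisplay.
   All names are hypothetical; none is taken from the XEmacs sources. */

typedef struct { char key; } toy_event;

typedef struct {
  char text[64];        /* stands in for buffer contents */
  size_t len;
  int redisplay_count;  /* how many times the display was refreshed */
} toy_editor;

typedef void (*toy_command)(toy_editor *ed, char key);

/* Command: append the typed character to the text. */
static void cmd_self_insert(toy_editor *ed, char key) {
  if (ed->len + 1 < sizeof ed->text) {
    ed->text[ed->len++] = key;
    ed->text[ed->len] = '\0';
  }
}

/* Command: delete the last character, if there is one. */
static void cmd_delete_backward(toy_editor *ed, char key) {
  (void) key;
  if (ed->len > 0)
    ed->text[--ed->len] = '\0';
}

/* Stand-in for a keymap: associates an event with a command. */
static toy_command toy_keymap_lookup(char key) {
  return key == '\b' ? cmd_delete_backward : cmd_self_insert;
}

/* Stand-in for reading an event: take the next key from a canned
   input string.  Returns 0 once the input is exhausted. */
static int toy_next_event(const char **input, toy_event *ev) {
  if (**input == '\0')
    return 0;
  ev->key = *(*input)++;
  return 1;
}

/* Stand-in for dispatching: look the event up and call the command. */
static void toy_dispatch_event(toy_editor *ed, const toy_event *ev) {
  toy_keymap_lookup(ev->key)(ed, ev->key);
}

/* Stand-in for redisplay: make the "display" match internal state. */
static void toy_redisplay(toy_editor *ed) {
  ed->redisplay_count++;
  printf("[%s]\n", ed->text);
}

/* The loop itself: read, dispatch, redisplay, until input runs out. */
static void toy_command_loop(toy_editor *ed, const char *input) {
  toy_event ev;
  assert(ed != NULL && input != NULL);
  while (toy_next_event(&input, &ev)) {
    toy_dispatch_event(ed, &ev);
    toy_redisplay(ed);
  }
}
```

   Starting from a zero-initialized `toy_editor' and calling
`toy_command_loop (&ed, "ab\bc")' leaves the text as `ac' after four
redisplays: two insertions, one deletion, and one more insertion.  The
commands only change internal state; the display is brought up to date
in a separate step afterwards, which is the division of labor the text
describes.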

   Note that you do not have to use XEmacs as an editor; you could just
as well make it do your taxes, compute pi, play bridge, etc.  You'd just
have to write functions to do those operations in Lisp.


File: internals.info,  Node: The Lisp Language,  Next: XEmacs From the Perspective of Building,  Prev: XEmacs From the Outside,  Up: Top

3 The Lisp Language
*******************

Lisp is a general-purpose language that is higher-level than C and in
many ways more powerful than C.  Powerful dialects of Lisp such as
Common Lisp are probably much better languages for writing very large
applications than is C. (Unfortunately, for many non-technical reasons
C and its successor C++ have become the dominant languages for
application development.  These languages are both inadequate for
extremely large applications, which is evidenced by the fact that newer,
larger programs are becoming ever harder to write and are requiring ever
more programmers despite great improvements in C development
environments; and by the fact that, although hardware speeds and
reliability have been growing at an exponential rate, most software is
still generally considered to be slow and buggy.)

   The new Java language holds promise as a better general-purpose
development language than C.  Java has many features in common with
Lisp that are not shared by C (this is not a coincidence, since Java
was designed by James Gosling, a former Lisp hacker).  This will be
discussed more later.

   For those used to C, here is a summary of the basic differences
between C and Lisp:

  1. Lisp has an extremely regular syntax.  Every function, expression,
     and control statement is written in the form

             (FUNC ARG1 ARG2 ...)

     This is as opposed to C, which writes functions as

             func(ARG1, ARG2, ...)

     but writes expressions involving operators as (e.g.)

             ARG1 + ARG2

     and writes control statements as (e.g.)

             while (EXPR) { STATEMENT1; STATEMENT2; ... }

     Lisp equivalents of the latter two would be

             (+ ARG1 ARG2 ...)

     and

             (while EXPR STATEMENT1 STATEMENT2 ...)

  2. Lisp is a safe language.  Assuming there are no bugs in the Lisp
     interpreter/compiler, it is impossible to write a program that
     "core dumps" or otherwise causes the machine to execute an illegal
     instruction.  This is very different from C, where perhaps the most
     common outcome of a bug is exactly such a crash.  A corollary of
     this is that the C operation of casting a pointer is impossible
     (and unnecessary) in Lisp, and that it is impossible to access
     memory outside the bounds of an array.

  3. Programs and data are written in the same form.  The
     parenthesis-enclosing form described above for statements is the
     same form used for the most common data type in Lisp, the list.
     Thus, it is possible to represent any Lisp program using Lisp data
     types, and for one program to construct Lisp statements and then
     dynamically "evaluate" them, or cause them to execute.

  4. All objects are "dynamically typed".  This means that part of every
     object is an indication of what type it is.  A Lisp program can
     manipulate an object without knowing what type it is, and can
     query an object to determine its type.  This means that,
     correspondingly, variables and function parameters can hold
     objects of any type and are not normally declared as being of any
     particular type.  This is opposed to the "static typing" of C,
     where variables can hold exactly one type of object and must be
     declared as such, and objects do not contain an indication of
     their type because it's implicit in the variables they are stored
     in.  It is possible in C to have a variable hold different types
     of objects (e.g. through the use of `void *' pointers or
     variable-argument functions), but the type information must then be
     passed explicitly in some other fashion, leading to additional
     program complexity.

  5. Allocated memory is automatically reclaimed when it is no longer
     in use.  This operation is called "garbage collection" and
     involves looking through all variables to see what memory is being
     pointed to, and reclaiming any memory that is not pointed to and
     is thus "inaccessible" and out of use.  This is as opposed to C,
     in which allocated memory must be explicitly reclaimed using
     `free()'.  If you simply drop all pointers to memory without
     freeing it, it becomes "leaked" memory that still takes up space.
     Over a long period of time, this can cause your program to grow
     and grow until it runs out of memory.

  6. Lisp has built-in facilities for handling errors and exceptions.
     In C, when an error occurs, usually either the program exits
     entirely or the routine in which the error occurs returns a value
     indicating this.  If an error occurs in a deeply-nested routine,
     then every active routine on the call stack must return normally
     and pass an error value back up to its caller.  This means
     that every routine must explicitly check for an error in all the
     routines it calls; if it does not do so, unexpected and often
     random behavior results.  This is an extremely common source of
     bugs in C programs.  An alternative would be to do a non-local
     exit using `longjmp()', but that is often very dangerous because
     the routines that were exited past had no opportunity to clean up
     after themselves and may leave things in an inconsistent state,
     causing a crash shortly afterwards.

     Lisp provides mechanisms to make such non-local exits safe.  When
     an error occurs, a routine simply signals that an error of a
     particular class has occurred, and a non-local exit takes place.
     Any routine can trap errors occurring in routines it calls by
     registering an error handler for some or all classes of errors.
     (If no handler is registered, a default handler, generally
     installed by the top-level event loop, is executed; this prints
     out the error and continues.) Routines can also specify cleanup
     code (called an "unwind-protect") that will be called when control
     exits from a block of code, no matter how that exit occurs--i.e.
     even if a function deeply nested below it causes a non-local exit
     back to the top level.

     Note that a comparable facility has appeared in some recent
     vintages of C, in particular the structured exception handling
     of Visual C++ and other PC compilers written for the Microsoft
     Win32 API.

  7. In Emacs Lisp, local variables are "dynamically scoped".  This
     means that if you declare a local variable in a particular
     function, and then call another function, that subfunction can
     "see" the local variable you declared.  This is actually
     considered a bug in Emacs Lisp and in all other early dialects of
     Lisp, and was corrected in Common Lisp. (In Common Lisp, you can
     still declare dynamically scoped variables if you want to--they
     are sometimes useful--but variables by default are "lexically
     scoped" as in C.)
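
   The error-handling facilities described in point 6 above can be
tried out with the standard forms `condition-case' and
`unwind-protect'.  The following is only a minimal sketch; the names
`my-risky-step', `my-safe-call' and `my-cleanup-log' are hypothetical
and exist purely for illustration.

```elisp
;; Hypothetical names, for illustration only.
(defvar my-cleanup-log nil
  "List of arguments for which the cleanup form has run.")

(defun my-risky-step (n)
  "Return 100 divided by N; signals `arith-error' when N is zero."
  (/ 100 n))

(defun my-safe-call (n)
  "Call `my-risky-step' on N, trapping arithmetic errors.
The cleanup form runs no matter how control leaves the body."
  (condition-case err
      (unwind-protect
          (my-risky-step n)
        ;; Cleanup: runs on normal return and on non-local exit.
        (setq my-cleanup-log (cons n my-cleanup-log)))
    (arith-error
     ;; ERR holds the error data; its car is the error symbol.
     (list 'trapped (car err)))))

(my-safe-call 4)   ; returns 25
(my-safe-call 0)   ; returns (trapped arith-error)
```

   After both calls, `my-cleanup-log' holds `(0 4)': the cleanup form
ran in the failing case as well, even though `my-risky-step' never
returned normally.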

   For those familiar with Lisp, Emacs Lisp is modelled after MacLisp,
an early dialect of Lisp developed at MIT (no relation to the Macintosh
computer).  There is a Common Lisp compatibility package available for
Emacs that provides many of the features of Common Lisp.
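
   The dynamic scoping described in point 7 above is easy to observe.
The sketch below assumes the default dynamic binding of XEmacs Lisp;
the function and variable names are hypothetical.  In a dialect (or
mode) that uses lexical binding instead, the call would signal a
`void-variable' error.

```elisp
;; Hypothetical names, for illustration only.
(defun my-callee ()
  "Return the value of `my-local', which this function never binds."
  my-local)

(defun my-caller ()
  "Bind `my-local' locally, then call `my-callee'."
  (let ((my-local 42))
    (my-callee)))

(my-caller)   ; returns 42 under dynamic scoping
```

   `my-callee' sees the binding made by its caller only for as long as
the `let' form in `my-caller' is active.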

   The Java language is derived in many ways from C, and shares a
similar syntax, but has the following features in common with Lisp (and
different from C):

  1. Java is a safe language, like Lisp.

  2. Java provides garbage collection, like Lisp.

  3. Java has built-in facilities for handling errors and exceptions,
     like Lisp.

  4. Java has a type system that combines the best advantages of both
     static and dynamic typing.  Objects (except very simple types) are
     explicitly marked with their type, as in dynamic typing; but there
     is a hierarchy of types and functions are declared to accept only
     certain types, thus providing the increased compile-time
     error-checking of static typing.

   The Java language also has some negative attributes:

  1. Java uses the edit/compile/run model of software development.  This
     makes it hard to use interactively.  For example, to use Java like
     `bc' it is necessary to write a special purpose, albeit tiny,
     application.  In Emacs Lisp, a calculator comes built-in without
     any effort - one can always just type an expression in the
     `*scratch*' buffer.

  2. Java tries too hard to enforce, not merely enable, portability,
     making ordinary access to standard OS facilities painful.  Java
     has an "agenda".  I think this is why `chdir' is not part of
     standard Java, which is inexcusable.

   Unfortunately, there is no perfect language.  Static typing allows a
compiler to catch programmer errors and produce more efficient code, but
makes programming more tedious and less fun.  For the foreseeable
future, an Ideal Editing and Programming Environment (and that is what
XEmacs aspires to) will be programmable in multiple languages: high
level ones like Lisp for user customization and prototyping, and lower
level ones for infrastructure and industrial strength applications.  If
I had my way, XEmacs would be friendly towards the Python, Scheme, C++,
ML, and other communities.  But there are serious technical
difficulties in achieving that goal.

   The word "application" in the previous paragraph was used
intentionally.  XEmacs implements an API for programs written in Lisp
that makes it a full-fledged application platform, very much like an OS
inside the real OS.


File: internals.info,  Node: XEmacs From the Perspective of Building,  Next: XEmacs From the Inside,  Prev: The Lisp Language,  Up: Top

4 XEmacs From the Perspective of Building
*****************************************

The heart of XEmacs is the Lisp environment, which is written in C.
This is contained in the `src/' subdirectory.  Underneath `src/' are
two subdirectories of header files: `s/' (header files for particular
operating systems) and `m/' (header files for particular machine
types).  In practice the distinction between the two types of header
files is blurred.  These header files define or undefine certain
preprocessor constants and macros to indicate particular
characteristics of the associated machine or operating system.  As part
of the configure process, one `s/' file and one `m/' file are identified
for the particular environment in which XEmacs is being built.

   XEmacs also contains a great deal of Lisp code.  This implements the
operations that make XEmacs useful as an editor as well as just a Lisp
environment, and also contains many add-on packages that allow XEmacs to
browse directories, act as a mail and Usenet news reader, compile Lisp
code, etc.  There is actually more Lisp code than C code associated with
XEmacs, but much of the Lisp code is peripheral to the actual operation
of the editor.  The Lisp code all lies in subdirectories underneath the
`lisp/' directory.

   The `lwlib/' directory contains C code that implements a generalized
interface onto different X widget toolkits and also implements some
widgets of its own that behave like Motif widgets but are faster, free,
and in some cases more powerful.  The code in this directory compiles
into a library and is mostly independent from XEmacs.

   The `etc/' directory contains various data files associated with
XEmacs.  Some of them are actually read by XEmacs at startup; others
merely contain useful information of various sorts.

   The `lib-src/' directory contains C code for various auxiliary
programs that are used in connection with XEmacs.  Some of them are used
during the build process; others are used to perform certain functions
that cannot conveniently be placed in the XEmacs executable (e.g. the
`movemail' program for fetching mail out of `/var/spool/mail', which
must be setgid to `mail' on many systems; and the `gnuclient' program,
which allows an external script to communicate with a running XEmacs
process).

   The `man/' directory contains the sources for the XEmacs
documentation.  It is mostly in a form called Texinfo, which can be
converted into either a printed document (by passing it through TeX) or
into on-line documentation called "info files".

   The `info/' directory contains the results of formatting the XEmacs
documentation as "info files", for on-line use.  These files are used
when you enter the Info system using `C-h i' or through the Help menu.

   The `dynodump/' directory contains auxiliary code used to build
XEmacs on Solaris platforms.

   The other directories contain various miscellaneous code and
information that is not normally used or needed.

   The first step of building involves running the `configure' program
and passing it various parameters to specify any optional features you
want and compiler arguments and such, as described in the `INSTALL'
file.  This determines what the build environment is, chooses the
appropriate `s/' and `m/' file, and runs a series of tests to determine
many details about your environment, such as which library functions
are available and exactly how they work.  The reason for running these
tests is that it allows XEmacs to be compiled on a much wider variety
of platforms than those that the XEmacs developers happen to be
familiar with, including various sorts of hybrid platforms.  This is
especially important now that many operating systems give you a great
deal of control over exactly what features you want installed, and allow
for easy upgrading of parts of a system without upgrading the rest.  It
would be impossible to pre-determine and pre-specify the information for
all possible configurations.

   In fact, the `s/' and `m/' files are basically _evil_, since they
contain unmaintainable platform-specific hard-coded information.
XEmacs has been moving in the direction of having all system-specific
information be determined dynamically by `configure'.  Perhaps someday
we can `rm -rf src/s src/m'.

   When `configure' has finished running, it generates `Makefile's and
`GNUmakefile's and the file `src/config.h' (which describes the
features of your system) from template files.  You then run `make',
which compiles the auxiliary code and programs in `lib-src/' and
`lwlib/' and the main XEmacs executable in `src/'.  The result of
compiling and linking is an executable called `temacs', which is _not_
the final XEmacs executable.  `temacs' by itself is not intended to
function as an editor or even display any windows on the screen, and if
you simply run it, it will exit immediately.  The `Makefile' runs
`temacs' with certain options that cause it to initialize itself, read
in a number of basic Lisp files, and then dump itself out into a new
executable called `xemacs'.  This new executable has been
pre-initialized and contains pre-digested Lisp code that is necessary
for the editor to function (this includes most basic editing functions,
e.g. `kill-line', that can be defined in terms of other Lisp
primitives; some initialization code that is called when certain
objects, such as frames, are created; and all of the standard
keybindings and code for the actions they result in).  This executable,
`xemacs', is the executable that you run to use the XEmacs editor.

   Although `temacs' is not intended to be run as an editor, it can, by
using the incantation `temacs -batch -l loadup.el run-temacs'.  This is
useful when the dumping procedure described above is broken, or when
using certain program debugging tools such as Purify.  These tools get
mighty confused by the tricks played by the XEmacs build process, such
as allocating memory in one process and freeing it in the next.


File: internals.info,  Node: XEmacs From the Inside,  Next: The XEmacs Object System (Abstractly Speaking),  Prev: XEmacs From the Perspective of Building,  Up: Top

5 XEmacs From the Inside
************************

Internally, XEmacs is quite complex, and can be very confusing.  To
simplify things, it can be useful to think of XEmacs as containing an
event loop that "drives" everything, and a number of other subsystems,
such as a Lisp engine and a redisplay mechanism.  Each of these other
subsystems exists simultaneously in XEmacs, and each has a certain
state.  The flow of control continually passes in and out of these
different subsystems in the course of normal operation of the editor.

   It is important to keep in mind that, most of the time, the editor is
"driven" by the event loop.  Except during initialization and batch
mode, all subsystems are entered directly or indirectly through the
event loop, and ultimately, control exits out of all subsystems back up
to the event loop.  This cycle of entering a subsystem, exiting back out
to the event loop, and starting another iteration of the event loop
occurs once for each keystroke, mouse motion, etc.

   If you're trying to understand a particular subsystem (other than the
event loop), think of it as a "daemon" process or "servant" that is
responsible for one particular aspect of a larger system, and
periodically receives commands or environment changes that cause it to
do something.  Ultimately, these commands and environment changes are
always triggered by the event loop.  For example:

   * The window and frame mechanism is responsible for keeping track of
     what windows and frames exist, what buffers are in them, etc.  It
     is periodically given commands (usually from the user) to make a
     change to the current window/frame state: e.g. create a new frame,
     delete a window, etc.

   * The buffer mechanism is responsible for keeping track of what
     buffers exist and what text is in them.  It is periodically given
     commands (usually from the user) to insert or delete text, create
     a buffer, etc.  When it receives a text-change command, it
     notifies the redisplay mechanism.

   * The redisplay mechanism is responsible for making sure that
     windows and frames are displayed correctly.  It is periodically
     told (by the event loop) to actually "do its job", i.e. snoop
     around and see what the current state of the environment (mostly
     of the currently-existing windows, frames, and buffers) is, and
     make sure that state matches what's actually displayed.  It keeps
     lots and lots of information around (such as what is actually
     being displayed currently, and what the environment was last time
     it checked) so that it can minimize the work it has to do.  It is
     also helped along in that whenever a relevant change to the
     environment occurs, the redisplay mechanism is told about this, so
     it has a pretty good idea of where it has to look to find possible
     changes and doesn't have to look everywhere.

   * The Lisp engine is responsible for executing the Lisp code in
     which most user commands are written.  It is entered through a
     call to `eval' or `funcall', which occurs as a result of
     dispatching an event from the event loop.  The functions it calls
     issue commands to the buffer mechanism, the window/frame
     subsystem, etc.

   * The Lisp allocation subsystem is responsible for keeping track of
     Lisp objects.  It is given commands from the Lisp engine to
     allocate objects, garbage collect, etc.

   etc.

   The important idea here is that there are a number of independent
subsystems each with its own responsibility and persistent state, just
like different employees in a company, and each subsystem is
periodically given commands from other subsystems.  Commands can flow
from any one subsystem to any other, but there is usually some sort of
hierarchy, with all commands originating from the event subsystem.
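
   The way the event loop "drives" the other subsystems can be
sketched in Lisp.  This is an illustration only, not the actual
command loop, which is written in C and does considerably more work.
It assumes the XEmacs primitives `next-event' and `dispatch-event' as
documented in the XEmacs Lisp Reference; `my-pump-events' is a
hypothetical name.

```elisp
;; A sketch only; `my-pump-events' is a hypothetical name.
(defun my-pump-events (count)
  "Read and dispatch COUNT events, one per iteration."
  (while (> count 0)
    ;; Block until the next keystroke, mouse motion, etc. arrives.
    (let ((event (next-event)))
      ;; Hand the event on; this is where control enters the Lisp
      ;; engine, the buffer mechanism, and the other subsystems.
      (dispatch-event event))
    (setq count (1- count))))
```

   Each iteration corresponds to one trip through the cycle described
above: control enters a subsystem as a result of dispatching the
event, and comes back out to the loop when the command is done.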

   XEmacs is entered in `main()', which is in `emacs.c'.  When this is
called the first time (in a properly-invoked `temacs'), it does the
following:

  1. It does some very basic environment initializations, such as
     determining where it and its directories (e.g. `lisp/' and `etc/')
     reside and setting up signal handlers.

  2. It initializes the entire Lisp interpreter.

  3. It sets the initial values of many built-in variables (including
     many variables that are visible to Lisp programs), such as the
     global keymap object and the built-in faces (a face is an object
     that describes the display characteristics of text).  This
     involves creating Lisp objects and thus is dependent on step (2).

  4. It performs various other initializations that are relevant to the
     particular environment it is running in, such as retrieving
     environment variables, determining the current date and the user
     who is running the program, examining its standard input, creating
     any necessary file descriptors, etc.

  5. At this point, the C initialization is complete.  A Lisp program
     that was specified on the command line (usually `loadup.el') is
     called (temacs is normally invoked as `temacs -batch -l loadup.el
     dump').  `loadup.el' loads all of the other Lisp files that are
     needed for the operation of the editor, calls the `dump-emacs'
     function to write out `xemacs', and then kills the temacs process.

   When `xemacs' is then run, it only redoes steps (1) and (4) above;
all variables already contain the values they were set to when the
executable was dumped, and all memory that was allocated with
`malloc()' is still around. (XEmacs knows whether it is being run as
`xemacs' or `temacs' because it sets the global variable `initialized'
to 1 after step (4) above.) At this point, `xemacs' calls a Lisp
function to do any further initialization, which includes parsing the
command-line (the C code can only do limited command-line parsing,
which includes looking for the `-batch' and `-l' flags and a few other
flags that it needs to know about before initialization is complete),
creating the first frame (or "window" in standard window-system
parlance), running the user's init file (usually the file `.emacs' in
the user's home directory), etc.  The function to do this is usually
called `normal-top-level'; `loadup.el' tells the C code about this
function by setting its name as the value of the Lisp variable
`top-level'.

   When the Lisp initialization code is done, the C code enters the
event loop, and stays there for the duration of the XEmacs process.
The code for the event loop is contained in `cmdloop.c', and is called
`Fcommand_loop_1()'.  Note that this event loop could very well be
written in Lisp, and in fact a Lisp version exists; but apparently,
doing this makes XEmacs run noticeably slower.

   Notice how much of the initialization is done in Lisp, not in C.  In
general, XEmacs tries to move as much code as is possible into Lisp.
Code that remains in C is code that implements the Lisp interpreter
itself, or code that needs to be very fast, or code that needs to do
system calls or other such stuff that needs to be done in C, or code
that needs to have access to "forbidden" structures. (One conscious
aspect of the design of Lisp under XEmacs is a clean separation between
the external interface to a Lisp object's functionality and its internal
implementation.  Part of this design is that Lisp programs are
forbidden from accessing the contents of the object other than through
using a standard API.  In this respect, XEmacs Lisp is similar to
modern Lisp dialects but differs from GNU Emacs, which tends to expose
the implementation and allow Lisp programs to look at it directly.  The
major advantage of hiding the implementation is that it allows the
implementation to be redesigned without affecting any Lisp programs,
including those that might want to be "clever" by looking directly at
the object's contents and possibly manipulating them.)

   Moving code into Lisp makes the code easier to debug and maintain and
makes it much easier for people who are not XEmacs developers to
customize XEmacs, because they can make a change with much less chance
of obscure and unwanted interactions occurring than if they were to
change the C code.


File: internals.info,  Node: The XEmacs Object System (Abstractly Speaking),  Next: How Lisp Objects Are Represented in C,  Prev: XEmacs From the Inside,  Up: Top

6 The XEmacs Object System (Abstractly Speaking)
************************************************

At the heart of the Lisp interpreter is its management of objects.
XEmacs Lisp contains many built-in objects, some of which are simple
and others of which can be very complex; and some of which are very
common, and others of which are rarely used or are only used
internally. (Since the Lisp allocation system, with its automatic
reclamation of unused storage, is so much more convenient than
`malloc()' and `free()', the C code makes extensive use of it in its
internal operations.)

   The basic Lisp objects are

`integer'
     28 or 31 bits of precision, or 60 or 63 bits on 64-bit machines;
     the reason for this is described below when the internal Lisp
     object representation is described.

`float'
     Same precision as a double in C.

`cons'
     A simple container for two Lisp objects, used to implement lists
     and most other data structures in Lisp.

`char'
     An object representing a single character of text; chars behave
     like integers in many ways but are logically considered text
     rather than numbers and have a different read syntax. (The read
     syntax for a char contains the char itself or some textual
     encoding of it--for example, a Japanese Kanji character might be
     encoded as `^[$(B#&^[(B' using the ISO-2022 encoding
     standard--rather than the numerical representation of the char;
     this way, if the mapping between chars and integers changes, which
     is quite possible for Kanji characters and other extended
     characters, the same character will still be created.  Note that
     some primitives confuse chars and integers.  The worst culprit is
     `eq', which makes a special exception and considers a char to be
     `eq' to its integer equivalent, even though in no other case are
     objects of two different types `eq'.  The reason for this
     monstrosity is compatibility with existing code; the separation of
     char from integer came fairly recently.)

`symbol'
     An object that contains Lisp objects and is referred to by name;
     symbols are used to implement variables and named functions and to
     provide the equivalent of preprocessor constants in C.

`vector'
     A one-dimensional array of Lisp objects providing constant-time
     access to any of the objects; access to an arbitrary object in a
     vector is faster than for lists, but the operations that can be
     done on a vector are more limited.

`string'
     Self-explanatory; behaves much like a vector of chars but has a
     different read syntax and is stored and manipulated more compactly.

`bit-vector'
     A vector of bits; similar to a string in spirit.

`compiled-function'
     An object containing compiled Lisp code, known as "byte code".

`subr'
     A Lisp primitive, i.e. a Lisp-callable function implemented in C.
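
   Because every object carries an indication of its type, each of
the basic types listed above can be recognized at run time with a
predicate.  The following sketch assumes XEmacs Lisp (the `#*' bit
vector syntax and `bit-vector-p' are XEmacs features); each form
should return `t'.

```elisp
(integerp 17297)                 ; an integer
(floatp 1.983e-4)                ; a float
(consp '(foo . bar))             ; a cons
(characterp ?b)                  ; a char
(symbolp 'foobar)                ; a symbol
(vectorp [1 a 2.5])              ; a vector
(stringp "foobar")               ; a string
(bit-vector-p #*01110110)        ; a bit-vector
(subrp (symbol-function 'car))   ; `car' is a primitive written in C
```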

   Note that there is no basic "function" type, as in more powerful
versions of Lisp (where it's called a "closure").  XEmacs Lisp does not
provide the closure semantics implemented by Common Lisp and Scheme.
The guts of a function in XEmacs Lisp are represented in one of four
ways: a symbol specifying another function (when one function is an
alias for another), a list (whose first element must be the symbol
`lambda') containing the function's source code, a compiled-function
object, or a subr object. (In other words, given a symbol specifying
the name of a function, calling `symbol-function' to retrieve the
contents of the symbol's function cell will return one of these types
of objects.)
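
   The representations of a function just listed can be inspected
with `symbol-function'.  This is a sketch that assumes XEmacs Lisp
with the definitions evaluated by the interpreter rather than
byte-compiled; `my-square' and `my-first' are hypothetical names.

```elisp
;; Hypothetical names, for illustration only.
(defun my-square (x)
  "Return X squared."
  (* x x))

;; An interpreted function is a list whose car is `lambda'.
(car (symbol-function 'my-square))     ; lambda

;; A primitive is a subr object.
(subrp (symbol-function 'car))         ; t

;; An alias is simply another symbol stored in the function cell.
(fset 'my-first 'car)
(symbol-function 'my-first)            ; car
(my-first '(1 2 3))                    ; 1

;; After byte compilation, the function cell should instead hold a
;; compiled-function object.
(byte-compile 'my-square)
(compiled-function-p (symbol-function 'my-square))
```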

   XEmacs Lisp also contains numerous specialized objects used to
implement the editor:

`buffer'
     Stores text like a string, but is optimized for insertion and
     deletion and has certain other properties that can be set.

`frame'
     An object with various properties whose displayable representation
     is a "window" in window-system parlance.

`window'
     A section of a frame that displays the contents of a buffer; often
     called a "pane" in window-system parlance.

`window-configuration'
     An object that represents a saved configuration of windows in a
     frame.

`device'
     An object representing a screen on which frames can be displayed;
     equivalent to a "display" in the X Window System and a "TTY" in
     character mode.

`face'
     An object specifying the appearance of text or graphics; it has
     properties such as font, foreground color, and background color.

`marker'
     An object that refers to a particular position in a buffer and
     moves around as text is inserted and deleted to stay in the same
     relative position to the text around it.

`extent'
     Similar to a marker but covers a range of text in a buffer; can
     also specify properties of the text, such as a face in which the
     text is to be displayed, whether the text is invisible or
     unmodifiable, etc.

`event'
     Generated by calling `next-event' and contains information
     describing a particular event happening in the system, such as the
     user pressing a key or a process terminating.

`keymap'
     An object that maps from events (described using lists, vectors,
     and symbols rather than with an event object because the mapping
     is for classes of events, rather than individual events) to
     functions to execute or other events to recursively look up; the
     functions are described by name, using a symbol, or using lists to
     specify the function's code.

`glyph'
     An object that describes the appearance of an image (e.g. a pixmap)
     on the screen; glyphs can be attached to the beginning or end of
     extents and in some future version of XEmacs will be able to be
     inserted directly into a buffer.

`process'
     An object that describes a connection to an externally-running
     process.
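
   The behavior of a marker, as described above, can be seen in a few
lines.  This is a small sketch using a scratch buffer whose name,
` *marker-demo*', is arbitrary.

```elisp
(let ((buf (get-buffer-create " *marker-demo*"))
      (m (make-marker))
      before after)
  (save-excursion
    (set-buffer buf)
    (erase-buffer)
    (insert "hello world")
    (set-marker m 7 buf)              ; just before the "w" of "world"
    (setq before (marker-position m))
    (goto-char (point-min))
    (insert "XX")                     ; two chars inserted ahead of M
    (setq after (marker-position m)))
  (kill-buffer buf)
  (list before after))                ; returns (7 9)
```

   The marker's numeric position changed from 7 to 9, but it still
sits immediately before the same "w": it kept its position relative
to the surrounding text.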

   There are some other, less-commonly-encountered general objects:

`hash-table'
     An object that maps from an arbitrary Lisp object to another
     arbitrary Lisp object, using hashing for fast lookup.

`obarray'
     A limited form of hash-table that maps from strings to symbols;
     obarrays are used to look up a symbol given its name and are not
     actually their own object type but are kludgily represented using
     vectors with hidden fields (this representation derives from GNU
     Emacs).

`specifier'
     A complex object used to specify the value of a display property; a
     default value is given and different values can be specified for
     particular frames, buffers, windows, devices, or classes of device.

`char-table'
     An object that maps from chars or classes of chars to arbitrary
     Lisp objects; internally char tables use a complex nested-vector
     representation that is optimized to the way characters are
     represented as integers.

`range-table'
     An object that maps from ranges of integers to arbitrary Lisp
     objects.
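
   Two of these general objects, hash tables and range tables, can be
exercised as follows.  The sketch assumes the XEmacs functions
`make-hash-table', `puthash', `gethash', `make-range-table',
`put-range-table' and `get-range-table' with the argument order
given in the XEmacs Lisp Reference.

```elisp
;; A hash table maps arbitrary objects to arbitrary objects.
(let ((table (make-hash-table :test 'equal)))
  (puthash "apple" 'fruit table)
  (puthash '(1 2) 'a-list-key table)
  (list (gethash "apple" table)
        (gethash '(1 2) table)
        (gethash "pear" table 'missing)))   ; (fruit a-list-key missing)

;; A range table maps ranges of integers to arbitrary objects.
(let ((ranges (make-range-table)))
  (put-range-table 10 20 'teens ranges)
  (list (get-range-table 15 ranges)
        (get-range-table 50 ranges 'none))) ; (teens none)
```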

   And some strange special-purpose objects:

`charset'
`coding-system'
     Objects used when MULE, or multi-lingual/Asian-language, support is
     enabled.

`color-instance'
`font-instance'
`image-instance'
     Objects that encapsulate a window-system resource; instances are
     mostly used internally but are exposed on the Lisp level for
     cleanness of the specifier model and because it's occasionally
     useful for Lisp programs to create or query the properties of
     instances.

`subwindow'
     An object that encapsulates a "subwindow" resource, i.e. a
     window-system child window that is drawn into by an external
     process; this object should be integrated into the glyph system
     but isn't yet, and may change form when this is done.

`tooltalk-message'
`tooltalk-pattern'
     Objects that represent resources used in the ToolTalk interprocess
     communication protocol.

`toolbar-button'
     An object used in conjunction with the toolbar.

   And objects that are only used internally:

`opaque'
     A generic object for encapsulating arbitrary memory; this allows
     you the generality of `malloc()' and the convenience of the Lisp
     object system.

`lstream'
     A buffering I/O stream, used to provide a unified interface to
     anything that can accept output or provide input, such as a file
     descriptor, a stdio stream, a chunk of memory, a Lisp buffer, a
     Lisp string, etc.; it's a Lisp object to make its memory
     management more convenient.

`char-table-entry'
     Subsidiary objects in the internal char-table representation.

`extent-auxiliary'
`menubar-data'
`toolbar-data'
     Various special-purpose objects that are basically just used to
     encapsulate memory for particular subsystems, similar to the more
     general "opaque" object.

`symbol-value-forward'
`symbol-value-buffer-local'
`symbol-value-varalias'
`symbol-value-lisp-magic'
     Special internal-only objects that are placed in the value cell of
     a symbol to indicate that there is something special with this
     variable - e.g. it has no value, it mirrors another variable, or
     it mirrors some C variable; there is really only one kind of
     object, called a "symbol-value-magic", but it is sort-of halfway
     kludged into semi-different object types.

   Some types of objects are "permanent", meaning that once created,
they do not disappear until explicitly destroyed, using a function such
as `kill-buffer', `delete-window', `delete-frame', etc.  Others will
disappear once they are no longer used, through the garbage collection
mechanism.  Buffers, frames, windows, devices, and processes are among
the objects that are permanent.  Note that some objects can go both
ways: Faces can be created either way; extents are normally permanent,
but detached extents (extents not referring to any text, as happens to
some extents when the text they are referring to is deleted) are
temporary.  Note that some permanent objects, such as faces and coding
systems, cannot be deleted.  Note also that windows are unique in that
they can be _undeleted_ after having previously been deleted. (This
happens as a result of restoring a window configuration.)
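
   The lifetime of a permanent object can be checked from Lisp.  The
sketch below uses a buffer, which is destroyed explicitly with
`kill-buffer'; the buffer name is arbitrary.

```elisp
(let ((buf (get-buffer-create " *lifetime-demo*")))
  (list (if (buffer-live-p buf) 'live 'dead)   ; live: it exists
        (progn
          (kill-buffer buf)                    ; explicit destruction
          (if (buffer-live-p buf) 'live 'dead)) ; dead
        (bufferp buf)))                        ; t: still a buffer object
;; returns (live dead t)
```

   The Lisp object itself remains valid for as long as something
refers to it; it is merely marked as no longer live, and is reclaimed
by the garbage collector like any other object once unreferenced.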

   Note that many types of objects have a "read syntax", i.e. a way of
specifying an object of that type in Lisp code.  When you load a Lisp
file, or type in code to be evaluated, what really happens is that the
function `read' is called, which reads some text and creates an object
based on the syntax of that text; then `eval' is called, which possibly
does something special; then this loop repeats until there's no more
text to read. (`eval' only actually does something special with
symbols, which causes the symbol's value to be returned, similar to
referencing a variable; and with conses [i.e. lists], which cause a
function invocation.  All other values are returned unchanged.)
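
   The two steps of this loop can be performed by hand.  In the sketch
below, `read-from-string' stands in for `read' (it returns a cons of
the object read and the index where reading stopped), and the
resulting objects are then passed to `eval'.

```elisp
(let* ((text "(+ 1 2)")
       ;; The `read' step: text in, Lisp object out.
       (form (car (read-from-string text))))
  (list form                    ; the list (+ 1 2), ordinary Lisp data
        (eval form)             ; a cons: function invocation, giving 3
        (eval 17297)            ; an integer is returned unchanged
        (eval "foobar")         ; so is a string
        (eval (list '* 6 7))))  ; a program built at run time
;; returns ((+ 1 2) 3 17297 "foobar" 42)
```

   The last element shows one program constructing a Lisp form out of
ordinary data and then evaluating it.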

   The read syntax

     17297

   converts to an integer whose value is 17297.

     1.983e-4

   converts to a float whose value is 1.983e-4, or .0001983.

     ?b

   converts to a char that represents the lowercase letter b.

     ?^[$(B#&^[(B

   (where `^[' actually is an `ESC' character) converts to a particular
Kanji character when using an ISO-2022-based coding system for input.
(To decode this goo: `ESC' begins an escape sequence; `ESC $ (' is a
class of escape sequences meaning "switch to a 94x94 character set";
`ESC $ ( B' means "switch to Japanese Kanji"; `#' and `&' collectively
index into a 94-by-94 array of characters [subtract 33 from the ASCII
value of each character to get the corresponding index]; `ESC (' is a
class of escape sequences meaning "switch to a 94 character set"; `ESC
(B' means "switch to US ASCII".  It is a coincidence that the letter
`B' is used to denote both Japanese Kanji and US ASCII.  If the first
`B' were replaced with an `A', you'd be requesting a Chinese Hanzi
character from the GB2312 character set.)

     "foobar"

   converts to a string.

     foobar

   converts to a symbol whose name is `"foobar"'.  This is done by
looking up the string equivalent in the global variable `obarray',
whose contents should be an obarray.  If no symbol is found, a new
symbol with the name `"foobar"' is automatically created and added to
`obarray'; this process is called "interning" the symbol.  

     (foo . bar)

   converts to a cons cell containing the symbols `foo' and `bar'.

     (1 a 2.5)

   converts to a three-element list containing the specified objects
(note that a list is actually a set of nested conses; see the XEmacs
Lisp Reference).

     [1 a 2.5]

   converts to a three-element vector containing the specified objects.

     #[... ... ... ...]

   converts to a compiled-function object (the actual contents are not
shown since they are not relevant here; look at a file that ends with
`.elc' for examples).

     #*01110110

   converts to a bit-vector.

     #s(hash-table ... ...)

   converts to a hash table (the actual contents are not shown).

     #s(range-table ... ...)

   converts to a range table (the actual contents are not shown).

     #s(char-table ... ...)

   converts to a char table (the actual contents are not shown).

   Note that the `#s()' syntax is the general syntax for structures,
which are not really implemented in XEmacs Lisp but should be.
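
   The index arithmetic mentioned in the Kanji example above can be
checked with a few lines of C.  This is only an illustrative sketch;
the function name `iso2022_index' is hypothetical and does not come
from the XEmacs sources.

```c
#include <assert.h>

/* Hypothetical helper, not part of XEmacs: convert one byte of a
   94x94 character-set code into a zero-based index by subtracting
   33, the ASCII code of `!', the first of the 94 graphic
   characters.  */
int
iso2022_index (unsigned char byte)
{
  assert (byte >= 33 && byte <= 126);
  return byte - 33;
}
```

   For the bytes `#' (ASCII 35) and `&' (ASCII 38) used above, this
gives the indices 2 and 5.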

   When an object is printed out (using `print' or a related function),
the read syntax is used, so that the same object can be read in again.

   The other objects do not have read syntaxes, usually because it does
not really make sense to create them in this fashion (e.g. processes,
where it doesn't make sense to have a subprocess created as a side
effect of reading some Lisp code), or because they can't be created at
all (e.g. subrs).  Permanent objects, as a rule, do not have a read
syntax; nor do most complex objects, which contain too much state to be
easily initialized through a read syntax.
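
   The interning step described above for symbols can be sketched as
follows.  This is a toy model under stated assumptions: the names
`toy_symbol', `toy_obarray' and `toy_intern' are hypothetical, and a
linked list stands in for whatever data structure the real obarray
uses.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy symbol record; hypothetical, not the XEmacs symbol structure.  */
struct toy_symbol
{
  char *name;
  struct toy_symbol *next;
};

/* Toy symbol table: a linked list of every symbol created so far.  */
static struct toy_symbol *toy_obarray;

/* Return the unique symbol named NAME, creating and registering it
   if it does not exist yet.  Returns NULL if memory runs out.  */
struct toy_symbol *
toy_intern (const char *name)
{
  struct toy_symbol *sym;

  assert (name != NULL);
  for (sym = toy_obarray; sym != NULL; sym = sym->next)
    if (strcmp (sym->name, name) == 0)
      return sym;                 /* already interned */

  sym = malloc (sizeof *sym);
  if (sym == NULL)
    return NULL;
  sym->name = malloc (strlen (name) + 1);
  if (sym->name == NULL)
    {
      free (sym);
      return NULL;
    }
  strcpy (sym->name, name);
  sym->next = toy_obarray;        /* add the new symbol to the table */
  toy_obarray = sym;
  return sym;
}
```

   The property that matters is that reading the same name twice
yields the same object, so symbols can be compared by identity.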


File: internals.info,  Node: How Lisp Objects Are Represented in C,  Next: Rules When Writing New C Code,  Prev: The XEmacs Object System (Abstractly Speaking),  Up: Top

7 How Lisp Objects Are Represented in C
***************************************

Lisp objects are represented in C using a 32-bit or 64-bit machine word
(depending on the processor; e.g. DEC Alphas use 64-bit Lisp objects and
most other processors use 32-bit Lisp objects).  The representation
stuffs a pointer together with a tag, as follows:

      [ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ]
      [ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ]

        <---------------------------------------------------------> <->
                 a pointer to a structure, or an integer            tag

   A tag of 00 is used for all pointer object types, a tag of 10 is used
for characters, and the other two tags 01 and 11 are joined together to
form the integer object type.  This representation gives us 31-bit
integers and 30-bit characters, while pointers are represented directly
without any bit masking or shifting.  This representation, though,
assumes that pointers to structs are always aligned to multiples of 4,
so the lower 2 bits are always zero.
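
   The tag tests implied by this layout can be sketched in C as
follows.  All names beginning with `toy_' or `TOY_' are hypothetical;
this is a minimal model of the scheme just described, not the actual
XEmacs definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for Lisp_Object: one pointer-sized machine word.  */
typedef intptr_t toy_object;

enum { TOY_TAG_POINTER = 0, TOY_TAG_CHAR = 2 };  /* binary 00 and 10 */

/* Integers use tags 01 and 11, i.e. any word whose low bit is 1.  */
int toy_intp (toy_object obj)     { return (obj & 1) == 1; }
int toy_charp (toy_object obj)    { return (obj & 3) == TOY_TAG_CHAR; }
int toy_pointerp (toy_object obj) { return (obj & 3) == TOY_TAG_POINTER; }

/* A pointer is stored unchanged; this relies on 4-byte alignment.  */
toy_object
toy_wrap_pointer (void *p)
{
  assert (((uintptr_t) p & 3) == 0);
  return (toy_object) p;
}

/* A character code is shifted left by two bits and tagged with 10.  */
toy_object
toy_make_char (long code)
{
  return (toy_object) (((uintptr_t) code << 2) | TOY_TAG_CHAR);
}

/* Recover the character code by shifting the tag bits away.  */
long
toy_char_code (toy_object obj)
{
  return (long) ((uintptr_t) obj >> 2);
}
```

   Because the pointer tag is 00, wrapping and unwrapping a pointer is
just a cast, which is the point of the alignment assumption.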

   Lisp objects use the typedef `Lisp_Object', but the actual C type
used for the Lisp object can vary.  It can be either a simple type
(`long' on the DEC Alpha, `int' on other machines) or a structure whose
fields are bit fields that line up properly (actually, a union of
structures is used).  The choice of which type to use is determined by
the preprocessor constant `USE_UNION_TYPE' which is defined via the
`--use-union-type' option to `configure'.

   Generally the simple integral type is preferable because it ensures
that the compiler will actually use a machine word to represent the
object (some compilers will use more general and less efficient code
for unions and structs even if they can fit in a machine word).  The
union type, however, has the advantage of stricter _static_ type
checking.  Places where a `Lisp_Object' is mistakenly passed to a
routine expecting an `int' (or vice-versa), or a check is written `if
(foo)' (instead of `if (!NILP (foo))'), will be flagged as errors.  None
of these lead to the expected results!  `Qnil' is not represented as 0
(so `if (foo)' will *ALWAYS* be true for a `Lisp_Object'), and the
representation of an integer as a `Lisp_Object' is not just the
integer's numeric value, but usually 2x the integer +/- 1.

   There used to be a claim that the union type simplified debugging.
There may have been a grain of truth to this pre-19.8, when there was no
`lrecord' type and all objects had a separate type appearing in the
tag.  Nowadays, however, there is no debugging gain, and in fact
frequent debugging *_loss_*, since many debuggers don't handle unions
very well, and usually there is no way to directly specify a union from
a debugging prompt.

   Furthermore, release builds should *_not_* be done with union type
because (a) you may get less efficiency, with compilers that can't
figure out how to optimize the union into a machine word; (b) even
worse, the union type often triggers miscompilation, especially when
combined with Mule and error-checking.  This has been the case at
various times when using GCC and MS VC, at least with `--pdump'.
Therefore, be warned!

   As of 2002 4Q, miscompilation is known to happen with current
versions of *Microsoft VC++* and *GCC in combination with Mule, pdump,
and KKCC* (no error checking).

   Various macros are used to convert between Lisp_Objects and the
corresponding C type.  Macros of the form `XINT()', `XCHAR()',
`XSTRING()', and `XSYMBOL()' do any required bit shifting and/or masking
and cast to the appropriate type.  `XINT()' needs to be a bit tricky
so that negative numbers are properly sign-extended.  Since integers
are stored left-shifted, if the right-shift operator does an arithmetic
shift (i.e. it leaves the most-significant bit as-is rather than
shifting in a zero, so that it mimics a divide-by-two even for negative
numbers) the shift to remove the tag bit is enough.  This is the case
on all the systems we support.
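
   The boxing and unboxing of integers can be sketched as follows.
This is an illustration only: `toy_make_int' and `toy_xint' are
hypothetical names, the sketch assumes the one-bit integer tag from the
layout above, and it is not the actual definition of `XINT()' or
`make_int()'.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for Lisp_Object: one pointer-sized machine word.  */
typedef intptr_t toy_object;

/* Store N shifted left by one bit with the low bit set, so that the
   word holds 2*N + 1.  The shift is done on an unsigned type to
   avoid undefined behavior for negative N.  */
toy_object
toy_make_int (long n)
{
  return (toy_object) (((uintptr_t) n << 1) | 1);
}

/* Drop the tag bit.  This relies on >> being an arithmetic shift for
   negative values, which the C standard leaves
   implementation-defined.  */
long
toy_xint (toy_object obj)
{
  return (long) (obj >> 1);
}
```

   With an arithmetic shift, the word for -7 (which is -13) shifts back
to -7; a logical shift would instead produce a large positive number.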

   Note that when `ERROR_CHECK_TYPECHECK' is defined, the converter
macros become more complicated--they check the tag bits and/or the type
field in the first four bytes of a record type to ensure that the
object is really of the correct type.  This is great for catching places
where an incorrect type is being dereferenced--this typically results
in a pointer being dereferenced as the wrong type of structure, with
unpredictable (and sometimes not easily traceable) results.
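
   The difference between a checked and an unchecked converter can be
sketched as follows.  Everything here is hypothetical, including the
`TOY_ERROR_CHECK_TYPECHECK' switch and the record layout; the sketch
only illustrates the idea of testing a type field before casting.

```c
#include <assert.h>

enum toy_type { TOY_STRING, TOY_SYMBOL };

/* Every toy record starts with a header naming its type.  */
struct toy_header { enum toy_type type; };
struct toy_string { struct toy_header header; const char *data; };
struct toy_symbol { struct toy_header header; const char *name; };

/* Stand-in for the configure-time error-checking switch.  */
#define TOY_ERROR_CHECK_TYPECHECK 1

#if TOY_ERROR_CHECK_TYPECHECK
/* Checked converter: asserts that OBJ really is a string record
   before casting.  */
struct toy_string *
toy_xstring (void *obj)
{
  assert (((struct toy_header *) obj)->type == TOY_STRING);
  return (struct toy_string *) obj;
}
#else
/* Unchecked converter: a bare cast with no run-time cost.  */
#define toy_xstring(obj) ((struct toy_string *) (obj))
#endif
```

   Passing a `struct toy_symbol' to the checked version fails at the
assertion, at the site of the mistake, rather than later when the wrong
fields are read.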

   There are similar `XSETTYPE()' macros that construct a Lisp object.
These macros are of the form `XSETTYPE (LVALUE, RESULT)', i.e. they
have to be a statement rather than just used in an expression.  The
reason for this is that standard C doesn't let you "construct" a
structure (but GCC does).  Granted, this sometimes isn't too
convenient; for the case of integers, at least, you can use the
function `make_int()', which constructs and _returns_ an integer Lisp
object.  Note that the `XSETTYPE()' macros are also affected by
`ERROR_CHECK_TYPECHECK' and make sure that the structure is of the
right type in the case of record types, where the type is contained in
the structure.

   The C programmer is responsible for *guaranteeing* that a
Lisp_Object is the correct type before using the `XTYPE' macros.  This
is especially important in the case of lists.  Use `XCAR' and `XCDR' if
a Lisp_Object is certainly a cons cell, else use `Fcar()' and `Fcdr()'.
Trust other C code, but not Lisp code.  On the other hand, if XEmacs
has an internal logic error, it's better to crash immediately, so
sprinkle `assert()'s and "unreachable" `abort()'s liberally about the
source code.  Where performance is an issue, use `type_checking_assert',
`bufpos_checking_assert', and `gc_checking_assert', which do nothing
unless the corresponding configure error checking flag was specified.


File: internals.info,  Node: Rules When Writing New C Code,  Next: Regression Testing XEmacs,  Prev: How Lisp Objects Are Represented in C,  Up: Top

8 Rules When Writing New C Code
*******************************

The XEmacs C Code is extremely complex and intricate, and there are many
rules that are more or less consistently followed throughout the code.
Many of these rules are not obvious, so they are explained here.  It is
of the utmost importance that you follow them.  If you don't, you may
get something that appears to work, but which will crash in odd
situations, often in code far away from where the actual breakage is.

* Menu:

* A Reader's Guide to XEmacs Coding Conventions::
* General Coding Rules::
* Writing Lisp Primitives::
* Writing Good Comments::
* Adding Global Lisp Variables::
* Proper Use of Unsigned Types::
* Coding for Mule::
* Techniques for XEmacs Developers::


File: internals.info,  Node: A Reader's Guide to XEmacs Coding Conventions,  Next: General Coding Rules,  Up: Rules When Writing New C Code

8.1 A Reader's Guide to XEmacs Coding Conventions
=================================================

Of course the low-level implementation language of XEmacs is C, but much
of that uses the Lisp engine to do its work.  However, because the code
is "inside" of the protective containment shell around the "reactor
core," you'll see lots of complex "plumbing" needed to do the work and
"safety mechanisms," whose failure results in a meltdown.  This section
provides a quick overview (or review) of the various components of the
implementation of Lisp objects.

   Two typographic conventions help to identify C objects that implement
Lisp objects.  The first is that capitalized identifiers, especially
beginning with the letters `Q', `V', `F', and `S', for C variables and
functions, and C macros beginning with the letter `X', are used to
implement Lisp.  The second is that where Lisp uses the hyphen `-' in
symbol names, the corresponding C identifiers use the underscore `_'.
Of course, since XEmacs Lisp contains interfaces to many external
libraries, those external names will follow the coding conventions
their authors chose, and may overlap the "XEmacs name space."  However
these cases are usually pretty obvious.
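
   The second convention amounts to a simple mechanical mapping, which
can be sketched as follows.  The helper `lisp_name_to_c' is
hypothetical and exists only for illustration; the names in the XEmacs
sources are written by hand, and names containing other special
characters need an ad hoc spelling that this helper does not attempt.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of XEmacs: build the conventional C
   identifier for LISP_NAME by prepending PREFIX (`F', `Q', `V', ...)
   and turning each hyphen into an underscore.  Returns 0 on success,
   or -1 if BUF (of SIZE bytes) is too small.  */
int
lisp_name_to_c (char prefix, const char *lisp_name, char *buf, size_t size)
{
  size_t len, i;

  assert (lisp_name != NULL && buf != NULL);
  len = strlen (lisp_name);
  if (size < len + 2)           /* prefix + name + terminating NUL */
    return -1;
  buf[0] = prefix;
  for (i = 0; i < len; i++)
    buf[i + 1] = (lisp_name[i] == '-') ? '_' : lisp_name[i];
  buf[len + 1] = '\0';
  return 0;
}
```

   For a made-up Lisp name such as `my-new-primitive' and the prefix
`F', the result is `Fmy_new_primitive'.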

   All Lisp objects are handled indirectly.  The `Lisp_Object' type is
usually a pointer to a structure, except for a very small number of
types with immediate representations (currently characters and
integers).  However, these types cannot be directly operated on in C
code, either, so they can also be considered indirect.  Types that do
not have an immediate representation always have a C typedef
`Lisp_TYPE' for a corresponding structure.

   In older code, it was common practice to pass around pointers to
`Lisp_TYPE', but this is now deprecated in favor of using `Lisp_Object'
for all function arguments and return values that are Lisp objects.
The `XTYPE' macro is used to extract the pointer and cast it to
`(Lisp_TYPE *)' for the desired type.

   *Convention*: macros whose names begin with `X' operate on
`Lisp_Object's and do no type-checking.  Many such macros are type
extractors, but others implement Lisp operations in C (_e.g._, `XCAR'
implements the Lisp `car' function).  These are unsafe, and must only
be used where types of all data have already been checked.  Such macros
are only applied to `Lisp_Object's.  In internal implementations where
the pointer has already been converted, the structure is operated on
directly using the C `->' member access operator.

   The `TYPEP', `CHECK_TYPE', and `CONCHECK_TYPE' macros are used to
test types.  The first returns a Boolean value, and the latter two
signal errors.  (The `CONCHECK' variety allows execution to be
CONtinued under some circumstances, thus the name.)  Functions which
expect to be passed user data invariably call `CHECK' macros on their
arguments.

   There are many types of specialized Lisp objects implemented in C,
but the most pervasive type is the "symbol".  Symbols are used as
identifiers, variables, and functions.

   *Convention*: Global variables whose names begin with `Q' are
constants whose value is a symbol.  The name of the variable should be
derived from the name of the symbol using the same rules as for Lisp
primitives.  Such variables allow the C code to check whether a
particular `Lisp_Object' is equal to a given symbol.  Symbols are Lisp
objects, so these variables may be passed to Lisp primitives.  (An
alternative to the use of `Q...' variables is to call the `intern'
function at initialization in the `vars_of_MODULE' function, which is
hardly less efficient.)

   *Convention*: Global variables whose names begin with `V' are
variables that contain Lisp objects.  The convention here is that all
global variables of type `Lisp_Object' begin with `V', and no others do
(not even integer and boolean variables that have Lisp equivalents).
Most of the time, these variables have equivalents in Lisp, which are
defined via the `DEFVAR' family of macros, but some don't.  Since the
variable's value is a `Lisp_Object', it can be passed to Lisp
primitives.

   The implementation of Lisp primitives is more complex.
*Convention*: Global variables with names beginning with `S' contain a
structure that allows the Lisp engine to identify and call a C
function.  In modern versions of XEmacs, these identifiers are almost
always completely hidden in the `DEFUN' and `SUBR' macros, but you will
encounter them if you look at very old versions of XEmacs or at GNU
Emacs.  *Convention*: Functions with names beginning with `F' implement
Lisp primitives.  Of course all their arguments and their return values
must be Lisp_Objects.  (This is hidden in the `DEFUN' macro.)


File: internals.info,  Node: General Coding Rules,  Next: Writing Lisp Primitives,  Prev: A Reader's Guide to XEmacs Coding Conventions,  Up: Rules When Writing New C Code

8.2 General Coding Rules
========================

The C code is actually written in a dialect of C called "Clean C",
meaning that it can be compiled, mostly warning-free, with either a C or
C++ compiler.  Coding in Clean C has several advantages over plain C.
C++ compilers are more nit-picking, and a number of coding errors have
been found by compiling with C++.  The ability to use both C and C++
tools means that a greater variety of development tools are available to
the developer.

   Every module includes `<config.h>' (angle brackets so that
`--srcdir' works correctly; `config.h' may or may not be in the same
directory as the C sources) and `lisp.h'.  `config.h' must always be
included before any other header files (including system header files)
to ensure that certain tricks played by various `s/' and `m/' files
work out correctly.

   When including header files, always use angle brackets, not double
quotes, except when the file to be included is always in the same
directory as the including file.  If either file is a generated file,
then that is not likely to be the case.  In order to understand why we
have this rule, imagine what happens when you do a build in the source
directory using `./configure' and another build in another directory
using `../work/configure'.  There will be two different `config.h'
files.  Which one will be used if you `#include "config.h"'?

   Almost every module contains a `syms_of_*()' function and a
`vars_of_*()' function.  The former declares any Lisp primitives you
have defined and defines any symbols you will be using.  The latter
declares any global Lisp variables you have added and initializes global
C variables in the module.  *Important*: There are stringent
requirements on exactly what can go into these functions.  See the
comment in `emacs.c'.  The reason for this is to avoid obscure unwanted
interactions during initialization.  If you don't follow these rules,
you'll be sorry!  If you want to do anything that isn't allowed, create
a `complex_vars_of_*()' function for it.  Doing this is tricky, though:
you have to make sure your function is called at the right time so that
all the initialization dependencies work out.

   Declare each function of these kinds in `symsinit.h'.  Make sure
it's called in the appropriate place in `emacs.c'.  You never need to
include `symsinit.h' directly, because it is included by `lisp.h'.

   *All global and static variables that are to be modifiable must be
declared uninitialized.*  This means that you may not use the "declare
with initializer" form for these variables, such as `int some_variable
= 0;'.  The reason for this has to do with some kludges done during the
dumping process: If possible, the initialized data segment is re-mapped
so that it becomes part of the (unmodifiable) code segment in the
dumped executable.  This allows this memory to be shared among multiple
running XEmacs processes.  XEmacs is careful to place as much constant
data as possible into initialized variables during the `temacs' phase.

   *Please note:* This kludge only works on a few systems nowadays, and
is rapidly becoming irrelevant because most modern operating systems
provide "copy-on-write" semantics.  All data is initially shared
between processes, and a private copy is automatically made (on a
page-by-page basis) when a process first attempts to write to a page of
memory.

   Formerly, there was a requirement that static variables not be
declared inside of functions.  This had to do with another hack in the
same vein as what was just described: old USG systems put
statically-declared variables in the initialized data space, so those
header files had a `#define static' declaration. (That way, the
data-segment remapping described above could still work.) This fails
badly on static variables inside of functions, which suddenly become
automatic variables; therefore, you weren't supposed to have any of
them.  This awful kludge has been removed in XEmacs because

  1. almost all of the systems that used this kludge ended up having to
     disable the data-segment remapping anyway;

  2. the only systems that didn't were extremely outdated ones;

  3. this hack completely messed up inline functions.

   The C source code makes heavy use of C preprocessor macros.  One
popular macro style is:

     #define FOO(var, value) do {            \
       Lisp_Object FOO_value = (value);      \
       ... /* compute using FOO_value */     \
       (var) = bar;                          \
     } while (0)

   The `do {...} while (0)' is a standard trick to allow FOO to have
statement semantics, so that it can safely be used within an `if'
statement in C, for example.  Multiple evaluation is prevented by
copying a supplied argument into a local variable, so that
`FOO(var,fun(1))' only calls `fun' once.
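
   A complete, compilable instance of this macro style might look like
the following sketch.  The macro `SET_DOUBLED' and the functions around
it are hypothetical and exist only to make the two properties
observable.

```c
#include <assert.h>

/* Hypothetical macro in the style described above: store twice VALUE
   in VAR, evaluating VALUE exactly once.  */
#define SET_DOUBLED(var, value) do {                     \
  int SET_DOUBLED_value = (value);                       \
  (var) = SET_DOUBLED_value + SET_DOUBLED_value;         \
} while (0)

static int call_count;

/* Returns N and counts its calls, to expose multiple evaluation.  */
static int
counted (int n)
{
  call_count++;
  return n;
}

/* Returns 2*N when FLAG is nonzero, otherwise -1.  The macro can sit
   between `if' and `else' without braces because it expands to a
   single statement.  */
int
doubled_if (int flag, int n)
{
  int result;

  if (flag)
    SET_DOUBLED (result, counted (n));
  else
    result = -1;
  return result;
}

/* Number of times the macro argument has been evaluated so far.  */
int
calls_so_far (void)
{
  return call_count;
}
```

   Had the macro been written with bare braces instead of `do {...}
while (0)', the semicolon after the macro call would end the `if'
statement and the `else' would not compile.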

   Lisp lists are popular data structures in the C code as well as in
Elisp.  There are two sets of macros that iterate over lists.
`EXTERNAL_LIST_LOOP_N' should be used when the list has been supplied
by the user, and cannot be trusted to be acyclic and `nil'-terminated.
A `malformed-list' or `circular-list' error will be generated if the
list being iterated over is not entirely kosher.  `LIST_LOOP_N', on the
other hand, is faster and less safe, and can be used only on trusted
lists.
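
   The extra work needed for an untrusted list can be sketched as
follows.  This is a toy model: the names are hypothetical, `NULL' plays
the role of `nil', and the use of the tortoise-and-hare technique is an
assumption made for this sketch, not a statement about how the XEmacs
macros are implemented.  The toy cons cell cannot represent a
malformed (dotted) list, so only the circular case is modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Toy cons cell; a NULL cdr terminates the list.  */
struct toy_cons
{
  int car;
  struct toy_cons *cdr;
};

/* Return the length of LIST, or -1 if LIST is circular.  The hare
   advances one cell per iteration and the tortoise one cell every
   second iteration; on a circular list the hare eventually lands on
   the tortoise.  */
long
toy_external_list_length (struct toy_cons *list)
{
  struct toy_cons *hare = list;
  struct toy_cons *tortoise = list;
  long len = 0;

  while (hare != NULL)
    {
      hare = hare->cdr;
      len++;
      if ((len & 1) == 0)
        {
          tortoise = tortoise->cdr;
          if (hare == tortoise)
            return -1;          /* circular list detected */
        }
    }
  return len;
}
```

   A loop over a trusted list can omit the tortoise entirely, which is
where the speed difference between the two families of macros comes
from in this model.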

   Related macros are `GET_EXTERNAL_LIST_LENGTH' and `GET_LIST_LENGTH',
which calculate the length of a list and, in the case of
`GET_EXTERNAL_LIST_LENGTH', also validate the properness of the list.
The macros `EXTERNAL_LIST_LOOP_DELETE_IF' and `LIST_LOOP_DELETE_IF'
delete from a Lisp list the elements that satisfy some predicate.


File: internals.info,  Node: Writing Lisp Primitives,  Next: Writing Good Comments,  Prev: General Coding Rules,  Up: Rules When Writing New C Code

8.3 Writing Lisp Primitives
===========================

Lisp primitives are Lisp functions implemented in C.  The details of
interfacing the C function so that Lisp can call it are handled by a few
C macros.  The only way to really understand how to write new C code is
to read the source, but we can explain some things here.

   An example of a special form is the definition of `prog1', from
`eval.c'.  (An ordinary function would have the same general
appearance.)

     DEFUN ("prog1", Fprog1, 1, UNEVALLED, 0, /*
     Similar to `progn', but the value of the first form is returned.
     \(prog1 FIRST BODY...): All the arguments are evaluated sequentially.
     The value of FIRST is saved during evaluation of the remaining args,
     whose values are discarded.
     */
            (args))
     {
       /* This function can GC */
       REGISTER Lisp_Object val, form, tail;
       struct gcpro gcpro1;

       val = Feval (XCAR (args));

       GCPRO1 (val);

       LIST_LOOP_3 (form, XCDR (args), tail)
         Feval (form);

       UNGCPRO;
       return val;
     }

   Let's start with a precise explanation of the arguments to the
`DEFUN' macro.  Here is a template for them:

     DEFUN (LNAME, FNAME, MIN_ARGS, MAX_ARGS, INTERACTIVE, /*
     DOCSTRING
     */
        (ARGLIST))

LNAME
     This string is the name of the Lisp symbol to define as the
     function name; in the example above, it is `"prog1"'.

FNAME
     This is the C function name for this function.  This is the name
     that is used in C code for calling the function.  The name is, by
     convention, `F' prepended to the Lisp name, with all dashes (`-')
     in the Lisp name changed to underscores.  Thus, to call this
     function from C code, call `Fprog1'.  Remember that the arguments
     are of type `Lisp_Object'; various macros and functions for
     creating values of type `Lisp_Object' are declared in the file
     `lisp.h'.

     Primitives whose names are special characters (e.g. `+' or `<')
     are named by spelling out, in some fashion, the special character:
     e.g. `Fplus()' or `Flss()'.  Primitives whose names begin with
     normal alphanumeric characters but also contain special characters
     are spelled out in some creative way, e.g. `let*' becomes
     `FletX()'.

     Each function also has an associated structure that holds the data
     for the subr object that represents the function in Lisp.  This
     structure conveys the Lisp symbol name to the initialization
     routine that will create the symbol and store the subr object as
     its definition.  The C variable name of this structure is always
     `S' prepended to the FNAME.  You hardly ever need to be aware of
     the existence of this structure, since `DEFUN' plus `DEFSUBR'
     takes care of all the details.

MIN_ARGS
     This is the minimum number of arguments that the function
     requires.  The function `prog1' allows a minimum of one argument.

MAX_ARGS
     This is the maximum number of arguments that the function accepts,
     if there is a fixed maximum.  Alternatively, it can be `UNEVALLED',
     indicating a special form that receives unevaluated arguments, or
     `MANY', indicating an unlimited number of evaluated arguments (the
     C equivalent of `&rest').  Both `UNEVALLED' and `MANY' are macros.
     If MAX_ARGS is a number, it may not be less than MIN_ARGS and it
     may not be greater than 8. (If you need to add a function with
     more than 8 arguments, use the `MANY' form.  Resist the urge to
     edit the definition of `DEFUN' in `lisp.h'.  If you do it anyway,
     make sure to also add another clause to the switch statement in
     `primitive_funcall().')

INTERACTIVE
     This is an interactive specification, a string such as might be
     used as the argument of `interactive' in a Lisp function.  In the
     case of `prog1', it is 0 (a null pointer), indicating that `prog1'
     cannot be called interactively.  A value of `""' indicates a
     function that should receive no arguments when called
     interactively.

DOCSTRING
     This is the documentation string.  It is written just like a
     documentation string for a function defined in Lisp; in
     particular, the first line should be a single sentence.  Note how
     the documentation string is enclosed in a comment, none of the
     documentation is placed on the same lines as the comment-start and
     comment-end characters, and the comment-start characters are on
     the same line as the interactive specification.  `make-docfile',
     which scans the C files for documentation strings, is very
     particular about what it looks for, and will not properly extract
     the doc string if it's not in this exact format.

     In order to make both `etags' and `make-docfile' happy, make sure
     that the `DEFUN' line contains the LNAME and FNAME, and that the
     comment-start characters for the doc string are on the same line
     as the interactive specification, and put a newline directly after
     them (and before the comment-end characters).

ARGLIST
     This is the comma-separated list of arguments to the C function.
     For a function with a fixed maximum number of arguments, provide a
     C argument for each Lisp argument.  In this case, unlike regular C
     functions, the types of the arguments are not declared; they are
     simply always of type `Lisp_Object'.

     The names of the C arguments will be used as the names of the
     arguments to the Lisp primitive as displayed in its documentation,
     modulo the same concerns described above for `F...' names (in
     particular, underscores in the C arguments become dashes in the
     Lisp arguments).

     There is one additional kludge: A trailing `_' on the C argument is
     discarded when forming the Lisp argument.  This allows C language
     reserved words (like `default') or global symbols (like `dirname')
     to be used as argument names without compiler warnings or errors.

     A Lisp function with MAX_ARGS = `UNEVALLED' is a "special form";
     its arguments are not evaluated.  Instead it receives one argument
     of type `Lisp_Object', a (Lisp) list of the unevaluated arguments,
     conventionally named `(args)'.

     When a Lisp function has no upper limit on the number of arguments,
     specify MAX_ARGS = `MANY'.  In this case its implementation in C
     actually receives exactly two arguments: the number of Lisp
     arguments (an `int') and the address of a block containing their
     values (a `Lisp_Object *').  In this case only are the C types
     specified in the ARGLIST: `(int nargs, Lisp_Object *args)'.


   Within the function `Fprog1' itself, note the use of the macros
`GCPRO1' and `UNGCPRO'.  `GCPRO1' is used to "protect" a variable from
garbage collection--to inform the garbage collector that it must look
in that variable and regard the object pointed at by its contents as an
accessible object.  This is necessary whenever you call `Feval' or
anything that can directly or indirectly call `Feval' (this includes
the `QUIT' macro!).  At such a time, any Lisp object that you intend to
refer to again must be protected somehow.  `UNGCPRO' cancels the
protection of the variables that are protected in the current function.
It is necessary to do this explicitly.

   The macro `GCPRO1' protects just one local variable.  If you want to
protect two, use `GCPRO2' instead; repeating `GCPRO1' will not work.
Macros `GCPRO3' and `GCPRO4' also exist.

   These macros implicitly use local variables such as `gcpro1'; you
must declare these explicitly, with type `struct gcpro'.  Thus, if you
use `GCPRO2', you must declare `gcpro1' and `gcpro2'.
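
   Why the locals must be declared, and why repeating the one-variable
macro cannot work, is easier to see in a toy model.  The following
sketch assumes that protection is implemented as a chain of stack
records; the `toy_' names, the record layout and the macro bodies are
all hypothetical and are not copied from the XEmacs sources.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for Lisp_Object.  */
typedef void *toy_object;

/* One record per protected variable, linked into a chain.  */
struct toy_gcpro
{
  struct toy_gcpro *next;       /* previously protected record */
  toy_object *var;              /* address of the protected variable */
};

/* Head of the chain; empty (NULL) at startup.  */
struct toy_gcpro *toy_gcprolist;

/* The macros refer to caller-declared locals named gcpro1, gcpro2.  */
#define TOY_GCPRO1(v)                                           \
  (gcpro1.next = toy_gcprolist, gcpro1.var = &(v),              \
   toy_gcprolist = &gcpro1)

#define TOY_GCPRO2(v1, v2)                                      \
  (gcpro1.next = toy_gcprolist, gcpro1.var = &(v1),             \
   gcpro2.next = &gcpro1, gcpro2.var = &(v2),                   \
   toy_gcprolist = &gcpro2)

/* Restore the chain to its state before the matching TOY_GCPROn.  */
#define TOY_UNGCPRO (toy_gcprolist = gcpro1.next)

/* Count the variables a collector would currently treat as roots.  */
int
toy_count_protected (void)
{
  int n = 0;
  struct toy_gcpro *p;

  for (p = toy_gcprolist; p != NULL; p = p->next)
    n++;
  return n;
}

/* Protect two locals, report how many are protected, unprotect.  */
int
toy_protect_two (void)
{
  toy_object a = NULL, b = NULL;
  struct toy_gcpro gcpro1, gcpro2;
  int n;

  TOY_GCPRO2 (a, b);
  n = toy_count_protected ();
  TOY_UNGCPRO;
  return n;
}
```

   In this model a second `TOY_GCPRO1' in the same function would
overwrite the single `gcpro1' record, losing the first variable and
making the record point at itself.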

   Note also that the general rule is "caller-protects"; i.e. you are
only responsible for protecting those Lisp objects that you create.  Any
objects passed to you as arguments should have been protected by whoever
created them, so you don't in general have to protect them.

   In particular, the arguments to any Lisp primitive are always
automatically `GCPRO'ed, when called "normally" from Lisp code or
bytecode.  So only a few Lisp primitives that are called frequently from
C code, such as `Fprogn', protect their arguments as a service to their
caller.  You don't need to protect your arguments when writing a new
`DEFUN'.

   `GCPRO'ing is perhaps the trickiest and most error-prone part of
XEmacs coding.  It is *extremely* important that you get this right and
use a great deal of discipline when writing this code.  *Note
`GCPRO'ing: GCPROing, for full details on how to do this.

   What `DEFUN' actually does is declare a global structure of type
`Lisp_Subr' whose name begins with capital `SF' and which contains
information about the primitive (e.g. a pointer to the function, its
minimum and maximum allowed arguments, a string describing its Lisp
name); `DEFUN' then begins a normal C function declaration using the
`F...' name.  The Lisp subr object that is the function definition of a
primitive (i.e. the object in the function slot of the symbol that
names the primitive) actually points to this `SF' structure; when
`Feval' encounters a subr, it looks in the structure to find out how to
call the C function.
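
   The role of this structure can be sketched with a toy model.  The
names and layout below are hypothetical; in particular, padding omitted
optional arguments with a `TOY_NIL' value is an assumption of the
sketch, and the real dispatch code handles many more argument counts.

```c
#include <assert.h>

/* Toy stand-in for Lisp_Object: here just a number.  */
typedef long toy_object;

/* Stand-in for a missing optional argument; -1 is therefore not
   usable as an ordinary value in this toy.  */
#define TOY_NIL (-1L)

/* The C function, typed according to its maximum argument count.  */
union toy_subr_fn
{
  toy_object (*a1) (toy_object);
  toy_object (*a2) (toy_object, toy_object);
};

/* What the evaluator needs to know in order to call a primitive.  */
struct toy_subr
{
  const char *name;             /* Lisp name of the primitive */
  int min_args;
  int max_args;                 /* 1 or 2 in this toy */
  union toy_subr_fn fn;
};

/* Call SUBR with NARGS arguments taken from ARGS.  Returns 0 and
   stores the value in *RESULT, or returns -1 if NARGS is outside
   the allowed range.  */
int
toy_funcall_subr (const struct toy_subr *subr, int nargs,
                  const toy_object *args, toy_object *result)
{
  toy_object padded[2] = { TOY_NIL, TOY_NIL };
  int i;

  assert (subr->max_args >= 1 && subr->max_args <= 2);
  if (nargs < subr->min_args || nargs > subr->max_args)
    return -1;
  for (i = 0; i < nargs; i++)
    padded[i] = args[i];

  switch (subr->max_args)
    {
    case 1:
      *result = subr->fn.a1 (padded[0]);
      break;
    default:
      *result = subr->fn.a2 (padded[0], padded[1]);
      break;
    }
  return 0;
}

/* Example primitive: add DELTA to N, or 1 if DELTA was omitted.  */
toy_object
toy_Fbump (toy_object n, toy_object delta)
{
  return n + (delta == TOY_NIL ? 1 : delta);
}

/* The record for the example primitive, named `SF' plus the rest.  */
const struct toy_subr toy_SFbump = { "bump", 1, 2, { .a2 = toy_Fbump } };
```

   The evaluator in this model never needs the C function's name; it
works entirely from the record.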

   Defining the C function is not enough to make a Lisp primitive
available; you must also create the Lisp symbol for the primitive (the
symbol is "interned"; *note Obarrays::) and store a suitable subr
object in its function cell. (If you don't do this, the primitive won't
be seen by Lisp code.) The code looks like this:

     DEFSUBR (FNAME);

Here FNAME is the same name you used as the second argument to `DEFUN'.

   This call to `DEFSUBR' should go in the `syms_of_*()' function at
the end of the module.  If no such function exists, create it and make
sure to also declare it in `symsinit.h' and call it from the
appropriate spot in `main()'.  *Note General Coding Rules::.

   Note that C code cannot call functions by name unless they are
defined in C.  The way to call a function written in Lisp from C is to
use `Ffuncall', which embodies the Lisp function `funcall'.  Since the
Lisp function `funcall' accepts an unlimited number of arguments, in C
it takes two: the number of Lisp-level arguments, and a one-dimensional
array containing their values.  The first Lisp-level argument is the
Lisp function to call, and the rest are the arguments to pass to it.
Since `Ffuncall' can call the evaluator, you must protect pointers from
garbage collection around the call to `Ffuncall'. (However, `Ffuncall'
explicitly protects all of its parameters, so you don't have to protect
any pointers passed as parameters to it.)

   The C functions `call0', `call1', `call2', and so on, provide handy
ways to call a Lisp function conveniently with a fixed number of
arguments.  They work by calling `Ffuncall'.
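
   The relationship between the two calling styles can be sketched as
follows.  This is a toy model with hypothetical names (`toy_funcall',
`toy_call2', and so on); it shows only how a fixed-argument wrapper
packs its arguments into the count-and-array form, and it ignores
garbage collection entirely.

```c
#include <assert.h>

/* Toy object: either a number or a callable function.  */
typedef struct toy_object toy_object;
typedef toy_object (*toy_fn) (int nargs, toy_object *args);

struct toy_object
{
  toy_fn fn;                    /* non-null for functions */
  long num;                     /* value for numbers */
};

/* Build a number object.  */
toy_object
toy_number (long n)
{
  toy_object obj = { 0, n };
  return obj;
}

/* Like `funcall': ARGS[0] is the function to call and the remaining
   NARGS - 1 elements are the arguments passed to it.  */
toy_object
toy_funcall (int nargs, toy_object *args)
{
  assert (nargs >= 1 && args[0].fn != 0);
  return args[0].fn (nargs - 1, args + 1);
}

/* Convenience wrapper for exactly two arguments.  */
toy_object
toy_call2 (toy_object fn, toy_object arg0, toy_object arg1)
{
  toy_object args[3];

  args[0] = fn;
  args[1] = arg0;
  args[2] = arg1;
  return toy_funcall (3, args);
}

/* Example callee: add up all of its arguments.  */
toy_object
toy_sum (int nargs, toy_object *args)
{
  long total = 0;
  int i;

  for (i = 0; i < nargs; i++)
    total += args[i].num;
  return toy_number (total);
}
```

   The wrapper saves the caller from building the array by hand, at
the cost of one fixed-size array on the stack per call.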

   `eval.c' is a very good file to look through for examples; `lisp.h'
contains the definitions for important macros and functions.


File: internals.info,  Node: Writing Good Comments,  Next: Adding Global Lisp Variables,  Prev: Writing Lisp Primitives,  Up: Rules When Writing New C Code

8.4 Writing Good Comments
=========================

Comments are a lifeline for programmers trying to understand tricky
code.  In general, the less obvious it is what you are doing, the more
you need a comment, and the more detailed it needs to be.  You should
always be on guard when you're writing code for stuff that's tricky, and
should constantly be putting yourself in someone else's shoes and asking
if that person could figure out without much difficulty what's going
on. (Assume they are a competent programmer who understands the
essentials of how the XEmacs code is structured but doesn't know much
about the module you're working on or any algorithms you're using.) If
you're not sure whether they would be able to, add a comment.  Always
err on the side of more comments, rather than fewer.

   Generally, when making comments, there is no need to attribute them
with your name or initials.  This especially goes for small,
easy-to-understand, non-opinionated ones.  Also, comments indicating
where, when, and by whom a file was changed are _strongly_ discouraged,
and in general will be removed as they are discovered.  This is exactly
what `ChangeLogs' are there for.  However, it can occasionally be
useful to mark exactly where (but not when or by whom) changes are
made, particularly when making small changes to a file imported from
elsewhere.  These marks help when later on a newer version of the file
is imported and the changes need to be merged. (If everything were
always kept in CVS, there would be no need for this.  But in practice,
this often doesn't happen, or the CVS repository is later on lost or
unavailable to the person doing the update.)

   When putting an explicit opinion in a comment, you should
_always_ attribute it with your name, and optionally the date.  This
also goes for long, complex comments explaining in detail the workings
of something - by putting your name there, you make it possible for
someone who has questions about how that thing works to determine who
wrote the comment so they can write to them.  Preferably, use your
actual name and not your initials, unless your initials are generally
recognized (e.g. `jwz').  You can use only your first name if it's
obvious who you are; otherwise, give first and last name.  If you're
not a regular contributor, you might consider putting your email
address in - it may be in the ChangeLog, but after a while ChangeLogs
have a tendency to disappear or get muddled. (E.g. your comment
may get copied somewhere else or even into another program, and
tracking down the proper ChangeLog may be very difficult.)

   If you come across an opinion that is not or no longer valid, or you
come across any comment that no longer applies but you want to keep it
around, enclose it in `[[ ' and ` ]]' marks and add a comment
afterwards explaining why the preceding comment is no longer valid.  Put
your name on this comment, as explained above.

   Just as comments are a lifeline to programmers, incorrect comments
are death.  If you come across an incorrect comment, *immediately*
correct it or flag it as incorrect, as described in the previous
paragraph.  Whenever you work on a section of code, _always_ make sure
to update any comments to be correct - or, at the very least, flag them
as incorrect.

   To indicate a "todo" or other problem, use four pound signs - i.e.
`####'.


File: internals.info,  Node: Adding Global Lisp Variables,  Next: Proper Use of Unsigned Types,  Prev: Writing Good Comments,  Up: Rules When Writing New C Code

8.5 Adding Global Lisp Variables
================================

Global variables whose names begin with `Q' are constants whose value
is a symbol of a particular name.  The name of the variable should be
derived from the name of the symbol using the same rules as for Lisp
primitives.  These variables are initialized using a call to
`defsymbol()' in the `syms_of_*()' function. (This call interns a
symbol, sets the C variable to the resulting Lisp object, and calls
`staticpro()' on the C variable to tell the garbage-collection
mechanism about this variable.  What `staticpro()' does is add a
pointer to the variable to a large global array; when
garbage-collection happens, all pointers listed in the array are used
as starting points for marking Lisp objects.  This is important because
it's quite possible that the only current reference to the object is
the C variable.  In the case of symbols, the `staticpro()' doesn't
matter all that much because the symbol is contained in `obarray',
which is itself `staticpro()'ed.  However, it's possible that a naughty
user could do something like uninterning the symbol out of `obarray' or
even setting `obarray' to a different value [although this is likely to
make XEmacs crash!].)

   *Please note:* It is potentially deadly if you declare a `Q...'
variable in two different modules.  The two calls to `defsymbol()' are
no problem, but some linkers will complain about multiply-defined
symbols.  The most insidious aspect of this is that often the link will
succeed anyway, but then the resulting executable will sometimes crash
in obscure ways during certain operations!

   To avoid this problem, declare any symbols with common names (such as
`text') that are not obviously associated with this particular module
in the file `general-slots.h'.  The "-slots" suffix indicates that this
is a file that is included multiple times in `general.c'.  Redefinition
of preprocessor macros allows the effects to be different in each
context, so this is actually more convenient and less error-prone than
doing it in your module.
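
   The repeated-inclusion idea can be sketched in a single file as
follows.  This is only an illustration of the technique, not the
contents of `general-slots.h': the names `DEMO_SLOTS', `SLOT',
`Qdemo_text', `Qdemo_color', `init_demo_slots' and `demo_slot_name' are
all made up, the slot list is a macro instead of a separate header, and
plain C strings stand in for Lisp symbols.

```c
#include <assert.h>

/* Stand-in for a "-slots" file: one SLOT line per symbol.  In this
   sketch the list is a macro so that everything fits in one file.  */
#define DEMO_SLOTS            \
  SLOT (Qdemo_text, "text")   \
  SLOT (Qdemo_color, "color")

/* First expansion: define one C variable per slot.  Plain strings
   stand in for Lisp symbols here.  */
#define SLOT(var, name) const char *var;
DEMO_SLOTS
#undef SLOT

/* Second expansion: build a table of the symbol names.  */
static const char *const demo_slot_names[] = {
#define SLOT(var, name) name,
  DEMO_SLOTS
#undef SLOT
};

/* Third expansion: initialize every variable, in the spirit of the
   defsymbol() calls made from a syms_of_*() function.  */
void
init_demo_slots (void)
{
#define SLOT(var, name) var = name;
  DEMO_SLOTS
#undef SLOT
}

/* Return the name of slot number I.  */
const char *
demo_slot_name (int i)
{
  int count = (int) (sizeof demo_slot_names / sizeof demo_slot_names[0]);
  assert (i >= 0 && i < count);
  return demo_slot_names[i];
}
```

   Because each symbol appears exactly once in the list, the variable
definition and its initialization cannot drift apart, and no second
module can accidentally define the same variable.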

   Global variables whose names begin with `V' are variables that
contain Lisp objects.  The convention here is that all global variables
of type `Lisp_Object' begin with `V', and all others don't (including
integer and boolean variables that have Lisp equivalents). Most of the
time, these variables have equivalents in Lisp, but some don't.  Those
that do are declared this way by a call to `DEFVAR_LISP()' in the
`vars_of_*()' initializer for the module.  What this does is create a
special "symbol-value-forward" Lisp object that contains a pointer to
the C variable, intern a symbol whose name is as specified in the call
to `DEFVAR_LISP()', and set its value to the symbol-value-forward Lisp
object; it also calls `staticpro()' on the C variable to tell the
garbage-collection mechanism about the variable.  When `eval' (or
actually `symbol-value') encounters this special object in the process
of retrieving a variable's value, it follows the indirection to the C
variable and gets its value.  `setq' does similar things so that the C
variable gets changed.

   Whether or not you `DEFVAR_LISP()' a variable, you need to
initialize it in the `vars_of_*()' function; otherwise it will end up
as all zeroes, which is the integer 0 (_not_ `nil'), and this is
probably not what you want.  Also, if the variable is not
`DEFVAR_LISP()'ed, *you must call* `staticpro()' on the C variable in
the `vars_of_*()' function.  Otherwise, the garbage-collection
mechanism won't know that the object in this variable is in use, and
will happily collect it and reuse its storage for another Lisp object,
and you will be the one who's unhappy when you can't figure out how
your variable got overwritten.
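
   The role of `staticpro()' can be illustrated with a toy model.  The
sketch below is not the XEmacs implementation: `struct object',
`toy_staticpro' and `toy_mark_roots' are hypothetical names, and the
"collector" only sets a mark bit.  It does show why the _address_ of
the variable is registered and what happens to an object whose only
reference lives in an unregistered variable.

```c
#include <assert.h>

/* A toy heap object with nothing but a mark bit.  */
struct object
{
  int marked;
};

#define MAX_ROOTS 16

/* The "large global array": addresses of C variables holding objects.  */
static struct object **roots[MAX_ROOTS];
static int n_roots;

/* Register the address of a C variable.  Storing the address rather
   than the current value means later assignments are still seen.  */
void
toy_staticpro (struct object **varaddr)
{
  assert (n_roots < MAX_ROOTS);
  roots[n_roots++] = varaddr;
}

/* Mark phase: every object referenced by a registered variable is
   marked; anything left unmarked would be reclaimed by a real GC.  */
void
toy_mark_roots (void)
{
  int i;
  for (i = 0; i < n_roots; i++)
    if (*roots[i] != 0)
      (*roots[i])->marked = 1;
}
```

   If one variable is registered with `toy_staticpro' and a second one
is not, then after `toy_mark_roots' only the object held by the first
variable is marked.  The second object is exactly the one a real
collector would reuse.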


File: internals.info,  Node: Proper Use of Unsigned Types,  Next: Coding for Mule,  Prev: Adding Global Lisp Variables,  Up: Rules When Writing New C Code

8.6 Proper Use of Unsigned Types
================================

Avoid using `unsigned int' and `unsigned long' whenever possible.
Unsigned types are viral - in arithmetic or comparisons that mix signed
and unsigned operands, the signed operand is usually converted to
unsigned, which is almost certainly not what you want.  Many subtle and
hard-to-find bugs are created by careless use of unsigned types.  In
general, you should almost _never_ use an unsigned type to hold a
regular quantity of any sort.  The only exceptions are

  1. When there's a reasonable possibility you will actually need all
     32 or 64 bits to store the quantity.

  2. When calling existing API's that require unsigned types.  In this
     case, you should still do all manipulation using signed types, and
     do the conversion at the very threshold of the API call.

  3. In existing code that you don't want to modify because you don't
     maintain it.

  4. In bit-field structures.

   Other reasonable uses of `unsigned int' and `unsigned long' are
representing non-quantities - e.g. bit-oriented flags and such.
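
   A minimal sketch of the conversion hazard described at the start of
this section.  The two function names are made up for illustration;
nothing here is XEmacs code.  The first function compares an `int' with
an `unsigned int' directly; the second follows the advice above and
converts at the boundary so that the comparison itself is signed.

```c
#include <assert.h>
#include <limits.h>

/* Mixed comparison: A is converted to unsigned before comparing, so
   a negative A turns into a very large positive value.  */
int
mixed_less_than (int a, unsigned int b)
{
  return a < b;
}

/* The same test done in signed arithmetic.  The unsigned value is
   converted once, at the boundary; the caller must ensure it fits.  */
int
signed_less_than (int a, unsigned int b)
{
  assert (b <= (unsigned int) INT_MAX);
  return a < (int) b;
}
```

   With these definitions, `mixed_less_than (-1, 1u)' is false while
`signed_less_than (-1, 1u)' is true.  Many compilers can warn about the
first form (for example, gcc with `-Wsign-compare').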


File: internals.info,  Node: Coding for Mule,  Next: Techniques for XEmacs Developers,  Prev: Proper Use of Unsigned Types,  Up: Rules When Writing New C Code

8.7 Coding for Mule
===================

Although Mule support is not compiled by default in XEmacs, many people
are using it, and we consider it crucial that new code works correctly
with multibyte characters.  This is not hard; it is only a matter of
following several simple coding guidelines.  Even if you never
compile with Mule, with a little practice you will find it quite easy
to code Mule-correctly.

   Note that these guidelines are not necessarily tied to the current
Mule implementation; they are also a good idea to follow on the grounds
of code generalization for future I18N work.

* Menu:

* Character-Related Data Types::
* Working With Character and Byte Positions::
* Conversion to and from External Data::
* General Guidelines for Writing Mule-Aware Code::
* An Example of Mule-Aware Code::


File: internals.info,  Node: Character-Related Data Types,  Next: Working With Character and Byte Positions,  Up: Coding for Mule

8.7.1 Character-Related Data Types
----------------------------------

First, let's review the basic character-related datatypes used by
XEmacs.  Note that the separate `typedef's are not mandatory in the
current implementation (all of them boil down to `unsigned char' or
`int'), but they improve clarity of code a great deal, because one
glance at the declaration can tell the intended use of the variable.

`Emchar'
     An `Emchar' holds a single Emacs character.

     Obviously, the equality between characters and bytes is lost in
     the Mule world.  Characters can be represented by one or more
     bytes in the buffer, and `Emchar' is the C type large enough to
     hold any character.

     Without Mule support, an `Emchar' is equivalent to an `unsigned
     char'.

`Bufbyte'
     The data representing the text in a buffer or string is logically
     a set of `Bufbyte's.

     XEmacs does not work with the same character formats all the time;
     when reading characters from the outside, it decodes them to an
     internal format, and likewise encodes them when writing.
     `Bufbyte' (in fact `unsigned char') is the basic unit of XEmacs
     internal buffers and strings format.  A `Bufbyte *' is the type
     that points at text encoded in the variable-width internal
     encoding.

     One character can correspond to one or more `Bufbyte's.  In the
     current Mule implementation, an ASCII character is represented by
     the same `Bufbyte', and other characters are represented by a
     sequence of two or more `Bufbyte's.

     Without Mule support, there are exactly 256 characters, implicitly
     Latin-1, and each character is represented using one `Bufbyte', and
     there is a one-to-one correspondence between `Bufbyte's and
     `Emchar's.

`Bufpos'
`Charcount'
     A `Bufpos' represents a character position in a buffer or string.
     A `Charcount' represents a number (count) of characters.
     Logically, subtracting two `Bufpos' values yields a `Charcount'
     value.  Although all of these are `typedef'ed to `EMACS_INT', we
     use them in preference to `EMACS_INT' to make it clear what sort
     of position is being used.

     `Bufpos' and `Charcount' values are the only ones that are ever
     visible to Lisp.

`Bytind'
`Bytecount'
     A `Bytind' represents a byte position in a buffer or string.  A
     `Bytecount' represents the distance between two positions, in
     bytes.  The relationship between `Bytind' and `Bytecount' is the
     same as the relationship between `Bufpos' and `Charcount'.

`Extbyte'
`Extcount'
     When dealing with the outside world, XEmacs works with `Extbyte's,
     which are equivalent to `unsigned char'.  Obviously, an `Extcount'
     is the distance between two `Extbyte's.  Extbytes and Extcounts
     are not all that frequent in XEmacs code.


File: internals.info,  Node: Working With Character and Byte Positions,  Next: Conversion to and from External Data,  Prev: Character-Related Data Types,  Up: Coding for Mule

8.7.2 Working With Character and Byte Positions
-----------------------------------------------

Now that we have defined the basic character-related types, we can look
at the macros and functions designed for work with them and for
conversion between them.  Most of these macros are defined in
`buffer.h', and we don't discuss all of them here, but only the most
important ones.  Examining the existing code is the best way to learn
about them.

`MAX_EMCHAR_LEN'
     This preprocessor constant is the maximum number of buffer bytes to
     represent an Emacs character in the variable width internal
     encoding.  It is useful when allocating temporary strings to keep
     a known number of characters.  For instance:

          {
            Charcount cclen;
            ...
            {
              /* Allocate place for CCLEN characters. */
              Bufbyte *buf = (Bufbyte *)alloca (cclen * MAX_EMCHAR_LEN);
          ...

     If you followed the previous section, you can guess that,
     logically, multiplying a `Charcount' value with `MAX_EMCHAR_LEN'
     produces a `Bytecount' value.

     In the current Mule implementation, `MAX_EMCHAR_LEN' equals 4.
     Without Mule, it is 1.

`charptr_emchar'
`set_charptr_emchar'
     The `charptr_emchar' macro takes a `Bufbyte' pointer and returns
     the `Emchar' stored at that position.  If it were a function, its
     prototype would be:

          Emchar charptr_emchar (Bufbyte *p);

     `set_charptr_emchar' stores an `Emchar' to the specified byte
     position.  It returns the number of bytes stored:

          Bytecount set_charptr_emchar (Bufbyte *p, Emchar c);

     It is important to note that `set_charptr_emchar' is safe only for
     appending a character at the end of a buffer, not for overwriting a
     character in the middle.  This is because the width of characters
     varies, and `set_charptr_emchar' cannot resize the string if it
     writes, say, a two-byte character where a single-byte character
     used to reside.

     A typical use of `set_charptr_emchar' can be demonstrated by this
     example, which copies characters from buffer BUF to a temporary
     string of Bufbytes.

          {
            Bufpos pos;
            for (pos = beg; pos < end; pos++)
              {
                Emchar c = BUF_FETCH_CHAR (buf, pos);
                p += set_charptr_emchar (p, c);
              }
          }

     Note how `set_charptr_emchar' is used to store the `Emchar' and
     advance the pointer `p' at the same time.

`INC_CHARPTR'
`DEC_CHARPTR'
     These two macros increment and decrement a `Bufbyte' pointer,
     respectively.  They will adjust the pointer by the appropriate
     number of bytes according to the byte length of the character
     stored there.  Both macros assume that the memory address is
     located at the beginning of a valid character.

     Without Mule support, `INC_CHARPTR (p)' and `DEC_CHARPTR (p)'
     simply expand to `p++' and `p--', respectively.

`bytecount_to_charcount'
     Given a pointer to a text string and a length in bytes, return the
     equivalent length in characters.

          Charcount bytecount_to_charcount (Bufbyte *p, Bytecount bc);

`charcount_to_bytecount'
     Given a pointer to a text string and a length in characters,
     return the equivalent length in bytes.

          Bytecount charcount_to_bytecount (Bufbyte *p, Charcount cc);

`charptr_n_addr'
     Return a pointer to the beginning of the character offset CC (in
     characters) from P.

          Bufbyte *charptr_n_addr (Bufbyte *p, Charcount cc);
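
   The macros above can be hard to picture without a concrete encoding
in hand.  The following self-contained sketch mimics them using UTF-8
as a stand-in for the internal encoding.  This is an assumption made
purely for illustration: the real Mule encoding is different, and the
names `put_char', `append_chars', `inc_charptr', `dec_charptr',
`bytes_to_chars', `chars_to_bytes' and `MAX_CHAR_LEN' are hypothetical.
Signed `ptrdiff_t' counts are used, in keeping with the advice on
unsigned types.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHAR_LEN 4          /* worst-case bytes per character here */

/* True for a byte that starts a character (not 10xxxxxx).  */
static int
is_lead (unsigned char b)
{
  return (b & 0xC0) != 0x80;
}

/* Like set_charptr_emchar: store C at P, return the bytes written.  */
ptrdiff_t
put_char (unsigned char *p, int c)
{
  assert (c >= 0 && c <= 0x10FFFF);
  if (c < 0x80)
    {
      p[0] = (unsigned char) c;
      return 1;
    }
  if (c < 0x800)
    {
      p[0] = (unsigned char) (0xC0 | (c >> 6));
      p[1] = (unsigned char) (0x80 | (c & 0x3F));
      return 2;
    }
  if (c < 0x10000)
    {
      p[0] = (unsigned char) (0xE0 | (c >> 12));
      p[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
      p[2] = (unsigned char) (0x80 | (c & 0x3F));
      return 3;
    }
  p[0] = (unsigned char) (0xF0 | (c >> 18));
  p[1] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
  p[2] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
  p[3] = (unsigned char) (0x80 | (c & 0x3F));
  return 4;
}

/* Append N characters; OUT needs room for N * MAX_CHAR_LEN bytes.  */
ptrdiff_t
append_chars (unsigned char *out, const int *chars, ptrdiff_t n)
{
  unsigned char *p = out;
  ptrdiff_t i;
  for (i = 0; i < n; i++)
    p += put_char (p, chars[i]);        /* store and advance in one step */
  return p - out;
}

/* Like INC_CHARPTR: step over the character starting at P.  The text
   must be followed by a lead byte, such as a terminating NUL.  */
const unsigned char *
inc_charptr (const unsigned char *p)
{
  assert (is_lead (*p));
  do
    p++;
  while (!is_lead (*p));
  return p;
}

/* Like DEC_CHARPTR: back up to the start of the previous character.
   P must not be at the beginning of the text.  */
const unsigned char *
dec_charptr (const unsigned char *p)
{
  do
    p--;
  while (!is_lead (*p));
  return p;
}

/* Like bytecount_to_charcount: characters in the first BC bytes.  */
ptrdiff_t
bytes_to_chars (const unsigned char *p, ptrdiff_t bc)
{
  ptrdiff_t i, cc = 0;
  for (i = 0; i < bc; i++)
    cc += is_lead (p[i]);
  return cc;
}

/* Like charcount_to_bytecount: bytes taken by the first CC characters.
   The text must hold at least CC characters.  */
ptrdiff_t
chars_to_bytes (const unsigned char *p, ptrdiff_t cc)
{
  const unsigned char *q = p;
  while (cc-- > 0)
    q = inc_charptr (q);
  return q - p;
}
```

   In this sketch the equivalent of `charptr_n_addr (p, cc)' is simply
`p + chars_to_bytes (p, cc)'.  Note that every conversion between the
two kinds of count has to walk the text, which is why byte and
character positions should not be mixed casually.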


File: internals.info,  Node: Conversion to and from External Data,  Next: General Guidelines for Writing Mule-Aware Code,  Prev: Working With Character and Byte Positions,  Up: Coding for Mule

8.7.3 Conversion to and from External Data
------------------------------------------

When an external function, such as a C library function, returns a
`char' pointer, you should almost never treat it as `Bufbyte'.  This is
because these returned strings may contain 8bit characters which can be
misinterpreted by XEmacs, and cause a crash.  Likewise, when exporting
a piece of internal text to the outside world, you should always
convert it to an appropriate external encoding, lest the internal stuff
(such as the infamous \201 characters) leak out.

   The interface for converting between the internal and external
representations of text is the set of conversion macros defined in
`buffer.h'.  There used to be a fixed set of external formats supported
by these macros, but now any coding system can be used with these
macros.  The coding system alias mechanism is used to create the
following logical coding systems, which replace the fixed external
formats.  The (dontusethis-set-symbol-value-handler) mechanism was
enhanced to make this possible (more work on that is needed, such as
removing the `dontusethis-' prefix).

`Qbinary'
     This is the simplest format and is what we use in the absence of a
     more appropriate format.  This converts according to the `binary'
     coding system:

       a. On input, bytes 0-255 are converted into (implicitly Latin-1)
          characters 0-255.  A non-Mule xemacs doesn't really know about
          different character sets and the fonts to display them, so
          the bytes can be treated as text in different 1-byte
          encodings by simply setting the appropriate fonts.  So in a
          sense, non-Mule xemacs is a multi-lingual editor if, for
          example, different fonts are used to display text in
          different buffers, faces, or windows.  The specifier
          mechanism gives the user complete control over this kind of
          behavior.

       b. On output, characters 0-255 are converted into bytes 0-255
          and other characters are converted into `~'.

`Qfile_name'
     Format used for filenames.  This is user-definable via either the
     `file-name-coding-system' or `pathname-coding-system' (now
     obsolete) variables.

`Qnative'
     Format used for the external Unix environment--`argv[]', stuff
     from `getenv()', stuff from the `/etc/passwd' file, etc.
     Currently this is the same as Qfile_name.  The two should be
     distinguished for clarity and possible future separation.

`Qctext'
     Compound-text format.  This is the standard X11 format used for
     data stored in properties, selections, and the like.  This is an
     8-bit no-lock-shift ISO2022 coding system.  This is a real coding
     system, unlike Qfile_name, which is user-definable.
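
   The two conversion rules given for `Qbinary' above are simple enough
to write down directly.  The sketch below is only an illustration of
those rules, not the coding-system code itself; the function names are
hypothetical, and an `int' stands in for an `Emchar'.

```c
#include <assert.h>

/* Output rule: characters 0-255 map to the same byte value, and any
   other character is replaced by `~'.  */
unsigned char
binary_encode_char (int c)
{
  assert (c >= 0);
  return c <= 255 ? (unsigned char) c : (unsigned char) '~';
}

/* Input rule: each byte becomes the character with the same number.  */
int
binary_decode_byte (unsigned char b)
{
  return (int) b;
}
```

   Decoding followed by encoding therefore reproduces any byte
unchanged, while encoding a character outside the range 0-255 loses
information.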

   There are two fundamental macros to convert between external and
internal format.

   `TO_INTERNAL_FORMAT' converts external data to internal format, and
`TO_EXTERNAL_FORMAT' converts the other way around.  The arguments each
of these receives are a source type, a source, a sink type, a sink, and
a coding system (or a symbol naming a coding system).

   A typical call looks like
     TO_EXTERNAL_FORMAT (LISP_STRING, str, C_STRING_MALLOC, ptr, Qfile_name);

   which means that the contents of the lisp string `str' are written
to a malloc'ed memory area which will be pointed to by `ptr', after the
function returns.  The conversion will be done using the `file-name'
coding system, which will be controlled by the user indirectly by
setting or binding the variable `file-name-coding-system'.

   Some sources and sinks require two C variables to specify.  We use
some preprocessor magic to allow different source and sink types, and
even different numbers of arguments to specify different types of
sources and sinks.

   So we can have a call that looks like
     TO_INTERNAL_FORMAT (DATA, (ptr, len),
                         MALLOC, (ptr, len),
                         coding_system);

   The parenthesized argument pairs are required to make the
preprocessor magic work.

   Here are the different source and sink types:

``DATA, (ptr, len),''
     input data is a fixed buffer of size LEN at address PTR

``ALLOCA, (ptr, len),''
     output data is placed in an alloca()ed buffer of size LEN pointed
     to by PTR

``MALLOC, (ptr, len),''
     output data is in a malloc()ed buffer of size LEN pointed to by PTR

``C_STRING_ALLOCA, ptr,''
     equivalent to `ALLOCA (ptr, len_ignored)' on output.

``C_STRING_MALLOC, ptr,''
     equivalent to `MALLOC (ptr, len_ignored)' on output

``C_STRING, ptr,''
     equivalent to `DATA, (ptr, strlen (ptr) + 1)' on input

``LISP_STRING, string,''
     input or output is a Lisp_Object of type string

``LISP_BUFFER, buffer,''
     output is written to `(point)' in lisp buffer BUFFER

``LISP_LSTREAM, lstream,''
     input or output is a Lisp_Object of type lstream

``LISP_OPAQUE, object,''
     input or output is a Lisp_Object of type opaque

   Often, the data is being converted to a '\0'-byte-terminated string,
which is the format required by many external system C APIs.  For these
purposes, a source type of `C_STRING' or a sink type of
`C_STRING_ALLOCA' or `C_STRING_MALLOC' is appropriate.  Otherwise, we
should try to keep XEmacs '\0'-byte-clean, which means using (ptr, len)
pairs.

   The sinks to be specified must be lvalues, unless they are the lisp
object types `LISP_LSTREAM' or `LISP_BUFFER'.

   For the sink types `ALLOCA' and `C_STRING_ALLOCA', the resulting
text is stored in a stack-allocated buffer, which is automatically
freed on returning from the function.  However, the sink types `MALLOC'
and `C_STRING_MALLOC' return `xmalloc()'ed memory.  The caller is
responsible for freeing this memory using `xfree()'.
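
   The difference between a (ptr, len) pair and a '\0'-terminated
string, and the ownership rule for malloc-style sinks, can be seen in
a toy example.  This sketch does not use the conversion macros at all:
`copy_to_malloc' is a hypothetical helper that performs no encoding
conversion, and it uses plain `malloc()' and `free()' where XEmacs code
would use `xmalloc()' and `xfree()'.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy "MALLOC" sink: copy LEN bytes into fresh memory and report the
   length through LEN_OUT.  The caller owns the result and must free
   it.  Returns NULL if memory cannot be allocated.  */
unsigned char *
copy_to_malloc (const unsigned char *src, ptrdiff_t len, ptrdiff_t *len_out)
{
  unsigned char *dst;

  assert (len >= 0);
  dst = malloc ((size_t) len + 1);
  if (dst == NULL)
    return NULL;
  memcpy (dst, src, (size_t) len);
  dst[len] = '\0';              /* convenience terminator, not counted */
  *len_out = len;
  return dst;
}
```

   If the five bytes `a', `b', 0, `c', `d' are copied this way, the
reported length is 5, but `strlen()' on the result sees only 2 bytes.
Code that relies on the terminator silently drops everything after the
embedded zero byte, which is why (ptr, len) pairs are preferred.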

   Note that it doesn't make sense for `LISP_STRING' to be a source for
`TO_INTERNAL_FORMAT' or a sink for `TO_EXTERNAL_FORMAT'.  You'll get an
assertion failure if you try.


File: internals.info,  Node: General Guidelines for Writing Mule-Aware Code,  Next: An Example of Mule-Aware Code,  Prev: Conversion to and from External Data,  Up: Coding for Mule

8.7.4 General Guidelines for Writing Mule-Aware Code
----------------------------------------------------

This section contains some general guidance on how to write Mule-aware
code, as well as some pitfalls you should avoid.

_Never use `char' and `char *'._
     In XEmacs, the use of `char' and `char *' is almost always a
     mistake.  If you want to manipulate an Emacs character from "C",
     use `Emchar'.  If you want to examine a specific octet in the
     internal format, use `Bufbyte'.  If you want a Lisp-visible
     character, use a `Lisp_Object' and `make_char'.  If you want a
     pointer to move through the internal text, use `Bufbyte *'.  Also
     note that you almost certainly do not need `Emchar *'.

_Be careful not to confuse `Charcount', `Bytecount', and `Bufpos'._
     The whole point of using different types is to avoid confusion
     about the use of certain variables.  Lest this effect be
     nullified, you need to be careful about using the right types.

_Always convert external data_
     It is extremely important to always convert external data, because
     XEmacs can crash if unexpected 8bit sequences are copied to its
     internal buffers literally.

     This means that when a system function, such as `readdir', returns
     a string, you may need to convert it using one of the conversion
     macros described in the previous section, before passing it
     further to Lisp.

     Actually, most of the basic system functions that accept
     '\0'-terminated string arguments, like `stat()' and `open()', have
     been *encapsulated* so that they `always' do internal to external
     conversion themselves.  This means you must pass internally
     encoded data, typically the `XSTRING_DATA' of a Lisp_String, to
     these functions.  This is actually a design bug,
     since it unexpectedly changes the semantics of the system
     functions.  A better design would be to provide separate versions
     of these system functions that accepted Lisp_Objects which were
     lisp strings in place of their current `char *' arguments.

          int stat_lisp (Lisp_Object path, struct stat *buf); /* Implement me */

     Also note that many internal functions, such as `make_string',
     accept Bufbytes, which removes the need for them to convert the
     data they receive.  This increases efficiency because that way
     external data needs to be decoded only once, when it is read.
     After that, it is passed around in internal format.


File: internals.info,  Node: An Example of Mule-Aware Code,  Prev: General Guidelines for Writing Mule-Aware Code,  Up: Coding for Mule

8.7.5 An Example of Mule-Aware Code
-----------------------------------

As an example of Mule-aware code, we will analyze the `string'
function, which conses up a Lisp string from the character arguments it
receives.  Here is the definition, pasted from `alloc.c':

     DEFUN ("string", Fstring, 0, MANY, 0, /*
     Concatenate all the argument characters and make the result a string.
     */
            (int nargs, Lisp_Object *args))
     {
       Bufbyte *storage = alloca_array (Bufbyte, nargs * MAX_EMCHAR_LEN);
       Bufbyte *p = storage;

       for (; nargs; nargs--, args++)
         {
           Lisp_Object lisp_char = *args;
           CHECK_CHAR_COERCE_INT (lisp_char);
           p += set_charptr_emchar (p, XCHAR (lisp_char));
         }
       return make_string (storage, p - storage);
     }

   Now we can analyze the source line by line.

   Obviously, the string will contain as many characters as there are
arguments to the function.  This is why we allocate `MAX_EMCHAR_LEN' *
NARGS bytes on the stack, i.e. the worst-case number of bytes for NARGS
`Emchar's to fit in the string.

   Then, the loop checks that each element is a character, converting
integers in the process.  Like many other functions in XEmacs, this
function silently accepts integers where characters are expected, for
historical and compatibility reasons.  In new code, unless you know
you need this coercion, `CHECK_CHAR' will suffice.  `XCHAR (lisp_char)'
extracts the `Emchar' from the `Lisp_Object', and `set_charptr_emchar'
stores it to storage, increasing `p' in the process.

   Other instructive examples of correct coding under Mule can be found
all over the XEmacs code.  For starters, I recommend
`Fnormalize_menu_item_name' in `menubar.c'.  After you have understood
this section of the manual and studied the examples, you can proceed
to write new Mule-aware code.


File: internals.info,  Node: Techniques for XEmacs Developers,  Prev: Coding for Mule,  Up: Rules When Writing New C Code

8.8 Techniques for XEmacs Developers
====================================

To make a purified XEmacs, do: `make puremacs'.  To make a quantified
XEmacs, do: `make quantmacs'.

   You simply can't dump Quantified and Purified images (unless using
the portable dumper).  Purify gets confused when xemacs frees memory in
one process that was allocated in a _different_ process on a different
machine!  Instead, run the undumped `temacs' like so:
     temacs -batch -l loadup.el run-temacs XEMACS-ARGS...

   Before you go through the trouble, are you compiling with all
debugging and error-checking off?  If not, try that first.  Be warned
that while Quantify is directly responsible for quite a few
optimizations which have been made to XEmacs, doing a run which
generates results which can be acted upon is not necessarily a trivial
task.

   Also, if you're still willing to do some runs, make sure you configure
with the `--quantify' flag.  That will keep Quantify from starting to
record data until after the loadup is completed and will shut off
recording right before it shuts down (which generates enough bogus data
to throw most results off).  It also enables three additional elisp
commands: `quantify-start-recording-data',
`quantify-stop-recording-data' and `quantify-clear-data'.

   If you want to make XEmacs faster, target your favorite slow
benchmark, run a profiler like Quantify, `gprof', or `tcov', and figure
out where the cycles are going.  In many cases you can localize the
problem (because a particular new feature or even a single patch
elicited it).  Don't hesitate to use brute force techniques like a
global counter incremented at strategic places, especially in
combination with other performance indications (_e.g._, degree of
buffer fragmentation into extents).
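
   As a sketch of the brute-force counter technique, the following toy
program counts calls to a low-level routine.  Everything in it is
hypothetical: `scan_one' stands in for whatever routine you suspect,
and `count_scans_quadratic' is a deliberately bad caller that rescans
from the start of the text for every position.

```c
#include <assert.h>

/* Global counter incremented at a strategic place.  */
static long scan_calls;

/* Hypothetical low-level routine suspected of being called too often.  */
static int
scan_one (const char *p)
{
  scan_calls++;
  return *p == '\n';
}

/* Deliberately quadratic caller: for each position, rescan everything
   up to that position.  Returns the number of scan_one calls made.  */
long
count_scans_quadratic (const char *text, int len)
{
  int i, j;

  assert (len >= 0);
  scan_calls = 0;
  for (i = 0; i < len; i++)
    for (j = 0; j <= i; j++)
      (void) scan_one (text + j);
  return scan_calls;
}
```

   For a text of length N the counter ends up at N * (N + 1) / 2.
Watching how such a count grows as the input grows is often enough to
expose a quadratic algorithm before reaching for a real profiler.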

   Specific projects:

   * Make the garbage collector faster.  Figure out how to write an
     incremental garbage collector.

   * Write a compiler that takes bytecode and spits out C code.
     Unfortunately, you will then need a C compiler and a more fully
     developed module system.

   * Speed up redisplay.

   * Speed up syntax highlighting.  It was suggested that "maybe moving
     some of the syntax highlighting capabilities into C would make a
     difference."  Wrong idea, I think.  When processing one 400kB file
     a particular low-level routine was being called 40 _million_ times
     simply for _one_ call to `newline-and-indent'.  Syntax
     highlighting needs to be rewritten to use a reliable, fast parser,
     then to trust the pre-parsed structure, and only do
     re-highlighting locally to a text change.  Modern machines are
     fast enough to implement such parsers in Lisp; but no machine will
     ever be fast enough to deal with quadratic (or worse) algorithms!

   * Implement tail recursion in Emacs Lisp (hard!).

   Unfortunately, Emacs Lisp is slow, and is going to stay slow.
Function calls in elisp are especially expensive.  Iterating over a
long list is going to be 30 times faster implemented in C than in Elisp.

   Heavily used small code fragments need to be fast.  The traditional
way to implement such code fragments in C is with macros.  But macros
in C are known to be broken.

   Macro arguments that are repeatedly evaluated may suffer from
repeated side effects or suboptimal performance.

   Variable names used in macros may collide with caller's variables,
causing (at least) unwanted compiler warnings.

   In order to solve these problems, and maintain statement semantics,
one should use the `do { ... } while (0)' trick while trying to
reference macro arguments exactly once using local variables.

   Let's take a look at this poor macro definition:

     #define MARK_OBJECT(obj) \
       if (!marked_p (obj)) mark_object (obj), did_mark = 1

   This macro evaluates its argument twice, and also fails if used like
this:
       if (flag) MARK_OBJECT (obj); else do_something();

   A much better definition is

     #define MARK_OBJECT(obj) do { \
       Lisp_Object mo_obj = (obj); \
       if (!marked_p (mo_obj))     \
         {                         \
           mark_object (mo_obj);   \
           did_mark = 1;           \
         }                         \
     } while (0)

   Notice the elimination of double evaluation by using the local
variable with the obscure name.  Writing safe and efficient macros
requires great care.  One problem with macros cannot be portably
worked around: since a C block has no value, a macro used as an
expression rather than a statement cannot use the techniques just
described to avoid multiple evaluation.
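
   The double evaluation can be observed directly.  In the following
self-contained sketch, the helpers `next_object', `marked_p' and
`mark_object' are toy stand-ins (objects are plain `int's and nothing
is ever really marked), and the two macros are renamed `MARK_BAD' and
`MARK_GOOD' so that both can coexist in one file.

```c
#include <assert.h>

static int evaluations;         /* times the argument expression ran */
static int did_mark;

/* An argument expression with a visible side effect.  */
static int
next_object (void)
{
  evaluations++;
  return 0;
}

/* Toy stand-ins: object 0 is never marked.  */
static int
marked_p (int obj)
{
  return obj != 0;
}

static void
mark_object (int obj)
{
  (void) obj;
}

/* The poor definition: OBJ appears twice in the expansion.  */
#define MARK_BAD(obj) \
  if (!marked_p (obj)) mark_object (obj), did_mark = 1

/* The better definition: OBJ is evaluated once into a local.  */
#define MARK_GOOD(obj) do {     \
  int mo_obj = (obj);           \
  if (!marked_p (mo_obj))       \
    {                           \
      mark_object (mo_obj);     \
      did_mark = 1;             \
    }                           \
} while (0)

/* Return how many times each macro evaluated its argument.  */
int
count_evaluations_bad (void)
{
  evaluations = 0;
  MARK_BAD (next_object ());
  return evaluations;
}

int
count_evaluations_good (void)
{
  evaluations = 0;
  MARK_GOOD (next_object ());
  return evaluations;
}
```

   Here `count_evaluations_bad ()' returns 2 and
`count_evaluations_good ()' returns 1.  With a real argument such as
`*p++', the first form would silently skip an element.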

   In most cases where a macro has function semantics, an inline
function is a better implementation technique.  Modern compiler
optimizers tend to inline functions even if they have no `inline'
keyword, and configure magic ensures that the `inline' keyword can be
safely used as an additional compiler hint.  Inline functions used in a
single `.c' file are easy.  The function must already be defined as
`static'.  Just add the `inline' keyword to the definition.

     inline static int
     heavily_used_small_function (int arg)
     {
       ...
     }

   Inline functions in header files are trickier, because we would like
to make the following optimization if the function is _not_ inlined
(for example, because we're compiling for debugging).  We would like the
function to be defined externally exactly once, and each calling
translation unit would create an external reference to the function,
instead of including a definition of the inline function in the object
code of every translation unit that uses it.  This optimization is
currently only available for gcc.  But you don't have to worry about the
trickiness; just define your inline functions in header files using this
pattern:

     INLINE_HEADER int
     i_used_to_be_a_crufty_macro_but_look_at_me_now (int arg);
     INLINE_HEADER int
     i_used_to_be_a_crufty_macro_but_look_at_me_now (int arg)
     {
       ...
     }

   The declaration right before the definition is to prevent warnings
when compiling with `gcc -Wmissing-declarations'.  I consider issuing
this warning for inline functions a gcc bug, but the gcc maintainers
disagree.

   Every header which contains inline functions, either directly by
using `INLINE_HEADER' or indirectly by using `DECLARE_LRECORD', must be
added to `inline.c''s includes to make the optimization described above
work.  (Optimization note: if all `INLINE_HEADER' functions are in fact
inlined in all translation units, then the linker can just discard
`inline.o', since it contains only unreferenced code.)
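
   For comparison, ISO C99 offers a portable way to get the same
"exactly one external definition" effect.  The sketch below is _not_
the XEmacs `INLINE_HEADER' machinery, whose definition is not shown in
this manual.  It squeezes a hypothetical header and a hypothetical
counterpart of `inline.c' into one file, assumes a compiler running in
C99 mode or later, and uses the made-up function `twice'.

```c
#include <assert.h>

/* --- what the header would contain --- */

/* An inline definition.  Under C99 rules this alone does not emit an
   externally visible copy of the function.  */
inline int
twice (int x)
{
  return 2 * x;
}

/* --- what the one designated .c file would contain --- */

/* Redeclaring the function with `extern' makes this translation unit
   provide the single external definition that any call which was not
   inlined will link against.  */
extern inline int twice (int x);

/* Calling through a pointer forces use of the out-of-line copy. */
static int
call_out_of_line (int x)
{
  int (*fn) (int) = twice;

  assert (fn != 0);
  return fn (x);
}
```

   The design goal is the same as the one described above: every
translation unit sees the body and may inline it, but the object code
for the function exists in only one place.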

   To get started debugging XEmacs, take a look at the `.gdbinit' and
`.dbxrc' files in the `src' directory.  See the section in the XEmacs
FAQ on How to Debug an XEmacs problem with a debugger.

   After making source code changes, run `make check' to ensure that
you haven't introduced any regressions.  If you want to make XEmacs more
reliable, please improve the test suite in `tests/automated'.

   Did you make sure you didn't introduce any new compiler warnings?

   Before submitting a patch, please try compiling at least once with

     configure --with-mule --use-union-type --error-checking=all

   Here are things to know when you create a new source file:

   * All `.c' files should `#include <config.h>' first.  Almost all
     `.c' files should `#include "lisp.h"' second.

   * Generated header files should be included using the `#include
     <...>' syntax, not the `#include "..."' syntax.  The generated
     headers are:

     `config.h sheap-adjust.h paths.h Emacs.ad.h'

     The basic rule is that you should assume builds using `--srcdir'
     and the `#include <...>' syntax needs to be used when the
     to-be-included generated file is in a potentially different
     directory _at compile time_.  The non-obvious C rule is that
     `#include "..."' means to search for the included file in the same
     directory as the including file, _not_ in the current directory.
     Normally this is not a problem but when building with `--srcdir',
     `make' will search the `VPATH' for you, while the C compiler knows
     nothing about it.

   * Header files should _not_ include `<config.h>' or `"lisp.h"'.  It
     is the responsibility of the `.c' files that use them to do so.


   Here is a checklist of things to do when creating a new lisp object
type named FOO:

  1. create FOO.h

  2. create FOO.c

  3. add definitions of `syms_of_FOO', etc. to `FOO.c'

  4. add declarations of `syms_of_FOO', etc. to `symsinit.h'

  5. add calls to `syms_of_FOO', etc. to `emacs.c'

  6. add definitions of macros like `CHECK_FOO' and `FOOP' to `FOO.h'

  7. add the new type index to `enum lrecord_type'

  8. add a DEFINE_LRECORD_IMPLEMENTATION call to `FOO.c'

  9. add an INIT_LRECORD_IMPLEMENTATION call to `syms_of_FOO' in `FOO.c'


File: internals.info,  Node: Regression Testing XEmacs,  Next: A Summary of the Various XEmacs Modules,  Prev: Rules When Writing New C Code,  Up: Top

9 Regression Testing XEmacs
***************************

The source directory `tests/automated' contains XEmacs' automated test
suite.  The usual way of running all the tests is running `make check'
from the top-level source directory.

   The test suite is unfinished and it's still lacking some essential
features.  It is nevertheless recommended that you run the tests to
confirm that XEmacs behaves correctly.

   If you want to run a specific test case, you can do it from the
command-line like this:

     $ xemacs -batch -l test-harness.elc -f batch-test-emacs TEST-FILE

   If something goes wrong, you can run the test suite interactively by
loading `test-harness.el' into a running XEmacs and typing `M-x
test-emacs-test-file RET <filename> RET'.  You will see a log of passed
and failed tests, which should allow you to investigate the source of
the error and ultimately fix the bug.

   Adding a new test file is trivial: just create a new file in
`tests/automated' and it will be run.  There is no need to byte-compile
any of the files in this directory--the test-harness will take care of
any necessary byte-compilation.

   Look at the existing test cases for examples of how to code test
cases.  It all boils down to your imagination and judicious use of the
macros `Assert', `Check-Error', `Check-Error-Message', and
`Check-Message'.

   Here's a simple example checking case-sensitive and case-insensitive
comparisons from `case-tests.el'.

     (with-temp-buffer
       (insert "Test Buffer")
       (let ((case-fold-search t))
         (goto-char (point-min))
         (Assert (eq (search-forward "test buffer" nil t) 12))
         (goto-char (point-min))
         (Assert (eq (search-forward "Test buffer" nil t) 12))
         (goto-char (point-min))
         (Assert (eq (search-forward "Test Buffer" nil t) 12))

         (setq case-fold-search nil)
         (goto-char (point-min))
         (Assert (not (search-forward "test buffer" nil t)))
         (goto-char (point-min))
         (Assert (not (search-forward "Test buffer" nil t)))
         (goto-char (point-min))
         (Assert (eq (search-forward "Test Buffer" nil t) 12))))

   This example could be inserted in a file in `tests/automated', and
it would be a complete test, automatically executed when you run `make
check' after building XEmacs.  More complex tests may require
substantial temporary scaffolding to create the environment that elicits
the bugs, but the top-level Makefile and `test-harness.el' handle the
running and collection of results from the `Assert', `Check-Error',
`Check-Error-Message', and `Check-Message' macros.

   In general, you should avoid using functionality from packages in
your tests, because you can't be sure that everyone will have the
required package.  However, if you've got a test that works, by all
means add it.  Simply wrap the test in an appropriate conditional, add
a notice that the test was skipped, and update the
`skipped-test-reasons' hashtable.  Here's an example from
`syntax-tests.el':

     ;; Test forward-comment at buffer boundaries
     (with-temp-buffer

       ;; try to use exactly what you need: featurep, boundp, fboundp
       (if (not (fboundp 'c-mode))

           ;; We should provide a standard function for this boilerplate,
           ;; probably called `Skip-Test' -- check for that API with C-h f
           (let* ((reason "c-mode unavailable")
                  (count (gethash reason skipped-test-reasons)))
             (puthash reason (if (null count) 1 (1+ count))
                      skipped-test-reasons)
             (Print-Skip "comment and parse-partial-sexp tests" reason))

         ;; and here's the test code
         (c-mode)
         (insert "// comment\n")
         (forward-comment -2)
         (Assert (eq (point) (point-min)))
         (let ((point (point)))
           (insert "/* comment */")
           (goto-char point)
           (forward-comment 2)
           (Assert (eq (point) (point-max)))
           (parse-partial-sexp point (point-max)))))

   `Skip-Test' is intended for use with features that are normally
present in typical configurations.  For truly optional features, or
tests that apply to one of several alternative implementations (e.g., to
GTK widgets, but not Athena, Motif, MS Windows, or Carbon), simply
silently omit the test.


File: internals.info,  Node: A Summary of the Various XEmacs Modules,  Next: Allocation of Objects in XEmacs Lisp,  Prev: Regression Testing XEmacs,  Up: Top

10 A Summary of the Various XEmacs Modules
******************************************

This is accurate as of XEmacs 20.0.

* Menu:

* Low-Level Modules::
* Basic Lisp Modules::
* Modules for Standard Editing Operations::
* Editor-Level Control Flow Modules::
* Modules for the Basic Displayable Lisp Objects::
* Modules for other Display-Related Lisp Objects::
* Modules for the Redisplay Mechanism::
* Modules for Interfacing with the File System::
* Modules for Other Aspects of the Lisp Interpreter and Object System::
* Modules for Interfacing with the Operating System::
* Modules for Interfacing with X Windows::
* Modules for Internationalization::
* Modules for Regression Testing::


File: internals.info,  Node: Low-Level Modules,  Next: Basic Lisp Modules,  Up: A Summary of the Various XEmacs Modules

10.1 Low-Level Modules
======================

     config.h

   This is automatically generated from `config.h.in' based on the
results of configure tests and user-selected optional features and
contains preprocessor definitions specifying the nature of the
environment in which XEmacs is being compiled.

     paths.h

   This is automatically generated from `paths.h.in' based on supplied
configure values, and allows for non-standard installed configurations
of the XEmacs directories.  It's currently broken, though.

     emacs.c
     signal.c

   `emacs.c' contains `main()' and other code that performs the most
basic environment initializations and handles shutting down the XEmacs
process (this includes `kill-emacs', the normal way that XEmacs is
exited; `dump-emacs', which is used during the build process to write
out the XEmacs executable; `run-emacs-from-temacs', which can be used
to start XEmacs directly when temacs has finished loading all the Lisp
code; and emergency code to handle crashes [XEmacs tries to auto-save
all files before it crashes]).

   Low-level code that directly interacts with the Unix signal
mechanism, however, is in `signal.c'.  Note that this code does not
handle system dependencies in interfacing to signals; that is handled
using the `syssignal.h' header file, described under Modules for
Interfacing with the Operating System, below.

     unexaix.c
     unexalpha.c
     unexapollo.c
     unexconvex.c
     unexec.c
     unexelf.c
     unexelfsgi.c
     unexencap.c
     unexenix.c
     unexfreebsd.c
     unexfx2800.c
     unexhp9k3.c
     unexhp9k800.c
     unexmips.c
     unexnext.c
     unexsol2.c
     unexsunos4.c

   These modules contain code for dumping out the XEmacs executable on
various different systems. (This process is highly machine-specific and
requires intimate knowledge of the executable format and the memory map
of the process.) Only one of these modules is actually used; this is
chosen by `configure'.

     ecrt0.c
     lastfile.c
     pre-crt0.c

   These modules are used in conjunction with the dump mechanism.  On
some systems, an alternative version of the C startup code (the actual
code that receives control from the operating system when the process is
started, and which calls `main()') is required so that the dumping
process works properly; `ecrt0.c' provides this.

   `pre-crt0.c' and `lastfile.c' should be the very first and very last
file linked, respectively. (Actually, this is not really true.
`lastfile.c' should be after all Emacs modules whose initialized data
should be made constant, and before all other Emacs files and all
libraries.  In particular, the allocation modules `gmalloc.c',
`alloca.c', etc. are normally placed past `lastfile.c', and all of the
files that implement Xt widget classes _must_ be placed after
`lastfile.c' because they contain various structures that must be
statically initialized and into which Xt writes at various times.)
`pre-crt0.c' and `lastfile.c' contain exported symbols that are used to
determine the start and end of XEmacs' initialized data space when
dumping.

     alloca.c
     free-hook.c
     getpagesize.h
     gmalloc.c
     malloc.c
     mem-limits.h
     ralloc.c
     vm-limit.c

   These handle basic C allocation of memory.  `alloca.c' is an
emulation of the stack allocation function `alloca()' on machines that
lack this. (XEmacs makes extensive use of `alloca()' in its code.)

   `gmalloc.c' and `malloc.c' are two implementations of the standard C
functions `malloc()', `realloc()' and `free()'.  They are often used in
place of the standard system-provided `malloc()' because they usually
provide a much faster implementation, at the expense of additional
memory use.  `gmalloc.c' is a newer implementation that is much more
memory-efficient for large allocations than `malloc.c', and should
always be preferred if it works. (At one point, `gmalloc.c' didn't work
on some systems where `malloc.c' worked; but this should be fixed now.)

   `ralloc.c' is the "relocating allocator".  It provides functions
similar to `malloc()', `realloc()' and `free()' that allocate memory
that can be dynamically relocated in memory.  The advantage of this is
that allocated memory can be shuffled around to place all the free
memory at the end of the heap, and the heap can then be shrunk,
releasing the memory back to the operating system.  The use of this can
be controlled with the configure option `--rel-alloc'; if enabled,
memory allocated for buffers will be relocatable, so that if a very
large file is visited and the buffer is later killed, the memory can be
released to the operating system.  (The disadvantage of this mechanism
is that it can be very slow.  On systems with the `mmap()' system call,
the XEmacs version of `ralloc.c' uses this to move memory around
without actually having to block-copy it, which can speed things up;
but it can still cause noticeable performance degradation.)
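
   The idea behind a relocating allocator can be sketched in a few
lines.  The code below is a toy, not the `ralloc.c' implementation: all
`toy_' names are invented for illustration, the arena is a fixed static
array, and nothing is ever returned to the operating system.  What it
does show is the key design point: the client registers the address of
its own pointer variable, so that the allocator can update that
variable when it slides blocks down to close a gap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_ARENA_SIZE 256
#define TOY_MAX_BLOCKS 8

struct toy_block
{
  void **handle;   /* the client's pointer variable */
  size_t offset;   /* start of the block within the arena */
  size_t size;     /* length of the block in bytes */
};

static unsigned char toy_arena[TOY_ARENA_SIZE];
static struct toy_block toy_blocks[TOY_MAX_BLOCKS]; /* in address order */
static int toy_n_blocks;
static size_t toy_arena_used;

/* Allocate SIZE bytes at the end of the used area and store the
   address in *HANDLE.  Returns 1 on success, 0 on failure.  */
static int
toy_r_alloc (void **handle, size_t size)
{
  struct toy_block *b;

  if (toy_n_blocks == TOY_MAX_BLOCKS || size == 0
      || size > TOY_ARENA_SIZE - toy_arena_used)
    return 0;
  b = &toy_blocks[toy_n_blocks++];
  b->handle = handle;
  b->offset = toy_arena_used;
  b->size = size;
  *handle = toy_arena + toy_arena_used;
  toy_arena_used += size;
  return 1;
}

/* Free the block registered under HANDLE, slide every later block
   down over the gap, and fix up the pointer variable of each moved
   block.  */
static void
toy_r_free (void **handle)
{
  int i, found = -1;
  size_t gap;

  for (i = 0; i < toy_n_blocks; i++)
    if (toy_blocks[i].handle == handle)
      found = i;
  assert (found >= 0);
  gap = toy_blocks[found].size;
  for (i = found + 1; i < toy_n_blocks; i++)
    {
      struct toy_block moved = toy_blocks[i];

      memmove (toy_arena + moved.offset - gap,
               toy_arena + moved.offset, moved.size);
      moved.offset -= gap;
      *moved.handle = toy_arena + moved.offset;
      toy_blocks[i - 1] = moved;
    }
  toy_n_blocks--;
  toy_arena_used -= gap;
  *handle = NULL;
}
```

   Because blocks move, a client must never keep a second copy of the
address across a call that may relocate memory; it always has to go
through the registered variable.  The block copy done by `memmove()'
here is also where the cost mentioned above comes from.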

   `free-hook.c' contains some debugging functions for checking for
invalid arguments to `free()'.

   `vm-limit.c' contains some functions that warn the user when memory
is getting low.  These are callback functions that are called by
`gmalloc.c' and `malloc.c' at appropriate times.

   `getpagesize.h' provides a uniform interface for retrieving the size
of a page in virtual memory.  `mem-limits.h' provides a uniform
interface for retrieving the total amount of available virtual memory.
Both are similar in spirit to the `sys*.h' files described under
Modules for Interfacing with the Operating System, below.

     blocktype.c
     blocktype.h
     dynarr.c

   These implement a couple of basic C data types to facilitate memory
allocation.  The `Blocktype' type efficiently manages the allocation of
fixed-size blocks by minimizing the number of times that `malloc()' and
`free()' are called.  It allocates memory in large chunks, subdivides
the chunks into blocks of the proper size, and returns the blocks as
requested.  When blocks are freed, they are placed onto a linked list,
so they can be efficiently reused.  This data type is not much used in
XEmacs currently, because it's a fairly new addition.
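
   The following sketch shows the chunk-and-free-list technique just
described.  It is an illustration only: the `toy_blocktype' names and
the chunk size of 64 blocks are assumptions and are not taken from
`blocktype.c'.  For brevity the chunks are never given back to
`malloc()', and blocks are only guaranteed to be aligned well enough to
hold a pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_BLOCKS_PER_CHUNK 64

/* A free block is reused to hold the link to the next free block. */
struct toy_free_block
{
  struct toy_free_block *next;
};

struct toy_blocktype
{
  size_t block_size;                /* size of each block handed out */
  struct toy_free_block *free_list; /* blocks available for reuse */
  int chunks_allocated;             /* number of malloc() calls so far */
};

static void
toy_blocktype_init (struct toy_blocktype *bt, size_t block_size)
{
  size_t unit = sizeof (struct toy_free_block);

  assert (block_size > 0);
  /* Round up so that every block can hold a free-list link. */
  bt->block_size = (block_size + unit - 1) / unit * unit;
  bt->free_list = NULL;
  bt->chunks_allocated = 0;
}

static void *
toy_blocktype_alloc (struct toy_blocktype *bt)
{
  struct toy_free_block *b;

  if (bt->free_list == NULL)
    {
      /* Out of blocks: fetch one large chunk and thread every block
         in it onto the free list.  */
      char *chunk = malloc (bt->block_size * TOY_BLOCKS_PER_CHUNK);
      int i;

      if (chunk == NULL)
        return NULL;
      bt->chunks_allocated++;
      for (i = 0; i < TOY_BLOCKS_PER_CHUNK; i++)
        {
          b = (struct toy_free_block *) (chunk + i * bt->block_size);
          b->next = bt->free_list;
          bt->free_list = b;
        }
    }
  b = bt->free_list;
  bt->free_list = b->next;
  return b;
}

/* Freeing just pushes the block back onto the list. */
static void
toy_blocktype_free (struct toy_blocktype *bt, void *block)
{
  struct toy_free_block *b = block;

  b->next = bt->free_list;
  bt->free_list = b;
}
```

   Allocating and freeing are both a couple of pointer assignments, and
a block that was just freed is the first one to be handed out again.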

   The `Dynarr' type implements a "dynamic array", which is similar to
a standard C array but has no fixed limit on the number of elements it
can contain.  Dynamic arrays can hold elements of any type, and when
you add a new element, the array automatically resizes itself if it
isn't big enough.  Dynarrs are extensively used in the redisplay
mechanism.
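
   A dynamic array of this kind can be sketched as follows.  The
`toy_dynarr' names, the starting capacity of 4 and the doubling policy
are assumptions made for the example; they are not the interface or
the growth policy of `dynarr.c'.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct toy_dynarr
{
  char *base;       /* storage, or NULL while the array is empty */
  size_t elsize;    /* size of one element in bytes */
  size_t length;    /* number of elements in use */
  size_t capacity;  /* number of elements that fit before a resize */
};

static void
toy_dynarr_init (struct toy_dynarr *d, size_t elsize)
{
  assert (elsize > 0);
  d->base = NULL;
  d->elsize = elsize;
  d->length = 0;
  d->capacity = 0;
}

/* Append a copy of the element at EL, growing the storage when it is
   full.  Returns 1 on success, 0 if memory runs out.  */
static int
toy_dynarr_add (struct toy_dynarr *d, const void *el)
{
  if (d->length == d->capacity)
    {
      size_t new_capacity = d->capacity ? 2 * d->capacity : 4;
      char *new_base = realloc (d->base, new_capacity * d->elsize);

      if (new_base == NULL)
        return 0;
      d->base = new_base;
      d->capacity = new_capacity;
    }
  memcpy (d->base + d->length * d->elsize, el, d->elsize);
  d->length++;
  return 1;
}

/* Address of element number I. */
static void *
toy_dynarr_at (const struct toy_dynarr *d, size_t i)
{
  assert (i < d->length);
  return d->base + i * d->elsize;
}
```

   Since `realloc()' may move the storage, a pointer obtained from
`toy_dynarr_at' is only valid until the next `toy_dynarr_add'.  After
one hundred additions the capacity in this sketch has doubled five
times, from 4 to 128.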

     inline.c

   This module is used in connection with inline functions (available in
some compilers).  Often, inline functions need to have a corresponding
non-inline function that does the same thing.  This module is where they
reside.  It contains no actual code, but defines some special flags that
cause inline functions defined in header files to be rendered as actual
functions.  It then includes all header files that contain any inline
function definitions, so that each one gets a real function equivalent.

     debug.c
     debug.h

   These functions provide a system for doing internal consistency
checks during code development.  This system is not currently used;
instead the simpler `assert()' macro is used along with the various
checks provided by the `--error-check-*' configuration options.

     universe.h

   This is not currently used.


File: internals.info,  Node: Basic Lisp Modules,  Next: Modules for Standard Editing Operations,  Prev: Low-Level Modules,  Up: A Summary of the Various XEmacs Modules

10.2 Basic Lisp Modules
=======================

     lisp-disunion.h
     lisp-union.h
     lisp.h
     lrecord.h
     symsinit.h

   These are the basic header files for all XEmacs modules.  Each module
includes `lisp.h', which brings the other header files in.  `lisp.h'
contains the definitions of the structures and extractor and
constructor macros for the basic Lisp objects and various other basic
definitions for the Lisp environment, as well as some general-purpose
definitions (e.g. `min()' and `max()').  `lisp.h' includes either
`lisp-disunion.h' or `lisp-union.h', depending on whether
`USE_UNION_TYPE' is defined.  These files define the typedef of the
Lisp object itself (as described above) and the low-level macros that
hide the actual implementation of the Lisp object.  All extractor and
constructor macros for particular types of Lisp objects are defined in
terms of these low-level macros.
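
   The layering described here can be illustrated with a toy tagged
word.  Everything in the sketch is hypothetical: the `toy_' and `TOY_'
names, the two tag bits and their values are chosen for the example and
do not reproduce the layout in `lisp-disunion.h' or `lisp-union.h'.
Only non-negative integers are handled.

```c
#include <assert.h>
#include <stdint.h>

/* One machine word; the low two bits say what the rest means. */
typedef uintptr_t toy_lisp_object;

enum toy_type
{
  TOY_TYPE_RECORD = 0,
  TOY_TYPE_INT = 1,
  TOY_TYPE_CHAR = 2
};

#define TOY_TAG_BITS 2
#define TOY_TAG_MASK (((uintptr_t) 1 << TOY_TAG_BITS) - 1)

/* Low-level macros: the only code that knows the bit layout. */
#define TOY_XTYPE(obj) ((enum toy_type) ((obj) & TOY_TAG_MASK))
#define TOY_XBITS(obj) ((obj) >> TOY_TAG_BITS)
#define TOY_MAKE(type, bits)                                    \
  ((toy_lisp_object) (((uintptr_t) (bits) << TOY_TAG_BITS)      \
                      | (uintptr_t) (type)))

/* Type-specific constructor and predicate, written purely in terms
   of the low-level macros.  */
#define toy_make_int(n) TOY_MAKE (TOY_TYPE_INT, (n))
#define TOY_INTP(obj)   (TOY_XTYPE (obj) == TOY_TYPE_INT)

/* Type-specific extractor with a type check. */
static long
toy_int_value (toy_lisp_object obj)
{
  assert (TOY_INTP (obj));
  return (long) TOY_XBITS (obj);
}
```

   Switching to a different representation, such as a union with bit
fields, would mean rewriting only `TOY_XTYPE', `TOY_XBITS' and
`TOY_MAKE'; the type-specific layer and all of its callers would stay
unchanged.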

   As a general rule, all typedefs should go into the typedefs section
of `lisp.h' rather than into a module-specific header file even if the
structure is defined elsewhere.  This allows function prototypes that
use the typedef to be placed into other header files.  Forward structure
declarations (i.e. a simple declaration like `struct foo;' where the
structure itself is defined elsewhere) should be placed into the
typedefs section as necessary.

   `lrecord.h' contains the basic structures and macros that implement
all record-type Lisp objects--i.e. all objects whose type is a field in
their C structure, which includes all objects except the few most basic
ones.

   `lisp.h' contains prototypes for most of the exported functions in
the various modules.  Lisp primitives defined using `DEFUN' that need
to be called by C code should be declared using `EXFUN'.  Other
function prototypes should be placed either into the appropriate
section of `lisp.h', or into a module-specific header file, depending
on how general-purpose the function is and whether it has
special-purpose argument types requiring definitions not in `lisp.h'.
All initialization functions are prototyped in `symsinit.h'.

     alloc.c

   The large module `alloc.c' implements all of the basic allocation and
garbage collection for Lisp objects.  The most commonly used Lisp
objects are allocated in chunks, similar to the Blocktype data type
described above; others are allocated in individually `malloc()'ed
blocks.  This module provides the foundation on which all other aspects
of the Lisp environment sit, and is the first module initialized at
startup.

   Note that `alloc.c' provides a series of generic functions that are
not dependent on any particular object type, and interfaces to
particular types of objects using a standardized interface of
type-specific methods.  This scheme is a fundamental principle of
object-oriented programming and is heavily used throughout XEmacs.  The
great advantage of this is that it allows for a clean separation of
functionality into different modules--new classes of Lisp objects, new
event interfaces, new device types, new stream interfaces, etc. can be
added transparently without affecting code anywhere else in XEmacs.
Because the different subsystems are divided into general and specific
code, adding a new subtype within a subsystem will in general not
require changes to the generic subsystem code or affect any of the other
subtypes in the subsystem; this provides a great deal of robustness to
the XEmacs code.
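
   The split between generic code and type-specific methods can be
sketched like this.  The structure and function names are invented,
and a single `print' method stands in for the full set of methods that
a real object type supplies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

enum toy_type { TOY_TYPE_POINT, TOY_TYPE_COUNT };

/* Every record starts with a header that names its type. */
struct toy_header
{
  enum toy_type type;
};

/* The methods each type must supply. */
struct toy_implementation
{
  const char *name;
  int (*print) (const struct toy_header *obj, char *buf, size_t size);
};

/* --- type-specific code for one record type --- */

struct toy_point
{
  struct toy_header header;
  int x, y;
};

static int
toy_point_print (const struct toy_header *obj, char *buf, size_t size)
{
  const struct toy_point *p = (const struct toy_point *) obj;

  return snprintf (buf, size, "#<point %d,%d>", p->x, p->y);
}

static const struct toy_implementation toy_point_implementation =
  { "point", toy_point_print };

/* --- generic code: one table slot per type index --- */

static const struct toy_implementation *const
toy_implementations[TOY_TYPE_COUNT] =
  { &toy_point_implementation };

/* Works for any record type; it only looks at the header. */
static int
toy_print (const struct toy_header *obj, char *buf, size_t size)
{
  assert (obj->type < TOY_TYPE_COUNT);
  return toy_implementations[obj->type]->print (obj, buf, size);
}
```

   Adding a second record type to this sketch means adding an
enumerator, a structure, its methods and one table entry; `toy_print'
itself does not change.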

     eval.c
     backtrace.h

   This module contains all of the functions to handle the flow of
control.  This includes the mechanisms of defining functions, calling
functions, traversing stack frames, and binding variables; the control
primitives and other special forms such as `while', `if', `eval',
`let', `and', `or', `progn', etc.; handling of non-local exits,
unwind-protects, and exception handlers; entering the debugger; methods
for the subr Lisp object type; etc.  It does _not_ include the `read'
function, the `print' function, or the handling of symbols and obarrays.

   `backtrace.h' contains some structures related to stack frames and
the flow of control.

     lread.c

   This module implements the Lisp reader and the `read' function,
which converts text into Lisp objects, according to the read syntax of
the objects, as described above.  This is similar to the parser that is
a part of all compilers.

     print.c

   This module implements the Lisp print mechanism and the `print'
function and related functions.  This is the inverse of the Lisp reader
- it converts Lisp objects to a printed, textual representation.
(Hopefully something that can be read back in using `read' to get an
equivalent object.)

     general.c
     symbols.c
     symeval.h

   `symbols.c' implements the handling of symbols, obarrays, and
retrieving the values of symbols.  Much of the code is devoted to
handling the special "symbol-value-magic" objects that define special
types of variables--this includes buffer-local variables, variable
aliases, variables that forward into C variables, etc.  This module is
initialized extremely early (right after `alloc.c'), because it is here
that the basic symbols `t' and `nil' are created, and those symbols are
used everywhere throughout XEmacs.

   `symeval.h' contains the definitions of symbol structures and the
`DEFVAR_LISP()' and related macros for declaring variables.

     data.c
     floatfns.c
     fns.c

   These modules implement the methods and standard Lisp primitives for
all the basic Lisp object types other than symbols (which are described
above).  `data.c' contains all the predicates (primitives that return
whether an object is of a particular type); the integer arithmetic
functions; and the basic accessor and mutator primitives for the various
object types.  `fns.c' contains all the standard primitives for working
with sequences (where, abstractly speaking, a sequence is an ordered set
of objects, and can be represented by a list, string, vector, or
bit-vector); it also contains `equal', perhaps on the grounds that the
bulk of the operation of `equal' is comparing sequences.  `floatfns.c'
contains methods and primitives for floats and floating-point
arithmetic.

     bytecode.c
     bytecode.h

   `bytecode.c' implements the byte-code interpreter and
compiled-function objects, and `bytecode.h' contains associated
structures.  Note that the byte-code _compiler_ is written in Lisp.


File: internals.info,  Node: Modules for Standard Editing Operations,  Next: Editor-Level Control Flow Modules,  Prev: Basic Lisp Modules,  Up: A Summary of the Various XEmacs Modules

10.3 Modules for Standard Editing Operations
============================================

     buffer.c
     buffer.h
     bufslots.h

   `buffer.c' implements the "buffer" Lisp object type.  This includes
functions that create and destroy buffers; retrieve buffers by name or
by other properties; manipulate lists of buffers (remember that buffers
are permanent objects and stored in various ordered lists); retrieve or
change buffer properties; etc.  It also contains the definitions of all
the built-in buffer-local variables (which can be viewed as buffer
properties).  It does _not_ contain code to manipulate buffer-local
variables (that's in `symbols.c', described above); or code to
manipulate the text in a buffer.

   `buffer.h' defines the structures associated with a buffer and the
various macros for retrieving text from a buffer and special buffer
positions (e.g. `point', the default location for text insertion).  It
also contains macros for working with buffer positions and converting
between their representations as character offsets and as byte offsets
(under MULE, they are different, because characters can be multi-byte).
It is one of the largest header files.

   `bufslots.h' defines the fields in the buffer structure that
correspond to the built-in buffer-local variables.  It is its own
header file because it is included many times in `buffer.c', as a way
of iterating over all the built-in buffer-local variables.
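
   Including one list many times with different macro definitions is
sometimes called the "X-macro" technique.  The sketch below keeps the
list in a macro instead of a separate header so that it fits in one
file, which is a simplification of what is described above.  The slot
names, their default values and all `toy_' identifiers are illustrative
assumptions, not the contents of `bufslots.h'.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for the slots header: one entry per built-in variable. */
#define TOY_BUFFER_SLOTS(SLOT)  \
  SLOT (fill_column, 70)        \
  SLOT (tab_width, 8)           \
  SLOT (left_margin, 0)

/* Expansion 1: the fields of the structure. */
struct toy_buffer
{
#define DECLARE_FIELD(name, def) int name;
  TOY_BUFFER_SLOTS (DECLARE_FIELD)
#undef DECLARE_FIELD
};

/* Expansion 2: set every field to its default value. */
static void
toy_buffer_reset (struct toy_buffer *b)
{
#define RESET_FIELD(name, def) b->name = (def);
  TOY_BUFFER_SLOTS (RESET_FIELD)
#undef RESET_FIELD
}

/* Expansion 3: find a field from the name of its variable.
   Returns NULL if there is no such slot.  */
static int *
toy_buffer_slot (struct toy_buffer *b, const char *var)
{
  assert (var != NULL);
#define MATCH_FIELD(name, def) \
  if (strcmp (var, #name) == 0) return &b->name;
  TOY_BUFFER_SLOTS (MATCH_FIELD)
#undef MATCH_FIELD
  return NULL;
}
```

   The benefit is that a new slot is added in exactly one place, and
the structure definition, the reset code and the lookup code all pick
it up on the next compile.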

     insdel.c
     insdel.h

   `insdel.c' contains low-level functions for inserting and deleting
text in a buffer, keeping track of changed regions for use by
redisplay, and calling any before-change and after-change functions
that may have been registered for the buffer.  It also contains the
actual functions that convert between byte offsets and character
offsets.
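
   To make the distinction between the two kinds of offsets concrete,
here is a sketch of both conversions.  It uses UTF-8 as a stand-in
multi-byte encoding, which is an assumption made purely for
illustration; it is not the internal text representation used under
MULE, and the `toy_' functions are not the ones in `insdel.c'.  Both
functions scan linearly from the start of the text.

```c
#include <assert.h>
#include <stddef.h>

/* In UTF-8 a continuation byte looks like 10xxxxxx; any other byte
   starts a character.  */
static int
toy_starts_character (unsigned char byte)
{
  return (byte & 0xC0) != 0x80;
}

/* Byte offset at which character number CHARPOS (counting from 0)
   begins in TEXT, which is LEN bytes long.  Returns LEN if CHARPOS
   is at or past the end.  */
static size_t
toy_charpos_to_bytepos (const char *text, size_t len, size_t charpos)
{
  size_t bytepos = 0;

  while (charpos > 0 && bytepos < len)
    {
      /* Step over the lead byte, then over its continuation bytes. */
      bytepos++;
      while (bytepos < len
             && !toy_starts_character ((unsigned char) text[bytepos]))
        bytepos++;
      charpos--;
    }
  return bytepos;
}

/* Number of characters that begin before byte offset BYTEPOS. */
static size_t
toy_bytepos_to_charpos (const char *text, size_t len, size_t bytepos)
{
  size_t i, charpos = 0;

  assert (bytepos <= len);
  for (i = 0; i < bytepos; i++)
    if (toy_starts_character ((unsigned char) text[i]))
      charpos++;
  return charpos;
}
```

   For the four-byte text `"a\xC3\xA9z"', which holds three characters,
character 2 begins at byte 3, and byte 3 in turn maps back to
character 2.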

   `insdel.h' contains associated headers.

     marker.c

   This module implements the "marker" Lisp object type, which
conceptually is a pointer to a text position in a buffer that moves
around as text is inserted and deleted, so as to remain in the same
relative position.  This module doesn't actually move the markers around
- that's handled in `insdel.c'.  This module just creates them and
implements the primitives for working with them.  As markers are simple
objects, this does not entail much.

   Note that the standard arithmetic primitives (e.g. `+') accept
markers in place of integers and automatically substitute the value of
`marker-position' for the marker, i.e. an integer describing the
current buffer position of the marker.
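
   What "moving around as text is inserted and deleted" amounts to can
be sketched with plain character positions.  The structure and the two
`toy_adjust' functions are hypothetical and much simpler than the real
code; in particular, the sketch assumes that a marker sitting exactly
at the insertion point stays in front of the inserted text.

```c
#include <assert.h>
#include <stddef.h>

struct toy_marker
{
  size_t charpos;   /* position the marker currently points at */
};

/* Call after LENGTH characters have been inserted at position AT. */
static void
toy_adjust_for_insert (struct toy_marker *markers, size_t count,
                       size_t at, size_t length)
{
  size_t i;

  for (i = 0; i < count; i++)
    if (markers[i].charpos > at)
      markers[i].charpos += length;
}

/* Call after the characters in the range [FROM, TO) have been
   deleted.  Markers inside the range collapse onto FROM.  */
static void
toy_adjust_for_delete (struct toy_marker *markers, size_t count,
                       size_t from, size_t to)
{
  size_t i;

  assert (from <= to);
  for (i = 0; i < count; i++)
    {
      if (markers[i].charpos >= to)
        markers[i].charpos -= to - from;
      else if (markers[i].charpos > from)
        markers[i].charpos = from;
    }
}
```

   With markers at positions 2, 5 and 9, inserting three characters at
position 5 leaves them at 2, 5 and 12; deleting the range from 4 to 8
afterwards leaves them at 2, 4 and 8.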

     extents.c
     extents.h

   This module implements the "extent" Lisp object type, which is like
a marker that works over a range of text rather than a single position.
Extents are also much more complex and powerful than markers and have a
more efficient (and more algorithmically complex) implementation.  The
implementation is described in detail in comments in `extents.c'.

   The code in `extents.c' works closely with `insdel.c' so that
extents are properly moved around as text is inserted and deleted.
There is also code in `extents.c' that provides information needed by
the redisplay mechanism for efficient operation. (Remember that extents
can have display properties that affect [sometimes drastically, as in
the `invisible' property] the display of the text they cover.)

     editfns.c

   `editfns.c' contains the standard Lisp primitives for working with a
buffer's text, and calls the low-level functions in `insdel.c'.  It
also contains primitives for working with `point' (the default buffer
insertion location).

   `editfns.c' also contains functions for retrieving various
characteristics from the external environment: the current time, the
process ID of the running XEmacs process, the name of the user who ran
this XEmacs process, etc.  It's not clear why this code is in
`editfns.c'.

     callint.c
     cmds.c
     commands.h

   These modules implement the basic "interactive" commands, i.e.
user-callable functions.  Commands, as opposed to other functions, have
special ways of getting their parameters interactively (by querying the
user), as opposed to having them passed in a normal function
invocation.  Many commands are not really meant to be called from other
Lisp functions, because they modify global state in a way that's often
undesired as part of other Lisp functions.

   `callint.c' implements the mechanism for querying the user for
parameters and calling interactive commands.  The bulk of this module is
code that parses the interactive spec that is supplied with an
interactive command.

   `cmds.c' implements the basic, most commonly used editing commands:
commands to move around the current buffer and insert and delete
characters.  These commands are implemented using the Lisp primitives
defined in `editfns.c'.

   `commands.h' contains associated structure definitions and
prototypes.

     regex.c
     regex.h
     search.c

   `search.c' implements the Lisp primitives for searching for text in
a buffer, and some of the low-level algorithms for doing this.  In
particular, the fast fixed-string Boyer-Moore search algorithm is
implemented in `search.c'.  The low-level algorithms for doing
regular-expression searching, however, are implemented in `regex.c' and
`regex.h'.  These two modules are largely independent of XEmacs, and
are similar to (and based upon) the regular-expression routines used in
`grep' and other GNU utilities.
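
   The skipping idea behind Boyer-Moore can be shown with the Horspool
simplification of the algorithm, which keeps only a single shift table.
This is a textbook sketch under that simplification, not the code in
`search.c'; it works on raw bytes, is case-sensitive, and the function
name is made up.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Return the offset of the first occurrence of PAT (M bytes) in TEXT
   (N bytes), or -1 if there is none.  */
static long
toy_horspool_search (const unsigned char *text, size_t n,
                     const unsigned char *pat, size_t m)
{
  size_t shift[UCHAR_MAX + 1];
  size_t i, pos;

  assert (m > 0);
  if (m > n)
    return -1;

  /* A byte that does not occur in the pattern allows a jump of the
     full pattern length.  */
  for (i = 0; i <= UCHAR_MAX; i++)
    shift[i] = m;
  /* Otherwise jump so that the last occurrence of the byte in the
     pattern (ignoring the final position) lines up with it.  */
  for (i = 0; i + 1 < m; i++)
    shift[pat[i]] = m - 1 - i;

  /* The jump is chosen from the text byte under the pattern's end. */
  for (pos = 0; pos + m <= n; pos += shift[text[pos + m - 1]])
    {
      /* Compare right to left. */
      i = m;
      while (i > 0 && text[pos + i - 1] == pat[i - 1])
        i--;
      if (i == 0)
        return (long) pos;
    }
  return -1;
}
```

   Searching for `Buffer' in `Test Buffer' takes just two alignments:
the first fails on its rightmost comparison and jumps five bytes, and
the second matches at offset 5.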

     doprnt.c

   `doprnt.c' implements formatted-string processing, similar to the
`printf()' function in C.

     undo.c

   This module implements the undo mechanism for tracking buffer
changes.  Most of this could be implemented in Lisp.


File: internals.info,  Node: Editor-Level Control Flow Modules,  Next: Modules for the Basic Displayable Lisp Objects,  Prev: Modules for Standard Editing Operations,  Up: A Summary of the Various XEmacs Modules

10.4 Editor-Level Control Flow Modules
======================================

     event-Xt.c
     event-msw.c
     event-stream.c
     event-tty.c
     events-mod.h
     gpmevent.c
     gpmevent.h
     events.c
     events.h

   These implement the handling of events (user input and other system
notifications).

   `events.c' and `events.h' define the "event" Lisp object type and
primitives for manipulating it.

   `event-stream.c' implements the basic functions for working with
event queues, dispatching an event by looking it up in relevant keymaps
and such, and handling timeouts; this includes the primitives
`next-event' and `dispatch-event', as well as related primitives such
as `sit-for', `sleep-for', and `accept-process-output'.
(`event-stream.c' is one of the hairiest and trickiest modules in
XEmacs.  Beware!  You can easily mess things up here.)

   `event-Xt.c' and `event-tty.c' implement the low-level interfaces
for retrieving events from Xt (the X toolkit) and from TTYs (using
`read()' and `select()'), respectively.  The event interface enforces a
clean separation between the specific code for interfacing with the
operating system and the generic code for working with events, by
defining an API of basic, low-level event methods; `event-Xt.c' and
`event-tty.c' are two different implementations of this API.  To add
support for a new operating system (e.g. NeXTstep), one merely needs to
provide another implementation of those API functions.

   Note that the choice of whether to use `event-Xt.c' or `event-tty.c'
is made at compile time!  Or at the very latest, it is made at startup
time.  `event-Xt.c' handles events for _both_ X and TTY frames;
`event-tty.c' is only used when X support is not compiled into XEmacs.
The reason for this is that there is only one event loop in XEmacs:
thus, it needs to be able to receive events from all different kinds of
frames.

     keymap.c
     keymap.h

   `keymap.c' and `keymap.h' define the "keymap" Lisp object type and
associated methods and primitives. (Remember that keymaps are objects
that associate event descriptions with functions to be called to
"execute" those events; `dispatch-event' looks up events in the
relevant keymaps.)

     cmdloop.c

   `cmdloop.c' contains functions that implement the actual editor
command loop--i.e. the event loop that cyclically retrieves and
dispatches events.  This code is also rather tricky, just like
`event-stream.c'.

     macros.c
     macros.h

   These two modules contain the basic code for defining keyboard
macros.  These functions don't actually do much; most of the code that
handles keyboard macros is mixed in with the event-handling code in
`event-stream.c'.

     minibuf.c

   This contains some miscellaneous code related to the minibuffer
(most of the minibuffer code was moved into Lisp by Richard Mlynarik).
This includes the primitives for completion (although filename
completion is in `dired.c'), the lowest-level interface to the
minibuffer (if the command loop were cleaned up, this too could be in
Lisp), and code for dealing with the echo area (this, too, was mostly
moved into Lisp, and the only code remaining is code to call out to
Lisp or provide simple bootstrapping implementations early in temacs,
before the echo-area Lisp code is loaded).


File: internals.info,  Node: Modules for the Basic Displayable Lisp Objects,  Next: Modules for other Display-Related Lisp Objects,  Prev: Editor-Level Control Flow Modules,  Up: A Summary of the Various XEmacs Modules

10.5 Modules for the Basic Displayable Lisp Objects
===================================================

     console-msw.c
     console-msw.h
     console-stream.c
     console-stream.h
     console-tty.c
     console-tty.h
     console-x.c
     console-x.h
     console.c
     console.h

   These modules implement the "console" Lisp object type.  A console
contains multiple display devices, but only one keyboard and mouse.
Most of the time, a console will contain exactly one device.

   Consoles are the top of a Lisp object inclusion hierarchy.  Consoles
contain devices, which contain frames, which contain windows.

     device-msw.c
     device-tty.c
     device-x.c
     device.c
     device.h

   These modules implement the "device" Lisp object type.  This
abstracts a particular screen or connection on which frames are
displayed.  As with Lisp objects, event interfaces, and other
subsystems, the device code is separated into a generic component and
device-type-specific components; the generic component provides a
standardized interface (in the form of a set of methods) onto the
particular device types.

   The device subsystem defines all the methods and provides method
services for not only device operations but also for the frame, window,
menubar, scrollbar, toolbar, and other displayable-object subsystems.
The reason for this is that all of these subsystems have the same
subtypes (X, TTY, NeXTstep, Microsoft Windows, etc.) as devices do.
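
   The split between a generic component and per-type methods can be
sketched as a table of function pointers.  The following is only an
illustration of the idea, not the actual method tables in `device.h';
all names (`toy_device', `toy_device_methods', `output_string') are
hypothetical.

```c
#include <assert.h>
#include <stdio.h>

struct toy_device;

/* The set of methods every (hypothetical) device type must supply. */
struct toy_device_methods {
    const char *type_name;
    int (*output_string)(struct toy_device *d, const char *s);
};

struct toy_device {
    const struct toy_device_methods *methods;
    char log[64];               /* stands in for the real output channel */
};

/* Type-specific implementations. */
static int tty_output_string(struct toy_device *d, const char *s)
{
    return snprintf(d->log, sizeof d->log, "tty:%s", s);
}

static int x_output_string(struct toy_device *d, const char *s)
{
    return snprintf(d->log, sizeof d->log, "x:%s", s);
}

const struct toy_device_methods toy_tty_methods = { "tty", tty_output_string };
const struct toy_device_methods toy_x_methods   = { "x",   x_output_string };

/* Generic code: it never needs to know which device type it is given. */
int toy_device_output(struct toy_device *d, const char *s)
{
    assert(d != NULL && d->methods != NULL);
    return d->methods->output_string(d, s);
}
```

   Supporting a new device type then amounts to filling in one more
method table; the generic callers are left unchanged.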

     frame-msw.c
     frame-tty.c
     frame-x.c
     frame.c
     frame.h

   Each device contains one or more frames in which objects (e.g. text)
are displayed.  A frame corresponds to a window in the window system;
usually this is a top-level window but it could potentially be one of a
number of overlapping child windows within a top-level window, using the
MDI (Multiple Document Interface) protocol in Microsoft Windows or a
similar scheme.

   The `frame-*' files implement the "frame" Lisp object type and
provide the generic and device-type-specific operations on frames (e.g.
raising, lowering, resizing, moving, etc.).

     window.c
     window.h

   Each frame consists of one or more non-overlapping "windows" (better
known as "panes" in standard window-system terminology) in which a
buffer's text can be displayed.  Windows can also have scrollbars
displayed around their edges.

   `window.c' and `window.h' implement the "window" Lisp object type
and provide code to manage windows.  Since windows have no associated
resources in the window system (the window system knows only about the
frame; no child windows or anything are used for XEmacs windows), there
is no device-type-specific code here; all of that code is part of the
redisplay mechanism or the code for particular object types such as
scrollbars.


File: internals.info,  Node: Modules for other Display-Related Lisp Objects,  Next: Modules for the Redisplay Mechanism,  Prev: Modules for the Basic Displayable Lisp Objects,  Up: A Summary of the Various XEmacs Modules

10.6 Modules for other Display-Related Lisp Objects
===================================================

     faces.c
     faces.h

     bitmaps.h
     glyphs-eimage.c
     glyphs-msw.c
     glyphs-msw.h
     glyphs-widget.c
     glyphs-x.c
     glyphs-x.h
     glyphs.c
     glyphs.h

     objects-msw.c
     objects-msw.h
     objects-tty.c
     objects-tty.h
     objects-x.c
     objects-x.h
     objects.c
     objects.h

     menubar-msw.c
     menubar-msw.h
     menubar-x.c
     menubar.c
     menubar.h

     scrollbar-msw.c
     scrollbar-msw.h
     scrollbar-x.c
     scrollbar-x.h
     scrollbar.c
     scrollbar.h

     toolbar-msw.c
     toolbar-x.c
     toolbar.c
     toolbar.h

     font-lock.c

   This file provides C support for syntax highlighting--i.e.
highlighting different syntactic constructs of a source file in
different colors, for easy reading.  The C support is provided so that
this is fast.

   As of 21.4.10, bugs introduced at the very end of the 21.2 series in
the "syntax properties" code were fixed, and highlighting is acceptably
quick again.  However, presumably more improvements are possible, and
the places to look are probably here, in the defun-traversing code, and
in `syntax.c', in the comment-traversing code.

     dgif_lib.c
     gif_err.c
     gif_lib.h
     gifalloc.c

   These modules decode GIF-format image files, for use with glyphs.
These files were removed due to Unisys patent infringement concerns.


File: internals.info,  Node: Modules for the Redisplay Mechanism,  Next: Modules for Interfacing with the File System,  Prev: Modules for other Display-Related Lisp Objects,  Up: A Summary of the Various XEmacs Modules

10.7 Modules for the Redisplay Mechanism
========================================

     redisplay-output.c
     redisplay-msw.c
     redisplay-tty.c
     redisplay-x.c
     redisplay.c
     redisplay.h

   These files provide the redisplay mechanism.  As with many other
subsystems in XEmacs, there is a clean separation between the general
and device-specific support.

   `redisplay.c' contains the bulk of the redisplay engine.  These
functions update the redisplay structures (which describe how the screen
is to appear) to reflect any changes made to the state of any
displayable objects (buffer, frame, window, etc.) since the last time
that redisplay was called.  These functions are highly optimized to
avoid doing more work than necessary (since redisplay is called
extremely often and is potentially a huge time sink), and depend heavily
on notifications from the objects themselves that changes have occurred,
so that redisplay doesn't explicitly have to check each possible object.
The redisplay mechanism also contains a great deal of caching to further
speed things up; some of this caching is contained within the various
displayable objects.

   `redisplay-output.c' goes through the redisplay structures and
converts them into calls to device-specific methods to actually output
the screen changes.

   `redisplay-x.c' and `redisplay-tty.c' are two implementations of
these redisplay output methods, for X frames and TTY frames,
respectively.
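
   As a rough picture of the output step, the sketch below compares a
"desired" description of the screen against the "current" one and
calls an output method only for the lines that differ.  This is a
deliberately naive model under invented names (`toy_display',
`toy_redisplay_output'); the real redisplay structures are far more
elaborate.

```c
#include <assert.h>
#include <string.h>

#define TOY_ROWS 4
#define TOY_COLS 16

/* What is on the screen now, and what redisplay wants to be there.
   Each row is assumed to hold a NUL-terminated string. */
struct toy_display {
    char current[TOY_ROWS][TOY_COLS];
    char desired[TOY_ROWS][TOY_COLS];
};

/* Stand-in for a device-specific output method. */
typedef void (*toy_output_line)(int row, const char *text, void *closure);

/* Sample output method: just count how many lines were written. */
void toy_count_output(int row, const char *text, void *closure)
{
    (void) row;
    (void) text;
    ++*(int *) closure;
}

/* Output only the changed rows; return how many rows were updated. */
int toy_redisplay_output(struct toy_display *d, toy_output_line output,
                         void *closure)
{
    int updated = 0;
    assert(d != NULL && output != NULL);
    for (int row = 0; row < TOY_ROWS; row++) {
        if (strcmp(d->current[row], d->desired[row]) != 0) {
            output(row, d->desired[row], closure);
            memcpy(d->current[row], d->desired[row], TOY_COLS);
            updated++;
        }
    }
    return updated;
}
```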

     indent.c

   This module contains various functions and Lisp primitives for
converting between buffer positions and screen positions.  These
functions call the redisplay mechanism to do most of the work, and then
examine the redisplay structures to get the necessary information.  This
module needs work.

     termcap.c
     terminfo.c
     tparam.c

   These files contain functions for working with the termcap
(BSD-style) and terminfo (System V style) databases of terminal
capabilities and escape sequences, used when XEmacs is displaying in a
TTY.

     cm.c
     cm.h

   These files provide some miscellaneous TTY-output functions and
should probably be merged into `redisplay-tty.c'.


File: internals.info,  Node: Modules for Interfacing with the File System,  Next: Modules for Other Aspects of the Lisp Interpreter and Object System,  Prev: Modules for the Redisplay Mechanism,  Up: A Summary of the Various XEmacs Modules

10.8 Modules for Interfacing with the File System
=================================================

     lstream.c
     lstream.h

   These modules implement the "stream" Lisp object type.  This is an
internal-only Lisp object that implements a generic buffering stream.
The idea is to provide a uniform interface onto all sources and sinks of
data, including file descriptors, stdio streams, chunks of memory, Lisp
buffers, Lisp strings, etc.  That way, I/O functions can be written to
the stream interface and can transparently handle all possible sources
and sinks.  (For example, the `read' function can read data from a
file, a string, a buffer, or even a function that is called repeatedly
to return data, without worrying about where the data is coming from or
what-size chunks it is returned in.)

   Note that in the C code, streams are called "lstreams" (for "Lisp
streams") to distinguish them from other kinds of streams, e.g. stdio
streams and C++ I/O streams.

   Similar to other subsystems in XEmacs, lstreams are separated into
generic functions and a set of methods for the different types of
lstreams.  `lstream.c' provides implementations of many different types
of streams; others are provided, e.g., in `file-coding.c'.
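
   The following sketch shows the general shape of such a design: a
generic read loop written against a per-type `read' method, so that
the caller does not care what-size chunks the source hands back.  It
is an assumption-laden toy, not the interface in `lstream.h'; the
names `toy_stream' and `toy_stream_read_all' are invented, and only a
memory-backed source is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_stream;

/* Type-specific method: read up to MAX bytes, return 0 at end of data. */
typedef size_t (*toy_reader)(struct toy_stream *s, char *buf, size_t max);

struct toy_stream {
    toy_reader read;
    const char *data;   /* used by the memory-backed source */
    size_t pos, len;
    size_t chunk;       /* largest chunk handed back per call */
};

static size_t memory_read(struct toy_stream *s, char *buf, size_t max)
{
    size_t n = s->len - s->pos;
    if (n > max)
        n = max;
    if (n > s->chunk)
        n = s->chunk;
    memcpy(buf, s->data + s->pos, n);
    s->pos += n;
    return n;
}

/* Make S read from the string STR, at most CHUNK bytes at a time. */
void toy_stream_from_string(struct toy_stream *s, const char *str,
                            size_t chunk)
{
    assert(s != NULL && str != NULL && chunk > 0);
    s->read = memory_read;
    s->data = str;
    s->pos = 0;
    s->len = strlen(str);
    s->chunk = chunk;
}

/* Generic code: fill BUF regardless of how the source chunks its data. */
size_t toy_stream_read_all(struct toy_stream *s, char *buf, size_t size)
{
    size_t total = 0;
    while (total < size) {
        size_t n = s->read(s, buf + total, size - total);
        if (n == 0)
            break;
        total += n;
    }
    return total;
}
```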

     fileio.c

   This implements the basic primitives for interfacing with the file
system.  This includes primitives for reading files into buffers,
writing buffers into files, checking for the presence or accessibility
of files, canonicalizing file names, etc.  Note that these primitives
are usually not invoked directly by the user; there is a great deal of
higher-level Lisp code that implements the user commands such as
`find-file' and `save-buffer'.  This is similar to the distinction
between the lower-level primitives in `editfns.c' and the higher-level
user commands in `commands.c' and `simple.el'.

     filelock.c

   This file provides functions for detecting clashes between different
processes (e.g. XEmacs and some external process, or two different
XEmacs processes) modifying the same file.  (XEmacs can optionally use
the `lock/' subdirectory to provide a form of "locking" between
different XEmacs processes.)  This module is also used by the low-level
functions in `insdel.c' to ensure that, if the first modification is
being made to a buffer whose corresponding file has been externally
modified, the user is made aware of this so that the buffer can be
synched up with the external changes if necessary.

     filemode.c

   This file provides some miscellaneous functions that construct a
`rwxr-xr-x'-type permissions string (as might appear in an `ls'-style
directory listing) given the information returned by the `stat()'
system call.

     dired.c
     ndir.h

   These files implement the XEmacs interface to directory searching.
This includes a number of primitives for determining the files in a
directory and for doing filename completion. (Remember that generic
completion is handled by a different mechanism, in `minibuf.c'.)

   `ndir.h' is a header file used for the directory-searching emulation
functions provided in `sysdep.c' (described under "Modules for
Interfacing with the Operating System" below), for systems
that don't provide any directory-searching functions. (On those
systems, directories can be read directly as files, and parsed.)

     realpath.c

   This file provides an implementation of the `realpath()' function
for expanding symbolic links, on systems that don't implement it or have
a broken implementation.


File: internals.info,  Node: Modules for Other Aspects of the Lisp Interpreter and Object System,  Next: Modules for Interfacing with the Operating System,  Prev: Modules for Interfacing with the File System,  Up: A Summary of the Various XEmacs Modules

10.9 Modules for Other Aspects of the Lisp Interpreter and Object System
========================================================================

     elhash.c
     elhash.h
     hash.c
     hash.h

   These files provide two implementations of hash tables.  Files
`hash.c' and `hash.h' provide a generic C implementation of hash tables
which can stand independently of XEmacs.  Files `elhash.c' and
`elhash.h' provide a separate implementation of hash tables that can
store only Lisp objects and knows about Lispy things like garbage
collection; these files implement the "hash-table" Lisp object type.

     specifier.c
     specifier.h

   This module implements the "specifier" Lisp object type.  This is
primarily used for displayable properties, and allows for values that
are specific to a particular buffer, window, frame, device, or device
class, as well as for a default value.  This is used, for example,
to control the height of the horizontal scrollbar or the appearance of
the `default', `bold', or other faces.  The specifier object consists
of a number of specifications, each of which maps from a buffer,
window, etc. to a value.  The function `specifier-instance' looks up a
value given a window (from which a buffer, frame, and device can be
derived).
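
   The lookup can be pictured as a search from the most specific
locale to the least specific one.  The sketch below is a toy model,
not the algorithm in `specifier.c': the names are invented, locales
are identified by small integers, device classes are omitted, and the
search order (window, buffer, frame, device, then global) is an
assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>

enum toy_locale_kind {
    TOY_WINDOW, TOY_BUFFER, TOY_FRAME, TOY_DEVICE, TOY_GLOBAL
};

/* One specification: "in this locale, the property has this value". */
struct toy_spec {
    enum toy_locale_kind kind;
    int locale_id;              /* ignored for TOY_GLOBAL */
    int value;
};

/* A window, together with the objects that can be derived from it. */
struct toy_window {
    int id, buffer_id, frame_id, device_id;
};

/* Return the value that applies in window W, trying the most specific
   locale first and FALLBACK when nothing matches. */
int toy_specifier_instance(const struct toy_spec *specs, size_t count,
                           const struct toy_window *w, int fallback)
{
    assert(w != NULL);
    const int ids[5] = { w->id, w->buffer_id, w->frame_id,
                         w->device_id, 0 };
    for (int kind = TOY_WINDOW; kind <= TOY_GLOBAL; kind++)
        for (size_t i = 0; i < count; i++)
            if ((int) specs[i].kind == kind
                && (kind == TOY_GLOBAL || specs[i].locale_id == ids[kind]))
                return specs[i].value;
    return fallback;
}
```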

     chartab.c
     chartab.h
     casetab.c

   `chartab.c' and `chartab.h' implement the "char table" Lisp object
type, which maps from characters or certain sorts of character ranges
to Lisp objects.  The implementation of this object type is optimized
for the internal representation of characters.  Char tables come in
different types, which affect the allowed object types to which a
character can be mapped and also dictate certain other properties of
the char table.

   `casetab.c' implements one sort of char table, the "case table",
which maps characters to other characters of possibly different case.
These are used by XEmacs to implement case-changing primitives and to
do case-insensitive searching.
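
   The idea of a case table can be illustrated with a flat array that
maps each character to its lower-case counterpart.  This sketch
assumes single-byte ASCII characters and uses invented names
(`toy_case_table', `toy_equal_ignoring_case'); the real char tables
handle the full internal character representation.

```c
#include <assert.h>
#include <stddef.h>

/* Maps every single-byte character to its lower-case counterpart. */
struct toy_case_table {
    unsigned char down[256];
};

/* Identity mapping everywhere except ASCII `A'..`Z'. */
void toy_case_table_init_ascii(struct toy_case_table *t)
{
    assert(t != NULL);
    for (int c = 0; c < 256; c++)
        t->down[c] = (unsigned char) c;
    for (int c = 'A'; c <= 'Z'; c++)
        t->down[c] = (unsigned char) (c - 'A' + 'a');
}

/* Case-insensitive comparison driven entirely by the table. */
int toy_equal_ignoring_case(const struct toy_case_table *t,
                            const char *a, const char *b)
{
    assert(t != NULL && a != NULL && b != NULL);
    while (*a && *b) {
        if (t->down[(unsigned char) *a] != t->down[(unsigned char) *b])
            return 0;
        a++;
        b++;
    }
    return *a == *b;    /* equal only if both strings ended together */
}
```

   Because the comparison consults only the table, installing a
different table changes which characters are treated as equivalent
without touching the search code.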

     syntax.c
     syntax.h

   This module implements "syntax tables", another sort of char table
that maps characters into syntax classes that define the syntax of these
characters (e.g. a parenthesis belongs to a class of `open' characters
that have corresponding `close' characters and can be nested).  This
module also implements the Lisp "scanner", a set of primitives for
scanning over text based on syntax tables.  This is used, for example,
to find the matching parenthesis in a command such as `forward-sexp',
and by `font-lock.c' to locate quoted strings, comments, etc.

   Syntax codes are implemented as bitfields in an int.  Bits 0-6
contain the syntax code itself, bit 7 is a special prefix flag used for
Lisp, and bits 16-23 contain comment syntax flags.  From the Lisp
programmer's point of view, there are 11 flags: 2 styles X 2 characters
X {start, end} flags for two-character comment delimiters, 2 style
flags for one-character comment delimiters, and the prefix flag.

   Internally, however, the characters used in multi-character
delimiters will have non-comment-character syntax classes (_e.g._, the
`/' in C's `/*' comment-start delimiter has "punctuation" (here meaning
"operator-like") class in C modes).  Thus a mixed comment style, such
as C++'s `//' to end of line, is represented by giving `/' the
"punctuation" class and the "style b first character of start sequence"
and "style b second character of start sequence" flags.  The fact that
the class is _not_ comment-start allows the syntax scanner to recognize
that this is a multi-character delimiter.  The `newline' character is
given
(single-character) "comment-end" _class_ and the "style b first
character of end sequence" _flag_.  The "comment-end" class allows the
scanner to determine that no second character is needed to terminate
the comment.
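
   The bit layout described above (bits 0-6 for the syntax code, bit 7
for the prefix flag, bits 16-23 for the comment flags) can be packed
and unpacked with a few shifts and masks.  The macro and function
names below are hypothetical and are not those used in `syntax.h'.

```c
#include <assert.h>

#define TOY_SYNTAX_CODE_MASK    0x7F    /* bits 0-6 */
#define TOY_SYNTAX_PREFIX_SHIFT 7       /* bit 7 */
#define TOY_SYNTAX_FLAGS_SHIFT  16      /* bits 16-23 */
#define TOY_SYNTAX_FLAGS_MASK   0xFF

/* Pack a syntax code, a prefix flag, and comment flags into one int. */
int toy_make_syntax(int code, int prefix, int comment_flags)
{
    assert(code >= 0 && code <= TOY_SYNTAX_CODE_MASK);
    assert(comment_flags >= 0 && comment_flags <= TOY_SYNTAX_FLAGS_MASK);
    return code
        | ((prefix ? 1 : 0) << TOY_SYNTAX_PREFIX_SHIFT)
        | (comment_flags << TOY_SYNTAX_FLAGS_SHIFT);
}

int toy_syntax_code(int syntax)
{
    return syntax & TOY_SYNTAX_CODE_MASK;
}

int toy_syntax_prefix_p(int syntax)
{
    return (syntax >> TOY_SYNTAX_PREFIX_SHIFT) & 1;
}

int toy_syntax_comment_flags(int syntax)
{
    return (syntax >> TOY_SYNTAX_FLAGS_SHIFT) & TOY_SYNTAX_FLAGS_MASK;
}
```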

   There used to be a syntax class `Sextword'.  A character of
`Sextword' class is a word-constituent but a word boundary may exist
between two such characters.  Ken'ichi HANDA <handa@etl.go.jp> explains
the purpose of the Sextword syntax category:

     Japanese words are not separated by spaces, which makes finding
     word boundaries very difficult.  Theoretically it's impossible
     without using natural language processing techniques.  But, by
     defining pseudo-words as below (much simplified for letting you
     understand it easily) for Japanese, we can have a convenient
     forward-word function for Japanese.

          A Japanese word is a sequence of characters that consists of
          zero or more Kanji characters followed by zero or more
          Hiragana characters.

     Then, the problem is that now we can't say that a sequence of
     word-constituents makes up a word.  For instance, both Hiragana "A"
     and Kanji "KAN" are word-constituents but the sequence of these two
     letters can't be a single word.

     So, we introduced Sextword for Japanese letters.

   There seems to have been some controversy about this category, as it
has been removed, readded, and removed again.  Currently neither GNU
Emacs (21.3.99) nor XEmacs (21.5.17) seems to use it.

     casefiddle.c

   This module implements various Lisp primitives for upcasing,
downcasing and capitalizing strings or regions of buffers.

     rangetab.c

   This module implements the "range table" Lisp object type, which
provides for a mapping from ranges of integers to arbitrary Lisp
objects.
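
   As a rough illustration of the mapping, the sketch below stores a
list of integer ranges and returns the value attached to the range
containing a given key.  It is a toy with invented names, not the
data structure in `rangetab.c'; it assumes that both ends of a range
are inclusive, uses C strings in place of Lisp objects, and searches
linearly.

```c
#include <assert.h>
#include <stddef.h>

/* One entry: every integer in [first, last] maps to VALUE. */
struct toy_range {
    long first, last;
    const char *value;
};

/* Return the value for KEY, or NULL if no range contains it. */
const char *toy_range_table_get(const struct toy_range *ranges,
                                size_t count, long key)
{
    assert(ranges != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        if (key >= ranges[i].first && key <= ranges[i].last)
            return ranges[i].value;
    return NULL;
}
```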

     opaque.c
     opaque.h

   This module implements the "opaque" Lisp object type, an
internal-only Lisp object that encapsulates an arbitrary block of memory
so that it can be managed by the Lisp allocation system.  To create an
opaque object, you call `make_opaque()', passing a pointer to a block
of memory.  An object is created that is big enough to hold the memory,
which is copied into the object's storage.  The object will then stick
around as long as you keep pointers to it, after which it will be
automatically reclaimed.

   Opaque objects can also have an arbitrary "mark method" associated
with them, in case the block of memory contains other Lisp objects that
need to be marked for garbage-collection purposes. (If you need other
object methods, such as a finalize method, you should just go ahead and
create a new Lisp object type--it's not hard.)
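
   The copy-on-creation behavior can be sketched with an ordinary
heap-allocated wrapper.  This is only a model of the idea:
`toy_make_opaque' is a hypothetical stand-in, the result is not a
Lisp object, and it must be released with `free()' because no garbage
collector is involved here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* A size header followed directly by a private copy of the data. */
struct toy_opaque {
    size_t size;
    unsigned char data[];       /* C99 flexible array member */
};

/* Allocate an object big enough for SIZE bytes and copy them in.
   Returns NULL if allocation fails. */
struct toy_opaque *toy_make_opaque(const void *ptr, size_t size)
{
    assert(ptr != NULL || size == 0);
    struct toy_opaque *o = malloc(sizeof *o + size);
    if (o == NULL)
        return NULL;
    o->size = size;
    if (size > 0)
        memcpy(o->data, ptr, size);
    return o;
}
```

   Because the memory is copied, later changes to the original block
do not affect the contents of the object.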

     abbrev.c

   This module provides a few primitives for doing dynamic
abbreviation expansion.  In XEmacs, most of the code for this has been
moved into Lisp.  Some C code remains for speed and because the
primitive `self-insert-command' (which is executed for all
self-inserting characters) hooks into the abbrev mechanism.
(`self-insert-command' is itself in C only for speed.)

     doc.c

   This module provides primitives for retrieving the documentation
strings of functions and variables.  These documentation strings contain
certain special markers that get dynamically expanded (e.g. a
reverse-lookup is performed on some named functions to retrieve their
current key bindings).  Some documentation strings (in particular, for
the built-in primitives and pre-loaded Lisp functions) are stored
externally in a file `DOC' in the `lib-src/' directory and need to be
fetched from that file. (Part of the build stage involves building this
file, and another part involves constructing an index for this file and
embedding it into the executable, so that the functions in `doc.c' do
not have to search the entire `DOC' file to find the appropriate
documentation string.)

     md5.c

   This module provides a Lisp primitive that implements the MD5
secure hashing scheme, used to create a large hash value of a string of
data such that the data cannot be derived from the hash value.  This is
used for various security applications on the Internet.


File: internals.info,  Node: Modules for Interfacing with the Operating System,  Next: Modules for Interfacing with X Windows,  Prev: Modules for Other Aspects of the Lisp Interpreter and Object System,  Up: A Summary of the Various XEmacs Modules

10.10 Modules for Interfacing with the Operating System
=======================================================

     callproc.c
     process.c
     process.h

   These modules allow XEmacs to spawn and communicate with subprocesses
and network connections.

   `callproc.c' implements (through the `call-process' primitive) what
are called "synchronous subprocesses".  This means that XEmacs runs a
program, waits till it's done, and retrieves its output.  A typical
example might be calling the `ls' program to get a directory listing.
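
   The run, wait, and collect pattern can be sketched with the POSIX
`popen()' and `pclose()' functions.  This is an illustration of the
concept only; `callproc.c' does not necessarily work this way, and
the name `toy_call_process' is invented.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Run COMMAND through the shell, wait for it to finish, and store its
   output (truncated to SIZE - 1 bytes, NUL-terminated) in OUT.
   Returns the number of bytes stored, or -1 on failure. */
long toy_call_process(const char *command, char *out, size_t size)
{
    char discard[256];
    size_t n;
    FILE *p;

    assert(command != NULL && out != NULL && size > 0);
    p = popen(command, "r");
    if (p == NULL)
        return -1;
    n = fread(out, 1, size - 1, p);
    out[n] = '\0';
    /* Drain anything that did not fit, so the child can finish. */
    while (fread(discard, 1, sizeof discard, p) > 0)
        ;
    /* pclose() blocks until the child has terminated. */
    if (pclose(p) == -1)
        return -1;
    return (long) n;
}
```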

   `process.c' and `process.h' implement "asynchronous subprocesses".
This means that XEmacs starts a program and then continues normally,
not waiting for the process to finish.  Data can be sent to the process
or retrieved from it as it's running.  This is used for the `shell'
command (which provides a front end onto a shell program such as
`csh'), the mail and news readers implemented in XEmacs, etc.  The
result of calling `start-process' to start a subprocess is a process
object, a particular kind of object used to communicate with the
subprocess.  You can send data to the process by passing the process
object and the data to `send-process', and you can specify what happens
to data retrieved from the process by setting properties of the process
object. (When the process sends data, XEmacs receives a process event,
which says that there is data ready.  When `dispatch-event' is called
on this event, it reads the data from the process and does something
with it, as specified by the process object's properties.  Typically,
this means inserting the data into a buffer or calling a function.)
Another property of the process object is called the "sentinel", which
is a function that is called when the process terminates.

   Process objects are also used for network connections (connections
to a process running on another machine).  Network connections are
started with `open-network-stream' but otherwise work just like
subprocesses.

     sysdep.c
     sysdep.h

   These modules implement most of the low-level, messy operating-system
interface code.  This includes various device control (ioctl) operations
for file descriptors, TTY's, pseudo-terminals, etc. (usually this stuff
is fairly system-dependent; thus the name of this module), and emulation
of standard library functions and system calls on systems that don't
provide them or have broken versions.

     sysdir.h
     sysfile.h
     sysfloat.h
     sysproc.h
     syspwd.h
     syssignal.h
     systime.h
     systty.h
     syswait.h

   These header files provide consistent interfaces onto
system-dependent header files and system calls.  The idea is that,
instead of including a standard header file like `<sys/param.h>' (which
may or may not exist on various systems) or having to worry about
whether all systems provide a particular preprocessor constant, or
having to deal with the four different paradigms for manipulating
signals, you just include the appropriate `sys*.h' header file, which
includes all the right system header files, defines any missing
preprocessor constants, provides a uniform interface onto system calls,
etc.

   `sysdir.h' provides a uniform interface onto directory-querying
functions. (In some cases, this is in conjunction with emulation
functions in `sysdep.c'.)

   `sysfile.h' includes all the necessary header files for standard
system calls (e.g. `read()'), ensures that all necessary `open()' and
`stat()' preprocessor constants are defined, and possibly (usually)
substitutes sugared versions of `read()', `write()', etc. that
automatically restart interrupted I/O operations.
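
   A wrapper that restarts an interrupted `read()' is commonly written
as a loop on `EINTR'.  The sketch below shows that general technique
under a hypothetical name, `toy_read_retrying'; it is not a copy of
what `sysfile.h' defines.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Like read(), but retry when the call is interrupted by a signal
   before any data was transferred. */
ssize_t toy_read_retrying(int fd, void *buf, size_t size)
{
    ssize_t n;

    assert(buf != NULL);
    do {
        n = read(fd, buf, size);
    } while (n == -1 && errno == EINTR);
    return n;
}
```

   Callers can then treat a return value of -1 as a real error
instead of checking for `EINTR' at every call site.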

   `sysfloat.h' includes the necessary header files for floating-point
operations.

   `sysproc.h' includes the necessary header files for calling
`select()', `fork()', `execve()', socket operations, and the like, and
ensures that the `FD_*()' macros for descriptor-set manipulations are
available.

   `syspwd.h' includes the necessary header files for obtaining
information from `/etc/passwd' (the functions are emulated under VMS).

   `syssignal.h' includes the necessary header files for
signal-handling and provides a uniform interface onto the different
signal-handling and signal-blocking paradigms.

   `systime.h' includes the necessary header files and provides uniform
interfaces for retrieving the time of day, setting file
access/modification times, getting the amount of time used by the XEmacs
process, etc.

   `systty.h' buffers against the infinitude of different ways of
controlling TTY's.

   `syswait.h' provides a uniform way of retrieving the exit status
from a `wait()'ed-on process (some systems use a union, others use an
int).
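
   On POSIX systems, where the status is an int, it is decoded with
the `WIFEXITED' and `WEXITSTATUS' macros.  The sketch below shows
that decoding with invented helper names; it does not reproduce the
interface of `syswait.h'.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Decode a wait status: the exit code if the child exited normally,
   or -1 otherwise (for example, if it was killed by a signal). */
int toy_exit_code(int status)
{
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Fork a child that exits immediately with CODE, wait for it, and
   return the decoded exit code (or -1 on failure). */
int toy_run_child(int code)
{
    int status;
    pid_t pid;

    assert(code >= 0 && code <= 255);
    pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(code);            /* child */
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return toy_exit_code(status);
}
```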

     hpplay.c
     libsst.c
     libsst.h
     libst.h
     linuxplay.c
     nas.c
     sgiplay.c
     sound.c
     sunplay.c

   These files implement the ability to play various sounds on some
types of computers.  You have to configure your XEmacs with sound
support in order to get this capability.

   `sound.c' provides the generic interface.  It implements various
Lisp primitives and variables that let you specify which sounds should
be played in certain conditions. (The conditions are identified by
symbols, which are passed to `ding' to make a sound.  Various standard
functions call this function at certain times; if sound support does
not exist, a simple beep results.)

   `sgiplay.c', `sunplay.c', `hpplay.c', and `linuxplay.c' interface to
the machine's speaker for various different kinds of machines.  This is
called "native" sound.

   `nas.c' interfaces to a computer somewhere else on the network using
the NAS (Network Audio Server) protocol, playing sounds on that
machine.  This allows you to run XEmacs on a remote machine, with its
display set to your local machine, and have the sounds be made on your
local machine, provided that you have a NAS server running on your local
machine.

   `libsst.c', `libsst.h', and `libst.h' provide some additional
functions for playing sound on a Sun SPARC but are not currently in use.

     tooltalk.c
     tooltalk.h

   These two modules implement an interface to the ToolTalk protocol,
which is an interprocess communication protocol implemented on some
versions of Unix.  ToolTalk is a high-level protocol that allows
processes to register themselves as providers of particular services;
other processes can then request a service without knowing or caring
exactly who is providing the service.  It is similar in spirit to the
DDE protocol provided under Microsoft Windows.  ToolTalk is a part of
the new CDE (Common Desktop Environment) specification and is used to
connect the parts of the SPARCWorks development environment.

     getloadavg.c

   This module provides the ability to retrieve the system's current
load average. (The way to do this is highly system-specific,
unfortunately, and requires a lot of special-case code.)

     sunpro.c

   This module provides a small amount of code used internally at Sun to
keep statistics on the usage of XEmacs.

     broken-sun.h
     strcmp.c
     strcpy.c
     sunOS-fix.c

   These files provide replacement functions and prototypes to fix
numerous bugs in early releases of SunOS 4.1.

     hftctl.c

   This module provides some terminal-control code necessary on
versions of AIX prior to 4.1.


File: internals.info,  Node: Modules for Interfacing with X Windows,  Next: Modules for Internationalization,  Prev: Modules for Interfacing with the Operating System,  Up: A Summary of the Various XEmacs Modules

10.11 Modules for Interfacing with X Windows
============================================

     Emacs.ad.h

   A file generated from `Emacs.ad', which contains XEmacs-supplied
fallback resources (so that XEmacs has pretty defaults).

     EmacsFrame.c
     EmacsFrame.h
     EmacsFrameP.h

   These modules implement an Xt widget class that encapsulates a frame.
This is for ease in integrating with Xt.  The EmacsFrame widget covers
the entire X window except for the menubar; the scrollbars are
positioned on top of the EmacsFrame widget.

   *Warning:* Abandon hope, all ye who enter here.  This code took an
ungodly amount of time to get right, and is likely to fall apart
mercilessly at the slightest change.  Such is life under Xt.

     EmacsManager.c
     EmacsManager.h
     EmacsManagerP.h

   These modules implement a simple Xt manager (i.e. composite) widget
class that simply lets its children set whatever geometry they want.
It's amazing that Xt doesn't provide this standardly, but on second
thought, it makes sense, considering how amazingly broken Xt is.

     EmacsShell-sub.c
     EmacsShell.c
     EmacsShell.h
     EmacsShellP.h

   These modules implement two Xt widget classes that are subclasses of
the TopLevelShell and TransientShell classes.  This is necessary to deal
with more brokenness that Xt has sadistically thrust onto the backs of
developers.

     xgccache.c
     xgccache.h

   These modules provide functions for maintenance and caching of GC's
(graphics contexts) under the X Window System.  This code is junky and
needs to be rewritten.

     select-msw.c
     select-x.c
     select.c
     select.h

   These modules provide an interface to the X Window System's concept
of "selections", the standard way for X applications to communicate
with each other.

     xintrinsic.h
     xintrinsicp.h
     xmmanagerp.h
     xmprimitivep.h

   These header files are similar in spirit to the `sys*.h' files and
buffer against different implementations of Xt and Motif.

   * `xintrinsic.h' should be included in place of `<Intrinsic.h>'.

   * `xintrinsicp.h' should be included in place of `<IntrinsicP.h>'.

   * `xmmanagerp.h' should be included in place of `<XmManagerP.h>'.

   * `xmprimitivep.h' should be included in place of `<XmPrimitiveP.h>'.

     xmu.c
     xmu.h

   These files provide an emulation of the Xmu library for those systems
(i.e. HPUX) that don't provide it as a standard part of X.

     ExternalClient-Xlib.c
     ExternalClient.c
     ExternalClient.h
     ExternalClientP.h
     ExternalShell.c
     ExternalShell.h
     ExternalShellP.h
     extw-Xlib.c
     extw-Xlib.h
     extw-Xt.c
     extw-Xt.h

   These files provide the "external widget" interface, which allows an
XEmacs frame to appear as a widget in another application.  To do this,
you have to configure with `--external-widget'.

   `ExternalShell*' provides the server (XEmacs) side of the connection.

   `ExternalClient*' provides the client (other application) side of
the connection.  These files are not compiled into XEmacs but are
compiled into libraries that are then linked into your application.

   `extw-*' is common code that is used for both the client and server.

   Don't touch this code; something is liable to break if you do.


File: internals.info,  Node: Modules for Internationalization,  Next: Modules for Regression Testing,  Prev: Modules for Interfacing with X Windows,  Up: A Summary of the Various XEmacs Modules

10.12 Modules for Internationalization
======================================

     mule-canna.c
     mule-ccl.c
     mule-charset.c
     mule-charset.h
     file-coding.c
     file-coding.h
     mule-mcpath.c
     mule-mcpath.h
     mule-wnnfns.c
     mule.c

   These files implement the MULE (Asian-language) support.  Note that
MULE actually provides a general interface for all sorts of languages,
not just Asian languages (although they are generally the most
complicated to support).  This code is still in beta.

   `mule-charset.*' and `file-coding.*' provide the heart of the XEmacs
MULE support.  `mule-charset.*' implements the "charset" Lisp object
type, which encapsulates a character set (an ordered one- or
two-dimensional set of characters, such as US ASCII or JISX0208 Japanese
Kanji).

   `file-coding.*' implements the "coding-system" Lisp object type,
which encapsulates a method of converting between different encodings.
An encoding is a representation of a stream of characters, possibly
from multiple character sets, using a stream of bytes or words, and
defines (e.g.) which escape sequences are used to specify particular
character sets, how the indices for a character are converted into bytes
(sometimes this involves setting the high bit; sometimes complicated
rearranging of the values takes place, as in the Shift-JIS encoding),
etc.

   `mule-ccl.c' provides the CCL (Code Conversion Language)
interpreter.  CCL is similar in spirit to Lisp byte code and is used to
implement converters for custom encodings.

   `mule-canna.c' and `mule-wnnfns.c' implement interfaces to external
programs used to implement the Canna and WNN input methods,
respectively.  This is currently in beta.

   `mule-mcpath.c' provides some functions to allow for pathnames
containing extended characters.  This code is fragmentary, obsolete, and
completely non-working.  Instead, `pathname-coding-system' is used to
specify conversions of names of files and directories.  The standard C
I/O functions like `open()' are wrapped so that conversion occurs
automatically.

   `mule.c' contains a few miscellaneous things.  It currently seems to
be unused and probably should be removed.

     intl.c

   This provides some miscellaneous internationalization code for
implementing message translation and interfacing to the Ximp input
method.  None of this code is currently working.

     iso-wide.h

   This contains leftover code from an earlier implementation of
Asian-language support, and is not currently used.


File: internals.info,  Node: Modules for Regression Testing,  Prev: Modules for Internationalization,  Up: A Summary of the Various XEmacs Modules

10.13 Modules for Regression Testing
====================================

     test-harness.el
     base64-tests.el
     byte-compiler-tests.el
     case-tests.el
     ccl-tests.el
     c-tests.el
     database-tests.el
     extent-tests.el
     hash-table-tests.el
     lisp-tests.el
     md5-tests.el
     mule-tests.el
     regexp-tests.el
     symbol-tests.el
     syntax-tests.el
     tag-tests.el

   `test-harness.el' defines the macros `Assert', `Check-Error',
`Check-Error-Message', and `Check-Message'.  The other files are test
files, testing various XEmacs modules.


File: internals.info,  Node: Allocation of Objects in XEmacs Lisp,  Next: Dumping,  Prev: A Summary of the Various XEmacs Modules,  Up: Top

11 Allocation of Objects in XEmacs Lisp
***************************************

* Menu:

* Introduction to Allocation::
* Garbage Collection::
* GCPROing::
* Garbage Collection - Step by Step::
* Integers and Characters::
* Allocation from Frob Blocks::
* lrecords::
* Low-level allocation::
* Cons::
* Vector::
* Bit Vector::
* Symbol::
* Marker::
* String::
* Compiled Function::


File: internals.info,  Node: Introduction to Allocation,  Next: Garbage Collection,  Up: Allocation of Objects in XEmacs Lisp

11.1 Introduction to Allocation
===============================

Emacs Lisp, like all Lisps, has garbage collection.  This means that
the programmer never has to explicitly free (destroy) an object; it
happens automatically when the object becomes inaccessible.  Most
experts agree that garbage collection is a necessity in a modern,
high-level language.  Its omission from C stems from the fact that C was
originally designed to be a nice abstract layer on top of assembly
language, for writing kernels and basic system utilities rather than
large applications.

   Lisp objects can be created by any of a number of Lisp primitives.
Most object types have one or a small number of basic primitives for
creating objects.  For conses, the basic primitive is `cons'; for
vectors, the primitives are `make-vector' and `vector'; for symbols,
the primitives are `make-symbol' and `intern'; etc.  Some Lisp objects,
especially those that are primarily used internally, have no
corresponding Lisp primitives.  Every Lisp object, though, has at least
one C primitive for creating it.

   Recall from section (VII) that a Lisp object, as stored in a 32-bit
or 64-bit word, has a few tag bits, and a "value" that occupies the
remainder of the bits.  We can separate the different Lisp object types
into three broad categories:

   * (a) Those for whom the value directly represents the contents of
     the Lisp object.  Only two types are in this category: integers and
     characters.  No special allocation or garbage collection is
     necessary for such objects.  Lisp objects of these types do not
     need to be `GCPRO'ed.

   In the remaining two categories, the type is stored in the object
itself.  The tag for all such objects is the generic "lrecord"
(Lisp_Type_Record) tag.  The first bytes of the object's structure are
an integer (actually a char) characterising the object's type and some
flags, in particular the mark bit used for garbage collection.  A
structure describing the type is accessible through the
lrecord_implementation_table indexed with said integer.  This structure
includes the method pointers and a pointer to a string naming the type.

   * (b) Those lrecords that are allocated in frob blocks (see above).
     This includes the objects that are most common and relatively
     small, and includes conses, strings, subrs, floats, compiled
     functions, symbols, extents, events, and markers.  With the
     cleanup of frob blocks done in 19.12, it's not terribly hard to
     add more objects to this category, but it's a bit trickier than
     adding an object type to type (c) (esp. if the object needs a
     finalization method), and is not likely to save much space unless
     the object is small and there are many of them. (In fact, if there
     are very few of them, it might actually waste space.)

   * (c) Those lrecords that are individually `malloc()'ed.  These are
     called "lcrecords".  All other types are in this category.  Adding
     a new type to this category is comparatively easy, and all types
     added since 19.8 (when the current allocation scheme was devised,
     by Richard Mlynarik), with the exception of the character type,
     have been in this category.

   Note that bit vectors are a bit of a special case.  They are simple
lrecords as in category (b), but are individually `malloc()'ed like
vectors.  You can basically view them as exactly like vectors except
that their type is stored in lrecord fashion rather than in
directly-tagged fashion.
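
   The distinction between category (a) and the lrecord categories can
be sketched in a few lines of C.  This is only a toy model: the two-bit
tag layout, the names `make_int', `wrap_record' and `type_name', and the
two-entry implementation table are assumptions made for illustration
and do not reproduce the real XEmacs definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model: a Lisp object is one machine word with two tag bits.
   The tag values below are hypothetical, not the real XEmacs layout. */
typedef intptr_t Lisp_Object;
enum { TAG_BITS = 2, TAG_MASK = 3 };
enum { TAG_RECORD = 0, TAG_INT = 1, TAG_CHAR = 2 };

/* Category (a): the value lives in the word itself, nothing is allocated. */
static Lisp_Object make_int(intptr_t n) { return n * (1 << TAG_BITS) + TAG_INT; }
static intptr_t int_value(Lisp_Object o) { return (o - TAG_INT) / (1 << TAG_BITS); }

/* Categories (b) and (c): the word is a pointer to a structure whose first
   bytes hold a small type index and the mark flag. */
struct lrecord_header { unsigned char type; unsigned char mark; };
struct lrecord_implementation { const char *name; };

enum { TYPE_CONS, TYPE_BUFFER, TYPE_COUNT };
static const struct lrecord_implementation implementation_table[TYPE_COUNT] = {
  { "cons" }, { "buffer" }
};

struct toy_cons { struct lrecord_header header; Lisp_Object car, cdr; };

static Lisp_Object wrap_record(void *p)
{
  /* The pointer must be aligned so that its low bits are free for the tag. */
  assert(((uintptr_t) p & TAG_MASK) == TAG_RECORD);
  return (Lisp_Object) (uintptr_t) p;
}

static int tag_of(Lisp_Object o) { return (int) ((uintptr_t) o & TAG_MASK); }

/* Find the type name: from the tag for immediate objects, otherwise
   through the header's index into the implementation table. */
static const char *type_name(Lisp_Object o)
{
  if (tag_of(o) == TAG_INT) return "integer";
  if (tag_of(o) == TAG_CHAR) return "character";
  const struct lrecord_header *h = (const struct lrecord_header *) (uintptr_t) o;
  return implementation_table[h->type].name;
}

static int has_type(Lisp_Object o, const char *name)
{
  return strcmp(type_name(o), name) == 0;
}
```

   With these definitions, `has_type (make_int (42), "integer")' holds
without any memory being allocated, while a `struct toy_cons' wrapped
with `wrap_record' is identified only by looking inside the object.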


File: internals.info,  Node: Garbage Collection,  Next: GCPROing,  Prev: Introduction to Allocation,  Up: Allocation of Objects in XEmacs Lisp

11.2 Garbage Collection
=======================

Garbage collection is simple in theory but tricky to implement.  Emacs
Lisp uses the oldest garbage collection method, called "mark and
sweep".  Garbage collection begins by starting with all accessible
locations (i.e. all variables and other slots where Lisp objects might
occur) and recursively traversing all objects accessible from those
slots, marking each one that is found.  We then go through all of
memory, freeing each object that is not marked and unmarking each
object that is marked.  Note that "all of memory" means all currently
allocated objects.  Traversing all these objects means traversing all
frob blocks, all vectors (which are chained in one big list), and all
lcrecords (which are likewise chained).

   Garbage collection can be invoked explicitly by calling
`garbage-collect' but is also called automatically by `eval', once a
certain amount of memory has been allocated since the last garbage
collection (according to `gc-cons-threshold').
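
   As a rough illustration of the scheme, here is a minimal mark and
sweep collector over objects that are chained in one list, the way
vectors and lcrecords are.  The names `struct obj', `allocate' and
`collect' are hypothetical, and each object holds only a single
reference to keep the sketch short.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy object: one outgoing reference, a mark flag, and a chain link that
   threads every allocated object into one big list. */
struct obj { int marked; struct obj *ref; struct obj *chain; };

static struct obj *all_objects;   /* head of the chain */
static size_t live_count;

static struct obj *allocate(struct obj *ref)
{
  struct obj *o = malloc(sizeof *o);
  assert(o != NULL);
  o->marked = 0;
  o->ref = ref;
  o->chain = all_objects;
  all_objects = o;
  live_count++;
  return o;
}

/* Mark phase: follow references until we hit an already marked object. */
static void mark(struct obj *o)
{
  while (o != NULL && !o->marked) {
    o->marked = 1;
    o = o->ref;
  }
}

/* Sweep phase: free what is unmarked, unmark what survives. */
static void sweep(void)
{
  struct obj **link = &all_objects;
  while (*link != NULL) {
    struct obj *o = *link;
    if (o->marked) {
      o->marked = 0;
      link = &o->chain;
    } else {
      *link = o->chain;
      free(o);
      live_count--;
    }
  }
}

static void collect(struct obj **roots, size_t nroots)
{
  for (size_t i = 0; i < nroots; i++)
    mark(roots[i]);
  sweep();
}
```

   Calling `collect' with a root array keeps exactly the objects that
can be reached from it; calling it with no roots frees everything.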


File: internals.info,  Node: GCPROing,  Next: Garbage Collection - Step by Step,  Prev: Garbage Collection,  Up: Allocation of Objects in XEmacs Lisp

11.3 `GCPRO'ing
===============

`GCPRO'ing is one of the ugliest and trickiest parts of Emacs
internals.  The basic idea is that whenever garbage collection occurs,
all in-use objects must be reachable somehow or other from one of the
roots of accessibility.  The roots of accessibility are:

  1. All objects that have been `staticpro()'d or
     `staticpro_nodump()'ed.  This is used for any global C variables
     that hold Lisp objects.  A call to `staticpro()' happens implicitly
     as a result of any symbols declared with `defsymbol()' and any
     variables declared with `DEFVAR_FOO()'.  You need to explicitly
     call `staticpro()' (in the `vars_of_foo()' method of a module) for
     other global C variables holding Lisp objects. (This typically
     includes internal lists and such things.)  Use
     `staticpro_nodump()' only in the rare cases when you do not want
     the pointed-to variable to be saved at dump time but would rather
     recompute it at startup.

     Note that `obarray' is one of the `staticpro()'d things.
     Therefore, all functions and variables get marked through this.

  2. Any shadowed bindings that are sitting on the `specpdl' stack.

  3. Any objects sitting in currently active (Lisp) stack frames,
     catches, and condition cases.

  4. A couple of special-case places where active objects are located.

  5. Anything currently marked with `GCPRO'.

   Marking with `GCPRO' is necessary because some C functions (quite a
lot, in fact) allocate objects during their operation.  Quite
frequently, there will be no other pointer to the object while the
function is running, and if a garbage collection occurs and the object
needs to be referenced again, bad things will happen.  The solution is
to mark those objects with `GCPRO'.  Unfortunately this is easy to
forget, and there is basically no way around this problem.  Here are
some rules, though:

  1. For every `GCPRON', there have to be declarations of `struct gcpro
     gcpro1, gcpro2', etc.

  2. You _must_ `UNGCPRO' anything that's `GCPRO'ed, and you _must not_
     `UNGCPRO' if you haven't `GCPRO'ed.  Getting either of these wrong
     will lead to crashes, often in completely random places unrelated
     to where the problem lies.

  3. The way this actually works is that all currently active `GCPRO's
     are chained through the `struct gcpro' local variables, with the
     variable `gcprolist' pointing to the head of the list and the nth
     local `gcpro' variable pointing to the first `gcpro' variable in
     the next enclosing stack frame.  Each `GCPRO'ed thing is an
     lvalue, and the `struct gcpro' local variable contains a pointer to
     this lvalue.  This is why things will mess up badly if you don't
     pair up the `GCPRO's and `UNGCPRO's--you will end up with
     `gcprolist's containing pointers to `struct gcpro's or local
     `Lisp_Object' variables in no-longer-active stack frames.

  4. It is actually possible for a single `struct gcpro' to protect a
     contiguous array of any number of values, rather than just a
     single lvalue.  To effect this, call `GCPRON' as usual on the
     first object in the array and then set `gcproN.nvars'.

  5. *Strings are relocated.*  What this means in practice is that the
     pointer obtained using `XSTRING_DATA()' is liable to change at any
     time, and you should never keep it around past any function call,
     or pass it as an argument to any function that might cause a
     garbage collection.  This is why a number of functions accept
     either a "non-relocatable" `char *' pointer or a relocatable Lisp
     string, and only access the Lisp string's data at the very last
     minute.  In some cases, you may end up having to `alloca()' some
     space and copy the string's data into it.

  6. By convention, if you have to nest `GCPRO's, use `NGCPRON' (along
     with `struct gcpro ngcpro1, ngcpro2', etc.), `NNGCPRON', etc.
     This avoids compiler warnings about shadowed locals.

  7. It is _always_ better to err on the side of extra `GCPRO's rather
     than too few.  The extra cycles spent on this are almost never
     going to make a whit of difference in the speed of anything.

  8. The general rule to follow is that caller, not callee, `GCPRO's.
     That is, you should not have to explicitly `GCPRO' any Lisp objects
     that are passed in as parameters.

     One exception from this rule is if you ever plan to change the
     parameter value, and store a new object in it.  In that case, you
     _must_ `GCPRO' the parameter, because otherwise the new object
     will not be protected.

     So, if you create any Lisp objects (remember, this happens in all
     sorts of circumstances, e.g. with `Fcons()', etc.), you are
     responsible for `GCPRO'ing them, unless you are _absolutely sure_
     that there's no possibility that a garbage-collection can occur
     while you need to use the object.  Even then, consider `GCPRO'ing.

  9. A garbage collection can occur whenever anything calls `Feval', or
     whenever a QUIT can occur where execution can continue past this.
     (Remember, this is almost anywhere.)

 10. If you have the _least smidgeon of doubt_ about whether you need
     to `GCPRO', you should `GCPRO'.

 11. Beware of `GCPRO'ing something that is uninitialized.  If you have
     any shade of doubt about this, initialize all your variables to
     `Qnil'.

 12. Be careful of traps, like calling `Fcons()' in the argument to
     another function.  By the "caller protects" law, you should be
     `GCPRO'ing the newly-created cons, but you aren't.  A certain
     number of functions that are commonly called on freshly created
     stuff (e.g. `nconc2()', `Fsignal()'), break the "caller protects"
     law and go ahead and `GCPRO' their arguments so as to simplify
     things, but make sure and check if it's OK whenever doing
     something like this.

 13. Once again, remember to `GCPRO'!  Bugs resulting from insufficient
     `GCPRO'ing are intermittent and extremely difficult to track down,
     often showing up in crashes inside of `garbage-collect' or in
     weirdly corrupted objects or even in incorrect values in a totally
     different section of code.

   If you don't understand whether to `GCPRO' in a particular instance,
ask on the mailing lists.  A general hint is that `prog1' is the
canonical example.
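
   The following toy model sketches why `prog1' needs a `GCPRO'.  It is
not the XEmacs code: `Lisp_Object' is simplified to a plain pointer,
the heap is a small static array, and `toy_alloc', `toy_gc', `toy_eval'
and the two `toy_prog1' variants are hypothetical names.  Only the
shape of `struct gcpro' (a link, a pointer to the protected lvalue and
a count) and the push/pop behaviour of `GCPRO1' and `UNGCPRO' follow
the description in rule 3 above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy Lisp object: here simply a pointer into a small static heap. */
typedef struct toy_object { int alive, marked, value; } *Lisp_Object;

/* One link of the chain of active protections (see rule 3). */
struct gcpro { struct gcpro *next; Lisp_Object *var; int nvars; };
static struct gcpro *gcprolist;

#define GCPRO1(v) do { gcpro1.next = gcprolist; gcpro1.var = &(v); \
                       gcpro1.nvars = 1; gcprolist = &gcpro1; } while (0)
#define UNGCPRO   do { gcprolist = gcpro1.next; } while (0)

enum { HEAP_SIZE = 8 };
static struct toy_object heap[HEAP_SIZE];

static Lisp_Object toy_alloc(int value)
{
  for (int i = 0; i < HEAP_SIZE; i++)
    if (!heap[i].alive) {
      heap[i].alive = 1;
      heap[i].marked = 0;
      heap[i].value = value;
      return &heap[i];
    }
  return NULL;   /* heap exhausted; not handled in this sketch */
}

/* In this toy collector the only roots are the GCPRO'ed variables. */
static void toy_gc(void)
{
  for (struct gcpro *g = gcprolist; g != NULL; g = g->next)
    for (int i = 0; i < g->nvars; i++)
      if (g->var[i] != NULL)
        g->var[i]->marked = 1;
  for (int i = 0; i < HEAP_SIZE; i++) {
    if (!heap[i].marked)
      heap[i].alive = 0;      /* slot becomes reusable */
    heap[i].marked = 0;
  }
}

/* Stand-in for evaluating a form: it may collect, then allocates. */
static Lisp_Object toy_eval(int value)
{
  toy_gc();
  return toy_alloc(value);
}

static Lisp_Object toy_prog1(int first, int second)
{
  struct gcpro gcpro1;
  Lisp_Object val = toy_eval(first);
  GCPRO1(val);
  toy_eval(second);   /* may collect; val is reachable through gcprolist */
  UNGCPRO;
  return val;
}

static Lisp_Object toy_prog1_unprotected(int first, int second)
{
  Lisp_Object val = toy_eval(first);
  toy_eval(second);   /* collects val; its slot may be handed out again */
  return val;
}
```

   In this model `toy_prog1 (1, 2)' returns an object that still holds
1, whereas `toy_prog1_unprotected (3, 4)' returns a pointer to a slot
that has been recycled and now holds 4.  In real code the corresponding
symptom is the kind of corrupted object described in rule 13.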

   Given the extremely error-prone nature of the `GCPRO' scheme, and
the difficulty of tracking down the resulting bugs, it should be
considered a deficiency in the XEmacs code.  A solution to this problem
would involve implementing so-called "conservative" garbage collection
for the C stack.  That involves looking through all of stack memory and
treating anything that looks like a reference to an object as a
reference.  This will result in a few objects not getting collected
when they should, but it obviates the need for `GCPRO'ing, and allows
garbage collection to happen at any point at all, such as during object
allocation.


File: internals.info,  Node: Garbage Collection - Step by Step,  Next: Integers and Characters,  Prev: GCPROing,  Up: Allocation of Objects in XEmacs Lisp

11.4 Garbage Collection - Step by Step
======================================

* Menu:

* Invocation::
* garbage_collect_1::
* mark_object::
* gc_sweep::
* sweep_lcrecords_1::
* compact_string_chars::
* sweep_strings::
* sweep_bit_vectors_1::


File: internals.info,  Node: Invocation,  Next: garbage_collect_1,  Up: Garbage Collection - Step by Step

11.4.1 Invocation
-----------------

The first thing that anyone should know about garbage collection is
when and how the garbage collector is invoked.  One might think that
this could happen every time new memory is allocated, e.g. when new
objects are created, but this is _not_ the case.  Instead, we have the
following situation:

   The entry point of any process of garbage collection is an invocation
of the function `garbage_collect_1' in file `alloc.c'. The invocation
can occur _explicitly_ by calling the function `Fgarbage_collect' (in
addition this function provides information about the freed memory), or
can occur _implicitly_ in four different situations:
  1. In function `main_1' in file `emacs.c'. This function is called at
     each startup of xemacs. The garbage collection is invoked after all
     initial creations are completed, but only if a special internal
     error checking-constant `ERROR_CHECK_GC' is defined.

  2. In function `disksave_object_finalization' in file `alloc.c'. The
     only purpose of this function is to clear the objects from memory
     which need not be stored with xemacs when we dump out an
     executable. This is only done by `Fdump_emacs' or by
     `Fdump_emacs_data' respectively (both in `emacs.c'). The actual
     clearing is accomplished by making these objects unreachable and
     starting a garbage collection. The function is only used while
     building xemacs.

  3. In function `Feval / eval' in file `eval.c'. Each time the well
     known and often used function eval is called to evaluate a form,
     one of the first things that could happen, is a potential call of
     `garbage_collect_1'. There exist three global variables,
     `consing_since_gc' (counts the created cons-cells since the last
     garbage collection), `gc_cons_threshold' (a specified threshold
     after which a garbage collection occurs) and `always_gc'. If
     `always_gc' is set or if the threshold is exceeded, the garbage
     collection will start.

  4. In function `Ffuncall / funcall' in file `eval.c'. This function
     evaluates calls of elisp functions and works according to `Feval'.

   The upshot is that garbage collection can occur essentially anywhere
`Feval' or `Ffuncall' is used, either directly or through another
function.  Since calls to these two functions are hidden in various
other functions, many calls to `garbage_collect_1' are not obviously
foreseeable, and are therefore unexpected.  Instances worth remembering
are various elisp commands, for example `or', `and', `if', `cond',
`while', `setq', etc., miscellaneous `gui_item_...' functions,
everything related to `eval' (`Feval_buffer', `call0', ...) and the
inside of `Fsignal'.  The latter is used to handle signals, for example
the ones raised by every `QUIT' macro triggered after pressing Ctrl-g.
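
   The check made on entry to `Feval' and `Ffuncall' can be pictured as
follows.  This is a simplified sketch: the three variable names come
from the description above, but the threshold value and the helper
functions `toy_note_allocation', `toy_garbage_collect' and
`toy_eval_entry' are invented for the example.

```c
#include <assert.h>

static long consing_since_gc;           /* grows with every allocation */
static long gc_cons_threshold = 1000;   /* hypothetical value */
static int always_gc;                   /* debugging aid: collect every time */
static int collections;                 /* counts collections, for the demo */

static void toy_garbage_collect(void)
{
  collections++;
  consing_since_gc = 0;
}

/* Allocation only updates the counter; it never collects by itself. */
static void toy_note_allocation(long amount)
{
  consing_since_gc += amount;
}

/* The test performed whenever a form is evaluated or a function called. */
static void toy_eval_entry(void)
{
  if (always_gc || consing_since_gc > gc_cons_threshold)
    toy_garbage_collect();
}
```

   The point of the sketch is that exceeding the threshold has no
effect until the next evaluation: the collection is deferred to a place
where the evaluator knows which objects are protected.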


File: internals.info,  Node: garbage_collect_1,  Next: mark_object,  Prev: Invocation,  Up: Garbage Collection - Step by Step

11.4.2 `garbage_collect_1'
--------------------------

We can now describe exactly what happens after the invocation takes
place.
  1. There are several cases in which the garbage collector is left
     immediately: when we are already garbage collecting
     (`gc_in_progress'), when the garbage collection is somehow
     forbidden (`gc_currently_forbidden'), when we are currently
     displaying something (`in_display') or when we are preparing for
     the armageddon of the whole system (`preparing_for_armageddon').

  2. Next the correct frame in which to put all the output occurring
     during garbage collecting is determined. In order to be able to
     restore the old display's state after displaying the message, some
     data about the current cursor position has to be saved. The
     variables `pre_gc_cursor' and `cursor_changed' take care of that.

  3. The state of `gc_currently_forbidden' must be restored after the
     garbage collection, no matter what happens during the process. We
     accomplish this by `record_unwind_protect'ing the suitable function
     `restore_gc_inhibit' together with the current value of
     `gc_currently_forbidden'.

  4. If we are running an interactive XEmacs session, the next step is
     simply to show the garbage collector's cursor/message.

  5. The following steps are the intrinsic steps of the garbage
     collector, therefore `gc_in_progress' is set.

  6. For debugging purposes, it is possible to copy the current C stack
     frame. However, this seems to be a currently unused feature.

  7. Before actually starting to go over all live objects, references to
     objects that are no longer used are pruned. We only have to do
     this for events (`clear_event_resource') and for specifiers
     (`cleanup_specifiers').

  8. Now the mark phase begins and marks all accessible elements. In
     order to start from all slots that serve as roots of
     accessibility, the function `mark_object' is called for each root
     individually to go out from there to mark all reachable objects.
     All roots that are traversed are shown in their processed order:
        * all constant symbols and static variables that are registered
          via `staticpro' in the dynarr `staticpros'.  *Note Adding
          Global Lisp Variables::.

        * all Lisp objects that are created in C functions and that
          must be protected from freeing them. They are registered in
          the global list `gcprolist'.  *Note GCPROing::.

        * all local variables (i.e. their name fields `symbol' and old
          values `old_values') that are bound during the evaluation by
          the Lisp engine. They are stored in `specbinding' structs
          pushed on a stack called `specpdl'.  *Note Dynamic Binding;
          The specbinding Stack; Unwind-Protects::.

        * all catch blocks that the Lisp engine encounters during the
          evaluation cause the creation of structs `catchtag' inserted
          in the list `catchlist'. Their tag (`tag') and value (`val')
          fields are freshly created objects and therefore have to be
          marked.  *Note Catch and Throw::.

        * every function application pushes new structs `backtrace' on
          the call stack of the Lisp engine (`backtrace_list'). The
          unique parts that have to be marked are the fields for each
          function (`function') and all their arguments (`args').
          *Note Evaluation::.

        * all objects that are used by the redisplay engine that must
          not be freed are marked by a special function called
          `mark_redisplay' (in `redisplay.c').

        * all objects created for profiling purposes are allocated by C
          functions instead of using the Lisp allocation mechanisms. In
          order to keep the right ones during the sweep phase, they
          also have to be marked manually. That is done by the function
          `mark_profiling_info'.

  9. Hash tables in XEmacs belong to a special kind of object that
     makes use of a concept often called 'weak pointers'.  To make a
     long story short, this kind of pointer is not followed when the
     live objects are determined during garbage collection.  Any
     object referenced only by weak pointers is collected anyway, and
     the reference to it is cleared. In hash tables there are different
     usage patterns of them, manifesting in different types of hash
     tables, namely 'non-weak', 'weak', 'key-weak' and 'value-weak'
     (internally also 'key-car-weak' and 'value-car-weak') hash tables,
     each clearing entries depending on different conditions. More
     information can be found in the documentation of the function
     `make-hash-table'.

     Because there are complicated dependency rules about when and what
     to mark while processing weak hash tables, the standard `marker'
     method only marks the entries of non-weak hash tables. As soon as
     a table has a weak component, its entries are ignored while
     marking. Instead they are marked separately by the function
     `finish_marking_weak_hash_tables'. This function iterates over
     each hash table entry `hentries' for each weak hash table in
     `Vall_weak_hash_tables'. Depending on the type of a table, the
     appropriate action is performed.  If a table is acting as
     `HASH_TABLE_KEY_WEAK' and a key is already marked, everything
     reachable from the `value' component is marked. If it is acting
     as `HASH_TABLE_VALUE_WEAK' and the value component is already
     marked, everything reachable from the `key' component is marked.
     If it is a `HASH_TABLE_KEY_CAR_WEAK' and the car of the key entry
     is already marked, we mark both the `key' and `value' components.
     Finally, if the table is of the type `HASH_TABLE_VALUE_CAR_WEAK'
     and the car of the value component is already marked, again both
     the `key' and the `value' components get marked.

     There are also lists with comparable properties, called weak
     lists. They come in different types called `simple', `assoc',
     `key-assoc' and `value-assoc'. You can find further details about
     them in the description of the function `make-weak-list'. The
     scheme of their marking is similar: all weak lists are listed in
     `Vall_weak_lists', therefore we iterate over them. The marking is
     advanced until we hit an already marked pair. Then we know that
     during a former run all the rest has been marked completely.
     Again, depending on the special type of the weak list, our jobs
     differ. If it is a `WEAK_LIST_SIMPLE' and the elem is
     marked, we mark the `cons' part. If it is a `WEAK_LIST_ASSOC' and
     not a pair or a pair with both marked car and cdr, we mark the
     `cons' and the `elem'. If it is a `WEAK_LIST_KEY_ASSOC' and not a
     pair or a pair with a marked car of the elem, we mark the `cons'
     and the `elem'. Finally, if it is a `WEAK_LIST_VALUE_ASSOC' and
     not a pair or a pair with a marked cdr of the elem, we mark both
     the `cons' and the `elem'.

     Since marking the objects reachable from weak hash tables and weak
     lists can mark other objects, which in turn may require further
     marking of other weak objects, both finishing functions are
     repeated as long as previously unmarked objects get freshly marked.

 10. After completing the special marking for the weak hash tables and
     for the weak lists, all entries that point to objects that are
     going to be swept in the further process are useless, and
     therefore have to be removed from the table or the list.

     The function `prune_weak_hash_tables' does the job for weak hash
     tables. Totally unmarked hash tables are removed from the list
     `Vall_weak_hash_tables'. The other ones are treated more carefully
     by scanning over all entries and removing one as soon as one of
     the components `key' and `value' is unmarked.

     The same idea applies to the weak lists. It is accomplished by
     `prune_weak_lists': An unmarked list is pruned from
     `Vall_weak_lists' immediately. A marked list is treated more
     carefully by going over it and removing just the unmarked pairs.

 11. The function `prune_specifiers' checks all listed specifiers held
     in `Vall_specifiers' and removes the ones from the lists that are
     unmarked.

 12. All syntax tables are stored in a list called
     `Vall_syntax_tables'. The function `prune_syntax_tables' walks
     through it and unlinks the tables that are unmarked.

 13. Next comes the actual sweeping, which is carried out by the
     function `gc_sweep'.

 14. Afterwards, all the variables with respect to garbage collection
     are reset. `consing_since_gc' - the counter of the created cells
     since the last garbage collection - is set back to 0, and
     `gc_in_progress' is cleared.

 15. In case the session is interactive, the displayed cursor and
     message are removed again.

 16. The state of `gc_currently_forbidden' is restored to the former
     value by unwinding the stack.

 17. A small memory reserve is always held back that can be reached by
     `breathing_space'. If nothing more is left, we create a new reserve
     and exit.
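
   Steps 9 and 10 are the least obvious part of the procedure, so here
is a sketch of them for the key-weak case only.  It is a toy model
under stated assumptions: objects carry a single reference, the table
is a plain array of entries, and the function names
`finish_marking_key_weak', `prune_key_weak' and `mark_and_prune' are
invented.  What it is meant to show is the repetition until nothing new
gets marked, followed by the removal of entries with an unmarked
component.

```c
#include <assert.h>
#include <stddef.h>

struct obj { int marked; struct obj *ref; };
struct hentry { struct obj *key, *value; };   /* key == NULL: empty slot */

static void mark(struct obj *o)
{
  while (o != NULL && !o->marked) {
    o->marked = 1;
    o = o->ref;
  }
}

/* One pass over a key-weak table: a value is marked only if its key has
   been marked by somebody else.  Returns nonzero if anything was marked. */
static int finish_marking_key_weak(struct hentry *e, size_t n)
{
  int did_mark = 0;
  for (size_t i = 0; i < n; i++)
    if (e[i].key != NULL && e[i].key->marked
        && e[i].value != NULL && !e[i].value->marked) {
      mark(e[i].value);
      did_mark = 1;
    }
  return did_mark;
}

/* Remove every entry with an unmarked component; return the number kept. */
static size_t prune_key_weak(struct hentry *e, size_t n)
{
  size_t kept = 0;
  for (size_t i = 0; i < n; i++) {
    if (e[i].key == NULL)
      continue;
    if (e[i].key->marked && e[i].value != NULL && e[i].value->marked)
      kept++;
    else {
      e[i].key = NULL;
      e[i].value = NULL;
    }
  }
  return kept;
}

static size_t mark_and_prune(struct hentry *e, size_t n)
{
  /* Marking a value can make another entry's key live, so repeat the
     pass until it no longer marks anything. */
  while (finish_marking_key_weak(e, n))
    continue;
  return prune_key_weak(e, n);
}
```

   If a table maps `b' to `c' and `a' to `b', and only `a' is marked by
the ordinary mark phase, the first pass marks `b' and only the second
pass marks `c'.  An entry whose key is never marked is dropped by the
pruning step.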


File: internals.info,  Node: mark_object,  Next: gc_sweep,  Prev: garbage_collect_1,  Up: Garbage Collection - Step by Step

11.4.3 `mark_object'
--------------------

The first thing that is checked while marking an object is whether the
object is a real Lisp object `Lisp_Type_Record' or just an integer or a
character. Integers and characters are the only two types that are
stored directly - without another level of indirection, and therefore
they don't have to be marked and collected.  *Note How Lisp Objects Are
Represented in C::.

   The second case is the one we have to handle: we are dealing with a
pointer to a Lisp object. Even then, there are three possibilities
that prevent us from doing anything while marking. The object may be
read only, which prevents it from being garbage collected, and hence
from being marked (`C_READONLY_RECORD_HEADER'). The object in question
may already be marked, and need not be marked a second time (checked
by `MARKED_RECORD_HEADER_P'). Or it may be a special, unmarkable object
(`UNMARKABLE_RECORD_HEADER_P'; apparently, these are objects that sit
in some const space, and can therefore not be marked, see
`this_one_is_unmarkable' in `alloc.c').

   Now, the actual marking is feasible. We do so by using the macro
`MARK_RECORD_HEADER' to mark the object itself (actually the special
flag in the lrecord header), and then calling its special marker
"method" `marker' if available. The marker method marks every other
object that is in reach from our current object. Note that these marker
methods should not call `mark_object' recursively, but instead should
return the next object from where further marking has to be performed.

   In case another object was returned, as mentioned before, we
reiterate the whole `mark_object' process beginning with this next
object.
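
   The loop described above can be sketched like this.  The record
layout and the names `toy_mark_object' and `mark_cons' are assumptions
made for the example; in particular, the sketch lets the cons marker
recurse for the car and return only the cdr, which is one plausible way
to keep long lists from deepening the C stack.

```c
#include <assert.h>
#include <stddef.h>

struct toy_record;
typedef struct toy_record *(*marker_fn)(struct toy_record *);

struct toy_record {
  int readonly, marked;
  marker_fn marker;              /* may be NULL: nothing else to mark */
  struct toy_record *car, *cdr;
};

static void toy_mark_object(struct toy_record *obj)
{
  while (obj != NULL) {
    if (obj->readonly || obj->marked)
      return;                    /* nothing to do for this object */
    obj->marked = 1;
    if (obj->marker == NULL)
      return;
    obj = obj->marker(obj);      /* continue with the returned object */
  }
}

/* Marker for a cons-like record: mark the car now, and hand the cdr back
   to the loop above instead of recursing into it. */
static struct toy_record *mark_cons(struct toy_record *obj)
{
  toy_mark_object(obj->car);
  return obj->cdr;
}
```

   With this arrangement, marking a list of any length uses a constant
number of stack frames for the cdr chain; only nesting through the car
costs recursion depth.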


File: internals.info,  Node: gc_sweep,  Next: sweep_lcrecords_1,  Prev: mark_object,  Up: Garbage Collection - Step by Step

11.4.4 `gc_sweep'
-----------------

The job of this function is to free all unmarked records from memory. As
we know, there are different types of objects implemented and managed,
and consequently different ways to free them from memory.  *Note
Introduction to Allocation::.

   We start with all objects stored through `lcrecords'. All bulkier
objects are allocated and handled using that scheme of `lcrecords'.
Each object is `malloc'ed separately instead of placing it in one of
the contiguous frob blocks. The types currently stored using the
`lcrecords' functions `alloc_lcrecord' and `make_lcrecord_list' are:
vectors, buffers, char-table, char-table-entry, console, weak-list,
database, device, ldap, hash-table, command-builder, extent-auxiliary,
extent-info, face, coding-system, frame, image-instance, glyph,
popup-data, gui-item, keymap, charset, color_instance, font_instance,
opaque, opaque-list, process, range-table, specifier,
symbol-value-buffer-local, symbol-value-lisp-magic,
symbol-value-varalias, toolbar-button, tooltalk-message,
tooltalk-pattern, window, and window-configuration. We take care of
them first in order to be able to handle and to finalize the items
stored in them more easily. The function `sweep_lcrecords_1', described
below, does the whole job for us.  For a description of the internals:
*Note lrecords::.

   Our next candidates are the other objects that behave quite
differently from everything else: the strings. They consist of two
parts, a fixed-size portion (`struct Lisp_String') holding the string's
length, its property list and a pointer to the second part, and the
actual string data, which is stored in string-chars blocks comparable to
frob blocks. In these blocks, the data is not only freed, but the holes
are also compacted, i.e. all strings are relocated together.  *Note
String::. This compacting phase is performed by the function
`compact_string_chars'; the actual sweeping, done by the function
`sweep_strings', is described below.

   After that, the other types are swept step by step using the
functions `sweep_conses', `sweep_bit_vectors_1',
`sweep_compiled_functions', `sweep_floats', `sweep_symbols',
`sweep_extents', `sweep_markers' and `sweep_events'.  They are the
fixed-size types cons, float, compiled-function, symbol, marker,
extent, and event stored in so-called "frob blocks", and therefore we
can basically do the same for objects of every type, using the same
macros, which are defined specifically to handle everything with
respect to fixed-size blocks. The only fixed-size type that is not
handled here is the fixed-size portion of strings, because we took
special care of them earlier.

   The only big exception is bit vectors, which are stored differently
and are therefore treated differently by the function
`sweep_bit_vectors_1' described later.

   First, we need some brief information about how these fixed-size
types are managed in general, in order to understand how the sweeping
is done. They all have a fixed size, and are therefore stored in big
blocks of memory - allocated at once - that can hold a certain number
of objects of one type. The macro `DECLARE_FIXED_TYPE_ALLOC' creates
the suitable structures for every type. More precisely, we have the
block struct (holding a pointer to the previous block `prev' and the
objects in `block[]'), a pointer to the current block
(`current_..._block') and its last index (`current_..._block_index'),
and a pointer to the free list that will be created. There is also a
macro `FIXED_TYPE_FROM_BLOCK', plus some related macros, used to obtain
a new object, either from the free list (`ALLOCATE_FIXED_TYPE_1') if
there is an unused object of that type stored, or by allocating a
completely new block using `ALLOCATE_FIXED_TYPE_FROM_BLOCK'.
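
   A hand-written equivalent of what these macros set up might look as
follows for a single type.  This is a sketch, not the macro expansion:
the block size of four, the union `toy_cell' and the functions
`allocate_cell' and `free_cell' are hypothetical, chosen only to show
the order in which the free list and the current block are consulted.

```c
#include <assert.h>
#include <stdlib.h>

enum { OBJECTS_PER_BLOCK = 4 };   /* hypothetical; real blocks are larger */

/* A free cell reuses its own storage for the free-list link. */
union toy_cell { union toy_cell *next_free; double value; };

struct toy_block {
  struct toy_block *prev;                    /* previously filled block */
  union toy_cell block[OBJECTS_PER_BLOCK];
};

static struct toy_block *current_block;
static int current_block_index = OBJECTS_PER_BLOCK;  /* forces a first block */
static union toy_cell *free_list;
static int blocks_allocated;

static union toy_cell *allocate_cell(void)
{
  /* 1. Prefer a previously freed cell. */
  if (free_list != NULL) {
    union toy_cell *c = free_list;
    free_list = c->next_free;
    return c;
  }
  /* 2. Otherwise take the next slot, starting a new block when full. */
  if (current_block_index == OBJECTS_PER_BLOCK) {
    struct toy_block *b = malloc(sizeof *b);
    assert(b != NULL);
    b->prev = current_block;
    current_block = b;
    current_block_index = 0;
    blocks_allocated++;
  }
  return &current_block->block[current_block_index++];
}

static void free_cell(union toy_cell *c)
{
  c->next_free = free_list;
  free_list = c;
}
```

   Filling one block and asking for one more cell allocates a second
block; after `free_cell' has been called, the next `allocate_cell'
returns the freed cell without touching the blocks at all.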

   The rest works as follows: all of them define a macro `UNMARK_...'
that is used to unmark the object. They define a macro
`ADDITIONAL_FREE_...' that defines additional work that has to be done
when converting an object from in use to not in use (so far, only
markers use it in order to unchain them). Then, they all call the macro
`SWEEP_FIXED_TYPE_BLOCK' instantiated with their type name and their
struct name.

   This call in particular does the following: we go over all blocks,
starting with the current one and moving towards the oldest.  For each
block, we look at every object in it.  If the object is already freed
(checked with `FREE_STRUCT_P' using the first pointer of the object), or
if it is set to read only (`C_READONLY_RECORD_HEADER_P'), nothing needs
to be done.  If it is unmarked (checked with `MARKED_RECORD_HEADER_P'),
it is put in the free list and set free (using the macro
`FREE_FIXED_TYPE'); otherwise it stays in the block, but is unmarked (by
`UNMARK_...').  While going through one block, we note whether the whole
block is empty.  If so, the whole block is freed (using `xfree') and the
free list is reset to the state it had before handling this block.
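
   The sweep loop described above can be sketched roughly as follows.
This is a simplified illustration and not the actual
`SWEEP_FIXED_TYPE_BLOCK' macro: the names `toy_obj', `toy_block' and
`toy_sweep', the block size of 4, and the explicit `marked' and
`is_free' flags (instead of header bits) are all assumptions made for
the example, and read-only objects are left out.  The sketch also
rebuilds the free list from scratch on every sweep, which is a choice
made here so that releasing an empty block cannot leave stale entries
behind.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_PER_BLOCK 4             /* hypothetical; real blocks hold more */

typedef struct toy_obj {
  struct toy_obj *next_free;        /* free-list chain, valid when is_free */
  int is_free;                      /* stands in for FREE_STRUCT_P */
  int marked;                       /* stands in for the header mark bit */
} toy_obj;

typedef struct toy_block {
  struct toy_block *prev;           /* next-older block */
  toy_obj block[TOY_PER_BLOCK];
} toy_block;

toy_block *toy_current_block;       /* newest block */
int toy_current_block_index;        /* first never-used slot in it */
toy_obj *toy_free_list;

/* Sweep all blocks, newest to oldest; return the number of live objects. */
int toy_sweep (void) {
  toy_block **link = &toy_current_block;
  toy_block *b;
  int limit = toy_current_block_index;
  int live = 0;

  assert (limit >= 0 && limit <= TOY_PER_BLOCK);
  toy_free_list = NULL;             /* rebuilt from scratch below */
  while ((b = *link) != NULL) {
    toy_obj *saved_free_list = toy_free_list;
    int live_in_block = 0;
    int i;

    for (i = 0; i < limit; i++) {
      toy_obj *o = &b->block[i];
      if (!o->is_free && o->marked) {
        o->marked = 0;              /* survivor: just unmark it */
        live_in_block++;
      } else {
        o->is_free = 1;             /* garbage, or free already */
        o->next_free = toy_free_list;
        toy_free_list = o;
      }
    }
    if (live_in_block == 0) {
      /* Empty block: release it and forget the entries it contributed. */
      if (b == toy_current_block)
        toy_current_block_index = TOY_PER_BLOCK;  /* older block is full */
      *link = b->prev;
      free (b);
      toy_free_list = saved_free_list;
    } else {
      live += live_in_block;
      link = &b->prev;
    }
    limit = TOY_PER_BLOCK;          /* every older block is full */
  }
  return live;
}
```

   Saving the free list before each block and restoring it when the
block turns out to be empty is what keeps pointers into a released
block from ever appearing on the list.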


File: internals.info,  Node: sweep_lcrecords_1,  Next: compact_string_chars,  Prev: gc_sweep,  Up: Garbage Collection - Step by Step

11.4.5 `sweep_lcrecords_1'
--------------------------

After nullifying the complete lcrecord statistics, we go over all
lcrecords two separate times. They are all chained together in a list
with a head called `all_lcrecords'.

   The first loop calls the `finalizer' method of each object, but only
if the object is not read only (`C_READONLY_RECORD_HEADER_P'), is not
marked (`MARKED_RECORD_HEADER_P'), is not already in a free list (list
of freed objects, field `free'), and actually has a finalizer method.

   The second loop actually frees the appropriate objects, again by
iterating through the whole list.  If an object is read only or marked,
it has to persist; otherwise it is manually freed by calling `xfree'.
During this loop, the lcrecord statistics are kept up to date by calling
`tick_lcrecord_stats' with the right arguments.
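
   The two loops can be pictured with the following minimal sketch.
All names here (`toy_lcrecord', `toy_sweep_lcrecords', and so on) are
hypothetical, the flags are plain struct fields rather than header
bits, and the free-list check and the statistics are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct toy_lcrecord {
  struct toy_lcrecord *next;                  /* chain of all records */
  int marked;
  int readonly;
  void (*finalizer) (struct toy_lcrecord *);  /* may be NULL */
} toy_lcrecord;

toy_lcrecord *toy_all_lcrecords;
int toy_finalized_count;

/* Example finalizer: it only counts how often it was called. */
void toy_count_finalizer (toy_lcrecord *r) {
  assert (!r->marked);              /* only garbage is finalized */
  toy_finalized_count++;
}

/* Allocate a record and chain it onto the head of the list. */
toy_lcrecord *toy_alloc_lcrecord (void (*finalizer) (toy_lcrecord *)) {
  toy_lcrecord *r = calloc (1, sizeof *r);
  if (r == NULL) abort ();
  r->finalizer = finalizer;
  r->next = toy_all_lcrecords;
  toy_all_lcrecords = r;
  return r;
}

/* Two-pass sweep; returns the number of surviving records. */
int toy_sweep_lcrecords (void) {
  toy_lcrecord *r;
  toy_lcrecord **link;
  int survivors = 0;

  /* Pass 1: finalize garbage while every record is still intact. */
  for (r = toy_all_lcrecords; r != NULL; r = r->next)
    if (!r->readonly && !r->marked && r->finalizer != NULL)
      r->finalizer (r);

  /* Pass 2: unlink and free garbage, unmark the survivors. */
  link = &toy_all_lcrecords;
  while ((r = *link) != NULL) {
    if (r->readonly || r->marked) {
      r->marked = 0;
      survivors++;
      link = &r->next;
    } else {
      *link = r->next;
      free (r);
    }
  }
  return survivors;
}
```

   In this sketch, running every finalizer before anything is freed
means that a finalizer never observes a record that has already been
released.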


File: internals.info,  Node: compact_string_chars,  Next: sweep_strings,  Prev: sweep_lcrecords_1,  Up: Garbage Collection - Step by Step

11.4.6 `compact_string_chars'
-----------------------------

The purpose of this function is to compact all the data parts of the
strings that are held in so-called `string_chars_block's, i.e. the
strings that do not exceed a certain maximal length.

   This is done as follows.  We keep two positions in the
`string_chars_block's using two pointer/integer pairs, namely
`from_sb'/`from_pos' and `to_sb'/`to_pos'.  They give the position the
string currently being handled is copied from and the position it is
copied to.

   While going over all chained `string_chars_block's and the strings
they hold, starting at `first_string_chars_block', both positions are
advanced, and a string may be copied from `from_sb' to `to_sb',
depending on the status of the string pointed at.

   More precisely, we can distinguish between the following actions.
   * The string at `from_sb''s position could be marked as free, which
     is indicated by an invalid value in the pointer that should point
     back to the fixed-size string object, and which is checked by
     `FREE_STRUCT_P'.  In this case, `from_sb'/`from_pos' is advanced
     to the next string, and nothing has to be copied.

   * Also, if a string object itself is unmarked, nothing has to be
     copied. We likewise advance the `from_sb'/`from_pos' pair as
     described above.

   * In all other cases, we have a marked string at hand. The string
     data must be moved from the from-position to the to-position. In
     case there is not enough space in the current `to_sb' block, we
     advance this pointer to the beginning of the next block before
     copying. In case the from and to positions are different, we
     perform the actual copying using the library function `memmove'.

   After compacting, the pointer to the current `string_chars_block',
sitting in `current_string_chars_block', is reset to the last block to
which we moved a string, i.e. `to_sb', and all remaining blocks (we
know that they just carry garbage) are explicitly `xfree'd.
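
   The from/to bookkeeping can be illustrated with the sketch below.
It is an assumption-laden toy, not the real `compact_string_chars':
the types `toy_string', `toy_chars' and `toy_chars_block', the 64-byte
block size, and the helper `toy_alloc_chars' are invented for the
example, and the case of dead string data flagged by an invalid back
pointer is not modelled (only marked and unmarked strings are).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_BLOCK_BYTES 64          /* hypothetical; tiny on purpose */

typedef struct toy_string {
  int marked;
  size_t size;                      /* bytes of string data */
  unsigned char *data;              /* points into a chars block */
} toy_string;

typedef struct toy_chars {          /* layout of one entry in a block */
  toy_string *owner;                /* back pointer to the string object */
  unsigned char chars[1];           /* really `owner->size' bytes */
} toy_chars;

typedef struct toy_chars_block {
  struct toy_chars_block *next;
  size_t pos;                       /* bytes in use */
  union { void *align; unsigned char bytes[TOY_BLOCK_BYTES]; } u;
} toy_chars_block;

/* Entry size: back pointer plus data, padded to pointer alignment. */
#define TOY_ALIGN (sizeof (toy_string *))
#define TOY_FULLSIZE(n) \
  ((offsetof (toy_chars, chars) + (n) + TOY_ALIGN - 1) / TOY_ALIGN * TOY_ALIGN)

toy_chars_block *toy_first_block, *toy_current_block;

/* Append data space for S at the end of the last block. */
void toy_alloc_chars (toy_string *s, size_t size) {
  size_t full = TOY_FULLSIZE (size);
  toy_chars *c;

  assert (full <= TOY_BLOCK_BYTES); /* no "big strings" in this sketch */
  if (toy_current_block == NULL
      || toy_current_block->pos + full > TOY_BLOCK_BYTES) {
    toy_chars_block *b = calloc (1, sizeof *b);
    if (b == NULL) abort ();
    if (toy_current_block != NULL) toy_current_block->next = b;
    else toy_first_block = b;
    toy_current_block = b;
  }
  c = (toy_chars *) (toy_current_block->u.bytes + toy_current_block->pos);
  c->owner = s;
  s->size = size;
  s->data = c->chars;
  toy_current_block->pos += full;
}

/* Slide the data of marked strings towards the first block. */
void toy_compact (void) {
  toy_chars_block *from_sb, *to_sb = toy_first_block;
  size_t to_pos = 0;

  for (from_sb = toy_first_block; from_sb != NULL; from_sb = from_sb->next) {
    size_t from_pos = 0, used = from_sb->pos;
    while (from_pos < used) {
      toy_chars *from = (toy_chars *) (from_sb->u.bytes + from_pos);
      toy_chars *to;
      toy_string *s = from->owner;
      size_t full = TOY_FULLSIZE (s->size);

      from_pos += full;
      if (!s->marked) continue;     /* garbage: nothing to copy */
      if (to_pos + full > TOY_BLOCK_BYTES) {
        to_sb->pos = to_pos;        /* target block is full: move on */
        to_sb = to_sb->next;
        to_pos = 0;
      }
      to = (toy_chars *) (to_sb->u.bytes + to_pos);
      if (to != from) memmove (to, from, full);
      s->data = to->chars;          /* relocate the string object */
      to_pos += full;
    }
  }
  if (to_sb != NULL) {
    toy_chars_block *victim = to_sb->next;
    to_sb->pos = to_pos;
    to_sb->next = NULL;
    toy_current_block = to_sb;      /* later blocks hold only garbage */
    while (victim != NULL) {
      toy_chars_block *next = victim->next;
      free (victim);
      victim = next;
    }
  }
}
```

   Because the to-position never overtakes the from-position, a copy
within one block can only overlap data that has already been handled,
which is why `memmove' rather than `memcpy' is the safe choice.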


File: internals.info,  Node: sweep_strings,  Next: sweep_bit_vectors_1,  Prev: compact_string_chars,  Up: Garbage Collection - Step by Step

11.4.7 `sweep_strings'
----------------------

The sweeping of the fixed-size string objects is essentially the same
as for all other fixed-size types.  As before, the freeing into the
suitable free list is done by using the macro `SWEEP_FIXED_TYPE_BLOCK'
after defining the right macros `UNMARK_string' and
`ADDITIONAL_FREE_string'.  These two definitions are a little bit
special compared to the ones used for the other fixed-size types.

   `UNMARK_string' is defined the same way, except for some additional
code used for updating the bookkeeping information.

   For strings, `ADDITIONAL_FREE_string' has to do something in
addition: if the string was not allocated in a `string_chars_block'
because it exceeded the maximal length, and was therefore `malloc'ed
separately, we now also `xfree' it explicitly.


File: internals.info,  Node: sweep_bit_vectors_1,  Prev: sweep_strings,  Up: Garbage Collection - Step by Step

11.4.8 `sweep_bit_vectors_1'
----------------------------

Bit vectors are also one of the rare types that are `malloc'ed
individually.  Consequently, while sweeping, all bit vectors that are
no longer needed must be freed by hand.  This is done the way one might
expect: since they are all registered in a list called
`all_bit_vectors', all elements of that list are traversed, all
unmarked bit vectors are unlinked and freed by calling `xfree', and all
remaining ones become unmarked.  In addition, the bookkeeping
information used for the garbage collector's output purposes is updated.


File: internals.info,  Node: Integers and Characters,  Next: Allocation from Frob Blocks,  Prev: Garbage Collection - Step by Step,  Up: Allocation of Objects in XEmacs Lisp

11.5 Integers and Characters
============================

Integer and character Lisp objects are created from integers using the
macros `XSETINT()' and `XSETCHAR()' or the equivalent functions
`make_int()' and `make_char()'. (These are actually macros on most
systems.)  These functions basically just do some moving of bits
around, since the integral value of the object is stored directly in
the `Lisp_Object'.

   `XSETINT()' and the like will truncate values given to them that are
too big; i.e. you won't get the value you expected but the tag bits
will at least be correct.
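
   The "moving of bits around" and the truncation behaviour can be
illustrated with a small sketch.  The layout used here (a 32-bit word
with a 2-bit tag in the low bits and the tag value 1 for integers) is
purely an assumption for the example and is not claimed to be the
layout XEmacs uses; the names `toy_make_int', `toy_intp' and `toy_xint'
are likewise hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t toy_object;        /* hypothetical 32-bit Lisp word */

#define TOY_TAG_BITS 2
#define TOY_TAG_MASK ((1u << TOY_TAG_BITS) - 1)
#define TOY_TAG_INT  1u             /* hypothetical tag value */

/* Pack VALUE above the tag bits; high bits that do not fit are lost. */
toy_object toy_make_int (int32_t value) {
  return ((uint32_t) value << TOY_TAG_BITS) | TOY_TAG_INT;
}

int toy_intp (toy_object obj) {
  return (obj & TOY_TAG_MASK) == TOY_TAG_INT;
}

/* Unpack the 30 value bits and sign-extend them portably. */
int32_t toy_xint (toy_object obj) {
  uint32_t bits = obj >> TOY_TAG_BITS;
  uint32_t sign = 1u << (32 - TOY_TAG_BITS - 1);
  assert (toy_intp (obj));
  return (int32_t) (bits ^ sign) - (int32_t) sign;
}
```

   With this layout, values from -2^29 to 2^29 - 1 round-trip
unchanged.  A larger value such as 2^29 comes back as a different
number, but the result still satisfies `toy_intp', which mirrors the
remark above that the tag bits stay correct even when the value is
truncated.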


File: internals.info,  Node: Allocation from Frob Blocks,  Next: lrecords,  Prev: Integers and Characters,  Up: Allocation of Objects in XEmacs Lisp

11.6 Allocation from Frob Blocks
================================

The uninitialized memory required by a `Lisp_Object' of a particular
type is allocated using `ALLOCATE_FIXED_TYPE()'.  This only occurs
inside of the lowest-level object-creating functions in `alloc.c':
`Fcons()', `make_float()', `Fmake_byte_code()', `Fmake_symbol()',
`allocate_extent()', `allocate_event()', `Fmake_marker()', and
`make_uninit_string()'.  The idea is that, for each type, there are a
number of frob blocks (each 2K in size); each frob block is divided up
into object-sized chunks.  Each frob block will have some of these
chunks that are currently assigned to objects, and perhaps some that are
free. (If a frob block has nothing but free chunks, it is freed at the
end of the garbage collection cycle.)  The free chunks are stored in a
free list, which is chained by storing a pointer in the first four bytes
of the chunk. (Except for the free chunks at the end of the last frob
block, which are handled using an index which points past the end of the
last-allocated chunk in the last frob block.)  `ALLOCATE_FIXED_TYPE()'
first tries to retrieve a chunk from the free list; if that fails, it
calls `ALLOCATE_FIXED_TYPE_FROM_BLOCK()', which looks at the end of the
last frob block for space, and creates a new frob block if there is
none. (There are actually two versions of these macros, one of which is
more defensive but less efficient and is used for error-checking.)
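
   The allocation path described above can be sketched as follows.
This is an illustration under stated assumptions rather than the real
macros: `toy_allocate', `toy_allocate_from_block' and `toy_free' are
hypothetical names, a block holds only 4 chunks, and the chunk payload
is a placeholder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_CHUNKS_PER_BLOCK 4      /* hypothetical; real blocks are larger */

typedef union toy_chunk {
  union toy_chunk *next_free;       /* valid only while the chunk is free */
  double payload[2];                /* stand-in for the real object */
} toy_chunk;

typedef struct toy_frob_block {
  struct toy_frob_block *prev;
  toy_chunk block[TOY_CHUNKS_PER_BLOCK];
} toy_frob_block;

toy_frob_block *toy_current_block;
int toy_current_block_index = TOY_CHUNKS_PER_BLOCK;  /* forces first block */
toy_chunk *toy_free_list;
int toy_blocks_created;

/* Take the next never-used chunk, creating a new block if needed. */
toy_chunk *toy_allocate_from_block (void) {
  if (toy_current_block_index == TOY_CHUNKS_PER_BLOCK) {
    toy_frob_block *b = malloc (sizeof *b);
    if (b == NULL) abort ();
    b->prev = toy_current_block;
    toy_current_block = b;
    toy_current_block_index = 0;
    toy_blocks_created++;
  }
  return &toy_current_block->block[toy_current_block_index++];
}

/* Prefer the free list; fall back to the end of the last block. */
toy_chunk *toy_allocate (void) {
  if (toy_free_list != NULL) {
    toy_chunk *c = toy_free_list;
    toy_free_list = c->next_free;
    return c;
  }
  return toy_allocate_from_block ();
}

/* Chain a chunk onto the free list by reusing its first bytes. */
void toy_free (toy_chunk *c) {
  assert (c != NULL);
  c->next_free = toy_free_list;
  toy_free_list = c;
}
```

   Storing the free-list link inside the free chunk itself is what
lets the free list cost no extra memory.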


File: internals.info,  Node: lrecords,  Next: Low-level allocation,  Prev: Allocation from Frob Blocks,  Up: Allocation of Objects in XEmacs Lisp

11.7 lrecords
=============

[see `lrecord.h']

   All lrecords have at the beginning of their structure a `struct
lrecord_header'.  This just contains a type number and some flags,
including the mark bit.  All builtin type numbers are defined as
constants in `enum lrecord_type', to allow the compiler to generate
more efficient code for `TYPEP'.  The type number, through the
`lrecord_implementations_table', gives access to a `struct
lrecord_implementation', which is a structure containing method pointers
and such.  There is one of these for each type, and it is a global,
constant, statically-declared structure that is declared in the
`DEFINE_LRECORD_IMPLEMENTATION()' macro.

   Simple lrecords (of type (b) above) just have a `struct
lrecord_header' at their beginning.  lcrecords, however, actually have a
`struct lcrecord_header'.  This, in turn, has a `struct lrecord_header'
at its beginning, so sanity is preserved; but it also has a pointer
used to chain all lcrecords together, and a special ID field used to
distinguish one lcrecord from another. (This field is used only for
debugging and could be removed, but the space gain is not significant.)

   Simple lrecords are created using `ALLOCATE_FIXED_TYPE()', just like
for other frob blocks.  The only change is that the implementation
pointer must be initialized correctly. (The implementation structure for
an lrecord, or rather the pointer to it, is named `lrecord_float',
`lrecord_extent', `lrecord_buffer', etc.)

   lcrecords are created using `alloc_lcrecord()'.  This takes a size
to allocate and an implementation pointer. (The size needs to be passed
because some lcrecords, such as window configurations, are of variable
size.) This basically just `malloc()'s the storage, initializes the
`struct lcrecord_header', and chains the lcrecord onto the head of the
list of all lcrecords, which is stored in the variable `all_lcrecords'.
The calls to `alloc_lcrecord()' generally occur in the lowest-level
allocation function for each lrecord type.

   Whenever you create an lrecord, you need to call either
`DEFINE_LRECORD_IMPLEMENTATION()' or
`DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()'.  This needs to be specified
in a `.c' file, at the top level.  What this actually does is define
and initialize the implementation structure for the lrecord. (And
possibly declares a function `error_check_foo()' that implements the
`XFOO()' macro when error-checking is enabled.)  The arguments to the
macros are the actual type name (this is used to construct the C
variable name of the lrecord implementation structure and related
structures using the `##' macro concatenation operator), a string that
names the type on the Lisp level (this may not be the same as the C
type name; typically, the C type name has underscores, while the Lisp
string has dashes), various method pointers, and the name of the C
structure that contains the object.  The methods are used to
encapsulate type-specific information about the object, such as how to
print it or mark it for garbage collection, so that it's easy to add
new object types without having to add a specific case for each new
type in a bunch of different places.

   The difference between `DEFINE_LRECORD_IMPLEMENTATION()' and
`DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()' is that the former is used
for fixed-size object types and the latter is for variable-size object
types.  Most object types are fixed-size; some complex types, however
(e.g. window configurations), are variable-size.  Variable-size object
types have an extra method, which is called to determine the actual
size of a particular object of that type.  (Currently this is only used
for keeping allocation statistics.)

   For the purpose of keeping allocation statistics, the allocation
engine keeps a list of all the different types that exist.  Note that,
since `DEFINE_LRECORD_IMPLEMENTATION()' is a macro that is specified at
top-level, there is no way for it to initialize the global data
structures containing type information, like
`lrecord_implementations_table'.  For this reason a call to
`INIT_LRECORD_IMPLEMENTATION' must be added to the same source file
containing `DEFINE_LRECORD_IMPLEMENTATION', not at the top level but in
one of the init functions, typically `syms_of_FOO()'.
`INIT_LRECORD_IMPLEMENTATION' must be called before an object of this
type is used.

   The type number is also used to index into an array holding the
number of objects of each type and the total memory allocated for
objects of that type.  The statistics in this array are computed during
the sweep stage.  These statistics are returned by the call to
`garbage-collect'.

   Note that for every type defined with a `DEFINE_LRECORD_*()' macro,
there needs to be a `DECLARE_LRECORD_IMPLEMENTATION()' somewhere in a
`.h' file, and this `.h' file needs to be included by `inline.c'.

   Furthermore, there should generally be a set of `XFOOBAR()',
`FOOBARP()', etc. macros in a `.h' (or occasionally `.c') file.  To
create one of these, copy an existing model and modify as necessary.

   *Please note:* If you define an lrecord in an external
dynamically-loaded module, you must use `DECLARE_EXTERNAL_LRECORD',
`DEFINE_EXTERNAL_LRECORD_IMPLEMENTATION', and
`DEFINE_EXTERNAL_LRECORD_SEQUENCE_IMPLEMENTATION' instead of the
non-EXTERNAL forms. These macros will dynamically add new type numbers
to the global enum that records them, whereas the non-EXTERNAL forms
assume that the programmer has already inserted the correct type numbers
into the enum's code at compile-time.

   The various methods in the lrecord implementation structure are:

  1. A "mark" method.  This is called during the marking stage and
     passed a function pointer (usually the `mark_object()' function),
     which is used to mark an object.  All Lisp objects that are
     contained within the object need to be marked by applying this
     function to them.  The mark method should also return a Lisp
     object, which should be either `nil' or an object to mark. (This
     can be used in lieu of calling `mark_object()' on the object, to
     reduce the recursion depth, and consequently should be the most
     heavily nested sub-object, such as a long list.)

     *Please note:* When the mark method is called, garbage collection
     is in progress, and special precautions need to be taken when
     accessing objects; see section (B) above.

     If your mark method does not need to do anything, it can be `NULL'.

  2. A "print" method.  This is called to create a printed
     representation of the object, whenever `princ', `prin1', or the
     like is called.  It is passed the object, a stream to which the
     output is to be directed, and an `escapeflag' which indicates
     whether the object's printed representation should be "escaped" so
     that it is readable. (This corresponds to the difference between
     `princ' and `prin1'.) Basically, "escaped" means that strings will
     have quotes around them and confusing characters in the strings
     such as quotes, backslashes, and newlines will be backslashed; and
     that special care will be taken to make symbols print in a
     readable fashion (e.g. symbols that look like numbers will be
     backslashed).  Other readable objects should perhaps pass
     `escapeflag' on when sub-objects are printed, so that readability
     is preserved when necessary (or if not, always pass in a 1 for
     `escapeflag').  Non-readable objects should in general ignore
     `escapeflag', except that some use it as an indication that more
     verbose output should be given.

     Sub-objects are printed using `print_internal()', which takes
     exactly the same arguments as are passed to the print method.

     Literal C strings should be printed using `write_c_string()', or
     `write_string_1()' for non-null-terminated strings.

     Print methods for objects that do not have a readable
     representation should check the `print_readably' flag and signal
     an error if it is set.

     If you specify NULL for the print method, the
     `default_object_printer()' will be used.

  3. A "finalize" method.  This is called at the beginning of the sweep
     stage on lcrecords that are about to be freed, and should be used
     to perform any extra object cleanup.  This typically involves
     freeing any extra `malloc()'ed memory associated with the object,
     releasing any operating-system and window-system resources
     associated with the object (e.g. pixmaps, fonts), etc.

     The finalize method can be NULL if nothing needs to be done.

     WARNING #1: The finalize method is also called at the end of the
     dump phase; this time with the for_disksave parameter set to
     non-zero.  The object is _not_ about to disappear, so you have to
     make sure to _not_ free any extra `malloc()'ed memory if you're
     going to need it later.  (Also, signal an error if there are any
     operating-system and window-system resources here, because they
     can't be dumped.)

     Finalize methods should, as a rule, set to zero any pointers after
     they've been freed, and check to make sure pointers are not zero
     before freeing.  Although I'm pretty sure that finalize methods
     are not called twice on the same object (except for the
     `for_disksave' proviso), we've gotten nastily burned in some cases
     by not doing this.

     WARNING #2: The finalize method is _only_ called for lcrecords,
     _not_ for simple lrecords.  If you need a finalize method for
     simple lrecords, you have to stick it in the
     `ADDITIONAL_FREE_foo()' macro in `alloc.c'.

     WARNING #3: Things are in an _extremely_ bizarre state when
     `ADDITIONAL_FREE_foo()' is called, so you have to be incredibly
     careful when writing one of these functions.  See the comment in
     `gc_sweep()'.  If you ever have to add one of these, consider
     using an lcrecord or dealing with the problem in a different
     fashion.

  4. An "equal" method.  This compares the two objects for similarity,
     when `equal' is called.  It should compare the contents of the
     objects in some reasonable fashion.  It is passed the two objects
     and a "depth" value, which is used to catch circular objects.  To
     compare sub-Lisp-objects, call `internal_equal()' and bump the
     depth value by one.  If this value gets too high, a
     `circular-object' error will be signaled.

     If this is NULL, objects are `equal' only when they are `eq', i.e.
     identical.

  5. A "hash" method.  This is used to hash objects when they are to be
     compared with `equal'.  The rule here is that if two objects are
     `equal', they _must_ hash to the same value; i.e. your hash
     function should use some subset of the sub-fields of the object
     that are compared in the "equal" method.  If you specify this
     method as `NULL', the object's pointer will be used as the hash,
     which will _fail_ if the object has an `equal' method, so don't do
     this.

     To hash a sub-Lisp-object, call `internal_hash()'.  Bump the depth
     by one, just like in the "equal" method.

     To convert a Lisp object directly into a hash value (using its
     pointer), use `LISP_HASH()'.  This is what happens when the hash
     method is NULL.

     To hash two or more values together into a single value, use
     `HASH2()', `HASH3()', `HASH4()', etc.

  6. "getprop", "putprop", "remprop", and "plist" methods.  These are
     used for object types that have properties.  I don't feel like
     documenting them here.  If you create one of these objects, you
     have to use different macros to define them, i.e.
     `DEFINE_LRECORD_IMPLEMENTATION_WITH_PROPS()' or
     `DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION_WITH_PROPS()'.

  7. A "size_in_bytes" method, when the object is of variable size
     (i.e. declared with a `_SEQUENCE_IMPLEMENTATION' macro).  This
     should simply return the object's size in bytes, exactly as you
     might expect.  For an example, see the methods for window
     configurations and opaques.
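
   To make the idea of a per-type method table concrete, here is a
minimal sketch of a type number indexing a table of method pointers,
with an "equal" and a "hash" method that obey the rule stated above.
Everything in it is hypothetical: the names `toy_implementation',
`toy_implementations_table', `toy_point' and `TOY_HASH2' are invented
for the example, and the real `struct lrecord_implementation' has more
methods and different signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct toy_header { unsigned int type; } toy_header;

typedef struct toy_implementation {
  const char *name;
  int (*equal) (const void *a, const void *b, int depth);  /* may be NULL */
  unsigned long (*hash) (const void *obj, int depth);      /* may be NULL */
} toy_implementation;

/* Hypothetical mixer, standing in for something like HASH2(). */
#define TOY_HASH2(a, b) ((unsigned long) (a) * 33ul + (unsigned long) (b))

typedef struct toy_point {
  toy_header header;                /* must come first */
  int x, y;                         /* compared by `equal' */
  const char *label;                /* ignored by `equal' */
} toy_point;

int toy_point_equal (const void *a, const void *b, int depth) {
  const toy_point *p = a, *q = b;
  (void) depth;                     /* no sub-objects to recurse into */
  return p->x == q->x && p->y == q->y;
}

/* Uses only fields that `equal' compares, so equal points hash alike. */
unsigned long toy_point_hash (const void *obj, int depth) {
  const toy_point *p = obj;
  (void) depth;
  return TOY_HASH2 (p->x, p->y);
}

enum { TOY_TYPE_POINT, TOY_TYPE_OPAQUE, TOY_TYPE_COUNT };

const toy_implementation toy_implementations_table[TOY_TYPE_COUNT] = {
  { "point", toy_point_equal, toy_point_hash },
  { "opaque", NULL, NULL },         /* falls back to identity */
};

int toy_equal (const void *a, const void *b, int depth) {
  const toy_header *ha = a, *hb = b;
  const toy_implementation *imp;
  if (a == b) return 1;             /* `eq' objects are always `equal' */
  if (ha->type != hb->type) return 0;
  assert (ha->type < TOY_TYPE_COUNT);
  imp = &toy_implementations_table[ha->type];
  return imp->equal != NULL && imp->equal (a, b, depth);
}

unsigned long toy_hash (const void *obj, int depth) {
  const toy_header *h = obj;
  const toy_implementation *imp;
  assert (h->type < TOY_TYPE_COUNT);
  imp = &toy_implementations_table[h->type];
  if (imp->hash == NULL)
    return (unsigned long) (uintptr_t) obj;   /* pointer as hash */
  return imp->hash (obj, depth);
}
```

   Two points with the same coordinates but different labels compare
equal and hash alike here, because the hash method looks only at fields
the equal method also compares.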


File: internals.info,  Node: Low-level allocation,  Next: Cons,  Prev: lrecords,  Up: Allocation of Objects in XEmacs Lisp

11.8 Low-level allocation
=========================

Memory that you want to allocate directly should be allocated using
`xmalloc()' rather than `malloc()'.  This implements error-checking on
the return value, and once upon a time did some more vital stuff (i.e.
`BLOCK_INPUT', which is no longer necessary).  Free using `xfree()',
and realloc using `xrealloc()'.  Note that `xmalloc()' will do a
non-local exit if the memory can't be allocated. (Many functions,
however, do not expect this, and thus XEmacs will likely crash if this
happens.  *This is a bug.*  If you can, you should strive to make your
function handle this OK.  However, it's difficult in the general
circumstance, perhaps requiring extra unwind-protects and such.)

   Note that XEmacs provides two separate replacements for the standard
`malloc()' library function.  These are called "old GNU malloc"
(`malloc.c') and "new GNU malloc" (`gmalloc.c'), respectively.  New GNU
malloc is better in pretty much every way than old GNU malloc, and
should be used if possible.  (It used to be that on some systems, the
old one worked but the new one didn't.  I think this was due
specifically to a bug in SunOS, which the new one now works around; so
I don't think the old one ever has to be used any more.) The primary
difference between both of these mallocs and the standard system malloc
is that they are much faster, at the expense of increased space.  The
basic idea is that memory is allocated in fixed chunks of powers of
two.  This allows for basically constant malloc time, since the various
chunks can just be kept on a number of free lists. (The standard system
malloc typically allocates arbitrary-sized chunks and has to spend some
time, sometimes a significant amount of time, walking the heap looking
for a free block to use and cleaning things up.)  The new GNU malloc
improves on things by allocating large objects in chunks of 4096 bytes
rather than in ever larger powers of two, which results in ever larger
wastage.  There is a slight speed loss here, but it's of doubtful
significance.
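
   The difference between pure power-of-two chunks and page-multiple
chunks for large objects can be seen in a small size-rounding sketch.
The function `toy_chunk_size', the minimum chunk of 8 bytes, and the
boundary between "small" and "large" requests (half a page) are
assumptions made for illustration; only the 4096-byte granularity comes
from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE 4096u              /* large-object granularity from the text */
#define TOY_MIN_CHUNK 8u            /* hypothetical smallest chunk */

/* Return the chunk size that would be reserved for REQUEST bytes. */
size_t toy_chunk_size (size_t request) {
  size_t size;
  assert (request > 0 && request <= SIZE_MAX - TOY_PAGE);
  if (request > TOY_PAGE / 2)       /* hypothetical small/large boundary */
    return (request + TOY_PAGE - 1) / TOY_PAGE * TOY_PAGE;
  for (size = TOY_MIN_CHUNK; size < request; size *= 2)
    continue;
  return size;
}
```

   For a 9000-byte request this sketch reserves 12288 bytes (three
pages), whereas rounding up to the next power of two would reserve
16384 bytes.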

   NOTE: Apparently there is a third-generation GNU malloc that is
significantly better than the new GNU malloc, and should probably be
included in XEmacs.

   There is also the relocating allocator, `ralloc.c'.  This actually
moves blocks of memory around so that the `sbrk()' pointer can be shrunk
and virtual memory released back to the system.  On some systems, this
is a big win.  On all systems, it causes a noticeable (and sometimes
huge) speed penalty, so I turn it off by default.  `ralloc.c' only works
with the new GNU malloc in `gmalloc.c'.  There are also two versions of
`ralloc.c', one of which uses `mmap()' rather than block copies to move
data around.  This purports to be faster, although that depends on the
amount of data that would have had to be block copied and the
system-call overhead for `mmap()'.  I don't know exactly how this
works, except that the relocating-allocation routines are pretty much
used only for the memory allocated for a buffer, which is the biggest
consumer of space, esp. of space that may get freed later.

   Note that the GNU mallocs have some "memory warning" facilities.
XEmacs taps into them and issues a warning through the standard warning
system, when memory gets to 75%, 85%, and 95% full.  (On some systems,
the memory warnings are not functional.)

   Allocated memory that is going to be used to make a Lisp object is
created using `allocate_lisp_storage()'.  This just calls `xmalloc()'.
It used to verify that the pointer to the memory can fit into a Lisp
word, before the current Lisp object representation was introduced.
`allocate_lisp_storage()' is called by `alloc_lcrecord()',
`ALLOCATE_FIXED_TYPE()', and the vector and bit-vector creation
routines.  These routines also call `INCREMENT_CONS_COUNTER()' at the
appropriate times; this keeps statistics on how much memory is
allocated, so that garbage-collection can be invoked when the threshold
is reached.


File: internals.info,  Node: Cons,  Next: Vector,  Prev: Low-level allocation,  Up: Allocation of Objects in XEmacs Lisp

11.9 Cons
=========

Conses are allocated in standard frob blocks.  The only thing to note
is that conses can be explicitly freed using `free_cons()' and
associated functions `free_list()' and `free_alist()'.  This
immediately puts the conses onto the cons free list, and decrements the
statistics on memory allocation appropriately.  This is used to good
effect by some extremely commonly-used code, to avoid generating extra
objects and thereby triggering GC sooner.  However, you have to be
_extremely_ careful when doing this.  If you mess this up, you will get
BADLY BURNED, and it has happened before.


File: internals.info,  Node: Vector,  Next: Bit Vector,  Prev: Cons,  Up: Allocation of Objects in XEmacs Lisp

11.10 Vector
============

As mentioned above, each vector is `malloc()'ed individually, and all
are threaded through the variable `all_vectors'.  Vectors are marked
strangely during garbage collection, by kludging the size field.  Note
that the `struct Lisp_Vector' is declared with its `contents' field
being a _stretchy_ array of one element.  It is actually `malloc()'ed
with the right size, however, and access to any element through the
`contents' array works fine.
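
   The "stretchy array" idiom looks roughly like this.  The struct
`toy_vector' and the function `toy_make_vector' are hypothetical
stand-ins, with `long' elements in place of Lisp objects.  (In C99 and
later the same effect is usually written with a flexible array member,
`long contents[];'.)

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct toy_vector {
  size_t size;
  long contents[1];                 /* "stretchy": really `size' elements */
} toy_vector;

/* Allocate the header and LENGTH elements in a single malloc() call. */
toy_vector *toy_make_vector (size_t length, long init) {
  size_t header = offsetof (toy_vector, contents);
  size_t bytes;
  toy_vector *v;
  size_t i;

  assert (length <= (SIZE_MAX - header) / sizeof (long));  /* no overflow */
  bytes = header + length * sizeof (long);
  if (bytes < sizeof (toy_vector))
    bytes = sizeof (toy_vector);    /* never smaller than the struct */
  v = malloc (bytes);
  if (v == NULL) abort ();
  v->size = length;
  for (i = 0; i < length; i++)
    v->contents[i] = init;
  return v;
}
```

   The size computation starts from the offset of `contents' rather
than from `sizeof (toy_vector)', so the one declared element is not
counted twice.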


File: internals.info,  Node: Bit Vector,  Next: Symbol,  Prev: Vector,  Up: Allocation of Objects in XEmacs Lisp

11.11 Bit Vector
================

Bit vectors work exactly like vectors, except for more complicated code
to access an individual bit, and except for the fact that bit vectors
are lrecords while vectors are not. (The only difference here is that
there's an lrecord implementation pointer at the beginning and the tag
field in bit vector Lisp words is "lrecord" rather than "vector".)


File: internals.info,  Node: Symbol,  Next: Marker,  Prev: Bit Vector,  Up: Allocation of Objects in XEmacs Lisp

11.12 Symbol
============

Symbols are also allocated in frob blocks.  Symbols in the awful
horrible obarray structure are chained through their `next' field.

   Remember that `intern' looks up a symbol in an obarray, creating one
if necessary.


File: internals.info,  Node: Marker,  Next: String,  Prev: Symbol,  Up: Allocation of Objects in XEmacs Lisp

11.13 Marker
============

Markers are allocated in frob blocks, as usual.  They are kept in a
buffer unordered, but in a doubly-linked list so that they can easily
be removed. (Formerly this was a singly-linked list, but in some cases
garbage collection took an extraordinarily long time due to the O(N^2)
time required to remove lots of markers from a buffer.) Markers are
removed from a buffer in the finalize stage, in
`ADDITIONAL_FREE_marker()'.
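
   The benefit of the doubly-linked list is that a marker can be
unchained without searching for its predecessor.  A minimal sketch,
with hypothetical names (`toy_marker', `toy_buffer',
`toy_attach_marker', `toy_detach_marker') and no claim to match the
real marker structure:

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_marker {
  struct toy_marker *next, *prev;
  long position;
} toy_marker;

typedef struct toy_buffer { toy_marker *markers; } toy_buffer;

/* Chain M onto the head of BUF's (unordered) marker list. */
void toy_attach_marker (toy_buffer *buf, toy_marker *m) {
  m->prev = NULL;
  m->next = buf->markers;
  if (m->next != NULL) m->next->prev = m;
  buf->markers = m;
}

/* Unchain M in constant time; no walk over the list is needed. */
void toy_detach_marker (toy_buffer *buf, toy_marker *m) {
  assert (m->prev != NULL || buf->markers == m);
  if (m->next != NULL) m->next->prev = m->prev;
  if (m->prev != NULL) m->prev->next = m->next;
  else buf->markers = m->next;
  m->next = m->prev = NULL;
}
```

   With a singly-linked list, each removal would first have to find the
preceding marker, so removing N markers from a list of N costs on the
order of N^2 steps; here each removal is a constant amount of work.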


File: internals.info,  Node: String,  Next: Compiled Function,  Prev: Marker,  Up: Allocation of Objects in XEmacs Lisp

11.14 String
============

As mentioned above, strings are a special case.  A string is logically
two parts, a fixed-size object (containing the length, property list,
and a pointer to the actual data), and the actual data in the string.
The fixed-size object is a `struct Lisp_String' and is allocated in
frob blocks, as usual.  The actual data is stored in special
"string-chars blocks", which are 8K blocks of memory.
Currently-allocated strings are simply laid end to end in these
string-chars blocks, with a pointer back to the `struct Lisp_String'
stored before each string in the string-chars block.  When a new string
needs to be allocated, the remaining space at the end of the last
string-chars block is used if there's enough, and a new string-chars
block is created otherwise.

   There are never any holes in the string-chars blocks due to the
string compaction and relocation that happens at the end of garbage
collection.  During the sweep stage of garbage collection, when objects
are reclaimed, the garbage collector goes through all string-chars
blocks, looking for unused strings.  Each chunk of string data is
preceded by a pointer to the corresponding `struct Lisp_String', which
indicates both whether the string is used and how big the string is,
i.e. how to get to the next chunk of string data.  Holes are compressed
by block-copying the next string into the empty space and relocating the
pointer stored in the corresponding `struct Lisp_String'.  *This means
you have to be careful with strings in your code.* See the section
above on `GCPRO'ing.

   Note that there is one situation not handled: a string that is too
big to fit into a string-chars block.  Such strings, called "big
strings", are all `malloc()'ed as their own block. (#### Although it
would make more sense for the threshold for big strings to be somewhat
lower, e.g. 1/2 or 1/4 the size of a string-chars block.  It seems that
this was indeed the case formerly--indeed, the threshold was set at
1/8--but Mly forgot about this when rewriting things for 19.8.)

   Note also that the string data in string-chars blocks is padded as
necessary so that proper alignment constraints on the `struct
Lisp_String' back pointers are maintained.

   Finally, strings can be resized.  This happens in Mule when a
character is substituted with a different-length character, or during
modeline frobbing. (You could also export this to Lisp, but it's not
done so currently.) Resizing a string is a potentially tricky process.
If the change is small enough that the padding can absorb it, nothing
other than a simple memory move needs to be done.  Keep in mind,
however, that the string can't shrink too much because the offset to the
next string in the string-chars block is computed by looking at the
length and rounding to the nearest multiple of four or eight.  If the
string would shrink or expand beyond the correct padding, new string
data needs to be allocated at the end of the last string-chars block and
the data moved appropriately.  This leaves some dead string data, which
is marked by putting a special marker of 0xFFFFFFFF in the `struct
Lisp_String' pointer before the data (there's no real `struct
Lisp_String' to point to and relocate), and storing the size of the dead
string data (which would normally be obtained from the now-non-existent
`struct Lisp_String') at the beginning of the dead string data gap.
The string compactor recognizes this special 0xFFFFFFFF marker and
handles it correctly.


File: internals.info,  Node: Compiled Function,  Prev: String,  Up: Allocation of Objects in XEmacs Lisp

11.15 Compiled Function
=======================

Not yet documented.


File: internals.info,  Node: Dumping,  Next: Events and the Event Loop,  Prev: Allocation of Objects in XEmacs Lisp,  Up: Top

12 Dumping
**********

12.1 What is dumping and its justification
==========================================

The C code of XEmacs is just a Lisp engine with a lot of built-in
primitives useful for writing an editor.  The editor itself is written
mostly in Lisp, and represents around 100K lines of code.  Loading and
executing the initialization of all this code takes a bit of time (five
to ten times the usual startup time of current xemacs) and requires
having all the lisp source files around.  Having to reload them each
time the editor is started would not be acceptable.

   The traditional solution to this problem is called dumping: the build
process first creates the lisp engine under the name `temacs', then
runs it until it has finished loading and initializing all the lisp
code, and eventually creates a new executable called `xemacs' including
both the object code in `temacs' and all the contents of the memory
after the initialization.

   This solution, while working, has a huge problem: the creation of the
new executable from the actual contents of memory is an extremely
system-specific and quite error-prone process, and it interferes with a
lot of system libraries (like malloc).  It is even getting worse
nowadays with libraries using constructors which are automatically
called when the program is started (even before main()) which tend to
crash when they are called multiple times, once before dumping and once
after (IRIX 6.x libz.so pulls in some C++ image libraries through
dependencies which have this problem).  Writing the dumper is also one
of the most difficult parts of porting XEmacs to a new operating system.
Basically, `dumping' is an operation that is just not officially
supported on many operating systems.

   The aim of the portable dumper is to solve the same problem as the
system-specific dumper, that is to be able to reload quickly, using only
a small number of files, the fully initialized lisp part of the editor,
without any system-specific hacks.

* Menu:

* Overview::
* Data descriptions::
* Dumping phase::
* Reloading phase::
* Remaining issues::


File: internals.info,  Node: Overview,  Next: Data descriptions,  Up: Dumping

12.2 Overview
=============

The portable dumping system has to:

  1. At dump time, write all initialized, non-quickly-rebuildable data
     to a file [Note: currently named `xemacs.dmp', but the name will
     change], along with all the information needed for the reloading.

  2. When starting xemacs, reload the dump file, relocate it to its new
     starting address if needed, and reinitialize all pointers to this
     data.  Also, rebuild all the quickly rebuildable data.


File: internals.info,  Node: Data descriptions,  Next: Dumping phase,  Prev: Overview,  Up: Dumping

12.3 Data descriptions
======================

The most complex task of the dumper is to be able to write lisp objects
(lrecords) and C structs to disk and reload them at a different address,
updating all the pointers they include in the process.  This is done by
using external data descriptions that give information about the layout
of the structures in memory.

   The specification of these descriptions is in lrecord.h.  A
description of an lrecord is an array of struct lrecord_description.
Each of these structs includes a type, an offset in the structure and
some optional parameters depending on the type.  For instance, here is
the string description:

     static const struct lrecord_description string_description[] = {
       { XD_BYTECOUNT,         offsetof (Lisp_String, size) },
       { XD_OPAQUE_DATA_PTR,   offsetof (Lisp_String, data), XD_INDIRECT(0, 1) },
       { XD_LISP_OBJECT,       offsetof (Lisp_String, plist) },
       { XD_END }
     };

   The first line indicates a member of type Bytecount, which is used by
the next, indirect directive.  The second means "there is a pointer to
some opaque data in the field `data'".  The length of said data is
given by the expression `XD_INDIRECT(0, 1)', which means "the value in
the 0th line of the description (welcome to C) plus one".  The third
line means "there is a Lisp_Object member `plist' in the Lisp_String
structure".  `XD_END' then ends the description.

   This gives us all the information we need to move around what is
pointed to by a structure (C or lrecord) and, by transitivity,
everything that it points to.  The only missing information for dumping
is the size of the structure.  For lrecords, this is part of the
lrecord_implementation, so we don't need to duplicate it.  For C
structures we use a struct struct_description, which includes a size
field and a pointer to an associated array of lrecord_description.
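
   As an illustration of how such a description can be consumed, here
is a minimal sketch of a walker that computes how many bytes of opaque
data hang off a structure.  The `desc_line' and `toy_string' types and
the `D_*' constants are hypothetical simplifications invented for this
example; they are not the real `lrecord_description' or `XD_*'
definitions from lrecord.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the real XD_* machinery. */
enum desc_type { D_COUNT, D_OPAQUE_PTR, D_OBJECT, D_END };

struct desc_line {
  enum desc_type type;
  size_t offset;      /* offset of the member in the structure */
  int count_line;     /* D_OPAQUE_PTR: description line holding the length */
  int count_delta;    /* D_OPAQUE_PTR: constant added to that length */
};

struct toy_string {
  long size;
  const char *data;
  void *plist;
};

static const struct desc_line toy_string_description[] = {
  { D_COUNT,      offsetof (struct toy_string, size),  0, 0 },
  { D_OPAQUE_PTR, offsetof (struct toy_string, data),  0, 1 },
  { D_OBJECT,     offsetof (struct toy_string, plist), 0, 0 },
  { D_END,        0,                                   0, 0 }
};

/* Return the number of opaque bytes reachable from OBJ according to DESC. */
static long
opaque_bytes (const void *obj, const struct desc_line *desc)
{
  long total = 0;
  int i;

  for (i = 0; desc[i].type != D_END; i++)
    if (desc[i].type == D_OPAQUE_PTR)
      {
        /* The length lives in the member named by another line. */
        const struct desc_line *cnt = &desc[desc[i].count_line];
        long n = *(const long *) ((const char *) obj + cnt->offset);
        total += n + desc[i].count_delta;
      }
  return total;
}
```

   For a `toy_string' whose `size' is 5, the walker reports 6 bytes:
the value found through line 0 of the description, plus one.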


File: internals.info,  Node: Dumping phase,  Next: Reloading phase,  Prev: Data descriptions,  Up: Dumping

12.4 Dumping phase
==================

Dumping is done by calling the function pdump() (in dumper.c) which is
invoked from Fdump_emacs (in emacs.c).  This function performs a number
of tasks.

* Menu:

* Object inventory::
* Address allocation::
* The header::
* Data dumping::
* Pointers dumping::


File: internals.info,  Node: Object inventory,  Next: Address allocation,  Up: Dumping phase

12.4.1 Object inventory
-----------------------

The first task is to build the list of the objects to dump.  This
includes:

   * lisp objects

   * C structures

   We end up with one `pdump_entry_list_elmt' per object group (arrays
of C structs are kept together) which includes a pointer to the first
object of the group, the per-object size and the count of objects in the
group, along with some other information which is initialized later.

   These entries are linked together in `pdump_entry_list' structures
and can be enumerated through one of:

  1. the `pdump_object_table', an array of `pdump_entry_list', one per
     lrecord type, indexed by type number.

  2. the `pdump_opaque_data_list', used for the opaque data which does
     not include pointers, and hence does not need descriptions.

  3. the `pdump_struct_table', which is a vector of
     `struct_description'/`pdump_entry_list' pairs, used for non-opaque
     C structures.

   This uses a marking strategy similar to the garbage collector's, with
some differences:

  1. We do not use the mark bit (which does not exist for C structures
     anyway); we use a big hash table instead.

  2. We do not use the mark function of lrecords but instead rely on the
     external descriptions.  This happens essentially because we need to
     follow pointers to C structures and opaque data in addition to
     Lisp_Object members.

   This is done by `pdump_register_object()', which handles Lisp_Object
variables, and `pdump_register_struct()' which handles C structures,
which both delegate the description management to
`pdump_register_sub()'.

   The hash table doubles as a map from objects to pdump_entry_list_elmt
(i.e. it allows us to look up the pdump_entry_list_elmt that points to a
given object).  Entries are added with `pdump_add_entry()' and looked up with
`pdump_get_entry()'.  There is no need for entry removal.  The hash
value is computed quite simply from the object pointer by
`pdump_make_hash()'.
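
   A pointer-keyed table of this kind, with insertion and lookup but no
removal, can be sketched as follows.  This is only an illustration under
assumed details: the `toy_' names, the table size, the shift used in the
hash and the linear probing are all choices made for the example, not a
description of what dumper.c actually does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_HASH_SIZE 1024   /* assumed size; must exceed the entry count */

struct toy_entry { const void *obj; size_t size; };

static struct toy_entry *toy_hash[TOY_HASH_SIZE];

/* Hash an object pointer.  The low bits are dropped because they are
   mostly zero for aligned objects. */
static unsigned
toy_make_hash (const void *obj)
{
  return (unsigned) (((uintptr_t) obj >> 3) % TOY_HASH_SIZE);
}

/* Return the entry describing OBJ, or NULL if OBJ is not yet registered. */
static struct toy_entry *
toy_get_entry (const void *obj)
{
  unsigned pos = toy_make_hash (obj);

  while (toy_hash[pos])
    {
      if (toy_hash[pos]->obj == obj)
        return toy_hash[pos];
      pos = (pos + 1) % TOY_HASH_SIZE;   /* linear probing */
    }
  return NULL;
}

/* Register E.  No removal is ever needed, so probing stays simple. */
static void
toy_add_entry (struct toy_entry *e)
{
  unsigned pos = toy_make_hash (e->obj);

  while (toy_hash[pos])
    pos = (pos + 1) % TOY_HASH_SIZE;
  toy_hash[pos] = e;
}
```

   Because a successful lookup means "already seen", the same table
plays the role of the mark bit during the inventory.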

   The roots for the marking are:

  1. the `staticpro''ed variables (there is a special
     `staticpro_nodump()' call for protected variables we do not want
     to dump).

  2. the variables registered via `dump_add_root_object' (`staticpro()'
     is equivalent to `staticpro_nodump()' + `dump_add_root_object()').

  3. the variables registered via `dump_add_root_struct_ptr', each of
     which points to a C structure.

   This does not include the GCPRO'ed variables, the specbinds, the
catchtags, the backlist, the redisplay or the profiling info, since we
do not want to rebuild the actual chain of lisp calls which leads up to
the dump-emacs call, only the global variables.

   Weak lists and weak hash tables are dumped as if they were their
non-weak equivalent (without changing their type, of course).  This has
not yet been a problem.


File: internals.info,  Node: Address allocation,  Next: The header,  Prev: Object inventory,  Up: Dumping phase

12.4.2 Address allocation
-------------------------

The next step is to allocate the offsets of each of the objects in the
final dump file.  This is done by `pdump_allocate_offset()' which is
called indirectly by `pdump_scan_by_alignment()'.

   The strategy to deal with alignment problems uses these facts:

  1. real world alignment requirements are powers of two.

  2. the C compiler is required to adjust the size of a struct so that
     you can have an array of them next to each other.  This means you
     can obtain an upper bound on the alignment requirements of a given
     structure by finding the largest power of two that divides its size.

  3. the non-variant part of variable size lrecords has an alignment
     requirement of 4.

   Hence, for each lrecord type, C struct type or opaque data block the
alignment requirement is computed as a power of two, with a minimum of
2^2 for lrecords.  `pdump_scan_by_alignment()' then scans all the
`pdump_entry_list_elmt''s, the ones with the highest requirements
first.  This ensures the best packing.

   The maximum alignment requirement we take into account is 2^8.

   `pdump_allocate_offset()' only has to do a linear allocation,
starting at offset 256 (this leaves room for the header and keeps the
alignments happy).
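
   The two computations described above, the alignment bound derived
from the size and the linear allocation, can be sketched like this.
The `toy_' names are hypothetical and the code is a simplified
illustration, not the code of `pdump_allocate_offset()'.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ALIGN_LOG2 8     /* 2^8, the cap mentioned above */
#define TOY_BASE_OFFSET 256      /* room reserved for the header */

/* Largest power of two (as a log2) that divides SIZE, capped at 2^8. */
static int
toy_align_log2 (size_t size)
{
  int log2 = 0;

  while (log2 < TOY_MAX_ALIGN_LOG2 && (size & ((size_t) 1 << log2)) == 0)
    log2++;
  return log2;
}

static size_t toy_cur_offset = TOY_BASE_OFFSET;

/* Linear allocation: round up to the alignment, hand out, advance. */
static size_t
toy_allocate_offset (size_t size, int align_log2)
{
  size_t mask = ((size_t) 1 << align_log2) - 1;
  size_t offset = (toy_cur_offset + mask) & ~mask;

  toy_cur_offset = offset + size;
  return offset;
}
```

   A 24-byte structure gets an alignment of 2^3 and a 12-byte one gets
2^2.  When objects are allocated from the highest requirement down, the
rounding in `toy_allocate_offset()' never has to insert padding, which
is the point of scanning by alignment.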


File: internals.info,  Node: The header,  Next: Data dumping,  Prev: Address allocation,  Up: Dumping phase

12.4.3 The header
-----------------

The next step creates the file and writes a header with a signature and
some miscellaneous information in it.  The `reloc_address' field, which
indicates at which address the file should be loaded if we want to avoid
post-reload relocation, is set to 0.  It then seeks to offset 256 (base
offset for the objects).


File: internals.info,  Node: Data dumping,  Next: Pointers dumping,  Prev: The header,  Up: Dumping phase

12.4.4 Data dumping
-------------------

The data is dumped by `pdump_dump_data()', called from
`pdump_scan_by_alignment()', in the same order as the addresses were
allocated.  This function copies the data to a temporary buffer,
relocates all pointers in the object to the addresses allocated in the
Address allocation step, and writes it to the file.  Using the same
order means that, if we are careful with lrecords whose size is not a
multiple of 4, we are assured that the object is always written at the
offset in the file allocated in the Address allocation step.
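
   The copy-then-relocate idea can be illustrated with a toy linked
structure.  In this sketch the live object is never modified; only the
temporary copy has its pointer replaced by the file offset of the
target.  The `toy_node' type and the layout rule in `toy_file_offset()'
are assumptions made for the example, and it is also assumed that a
`uintptr_t' has the same size as a pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct toy_node { struct toy_node *next; int value; };

/* Assumed layout: element I of the array NODES lands at file offset
   256 + I * sizeof (struct toy_node). */
static uintptr_t
toy_file_offset (const struct toy_node *nodes, const struct toy_node *obj)
{
  return 256 + (uintptr_t) (obj - nodes) * sizeof (struct toy_node);
}

/* Copy NODE to a temporary, rewrite its pointer as a file offset, and
   return the copy, ready to be written to the dump file. */
static struct toy_node
toy_prepare_for_dump (const struct toy_node *nodes,
                      const struct toy_node *node)
{
  struct toy_node tmp;
  uintptr_t off = node->next ? toy_file_offset (nodes, node->next) : 0;

  memcpy (&tmp, node, sizeof tmp);
  memcpy (&tmp.next, &off, sizeof tmp.next);
  return tmp;
}
```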


File: internals.info,  Node: Pointers dumping,  Prev: Data dumping,  Up: Dumping phase

12.4.5 Pointers dumping
-----------------------

A bunch of tables needed to properly reassign the global pointers are
then written.  They are:

  1. the pdump_root_struct_ptrs dynarr

  2. the pdump_opaques dynarr

  3. a vector of all the offsets to the objects in the file that
     include a description (for faster relocation at reload time)

  4. the pdump_root_objects and pdump_weak_object_chains dynarrs.

   For each of the dynarrs we write both the pointer to the variables
and the relocated offset of the object they point to.  Since these
variables are global, the pointers are still valid when restarting the
program and are used to regenerate the global pointers.

   The `pdump_weak_object_chains' dynarr is a special case.  The
variables it points to are the heads of weak linked lists of lisp objects
of the same type.  Not all objects of this list are dumped so the
relocated pointer we associate with them points to the first dumped
object of the list, or Qnil if none is available.  This is also the
reason why they are not used as roots for the purpose of object
enumeration.

   Some very important information like the `staticpros' and
`lrecord_implementations_table' are handled indirectly using
`dump_add_opaque' or `dump_add_root_struct_ptr'.

   This is the end of the dumping part.


File: internals.info,  Node: Reloading phase,  Next: Remaining issues,  Prev: Dumping phase,  Up: Dumping

12.5 Reloading phase
====================

12.5.1 File loading
-------------------

The file is mmap'ed in memory (which ensures a PAGESIZE alignment, at
least 4096), or if mmap is unavailable or fails, a 256-byte-aligned
malloc is done and the file is loaded.

   Some variables are reinitialized from the values found in the header.

   The difference between the actual loading address and the
reloc_address is computed and will be used for all the relocations.

12.5.2 Putting back the pdump_opaques
-------------------------------------

The memory contents are restored in the obvious and trivial way.

12.5.3 Putting back the pdump_root_struct_ptrs
----------------------------------------------

The variables pointed to by pdump_root_struct_ptrs in the dump phase are
reset to the right relocated object addresses.

12.5.4 Object relocation
------------------------

All the objects are relocated using their description and their offset
by `pdump_reloc_one'.  This step is unnecessary if the reloc_address is
equal to the file loading address.
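
   The relocation arithmetic amounts to adding one constant to every
pointer slot in the loaded image.  The following is a minimal sketch
under stated assumptions: the `toy_' functions and the explicit list of
slot offsets are hypothetical (the real code finds the slots through
the data descriptions), and a stored value of 0 is treated as a null
pointer that must not be adjusted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Add DELTA to the pointer-sized slot at byte OFFSET inside IMAGE. */
static void
toy_reloc_slot (char *image, size_t offset, uintptr_t delta)
{
  uintptr_t v;

  memcpy (&v, image + offset, sizeof v);
  if (v != 0)                      /* assumed: 0 encodes a null pointer */
    v += delta;
  memcpy (image + offset, &v, sizeof v);
}

/* Relocate every slot listed in SLOTS.  Nothing is written when the
   file was loaded exactly at RELOC_ADDRESS. */
static void
toy_reloc_all (char *image, const size_t *slots, size_t nslots,
               uintptr_t load_address, uintptr_t reloc_address)
{
  uintptr_t delta = load_address - reloc_address;
  size_t i;

  if (delta == 0)
    return;
  for (i = 0; i < nslots; i++)
    toy_reloc_slot (image, slots[i], delta);
}
```

   The early return when the difference is zero is what makes a
pre-relocated dump file attractive: the pages are never written to.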

12.5.5 Putting back the pdump_root_objects and pdump_weak_object_chains
-----------------------------------------------------------------------

Same as Putting back the pdump_root_struct_ptrs.

12.5.6 Reorganize the hash tables
---------------------------------

Since some of the hash values in the lisp hash tables are
address-dependent, their layout is now wrong.  So we go through each of
them and have them reorganized by calling `pdump_reorganize_hash_table'.
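
   To see why this is needed, consider a small open-addressing table
keyed by address.  Once the keys have moved, each key sits in a slot
that no longer matches its hash, and lookups fail until every key is
reinserted.  The table below is a hypothetical toy, not the layout of
the real lisp hash tables.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_SLOTS 8

struct toy_table { const void *keys[TOY_SLOTS]; };

static unsigned
toy_hash_ptr (const void *p)
{
  return (unsigned) (((uintptr_t) p >> 4) % TOY_SLOTS);
}

/* Insert KEY with linear probing; assumes the table is not full. */
static void
toy_insert (struct toy_table *t, const void *key)
{
  unsigned pos = toy_hash_ptr (key);

  while (t->keys[pos])
    pos = (pos + 1) % TOY_SLOTS;
  t->keys[pos] = key;
}

/* Return 1 if KEY can be found by probing from its hash position. */
static int
toy_contains (const struct toy_table *t, const void *key)
{
  unsigned pos = toy_hash_ptr (key);
  int i;

  for (i = 0; i < TOY_SLOTS && t->keys[pos]; i++)
    {
      if (t->keys[pos] == key)
        return 1;
      pos = (pos + 1) % TOY_SLOTS;
    }
  return 0;
}

/* Rebuild the table so every key sits where its current hash says. */
static void
toy_reorganize (struct toy_table *t)
{
  struct toy_table old = *t;
  int i;

  memset (t, 0, sizeof *t);
  for (i = 0; i < TOY_SLOTS; i++)
    if (old.keys[i])
      toy_insert (t, old.keys[i]);
}
```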


File: internals.info,  Node: Remaining issues,  Prev: Reloading phase,  Up: Dumping

12.6 Remaining issues
=====================

The build process will have to start a post-dump xemacs, ask it the
loading address (which will, hopefully, be always the same between
different xemacs invocations) and relocate the file to the new address.
This way the object relocation phase will not have to be done, which
means no writes in the objects and that, because of the use of mmap, the
dumped data will be shared between all the xemacs processes running on
the computer.

   Some executable signature will be necessary to ensure that a given
dump file is really associated with a given executable, or random
crashes will occur.  Maybe a random number set at compile or configure
time through a define.  This will also allow for having
differently-compiled xemacsen on the same system (mule and no-mule
come to mind).

   The DOC file contents should probably end up in the dump file.


File: internals.info,  Node: Events and the Event Loop,  Next: Evaluation; Stack Frames; Bindings,  Prev: Dumping,  Up: Top

13 Events and the Event Loop
****************************

* Menu:

* Introduction to Events::
* Main Loop::
* Specifics of the Event Gathering Mechanism::
* Specifics About the Emacs Event::
* The Event Stream Callback Routines::
* Other Event Loop Functions::
* Converting Events::
* Dispatching Events; The Command Builder::


File: internals.info,  Node: Introduction to Events,  Next: Main Loop,  Up: Events and the Event Loop

13.1 Introduction to Events
===========================

An event is an object that encapsulates information about an
interesting occurrence in the operating system.  Events are generated
either by user action, direct (e.g. typing on the keyboard or moving
the mouse) or indirect (moving another window, thereby generating an
expose event on an Emacs frame), or as a result of some other typically
asynchronous action happening, such as output from a subprocess being
ready or a timer expiring.  Events come into the system in an
asynchronous fashion (typically through a callback being called) and
are converted into a synchronous event queue (first-in, first-out) in a
process that we will call "collection".

   Note that each application has its own event queue. (It is
immaterial whether the collection process directly puts the events in
the proper application's queue, or puts them into a single system
queue, which is later split up.)

   The most basic level of event collection is done by the operating
system or window system.  Typically, XEmacs does its own event
collection as well.  Often there are multiple layers of collection in
XEmacs, with events from various sources being collected into a queue,
which is then combined with other sources to go into another queue
(i.e. a second level of collection), with perhaps another level on top
of this, etc.

   XEmacs has its own types of events (called "Emacs events"), which
provide an abstract layer on top of the system-dependent nature of the
most basic events that are received.  Part of the complex nature of the
XEmacs event collection process involves converting from the
operating-system events into the proper Emacs events--there may not be
a one-to-one correspondence.

   Emacs events are documented in `events.h'; I'll discuss them later.


File: internals.info,  Node: Main Loop,  Next: Specifics of the Event Gathering Mechanism,  Prev: Introduction to Events,  Up: Events and the Event Loop

13.2 Main Loop
==============

The "command loop" is the top-level loop that the editor is always
running.  It loops endlessly, calling `next-event' to retrieve an event
and `dispatch-event' to execute it. `dispatch-event' does the
appropriate thing with non-user events (process, timeout, magic, eval,
mouse motion); this involves calling a Lisp handler function, redrawing
a newly-exposed part of a frame, reading subprocess output, etc.  For
user events, `dispatch-event' looks up the event in relevant keymaps or
menubars; when a full key sequence or menubar selection is reached, the
appropriate function is executed. `dispatch-event' may have to keep
state across calls; this is done in the "command-builder" structure
associated with each console (remember, there's usually only one
console), and the engine that looks up keystrokes and constructs full
key sequences is called the "command builder".  This is documented
elsewhere.

   The guts of the command loop are in `command_loop_1()'.  This
function doesn't catch errors, though--that's the job of
`command_loop_2()', which is a condition-case (i.e. error-trapping)
wrapper around `command_loop_1()'.  `command_loop_1()' never returns,
but may get thrown out of.

   When an error occurs, `cmd_error()' is called, which usually invokes
the Lisp error handler in `command-error'; however, a default error
handler is provided if `command-error' is `nil' (e.g. during startup).
The purpose of the error handler is simply to display the error message
and do associated cleanup; it does not need to throw anywhere.  When
the error handler finishes, the condition-case in `command_loop_2()'
will finish and `command_loop_2()' will reinvoke `command_loop_1()'.
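
   The relationship between the two functions can be mimicked in plain
C with `setjmp'/`longjmp' standing in for the Lisp condition-case.
This is only an analogy under stated assumptions: the `toy_' functions,
the integer "commands" and the rule that a negative command signals an
error are all invented for the example, and unlike the real
`command_loop_1()' the toy inner loop returns when its input runs out.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf toy_error_catch;   /* stands in for the condition-case */
static const int *toy_commands;
static int toy_count, toy_pos, toy_errors_handled;

/* Signal an error: unwind to the wrapper. */
static void
toy_signal_error (void)
{
  longjmp (toy_error_catch, 1);
}

/* Inner loop: execute commands; a negative command is an error. */
static void
toy_command_loop_1 (void)
{
  while (toy_pos < toy_count)
    {
      int cmd = toy_commands[toy_pos++];
      if (cmd < 0)
        toy_signal_error ();
    }
}

/* Error-trapping wrapper: after each error, run the handler and
   reinvoke the inner loop.  Returns the number of errors handled. */
static int
toy_command_loop_2 (const int *commands, int count)
{
  toy_commands = commands;
  toy_count = count;
  toy_pos = 0;
  toy_errors_handled = 0;

  for (;;)
    {
      if (setjmp (toy_error_catch) != 0)
        toy_errors_handled++;     /* the "error handler" */
      else
        {
          toy_command_loop_1 ();
          break;                  /* reached only because input is finite */
        }
    }
  return toy_errors_handled;
}
```

   The loop state is kept in static variables so that it remains well
defined after a `longjmp'.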

   `command_loop_2()' is invoked from three places: from
`initial_command_loop()' (called from `main()' at the end of internal
initialization), from the Lisp function `recursive-edit', and from
`call_command_loop()'.

   `call_command_loop()' is called when a macro is started and when the
minibuffer is entered; normal termination of the macro or minibuffer
causes a throw out of the recursive command loop. (The throw goes to
`execute-kbd-macro' for macros and to `exit' for minibuffers.  Note also
that the low-level minibuffer-entering function,
`read-minibuffer-internal', provides its own error handling and does
not need `command_loop_2()''s error encapsulation; so it tells
`call_command_loop()' to invoke `command_loop_1()' directly.)

   Note that both read-minibuffer-internal and recursive-edit set up a
catch for `exit'; this is why `abort-recursive-edit', which throws to
this catch, exits out of either one.

   `initial_command_loop()', called from `main()', sets up a catch for
`top-level' when invoking `command_loop_2()', allowing functions to
throw all the way to the top level if they really need to.  Before
invoking `command_loop_2()', `initial_command_loop()' calls
`top_level_1()', which handles all of the startup stuff (creating the
initial frame, handling the command-line options, loading the user's
`.emacs' file, etc.).  The function that actually does this is in Lisp
and is pointed to by the variable `top-level'; normally this function is
`normal-top-level'.  `top_level_1()' is just an error-handling wrapper
similar to `command_loop_2()'.  Note also that `initial_command_loop()'
sets up a catch for `top-level' when invoking `top_level_1()', just
like when it invokes `command_loop_2()'.


File: internals.info,  Node: Specifics of the Event Gathering Mechanism,  Next: Specifics About the Emacs Event,  Prev: Main Loop,  Up: Events and the Event Loop

13.3 Specifics of the Event Gathering Mechanism
===============================================

Here is an approximate diagram of the collection processes at work in
XEmacs, under TTY's (TTY's are simpler than X so we'll look at this
first):


      asynch.      asynch.    asynch.   asynch.             [Collectors in
     kbd events  kbd events   process   process                the OS]
           |         |         output    output
           |         |           |         |
           |         |           |         |      SIGINT,   [signal handlers
           |         |           |         |      SIGQUIT,     in XEmacs]
           V         V           V         V      SIGWINCH,
          file      file        file      file    SIGALRM
          desc.     desc.       desc.     desc.     |
          (TTY)     (TTY)       (pipe)    (pipe)    |
           |          |          |         |      fake    timeouts
           |          |          |         |      file        |
           |          |          |         |      desc.       |
           |          |          |         |      (pipe)      |
           |          |          |         |        |         |
           |          |          |         |        |         |
           |          |          |         |        |         |
           V          V          V         V        V         V
           ------>-----------<----------------<----------------
                       |
                       |
                       | [collected using select() in emacs_tty_next_event()
                       |  and converted to the appropriate Emacs event]
                       |
                       |
                       V          (above this line is TTY-specific)
                     Emacs -----------------------------------------------
                     event (below this line is the generic event mechanism)
                       |
                       |
     was there     if not, call
     a SIGINT?  emacs_tty_next_event()
         |             |
         |             |
         |             |
         V             V
         --->------<----
                |
                |     [collected in event_stream_next_event();
                |      SIGINT is converted using maybe_read_quit_event()]
                V
              Emacs
              event
                |
                \---->------>----- maybe_kbd_translate() ---->---\
                                                                 |
                                                                 |
                                                                 |
          command event queue                                    |
                                                    if not from command
       (contains events that were                   event queue, call
       read earlier but not processed,              event_stream_next_event()
       typically when waiting in a                               |
       sit-for, sleep-for, etc. for                              |
      a particular event to be received)                         |
                    |                                            |
                    |                                            |
                    V                                            V
                    ---->------------------------------------<----
                                                    |
                                                    | [collected in
                                                    |  next_event_internal()]
                                                    |
      unread-     unread-       event from          |
      command-    command-       keyboard       else, call
      events      event           macro      next_event_internal()
        |           |               |               |
        |           |               |               |
        |           |               |               |
        V           V               V               V
        --------->----------------------<------------
                          |
                          |      [collected in `next-event', which may loop
                          |       more than once if the event it gets is on
                          |       a dead frame, device, etc.]
                          |
                          |
                          V
                 feed into top-level event loop,
                 which repeatedly calls `next-event'
                 and then dispatches the event
                 using `dispatch-event'

   Notice the separation between TTY-specific and generic event
mechanism.  When using the Xt-based event loop, the TTY-specific stuff
is replaced but the rest stays the same.
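
   The bottom of the TTY-specific part, where several file descriptors
are multiplexed with `select()', can be sketched as follows.  The
function `toy_wait_for_input()' is hypothetical; it blocks without a
timeout and does no conversion to Emacs events, both of which the real
`emacs_tty_next_event()' has to handle.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Block until one of the NFDS descriptors in FDS is readable.
   Return its index in FDS, or -1 on error. */
static int
toy_wait_for_input (const int *fds, int nfds)
{
  fd_set readable;
  int i, maxfd = -1;

  FD_ZERO (&readable);
  for (i = 0; i < nfds; i++)
    {
      FD_SET (fds[i], &readable);
      if (fds[i] > maxfd)
        maxfd = fds[i];
    }

  if (select (maxfd + 1, &readable, NULL, NULL, NULL) <= 0)
    return -1;

  for (i = 0; i < nfds; i++)
    if (FD_ISSET (fds[i], &readable))
      return i;
  return -1;
}
```

   Since `select()' only watches file descriptors, a signal handler can
take part in this scheme by writing a byte to a pipe, which is the role
of the "fake file desc." column in the diagram.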

   It's also important to realize that only one kind of system-specific
event loop can be operating at a time, and it must be able to receive
all kinds of events simultaneously.  For the two existing
event loops (implemented in `event-tty.c' and `event-Xt.c',
respectively), the TTY event loop _only_ handles TTY consoles, while
the Xt event loop handles _both_ TTY and X consoles.  This situation is
different from all of the output handlers, where you simply have one
per console type.

   Here's the Xt Event Loop Diagram (notice that below a certain point,
it's the same as the above diagram):

     asynch. asynch. asynch. asynch.                 [Collectors in
      kbd     kbd    process process                    the OS]
     events  events  output  output
       |       |       |       |
       |       |       |       |     asynch. asynch. [Collectors in the
       |       |       |       |       X        X     OS and X Window System]
       |       |       |       |     events  events
       |       |       |       |       |        |
       |       |       |       |       |        |
       |       |       |       |       |        |    SIGINT, [signal handlers
       |       |       |       |       |        |    SIGQUIT,   in XEmacs]
       |       |       |       |       |        |    SIGWINCH,
       |       |       |       |       |        |    SIGALRM
       |       |       |       |       |        |       |
       |       |       |       |       |        |       |
       |       |       |       |       |        |       |      timeouts
       |       |       |       |       |        |       |          |
       |       |       |       |       |        |       |          |
       |       |       |       |       |        |       V          |
       V       V       V       V       V        V      fake        |
      file    file    file    file    file     file    file        |
      desc.   desc.   desc.   desc.   desc.    desc.   desc.       |
      (TTY)   (TTY)   (pipe)  (pipe) (socket) (socket) (pipe)      |
       |       |       |       |       |        |       |          |
       |       |       |       |       |        |       |          |
       |       |       |       |       |        |       |          |
       V       V       V       V       V        V       V          V
       --->----------------------------------------<---------<------
            |              |               |
            |              |               |[collected using select() in
            |              |               | _XtWaitForSomething(), called
            |              |               | from XtAppProcessEvent(), called
            |              |               | in emacs_Xt_next_event();
            |              |               | dispatched to various callbacks]
            |              |               |
            |              |               |
       emacs_Xt_        p_s_callback(),    | [popup_selection_callback]
       event_handler()  x_u_v_s_callback(),| [x_update_vertical_scrollbar_
            |           x_u_h_s_callback(),|  callback]
            |           search_callback()  | [x_update_horizontal_scrollbar_
            |              |               |  callback]
            |              |               |
            |              |               |
       enqueue_Xt_       signal_special_   |
       dispatch_event()  Xt_user_event()   |
       [maybe multiple     |               |
        times, maybe 0     |               |
        times]             |               |
            |            enqueue_Xt_       |
            |            dispatch_event()  |
            |              |               |
            |              |               |
            V              V               |
            -->----------<--               |
                   |                       |
                   |                       |
                dispatch             Xt_what_callback()
                event                  sets flags
                queue                      |
                   |                       |
                   |                       |
                   |                       |
                   |                       |
                   ---->-----------<--------
                        |
                        |
                        |     [collected and converted as appropriate in
                        |            emacs_Xt_next_event()]
                        |
                        |
                        V          (above this line is Xt-specific)
                      Emacs ------------------------------------------------
                      event (below this line is the generic event mechanism)
                        |
                        |
     was there      if not, call
     a SIGINT?   emacs_Xt_next_event()
         |              |
         |              |
         |              |
         V              V
         --->-------<----
                |
                |        [collected in event_stream_next_event();
                |         SIGINT is converted using maybe_read_quit_event()]
                V
              Emacs
              event
                |
                \---->------>----- maybe_kbd_translate() -->-----\
                                                                 |
                                                                 |
                                                                 |
          command event queue                                    |
                                                   if not from command
       (contains events that were                  event queue, call
       read earlier but not processed,             event_stream_next_event()
       typically when waiting in a                               |
       sit-for, sleep-for, etc. for                              |
      a particular event to be received)                         |
                    |                                            |
                    |                                            |
                    V                                            V
                    ---->----------------------------------<------
                                                    |
                                                    | [collected in
                                                    |  next_event_internal()]
                                                    |
      unread-     unread-       event from          |
      command-    command-       keyboard       else, call
      events      event           macro      next_event_internal()
        |           |               |               |
        |           |               |               |
        |           |               |               |
        V           V               V               V
        --------->----------------------<------------
                          |
                          |      [collected in `next-event', which may loop
                          |       more than once if the event it gets is on
                          |       a dead frame, device, etc.]
                          |
                          |
                          V
                 feed into top-level event loop,
                 which repeatedly calls `next-event'
                 and then dispatches the event
                 using `dispatch-event'


File: internals.info,  Node: Specifics About the Emacs Event,  Next: The Event Stream Callback Routines,  Prev: Specifics of the Event Gathering Mechanism,  Up: Events and the Event Loop

13.4 Specifics About the Emacs Event
====================================


File: internals.info,  Node: The Event Stream Callback Routines,  Next: Other Event Loop Functions,  Prev: Specifics About the Emacs Event,  Up: Events and the Event Loop

13.5 The Event Stream Callback Routines
=======================================


File: internals.info,  Node: Other Event Loop Functions,  Next: Converting Events,  Prev: The Event Stream Callback Routines,  Up: Events and the Event Loop

13.6 Other Event Loop Functions
===============================

`detect_input_pending()' and `input-pending-p' look for input by
calling `event_stream->event_pending_p' and looking in
`[V]unread-command-event' and the `command_event_queue' (they do not
check for an executing keyboard macro, though).

   `discard-input' cancels any pending command events (and any keyboard
macros currently executing), and puts the remaining, non-command events
onto the `command_event_queue'.  The source code contains a comment
about a "race condition" here, which is not a good sign.

   `next-command-event' and `read-char' are higher-level interfaces to
`next-event'.  `next-command-event' gets the next "command" event (i.e.
keypress, mouse event, menu selection, or scrollbar action), calling
`dispatch-event' on any others.  `read-char' calls `next-command-event'
and uses `event_to_character()' to return the character equivalent.
With the right kind of input method support, it is possible for
`(read-char)' to return a Kanji character.
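
   The filtering done by `next-command-event' can be pictured with a
toy model.  The sketch below is not the XEmacs source: every `toy_'
name is hypothetical, the event stream is assumed to be a plain array
rather than a blocking call to `next-event', and "dispatching" a
non-command event merely bumps a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event kinds; the first four are the "command" events. */
enum toy_kind
{
  TOY_KEYPRESS, TOY_MOUSE, TOY_MENU, TOY_SCROLLBAR,
  TOY_TIMEOUT, TOY_PROCESS
};

struct toy_event
{
  enum toy_kind kind;
};

/* Assumed stand-in for the event stream: a fixed array of events. */
struct toy_stream
{
  const struct toy_event *events;
  int count;
  int pos;          /* index of the next unread event */
  int dispatched;   /* how many non-command events were dispatched */
};

int
toy_command_event_p (const struct toy_event *ev)
{
  return ev->kind == TOY_KEYPRESS || ev->kind == TOY_MOUSE
    || ev->kind == TOY_MENU || ev->kind == TOY_SCROLLBAR;
}

/* Return the next command event, dispatching any others on the way.
   Returns NULL when the toy stream is exhausted (the real function
   would wait for more input instead). */
const struct toy_event *
toy_next_command_event (struct toy_stream *s)
{
  assert (s != NULL && s->pos >= 0);
  while (s->pos < s->count)
    {
      const struct toy_event *ev = &s->events[s->pos++];
      if (toy_command_event_p (ev))
        return ev;
      s->dispatched++;   /* stands in for `dispatch-event' */
    }
  return NULL;
}
```

   Given a stream holding a timeout, a process event and then a
keypress, the first call skips (and "dispatches") the first two events
and returns the keypress.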


File: internals.info,  Node: Converting Events,  Next: Dispatching Events; The Command Builder,  Prev: Other Event Loop Functions,  Up: Events and the Event Loop

13.7 Converting Events
======================

`character_to_event()', `event_to_character()', `event-to-character',
and `character-to-event' convert between characters and keypress events
corresponding to the characters.  If the event was not a keypress,
`event_to_character()' returns -1 and `event-to-character' returns
`nil'.  These functions convert between the character representation
and the split-up event representation (keysym plus modifier keys).
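
   As a rough illustration of the two representations, here is a
minimal sketch that handles only plain characters and control
characters.  The `toy_' types and functions are hypothetical and far
simpler than the real event structure; only the "return -1 for a
non-keypress" behavior is taken from the description above.

```c
#include <assert.h>

/* Hypothetical, simplified event record. */
enum toy_event_type { TOY_KEY_PRESS, TOY_BUTTON_PRESS };

#define TOY_MOD_CONTROL 1

struct toy_event
{
  enum toy_event_type type;
  int keysym;      /* base key, e.g. 'a' */
  int modifiers;   /* bitmask of TOY_MOD_* values */
};

/* Split a character into keysym plus modifier keys.  In this toy
   model, codes 1..26 are treated as C-a .. C-z. */
struct toy_event
toy_character_to_event (int c)
{
  struct toy_event ev = { TOY_KEY_PRESS, c, 0 };
  assert (c >= 0 && c < 128);
  if (c >= 1 && c <= 26)
    {
      ev.keysym = c + 'a' - 1;
      ev.modifiers = TOY_MOD_CONTROL;
    }
  return ev;
}

/* Return the character for a keypress, or -1 for any other event. */
int
toy_event_to_character (const struct toy_event *ev)
{
  if (ev->type != TOY_KEY_PRESS)
    return -1;
  if (ev->modifiers & TOY_MOD_CONTROL)
    {
      if (ev->keysym >= 'a' && ev->keysym <= 'z')
        return ev->keysym - 'a' + 1;
      return -1;   /* no character equivalent in this toy model */
    }
  return ev->keysym;
}
```

   Converting character 1 yields keysym `a' with the control modifier,
and converting that event back yields 1 again; a button press yields
-1.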


File: internals.info,  Node: Dispatching Events; The Command Builder,  Prev: Converting Events,  Up: Events and the Event Loop

13.8 Dispatching Events; The Command Builder
============================================

Not yet documented.


File: internals.info,  Node: Evaluation; Stack Frames; Bindings,  Next: Symbols and Variables,  Prev: Events and the Event Loop,  Up: Top

14 Evaluation; Stack Frames; Bindings
*************************************

* Menu:

* Evaluation::
* Dynamic Binding; The specbinding Stack; Unwind-Protects::
* Simple Special Forms::
* Catch and Throw::


File: internals.info,  Node: Evaluation,  Next: Dynamic Binding; The specbinding Stack; Unwind-Protects,  Up: Evaluation; Stack Frames; Bindings

14.1 Evaluation
===============

`Feval()' evaluates the form (a Lisp object) that is passed to it.
Note that evaluation is only non-trivial for two types of objects:
symbols and conses.  A symbol is evaluated simply by calling
`symbol-value' on it and returning the value.

   Evaluating a cons means calling a function.  First, `Feval()' checks
to see if garbage collection is necessary, and calls
`garbage_collect_1()' if so.  It then increases the evaluation depth by
1 (`lisp_eval_depth', which is always less than `max_lisp_eval_depth')
and adds an element to the linked list of `struct backtrace''s
(`backtrace_list').  Each such structure contains a pointer to the
function being called plus a list of the function's arguments.
Originally these values are stored unevalled, and as they are
evaluated, the backtrace structure is updated.  Garbage collection pays
attention to the objects pointed to in the backtrace structures:
garbage collection might happen while a function is being called or
while an argument is being evaluated, and there could easily be no
other references to the arguments in the argument list.  Once an
argument is evaluated, however, the unevalled version is no longer
needed by `Feval()', and so the backtrace structure is changed.
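
   The bookkeeping on entry to and exit from a call might be sketched
as follows.  This is a simplified model, not the actual code: the
`toy_' names, the structure layout and the limit of 500 are all
assumptions, and the real structure also records the arguments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for `struct backtrace'. */
struct toy_backtrace
{
  struct toy_backtrace *next;   /* record of the enclosing call */
  const char *function;         /* function being called */
  int nargs;                    /* number of arguments recorded */
};

struct toy_backtrace *toy_backtrace_list = NULL;
int toy_lisp_eval_depth = 0;
int toy_max_lisp_eval_depth = 500;   /* assumed limit */

/* Enter a call: bump the depth and link a record at the list head.
   Returns 0 on success, or -1 if the depth would reach the limit
   (the real code signals a Lisp error instead). */
int
toy_enter_call (struct toy_backtrace *bt, const char *function, int nargs)
{
  if (toy_lisp_eval_depth + 1 >= toy_max_lisp_eval_depth)
    return -1;
  toy_lisp_eval_depth++;
  bt->function = function;
  bt->nargs = nargs;
  bt->next = toy_backtrace_list;
  toy_backtrace_list = bt;
  return 0;
}

/* Leave a call: unlink the innermost record and drop the depth. */
void
toy_leave_call (struct toy_backtrace *bt)
{
  assert (toy_backtrace_list == bt);   /* calls must nest properly */
  toy_backtrace_list = bt->next;
  toy_lisp_eval_depth--;
}
```

   Because each record is typically a local variable of the calling C
function, the list mirrors the C stack, and a garbage collector
walking it from `toy_backtrace_list' sees every call in progress.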

   At this point, the function to be called is determined by looking at
the car of the cons (if this is a symbol, its function definition is
retrieved and the process repeated).  The function should then consist
of either a `Lisp_Subr' (built-in function written in C), a
`Lisp_Compiled_Function' object, or a cons whose car is one of the
symbols `autoload', `macro' or `lambda'.

   If the function is a `Lisp_Subr', the Lisp object points to a
`struct Lisp_Subr' (created by `DEFUN()'), which contains a pointer to
the C function, a minimum and maximum number of arguments (or possibly
the special constants `MANY' or `UNEVALLED'), a pointer to the symbol
referring to that subr, and a couple of other things.  If the subr
wants its arguments `UNEVALLED', they are passed raw as a list.
Otherwise, an array of evaluated arguments is created and put into the
backtrace structure, and either passed whole (`MANY') or each argument
is passed as a C argument.
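
   The choice among the three calling conventions can be summarized in
a few lines of C.  This is only a sketch: the `toy_' names and the
numeric values chosen for the two special constants are assumptions,
not the definitions used by XEmacs.

```c
#include <assert.h>

/* Assumed sentinel values for the two special constants. */
#define TOY_MANY      (-2)
#define TOY_UNEVALLED (-1)

enum toy_convention
{
  TOY_PASS_RAW_LIST,   /* UNEVALLED: unevaluated argument list */
  TOY_PASS_ARRAY,      /* MANY: count plus array of evaluated args */
  TOY_PASS_SEPARATE,   /* each evaluated arg is its own C argument */
  TOY_WRONG_NARGS      /* argument count out of range */
};

/* Hypothetical subset of the information kept for a subr. */
struct toy_subr
{
  int min_args;
  int max_args;   /* or TOY_MANY / TOY_UNEVALLED */
};

enum toy_convention
toy_calling_convention (const struct toy_subr *subr, int nargs)
{
  assert (nargs >= 0);
  if (subr->max_args == TOY_UNEVALLED)
    return TOY_PASS_RAW_LIST;
  if (nargs < subr->min_args)
    return TOY_WRONG_NARGS;
  if (subr->max_args == TOY_MANY)
    return TOY_PASS_ARRAY;
  if (nargs > subr->max_args)
    return TOY_WRONG_NARGS;
  return TOY_PASS_SEPARATE;
}
```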

   If the function is a `Lisp_Compiled_Function',
`funcall_compiled_function()' is called.  If the function is a lambda
expression, `funcall_lambda()' is called.  If the function is a macro,
[..... fill in] is done.  If the function is an autoload,
`do_autoload()' is called to load the definition and then `Feval()'
starts over [explain this more].

   When `Feval()' exits, the evaluation depth is reduced by one, the
debugger is called if appropriate, and the current backtrace structure
is removed from the list.

   Both `funcall_compiled_function()' and `funcall_lambda()' need to go
through the list of formal parameters to the function and bind them to
the actual arguments, checking for `&rest' and `&optional' symbols in
the formal parameters and making sure the number of actual arguments is
correct.  `funcall_compiled_function()' can do this a little more
efficiently, since the formal parameter list can be checked for sanity
when the compiled function object is created.
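
   To make the parameter scan concrete, here is a minimal sketch that
classifies a formal parameter list and checks an argument count
against it.  Formal parameters are represented as C strings purely for
illustration; the `toy_' names are hypothetical, and the sanity rules
enforced here (one `&optional', and `&rest' followed by exactly one
final variable) are an assumption that may be stricter than the real
code.

```c
#include <assert.h>
#include <string.h>

struct toy_arity
{
  int required;   /* parameters before &optional / &rest */
  int optional;   /* parameters after &optional */
  int has_rest;   /* nonzero if a &rest parameter is present */
};

/* Scan N formal parameter names.  Returns 0 on success, -1 if the
   list is malformed under the assumed rules. */
int
toy_scan_formals (const char *const *formals, int n, struct toy_arity *out)
{
  int seen_optional = 0;
  int i;

  assert (n >= 0);
  out->required = out->optional = out->has_rest = 0;
  for (i = 0; i < n; i++)
    {
      if (strcmp (formals[i], "&optional") == 0)
        {
          if (seen_optional || out->has_rest)
            return -1;
          seen_optional = 1;
        }
      else if (strcmp (formals[i], "&rest") == 0)
        {
          /* Must be followed by exactly one variable, the last. */
          if (out->has_rest || i != n - 2)
            return -1;
          out->has_rest = 1;
        }
      else if (out->has_rest)
        ;   /* the single variable named after &rest */
      else if (seen_optional)
        out->optional++;
      else
        out->required++;
    }
  return 0;
}

/* Is NARGS an acceptable number of actual arguments? */
int
toy_nargs_ok (const struct toy_arity *a, int nargs)
{
  if (nargs < a->required)
    return 0;
  if (a->has_rest)
    return 1;
  return nargs <= a->required + a->optional;
}
```

   A compiled function could run the scan once, when the object is
created, and keep only the resulting counts, which is the kind of
saving the paragraph above alludes to.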

   `funcall_lambda()' simply calls `Fprogn()' to execute the body of
the lambda expression.

   `funcall_compiled_function()' calls the real byte-code interpreter
`execute_optimized_program()' on the byte-code instructions, which are
converted into an internal form for faster execution.

   When a compiled function is executed for the first time by
`funcall_compiled_function()', or during the dump phase of building
XEmacs, the byte-code instructions are converted from a `Lisp_String'
(which is inefficient to access, especially in the presence of MULE)
into a `Lisp_Opaque' object containing an array of unsigned char, which
can be directly executed by the byte-code interpreter.  At this time
the byte code is also analyzed for validity and transformed into a more
optimized form, so that `execute_optimized_program()' can really fly.

   Here are some of the optimizations performed by the internal
byte-code transformer:
  1. References to the `constants' array are checked for out-of-range
     indices, so that the byte interpreter doesn't have to.

  2. References to the `constants' array that will be used as a Lisp
     variable are checked for being correct non-constant (i.e. not `t',
     `nil', or `keywordp') symbols, so that the byte interpreter
     doesn't have to.

  3. The maximum number of variable bindings in the byte-code is
     pre-computed, so that space on the `specpdl' stack can be
     pre-reserved once for the whole function execution.

  4. All byte-code jumps are relative to the current program counter
     instead of the start of the program, thereby saving a register.

  5. One-byte relative jumps are converted from the byte-code form of
     unsigned chars offset by 127 to machine-friendly signed chars.
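
   Items 4 and 5 amount to simple arithmetic, sketched below.  The
sketch assumes that "offset by 127" means the stored operand equals the
displacement plus 127; the `toy_' function names are hypothetical and
do not appear in the XEmacs source.

```c
#include <assert.h>
#include <limits.h>

/* Item 4: express an absolute jump target as a displacement from the
   current program counter. */
int
toy_absolute_to_relative (int target, int pc)
{
  return target - pc;
}

/* Item 5: convert a stored one-byte operand (assumed to hold the
   displacement plus 127) into a signed char.  Returns 0 on success,
   or -1 if the displacement does not fit in a signed char. */
int
toy_convert_rel_jump (unsigned char stored, signed char *out)
{
  int displacement = (int) stored - 127;

  assert (out != 0);
  if (displacement > SCHAR_MAX)
    return -1;
  *out = (signed char) displacement;
  return 0;
}
```

   After the conversion the interpreter can add the signed operand
directly to its program counter, with no subtraction at run time.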

   Of course, this transformation of the `instructions' should not be
visible to the user, so `Fcompiled_function_instructions()' needs to
know how to convert the optimized opaque object back into a Lisp string
that is identical to the original string from the `.elc' file.
(Actually, the resulting string may (rarely) contain slightly
different, yet equivalent, byte code.)

   `Ffuncall()' implements Lisp `funcall'.  `(funcall fun x1 x2 x3
...)' is equivalent to `(eval (list fun (list 'quote x1) (list 'quote
x2) (list 'quote x3) ...))'.  `Ffuncall()' contains its own code to do
the evaluation, however, and is very similar to `Feval()'.

   From the performance point of view, it is worth knowing that most of
the time in Lisp evaluation is spent executing `Lisp_Subr' and
`Lisp_Compiled_Function' objects via `Ffuncall()' (not `Feval()').

   `Fapply()' implements Lisp `apply', which is very similar to
`funcall' except that the last argument is a list, and the result is
the same as if each element of that list had been passed as a separate
argument.  `Fapply()' spreads the elements of the last argument into
the argument array, then calls `Ffuncall()' to do the work.
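
   The spreading step can be modeled with plain C arrays.  In this
sketch, which is not the XEmacs code, `long' stands in for
`Lisp_Object', the final list argument is assumed to arrive as an
array, and all `toy_' names and the fixed limit of 16 arguments are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ARGS 16   /* assumed limit, for the sketch only */

/* A callable taking an argument count and an argument array. */
typedef long (*toy_fn) (int nargs, const long *args);

/* Stand-in for `Ffuncall()': call FN on an already-flat array. */
long
toy_funcall (toy_fn fn, int nargs, const long *args)
{
  return fn (nargs, args);
}

/* Stand-in for `Fapply()': LEADING holds the arguments given
   individually and TAIL holds the elements of the final list
   argument.  Both are copied into one flat array. */
long
toy_apply (toy_fn fn, int nleading, const long *leading,
           int ntail, const long *tail)
{
  long flat[TOY_MAX_ARGS];
  int n = 0;
  int i;

  assert (nleading >= 0 && ntail >= 0);
  assert (nleading + ntail <= TOY_MAX_ARGS);
  for (i = 0; i < nleading; i++)
    flat[n++] = leading[i];
  for (i = 0; i < ntail; i++)
    flat[n++] = tail[i];
  return toy_funcall (fn, n, flat);
}

/* Example callable: add up all arguments. */
long
toy_sum (int nargs, const long *args)
{
  long total = 0;
  int i;

  for (i = 0; i < nargs; i++)
    total += args[i];
  return total;
}
```

   Calling `toy_apply' with leading arguments 1 and 2 and tail 3, 4, 5
corresponds to `(apply '+ 1 2 '(3 4 5))' and to `(funcall '+ 1 2 3 4
5)'.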

   `apply1()', `call0()', `call1()', `call2()', and `call3()' call a
function, passing it the argument(s) given (the arguments are given as
separate C arguments rather than being passed as an array).  `apply1()'
uses `Fapply()' while the others use `Ffuncall()' to do the real work.


File: internals.info,  Node: Dynamic Binding; The specbinding Stack; Unwind-Protects,  Next: Simple Special Forms,  Prev: Evaluation,  Up: Evaluation; Stack Frames; Bindings

14.2 Dynamic Binding; The specbinding Stack; Unwind-Protects
============================================================

     struct specbinding
     {
       Lisp_Object symbol;
       Lisp_Object old_value;
       Lisp_Object (*func) (Lisp_Object); /* for unwind-protect */
     };

   `struct specbinding' is used for local-variable bindings and
unwind-protects.  `specpdl' holds an array of `struct specbinding''s,
`specpdl_ptr' points to the beginning of the free bindings in the
array, `specpdl_size' specifies the total number of binding slots in
the array, and `max_specpdl_size' specifies the maximum number of
bindings the array can be expanded to hold.  `grow_specpdl()' increases
the size of the `specpdl' array, multiplying its size by 2 but never
exceeding `max_specpdl_size' (except that if this number is less than
400, it is first set to 400).
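
   The growth policy just described can be sketched as a pure function
on the two sizes.  The function name and the -1 return convention are
hypothetical; the actual `grow_specpdl()' also reallocates the array
and signals a Lisp error when the limit is reached.

```c
#include <assert.h>

/* Compute the new number of slots for an array currently holding
   SIZE slots.  *MAX_SIZE is raised to 400 first if it is smaller.
   Returns the new size, or -1 if the array is already at the limit. */
int
toy_grow_specpdl_size (int size, int *max_size)
{
  assert (size > 0);
  if (*max_size < 400)
    *max_size = 400;
  if (size >= *max_size)
    return -1;
  size *= 2;
  if (size > *max_size)
    size = *max_size;   /* never exceed the maximum */
  return size;
}
```

   For example, with a maximum of 400 an array of 100 slots grows to
200, and one of 300 slots grows only to 400.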

   `specbind()' binds a symbol to a value and is used for local
variables and `let' forms.  The symbol and its old value (which might
be `Qunbound', indicating no prior value) are recorded in the `specpdl'
array, and `specpdl_ptr' is advanced by one slot.

   `record_unwind_protect()' implements an "unwind-protect", which,
when placed around a section of code, ensures that some specified
cleanup routine will be executed even if the code exits abnormally
(e.g. through a `throw' or quit).  `record_unwind_protect()' simply
adds a new specbinding to the `specpdl' array and stores the
appropriate information in it.  The cleanup routine can either be a C
function, which is stored in the `func' field, or a `progn' form, which
is stored in the `old_value' field.

   `unbind_to()' removes specbindings from the `specpdl' array until
the specified position is reached.  Each specbinding can be one of
three types:

  1. an unwind-protect with a C cleanup function (`func' is not 0, and
     `old_value' holds an argument to be passed to the function);

  2. an unwind-protect with a Lisp form (`func' is 0, `symbol' is
     `nil', and `old_value' holds the form to be executed with
     `Fprogn()'); or

  3. a local-variable binding (`func' is 0, `symbol' is not `nil', and
     `old_value' holds the old value, which is restored as the symbol's
     value).
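
   The interplay of `specbind()', `record_unwind_protect()' and
`unbind_to()' can be shown with a small self-contained model.  It is a
sketch under several assumptions: `long' stands in for `Lisp_Object', a
null pointer stands in for `nil', the array has a fixed size, and
case 2 (a Lisp form run with `Fprogn()') is left out because the model
has no evaluator.  All `toy_' names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef long toy_object;                  /* stand-in for Lisp_Object */
struct toy_symbol { toy_object value; };  /* a symbol's value cell */

struct toy_specbinding
{
  struct toy_symbol *symbol;            /* NULL plays the role of nil */
  toy_object old_value;
  toy_object (*func) (toy_object);      /* for unwind-protect */
};

#define TOY_SPECPDL_SIZE 32
struct toy_specbinding toy_specpdl[TOY_SPECPDL_SIZE];
struct toy_specbinding *toy_specpdl_ptr = toy_specpdl;

int
toy_specpdl_depth (void)
{
  return (int) (toy_specpdl_ptr - toy_specpdl);
}

/* Record the old value, then install the new one. */
void
toy_specbind (struct toy_symbol *sym, toy_object value)
{
  assert (toy_specpdl_depth () < TOY_SPECPDL_SIZE);
  toy_specpdl_ptr->symbol = sym;
  toy_specpdl_ptr->old_value = sym->value;
  toy_specpdl_ptr->func = NULL;
  toy_specpdl_ptr++;
  sym->value = value;
}

/* Record a C cleanup function and the argument to pass to it. */
void
toy_record_unwind_protect (toy_object (*func) (toy_object), toy_object arg)
{
  assert (toy_specpdl_depth () < TOY_SPECPDL_SIZE);
  toy_specpdl_ptr->symbol = NULL;
  toy_specpdl_ptr->old_value = arg;
  toy_specpdl_ptr->func = func;
  toy_specpdl_ptr++;
}

/* Pop entries, newest first, until DEPTH entries remain. */
void
toy_unbind_to (int depth)
{
  while (toy_specpdl_depth () > depth)
    {
      struct toy_specbinding *b = --toy_specpdl_ptr;
      if (b->func != NULL)
        b->func (b->old_value);             /* case 1: C cleanup */
      else if (b->symbol != NULL)
        b->symbol->value = b->old_value;    /* case 3: restore value */
    }
}

/* Example cleanup routine that remembers its argument. */
toy_object toy_last_cleanup_arg = -1;

toy_object
toy_note_cleanup (toy_object arg)
{
  toy_last_cleanup_arg = arg;
  return arg;
}
```

   A `let'-style form follows the same pattern: note the current
depth, call `toy_specbind' for each variable, run the body, and finish
with `toy_unbind_to' on the saved depth.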


File: internals.info,  Node: Simple Special Forms,  Next: Catch and Throw,  Prev: Dynamic Binding; The specbinding Stack; Unwind-Protects,  Up: Evaluation; Stack Frames; Bindings

14.3 Simple Special Forms
=========================

`or', `and', `if', `cond', `progn', `prog1', `prog2', `setq', `quote',
`function', `let*', `let', `while'

   All of these are very simple and work as expected, calling `Feval()'
or `Fprogn()' as necessary and (in the case of `let' and `let*') using
`specbind()' to create bindings and `unbind_to()' to undo the bindings
when finished.

   Note that, with the exception of `Fprogn()', these functions are
typically called in real life only in interpreted code, since the byte
compiler knows how to convert calls to these functions directly into
byte code.