git.chise.org Git - chise/xemacs-chise.git.1/blob - man/internals/internals.texi

   1 \input texinfo  @c -*-texinfo-*-
   2 @c %**start of header
   3 @setfilename ../../info/internals.info
   4 @settitle XEmacs Internals Manual
   5 @c %**end of header
   6
   7 @ifinfo
   8 @dircategory XEmacs Editor
   9 @direntry
  10 * Internals: (internals).       XEmacs Internals Manual.
  11 @end direntry
  12
  13 Copyright @copyright{} 1992 - 1996 Ben Wing.
  14 Copyright @copyright{} 1996, 1997 Sun Microsystems.
  15 Copyright @copyright{} 1994 - 1998 Free Software Foundation.
  16 Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois.
  17
  18
  19 Permission is granted to make and distribute verbatim copies of this
  20 manual provided the copyright notice and this permission notice are
  21 preserved on all copies.
  22
  23 @ignore
  24 Permission is granted to process this file through TeX and print the
  25 results, provided the printed document carries copying permission notice
  26 identical to this one except for the removal of this paragraph (this
  27 paragraph not being relevant to the printed manual).
  28
  29 @end ignore
  30 Permission is granted to copy and distribute modified versions of this
  31 manual under the conditions for verbatim copying, provided that the
  32 entire resulting derived work is distributed under the terms of a
  33 permission notice identical to this one.
  34
  35 Permission is granted to copy and distribute translations of this manual
  36 into another language, under the above conditions for modified versions,
  37 except that this permission notice may be stated in a translation
  38 approved by the Foundation.
  39
  40 Permission is granted to copy and distribute modified versions of this
  41 manual under the conditions for verbatim copying, provided also that the
  42 section entitled ``GNU General Public License'' is included exactly as
  43 in the original, and provided that the entire resulting derived work is
  44 distributed under the terms of a permission notice identical to this
  45 one.
  46
  47 Permission is granted to copy and distribute translations of this manual
  48 into another language, under the above conditions for modified versions,
  49 except that the section entitled ``GNU General Public License'' may be
  50 included in a translation approved by the Free Software Foundation
  51 instead of in the original English.
  52 @end ifinfo
  53
  54 @c Combine indices.
  55 @synindex cp fn
  56 @syncodeindex vr fn
  57 @syncodeindex ky fn
  58 @syncodeindex pg fn
  59 @syncodeindex tp fn
  60
  61 @setchapternewpage odd
  62 @finalout
  63
  64 @titlepage
  65 @title XEmacs Internals Manual
  66 @subtitle Version 1.3, August 1999
  67
  68 @author Ben Wing
  69 @author Martin Buchholz
  70 @author Hrvoje Niksic
  71 @author Matthias Neubauer
  72 @author Olivier Galibert
  73 @page
  74 @vskip 0pt plus 1fill
  75
  76 @noindent
  77 Copyright @copyright{} 1992 - 1996 Ben Wing. @*
  78 Copyright @copyright{} 1996, 1997 Sun Microsystems, Inc. @*
  79 Copyright @copyright{} 1994 - 1998 Free Software Foundation. @*
  80 Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois.
  81
  82 @sp 2
  83 Version 1.3 @*
  84 August 1999.@*
  85
  86 Permission is granted to make and distribute verbatim copies of this
  87 manual provided the copyright notice and this permission notice are
  88 preserved on all copies.
  89
  90 Permission is granted to copy and distribute modified versions of this
  91 manual under the conditions for verbatim copying, provided also that the
  92 section entitled ``GNU General Public License'' is included
  93 exactly as in the original, and provided that the entire resulting
  94 derived work is distributed under the terms of a permission notice
  95 identical to this one.
  96
  97 Permission is granted to copy and distribute translations of this manual
  98 into another language, under the above conditions for modified versions,
  99 except that the section entitled ``GNU General Public License'' may be
 100 included in a translation approved by the Free Software Foundation
 101 instead of in the original English.
 102 @end titlepage
 103 @page
 104
 105 @node Top, A History of Emacs, (dir), (dir)
 106
 107 @ifinfo
 108 This Info file contains v1.0 of the XEmacs Internals Manual.
 109 @end ifinfo
 110
 111 @menu
 112 * A History of Emacs::          Times, dates, important events.
 113 * XEmacs From the Outside::     A broad conceptual overview.
 114 * The Lisp Language::           An overview.
 115 * XEmacs From the Perspective of Building::
 116 * XEmacs From the Inside::
 117 * The XEmacs Object System (Abstractly Speaking)::
 118 * How Lisp Objects Are Represented in C::
 119 * Rules When Writing New C Code::
 120 * A Summary of the Various XEmacs Modules::
 121 * Allocation of Objects in XEmacs Lisp::
 122 * Dumping::
 123 * Events and the Event Loop::
 124 * Evaluation; Stack Frames; Bindings::
 125 * Symbols and Variables::
 126 * Buffers and Textual Representation::
 127 * MULE Character Sets and Encodings::
 128 * The Lisp Reader and Compiler::
 129 * Lstreams::
 130 * Consoles; Devices; Frames; Windows::
 131 * The Redisplay Mechanism::
 132 * Extents::
 133 * Faces::
 134 * Glyphs::
 135 * Specifiers::
 136 * Menus::
 137 * Subprocesses::
 138 * Interface to X Windows::
 139 * Index::
 140
 141 @detailmenu --- The Detailed Node Listing ---
 142
 143 A History of Emacs
 144
 145 * Through Version 18::          Unification prevails.
 146 * Lucid Emacs::                 One version 19 Emacs.
 147 * GNU Emacs 19::                The other version 19 Emacs.
 148 * GNU Emacs 20::                The other version 20 Emacs.
 149 * XEmacs::                      The continuation of Lucid Emacs.
 150
 151 Rules When Writing New C Code
 152
 153 * General Coding Rules::
 154 * Writing Lisp Primitives::
 155 * Adding Global Lisp Variables::
 156 * Coding for Mule::
 157 * Techniques for XEmacs Developers::
 158
 159 Coding for Mule
 160
 161 * Character-Related Data Types::
 162 * Working With Character and Byte Positions::
 163 * Conversion to and from External Data::
 164 * General Guidelines for Writing Mule-Aware Code::
 165 * An Example of Mule-Aware Code::
 166
 167 A Summary of the Various XEmacs Modules
 168
 169 * Low-Level Modules::
 170 * Basic Lisp Modules::
 171 * Modules for Standard Editing Operations::
 172 * Editor-Level Control Flow Modules::
 173 * Modules for the Basic Displayable Lisp Objects::
 174 * Modules for other Display-Related Lisp Objects::
 175 * Modules for the Redisplay Mechanism::
 176 * Modules for Interfacing with the File System::
 177 * Modules for Other Aspects of the Lisp Interpreter and Object System::
 178 * Modules for Interfacing with the Operating System::
 179 * Modules for Interfacing with X Windows::
 180 * Modules for Internationalization::
 181
 182 Allocation of Objects in XEmacs Lisp
 183
 184 * Introduction to Allocation::
 185 * Garbage Collection::
 186 * GCPROing::
 187 * Garbage Collection - Step by Step::
 188 * Integers and Characters::
 189 * Allocation from Frob Blocks::
 190 * lrecords::
 191 * Low-level allocation::
 192 * Pure Space::
 193 * Cons::
 194 * Vector::
 195 * Bit Vector::
 196 * Symbol::
 197 * Marker::
 198 * String::
 199 * Compiled Function::
 200
 201 Garbage Collection - Step by Step
 202
 203 * Invocation::
 204 * garbage_collect_1::
 205 * mark_object::
 206 * gc_sweep::
 207 * sweep_lcrecords_1::
 208 * compact_string_chars::
 209 * sweep_strings::
 210 * sweep_bit_vectors_1::
 211
 212 Dumping
 213
 214 * Overview::
 215 * Data descriptions::
 216 * Dumping phase::
 217 * Reloading phase::
 218
 219 Dumping phase
 220
 221 * Object inventory::
 222 * Address allocation::
 223 * The header::
 224 * Data dumping::
 225 * Pointers dumping::
 226
 227 Events and the Event Loop
 228
 229 * Introduction to Events::
 230 * Main Loop::
 231 * Specifics of the Event Gathering Mechanism::
 232 * Specifics About the Emacs Event::
 233 * The Event Stream Callback Routines::
 234 * Other Event Loop Functions::
 235 * Converting Events::
 236 * Dispatching Events; The Command Builder::
 237
 238 Evaluation; Stack Frames; Bindings
 239
 240 * Evaluation::
 241 * Dynamic Binding; The specbinding Stack; Unwind-Protects::
 242 * Simple Special Forms::
 243 * Catch and Throw::
 244
 245 Symbols and Variables
 246
 247 * Introduction to Symbols::
 248 * Obarrays::
 249 * Symbol Values::
 250
 251 Buffers and Textual Representation
 252
 253 * Introduction to Buffers::     A buffer holds a block of text such as a file.
 254 * The Text in a Buffer::        Representation of the text in a buffer.
 255 * Buffer Lists::                Keeping track of all buffers.
 256 * Markers and Extents::         Tagging locations within a buffer.
 257 * Bufbytes and Emchars::        Representation of individual characters.
 258 * The Buffer Object::           The Lisp object corresponding to a buffer.
 259
 260 MULE Character Sets and Encodings
 261
 262 * Character Sets::
 263 * Encodings::
 264 * Internal Mule Encodings::
 265 * CCL::
 266
 267 Encodings
 268
 269 * Japanese EUC (Extended Unix Code)::
 270 * JIS7::
 271
 272 Internal Mule Encodings
 273
 274 * Internal String Encoding::
 275 * Internal Character Encoding::
 276
 277 Lstreams
 278
 279 * Creating an Lstream::         Creating an lstream object.
 280 * Lstream Types::               Different sorts of things that are streamed.
 281 * Lstream Functions::           Functions for working with lstreams.
 282 * Lstream Methods::             Creating new lstream types.
 283
 284 Consoles; Devices; Frames; Windows
 285
 286 * Introduction to Consoles; Devices; Frames; Windows::
 287 * Point::
 288 * Window Hierarchy::
 289 * The Window Object::
 290
 291 The Redisplay Mechanism
 292
 293 * Critical Redisplay Sections::
 294 * Line Start Cache::
 295 * Redisplay Piece by Piece::
 296
 297 Extents
 298
 299 * Introduction to Extents::     Extents are ranges over text, with properties.
 300 * Extent Ordering::             How extents are ordered internally.
 301 * Format of the Extent Info::   The extent information in a buffer or string.
 302 * Zero-Length Extents::         A weird special case.
 303 * Mathematics of Extent Ordering::  A rigorous foundation.
 304 * Extent Fragments::            Cached information useful for redisplay.
 305
 306 @end detailmenu
 307 @end menu
 308
 309 @node A History of Emacs, XEmacs From the Outside, Top, Top
 310 @chapter A History of Emacs
 311 @cindex history of Emacs
 312 @cindex Hackers (Steven Levy)
 313 @cindex Levy, Steven
 314 @cindex ITS (Incompatible Timesharing System)
 315 @cindex Stallman, Richard
 316 @cindex RMS
 317 @cindex MIT
 318 @cindex TECO
 319 @cindex FSF
 320 @cindex Free Software Foundation
 321
 322   XEmacs is a powerful, customizable text editor and development
 323 environment.  It began as Lucid Emacs, which was in turn derived from
 324 GNU Emacs, a program written by Richard Stallman of the Free Software
 325 Foundation.  GNU Emacs dates back to the 1970's, and was modelled
 326 after a package called ``Emacs'', written in 1976, that was a set of
 327 macros on top of TECO, an old, old text editor written at MIT on the
 328 DEC PDP 10 under one of the earliest time-sharing operating systems,
 329 ITS (Incompatible Timesharing System). (ITS dates back well before
 330 Unix.) ITS, TECO, and Emacs were products of a group of people at MIT
 331 who called themselves ``hackers'', who shared an idealistic belief
 332 system about the free exchange of information and were fanatical in
 333 their devotion to and time spent with computers. (The hacker
 334 subculture dates back to the late 1950's at MIT and is described in
 335 detail in Steven Levy's book @cite{Hackers}.  This book also includes
 336 a lot of information about Stallman himself and the development of
 337 Lisp, a programming language developed at MIT that underlies Emacs.)
 338
 339 @menu
 340 * Through Version 18::          Unification prevails.
 341 * Lucid Emacs::                 One version 19 Emacs.
 342 * GNU Emacs 19::                The other version 19 Emacs.
 343 * GNU Emacs 20::                The other version 20 Emacs.
 344 * XEmacs::                      The continuation of Lucid Emacs.
 345 @end menu
 346
 347 @node Through Version 18, Lucid Emacs, A History of Emacs, A History of Emacs
 348 @section Through Version 18
 349 @cindex Gosling, James
 350 @cindex Great Usenet Renaming
 351
 352   Although the history of the early versions of GNU Emacs is unclear,
 353 the history is well-known from the middle of 1985.  A time line is:
 354
 355 @itemize @bullet
 356 @item
 357 GNU Emacs version 15 (15.34) was released sometime in 1984 or 1985 and
 358 shared some code with a version of Emacs written by James Gosling (the
 359 same James Gosling who later created the Java language).
 360 @item
 361 GNU Emacs version 16 (first released version was 16.56) was released on
 362 July 15, 1985.  All Gosling code was removed due to potential copyright
 363 problems with the code.
 364 @item
 365 version 16.57: released on September 16, 1985.
 366 @item
 367 versions 16.58, 16.59: released on September 17, 1985.
 368 @item
 369 version 16.60: released on September 19, 1985.  These later version 16's
 370 incorporated patches from the net, esp. for getting Emacs to work under
 371 System V.
 372 @item
 373 version 17.36 (first official v17 release) released on December 20,
 374 1985.  Included a TeX-able user manual.  First official unpatched
 375 version that worked on vanilla System V machines.
 376 @item
 377 version 17.43 (second official v17 release) released on January 25,
 378 1986.
 379 @item
 380 version 17.45 released on January 30, 1986.
 381 @item
 382 version 17.46 released on February 4, 1986.
 383 @item
 384 version 17.48 released on February 10, 1986.
 385 @item
 386 version 17.49 released on February 12, 1986.
 387 @item
 388 version 17.55 released on March 18, 1986.
 389 @item
 390 version 17.57 released on March 27, 1986.
 391 @item
 392 version 17.58 released on April 4, 1986.
 393 @item
 394 version 17.61 released on April 12, 1986.
 395 @item
 396 version 17.63 released on May 7, 1986.
 397 @item
 398 version 17.64 released on May 12, 1986.
 399 @item
 400 version 18.24 (a beta version) released on October 2, 1986.
 401 @item
 402 version 18.30 (a beta version) released on November 15, 1986.
 403 @item
 404 version 18.31 (a beta version) released on November 23, 1986.
 405 @item
 406 version 18.32 (a beta version) released on December 7, 1986.
 407 @item
 408 version 18.33 (a beta version) released on December 12, 1986.
 409 @item
 410 version 18.35 (a beta version) released on January 5, 1987.
 411 @item
 412 version 18.36 (a beta version) released on January 21, 1987.
 413 @item
 414 January 27, 1987: The Great Usenet Renaming.  net.emacs is now
 415 comp.emacs.
 416 @item
 417 version 18.37 (a beta version) released on February 12, 1987.
 418 @item
 419 version 18.38 (a beta version) released on March 3, 1987.
 420 @item
 421 version 18.39 (a beta version) released on March 14, 1987.
 422 @item
 423 version 18.40 (a beta version) released on March 18, 1987.
 424 @item
 425 version 18.41 (the first ``official'' release) released on March 22,
 426 1987.
 427 @item
 428 version 18.45 released on June 2, 1987.
 429 @item
 430 version 18.46 released on June 9, 1987.
 431 @item
 432 version 18.47 released on June 18, 1987.
 433 @item
 434 version 18.48 released on September 3, 1987.
 435 @item
 436 version 18.49 released on September 18, 1987.
 437 @item
 438 version 18.50 released on February 13, 1988.
 439 @item
 440 version 18.51 released on May 7, 1988.
 441 @item
 442 version 18.52 released on September 1, 1988.
 443 @item
 444 version 18.53 released on February 24, 1989.
 445 @item
 446 version 18.54 released on April 26, 1989.
 447 @item
 448 version 18.55 released on August 23, 1989.  This is the earliest version
 449 that is still available by FTP.
 450 @item
 451 version 18.56 released on January 17, 1991.
 452 @item
 453 version 18.57 released late January, 1991.
 454 @item
 455 version 18.58 released ?????.
 456 @item
 457 version 18.59 released October 31, 1992.
 458 @end itemize
 459
 460 @node Lucid Emacs, GNU Emacs 19, Through Version 18, A History of Emacs
 461 @section Lucid Emacs
 462 @cindex Lucid Emacs
 463 @cindex Lucid Inc.
 464 @cindex Energize
 465 @cindex Epoch
 466
 467   Lucid Emacs was developed by the (now-defunct) Lucid Inc., a maker of
 468 C++ and Lisp development environments.  It began when Lucid decided they
 469 wanted to use Emacs as the editor and cornerstone of their C++
 470 development environment (called ``Energize'').  They needed many features
 471 that were not available in the existing version of GNU Emacs (version
 472 18.5something), in particular good and integrated support for GUI
 473 elements such as mouse support, multiple fonts, multiple window-system
 474 windows, etc.  A branch of GNU Emacs called Epoch, written at the
 475 University of Illinois, existed that supplied many of these features;
 476 however, Lucid needed more than what existed in Epoch.  At the time, the
 477 Free Software Foundation was working on version 19 of Emacs (this was
 478 sometime around 1991), which was planned to have similar features, and
 479 so Lucid decided to work with the Free Software Foundation.  Their plan
 480 was to add features that they needed, and coordinate with the FSF so
 481 that the features would get included back into Emacs version 19.
 482
 483   Delays in the release of version 19 occurred, however (resulting in it
 484 finally being released more than a year after what was initially
 485 planned), and Lucid encountered unexpected technical resistance in
 486 getting their changes merged back into version 19, so they decided to
 487 release their own version of Emacs, which became Lucid Emacs 19.0.
 488
 489 @cindex Zawinski, Jamie
 490 @cindex Sexton, Harlan
 491 @cindex Benson, Eric
 492 @cindex Devin, Matthieu
 493   The initial authors of Lucid Emacs were Matthieu Devin, Harlan Sexton,
 494 and Eric Benson, and the work was later taken over by Jamie Zawinski,
 495 who became ``Mr. Lucid Emacs'' for many releases.
 496
 497   A time line for Lucid Emacs/XEmacs is
 498
 499 @itemize @bullet
 500 @item
 501 version 19.0 shipped with Energize 1.0, April 1992.
 502 @item
 503 version 19.1 released June 4, 1992.
 504 @item
 505 version 19.2 released June 19, 1992.
 506 @item
 507 version 19.3 released September 9, 1992.
 508 @item
 509 version 19.4 released January 21, 1993.
 510 @item
 511 version 19.5 was a repackaging of 19.4 with a few bug fixes and
 512 shipped with Energize 2.0.  Never released to the net.
 513 @item
 514 version 19.6 released April 9, 1993.
 515 @item
 516 version 19.7 was a repackaging of 19.6 with a few bug fixes and
 517 shipped with Energize 2.1.  Never released to the net.
 518 @item
 519 version 19.8 released September 6, 1993.
 520 @item
 521 version 19.9 released January 12, 1994.
 522 @item
 523 version 19.10 released May 27, 1994.
 524 @item
 525 version 19.11 (first XEmacs) released September 13, 1994.
 526 @item
 527 version 19.12 released June 23, 1995.
 528 @item
 529 version 19.13 released September 1, 1995.
 530 @item
 531 version 19.14 released June 23, 1996.
 532 @item
 533 version 20.0 released February 9, 1997.
 534 @item
 535 version 19.15 released March 28, 1997.
 536 @item
 537 version 20.1 (not released to the net) April 15, 1997.
 538 @item
 539 version 20.2 released May 16, 1997.
 540 @item
 541 version 19.16 released October 31, 1997.
 542 @item
 543 version 20.3 (the first stable version of XEmacs 20.x) released November 30,
 544 1997.
 545 version 20.4 released February 28, 1998.
 546 @end itemize
 547
 548 @node GNU Emacs 19, GNU Emacs 20, Lucid Emacs, A History of Emacs
 549 @section GNU Emacs 19
 550 @cindex GNU Emacs 19
 551 @cindex FSF Emacs
 552
 553   About a year after the initial release of Lucid Emacs, the FSF
 554 released a beta of their version of Emacs 19 (referred to here as ``GNU
 555 Emacs'').  By this time, the current version of Lucid Emacs was
 556 19.6. (Strangely, the first released beta from the FSF was GNU Emacs
 557 19.7.) A time line for GNU Emacs version 19 is
 558
 559 @itemize @bullet
 560 @item
 561 version 19.8 (beta) released May 27, 1993.
 562 @item
 563 version 19.9 (beta) released May 27, 1993.
 564 @item
 565 version 19.10 (beta) released May 30, 1993.
 566 @item
 567 version 19.11 (beta) released June 1, 1993.
 568 @item
 569 version 19.12 (beta) released June 2, 1993.
 570 @item
 571 version 19.13 (beta) released June 8, 1993.
 572 @item
 573 version 19.14 (beta) released June 17, 1993.
 574 @item
 575 version 19.15 (beta) released June 19, 1993.
 576 @item
 577 version 19.16 (beta) released July 6, 1993.
 578 @item
 579 version 19.17 (beta) released late July, 1993.
 580 @item
 581 version 19.18 (beta) released August 9, 1993.
 582 @item
 583 version 19.19 (beta) released August 15, 1993.
 584 @item
 585 version 19.20 (beta) released November 17, 1993.
 586 @item
 587 version 19.21 (beta) released November 17, 1993.
 588 @item
 589 version 19.22 (beta) released November 28, 1993.
 590 @item
 591 version 19.23 (beta) released May 17, 1994.
 592 @item
 593 version 19.24 (beta) released May 16, 1994.
 594 @item
 595 version 19.25 (beta) released June 3, 1994.
 596 @item
 597 version 19.26 (beta) released September 11, 1994.
 598 @item
 599 version 19.27 (beta) released September 14, 1994.
 600 @item
 601 version 19.28 (first ``official'' release) released November 1, 1994.
 602 @item
 603 version 19.29 released June 21, 1995.
 604 @item
 605 version 19.30 released November 24, 1995.
 606 @item
 607 version 19.31 released May 25, 1996.
 608 @item
 609 version 19.32 released July 31, 1996.
 610 @item
 611 version 19.33 released August 11, 1996.
 612 @item
 613 version 19.34 released August 21, 1996.
 614 @item
 615 version 19.34b released September 6, 1996.
 616 @end itemize
 617
 618 @cindex Mlynarik, Richard
 619   In some ways, GNU Emacs 19 was better than Lucid Emacs; in some ways,
 620 worse.  Lucid soon began incorporating features from GNU Emacs 19 into
 621 Lucid Emacs; the work was mostly done by Richard Mlynarik, who had been
 622 working on and using GNU Emacs for a long time (back as far as version
 623 16 or 17).
 624
 625 @node GNU Emacs 20, XEmacs, GNU Emacs 19, A History of Emacs
 626 @section GNU Emacs 20
 627 @cindex GNU Emacs 20
 628 @cindex FSF Emacs
 629
 630 On February 2, 1997 work began on GNU Emacs to integrate Mule.  The first
 631 release was made in September of that year.
 632
 633 A timeline for Emacs 20 is
 634
 635 @itemize @bullet
 636 @item
 637 version 20.1 released September 17, 1997.
 638 @item
 639 version 20.2 released September 20, 1997.
 640 @item
 641 version 20.3 released August 19, 1998.
 642 @end itemize
 643
 644 @node XEmacs,  , GNU Emacs 20, A History of Emacs
 645 @section XEmacs
 646 @cindex XEmacs
 647
 648 @cindex Sun Microsystems
 649 @cindex University of Illinois
 650 @cindex Illinois, University of
 651 @cindex SPARCWorks
 652 @cindex Andreessen, Marc
 653 @cindex Baur, Steve
 654 @cindex Buchholz, Martin
 655 @cindex Kaplan, Simon
 656 @cindex Wing, Ben
 657 @cindex Thompson, Chuck
 658 @cindex Win-Emacs
 659 @cindex Epoch
 660 @cindex Amdahl Corporation
 661   Around the time that Lucid was developing Energize, Sun Microsystems
 662 was developing their own development environment (called ``SPARCWorks'')
 663 and also decided to use Emacs.  They joined forces with the Epoch team
 664 at the University of Illinois and later with Lucid.  The maintainer of
 665 the last-released version of Epoch was Marc Andreessen, but he dropped
 666 out and the Epoch project, headed by Simon Kaplan, lured Chuck Thompson
 667 away from a system administration job to become the primary Lucid Emacs
 668 author for Epoch and Sun.  Chuck's area of specialty became the
 669 redisplay engine (he replaced the old Lucid Emacs redisplay engine with
 670 a ported version from Epoch and then later rewrote it from scratch).
 671 Sun also hired Ben Wing (the author of Win-Emacs, a port of Lucid Emacs
 672 to Microsoft Windows 3.1) in 1993, for what was initially a one-month
 673 contract to fix some event problems but later became a many-year
 674 involvement, punctuated by a six-month contract with Amdahl Corporation.
 675
 676 @cindex rename to XEmacs
 677   In 1994, Sun and Lucid agreed to rename Lucid Emacs to XEmacs (a name
 678 not favorable to either company); the first release called XEmacs was
 679 version 19.11.  In June 1994, Lucid folded and Jamie quit to work for
 680 the newly formed Mosaic Communications Corp., later Netscape
 681 Communications Corp. (co-founded by the same Marc Andreessen, who had
 682 quit his Epoch job to work on a graphical browser for the World Wide
 683 Web).  Chuck then become the primary maintainer of XEmacs, and put out
 684 versions 19.11 through 19.14 in conjunction with Ben.  For 19.12 and
 685 19.13, Chuck added the new redisplay and many other display improvements
 686 and Ben added MULE support (support for Asian and other languages) and
 687 redesigned most of the internal Lisp subsystems to better support the
 688 MULE work and the various other features being added to XEmacs.  After
 689 19.14 Chuck retired as primary maintainer and Steve Baur stepped in.
 690
 691 @cindex MULE merged XEmacs appears
 692   Soon after 19.13 was released, work began in earnest on the MULE
 693 internationalization code and the source tree was divided into two
 694 development paths.  The MULE version was initially called 19.20, but was
 695 soon renamed to 20.0.  In 1996 Martin Buchholz of Sun Microsystems took
 696 over the care and feeding of it and worked on it in parallel with the
 697 19.14 development that was occurring at the same time.  After much work
 698 by Martin, it was decided to release 20.0 ahead of 19.15 in February
 699 1997.  The source tree remained divided until 20.2 when the version 19
 700 source was finally retired at version 19.16.
 701
 702 @cindex Baur, Steve
 703 @cindex Buchholz, Martin
 704 @cindex Jones, Kyle
 705 @cindex Niksic, Hrvoje
 706 @cindex XEmacs goes it alone
 707   In 1997, Sun finally dropped all pretense of support for XEmacs and
 708 Martin Buchholz left the company in November.  Since then, and mostly
 709 for the previous year, because Steve Baur was never paid to work on
 710 XEmacs, XEmacs has existed solely on the contributions of volunteers
 711 from the Free Software Community.  Starting from 1997, Hrvoje Niksic and
 712 Kyle Jones have figured prominently in XEmacs development.
 713
 714 @cindex merging attempts
 715   Many attempts have been made to merge XEmacs and GNU Emacs, but they
 716 have consistently failed.
 717
 718   A more detailed history is contained in the XEmacs About page.
 719
 720 @node XEmacs From the Outside, The Lisp Language, A History of Emacs, Top
 721 @chapter XEmacs From the Outside
 722 @cindex read-eval-print
 723
 724   XEmacs appears to the outside world as an editor, but it is really a
 725 Lisp environment.  At its heart is a Lisp interpreter; it also
 726 ``happens'' to contain many specialized object types (e.g. buffers,
 727 windows, frames, events) that are useful for implementing an editor.
 728 Some of these objects (in particular windows and frames) have
 729 displayable representations, and XEmacs provides a function
 730 @code{redisplay()} that ensures that the display of all such objects
 731 matches their internal state.  Most of the time, a standard Lisp
 732 environment is in a @dfn{read-eval-print} loop---i.e. ``read some Lisp
 733 code, execute it, and print the results''.  XEmacs has a similar loop:
 734
 735 @itemize @bullet
 736 @item
 737 read an event
 738 @item
 739 dispatch the event (i.e. ``do it'')
 740 @item
 741 redisplay
 742 @end itemize
 743
 744   Reading an event is done using the Lisp function @code{next-event},
 745 which waits for something to happen (typically, the user presses a key
 746 or moves the mouse) and returns an event object describing this.
 747 Dispatching an event is done using the Lisp function
 748 @code{dispatch-event}, which looks up the event in a keymap object (a
 749 particular kind of object that associates an event with a Lisp function)
 750 and calls that function.  The function ``does'' what the user has
 751 requested by changing the state of particular frame objects, buffer
 752 objects, etc.  Finally, @code{redisplay()} is called, which updates the
 753 display to reflect those changes just made.  Thus is an ``editor'' born.
 754
 755 @cindex bridge, playing
 756 @cindex taxes, doing
 757 @cindex pi, calculating
 758   Note that you do not have to use XEmacs as an editor; you could just
 759 as well make it do your taxes, compute pi, play bridge, etc.  You'd just
 760 have to write functions to do those operations in Lisp.
 761
 762 @node The Lisp Language, XEmacs From the Perspective of Building, XEmacs From the Outside, Top
 763 @chapter The Lisp Language
 764 @cindex Lisp vs. C
 765 @cindex C vs. Lisp
 766 @cindex Lisp vs. Java
 767 @cindex Java vs. Lisp
 768 @cindex dynamic scoping
 769 @cindex scoping, dynamic
 770 @cindex dynamic types
 771 @cindex types, dynamic
 772 @cindex Java
 773 @cindex Common Lisp
 774 @cindex Gosling, James
 775
 776   Lisp is a general-purpose language that is higher-level than C and in
 777 many ways more powerful than C.  Powerful dialects of Lisp such as
 778 Common Lisp are probably much better languages for writing very large
 779 applications than is C. (Unfortunately, for many non-technical
 780 reasons C and its successor C++ have become the dominant languages for
 781 application development.  These languages are both inadequate for
 782 extremely large applications, which is evidenced by the fact that newer,
 783 larger programs are becoming ever harder to write and are requiring ever
 784 more programmers despite great increases in C development environments;
 785 and by the fact that, although hardware speeds and reliability have been
 786 growing at an exponential rate, most software is still generally
 787 considered to be slow and buggy.)
 788
 789   The new Java language holds promise as a better general-purpose
 790 development language than C.  Java has many features in common with
 791 Lisp that are not shared by C (this is not a coincidence, since
 792 Java was designed by James Gosling, a former Lisp hacker).  This
 793 will be discussed more later.
 794
 795 For those used to C, here is a summary of the basic differences between
 796 C and Lisp:
 797
 798 @enumerate
 799 @item
 800 Lisp has an extremely regular syntax.  Every function, expression,
 801 and control statement is written in the form
 802
 803 @example
 804    (@var{func} @var{arg1} @var{arg2} ...)
 805 @end example
 806
 807 This is as opposed to C, which writes functions as
 808
 809 @example
 810    func(@var{arg1}, @var{arg2}, ...)
 811 @end example
 812
 813 but writes expressions involving operators as (e.g.)
 814
 815 @example
 816    @var{arg1} + @var{arg2}
 817 @end example
 818
 819 and writes control statements as (e.g.)
 820
 821 @example
 822    while (@var{expr}) @{ @var{statement1}; @var{statement2}; ... @}
 823 @end example
 824
 825 Lisp equivalents of the latter two would be
 826
 827 @example
 828    (+ @var{arg1} @var{arg2} ...)
 829 @end example
 830
 831 and
 832
 833 @example
 834    (while @var{expr} @var{statement1} @var{statement2} ...)
 835 @end example
 836
 837 @item
 838 Lisp is a safe language.  Assuming there are no bugs in the Lisp
 839 interpreter/compiler, it is impossible to write a program that ``core
 840 dumps'' or otherwise causes the machine to execute an illegal
 841 instruction.  This is very different from C, where perhaps the most
 842 common outcome of a bug is exactly such a crash.  A corollary of this is that
 843 the C operation of casting a pointer is impossible (and unnecessary) in
 844 Lisp, and that it is impossible to access memory outside the bounds of
 845 an array.
 846
 847 @item
 848 Programs and data are written in the same form.  The
 849 parenthesis-enclosing form described above for statements is the same
 850 form used for the most common data type in Lisp, the list.  Thus, it is
 851 possible to represent any Lisp program using Lisp data types, and for
 852 one program to construct Lisp statements and then dynamically
 853 @dfn{evaluate} them, or cause them to execute.
 854
 855 @item
 856 All objects are @dfn{dynamically typed}.  This means that part of every
 857 object is an indication of what type it is.  A Lisp program can
 858 manipulate an object without knowing what type it is, and can query an
 859 object to determine its type.  This means that, correspondingly,
 860 variables and function parameters can hold objects of any type and are
 861 not normally declared as being of any particular type.  This is opposed
 862 to the @dfn{static typing} of C, where variables can hold exactly one
 863 type of object and must be declared as such, and objects do not contain
 864 an indication of their type because it's implicit in the variables they
 865 are stored in.  It is possible in C to have a variable hold different
 866 types of objects (e.g. through the use of @code{void *} pointers or
 867 variable-argument functions), but the type information must then be
 868 passed explicitly in some other fashion, leading to additional program
 869 complexity.
 870
 871 @item
 872 Allocated memory is automatically reclaimed when it is no longer in use.
 873 This operation is called @dfn{garbage collection} and involves looking
 874 through all variables to see what memory is being pointed to, and
 875 reclaiming any memory that is not pointed to and is thus
 876 ``inaccessible'' and out of use.  This is as opposed to C, in which
 877 allocated memory must be explicitly reclaimed using @code{free()}.  If
 878 you simply drop all pointers to memory without freeing it, it becomes
 879 ``leaked'' memory that still takes up space.  Over a long period of
 880 time, this can cause your program to grow and grow until it runs out of
 881 memory.
 882
 883 @item
 884 Lisp has built-in facilities for handling errors and exceptions.  In C,
 885 when an error occurs, usually either the program exits entirely or the
 886 routine in which the error occurs returns a value indicating this.  If
 887 an error occurs in a deeply-nested routine, then every routine currently
 888 called must unwind itself normally and return an error value back up to
 889 the next routine.  This means that every routine must explicitly check
 890 for an error in all the routines it calls; if it does not do so,
 891 unexpected and often random behavior results.  This is an extremely
 892 common source of bugs in C programs.  An alternative would be to do a
 893 non-local exit using @code{longjmp()}, but that is often very dangerous
 894 because the routines that were exited past had no opportunity to clean
 895 up after themselves and may leave things in an inconsistent state,
 896 causing a crash shortly afterwards.
 897
 898 Lisp provides mechanisms to make such non-local exits safe.  When an
 899 error occurs, a routine simply signals that an error of a particular
 900 class has occurred, and a non-local exit takes place.  Any routine can
 901 trap errors occurring in routines it calls by registering an error
 902 handler for some or all classes of errors. (If no handler is registered,
 903 a default handler, generally installed by the top-level event loop, is
 904 executed; this prints out the error and continues.) Routines can also
 905 specify cleanup code (called an @dfn{unwind-protect}) that will be
 906 called when control exits from a block of code, no matter how that exit
 907 occurs---i.e. even if a function deeply nested below it causes a
 908 non-local exit back to the top level.
 909
 910 Note that this facility has appeared in some recent vintages of C, in
 911 particular Visual C++ and other PC compilers written for the Microsoft
 912 Win32 API.
 913
 914 @item
 915 In Emacs Lisp, local variables are @dfn{dynamically scoped}.  This means
 916 that if you declare a local variable in a particular function, and then
 917 call another function, that subfunction can ``see'' the local variable
 918 you declared.  This is actually considered a bug in Emacs Lisp and in
 919 all other early dialects of Lisp, and was corrected in Common Lisp. (In
 920 Common Lisp, you can still declare dynamically scoped variables if you
 921 want to---they are sometimes useful---but variables by default are
 922 @dfn{lexically scoped} as in C.)
 923 @end enumerate
 924
 925 For those familiar with Lisp, Emacs Lisp is modelled after MacLisp, an
 926 early dialect of Lisp developed at MIT (no relation to the Macintosh
 927 computer).  There is a Common Lisp compatibility package available for
 928 Emacs that provides many of the features of Common Lisp.
 929
 930 The Java language is derived in many ways from C, and shares a similar
 931 syntax, but has the following features in common with Lisp (and different
 932 from C):
 933
 934 @enumerate
 935 @item
 936 Java is a safe language, like Lisp.
 937 @item
 938 Java provides garbage collection, like Lisp.
 939 @item
 940 Java has built-in facilities for handling errors and exceptions, like
 941 Lisp.
 942 @item
 943 Java has a type system that combines the best advantages of both static
 944 and dynamic typing.  Objects (except very simple types) are explicitly
 945 marked with their type, as in dynamic typing; but there is a hierarchy
 946 of types and functions are declared to accept only certain types, thus
 947 providing the increased compile-time error-checking of static typing.
 948 @end enumerate
 949
 950 The Java language also has some negative attributes:
 951
 952 @enumerate
 953 @item
 954 Java uses the edit/compile/run model of software development.  This
 955 makes it hard to use interactively.  For example, to use Java like
 956 @code{bc} it is necessary to write a special purpose, albeit tiny,
 957 application.  In Emacs Lisp, a calculator comes built-in without any
 958 effort - one can always just type an expression in the @code{*scratch*}
 959 buffer.
 960 @item
 961 Java tries too hard to enforce, not merely enable, portability, making
 962 ordinary access to standard OS facilities painful.  Java has an
 963 @dfn{agenda}.  I think this is why @code{chdir} is not part of standard
 964 Java, which is inexcusable.
 965 @end enumerate
 966
 967 Unfortunately, there is no perfect language.  Static typing allows a
 968 compiler to catch programmer errors and produce more efficient code, but
 969 makes programming more tedious and less fun.  For the forseeable future,
 970 an Ideal Editing and Programming Environment (and that is what XEmacs
 971 aspires to) will be programmable in multiple languages: high level ones
 972 like Lisp for user customization and prototyping, and lower level ones
 973 for infrastructure and industrial strength applications.  If I had my
 974 way, XEmacs would be friendly towards the Python, Scheme, C++, ML,
 975 etc... communities.  But there are serious technical difficulties to
 976 achieving that goal.
 977
 978 The word @dfn{application} in the previous paragraph was used
 979 intentionally.  XEmacs implements an API for programs written in Lisp
 980 that makes it a full-fledged application platform, very much like an OS
 981 inside the real OS.
 982
 983 @node XEmacs From the Perspective of Building, XEmacs From the Inside, The Lisp Language, Top
 984 @chapter XEmacs From the Perspective of Building
 985
 986 The heart of XEmacs is the Lisp environment, which is written in C.
 987 This is contained in the @file{src/} subdirectory.  Underneath
 988 @file{src/} are two subdirectories of header files: @file{s/} (header
 989 files for particular operating systems) and @file{m/} (header files for
 990 particular machine types).  In practice the distinction between the two
 991 types of header files is blurred.  These header files define or undefine
 992 certain preprocessor constants and macros to indicate particular
 993 characteristics of the associated machine or operating system.  As part
 994 of the configure process, one @file{s/} file and one @file{m/} file is
 995 identified for the particular environment in which XEmacs is being
 996 built.
 997
 998 XEmacs also contains a great deal of Lisp code.  This implements the
 999 operations that make XEmacs useful as an editor as well as just a Lisp
1000 environment, and also contains many add-on packages that allow XEmacs to
1001 browse directories, act as a mail and Usenet news reader, compile Lisp
1002 code, etc.  There is actually more Lisp code than C code associated with
1003 XEmacs, but much of the Lisp code is peripheral to the actual operation
1004 of the editor.  The Lisp code all lies in subdirectories underneath the
1005 @file{lisp/} directory.
1006
1007 The @file{lwlib/} directory contains C code that implements a
1008 generalized interface onto different X widget toolkits and also
1009 implements some widgets of its own that behave like Motif widgets but
1010 are faster, free, and in some cases more powerful.  The code in this
1011 directory compiles into a library and is mostly independent from XEmacs.
1012
1013 The @file{etc/} directory contains various data files associated with
1014 XEmacs.  Some of them are actually read by XEmacs at startup; others
1015 merely contain useful information of various sorts.
1016
1017 The @file{lib-src/} directory contains C code for various auxiliary
1018 programs that are used in connection with XEmacs.  Some of them are used
1019 during the build process; others are used to perform certain functions
1020 that cannot conveniently be placed in the XEmacs executable (e.g. the
1021 @file{movemail} program for fetching mail out of @file{/var/spool/mail},
1022 which must be setgid to @file{mail} on many systems; and the
1023 @file{gnuclient} program, which allows an external script to communicate
1024 with a running XEmacs process).
1025
1026 The @file{man/} directory contains the sources for the XEmacs
1027 documentation.  It is mostly in a form called Texinfo, which can be
1028 converted into either a printed document (by passing it through @TeX{})
1029 or into on-line documentation called @dfn{info files}.
1030
1031 The @file{info/} directory contains the results of formatting the XEmacs
1032 documentation as @dfn{info files}, for on-line use.  These files are
1033 used when you enter the Info system using @kbd{C-h i} or through the
1034 Help menu.
1035
1036 The @file{dynodump/} directory contains auxiliary code used to build
1037 XEmacs on Solaris platforms.
1038
1039 The other directories contain various miscellaneous code and information
1040 that is not normally used or needed.
1041
1042 The first step of building involves running the @file{configure} program
1043 and passing it various parameters to specify any optional features you
1044 want and compiler arguments and such, as described in the @file{INSTALL}
1045 file.  This determines what the build environment is, chooses the
1046 appropriate @file{s/} and @file{m/} file, and runs a series of tests to
1047 determine many details about your environment, such as which library
1048 functions are available and exactly how they work.  The reason for
1049 running these tests is that it allows XEmacs to be compiled on a much
1050 wider variety of platforms than those that the XEmacs developers happen
1051 to be familiar with, including various sorts of hybrid platforms.  This
1052 is especially important now that many operating systems give you a great
1053 deal of control over exactly what features you want installed, and allow
1054 for easy upgrading of parts of a system without upgrading the rest.  It
1055 would be impossible to pre-determine and pre-specify the information for
1056 all possible configurations.
1057
1058 In fact, the @file{s/} and @file{m/} files are basically @emph{evil},
1059 since they contain unmaintainable platform-specific hard-coded
1060 information.  XEmacs has been moving in the direction of having all
1061 system-specific information be determined dynamically by
1062 @file{configure}.  Perhaps someday we can @code{rm -rf src/s src/m}.
1063
1064 When configure is done running, it generates @file{Makefile}s and
1065 @file{GNUmakefile}s and the file @file{src/config.h} (which describes
1066 the features of your system) from template files.  You then run
1067 @file{make}, which compiles the auxiliary code and programs in
1068 @file{lib-src/} and @file{lwlib/} and the main XEmacs executable in
1069 @file{src/}.  The result of compiling and linking is an executable
1070 called @file{temacs}, which is @emph{not} the final XEmacs executable.
1071 @file{temacs} by itself is not intended to function as an editor or even
1072 display any windows on the screen, and if you simply run it, it will
1073 exit immediately.  The @file{Makefile} runs @file{temacs} with certain
1074 options that cause it to initialize itself, read in a number of basic
1075 Lisp files, and then dump itself out into a new executable called
1076 @file{xemacs}.  This new executable has been pre-initialized and
1077 contains pre-digested Lisp code that is necessary for the editor to
1078 function (this includes most basic editing functions,
1079 e.g. @code{kill-line}, that can be defined in terms of other Lisp
1080 primitives; some initialization code that is called when certain
1081 objects, such as frames, are created; and all of the standard
1082 keybindings and code for the actions they result in).  This executable,
1083 @file{xemacs}, is the executable that you run to use the XEmacs editor.
1084
1085 Although @file{temacs} is not intended to be run as an editor, it can,
1086 by using the incantation @code{temacs -batch -l loadup.el run-temacs}.
1087 This is useful when the dumping procedure described above is broken, or
1088 when using certain program debugging tools such as Purify.  These tools
1089 get mighty confused by the tricks played by the XEmacs build process,
1090 such as allocation memory in one process, and freeing it in the next.
1091
1092 @node XEmacs From the Inside, The XEmacs Object System (Abstractly Speaking), XEmacs From the Perspective of Building, Top
1093 @chapter XEmacs From the Inside
1094
1095 Internally, XEmacs is quite complex, and can be very confusing.  To
1096 simplify things, it can be useful to think of XEmacs as containing an
1097 event loop that ``drives'' everything, and a number of other subsystems,
1098 such as a Lisp engine and a redisplay mechanism.  Each of these other
1099 subsystems exists simultaneously in XEmacs, and each has a certain
1100 state.  The flow of control continually passes in and out of these
1101 different subsystems in the course of normal operation of the editor.
1102
1103 It is important to keep in mind that, most of the time, the editor is
1104 ``driven'' by the event loop.  Except during initialization and batch
1105 mode, all subsystems are entered directly or indirectly through the
1106 event loop, and ultimately, control exits out of all subsystems back up
1107 to the event loop.  This cycle of entering a subsystem, exiting back out
1108 to the event loop, and starting another iteration of the event loop
1109 occurs once each keystroke, mouse motion, etc.
1110
1111 If you're trying to understand a particular subsystem (other than the
1112 event loop), think of it as a ``daemon'' process or ``servant'' that is
1113 responsible for one particular aspect of a larger system, and
1114 periodically receives commands or environment changes that cause it to
1115 do something.  Ultimately, these commands and environment changes are
1116 always triggered by the event loop.  For example:
1117
1118 @itemize @bullet
1119 @item
1120 The window and frame mechanism is responsible for keeping track of what
1121 windows and frames exist, what buffers are in them, etc.  It is
1122 periodically given commands (usually from the user) to make a change to
1123 the current window/frame state: i.e. create a new frame, delete a
1124 window, etc.
1125
1126 @item
1127 The buffer mechanism is responsible for keeping track of what buffers
1128 exist and what text is in them.  It is periodically given commands
1129 (usually from the user) to insert or delete text, create a buffer, etc.
1130 When it receives a text-change command, it notifies the redisplay
1131 mechanism.
1132
1133 @item
1134 The redisplay mechanism is responsible for making sure that windows and
1135 frames are displayed correctly.  It is periodically told (by the event
1136 loop) to actually ``do its job'', i.e. snoop around and see what the
1137 current state of the environment (mostly of the currently-existing
1138 windows, frames, and buffers) is, and make sure that that state matches
1139 what's actually displayed.  It keeps lots and lots of information around
1140 (such as what is actually being displayed currently, and what the
1141 environment was last time it checked) so that it can minimize the work
1142 it has to do.  It is also helped along in that whenever a relevant
1143 change to the environment occurs, the redisplay mechanism is told about
1144 this, so it has a pretty good idea of where it has to look to find
1145 possible changes and doesn't have to look everywhere.
1146
1147 @item
1148 The Lisp engine is responsible for executing the Lisp code in which most
1149 user commands are written.  It is entered through a call to @code{eval}
1150 or @code{funcall}, which occurs as a result of dispatching an event from
1151 the event loop.  The functions it calls issue commands to the buffer
1152 mechanism, the window/frame subsystem, etc.
1153
1154 @item
1155 The Lisp allocation subsystem is responsible for keeping track of Lisp
1156 objects.  It is given commands from the Lisp engine to allocate objects,
1157 garbage collect, etc.
1158 @end itemize
1159
1160 etc.
1161
1162   The important idea here is that there are a number of independent
1163 subsystems each with its own responsibility and persistent state, just
1164 like different employees in a company, and each subsystem is
1165 periodically given commands from other subsystems.  Commands can flow
1166 from any one subsystem to any other, but there is usually some sort of
1167 hierarchy, with all commands originating from the event subsystem.
1168
1169   XEmacs is entered in @code{main()}, which is in @file{emacs.c}.  When
1170 this is called the first time (in a properly-invoked @file{temacs}), it
1171 does the following:
1172
1173 @enumerate
1174 @item
1175 It does some very basic environment initializations, such as determining
1176 where it and its directories (e.g. @file{lisp/} and @file{etc/}) reside
1177 and setting up signal handlers.
1178 @item
1179 It initializes the entire Lisp interpreter.
1180 @item
1181 It sets the initial values of many built-in variables (including many
1182 variables that are visible to Lisp programs), such as the global keymap
1183 object and the built-in faces (a face is an object that describes the
1184 display characteristics of text).  This involves creating Lisp objects
1185 and thus is dependent on step (2).
1186 @item
1187 It performs various other initializations that are relevant to the
1188 particular environment it is running in, such as retrieving environment
1189 variables, determining the current date and the user who is running the
1190 program, examining its standard input, creating any necessary file
1191 descriptors, etc.
1192 @item
1193 At this point, the C initialization is complete.  A Lisp program that
1194 was specified on the command line (usually @file{loadup.el}) is called
1195 (temacs is normally invoked as @code{temacs -batch -l loadup.el dump}).
1196 @file{loadup.el} loads all of the other Lisp files that are needed for
1197 the operation of the editor, calls the @code{dump-emacs} function to
1198 write out @file{xemacs}, and then kills the temacs process.
1199 @end enumerate
1200
1201   When @file{xemacs} is then run, it only redoes steps (1) and (4)
1202 above; all variables already contain the values they were set to when
1203 the executable was dumped, and all memory that was allocated with
1204 @code{malloc()} is still around. (XEmacs knows whether it is being run
1205 as @file{xemacs} or @file{temacs} because it sets the global variable
1206 @code{initialized} to 1 after step (4) above.) At this point,
1207 @file{xemacs} calls a Lisp function to do any further initialization,
1208 which includes parsing the command-line (the C code can only do limited
1209 command-line parsing, which includes looking for the @samp{-batch} and
1210 @samp{-l} flags and a few other flags that it needs to know about before
1211 initialization is complete), creating the first frame (or @dfn{window}
1212 in standard window-system parlance), running the user's init file
1213 (usually the file @file{.emacs} in the user's home directory), etc.  The
1214 function to do this is usually called @code{normal-top-level};
1215 @file{loadup.el} tells the C code about this function by setting its
1216 name as the value of the Lisp variable @code{top-level}.
1217
1218   When the Lisp initialization code is done, the C code enters the event
1219 loop, and stays there for the duration of the XEmacs process.  The code
1220 for the event loop is contained in @file{keyboard.c}, and is called
1221 @code{Fcommand_loop_1()}.  Note that this event loop could very well be
1222 written in Lisp, and in fact a Lisp version exists; but apparently,
1223 doing this makes XEmacs run noticeably slower.
1224
1225   Notice how much of the initialization is done in Lisp, not in C.
1226 In general, XEmacs tries to move as much code as is possible
1227 into Lisp.  Code that remains in C is code that implements the
1228 Lisp interpreter itself, or code that needs to be very fast, or
1229 code that needs to do system calls or other such stuff that
1230 needs to be done in C, or code that needs to have access to
1231 ``forbidden'' structures. (One conscious aspect of the design of
1232 Lisp under XEmacs is a clean separation between the external
1233 interface to a Lisp object's functionality and its internal
1234 implementation.  Part of this design is that Lisp programs
1235 are forbidden from accessing the contents of the object other
1236 than through using a standard API.  In this respect, XEmacs Lisp
1237 is similar to modern Lisp dialects but differs from GNU Emacs,
1238 which tends to expose the implementation and allow Lisp
1239 programs to look at it directly.  The major advantage of
1240 hiding the implementation is that it allows the implementation
1241 to be redesigned without affecting any Lisp programs, including
1242 those that might want to be ``clever'' by looking directly at
1243 the object's contents and possibly manipulating them.)
1244
1245   Moving code into Lisp makes the code easier to debug and maintain and
1246 makes it much easier for people who are not XEmacs developers to
1247 customize XEmacs, because they can make a change with much less chance
1248 of obscure and unwanted interactions occurring than if they were to
1249 change the C code.
1250
1251 @node The XEmacs Object System (Abstractly Speaking), How Lisp Objects Are Represented in C, XEmacs From the Inside, Top
1252 @chapter The XEmacs Object System (Abstractly Speaking)
1253
1254   At the heart of the Lisp interpreter is its management of objects.
1255 XEmacs Lisp contains many built-in objects, some of which are
1256 simple and others of which can be very complex; and some of which
1257 are very common, and others of which are rarely used or are only
1258 used internally. (Since the Lisp allocation system, with its
1259 automatic reclamation of unused storage, is so much more convenient
1260 than @code{malloc()} and @code{free()}, the C code makes extensive use of it
1261 in its internal operations.)
1262
1263   The basic Lisp objects are
1264
1265 @table @code
1266 @item integer
1267 28 or 31 bits of precision, or 60 or 63 bits on 64-bit machines; the
1268 reason for this is described below when the internal Lisp object
1269 representation is described.
1270 @item float
1271 Same precision as a double in C.
1272 @item cons
1273 A simple container for two Lisp objects, used to implement lists and
1274 most other data structures in Lisp.
1275 @item char
1276 An object representing a single character of text; chars behave like
1277 integers in many ways but are logically considered text rather than
1278 numbers and have a different read syntax. (the read syntax for a char
1279 contains the char itself or some textual encoding of it---for example,
1280 a Japanese Kanji character might be encoded as @samp{^[$(B#&^[(B} using the
1281 ISO-2022 encoding standard---rather than the numerical representation
1282 of the char; this way, if the mapping between chars and integers
1283 changes, which is quite possible for Kanji characters and other extended
1284 characters, the same character will still be created.  Note that some
1285 primitives confuse chars and integers.  The worst culprit is @code{eq},
1286 which makes a special exception and considers a char to be @code{eq} to
1287 its integer equivalent, even though in no other case are objects of two
1288 different types @code{eq}.  The reason for this monstrosity is
1289 compatibility with existing code; the separation of char from integer
1290 came fairly recently.)
1291 @item symbol
1292 An object that contains Lisp objects and is referred to by name;
1293 symbols are used to implement variables and named functions
1294 and to provide the equivalent of preprocessor constants in C.
1295 @item vector
1296 A one-dimensional array of Lisp objects providing constant-time access
1297 to any of the objects; access to an arbitrary object in a vector is
1298 faster than for lists, but the operations that can be done on a vector
1299 are more limited.
1300 @item string
1301 Self-explanatory; behaves much like a vector of chars
1302 but has a different read syntax and is stored and manipulated
1303 more compactly.
1304 @item bit-vector
1305 A vector of bits; similar to a string in spirit.
1306 @item compiled-function
1307 An object containing compiled Lisp code, known as @dfn{byte code}.
1308 @item subr
1309 A Lisp primitive, i.e. a Lisp-callable function implemented in C.
1310 @end table
1311
1312 @cindex closure
1313 Note that there is no basic ``function'' type, as in more powerful
1314 versions of Lisp (where it's called a @dfn{closure}).  XEmacs Lisp does
1315 not provide the closure semantics implemented by Common Lisp and Scheme.
1316 The guts of a function in XEmacs Lisp are represented in one of four
1317 ways: a symbol specifying another function (when one function is an
1318 alias for another), a list (whose first element must be the symbol
1319 @code{lambda}) containing the function's source code, a
1320 compiled-function object, or a subr object. (In other words, given a
1321 symbol specifying the name of a function, calling @code{symbol-function}
1322 to retrieve the contents of the symbol's function cell will return one
1323 of these types of objects.)
1324
1325 XEmacs Lisp also contains numerous specialized objects used to implement
1326 the editor:
1327
1328 @table @code
1329 @item buffer
1330 Stores text like a string, but is optimized for insertion and deletion
1331 and has certain other properties that can be set.
1332 @item frame
1333 An object with various properties whose displayable representation is a
1334 @dfn{window} in window-system parlance.
1335 @item window
1336 A section of a frame that displays the contents of a buffer;
1337 often called a @dfn{pane} in window-system parlance.
1338 @item window-configuration
1339 An object that represents a saved configuration of windows in a frame.
1340 @item device
1341 An object representing a screen on which frames can be displayed;
1342 equivalent to a @dfn{display} in the X Window System and a @dfn{TTY} in
1343 character mode.
1344 @item face
1345 An object specifying the appearance of text or graphics; it has
1346 properties such as font, foreground color, and background color.
1347 @item marker
1348 An object that refers to a particular position in a buffer and moves
1349 around as text is inserted and deleted to stay in the same relative
1350 position to the text around it.
1351 @item extent
1352 Similar to a marker but covers a range of text in a buffer; can also
1353 specify properties of the text, such as a face in which the text is to
1354 be displayed, whether the text is invisible or unmodifiable, etc.
1355 @item event
1356 Generated by calling @code{next-event} and contains information
1357 describing a particular event happening in the system, such as the user
1358 pressing a key or a process terminating.
1359 @item keymap
1360 An object that maps from events (described using lists, vectors, and
1361 symbols rather than with an event object because the mapping is for
1362 classes of events, rather than individual events) to functions to
1363 execute or other events to recursively look up; the functions are
1364 described by name, using a symbol, or using lists to specify the
1365 function's code.
1366 @item glyph
1367 An object that describes the appearance of an image (e.g.  pixmap) on
1368 the screen; glyphs can be attached to the beginning or end of extents
1369 and in some future version of XEmacs will be able to be inserted
1370 directly into a buffer.
1371 @item process
1372 An object that describes a connection to an externally-running process.
1373 @end table
1374
1375   There are some other, less-commonly-encountered general objects:
1376
1377 @table @code
1378 @item hash-table
1379 An object that maps from an arbitrary Lisp object to another arbitrary
1380 Lisp object, using hashing for fast lookup.
1381 @item obarray
1382 A limited form of hash-table that maps from strings to symbols; obarrays
1383 are used to look up a symbol given its name and are not actually their
1384 own object type but are kludgily represented using vectors with hidden
1385 fields (this representation derives from GNU Emacs).
1386 @item specifier
1387 A complex object used to specify the value of a display property; a
1388 default value is given and different values can be specified for
1389 particular frames, buffers, windows, devices, or classes of device.
1390 @item char-table
1391 An object that maps from chars or classes of chars to arbitrary Lisp
1392 objects; internally char tables use a complex nested-vector
1393 representation that is optimized to the way characters are represented
1394 as integers.
1395 @item range-table
1396 An object that maps from ranges of integers to arbitrary Lisp objects.
1397 @end table
1398
1399   And some strange special-purpose objects:
1400
1401 @table @code
1402 @item charset
1403 @itemx coding-system
1404 Objects used when MULE, or multi-lingual/Asian-language, support is
1405 enabled.
1406 @item color-instance
1407 @itemx font-instance
1408 @itemx image-instance
1409 An object that encapsulates a window-system resource; instances are
1410 mostly used internally but are exposed on the Lisp level for cleanness
1411 of the specifier model and because it's occasionally useful for Lisp
1412 program to create or query the properties of instances.
1413 @item subwindow
1414 An object that encapsulate a @dfn{subwindow} resource, i.e. a
1415 window-system child window that is drawn into by an external process;
1416 this object should be integrated into the glyph system but isn't yet,
1417 and may change form when this is done.
1418 @item tooltalk-message
1419 @itemx tooltalk-pattern
1420 Objects that represent resources used in the ToolTalk interprocess
1421 communication protocol.
1422 @item toolbar-button
1423 An object used in conjunction with the toolbar.
1424 @end table
1425
1426   And objects that are only used internally:
1427
1428 @table @code
1429 @item opaque
1430 A generic object for encapsulating arbitrary memory; this allows you the
1431 generality of @code{malloc()} and the convenience of the Lisp object
1432 system.
1433 @item lstream
1434 A buffering I/O stream, used to provide a unified interface to anything
1435 that can accept output or provide input, such as a file descriptor, a
1436 stdio stream, a chunk of memory, a Lisp buffer, a Lisp string, etc.;
1437 it's a Lisp object to make its memory management more convenient.
1438 @item char-table-entry
1439 Subsidiary objects in the internal char-table representation.
1440 @item extent-auxiliary
1441 @itemx menubar-data
1442 @itemx toolbar-data
1443 Various special-purpose objects that are basically just used to
1444 encapsulate memory for particular subsystems, similar to the more
1445 general ``opaque'' object.
1446 @item symbol-value-forward
1447 @itemx symbol-value-buffer-local
1448 @itemx symbol-value-varalias
1449 @itemx symbol-value-lisp-magic
1450 Special internal-only objects that are placed in the value cell of a
1451 symbol to indicate that there is something special with this variable --
1452 e.g. it has no value, it mirrors another variable, or it mirrors some C
1453 variable; there is really only one kind of object, called a
1454 @dfn{symbol-value-magic}, but it is sort-of halfway kludged into
1455 semi-different object types.
1456 @end table
1457
1458 @cindex permanent objects
1459 @cindex temporary objects
1460   Some types of objects are @dfn{permanent}, meaning that once created,
1461 they do not disappear until explicitly destroyed, using a function such
1462 as @code{delete-buffer}, @code{delete-window}, @code{delete-frame}, etc.
1463 Others will disappear once they are not longer used, through the garbage
1464 collection mechanism.  Buffers, frames, windows, devices, and processes
1465 are among the objects that are permanent.  Note that some objects can go
1466 both ways: Faces can be created either way; extents are normally
1467 permanent, but detached extents (extents not referring to any text, as
1468 happens to some extents when the text they are referring to is deleted)
1469 are temporary.  Note that some permanent objects, such as faces and
1470 coding systems, cannot be deleted.  Note also that windows are unique in
1471 that they can be @emph{undeleted} after having previously been
1472 deleted. (This happens as a result of restoring a window configuration.)
1473
1474 @cindex read syntax
1475   Note that many types of objects have a @dfn{read syntax}, i.e. a way of
1476 specifying an object of that type in Lisp code.  When you load a Lisp
1477 file, or type in code to be evaluated, what really happens is that the
1478 function @code{read} is called, which reads some text and creates an object
1479 based on the syntax of that text; then @code{eval} is called, which
1480 possibly does something special; then this loop repeats until there's
1481 no more text to read. (@code{eval} only actually does something special
1482 with symbols, which causes the symbol's value to be returned,
1483 similar to referencing a variable; and with conses [i.e. lists],
1484 which cause a function invocation.  All other values are returned
1485 unchanged.)
1486
1487   The read syntax
1488
1489 @example
1490 17297
1491 @end example
1492
1493 converts to an integer whose value is 17297.
1494
1495 @example
1496 1.983e-4
1497 @end example
1498
1499 converts to a float whose value is 1.983e-4, or .0001983.
1500
1501 @example
1502 ?b
1503 @end example
1504
1505 converts to a char that represents the lowercase letter b.
1506
1507 @example
1508 ?^[$(B#&^[(B
1509 @end example
1510
1511 (where @samp{^[} actually is an @samp{ESC} character) converts to a
1512 particular Kanji character when using an ISO2022-based coding system for
1513 input. (To decode this goo: @samp{ESC} begins an escape sequence;
1514 @samp{ESC $ (} is a class of escape sequences meaning ``switch to a
1515 94x94 character set''; @samp{ESC $ ( B} means ``switch to Japanese
1516 Kanji''; @samp{#} and @samp{&} collectively index into a 94-by-94 array
1517 of characters [subtract 33 from the ASCII value of each character to get
1518 the corresponding index]; @samp{ESC (} is a class of escape sequences
1519 meaning ``switch to a 94 character set''; @samp{ESC (B} means ``switch
1520 to US ASCII''.  It is a coincidence that the letter @samp{B} is used to
1521 denote both Japanese Kanji and US ASCII.  If the first @samp{B} were
1522 replaced with an @samp{A}, you'd be requesting a Chinese Hanzi character
1523 from the GB2312 character set.)
1524
1525 @example
1526 "foobar"
1527 @end example
1528
1529 converts to a string.
1530
1531 @example
1532 foobar
1533 @end example
1534
1535 converts to a symbol whose name is @code{"foobar"}.  This is done by
1536 looking up the string equivalent in the global variable
1537 @code{obarray}, whose contents should be an obarray.  If no symbol
1538 is found, a new symbol with the name @code{"foobar"} is automatically
1539 created and added to @code{obarray}; this process is called
1540 @dfn{interning} the symbol.
1541 @cindex interning
1542
1543 @example
1544 (foo . bar)
1545 @end example
1546
1547 converts to a cons cell containing the symbols @code{foo} and @code{bar}.
1548
1549 @example
1550 (1 a 2.5)
1551 @end example
1552
1553 converts to a three-element list containing the specified objects
1554 (note that a list is actually a set of nested conses; see the
1555 XEmacs Lisp Reference).
1556
1557 @example
1558 [1 a 2.5]
1559 @end example
1560
1561 converts to a three-element vector containing the specified objects.
1562
1563 @example
1564 #[... ... ... ...]
1565 @end example
1566
1567 converts to a compiled-function object (the actual contents are not
1568 shown since they are not relevant here; look at a file that ends with
1569 @file{.elc} for examples).
1570
1571 @example
1572 #*01110110
1573 @end example
1574
1575 converts to a bit-vector.
1576
1577 @example
1578 #s(hash-table ... ...)
1579 @end example
1580
1581 converts to a hash table (the actual contents are not shown).
1582
1583 @example
1584 #s(range-table ... ...)
1585 @end example
1586
1587 converts to a range table (the actual contents are not shown).
1588
1589 @example
1590 #s(char-table ... ...)
1591 @end example
1592
1593 converts to a char table (the actual contents are not shown).
1594
1595 Note that the @code{#s()} syntax is the general syntax for structures,
1596 which are not really implemented in XEmacs Lisp but should be.
1597
1598 When an object is printed out (using @code{print} or a related
1599 function), the read syntax is used, so that the same object can be read
1600 in again.
1601
1602 The other objects do not have read syntaxes, usually because it does not
1603 really make sense to create them in this fashion (i.e.  processes, where
1604 it doesn't make sense to have a subprocess created as a side effect of
1605 reading some Lisp code), or because they can't be created at all
1606 (e.g. subrs).  Permanent objects, as a rule, do not have a read syntax;
1607 nor do most complex objects, which contain too much state to be easily
1608 initialized through a read syntax.
1609
1610 @node How Lisp Objects Are Represented in C, Rules When Writing New C Code, The XEmacs Object System (Abstractly Speaking), Top
1611 @chapter How Lisp Objects Are Represented in C
1612
1613 Lisp objects are represented in C using a 32-bit or 64-bit machine word
1614 (depending on the processor; i.e. DEC Alphas use 64-bit Lisp objects and
1615 most other processors use 32-bit Lisp objects).  The representation
1616 stuffs a pointer together with a tag, as follows:
1617
1618 @example
1619  [ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ]
1620  [ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ]
1621
1622    <---> ^ <------------------------------------------------------>
1623     tag  |       a pointer to a structure, or an integer
1624          |
1625        mark bit
1626 @end example
1627
1628 The tag describes the type of the Lisp object.  For integers and chars,
1629 the lower 28 bits contain the value of the integer or char; for all
1630 others, the lower 28 bits contain a pointer.  The mark bit is used
1631 during garbage-collection, and is always 0 when garbage collection is
1632 not happening. (The way that garbage collection works, basically, is that it
1633 loops over all places where Lisp objects could exist---this includes
1634 all global variables in C that contain Lisp objects [including
1635 @code{Vobarray}, the C equivalent of @code{obarray}; through this, all
1636 Lisp variables will get marked], plus various other places---and
1637 recursively scans through the Lisp objects, marking each object it finds
1638 by setting the mark bit.  Then it goes through the lists of all objects
1639 allocated, freeing the ones that are not marked and turning off the mark
1640 bit of the ones that are marked.)
1641
1642 Lisp objects use the typedef @code{Lisp_Object}, but the actual C type
1643 used for the Lisp object can vary.  It can be either a simple type
1644 (@code{long} on the DEC Alpha, @code{int} on other machines) or a
1645 structure whose fields are bit fields that line up properly (actually, a
1646 union of structures is used).  Generally the simple integral type is
1647 preferable because it ensures that the compiler will actually use a
1648 machine word to represent the object (some compilers will use more
1649 general and less efficient code for unions and structs even if they can
1650 fit in a machine word).  The union type, however, has the advantage of
1651 stricter type checking (if you accidentally pass an integer where a Lisp
1652 object is desired, you get a compile error), and it makes it easier to
1653 decode Lisp objects when debugging.  The choice of which type to use is
1654 determined by the preprocessor constant @code{USE_UNION_TYPE} which is
1655 defined via the @code{--use-union-type} option to @code{configure}.
1656
1657 @cindex record type
1658
1659 Note that there are only eight types that the tag can represent, but
1660 many more actual types than this.  This is handled by having one of the
1661 tag types specify a meta-type called a @dfn{record}; for all such
1662 objects, the first four bytes of the pointed-to structure indicate what
1663 the actual type is.
1664
1665 Note also that having 28 bits for pointers and integers restricts a lot
1666 of things to 256 megabytes of memory. (Basically, enough pointers and
1667 indices and whatnot get stuffed into Lisp objects that the total amount
1668 of memory used by XEmacs can't grow above 256 megabytes.  In older
1669 versions of XEmacs and GNU Emacs, the tag was 5 bits wide, allowing for
1670 32 types, which was more than the actual number of types that existed at
1671 the time, and no ``record'' type was necessary.  However, this limited
1672 the editor to 64 megabytes total, which some users who edited large
1673 files might conceivably exceed.)
1674
1675 Also, note that there is an implicit assumption here that all pointers
1676 are low enough that the top bits are all zero and can just be chopped
1677 off.  On standard machines that allocate memory from the bottom up (and
1678 give each process its own address space), this works fine.  Some
1679 machines, however, put the data space somewhere else in memory
1680 (e.g. beginning at 0x80000000).  Those machines cope by defining
1681 @code{DATA_SEG_BITS} in the corresponding @file{m/} or @file{s/} file to
1682 the proper mask.  Then, pointers retrieved from Lisp objects are
1683 automatically OR'ed with this value prior to being used.
1684
1685 A corollary of the previous paragraph is that @strong{(pointers to)
1686 stack-allocated structures cannot be put into Lisp objects}.  The stack
1687 is generally located near the top of memory; if you put such a pointer
1688 into a Lisp object, it will get its top bits chopped off, and you will
1689 lose.
1690
1691 Actually, there's an alternative representation of a @code{Lisp_Object},
1692 invented by Kyle Jones, that is used when the
1693 @code{--use-minimal-tagbits} option to @code{configure} is used.  In
1694 this case the 2 lower bits are used for the tag bits.  This
1695 representation assumes that pointers to structs are always aligned to
1696 multiples of 4, so the lower 2 bits are always zero.
1697
1698 @example
1699  [ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ]
1700  [ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ]
1701
1702    <---------------------------------------------------------> <->
1703             a pointer to a structure, or an integer            tag
1704 @end example
1705
1706 A tag of 00 is used for all pointer object types, a tag of 10 is used
1707 for characters, and the other two tags 01 and 11 are joined together to
1708 form the integer object type.  The markbit is moved to part of the
1709 structure being pointed at (integers and chars do not need to be marked,
1710 since no memory is allocated).  This representation has these
1711 advantages:
1712
1713 @enumerate
1714 @item
1715 31 bits can be used for Lisp Integers.
1716 @item
1717 @emph{Any} pointer can be represented directly, and no bit masking
1718 operations are necessary.
1719 @end enumerate
1720
1721 The disadvantages are:
1722
1723 @enumerate
1724 @item
1725 An extra level of indirection is needed when accessing the object types
1726 that were not record types.  So checking whether a Lisp object is a cons
1727 cell becomes a slower operation.
1728 @item
1729 Mark bits can no longer be stored directly in Lisp objects, so another
1730 place for them must be found.  This means that a cons cell requires more
1731 memory than merely room for 2 lisp objects, leading to extra memory use.
1732 @end enumerate
1733
1734 Various macros are used to construct Lisp objects and extract the
1735 components.  Macros of the form @code{XINT()}, @code{XCHAR()},
1736 @code{XSTRING()}, @code{XSYMBOL()}, etc. mask out the pointer/integer
1737 field and cast it to the appropriate type.  All of the macros that
1738 construct pointers will @code{OR} with @code{DATA_SEG_BITS} if
1739 necessary.  @code{XINT()} needs to be a bit tricky so that negative
1740 numbers are properly sign-extended: Usually it does this by shifting the
1741 number four bits to the left and then four bits to the right.  This
1742 assumes that the right-shift operator does an arithmetic shift (i.e. it
1743 leaves the most-significant bit as-is rather than shifting in a zero, so
1744 that it mimics a divide-by-two even for negative numbers).  Not all
1745 machines/compilers do this, and on the ones that don't, a more
1746 complicated definition is selected by defining
1747 @code{EXPLICIT_SIGN_EXTEND}.
1748
1749 Note that when @code{ERROR_CHECK_TYPECHECK} is defined, the extractor
1750 macros become more complicated---they check the tag bits and/or the
1751 type field in the first four bytes of a record type to ensure that the
1752 object is really of the correct type.  This is great for catching places
1753 where an incorrect type is being dereferenced---this typically results
1754 in a pointer being dereferenced as the wrong type of structure, with
1755 unpredictable (and sometimes not easily traceable) results.
1756
1757 There are similar @code{XSET@var{TYPE}()} macros that construct a Lisp
1758 object.  These macros are of the form @code{XSET@var{TYPE}
1759 (@var{lvalue}, @var{result})},
1760 i.e. they have to be a statement rather than just used in an expression.
1761 The reason for this is that standard C doesn't let you ``construct'' a
1762 structure (but GCC does).  Granted, this sometimes isn't too convenient;
1763 for the case of integers, at least, you can use the function
1764 @code{make_int()}, which constructs and @emph{returns} an integer
1765 Lisp object.  Note that the @code{XSET@var{TYPE}()} macros are also
1766 affected by @code{ERROR_CHECK_TYPECHECK} and make sure that the
1767 structure is of the right type in the case of record types, where the
1768 type is contained in the structure.
1769
1770 The C programmer is responsible for @strong{guaranteeing} that a
1771 Lisp_Object is is the correct type before using the @code{X@var{TYPE}}
1772 macros.  This is especially important in the case of lists.  Use
1773 @code{XCAR} and @code{XCDR} if a Lisp_Object is certainly a cons cell,
1774 else use @code{Fcar()} and @code{Fcdr()}.  Trust other C code, but not
1775 Lisp code.  On the other hand, if XEmacs has an internal logic error,
1776 it's better to crash immediately, so sprinkle ``unreachable''
1777 @code{abort()}s liberally about the source code.
1778
1779 @node Rules When Writing New C Code, A Summary of the Various XEmacs Modules, How Lisp Objects Are Represented in C, Top
1780 @chapter Rules When Writing New C Code
1781
1782 The XEmacs C Code is extremely complex and intricate, and there are many
1783 rules that are more or less consistently followed throughout the code.
1784 Many of these rules are not obvious, so they are explained here.  It is
1785 of the utmost importance that you follow them.  If you don't, you may
1786 get something that appears to work, but which will crash in odd
1787 situations, often in code far away from where the actual breakage is.
1788
1789 @menu
1790 * General Coding Rules::
1791 * Writing Lisp Primitives::
1792 * Adding Global Lisp Variables::
1793 * Coding for Mule::
1794 * Techniques for XEmacs Developers::
1795 @end menu
1796
1797 @node General Coding Rules, Writing Lisp Primitives, Rules When Writing New C Code, Rules When Writing New C Code
1798 @section General Coding Rules
1799
1800 The C code is actually written in a dialect of C called @dfn{Clean C},
1801 meaning that it can be compiled, mostly warning-free, with either a C or
1802 C++ compiler.  Coding in Clean C has several advantages over plain C.
1803 C++ compilers are more nit-picking, and a number of coding errors have
1804 been found by compiling with C++.  The ability to use both C and C++
1805 tools means that a greater variety of development tools are available to
1806 the developer.
1807
1808 Almost every module contains a @code{syms_of_*()} function and a
1809 @code{vars_of_*()} function.  The former declares any Lisp primitives
1810 you have defined and defines any symbols you will be using.  The latter
1811 declares any global Lisp variables you have added and initializes global
1812 C variables in the module.  For each such function, declare it in
1813 @file{symsinit.h} and make sure it's called in the appropriate place in
1814 @file{emacs.c}.  @strong{Important}: There are stringent requirements on
1815 exactly what can go into these functions.  See the comment in
1816 @file{emacs.c}.  The reason for this is to avoid obscure unwanted
1817 interactions during initialization.  If you don't follow these rules,
1818 you'll be sorry!  If you want to do anything that isn't allowed, create
1819 a @code{complex_vars_of_*()} function for it.  Doing this is tricky,
1820 though: You have to make sure your function is called at the right time
1821 so that all the initialization dependencies work out.
1822
1823 Every module includes @file{<config.h>} (angle brackets so that
1824 @samp{--srcdir} works correctly; @file{config.h} may or may not be in
1825 the same directory as the C sources) and @file{lisp.h}.  @file{config.h}
1826 must always be included before any other header files (including
1827 system header files) to ensure that certain tricks played by various
1828 @file{s/} and @file{m/} files work out correctly.
1829
1830 When including header files, always use angle brackets, not double
1831 quotes, except when the file to be included is in the same directory as
1832 the including file.  If either file is a generated file, then that is
1833 not likely to be the case.  In order to understand why we have this
1834 rule, imagine what happens when you do a build in the source directory
1835 using @samp{./configure} and another build in another directory using
1836 @samp{../work/configure}.  There will be two different @file{config.h}
1837 files.  Which one will be used if you @samp{#include "config.h"}?
1838
1839 @strong{All global and static variables that are to be modifiable must
1840 be declared uninitialized.}  This means that you may not use the
1841 ``declare with initializer'' form for these variables, such as @code{int
1842 some_variable = 0;}.  The reason for this has to do with some kludges
1843 done during the dumping process: If possible, the initialized data
1844 segment is re-mapped so that it becomes part of the (unmodifiable) code
1845 segment in the dumped executable.  This allows this memory to be shared
1846 among multiple running XEmacs processes.  XEmacs is careful to place as
1847 much constant data as possible into initialized variables (in
1848 particular, into what's called the @dfn{pure space}---see below) during
1849 the @file{temacs} phase.
1850
1851 @cindex copy-on-write
1852 @strong{Please note:} This kludge only works on a few systems nowadays,
1853 and is rapidly becoming irrelevant because most modern operating systems
1854 provide @dfn{copy-on-write} semantics.  All data is initially shared
1855 between processes, and a private copy is automatically made (on a
1856 page-by-page basis) when a process first attempts to write to a page of
1857 memory.
1858
1859 Formerly, there was a requirement that static variables not be declared
1860 inside of functions.  This had to do with another hack along the same
1861 vein as what was just described: old USG systems put statically-declared
1862 variables in the initialized data space, so those header files had a
1863 @code{#define static} declaration. (That way, the data-segment remapping
1864 described above could still work.) This fails badly on static variables
1865 inside of functions, which suddenly become automatic variables;
1866 therefore, you weren't supposed to have any of them.  This awful kludge
1867 has been removed in XEmacs because
1868
1869 @enumerate
1870 @item
1871 almost all of the systems that used this kludge ended up having
1872 to disable the data-segment remapping anyway;
1873 @item
1874 the only systems that didn't were extremely outdated ones;
1875 @item
1876 this hack completely messed up inline functions.
1877 @end enumerate
1878
1879 The C source code makes heavy use of C preprocessor macros.  One popular
1880 macro style is:
1881
1882 @example
1883 #define FOO(var, value) do @{           \
1884   Lisp_Object FOO_value = (value);      \
1885   ... /* compute using FOO_value */     \
1886   (var) = bar;                          \
1887 @} while (0)
1888 @end example
1889
1890 The @code{do @{...@} while (0)} is a standard trick to allow FOO to have
1891 statement semantics, so that it can safely be used within an @code{if}
1892 statement in C, for example.  Multiple evaluation is prevented by
1893 copying a supplied argument into a local variable, so that
1894 @code{FOO(var,fun(1))} only calls @code{fun} once.
1895
1896 Lisp lists are popular data structures in the C code as well as in
1897 Elisp.  There are two sets of macros that iterate over lists.
1898 @code{EXTERNAL_LIST_LOOP_@var{n}} should be used when the list has been
1899 supplied by the user, and cannot be trusted to be acyclic and
1900 nil-terminated.  A @code{malformed-list} or @code{circular-list} error
1901 will be generated if the list being iterated over is not entirely
1902 kosher.  @code{LIST_LOOP_@var{n}}, on the other hand, is faster and less
1903 safe, and can be used only on trusted lists.
1904
1905 Related macros are @code{GET_EXTERNAL_LIST_LENGTH} and
1906 @code{GET_LIST_LENGTH}, which calculate the length of a list, and in the
1907 case of @code{GET_EXTERNAL_LIST_LENGTH}, validating the properness of
1908 the list.  The macros @code{EXTERNAL_LIST_LOOP_DELETE_IF} and
1909 @code{LIST_LOOP_DELETE_IF} delete elements from a lisp list satisfying some
1910 predicate.
1911
1912 @node Writing Lisp Primitives, Adding Global Lisp Variables, General Coding Rules, Rules When Writing New C Code
1913 @section Writing Lisp Primitives
1914
1915 Lisp primitives are Lisp functions implemented in C.  The details of
1916 interfacing the C function so that Lisp can call it are handled by a few
1917 C macros.  The only way to really understand how to write new C code is
1918 to read the source, but we can explain some things here.
1919
1920 An example of a special form is the definition of @code{prog1}, from
1921 @file{eval.c}.  (An ordinary function would have the same general
1922 appearance.)
1923
1924 @cindex garbage collection protection
1925 @smallexample
1926 @group
1927 DEFUN ("prog1", Fprog1, 1, UNEVALLED, 0, /*
1928 Similar to `progn', but the value of the first form is returned.
1929 \(prog1 FIRST BODY...): All the arguments are evaluated sequentially.
1930 The value of FIRST is saved during evaluation of the remaining args,
1931 whose values are discarded.
1932 */
1933        (args))
1934 @{
1935   /* This function can GC */
1936   REGISTER Lisp_Object val, form, tail;
1937   struct gcpro gcpro1;
1938
1939   val = Feval (XCAR (args));
1940
1941   GCPRO1 (val);
1942
1943   LIST_LOOP_3 (form, XCDR (args), tail)
1944     Feval (form);
1945
1946   UNGCPRO;
1947   return val;
1948 @}
1949 @end group
1950 @end smallexample
1951
1952   Let's start with a precise explanation of the arguments to the
1953 @code{DEFUN} macro.  Here is a template for them:
1954
1955 @example
1956 @group
1957 DEFUN (@var{lname}, @var{fname}, @var{min_args}, @var{max_args}, @var{interactive}, /*
1958 @var{docstring}
1959 */
1960    (@var{arglist}))
1961 @end group
1962 @end example
1963
1964 @table @var
1965 @item lname
1966 This string is the name of the Lisp symbol to define as the function
1967 name; in the example above, it is @code{"prog1"}.
1968
1969 @item fname
1970 This is the C function name for this function.  This is the name that is
1971 used in C code for calling the function.  The name is, by convention,
1972 @samp{F} prepended to the Lisp name, with all dashes (@samp{-}) in the
1973 Lisp name changed to underscores.  Thus, to call this function from C
1974 code, call @code{Fprog1}.  Remember that the arguments are of type
1975 @code{Lisp_Object}; various macros and functions for creating values of
1976 type @code{Lisp_Object} are declared in the file @file{lisp.h}.
1977
1978 Primitives whose names are special characters (e.g. @code{+} or
1979 @code{<}) are named by spelling out, in some fashion, the special
1980 character: e.g. @code{Fplus()} or @code{Flss()}.  Primitives whose names
1981 begin with normal alphanumeric characters but also contain special
1982 characters are spelled out in some creative way, e.g. @code{let*}
1983 becomes @code{FletX()}.
1984
1985 Each function also has an associated structure that holds the data for
1986 the subr object that represents the function in Lisp.  This structure
1987 conveys the Lisp symbol name to the initialization routine that will
1988 create the symbol and store the subr object as its definition.  The C
1989 variable name of this structure is always @samp{S} prepended to the
1990 @var{fname}.  You hardly ever need to be aware of the existence of this
1991 structure, since @code{DEFUN} plus @code{DEFSUBR} takes care of all the
1992 details.
1993
1994 @item min_args
1995 This is the minimum number of arguments that the function requires.  The
1996 function @code{prog1} allows a minimum of one argument.
1997
1998 @item max_args
1999 This is the maximum number of arguments that the function accepts, if
2000 there is a fixed maximum.  Alternatively, it can be @code{UNEVALLED},
2001 indicating a special form that receives unevaluated arguments, or
2002 @code{MANY}, indicating an unlimited number of evaluated arguments (the
2003 C equivalent of @code{&rest}).  Both @code{UNEVALLED} and @code{MANY}
2004 are macros.  If @var{max_args} is a number, it may not be less than
2005 @var{min_args} and it may not be greater than 8. (If you need to add a
2006 function with more than 8 arguments, use the @code{MANY} form.  Resist
2007 the urge to edit the definition of @code{DEFUN} in @file{lisp.h}.  If
2008 you do it anyways, make sure to also add another clause to the switch
2009 statement in @code{primitive_funcall().})
2010
2011 @item interactive
2012 This is an interactive specification, a string such as might be used as
2013 the argument of @code{interactive} in a Lisp function.  In the case of
2014 @code{prog1}, it is 0 (a null pointer), indicating that @code{prog1}
2015 cannot be called interactively.  A value of @code{""} indicates a
2016 function that should receive no arguments when called interactively.
2017
2018 @item docstring
2019 This is the documentation string.  It is written just like a
2020 documentation string for a function defined in Lisp; in particular, the
2021 first line should be a single sentence.  Note how the documentation
2022 string is enclosed in a comment, none of the documentation is placed on
2023 the same lines as the comment-start and comment-end characters, and the
2024 comment-start characters are on the same line as the interactive
2025 specification.  @file{make-docfile}, which scans the C files for
2026 documentation strings, is very particular about what it looks for, and
2027 will not properly extract the doc string if it's not in this exact format.
2028
2029 In order to make both @file{etags} and @file{make-docfile} happy, make
2030 sure that the @code{DEFUN} line contains the @var{lname} and
2031 @var{fname}, and that the comment-start characters for the doc string
2032 are on the same line as the interactive specification, and put a newline
2033 directly after them (and before the comment-end characters).
2034
2035 @item arglist
2036 This is the comma-separated list of arguments to the C function.  For a
2037 function with a fixed maximum number of arguments, provide a C argument
2038 for each Lisp argument.  In this case, unlike regular C functions, the
2039 types of the arguments are not declared; they are simply always of type
2040 @code{Lisp_Object}.
2041
2042 The names of the C arguments will be used as the names of the arguments
2043 to the Lisp primitive as displayed in its documentation, modulo the same
2044 concerns described above for @code{F...} names (in particular,
2045 underscores in the C arguments become dashes in the Lisp arguments).
2046
2047 There is one additional kludge: A trailing `_' on the C argument is
2048 discarded when forming the Lisp argument.  This allows C language
2049 reserved words (like @code{default}) or global symbols (like
2050 @code{dirname}) to be used as argument names without compiler warnings
2051 or errors.
2052
2053 A Lisp function with @w{@var{max_args} = @code{UNEVALLED}} is a
2054 @w{@dfn{special form}}; its arguments are not evaluated.  Instead it
2055 receives one argument of type @code{Lisp_Object}, a (Lisp) list of the
2056 unevaluated arguments, conventionally named @code{(args)}.
2057
2058 When a Lisp function has no upper limit on the number of arguments,
2059 specify @w{@var{max_args} = @code{MANY}}.  In this case its implementation in
2060 C actually receives exactly two arguments: the number of Lisp arguments
2061 (an @code{int}) and the address of a block containing their values (a
2062 @w{@code{Lisp_Object *}}).  In this case only are the C types specified
2063 in the @var{arglist}: @w{@code{(int nargs, Lisp_Object *args)}}.
2064
2065 @end table
2066
2067 Within the function @code{Fprog1} itself, note the use of the macros
2068 @code{GCPRO1} and @code{UNGCPRO}.  @code{GCPRO1} is used to ``protect''
2069 a variable from garbage collection---to inform the garbage collector
2070 that it must look in that variable and regard the object pointed at by
2071 its contents as an accessible object.  This is necessary whenever you
2072 call @code{Feval} or anything that can directly or indirectly call
2073 @code{Feval} (this includes the @code{QUIT} macro!).  At such a time,
2074 any Lisp object that you intend to refer to again must be protected
2075 somehow.  @code{UNGCPRO} cancels the protection of the variables that
2076 are protected in the current function.  It is necessary to do this
2077 explicitly.
2078
2079 The macro @code{GCPRO1} protects just one local variable.  If you want
2080 to protect two, use @code{GCPRO2} instead; repeating @code{GCPRO1} will
2081 not work.  Macros @code{GCPRO3} and @code{GCPRO4} also exist.
2082
2083 These macros implicitly use local variables such as @code{gcpro1}; you
2084 must declare these explicitly, with type @code{struct gcpro}.  Thus, if
2085 you use @code{GCPRO2}, you must declare @code{gcpro1} and @code{gcpro2}.
2086
2087 @cindex caller-protects (@code{GCPRO} rule)
2088 Note also that the general rule is @dfn{caller-protects}; i.e. you are
2089 only responsible for protecting those Lisp objects that you create.  Any
2090 objects passed to you as arguments should have been protected by whoever
2091 created them, so you don't in general have to protect them.
2092
2093 In particular, the arguments to any Lisp primitive are always
2094 automatically @code{GCPRO}ed, when called ``normally'' from Lisp code or
2095 bytecode.  So only a few Lisp primitives that are called frequently from
2096 C code, such as @code{Fprogn} protect their arguments as a service to
2097 their caller.  You don't need to protect your arguments when writing a
2098 new @code{DEFUN}.
2099
2100 @code{GCPRO}ing is perhaps the trickiest and most error-prone part of
2101 XEmacs coding.  It is @strong{extremely} important that you get this
2102 right and use a great deal of discipline when writing this code.
2103 @xref{GCPROing, ,@code{GCPRO}ing}, for full details on how to do this.
2104
2105 What @code{DEFUN} actually does is declare a global structure of type
2106 @code{Lisp_Subr} whose name begins with capital @samp{SF} and which
2107 contains information about the primitive (e.g. a pointer to the
2108 function, its minimum and maximum allowed arguments, a string describing
2109 its Lisp name); @code{DEFUN} then begins a normal C function declaration
2110 using the @code{F...} name.  The Lisp subr object that is the function
2111 definition of a primitive (i.e. the object in the function slot of the
2112 symbol that names the primitive) actually points to this @samp{SF}
2113 structure; when @code{Feval} encounters a subr, it looks in the
2114 structure to find out how to call the C function.
2115
2116 Defining the C function is not enough to make a Lisp primitive
2117 available; you must also create the Lisp symbol for the primitive (the
2118 symbol is @dfn{interned}; @pxref{Obarrays}) and store a suitable subr
2119 object in its function cell. (If you don't do this, the primitive won't
2120 be seen by Lisp code.) The code looks like this:
2121
2122 @example
2123 DEFSUBR (@var{fname});
2124 @end example
2125
2126 @noindent
2127 Here @var{fname} is the same name you used as the second argument to
2128 @code{DEFUN}.
2129
2130 This call to @code{DEFSUBR} should go in the @code{syms_of_*()} function
2131 at the end of the module.  If no such function exists, create it and
2132 make sure to also declare it in @file{symsinit.h} and call it from the
2133 appropriate spot in @code{main()}.  @xref{General Coding Rules}.
2134
2135 Note that C code cannot call functions by name unless they are defined
2136 in C.  The way to call a function written in Lisp from C is to use
2137 @code{Ffuncall}, which embodies the Lisp function @code{funcall}.  Since
2138 the Lisp function @code{funcall} accepts an unlimited number of
2139 arguments, in C it takes two: the number of Lisp-level arguments, and a
2140 one-dimensional array containing their values.  The first Lisp-level
2141 argument is the Lisp function to call, and the rest are the arguments to
2142 pass to it.  Since @code{Ffuncall} can call the evaluator, you must
2143 protect pointers from garbage collection around the call to
2144 @code{Ffuncall}. (However, @code{Ffuncall} explicitly protects all of
2145 its parameters, so you don't have to protect any pointers passed as
2146 parameters to it.)
2147
2148 The C functions @code{call0}, @code{call1}, @code{call2}, and so on,
2149 provide handy ways to call a Lisp function conveniently with a fixed
2150 number of arguments.  They work by calling @code{Ffuncall}.
2151
2152 @file{eval.c} is a very good file to look through for examples;
2153 @file{lisp.h} contains the definitions for important macros and
2154 functions.
2155
2156 @node Adding Global Lisp Variables, Coding for Mule, Writing Lisp Primitives, Rules When Writing New C Code
2157 @section Adding Global Lisp Variables
2158
2159 Global variables whose names begin with @samp{Q} are constants whose
2160 value is a symbol of a particular name.  The name of the variable should
2161 be derived from the name of the symbol using the same rules as for Lisp
2162 primitives.  These variables are initialized using a call to
2163 @code{defsymbol()} in the @code{syms_of_*()} function. (This call
2164 interns a symbol, sets the C variable to the resulting Lisp object, and
2165 calls @code{staticpro()} on the C variable to tell the
2166 garbage-collection mechanism about this variable.  What
2167 @code{staticpro()} does is add a pointer to the variable to a large
2168 global array; when garbage-collection happens, all pointers listed in
2169 the array are used as starting points for marking Lisp objects.  This is
2170 important because it's quite possible that the only current reference to
2171 the object is the C variable.  In the case of symbols, the
2172 @code{staticpro()} doesn't matter all that much because the symbol is
2173 contained in @code{obarray}, which is itself @code{staticpro()}ed.
2174 However, it's possible that a naughty user could do something like
2175 uninterning the symbol out of @code{obarray} or even setting
2176 @code{obarray} to a different value [although this is likely to make
2177 XEmacs crash!].)
2178
2179   @strong{Please note:} It is potentially deadly if you declare a
2180 @samp{Q...}  variable in two different modules.  The two calls to
2181 @code{defsymbol()} are no problem, but some linkers will complain about
2182 multiply-defined symbols.  The most insidious aspect of this is that
2183 often the link will succeed anyway, but then the resulting executable
2184 will sometimes crash in obscure ways during certain operations!  To
2185 avoid this problem, declare any symbols with common names (such as
2186 @code{text}) that are not obviously associated with this particular
2187 module in the module @file{general.c}.
2188
2189   Global variables whose names begin with @samp{V} are variables that
2190 contain Lisp objects.  The convention here is that all global variables
2191 of type @code{Lisp_Object} begin with @samp{V}, and all others don't
2192 (including integer and boolean variables that have Lisp
2193 equivalents). Most of the time, these variables have equivalents in
2194 Lisp, but some don't.  Those that do are declared this way by a call to
2195 @code{DEFVAR_LISP()} in the @code{vars_of_*()} initializer for the
2196 module.  What this does is create a special @dfn{symbol-value-forward}
2197 Lisp object that contains a pointer to the C variable, intern a symbol
2198 whose name is as specified in the call to @code{DEFVAR_LISP()}, and set
2199 its value to the symbol-value-forward Lisp object; it also calls
2200 @code{staticpro()} on the C variable to tell the garbage-collection
2201 mechanism about the variable.  When @code{eval} (or actually
2202 @code{symbol-value}) encounters this special object in the process of
2203 retrieving a variable's value, it follows the indirection to the C
2204 variable and gets its value.  @code{setq} does similar things so that
2205 the C variable gets changed.
2206
2207   Whether or not you @code{DEFVAR_LISP()} a variable, you need to
2208 initialize it in the @code{vars_of_*()} function; otherwise it will end
2209 up as all zeroes, which is the integer 0 (@emph{not} @code{nil}), and
2210 this is probably not what you want.  Also, if the variable is not
2211 @code{DEFVAR_LISP()}ed, @strong{you must call} @code{staticpro()} on the
2212 C variable in the @code{vars_of_*()} function.  Otherwise, the
2213 garbage-collection mechanism won't know that the object in this variable
2214 is in use, and will happily collect it and reuse its storage for another
2215 Lisp object, and you will be the one who's unhappy when you can't figure
2216 out how your variable got overwritten.
2217
2218 @node Coding for Mule, Techniques for XEmacs Developers, Adding Global Lisp Variables, Rules When Writing New C Code
2219 @section Coding for Mule
2220 @cindex Coding for Mule
2221
2222 Although Mule support is not compiled by default in XEmacs, many people
2223 are using it, and we consider it crucial that new code works correctly
2224 with multibyte characters.  This is not hard; it is only a matter of
2225 following several simple user-interface guidelines.  Even if you never
2226 compile with Mule, with a little practice you will find it quite easy
2227 to code Mule-correctly.
2228
2229 Note that these guidelines are not necessarily tied to the current Mule
2230 implementation; they are also a good idea to follow on the grounds of
2231 code generalization for future I18N work.
2232
2233 @menu
2234 * Character-Related Data Types::
2235 * Working With Character and Byte Positions::
2236 * Conversion to and from External Data::
2237 * General Guidelines for Writing Mule-Aware Code::
2238 * An Example of Mule-Aware Code::
2239 @end menu
2240
2241 @node Character-Related Data Types, Working With Character and Byte Positions, Coding for Mule, Coding for Mule
2242 @subsection Character-Related Data Types
2243
2244 First, let's review the basic character-related datatypes used by
2245 XEmacs.  Note that the separate @code{typedef}s are not mandatory in the
2246 current implementation (all of them boil down to @code{unsigned char} or
2247 @code{int}), but they improve clarity of code a great deal, because one
2248 glance at the declaration can tell the intended use of the variable.
2249
2250 @table @code
2251 @item Emchar
2252 @cindex Emchar
2253 An @code{Emchar} holds a single Emacs character.
2254
2255 Obviously, the equality between characters and bytes is lost in the Mule
2256 world.  Characters can be represented by one or more bytes in the
2257 buffer, and @code{Emchar} is the C type large enough to hold any
2258 character.
2259
2260 Without Mule support, an @code{Emchar} is equivalent to an
2261 @code{unsigned char}.
2262
2263 @item Bufbyte
2264 @cindex Bufbyte
2265 The data representing the text in a buffer or string is logically a set
2266 of @code{Bufbyte}s.
2267
2268 XEmacs does not work with character formats all the time; when reading
2269 characters from the outside, it decodes them to an internal format, and
2270 likewise encodes them when writing.  @code{Bufbyte} (in fact
2271 @code{unsigned char}) is the basic unit of XEmacs internal buffers and
2272 strings format.
2273
2274 One character can correspond to one or more @code{Bufbyte}s.  In the
2275 current implementation, an ASCII character is represented by the same
2276 @code{Bufbyte}, and extended characters are represented by a sequence of
2277 @code{Bufbyte}s.
2278
2279 Without Mule support, a @code{Bufbyte} is equivalent to an
2280 @code{Emchar}.
2281
2282 @item Bufpos
2283 @itemx Charcount
2284 @cindex Bufpos
2285 @cindex Charcount
2286 A @code{Bufpos} represents a character position in a buffer or string.
2287 A @code{Charcount} represents a number (count) of characters.
2288 Logically, subtracting two @code{Bufpos} values yields a
2289 @code{Charcount} value.  Although all of these are @code{typedef}ed to
2290 @code{int}, we use them in preference to @code{int} to make it clear
2291 what sort of position is being used.
2292
2293 @code{Bufpos} and @code{Charcount} values are the only ones that are
2294 ever visible to Lisp.
2295
2296 @item Bytind
2297 @itemx Bytecount
2298 @cindex Bytind
2299 @cindex Bytecount
2300 A @code{Bytind} represents a byte position in a buffer or string.  A
2301 @code{Bytecount} represents the distance between two positions in bytes.
2302 The relationship between @code{Bytind} and @code{Bytecount} is the same
2303 as the relationship between @code{Bufpos} and @code{Charcount}.
2304
2305 @item Extbyte
2306 @itemx Extcount
2307 @cindex Extbyte
2308 @cindex Extcount
2309 When dealing with the outside world, XEmacs works with @code{Extbyte}s,
2310 which are equivalent to @code{unsigned char}.  Obviously, an
2311 @code{Extcount} is the distance between two @code{Extbyte}s.  Extbytes
2312 and Extcounts are not all that frequent in XEmacs code.
2313 @end table
2314
2315 @node Working With Character and Byte Positions, Conversion to and from External Data, Character-Related Data Types, Coding for Mule
2316 @subsection Working With Character and Byte Positions
2317
2318 Now that we have defined the basic character-related types, we can look
2319 at the macros and functions designed for work with them and for
2320 conversion between them.  Most of these macros are defined in
2321 @file{buffer.h}, and we don't discuss all of them here, but only the
2322 most important ones.  Examining the existing code is the best way to
2323 learn about them.
2324
2325 @table @code
2326 @item MAX_EMCHAR_LEN
2327 @cindex MAX_EMCHAR_LEN
2328 This preprocessor constant is the maximum number of buffer bytes per
2329 Emacs character, i.e. the byte length of an @code{Emchar}.  It is useful
2330 when allocating temporary strings to keep a known number of characters.
2331 For instance:
2332
2333 @example
2334 @group
2335 @{
2336   Charcount cclen;
2337   ...
2338   @{
2339     /* Allocate place for @var{cclen} characters. */
2340     Bufbyte *buf = (Bufbyte *)alloca (cclen * MAX_EMCHAR_LEN);
2341 ...
2342 @end group
2343 @end example
2344
2345 If you followed the previous section, you can guess that, logically,
2346 multiplying a @code{Charcount} value with @code{MAX_EMCHAR_LEN} produces
2347 a @code{Bytecount} value.
2348
2349 In the current Mule implementation, @code{MAX_EMCHAR_LEN} equals 4.
2350 Without Mule, it is 1.
2351
2352 @item charptr_emchar
2353 @itemx set_charptr_emchar
2354 @cindex charptr_emchar
2355 @cindex set_charptr_emchar
2356 The @code{charptr_emchar} macro takes a @code{Bufbyte} pointer and
2357 returns the @code{Emchar} stored at that position.  If it were a
2358 function, its prototype would be:
2359
2360 @example
2361 Emchar charptr_emchar (Bufbyte *p);
2362 @end example
2363
2364 @code{set_charptr_emchar} stores an @code{Emchar} to the specified byte
2365 position.  It returns the number of bytes stored:
2366
2367 @example
2368 Bytecount set_charptr_emchar (Bufbyte *p, Emchar c);
2369 @end example
2370
2371 It is important to note that @code{set_charptr_emchar} is safe only for
2372 appending a character at the end of a buffer, not for overwriting a
2373 character in the middle.  This is because the width of characters
2374 varies, and @code{set_charptr_emchar} cannot resize the string if it
2375 writes, say, a two-byte character where a single-byte character used to
2376 reside.
2377
2378 A typical use of @code{set_charptr_emchar} can be demonstrated by this
2379 example, which copies characters from buffer @var{buf} to a temporary
2380 string of Bufbytes.
2381
2382 @example
2383 @group
2384 @{
2385   Bufpos pos;
2386   for (pos = beg; pos < end; pos++)
2387     @{
2388       Emchar c = BUF_FETCH_CHAR (buf, pos);
2389       p += set_charptr_emchar (buf, c);
2390     @}
2391 @}
2392 @end group
2393 @end example
2394
2395 Note how @code{set_charptr_emchar} is used to store the @code{Emchar}
2396 and increment the counter, at the same time.
2397
2398 @item INC_CHARPTR
2399 @itemx DEC_CHARPTR
2400 @cindex INC_CHARPTR
2401 @cindex DEC_CHARPTR
2402 These two macros increment and decrement a @code{Bufbyte} pointer,
2403 respectively.  They will adjust the pointer by the appropriate number of
2404 bytes according to the byte length of the character stored there.  Both
2405 macros assume that the memory address is located at the beginning of a
2406 valid character.
2407
2408 Without Mule support, @code{INC_CHARPTR (p)} and @code{DEC_CHARPTR (p)}
2409 simply expand to @code{p++} and @code{p--}, respectively.
2410
2411 @item bytecount_to_charcount
2412 @cindex bytecount_to_charcount
2413 Given a pointer to a text string and a length in bytes, return the
2414 equivalent length in characters.
2415
2416 @example
2417 Charcount bytecount_to_charcount (Bufbyte *p, Bytecount bc);
2418 @end example
2419
2420 @item charcount_to_bytecount
2421 @cindex charcount_to_bytecount
2422 Given a pointer to a text string and a length in characters, return the
2423 equivalent length in bytes.
2424
2425 @example
2426 Bytecount charcount_to_bytecount (Bufbyte *p, Charcount cc);
2427 @end example
2428
2429 @item charptr_n_addr
2430 @cindex charptr_n_addr
2431 Return a pointer to the beginning of the character offset @var{cc} (in
2432 characters) from @var{p}.
2433
2434 @example
2435 Bufbyte *charptr_n_addr (Bufbyte *p, Charcount cc);
2436 @end example
2437 @end table
2438
2439 @node Conversion to and from External Data, General Guidelines for Writing Mule-Aware Code, Working With Character and Byte Positions, Coding for Mule
2440 @subsection Conversion to and from External Data
2441
2442 When an external function, such as a C library function, returns a
2443 @code{char} pointer, you should almost never treat it as @code{Bufbyte}.
2444 This is because these returned strings may contain 8bit characters which
2445 can be misinterpreted by XEmacs, and cause a crash.  Likewise, when
2446 exporting a piece of internal text to the outside world, you should
2447 always convert it to an appropriate external encoding, lest the internal
2448 stuff (such as the infamous \201 characters) leak out.
2449
2450 The interface to conversion between the internal and external
2451 representations of text are the numerous conversion macros defined in
2452 @file{buffer.h}.  Before looking at them, we'll look at the external
2453 formats supported by these macros.
2454
2455 Currently meaningful formats are @code{FORMAT_BINARY},
2456 @code{FORMAT_FILENAME}, @code{FORMAT_OS}, and @code{FORMAT_CTEXT}.  Here
2457 is a description of these.
2458
2459 @table @code
2460 @item FORMAT_BINARY
2461 Binary format.  This is the simplest format and is what we use in the
2462 absence of a more appropriate format.  This converts according to the
2463 @code{binary} coding system:
2464
2465 @enumerate a
2466 @item
2467 On input, bytes 0--255 are converted into characters 0--255.
2468 @item
2469 On output, characters 0--255 are converted into bytes 0--255 and other
2470 characters are converted into `X'.
2471 @end enumerate
2472
2473 @item FORMAT_FILENAME
2474 Format used for filenames.  In the original Mule, this is user-definable
2475 with the @code{pathname-coding-system} variable.  For the moment, we
2476 just use the @code{binary} coding system.
2477
2478 @item FORMAT_OS
2479 Format used for the external Unix environment---@code{argv[]}, stuff
2480 from @code{getenv()}, stuff from the @file{/etc/passwd} file, etc.
2481
2482 Perhaps should be the same as FORMAT_FILENAME.
2483
2484 @item FORMAT_CTEXT
2485 Compound--text format.  This is the standard X format used for data
2486 stored in properties, selections, and the like.  This is an 8-bit
2487 no-lock-shift ISO2022 coding system.
2488 @end table
2489
2490 The macros to convert between these formats and the internal format, and
2491 vice versa, follow.
2492
2493 @table @code
2494 @item GET_CHARPTR_INT_DATA_ALLOCA
2495 @itemx GET_CHARPTR_EXT_DATA_ALLOCA
2496 These two are the most basic conversion macros.
2497 @code{GET_CHARPTR_INT_DATA_ALLOCA} converts external data to internal
2498 format, and @code{GET_CHARPTR_EXT_DATA_ALLOCA} converts the other way
2499 around.  The arguments each of these receives are @var{ptr} (pointer to
2500 the text in external format), @var{len} (length of texts in bytes),
2501 @var{fmt} (format of the external text), @var{ptr_out} (lvalue to which
2502 new text should be copied), and @var{len_out} (lvalue which will be
2503 assigned the length of the internal text in bytes).  The resulting text
2504 is stored to a stack-allocated buffer.  If the text doesn't need
2505 changing, these macros will do nothing, except for setting
2506 @var{len_out}.
2507
2508 The macros above take many arguments which makes them unwieldy.  For
2509 this reason, a number of convenience macros are defined with obvious
2510 functionality, but accepting less arguments.  The general rule is that
2511 macros with @samp{INT} in their name convert text to internal Emacs
2512 representation, whereas the @samp{EXT} macros convert to external
2513 representation.
2514
2515 @item GET_C_CHARPTR_INT_DATA_ALLOCA
2516 @itemx GET_C_CHARPTR_EXT_DATA_ALLOCA
2517 As their names imply, these macros work on C char pointers, which are
2518 zero-terminated, and thus do not need @var{len} or @var{len_out}
2519 parameters.
2520
2521 @item GET_STRING_EXT_DATA_ALLOCA
2522 @itemx GET_C_STRING_EXT_DATA_ALLOCA
2523 These two macros convert a Lisp string into an external representation.
2524 The difference between them is that @code{GET_STRING_EXT_DATA_ALLOCA}
2525 stores its output to a generic string, providing @var{len_out}, the
2526 length of the resulting external string.  On the other hand,
2527 @code{GET_C_STRING_EXT_DATA_ALLOCA} assumes that the caller will be
2528 satisfied with output string being zero-terminated.
2529
2530 Note that for Lisp strings only one conversion direction makes sense.
2531
2532 @item GET_C_CHARPTR_EXT_BINARY_DATA_ALLOCA
2533 @itemx GET_CHARPTR_EXT_BINARY_DATA_ALLOCA
2534 @itemx GET_STRING_BINARY_DATA_ALLOCA
2535 @itemx GET_C_STRING_BINARY_DATA_ALLOCA
2536 @itemx GET_C_CHARPTR_EXT_FILENAME_DATA_ALLOCA
2537 @itemx ...
2538 These macros convert internal text to a specific external
2539 representation, with the external format being encoded into the name of
2540 the macro.  Note that the @code{GET_STRING_...} and
2541 @code{GET_C_STRING...}  macros lack the @samp{EXT} tag, because they
2542 only make sense in that direction.
2543
2544 @item GET_C_CHARPTR_INT_BINARY_DATA_ALLOCA
2545 @itemx GET_CHARPTR_INT_BINARY_DATA_ALLOCA
2546 @itemx GET_C_CHARPTR_INT_FILENAME_DATA_ALLOCA
2547 @itemx ...
2548 These macros convert external text of a specific format to its internal
2549 representation, with the external format being incoded into the name of
2550 the macro.
2551 @end table
2552
2553 @node General Guidelines for Writing Mule-Aware Code, An Example of Mule-Aware Code, Conversion to and from External Data, Coding for Mule
2554 @subsection General Guidelines for Writing Mule-Aware Code
2555
2556 This section contains some general guidance on how to write Mule-aware
2557 code, as well as some pitfalls you should avoid.
2558
2559 @table @emph
2560 @item Never use @code{char} and @code{char *}.
2561 In XEmacs, the use of @code{char} and @code{char *} is almost always a
2562 mistake.  If you want to manipulate an Emacs character from ``C'', use
2563 @code{Emchar}.  If you want to examine a specific octet in the internal
2564 format, use @code{Bufbyte}.  If you want a Lisp-visible character, use a
2565 @code{Lisp_Object} and @code{make_char}.  If you want a pointer to move
2566 through the internal text, use @code{Bufbyte *}.  Also note that you
2567 almost certainly do not need @code{Emchar *}.
2568
2569 @item Be careful not to confuse @code{Charcount}, @code{Bytecount}, and @code{Bufpos}.
2570 The whole point of using different types is to avoid confusion about the
2571 use of certain variables.  Lest this effect be nullified, you need to be
2572 careful about using the right types.
2573
2574 @item Always convert external data
2575 It is extremely important to always convert external data, because
2576 XEmacs can crash if unexpected 8bit sequences are copied to its internal
2577 buffers literally.
2578
2579 This means that when a system function, such as @code{readdir}, returns
2580 a string, you need to convert it using one of the conversion macros
2581 described in the previous chapter, before passing it further to Lisp.
2582 In the case of @code{readdir}, you would use the
2583 @code{GET_C_CHARPTR_INT_FILENAME_DATA_ALLOCA} macro.
2584
2585 Also note that many internal functions, such as @code{make_string},
2586 accept Bufbytes, which removes the need for them to convert the data
2587 they receive.  This increases efficiency because that way external data
2588 needs to be decoded only once, when it is read.  After that, it is
2589 passed around in internal format.
2590 @end table
2591
2592 @node An Example of Mule-Aware Code,  , General Guidelines for Writing Mule-Aware Code, Coding for Mule
2593 @subsection An Example of Mule-Aware Code
2594
2595 As an example of Mule-aware code, we shall will analyze the
2596 @code{string} function, which conses up a Lisp string from the character
2597 arguments it receives.  Here is the definition, pasted from
2598 @code{alloc.c}:
2599
2600 @example
2601 @group
2602 DEFUN ("string", Fstring, 0, MANY, 0, /*
2603 Concatenate all the argument characters and make the result a string.
2604 */
2605        (int nargs, Lisp_Object *args))
2606 @{
2607   Bufbyte *storage = alloca_array (Bufbyte, nargs * MAX_EMCHAR_LEN);
2608   Bufbyte *p = storage;
2609
2610   for (; nargs; nargs--, args++)
2611     @{
2612       Lisp_Object lisp_char = *args;
2613       CHECK_CHAR_COERCE_INT (lisp_char);
2614       p += set_charptr_emchar (p, XCHAR (lisp_char));
2615     @}
2616   return make_string (storage, p - storage);
2617 @}
2618 @end group
2619 @end example
2620
2621 Now we can analyze the source line by line.
2622
2623 Obviously, string will be as long as there are arguments to the
2624 function.  This is why we allocate @code{MAX_EMCHAR_LEN} * @var{nargs}
2625 bytes on the stack, i.e. the worst-case number of bytes for @var{nargs}
2626 @code{Emchar}s to fit in the string.
2627
2628 Then, the loop checks that each element is a character, converting
2629 integers in the process.  Like many other functions in XEmacs, this
2630 function silently accepts integers where characters are expected, for
2631 historical and compatibility reasons.  Unless you know what you are
2632 doing, @code{CHECK_CHAR} will also suffice.  @code{XCHAR (lisp_char)}
2633 extracts the @code{Emchar} from the @code{Lisp_Object}, and
2634 @code{set_charptr_emchar} stores it to storage, increasing @code{p} in
2635 the process.
2636
2637 Other instructive examples of correct coding under Mule can be found all
2638 over the XEmacs code.  For starters, I recommend
2639 @code{Fnormalize_menu_item_name} in @file{menubar.c}.  After you have
2640 understood this section of the manual and studied the examples, you can
2641 proceed writing new Mule-aware code.
2642
2643 @node Techniques for XEmacs Developers,  , Coding for Mule, Rules When Writing New C Code
2644 @section Techniques for XEmacs Developers
2645
2646 To make a quantified XEmacs, do: @code{make quantmacs}.
2647
2648 You simply can't dump Quantified and Purified images.  Run the image
2649 like so:  @code{quantmacs -batch -l loadup.el run-temacs @var{xemacs-args...}}.
2650
2651 Before you go through the trouble, are you compiling with all
2652 debugging and error-checking off?  If not try that first.  Be warned
2653 that while Quantify is directly responsible for quite a few
2654 optimizations which have been made to XEmacs, doing a run which
2655 generates results which can be acted upon is not necessarily a trivial
2656 task.
2657
2658 Also, if you're still willing to do some runs make sure you configure
2659 with the @samp{--quantify} flag.  That will keep Quantify from starting
2660 to record data until after the loadup is completed and will shut off
2661 recording right before it shuts down (which generates enough bogus data
2662 to throw most results off).  It also enables three additional elisp
2663 commands: @code{quantify-start-recording-data},
2664 @code{quantify-stop-recording-data} and @code{quantify-clear-data}.
2665
2666 If you want to make XEmacs faster, target your favorite slow benchmark,
2667 run a profiler like Quantify, @code{gprof}, or @code{tcov}, and figure
2668 out where the cycles are going.  Specific projects:
2669
2670 @itemize @bullet
2671 @item
2672 Make the garbage collector faster.  Figure out how to write an
2673 incremental garbage collector.
2674 @item
2675 Write a compiler that takes bytecode and spits out C code.
2676 Unfortunately, you will then need a C compiler and a more fully
2677 developed module system.
2678 @item
2679 Speed up redisplay.
2680 @item
2681 Speed up syntax highlighting.  Maybe moving some of the syntax
2682 highlighting capabilities into C would make a difference.
2683 @item
2684 Implement tail recursion in Emacs Lisp (hard!).
2685 @end itemize
2686
2687 Unfortunately, Emacs Lisp is slow, and is going to stay slow.  Function
2688 calls in elisp are especially expensive.  Iterating over a long list is
2689 going to be 30 times faster implemented in C than in Elisp.
2690
2691 To get started debugging XEmacs, take a look at the @file{.gdbinit} and
2692 @file{.dbxrc} files in the @file{src} directory.
2693 @xref{Q2.1.15 - How to Debug an XEmacs problem with a debugger,,,
2694 xemacs-faq, XEmacs FAQ}.
2695
2696 After making source code changes, run @code{make check} to ensure that
2697 you haven't introduced any regressions.  If you're feeling ambitious,
2698 you can try to improve the test suite in @file{tests/automated}.
2699
2700 Here are things to know when you create a new source file:
2701
2702 @itemize @bullet
2703 @item
2704 All @file{.c} files should @code{#include <config.h>} first.  Almost all
2705 @file{.c} files should @code{#include "lisp.h"} second.
2706
2707 @item
2708 Generated header files should be included using the @code{#include <...>} syntax,
2709 not the @code{#include "..."} syntax.  The generated headers are:
2710
2711 @file{config.h puresize-adjust.h sheap-adjust.h paths.h Emacs.ad.h}
2712
2713 The basic rule is that you should assume builds using @code{--srcdir}
2714 and the @code{#include <...>} syntax needs to be used when the
2715 to-be-included generated file is in a potentially different directory
2716 @emph{at compile time}.  The non-obvious C rule is that @code{#include "..."}
2717 means to search for the included file in the same directory as the
2718 including file, @emph{not} in the current directory.
2719
2720 @item
2721 Header files should @emph{not} include @code{<config.h>} and
2722 @code{"lisp.h"}.  It is the responsibility of the @file{.c} files that
2723 use it to do so.
2724
2725 @item
2726 If the header uses @code{INLINE}, either directly or through
2727 @code{DECLARE_LRECORD}, then it must be added to @file{inline.c}'s
2728 includes.
2729
2730 @item
2731 Try compiling at least once with
2732
2733 @example
2734 gcc --with-mule --with-union-type --error-checking=all
2735 @end example
2736
2737 @item
2738 Did I mention that you should run the test suite?
2739 @example
2740 make check
2741 @end example
2742 @end itemize
2743
2744
2745 @node A Summary of the Various XEmacs Modules, Allocation of Objects in XEmacs Lisp, Rules When Writing New C Code, Top
2746 @chapter A Summary of the Various XEmacs Modules
2747
2748   This is accurate as of XEmacs 20.0.
2749
2750 @menu
2751 * Low-Level Modules::
2752 * Basic Lisp Modules::
2753 * Modules for Standard Editing Operations::
2754 * Editor-Level Control Flow Modules::
2755 * Modules for the Basic Displayable Lisp Objects::
2756 * Modules for other Display-Related Lisp Objects::
2757 * Modules for the Redisplay Mechanism::
2758 * Modules for Interfacing with the File System::
2759 * Modules for Other Aspects of the Lisp Interpreter and Object System::
2760 * Modules for Interfacing with the Operating System::
2761 * Modules for Interfacing with X Windows::
2762 * Modules for Internationalization::
2763 @end menu
2764
2765 @node Low-Level Modules, Basic Lisp Modules, A Summary of the Various XEmacs Modules, A Summary of the Various XEmacs Modules
2766 @section Low-Level Modules
2767
2768 @example
2769 config.h
2770 @end example
2771
2772 This is automatically generated from @file{config.h.in} based on the
2773 results of configure tests and user-selected optional features and
2774 contains preprocessor definitions specifying the nature of the
2775 environment in which XEmacs is being compiled.
2776
2777
2778
2779 @example
2780 paths.h
2781 @end example
2782
2783 This is automatically generated from @file{paths.h.in} based on supplied
2784 configure values, and allows for non-standard installed configurations
2785 of the XEmacs directories.  It's currently broken, though.
2786
2787
2788
2789 @example
2790 emacs.c
2791 signal.c
2792 @end example
2793
2794 @file{emacs.c} contains @code{main()} and other code that performs the most
2795 basic environment initializations and handles shutting down the XEmacs
2796 process (this includes @code{kill-emacs}, the normal way that XEmacs is
2797 exited; @code{dump-emacs}, which is used during the build process to
2798 write out the XEmacs executable; @code{run-emacs-from-temacs}, which can
2799 be used to start XEmacs directly when temacs has finished loading all
2800 the Lisp code; and emergency code to handle crashes [XEmacs tries to
2801 auto-save all files before it crashes]).
2802
2803 Low-level code that directly interacts with the Unix signal mechanism,
2804 however, is in @file{signal.c}.  Note that this code does not handle system
2805 dependencies in interfacing to signals; that is handled using the
2806 @file{syssignal.h} header file, described in section J below.
2807
2808
2809
2810 @example
2811 unexaix.c
2812 unexalpha.c
2813 unexapollo.c
2814 unexconvex.c
2815 unexec.c
2816 unexelf.c
2817 unexelfsgi.c
2818 unexencap.c
2819 unexenix.c
2820 unexfreebsd.c
2821 unexfx2800.c
2822 unexhp9k3.c
2823 unexhp9k800.c
2824 unexmips.c
2825 unexnext.c
2826 unexsol2.c
2827 unexsunos4.c
2828 @end example
2829
2830 These modules contain code dumping out the XEmacs executable on various
2831 different systems. (This process is highly machine-specific and
2832 requires intimate knowledge of the executable format and the memory map
2833 of the process.) Only one of these modules is actually used; this is
2834 chosen by @file{configure}.
2835
2836
2837
2838 @example
2839 crt0.c
2840 lastfile.c
2841 pre-crt0.c
2842 @end example
2843
2844 These modules are used in conjunction with the dump mechanism.  On some
2845 systems, an alternative version of the C startup code (the actual code
2846 that receives control from the operating system when the process is
2847 started, and which calls @code{main()}) is required so that the dumping
2848 process works properly; @file{crt0.c} provides this.
2849
2850 @file{pre-crt0.c} and @file{lastfile.c} should be the very first and
2851 very last file linked, respectively. (Actually, this is not really true.
2852 @file{lastfile.c} should be after all Emacs modules whose initialized
2853 data should be made constant, and before all other Emacs files and all
2854 libraries.  In particular, the allocation modules @file{gmalloc.c},
2855 @file{alloca.c}, etc. are normally placed past @file{lastfile.c}, and
2856 all of the files that implement Xt widget classes @emph{must} be placed
2857 after @file{lastfile.c} because they contain various structures that
2858 must be statically initialized and into which Xt writes at various
2859 times.) @file{pre-crt0.c} and @file{lastfile.c} contain exported symbols
2860 that are used to determine the start and end of XEmacs' initialized
2861 data space when dumping.
2862
2863
2864
2865 @example
2866 alloca.c
2867 free-hook.c
2868 getpagesize.h
2869 gmalloc.c
2870 malloc.c
2871 mem-limits.h
2872 ralloc.c
2873 vm-limit.c
2874 @end example
2875
2876 These handle basic C allocation of memory.  @file{alloca.c} is an emulation of
2877 the stack allocation function @code{alloca()} on machines that lack
2878 this. (XEmacs makes extensive use of @code{alloca()} in its code.)
2879
2880 @file{gmalloc.c} and @file{malloc.c} are two implementations of the standard C
2881 functions @code{malloc()}, @code{realloc()} and @code{free()}.  They are
2882 often used in place of the standard system-provided @code{malloc()}
2883 because they usually provide a much faster implementation, at the
2884 expense of additional memory use.  @file{gmalloc.c} is a newer implementation
2885 that is much more memory-efficient for large allocations than @file{malloc.c},
2886 and should always be preferred if it works. (At one point, @file{gmalloc.c}
2887 didn't work on some systems where @file{malloc.c} worked; but this should be
2888 fixed now.)
2889
2890 @cindex relocating allocator
2891 @file{ralloc.c} is the @dfn{relocating allocator}.  It provides
2892 functions similar to @code{malloc()}, @code{realloc()} and @code{free()}
2893 that allocate memory that can be dynamically relocated in memory.  The
2894 advantage of this is that allocated memory can be shuffled around to
2895 place all the free memory at the end of the heap, and the heap can then
2896 be shrunk, releasing the memory back to the operating system.  The use
2897 of this can be controlled with the configure option @code{--rel-alloc};
2898 if enabled, memory allocated for buffers will be relocatable, so that if
2899 a very large file is visited and the buffer is later killed, the memory
2900 can be released to the operating system.  (The disadvantage of this
2901 mechanism is that it can be very slow.  On systems with the
2902 @code{mmap()} system call, the XEmacs version of @file{ralloc.c} uses
2903 this to move memory around without actually having to block-copy it,
2904 which can speed things up; but it can still cause noticeable performance
2905 degradation.)
2906
2907 @file{free-hook.c} contains some debugging functions for checking for invalid
2908 arguments to @code{free()}.
2909
2910 @file{vm-limit.c} contains some functions that warn the user when memory is
2911 getting low.  These are callback functions that are called by @file{gmalloc.c}
2912 and @file{malloc.c} at appropriate times.
2913
2914 @file{getpagesize.h} provides a uniform interface for retrieving the size of a
2915 page in virtual memory.  @file{mem-limits.h} provides a uniform interface for
2916 retrieving the total amount of available virtual memory.  Both are
2917 similar in spirit to the @file{sys*.h} files described in section J, below.
2918
2919
2920
2921 @example
2922 blocktype.c
2923 blocktype.h
2924 dynarr.c
2925 @end example
2926
2927 These implement a couple of basic C data types to facilitate memory
2928 allocation.  The @code{Blocktype} type efficiently manages the
2929 allocation of fixed-size blocks by minimizing the number of times that
2930 @code{malloc()} and @code{free()} are called.  It allocates memory in
2931 large chunks, subdivides the chunks into blocks of the proper size, and
2932 returns the blocks as requested.  When blocks are freed, they are placed
2933 onto a linked list, so they can be efficiently reused.  This data type
2934 is not much used in XEmacs currently, because it's a fairly new
2935 addition.
2936
2937 @cindex dynamic array
2938 The @code{Dynarr} type implements a @dfn{dynamic array}, which is
2939 similar to a standard C array but has no fixed limit on the number of
2940 elements it can contain.  Dynamic arrays can hold elements of any type,
2941 and when you add a new element, the array automatically resizes itself
2942 if it isn't big enough.  Dynarrs are extensively used in the redisplay
2943 mechanism.
2944
2945
2946
2947 @example
2948 inline.c
2949 @end example
2950
2951 This module is used in connection with inline functions (available in
2952 some compilers).  Often, inline functions need to have a corresponding
2953 non-inline function that does the same thing.  This module is where they
2954 reside.  It contains no actual code, but defines some special flags that
2955 cause inline functions defined in header files to be rendered as actual
2956 functions.  It then includes all header files that contain any inline
2957 function definitions, so that each one gets a real function equivalent.
2958
2959
2960
2961 @example
2962 debug.c
2963 debug.h
2964 @end example
2965
2966 These functions provide a system for doing internal consistency checks
2967 during code development.  This system is not currently used; instead the
2968 simpler @code{assert()} macro is used along with the various checks
2969 provided by the @samp{--error-check-*} configuration options.
2970
2971
2972
2973 @example
2974 prefix-args.c
2975 @end example
2976
2977 This is actually the source for a small, self-contained program
2978 used during building.
2979
2980
2981 @example
2982 universe.h
2983 @end example
2984
2985 This is not currently used.
2986
2987
2988
2989 @node Basic Lisp Modules, Modules for Standard Editing Operations, Low-Level Modules, A Summary of the Various XEmacs Modules
2990 @section Basic Lisp Modules
2991
2992 @example
2993 emacsfns.h
2994 lisp-disunion.h
2995 lisp-union.h
2996 lisp.h
2997 lrecord.h
2998 symsinit.h
2999 @end example
3000
3001 These are the basic header files for all XEmacs modules.  Each module
3002 includes @file{lisp.h}, which brings the other header files in.
3003 @file{lisp.h} contains the definitions of the structures and extractor
3004 and constructor macros for the basic Lisp objects and various other
3005 basic definitions for the Lisp environment, as well as some
3006 general-purpose definitions (e.g. @code{min()} and @code{max()}).
3007 @file{lisp.h} includes either @file{lisp-disunion.h} or
3008 @file{lisp-union.h}, depending on whether @code{USE_UNION_TYPE} is
3009 defined.  These files define the typedef of the Lisp object itself (as
3010 described above) and the low-level macros that hide the actual
3011 implementation of the Lisp object.  All extractor and constructor macros
3012 for particular types of Lisp objects are defined in terms of these
3013 low-level macros.
3014
3015 As a general rule, all typedefs should go into the typedefs section of
3016 @file{lisp.h} rather than into a module-specific header file even if the
3017 structure is defined elsewhere.  This allows function prototypes that
3018 use the typedef to be placed into other header files.  Forward structure
3019 declarations (i.e. a simple declaration like @code{struct foo;} where
3020 the structure itself is defined elsewhere) should be placed into the
3021 typedefs section as necessary.
3022
3023 @file{lrecord.h} contains the basic structures and macros that implement
3024 all record-type Lisp objects---i.e. all objects whose type is a field
3025 in their C structure, which includes all objects except the few most
3026 basic ones.
3027
3028 @file{lisp.h} contains prototypes for most of the exported functions in
3029 the various modules.  Lisp primitives defined using @code{DEFUN} that
3030 need to be called by C code should be declared using @code{EXFUN}.
3031 Other function prototypes should be placed either into the appropriate
3032 section of @code{lisp.h}, or into a module-specific header file,
3033 depending on how general-purpose the function is and whether it has
3034 special-purpose argument types requiring definitions not in
3035 @file{lisp.h}.)  All initialization functions are prototyped in
3036 @file{symsinit.h}.
3037
3038
3039
3040 @example
3041 alloc.c
3042 pure.c
3043 puresize.h
3044 @end example
3045
3046 The large module @file{alloc.c} implements all of the basic allocation and
3047 garbage collection for Lisp objects.  The most commonly used Lisp
3048 objects are allocated in chunks, similar to the Blocktype data type
3049 described above; others are allocated in individually @code{malloc()}ed
3050 blocks.  This module provides the foundation on which all other aspects
3051 of the Lisp environment sit, and is the first module initialized at
3052 startup.
3053
3054 Note that @file{alloc.c} provides a series of generic functions that are
3055 not dependent on any particular object type, and interfaces to
3056 particular types of objects using a standardized interface of
3057 type-specific methods.  This scheme is a fundamental principle of
3058 object-oriented programming and is heavily used throughout XEmacs.  The
3059 great advantage of this is that it allows for a clean separation of
3060 functionality into different modules---new classes of Lisp objects, new
3061 event interfaces, new device types, new stream interfaces, etc. can be
3062 added transparently without affecting code anywhere else in XEmacs.
3063 Because the different subsystems are divided into general and specific
3064 code, adding a new subtype within a subsystem will in general not
3065 require changes to the generic subsystem code or affect any of the other
3066 subtypes in the subsystem; this provides a great deal of robustness to
3067 the XEmacs code.
3068
3069 @cindex pure space
3070 @file{pure.c} contains the declaration of the @dfn{purespace} array.
3071 Pure space is a hack used to place some constant Lisp data into the code
3072 segment of the XEmacs executable, even though the data needs to be
3073 initialized through function calls.  (See above in section VIII for more
3074 info about this.)  During startup, certain sorts of data is
3075 automatically copied into pure space, and other data is copied manually
3076 in some of the basic Lisp files by calling the function @code{purecopy},
3077 which copies the object if possible (this only works in temacs, of
3078 course) and returns the new object.  In particular, while temacs is
3079 executing, the Lisp reader automatically copies all compiled-function
3080 objects that it reads into pure space.  Since compiled-function objects
3081 are large, are never modified, and typically comprise the majority of
3082 the contents of a compiled-Lisp file, this works well.  While XEmacs is
3083 running, any attempt to modify an object that resides in pure space
3084 causes an error.  Objects in pure space are never garbage collected --
3085 almost all of the time, they're intended to be permanent, and in any
3086 case you can't write into pure space to set the mark bits.
3087
3088 @file{puresize.h} contains the declaration of the size of the pure space
3089 array.  This depends on the optional features that are compiled in, any
3090 extra purespace requested by the user at compile time, and certain other
3091 factors (e.g. 64-bit machines need more pure space because their Lisp
3092 objects are larger).  The smallest size that suffices should be used, so
3093 that there's no wasted space.  If there's not enough pure space, you
3094 will get an error during the build process, specifying how much more
3095 pure space is needed.
3096
3097
3098
3099 @example
3100 eval.c
3101 backtrace.h
3102 @end example
3103
3104 This module contains all of the functions to handle the flow of control.
3105 This includes the mechanisms of defining functions, calling functions,
3106 traversing stack frames, and binding variables; the control primitives
3107 and other special forms such as @code{while}, @code{if}, @code{eval},
3108 @code{let}, @code{and}, @code{or}, @code{progn}, etc.; handling of
3109 non-local exits, unwind-protects, and exception handlers; entering the
3110 debugger; methods for the subr Lisp object type; etc.  It does
3111 @emph{not} include the @code{read} function, the @code{print} function,
3112 or the handling of symbols and obarrays.
3113
3114 @file{backtrace.h} contains some structures related to stack frames and the
3115 flow of control.
3116
3117
3118
3119 @example
3120 lread.c
3121 @end example
3122
3123 This module implements the Lisp reader and the @code{read} function,
3124 which converts text into Lisp objects, according to the read syntax of
3125 the objects, as described above.  This is similar to the parser that is
3126 a part of all compilers.
3127
3128
3129
3130 @example
3131 print.c
3132 @end example
3133
3134 This module implements the Lisp print mechanism and the @code{print}
3135 function and related functions.  This is the inverse of the Lisp reader
3136 -- it converts Lisp objects to a printed, textual representation.
3137 (Hopefully something that can be read back in using @code{read} to get
3138 an equivalent object.)
3139
3140
3141
3142 @example
3143 general.c
3144 symbols.c
3145 symeval.h
3146 @end example
3147
3148 @file{symbols.c} implements the handling of symbols, obarrays, and
3149 retrieving the values of symbols.  Much of the code is devoted to
3150 handling the special @dfn{symbol-value-magic} objects that define
3151 special types of variables---this includes buffer-local variables,
3152 variable aliases, variables that forward into C variables, etc.  This
3153 module is initialized extremely early (right after @file{alloc.c}),
3154 because it is here that the basic symbols @code{t} and @code{nil} are
3155 created, and those symbols are used everywhere throughout XEmacs.
3156
3157 @file{symeval.h} contains the definitions of symbol structures and the
3158 @code{DEFVAR_LISP()} and related macros for declaring variables.
3159
3160
3161
3162 @example
3163 data.c
3164 floatfns.c
3165 fns.c
3166 @end example
3167
3168 These modules implement the methods and standard Lisp primitives for all
3169 the basic Lisp object types other than symbols (which are described
3170 above).  @file{data.c} contains all the predicates (primitives that return
3171 whether an object is of a particular type); the integer arithmetic
3172 functions; and the basic accessor and mutator primitives for the various
3173 object types.  @file{fns.c} contains all the standard predicates for working
3174 with sequences (where, abstractly speaking, a sequence is an ordered set
3175 of objects, and can be represented by a list, string, vector, or
3176 bit-vector); it also contains @code{equal}, perhaps on the grounds that
3177 bulk of the operation of @code{equal} is comparing sequences.
3178 @file{floatfns.c} contains methods and primitives for floats and floating-point
3179 arithmetic.
3180
3181
3182
3183 @example
3184 bytecode.c
3185 bytecode.h
3186 @end example
3187
3188 @file{bytecode.c} implements the byte-code interpreter and
3189 compiled-function objects, and @file{bytecode.h} contains associated
3190 structures.  Note that the byte-code @emph{compiler} is written in Lisp.
3191
3192
3193
3194
3195 @node Modules for Standard Editing Operations, Editor-Level Control Flow Modules, Basic Lisp Modules, A Summary of the Various XEmacs Modules
3196 @section Modules for Standard Editing Operations
3197
3198 @example
3199 buffer.c
3200 buffer.h
3201 bufslots.h
3202 @end example
3203
3204 @file{buffer.c} implements the @dfn{buffer} Lisp object type.  This
3205 includes functions that create and destroy buffers; retrieve buffers by
3206 name or by other properties; manipulate lists of buffers (remember that
3207 buffers are permanent objects and stored in various ordered lists);
3208 retrieve or change buffer properties; etc.  It also contains the
3209 definitions of all the built-in buffer-local variables (which can be
3210 viewed as buffer properties).  It does @emph{not} contain code to
3211 manipulate buffer-local variables (that's in @file{symbols.c}, described
3212 above); or code to manipulate the text in a buffer.
3213
3214 @file{buffer.h} defines the structures associated with a buffer and the various
3215 macros for retrieving text from a buffer and special buffer positions
3216 (e.g. @code{point}, the default location for text insertion).  It also
3217 contains macros for working with buffer positions and converting between
3218 their representations as character offsets and as byte offsets (under
3219 MULE, they are different, because characters can be multi-byte).  It is
3220 one of the largest header files.
3221
3222 @file{bufslots.h} defines the fields in the buffer structure that correspond to
3223 the built-in buffer-local variables.  It is its own header file because
3224 it is included many times in @file{buffer.c}, as a way of iterating over all
3225 the built-in buffer-local variables.
3226
3227
3228
3229 @example
3230 insdel.c
3231 insdel.h
3232 @end example
3233
3234 @file{insdel.c} contains low-level functions for inserting and deleting text in
3235 a buffer, keeping track of changed regions for use by redisplay, and
3236 calling any before-change and after-change functions that may have been
3237 registered for the buffer.  It also contains the actual functions that
3238 convert between byte offsets and character offsets.
3239
3240 @file{insdel.h} contains associated headers.
3241
3242
3243
3244 @example
3245 marker.c
3246 @end example
3247
3248 This module implements the @dfn{marker} Lisp object type, which
3249 conceptually is a pointer to a text position in a buffer that moves
3250 around as text is inserted and deleted, so as to remain in the same
3251 relative position.  This module doesn't actually move the markers around
3252 -- that's handled in @file{insdel.c}.  This module just creates them and
3253 implements the primitives for working with them.  As markers are simple
3254 objects, this does not entail much.
3255
3256 Note that the standard arithmetic primitives (e.g. @code{+}) accept
3257 markers in place of integers and automatically substitute the value of
3258 @code{marker-position} for the marker, i.e. an integer describing the
3259 current buffer position of the marker.
3260
3261
3262
3263 @example
3264 extents.c
3265 extents.h
3266 @end example
3267
3268 This module implements the @dfn{extent} Lisp object type, which is like
3269 a marker that works over a range of text rather than a single position.
3270 Extents are also much more complex and powerful than markers and have a
3271 more efficient (and more algorithmically complex) implementation.  The
3272 implementation is described in detail in comments in @file{extents.c}.
3273
3274 The code in @file{extents.c} works closely with @file{insdel.c} so that
3275 extents are properly moved around as text is inserted and deleted.
3276 There is also code in @file{extents.c} that provides information needed
3277 by the redisplay mechanism for efficient operation. (Remember that
3278 extents can have display properties that affect [sometimes drastically,
3279 as in the @code{invisible} property] the display of the text they
3280 cover.)
3281
3282
3283
3284 @example
3285 editfns.c
3286 @end example
3287
3288 @file{editfns.c} contains the standard Lisp primitives for working with
3289 a buffer's text, and calls the low-level functions in @file{insdel.c}.
3290 It also contains primitives for working with @code{point} (the default
3291 buffer insertion location).
3292
3293 @file{editfns.c} also contains functions for retrieving various
3294 characteristics from the external environment: the current time, the
3295 process ID of the running XEmacs process, the name of the user who ran
3296 this XEmacs process, etc.  It's not clear why this code is in
3297 @file{editfns.c}.
3298
3299
3300
3301 @example
3302 callint.c
3303 cmds.c
3304 commands.h
3305 @end example
3306
3307 @cindex interactive
3308 These modules implement the basic @dfn{interactive} commands,
3309 i.e. user-callable functions.  Commands, as opposed to other functions,
3310 have special ways of getting their parameters interactively (by querying
3311 the user), as opposed to having them passed in a normal function
3312 invocation.  Many commands are not really meant to be called from other
3313 Lisp functions, because they modify global state in a way that's often
3314 undesired as part of other Lisp functions.
3315
3316 @file{callint.c} implements the mechanism for querying the user for
3317 parameters and calling interactive commands.  The bulk of this module is
3318 code that parses the interactive spec that is supplied with an
3319 interactive command.
3320
3321 @file{cmds.c} implements the basic, most commonly used editing commands:
3322 commands to move around the current buffer and insert and delete
3323 characters.  These commands are implemented using the Lisp primitives
3324 defined in @file{editfns.c}.
3325
3326 @file{commands.h} contains associated structure definitions and prototypes.
3327
3328
3329
3330 @example
3331 regex.c
3332 regex.h
3333 search.c
3334 @end example
3335
3336 @file{search.c} implements the Lisp primitives for searching for text in
3337 a buffer, and some of the low-level algorithms for doing this.  In
3338 particular, the fast fixed-string Boyer-Moore search algorithm is
3339 implemented in @file{search.c}.  The low-level algorithms for doing
3340 regular-expression searching, however, are implemented in @file{regex.c}
3341 and @file{regex.h}.  These two modules are largely independent of
3342 XEmacs, and are similar to (and based upon) the regular-expression
3343 routines used in @file{grep} and other GNU utilities.
3344
3345
3346
3347 @example
3348 doprnt.c
3349 @end example
3350
3351 @file{doprnt.c} implements formatted-string processing, similar to
3352 @code{printf()} command in C.
3353
3354
3355
3356 @example
3357 undo.c
3358 @end example
3359
3360 This module implements the undo mechanism for tracking buffer changes.
3361 Most of this could be implemented in Lisp.
3362
3363
3364
3365 @node Editor-Level Control Flow Modules, Modules for the Basic Displayable Lisp Objects, Modules for Standard Editing Operations, A Summary of the Various XEmacs Modules
3366 @section Editor-Level Control Flow Modules
3367
3368 @example
3369 event-Xt.c
3370 event-stream.c
3371 event-tty.c
3372 events.c
3373 events.h
3374 @end example
3375
3376 These implement the handling of events (user input and other system
3377 notifications).
3378
3379 @file{events.c} and @file{events.h} define the @dfn{event} Lisp object
3380 type and primitives for manipulating it.
3381
3382 @file{event-stream.c} implements the basic functions for working with
3383 event queues, dispatching an event by looking it up in relevant keymaps
3384 and such, and handling timeouts; this includes the primitives
3385 @code{next-event} and @code{dispatch-event}, as well as related
3386 primitives such as @code{sit-for}, @code{sleep-for}, and
3387 @code{accept-process-output}. (@file{event-stream.c} is one of the
3388 hairiest and trickiest modules in XEmacs.  Beware!  You can easily mess
3389 things up here.)
3390
3391 @file{event-Xt.c} and @file{event-tty.c} implement the low-level
3392 interfaces onto retrieving events from Xt (the X toolkit) and from TTY's
3393 (using @code{read()} and @code{select()}), respectively.  The event
3394 interface enforces a clean separation between the specific code for
3395 interfacing with the operating system and the generic code for working
3396 with events, by defining an API of basic, low-level event methods;
3397 @file{event-Xt.c} and @file{event-tty.c} are two different
3398 implementations of this API.  To add support for a new operating system
3399 (e.g. NeXTstep), one merely needs to provide another implementation of
3400 those API functions.
3401
3402 Note that the choice of whether to use @file{event-Xt.c} or
3403 @file{event-tty.c} is made at compile time!  Or at the very latest, it
3404 is made at startup time.  @file{event-Xt.c} handles events for
3405 @emph{both} X and TTY frames; @file{event-tty.c} is only used when X
3406 support is not compiled into XEmacs.  The reason for this is that there
3407 is only one event loop in XEmacs: thus, it needs to be able to receive
3408 events from all different kinds of frames.
3409
3410
3411
3412 @example
3413 keymap.c
3414 keymap.h
3415 @end example
3416
3417 @file{keymap.c} and @file{keymap.h} define the @dfn{keymap} Lisp object
3418 type and associated methods and primitives. (Remember that keymaps are
3419 objects that associate event descriptions with functions to be called to
3420 ``execute'' those events; @code{dispatch-event} looks up events in the
3421 relevant keymaps.)
3422
3423
3424
3425 @example
3426 keyboard.c
3427 @end example
3428
3429 @file{keyboard.c} contains functions that implement the actual editor
3430 command loop---i.e. the event loop that cyclically retrieves and
3431 dispatches events.  This code is also rather tricky, just like
3432 @file{event-stream.c}.
3433
3434
3435
3436 @example
3437 macros.c
3438 macros.h
3439 @end example
3440
3441 These two modules contain the basic code for defining keyboard macros.
3442 These functions don't actually do much; most of the code that handles keyboard
3443 macros is mixed in with the event-handling code in @file{event-stream.c}.
3444
3445
3446
3447 @example
3448 minibuf.c
3449 @end example
3450
3451 This contains some miscellaneous code related to the minibuffer (most of
3452 the minibuffer code was moved into Lisp by Richard Mlynarik).  This
3453 includes the primitives for completion (although filename completion is
3454 in @file{dired.c}), the lowest-level interface to the minibuffer (if the
3455 command loop were cleaned up, this too could be in Lisp), and code for
3456 dealing with the echo area (this, too, was mostly moved into Lisp, and
3457 the only code remaining is code to call out to Lisp or provide simple
3458 bootstrapping implementations early in temacs, before the echo-area Lisp
3459 code is loaded).
3460
3461
3462
3463 @node Modules for the Basic Displayable Lisp Objects, Modules for other Display-Related Lisp Objects, Editor-Level Control Flow Modules, A Summary of the Various XEmacs Modules
3464 @section Modules for the Basic Displayable Lisp Objects
3465
3466 @example
3467 device-ns.h
3468 device-stream.c
3469 device-stream.h
3470 device-tty.c
3471 device-tty.h
3472 device-x.c
3473 device-x.h
3474 device.c
3475 device.h
3476 @end example
3477
3478 These modules implement the @dfn{device} Lisp object type.  This
3479 abstracts a particular screen or connection on which frames are
3480 displayed.  As with Lisp objects, event interfaces, and other
3481 subsystems, the device code is separated into a generic component that
3482 contains a standardized interface (in the form of a set of methods) onto
3483 particular device types.
3484
3485 The device subsystem defines all the methods and provides method
3486 services for not only device operations but also for the frame, window,
3487 menubar, scrollbar, toolbar, and other displayable-object subsystems.
3488 The reason for this is that all of these subsystems have the same
3489 subtypes (X, TTY, NeXTstep, Microsoft Windows, etc.) as devices do.
3490
3491
3492
3493 @example
3494 frame-ns.h
3495 frame-tty.c
3496 frame-x.c
3497 frame-x.h
3498 frame.c
3499 frame.h
3500 @end example
3501
3502 Each device contains one or more frames in which objects (e.g. text) are
3503 displayed.  A frame corresponds to a window in the window system;
3504 usually this is a top-level window but it could potentially be one of a
3505 number of overlapping child windows within a top-level window, using the
3506 MDI (Multiple Document Interface) protocol in Microsoft Windows or a
3507 similar scheme.
3508
3509 The @file{frame-*} files implement the @dfn{frame} Lisp object type and
3510 provide the generic and device-type-specific operations on frames
3511 (e.g. raising, lowering, resizing, moving, etc.).
3512
3513
3514
3515 @example
3516 window.c
3517 window.h
3518 @end example
3519
3520 @cindex window (in Emacs)
3521 @cindex pane
3522 Each frame consists of one or more non-overlapping @dfn{windows} (better
3523 known as @dfn{panes} in standard window-system terminology) in which a
3524 buffer's text can be displayed.  Windows can also have scrollbars
3525 displayed around their edges.
3526
3527 @file{window.c} and @file{window.h} implement the @dfn{window} Lisp
3528 object type and provide code to manage windows.  Since windows have no
3529 associated resources in the window system (the window system knows only
3530 about the frame; no child windows or anything are used for XEmacs
3531 windows), there is no device-type-specific code here; all of that code
3532 is part of the redisplay mechanism or the code for particular object
3533 types such as scrollbars.
3534
3535
3536
3537 @node Modules for other Display-Related Lisp Objects, Modules for the Redisplay Mechanism, Modules for the Basic Displayable Lisp Objects, A Summary of the Various XEmacs Modules
3538 @section Modules for other Display-Related Lisp Objects
3539
3540 @example
3541 faces.c
3542 faces.h
3543 @end example
3544
3545
3546
3547 @example
3548 bitmaps.h
3549 glyphs-ns.h
3550 glyphs-x.c
3551 glyphs-x.h
3552 glyphs.c
3553 glyphs.h
3554 @end example
3555
3556
3557
3558 @example
3559 objects-ns.h
3560 objects-tty.c
3561 objects-tty.h
3562 objects-x.c
3563 objects-x.h
3564 objects.c
3565 objects.h
3566 @end example
3567
3568
3569
3570 @example
3571 menubar-x.c
3572 menubar.c
3573 @end example
3574
3575
3576
3577 @example
3578 scrollbar-x.c
3579 scrollbar-x.h
3580 scrollbar.c
3581 scrollbar.h
3582 @end example
3583
3584
3585
3586 @example
3587 toolbar-x.c
3588 toolbar.c
3589 toolbar.h
3590 @end example
3591
3592
3593
3594 @example
3595 font-lock.c
3596 @end example
3597
3598 This file provides C support for syntax highlighting---i.e.
3599 highlighting different syntactic constructs of a source file in
3600 different colors, for easy reading.  The C support is provided so that
3601 this is fast.
3602
3603
3604
3605 @example
3606 dgif_lib.c
3607 gif_err.c
3608 gif_lib.h
3609 gifalloc.c
3610 @end example
3611
3612 These modules decode GIF-format image files, for use with glyphs.
3613
3614
3615
3616 @node Modules for the Redisplay Mechanism, Modules for Interfacing with the File System, Modules for other Display-Related Lisp Objects, A Summary of the Various XEmacs Modules
3617 @section Modules for the Redisplay Mechanism
3618
3619 @example
3620 redisplay-output.c
3621 redisplay-tty.c
3622 redisplay-x.c
3623 redisplay.c
3624 redisplay.h
3625 @end example
3626
3627 These files provide the redisplay mechanism.  As with many other
3628 subsystems in XEmacs, there is a clean separation between the general
3629 and device-specific support.
3630
3631 @file{redisplay.c} contains the bulk of the redisplay engine.  These
3632 functions update the redisplay structures (which describe how the screen
3633 is to appear) to reflect any changes made to the state of any
3634 displayable objects (buffer, frame, window, etc.) since the last time
3635 that redisplay was called.  These functions are highly optimized to
3636 avoid doing more work than necessary (since redisplay is called
3637 extremely often and is potentially a huge time sink), and depend heavily
3638 on notifications from the objects themselves that changes have occurred,
3639 so that redisplay doesn't explicitly have to check each possible object.
3640 The redisplay mechanism also contains a great deal of caching to further
3641 speed things up; some of this caching is contained within the various
3642 displayable objects.
3643
3644 @file{redisplay-output.c} goes through the redisplay structures and converts
3645 them into calls to device-specific methods to actually output the screen
3646 changes.
3647
3648 @file{redisplay-x.c} and @file{redisplay-tty.c} are two implementations
3649 of these redisplay output methods, for X frames and TTY frames,
3650 respectively.
3651
3652
3653
3654 @example
3655 indent.c
3656 @end example
3657
3658 This module contains various functions and Lisp primitives for
3659 converting between buffer positions and screen positions.  These
3660 functions call the redisplay mechanism to do most of the work, and then
3661 examine the redisplay structures to get the necessary information.  This
3662 module needs work.
3663
3664
3665
3666 @example
3667 termcap.c
3668 terminfo.c
3669 tparam.c
3670 @end example
3671
3672 These files contain functions for working with the termcap (BSD-style)
3673 and terminfo (System V style) databases of terminal capabilities and
3674 escape sequences, used when XEmacs is displaying in a TTY.
3675
3676
3677
3678 @example
3679 cm.c
3680 cm.h
3681 @end example
3682
3683 These files provide some miscellaneous TTY-output functions and should
3684 probably be merged into @file{redisplay-tty.c}.
3685
3686
3687
3688 @node Modules for Interfacing with the File System, Modules for Other Aspects of the Lisp Interpreter and Object System, Modules for the Redisplay Mechanism, A Summary of the Various XEmacs Modules
3689 @section Modules for Interfacing with the File System
3690
3691 @example
3692 lstream.c
3693 lstream.h
3694 @end example
3695
3696 These modules implement the @dfn{stream} Lisp object type.  This is an
3697 internal-only Lisp object that implements a generic buffering stream.
3698 The idea is to provide a uniform interface onto all sources and sinks of
3699 data, including file descriptors, stdio streams, chunks of memory, Lisp
3700 buffers, Lisp strings, etc.  That way, I/O functions can be written to
3701 the stream interface and can transparently handle all possible sources
3702 and sinks.  (For example, the @code{read} function can read data from a
3703 file, a string, a buffer, or even a function that is called repeatedly
3704 to return data, without worrying about where the data is coming from or
3705 what-size chunks it is returned in.)
3706
3707 @cindex lstream
3708 Note that in the C code, streams are called @dfn{lstreams} (for ``Lisp
3709 streams'') to distinguish them from other kinds of streams, e.g. stdio
3710 streams and C++ I/O streams.
3711
3712 Similar to other subsystems in XEmacs, lstreams are separated into
3713 generic functions and a set of methods for the different types of
3714 lstreams.  @file{lstream.c} provides implementations of many different
3715 types of streams; others are provided, e.g., in @file{mule-coding.c}.
3716
3717
3718
3719 @example
3720 fileio.c
3721 @end example
3722
3723 This implements the basic primitives for interfacing with the file
3724 system.  This includes primitives for reading files into buffers,
3725 writing buffers into files, checking for the presence or accessibility
3726 of files, canonicalizing file names, etc.  Note that these primitives
3727 are usually not invoked directly by the user: There is a great deal of
3728 higher-level Lisp code that implements the user commands such as
3729 @code{find-file} and @code{save-buffer}.  This is similar to the
3730 distinction between the lower-level primitives in @file{editfns.c} and
3731 the higher-level user commands in @file{commands.c} and
3732 @file{simple.el}.
3733
3734
3735
3736 @example
3737 filelock.c
3738 @end example
3739
3740 This file provides functions for detecting clashes between different
3741 processes (e.g. XEmacs and some external process, or two different
3742 XEmacs processes) modifying the same file.  (XEmacs can optionally use
3743 the @file{lock/} subdirectory to provide a form of ``locking'' between
3744 different XEmacs processes.)  This module is also used by the low-level
3745 functions in @file{insdel.c} to ensure that, if the first modification
3746 is being made to a buffer whose corresponding file has been externally
3747 modified, the user is made aware of this so that the buffer can be
3748 synched up with the external changes if necessary.
3749
3750
3751 @example
3752 filemode.c
3753 @end example
3754
3755 This file provides some miscellaneous functions that construct a
3756 @samp{rwxr-xr-x}-type permissions string (as might appear in an
3757 @file{ls}-style directory listing) given the information returned by the
3758 @code{stat()} system call.
3759
3760
3761
3762 @example
3763 dired.c
3764 ndir.h
3765 @end example
3766
3767 These files implement the XEmacs interface to directory searching.  This
3768 includes a number of primitives for determining the files in a directory
3769 and for doing filename completion. (Remember that generic completion is
3770 handled by a different mechanism, in @file{minibuf.c}.)
3771
3772 @file{ndir.h} is a header file used for the directory-searching
3773 emulation functions provided in @file{sysdep.c} (see section J below),
3774 for systems that don't provide any directory-searching functions. (On
3775 those systems, directories can be read directly as files, and parsed.)
3776
3777
3778
3779 @example
3780 realpath.c
3781 @end example
3782
3783 This file provides an implementation of the @code{realpath()} function
3784 for expanding symbolic links, on systems that don't implement it or have
3785 a broken implementation.
3786
3787
3788
3789 @node Modules for Other Aspects of the Lisp Interpreter and Object System, Modules for Interfacing with the Operating System, Modules for Interfacing with the File System, A Summary of the Various XEmacs Modules
3790 @section Modules for Other Aspects of the Lisp Interpreter and Object System
3791
3792 @example
3793 elhash.c
3794 elhash.h
3795 hash.c
3796 hash.h
3797 @end example
3798
3799 These files provide two implementations of hash tables.  Files
3800 @file{hash.c} and @file{hash.h} provide a generic C implementation of
3801 hash tables which can stand independently of XEmacs.  Files
3802 @file{elhash.c} and @file{elhash.h} provide a separate implementation of
3803 hash tables that can store only Lisp objects, and knows about Lispy
3804 things like garbage collection, and implement the @dfn{hash-table} Lisp
3805 object type.
3806
3807
3808 @example
3809 specifier.c
3810 specifier.h
3811 @end example
3812
3813 This module implements the @dfn{specifier} Lisp object type.  This is
3814 primarily used for displayable properties, and allows for values that
3815 are specific to a particular buffer, window, frame, device, or device
3816 class, as well as a default value existing.  This is used, for example,
3817 to control the height of the horizontal scrollbar or the appearance of
3818 the @code{default}, @code{bold}, or other faces.  The specifier object
3819 consists of a number of specifications, each of which maps from a
3820 buffer, window, etc. to a value.  The function @code{specifier-instance}
3821 looks up a value given a window (from which a buffer, frame, and device
3822 can be derived).
3823
3824
3825 @example
3826 chartab.c
3827 chartab.h
3828 casetab.c
3829 @end example
3830
3831 @file{chartab.c} and @file{chartab.h} implement the @dfn{char table}
3832 Lisp object type, which maps from characters or certain sorts of
3833 character ranges to Lisp objects.  The implementation of this object
3834 type is optimized for the internal representation of characters.  Char
3835 tables come in different types, which affect the allowed object types to
3836 which a character can be mapped and also dictate certain other
3837 properties of the char table.
3838
3839 @cindex case table
3840 @file{casetab.c} implements one sort of char table, the @dfn{case
3841 table}, which maps characters to other characters of possibly different
3842 case.  These are used by XEmacs to implement case-changing primitives
3843 and to do case-insensitive searching.
3844
3845
3846
3847 @example
3848 syntax.c
3849 syntax.h
3850 @end example
3851
3852 @cindex scanner
3853 This module implements @dfn{syntax tables}, another sort of char table
3854 that maps characters into syntax classes that define the syntax of these
3855 characters (e.g. a parenthesis belongs to a class of @samp{open}
3856 characters that have corresponding @samp{close} characters and can be
3857 nested).  This module also implements the Lisp @dfn{scanner}, a set of
3858 primitives for scanning over text based on syntax tables.  This is used,
3859 for example, to find the matching parenthesis in a command such as
3860 @code{forward-sexp}, and by @file{font-lock.c} to locate quoted strings,
3861 comments, etc.
3862
3863
3864
3865 @example
3866 casefiddle.c
3867 @end example
3868
3869 This module implements various Lisp primitives for upcasing, downcasing
3870 and capitalizing strings or regions of buffers.
3871
3872
3873
3874 @example
3875 rangetab.c
3876 @end example
3877
3878 This module implements the @dfn{range table} Lisp object type, which
3879 provides for a mapping from ranges of integers to arbitrary Lisp
3880 objects.
3881
3882
3883
3884 @example
3885 opaque.c
3886 opaque.h
3887 @end example
3888
3889 This module implements the @dfn{opaque} Lisp object type, an
3890 internal-only Lisp object that encapsulates an arbitrary block of memory
3891 so that it can be managed by the Lisp allocation system.  To create an
3892 opaque object, you call @code{make_opaque()}, passing a pointer to a
3893 block of memory.  An object is created that is big enough to hold the
3894 memory, which is copied into the object's storage.  The object will then
3895 stick around as long as you keep pointers to it, after which it will be
3896 automatically reclaimed.
3897
3898 @cindex mark method
3899 Opaque objects can also have an arbitrary @dfn{mark method} associated
3900 with them, in case the block of memory contains other Lisp objects that
3901 need to be marked for garbage-collection purposes. (If you need other
3902 object methods, such as a finalize method, you should just go ahead and
3903 create a new Lisp object type---it's not hard.)
3904
3905
3906
3907 @example
3908 abbrev.c
3909 @end example
3910
3911 This function provides a few primitives for doing dynamic abbreviation
3912 expansion.  In XEmacs, most of the code for this has been moved into
3913 Lisp.  Some C code remains for speed and because the primitive
3914 @code{self-insert-command} (which is executed for all self-inserting
3915 characters) hooks into the abbrev mechanism. (@code{self-insert-command}
3916 is itself in C only for speed.)
3917
3918
3919
3920 @example
3921 doc.c
3922 @end example
3923
3924 This function provides primitives for retrieving the documentation
3925 strings of functions and variables.  These documentation strings contain
3926 certain special markers that get dynamically expanded (e.g. a
3927 reverse-lookup is performed on some named functions to retrieve their
3928 current key bindings).  Some documentation strings (in particular, for
3929 the built-in primitives and pre-loaded Lisp functions) are stored
3930 externally in a file @file{DOC} in the @file{lib-src/} directory and
3931 need to be fetched from that file. (Part of the build stage involves
3932 building this file, and another part involves constructing an index for
3933 this file and embedding it into the executable, so that the functions in
3934 @file{doc.c} do not have to search the entire @file{DOC} file to find
3935 the appropriate documentation string.)
3936
3937
3938
3939 @example
3940 md5.c
3941 @end example
3942
3943 This function provides a Lisp primitive that implements the MD5 secure
3944 hashing scheme, used to create a large hash value of a string of data such that
3945 the data cannot be derived from the hash value.  This is used for
3946 various security applications on the Internet.
3947
3948
3949
3950
3951 @node Modules for Interfacing with the Operating System, Modules for Interfacing with X Windows, Modules for Other Aspects of the Lisp Interpreter and Object System, A Summary of the Various XEmacs Modules
3952 @section Modules for Interfacing with the Operating System
3953
3954 @example
3955 callproc.c
3956 process.c
3957 process.h
3958 @end example
3959
3960 These modules allow XEmacs to spawn and communicate with subprocesses
3961 and network connections.
3962
3963 @cindex synchronous subprocesses
3964 @cindex subprocesses, synchronous
3965   @file{callproc.c} implements (through the @code{call-process}
3966 primitive) what are called @dfn{synchronous subprocesses}.  This means
3967 that XEmacs runs a program, waits till it's done, and retrieves its
3968 output.  A typical example might be calling the @file{ls} program to get
3969 a directory listing.
3970
3971 @cindex asynchronous subprocesses
3972 @cindex subprocesses, asynchronous
3973   @file{process.c} and @file{process.h} implement @dfn{asynchronous
3974 subprocesses}.  This means that XEmacs starts a program and then
3975 continues normally, not waiting for the process to finish.  Data can be
3976 sent to the process or retrieved from it as it's running.  This is used
3977 for the @code{shell} command (which provides a front end onto a shell
3978 program such as @file{csh}), the mail and news readers implemented in
3979 XEmacs, etc.  The result of calling @code{start-process} to start a
3980 subprocess is a process object, a particular kind of object used to
3981 communicate with the subprocess.  You can send data to the process by
3982 passing the process object and the data to @code{send-process}, and you
3983 can specify what happens to data retrieved from the process by setting
3984 properties of the process object. (When the process sends data, XEmacs
3985 receives a process event, which says that there is data ready.  When
3986 @code{dispatch-event} is called on this event, it reads the data from
3987 the process and does something with it, as specified by the process
3988 object's properties.  Typically, this means inserting the data into a
3989 buffer or calling a function.) Another property of the process object is
3990 called the @dfn{sentinel}, which is a function that is called when the
3991 process terminates.
3992
3993 @cindex network connections
3994   Process objects are also used for network connections (connections to a
3995 process running on another machine).  Network connections are started
3996 with @code{open-network-stream} but otherwise work just like
3997 subprocesses.
3998
3999
4000
4001 @example
4002 sysdep.c
4003 sysdep.h
4004 @end example
4005
4006   These modules implement most of the low-level, messy operating-system
4007 interface code.  This includes various device control (ioctl) operations
4008 for file descriptors, TTY's, pseudo-terminals, etc. (usually this stuff
4009 is fairly system-dependent; thus the name of this module), and emulation
4010 of standard library functions and system calls on systems that don't
4011 provide them or have broken versions.
4012
4013
4014
4015 @example
4016 sysdir.h
4017 sysfile.h
4018 sysfloat.h
4019 sysproc.h
4020 syspwd.h
4021 syssignal.h
4022 systime.h
4023 systty.h
4024 syswait.h
4025 @end example
4026
4027 These header files provide consistent interfaces onto system-dependent
4028 header files and system calls.  The idea is that, instead of including a
4029 standard header file like @file{<sys/param.h>} (which may or may not
4030 exist on various systems) or having to worry about whether all system
4031 provide a particular preprocessor constant, or having to deal with the
4032 four different paradigms for manipulating signals, you just include the
4033 appropriate @file{sys*.h} header file, which includes all the right
4034 system header files, defines and missing preprocessor constants,
4035 provides a uniform interface onto system calls, etc.
4036
4037 @file{sysdir.h} provides a uniform interface onto directory-querying
4038 functions. (In some cases, this is in conjunction with emulation
4039 functions in @file{sysdep.c}.)
4040
4041 @file{sysfile.h} includes all the necessary header files for standard
4042 system calls (e.g. @code{read()}), ensures that all necessary
4043 @code{open()} and @code{stat()} preprocessor constants are defined, and
4044 possibly (usually) substitutes sugared versions of @code{read()},
4045 @code{write()}, etc. that automatically restart interrupted I/O
4046 operations.
4047
4048 @file{sysfloat.h} includes the necessary header files for floating-point
4049 operations.
4050
4051 @file{sysproc.h} includes the necessary header files for calling
4052 @code{select()}, @code{fork()}, @code{execve()}, socket operations, and
4053 the like, and ensures that the @code{FD_*()} macros for descriptor-set
4054 manipulations are available.
4055
4056 @file{syspwd.h} includes the necessary header files for obtaining
4057 information from @file{/etc/passwd} (the functions are emulated under
4058 VMS).
4059
4060 @file{syssignal.h} includes the necessary header files for
4061 signal-handling and provides a uniform interface onto the different
4062 signal-handling and signal-blocking paradigms.
4063
4064 @file{systime.h} includes the necessary header files and provides
4065 uniform interfaces for retrieving the time of day, setting file
4066 access/modification times, getting the amount of time used by the XEmacs
4067 process, etc.
4068
4069 @file{systty.h} buffers against the infinitude of different ways of
4070 controlling TTY's.
4071
4072 @file{syswait.h} provides a uniform way of retrieving the exit status
4073 from a @code{wait()}ed-on process (some systems use a union, others use
4074 an int).
4075
4076
4077
4078 @example
4079 hpplay.c
4080 libsst.c
4081 libsst.h
4082 libst.h
4083 linuxplay.c
4084 nas.c
4085 sgiplay.c
4086 sound.c
4087 sunplay.c
4088 @end example
4089
4090 These files implement the ability to play various sounds on some types
4091 of computers.  You have to configure your XEmacs with sound support in
4092 order to get this capability.
4093
4094 @file{sound.c} provides the generic interface.  It implements various
4095 Lisp primitives and variables that let you specify which sounds should
4096 be played in certain conditions. (The conditions are identified by
4097 symbols, which are passed to @code{ding} to make a sound.  Various
4098 standard functions call this function at certain times; if sound support
4099 does not exist, a simple beep results.
4100
4101 @cindex native sound
4102 @cindex sound, native
4103 @file{sgiplay.c}, @file{sunplay.c}, @file{hpplay.c}, and
4104 @file{linuxplay.c} interface to the machine's speaker for various
4105 different kind of machines.  This is called @dfn{native} sound.
4106
4107 @cindex sound, network
4108 @cindex network sound
4109 @cindex NAS
4110 @file{nas.c} interfaces to a computer somewhere else on the network
4111 using the NAS (Network Audio Server) protocol, playing sounds on that
4112 machine.  This allows you to run XEmacs on a remote machine, with its
4113 display set to your local machine, and have the sounds be made on your
4114 local machine, provided that you have a NAS server running on your local
4115 machine.
4116
4117 @file{libsst.c}, @file{libsst.h}, and @file{libst.h} provide some
4118 additional functions for playing sound on a Sun SPARC but are not
4119 currently in use.
4120
4121
4122
4123 @example
4124 tooltalk.c
4125 tooltalk.h
4126 @end example
4127
4128 These two modules implement an interface to the ToolTalk protocol, which
4129 is an interprocess communication protocol implemented on some versions
4130 of Unix.  ToolTalk is a high-level protocol that allows processes to
4131 register themselves as providers of particular services; other processes
4132 can then request a service without knowing or caring exactly who is
4133 providing the service.  It is similar in spirit to the DDE protocol
4134 provided under Microsoft Windows.  ToolTalk is a part of the new CDE
4135 (Common Desktop Environment) specification and is used to connect the
4136 parts of the SPARCWorks development environment.
4137
4138
4139
4140 @example
4141 getloadavg.c
4142 @end example
4143
4144 This module provides the ability to retrieve the system's current load
4145 average. (The way to do this is highly system-specific, unfortunately,
4146 and requires a lot of special-case code.)
4147
4148
4149
4150 @example
4151 sunpro.c
4152 @end example
4153
4154 This module provides a small amount of code used internally at Sun to
4155 keep statistics on the usage of XEmacs.
4156
4157
4158
4159 @example
4160 broken-sun.h
4161 strcmp.c
4162 strcpy.c
4163 sunOS-fix.c
4164 @end example
4165
4166 These files provide replacement functions and prototypes to fix numerous
4167 bugs in early releases of SunOS 4.1.
4168
4169
4170
4171 @example
4172 hftctl.c
4173 @end example
4174
4175 This module provides some terminal-control code necessary on versions of
4176 AIX prior to 4.1.
4177
4178
4179
4180 @example
4181 msdos.c
4182 msdos.h
4183 @end example
4184
4185 These modules are used for MS-DOS support, which does not work in
4186 XEmacs.
4187
4188
4189
4190 @node Modules for Interfacing with X Windows, Modules for Internationalization, Modules for Interfacing with the Operating System, A Summary of the Various XEmacs Modules
4191 @section Modules for Interfacing with X Windows
4192
4193 @example
4194 Emacs.ad.h
4195 @end example
4196
4197 A file generated from @file{Emacs.ad}, which contains XEmacs-supplied
4198 fallback resources (so that XEmacs has pretty defaults).
4199
4200
4201
4202 @example
4203 EmacsFrame.c
4204 EmacsFrame.h
4205 EmacsFrameP.h
4206 @end example
4207
4208 These modules implement an Xt widget class that encapsulates a frame.
4209 This is for ease in integrating with Xt.  The EmacsFrame widget covers
4210 the entire X window except for the menubar; the scrollbars are
4211 positioned on top of the EmacsFrame widget.
4212
4213 @strong{Warning:} Abandon hope, all ye who enter here.  This code took
4214 an ungodly amount of time to get right, and is likely to fall apart
4215 mercilessly at the slightest change.  Such is life under Xt.
4216
4217
4218
4219 @example
4220 EmacsManager.c
4221 EmacsManager.h
4222 EmacsManagerP.h
4223 @end example
4224
4225 These modules implement a simple Xt manager (i.e. composite) widget
4226 class that simply lets its children set whatever geometry they want.
4227 It's amazing that Xt doesn't provide this standardly, but on second
4228 thought, it makes sense, considering how amazingly broken Xt is.
4229
4230
4231 @example
4232 EmacsShell-sub.c
4233 EmacsShell.c
4234 EmacsShell.h
4235 EmacsShellP.h
4236 @end example
4237
4238 These modules implement two Xt widget classes that are subclasses of
4239 the TopLevelShell and TransientShell classes.  This is necessary to deal
4240 with more brokenness that Xt has sadistically thrust onto the backs of
4241 developers.
4242
4243
4244
4245 @example
4246 xgccache.c
4247 xgccache.h
4248 @end example
4249
4250 These modules provide functions for maintenance and caching of GC's
4251 (graphics contexts) under the X Window System.  This code is junky and
4252 needs to be rewritten.
4253
4254
4255
4256 @example
4257 xselect.c
4258 @end example
4259
4260 @cindex selections
4261   This module provides an interface to the X Window System's concept of
4262 @dfn{selections}, the standard way for X applications to communicate
4263 with each other.
4264
4265
4266
4267 @example
4268 xintrinsic.h
4269 xintrinsicp.h
4270 xmmanagerp.h
4271 xmprimitivep.h
4272 @end example
4273
4274 These header files are similar in spirit to the @file{sys*.h} files and buffer
4275 against different implementations of Xt and Motif.
4276
4277 @itemize @bullet
4278 @item
4279 @file{xintrinsic.h} should be included in place of @file{<Intrinsic.h>}.
4280 @item
4281 @file{xintrinsicp.h} should be included in place of @file{<IntrinsicP.h>}.
4282 @item
4283 @file{xmmanagerp.h} should be included in place of @file{<XmManagerP.h>}.
4284 @item
4285 @file{xmprimitivep.h} should be included in place of @file{<XmPrimitiveP.h>}.
4286 @end itemize
4287
4288
4289
4290 @example
4291 xmu.c
4292 xmu.h
4293 @end example
4294
4295 These files provide an emulation of the Xmu library for those systems
4296 (i.e. HPUX) that don't provide it as a standard part of X.
4297
4298
4299
4300 @example
4301 ExternalClient-Xlib.c
4302 ExternalClient.c
4303 ExternalClient.h
4304 ExternalClientP.h
4305 ExternalShell.c
4306 ExternalShell.h
4307 ExternalShellP.h
4308 extw-Xlib.c
4309 extw-Xlib.h
4310 extw-Xt.c
4311 extw-Xt.h
4312 @end example
4313
4314 @cindex external widget
4315   These files provide the @dfn{external widget} interface, which allows an
4316 XEmacs frame to appear as a widget in another application.  To do this,
4317 you have to configure with @samp{--external-widget}.
4318
4319 @file{ExternalShell*} provides the server (XEmacs) side of the
4320 connection.
4321
4322 @file{ExternalClient*} provides the client (other application) side of
4323 the connection.  These files are not compiled into XEmacs but are
4324 compiled into libraries that are then linked into your application.
4325
4326 @file{extw-*} is common code that is used for both the client and server.
4327
4328 Don't touch this code; something is liable to break if you do.
4329
4330
4331
4332 @node Modules for Internationalization,  , Modules for Interfacing with X Windows, A Summary of the Various XEmacs Modules
4333 @section Modules for Internationalization
4334
4335 @example
4336 mule-canna.c
4337 mule-ccl.c
4338 mule-charset.c
4339 mule-charset.h
4340 mule-coding.c
4341 mule-coding.h
4342 mule-mcpath.c
4343 mule-mcpath.h
4344 mule-wnnfns.c
4345 mule.c
4346 @end example
4347
4348 These files implement the MULE (Asian-language) support.  Note that MULE
4349 actually provides a general interface for all sorts of languages, not
4350 just Asian languages (although they are generally the most complicated
4351 to support).  This code is still in beta.
4352
4353 @file{mule-charset.*} and @file{mule-coding.*} provide the heart of the
4354 XEmacs MULE support.  @file{mule-charset.*} implements the @dfn{charset}
4355 Lisp object type, which encapsulates a character set (an ordered one- or
4356 two-dimensional set of characters, such as US ASCII or JISX0208 Japanese
4357 Kanji).
4358
4359 @file{mule-coding.*} implements the @dfn{coding-system} Lisp object
4360 type, which encapsulates a method of converting between different
4361 encodings.  An encoding is a representation of a stream of characters,
4362 possibly from multiple character sets, using a stream of bytes or words,
4363 and defines (e.g.) which escape sequences are used to specify particular
4364 character sets, how the indices for a character are converted into bytes
4365 (sometimes this involves setting the high bit; sometimes complicated
4366 rearranging of the values takes place, as in the Shift-JIS encoding),
4367 etc.
4368
4369 @file{mule-ccl.c} provides the CCL (Code Conversion Language)
4370 interpreter.  CCL is similar in spirit to Lisp byte code and is used to
4371 implement converters for custom encodings.
4372
4373 @file{mule-canna.c} and @file{mule-wnnfns.c} implement interfaces to
4374 external programs used to implement the Canna and WNN input methods,
4375 respectively.  This is currently in beta.
4376
4377 @file{mule-mcpath.c} provides some functions to allow for pathnames
4378 containing extended characters.  This code is fragmentary, obsolete, and
4379 completely non-working.  Instead, @var{pathname-coding-system} is used
4380 to specify conversions of names of files and directories.  The standard
4381 C I/O functions like @samp{open()} are wrapped so that conversion occurs
4382 automatically.
4383
4384 @file{mule.c} provides a few miscellaneous things that should probably
4385 be elsewhere.
4386
4387
4388
4389 @example
4390 intl.c
4391 @end example
4392
4393 This provides some miscellaneous internationalization code for
4394 implementing message translation and interfacing to the Ximp input
4395 method.  None of this code is currently working.
4396
4397
4398
4399 @example
4400 iso-wide.h
4401 @end example
4402
4403 This contains leftover code from an earlier implementation of
4404 Asian-language support, and is not currently used.
4405
4406
4407
4408
4409 @node Allocation of Objects in XEmacs Lisp, Dumping, A Summary of the Various XEmacs Modules, Top
4410 @chapter Allocation of Objects in XEmacs Lisp
4411
4412 @menu
4413 * Introduction to Allocation::
4414 * Garbage Collection::
4415 * GCPROing::
4416 * Garbage Collection - Step by Step::
4417 * Integers and Characters::
4418 * Allocation from Frob Blocks::
4419 * lrecords::
4420 * Low-level allocation::
4421 * Pure Space::
4422 * Cons::
4423 * Vector::
4424 * Bit Vector::
4425 * Symbol::
4426 * Marker::
4427 * String::
4428 * Compiled Function::
4429 @end menu
4430
4431 @node Introduction to Allocation, Garbage Collection, Allocation of Objects in XEmacs Lisp, Allocation of Objects in XEmacs Lisp
4432 @section Introduction to Allocation
4433
4434   Emacs Lisp, like all Lisps, has garbage collection.  This means that
4435 the programmer never has to explicitly free (destroy) an object; it
4436 happens automatically when the object becomes inaccessible.  Most
4437 experts agree that garbage collection is a necessity in a modern,
4438 high-level language.  Its omission from C stems from the fact that C was
4439 originally designed to be a nice abstract layer on top of assembly
4440 language, for writing kernels and basic system utilities rather than
4441 large applications.
4442
4443   Lisp objects can be created by any of a number of Lisp primitives.
4444 Most object types have one or a small number of basic primitives
4445 for creating objects.  For conses, the basic primitive is @code{cons};
4446 for vectors, the primitives are @code{make-vector} and @code{vector}; for
4447 symbols, the primitives are @code{make-symbol} and @code{intern}; etc.
4448 Some Lisp objects, especially those that are primarily used internally,
4449 have no corresponding Lisp primitives.  Every Lisp object, though,
4450 has at least one C primitive for creating it.
4451
4452   Recall from section (VII) that a Lisp object, as stored in a 32-bit
4453 or 64-bit word, has a mark bit, a few tag bits, and a ``value'' that
4454 occupies the remainder of the bits.  We can separate the different
4455 Lisp object types into four broad categories:
4456
4457 @itemize @bullet
4458 @item
4459 (a) Those for whom the value directly represents the contents of the
4460 Lisp object.  Only two types are in this category: integers and
4461 characters.  No special allocation or garbage collection is necessary
4462 for such objects.  Lisp objects of these types do not need to be
4463 @code{GCPRO}ed.
4464 @end itemize
4465
4466   In the remaining three categories, the value is a pointer to a
4467 structure.
4468
4469 @itemize @bullet
4470 @item
4471 @cindex frob block
4472 (b) Those for whom the tag directly specifies the type.  Recall that
4473 there are only three tag bits; this means that at most five types can be
4474 specified this way.  The most commonly-used types are stored in this
4475 format; this includes conses, strings, vectors, and sometimes symbols.
4476 With the exception of vectors, objects in this category are allocated in
4477 @dfn{frob blocks}, i.e. large blocks of memory that are subdivided into
4478 individual objects.  This saves a lot on malloc overhead, since there
4479 are typically quite a lot of these objects around, and the objects are
4480 small.  (A cons, for example, occupies 8 bytes on 32-bit machines---4
4481 bytes for each of the two objects it contains.) Vectors are individually
4482 @code{malloc()}ed since they are of variable size.  (It would be
4483 possible, and desirable, to allocate vectors of certain small sizes out
4484 of frob blocks, but it isn't currently done.) Strings are handled
4485 specially: Each string is allocated in two parts, a fixed size structure
4486 containing a length and a data pointer, and the actual data of the
4487 string.  The former structure is allocated in frob blocks as usual, and
4488 the latter data is stored in @dfn{string chars blocks} and is relocated
4489 during garbage collection to eliminate holes.
4490 @end itemize
4491
4492   In the remaining two categories, the type is stored in the object
4493 itself.  The tag for all such objects is the generic @dfn{lrecord}
4494 (Lisp_Record) tag.  The first four bytes (or eight, for 64-bit machines)
4495 of the object's structure are a pointer to a structure that describes
4496 the object's type, which includes method pointers and a pointer to a
4497 string naming the type.  Note that it's possible to save some space by
4498 using a one- or two-byte tag, rather than a four- or eight-byte pointer
4499 to store the type, but it's not clear it's worth making the change.
4500
4501 @itemize @bullet
4502 @item
4503 (c) Those lrecords that are allocated in frob blocks (see above).  This
4504 includes the objects that are most common and relatively small, and
4505 includes floats, compiled functions, symbols (when not in category (b)),
4506 extents, events, and markers.  With the cleanup of frob blocks done in
4507 19.12, it's not terribly hard to add more objects to this category, but
4508 it's a bit trickier than adding an object type to type (d) (esp. if the
4509 object needs a finalization method), and is not likely to save much
4510 space unless the object is small and there are many of them. (In fact,
4511 if there are very few of them, it might actually waste space.)
4512 @item
4513 (d) Those lrecords that are individually @code{malloc()}ed.  These are
4514 called @dfn{lcrecords}.  All other types are in this category.  Adding a
4515 new type to this category is comparatively easy, and all types added
4516 since 19.8 (when the current allocation scheme was devised, by Richard
4517 Mlynarik), with the exception of the character type, have been in this
4518 category.
4519 @end itemize
4520
4521   Note that bit vectors are a bit of a special case.  They are
4522 simple lrecords as in category (c), but are individually @code{malloc()}ed
4523 like vectors.  You can basically view them as exactly like vectors
4524 except that their type is stored in lrecord fashion rather than
4525 in directly-tagged fashion.
4526
4527   Note that FSF Emacs redesigned their object system in 19.29 to follow
4528 a similar scheme.  However, given RMS's expressed dislike for data
4529 abstraction, the FSF scheme is not nearly as clean or as easy to
4530 extend. (FSF calls items of type (c) @code{Lisp_Misc} and items of type
4531 (d) @code{Lisp_Vectorlike}, with separate tags for each, although
4532 @code{Lisp_Vectorlike} is also used for vectors.)
4533
4534 @node Garbage Collection, GCPROing, Introduction to Allocation, Allocation of Objects in XEmacs Lisp
4535 @section Garbage Collection
4536 @cindex garbage collection
4537
4538 @cindex mark and sweep
4539   Garbage collection is simple in theory but tricky to implement.
4540 Emacs Lisp uses the oldest garbage collection method, called
4541 @dfn{mark and sweep}.  Garbage collection begins by starting with
4542 all accessible locations (i.e. all variables and other slots where
4543 Lisp objects might occur) and recursively traversing all objects
4544 accessible from those slots, marking each one that is found.
4545 We then go through all of memory and free each object that is
4546 not marked, and unmarking each object that is marked.  Note
4547 that ``all of memory'' means all currently allocated objects.
4548 Traversing all these objects means traversing all frob blocks,
4549 all vectors (which are chained in one big list), and all
4550 lcrecords (which are likewise chained).
4551
4552   Note that, when an object is marked, the mark has to occur
4553 inside of the object's structure, rather than in the 32-bit
4554 @code{Lisp_Object} holding the object's pointer; i.e. you can't just
4555 set the pointer's mark bit.  This is because there may be many
4556 pointers to the same object.  This means that the method of
4557 marking an object can differ depending on the type.  The
4558 different marking methods are approximately as follows:
4559
4560 @enumerate
4561 @item
4562 For conses, the mark bit of the car is set.
4563 @item
4564 For strings, the mark bit of the string's plist is set.
4565 @item
4566 For symbols when not lrecords, the mark bit of the
4567 symbol's plist is set.
4568 @item
4569 For vectors, the length is negated after adding 1.
4570 @item
4571 For lrecords, the pointer to the structure describing
4572 the type is changed (see below).
4573 @item
4574 Integers and characters do not need to be marked, since
4575 no allocation occurs for them.
4576 @end enumerate
4577
4578   The details of this are in the @code{mark_object()} function.
4579
4580   Note that any code that operates during garbage collection has
4581 to be especially careful because of the fact that some objects
4582 may be marked and as such may not look like they normally do.
4583 In particular:
4584
4585 @itemize @bullet
4586 Some object pointers may have their mark bit set.  This will make
4587 @code{FOOBARP()} predicates fail.  Use @code{GC_FOOBARP()} to deal with
4588 this.
4589 @item
4590 Even if you clear the mark bit, @code{FOOBARP()} will still fail
4591 for lrecords because the implementation pointer has been
4592 changed (see below).  @code{GC_FOOBARP()} will correctly deal with
4593 this.
4594 @item
4595 Vectors have their size field munged, so anything that
4596 looks at this field will fail.
4597 @item
4598 Note that @code{XFOOBAR()} macros @emph{will} work correctly on object
4599 pointers with their mark bit set, because the logical shift operations
4600 that remove the tag also remove the mark bit.
4601 @end itemize
4602
4603   Finally, note that garbage collection can be invoked explicitly
4604 by calling @code{garbage-collect} but is also called automatically
4605 by @code{eval}, once a certain amount of memory has been allocated
4606 since the last garbage collection (according to @code{gc-cons-threshold}).
4607
4608 @node GCPROing, Garbage Collection - Step by Step, Garbage Collection, Allocation of Objects in XEmacs Lisp
4609 @section @code{GCPRO}ing
4610
4611 @code{GCPRO}ing is one of the ugliest and trickiest parts of Emacs
4612 internals.  The basic idea is that whenever garbage collection
4613 occurs, all in-use objects must be reachable somehow or
4614 other from one of the roots of accessibility.  The roots
4615 of accessibility are:
4616
4617 @enumerate
4618 @item
4619 All objects that have been @code{staticpro()}d.  This is used for
4620 any global C variables that hold Lisp objects.  A call to
4621 @code{staticpro()} happens implicitly as a result of any symbols
4622 declared with @code{defsymbol()} and any variables declared with
4623 @code{DEFVAR_FOO()}.  You need to explicitly call @code{staticpro()}
4624 (in the @code{vars_of_foo()} method of a module) for other global
4625 C variables holding Lisp objects. (This typically includes
4626 internal lists and such things.)
4627
4628 Note that @code{obarray} is one of the @code{staticpro()}d things.
4629 Therefore, all functions and variables get marked through this.
4630 @item
4631 Any shadowed bindings that are sitting on the @code{specpdl} stack.
4632 @item
4633 Any objects sitting in currently active (Lisp) stack frames,
4634 catches, and condition cases.
4635 @item
4636 A couple of special-case places where active objects are
4637 located.
4638 @item
4639 Anything currently marked with @code{GCPRO}.
4640 @end enumerate
4641
4642   Marking with @code{GCPRO} is necessary because some C functions (quite
4643 a lot, in fact), allocate objects during their operation.  Quite
4644 frequently, there will be no other pointer to the object while the
4645 function is running, and if a garbage collection occurs and the object
4646 needs to be referenced again, bad things will happen.  The solution is
4647 to mark those objects with @code{GCPRO}.  Unfortunately this is easy to
4648 forget, and there is basically no way around this problem.  Here are
4649 some rules, though:
4650
4651 @enumerate
4652 @item
4653 For every @code{GCPRO@var{n}}, there have to be declarations of
4654 @code{struct gcpro gcpro1, gcpro2}, etc.
4655
4656 @item
4657 You @emph{must} @code{UNGCPRO} anything that's @code{GCPRO}ed, and you
4658 @emph{must not} @code{UNGCPRO} if you haven't @code{GCPRO}ed.  Getting
4659 either of these wrong will lead to crashes, often in completely random
4660 places unrelated to where the problem lies.
4661
4662 @item
4663 The way this actually works is that all currently active @code{GCPRO}s
4664 are chained through the @code{struct gcpro} local variables, with the
4665 variable @samp{gcprolist} pointing to the head of the list and the nth
4666 local @code{gcpro} variable pointing to the first @code{gcpro} variable
4667 in the next enclosing stack frame.  Each @code{GCPRO}ed thing is an
4668 lvalue, and the @code{struct gcpro} local variable contains a pointer to
4669 this lvalue.  This is why things will mess up badly if you don't pair up
4670 the @code{GCPRO}s and @code{UNGCPRO}s---you will end up with
4671 @code{gcprolist}s containing pointers to @code{struct gcpro}s or local
4672 @code{Lisp_Object} variables in no-longer-active stack frames.
4673
4674 @item
4675 It is actually possible for a single @code{struct gcpro} to
4676 protect a contiguous array of any number of values, rather than
4677 just a single lvalue.  To effect this, call @code{GCPRO@var{n}} as usual on
4678 the first object in the array and then set @code{gcpro@var{n}.nvars}.
4679
4680 @item
4681 @strong{Strings are relocated.}  What this means in practice is that the
4682 pointer obtained using @code{XSTRING_DATA()} is liable to change at any
4683 time, and you should never keep it around past any function call, or
4684 pass it as an argument to any function that might cause a garbage
4685 collection.  This is why a number of functions accept either a
4686 ``non-relocatable'' @code{char *} pointer or a relocatable Lisp string,
4687 and only access the Lisp string's data at the very last minute.  In some
4688 cases, you may end up having to @code{alloca()} some space and copy the
4689 string's data into it.
4690
4691 @item
4692 By convention, if you have to nest @code{GCPRO}'s, use @code{NGCPRO@var{n}}
4693 (along with @code{struct gcpro ngcpro1, ngcpro2}, etc.), @code{NNGCPRO@var{n}},
4694 etc.  This avoids compiler warnings about shadowed locals.
4695
4696 @item
4697 It is @emph{always} better to err on the side of extra @code{GCPRO}s
4698 rather than too few.  The extra cycles spent on this are
4699 almost never going to make a whit of difference in the
4700 speed of anything.
4701
4702 @item
4703 The general rule to follow is that caller, not callee, @code{GCPRO}s.
4704 That is, you should not have to explicitly @code{GCPRO} any Lisp objects
4705 that are passed in as parameters.
4706
4707 One exception from this rule is if you ever plan to change the parameter
4708 value, and store a new object in it.  In that case, you @emph{must}
4709 @code{GCPRO} the parameter, because otherwise the new object will not be
4710 protected.
4711
4712 So, if you create any Lisp objects (remember, this happens in all sorts
4713 of circumstances, e.g. with @code{Fcons()}, etc.), you are responsible
4714 for @code{GCPRO}ing them, unless you are @emph{absolutely sure} that
4715 there's no possibility that a garbage-collection can occur while you
4716 need to use the object.  Even then, consider @code{GCPRO}ing.
4717
4718 @item
4719 A garbage collection can occur whenever anything calls @code{Feval}, or
4720 whenever a QUIT can occur where execution can continue past
4721 this. (Remember, this is almost anywhere.)
4722
4723 @item
4724 If you have the @emph{least smidgeon of doubt} about whether
4725 you need to @code{GCPRO}, you should @code{GCPRO}.
4726
4727 @item
4728 Beware of @code{GCPRO}ing something that is uninitialized.  If you have
4729 any shade of doubt about this, initialize all your variables to @code{Qnil}.
4730
4731 @item
4732 Be careful of traps, like calling @code{Fcons()} in the argument to
4733 another function.  By the ``caller protects'' law, you should be
4734 @code{GCPRO}ing the newly-created cons, but you aren't.  A certain
4735 number of functions that are commonly called on freshly created stuff
4736 (e.g. @code{nconc2()}, @code{Fsignal()}), break the ``caller protects''
4737 law and go ahead and @code{GCPRO} their arguments so as to simplify
4738 things, but make sure and check if it's OK whenever doing something like
4739 this.
4740
4741 @item
4742 Once again, remember to @code{GCPRO}!  Bugs resulting from insufficient
4743 @code{GCPRO}ing are intermittent and extremely difficult to track down,
4744 often showing up in crashes inside of @code{garbage-collect} or in
4745 weirdly corrupted objects or even in incorrect values in a totally
4746 different section of code.
4747 @end enumerate
4748
4749 @cindex garbage collection, conservative
4750 @cindex conservative garbage collection
4751   Given the extremely error-prone nature of the @code{GCPRO} scheme, and
4752 the difficulties in tracking down, it should be considered a deficiency
4753 in the XEmacs code.  A solution to this problem would involve
4754 implementing so-called @dfn{conservative} garbage collection for the C
4755 stack.  That involves looking through all of stack memory and treating
4756 anything that looks like a reference to an object as a reference.  This
4757 will result in a few objects not getting collected when they should, but
4758 it obviates the need for @code{GCPRO}ing, and allows garbage collection
4759 to happen at any point at all, such as during object allocation.
4760
4761 @node Garbage Collection - Step by Step, Integers and Characters, GCPROing, Allocation of Objects in XEmacs Lisp
4762 @section Garbage Collection - Step by Step
4763 @cindex garbage collection step by step
4764
4765 @menu
4766 * Invocation::
4767 * garbage_collect_1::
4768 * mark_object::
4769 * gc_sweep::
4770 * sweep_lcrecords_1::
4771 * compact_string_chars::
4772 * sweep_strings::
4773 * sweep_bit_vectors_1::
4774 @end menu
4775
4776 @node Invocation, garbage_collect_1, Garbage Collection - Step by Step, Garbage Collection - Step by Step
4777 @subsection Invocation
4778 @cindex garbage collection, invocation
4779
4780 The first thing that anyone should know about garbage collection is:
4781 when and how the garbage collector is invoked. One might think that this
4782 could happen every time new memory is allocated, e.g. new objects are
4783 created, but this is @emph{not} the case. Instead, we have the following
4784 situation:
4785
4786 The entry point of any process of garbage collection is an invocation
4787 of the function @code{garbage_collect_1} in file @code{alloc.c}. The
4788 invocation can occur @emph{explicitly} by calling the function
4789 @code{Fgarbage_collect} (in addition this function provides information
4790 about the freed memory), or can occur @emph{implicitly} in four different
4791 situations:
4792 @enumerate
4793 @item
4794 In function @code{main_1} in file @code{emacs.c}. This function is called
4795 at each startup of xemacs. The garbage collection is invoked after all
4796 initial creations are completed, but only if a special internal error
4797 checking-constant @code{ERROR_CHECK_GC} is defined.
4798 @item
4799 In function @code{disksave_object_finalization} in file
4800 @code{alloc.c}. The only purpose of this function is to clear the
4801 objects from memory which need not be stored with xemacs when we dump out
4802 an executable. This is only done by @code{Fdump_emacs} or by
4803 @code{Fdump_emacs_data} respectively (both in @code{emacs.c}). The
4804 actual clearing is accomplished by making these objects unreachable and
4805 starting a garbage collection. The function is only used while building
4806 xemacs.
4807 @item
4808 In function @code{Feval / eval} in file @code{eval.c}. Each time the
4809 well known and often used function eval is called to evaluate a form,
4810 one of the first things that could happen, is a potential call of
4811 @code{garbage_collect_1}. There exist three global variables,
4812 @code{consing_since_gc} (counts the created cons-cells since the last
4813 garbage collection), @code{gc_cons_threshold} (a specified threshold
4814 after which a garbage collection occurs) and @code{always_gc}. If
4815 @code{always_gc} is set or if the threshold is exceeded, the garbage
4816 collection will start.
4817 @item
4818 In function @code{Ffuncall / funcall} in file @code{eval.c}. This
4819 function evaluates calls of elisp functions and works according to
4820 @code{Feval}.
4821 @end enumerate
4822
4823 The upshot is that garbage collection can basically occur everywhere
4824 @code{Feval}, respectively @code{Ffuncall}, is used - either directly or
4825 through another function. Since calls to these two functions are
4826 hidden in various other functions, many calls to
4827 @code{garabge_collect_1} are not obviously foreseeable, and therefore
4828 unexpected. Instances where they are used that are worth remembering are
4829 various elisp commands, as for example @code{or},
4830 @code{and}, @code{if}, @code{cond}, @code{while}, @code{setq}, etc.,
4831 miscellaneous @code{gui_item_...} functions, everything related to
4832 @code{eval} (@code{Feval_buffer}, @code{call0}, ...) and inside
4833 @code{Fsignal}. The latter is used to handle signals, as for example the
4834 ones raised by every @code{QUITE}-macro triggered after pressing Ctrl-g.
4835
4836 @node garbage_collect_1, mark_object, Invocation, Garbage Collection - Step by Step
4837 @subsection @code{garbage_collect_1}
4838 @cindex @code{garbage_collect_1}
4839
4840 We can now describe exactly what happens after the invocation takes
4841 place.
4842 @enumerate
4843 @item
4844 There are several cases in which the garbage collector is left immediately:
4845 when we are already garbage collecting (@code{gc_in_progress}), when
4846 the garbage collection is somehow forbidden
4847 (@code{gc_currently_forbidden}), when we are currently displaying something
4848 (@code{in_display}) or when we are preparing for the armageddon of the
4849 whole system (@code{preparing_for_armageddon}).
4850 @item
4851 Next the correct frame in which to put
4852 all the output occurring during garbage collecting is determined. In
4853 order to be able to restore the old display's state after displaying the
4854 message, some data about the current cursor position has to be
4855 saved. The variables @code{pre_gc_curser} and @code{cursor_changed} take
4856 care of that.
4857 @item
4858 The state of @code{gc_currently_forbidden} must be restored after
4859 the garbage collection, no matter what happens during the process. We
4860 accomplish this by @code{record_unwind_protect}ing the suitable function
4861 @code{restore_gc_inhibit} together with the current value of
4862 @code{gc_currently_forbidden}.
4863 @item
4864 If we are concurrently running an interactive xemacs session, the next step
4865 is simply to show the garbage collector's cursor/message.
4866 @item
4867 The following steps are the intrinsic steps of the garbage collector,
4868 therefore @code{gc_in_progress} is set.
4869 @item
4870 For debugging purposes, it is possible to copy the current C stack
4871 frame. However, this seems to be a currently unused feature.
4872 @item
4873 Before actually starting to go over all live objects, references to
4874 objects that are no longer used are pruned. We only have to do this for events
4875 (@code{clear_event_resource}) and for specifiers
4876 (@code{cleanup_specifiers}).
4877 @item
4878 Now the mark phase begins and marks all accessible elements. In order to
4879 start from
4880 all slots that serve as roots of accessibility, the function
4881 @code{mark_object} is called for each root individually to go out from
4882 there to mark all reachable objects. All roots that are traversed are
4883 shown in their processed order:
4884 @itemize @bullet
4885 @item
4886 all constant symbols and static variables that are registered via
4887 @code{staticpro}@ in the array @code{staticvec}.
4888 @xref{Adding Global Lisp Variables}.
4889 @item
4890 all Lisp objects that are created in C functions and that must be
4891 protected from freeing them. They are registered in the global
4892 list @code{gcprolist}.
4893 @xref{GCPROing}.
4894 @item
4895 all local variables (i.e. their name fields @code{symbol} and old
4896 values @code{old_values}) that are bound during the evaluation by the Lisp
4897 engine. They are stored in @code{specbinding} structs pushed on a stack
4898 called @code{specpdl}.
4899 @xref{Dynamic Binding; The specbinding Stack; Unwind-Protects}.
4900 @item
4901 all catch blocks that the Lisp engine encounters during the evaluation
4902 cause the creation of structs @code{catchtag} inserted in the list
4903 @code{catchlist}. Their tag (@code{tag}) and value (@code{val} fields
4904 are freshly created objects and therefore have to be marked.
4905 @xref{Catch and Throw}.
4906 @item
4907 every function application pushes new structs @code{backtrace}
4908 on the call stack of the Lisp engine (@code{backtrace_list}). The unique
4909 parts that have to be marked are the fields for each function
4910 (@code{function}) and all their arguments (@code{args}).
4911 @xref{Evaluation}.
4912 @item
4913 all objects that are used by the redisplay engine that must not be freed
4914 are marked by a special function called @code{mark_redisplay} (in
4915 @code{redisplay.c}).
4916 @item
4917 all objects created for profiling purposes are allocated by C functions
4918 instead of using the lisp allocation mechanisms. In order to receive the
4919 right ones during the sweep phase, they also have to be marked
4920 manually. That is done by the function @code{mark_profiling_info}
4921 @end itemize
4922 @item
4923 Hash tables in XEmacs belong to a kind of special objects that
4924 make use of a concept often called 'weak pointers'.
4925 To make a long story short, these kind of pointers are not followed
4926 during the estimation of the live objects during garbage collection.
4927 Any object referenced only by weak pointers is collected
4928 anyway, and the reference to it is cleared. In hash tables there are
4929 different usage patterns of them, manifesting in different types of hash
4930 tables, namely 'non-weak', 'weak', 'key-weak' and 'value-weak'
4931 (internally also 'key-car-weak' and 'value-car-weak') hash tables, each
4932 clearing entries depending on different conditions. More information can
4933 be found in the documentation to the function @code{make-hash-table}.
4934
4935 Because there are complicated dependency rules about when and what to
4936 mark while processing weak hash tables, the standard @code{marker}
4937 method is only active if it is marking non-weak hash tables. As soon as
4938 a weak component is in the table, the hash table entries are ignored
4939 while marking. Instead their marking is done each separately by the
4940 function @code{finish_marking_weak_hash_tables}. This function iterates
4941 over each hash table entry @code{hentries} for each weak hash table in
4942 @code{Vall_weak_hash_tables}. Depending on the type of a table, the
4943 appropriate action is performed.
4944 If a table is acting as @code{HASH_TABLE_KEY_WEAK}, and a key already marked,
4945 everything reachable from the @code{value} component is marked. If it is
4946 acting as a @code{HASH_TABLE_VALUE_WEAK} and the value component is
4947 already marked, the marking starts beginning only from the
4948 @code{key} component.
4949 If it is a @code{HASH_TABLE_KEY_CAR_WEAK} and the car
4950 of the key entry is already marked, we mark both the @code{key} and
4951 @code{value} components.
4952 Finally, if the table is of the type @code{HASH_TABLE_VALUE_CAR_WEAK}
4953 and the car of the value components is already marked, again both the
4954 @code{key} and the @code{value} components get marked.
4955
4956 Again, there are lists with comparable properties called weak
4957 lists. There exist different peculiarities of their types called
4958 @code{simple}, @code{assoc}, @code{key-assoc} and
4959 @code{value-assoc}. You can find further details about them in the
4960 description to the function @code{make-weak-list}. The scheme of their
4961 marking is similar: all weak lists are listed in @code{Qall_weak_lists},
4962 therefore we iterate over them. The marking is advanced until we hit an
4963 already marked pair. Then we know that during a former run all
4964 the rest has been marked completely. Again, depending on the special
4965 type of the weak list, our jobs differ. If it is a @code{WEAK_LIST_SIMPLE}
4966 and the elem is marked, we mark the @code{cons} part. If it is a
4967 @code{WEAK_LIST_ASSOC} and not a pair or a pair with both marked car and
4968 cdr, we mark the @code{cons} and the @code{elem}. If it is a
4969 @code{WEAK_LIST_KEY_ASSOC} and not a pair or a pair with a marked car of
4970 the elem, we mark the @code{cons} and the @code{elem}. Finally, if it is
4971 a @code{WEAK_LIST_VALUE_ASSOC} and not a pair or a pair with a marked
4972 cdr of the elem, we mark both the @code{cons} and the @code{elem}.
4973
4974 Since, by marking objects in reach from weak hash tables and weak lists,
4975 other objects could get marked, this perhaps implies further marking of
4976 other weak objects, both finishing functions are redone as long as
4977 yet unmarked objects get freshly marked.
4978
4979 @item
4980 After completing the special marking for the weak hash tables and for the weak
4981 lists, all entries that point to objects that are going to be swept in
4982 the further process are useless, and therefore have to be removed from
4983 the table or the list.
4984
4985 The function @code{prune_weak_hash_tables} does the job for weak hash
4986 tables. Totally unmarked hash tables are removed from the list
4987 @code{Vall_weak_hash_tables}. The other ones are treated more carefully
4988 by scanning over all entries and removing one as soon as one of
4989 the components @code{key} and @code{value} is unmarked.
4990
4991 The same idea applies to the weak lists. It is accomplished by
4992 @code{prune_weak_lists}: An unmarked list is pruned from
4993 @code{Vall_weak_lists} immediately. A marked list is treated more
4994 carefully by going over it and removing just the unmarked pairs.
4995
4996 @item
4997 The function @code{prune_specifiers} checks all listed specifiers held
4998 in @code{Vall_speficiers} and removes the ones from the lists that are
4999 unmarked.
5000
5001 @item
5002 All syntax tables are stored in a list called
5003 @code{Vall_syntax_tables}. The function @code{prune_syntax_tables} walks
5004 through it and unlinks the tables that are unmarked.
5005
5006 @item
5007 Next, we will attack the complete sweeping - the function
5008 @code{gc_sweep} which holds the predominance.
5009 @item
5010 First, all the variables with respect to garbage collection are
5011 reset. @code{consing_since_gc} - the counter of the created cells since
5012 the last garbage collection - is set back to 0, and
5013 @code{gc_in_progress} is not @code{true} anymore.
5014 @item
5015 In case the session is interactive, the displayed cursor and message are
5016 removed again.
5017 @item
5018 The state of @code{gc_inhibit} is restored to the former value by
5019 unwinding the stack.
5020 @item
5021 A small memory reserve is always held back that can be reached by
5022 @code{breathing_space}. If nothing more is left, we create a new reserve
5023 and exit.
5024 @end enumerate
5025
5026 @node mark_object, gc_sweep, garbage_collect_1, Garbage Collection - Step by Step
5027 @subsection @code{mark_object}
5028 @cindex @code{mark_object}
5029
5030 The first thing that is checked while marking an object is whether the
5031 object is a real Lisp object @code{Lisp_Type_Record} or just an integer
5032 or a character. Integers and characters are the only two types that are
5033 stored directly - without another level of indirection, and therefore they
5034 don't have to be marked and collected.
5035 @xref{How Lisp Objects Are Represented in C}.
5036
5037 The second case is the one we have to handle. It is the one when we are
5038 dealing with a pointer to a Lisp object. But, there exist also three
5039 possibilities, that prevent us from doing anything while marking: The
5040 object is read only which prevents it from being garbage collected,
5041 i.e. marked (@code{C_READONLY_RECORD_HEADER}). The object in question is
5042 already marked, and need not be marked for the second time (checked by
5043 @code{MARKED_RECORD_HEADER_P}). If it is a special, unmarkable object
5044 (@code{UNMARKABLE_RECORD_HEADER_P}, apparently, these are objects that
5045 sit in some const space, and can therefore not be marked, see
5046 @code{this_one_is_unmarkable} in @code{alloc.c}).
5047
5048 Now, the actual marking is feasible. We do so by once using the macro
5049 @code{MARK_RECORD_HEADER} to mark the object itself (actually the
5050 special flag in the lrecord header), and calling its special marker
5051 "method" @code{marker} if available. The marker method marks every
5052 other object that is in reach from our current object. Note, that these
5053 marker methods should not call @code{mark_object} recursively, but
5054 instead should return the next object from where further marking has to
5055 be performed.
5056
5057 In case another object was returned, as mentioned before, we reiterate
5058 the whole @code{mark_object} process beginning with this next object.
5059
5060 @node gc_sweep, sweep_lcrecords_1, mark_object, Garbage Collection - Step by Step
5061 @subsection @code{gc_sweep}
5062 @cindex @code{gc_sweep}
5063
5064 The job of this function is to free all unmarked records from memory. As
5065 we know, there are different types of objects implemented and managed, and
5066 consequently different ways to free them from memory.
5067 @xref{Introduction to Allocation}.
5068
5069 We start with all objects stored through @code{lcrecords}. All
5070 bulkier objects are allocated and handled using that scheme of
5071 @code{lcrecords}. Each object is @code{malloc}ed separately
5072 instead of placing it in one of the contiguous frob blocks. All types
5073 that are currently stored
5074 using @code{lcrecords}'s  @code{alloc_lcrecord} and
5075 @code{make_lcrecord_list} are the types: vectors, buffers,
5076 char-table, char-table-entry, console, weak-list, database, device,
5077 ldap, hash-table, command-builder, extent-auxiliary, extent-info, face,
5078 coding-system, frame, image-instance, glyph, popup-data, gui-item,
5079 keymap, charset, color_instance, font_instance, opaque, opaque-list,
5080 process, range-table, specifier, symbol-value-buffer-local,
5081 symbol-value-lisp-magic, symbol-value-varalias, toolbar-button,
5082 tooltalk-message, tooltalk-pattern, window, and window-configuration. We
5083 take care of them in the fist place
5084 in order to be able to handle and to finalize items stored in them more
5085 easily. The function @code{sweep_lcrecords_1} as described below is
5086 doing the whole job for us.
5087 For a description about the internals: @xref{lrecords}.
5088
5089 Our next candidates are the other objects that behave quite differently
5090 than everything else: the strings. They consists of two parts, a
5091 fixed-size portion (@code{struct Lisp_string}) holding the string's
5092 length, its property list and a pointer to the second part, and the
5093 actual string data, which is stored in string-chars blocks comparable to
5094 frob blocks. In this block, the data is not only freed, but also a
5095 compression of holes is made, i.e. all strings are relocated together.
5096 @xref{String}. This compacting phase is performed by the function
5097 @code{compact_string_chars}, the actual sweeping by the function
5098 @code{sweep_strings} is described below.
5099
5100 After that, the other types are swept step by step using functions
5101 @code{sweep_conses}, @code{sweep_bit_vectors_1},
5102 @code{sweep_compiled_functions}, @code{sweep_floats},
5103 @code{sweep_symbols}, @code{sweep_extents}, @code{sweep_markers} and
5104 @code{sweep_extents}.  They are the fixed-size types cons, floats,
5105 compiled-functions, symbol, marker, extent, and event stored in
5106 so-called "frob blocks", and therefore we can basically do the same on
5107 every type objects, using the same macros, especially defined only to
5108 handle everything with respect to fixed-size blocks. The only fixed-size
5109 type that is not handled here are the fixed-size portion of strings,
5110 because we took special care of them earlier.
5111
5112 The only big exceptions are bit vectors stored differently and
5113 therefore treated differently by the function @code{sweep_bit_vectors_1}
5114 described later.
5115
5116 At first, we need some brief information about how
5117 these fixed-size types are managed in general, in order to understand
5118 how the sweeping is done. They have all a fixed size, and are therefore
5119 stored in big blocks of memory - allocated at once - that can hold a
5120 certain amount of objects of one type. The macro
5121 @code{DECLARE_FIXED_TYPE_ALLOC} creates the suitable structures for
5122 every type. More precisely, we have the block struct
5123 (holding a pointer to the previous block @code{prev} and the
5124 objects in @code{block[]}), a pointer to current block
5125 (@code{current_..._block)}) and its last index
5126 (@code{current_..._block_index}), and a pointer to the free list that
5127 will be created. Also a macro @code{FIXED_TYPE_FROM_BLOCK} plus some
5128 related macros exists that are used to obtain a new object, either from
5129 the free list @code{ALLOCATE_FIXED_TYPE_1} if there is an unused object
5130 of that type stored or by allocating a completely new block using
5131 @code{ALLOCATE_FIXED_TYPE_FROM_BLOCK}.
5132
5133 The rest works as follows: all of them define a
5134 macro @code{UNMARK_...} that is used to unmark the object. They define a
5135 macro @code{ADDITIONAL_FREE_...} that defines additional work that has
5136 to be done when converting an object from in use to not in use (so far,
5137 only markers use it in order to unchain them). Then, they all call
5138 the macro @code{SWEEP_FIXED_TYPE_BLOCK} instantiated with their type name
5139 and their struct name.
5140
5141 This call in particular does the following: we go over all blocks
5142 starting with the current moving towards the oldest.
5143 For each block, we look at every object in it. If the object already
5144 freed (checked with @code{FREE_STRUCT_P} using the first pointer of the
5145 object), or if it is
5146 set to read only (@code{C_READONLY_RECORD_HEADER_P}, nothing must be
5147 done. If it is unmarked (checked with @code{MARKED_RECORD_HEADER_P}), it
5148 is put in the free list and set free (using the macro
5149 @code{FREE_FIXED_TYPE}, otherwise it stays in the block, but is unmarked
5150 (by @code{UNMARK_...}). While going through one block, we note if the
5151 whole block is empty. If so, the whole block is freed (using
5152 @code{xfree}) and the free list state is set to the state it had before
5153 handling this block.
5154
5155 @node sweep_lcrecords_1, compact_string_chars, gc_sweep, Garbage Collection - Step by Step
5156 @subsection @code{sweep_lcrecords_1}
5157 @cindex @code{sweep_lcrecords_1}
5158
5159 After nullifying the complete lcrecord statistics, we go over all
5160 lcrecords two separate times. They are all chained together in a list with
5161 a head called @code{all_lcrecords}.
5162
5163 The first loop calls for each object its @code{finalizer} method, but only
5164 in the case that it is not read only
5165 (@code{C_READONLY_RECORD_HEADER_P)}, it is not already marked
5166 (@code{MARKED_RECORD_HEADER_P}), it is not already in a free list (list of
5167 freed objects, field @code{free}) and finally it owns a finalizer
5168 method.
5169
5170 The second loop actually frees the appropriate objects again by iterating
5171 through the whole list. In case an object is read only or marked, it
5172 has to persist, otherwise it is manually freed by calling
5173 @code{xfree}. During this loop, the lcrecord statistics are kept up to
5174 date by calling @code{tick_lcrecord_stats} with the right arguments,
5175
5176 @node compact_string_chars, sweep_strings, sweep_lcrecords_1, Garbage Collection - Step by Step
5177 @subsection @code{compact_string_chars}
5178 @cindex @code{compact_string_chars}
5179
5180 The purpose of this function is to compact all the data parts of the
5181 strings that are held in so-called @code{string_chars_block}, i.e. the
5182 strings that do not exceed a certain maximal length.
5183
5184 The procedure with which this is done is as follows. We are keeping two
5185 positions in the @code{string_chars_block}s using two pointer/integer
5186 pairs, namely @code{from_sb}/@code{from_pos} and
5187 @code{to_sb}/@code{to_pos}. They stand for the actual positions, from
5188 where to where, to copy the actually handled string.
5189
5190 While going over all chained @code{string_char_block}s and their held
5191 strings, staring at @code{first_string_chars_block}, both pointers
5192 are advanced and eventually a string is copied from @code{from_sb} to
5193 @code{to_sb}, depending on the status of the pointed at strings.
5194
5195 More precisely, we can distinguish between the following actions.
5196 @itemize @bullet
5197 @item
5198 The string at @code{from_sb}'s position could be marked as free, which
5199 is indicated by an invalid pointer to the pointer that should point back
5200 to the fixed size string object, and which is checked by
5201 @code{FREE_STRUCT_P}. In this case, the @code{from_sb}/@code{from_pos}
5202 is advanced to the next string, and nothing has to be copied.
5203 @item
5204 Also, if a string object itself is unmarked, nothing has to be
5205 copied. We likewise advance the @code{from_sb}/@code{from_pos}
5206 pair as described above.
5207 @item
5208 In all other cases, we have a marked string at hand. The string data
5209 must be moved from the from-position to the to-position. In case
5210 there is not enough space in the actual @code{to_sb}-block, we advance
5211 this pointer to the beginning of the next block before copying. In case the
5212 from and to positions are different, we perform the
5213 actual copying using the library function @code{memmove}.
5214 @end itemize
5215
5216 After compacting, the pointer to the current
5217 @code{string_chars_block}, sitting in @code{current_string_chars_block},
5218 is reset on the last block to which we moved a string,
5219 i.e. @code{to_block}, and all remaining blocks (we know that they just
5220 carry garbage) are explicitly @code{xfree}d.
5221
5222 @node sweep_strings, sweep_bit_vectors_1, compact_string_chars, Garbage Collection - Step by Step
5223 @subsection @code{sweep_strings}
5224 @cindex @code{sweep_strings}
5225
5226 The sweeping for the fixed sized string objects is essentially exactly
5227 the same as it is for all other fixed size types. As before, the freeing
5228 into the suitable free list is done by using the macro
5229 @code{SWEEP_FIXED_SIZE_BLOCK} after defining the right macros
5230 @code{UNMARK_string} and @code{ADDITIONAL_FREE_string}. These two
5231 definitions are a little bit special compared to the ones used
5232 for the other fixed size types.
5233
5234 @code{UNMARK_string} is defined the same way except some additional code
5235 used for updating the bookkeeping information.
5236
5237 For strings, @code{ADDITIONAL_FREE_string} has to do something in
5238 addition: in case, the string was not allocated in a
5239 @code{string_chars_block} because it exceeded the maximal length, and
5240 therefore it was @code{malloc}ed separately, we know also @code{xfree}
5241 it explicitly.
5242
5243 @node sweep_bit_vectors_1,  , sweep_strings, Garbage Collection - Step by Step
5244 @subsection @code{sweep_bit_vectors_1}
5245 @cindex @code{sweep_bit_vectors_1}
5246
5247 Bit vectors are also one of the rare types that are @code{malloc}ed
5248 individually. Consequently, while sweeping, all further needless
5249 bit vectors must be freed by hand. This is done, as one might imagine,
5250 the expected way: since they are all registered in a list called
5251 @code{all_bit_vectors}, all elements of that list are traversed,
5252 all unmarked bit vectors are unlinked by calling @code{xfree} and all of
5253 them become unmarked.
5254 In addition, the bookkeeping information used for garbage
5255 collector's output purposes is updated.
5256
5257 @node Integers and Characters, Allocation from Frob Blocks, Garbage Collection - Step by Step, Allocation of Objects in XEmacs Lisp
5258 @section Integers and Characters
5259
5260   Integer and character Lisp objects are created from integers using the
5261 macros @code{XSETINT()} and @code{XSETCHAR()} or the equivalent
5262 functions @code{make_int()} and @code{make_char()}. (These are actually
5263 macros on most systems.)  These functions basically just do some moving
5264 of bits around, since the integral value of the object is stored
5265 directly in the @code{Lisp_Object}.
5266
5267   @code{XSETINT()} and the like will truncate values given to them that
5268 are too big; i.e. you won't get the value you expected but the tag bits
5269 will at least be correct.
5270
5271 @node Allocation from Frob Blocks, lrecords, Integers and Characters, Allocation of Objects in XEmacs Lisp
5272 @section Allocation from Frob Blocks
5273
5274 The uninitialized memory required by a @code{Lisp_Object} of a particular type
5275 is allocated using
5276 @code{ALLOCATE_FIXED_TYPE()}.  This only occurs inside of the
5277 lowest-level object-creating functions in @file{alloc.c}:
5278 @code{Fcons()}, @code{make_float()}, @code{Fmake_byte_code()},
5279 @code{Fmake_symbol()}, @code{allocate_extent()},
5280 @code{allocate_event()}, @code{Fmake_marker()}, and
5281 @code{make_uninit_string()}.  The idea is that, for each type, there are
5282 a number of frob blocks (each 2K in size); each frob block is divided up
5283 into object-sized chunks.  Each frob block will have some of these
5284 chunks that are currently assigned to objects, and perhaps some that are
5285 free. (If a frob block has nothing but free chunks, it is freed at the
5286 end of the garbage collection cycle.)  The free chunks are stored in a
5287 free list, which is chained by storing a pointer in the first four bytes
5288 of the chunk. (Except for the free chunks at the end of the last frob
5289 block, which are handled using an index which points past the end of the
5290 last-allocated chunk in the last frob block.)
5291 @code{ALLOCATE_FIXED_TYPE()} first tries to retrieve a chunk from the
5292 free list; if that fails, it calls
5293 @code{ALLOCATE_FIXED_TYPE_FROM_BLOCK()}, which looks at the end of the
5294 last frob block for space, and creates a new frob block if there is
5295 none. (There are actually two versions of these macros, one of which is
5296 more defensive but less efficient and is used for error-checking.)
5297
5298 @node lrecords, Low-level allocation, Allocation from Frob Blocks, Allocation of Objects in XEmacs Lisp
5299 @section lrecords
5300
5301   [see @file{lrecord.h}]
5302
5303   All lrecords have at the beginning of their structure a @code{struct
5304 lrecord_header}.  This just contains a pointer to a @code{struct
5305 lrecord_implementation}, which is a structure containing method pointers
5306 and such.  There is one of these for each type, and it is a global,
5307 constant, statically-declared structure that is declared in the
5308 @code{DEFINE_LRECORD_IMPLEMENTATION()} macro. (This macro actually
5309 declares an array of two @code{struct lrecord_implementation}
5310 structures.  The first one contains all the standard method pointers,
5311 and is used in all normal circumstances.  During garbage collection,
5312 however, the lrecord is @dfn{marked} by bumping its implementation
5313 pointer by one, so that it points to the second structure in the array.
5314 This structure contains a special indication in it that it's a
5315 @dfn{marked-object} structure: the finalize method is the special
5316 function @code{this_marks_a_marked_record()}, and all other methods are
5317 null pointers.  At the end of garbage collection, all lrecords will
5318 either be reclaimed or unmarked by decrementing their implementation
5319 pointers, so this second structure pointer will never remain past
5320 garbage collection.
5321
5322   Simple lrecords (of type (c) above) just have a @code{struct
5323 lrecord_header} at their beginning.  lcrecords, however, actually have a
5324 @code{struct lcrecord_header}.  This, in turn, has a @code{struct
5325 lrecord_header} at its beginning, so sanity is preserved; but it also
5326 has a pointer used to chain all lcrecords together, and a special ID
5327 field used to distinguish one lcrecord from another. (This field is used
5328 only for debugging and could be removed, but the space gain is not
5329 significant.)
5330
5331   Simple lrecords are created using @code{ALLOCATE_FIXED_TYPE()}, just
5332 like for other frob blocks.  The only change is that the implementation
5333 pointer must be initialized correctly. (The implementation structure for
5334 an lrecord, or rather the pointer to it, is named @code{lrecord_float},
5335 @code{lrecord_extent}, @code{lrecord_buffer}, etc.)
5336
5337   lcrecords are created using @code{alloc_lcrecord()}.  This takes a
5338 size to allocate and an implementation pointer. (The size needs to be
5339 passed because some lcrecords, such as window configurations, are of
5340 variable size.) This basically just @code{malloc()}s the storage,
5341 initializes the @code{struct lcrecord_header}, and chains the lcrecord
5342 onto the head of the list of all lcrecords, which is stored in the
5343 variable @code{all_lcrecords}.  The calls to @code{alloc_lcrecord()}
5344 generally occur in the lowest-level allocation function for each lrecord
5345 type.
5346
5347 Whenever you create an lrecord, you need to call either
5348 @code{DEFINE_LRECORD_IMPLEMENTATION()} or
5349 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()}.  This needs to be
5350 specified in a C file, at the top level.  What this actually does is
5351 define and initialize the implementation structure for the lrecord. (And
5352 possibly declares a function @code{error_check_foo()} that implements
5353 the @code{XFOO()} macro when error-checking is enabled.)  The arguments
5354 to the macros are the actual type name (this is used to construct the C
5355 variable name of the lrecord implementation structure and related
5356 structures using the @samp{##} macro concatenation operator), a string
5357 that names the type on the Lisp level (this may not be the same as the C
5358 type name; typically, the C type name has underscores, while the Lisp
5359 string has dashes), various method pointers, and the name of the C
5360 structure that contains the object.  The methods are used to encapsulate
5361 type-specific information about the object, such as how to print it or
5362 mark it for garbage collection, so that it's easy to add new object
5363 types without having to add a specific case for each new type in a bunch
5364 of different places.
5365
5366   The difference between @code{DEFINE_LRECORD_IMPLEMENTATION()} and
5367 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()} is that the former is
5368 used for fixed-size object types and the latter is for variable-size
5369 object types.  Most object types are fixed-size; some complex
5370 types, however (e.g. window configurations), are variable-size.
5371 Variable-size object types have an extra method, which is called
5372 to determine the actual size of a particular object of that type.
5373 (Currently this is only used for keeping allocation statistics.)
5374
5375   For the purpose of keeping allocation statistics, the allocation
5376 engine keeps a list of all the different types that exist.  Note that,
5377 since @code{DEFINE_LRECORD_IMPLEMENTATION()} is a macro that is
5378 specified at top-level, there is no way for it to add to the list of all
5379 existing types.  What happens instead is that each implementation
5380 structure contains in it a dynamically assigned number that is
5381 particular to that type. (Or rather, it contains a pointer to another
5382 structure that contains this number.  This evasiveness is done so that
5383 the implementation structure can be declared const.) In the sweep stage
5384 of garbage collection, each lrecord is examined to see if its
5385 implementation structure has its dynamically-assigned number set.  If
5386 not, it must be a new type, and it is added to the list of known types
5387 and a new number assigned.  The number is used to index into an array
5388 holding the number of objects of each type and the total memory
5389 allocated for objects of that type.  The statistics in this array are
5390 also computed during the sweep stage.  These statistics are returned by
5391 the call to @code{garbage-collect} and are printed out at the end of the
5392 loadup phase.
5393
5394   Note that for every type defined with a @code{DEFINE_LRECORD_*()}
5395 macro, there needs to be a @code{DECLARE_LRECORD_IMPLEMENTATION()}
5396 somewhere in a @file{.h} file, and this @file{.h} file needs to be
5397 included by @file{inline.c}.
5398
5399   Furthermore, there should generally be a set of @code{XFOOBAR()},
5400 @code{FOOBARP()}, etc. macros in a @file{.h} (or occasionally @file{.c})
5401 file.  To create one of these, copy an existing model and modify as
5402 necessary.
5403
5404   The various methods in the lrecord implementation structure are:
5405
5406 @enumerate
5407 @item
5408 @cindex mark method
5409 A @dfn{mark} method.  This is called during the marking stage and passed
5410 a function pointer (usually the @code{mark_object()} function), which is
5411 used to mark an object.  All Lisp objects that are contained within the
5412 object need to be marked by applying this function to them.  The mark
5413 method should also return a Lisp object, which should be either nil or
5414 an object to mark. (This can be used in lieu of calling
5415 @code{mark_object()} on the object, to reduce the recursion depth, and
5416 consequently should be the most heavily nested sub-object, such as a
5417 long list.)
5418
5419 @strong{Please note:} When the mark method is called, garbage collection
5420 is in progress, and special precautions need to be taken when accessing
5421 objects; see section (B) above.
5422
5423 If your mark method does not need to do anything, it can be
5424 @code{NULL}.
5425
5426 @item
5427 A @dfn{print} method.  This is called to create a printed representation
5428 of the object, whenever @code{princ}, @code{prin1}, or the like is
5429 called.  It is passed the object, a stream to which the output is to be
5430 directed, and an @code{escapeflag} which indicates whether the object's
5431 printed representation should be @dfn{escaped} so that it is
5432 readable. (This corresponds to the difference between @code{princ} and
5433 @code{prin1}.) Basically, @dfn{escaped} means that strings will have
5434 quotes around them and confusing characters in the strings such as
5435 quotes, backslashes, and newlines will be backslashed; and that special
5436 care will be taken to make symbols print in a readable fashion
5437 (e.g. symbols that look like numbers will be backslashed).  Other
5438 readable objects should perhaps pass @code{escapeflag} on when
5439 sub-objects are printed, so that readability is preserved when necessary
5440 (or if not, always pass in a 1 for @code{escapeflag}).  Non-readable
5441 objects should in general ignore @code{escapeflag}, except that some use
5442 it as an indication that more verbose output should be given.
5443
5444 Sub-objects are printed using @code{print_internal()}, which takes
5445 exactly the same arguments as are passed to the print method.
5446
5447 Literal C strings should be printed using @code{write_c_string()},
5448 or @code{write_string_1()} for non-null-terminated strings.
5449
5450 Functions that do not have a readable representation should check the
5451 @code{print_readably} flag and signal an error if it is set.
5452
5453 If you specify NULL for the print method, the
5454 @code{default_object_printer()} will be used.
5455
5456 @item
5457 A @dfn{finalize} method.  This is called at the beginning of the sweep
5458 stage on lcrecords that are about to be freed, and should be used to
5459 perform any extra object cleanup.  This typically involves freeing any
5460 extra @code{malloc()}ed memory associated with the object, releasing any
5461 operating-system and window-system resources associated with the object
5462 (e.g. pixmaps, fonts), etc.
5463
5464 The finalize method can be NULL if nothing needs to be done.
5465
5466 WARNING #1: The finalize method is also called at the end of the dump
5467 phase; this time with the for_disksave parameter set to non-zero.  The
5468 object is @emph{not} about to disappear, so you have to make sure to
5469 @emph{not} free any extra @code{malloc()}ed memory if you're going to
5470 need it later.  (Also, signal an error if there are any operating-system
5471 and window-system resources here, because they can't be dumped.)
5472
5473 Finalize methods should, as a rule, set to zero any pointers after
5474 they've been freed, and check to make sure pointers are not zero before
5475 freeing.  Although I'm pretty sure that finalize methods are not called
5476 twice on the same object (except for the @code{for_disksave} proviso),
5477 we've gotten nastily burned in some cases by not doing this.
5478
5479 WARNING #2: The finalize method is @emph{only} called for
5480 lcrecords, @emph{not} for simply lrecords.  If you need a
5481 finalize method for simple lrecords, you have to stick
5482 it in the @code{ADDITIONAL_FREE_foo()} macro in @file{alloc.c}.
5483
5484 WARNING #3: Things are in an @emph{extremely} bizarre state
5485 when @code{ADDITIONAL_FREE_foo()} is called, so you have to
5486 be incredibly careful when writing one of these functions.
5487 See the comment in @code{gc_sweep()}.  If you ever have to add
5488 one of these, consider using an lcrecord or dealing with
5489 the problem in a different fashion.
5490
5491 @item
5492 An @dfn{equal} method.  This compares the two objects for similarity,
5493 when @code{equal} is called.  It should compare the contents of the
5494 objects in some reasonable fashion.  It is passed the two objects and a
5495 @dfn{depth} value, which is used to catch circular objects.  To compare
5496 sub-Lisp-objects, call @code{internal_equal()} and bump the depth value
5497 by one.  If this value gets too high, a @code{circular-object} error
5498 will be signaled.
5499
5500 If this is NULL, objects are @code{equal} only when they are @code{eq},
5501 i.e. identical.
5502
5503 @item
5504 A @dfn{hash} method.  This is used to hash objects when they are to be
5505 compared with @code{equal}.  The rule here is that if two objects are
5506 @code{equal}, they @emph{must} hash to the same value; i.e. your hash
5507 function should use some subset of the sub-fields of the object that are
5508 compared in the ``equal'' method.  If you specify this method as
5509 @code{NULL}, the object's pointer will be used as the hash, which will
5510 @emph{fail} if the object has an @code{equal} method, so don't do this.
5511
5512 To hash a sub-Lisp-object, call @code{internal_hash()}.  Bump the
5513 depth by one, just like in the ``equal'' method.
5514
5515 To convert a Lisp object directly into a hash value (using
5516 its pointer), use @code{LISP_HASH()}.  This is what happens when
5517 the hash method is NULL.
5518
5519 To hash two or more values together into a single value, use
5520 @code{HASH2()}, @code{HASH3()}, @code{HASH4()}, etc.
5521
5522 @item
5523 @dfn{getprop}, @dfn{putprop}, @dfn{remprop}, and @dfn{plist} methods.
5524 These are used for object types that have properties.  I don't feel like
5525 documenting them here.  If you create one of these objects, you have to
5526 use different macros to define them,
5527 i.e. @code{DEFINE_LRECORD_IMPLEMENTATION_WITH_PROPS()} or
5528 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION_WITH_PROPS()}.
5529
5530 @item
5531 A @dfn{size_in_bytes} method, when the object is of variable-size.
5532 (i.e. declared with a @code{_SEQUENCE_IMPLEMENTATION} macro.)  This should
5533 simply return the object's size in bytes, exactly as you might expect.
5534 For an example, see the methods for window configurations and opaques.
5535 @end enumerate
5536
5537 @node Low-level allocation, Pure Space, lrecords, Allocation of Objects in XEmacs Lisp
5538 @section Low-level allocation
5539
5540   Memory that you want to allocate directly should be allocated using
5541 @code{xmalloc()} rather than @code{malloc()}.  This implements
5542 error-checking on the return value, and once upon a time did some more
5543 vital stuff (i.e. @code{BLOCK_INPUT}, which is no longer necessary).
5544 Free using @code{xfree()}, and realloc using @code{xrealloc()}.  Note
5545 that @code{xmalloc()} will do a non-local exit if the memory can't be
5546 allocated. (Many functions, however, do not expect this, and thus XEmacs
5547 will likely crash if this happens.  @strong{This is a bug.}  If you can,
5548 you should strive to make your function handle this OK.  However, it's
5549 difficult in the general circumstance, perhaps requiring extra
5550 unwind-protects and such.)
5551
5552   Note that XEmacs provides two separate replacements for the standard
5553 @code{malloc()} library function.  These are called @dfn{old GNU malloc}
5554 (@file{malloc.c}) and @dfn{new GNU malloc} (@file{gmalloc.c}),
5555 respectively.  New GNU malloc is better in pretty much every way than
5556 old GNU malloc, and should be used if possible.  (It used to be that on
5557 some systems, the old one worked but the new one didn't.  I think this
5558 was due specifically to a bug in SunOS, which the new one now works
5559 around; so I don't think the old one ever has to be used any more.) The
5560 primary difference between both of these mallocs and the standard system
5561 malloc is that they are much faster, at the expense of increased space.
5562 The basic idea is that memory is allocated in fixed chunks of powers of
5563 two.  This allows for basically constant malloc time, since the various
5564 chunks can just be kept on a number of free lists. (The standard system
5565 malloc typically allocates arbitrary-sized chunks and has to spend some
5566 time, sometimes a significant amount of time, walking the heap looking
5567 for a free block to use and cleaning things up.)  The new GNU malloc
5568 improves on things by allocating large objects in chunks of 4096 bytes
5569 rather than in ever larger powers of two, which results in ever larger
5570 wastage.  There is a slight speed loss here, but it's of doubtful
5571 significance.
5572
5573   NOTE: Apparently there is a third-generation GNU malloc that is
5574 significantly better than the new GNU malloc, and should probably
5575 be included in XEmacs.
5576
5577   There is also the relocating allocator, @file{ralloc.c}.  This actually
5578 moves blocks of memory around so that the @code{sbrk()} pointer shrunk
5579 and virtual memory released back to the system.  On some systems,
5580 this is a big win.  On all systems, it causes a noticeable (and
5581 sometimes huge) speed penalty, so I turn it off by default.
5582 @file{ralloc.c} only works with the new GNU malloc in @file{gmalloc.c}.
5583 There are also two versions of @file{ralloc.c}, one that uses @code{mmap()}
5584 rather than block copies to move data around.  This purports to
5585 be faster, although that depends on the amount of data that would
5586 have had to be block copied and the system-call overhead for
5587 @code{mmap()}.  I don't know exactly how this works, except that the
5588 relocating-allocation routines are pretty much used only for
5589 the memory allocated for a buffer, which is the biggest consumer
5590 of space, esp. of space that may get freed later.
5591
5592   Note that the GNU mallocs have some ``memory warning'' facilities.
5593 XEmacs taps into them and issues a warning through the standard
5594 warning system, when memory gets to 75%, 85%, and 95% full.
5595 (On some systems, the memory warnings are not functional.)
5596
5597   Allocated memory that is going to be used to make a Lisp object
5598 is created using @code{allocate_lisp_storage()}.  This calls @code{xmalloc()}
5599 but also verifies that the pointer to the memory can fit into
5600 a Lisp word (remember that some bits are taken away for a type
5601 tag and a mark bit).  If not, an error is issued through @code{memory_full()}.
5602 @code{allocate_lisp_storage()} is called by @code{alloc_lcrecord()},
5603 @code{ALLOCATE_FIXED_TYPE()}, and the vector and bit-vector creation
5604 routines.  These routines also call @code{INCREMENT_CONS_COUNTER()} at the
5605 appropriate times; this keeps statistics on how much memory is
5606 allocated, so that garbage-collection can be invoked when the
5607 threshold is reached.
5608
5609 @node Pure Space, Cons, Low-level allocation, Allocation of Objects in XEmacs Lisp
5610 @section Pure Space
5611
5612   Not yet documented.
5613
5614 @node Cons, Vector, Pure Space, Allocation of Objects in XEmacs Lisp
5615 @section Cons
5616
5617   Conses are allocated in standard frob blocks.  The only thing to
5618 note is that conses can be explicitly freed using @code{free_cons()}
5619 and associated functions @code{free_list()} and @code{free_alist()}.  This
5620 immediately puts the conses onto the cons free list, and decrements
5621 the statistics on memory allocation appropriately.  This is used
5622 to good effect by some extremely commonly-used code, to avoid
5623 generating extra objects and thereby triggering GC sooner.
5624 However, you have to be @emph{extremely} careful when doing this.
5625 If you mess this up, you will get BADLY BURNED, and it has happened
5626 before.
5627
5628 @node Vector, Bit Vector, Cons, Allocation of Objects in XEmacs Lisp
5629 @section Vector
5630
5631   As mentioned above, each vector is @code{malloc()}ed individually, and
5632 all are threaded through the variable @code{all_vectors}.  Vectors are
5633 marked strangely during garbage collection, by kludging the size field.
5634 Note that the @code{struct Lisp_Vector} is declared with its
5635 @code{contents} field being a @emph{stretchy} array of one element.  It
5636 is actually @code{malloc()}ed with the right size, however, and access
5637 to any element through the @code{contents} array works fine.
5638
5639 @node Bit Vector, Symbol, Vector, Allocation of Objects in XEmacs Lisp
5640 @section Bit Vector
5641
5642   Bit vectors work exactly like vectors, except for more complicated
5643 code to access an individual bit, and except for the fact that bit
5644 vectors are lrecords while vectors are not. (The only difference here is
5645 that there's an lrecord implementation pointer at the beginning and the
5646 tag field in bit vector Lisp words is ``lrecord'' rather than
5647 ``vector''.)
5648
5649 @node Symbol, Marker, Bit Vector, Allocation of Objects in XEmacs Lisp
5650 @section Symbol
5651
5652   Symbols are also allocated in frob blocks.  Note that the code
5653 exists for symbols to be either lrecords (category (c) above)
5654 or simple types (category (b) above), and are lrecords by
5655 default (I think), although there is no good reason for this.
5656
5657   Note that symbols in the awful horrible obarray structure are
5658 chained through their @code{next} field.
5659
5660 Remember that @code{intern} looks up a symbol in an obarray, creating
5661 one if necessary.
5662
5663 @node Marker, String, Symbol, Allocation of Objects in XEmacs Lisp
5664 @section Marker
5665
5666   Markers are allocated in frob blocks, as usual.  They are kept
5667 in a buffer unordered, but in a doubly-linked list so that they
5668 can easily be removed. (Formerly this was a singly-linked list,
5669 but in some cases garbage collection took an extraordinarily
5670 long time due to the O(N^2) time required to remove lots of
5671 markers from a buffer.) Markers are removed from a buffer in
5672 the finalize stage, in @code{ADDITIONAL_FREE_marker()}.
5673
5674 @node String, Compiled Function, Marker, Allocation of Objects in XEmacs Lisp
5675 @section String
5676
5677   As mentioned above, strings are a special case.  A string is logically
5678 two parts, a fixed-size object (containing the length, property list,
5679 and a pointer to the actual data), and the actual data in the string.
5680 The fixed-size object is a @code{struct Lisp_String} and is allocated in
5681 frob blocks, as usual.  The actual data is stored in special
5682 @dfn{string-chars blocks}, which are 8K blocks of memory.
5683 Currently-allocated strings are simply laid end to end in these
5684 string-chars blocks, with a pointer back to the @code{struct Lisp_String}
5685 stored before each string in the string-chars block.  When a new string
5686 needs to be allocated, the remaining space at the end of the last
5687 string-chars block is used if there's enough, and a new string-chars
5688 block is created otherwise.
5689
5690   There are never any holes in the string-chars blocks due to the string
5691 compaction and relocation that happens at the end of garbage collection.
5692 During the sweep stage of garbage collection, when objects are
5693 reclaimed, the garbage collector goes through all string-chars blocks,
5694 looking for unused strings.  Each chunk of string data is preceded by a
5695 pointer to the corresponding @code{struct Lisp_String}, which indicates
5696 both whether the string is used and how big the string is, i.e. how to
5697 get to the next chunk of string data.  Holes are compressed by
5698 block-copying the next string into the empty space and relocating the
5699 pointer stored in the corresponding @code{struct Lisp_String}.
5700 @strong{This means you have to be careful with strings in your code.}
5701 See the section above on @code{GCPRO}ing.
5702
5703   Note that there is one situation not handled: a string that is too big
5704 to fit into a string-chars block.  Such strings, called @dfn{big
5705 strings}, are all @code{malloc()}ed as their own block. (#### Although it
5706 would make more sense for the threshold for big strings to be somewhat
5707 lower, e.g. 1/2 or 1/4 the size of a string-chars block.  It seems that
5708 this was indeed the case formerly---indeed, the threshold was set at
5709 1/8---but Mly forgot about this when rewriting things for 19.8.)
5710
5711 Note also that the string data in string-chars blocks is padded as
5712 necessary so that proper alignment constraints on the @code{struct
5713 Lisp_String} back pointers are maintained.
5714
5715   Finally, strings can be resized.  This happens in Mule when a
5716 character is substituted with a different-length character, or during
5717 modeline frobbing. (You could also export this to Lisp, but it's not
5718 done so currently.) Resizing a string is a potentially tricky process.
5719 If the change is small enough that the padding can absorb it, nothing
5720 other than a simple memory move needs to be done.  Keep in mind,
5721 however, that the string can't shrink too much because the offset to the
5722 next string in the string-chars block is computed by looking at the
5723 length and rounding to the nearest multiple of four or eight.  If the
5724 string would shrink or expand beyond the correct padding, new string
5725 data needs to be allocated at the end of the last string-chars block and
5726 the data moved appropriately.  This leaves some dead string data, which
5727 is marked by putting a special marker of 0xFFFFFFFF in the @code{struct
5728 Lisp_String} pointer before the data (there's no real @code{struct
5729 Lisp_String} to point to and relocate), and storing the size of the dead
5730 string data (which would normally be obtained from the now-non-existent
5731 @code{struct Lisp_String}) at the beginning of the dead string data gap.
5732 The string compactor recognizes this special 0xFFFFFFFF marker and
5733 handles it correctly.
5734
5735 @node Compiled Function,  , String, Allocation of Objects in XEmacs Lisp
5736 @section Compiled Function
5737
5738   Not yet documented.
5739
5740
5741 @node Dumping, Events and the Event Loop, Allocation of Objects in XEmacs Lisp, Top
5742 @chapter Dumping
5743
5744 @section What is dumping and its justification
5745
5746 The C code of XEmacs is just a Lisp engine with a lot of built-in
5747 primitives useful for writing an editor.  The editor itself is written
5748 mostly in Lisp, and represents around 100K lines of code.  Loading and
5749 executing the initialization of all this code takes a bit a time (five
5750 to ten times the usual startup time of current xemacs) and requires
5751 having all the lisp source files around.  Having to reload them each
5752 time the editor is started would not be acceptable.
5753
5754 The traditional solution to this problem is called dumping: the build
5755 process first creates the lisp engine under the name @file{temacs}, then
5756 runs it until it has finished loading and initializing all the lisp
5757 code, and eventually creates a new executable called @file{xemacs}
5758 including both the object code in @file{temacs} and all the contents of
5759 the memory after the initialization.
5760
5761 This solution, while working, has a huge problem: the creation of the
5762 new executable from the actual contents of memory is an extremely
5763 system-specific process, quite error-prone, and which interferes with a
5764 lot of system libraries (like malloc).  It is even getting worse
5765 nowadays with libraries using constructors which are automatically
5766 called when the program is started (even before main()) which tend to
5767 crash when they are called multiple times, once before dumping and once
5768 after (IRIX 6.x libz.so pulls in some C++ image libraries thru
5769 dependencies which have this problem).  Writing the dumper is also one
5770 of the most difficult parts of porting XEmacs to a new operating system.
5771 Basically, `dumping' is an operation that is just not officially
5772 supported on many operating systems.
5773
5774 The aim of the portable dumper is to solve the same problem as the
5775 system-specific dumper, that is to be able to reload quickly, using only
5776 a small number of files, the fully initialized lisp part of the editor,
5777 without any system-specific hacks.
5778
5779 @menu
5780 * Overview::
5781 * Data descriptions::
5782 * Dumping phase::
5783 * Reloading phase::
5784 * Remaining issues::
5785 @end menu
5786
5787 @node Overview, Data descriptions, Dumping, Dumping
5788 @section Overview
5789
5790 The portable dumping system has to:
5791
5792 @enumerate
5793 @item
5794 At dump time, write all initialized, non-quickly-rebuildable data to a
5795 file [Note: currently named @file{xemacs.dmp}, but the name will
5796 change], along with all informations needed for the reloading.
5797
5798 @item
5799 When starting xemacs, reload the dump file, relocate it to its new
5800 starting address if needed, and reinitialize all pointers to this
5801 data.  Also, rebuild all the quickly rebuildable data.
5802 @end enumerate
5803
5804 @node Data descriptions, Dumping phase, Overview, Dumping
5805 @section Data descriptions
5806
5807 The more complex task of the dumper is to be able to write lisp objects
5808 (lrecords) and C structs to disk and reload them at a different address,
5809 updating all the pointers they include in the process.  This is done by
5810 using external data descriptions that give information about the layout
5811 of the structures in memory.
5812
5813 The specification of these descriptions is in lrecord.h.  A description
5814 of an lrecord is an array of struct lrecord_description.  Each of these
5815 structs include a type, an offset in the structure and some optional
5816 parameters depending on the type.  For instance, here is the string
5817 description:
5818
5819 @example
5820 static const struct lrecord_description string_description[] = @{
5821   @{ XD_BYTECOUNT,         offsetof (Lisp_String, size) @},
5822   @{ XD_OPAQUE_DATA_PTR,   offsetof (Lisp_String, data), XD_INDIRECT(0, 1) @},
5823   @{ XD_LISP_OBJECT,       offsetof (Lisp_String, plist) @},
5824   @{ XD_END @}
5825 @};
5826 @end example
5827
5828 The first line indicates a member of type Bytecount, which is used by
5829 the next, indirect directive.  The second means "there is a pointer to
5830 some opaque data in the field @code{data}".  The length of said data is
5831 given by the expression @code{XD_INDIRECT(0, 1)}, which means "the value
5832 in the 0th line of the description (welcome to C) plus one".  The third
5833 line means "there is a Lisp_Object member @code{plist} in the Lisp_String
5834 structure".  @code{XD_END} then ends the description.
5835
5836 This gives us all the information we need to move around what is pointed
5837 to by a structure (C or lrecord) and, by transitivity, everything that
5838 it points to.  The only missing information for dumping is the size of
5839 the structure.  For lrecords, this is part of the
5840 lrecord_implementation, so we don't need to duplicate it.  For C
5841 structures we use a struct struct_description, which includes a size
5842 field and a pointer to an associated array of lrecord_description.
5843
5844 @node Dumping phase, Reloading phase, Data descriptions, Dumping
5845 @section Dumping phase
5846
5847 Dumping is done by calling the function pdump() (in alloc.c) which is
5848 invoked from Fdump_emacs (in emacs.c).  This function performs a number
5849 of tasks.
5850
5851 @menu
5852 * Object inventory::
5853 * Address allocation::
5854 * The header::
5855 * Data dumping::
5856 * Pointers dumping::
5857 @end menu
5858
5859 @node Object inventory, Address allocation, Dumping phase, Dumping phase
5860 @subsection Object inventory
5861
5862 The first task is to build the list of the objects to dump.  This
5863 includes:
5864
5865 @itemize @bullet
5866 @item lisp objects
5867 @item C structures
5868 @end itemize
5869
5870 We end up with one @code{pdump_entry_list_elmt} per object group (arrays
5871 of C structs are kept together) which includes a pointer to the first
5872 object of the group, the per-object size and the count of objects in the
5873 group, along with some other information which is initialized later.
5874
5875 These entries are linked together in @code{pdump_entry_list} structures
5876 and can be enumerated thru either:
5877
5878 @enumerate
5879 @item
5880 the @code{pdump_object_table}, an array of @code{pdump_entry_list}, one
5881 per lrecord type, indexed by type number.
5882
5883 @item
5884 the @code{pdump_opaque_data_list}, used for the opaque data which does
5885 not include pointers, and hence does not need descriptions.
5886
5887 @item
5888 the @code{pdump_struct_table}, which is a vector of
5889 @code{struct_description}/@code{pdump_entry_list} pairs, used for
5890 non-opaque C structures.
5891 @end enumerate
5892
5893 This uses a marking strategy similar to the garbage collector.  Some
5894 differences though:
5895
5896 @enumerate
5897 @item
5898 We do not use the mark bit (which does not exist for C structures
5899 anyway), we use a big hash table instead.
5900
5901 @item
5902 We do not use the mark function of lrecords but instead rely on the
5903 external descriptions.  This happens essentially because we need to
5904 follow pointers to C structures and opaque data in addition to
5905 Lisp_Object members.
5906 @end enumerate
5907
5908 This is done by @code{pdump_register_object}, which handles Lisp_Object
5909 variables, and pdump_register_struct which handles C structures, which
5910 both delegate the description management to pdump_register_sub.
5911
5912 The hash table doubles as a map object to pdump_entry_list_elmt (i.e.
5913 allows us to look up a pdump_entry_list_elmt with the object it points
5914 to).  Entries are added with @code{pdump_add_entry()} and looked up with
5915 @code{pdump_get_entry()}.  There is no need for entry removal.  The hash
5916 value is computed quite basically from the object pointer by
5917 @code{pdump_make_hash()}.
5918
5919 The roots for the marking are:
5920
5921 @enumerate
5922 @item
5923 the @code{staticpro}'ed variables (there is a special @code{staticpro_nodump()}
5924 call for protected variables we do not want to dump).
5925
5926 @item
5927 the @code{pdump_wire}'d variables (@code{staticpro} is equivalent to
5928 @code{staticpro_nodump()} + @code{pdump_wire()}).
5929
5930 @item
5931 the @code{dumpstruct}'ed variables, which points to C structures.
5932 @end enumerate
5933
5934 This does not include the GCPRO'ed variables, the specbinds, the
5935 catchtags, the backlist, the redisplay or the profiling info, since we
5936 do not want to rebuild the actual chain of lisp calls which end up to
5937 the dump-emacs call, only the global variables.
5938
5939 Weak lists and weak hash tables are dumped as if they were their
5940 non-weak equivalent (without changing their type, of course).  This has
5941 not yet been a problem.
5942
5943 @node Address allocation, The header, Object inventory, Dumping phase
5944 @subsection Address allocation
5945
5946
5947 The next step is to allocate the offsets of each of the objects in the
5948 final dump file.  This is done by @code{pdump_allocate_offset()} which
5949 is called indirectly by @code{pdump_scan_by_alignment()}.
5950
5951 The strategy to deal with alignment problems uses these facts:
5952
5953 @enumerate
5954 @item
5955 real world alignment requirements are powers of two.
5956
5957 @item
5958 the C compiler is required to adjust the size of a struct so that you
5959 can have an array of them next to each other.  This means you can have a
5960 upper bound of the alignment requirements of a given structure by
5961 looking at which power of two its size is a multiple.
5962
5963 @item
5964 the non-variant part of variable size lrecords has an alignment
5965 requirement of 4.
5966 @end enumerate
5967
5968 Hence, for each lrecord type, C struct type or opaque data block the
5969 alignment requirement is computed as a power of two, with a minimum of
5970 2^2 for lrecords.  @code{pdump_scan_by_alignment()} then scans all the
5971 @code{pdump_entry_list_elmt}'s, the ones with the highest requirements
5972 first.  This ensures the best packing.
5973
5974 The maximum alignment requirement we take into account is 2^8.
5975
5976 @code{pdump_allocate_offset()} only has to do a linear allocation,
5977 starting at offset 256 (this leaves room for the header and keep the
5978 alignments happy).
5979
5980 @node The header, Data dumping, Address allocation, Dumping phase
5981 @subsection The header
5982
5983 The next step creates the file and writes a header with a signature and
5984 some random informations in it (number of staticpro, number of assigned
5985 lrecord types, etc...).  The reloc_address field, which indicates at
5986 which address the file should be loaded if we want to avoid post-reload
5987 relocation, is set to 0.  It then seeks to offset 256 (base offset for
5988 the objects).
5989
5990 @node Data dumping, Pointers dumping, The header, Dumping phase
5991 @subsection Data dumping
5992
5993 The data is dumped in the same order as the addresses were allocated by
5994 @code{pdump_dump_data()}, called from @code{pdump_scan_by_alignment()}.
5995 This function copies the data to a temporary buffer, relocates all
5996 pointers in the object to the addresses allocated in step Address
5997 Allocation, and writes it to the file.  Using the same order means that,
5998 if we are careful with lrecords whose size is not a multiple of 4, we
5999 are ensured that the object is always written at the offset in the file
6000 allocated in step Address Allocation.
6001
6002 @node Pointers dumping,  , Data dumping, Dumping phase
6003 @subsection Pointers dumping
6004
6005 A bunch of tables needed to reassign properly the global pointers are
6006 then written.  They are:
6007
6008 @enumerate
6009 @item the staticpro array
6010 @item the dumpstruct array
6011 @item the lrecord_implementation_table array
6012 @item a vector of all the offsets to the objects in the file that include a
6013 description (for faster relocation at reload time)
6014 @item the pdump_wired and pdump_wired_list arrays
6015 @end enumerate
6016
6017 For each of the arrays we write both the pointer to the variables and
6018 the relocated offset of the object they point to.  Since these variables
6019 are global, the pointers are still valid when restarting the program and
6020 are used to regenerate the global pointers.
6021
6022 The @code{pdump_wired_list} array is a special case.  The variables it
6023 points to are the head of weak linked lists of lisp objects of the same
6024 type.  Not all objects of this list are dumped so the relocated pointer
6025 we associate with them points to the first dumped object of the list, or
6026 Qnil if none is available.  This is also the reason why they are not
6027 used as roots for the purpose of object enumeration.
6028
6029 This is the end of the dumping part.
6030
6031 @node Reloading phase, Remaining issues, Dumping phase, Dumping
6032 @section Reloading phase
6033
6034 @subsection File loading
6035
6036 The file is mmap'ed in memory (which ensures a PAGESIZE alignment, at
6037 least 4096), or if mmap is unavailable or fails, a 256-bytes aligned
6038 malloc is done and the file is loaded.
6039
6040 Some variables are reinitialized from the values found in the header.
6041
6042 The difference between the actual loading address and the reloc_address
6043 is computed and will be used for all the relocations.
6044
6045
6046 @subsection Putting back the staticvec
6047
6048 The staticvec array is memcpy'd from the file and the variables it
6049 points to are reset to the relocated objects addresses.
6050
6051
6052 @subsection Putting back the dumpstructed variables
6053
6054 The variables pointed to by dumpstruct in the dump phase are reset to
6055 the right relocated object addresses.
6056
6057
6058 @subsection lrecord_implementations_table
6059
6060 The lrecord_implementations_table is reset to its dump time state and
6061 the right lrecord_type_index values are put in.
6062
6063
6064 @subsection Object relocation
6065
6066 All the objects are relocated using their description and their offset
6067 by @code{pdump_reloc_one}.  This step is unnecessary if the
6068 reloc_address is equal to the file loading address.
6069
6070
6071 @subsection Putting back the pdump_wire and pdump_wire_list variables
6072
6073 Same as Putting back the dumpstructed variables.
6074
6075
6076 @subsection Reorganize the hash tables
6077
6078 Since some of the hash values in the lisp hash tables are
6079 address-dependent, their layout is now wrong.  So we go through each of
6080 them and have them resorted by calling @code{pdump_reorganize_hash_table}.
6081
6082 @node Remaining issues,  , Reloading phase, Dumping
6083 @section Remaining issues
6084
6085 The build process will have to start a post-dump xemacs, ask it the
6086 loading address (which will, hopefully, be always the same between
6087 different xemacs invocations) and relocate the file to the new address.
6088 This way the object relocation phase will not have to be done, which
6089 means no writes in the objects and that, because of the use of mmap, the
6090 dumped data will be shared between all the xemacs running on the
6091 computer.
6092
6093 Some executable signature will be necessary to ensure that a given dump
6094 file is really associated with a given executable, or random crashes
6095 will occur.  Maybe a random number set at compile or configure time thru
6096 a define.  This will also allow for having differently-compiled xemacsen
6097 on the same system (mule and no-mule comes to mind).
6098
6099 The DOC file contents should probably end up in the dump file.
6100
6101
6102 @node Events and the Event Loop, Evaluation; Stack Frames; Bindings, Dumping, Top
6103 @chapter Events and the Event Loop
6104
6105 @menu
6106 * Introduction to Events::
6107 * Main Loop::
6108 * Specifics of the Event Gathering Mechanism::
6109 * Specifics About the Emacs Event::
6110 * The Event Stream Callback Routines::
6111 * Other Event Loop Functions::
6112 * Converting Events::
6113 * Dispatching Events; The Command Builder::
6114 @end menu
6115
6116 @node Introduction to Events, Main Loop, Events and the Event Loop, Events and the Event Loop
6117 @section Introduction to Events
6118
6119   An event is an object that encapsulates information about an
6120 interesting occurrence in the operating system.  Events are
6121 generated either by user action, direct (e.g. typing on the
6122 keyboard or moving the mouse) or indirect (moving another
6123 window, thereby generating an expose event on an Emacs frame),
6124 or as a result of some other typically asynchronous action happening,
6125 such as output from a subprocess being ready or a timer expiring.
6126 Events come into the system in an asynchronous fashion (typically
6127 through a callback being called) and are converted into a
6128 synchronous event queue (first-in, first-out) in a process that
6129 we will call @dfn{collection}.
6130
6131   Note that each application has its own event queue. (It is
6132 immaterial whether the collection process directly puts the
6133 events in the proper application's queue, or puts them into
6134 a single system queue, which is later split up.)
6135
6136   The most basic level of event collection is done by the
6137 operating system or window system.  Typically, XEmacs does
6138 its own event collection as well.  Often there are multiple
6139 layers of collection in XEmacs, with events from various
6140 sources being collected into a queue, which is then combined
6141 with other sources to go into another queue (i.e. a second
6142 level of collection), with perhaps another level on top of
6143 this, etc.
6144
6145   XEmacs has its own types of events (called @dfn{Emacs events}),
6146 which provides an abstract layer on top of the system-dependent
6147 nature of the most basic events that are received.  Part of the
6148 complex nature of the XEmacs event collection process involves
6149 converting from the operating-system events into the proper
6150 Emacs events---there may not be a one-to-one correspondence.
6151
6152   Emacs events are documented in @file{events.h}; I'll discuss them
6153 later.
6154
6155 @node Main Loop, Specifics of the Event Gathering Mechanism, Introduction to Events, Events and the Event Loop
6156 @section Main Loop
6157
6158   The @dfn{command loop} is the top-level loop that the editor is always
6159 running.  It loops endlessly, calling @code{next-event} to retrieve an
6160 event and @code{dispatch-event} to execute it. @code{dispatch-event} does
6161 the appropriate thing with non-user events (process, timeout,
6162 magic, eval, mouse motion); this involves calling a Lisp handler
6163 function, redrawing a newly-exposed part of a frame, reading
6164 subprocess output, etc.  For user events, @code{dispatch-event}
6165 looks up the event in relevant keymaps or menubars; when a
6166 full key sequence or menubar selection is reached, the appropriate
6167 function is executed. @code{dispatch-event} may have to keep state
6168 across calls; this is done in the ``command-builder'' structure
6169 associated with each console (remember, there's usually only
6170 one console), and the engine that looks up keystrokes and
6171 constructs full key sequences is called the @dfn{command builder}.
6172 This is documented elsewhere.
6173
6174   The guts of the command loop are in @code{command_loop_1()}.  This
6175 function doesn't catch errors, though---that's the job of
6176 @code{command_loop_2()}, which is a condition-case (i.e. error-trapping)
6177 wrapper around @code{command_loop_1()}.  @code{command_loop_1()} never
6178 returns, but may get thrown out of.
6179
6180   When an error occurs, @code{cmd_error()} is called, which usually
6181 invokes the Lisp error handler in @code{command-error}; however, a
6182 default error handler is provided if @code{command-error} is @code{nil}
6183 (e.g. during startup).  The purpose of the error handler is simply to
6184 display the error message and do associated cleanup; it does not need to
6185 throw anywhere.  When the error handler finishes, the condition-case in
6186 @code{command_loop_2()} will finish and @code{command_loop_2()} will
6187 reinvoke @code{command_loop_1()}.
6188
6189   @code{command_loop_2()} is invoked from three places: from
6190 @code{initial_command_loop()} (called from @code{main()} at the end of
6191 internal initialization), from the Lisp function @code{recursive-edit},
6192 and from @code{call_command_loop()}.
6193
6194   @code{call_command_loop()} is called when a macro is started and when
6195 the minibuffer is entered; normal termination of the macro or minibuffer
6196 causes a throw out of the recursive command loop. (To
6197 @code{execute-kbd-macro} for macros and @code{exit} for minibuffers.
6198 Note also that the low-level minibuffer-entering function,
6199 @code{read-minibuffer-internal}, provides its own error handling and
6200 does not need @code{command_loop_2()}'s error encapsulation; so it tells
6201 @code{call_command_loop()} to invoke @code{command_loop_1()} directly.)
6202
6203   Note that both read-minibuffer-internal and recursive-edit set up a
6204 catch for @code{exit}; this is why @code{abort-recursive-edit}, which
6205 throws to this catch, exits out of either one.
6206
6207   @code{initial_command_loop()}, called from @code{main()}, sets up a
6208 catch for @code{top-level} when invoking @code{command_loop_2()},
6209 allowing functions to throw all the way to the top level if they really
6210 need to.  Before invoking @code{command_loop_2()},
6211 @code{initial_command_loop()} calls @code{top_level_1()}, which handles
6212 all of the startup stuff (creating the initial frame, handling the
6213 command-line options, loading the user's @file{.emacs} file, etc.).  The
6214 function that actually does this is in Lisp and is pointed to by the
6215 variable @code{top-level}; normally this function is
6216 @code{normal-top-level}.  @code{top_level_1()} is just an error-handling
6217 wrapper similar to @code{command_loop_2()}.  Note also that
6218 @code{initial_command_loop()} sets up a catch for @code{top-level} when
6219 invoking @code{top_level_1()}, just like when it invokes
6220 @code{command_loop_2()}.
6221
6222 @node Specifics of the Event Gathering Mechanism, Specifics About the Emacs Event, Main Loop, Events and the Event Loop
6223 @section Specifics of the Event Gathering Mechanism
6224
6225   Here is an approximate diagram of the collection processes
6226 at work in XEmacs, under TTY's (TTY's are simpler than X
6227 so we'll look at this first):
6228
6229 @noindent
6230 @example
6231  asynch.      asynch.    asynch.   asynch.             [Collectors in
6232 kbd events  kbd events   process   process                the OS]
6233       |         |         output    output
6234       |         |           |         |
6235       |         |           |         |      SIGINT,   [signal handlers
6236       |         |           |         |      SIGQUIT,     in XEmacs]
6237       V         V           V         V      SIGWINCH,
6238      file      file        file      file    SIGALRM
6239      desc.     desc.       desc.     desc.     |
6240      (TTY)     (TTY)       (pipe)    (pipe)    |
6241       |          |          |         |      fake    timeouts
6242       |          |          |         |      file        |
6243       |          |          |         |      desc.       |
6244       |          |          |         |      (pipe)      |
6245       |          |          |         |        |         |
6246       |          |          |         |        |         |
6247       |          |          |         |        |         |
6248       V          V          V         V        V         V
6249       ------>-----------<----------------<----------------
6250                   |
6251                   |
6252                   | [collected using select() in emacs_tty_next_event()
6253                   |  and converted to the appropriate Emacs event]
6254                   |
6255                   |
6256                   V          (above this line is TTY-specific)
6257                 Emacs -----------------------------------------------
6258                 event (below this line is the generic event mechanism)
6259                   |
6260                   |
6261 was there     if not, call
6262 a SIGINT?  emacs_tty_next_event()
6263     |             |
6264     |             |
6265     |             |
6266     V             V
6267     --->------<----
6268            |
6269            |     [collected in event_stream_next_event();
6270            |      SIGINT is converted using maybe_read_quit_event()]
6271            V
6272          Emacs
6273          event
6274            |
6275            \---->------>----- maybe_kbd_translate() ---->---\
6276                                                             |
6277                                                             |
6278                                                             |
6279      command event queue                                    |
6280                                                if not from command
6281   (contains events that were                   event queue, call
6282   read earlier but not processed,              event_stream_next_event()
6283   typically when waiting in a                               |
6284   sit-for, sleep-for, etc. for                              |
6285  a particular event to be received)                         |
6286                |                                            |
6287                |                                            |
6288                V                                            V
6289                ---->------------------------------------<----
6290                                                |
6291                                                | [collected in
6292                                                |  next_event_internal()]
6293                                                |
6294  unread-     unread-       event from          |
6295  command-    command-       keyboard       else, call
6296  events      event           macro      next_event_internal()
6297    |           |               |               |
6298    |           |               |               |
6299    |           |               |               |
6300    V           V               V               V
6301    --------->----------------------<------------
6302                      |
6303                      |      [collected in `next-event', which may loop
6304                      |       more than once if the event it gets is on
6305                      |       a dead frame, device, etc.]
6306                      |
6307                      |
6308                      V
6309             feed into top-level event loop,
6310             which repeatedly calls `next-event'
6311             and then dispatches the event
6312             using `dispatch-event'
6313 @end example
6314
6315 Notice the separation between TTY-specific and generic event mechanism.
6316 When using the Xt-based event loop, the TTY-specific stuff is replaced
6317 but the rest stays the same.
6318
6319 It's also important to realize that only one different kind of
6320 system-specific event loop can be operating at a time, and must be able
6321 to receive all kinds of events simultaneously.  For the two existing
6322 event loops (implemented in @file{event-tty.c} and @file{event-Xt.c},
6323 respectively), the TTY event loop @emph{only} handles TTY consoles,
6324 while the Xt event loop handles @emph{both} TTY and X consoles.  This
6325 situation is different from all of the output handlers, where you simply
6326 have one per console type.
6327
6328   Here's the Xt Event Loop Diagram (notice that below a certain point,
6329 it's the same as the above diagram):
6330
6331 @example
6332 asynch. asynch. asynch. asynch.                 [Collectors in
6333  kbd     kbd    process process                    the OS]
6334 events  events  output  output
6335   |       |       |       |
6336   |       |       |       |     asynch. asynch. [Collectors in the
6337   |       |       |       |       X        X     OS and X Window System]
6338   |       |       |       |     events  events
6339   |       |       |       |       |        |
6340   |       |       |       |       |        |
6341   |       |       |       |       |        |    SIGINT, [signal handlers
6342   |       |       |       |       |        |    SIGQUIT,   in XEmacs]
6343   |       |       |       |       |        |    SIGWINCH,
6344   |       |       |       |       |        |    SIGALRM
6345   |       |       |       |       |        |       |
6346   |       |       |       |       |        |       |
6347   |       |       |       |       |        |       |      timeouts
6348   |       |       |       |       |        |       |          |
6349   |       |       |       |       |        |       |          |
6350   |       |       |       |       |        |       V          |
6351   V       V       V       V       V        V      fake        |
6352  file    file    file    file    file     file    file        |
6353  desc.   desc.   desc.   desc.   desc.    desc.   desc.       |
6354  (TTY)   (TTY)   (pipe)  (pipe) (socket) (socket) (pipe)      |
6355   |       |       |       |       |        |       |          |
6356   |       |       |       |       |        |       |          |
6357   |       |       |       |       |        |       |          |
6358   V       V       V       V       V        V       V          V
6359   --->----------------------------------------<---------<------
6360        |              |               |
6361        |              |               |[collected using select() in
6362        |              |               | _XtWaitForSomething(), called
6363        |              |               | from XtAppProcessEvent(), called
6364        |              |               | in emacs_Xt_next_event();
6365        |              |               | dispatched to various callbacks]
6366        |              |               |
6367        |              |               |
6368   emacs_Xt_        p_s_callback(),    | [popup_selection_callback]
6369   event_handler()  x_u_v_s_callback(),| [x_update_vertical_scrollbar_
6370        |           x_u_h_s_callback(),|  callback]
6371        |           search_callback()  | [x_update_horizontal_scrollbar_
6372        |              |               |  callback]
6373        |              |               |
6374        |              |               |
6375   enqueue_Xt_       signal_special_   |
6376   dispatch_event()  Xt_user_event()   |
6377   [maybe multiple     |               |
6378    times, maybe 0     |               |
6379    times]             |               |
6380        |            enqueue_Xt_       |
6381        |            dispatch_event()  |
6382        |              |               |
6383        |              |               |
6384        V              V               |
6385        -->----------<--               |
6386               |                       |
6387               |                       |
6388            dispatch             Xt_what_callback()
6389            event                  sets flags
6390            queue                      |
6391               |                       |
6392               |                       |
6393               |                       |
6394               |                       |
6395               ---->-----------<--------
6396                    |
6397                    |
6398                    |     [collected and converted as appropriate in
6399                    |            emacs_Xt_next_event()]
6400                    |
6401                    |
6402                    V          (above this line is Xt-specific)
6403                  Emacs ------------------------------------------------
6404                  event (below this line is the generic event mechanism)
6405                    |
6406                    |
6407 was there      if not, call
6408 a SIGINT?   emacs_Xt_next_event()
6409     |              |
6410     |              |
6411     |              |
6412     V              V
6413     --->-------<----
6414            |
6415            |        [collected in event_stream_next_event();
6416            |         SIGINT is converted using maybe_read_quit_event()]
6417            V
6418          Emacs
6419          event
6420            |
6421            \---->------>----- maybe_kbd_translate() -->-----\
6422                                                             |
6423                                                             |
6424                                                             |
6425      command event queue                                    |
6426                                               if not from command
6427   (contains events that were                  event queue, call
6428   read earlier but not processed,             event_stream_next_event()
6429   typically when waiting in a                               |
6430   sit-for, sleep-for, etc. for                              |
6431  a particular event to be received)                         |
6432                |                                            |
6433                |                                            |
6434                V                                            V
6435                ---->----------------------------------<------
6436                                                |
6437                                                | [collected in
6438                                                |  next_event_internal()]
6439                                                |
6440  unread-     unread-       event from          |
6441  command-    command-       keyboard       else, call
6442  events      event           macro      next_event_internal()
6443    |           |               |               |
6444    |           |               |               |
6445    |           |               |               |
6446    V           V               V               V
6447    --------->----------------------<------------
6448                      |
6449                      |      [collected in `next-event', which may loop
6450                      |       more than once if the event it gets is on
6451                      |       a dead frame, device, etc.]
6452                      |
6453                      |
6454                      V
6455             feed into top-level event loop,
6456             which repeatedly calls `next-event'
6457             and then dispatches the event
6458             using `dispatch-event'
6459 @end example
6460
6461 @node Specifics About the Emacs Event, The Event Stream Callback Routines, Specifics of the Event Gathering Mechanism, Events and the Event Loop
6462 @section Specifics About the Emacs Event
6463
6464 @node The Event Stream Callback Routines, Other Event Loop Functions, Specifics About the Emacs Event, Events and the Event Loop
6465 @section The Event Stream Callback Routines
6466
6467 @node Other Event Loop Functions, Converting Events, The Event Stream Callback Routines, Events and the Event Loop
6468 @section Other Event Loop Functions
6469
6470   @code{detect_input_pending()} and @code{input-pending-p} look for
6471 input by calling @code{event_stream->event_pending_p} and looking in
6472 @code{[V]unread-command-event} and the @code{command_event_queue} (they
6473 do not check for an executing keyboard macro, though).
6474
6475   @code{discard-input} cancels any command events pending (and any
6476 keyboard macros currently executing), and puts the others onto the
6477 @code{command_event_queue}.  There is a comment about a ``race
6478 condition'', which is not a good sign.
6479
6480   @code{next-command-event} and @code{read-char} are higher-level
6481 interfaces to @code{next-event}.  @code{next-command-event} gets the
6482 next @dfn{command} event (i.e.  keypress, mouse event, menu selection,
6483 or scrollbar action), calling @code{dispatch-event} on any others.
6484 @code{read-char} calls @code{next-command-event} and uses
6485 @code{event_to_character()} to return the character equivalent.  With
6486 the right kind of input method support, it is possible for (read-char)
6487 to return a Kanji character.
6488
6489 @node Converting Events, Dispatching Events; The Command Builder, Other Event Loop Functions, Events and the Event Loop
6490 @section Converting Events
6491
6492   @code{character_to_event()}, @code{event_to_character()},
6493 @code{event-to-character}, and @code{character-to-event} convert between
6494 characters and keypress events corresponding to the characters.  If the
6495 event was not a keypress, @code{event_to_character()} returns -1 and
6496 @code{event-to-character} returns @code{nil}.  These functions convert
6497 between character representation and the split-up event representation
6498 (keysym plus mod keys).
6499
6500 @node Dispatching Events; The Command Builder,  , Converting Events, Events and the Event Loop
6501 @section Dispatching Events; The Command Builder
6502
6503 Not yet documented.
6504
6505 @node Evaluation; Stack Frames; Bindings, Symbols and Variables, Events and the Event Loop, Top
6506 @chapter Evaluation; Stack Frames; Bindings
6507
6508 @menu
6509 * Evaluation::
6510 * Dynamic Binding; The specbinding Stack; Unwind-Protects::
6511 * Simple Special Forms::
6512 * Catch and Throw::
6513 @end menu
6514
6515 @node Evaluation, Dynamic Binding; The specbinding Stack; Unwind-Protects, Evaluation; Stack Frames; Bindings, Evaluation; Stack Frames; Bindings
6516 @section Evaluation
6517
6518   @code{Feval()} evaluates the form (a Lisp object) that is passed to
6519 it.  Note that evaluation is only non-trivial for two types of objects:
6520 symbols and conses.  A symbol is evaluated simply by calling
6521 @code{symbol-value} on it and returning the value.
6522
6523   Evaluating a cons means calling a function.  First, @code{eval} checks
6524 to see if garbage-collection is necessary, and calls
6525 @code{garbage_collect_1()} if so.  It then increases the evaluation
6526 depth by 1 (@code{lisp_eval_depth}, which is always less than
6527 @code{max_lisp_eval_depth}) and adds an element to the linked list of
6528 @code{struct backtrace}'s (@code{backtrace_list}).  Each such structure
6529 contains a pointer to the function being called plus a list of the
6530 function's arguments.  Originally these values are stored unevalled, and
6531 as they are evaluated, the backtrace structure is updated.  Garbage
6532 collection pays attention to the objects pointed to in the backtrace
6533 structures (garbage collection might happen while a function is being
6534 called or while an argument is being evaluated, and there could easily
6535 be no other references to the arguments in the argument list; once an
6536 argument is evaluated, however, the unevalled version is not needed by
6537 eval, and so the backtrace structure is changed).
6538
6539 At this point, the function to be called is determined by looking at
6540 the car of the cons (if this is a symbol, its function definition is
6541 retrieved and the process repeated).  The function should then consist
6542 of either a @code{Lisp_Subr} (built-in function written in C), a
6543 @code{Lisp_Compiled_Function} object, or a cons whose car is one of the
6544 symbols @code{autoload}, @code{macro} or @code{lambda}.
6545
6546 If the function is a @code{Lisp_Subr}, the lisp object points to a
6547 @code{struct Lisp_Subr} (created by @code{DEFUN()}), which contains a
6548 pointer to the C function, a minimum and maximum number of arguments
6549 (or possibly the special constants @code{MANY} or @code{UNEVALLED}), a
6550 pointer to the symbol referring to that subr, and a couple of other
6551 things.  If the subr wants its arguments @code{UNEVALLED}, they are
6552 passed raw as a list.  Otherwise, an array of evaluated arguments is
6553 created and put into the backtrace structure, and either passed whole
6554 (@code{MANY}) or each argument is passed as a C argument.
6555
6556 If the function is a @code{Lisp_Compiled_Function},
6557 @code{funcall_compiled_function()} is called.  If the function is a
6558 lambda list, @code{funcall_lambda()} is called.  If the function is a
6559 macro, [..... fill in] is done.  If the function is an autoload,
6560 @code{do_autoload()} is called to load the definition and then eval
6561 starts over [explain this more].
6562
6563 When @code{Feval()} exits, the evaluation depth is reduced by one, the
6564 debugger is called if appropriate, and the current backtrace structure
6565 is removed from the list.
6566
6567 Both @code{funcall_compiled_function()} and @code{funcall_lambda()} need
6568 to go through the list of formal parameters to the function and bind
6569 them to the actual arguments, checking for @code{&rest} and
6570 @code{&optional} symbols in the formal parameters and making sure the
6571 number of actual arguments is correct.
6572 @code{funcall_compiled_function()} can do this a little more
6573 efficiently, since the formal parameter list can be checked for sanity
6574 when the compiled function object is created.
6575
6576 @code{funcall_lambda()} simply calls @code{Fprogn} to execute the code
6577 in the lambda list.
6578
6579 @code{funcall_compiled_function()} calls the real byte-code interpreter
6580 @code{execute_optimized_program()} on the byte-code instructions, which
6581 are converted into an internal form for faster execution.
6582
6583 When a compiled function is executed for the first time by
6584 @code{funcall_compiled_function()}, or when it is @code{Fpurecopy()}ed
6585 during the dump phase of building XEmacs, the byte-code instructions are
6586 converted from a @code{Lisp_String} (which is inefficient to access,
6587 especially in the presence of MULE) into a @code{Lisp_Opaque} object
6588 containing an array of unsigned char, which can be directly executed by
6589 the byte-code interpreter.  At this time the byte code is also analyzed
6590 for validity and transformed into a more optimized form, so that
6591 @code{execute_optimized_program()} can really fly.
6592
6593 Here are some of the optimizations performed by the internal byte-code
6594 transformer:
6595 @enumerate
6596 @item
6597 References to the @code{constants} array are checked for out-of-range
6598 indices, so that the byte interpreter doesn't have to.
6599 @item
6600 References to the @code{constants} array that will be used as a Lisp
6601 variable are checked for being correct non-constant (i.e. not @code{t},
6602 @code{nil}, or @code{keywordp}) symbols, so that the byte interpreter
6603 doesn't have to.
6604 @item
6605 The maxiumum number of variable bindings in the byte-code is
6606 pre-computed, so that space on the @code{specpdl} stack can be
6607 pre-reserved once for the whole function execution.
6608 @item
6609 All byte-code jumps are relative to the current program counter instead
6610 of the start of the program, thereby saving a register.
6611 @item
6612 One-byte relative jumps are converted from the byte-code form of unsigned
6613 chars offset by 127 to machine-friendly signed chars.
6614 @end enumerate
6615
6616 Of course, this transformation of the @code{instructions} should not be
6617 visible to the user, so @code{Fcompiled_function_instructions()} needs
6618 to know how to convert the optimized opaque object back into a Lisp
6619 string that is identical to the original string from the @file{.elc}
6620 file.  (Actually, the resulting string may (rarely) contain slightly
6621 different, yet equivalent, byte code.)
6622
6623 @code{Ffuncall()} implements Lisp @code{funcall}.  @code{(funcall fun
6624 x1 x2 x3 ...)} is equivalent to @code{(eval (list fun (quote x1) (quote
6625 x2) (quote x3) ...))}.  @code{Ffuncall()} contains its own code to do
6626 the evaluation, however, and is very similar to @code{Feval()}.
6627
6628 From the performance point of view, it is worth knowing that most of the
6629 time in Lisp evaluation is spent executing @code{Lisp_Subr} and
6630 @code{Lisp_Compiled_Function} objects via @code{Ffuncall()} (not
6631 @code{Feval()}).
6632
6633 @code{Fapply()} implements Lisp @code{apply}, which is very similar to
6634 @code{funcall} except that if the last argument is a list, the result is the
6635 same as if each of the arguments in the list had been passed separately.
6636 @code{Fapply()} does some business to expand the last argument if it's a
6637 list, then calls @code{Ffuncall()} to do the work.
6638
6639 @code{apply1()}, @code{call0()}, @code{call1()}, @code{call2()}, and
6640 @code{call3()} call a function, passing it the argument(s) given (the
6641 arguments are given as separate C arguments rather than being passed as
6642 an array).  @code{apply1()} uses @code{Fapply()} while the others use
6643 @code{Ffuncall()} to do the real work.
6644
6645 @node Dynamic Binding; The specbinding Stack; Unwind-Protects, Simple Special Forms, Evaluation, Evaluation; Stack Frames; Bindings
6646 @section Dynamic Binding; The specbinding Stack; Unwind-Protects
6647
6648 @example
6649 struct specbinding
6650 @{
6651   Lisp_Object symbol;
6652   Lisp_Object old_value;
6653   Lisp_Object (*func) (Lisp_Object); /* for unwind-protect */
6654 @};
6655 @end example
6656
6657   @code{struct specbinding} is used for local-variable bindings and
6658 unwind-protects.  @code{specpdl} holds an array of @code{struct specbinding}'s,
6659 @code{specpdl_ptr} points to the beginning of the free bindings in the
6660 array, @code{specpdl_size} specifies the total number of binding slots
6661 in the array, and @code{max_specpdl_size} specifies the maximum number
6662 of bindings the array can be expanded to hold.  @code{grow_specpdl()}
6663 increases the size of the @code{specpdl} array, multiplying its size by
6664 2 but never exceeding @code{max_specpdl_size} (except that if this
6665 number is less than 400, it is first set to 400).
6666
6667   @code{specbind()} binds a symbol to a value and is used for local
6668 variables and @code{let} forms.  The symbol and its old value (which
6669 might be @code{Qunbound}, indicating no prior value) are recorded in the
6670 specpdl array, and @code{specpdl_size} is increased by 1.
6671
6672   @code{record_unwind_protect()} implements an @dfn{unwind-protect},
6673 which, when placed around a section of code, ensures that some specified
6674 cleanup routine will be executed even if the code exits abnormally
6675 (e.g. through a @code{throw} or quit).  @code{record_unwind_protect()}
6676 simply adds a new specbinding to the @code{specpdl} array and stores the
6677 appropriate information in it.  The cleanup routine can either be a C
6678 function, which is stored in the @code{func} field, or a @code{progn}
6679 form, which is stored in the @code{old_value} field.
6680
6681   @code{unbind_to()} removes specbindings from the @code{specpdl} array
6682 until the specified position is reached.  Each specbinding can be one of
6683 three types:
6684
6685 @enumerate
6686 @item
6687 an unwind-protect with a C cleanup function (@code{func} is not 0, and
6688 @code{old_value} holds an argument to be passed to the function);
6689 @item
6690 an unwind-protect with a Lisp form (@code{func} is 0, @code{symbol}
6691 is @code{nil}, and @code{old_value} holds the form to be executed with
6692 @code{Fprogn()}); or
6693 @item
6694 a local-variable binding (@code{func} is 0, @code{symbol} is not
6695 @code{nil}, and @code{old_value} holds the old value, which is stored as
6696 the symbol's value).
6697 @end enumerate
6698
6699 @node Simple Special Forms, Catch and Throw, Dynamic Binding; The specbinding Stack; Unwind-Protects, Evaluation; Stack Frames; Bindings
6700 @section Simple Special Forms
6701
6702 @code{or}, @code{and}, @code{if}, @code{cond}, @code{progn},
6703 @code{prog1}, @code{prog2}, @code{setq}, @code{quote}, @code{function},
6704 @code{let*}, @code{let}, @code{while}
6705
6706 All of these are very simple and work as expected, calling
6707 @code{Feval()} or @code{Fprogn()} as necessary and (in the case of
6708 @code{let} and @code{let*}) using @code{specbind()} to create bindings
6709 and @code{unbind_to()} to undo the bindings when finished.
6710
6711 Note that, with the exeption of @code{Fprogn}, these functions are
6712 typically called in real life only in interpreted code, since the byte
6713 compiler knows how to convert calls to these functions directly into
6714 byte code.
6715
6716 @node Catch and Throw,  , Simple Special Forms, Evaluation; Stack Frames; Bindings
6717 @section Catch and Throw
6718
6719 @example
6720 struct catchtag
6721 @{
6722   Lisp_Object tag;
6723   Lisp_Object val;
6724   struct catchtag *next;
6725   struct gcpro *gcpro;
6726   jmp_buf jmp;
6727   struct backtrace *backlist;
6728   int lisp_eval_depth;
6729   int pdlcount;
6730 @};
6731 @end example
6732
6733   @code{catch} is a Lisp function that places a catch around a body of
6734 code.  A catch is a means of non-local exit from the code.  When a catch
6735 is created, a tag is specified, and executing a @code{throw} to this tag
6736 will exit from the body of code caught with this tag, and its value will
6737 be the value given in the call to @code{throw}.  If there is no such
6738 call, the code will be executed normally.
6739
6740   Information pertaining to a catch is held in a @code{struct catchtag},
6741 which is placed at the head of a linked list pointed to by
6742 @code{catchlist}.  @code{internal_catch()} is passed a C function to
6743 call (@code{Fprogn()} when Lisp @code{catch} is called) and arguments to
6744 give it, and places a catch around the function.  Each @code{struct
6745 catchtag} is held in the stack frame of the @code{internal_catch()}
6746 instance that created the catch.
6747
6748   @code{internal_catch()} is fairly straightforward.  It stores into the
6749 @code{struct catchtag} the tag name and the current values of
6750 @code{backtrace_list}, @code{lisp_eval_depth}, @code{gcprolist}, and the
6751 offset into the @code{specpdl} array, sets a jump point with @code{_setjmp()}
6752 (storing the jump point into the @code{struct catchtag}), and calls the
6753 function.  Control will return to @code{internal_catch()} either when
6754 the function exits normally or through a @code{_longjmp()} to this jump
6755 point.  In the latter case, @code{throw} will store the value to be
6756 returned into the @code{struct catchtag} before jumping.  When it's
6757 done, @code{internal_catch()} removes the @code{struct catchtag} from
6758 the catchlist and returns the proper value.
6759
6760   @code{Fthrow()} goes up through the catchlist until it finds one with
6761 a matching tag.  It then calls @code{unbind_catch()} to restore
6762 everything to what it was when the appropriate catch was set, stores the
6763 return value in the @code{struct catchtag}, and jumps (with
6764 @code{_longjmp()}) to its jump point.
6765
6766   @code{unbind_catch()} removes all catches from the catchlist until it
6767 finds the correct one.  Some of the catches might have been placed for
6768 error-trapping, and if so, the appropriate entries on the handlerlist
6769 must be removed (see ``errors'').  @code{unbind_catch()} also restores
6770 the values of @code{gcprolist}, @code{backtrace_list}, and
6771 @code{lisp_eval}, and calls @code{unbind_to()} to undo any specbindings
6772 created since the catch.
6773
6774
6775 @node Symbols and Variables, Buffers and Textual Representation, Evaluation; Stack Frames; Bindings, Top
6776 @chapter Symbols and Variables
6777
6778 @menu
6779 * Introduction to Symbols::
6780 * Obarrays::
6781 * Symbol Values::
6782 @end menu
6783
6784 @node Introduction to Symbols, Obarrays, Symbols and Variables, Symbols and Variables
6785 @section Introduction to Symbols
6786
6787   A symbol is basically just an object with four fields: a name (a
6788 string), a value (some Lisp object), a function (some Lisp object), and
6789 a property list (usually a list of alternating keyword/value pairs).
6790 What makes symbols special is that there is usually only one symbol with
6791 a given name, and the symbol is referred to by name.  This makes a
6792 symbol a convenient way of calling up data by name, i.e. of implementing
6793 variables. (The variable's value is stored in the @dfn{value slot}.)
6794 Similarly, functions are referenced by name, and the definition of the
6795 function is stored in a symbol's @dfn{function slot}.  This means that
6796 there can be a distinct function and variable with the same name.  The
6797 property list is used as a more general mechanism of associating
6798 additional values with particular names, and once again the namespace is
6799 independent of the function and variable namespaces.
6800
6801 @node Obarrays, Symbol Values, Introduction to Symbols, Symbols and Variables
6802 @section Obarrays
6803
6804   The identity of symbols with their names is accomplished through a
6805 structure called an obarray, which is just a poorly-implemented hash
6806 table mapping from strings to symbols whose name is that string. (I say
6807 ``poorly implemented'' because an obarray appears in Lisp as a vector
6808 with some hidden fields rather than as its own opaque type.  This is an
6809 Emacs Lisp artifact that should be fixed.)
6810
6811   Obarrays are implemented as a vector of some fixed size (which should
6812 be a prime for best results), where each ``bucket'' of the vector
6813 contains one or more symbols, threaded through a hidden @code{next}
6814 field in the symbol.  Lookup of a symbol in an obarray, and adding a
6815 symbol to an obarray, is accomplished through standard hash-table
6816 techniques.
6817
6818   The standard Lisp function for working with symbols and obarrays is
6819 @code{intern}.  This looks up a symbol in an obarray given its name; if
6820 it's not found, a new symbol is automatically created with the specified
6821 name, added to the obarray, and returned.  This is what happens when the
6822 Lisp reader encounters a symbol (or more precisely, encounters the name
6823 of a symbol) in some text that it is reading.  There is a standard
6824 obarray called @code{obarray} that is used for this purpose, although
6825 the Lisp programmer is free to create his own obarrays and @code{intern}
6826 symbols in them.
6827
6828   Note that, once a symbol is in an obarray, it stays there until
6829 something is done about it, and the standard obarray @code{obarray}
6830 always stays around, so once you use any particular variable name, a
6831 corresponding symbol will stay around in @code{obarray} until you exit
6832 XEmacs.
6833
6834   Note that @code{obarray} itself is a variable, and as such there is a
6835 symbol in @code{obarray} whose name is @code{"obarray"} and which
6836 contains @code{obarray} as its value.
6837
6838   Note also that this call to @code{intern} occurs only when in the Lisp
6839 reader, not when the code is executed (at which point the symbol is
6840 already around, stored as such in the definition of the function).
6841
6842   You can create your own obarray using @code{make-vector} (this is
6843 horrible but is an artifact) and intern symbols into that obarray.
6844 Doing that will result in two or more symbols with the same name.
6845 However, at most one of these symbols is in the standard @code{obarray}:
6846 You cannot have two symbols of the same name in any particular obarray.
6847 Note that you cannot add a symbol to an obarray in any fashion other
6848 than using @code{intern}: i.e. you can't take an existing symbol and put
6849 it in an existing obarray.  Nor can you change the name of an existing
6850 symbol. (Since obarrays are vectors, you can violate the consistency of
6851 things by storing directly into the vector, but let's ignore that
6852 possibility.)
6853
6854   Usually symbols are created by @code{intern}, but if you really want,
6855 you can explicitly create a symbol using @code{make-symbol}, giving it
6856 some name.  The resulting symbol is not in any obarray (i.e. it is
6857 @dfn{uninterned}), and you can't add it to any obarray.  Therefore its
6858 primary purpose is as a symbol to use in macros to avoid namespace
6859 pollution.  It can also be used as a carrier of information, but cons
6860 cells could probably be used just as well.
6861
6862   You can also use @code{intern-soft} to look up a symbol but not create
6863 a new one, and @code{unintern} to remove a symbol from an obarray.  This
6864 returns the removed symbol. (Remember: You can't put the symbol back
6865 into any obarray.) Finally, @code{mapatoms} maps over all of the symbols
6866 in an obarray.
6867
6868 @node Symbol Values,  , Obarrays, Symbols and Variables
6869 @section Symbol Values
6870
6871   The value field of a symbol normally contains a Lisp object.  However,
6872 a symbol can be @dfn{unbound}, meaning that it logically has no value.
6873 This is internally indicated by storing a special Lisp object, called
6874 @dfn{the unbound marker} and stored in the global variable
6875 @code{Qunbound}.  The unbound marker is of a special Lisp object type
6876 called @dfn{symbol-value-magic}.  It is impossible for the Lisp
6877 programmer to directly create or access any object of this type.
6878
6879   @strong{You must not let any ``symbol-value-magic'' object escape to
6880 the Lisp level.}  Printing any of these objects will cause the message
6881 @samp{INTERNAL EMACS BUG} to appear as part of the print representation.
6882 (You may see this normally when you call @code{debug_print()} from the
6883 debugger on a Lisp object.) If you let one of these objects escape to
6884 the Lisp level, you will violate a number of assumptions contained in
6885 the C code and make the unbound marker not function right.
6886
6887   When a symbol is created, its value field (and function field) are set
6888 to @code{Qunbound}.  The Lisp programmer can restore these conditions
6889 later using @code{makunbound} or @code{fmakunbound}, and can query to
6890 see whether the value of function fields are @dfn{bound} (i.e. have a
6891 value other than @code{Qunbound}) using @code{boundp} and
6892 @code{fboundp}.  The fields are set to a normal Lisp object using
6893 @code{set} (or @code{setq}) and @code{fset}.
6894
6895   Other symbol-value-magic objects are used as special markers to
6896 indicate variables that have non-normal properties.  This includes any
6897 variables that are tied into C variables (setting the variable magically
6898 sets some global variable in the C code, and likewise for retrieving the
6899 variable's value), variables that magically tie into slots in the
6900 current buffer, variables that are buffer-local, etc.  The
6901 symbol-value-magic object is stored in the value cell in place of
6902 a normal object, and the code to retrieve a symbol's value
6903 (i.e. @code{symbol-value}) knows how to do special things with them.
6904 This means that you should not just fetch the value cell directly if you
6905 want a symbol's value.
6906
6907   The exact workings of this are rather complex and involved and are
6908 well-documented in comments in @file{buffer.c}, @file{symbols.c}, and
6909 @file{lisp.h}.
6910
6911 @node Buffers and Textual Representation, MULE Character Sets and Encodings, Symbols and Variables, Top
6912 @chapter Buffers and Textual Representation
6913
6914 @menu
6915 * Introduction to Buffers::     A buffer holds a block of text such as a file.
6916 * The Text in a Buffer::        Representation of the text in a buffer.
6917 * Buffer Lists::                Keeping track of all buffers.
6918 * Markers and Extents::         Tagging locations within a buffer.
6919 * Bufbytes and Emchars::        Representation of individual characters.
6920 * The Buffer Object::           The Lisp object corresponding to a buffer.
6921 @end menu
6922
6923 @node Introduction to Buffers, The Text in a Buffer, Buffers and Textual Representation, Buffers and Textual Representation
6924 @section Introduction to Buffers
6925
6926   A buffer is logically just a Lisp object that holds some text.
6927 In this, it is like a string, but a buffer is optimized for
6928 frequent insertion and deletion, while a string is not.  Furthermore:
6929
6930 @enumerate
6931 @item
6932 Buffers are @dfn{permanent} objects, i.e. once you create them, they
6933 remain around, and need to be explicitly deleted before they go away.
6934 @item
6935 Each buffer has a unique name, which is a string.  Buffers are
6936 normally referred to by name.  In this respect, they are like
6937 symbols.
6938 @item
6939 Buffers have a default insertion position, called @dfn{point}.
6940 Inserting text (unless you explicitly give a position) goes at point,
6941 and moves point forward past the text.  This is what is going on when
6942 you type text into Emacs.
6943 @item
6944 Buffers have lots of extra properties associated with them.
6945 @item
6946 Buffers can be @dfn{displayed}.  What this means is that there
6947 exist a number of @dfn{windows}, which are objects that correspond
6948 to some visible section of your display, and each window has
6949 an associated buffer, and the current contents of the buffer
6950 are shown in that section of the display.  The redisplay mechanism
6951 (which takes care of doing this) knows how to look at the
6952 text of a buffer and come up with some reasonable way of displaying
6953 this.  Many of the properties of a buffer control how the
6954 buffer's text is displayed.
6955 @item
6956 One buffer is distinguished and called the @dfn{current buffer}.  It is
6957 stored in the variable @code{current_buffer}.  Buffer operations operate
6958 on this buffer by default.  When you are typing text into a buffer, the
6959 buffer you are typing into is always @code{current_buffer}.  Switching
6960 to a different window changes the current buffer.  Note that Lisp code
6961 can temporarily change the current buffer using @code{set-buffer} (often
6962 enclosed in a @code{save-excursion} so that the former current buffer
6963 gets restored when the code is finished).  However, calling
6964 @code{set-buffer} will NOT cause a permanent change in the current
6965 buffer.  The reason for this is that the top-level event loop sets
6966 @code{current_buffer} to the buffer of the selected window, each time
6967 it finishes executing a user command.
6968 @end enumerate
6969
6970   Make sure you understand the distinction between @dfn{current buffer}
6971 and @dfn{buffer of the selected window}, and the distinction between
6972 @dfn{point} of the current buffer and @dfn{window-point} of the selected
6973 window. (This latter distinction is explained in detail in the section
6974 on windows.)
6975
6976 @node The Text in a Buffer, Buffer Lists, Introduction to Buffers, Buffers and Textual Representation
6977 @section The Text in a Buffer
6978
6979   The text in a buffer consists of a sequence of zero or more
6980 characters.  A @dfn{character} is an integer that logically represents
6981 a letter, number, space, or other unit of text.  Most of the characters
6982 that you will typically encounter belong to the ASCII set of characters,
6983 but there are also characters for various sorts of accented letters,
6984 special symbols, Chinese and Japanese ideograms (i.e. Kanji, Katakana,
6985 etc.), Cyrillic and Greek letters, etc.  The actual number of possible
6986 characters is quite large.
6987
6988   For now, we can view a character as some non-negative integer that
6989 has some shape that defines how it typically appears (e.g. as an
6990 uppercase A). (The exact way in which a character appears depends on the
6991 font used to display the character.) The internal type of characters in
6992 the C code is an @code{Emchar}; this is just an @code{int}, but using a
6993 symbolic type makes the code clearer.
6994
6995   Between every character in a buffer is a @dfn{buffer position} or
6996 @dfn{character position}.  We can speak of the character before or after
6997 a particular buffer position, and when you insert a character at a
6998 particular position, all characters after that position end up at new
6999 positions.  When we speak of the character @dfn{at} a position, we
7000 really mean the character after the position.  (This schizophrenia
7001 between a buffer position being ``between'' a character and ``on'' a
7002 character is rampant in Emacs.)
7003
7004   Buffer positions are numbered starting at 1.  This means that
7005 position 1 is before the first character, and position 0 is not
7006 valid.  If there are N characters in a buffer, then buffer
7007 position N+1 is after the last one, and position N+2 is not valid.
7008
7009   The internal makeup of the Emchar integer varies depending on whether
7010 we have compiled with MULE support.  If not, the Emchar integer is an
7011 8-bit integer with possible values from 0 - 255.  0 - 127 are the
7012 standard ASCII characters, while 128 - 255 are the characters from the
7013 ISO-8859-1 character set.  If we have compiled with MULE support, an
7014 Emchar is a 19-bit integer, with the various bits having meanings
7015 according to a complex scheme that will be detailed later.  The
7016 characters numbered 0 - 255 still have the same meanings as for the
7017 non-MULE case, though.
7018
7019   Internally, the text in a buffer is represented in a fairly simple
7020 fashion: as a contiguous array of bytes, with a @dfn{gap} of some size
7021 in the middle.  Although the gap is of some substantial size in bytes,
7022 there is no text contained within it: From the perspective of the text
7023 in the buffer, it does not exist.  The gap logically sits at some buffer
7024 position, between two characters (or possibly at the beginning or end of
7025 the buffer).  Insertion of text in a buffer at a particular position is
7026 always accomplished by first moving the gap to that position
7027 (i.e. through some block moving of text), then writing the text into the
7028 beginning of the gap, thereby shrinking the gap.  If the gap shrinks
7029 down to nothing, a new gap is created. (What actually happens is that a
7030 new gap is ``created'' at the end of the buffer's text, which requires
7031 nothing more than changing a couple of indices; then the gap is
7032 ``moved'' to the position where the insertion needs to take place by
7033 moving up in memory all the text after that position.)  Similarly,
7034 deletion occurs by moving the gap to the place where the text is to be
7035 deleted, and then simply expanding the gap to include the deleted text.
7036 (@dfn{Expanding} and @dfn{shrinking} the gap as just described means
7037 just that the internal indices that keep track of where the gap is
7038 located are changed.)
7039
7040   Note that the total amount of memory allocated for a buffer text never
7041 decreases while the buffer is live.  Therefore, if you load up a
7042 20-megabyte file and then delete all but one character, there will be a
7043 20-megabyte gap, which won't get any smaller (except by inserting
7044 characters back again).  Once the buffer is killed, the memory allocated
7045 for the buffer text will be freed, but it will still be sitting on the
7046 heap, taking up virtual memory, and will not be released back to the
7047 operating system. (However, if you have compiled XEmacs with rel-alloc,
7048 the situation is different.  In this case, the space @emph{will} be
7049 released back to the operating system.  However, this tends to result in a
7050 noticeable speed penalty.)
7051
7052   Astute readers may notice that the text in a buffer is represented as
7053 an array of @emph{bytes}, while (at least in the MULE case) an Emchar is
7054 a 19-bit integer, which clearly cannot fit in a byte.  This means (of
7055 course) that the text in a buffer uses a different representation from
7056 an Emchar: specifically, the 19-bit Emchar becomes a series of one to
7057 four bytes.  The conversion between these two representations is complex
7058 and will be described later.
7059
7060   In the non-MULE case, everything is very simple: An Emchar
7061 is an 8-bit value, which fits neatly into one byte.
7062
7063   If we are given a buffer position and want to retrieve the
7064 character at that position, we need to follow these steps:
7065
7066 @enumerate
7067 @item
7068 Pretend there's no gap, and convert the buffer position into a @dfn{byte
7069 index} that indexes to the appropriate byte in the buffer's stream of
7070 textual bytes.  By convention, byte indices begin at 1, just like buffer
7071 positions.  In the non-MULE case, byte indices and buffer positions are
7072 identical, since one character equals one byte.
7073 @item
7074 Convert the byte index into a @dfn{memory index}, which takes the gap
7075 into account.  The memory index is a direct index into the block of
7076 memory that stores the text of a buffer.  This basically just involves
7077 checking to see if the byte index is past the gap, and if so, adding the
7078 size of the gap to it.  By convention, memory indices begin at 1, just
7079 like buffer positions and byte indices, and when referring to the
7080 position that is @dfn{at} the gap, we always use the memory position at
7081 the @emph{beginning}, not at the end, of the gap.
7082 @item
7083 Fetch the appropriate bytes at the determined memory position.
7084 @item
7085 Convert these bytes into an Emchar.
7086 @end enumerate
7087
7088   In the non-Mule case, (3) and (4) boil down to a simple one-byte
7089 memory access.
7090
7091   Note that we have defined three types of positions in a buffer:
7092
7093 @enumerate
7094 @item
7095 @dfn{buffer positions} or @dfn{character positions}, typedef @code{Bufpos}
7096 @item
7097 @dfn{byte indices}, typedef @code{Bytind}
7098 @item
7099 @dfn{memory indices}, typedef @code{Memind}
7100 @end enumerate
7101
7102   All three typedefs are just @code{int}s, but defining them this way makes
7103 things a lot clearer.
7104
7105   Most code works with buffer positions.  In particular, all Lisp code
7106 that refers to text in a buffer uses buffer positions.  Lisp code does
7107 not know that byte indices or memory indices exist.
7108
7109   Finally, we have a typedef for the bytes in a buffer.  This is a
7110 @code{Bufbyte}, which is an unsigned char.  Referring to them as
7111 Bufbytes underscores the fact that we are working with a string of bytes
7112 in the internal Emacs buffer representation rather than in one of a
7113 number of possible alternative representations (e.g. EUC-encoded text,
7114 etc.).
7115
7116 @node Buffer Lists, Markers and Extents, The Text in a Buffer, Buffers and Textual Representation
7117 @section Buffer Lists
7118
7119   Recall earlier that buffers are @dfn{permanent} objects, i.e.  that
7120 they remain around until explicitly deleted.  This entails that there is
7121 a list of all the buffers in existence.  This list is actually an
7122 assoc-list (mapping from the buffer's name to the buffer) and is stored
7123 in the global variable @code{Vbuffer_alist}.
7124
7125   The order of the buffers in the list is important: the buffers are
7126 ordered approximately from most-recently-used to least-recently-used.
7127 Switching to a buffer using @code{switch-to-buffer},
7128 @code{pop-to-buffer}, etc. and switching windows using
7129 @code{other-window}, etc.  usually brings the new current buffer to the
7130 front of the list.  @code{switch-to-buffer}, @code{other-buffer},
7131 etc. look at the beginning of the list to find an alternative buffer to
7132 suggest.  You can also explicitly move a buffer to the end of the list
7133 using @code{bury-buffer}.
7134
7135   In addition to the global ordering in @code{Vbuffer_alist}, each frame
7136 has its own ordering of the list.  These lists always contain the same
7137 elements as in @code{Vbuffer_alist} although possibly in a different
7138 order.  @code{buffer-list} normally returns the list for the selected
7139 frame.  This allows you to work in separate frames without things
7140 interfering with each other.
7141
7142   The standard way to look up a buffer given a name is
7143 @code{get-buffer}, and the standard way to create a new buffer is
7144 @code{get-buffer-create}, which looks up a buffer with a given name,
7145 creating a new one if necessary.  These operations correspond exactly
7146 with the symbol operations @code{intern-soft} and @code{intern},
7147 respectively.  You can also force a new buffer to be created using
7148 @code{generate-new-buffer}, which takes a name and (if necessary) makes
7149 a unique name from this by appending a number, and then creates the
7150 buffer.  This is basically like the symbol operation @code{gensym}.
7151
7152 @node Markers and Extents, Bufbytes and Emchars, Buffer Lists, Buffers and Textual Representation
7153 @section Markers and Extents
7154
7155   Among the things associated with a buffer are things that are
7156 logically attached to certain buffer positions.  This can be used to
7157 keep track of a buffer position when text is inserted and deleted, so
7158 that it remains at the same spot relative to the text around it; to
7159 assign properties to particular sections of text; etc.  There are two
7160 such objects that are useful in this regard: they are @dfn{markers} and
7161 @dfn{extents}.
7162
7163   A @dfn{marker} is simply a flag placed at a particular buffer
7164 position, which is moved around as text is inserted and deleted.
7165 Markers are used for all sorts of purposes, such as the @code{mark} that
7166 is the other end of textual regions to be cut, copied, etc.
7167
7168   An @dfn{extent} is similar to two markers plus some associated
7169 properties, and is used to keep track of regions in a buffer as text is
7170 inserted and deleted, and to add properties (e.g. fonts) to particular
7171 regions of text.  The external interface of extents is explained
7172 elsewhere.
7173
7174   The important thing here is that markers and extents simply contain
7175 buffer positions in them as integers, and every time text is inserted or
7176 deleted, these positions must be updated.  In order to minimize the
7177 amount of shuffling that needs to be done, the positions in markers and
7178 extents (there's one per marker, two per extent) and stored in Meminds.
7179 This means that they only need to be moved when the text is physically
7180 moved in memory; since the gap structure tries to minimize this, it also
7181 minimizes the number of marker and extent indices that need to be
7182 adjusted.  Look in @file{insdel.c} for the details of how this works.
7183
7184   One other important distinction is that markers are @dfn{temporary}
7185 while extents are @dfn{permanent}.  This means that markers disappear as
7186 soon as there are no more pointers to them, and correspondingly, there
7187 is no way to determine what markers are in a buffer if you are just
7188 given the buffer.  Extents remain in a buffer until they are detached
7189 (which could happen as a result of text being deleted) or the buffer is
7190 deleted, and primitives do exist to enumerate the extents in a buffer.
7191
7192 @node Bufbytes and Emchars, The Buffer Object, Markers and Extents, Buffers and Textual Representation
7193 @section Bufbytes and Emchars
7194
7195   Not yet documented.
7196
7197 @node The Buffer Object,  , Bufbytes and Emchars, Buffers and Textual Representation
7198 @section The Buffer Object
7199
7200   Buffers contain fields not directly accessible by the Lisp programmer.
7201 We describe them here, naming them by the names used in the C code.
7202 Many are accessible indirectly in Lisp programs via Lisp primitives.
7203
7204 @table @code
7205 @item name
7206 The buffer name is a string that names the buffer.  It is guaranteed to
7207 be unique.  @xref{Buffer Names,,, lispref, XEmacs Lisp Programmer's
7208 Manual}.
7209
7210 @item save_modified
7211 This field contains the time when the buffer was last saved, as an
7212 integer.  @xref{Buffer Modification,,, lispref, XEmacs Lisp Programmer's
7213 Manual}.
7214
7215 @item modtime
7216 This field contains the modification time of the visited file.  It is
7217 set when the file is written or read.  Every time the buffer is written
7218 to the file, this field is compared to the modification time of the
7219 file.  @xref{Buffer Modification,,, lispref, XEmacs Lisp Programmer's
7220 Manual}.
7221
7222 @item auto_save_modified
7223 This field contains the time when the buffer was last auto-saved.
7224
7225 @item last_window_start
7226 This field contains the @code{window-start} position in the buffer as of
7227 the last time the buffer was displayed in a window.
7228
7229 @item undo_list
7230 This field points to the buffer's undo list.  @xref{Undo,,, lispref,
7231 XEmacs Lisp Programmer's Manual}.
7232
7233 @item syntax_table_v
7234 This field contains the syntax table for the buffer.  @xref{Syntax
7235 Tables,,, lispref, XEmacs Lisp Programmer's Manual}.
7236
7237 @item downcase_table
7238 This field contains the conversion table for converting text to lower
7239 case.  @xref{Case Tables,,, lispref, XEmacs Lisp Programmer's Manual}.
7240
7241 @item upcase_table
7242 This field contains the conversion table for converting text to upper
7243 case.  @xref{Case Tables,,, lispref, XEmacs Lisp Programmer's Manual}.
7244
7245 @item case_canon_table
7246 This field contains the conversion table for canonicalizing text for
7247 case-folding search.  @xref{Case Tables,,, lispref, XEmacs Lisp
7248 Programmer's Manual}.
7249
7250 @item case_eqv_table
7251 This field contains the equivalence table for case-folding search.
7252 @xref{Case Tables,,, lispref, XEmacs Lisp Programmer's Manual}.
7253
7254 @item display_table
7255 This field contains the buffer's display table, or @code{nil} if it
7256 doesn't have one.  @xref{Display Tables,,, lispref, XEmacs Lisp
7257 Programmer's Manual}.
7258
7259 @item markers
7260 This field contains the chain of all markers that currently point into
7261 the buffer.  Deletion of text in the buffer, and motion of the buffer's
7262 gap, must check each of these markers and perhaps update it.
7263 @xref{Markers,,, lispref, XEmacs Lisp Programmer's Manual}.
7264
7265 @item backed_up
7266 This field is a flag that tells whether a backup file has been made for
7267 the visited file of this buffer.
7268
7269 @item mark
7270 This field contains the mark for the buffer.  The mark is a marker,
7271 hence it is also included on the list @code{markers}.  @xref{The Mark,,,
7272 lispref, XEmacs Lisp Programmer's Manual}.
7273
7274 @item mark_active
7275 This field is non-@code{nil} if the buffer's mark is active.
7276
7277 @item local_var_alist
7278 This field contains the association list describing the variables local
7279 in this buffer, and their values, with the exception of local variables
7280 that have special slots in the buffer object.  (Those slots are omitted
7281 from this table.)  @xref{Buffer-Local Variables,,, lispref, XEmacs Lisp
7282 Programmer's Manual}.
7283
7284 @item modeline_format
7285 This field contains a Lisp object which controls how to display the mode
7286 line for this buffer.  @xref{Modeline Format,,, lispref, XEmacs Lisp
7287 Programmer's Manual}.
7288
7289 @item base_buffer
7290 This field holds the buffer's base buffer (if it is an indirect buffer),
7291 or @code{nil}.
7292 @end table
7293
7294 @node MULE Character Sets and Encodings, The Lisp Reader and Compiler, Buffers and Textual Representation, Top
7295 @chapter MULE Character Sets and Encodings
7296
7297   Recall that there are two primary ways that text is represented in
7298 XEmacs.  The @dfn{buffer} representation sees the text as a series of
7299 bytes (Bufbytes), with a variable number of bytes used per character.
7300 The @dfn{character} representation sees the text as a series of integers
7301 (Emchars), one per character.  The character representation is a cleaner
7302 representation from a theoretical standpoint, and is thus used in many
7303 cases when lots of manipulations on a string need to be done.  However,
7304 the buffer representation is the standard representation used in both
7305 Lisp strings and buffers, and because of this, it is the ``default''
7306 representation that text comes in.  The reason for using this
7307 representation is that it's compact and is compatible with ASCII.
7308
7309 @menu
7310 * Character Sets::
7311 * Encodings::
7312 * Internal Mule Encodings::
7313 * CCL::
7314 @end menu
7315
7316 @node Character Sets, Encodings, MULE Character Sets and Encodings, MULE Character Sets and Encodings
7317 @section Character Sets
7318
7319   A character set (or @dfn{charset}) is an ordered set of characters.  A
7320 particular character in a charset is indexed using one or more
7321 @dfn{position codes}, which are non-negative integers.  The number of
7322 position codes needed to identify a particular character in a charset is
7323 called the @dfn{dimension} of the charset.  In XEmacs/Mule, all charsets
7324 have dimension 1 or 2, and the size of all charsets (except for a few
7325 special cases) is either 94, 96, 94 by 94, or 96 by 96.  The range of
7326 position codes used to index characters from any of these types of
7327 character sets is as follows:
7328
7329 @example
7330 Charset type            Position code 1         Position code 2
7331 ------------------------------------------------------------
7332 94                      33 - 126                N/A
7333 96                      32 - 127                N/A
7334 94x94                   33 - 126                33 - 126
7335 96x96                   32 - 127                32 - 127
7336 @end example
7337
7338   Note that in the above cases position codes do not start at an
7339 expected value such as 0 or 1.  The reason for this will become clear
7340 later.
7341
7342   For example, Latin-1 is a 96-character charset, and JISX0208 (the
7343 Japanese national character set) is a 94x94-character charset.
7344
7345   [Note that, although the ranges above define the @emph{valid} position
7346 codes for a charset, some of the slots in a particular charset may in
7347 fact be empty.  This is the case for JISX0208, for example, where (e.g.)
7348 all the slots whose first position code is in the range 118 - 127 are
7349 empty.]
7350
7351   There are three charsets that do not follow the above rules.  All of
7352 them have one dimension, and have ranges of position codes as follows:
7353
7354 @example
7355 Charset name            Position code 1
7356 ------------------------------------
7357 ASCII                   0 - 127
7358 Control-1               0 - 31
7359 Composite               0 - some large number
7360 @end example
7361
7362   (The upper bound of the position code for composite characters has not
7363 yet been determined, but it will probably be at least 16,383).
7364
7365   ASCII is the union of two subsidiary character sets: Printing-ASCII
7366 (the printing ASCII character set, consisting of position codes 33 -
7367 126, like for a standard 94-character charset) and Control-ASCII (the
7368 non-printing characters that would appear in a binary file with codes 0
7369 - 32 and 127).
7370
7371   Control-1 contains the non-printing characters that would appear in a
7372 binary file with codes 128 - 159.
7373
7374   Composite contains characters that are generated by overstriking one
7375 or more characters from other charsets.
7376
7377   Note that some characters in ASCII, and all characters in Control-1,
7378 are @dfn{control} (non-printing) characters.  These have no printed
7379 representation but instead control some other function of the printing
7380 (e.g. TAB or 8 moves the current character position to the next tab
7381 stop).  All other characters in all charsets are @dfn{graphic}
7382 (printing) characters.
7383
7384   When a binary file is read in, the bytes in the file are assigned to
7385 character sets as follows:
7386
7387 @example
7388 Bytes           Character set           Range
7389 --------------------------------------------------
7390 0 - 127         ASCII                   0 - 127
7391 128 - 159       Control-1               0 - 31
7392 160 - 255       Latin-1                 32 - 127
7393 @end example
7394
7395   This is a bit ad-hoc but gets the job done.
7396
7397 @node Encodings, Internal Mule Encodings, Character Sets, MULE Character Sets and Encodings
7398 @section Encodings
7399
7400   An @dfn{encoding} is a way of numerically representing characters from
7401 one or more character sets.  If an encoding only encompasses one
7402 character set, then the position codes for the characters in that
7403 character set could be used directly.  This is not possible, however, if
7404 more than one character set is to be used in the encoding.
7405
7406   For example, the conversion detailed above between bytes in a binary
7407 file and characters is effectively an encoding that encompasses the
7408 three character sets ASCII, Control-1, and Latin-1 in a stream of 8-bit
7409 bytes.
7410
7411   Thus, an encoding can be viewed as a way of encoding characters from a
7412 specified group of character sets using a stream of bytes, each of which
7413 contains a fixed number of bits (but not necessarily 8, as in the common
7414 usage of ``byte'').
7415
7416   Here are descriptions of a couple of common
7417 encodings:
7418
7419 @menu
7420 * Japanese EUC (Extended Unix Code)::
7421 * JIS7::
7422 @end menu
7423
7424 @node Japanese EUC (Extended Unix Code), JIS7, Encodings, Encodings
7425 @subsection Japanese EUC (Extended Unix Code)
7426
7427 This encompasses the character sets Printing-ASCII, Japanese-JISX0201,
7428 and Japanese-JISX0208-Kana (half-width katakana, the right half of
7429 JISX0201).  It uses 8-bit bytes.
7430
7431 Note that Printing-ASCII and Japanese-JISX0201-Kana are 94-character
7432 charsets, while Japanese-JISX0208 is a 94x94-character charset.
7433
7434 The encoding is as follows:
7435
7436 @example
7437 Character set            Representation (PC=position-code)
7438 -------------            --------------
7439 Printing-ASCII           PC1
7440 Japanese-JISX0201-Kana   0x8E       | PC1 + 0x80
7441 Japanese-JISX0208        PC1 + 0x80 | PC2 + 0x80
7442 Japanese-JISX0212        PC1 + 0x80 | PC2 + 0x80
7443 @end example
7444
7445
7446 @node JIS7,  , Japanese EUC (Extended Unix Code), Encodings
7447 @subsection JIS7
7448
7449 This encompasses the character sets Printing-ASCII,
7450 Japanese-JISX0201-Roman (the left half of JISX0201; this character set
7451 is very similar to Printing-ASCII and is a 94-character charset),
7452 Japanese-JISX0208, and Japanese-JISX0201-Kana.  It uses 7-bit bytes.
7453
7454 Unlike Japanese EUC, this is a @dfn{modal} encoding, which
7455 means that there are multiple states that the encoding can
7456 be in, which affect how the bytes are to be interpreted.
7457 Special sequences of bytes (called @dfn{escape sequences})
7458 are used to change states.
7459
7460   The encoding is as follows:
7461
7462 @example
7463 Character set              Representation (PC=position-code)
7464 -------------              --------------
7465 Printing-ASCII             PC1
7466 Japanese-JISX0201-Roman    PC1
7467 Japanese-JISX0201-Kana     PC1
7468 Japanese-JISX0208          PC1 PC2
7469
7470
7471 Escape sequence   ASCII equivalent   Meaning
7472 ---------------   ----------------   -------
7473 0x1B 0x28 0x4A    ESC ( J            invoke Japanese-JISX0201-Roman
7474 0x1B 0x28 0x49    ESC ( I            invoke Japanese-JISX0201-Kana
7475 0x1B 0x24 0x42    ESC $ B            invoke Japanese-JISX0208
7476 0x1B 0x28 0x42    ESC ( B            invoke Printing-ASCII
7477 @end example
7478
7479   Initially, Printing-ASCII is invoked.
7480
7481 @node Internal Mule Encodings, CCL, Encodings, MULE Character Sets and Encodings
7482 @section Internal Mule Encodings
7483
7484 In XEmacs/Mule, each character set is assigned a unique number, called a
7485 @dfn{leading byte}.  This is used in the encodings of a character.
7486 Leading bytes are in the range 0x80 - 0xFF (except for ASCII, which has
7487 a leading byte of 0), although some leading bytes are reserved.
7488
7489 Charsets whose leading byte is in the range 0x80 - 0x9F are called
7490 @dfn{official} and are used for built-in charsets.  Other charsets are
7491 called @dfn{private} and have leading bytes in the range 0xA0 - 0xFF;
7492 these are user-defined charsets.
7493
7494   More specifically:
7495
7496 @example
7497 Character set           Leading byte
7498 -------------           ------------
7499 ASCII                   0
7500 Composite               0x80
7501 Dimension-1 Official    0x81 - 0x8D
7502                           (0x8E is free)
7503 Control-1               0x8F
7504 Dimension-2 Official    0x90 - 0x99
7505                           (0x9A - 0x9D are free;
7506                            0x9E and 0x9F are reserved)
7507 Dimension-1 Private     0xA0 - 0xEF
7508 Dimension-2 Private     0xF0 - 0xFF
7509 @end example
7510
7511 There are two internal encodings for characters in XEmacs/Mule.  One is
7512 called @dfn{string encoding} and is an 8-bit encoding that is used for
7513 representing characters in a buffer or string.  It uses 1 to 4 bytes per
7514 character.  The other is called @dfn{character encoding} and is a 19-bit
7515 encoding that is used for representing characters individually in a
7516 variable.
7517
7518 (In the following descriptions, we'll ignore composite characters for
7519 the moment.  We also give a general (structural) overview first,
7520 followed later by the exact details.)
7521
7522 @menu
7523 * Internal String Encoding::
7524 * Internal Character Encoding::
7525 @end menu
7526
7527 @node Internal String Encoding, Internal Character Encoding, Internal Mule Encodings, Internal Mule Encodings
7528 @subsection Internal String Encoding
7529
7530 ASCII characters are encoded using their position code directly.  Other
7531 characters are encoded using their leading byte followed by their
7532 position code(s) with the high bit set.  Characters in private character
7533 sets have their leading byte prefixed with a @dfn{leading byte prefix},
7534 which is either 0x9E or 0x9F. (No character sets are ever assigned these
7535 leading bytes.) Specifically:
7536
7537 @example
7538 Character set           Encoding (PC=position-code, LB=leading-byte)
7539 -------------           --------
7540 ASCII                   PC-1 |
7541 Control-1               LB   |  PC1 + 0xA0 |
7542 Dimension-1 official    LB   |  PC1 + 0x80 |
7543 Dimension-1 private     0x9E |  LB         | PC1 + 0x80 |
7544 Dimension-2 official    LB   |  PC1 + 0x80 | PC2 + 0x80 |
7545 Dimension-2 private     0x9F |  LB         | PC1 + 0x80 | PC2 + 0x80
7546 @end example
7547
7548   The basic characteristic of this encoding is that the first byte
7549 of all characters is in the range 0x00 - 0x9F, and the second and
7550 following bytes of all characters is in the range 0xA0 - 0xFF.
7551 This means that it is impossible to get out of sync, or more
7552 specifically:
7553
7554 @enumerate
7555 @item
7556 Given any byte position, the beginning of the character it is
7557 within can be determined in constant time.
7558 @item
7559 Given any byte position at the beginning of a character, the
7560 beginning of the next character can be determined in constant
7561 time.
7562 @item
7563 Given any byte position at the beginning of a character, the
7564 beginning of the previous character can be determined in constant
7565 time.
7566 @item
7567 Textual searches can simply treat encoded strings as if they
7568 were encoded in a one-byte-per-character fashion rather than
7569 the actual multi-byte encoding.
7570 @end enumerate
7571
7572   None of the standard non-modal encodings meet all of these
7573 conditions.  For example, EUC satisfies only (2) and (3), while
7574 Shift-JIS and Big5 (not yet described) satisfy only (2). (All
7575 non-modal encodings must satisfy (2), in order to be unambiguous.)
7576
7577 @node Internal Character Encoding,  , Internal String Encoding, Internal Mule Encodings
7578 @subsection Internal Character Encoding
7579
7580   One 19-bit word represents a single character.  The word is
7581 separated into three fields:
7582
7583 @example
7584 Bit number:     18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
7585                 <------------> <------------------> <------------------>
7586 Field:                1                  2                    3
7587 @end example
7588
7589   Note that fields 2 and 3 hold 7 bits each, while field 1 holds 5 bits.
7590
7591 @example
7592 Character set           Field 1         Field 2         Field 3
7593 -------------           -------         -------         -------
7594 ASCII                      0               0              PC1
7595    range:                                                   (00 - 7F)
7596 Control-1                  0               1              PC1
7597    range:                                                   (00 - 1F)
7598 Dimension-1 official       0            LB - 0x80         PC1
7599    range:                                    (01 - 0D)      (20 - 7F)
7600 Dimension-1 private        0            LB - 0x80         PC1
7601    range:                                    (20 - 6F)      (20 - 7F)
7602 Dimension-2 official    LB - 0x8F         PC1             PC2
7603    range:                    (01 - 0A)       (20 - 7F)      (20 - 7F)
7604 Dimension-2 private     LB - 0xE1         PC1             PC2
7605    range:                    (0F - 1E)       (20 - 7F)      (20 - 7F)
7606 Composite                 0x1F             ?               ?
7607 @end example
7608
7609   Note that character codes 0 - 255 are the same as the ``binary encoding''
7610 described above.
7611
7612 @node CCL,  , Internal Mule Encodings, MULE Character Sets and Encodings
7613 @section CCL
7614
7615 @example
7616 CCL PROGRAM SYNTAX:
7617      CCL_PROGRAM := (CCL_MAIN_BLOCK
7618                      [ CCL_EOF_BLOCK ])
7619
7620      CCL_MAIN_BLOCK := CCL_BLOCK
7621      CCL_EOF_BLOCK := CCL_BLOCK
7622
7623      CCL_BLOCK := STATEMENT | (STATEMENT [STATEMENT ...])
7624      STATEMENT :=
7625              SET | IF | BRANCH | LOOP | REPEAT | BREAK
7626              | READ | WRITE
7627
7628      SET := (REG = EXPRESSION) | (REG SELF_OP EXPRESSION)
7629             | INT-OR-CHAR
7630
7631      EXPRESSION := ARG | (EXPRESSION OP ARG)
7632
7633      IF := (if EXPRESSION CCL_BLOCK CCL_BLOCK)
7634      BRANCH := (branch EXPRESSION CCL_BLOCK [CCL_BLOCK ...])
7635      LOOP := (loop STATEMENT [STATEMENT ...])
7636      BREAK := (break)
7637      REPEAT := (repeat)
7638              | (write-repeat [REG | INT-OR-CHAR | string])
7639              | (write-read-repeat REG [INT-OR-CHAR | string | ARRAY]?)
7640      READ := (read REG) | (read REG REG)
7641              | (read-if REG ARITH_OP ARG CCL_BLOCK CCL_BLOCK)
7642              | (read-branch REG CCL_BLOCK [CCL_BLOCK ...])
7643      WRITE := (write REG) | (write REG REG)
7644              | (write INT-OR-CHAR) | (write STRING) | STRING
7645              | (write REG ARRAY)
7646      END := (end)
7647
7648      REG := r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7
7649      ARG := REG | INT-OR-CHAR
7650      OP :=   + | - | * | / | % | & | '|' | ^ | << | >> | <8 | >8 | //
7651              | < | > | == | <= | >= | !=
7652      SELF_OP :=
7653              += | -= | *= | /= | %= | &= | '|=' | ^= | <<= | >>=
7654      ARRAY := '[' INT-OR-CHAR ... ']'
7655      INT-OR-CHAR := INT | CHAR
7656
7657 MACHINE CODE:
7658
7659 The machine code consists of a vector of 32-bit words.
7660 The first such word specifies the start of the EOF section of the code;
7661 this is the code executed to handle any stuff that needs to be done
7662 (e.g. designating back to ASCII and left-to-right mode) after all
7663 other encoded/decoded data has been written out.  This is not used for
7664 charset CCL programs.
7665
7666 REGISTER: 0..7  -- refered by RRR or rrr
7667
7668 OPERATOR BIT FIELD (27-bit): XXXXXXXXXXXXXXX RRR TTTTT
7669         TTTTT (5-bit): operator type
7670         RRR (3-bit): register number
7671         XXXXXXXXXXXXXXXX (15-bit):
7672                 CCCCCCCCCCCCCCC: constant or address
7673                 000000000000rrr: register number
7674
7675 AAAA:   00000 +
7676         00001 -
7677         00010 *
7678         00011 /
7679         00100 %
7680         00101 &
7681         00110 |
7682         00111 ~
7683
7684         01000 <<
7685         01001 >>
7686         01010 <8
7687         01011 >8
7688         01100 //
7689         01101 not used
7690         01110 not used
7691         01111 not used
7692
7693         10000 <
7694         10001 >
7695         10010 ==
7696         10011 <=
7697         10100 >=
7698         10101 !=
7699
7700 OPERATORS:      TTTTT RRR XX..
7701
7702 SetCS:          00000 RRR C...C      RRR = C...C
7703 SetCL:          00001 RRR .....      RRR = c...c
7704                 c.............c
7705 SetR:           00010 RRR ..rrr      RRR = rrr
7706 SetA:           00011 RRR ..rrr      RRR = array[rrr]
7707                 C.............C      size of array = C...C
7708                 c.............c      contents = c...c
7709
7710 Jump:           00100 000 c...c      jump to c...c
7711 JumpCond:       00101 RRR c...c      if (!RRR) jump to c...c
7712 WriteJump:      00110 RRR c...c      Write1 RRR, jump to c...c
7713 WriteReadJump:  00111 RRR c...c      Write1, Read1 RRR, jump to c...c
7714 WriteCJump:     01000 000 c...c      Write1 C...C, jump to c...c
7715                 C...C
7716 WriteCReadJump: 01001 RRR c...c      Write1 C...C, Read1 RRR,
7717                 C.............C      and jump to c...c
7718 WriteSJump:     01010 000 c...c      WriteS, jump to c...c
7719                 C.............C
7720                 S.............S
7721                 ...
7722 WriteSReadJump: 01011 RRR c...c      WriteS, Read1 RRR, jump to c...c
7723                 C.............C
7724                 S.............S
7725                 ...
7726 WriteAReadJump: 01100 RRR c...c      WriteA, Read1 RRR, jump to c...c
7727                 C.............C      size of array = C...C
7728                 c.............c      contents = c...c
7729                 ...
7730 Branch:         01101 RRR C...C      if (RRR >= 0 && RRR < C..)
7731                 c.............c      branch to (RRR+1)th address
7732 Read1:          01110 RRR ...        read 1-byte to RRR
7733 Read2:          01111 RRR ..rrr      read 2-byte to RRR and rrr
7734 ReadBranch:     10000 RRR C...C      Read1 and Branch
7735                 c.............c
7736                 ...
7737 Write1:         10001 RRR .....      write 1-byte RRR
7738 Write2:         10010 RRR ..rrr      write 2-byte RRR and rrr
7739 WriteC:         10011 000 .....      write 1-char C...CC
7740                 C.............C
7741 WriteS:         10100 000 .....      write C..-byte of string
7742                 C.............C
7743                 S.............S
7744                 ...
7745 WriteA:         10101 RRR .....      write array[RRR]
7746                 C.............C      size of array = C...C
7747                 c.............c      contents = c...c
7748                 ...
7749 End:            10110 000 .....      terminate the execution
7750
7751 SetSelfCS:      10111 RRR C...C      RRR AAAAA= C...C
7752                 ..........AAAAA
7753 SetSelfCL:      11000 RRR .....      RRR AAAAA= c...c
7754                 c.............c
7755                 ..........AAAAA
7756 SetSelfR:       11001 RRR ..Rrr      RRR AAAAA= rrr
7757                 ..........AAAAA
7758 SetExprCL:      11010 RRR ..Rrr      RRR = rrr AAAAA c...c
7759                 c.............c
7760                 ..........AAAAA
7761 SetExprR:       11011 RRR ..rrr      RRR = rrr AAAAA Rrr
7762                 ............Rrr
7763                 ..........AAAAA
7764 JumpCondC:      11100 RRR c...c      if !(RRR AAAAA C..) jump to c...c
7765                 C.............C
7766                 ..........AAAAA
7767 JumpCondR:      11101 RRR c...c      if !(RRR AAAAA rrr) jump to c...c
7768                 ............rrr
7769                 ..........AAAAA
7770 ReadJumpCondC:  11110 RRR c...c      Read1 and JumpCondC
7771                 C.............C
7772                 ..........AAAAA
7773 ReadJumpCondR:  11111 RRR c...c      Read1 and JumpCondR
7774                 ............rrr
7775                 ..........AAAAA
7776 @end example
7777
7778 @node The Lisp Reader and Compiler, Lstreams, MULE Character Sets and Encodings, Top
7779 @chapter The Lisp Reader and Compiler
7780
7781 Not yet documented.
7782
7783 @node Lstreams, Consoles; Devices; Frames; Windows, The Lisp Reader and Compiler, Top
7784 @chapter Lstreams
7785
7786   An @dfn{lstream} is an internal Lisp object that provides a generic
7787 buffering stream implementation.  Conceptually, you send data to the
7788 stream or read data from the stream, not caring what's on the other end
7789 of the stream.  The other end could be another stream, a file
7790 descriptor, a stdio stream, a fixed block of memory, a reallocating
7791 block of memory, etc.  The main purpose of the stream is to provide a
7792 standard interface and to do buffering.  Macros are defined to read or
7793 write characters, so the calling functions do not have to worry about
7794 blocking data together in order to achieve efficiency.
7795
7796 @menu
7797 * Creating an Lstream::         Creating an lstream object.
7798 * Lstream Types::               Different sorts of things that are streamed.
7799 * Lstream Functions::           Functions for working with lstreams.
7800 * Lstream Methods::             Creating new lstream types.
7801 @end menu
7802
7803 @node Creating an Lstream, Lstream Types, Lstreams, Lstreams
7804 @section Creating an Lstream
7805
7806 Lstreams come in different types, depending on what is being interfaced
7807 to.  Although the primitive for creating new lstreams is
7808 @code{Lstream_new()}, generally you do not call this directly.  Instead,
7809 you call some type-specific creation function, which creates the lstream
7810 and initializes it as appropriate for the particular type.
7811
7812 All lstream creation functions take a @var{mode} argument, specifying
7813 what mode the lstream should be opened as.  This controls whether the
7814 lstream is for input and output, and optionally whether data should be
7815 blocked up in units of MULE characters.  Note that some types of
7816 lstreams can only be opened for input; others only for output; and
7817 others can be opened either way.  #### Richard Mlynarik thinks that
7818 there should be a strict separation between input and output streams,
7819 and he's probably right.
7820
7821   @var{mode} is a string, one of
7822
7823 @table @code
7824 @item "r"
7825   Open for reading.
7826 @item "w"
7827   Open for writing.
7828 @item "rc"
7829   Open for reading, but ``read'' never returns partial MULE characters.
7830 @item "wc"
7831   Open for writing, but never writes partial MULE characters.
7832 @end table
7833
7834 @node Lstream Types, Lstream Functions, Creating an Lstream, Lstreams
7835 @section Lstream Types
7836
7837 @table @asis
7838 @item stdio
7839
7840 @item filedesc
7841
7842 @item lisp-string
7843
7844 @item fixed-buffer
7845
7846 @item resizing-buffer
7847
7848 @item dynarr
7849
7850 @item lisp-buffer
7851
7852 @item print
7853
7854 @item decoding
7855
7856 @item encoding
7857 @end table
7858
7859 @node Lstream Functions, Lstream Methods, Lstream Types, Lstreams
7860 @section Lstream Functions
7861
7862 @deftypefun {Lstream *} Lstream_new (Lstream_implementation *@var{imp}, const char *@var{mode})
7863 Allocate and return a new Lstream.  This function is not really meant to
7864 be called directly; rather, each stream type should provide its own
7865 stream creation function, which creates the stream and does any other
7866 necessary creation stuff (e.g. opening a file).
7867 @end deftypefun
7868
7869 @deftypefun void Lstream_set_buffering (Lstream *@var{lstr}, Lstream_buffering @var{buffering}, int @var{buffering_size})
7870 Change the buffering of a stream.  See @file{lstream.h}.  By default the
7871 buffering is @code{STREAM_BLOCK_BUFFERED}.
7872 @end deftypefun
7873
7874 @deftypefun int Lstream_flush (Lstream *@var{lstr})
7875 Flush out any pending unwritten data in the stream.  Clear any buffered
7876 input data.  Returns 0 on success, -1 on error.
7877 @end deftypefun
7878
7879 @deftypefn Macro int Lstream_putc (Lstream *@var{stream}, int @var{c})
7880 Write out one byte to the stream.  This is a macro and so it is very
7881 efficient.  The @var{c} argument is only evaluated once but the @var{stream}
7882 argument is evaluated more than once.  Returns 0 on success, -1 on
7883 error.
7884 @end deftypefn
7885
7886 @deftypefn Macro int Lstream_getc (Lstream *@var{stream})
7887 Read one byte from the stream.  This is a macro and so it is very
7888 efficient.  The @var{stream} argument is evaluated more than once.  Return
7889 value is -1 for EOF or error.
7890 @end deftypefn
7891
7892 @deftypefn Macro void Lstream_ungetc (Lstream *@var{stream}, int @var{c})
7893 Push one byte back onto the input queue.  This will be the next byte
7894 read from the stream.  Any number of bytes can be pushed back and will
7895 be read in the reverse order they were pushed back---most recent
7896 first. (This is necessary for consistency---if there are a number of
7897 bytes that have been unread and I read and unread a byte, it needs to be
7898 the first to be read again.) This is a macro and so it is very
7899 efficient.  The @var{c} argument is only evaluated once but the @var{stream}
7900 argument is evaluated more than once.
7901 @end deftypefn
7902
7903 @deftypefun int Lstream_fputc (Lstream *@var{stream}, int @var{c})
7904 @deftypefunx int Lstream_fgetc (Lstream *@var{stream})
7905 @deftypefunx void Lstream_fungetc (Lstream *@var{stream}, int @var{c})
7906 Function equivalents of the above macros.
7907 @end deftypefun
7908
7909 @deftypefun ssize_t Lstream_read (Lstream *@var{stream}, void *@var{data}, size_t @var{size})
7910 Read @var{size} bytes of @var{data} from the stream.  Return the number
7911 of bytes read.  0 means EOF. -1 means an error occurred and no bytes
7912 were read.
7913 @end deftypefun
7914
7915 @deftypefun ssize_t Lstream_write (Lstream *@var{stream}, void *@var{data}, size_t @var{size})
7916 Write @var{size} bytes of @var{data} to the stream.  Return the number
7917 of bytes written.  -1 means an error occurred and no bytes were written.
7918 @end deftypefun
7919
7920 @deftypefun void Lstream_unread (Lstream *@var{stream}, void *@var{data}, size_t @var{size})
7921 Push back @var{size} bytes of @var{data} onto the input queue.  The next
7922 call to @code{Lstream_read()} with the same size will read the same
7923 bytes back.  Note that this will be the case even if there is other
7924 pending unread data.
7925 @end deftypefun
7926
7927 @deftypefun int Lstream_close (Lstream *@var{stream})
7928 Close the stream.  All data will be flushed out.
7929 @end deftypefun
7930
7931 @deftypefun void Lstream_reopen (Lstream *@var{stream})
7932 Reopen a closed stream.  This enables I/O on it again.  This is not
7933 meant to be called except from a wrapper routine that reinitializes
7934 variables and such---the close routine may well have freed some
7935 necessary storage structures, for example.
7936 @end deftypefun
7937
7938 @deftypefun void Lstream_rewind (Lstream *@var{stream})
7939 Rewind the stream to the beginning.
7940 @end deftypefun
7941
7942 @node Lstream Methods,  , Lstream Functions, Lstreams
7943 @section Lstream Methods
7944
7945 @deftypefn {Lstream Method} ssize_t reader (Lstream *@var{stream}, unsigned char *@var{data}, size_t @var{size})
7946 Read some data from the stream's end and store it into @var{data}, which
7947 can hold @var{size} bytes.  Return the number of bytes read.  A return
7948 value of 0 means no bytes can be read at this time.  This may be because
7949 of an EOF, or because there is a granularity greater than one byte that
7950 the stream imposes on the returned data, and @var{size} is less than
7951 this granularity. (This will happen frequently for streams that need to
7952 return whole characters, because @code{Lstream_read()} calls the reader
7953 function repeatedly until it has the number of bytes it wants or until 0
7954 is returned.)  The lstream functions do not treat a 0 return as EOF or
7955 do anything special; however, the calling function will interpret any 0
7956 it gets back as EOF.  This will normally not happen unless the caller
7957 calls @code{Lstream_read()} with a very small size.
7958
7959 This function can be @code{NULL} if the stream is output-only.
7960 @end deftypefn
7961
7962 @deftypefn {Lstream Method} ssize_t writer (Lstream *@var{stream}, const unsigned char *@var{data}, size_t @var{size})
7963 Send some data to the stream's end.  Data to be sent is in @var{data}
7964 and is @var{size} bytes.  Return the number of bytes sent.  This
7965 function can send and return fewer bytes than is passed in; in that
7966 case, the function will just be called again until there is no data left
7967 or 0 is returned.  A return value of 0 means that no more data can be
7968 currently stored, but there is no error; the data will be squirreled
7969 away until the writer can accept data. (This is useful, e.g., if you're
7970 dealing with a non-blocking file descriptor and are getting
7971 @code{EWOULDBLOCK} errors.)  This function can be @code{NULL} if the
7972 stream is input-only.
7973 @end deftypefn
7974
7975 @deftypefn {Lstream Method} int rewinder (Lstream *@var{stream})
7976 Rewind the stream.  If this is @code{NULL}, the stream is not seekable.
7977 @end deftypefn
7978
7979 @deftypefn {Lstream Method} int seekable_p (Lstream *@var{stream})
7980 Indicate whether this stream is seekable---i.e. it can be rewound.
7981 This method is ignored if the stream does not have a rewind method.  If
7982 this method is not present, the result is determined by whether a rewind
7983 method is present.
7984 @end deftypefn
7985
7986 @deftypefn {Lstream Method} int flusher (Lstream *@var{stream})
7987 Perform any additional operations necessary to flush the data in this
7988 stream.
7989 @end deftypefn
7990
7991 @deftypefn {Lstream Method} int pseudo_closer (Lstream *@var{stream})
7992 @end deftypefn
7993
7994 @deftypefn {Lstream Method} int closer (Lstream *@var{stream})
7995 Perform any additional operations necessary to close this stream down.
7996 May be @code{NULL}.  This function is called when @code{Lstream_close()}
7997 is called or when the stream is garbage-collected.  When this function
7998 is called, all pending data in the stream will already have been written
7999 out.
8000 @end deftypefn
8001
8002 @deftypefn {Lstream Method} Lisp_Object marker (Lisp_Object @var{lstream}, void (*@var{markfun}) (Lisp_Object))
8003 Mark this object for garbage collection.  Same semantics as a standard
8004 @code{Lisp_Object} marker.  This function can be @code{NULL}.
8005 @end deftypefn
8006
8007 @node Consoles; Devices; Frames; Windows, The Redisplay Mechanism, Lstreams, Top
8008 @chapter Consoles; Devices; Frames; Windows
8009
8010 @menu
8011 * Introduction to Consoles; Devices; Frames; Windows::
8012 * Point::
8013 * Window Hierarchy::
8014 * The Window Object::
8015 @end menu
8016
8017 @node Introduction to Consoles; Devices; Frames; Windows, Point, Consoles; Devices; Frames; Windows, Consoles; Devices; Frames; Windows
8018 @section Introduction to Consoles; Devices; Frames; Windows
8019
8020 A window-system window that you see on the screen is called a
8021 @dfn{frame} in Emacs terminology.  Each frame is subdivided into one or
8022 more non-overlapping panes, called (confusingly) @dfn{windows}.  Each
8023 window displays the text of a buffer in it. (See above on Buffers.) Note
8024 that buffers and windows are independent entities: Two or more windows
8025 can be displaying the same buffer (potentially in different locations),
8026 and a buffer can be displayed in no windows.
8027
8028   A single display screen that contains one or more frames is called
8029 a @dfn{display}.  Under most circumstances, there is only one display.
8030 However, more than one display can exist, for example if you have
8031 a @dfn{multi-headed} console, i.e. one with a single keyboard but
8032 multiple displays. (Typically in such a situation, the various
8033 displays act like one large display, in that the mouse is only
8034 in one of them at a time, and moving the mouse off of one moves
8035 it into another.) In some cases, the different displays will
8036 have different characteristics, e.g. one color and one mono.
8037
8038   XEmacs can display frames on multiple displays.  It can even deal
8039 simultaneously with frames on multiple keyboards (called @dfn{consoles} in
8040 XEmacs terminology).  Here is one case where this might be useful: You
8041 are using XEmacs on your workstation at work, and leave it running.
8042 Then you go home and dial in on a TTY line, and you can use the
8043 already-running XEmacs process to display another frame on your local
8044 TTY.
8045
8046   Thus, there is a hierarchy console -> display -> frame -> window.
8047 There is a separate Lisp object type for each of these four concepts.
8048 Furthermore, there is logically a @dfn{selected console},
8049 @dfn{selected display}, @dfn{selected frame}, and @dfn{selected window}.
8050 Each of these objects is distinguished in various ways, such as being the
8051 default object for various functions that act on objects of that type.
8052 Note that every containing object rememembers the ``selected'' object
8053 among the objects that it contains: e.g. not only is there a selected
8054 window, but every frame remembers the last window in it that was
8055 selected, and changing the selected frame causes the remembered window
8056 within it to become the selected window.  Similar relationships apply
8057 for consoles to devices and devices to frames.
8058
8059 @node Point, Window Hierarchy, Introduction to Consoles; Devices; Frames; Windows, Consoles; Devices; Frames; Windows
8060 @section Point
8061
8062   Recall that every buffer has a current insertion position, called
8063 @dfn{point}.  Now, two or more windows may be displaying the same buffer,
8064 and the text cursor in the two windows (i.e. @code{point}) can be in
8065 two different places.  You may ask, how can that be, since each
8066 buffer has only one value of @code{point}?  The answer is that each window
8067 also has a value of @code{point} that is squirreled away in it.  There
8068 is only one selected window, and the value of ``point'' in that buffer
8069 corresponds to that window.  When the selected window is changed
8070 from one window to another displaying the same buffer, the old
8071 value of @code{point} is stored into the old window's ``point'' and the
8072 value of @code{point} from the new window is retrieved and made the
8073 value of @code{point} in the buffer.  This means that @code{window-point}
8074 for the selected window is potentially inaccurate, and if you
8075 want to retrieve the correct value of @code{point} for a window,
8076 you must special-case on the selected window and retrieve the
8077 buffer's point instead.  This is related to why @code{save-window-excursion}
8078 does not save the selected window's value of @code{point}.
8079
8080 @node Window Hierarchy, The Window Object, Point, Consoles; Devices; Frames; Windows
8081 @section Window Hierarchy
8082 @cindex window hierarchy
8083 @cindex hierarchy of windows
8084
8085   If a frame contains multiple windows (panes), they are always created
8086 by splitting an existing window along the horizontal or vertical axis.
8087 Terminology is a bit confusing here: to @dfn{split a window
8088 horizontally} means to create two side-by-side windows, i.e. to make a
8089 @emph{vertical} cut in a window.  Likewise, to @dfn{split a window
8090 vertically} means to create two windows, one above the other, by making
8091 a @emph{horizontal} cut.
8092
8093   If you split a window and then split again along the same axis, you
8094 will end up with a number of panes all arranged along the same axis.
8095 The precise way in which the splits were made should not be important,
8096 and this is reflected internally.  Internally, all windows are arranged
8097 in a tree, consisting of two types of windows, @dfn{combination} windows
8098 (which have children, and are covered completely by those children) and
8099 @dfn{leaf} windows, which have no children and are visible.  Every
8100 combination window has two or more children, all arranged along the same
8101 axis.  There are (logically) two subtypes of windows, depending on
8102 whether their children are horizontally or vertically arrayed.  There is
8103 always one root window, which is either a leaf window (if the frame
8104 contains only one window) or a combination window (if the frame contains
8105 more than one window).  In the latter case, the root window will have
8106 two or more children, either horizontally or vertically arrayed, and
8107 each of those children will be either a leaf window or another
8108 combination window.
8109
8110   Here are some rules:
8111
8112 @enumerate
8113 @item
8114 Horizontal combination windows can never have children that are
8115 horizontal combination windows; same for vertical.
8116
8117 @item
8118 Only leaf windows can be split (obviously) and this splitting does one
8119 of two things: (a) turns the leaf window into a combination window and
8120 creates two new leaf children, or (b) turns the leaf window into one of
8121 the two new leaves and creates the other leaf.  Rule (1) dictates which
8122 of these two outcomes happens.
8123
8124 @item
8125 Every combination window must have at least two children.
8126
8127 @item
8128 Leaf windows can never become combination windows.  They can be deleted,
8129 however.  If this results in a violation of (3), the parent combination
8130 window also gets deleted.
8131
8132 @item
8133 All functions that accept windows must be prepared to accept combination
8134 windows, and do something sane (e.g. signal an error if so).
8135 Combination windows @emph{do} escape to the Lisp level.
8136
8137 @item
8138 All windows have three fields governing their contents:
8139 these are @dfn{hchild} (a list of horizontally-arrayed children),
8140 @dfn{vchild} (a list of vertically-arrayed children), and @dfn{buffer}
8141 (the buffer contained in a leaf window).  Exactly one of
8142 these will be non-nil.  Remember that @dfn{horizontally-arrayed}
8143 means ``side-by-side'' and @dfn{vertically-arrayed} means
8144 @dfn{one above the other}.
8145
8146 @item
8147 Leaf windows also have markers in their @code{start} (the
8148 first buffer position displayed in the window) and @code{pointm}
8149 (the window's stashed value of @code{point}---see above) fields,
8150 while combination windows have nil in these fields.
8151
8152 @item
8153 The list of children for a window is threaded through the
8154 @code{next} and @code{prev} fields of each child window.
8155
8156 @item
8157 @strong{Deleted windows can be undeleted}.  This happens as a result of
8158 restoring a window configuration, and is unlike frames, displays, and
8159 consoles, which, once deleted, can never be restored.  Deleting a window
8160 does nothing except set a special @code{dead} bit to 1 and clear out the
8161 @code{next}, @code{prev}, @code{hchild}, and @code{vchild} fields, for
8162 GC purposes.
8163
8164 @item
8165 Most frames actually have two top-level windows---one for the
8166 minibuffer and one (the @dfn{root}) for everything else.  The modeline
8167 (if present) separates these two.  The @code{next} field of the root
8168 points to the minibuffer, and the @code{prev} field of the minibuffer
8169 points to the root.  The other @code{next} and @code{prev} fields are
8170 @code{nil}, and the frame points to both of these windows.
8171 Minibuffer-less frames have no minibuffer window, and the @code{next}
8172 and @code{prev} of the root window are @code{nil}.  Minibuffer-only
8173 frames have no root window, and the @code{next} of the minibuffer window
8174 is @code{nil} but the @code{prev} points to itself. (#### This is an
8175 artifact that should be fixed.)
8176 @end enumerate
8177
8178 @node The Window Object,  , Window Hierarchy, Consoles; Devices; Frames; Windows
8179 @section The Window Object
8180
8181   Windows have the following accessible fields:
8182
8183 @table @code
8184 @item frame
8185 The frame that this window is on.
8186
8187 @item mini_p
8188 Non-@code{nil} if this window is a minibuffer window.
8189
8190 @item buffer
8191 The buffer that the window is displaying.  This may change often during
8192 the life of the window.
8193
8194 @item dedicated
8195 Non-@code{nil} if this window is dedicated to its buffer.
8196
8197 @item pointm
8198 @cindex window point internals
8199 This is the value of point in the current buffer when this window is
8200 selected; when it is not selected, it retains its previous value.
8201
8202 @item start
8203 The position in the buffer that is the first character to be displayed
8204 in the window.
8205
8206 @item force_start
8207 If this flag is non-@code{nil}, it says that the window has been
8208 scrolled explicitly by the Lisp program.  This affects what the next
8209 redisplay does if point is off the screen: instead of scrolling the
8210 window to show the text around point, it moves point to a location that
8211 is on the screen.
8212
8213 @item last_modified
8214 The @code{modified} field of the window's buffer, as of the last time
8215 a redisplay completed in this window.
8216
8217 @item last_point
8218 The buffer's value of point, as of the last time
8219 a redisplay completed in this window.
8220
8221 @item left
8222 This is the left-hand edge of the window, measured in columns.  (The
8223 leftmost column on the screen is @w{column 0}.)
8224
8225 @item top
8226 This is the top edge of the window, measured in lines.  (The top line on
8227 the screen is @w{line 0}.)
8228
8229 @item height
8230 The height of the window, measured in lines.
8231
8232 @item width
8233 The width of the window, measured in columns.
8234
8235 @item next
8236 This is the window that is the next in the chain of siblings.  It is
8237 @code{nil} in a window that is the rightmost or bottommost of a group of
8238 siblings.
8239
8240 @item prev
8241 This is the window that is the previous in the chain of siblings.  It is
8242 @code{nil} in a window that is the leftmost or topmost of a group of
8243 siblings.
8244
8245 @item parent
8246 Internally, XEmacs arranges windows in a tree; each group of siblings has
8247 a parent window whose area includes all the siblings.  This field points
8248 to a window's parent.
8249
8250 Parent windows do not display buffers, and play little role in display
8251 except to shape their child windows.  Emacs Lisp programs usually have
8252 no access to the parent windows; they operate on the windows at the
8253 leaves of the tree, which actually display buffers.
8254
8255 @item hscroll
8256 This is the number of columns that the display in the window is scrolled
8257 horizontally to the left.  Normally, this is 0.
8258
8259 @item use_time
8260 This is the last time that the window was selected.  The function
8261 @code{get-lru-window} uses this field.
8262
8263 @item display_table
8264 The window's display table, or @code{nil} if none is specified for it.
8265
8266 @item update_mode_line
8267 Non-@code{nil} means this window's mode line needs to be updated.
8268
8269 @item base_line_number
8270 The line number of a certain position in the buffer, or @code{nil}.
8271 This is used for displaying the line number of point in the mode line.
8272
8273 @item base_line_pos
8274 The position in the buffer for which the line number is known, or
8275 @code{nil} meaning none is known.
8276
8277 @item region_showing
8278 If the region (or part of it) is highlighted in this window, this field
8279 holds the mark position that made one end of that region.  Otherwise,
8280 this field is @code{nil}.
8281 @end table
8282
8283 @node The Redisplay Mechanism, Extents, Consoles; Devices; Frames; Windows, Top
8284 @chapter The Redisplay Mechanism
8285
8286   The redisplay mechanism is one of the most complicated sections of
8287 XEmacs, especially from a conceptual standpoint.  This is doubly so
8288 because, unlike for the basic aspects of the Lisp interpreter, the
8289 computer science theories of how to efficiently handle redisplay are not
8290 well-developed.
8291
8292   When working with the redisplay mechanism, remember the Golden Rules
8293 of Redisplay:
8294
8295 @enumerate
8296 @item
8297 It Is Better To Be Correct Than Fast.
8298 @item
8299 Thou Shalt Not Run Elisp From Within Redisplay.
8300 @item
8301 It Is Better To Be Fast Than Not To Be.
8302 @end enumerate
8303
8304 @menu
8305 * Critical Redisplay Sections::
8306 * Line Start Cache::
8307 * Redisplay Piece by Piece::
8308 @end menu
8309
8310 @node Critical Redisplay Sections, Line Start Cache, The Redisplay Mechanism, The Redisplay Mechanism
8311 @section Critical Redisplay Sections
8312 @cindex critical redisplay sections
8313
8314 Within this section, we are defenseless and assume that the
8315 following cannot happen:
8316
8317 @enumerate
8318 @item
8319 garbage collection
8320 @item
8321 Lisp code evaluation
8322 @item
8323 frame size changes
8324 @end enumerate
8325
8326 We ensure (3) by calling @code{hold_frame_size_changes()}, which
8327 will cause any pending frame size changes to get put on hold
8328 till after the end of the critical section.  (1) follows
8329 automatically if (2) is met.  #### Unfortunately, there are
8330 some places where Lisp code can be called within this section.
8331 We need to remove them.
8332
8333 If @code{Fsignal()} is called during this critical section, we
8334 will @code{abort()}.
8335
8336 If garbage collection is called during this critical section,
8337 we simply return. #### We should abort instead.
8338
8339 #### If a frame-size change does occur we should probably
8340 actually be preempting redisplay.
8341
8342 @node Line Start Cache, Redisplay Piece by Piece, Critical Redisplay Sections, The Redisplay Mechanism
8343 @section Line Start Cache
8344 @cindex line start cache
8345
8346   The traditional scrolling code in Emacs breaks in a variable height
8347 world.  It depends on the key assumption that the number of lines that
8348 can be displayed at any given time is fixed.  This led to a complete
8349 separation of the scrolling code from the redisplay code.  In order to
8350 fully support variable height lines, the scrolling code must actually be
8351 tightly integrated with redisplay.  Only redisplay can determine how
8352 many lines will be displayed on a screen for any given starting point.
8353
8354   What is ideally wanted is a complete list of the starting buffer
8355 position for every possible display line of a buffer along with the
8356 height of that display line.  Maintaining such a full list would be very
8357 expensive.  We settle for having it include information for all areas
8358 which we happen to generate anyhow (i.e. the region currently being
8359 displayed) and for those areas we need to work with.
8360
8361   In order to ensure that the cache accurately represents what redisplay
8362 would actually show, it is necessary to invalidate it in many
8363 situations.  If the buffer changes, the starting positions may no longer
8364 be correct.  If a face or an extent has changed then the line heights
8365 may have altered.  These events happen frequently enough that the cache
8366 can end up being constantly disabled.  With this potentially constant
8367 invalidation when is the cache ever useful?
8368
8369   Even if the cache is invalidated before every single usage, it is
8370 necessary.  Scrolling often requires knowledge about display lines which
8371 are actually above or below the visible region.  The cache provides a
8372 convenient light-weight method of storing this information for multiple
8373 display regions.  This knowledge is necessary for the scrolling code to
8374 always obey the First Golden Rule of Redisplay.
8375
8376   If the cache already contains all of the information that the scrolling
8377 routines happen to need so that it doesn't have to go generate it, then
8378 we are able to obey the Third Golden Rule of Redisplay.  The first thing
8379 we do to help out the cache is to always add the displayed region.  This
8380 region had to be generated anyway, so the cache ends up getting the
8381 information basically for free.  In those cases where a user is simply
8382 scrolling around viewing a buffer there is a high probability that this
8383 is sufficient to always provide the needed information.  The second
8384 thing we can do is be smart about invalidating the cache.
8385
8386   TODO---Be smart about invalidating the cache.  Potential places:
8387
8388 @itemize @bullet
8389 @item
8390 Insertions at end-of-line which don't cause line-wraps do not alter the
8391 starting positions of any display lines.  These types of buffer
8392 modifications should not invalidate the cache.  This is actually a large
8393 optimization for redisplay speed as well.
8394 @item
8395 Buffer modifications frequently only affect the display of lines at and
8396 below where they occur.  In these situations we should only invalidate
8397 the part of the cache starting at where the modification occurs.
8398 @end itemize
8399
8400   In case you're wondering, the Second Golden Rule of Redisplay is not
8401 applicable.
8402
8403 @node Redisplay Piece by Piece,  , Line Start Cache, The Redisplay Mechanism
8404 @section Redisplay Piece by Piece
8405 @cindex Redisplay Piece by Piece
8406
8407 As you can begin to see redisplay is complex and also not well
8408 documented. Chuck no longer works on XEmacs so this section is my take
8409 on the workings of redisplay.
8410
8411 Redisplay happens in three phases:
8412
8413 @enumerate
8414 @item
8415 Determine desired display in area that needs redisplay.
8416 Implemented by @code{redisplay.c}
8417 @item
8418 Compare desired display with current display
8419 Implemented by @code{redisplay-output.c}
8420 @item
8421 Output changes Implemented by @code{redisplay-output.c},
8422 @code{redisplay-x.c}, @code{redisplay-msw.c} and @code{redisplay-tty.c}
8423 @end enumerate
8424
8425 Steps 1 and 2 are device-independant and relatively complex.  Step 3 is
8426 mostly device-dependent.
8427
8428 Determining the desired display
8429
8430 Display attributes are stored in @code{display_line} structures. Each
8431 @code{display_line} consists of a set of @code{display_block}'s and each
8432 @code{display_block} contains a number of @code{rune}'s. Generally
8433 dynarr's of @code{display_line}'s are held by each window representing
8434 the current display and the desired display.
8435
8436 The @code{display_line} structures are tighly tied to buffers which
8437 presents a problem for redisplay as this connection is bogus for the
8438 modeline. Hence the @code{display_line} generation routines are
8439 duplicated for generating the modeline. This means that the modeline
8440 display code has many bugs that the standard redisplay code does not.
8441
8442 The guts of @code{display_line} generation are in
8443 @code{create_text_block}, which creates a single display line for the
8444 desired locale. This incrementally parses the characters on the current
8445 line and generates redisplay structures for each.
8446
8447 Gutter redisplay is different. Because the data to display is stored in
8448 a string we cannot use @code{create_text_block}. Instead we use
8449 @code{create_text_string_block} which performs the same function as
8450 @code{create_text_block} but for strings. Many of the complexities of
8451 @code{create_text_block} to do with cursor handling and selective
8452 display have been removed.
8453
8454 @node Extents, Faces, The Redisplay Mechanism, Top
8455 @chapter Extents
8456
8457 @menu
8458 * Introduction to Extents::     Extents are ranges over text, with properties.
8459 * Extent Ordering::             How extents are ordered internally.
8460 * Format of the Extent Info::   The extent information in a buffer or string.
8461 * Zero-Length Extents::         A weird special case.
8462 * Mathematics of Extent Ordering::  A rigorous foundation.
8463 * Extent Fragments::            Cached information useful for redisplay.
8464 @end menu
8465
8466 @node Introduction to Extents, Extent Ordering, Extents, Extents
8467 @section Introduction to Extents
8468
8469   Extents are regions over a buffer, with a start and an end position
8470 denoting the region of the buffer included in the extent.  In
8471 addition, either end can be closed or open, meaning that the endpoint
8472 is or is not logically included in the extent.  Insertion of a character
8473 at a closed endpoint causes the character to go inside the extent;
8474 insertion at an open endpoint causes the character to go outside.
8475
8476   Extent endpoints are stored using memory indices (see @file{insdel.c}),
8477 to minimize the amount of adjusting that needs to be done when
8478 characters are inserted or deleted.
8479
8480   (Formerly, extent endpoints at the gap could be either before or
8481 after the gap, depending on the open/closedness of the endpoint.
8482 The intent of this was to make it so that insertions would
8483 automatically go inside or out of extents as necessary with no
8484 further work needing to be done.  It didn't work out that way,
8485 however, and just ended up complexifying and buggifying all the
8486 rest of the code.)
8487
8488 @node Extent Ordering, Format of the Extent Info, Introduction to Extents, Extents
8489 @section Extent Ordering
8490
8491   Extents are compared using memory indices.  There are two orderings
8492 for extents and both orders are kept current at all times.  The normal
8493 or @dfn{display} order is as follows:
8494
8495 @example
8496 Extent A is ``less than'' extent B,
8497 that is, earlier in the display order,
8498   if:    A-start < B-start,
8499   or if: A-start = B-start, and A-end > B-end
8500 @end example
8501
8502   So if two extents begin at the same position, the larger of them is the
8503 earlier one in the display order (@code{EXTENT_LESS} is true).
8504
8505   For the e-order, the same thing holds:
8506
8507 @example
8508 Extent A is ``less than'' extent B in e-order,
8509 that is, later in the buffer,
8510   if:    A-end < B-end,
8511   or if: A-end = B-end, and A-start > B-start
8512 @end example
8513
8514   So if two extents end at the same position, the smaller of them is the
8515 earlier one in the e-order (@code{EXTENT_E_LESS} is true).
8516
8517   The display order and the e-order are complementary orders: any
8518 theorem about the display order also applies to the e-order if you swap
8519 all occurrences of ``display order'' and ``e-order'', ``less than'' and
8520 ``greater than'', and ``extent start'' and ``extent end''.
8521
8522 @node Format of the Extent Info, Zero-Length Extents, Extent Ordering, Extents
8523 @section Format of the Extent Info
8524
8525   An extent-info structure consists of a list of the buffer or string's
8526 extents and a @dfn{stack of extents} that lists all of the extents over
8527 a particular position.  The stack-of-extents info is used for
8528 optimization purposes---it basically caches some info that might
8529 be expensive to compute.  Certain otherwise hard computations are easy
8530 given the stack of extents over a particular position, and if the
8531 stack of extents over a nearby position is known (because it was
8532 calculated at some prior point in time), it's easy to move the stack
8533 of extents to the proper position.
8534
8535   Given that the stack of extents is an optimization, and given that
8536 it requires memory, a string's stack of extents is wiped out each
8537 time a garbage collection occurs.  Therefore, any time you retrieve
8538 the stack of extents, it might not be there.  If you need it to
8539 be there, use the @code{_force} version.
8540
8541   Similarly, a string may or may not have an extent_info structure.
8542 (Generally it won't if there haven't been any extents added to the
8543 string.) So use the @code{_force} version if you need the extent_info
8544 structure to be there.
8545
8546   A list of extents is maintained as a double gap array: one gap array
8547 is ordered by start index (the @dfn{display order}) and the other is
8548 ordered by end index (the @dfn{e-order}).  Note that positions in an
8549 extent list should logically be conceived of as referring @emph{to} a
8550 particular extent (as is the norm in programs) rather than sitting
8551 between two extents.  Note also that callers of these functions should
8552 not be aware of the fact that the extent list is implemented as an
8553 array, except for the fact that positions are integers (this should be
8554 generalized to handle integers and linked list equally well).
8555
8556 @node Zero-Length Extents, Mathematics of Extent Ordering, Format of the Extent Info, Extents
8557 @section Zero-Length Extents
8558
8559   Extents can be zero-length, and will end up that way if their endpoints
8560 are explicitly set that way or if their detachable property is nil
8561 and all the text in the extent is deleted. (The exception is open-open
8562 zero-length extents, which are barred from existing because there is
8563 no sensible way to define their properties.  Deletion of the text in
8564 an open-open extent causes it to be converted into a closed-open
8565 extent.)  Zero-length extents are primarily used to represent
8566 annotations, and behave as follows:
8567
8568 @enumerate
8569 @item
8570 Insertion at the position of a zero-length extent expands the extent
8571 if both endpoints are closed; goes after the extent if it is closed-open;
8572 and goes before the extent if it is open-closed.
8573
8574 @item
8575 Deletion of a character on a side of a zero-length extent whose
8576 corresponding endpoint is closed causes the extent to be detached if
8577 it is detachable; if the extent is not detachable or the corresponding
8578 endpoint is open, the extent remains in the buffer, moving as necessary.
8579 @end enumerate
8580
8581   Note that closed-open, non-detachable zero-length extents behave
8582 exactly like markers and that open-closed, non-detachable zero-length
8583 extents behave like the ``point-type'' marker in Mule.
8584
8585 @node Mathematics of Extent Ordering, Extent Fragments, Zero-Length Extents, Extents
8586 @section Mathematics of Extent Ordering
8587 @cindex extent mathematics
8588 @cindex mathematics of extents
8589 @cindex extent ordering
8590
8591 @cindex display order of extents
8592 @cindex extents, display order
8593   The extents in a buffer are ordered by ``display order'' because that
8594 is that order that the redisplay mechanism needs to process them in.
8595 The e-order is an auxiliary ordering used to facilitate operations
8596 over extents.  The operations that can be performed on the ordered
8597 list of extents in a buffer are
8598
8599 @enumerate
8600 @item
8601 Locate where an extent would go if inserted into the list.
8602 @item
8603 Insert an extent into the list.
8604 @item
8605 Remove an extent from the list.
8606 @item
8607 Map over all the extents that overlap a range.
8608 @end enumerate
8609
8610   (4) requires being able to determine the first and last extents
8611 that overlap a range.
8612
8613   NOTE: @dfn{overlap} is used as follows:
8614
8615 @itemize @bullet
8616 @item
8617 two ranges overlap if they have at least one point in common.
8618 Whether the endpoints are open or closed makes a difference here.
8619 @item
8620 a point overlaps a range if the point is contained within the
8621 range; this is equivalent to treating a point @math{P} as the range
8622 @math{[P, P]}.
8623 @item
8624 In the case of an @emph{extent} overlapping a point or range, the extent
8625 is normally treated as having closed endpoints.  This applies
8626 consistently in the discussion of stacks of extents and such below.
8627 Note that this definition of overlap is not necessarily consistent with
8628 the extents that @code{map-extents} maps over, since @code{map-extents}
8629 sometimes pays attention to whether the endpoints of an extents are open
8630 or closed.  But for our purposes, it greatly simplifies things to treat
8631 all extents as having closed endpoints.
8632 @end itemize
8633
8634 First, define @math{>}, @math{<}, @math{<=}, etc. as applied to extents
8635 to mean comparison according to the display order.  Comparison between
8636 an extent @math{E} and an index @math{I} means comparison between
8637 @math{E} and the range @math{[I, I]}.
8638
8639 Also define @math{e>}, @math{e<}, @math{e<=}, etc. to mean comparison
8640 according to the e-order.
8641
8642 For any range @math{R}, define @math{R(0)} to be the starting index of
8643 the range and @math{R(1)} to be the ending index of the range.
8644
8645 For any extent @math{E}, define @math{E(next)} to be the extent directly
8646 following @math{E}, and @math{E(prev)} to be the extent directly
8647 preceding @math{E}.  Assume @math{E(next)} and @math{E(prev)} can be
8648 determined from @math{E} in constant time.  (This is because we store
8649 the extent list as a doubly linked list.)
8650
8651 Similarly, define @math{E(e-next)} and @math{E(e-prev)} to be the
8652 extents directly following and preceding @math{E} in the e-order.
8653
8654 Now:
8655
8656 Let @math{R} be a range.
8657 Let @math{F} be the first extent overlapping @math{R}.
8658 Let @math{L} be the last extent overlapping @math{R}.
8659
8660 Theorem 1: @math{R(1)} lies between @math{L} and @math{L(next)},
8661 i.e. @math{L <= R(1) < L(next)}.
8662
8663   This follows easily from the definition of display order.  The
8664 basic reason that this theorem applies is that the display order
8665 sorts by increasing starting index.
8666
8667   Therefore, we can determine @math{L} just by looking at where we would
8668 insert @math{R(1)} into the list, and if we know @math{F} and are moving
8669 forward over extents, we can easily determine when we've hit @math{L} by
8670 comparing the extent we're at to @math{R(1)}.
8671
8672 @example
8673 Theorem 2: @math{F(e-prev) e< [1, R(0)] e<= F}.
8674 @end example
8675
8676   This is the analog of Theorem 1, and applies because the e-order
8677 sorts by increasing ending index.
8678
8679   Therefore, @math{F} can be found in the same amount of time as
8680 operation (1), i.e. the time that it takes to locate where an extent
8681 would go if inserted into the e-order list.
8682
8683   If the lists were stored as balanced binary trees, then operation (1)
8684 would take logarithmic time, which is usually quite fast.  However,
8685 currently they're stored as simple doubly-linked lists, and instead we
8686 do some caching to try to speed things up.
8687
8688   Define a @dfn{stack of extents} (or @dfn{SOE}) as the set of extents
8689 (ordered in the display order) that overlap an index @math{I}, together
8690 with the SOE's @dfn{previous} extent, which is an extent that precedes
8691 @math{I} in the e-order. (Hopefully there will not be very many extents
8692 between @math{I} and the previous extent.)
8693
8694 Now:
8695
8696 Let @math{I} be an index, let @math{S} be the stack of extents on
8697 @math{I}, let @math{F} be the first extent in @math{S}, and let @math{P}
8698 be @math{S}'s previous extent.
8699
8700 Theorem 3: The first extent in @math{S} is the first extent that overlaps
8701 any range @math{[I, J]}.
8702
8703 Proof: Any extent that overlaps @math{[I, J]} but does not include
8704 @math{I} must have a start index @math{> I}, and thus be greater than
8705 any extent in @math{S}.
8706
8707 Therefore, finding the first extent that overlaps a range @math{R} is
8708 the same as finding the first extent that overlaps @math{R(0)}.
8709
8710 Theorem 4: Let @math{I2} be an index such that @math{I2 > I}, and let
8711 @math{F2} be the first extent that overlaps @math{I2}.  Then, either
8712 @math{F2} is in @math{S} or @math{F2} is greater than any extent in
8713 @math{S}.
8714
8715 Proof: If @math{F2} does not include @math{I} then its start index is
8716 greater than @math{I} and thus it is greater than any extent in
8717 @math{S}, including @math{F}.  Otherwise, @math{F2} includes @math{I}
8718 and thus is in @math{S}, and thus @math{F2 >= F}.
8719
8720 @node Extent Fragments,  , Mathematics of Extent Ordering, Extents
8721 @section Extent Fragments
8722 @cindex extent fragment
8723
8724   Imagine that the buffer is divided up into contiguous, non-overlapping
8725 @dfn{runs} of text such that no extent starts or ends within a run
8726 (extents that abut the run don't count).
8727
8728   An extent fragment is a structure that holds data about the run that
8729 contains a particular buffer position (if the buffer position is at the
8730 junction of two runs, the run after the position is used)---the
8731 beginning and end of the run, a list of all of the extents in that run,
8732 the @dfn{merged face} that results from merging all of the faces
8733 corresponding to those extents, the begin and end glyphs at the
8734 beginning of the run, etc.  This is the information that redisplay needs
8735 in order to display this run.
8736
8737   Extent fragments have to be very quick to update to a new buffer
8738 position when moving linearly through the buffer.  They rely on the
8739 stack-of-extents code, which does the heavy-duty algorithmic work of
8740 determining which extents overly a particular position.
8741
8742 @node Faces, Glyphs, Extents, Top
8743 @chapter Faces
8744
8745 Not yet documented.
8746
8747 @node Glyphs, Specifiers, Faces, Top
8748 @chapter Glyphs
8749
8750 Glyphs are graphical elements that can be displayed in XEmacs buffers or
8751 gutters. We use the term graphical element here in the broadest possible
8752 sense since glyphs can be as mundane as text to as arcane as a native
8753 tab widget.
8754
8755 In XEmacs, glyphs represent the uninstantiated state of graphical
8756 elements, i.e. they hold all the information necessary to produce an
8757 image on-screen but the image does not exist at this stage.
8758
8759 Glyphs are lazily instantiated by calling one of the glyph
8760 functions. This usually occurs within redisplay when
8761 @code{Fglyph_height} is called. Instantiation causes an image-instance
8762 to be created and cached. This cache is on a device basis for all glyphs
8763 except glyph-widgets, and on a window basis for glyph widgets.  The
8764 caching is done by @code{image_instantiate} and is necessary because it
8765 is generally possible to display an image-instance in multiple
8766 domains. For instance if we create a Pixmap, we can actually display
8767 this on multiple windows - even though we only need a single Pixmap
8768 instance to do this. If caching wasn't done then it would be necessary
8769 to create image-instances for every displayable occurrance of a glyph -
8770 and every usage - and this would be extremely memory and cpu intensive.
8771
8772 Widget-glyphs (a.k.a native widgets) are not cached in this way. This is
8773 because widget-glyph image-instances on screen are toolkit windows, and
8774 thus cannot be reused in multiple XEmacs domains. Thus widget-glyphs are
8775 cached on a window basis.
8776
8777 Any action on a glyph first consults the cache before actually
8778 instantiating a widget.
8779
8780 @section Widget-Glyphs in the MS-Windows Environment
8781
8782 To Do
8783
8784 @section Widget-Glyphs in the X Environment
8785
8786 Widget-glyphs under X make heavy use of lwlib for manipulating the
8787 native toolkit objects. This is primarily so that different toolkits can
8788 be supported for widget-glyphs, just as they are supported for features
8789 such as menubars etc.
8790
8791 Lwlib is extremely poorly documented and quite hairy so here is my
8792 understanding of what goes on.
8793
8794 Lwlib maintains a set of widget_instances which mirror the hierarchical
8795 state of Xt widgets. I think this is so that widgets can be updated and
8796 manipulated generically by the lwlib library. For instance
8797 update_one_widget_instance can cope with multiple types of widget and
8798 multiple types of toolkit. Each element in the widget hierarchy is updated
8799 from its corresponding widget_instance by walking the widget_instance
8800 tree recursively.
8801
8802 This has desirable properties such as lw_modify_all_widgets which is
8803 called from glyphs-x.c and updates all the properties of a widget
8804 without having to know what the widget is or what toolkit it is from.
8805 Unfortunately this also has hairy properrties such as making the lwlib
8806 code quite complex. And of course lwlib has to know at some level what
8807 the widget is and how to set its properties.
8808
8809 @node Specifiers, Menus, Glyphs, Top
8810 @chapter Specifiers
8811
8812 Not yet documented.
8813
8814 @node Menus, Subprocesses, Specifiers, Top
8815 @chapter Menus
8816
8817   A menu is set by setting the value of the variable
8818 @code{current-menubar} (which may be buffer-local) and then calling
8819 @code{set-menubar-dirty-flag} to signal a change.  This will cause the
8820 menu to be redrawn at the next redisplay.  The format of the data in
8821 @code{current-menubar} is described in @file{menubar.c}.
8822
8823   Internally the data in current-menubar is parsed into a tree of
8824 @code{widget_value's} (defined in @file{lwlib.h}); this is accomplished
8825 by the recursive function @code{menu_item_descriptor_to_widget_value()},
8826 called by @code{compute_menubar_data()}.  Such a tree is deallocated
8827 using @code{free_widget_value()}.
8828
8829   @code{update_screen_menubars()} is one of the external entry points.
8830 This checks to see, for each screen, if that screen's menubar needs to
8831 be updated.  This is the case if
8832
8833 @enumerate
8834 @item
8835 @code{set-menubar-dirty-flag} was called since the last redisplay.  (This
8836 function sets the C variable menubar_has_changed.)
8837 @item
8838 The buffer displayed in the screen has changed.
8839 @item
8840 The screen has no menubar currently displayed.
8841 @end enumerate
8842
8843   @code{set_screen_menubar()} is called for each such screen.  This
8844 function calls @code{compute_menubar_data()} to create the tree of
8845 widget_value's, then calls @code{lw_create_widget()},
8846 @code{lw_modify_all_widgets()}, and/or @code{lw_destroy_all_widgets()}
8847 to create the X-Toolkit widget associated with the menu.
8848
8849   @code{update_psheets()}, the other external entry point, actually
8850 changes the menus being displayed.  It uses the widgets fixed by
8851 @code{update_screen_menubars()} and calls various X functions to ensure
8852 that the menus are displayed properly.
8853
8854   The menubar widget is set up so that @code{pre_activate_callback()} is
8855 called when the menu is first selected (i.e. mouse button goes down),
8856 and @code{menubar_selection_callback()} is called when an item is
8857 selected.  @code{pre_activate_callback()} calls the function in
8858 activate-menubar-hook, which can change the menubar (this is described
8859 in @file{menubar.c}).  If the menubar is changed,
8860 @code{set_screen_menubars()} is called.
8861 @code{menubar_selection_callback()} enqueues a menu event, putting in it
8862 a function to call (either @code{eval} or @code{call-interactively}) and
8863 its argument, which is the callback function or form given in the menu's
8864 description.
8865
8866 @node Subprocesses, Interface to X Windows, Menus, Top
8867 @chapter Subprocesses
8868
8869   The fields of a process are:
8870
8871 @table @code
8872 @item name
8873 A string, the name of the process.
8874
8875 @item command
8876 A list containing the command arguments that were used to start this
8877 process.
8878
8879 @item filter
8880 A function used to accept output from the process instead of a buffer,
8881 or @code{nil}.
8882
8883 @item sentinel
8884 A function called whenever the process receives a signal, or @code{nil}.
8885
8886 @item buffer
8887 The associated buffer of the process.
8888
8889 @item pid
8890 An integer, the Unix process @sc{id}.
8891
8892 @item childp
8893 A flag, non-@code{nil} if this is really a child process.
8894 It is @code{nil} for a network connection.
8895
8896 @item mark
8897 A marker indicating the position of the end of the last output from this
8898 process inserted into the buffer.  This is often but not always the end
8899 of the buffer.
8900
8901 @item kill_without_query
8902 If this is non-@code{nil}, killing XEmacs while this process is still
8903 running does not ask for confirmation about killing the process.
8904
8905 @item raw_status_low
8906 @itemx raw_status_high
8907 These two fields record 16 bits each of the process status returned by
8908 the @code{wait} system call.
8909
8910 @item status
8911 The process status, as @code{process-status} should return it.
8912
8913 @item tick
8914 @itemx update_tick
8915 If these two fields are not equal, a change in the status of the process
8916 needs to be reported, either by running the sentinel or by inserting a
8917 message in the process buffer.
8918
8919 @item pty_flag
8920 Non-@code{nil} if communication with the subprocess uses a @sc{pty};
8921 @code{nil} if it uses a pipe.
8922
8923 @item infd
8924 The file descriptor for input from the process.
8925
8926 @item outfd
8927 The file descriptor for output to the process.
8928
8929 @item subtty
8930 The file descriptor for the terminal that the subprocess is using.  (On
8931 some systems, there is no need to record this, so the value is
8932 @code{-1}.)
8933
8934 @item tty_name
8935 The name of the terminal that the subprocess is using,
8936 or @code{nil} if it is using pipes.
8937 @end table
8938
8939 @node Interface to X Windows, Index , Subprocesses, Top
8940 @chapter Interface to X Windows
8941
8942 Not yet documented.
8943
8944 @include index.texi
8945
8946 @c Print the tables of contents
8947 @summarycontents
8948 @contents
8949 @c That's all
8950
8951 @bye
8952