git.chise.org Git - chise/xemacs-chise.git.1/blob - man/internals/internals.texi

   1 \input texinfo  @c -*-texinfo-*-
   2 @c %**start of header
   3 @setfilename ../../info/internals.info
   4 @settitle XEmacs Internals Manual
   5 @c %**end of header
   6
   7 @ifinfo
   8 @dircategory XEmacs Editor
   9 @direntry
  10 * Internals: (internals).       XEmacs Internals Manual.
  11 @end direntry
  12
  13 Copyright @copyright{} 1992 - 1996 Ben Wing.
  14 Copyright @copyright{} 1996, 1997 Sun Microsystems.
  15 Copyright @copyright{} 1994 - 1998 Free Software Foundation.
  16 Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois.
  17
  18
  19 Permission is granted to make and distribute verbatim copies of this
  20 manual provided the copyright notice and this permission notice are
  21 preserved on all copies.
  22
  23 @ignore
  24 Permission is granted to process this file through TeX and print the
  25 results, provided the printed document carries copying permission notice
  26 identical to this one except for the removal of this paragraph (this
  27 paragraph not being relevant to the printed manual).
  28
  29 @end ignore
  30 Permission is granted to copy and distribute modified versions of this
  31 manual under the conditions for verbatim copying, provided that the
  32 entire resulting derived work is distributed under the terms of a
  33 permission notice identical to this one.
  34
  35 Permission is granted to copy and distribute translations of this manual
  36 into another language, under the above conditions for modified versions,
  37 except that this permission notice may be stated in a translation
  38 approved by the Foundation.
  39
  40 Permission is granted to copy and distribute modified versions of this
  41 manual under the conditions for verbatim copying, provided also that the
  42 section entitled ``GNU General Public License'' is included exactly as
  43 in the original, and provided that the entire resulting derived work is
  44 distributed under the terms of a permission notice identical to this
  45 one.
  46
  47 Permission is granted to copy and distribute translations of this manual
  48 into another language, under the above conditions for modified versions,
  49 except that the section entitled ``GNU General Public License'' may be
  50 included in a translation approved by the Free Software Foundation
  51 instead of in the original English.
  52 @end ifinfo
  53
  54 @c Combine indices.
  55 @synindex cp fn
  56 @syncodeindex vr fn
  57 @syncodeindex ky fn
  58 @syncodeindex pg fn
  59 @syncodeindex tp fn
  60
  61 @setchapternewpage odd
  62 @finalout
  63
  64 @titlepage
  65 @title XEmacs Internals Manual
  66 @subtitle Version 1.4, March 2001
  67
  68 @author Ben Wing
  69 @author Martin Buchholz
  70 @author Hrvoje Niksic
  71 @author Matthias Neubauer
  72 @author Olivier Galibert
  73 @page
  74 @vskip 0pt plus 1fill
  75
  76 @noindent
  77 Copyright @copyright{} 1992 - 1996, 2001 Ben Wing. @*
  78 Copyright @copyright{} 1996, 1997 Sun Microsystems, Inc. @*
  79 Copyright @copyright{} 1994 - 1998 Free Software Foundation. @*
  80 Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois.
  81
  82 @sp 2
  83 Version 1.4 @*
  84 March 2001.@*
  85
  86 Permission is granted to make and distribute verbatim copies of this
  87 manual provided the copyright notice and this permission notice are
  88 preserved on all copies.
  89
  90 Permission is granted to copy and distribute modified versions of this
  91 manual under the conditions for verbatim copying, provided also that the
  92 section entitled ``GNU General Public License'' is included
  93 exactly as in the original, and provided that the entire resulting
  94 derived work is distributed under the terms of a permission notice
  95 identical to this one.
  96
  97 Permission is granted to copy and distribute translations of this manual
  98 into another language, under the above conditions for modified versions,
  99 except that the section entitled ``GNU General Public License'' may be
 100 included in a translation approved by the Free Software Foundation
 101 instead of in the original English.
 102 @end titlepage
 103 @page
 104
 105 @node Top, A History of Emacs, (dir), (dir)
 106
 107 @ifinfo
 108 This Info file contains v1.4 of the XEmacs Internals Manual, March 2001.
 109 @end ifinfo
 110
 111 @menu
 112 * A History of Emacs::          Times, dates, important events.
 113 * XEmacs From the Outside::     A broad conceptual overview.
 114 * The Lisp Language::           An overview.
 115 * XEmacs From the Perspective of Building::
 116 * XEmacs From the Inside::
 117 * The XEmacs Object System (Abstractly Speaking)::
 118 * How Lisp Objects Are Represented in C::
 119 * Rules When Writing New C Code::
 120 * A Summary of the Various XEmacs Modules::
 121 * Allocation of Objects in XEmacs Lisp::
 122 * Dumping::
 123 * Events and the Event Loop::
 124 * Evaluation; Stack Frames; Bindings::
 125 * Symbols and Variables::
 126 * Buffers and Textual Representation::
 127 * MULE Character Sets and Encodings::
 128 * The Lisp Reader and Compiler::
 129 * Lstreams::
 130 * Consoles; Devices; Frames; Windows::
 131 * The Redisplay Mechanism::
 132 * Extents::
 133 * Faces::
 134 * Glyphs::
 135 * Specifiers::
 136 * Menus::
 137 * Subprocesses::
 138 * Interface to the X Window System::
 139 * Index::
 140
 141 @detailmenu
 142
 143 --- The Detailed Node Listing ---
 144
 145 A History of Emacs
 146
 147 * Through Version 18::          Unification prevails.
 148 * Lucid Emacs::                 One version 19 Emacs.
 149 * GNU Emacs 19::                The other version 19 Emacs.
 150 * GNU Emacs 20::                The other version 20 Emacs.
 151 * XEmacs::                      The continuation of Lucid Emacs.
 152
 153 Rules When Writing New C Code
 154
 155 * General Coding Rules::
 156 * Writing Lisp Primitives::
 157 * Adding Global Lisp Variables::
 158 * Coding for Mule::
 159 * Techniques for XEmacs Developers::
 160
 161 Coding for Mule
 162
 163 * Character-Related Data Types::
 164 * Working With Character and Byte Positions::
 165 * Conversion to and from External Data::
 166 * General Guidelines for Writing Mule-Aware Code::
 167 * An Example of Mule-Aware Code::
 168
 169 A Summary of the Various XEmacs Modules
 170
 171 * Low-Level Modules::
 172 * Basic Lisp Modules::
 173 * Modules for Standard Editing Operations::
 174 * Editor-Level Control Flow Modules::
 175 * Modules for the Basic Displayable Lisp Objects::
 176 * Modules for other Display-Related Lisp Objects::
 177 * Modules for the Redisplay Mechanism::
 178 * Modules for Interfacing with the File System::
 179 * Modules for Other Aspects of the Lisp Interpreter and Object System::
 180 * Modules for Interfacing with the Operating System::
 181 * Modules for Interfacing with X Windows::
 182 * Modules for Internationalization::
 183
 184 Allocation of Objects in XEmacs Lisp
 185
 186 * Introduction to Allocation::
 187 * Garbage Collection::
 188 * GCPROing::
 189 * Garbage Collection - Step by Step::
 190 * Integers and Characters::
 191 * Allocation from Frob Blocks::
 192 * lrecords::
 193 * Low-level allocation::
 194 * Cons::
 195 * Vector::
 196 * Bit Vector::
 197 * Symbol::
 198 * Marker::
 199 * String::
 200 * Compiled Function::
 201
 202 Garbage Collection - Step by Step
 203
 204 * Invocation::
 205 * garbage_collect_1::
 206 * mark_object::
 207 * gc_sweep::
 208 * sweep_lcrecords_1::
 209 * compact_string_chars::
 210 * sweep_strings::
 211 * sweep_bit_vectors_1::
 212
 213 Dumping
 214
 215 * Overview::
 216 * Data descriptions::
 217 * Dumping phase::
 218 * Reloading phase::
 219
 220 Dumping phase
 221
 222 * Object inventory::
 223 * Address allocation::
 224 * The header::
 225 * Data dumping::
 226 * Pointers dumping::
 227
 228 Events and the Event Loop
 229
 230 * Introduction to Events::
 231 * Main Loop::
 232 * Specifics of the Event Gathering Mechanism::
 233 * Specifics About the Emacs Event::
 234 * The Event Stream Callback Routines::
 235 * Other Event Loop Functions::
 236 * Converting Events::
 237 * Dispatching Events; The Command Builder::
 238
 239 Evaluation; Stack Frames; Bindings
 240
 241 * Evaluation::
 242 * Dynamic Binding; The specbinding Stack; Unwind-Protects::
 243 * Simple Special Forms::
 244 * Catch and Throw::
 245
 246 Symbols and Variables
 247
 248 * Introduction to Symbols::
 249 * Obarrays::
 250 * Symbol Values::
 251
 252 Buffers and Textual Representation
 253
 254 * Introduction to Buffers::     A buffer holds a block of text such as a file.
 255 * The Text in a Buffer::        Representation of the text in a buffer.
 256 * Buffer Lists::                Keeping track of all buffers.
 257 * Markers and Extents::         Tagging locations within a buffer.
 258 * Bufbytes and Emchars::        Representation of individual characters.
 259 * The Buffer Object::           The Lisp object corresponding to a buffer.
 260
 261 MULE Character Sets and Encodings
 262
 263 * Character Sets::
 264 * Encodings::
 265 * Internal Mule Encodings::
 266 * CCL::
 267
 268 Encodings
 269
 270 * Japanese EUC (Extended Unix Code)::
 271 * JIS7::
 272
 273 Internal Mule Encodings
 274
 275 * Internal String Encoding::
 276 * Internal Character Encoding::
 277
 278 Lstreams
 279
 280 * Creating an Lstream::         Creating an lstream object.
 281 * Lstream Types::               Different sorts of things that are streamed.
 282 * Lstream Functions::           Functions for working with lstreams.
 283 * Lstream Methods::             Creating new lstream types.
 284
 285 Consoles; Devices; Frames; Windows
 286
 287 * Introduction to Consoles; Devices; Frames; Windows::
 288 * Point::
 289 * Window Hierarchy::
 290 * The Window Object::
 291
 292 The Redisplay Mechanism
 293
 294 * Critical Redisplay Sections::
 295 * Line Start Cache::
 296 * Redisplay Piece by Piece::
 297
 298 Extents
 299
 300 * Introduction to Extents::     Extents are ranges over text, with properties.
 301 * Extent Ordering::             How extents are ordered internally.
 302 * Format of the Extent Info::   The extent information in a buffer or string.
 303 * Zero-Length Extents::         A weird special case.
 304 * Mathematics of Extent Ordering::  A rigorous foundation.
 305 * Extent Fragments::            Cached information useful for redisplay.
 306
 307 @end detailmenu
 308 @end menu
 309
 310 @node A History of Emacs, XEmacs From the Outside, Top, Top
 311 @chapter A History of Emacs
 312 @cindex history of Emacs, a
 313 @cindex Emacs, a history of
 314 @cindex Hackers (Steven Levy)
 315 @cindex Levy, Steven
 316 @cindex ITS (Incompatible Timesharing System)
 317 @cindex Stallman, Richard
 318 @cindex RMS
 319 @cindex MIT
 320 @cindex TECO
 321 @cindex FSF
 322 @cindex Free Software Foundation
 323
 324   XEmacs is a powerful, customizable text editor and development
 325 environment.  It began as Lucid Emacs, which was in turn derived from
 326 GNU Emacs, a program written by Richard Stallman of the Free Software
 327 Foundation.  GNU Emacs dates back to the 1970's, and was modelled
 328 after a package called ``Emacs'', written in 1976, that was a set of
 329 macros on top of TECO, an old, old text editor written at MIT on the
 330 DEC PDP 10 under one of the earliest time-sharing operating systems,
 331 ITS (Incompatible Timesharing System). (ITS dates back well before
 332 Unix.) ITS, TECO, and Emacs were products of a group of people at MIT
 333 who called themselves ``hackers'', who shared an idealistic belief
 334 system about the free exchange of information and were fanatical in
 335 their devotion to and time spent with computers. (The hacker
 336 subculture dates back to the late 1950's at MIT and is described in
 337 detail in Steven Levy's book @cite{Hackers}.  This book also includes
 338 a lot of information about Stallman himself and the development of
 339 Lisp, a programming language developed at MIT that underlies Emacs.)
 340
 341 @menu
 342 * Through Version 18::          Unification prevails.
 343 * Lucid Emacs::                 One version 19 Emacs.
 344 * GNU Emacs 19::                The other version 19 Emacs.
 345 * GNU Emacs 20::                The other version 20 Emacs.
 346 * XEmacs::                      The continuation of Lucid Emacs.
 347 @end menu
 348
 349 @node Through Version 18
 350 @section Through Version 18
 351 @cindex version 18, through
 352 @cindex Gosling, James
 353 @cindex Great Usenet Renaming
 354
 355   Although the history of the early versions of GNU Emacs is unclear,
 356 the history is well-known from the middle of 1985.  A time line is:
 357
 358 @itemize @bullet
 359 @item
 360 GNU Emacs version 15 (15.34) was released sometime in 1984 or 1985 and
 361 shared some code with a version of Emacs written by James Gosling (the
 362 same James Gosling who later created the Java language).
 363 @item
 364 GNU Emacs version 16 (first released version was 16.56) was released on
 365 July 15, 1985.  All Gosling code was removed due to potential copyright
 366 problems with the code.
 367 @item
 368 version 16.57: released on September 16, 1985.
 369 @item
 370 versions 16.58, 16.59: released on September 17, 1985.
 371 @item
 372 version 16.60: released on September 19, 1985.  These later version 16's
 373 incorporated patches from the net, esp. for getting Emacs to work under
 374 System V.
 375 @item
 376 version 17.36 (first official v17 release) released on December 20,
 377 1985.  Included a TeX-able user manual.  First official unpatched
 378 version that worked on vanilla System V machines.
 379 @item
 380 version 17.43 (second official v17 release) released on January 25,
 381 1986.
 382 @item
 383 version 17.45 released on January 30, 1986.
 384 @item
 385 version 17.46 released on February 4, 1986.
 386 @item
 387 version 17.48 released on February 10, 1986.
 388 @item
 389 version 17.49 released on February 12, 1986.
 390 @item
 391 version 17.55 released on March 18, 1986.
 392 @item
 393 version 17.57 released on March 27, 1986.
 394 @item
 395 version 17.58 released on April 4, 1986.
 396 @item
 397 version 17.61 released on April 12, 1986.
 398 @item
 399 version 17.63 released on May 7, 1986.
 400 @item
 401 version 17.64 released on May 12, 1986.
 402 @item
 403 version 18.24 (a beta version) released on October 2, 1986.
 404 @item
 405 version 18.30 (a beta version) released on November 15, 1986.
 406 @item
 407 version 18.31 (a beta version) released on November 23, 1986.
 408 @item
 409 version 18.32 (a beta version) released on December 7, 1986.
 410 @item
 411 version 18.33 (a beta version) released on December 12, 1986.
 412 @item
 413 version 18.35 (a beta version) released on January 5, 1987.
 414 @item
 415 version 18.36 (a beta version) released on January 21, 1987.
 416 @item
 417 January 27, 1987: The Great Usenet Renaming.  net.emacs is now
 418 comp.emacs.
 419 @item
 420 version 18.37 (a beta version) released on February 12, 1987.
 421 @item
 422 version 18.38 (a beta version) released on March 3, 1987.
 423 @item
 424 version 18.39 (a beta version) released on March 14, 1987.
 425 @item
 426 version 18.40 (a beta version) released on March 18, 1987.
 427 @item
 428 version 18.41 (the first ``official'' release) released on March 22,
 429 1987.
 430 @item
 431 version 18.45 released on June 2, 1987.
 432 @item
 433 version 18.46 released on June 9, 1987.
 434 @item
 435 version 18.47 released on June 18, 1987.
 436 @item
 437 version 18.48 released on September 3, 1987.
 438 @item
 439 version 18.49 released on September 18, 1987.
 440 @item
 441 version 18.50 released on February 13, 1988.
 442 @item
 443 version 18.51 released on May 7, 1988.
 444 @item
 445 version 18.52 released on September 1, 1988.
 446 @item
 447 version 18.53 released on February 24, 1989.
 448 @item
 449 version 18.54 released on April 26, 1989.
 450 @item
 451 version 18.55 released on August 23, 1989.  This is the earliest version
 452 that is still available by FTP.
 453 @item
 454 version 18.56 released on January 17, 1991.
 455 @item
 456 version 18.57 released late January, 1991.
 457 @item
 458 version 18.58 released ?????.
 459 @item
 460 version 18.59 released October 31, 1992.
 461 @end itemize
 462
 463 @node Lucid Emacs
 464 @section Lucid Emacs
 465 @cindex Lucid Emacs
 466 @cindex Lucid Inc.
 467 @cindex Energize
 468 @cindex Epoch
 469
 470   Lucid Emacs was developed by the (now-defunct) Lucid Inc., a maker of
 471 C++ and Lisp development environments.  It began when Lucid decided they
 472 wanted to use Emacs as the editor and cornerstone of their C++
 473 development environment (called ``Energize'').  They needed many features
 474 that were not available in the existing version of GNU Emacs (version
 475 18.5something), in particular good and integrated support for GUI
 476 elements such as mouse support, multiple fonts, multiple window-system
 477 windows, etc.  A branch of GNU Emacs called Epoch, written at the
 478 University of Illinois, existed that supplied many of these features;
 479 however, Lucid needed more than what existed in Epoch.  At the time, the
 480 Free Software Foundation was working on version 19 of Emacs (this was
 481 sometime around 1991), which was planned to have similar features, and
 482 so Lucid decided to work with the Free Software Foundation.  Their plan
 483 was to add features that they needed, and coordinate with the FSF so
 484 that the features would get included back into Emacs version 19.
 485
 486   Delays in the release of version 19 occurred, however (resulting in it
 487 finally being released more than a year after what was initially
 488 planned), and Lucid encountered unexpected technical resistance in
 489 getting their changes merged back into version 19, so they decided to
 490 release their own version of Emacs, which became Lucid Emacs 19.0.
 491
 492 @cindex Zawinski, Jamie
 493 @cindex Sexton, Harlan
 494 @cindex Benson, Eric
 495 @cindex Devin, Matthieu
 496   The initial authors of Lucid Emacs were Matthieu Devin, Harlan Sexton,
 497 and Eric Benson, and the work was later taken over by Jamie Zawinski,
 498 who became ``Mr. Lucid Emacs'' for many releases.
 499
 500   A time line for Lucid Emacs is
 501
 502 @itemize @bullet
 503 @item
 504 version 19.0 shipped with Energize 1.0, April 1992.
 505 @item
 506 version 19.1 released June 4, 1992.
 507 @item
 508 version 19.2 released June 19, 1992.
 509 @item
 510 version 19.3 released September 9, 1992.
 511 @item
 512 version 19.4 released January 21, 1993.
 513 @item
 514 version 19.5 was a repackaging of 19.4 with a few bug fixes and
 515 shipped with Energize 2.0.  Never released to the net.
 516 @item
 517 version 19.6 released April 9, 1993.
 518 @item
 519 version 19.7 was a repackaging of 19.6 with a few bug fixes and
 520 shipped with Energize 2.1.  Never released to the net.
 521 @item
 522 version 19.8 released September 6, 1993.
 523 @item
 524 version 19.9 released January 12, 1994.
 525 @item
 526 version 19.10 released May 27, 1994.
 527 @item
 528 version 19.11 (first XEmacs) released September 13, 1994.
 529 @item
 530 version 19.12 released June 23, 1995.
 531 @item
 532 version 19.13 released September 1, 1995.
 533 @item
 534 version 19.14 released June 23, 1996.
 535 @item
 536 version 20.0 released February 9, 1997.
 537 @item
 538 version 19.15 released March 28, 1997.
 539 @item
 540 version 20.1 (not released to the net) April 15, 1997.
 541 @item
 542 version 20.2 released May 16, 1997.
 543 @item
 544 version 19.16 released October 31, 1997.
 545 @item
 546 version 20.3 (the first stable version of XEmacs 20.x) released November 30,
 547 1997.
 548 @item
 549 version 20.4 released February 28, 1998.
 550 @item
 551 version 21.1.2 released May 14, 1999. (The version naming scheme was
 552 changed at this point: [a] the second version number is odd for stable
 553 versions, even for beta versions; [b] a third version number is added,
 554 replacing the "beta xxx" ending for beta versions and allowing for
 555 periodic maintenance releases for stable versions.  Therefore, 21.0 was
 556 never "officially" released; similarly for 21.2, etc.)
 557 @item
 558 version 21.1.3 released June 26, 1999.
 559 @item
 560 version 21.1.4 released July 8, 1999.
 561 @item
 562 version 21.1.6 released August 14, 1999. (There was no 21.1.5.)
 563 @item
 564 version 21.1.7 released September 26, 1999.
 565 @item
 566 version 21.1.8 released November 2, 1999.
 567 @item
 568 version 21.1.9 released February 13, 2000.
 569 @item
 570 version 21.1.10 released May 7, 2000.
 571 @item
 572 version 21.1.10a released June 24, 2000.
 573 @item
 574 version 21.1.11 released July 18, 2000.
 575 @item
 576 version 21.1.12 released August 5, 2000.
 577 @item
 578 version 21.1.13 released January 7, 2001.
 579 @item
 580 version 21.1.14 released January 27, 2001.
 581 @end itemize
 582
 583 @node GNU Emacs 19
 584 @section GNU Emacs 19
 585 @cindex GNU Emacs 19
 586 @cindex Emacs 19, GNU
 587 @cindex version 19, GNU Emacs
 588 @cindex FSF Emacs
 589
 590   About a year after the initial release of Lucid Emacs, the FSF
 591 released a beta of their version of Emacs 19 (referred to here as ``GNU
 592 Emacs'').  By this time, the current version of Lucid Emacs was
 593 19.6. (Strangely, the first released beta from the FSF was GNU Emacs
 594 19.7.) A time line for GNU Emacs version 19 is
 595
 596 @itemize @bullet
 597 @item
 598 version 19.8 (beta) released May 27, 1993.
 599 @item
 600 version 19.9 (beta) released May 27, 1993.
 601 @item
 602 version 19.10 (beta) released May 30, 1993.
 603 @item
 604 version 19.11 (beta) released June 1, 1993.
 605 @item
 606 version 19.12 (beta) released June 2, 1993.
 607 @item
 608 version 19.13 (beta) released June 8, 1993.
 609 @item
 610 version 19.14 (beta) released June 17, 1993.
 611 @item
 612 version 19.15 (beta) released June 19, 1993.
 613 @item
 614 version 19.16 (beta) released July 6, 1993.
 615 @item
 616 version 19.17 (beta) released late July, 1993.
 617 @item
 618 version 19.18 (beta) released August 9, 1993.
 619 @item
 620 version 19.19 (beta) released August 15, 1993.
 621 @item
 622 version 19.20 (beta) released November 17, 1993.
 623 @item
 624 version 19.21 (beta) released November 17, 1993.
 625 @item
 626 version 19.22 (beta) released November 28, 1993.
 627 @item
 628 version 19.23 (beta) released May 17, 1994.
 629 @item
 630 version 19.24 (beta) released May 16, 1994.
 631 @item
 632 version 19.25 (beta) released June 3, 1994.
 633 @item
 634 version 19.26 (beta) released September 11, 1994.
 635 @item
 636 version 19.27 (beta) released September 14, 1994.
 637 @item
 638 version 19.28 (first ``official'' release) released November 1, 1994.
 639 @item
 640 version 19.29 released June 21, 1995.
 641 @item
 642 version 19.30 released November 24, 1995.
 643 @item
 644 version 19.31 released May 25, 1996.
 645 @item
 646 version 19.32 released July 31, 1996.
 647 @item
 648 version 19.33 released August 11, 1996.
 649 @item
 650 version 19.34 released August 21, 1996.
 651 @item
 652 version 19.34b released September 6, 1996.
 653 @end itemize
 654
 655 @cindex Mlynarik, Richard
 656   In some ways, GNU Emacs 19 was better than Lucid Emacs; in some ways,
 657 worse.  Lucid soon began incorporating features from GNU Emacs 19 into
 658 Lucid Emacs; the work was mostly done by Richard Mlynarik, who had been
 659 working on and using GNU Emacs for a long time (back as far as version
 660 16 or 17).
 661
 662 @node GNU Emacs 20
 663 @section GNU Emacs 20
 664 @cindex GNU Emacs 20
 665 @cindex Emacs 20, GNU
 666 @cindex version 20, GNU Emacs
 667 @cindex FSF Emacs
 668
 669 On February 2, 1997 work began on GNU Emacs to integrate Mule.  The first
 670 release was made in September of that year.
 671
 672 A timeline for Emacs 20 is
 673
 674 @itemize @bullet
 675 @item
 676 version 20.1 released September 17, 1997.
 677 @item
 678 version 20.2 released September 20, 1997.
 679 @item
 680 version 20.3 released August 19, 1998.
 681 @end itemize
 682
 683 @node XEmacs
 684 @section XEmacs
 685 @cindex XEmacs
 686
 687 @cindex Sun Microsystems
 688 @cindex University of Illinois
 689 @cindex Illinois, University of
 690 @cindex SPARCWorks
 691 @cindex Andreessen, Marc
 692 @cindex Baur, Steve
 693 @cindex Buchholz, Martin
 694 @cindex Kaplan, Simon
 695 @cindex Wing, Ben
 696 @cindex Thompson, Chuck
 697 @cindex Win-Emacs
 698 @cindex Epoch
 699 @cindex Amdahl Corporation
 700   Around the time that Lucid was developing Energize, Sun Microsystems
 701 was developing their own development environment (called ``SPARCWorks'')
 702 and also decided to use Emacs.  They joined forces with the Epoch team
 703 at the University of Illinois and later with Lucid.  The maintainer of
 704 the last-released version of Epoch was Marc Andreessen, but he dropped
 705 out and the Epoch project, headed by Simon Kaplan, lured Chuck Thompson
 706 away from a system administration job to become the primary Lucid Emacs
 707 author for Epoch and Sun.  Chuck's area of specialty became the
 708 redisplay engine (he replaced the old Lucid Emacs redisplay engine with
 709 a ported version from Epoch and then later rewrote it from scratch).
 710 Sun also hired Ben Wing (the author of Win-Emacs, a port of Lucid Emacs
 711 to Microsoft Windows 3.1) in 1993, for what was initially a one-month
 712 contract to fix some event problems but later became a many-year
 713 involvement, punctuated by a six-month contract with Amdahl Corporation.
 714
 715 @cindex rename to XEmacs
 716   In 1994, Sun and Lucid agreed to rename Lucid Emacs to XEmacs (a name
 717 not favorable to either company); the first release called XEmacs was
 718 version 19.11.  In June 1994, Lucid folded and Jamie quit to work for
 719 the newly formed Mosaic Communications Corp., later Netscape
 720 Communications Corp. (co-founded by the same Marc Andreessen, who had
 721 quit his Epoch job to work on a graphical browser for the World Wide
 722 Web).  Chuck then become the primary maintainer of XEmacs, and put out
 723 versions 19.11 through 19.14 in conjunction with Ben.  For 19.12 and
 724 19.13, Chuck added the new redisplay and many other display improvements
 725 and Ben added MULE support (support for Asian and other languages) and
 726 redesigned most of the internal Lisp subsystems to better support the
 727 MULE work and the various other features being added to XEmacs.  After
 728 19.14 Chuck retired as primary maintainer and Steve Baur stepped in.
 729
 730 @cindex MULE merged XEmacs appears
 731   Soon after 19.13 was released, work began in earnest on the MULE
 732 internationalization code and the source tree was divided into two
 733 development paths.  The MULE version was initially called 19.20, but was
 734 soon renamed to 20.0.  In 1996 Martin Buchholz of Sun Microsystems took
 735 over the care and feeding of it and worked on it in parallel with the
 736 19.14 development that was occurring at the same time.  After much work
 737 by Martin, it was decided to release 20.0 ahead of 19.15 in February
 738 1997.  The source tree remained divided until 20.2 when the version 19
 739 source was finally retired at version 19.16.
 740
 741 @cindex Baur, Steve
 742 @cindex Buchholz, Martin
 743 @cindex Jones, Kyle
 744 @cindex Niksic, Hrvoje
 745 @cindex XEmacs goes it alone
 746   In 1997, Sun finally dropped all pretense of support for XEmacs and
 747 Martin Buchholz left the company in November.  Since then, and mostly
 748 for the previous year, because Steve Baur was never paid to work on
 749 XEmacs, XEmacs has existed solely on the contributions of volunteers
 750 from the Free Software Community.  Starting from 1997, Hrvoje Niksic and
 751 Kyle Jones have figured prominently in XEmacs development.
 752
 753 @cindex merging attempts
 754   Many attempts have been made to merge XEmacs and GNU Emacs, but they
 755 have consistently failed.
 756
 757   A more detailed history is contained in the XEmacs About page.
 758
 759   A time line for XEmacs is
 760
 761 @itemize @bullet
 762 @item
 763 version 19.11 (first XEmacs) released September 13, 1994.
 764 @item
 765 version 19.12 released June 23, 1995.
 766 @item
 767 version 19.13 released September 1, 1995.
 768 @item
 769 version 19.14 released June 23, 1996.
 770 @item
 771 version 20.0 released February 9, 1997.
 772 @item
 773 version 19.15 released March 28, 1997.
 774 @item
 775 version 20.1 (not released to the net) April 15, 1997.
 776 @item
 777 version 20.2 released May 16, 1997.
 778 @item
 779 version 19.16 released October 31, 1997.
 780 @item
 781 version 20.3 (the first stable version of XEmacs 20.x) released November 30,
 782 1997.
 783 @item
 784 version 20.4 released February 28, 1998.
 785 @item
 786 version 21.0.60 released December 10, 1998. (The version naming scheme was
 787 changed at this point: [a] the second version number is odd for stable
 788 versions, even for beta versions; [b] a third version number is added,
 789 replacing the "beta xxx" ending for beta versions and allowing for
 790 periodic maintenance releases for stable versions.  Therefore, 21.0 was
 791 never "officially" released; similarly for 21.2, etc.)
 792 @item
 793 version 21.0.61 released January 4, 1999.
 794 @item
 795 version 21.0.63 released February 3, 1999.
 796 @item
 797 version 21.0.64 released March 1, 1999.
 798 @item
 799 version 21.0.65 released March 5, 1999.
 800 @item
 801 version 21.0.66 released March 12, 1999.
 802 @item
 803 version 21.0.67 released March 25, 1999.
 804 @item
 805 version 21.1.2 released May 14, 1999. (This is the followup to 21.0.67.
 806 The second version number was bumped to indicate the beginning of the
 807 "stable" series.)
 808 @item
 809 version 21.1.3 released June 26, 1999.
 810 @item
 811 version 21.1.4 released July 8, 1999.
 812 @item
 813 version 21.1.6 released August 14, 1999. (There was no 21.1.5.)
 814 @item
 815 version 21.1.7 released September 26, 1999.
 816 @item
 817 version 21.1.8 released November 2, 1999.
 818 @item
 819 version 21.1.9 released February 13, 2000.
 820 @item
 821 version 21.1.10 released May 7, 2000.
 822 @item
 823 version 21.1.10a released June 24, 2000.
 824 @item
 825 version 21.1.11 released July 18, 2000.
 826 @item
 827 version 21.1.12 released August 5, 2000.
 828 @item
 829 version 21.1.13 released January 7, 2001.
 830 @item
 831 version 21.1.14 released January 27, 2001.
 832 @item
 833 version 21.2.9 released February 3, 1999.
 834 @item
 835 version 21.2.10 released February 5, 1999.
 836 @item
 837 version 21.2.11 released March 1, 1999.
 838 @item
 839 version 21.2.12 released March 5, 1999.
 840 @item
 841 version 21.2.13 released March 12, 1999.
 842 @item
 843 version 21.2.14 released May 14, 1999.
 844 @item
 845 version 21.2.15 released June 4, 1999.
 846 @item
 847 version 21.2.16 released June 11, 1999.
 848 @item
 849 version 21.2.17 released June 22, 1999.
 850 @item
 851 version 21.2.18 released July 14, 1999.
 852 @item
 853 version 21.2.19 released July 30, 1999.
 854 @item
 855 version 21.2.20 released November 10, 1999.
 856 @item
 857 version 21.2.21 released November 28, 1999.
 858 @item
 859 version 21.2.22 released November 29, 1999.
 860 @item
 861 version 21.2.23 released December 7, 1999.
 862 @item
 863 version 21.2.24 released December 14, 1999.
 864 @item
 865 version 21.2.25 released December 24, 1999.
 866 @item
 867 version 21.2.26 released December 31, 1999.
 868 @item
 869 version 21.2.27 released January 18, 2000.
 870 @item
 871 version 21.2.28 released February 7, 2000.
 872 @item
 873 version 21.2.29 released February 16, 2000.
 874 @item
 875 version 21.2.30 released February 21, 2000.
 876 @item
 877 version 21.2.31 released February 23, 2000.
 878 @item
 879 version 21.2.32 released March 20, 2000.
 880 @item
 881 version 21.2.33 released May 1, 2000.
 882 @item
 883 version 21.2.34 released May 28, 2000.
 884 @item
 885 version 21.2.35 released July 19, 2000.
 886 @item
 887 version 21.2.36 released October 4, 2000.
 888 @item
 889 version 21.2.37 released November 14, 2000.
 890 @item
 891 version 21.2.38 released December 5, 2000.
 892 @item
 893 version 21.2.39 released December 31, 2000.
 894 @item
 895 version 21.2.40 released January 8, 2001.
 896 @item
 897 version 21.2.41 released January 17, 2001.
 898 @item
 899 version 21.2.42 released January 20, 2001.
 900 @item
 901 version 21.2.43 released January 26, 2001.
 902 @item
 903 version 21.2.44 released February 8, 2001.
 904 @item
 905 version 21.2.45 released February 23, 2001.
 906 @item
 907 version 21.2.46 released March 21, 2001.
 908 @end itemize
 909
 910 @node XEmacs From the Outside, The Lisp Language, A History of Emacs, Top
 911 @chapter XEmacs From the Outside
 912 @cindex XEmacs from the outside
 913 @cindex outside, XEmacs from the
 914 @cindex read-eval-print
 915
 916   XEmacs appears to the outside world as an editor, but it is really a
 917 Lisp environment.  At its heart is a Lisp interpreter; it also
 918 ``happens'' to contain many specialized object types (e.g. buffers,
 919 windows, frames, events) that are useful for implementing an editor.
 920 Some of these objects (in particular windows and frames) have
 921 displayable representations, and XEmacs provides a function
 922 @code{redisplay()} that ensures that the display of all such objects
 923 matches their internal state.  Most of the time, a standard Lisp
 924 environment is in a @dfn{read-eval-print} loop---i.e. ``read some Lisp
 925 code, execute it, and print the results''.  XEmacs has a similar loop:
 926
 927 @itemize @bullet
 928 @item
 929 read an event
 930 @item
 931 dispatch the event (i.e. ``do it'')
 932 @item
 933 redisplay
 934 @end itemize
 935
 936   Reading an event is done using the Lisp function @code{next-event},
 937 which waits for something to happen (typically, the user presses a key
 938 or moves the mouse) and returns an event object describing this.
 939 Dispatching an event is done using the Lisp function
 940 @code{dispatch-event}, which looks up the event in a keymap object (a
 941 particular kind of object that associates an event with a Lisp function)
 942 and calls that function.  The function ``does'' what the user has
 943 requested by changing the state of particular frame objects, buffer
 944 objects, etc.  Finally, @code{redisplay()} is called, which updates the
 945 display to reflect those changes just made.  Thus is an ``editor'' born.
 946
 947 @cindex bridge, playing
 948 @cindex taxes, doing
 949 @cindex pi, calculating
 950   Note that you do not have to use XEmacs as an editor; you could just
 951 as well make it do your taxes, compute pi, play bridge, etc.  You'd just
 952 have to write functions to do those operations in Lisp.
 953
 954 @node The Lisp Language, XEmacs From the Perspective of Building, XEmacs From the Outside, Top
 955 @chapter The Lisp Language
 956 @cindex Lisp language, the
 957 @cindex Lisp vs. C
 958 @cindex C vs. Lisp
 959 @cindex Lisp vs. Java
 960 @cindex Java vs. Lisp
 961 @cindex dynamic scoping
 962 @cindex scoping, dynamic
 963 @cindex dynamic types
 964 @cindex types, dynamic
 965 @cindex Java
 966 @cindex Common Lisp
 967 @cindex Gosling, James
 968
 969   Lisp is a general-purpose language that is higher-level than C and in
 970 many ways more powerful than C.  Powerful dialects of Lisp such as
 971 Common Lisp are probably much better languages for writing very large
 972 applications than is C. (Unfortunately, for many non-technical
 973 reasons C and its successor C++ have become the dominant languages for
 974 application development.  These languages are both inadequate for
 975 extremely large applications, which is evidenced by the fact that newer,
 976 larger programs are becoming ever harder to write and are requiring ever
 977 more programmers despite great increases in C development environments;
 978 and by the fact that, although hardware speeds and reliability have been
 979 growing at an exponential rate, most software is still generally
 980 considered to be slow and buggy.)
 981
 982   The new Java language holds promise as a better general-purpose
 983 development language than C.  Java has many features in common with
 984 Lisp that are not shared by C (this is not a coincidence, since
 985 Java was designed by James Gosling, a former Lisp hacker).  This
 986 will be discussed more later.
 987
 988 For those used to C, here is a summary of the basic differences between
 989 C and Lisp:
 990
 991 @enumerate
 992 @item
 993 Lisp has an extremely regular syntax.  Every function, expression,
 994 and control statement is written in the form
 995
 996 @example
 997    (@var{func} @var{arg1} @var{arg2} ...)
 998 @end example
 999
1000 This is as opposed to C, which writes functions as
1001
1002 @example
1003    func(@var{arg1}, @var{arg2}, ...)
1004 @end example
1005
1006 but writes expressions involving operators as (e.g.)
1007
1008 @example
1009    @var{arg1} + @var{arg2}
1010 @end example
1011
1012 and writes control statements as (e.g.)
1013
1014 @example
1015    while (@var{expr}) @{ @var{statement1}; @var{statement2}; ... @}
1016 @end example
1017
1018 Lisp equivalents of the latter two would be
1019
1020 @example
1021    (+ @var{arg1} @var{arg2} ...)
1022 @end example
1023
1024 and
1025
1026 @example
1027    (while @var{expr} @var{statement1} @var{statement2} ...)
1028 @end example
1029
1030 @item
1031 Lisp is a safe language.  Assuming there are no bugs in the Lisp
1032 interpreter/compiler, it is impossible to write a program that ``core
1033 dumps'' or otherwise causes the machine to execute an illegal
1034 instruction.  This is very different from C, where perhaps the most
1035 common outcome of a bug is exactly such a crash.  A corollary of this is that
1036 the C operation of casting a pointer is impossible (and unnecessary) in
1037 Lisp, and that it is impossible to access memory outside the bounds of
1038 an array.
1039
1040 @item
1041 Programs and data are written in the same form.  The
1042 parenthesis-enclosing form described above for statements is the same
1043 form used for the most common data type in Lisp, the list.  Thus, it is
1044 possible to represent any Lisp program using Lisp data types, and for
1045 one program to construct Lisp statements and then dynamically
1046 @dfn{evaluate} them, or cause them to execute.
1047
1048 @item
1049 All objects are @dfn{dynamically typed}.  This means that part of every
1050 object is an indication of what type it is.  A Lisp program can
1051 manipulate an object without knowing what type it is, and can query an
1052 object to determine its type.  This means that, correspondingly,
1053 variables and function parameters can hold objects of any type and are
1054 not normally declared as being of any particular type.  This is opposed
1055 to the @dfn{static typing} of C, where variables can hold exactly one
1056 type of object and must be declared as such, and objects do not contain
1057 an indication of their type because it's implicit in the variables they
1058 are stored in.  It is possible in C to have a variable hold different
1059 types of objects (e.g. through the use of @code{void *} pointers or
1060 variable-argument functions), but the type information must then be
1061 passed explicitly in some other fashion, leading to additional program
1062 complexity.
1063
1064 @item
1065 Allocated memory is automatically reclaimed when it is no longer in use.
1066 This operation is called @dfn{garbage collection} and involves looking
1067 through all variables to see what memory is being pointed to, and
1068 reclaiming any memory that is not pointed to and is thus
1069 ``inaccessible'' and out of use.  This is as opposed to C, in which
1070 allocated memory must be explicitly reclaimed using @code{free()}.  If
1071 you simply drop all pointers to memory without freeing it, it becomes
1072 ``leaked'' memory that still takes up space.  Over a long period of
1073 time, this can cause your program to grow and grow until it runs out of
1074 memory.
1075
1076 @item
1077 Lisp has built-in facilities for handling errors and exceptions.  In C,
1078 when an error occurs, usually either the program exits entirely or the
1079 routine in which the error occurs returns a value indicating this.  If
1080 an error occurs in a deeply-nested routine, then every routine currently
1081 called must unwind itself normally and return an error value back up to
1082 the next routine.  This means that every routine must explicitly check
1083 for an error in all the routines it calls; if it does not do so,
1084 unexpected and often random behavior results.  This is an extremely
1085 common source of bugs in C programs.  An alternative would be to do a
1086 non-local exit using @code{longjmp()}, but that is often very dangerous
1087 because the routines that were exited past had no opportunity to clean
1088 up after themselves and may leave things in an inconsistent state,
1089 causing a crash shortly afterwards.
1090
1091 Lisp provides mechanisms to make such non-local exits safe.  When an
1092 error occurs, a routine simply signals that an error of a particular
1093 class has occurred, and a non-local exit takes place.  Any routine can
1094 trap errors occurring in routines it calls by registering an error
1095 handler for some or all classes of errors. (If no handler is registered,
1096 a default handler, generally installed by the top-level event loop, is
1097 executed; this prints out the error and continues.) Routines can also
1098 specify cleanup code (called an @dfn{unwind-protect}) that will be
1099 called when control exits from a block of code, no matter how that exit
1100 occurs---i.e. even if a function deeply nested below it causes a
1101 non-local exit back to the top level.
1102
1103 Note that this facility has appeared in some recent vintages of C, in
1104 particular Visual C++ and other PC compilers written for the Microsoft
1105 Win32 API.
1106
1107 @item
1108 In Emacs Lisp, local variables are @dfn{dynamically scoped}.  This means
1109 that if you declare a local variable in a particular function, and then
1110 call another function, that subfunction can ``see'' the local variable
1111 you declared.  This is actually considered a bug in Emacs Lisp and in
1112 all other early dialects of Lisp, and was corrected in Common Lisp. (In
1113 Common Lisp, you can still declare dynamically scoped variables if you
1114 want to---they are sometimes useful---but variables by default are
1115 @dfn{lexically scoped} as in C.)
1116 @end enumerate
1117
1118 For those familiar with Lisp, Emacs Lisp is modelled after MacLisp, an
1119 early dialect of Lisp developed at MIT (no relation to the Macintosh
1120 computer).  There is a Common Lisp compatibility package available for
1121 Emacs that provides many of the features of Common Lisp.
1122
1123 The Java language is derived in many ways from C, and shares a similar
1124 syntax, but has the following features in common with Lisp (and different
1125 from C):
1126
1127 @enumerate
1128 @item
1129 Java is a safe language, like Lisp.
1130 @item
1131 Java provides garbage collection, like Lisp.
1132 @item
1133 Java has built-in facilities for handling errors and exceptions, like
1134 Lisp.
1135 @item
1136 Java has a type system that combines the best advantages of both static
1137 and dynamic typing.  Objects (except very simple types) are explicitly
1138 marked with their type, as in dynamic typing; but there is a hierarchy
1139 of types and functions are declared to accept only certain types, thus
1140 providing the increased compile-time error-checking of static typing.
1141 @end enumerate
1142
1143 The Java language also has some negative attributes:
1144
1145 @enumerate
1146 @item
1147 Java uses the edit/compile/run model of software development.  This
1148 makes it hard to use interactively.  For example, to use Java like
1149 @code{bc} it is necessary to write a special purpose, albeit tiny,
1150 application.  In Emacs Lisp, a calculator comes built-in without any
1151 effort - one can always just type an expression in the @code{*scratch*}
1152 buffer.
1153 @item
1154 Java tries too hard to enforce, not merely enable, portability, making
1155 ordinary access to standard OS facilities painful.  Java has an
1156 @dfn{agenda}.  I think this is why @code{chdir} is not part of standard
1157 Java, which is inexcusable.
1158 @end enumerate
1159
1160 Unfortunately, there is no perfect language.  Static typing allows a
1161 compiler to catch programmer errors and produce more efficient code, but
1162 makes programming more tedious and less fun.  For the foreseeable future,
1163 an Ideal Editing and Programming Environment (and that is what XEmacs
1164 aspires to) will be programmable in multiple languages: high level ones
1165 like Lisp for user customization and prototyping, and lower level ones
1166 for infrastructure and industrial strength applications.  If I had my
1167 way, XEmacs would be friendly towards the Python, Scheme, C++, ML,
1168 etc... communities.  But there are serious technical difficulties to
1169 achieving that goal.
1170
1171 The word @dfn{application} in the previous paragraph was used
1172 intentionally.  XEmacs implements an API for programs written in Lisp
1173 that makes it a full-fledged application platform, very much like an OS
1174 inside the real OS.
1175
1176 @node XEmacs From the Perspective of Building, XEmacs From the Inside, The Lisp Language, Top
1177 @chapter XEmacs From the Perspective of Building
1178 @cindex XEmacs from the perspective of building
1179 @cindex building, XEmacs from the perspective of
1180
1181 The heart of XEmacs is the Lisp environment, which is written in C.
1182 This is contained in the @file{src/} subdirectory.  Underneath
1183 @file{src/} are two subdirectories of header files: @file{s/} (header
1184 files for particular operating systems) and @file{m/} (header files for
1185 particular machine types).  In practice the distinction between the two
1186 types of header files is blurred.  These header files define or undefine
1187 certain preprocessor constants and macros to indicate particular
1188 characteristics of the associated machine or operating system.  As part
1189 of the configure process, one @file{s/} file and one @file{m/} file is
1190 identified for the particular environment in which XEmacs is being
1191 built.
1192
1193 XEmacs also contains a great deal of Lisp code.  This implements the
1194 operations that make XEmacs useful as an editor as well as just a Lisp
1195 environment, and also contains many add-on packages that allow XEmacs to
1196 browse directories, act as a mail and Usenet news reader, compile Lisp
1197 code, etc.  There is actually more Lisp code than C code associated with
1198 XEmacs, but much of the Lisp code is peripheral to the actual operation
1199 of the editor.  The Lisp code all lies in subdirectories underneath the
1200 @file{lisp/} directory.
1201
1202 The @file{lwlib/} directory contains C code that implements a
1203 generalized interface onto different X widget toolkits and also
1204 implements some widgets of its own that behave like Motif widgets but
1205 are faster, free, and in some cases more powerful.  The code in this
1206 directory compiles into a library and is mostly independent from XEmacs.
1207
1208 The @file{etc/} directory contains various data files associated with
1209 XEmacs.  Some of them are actually read by XEmacs at startup; others
1210 merely contain useful information of various sorts.
1211
1212 The @file{lib-src/} directory contains C code for various auxiliary
1213 programs that are used in connection with XEmacs.  Some of them are used
1214 during the build process; others are used to perform certain functions
1215 that cannot conveniently be placed in the XEmacs executable (e.g. the
1216 @file{movemail} program for fetching mail out of @file{/var/spool/mail},
1217 which must be setgid to @file{mail} on many systems; and the
1218 @file{gnuclient} program, which allows an external script to communicate
1219 with a running XEmacs process).
1220
1221 The @file{man/} directory contains the sources for the XEmacs
1222 documentation.  It is mostly in a form called Texinfo, which can be
1223 converted into either a printed document (by passing it through @TeX{})
1224 or into on-line documentation called @dfn{info files}.
1225
1226 The @file{info/} directory contains the results of formatting the XEmacs
1227 documentation as @dfn{info files}, for on-line use.  These files are
1228 used when you enter the Info system using @kbd{C-h i} or through the
1229 Help menu.
1230
1231 The @file{dynodump/} directory contains auxiliary code used to build
1232 XEmacs on Solaris platforms.
1233
1234 The other directories contain various miscellaneous code and information
1235 that is not normally used or needed.
1236
1237 The first step of building involves running the @file{configure} program
1238 and passing it various parameters to specify any optional features you
1239 want and compiler arguments and such, as described in the @file{INSTALL}
1240 file.  This determines what the build environment is, chooses the
1241 appropriate @file{s/} and @file{m/} file, and runs a series of tests to
1242 determine many details about your environment, such as which library
1243 functions are available and exactly how they work.  The reason for
1244 running these tests is that it allows XEmacs to be compiled on a much
1245 wider variety of platforms than those that the XEmacs developers happen
1246 to be familiar with, including various sorts of hybrid platforms.  This
1247 is especially important now that many operating systems give you a great
1248 deal of control over exactly what features you want installed, and allow
1249 for easy upgrading of parts of a system without upgrading the rest.  It
1250 would be impossible to pre-determine and pre-specify the information for
1251 all possible configurations.
1252
1253 In fact, the @file{s/} and @file{m/} files are basically @emph{evil},
1254 since they contain unmaintainable platform-specific hard-coded
1255 information.  XEmacs has been moving in the direction of having all
1256 system-specific information be determined dynamically by
1257 @file{configure}.  Perhaps someday we can @code{rm -rf src/s src/m}.
1258
1259 When configure is done running, it generates @file{Makefile}s and
1260 @file{GNUmakefile}s and the file @file{src/config.h} (which describes
1261 the features of your system) from template files.  You then run
1262 @file{make}, which compiles the auxiliary code and programs in
1263 @file{lib-src/} and @file{lwlib/} and the main XEmacs executable in
1264 @file{src/}.  The result of compiling and linking is an executable
1265 called @file{temacs}, which is @emph{not} the final XEmacs executable.
1266 @file{temacs} by itself is not intended to function as an editor or even
1267 display any windows on the screen, and if you simply run it, it will
1268 exit immediately.  The @file{Makefile} runs @file{temacs} with certain
1269 options that cause it to initialize itself, read in a number of basic
1270 Lisp files, and then dump itself out into a new executable called
1271 @file{xemacs}.  This new executable has been pre-initialized and
1272 contains pre-digested Lisp code that is necessary for the editor to
1273 function (this includes most basic editing functions,
1274 e.g. @code{kill-line}, that can be defined in terms of other Lisp
1275 primitives; some initialization code that is called when certain
1276 objects, such as frames, are created; and all of the standard
1277 keybindings and code for the actions they result in).  This executable,
1278 @file{xemacs}, is the executable that you run to use the XEmacs editor.
1279
1280 Although @file{temacs} is not intended to be run as an editor, it can,
1281 by using the incantation @code{temacs -batch -l loadup.el run-temacs}.
1282 This is useful when the dumping procedure described above is broken, or
1283 when using certain program debugging tools such as Purify.  These tools
1284 get mighty confused by the tricks played by the XEmacs build process,
1285 such as allocation memory in one process, and freeing it in the next.
1286
1287 @node XEmacs From the Inside, The XEmacs Object System (Abstractly Speaking), XEmacs From the Perspective of Building, Top
1288 @chapter XEmacs From the Inside
1289 @cindex XEmacs from the inside
1290 @cindex inside, XEmacs from the
1291
1292 Internally, XEmacs is quite complex, and can be very confusing.  To
1293 simplify things, it can be useful to think of XEmacs as containing an
1294 event loop that ``drives'' everything, and a number of other subsystems,
1295 such as a Lisp engine and a redisplay mechanism.  Each of these other
1296 subsystems exists simultaneously in XEmacs, and each has a certain
1297 state.  The flow of control continually passes in and out of these
1298 different subsystems in the course of normal operation of the editor.
1299
1300 It is important to keep in mind that, most of the time, the editor is
1301 ``driven'' by the event loop.  Except during initialization and batch
1302 mode, all subsystems are entered directly or indirectly through the
1303 event loop, and ultimately, control exits out of all subsystems back up
1304 to the event loop.  This cycle of entering a subsystem, exiting back out
1305 to the event loop, and starting another iteration of the event loop
1306 occurs once each keystroke, mouse motion, etc.
1307
1308 If you're trying to understand a particular subsystem (other than the
1309 event loop), think of it as a ``daemon'' process or ``servant'' that is
1310 responsible for one particular aspect of a larger system, and
1311 periodically receives commands or environment changes that cause it to
1312 do something.  Ultimately, these commands and environment changes are
1313 always triggered by the event loop.  For example:
1314
1315 @itemize @bullet
1316 @item
1317 The window and frame mechanism is responsible for keeping track of what
1318 windows and frames exist, what buffers are in them, etc.  It is
1319 periodically given commands (usually from the user) to make a change to
1320 the current window/frame state: i.e. create a new frame, delete a
1321 window, etc.
1322
1323 @item
1324 The buffer mechanism is responsible for keeping track of what buffers
1325 exist and what text is in them.  It is periodically given commands
1326 (usually from the user) to insert or delete text, create a buffer, etc.
1327 When it receives a text-change command, it notifies the redisplay
1328 mechanism.
1329
1330 @item
1331 The redisplay mechanism is responsible for making sure that windows and
1332 frames are displayed correctly.  It is periodically told (by the event
1333 loop) to actually ``do its job'', i.e. snoop around and see what the
1334 current state of the environment (mostly of the currently-existing
1335 windows, frames, and buffers) is, and make sure that that state matches
1336 what's actually displayed.  It keeps lots and lots of information around
1337 (such as what is actually being displayed currently, and what the
1338 environment was last time it checked) so that it can minimize the work
1339 it has to do.  It is also helped along in that whenever a relevant
1340 change to the environment occurs, the redisplay mechanism is told about
1341 this, so it has a pretty good idea of where it has to look to find
1342 possible changes and doesn't have to look everywhere.
1343
1344 @item
1345 The Lisp engine is responsible for executing the Lisp code in which most
1346 user commands are written.  It is entered through a call to @code{eval}
1347 or @code{funcall}, which occurs as a result of dispatching an event from
1348 the event loop.  The functions it calls issue commands to the buffer
1349 mechanism, the window/frame subsystem, etc.
1350
1351 @item
1352 The Lisp allocation subsystem is responsible for keeping track of Lisp
1353 objects.  It is given commands from the Lisp engine to allocate objects,
1354 garbage collect, etc.
1355 @end itemize
1356
1357 etc.
1358
1359   The important idea here is that there are a number of independent
1360 subsystems each with its own responsibility and persistent state, just
1361 like different employees in a company, and each subsystem is
1362 periodically given commands from other subsystems.  Commands can flow
1363 from any one subsystem to any other, but there is usually some sort of
1364 hierarchy, with all commands originating from the event subsystem.
1365
1366   XEmacs is entered in @code{main()}, which is in @file{emacs.c}.  When
1367 this is called the first time (in a properly-invoked @file{temacs}), it
1368 does the following:
1369
1370 @enumerate
1371 @item
1372 It does some very basic environment initializations, such as determining
1373 where it and its directories (e.g. @file{lisp/} and @file{etc/}) reside
1374 and setting up signal handlers.
1375 @item
1376 It initializes the entire Lisp interpreter.
1377 @item
1378 It sets the initial values of many built-in variables (including many
1379 variables that are visible to Lisp programs), such as the global keymap
1380 object and the built-in faces (a face is an object that describes the
1381 display characteristics of text).  This involves creating Lisp objects
1382 and thus is dependent on step (2).
1383 @item
1384 It performs various other initializations that are relevant to the
1385 particular environment it is running in, such as retrieving environment
1386 variables, determining the current date and the user who is running the
1387 program, examining its standard input, creating any necessary file
1388 descriptors, etc.
1389 @item
1390 At this point, the C initialization is complete.  A Lisp program that
1391 was specified on the command line (usually @file{loadup.el}) is called
1392 (temacs is normally invoked as @code{temacs -batch -l loadup.el dump}).
1393 @file{loadup.el} loads all of the other Lisp files that are needed for
1394 the operation of the editor, calls the @code{dump-emacs} function to
1395 write out @file{xemacs}, and then kills the temacs process.
1396 @end enumerate
1397
1398   When @file{xemacs} is then run, it only redoes steps (1) and (4)
1399 above; all variables already contain the values they were set to when
1400 the executable was dumped, and all memory that was allocated with
1401 @code{malloc()} is still around. (XEmacs knows whether it is being run
1402 as @file{xemacs} or @file{temacs} because it sets the global variable
1403 @code{initialized} to 1 after step (4) above.) At this point,
1404 @file{xemacs} calls a Lisp function to do any further initialization,
1405 which includes parsing the command-line (the C code can only do limited
1406 command-line parsing, which includes looking for the @samp{-batch} and
1407 @samp{-l} flags and a few other flags that it needs to know about before
1408 initialization is complete), creating the first frame (or @dfn{window}
1409 in standard window-system parlance), running the user's init file
1410 (usually the file @file{.emacs} in the user's home directory), etc.  The
1411 function to do this is usually called @code{normal-top-level};
1412 @file{loadup.el} tells the C code about this function by setting its
1413 name as the value of the Lisp variable @code{top-level}.
1414
1415   When the Lisp initialization code is done, the C code enters the event
1416 loop, and stays there for the duration of the XEmacs process.  The code
1417 for the event loop is contained in @file{cmdloop.c}, and is called
1418 @code{Fcommand_loop_1()}.  Note that this event loop could very well be
1419 written in Lisp, and in fact a Lisp version exists; but apparently,
1420 doing this makes XEmacs run noticeably slower.
1421
1422   Notice how much of the initialization is done in Lisp, not in C.
1423 In general, XEmacs tries to move as much code as is possible
1424 into Lisp.  Code that remains in C is code that implements the
1425 Lisp interpreter itself, or code that needs to be very fast, or
1426 code that needs to do system calls or other such stuff that
1427 needs to be done in C, or code that needs to have access to
1428 ``forbidden'' structures. (One conscious aspect of the design of
1429 Lisp under XEmacs is a clean separation between the external
1430 interface to a Lisp object's functionality and its internal
1431 implementation.  Part of this design is that Lisp programs
1432 are forbidden from accessing the contents of the object other
1433 than through using a standard API.  In this respect, XEmacs Lisp
1434 is similar to modern Lisp dialects but differs from GNU Emacs,
1435 which tends to expose the implementation and allow Lisp
1436 programs to look at it directly.  The major advantage of
1437 hiding the implementation is that it allows the implementation
1438 to be redesigned without affecting any Lisp programs, including
1439 those that might want to be ``clever'' by looking directly at
1440 the object's contents and possibly manipulating them.)
1441
1442   Moving code into Lisp makes the code easier to debug and maintain and
1443 makes it much easier for people who are not XEmacs developers to
1444 customize XEmacs, because they can make a change with much less chance
1445 of obscure and unwanted interactions occurring than if they were to
1446 change the C code.
1447
1448 @node The XEmacs Object System (Abstractly Speaking), How Lisp Objects Are Represented in C, XEmacs From the Inside, Top
1449 @chapter The XEmacs Object System (Abstractly Speaking)
1450 @cindex XEmacs object system (abstractly speaking), the
1451 @cindex object system (abstractly speaking), the XEmacs
1452
1453   At the heart of the Lisp interpreter is its management of objects.
1454 XEmacs Lisp contains many built-in objects, some of which are
1455 simple and others of which can be very complex; and some of which
1456 are very common, and others of which are rarely used or are only
1457 used internally. (Since the Lisp allocation system, with its
1458 automatic reclamation of unused storage, is so much more convenient
1459 than @code{malloc()} and @code{free()}, the C code makes extensive use of it
1460 in its internal operations.)
1461
1462   The basic Lisp objects are
1463
1464 @table @code
1465 @item integer
1466 28 or 31 bits of precision, or 60 or 63 bits on 64-bit machines; the
1467 reason for this is described below when the internal Lisp object
1468 representation is described.
1469 @item float
1470 Same precision as a double in C.
1471 @item cons
1472 A simple container for two Lisp objects, used to implement lists and
1473 most other data structures in Lisp.
1474 @item char
1475 An object representing a single character of text; chars behave like
1476 integers in many ways but are logically considered text rather than
1477 numbers and have a different read syntax. (the read syntax for a char
1478 contains the char itself or some textual encoding of it---for example,
1479 a Japanese Kanji character might be encoded as @samp{^[$(B#&^[(B} using the
1480 ISO-2022 encoding standard---rather than the numerical representation
1481 of the char; this way, if the mapping between chars and integers
1482 changes, which is quite possible for Kanji characters and other extended
1483 characters, the same character will still be created.  Note that some
1484 primitives confuse chars and integers.  The worst culprit is @code{eq},
1485 which makes a special exception and considers a char to be @code{eq} to
1486 its integer equivalent, even though in no other case are objects of two
1487 different types @code{eq}.  The reason for this monstrosity is
1488 compatibility with existing code; the separation of char from integer
1489 came fairly recently.)
1490 @item symbol
1491 An object that contains Lisp objects and is referred to by name;
1492 symbols are used to implement variables and named functions
1493 and to provide the equivalent of preprocessor constants in C.
1494 @item vector
1495 A one-dimensional array of Lisp objects providing constant-time access
1496 to any of the objects; access to an arbitrary object in a vector is
1497 faster than for lists, but the operations that can be done on a vector
1498 are more limited.
1499 @item string
1500 Self-explanatory; behaves much like a vector of chars
1501 but has a different read syntax and is stored and manipulated
1502 more compactly.
1503 @item bit-vector
1504 A vector of bits; similar to a string in spirit.
1505 @item compiled-function
1506 An object containing compiled Lisp code, known as @dfn{byte code}.
1507 @item subr
1508 A Lisp primitive, i.e. a Lisp-callable function implemented in C.
1509 @end table
1510
1511 @cindex closure
1512 Note that there is no basic ``function'' type, as in more powerful
1513 versions of Lisp (where it's called a @dfn{closure}).  XEmacs Lisp does
1514 not provide the closure semantics implemented by Common Lisp and Scheme.
1515 The guts of a function in XEmacs Lisp are represented in one of four
1516 ways: a symbol specifying another function (when one function is an
1517 alias for another), a list (whose first element must be the symbol
1518 @code{lambda}) containing the function's source code, a
1519 compiled-function object, or a subr object. (In other words, given a
1520 symbol specifying the name of a function, calling @code{symbol-function}
1521 to retrieve the contents of the symbol's function cell will return one
1522 of these types of objects.)
1523
1524 XEmacs Lisp also contains numerous specialized objects used to implement
1525 the editor:
1526
1527 @table @code
1528 @item buffer
1529 Stores text like a string, but is optimized for insertion and deletion
1530 and has certain other properties that can be set.
1531 @item frame
1532 An object with various properties whose displayable representation is a
1533 @dfn{window} in window-system parlance.
1534 @item window
1535 A section of a frame that displays the contents of a buffer;
1536 often called a @dfn{pane} in window-system parlance.
1537 @item window-configuration
1538 An object that represents a saved configuration of windows in a frame.
1539 @item device
1540 An object representing a screen on which frames can be displayed;
1541 equivalent to a @dfn{display} in the X Window System and a @dfn{TTY} in
1542 character mode.
1543 @item face
1544 An object specifying the appearance of text or graphics; it has
1545 properties such as font, foreground color, and background color.
1546 @item marker
1547 An object that refers to a particular position in a buffer and moves
1548 around as text is inserted and deleted to stay in the same relative
1549 position to the text around it.
1550 @item extent
1551 Similar to a marker but covers a range of text in a buffer; can also
1552 specify properties of the text, such as a face in which the text is to
1553 be displayed, whether the text is invisible or unmodifiable, etc.
1554 @item event
1555 Generated by calling @code{next-event} and contains information
1556 describing a particular event happening in the system, such as the user
1557 pressing a key or a process terminating.
1558 @item keymap
1559 An object that maps from events (described using lists, vectors, and
1560 symbols rather than with an event object because the mapping is for
1561 classes of events, rather than individual events) to functions to
1562 execute or other events to recursively look up; the functions are
1563 described by name, using a symbol, or using lists to specify the
1564 function's code.
1565 @item glyph
1566 An object that describes the appearance of an image (e.g.  pixmap) on
1567 the screen; glyphs can be attached to the beginning or end of extents
1568 and in some future version of XEmacs will be able to be inserted
1569 directly into a buffer.
1570 @item process
1571 An object that describes a connection to an externally-running process.
1572 @end table
1573
1574   There are some other, less-commonly-encountered general objects:
1575
1576 @table @code
1577 @item hash-table
1578 An object that maps from an arbitrary Lisp object to another arbitrary
1579 Lisp object, using hashing for fast lookup.
1580 @item obarray
1581 A limited form of hash-table that maps from strings to symbols; obarrays
1582 are used to look up a symbol given its name and are not actually their
1583 own object type but are kludgily represented using vectors with hidden
1584 fields (this representation derives from GNU Emacs).
1585 @item specifier
1586 A complex object used to specify the value of a display property; a
1587 default value is given and different values can be specified for
1588 particular frames, buffers, windows, devices, or classes of device.
1589 @item char-table
1590 An object that maps from chars or classes of chars to arbitrary Lisp
1591 objects; internally char tables use a complex nested-vector
1592 representation that is optimized to the way characters are represented
1593 as integers.
1594 @item range-table
1595 An object that maps from ranges of integers to arbitrary Lisp objects.
1596 @end table
1597
1598   And some strange special-purpose objects:
1599
1600 @table @code
1601 @item charset
1602 @itemx coding-system
1603 Objects used when MULE, or multi-lingual/Asian-language, support is
1604 enabled.
1605 @item color-instance
1606 @itemx font-instance
1607 @itemx image-instance
1608 An object that encapsulates a window-system resource; instances are
1609 mostly used internally but are exposed on the Lisp level for cleanness
1610 of the specifier model and because it's occasionally useful for Lisp
1611 program to create or query the properties of instances.
1612 @item subwindow
1613 An object that encapsulate a @dfn{subwindow} resource, i.e. a
1614 window-system child window that is drawn into by an external process;
1615 this object should be integrated into the glyph system but isn't yet,
1616 and may change form when this is done.
1617 @item tooltalk-message
1618 @itemx tooltalk-pattern
1619 Objects that represent resources used in the ToolTalk interprocess
1620 communication protocol.
1621 @item toolbar-button
1622 An object used in conjunction with the toolbar.
1623 @end table
1624
1625   And objects that are only used internally:
1626
1627 @table @code
1628 @item opaque
1629 A generic object for encapsulating arbitrary memory; this allows you the
1630 generality of @code{malloc()} and the convenience of the Lisp object
1631 system.
1632 @item lstream
1633 A buffering I/O stream, used to provide a unified interface to anything
1634 that can accept output or provide input, such as a file descriptor, a
1635 stdio stream, a chunk of memory, a Lisp buffer, a Lisp string, etc.;
1636 it's a Lisp object to make its memory management more convenient.
1637 @item char-table-entry
1638 Subsidiary objects in the internal char-table representation.
1639 @item extent-auxiliary
1640 @itemx menubar-data
1641 @itemx toolbar-data
1642 Various special-purpose objects that are basically just used to
1643 encapsulate memory for particular subsystems, similar to the more
1644 general ``opaque'' object.
1645 @item symbol-value-forward
1646 @itemx symbol-value-buffer-local
1647 @itemx symbol-value-varalias
1648 @itemx symbol-value-lisp-magic
1649 Special internal-only objects that are placed in the value cell of a
1650 symbol to indicate that there is something special with this variable --
1651 e.g. it has no value, it mirrors another variable, or it mirrors some C
1652 variable; there is really only one kind of object, called a
1653 @dfn{symbol-value-magic}, but it is sort-of halfway kludged into
1654 semi-different object types.
1655 @end table
1656
1657 @cindex permanent objects
1658 @cindex temporary objects
1659   Some types of objects are @dfn{permanent}, meaning that once created,
1660 they do not disappear until explicitly destroyed, using a function such
1661 as @code{delete-buffer}, @code{delete-window}, @code{delete-frame}, etc.
1662 Others will disappear once they are not longer used, through the garbage
1663 collection mechanism.  Buffers, frames, windows, devices, and processes
1664 are among the objects that are permanent.  Note that some objects can go
1665 both ways: Faces can be created either way; extents are normally
1666 permanent, but detached extents (extents not referring to any text, as
1667 happens to some extents when the text they are referring to is deleted)
1668 are temporary.  Note that some permanent objects, such as faces and
1669 coding systems, cannot be deleted.  Note also that windows are unique in
1670 that they can be @emph{undeleted} after having previously been
1671 deleted. (This happens as a result of restoring a window configuration.)
1672
1673 @cindex read syntax
1674   Note that many types of objects have a @dfn{read syntax}, i.e. a way of
1675 specifying an object of that type in Lisp code.  When you load a Lisp
1676 file, or type in code to be evaluated, what really happens is that the
1677 function @code{read} is called, which reads some text and creates an object
1678 based on the syntax of that text; then @code{eval} is called, which
1679 possibly does something special; then this loop repeats until there's
1680 no more text to read. (@code{eval} only actually does something special
1681 with symbols, which causes the symbol's value to be returned,
1682 similar to referencing a variable; and with conses [i.e. lists],
1683 which cause a function invocation.  All other values are returned
1684 unchanged.)
1685
1686   The read syntax
1687
1688 @example
1689 17297
1690 @end example
1691
1692 converts to an integer whose value is 17297.
1693
1694 @example
1695 1.983e-4
1696 @end example
1697
1698 converts to a float whose value is 1.983e-4, or .0001983.
1699
1700 @example
1701 ?b
1702 @end example
1703
1704 converts to a char that represents the lowercase letter b.
1705
1706 @example
1707 ?^[$(B#&^[(B
1708 @end example
1709
1710 (where @samp{^[} actually is an @samp{ESC} character) converts to a
1711 particular Kanji character when using an ISO2022-based coding system for
1712 input. (To decode this goo: @samp{ESC} begins an escape sequence;
1713 @samp{ESC $ (} is a class of escape sequences meaning ``switch to a
1714 94x94 character set''; @samp{ESC $ ( B} means ``switch to Japanese
1715 Kanji''; @samp{#} and @samp{&} collectively index into a 94-by-94 array
1716 of characters [subtract 33 from the ASCII value of each character to get
1717 the corresponding index]; @samp{ESC (} is a class of escape sequences
1718 meaning ``switch to a 94 character set''; @samp{ESC (B} means ``switch
1719 to US ASCII''.  It is a coincidence that the letter @samp{B} is used to
1720 denote both Japanese Kanji and US ASCII.  If the first @samp{B} were
1721 replaced with an @samp{A}, you'd be requesting a Chinese Hanzi character
1722 from the GB2312 character set.)
1723
1724 @example
1725 "foobar"
1726 @end example
1727
1728 converts to a string.
1729
1730 @example
1731 foobar
1732 @end example
1733
1734 converts to a symbol whose name is @code{"foobar"}.  This is done by
1735 looking up the string equivalent in the global variable
1736 @code{obarray}, whose contents should be an obarray.  If no symbol
1737 is found, a new symbol with the name @code{"foobar"} is automatically
1738 created and added to @code{obarray}; this process is called
1739 @dfn{interning} the symbol.
1740 @cindex interning
1741
1742 @example
1743 (foo . bar)
1744 @end example
1745
1746 converts to a cons cell containing the symbols @code{foo} and @code{bar}.
1747
1748 @example
1749 (1 a 2.5)
1750 @end example
1751
1752 converts to a three-element list containing the specified objects
1753 (note that a list is actually a set of nested conses; see the
1754 XEmacs Lisp Reference).
1755
1756 @example
1757 [1 a 2.5]
1758 @end example
1759
1760 converts to a three-element vector containing the specified objects.
1761
1762 @example
1763 #[... ... ... ...]
1764 @end example
1765
1766 converts to a compiled-function object (the actual contents are not
1767 shown since they are not relevant here; look at a file that ends with
1768 @file{.elc} for examples).
1769
1770 @example
1771 #*01110110
1772 @end example
1773
1774 converts to a bit-vector.
1775
1776 @example
1777 #s(hash-table ... ...)
1778 @end example
1779
1780 converts to a hash table (the actual contents are not shown).
1781
1782 @example
1783 #s(range-table ... ...)
1784 @end example
1785
1786 converts to a range table (the actual contents are not shown).
1787
1788 @example
1789 #s(char-table ... ...)
1790 @end example
1791
1792 converts to a char table (the actual contents are not shown).
1793
1794 Note that the @code{#s()} syntax is the general syntax for structures,
1795 which are not really implemented in XEmacs Lisp but should be.
1796
1797 When an object is printed out (using @code{print} or a related
1798 function), the read syntax is used, so that the same object can be read
1799 in again.
1800
1801 The other objects do not have read syntaxes, usually because it does not
1802 really make sense to create them in this fashion (i.e.  processes, where
1803 it doesn't make sense to have a subprocess created as a side effect of
1804 reading some Lisp code), or because they can't be created at all
1805 (e.g. subrs).  Permanent objects, as a rule, do not have a read syntax;
1806 nor do most complex objects, which contain too much state to be easily
1807 initialized through a read syntax.
1808
1809 @node How Lisp Objects Are Represented in C, Rules When Writing New C Code, The XEmacs Object System (Abstractly Speaking), Top
1810 @chapter How Lisp Objects Are Represented in C
1811 @cindex Lisp objects are represented in C, how
1812 @cindex objects are represented in C, how Lisp
1813 @cindex represented in C, how Lisp objects are
1814
1815 Lisp objects are represented in C using a 32-bit or 64-bit machine word
1816 (depending on the processor; i.e. DEC Alphas use 64-bit Lisp objects and
1817 most other processors use 32-bit Lisp objects).  The representation
1818 stuffs a pointer together with a tag, as follows:
1819
1820 @example
1821  [ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ]
1822  [ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ]
1823
1824    <---------------------------------------------------------> <->
1825             a pointer to a structure, or an integer            tag
1826 @end example
1827
1828 A tag of 00 is used for all pointer object types, a tag of 10 is used
1829 for characters, and the other two tags 01 and 11 are joined together to
1830 form the integer object type.  This representation gives us 31 bit
1831 integers and 30 bit characters, while pointers are represented directly
1832 without any bit masking or shifting.  This representation, though,
1833 assumes that pointers to structs are always aligned to multiples of 4,
1834 so the lower 2 bits are always zero.
1835
1836 Lisp objects use the typedef @code{Lisp_Object}, but the actual C type
1837 used for the Lisp object can vary.  It can be either a simple type
1838 (@code{long} on the DEC Alpha, @code{int} on other machines) or a
1839 structure whose fields are bit fields that line up properly (actually, a
1840 union of structures is used).  Generally the simple integral type is
1841 preferable because it ensures that the compiler will actually use a
1842 machine word to represent the object (some compilers will use more
1843 general and less efficient code for unions and structs even if they can
1844 fit in a machine word).  The union type, however, has the advantage of
1845 stricter type checking.  If you accidentally pass an integer where a Lisp
1846 object is desired, you get a compile error.  The choice of which type
1847 to use is determined by the preprocessor constant @code{USE_UNION_TYPE}
1848 which is defined via the @code{--use-union-type} option to
1849 @code{configure}.
1850
1851 Various macros are used to convert between Lisp_Objects and the
1852 corresponding C type.  Macros of the form @code{XINT()}, @code{XCHAR()},
1853 @code{XSTRING()}, @code{XSYMBOL()}, do any required bit shifting and/or
1854 masking and cast it to the appropriate type.  @code{XINT()} needs to be
1855 a bit tricky so that negative numbers are properly sign-extended.  Since
1856 integers are stored left-shifted, if the right-shift operator does an
1857 arithmetic shift (i.e. it leaves the most-significant bit as-is rather
1858 than shifting in a zero, so that it mimics a divide-by-two even for
1859 negative numbers) the shift to remove the tag bit is enough.  This is
1860 the case on all the systems we support.
1861
1862 Note that when @code{ERROR_CHECK_TYPECHECK} is defined, the converter
1863 macros become more complicated---they check the tag bits and/or the
1864 type field in the first four bytes of a record type to ensure that the
1865 object is really of the correct type.  This is great for catching places
1866 where an incorrect type is being dereferenced---this typically results
1867 in a pointer being dereferenced as the wrong type of structure, with
1868 unpredictable (and sometimes not easily traceable) results.
1869
1870 There are similar @code{XSET@var{TYPE}()} macros that construct a Lisp
1871 object.  These macros are of the form @code{XSET@var{TYPE}
1872 (@var{lvalue}, @var{result})}, i.e. they have to be a statement rather
1873 than just used in an expression.  The reason for this is that standard C
1874 doesn't let you ``construct'' a structure (but GCC does).  Granted, this
1875 sometimes isn't too convenient; for the case of integers, at least, you
1876 can use the function @code{make_int()}, which constructs and
1877 @emph{returns} an integer Lisp object.  Note that the
1878 @code{XSET@var{TYPE}()} macros are also affected by
1879 @code{ERROR_CHECK_TYPECHECK} and make sure that the structure is of the
1880 right type in the case of record types, where the type is contained in
1881 the structure.
1882
1883 The C programmer is responsible for @strong{guaranteeing} that a
1884 Lisp_Object is the correct type before using the @code{X@var{TYPE}}
1885 macros.  This is especially important in the case of lists.  Use
1886 @code{XCAR} and @code{XCDR} if a Lisp_Object is certainly a cons cell,
1887 else use @code{Fcar()} and @code{Fcdr()}.  Trust other C code, but not
1888 Lisp code.  On the other hand, if XEmacs has an internal logic error,
1889 it's better to crash immediately, so sprinkle @code{assert()}s and
1890 ``unreachable'' @code{abort()}s liberally about the source code.  Where
1891 performance is an issue, use @code{type_checking_assert},
1892 @code{bufpos_checking_assert}, and @code{gc_checking_assert}, which do
1893 nothing unless the corresponding configure error checking flag was
1894 specified.
1895
1896 @node Rules When Writing New C Code, A Summary of the Various XEmacs Modules, How Lisp Objects Are Represented in C, Top
1897 @chapter Rules When Writing New C Code
1898 @cindex writing new C code, rules when
1899 @cindex C code, rules when writing new
1900 @cindex code, rules when writing new C
1901
1902 The XEmacs C Code is extremely complex and intricate, and there are many
1903 rules that are more or less consistently followed throughout the code.
1904 Many of these rules are not obvious, so they are explained here.  It is
1905 of the utmost importance that you follow them.  If you don't, you may
1906 get something that appears to work, but which will crash in odd
1907 situations, often in code far away from where the actual breakage is.
1908
1909 @menu
1910 * General Coding Rules::
1911 * Writing Lisp Primitives::
1912 * Writing Good Comments::
1913 * Adding Global Lisp Variables::
1914 * Proper Use of Unsigned Types::
1915 * Coding for Mule::
1916 * Techniques for XEmacs Developers::
1917 @end menu
1918
1919 @node General Coding Rules
1920 @section General Coding Rules
1921 @cindex coding rules, general
1922
1923 The C code is actually written in a dialect of C called @dfn{Clean C},
1924 meaning that it can be compiled, mostly warning-free, with either a C or
1925 C++ compiler.  Coding in Clean C has several advantages over plain C.
1926 C++ compilers are more nit-picking, and a number of coding errors have
1927 been found by compiling with C++.  The ability to use both C and C++
1928 tools means that a greater variety of development tools are available to
1929 the developer.
1930
1931 Every module includes @file{<config.h>} (angle brackets so that
1932 @samp{--srcdir} works correctly; @file{config.h} may or may not be in
1933 the same directory as the C sources) and @file{lisp.h}.  @file{config.h}
1934 must always be included before any other header files (including
1935 system header files) to ensure that certain tricks played by various
1936 @file{s/} and @file{m/} files work out correctly.
1937
1938 When including header files, always use angle brackets, not double
1939 quotes, except when the file to be included is always in the same
1940 directory as the including file.  If either file is a generated file,
1941 then that is not likely to be the case.  In order to understand why we
1942 have this rule, imagine what happens when you do a build in the source
1943 directory using @samp{./configure} and another build in another
1944 directory using @samp{../work/configure}.  There will be two different
1945 @file{config.h} files.  Which one will be used if you @samp{#include
1946 "config.h"}?
1947
1948 Almost every module contains a @code{syms_of_*()} function and a
1949 @code{vars_of_*()} function.  The former declares any Lisp primitives
1950 you have defined and defines any symbols you will be using.  The latter
1951 declares any global Lisp variables you have added and initializes global
1952 C variables in the module.  @strong{Important}: There are stringent
1953 requirements on exactly what can go into these functions.  See the
1954 comment in @file{emacs.c}.  The reason for this is to avoid obscure
1955 unwanted interactions during initialization.  If you don't follow these
1956 rules, you'll be sorry!  If you want to do anything that isn't allowed,
1957 create a @code{complex_vars_of_*()} function for it.  Doing this is
1958 tricky, though: you have to make sure your function is called at the
1959 right time so that all the initialization dependencies work out.
1960
1961 Declare each function of these kinds in @file{symsinit.h}.  Make sure
1962 it's called in the appropriate place in @file{emacs.c}.  You never need
1963 to include @file{symsinit.h} directly, because it is included by
1964 @file{lisp.h}.
1965
1966 @strong{All global and static variables that are to be modifiable must
1967 be declared uninitialized.}  This means that you may not use the
1968 ``declare with initializer'' form for these variables, such as @code{int
1969 some_variable = 0;}.  The reason for this has to do with some kludges
1970 done during the dumping process: If possible, the initialized data
1971 segment is re-mapped so that it becomes part of the (unmodifiable) code
1972 segment in the dumped executable.  This allows this memory to be shared
1973 among multiple running XEmacs processes.  XEmacs is careful to place as
1974 much constant data as possible into initialized variables during the
1975 @file{temacs} phase.
1976
1977 @cindex copy-on-write
1978 @strong{Please note:} This kludge only works on a few systems nowadays,
1979 and is rapidly becoming irrelevant because most modern operating systems
1980 provide @dfn{copy-on-write} semantics.  All data is initially shared
1981 between processes, and a private copy is automatically made (on a
1982 page-by-page basis) when a process first attempts to write to a page of
1983 memory.
1984
1985 Formerly, there was a requirement that static variables not be declared
1986 inside of functions.  This had to do with another hack along the same
1987 vein as what was just described: old USG systems put statically-declared
1988 variables in the initialized data space, so those header files had a
1989 @code{#define static} declaration. (That way, the data-segment remapping
1990 described above could still work.) This fails badly on static variables
1991 inside of functions, which suddenly become automatic variables;
1992 therefore, you weren't supposed to have any of them.  This awful kludge
1993 has been removed in XEmacs because
1994
1995 @enumerate
1996 @item
1997 almost all of the systems that used this kludge ended up having
1998 to disable the data-segment remapping anyway;
1999 @item
2000 the only systems that didn't were extremely outdated ones;
2001 @item
2002 this hack completely messed up inline functions.
2003 @end enumerate
2004
2005 The C source code makes heavy use of C preprocessor macros.  One popular
2006 macro style is:
2007
2008 @example
2009 #define FOO(var, value) do @{            \
2010   Lisp_Object FOO_value = (value);      \
2011   ... /* compute using FOO_value */     \
2012   (var) = bar;                          \
2013 @} while (0)
2014 @end example
2015
2016 The @code{do @{...@} while (0)} is a standard trick to allow FOO to have
2017 statement semantics, so that it can safely be used within an @code{if}
2018 statement in C, for example.  Multiple evaluation is prevented by
2019 copying a supplied argument into a local variable, so that
2020 @code{FOO(var,fun(1))} only calls @code{fun} once.
2021
2022 Lisp lists are popular data structures in the C code as well as in
2023 Elisp.  There are two sets of macros that iterate over lists.
2024 @code{EXTERNAL_LIST_LOOP_@var{n}} should be used when the list has been
2025 supplied by the user, and cannot be trusted to be acyclic and
2026 @code{nil}-terminated.  A @code{malformed-list} or @code{circular-list} error
2027 will be generated if the list being iterated over is not entirely
2028 kosher.  @code{LIST_LOOP_@var{n}}, on the other hand, is faster and less
2029 safe, and can be used only on trusted lists.
2030
2031 Related macros are @code{GET_EXTERNAL_LIST_LENGTH} and
2032 @code{GET_LIST_LENGTH}, which calculate the length of a list, and in the
2033 case of @code{GET_EXTERNAL_LIST_LENGTH}, validating the properness of
2034 the list.  The macros @code{EXTERNAL_LIST_LOOP_DELETE_IF} and
2035 @code{LIST_LOOP_DELETE_IF} delete elements from a lisp list satisfying some
2036 predicate.
2037
2038 @node Writing Lisp Primitives
2039 @section Writing Lisp Primitives
2040 @cindex writing Lisp primitives
2041 @cindex Lisp primitives, writing
2042 @cindex primitives, writing Lisp
2043
2044 Lisp primitives are Lisp functions implemented in C.  The details of
2045 interfacing the C function so that Lisp can call it are handled by a few
2046 C macros.  The only way to really understand how to write new C code is
2047 to read the source, but we can explain some things here.
2048
2049 An example of a special form is the definition of @code{prog1}, from
2050 @file{eval.c}.  (An ordinary function would have the same general
2051 appearance.)
2052
2053 @cindex garbage collection protection
2054 @smallexample
2055 @group
2056 DEFUN ("prog1", Fprog1, 1, UNEVALLED, 0, /*
2057 Similar to `progn', but the value of the first form is returned.
2058 \(prog1 FIRST BODY...): All the arguments are evaluated sequentially.
2059 The value of FIRST is saved during evaluation of the remaining args,
2060 whose values are discarded.
2061 */
2062        (args))
2063 @{
2064   /* This function can GC */
2065   REGISTER Lisp_Object val, form, tail;
2066   struct gcpro gcpro1;
2067
2068   val = Feval (XCAR (args));
2069
2070   GCPRO1 (val);
2071
2072   LIST_LOOP_3 (form, XCDR (args), tail)
2073     Feval (form);
2074
2075   UNGCPRO;
2076   return val;
2077 @}
2078 @end group
2079 @end smallexample
2080
2081   Let's start with a precise explanation of the arguments to the
2082 @code{DEFUN} macro.  Here is a template for them:
2083
2084 @example
2085 @group
2086 DEFUN (@var{lname}, @var{fname}, @var{min_args}, @var{max_args}, @var{interactive}, /*
2087 @var{docstring}
2088 */
2089    (@var{arglist}))
2090 @end group
2091 @end example
2092
2093 @table @var
2094 @item lname
2095 This string is the name of the Lisp symbol to define as the function
2096 name; in the example above, it is @code{"prog1"}.
2097
2098 @item fname
2099 This is the C function name for this function.  This is the name that is
2100 used in C code for calling the function.  The name is, by convention,
2101 @samp{F} prepended to the Lisp name, with all dashes (@samp{-}) in the
2102 Lisp name changed to underscores.  Thus, to call this function from C
2103 code, call @code{Fprog1}.  Remember that the arguments are of type
2104 @code{Lisp_Object}; various macros and functions for creating values of
2105 type @code{Lisp_Object} are declared in the file @file{lisp.h}.
2106
2107 Primitives whose names are special characters (e.g. @code{+} or
2108 @code{<}) are named by spelling out, in some fashion, the special
2109 character: e.g. @code{Fplus()} or @code{Flss()}.  Primitives whose names
2110 begin with normal alphanumeric characters but also contain special
2111 characters are spelled out in some creative way, e.g. @code{let*}
2112 becomes @code{FletX()}.
2113
2114 Each function also has an associated structure that holds the data for
2115 the subr object that represents the function in Lisp.  This structure
2116 conveys the Lisp symbol name to the initialization routine that will
2117 create the symbol and store the subr object as its definition.  The C
2118 variable name of this structure is always @samp{S} prepended to the
2119 @var{fname}.  You hardly ever need to be aware of the existence of this
2120 structure, since @code{DEFUN} plus @code{DEFSUBR} takes care of all the
2121 details.
2122
2123 @item min_args
2124 This is the minimum number of arguments that the function requires.  The
2125 function @code{prog1} allows a minimum of one argument.
2126
2127 @item max_args
2128 This is the maximum number of arguments that the function accepts, if
2129 there is a fixed maximum.  Alternatively, it can be @code{UNEVALLED},
2130 indicating a special form that receives unevaluated arguments, or
2131 @code{MANY}, indicating an unlimited number of evaluated arguments (the
2132 C equivalent of @code{&rest}).  Both @code{UNEVALLED} and @code{MANY}
2133 are macros.  If @var{max_args} is a number, it may not be less than
2134 @var{min_args} and it may not be greater than 8. (If you need to add a
2135 function with more than 8 arguments, use the @code{MANY} form.  Resist
2136 the urge to edit the definition of @code{DEFUN} in @file{lisp.h}.  If
2137 you do it anyways, make sure to also add another clause to the switch
2138 statement in @code{primitive_funcall().})
2139
2140 @item interactive
2141 This is an interactive specification, a string such as might be used as
2142 the argument of @code{interactive} in a Lisp function.  In the case of
2143 @code{prog1}, it is 0 (a null pointer), indicating that @code{prog1}
2144 cannot be called interactively.  A value of @code{""} indicates a
2145 function that should receive no arguments when called interactively.
2146
2147 @item docstring
2148 This is the documentation string.  It is written just like a
2149 documentation string for a function defined in Lisp; in particular, the
2150 first line should be a single sentence.  Note how the documentation
2151 string is enclosed in a comment, none of the documentation is placed on
2152 the same lines as the comment-start and comment-end characters, and the
2153 comment-start characters are on the same line as the interactive
2154 specification.  @file{make-docfile}, which scans the C files for
2155 documentation strings, is very particular about what it looks for, and
2156 will not properly extract the doc string if it's not in this exact format.
2157
2158 In order to make both @file{etags} and @file{make-docfile} happy, make
2159 sure that the @code{DEFUN} line contains the @var{lname} and
2160 @var{fname}, and that the comment-start characters for the doc string
2161 are on the same line as the interactive specification, and put a newline
2162 directly after them (and before the comment-end characters).
2163
2164 @item arglist
2165 This is the comma-separated list of arguments to the C function.  For a
2166 function with a fixed maximum number of arguments, provide a C argument
2167 for each Lisp argument.  In this case, unlike regular C functions, the
2168 types of the arguments are not declared; they are simply always of type
2169 @code{Lisp_Object}.
2170
2171 The names of the C arguments will be used as the names of the arguments
2172 to the Lisp primitive as displayed in its documentation, modulo the same
2173 concerns described above for @code{F...} names (in particular,
2174 underscores in the C arguments become dashes in the Lisp arguments).
2175
2176 There is one additional kludge: A trailing `_' on the C argument is
2177 discarded when forming the Lisp argument.  This allows C language
2178 reserved words (like @code{default}) or global symbols (like
2179 @code{dirname}) to be used as argument names without compiler warnings
2180 or errors.
2181
2182 A Lisp function with @w{@var{max_args} = @code{UNEVALLED}} is a
2183 @w{@dfn{special form}}; its arguments are not evaluated.  Instead it
2184 receives one argument of type @code{Lisp_Object}, a (Lisp) list of the
2185 unevaluated arguments, conventionally named @code{(args)}.
2186
2187 When a Lisp function has no upper limit on the number of arguments,
2188 specify @w{@var{max_args} = @code{MANY}}.  In this case its implementation in
2189 C actually receives exactly two arguments: the number of Lisp arguments
2190 (an @code{int}) and the address of a block containing their values (a
2191 @w{@code{Lisp_Object *}}).  In this case only are the C types specified
2192 in the @var{arglist}: @w{@code{(int nargs, Lisp_Object *args)}}.
2193
2194 @end table
2195
2196 Within the function @code{Fprog1} itself, note the use of the macros
2197 @code{GCPRO1} and @code{UNGCPRO}.  @code{GCPRO1} is used to ``protect''
2198 a variable from garbage collection---to inform the garbage collector
2199 that it must look in that variable and regard the object pointed at by
2200 its contents as an accessible object.  This is necessary whenever you
2201 call @code{Feval} or anything that can directly or indirectly call
2202 @code{Feval} (this includes the @code{QUIT} macro!).  At such a time,
2203 any Lisp object that you intend to refer to again must be protected
2204 somehow.  @code{UNGCPRO} cancels the protection of the variables that
2205 are protected in the current function.  It is necessary to do this
2206 explicitly.
2207
2208 The macro @code{GCPRO1} protects just one local variable.  If you want
2209 to protect two, use @code{GCPRO2} instead; repeating @code{GCPRO1} will
2210 not work.  Macros @code{GCPRO3} and @code{GCPRO4} also exist.
2211
2212 These macros implicitly use local variables such as @code{gcpro1}; you
2213 must declare these explicitly, with type @code{struct gcpro}.  Thus, if
2214 you use @code{GCPRO2}, you must declare @code{gcpro1} and @code{gcpro2}.
2215
2216 @cindex caller-protects (@code{GCPRO} rule)
2217 Note also that the general rule is @dfn{caller-protects}; i.e. you are
2218 only responsible for protecting those Lisp objects that you create.  Any
2219 objects passed to you as arguments should have been protected by whoever
2220 created them, so you don't in general have to protect them.
2221
2222 In particular, the arguments to any Lisp primitive are always
2223 automatically @code{GCPRO}ed, when called ``normally'' from Lisp code or
2224 bytecode.  So only a few Lisp primitives that are called frequently from
2225 C code, such as @code{Fprogn} protect their arguments as a service to
2226 their caller.  You don't need to protect your arguments when writing a
2227 new @code{DEFUN}.
2228
2229 @code{GCPRO}ing is perhaps the trickiest and most error-prone part of
2230 XEmacs coding.  It is @strong{extremely} important that you get this
2231 right and use a great deal of discipline when writing this code.
2232 @xref{GCPROing, ,@code{GCPRO}ing}, for full details on how to do this.
2233
2234 What @code{DEFUN} actually does is declare a global structure of type
2235 @code{Lisp_Subr} whose name begins with capital @samp{SF} and which
2236 contains information about the primitive (e.g. a pointer to the
2237 function, its minimum and maximum allowed arguments, a string describing
2238 its Lisp name); @code{DEFUN} then begins a normal C function declaration
2239 using the @code{F...} name.  The Lisp subr object that is the function
2240 definition of a primitive (i.e. the object in the function slot of the
2241 symbol that names the primitive) actually points to this @samp{SF}
2242 structure; when @code{Feval} encounters a subr, it looks in the
2243 structure to find out how to call the C function.
2244
2245 Defining the C function is not enough to make a Lisp primitive
2246 available; you must also create the Lisp symbol for the primitive (the
2247 symbol is @dfn{interned}; @pxref{Obarrays}) and store a suitable subr
2248 object in its function cell. (If you don't do this, the primitive won't
2249 be seen by Lisp code.) The code looks like this:
2250
2251 @example
2252 DEFSUBR (@var{fname});
2253 @end example
2254
2255 @noindent
2256 Here @var{fname} is the same name you used as the second argument to
2257 @code{DEFUN}.
2258
2259 This call to @code{DEFSUBR} should go in the @code{syms_of_*()} function
2260 at the end of the module.  If no such function exists, create it and
2261 make sure to also declare it in @file{symsinit.h} and call it from the
2262 appropriate spot in @code{main()}.  @xref{General Coding Rules}.
2263
2264 Note that C code cannot call functions by name unless they are defined
2265 in C.  The way to call a function written in Lisp from C is to use
2266 @code{Ffuncall}, which embodies the Lisp function @code{funcall}.  Since
2267 the Lisp function @code{funcall} accepts an unlimited number of
2268 arguments, in C it takes two: the number of Lisp-level arguments, and a
2269 one-dimensional array containing their values.  The first Lisp-level
2270 argument is the Lisp function to call, and the rest are the arguments to
2271 pass to it.  Since @code{Ffuncall} can call the evaluator, you must
2272 protect pointers from garbage collection around the call to
2273 @code{Ffuncall}. (However, @code{Ffuncall} explicitly protects all of
2274 its parameters, so you don't have to protect any pointers passed as
2275 parameters to it.)
2276
2277 The C functions @code{call0}, @code{call1}, @code{call2}, and so on,
2278 provide handy ways to call a Lisp function conveniently with a fixed
2279 number of arguments.  They work by calling @code{Ffuncall}.
2280
2281 @file{eval.c} is a very good file to look through for examples;
2282 @file{lisp.h} contains the definitions for important macros and
2283 functions.
2284
2285 @node Writing Good Comments
2286 @section Writing Good Comments
2287 @cindex writing good comments
2288 @cindex comments, writing good
2289
2290 Comments are a lifeline for programmers trying to understand tricky
2291 code.  In general, the less obvious it is what you are doing, the more
2292 you need a comment, and the more detailed it needs to be.  You should
2293 always be on guard when you're writing code for stuff that's tricky, and
2294 should constantly be putting yourself in someone else's shoes and asking
2295 if that person could figure out without much difficulty what's going
2296 on. (Assume they are a competent programmer who understands the
2297 essentials of how the XEmacs code is structured but doesn't know much
2298 about the module you're working on or any algorithms you're using.) If
2299 you're not sure whether they would be able to, add a comment.  Always
2300 err on the side of more comments, rather than less.
2301
2302 Generally, when making comments, there is no need to attribute them with
2303 your name or initials.  This especially goes for small,
2304 easy-to-understand, non-opinionated ones.  Also, comments indicating
2305 where, when, and by whom a file was changed are @emph{strongly}
2306 discouraged, and in general will be removed as they are discovered.
2307 This is exactly what @file{ChangeLogs} are there for.  However, it can
2308 occasionally be useful to mark exactly where (but not when or by whom)
2309 changes are made, particularly when making small changes to a file
2310 imported from elsewhere.  These marks help when later on a newer version
2311 of the file is imported and the changes need to be merged. (If
2312 everything were always kept in CVS, there would be no need for this.
2313 But in practice, this often doesn't happen, or the CVS repository is
2314 later on lost or unavailable to the person doing the update.)
2315
2316 When putting in an explicit opinion in a comment, you should
2317 @emph{always} attribute it with your name, and optionally the date.
2318 This also goes for long, complex comments explaining in detail the
2319 workings of something -- by putting your name there, you make it
2320 possible for someone who has questions about how that thing works to
2321 determine who wrote the comment so they can write to them.  Preferably,
2322 use your actual name and not your initials, unless your initials are
2323 generally recognized (e.g. @samp{jwz}).  You can use only your first
2324 name if it's obvious who you are; otherwise, give first and last name.
2325 If you're not a regular contributor, you might consider putting your
2326 email address in -- it may be in the ChangeLog, but after awhile
2327 ChangeLogs have a tendency of disappearing or getting
2328 muddled. (E.g. your comment may get copied somewhere else or even into
2329 another program, and tracking down the proper ChangeLog may be very
2330 difficult.)
2331
2332 If you come across an opinion that is not or no longer valid, or you
2333 come across any comment that no longer applies but you want to keep it
2334 around, enclose it in @samp{[[ } and @samp{ ]]} marks and add a comment
2335 afterwards explaining why the preceding comment is no longer valid.  Put
2336 your name on this comment, as explained above.
2337
2338 Just as comments are a lifeline to programmers, incorrect comments are
2339 death.  If you come across an incorrect comment, @strong{immediately}
2340 correct it or flag it as incorrect, as described in the previous
2341 paragraph.  Whenever you work on a section of code, @emph{always} make
2342 sure to update any comments to be correct -- or, at the very least, flag
2343 them as incorrect.
2344
2345 To indicate a "todo" or other problem, use four pound signs --
2346 i.e. @samp{####}.
2347
2348 @node Adding Global Lisp Variables
2349 @section Adding Global Lisp Variables
2350 @cindex global Lisp variables, adding
2351 @cindex variables, adding global Lisp
2352
2353 Global variables whose names begin with @samp{Q} are constants whose
2354 value is a symbol of a particular name.  The name of the variable should
2355 be derived from the name of the symbol using the same rules as for Lisp
2356 primitives.  These variables are initialized using a call to
2357 @code{defsymbol()} in the @code{syms_of_*()} function. (This call
2358 interns a symbol, sets the C variable to the resulting Lisp object, and
2359 calls @code{staticpro()} on the C variable to tell the
2360 garbage-collection mechanism about this variable.  What
2361 @code{staticpro()} does is add a pointer to the variable to a large
2362 global array; when garbage-collection happens, all pointers listed in
2363 the array are used as starting points for marking Lisp objects.  This is
2364 important because it's quite possible that the only current reference to
2365 the object is the C variable.  In the case of symbols, the
2366 @code{staticpro()} doesn't matter all that much because the symbol is
2367 contained in @code{obarray}, which is itself @code{staticpro()}ed.
2368 However, it's possible that a naughty user could do something like
2369 uninterning the symbol out of @code{obarray} or even setting
2370 @code{obarray} to a different value [although this is likely to make
2371 XEmacs crash!].)
2372
2373   @strong{Please note:} It is potentially deadly if you declare a
2374 @samp{Q...}  variable in two different modules.  The two calls to
2375 @code{defsymbol()} are no problem, but some linkers will complain about
2376 multiply-defined symbols.  The most insidious aspect of this is that
2377 often the link will succeed anyway, but then the resulting executable
2378 will sometimes crash in obscure ways during certain operations!  To
2379 avoid this problem, declare any symbols with common names (such as
2380 @code{text}) that are not obviously associated with this particular
2381 module in the module @file{general.c}.
2382
2383   Global variables whose names begin with @samp{V} are variables that
2384 contain Lisp objects.  The convention here is that all global variables
2385 of type @code{Lisp_Object} begin with @samp{V}, and all others don't
2386 (including integer and boolean variables that have Lisp
2387 equivalents). Most of the time, these variables have equivalents in
2388 Lisp, but some don't.  Those that do are declared this way by a call to
2389 @code{DEFVAR_LISP()} in the @code{vars_of_*()} initializer for the
2390 module.  What this does is create a special @dfn{symbol-value-forward}
2391 Lisp object that contains a pointer to the C variable, intern a symbol
2392 whose name is as specified in the call to @code{DEFVAR_LISP()}, and set
2393 its value to the symbol-value-forward Lisp object; it also calls
2394 @code{staticpro()} on the C variable to tell the garbage-collection
2395 mechanism about the variable.  When @code{eval} (or actually
2396 @code{symbol-value}) encounters this special object in the process of
2397 retrieving a variable's value, it follows the indirection to the C
2398 variable and gets its value.  @code{setq} does similar things so that
2399 the C variable gets changed.
2400
2401   Whether or not you @code{DEFVAR_LISP()} a variable, you need to
2402 initialize it in the @code{vars_of_*()} function; otherwise it will end
2403 up as all zeroes, which is the integer 0 (@emph{not} @code{nil}), and
2404 this is probably not what you want.  Also, if the variable is not
2405 @code{DEFVAR_LISP()}ed, @strong{you must call} @code{staticpro()} on the
2406 C variable in the @code{vars_of_*()} function.  Otherwise, the
2407 garbage-collection mechanism won't know that the object in this variable
2408 is in use, and will happily collect it and reuse its storage for another
2409 Lisp object, and you will be the one who's unhappy when you can't figure
2410 out how your variable got overwritten.
2411
2412 @node Proper Use of Unsigned Types
2413 @section Proper Use of Unsigned Types
2414 @cindex unsigned types, proper use of
2415 @cindex types, proper use of unsigned
2416
2417 Avoid using @code{unsigned int} and @code{unsigned long} whenever
2418 possible.  Unsigned types are viral -- any arithmetic or comparisons
2419 involving mixed signed and unsigned types are automatically converted to
2420 unsigned, which is almost certainly not what you want.  Many subtle and
2421 hard-to-find bugs are created by careless use of unsigned types.  In
2422 general, you should almost @emph{never} use an unsigned type to hold a
2423 regular quantity of any sort.  The only exceptions are
2424
2425 @enumerate
2426 @item
2427 When there's a reasonable possibility you will actually need all 32 or
2428 64 bits to store the quantity.
2429 @item
2430 When calling existing API's that require unsigned types.  In this case,
2431 you should still do all manipulation using signed types, and do the
2432 conversion at the very threshold of the API call.
2433 @item
2434 In existing code that you don't want to modify because you don't
2435 maintain it.
2436 @item
2437 In bit-field structures.
2438 @end enumerate
2439
2440 Other reasonable uses of @code{unsigned int} and @code{unsigned long}
2441 are representing non-quantities -- e.g. bit-oriented flags and such.
2442
2443 @node Coding for Mule
2444 @section Coding for Mule
2445 @cindex coding for Mule
2446 @cindex Mule, coding for
2447
2448 Although Mule support is not compiled by default in XEmacs, many people
2449 are using it, and we consider it crucial that new code works correctly
2450 with multibyte characters.  This is not hard; it is only a matter of
2451 following several simple user-interface guidelines.  Even if you never
2452 compile with Mule, with a little practice you will find it quite easy
2453 to code Mule-correctly.
2454
2455 Note that these guidelines are not necessarily tied to the current Mule
2456 implementation; they are also a good idea to follow on the grounds of
2457 code generalization for future I18N work.
2458
2459 @menu
2460 * Character-Related Data Types::
2461 * Working With Character and Byte Positions::
2462 * Conversion to and from External Data::
2463 * General Guidelines for Writing Mule-Aware Code::
2464 * An Example of Mule-Aware Code::
2465 @end menu
2466
2467 @node Character-Related Data Types
2468 @subsection Character-Related Data Types
2469 @cindex character-related data types
2470 @cindex data types, character-related
2471
2472 First, let's review the basic character-related datatypes used by
2473 XEmacs.  Note that the separate @code{typedef}s are not mandatory in the
2474 current implementation (all of them boil down to @code{unsigned char} or
2475 @code{int}), but they improve clarity of code a great deal, because one
2476 glance at the declaration can tell the intended use of the variable.
2477
2478 @table @code
2479 @item Emchar
2480 @cindex Emchar
2481 An @code{Emchar} holds a single Emacs character.
2482
2483 Obviously, the equality between characters and bytes is lost in the Mule
2484 world.  Characters can be represented by one or more bytes in the
2485 buffer, and @code{Emchar} is the C type large enough to hold any
2486 character.
2487
2488 Without Mule support, an @code{Emchar} is equivalent to an
2489 @code{unsigned char}.
2490
2491 @item Bufbyte
2492 @cindex Bufbyte
2493 The data representing the text in a buffer or string is logically a set
2494 of @code{Bufbyte}s.
2495
2496 XEmacs does not work with the same character formats all the time; when
2497 reading characters from the outside, it decodes them to an internal
2498 format, and likewise encodes them when writing.  @code{Bufbyte} (in fact
2499 @code{unsigned char}) is the basic unit of XEmacs internal buffers and
2500 strings format.  A @code{Bufbyte *} is the type that points at text
2501 encoded in the variable-width internal encoding.
2502
2503 One character can correspond to one or more @code{Bufbyte}s.  In the
2504 current Mule implementation, an ASCII character is represented by the
2505 same @code{Bufbyte}, and other characters are represented by a sequence
2506 of two or more @code{Bufbyte}s.
2507
2508 Without Mule support, there are exactly 256 characters, implicitly
2509 Latin-1, and each character is represented using one @code{Bufbyte}, and
2510 there is a one-to-one correspondence between @code{Bufbyte}s and
2511 @code{Emchar}s.
2512
2513 @item Bufpos
2514 @itemx Charcount
2515 @cindex Bufpos
2516 @cindex Charcount
2517 A @code{Bufpos} represents a character position in a buffer or string.
2518 A @code{Charcount} represents a number (count) of characters.
2519 Logically, subtracting two @code{Bufpos} values yields a
2520 @code{Charcount} value.  Although all of these are @code{typedef}ed to
2521 @code{EMACS_INT}, we use them in preference to @code{EMACS_INT} to make
2522 it clear what sort of position is being used.
2523
2524 @code{Bufpos} and @code{Charcount} values are the only ones that are
2525 ever visible to Lisp.
2526
2527 @item Bytind
2528 @itemx Bytecount
2529 @cindex Bytind
2530 @cindex Bytecount
2531 A @code{Bytind} represents a byte position in a buffer or string.  A
2532 @code{Bytecount} represents the distance between two positions, in bytes.
2533 The relationship between @code{Bytind} and @code{Bytecount} is the same
2534 as the relationship between @code{Bufpos} and @code{Charcount}.
2535
2536 @item Extbyte
2537 @itemx Extcount
2538 @cindex Extbyte
2539 @cindex Extcount
2540 When dealing with the outside world, XEmacs works with @code{Extbyte}s,
2541 which are equivalent to @code{unsigned char}.  Obviously, an
2542 @code{Extcount} is the distance between two @code{Extbyte}s.  Extbytes
2543 and Extcounts are not all that frequent in XEmacs code.
2544 @end table
2545
2546 @node Working With Character and Byte Positions
2547 @subsection Working With Character and Byte Positions
2548 @cindex character and byte positions, working with
2549 @cindex byte positions, working with character and
2550 @cindex positions, working with character and byte
2551
2552 Now that we have defined the basic character-related types, we can look
2553 at the macros and functions designed for work with them and for
2554 conversion between them.  Most of these macros are defined in
2555 @file{buffer.h}, and we don't discuss all of them here, but only the
2556 most important ones.  Examining the existing code is the best way to
2557 learn about them.
2558
2559 @table @code
2560 @item MAX_EMCHAR_LEN
2561 @cindex MAX_EMCHAR_LEN
2562 This preprocessor constant is the maximum number of buffer bytes to
2563 represent an Emacs character in the variable width internal encoding.
2564 It is useful when allocating temporary strings to keep a known number of
2565 characters.  For instance:
2566
2567 @example
2568 @group
2569 @{
2570   Charcount cclen;
2571   ...
2572   @{
2573     /* Allocate place for @var{cclen} characters. */
2574     Bufbyte *buf = (Bufbyte *)alloca (cclen * MAX_EMCHAR_LEN);
2575 ...
2576 @end group
2577 @end example
2578
2579 If you followed the previous section, you can guess that, logically,
2580 multiplying a @code{Charcount} value with @code{MAX_EMCHAR_LEN} produces
2581 a @code{Bytecount} value.
2582
2583 In the current Mule implementation, @code{MAX_EMCHAR_LEN} equals 4.
2584 Without Mule, it is 1.
2585
2586 @item charptr_emchar
2587 @itemx set_charptr_emchar
2588 @cindex charptr_emchar
2589 @cindex set_charptr_emchar
2590 The @code{charptr_emchar} macro takes a @code{Bufbyte} pointer and
2591 returns the @code{Emchar} stored at that position.  If it were a
2592 function, its prototype would be:
2593
2594 @example
2595 Emchar charptr_emchar (Bufbyte *p);
2596 @end example
2597
2598 @code{set_charptr_emchar} stores an @code{Emchar} to the specified byte
2599 position.  It returns the number of bytes stored:
2600
2601 @example
2602 Bytecount set_charptr_emchar (Bufbyte *p, Emchar c);
2603 @end example
2604
2605 It is important to note that @code{set_charptr_emchar} is safe only for
2606 appending a character at the end of a buffer, not for overwriting a
2607 character in the middle.  This is because the width of characters
2608 varies, and @code{set_charptr_emchar} cannot resize the string if it
2609 writes, say, a two-byte character where a single-byte character used to
2610 reside.
2611
2612 A typical use of @code{set_charptr_emchar} can be demonstrated by this
2613 example, which copies characters from buffer @var{buf} to a temporary
2614 string of Bufbytes.
2615
2616 @example
2617 @group
2618 @{
2619   Bufpos pos;
2620   for (pos = beg; pos < end; pos++)
2621     @{
2622       Emchar c = BUF_FETCH_CHAR (buf, pos);
2623       p += set_charptr_emchar (buf, c);
2624     @}
2625 @}
2626 @end group
2627 @end example
2628
2629 Note how @code{set_charptr_emchar} is used to store the @code{Emchar}
2630 and increment the counter, at the same time.
2631
2632 @item INC_CHARPTR
2633 @itemx DEC_CHARPTR
2634 @cindex INC_CHARPTR
2635 @cindex DEC_CHARPTR
2636 These two macros increment and decrement a @code{Bufbyte} pointer,
2637 respectively.  They will adjust the pointer by the appropriate number of
2638 bytes according to the byte length of the character stored there.  Both
2639 macros assume that the memory address is located at the beginning of a
2640 valid character.
2641
2642 Without Mule support, @code{INC_CHARPTR (p)} and @code{DEC_CHARPTR (p)}
2643 simply expand to @code{p++} and @code{p--}, respectively.
2644
2645 @item bytecount_to_charcount
2646 @cindex bytecount_to_charcount
2647 Given a pointer to a text string and a length in bytes, return the
2648 equivalent length in characters.
2649
2650 @example
2651 Charcount bytecount_to_charcount (Bufbyte *p, Bytecount bc);
2652 @end example
2653
2654 @item charcount_to_bytecount
2655 @cindex charcount_to_bytecount
2656 Given a pointer to a text string and a length in characters, return the
2657 equivalent length in bytes.
2658
2659 @example
2660 Bytecount charcount_to_bytecount (Bufbyte *p, Charcount cc);
2661 @end example
2662
2663 @item charptr_n_addr
2664 @cindex charptr_n_addr
2665 Return a pointer to the beginning of the character offset @var{cc} (in
2666 characters) from @var{p}.
2667
2668 @example
2669 Bufbyte *charptr_n_addr (Bufbyte *p, Charcount cc);
2670 @end example
2671 @end table
2672
2673 @node Conversion to and from External Data
2674 @subsection Conversion to and from External Data
2675 @cindex conversion to and from external data
2676 @cindex external data, conversion to and from
2677
2678 When an external function, such as a C library function, returns a
2679 @code{char} pointer, you should almost never treat it as @code{Bufbyte}.
2680 This is because these returned strings may contain 8bit characters which
2681 can be misinterpreted by XEmacs, and cause a crash.  Likewise, when
2682 exporting a piece of internal text to the outside world, you should
2683 always convert it to an appropriate external encoding, lest the internal
2684 stuff (such as the infamous \201 characters) leak out.
2685
2686 The interface to conversion between the internal and external
2687 representations of text are the numerous conversion macros defined in
2688 @file{buffer.h}.  There used to be a fixed set of external formats
2689 supported by these macros, but now any coding system can be used with
2690 these macros.  The coding system alias mechanism is used to create the
2691 following logical coding systems, which replace the fixed external
2692 formats.  The (dontusethis-set-symbol-value-handler) mechanism was
2693 enhanced to make this possible (more work on that is needed - like
2694 remove the @code{dontusethis-} prefix).
2695
2696 @table @code
2697 @item Qbinary
2698 This is the simplest format and is what we use in the absence of a more
2699 appropriate format.  This converts according to the @code{binary} coding
2700 system:
2701
2702 @enumerate a
2703 @item
2704 On input, bytes 0--255 are converted into (implicitly Latin-1)
2705 characters 0--255.  A non-Mule xemacs doesn't really know about
2706 different character sets and the fonts to display them, so the bytes can
2707 be treated as text in different 1-byte encodings by simply setting the
2708 appropriate fonts.  So in a sense, non-Mule xemacs is a multi-lingual
2709 editor if, for example, different fonts are used to display text in
2710 different buffers, faces, or windows.  The specifier mechanism gives the
2711 user complete control over this kind of behavior.
2712 @item
2713 On output, characters 0--255 are converted into bytes 0--255 and other
2714 characters are converted into `~'.
2715 @end enumerate
2716
2717 @item Qfile_name
2718 Format used for filenames.  This is user-definable via either the
2719 @code{file-name-coding-system} or @code{pathname-coding-system} (now
2720 obsolete) variables.
2721
2722 @item Qnative
2723 Format used for the external Unix environment---@code{argv[]}, stuff
2724 from @code{getenv()}, stuff from the @file{/etc/passwd} file, etc.
2725 Currently this is the same as Qfile_name.  The two should be
2726 distinguished for clarity and possible future separation.
2727
2728 @item Qctext
2729 Compound--text format.  This is the standard X11 format used for data
2730 stored in properties, selections, and the like.  This is an 8-bit
2731 no-lock-shift ISO2022 coding system.  This is a real coding system,
2732 unlike Qfile_name, which is user-definable.
2733 @end table
2734
2735 There are two fundamental macros to convert between external and
2736 internal format.
2737
2738 @code{TO_INTERNAL_FORMAT} converts external data to internal format, and
2739 @code{TO_EXTERNAL_FORMAT} converts the other way around.  The arguments
2740 each of these receives are a source type, a source, a sink type, a sink,
2741 and a coding system (or a symbol naming a coding system).
2742
2743 A typical call looks like
2744 @example
2745 TO_EXTERNAL_FORMAT (LISP_STRING, str, C_STRING_MALLOC, ptr, Qfile_name);
2746 @end example
2747
2748 which means that the contents of the lisp string @code{str} are written
2749 to a malloc'ed memory area which will be pointed to by @code{ptr}, after
2750 the function returns.  The conversion will be done using the
2751 @code{file-name} coding system, which will be controlled by the user
2752 indirectly by setting or binding the variable
2753 @code{file-name-coding-system}.
2754
2755 Some sources and sinks require two C variables to specify.  We use some
2756 preprocessor magic to allow different source and sink types, and even
2757 different numbers of arguments to specify different types of sources and
2758 sinks.
2759
2760 So we can have a call that looks like
2761 @example
2762 TO_INTERNAL_FORMAT (DATA, (ptr, len),
2763                     MALLOC, (ptr, len),
2764                     coding_system);
2765 @end example
2766
2767 The parenthesized argument pairs are required to make the preprocessor
2768 magic work.
2769
2770 Here are the different source and sink types:
2771
2772 @table @code
2773 @item @code{DATA, (ptr, len),}
2774 input data is a fixed buffer of size @var{len} at address @var{ptr}
2775 @item @code{ALLOCA, (ptr, len),}
2776 output data is placed in an alloca()ed buffer of size @var{len} pointed to by @var{ptr}
2777 @item @code{MALLOC, (ptr, len),}
2778 output data is in a malloc()ed buffer of size @var{len} pointed to by @var{ptr}
2779 @item @code{C_STRING_ALLOCA, ptr,}
2780 equivalent to @code{ALLOCA (ptr, len_ignored)} on output.
2781 @item @code{C_STRING_MALLOC, ptr,}
2782 equivalent to @code{MALLOC (ptr, len_ignored)} on output
2783 @item @code{C_STRING, ptr,}
2784 equivalent to @code{DATA, (ptr, strlen (ptr) + 1)} on input
2785 @item @code{LISP_STRING, string,}
2786 input or output is a Lisp_Object of type string
2787 @item @code{LISP_BUFFER, buffer,}
2788 output is written to @code{(point)} in lisp buffer @var{buffer}
2789 @item @code{LISP_LSTREAM, lstream,}
2790 input or output is a Lisp_Object of type lstream
2791 @item @code{LISP_OPAQUE, object,}
2792 input or output is a Lisp_Object of type opaque
2793 @end table
2794
2795 Often, the data is being converted to a '\0'-byte-terminated string,
2796 which is the format required by many external system C APIs.  For these
2797 purposes, a source type of @code{C_STRING} or a sink type of
2798 @code{C_STRING_ALLOCA} or @code{C_STRING_MALLOC} is appropriate.
2799 Otherwise, we should try to keep XEmacs '\0'-byte-clean, which means
2800 using (ptr, len) pairs.
2801
2802 The sinks to be specified must be lvalues, unless they are the lisp
2803 object types @code{LISP_LSTREAM} or @code{LISP_BUFFER}.
2804
2805 For the sink types @code{ALLOCA} and @code{C_STRING_ALLOCA}, the
2806 resulting text is stored in a stack-allocated buffer, which is
2807 automatically freed on returning from the function.  However, the sink
2808 types @code{MALLOC} and @code{C_STRING_MALLOC} return @code{xmalloc()}ed
2809 memory.  The caller is responsible for freeing this memory using
2810 @code{xfree()}.
2811
2812 Note that it doesn't make sense for @code{LISP_STRING} to be a source
2813 for @code{TO_INTERNAL_FORMAT} or a sink for @code{TO_EXTERNAL_FORMAT}.
2814 You'll get an assertion failure if you try.
2815
2816
2817 @node General Guidelines for Writing Mule-Aware Code
2818 @subsection General Guidelines for Writing Mule-Aware Code
2819 @cindex writing Mule-aware code, general guidelines for
2820 @cindex Mule-aware code, general guidelines for writing
2821 @cindex code, general guidelines for writing Mule-aware
2822
2823 This section contains some general guidance on how to write Mule-aware
2824 code, as well as some pitfalls you should avoid.
2825
2826 @table @emph
2827 @item Never use @code{char} and @code{char *}.
2828 In XEmacs, the use of @code{char} and @code{char *} is almost always a
2829 mistake.  If you want to manipulate an Emacs character from ``C'', use
2830 @code{Emchar}.  If you want to examine a specific octet in the internal
2831 format, use @code{Bufbyte}.  If you want a Lisp-visible character, use a
2832 @code{Lisp_Object} and @code{make_char}.  If you want a pointer to move
2833 through the internal text, use @code{Bufbyte *}.  Also note that you
2834 almost certainly do not need @code{Emchar *}.
2835
2836 @item Be careful not to confuse @code{Charcount}, @code{Bytecount}, and @code{Bufpos}.
2837 The whole point of using different types is to avoid confusion about the
2838 use of certain variables.  Lest this effect be nullified, you need to be
2839 careful about using the right types.
2840
2841 @item Always convert external data
2842 It is extremely important to always convert external data, because
2843 XEmacs can crash if unexpected 8bit sequences are copied to its internal
2844 buffers literally.
2845
2846 This means that when a system function, such as @code{readdir}, returns
2847 a string, you may need to convert it using one of the conversion macros
2848 described in the previous chapter, before passing it further to Lisp.
2849
2850 Actually, most of the basic system functions that accept '\0'-terminated
2851 string arguments, like @code{stat()} and @code{open()}, have been
2852 @strong{encapsulated} so that they are they @code{always} do internal to
2853 external conversion themselves.  This means you must pass internally
2854 encoded data, typically the @code{XSTRING_DATA} of a Lisp_String to
2855 these functions.  This is actually a design bug, since it unexpectedly
2856 changes the semantics of the system functions.  A better design would be
2857 to provide separate versions of these system functions that accepted
2858 Lisp_Objects which were lisp strings in place of their current
2859 @code{char *} arguments.
2860
2861 @example
2862 int stat_lisp (Lisp_Object path, struct stat *buf); /* Implement me */
2863 @end example
2864
2865 Also note that many internal functions, such as @code{make_string},
2866 accept Bufbytes, which removes the need for them to convert the data
2867 they receive.  This increases efficiency because that way external data
2868 needs to be decoded only once, when it is read.  After that, it is
2869 passed around in internal format.
2870 @end table
2871
2872 @node An Example of Mule-Aware Code
2873 @subsection An Example of Mule-Aware Code
2874 @cindex code, an example of Mule-aware
2875 @cindex Mule-aware code, an example of
2876
2877 As an example of Mule-aware code, we will analyze the @code{string}
2878 function, which conses up a Lisp string from the character arguments it
2879 receives.  Here is the definition, pasted from @code{alloc.c}:
2880
2881 @example
2882 @group
2883 DEFUN ("string", Fstring, 0, MANY, 0, /*
2884 Concatenate all the argument characters and make the result a string.
2885 */
2886        (int nargs, Lisp_Object *args))
2887 @{
2888   Bufbyte *storage = alloca_array (Bufbyte, nargs * MAX_EMCHAR_LEN);
2889   Bufbyte *p = storage;
2890
2891   for (; nargs; nargs--, args++)
2892     @{
2893       Lisp_Object lisp_char = *args;
2894       CHECK_CHAR_COERCE_INT (lisp_char);
2895       p += set_charptr_emchar (p, XCHAR (lisp_char));
2896     @}
2897   return make_string (storage, p - storage);
2898 @}
2899 @end group
2900 @end example
2901
2902 Now we can analyze the source line by line.
2903
2904 Obviously, string will be as long as there are arguments to the
2905 function.  This is why we allocate @code{MAX_EMCHAR_LEN} * @var{nargs}
2906 bytes on the stack, i.e. the worst-case number of bytes for @var{nargs}
2907 @code{Emchar}s to fit in the string.
2908
2909 Then, the loop checks that each element is a character, converting
2910 integers in the process.  Like many other functions in XEmacs, this
2911 function silently accepts integers where characters are expected, for
2912 historical and compatibility reasons.  Unless you know what you are
2913 doing, @code{CHECK_CHAR} will also suffice.  @code{XCHAR (lisp_char)}
2914 extracts the @code{Emchar} from the @code{Lisp_Object}, and
2915 @code{set_charptr_emchar} stores it to storage, increasing @code{p} in
2916 the process.
2917
2918 Other instructive examples of correct coding under Mule can be found all
2919 over the XEmacs code.  For starters, I recommend
2920 @code{Fnormalize_menu_item_name} in @file{menubar.c}.  After you have
2921 understood this section of the manual and studied the examples, you can
2922 proceed writing new Mule-aware code.
2923
2924 @node Techniques for XEmacs Developers
2925 @section Techniques for XEmacs Developers
2926 @cindex techniques for XEmacs developers
2927 @cindex developers, techniques for XEmacs
2928
2929 @cindex Purify
2930 @cindex Quantify
2931 To make a purified XEmacs, do: @code{make puremacs}.
2932 To make a quantified XEmacs, do: @code{make quantmacs}.
2933
2934 You simply can't dump Quantified and Purified images (unless using the
2935 portable dumper).  Purify gets confused when xemacs frees memory in one
2936 process that was allocated in a @emph{different} process on a different
2937 machine!.  Run it like so:
2938 @example
2939 temacs -batch -l loadup.el run-temacs @var{xemacs-args...}
2940 @end example
2941
2942 @cindex error checking
2943 Before you go through the trouble, are you compiling with all
2944 debugging and error-checking off?  If not, try that first.  Be warned
2945 that while Quantify is directly responsible for quite a few
2946 optimizations which have been made to XEmacs, doing a run which
2947 generates results which can be acted upon is not necessarily a trivial
2948 task.
2949
2950 Also, if you're still willing to do some runs make sure you configure
2951 with the @samp{--quantify} flag.  That will keep Quantify from starting
2952 to record data until after the loadup is completed and will shut off
2953 recording right before it shuts down (which generates enough bogus data
2954 to throw most results off).  It also enables three additional elisp
2955 commands: @code{quantify-start-recording-data},
2956 @code{quantify-stop-recording-data} and @code{quantify-clear-data}.
2957
2958 If you want to make XEmacs faster, target your favorite slow benchmark,
2959 run a profiler like Quantify, @code{gprof}, or @code{tcov}, and figure
2960 out where the cycles are going.  Specific projects:
2961
2962 @itemize @bullet
2963 @item
2964 Make the garbage collector faster.  Figure out how to write an
2965 incremental garbage collector.
2966 @item
2967 Write a compiler that takes bytecode and spits out C code.
2968 Unfortunately, you will then need a C compiler and a more fully
2969 developed module system.
2970 @item
2971 Speed up redisplay.
2972 @item
2973 Speed up syntax highlighting.  Maybe moving some of the syntax
2974 highlighting capabilities into C would make a difference.
2975 @item
2976 Implement tail recursion in Emacs Lisp (hard!).
2977 @end itemize
2978
2979 Unfortunately, Emacs Lisp is slow, and is going to stay slow.  Function
2980 calls in elisp are especially expensive.  Iterating over a long list is
2981 going to be 30 times faster implemented in C than in Elisp.
2982
2983 Heavily used small code fragments need to be fast.  The traditional way
2984 to implement such code fragments in C is with macros.  But macros in C
2985 are known to be broken.
2986
2987 @cindex macro hygiene
2988 Macro arguments that are repeatedly evaluated may suffer from repeated
2989 side effects or suboptimal performance.
2990
2991 Variable names used in macros may collide with caller's variables,
2992 causing (at least) unwanted compiler warnings.
2993
2994 In order to solve these problems, and maintain statement semantics, one
2995 should use the @code{do @{ ... @} while (0)} trick while trying to
2996 reference macro arguments exactly once using local variables.
2997
2998 Let's take a look at this poor macro definition:
2999
3000 @example
3001 #define MARK_OBJECT(obj) \
3002   if (!marked_p (obj)) mark_object (obj), did_mark = 1
3003 @end example
3004
3005 This macro evaluates its argument twice, and also fails if used like this:
3006 @example
3007   if (flag) MARK_OBJECT (obj); else do_something();
3008 @end example
3009
3010 A much better definition is
3011
3012 @example
3013 #define MARK_OBJECT(obj) do @{ \
3014   Lisp_Object mo_obj = (obj); \
3015   if (!marked_p (mo_obj))     \
3016     @{                         \
3017       mark_object (mo_obj);   \
3018       did_mark = 1;           \
3019     @}                         \
3020 @} while (0)
3021 @end example
3022
3023 Notice the elimination of double evaluation by using the local variable
3024 with the obscure name.  Writing safe and efficient macros requires great
3025 care.  The one problem with macros that cannot be portably worked around
3026 is, since a C block has no value, a macro used as an expression rather
3027 than a statement cannot use the techniques just described to avoid
3028 multiple evaluation.
3029
3030 @cindex inline functions
3031 In most cases where a macro has function semantics, an inline function
3032 is a better implementation technique.  Modern compiler optimizers tend
3033 to inline functions even if they have no @code{inline} keyword, and
3034 configure magic ensures that the @code{inline} keyword can be safely
3035 used as an additional compiler hint.  Inline functions used in a single
3036 .c files are easy.  The function must already be defined to be
3037 @code{static}.  Just add another @code{inline} keyword to the
3038 definition.
3039
3040 @example
3041 inline static int
3042 heavily_used_small_function (int arg)
3043 @{
3044   ...
3045 @}
3046 @end example
3047
3048 Inline functions in header files are trickier, because we would like to
3049 make the following optimization if the function is @emph{not} inlined
3050 (for example, because we're compiling for debugging).  We would like the
3051 function to be defined externally exactly once, and each calling
3052 translation unit would create an external reference to the function,
3053 instead of including a definition of the inline function in the object
3054 code of every translation unit that uses it.  This optimization is
3055 currently only available for gcc.  But you don't have to worry about the
3056 trickiness; just define your inline functions in header files using this
3057 pattern:
3058
3059 @example
3060 INLINE_HEADER int
3061 i_used_to_be_a_crufty_macro_but_look_at_me_now (int arg);
3062 INLINE_HEADER int
3063 i_used_to_be_a_crufty_macro_but_look_at_me_now (int arg)
3064 @{
3065   ...
3066 @}
3067 @end example
3068
3069 The declaration right before the definition is to prevent warnings when
3070 compiling with @code{gcc -Wmissing-declarations}.  I consider issuing
3071 this warning for inline functions a gcc bug, but the gcc maintainers disagree.
3072
3073 @cindex inline functions, headers
3074 @cindex header files, inline functions
3075 Every header which contains inline functions, either directly by using
3076 @code{INLINE_HEADER} or indirectly by using @code{DECLARE_LRECORD} must
3077 be added to @file{inline.c}'s includes to make the optimization
3078 described above work.  (Optimization note: if all INLINE_HEADER
3079 functions are in fact inlined in all translation units, then the linker
3080 can just discard @code{inline.o}, since it contains only unreferenced code).
3081
3082 To get started debugging XEmacs, take a look at the @file{.gdbinit} and
3083 @file{.dbxrc} files in the @file{src} directory.  See the section in the
3084 XEmacs FAQ on How to Debug an XEmacs problem with a debugger.
3085
3086 After making source code changes, run @code{make check} to ensure that
3087 you haven't introduced any regressions.  If you want to make xemacs more
3088 reliable, please improve the test suite in @file{tests/automated}.
3089
3090 Did you make sure you didn't introduce any new compiler warnings?
3091
3092 Before submitting a patch, please try compiling at least once with
3093
3094 @example
3095 configure --with-mule --with-union-type --error-checking=all
3096 @end example
3097
3098 Here are things to know when you create a new source file:
3099
3100 @itemize @bullet
3101 @item
3102 All @file{.c} files should @code{#include <config.h>} first.  Almost all
3103 @file{.c} files should @code{#include "lisp.h"} second.
3104
3105 @item
3106 Generated header files should be included using the @code{#include <...>} syntax,
3107 not the @code{#include "..."} syntax.  The generated headers are:
3108
3109 @file{config.h sheap-adjust.h paths.h Emacs.ad.h}
3110
3111 The basic rule is that you should assume builds using @code{--srcdir}
3112 and the @code{#include <...>} syntax needs to be used when the
3113 to-be-included generated file is in a potentially different directory
3114 @emph{at compile time}.  The non-obvious C rule is that @code{#include "..."}
3115 means to search for the included file in the same directory as the
3116 including file, @emph{not} in the current directory.
3117
3118 @item
3119 Header files should @emph{not} include @code{<config.h>} and
3120 @code{"lisp.h"}.  It is the responsibility of the @file{.c} files that
3121 use it to do so.
3122
3123 @end itemize
3124
3125 @cindex Lisp object types, creating
3126 @cindex creating Lisp object types
3127 @cindex object types, creating Lisp
3128 Here is a checklist of things to do when creating a new lisp object type
3129 named @var{foo}:
3130
3131 @enumerate
3132 @item
3133 create @var{foo}.h
3134 @item
3135 create @var{foo}.c
3136 @item
3137 add definitions of @code{syms_of_@var{foo}}, etc. to @file{@var{foo}.c}
3138 @item
3139 add declarations of @code{syms_of_@var{foo}}, etc. to @file{symsinit.h}
3140 @item
3141 add calls to @code{syms_of_@var{foo}}, etc. to @file{emacs.c}
3142 @item
3143 add definitions of macros like @code{CHECK_@var{FOO}} and
3144 @code{@var{FOO}P} to @file{@var{foo}.h}
3145 @item
3146 add the new type index to @code{enum lrecord_type}
3147 @item
3148 add a DEFINE_LRECORD_IMPLEMENTATION call to @file{@var{foo}.c}
3149 @item
3150 add an INIT_LRECORD_IMPLEMENTATION call to @code{syms_of_@var{foo}.c}
3151 @end enumerate
3152
3153 @node A Summary of the Various XEmacs Modules, Allocation of Objects in XEmacs Lisp, Rules When Writing New C Code, Top
3154 @chapter A Summary of the Various XEmacs Modules
3155 @cindex modules, a summary of the various XEmacs
3156
3157   This is accurate as of XEmacs 20.0.
3158
3159 @menu
3160 * Low-Level Modules::
3161 * Basic Lisp Modules::
3162 * Modules for Standard Editing Operations::
3163 * Editor-Level Control Flow Modules::
3164 * Modules for the Basic Displayable Lisp Objects::
3165 * Modules for other Display-Related Lisp Objects::
3166 * Modules for the Redisplay Mechanism::
3167 * Modules for Interfacing with the File System::
3168 * Modules for Other Aspects of the Lisp Interpreter and Object System::
3169 * Modules for Interfacing with the Operating System::
3170 * Modules for Interfacing with X Windows::
3171 * Modules for Internationalization::
3172 @end menu
3173
3174 @node Low-Level Modules
3175 @section Low-Level Modules
3176 @cindex low-level modules
3177 @cindex modules, low-level
3178
3179 @example
3180 config.h
3181 @end example
3182
3183 This is automatically generated from @file{config.h.in} based on the
3184 results of configure tests and user-selected optional features and
3185 contains preprocessor definitions specifying the nature of the
3186 environment in which XEmacs is being compiled.
3187
3188
3189
3190 @example
3191 paths.h
3192 @end example
3193
3194 This is automatically generated from @file{paths.h.in} based on supplied
3195 configure values, and allows for non-standard installed configurations
3196 of the XEmacs directories.  It's currently broken, though.
3197
3198
3199
3200 @example
3201 emacs.c
3202 signal.c
3203 @end example
3204
3205 @file{emacs.c} contains @code{main()} and other code that performs the most
3206 basic environment initializations and handles shutting down the XEmacs
3207 process (this includes @code{kill-emacs}, the normal way that XEmacs is
3208 exited; @code{dump-emacs}, which is used during the build process to
3209 write out the XEmacs executable; @code{run-emacs-from-temacs}, which can
3210 be used to start XEmacs directly when temacs has finished loading all
3211 the Lisp code; and emergency code to handle crashes [XEmacs tries to
3212 auto-save all files before it crashes]).
3213
3214 Low-level code that directly interacts with the Unix signal mechanism,
3215 however, is in @file{signal.c}.  Note that this code does not handle system
3216 dependencies in interfacing to signals; that is handled using the
3217 @file{syssignal.h} header file, described in section J below.
3218
3219
3220
3221 @example
3222 unexaix.c
3223 unexalpha.c
3224 unexapollo.c
3225 unexconvex.c
3226 unexec.c
3227 unexelf.c
3228 unexelfsgi.c
3229 unexencap.c
3230 unexenix.c
3231 unexfreebsd.c
3232 unexfx2800.c
3233 unexhp9k3.c
3234 unexhp9k800.c
3235 unexmips.c
3236 unexnext.c
3237 unexsol2.c
3238 unexsunos4.c
3239 @end example
3240
3241 These modules contain code dumping out the XEmacs executable on various
3242 different systems. (This process is highly machine-specific and
3243 requires intimate knowledge of the executable format and the memory map
3244 of the process.) Only one of these modules is actually used; this is
3245 chosen by @file{configure}.
3246
3247
3248
3249 @example
3250 ecrt0.c
3251 lastfile.c
3252 pre-crt0.c
3253 @end example
3254
3255 These modules are used in conjunction with the dump mechanism.  On some
3256 systems, an alternative version of the C startup code (the actual code
3257 that receives control from the operating system when the process is
3258 started, and which calls @code{main()}) is required so that the dumping
3259 process works properly; @file{crt0.c} provides this.
3260
3261 @file{pre-crt0.c} and @file{lastfile.c} should be the very first and
3262 very last file linked, respectively. (Actually, this is not really true.
3263 @file{lastfile.c} should be after all Emacs modules whose initialized
3264 data should be made constant, and before all other Emacs files and all
3265 libraries.  In particular, the allocation modules @file{gmalloc.c},
3266 @file{alloca.c}, etc. are normally placed past @file{lastfile.c}, and
3267 all of the files that implement Xt widget classes @emph{must} be placed
3268 after @file{lastfile.c} because they contain various structures that
3269 must be statically initialized and into which Xt writes at various
3270 times.) @file{pre-crt0.c} and @file{lastfile.c} contain exported symbols
3271 that are used to determine the start and end of XEmacs' initialized
3272 data space when dumping.
3273
3274
3275
3276 @example
3277 alloca.c
3278 free-hook.c
3279 getpagesize.h
3280 gmalloc.c
3281 malloc.c
3282 mem-limits.h
3283 ralloc.c
3284 vm-limit.c
3285 @end example
3286
3287 These handle basic C allocation of memory.  @file{alloca.c} is an emulation of
3288 the stack allocation function @code{alloca()} on machines that lack
3289 this. (XEmacs makes extensive use of @code{alloca()} in its code.)
3290
3291 @file{gmalloc.c} and @file{malloc.c} are two implementations of the standard C
3292 functions @code{malloc()}, @code{realloc()} and @code{free()}.  They are
3293 often used in place of the standard system-provided @code{malloc()}
3294 because they usually provide a much faster implementation, at the
3295 expense of additional memory use.  @file{gmalloc.c} is a newer implementation
3296 that is much more memory-efficient for large allocations than @file{malloc.c},
3297 and should always be preferred if it works. (At one point, @file{gmalloc.c}
3298 didn't work on some systems where @file{malloc.c} worked; but this should be
3299 fixed now.)
3300
3301 @cindex relocating allocator
3302 @file{ralloc.c} is the @dfn{relocating allocator}.  It provides
3303 functions similar to @code{malloc()}, @code{realloc()} and @code{free()}
3304 that allocate memory that can be dynamically relocated in memory.  The
3305 advantage of this is that allocated memory can be shuffled around to
3306 place all the free memory at the end of the heap, and the heap can then
3307 be shrunk, releasing the memory back to the operating system.  The use
3308 of this can be controlled with the configure option @code{--rel-alloc};
3309 if enabled, memory allocated for buffers will be relocatable, so that if
3310 a very large file is visited and the buffer is later killed, the memory
3311 can be released to the operating system.  (The disadvantage of this
3312 mechanism is that it can be very slow.  On systems with the
3313 @code{mmap()} system call, the XEmacs version of @file{ralloc.c} uses
3314 this to move memory around without actually having to block-copy it,
3315 which can speed things up; but it can still cause noticeable performance
3316 degradation.)
3317
3318 @file{free-hook.c} contains some debugging functions for checking for invalid
3319 arguments to @code{free()}.
3320
3321 @file{vm-limit.c} contains some functions that warn the user when memory is
3322 getting low.  These are callback functions that are called by @file{gmalloc.c}
3323 and @file{malloc.c} at appropriate times.
3324
3325 @file{getpagesize.h} provides a uniform interface for retrieving the size of a
3326 page in virtual memory.  @file{mem-limits.h} provides a uniform interface for
3327 retrieving the total amount of available virtual memory.  Both are
3328 similar in spirit to the @file{sys*.h} files described in section J, below.
3329
3330
3331
3332 @example
3333 blocktype.c
3334 blocktype.h
3335 dynarr.c
3336 @end example
3337
3338 These implement a couple of basic C data types to facilitate memory
3339 allocation.  The @code{Blocktype} type efficiently manages the
3340 allocation of fixed-size blocks by minimizing the number of times that
3341 @code{malloc()} and @code{free()} are called.  It allocates memory in
3342 large chunks, subdivides the chunks into blocks of the proper size, and
3343 returns the blocks as requested.  When blocks are freed, they are placed
3344 onto a linked list, so they can be efficiently reused.  This data type
3345 is not much used in XEmacs currently, because it's a fairly new
3346 addition.
3347
3348 @cindex dynamic array
3349 The @code{Dynarr} type implements a @dfn{dynamic array}, which is
3350 similar to a standard C array but has no fixed limit on the number of
3351 elements it can contain.  Dynamic arrays can hold elements of any type,
3352 and when you add a new element, the array automatically resizes itself
3353 if it isn't big enough.  Dynarrs are extensively used in the redisplay
3354 mechanism.
3355
3356
3357
3358 @example
3359 inline.c
3360 @end example
3361
3362 This module is used in connection with inline functions (available in
3363 some compilers).  Often, inline functions need to have a corresponding
3364 non-inline function that does the same thing.  This module is where they
3365 reside.  It contains no actual code, but defines some special flags that
3366 cause inline functions defined in header files to be rendered as actual
3367 functions.  It then includes all header files that contain any inline
3368 function definitions, so that each one gets a real function equivalent.
3369
3370
3371
3372 @example
3373 debug.c
3374 debug.h
3375 @end example
3376
3377 These functions provide a system for doing internal consistency checks
3378 during code development.  This system is not currently used; instead the
3379 simpler @code{assert()} macro is used along with the various checks
3380 provided by the @samp{--error-check-*} configuration options.
3381
3382
3383
3384 @example
3385 universe.h
3386 @end example
3387
3388 This is not currently used.
3389
3390
3391
3392 @node Basic Lisp Modules
3393 @section Basic Lisp Modules
3394 @cindex Lisp modules, basic
3395 @cindex modules, basic Lisp
3396
3397 @example
3398 lisp-disunion.h
3399 lisp-union.h
3400 lisp.h
3401 lrecord.h
3402 symsinit.h
3403 @end example
3404
3405 These are the basic header files for all XEmacs modules.  Each module
3406 includes @file{lisp.h}, which brings the other header files in.
3407 @file{lisp.h} contains the definitions of the structures and extractor
3408 and constructor macros for the basic Lisp objects and various other
3409 basic definitions for the Lisp environment, as well as some
3410 general-purpose definitions (e.g. @code{min()} and @code{max()}).
3411 @file{lisp.h} includes either @file{lisp-disunion.h} or
3412 @file{lisp-union.h}, depending on whether @code{USE_UNION_TYPE} is
3413 defined.  These files define the typedef of the Lisp object itself (as
3414 described above) and the low-level macros that hide the actual
3415 implementation of the Lisp object.  All extractor and constructor macros
3416 for particular types of Lisp objects are defined in terms of these
3417 low-level macros.
3418
3419 As a general rule, all typedefs should go into the typedefs section of
3420 @file{lisp.h} rather than into a module-specific header file even if the
3421 structure is defined elsewhere.  This allows function prototypes that
3422 use the typedef to be placed into other header files.  Forward structure
3423 declarations (i.e. a simple declaration like @code{struct foo;} where
3424 the structure itself is defined elsewhere) should be placed into the
3425 typedefs section as necessary.
3426
3427 @file{lrecord.h} contains the basic structures and macros that implement
3428 all record-type Lisp objects---i.e. all objects whose type is a field
3429 in their C structure, which includes all objects except the few most
3430 basic ones.
3431
3432 @file{lisp.h} contains prototypes for most of the exported functions in
3433 the various modules.  Lisp primitives defined using @code{DEFUN} that
3434 need to be called by C code should be declared using @code{EXFUN}.
3435 Other function prototypes should be placed either into the appropriate
3436 section of @code{lisp.h}, or into a module-specific header file,
3437 depending on how general-purpose the function is and whether it has
3438 special-purpose argument types requiring definitions not in
3439 @file{lisp.h}.)  All initialization functions are prototyped in
3440 @file{symsinit.h}.
3441
3442
3443
3444 @example
3445 alloc.c
3446 @end example
3447
3448 The large module @file{alloc.c} implements all of the basic allocation and
3449 garbage collection for Lisp objects.  The most commonly used Lisp
3450 objects are allocated in chunks, similar to the Blocktype data type
3451 described above; others are allocated in individually @code{malloc()}ed
3452 blocks.  This module provides the foundation on which all other aspects
3453 of the Lisp environment sit, and is the first module initialized at
3454 startup.
3455
3456 Note that @file{alloc.c} provides a series of generic functions that are
3457 not dependent on any particular object type, and interfaces to
3458 particular types of objects using a standardized interface of
3459 type-specific methods.  This scheme is a fundamental principle of
3460 object-oriented programming and is heavily used throughout XEmacs.  The
3461 great advantage of this is that it allows for a clean separation of
3462 functionality into different modules---new classes of Lisp objects, new
3463 event interfaces, new device types, new stream interfaces, etc. can be
3464 added transparently without affecting code anywhere else in XEmacs.
3465 Because the different subsystems are divided into general and specific
3466 code, adding a new subtype within a subsystem will in general not
3467 require changes to the generic subsystem code or affect any of the other
3468 subtypes in the subsystem; this provides a great deal of robustness to
3469 the XEmacs code.
3470
3471
3472 @example
3473 eval.c
3474 backtrace.h
3475 @end example
3476
3477 This module contains all of the functions to handle the flow of control.
3478 This includes the mechanisms of defining functions, calling functions,
3479 traversing stack frames, and binding variables; the control primitives
3480 and other special forms such as @code{while}, @code{if}, @code{eval},
3481 @code{let}, @code{and}, @code{or}, @code{progn}, etc.; handling of
3482 non-local exits, unwind-protects, and exception handlers; entering the
3483 debugger; methods for the subr Lisp object type; etc.  It does
3484 @emph{not} include the @code{read} function, the @code{print} function,
3485 or the handling of symbols and obarrays.
3486
3487 @file{backtrace.h} contains some structures related to stack frames and the
3488 flow of control.
3489
3490
3491
3492 @example
3493 lread.c
3494 @end example
3495
3496 This module implements the Lisp reader and the @code{read} function,
3497 which converts text into Lisp objects, according to the read syntax of
3498 the objects, as described above.  This is similar to the parser that is
3499 a part of all compilers.
3500
3501
3502
3503 @example
3504 print.c
3505 @end example
3506
3507 This module implements the Lisp print mechanism and the @code{print}
3508 function and related functions.  This is the inverse of the Lisp reader
3509 -- it converts Lisp objects to a printed, textual representation.
3510 (Hopefully something that can be read back in using @code{read} to get
3511 an equivalent object.)
3512
3513
3514
3515 @example
3516 general.c
3517 symbols.c
3518 symeval.h
3519 @end example
3520
3521 @file{symbols.c} implements the handling of symbols, obarrays, and
3522 retrieving the values of symbols.  Much of the code is devoted to
3523 handling the special @dfn{symbol-value-magic} objects that define
3524 special types of variables---this includes buffer-local variables,
3525 variable aliases, variables that forward into C variables, etc.  This
3526 module is initialized extremely early (right after @file{alloc.c}),
3527 because it is here that the basic symbols @code{t} and @code{nil} are
3528 created, and those symbols are used everywhere throughout XEmacs.
3529
3530 @file{symeval.h} contains the definitions of symbol structures and the
3531 @code{DEFVAR_LISP()} and related macros for declaring variables.
3532
3533
3534
3535 @example
3536 data.c
3537 floatfns.c
3538 fns.c
3539 @end example
3540
3541 These modules implement the methods and standard Lisp primitives for all
3542 the basic Lisp object types other than symbols (which are described
3543 above).  @file{data.c} contains all the predicates (primitives that return
3544 whether an object is of a particular type); the integer arithmetic
3545 functions; and the basic accessor and mutator primitives for the various
3546 object types.  @file{fns.c} contains all the standard predicates for working
3547 with sequences (where, abstractly speaking, a sequence is an ordered set
3548 of objects, and can be represented by a list, string, vector, or
3549 bit-vector); it also contains @code{equal}, perhaps on the grounds that
3550 bulk of the operation of @code{equal} is comparing sequences.
3551 @file{floatfns.c} contains methods and primitives for floats and floating-point
3552 arithmetic.
3553
3554
3555
3556 @example
3557 bytecode.c
3558 bytecode.h
3559 @end example
3560
3561 @file{bytecode.c} implements the byte-code interpreter and
3562 compiled-function objects, and @file{bytecode.h} contains associated
3563 structures.  Note that the byte-code @emph{compiler} is written in Lisp.
3564
3565
3566
3567
3568 @node Modules for Standard Editing Operations
3569 @section Modules for Standard Editing Operations
3570 @cindex modules for standard editing operations
3571 @cindex editing operations, modules for standard
3572
3573 @example
3574 buffer.c
3575 buffer.h
3576 bufslots.h
3577 @end example
3578
3579 @file{buffer.c} implements the @dfn{buffer} Lisp object type.  This
3580 includes functions that create and destroy buffers; retrieve buffers by
3581 name or by other properties; manipulate lists of buffers (remember that
3582 buffers are permanent objects and stored in various ordered lists);
3583 retrieve or change buffer properties; etc.  It also contains the
3584 definitions of all the built-in buffer-local variables (which can be
3585 viewed as buffer properties).  It does @emph{not} contain code to
3586 manipulate buffer-local variables (that's in @file{symbols.c}, described
3587 above); or code to manipulate the text in a buffer.
3588
3589 @file{buffer.h} defines the structures associated with a buffer and the various
3590 macros for retrieving text from a buffer and special buffer positions
3591 (e.g. @code{point}, the default location for text insertion).  It also
3592 contains macros for working with buffer positions and converting between
3593 their representations as character offsets and as byte offsets (under
3594 MULE, they are different, because characters can be multi-byte).  It is
3595 one of the largest header files.
3596
3597 @file{bufslots.h} defines the fields in the buffer structure that correspond to
3598 the built-in buffer-local variables.  It is its own header file because
3599 it is included many times in @file{buffer.c}, as a way of iterating over all
3600 the built-in buffer-local variables.
3601
3602
3603
3604 @example
3605 insdel.c
3606 insdel.h
3607 @end example
3608
3609 @file{insdel.c} contains low-level functions for inserting and deleting text in
3610 a buffer, keeping track of changed regions for use by redisplay, and
3611 calling any before-change and after-change functions that may have been
3612 registered for the buffer.  It also contains the actual functions that
3613 convert between byte offsets and character offsets.
3614
3615 @file{insdel.h} contains associated headers.
3616
3617
3618
3619 @example
3620 marker.c
3621 @end example
3622
3623 This module implements the @dfn{marker} Lisp object type, which
3624 conceptually is a pointer to a text position in a buffer that moves
3625 around as text is inserted and deleted, so as to remain in the same
3626 relative position.  This module doesn't actually move the markers around
3627 -- that's handled in @file{insdel.c}.  This module just creates them and
3628 implements the primitives for working with them.  As markers are simple
3629 objects, this does not entail much.
3630
3631 Note that the standard arithmetic primitives (e.g. @code{+}) accept
3632 markers in place of integers and automatically substitute the value of
3633 @code{marker-position} for the marker, i.e. an integer describing the
3634 current buffer position of the marker.
3635
3636
3637
3638 @example
3639 extents.c
3640 extents.h
3641 @end example
3642
3643 This module implements the @dfn{extent} Lisp object type, which is like
3644 a marker that works over a range of text rather than a single position.
3645 Extents are also much more complex and powerful than markers and have a
3646 more efficient (and more algorithmically complex) implementation.  The
3647 implementation is described in detail in comments in @file{extents.c}.
3648
3649 The code in @file{extents.c} works closely with @file{insdel.c} so that
3650 extents are properly moved around as text is inserted and deleted.
3651 There is also code in @file{extents.c} that provides information needed
3652 by the redisplay mechanism for efficient operation. (Remember that
3653 extents can have display properties that affect [sometimes drastically,
3654 as in the @code{invisible} property] the display of the text they
3655 cover.)
3656
3657
3658
3659 @example
3660 editfns.c
3661 @end example
3662
3663 @file{editfns.c} contains the standard Lisp primitives for working with
3664 a buffer's text, and calls the low-level functions in @file{insdel.c}.
3665 It also contains primitives for working with @code{point} (the default
3666 buffer insertion location).
3667
3668 @file{editfns.c} also contains functions for retrieving various
3669 characteristics from the external environment: the current time, the
3670 process ID of the running XEmacs process, the name of the user who ran
3671 this XEmacs process, etc.  It's not clear why this code is in
3672 @file{editfns.c}.
3673
3674
3675
3676 @example
3677 callint.c
3678 cmds.c
3679 commands.h
3680 @end example
3681
3682 @cindex interactive
3683 These modules implement the basic @dfn{interactive} commands,
3684 i.e. user-callable functions.  Commands, as opposed to other functions,
3685 have special ways of getting their parameters interactively (by querying
3686 the user), as opposed to having them passed in a normal function
3687 invocation.  Many commands are not really meant to be called from other
3688 Lisp functions, because they modify global state in a way that's often
3689 undesired as part of other Lisp functions.
3690
3691 @file{callint.c} implements the mechanism for querying the user for
3692 parameters and calling interactive commands.  The bulk of this module is
3693 code that parses the interactive spec that is supplied with an
3694 interactive command.
3695
3696 @file{cmds.c} implements the basic, most commonly used editing commands:
3697 commands to move around the current buffer and insert and delete
3698 characters.  These commands are implemented using the Lisp primitives
3699 defined in @file{editfns.c}.
3700
3701 @file{commands.h} contains associated structure definitions and prototypes.
3702
3703
3704
3705 @example
3706 regex.c
3707 regex.h
3708 search.c
3709 @end example
3710
3711 @file{search.c} implements the Lisp primitives for searching for text in
3712 a buffer, and some of the low-level algorithms for doing this.  In
3713 particular, the fast fixed-string Boyer-Moore search algorithm is
3714 implemented in @file{search.c}.  The low-level algorithms for doing
3715 regular-expression searching, however, are implemented in @file{regex.c}
3716 and @file{regex.h}.  These two modules are largely independent of
3717 XEmacs, and are similar to (and based upon) the regular-expression
3718 routines used in @file{grep} and other GNU utilities.
3719
3720
3721
3722 @example
3723 doprnt.c
3724 @end example
3725
3726 @file{doprnt.c} implements formatted-string processing, similar to
3727 @code{printf()} command in C.
3728
3729
3730
3731 @example
3732 undo.c
3733 @end example
3734
3735 This module implements the undo mechanism for tracking buffer changes.
3736 Most of this could be implemented in Lisp.
3737
3738
3739
3740 @node Editor-Level Control Flow Modules
3741 @section Editor-Level Control Flow Modules
3742 @cindex control flow modules, editor-level
3743 @cindex modules, editor-level control flow
3744
3745 @example
3746 event-Xt.c
3747 event-msw.c
3748 event-stream.c
3749 event-tty.c
3750 events-mod.h
3751 gpmevent.c
3752 gpmevent.h
3753 events.c
3754 events.h
3755 @end example
3756
3757 These implement the handling of events (user input and other system
3758 notifications).
3759
3760 @file{events.c} and @file{events.h} define the @dfn{event} Lisp object
3761 type and primitives for manipulating it.
3762
3763 @file{event-stream.c} implements the basic functions for working with
3764 event queues, dispatching an event by looking it up in relevant keymaps
3765 and such, and handling timeouts; this includes the primitives
3766 @code{next-event} and @code{dispatch-event}, as well as related
3767 primitives such as @code{sit-for}, @code{sleep-for}, and
3768 @code{accept-process-output}. (@file{event-stream.c} is one of the
3769 hairiest and trickiest modules in XEmacs.  Beware!  You can easily mess
3770 things up here.)
3771
3772 @file{event-Xt.c} and @file{event-tty.c} implement the low-level
3773 interfaces onto retrieving events from Xt (the X toolkit) and from TTY's
3774 (using @code{read()} and @code{select()}), respectively.  The event
3775 interface enforces a clean separation between the specific code for
3776 interfacing with the operating system and the generic code for working
3777 with events, by defining an API of basic, low-level event methods;
3778 @file{event-Xt.c} and @file{event-tty.c} are two different
3779 implementations of this API.  To add support for a new operating system
3780 (e.g. NeXTstep), one merely needs to provide another implementation of
3781 those API functions.
3782
3783 Note that the choice of whether to use @file{event-Xt.c} or
3784 @file{event-tty.c} is made at compile time!  Or at the very latest, it
3785 is made at startup time.  @file{event-Xt.c} handles events for
3786 @emph{both} X and TTY frames; @file{event-tty.c} is only used when X
3787 support is not compiled into XEmacs.  The reason for this is that there
3788 is only one event loop in XEmacs: thus, it needs to be able to receive
3789 events from all different kinds of frames.
3790
3791
3792
3793 @example
3794 keymap.c
3795 keymap.h
3796 @end example
3797
3798 @file{keymap.c} and @file{keymap.h} define the @dfn{keymap} Lisp object
3799 type and associated methods and primitives. (Remember that keymaps are
3800 objects that associate event descriptions with functions to be called to
3801 ``execute'' those events; @code{dispatch-event} looks up events in the
3802 relevant keymaps.)
3803
3804
3805
3806 @example
3807 cmdloop.c
3808 @end example
3809
3810 @file{cmdloop.c} contains functions that implement the actual editor
3811 command loop---i.e. the event loop that cyclically retrieves and
3812 dispatches events.  This code is also rather tricky, just like
3813 @file{event-stream.c}.
3814
3815
3816
3817 @example
3818 macros.c
3819 macros.h
3820 @end example
3821
3822 These two modules contain the basic code for defining keyboard macros.
3823 These functions don't actually do much; most of the code that handles keyboard
3824 macros is mixed in with the event-handling code in @file{event-stream.c}.
3825
3826
3827
3828 @example
3829 minibuf.c
3830 @end example
3831
3832 This contains some miscellaneous code related to the minibuffer (most of
3833 the minibuffer code was moved into Lisp by Richard Mlynarik).  This
3834 includes the primitives for completion (although filename completion is
3835 in @file{dired.c}), the lowest-level interface to the minibuffer (if the
3836 command loop were cleaned up, this too could be in Lisp), and code for
3837 dealing with the echo area (this, too, was mostly moved into Lisp, and
3838 the only code remaining is code to call out to Lisp or provide simple
3839 bootstrapping implementations early in temacs, before the echo-area Lisp
3840 code is loaded).
3841
3842
3843
3844 @node Modules for the Basic Displayable Lisp Objects
3845 @section Modules for the Basic Displayable Lisp Objects
3846 @cindex modules for the basic displayable Lisp objects
3847 @cindex displayable Lisp objects, modules for the basic
3848 @cindex Lisp objects, modules for the basic displayable
3849 @cindex objects, modules for the basic displayable Lisp
3850
3851 @example
3852 console-msw.c
3853 console-msw.h
3854 console-stream.c
3855 console-stream.h
3856 console-tty.c
3857 console-tty.h
3858 console-x.c
3859 console-x.h
3860 console.c
3861 console.h
3862 @end example
3863
3864 These modules implement the @dfn{console} Lisp object type.  A console
3865 contains multiple display devices, but only one keyboard and mouse.
3866 Most of the time, a console will contain exactly one device.
3867
3868 Consoles are the top of a lisp object inclusion hierarchy.  Consoles
3869 contain devices, which contain frames, which contain windows.
3870
3871
3872
3873 @example
3874 device-msw.c
3875 device-tty.c
3876 device-x.c
3877 device.c
3878 device.h
3879 @end example
3880
3881 These modules implement the @dfn{device} Lisp object type.  This
3882 abstracts a particular screen or connection on which frames are
3883 displayed.  As with Lisp objects, event interfaces, and other
3884 subsystems, the device code is separated into a generic component that
3885 contains a standardized interface (in the form of a set of methods) onto
3886 particular device types.
3887
3888 The device subsystem defines all the methods and provides method
3889 services for not only device operations but also for the frame, window,
3890 menubar, scrollbar, toolbar, and other displayable-object subsystems.
3891 The reason for this is that all of these subsystems have the same
3892 subtypes (X, TTY, NeXTstep, Microsoft Windows, etc.) as devices do.
3893
3894
3895
3896 @example
3897 frame-msw.c
3898 frame-tty.c
3899 frame-x.c
3900 frame.c
3901 frame.h
3902 @end example
3903
3904 Each device contains one or more frames in which objects (e.g. text) are
3905 displayed.  A frame corresponds to a window in the window system;
3906 usually this is a top-level window but it could potentially be one of a
3907 number of overlapping child windows within a top-level window, using the
3908 MDI (Multiple Document Interface) protocol in Microsoft Windows or a
3909 similar scheme.
3910
3911 The @file{frame-*} files implement the @dfn{frame} Lisp object type and
3912 provide the generic and device-type-specific operations on frames
3913 (e.g. raising, lowering, resizing, moving, etc.).
3914
3915
3916
3917 @example
3918 window.c
3919 window.h
3920 @end example
3921
3922 @cindex window (in Emacs)
3923 @cindex pane
3924 Each frame consists of one or more non-overlapping @dfn{windows} (better
3925 known as @dfn{panes} in standard window-system terminology) in which a
3926 buffer's text can be displayed.  Windows can also have scrollbars
3927 displayed around their edges.
3928
3929 @file{window.c} and @file{window.h} implement the @dfn{window} Lisp
3930 object type and provide code to manage windows.  Since windows have no
3931 associated resources in the window system (the window system knows only
3932 about the frame; no child windows or anything are used for XEmacs
3933 windows), there is no device-type-specific code here; all of that code
3934 is part of the redisplay mechanism or the code for particular object
3935 types such as scrollbars.
3936
3937
3938
3939 @node Modules for other Display-Related Lisp Objects
3940 @section Modules for other Display-Related Lisp Objects
3941 @cindex modules for other display-related Lisp objects
3942 @cindex display-related Lisp objects, modules for other
3943 @cindex Lisp objects, modules for other display-related
3944
3945 @example
3946 faces.c
3947 faces.h
3948 @end example
3949
3950
3951
3952 @example
3953 bitmaps.h
3954 glyphs-eimage.c
3955 glyphs-msw.c
3956 glyphs-msw.h
3957 glyphs-widget.c
3958 glyphs-x.c
3959 glyphs-x.h
3960 glyphs.c
3961 glyphs.h
3962 @end example
3963
3964
3965
3966 @example
3967 objects-msw.c
3968 objects-msw.h
3969 objects-tty.c
3970 objects-tty.h
3971 objects-x.c
3972 objects-x.h
3973 objects.c
3974 objects.h
3975 @end example
3976
3977
3978
3979 @example
3980 menubar-msw.c
3981 menubar-msw.h
3982 menubar-x.c
3983 menubar.c
3984 menubar.h
3985 @end example
3986
3987
3988
3989 @example
3990 scrollbar-msw.c
3991 scrollbar-msw.h
3992 scrollbar-x.c
3993 scrollbar-x.h
3994 scrollbar.c
3995 scrollbar.h
3996 @end example
3997
3998
3999
4000 @example
4001 toolbar-msw.c
4002 toolbar-x.c
4003 toolbar.c
4004 toolbar.h
4005 @end example
4006
4007
4008
4009 @example
4010 font-lock.c
4011 @end example
4012
4013 This file provides C support for syntax highlighting---i.e.
4014 highlighting different syntactic constructs of a source file in
4015 different colors, for easy reading.  The C support is provided so that
4016 this is fast.
4017
4018
4019
4020 @example
4021 dgif_lib.c
4022 gif_err.c
4023 gif_lib.h
4024 gifalloc.c
4025 @end example
4026
4027 These modules decode GIF-format image files, for use with glyphs.
4028 These files were removed due to Unisys patent infringement concerns.
4029
4030
4031
4032 @node Modules for the Redisplay Mechanism
4033 @section Modules for the Redisplay Mechanism
4034 @cindex modules for the redisplay mechanism
4035 @cindex redisplay mechanism, modules for the
4036
4037 @example
4038 redisplay-output.c
4039 redisplay-msw.c
4040 redisplay-tty.c
4041 redisplay-x.c
4042 redisplay.c
4043 redisplay.h
4044 @end example
4045
4046 These files provide the redisplay mechanism.  As with many other
4047 subsystems in XEmacs, there is a clean separation between the general
4048 and device-specific support.
4049
4050 @file{redisplay.c} contains the bulk of the redisplay engine.  These
4051 functions update the redisplay structures (which describe how the screen
4052 is to appear) to reflect any changes made to the state of any
4053 displayable objects (buffer, frame, window, etc.) since the last time
4054 that redisplay was called.  These functions are highly optimized to
4055 avoid doing more work than necessary (since redisplay is called
4056 extremely often and is potentially a huge time sink), and depend heavily
4057 on notifications from the objects themselves that changes have occurred,
4058 so that redisplay doesn't explicitly have to check each possible object.
4059 The redisplay mechanism also contains a great deal of caching to further
4060 speed things up; some of this caching is contained within the various
4061 displayable objects.
4062
4063 @file{redisplay-output.c} goes through the redisplay structures and converts
4064 them into calls to device-specific methods to actually output the screen
4065 changes.
4066
4067 @file{redisplay-x.c} and @file{redisplay-tty.c} are two implementations
4068 of these redisplay output methods, for X frames and TTY frames,
4069 respectively.
4070
4071
4072
4073 @example
4074 indent.c
4075 @end example
4076
4077 This module contains various functions and Lisp primitives for
4078 converting between buffer positions and screen positions.  These
4079 functions call the redisplay mechanism to do most of the work, and then
4080 examine the redisplay structures to get the necessary information.  This
4081 module needs work.
4082
4083
4084
4085 @example
4086 termcap.c
4087 terminfo.c
4088 tparam.c
4089 @end example
4090
4091 These files contain functions for working with the termcap (BSD-style)
4092 and terminfo (System V style) databases of terminal capabilities and
4093 escape sequences, used when XEmacs is displaying in a TTY.
4094
4095
4096
4097 @example
4098 cm.c
4099 cm.h
4100 @end example
4101
4102 These files provide some miscellaneous TTY-output functions and should
4103 probably be merged into @file{redisplay-tty.c}.
4104
4105
4106
4107 @node Modules for Interfacing with the File System
4108 @section Modules for Interfacing with the File System
4109 @cindex modules for interfacing with the file system
4110 @cindex interfacing with the file system, modules for
4111 @cindex file system, modules for interfacing with the
4112
4113 @example
4114 lstream.c
4115 lstream.h
4116 @end example
4117
4118 These modules implement the @dfn{stream} Lisp object type.  This is an
4119 internal-only Lisp object that implements a generic buffering stream.
4120 The idea is to provide a uniform interface onto all sources and sinks of
4121 data, including file descriptors, stdio streams, chunks of memory, Lisp
4122 buffers, Lisp strings, etc.  That way, I/O functions can be written to
4123 the stream interface and can transparently handle all possible sources
4124 and sinks.  (For example, the @code{read} function can read data from a
4125 file, a string, a buffer, or even a function that is called repeatedly
4126 to return data, without worrying about where the data is coming from or
4127 what-size chunks it is returned in.)
4128
4129 @cindex lstream
4130 Note that in the C code, streams are called @dfn{lstreams} (for ``Lisp
4131 streams'') to distinguish them from other kinds of streams, e.g. stdio
4132 streams and C++ I/O streams.
4133
4134 Similar to other subsystems in XEmacs, lstreams are separated into
4135 generic functions and a set of methods for the different types of
4136 lstreams.  @file{lstream.c} provides implementations of many different
4137 types of streams; others are provided, e.g., in @file{file-coding.c}.
4138
4139
4140
4141 @example
4142 fileio.c
4143 @end example
4144
4145 This implements the basic primitives for interfacing with the file
4146 system.  This includes primitives for reading files into buffers,
4147 writing buffers into files, checking for the presence or accessibility
4148 of files, canonicalizing file names, etc.  Note that these primitives
4149 are usually not invoked directly by the user: There is a great deal of
4150 higher-level Lisp code that implements the user commands such as
4151 @code{find-file} and @code{save-buffer}.  This is similar to the
4152 distinction between the lower-level primitives in @file{editfns.c} and
4153 the higher-level user commands in @file{commands.c} and
4154 @file{simple.el}.
4155
4156
4157
4158 @example
4159 filelock.c
4160 @end example
4161
4162 This file provides functions for detecting clashes between different
4163 processes (e.g. XEmacs and some external process, or two different
4164 XEmacs processes) modifying the same file.  (XEmacs can optionally use
4165 the @file{lock/} subdirectory to provide a form of ``locking'' between
4166 different XEmacs processes.)  This module is also used by the low-level
4167 functions in @file{insdel.c} to ensure that, if the first modification
4168 is being made to a buffer whose corresponding file has been externally
4169 modified, the user is made aware of this so that the buffer can be
4170 synched up with the external changes if necessary.
4171
4172
4173 @example
4174 filemode.c
4175 @end example
4176
4177 This file provides some miscellaneous functions that construct a
4178 @samp{rwxr-xr-x}-type permissions string (as might appear in an
4179 @file{ls}-style directory listing) given the information returned by the
4180 @code{stat()} system call.
4181
4182
4183
4184 @example
4185 dired.c
4186 ndir.h
4187 @end example
4188
4189 These files implement the XEmacs interface to directory searching.  This
4190 includes a number of primitives for determining the files in a directory
4191 and for doing filename completion. (Remember that generic completion is
4192 handled by a different mechanism, in @file{minibuf.c}.)
4193
4194 @file{ndir.h} is a header file used for the directory-searching
4195 emulation functions provided in @file{sysdep.c} (see section J below),
4196 for systems that don't provide any directory-searching functions. (On
4197 those systems, directories can be read directly as files, and parsed.)
4198
4199
4200
4201 @example
4202 realpath.c
4203 @end example
4204
4205 This file provides an implementation of the @code{realpath()} function
4206 for expanding symbolic links, on systems that don't implement it or have
4207 a broken implementation.
4208
4209
4210
4211 @node Modules for Other Aspects of the Lisp Interpreter and Object System
4212 @section Modules for Other Aspects of the Lisp Interpreter and Object System
4213 @cindex modules for other aspects of the Lisp interpreter and object system
4214 @cindex Lisp interpreter and object system, modules for other aspects of the
4215 @cindex interpreter and object system, modules for other aspects of the Lisp
4216 @cindex object system, modules for other aspects of the Lisp interpreter and
4217
4218 @example
4219 elhash.c
4220 elhash.h
4221 hash.c
4222 hash.h
4223 @end example
4224
4225 These files provide two implementations of hash tables.  Files
4226 @file{hash.c} and @file{hash.h} provide a generic C implementation of
4227 hash tables which can stand independently of XEmacs.  Files
4228 @file{elhash.c} and @file{elhash.h} provide a separate implementation of
4229 hash tables that can store only Lisp objects, and knows about Lispy
4230 things like garbage collection, and implement the @dfn{hash-table} Lisp
4231 object type.
4232
4233
4234 @example
4235 specifier.c
4236 specifier.h
4237 @end example
4238
4239 This module implements the @dfn{specifier} Lisp object type.  This is
4240 primarily used for displayable properties, and allows for values that
4241 are specific to a particular buffer, window, frame, device, or device
4242 class, as well as a default value existing.  This is used, for example,
4243 to control the height of the horizontal scrollbar or the appearance of
4244 the @code{default}, @code{bold}, or other faces.  The specifier object
4245 consists of a number of specifications, each of which maps from a
4246 buffer, window, etc. to a value.  The function @code{specifier-instance}
4247 looks up a value given a window (from which a buffer, frame, and device
4248 can be derived).
4249
4250
4251 @example
4252 chartab.c
4253 chartab.h
4254 casetab.c
4255 @end example
4256
4257 @file{chartab.c} and @file{chartab.h} implement the @dfn{char table}
4258 Lisp object type, which maps from characters or certain sorts of
4259 character ranges to Lisp objects.  The implementation of this object
4260 type is optimized for the internal representation of characters.  Char
4261 tables come in different types, which affect the allowed object types to
4262 which a character can be mapped and also dictate certain other
4263 properties of the char table.
4264
4265 @cindex case table
4266 @file{casetab.c} implements one sort of char table, the @dfn{case
4267 table}, which maps characters to other characters of possibly different
4268 case.  These are used by XEmacs to implement case-changing primitives
4269 and to do case-insensitive searching.
4270
4271
4272
4273 @example
4274 syntax.c
4275 syntax.h
4276 @end example
4277
4278 @cindex scanner
4279 This module implements @dfn{syntax tables}, another sort of char table
4280 that maps characters into syntax classes that define the syntax of these
4281 characters (e.g. a parenthesis belongs to a class of @samp{open}
4282 characters that have corresponding @samp{close} characters and can be
4283 nested).  This module also implements the Lisp @dfn{scanner}, a set of
4284 primitives for scanning over text based on syntax tables.  This is used,
4285 for example, to find the matching parenthesis in a command such as
4286 @code{forward-sexp}, and by @file{font-lock.c} to locate quoted strings,
4287 comments, etc.
4288
4289
4290
4291 @example
4292 casefiddle.c
4293 @end example
4294
4295 This module implements various Lisp primitives for upcasing, downcasing
4296 and capitalizing strings or regions of buffers.
4297
4298
4299
4300 @example
4301 rangetab.c
4302 @end example
4303
4304 This module implements the @dfn{range table} Lisp object type, which
4305 provides for a mapping from ranges of integers to arbitrary Lisp
4306 objects.
4307
4308
4309
4310 @example
4311 opaque.c
4312 opaque.h
4313 @end example
4314
4315 This module implements the @dfn{opaque} Lisp object type, an
4316 internal-only Lisp object that encapsulates an arbitrary block of memory
4317 so that it can be managed by the Lisp allocation system.  To create an
4318 opaque object, you call @code{make_opaque()}, passing a pointer to a
4319 block of memory.  An object is created that is big enough to hold the
4320 memory, which is copied into the object's storage.  The object will then
4321 stick around as long as you keep pointers to it, after which it will be
4322 automatically reclaimed.
4323
4324 @cindex mark method
4325 Opaque objects can also have an arbitrary @dfn{mark method} associated
4326 with them, in case the block of memory contains other Lisp objects that
4327 need to be marked for garbage-collection purposes. (If you need other
4328 object methods, such as a finalize method, you should just go ahead and
4329 create a new Lisp object type---it's not hard.)
4330
4331
4332
4333 @example
4334 abbrev.c
4335 @end example
4336
4337 This function provides a few primitives for doing dynamic abbreviation
4338 expansion.  In XEmacs, most of the code for this has been moved into
4339 Lisp.  Some C code remains for speed and because the primitive
4340 @code{self-insert-command} (which is executed for all self-inserting
4341 characters) hooks into the abbrev mechanism. (@code{self-insert-command}
4342 is itself in C only for speed.)
4343
4344
4345
4346 @example
4347 doc.c
4348 @end example
4349
4350 This function provides primitives for retrieving the documentation
4351 strings of functions and variables.  These documentation strings contain
4352 certain special markers that get dynamically expanded (e.g. a
4353 reverse-lookup is performed on some named functions to retrieve their
4354 current key bindings).  Some documentation strings (in particular, for
4355 the built-in primitives and pre-loaded Lisp functions) are stored
4356 externally in a file @file{DOC} in the @file{lib-src/} directory and
4357 need to be fetched from that file. (Part of the build stage involves
4358 building this file, and another part involves constructing an index for
4359 this file and embedding it into the executable, so that the functions in
4360 @file{doc.c} do not have to search the entire @file{DOC} file to find
4361 the appropriate documentation string.)
4362
4363
4364
4365 @example
4366 md5.c
4367 @end example
4368
4369 This function provides a Lisp primitive that implements the MD5 secure
4370 hashing scheme, used to create a large hash value of a string of data such that
4371 the data cannot be derived from the hash value.  This is used for
4372 various security applications on the Internet.
4373
4374
4375
4376
4377 @node Modules for Interfacing with the Operating System
4378 @section Modules for Interfacing with the Operating System
4379 @cindex modules for interfacing with the operating system
4380 @cindex interfacing with the operating system, modules for
4381 @cindex operating system, modules for interfacing with the
4382
4383 @example
4384 callproc.c
4385 process.c
4386 process.h
4387 @end example
4388
4389 These modules allow XEmacs to spawn and communicate with subprocesses
4390 and network connections.
4391
4392 @cindex synchronous subprocesses
4393 @cindex subprocesses, synchronous
4394   @file{callproc.c} implements (through the @code{call-process}
4395 primitive) what are called @dfn{synchronous subprocesses}.  This means
4396 that XEmacs runs a program, waits till it's done, and retrieves its
4397 output.  A typical example might be calling the @file{ls} program to get
4398 a directory listing.
4399
4400 @cindex asynchronous subprocesses
4401 @cindex subprocesses, asynchronous
4402   @file{process.c} and @file{process.h} implement @dfn{asynchronous
4403 subprocesses}.  This means that XEmacs starts a program and then
4404 continues normally, not waiting for the process to finish.  Data can be
4405 sent to the process or retrieved from it as it's running.  This is used
4406 for the @code{shell} command (which provides a front end onto a shell
4407 program such as @file{csh}), the mail and news readers implemented in
4408 XEmacs, etc.  The result of calling @code{start-process} to start a
4409 subprocess is a process object, a particular kind of object used to
4410 communicate with the subprocess.  You can send data to the process by
4411 passing the process object and the data to @code{send-process}, and you
4412 can specify what happens to data retrieved from the process by setting
4413 properties of the process object. (When the process sends data, XEmacs
4414 receives a process event, which says that there is data ready.  When
4415 @code{dispatch-event} is called on this event, it reads the data from
4416 the process and does something with it, as specified by the process
4417 object's properties.  Typically, this means inserting the data into a
4418 buffer or calling a function.) Another property of the process object is
4419 called the @dfn{sentinel}, which is a function that is called when the
4420 process terminates.
4421
4422 @cindex network connections
4423   Process objects are also used for network connections (connections to a
4424 process running on another machine).  Network connections are started
4425 with @code{open-network-stream} but otherwise work just like
4426 subprocesses.
4427
4428
4429
4430 @example
4431 sysdep.c
4432 sysdep.h
4433 @end example
4434
4435   These modules implement most of the low-level, messy operating-system
4436 interface code.  This includes various device control (ioctl) operations
4437 for file descriptors, TTY's, pseudo-terminals, etc. (usually this stuff
4438 is fairly system-dependent; thus the name of this module), and emulation
4439 of standard library functions and system calls on systems that don't
4440 provide them or have broken versions.
4441
4442
4443
4444 @example
4445 sysdir.h
4446 sysfile.h
4447 sysfloat.h
4448 sysproc.h
4449 syspwd.h
4450 syssignal.h
4451 systime.h
4452 systty.h
4453 syswait.h
4454 @end example
4455
4456 These header files provide consistent interfaces onto system-dependent
4457 header files and system calls.  The idea is that, instead of including a
4458 standard header file like @file{<sys/param.h>} (which may or may not
4459 exist on various systems) or having to worry about whether all system
4460 provide a particular preprocessor constant, or having to deal with the
4461 four different paradigms for manipulating signals, you just include the
4462 appropriate @file{sys*.h} header file, which includes all the right
4463 system header files, defines and missing preprocessor constants,
4464 provides a uniform interface onto system calls, etc.
4465
4466 @file{sysdir.h} provides a uniform interface onto directory-querying
4467 functions. (In some cases, this is in conjunction with emulation
4468 functions in @file{sysdep.c}.)
4469
4470 @file{sysfile.h} includes all the necessary header files for standard
4471 system calls (e.g. @code{read()}), ensures that all necessary
4472 @code{open()} and @code{stat()} preprocessor constants are defined, and
4473 possibly (usually) substitutes sugared versions of @code{read()},
4474 @code{write()}, etc. that automatically restart interrupted I/O
4475 operations.
4476
4477 @file{sysfloat.h} includes the necessary header files for floating-point
4478 operations.
4479
4480 @file{sysproc.h} includes the necessary header files for calling
4481 @code{select()}, @code{fork()}, @code{execve()}, socket operations, and
4482 the like, and ensures that the @code{FD_*()} macros for descriptor-set
4483 manipulations are available.
4484
4485 @file{syspwd.h} includes the necessary header files for obtaining
4486 information from @file{/etc/passwd} (the functions are emulated under
4487 VMS).
4488
4489 @file{syssignal.h} includes the necessary header files for
4490 signal-handling and provides a uniform interface onto the different
4491 signal-handling and signal-blocking paradigms.
4492
4493 @file{systime.h} includes the necessary header files and provides
4494 uniform interfaces for retrieving the time of day, setting file
4495 access/modification times, getting the amount of time used by the XEmacs
4496 process, etc.
4497
4498 @file{systty.h} buffers against the infinitude of different ways of
4499 controlling TTY's.
4500
4501 @file{syswait.h} provides a uniform way of retrieving the exit status
4502 from a @code{wait()}ed-on process (some systems use a union, others use
4503 an int).
4504
4505
4506
4507 @example
4508 hpplay.c
4509 libsst.c
4510 libsst.h
4511 libst.h
4512 linuxplay.c
4513 nas.c
4514 sgiplay.c
4515 sound.c
4516 sunplay.c
4517 @end example
4518
4519 These files implement the ability to play various sounds on some types
4520 of computers.  You have to configure your XEmacs with sound support in
4521 order to get this capability.
4522
4523 @file{sound.c} provides the generic interface.  It implements various
4524 Lisp primitives and variables that let you specify which sounds should
4525 be played in certain conditions. (The conditions are identified by
4526 symbols, which are passed to @code{ding} to make a sound.  Various
4527 standard functions call this function at certain times; if sound support
4528 does not exist, a simple beep results.
4529
4530 @cindex native sound
4531 @cindex sound, native
4532 @file{sgiplay.c}, @file{sunplay.c}, @file{hpplay.c}, and
4533 @file{linuxplay.c} interface to the machine's speaker for various
4534 different kind of machines.  This is called @dfn{native} sound.
4535
4536 @cindex sound, network
4537 @cindex network sound
4538 @cindex NAS
4539 @file{nas.c} interfaces to a computer somewhere else on the network
4540 using the NAS (Network Audio Server) protocol, playing sounds on that
4541 machine.  This allows you to run XEmacs on a remote machine, with its
4542 display set to your local machine, and have the sounds be made on your
4543 local machine, provided that you have a NAS server running on your local
4544 machine.
4545
4546 @file{libsst.c}, @file{libsst.h}, and @file{libst.h} provide some
4547 additional functions for playing sound on a Sun SPARC but are not
4548 currently in use.
4549
4550
4551
4552 @example
4553 tooltalk.c
4554 tooltalk.h
4555 @end example
4556
4557 These two modules implement an interface to the ToolTalk protocol, which
4558 is an interprocess communication protocol implemented on some versions
4559 of Unix.  ToolTalk is a high-level protocol that allows processes to
4560 register themselves as providers of particular services; other processes
4561 can then request a service without knowing or caring exactly who is
4562 providing the service.  It is similar in spirit to the DDE protocol
4563 provided under Microsoft Windows.  ToolTalk is a part of the new CDE
4564 (Common Desktop Environment) specification and is used to connect the
4565 parts of the SPARCWorks development environment.
4566
4567
4568
4569 @example
4570 getloadavg.c
4571 @end example
4572
4573 This module provides the ability to retrieve the system's current load
4574 average. (The way to do this is highly system-specific, unfortunately,
4575 and requires a lot of special-case code.)
4576
4577
4578
4579 @example
4580 sunpro.c
4581 @end example
4582
4583 This module provides a small amount of code used internally at Sun to
4584 keep statistics on the usage of XEmacs.
4585
4586
4587
4588 @example
4589 broken-sun.h
4590 strcmp.c
4591 strcpy.c
4592 sunOS-fix.c
4593 @end example
4594
4595 These files provide replacement functions and prototypes to fix numerous
4596 bugs in early releases of SunOS 4.1.
4597
4598
4599
4600 @example
4601 hftctl.c
4602 @end example
4603
4604 This module provides some terminal-control code necessary on versions of
4605 AIX prior to 4.1.
4606
4607
4608
4609 @node Modules for Interfacing with X Windows
4610 @section Modules for Interfacing with X Windows
4611 @cindex modules for interfacing with X Windows
4612 @cindex interfacing with X Windows, modules for
4613 @cindex X Windows, modules for interfacing with
4614
4615 @example
4616 Emacs.ad.h
4617 @end example
4618
4619 A file generated from @file{Emacs.ad}, which contains XEmacs-supplied
4620 fallback resources (so that XEmacs has pretty defaults).
4621
4622
4623
4624 @example
4625 EmacsFrame.c
4626 EmacsFrame.h
4627 EmacsFrameP.h
4628 @end example
4629
4630 These modules implement an Xt widget class that encapsulates a frame.
4631 This is for ease in integrating with Xt.  The EmacsFrame widget covers
4632 the entire X window except for the menubar; the scrollbars are
4633 positioned on top of the EmacsFrame widget.
4634
4635 @strong{Warning:} Abandon hope, all ye who enter here.  This code took
4636 an ungodly amount of time to get right, and is likely to fall apart
4637 mercilessly at the slightest change.  Such is life under Xt.
4638
4639
4640
4641 @example
4642 EmacsManager.c
4643 EmacsManager.h
4644 EmacsManagerP.h
4645 @end example
4646
4647 These modules implement a simple Xt manager (i.e. composite) widget
4648 class that simply lets its children set whatever geometry they want.
4649 It's amazing that Xt doesn't provide this standardly, but on second
4650 thought, it makes sense, considering how amazingly broken Xt is.
4651
4652
4653 @example
4654 EmacsShell-sub.c
4655 EmacsShell.c
4656 EmacsShell.h
4657 EmacsShellP.h
4658 @end example
4659
4660 These modules implement two Xt widget classes that are subclasses of
4661 the TopLevelShell and TransientShell classes.  This is necessary to deal
4662 with more brokenness that Xt has sadistically thrust onto the backs of
4663 developers.
4664
4665
4666
4667 @example
4668 xgccache.c
4669 xgccache.h
4670 @end example
4671
4672 These modules provide functions for maintenance and caching of GC's
4673 (graphics contexts) under the X Window System.  This code is junky and
4674 needs to be rewritten.
4675
4676
4677
4678 @example
4679 select-msw.c
4680 select-x.c
4681 select.c
4682 select.h
4683 @end example
4684
4685 @cindex selections
4686   This module provides an interface to the X Window System's concept of
4687 @dfn{selections}, the standard way for X applications to communicate
4688 with each other.
4689
4690
4691
4692 @example
4693 xintrinsic.h
4694 xintrinsicp.h
4695 xmmanagerp.h
4696 xmprimitivep.h
4697 @end example
4698
4699 These header files are similar in spirit to the @file{sys*.h} files and buffer
4700 against different implementations of Xt and Motif.
4701
4702 @itemize @bullet
4703 @item
4704 @file{xintrinsic.h} should be included in place of @file{<Intrinsic.h>}.
4705 @item
4706 @file{xintrinsicp.h} should be included in place of @file{<IntrinsicP.h>}.
4707 @item
4708 @file{xmmanagerp.h} should be included in place of @file{<XmManagerP.h>}.
4709 @item
4710 @file{xmprimitivep.h} should be included in place of @file{<XmPrimitiveP.h>}.
4711 @end itemize
4712
4713
4714
4715 @example
4716 xmu.c
4717 xmu.h
4718 @end example
4719
4720 These files provide an emulation of the Xmu library for those systems
4721 (i.e. HPUX) that don't provide it as a standard part of X.
4722
4723
4724
4725 @example
4726 ExternalClient-Xlib.c
4727 ExternalClient.c
4728 ExternalClient.h
4729 ExternalClientP.h
4730 ExternalShell.c
4731 ExternalShell.h
4732 ExternalShellP.h
4733 extw-Xlib.c
4734 extw-Xlib.h
4735 extw-Xt.c
4736 extw-Xt.h
4737 @end example
4738
4739 @cindex external widget
4740   These files provide the @dfn{external widget} interface, which allows an
4741 XEmacs frame to appear as a widget in another application.  To do this,
4742 you have to configure with @samp{--external-widget}.
4743
4744 @file{ExternalShell*} provides the server (XEmacs) side of the
4745 connection.
4746
4747 @file{ExternalClient*} provides the client (other application) side of
4748 the connection.  These files are not compiled into XEmacs but are
4749 compiled into libraries that are then linked into your application.
4750
4751 @file{extw-*} is common code that is used for both the client and server.
4752
4753 Don't touch this code; something is liable to break if you do.
4754
4755
4756
4757 @node Modules for Internationalization
4758 @section Modules for Internationalization
4759 @cindex modules for internationalization
4760 @cindex internationalization, modules for
4761
4762 @example
4763 mule-canna.c
4764 mule-ccl.c
4765 mule-charset.c
4766 mule-charset.h
4767 file-coding.c
4768 file-coding.h
4769 mule-mcpath.c
4770 mule-mcpath.h
4771 mule-wnnfns.c
4772 mule.c
4773 @end example
4774
4775 These files implement the MULE (Asian-language) support.  Note that MULE
4776 actually provides a general interface for all sorts of languages, not
4777 just Asian languages (although they are generally the most complicated
4778 to support).  This code is still in beta.
4779
4780 @file{mule-charset.*} and @file{file-coding.*} provide the heart of the
4781 XEmacs MULE support.  @file{mule-charset.*} implements the @dfn{charset}
4782 Lisp object type, which encapsulates a character set (an ordered one- or
4783 two-dimensional set of characters, such as US ASCII or JISX0208 Japanese
4784 Kanji).
4785
4786 @file{file-coding.*} implements the @dfn{coding-system} Lisp object
4787 type, which encapsulates a method of converting between different
4788 encodings.  An encoding is a representation of a stream of characters,
4789 possibly from multiple character sets, using a stream of bytes or words,
4790 and defines (e.g.) which escape sequences are used to specify particular
4791 character sets, how the indices for a character are converted into bytes
4792 (sometimes this involves setting the high bit; sometimes complicated
4793 rearranging of the values takes place, as in the Shift-JIS encoding),
4794 etc.
4795
4796 @file{mule-ccl.c} provides the CCL (Code Conversion Language)
4797 interpreter.  CCL is similar in spirit to Lisp byte code and is used to
4798 implement converters for custom encodings.
4799
4800 @file{mule-canna.c} and @file{mule-wnnfns.c} implement interfaces to
4801 external programs used to implement the Canna and WNN input methods,
4802 respectively.  This is currently in beta.
4803
4804 @file{mule-mcpath.c} provides some functions to allow for pathnames
4805 containing extended characters.  This code is fragmentary, obsolete, and
4806 completely non-working.  Instead, @var{pathname-coding-system} is used
4807 to specify conversions of names of files and directories.  The standard
4808 C I/O functions like @samp{open()} are wrapped so that conversion occurs
4809 automatically.
4810
4811 @file{mule.c} provides a few miscellaneous things that should probably
4812 be elsewhere.
4813
4814
4815
4816 @example
4817 intl.c
4818 @end example
4819
4820 This provides some miscellaneous internationalization code for
4821 implementing message translation and interfacing to the Ximp input
4822 method.  None of this code is currently working.
4823
4824
4825
4826 @example
4827 iso-wide.h
4828 @end example
4829
4830 This contains leftover code from an earlier implementation of
4831 Asian-language support, and is not currently used.
4832
4833
4834
4835
4836 @node Allocation of Objects in XEmacs Lisp, Dumping, A Summary of the Various XEmacs Modules, Top
4837 @chapter Allocation of Objects in XEmacs Lisp
4838 @cindex allocation of objects in XEmacs Lisp
4839 @cindex objects in XEmacs Lisp, allocation of
4840 @cindex Lisp objects, allocation of in XEmacs
4841
4842 @menu
4843 * Introduction to Allocation::
4844 * Garbage Collection::
4845 * GCPROing::
4846 * Garbage Collection - Step by Step::
4847 * Integers and Characters::
4848 * Allocation from Frob Blocks::
4849 * lrecords::
4850 * Low-level allocation::
4851 * Cons::
4852 * Vector::
4853 * Bit Vector::
4854 * Symbol::
4855 * Marker::
4856 * String::
4857 * Compiled Function::
4858 @end menu
4859
4860 @node Introduction to Allocation
4861 @section Introduction to Allocation
4862 @cindex allocation, introduction to
4863
4864   Emacs Lisp, like all Lisps, has garbage collection.  This means that
4865 the programmer never has to explicitly free (destroy) an object; it
4866 happens automatically when the object becomes inaccessible.  Most
4867 experts agree that garbage collection is a necessity in a modern,
4868 high-level language.  Its omission from C stems from the fact that C was
4869 originally designed to be a nice abstract layer on top of assembly
4870 language, for writing kernels and basic system utilities rather than
4871 large applications.
4872
4873   Lisp objects can be created by any of a number of Lisp primitives.
4874 Most object types have one or a small number of basic primitives
4875 for creating objects.  For conses, the basic primitive is @code{cons};
4876 for vectors, the primitives are @code{make-vector} and @code{vector}; for
4877 symbols, the primitives are @code{make-symbol} and @code{intern}; etc.
4878 Some Lisp objects, especially those that are primarily used internally,
4879 have no corresponding Lisp primitives.  Every Lisp object, though,
4880 has at least one C primitive for creating it.
4881
4882   Recall from section (VII) that a Lisp object, as stored in a 32-bit or
4883 64-bit word, has a few tag bits, and a ``value'' that occupies the
4884 remainder of the bits.  We can separate the different Lisp object types
4885 into three broad categories:
4886
4887 @itemize @bullet
4888 @item
4889 (a) Those for whom the value directly represents the contents of the
4890 Lisp object.  Only two types are in this category: integers and
4891 characters.  No special allocation or garbage collection is necessary
4892 for such objects.  Lisp objects of these types do not need to be
4893 @code{GCPRO}ed.
4894 @end itemize
4895
4896   In the remaining two categories, the type is stored in the object
4897 itself.  The tag for all such objects is the generic @dfn{lrecord}
4898 (Lisp_Type_Record) tag.  The first bytes of the object's structure are an
4899 integer (actually a char) characterising the object's type and some
4900 flags, in particular the mark bit used for garbage collection.  A
4901 structure describing the type is accessible thru the
4902 lrecord_implementation_table indexed with said integer.  This structure
4903 includes the method pointers and a pointer to a string naming the type.
4904
4905 @itemize @bullet
4906 @item
4907 (b) Those lrecords that are allocated in frob blocks (see above).  This
4908 includes the objects that are most common and relatively small, and
4909 includes conses, strings, subrs, floats, compiled functions, symbols,
4910 extents, events, and markers.  With the cleanup of frob blocks done in
4911 19.12, it's not terribly hard to add more objects to this category, but
4912 it's a bit trickier than adding an object type to type (c) (esp. if the
4913 object needs a finalization method), and is not likely to save much
4914 space unless the object is small and there are many of them. (In fact,
4915 if there are very few of them, it might actually waste space.)
4916 @item
4917 (c) Those lrecords that are individually @code{malloc()}ed.  These are
4918 called @dfn{lcrecords}.  All other types are in this category.  Adding a
4919 new type to this category is comparatively easy, and all types added
4920 since 19.8 (when the current allocation scheme was devised, by Richard
4921 Mlynarik), with the exception of the character type, have been in this
4922 category.
4923 @end itemize
4924
4925   Note that bit vectors are a bit of a special case.  They are
4926 simple lrecords as in category (b), but are individually @code{malloc()}ed
4927 like vectors.  You can basically view them as exactly like vectors
4928 except that their type is stored in lrecord fashion rather than
4929 in directly-tagged fashion.
4930
4931
4932 @node Garbage Collection
4933 @section Garbage Collection
4934 @cindex garbage collection
4935
4936 @cindex mark and sweep
4937   Garbage collection is simple in theory but tricky to implement.
4938 Emacs Lisp uses the oldest garbage collection method, called
4939 @dfn{mark and sweep}.  Garbage collection begins by starting with
4940 all accessible locations (i.e. all variables and other slots where
4941 Lisp objects might occur) and recursively traversing all objects
4942 accessible from those slots, marking each one that is found.
4943 We then go through all of memory and free each object that is
4944 not marked, and unmarking each object that is marked.  Note
4945 that ``all of memory'' means all currently allocated objects.
4946 Traversing all these objects means traversing all frob blocks,
4947 all vectors (which are chained in one big list), and all
4948 lcrecords (which are likewise chained).
4949
4950   Garbage collection can be invoked explicitly by calling
4951 @code{garbage-collect} but is also called automatically by @code{eval},
4952 once a certain amount of memory has been allocated since the last
4953 garbage collection (according to @code{gc-cons-threshold}).
4954
4955
4956 @node GCPROing
4957 @section @code{GCPRO}ing
4958 @cindex @code{GCPRO}ing
4959 @cindex garbage collection protection
4960 @cindex protection, garbage collection
4961
4962 @code{GCPRO}ing is one of the ugliest and trickiest parts of Emacs
4963 internals.  The basic idea is that whenever garbage collection
4964 occurs, all in-use objects must be reachable somehow or
4965 other from one of the roots of accessibility.  The roots
4966 of accessibility are:
4967
4968 @enumerate
4969 @item
4970 All objects that have been @code{staticpro()}d or
4971 @code{staticpro_nodump()}ed.  This is used for any global C variables
4972 that hold Lisp objects.  A call to @code{staticpro()} happens implicitly
4973 as a result of any symbols declared with @code{defsymbol()} and any
4974 variables declared with @code{DEFVAR_FOO()}.  You need to explicitly
4975 call @code{staticpro()} (in the @code{vars_of_foo()} method of a module)
4976 for other global C variables holding Lisp objects. (This typically
4977 includes internal lists and such things.).  Use
4978 @code{staticpro_nodump()} only in the rare cases when you do not want
4979 the pointed variable to be saved at dump time but rather recompute it at
4980 startup.
4981
4982 Note that @code{obarray} is one of the @code{staticpro()}d things.
4983 Therefore, all functions and variables get marked through this.
4984 @item
4985 Any shadowed bindings that are sitting on the @code{specpdl} stack.
4986 @item
4987 Any objects sitting in currently active (Lisp) stack frames,
4988 catches, and condition cases.
4989 @item
4990 A couple of special-case places where active objects are
4991 located.
4992 @item
4993 Anything currently marked with @code{GCPRO}.
4994 @end enumerate
4995
4996   Marking with @code{GCPRO} is necessary because some C functions (quite
4997 a lot, in fact), allocate objects during their operation.  Quite
4998 frequently, there will be no other pointer to the object while the
4999 function is running, and if a garbage collection occurs and the object
5000 needs to be referenced again, bad things will happen.  The solution is
5001 to mark those objects with @code{GCPRO}.  Unfortunately this is easy to
5002 forget, and there is basically no way around this problem.  Here are
5003 some rules, though:
5004
5005 @enumerate
5006 @item
5007 For every @code{GCPRO@var{n}}, there have to be declarations of
5008 @code{struct gcpro gcpro1, gcpro2}, etc.
5009
5010 @item
5011 You @emph{must} @code{UNGCPRO} anything that's @code{GCPRO}ed, and you
5012 @emph{must not} @code{UNGCPRO} if you haven't @code{GCPRO}ed.  Getting
5013 either of these wrong will lead to crashes, often in completely random
5014 places unrelated to where the problem lies.
5015
5016 @item
5017 The way this actually works is that all currently active @code{GCPRO}s
5018 are chained through the @code{struct gcpro} local variables, with the
5019 variable @samp{gcprolist} pointing to the head of the list and the nth
5020 local @code{gcpro} variable pointing to the first @code{gcpro} variable
5021 in the next enclosing stack frame.  Each @code{GCPRO}ed thing is an
5022 lvalue, and the @code{struct gcpro} local variable contains a pointer to
5023 this lvalue.  This is why things will mess up badly if you don't pair up
5024 the @code{GCPRO}s and @code{UNGCPRO}s---you will end up with
5025 @code{gcprolist}s containing pointers to @code{struct gcpro}s or local
5026 @code{Lisp_Object} variables in no-longer-active stack frames.
5027
5028 @item
5029 It is actually possible for a single @code{struct gcpro} to
5030 protect a contiguous array of any number of values, rather than
5031 just a single lvalue.  To effect this, call @code{GCPRO@var{n}} as usual on
5032 the first object in the array and then set @code{gcpro@var{n}.nvars}.
5033
5034 @item
5035 @strong{Strings are relocated.}  What this means in practice is that the
5036 pointer obtained using @code{XSTRING_DATA()} is liable to change at any
5037 time, and you should never keep it around past any function call, or
5038 pass it as an argument to any function that might cause a garbage
5039 collection.  This is why a number of functions accept either a
5040 ``non-relocatable'' @code{char *} pointer or a relocatable Lisp string,
5041 and only access the Lisp string's data at the very last minute.  In some
5042 cases, you may end up having to @code{alloca()} some space and copy the
5043 string's data into it.
5044
5045 @item
5046 By convention, if you have to nest @code{GCPRO}'s, use @code{NGCPRO@var{n}}
5047 (along with @code{struct gcpro ngcpro1, ngcpro2}, etc.), @code{NNGCPRO@var{n}},
5048 etc.  This avoids compiler warnings about shadowed locals.
5049
5050 @item
5051 It is @emph{always} better to err on the side of extra @code{GCPRO}s
5052 rather than too few.  The extra cycles spent on this are
5053 almost never going to make a whit of difference in the
5054 speed of anything.
5055
5056 @item
5057 The general rule to follow is that caller, not callee, @code{GCPRO}s.
5058 That is, you should not have to explicitly @code{GCPRO} any Lisp objects
5059 that are passed in as parameters.
5060
5061 One exception from this rule is if you ever plan to change the parameter
5062 value, and store a new object in it.  In that case, you @emph{must}
5063 @code{GCPRO} the parameter, because otherwise the new object will not be
5064 protected.
5065
5066 So, if you create any Lisp objects (remember, this happens in all sorts
5067 of circumstances, e.g. with @code{Fcons()}, etc.), you are responsible
5068 for @code{GCPRO}ing them, unless you are @emph{absolutely sure} that
5069 there's no possibility that a garbage-collection can occur while you
5070 need to use the object.  Even then, consider @code{GCPRO}ing.
5071
5072 @item
5073 A garbage collection can occur whenever anything calls @code{Feval}, or
5074 whenever a QUIT can occur where execution can continue past
5075 this. (Remember, this is almost anywhere.)
5076
5077 @item
5078 If you have the @emph{least smidgeon of doubt} about whether
5079 you need to @code{GCPRO}, you should @code{GCPRO}.
5080
5081 @item
5082 Beware of @code{GCPRO}ing something that is uninitialized.  If you have
5083 any shade of doubt about this, initialize all your variables to @code{Qnil}.
5084
5085 @item
5086 Be careful of traps, like calling @code{Fcons()} in the argument to
5087 another function.  By the ``caller protects'' law, you should be
5088 @code{GCPRO}ing the newly-created cons, but you aren't.  A certain
5089 number of functions that are commonly called on freshly created stuff
5090 (e.g. @code{nconc2()}, @code{Fsignal()}), break the ``caller protects''
5091 law and go ahead and @code{GCPRO} their arguments so as to simplify
5092 things, but make sure and check if it's OK whenever doing something like
5093 this.
5094
5095 @item
5096 Once again, remember to @code{GCPRO}!  Bugs resulting from insufficient
5097 @code{GCPRO}ing are intermittent and extremely difficult to track down,
5098 often showing up in crashes inside of @code{garbage-collect} or in
5099 weirdly corrupted objects or even in incorrect values in a totally
5100 different section of code.
5101 @end enumerate
5102
5103 @cindex garbage collection, conservative
5104 @cindex conservative garbage collection
5105   Given the extremely error-prone nature of the @code{GCPRO} scheme, and
5106 the difficulties in tracking down, it should be considered a deficiency
5107 in the XEmacs code.  A solution to this problem would involve
5108 implementing so-called @dfn{conservative} garbage collection for the C
5109 stack.  That involves looking through all of stack memory and treating
5110 anything that looks like a reference to an object as a reference.  This
5111 will result in a few objects not getting collected when they should, but
5112 it obviates the need for @code{GCPRO}ing, and allows garbage collection
5113 to happen at any point at all, such as during object allocation.
5114
5115 @node Garbage Collection - Step by Step
5116 @section Garbage Collection - Step by Step
5117 @cindex garbage collection - step by step
5118
5119 @menu
5120 * Invocation::
5121 * garbage_collect_1::
5122 * mark_object::
5123 * gc_sweep::
5124 * sweep_lcrecords_1::
5125 * compact_string_chars::
5126 * sweep_strings::
5127 * sweep_bit_vectors_1::
5128 @end menu
5129
5130 @node Invocation
5131 @subsection Invocation
5132 @cindex garbage collection, invocation
5133
5134 The first thing that anyone should know about garbage collection is:
5135 when and how the garbage collector is invoked. One might think that this
5136 could happen every time new memory is allocated, e.g. new objects are
5137 created, but this is @emph{not} the case. Instead, we have the following
5138 situation:
5139
5140 The entry point of any process of garbage collection is an invocation
5141 of the function @code{garbage_collect_1} in file @code{alloc.c}. The
5142 invocation can occur @emph{explicitly} by calling the function
5143 @code{Fgarbage_collect} (in addition this function provides information
5144 about the freed memory), or can occur @emph{implicitly} in four different
5145 situations:
5146 @enumerate
5147 @item
5148 In function @code{main_1} in file @code{emacs.c}. This function is called
5149 at each startup of xemacs. The garbage collection is invoked after all
5150 initial creations are completed, but only if a special internal error
5151 checking-constant @code{ERROR_CHECK_GC} is defined.
5152 @item
5153 In function @code{disksave_object_finalization} in file
5154 @code{alloc.c}. The only purpose of this function is to clear the
5155 objects from memory which need not be stored with xemacs when we dump out
5156 an executable. This is only done by @code{Fdump_emacs} or by
5157 @code{Fdump_emacs_data} respectively (both in @code{emacs.c}). The
5158 actual clearing is accomplished by making these objects unreachable and
5159 starting a garbage collection. The function is only used while building
5160 xemacs.
5161 @item
5162 In function @code{Feval / eval} in file @code{eval.c}. Each time the
5163 well known and often used function eval is called to evaluate a form,
5164 one of the first things that could happen, is a potential call of
5165 @code{garbage_collect_1}. There exist three global variables,
5166 @code{consing_since_gc} (counts the created cons-cells since the last
5167 garbage collection), @code{gc_cons_threshold} (a specified threshold
5168 after which a garbage collection occurs) and @code{always_gc}. If
5169 @code{always_gc} is set or if the threshold is exceeded, the garbage
5170 collection will start.
5171 @item
5172 In function @code{Ffuncall / funcall} in file @code{eval.c}. This
5173 function evaluates calls of elisp functions and works according to
5174 @code{Feval}.
5175 @end enumerate
5176
5177 The upshot is that garbage collection can basically occur everywhere
5178 @code{Feval}, respectively @code{Ffuncall}, is used - either directly or
5179 through another function. Since calls to these two functions are hidden
5180 in various other functions, many calls to @code{garbage_collect_1} are
5181 not obviously foreseeable, and therefore unexpected. Instances where
5182 they are used that are worth remembering are various elisp commands, as
5183 for example @code{or}, @code{and}, @code{if}, @code{cond}, @code{while},
5184 @code{setq}, etc., miscellaneous @code{gui_item_...} functions,
5185 everything related to @code{eval} (@code{Feval_buffer}, @code{call0},
5186 ...) and inside @code{Fsignal}. The latter is used to handle signals, as
5187 for example the ones raised by every @code{QUIT}-macro triggered after
5188 pressing Ctrl-g.
5189
5190 @node garbage_collect_1
5191 @subsection @code{garbage_collect_1}
5192 @cindex @code{garbage_collect_1}
5193
5194 We can now describe exactly what happens after the invocation takes
5195 place.
5196 @enumerate
5197 @item
5198 There are several cases in which the garbage collector is left immediately:
5199 when we are already garbage collecting (@code{gc_in_progress}), when
5200 the garbage collection is somehow forbidden
5201 (@code{gc_currently_forbidden}), when we are currently displaying something
5202 (@code{in_display}) or when we are preparing for the armageddon of the
5203 whole system (@code{preparing_for_armageddon}).
5204 @item
5205 Next the correct frame in which to put
5206 all the output occurring during garbage collecting is determined. In
5207 order to be able to restore the old display's state after displaying the
5208 message, some data about the current cursor position has to be
5209 saved. The variables @code{pre_gc_cursor} and @code{cursor_changed} take
5210 care of that.
5211 @item
5212 The state of @code{gc_currently_forbidden} must be restored after
5213 the garbage collection, no matter what happens during the process. We
5214 accomplish this by @code{record_unwind_protect}ing the suitable function
5215 @code{restore_gc_inhibit} together with the current value of
5216 @code{gc_currently_forbidden}.
5217 @item
5218 If we are concurrently running an interactive xemacs session, the next step
5219 is simply to show the garbage collector's cursor/message.
5220 @item
5221 The following steps are the intrinsic steps of the garbage collector,
5222 therefore @code{gc_in_progress} is set.
5223 @item
5224 For debugging purposes, it is possible to copy the current C stack
5225 frame. However, this seems to be a currently unused feature.
5226 @item
5227 Before actually starting to go over all live objects, references to
5228 objects that are no longer used are pruned. We only have to do this for events
5229 (@code{clear_event_resource}) and for specifiers
5230 (@code{cleanup_specifiers}).
5231 @item
5232 Now the mark phase begins and marks all accessible elements. In order to
5233 start from
5234 all slots that serve as roots of accessibility, the function
5235 @code{mark_object} is called for each root individually to go out from
5236 there to mark all reachable objects. All roots that are traversed are
5237 shown in their processed order:
5238 @itemize @bullet
5239 @item
5240 all constant symbols and static variables that are registered via
5241 @code{staticpro}@ in the dynarr @code{staticpros}.
5242 @xref{Adding Global Lisp Variables}.
5243 @item
5244 all Lisp objects that are created in C functions and that must be
5245 protected from freeing them. They are registered in the global
5246 list @code{gcprolist}.
5247 @xref{GCPROing}.
5248 @item
5249 all local variables (i.e. their name fields @code{symbol} and old
5250 values @code{old_values}) that are bound during the evaluation by the Lisp
5251 engine. They are stored in @code{specbinding} structs pushed on a stack
5252 called @code{specpdl}.
5253 @xref{Dynamic Binding; The specbinding Stack; Unwind-Protects}.
5254 @item
5255 all catch blocks that the Lisp engine encounters during the evaluation
5256 cause the creation of structs @code{catchtag} inserted in the list
5257 @code{catchlist}. Their tag (@code{tag}) and value (@code{val} fields
5258 are freshly created objects and therefore have to be marked.
5259 @xref{Catch and Throw}.
5260 @item
5261 every function application pushes new structs @code{backtrace}
5262 on the call stack of the Lisp engine (@code{backtrace_list}). The unique
5263 parts that have to be marked are the fields for each function
5264 (@code{function}) and all their arguments (@code{args}).
5265 @xref{Evaluation}.
5266 @item
5267 all objects that are used by the redisplay engine that must not be freed
5268 are marked by a special function called @code{mark_redisplay} (in
5269 @code{redisplay.c}).
5270 @item
5271 all objects created for profiling purposes are allocated by C functions
5272 instead of using the lisp allocation mechanisms. In order to receive the
5273 right ones during the sweep phase, they also have to be marked
5274 manually. That is done by the function @code{mark_profiling_info}
5275 @end itemize
5276 @item
5277 Hash tables in XEmacs belong to a kind of special objects that
5278 make use of a concept often called 'weak pointers'.
5279 To make a long story short, these kind of pointers are not followed
5280 during the estimation of the live objects during garbage collection.
5281 Any object referenced only by weak pointers is collected
5282 anyway, and the reference to it is cleared. In hash tables there are
5283 different usage patterns of them, manifesting in different types of hash
5284 tables, namely 'non-weak', 'weak', 'key-weak' and 'value-weak'
5285 (internally also 'key-car-weak' and 'value-car-weak') hash tables, each
5286 clearing entries depending on different conditions. More information can
5287 be found in the documentation to the function @code{make-hash-table}.
5288
5289 Because there are complicated dependency rules about when and what to
5290 mark while processing weak hash tables, the standard @code{marker}
5291 method is only active if it is marking non-weak hash tables. As soon as
5292 a weak component is in the table, the hash table entries are ignored
5293 while marking. Instead their marking is done each separately by the
5294 function @code{finish_marking_weak_hash_tables}. This function iterates
5295 over each hash table entry @code{hentries} for each weak hash table in
5296 @code{Vall_weak_hash_tables}. Depending on the type of a table, the
5297 appropriate action is performed.
5298 If a table is acting as @code{HASH_TABLE_KEY_WEAK}, and a key already marked,
5299 everything reachable from the @code{value} component is marked. If it is
5300 acting as a @code{HASH_TABLE_VALUE_WEAK} and the value component is
5301 already marked, the marking starts beginning only from the
5302 @code{key} component.
5303 If it is a @code{HASH_TABLE_KEY_CAR_WEAK} and the car
5304 of the key entry is already marked, we mark both the @code{key} and
5305 @code{value} components.
5306 Finally, if the table is of the type @code{HASH_TABLE_VALUE_CAR_WEAK}
5307 and the car of the value components is already marked, again both the
5308 @code{key} and the @code{value} components get marked.
5309
5310 Again, there are lists with comparable properties called weak
5311 lists. There exist different peculiarities of their types called
5312 @code{simple}, @code{assoc}, @code{key-assoc} and
5313 @code{value-assoc}. You can find further details about them in the
5314 description to the function @code{make-weak-list}. The scheme of their
5315 marking is similar: all weak lists are listed in @code{Qall_weak_lists},
5316 therefore we iterate over them. The marking is advanced until we hit an
5317 already marked pair. Then we know that during a former run all
5318 the rest has been marked completely. Again, depending on the special
5319 type of the weak list, our jobs differ. If it is a @code{WEAK_LIST_SIMPLE}
5320 and the elem is marked, we mark the @code{cons} part. If it is a
5321 @code{WEAK_LIST_ASSOC} and not a pair or a pair with both marked car and
5322 cdr, we mark the @code{cons} and the @code{elem}. If it is a
5323 @code{WEAK_LIST_KEY_ASSOC} and not a pair or a pair with a marked car of
5324 the elem, we mark the @code{cons} and the @code{elem}. Finally, if it is
5325 a @code{WEAK_LIST_VALUE_ASSOC} and not a pair or a pair with a marked
5326 cdr of the elem, we mark both the @code{cons} and the @code{elem}.
5327
5328 Since, by marking objects in reach from weak hash tables and weak lists,
5329 other objects could get marked, this perhaps implies further marking of
5330 other weak objects, both finishing functions are redone as long as
5331 yet unmarked objects get freshly marked.
5332
5333 @item
5334 After completing the special marking for the weak hash tables and for the weak
5335 lists, all entries that point to objects that are going to be swept in
5336 the further process are useless, and therefore have to be removed from
5337 the table or the list.
5338
5339 The function @code{prune_weak_hash_tables} does the job for weak hash
5340 tables. Totally unmarked hash tables are removed from the list
5341 @code{Vall_weak_hash_tables}. The other ones are treated more carefully
5342 by scanning over all entries and removing one as soon as one of
5343 the components @code{key} and @code{value} is unmarked.
5344
5345 The same idea applies to the weak lists. It is accomplished by
5346 @code{prune_weak_lists}: An unmarked list is pruned from
5347 @code{Vall_weak_lists} immediately. A marked list is treated more
5348 carefully by going over it and removing just the unmarked pairs.
5349
5350 @item
5351 The function @code{prune_specifiers} checks all listed specifiers held
5352 in @code{Vall_specifiers} and removes the ones from the lists that are
5353 unmarked.
5354
5355 @item
5356 All syntax tables are stored in a list called
5357 @code{Vall_syntax_tables}. The function @code{prune_syntax_tables} walks
5358 through it and unlinks the tables that are unmarked.
5359
5360 @item
5361 Next, we will attack the complete sweeping - the function
5362 @code{gc_sweep} which holds the predominance.
5363 @item
5364 First, all the variables with respect to garbage collection are
5365 reset. @code{consing_since_gc} - the counter of the created cells since
5366 the last garbage collection - is set back to 0, and
5367 @code{gc_in_progress} is not @code{true} anymore.
5368 @item
5369 In case the session is interactive, the displayed cursor and message are
5370 removed again.
5371 @item
5372 The state of @code{gc_inhibit} is restored to the former value by
5373 unwinding the stack.
5374 @item
5375 A small memory reserve is always held back that can be reached by
5376 @code{breathing_space}. If nothing more is left, we create a new reserve
5377 and exit.
5378 @end enumerate
5379
5380 @node mark_object
5381 @subsection @code{mark_object}
5382 @cindex @code{mark_object}
5383
5384 The first thing that is checked while marking an object is whether the
5385 object is a real Lisp object @code{Lisp_Type_Record} or just an integer
5386 or a character. Integers and characters are the only two types that are
5387 stored directly - without another level of indirection, and therefore they
5388 don't have to be marked and collected.
5389 @xref{How Lisp Objects Are Represented in C}.
5390
5391 The second case is the one we have to handle. It is the one when we are
5392 dealing with a pointer to a Lisp object. But, there exist also three
5393 possibilities, that prevent us from doing anything while marking: The
5394 object is read only which prevents it from being garbage collected,
5395 i.e. marked (@code{C_READONLY_RECORD_HEADER}). The object in question is
5396 already marked, and need not be marked for the second time (checked by
5397 @code{MARKED_RECORD_HEADER_P}). If it is a special, unmarkable object
5398 (@code{UNMARKABLE_RECORD_HEADER_P}, apparently, these are objects that
5399 sit in some const space, and can therefore not be marked, see
5400 @code{this_one_is_unmarkable} in @code{alloc.c}).
5401
5402 Now, the actual marking is feasible. We do so by once using the macro
5403 @code{MARK_RECORD_HEADER} to mark the object itself (actually the
5404 special flag in the lrecord header), and calling its special marker
5405 "method" @code{marker} if available. The marker method marks every
5406 other object that is in reach from our current object. Note, that these
5407 marker methods should not call @code{mark_object} recursively, but
5408 instead should return the next object from where further marking has to
5409 be performed.
5410
5411 In case another object was returned, as mentioned before, we reiterate
5412 the whole @code{mark_object} process beginning with this next object.
5413
5414 @node gc_sweep
5415 @subsection @code{gc_sweep}
5416 @cindex @code{gc_sweep}
5417
5418 The job of this function is to free all unmarked records from memory. As
5419 we know, there are different types of objects implemented and managed, and
5420 consequently different ways to free them from memory.
5421 @xref{Introduction to Allocation}.
5422
5423 We start with all objects stored through @code{lcrecords}. All
5424 bulkier objects are allocated and handled using that scheme of
5425 @code{lcrecords}. Each object is @code{malloc}ed separately
5426 instead of placing it in one of the contiguous frob blocks. All types
5427 that are currently stored
5428 using @code{lcrecords}'s  @code{alloc_lcrecord} and
5429 @code{make_lcrecord_list} are the types: vectors, buffers,
5430 char-table, char-table-entry, console, weak-list, database, device,
5431 ldap, hash-table, command-builder, extent-auxiliary, extent-info, face,
5432 coding-system, frame, image-instance, glyph, popup-data, gui-item,
5433 keymap, charset, color_instance, font_instance, opaque, opaque-list,
5434 process, range-table, specifier, symbol-value-buffer-local,
5435 symbol-value-lisp-magic, symbol-value-varalias, toolbar-button,
5436 tooltalk-message, tooltalk-pattern, window, and window-configuration. We
5437 take care of them in the fist place
5438 in order to be able to handle and to finalize items stored in them more
5439 easily. The function @code{sweep_lcrecords_1} as described below is
5440 doing the whole job for us.
5441 For a description about the internals: @xref{lrecords}.
5442
5443 Our next candidates are the other objects that behave quite differently
5444 than everything else: the strings. They consists of two parts, a
5445 fixed-size portion (@code{struct Lisp_String}) holding the string's
5446 length, its property list and a pointer to the second part, and the
5447 actual string data, which is stored in string-chars blocks comparable to
5448 frob blocks. In this block, the data is not only freed, but also a
5449 compression of holes is made, i.e. all strings are relocated together.
5450 @xref{String}. This compacting phase is performed by the function
5451 @code{compact_string_chars}, the actual sweeping by the function
5452 @code{sweep_strings} is described below.
5453
5454 After that, the other types are swept step by step using functions
5455 @code{sweep_conses}, @code{sweep_bit_vectors_1},
5456 @code{sweep_compiled_functions}, @code{sweep_floats},
5457 @code{sweep_symbols}, @code{sweep_extents}, @code{sweep_markers} and
5458 @code{sweep_extents}.  They are the fixed-size types cons, floats,
5459 compiled-functions, symbol, marker, extent, and event stored in
5460 so-called "frob blocks", and therefore we can basically do the same on
5461 every type objects, using the same macros, especially defined only to
5462 handle everything with respect to fixed-size blocks. The only fixed-size
5463 type that is not handled here are the fixed-size portion of strings,
5464 because we took special care of them earlier.
5465
5466 The only big exceptions are bit vectors stored differently and
5467 therefore treated differently by the function @code{sweep_bit_vectors_1}
5468 described later.
5469
5470 At first, we need some brief information about how
5471 these fixed-size types are managed in general, in order to understand
5472 how the sweeping is done. They have all a fixed size, and are therefore
5473 stored in big blocks of memory - allocated at once - that can hold a
5474 certain amount of objects of one type. The macro
5475 @code{DECLARE_FIXED_TYPE_ALLOC} creates the suitable structures for
5476 every type. More precisely, we have the block struct
5477 (holding a pointer to the previous block @code{prev} and the
5478 objects in @code{block[]}), a pointer to current block
5479 (@code{current_..._block)}) and its last index
5480 (@code{current_..._block_index}), and a pointer to the free list that
5481 will be created. Also a macro @code{FIXED_TYPE_FROM_BLOCK} plus some
5482 related macros exists that are used to obtain a new object, either from
5483 the free list @code{ALLOCATE_FIXED_TYPE_1} if there is an unused object
5484 of that type stored or by allocating a completely new block using
5485 @code{ALLOCATE_FIXED_TYPE_FROM_BLOCK}.
5486
5487 The rest works as follows: all of them define a
5488 macro @code{UNMARK_...} that is used to unmark the object. They define a
5489 macro @code{ADDITIONAL_FREE_...} that defines additional work that has
5490 to be done when converting an object from in use to not in use (so far,
5491 only markers use it in order to unchain them). Then, they all call
5492 the macro @code{SWEEP_FIXED_TYPE_BLOCK} instantiated with their type name
5493 and their struct name.
5494
5495 This call in particular does the following: we go over all blocks
5496 starting with the current moving towards the oldest.
5497 For each block, we look at every object in it. If the object already
5498 freed (checked with @code{FREE_STRUCT_P} using the first pointer of the
5499 object), or if it is
5500 set to read only (@code{C_READONLY_RECORD_HEADER_P}, nothing must be
5501 done. If it is unmarked (checked with @code{MARKED_RECORD_HEADER_P}), it
5502 is put in the free list and set free (using the macro
5503 @code{FREE_FIXED_TYPE}, otherwise it stays in the block, but is unmarked
5504 (by @code{UNMARK_...}). While going through one block, we note if the
5505 whole block is empty. If so, the whole block is freed (using
5506 @code{xfree}) and the free list state is set to the state it had before
5507 handling this block.
5508
5509 @node sweep_lcrecords_1
5510 @subsection @code{sweep_lcrecords_1}
5511 @cindex @code{sweep_lcrecords_1}
5512
5513 After nullifying the complete lcrecord statistics, we go over all
5514 lcrecords two separate times. They are all chained together in a list with
5515 a head called @code{all_lcrecords}.
5516
5517 The first loop calls for each object its @code{finalizer} method, but only
5518 in the case that it is not read only
5519 (@code{C_READONLY_RECORD_HEADER_P)}, it is not already marked
5520 (@code{MARKED_RECORD_HEADER_P}), it is not already in a free list (list of
5521 freed objects, field @code{free}) and finally it owns a finalizer
5522 method.
5523
5524 The second loop actually frees the appropriate objects again by iterating
5525 through the whole list. In case an object is read only or marked, it
5526 has to persist, otherwise it is manually freed by calling
5527 @code{xfree}. During this loop, the lcrecord statistics are kept up to
5528 date by calling @code{tick_lcrecord_stats} with the right arguments,
5529
5530 @node compact_string_chars
5531 @subsection @code{compact_string_chars}
5532 @cindex @code{compact_string_chars}
5533
5534 The purpose of this function is to compact all the data parts of the
5535 strings that are held in so-called @code{string_chars_block}, i.e. the
5536 strings that do not exceed a certain maximal length.
5537
5538 The procedure with which this is done is as follows. We are keeping two
5539 positions in the @code{string_chars_block}s using two pointer/integer
5540 pairs, namely @code{from_sb}/@code{from_pos} and
5541 @code{to_sb}/@code{to_pos}. They stand for the actual positions, from
5542 where to where, to copy the actually handled string.
5543
5544 While going over all chained @code{string_char_block}s and their held
5545 strings, staring at @code{first_string_chars_block}, both pointers
5546 are advanced and eventually a string is copied from @code{from_sb} to
5547 @code{to_sb}, depending on the status of the pointed at strings.
5548
5549 More precisely, we can distinguish between the following actions.
5550 @itemize @bullet
5551 @item
5552 The string at @code{from_sb}'s position could be marked as free, which
5553 is indicated by an invalid pointer to the pointer that should point back
5554 to the fixed size string object, and which is checked by
5555 @code{FREE_STRUCT_P}. In this case, the @code{from_sb}/@code{from_pos}
5556 is advanced to the next string, and nothing has to be copied.
5557 @item
5558 Also, if a string object itself is unmarked, nothing has to be
5559 copied. We likewise advance the @code{from_sb}/@code{from_pos}
5560 pair as described above.
5561 @item
5562 In all other cases, we have a marked string at hand. The string data
5563 must be moved from the from-position to the to-position. In case
5564 there is not enough space in the actual @code{to_sb}-block, we advance
5565 this pointer to the beginning of the next block before copying. In case the
5566 from and to positions are different, we perform the
5567 actual copying using the library function @code{memmove}.
5568 @end itemize
5569
5570 After compacting, the pointer to the current
5571 @code{string_chars_block}, sitting in @code{current_string_chars_block},
5572 is reset on the last block to which we moved a string,
5573 i.e. @code{to_block}, and all remaining blocks (we know that they just
5574 carry garbage) are explicitly @code{xfree}d.
5575
5576 @node sweep_strings
5577 @subsection @code{sweep_strings}
5578 @cindex @code{sweep_strings}
5579
5580 The sweeping for the fixed sized string objects is essentially exactly
5581 the same as it is for all other fixed size types. As before, the freeing
5582 into the suitable free list is done by using the macro
5583 @code{SWEEP_FIXED_SIZE_BLOCK} after defining the right macros
5584 @code{UNMARK_string} and @code{ADDITIONAL_FREE_string}. These two
5585 definitions are a little bit special compared to the ones used
5586 for the other fixed size types.
5587
5588 @code{UNMARK_string} is defined the same way except some additional code
5589 used for updating the bookkeeping information.
5590
5591 For strings, @code{ADDITIONAL_FREE_string} has to do something in
5592 addition: in case, the string was not allocated in a
5593 @code{string_chars_block} because it exceeded the maximal length, and
5594 therefore it was @code{malloc}ed separately, we know also @code{xfree}
5595 it explicitly.
5596
5597 @node sweep_bit_vectors_1
5598 @subsection @code{sweep_bit_vectors_1}
5599 @cindex @code{sweep_bit_vectors_1}
5600
5601 Bit vectors are also one of the rare types that are @code{malloc}ed
5602 individually. Consequently, while sweeping, all further needless
5603 bit vectors must be freed by hand. This is done, as one might imagine,
5604 the expected way: since they are all registered in a list called
5605 @code{all_bit_vectors}, all elements of that list are traversed,
5606 all unmarked bit vectors are unlinked by calling @code{xfree} and all of
5607 them become unmarked.
5608 In addition, the bookkeeping information used for garbage
5609 collector's output purposes is updated.
5610
5611 @node Integers and Characters
5612 @section Integers and Characters
5613 @cindex integers and characters
5614 @cindex characters, integers and
5615
5616   Integer and character Lisp objects are created from integers using the
5617 macros @code{XSETINT()} and @code{XSETCHAR()} or the equivalent
5618 functions @code{make_int()} and @code{make_char()}. (These are actually
5619 macros on most systems.)  These functions basically just do some moving
5620 of bits around, since the integral value of the object is stored
5621 directly in the @code{Lisp_Object}.
5622
5623   @code{XSETINT()} and the like will truncate values given to them that
5624 are too big; i.e. you won't get the value you expected but the tag bits
5625 will at least be correct.
5626
5627 @node Allocation from Frob Blocks
5628 @section Allocation from Frob Blocks
5629 @cindex allocation from frob blocks
5630 @cindex frob blocks, allocation from
5631
5632 The uninitialized memory required by a @code{Lisp_Object} of a particular type
5633 is allocated using
5634 @code{ALLOCATE_FIXED_TYPE()}.  This only occurs inside of the
5635 lowest-level object-creating functions in @file{alloc.c}:
5636 @code{Fcons()}, @code{make_float()}, @code{Fmake_byte_code()},
5637 @code{Fmake_symbol()}, @code{allocate_extent()},
5638 @code{allocate_event()}, @code{Fmake_marker()}, and
5639 @code{make_uninit_string()}.  The idea is that, for each type, there are
5640 a number of frob blocks (each 2K in size); each frob block is divided up
5641 into object-sized chunks.  Each frob block will have some of these
5642 chunks that are currently assigned to objects, and perhaps some that are
5643 free. (If a frob block has nothing but free chunks, it is freed at the
5644 end of the garbage collection cycle.)  The free chunks are stored in a
5645 free list, which is chained by storing a pointer in the first four bytes
5646 of the chunk. (Except for the free chunks at the end of the last frob
5647 block, which are handled using an index which points past the end of the
5648 last-allocated chunk in the last frob block.)
5649 @code{ALLOCATE_FIXED_TYPE()} first tries to retrieve a chunk from the
5650 free list; if that fails, it calls
5651 @code{ALLOCATE_FIXED_TYPE_FROM_BLOCK()}, which looks at the end of the
5652 last frob block for space, and creates a new frob block if there is
5653 none. (There are actually two versions of these macros, one of which is
5654 more defensive but less efficient and is used for error-checking.)
5655
5656 @node lrecords
5657 @section lrecords
5658 @cindex lrecords
5659
5660   [see @file{lrecord.h}]
5661
5662   All lrecords have at the beginning of their structure a @code{struct
5663 lrecord_header}.  This just contains a type number and some flags,
5664 including the mark bit.  All builtin type numbers are defined as
5665 constants in @code{enum lrecord_type}, to allow the compiler to generate
5666 more efficient code for @code{@var{type}P}.  The type number, thru the
5667 @code{lrecord_implementation_table}, gives access to a @code{struct
5668 lrecord_implementation}, which is a structure containing method pointers
5669 and such.  There is one of these for each type, and it is a global,
5670 constant, statically-declared structure that is declared in the
5671 @code{DEFINE_LRECORD_IMPLEMENTATION()} macro.
5672
5673   Simple lrecords (of type (b) above) just have a @code{struct
5674 lrecord_header} at their beginning.  lcrecords, however, actually have a
5675 @code{struct lcrecord_header}.  This, in turn, has a @code{struct
5676 lrecord_header} at its beginning, so sanity is preserved; but it also
5677 has a pointer used to chain all lcrecords together, and a special ID
5678 field used to distinguish one lcrecord from another. (This field is used
5679 only for debugging and could be removed, but the space gain is not
5680 significant.)
5681
5682   Simple lrecords are created using @code{ALLOCATE_FIXED_TYPE()}, just
5683 like for other frob blocks.  The only change is that the implementation
5684 pointer must be initialized correctly. (The implementation structure for
5685 an lrecord, or rather the pointer to it, is named @code{lrecord_float},
5686 @code{lrecord_extent}, @code{lrecord_buffer}, etc.)
5687
5688   lcrecords are created using @code{alloc_lcrecord()}.  This takes a
5689 size to allocate and an implementation pointer. (The size needs to be
5690 passed because some lcrecords, such as window configurations, are of
5691 variable size.) This basically just @code{malloc()}s the storage,
5692 initializes the @code{struct lcrecord_header}, and chains the lcrecord
5693 onto the head of the list of all lcrecords, which is stored in the
5694 variable @code{all_lcrecords}.  The calls to @code{alloc_lcrecord()}
5695 generally occur in the lowest-level allocation function for each lrecord
5696 type.
5697
5698 Whenever you create an lrecord, you need to call either
5699 @code{DEFINE_LRECORD_IMPLEMENTATION()} or
5700 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()}.  This needs to be
5701 specified in a @file{.c} file, at the top level.  What this actually
5702 does is define and initialize the implementation structure for the
5703 lrecord. (And possibly declares a function @code{error_check_foo()} that
5704 implements the @code{XFOO()} macro when error-checking is enabled.)  The
5705 arguments to the macros are the actual type name (this is used to
5706 construct the C variable name of the lrecord implementation structure
5707 and related structures using the @samp{##} macro concatenation
5708 operator), a string that names the type on the Lisp level (this may not
5709 be the same as the C type name; typically, the C type name has
5710 underscores, while the Lisp string has dashes), various method pointers,
5711 and the name of the C structure that contains the object.  The methods
5712 are used to encapsulate type-specific information about the object, such
5713 as how to print it or mark it for garbage collection, so that it's easy
5714 to add new object types without having to add a specific case for each
5715 new type in a bunch of different places.
5716
5717   The difference between @code{DEFINE_LRECORD_IMPLEMENTATION()} and
5718 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()} is that the former is
5719 used for fixed-size object types and the latter is for variable-size
5720 object types.  Most object types are fixed-size; some complex
5721 types, however (e.g. window configurations), are variable-size.
5722 Variable-size object types have an extra method, which is called
5723 to determine the actual size of a particular object of that type.
5724 (Currently this is only used for keeping allocation statistics.)
5725
5726   For the purpose of keeping allocation statistics, the allocation
5727 engine keeps a list of all the different types that exist.  Note that,
5728 since @code{DEFINE_LRECORD_IMPLEMENTATION()} is a macro that is
5729 specified at top-level, there is no way for it to initialize the global
5730 data structures containing type information, like
5731 @code{lrecord_implementations_table}.  For this reason a call to
5732 @code{INIT_LRECORD_IMPLEMENTATION} must be added to the same source file
5733 containing @code{DEFINE_LRECORD_IMPLEMENTATION}, but instead of to the
5734 top level, to one of the init functions, typically
5735 @code{syms_of_@var{foo}.c}.  @code{INIT_LRECORD_IMPLEMENTATION} must be
5736 called before an object of this type is used.
5737
5738 The type number is also used to index into an array holding the number
5739 of objects of each type and the total memory allocated for objects of
5740 that type.  The statistics in this array are computed during the sweep
5741 stage.  These statistics are returned by the call to
5742 @code{garbage-collect}.
5743
5744   Note that for every type defined with a @code{DEFINE_LRECORD_*()}
5745 macro, there needs to be a @code{DECLARE_LRECORD_IMPLEMENTATION()}
5746 somewhere in a @file{.h} file, and this @file{.h} file needs to be
5747 included by @file{inline.c}.
5748
5749   Furthermore, there should generally be a set of @code{XFOOBAR()},
5750 @code{FOOBARP()}, etc. macros in a @file{.h} (or occasionally @file{.c})
5751 file.  To create one of these, copy an existing model and modify as
5752 necessary.
5753
5754   @strong{Please note:} If you define an lrecord in an external
5755 dynamically-loaded module, you must use @code{DECLARE_EXTERNAL_LRECORD},
5756 @code{DEFINE_EXTERNAL_LRECORD_IMPLEMENTATION}, and
5757 @code{DEFINE_EXTERNAL_LRECORD_SEQUENCE_IMPLEMENTATION} instead of the
5758 non-EXTERNAL forms. These macros will dynamically add new type numbers
5759 to the global enum that records them, whereas the non-EXTERNAL forms
5760 assume that the programmer has already inserted the correct type numbers
5761 into the enum's code at compile-time.
5762
5763   The various methods in the lrecord implementation structure are:
5764
5765 @enumerate
5766 @item
5767 @cindex mark method
5768 A @dfn{mark} method.  This is called during the marking stage and passed
5769 a function pointer (usually the @code{mark_object()} function), which is
5770 used to mark an object.  All Lisp objects that are contained within the
5771 object need to be marked by applying this function to them.  The mark
5772 method should also return a Lisp object, which should be either @code{nil} or
5773 an object to mark. (This can be used in lieu of calling
5774 @code{mark_object()} on the object, to reduce the recursion depth, and
5775 consequently should be the most heavily nested sub-object, such as a
5776 long list.)
5777
5778 @strong{Please note:} When the mark method is called, garbage collection
5779 is in progress, and special precautions need to be taken when accessing
5780 objects; see section (B) above.
5781
5782 If your mark method does not need to do anything, it can be
5783 @code{NULL}.
5784
5785 @item
5786 A @dfn{print} method.  This is called to create a printed representation
5787 of the object, whenever @code{princ}, @code{prin1}, or the like is
5788 called.  It is passed the object, a stream to which the output is to be
5789 directed, and an @code{escapeflag} which indicates whether the object's
5790 printed representation should be @dfn{escaped} so that it is
5791 readable. (This corresponds to the difference between @code{princ} and
5792 @code{prin1}.) Basically, @dfn{escaped} means that strings will have
5793 quotes around them and confusing characters in the strings such as
5794 quotes, backslashes, and newlines will be backslashed; and that special
5795 care will be taken to make symbols print in a readable fashion
5796 (e.g. symbols that look like numbers will be backslashed).  Other
5797 readable objects should perhaps pass @code{escapeflag} on when
5798 sub-objects are printed, so that readability is preserved when necessary
5799 (or if not, always pass in a 1 for @code{escapeflag}).  Non-readable
5800 objects should in general ignore @code{escapeflag}, except that some use
5801 it as an indication that more verbose output should be given.
5802
5803 Sub-objects are printed using @code{print_internal()}, which takes
5804 exactly the same arguments as are passed to the print method.
5805
5806 Literal C strings should be printed using @code{write_c_string()},
5807 or @code{write_string_1()} for non-null-terminated strings.
5808
5809 Functions that do not have a readable representation should check the
5810 @code{print_readably} flag and signal an error if it is set.
5811
5812 If you specify NULL for the print method, the
5813 @code{default_object_printer()} will be used.
5814
5815 @item
5816 A @dfn{finalize} method.  This is called at the beginning of the sweep
5817 stage on lcrecords that are about to be freed, and should be used to
5818 perform any extra object cleanup.  This typically involves freeing any
5819 extra @code{malloc()}ed memory associated with the object, releasing any
5820 operating-system and window-system resources associated with the object
5821 (e.g. pixmaps, fonts), etc.
5822
5823 The finalize method can be NULL if nothing needs to be done.
5824
5825 WARNING #1: The finalize method is also called at the end of the dump
5826 phase; this time with the for_disksave parameter set to non-zero.  The
5827 object is @emph{not} about to disappear, so you have to make sure to
5828 @emph{not} free any extra @code{malloc()}ed memory if you're going to
5829 need it later.  (Also, signal an error if there are any operating-system
5830 and window-system resources here, because they can't be dumped.)
5831
5832 Finalize methods should, as a rule, set to zero any pointers after
5833 they've been freed, and check to make sure pointers are not zero before
5834 freeing.  Although I'm pretty sure that finalize methods are not called
5835 twice on the same object (except for the @code{for_disksave} proviso),
5836 we've gotten nastily burned in some cases by not doing this.
5837
5838 WARNING #2: The finalize method is @emph{only} called for
5839 lcrecords, @emph{not} for simply lrecords.  If you need a
5840 finalize method for simple lrecords, you have to stick
5841 it in the @code{ADDITIONAL_FREE_foo()} macro in @file{alloc.c}.
5842
5843 WARNING #3: Things are in an @emph{extremely} bizarre state
5844 when @code{ADDITIONAL_FREE_foo()} is called, so you have to
5845 be incredibly careful when writing one of these functions.
5846 See the comment in @code{gc_sweep()}.  If you ever have to add
5847 one of these, consider using an lcrecord or dealing with
5848 the problem in a different fashion.
5849
5850 @item
5851 An @dfn{equal} method.  This compares the two objects for similarity,
5852 when @code{equal} is called.  It should compare the contents of the
5853 objects in some reasonable fashion.  It is passed the two objects and a
5854 @dfn{depth} value, which is used to catch circular objects.  To compare
5855 sub-Lisp-objects, call @code{internal_equal()} and bump the depth value
5856 by one.  If this value gets too high, a @code{circular-object} error
5857 will be signaled.
5858
5859 If this is NULL, objects are @code{equal} only when they are @code{eq},
5860 i.e. identical.
5861
5862 @item
5863 A @dfn{hash} method.  This is used to hash objects when they are to be
5864 compared with @code{equal}.  The rule here is that if two objects are
5865 @code{equal}, they @emph{must} hash to the same value; i.e. your hash
5866 function should use some subset of the sub-fields of the object that are
5867 compared in the ``equal'' method.  If you specify this method as
5868 @code{NULL}, the object's pointer will be used as the hash, which will
5869 @emph{fail} if the object has an @code{equal} method, so don't do this.
5870
5871 To hash a sub-Lisp-object, call @code{internal_hash()}.  Bump the
5872 depth by one, just like in the ``equal'' method.
5873
5874 To convert a Lisp object directly into a hash value (using
5875 its pointer), use @code{LISP_HASH()}.  This is what happens when
5876 the hash method is NULL.
5877
5878 To hash two or more values together into a single value, use
5879 @code{HASH2()}, @code{HASH3()}, @code{HASH4()}, etc.
5880
5881 @item
5882 @dfn{getprop}, @dfn{putprop}, @dfn{remprop}, and @dfn{plist} methods.
5883 These are used for object types that have properties.  I don't feel like
5884 documenting them here.  If you create one of these objects, you have to
5885 use different macros to define them,
5886 i.e. @code{DEFINE_LRECORD_IMPLEMENTATION_WITH_PROPS()} or
5887 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION_WITH_PROPS()}.
5888
5889 @item
5890 A @dfn{size_in_bytes} method, when the object is of variable-size.
5891 (i.e. declared with a @code{_SEQUENCE_IMPLEMENTATION} macro.)  This should
5892 simply return the object's size in bytes, exactly as you might expect.
5893 For an example, see the methods for window configurations and opaques.
5894 @end enumerate
5895
5896 @node Low-level allocation
5897 @section Low-level allocation
5898 @cindex low-level allocation
5899 @cindex allocation, low-level
5900
5901   Memory that you want to allocate directly should be allocated using
5902 @code{xmalloc()} rather than @code{malloc()}.  This implements
5903 error-checking on the return value, and once upon a time did some more
5904 vital stuff (i.e. @code{BLOCK_INPUT}, which is no longer necessary).
5905 Free using @code{xfree()}, and realloc using @code{xrealloc()}.  Note
5906 that @code{xmalloc()} will do a non-local exit if the memory can't be
5907 allocated. (Many functions, however, do not expect this, and thus XEmacs
5908 will likely crash if this happens.  @strong{This is a bug.}  If you can,
5909 you should strive to make your function handle this OK.  However, it's
5910 difficult in the general circumstance, perhaps requiring extra
5911 unwind-protects and such.)
5912
5913   Note that XEmacs provides two separate replacements for the standard
5914 @code{malloc()} library function.  These are called @dfn{old GNU malloc}
5915 (@file{malloc.c}) and @dfn{new GNU malloc} (@file{gmalloc.c}),
5916 respectively.  New GNU malloc is better in pretty much every way than
5917 old GNU malloc, and should be used if possible.  (It used to be that on
5918 some systems, the old one worked but the new one didn't.  I think this
5919 was due specifically to a bug in SunOS, which the new one now works
5920 around; so I don't think the old one ever has to be used any more.) The
5921 primary difference between both of these mallocs and the standard system
5922 malloc is that they are much faster, at the expense of increased space.
5923 The basic idea is that memory is allocated in fixed chunks of powers of
5924 two.  This allows for basically constant malloc time, since the various
5925 chunks can just be kept on a number of free lists. (The standard system
5926 malloc typically allocates arbitrary-sized chunks and has to spend some
5927 time, sometimes a significant amount of time, walking the heap looking
5928 for a free block to use and cleaning things up.)  The new GNU malloc
5929 improves on things by allocating large objects in chunks of 4096 bytes
5930 rather than in ever larger powers of two, which results in ever larger
5931 wastage.  There is a slight speed loss here, but it's of doubtful
5932 significance.
5933
5934   NOTE: Apparently there is a third-generation GNU malloc that is
5935 significantly better than the new GNU malloc, and should probably
5936 be included in XEmacs.
5937
5938   There is also the relocating allocator, @file{ralloc.c}.  This actually
5939 moves blocks of memory around so that the @code{sbrk()} pointer shrunk
5940 and virtual memory released back to the system.  On some systems,
5941 this is a big win.  On all systems, it causes a noticeable (and
5942 sometimes huge) speed penalty, so I turn it off by default.
5943 @file{ralloc.c} only works with the new GNU malloc in @file{gmalloc.c}.
5944 There are also two versions of @file{ralloc.c}, one that uses @code{mmap()}
5945 rather than block copies to move data around.  This purports to
5946 be faster, although that depends on the amount of data that would
5947 have had to be block copied and the system-call overhead for
5948 @code{mmap()}.  I don't know exactly how this works, except that the
5949 relocating-allocation routines are pretty much used only for
5950 the memory allocated for a buffer, which is the biggest consumer
5951 of space, esp. of space that may get freed later.
5952
5953   Note that the GNU mallocs have some ``memory warning'' facilities.
5954 XEmacs taps into them and issues a warning through the standard
5955 warning system, when memory gets to 75%, 85%, and 95% full.
5956 (On some systems, the memory warnings are not functional.)
5957
5958   Allocated memory that is going to be used to make a Lisp object
5959 is created using @code{allocate_lisp_storage()}.  This just calls
5960 @code{xmalloc()}.  It used to verify that the pointer to the memory can
5961 fit into a Lisp word, before the current Lisp object representation was
5962 introduced.  @code{allocate_lisp_storage()} is called by
5963 @code{alloc_lcrecord()}, @code{ALLOCATE_FIXED_TYPE()}, and the vector
5964 and bit-vector creation routines.  These routines also call
5965 @code{INCREMENT_CONS_COUNTER()} at the appropriate times; this keeps
5966 statistics on how much memory is allocated, so that garbage-collection
5967 can be invoked when the threshold is reached.
5968
5969 @node Cons
5970 @section Cons
5971 @cindex cons
5972
5973   Conses are allocated in standard frob blocks.  The only thing to
5974 note is that conses can be explicitly freed using @code{free_cons()}
5975 and associated functions @code{free_list()} and @code{free_alist()}.  This
5976 immediately puts the conses onto the cons free list, and decrements
5977 the statistics on memory allocation appropriately.  This is used
5978 to good effect by some extremely commonly-used code, to avoid
5979 generating extra objects and thereby triggering GC sooner.
5980 However, you have to be @emph{extremely} careful when doing this.
5981 If you mess this up, you will get BADLY BURNED, and it has happened
5982 before.
5983
5984 @node Vector
5985 @section Vector
5986 @cindex vector
5987
5988   As mentioned above, each vector is @code{malloc()}ed individually, and
5989 all are threaded through the variable @code{all_vectors}.  Vectors are
5990 marked strangely during garbage collection, by kludging the size field.
5991 Note that the @code{struct Lisp_Vector} is declared with its
5992 @code{contents} field being a @emph{stretchy} array of one element.  It
5993 is actually @code{malloc()}ed with the right size, however, and access
5994 to any element through the @code{contents} array works fine.
5995
5996 @node Bit Vector
5997 @section Bit Vector
5998 @cindex bit vector
5999 @cindex vector, bit
6000
6001   Bit vectors work exactly like vectors, except for more complicated
6002 code to access an individual bit, and except for the fact that bit
6003 vectors are lrecords while vectors are not. (The only difference here is
6004 that there's an lrecord implementation pointer at the beginning and the
6005 tag field in bit vector Lisp words is ``lrecord'' rather than
6006 ``vector''.)
6007
6008 @node Symbol
6009 @section Symbol
6010 @cindex symbol
6011
6012   Symbols are also allocated in frob blocks.  Symbols in the awful
6013 horrible obarray structure are chained through their @code{next} field.
6014
6015 Remember that @code{intern} looks up a symbol in an obarray, creating
6016 one if necessary.
6017
6018 @node Marker
6019 @section Marker
6020 @cindex marker
6021
6022   Markers are allocated in frob blocks, as usual.  They are kept
6023 in a buffer unordered, but in a doubly-linked list so that they
6024 can easily be removed. (Formerly this was a singly-linked list,
6025 but in some cases garbage collection took an extraordinarily
6026 long time due to the O(N^2) time required to remove lots of
6027 markers from a buffer.) Markers are removed from a buffer in
6028 the finalize stage, in @code{ADDITIONAL_FREE_marker()}.
6029
6030 @node String
6031 @section String
6032 @cindex string
6033
6034   As mentioned above, strings are a special case.  A string is logically
6035 two parts, a fixed-size object (containing the length, property list,
6036 and a pointer to the actual data), and the actual data in the string.
6037 The fixed-size object is a @code{struct Lisp_String} and is allocated in
6038 frob blocks, as usual.  The actual data is stored in special
6039 @dfn{string-chars blocks}, which are 8K blocks of memory.
6040 Currently-allocated strings are simply laid end to end in these
6041 string-chars blocks, with a pointer back to the @code{struct Lisp_String}
6042 stored before each string in the string-chars block.  When a new string
6043 needs to be allocated, the remaining space at the end of the last
6044 string-chars block is used if there's enough, and a new string-chars
6045 block is created otherwise.
6046
6047   There are never any holes in the string-chars blocks due to the string
6048 compaction and relocation that happens at the end of garbage collection.
6049 During the sweep stage of garbage collection, when objects are
6050 reclaimed, the garbage collector goes through all string-chars blocks,
6051 looking for unused strings.  Each chunk of string data is preceded by a
6052 pointer to the corresponding @code{struct Lisp_String}, which indicates
6053 both whether the string is used and how big the string is, i.e. how to
6054 get to the next chunk of string data.  Holes are compressed by
6055 block-copying the next string into the empty space and relocating the
6056 pointer stored in the corresponding @code{struct Lisp_String}.
6057 @strong{This means you have to be careful with strings in your code.}
6058 See the section above on @code{GCPRO}ing.
6059
6060   Note that there is one situation not handled: a string that is too big
6061 to fit into a string-chars block.  Such strings, called @dfn{big
6062 strings}, are all @code{malloc()}ed as their own block. (#### Although it
6063 would make more sense for the threshold for big strings to be somewhat
6064 lower, e.g. 1/2 or 1/4 the size of a string-chars block.  It seems that
6065 this was indeed the case formerly---indeed, the threshold was set at
6066 1/8---but Mly forgot about this when rewriting things for 19.8.)
6067
6068 Note also that the string data in string-chars blocks is padded as
6069 necessary so that proper alignment constraints on the @code{struct
6070 Lisp_String} back pointers are maintained.
6071
6072   Finally, strings can be resized.  This happens in Mule when a
6073 character is substituted with a different-length character, or during
6074 modeline frobbing. (You could also export this to Lisp, but it's not
6075 done so currently.) Resizing a string is a potentially tricky process.
6076 If the change is small enough that the padding can absorb it, nothing
6077 other than a simple memory move needs to be done.  Keep in mind,
6078 however, that the string can't shrink too much because the offset to the
6079 next string in the string-chars block is computed by looking at the
6080 length and rounding to the nearest multiple of four or eight.  If the
6081 string would shrink or expand beyond the correct padding, new string
6082 data needs to be allocated at the end of the last string-chars block and
6083 the data moved appropriately.  This leaves some dead string data, which
6084 is marked by putting a special marker of 0xFFFFFFFF in the @code{struct
6085 Lisp_String} pointer before the data (there's no real @code{struct
6086 Lisp_String} to point to and relocate), and storing the size of the dead
6087 string data (which would normally be obtained from the now-non-existent
6088 @code{struct Lisp_String}) at the beginning of the dead string data gap.
6089 The string compactor recognizes this special 0xFFFFFFFF marker and
6090 handles it correctly.
6091
6092 @node Compiled Function
6093 @section Compiled Function
6094 @cindex compiled function
6095 @cindex function, compiled
6096
6097   Not yet documented.
6098
6099
6100 @node Dumping, Events and the Event Loop, Allocation of Objects in XEmacs Lisp, Top
6101 @chapter Dumping
6102 @cindex dumping
6103
6104 @section What is dumping and its justification
6105 @cindex dumping and its justification, what is
6106
6107 The C code of XEmacs is just a Lisp engine with a lot of built-in
6108 primitives useful for writing an editor.  The editor itself is written
6109 mostly in Lisp, and represents around 100K lines of code.  Loading and
6110 executing the initialization of all this code takes a bit a time (five
6111 to ten times the usual startup time of current xemacs) and requires
6112 having all the lisp source files around.  Having to reload them each
6113 time the editor is started would not be acceptable.
6114
6115 The traditional solution to this problem is called dumping: the build
6116 process first creates the lisp engine under the name @file{temacs}, then
6117 runs it until it has finished loading and initializing all the lisp
6118 code, and eventually creates a new executable called @file{xemacs}
6119 including both the object code in @file{temacs} and all the contents of
6120 the memory after the initialization.
6121
6122 This solution, while working, has a huge problem: the creation of the
6123 new executable from the actual contents of memory is an extremely
6124 system-specific process, quite error-prone, and which interferes with a
6125 lot of system libraries (like malloc).  It is even getting worse
6126 nowadays with libraries using constructors which are automatically
6127 called when the program is started (even before main()) which tend to
6128 crash when they are called multiple times, once before dumping and once
6129 after (IRIX 6.x libz.so pulls in some C++ image libraries thru
6130 dependencies which have this problem).  Writing the dumper is also one
6131 of the most difficult parts of porting XEmacs to a new operating system.
6132 Basically, `dumping' is an operation that is just not officially
6133 supported on many operating systems.
6134
6135 The aim of the portable dumper is to solve the same problem as the
6136 system-specific dumper, that is to be able to reload quickly, using only
6137 a small number of files, the fully initialized lisp part of the editor,
6138 without any system-specific hacks.
6139
6140 @menu
6141 * Overview::
6142 * Data descriptions::
6143 * Dumping phase::
6144 * Reloading phase::
6145 * Remaining issues::
6146 @end menu
6147
6148 @node Overview
6149 @section Overview
6150 @cindex dumping overview
6151
6152 The portable dumping system has to:
6153
6154 @enumerate
6155 @item
6156 At dump time, write all initialized, non-quickly-rebuildable data to a
6157 file [Note: currently named @file{xemacs.dmp}, but the name will
6158 change], along with all informations needed for the reloading.
6159
6160 @item
6161 When starting xemacs, reload the dump file, relocate it to its new
6162 starting address if needed, and reinitialize all pointers to this
6163 data.  Also, rebuild all the quickly rebuildable data.
6164 @end enumerate
6165
6166 @node Data descriptions
6167 @section Data descriptions
6168 @cindex dumping data descriptions
6169
6170 The more complex task of the dumper is to be able to write lisp objects
6171 (lrecords) and C structs to disk and reload them at a different address,
6172 updating all the pointers they include in the process.  This is done by
6173 using external data descriptions that give information about the layout
6174 of the structures in memory.
6175
6176 The specification of these descriptions is in lrecord.h.  A description
6177 of an lrecord is an array of struct lrecord_description.  Each of these
6178 structs include a type, an offset in the structure and some optional
6179 parameters depending on the type.  For instance, here is the string
6180 description:
6181
6182 @example
6183 static const struct lrecord_description string_description[] = @{
6184   @{ XD_BYTECOUNT,         offsetof (Lisp_String, size) @},
6185   @{ XD_OPAQUE_DATA_PTR,   offsetof (Lisp_String, data), XD_INDIRECT(0, 1) @},
6186   @{ XD_LISP_OBJECT,       offsetof (Lisp_String, plist) @},
6187   @{ XD_END @}
6188 @};
6189 @end example
6190
6191 The first line indicates a member of type Bytecount, which is used by
6192 the next, indirect directive.  The second means "there is a pointer to
6193 some opaque data in the field @code{data}".  The length of said data is
6194 given by the expression @code{XD_INDIRECT(0, 1)}, which means "the value
6195 in the 0th line of the description (welcome to C) plus one".  The third
6196 line means "there is a Lisp_Object member @code{plist} in the Lisp_String
6197 structure".  @code{XD_END} then ends the description.
6198
6199 This gives us all the information we need to move around what is pointed
6200 to by a structure (C or lrecord) and, by transitivity, everything that
6201 it points to.  The only missing information for dumping is the size of
6202 the structure.  For lrecords, this is part of the
6203 lrecord_implementation, so we don't need to duplicate it.  For C
6204 structures we use a struct struct_description, which includes a size
6205 field and a pointer to an associated array of lrecord_description.
6206
6207 @node Dumping phase
6208 @section Dumping phase
6209 @cindex dumping phase
6210
6211 Dumping is done by calling the function pdump() (in dumper.c) which is
6212 invoked from Fdump_emacs (in emacs.c).  This function performs a number
6213 of tasks.
6214
6215 @menu
6216 * Object inventory::
6217 * Address allocation::
6218 * The header::
6219 * Data dumping::
6220 * Pointers dumping::
6221 @end menu
6222
6223 @node Object inventory
6224 @subsection Object inventory
6225 @cindex dumping object inventory
6226
6227 The first task is to build the list of the objects to dump.  This
6228 includes:
6229
6230 @itemize @bullet
6231 @item lisp objects
6232 @item C structures
6233 @end itemize
6234
6235 We end up with one @code{pdump_entry_list_elmt} per object group (arrays
6236 of C structs are kept together) which includes a pointer to the first
6237 object of the group, the per-object size and the count of objects in the
6238 group, along with some other information which is initialized later.
6239
6240 These entries are linked together in @code{pdump_entry_list} structures
6241 and can be enumerated thru either:
6242
6243 @enumerate
6244 @item
6245 the @code{pdump_object_table}, an array of @code{pdump_entry_list}, one
6246 per lrecord type, indexed by type number.
6247
6248 @item
6249 the @code{pdump_opaque_data_list}, used for the opaque data which does
6250 not include pointers, and hence does not need descriptions.
6251
6252 @item
6253 the @code{pdump_struct_table}, which is a vector of
6254 @code{struct_description}/@code{pdump_entry_list} pairs, used for
6255 non-opaque C structures.
6256 @end enumerate
6257
6258 This uses a marking strategy similar to the garbage collector.  Some
6259 differences though:
6260
6261 @enumerate
6262 @item
6263 We do not use the mark bit (which does not exist for C structures
6264 anyway); we use a big hash table instead.
6265
6266 @item
6267 We do not use the mark function of lrecords but instead rely on the
6268 external descriptions.  This happens essentially because we need to
6269 follow pointers to C structures and opaque data in addition to
6270 Lisp_Object members.
6271 @end enumerate
6272
6273 This is done by @code{pdump_register_object()}, which handles Lisp_Object
6274 variables, and @code{pdump_register_struct()} which handles C structures,
6275 which both delegate the description management to @code{pdump_register_sub()}.
6276
6277 The hash table doubles as a map object to pdump_entry_list_elmt (i.e.
6278 allows us to look up a pdump_entry_list_elmt with the object it points
6279 to).  Entries are added with @code{pdump_add_entry()} and looked up with
6280 @code{pdump_get_entry()}.  There is no need for entry removal.  The hash
6281 value is computed quite simply from the object pointer by
6282 @code{pdump_make_hash()}.
6283
6284 The roots for the marking are:
6285
6286 @enumerate
6287 @item
6288 the @code{staticpro}'ed variables (there is a special @code{staticpro_nodump()}
6289 call for protected variables we do not want to dump).
6290
6291 @item
6292 the variables registered via @code{dump_add_root_object}
6293 (@code{staticpro()} is equivalent to @code{staticpro_nodump()} +
6294 @code{dump_add_root_object()}).
6295
6296 @item
6297 the variables registered via @code{dump_add_root_struct_ptr}, each of
6298 which points to a C structure.
6299 @end enumerate
6300
6301 This does not include the GCPRO'ed variables, the specbinds, the
6302 catchtags, the backlist, the redisplay or the profiling info, since we
6303 do not want to rebuild the actual chain of lisp calls which end up to
6304 the dump-emacs call, only the global variables.
6305
6306 Weak lists and weak hash tables are dumped as if they were their
6307 non-weak equivalent (without changing their type, of course).  This has
6308 not yet been a problem.
6309
6310 @node Address allocation
6311 @subsection Address allocation
6312 @cindex dumping address allocation
6313
6314
6315 The next step is to allocate the offsets of each of the objects in the
6316 final dump file.  This is done by @code{pdump_allocate_offset()} which
6317 is called indirectly by @code{pdump_scan_by_alignment()}.
6318
6319 The strategy to deal with alignment problems uses these facts:
6320
6321 @enumerate
6322 @item
6323 real world alignment requirements are powers of two.
6324
6325 @item
6326 the C compiler is required to adjust the size of a struct so that you
6327 can have an array of them next to each other.  This means you can have an
6328 upper bound of the alignment requirements of a given structure by
6329 looking at which power of two its size is a multiple.
6330
6331 @item
6332 the non-variant part of variable size lrecords has an alignment
6333 requirement of 4.
6334 @end enumerate
6335
6336 Hence, for each lrecord type, C struct type or opaque data block the
6337 alignment requirement is computed as a power of two, with a minimum of
6338 2^2 for lrecords.  @code{pdump_scan_by_alignment()} then scans all the
6339 @code{pdump_entry_list_elmt}'s, the ones with the highest requirements
6340 first.  This ensures the best packing.
6341
6342 The maximum alignment requirement we take into account is 2^8.
6343
6344 @code{pdump_allocate_offset()} only has to do a linear allocation,
6345 starting at offset 256 (this leaves room for the header and keeps the
6346 alignments happy).
6347
6348 @node The header
6349 @subsection The header
6350 @cindex dumping, the header
6351
6352 The next step creates the file and writes a header with a signature and
6353 some random information in it.  The @code{reloc_address} field, which
6354 indicates at which address the file should be loaded if we want to avoid
6355 post-reload relocation, is set to 0.  It then seeks to offset 256 (base
6356 offset for the objects).
6357
6358 @node Data dumping
6359 @subsection Data dumping
6360 @cindex data dumping
6361 @cindex dumping, data
6362
6363 The data is dumped in the same order as the addresses were allocated by
6364 @code{pdump_dump_data()}, called from @code{pdump_scan_by_alignment()}.
6365 This function copies the data to a temporary buffer, relocates all
6366 pointers in the object to the addresses allocated in step Address
6367 Allocation, and writes it to the file.  Using the same order means that,
6368 if we are careful with lrecords whose size is not a multiple of 4, we
6369 are ensured that the object is always written at the offset in the file
6370 allocated in step Address Allocation.
6371
6372 @node Pointers dumping
6373 @subsection Pointers dumping
6374 @cindex pointers dumping
6375 @cindex dumping, pointers
6376
6377 A bunch of tables needed to reassign properly the global pointers are
6378 then written.  They are:
6379
6380 @enumerate
6381 @item
6382 the pdump_root_struct_ptrs dynarr
6383 @item
6384 the pdump_opaques dynarr
6385 @item
6386 a vector of all the offsets to the objects in the file that include a
6387 description (for faster relocation at reload time)
6388 @item
6389 the pdump_root_objects and pdump_weak_object_chains dynarrs.
6390 @end enumerate
6391
6392 For each of the dynarrs we write both the pointer to the variables and
6393 the relocated offset of the object they point to.  Since these variables
6394 are global, the pointers are still valid when restarting the program and
6395 are used to regenerate the global pointers.
6396
6397 The @code{pdump_weak_object_chains} dynarr is a special case.  The
6398 variables it points to are the head of weak linked lists of lisp objects
6399 of the same type.  Not all objects of this list are dumped so the
6400 relocated pointer we associate with them points to the first dumped
6401 object of the list, or Qnil if none is available.  This is also the
6402 reason why they are not used as roots for the purpose of object
6403 enumeration.
6404
6405 Some very important information like the @code{staticpros} and
6406 @code{lrecord_implementations_table} are handled indirectly using
6407 @code{dump_add_opaque} or @code{dump_add_root_struct_ptr}.
6408
6409 This is the end of the dumping part.
6410
6411 @node Reloading phase
6412 @section Reloading phase
6413 @cindex reloading phase
6414 @cindex dumping, reloading phase
6415
6416 @subsection File loading
6417 @cindex dumping, file loading
6418
6419 The file is mmap'ed in memory (which ensures a PAGESIZE alignment, at
6420 least 4096), or if mmap is unavailable or fails, a 256-bytes aligned
6421 malloc is done and the file is loaded.
6422
6423 Some variables are reinitialized from the values found in the header.
6424
6425 The difference between the actual loading address and the reloc_address
6426 is computed and will be used for all the relocations.
6427
6428
6429 @subsection Putting back the pdump_opaques
6430 @cindex dumping, putting back the pdump_opaques
6431
6432 The memory contents are restored in the obvious and trivial way.
6433
6434
6435 @subsection Putting back the pdump_root_struct_ptrs
6436 @cindex dumping, putting back the pdump_root_struct_ptrs
6437
6438 The variables pointed to by pdump_root_struct_ptrs in the dump phase are
6439 reset to the right relocated object addresses.
6440
6441
6442 @subsection Object relocation
6443 @cindex dumping, object relocation
6444
6445 All the objects are relocated using their description and their offset
6446 by @code{pdump_reloc_one}.  This step is unnecessary if the
6447 reloc_address is equal to the file loading address.
6448
6449
6450 @subsection Putting back the pdump_root_objects and pdump_weak_object_chains
6451 @cindex dumping, putting back the pdump_root_objects and pdump_weak_object_chains
6452
6453 Same as Putting back the pdump_root_struct_ptrs.
6454
6455
6456 @subsection Reorganize the hash tables
6457 @cindex dumping, reorganize the hash tables
6458
6459 Since some of the hash values in the lisp hash tables are
6460 address-dependent, their layout is now wrong.  So we go through each of
6461 them and have them resorted by calling @code{pdump_reorganize_hash_table}.
6462
6463 @node Remaining issues
6464 @section Remaining issues
6465 @cindex dumping, remaining issues
6466
6467 The build process will have to start a post-dump xemacs, ask it the
6468 loading address (which will, hopefully, be always the same between
6469 different xemacs invocations) and relocate the file to the new address.
6470 This way the object relocation phase will not have to be done, which
6471 means no writes in the objects and that, because of the use of mmap, the
6472 dumped data will be shared between all the xemacs running on the
6473 computer.
6474
6475 Some executable signature will be necessary to ensure that a given dump
6476 file is really associated with a given executable, or random crashes
6477 will occur.  Maybe a random number set at compile or configure time thru
6478 a define.  This will also allow for having differently-compiled xemacsen
6479 on the same system (mule and no-mule comes to mind).
6480
6481 The DOC file contents should probably end up in the dump file.
6482
6483
6484 @node Events and the Event Loop, Evaluation; Stack Frames; Bindings, Dumping, Top
6485 @chapter Events and the Event Loop
6486 @cindex events and the event loop
6487 @cindex event loop, events and the
6488
6489 @menu
6490 * Introduction to Events::
6491 * Main Loop::
6492 * Specifics of the Event Gathering Mechanism::
6493 * Specifics About the Emacs Event::
6494 * The Event Stream Callback Routines::
6495 * Other Event Loop Functions::
6496 * Converting Events::
6497 * Dispatching Events; The Command Builder::
6498 @end menu
6499
6500 @node Introduction to Events
6501 @section Introduction to Events
6502 @cindex events, introduction to
6503
6504   An event is an object that encapsulates information about an
6505 interesting occurrence in the operating system.  Events are
6506 generated either by user action, direct (e.g. typing on the
6507 keyboard or moving the mouse) or indirect (moving another
6508 window, thereby generating an expose event on an Emacs frame),
6509 or as a result of some other typically asynchronous action happening,
6510 such as output from a subprocess being ready or a timer expiring.
6511 Events come into the system in an asynchronous fashion (typically
6512 through a callback being called) and are converted into a
6513 synchronous event queue (first-in, first-out) in a process that
6514 we will call @dfn{collection}.
6515
6516   Note that each application has its own event queue. (It is
6517 immaterial whether the collection process directly puts the
6518 events in the proper application's queue, or puts them into
6519 a single system queue, which is later split up.)
6520
6521   The most basic level of event collection is done by the
6522 operating system or window system.  Typically, XEmacs does
6523 its own event collection as well.  Often there are multiple
6524 layers of collection in XEmacs, with events from various
6525 sources being collected into a queue, which is then combined
6526 with other sources to go into another queue (i.e. a second
6527 level of collection), with perhaps another level on top of
6528 this, etc.
6529
6530   XEmacs has its own types of events (called @dfn{Emacs events}),
6531 which provides an abstract layer on top of the system-dependent
6532 nature of the most basic events that are received.  Part of the
6533 complex nature of the XEmacs event collection process involves
6534 converting from the operating-system events into the proper
6535 Emacs events---there may not be a one-to-one correspondence.
6536
6537   Emacs events are documented in @file{events.h}; I'll discuss them
6538 later.
6539
6540 @node Main Loop
6541 @section Main Loop
6542 @cindex main loop
6543 @cindex events, main loop
6544
6545   The @dfn{command loop} is the top-level loop that the editor is always
6546 running.  It loops endlessly, calling @code{next-event} to retrieve an
6547 event and @code{dispatch-event} to execute it. @code{dispatch-event} does
6548 the appropriate thing with non-user events (process, timeout,
6549 magic, eval, mouse motion); this involves calling a Lisp handler
6550 function, redrawing a newly-exposed part of a frame, reading
6551 subprocess output, etc.  For user events, @code{dispatch-event}
6552 looks up the event in relevant keymaps or menubars; when a
6553 full key sequence or menubar selection is reached, the appropriate
6554 function is executed. @code{dispatch-event} may have to keep state
6555 across calls; this is done in the ``command-builder'' structure
6556 associated with each console (remember, there's usually only
6557 one console), and the engine that looks up keystrokes and
6558 constructs full key sequences is called the @dfn{command builder}.
6559 This is documented elsewhere.
6560
6561   The guts of the command loop are in @code{command_loop_1()}.  This
6562 function doesn't catch errors, though---that's the job of
6563 @code{command_loop_2()}, which is a condition-case (i.e. error-trapping)
6564 wrapper around @code{command_loop_1()}.  @code{command_loop_1()} never
6565 returns, but may get thrown out of.
6566
6567   When an error occurs, @code{cmd_error()} is called, which usually
6568 invokes the Lisp error handler in @code{command-error}; however, a
6569 default error handler is provided if @code{command-error} is @code{nil}
6570 (e.g. during startup).  The purpose of the error handler is simply to
6571 display the error message and do associated cleanup; it does not need to
6572 throw anywhere.  When the error handler finishes, the condition-case in
6573 @code{command_loop_2()} will finish and @code{command_loop_2()} will
6574 reinvoke @code{command_loop_1()}.
6575
6576   @code{command_loop_2()} is invoked from three places: from
6577 @code{initial_command_loop()} (called from @code{main()} at the end of
6578 internal initialization), from the Lisp function @code{recursive-edit},
6579 and from @code{call_command_loop()}.
6580
6581   @code{call_command_loop()} is called when a macro is started and when
6582 the minibuffer is entered; normal termination of the macro or minibuffer
6583 causes a throw out of the recursive command loop. (To
6584 @code{execute-kbd-macro} for macros and @code{exit} for minibuffers.
6585 Note also that the low-level minibuffer-entering function,
6586 @code{read-minibuffer-internal}, provides its own error handling and
6587 does not need @code{command_loop_2()}'s error encapsulation; so it tells
6588 @code{call_command_loop()} to invoke @code{command_loop_1()} directly.)
6589
6590   Note that both read-minibuffer-internal and recursive-edit set up a
6591 catch for @code{exit}; this is why @code{abort-recursive-edit}, which
6592 throws to this catch, exits out of either one.
6593
6594   @code{initial_command_loop()}, called from @code{main()}, sets up a
6595 catch for @code{top-level} when invoking @code{command_loop_2()},
6596 allowing functions to throw all the way to the top level if they really
6597 need to.  Before invoking @code{command_loop_2()},
6598 @code{initial_command_loop()} calls @code{top_level_1()}, which handles
6599 all of the startup stuff (creating the initial frame, handling the
6600 command-line options, loading the user's @file{.emacs} file, etc.).  The
6601 function that actually does this is in Lisp and is pointed to by the
6602 variable @code{top-level}; normally this function is
6603 @code{normal-top-level}.  @code{top_level_1()} is just an error-handling
6604 wrapper similar to @code{command_loop_2()}.  Note also that
6605 @code{initial_command_loop()} sets up a catch for @code{top-level} when
6606 invoking @code{top_level_1()}, just like when it invokes
6607 @code{command_loop_2()}.
6608
6609 @node Specifics of the Event Gathering Mechanism
6610 @section Specifics of the Event Gathering Mechanism
6611 @cindex event gathering mechanism, specifics of the
6612
6613   Here is an approximate diagram of the collection processes
6614 at work in XEmacs, under TTY's (TTY's are simpler than X
6615 so we'll look at this first):
6616
6617 @noindent
6618 @example
6619  asynch.      asynch.    asynch.   asynch.             [Collectors in
6620 kbd events  kbd events   process   process                the OS]
6621       |         |         output    output
6622       |         |           |         |
6623       |         |           |         |      SIGINT,   [signal handlers
6624       |         |           |         |      SIGQUIT,     in XEmacs]
6625       V         V           V         V      SIGWINCH,
6626      file      file        file      file    SIGALRM
6627      desc.     desc.       desc.     desc.     |
6628      (TTY)     (TTY)       (pipe)    (pipe)    |
6629       |          |          |         |      fake    timeouts
6630       |          |          |         |      file        |
6631       |          |          |         |      desc.       |
6632       |          |          |         |      (pipe)      |
6633       |          |          |         |        |         |
6634       |          |          |         |        |         |
6635       |          |          |         |        |         |
6636       V          V          V         V        V         V
6637       ------>-----------<----------------<----------------
6638                   |
6639                   |
6640                   | [collected using select() in emacs_tty_next_event()
6641                   |  and converted to the appropriate Emacs event]
6642                   |
6643                   |
6644                   V          (above this line is TTY-specific)
6645                 Emacs -----------------------------------------------
6646                 event (below this line is the generic event mechanism)
6647                   |
6648                   |
6649 was there     if not, call
6650 a SIGINT?  emacs_tty_next_event()
6651     |             |
6652     |             |
6653     |             |
6654     V             V
6655     --->------<----
6656            |
6657            |     [collected in event_stream_next_event();
6658            |      SIGINT is converted using maybe_read_quit_event()]
6659            V
6660          Emacs
6661          event
6662            |
6663            \---->------>----- maybe_kbd_translate() ---->---\
6664                                                             |
6665                                                             |
6666                                                             |
6667      command event queue                                    |
6668                                                if not from command
6669   (contains events that were                   event queue, call
6670   read earlier but not processed,              event_stream_next_event()
6671   typically when waiting in a                               |
6672   sit-for, sleep-for, etc. for                              |
6673  a particular event to be received)                         |
6674                |                                            |
6675                |                                            |
6676                V                                            V
6677                ---->------------------------------------<----
6678                                                |
6679                                                | [collected in
6680                                                |  next_event_internal()]
6681                                                |
6682  unread-     unread-       event from          |
6683  command-    command-       keyboard       else, call
6684  events      event           macro      next_event_internal()
6685    |           |               |               |
6686    |           |               |               |
6687    |           |               |               |
6688    V           V               V               V
6689    --------->----------------------<------------
6690                      |
6691                      |      [collected in `next-event', which may loop
6692                      |       more than once if the event it gets is on
6693                      |       a dead frame, device, etc.]
6694                      |
6695                      |
6696                      V
6697             feed into top-level event loop,
6698             which repeatedly calls `next-event'
6699             and then dispatches the event
6700             using `dispatch-event'
6701 @end example
6702
6703 Notice the separation between TTY-specific and generic event mechanism.
6704 When using the Xt-based event loop, the TTY-specific stuff is replaced
6705 but the rest stays the same.
6706
6707 It's also important to realize that only one different kind of
6708 system-specific event loop can be operating at a time, and must be able
6709 to receive all kinds of events simultaneously.  For the two existing
6710 event loops (implemented in @file{event-tty.c} and @file{event-Xt.c},
6711 respectively), the TTY event loop @emph{only} handles TTY consoles,
6712 while the Xt event loop handles @emph{both} TTY and X consoles.  This
6713 situation is different from all of the output handlers, where you simply
6714 have one per console type.
6715
6716   Here's the Xt Event Loop Diagram (notice that below a certain point,
6717 it's the same as the above diagram):
6718
6719 @example
6720 asynch. asynch. asynch. asynch.                 [Collectors in
6721  kbd     kbd    process process                    the OS]
6722 events  events  output  output
6723   |       |       |       |
6724   |       |       |       |     asynch. asynch. [Collectors in the
6725   |       |       |       |       X        X     OS and X Window System]
6726   |       |       |       |     events  events
6727   |       |       |       |       |        |
6728   |       |       |       |       |        |
6729   |       |       |       |       |        |    SIGINT, [signal handlers
6730   |       |       |       |       |        |    SIGQUIT,   in XEmacs]
6731   |       |       |       |       |        |    SIGWINCH,
6732   |       |       |       |       |        |    SIGALRM
6733   |       |       |       |       |        |       |
6734   |       |       |       |       |        |       |
6735   |       |       |       |       |        |       |      timeouts
6736   |       |       |       |       |        |       |          |
6737   |       |       |       |       |        |       |          |
6738   |       |       |       |       |        |       V          |
6739   V       V       V       V       V        V      fake        |
6740  file    file    file    file    file     file    file        |
6741  desc.   desc.   desc.   desc.   desc.    desc.   desc.       |
6742  (TTY)   (TTY)   (pipe)  (pipe) (socket) (socket) (pipe)      |
6743   |       |       |       |       |        |       |          |
6744   |       |       |       |       |        |       |          |
6745   |       |       |       |       |        |       |          |
6746   V       V       V       V       V        V       V          V
6747   --->----------------------------------------<---------<------
6748        |              |               |
6749        |              |               |[collected using select() in
6750        |              |               | _XtWaitForSomething(), called
6751        |              |               | from XtAppProcessEvent(), called
6752        |              |               | in emacs_Xt_next_event();
6753        |              |               | dispatched to various callbacks]
6754        |              |               |
6755        |              |               |
6756   emacs_Xt_        p_s_callback(),    | [popup_selection_callback]
6757   event_handler()  x_u_v_s_callback(),| [x_update_vertical_scrollbar_
6758        |           x_u_h_s_callback(),|  callback]
6759        |           search_callback()  | [x_update_horizontal_scrollbar_
6760        |              |               |  callback]
6761        |              |               |
6762        |              |               |
6763   enqueue_Xt_       signal_special_   |
6764   dispatch_event()  Xt_user_event()   |
6765   [maybe multiple     |               |
6766    times, maybe 0     |               |
6767    times]             |               |
6768        |            enqueue_Xt_       |
6769        |            dispatch_event()  |
6770        |              |               |
6771        |              |               |
6772        V              V               |
6773        -->----------<--               |
6774               |                       |
6775               |                       |
6776            dispatch             Xt_what_callback()
6777            event                  sets flags
6778            queue                      |
6779               |                       |
6780               |                       |
6781               |                       |
6782               |                       |
6783               ---->-----------<--------
6784                    |
6785                    |
6786                    |     [collected and converted as appropriate in
6787                    |            emacs_Xt_next_event()]
6788                    |
6789                    |
6790                    V          (above this line is Xt-specific)
6791                  Emacs ------------------------------------------------
6792                  event (below this line is the generic event mechanism)
6793                    |
6794                    |
6795 was there      if not, call
6796 a SIGINT?   emacs_Xt_next_event()
6797     |              |
6798     |              |
6799     |              |
6800     V              V
6801     --->-------<----
6802            |
6803            |        [collected in event_stream_next_event();
6804            |         SIGINT is converted using maybe_read_quit_event()]
6805            V
6806          Emacs
6807          event
6808            |
6809            \---->------>----- maybe_kbd_translate() -->-----\
6810                                                             |
6811                                                             |
6812                                                             |
6813      command event queue                                    |
6814                                               if not from command
6815   (contains events that were                  event queue, call
6816   read earlier but not processed,             event_stream_next_event()
6817   typically when waiting in a                               |
6818   sit-for, sleep-for, etc. for                              |
6819  a particular event to be received)                         |
6820                |                                            |
6821                |                                            |
6822                V                                            V
6823                ---->----------------------------------<------
6824                                                |
6825                                                | [collected in
6826                                                |  next_event_internal()]
6827                                                |
6828  unread-     unread-       event from          |
6829  command-    command-       keyboard       else, call
6830  events      event           macro      next_event_internal()
6831    |           |               |               |
6832    |           |               |               |
6833    |           |               |               |
6834    V           V               V               V
6835    --------->----------------------<------------
6836                      |
6837                      |      [collected in `next-event', which may loop
6838                      |       more than once if the event it gets is on
6839                      |       a dead frame, device, etc.]
6840                      |
6841                      |
6842                      V
6843             feed into top-level event loop,
6844             which repeatedly calls `next-event'
6845             and then dispatches the event
6846             using `dispatch-event'
6847 @end example
6848
6849 @node Specifics About the Emacs Event
6850 @section Specifics About the Emacs Event
6851 @cindex event, specifics about the Lisp object
6852
6853 @node The Event Stream Callback Routines
6854 @section The Event Stream Callback Routines
6855 @cindex event stream callback routines, the
6856 @cindex callback routines, the event stream
6857
6858 @node Other Event Loop Functions
6859 @section Other Event Loop Functions
6860 @cindex event loop functions, other
6861
6862   @code{detect_input_pending()} and @code{input-pending-p} look for
6863 input by calling @code{event_stream->event_pending_p} and looking in
6864 @code{[V]unread-command-event} and the @code{command_event_queue} (they
6865 do not check for an executing keyboard macro, though).
6866
6867   @code{discard-input} cancels any command events pending (and any
6868 keyboard macros currently executing), and puts the others onto the
6869 @code{command_event_queue}.  There is a comment about a ``race
6870 condition'', which is not a good sign.
6871
6872   @code{next-command-event} and @code{read-char} are higher-level
6873 interfaces to @code{next-event}.  @code{next-command-event} gets the
6874 next @dfn{command} event (i.e.  keypress, mouse event, menu selection,
6875 or scrollbar action), calling @code{dispatch-event} on any others.
6876 @code{read-char} calls @code{next-command-event} and uses
6877 @code{event_to_character()} to return the character equivalent.  With
6878 the right kind of input method support, it is possible for (read-char)
6879 to return a Kanji character.
6880
6881 @node Converting Events
6882 @section Converting Events
6883 @cindex converting events
6884 @cindex events, converting
6885
6886   @code{character_to_event()}, @code{event_to_character()},
6887 @code{event-to-character}, and @code{character-to-event} convert between
6888 characters and keypress events corresponding to the characters.  If the
6889 event was not a keypress, @code{event_to_character()} returns -1 and
6890 @code{event-to-character} returns @code{nil}.  These functions convert
6891 between character representation and the split-up event representation
6892 (keysym plus mod keys).
6893
6894 @node Dispatching Events; The Command Builder
6895 @section Dispatching Events; The Command Builder
6896 @cindex dispatching events; the command builder
6897 @cindex events; the command builder, dispatching
6898 @cindex command builder, dispatching events; the
6899
6900 Not yet documented.
6901
6902 @node Evaluation; Stack Frames; Bindings, Symbols and Variables, Events and the Event Loop, Top
6903 @chapter Evaluation; Stack Frames; Bindings
6904 @cindex evaluation; stack frames; bindings
6905 @cindex stack frames; bindings, evaluation;
6906 @cindex bindings, evaluation; stack frames;
6907
6908 @menu
6909 * Evaluation::
6910 * Dynamic Binding; The specbinding Stack; Unwind-Protects::
6911 * Simple Special Forms::
6912 * Catch and Throw::
6913 @end menu
6914
6915 @node Evaluation
6916 @section Evaluation
6917 @cindex evaluation
6918
6919   @code{Feval()} evaluates the form (a Lisp object) that is passed to
6920 it.  Note that evaluation is only non-trivial for two types of objects:
6921 symbols and conses.  A symbol is evaluated simply by calling
6922 @code{symbol-value} on it and returning the value.
6923
6924   Evaluating a cons means calling a function.  First, @code{eval} checks
6925 to see if garbage-collection is necessary, and calls
6926 @code{garbage_collect_1()} if so.  It then increases the evaluation
6927 depth by 1 (@code{lisp_eval_depth}, which is always less than
6928 @code{max_lisp_eval_depth}) and adds an element to the linked list of
6929 @code{struct backtrace}'s (@code{backtrace_list}).  Each such structure
6930 contains a pointer to the function being called plus a list of the
6931 function's arguments.  Originally these values are stored unevalled, and
6932 as they are evaluated, the backtrace structure is updated.  Garbage
6933 collection pays attention to the objects pointed to in the backtrace
6934 structures (garbage collection might happen while a function is being
6935 called or while an argument is being evaluated, and there could easily
6936 be no other references to the arguments in the argument list; once an
6937 argument is evaluated, however, the unevalled version is not needed by
6938 eval, and so the backtrace structure is changed).
6939
6940 At this point, the function to be called is determined by looking at
6941 the car of the cons (if this is a symbol, its function definition is
6942 retrieved and the process repeated).  The function should then consist
6943 of either a @code{Lisp_Subr} (built-in function written in C), a
6944 @code{Lisp_Compiled_Function} object, or a cons whose car is one of the
6945 symbols @code{autoload}, @code{macro} or @code{lambda}.
6946
6947 If the function is a @code{Lisp_Subr}, the lisp object points to a
6948 @code{struct Lisp_Subr} (created by @code{DEFUN()}), which contains a
6949 pointer to the C function, a minimum and maximum number of arguments
6950 (or possibly the special constants @code{MANY} or @code{UNEVALLED}), a
6951 pointer to the symbol referring to that subr, and a couple of other
6952 things.  If the subr wants its arguments @code{UNEVALLED}, they are
6953 passed raw as a list.  Otherwise, an array of evaluated arguments is
6954 created and put into the backtrace structure, and either passed whole
6955 (@code{MANY}) or each argument is passed as a C argument.
6956
6957 If the function is a @code{Lisp_Compiled_Function},
6958 @code{funcall_compiled_function()} is called.  If the function is a
6959 lambda list, @code{funcall_lambda()} is called.  If the function is a
6960 macro, [..... fill in] is done.  If the function is an autoload,
6961 @code{do_autoload()} is called to load the definition and then eval
6962 starts over [explain this more].
6963
6964 When @code{Feval()} exits, the evaluation depth is reduced by one, the
6965 debugger is called if appropriate, and the current backtrace structure
6966 is removed from the list.
6967
6968 Both @code{funcall_compiled_function()} and @code{funcall_lambda()} need
6969 to go through the list of formal parameters to the function and bind
6970 them to the actual arguments, checking for @code{&rest} and
6971 @code{&optional} symbols in the formal parameters and making sure the
6972 number of actual arguments is correct.
6973 @code{funcall_compiled_function()} can do this a little more
6974 efficiently, since the formal parameter list can be checked for sanity
6975 when the compiled function object is created.
6976
6977 @code{funcall_lambda()} simply calls @code{Fprogn} to execute the code
6978 in the lambda list.
6979
6980 @code{funcall_compiled_function()} calls the real byte-code interpreter
6981 @code{execute_optimized_program()} on the byte-code instructions, which
6982 are converted into an internal form for faster execution.
6983
6984 When a compiled function is executed for the first time by
6985 @code{funcall_compiled_function()}, or during the dump phase of building
6986 XEmacs, the byte-code instructions are converted from a
6987 @code{Lisp_String} (which is inefficient to access, especially in the
6988 presence of MULE) into a @code{Lisp_Opaque} object containing an array
6989 of unsigned char, which can be directly executed by the byte-code
6990 interpreter.  At this time the byte code is also analyzed for validity
6991 and transformed into a more optimized form, so that
6992 @code{execute_optimized_program()} can really fly.
6993
6994 Here are some of the optimizations performed by the internal byte-code
6995 transformer:
6996 @enumerate
6997 @item
6998 References to the @code{constants} array are checked for out-of-range
6999 indices, so that the byte interpreter doesn't have to.
7000 @item
7001 References to the @code{constants} array that will be used as a Lisp
7002 variable are checked for being correct non-constant (i.e. not @code{t},
7003 @code{nil}, or @code{keywordp}) symbols, so that the byte interpreter
7004 doesn't have to.
7005 @item
7006 The maximum number of variable bindings in the byte-code is
7007 pre-computed, so that space on the @code{specpdl} stack can be
7008 pre-reserved once for the whole function execution.
7009 @item
7010 All byte-code jumps are relative to the current program counter instead
7011 of the start of the program, thereby saving a register.
7012 @item
7013 One-byte relative jumps are converted from the byte-code form of unsigned
7014 chars offset by 127 to machine-friendly signed chars.
7015 @end enumerate
7016
7017 Of course, this transformation of the @code{instructions} should not be
7018 visible to the user, so @code{Fcompiled_function_instructions()} needs
7019 to know how to convert the optimized opaque object back into a Lisp
7020 string that is identical to the original string from the @file{.elc}
7021 file.  (Actually, the resulting string may (rarely) contain slightly
7022 different, yet equivalent, byte code.)
7023
7024 @code{Ffuncall()} implements Lisp @code{funcall}.  @code{(funcall fun
7025 x1 x2 x3 ...)} is equivalent to @code{(eval (list fun (quote x1) (quote
7026 x2) (quote x3) ...))}.  @code{Ffuncall()} contains its own code to do
7027 the evaluation, however, and is very similar to @code{Feval()}.
7028
7029 From the performance point of view, it is worth knowing that most of the
7030 time in Lisp evaluation is spent executing @code{Lisp_Subr} and
7031 @code{Lisp_Compiled_Function} objects via @code{Ffuncall()} (not
7032 @code{Feval()}).
7033
7034 @code{Fapply()} implements Lisp @code{apply}, which is very similar to
7035 @code{funcall} except that if the last argument is a list, the result is the
7036 same as if each of the arguments in the list had been passed separately.
7037 @code{Fapply()} does some business to expand the last argument if it's a
7038 list, then calls @code{Ffuncall()} to do the work.
7039
7040 @code{apply1()}, @code{call0()}, @code{call1()}, @code{call2()}, and
7041 @code{call3()} call a function, passing it the argument(s) given (the
7042 arguments are given as separate C arguments rather than being passed as
7043 an array).  @code{apply1()} uses @code{Fapply()} while the others use
7044 @code{Ffuncall()} to do the real work.
7045
7046 @node Dynamic Binding; The specbinding Stack; Unwind-Protects
7047 @section Dynamic Binding; The specbinding Stack; Unwind-Protects
7048 @cindex dynamic binding; the specbinding stack; unwind-protects
7049 @cindex binding; the specbinding stack; unwind-protects, dynamic
7050 @cindex specbinding stack; unwind-protects, dynamic binding; the
7051 @cindex unwind-protects, dynamic binding; the specbinding stack;
7052
7053 @example
7054 struct specbinding
7055 @{
7056   Lisp_Object symbol;
7057   Lisp_Object old_value;
7058   Lisp_Object (*func) (Lisp_Object); /* for unwind-protect */
7059 @};
7060 @end example
7061
7062   @code{struct specbinding} is used for local-variable bindings and
7063 unwind-protects.  @code{specpdl} holds an array of @code{struct specbinding}'s,
7064 @code{specpdl_ptr} points to the beginning of the free bindings in the
7065 array, @code{specpdl_size} specifies the total number of binding slots
7066 in the array, and @code{max_specpdl_size} specifies the maximum number
7067 of bindings the array can be expanded to hold.  @code{grow_specpdl()}
7068 increases the size of the @code{specpdl} array, multiplying its size by
7069 2 but never exceeding @code{max_specpdl_size} (except that if this
7070 number is less than 400, it is first set to 400).
7071
7072   @code{specbind()} binds a symbol to a value and is used for local
7073 variables and @code{let} forms.  The symbol and its old value (which
7074 might be @code{Qunbound}, indicating no prior value) are recorded in the
7075 specpdl array, and @code{specpdl_size} is increased by 1.
7076
7077   @code{record_unwind_protect()} implements an @dfn{unwind-protect},
7078 which, when placed around a section of code, ensures that some specified
7079 cleanup routine will be executed even if the code exits abnormally
7080 (e.g. through a @code{throw} or quit).  @code{record_unwind_protect()}
7081 simply adds a new specbinding to the @code{specpdl} array and stores the
7082 appropriate information in it.  The cleanup routine can either be a C
7083 function, which is stored in the @code{func} field, or a @code{progn}
7084 form, which is stored in the @code{old_value} field.
7085
7086   @code{unbind_to()} removes specbindings from the @code{specpdl} array
7087 until the specified position is reached.  Each specbinding can be one of
7088 three types:
7089
7090 @enumerate
7091 @item
7092 an unwind-protect with a C cleanup function (@code{func} is not 0, and
7093 @code{old_value} holds an argument to be passed to the function);
7094 @item
7095 an unwind-protect with a Lisp form (@code{func} is 0, @code{symbol}
7096 is @code{nil}, and @code{old_value} holds the form to be executed with
7097 @code{Fprogn()}); or
7098 @item
7099 a local-variable binding (@code{func} is 0, @code{symbol} is not
7100 @code{nil}, and @code{old_value} holds the old value, which is stored as
7101 the symbol's value).
7102 @end enumerate
7103
7104 @node Simple Special Forms
7105 @section Simple Special Forms
7106 @cindex special forms, simple
7107
7108 @code{or}, @code{and}, @code{if}, @code{cond}, @code{progn},
7109 @code{prog1}, @code{prog2}, @code{setq}, @code{quote}, @code{function},
7110 @code{let*}, @code{let}, @code{while}
7111
7112 All of these are very simple and work as expected, calling
7113 @code{Feval()} or @code{Fprogn()} as necessary and (in the case of
7114 @code{let} and @code{let*}) using @code{specbind()} to create bindings
7115 and @code{unbind_to()} to undo the bindings when finished.
7116
7117 Note that, with the exception of @code{Fprogn}, these functions are
7118 typically called in real life only in interpreted code, since the byte
7119 compiler knows how to convert calls to these functions directly into
7120 byte code.
7121
7122 @node Catch and Throw
7123 @section Catch and Throw
7124 @cindex catch and throw
7125 @cindex throw, catch and
7126
7127 @example
7128 struct catchtag
7129 @{
7130   Lisp_Object tag;
7131   Lisp_Object val;
7132   struct catchtag *next;
7133   struct gcpro *gcpro;
7134   jmp_buf jmp;
7135   struct backtrace *backlist;
7136   int lisp_eval_depth;
7137   int pdlcount;
7138 @};
7139 @end example
7140
7141   @code{catch} is a Lisp function that places a catch around a body of
7142 code.  A catch is a means of non-local exit from the code.  When a catch
7143 is created, a tag is specified, and executing a @code{throw} to this tag
7144 will exit from the body of code caught with this tag, and its value will
7145 be the value given in the call to @code{throw}.  If there is no such
7146 call, the code will be executed normally.
7147
7148   Information pertaining to a catch is held in a @code{struct catchtag},
7149 which is placed at the head of a linked list pointed to by
7150 @code{catchlist}.  @code{internal_catch()} is passed a C function to
7151 call (@code{Fprogn()} when Lisp @code{catch} is called) and arguments to
7152 give it, and places a catch around the function.  Each @code{struct
7153 catchtag} is held in the stack frame of the @code{internal_catch()}
7154 instance that created the catch.
7155
7156   @code{internal_catch()} is fairly straightforward.  It stores into the
7157 @code{struct catchtag} the tag name and the current values of
7158 @code{backtrace_list}, @code{lisp_eval_depth}, @code{gcprolist}, and the
7159 offset into the @code{specpdl} array, sets a jump point with @code{_setjmp()}
7160 (storing the jump point into the @code{struct catchtag}), and calls the
7161 function.  Control will return to @code{internal_catch()} either when
7162 the function exits normally or through a @code{_longjmp()} to this jump
7163 point.  In the latter case, @code{throw} will store the value to be
7164 returned into the @code{struct catchtag} before jumping.  When it's
7165 done, @code{internal_catch()} removes the @code{struct catchtag} from
7166 the catchlist and returns the proper value.
7167
7168   @code{Fthrow()} goes up through the catchlist until it finds one with
7169 a matching tag.  It then calls @code{unbind_catch()} to restore
7170 everything to what it was when the appropriate catch was set, stores the
7171 return value in the @code{struct catchtag}, and jumps (with
7172 @code{_longjmp()}) to its jump point.
7173
7174   @code{unbind_catch()} removes all catches from the catchlist until it
7175 finds the correct one.  Some of the catches might have been placed for
7176 error-trapping, and if so, the appropriate entries on the handlerlist
7177 must be removed (see ``errors'').  @code{unbind_catch()} also restores
7178 the values of @code{gcprolist}, @code{backtrace_list}, and
7179 @code{lisp_eval}, and calls @code{unbind_to()} to undo any specbindings
7180 created since the catch.
7181
7182
7183 @node Symbols and Variables, Buffers and Textual Representation, Evaluation; Stack Frames; Bindings, Top
7184 @chapter Symbols and Variables
7185 @cindex symbols and variables
7186 @cindex variables, symbols and
7187
7188 @menu
7189 * Introduction to Symbols::
7190 * Obarrays::
7191 * Symbol Values::
7192 @end menu
7193
7194 @node Introduction to Symbols
7195 @section Introduction to Symbols
7196 @cindex symbols, introduction to
7197
7198   A symbol is basically just an object with four fields: a name (a
7199 string), a value (some Lisp object), a function (some Lisp object), and
7200 a property list (usually a list of alternating keyword/value pairs).
7201 What makes symbols special is that there is usually only one symbol with
7202 a given name, and the symbol is referred to by name.  This makes a
7203 symbol a convenient way of calling up data by name, i.e. of implementing
7204 variables. (The variable's value is stored in the @dfn{value slot}.)
7205 Similarly, functions are referenced by name, and the definition of the
7206 function is stored in a symbol's @dfn{function slot}.  This means that
7207 there can be a distinct function and variable with the same name.  The
7208 property list is used as a more general mechanism of associating
7209 additional values with particular names, and once again the namespace is
7210 independent of the function and variable namespaces.
7211
7212 @node Obarrays
7213 @section Obarrays
7214 @cindex obarrays
7215
7216   The identity of symbols with their names is accomplished through a
7217 structure called an obarray, which is just a poorly-implemented hash
7218 table mapping from strings to symbols whose name is that string. (I say
7219 ``poorly implemented'' because an obarray appears in Lisp as a vector
7220 with some hidden fields rather than as its own opaque type.  This is an
7221 Emacs Lisp artifact that should be fixed.)
7222
7223   Obarrays are implemented as a vector of some fixed size (which should
7224 be a prime for best results), where each ``bucket'' of the vector
7225 contains one or more symbols, threaded through a hidden @code{next}
7226 field in the symbol.  Lookup of a symbol in an obarray, and adding a
7227 symbol to an obarray, is accomplished through standard hash-table
7228 techniques.
7229
7230   The standard Lisp function for working with symbols and obarrays is
7231 @code{intern}.  This looks up a symbol in an obarray given its name; if
7232 it's not found, a new symbol is automatically created with the specified
7233 name, added to the obarray, and returned.  This is what happens when the
7234 Lisp reader encounters a symbol (or more precisely, encounters the name
7235 of a symbol) in some text that it is reading.  There is a standard
7236 obarray called @code{obarray} that is used for this purpose, although
7237 the Lisp programmer is free to create his own obarrays and @code{intern}
7238 symbols in them.
7239
7240   Note that, once a symbol is in an obarray, it stays there until
7241 something is done about it, and the standard obarray @code{obarray}
7242 always stays around, so once you use any particular variable name, a
7243 corresponding symbol will stay around in @code{obarray} until you exit
7244 XEmacs.
7245
7246   Note that @code{obarray} itself is a variable, and as such there is a
7247 symbol in @code{obarray} whose name is @code{"obarray"} and which
7248 contains @code{obarray} as its value.
7249
7250   Note also that this call to @code{intern} occurs only when in the Lisp
7251 reader, not when the code is executed (at which point the symbol is
7252 already around, stored as such in the definition of the function).
7253
7254   You can create your own obarray using @code{make-vector} (this is
7255 horrible but is an artifact) and intern symbols into that obarray.
7256 Doing that will result in two or more symbols with the same name.
7257 However, at most one of these symbols is in the standard @code{obarray}:
7258 You cannot have two symbols of the same name in any particular obarray.
7259 Note that you cannot add a symbol to an obarray in any fashion other
7260 than using @code{intern}: i.e. you can't take an existing symbol and put
7261 it in an existing obarray.  Nor can you change the name of an existing
7262 symbol. (Since obarrays are vectors, you can violate the consistency of
7263 things by storing directly into the vector, but let's ignore that
7264 possibility.)
7265
7266   Usually symbols are created by @code{intern}, but if you really want,
7267 you can explicitly create a symbol using @code{make-symbol}, giving it
7268 some name.  The resulting symbol is not in any obarray (i.e. it is
7269 @dfn{uninterned}), and you can't add it to any obarray.  Therefore its
7270 primary purpose is as a symbol to use in macros to avoid namespace
7271 pollution.  It can also be used as a carrier of information, but cons
7272 cells could probably be used just as well.
7273
7274   You can also use @code{intern-soft} to look up a symbol but not create
7275 a new one, and @code{unintern} to remove a symbol from an obarray.  This
7276 returns the removed symbol. (Remember: You can't put the symbol back
7277 into any obarray.) Finally, @code{mapatoms} maps over all of the symbols
7278 in an obarray.
7279
7280 @node Symbol Values
7281 @section Symbol Values
7282 @cindex symbol values
7283 @cindex values, symbol
7284
7285   The value field of a symbol normally contains a Lisp object.  However,
7286 a symbol can be @dfn{unbound}, meaning that it logically has no value.
7287 This is internally indicated by storing a special Lisp object, called
7288 @dfn{the unbound marker} and stored in the global variable
7289 @code{Qunbound}.  The unbound marker is of a special Lisp object type
7290 called @dfn{symbol-value-magic}.  It is impossible for the Lisp
7291 programmer to directly create or access any object of this type.
7292
7293   @strong{You must not let any ``symbol-value-magic'' object escape to
7294 the Lisp level.}  Printing any of these objects will cause the message
7295 @samp{INTERNAL EMACS BUG} to appear as part of the print representation.
7296 (You may see this normally when you call @code{debug_print()} from the
7297 debugger on a Lisp object.) If you let one of these objects escape to
7298 the Lisp level, you will violate a number of assumptions contained in
7299 the C code and make the unbound marker not function right.
7300
7301   When a symbol is created, its value field (and function field) are set
7302 to @code{Qunbound}.  The Lisp programmer can restore these conditions
7303 later using @code{makunbound} or @code{fmakunbound}, and can query to
7304 see whether the value of function fields are @dfn{bound} (i.e. have a
7305 value other than @code{Qunbound}) using @code{boundp} and
7306 @code{fboundp}.  The fields are set to a normal Lisp object using
7307 @code{set} (or @code{setq}) and @code{fset}.
7308
7309   Other symbol-value-magic objects are used as special markers to
7310 indicate variables that have non-normal properties.  This includes any
7311 variables that are tied into C variables (setting the variable magically
7312 sets some global variable in the C code, and likewise for retrieving the
7313 variable's value), variables that magically tie into slots in the
7314 current buffer, variables that are buffer-local, etc.  The
7315 symbol-value-magic object is stored in the value cell in place of
7316 a normal object, and the code to retrieve a symbol's value
7317 (i.e. @code{symbol-value}) knows how to do special things with them.
7318 This means that you should not just fetch the value cell directly if you
7319 want a symbol's value.
7320
7321   The exact workings of this are rather complex and involved and are
7322 well-documented in comments in @file{buffer.c}, @file{symbols.c}, and
7323 @file{lisp.h}.
7324
7325 @node Buffers and Textual Representation, MULE Character Sets and Encodings, Symbols and Variables, Top
7326 @chapter Buffers and Textual Representation
7327 @cindex buffers and textual representation
7328 @cindex textual representation, buffers and
7329
7330 @menu
7331 * Introduction to Buffers::     A buffer holds a block of text such as a file.
7332 * The Text in a Buffer::        Representation of the text in a buffer.
7333 * Buffer Lists::                Keeping track of all buffers.
7334 * Markers and Extents::         Tagging locations within a buffer.
7335 * Bufbytes and Emchars::        Representation of individual characters.
7336 * The Buffer Object::           The Lisp object corresponding to a buffer.
7337 @end menu
7338
7339 @node Introduction to Buffers
7340 @section Introduction to Buffers
7341 @cindex buffers, introduction to
7342
7343   A buffer is logically just a Lisp object that holds some text.
7344 In this, it is like a string, but a buffer is optimized for
7345 frequent insertion and deletion, while a string is not.  Furthermore:
7346
7347 @enumerate
7348 @item
7349 Buffers are @dfn{permanent} objects, i.e. once you create them, they
7350 remain around, and need to be explicitly deleted before they go away.
7351 @item
7352 Each buffer has a unique name, which is a string.  Buffers are
7353 normally referred to by name.  In this respect, they are like
7354 symbols.
7355 @item
7356 Buffers have a default insertion position, called @dfn{point}.
7357 Inserting text (unless you explicitly give a position) goes at point,
7358 and moves point forward past the text.  This is what is going on when
7359 you type text into Emacs.
7360 @item
7361 Buffers have lots of extra properties associated with them.
7362 @item
7363 Buffers can be @dfn{displayed}.  What this means is that there
7364 exist a number of @dfn{windows}, which are objects that correspond
7365 to some visible section of your display, and each window has
7366 an associated buffer, and the current contents of the buffer
7367 are shown in that section of the display.  The redisplay mechanism
7368 (which takes care of doing this) knows how to look at the
7369 text of a buffer and come up with some reasonable way of displaying
7370 this.  Many of the properties of a buffer control how the
7371 buffer's text is displayed.
7372 @item
7373 One buffer is distinguished and called the @dfn{current buffer}.  It is
7374 stored in the variable @code{current_buffer}.  Buffer operations operate
7375 on this buffer by default.  When you are typing text into a buffer, the
7376 buffer you are typing into is always @code{current_buffer}.  Switching
7377 to a different window changes the current buffer.  Note that Lisp code
7378 can temporarily change the current buffer using @code{set-buffer} (often
7379 enclosed in a @code{save-excursion} so that the former current buffer
7380 gets restored when the code is finished).  However, calling
7381 @code{set-buffer} will NOT cause a permanent change in the current
7382 buffer.  The reason for this is that the top-level event loop sets
7383 @code{current_buffer} to the buffer of the selected window, each time
7384 it finishes executing a user command.
7385 @end enumerate
7386
7387   Make sure you understand the distinction between @dfn{current buffer}
7388 and @dfn{buffer of the selected window}, and the distinction between
7389 @dfn{point} of the current buffer and @dfn{window-point} of the selected
7390 window. (This latter distinction is explained in detail in the section
7391 on windows.)
7392
7393 @node The Text in a Buffer
7394 @section The Text in a Buffer
7395 @cindex text in a buffer, the
7396 @cindex buffer, the text in a
7397
7398   The text in a buffer consists of a sequence of zero or more
7399 characters.  A @dfn{character} is an integer that logically represents
7400 a letter, number, space, or other unit of text.  Most of the characters
7401 that you will typically encounter belong to the ASCII set of characters,
7402 but there are also characters for various sorts of accented letters,
7403 special symbols, Chinese and Japanese ideograms (i.e. Kanji, Katakana,
7404 etc.), Cyrillic and Greek letters, etc.  The actual number of possible
7405 characters is quite large.
7406
7407   For now, we can view a character as some non-negative integer that
7408 has some shape that defines how it typically appears (e.g. as an
7409 uppercase A). (The exact way in which a character appears depends on the
7410 font used to display the character.) The internal type of characters in
7411 the C code is an @code{Emchar}; this is just an @code{int}, but using a
7412 symbolic type makes the code clearer.
7413
7414   Between every character in a buffer is a @dfn{buffer position} or
7415 @dfn{character position}.  We can speak of the character before or after
7416 a particular buffer position, and when you insert a character at a
7417 particular position, all characters after that position end up at new
7418 positions.  When we speak of the character @dfn{at} a position, we
7419 really mean the character after the position.  (This schizophrenia
7420 between a buffer position being ``between'' a character and ``on'' a
7421 character is rampant in Emacs.)
7422
7423   Buffer positions are numbered starting at 1.  This means that
7424 position 1 is before the first character, and position 0 is not
7425 valid.  If there are N characters in a buffer, then buffer
7426 position N+1 is after the last one, and position N+2 is not valid.
7427
7428   The internal makeup of the Emchar integer varies depending on whether
7429 we have compiled with MULE support.  If not, the Emchar integer is an
7430 8-bit integer with possible values from 0 - 255.  0 - 127 are the
7431 standard ASCII characters, while 128 - 255 are the characters from the
7432 ISO-8859-1 character set.  If we have compiled with MULE support, an
7433 Emchar is a 19-bit integer, with the various bits having meanings
7434 according to a complex scheme that will be detailed later.  The
7435 characters numbered 0 - 255 still have the same meanings as for the
7436 non-MULE case, though.
7437
7438   Internally, the text in a buffer is represented in a fairly simple
7439 fashion: as a contiguous array of bytes, with a @dfn{gap} of some size
7440 in the middle.  Although the gap is of some substantial size in bytes,
7441 there is no text contained within it: From the perspective of the text
7442 in the buffer, it does not exist.  The gap logically sits at some buffer
7443 position, between two characters (or possibly at the beginning or end of
7444 the buffer).  Insertion of text in a buffer at a particular position is
7445 always accomplished by first moving the gap to that position
7446 (i.e. through some block moving of text), then writing the text into the
7447 beginning of the gap, thereby shrinking the gap.  If the gap shrinks
7448 down to nothing, a new gap is created. (What actually happens is that a
7449 new gap is ``created'' at the end of the buffer's text, which requires
7450 nothing more than changing a couple of indices; then the gap is
7451 ``moved'' to the position where the insertion needs to take place by
7452 moving up in memory all the text after that position.)  Similarly,
7453 deletion occurs by moving the gap to the place where the text is to be
7454 deleted, and then simply expanding the gap to include the deleted text.
7455 (@dfn{Expanding} and @dfn{shrinking} the gap as just described means
7456 just that the internal indices that keep track of where the gap is
7457 located are changed.)
7458
7459   Note that the total amount of memory allocated for a buffer text never
7460 decreases while the buffer is live.  Therefore, if you load up a
7461 20-megabyte file and then delete all but one character, there will be a
7462 20-megabyte gap, which won't get any smaller (except by inserting
7463 characters back again).  Once the buffer is killed, the memory allocated
7464 for the buffer text will be freed, but it will still be sitting on the
7465 heap, taking up virtual memory, and will not be released back to the
7466 operating system. (However, if you have compiled XEmacs with rel-alloc,
7467 the situation is different.  In this case, the space @emph{will} be
7468 released back to the operating system.  However, this tends to result in a
7469 noticeable speed penalty.)
7470
7471   Astute readers may notice that the text in a buffer is represented as
7472 an array of @emph{bytes}, while (at least in the MULE case) an Emchar is
7473 a 19-bit integer, which clearly cannot fit in a byte.  This means (of
7474 course) that the text in a buffer uses a different representation from
7475 an Emchar: specifically, the 19-bit Emchar becomes a series of one to
7476 four bytes.  The conversion between these two representations is complex
7477 and will be described later.
7478
7479   In the non-MULE case, everything is very simple: An Emchar
7480 is an 8-bit value, which fits neatly into one byte.
7481
7482   If we are given a buffer position and want to retrieve the
7483 character at that position, we need to follow these steps:
7484
7485 @enumerate
7486 @item
7487 Pretend there's no gap, and convert the buffer position into a @dfn{byte
7488 index} that indexes to the appropriate byte in the buffer's stream of
7489 textual bytes.  By convention, byte indices begin at 1, just like buffer
7490 positions.  In the non-MULE case, byte indices and buffer positions are
7491 identical, since one character equals one byte.
7492 @item
7493 Convert the byte index into a @dfn{memory index}, which takes the gap
7494 into account.  The memory index is a direct index into the block of
7495 memory that stores the text of a buffer.  This basically just involves
7496 checking to see if the byte index is past the gap, and if so, adding the
7497 size of the gap to it.  By convention, memory indices begin at 1, just
7498 like buffer positions and byte indices, and when referring to the
7499 position that is @dfn{at} the gap, we always use the memory position at
7500 the @emph{beginning}, not at the end, of the gap.
7501 @item
7502 Fetch the appropriate bytes at the determined memory position.
7503 @item
7504 Convert these bytes into an Emchar.
7505 @end enumerate
7506
7507   In the non-Mule case, (3) and (4) boil down to a simple one-byte
7508 memory access.
7509
7510   Note that we have defined three types of positions in a buffer:
7511
7512 @enumerate
7513 @item
7514 @dfn{buffer positions} or @dfn{character positions}, typedef @code{Bufpos}
7515 @item
7516 @dfn{byte indices}, typedef @code{Bytind}
7517 @item
7518 @dfn{memory indices}, typedef @code{Memind}
7519 @end enumerate
7520
7521   All three typedefs are just @code{int}s, but defining them this way makes
7522 things a lot clearer.
7523
7524   Most code works with buffer positions.  In particular, all Lisp code
7525 that refers to text in a buffer uses buffer positions.  Lisp code does
7526 not know that byte indices or memory indices exist.
7527
7528   Finally, we have a typedef for the bytes in a buffer.  This is a
7529 @code{Bufbyte}, which is an unsigned char.  Referring to them as
7530 Bufbytes underscores the fact that we are working with a string of bytes
7531 in the internal Emacs buffer representation rather than in one of a
7532 number of possible alternative representations (e.g. EUC-encoded text,
7533 etc.).
7534
7535 @node Buffer Lists
7536 @section Buffer Lists
7537 @cindex buffer lists
7538
7539   Recall earlier that buffers are @dfn{permanent} objects, i.e.  that
7540 they remain around until explicitly deleted.  This entails that there is
7541 a list of all the buffers in existence.  This list is actually an
7542 assoc-list (mapping from the buffer's name to the buffer) and is stored
7543 in the global variable @code{Vbuffer_alist}.
7544
7545   The order of the buffers in the list is important: the buffers are
7546 ordered approximately from most-recently-used to least-recently-used.
7547 Switching to a buffer using @code{switch-to-buffer},
7548 @code{pop-to-buffer}, etc. and switching windows using
7549 @code{other-window}, etc.  usually brings the new current buffer to the
7550 front of the list.  @code{switch-to-buffer}, @code{other-buffer},
7551 etc. look at the beginning of the list to find an alternative buffer to
7552 suggest.  You can also explicitly move a buffer to the end of the list
7553 using @code{bury-buffer}.
7554
7555   In addition to the global ordering in @code{Vbuffer_alist}, each frame
7556 has its own ordering of the list.  These lists always contain the same
7557 elements as in @code{Vbuffer_alist} although possibly in a different
7558 order.  @code{buffer-list} normally returns the list for the selected
7559 frame.  This allows you to work in separate frames without things
7560 interfering with each other.
7561
7562   The standard way to look up a buffer given a name is
7563 @code{get-buffer}, and the standard way to create a new buffer is
7564 @code{get-buffer-create}, which looks up a buffer with a given name,
7565 creating a new one if necessary.  These operations correspond exactly
7566 with the symbol operations @code{intern-soft} and @code{intern},
7567 respectively.  You can also force a new buffer to be created using
7568 @code{generate-new-buffer}, which takes a name and (if necessary) makes
7569 a unique name from this by appending a number, and then creates the
7570 buffer.  This is basically like the symbol operation @code{gensym}.
7571
7572 @node Markers and Extents
7573 @section Markers and Extents
7574 @cindex markers and extents
7575 @cindex extents, markers and
7576
7577   Among the things associated with a buffer are things that are
7578 logically attached to certain buffer positions.  This can be used to
7579 keep track of a buffer position when text is inserted and deleted, so
7580 that it remains at the same spot relative to the text around it; to
7581 assign properties to particular sections of text; etc.  There are two
7582 such objects that are useful in this regard: they are @dfn{markers} and
7583 @dfn{extents}.
7584
7585   A @dfn{marker} is simply a flag placed at a particular buffer
7586 position, which is moved around as text is inserted and deleted.
7587 Markers are used for all sorts of purposes, such as the @code{mark} that
7588 is the other end of textual regions to be cut, copied, etc.
7589
7590   An @dfn{extent} is similar to two markers plus some associated
7591 properties, and is used to keep track of regions in a buffer as text is
7592 inserted and deleted, and to add properties (e.g. fonts) to particular
7593 regions of text.  The external interface of extents is explained
7594 elsewhere.
7595
7596   The important thing here is that markers and extents simply contain
7597 buffer positions in them as integers, and every time text is inserted or
7598 deleted, these positions must be updated.  In order to minimize the
7599 amount of shuffling that needs to be done, the positions in markers and
7600 extents (there's one per marker, two per extent) are stored in Meminds.
7601 This means that they only need to be moved when the text is physically
7602 moved in memory; since the gap structure tries to minimize this, it also
7603 minimizes the number of marker and extent indices that need to be
7604 adjusted.  Look in @file{insdel.c} for the details of how this works.
7605
7606   One other important distinction is that markers are @dfn{temporary}
7607 while extents are @dfn{permanent}.  This means that markers disappear as
7608 soon as there are no more pointers to them, and correspondingly, there
7609 is no way to determine what markers are in a buffer if you are just
7610 given the buffer.  Extents remain in a buffer until they are detached
7611 (which could happen as a result of text being deleted) or the buffer is
7612 deleted, and primitives do exist to enumerate the extents in a buffer.
7613
7614 @node Bufbytes and Emchars
7615 @section Bufbytes and Emchars
7616 @cindex Bufbytes and Emchars
7617 @cindex Emchars, Bufbytes and
7618
7619   Not yet documented.
7620
7621 @node The Buffer Object
7622 @section The Buffer Object
7623 @cindex buffer object, the
7624 @cindex object, the buffer
7625
7626   Buffers contain fields not directly accessible by the Lisp programmer.
7627 We describe them here, naming them by the names used in the C code.
7628 Many are accessible indirectly in Lisp programs via Lisp primitives.
7629
7630 @table @code
7631 @item name
7632 The buffer name is a string that names the buffer.  It is guaranteed to
7633 be unique.  @xref{Buffer Names,,, lispref, XEmacs Lisp Reference
7634 Manual}.
7635
7636 @item save_modified
7637 This field contains the time when the buffer was last saved, as an
7638 integer.  @xref{Buffer Modification,,, lispref, XEmacs Lisp Reference
7639 Manual}.
7640
7641 @item modtime
7642 This field contains the modification time of the visited file.  It is
7643 set when the file is written or read.  Every time the buffer is written
7644 to the file, this field is compared to the modification time of the
7645 file.  @xref{Buffer Modification,,, lispref, XEmacs Lisp Reference
7646 Manual}.
7647
7648 @item auto_save_modified
7649 This field contains the time when the buffer was last auto-saved.
7650
7651 @item last_window_start
7652 This field contains the @code{window-start} position in the buffer as of
7653 the last time the buffer was displayed in a window.
7654
7655 @item undo_list
7656 This field points to the buffer's undo list.  @xref{Undo,,, lispref,
7657 XEmacs Lisp Reference Manual}.
7658
7659 @item syntax_table_v
7660 This field contains the syntax table for the buffer.  @xref{Syntax
7661 Tables,,, lispref, XEmacs Lisp Reference Manual}.
7662
7663 @item downcase_table
7664 This field contains the conversion table for converting text to lower
7665 case.  @xref{Case Tables,,, lispref, XEmacs Lisp Reference Manual}.
7666
7667 @item upcase_table
7668 This field contains the conversion table for converting text to upper
7669 case.  @xref{Case Tables,,, lispref, XEmacs Lisp Reference Manual}.
7670
7671 @item case_canon_table
7672 This field contains the conversion table for canonicalizing text for
7673 case-folding search.  @xref{Case Tables,,, lispref, XEmacs Lisp
7674 Reference Manual}.
7675
7676 @item case_eqv_table
7677 This field contains the equivalence table for case-folding search.
7678 @xref{Case Tables,,, lispref, XEmacs Lisp Reference Manual}.
7679
7680 @item display_table
7681 This field contains the buffer's display table, or @code{nil} if it
7682 doesn't have one.  @xref{Display Tables,,, lispref, XEmacs Lisp
7683 Reference Manual}.
7684
7685 @item markers
7686 This field contains the chain of all markers that currently point into
7687 the buffer.  Deletion of text in the buffer, and motion of the buffer's
7688 gap, must check each of these markers and perhaps update it.
7689 @xref{Markers,,, lispref, XEmacs Lisp Reference Manual}.
7690
7691 @item backed_up
7692 This field is a flag that tells whether a backup file has been made for
7693 the visited file of this buffer.
7694
7695 @item mark
7696 This field contains the mark for the buffer.  The mark is a marker,
7697 hence it is also included on the list @code{markers}.  @xref{The Mark,,,
7698 lispref, XEmacs Lisp Reference Manual}.
7699
7700 @item mark_active
7701 This field is non-@code{nil} if the buffer's mark is active.
7702
7703 @item local_var_alist
7704 This field contains the association list describing the variables local
7705 in this buffer, and their values, with the exception of local variables
7706 that have special slots in the buffer object.  (Those slots are omitted
7707 from this table.)  @xref{Buffer-Local Variables,,, lispref, XEmacs Lisp
7708 Reference Manual}.
7709
7710 @item modeline_format
7711 This field contains a Lisp object which controls how to display the mode
7712 line for this buffer.  @xref{Modeline Format,,, lispref, XEmacs Lisp
7713 Reference Manual}.
7714
7715 @item base_buffer
7716 This field holds the buffer's base buffer (if it is an indirect buffer),
7717 or @code{nil}.
7718 @end table
7719
7720 @node MULE Character Sets and Encodings, The Lisp Reader and Compiler, Buffers and Textual Representation, Top
7721 @chapter MULE Character Sets and Encodings
7722 @cindex Mule character sets and encodings
7723 @cindex character sets and encodings, Mule
7724 @cindex encodings, Mule character sets and
7725
7726   Recall that there are two primary ways that text is represented in
7727 XEmacs.  The @dfn{buffer} representation sees the text as a series of
7728 bytes (Bufbytes), with a variable number of bytes used per character.
7729 The @dfn{character} representation sees the text as a series of integers
7730 (Emchars), one per character.  The character representation is a cleaner
7731 representation from a theoretical standpoint, and is thus used in many
7732 cases when lots of manipulations on a string need to be done.  However,
7733 the buffer representation is the standard representation used in both
7734 Lisp strings and buffers, and because of this, it is the ``default''
7735 representation that text comes in.  The reason for using this
7736 representation is that it's compact and is compatible with ASCII.
7737
7738 @menu
7739 * Character Sets::
7740 * Encodings::
7741 * Internal Mule Encodings::
7742 * CCL::
7743 @end menu
7744
7745 @node Character Sets
7746 @section Character Sets
7747 @cindex character sets
7748
7749   A character set (or @dfn{charset}) is an ordered set of characters.  A
7750 particular character in a charset is indexed using one or more
7751 @dfn{position codes}, which are non-negative integers.  The number of
7752 position codes needed to identify a particular character in a charset is
7753 called the @dfn{dimension} of the charset.  In XEmacs/Mule, all charsets
7754 have dimension 1 or 2, and the size of all charsets (except for a few
7755 special cases) is either 94, 96, 94 by 94, or 96 by 96.  The range of
7756 position codes used to index characters from any of these types of
7757 character sets is as follows:
7758
7759 @example
7760 Charset type            Position code 1         Position code 2
7761 ------------------------------------------------------------
7762 94                      33 - 126                N/A
7763 96                      32 - 127                N/A
7764 94x94                   33 - 126                33 - 126
7765 96x96                   32 - 127                32 - 127
7766 @end example
7767
7768   Note that in the above cases position codes do not start at an
7769 expected value such as 0 or 1.  The reason for this will become clear
7770 later.
7771
7772   For example, Latin-1 is a 96-character charset, and JISX0208 (the
7773 Japanese national character set) is a 94x94-character charset.
7774
7775   [Note that, although the ranges above define the @emph{valid} position
7776 codes for a charset, some of the slots in a particular charset may in
7777 fact be empty.  This is the case for JISX0208, for example, where (e.g.)
7778 all the slots whose first position code is in the range 118 - 127 are
7779 empty.]
7780
7781   There are three charsets that do not follow the above rules.  All of
7782 them have one dimension, and have ranges of position codes as follows:
7783
7784 @example
7785 Charset name            Position code 1
7786 ------------------------------------
7787 ASCII                   0 - 127
7788 Control-1               0 - 31
7789 Composite               0 - some large number
7790 @end example
7791
7792   (The upper bound of the position code for composite characters has not
7793 yet been determined, but it will probably be at least 16,383).
7794
7795   ASCII is the union of two subsidiary character sets: Printing-ASCII
7796 (the printing ASCII character set, consisting of position codes 33 -
7797 126, like for a standard 94-character charset) and Control-ASCII (the
7798 non-printing characters that would appear in a binary file with codes 0
7799 - 32 and 127).
7800
7801   Control-1 contains the non-printing characters that would appear in a
7802 binary file with codes 128 - 159.
7803
7804   Composite contains characters that are generated by overstriking one
7805 or more characters from other charsets.
7806
7807   Note that some characters in ASCII, and all characters in Control-1,
7808 are @dfn{control} (non-printing) characters.  These have no printed
7809 representation but instead control some other function of the printing
7810 (e.g. TAB or 8 moves the current character position to the next tab
7811 stop).  All other characters in all charsets are @dfn{graphic}
7812 (printing) characters.
7813
7814   When a binary file is read in, the bytes in the file are assigned to
7815 character sets as follows:
7816
7817 @example
7818 Bytes           Character set           Range
7819 --------------------------------------------------
7820 0 - 127         ASCII                   0 - 127
7821 128 - 159       Control-1               0 - 31
7822 160 - 255       Latin-1                 32 - 127
7823 @end example
7824
7825   This is a bit ad-hoc but gets the job done.
7826
7827 @node Encodings
7828 @section Encodings
7829 @cindex encodings, Mule
7830 @cindex Mule encodings
7831
7832   An @dfn{encoding} is a way of numerically representing characters from
7833 one or more character sets.  If an encoding only encompasses one
7834 character set, then the position codes for the characters in that
7835 character set could be used directly.  This is not possible, however, if
7836 more than one character set is to be used in the encoding.
7837
7838   For example, the conversion detailed above between bytes in a binary
7839 file and characters is effectively an encoding that encompasses the
7840 three character sets ASCII, Control-1, and Latin-1 in a stream of 8-bit
7841 bytes.
7842
7843   Thus, an encoding can be viewed as a way of encoding characters from a
7844 specified group of character sets using a stream of bytes, each of which
7845 contains a fixed number of bits (but not necessarily 8, as in the common
7846 usage of ``byte'').
7847
7848   Here are descriptions of a couple of common
7849 encodings:
7850
7851 @menu
7852 * Japanese EUC (Extended Unix Code)::
7853 * JIS7::
7854 @end menu
7855
7856 @node Japanese EUC (Extended Unix Code)
7857 @subsection Japanese EUC (Extended Unix Code)
7858 @cindex Japanese EUC (Extended Unix Code)
7859 @cindex EUC (Extended Unix Code), Japanese
7860 @cindex Extended Unix Code, Japanese EUC
7861
7862 This encompasses the character sets Printing-ASCII, Japanese-JISX0201,
7863 and Japanese-JISX0208-Kana (half-width katakana, the right half of
7864 JISX0201).  It uses 8-bit bytes.
7865
7866 Note that Printing-ASCII and Japanese-JISX0201-Kana are 94-character
7867 charsets, while Japanese-JISX0208 is a 94x94-character charset.
7868
7869 The encoding is as follows:
7870
7871 @example
7872 Character set            Representation (PC=position-code)
7873 -------------            --------------
7874 Printing-ASCII           PC1
7875 Japanese-JISX0201-Kana   0x8E       | PC1 + 0x80
7876 Japanese-JISX0208        PC1 + 0x80 | PC2 + 0x80
7877 Japanese-JISX0212        PC1 + 0x80 | PC2 + 0x80
7878 @end example
7879
7880
7881 @node JIS7
7882 @subsection JIS7
7883 @cindex JIS7
7884
7885 This encompasses the character sets Printing-ASCII,
7886 Japanese-JISX0201-Roman (the left half of JISX0201; this character set
7887 is very similar to Printing-ASCII and is a 94-character charset),
7888 Japanese-JISX0208, and Japanese-JISX0201-Kana.  It uses 7-bit bytes.
7889
7890 Unlike Japanese EUC, this is a @dfn{modal} encoding, which
7891 means that there are multiple states that the encoding can
7892 be in, which affect how the bytes are to be interpreted.
7893 Special sequences of bytes (called @dfn{escape sequences})
7894 are used to change states.
7895
7896   The encoding is as follows:
7897
7898 @example
7899 Character set              Representation (PC=position-code)
7900 -------------              --------------
7901 Printing-ASCII             PC1
7902 Japanese-JISX0201-Roman    PC1
7903 Japanese-JISX0201-Kana     PC1
7904 Japanese-JISX0208          PC1 PC2
7905
7906
7907 Escape sequence   ASCII equivalent   Meaning
7908 ---------------   ----------------   -------
7909 0x1B 0x28 0x4A    ESC ( J            invoke Japanese-JISX0201-Roman
7910 0x1B 0x28 0x49    ESC ( I            invoke Japanese-JISX0201-Kana
7911 0x1B 0x24 0x42    ESC $ B            invoke Japanese-JISX0208
7912 0x1B 0x28 0x42    ESC ( B            invoke Printing-ASCII
7913 @end example
7914
7915   Initially, Printing-ASCII is invoked.
7916
7917 @node Internal Mule Encodings
7918 @section Internal Mule Encodings
7919 @cindex internal Mule encodings
7920 @cindex Mule encodings, internal
7921 @cindex encodings, internal Mule
7922
7923 In XEmacs/Mule, each character set is assigned a unique number, called a
7924 @dfn{leading byte}.  This is used in the encodings of a character.
7925 Leading bytes are in the range 0x80 - 0xFF (except for ASCII, which has
7926 a leading byte of 0), although some leading bytes are reserved.
7927
7928 Charsets whose leading byte is in the range 0x80 - 0x9F are called
7929 @dfn{official} and are used for built-in charsets.  Other charsets are
7930 called @dfn{private} and have leading bytes in the range 0xA0 - 0xFF;
7931 these are user-defined charsets.
7932
7933   More specifically:
7934
7935 @example
7936 Character set           Leading byte
7937 -------------           ------------
7938 ASCII                   0
7939 Composite               0x80
7940 Dimension-1 Official    0x81 - 0x8D
7941                           (0x8E is free)
7942 Control-1               0x8F
7943 Dimension-2 Official    0x90 - 0x99
7944                           (0x9A - 0x9D are free;
7945                            0x9E and 0x9F are reserved)
7946 Dimension-1 Private     0xA0 - 0xEF
7947 Dimension-2 Private     0xF0 - 0xFF
7948 @end example
7949
7950 There are two internal encodings for characters in XEmacs/Mule.  One is
7951 called @dfn{string encoding} and is an 8-bit encoding that is used for
7952 representing characters in a buffer or string.  It uses 1 to 4 bytes per
7953 character.  The other is called @dfn{character encoding} and is a 19-bit
7954 encoding that is used for representing characters individually in a
7955 variable.
7956
7957 (In the following descriptions, we'll ignore composite characters for
7958 the moment.  We also give a general (structural) overview first,
7959 followed later by the exact details.)
7960
7961 @menu
7962 * Internal String Encoding::
7963 * Internal Character Encoding::
7964 @end menu
7965
7966 @node Internal String Encoding
7967 @subsection Internal String Encoding
7968 @cindex internal string encoding
7969 @cindex string encoding, internal
7970 @cindex encoding, internal string
7971
7972 ASCII characters are encoded using their position code directly.  Other
7973 characters are encoded using their leading byte followed by their
7974 position code(s) with the high bit set.  Characters in private character
7975 sets have their leading byte prefixed with a @dfn{leading byte prefix},
7976 which is either 0x9E or 0x9F. (No character sets are ever assigned these
7977 leading bytes.) Specifically:
7978
7979 @example
7980 Character set           Encoding (PC=position-code, LB=leading-byte)
7981 -------------           --------
7982 ASCII                   PC-1 |
7983 Control-1               LB   |  PC1 + 0xA0 |
7984 Dimension-1 official    LB   |  PC1 + 0x80 |
7985 Dimension-1 private     0x9E |  LB         | PC1 + 0x80 |
7986 Dimension-2 official    LB   |  PC1 + 0x80 | PC2 + 0x80 |
7987 Dimension-2 private     0x9F |  LB         | PC1 + 0x80 | PC2 + 0x80
7988 @end example
7989
7990   The basic characteristic of this encoding is that the first byte
7991 of all characters is in the range 0x00 - 0x9F, and the second and
7992 following bytes of all characters is in the range 0xA0 - 0xFF.
7993 This means that it is impossible to get out of sync, or more
7994 specifically:
7995
7996 @enumerate
7997 @item
7998 Given any byte position, the beginning of the character it is
7999 within can be determined in constant time.
8000 @item
8001 Given any byte position at the beginning of a character, the
8002 beginning of the next character can be determined in constant
8003 time.
8004 @item
8005 Given any byte position at the beginning of a character, the
8006 beginning of the previous character can be determined in constant
8007 time.
8008 @item
8009 Textual searches can simply treat encoded strings as if they
8010 were encoded in a one-byte-per-character fashion rather than
8011 the actual multi-byte encoding.
8012 @end enumerate
8013
8014   None of the standard non-modal encodings meet all of these
8015 conditions.  For example, EUC satisfies only (2) and (3), while
8016 Shift-JIS and Big5 (not yet described) satisfy only (2). (All
8017 non-modal encodings must satisfy (2), in order to be unambiguous.)
8018
8019 @node Internal Character Encoding
8020 @subsection Internal Character Encoding
8021 @cindex internal character encoding
8022 @cindex character encoding, internal
8023 @cindex encoding, internal character
8024
8025   One 19-bit word represents a single character.  The word is
8026 separated into three fields:
8027
8028 @example
8029 Bit number:     18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
8030                 <------------> <------------------> <------------------>
8031 Field:                1                  2                    3
8032 @end example
8033
8034   Note that fields 2 and 3 hold 7 bits each, while field 1 holds 5 bits.
8035
8036 @example
8037 Character set           Field 1         Field 2         Field 3
8038 -------------           -------         -------         -------
8039 ASCII                      0               0              PC1
8040    range:                                                   (00 - 7F)
8041 Control-1                  0               1              PC1
8042    range:                                                   (00 - 1F)
8043 Dimension-1 official       0            LB - 0x80         PC1
8044    range:                                    (01 - 0D)      (20 - 7F)
8045 Dimension-1 private        0            LB - 0x80         PC1
8046    range:                                    (20 - 6F)      (20 - 7F)
8047 Dimension-2 official    LB - 0x8F         PC1             PC2
8048    range:                    (01 - 0A)       (20 - 7F)      (20 - 7F)
8049 Dimension-2 private     LB - 0xE1         PC1             PC2
8050    range:                    (0F - 1E)       (20 - 7F)      (20 - 7F)
8051 Composite                 0x1F             ?               ?
8052 @end example
8053
8054   Note that character codes 0 - 255 are the same as the ``binary encoding''
8055 described above.
8056
8057 @node CCL
8058 @section CCL
8059 @cindex CCL
8060
8061 @example
8062 CCL PROGRAM SYNTAX:
8063      CCL_PROGRAM := (CCL_MAIN_BLOCK
8064                      [ CCL_EOF_BLOCK ])
8065
8066      CCL_MAIN_BLOCK := CCL_BLOCK
8067      CCL_EOF_BLOCK := CCL_BLOCK
8068
8069      CCL_BLOCK := STATEMENT | (STATEMENT [STATEMENT ...])
8070      STATEMENT :=
8071              SET | IF | BRANCH | LOOP | REPEAT | BREAK
8072              | READ | WRITE
8073
8074      SET := (REG = EXPRESSION) | (REG SELF_OP EXPRESSION)
8075             | INT-OR-CHAR
8076
8077      EXPRESSION := ARG | (EXPRESSION OP ARG)
8078
8079      IF := (if EXPRESSION CCL_BLOCK CCL_BLOCK)
8080      BRANCH := (branch EXPRESSION CCL_BLOCK [CCL_BLOCK ...])
8081      LOOP := (loop STATEMENT [STATEMENT ...])
8082      BREAK := (break)
8083      REPEAT := (repeat)
8084              | (write-repeat [REG | INT-OR-CHAR | string])
8085              | (write-read-repeat REG [INT-OR-CHAR | string | ARRAY]?)
8086      READ := (read REG) | (read REG REG)
8087              | (read-if REG ARITH_OP ARG CCL_BLOCK CCL_BLOCK)
8088              | (read-branch REG CCL_BLOCK [CCL_BLOCK ...])
8089      WRITE := (write REG) | (write REG REG)
8090              | (write INT-OR-CHAR) | (write STRING) | STRING
8091              | (write REG ARRAY)
8092      END := (end)
8093
8094      REG := r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7
8095      ARG := REG | INT-OR-CHAR
8096      OP :=   + | - | * | / | % | & | '|' | ^ | << | >> | <8 | >8 | //
8097              | < | > | == | <= | >= | !=
8098      SELF_OP :=
8099              += | -= | *= | /= | %= | &= | '|=' | ^= | <<= | >>=
8100      ARRAY := '[' INT-OR-CHAR ... ']'
8101      INT-OR-CHAR := INT | CHAR
8102
8103 MACHINE CODE:
8104
8105 The machine code consists of a vector of 32-bit words.
8106 The first such word specifies the start of the EOF section of the code;
8107 this is the code executed to handle any stuff that needs to be done
8108 (e.g. designating back to ASCII and left-to-right mode) after all
8109 other encoded/decoded data has been written out.  This is not used for
8110 charset CCL programs.
8111
8112 REGISTER: 0..7  -- referred by RRR or rrr
8113
8114 OPERATOR BIT FIELD (27-bit): XXXXXXXXXXXXXXX RRR TTTTT
8115         TTTTT (5-bit): operator type
8116         RRR (3-bit): register number
8117         XXXXXXXXXXXXXXXX (15-bit):
8118                 CCCCCCCCCCCCCCC: constant or address
8119                 000000000000rrr: register number
8120
8121 AAAA:   00000 +
8122         00001 -
8123         00010 *
8124         00011 /
8125         00100 %
8126         00101 &
8127         00110 |
8128         00111 ~
8129
8130         01000 <<
8131         01001 >>
8132         01010 <8
8133         01011 >8
8134         01100 //
8135         01101 not used
8136         01110 not used
8137         01111 not used
8138
8139         10000 <
8140         10001 >
8141         10010 ==
8142         10011 <=
8143         10100 >=
8144         10101 !=
8145
8146 OPERATORS:      TTTTT RRR XX..
8147
8148 SetCS:          00000 RRR C...C      RRR = C...C
8149 SetCL:          00001 RRR .....      RRR = c...c
8150                 c.............c
8151 SetR:           00010 RRR ..rrr      RRR = rrr
8152 SetA:           00011 RRR ..rrr      RRR = array[rrr]
8153                 C.............C      size of array = C...C
8154                 c.............c      contents = c...c
8155
8156 Jump:           00100 000 c...c      jump to c...c
8157 JumpCond:       00101 RRR c...c      if (!RRR) jump to c...c
8158 WriteJump:      00110 RRR c...c      Write1 RRR, jump to c...c
8159 WriteReadJump:  00111 RRR c...c      Write1, Read1 RRR, jump to c...c
8160 WriteCJump:     01000 000 c...c      Write1 C...C, jump to c...c
8161                 C...C
8162 WriteCReadJump: 01001 RRR c...c      Write1 C...C, Read1 RRR,
8163                 C.............C      and jump to c...c
8164 WriteSJump:     01010 000 c...c      WriteS, jump to c...c
8165                 C.............C
8166                 S.............S
8167                 ...
8168 WriteSReadJump: 01011 RRR c...c      WriteS, Read1 RRR, jump to c...c
8169                 C.............C
8170                 S.............S
8171                 ...
8172 WriteAReadJump: 01100 RRR c...c      WriteA, Read1 RRR, jump to c...c
8173                 C.............C      size of array = C...C
8174                 c.............c      contents = c...c
8175                 ...
8176 Branch:         01101 RRR C...C      if (RRR >= 0 && RRR < C..)
8177                 c.............c      branch to (RRR+1)th address
8178 Read1:          01110 RRR ...        read 1-byte to RRR
8179 Read2:          01111 RRR ..rrr      read 2-byte to RRR and rrr
8180 ReadBranch:     10000 RRR C...C      Read1 and Branch
8181                 c.............c
8182                 ...
8183 Write1:         10001 RRR .....      write 1-byte RRR
8184 Write2:         10010 RRR ..rrr      write 2-byte RRR and rrr
8185 WriteC:         10011 000 .....      write 1-char C...CC
8186                 C.............C
8187 WriteS:         10100 000 .....      write C..-byte of string
8188                 C.............C
8189                 S.............S
8190                 ...
8191 WriteA:         10101 RRR .....      write array[RRR]
8192                 C.............C      size of array = C...C
8193                 c.............c      contents = c...c
8194                 ...
8195 End:            10110 000 .....      terminate the execution
8196
8197 SetSelfCS:      10111 RRR C...C      RRR AAAAA= C...C
8198                 ..........AAAAA
8199 SetSelfCL:      11000 RRR .....      RRR AAAAA= c...c
8200                 c.............c
8201                 ..........AAAAA
8202 SetSelfR:       11001 RRR ..Rrr      RRR AAAAA= rrr
8203                 ..........AAAAA
8204 SetExprCL:      11010 RRR ..Rrr      RRR = rrr AAAAA c...c
8205                 c.............c
8206                 ..........AAAAA
8207 SetExprR:       11011 RRR ..rrr      RRR = rrr AAAAA Rrr
8208                 ............Rrr
8209                 ..........AAAAA
8210 JumpCondC:      11100 RRR c...c      if !(RRR AAAAA C..) jump to c...c
8211                 C.............C
8212                 ..........AAAAA
8213 JumpCondR:      11101 RRR c...c      if !(RRR AAAAA rrr) jump to c...c
8214                 ............rrr
8215                 ..........AAAAA
8216 ReadJumpCondC:  11110 RRR c...c      Read1 and JumpCondC
8217                 C.............C
8218                 ..........AAAAA
8219 ReadJumpCondR:  11111 RRR c...c      Read1 and JumpCondR
8220                 ............rrr
8221                 ..........AAAAA
8222 @end example
8223
8224 @node The Lisp Reader and Compiler, Lstreams, MULE Character Sets and Encodings, Top
8225 @chapter The Lisp Reader and Compiler
8226 @cindex Lisp reader and compiler, the
8227 @cindex reader and compiler, the Lisp
8228 @cindex compiler, the Lisp reader and
8229
8230 Not yet documented.
8231
8232 @node Lstreams, Consoles; Devices; Frames; Windows, The Lisp Reader and Compiler, Top
8233 @chapter Lstreams
8234 @cindex lstreams
8235
8236   An @dfn{lstream} is an internal Lisp object that provides a generic
8237 buffering stream implementation.  Conceptually, you send data to the
8238 stream or read data from the stream, not caring what's on the other end
8239 of the stream.  The other end could be another stream, a file
8240 descriptor, a stdio stream, a fixed block of memory, a reallocating
8241 block of memory, etc.  The main purpose of the stream is to provide a
8242 standard interface and to do buffering.  Macros are defined to read or
8243 write characters, so the calling functions do not have to worry about
8244 blocking data together in order to achieve efficiency.
8245
8246 @menu
8247 * Creating an Lstream::         Creating an lstream object.
8248 * Lstream Types::               Different sorts of things that are streamed.
8249 * Lstream Functions::           Functions for working with lstreams.
8250 * Lstream Methods::             Creating new lstream types.
8251 @end menu
8252
8253 @node Creating an Lstream
8254 @section Creating an Lstream
8255 @cindex lstream, creating an
8256
8257 Lstreams come in different types, depending on what is being interfaced
8258 to.  Although the primitive for creating new lstreams is
8259 @code{Lstream_new()}, generally you do not call this directly.  Instead,
8260 you call some type-specific creation function, which creates the lstream
8261 and initializes it as appropriate for the particular type.
8262
8263 All lstream creation functions take a @var{mode} argument, specifying
8264 what mode the lstream should be opened as.  This controls whether the
8265 lstream is for input and output, and optionally whether data should be
8266 blocked up in units of MULE characters.  Note that some types of
8267 lstreams can only be opened for input; others only for output; and
8268 others can be opened either way.  #### Richard Mlynarik thinks that
8269 there should be a strict separation between input and output streams,
8270 and he's probably right.
8271
8272   @var{mode} is a string, one of
8273
8274 @table @code
8275 @item "r"
8276   Open for reading.
8277 @item "w"
8278   Open for writing.
8279 @item "rc"
8280   Open for reading, but ``read'' never returns partial MULE characters.
8281 @item "wc"
8282   Open for writing, but never writes partial MULE characters.
8283 @end table
8284
8285 @node Lstream Types
8286 @section Lstream Types
8287 @cindex lstream types
8288 @cindex types, lstream
8289
8290 @table @asis
8291 @item stdio
8292
8293 @item filedesc
8294
8295 @item lisp-string
8296
8297 @item fixed-buffer
8298
8299 @item resizing-buffer
8300
8301 @item dynarr
8302
8303 @item lisp-buffer
8304
8305 @item print
8306
8307 @item decoding
8308
8309 @item encoding
8310 @end table
8311
8312 @node Lstream Functions
8313 @section Lstream Functions
8314 @cindex lstream functions
8315
8316 @deftypefun {Lstream *} Lstream_new (Lstream_implementation *@var{imp}, const char *@var{mode})
8317 Allocate and return a new Lstream.  This function is not really meant to
8318 be called directly; rather, each stream type should provide its own
8319 stream creation function, which creates the stream and does any other
8320 necessary creation stuff (e.g. opening a file).
8321 @end deftypefun
8322
8323 @deftypefun void Lstream_set_buffering (Lstream *@var{lstr}, Lstream_buffering @var{buffering}, int @var{buffering_size})
8324 Change the buffering of a stream.  See @file{lstream.h}.  By default the
8325 buffering is @code{STREAM_BLOCK_BUFFERED}.
8326 @end deftypefun
8327
8328 @deftypefun int Lstream_flush (Lstream *@var{lstr})
8329 Flush out any pending unwritten data in the stream.  Clear any buffered
8330 input data.  Returns 0 on success, -1 on error.
8331 @end deftypefun
8332
8333 @deftypefn Macro int Lstream_putc (Lstream *@var{stream}, int @var{c})
8334 Write out one byte to the stream.  This is a macro and so it is very
8335 efficient.  The @var{c} argument is only evaluated once but the @var{stream}
8336 argument is evaluated more than once.  Returns 0 on success, -1 on
8337 error.
8338 @end deftypefn
8339
8340 @deftypefn Macro int Lstream_getc (Lstream *@var{stream})
8341 Read one byte from the stream.  This is a macro and so it is very
8342 efficient.  The @var{stream} argument is evaluated more than once.  Return
8343 value is -1 for EOF or error.
8344 @end deftypefn
8345
8346 @deftypefn Macro void Lstream_ungetc (Lstream *@var{stream}, int @var{c})
8347 Push one byte back onto the input queue.  This will be the next byte
8348 read from the stream.  Any number of bytes can be pushed back and will
8349 be read in the reverse order they were pushed back---most recent
8350 first. (This is necessary for consistency---if there are a number of
8351 bytes that have been unread and I read and unread a byte, it needs to be
8352 the first to be read again.) This is a macro and so it is very
8353 efficient.  The @var{c} argument is only evaluated once but the @var{stream}
8354 argument is evaluated more than once.
8355 @end deftypefn
8356
8357 @deftypefun int Lstream_fputc (Lstream *@var{stream}, int @var{c})
8358 @deftypefunx int Lstream_fgetc (Lstream *@var{stream})
8359 @deftypefunx void Lstream_fungetc (Lstream *@var{stream}, int @var{c})
8360 Function equivalents of the above macros.
8361 @end deftypefun
8362
8363 @deftypefun ssize_t Lstream_read (Lstream *@var{stream}, void *@var{data}, size_t @var{size})
8364 Read @var{size} bytes of @var{data} from the stream.  Return the number
8365 of bytes read.  0 means EOF. -1 means an error occurred and no bytes
8366 were read.
8367 @end deftypefun
8368
8369 @deftypefun ssize_t Lstream_write (Lstream *@var{stream}, void *@var{data}, size_t @var{size})
8370 Write @var{size} bytes of @var{data} to the stream.  Return the number
8371 of bytes written.  -1 means an error occurred and no bytes were written.
8372 @end deftypefun
8373
8374 @deftypefun void Lstream_unread (Lstream *@var{stream}, void *@var{data}, size_t @var{size})
8375 Push back @var{size} bytes of @var{data} onto the input queue.  The next
8376 call to @code{Lstream_read()} with the same size will read the same
8377 bytes back.  Note that this will be the case even if there is other
8378 pending unread data.
8379 @end deftypefun
8380
8381 @deftypefun int Lstream_close (Lstream *@var{stream})
8382 Close the stream.  All data will be flushed out.
8383 @end deftypefun
8384
8385 @deftypefun void Lstream_reopen (Lstream *@var{stream})
8386 Reopen a closed stream.  This enables I/O on it again.  This is not
8387 meant to be called except from a wrapper routine that reinitializes
8388 variables and such---the close routine may well have freed some
8389 necessary storage structures, for example.
8390 @end deftypefun
8391
8392 @deftypefun void Lstream_rewind (Lstream *@var{stream})
8393 Rewind the stream to the beginning.
8394 @end deftypefun
8395
8396 @node Lstream Methods
8397 @section Lstream Methods
8398 @cindex lstream methods
8399
8400 @deftypefn {Lstream Method} ssize_t reader (Lstream *@var{stream}, unsigned char *@var{data}, size_t @var{size})
8401 Read some data from the stream's end and store it into @var{data}, which
8402 can hold @var{size} bytes.  Return the number of bytes read.  A return
8403 value of 0 means no bytes can be read at this time.  This may be because
8404 of an EOF, or because there is a granularity greater than one byte that
8405 the stream imposes on the returned data, and @var{size} is less than
8406 this granularity. (This will happen frequently for streams that need to
8407 return whole characters, because @code{Lstream_read()} calls the reader
8408 function repeatedly until it has the number of bytes it wants or until 0
8409 is returned.)  The lstream functions do not treat a 0 return as EOF or
8410 do anything special; however, the calling function will interpret any 0
8411 it gets back as EOF.  This will normally not happen unless the caller
8412 calls @code{Lstream_read()} with a very small size.
8413
8414 This function can be @code{NULL} if the stream is output-only.
8415 @end deftypefn
8416
8417 @deftypefn {Lstream Method} ssize_t writer (Lstream *@var{stream}, const unsigned char *@var{data}, size_t @var{size})
8418 Send some data to the stream's end.  Data to be sent is in @var{data}
8419 and is @var{size} bytes.  Return the number of bytes sent.  This
8420 function can send and return fewer bytes than is passed in; in that
8421 case, the function will just be called again until there is no data left
8422 or 0 is returned.  A return value of 0 means that no more data can be
8423 currently stored, but there is no error; the data will be squirreled
8424 away until the writer can accept data. (This is useful, e.g., if you're
8425 dealing with a non-blocking file descriptor and are getting
8426 @code{EWOULDBLOCK} errors.)  This function can be @code{NULL} if the
8427 stream is input-only.
8428 @end deftypefn
8429
8430 @deftypefn {Lstream Method} int rewinder (Lstream *@var{stream})
8431 Rewind the stream.  If this is @code{NULL}, the stream is not seekable.
8432 @end deftypefn
8433
8434 @deftypefn {Lstream Method} int seekable_p (Lstream *@var{stream})
8435 Indicate whether this stream is seekable---i.e. it can be rewound.
8436 This method is ignored if the stream does not have a rewind method.  If
8437 this method is not present, the result is determined by whether a rewind
8438 method is present.
8439 @end deftypefn
8440
8441 @deftypefn {Lstream Method} int flusher (Lstream *@var{stream})
8442 Perform any additional operations necessary to flush the data in this
8443 stream.
8444 @end deftypefn
8445
8446 @deftypefn {Lstream Method} int pseudo_closer (Lstream *@var{stream})
8447 @end deftypefn
8448
8449 @deftypefn {Lstream Method} int closer (Lstream *@var{stream})
8450 Perform any additional operations necessary to close this stream down.
8451 May be @code{NULL}.  This function is called when @code{Lstream_close()}
8452 is called or when the stream is garbage-collected.  When this function
8453 is called, all pending data in the stream will already have been written
8454 out.
8455 @end deftypefn
8456
8457 @deftypefn {Lstream Method} Lisp_Object marker (Lisp_Object @var{lstream}, void (*@var{markfun}) (Lisp_Object))
8458 Mark this object for garbage collection.  Same semantics as a standard
8459 @code{Lisp_Object} marker.  This function can be @code{NULL}.
8460 @end deftypefn
8461
8462 @node Consoles; Devices; Frames; Windows, The Redisplay Mechanism, Lstreams, Top
8463 @chapter Consoles; Devices; Frames; Windows
8464 @cindex consoles; devices; frames; windows
8465 @cindex devices; frames; windows, consoles;
8466 @cindex frames; windows, consoles; devices;
8467 @cindex windows, consoles; devices; frames;
8468
8469 @menu
8470 * Introduction to Consoles; Devices; Frames; Windows::
8471 * Point::
8472 * Window Hierarchy::
8473 * The Window Object::
8474 @end menu
8475
8476 @node Introduction to Consoles; Devices; Frames; Windows
8477 @section Introduction to Consoles; Devices; Frames; Windows
8478 @cindex consoles; devices; frames; windows, introduction to
8479 @cindex devices; frames; windows, introduction to consoles;
8480 @cindex frames; windows, introduction to consoles; devices;
8481 @cindex windows, introduction to consoles; devices; frames;
8482
8483 A window-system window that you see on the screen is called a
8484 @dfn{frame} in Emacs terminology.  Each frame is subdivided into one or
8485 more non-overlapping panes, called (confusingly) @dfn{windows}.  Each
8486 window displays the text of a buffer in it. (See above on Buffers.) Note
8487 that buffers and windows are independent entities: Two or more windows
8488 can be displaying the same buffer (potentially in different locations),
8489 and a buffer can be displayed in no windows.
8490
8491   A single display screen that contains one or more frames is called
8492 a @dfn{display}.  Under most circumstances, there is only one display.
8493 However, more than one display can exist, for example if you have
8494 a @dfn{multi-headed} console, i.e. one with a single keyboard but
8495 multiple displays. (Typically in such a situation, the various
8496 displays act like one large display, in that the mouse is only
8497 in one of them at a time, and moving the mouse off of one moves
8498 it into another.) In some cases, the different displays will
8499 have different characteristics, e.g. one color and one mono.
8500
8501   XEmacs can display frames on multiple displays.  It can even deal
8502 simultaneously with frames on multiple keyboards (called @dfn{consoles} in
8503 XEmacs terminology).  Here is one case where this might be useful: You
8504 are using XEmacs on your workstation at work, and leave it running.
8505 Then you go home and dial in on a TTY line, and you can use the
8506 already-running XEmacs process to display another frame on your local
8507 TTY.
8508
8509   Thus, there is a hierarchy console -> display -> frame -> window.
8510 There is a separate Lisp object type for each of these four concepts.
8511 Furthermore, there is logically a @dfn{selected console},
8512 @dfn{selected display}, @dfn{selected frame}, and @dfn{selected window}.
8513 Each of these objects is distinguished in various ways, such as being the
8514 default object for various functions that act on objects of that type.
8515 Note that every containing object remembers the ``selected'' object
8516 among the objects that it contains: e.g. not only is there a selected
8517 window, but every frame remembers the last window in it that was
8518 selected, and changing the selected frame causes the remembered window
8519 within it to become the selected window.  Similar relationships apply
8520 for consoles to devices and devices to frames.
8521
8522 @node Point
8523 @section Point
8524 @cindex point
8525
8526   Recall that every buffer has a current insertion position, called
8527 @dfn{point}.  Now, two or more windows may be displaying the same buffer,
8528 and the text cursor in the two windows (i.e. @code{point}) can be in
8529 two different places.  You may ask, how can that be, since each
8530 buffer has only one value of @code{point}?  The answer is that each window
8531 also has a value of @code{point} that is squirreled away in it.  There
8532 is only one selected window, and the value of ``point'' in that buffer
8533 corresponds to that window.  When the selected window is changed
8534 from one window to another displaying the same buffer, the old
8535 value of @code{point} is stored into the old window's ``point'' and the
8536 value of @code{point} from the new window is retrieved and made the
8537 value of @code{point} in the buffer.  This means that @code{window-point}
8538 for the selected window is potentially inaccurate, and if you
8539 want to retrieve the correct value of @code{point} for a window,
8540 you must special-case on the selected window and retrieve the
8541 buffer's point instead.  This is related to why @code{save-window-excursion}
8542 does not save the selected window's value of @code{point}.
8543
8544 @node Window Hierarchy
8545 @section Window Hierarchy
8546 @cindex window hierarchy
8547 @cindex hierarchy of windows
8548
8549   If a frame contains multiple windows (panes), they are always created
8550 by splitting an existing window along the horizontal or vertical axis.
8551 Terminology is a bit confusing here: to @dfn{split a window
8552 horizontally} means to create two side-by-side windows, i.e. to make a
8553 @emph{vertical} cut in a window.  Likewise, to @dfn{split a window
8554 vertically} means to create two windows, one above the other, by making
8555 a @emph{horizontal} cut.
8556
8557   If you split a window and then split again along the same axis, you
8558 will end up with a number of panes all arranged along the same axis.
8559 The precise way in which the splits were made should not be important,
8560 and this is reflected internally.  Internally, all windows are arranged
8561 in a tree, consisting of two types of windows, @dfn{combination} windows
8562 (which have children, and are covered completely by those children) and
8563 @dfn{leaf} windows, which have no children and are visible.  Every
8564 combination window has two or more children, all arranged along the same
8565 axis.  There are (logically) two subtypes of windows, depending on
8566 whether their children are horizontally or vertically arrayed.  There is
8567 always one root window, which is either a leaf window (if the frame
8568 contains only one window) or a combination window (if the frame contains
8569 more than one window).  In the latter case, the root window will have
8570 two or more children, either horizontally or vertically arrayed, and
8571 each of those children will be either a leaf window or another
8572 combination window.
8573
8574   Here are some rules:
8575
8576 @enumerate
8577 @item
8578 Horizontal combination windows can never have children that are
8579 horizontal combination windows; same for vertical.
8580
8581 @item
8582 Only leaf windows can be split (obviously) and this splitting does one
8583 of two things: (a) turns the leaf window into a combination window and
8584 creates two new leaf children, or (b) turns the leaf window into one of
8585 the two new leaves and creates the other leaf.  Rule (1) dictates which
8586 of these two outcomes happens.
8587
8588 @item
8589 Every combination window must have at least two children.
8590
8591 @item
8592 Leaf windows can never become combination windows.  They can be deleted,
8593 however.  If this results in a violation of (3), the parent combination
8594 window also gets deleted.
8595
8596 @item
8597 All functions that accept windows must be prepared to accept combination
8598 windows, and do something sane (e.g. signal an error if so).
8599 Combination windows @emph{do} escape to the Lisp level.
8600
8601 @item
8602 All windows have three fields governing their contents:
8603 these are @dfn{hchild} (a list of horizontally-arrayed children),
8604 @dfn{vchild} (a list of vertically-arrayed children), and @dfn{buffer}
8605 (the buffer contained in a leaf window).  Exactly one of
8606 these will be non-@code{nil}.  Remember that @dfn{horizontally-arrayed}
8607 means ``side-by-side'' and @dfn{vertically-arrayed} means
8608 @dfn{one above the other}.
8609
8610 @item
8611 Leaf windows also have markers in their @code{start} (the
8612 first buffer position displayed in the window) and @code{pointm}
8613 (the window's stashed value of @code{point}---see above) fields,
8614 while combination windows have @code{nil} in these fields.
8615
8616 @item
8617 The list of children for a window is threaded through the
8618 @code{next} and @code{prev} fields of each child window.
8619
8620 @item
8621 @strong{Deleted windows can be undeleted}.  This happens as a result of
8622 restoring a window configuration, and is unlike frames, displays, and
8623 consoles, which, once deleted, can never be restored.  Deleting a window
8624 does nothing except set a special @code{dead} bit to 1 and clear out the
8625 @code{next}, @code{prev}, @code{hchild}, and @code{vchild} fields, for
8626 GC purposes.
8627
8628 @item
8629 Most frames actually have two top-level windows---one for the
8630 minibuffer and one (the @dfn{root}) for everything else.  The modeline
8631 (if present) separates these two.  The @code{next} field of the root
8632 points to the minibuffer, and the @code{prev} field of the minibuffer
8633 points to the root.  The other @code{next} and @code{prev} fields are
8634 @code{nil}, and the frame points to both of these windows.
8635 Minibuffer-less frames have no minibuffer window, and the @code{next}
8636 and @code{prev} of the root window are @code{nil}.  Minibuffer-only
8637 frames have no root window, and the @code{next} of the minibuffer window
8638 is @code{nil} but the @code{prev} points to itself. (#### This is an
8639 artifact that should be fixed.)
8640 @end enumerate
8641
8642 @node The Window Object
8643 @section The Window Object
8644 @cindex window object, the
8645 @cindex object, the window
8646
8647   Windows have the following accessible fields:
8648
8649 @table @code
8650 @item frame
8651 The frame that this window is on.
8652
8653 @item mini_p
8654 Non-@code{nil} if this window is a minibuffer window.
8655
8656 @item buffer
8657 The buffer that the window is displaying.  This may change often during
8658 the life of the window.
8659
8660 @item dedicated
8661 Non-@code{nil} if this window is dedicated to its buffer.
8662
8663 @item pointm
8664 @cindex window point internals
8665 This is the value of point in the current buffer when this window is
8666 selected; when it is not selected, it retains its previous value.
8667
8668 @item start
8669 The position in the buffer that is the first character to be displayed
8670 in the window.
8671
8672 @item force_start
8673 If this flag is non-@code{nil}, it says that the window has been
8674 scrolled explicitly by the Lisp program.  This affects what the next
8675 redisplay does if point is off the screen: instead of scrolling the
8676 window to show the text around point, it moves point to a location that
8677 is on the screen.
8678
8679 @item last_modified
8680 The @code{modified} field of the window's buffer, as of the last time
8681 a redisplay completed in this window.
8682
8683 @item last_point
8684 The buffer's value of point, as of the last time
8685 a redisplay completed in this window.
8686
8687 @item left
8688 This is the left-hand edge of the window, measured in columns.  (The
8689 leftmost column on the screen is @w{column 0}.)
8690
8691 @item top
8692 This is the top edge of the window, measured in lines.  (The top line on
8693 the screen is @w{line 0}.)
8694
8695 @item height
8696 The height of the window, measured in lines.
8697
8698 @item width
8699 The width of the window, measured in columns.
8700
8701 @item next
8702 This is the window that is the next in the chain of siblings.  It is
8703 @code{nil} in a window that is the rightmost or bottommost of a group of
8704 siblings.
8705
8706 @item prev
8707 This is the window that is the previous in the chain of siblings.  It is
8708 @code{nil} in a window that is the leftmost or topmost of a group of
8709 siblings.
8710
8711 @item parent
8712 Internally, XEmacs arranges windows in a tree; each group of siblings has
8713 a parent window whose area includes all the siblings.  This field points
8714 to a window's parent.
8715
8716 Parent windows do not display buffers, and play little role in display
8717 except to shape their child windows.  Emacs Lisp programs usually have
8718 no access to the parent windows; they operate on the windows at the
8719 leaves of the tree, which actually display buffers.
8720
8721 @item hscroll
8722 This is the number of columns that the display in the window is scrolled
8723 horizontally to the left.  Normally, this is 0.
8724
8725 @item use_time
8726 This is the last time that the window was selected.  The function
8727 @code{get-lru-window} uses this field.
8728
8729 @item display_table
8730 The window's display table, or @code{nil} if none is specified for it.
8731
8732 @item update_mode_line
8733 Non-@code{nil} means this window's mode line needs to be updated.
8734
8735 @item base_line_number
8736 The line number of a certain position in the buffer, or @code{nil}.
8737 This is used for displaying the line number of point in the mode line.
8738
8739 @item base_line_pos
8740 The position in the buffer for which the line number is known, or
8741 @code{nil} meaning none is known.
8742
8743 @item region_showing
8744 If the region (or part of it) is highlighted in this window, this field
8745 holds the mark position that made one end of that region.  Otherwise,
8746 this field is @code{nil}.
8747 @end table
8748
8749 @node The Redisplay Mechanism, Extents, Consoles; Devices; Frames; Windows, Top
8750 @chapter The Redisplay Mechanism
8751 @cindex redisplay mechanism, the
8752
8753   The redisplay mechanism is one of the most complicated sections of
8754 XEmacs, especially from a conceptual standpoint.  This is doubly so
8755 because, unlike for the basic aspects of the Lisp interpreter, the
8756 computer science theories of how to efficiently handle redisplay are not
8757 well-developed.
8758
8759   When working with the redisplay mechanism, remember the Golden Rules
8760 of Redisplay:
8761
8762 @enumerate
8763 @item
8764 It Is Better To Be Correct Than Fast.
8765 @item
8766 Thou Shalt Not Run Elisp From Within Redisplay.
8767 @item
8768 It Is Better To Be Fast Than Not To Be.
8769 @end enumerate
8770
8771 @menu
8772 * Critical Redisplay Sections::
8773 * Line Start Cache::
8774 * Redisplay Piece by Piece::
8775 @end menu
8776
8777 @node Critical Redisplay Sections
8778 @section Critical Redisplay Sections
8779 @cindex redisplay sections, critical
8780 @cindex critical redisplay sections
8781
8782 Within this section, we are defenseless and assume that the
8783 following cannot happen:
8784
8785 @enumerate
8786 @item
8787 garbage collection
8788 @item
8789 Lisp code evaluation
8790 @item
8791 frame size changes
8792 @end enumerate
8793
8794 We ensure (3) by calling @code{hold_frame_size_changes()}, which
8795 will cause any pending frame size changes to get put on hold
8796 till after the end of the critical section.  (1) follows
8797 automatically if (2) is met.  #### Unfortunately, there are
8798 some places where Lisp code can be called within this section.
8799 We need to remove them.
8800
8801 If @code{Fsignal()} is called during this critical section, we
8802 will @code{abort()}.
8803
8804 If garbage collection is called during this critical section,
8805 we simply return. #### We should abort instead.
8806
8807 #### If a frame-size change does occur we should probably
8808 actually be preempting redisplay.
8809
8810 @node Line Start Cache
8811 @section Line Start Cache
8812 @cindex line start cache
8813
8814   The traditional scrolling code in Emacs breaks in a variable height
8815 world.  It depends on the key assumption that the number of lines that
8816 can be displayed at any given time is fixed.  This led to a complete
8817 separation of the scrolling code from the redisplay code.  In order to
8818 fully support variable height lines, the scrolling code must actually be
8819 tightly integrated with redisplay.  Only redisplay can determine how
8820 many lines will be displayed on a screen for any given starting point.
8821
8822   What is ideally wanted is a complete list of the starting buffer
8823 position for every possible display line of a buffer along with the
8824 height of that display line.  Maintaining such a full list would be very
8825 expensive.  We settle for having it include information for all areas
8826 which we happen to generate anyhow (i.e. the region currently being
8827 displayed) and for those areas we need to work with.
8828
8829   In order to ensure that the cache accurately represents what redisplay
8830 would actually show, it is necessary to invalidate it in many
8831 situations.  If the buffer changes, the starting positions may no longer
8832 be correct.  If a face or an extent has changed then the line heights
8833 may have altered.  These events happen frequently enough that the cache
8834 can end up being constantly disabled.  With this potentially constant
8835 invalidation when is the cache ever useful?
8836
8837   Even if the cache is invalidated before every single usage, it is
8838 necessary.  Scrolling often requires knowledge about display lines which
8839 are actually above or below the visible region.  The cache provides a
8840 convenient light-weight method of storing this information for multiple
8841 display regions.  This knowledge is necessary for the scrolling code to
8842 always obey the First Golden Rule of Redisplay.
8843
8844   If the cache already contains all of the information that the scrolling
8845 routines happen to need so that it doesn't have to go generate it, then
8846 we are able to obey the Third Golden Rule of Redisplay.  The first thing
8847 we do to help out the cache is to always add the displayed region.  This
8848 region had to be generated anyway, so the cache ends up getting the
8849 information basically for free.  In those cases where a user is simply
8850 scrolling around viewing a buffer there is a high probability that this
8851 is sufficient to always provide the needed information.  The second
8852 thing we can do is be smart about invalidating the cache.
8853
8854   TODO---Be smart about invalidating the cache.  Potential places:
8855
8856 @itemize @bullet
8857 @item
8858 Insertions at end-of-line which don't cause line-wraps do not alter the
8859 starting positions of any display lines.  These types of buffer
8860 modifications should not invalidate the cache.  This is actually a large
8861 optimization for redisplay speed as well.
8862 @item
8863 Buffer modifications frequently only affect the display of lines at and
8864 below where they occur.  In these situations we should only invalidate
8865 the part of the cache starting at where the modification occurs.
8866 @end itemize
8867
8868   In case you're wondering, the Second Golden Rule of Redisplay is not
8869 applicable.
8870
8871 @node Redisplay Piece by Piece
8872 @section Redisplay Piece by Piece
8873 @cindex redisplay piece by piece
8874
8875 As you can begin to see redisplay is complex and also not well
8876 documented. Chuck no longer works on XEmacs so this section is my take
8877 on the workings of redisplay.
8878
8879 Redisplay happens in three phases:
8880
8881 @enumerate
8882 @item
8883 Determine desired display in area that needs redisplay.
8884 Implemented by @code{redisplay.c}
8885 @item
8886 Compare desired display with current display
8887 Implemented by @code{redisplay-output.c}
8888 @item
8889 Output changes Implemented by @code{redisplay-output.c},
8890 @code{redisplay-x.c}, @code{redisplay-msw.c} and @code{redisplay-tty.c}
8891 @end enumerate
8892
8893 Steps 1 and 2 are device-independent and relatively complex.  Step 3 is
8894 mostly device-dependent.
8895
8896 Determining the desired display
8897
8898 Display attributes are stored in @code{display_line} structures. Each
8899 @code{display_line} consists of a set of @code{display_block}'s and each
8900 @code{display_block} contains a number of @code{rune}'s. Generally
8901 dynarr's of @code{display_line}'s are held by each window representing
8902 the current display and the desired display.
8903
8904 The @code{display_line} structures are tightly tied to buffers which
8905 presents a problem for redisplay as this connection is bogus for the
8906 modeline. Hence the @code{display_line} generation routines are
8907 duplicated for generating the modeline. This means that the modeline
8908 display code has many bugs that the standard redisplay code does not.
8909
8910 The guts of @code{display_line} generation are in
8911 @code{create_text_block}, which creates a single display line for the
8912 desired locale. This incrementally parses the characters on the current
8913 line and generates redisplay structures for each.
8914
8915 Gutter redisplay is different. Because the data to display is stored in
8916 a string we cannot use @code{create_text_block}. Instead we use
8917 @code{create_text_string_block} which performs the same function as
8918 @code{create_text_block} but for strings. Many of the complexities of
8919 @code{create_text_block} to do with cursor handling and selective
8920 display have been removed.
8921
8922 @node Extents, Faces, The Redisplay Mechanism, Top
8923 @chapter Extents
8924 @cindex extents
8925
8926 @menu
8927 * Introduction to Extents::     Extents are ranges over text, with properties.
8928 * Extent Ordering::             How extents are ordered internally.
8929 * Format of the Extent Info::   The extent information in a buffer or string.
8930 * Zero-Length Extents::         A weird special case.
8931 * Mathematics of Extent Ordering::  A rigorous foundation.
8932 * Extent Fragments::            Cached information useful for redisplay.
8933 @end menu
8934
8935 @node Introduction to Extents
8936 @section Introduction to Extents
8937 @cindex extents, introduction to
8938
8939   Extents are regions over a buffer, with a start and an end position
8940 denoting the region of the buffer included in the extent.  In
8941 addition, either end can be closed or open, meaning that the endpoint
8942 is or is not logically included in the extent.  Insertion of a character
8943 at a closed endpoint causes the character to go inside the extent;
8944 insertion at an open endpoint causes the character to go outside.
8945
8946   Extent endpoints are stored using memory indices (see @file{insdel.c}),
8947 to minimize the amount of adjusting that needs to be done when
8948 characters are inserted or deleted.
8949
8950   (Formerly, extent endpoints at the gap could be either before or
8951 after the gap, depending on the open/closedness of the endpoint.
8952 The intent of this was to make it so that insertions would
8953 automatically go inside or out of extents as necessary with no
8954 further work needing to be done.  It didn't work out that way,
8955 however, and just ended up complexifying and buggifying all the
8956 rest of the code.)
8957
8958 @node Extent Ordering
8959 @section Extent Ordering
8960 @cindex extent ordering
8961
8962   Extents are compared using memory indices.  There are two orderings
8963 for extents and both orders are kept current at all times.  The normal
8964 or @dfn{display} order is as follows:
8965
8966 @example
8967 Extent A is ``less than'' extent B,
8968 that is, earlier in the display order,
8969   if:    A-start < B-start,
8970   or if: A-start = B-start, and A-end > B-end
8971 @end example
8972
8973   So if two extents begin at the same position, the larger of them is the
8974 earlier one in the display order (@code{EXTENT_LESS} is true).
8975
8976   For the e-order, the same thing holds:
8977
8978 @example
8979 Extent A is ``less than'' extent B in e-order,
8980 that is, later in the buffer,
8981   if:    A-end < B-end,
8982   or if: A-end = B-end, and A-start > B-start
8983 @end example
8984
8985   So if two extents end at the same position, the smaller of them is the
8986 earlier one in the e-order (@code{EXTENT_E_LESS} is true).
8987
8988   The display order and the e-order are complementary orders: any
8989 theorem about the display order also applies to the e-order if you swap
8990 all occurrences of ``display order'' and ``e-order'', ``less than'' and
8991 ``greater than'', and ``extent start'' and ``extent end''.
8992
8993 @node Format of the Extent Info
8994 @section Format of the Extent Info
8995 @cindex extent info, format of the
8996
8997   An extent-info structure consists of a list of the buffer or string's
8998 extents and a @dfn{stack of extents} that lists all of the extents over
8999 a particular position.  The stack-of-extents info is used for
9000 optimization purposes---it basically caches some info that might
9001 be expensive to compute.  Certain otherwise hard computations are easy
9002 given the stack of extents over a particular position, and if the
9003 stack of extents over a nearby position is known (because it was
9004 calculated at some prior point in time), it's easy to move the stack
9005 of extents to the proper position.
9006
9007   Given that the stack of extents is an optimization, and given that
9008 it requires memory, a string's stack of extents is wiped out each
9009 time a garbage collection occurs.  Therefore, any time you retrieve
9010 the stack of extents, it might not be there.  If you need it to
9011 be there, use the @code{_force} version.
9012
9013   Similarly, a string may or may not have an extent_info structure.
9014 (Generally it won't if there haven't been any extents added to the
9015 string.) So use the @code{_force} version if you need the extent_info
9016 structure to be there.
9017
9018   A list of extents is maintained as a double gap array: one gap array
9019 is ordered by start index (the @dfn{display order}) and the other is
9020 ordered by end index (the @dfn{e-order}).  Note that positions in an
9021 extent list should logically be conceived of as referring @emph{to} a
9022 particular extent (as is the norm in programs) rather than sitting
9023 between two extents.  Note also that callers of these functions should
9024 not be aware of the fact that the extent list is implemented as an
9025 array, except for the fact that positions are integers (this should be
9026 generalized to handle integers and linked list equally well).
9027
9028 @node Zero-Length Extents
9029 @section Zero-Length Extents
9030 @cindex zero-length extents
9031 @cindex extents, zero-length
9032
9033   Extents can be zero-length, and will end up that way if their endpoints
9034 are explicitly set that way or if their detachable property is @code{nil}
9035 and all the text in the extent is deleted. (The exception is open-open
9036 zero-length extents, which are barred from existing because there is
9037 no sensible way to define their properties.  Deletion of the text in
9038 an open-open extent causes it to be converted into a closed-open
9039 extent.)  Zero-length extents are primarily used to represent
9040 annotations, and behave as follows:
9041
9042 @enumerate
9043 @item
9044 Insertion at the position of a zero-length extent expands the extent
9045 if both endpoints are closed; goes after the extent if it is closed-open;
9046 and goes before the extent if it is open-closed.
9047
9048 @item
9049 Deletion of a character on a side of a zero-length extent whose
9050 corresponding endpoint is closed causes the extent to be detached if
9051 it is detachable; if the extent is not detachable or the corresponding
9052 endpoint is open, the extent remains in the buffer, moving as necessary.
9053 @end enumerate
9054
9055   Note that closed-open, non-detachable zero-length extents behave
9056 exactly like markers and that open-closed, non-detachable zero-length
9057 extents behave like the ``point-type'' marker in Mule.
9058
9059 @node Mathematics of Extent Ordering
9060 @section Mathematics of Extent Ordering
9061 @cindex mathematics of extent ordering
9062 @cindex extent mathematics
9063 @cindex extent ordering
9064
9065 @cindex display order of extents
9066 @cindex extents, display order
9067   The extents in a buffer are ordered by ``display order'' because that
9068 is that order that the redisplay mechanism needs to process them in.
9069 The e-order is an auxiliary ordering used to facilitate operations
9070 over extents.  The operations that can be performed on the ordered
9071 list of extents in a buffer are
9072
9073 @enumerate
9074 @item
9075 Locate where an extent would go if inserted into the list.
9076 @item
9077 Insert an extent into the list.
9078 @item
9079 Remove an extent from the list.
9080 @item
9081 Map over all the extents that overlap a range.
9082 @end enumerate
9083
9084   (4) requires being able to determine the first and last extents
9085 that overlap a range.
9086
9087   NOTE: @dfn{overlap} is used as follows:
9088
9089 @itemize @bullet
9090 @item
9091 two ranges overlap if they have at least one point in common.
9092 Whether the endpoints are open or closed makes a difference here.
9093 @item
9094 a point overlaps a range if the point is contained within the
9095 range; this is equivalent to treating a point @math{P} as the range
9096 @math{[P, P]}.
9097 @item
9098 In the case of an @emph{extent} overlapping a point or range, the extent
9099 is normally treated as having closed endpoints.  This applies
9100 consistently in the discussion of stacks of extents and such below.
9101 Note that this definition of overlap is not necessarily consistent with
9102 the extents that @code{map-extents} maps over, since @code{map-extents}
9103 sometimes pays attention to whether the endpoints of an extents are open
9104 or closed.  But for our purposes, it greatly simplifies things to treat
9105 all extents as having closed endpoints.
9106 @end itemize
9107
9108 First, define @math{>}, @math{<}, @math{<=}, etc. as applied to extents
9109 to mean comparison according to the display order.  Comparison between
9110 an extent @math{E} and an index @math{I} means comparison between
9111 @math{E} and the range @math{[I, I]}.
9112
9113 Also define @math{e>}, @math{e<}, @math{e<=}, etc. to mean comparison
9114 according to the e-order.
9115
9116 For any range @math{R}, define @math{R(0)} to be the starting index of
9117 the range and @math{R(1)} to be the ending index of the range.
9118
9119 For any extent @math{E}, define @math{E(next)} to be the extent directly
9120 following @math{E}, and @math{E(prev)} to be the extent directly
9121 preceding @math{E}.  Assume @math{E(next)} and @math{E(prev)} can be
9122 determined from @math{E} in constant time.  (This is because we store
9123 the extent list as a doubly linked list.)
9124
9125 Similarly, define @math{E(e-next)} and @math{E(e-prev)} to be the
9126 extents directly following and preceding @math{E} in the e-order.
9127
9128 Now:
9129
9130 Let @math{R} be a range.
9131 Let @math{F} be the first extent overlapping @math{R}.
9132 Let @math{L} be the last extent overlapping @math{R}.
9133
9134 Theorem 1: @math{R(1)} lies between @math{L} and @math{L(next)},
9135 i.e. @math{L <= R(1) < L(next)}.
9136
9137   This follows easily from the definition of display order.  The
9138 basic reason that this theorem applies is that the display order
9139 sorts by increasing starting index.
9140
9141   Therefore, we can determine @math{L} just by looking at where we would
9142 insert @math{R(1)} into the list, and if we know @math{F} and are moving
9143 forward over extents, we can easily determine when we've hit @math{L} by
9144 comparing the extent we're at to @math{R(1)}.
9145
9146 @example
9147 Theorem 2: @math{F(e-prev) e< [1, R(0)] e<= F}.
9148 @end example
9149
9150   This is the analog of Theorem 1, and applies because the e-order
9151 sorts by increasing ending index.
9152
9153   Therefore, @math{F} can be found in the same amount of time as
9154 operation (1), i.e. the time that it takes to locate where an extent
9155 would go if inserted into the e-order list.
9156
9157   If the lists were stored as balanced binary trees, then operation (1)
9158 would take logarithmic time, which is usually quite fast.  However,
9159 currently they're stored as simple doubly-linked lists, and instead we
9160 do some caching to try to speed things up.
9161
9162   Define a @dfn{stack of extents} (or @dfn{SOE}) as the set of extents
9163 (ordered in the display order) that overlap an index @math{I}, together
9164 with the SOE's @dfn{previous} extent, which is an extent that precedes
9165 @math{I} in the e-order. (Hopefully there will not be very many extents
9166 between @math{I} and the previous extent.)
9167
9168 Now:
9169
9170 Let @math{I} be an index, let @math{S} be the stack of extents on
9171 @math{I}, let @math{F} be the first extent in @math{S}, and let @math{P}
9172 be @math{S}'s previous extent.
9173
9174 Theorem 3: The first extent in @math{S} is the first extent that overlaps
9175 any range @math{[I, J]}.
9176
9177 Proof: Any extent that overlaps @math{[I, J]} but does not include
9178 @math{I} must have a start index @math{> I}, and thus be greater than
9179 any extent in @math{S}.
9180
9181 Therefore, finding the first extent that overlaps a range @math{R} is
9182 the same as finding the first extent that overlaps @math{R(0)}.
9183
9184 Theorem 4: Let @math{I2} be an index such that @math{I2 > I}, and let
9185 @math{F2} be the first extent that overlaps @math{I2}.  Then, either
9186 @math{F2} is in @math{S} or @math{F2} is greater than any extent in
9187 @math{S}.
9188
9189 Proof: If @math{F2} does not include @math{I} then its start index is
9190 greater than @math{I} and thus it is greater than any extent in
9191 @math{S}, including @math{F}.  Otherwise, @math{F2} includes @math{I}
9192 and thus is in @math{S}, and thus @math{F2 >= F}.
9193
9194 @node Extent Fragments
9195 @section Extent Fragments
9196 @cindex extent fragments
9197 @cindex fragments, extent
9198
9199   Imagine that the buffer is divided up into contiguous, non-overlapping
9200 @dfn{runs} of text such that no extent starts or ends within a run
9201 (extents that abut the run don't count).
9202
9203   An extent fragment is a structure that holds data about the run that
9204 contains a particular buffer position (if the buffer position is at the
9205 junction of two runs, the run after the position is used)---the
9206 beginning and end of the run, a list of all of the extents in that run,
9207 the @dfn{merged face} that results from merging all of the faces
9208 corresponding to those extents, the begin and end glyphs at the
9209 beginning of the run, etc.  This is the information that redisplay needs
9210 in order to display this run.
9211
9212   Extent fragments have to be very quick to update to a new buffer
9213 position when moving linearly through the buffer.  They rely on the
9214 stack-of-extents code, which does the heavy-duty algorithmic work of
9215 determining which extents overly a particular position.
9216
9217 @node Faces, Glyphs, Extents, Top
9218 @chapter Faces
9219 @cindex faces
9220
9221 Not yet documented.
9222
9223 @node Glyphs, Specifiers, Faces, Top
9224 @chapter Glyphs
9225 @cindex glyphs
9226
9227 Glyphs are graphical elements that can be displayed in XEmacs buffers or
9228 gutters. We use the term graphical element here in the broadest possible
9229 sense since glyphs can be as mundane as text or as arcane as a native
9230 tab widget.
9231
9232 In XEmacs, glyphs represent the uninstantiated state of graphical
9233 elements, i.e. they hold all the information necessary to produce an
9234 image on-screen but the image need not exist at this stage, and multiple
9235 screen images can be instantiated from a single glyph.
9236
9237 Glyphs are lazily instantiated by calling one of the glyph
9238 functions. This usually occurs within redisplay when
9239 @code{Fglyph_height} is called. Instantiation causes an image-instance
9240 to be created and cached. This cache is on a per-device basis for all glyphs
9241 except widget-glyphs, and on a per-window basis for widgets-glyphs.  The
9242 caching is done by @code{image_instantiate} and is necessary because it
9243 is generally possible to display an image-instance in multiple
9244 domains. For instance if we create a Pixmap, we can actually display
9245 this on multiple windows - even though we only need a single Pixmap
9246 instance to do this. If caching wasn't done then it would be necessary
9247 to create image-instances for every displayable occurrence of a glyph -
9248 and every usage - and this would be extremely memory and cpu intensive.
9249
9250 Widget-glyphs (a.k.a native widgets) are not cached in this way. This is
9251 because widget-glyph image-instances on screen are toolkit windows, and
9252 thus cannot be reused in multiple XEmacs domains. Thus widget-glyphs are
9253 cached on an XEmacs window basis.
9254
9255 Any action on a glyph first consults the cache before actually
9256 instantiating a widget.
9257
9258 @section Glyph Instantiation
9259 @cindex glyph instantiation
9260 @cindex instantiation, glyph
9261
9262 Glyph instantiation is a hairy topic and requires some explanation. The
9263 guts of glyph instantiation is contained within
9264 @code{image_instantiate}. A glyph contains an image which is a
9265 specifier. When a glyph function - for instance @code{Fglyph_height} -
9266 asks for a property of the glyph that can only be determined from its
9267 instantiated state, then the glyph image is instantiated and an image
9268 instance created. The instantiation process is governed by the specifier
9269 code and goes through a series of steps:
9270
9271 @itemize @bullet
9272 @item
9273 Validation. Instantiation of image instances happens dynamically - often
9274 within the guts of redisplay. Thus it is often not feasible to catch
9275 instantiator errors at instantiation time. Instead the instantiator is
9276 validated at the time it is added to the image specifier. This function
9277 is defined by @code{image_validate} and at a simple level validates
9278 keyword value pairs.
9279 @item
9280 Duplication. The specifier code by default takes a copy of the
9281 instantiator. This is reasonable for most specifiers but in the case of
9282 widget-glyphs can be problematic, since some of the properties in the
9283 instantiator - for instance callbacks - could cause infinite recursion
9284 in the copying process. Thus the image code defines a function -
9285 @code{image_copy_instantiator} - which will selectively copy values.
9286 This is controlled by the way that a keyword is defined either using
9287 @code{IIFORMAT_VALID_KEYWORD} or
9288 @code{IIFORMAT_VALID_NONCOPY_KEYWORD}. Note that the image caching and
9289 redisplay code relies on instantiator copying to ensure that current and
9290 new instantiators are actually different rather than referring to the
9291 same thing.
9292 @item
9293 Normalization. Once the instantiator has been copied it must be
9294 converted into a form that is viable at instantiation time. This can
9295 involve no changes at all, but typically involves things like converting
9296 file names to the actual data. This function is defined by
9297 @code{image_going_to_add} and @code{normalize_image_instantiator}.
9298 @item
9299 Instantiation. When an image instance is actually required for display
9300 it is instantiated using @code{image_instantiate}. This involves calling
9301 instantiate methods that are specific to the type of image being
9302 instantiated.
9303 @end itemize
9304
9305 The final instantiation phase also involves a number of steps. In order
9306 to understand these we need to describe a number of concepts.
9307
9308 An image is instantiated in a @dfn{domain}, where a domain can be any
9309 one of a device, frame, window or image-instance. The domain gives the
9310 image-instance context and identity and properties that affect the
9311 appearance of the image-instance may be different for the same glyph
9312 instantiated in different domains. An example is the face used to
9313 display the image-instance.
9314
9315 Although an image is instantiated in a particular domain the
9316 instantiation domain is not necessarily the domain in which the
9317 image-instance is cached. For example a pixmap can be instantiated in a
9318 window be actually be cached on a per-device basis. The domain in which
9319 the image-instance is actually cached is called the
9320 @dfn{governing-domain}. A governing-domain is currently either a device
9321 or a window. Widget-glyphs and text-glyphs have a window as a
9322 governing-domain, all other image-instances have a device as the
9323 governing-domain. The governing domain for an image-instance is
9324 determined using the governing_domain image-instance method.
9325
9326 @section Widget-Glyphs
9327 @cindex widget-glyphs
9328
9329 @section Widget-Glyphs in the MS-Windows Environment
9330 @cindex widget-glyphs in the MS-Windows environment
9331 @cindex MS-Windows environment, widget-glyphs in the
9332
9333 To Do
9334
9335 @section Widget-Glyphs in the X Environment
9336 @cindex widget-glyphs in the X environment
9337 @cindex X environment, widget-glyphs in the
9338
9339 Widget-glyphs under X make heavy use of lwlib (@pxref{Lucid Widget
9340 Library}) for manipulating the native toolkit objects. This is primarily
9341 so that different toolkits can be supported for widget-glyphs, just as
9342 they are supported for features such as menubars etc.
9343
9344 Lwlib is extremely poorly documented and quite hairy so here is my
9345 understanding of what goes on.
9346
9347 Lwlib maintains a set of widget_instances which mirror the hierarchical
9348 state of Xt widgets. I think this is so that widgets can be updated and
9349 manipulated generically by the lwlib library. For instance
9350 update_one_widget_instance can cope with multiple types of widget and
9351 multiple types of toolkit. Each element in the widget hierarchy is updated
9352 from its corresponding widget_instance by walking the widget_instance
9353 tree recursively.
9354
9355 This has desirable properties such as lw_modify_all_widgets which is
9356 called from @file{glyphs-x.c} and updates all the properties of a widget
9357 without having to know what the widget is or what toolkit it is from.
9358 Unfortunately this also has hairy properties such as making the lwlib
9359 code quite complex. And of course lwlib has to know at some level what
9360 the widget is and how to set its properties.
9361
9362 @node Specifiers, Menus, Glyphs, Top
9363 @chapter Specifiers
9364 @cindex specifiers
9365
9366 Not yet documented.
9367
9368 @node Menus, Subprocesses, Specifiers, Top
9369 @chapter Menus
9370 @cindex menus
9371
9372   A menu is set by setting the value of the variable
9373 @code{current-menubar} (which may be buffer-local) and then calling
9374 @code{set-menubar-dirty-flag} to signal a change.  This will cause the
9375 menu to be redrawn at the next redisplay.  The format of the data in
9376 @code{current-menubar} is described in @file{menubar.c}.
9377
9378   Internally the data in current-menubar is parsed into a tree of
9379 @code{widget_value's} (defined in @file{lwlib.h}); this is accomplished
9380 by the recursive function @code{menu_item_descriptor_to_widget_value()},
9381 called by @code{compute_menubar_data()}.  Such a tree is deallocated
9382 using @code{free_widget_value()}.
9383
9384   @code{update_screen_menubars()} is one of the external entry points.
9385 This checks to see, for each screen, if that screen's menubar needs to
9386 be updated.  This is the case if
9387
9388 @enumerate
9389 @item
9390 @code{set-menubar-dirty-flag} was called since the last redisplay.  (This
9391 function sets the C variable menubar_has_changed.)
9392 @item
9393 The buffer displayed in the screen has changed.
9394 @item
9395 The screen has no menubar currently displayed.
9396 @end enumerate
9397
9398   @code{set_screen_menubar()} is called for each such screen.  This
9399 function calls @code{compute_menubar_data()} to create the tree of
9400 widget_value's, then calls @code{lw_create_widget()},
9401 @code{lw_modify_all_widgets()}, and/or @code{lw_destroy_all_widgets()}
9402 to create the X-Toolkit widget associated with the menu.
9403
9404   @code{update_psheets()}, the other external entry point, actually
9405 changes the menus being displayed.  It uses the widgets fixed by
9406 @code{update_screen_menubars()} and calls various X functions to ensure
9407 that the menus are displayed properly.
9408
9409   The menubar widget is set up so that @code{pre_activate_callback()} is
9410 called when the menu is first selected (i.e. mouse button goes down),
9411 and @code{menubar_selection_callback()} is called when an item is
9412 selected.  @code{pre_activate_callback()} calls the function in
9413 activate-menubar-hook, which can change the menubar (this is described
9414 in @file{menubar.c}).  If the menubar is changed,
9415 @code{set_screen_menubars()} is called.
9416 @code{menubar_selection_callback()} enqueues a menu event, putting in it
9417 a function to call (either @code{eval} or @code{call-interactively}) and
9418 its argument, which is the callback function or form given in the menu's
9419 description.
9420
9421 @node Subprocesses, Interface to the X Window System, Menus, Top
9422 @chapter Subprocesses
9423 @cindex subprocesses
9424
9425   The fields of a process are:
9426
9427 @table @code
9428 @item name
9429 A string, the name of the process.
9430
9431 @item command
9432 A list containing the command arguments that were used to start this
9433 process.
9434
9435 @item filter
9436 A function used to accept output from the process instead of a buffer,
9437 or @code{nil}.
9438
9439 @item sentinel
9440 A function called whenever the process receives a signal, or @code{nil}.
9441
9442 @item buffer
9443 The associated buffer of the process.
9444
9445 @item pid
9446 An integer, the Unix process @sc{id}.
9447
9448 @item childp
9449 A flag, non-@code{nil} if this is really a child process.
9450 It is @code{nil} for a network connection.
9451
9452 @item mark
9453 A marker indicating the position of the end of the last output from this
9454 process inserted into the buffer.  This is often but not always the end
9455 of the buffer.
9456
9457 @item kill_without_query
9458 If this is non-@code{nil}, killing XEmacs while this process is still
9459 running does not ask for confirmation about killing the process.
9460
9461 @item raw_status_low
9462 @itemx raw_status_high
9463 These two fields record 16 bits each of the process status returned by
9464 the @code{wait} system call.
9465
9466 @item status
9467 The process status, as @code{process-status} should return it.
9468
9469 @item tick
9470 @itemx update_tick
9471 If these two fields are not equal, a change in the status of the process
9472 needs to be reported, either by running the sentinel or by inserting a
9473 message in the process buffer.
9474
9475 @item pty_flag
9476 Non-@code{nil} if communication with the subprocess uses a @sc{pty};
9477 @code{nil} if it uses a pipe.
9478
9479 @item infd
9480 The file descriptor for input from the process.
9481
9482 @item outfd
9483 The file descriptor for output to the process.
9484
9485 @item subtty
9486 The file descriptor for the terminal that the subprocess is using.  (On
9487 some systems, there is no need to record this, so the value is
9488 @code{-1}.)
9489
9490 @item tty_name
9491 The name of the terminal that the subprocess is using,
9492 or @code{nil} if it is using pipes.
9493 @end table
9494
9495 @node Interface to the X Window System, Index, Subprocesses, Top
9496 @chapter Interface to the X Window System
9497 @cindex X Window System, interface to the
9498
9499 Mostly undocumented.
9500
9501 @menu
9502 * Lucid Widget Library::        An interface to various widget sets.
9503 @end menu
9504
9505 @node Lucid Widget Library
9506 @section Lucid Widget Library
9507 @cindex Lucid Widget Library
9508 @cindex widget library, Lucid
9509 @cindex library, Lucid Widget
9510
9511 Lwlib is extremely poorly documented and quite hairy.  The author(s)
9512 blame that on X, Xt, and Motif, with some justice, but also sufficient
9513 hypocrisy to avoid drawing the obvious conclusion about their own work.
9514
9515 The Lucid Widget Library is composed of two more or less independent
9516 pieces.  The first, as the name suggests, is a set of widgets.  These
9517 widgets are intended to resemble and improve on widgets provided in the
9518 Motif toolkit but not in the Athena widgets, including menubars and
9519 scrollbars.  Recent additions by Andy Piper integrate some ``modern''
9520 widgets by Edward Falk, including checkboxes, radio buttons, progress
9521 gauges, and index tab controls (aka notebooks).
9522
9523 The second piece of the Lucid widget library is a generic interface to
9524 several toolkits for X (including Xt, the Athena widget set, and Motif,
9525 as well as the Lucid widgets themselves) so that core XEmacs code need
9526 not know which widget set has been used to build the graphical user
9527 interface.
9528
9529 @menu
9530 * Generic Widget Interface::    The lwlib generic widget interface.
9531 * Scrollbars::
9532 * Menubars::
9533 * Checkboxes and Radio Buttons::
9534 * Progress Bars::
9535 * Tab Controls::
9536 @end menu
9537
9538 @node Generic Widget Interface
9539 @subsection Generic Widget Interface
9540 @cindex widget interface, generic
9541
9542 In general in any toolkit a widget may be a composite object.  In Xt,
9543 all widgets have an X window that they manage, but typically a complex
9544 widget will have widget children, each of which manages a subwindow of
9545 the parent widget's X window.  These children may themselves be
9546 composite widgets.  Thus a widget is actually a tree or hierarchy of
9547 widgets.
9548
9549 For each toolkit widget, lwlib maintains a tree of @code{widget_values}
9550 which mirror the hierarchical state of Xt widgets (including Motif,
9551 Athena, 3D Athena, and Falk's widget sets).  Each @code{widget_value}
9552 has @code{contents} member, which points to the head of a linked list of
9553 its children.  The linked list of siblings is chained through the
9554 @code{next} member of @code{widget_value}.
9555
9556 @example
9557            +-----------+
9558            | composite |
9559            +-----------+
9560                  |
9561                  | contents
9562                  V
9563              +-------+ next +-------+ next +-------+
9564              | child |----->| child |----->| child |
9565              +-------+      +-------+      +-------+
9566                                 |
9567                                 | contents
9568                                 V
9569                          +-------------+ next +-------------+
9570                          | grand child |----->| grand child |
9571                          +-------------+      +-------------+
9572
9573 The @code{widget_value} hierarchy of a composite widget with two simple
9574 children and one composite child.
9575 @end example
9576
9577 The @code{widget_instance} structure maintains the inverse view of the
9578 tree.  As for the @code{widget_value}, siblings are chained through the
9579 @code{next} member.  However, rather than naming children, the
9580 @code{widget_instance} tree links to parents.
9581
9582 @example
9583            +-----------+
9584            | composite |
9585            +-----------+
9586                  A
9587                  | parent
9588                  |
9589              +-------+ next +-------+ next +-------+
9590              | child |----->| child |----->| child |
9591              +-------+      +-------+      +-------+
9592                                 A
9593                                 | parent
9594                                 |
9595                          +-------------+ next +-------------+
9596                          | grand child |----->| grand child |
9597                          +-------------+      +-------------+
9598
9599 The @code{widget_value} hierarchy of a composite widget with two simple
9600 children and one composite child.
9601 @end example
9602
9603 This permits widgets derived from different toolkits to be updated and
9604 manipulated generically by the lwlib library. For instance
9605 @code{update_one_widget_instance} can cope with multiple types of widget
9606 and multiple types of toolkit. Each element in the widget hierarchy is
9607 updated from its corresponding @code{widget_value} by walking the
9608 @code{widget_value} tree.  This has desirable properties.  For example,
9609 @code{lw_modify_all_widgets} is called from @file{glyphs-x.c} and
9610 updates all the properties of a widget without having to know what the
9611 widget is or what toolkit it is from.  Unfortunately this also has its
9612 hairy properties; the lwlib code quite complex. And of course lwlib has
9613 to know at some level what the widget is and how to set its properties.
9614
9615 The @code{widget_instance} structure also contains a pointer to the root
9616 of its tree.  Widget instances are further confi
9617
9618
9619 @node Scrollbars
9620 @subsection Scrollbars
9621 @cindex scrollbars
9622
9623 @node Menubars
9624 @subsection Menubars
9625 @cindex menubars
9626
9627 @node Checkboxes and Radio Buttons
9628 @subsection Checkboxes and Radio Buttons
9629 @cindex checkboxes and radio buttons
9630 @cindex radio buttons, checkboxes and
9631 @cindex buttons, checkboxes and radio
9632
9633 @node Progress Bars
9634 @subsection Progress Bars
9635 @cindex progress bars
9636 @cindex bars, progress
9637
9638 @node Tab Controls
9639 @subsection Tab Controls
9640 @cindex tab controls
9641
9642 @include index.texi
9643
9644 @c Print the tables of contents
9645 @summarycontents
9646 @contents
9647 @c That's all
9648
9649 @bye