man/internals/internals.texi

   1 \input texinfo  @c -*-texinfo-*-
   2 @c %**start of header
   3 @setfilename ../../info/internals.info
   4 @settitle XEmacs Internals Manual
   5 @c %**end of header
   6
   7 @ifinfo
   8 @dircategory XEmacs Editor
   9 @direntry
  10 * Internals: (internals).       XEmacs Internals Manual.
  11 @end direntry
  12
  13 Copyright @copyright{} 1992 - 1996 Ben Wing.
  14 Copyright @copyright{} 1996, 1997 Sun Microsystems.
  15 Copyright @copyright{} 1994 - 1998 Free Software Foundation.
  16 Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois.
  17
  18
  19 Permission is granted to make and distribute verbatim copies of this
  20 manual provided the copyright notice and this permission notice are
  21 preserved on all copies.
  22
  23 @ignore
  24 Permission is granted to process this file through TeX and print the
  25 results, provided the printed document carries copying permission notice
  26 identical to this one except for the removal of this paragraph (this
  27 paragraph not being relevant to the printed manual).
  28
  29 @end ignore
  30 Permission is granted to copy and distribute modified versions of this
  31 manual under the conditions for verbatim copying, provided that the
  32 entire resulting derived work is distributed under the terms of a
  33 permission notice identical to this one.
  34
  35 Permission is granted to copy and distribute translations of this manual
  36 into another language, under the above conditions for modified versions,
  37 except that this permission notice may be stated in a translation
  38 approved by the Foundation.
  39
  40 Permission is granted to copy and distribute modified versions of this
  41 manual under the conditions for verbatim copying, provided also that the
  42 section entitled ``GNU General Public License'' is included exactly as
  43 in the original, and provided that the entire resulting derived work is
  44 distributed under the terms of a permission notice identical to this
  45 one.
  46
  47 Permission is granted to copy and distribute translations of this manual
  48 into another language, under the above conditions for modified versions,
  49 except that the section entitled ``GNU General Public License'' may be
  50 included in a translation approved by the Free Software Foundation
  51 instead of in the original English.
  52 @end ifinfo
  53
  54 @c Combine indices.
  55 @synindex cp fn
  56 @syncodeindex vr fn
  57 @syncodeindex ky fn
  58 @syncodeindex pg fn
  59 @syncodeindex tp fn
  60
  61 @setchapternewpage odd
  62 @finalout
  63
  64 @titlepage
  65 @title XEmacs Internals Manual
  66 @subtitle Version 1.3, August 1999
  67
  68 @author Ben Wing
  69 @author Martin Buchholz
  70 @author Hrvoje Niksic
  71 @author Matthias Neubauer
  72 @author Olivier Galibert
  73 @page
  74 @vskip 0pt plus 1fill
  75
  76 @noindent
  77 Copyright @copyright{} 1992 - 1996 Ben Wing. @*
  78 Copyright @copyright{} 1996, 1997 Sun Microsystems, Inc. @*
  79 Copyright @copyright{} 1994 - 1998 Free Software Foundation. @*
  80 Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois.
  81
  82 @sp 2
  83 Version 1.3 @*
  84 August 1999.@*
  85
  86 Permission is granted to make and distribute verbatim copies of this
  87 manual provided the copyright notice and this permission notice are
  88 preserved on all copies.
  89
  90 Permission is granted to copy and distribute modified versions of this
  91 manual under the conditions for verbatim copying, provided also that the
  92 section entitled ``GNU General Public License'' is included
  93 exactly as in the original, and provided that the entire resulting
  94 derived work is distributed under the terms of a permission notice
  95 identical to this one.
  96
  97 Permission is granted to copy and distribute translations of this manual
  98 into another language, under the above conditions for modified versions,
  99 except that the section entitled ``GNU General Public License'' may be
 100 included in a translation approved by the Free Software Foundation
 101 instead of in the original English.
 102 @end titlepage
 103 @page
 104
 105 @node Top, A History of Emacs, (dir), (dir)
 106
 107 @ifinfo
  108 This Info file contains v1.3 of the XEmacs Internals Manual.
 109 @end ifinfo
 110
 111 @menu
 112 * A History of Emacs::          Times, dates, important events.
 113 * XEmacs From the Outside::     A broad conceptual overview.
 114 * The Lisp Language::           An overview.
 115 * XEmacs From the Perspective of Building::
 116 * XEmacs From the Inside::
 117 * The XEmacs Object System (Abstractly Speaking)::
 118 * How Lisp Objects Are Represented in C::
 119 * Rules When Writing New C Code::
 120 * A Summary of the Various XEmacs Modules::
 121 * Allocation of Objects in XEmacs Lisp::
 122 * Dumping::
 123 * Events and the Event Loop::
 124 * Evaluation; Stack Frames; Bindings::
 125 * Symbols and Variables::
 126 * Buffers and Textual Representation::
 127 * MULE Character Sets and Encodings::
 128 * The Lisp Reader and Compiler::
 129 * Lstreams::
 130 * Consoles; Devices; Frames; Windows::
 131 * The Redisplay Mechanism::
 132 * Extents::
 133 * Faces::
 134 * Glyphs::
 135 * Specifiers::
 136 * Menus::
 137 * Subprocesses::
 138 * Interface to X Windows::
 139 * Index::
 140
 141 @detailmenu
 142
 143 --- The Detailed Node Listing ---
 144
 145 A History of Emacs
 146
 147 * Through Version 18::          Unification prevails.
 148 * Lucid Emacs::                 One version 19 Emacs.
 149 * GNU Emacs 19::                The other version 19 Emacs.
 150 * GNU Emacs 20::                The other version 20 Emacs.
 151 * XEmacs::                      The continuation of Lucid Emacs.
 152
 153 Rules When Writing New C Code
 154
 155 * General Coding Rules::
 156 * Writing Lisp Primitives::
 157 * Adding Global Lisp Variables::
 158 * Coding for Mule::
 159 * Techniques for XEmacs Developers::
 160
 161 Coding for Mule
 162
 163 * Character-Related Data Types::
 164 * Working With Character and Byte Positions::
 165 * Conversion to and from External Data::
 166 * General Guidelines for Writing Mule-Aware Code::
 167 * An Example of Mule-Aware Code::
 168
 169 A Summary of the Various XEmacs Modules
 170
 171 * Low-Level Modules::
 172 * Basic Lisp Modules::
 173 * Modules for Standard Editing Operations::
 174 * Editor-Level Control Flow Modules::
 175 * Modules for the Basic Displayable Lisp Objects::
 176 * Modules for other Display-Related Lisp Objects::
 177 * Modules for the Redisplay Mechanism::
 178 * Modules for Interfacing with the File System::
 179 * Modules for Other Aspects of the Lisp Interpreter and Object System::
 180 * Modules for Interfacing with the Operating System::
 181 * Modules for Interfacing with X Windows::
 182 * Modules for Internationalization::
 183
 184 Allocation of Objects in XEmacs Lisp
 185
 186 * Introduction to Allocation::
 187 * Garbage Collection::
 188 * GCPROing::
 189 * Garbage Collection - Step by Step::
 190 * Integers and Characters::
 191 * Allocation from Frob Blocks::
 192 * lrecords::
 193 * Low-level allocation::
 194 * Cons::
 195 * Vector::
 196 * Bit Vector::
 197 * Symbol::
 198 * Marker::
 199 * String::
 200 * Compiled Function::
 201
 202 Garbage Collection - Step by Step
 203
 204 * Invocation::
 205 * garbage_collect_1::
 206 * mark_object::
 207 * gc_sweep::
 208 * sweep_lcrecords_1::
 209 * compact_string_chars::
 210 * sweep_strings::
 211 * sweep_bit_vectors_1::
 212
 213 Dumping
 214
 215 * Overview::
 216 * Data descriptions::
 217 * Dumping phase::
 218 * Reloading phase::
 219
 220 Dumping phase
 221
 222 * Object inventory::
 223 * Address allocation::
 224 * The header::
 225 * Data dumping::
 226 * Pointers dumping::
 227
 228 Events and the Event Loop
 229
 230 * Introduction to Events::
 231 * Main Loop::
 232 * Specifics of the Event Gathering Mechanism::
 233 * Specifics About the Emacs Event::
 234 * The Event Stream Callback Routines::
 235 * Other Event Loop Functions::
 236 * Converting Events::
 237 * Dispatching Events; The Command Builder::
 238
 239 Evaluation; Stack Frames; Bindings
 240
 241 * Evaluation::
 242 * Dynamic Binding; The specbinding Stack; Unwind-Protects::
 243 * Simple Special Forms::
 244 * Catch and Throw::
 245
 246 Symbols and Variables
 247
 248 * Introduction to Symbols::
 249 * Obarrays::
 250 * Symbol Values::
 251
 252 Buffers and Textual Representation
 253
 254 * Introduction to Buffers::     A buffer holds a block of text such as a file.
 255 * The Text in a Buffer::        Representation of the text in a buffer.
 256 * Buffer Lists::                Keeping track of all buffers.
 257 * Markers and Extents::         Tagging locations within a buffer.
 258 * Bufbytes and Emchars::        Representation of individual characters.
 259 * The Buffer Object::           The Lisp object corresponding to a buffer.
 260
 261 MULE Character Sets and Encodings
 262
 263 * Character Sets::
 264 * Encodings::
 265 * Internal Mule Encodings::
 266 * CCL::
 267
 268 Encodings
 269
 270 * Japanese EUC (Extended Unix Code)::
 271 * JIS7::
 272
 273 Internal Mule Encodings
 274
 275 * Internal String Encoding::
 276 * Internal Character Encoding::
 277
 278 Lstreams
 279
 280 * Creating an Lstream::         Creating an lstream object.
 281 * Lstream Types::               Different sorts of things that are streamed.
 282 * Lstream Functions::           Functions for working with lstreams.
 283 * Lstream Methods::             Creating new lstream types.
 284
 285 Consoles; Devices; Frames; Windows
 286
 287 * Introduction to Consoles; Devices; Frames; Windows::
 288 * Point::
 289 * Window Hierarchy::
 290 * The Window Object::
 291
 292 The Redisplay Mechanism
 293
 294 * Critical Redisplay Sections::
 295 * Line Start Cache::
 296 * Redisplay Piece by Piece::
 297
 298 Extents
 299
 300 * Introduction to Extents::     Extents are ranges over text, with properties.
 301 * Extent Ordering::             How extents are ordered internally.
 302 * Format of the Extent Info::   The extent information in a buffer or string.
 303 * Zero-Length Extents::         A weird special case.
 304 * Mathematics of Extent Ordering::  A rigorous foundation.
 305 * Extent Fragments::            Cached information useful for redisplay.
 306
 307 @end detailmenu
 308 @end menu
 309
 310 @node A History of Emacs, XEmacs From the Outside, Top, Top
 311 @chapter A History of Emacs
 312 @cindex history of Emacs
 313 @cindex Hackers (Steven Levy)
 314 @cindex Levy, Steven
 315 @cindex ITS (Incompatible Timesharing System)
 316 @cindex Stallman, Richard
 317 @cindex RMS
 318 @cindex MIT
 319 @cindex TECO
 320 @cindex FSF
 321 @cindex Free Software Foundation
 322
 323   XEmacs is a powerful, customizable text editor and development
 324 environment.  It began as Lucid Emacs, which was in turn derived from
 325 GNU Emacs, a program written by Richard Stallman of the Free Software
 326 Foundation.  GNU Emacs dates back to the 1970's, and was modelled
 327 after a package called ``Emacs'', written in 1976, that was a set of
 328 macros on top of TECO, an old, old text editor written at MIT on the
 329 DEC PDP 10 under one of the earliest time-sharing operating systems,
 330 ITS (Incompatible Timesharing System). (ITS dates back well before
 331 Unix.) ITS, TECO, and Emacs were products of a group of people at MIT
 332 who called themselves ``hackers'', who shared an idealistic belief
 333 system about the free exchange of information and were fanatical in
 334 their devotion to and time spent with computers. (The hacker
 335 subculture dates back to the late 1950's at MIT and is described in
 336 detail in Steven Levy's book @cite{Hackers}.  This book also includes
 337 a lot of information about Stallman himself and the development of
 338 Lisp, a programming language developed at MIT that underlies Emacs.)
 339
 340 @menu
 341 * Through Version 18::          Unification prevails.
 342 * Lucid Emacs::                 One version 19 Emacs.
 343 * GNU Emacs 19::                The other version 19 Emacs.
 344 * GNU Emacs 20::                The other version 20 Emacs.
 345 * XEmacs::                      The continuation of Lucid Emacs.
 346 @end menu
 347
 348 @node Through Version 18, Lucid Emacs, A History of Emacs, A History of Emacs
 349 @section Through Version 18
 350 @cindex Gosling, James
 351 @cindex Great Usenet Renaming
 352
  353   Although the history of the early versions of GNU Emacs is unclear,
  354 its history is well known from the middle of 1985 onward.  A time line is:
 355
 356 @itemize @bullet
 357 @item
 358 GNU Emacs version 15 (15.34) was released sometime in 1984 or 1985 and
 359 shared some code with a version of Emacs written by James Gosling (the
 360 same James Gosling who later created the Java language).
 361 @item
 362 GNU Emacs version 16 (first released version was 16.56) was released on
 363 July 15, 1985.  All Gosling code was removed due to potential copyright
 364 problems with the code.
 365 @item
 366 version 16.57: released on September 16, 1985.
 367 @item
 368 versions 16.58, 16.59: released on September 17, 1985.
 369 @item
 370 version 16.60: released on September 19, 1985.  These later version 16's
 371 incorporated patches from the net, esp. for getting Emacs to work under
 372 System V.
 373 @item
 374 version 17.36 (first official v17 release) released on December 20,
 375 1985.  Included a TeX-able user manual.  First official unpatched
 376 version that worked on vanilla System V machines.
 377 @item
 378 version 17.43 (second official v17 release) released on January 25,
 379 1986.
 380 @item
 381 version 17.45 released on January 30, 1986.
 382 @item
 383 version 17.46 released on February 4, 1986.
 384 @item
 385 version 17.48 released on February 10, 1986.
 386 @item
 387 version 17.49 released on February 12, 1986.
 388 @item
 389 version 17.55 released on March 18, 1986.
 390 @item
 391 version 17.57 released on March 27, 1986.
 392 @item
 393 version 17.58 released on April 4, 1986.
 394 @item
 395 version 17.61 released on April 12, 1986.
 396 @item
 397 version 17.63 released on May 7, 1986.
 398 @item
 399 version 17.64 released on May 12, 1986.
 400 @item
 401 version 18.24 (a beta version) released on October 2, 1986.
 402 @item
 403 version 18.30 (a beta version) released on November 15, 1986.
 404 @item
 405 version 18.31 (a beta version) released on November 23, 1986.
 406 @item
 407 version 18.32 (a beta version) released on December 7, 1986.
 408 @item
 409 version 18.33 (a beta version) released on December 12, 1986.
 410 @item
 411 version 18.35 (a beta version) released on January 5, 1987.
 412 @item
 413 version 18.36 (a beta version) released on January 21, 1987.
 414 @item
 415 January 27, 1987: The Great Usenet Renaming.  net.emacs is now
 416 comp.emacs.
 417 @item
 418 version 18.37 (a beta version) released on February 12, 1987.
 419 @item
 420 version 18.38 (a beta version) released on March 3, 1987.
 421 @item
 422 version 18.39 (a beta version) released on March 14, 1987.
 423 @item
 424 version 18.40 (a beta version) released on March 18, 1987.
 425 @item
 426 version 18.41 (the first ``official'' release) released on March 22,
 427 1987.
 428 @item
 429 version 18.45 released on June 2, 1987.
 430 @item
 431 version 18.46 released on June 9, 1987.
 432 @item
 433 version 18.47 released on June 18, 1987.
 434 @item
 435 version 18.48 released on September 3, 1987.
 436 @item
 437 version 18.49 released on September 18, 1987.
 438 @item
 439 version 18.50 released on February 13, 1988.
 440 @item
 441 version 18.51 released on May 7, 1988.
 442 @item
 443 version 18.52 released on September 1, 1988.
 444 @item
 445 version 18.53 released on February 24, 1989.
 446 @item
 447 version 18.54 released on April 26, 1989.
 448 @item
 449 version 18.55 released on August 23, 1989.  This is the earliest version
 450 that is still available by FTP.
 451 @item
 452 version 18.56 released on January 17, 1991.
 453 @item
 454 version 18.57 released late January, 1991.
 455 @item
 456 version 18.58 released ?????.
 457 @item
 458 version 18.59 released October 31, 1992.
 459 @end itemize
 460
 461 @node Lucid Emacs, GNU Emacs 19, Through Version 18, A History of Emacs
 462 @section Lucid Emacs
 463 @cindex Lucid Emacs
 464 @cindex Lucid Inc.
 465 @cindex Energize
 466 @cindex Epoch
 467
 468   Lucid Emacs was developed by the (now-defunct) Lucid Inc., a maker of
 469 C++ and Lisp development environments.  It began when Lucid decided they
 470 wanted to use Emacs as the editor and cornerstone of their C++
 471 development environment (called ``Energize'').  They needed many features
 472 that were not available in the existing version of GNU Emacs (version
 473 18.5something), in particular good and integrated support for GUI
 474 elements such as mouse support, multiple fonts, multiple window-system
 475 windows, etc.  A branch of GNU Emacs called Epoch, written at the
 476 University of Illinois, existed that supplied many of these features;
 477 however, Lucid needed more than what existed in Epoch.  At the time, the
 478 Free Software Foundation was working on version 19 of Emacs (this was
 479 sometime around 1991), which was planned to have similar features, and
 480 so Lucid decided to work with the Free Software Foundation.  Their plan
 481 was to add features that they needed, and coordinate with the FSF so
 482 that the features would get included back into Emacs version 19.
 483
  484   The release of version 19 was delayed, however (it finally appeared
  485 more than a year later than initially planned), and Lucid encountered
  486 unexpected technical resistance in getting their changes merged back
  487 into version 19.  They therefore decided to release their own version
  488 of Emacs, which became Lucid Emacs 19.0.
 489
 490 @cindex Zawinski, Jamie
 491 @cindex Sexton, Harlan
 492 @cindex Benson, Eric
 493 @cindex Devin, Matthieu
 494   The initial authors of Lucid Emacs were Matthieu Devin, Harlan Sexton,
 495 and Eric Benson, and the work was later taken over by Jamie Zawinski,
 496 who became ``Mr. Lucid Emacs'' for many releases.
 497
  498   A time line for Lucid Emacs/XEmacs is:
 499
 500 @itemize @bullet
 501 @item
 502 version 19.0 shipped with Energize 1.0, April 1992.
 503 @item
 504 version 19.1 released June 4, 1992.
 505 @item
 506 version 19.2 released June 19, 1992.
 507 @item
 508 version 19.3 released September 9, 1992.
 509 @item
 510 version 19.4 released January 21, 1993.
 511 @item
 512 version 19.5 was a repackaging of 19.4 with a few bug fixes and
 513 shipped with Energize 2.0.  Never released to the net.
 514 @item
 515 version 19.6 released April 9, 1993.
 516 @item
 517 version 19.7 was a repackaging of 19.6 with a few bug fixes and
 518 shipped with Energize 2.1.  Never released to the net.
 519 @item
 520 version 19.8 released September 6, 1993.
 521 @item
 522 version 19.9 released January 12, 1994.
 523 @item
 524 version 19.10 released May 27, 1994.
 525 @item
 526 version 19.11 (first XEmacs) released September 13, 1994.
 527 @item
 528 version 19.12 released June 23, 1995.
 529 @item
 530 version 19.13 released September 1, 1995.
 531 @item
 532 version 19.14 released June 23, 1996.
 533 @item
 534 version 20.0 released February 9, 1997.
 535 @item
 536 version 19.15 released March 28, 1997.
 537 @item
 538 version 20.1 (not released to the net) April 15, 1997.
 539 @item
 540 version 20.2 released May 16, 1997.
 541 @item
 542 version 19.16 released October 31, 1997.
 543 @item
  544 version 20.3 (the first stable version of XEmacs 20.x) released November 30, 1997.
  545 @item
  546 version 20.4 released February 28, 1998.
 547 @end itemize
 548
 549 @node GNU Emacs 19, GNU Emacs 20, Lucid Emacs, A History of Emacs
 550 @section GNU Emacs 19
 551 @cindex GNU Emacs 19
 552 @cindex FSF Emacs
 553
 554   About a year after the initial release of Lucid Emacs, the FSF
 555 released a beta of their version of Emacs 19 (referred to here as ``GNU
 556 Emacs'').  By this time, the current version of Lucid Emacs was
 557 19.6. (Strangely, the first released beta from the FSF was GNU Emacs
  558 19.7.) A time line for GNU Emacs version 19 is:
 559
 560 @itemize @bullet
 561 @item
 562 version 19.8 (beta) released May 27, 1993.
 563 @item
 564 version 19.9 (beta) released May 27, 1993.
 565 @item
 566 version 19.10 (beta) released May 30, 1993.
 567 @item
 568 version 19.11 (beta) released June 1, 1993.
 569 @item
 570 version 19.12 (beta) released June 2, 1993.
 571 @item
 572 version 19.13 (beta) released June 8, 1993.
 573 @item
 574 version 19.14 (beta) released June 17, 1993.
 575 @item
 576 version 19.15 (beta) released June 19, 1993.
 577 @item
 578 version 19.16 (beta) released July 6, 1993.
 579 @item
 580 version 19.17 (beta) released late July, 1993.
 581 @item
 582 version 19.18 (beta) released August 9, 1993.
 583 @item
 584 version 19.19 (beta) released August 15, 1993.
 585 @item
 586 version 19.20 (beta) released November 17, 1993.
 587 @item
 588 version 19.21 (beta) released November 17, 1993.
 589 @item
 590 version 19.22 (beta) released November 28, 1993.
 591 @item
 592 version 19.23 (beta) released May 17, 1994.
 593 @item
 594 version 19.24 (beta) released May 16, 1994.
 595 @item
 596 version 19.25 (beta) released June 3, 1994.
 597 @item
 598 version 19.26 (beta) released September 11, 1994.
 599 @item
 600 version 19.27 (beta) released September 14, 1994.
 601 @item
 602 version 19.28 (first ``official'' release) released November 1, 1994.
 603 @item
 604 version 19.29 released June 21, 1995.
 605 @item
 606 version 19.30 released November 24, 1995.
 607 @item
 608 version 19.31 released May 25, 1996.
 609 @item
 610 version 19.32 released July 31, 1996.
 611 @item
 612 version 19.33 released August 11, 1996.
 613 @item
 614 version 19.34 released August 21, 1996.
 615 @item
 616 version 19.34b released September 6, 1996.
 617 @end itemize
 618
 619 @cindex Mlynarik, Richard
 620   In some ways, GNU Emacs 19 was better than Lucid Emacs; in some ways,
 621 worse.  Lucid soon began incorporating features from GNU Emacs 19 into
 622 Lucid Emacs; the work was mostly done by Richard Mlynarik, who had been
 623 working on and using GNU Emacs for a long time (back as far as version
 624 16 or 17).
 625
 626 @node GNU Emacs 20, XEmacs, GNU Emacs 19, A History of Emacs
 627 @section GNU Emacs 20
 628 @cindex GNU Emacs 20
 629 @cindex FSF Emacs
 630
  631 On February 2, 1997, work began on GNU Emacs to integrate Mule.  The first
  632 release was made in September of that year.
  633
  634 A time line for GNU Emacs 20 is:
 635
 636 @itemize @bullet
 637 @item
 638 version 20.1 released September 17, 1997.
 639 @item
 640 version 20.2 released September 20, 1997.
 641 @item
 642 version 20.3 released August 19, 1998.
 643 @end itemize
 644
 645 @node XEmacs,  , GNU Emacs 20, A History of Emacs
 646 @section XEmacs
 647 @cindex XEmacs
 648
 649 @cindex Sun Microsystems
 650 @cindex University of Illinois
 651 @cindex Illinois, University of
 652 @cindex SPARCWorks
 653 @cindex Andreessen, Marc
 654 @cindex Baur, Steve
 655 @cindex Buchholz, Martin
 656 @cindex Kaplan, Simon
 657 @cindex Wing, Ben
 658 @cindex Thompson, Chuck
 659 @cindex Win-Emacs
 660 @cindex Epoch
 661 @cindex Amdahl Corporation
 662   Around the time that Lucid was developing Energize, Sun Microsystems
 663 was developing their own development environment (called ``SPARCWorks'')
 664 and also decided to use Emacs.  They joined forces with the Epoch team
 665 at the University of Illinois and later with Lucid.  The maintainer of
 666 the last-released version of Epoch was Marc Andreessen, but he dropped
 667 out and the Epoch project, headed by Simon Kaplan, lured Chuck Thompson
 668 away from a system administration job to become the primary Lucid Emacs
 669 author for Epoch and Sun.  Chuck's area of specialty became the
 670 redisplay engine (he replaced the old Lucid Emacs redisplay engine with
 671 a ported version from Epoch and then later rewrote it from scratch).
 672 Sun also hired Ben Wing (the author of Win-Emacs, a port of Lucid Emacs
 673 to Microsoft Windows 3.1) in 1993, for what was initially a one-month
 674 contract to fix some event problems but later became a many-year
 675 involvement, punctuated by a six-month contract with Amdahl Corporation.
 676
 677 @cindex rename to XEmacs
  678   In 1994, Sun and Lucid agreed to rename Lucid Emacs to XEmacs (a name
  679 that favored neither company); the first release called XEmacs was
 680 version 19.11.  In June 1994, Lucid folded and Jamie quit to work for
 681 the newly formed Mosaic Communications Corp., later Netscape
 682 Communications Corp. (co-founded by the same Marc Andreessen, who had
 683 quit his Epoch job to work on a graphical browser for the World Wide
  684 Web).  Chuck then became the primary maintainer of XEmacs, and put out
 685 versions 19.11 through 19.14 in conjunction with Ben.  For 19.12 and
 686 19.13, Chuck added the new redisplay and many other display improvements
 687 and Ben added MULE support (support for Asian and other languages) and
 688 redesigned most of the internal Lisp subsystems to better support the
 689 MULE work and the various other features being added to XEmacs.  After
 690 19.14 Chuck retired as primary maintainer and Steve Baur stepped in.
 691
 692 @cindex MULE merged XEmacs appears
 693   Soon after 19.13 was released, work began in earnest on the MULE
 694 internationalization code and the source tree was divided into two
 695 development paths.  The MULE version was initially called 19.20, but was
 696 soon renamed to 20.0.  In 1996 Martin Buchholz of Sun Microsystems took
 697 over the care and feeding of it and worked on it in parallel with the
 698 19.14 development that was occurring at the same time.  After much work
 699 by Martin, it was decided to release 20.0 ahead of 19.15 in February
 700 1997.  The source tree remained divided until 20.2 when the version 19
 701 source was finally retired at version 19.16.
 702
 703 @cindex Baur, Steve
 704 @cindex Buchholz, Martin
 705 @cindex Jones, Kyle
 706 @cindex Niksic, Hrvoje
 707 @cindex XEmacs goes it alone
 708   In 1997, Sun finally dropped all pretense of support for XEmacs and
  709 Martin Buchholz left the company in November.  Since then (and for most
  710 of the preceding year, since Steve Baur was never paid to work on
  711 XEmacs), XEmacs has relied solely on the contributions of volunteers
  712 from the Free Software Community.  Since 1997, Hrvoje Niksic and
  713 Kyle Jones have figured prominently in XEmacs development.
 714
 715 @cindex merging attempts
 716   Many attempts have been made to merge XEmacs and GNU Emacs, but they
 717 have consistently failed.
 718
 719   A more detailed history is contained in the XEmacs About page.
 720
 721 @node XEmacs From the Outside, The Lisp Language, A History of Emacs, Top
 722 @chapter XEmacs From the Outside
 723 @cindex read-eval-print
 724
 725   XEmacs appears to the outside world as an editor, but it is really a
 726 Lisp environment.  At its heart is a Lisp interpreter; it also
 727 ``happens'' to contain many specialized object types (e.g. buffers,
 728 windows, frames, events) that are useful for implementing an editor.
 729 Some of these objects (in particular windows and frames) have
 730 displayable representations, and XEmacs provides a function
 731 @code{redisplay()} that ensures that the display of all such objects
 732 matches their internal state.  Most of the time, a standard Lisp
 733 environment is in a @dfn{read-eval-print} loop---i.e. ``read some Lisp
 734 code, execute it, and print the results''.  XEmacs has a similar loop:
 735
 736 @itemize @bullet
 737 @item
 738 read an event
 739 @item
 740 dispatch the event (i.e. ``do it'')
 741 @item
 742 redisplay
 743 @end itemize
 744
 745   Reading an event is done using the Lisp function @code{next-event},
 746 which waits for something to happen (typically, the user presses a key
 747 or moves the mouse) and returns an event object describing this.
 748 Dispatching an event is done using the Lisp function
 749 @code{dispatch-event}, which looks up the event in a keymap object (a
 750 particular kind of object that associates an event with a Lisp function)
 751 and calls that function.  The function ``does'' what the user has
 752 requested by changing the state of particular frame objects, buffer
 753 objects, etc.  Finally, @code{redisplay()} is called, which updates the
 754 display to reflect those changes just made.  Thus is an ``editor'' born.
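
  The loop described above can be sketched in Lisp as follows.  This is
an illustrative outline only, not the actual XEmacs command loop: it
ignores error handling, quitting, and keyboard macros, and it assumes
that redisplay is performed as a side effect of waiting for the next
event rather than by an explicit call.

@example
(while t
  ;; Read: wait for a key press, mouse motion, etc. and
  ;; return an event object describing it.
  (let ((event (next-event)))
    ;; Dispatch: look the event up in a keymap and call
    ;; the associated function.
    (dispatch-event event)))
@end example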
 755
 756 @cindex bridge, playing
 757 @cindex taxes, doing
 758 @cindex pi, calculating
 759   Note that you do not have to use XEmacs as an editor; you could just
 760 as well make it do your taxes, compute pi, play bridge, etc.  You'd just
 761 have to write functions to do those operations in Lisp.
 762
 763 @node The Lisp Language, XEmacs From the Perspective of Building, XEmacs From the Outside, Top
 764 @chapter The Lisp Language
 765 @cindex Lisp vs. C
 766 @cindex C vs. Lisp
 767 @cindex Lisp vs. Java
 768 @cindex Java vs. Lisp
 769 @cindex dynamic scoping
 770 @cindex scoping, dynamic
 771 @cindex dynamic types
 772 @cindex types, dynamic
 773 @cindex Java
 774 @cindex Common Lisp
 775 @cindex Gosling, James
 776
 777   Lisp is a general-purpose language that is higher-level than C and in
 778 many ways more powerful than C.  Powerful dialects of Lisp such as
 779 Common Lisp are probably much better languages for writing very large
 780 applications than is C. (Unfortunately, for many non-technical
 781 reasons C and its successor C++ have become the dominant languages for
 782 application development.  These languages are both inadequate for
 783 extremely large applications, which is evidenced by the fact that newer,
 784 larger programs are becoming ever harder to write and are requiring ever
  785 more programmers despite great improvements in C development environments;
 786 and by the fact that, although hardware speeds and reliability have been
 787 growing at an exponential rate, most software is still generally
 788 considered to be slow and buggy.)
 789
 790   The new Java language holds promise as a better general-purpose
 791 development language than C.  Java has many features in common with
 792 Lisp that are not shared by C (this is not a coincidence, since
 793 Java was designed by James Gosling, a former Lisp hacker).  This
 794 will be discussed more later.
 795
 796 For those used to C, here is a summary of the basic differences between
 797 C and Lisp:
 798
 799 @enumerate
 800 @item
 801 Lisp has an extremely regular syntax.  Every function, expression,
 802 and control statement is written in the form
 803
 804 @example
 805    (@var{func} @var{arg1} @var{arg2} ...)
 806 @end example
 807
 808 This is as opposed to C, which writes functions as
 809
 810 @example
 811    func(@var{arg1}, @var{arg2}, ...)
 812 @end example
 813
 814 but writes expressions involving operators as (e.g.)
 815
 816 @example
 817    @var{arg1} + @var{arg2}
 818 @end example
 819
 820 and writes control statements as (e.g.)
 821
 822 @example
 823    while (@var{expr}) @{ @var{statement1}; @var{statement2}; ... @}
 824 @end example
 825
 826 Lisp equivalents of the latter two would be
 827
 828 @example
 829    (+ @var{arg1} @var{arg2} ...)
 830 @end example
 831
 832 and
 833
 834 @example
 835    (while @var{expr} @var{statement1} @var{statement2} ...)
 836 @end example
 837
 838 @item
 839 Lisp is a safe language.  Assuming there are no bugs in the Lisp
 840 interpreter/compiler, it is impossible to write a program that ``core
 841 dumps'' or otherwise causes the machine to execute an illegal
 842 instruction.  This is very different from C, where perhaps the most
 843 common outcome of a bug is exactly such a crash.  A corollary of this is that
 844 the C operation of casting a pointer is impossible (and unnecessary) in
 845 Lisp, and that it is impossible to access memory outside the bounds of
 846 an array.
 847
 848 @item
 849 Programs and data are written in the same form.  The
 850 parenthesis-enclosing form described above for statements is the same
 851 form used for the most common data type in Lisp, the list.  Thus, it is
 852 possible to represent any Lisp program using Lisp data types, and for
 853 one program to construct Lisp statements and then dynamically
 854 @dfn{evaluate} them, or cause them to execute.
 855
 856 @item
 857 All objects are @dfn{dynamically typed}.  This means that part of every
 858 object is an indication of what type it is.  A Lisp program can
 859 manipulate an object without knowing what type it is, and can query an
 860 object to determine its type.  This means that, correspondingly,
 861 variables and function parameters can hold objects of any type and are
not normally declared as being of any particular type.  This is in contrast
 863 to the @dfn{static typing} of C, where variables can hold exactly one
 864 type of object and must be declared as such, and objects do not contain
 865 an indication of their type because it's implicit in the variables they
 866 are stored in.  It is possible in C to have a variable hold different
 867 types of objects (e.g. through the use of @code{void *} pointers or
 868 variable-argument functions), but the type information must then be
 869 passed explicitly in some other fashion, leading to additional program
 870 complexity.
 871
 872 @item
 873 Allocated memory is automatically reclaimed when it is no longer in use.
 874 This operation is called @dfn{garbage collection} and involves looking
 875 through all variables to see what memory is being pointed to, and
 876 reclaiming any memory that is not pointed to and is thus
 877 ``inaccessible'' and out of use.  This is as opposed to C, in which
 878 allocated memory must be explicitly reclaimed using @code{free()}.  If
 879 you simply drop all pointers to memory without freeing it, it becomes
 880 ``leaked'' memory that still takes up space.  Over a long period of
 881 time, this can cause your program to grow and grow until it runs out of
 882 memory.
 883
 884 @item
 885 Lisp has built-in facilities for handling errors and exceptions.  In C,
 886 when an error occurs, usually either the program exits entirely or the
 887 routine in which the error occurs returns a value indicating this.  If
 888 an error occurs in a deeply-nested routine, then every routine currently
 889 called must unwind itself normally and return an error value back up to
 890 the next routine.  This means that every routine must explicitly check
 891 for an error in all the routines it calls; if it does not do so,
 892 unexpected and often random behavior results.  This is an extremely
 893 common source of bugs in C programs.  An alternative would be to do a
 894 non-local exit using @code{longjmp()}, but that is often very dangerous
 895 because the routines that were exited past had no opportunity to clean
 896 up after themselves and may leave things in an inconsistent state,
 897 causing a crash shortly afterwards.
 898
 899 Lisp provides mechanisms to make such non-local exits safe.  When an
 900 error occurs, a routine simply signals that an error of a particular
 901 class has occurred, and a non-local exit takes place.  Any routine can
 902 trap errors occurring in routines it calls by registering an error
 903 handler for some or all classes of errors. (If no handler is registered,
 904 a default handler, generally installed by the top-level event loop, is
 905 executed; this prints out the error and continues.) Routines can also
 906 specify cleanup code (called an @dfn{unwind-protect}) that will be
 907 called when control exits from a block of code, no matter how that exit
 908 occurs---i.e. even if a function deeply nested below it causes a
 909 non-local exit back to the top level.
 910
 911 Note that this facility has appeared in some recent vintages of C, in
 912 particular Visual C++ and other PC compilers written for the Microsoft
 913 Win32 API.
 914
 915 @item
 916 In Emacs Lisp, local variables are @dfn{dynamically scoped}.  This means
 917 that if you declare a local variable in a particular function, and then
 918 call another function, that subfunction can ``see'' the local variable
 919 you declared.  This is actually considered a bug in Emacs Lisp and in
 920 all other early dialects of Lisp, and was corrected in Common Lisp. (In
 921 Common Lisp, you can still declare dynamically scoped variables if you
 922 want to---they are sometimes useful---but variables by default are
 923 @dfn{lexically scoped} as in C.)
 924 @end enumerate
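
Several of the points above can be seen in a few lines of Emacs Lisp.
The following is only an illustrative sketch: the names
@code{sketch-log}, @code{sketch-risky}, @code{sketch-depth}, and
@code{sketch-show-depth} are made up for this example, and
@code{defvar} is used so that the variable is dynamically scoped even
in Lisp dialects where lexical scoping is the default.

@example
;; Programs are data: build a form as an ordinary list, then evaluate it.
(let ((form (list '+ 1 2 3)))
  (eval form))                          ; => 6

;; Error handling: cleanup code runs during a non-local exit.
(defvar sketch-log nil
  "Hypothetical list recording which cleanup forms have run.")

(defun sketch-risky ()
  "Signal an error from inside a block protected by cleanup code."
  (unwind-protect
      (error "Something went wrong")
    ;; Runs no matter how control leaves the body above.
    (setq sketch-log (cons 'cleaned-up sketch-log))))

(condition-case err
    (sketch-risky)
  ;; Handler for any error of class `error' signalled by the body.
  (error (list 'caught (car err))))     ; => (caught error)

;; Dynamic scoping: a called function sees its caller's binding.
(defvar sketch-depth 0
  "Hypothetical dynamically scoped variable.")

(defun sketch-show-depth ()
  "Return whatever binding of `sketch-depth' is currently in effect."
  sketch-depth)

(let ((sketch-depth 5))
  (sketch-show-depth))                  ; => 5
(sketch-show-depth)                     ; => 0
@end example

After the @code{condition-case} form returns, @code{sketch-log}
contains @code{cleaned-up}, showing that the cleanup form ran even
though @code{sketch-risky} never returned normally.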
 925
 926 For those familiar with Lisp, Emacs Lisp is modelled after MacLisp, an
 927 early dialect of Lisp developed at MIT (no relation to the Macintosh
 928 computer).  There is a Common Lisp compatibility package available for
 929 Emacs that provides many of the features of Common Lisp.
 930
 931 The Java language is derived in many ways from C, and shares a similar
 932 syntax, but has the following features in common with Lisp (and different
 933 from C):
 934
 935 @enumerate
 936 @item
 937 Java is a safe language, like Lisp.
 938 @item
 939 Java provides garbage collection, like Lisp.
 940 @item
 941 Java has built-in facilities for handling errors and exceptions, like
 942 Lisp.
 943 @item
 944 Java has a type system that combines the best advantages of both static
 945 and dynamic typing.  Objects (except very simple types) are explicitly
 946 marked with their type, as in dynamic typing; but there is a hierarchy
 947 of types and functions are declared to accept only certain types, thus
 948 providing the increased compile-time error-checking of static typing.
 949 @end enumerate
 950
 951 The Java language also has some negative attributes:
 952
 953 @enumerate
 954 @item
 955 Java uses the edit/compile/run model of software development.  This
 956 makes it hard to use interactively.  For example, to use Java like
 957 @code{bc} it is necessary to write a special purpose, albeit tiny,
 958 application.  In Emacs Lisp, a calculator comes built-in without any
effort; one can always just type an expression in the @code{*scratch*}
 960 buffer.
 961 @item
 962 Java tries too hard to enforce, not merely enable, portability, making
 963 ordinary access to standard OS facilities painful.  Java has an
 964 @dfn{agenda}.  I think this is why @code{chdir} is not part of standard
 965 Java, which is inexcusable.
 966 @end enumerate
 967
 968 Unfortunately, there is no perfect language.  Static typing allows a
 969 compiler to catch programmer errors and produce more efficient code, but
 970 makes programming more tedious and less fun.  For the foreseeable future,
 971 an Ideal Editing and Programming Environment (and that is what XEmacs
 972 aspires to) will be programmable in multiple languages: high level ones
 973 like Lisp for user customization and prototyping, and lower level ones
 974 for infrastructure and industrial strength applications.  If I had my
way, XEmacs would be friendly towards the Python, Scheme, C++, ML, and
other communities.  But there are serious technical difficulties in
achieving that goal.
 978
 979 The word @dfn{application} in the previous paragraph was used
 980 intentionally.  XEmacs implements an API for programs written in Lisp
 981 that makes it a full-fledged application platform, very much like an OS
 982 inside the real OS.
 983
 984 @node XEmacs From the Perspective of Building, XEmacs From the Inside, The Lisp Language, Top
 985 @chapter XEmacs From the Perspective of Building
 986
 987 The heart of XEmacs is the Lisp environment, which is written in C.
 988 This is contained in the @file{src/} subdirectory.  Underneath
 989 @file{src/} are two subdirectories of header files: @file{s/} (header
 990 files for particular operating systems) and @file{m/} (header files for
 991 particular machine types).  In practice the distinction between the two
 992 types of header files is blurred.  These header files define or undefine
 993 certain preprocessor constants and macros to indicate particular
 994 characteristics of the associated machine or operating system.  As part
of the configure process, one @file{s/} file and one @file{m/} file are
 996 identified for the particular environment in which XEmacs is being
 997 built.
 998
 999 XEmacs also contains a great deal of Lisp code.  This implements the
1000 operations that make XEmacs useful as an editor as well as just a Lisp
1001 environment, and also contains many add-on packages that allow XEmacs to
1002 browse directories, act as a mail and Usenet news reader, compile Lisp
1003 code, etc.  There is actually more Lisp code than C code associated with
1004 XEmacs, but much of the Lisp code is peripheral to the actual operation
1005 of the editor.  The Lisp code all lies in subdirectories underneath the
1006 @file{lisp/} directory.
1007
1008 The @file{lwlib/} directory contains C code that implements a
1009 generalized interface onto different X widget toolkits and also
1010 implements some widgets of its own that behave like Motif widgets but
1011 are faster, free, and in some cases more powerful.  The code in this
1012 directory compiles into a library and is mostly independent from XEmacs.
1013
1014 The @file{etc/} directory contains various data files associated with
1015 XEmacs.  Some of them are actually read by XEmacs at startup; others
1016 merely contain useful information of various sorts.
1017
1018 The @file{lib-src/} directory contains C code for various auxiliary
1019 programs that are used in connection with XEmacs.  Some of them are used
1020 during the build process; others are used to perform certain functions
1021 that cannot conveniently be placed in the XEmacs executable (e.g. the
1022 @file{movemail} program for fetching mail out of @file{/var/spool/mail},
1023 which must be setgid to @file{mail} on many systems; and the
1024 @file{gnuclient} program, which allows an external script to communicate
1025 with a running XEmacs process).
1026
1027 The @file{man/} directory contains the sources for the XEmacs
1028 documentation.  It is mostly in a form called Texinfo, which can be
1029 converted into either a printed document (by passing it through @TeX{})
1030 or into on-line documentation called @dfn{info files}.
1031
1032 The @file{info/} directory contains the results of formatting the XEmacs
1033 documentation as @dfn{info files}, for on-line use.  These files are
1034 used when you enter the Info system using @kbd{C-h i} or through the
1035 Help menu.
1036
1037 The @file{dynodump/} directory contains auxiliary code used to build
1038 XEmacs on Solaris platforms.
1039
1040 The other directories contain various miscellaneous code and information
1041 that is not normally used or needed.
1042
1043 The first step of building involves running the @file{configure} program
1044 and passing it various parameters to specify any optional features you
1045 want and compiler arguments and such, as described in the @file{INSTALL}
1046 file.  This determines what the build environment is, chooses the
1047 appropriate @file{s/} and @file{m/} file, and runs a series of tests to
1048 determine many details about your environment, such as which library
1049 functions are available and exactly how they work.  The reason for
1050 running these tests is that it allows XEmacs to be compiled on a much
1051 wider variety of platforms than those that the XEmacs developers happen
1052 to be familiar with, including various sorts of hybrid platforms.  This
1053 is especially important now that many operating systems give you a great
1054 deal of control over exactly what features you want installed, and allow
1055 for easy upgrading of parts of a system without upgrading the rest.  It
1056 would be impossible to pre-determine and pre-specify the information for
1057 all possible configurations.
1058
1059 In fact, the @file{s/} and @file{m/} files are basically @emph{evil},
1060 since they contain unmaintainable platform-specific hard-coded
1061 information.  XEmacs has been moving in the direction of having all
1062 system-specific information be determined dynamically by
1063 @file{configure}.  Perhaps someday we can @code{rm -rf src/s src/m}.
1064
When @file{configure} is done running, it generates @file{Makefile}s and
1066 @file{GNUmakefile}s and the file @file{src/config.h} (which describes
1067 the features of your system) from template files.  You then run
1068 @file{make}, which compiles the auxiliary code and programs in
1069 @file{lib-src/} and @file{lwlib/} and the main XEmacs executable in
1070 @file{src/}.  The result of compiling and linking is an executable
1071 called @file{temacs}, which is @emph{not} the final XEmacs executable.
1072 @file{temacs} by itself is not intended to function as an editor or even
1073 display any windows on the screen, and if you simply run it, it will
1074 exit immediately.  The @file{Makefile} runs @file{temacs} with certain
1075 options that cause it to initialize itself, read in a number of basic
1076 Lisp files, and then dump itself out into a new executable called
1077 @file{xemacs}.  This new executable has been pre-initialized and
1078 contains pre-digested Lisp code that is necessary for the editor to
1079 function (this includes most basic editing functions,
1080 e.g. @code{kill-line}, that can be defined in terms of other Lisp
1081 primitives; some initialization code that is called when certain
1082 objects, such as frames, are created; and all of the standard
1083 keybindings and code for the actions they result in).  This executable,
1084 @file{xemacs}, is the executable that you run to use the XEmacs editor.
1085
Although @file{temacs} is not intended to be run as an editor, it can be,
by using the incantation @code{temacs -batch -l loadup.el run-temacs}.
This is useful when the dumping procedure described above is broken, or
when using certain program debugging tools such as Purify.  These tools
get mighty confused by the tricks played by the XEmacs build process,
such as allocating memory in one process and freeing it in the next.
1092
1093 @node XEmacs From the Inside, The XEmacs Object System (Abstractly Speaking), XEmacs From the Perspective of Building, Top
1094 @chapter XEmacs From the Inside
1095
1096 Internally, XEmacs is quite complex, and can be very confusing.  To
1097 simplify things, it can be useful to think of XEmacs as containing an
1098 event loop that ``drives'' everything, and a number of other subsystems,
1099 such as a Lisp engine and a redisplay mechanism.  Each of these other
1100 subsystems exists simultaneously in XEmacs, and each has a certain
1101 state.  The flow of control continually passes in and out of these
1102 different subsystems in the course of normal operation of the editor.
1103
1104 It is important to keep in mind that, most of the time, the editor is
1105 ``driven'' by the event loop.  Except during initialization and batch
1106 mode, all subsystems are entered directly or indirectly through the
1107 event loop, and ultimately, control exits out of all subsystems back up
1108 to the event loop.  This cycle of entering a subsystem, exiting back out
1109 to the event loop, and starting another iteration of the event loop
1110 occurs once each keystroke, mouse motion, etc.
1111
1112 If you're trying to understand a particular subsystem (other than the
1113 event loop), think of it as a ``daemon'' process or ``servant'' that is
1114 responsible for one particular aspect of a larger system, and
1115 periodically receives commands or environment changes that cause it to
1116 do something.  Ultimately, these commands and environment changes are
1117 always triggered by the event loop.  For example:
1118
1119 @itemize @bullet
1120 @item
1121 The window and frame mechanism is responsible for keeping track of what
1122 windows and frames exist, what buffers are in them, etc.  It is
1123 periodically given commands (usually from the user) to make a change to
1124 the current window/frame state: i.e. create a new frame, delete a
1125 window, etc.
1126
1127 @item
1128 The buffer mechanism is responsible for keeping track of what buffers
1129 exist and what text is in them.  It is periodically given commands
1130 (usually from the user) to insert or delete text, create a buffer, etc.
1131 When it receives a text-change command, it notifies the redisplay
1132 mechanism.
1133
1134 @item
1135 The redisplay mechanism is responsible for making sure that windows and
1136 frames are displayed correctly.  It is periodically told (by the event
1137 loop) to actually ``do its job'', i.e. snoop around and see what the
1138 current state of the environment (mostly of the currently-existing
1139 windows, frames, and buffers) is, and make sure that that state matches
1140 what's actually displayed.  It keeps lots and lots of information around
1141 (such as what is actually being displayed currently, and what the
1142 environment was last time it checked) so that it can minimize the work
1143 it has to do.  It is also helped along in that whenever a relevant
1144 change to the environment occurs, the redisplay mechanism is told about
1145 this, so it has a pretty good idea of where it has to look to find
1146 possible changes and doesn't have to look everywhere.
1147
1148 @item
1149 The Lisp engine is responsible for executing the Lisp code in which most
1150 user commands are written.  It is entered through a call to @code{eval}
1151 or @code{funcall}, which occurs as a result of dispatching an event from
1152 the event loop.  The functions it calls issue commands to the buffer
1153 mechanism, the window/frame subsystem, etc.
1154
1155 @item
1156 The Lisp allocation subsystem is responsible for keeping track of Lisp
1157 objects.  It is given commands from the Lisp engine to allocate objects,
1158 garbage collect, etc.
1159 @end itemize
1160
1161 etc.
1162
1163   The important idea here is that there are a number of independent
1164 subsystems each with its own responsibility and persistent state, just
1165 like different employees in a company, and each subsystem is
1166 periodically given commands from other subsystems.  Commands can flow
1167 from any one subsystem to any other, but there is usually some sort of
1168 hierarchy, with all commands originating from the event subsystem.
1169
1170   XEmacs is entered in @code{main()}, which is in @file{emacs.c}.  When
1171 this is called the first time (in a properly-invoked @file{temacs}), it
1172 does the following:
1173
1174 @enumerate
1175 @item
1176 It does some very basic environment initializations, such as determining
1177 where it and its directories (e.g. @file{lisp/} and @file{etc/}) reside
1178 and setting up signal handlers.
1179 @item
1180 It initializes the entire Lisp interpreter.
1181 @item
1182 It sets the initial values of many built-in variables (including many
1183 variables that are visible to Lisp programs), such as the global keymap
1184 object and the built-in faces (a face is an object that describes the
1185 display characteristics of text).  This involves creating Lisp objects
1186 and thus is dependent on step (2).
1187 @item
1188 It performs various other initializations that are relevant to the
1189 particular environment it is running in, such as retrieving environment
1190 variables, determining the current date and the user who is running the
1191 program, examining its standard input, creating any necessary file
1192 descriptors, etc.
1193 @item
1194 At this point, the C initialization is complete.  A Lisp program that
1195 was specified on the command line (usually @file{loadup.el}) is called
(@file{temacs} is normally invoked as @code{temacs -batch -l loadup.el dump}).
@file{loadup.el} loads all of the other Lisp files that are needed for
the operation of the editor, calls the @code{dump-emacs} function to
write out @file{xemacs}, and then kills the @file{temacs} process.
1200 @end enumerate
1201
1202   When @file{xemacs} is then run, it only redoes steps (1) and (4)
1203 above; all variables already contain the values they were set to when
1204 the executable was dumped, and all memory that was allocated with
1205 @code{malloc()} is still around. (XEmacs knows whether it is being run
1206 as @file{xemacs} or @file{temacs} because it sets the global variable
1207 @code{initialized} to 1 after step (4) above.) At this point,
1208 @file{xemacs} calls a Lisp function to do any further initialization,
1209 which includes parsing the command-line (the C code can only do limited
1210 command-line parsing, which includes looking for the @samp{-batch} and
1211 @samp{-l} flags and a few other flags that it needs to know about before
1212 initialization is complete), creating the first frame (or @dfn{window}
1213 in standard window-system parlance), running the user's init file
1214 (usually the file @file{.emacs} in the user's home directory), etc.  The
1215 function to do this is usually called @code{normal-top-level};
1216 @file{loadup.el} tells the C code about this function by setting its
1217 name as the value of the Lisp variable @code{top-level}.
1218
1219   When the Lisp initialization code is done, the C code enters the event
1220 loop, and stays there for the duration of the XEmacs process.  The code
1221 for the event loop is contained in @file{keyboard.c}, and is called
1222 @code{Fcommand_loop_1()}.  Note that this event loop could very well be
1223 written in Lisp, and in fact a Lisp version exists; but apparently,
1224 doing this makes XEmacs run noticeably slower.
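
  Conceptually, the loop amounts to little more than the following Lisp
sketch.  It assumes the XEmacs primitives @code{next-event} and
@code{dispatch-event}; the function name @code{sketch-event-loop} is
hypothetical, and the real loop in C handles many details (error
recovery, keyboard macros, triggering redisplay) that are omitted here.

@example
(defun sketch-event-loop ()
  "Read events forever and hand each one to the dispatcher."
  (while t
    ;; Block until a keystroke, mouse motion, process output, etc.
    (let ((event (next-event)))
      ;; Run whatever command or handler is associated with the event.
      (dispatch-event event))))
@end example

  Each iteration corresponds to one pass through the cycle described
earlier: control enters the other subsystems from
@code{dispatch-event} and returns to the loop when they are done.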
1225
1226   Notice how much of the initialization is done in Lisp, not in C.
1227 In general, XEmacs tries to move as much code as is possible
1228 into Lisp.  Code that remains in C is code that implements the
1229 Lisp interpreter itself, or code that needs to be very fast, or
1230 code that needs to do system calls or other such stuff that
1231 needs to be done in C, or code that needs to have access to
1232 ``forbidden'' structures. (One conscious aspect of the design of
1233 Lisp under XEmacs is a clean separation between the external
1234 interface to a Lisp object's functionality and its internal
1235 implementation.  Part of this design is that Lisp programs
1236 are forbidden from accessing the contents of the object other
1237 than through using a standard API.  In this respect, XEmacs Lisp
1238 is similar to modern Lisp dialects but differs from GNU Emacs,
1239 which tends to expose the implementation and allow Lisp
1240 programs to look at it directly.  The major advantage of
1241 hiding the implementation is that it allows the implementation
1242 to be redesigned without affecting any Lisp programs, including
1243 those that might want to be ``clever'' by looking directly at
1244 the object's contents and possibly manipulating them.)
1245
1246   Moving code into Lisp makes the code easier to debug and maintain and
1247 makes it much easier for people who are not XEmacs developers to
1248 customize XEmacs, because they can make a change with much less chance
1249 of obscure and unwanted interactions occurring than if they were to
1250 change the C code.
1251
1252 @node The XEmacs Object System (Abstractly Speaking), How Lisp Objects Are Represented in C, XEmacs From the Inside, Top
1253 @chapter The XEmacs Object System (Abstractly Speaking)
1254
1255   At the heart of the Lisp interpreter is its management of objects.
1256 XEmacs Lisp contains many built-in objects, some of which are
1257 simple and others of which can be very complex; and some of which
1258 are very common, and others of which are rarely used or are only
1259 used internally. (Since the Lisp allocation system, with its
1260 automatic reclamation of unused storage, is so much more convenient
1261 than @code{malloc()} and @code{free()}, the C code makes extensive use of it
1262 in its internal operations.)
1263
1264   The basic Lisp objects are
1265
1266 @table @code
1267 @item integer
1268 28 or 31 bits of precision, or 60 or 63 bits on 64-bit machines; the
1269 reason for this is described below when the internal Lisp object
1270 representation is described.
1271 @item float
1272 Same precision as a double in C.
1273 @item cons
1274 A simple container for two Lisp objects, used to implement lists and
1275 most other data structures in Lisp.
1276 @item char
1277 An object representing a single character of text; chars behave like
1278 integers in many ways but are logically considered text rather than
numbers and have a different read syntax. (The read syntax for a char
1280 contains the char itself or some textual encoding of it---for example,
1281 a Japanese Kanji character might be encoded as @samp{^[$(B#&^[(B} using the
1282 ISO-2022 encoding standard---rather than the numerical representation
1283 of the char; this way, if the mapping between chars and integers
1284 changes, which is quite possible for Kanji characters and other extended
1285 characters, the same character will still be created.  Note that some
1286 primitives confuse chars and integers.  The worst culprit is @code{eq},
1287 which makes a special exception and considers a char to be @code{eq} to
1288 its integer equivalent, even though in no other case are objects of two
1289 different types @code{eq}.  The reason for this monstrosity is
1290 compatibility with existing code; the separation of char from integer
1291 came fairly recently.)
1292 @item symbol
1293 An object that contains Lisp objects and is referred to by name;
1294 symbols are used to implement variables and named functions
1295 and to provide the equivalent of preprocessor constants in C.
1296 @item vector
1297 A one-dimensional array of Lisp objects providing constant-time access
1298 to any of the objects; access to an arbitrary object in a vector is
1299 faster than for lists, but the operations that can be done on a vector
1300 are more limited.
1301 @item string
1302 Self-explanatory; behaves much like a vector of chars
1303 but has a different read syntax and is stored and manipulated
1304 more compactly.
1305 @item bit-vector
1306 A vector of bits; similar to a string in spirit.
1307 @item compiled-function
1308 An object containing compiled Lisp code, known as @dfn{byte code}.
1309 @item subr
1310 A Lisp primitive, i.e. a Lisp-callable function implemented in C.
1311 @end table
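
  As a rough illustration, the following forms show the read syntax of
several of these objects and ask each one for its type with
@code{type-of}.  Chars, bit-vectors, compiled functions, and subrs are
left out because their printed type names differ between Emacs
variants.

@example
(type-of 42)            ; => integer
(type-of 3.14)          ; => float
(type-of '(1 . 2))      ; => cons
(type-of 'foo)          ; => symbol
(type-of [1 2 3])       ; => vector
(type-of "abc")         ; => string

;; A list is a chain of conses ending in nil.
(equal '(1 2 3) (cons 1 (cons 2 (cons 3 nil))))   ; => t

;; Vectors and strings allow indexed access to their elements.
(aref [a b c] 1)        ; => b
(length "abc")          ; => 3
@end example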
1312
1313 @cindex closure
1314 Note that there is no basic ``function'' type, as in more powerful
1315 versions of Lisp (where it's called a @dfn{closure}).  XEmacs Lisp does
1316 not provide the closure semantics implemented by Common Lisp and Scheme.
1317 The guts of a function in XEmacs Lisp are represented in one of four
1318 ways: a symbol specifying another function (when one function is an
1319 alias for another), a list (whose first element must be the symbol
1320 @code{lambda}) containing the function's source code, a
1321 compiled-function object, or a subr object. (In other words, given a
1322 symbol specifying the name of a function, calling @code{symbol-function}
1323 to retrieve the contents of the symbol's function cell will return one
1324 of these types of objects.)
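
Three of the four representations can be inspected directly, as in the
sketch below.  The names @code{sketch-add} and @code{sketch-first} are
invented for illustration, and @code{fset} is used instead of
@code{defun} so that the function cell holds exactly the object shown.
The compiled-function case is omitted because it requires running the
byte compiler.

@example
;; A subr: `car' is implemented in C.
(subrp (symbol-function 'car))            ; => t

;; A list whose first element is `lambda', holding source code.
(fset 'sketch-add '(lambda (a b) (+ a b)))
(car (symbol-function 'sketch-add))       ; => lambda
(sketch-add 1 2)                          ; => 3

;; A symbol naming another function, i.e. an alias.
(fset 'sketch-first 'car)
(symbol-function 'sketch-first)           ; => car
(sketch-first '(x y z))                   ; => x
@end example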
1325
1326 XEmacs Lisp also contains numerous specialized objects used to implement
1327 the editor:
1328
1329 @table @code
1330 @item buffer
1331 Stores text like a string, but is optimized for insertion and deletion
1332 and has certain other properties that can be set.
1333 @item frame
1334 An object with various properties whose displayable representation is a
1335 @dfn{window} in window-system parlance.
1336 @item window
1337 A section of a frame that displays the contents of a buffer;
1338 often called a @dfn{pane} in window-system parlance.
1339 @item window-configuration
1340 An object that represents a saved configuration of windows in a frame.
1341 @item device
1342 An object representing a screen on which frames can be displayed;
1343 equivalent to a @dfn{display} in the X Window System and a @dfn{TTY} in
1344 character mode.
1345 @item face
1346 An object specifying the appearance of text or graphics; it has
1347 properties such as font, foreground color, and background color.
1348 @item marker
1349 An object that refers to a particular position in a buffer and moves
1350 around as text is inserted and deleted to stay in the same relative
1351 position to the text around it.
1352 @item extent
1353 Similar to a marker but covers a range of text in a buffer; can also
1354 specify properties of the text, such as a face in which the text is to
1355 be displayed, whether the text is invisible or unmodifiable, etc.
1356 @item event
1357 Generated by calling @code{next-event} and contains information
1358 describing a particular event happening in the system, such as the user
1359 pressing a key or a process terminating.
1360 @item keymap
1361 An object that maps from events (described using lists, vectors, and
1362 symbols rather than with an event object because the mapping is for
1363 classes of events, rather than individual events) to functions to
1364 execute or other events to recursively look up; the functions are
1365 described by name, using a symbol, or using lists to specify the
1366 function's code.
1367 @item glyph
1368 An object that describes the appearance of an image (e.g.  pixmap) on
1369 the screen; glyphs can be attached to the beginning or end of extents
1370 and in some future version of XEmacs will be able to be inserted
1371 directly into a buffer.
1372 @item process
1373 An object that describes a connection to an externally-running process.
1374 @end table
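
  The behavior of a marker, for instance, can be observed with a short
experiment.  This is a sketch that assumes the @code{with-temp-buffer}
macro is available; the variable name @code{m} is arbitrary.

@example
(with-temp-buffer
  (insert "hello world")
  (let ((m (copy-marker 7)))          ; marker just before "world"
    (goto-char (point-min))
    (insert ">>> ")                   ; four characters inserted earlier
    ;; The marker has moved with the text it was attached to.
    (list (marker-position m)
          (char-to-string (char-after (marker-position m))))))
;; => (11 "w")
@end example

  A plain integer position would still be 7 after the insertion and
would no longer refer to the same character.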
1375
1376   There are some other, less-commonly-encountered general objects:
1377
1378 @table @code
1379 @item hash-table
1380 An object that maps from an arbitrary Lisp object to another arbitrary
1381 Lisp object, using hashing for fast lookup.
1382 @item obarray
1383 A limited form of hash-table that maps from strings to symbols; obarrays
1384 are used to look up a symbol given its name and are not actually their
1385 own object type but are kludgily represented using vectors with hidden
1386 fields (this representation derives from GNU Emacs).
1387 @item specifier
1388 A complex object used to specify the value of a display property; a
1389 default value is given and different values can be specified for
1390 particular frames, buffers, windows, devices, or classes of device.
1391 @item char-table
1392 An object that maps from chars or classes of chars to arbitrary Lisp
1393 objects; internally char tables use a complex nested-vector
1394 representation that is optimized to the way characters are represented
1395 as integers.
1396 @item range-table
1397 An object that maps from ranges of integers to arbitrary Lisp objects.
1398 @end table
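
  A minimal sketch of the first two of these follows.  It assumes the
keyword form of @code{make-hash-table}; the keys and values are
arbitrary.

@example
;; A hash table mapping arbitrary objects to arbitrary objects.
(let ((table (make-hash-table :test 'equal)))
  (puthash "red" 'stop table)
  (puthash "green" 'go table)
  (list (gethash "red" table)
        (gethash "blue" table 'unknown)))   ; default for a missing key
;; => (stop unknown)

;; The default obarray maps a name (a string) to its symbol.
(eq (intern "car") 'car)                    ; => t
@end example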
1399
1400   And some strange special-purpose objects:
1401
1402 @table @code
1403 @item charset
1404 @itemx coding-system
1405 Objects used when MULE, or multi-lingual/Asian-language, support is
1406 enabled.
1407 @item color-instance
1408 @itemx font-instance
1409 @itemx image-instance
Objects that encapsulate a window-system resource; instances are
mostly used internally but are exposed on the Lisp level for cleanness
of the specifier model and because it's occasionally useful for Lisp
programs to create or query the properties of instances.
1414 @item subwindow
An object that encapsulates a @dfn{subwindow} resource, i.e. a
1416 window-system child window that is drawn into by an external process;
1417 this object should be integrated into the glyph system but isn't yet,
1418 and may change form when this is done.
1419 @item tooltalk-message
1420 @itemx tooltalk-pattern
1421 Objects that represent resources used in the ToolTalk interprocess
1422 communication protocol.
1423 @item toolbar-button
1424 An object used in conjunction with the toolbar.
1425 @end table
1426
1427   And objects that are only used internally:
1428
1429 @table @code
1430 @item opaque
1431 A generic object for encapsulating arbitrary memory; this allows you the
1432 generality of @code{malloc()} and the convenience of the Lisp object
1433 system.
1434 @item lstream
1435 A buffering I/O stream, used to provide a unified interface to anything
1436 that can accept output or provide input, such as a file descriptor, a
1437 stdio stream, a chunk of memory, a Lisp buffer, a Lisp string, etc.;
1438 it's a Lisp object to make its memory management more convenient.
1439 @item char-table-entry
1440 Subsidiary objects in the internal char-table representation.
1441 @item extent-auxiliary
1442 @itemx menubar-data
1443 @itemx toolbar-data
1444 Various special-purpose objects that are basically just used to
1445 encapsulate memory for particular subsystems, similar to the more
1446 general ``opaque'' object.
1447 @item symbol-value-forward
1448 @itemx symbol-value-buffer-local
1449 @itemx symbol-value-varalias
1450 @itemx symbol-value-lisp-magic
1451 Special internal-only objects that are placed in the value cell of a
1452 symbol to indicate that there is something special with this variable --
1453 e.g. it has no value, it mirrors another variable, or it mirrors some C
1454 variable; there is really only one kind of object, called a
1455 @dfn{symbol-value-magic}, but it is sort-of halfway kludged into
1456 semi-different object types.
1457 @end table
1458
1459 @cindex permanent objects
1460 @cindex temporary objects
1461   Some types of objects are @dfn{permanent}, meaning that once created,
1462 they do not disappear until explicitly destroyed, using a function such
1463 as @code{delete-buffer}, @code{delete-window}, @code{delete-frame}, etc.
1464 Others will disappear once they are no longer used, through the garbage
1465 collection mechanism.  Buffers, frames, windows, devices, and processes
1466 are among the objects that are permanent.  Note that some objects can go
1467 both ways: Faces can be created either way; extents are normally
1468 permanent, but detached extents (extents not referring to any text, as
1469 happens to some extents when the text they are referring to is deleted)
1470 are temporary.  Note that some permanent objects, such as faces and
1471 coding systems, cannot be deleted.  Note also that windows are unique in
1472 that they can be @emph{undeleted} after having previously been
1473 deleted. (This happens as a result of restoring a window configuration.)
1474
1475 @cindex read syntax
1476   Note that many types of objects have a @dfn{read syntax}, i.e. a way of
1477 specifying an object of that type in Lisp code.  When you load a Lisp
1478 file, or type in code to be evaluated, what really happens is that the
1479 function @code{read} is called, which reads some text and creates an object
1480 based on the syntax of that text; then @code{eval} is called, which
1481 possibly does something special; then this loop repeats until there's
1482 no more text to read. (@code{eval} only actually does something special
1483 with symbols, which causes the symbol's value to be returned,
1484 similar to referencing a variable; and with conses [i.e. lists],
1485 which cause a function invocation.  All other values are returned
1486 unchanged.)
1487
1488   The read syntax
1489
1490 @example
1491 17297
1492 @end example
1493
1494 converts to an integer whose value is 17297.
1495
1496 @example
1497 1.983e-4
1498 @end example
1499
1500 converts to a float whose value is 1.983e-4, or .0001983.
1501
1502 @example
1503 ?b
1504 @end example
1505
1506 converts to a char that represents the lowercase letter b.
1507
1508 @example
1509 ?^[$(B#&^[(B
1510 @end example
1511
1512 (where @samp{^[} actually is an @samp{ESC} character) converts to a
1513 particular Kanji character when using an ISO2022-based coding system for
1514 input. (To decode this goo: @samp{ESC} begins an escape sequence;
1515 @samp{ESC $ (} is a class of escape sequences meaning ``switch to a
1516 94x94 character set''; @samp{ESC $ ( B} means ``switch to Japanese
1517 Kanji''; @samp{#} and @samp{&} collectively index into a 94-by-94 array
1518 of characters [subtract 33 from the ASCII value of each character to get
1519 the corresponding index]; @samp{ESC (} is a class of escape sequences
1520 meaning ``switch to a 94 character set''; @samp{ESC (B} means ``switch
1521 to US ASCII''.  It is a coincidence that the letter @samp{B} is used to
1522 denote both Japanese Kanji and US ASCII.  If the first @samp{B} were
1523 replaced with an @samp{A}, you'd be requesting a Chinese Hanzi character
1524 from the GB2312 character set.)
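
As a small worked example of the index arithmetic just described, the
following C sketch converts the two bytes @samp{#} and @samp{&} into
indexes.  The function names are hypothetical, and the row-major
flattening in @code{sketch_offset_94x94} is an assumption made purely
for illustration; neither is taken from the XEmacs sources.

```c
#include <assert.h>

/* Map one byte of a 94x94 position to a zero-based index. */
static int
sketch_index_94 (unsigned char byte)
{
  assert (byte >= 33 && byte <= 126);   /* the 94 graphic positions */
  return byte - 33;
}

/* Flatten a two-byte position into a single row-major offset
   (an illustrative choice, not something the text above prescribes). */
static int
sketch_offset_94x94 (unsigned char first, unsigned char second)
{
  return sketch_index_94 (first) * 94 + sketch_index_94 (second);
}
```

In ASCII, @samp{#} is 35 and @samp{&} is 38, so the two indexes are 2
and 5.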
1525
1526 @example
1527 "foobar"
1528 @end example
1529
1530 converts to a string.
1531
1532 @example
1533 foobar
1534 @end example
1535
1536 converts to a symbol whose name is @code{"foobar"}.  This is done by
1537 looking up the string equivalent in the global variable
1538 @code{obarray}, whose contents should be an obarray.  If no symbol
1539 is found, a new symbol with the name @code{"foobar"} is automatically
1540 created and added to @code{obarray}; this process is called
1541 @dfn{interning} the symbol.
1542 @cindex interning
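
The look-up-or-create behavior of interning can be sketched in plain C.
This is only a model under stated assumptions: a small chained hash
table stands in for the obarray, and every name in it
(@code{sketch_intern}, @code{sketch_obarray}, and so on) is
hypothetical.  The real obarray is, as noted earlier, a vector with
hidden fields.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct sketch_symbol
{
  char *name;
  struct sketch_symbol *next;   /* next symbol in the same bucket */
};

#define SKETCH_BUCKETS 31
static struct sketch_symbol *sketch_obarray[SKETCH_BUCKETS];

static unsigned
sketch_hash (const char *s)
{
  unsigned h = 0;
  while (*s)
    h = h * 31 + (unsigned char) *s++;
  return h % SKETCH_BUCKETS;
}

/* Return the unique symbol named NAME, creating it on first use. */
static struct sketch_symbol *
sketch_intern (const char *name)
{
  unsigned h = sketch_hash (name);
  struct sketch_symbol *sym;

  for (sym = sketch_obarray[h]; sym != NULL; sym = sym->next)
    if (strcmp (sym->name, name) == 0)
      return sym;               /* already interned */

  sym = (struct sketch_symbol *) malloc (sizeof *sym);
  if (sym == NULL)
    abort ();
  sym->name = (char *) malloc (strlen (name) + 1);
  if (sym->name == NULL)
    abort ();
  strcpy (sym->name, name);
  sym->next = sketch_obarray[h];
  sketch_obarray[h] = sym;
  return sym;
}
```

Reading the same name twice therefore yields the very same object,
which is what lets symbols be compared by address.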
1543
1544 @example
1545 (foo . bar)
1546 @end example
1547
1548 converts to a cons cell containing the symbols @code{foo} and @code{bar}.
1549
1550 @example
1551 (1 a 2.5)
1552 @end example
1553
1554 converts to a three-element list containing the specified objects
1555 (note that a list is actually a set of nested conses; see the
1556 XEmacs Lisp Reference).
1557
1558 @example
1559 [1 a 2.5]
1560 @end example
1561
1562 converts to a three-element vector containing the specified objects.
1563
1564 @example
1565 #[... ... ... ...]
1566 @end example
1567
1568 converts to a compiled-function object (the actual contents are not
1569 shown since they are not relevant here; look at a file that ends with
1570 @file{.elc} for examples).
1571
1572 @example
1573 #*01110110
1574 @end example
1575
1576 converts to a bit-vector.
1577
1578 @example
1579 #s(hash-table ... ...)
1580 @end example
1581
1582 converts to a hash table (the actual contents are not shown).
1583
1584 @example
1585 #s(range-table ... ...)
1586 @end example
1587
1588 converts to a range table (the actual contents are not shown).
1589
1590 @example
1591 #s(char-table ... ...)
1592 @end example
1593
1594 converts to a char table (the actual contents are not shown).
1595
1596 Note that the @code{#s()} syntax is the general syntax for structures,
1597 which are not really implemented in XEmacs Lisp but should be.
1598
1599 When an object is printed out (using @code{print} or a related
1600 function), the read syntax is used, so that the same object can be read
1601 in again.
1602
1603 The other objects do not have read syntaxes, usually because it does not
1604 really make sense to create them in this fashion (e.g. processes, where
1605 it doesn't make sense to have a subprocess created as a side effect of
1606 reading some Lisp code), or because they can't be created at all
1607 (e.g. subrs).  Permanent objects, as a rule, do not have a read syntax;
1608 nor do most complex objects, which contain too much state to be easily
1609 initialized through a read syntax.
1610
1611 @node How Lisp Objects Are Represented in C, Rules When Writing New C Code, The XEmacs Object System (Abstractly Speaking), Top
1612 @chapter How Lisp Objects Are Represented in C
1613
1614 Lisp objects are represented in C using a 32-bit or 64-bit machine word
1615 (depending on the processor; e.g. DEC Alphas use 64-bit Lisp objects and
1616 most other processors use 32-bit Lisp objects).  The representation
1617 stuffs a pointer together with a tag, as follows:
1618
1619 @example
1620  [ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ]
1621  [ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ]
1622
1623    <---------------------------------------------------------> <->
1624             a pointer to a structure, or an integer            tag
1625 @end example
1626
1627 A tag of 00 is used for all pointer object types, a tag of 10 is used
1628 for characters, and the other two tags 01 and 11 are joined together to
1629 form the integer object type.  This representation gives us 31-bit
1630 integers and 30-bit characters, while pointers are represented directly
1631 without any bit masking.  This representation, though, assumes that
1632 pointers to structs are always aligned to multiples of 4, so the lower 2
1633 bits are always zero.
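
The tag layout can be modeled in a few lines of C.  The sketch below is
not the code from @file{lisp.h}; all of its names are hypothetical, and
it uses @code{uintptr_t} as the word type so that it works on both
32-bit and 64-bit machines.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the two-bit tag layout described above. */
typedef uintptr_t sketch_object;

enum sketch_kind { SKETCH_POINTER, SKETCH_INTEGER, SKETCH_CHARACTER };

/* Tags 01 and 11 both have the low bit set, so a single bit test
   is enough to recognize an integer. */
static enum sketch_kind
sketch_classify (sketch_object obj)
{
  if (obj & 1)
    return SKETCH_INTEGER;
  if ((obj & 3) == 2)
    return SKETCH_CHARACTER;
  return SKETCH_POINTER;        /* tag 00: the word is the pointer */
}

/* A character code is stored above the two tag bits. */
static sketch_object
sketch_make_char (unsigned long code)
{
  return ((sketch_object) code << 2) | 2;
}

static unsigned long
sketch_char_code (sketch_object obj)
{
  assert (sketch_classify (obj) == SKETCH_CHARACTER);
  return (unsigned long) (obj >> 2);
}

/* A pointer to a 4-byte-aligned structure needs no encoding at all. */
static sketch_object
sketch_wrap_pointer (void *p)
{
  assert (((uintptr_t) p & 3) == 0);
  return (sketch_object) (uintptr_t) p;
}
```

Because integers own two of the four tag values, they keep one more
bit of payload than characters do.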
1634
1635 Lisp objects use the typedef @code{Lisp_Object}, but the actual C type
1636 used for the Lisp object can vary.  It can be either a simple type
1637 (@code{long} on the DEC Alpha, @code{int} on other machines) or a
1638 structure whose fields are bit fields that line up properly (actually, a
1639 union of structures is used).  Generally the simple integral type is
1640 preferable because it ensures that the compiler will actually use a
1641 machine word to represent the object (some compilers will use more
1642 general and less efficient code for unions and structs even if they can
1643 fit in a machine word).  The union type, however, has the advantage of
1644 stricter type checking (if you accidentally pass an integer where a Lisp
1645 object is desired, you get a compile error), and it makes it easier to
1646 decode Lisp objects when debugging.  The choice of which type to use is
1647 determined by the preprocessor constant @code{USE_UNION_TYPE} which is
1648 defined via the @code{--use-union-type} option to @code{configure}.
1649
1650 Various macros are used to construct Lisp objects and extract the
1651 components.  Macros of the form @code{XINT()}, @code{XCHAR()},
1652 @code{XSTRING()}, @code{XSYMBOL()}, etc. shift out the tag field if
1653 needed and cast the result to the appropriate type.  @code{XINT()} needs
1654 to be a bit tricky so that negative numbers are properly sign-extended.  Since
1655 integers are stored left-shifted, if the right-shift operator does an
1656 arithmetic shift (i.e. it leaves the most-significant bit as-is rather
1657 than shifting in a zero, so that it mimics a divide-by-two even for
1658 negative numbers) the shift to remove the tag bit is enough.  This is
1659 the case on all the systems we support.
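
The sign-extension point can be illustrated with a stand-alone sketch.
The names are hypothetical and this is not the actual definition of
@code{XINT()} or @code{make_int()}.  Note that ISO C leaves the result
of right-shifting a negative signed value implementation-defined, which
is why the text above has to state the arithmetic-shift assumption; a
variant that avoids the shift is included for comparison.

```c
#include <assert.h>
#include <stdint.h>

typedef intptr_t sketch_word;

/* Store N shifted left by one bit with the integer tag bit set.
   The shift is done on the unsigned type to avoid undefined
   behavior when N is negative. */
static sketch_word
sketch_make_int (sketch_word n)
{
  return (sketch_word) (((uintptr_t) n << 1) | 1);
}

/* Drop the tag bit.  Correct for negative numbers only if >> on a
   signed value is an arithmetic shift. */
static sketch_word
sketch_xint (sketch_word obj)
{
  return obj >> 1;
}

/* Variant that does not depend on how >> treats negative values:
   clearing the tag bit first makes the division exact. */
static sketch_word
sketch_xint_portable (sketch_word obj)
{
  return (obj - 1) / 2;
}
```

For example, -5 is stored as the word -9; an arithmetic shift of -9 by
one bit yields -5 again, whereas a logical shift would yield a large
positive number.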
1660
1661 Note that when @code{ERROR_CHECK_TYPECHECK} is defined, the extractor
1662 macros become more complicated---they check the tag bits and/or the
1663 type field in the first four bytes of a record type to ensure that the
1664 object is really of the correct type.  This is great for catching places
1665 where an incorrect type is being dereferenced---this typically results
1666 in a pointer being dereferenced as the wrong type of structure, with
1667 unpredictable (and sometimes not easily traceable) results.
1668
1669 There are similar @code{XSET@var{TYPE}()} macros that construct a Lisp
1670 object.  These macros are of the form @code{XSET@var{TYPE}
1671 (@var{lvalue}, @var{result})},
1672 i.e. they have to be a statement rather than just used in an expression.
1673 The reason for this is that standard C doesn't let you ``construct'' a
1674 structure (but GCC does).  Granted, this sometimes isn't too convenient;
1675 for the case of integers, at least, you can use the function
1676 @code{make_int()}, which constructs and @emph{returns} an integer
1677 Lisp object.  Note that the @code{XSET@var{TYPE}()} macros are also
1678 affected by @code{ERROR_CHECK_TYPECHECK} and make sure that the
1679 structure is of the right type in the case of record types, where the
1680 type is contained in the structure.
1681
1682 The C programmer is responsible for @strong{guaranteeing} that a
1683 Lisp_Object is of the correct type before using the @code{X@var{TYPE}}
1684 macros.  This is especially important in the case of lists.  Use
1685 @code{XCAR} and @code{XCDR} if a Lisp_Object is certainly a cons cell,
1686 else use @code{Fcar()} and @code{Fcdr()}.  Trust other C code, but not
1687 Lisp code.  On the other hand, if XEmacs has an internal logic error,
1688 it's better to crash immediately, so sprinkle ``unreachable''
1689 @code{abort()}s liberally about the source code.
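
The difference between the two kinds of accessor can be sketched as
follows.  This is a simplified model with hypothetical names: the real
@code{Fcar()} signals a Lisp error rather than returning a status code,
and the real @code{XCAR} only checks its argument when
@code{ERROR_CHECK_TYPECHECK} is defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum sketch_type { SKETCH_NIL, SKETCH_INT, SKETCH_CONS };

struct sketch_obj
{
  enum sketch_type type;
  long value;                     /* used by SKETCH_INT */
  struct sketch_obj *car, *cdr;   /* used by SKETCH_CONS */
};

/* For trusted callers: the argument is known to be a cons, so a
   mismatch is an internal logic error and we crash at once. */
static struct sketch_obj *
sketch_xcar (struct sketch_obj *obj)
{
  if (obj->type != SKETCH_CONS)
    abort ();
  return obj->car;
}

/* For data that may come from Lisp code: report a wrong-type
   argument to the caller.  Returns 0 on success, -1 on error. */
static int
sketch_checked_car (struct sketch_obj *obj, struct sketch_obj **result)
{
  if (obj->type == SKETCH_NIL)
    {
      *result = obj;              /* the car of nil is nil */
      return 0;
    }
  if (obj->type != SKETCH_CONS)
    return -1;
  *result = obj->car;
  return 0;
}
```

The unchecked form is cheaper, but it is only safe where C code has
already established the type.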
1690
1691 @node Rules When Writing New C Code, A Summary of the Various XEmacs Modules, How Lisp Objects Are Represented in C, Top
1692 @chapter Rules When Writing New C Code
1693
1694 The XEmacs C code is extremely complex and intricate, and there are many
1695 rules that are more or less consistently followed throughout the code.
1696 Many of these rules are not obvious, so they are explained here.  It is
1697 of the utmost importance that you follow them.  If you don't, you may
1698 get something that appears to work, but which will crash in odd
1699 situations, often in code far away from where the actual breakage is.
1700
1701 @menu
1702 * General Coding Rules::
1703 * Writing Lisp Primitives::
1704 * Adding Global Lisp Variables::
1705 * Coding for Mule::
1706 * Techniques for XEmacs Developers::
1707 @end menu
1708
1709 @node General Coding Rules, Writing Lisp Primitives, Rules When Writing New C Code, Rules When Writing New C Code
1710 @section General Coding Rules
1711
1712 The C code is actually written in a dialect of C called @dfn{Clean C},
1713 meaning that it can be compiled, mostly warning-free, with either a C or
1714 C++ compiler.  Coding in Clean C has several advantages over plain C.
1715 C++ compilers are more nit-picking, and a number of coding errors have
1716 been found by compiling with C++.  The ability to use both C and C++
1717 tools means that a greater variety of development tools is available to
1718 the developer.
1719
1720 Almost every module contains a @code{syms_of_*()} function and a
1721 @code{vars_of_*()} function.  The former declares any Lisp primitives
1722 you have defined and defines any symbols you will be using.  The latter
1723 declares any global Lisp variables you have added and initializes global
1724 C variables in the module.  For each such function, declare it in
1725 @file{symsinit.h} and make sure it's called in the appropriate place in
1726 @file{emacs.c}.  @strong{Important}: There are stringent requirements on
1727 exactly what can go into these functions.  See the comment in
1728 @file{emacs.c}.  The reason for this is to avoid obscure unwanted
1729 interactions during initialization.  If you don't follow these rules,
1730 you'll be sorry!  If you want to do anything that isn't allowed, create
1731 a @code{complex_vars_of_*()} function for it.  Doing this is tricky,
1732 though: You have to make sure your function is called at the right time
1733 so that all the initialization dependencies work out.
1734
1735 Every module includes @file{<config.h>} (angle brackets so that
1736 @samp{--srcdir} works correctly; @file{config.h} may or may not be in
1737 the same directory as the C sources) and @file{lisp.h}.  @file{config.h}
1738 must always be included before any other header files (including
1739 system header files) to ensure that certain tricks played by various
1740 @file{s/} and @file{m/} files work out correctly.
1741
1742 When including header files, always use angle brackets, not double
1743 quotes, except when the file to be included is in the same directory as
1744 the including file.  If either file is a generated file, then that is
1745 not likely to be the case.  In order to understand why we have this
1746 rule, imagine what happens when you do a build in the source directory
1747 using @samp{./configure} and another build in another directory using
1748 @samp{../work/configure}.  There will be two different @file{config.h}
1749 files.  Which one will be used if you @samp{#include "config.h"}?
1750
1751 @strong{All global and static variables that are to be modifiable must
1752 be declared uninitialized.}  This means that you may not use the
1753 ``declare with initializer'' form for these variables, such as @code{int
1754 some_variable = 0;}.  The reason for this has to do with some kludges
1755 done during the dumping process: If possible, the initialized data
1756 segment is re-mapped so that it becomes part of the (unmodifiable) code
1757 segment in the dumped executable.  This allows this memory to be shared
1758 among multiple running XEmacs processes.  XEmacs is careful to place as
1759 much constant data as possible into initialized variables during the
1760 @file{temacs} phase.
1761
1762 @cindex copy-on-write
1763 @strong{Please note:} This kludge only works on a few systems nowadays,
1764 and is rapidly becoming irrelevant because most modern operating systems
1765 provide @dfn{copy-on-write} semantics.  All data is initially shared
1766 between processes, and a private copy is automatically made (on a
1767 page-by-page basis) when a process first attempts to write to a page of
1768 memory.
1769
1770 Formerly, there was a requirement that static variables not be declared
1771 inside of functions.  This had to do with another hack along the same
1772 vein as what was just described: old USG systems put statically-declared
1773 variables in the initialized data space, so those header files had a
1774 @code{#define static} declaration. (That way, the data-segment remapping
1775 described above could still work.) This fails badly on static variables
1776 inside of functions, which suddenly become automatic variables;
1777 therefore, you weren't supposed to have any of them.  This awful kludge
1778 has been removed in XEmacs because
1779
1780 @enumerate
1781 @item
1782 almost all of the systems that used this kludge ended up having
1783 to disable the data-segment remapping anyway;
1784 @item
1785 the only systems that didn't were extremely outdated ones;
1786 @item
1787 this hack completely messed up inline functions.
1788 @end enumerate
1789
1790 The C source code makes heavy use of C preprocessor macros.  One popular
1791 macro style is:
1792
1793 @example
1794 #define FOO(var, value) do @{           \
1795   Lisp_Object FOO_value = (value);      \
1796   ... /* compute using FOO_value */     \
1797   (var) = bar;                          \
1798 @} while (0)
1799 @end example
1800
1801 The @code{do @{...@} while (0)} is a standard trick to allow @code{FOO} to have
1802 statement semantics, so that it can safely be used within an @code{if}
1803 statement in C, for example.  Multiple evaluation is prevented by
1804 copying a supplied argument into a local variable, so that
1805 @code{FOO(var,fun(1))} only calls @code{fun} once.
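
Here is a complete, compilable instance of this macro style.  The macro
@code{SKETCH_SET_CLAMPED} and the helper functions are invented for the
illustration and do not appear in the XEmacs sources.

```c
#include <assert.h>

static int sketch_calls;

/* An argument with a side effect, to show single evaluation. */
static int
sketch_next (int n)
{
  sketch_calls++;
  return n;
}

/* Store VALUE in VAR, clamped to the range 0..100.  VALUE is copied
   into a local variable first, so it is evaluated exactly once even
   though the body refers to it several times. */
#define SKETCH_SET_CLAMPED(var, value) do {     \
  int SKETCH_value = (value);                   \
  if (SKETCH_value < 0)                         \
    SKETCH_value = 0;                           \
  if (SKETCH_value > 100)                       \
    SKETCH_value = 100;                         \
  (var) = SKETCH_value;                         \
} while (0)

static int
sketch_demo (int input)
{
  int result = -1;

  /* With a bare { ... } block instead of do ... while (0), the
     semicolon after the macro call would end the `if' statement
     and the `else' below would be a syntax error. */
  if (input != 0)
    SKETCH_SET_CLAMPED (result, sketch_next (input));
  else
    result = 0;
  return result;
}
```

Each call of @code{sketch_demo} with a nonzero argument increments
@code{sketch_calls} by exactly one, however many times the macro body
mentions its local copy.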
1806
1807 Lisp lists are popular data structures in the C code as well as in
1808 Elisp.  There are two sets of macros that iterate over lists.
1809 @code{EXTERNAL_LIST_LOOP_@var{n}} should be used when the list has been
1810 supplied by the user, and cannot be trusted to be acyclic and
1811 nil-terminated.  A @code{malformed-list} or @code{circular-list} error
1812 will be generated if the list being iterated over is not entirely
1813 kosher.  @code{LIST_LOOP_@var{n}}, on the other hand, is faster and less
1814 safe, and can be used only on trusted lists.
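
To show what guarding against circular lists involves, here is a
stand-alone sketch that walks a list safely.  It uses Floyd's
tortoise-and-hare technique, which is an assumption made for the
example and may differ from what the @code{EXTERNAL_LIST_LOOP_@var{n}}
macros actually do.  The cons structure and all names are hypothetical,
and since the @code{cdr} field is a typed pointer, only circularity
(not a dotted tail) can be modeled.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_cons
{
  int car;
  struct sketch_cons *cdr;      /* NULL stands for nil */
};

enum { SKETCH_CIRCULAR_LIST = -1 };

/* Return the length of LIST, or SKETCH_CIRCULAR_LIST if following
   the cdr pointers never reaches nil.  The hare advances two cells
   per iteration and the tortoise one; they can meet again only if
   the list loops back on itself. */
static long
sketch_external_list_length (const struct sketch_cons *list)
{
  const struct sketch_cons *tortoise = list;
  const struct sketch_cons *hare = list;
  long length = 0;

  while (hare != NULL)
    {
      hare = hare->cdr;
      length++;
      if (hare == NULL)
        break;
      hare = hare->cdr;
      length++;
      tortoise = tortoise->cdr;
      if (hare == tortoise)
        return SKETCH_CIRCULAR_LIST;
    }
  return length;
}
```

A loop over a trusted list can skip the tortoise bookkeeping entirely,
which is where the speed difference comes from.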
1815
1816 Related macros are @code{GET_EXTERNAL_LIST_LENGTH} and
1817 @code{GET_LIST_LENGTH}, which calculate the length of a list and, in the
1818 case of @code{GET_EXTERNAL_LIST_LENGTH}, also validate that the list is
1819 proper.  The macros @code{EXTERNAL_LIST_LOOP_DELETE_IF} and
1820 @code{LIST_LOOP_DELETE_IF} delete from a Lisp list the elements that
1821 satisfy some predicate.
1822
1823 @node Writing Lisp Primitives, Adding Global Lisp Variables, General Coding Rules, Rules When Writing New C Code
1824 @section Writing Lisp Primitives
1825
1826 Lisp primitives are Lisp functions implemented in C.  The details of
1827 interfacing the C function so that Lisp can call it are handled by a few
1828 C macros.  The only way to really understand how to write new C code is
1829 to read the source, but we can explain some things here.
1830
1831 An example of a special form is the definition of @code{prog1}, from
1832 @file{eval.c}.  (An ordinary function would have the same general
1833 appearance.)
1834
1835 @cindex garbage collection protection
1836 @smallexample
1837 @group
1838 DEFUN ("prog1", Fprog1, 1, UNEVALLED, 0, /*
1839 Similar to `progn', but the value of the first form is returned.
1840 \(prog1 FIRST BODY...): All the arguments are evaluated sequentially.
1841 The value of FIRST is saved during evaluation of the remaining args,
1842 whose values are discarded.
1843 */
1844        (args))
1845 @{
1846   /* This function can GC */
1847   REGISTER Lisp_Object val, form, tail;
1848   struct gcpro gcpro1;
1849
1850   val = Feval (XCAR (args));
1851
1852   GCPRO1 (val);
1853
1854   LIST_LOOP_3 (form, XCDR (args), tail)
1855     Feval (form);
1856
1857   UNGCPRO;
1858   return val;
1859 @}
1860 @end group
1861 @end smallexample
1862
1863   Let's start with a precise explanation of the arguments to the
1864 @code{DEFUN} macro.  Here is a template for them:
1865
1866 @example
1867 @group
1868 DEFUN (@var{lname}, @var{fname}, @var{min_args}, @var{max_args}, @var{interactive}, /*
1869 @var{docstring}
1870 */
1871    (@var{arglist}))
1872 @end group
1873 @end example
1874
1875 @table @var
1876 @item lname
1877 This string is the name of the Lisp symbol to define as the function
1878 name; in the example above, it is @code{"prog1"}.
1879
1880 @item fname
1881 This is the C function name for this function.  This is the name that is
1882 used in C code for calling the function.  The name is, by convention,
1883 @samp{F} prepended to the Lisp name, with all dashes (@samp{-}) in the
1884 Lisp name changed to underscores.  Thus, to call this function from C
1885 code, call @code{Fprog1}.  Remember that the arguments are of type
1886 @code{Lisp_Object}; various macros and functions for creating values of
1887 type @code{Lisp_Object} are declared in the file @file{lisp.h}.
1888
1889 Primitives whose names are special characters (e.g. @code{+} or
1890 @code{<}) are named by spelling out, in some fashion, the special
1891 character: e.g. @code{Fplus()} or @code{Flss()}.  Primitives whose names
1892 begin with normal alphanumeric characters but also contain special
1893 characters are spelled out in some creative way, e.g. @code{let*}
1894 becomes @code{FletX()}.
1895
1896 Each function also has an associated structure that holds the data for
1897 the subr object that represents the function in Lisp.  This structure
1898 conveys the Lisp symbol name to the initialization routine that will
1899 create the symbol and store the subr object as its definition.  The C
1900 variable name of this structure is always @samp{S} prepended to the
1901 @var{fname}.  You hardly ever need to be aware of the existence of this
1902 structure, since @code{DEFUN} plus @code{DEFSUBR} takes care of all the
1903 details.
1904
1905 @item min_args
1906 This is the minimum number of arguments that the function requires.  The
1907 function @code{prog1} allows a minimum of one argument.
1908
1909 @item max_args
1910 This is the maximum number of arguments that the function accepts, if
1911 there is a fixed maximum.  Alternatively, it can be @code{UNEVALLED},
1912 indicating a special form that receives unevaluated arguments, or
1913 @code{MANY}, indicating an unlimited number of evaluated arguments (the
1914 C equivalent of @code{&rest}).  Both @code{UNEVALLED} and @code{MANY}
1915 are macros.  If @var{max_args} is a number, it may not be less than
1916 @var{min_args} and it may not be greater than 8. (If you need to add a
1917 function with more than 8 arguments, use the @code{MANY} form.  Resist
1918 the urge to edit the definition of @code{DEFUN} in @file{lisp.h}.  If
1919 you do it anyway, make sure to also add another clause to the switch
1920 statement in @code{primitive_funcall().})
1921
1922 @item interactive
1923 This is an interactive specification, a string such as might be used as
1924 the argument of @code{interactive} in a Lisp function.  In the case of
1925 @code{prog1}, it is 0 (a null pointer), indicating that @code{prog1}
1926 cannot be called interactively.  A value of @code{""} indicates a
1927 function that should receive no arguments when called interactively.
1928
1929 @item docstring
1930 This is the documentation string.  It is written just like a
1931 documentation string for a function defined in Lisp; in particular, the
1932 first line should be a single sentence.  Note how the documentation
1933 string is enclosed in a comment, none of the documentation is placed on
1934 the same lines as the comment-start and comment-end characters, and the
1935 comment-start characters are on the same line as the interactive
1936 specification.  @file{make-docfile}, which scans the C files for
1937 documentation strings, is very particular about what it looks for, and
1938 will not properly extract the doc string if it's not in this exact format.
1939
1940 In order to make both @file{etags} and @file{make-docfile} happy, make
1941 sure that the @code{DEFUN} line contains the @var{lname} and
1942 @var{fname}, and that the comment-start characters for the doc string
1943 are on the same line as the interactive specification, and put a newline
1944 directly after them (and before the comment-end characters).
1945
1946 @item arglist
1947 This is the comma-separated list of arguments to the C function.  For a
1948 function with a fixed maximum number of arguments, provide a C argument
1949 for each Lisp argument.  In this case, unlike regular C functions, the
1950 types of the arguments are not declared; they are simply always of type
1951 @code{Lisp_Object}.
1952
1953 The names of the C arguments will be used as the names of the arguments
1954 to the Lisp primitive as displayed in its documentation, modulo the same
1955 concerns described above for @code{F...} names (in particular,
1956 underscores in the C arguments become dashes in the Lisp arguments).
1957
1958 There is one additional kludge: A trailing `_' on the C argument is
1959 discarded when forming the Lisp argument.  This allows C language
1960 reserved words (like @code{default}) or global symbols (like
1961 @code{dirname}) to be used as argument names without compiler warnings
1962 or errors.
1963
1964 A Lisp function with @w{@var{max_args} = @code{UNEVALLED}} is a
1965 @w{@dfn{special form}}; its arguments are not evaluated.  Instead it
1966 receives one argument of type @code{Lisp_Object}, a (Lisp) list of the
1967 unevaluated arguments, conventionally named @code{(args)}.
1968
1969 When a Lisp function has no upper limit on the number of arguments,
1970 specify @w{@var{max_args} = @code{MANY}}.  In this case its implementation in
1971 C actually receives exactly two arguments: the number of Lisp arguments
1972 (an @code{int}) and the address of a block containing their values (a
1973 @w{@code{Lisp_Object *}}).  In this case only are the C types specified
1974 in the @var{arglist}: @w{@code{(int nargs, Lisp_Object *args)}}.
1975
1976 @end table
1977
1978 Within the function @code{Fprog1} itself, note the use of the macros
1979 @code{GCPRO1} and @code{UNGCPRO}.  @code{GCPRO1} is used to ``protect''
1980 a variable from garbage collection---to inform the garbage collector
1981 that it must look in that variable and regard the object pointed at by
1982 its contents as an accessible object.  This is necessary whenever you
1983 call @code{Feval} or anything that can directly or indirectly call
1984 @code{Feval} (this includes the @code{QUIT} macro!).  At such a time,
1985 any Lisp object that you intend to refer to again must be protected
1986 somehow.  @code{UNGCPRO} cancels the protection of the variables that
1987 are protected in the current function.  It is necessary to do this
1988 explicitly.
1989
1990 The macro @code{GCPRO1} protects just one local variable.  If you want
1991 to protect two, use @code{GCPRO2} instead; repeating @code{GCPRO1} will
1992 not work.  Macros @code{GCPRO3} and @code{GCPRO4} also exist.
1993
1994 These macros implicitly use local variables such as @code{gcpro1}; you
1995 must declare these explicitly, with type @code{struct gcpro}.  Thus, if
1996 you use @code{GCPRO2}, you must declare @code{gcpro1} and @code{gcpro2}.
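
The following sketch models one way such macros can work, as a linked
chain of records living on the C stack.  It is a simplified
illustration with hypothetical names, not the XEmacs definitions, but
it shows why the @code{gcpro@var{n}} variables must be declared by the
caller and why repeating the one-variable macro cannot work.

```c
#include <assert.h>
#include <stddef.h>

typedef long sketch_object;     /* stand-in for Lisp_Object */

struct sketch_gcpro
{
  struct sketch_gcpro *next;    /* previously registered record */
  sketch_object *var;           /* address of the protected local */
};

/* Head of the chain that a collector would walk when marking. */
static struct sketch_gcpro *sketch_gcprolist;

/* The macros refer to the fixed names gcpro1 and gcpro2.  Using
   SKETCH_GCPRO1 twice in one function would overwrite gcpro1 and
   link it to itself. */
#define SKETCH_GCPRO1(v) do {           \
  gcpro1.next = sketch_gcprolist;       \
  gcpro1.var = &(v);                    \
  sketch_gcprolist = &gcpro1;           \
} while (0)

#define SKETCH_GCPRO2(v1, v2) do {      \
  gcpro1.next = sketch_gcprolist;       \
  gcpro1.var = &(v1);                   \
  gcpro2.next = &gcpro1;                \
  gcpro2.var = &(v2);                   \
  sketch_gcprolist = &gcpro2;           \
} while (0)

/* Pop everything that the current function pushed. */
#define SKETCH_UNGCPRO (sketch_gcprolist = gcpro1.next)

/* Count the variables that are currently protected. */
static int
sketch_count_protected (void)
{
  int n = 0;
  struct sketch_gcpro *p;

  for (p = sketch_gcprolist; p != NULL; p = p->next)
    n++;
  return n;
}

static int
sketch_demo (void)
{
  sketch_object a = 1, b = 2;
  struct sketch_gcpro gcpro1, gcpro2;
  int during;

  SKETCH_GCPRO2 (a, b);
  during = sketch_count_protected ();  /* a call that can GC goes here */
  SKETCH_UNGCPRO;
  return during;
}
```

Since the records live in the stack frame of the protecting function,
forgetting the matching unprotect step would leave the chain pointing
into a dead frame.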
1997
1998 @cindex caller-protects (@code{GCPRO} rule)
1999 Note also that the general rule is @dfn{caller-protects}; i.e. you are
2000 only responsible for protecting those Lisp objects that you create.  Any
2001 objects passed to you as arguments should have been protected by whoever
2002 created them, so you don't in general have to protect them.
2003
2004 In particular, the arguments to any Lisp primitive are always
2005 automatically @code{GCPRO}ed, when called ``normally'' from Lisp code or
2006 bytecode.  So only a few Lisp primitives that are called frequently from
2007 C code, such as @code{Fprogn}, protect their arguments as a service to
2008 their caller.  You don't need to protect your arguments when writing a
2009 new @code{DEFUN}.
2010
2011 @code{GCPRO}ing is perhaps the trickiest and most error-prone part of
2012 XEmacs coding.  It is @strong{extremely} important that you get this
2013 right and use a great deal of discipline when writing this code.
2014 @xref{GCPROing, ,@code{GCPRO}ing}, for full details on how to do this.
2015
2016 What @code{DEFUN} actually does is declare a global structure of type
2017 @code{Lisp_Subr} whose name begins with capital @samp{SF} and which
2018 contains information about the primitive (e.g. a pointer to the
2019 function, its minimum and maximum allowed arguments, a string describing
2020 its Lisp name); @code{DEFUN} then begins a normal C function declaration
2021 using the @code{F...} name.  The Lisp subr object that is the function
2022 definition of a primitive (i.e. the object in the function slot of the
2023 symbol that names the primitive) actually points to this @samp{SF}
2024 structure; when @code{Feval} encounters a subr, it looks in the
2025 structure to find out how to call the C function.
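
A stripped-down model of this arrangement is sketched below.  The
structure layout, the @code{SKETCH_DEFUN} macro and the dispatcher are
all hypothetical and far simpler than the real @code{DEFUN} and
@code{Lisp_Subr}: the model handles only functions of up to two
arguments, and unlike the real macro it leaves the argument types to be
written out by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef long sketch_object;     /* stand-in for Lisp_Object */

struct sketch_subr
{
  const char *name;             /* Lisp-visible name */
  int min_args, max_args;
  sketch_object (*fn) (sketch_object, sketch_object);
};

/* Declare the structure S<fname>, then begin an ordinary C function
   definition for <fname>. */
#define SKETCH_DEFUN(lname, fname, min_args, max_args)          \
  static sketch_object fname (sketch_object, sketch_object);    \
  static struct sketch_subr S##fname =                          \
    { lname, min_args, max_args, fname };                       \
  static sketch_object fname

SKETCH_DEFUN ("sketch-add", Fsketch_add, 2, 2)
     (sketch_object a, sketch_object b)
{
  return a + b;
}

/* What an evaluator does on meeting a subr: consult the structure,
   check the argument count, and call through the function pointer.
   Returns 0 on success, -1 for a wrong number of arguments. */
static int
sketch_call_subr (const struct sketch_subr *subr, int nargs,
                  const sketch_object *args, sketch_object *result)
{
  if (nargs < subr->min_args || nargs > subr->max_args)
    return -1;
  *result = subr->fn (nargs > 0 ? args[0] : 0,
                      nargs > 1 ? args[1] : 0);
  return 0;
}
```

Here the macro call produces both @code{SFsketch_add} and the function
@code{Fsketch_add}, mirroring the @samp{SF} and @samp{F} naming
described above.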
2026
2027 Defining the C function is not enough to make a Lisp primitive
2028 available; you must also create the Lisp symbol for the primitive (the
2029 symbol is @dfn{interned}; @pxref{Obarrays}) and store a suitable subr
2030 object in its function cell. (If you don't do this, the primitive won't
2031 be seen by Lisp code.) The code looks like this:
2032
2033 @example
2034 DEFSUBR (@var{fname});
2035 @end example
2036
2037 @noindent
2038 Here @var{fname} is the same name you used as the second argument to
2039 @code{DEFUN}.
2040
2041 This call to @code{DEFSUBR} should go in the @code{syms_of_*()} function
2042 at the end of the module.  If no such function exists, create it and
2043 make sure to also declare it in @file{symsinit.h} and call it from the
2044 appropriate spot in @code{main()}.  @xref{General Coding Rules}.
2045
2046 Note that C code cannot call functions by name unless they are defined
2047 in C.  The way to call a function written in Lisp from C is to use
2048 @code{Ffuncall}, which embodies the Lisp function @code{funcall}.  Since
2049 the Lisp function @code{funcall} accepts an unlimited number of
2050 arguments, in C it takes two: the number of Lisp-level arguments, and a
2051 one-dimensional array containing their values.  The first Lisp-level
2052 argument is the Lisp function to call, and the rest are the arguments to
2053 pass to it.  Since @code{Ffuncall} can call the evaluator, you must
2054 protect pointers from garbage collection around the call to
2055 @code{Ffuncall}. (However, @code{Ffuncall} explicitly protects all of
2056 its parameters, so you don't have to protect any pointers passed as
2057 parameters to it.)
2058
2059 The C functions @code{call0}, @code{call1}, @code{call2}, and so on,
2060 provide convenient ways to call a Lisp function with a fixed number of
2061 arguments.  They work by calling @code{Ffuncall}.
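
The calling convention described above can be sketched with a toy model.
This is only an illustration under stated assumptions, not XEmacs code:
@code{toy_object}, @code{toy_function_table}, @code{toy_funcall},
@code{toy_call2} and @code{toy_plus} are hypothetical names, a ``Lisp
object'' is reduced to a @code{long}, and a function is designated by an
index into a small table.  What it shows is the shape of the interface: a
count plus an array whose first element names the function, and a
fixed-arity wrapper that packs its arguments into such an array.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins, NOT the real XEmacs definitions: a "Lisp object" is
   just a long, and a "Lisp function" is a C function pointer that is
   looked up by index in a small table. */
typedef long toy_object;
typedef toy_object (*toy_function) (int nargs, toy_object *args);

#define TOY_TABLE_SIZE 8
toy_function toy_function_table[TOY_TABLE_SIZE];

/* Model of the Ffuncall convention: args[0] designates the function to
   call, and args[1] .. args[nargs - 1] are the arguments passed to it. */
toy_object
toy_funcall (int nargs, toy_object *args)
{
  toy_function fn;

  assert (nargs >= 1);
  assert (args[0] >= 0 && args[0] < TOY_TABLE_SIZE);
  fn = toy_function_table[args[0]];
  assert (fn != NULL);
  return fn (nargs - 1, args + 1);
}

/* Model of call2: pack a fixed number of arguments into an array and
   hand that array to the general entry point. */
toy_object
toy_call2 (toy_object fn, toy_object arg0, toy_object arg1)
{
  toy_object args[3];

  args[0] = fn;
  args[1] = arg0;
  args[2] = arg1;
  return toy_funcall (3, args);
}

/* A sample callee that sums whatever arguments it receives. */
toy_object
toy_plus (int nargs, toy_object *args)
{
  toy_object sum = 0;
  int i;

  for (i = 0; i < nargs; i++)
    sum += args[i];
  return sum;
}
```

After storing @code{toy_plus} in slot 0 of the table, @code{toy_call2 (0,
20, 22)} goes through @code{toy_funcall} exactly as a three-element array
call would.  The real @code{Ffuncall} additionally deals with garbage
collection protection, which this sketch ignores.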
2062
2063 @file{eval.c} is a very good file to look through for examples;
2064 @file{lisp.h} contains the definitions for important macros and
2065 functions.
2066
2067 @node Adding Global Lisp Variables, Coding for Mule, Writing Lisp Primitives, Rules When Writing New C Code
2068 @section Adding Global Lisp Variables
2069
2070 Global variables whose names begin with @samp{Q} are constants whose
2071 value is a symbol of a particular name.  The name of the variable should
2072 be derived from the name of the symbol using the same rules as for Lisp
2073 primitives.  These variables are initialized using a call to
2074 @code{defsymbol()} in the @code{syms_of_*()} function. (This call
2075 interns a symbol, sets the C variable to the resulting Lisp object, and
2076 calls @code{staticpro()} on the C variable to tell the
2077 garbage-collection mechanism about this variable.  What
2078 @code{staticpro()} does is add a pointer to the variable to a large
2079 global array; when garbage-collection happens, all pointers listed in
2080 the array are used as starting points for marking Lisp objects.  This is
2081 important because it's quite possible that the only current reference to
2082 the object is the C variable.  In the case of symbols, the
2083 @code{staticpro()} doesn't matter all that much because the symbol is
2084 contained in @code{obarray}, which is itself @code{staticpro()}ed.
2085 However, it's possible that a naughty user could do something like
2086 uninterning the symbol out of @code{obarray} or even setting
2087 @code{obarray} to a different value [although this is likely to make
2088 XEmacs crash!].)
2089
2090   @strong{Please note:} It is potentially deadly if you declare a
2091 @samp{Q...}  variable in two different modules.  The two calls to
2092 @code{defsymbol()} are no problem, but some linkers will complain about
2093 multiply-defined symbols.  The most insidious aspect of this is that
2094 often the link will succeed anyway, but then the resulting executable
2095 will sometimes crash in obscure ways during certain operations!  To
2096 avoid this problem, declare any symbols with common names (such as
2097 @code{text}) that are not obviously associated with this particular
2098 module in the module @file{general.c}.
2099
2100   Global variables whose names begin with @samp{V} are variables that
2101 contain Lisp objects.  The convention here is that all global variables
2102 of type @code{Lisp_Object} begin with @samp{V}, and all others don't
2103 (including integer and boolean variables that have Lisp
2104 equivalents). Most of the time, these variables have equivalents in
2105 Lisp, but some don't.  Those that do are declared this way by a call to
2106 @code{DEFVAR_LISP()} in the @code{vars_of_*()} initializer for the
2107 module.  What this does is create a special @dfn{symbol-value-forward}
2108 Lisp object that contains a pointer to the C variable, intern a symbol
2109 whose name is as specified in the call to @code{DEFVAR_LISP()}, and set
2110 its value to the symbol-value-forward Lisp object; it also calls
2111 @code{staticpro()} on the C variable to tell the garbage-collection
2112 mechanism about the variable.  When @code{eval} (or actually
2113 @code{symbol-value}) encounters this special object in the process of
2114 retrieving a variable's value, it follows the indirection to the C
2115 variable and gets its value.  @code{setq} does similar things so that
2116 the C variable gets changed.
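
The forwarding idea can be sketched in a few lines of plain C.  This is a
simplified model, not the actual symbol-value-forward implementation: the
names @code{toy_symbol}, @code{toy_defvar}, @code{toy_symbol_value} and
@code{toy_set} are hypothetical, and values are plain @code{long}s rather
than @code{Lisp_Object}s.

```c
#include <assert.h>
#include <stddef.h>

/* Toy symbol: its value slot either holds a value directly or forwards
   to a C variable.  The real symbol-value-forward object is a separate
   Lisp object; this struct only models the indirection. */
struct toy_symbol
{
  const char *name;
  long value;       /* used only when forward is NULL */
  long *forward;    /* address of the C variable, or NULL */
};

/* Model of what a DEFVAR_LISP-style call arranges: tie the symbol to
   the C variable by remembering the variable's address. */
void
toy_defvar (struct toy_symbol *sym, const char *name, long *c_variable)
{
  assert (sym != NULL && c_variable != NULL);
  sym->name = name;
  sym->value = 0;
  sym->forward = c_variable;
}

/* Reading the symbol follows the indirection to the C variable. */
long
toy_symbol_value (const struct toy_symbol *sym)
{
  return sym->forward ? *sym->forward : sym->value;
}

/* Setting the symbol writes through to the C variable. */
void
toy_set (struct toy_symbol *sym, long newval)
{
  if (sym->forward)
    *sym->forward = newval;
  else
    sym->value = newval;
}
```

Because the symbol stores the address of the C variable rather than a copy
of its value, a change made on either side is immediately visible on the
other.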
2117
2118   Whether or not you @code{DEFVAR_LISP()} a variable, you need to
2119 initialize it in the @code{vars_of_*()} function; otherwise it will end
2120 up as all zeroes, which is the integer 0 (@emph{not} @code{nil}), and
2121 this is probably not what you want.  Also, if the variable is not
2122 @code{DEFVAR_LISP()}ed, @strong{you must call} @code{staticpro()} on the
2123 C variable in the @code{vars_of_*()} function.  Otherwise, the
2124 garbage-collection mechanism won't know that the object in this variable
2125 is in use, and will happily collect it and reuse its storage for another
2126 Lisp object, and you will be the one who's unhappy when you can't figure
2127 out how your variable got overwritten.
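
To see why an unregistered variable is at risk, consider this minimal
model of a root table.  It is a sketch under simplifying assumptions, not
the XEmacs collector: @code{toy_object}, @code{toy_staticpro} and
@code{toy_mark_roots} are hypothetical names, and marking does not recurse
into the objects it reaches.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a Lisp object; it carries only a mark bit. */
struct toy_object
{
  int marked;
};

#define TOY_MAX_ROOTS 64

/* Root table: addresses of C variables that hold object references. */
static struct toy_object **toy_roots[TOY_MAX_ROOTS];
static int toy_root_count;

/* Model of staticpro(): remember the address of the variable, not its
   current value, so that later assignments to the variable are seen. */
void
toy_staticpro (struct toy_object **varaddr)
{
  assert (varaddr != NULL);
  assert (toy_root_count < TOY_MAX_ROOTS);
  toy_roots[toy_root_count++] = varaddr;
}

/* Mark phase: every object referenced by a registered variable is
   considered in use.  Objects referenced only by unregistered
   variables stay unmarked and would be reclaimed. */
void
toy_mark_roots (void)
{
  int i;

  for (i = 0; i < toy_root_count; i++)
    if (*toy_roots[i] != NULL)
      (*toy_roots[i])->marked = 1;
}
```

In this model, an object that is referenced only from a variable never
passed to @code{toy_staticpro} is left unmarked after
@code{toy_mark_roots}, which corresponds to the overwritten-variable
scenario described above.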
2128
2129 @node Coding for Mule, Techniques for XEmacs Developers, Adding Global Lisp Variables, Rules When Writing New C Code
2130 @section Coding for Mule
2131 @cindex Coding for Mule
2132
2133 Although Mule support is not compiled by default in XEmacs, many people
2134 are using it, and we consider it crucial that new code works correctly
2135 with multibyte characters.  This is not hard; it is only a matter of
2136 following several simple user-interface guidelines.  Even if you never
2137 compile with Mule, with a little practice you will find it quite easy
2138 to code Mule-correctly.
2139
2140 Note that these guidelines are not necessarily tied to the current Mule
2141 implementation; they are also a good idea to follow on the grounds of
2142 code generalization for future I18N work.
2143
2144 @menu
2145 * Character-Related Data Types::
2146 * Working With Character and Byte Positions::
2147 * Conversion to and from External Data::
2148 * General Guidelines for Writing Mule-Aware Code::
2149 * An Example of Mule-Aware Code::
2150 @end menu
2151
2152 @node Character-Related Data Types, Working With Character and Byte Positions, Coding for Mule, Coding for Mule
2153 @subsection Character-Related Data Types
2154
2155 First, let's review the basic character-related datatypes used by
2156 XEmacs.  Note that the separate @code{typedef}s are not mandatory in the
2157 current implementation (all of them boil down to @code{unsigned char} or
2158 @code{int}), but they improve clarity of code a great deal, because one
2159 glance at the declaration can tell the intended use of the variable.
2160
2161 @table @code
2162 @item Emchar
2163 @cindex Emchar
2164 An @code{Emchar} holds a single Emacs character.
2165
2166 Obviously, the equality between characters and bytes is lost in the Mule
2167 world.  Characters can be represented by one or more bytes in the
2168 buffer, and @code{Emchar} is the C type large enough to hold any
2169 character.
2170
2171 Without Mule support, an @code{Emchar} is equivalent to an
2172 @code{unsigned char}.
2173
2174 @item Bufbyte
2175 @cindex Bufbyte
2176 The data representing the text in a buffer or string is logically a
2177 sequence of @code{Bufbyte}s.
2178
2179 XEmacs does not work with the same character formats all the time; when
2180 reading characters from the outside, it decodes them to an internal
2181 format, and likewise encodes them when writing.  @code{Bufbyte} (in fact
2182 @code{unsigned char}) is the basic unit of the internal format used for
2183 XEmacs buffers and strings.  A @code{Bufbyte *} is the type that points
2184 at text encoded in the variable-width internal encoding.
2185
2186 One character can correspond to one or more @code{Bufbyte}s.  In the
2187 current Mule implementation, an ASCII character is represented by a
2188 single @code{Bufbyte} of the same value, and other characters are
2189 represented by a sequence of two or more @code{Bufbyte}s.
2190
2191 Without Mule support, there are exactly 256 characters, implicitly
2192 Latin-1, and each character is represented using one @code{Bufbyte}, and
2193 there is a one-to-one correspondence between @code{Bufbyte}s and
2194 @code{Emchar}s.
2195
2196 @item Bufpos
2197 @itemx Charcount
2198 @cindex Bufpos
2199 @cindex Charcount
2200 A @code{Bufpos} represents a character position in a buffer or string.
2201 A @code{Charcount} represents a number (count) of characters.
2202 Logically, subtracting two @code{Bufpos} values yields a
2203 @code{Charcount} value.  Although all of these are @code{typedef}ed to
2204 @code{EMACS_INT}, we use them in preference to @code{EMACS_INT} to make
2205 it clear what sort of position is being used.
2206
2207 @code{Bufpos} and @code{Charcount} values are the only ones that are
2208 ever visible to Lisp.
2209
2210 @item Bytind
2211 @itemx Bytecount
2212 @cindex Bytind
2213 @cindex Bytecount
2214 A @code{Bytind} represents a byte position in a buffer or string.  A
2215 @code{Bytecount} represents the distance between two positions, in bytes.
2216 The relationship between @code{Bytind} and @code{Bytecount} is the same
2217 as the relationship between @code{Bufpos} and @code{Charcount}.
2218
2219 @item Extbyte
2220 @itemx Extcount
2221 @cindex Extbyte
2222 @cindex Extcount
2223 When dealing with the outside world, XEmacs works with @code{Extbyte}s,
2224 which are equivalent to @code{unsigned char}.  Obviously, an
2225 @code{Extcount} is the distance between two @code{Extbyte}s.  Extbytes
2226 and Extcounts are not all that frequent in XEmacs code.
2227 @end table
2228
2229 @node Working With Character and Byte Positions, Conversion to and from External Data, Character-Related Data Types, Coding for Mule
2230 @subsection Working With Character and Byte Positions
2231
2232 Now that we have defined the basic character-related types, we can look
2233 at the macros and functions designed for work with them and for
2234 conversion between them.  Most of these macros are defined in
2235 @file{buffer.h}, and we don't discuss all of them here, but only the
2236 most important ones.  Examining the existing code is the best way to
2237 learn about them.
2238
2239 @table @code
2240 @item MAX_EMCHAR_LEN
2241 @cindex MAX_EMCHAR_LEN
2242 This preprocessor constant is the maximum number of buffer bytes needed
2243 to represent an Emacs character in the variable-width internal encoding.
2244 It is useful when allocating temporary strings to hold a known number of
2245 characters.  For instance:
2246
2247 @example
2248 @group
2249 @{
2250   Charcount cclen;
2251   ...
2252   @{
2253     /* Allocate place for @var{cclen} characters. */
2254     Bufbyte *buf = (Bufbyte *)alloca (cclen * MAX_EMCHAR_LEN);
2255 ...
2256 @end group
2257 @end example
2258
2259 If you followed the previous section, you can guess that, logically,
2260 multiplying a @code{Charcount} value by @code{MAX_EMCHAR_LEN} produces
2261 a @code{Bytecount} value.
2262
2263 In the current Mule implementation, @code{MAX_EMCHAR_LEN} equals 4.
2264 Without Mule, it is 1.
2265
2266 @item charptr_emchar
2267 @itemx set_charptr_emchar
2268 @cindex charptr_emchar
2269 @cindex set_charptr_emchar
2270 The @code{charptr_emchar} macro takes a @code{Bufbyte} pointer and
2271 returns the @code{Emchar} stored at that position.  If it were a
2272 function, its prototype would be:
2273
2274 @example
2275 Emchar charptr_emchar (Bufbyte *p);
2276 @end example
2277
2278 @code{set_charptr_emchar} stores an @code{Emchar} to the specified byte
2279 position.  It returns the number of bytes stored:
2280
2281 @example
2282 Bytecount set_charptr_emchar (Bufbyte *p, Emchar c);
2283 @end example
2284
2285 It is important to note that @code{set_charptr_emchar} is safe only for
2286 appending a character at the end of a buffer, not for overwriting a
2287 character in the middle.  This is because the width of characters
2288 varies, and @code{set_charptr_emchar} cannot resize the string if it
2289 writes, say, a two-byte character where a single-byte character used to
2290 reside.
2291
2292 A typical use of @code{set_charptr_emchar} can be demonstrated by this
2293 example, which copies characters from buffer @var{buf} to a temporary
2294 string of Bufbytes, using @var{p} as the write pointer into that string.
2295
2296 @example
2297 @group
2298 @{
2299   Bufpos pos;
2300   for (pos = beg; pos < end; pos++)
2301     @{
2302       Emchar c = BUF_FETCH_CHAR (buf, pos);
2303       p += set_charptr_emchar (p, c);
2304     @}
2305 @}
2306 @end group
2307 @end example
2308
2309 Note how @code{set_charptr_emchar} is used to store the @code{Emchar}
2310 and advance the write pointer at the same time.
2311
2312 @item INC_CHARPTR
2313 @itemx DEC_CHARPTR
2314 @cindex INC_CHARPTR
2315 @cindex DEC_CHARPTR
2316 These two macros increment and decrement a @code{Bufbyte} pointer,
2317 respectively.  They will adjust the pointer by the appropriate number of
2318 bytes according to the byte length of the character stored there.  Both
2319 macros assume that the memory address is located at the beginning of a
2320 valid character.
2321
2322 Without Mule support, @code{INC_CHARPTR (p)} and @code{DEC_CHARPTR (p)}
2323 simply expand to @code{p++} and @code{p--}, respectively.
2324
2325 @item bytecount_to_charcount
2326 @cindex bytecount_to_charcount
2327 Given a pointer to a text string and a length in bytes, return the
2328 equivalent length in characters.
2329
2330 @example
2331 Charcount bytecount_to_charcount (Bufbyte *p, Bytecount bc);
2332 @end example
2333
2334 @item charcount_to_bytecount
2335 @cindex charcount_to_bytecount
2336 Given a pointer to a text string and a length in characters, return the
2337 equivalent length in bytes.
2338
2339 @example
2340 Bytecount charcount_to_bytecount (Bufbyte *p, Charcount cc);
2341 @end example
2342
2343 @item charptr_n_addr
2344 @cindex charptr_n_addr
2345 Return a pointer to the beginning of the character offset @var{cc} (in
2346 characters) from @var{p}.
2347
2348 @example
2349 Bufbyte *charptr_n_addr (Bufbyte *p, Charcount cc);
2350 @end example
2351 @end table
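
The way these macros fit together can be illustrated with a small,
self-contained model.  Note the assumptions: this sketch uses UTF-8 as a
stand-in variable-width encoding, which is @emph{not} the Mule internal
encoding, and @code{toy_byte}, @code{toy_char}, @code{toy_set_charptr},
@code{toy_char_length} and @code{toy_bytecount_to_charcount} are
hypothetical names that only play the roles of @code{Bufbyte},
@code{Emchar}, @code{set_charptr_emchar}, the step taken by
@code{INC_CHARPTR}, and @code{bytecount_to_charcount}.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char toy_byte;   /* plays the role of Bufbyte */
typedef int toy_char;             /* plays the role of Emchar */

#define TOY_MAX_CHAR_LEN 4        /* plays the role of MAX_EMCHAR_LEN */

/* Store C at P in UTF-8 and return the number of bytes written. */
ptrdiff_t
toy_set_charptr (toy_byte *p, toy_char c)
{
  assert (c >= 0 && c <= 0x10FFFF);
  if (c < 0x80)
    {
      p[0] = (toy_byte) c;
      return 1;
    }
  if (c < 0x800)
    {
      p[0] = (toy_byte) (0xC0 | (c >> 6));
      p[1] = (toy_byte) (0x80 | (c & 0x3F));
      return 2;
    }
  if (c < 0x10000)
    {
      p[0] = (toy_byte) (0xE0 | (c >> 12));
      p[1] = (toy_byte) (0x80 | ((c >> 6) & 0x3F));
      p[2] = (toy_byte) (0x80 | (c & 0x3F));
      return 3;
    }
  p[0] = (toy_byte) (0xF0 | (c >> 18));
  p[1] = (toy_byte) (0x80 | ((c >> 12) & 0x3F));
  p[2] = (toy_byte) (0x80 | ((c >> 6) & 0x3F));
  p[3] = (toy_byte) (0x80 | (c & 0x3F));
  return 4;
}

/* Byte length of the character starting at P.  P must point at the
   first byte of a valid character, the same precondition the text
   states for INC_CHARPTR. */
ptrdiff_t
toy_char_length (const toy_byte *p)
{
  if (p[0] < 0x80)
    return 1;
  if (p[0] < 0xE0)
    return 2;
  if (p[0] < 0xF0)
    return 3;
  return 4;
}

/* Count the characters contained in the BC bytes starting at P. */
ptrdiff_t
toy_bytecount_to_charcount (const toy_byte *p, ptrdiff_t bc)
{
  const toy_byte *end = p + bc;
  ptrdiff_t count = 0;

  while (p < end)
    {
      p += toy_char_length (p);   /* step over one whole character */
      count++;
    }
  return count;
}
```

A caller would allocate @code{n * TOY_MAX_CHAR_LEN} bytes for @code{n}
characters and append with @code{p += toy_set_charptr (p, c)}, mirroring
the allocation and append idioms shown earlier in this section.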
2352
2353 @node Conversion to and from External Data, General Guidelines for Writing Mule-Aware Code, Working With Character and Byte Positions, Coding for Mule
2354 @subsection Conversion to and from External Data
2355
2356 When an external function, such as a C library function, returns a
2357 @code{char} pointer, you should almost never treat it as a @code{Bufbyte}
2358 pointer.  Such strings may contain 8-bit characters which can be
2359 misinterpreted by XEmacs and cause a crash.  Likewise, when
2360 exporting a piece of internal text to the outside world, you should
2361 always convert it to an appropriate external encoding, lest the internal
2362 stuff (such as the infamous \201 characters) leak out.
2363
2364 The interface to conversion between the internal and external
2365 representations of text consists of the numerous conversion macros
2366 defined in @file{buffer.h}.  There used to be a fixed set of external
2367 formats supported by these macros, but now any coding system can be used
2368 with them.  The coding system alias mechanism is used to create the
2369 following logical coding systems, which replace the fixed external
2370 formats.  The @code{dontusethis-set-symbol-value-handler} mechanism was
2371 enhanced to make this possible (more work on that is needed, such as
2372 removing the @code{dontusethis-} prefix).
2373
2374 @table @code
2375 @item Qbinary
2376 This is the simplest format and is what we use in the absence of a more
2377 appropriate format.  This converts according to the @code{binary} coding
2378 system:
2379
2380 @enumerate a
2381 @item
2382 On input, bytes 0--255 are converted into (implicitly Latin-1)
2383 characters 0--255.  A non-Mule XEmacs doesn't really know about
2384 different character sets and the fonts to display them, so the bytes can
2385 be treated as text in different 1-byte encodings by simply setting the
2386 appropriate fonts.  So in a sense, non-Mule XEmacs is a multi-lingual
2387 editor if, for example, different fonts are used to display text in
2388 different buffers, faces, or windows.  The specifier mechanism gives the
2389 user complete control over this kind of behavior.
2390 @item
2391 On output, characters 0--255 are converted into bytes 0--255 and other
2392 characters are converted into @samp{~}.
2393 @end enumerate
2394
2395 @item Qfile_name
2396 Format used for filenames.  This is user-definable via either the
2397 @code{file-name-coding-system} or @code{pathname-coding-system} (now
2398 obsolete) variables.
2399
2400 @item Qnative
2401 Format used for the external Unix environment---@code{argv[]}, stuff
2402 from @code{getenv()}, stuff from the @file{/etc/passwd} file, etc.
2403 Currently this is the same as Qfile_name.  The two should be
2404 distinguished for clarity and possible future separation.
2405
2406 @item Qctext
2407 Compound-text format.  This is the standard X11 format used for data
2408 stored in properties, selections, and the like.  This is an 8-bit
2409 no-lock-shift ISO2022 coding system.  This is a real coding system,
2410 unlike Qfile_name, which is user-definable.
2411 @end table
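
As a concrete illustration of the output rule given for @code{Qbinary}
above, here is a minimal sketch.  It is not the XEmacs encoder:
@code{toy_binary_encode} is a hypothetical function, and characters are
represented as plain @code{int}s standing in for @code{Emchar}.

```c
#include <assert.h>
#include <stddef.h>

/* Encode COUNT characters from CHARS into OUT following the output
   rule described for the binary coding system: characters 0..255 map
   to the byte of the same value, anything else becomes '~'.  OUT must
   have room for COUNT bytes.  Returns the number of bytes written. */
size_t
toy_binary_encode (const int *chars, size_t count, unsigned char *out)
{
  size_t i;

  assert (chars != NULL && out != NULL);
  for (i = 0; i < count; i++)
    {
      int c = chars[i];

      if (c >= 0 && c <= 255)
        out[i] = (unsigned char) c;
      else
        out[i] = (unsigned char) '~';
    }
  return count;
}
```

The conversion is lossy for characters outside the 0--255 range, which is
one reason to prefer a more specific coding system whenever one is known.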
2412
2413 There are two fundamental macros to convert between external and
2414 internal format.
2415
2416 @code{TO_INTERNAL_FORMAT} converts external data to internal format, and
2417 @code{TO_EXTERNAL_FORMAT} converts the other way around.  The arguments
2418 each of these receives are a source type, a source, a sink type, a sink,
2419 and a coding system (or a symbol naming a coding system).
2420
2421 A typical call looks like
2422 @example
2423 TO_EXTERNAL_FORMAT (LISP_STRING, str, C_STRING_MALLOC, ptr, Qfile_name);
2424 @end example
2425
2426 which means that the contents of the lisp string @code{str} are written
2427 to a malloc'ed memory area which will be pointed to by @code{ptr}, after
2428 the function returns.  The conversion will be done using the
2429 @code{file-name} coding system, which will be controlled by the user
2430 indirectly by setting or binding the variable
2431 @code{file-name-coding-system}.
2432
2433 Some sources and sinks require two C variables to specify.  We use some
2434 preprocessor magic to allow different source and sink types, and even
2435 different numbers of arguments to specify different types of sources and
2436 sinks.
2437
2438 So we can have a call that looks like
2439 @example
2440 TO_INTERNAL_FORMAT (DATA, (ptr, len),
2441                     MALLOC, (ptr, len),
2442                     coding_system);
2443 @end example
2444
2445 The parenthesized argument pairs are required to make the preprocessor
2446 magic work.
2447
2448 Here are the different source and sink types:
2449
2450 @table @code
2451 @item @code{DATA, (ptr, len),}
2452 input data is a fixed buffer of size @var{len} at address @var{ptr}
2453 @item @code{ALLOCA, (ptr, len),}
2454 output data is placed in an alloca()ed buffer of size @var{len} pointed to by @var{ptr}
2455 @item @code{MALLOC, (ptr, len),}
2456 output data is in a malloc()ed buffer of size @var{len} pointed to by @var{ptr}
2457 @item @code{C_STRING_ALLOCA, ptr,}
2458 equivalent to @code{ALLOCA, (ptr, len_ignored)} on output
2459 @item @code{C_STRING_MALLOC, ptr,}
2460 equivalent to @code{MALLOC, (ptr, len_ignored)} on output
2461 @item @code{C_STRING, ptr,}
2462 equivalent to @code{DATA, (ptr, strlen (ptr) + 1)} on input
2463 @item @code{LISP_STRING, string,}
2464 input or output is a Lisp_Object of type string
2465 @item @code{LISP_BUFFER, buffer,}
2466 output is written to @code{(point)} in lisp buffer @var{buffer}
2467 @item @code{LISP_LSTREAM, lstream,}
2468 input or output is a Lisp_Object of type lstream
2469 @item @code{LISP_OPAQUE, object,}
2470 input or output is a Lisp_Object of type opaque
2471 @end table
2472
2473 Often, the data is being converted to a '\0'-byte-terminated string,
2474 which is the format required by many external system C APIs.  For these
2475 purposes, a source type of @code{C_STRING} or a sink type of
2476 @code{C_STRING_ALLOCA} or @code{C_STRING_MALLOC} is appropriate.
2477 Otherwise, we should try to keep XEmacs '\0'-byte-clean, which means
2478 using (ptr, len) pairs.
2479
2480 The sinks to be specified must be lvalues, unless they are the lisp
2481 object types @code{LISP_LSTREAM} or @code{LISP_BUFFER}.
2482
2483 For the sink types @code{ALLOCA} and @code{C_STRING_ALLOCA}, the
2484 resulting text is stored in a stack-allocated buffer, which is
2485 automatically freed on returning from the function.  However, the sink
2486 types @code{MALLOC} and @code{C_STRING_MALLOC} return @code{xmalloc()}ed
2487 memory.  The caller is responsible for freeing this memory using
2488 @code{xfree()}.
2489
2490 Note that it doesn't make sense for @code{LISP_STRING} to be a source
2491 for @code{TO_INTERNAL_FORMAT} or a sink for @code{TO_EXTERNAL_FORMAT}.
2492 You'll get an assertion failure if you try.
2493
2494
2495 @node General Guidelines for Writing Mule-Aware Code, An Example of Mule-Aware Code, Conversion to and from External Data, Coding for Mule
2496 @subsection General Guidelines for Writing Mule-Aware Code
2497
2498 This section contains some general guidance on how to write Mule-aware
2499 code, as well as some pitfalls you should avoid.
2500
2501 @table @emph
2502 @item Never use @code{char} and @code{char *}.
2503 In XEmacs, the use of @code{char} and @code{char *} is almost always a
2504 mistake.  If you want to manipulate an Emacs character from ``C'', use
2505 @code{Emchar}.  If you want to examine a specific octet in the internal
2506 format, use @code{Bufbyte}.  If you want a Lisp-visible character, use a
2507 @code{Lisp_Object} and @code{make_char}.  If you want a pointer to move
2508 through the internal text, use @code{Bufbyte *}.  Also note that you
2509 almost certainly do not need @code{Emchar *}.
2510
2511 @item Be careful not to confuse @code{Charcount}, @code{Bytecount}, and @code{Bufpos}.
2512 The whole point of using different types is to avoid confusion about the
2513 use of certain variables.  Lest this effect be nullified, you need to be
2514 careful about using the right types.
2515
2516 @item Always convert external data
2517 It is extremely important to always convert external data, because
2518 XEmacs can crash if unexpected 8-bit sequences are copied to its internal
2519 buffers literally.
2520
2521 This means that when a system function, such as @code{readdir}, returns
2522 a string, you may need to convert it using one of the conversion macros
2523 described in the previous section, before passing it further to Lisp.
2524
2525 Actually, most of the basic system functions that accept '\0'-terminated
2526 string arguments, like @code{stat()} and @code{open()}, have been
2527 @strong{encapsulated} so that they @emph{always} do internal to
2528 external conversion themselves.  This means you must pass internally
2529 encoded data, typically the @code{XSTRING_DATA} of a Lisp_String, to
2530 these functions.  This is actually a design bug, since it unexpectedly
2531 changes the semantics of the system functions.  A better design would be
2532 to provide separate versions of these system functions that accepted
2533 Lisp_Objects which were lisp strings in place of their current
2534 @code{char *} arguments.
2535
2536 @example
2537 int stat_lisp (Lisp_Object path, struct stat *buf); /* Implement me */
2538 @end example
2539
2540 Also note that many internal functions, such as @code{make_string},
2541 accept Bufbytes, which removes the need for them to convert the data
2542 they receive.  This increases efficiency because that way external data
2543 needs to be decoded only once, when it is read.  After that, it is
2544 passed around in internal format.
2545 @end table
2546
2547 @node An Example of Mule-Aware Code,  , General Guidelines for Writing Mule-Aware Code, Coding for Mule
2548 @subsection An Example of Mule-Aware Code
2549
2550 As an example of Mule-aware code, we will analyze the @code{string}
2551 function, which conses up a Lisp string from the character arguments it
2552 receives.  Here is the definition, pasted from @file{alloc.c}:
2553
2554 @example
2555 @group
2556 DEFUN ("string", Fstring, 0, MANY, 0, /*
2557 Concatenate all the argument characters and make the result a string.
2558 */
2559        (int nargs, Lisp_Object *args))
2560 @{
2561   Bufbyte *storage = alloca_array (Bufbyte, nargs * MAX_EMCHAR_LEN);
2562   Bufbyte *p = storage;
2563
2564   for (; nargs; nargs--, args++)
2565     @{
2566       Lisp_Object lisp_char = *args;
2567       CHECK_CHAR_COERCE_INT (lisp_char);
2568       p += set_charptr_emchar (p, XCHAR (lisp_char));
2569     @}
2570   return make_string (storage, p - storage);
2571 @}
2572 @end group
2573 @end example
2574
2575 Now we can analyze the source line by line.
2576
2577 Obviously, the string will contain as many characters as there are
2578 arguments to the function.  This is why we allocate
2579 @code{MAX_EMCHAR_LEN} * @var{nargs} bytes on the stack, i.e. the
2580 worst-case number of bytes for @var{nargs} @code{Emchar}s to fit in the string.
2581
2582 Then, the loop checks that each element is a character, converting
2583 integers in the process.  Like many other functions in XEmacs, this
2584 function silently accepts integers where characters are expected, for
2585 historical and compatibility reasons.  Unless you know what you are
2586 doing, @code{CHECK_CHAR} will also suffice.  @code{XCHAR (lisp_char)}
2587 extracts the @code{Emchar} from the @code{Lisp_Object}, and
2588 @code{set_charptr_emchar} stores it in @code{storage}, advancing @code{p}
2589 in the process.
2590
2591 Other instructive examples of correct coding under Mule can be found all
2592 over the XEmacs code.  For starters, I recommend
2593 @code{Fnormalize_menu_item_name} in @file{menubar.c}.  After you have
2594 understood this section of the manual and studied the examples, you can
2595 proceed to write new Mule-aware code.
2596
2597 @node Techniques for XEmacs Developers,  , Coding for Mule, Rules When Writing New C Code
2598 @section Techniques for XEmacs Developers
2599
2600 To make a quantified XEmacs, do: @code{make quantmacs}.
2601
2602 You simply can't dump Quantified and Purified images.  Run the image
2603 like so:  @code{quantmacs -batch -l loadup.el run-temacs @var{xemacs-args...}}.
2604
2605 Before you go through the trouble, are you compiling with all
2606 debugging and error-checking off?  If not, try that first.  Be warned
2607 that while Quantify is directly responsible for quite a few
2608 optimizations which have been made to XEmacs, doing a run which
2609 generates results which can be acted upon is not necessarily a trivial
2610 task.
2611
2612 Also, if you're still willing to do some runs, make sure you configure
2613 with the @samp{--quantify} flag.  That will keep Quantify from starting
2614 to record data until after the loadup is completed and will shut off
2615 recording right before it shuts down (which generates enough bogus data
2616 to throw most results off).  It also enables three additional elisp
2617 commands: @code{quantify-start-recording-data},
2618 @code{quantify-stop-recording-data} and @code{quantify-clear-data}.
2619
2620 If you want to make XEmacs faster, target your favorite slow benchmark,
2621 run a profiler like Quantify, @code{gprof}, or @code{tcov}, and figure
2622 out where the cycles are going.  Specific projects:
2623
2624 @itemize @bullet
2625 @item
2626 Make the garbage collector faster.  Figure out how to write an
2627 incremental garbage collector.
2628 @item
2629 Write a compiler that takes bytecode and spits out C code.
2630 Unfortunately, you will then need a C compiler and a more fully
2631 developed module system.
2632 @item
2633 Speed up redisplay.
2634 @item
2635 Speed up syntax highlighting.  Maybe moving some of the syntax
2636 highlighting capabilities into C would make a difference.
2637 @item
2638 Implement tail recursion in Emacs Lisp (hard!).
2639 @end itemize
2640
2641 Unfortunately, Emacs Lisp is slow, and is going to stay slow.  Function
2642 calls in elisp are especially expensive.  Iterating over a long list is
2643 going to be 30 times faster implemented in C than in Elisp.
2644
2645 To get started debugging XEmacs, take a look at the @file{.gdbinit} and
2646 @file{.dbxrc} files in the @file{src} directory.
2647 @xref{Q2.1.15 - How to Debug an XEmacs problem with a debugger,,,
2648 xemacs-faq, XEmacs FAQ}.
2649
2650 After making source code changes, run @code{make check} to ensure that
2651 you haven't introduced any regressions.  If you're feeling ambitious,
2652 you can try to improve the test suite in @file{tests/automated}.
2653
2654 Here are things to know when you create a new source file:
2655
2656 @itemize @bullet
2657 @item
2658 All @file{.c} files should @code{#include <config.h>} first.  Almost all
2659 @file{.c} files should @code{#include "lisp.h"} second.
2660
2661 @item
2662 Generated header files should be included using the @code{#include <...>} syntax,
2663 not the @code{#include "..."} syntax.  The generated headers are:
2664
2665 @file{config.h sheap-adjust.h paths.h Emacs.ad.h}
2666
2667 The basic rule is that you should assume builds use @code{--srcdir},
2668 so the @code{#include <...>} syntax needs to be used when the
2669 to-be-included generated file is in a potentially different directory
2670 @emph{at compile time}.  The non-obvious C rule is that @code{#include "..."}
2671 means to search for the included file in the same directory as the
2672 including file, @emph{not} in the current directory.
2673
2674 @item
2675 Header files should @emph{not} include @code{<config.h>} and
2676 @code{"lisp.h"}.  It is the responsibility of the @file{.c} files that
2677 use them to do so.
2678
2679 @item
2680 If the header uses @code{INLINE}, either directly or through
2681 @code{DECLARE_LRECORD}, then it must be added to @file{inline.c}'s
2682 includes.
2683
2684 @item
2685 Try compiling at least once with @code{gcc}, configuring with
2686
2687 @example
2688 --with-mule --with-union-type --error-checking=all
2689 @end example
2690
2691 @item
2692 Did I mention that you should run the test suite?
2693 @example
2694 make check
2695 @end example
2696 @end itemize
2697
2698 Here is a checklist of things to do when creating a new lisp object type
2699 named @var{foo}:
2700
2701 @enumerate
2702 @item
2703 create @var{foo}.h
2704 @item
2705 create @var{foo}.c
2706 @item
2707 add definitions of syms_of_@var{foo}, etc. to @var{foo}.c
2708 @item
2709 add declarations of syms_of_@var{foo}, etc. to symsinit.h
2710 @item
2711 add calls to syms_of_@var{foo}, etc. to emacs.c(main_1)
2712 @item
2713 add definitions of macros like CHECK_FOO and FOOP to @var{foo}.h
2714 @item
2715 add the new type index to enum lrecord_type
2716 @item
2717 add DEFINE_LRECORD_IMPLEMENTATION call to @var{foo}.c
2718 @end enumerate
2719
2720 @node A Summary of the Various XEmacs Modules, Allocation of Objects in XEmacs Lisp, Rules When Writing New C Code, Top
2721 @chapter A Summary of the Various XEmacs Modules
2722
2723   This is accurate as of XEmacs 20.0.
2724
2725 @menu
2726 * Low-Level Modules::
2727 * Basic Lisp Modules::
2728 * Modules for Standard Editing Operations::
2729 * Editor-Level Control Flow Modules::
2730 * Modules for the Basic Displayable Lisp Objects::
2731 * Modules for other Display-Related Lisp Objects::
2732 * Modules for the Redisplay Mechanism::
2733 * Modules for Interfacing with the File System::
2734 * Modules for Other Aspects of the Lisp Interpreter and Object System::
2735 * Modules for Interfacing with the Operating System::
2736 * Modules for Interfacing with X Windows::
2737 * Modules for Internationalization::
2738 @end menu
2739
2740 @node Low-Level Modules, Basic Lisp Modules, A Summary of the Various XEmacs Modules, A Summary of the Various XEmacs Modules
2741 @section Low-Level Modules
2742
2743 @example
2744 config.h
2745 @end example
2746
2747 This is automatically generated from @file{config.h.in} based on the
2748 results of configure tests and user-selected optional features and
2749 contains preprocessor definitions specifying the nature of the
2750 environment in which XEmacs is being compiled.
2751
2752
2753
2754 @example
2755 paths.h
2756 @end example
2757
2758 This is automatically generated from @file{paths.h.in} based on supplied
2759 configure values, and allows for non-standard installed configurations
2760 of the XEmacs directories.  It's currently broken, though.
2761
2762
2763
2764 @example
2765 emacs.c
2766 signal.c
2767 @end example
2768
2769 @file{emacs.c} contains @code{main()} and other code that performs the most
2770 basic environment initializations and handles shutting down the XEmacs
2771 process (this includes @code{kill-emacs}, the normal way that XEmacs is
2772 exited; @code{dump-emacs}, which is used during the build process to
2773 write out the XEmacs executable; @code{run-emacs-from-temacs}, which can
2774 be used to start XEmacs directly when temacs has finished loading all
2775 the Lisp code; and emergency code to handle crashes [XEmacs tries to
2776 auto-save all files before it crashes]).
2777
2778 Low-level code that directly interacts with the Unix signal mechanism,
2779 however, is in @file{signal.c}.  Note that this code does not handle system
2780 dependencies in interfacing to signals; that is handled using the
2781 @file{syssignal.h} header file (@pxref{Modules for Interfacing with the Operating System}).
2782
2783
2784
2785 @example
2786 unexaix.c
2787 unexalpha.c
2788 unexapollo.c
2789 unexconvex.c
2790 unexec.c
2791 unexelf.c
2792 unexelfsgi.c
2793 unexencap.c
2794 unexenix.c
2795 unexfreebsd.c
2796 unexfx2800.c
2797 unexhp9k3.c
2798 unexhp9k800.c
2799 unexmips.c
2800 unexnext.c
2801 unexsol2.c
2802 unexsunos4.c
2803 @end example
2804
2805 These modules contain code for dumping out the XEmacs executable on various
2806 different systems. (This process is highly machine-specific and
2807 requires intimate knowledge of the executable format and the memory map
2808 of the process.) Only one of these modules is actually used; this is
2809 chosen by @file{configure}.
2810
2811
2812
2813 @example
2814 crt0.c
2815 lastfile.c
2816 pre-crt0.c
2817 @end example
2818
2819 These modules are used in conjunction with the dump mechanism.  On some
2820 systems, an alternative version of the C startup code (the actual code
2821 that receives control from the operating system when the process is
2822 started, and which calls @code{main()}) is required so that the dumping
2823 process works properly; @file{crt0.c} provides this.
2824
2825 @file{pre-crt0.c} and @file{lastfile.c} should be the very first and
2826 very last file linked, respectively. (Actually, this is not really true.
2827 @file{lastfile.c} should be after all Emacs modules whose initialized
2828 data should be made constant, and before all other Emacs files and all
2829 libraries.  In particular, the allocation modules @file{gmalloc.c},
2830 @file{alloca.c}, etc. are normally placed past @file{lastfile.c}, and
2831 all of the files that implement Xt widget classes @emph{must} be placed
2832 after @file{lastfile.c} because they contain various structures that
2833 must be statically initialized and into which Xt writes at various
2834 times.) @file{pre-crt0.c} and @file{lastfile.c} contain exported symbols
2835 that are used to determine the start and end of XEmacs' initialized
2836 data space when dumping.
2837
2838
2839
2840 @example
2841 alloca.c
2842 free-hook.c
2843 getpagesize.h
2844 gmalloc.c
2845 malloc.c
2846 mem-limits.h
2847 ralloc.c
2848 vm-limit.c
2849 @end example
2850
2851 These handle basic C allocation of memory.  @file{alloca.c} is an emulation of
2852 the stack allocation function @code{alloca()} on machines that lack
2853 this. (XEmacs makes extensive use of @code{alloca()} in its code.)
2854
2855 @file{gmalloc.c} and @file{malloc.c} are two implementations of the standard C
2856 functions @code{malloc()}, @code{realloc()} and @code{free()}.  They are
2857 often used in place of the standard system-provided @code{malloc()}
2858 because they usually provide a much faster implementation, at the
2859 expense of additional memory use.  @file{gmalloc.c} is a newer implementation
2860 that is much more memory-efficient for large allocations than @file{malloc.c},
2861 and should always be preferred if it works. (At one point, @file{gmalloc.c}
2862 didn't work on some systems where @file{malloc.c} worked; but this should be
2863 fixed now.)
2864
2865 @cindex relocating allocator
2866 @file{ralloc.c} is the @dfn{relocating allocator}.  It provides
2867 functions similar to @code{malloc()}, @code{realloc()} and @code{free()}
2868 that allocate memory that can be dynamically relocated in memory.  The
2869 advantage of this is that allocated memory can be shuffled around to
2870 place all the free memory at the end of the heap, and the heap can then
2871 be shrunk, releasing the memory back to the operating system.  The use
2872 of this can be controlled with the configure option @code{--rel-alloc};
2873 if enabled, memory allocated for buffers will be relocatable, so that if
2874 a very large file is visited and the buffer is later killed, the memory
2875 can be released to the operating system.  (The disadvantage of this
2876 mechanism is that it can be very slow.  On systems with the
2877 @code{mmap()} system call, the XEmacs version of @file{ralloc.c} uses
2878 this to move memory around without actually having to block-copy it,
2879 which can speed things up; but it can still cause noticeable performance
2880 degradation.)
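
The relocating idea can be sketched as follows.  This is a toy
illustration under stated assumptions, not the code in @file{ralloc.c}:
the names @code{toy_r_alloc()}, @code{toy_r_free()} and
@code{toy_r_used()}, the fixed-size arena, and the block table are all
hypothetical.  The point it shows is that the caller hands over the
@emph{address} of its pointer, so the allocator can update that pointer
whenever it slides the block to a new location.

```c
#include <stddef.h>
#include <string.h>

#define ARENA_SIZE 1024
#define MAX_BLOCKS 16

/* One live block: where it sits in the arena and which caller variable
   points at it (so that variable can be patched when the block moves).
   Hypothetical layout; alignment is ignored to keep the sketch short. */
struct toy_block { size_t offset, size; void **owner; };

static unsigned char arena[ARENA_SIZE];
static struct toy_block blocks[MAX_BLOCKS];   /* kept in address order */
static size_t nblocks, arena_used;

/* Allocate SIZE bytes and store the address in *OWNER.
   Returns 1 on success, 0 if the arena or block table is full. */
int toy_r_alloc(void **owner, size_t size)
{
    if (nblocks == MAX_BLOCKS || size > ARENA_SIZE - arena_used)
        return 0;
    blocks[nblocks].offset = arena_used;
    blocks[nblocks].size = size;
    blocks[nblocks].owner = owner;
    nblocks++;
    *owner = arena + arena_used;
    arena_used += size;
    return 1;
}

/* Free the block owned by OWNER, then slide every later block down so
   that all free space ends up at the end of the arena. */
void toy_r_free(void **owner)
{
    size_t i, j, hole;

    for (i = 0; i < nblocks && blocks[i].owner != owner; i++)
        ;
    if (i == nblocks)
        return;                                /* not one of ours */
    hole = blocks[i].size;
    for (j = i + 1; j < nblocks; j++) {
        memmove(arena + blocks[j].offset - hole,
                arena + blocks[j].offset, blocks[j].size);
        blocks[j].offset -= hole;
        *blocks[j].owner = arena + blocks[j].offset;   /* relocate */
        blocks[j - 1] = blocks[j];
    }
    nblocks--;
    arena_used -= hole;
    *owner = NULL;
}

/* Bytes currently in use; everything past this could be given back. */
size_t toy_r_used(void) { return arena_used; }
```

After @code{toy_r_free()} returns, every surviving pointer registered
with @code{toy_r_alloc()} is valid again but may hold a different
address, which is why code using such an allocator must never keep
private copies of those pointers.  The block copying done here is also
the kind of cost the paragraph above warns about.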
2881
2882 @file{free-hook.c} contains some debugging functions for checking for invalid
2883 arguments to @code{free()}.
2884
2885 @file{vm-limit.c} contains some functions that warn the user when memory is
2886 getting low.  These are callback functions that are called by @file{gmalloc.c}
2887 and @file{malloc.c} at appropriate times.
2888
2889 @file{getpagesize.h} provides a uniform interface for retrieving the size of a
2890 page in virtual memory.  @file{mem-limits.h} provides a uniform interface for
2891 retrieving the total amount of available virtual memory.  Both are
2892 similar in spirit to the @file{sys*.h} files (@pxref{Modules for Interfacing with the Operating System}).
2893
2894
2895
2896 @example
2897 blocktype.c
2898 blocktype.h
2899 dynarr.c
2900 @end example
2901
2902 These implement a couple of basic C data types to facilitate memory
2903 allocation.  The @code{Blocktype} type efficiently manages the
2904 allocation of fixed-size blocks by minimizing the number of times that
2905 @code{malloc()} and @code{free()} are called.  It allocates memory in
2906 large chunks, subdivides the chunks into blocks of the proper size, and
2907 returns the blocks as requested.  When blocks are freed, they are placed
2908 onto a linked list, so they can be efficiently reused.  This data type
2909 is not much used in XEmacs currently, because it's a fairly new
2910 addition.
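
A minimal sketch of this kind of fixed-size block allocator is shown
below.  It is not the interface of @file{blocktype.c}; the names
(@code{toy_blocktype_init()} and so on) and the chunk size of 64 blocks
are assumptions made for illustration.

```c
#include <stdlib.h>

#define BLOCKS_PER_CHUNK 64   /* hypothetical chunk size */

/* A free block is reused to hold the link to the next free block. */
struct free_link { struct free_link *next; };

struct toy_blocktype {
    size_t elsize;                /* rounded-up size of each block */
    struct free_link *free_list;  /* blocks available for reuse */
    size_t chunks_allocated;      /* how many times malloc() was called */
};

void toy_blocktype_init(struct toy_blocktype *bt, size_t elsize)
{
    /* Round up so that every block can hold a link and stays aligned
       for pointers (stricter alignment is not handled in this sketch). */
    size_t align = sizeof(struct free_link);
    bt->elsize = (elsize + align - 1) / align * align;
    if (bt->elsize < align)
        bt->elsize = align;
    bt->free_list = NULL;
    bt->chunks_allocated = 0;
}

/* Return BLOCK to the free list so that it can be handed out again. */
void toy_blocktype_free(struct toy_blocktype *bt, void *block)
{
    struct free_link *link = block;
    link->next = bt->free_list;
    bt->free_list = link;
}

/* Hand out one block, calling malloc() only when the free list is empty. */
void *toy_blocktype_alloc(struct toy_blocktype *bt)
{
    struct free_link *link;

    if (bt->free_list == NULL) {
        /* Out of blocks: grab one large chunk and carve it up. */
        char *chunk = malloc(bt->elsize * BLOCKS_PER_CHUNK);
        size_t i;
        if (chunk == NULL)
            return NULL;
        bt->chunks_allocated++;
        for (i = 0; i < BLOCKS_PER_CHUNK; i++)
            toy_blocktype_free(bt, chunk + i * bt->elsize);
    }
    link = bt->free_list;
    bt->free_list = link->next;
    return link;
}
```

In this sketch, allocating 100 blocks calls @code{malloc()} only twice,
and a freed block is the first one handed out again.  Chunks are never
returned to the system here, which keeps the example short.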
2911
2912 @cindex dynamic array
2913 The @code{Dynarr} type implements a @dfn{dynamic array}, which is
2914 similar to a standard C array but has no fixed limit on the number of
2915 elements it can contain.  Dynamic arrays can hold elements of any type,
2916 and when you add a new element, the array automatically resizes itself
2917 if it isn't big enough.  Dynarrs are extensively used in the redisplay
2918 mechanism.
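
The automatic resizing can be sketched like this.  The structure and the
@code{toy_dynarr_*} names are hypothetical and are not the interface
exported by @file{dynarr.c}; the doubling growth policy is likewise an
assumption.

```c
#include <stdlib.h>
#include <string.h>

struct toy_dynarr {
    void *base;      /* storage for the elements */
    size_t elsize;   /* size of one element */
    size_t len;      /* elements in use */
    size_t cap;      /* elements allocated */
};

void toy_dynarr_init(struct toy_dynarr *d, size_t elsize)
{
    d->base = NULL;
    d->elsize = elsize;
    d->len = 0;
    d->cap = 0;
}

/* Append a copy of *EL, doubling the storage when it is full.
   Returns 1 on success, 0 if memory could not be obtained. */
int toy_dynarr_add(struct toy_dynarr *d, const void *el)
{
    if (d->len == d->cap) {
        size_t newcap = d->cap ? d->cap * 2 : 8;
        void *p = realloc(d->base, newcap * d->elsize);
        if (p == NULL)
            return 0;
        d->base = p;
        d->cap = newcap;
    }
    memcpy((char *) d->base + d->len * d->elsize, el, d->elsize);
    d->len++;
    return 1;
}

/* Address of element I; the caller casts it to the element type.
   The address is only valid until the next toy_dynarr_add(). */
void *toy_dynarr_at(struct toy_dynarr *d, size_t i)
{
    return (char *) d->base + i * d->elsize;
}

void toy_dynarr_free(struct toy_dynarr *d)
{
    free(d->base);
    d->base = NULL;
    d->len = d->cap = 0;
}
```

Because the storage may move when the array grows, callers in a design
like this should hold indices rather than element addresses.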
2919
2920
2921
2922 @example
2923 inline.c
2924 @end example
2925
2926 This module is used in connection with inline functions (available in
2927 some compilers).  Often, inline functions need to have a corresponding
2928 non-inline function that does the same thing.  This module is where they
2929 reside.  It contains no actual code, but defines some special flags that
2930 cause inline functions defined in header files to be rendered as actual
2931 functions.  It then includes all header files that contain any inline
2932 function definitions, so that each one gets a real function equivalent.
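
The general technique might look like the following.  The macro name
@code{TOY_INLINE} and the header @file{toy.h} are made up for this
sketch, and the flags actually used by @file{inline.c} may differ.  To
keep the example in one piece, the contents of the hypothetical header
are shown in place rather than pulled in with @code{#include}.

```c
/* toy-inline.c (hypothetical): the one file that turns the inline
   definitions found in headers into ordinary functions. */
#define TOY_INLINE   /* empty: emit a real, externally visible function */

/* ---- contents of a hypothetical header, toy.h ---- */
#ifndef TOY_INLINE
#define TOY_INLINE static inline   /* every other file gets an inline copy */
#endif

TOY_INLINE int toy_square(int x) { return x * x; }
/* ---- end of toy.h ---- */

/* Since toy_square() now exists as a real function in this file,
   other code can link against it or take its address. */
int (*toy_square_ptr)(int) = toy_square;
```

Every other source file would include the header without defining the
macro first and so would see only the inline version.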
2933
2934
2935
2936 @example
2937 debug.c
2938 debug.h
2939 @end example
2940
2941 These functions provide a system for doing internal consistency checks
2942 during code development.  This system is not currently used; instead the
2943 simpler @code{assert()} macro is used along with the various checks
2944 provided by the @samp{--error-check-*} configuration options.
2945
2946
2947
2948 @example
2949 prefix-args.c
2950 @end example
2951
2952 This is actually the source for a small, self-contained program
2953 used during building.
2954
2955
2956 @example
2957 universe.h
2958 @end example
2959
2960 This is not currently used.
2961
2962
2963
2964 @node Basic Lisp Modules, Modules for Standard Editing Operations, Low-Level Modules, A Summary of the Various XEmacs Modules
2965 @section Basic Lisp Modules
2966
2967 @example
2968 emacsfns.h
2969 lisp-disunion.h
2970 lisp-union.h
2971 lisp.h
2972 lrecord.h
2973 symsinit.h
2974 @end example
2975
2976 These are the basic header files for all XEmacs modules.  Each module
2977 includes @file{lisp.h}, which brings the other header files in.
2978 @file{lisp.h} contains the definitions of the structures and extractor
2979 and constructor macros for the basic Lisp objects and various other
2980 basic definitions for the Lisp environment, as well as some
2981 general-purpose definitions (e.g. @code{min()} and @code{max()}).
2982 @file{lisp.h} includes either @file{lisp-disunion.h} or
2983 @file{lisp-union.h}, depending on whether @code{USE_UNION_TYPE} is
2984 defined.  These files define the typedef of the Lisp object itself (as
2985 described above) and the low-level macros that hide the actual
2986 implementation of the Lisp object.  All extractor and constructor macros
2987 for particular types of Lisp objects are defined in terms of these
2988 low-level macros.
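
As a rough sketch of how two representations can hide behind one set of
low-level macros, consider the following.  The type @code{Toy_Object},
the two-bit tag, and every @code{TOY_*} name are assumptions made for
illustration; the real layouts in @file{lisp-union.h} and
@file{lisp-disunion.h} are more involved.

```c
#include <stdint.h>

enum toy_type { TOY_INT = 0, TOY_CHAR = 1, TOY_RECORD = 2 };
#define TOY_TAG_BITS 2

#ifdef USE_UNION_TYPE
/* Union flavour: the compiler picks the fields apart. */
typedef union {
    struct { unsigned int tag : TOY_TAG_BITS; signed int val : 30; } s;
    uint32_t bits;
} Toy_Object;

static Toy_Object toy_make(enum toy_type t, int v)
{
    Toy_Object o;
    o.s.tag = t;
    o.s.val = v;
    return o;
}
#define TOY_TYPE(o) ((enum toy_type) (o).s.tag)
#define TOY_VAL(o)  ((int) (o).s.val)

#else
/* Non-union flavour: a plain integer, taken apart with shifts and
   masks.  Only non-negative values are handled portably here. */
typedef int32_t Toy_Object;
#define toy_make(t, v) \
    ((Toy_Object) (((uint32_t) (v) << TOY_TAG_BITS) | (uint32_t) (t)))
#define TOY_TYPE(o) ((enum toy_type) ((o) & ((1 << TOY_TAG_BITS) - 1)))
#define TOY_VAL(o)  ((int) ((o) >> TOY_TAG_BITS))
#endif

/* Type-specific macros are written once, in terms of the low-level
   ones, and work unchanged with either representation. */
#define TOY_INTP(o)  (TOY_TYPE (o) == TOY_INT)
#define TOY_CHARP(o) (TOY_TYPE (o) == TOY_CHAR)
```

Code that sticks to @code{TOY_INTP()}, @code{TOY_CHARP()} and
@code{TOY_VAL()} compiles the same way whether or not
@code{USE_UNION_TYPE} is defined.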
2989
2990 As a general rule, all typedefs should go into the typedefs section of
2991 @file{lisp.h} rather than into a module-specific header file even if the
2992 structure is defined elsewhere.  This allows function prototypes that
2993 use the typedef to be placed into other header files.  Forward structure
2994 declarations (i.e. a simple declaration like @code{struct foo;} where
2995 the structure itself is defined elsewhere) should be placed into the
2996 typedefs section as necessary.
2997
2998 @file{lrecord.h} contains the basic structures and macros that implement
2999 all record-type Lisp objects---i.e. all objects whose type is a field
3000 in their C structure, which includes all objects except the few most
3001 basic ones.
3002
3003 @file{lisp.h} contains prototypes for most of the exported functions in
3004 the various modules.  Lisp primitives defined using @code{DEFUN} that
3005 need to be called by C code should be declared using @code{EXFUN}.
3006 Other function prototypes should be placed either into the appropriate
3007 section of @file{lisp.h}, or into a module-specific header file,
3008 depending on how general-purpose the function is and whether it has
3009 special-purpose argument types requiring definitions not in
3010 @file{lisp.h}.  All initialization functions are prototyped in
3011 @file{symsinit.h}.
3012
3013
3014
3015 @example
3016 alloc.c
3017 @end example
3018
3019 The large module @file{alloc.c} implements all of the basic allocation and
3020 garbage collection for Lisp objects.  The most commonly used Lisp
3021 objects are allocated in chunks, similar to the Blocktype data type
3022 described above; others are allocated in individually @code{malloc()}ed
3023 blocks.  This module provides the foundation on which all other aspects
3024 of the Lisp environment sit, and is the first module initialized at
3025 startup.
3026
3027 Note that @file{alloc.c} provides a series of generic functions that are
3028 not dependent on any particular object type, and interfaces to
3029 particular types of objects using a standardized interface of
3030 type-specific methods.  This scheme is a fundamental principle of
3031 object-oriented programming and is heavily used throughout XEmacs.  The
3032 great advantage of this is that it allows for a clean separation of
3033 functionality into different modules---new classes of Lisp objects, new
3034 event interfaces, new device types, new stream interfaces, etc. can be
3035 added transparently without affecting code anywhere else in XEmacs.
3036 Because the different subsystems are divided into general and specific
3037 code, adding a new subtype within a subsystem will in general not
3038 require changes to the generic subsystem code or affect any of the other
3039 subtypes in the subsystem; this provides a great deal of robustness to
3040 the XEmacs code.
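
The split between generic code and type-specific methods can be
sketched in C as a table of function pointers.  Everything named
@code{toy_*} below is hypothetical, and the single @code{describe}
method merely stands in for whatever methods a real object type has to
supply.

```c
#include <stdio.h>

struct toy_header;

/* The methods every object type must supply. */
struct toy_methods {
    const char *name;
    int (*describe)(const struct toy_header *obj, char *buf, size_t len);
};

/* Every object starts with a header that identifies its type. */
struct toy_header { const struct toy_methods *methods; };

/* Generic code: knows nothing about any particular type. */
int toy_describe(const void *obj, char *buf, size_t len)
{
    const struct toy_header *h = obj;
    return h->methods->describe(h, buf, len);
}

/* One specific type, which could live in a module of its own. */
struct toy_point { struct toy_header header; int x, y; };

static int point_describe(const struct toy_header *obj, char *buf, size_t len)
{
    const struct toy_point *p = (const struct toy_point *) obj;
    return snprintf(buf, len, "#<point %d,%d>", p->x, p->y);
}

static const struct toy_methods toy_point_methods = { "point", point_describe };

struct toy_point toy_make_point(int x, int y)
{
    struct toy_point p = { { &toy_point_methods }, x, y };
    return p;
}
```

Adding another type means writing another method table and
constructor; @code{toy_describe()} and the existing types are left
untouched, which is the robustness property described above.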
3041
3042
3043 @example
3044 eval.c
3045 backtrace.h
3046 @end example
3047
3048 This module contains all of the functions to handle the flow of control.
3049 This includes the mechanisms of defining functions, calling functions,
3050 traversing stack frames, and binding variables; the control primitives
3051 and other special forms such as @code{while}, @code{if}, @code{eval},
3052 @code{let}, @code{and}, @code{or}, @code{progn}, etc.; handling of
3053 non-local exits, unwind-protects, and exception handlers; entering the
3054 debugger; methods for the subr Lisp object type; etc.  It does
3055 @emph{not} include the @code{read} function, the @code{print} function,
3056 or the handling of symbols and obarrays.
3057
3058 @file{backtrace.h} contains some structures related to stack frames and the
3059 flow of control.
3060
3061
3062
3063 @example
3064 lread.c
3065 @end example
3066
3067 This module implements the Lisp reader and the @code{read} function,
3068 which converts text into Lisp objects, according to the read syntax of
3069 the objects, as described above.  This is similar to the parser that is
3070 a part of all compilers.
3071
3072
3073
3074 @example
3075 print.c
3076 @end example
3077
3078 This module implements the Lisp print mechanism and the @code{print}
3079 function and related functions.  This is the inverse of the Lisp reader:
3080 it converts Lisp objects to a printed, textual representation
3081 (ideally, one that can be read back in using @code{read} to get
3082 an equivalent object).
3083
3084
3085
3086 @example
3087 general.c
3088 symbols.c
3089 symeval.h
3090 @end example
3091
3092 @file{symbols.c} implements the handling of symbols, obarrays, and
3093 retrieving the values of symbols.  Much of the code is devoted to
3094 handling the special @dfn{symbol-value-magic} objects that define
3095 special types of variables---this includes buffer-local variables,
3096 variable aliases, variables that forward into C variables, etc.  This
3097 module is initialized extremely early (right after @file{alloc.c}),
3098 because it is here that the basic symbols @code{t} and @code{nil} are
3099 created, and those symbols are used everywhere throughout XEmacs.
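
The core job of an obarray, mapping a name to one unique symbol object,
can be sketched as a small chained hash table.  This is an illustration
only: @code{toy_intern()}, the bucket count, and the hash function are
assumptions, and none of the symbol-value-magic machinery is shown.

```c
#include <stdlib.h>
#include <string.h>

#define TOY_OBARRAY_SIZE 61   /* hypothetical number of buckets */

struct toy_symbol {
    char *name;
    struct toy_symbol *next;   /* next symbol in the same bucket */
};

static struct toy_symbol *toy_obarray[TOY_OBARRAY_SIZE];

static unsigned toy_hash(const char *s)
{
    unsigned h = 0;
    while (*s)
        h = h * 31 + (unsigned char) *s++;
    return h % TOY_OBARRAY_SIZE;
}

/* Return the unique symbol named NAME, creating it on first use.
   Returns NULL only if memory runs out. */
struct toy_symbol *toy_intern(const char *name)
{
    unsigned h = toy_hash(name);
    struct toy_symbol *sym;

    for (sym = toy_obarray[h]; sym != NULL; sym = sym->next)
        if (strcmp(sym->name, name) == 0)
            return sym;               /* already interned */

    sym = malloc(sizeof *sym);
    if (sym == NULL)
        return NULL;
    sym->name = malloc(strlen(name) + 1);
    if (sym->name == NULL) {
        free(sym);
        return NULL;
    }
    strcpy(sym->name, name);
    sym->next = toy_obarray[h];
    toy_obarray[h] = sym;
    return sym;
}
```

Because interning the same name twice yields the same pointer, two
symbols can be compared with a single pointer comparison, which is one
reason symbols such as @code{t} and @code{nil} can be used so freely.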
3100
3101 @file{symeval.h} contains the definitions of symbol structures and the
3102 @code{DEFVAR_LISP()} and related macros for declaring variables.
3103
3104
3105
3106 @example
3107 data.c
3108 floatfns.c
3109 fns.c
3110 @end example
3111
3112 These modules implement the methods and standard Lisp primitives for all
3113 the basic Lisp object types other than symbols (which are described
3114 above).  @file{data.c} contains all the predicates (primitives that return
3115 whether an object is of a particular type); the integer arithmetic
3116 functions; and the basic accessor and mutator primitives for the various
3117 object types.  @file{fns.c} contains all the standard primitives for working
3118 with sequences (where, abstractly speaking, a sequence is an ordered set
3119 of objects, and can be represented by a list, string, vector, or
3120 bit-vector); it also contains @code{equal}, perhaps on the grounds that
3121 the bulk of the operation of @code{equal} is comparing sequences.
3122 @file{floatfns.c} contains methods and primitives for floats and floating-point
3123 arithmetic.
3124
3125
3126
3127 @example
3128 bytecode.c
3129 bytecode.h
3130 @end example
3131
3132 @file{bytecode.c} implements the byte-code interpreter and
3133 compiled-function objects, and @file{bytecode.h} contains associated
3134 structures.  Note that the byte-code @emph{compiler} is written in Lisp.
3135
3136
3137
3138
3139 @node Modules for Standard Editing Operations, Editor-Level Control Flow Modules, Basic Lisp Modules, A Summary of the Various XEmacs Modules
3140 @section Modules for Standard Editing Operations
3141
3142 @example
3143 buffer.c
3144 buffer.h
3145 bufslots.h
3146 @end example
3147
3148 @file{buffer.c} implements the @dfn{buffer} Lisp object type.  This
3149 includes functions that create and destroy buffers; retrieve buffers by
3150 name or by other properties; manipulate lists of buffers (remember that
3151 buffers are permanent objects and stored in various ordered lists);
3152 retrieve or change buffer properties; etc.  It also contains the
3153 definitions of all the built-in buffer-local variables (which can be
3154 viewed as buffer properties).  It does @emph{not} contain code to
3155 manipulate buffer-local variables (that's in @file{symbols.c}, described
3156 above); or code to manipulate the text in a buffer.
3157
3158 @file{buffer.h} defines the structures associated with a buffer and the various
3159 macros for retrieving text from a buffer and special buffer positions
3160 (e.g. @code{point}, the default location for text insertion).  It also
3161 contains macros for working with buffer positions and converting between
3162 their representations as character offsets and as byte offsets (under
3163 MULE, they are different, because characters can be multi-byte).  It is
3164 one of the largest header files.
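
To see why the two kinds of offsets differ, here is a small sketch of
the conversion.  It uses UTF-8 purely as a familiar stand-in for a
multi-byte encoding; the internal MULE representation is different, and
the function names are hypothetical.  A linear scan is used for
simplicity, with no caching of known positions.

```c
#include <stddef.h>

/* In UTF-8, continuation bytes look like 10xxxxxx; every other byte
   starts a new character. */
static int starts_char(unsigned char c) { return (c & 0xC0) != 0x80; }

/* Byte offset of the character with index CHARPOS (0-based) in TEXT,
   or NBYTES if CHARPOS is at or past the end. */
size_t toy_char_to_byte(const unsigned char *text, size_t nbytes,
                        size_t charpos)
{
    size_t bytepos = 0;
    while (bytepos < nbytes && charpos > 0) {
        bytepos++;                      /* step over the lead byte */
        while (bytepos < nbytes && !starts_char(text[bytepos]))
            bytepos++;                  /* and its continuation bytes */
        charpos--;
    }
    return bytepos;
}

/* Number of characters that start before byte offset BYTEPOS. */
size_t toy_byte_to_char(const unsigned char *text, size_t bytepos)
{
    size_t charpos = 0, i;
    for (i = 0; i < bytepos; i++)
        if (starts_char(text[i]))
            charpos++;
    return charpos;
}
```

For the four bytes @code{'a'}, @code{0xC3}, @code{0xA9}, @code{'z'}
(three characters), character index 2 corresponds to byte offset 3.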
3165
3166 @file{bufslots.h} defines the fields in the buffer structure that correspond to
3167 the built-in buffer-local variables.  It is its own header file because
3168 it is included many times in @file{buffer.c}, as a way of iterating over all
3169 the built-in buffer-local variables.
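
The trick of expanding one list of slots several times can be sketched
as follows.  To stay self-contained, the sketch keeps the list in a
macro, @code{TOY_BUFFER_SLOTS}, instead of in a separate header that is
included repeatedly; that macro, @code{TOY_SLOT}, and the slot names are
all hypothetical.

```c
#include <stddef.h>

/* Stand-in for the contents of a slots header: one entry per slot. */
#define TOY_BUFFER_SLOTS    \
    TOY_SLOT(fill_column)   \
    TOY_SLOT(tab_width)     \
    TOY_SLOT(left_margin)

/* First expansion: declare one structure field per slot. */
struct toy_buffer {
#define TOY_SLOT(name) int name;
    TOY_BUFFER_SLOTS
#undef TOY_SLOT
};

/* Second expansion: build a table of the slot names. */
static const char *const toy_slot_names[] = {
#define TOY_SLOT(name) #name,
    TOY_BUFFER_SLOTS
#undef TOY_SLOT
};

/* Third expansion: visit every slot, here to reset it. */
void toy_reset_buffer(struct toy_buffer *b)
{
#define TOY_SLOT(name) b->name = 0;
    TOY_BUFFER_SLOTS
#undef TOY_SLOT
}

size_t toy_slot_count(void)
{
    return sizeof toy_slot_names / sizeof toy_slot_names[0];
}

const char *toy_slot_name(size_t i) { return toy_slot_names[i]; }
```

Adding a slot to the list updates the structure, the name table, and
the reset function together, so the three can never get out of step.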
3170
3171
3172
3173 @example
3174 insdel.c
3175 insdel.h
3176 @end example
3177
3178 @file{insdel.c} contains low-level functions for inserting and deleting text in
3179 a buffer, keeping track of changed regions for use by redisplay, and
3180 calling any before-change and after-change functions that may have been
3181 registered for the buffer.  It also contains the actual functions that
3182 convert between byte offsets and character offsets.
3183
3184 @file{insdel.h} contains associated headers.
3185
3186
3187
3188 @example
3189 marker.c
3190 @end example
3191
3192 This module implements the @dfn{marker} Lisp object type, which
3193 conceptually is a pointer to a text position in a buffer that moves
3194 around as text is inserted and deleted, so as to remain in the same
3195 relative position.  This module doesn't actually move the markers around;
3196 that's handled in @file{insdel.c}.  This module just creates them and
3197 implements the primitives for working with them.  As markers are simple
3198 objects, this does not entail much.
3199
3200 Note that the standard arithmetic primitives (e.g. @code{+}) accept
3201 markers in place of integers and automatically substitute the value of
3202 @code{marker-position} for the marker, i.e. an integer describing the
3203 current buffer position of the marker.
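
How marker positions might be adjusted when text changes can be
sketched as below.  This is not the code in @file{insdel.c}; the
function names are hypothetical, and the choice that a marker sitting
exactly at the insertion point stays put is an assumption of the sketch.

```c
#include <stddef.h>

struct toy_marker { size_t pos; };

/* Adjust N markers after LEN characters were inserted at position AT.
   Markers after the insertion point shift right; markers at or before
   it stay where they are (an assumed policy). */
void toy_markers_after_insert(struct toy_marker *m, size_t n,
                              size_t at, size_t len)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (m[i].pos > at)
            m[i].pos += len;
}

/* Adjust N markers after the characters in [FROM, TO) were deleted.
   Markers past the deleted range shift left; markers inside it
   collapse to FROM. */
void toy_markers_after_delete(struct toy_marker *m, size_t n,
                              size_t from, size_t to)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (m[i].pos >= to)
            m[i].pos -= to - from;
        else if (m[i].pos > from)
            m[i].pos = from;
    }
}
```

With markers at 2, 5 and 9, inserting three characters at position 5
leaves them at 2, 5 and 12; deleting the range from 4 to 8 afterwards
leaves them at 2, 4 and 8.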
3204
3205
3206
3207 @example
3208 extents.c
3209 extents.h
3210 @end example
3211
3212 This module implements the @dfn{extent} Lisp object type, which is like
3213 a marker that works over a range of text rather than a single position.
3214 Extents are also much more complex and powerful than markers and have a
3215 more efficient (and more algorithmically complex) implementation.  The
3216 implementation is described in detail in comments in @file{extents.c}.
3217
3218 The code in @file{extents.c} works closely with @file{insdel.c} so that
3219 extents are properly moved around as text is inserted and deleted.
3220 There is also code in @file{extents.c} that provides information needed
3221 by the redisplay mechanism for efficient operation. (Remember that
3222 extents can have display properties that affect [sometimes drastically,
3223 as in the @code{invisible} property] the display of the text they
3224 cover.)
3225
3226
3227
3228 @example
3229 editfns.c
3230 @end example
3231
3232 @file{editfns.c} contains the standard Lisp primitives for working with
3233 a buffer's text, and calls the low-level functions in @file{insdel.c}.
3234 It also contains primitives for working with @code{point} (the default
3235 buffer insertion location).
3236
3237 @file{editfns.c} also contains functions for retrieving various
3238 characteristics from the external environment: the current time, the
3239 process ID of the running XEmacs process, the name of the user who ran
3240 this XEmacs process, etc.  It's not clear why this code is in
3241 @file{editfns.c}.
3242
3243
3244
3245 @example
3246 callint.c
3247 cmds.c
3248 commands.h
3249 @end example
3250
3251 @cindex interactive
3252 These modules implement the basic @dfn{interactive} commands,
3253 i.e. user-callable functions.  Commands, as opposed to other functions,
3254 have special ways of getting their parameters interactively (by querying
3255 the user), as opposed to having them passed in a normal function
3256 invocation.  Many commands are not really meant to be called from other
3257 Lisp functions, because they modify global state in a way that's often
3258 undesired as part of other Lisp functions.
3259
3260 @file{callint.c} implements the mechanism for querying the user for
3261 parameters and calling interactive commands.  The bulk of this module is
3262 code that parses the interactive spec that is supplied with an
3263 interactive command.
3264
3265 @file{cmds.c} implements the basic, most commonly used editing commands:
3266 commands to move around the current buffer and insert and delete
3267 characters.  These commands are implemented using the Lisp primitives
3268 defined in @file{editfns.c}.
3269
3270 @file{commands.h} contains associated structure definitions and prototypes.
3271
3272
3273
3274 @example
3275 regex.c
3276 regex.h
3277 search.c
3278 @end example
3279
3280 @file{search.c} implements the Lisp primitives for searching for text in
3281 a buffer, and some of the low-level algorithms for doing this.  In
3282 particular, the fast fixed-string Boyer-Moore search algorithm is
3283 implemented in @file{search.c}.  The low-level algorithms for doing
3284 regular-expression searching, however, are implemented in @file{regex.c}
3285 and @file{regex.h}.  These two modules are largely independent of
3286 XEmacs, and are similar to (and based upon) the regular-expression
3287 routines used in @file{grep} and other GNU utilities.
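
For readers unfamiliar with this family of algorithms, the following
sketch shows the Horspool simplification of Boyer-Moore, which keeps
only the bad-character skip table.  It is not the routine in
@file{search.c}, the function name is hypothetical, and it compares
candidate positions with @code{memcmp()} rather than right to left as
the classic algorithm does.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Return the offset of the first occurrence of PAT (M bytes) in TEXT
   (N bytes), or -1 if there is none. */
long toy_horspool_search(const unsigned char *text, size_t n,
                         const unsigned char *pat, size_t m)
{
    size_t skip[UCHAR_MAX + 1];
    size_t i, pos;

    if (m == 0)
        return 0;
    if (m > n)
        return -1;

    /* By default, a mismatch lets us jump the whole pattern length. */
    for (i = 0; i <= UCHAR_MAX; i++)
        skip[i] = m;
    /* Characters that occur in the pattern (other than in its last
       position) allow only the shorter jump that lines them up. */
    for (i = 0; i + 1 < m; i++)
        skip[pat[i]] = m - 1 - i;

    pos = 0;
    while (pos + m <= n) {
        if (memcmp(text + pos, pat, m) == 0)
            return (long) pos;
        /* Jump according to the text character under the pattern's end. */
        pos += skip[text[pos + m - 1]];
    }
    return -1;
}
```

The speed comes from the skip table: on most mismatches the search
advances by several characters at once instead of one.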
3288
3289
3290
3291 @example
3292 doprnt.c
3293 @end example
3294
3295 @file{doprnt.c} implements formatted-string processing, similar to
3296 the @code{printf()} function in C.
3297
3298
3299
3300 @example
3301 undo.c
3302 @end example
3303
3304 This module implements the undo mechanism for tracking buffer changes.
3305 Most of this could be implemented in Lisp.
3306
3307
3308
3309 @node Editor-Level Control Flow Modules, Modules for the Basic Displayable Lisp Objects, Modules for Standard Editing Operations, A Summary of the Various XEmacs Modules
3310 @section Editor-Level Control Flow Modules
3311
3312 @example
3313 event-Xt.c
3314 event-stream.c
3315 event-tty.c
3316 events.c
3317 events.h
3318 @end example
3319
3320 These implement the handling of events (user input and other system
3321 notifications).
3322
3323 @file{events.c} and @file{events.h} define the @dfn{event} Lisp object
3324 type and primitives for manipulating it.
3325
3326 @file{event-stream.c} implements the basic functions for working with
3327 event queues, dispatching an event by looking it up in relevant keymaps
3328 and such, and handling timeouts; this includes the primitives
3329 @code{next-event} and @code{dispatch-event}, as well as related
3330 primitives such as @code{sit-for}, @code{sleep-for}, and
3331 @code{accept-process-output}. (@file{event-stream.c} is one of the
3332 hairiest and trickiest modules in XEmacs.  Beware!  You can easily mess
3333 things up here.)
3334
3335 @file{event-Xt.c} and @file{event-tty.c} implement the low-level
3336 interfaces for retrieving events from Xt (the X toolkit) and from TTYs
3337 (using @code{read()} and @code{select()}), respectively.  The event
3338 interface enforces a clean separation between the specific code for
3339 interfacing with the operating system and the generic code for working
3340 with events, by defining an API of basic, low-level event methods;
3341 @file{event-Xt.c} and @file{event-tty.c} are two different
3342 implementations of this API.  To add support for a new operating system
3343 (e.g. NeXTstep), one merely needs to provide another implementation of
3344 those API functions.
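
An API of low-level event methods with interchangeable implementations
can be sketched like this.  The structure, its two methods, and the
replay back end are all hypothetical and far smaller than the real
interface; the sketch only shows how generic code can stay ignorant of
where events come from.

```c
#include <stddef.h>

struct toy_event { int keycode; };

/* The low-level API that each back end must implement. */
struct toy_event_stream {
    int  (*event_pending_p)(void *state);
    void (*next_event)(void *state, struct toy_event *out);
    void *state;                        /* back-end private data */
};

/* Generic code: drain all pending events, calling HANDLER (if any)
   for each one.  Returns the number of events seen. */
size_t toy_drain_events(struct toy_event_stream *s,
                        void (*handler)(const struct toy_event *))
{
    size_t n = 0;
    while (s->event_pending_p(s->state)) {
        struct toy_event ev;
        s->next_event(s->state, &ev);
        if (handler != NULL)
            handler(&ev);
        n++;
    }
    return n;
}

/* A trivial back end that replays keys from a string.  A real one
   would talk to a window system or read from a terminal. */
struct toy_replay { const char *keys; };

static int replay_pending(void *st)
{
    return *((struct toy_replay *) st)->keys != '\0';
}

static void replay_next(void *st, struct toy_event *out)
{
    struct toy_replay *r = st;
    out->keycode = (unsigned char) *r->keys++;
}

struct toy_event_stream toy_make_replay_stream(struct toy_replay *r)
{
    struct toy_event_stream s = { replay_pending, replay_next, r };
    return s;
}
```

Supporting another source of events would mean writing another pair of
functions and another constructor, with no change to
@code{toy_drain_events()}.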
3345
3346 Note that the choice of whether to use @file{event-Xt.c} or
3347 @file{event-tty.c} is made at compile time!  Or at the very latest, it
3348 is made at startup time.  @file{event-Xt.c} handles events for
3349 @emph{both} X and TTY frames; @file{event-tty.c} is only used when X
3350 support is not compiled into XEmacs.  The reason for this is that there
3351 is only one event loop in XEmacs: thus, it needs to be able to receive
3352 events from all different kinds of frames.
3353
3354
3355
3356 @example
3357 keymap.c
3358 keymap.h
3359 @end example
3360
3361 @file{keymap.c} and @file{keymap.h} define the @dfn{keymap} Lisp object
3362 type and associated methods and primitives. (Remember that keymaps are
3363 objects that associate event descriptions with functions to be called to
3364 ``execute'' those events; @code{dispatch-event} looks up events in the
3365 relevant keymaps.)
3366
3367
3368
3369 @example
3370 keyboard.c
3371 @end example
3372
3373 @file{keyboard.c} contains functions that implement the actual editor
3374 command loop---i.e. the event loop that cyclically retrieves and
3375 dispatches events.  This code is also rather tricky, just like
3376 @file{event-stream.c}.
3377
3378
3379
3380 @example
3381 macros.c
3382 macros.h
3383 @end example
3384
3385 These two modules contain the basic code for defining keyboard macros.
3386 These functions don't actually do much; most of the code that handles keyboard
3387 macros is mixed in with the event-handling code in @file{event-stream.c}.
3388
3389
3390
3391 @example
3392 minibuf.c
3393 @end example
3394
3395 This contains some miscellaneous code related to the minibuffer (most of
3396 the minibuffer code was moved into Lisp by Richard Mlynarik).  This
3397 includes the primitives for completion (although filename completion is
3398 in @file{dired.c}), the lowest-level interface to the minibuffer (if the
3399 command loop were cleaned up, this too could be in Lisp), and code for
3400 dealing with the echo area (this, too, was mostly moved into Lisp, and
3401 the only code remaining is code to call out to Lisp or provide simple
3402 bootstrapping implementations early in temacs, before the echo-area Lisp
3403 code is loaded).
3404
3405
3406
3407 @node Modules for the Basic Displayable Lisp Objects, Modules for other Display-Related Lisp Objects, Editor-Level Control Flow Modules, A Summary of the Various XEmacs Modules
3408 @section Modules for the Basic Displayable Lisp Objects
3409
3410 @example
3411 device-ns.h
3412 device-stream.c
3413 device-stream.h
3414 device-tty.c
3415 device-tty.h
3416 device-x.c
3417 device-x.h
3418 device.c
3419 device.h
3420 @end example
3421
3422 These modules implement the @dfn{device} Lisp object type.  This
3423 abstracts a particular screen or connection on which frames are
3424 displayed.  As with Lisp objects, event interfaces, and other
3425 subsystems, the device code is separated into a generic component and
3426 device-type-specific components, which the generic code reaches through a
3427 standardized interface (in the form of a set of methods).
3428
3429 The device subsystem defines all the methods and provides method
3430 services for not only device operations but also for the frame, window,
3431 menubar, scrollbar, toolbar, and other displayable-object subsystems.
3432 The reason for this is that all of these subsystems have the same
3433 subtypes (X, TTY, NeXTstep, Microsoft Windows, etc.) as devices do.
3434
3435
3436
3437 @example
3438 frame-ns.h
3439 frame-tty.c
3440 frame-x.c
3441 frame-x.h
3442 frame.c
3443 frame.h
3444 @end example
3445
3446 Each device contains one or more frames in which objects (e.g. text) are
3447 displayed.  A frame corresponds to a window in the window system;
3448 usually this is a top-level window but it could potentially be one of a
3449 number of overlapping child windows within a top-level window, using the
3450 MDI (Multiple Document Interface) protocol in Microsoft Windows or a
3451 similar scheme.
3452
3453 The @file{frame-*} files implement the @dfn{frame} Lisp object type and
3454 provide the generic and device-type-specific operations on frames
3455 (e.g. raising, lowering, resizing, moving, etc.).
3456
3457
3458
3459 @example
3460 window.c
3461 window.h
3462 @end example
3463
3464 @cindex window (in Emacs)
3465 @cindex pane
3466 Each frame consists of one or more non-overlapping @dfn{windows} (better
3467 known as @dfn{panes} in standard window-system terminology) in which a
3468 buffer's text can be displayed.  Windows can also have scrollbars
3469 displayed around their edges.
3470
3471 @file{window.c} and @file{window.h} implement the @dfn{window} Lisp
3472 object type and provide code to manage windows.  Since windows have no
3473 associated resources in the window system (the window system knows only
3474 about the frame; no child windows or anything are used for XEmacs
3475 windows), there is no device-type-specific code here; all of that code
3476 is part of the redisplay mechanism or the code for particular object
3477 types such as scrollbars.
3478
3479
3480
3481 @node Modules for other Display-Related Lisp Objects, Modules for the Redisplay Mechanism, Modules for the Basic Displayable Lisp Objects, A Summary of the Various XEmacs Modules
3482 @section Modules for other Display-Related Lisp Objects
3483
3484 @example
3485 faces.c
3486 faces.h
3487 @end example
3488
3489
3490
3491 @example
3492 bitmaps.h
3493 glyphs-ns.h
3494 glyphs-x.c
3495 glyphs-x.h
3496 glyphs.c
3497 glyphs.h
3498 @end example
3499
3500
3501
3502 @example
3503 objects-ns.h
3504 objects-tty.c
3505 objects-tty.h
3506 objects-x.c
3507 objects-x.h
3508 objects.c
3509 objects.h
3510 @end example
3511
3512
3513
3514 @example
3515 menubar-x.c
3516 menubar.c
3517 @end example
3518
3519
3520
3521 @example
3522 scrollbar-x.c
3523 scrollbar-x.h
3524 scrollbar.c
3525 scrollbar.h
3526 @end example
3527
3528
3529
3530 @example
3531 toolbar-x.c
3532 toolbar.c
3533 toolbar.h
3534 @end example
3535
3536
3537
3538 @example
3539 font-lock.c
3540 @end example
3541
3542 This file provides C support for syntax highlighting---i.e.
3543 highlighting different syntactic constructs of a source file in
3544 different colors, for easy reading.  The C support is provided so that
3545 this is fast.
3546
3547
3548
3549 @example
3550 dgif_lib.c
3551 gif_err.c
3552 gif_lib.h
3553 gifalloc.c
3554 @end example
3555
3556 These modules decode GIF-format image files, for use with glyphs.
3557
3558
3559
3560 @node Modules for the Redisplay Mechanism, Modules for Interfacing with the File System, Modules for other Display-Related Lisp Objects, A Summary of the Various XEmacs Modules
3561 @section Modules for the Redisplay Mechanism
3562
3563 @example
3564 redisplay-output.c
3565 redisplay-tty.c
3566 redisplay-x.c
3567 redisplay.c
3568 redisplay.h
3569 @end example
3570
3571 These files provide the redisplay mechanism.  As with many other
3572 subsystems in XEmacs, there is a clean separation between the general
3573 and device-specific support.
3574
3575 @file{redisplay.c} contains the bulk of the redisplay engine.  These
3576 functions update the redisplay structures (which describe how the screen
3577 is to appear) to reflect any changes made to the state of any
3578 displayable objects (buffer, frame, window, etc.) since the last time
3579 that redisplay was called.  These functions are highly optimized to
3580 avoid doing more work than necessary (since redisplay is called
3581 extremely often and is potentially a huge time sink), and depend heavily
3582 on notifications from the objects themselves that changes have occurred,
3583 so that redisplay doesn't explicitly have to check each possible object.
3584 The redisplay mechanism also contains a great deal of caching to further
3585 speed things up; some of this caching is contained within the various
3586 displayable objects.
3587
3588 @file{redisplay-output.c} goes through the redisplay structures and converts
3589 them into calls to device-specific methods to actually output the screen
3590 changes.
3591
3592 @file{redisplay-x.c} and @file{redisplay-tty.c} are two implementations
3593 of these redisplay output methods, for X frames and TTY frames,
3594 respectively.
3595
3596
3597
3598 @example
3599 indent.c
3600 @end example
3601
3602 This module contains various functions and Lisp primitives for
3603 converting between buffer positions and screen positions.  These
3604 functions call the redisplay mechanism to do most of the work, and then
3605 examine the redisplay structures to get the necessary information.  This
3606 module needs work.
3607
3608
3609
3610 @example
3611 termcap.c
3612 terminfo.c
3613 tparam.c
3614 @end example
3615
3616 These files contain functions for working with the termcap (BSD-style)
3617 and terminfo (System V style) databases of terminal capabilities and
3618 escape sequences, used when XEmacs is displaying in a TTY.
3619
3620
3621
3622 @example
3623 cm.c
3624 cm.h
3625 @end example
3626
3627 These files provide some miscellaneous TTY-output functions and should
3628 probably be merged into @file{redisplay-tty.c}.
3629
3630
3631
3632 @node Modules for Interfacing with the File System, Modules for Other Aspects of the Lisp Interpreter and Object System, Modules for the Redisplay Mechanism, A Summary of the Various XEmacs Modules
3633 @section Modules for Interfacing with the File System
3634
3635 @example
3636 lstream.c
3637 lstream.h
3638 @end example
3639
3640 These modules implement the @dfn{stream} Lisp object type.  This is an
3641 internal-only Lisp object that implements a generic buffering stream.
3642 The idea is to provide a uniform interface onto all sources and sinks of
3643 data, including file descriptors, stdio streams, chunks of memory, Lisp
3644 buffers, Lisp strings, etc.  That way, I/O functions can be written to
3645 the stream interface and can transparently handle all possible sources
3646 and sinks.  (For example, the @code{read} function can read data from a
3647 file, a string, a buffer, or even a function that is called repeatedly
3648 to return data, without worrying about where the data is coming from or
3649 what-size chunks it is returned in.)
3650
3651 @cindex lstream
3652 Note that in the C code, streams are called @dfn{lstreams} (for ``Lisp
3653 streams'') to distinguish them from other kinds of streams, e.g. stdio
3654 streams and C++ I/O streams.
3655
3656 Similar to other subsystems in XEmacs, lstreams are separated into
3657 generic functions and a set of methods for the different types of
3658 lstreams.  @file{lstream.c} provides implementations of many different
3659 types of streams; others are provided, e.g., in @file{mule-coding.c}.
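
The split between generic functions and per-type methods can be pictured
with a small sketch.  This is only an illustration under assumed names
(@code{toy_stream}, @code{toy_stream_methods}, @code{toy_stream_read_all},
and the two reader functions are all hypothetical); it is not the actual
interface declared in @file{lstream.h}.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; this is not the real lstream API. */
struct toy_stream;

/* Per-type method table: each stream type supplies its own reader. */
struct toy_stream_methods {
  const char *name;
  /* Read up to SIZE bytes into BUF; return the count, 0 at end of data. */
  size_t (*reader) (struct toy_stream *s, char *buf, size_t size);
};

struct toy_stream {
  const struct toy_stream_methods *methods;
  const char *data;             /* backing store used by both toy types */
  size_t len, pos;
};

/* Reader for a stream backed by a fixed chunk of memory. */
static size_t
memory_reader (struct toy_stream *s, char *buf, size_t size)
{
  size_t avail = s->len - s->pos;
  size_t n = size < avail ? size : avail;
  memcpy (buf, s->data + s->pos, n);
  s->pos += n;
  return n;
}

/* Reader that returns one byte per call, to mimic a source that
   delivers data in awkward chunk sizes. */
static size_t
trickle_reader (struct toy_stream *s, char *buf, size_t size)
{
  if (size == 0 || s->pos == s->len)
    return 0;
  buf[0] = s->data[s->pos++];
  return 1;
}

static const struct toy_stream_methods memory_methods =
  { "memory", memory_reader };
static const struct toy_stream_methods trickle_methods =
  { "trickle", trickle_reader };

/* Generic function: callers use this and never look at the type. */
static size_t
toy_stream_read_all (struct toy_stream *s, char *buf, size_t size)
{
  size_t total = 0;
  assert (s != NULL && s->methods != NULL);
  while (total < size)
    {
      size_t n = s->methods->reader (s, buf + total, size - total);
      if (n == 0)
        break;                  /* end of data */
      total += n;
    }
  return total;
}
```
@end verbatim

A caller of @code{toy_stream_read_all} gets the same bytes from either
kind of stream, regardless of the chunk sizes the underlying reader
happens to return.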
3660
3661
3662
3663 @example
3664 fileio.c
3665 @end example
3666
3667 This implements the basic primitives for interfacing with the file
3668 system.  This includes primitives for reading files into buffers,
3669 writing buffers into files, checking for the presence or accessibility
3670 of files, canonicalizing file names, etc.  Note that these primitives
3671 are usually not invoked directly by the user: There is a great deal of
3672 higher-level Lisp code that implements the user commands such as
3673 @code{find-file} and @code{save-buffer}.  This is similar to the
3674 distinction between the lower-level primitives in @file{editfns.c} and
3675 the higher-level user commands in @file{commands.c} and
3676 @file{simple.el}.
3677
3678
3679
3680 @example
3681 filelock.c
3682 @end example
3683
3684 This file provides functions for detecting clashes between different
3685 processes (e.g. XEmacs and some external process, or two different
3686 XEmacs processes) modifying the same file.  (XEmacs can optionally use
3687 the @file{lock/} subdirectory to provide a form of ``locking'' between
3688 different XEmacs processes.)  This module is also used by the low-level
3689 functions in @file{insdel.c} to ensure that, if the first modification
3690 is being made to a buffer whose corresponding file has been externally
3691 modified, the user is made aware of this so that the buffer can be
3692 synched up with the external changes if necessary.
3693
3694
3695 @example
3696 filemode.c
3697 @end example
3698
3699 This file provides some miscellaneous functions that construct a
3700 @samp{rwxr-xr-x}-type permissions string (as might appear in an
3701 @file{ls}-style directory listing) given the information returned by the
3702 @code{stat()} system call.
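
As a rough sketch of the idea, the nine permission characters can be
derived from the @code{st_mode} field as follows.  The helper name
@code{permission_string} is hypothetical and is not taken from
@file{filemode.c}; the sketch also ignores the file-type character and the
setuid, setgid, and sticky bits, which a full implementation would need to
handle.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper, not the function in filemode.c: fill OUT (at
   least 10 bytes) with the nine permission characters for MODE,
   followed by a terminating NUL. */
static void
permission_string (mode_t mode, char *out)
{
  static const mode_t bits[9] = {
    S_IRUSR, S_IWUSR, S_IXUSR,  /* owner */
    S_IRGRP, S_IWGRP, S_IXGRP,  /* group */
    S_IROTH, S_IWOTH, S_IXOTH   /* others */
  };
  static const char letters[] = "rwxrwxrwx";
  int i;

  assert (out != NULL);
  for (i = 0; i < 9; i++)
    out[i] = (mode & bits[i]) ? letters[i] : '-';
  out[9] = '\0';
}
```
@end verbatim

Passing the @code{st_mode} value of a file with mode 0755 yields the
string @samp{rwxr-xr-x}.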
3703
3704
3705
3706 @example
3707 dired.c
3708 ndir.h
3709 @end example
3710
3711 These files implement the XEmacs interface to directory searching.  This
3712 includes a number of primitives for determining the files in a directory
3713 and for doing filename completion. (Remember that generic completion is
3714 handled by a different mechanism, in @file{minibuf.c}.)
3715
3716 @file{ndir.h} is a header file used for the directory-searching
emulation functions provided in @file{sysdep.c} (@pxref{Modules for Interfacing with the Operating System}),
3718 for systems that don't provide any directory-searching functions. (On
3719 those systems, directories can be read directly as files, and parsed.)
3720
3721
3722
3723 @example
3724 realpath.c
3725 @end example
3726
3727 This file provides an implementation of the @code{realpath()} function
3728 for expanding symbolic links, on systems that don't implement it or have
3729 a broken implementation.
3730
3731
3732
3733 @node Modules for Other Aspects of the Lisp Interpreter and Object System, Modules for Interfacing with the Operating System, Modules for Interfacing with the File System, A Summary of the Various XEmacs Modules
3734 @section Modules for Other Aspects of the Lisp Interpreter and Object System
3735
3736 @example
3737 elhash.c
3738 elhash.h
3739 hash.c
3740 hash.h
3741 @end example
3742
3743 These files provide two implementations of hash tables.  Files
3744 @file{hash.c} and @file{hash.h} provide a generic C implementation of
3745 hash tables which can stand independently of XEmacs.  Files
@file{elhash.c} and @file{elhash.h} provide a separate implementation of
hash tables that can store only Lisp objects and that knows about Lispy
things like garbage collection; these files also implement the
@dfn{hash-table} Lisp object type.
3750
3751
3752 @example
3753 specifier.c
3754 specifier.h
3755 @end example
3756
3757 This module implements the @dfn{specifier} Lisp object type.  This is
primarily used for displayable properties, and allows for values that
are specific to a particular buffer, window, frame, device, or device
class, as well as for a default value.  This is used, for example,
3761 to control the height of the horizontal scrollbar or the appearance of
3762 the @code{default}, @code{bold}, or other faces.  The specifier object
3763 consists of a number of specifications, each of which maps from a
3764 buffer, window, etc. to a value.  The function @code{specifier-instance}
3765 looks up a value given a window (from which a buffer, frame, and device
3766 can be derived).
3767
3768
3769 @example
3770 chartab.c
3771 chartab.h
3772 casetab.c
3773 @end example
3774
3775 @file{chartab.c} and @file{chartab.h} implement the @dfn{char table}
3776 Lisp object type, which maps from characters or certain sorts of
3777 character ranges to Lisp objects.  The implementation of this object
3778 type is optimized for the internal representation of characters.  Char
3779 tables come in different types, which affect the allowed object types to
3780 which a character can be mapped and also dictate certain other
3781 properties of the char table.
3782
3783 @cindex case table
3784 @file{casetab.c} implements one sort of char table, the @dfn{case
3785 table}, which maps characters to other characters of possibly different
3786 case.  These are used by XEmacs to implement case-changing primitives
3787 and to do case-insensitive searching.
3788
3789
3790
3791 @example
3792 syntax.c
3793 syntax.h
3794 @end example
3795
3796 @cindex scanner
3797 This module implements @dfn{syntax tables}, another sort of char table
3798 that maps characters into syntax classes that define the syntax of these
3799 characters (e.g. a parenthesis belongs to a class of @samp{open}
3800 characters that have corresponding @samp{close} characters and can be
3801 nested).  This module also implements the Lisp @dfn{scanner}, a set of
3802 primitives for scanning over text based on syntax tables.  This is used,
3803 for example, to find the matching parenthesis in a command such as
3804 @code{forward-sexp}, and by @file{font-lock.c} to locate quoted strings,
3805 comments, etc.
3806
3807
3808
3809 @example
3810 casefiddle.c
3811 @end example
3812
3813 This module implements various Lisp primitives for upcasing, downcasing
3814 and capitalizing strings or regions of buffers.
3815
3816
3817
3818 @example
3819 rangetab.c
3820 @end example
3821
3822 This module implements the @dfn{range table} Lisp object type, which
3823 provides for a mapping from ranges of integers to arbitrary Lisp
3824 objects.
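
One plausible way to picture such a mapping is a sorted array of
non-overlapping ranges searched by bisection.  The sketch below assumes
that representation; the names @code{range_entry} and @code{range_lookup}
are hypothetical, a C string stands in for the Lisp object, and nothing
here is meant to describe the actual data layout in @file{rangetab.c}.

@verbatim
```c
#include <stddef.h>

/* Hypothetical layout; see rangetab.c for the real one. */
struct range_entry {
  long first, last;             /* inclusive bounds of the range */
  const char *value;            /* stand-in for a Lisp object */
};

/* ENTRIES must be sorted by FIRST and must not overlap.
   Return the value for KEY, or NULL if no range covers it. */
static const char *
range_lookup (const struct range_entry *entries, size_t count, long key)
{
  size_t lo = 0, hi = count;

  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (key < entries[mid].first)
        hi = mid;               /* KEY lies before this range */
      else if (key > entries[mid].last)
        lo = mid + 1;           /* KEY lies after this range */
      else
        return entries[mid].value;
    }
  return NULL;
}
```
@end verbatim

Keeping the ranges sorted makes each lookup logarithmic in the number of
ranges, at the price of more work when a range is inserted or split.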
3825
3826
3827
3828 @example
3829 opaque.c
3830 opaque.h
3831 @end example
3832
3833 This module implements the @dfn{opaque} Lisp object type, an
3834 internal-only Lisp object that encapsulates an arbitrary block of memory
3835 so that it can be managed by the Lisp allocation system.  To create an
3836 opaque object, you call @code{make_opaque()}, passing a pointer to a
3837 block of memory.  An object is created that is big enough to hold the
3838 memory, which is copied into the object's storage.  The object will then
3839 stick around as long as you keep pointers to it, after which it will be
3840 automatically reclaimed.
3841
3842 @cindex mark method
3843 Opaque objects can also have an arbitrary @dfn{mark method} associated
3844 with them, in case the block of memory contains other Lisp objects that
3845 need to be marked for garbage-collection purposes. (If you need other
3846 object methods, such as a finalize method, you should just go ahead and
3847 create a new Lisp object type---it's not hard.)
3848
3849
3850
3851 @example
3852 abbrev.c
3853 @end example
3854
This module provides a few primitives for doing dynamic abbreviation
3856 expansion.  In XEmacs, most of the code for this has been moved into
3857 Lisp.  Some C code remains for speed and because the primitive
3858 @code{self-insert-command} (which is executed for all self-inserting
3859 characters) hooks into the abbrev mechanism. (@code{self-insert-command}
3860 is itself in C only for speed.)
3861
3862
3863
3864 @example
3865 doc.c
3866 @end example
3867
This module provides primitives for retrieving the documentation
3869 strings of functions and variables.  These documentation strings contain
3870 certain special markers that get dynamically expanded (e.g. a
3871 reverse-lookup is performed on some named functions to retrieve their
3872 current key bindings).  Some documentation strings (in particular, for
3873 the built-in primitives and pre-loaded Lisp functions) are stored
3874 externally in a file @file{DOC} in the @file{lib-src/} directory and
3875 need to be fetched from that file. (Part of the build stage involves
3876 building this file, and another part involves constructing an index for
3877 this file and embedding it into the executable, so that the functions in
3878 @file{doc.c} do not have to search the entire @file{DOC} file to find
3879 the appropriate documentation string.)
3880
3881
3882
3883 @example
3884 md5.c
3885 @end example
3886
This module provides a Lisp primitive that implements the MD5 secure
3888 hashing scheme, used to create a large hash value of a string of data such that
3889 the data cannot be derived from the hash value.  This is used for
3890 various security applications on the Internet.
3891
3892
3893
3894
3895 @node Modules for Interfacing with the Operating System, Modules for Interfacing with X Windows, Modules for Other Aspects of the Lisp Interpreter and Object System, A Summary of the Various XEmacs Modules
3896 @section Modules for Interfacing with the Operating System
3897
3898 @example
3899 callproc.c
3900 process.c
3901 process.h
3902 @end example
3903
3904 These modules allow XEmacs to spawn and communicate with subprocesses
3905 and network connections.
3906
3907 @cindex synchronous subprocesses
3908 @cindex subprocesses, synchronous
3909   @file{callproc.c} implements (through the @code{call-process}
3910 primitive) what are called @dfn{synchronous subprocesses}.  This means
3911 that XEmacs runs a program, waits till it's done, and retrieves its
3912 output.  A typical example might be calling the @file{ls} program to get
3913 a directory listing.
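
At the system-call level, the pattern of running a program, waiting for
it, and collecting its output might look like the following sketch.  The
function name @code{run_and_capture} is hypothetical and the code is not
taken from @file{callproc.c}; it assumes a POSIX system and, for brevity,
treats a program whose output overflows the buffer as a failure.

@verbatim
```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the program ARGV[0] with arguments ARGV, wait for it to finish,
   and store up to SIZE - 1 bytes of its standard output in BUF
   (NUL-terminated).  Return the exit status, or -1 on failure or
   abnormal termination. */
static int
run_and_capture (char *const argv[], char *buf, size_t size)
{
  int fds[2], status;
  size_t used = 0;
  ssize_t n;
  pid_t pid;

  if (size == 0 || pipe (fds) < 0)
    return -1;
  pid = fork ();
  if (pid < 0)
    {
      close (fds[0]);
      close (fds[1]);
      return -1;
    }
  if (pid == 0)
    {
      /* Child: send stdout into the pipe, then become the program. */
      dup2 (fds[1], STDOUT_FILENO);
      close (fds[0]);
      close (fds[1]);
      execvp (argv[0], argv);
      _exit (127);              /* reached only if exec failed */
    }
  /* Parent: read until the child closes its end of the pipe. */
  close (fds[1]);
  while (used < size - 1
         && (n = read (fds[0], buf + used, size - 1 - used)) > 0)
    used += (size_t) n;
  buf[used] = '\0';
  close (fds[0]);
  if (waitpid (pid, &status, 0) < 0 || !WIFEXITED (status))
    return -1;
  return WEXITSTATUS (status);
}
```
@end verbatim

The caller is blocked for the whole lifetime of the child, which is what
makes the subprocess synchronous.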
3914
3915 @cindex asynchronous subprocesses
3916 @cindex subprocesses, asynchronous
3917   @file{process.c} and @file{process.h} implement @dfn{asynchronous
3918 subprocesses}.  This means that XEmacs starts a program and then
3919 continues normally, not waiting for the process to finish.  Data can be
3920 sent to the process or retrieved from it as it's running.  This is used
3921 for the @code{shell} command (which provides a front end onto a shell
3922 program such as @file{csh}), the mail and news readers implemented in
3923 XEmacs, etc.  The result of calling @code{start-process} to start a
3924 subprocess is a process object, a particular kind of object used to
3925 communicate with the subprocess.  You can send data to the process by
3926 passing the process object and the data to @code{send-process}, and you
3927 can specify what happens to data retrieved from the process by setting
3928 properties of the process object. (When the process sends data, XEmacs
3929 receives a process event, which says that there is data ready.  When
3930 @code{dispatch-event} is called on this event, it reads the data from
3931 the process and does something with it, as specified by the process
3932 object's properties.  Typically, this means inserting the data into a
3933 buffer or calling a function.) Another property of the process object is
3934 called the @dfn{sentinel}, which is a function that is called when the
3935 process terminates.
3936
3937 @cindex network connections
3938   Process objects are also used for network connections (connections to a
3939 process running on another machine).  Network connections are started
3940 with @code{open-network-stream} but otherwise work just like
3941 subprocesses.
3942
3943
3944
3945 @example
3946 sysdep.c
3947 sysdep.h
3948 @end example
3949
3950   These modules implement most of the low-level, messy operating-system
3951 interface code.  This includes various device control (ioctl) operations
3952 for file descriptors, TTY's, pseudo-terminals, etc. (usually this stuff
3953 is fairly system-dependent; thus the name of this module), and emulation
3954 of standard library functions and system calls on systems that don't
3955 provide them or have broken versions.
3956
3957
3958
3959 @example
3960 sysdir.h
3961 sysfile.h
3962 sysfloat.h
3963 sysproc.h
3964 syspwd.h
3965 syssignal.h
3966 systime.h
3967 systty.h
3968 syswait.h
3969 @end example
3970
3971 These header files provide consistent interfaces onto system-dependent
3972 header files and system calls.  The idea is that, instead of including a
3973 standard header file like @file{<sys/param.h>} (which may or may not
exist on various systems) or having to worry about whether all systems
provide a particular preprocessor constant, or having to deal with the
four different paradigms for manipulating signals, you just include the
appropriate @file{sys*.h} header file, which includes all the right
system header files, defines any missing preprocessor constants,
3979 provides a uniform interface onto system calls, etc.
3980
3981 @file{sysdir.h} provides a uniform interface onto directory-querying
3982 functions. (In some cases, this is in conjunction with emulation
3983 functions in @file{sysdep.c}.)
3984
3985 @file{sysfile.h} includes all the necessary header files for standard
3986 system calls (e.g. @code{read()}), ensures that all necessary
3987 @code{open()} and @code{stat()} preprocessor constants are defined, and
3988 possibly (usually) substitutes sugared versions of @code{read()},
3989 @code{write()}, etc. that automatically restart interrupted I/O
3990 operations.
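
A wrapper that restarts an interrupted read can be as small as the
following sketch.  The name @code{restarting_read} is hypothetical; the
substitutes actually installed by @file{sysfile.h} may be named and
structured differently.

@verbatim
```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: behaves like read(), but retries when a signal
   interrupts the call before any data has been transferred. */
static ssize_t
restarting_read (int fd, void *buf, size_t size)
{
  ssize_t n;

  do
    n = read (fd, buf, size);
  while (n < 0 && errno == EINTR);
  return n;
}
```
@end verbatim

Errors other than @code{EINTR} are still reported to the caller
unchanged, so existing error handling keeps working.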
3991
3992 @file{sysfloat.h} includes the necessary header files for floating-point
3993 operations.
3994
3995 @file{sysproc.h} includes the necessary header files for calling
3996 @code{select()}, @code{fork()}, @code{execve()}, socket operations, and
3997 the like, and ensures that the @code{FD_*()} macros for descriptor-set
3998 manipulations are available.
3999
4000 @file{syspwd.h} includes the necessary header files for obtaining
4001 information from @file{/etc/passwd} (the functions are emulated under
4002 VMS).
4003
4004 @file{syssignal.h} includes the necessary header files for
4005 signal-handling and provides a uniform interface onto the different
4006 signal-handling and signal-blocking paradigms.
4007
4008 @file{systime.h} includes the necessary header files and provides
4009 uniform interfaces for retrieving the time of day, setting file
4010 access/modification times, getting the amount of time used by the XEmacs
4011 process, etc.
4012
4013 @file{systty.h} buffers against the infinitude of different ways of
4014 controlling TTY's.
4015
4016 @file{syswait.h} provides a uniform way of retrieving the exit status
4017 from a @code{wait()}ed-on process (some systems use a union, others use
4018 an int).
4019
4020
4021
4022 @example
4023 hpplay.c
4024 libsst.c
4025 libsst.h
4026 libst.h
4027 linuxplay.c
4028 nas.c
4029 sgiplay.c
4030 sound.c
4031 sunplay.c
4032 @end example
4033
4034 These files implement the ability to play various sounds on some types
4035 of computers.  You have to configure your XEmacs with sound support in
4036 order to get this capability.
4037
4038 @file{sound.c} provides the generic interface.  It implements various
4039 Lisp primitives and variables that let you specify which sounds should
4040 be played in certain conditions. (The conditions are identified by
4041 symbols, which are passed to @code{ding} to make a sound.  Various
4042 standard functions call this function at certain times; if sound support
does not exist, a simple beep results.)
4044
4045 @cindex native sound
4046 @cindex sound, native
4047 @file{sgiplay.c}, @file{sunplay.c}, @file{hpplay.c}, and
4048 @file{linuxplay.c} interface to the machine's speaker for various
different kinds of machines.  This is called @dfn{native} sound.
4050
4051 @cindex sound, network
4052 @cindex network sound
4053 @cindex NAS
4054 @file{nas.c} interfaces to a computer somewhere else on the network
4055 using the NAS (Network Audio Server) protocol, playing sounds on that
4056 machine.  This allows you to run XEmacs on a remote machine, with its
4057 display set to your local machine, and have the sounds be made on your
4058 local machine, provided that you have a NAS server running on your local
4059 machine.
4060
4061 @file{libsst.c}, @file{libsst.h}, and @file{libst.h} provide some
4062 additional functions for playing sound on a Sun SPARC but are not
4063 currently in use.
4064
4065
4066
4067 @example
4068 tooltalk.c
4069 tooltalk.h
4070 @end example
4071
4072 These two modules implement an interface to the ToolTalk protocol, which
4073 is an interprocess communication protocol implemented on some versions
4074 of Unix.  ToolTalk is a high-level protocol that allows processes to
4075 register themselves as providers of particular services; other processes
4076 can then request a service without knowing or caring exactly who is
4077 providing the service.  It is similar in spirit to the DDE protocol
4078 provided under Microsoft Windows.  ToolTalk is a part of the new CDE
4079 (Common Desktop Environment) specification and is used to connect the
4080 parts of the SPARCWorks development environment.
4081
4082
4083
4084 @example
4085 getloadavg.c
4086 @end example
4087
4088 This module provides the ability to retrieve the system's current load
4089 average. (The way to do this is highly system-specific, unfortunately,
4090 and requires a lot of special-case code.)
4091
4092
4093
4094 @example
4095 sunpro.c
4096 @end example
4097
4098 This module provides a small amount of code used internally at Sun to
4099 keep statistics on the usage of XEmacs.
4100
4101
4102
4103 @example
4104 broken-sun.h
4105 strcmp.c
4106 strcpy.c
4107 sunOS-fix.c
4108 @end example
4109
4110 These files provide replacement functions and prototypes to fix numerous
4111 bugs in early releases of SunOS 4.1.
4112
4113
4114
4115 @example
4116 hftctl.c
4117 @end example
4118
4119 This module provides some terminal-control code necessary on versions of
4120 AIX prior to 4.1.
4121
4122
4123
4124 @example
4125 msdos.c
4126 msdos.h
4127 @end example
4128
4129 These modules are used for MS-DOS support, which does not work in
4130 XEmacs.
4131
4132
4133
4134 @node Modules for Interfacing with X Windows, Modules for Internationalization, Modules for Interfacing with the Operating System, A Summary of the Various XEmacs Modules
4135 @section Modules for Interfacing with X Windows
4136
4137 @example
4138 Emacs.ad.h
4139 @end example
4140
4141 A file generated from @file{Emacs.ad}, which contains XEmacs-supplied
4142 fallback resources (so that XEmacs has pretty defaults).
4143
4144
4145
4146 @example
4147 EmacsFrame.c
4148 EmacsFrame.h
4149 EmacsFrameP.h
4150 @end example
4151
4152 These modules implement an Xt widget class that encapsulates a frame.
4153 This is for ease in integrating with Xt.  The EmacsFrame widget covers
4154 the entire X window except for the menubar; the scrollbars are
4155 positioned on top of the EmacsFrame widget.
4156
4157 @strong{Warning:} Abandon hope, all ye who enter here.  This code took
4158 an ungodly amount of time to get right, and is likely to fall apart
4159 mercilessly at the slightest change.  Such is life under Xt.
4160
4161
4162
4163 @example
4164 EmacsManager.c
4165 EmacsManager.h
4166 EmacsManagerP.h
4167 @end example
4168
4169 These modules implement a simple Xt manager (i.e. composite) widget
4170 class that simply lets its children set whatever geometry they want.
4171 It's amazing that Xt doesn't provide this standardly, but on second
4172 thought, it makes sense, considering how amazingly broken Xt is.
4173
4174
4175 @example
4176 EmacsShell-sub.c
4177 EmacsShell.c
4178 EmacsShell.h
4179 EmacsShellP.h
4180 @end example
4181
4182 These modules implement two Xt widget classes that are subclasses of
4183 the TopLevelShell and TransientShell classes.  This is necessary to deal
4184 with more brokenness that Xt has sadistically thrust onto the backs of
4185 developers.
4186
4187
4188
4189 @example
4190 xgccache.c
4191 xgccache.h
4192 @end example
4193
4194 These modules provide functions for maintenance and caching of GC's
4195 (graphics contexts) under the X Window System.  This code is junky and
4196 needs to be rewritten.
4197
4198
4199
4200 @example
4201 xselect.c
4202 @end example
4203
4204 @cindex selections
4205   This module provides an interface to the X Window System's concept of
4206 @dfn{selections}, the standard way for X applications to communicate
4207 with each other.
4208
4209
4210
4211 @example
4212 xintrinsic.h
4213 xintrinsicp.h
4214 xmmanagerp.h
4215 xmprimitivep.h
4216 @end example
4217
4218 These header files are similar in spirit to the @file{sys*.h} files and buffer
4219 against different implementations of Xt and Motif.
4220
4221 @itemize @bullet
4222 @item
@file{xintrinsic.h} should be included in place of @file{<X11/Intrinsic.h>}.
@item
@file{xintrinsicp.h} should be included in place of @file{<X11/IntrinsicP.h>}.
@item
@file{xmmanagerp.h} should be included in place of @file{<Xm/ManagerP.h>}.
@item
@file{xmprimitivep.h} should be included in place of @file{<Xm/PrimitiveP.h>}.
4230 @end itemize
4231
4232
4233
4234 @example
4235 xmu.c
4236 xmu.h
4237 @end example
4238
4239 These files provide an emulation of the Xmu library for those systems
(e.g. HP-UX) that don't provide it as a standard part of X.
4241
4242
4243
4244 @example
4245 ExternalClient-Xlib.c
4246 ExternalClient.c
4247 ExternalClient.h
4248 ExternalClientP.h
4249 ExternalShell.c
4250 ExternalShell.h
4251 ExternalShellP.h
4252 extw-Xlib.c
4253 extw-Xlib.h
4254 extw-Xt.c
4255 extw-Xt.h
4256 @end example
4257
4258 @cindex external widget
4259   These files provide the @dfn{external widget} interface, which allows an
4260 XEmacs frame to appear as a widget in another application.  To do this,
4261 you have to configure with @samp{--external-widget}.
4262
4263 @file{ExternalShell*} provides the server (XEmacs) side of the
4264 connection.
4265
4266 @file{ExternalClient*} provides the client (other application) side of
4267 the connection.  These files are not compiled into XEmacs but are
4268 compiled into libraries that are then linked into your application.
4269
4270 @file{extw-*} is common code that is used for both the client and server.
4271
4272 Don't touch this code; something is liable to break if you do.
4273
4274
4275
4276 @node Modules for Internationalization,  , Modules for Interfacing with X Windows, A Summary of the Various XEmacs Modules
4277 @section Modules for Internationalization
4278
4279 @example
4280 mule-canna.c
4281 mule-ccl.c
4282 mule-charset.c
4283 mule-charset.h
4284 mule-coding.c
4285 mule-coding.h
4286 mule-mcpath.c
4287 mule-mcpath.h
4288 mule-wnnfns.c
4289 mule.c
4290 @end example
4291
4292 These files implement the MULE (Asian-language) support.  Note that MULE
4293 actually provides a general interface for all sorts of languages, not
4294 just Asian languages (although they are generally the most complicated
4295 to support).  This code is still in beta.
4296
4297 @file{mule-charset.*} and @file{mule-coding.*} provide the heart of the
4298 XEmacs MULE support.  @file{mule-charset.*} implements the @dfn{charset}
4299 Lisp object type, which encapsulates a character set (an ordered one- or
4300 two-dimensional set of characters, such as US ASCII or JISX0208 Japanese
4301 Kanji).
4302
4303 @file{mule-coding.*} implements the @dfn{coding-system} Lisp object
4304 type, which encapsulates a method of converting between different
4305 encodings.  An encoding is a representation of a stream of characters,
4306 possibly from multiple character sets, using a stream of bytes or words,
4307 and defines (e.g.) which escape sequences are used to specify particular
4308 character sets, how the indices for a character are converted into bytes
4309 (sometimes this involves setting the high bit; sometimes complicated
4310 rearranging of the values takes place, as in the Shift-JIS encoding),
4311 etc.
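
To make the ``complicated rearranging'' concrete, here is a sketch of the
usual arithmetic for turning the two position bytes of a JIS X 0208
character into Shift-JIS bytes.  The function name
@code{jisx0208_to_sjis} is hypothetical and the code is not taken from
@file{mule-coding.c}; it shows only the byte rearrangement, not a complete
encoder.

@verbatim
```c
#include <assert.h>

/* Convert a JIS X 0208 character, given as two 7-bit position bytes
   J1 and J2 (each in 0x21..0x7E), to its two Shift-JIS bytes. */
static void
jisx0208_to_sjis (unsigned char j1, unsigned char j2,
                  unsigned char *s1, unsigned char *s2)
{
  assert (j1 >= 0x21 && j1 <= 0x7E && j2 >= 0x21 && j2 <= 0x7E);
  /* Two JIS rows share one Shift-JIS lead byte; the lead bytes are
     split into two blocks, starting at 0x81 and at 0xE0. */
  *s1 = (unsigned char) (((j1 + 1) >> 1) + (j1 < 0x5F ? 0x70 : 0xB0));
  if (j1 & 1)
    /* Odd row: lower half of the trail-byte range, skipping 0x7F. */
    *s2 = (unsigned char) (j2 + (j2 < 0x60 ? 0x1F : 0x20));
  else
    /* Even row: upper half of the trail-byte range. */
    *s2 = (unsigned char) (j2 + 0x7E);
}
```
@end verbatim

For instance, the hiragana character at JIS position 0x24 0x22 comes out
as the Shift-JIS bytes 0x82 0xA0.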
4312
4313 @file{mule-ccl.c} provides the CCL (Code Conversion Language)
4314 interpreter.  CCL is similar in spirit to Lisp byte code and is used to
4315 implement converters for custom encodings.
4316
4317 @file{mule-canna.c} and @file{mule-wnnfns.c} implement interfaces to
4318 external programs used to implement the Canna and WNN input methods,
4319 respectively.  This is currently in beta.
4320
4321 @file{mule-mcpath.c} provides some functions to allow for pathnames
4322 containing extended characters.  This code is fragmentary, obsolete, and
completely non-working.  Instead, @code{pathname-coding-system} is used
to specify conversions of names of files and directories.  The standard
C I/O functions like @code{open()} are wrapped so that conversion occurs
4326 automatically.
4327
4328 @file{mule.c} provides a few miscellaneous things that should probably
4329 be elsewhere.
4330
4331
4332
4333 @example
4334 intl.c
4335 @end example
4336
4337 This provides some miscellaneous internationalization code for
4338 implementing message translation and interfacing to the Ximp input
4339 method.  None of this code is currently working.
4340
4341
4342
4343 @example
4344 iso-wide.h
4345 @end example
4346
4347 This contains leftover code from an earlier implementation of
4348 Asian-language support, and is not currently used.
4349
4350
4351
4352
4353 @node Allocation of Objects in XEmacs Lisp, Dumping, A Summary of the Various XEmacs Modules, Top
4354 @chapter Allocation of Objects in XEmacs Lisp
4355
4356 @menu
4357 * Introduction to Allocation::
4358 * Garbage Collection::
4359 * GCPROing::
4360 * Garbage Collection - Step by Step::
4361 * Integers and Characters::
4362 * Allocation from Frob Blocks::
4363 * lrecords::
4364 * Low-level allocation::
4365 * Cons::
4366 * Vector::
4367 * Bit Vector::
4368 * Symbol::
4369 * Marker::
4370 * String::
4371 * Compiled Function::
4372 @end menu
4373
4374 @node Introduction to Allocation, Garbage Collection, Allocation of Objects in XEmacs Lisp, Allocation of Objects in XEmacs Lisp
4375 @section Introduction to Allocation
4376
4377   Emacs Lisp, like all Lisps, has garbage collection.  This means that
4378 the programmer never has to explicitly free (destroy) an object; it
4379 happens automatically when the object becomes inaccessible.  Most
4380 experts agree that garbage collection is a necessity in a modern,
4381 high-level language.  Its omission from C stems from the fact that C was
4382 originally designed to be a nice abstract layer on top of assembly
4383 language, for writing kernels and basic system utilities rather than
4384 large applications.
4385
4386   Lisp objects can be created by any of a number of Lisp primitives.
4387 Most object types have one or a small number of basic primitives
4388 for creating objects.  For conses, the basic primitive is @code{cons};
4389 for vectors, the primitives are @code{make-vector} and @code{vector}; for
4390 symbols, the primitives are @code{make-symbol} and @code{intern}; etc.
4391 Some Lisp objects, especially those that are primarily used internally,
4392 have no corresponding Lisp primitives.  Every Lisp object, though,
4393 has at least one C primitive for creating it.
4394
4395   Recall from section (VII) that a Lisp object, as stored in a 32-bit or
4396 64-bit word, has a few tag bits, and a ``value'' that occupies the
4397 remainder of the bits.  We can separate the different Lisp object types
4398 into three broad categories:
4399
4400 @itemize @bullet
4401 @item
4402 (a) Those for whom the value directly represents the contents of the
4403 Lisp object.  Only two types are in this category: integers and
4404 characters.  No special allocation or garbage collection is necessary
4405 for such objects.  Lisp objects of these types do not need to be
4406 @code{GCPRO}ed.
4407 @end itemize
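
The idea of a word that carries its value directly can be sketched as
follows.  The layout is an assumption made for illustration: the two-bit
tag in the low bits, the tag values, and every @code{toy_} name are
hypothetical, and the real tag assignment and bit positions in XEmacs may
differ.

@verbatim
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: the low two bits hold the tag and the remaining
   bits hold the value.  Not the actual XEmacs representation. */
enum toy_tag { TOY_TAG_RECORD = 0, TOY_TAG_INT = 1, TOY_TAG_CHAR = 2 };

#define TOY_TAG_BITS 2
#define TOY_TAG_MASK ((intptr_t) ((1 << TOY_TAG_BITS) - 1))

typedef intptr_t toy_object;

/* Pack the integer N into a tagged word. */
static toy_object
toy_make_int (intptr_t n)
{
  /* Multiply rather than shift so negative values stay well defined. */
  return n * (1 << TOY_TAG_BITS) + TOY_TAG_INT;
}

/* Extract the tag from a tagged word. */
static enum toy_tag
toy_tag_of (toy_object obj)
{
  return (enum toy_tag) (obj & TOY_TAG_MASK);
}

/* Recover the integer stored in a word tagged as an integer. */
static intptr_t
toy_int_value (toy_object obj)
{
  assert (toy_tag_of (obj) == TOY_TAG_INT);
  return (obj - TOY_TAG_INT) / (1 << TOY_TAG_BITS);
}
```
@end verbatim

Because the whole integer lives inside the word, there is no separate
storage for the collector to mark or free, which is why such objects need
no protection from garbage collection.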
4408
  In the remaining two categories, the type is stored in the object
itself.  The tag for all such objects is the generic @dfn{lrecord}
(@code{Lisp_Type_Record}) tag.  The first bytes of the object's structure
are an integer (actually a char) characterizing the object's type and some
flags, in particular the mark bit used for garbage collection.  A
structure describing the type is accessible through the
@code{lrecord_implementation_table}, indexed with said integer.  This
structure includes the method pointers and a pointer to a string naming
the type.
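
The arrangement of a small header plus a table of per-type descriptions
might be sketched like this.  All of the @code{toy_} names are
hypothetical, only a mark method is shown, and the field widths are
guesses; the real header and implementation structures are defined in the
XEmacs sources and contain more than this.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; not the real XEmacs definitions. */
struct toy_lrecord_header {
  unsigned int type : 8;        /* index into the implementation table */
  unsigned int mark : 1;        /* set by the collector while marking */
};

struct toy_implementation {
  const char *name;             /* string naming the type */
  void (*marker) (struct toy_lrecord_header *obj);  /* mark method */
};

/* An object type embeds the header as its first member. */
struct toy_cons {
  struct toy_lrecord_header header;
  struct toy_lrecord_header *car, *cdr;
};

static void toy_mark_object (struct toy_lrecord_header *obj);

/* Mark method for conses: mark both halves. */
static void
toy_mark_cons (struct toy_lrecord_header *obj)
{
  struct toy_cons *c = (struct toy_cons *) obj;
  toy_mark_object (c->car);
  toy_mark_object (c->cdr);
}

enum { TOY_TYPE_CONS, TOY_TYPE_COUNT };

static const struct toy_implementation
toy_implementation_table[TOY_TYPE_COUNT] = {
  { "cons", toy_mark_cons }
};

/* Generic entry point: find the methods through the header's index. */
static void
toy_mark_object (struct toy_lrecord_header *obj)
{
  if (obj == NULL || obj->mark)
    return;                     /* nothing there, or already visited */
  assert (obj->type < TOY_TYPE_COUNT);
  obj->mark = 1;
  if (toy_implementation_table[obj->type].marker)
    toy_implementation_table[obj->type].marker (obj);
}
```
@end verbatim

Setting the mark bit before calling the mark method is what keeps the
traversal from looping forever on circular structures.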
4417
4418 @itemize @bullet
4419 @item
4420 (b) Those lrecords that are allocated in frob blocks (see above).  This
4421 includes the objects that are most common and relatively small, and
4422 includes conses, strings, subrs, floats, compiled functions, symbols,
4423 extents, events, and markers.  With the cleanup of frob blocks done in
4424 19.12, it's not terribly hard to add more objects to this category, but
4425 it's a bit trickier than adding an object type to type (c) (esp. if the
4426 object needs a finalization method), and is not likely to save much
4427 space unless the object is small and there are many of them. (In fact,
4428 if there are very few of them, it might actually waste space.)
4429 @item
4430 (c) Those lrecords that are individually @code{malloc()}ed.  These are
4431 called @dfn{lcrecords}.  All other types are in this category.  Adding a
4432 new type to this category is comparatively easy, and all types added
4433 since 19.8 (when the current allocation scheme was devised, by Richard
4434 Mlynarik), with the exception of the character type, have been in this
4435 category.
4436 @end itemize
4437
4438   Note that bit vectors are a bit of a special case.  They are
4439 simple lrecords as in category (b), but are individually @code{malloc()}ed
4440 like vectors.  You can basically view them as exactly like vectors
4441 except that their type is stored in lrecord fashion rather than
4442 in directly-tagged fashion.
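
  The split between directly represented values and record-style objects
can be illustrated with a small sketch.  It is not the XEmacs
representation: the two-bit tag layout, the header fields and every
@code{toy_} name are assumptions made up for this example.  It only
shows the idea that an integer lives in the word itself, while any other
object is found through a pointer whose header indexes a table of type
descriptions.

@verbatim
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tag layout, chosen only for this sketch. */
enum { TOY_TAG_RECORD = 0, TOY_TAG_INT = 1, TOY_TAG_CHAR = 2 };
#define TOY_TAG_BITS 2
#define TOY_TAG_MASK ((uintptr_t) 3)

typedef uintptr_t toy_object;

/* Header found at the start of every record-style object. */
struct toy_header {
  unsigned char type;   /* index into toy_implementation_table */
  unsigned char mark;   /* mark bit used by the collector */
};

struct toy_implementation { const char *name; };

static const struct toy_implementation toy_implementation_table[] = {
  { "cons" }, { "string" }
};

struct toy_cons { struct toy_header header; toy_object car, cdr; };

/* Category (a): the value is stored in the word itself. */
static toy_object toy_make_int(intptr_t n)
{
  assert(n >= 0);   /* the sketch handles non-negative values only */
  return ((uintptr_t) n << TOY_TAG_BITS) | TOY_TAG_INT;
}

static intptr_t toy_int_value(toy_object obj)
{
  assert((obj & TOY_TAG_MASK) == TOY_TAG_INT);
  return (intptr_t) (obj >> TOY_TAG_BITS);
}

/* Categories (b) and (c): the word is a pointer to a header. */
static toy_object toy_wrap_record(void *record)
{
  uintptr_t bits = (uintptr_t) record;
  assert((bits & TOY_TAG_MASK) == 0);   /* low bits must be free */
  return bits | TOY_TAG_RECORD;
}

static const char *toy_type_name(toy_object obj)
{
  switch (obj & TOY_TAG_MASK) {
  case TOY_TAG_INT:  return "integer";
  case TOY_TAG_CHAR: return "character";
  case TOY_TAG_RECORD: {
    const struct toy_header *h = (const struct toy_header *) obj;
    return toy_implementation_table[h->type].name;
  }
  default: return "unknown";
  }
}

/* Returns 1 if both kinds of object are decoded as expected. */
static int toy_demo_tags(void)
{
  static struct toy_cons cell = { { 0, 0 }, 0, 0 };
  toy_object i = toy_make_int(42);
  toy_object c = toy_wrap_record(&cell);
  return strcmp(toy_type_name(i), "integer") == 0
      && strcmp(toy_type_name(c), "cons") == 0
      && toy_int_value(i) == 42;
}
```
@end verbatim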
4443
4444
4445 @node Garbage Collection, GCPROing, Introduction to Allocation, Allocation of Objects in XEmacs Lisp
4446 @section Garbage Collection
4447 @cindex garbage collection
4448
4449 @cindex mark and sweep
4450   Garbage collection is simple in theory but tricky to implement.
4451 Emacs Lisp uses the oldest garbage collection method, called
4452 @dfn{mark and sweep}.  Garbage collection begins by starting with
4453 all accessible locations (i.e. all variables and other slots where
4454 Lisp objects might occur) and recursively traversing all objects
4455 accessible from those slots, marking each one that is found.
4456 We then go through all of memory and free each object that is
4457 not marked, and unmark each object that is marked.  Note
4458 that ``all of memory'' means all currently allocated objects.
4459 Traversing all these objects means traversing all frob blocks,
4460 all vectors (which are chained in one big list), and all
4461 lcrecords (which are likewise chained).
4462
4463   Garbage collection can be invoked explicitly by calling
4464 @code{garbage-collect} but is also called automatically by @code{eval},
4465 once a certain amount of memory has been allocated since the last
4466 garbage collection (according to @code{gc-cons-threshold}).
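
  As a minimal sketch of the mark and sweep idea described above,
consider a toy heap in which every object holds at most one reference
and all allocated objects are chained in one list.  The structure and
all @code{toy_} names are invented for this illustration and do not
correspond to the code in @code{alloc.c}.

@verbatim
```c
#include <assert.h>
#include <stdlib.h>

struct toy_obj {
  int marked;
  struct toy_obj *ref;     /* single outgoing reference, may be NULL */
  struct toy_obj *chain;   /* links every allocated object */
};

static struct toy_obj *toy_all_objects = NULL;

static struct toy_obj *toy_alloc(struct toy_obj *ref)
{
  struct toy_obj *o = malloc(sizeof *o);
  assert(o != NULL);
  o->marked = 0;
  o->ref = ref;
  o->chain = toy_all_objects;
  toy_all_objects = o;
  return o;
}

/* Mark phase: follow references from a root until we reach the end
   or an object that is already marked. */
static void toy_mark(struct toy_obj *o)
{
  while (o != NULL && !o->marked) {
    o->marked = 1;
    o = o->ref;
  }
}

/* Sweep phase: free unmarked objects, unmark the survivors.
   Returns the number of objects that were freed. */
static int toy_sweep(void)
{
  int freed = 0;
  struct toy_obj **link = &toy_all_objects;
  while (*link != NULL) {
    struct toy_obj *o = *link;
    if (o->marked) {
      o->marked = 0;
      link = &o->chain;
    } else {
      *link = o->chain;
      free(o);
      freed++;
    }
  }
  return freed;
}

/* Returns 1 if two collections behave as expected. */
static int toy_demo_mark_sweep(void)
{
  struct toy_obj *a = toy_alloc(NULL);
  struct toy_obj *b = toy_alloc(a);      /* b refers to a */
  struct toy_obj *c = toy_alloc(NULL);   /* unreachable */
  toy_alloc(c);                          /* unreachable, refers to c */
  toy_mark(b);                           /* b is the only root */
  int first = toy_sweep();               /* frees the two unreachable objects */
  int second = toy_sweep();              /* no root marked: frees a and b */
  return first == 2 && second == 2 && toy_all_objects == NULL;
}
```
@end verbatim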
4467
4468
4469 @node GCPROing, Garbage Collection - Step by Step, Garbage Collection, Allocation of Objects in XEmacs Lisp
4470 @section @code{GCPRO}ing
4471
4472 @code{GCPRO}ing is one of the ugliest and trickiest parts of Emacs
4473 internals.  The basic idea is that whenever garbage collection
4474 occurs, all in-use objects must be reachable somehow or
4475 other from one of the roots of accessibility.  The roots
4476 of accessibility are:
4477
4478 @enumerate
4479 @item
4480 All objects that have been @code{staticpro()}d or
4481 @code{staticpro_nodump()}ed.  This is used for any global C variables
4482 that hold Lisp objects.  A call to @code{staticpro()} happens implicitly
4483 as a result of any symbols declared with @code{defsymbol()} and any
4484 variables declared with @code{DEFVAR_FOO()}.  You need to explicitly
4485 call @code{staticpro()} (in the @code{vars_of_foo()} method of a module)
4486 for other global C variables holding Lisp objects. (This typically
4487 includes internal lists and such things.)  Use
4488 @code{staticpro_nodump()} only in the rare cases when you do not want
4489 the pointed variable to be saved at dump time but rather recompute it at
4490 startup.
4491
4492 Note that @code{obarray} is one of the @code{staticpro()}d things.
4493 Therefore, all functions and variables get marked through this.
4494 @item
4495 Any shadowed bindings that are sitting on the @code{specpdl} stack.
4496 @item
4497 Any objects sitting in currently active (Lisp) stack frames,
4498 catches, and condition cases.
4499 @item
4500 A couple of special-case places where active objects are
4501 located.
4502 @item
4503 Anything currently marked with @code{GCPRO}.
4504 @end enumerate
4505
4506   Marking with @code{GCPRO} is necessary because some C functions (quite
4507 a lot, in fact), allocate objects during their operation.  Quite
4508 frequently, there will be no other pointer to the object while the
4509 function is running, and if a garbage collection occurs and the object
4510 needs to be referenced again, bad things will happen.  The solution is
4511 to mark those objects with @code{GCPRO}.  Unfortunately this is easy to
4512 forget, and there is basically no way around this problem.  Here are
4513 some rules, though:
4514
4515 @enumerate
4516 @item
4517 For every @code{GCPRO@var{n}}, there have to be declarations of
4518 @code{struct gcpro gcpro1, gcpro2}, etc.
4519
4520 @item
4521 You @emph{must} @code{UNGCPRO} anything that's @code{GCPRO}ed, and you
4522 @emph{must not} @code{UNGCPRO} if you haven't @code{GCPRO}ed.  Getting
4523 either of these wrong will lead to crashes, often in completely random
4524 places unrelated to where the problem lies.
4525
4526 @item
4527 The way this actually works is that all currently active @code{GCPRO}s
4528 are chained through the @code{struct gcpro} local variables, with the
4529 variable @samp{gcprolist} pointing to the head of the list and the nth
4530 local @code{gcpro} variable pointing to the first @code{gcpro} variable
4531 in the next enclosing stack frame.  Each @code{GCPRO}ed thing is an
4532 lvalue, and the @code{struct gcpro} local variable contains a pointer to
4533 this lvalue.  This is why things will mess up badly if you don't pair up
4534 the @code{GCPRO}s and @code{UNGCPRO}s---you will end up with
4535 @code{gcprolist}s containing pointers to @code{struct gcpro}s or local
4536 @code{Lisp_Object} variables in no-longer-active stack frames.
4537
4538 @item
4539 It is actually possible for a single @code{struct gcpro} to
4540 protect a contiguous array of any number of values, rather than
4541 just a single lvalue.  To effect this, call @code{GCPRO@var{n}} as usual on
4542 the first object in the array and then set @code{gcpro@var{n}.nvars}.
4543
4544 @item
4545 @strong{Strings are relocated.}  What this means in practice is that the
4546 pointer obtained using @code{XSTRING_DATA()} is liable to change at any
4547 time, and you should never keep it around past any function call, or
4548 pass it as an argument to any function that might cause a garbage
4549 collection.  This is why a number of functions accept either a
4550 ``non-relocatable'' @code{char *} pointer or a relocatable Lisp string,
4551 and only access the Lisp string's data at the very last minute.  In some
4552 cases, you may end up having to @code{alloca()} some space and copy the
4553 string's data into it.
4554
4555 @item
4556 By convention, if you have to nest @code{GCPRO}'s, use @code{NGCPRO@var{n}}
4557 (along with @code{struct gcpro ngcpro1, ngcpro2}, etc.), @code{NNGCPRO@var{n}},
4558 etc.  This avoids compiler warnings about shadowed locals.
4559
4560 @item
4561 It is @emph{always} better to err on the side of extra @code{GCPRO}s
4562 rather than too few.  The extra cycles spent on this are
4563 almost never going to make a whit of difference in the
4564 speed of anything.
4565
4566 @item
4567 The general rule to follow is that caller, not callee, @code{GCPRO}s.
4568 That is, you should not have to explicitly @code{GCPRO} any Lisp objects
4569 that are passed in as parameters.
4570
4571 One exception from this rule is if you ever plan to change the parameter
4572 value, and store a new object in it.  In that case, you @emph{must}
4573 @code{GCPRO} the parameter, because otherwise the new object will not be
4574 protected.
4575
4576 So, if you create any Lisp objects (remember, this happens in all sorts
4577 of circumstances, e.g. with @code{Fcons()}, etc.), you are responsible
4578 for @code{GCPRO}ing them, unless you are @emph{absolutely sure} that
4579 there's no possibility that a garbage-collection can occur while you
4580 need to use the object.  Even then, consider @code{GCPRO}ing.
4581
4582 @item
4583 A garbage collection can occur whenever anything calls @code{Feval}, or
4584 whenever a QUIT can occur where execution can continue past
4585 this. (Remember, this is almost anywhere.)
4586
4587 @item
4588 If you have the @emph{least smidgeon of doubt} about whether
4589 you need to @code{GCPRO}, you should @code{GCPRO}.
4590
4591 @item
4592 Beware of @code{GCPRO}ing something that is uninitialized.  If you have
4593 any shade of doubt about this, initialize all your variables to @code{Qnil}.
4594
4595 @item
4596 Be careful of traps, like calling @code{Fcons()} in the argument to
4597 another function.  By the ``caller protects'' law, you should be
4598 @code{GCPRO}ing the newly-created cons, but you aren't.  A certain
4599 number of functions that are commonly called on freshly created stuff
4600 (e.g. @code{nconc2()}, @code{Fsignal()}), break the ``caller protects''
4601 law and go ahead and @code{GCPRO} their arguments so as to simplify
4602 things, but make sure and check if it's OK whenever doing something like
4603 this.
4604
4605 @item
4606 Once again, remember to @code{GCPRO}!  Bugs resulting from insufficient
4607 @code{GCPRO}ing are intermittent and extremely difficult to track down,
4608 often showing up in crashes inside of @code{garbage-collect} or in
4609 weirdly corrupted objects or even in incorrect values in a totally
4610 different section of code.
4611 @end enumerate
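
  The chaining described in the rules above can be sketched with a toy
model.  The macros below are simplified stand-ins written for this
example, not the definitions used by XEmacs; all @code{toy_} and
@code{TOY_} names are hypothetical.  The sketch shows how each
protection record links to the enclosing one, how @code{nvars} lets one
record cover a contiguous array, and why every protect must be paired
with an unprotect before the stack frame goes away.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

typedef long toy_lisp_object;   /* stand-in for a Lisp object word */

struct toy_gcpro {
  struct toy_gcpro *next;   /* enclosing protection record */
  toy_lisp_object *var;     /* address of the first protected lvalue */
  int nvars;                /* number of contiguous protected values */
};

static struct toy_gcpro *toy_gcprolist = NULL;

/* Both macros expect a local variable named toy_gcpro1. */
#define TOY_GCPRO1(v)                                        \
  (toy_gcpro1.next = toy_gcprolist, toy_gcpro1.var = &(v),   \
   toy_gcpro1.nvars = 1, toy_gcprolist = &toy_gcpro1)
#define TOY_UNGCPRO (toy_gcprolist = toy_gcpro1.next)

/* What a collector would do: walk the chain and visit each value. */
static int toy_count_protected(void)
{
  int n = 0;
  for (struct toy_gcpro *g = toy_gcprolist; g != NULL; g = g->next)
    n += g->nvars;
  return n;
}

static int toy_inner(void)
{
  toy_lisp_object items[3] = { 0, 0, 0 };   /* initialized before protecting */
  struct toy_gcpro toy_gcpro1;
  int seen;
  TOY_GCPRO1(items[0]);
  toy_gcpro1.nvars = 3;           /* protect the whole array */
  seen = toy_count_protected();   /* a collection could happen here */
  TOY_UNGCPRO;                    /* must happen before returning */
  return seen;
}

static int toy_outer(void)
{
  toy_lisp_object tem = 0;
  struct toy_gcpro toy_gcpro1;
  int seen;
  TOY_GCPRO1(tem);
  seen = toy_inner();   /* sees tem plus the three array slots */
  TOY_UNGCPRO;
  return seen;
}
```
@end verbatim

Calling @code{toy_outer} makes the walk see four protected values, and
the list is empty again afterwards.  Leaving out either
@code{TOY_UNGCPRO} would leave the list pointing into a dead stack
frame.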
4612
4613 @cindex garbage collection, conservative
4614 @cindex conservative garbage collection
4615   Given the extremely error-prone nature of the @code{GCPRO} scheme, and
4616 the difficulty of tracking down the resulting bugs, it should be considered a deficiency
4617 in the XEmacs code.  A solution to this problem would involve
4618 implementing so-called @dfn{conservative} garbage collection for the C
4619 stack.  That involves looking through all of stack memory and treating
4620 anything that looks like a reference to an object as a reference.  This
4621 will result in a few objects not getting collected when they should, but
4622 it obviates the need for @code{GCPRO}ing, and allows garbage collection
4623 to happen at any point at all, such as during object allocation.
4624
4625 @node Garbage Collection - Step by Step, Integers and Characters, GCPROing, Allocation of Objects in XEmacs Lisp
4626 @section Garbage Collection - Step by Step
4627 @cindex garbage collection step by step
4628
4629 @menu
4630 * Invocation::
4631 * garbage_collect_1::
4632 * mark_object::
4633 * gc_sweep::
4634 * sweep_lcrecords_1::
4635 * compact_string_chars::
4636 * sweep_strings::
4637 * sweep_bit_vectors_1::
4638 @end menu
4639
4640 @node Invocation, garbage_collect_1, Garbage Collection - Step by Step, Garbage Collection - Step by Step
4641 @subsection Invocation
4642 @cindex garbage collection, invocation
4643
4644 The first thing that anyone should know about garbage collection is:
4645 when and how the garbage collector is invoked. One might think that this
4646 could happen every time new memory is allocated, e.g. new objects are
4647 created, but this is @emph{not} the case. Instead, we have the following
4648 situation:
4649
4650 The entry point of any process of garbage collection is an invocation
4651 of the function @code{garbage_collect_1} in file @code{alloc.c}. The
4652 invocation can occur @emph{explicitly} by calling the function
4653 @code{Fgarbage_collect} (in addition this function provides information
4654 about the freed memory), or can occur @emph{implicitly} in four different
4655 situations:
4656 @enumerate
4657 @item
4658 In function @code{main_1} in file @code{emacs.c}. This function is called
4659 at each startup of xemacs. The garbage collection is invoked after all
4660 initial creations are completed, but only if a special internal error
4661 checking-constant @code{ERROR_CHECK_GC} is defined.
4662 @item
4663 In function @code{disksave_object_finalization} in file
4664 @code{alloc.c}. The only purpose of this function is to clear the
4665 objects from memory which need not be stored with xemacs when we dump out
4666 an executable. This is only done by @code{Fdump_emacs} or by
4667 @code{Fdump_emacs_data} respectively (both in @code{emacs.c}). The
4668 actual clearing is accomplished by making these objects unreachable and
4669 starting a garbage collection. The function is only used while building
4670 xemacs.
4671 @item
4672 In function @code{Feval / eval} in file @code{eval.c}. Each time the
4673 well known and often used function eval is called to evaluate a form,
4674 one of the first things that could happen, is a potential call of
4675 @code{garbage_collect_1}. There exist three global variables,
4676 @code{consing_since_gc} (counts the created cons-cells since the last
4677 garbage collection), @code{gc_cons_threshold} (a specified threshold
4678 after which a garbage collection occurs) and @code{always_gc}. If
4679 @code{always_gc} is set or if the threshold is exceeded, the garbage
4680 collection will start.
4681 @item
4682 In function @code{Ffuncall / funcall} in file @code{eval.c}. This
4683 function evaluates calls of elisp functions and works according to
4684 @code{Feval}.
4685 @end enumerate
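
The threshold test made on entry to @code{Feval} and @code{Ffuncall}
can be sketched as follows.  The variable names mirror the ones
mentioned above but carry a @code{toy_} prefix, since this is a
stand-alone model; the exact comparison and the unit in which
allocation is counted are assumptions.

@verbatim
```c
#include <assert.h>

static long toy_consing_since_gc = 0;     /* allocation since last collection */
static long toy_gc_cons_threshold = 1000; /* collect once this is exceeded */
static int  toy_always_gc = 0;            /* debugging aid: collect every time */
static int  toy_gc_runs = 0;

static void toy_garbage_collect(void)
{
  toy_gc_runs++;
  toy_consing_since_gc = 0;   /* the counter is reset by each collection */
}

static void toy_note_allocation(long amount)
{
  assert(amount >= 0);
  toy_consing_since_gc += amount;
}

/* Called at the start of every evaluation in this model. */
static void toy_eval_entry(void)
{
  if (toy_always_gc || toy_consing_since_gc > toy_gc_cons_threshold)
    toy_garbage_collect();
}

/* Returns the number of collections triggered by three evaluations. */
static int toy_demo_threshold(void)
{
  toy_note_allocation(600);
  toy_eval_entry();   /* below the threshold: nothing happens */
  toy_note_allocation(600);
  toy_eval_entry();   /* threshold exceeded: one collection */
  toy_eval_entry();   /* counter was reset: nothing happens */
  return toy_gc_runs;
}
```
@end verbatim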
4686
4687 The upshot is that garbage collection can basically occur everywhere
4688 @code{Feval}, respectively @code{Ffuncall}, is used - either directly or
4689 through another function. Since calls to these two functions are hidden
4690 in various other functions, many calls to @code{garbage_collect_1} are
4691 not obviously foreseeable, and therefore unexpected. Instances where
4692 they are used that are worth remembering are various elisp commands, as
4693 for example @code{or}, @code{and}, @code{if}, @code{cond}, @code{while},
4694 @code{setq}, etc., miscellaneous @code{gui_item_...} functions,
4695 everything related to @code{eval} (@code{Feval_buffer}, @code{call0},
4696 ...) and inside @code{Fsignal}. The latter is used to handle signals, as
4697 for example the ones raised by every @code{QUIT}-macro triggered after
4698 pressing Ctrl-g.
4699
4700 @node garbage_collect_1, mark_object, Invocation, Garbage Collection - Step by Step
4701 @subsection @code{garbage_collect_1}
4702 @cindex @code{garbage_collect_1}
4703
4704 We can now describe exactly what happens after the invocation takes
4705 place.
4706 @enumerate
4707 @item
4708 There are several cases in which the garbage collector is left immediately:
4709 when we are already garbage collecting (@code{gc_in_progress}), when
4710 the garbage collection is somehow forbidden
4711 (@code{gc_currently_forbidden}), when we are currently displaying something
4712 (@code{in_display}) or when we are preparing for the armageddon of the
4713 whole system (@code{preparing_for_armageddon}).
4714 @item
4715 Next the correct frame in which to put
4716 all the output occurring during garbage collecting is determined. In
4717 order to be able to restore the old display's state after displaying the
4718 message, some data about the current cursor position has to be
4719 saved. The variables @code{pre_gc_cursor} and @code{cursor_changed} take
4720 care of that.
4721 @item
4722 The state of @code{gc_currently_forbidden} must be restored after
4723 the garbage collection, no matter what happens during the process. We
4724 accomplish this by @code{record_unwind_protect}ing the suitable function
4725 @code{restore_gc_inhibit} together with the current value of
4726 @code{gc_currently_forbidden}.
4727 @item
4728 If we are concurrently running an interactive xemacs session, the next step
4729 is simply to show the garbage collector's cursor/message.
4730 @item
4731 The following steps are the intrinsic steps of the garbage collector,
4732 therefore @code{gc_in_progress} is set.
4733 @item
4734 For debugging purposes, it is possible to copy the current C stack
4735 frame. However, this seems to be a currently unused feature.
4736 @item
4737 Before actually starting to go over all live objects, references to
4738 objects that are no longer used are pruned. We only have to do this for events
4739 (@code{clear_event_resource}) and for specifiers
4740 (@code{cleanup_specifiers}).
4741 @item
4742 Now the mark phase begins and marks all accessible elements. In order to
4743 start from
4744 all slots that serve as roots of accessibility, the function
4745 @code{mark_object} is called for each root individually to go out from
4746 there to mark all reachable objects. All roots that are traversed are
4747 shown in their processed order:
4748 @itemize @bullet
4749 @item
4750 all constant symbols and static variables that are registered via
4751 @code{staticpro}@ in the array @code{staticvec}.
4752 @xref{Adding Global Lisp Variables}.
4753 @item
4754 all Lisp objects that are created in C functions and that must be
4755 protected from freeing them. They are registered in the global
4756 list @code{gcprolist}.
4757 @xref{GCPROing}.
4758 @item
4759 all local variables (i.e. their name fields @code{symbol} and old
4760 values @code{old_values}) that are bound during the evaluation by the Lisp
4761 engine. They are stored in @code{specbinding} structs pushed on a stack
4762 called @code{specpdl}.
4763 @xref{Dynamic Binding; The specbinding Stack; Unwind-Protects}.
4764 @item
4765 all catch blocks that the Lisp engine encounters during the evaluation
4766 cause the creation of structs @code{catchtag} inserted in the list
4767 @code{catchlist}. Their tag (@code{tag}) and value (@code{val}) fields
4768 are freshly created objects and therefore have to be marked.
4769 @xref{Catch and Throw}.
4770 @item
4771 every function application pushes new structs @code{backtrace}
4772 on the call stack of the Lisp engine (@code{backtrace_list}). The unique
4773 parts that have to be marked are the fields for each function
4774 (@code{function}) and all their arguments (@code{args}).
4775 @xref{Evaluation}.
4776 @item
4777 all objects that are used by the redisplay engine that must not be freed
4778 are marked by a special function called @code{mark_redisplay} (in
4779 @code{redisplay.c}).
4780 @item
4781 all objects created for profiling purposes are allocated by C functions
4782 instead of using the lisp allocation mechanisms. In order to receive the
4783 right ones during the sweep phase, they also have to be marked
4784 manually. That is done by the function @code{mark_profiling_info}.
4785 @end itemize
4786 @item
4787 Hash tables in XEmacs belong to a special kind of object that
4788 makes use of a concept often called 'weak pointers'.
4789 To make a long story short, this kind of pointer is not followed
4790 when the live objects are determined during garbage collection.
4791 Any object referenced only by weak pointers is collected
4792 anyway, and the reference to it is cleared. In hash tables there are
4793 different usage patterns of them, manifesting in different types of hash
4794 tables, namely 'non-weak', 'weak', 'key-weak' and 'value-weak'
4795 (internally also 'key-car-weak' and 'value-car-weak') hash tables, each
4796 clearing entries depending on different conditions. More information can
4797 be found in the documentation to the function @code{make-hash-table}.
4798
4799 Because there are complicated dependency rules about when and what to
4800 mark while processing weak hash tables, the standard @code{marker}
4801 method is only active if it is marking non-weak hash tables. As soon as
4802 a weak component is in the table, the hash table entries are ignored
4803 while marking. Instead, they are marked separately by the
4804 function @code{finish_marking_weak_hash_tables}. This function iterates
4805 over each hash table entry @code{hentries} for each weak hash table in
4806 @code{Vall_weak_hash_tables}. Depending on the type of a table, the
4807 appropriate action is performed.
4808 If a table is acting as @code{HASH_TABLE_KEY_WEAK} and a key is already marked,
4809 everything reachable from the @code{value} component is marked. If it is
4810 acting as a @code{HASH_TABLE_VALUE_WEAK} and the value component is
4811 already marked, everything reachable from the
4812 @code{key} component is marked.
4813 If it is a @code{HASH_TABLE_KEY_CAR_WEAK} and the car
4814 of the key entry is already marked, we mark both the @code{key} and
4815 @code{value} components.
4816 Finally, if the table is of the type @code{HASH_TABLE_VALUE_CAR_WEAK}
4817 and the car of the value components is already marked, again both the
4818 @code{key} and the @code{value} components get marked.
4819
4820 Again, there are lists with comparable properties called weak
4821 lists. There exist different peculiarities of their types called
4822 @code{simple}, @code{assoc}, @code{key-assoc} and
4823 @code{value-assoc}. You can find further details about them in the
4824 description to the function @code{make-weak-list}. The scheme of their
4825 marking is similar: all weak lists are listed in @code{Vall_weak_lists},
4826 therefore we iterate over them. The marking is advanced until we hit an
4827 already marked pair. Then we know that during a former run all
4828 the rest has been marked completely. Again, depending on the special
4829 type of the weak list, our jobs differ. If it is a @code{WEAK_LIST_SIMPLE}
4830 and the elem is marked, we mark the @code{cons} part. If it is a
4831 @code{WEAK_LIST_ASSOC} and not a pair or a pair with both marked car and
4832 cdr, we mark the @code{cons} and the @code{elem}. If it is a
4833 @code{WEAK_LIST_KEY_ASSOC} and not a pair or a pair with a marked car of
4834 the elem, we mark the @code{cons} and the @code{elem}. Finally, if it is
4835 a @code{WEAK_LIST_VALUE_ASSOC} and not a pair or a pair with a marked
4836 cdr of the elem, we mark both the @code{cons} and the @code{elem}.
4837
4838 Since, by marking objects in reach from weak hash tables and weak lists,
4839 other objects could get marked, this perhaps implies further marking of
4840 other weak objects, both finishing functions are redone as long as
4841 yet unmarked objects get freshly marked.
4842
4843 @item
4844 After completing the special marking for the weak hash tables and for the weak
4845 lists, all entries that point to objects that are going to be swept in
4846 the further process are useless, and therefore have to be removed from
4847 the table or the list.
4848
4849 The function @code{prune_weak_hash_tables} does the job for weak hash
4850 tables. Totally unmarked hash tables are removed from the list
4851 @code{Vall_weak_hash_tables}. The other ones are treated more carefully
4852 by scanning over all entries and removing one as soon as one of
4853 the components @code{key} and @code{value} is unmarked.
4854
4855 The same idea applies to the weak lists. It is accomplished by
4856 @code{prune_weak_lists}: An unmarked list is pruned from
4857 @code{Vall_weak_lists} immediately. A marked list is treated more
4858 carefully by going over it and removing just the unmarked pairs.
4859
4860 @item
4861 The function @code{prune_specifiers} checks all listed specifiers held
4862 in @code{Vall_specifiers} and removes the ones from the lists that are
4863 unmarked.
4864
4865 @item
4866 All syntax tables are stored in a list called
4867 @code{Vall_syntax_tables}. The function @code{prune_syntax_tables} walks
4868 through it and unlinks the tables that are unmarked.
4869
4870 @item
4871 Next comes the complete sweep, which is carried out by the function
4872 @code{gc_sweep}.
4873 @item
4874 First, all the variables with respect to garbage collection are
4875 reset. @code{consing_since_gc} - the counter of the created cells since
4876 the last garbage collection - is set back to 0, and
4877 @code{gc_in_progress} is not @code{true} anymore.
4878 @item
4879 In case the session is interactive, the displayed cursor and message are
4880 removed again.
4881 @item
4882 The state of @code{gc_currently_forbidden} is restored to the former value by
4883 unwinding the stack.
4884 @item
4885 A small memory reserve is always held back that can be reached by
4886 @code{breathing_space}. If nothing more is left, we create a new reserve
4887 and exit.
4888 @end enumerate
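
The repeated marking of weak hash tables, followed by pruning, can be
sketched for the key-weak case.  This is a toy model with invented
@code{toy_} names, not the code behind
@code{finish_marking_weak_hash_tables} or @code{prune_weak_hash_tables}.
The toy objects have no outgoing references, so ``everything reachable
from the value'' reduces to the value itself.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

struct toy_obj { int marked; };

struct toy_entry { struct toy_obj *key, *value; };

/* One pass over a key-weak table: a value is marked only if its key
   is already marked.  Returns 1 if anything was newly marked. */
static int toy_finish_marking_key_weak(struct toy_entry *e, size_t n)
{
  int did_mark = 0;
  for (size_t i = 0; i < n; i++)
    if (e[i].key->marked && !e[i].value->marked) {
      e[i].value->marked = 1;
      did_mark = 1;
    }
  return did_mark;
}

/* Drop every entry that has an unmarked key or value.
   Returns the number of entries that remain. */
static size_t toy_prune(struct toy_entry *e, size_t n)
{
  size_t kept = 0;
  for (size_t i = 0; i < n; i++)
    if (e[i].key->marked && e[i].value->marked)
      e[kept++] = e[i];
  return kept;
}

/* Returns 1 if marking needs two productive passes and pruning
   removes exactly the entry whose key stayed unmarked. */
static int toy_demo_weak(void)
{
  struct toy_obj a = { 1 }, b = { 0 }, c = { 0 }, d = { 0 }, x = { 0 };
  /* The entry keyed by b comes first, so c can only be marked in a
     second pass, after b was marked through the entry keyed by a. */
  struct toy_entry table[] = { { &b, &c }, { &a, &b }, { &d, &x } };
  int passes = 0;
  assert(a.marked);   /* a plays the role of an object marked from a root */
  while (toy_finish_marking_key_weak(table, 3))
    passes++;
  size_t kept = toy_prune(table, 3);
  return passes == 2 && kept == 2 && b.marked && c.marked && !x.marked;
}
```
@end verbatim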
4889
4890 @node mark_object, gc_sweep, garbage_collect_1, Garbage Collection - Step by Step
4891 @subsection @code{mark_object}
4892 @cindex @code{mark_object}
4893
4894 The first thing that is checked while marking an object is whether the
4895 object is a real Lisp object @code{Lisp_Type_Record} or just an integer
4896 or a character. Integers and characters are the only two types that are
4897 stored directly - without another level of indirection, and therefore they
4898 don't have to be marked and collected.
4899 @xref{How Lisp Objects Are Represented in C}.
4900
4901 The second case is the one we have to handle: we are
4902 dealing with a pointer to a Lisp object. Even then, there are three
4903 situations in which nothing is done while marking. First, the
4904 object is read only, which exempts it from garbage collection and
4905 hence from marking (@code{C_READONLY_RECORD_HEADER}). Second, the object is
4906 already marked and need not be marked a second time (checked by
4907 @code{MARKED_RECORD_HEADER_P}). Third, it is a special, unmarkable object
4908 (@code{UNMARKABLE_RECORD_HEADER_P}; apparently, these are objects that
4909 sit in some const space and therefore cannot be marked, see
4910 @code{this_one_is_unmarkable} in @code{alloc.c}).
4911
4912 Now, the actual marking is feasible. We do so by once using the macro
4913 @code{MARK_RECORD_HEADER} to mark the object itself (actually the
4914 special flag in the lrecord header), and calling its special marker
4915 "method" @code{marker} if available. The marker method marks every
4916 other object that is in reach from our current object. Note, that these
4917 marker methods should not call @code{mark_object} recursively, but
4918 instead should return the next object from where further marking has to
4919 be performed.
4920
4921 In case another object was returned, as mentioned before, we reiterate
4922 the whole @code{mark_object} process beginning with this next object.
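
The convention that a marker method returns the next object, instead
of recursing, can be sketched like this.  The object layout and all
@code{toy_} names are assumptions made for the example; each toy object
has a single outgoing reference so that the marker has exactly one
object to hand back.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

struct toy_obj;
typedef struct toy_obj *(*toy_marker)(struct toy_obj *);

struct toy_obj {
  int marked;
  int readonly;
  toy_marker marker;      /* may be NULL for objects without references */
  struct toy_obj *next;   /* the only outgoing reference in this model */
};

static int toy_marked_count = 0;

/* Marker method: does not recurse, just reports where to continue. */
static struct toy_obj *toy_mark_link(struct toy_obj *obj)
{
  return obj->next;
}

static void toy_mark_object(struct toy_obj *obj)
{
  while (obj != NULL) {
    if (obj->readonly || obj->marked)
      return;                  /* nothing to do for this object */
    obj->marked = 1;
    toy_marked_count++;
    if (obj->marker == NULL)
      return;
    obj = obj->marker(obj);    /* continue with the returned object */
  }
}

/* Returns 1 if a three-element cycle is marked exactly once and a
   read-only object is left alone. */
static int toy_demo_mark_object(void)
{
  struct toy_obj a = { 0, 0, toy_mark_link, NULL };
  struct toy_obj b = { 0, 0, toy_mark_link, NULL };
  struct toy_obj c = { 0, 0, toy_mark_link, NULL };
  struct toy_obj r = { 0, 1, toy_mark_link, NULL };
  a.next = &b;
  b.next = &c;
  c.next = &a;                 /* cycle: marking must still terminate */
  toy_marked_count = 0;
  toy_mark_object(&a);
  toy_mark_object(&r);
  return toy_marked_count == 3 && !r.marked;
}
```
@end verbatim

Because the loop replaces the tail call, the depth of the C stack does
not grow with the length of the chain being marked.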
4923
4924 @node gc_sweep, sweep_lcrecords_1, mark_object, Garbage Collection - Step by Step
4925 @subsection @code{gc_sweep}
4926 @cindex @code{gc_sweep}
4927
4928 The job of this function is to free all unmarked records from memory. As
4929 we know, there are different types of objects implemented and managed, and
4930 consequently different ways to free them from memory.
4931 @xref{Introduction to Allocation}.
4932
4933 We start with all objects stored through @code{lcrecords}. All
4934 bulkier objects are allocated and handled using that scheme of
4935 @code{lcrecords}. Each object is @code{malloc}ed separately
4936 instead of placing it in one of the contiguous frob blocks. All types
4937 that are currently stored
4938 using @code{lcrecords}'s  @code{alloc_lcrecord} and
4939 @code{make_lcrecord_list} are the types: vectors, buffers,
4940 char-table, char-table-entry, console, weak-list, database, device,
4941 ldap, hash-table, command-builder, extent-auxiliary, extent-info, face,
4942 coding-system, frame, image-instance, glyph, popup-data, gui-item,
4943 keymap, charset, color_instance, font_instance, opaque, opaque-list,
4944 process, range-table, specifier, symbol-value-buffer-local,
4945 symbol-value-lisp-magic, symbol-value-varalias, toolbar-button,
4946 tooltalk-message, tooltalk-pattern, window, and window-configuration. We
4947 take care of them in the first place
4948 in order to be able to handle and to finalize items stored in them more
4949 easily. The function @code{sweep_lcrecords_1} as described below is
4950 doing the whole job for us.
4951 For a description about the internals: @xref{lrecords}.
4952
4953 Our next candidates are the other objects that behave quite differently
4954 than everything else: the strings. They consist of two parts, a
4955 fixed-size portion (@code{struct Lisp_string}) holding the string's
4956 length, its property list and a pointer to the second part, and the
4957 actual string data, which is stored in string-chars blocks comparable to
4958 frob blocks. In this block, the data is not only freed, but also a
4959 compression of holes is made, i.e. all strings are relocated together.
4960 @xref{String}. This compacting phase is performed by the function
4961 @code{compact_string_chars}, the actual sweeping by the function
4962 @code{sweep_strings} is described below.
4963
4964 After that, the other types are swept step by step using functions
4965 @code{sweep_conses}, @code{sweep_bit_vectors_1},
4966 @code{sweep_compiled_functions}, @code{sweep_floats},
4967 @code{sweep_symbols}, @code{sweep_extents}, @code{sweep_markers} and
4968 @code{sweep_events}.  They are the fixed-size types cons, floats,
4969 compiled-functions, symbol, marker, extent, and event stored in
4970 so-called "frob blocks", and therefore we can basically do the same for
4971 objects of every type, using the same macros, defined specifically to
4972 handle everything with respect to fixed-size blocks. The only fixed-size
4973 type that is not handled here is the fixed-size portion of strings,
4974 because we took special care of it earlier.
4975
4976 The only big exceptions are bit vectors stored differently and
4977 therefore treated differently by the function @code{sweep_bit_vectors_1}
4978 described later.
4979
4980 At first, we need some brief information about how
4981 these fixed-size types are managed in general, in order to understand
4982 how the sweeping is done. They have all a fixed size, and are therefore
4983 stored in big blocks of memory - allocated at once - that can hold a
4984 certain amount of objects of one type. The macro
4985 @code{DECLARE_FIXED_TYPE_ALLOC} creates the suitable structures for
4986 every type. More precisely, we have the block struct
4987 (holding a pointer to the previous block @code{prev} and the
4988 objects in @code{block[]}), a pointer to current block
4989 (@code{current_..._block}) and its last index
4990 (@code{current_..._block_index}), and a pointer to the free list that
4991 will be created. Also a macro @code{FIXED_TYPE_FROM_BLOCK} plus some
4992 related macros exists that are used to obtain a new object, either from
4993 the free list @code{ALLOCATE_FIXED_TYPE_1} if there is an unused object
4994 of that type stored or by allocating a completely new block using
4995 @code{ALLOCATE_FIXED_TYPE_FROM_BLOCK}.
4996
4997 The rest works as follows: all of them define a
4998 macro @code{UNMARK_...} that is used to unmark the object. They define a
4999 macro @code{ADDITIONAL_FREE_...} that defines additional work that has
5000 to be done when converting an object from in use to not in use (so far,
5001 only markers use it in order to unchain them). Then, they all call
5002 the macro @code{SWEEP_FIXED_TYPE_BLOCK} instantiated with their type name
5003 and their struct name.
5004
5005 This call in particular does the following: we go over all blocks
5006 starting with the current moving towards the oldest.
5007 For each block, we look at every object in it. If the object is already
5008 freed (checked with @code{FREE_STRUCT_P} using the first pointer of the
5009 object), or if it is
5010 set to read only (@code{C_READONLY_RECORD_HEADER_P}), nothing needs to be
5011 done. If it is unmarked (checked with @code{MARKED_RECORD_HEADER_P}), it
5012 is put in the free list and set free (using the macro
5013 @code{FREE_FIXED_TYPE}); otherwise it stays in the block, but is unmarked
5014 (by @code{UNMARK_...}). While going through one block, we note if the
5015 whole block is empty. If so, the whole block is freed (using
5016 @code{xfree}) and the free list state is set to the state it had before
5017 handling this block.
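
The block-wise sweep can be sketched with a toy model.  It is not the
@code{SWEEP_FIXED_TYPE_BLOCK} macro: all @code{toy_} names are invented,
read-only objects are left out, and the sketch rebuilds the free list
from scratch (already free cells are chained again), which is an
assumption made here so that no free-list entry can point into a block
that has been released.

@verbatim
```c
#include <assert.h>
#include <stdlib.h>

#define TOY_PER_BLOCK 4

struct toy_cell {
  int is_free;
  int marked;
  struct toy_cell *next_free;
};

struct toy_block {
  struct toy_block *prev;                 /* next older block */
  struct toy_cell cells[TOY_PER_BLOCK];
};

static struct toy_block *toy_current_block = NULL;
static struct toy_cell *toy_free_list = NULL;

/* New blocks start with every cell in use and unmarked. */
static struct toy_block *toy_new_block(void)
{
  struct toy_block *b = calloc(1, sizeof *b);
  assert(b != NULL);
  b->prev = toy_current_block;
  toy_current_block = b;
  return b;
}

/* Sweeps from the current block towards the oldest one.
   Returns the number of blocks given back to the system. */
static int toy_sweep_blocks(void)
{
  int released = 0;
  struct toy_block **link = &toy_current_block;
  toy_free_list = NULL;
  while (*link != NULL) {
    struct toy_block *b = *link;
    struct toy_cell *saved_free_list = toy_free_list;
    int empty = 1;
    for (int i = 0; i < TOY_PER_BLOCK; i++) {
      struct toy_cell *c = &b->cells[i];
      if (!c->is_free && c->marked) {
        c->marked = 0;          /* survivor: unmark for the next cycle */
        empty = 0;
      } else {
        c->is_free = 1;         /* dead or already free: chain it */
        c->next_free = toy_free_list;
        toy_free_list = c;
      }
    }
    if (empty) {
      toy_free_list = saved_free_list;   /* forget this block's cells */
      *link = b->prev;
      free(b);
      released++;
    } else {
      link = &b->prev;
    }
  }
  return released;
}

/* Returns 1 if the empty block is released and the other one keeps
   its single survivor. */
static int toy_demo_sweep_blocks(void)
{
  struct toy_block *older = toy_new_block();
  toy_new_block();                 /* becomes the current block */
  older->cells[0].marked = 1;      /* the only live object */
  int released = toy_sweep_blocks();
  int free_cells = 0;
  for (struct toy_cell *c = toy_free_list; c != NULL; c = c->next_free)
    free_cells++;
  int ok = released == 1 && toy_current_block == older
        && free_cells == 3 && older->cells[0].marked == 0;
  free(older);
  toy_current_block = NULL;
  toy_free_list = NULL;
  return ok;
}
```
@end verbatim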
5018
5019 @node sweep_lcrecords_1, compact_string_chars, gc_sweep, Garbage Collection - Step by Step
5020 @subsection @code{sweep_lcrecords_1}
5021 @cindex @code{sweep_lcrecords_1}
5022
5023 After nullifying the complete lcrecord statistics, we go over all
5024 lcrecords two separate times. They are all chained together in a list with
5025 a head called @code{all_lcrecords}.
5026
The first loop calls each object's @code{finalizer} method, but only
if the object is not read only
(@code{C_READONLY_RECORD_HEADER_P}), is not marked
(@code{MARKED_RECORD_HEADER_P}), is not already in a free list (list of
freed objects, field @code{free}), and actually has a finalizer
method.
5033
The second loop actually frees the appropriate objects, again by iterating
through the whole list.  If an object is read only or marked, it
has to persist; otherwise it is manually freed by calling
@code{xfree}.  During this loop, the lcrecord statistics are kept up to
date by calling @code{tick_lcrecord_stats} with the right arguments.
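
The two passes can be sketched roughly as follows.  The names
@code{record}, @code{all_records}, @code{add_record} and
@code{sweep_records} are hypothetical stand-ins, statistics are omitted,
and the unmarking of survivors in the second pass is an assumption of
the sketch rather than something stated above.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct record {
  struct record *next;                 /* chain of all records */
  int readonly, marked, on_free_list;  /* stand-ins for the header checks */
  void (*finalizer)(struct record *);  /* may be NULL */
} record;

static record *all_records;
static int finalized_count;

static void count_finalizer(record *r) { (void) r; finalized_count++; }

/* Test helper: allocate a record and push it on the global chain. */
static record *add_record(int marked)
{
  record *r = calloc(1, sizeof *r);
  if (!r) abort();
  r->marked = marked;
  r->finalizer = count_finalizer;
  r->next = all_records;
  all_records = r;
  return r;
}

/* Returns the number of records that survive the sweep. */
static int sweep_records(void)
{
  int survivors = 0;

  /* Pass 1: run finalizers on everything that is about to go away. */
  for (record *r = all_records; r; r = r->next)
    if (!r->readonly && !r->marked && !r->on_free_list && r->finalizer)
      r->finalizer(r);

  /* Pass 2: unlink and free the dead records, keep the rest. */
  record **link = &all_records;
  while (*link) {
    record *r = *link;
    if (r->readonly || r->marked) {
      r->marked = 0;                   /* assumption: unmark survivors */
      survivors++;
      link = &r->next;
    } else {
      *link = r->next;
      free(r);
    }
  }
  return survivors;
}
```

Running all finalizers before freeing anything means a finalizer never
sees a neighbouring record that has already been released.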
5039
5040 @node compact_string_chars, sweep_strings, sweep_lcrecords_1, Garbage Collection - Step by Step
5041 @subsection @code{compact_string_chars}
5042 @cindex @code{compact_string_chars}
5043
The purpose of this function is to compact all the data parts of the
strings that are held in so-called @code{string_chars_block}s, i.e. the
strings that do not exceed a certain maximal length.

The procedure is as follows.  We keep two
positions in the @code{string_chars_block}s using two pointer/integer
pairs, namely @code{from_sb}/@code{from_pos} and
@code{to_sb}/@code{to_pos}.  They mark the current positions from
which, and to which, the string being handled is copied.

While going over all chained @code{string_chars_block}s and the
strings they hold, starting at @code{first_string_chars_block}, both
positions are advanced, and a string is copied from @code{from_sb} to
@code{to_sb} when necessary, depending on the status of the string
pointed at.
5058
5059 More precisely, we can distinguish between the following actions.
5060 @itemize @bullet
5061 @item
The string at @code{from_sb}'s position could be marked as free, which
is indicated by an invalid value in the pointer that should point back
to the fixed-size string object, and which is checked by
@code{FREE_STRUCT_P}.  In this case, the @code{from_sb}/@code{from_pos}
pair is advanced to the next string, and nothing has to be copied.
5067 @item
5068 Also, if a string object itself is unmarked, nothing has to be
5069 copied. We likewise advance the @code{from_sb}/@code{from_pos}
5070 pair as described above.
5071 @item
In all other cases, we have a marked string at hand.  The string data
must be moved from the from-position to the to-position.  If
there is not enough space in the current @code{to_sb} block, we advance
this pointer to the beginning of the next block before copying.  If the
from and to positions are different, we perform the
actual copying using the library function @code{memmove}.
5078 @end itemize
5079
After compacting, the pointer to the current
@code{string_chars_block}, held in @code{current_string_chars_block},
is reset to the last block to which we moved a string,
i.e. @code{to_sb}, and all remaining blocks (we know that they just
carry garbage) are explicitly @code{xfree}d.
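
A minimal sketch of this from/to compaction is given below.  It is not
the real @code{compact_string_chars}: the types @code{lstring},
@code{chunk_header} and @code{chars_block} and the functions
@code{store} and @code{compact} are hypothetical, each chunk carries its
size explicitly, and a null owner pointer stands in for the free check.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_BYTES 64                 /* tiny on purpose */

typedef struct lstring { int marked; unsigned char *data; } lstring;
typedef struct chunk_header { lstring *owner; size_t size; } chunk_header;

typedef struct chars_block {
  struct chars_block *next;
  size_t pos;                          /* bytes in use */
  unsigned char chars[BLOCK_BYTES];
} chars_block;

/* Test helper: append a header plus the text to block b. */
static void store(chars_block *b, lstring *owner, const char *text)
{
  chunk_header h = { owner, strlen(text) + 1 };
  assert(b->pos + sizeof h + h.size <= BLOCK_BYTES);
  memcpy(b->chars + b->pos, &h, sizeof h);
  memcpy(b->chars + b->pos + sizeof h, text, h.size);
  if (owner) owner->data = b->chars + b->pos + sizeof h;
  b->pos += sizeof h + h.size;
}

/* Slide all marked strings towards the first block and return the
   last block still in use; later blocks hold only garbage. */
static chars_block *compact(chars_block *first)
{
  chars_block *to_sb = first;
  size_t to_pos = 0;

  for (chars_block *from_sb = first; from_sb; from_sb = from_sb->next) {
    size_t used = from_sb->pos;
    size_t from_pos = 0;
    while (from_pos < used) {
      chunk_header h;
      memcpy(&h, from_sb->chars + from_pos, sizeof h);
      size_t full = sizeof h + h.size;
      if (h.owner && h.owner->marked) {
        if (to_pos + full > BLOCK_BYTES) {   /* no room: next block */
          to_sb->pos = to_pos;
          to_sb = to_sb->next;
          to_pos = 0;
        }
        unsigned char *dst = to_sb->chars + to_pos;
        if (dst != from_sb->chars + from_pos)
          memmove(dst, from_sb->chars + from_pos, full);
        h.owner->data = dst + sizeof h;      /* relocate the back link */
        to_pos += full;
      }
      from_pos += full;                      /* free or unmarked: skip */
    }
  }
  to_sb->pos = to_pos;
  return to_sb;
}
```

Because the to-position never overtakes the from-position,
@code{memmove} only ever overwrites data that has already been examined.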
5085
5086 @node sweep_strings, sweep_bit_vectors_1, compact_string_chars, Garbage Collection - Step by Step
5087 @subsection @code{sweep_strings}
5088 @cindex @code{sweep_strings}
5089
The sweeping for the fixed-size string objects is essentially
the same as it is for all other fixed-size types.  As before, the freeing
into the suitable free list is done by using the macro
@code{SWEEP_FIXED_TYPE_BLOCK} after defining the right macros
@code{UNMARK_string} and @code{ADDITIONAL_FREE_string}.  These two
definitions are slightly special compared to the ones used
for the other fixed-size types.

@code{UNMARK_string} is defined the same way, except for some additional
code used for updating the bookkeeping information.

For strings, @code{ADDITIONAL_FREE_string} has to do something in
addition: if the string was not allocated in a
@code{string_chars_block} because it exceeded the maximal length, and
was therefore @code{malloc}ed separately, we now also @code{xfree}
it explicitly.
5106
5107 @node sweep_bit_vectors_1,  , sweep_strings, Garbage Collection - Step by Step
5108 @subsection @code{sweep_bit_vectors_1}
5109 @cindex @code{sweep_bit_vectors_1}
5110
Bit vectors are also one of the rare types that are @code{malloc}ed
individually.  Consequently, while sweeping, all bit vectors that are no
longer needed must be freed by hand.  This is done the expected way:
since they are all registered in a list called
@code{all_bit_vectors}, all elements of that list are traversed;
all unmarked bit vectors are unlinked and freed by calling @code{xfree},
and all marked ones become unmarked.
In addition, the bookkeeping information used for the garbage
collector's output purposes is updated.
5120
5121 @node Integers and Characters, Allocation from Frob Blocks, Garbage Collection - Step by Step, Allocation of Objects in XEmacs Lisp
5122 @section Integers and Characters
5123
5124   Integer and character Lisp objects are created from integers using the
5125 macros @code{XSETINT()} and @code{XSETCHAR()} or the equivalent
5126 functions @code{make_int()} and @code{make_char()}. (These are actually
5127 macros on most systems.)  These functions basically just do some moving
5128 of bits around, since the integral value of the object is stored
5129 directly in the @code{Lisp_Object}.
5130
5131   @code{XSETINT()} and the like will truncate values given to them that
5132 are too big; i.e. you won't get the value you expected but the tag bits
5133 will at least be correct.
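
The following sketch illustrates the idea of storing an integer directly
in a tagged word, and why an out-of-range value is truncated while the
tag stays intact.  The layout (a 32-bit word with a 2-bit tag in the low
bits) and the names @code{lisp_word}, @code{make_int_sketch} and
@code{int_value_sketch} are assumptions made for illustration; they are
not the actual XEmacs representation.

```c
#include <assert.h>
#include <stdint.h>

#define TAG_BITS 2
#define TAG_MASK ((1u << TAG_BITS) - 1)
#define INT_TAG  1u                      /* hypothetical tag value */

typedef uint32_t lisp_word;              /* hypothetical 32-bit Lisp word */

/* Shift the value up and put the tag in the low bits.  The top
   TAG_BITS bits of the value are silently lost. */
static lisp_word make_int_sketch(int32_t value)
{
  return ((uint32_t) value << TAG_BITS) | INT_TAG;
}

/* Recover the 30-bit signed payload without relying on
   implementation-defined shifts of negative numbers. */
static int32_t int_value_sketch(lisp_word w)
{
  uint32_t payload = w >> TAG_BITS;
  uint32_t sign = 1u << (32 - TAG_BITS - 1);   /* top bit of the field */
  assert((w & TAG_MASK) == INT_TAG);
  return (int32_t) (payload ^ sign) - (int32_t) sign;
}
```

With this layout, any value that fits in 30 signed bits round-trips
unchanged; a larger value comes back as a different number, but the
word is still recognizable as an integer.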
5134
5135 @node Allocation from Frob Blocks, lrecords, Integers and Characters, Allocation of Objects in XEmacs Lisp
5136 @section Allocation from Frob Blocks
5137
5138 The uninitialized memory required by a @code{Lisp_Object} of a particular type
5139 is allocated using
5140 @code{ALLOCATE_FIXED_TYPE()}.  This only occurs inside of the
5141 lowest-level object-creating functions in @file{alloc.c}:
5142 @code{Fcons()}, @code{make_float()}, @code{Fmake_byte_code()},
5143 @code{Fmake_symbol()}, @code{allocate_extent()},
5144 @code{allocate_event()}, @code{Fmake_marker()}, and
5145 @code{make_uninit_string()}.  The idea is that, for each type, there are
5146 a number of frob blocks (each 2K in size); each frob block is divided up
5147 into object-sized chunks.  Each frob block will have some of these
5148 chunks that are currently assigned to objects, and perhaps some that are
5149 free. (If a frob block has nothing but free chunks, it is freed at the
5150 end of the garbage collection cycle.)  The free chunks are stored in a
5151 free list, which is chained by storing a pointer in the first four bytes
5152 of the chunk. (Except for the free chunks at the end of the last frob
5153 block, which are handled using an index which points past the end of the
5154 last-allocated chunk in the last frob block.)
5155 @code{ALLOCATE_FIXED_TYPE()} first tries to retrieve a chunk from the
5156 free list; if that fails, it calls
5157 @code{ALLOCATE_FIXED_TYPE_FROM_BLOCK()}, which looks at the end of the
5158 last frob block for space, and creates a new frob block if there is
5159 none. (There are actually two versions of these macros, one of which is
5160 more defensive but less efficient and is used for error-checking.)
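
The allocation path just described can be sketched as follows.  This is
a simplified model under stated assumptions: the names @code{chunk},
@code{frob_block}, @code{allocate_chunk} and @code{free_chunk} are
hypothetical, the block holds only four chunks, and the free-list link
is a separate field rather than being overlaid on the start of the
chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CHUNKS_PER_BLOCK 4              /* tiny on purpose */

typedef struct chunk {
  struct chunk *next_free;              /* free-list link */
  long payload;
} chunk;

typedef struct frob_block {
  struct frob_block *prev;              /* previously allocated block */
  chunk block[CHUNKS_PER_BLOCK];        /* the objects themselves */
} frob_block;

static frob_block *current_block;                   /* newest block */
static int current_block_index = CHUNKS_PER_BLOCK;  /* first unused slot */
static chunk *free_list;                            /* recycled chunks */

/* Take a chunk from the free list if possible; otherwise take the next
   unused slot of the newest block, creating a block when it is full. */
static chunk *allocate_chunk(void)
{
  if (free_list) {
    chunk *c = free_list;
    free_list = c->next_free;
    return c;
  }
  if (current_block_index == CHUNKS_PER_BLOCK) {
    frob_block *b = malloc(sizeof *b);
    if (!b) abort();
    b->prev = current_block;
    current_block = b;
    current_block_index = 0;
  }
  assert(current_block_index < CHUNKS_PER_BLOCK);
  return &current_block->block[current_block_index++];
}

/* Return a chunk to the free list. */
static void free_chunk(chunk *c)
{
  c->next_free = free_list;
  free_list = c;
}
```

Starting with the index equal to the block capacity makes the very first
allocation take the same path as any other allocation from a full block.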
5161
5162 @node lrecords, Low-level allocation, Allocation from Frob Blocks, Allocation of Objects in XEmacs Lisp
5163 @section lrecords
5164
5165   [see @file{lrecord.h}]
5166
5167   All lrecords have at the beginning of their structure a @code{struct
5168 lrecord_header}.  This just contains a type number and some flags,
5169 including the mark bit.  The type number, thru the
5170 @code{lrecord_implementation_table}, gives access to a @code{struct
5171 lrecord_implementation}, which is a structure containing method pointers
5172 and such.  There is one of these for each type, and it is a global,
5173 constant, statically-declared structure that is declared in the
5174 @code{DEFINE_LRECORD_IMPLEMENTATION()} macro.
5175
5176   Simple lrecords (of type (b) above) just have a @code{struct
5177 lrecord_header} at their beginning.  lcrecords, however, actually have a
5178 @code{struct lcrecord_header}.  This, in turn, has a @code{struct
5179 lrecord_header} at its beginning, so sanity is preserved; but it also
5180 has a pointer used to chain all lcrecords together, and a special ID
5181 field used to distinguish one lcrecord from another. (This field is used
5182 only for debugging and could be removed, but the space gain is not
5183 significant.)
5184
5185   Simple lrecords are created using @code{ALLOCATE_FIXED_TYPE()}, just
5186 like for other frob blocks.  The only change is that the implementation
5187 pointer must be initialized correctly. (The implementation structure for
5188 an lrecord, or rather the pointer to it, is named @code{lrecord_float},
5189 @code{lrecord_extent}, @code{lrecord_buffer}, etc.)
5190
5191   lcrecords are created using @code{alloc_lcrecord()}.  This takes a
5192 size to allocate and an implementation pointer. (The size needs to be
5193 passed because some lcrecords, such as window configurations, are of
5194 variable size.) This basically just @code{malloc()}s the storage,
5195 initializes the @code{struct lcrecord_header}, and chains the lcrecord
5196 onto the head of the list of all lcrecords, which is stored in the
5197 variable @code{all_lcrecords}.  The calls to @code{alloc_lcrecord()}
5198 generally occur in the lowest-level allocation function for each lrecord
5199 type.
5200
5201 Whenever you create an lrecord, you need to call either
5202 @code{DEFINE_LRECORD_IMPLEMENTATION()} or
5203 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()}.  This needs to be
5204 specified in a C file, at the top level.  What this actually does is
5205 define and initialize the implementation structure for the lrecord. (And
5206 possibly declares a function @code{error_check_foo()} that implements
5207 the @code{XFOO()} macro when error-checking is enabled.)  The arguments
5208 to the macros are the actual type name (this is used to construct the C
5209 variable name of the lrecord implementation structure and related
5210 structures using the @samp{##} macro concatenation operator), a string
5211 that names the type on the Lisp level (this may not be the same as the C
5212 type name; typically, the C type name has underscores, while the Lisp
5213 string has dashes), various method pointers, and the name of the C
5214 structure that contains the object.  The methods are used to encapsulate
5215 type-specific information about the object, such as how to print it or
5216 mark it for garbage collection, so that it's easy to add new object
5217 types without having to add a specific case for each new type in a bunch
5218 of different places.
5219
5220   The difference between @code{DEFINE_LRECORD_IMPLEMENTATION()} and
5221 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()} is that the former is
5222 used for fixed-size object types and the latter is for variable-size
5223 object types.  Most object types are fixed-size; some complex
5224 types, however (e.g. window configurations), are variable-size.
5225 Variable-size object types have an extra method, which is called
5226 to determine the actual size of a particular object of that type.
5227 (Currently this is only used for keeping allocation statistics.)
5228
5229   For the purpose of keeping allocation statistics, the allocation
5230 engine keeps a list of all the different types that exist.  Note that,
5231 since @code{DEFINE_LRECORD_IMPLEMENTATION()} is a macro that is
5232 specified at top-level, there is no way for it to add to the list of all
5233 existing types.  What happens instead is that each implementation
5234 structure contains in it a dynamically assigned number that is
5235 particular to that type. (Or rather, it contains a pointer to another
5236 structure that contains this number.  This evasiveness is done so that
5237 the implementation structure can be declared const.) In the sweep stage
5238 of garbage collection, each lrecord is examined to see if its
5239 implementation structure has its dynamically-assigned number set.  If
5240 not, it must be a new type, and it is added to the list of known types
5241 and a new number assigned.  The number is used to index into an array
5242 holding the number of objects of each type and the total memory
5243 allocated for objects of that type.  The statistics in this array are
5244 also computed during the sweep stage.  These statistics are returned by
5245 the call to @code{garbage-collect} and are printed out at the end of the
5246 loadup phase.
5247
5248   Note that for every type defined with a @code{DEFINE_LRECORD_*()}
5249 macro, there needs to be a @code{DECLARE_LRECORD_IMPLEMENTATION()}
5250 somewhere in a @file{.h} file, and this @file{.h} file needs to be
5251 included by @file{inline.c}.
5252
5253   Furthermore, there should generally be a set of @code{XFOOBAR()},
5254 @code{FOOBARP()}, etc. macros in a @file{.h} (or occasionally @file{.c})
5255 file.  To create one of these, copy an existing model and modify as
5256 necessary.
5257
5258   The various methods in the lrecord implementation structure are:
5259
5260 @enumerate
5261 @item
5262 @cindex mark method
5263 A @dfn{mark} method.  This is called during the marking stage and passed
5264 a function pointer (usually the @code{mark_object()} function), which is
5265 used to mark an object.  All Lisp objects that are contained within the
5266 object need to be marked by applying this function to them.  The mark
5267 method should also return a Lisp object, which should be either nil or
5268 an object to mark. (This can be used in lieu of calling
5269 @code{mark_object()} on the object, to reduce the recursion depth, and
5270 consequently should be the most heavily nested sub-object, such as a
5271 long list.)
5272
5273 @strong{Please note:} When the mark method is called, garbage collection
5274 is in progress, and special precautions need to be taken when accessing
5275 objects; see section (B) above.
5276
5277 If your mark method does not need to do anything, it can be
5278 @code{NULL}.
5279
5280 @item
5281 A @dfn{print} method.  This is called to create a printed representation
5282 of the object, whenever @code{princ}, @code{prin1}, or the like is
5283 called.  It is passed the object, a stream to which the output is to be
5284 directed, and an @code{escapeflag} which indicates whether the object's
5285 printed representation should be @dfn{escaped} so that it is
5286 readable. (This corresponds to the difference between @code{princ} and
5287 @code{prin1}.) Basically, @dfn{escaped} means that strings will have
5288 quotes around them and confusing characters in the strings such as
5289 quotes, backslashes, and newlines will be backslashed; and that special
5290 care will be taken to make symbols print in a readable fashion
5291 (e.g. symbols that look like numbers will be backslashed).  Other
5292 readable objects should perhaps pass @code{escapeflag} on when
5293 sub-objects are printed, so that readability is preserved when necessary
5294 (or if not, always pass in a 1 for @code{escapeflag}).  Non-readable
5295 objects should in general ignore @code{escapeflag}, except that some use
5296 it as an indication that more verbose output should be given.
5297
5298 Sub-objects are printed using @code{print_internal()}, which takes
5299 exactly the same arguments as are passed to the print method.
5300
5301 Literal C strings should be printed using @code{write_c_string()},
5302 or @code{write_string_1()} for non-null-terminated strings.
5303
5304 Functions that do not have a readable representation should check the
5305 @code{print_readably} flag and signal an error if it is set.
5306
5307 If you specify NULL for the print method, the
5308 @code{default_object_printer()} will be used.
5309
5310 @item
5311 A @dfn{finalize} method.  This is called at the beginning of the sweep
5312 stage on lcrecords that are about to be freed, and should be used to
5313 perform any extra object cleanup.  This typically involves freeing any
5314 extra @code{malloc()}ed memory associated with the object, releasing any
5315 operating-system and window-system resources associated with the object
5316 (e.g. pixmaps, fonts), etc.
5317
5318 The finalize method can be NULL if nothing needs to be done.
5319
WARNING #1: The finalize method is also called at the end of the dump
phase; this time with the @code{for_disksave} parameter set to non-zero.  The
5322 object is @emph{not} about to disappear, so you have to make sure to
5323 @emph{not} free any extra @code{malloc()}ed memory if you're going to
5324 need it later.  (Also, signal an error if there are any operating-system
5325 and window-system resources here, because they can't be dumped.)
5326
5327 Finalize methods should, as a rule, set to zero any pointers after
5328 they've been freed, and check to make sure pointers are not zero before
5329 freeing.  Although I'm pretty sure that finalize methods are not called
5330 twice on the same object (except for the @code{for_disksave} proviso),
5331 we've gotten nastily burned in some cases by not doing this.
5332
WARNING #2: The finalize method is @emph{only} called for
lcrecords, @emph{not} for simple lrecords.  If you need a
finalize method for simple lrecords, you have to stick
it in the @code{ADDITIONAL_FREE_foo()} macro in @file{alloc.c}.
5337
5338 WARNING #3: Things are in an @emph{extremely} bizarre state
5339 when @code{ADDITIONAL_FREE_foo()} is called, so you have to
5340 be incredibly careful when writing one of these functions.
5341 See the comment in @code{gc_sweep()}.  If you ever have to add
5342 one of these, consider using an lcrecord or dealing with
5343 the problem in a different fashion.
5344
5345 @item
5346 An @dfn{equal} method.  This compares the two objects for similarity,
5347 when @code{equal} is called.  It should compare the contents of the
5348 objects in some reasonable fashion.  It is passed the two objects and a
5349 @dfn{depth} value, which is used to catch circular objects.  To compare
5350 sub-Lisp-objects, call @code{internal_equal()} and bump the depth value
5351 by one.  If this value gets too high, a @code{circular-object} error
5352 will be signaled.
5353
5354 If this is NULL, objects are @code{equal} only when they are @code{eq},
5355 i.e. identical.
5356
5357 @item
5358 A @dfn{hash} method.  This is used to hash objects when they are to be
5359 compared with @code{equal}.  The rule here is that if two objects are
5360 @code{equal}, they @emph{must} hash to the same value; i.e. your hash
5361 function should use some subset of the sub-fields of the object that are
5362 compared in the ``equal'' method.  If you specify this method as
5363 @code{NULL}, the object's pointer will be used as the hash, which will
5364 @emph{fail} if the object has an @code{equal} method, so don't do this.
5365
5366 To hash a sub-Lisp-object, call @code{internal_hash()}.  Bump the
5367 depth by one, just like in the ``equal'' method.
5368
5369 To convert a Lisp object directly into a hash value (using
5370 its pointer), use @code{LISP_HASH()}.  This is what happens when
5371 the hash method is NULL.
5372
5373 To hash two or more values together into a single value, use
5374 @code{HASH2()}, @code{HASH3()}, @code{HASH4()}, etc.
5375
5376 @item
5377 @dfn{getprop}, @dfn{putprop}, @dfn{remprop}, and @dfn{plist} methods.
5378 These are used for object types that have properties.  I don't feel like
5379 documenting them here.  If you create one of these objects, you have to
5380 use different macros to define them,
5381 i.e. @code{DEFINE_LRECORD_IMPLEMENTATION_WITH_PROPS()} or
5382 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION_WITH_PROPS()}.
5383
5384 @item
A @dfn{size_in_bytes} method, when the object is of variable size
(i.e. declared with a @code{_SEQUENCE_IMPLEMENTATION} macro).  This should
simply return the object's size in bytes, exactly as you might expect.
For an example, see the methods for window configurations and opaques.
5389 @end enumerate
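
To make the role of the method pointers concrete, here is a small sketch
of dispatching @code{equal} and @code{hash} through a per-type
implementation structure.  All names (@code{header},
@code{implementation}, @code{point}, @code{handle},
@code{objects_equal}, @code{object_hash}) are hypothetical and much
simpler than the real @code{struct lrecord_implementation}.  It shows
the fallback to identity when a method is null, and a hash that uses the
same fields as the equal method.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct header { unsigned type; } header;

typedef struct implementation {
  const char *name;                                  /* Lisp-level name */
  int (*equal)(const header *a, const header *b, int depth);
  unsigned long (*hash)(const header *obj, int depth);
} implementation;

enum { TYPE_POINT, TYPE_HANDLE, TYPE_COUNT };

typedef struct point  { header h; int x, y; } point;
typedef struct handle { header h; int fd;   } handle;

static int point_equal(const header *a, const header *b, int depth)
{
  const point *p = (const point *) a, *q = (const point *) b;
  (void) depth;
  return p->x == q->x && p->y == q->y;
}

/* Hashes exactly the fields that point_equal compares. */
static unsigned long point_hash(const header *obj, int depth)
{
  const point *p = (const point *) obj;
  (void) depth;
  return (unsigned long) (unsigned) p->x * 31u + (unsigned) p->y;
}

static const implementation implementation_table[TYPE_COUNT] = {
  [TYPE_POINT]  = { "point",  point_equal, point_hash },
  [TYPE_HANDLE] = { "handle", NULL,        NULL       },
};

static int objects_equal(const header *a, const header *b)
{
  if (a == b) return 1;                      /* identical objects */
  if (a->type != b->type) return 0;
  assert(a->type < TYPE_COUNT);
  const implementation *imp = &implementation_table[a->type];
  return imp->equal ? imp->equal(a, b, 1) : 0;
}

static unsigned long object_hash(const header *obj)
{
  assert(obj->type < TYPE_COUNT);
  const implementation *imp = &implementation_table[obj->type];
  return imp->hash ? imp->hash(obj, 1)
                   : (unsigned long) (uintptr_t) obj;   /* pointer hash */
}
```

A type with an equal method but a null hash method would hash two equal
objects by their addresses, which is exactly the failure the text warns
about.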
5390
5391 @node Low-level allocation, Cons, lrecords, Allocation of Objects in XEmacs Lisp
5392 @section Low-level allocation
5393
5394   Memory that you want to allocate directly should be allocated using
5395 @code{xmalloc()} rather than @code{malloc()}.  This implements
5396 error-checking on the return value, and once upon a time did some more
5397 vital stuff (i.e. @code{BLOCK_INPUT}, which is no longer necessary).
5398 Free using @code{xfree()}, and realloc using @code{xrealloc()}.  Note
5399 that @code{xmalloc()} will do a non-local exit if the memory can't be
5400 allocated. (Many functions, however, do not expect this, and thus XEmacs
5401 will likely crash if this happens.  @strong{This is a bug.}  If you can,
5402 you should strive to make your function handle this OK.  However, it's
5403 difficult in the general circumstance, perhaps requiring extra
5404 unwind-protects and such.)
5405
5406   Note that XEmacs provides two separate replacements for the standard
5407 @code{malloc()} library function.  These are called @dfn{old GNU malloc}
5408 (@file{malloc.c}) and @dfn{new GNU malloc} (@file{gmalloc.c}),
5409 respectively.  New GNU malloc is better in pretty much every way than
5410 old GNU malloc, and should be used if possible.  (It used to be that on
5411 some systems, the old one worked but the new one didn't.  I think this
5412 was due specifically to a bug in SunOS, which the new one now works
5413 around; so I don't think the old one ever has to be used any more.) The
5414 primary difference between both of these mallocs and the standard system
5415 malloc is that they are much faster, at the expense of increased space.
5416 The basic idea is that memory is allocated in fixed chunks of powers of
5417 two.  This allows for basically constant malloc time, since the various
5418 chunks can just be kept on a number of free lists. (The standard system
5419 malloc typically allocates arbitrary-sized chunks and has to spend some
5420 time, sometimes a significant amount of time, walking the heap looking
5421 for a free block to use and cleaning things up.)  The new GNU malloc
5422 improves on things by allocating large objects in chunks of 4096 bytes
5423 rather than in ever larger powers of two, which results in ever larger
5424 wastage.  There is a slight speed loss here, but it's of doubtful
5425 significance.
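
The difference in wastage can be illustrated with a small size
calculation.  This sketch is not taken from @file{malloc.c} or
@file{gmalloc.c}; the function names, the minimum chunk of 8 bytes, and
the cut-over point of half a page are assumptions chosen only to show
the arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES 4096u

/* Pure power-of-two policy: round every request up to a power of two. */
static size_t power_of_two_chunk(size_t request)
{
  size_t chunk = 8;                       /* assumed minimum chunk */
  assert(request <= SIZE_MAX / 2);
  while (chunk < request)
    chunk *= 2;
  return chunk;
}

/* Mixed policy: powers of two for small requests, whole pages for
   large ones (the cut-over point is an assumption of this sketch). */
static size_t page_multiple_chunk(size_t request)
{
  if (request <= PAGE_BYTES / 2)
    return power_of_two_chunk(request);
  assert(request <= SIZE_MAX - PAGE_BYTES);
  return (request + PAGE_BYTES - 1) / PAGE_BYTES * PAGE_BYTES;
}
```

For a 9000-byte request, the first policy hands out 16384 bytes and the
second 12288 bytes, so the wasted space no longer grows with the size of
the request.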
5426
5427   NOTE: Apparently there is a third-generation GNU malloc that is
5428 significantly better than the new GNU malloc, and should probably
5429 be included in XEmacs.
5430
  There is also the relocating allocator, @file{ralloc.c}.  This actually
moves blocks of memory around so that the @code{sbrk()} pointer can be
shrunk and virtual memory released back to the system.  On some systems,
5434 this is a big win.  On all systems, it causes a noticeable (and
5435 sometimes huge) speed penalty, so I turn it off by default.
5436 @file{ralloc.c} only works with the new GNU malloc in @file{gmalloc.c}.
5437 There are also two versions of @file{ralloc.c}, one that uses @code{mmap()}
5438 rather than block copies to move data around.  This purports to
5439 be faster, although that depends on the amount of data that would
5440 have had to be block copied and the system-call overhead for
5441 @code{mmap()}.  I don't know exactly how this works, except that the
5442 relocating-allocation routines are pretty much used only for
5443 the memory allocated for a buffer, which is the biggest consumer
5444 of space, esp. of space that may get freed later.
5445
5446   Note that the GNU mallocs have some ``memory warning'' facilities.
5447 XEmacs taps into them and issues a warning through the standard
5448 warning system, when memory gets to 75%, 85%, and 95% full.
5449 (On some systems, the memory warnings are not functional.)
5450
5451   Allocated memory that is going to be used to make a Lisp object
5452 is created using @code{allocate_lisp_storage()}.  This calls @code{xmalloc()}
5453 but also verifies that the pointer to the memory can fit into
5454 a Lisp word (remember that some bits are taken away for a type
5455 tag and a mark bit).  If not, an error is issued through @code{memory_full()}.
5456 @code{allocate_lisp_storage()} is called by @code{alloc_lcrecord()},
5457 @code{ALLOCATE_FIXED_TYPE()}, and the vector and bit-vector creation
5458 routines.  These routines also call @code{INCREMENT_CONS_COUNTER()} at the
5459 appropriate times; this keeps statistics on how much memory is
5460 allocated, so that garbage-collection can be invoked when the
5461 threshold is reached.
5462
5463 @node Cons, Vector, Low-level allocation, Allocation of Objects in XEmacs Lisp
5464 @section Cons
5465
5466   Conses are allocated in standard frob blocks.  The only thing to
5467 note is that conses can be explicitly freed using @code{free_cons()}
5468 and associated functions @code{free_list()} and @code{free_alist()}.  This
5469 immediately puts the conses onto the cons free list, and decrements
5470 the statistics on memory allocation appropriately.  This is used
5471 to good effect by some extremely commonly-used code, to avoid
5472 generating extra objects and thereby triggering GC sooner.
5473 However, you have to be @emph{extremely} careful when doing this.
5474 If you mess this up, you will get BADLY BURNED, and it has happened
5475 before.
5476
5477 @node Vector, Bit Vector, Cons, Allocation of Objects in XEmacs Lisp
5478 @section Vector
5479
5480   As mentioned above, each vector is @code{malloc()}ed individually, and
5481 all are threaded through the variable @code{all_vectors}.  Vectors are
5482 marked strangely during garbage collection, by kludging the size field.
5483 Note that the @code{struct Lisp_Vector} is declared with its
5484 @code{contents} field being a @emph{stretchy} array of one element.  It
5485 is actually @code{malloc()}ed with the right size, however, and access
5486 to any element through the @code{contents} array works fine.
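
A sketch of such an allocation is shown below.  The names @code{vec},
@code{all_vecs} and @code{make_vec} are hypothetical.  Note that the
sketch uses a C99 flexible array member instead of the one-element array
described above; the two serve the same purpose, but the flexible form
is the one the C standard defines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct vec {
  struct vec *next;        /* chain of all vectors */
  size_t size;             /* number of elements */
  long contents[];         /* storage follows the header */
} vec;

static vec *all_vecs;      /* stands in for all_vectors */

/* Allocate header and elements in one malloc() call and thread the
   new vector onto the global chain. */
static vec *make_vec(size_t n)
{
  assert(n <= (SIZE_MAX - sizeof(vec)) / sizeof(long));
  vec *v = malloc(sizeof(vec) + n * sizeof(long));
  if (!v) abort();
  v->size = n;
  for (size_t i = 0; i < n; i++)
    v->contents[i] = 0;
  v->next = all_vecs;
  all_vecs = v;
  return v;
}
```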
5487
5488 @node Bit Vector, Symbol, Vector, Allocation of Objects in XEmacs Lisp
5489 @section Bit Vector
5490
5491   Bit vectors work exactly like vectors, except for more complicated
5492 code to access an individual bit, and except for the fact that bit
5493 vectors are lrecords while vectors are not. (The only difference here is
5494 that there's an lrecord implementation pointer at the beginning and the
5495 tag field in bit vector Lisp words is ``lrecord'' rather than
5496 ``vector''.)
5497
5498 @node Symbol, Marker, Bit Vector, Allocation of Objects in XEmacs Lisp
5499 @section Symbol
5500
5501   Symbols are also allocated in frob blocks.  Symbols in the awful
5502 horrible obarray structure are chained through their @code{next} field.
5503
5504 Remember that @code{intern} looks up a symbol in an obarray, creating
5505 one if necessary.
5506
5507 @node Marker, String, Symbol, Allocation of Objects in XEmacs Lisp
5508 @section Marker
5509
5510   Markers are allocated in frob blocks, as usual.  They are kept
5511 in a buffer unordered, but in a doubly-linked list so that they
5512 can easily be removed. (Formerly this was a singly-linked list,
5513 but in some cases garbage collection took an extraordinarily
5514 long time due to the O(N^2) time required to remove lots of
5515 markers from a buffer.) Markers are removed from a buffer in
5516 the finalize stage, in @code{ADDITIONAL_FREE_marker()}.
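
The benefit of the doubly-linked list is that a marker can be unchained
without searching for its predecessor.  A minimal sketch, using the
hypothetical names @code{marker}, @code{buffer_sketch},
@code{attach_marker} and @code{detach_marker}:

```c
#include <assert.h>
#include <stddef.h>

typedef struct marker {
  struct marker *prev, *next;
  long pos;
} marker;

typedef struct buffer_sketch {
  marker *markers;                  /* head of the unordered list */
} buffer_sketch;

/* Push a marker on the front of the buffer's list. */
static void attach_marker(buffer_sketch *buf, marker *m)
{
  m->prev = NULL;
  m->next = buf->markers;
  if (buf->markers)
    buf->markers->prev = m;
  buf->markers = m;
}

/* Unchain a marker in constant time, whatever its position. */
static void detach_marker(buffer_sketch *buf, marker *m)
{
  if (m->prev) {
    m->prev->next = m->next;
  } else {
    assert(buf->markers == m);      /* no predecessor: must be the head */
    buf->markers = m->next;
  }
  if (m->next)
    m->next->prev = m->prev;
  m->prev = m->next = NULL;
}
```

With a singly-linked list, each removal would have to walk the list to
find the predecessor, which is where the quadratic cost mentioned above
comes from.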
5517
5518 @node String, Compiled Function, Marker, Allocation of Objects in XEmacs Lisp
5519 @section String
5520
5521   As mentioned above, strings are a special case.  A string is logically
5522 two parts, a fixed-size object (containing the length, property list,
5523 and a pointer to the actual data), and the actual data in the string.
5524 The fixed-size object is a @code{struct Lisp_String} and is allocated in
5525 frob blocks, as usual.  The actual data is stored in special
5526 @dfn{string-chars blocks}, which are 8K blocks of memory.
5527 Currently-allocated strings are simply laid end to end in these
5528 string-chars blocks, with a pointer back to the @code{struct Lisp_String}
5529 stored before each string in the string-chars block.  When a new string
5530 needs to be allocated, the remaining space at the end of the last
5531 string-chars block is used if there's enough, and a new string-chars
5532 block is created otherwise.
5533
5534   There are never any holes in the string-chars blocks due to the string
5535 compaction and relocation that happens at the end of garbage collection.
5536 During the sweep stage of garbage collection, when objects are
5537 reclaimed, the garbage collector goes through all string-chars blocks,
5538 looking for unused strings.  Each chunk of string data is preceded by a
5539 pointer to the corresponding @code{struct Lisp_String}, which indicates
5540 both whether the string is used and how big the string is, i.e. how to
5541 get to the next chunk of string data.  Holes are compressed by
5542 block-copying the next string into the empty space and relocating the
5543 pointer stored in the corresponding @code{struct Lisp_String}.
5544 @strong{This means you have to be careful with strings in your code.}
5545 See the section above on @code{GCPRO}ing.
5546
5547   Note that there is one situation not handled: a string that is too big
5548 to fit into a string-chars block.  Such strings, called @dfn{big
5549 strings}, are all @code{malloc()}ed as their own block. (#### Although it
5550 would make more sense for the threshold for big strings to be somewhat
5551 lower, e.g. 1/2 or 1/4 the size of a string-chars block.  It seems that
5552 this was indeed the case formerly---indeed, the threshold was set at
5553 1/8---but Mly forgot about this when rewriting things for 19.8.)
5554
5555 Note also that the string data in string-chars blocks is padded as
5556 necessary so that proper alignment constraints on the @code{struct
5557 Lisp_String} back pointers are maintained.
5558
5559   Finally, strings can be resized.  This happens in Mule when a
5560 character is substituted with a different-length character, or during
5561 modeline frobbing. (You could also export this to Lisp, but it's not
5562 done so currently.) Resizing a string is a potentially tricky process.
5563 If the change is small enough that the padding can absorb it, nothing
5564 other than a simple memory move needs to be done.  Keep in mind,
5565 however, that the string can't shrink too much because the offset to the
5566 next string in the string-chars block is computed by looking at the
5567 length and rounding to the nearest multiple of four or eight.  If the
5568 string would shrink or expand beyond the correct padding, new string
5569 data needs to be allocated at the end of the last string-chars block and
5570 the data moved appropriately.  This leaves some dead string data, which
5571 is marked by putting a special marker of 0xFFFFFFFF in the @code{struct
5572 Lisp_String} pointer before the data (there's no real @code{struct
5573 Lisp_String} to point to and relocate), and storing the size of the dead
5574 string data (which would normally be obtained from the now-non-existent
5575 @code{struct Lisp_String}) at the beginning of the dead string data gap.
5576 The string compactor recognizes this special 0xFFFFFFFF marker and
5577 handles it correctly.
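
The ``can the padding absorb it'' test amounts to comparing padded
sizes.  In the sketch below, the names @code{padded_size} and
@code{can_resize_in_place} are hypothetical, and the alignment is
assumed to be the size of a pointer (four or eight bytes on common
systems).

```c
#include <assert.h>
#include <stddef.h>

#define ALIGNMENT sizeof(void *)   /* assumed: 4 or 8 bytes */

/* Space a string of the given length occupies once padded. */
static size_t padded_size(size_t length)
{
  return (length + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
}

/* The data can stay where it is only if the offset to the next
   string, as computed from the new length, does not change. */
static int can_resize_in_place(size_t old_length, size_t new_length)
{
  return padded_size(old_length) == padded_size(new_length);
}
```

This also shows why shrinking too far is a problem: a shorter length
that rounds to a smaller padded size would make the compactor compute
the wrong offset to the next string.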
5578
5579 @node Compiled Function,  , String, Allocation of Objects in XEmacs Lisp
5580 @section Compiled Function
5581
5582   Not yet documented.
5583
5584
5585 @node Dumping, Events and the Event Loop, Allocation of Objects in XEmacs Lisp, Top
5586 @chapter Dumping
5587
5588 @section What is dumping and its justification
5589
5590 The C code of XEmacs is just a Lisp engine with a lot of built-in
5591 primitives useful for writing an editor.  The editor itself is written
5592 mostly in Lisp, and represents around 100K lines of code.  Loading and
5593 executing the initialization of all this code takes a bit of time (five
5594 to ten times the usual startup time of current xemacs) and requires
5595 having all the lisp source files around.  Having to reload them each
5596 time the editor is started would not be acceptable.
5597
5598 The traditional solution to this problem is called dumping: the build
5599 process first creates the lisp engine under the name @file{temacs}, then
5600 runs it until it has finished loading and initializing all the lisp
5601 code, and finally creates a new executable called @file{xemacs}
5602 including both the object code in @file{temacs} and all the contents of
5603 the memory after the initialization.
5604
5605 This solution, while working, has a huge problem: the creation of the
5606 new executable from the actual contents of memory is an extremely
5607 system-specific process that is quite error-prone and interferes with a
5608 lot of system libraries (like malloc).  It is even getting worse
5609 nowadays with libraries using constructors that are automatically
5610 called when the program is started (even before main()) and that tend to
5611 crash when they are called multiple times, once before dumping and once
5612 after (IRIX 6.x libz.so pulls in some C++ image libraries through
5613 dependencies which have this problem).  Writing the dumper is also one
5614 of the most difficult parts of porting XEmacs to a new operating system.
5615 Basically, `dumping' is an operation that is just not officially
5616 supported on many operating systems.
5617
5618 The aim of the portable dumper is to solve the same problem as the
5619 system-specific dumper, that is to be able to reload quickly, using only
5620 a small number of files, the fully initialized lisp part of the editor,
5621 without any system-specific hacks.
5622
5623 @menu
5624 * Overview::
5625 * Data descriptions::
5626 * Dumping phase::
5627 * Reloading phase::
5628 * Remaining issues::
5629 @end menu
5630
5631 @node Overview, Data descriptions, Dumping, Dumping
5632 @section Overview
5633
5634 The portable dumping system has to:
5635
5636 @enumerate
5637 @item
5638 At dump time, write all initialized, non-quickly-rebuildable data to a
5639 file [Note: currently named @file{xemacs.dmp}, but the name will
5640 change], along with all the information needed for reloading.
5641
5642 @item
5643 When starting xemacs, reload the dump file, relocate it to its new
5644 starting address if needed, and reinitialize all pointers to this
5645 data.  Also, rebuild all the quickly rebuildable data.
5646 @end enumerate
5647
5648 @node Data descriptions, Dumping phase, Overview, Dumping
5649 @section Data descriptions
5650
5651 The most complex task of the dumper is to be able to write lisp objects
5652 (lrecords) and C structs to disk and reload them at a different address,
5653 updating all the pointers they include in the process.  This is done by
5654 using external data descriptions that give information about the layout
5655 of the structures in memory.
5656
5657 The specification of these descriptions is in @file{lrecord.h}.  A description
5658 of an lrecord is an array of @code{struct lrecord_description}.  Each of these
5659 structs includes a type, an offset in the structure and some optional
5660 parameters depending on the type.  For instance, here is the string
5661 description:
5662
5663 @example
5664 static const struct lrecord_description string_description[] = @{
5665   @{ XD_BYTECOUNT,         offsetof (Lisp_String, size) @},
5666   @{ XD_OPAQUE_DATA_PTR,   offsetof (Lisp_String, data), XD_INDIRECT(0, 1) @},
5667   @{ XD_LISP_OBJECT,       offsetof (Lisp_String, plist) @},
5668   @{ XD_END @}
5669 @};
5670 @end example
5671
5672 The first line indicates a member of type Bytecount, which is used by
5673 the next, indirect directive.  The second means ``there is a pointer to
5674 some opaque data in the field @code{data}''.  The length of said data is
5675 given by the expression @code{XD_INDIRECT(0, 1)}, which means ``the value
5676 in the 0th line of the description (welcome to C) plus one''.  The third
5677 line means ``there is a Lisp_Object member @code{plist} in the Lisp_String
5678 structure''.  @code{XD_END} then ends the description.
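
To make the indirect length concrete, here is a small sketch of how a
description walker might evaluate it.  The types below
(@code{desc_line}, @code{toy_string}) and the functions
@code{read_count} and @code{opaque_length} are simplified stand-ins
invented for this illustration; they are not the definitions from
@file{lrecord.h}.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the real description types. */
enum desc_type { D_BYTECOUNT, D_OPAQUE_DATA_PTR, D_END };

struct desc_line
{
  enum desc_type type;
  size_t offset;     /* offset of the member in the structure */
  int indirect_line; /* line whose value gives the length ... */
  int indirect_add;  /* ... plus this constant */
};

struct toy_string
{
  long size;
  char *data;
};

static const struct desc_line toy_string_description[] = {
  { D_BYTECOUNT, offsetof (struct toy_string, size), 0, 0 },
  { D_OPAQUE_DATA_PTR, offsetof (struct toy_string, data), 0, 1 },
  { D_END, 0, 0, 0 }
};

/* Read the count member described by line N of DESC. */
static long
read_count (const void *obj, const struct desc_line *desc, int n)
{
  assert (desc[n].type == D_BYTECOUNT);
  return *(const long *) ((const char *) obj + desc[n].offset);
}

/* Length in bytes of the opaque data described by line N of DESC:
   the value found through the indirect line, plus the constant. */
static long
opaque_length (const void *obj, const struct desc_line *desc, int n)
{
  assert (desc[n].type == D_OPAQUE_DATA_PTR);
  return read_count (obj, desc, desc[n].indirect_line) + desc[n].indirect_add;
}
```
@end verbatim

For a toy string whose @code{size} member holds 5, line 1 of the
description yields a length of 6, the extra byte presumably accounting
for a terminating zero.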
5679
5680 This gives us all the information we need to move around what is pointed
5681 to by a structure (C or lrecord) and, by transitivity, everything that
5682 it points to.  The only missing information for dumping is the size of
5683 the structure.  For lrecords, this is part of the
5684 lrecord_implementation, so we don't need to duplicate it.  For C
5685 structures we use a struct struct_description, which includes a size
5686 field and a pointer to an associated array of lrecord_description.
5687
5688 @node Dumping phase, Reloading phase, Data descriptions, Dumping
5689 @section Dumping phase
5690
5691 Dumping is done by calling the function pdump() (in alloc.c) which is
5692 invoked from Fdump_emacs (in emacs.c).  This function performs a number
5693 of tasks.
5694
5695 @menu
5696 * Object inventory::
5697 * Address allocation::
5698 * The header::
5699 * Data dumping::
5700 * Pointers dumping::
5701 @end menu
5702
5703 @node Object inventory, Address allocation, Dumping phase, Dumping phase
5704 @subsection Object inventory
5705
5706 The first task is to build the list of the objects to dump.  This
5707 includes:
5708
5709 @itemize @bullet
5710 @item lisp objects
5711 @item C structures
5712 @end itemize
5713
5714 We end up with one @code{pdump_entry_list_elmt} per object group (arrays
5715 of C structs are kept together) which includes a pointer to the first
5716 object of the group, the per-object size and the count of objects in the
5717 group, along with some other information which is initialized later.
5718
5719 These entries are linked together in @code{pdump_entry_list} structures
5720 and can be enumerated thru either:
5721
5722 @enumerate
5723 @item
5724 the @code{pdump_object_table}, an array of @code{pdump_entry_list}, one
5725 per lrecord type, indexed by type number.
5726
5727 @item
5728 the @code{pdump_opaque_data_list}, used for the opaque data which does
5729 not include pointers, and hence does not need descriptions.
5730
5731 @item
5732 the @code{pdump_struct_table}, which is a vector of
5733 @code{struct_description}/@code{pdump_entry_list} pairs, used for
5734 non-opaque C structures.
5735 @end enumerate
5736
5737 This uses a marking strategy similar to the garbage collector.  Some
5738 differences though:
5739
5740 @enumerate
5741 @item
5742 We do not use the mark bit (which does not exist for C structures
5743 anyway); we use a big hash table instead.
5744
5745 @item
5746 We do not use the mark function of lrecords but instead rely on the
5747 external descriptions.  This happens essentially because we need to
5748 follow pointers to C structures and opaque data in addition to
5749 Lisp_Object members.
5750 @end enumerate
5751
5752 This is done by @code{pdump_register_object}, which handles Lisp_Object
5753 variables, and @code{pdump_register_struct}, which handles C structures;
5754 both delegate the description management to @code{pdump_register_sub}.
5755
5756 The hash table doubles as a map from objects to pdump_entry_list_elmt (i.e.
5757 it lets us look up a pdump_entry_list_elmt from the object it points
5758 to).  Entries are added with @code{pdump_add_entry()} and looked up with
5759 @code{pdump_get_entry()}.  There is no need for entry removal.  The hash
5760 value is computed quite basically from the object pointer by
5761 @code{pdump_make_hash()}.
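
A table of this kind can be sketched as below.  This is an assumption
about the general shape only (open addressing with linear probing, keyed
on the object address, no removal); the names @code{make_hash},
@code{get_entry} and @code{add_entry}, the table size and the payload
are hypothetical and do not reproduce the actual pdump code.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 1024 /* hypothetical; must be a power of two */

struct entry
{
  const void *obj; /* key: address of the object; NULL means empty */
  size_t size;     /* payload, standing in for pdump_entry_list_elmt */
};

static struct entry table[TABLE_SIZE];
static size_t n_entries;

/* Hash an address; the low bits are dropped since they are mostly zero. */
static size_t
make_hash (const void *obj)
{
  return (size_t) (((uintptr_t) obj >> 3) & (TABLE_SIZE - 1));
}

/* Return the entry for OBJ, or NULL if it has not been registered. */
static struct entry *
get_entry (const void *obj)
{
  size_t pos = make_hash (obj);
  while (table[pos].obj != NULL)
    {
      if (table[pos].obj == obj)
        return &table[pos];
      pos = (pos + 1) & (TABLE_SIZE - 1); /* linear probing */
    }
  return NULL;
}

/* Register OBJ.  The caller is expected to check get_entry() first. */
static void
add_entry (const void *obj, size_t size)
{
  size_t pos = make_hash (obj);
  assert (obj != NULL && n_entries < TABLE_SIZE - 1);
  while (table[pos].obj != NULL)
    pos = (pos + 1) & (TABLE_SIZE - 1);
  table[pos].obj = obj;
  table[pos].size = size;
  n_entries++;
}
```
@end verbatim

Because a successful lookup means ``already seen'', the same table
serves both as the mark bit replacement and as the object-to-entry map.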
5762
5763 The roots for the marking are:
5764
5765 @enumerate
5766 @item
5767 the @code{staticpro}'ed variables (there is a special @code{staticpro_nodump()}
5768 call for protected variables we do not want to dump).
5769
5770 @item
5771 the @code{pdump_wire}'d variables (@code{staticpro} is equivalent to
5772 @code{staticpro_nodump()} + @code{pdump_wire()}).
5773
5774 @item
5775 the @code{dumpstruct}'ed variables, which point to C structures.
5776 @end enumerate
5777
5778 This does not include the GCPRO'ed variables, the specbinds, the
5779 catchtags, the backlist, the redisplay or the profiling info, since we
5780 do not want to rebuild the actual chain of lisp calls that leads up to
5781 the dump-emacs call, only the global variables.
5782
5783 Weak lists and weak hash tables are dumped as if they were their
5784 non-weak equivalent (without changing their type, of course).  This has
5785 not yet been a problem.
5786
5787 @node Address allocation, The header, Object inventory, Dumping phase
5788 @subsection Address allocation
5789
5790
5791 The next step is to allocate the offsets of each of the objects in the
5792 final dump file.  This is done by @code{pdump_allocate_offset()} which
5793 is called indirectly by @code{pdump_scan_by_alignment()}.
5794
5795 The strategy to deal with alignment problems uses these facts:
5796
5797 @enumerate
5798 @item
5799 real world alignment requirements are powers of two.
5800
5801 @item
5802 the C compiler is required to adjust the size of a struct so that you
5803 can have an array of them next to each other.  This means you can get an
5804 upper bound on the alignment requirement of a given structure by
5805 looking at the largest power of two of which its size is a multiple.
5806
5807 @item
5808 the non-variant part of variable size lrecords has an alignment
5809 requirement of 4.
5810 @end enumerate
5811
5812 Hence, for each lrecord type, C struct type or opaque data block the
5813 alignment requirement is computed as a power of two, with a minimum of
5814 2^2 for lrecords.  @code{pdump_scan_by_alignment()} then scans all the
5815 @code{pdump_entry_list_elmt}'s, the ones with the highest requirements
5816 first.  This ensures the best packing.
5817
5818 The maximum alignment requirement we take into account is 2^8.
5819
5820 @code{pdump_allocate_offset()} only has to do a linear allocation,
5821 starting at offset 256 (this leaves room for the header and keeps the
5822 alignments happy).
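
The alignment estimate and the linear allocation can be sketched as
follows.  The names @code{align_log2_from_size}, @code{struct group} and
@code{allocate_offsets} are hypothetical, and the sketch leaves out the
minimum of 2^2 that applies to lrecords; it is meant to show why
scanning from the highest alignment down needs no padding.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

#define MAX_ALIGN_LOG2 8 /* largest alignment considered: 2^8 */
#define BASE_OFFSET 256  /* first offset after the header */

/* Upper bound on the alignment of a struct of SIZE bytes, as a log2:
   the largest power of two (capped at 2^8) that divides SIZE. */
static int
align_log2_from_size (size_t size)
{
  int log2 = 0;
  assert (size > 0);
  while (log2 < MAX_ALIGN_LOG2 && (size & ((size_t) 1 << log2)) == 0)
    log2++;
  return log2;
}

struct group
{
  size_t size;   /* per-object size */
  size_t count;  /* number of objects in the group */
  size_t offset; /* file offset, filled in by allocate_offsets() */
};

/* Assign file offsets to the N groups, highest alignment first, and
   return the first unused offset. */
static size_t
allocate_offsets (struct group *groups, size_t n)
{
  size_t cur = BASE_OFFSET;
  size_t i;
  int log2;

  for (log2 = MAX_ALIGN_LOG2; log2 >= 0; log2--)
    for (i = 0; i < n; i++)
      if (align_log2_from_size (groups[i].size) == log2)
        {
          groups[i].offset = cur;
          cur += groups[i].size * groups[i].count;
        }
  return cur;
}
```
@end verbatim

Since every size handled in a given pass is a multiple of that pass's
power of two, the running offset stays aligned for that pass and for
all later, less demanding ones.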
5823
5824 @node The header, Data dumping, Address allocation, Dumping phase
5825 @subsection The header
5826
5827 The next step creates the file and writes a header with a signature and
5828 some miscellaneous information in it (number of staticpro, number of assigned
5829 lrecord types, etc.).  The reloc_address field, which indicates at
5830 which address the file should be loaded if we want to avoid post-reload
5831 relocation, is set to 0.  It then seeks to offset 256 (base offset for
5832 the objects).
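
As a rough sketch of this step, assuming a much reduced header layout
(the structure @code{dump_header}, its fields other than
@code{reloc_address}, the signature string and the function
@code{write_header} are all invented for illustration):

@verbatim
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BASE_OFFSET 256 /* base offset for the objects */

/* Hypothetical layout; the real header carries more fields. */
struct dump_header
{
  char signature[8];
  unsigned long reloc_address; /* 0: no preferred load address */
  int n_staticpro;
};

/* Write the header at the start of F and leave the file position at
   BASE_OFFSET.  Return 0 on success, nonzero on failure. */
static int
write_header (FILE *f, int n_staticpro)
{
  struct dump_header hd;

  assert (sizeof hd <= BASE_OFFSET);
  memset (&hd, 0, sizeof hd);
  memcpy (hd.signature, "toydump", 8); /* 7 characters plus the NUL */
  hd.reloc_address = 0;
  hd.n_staticpro = n_staticpro;
  if (fwrite (&hd, sizeof hd, 1, f) != 1)
    return -1;
  return fseek (f, BASE_OFFSET, SEEK_SET);
}
```
@end verbatim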
5833
5834 @node Data dumping, Pointers dumping, The header, Dumping phase
5835 @subsection Data dumping
5836
5837 The data is dumped by @code{pdump_dump_data()}, called from
5838 @code{pdump_scan_by_alignment()}, in the same order as the addresses were
5839 allocated.  This function copies the data to a temporary buffer, relocates
5840 all pointers in the object to the addresses allocated in the address
5841 allocation step, and writes it to the file.  Using the same order means that,
5842 if we are careful with lrecords whose size is not a multiple of 4, we
5843 are assured that the object is always written at the offset in the file
5844 allocated in the address allocation step.
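
The copy-then-relocate idea can be illustrated with a toy object type.
Everything named here (@code{struct node}, @code{offset_of_object},
@code{prepare_for_dump}) is hypothetical; in particular the lookup of an
object's file offset is faked with array arithmetic, and null pointers
are assumed to stay null.

@verbatim
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct node
{
  struct node *next;
  int value;
};

/* Toy object pool: nodes are laid out consecutively from offset 256. */
static struct node nodes[3];

/* Hypothetical lookup of the file offset allocated to OBJ. */
static uintptr_t
offset_of_object (const struct node *obj)
{
  assert (obj >= nodes && obj < nodes + 3);
  return 256 + (uintptr_t) (obj - nodes) * sizeof (struct node);
}

/* Copy OBJ to BUF and replace its pointer member by the file offset of
   the object it points to.  The in-memory object is left untouched. */
static void
prepare_for_dump (const struct node *obj, unsigned char *buf)
{
  struct node copy;

  memcpy (&copy, obj, sizeof copy);
  if (copy.next != NULL)
    copy.next = (struct node *) offset_of_object (copy.next);
  memcpy (buf, &copy, sizeof copy);
}
```
@end verbatim

Working on a copy matters because the running process keeps using the
original objects while the dump file is being written.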
5845
5846 @node Pointers dumping,  , Data dumping, Dumping phase
5847 @subsection Pointers dumping
5848
5849 A bunch of tables needed to properly reassign the global pointers are
5850 then written.  They are:
5851
5852 @enumerate
5853 @item
5854 the staticpro array
5855 @item
5856 the dumpstruct array
5857 @item
5858 the lrecord_implementations_table array
5859 @item
5860 a vector of all the offsets to the objects in the file that include a
5861 description (for faster relocation at reload time)
5862 @item
5863 the pdump_wired and pdump_wired_list arrays
5864 @end enumerate
5865
5866 For each of the arrays we write both the pointer to the variables and
5867 the relocated offset of the object they point to.  Since these variables
5868 are global, the pointers are still valid when restarting the program and
5869 are used to regenerate the global pointers.
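
A sketch of such a record and of its use at reload time follows.  The
structure @code{root_record} and the function @code{restore_roots} are
hypothetical names, and the sketch assumes the stored offset is relative
to the start of the loaded image.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One record per global variable: where the variable lives, and the
   file offset of the object it pointed to at dump time. */
struct root_record
{
  void **address;
  uintptr_t offset;
};

/* At reload time, point each global back at its object, which now
   lives at BASE + offset. */
static void
restore_roots (const struct root_record *roots, size_t n, char *base)
{
  size_t i;

  for (i = 0; i < n; i++)
    {
      assert (roots[i].address != NULL);
      *roots[i].address = base + roots[i].offset;
    }
}
```
@end verbatim

The addresses of the variables themselves can be stored verbatim because
globals sit at the same place in every run of the same executable.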
5870
5871 The @code{pdump_wired_list} array is a special case.  The variables it
5872 points to are the heads of weak linked lists of lisp objects of the same
5873 type.  Not all objects of this list are dumped so the relocated pointer
5874 we associate with them points to the first dumped object of the list, or
5875 Qnil if none is available.  This is also the reason why they are not
5876 used as roots for the purpose of object enumeration.
5877
5878 This is the end of the dumping part.
5879
5880 @node Reloading phase, Remaining issues, Dumping phase, Dumping
5881 @section Reloading phase
5882
5883 @subsection File loading
5884
5885 The file is mmap'ed in memory (which ensures a PAGESIZE alignment, at
5886 least 4096), or if mmap is unavailable or fails, a 256-byte-aligned
5887 malloc is done and the file is loaded.
5888
5889 Some variables are reinitialized from the values found in the header.
5890
5891 The difference between the actual loading address and the reloc_address
5892 is computed and will be used for all the relocations.
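
The fallback path and the delta computation might look like the sketch
below.  It covers only the @code{malloc} case, not the @code{mmap} one,
and the names @code{alloc_aligned_256} and @code{relocation_delta} are
assumptions made for this illustration.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DUMP_ALIGN 256

/* Return a DUMP_ALIGN-aligned block of SIZE bytes carved out of a
   larger malloc()ed block.  *TO_FREE receives the pointer that must
   eventually be passed to free(). */
static char *
alloc_aligned_256 (size_t size, void **to_free)
{
  char *raw;
  uintptr_t addr;

  assert (size > 0);
  raw = malloc (size + DUMP_ALIGN);
  if (raw == NULL)
    return NULL;
  *to_free = raw;
  addr = ((uintptr_t) raw + DUMP_ALIGN - 1) & ~(uintptr_t) (DUMP_ALIGN - 1);
  return (char *) addr;
}

/* Difference between where the image actually is and the address
   recorded in the header; applied to every pointer when relocating. */
static intptr_t
relocation_delta (const char *load_address, uintptr_t reloc_address)
{
  return (intptr_t) ((uintptr_t) load_address - reloc_address);
}
```
@end verbatim

With reloc_address set to 0, as the dumping phase currently does, the
delta is simply the loading address itself.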
5893
5894
5895 @subsection Putting back the staticvec
5896
5897 The staticvec array is memcpy'd from the file and the variables it
5898 points to are reset to the relocated object addresses.
5899
5900
5901 @subsection Putting back the dumpstructed variables
5902
5903 The variables pointed to by dumpstruct in the dump phase are reset to
5904 the right relocated object addresses.
5905
5906
5907 @subsection lrecord_implementations_table
5908
5909 The lrecord_implementations_table is reset to its dump time state and
5910 the right lrecord_type_index values are put in.
5911
5912
5913 @subsection Object relocation
5914
5915 All the objects are relocated using their description and their offset
5916 by @code{pdump_reloc_one}.  This step is unnecessary if the
5917 reloc_address is equal to the file loading address.
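
In outline, relocating one object amounts to adding the delta to each
pointer member named by its description.  The sketch below replaces the
description by a plain array of member offsets and assumes that null
pointers are left alone and that pointers have the same size as
@code{uintptr_t}; the name @code{reloc_one} is illustrative and is not
the real @code{pdump_reloc_one}.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Add DELTA to the pointer stored at each of the N_OFFSETS member
   offsets of the object at OBJ.  Null pointers are left alone. */
static void
reloc_one (char *obj, const size_t *ptr_offsets, size_t n_offsets,
           intptr_t delta)
{
  size_t i;

  assert (sizeof (uintptr_t) == sizeof (void *));
  if (delta == 0)
    return; /* loaded at reloc_address: nothing to do */
  for (i = 0; i < n_offsets; i++)
    {
      uintptr_t p;

      memcpy (&p, obj + ptr_offsets[i], sizeof p);
      if (p != 0)
        {
          p += (uintptr_t) delta;
          memcpy (obj + ptr_offsets[i], &p, sizeof p);
        }
    }
}
```
@end verbatim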
5918
5919
5920 @subsection Putting back the pdump_wire and pdump_wire_list variables
5921
5922 Same as Putting back the dumpstructed variables.
5923
5924
5925 @subsection Reorganize the hash tables
5926
5927 Since some of the hash values in the lisp hash tables are
5928 address-dependent, their layout is now wrong.  So we go through each of
5929 them and have them resorted by calling @code{pdump_reorganize_hash_table}.
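
The need for this step can be seen on a toy address-keyed table: once
the keys have moved, each one has to be reinserted at the slot its new
address hashes to.  The sketch below uses hypothetical names
(@code{slot_for}, @code{insert_key}, @code{reorganize},
@code{contains}) and a fixed-size open-addressing table; it does not
reflect the layout of the real Lisp hash tables.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define N_SLOTS 16 /* hypothetical; a power of two */

static size_t
slot_for (const void *key)
{
  return (size_t) (((uintptr_t) key >> 3) & (N_SLOTS - 1));
}

static void
insert_key (const void **slots, const void *key)
{
  size_t pos = slot_for (key);

  assert (key != NULL);
  while (slots[pos] != NULL)
    pos = (pos + 1) & (N_SLOTS - 1);
  slots[pos] = key;
}

/* Reinsert every key so that each lands where lookups will probe.
   Assumes the table holds fewer than N_SLOTS keys. */
static void
reorganize (const void **slots)
{
  const void *old[N_SLOTS];
  size_t i;

  memcpy (old, slots, sizeof old);
  for (i = 0; i < N_SLOTS; i++)
    slots[i] = NULL;
  for (i = 0; i < N_SLOTS; i++)
    if (old[i] != NULL)
      insert_key (slots, old[i]);
}

/* Return nonzero if KEY can be found by probing from its hash slot. */
static int
contains (const void **slots, const void *key)
{
  size_t pos = slot_for (key);

  while (slots[pos] != NULL)
    {
      if (slots[pos] == key)
        return 1;
      pos = (pos + 1) & (N_SLOTS - 1);
    }
  return 0;
}
```
@end verbatim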
5930
5931 @node Remaining issues,  , Reloading phase, Dumping
5932 @section Remaining issues
5933
5934 The build process will have to start a post-dump xemacs, ask it for the
5935 loading address (which will, hopefully, be always the same between
5936 different xemacs invocations) and relocate the file to the new address.
5937 This way the object relocation phase will not have to be done, which
5938 means no writes in the objects and that, because of the use of mmap, the
5939 dumped data will be shared between all the xemacs processes running on the
5940 computer.
5941
5942 Some executable signature will be necessary to ensure that a given dump
5943 file is really associated with a given executable, or random crashes
5944 will occur.  This could be a random number set at compile or configure time
5945 through a define.  This will also allow for having differently-compiled
5946 xemacsen on the same system (mule and no-mule come to mind).
5947
5948 The DOC file contents should probably end up in the dump file.
5949
5950
5951 @node Events and the Event Loop, Evaluation; Stack Frames; Bindings, Dumping, Top
5952 @chapter Events and the Event Loop
5953
5954 @menu
5955 * Introduction to Events::
5956 * Main Loop::
5957 * Specifics of the Event Gathering Mechanism::
5958 * Specifics About the Emacs Event::
5959 * The Event Stream Callback Routines::
5960 * Other Event Loop Functions::
5961 * Converting Events::
5962 * Dispatching Events; The Command Builder::
5963 @end menu
5964
5965 @node Introduction to Events, Main Loop, Events and the Event Loop, Events and the Event Loop
5966 @section Introduction to Events
5967
5968   An event is an object that encapsulates information about an
5969 interesting occurrence in the operating system.  Events are
5970 generated either by user action, direct (e.g. typing on the
5971 keyboard or moving the mouse) or indirect (moving another
5972 window, thereby generating an expose event on an Emacs frame),
5973 or as a result of some other typically asynchronous action happening,
5974 such as output from a subprocess being ready or a timer expiring.
5975 Events come into the system in an asynchronous fashion (typically
5976 through a callback being called) and are converted into a
5977 synchronous event queue (first-in, first-out) in a process that
5978 we will call @dfn{collection}.
5979
5980   Note that each application has its own event queue. (It is
5981 immaterial whether the collection process directly puts the
5982 events in the proper application's queue, or puts them into
5983 a single system queue, which is later split up.)
5984
5985   The most basic level of event collection is done by the
5986 operating system or window system.  Typically, XEmacs does
5987 its own event collection as well.  Often there are multiple
5988 layers of collection in XEmacs, with events from various
5989 sources being collected into a queue, which is then combined
5990 with other sources to go into another queue (i.e. a second
5991 level of collection), with perhaps another level on top of
5992 this, etc.
5993
5994   XEmacs has its own types of events (called @dfn{Emacs events}),
5995 which provide an abstract layer on top of the system-dependent
5996 nature of the most basic events that are received.  Part of the
5997 complex nature of the XEmacs event collection process involves
5998 converting from the operating-system events into the proper
5999 Emacs events---there may not be a one-to-one correspondence.
6000
6001   Emacs events are documented in @file{events.h}; I'll discuss them
6002 later.
6003
6004 @node Main Loop, Specifics of the Event Gathering Mechanism, Introduction to Events, Events and the Event Loop
6005 @section Main Loop
6006
6007   The @dfn{command loop} is the top-level loop that the editor is always
6008 running.  It loops endlessly, calling @code{next-event} to retrieve an
6009 event and @code{dispatch-event} to execute it. @code{dispatch-event} does
6010 the appropriate thing with non-user events (process, timeout,
6011 magic, eval, mouse motion); this involves calling a Lisp handler
6012 function, redrawing a newly-exposed part of a frame, reading
6013 subprocess output, etc.  For user events, @code{dispatch-event}
6014 looks up the event in relevant keymaps or menubars; when a
6015 full key sequence or menubar selection is reached, the appropriate
6016 function is executed. @code{dispatch-event} may have to keep state
6017 across calls; this is done in the ``command-builder'' structure
6018 associated with each console (remember, there's usually only
6019 one console), and the engine that looks up keystrokes and
6020 constructs full key sequences is called the @dfn{command builder}.
6021 This is documented elsewhere.
6022
6023   The guts of the command loop are in @code{command_loop_1()}.  This
6024 function doesn't catch errors, though---that's the job of
6025 @code{command_loop_2()}, which is a condition-case (i.e. error-trapping)
6026 wrapper around @code{command_loop_1()}.  @code{command_loop_1()} never
6027 returns, but may get thrown out of.
6028
6029   When an error occurs, @code{cmd_error()} is called, which usually
6030 invokes the Lisp error handler in @code{command-error}; however, a
6031 default error handler is provided if @code{command-error} is @code{nil}
6032 (e.g. during startup).  The purpose of the error handler is simply to
6033 display the error message and do associated cleanup; it does not need to
6034 throw anywhere.  When the error handler finishes, the condition-case in
6035 @code{command_loop_2()} will finish and @code{command_loop_2()} will
6036 reinvoke @code{command_loop_1()}.
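
The relationship between the two functions can be mimicked in plain C
with @code{setjmp} and @code{longjmp}.  This is only an analogy: the
real code uses the Lisp condition-case and catch machinery, and the
names @code{toy_command_loop_1}, @code{toy_command_loop_2} and the
one-character ``commands'' are invented for the illustration.

@verbatim
```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

static jmp_buf error_trap;  /* stands in for the condition-case */
static const char *script;  /* 'e' errors, 'q' or end exits, else ok */
static int errors_handled;
static int commands_run;

/* Inner loop: never returns normally; leaves only by a non-local exit. */
static void
toy_command_loop_1 (void)
{
  assert (script != NULL);
  for (;;)
    {
      char c = *script;

      if (c == 'q' || c == '\0')
        longjmp (error_trap, 2); /* stand-in for a throw to the caller */
      script++;
      if (c == 'e')
        longjmp (error_trap, 1); /* stand-in for signaling an error */
      commands_run++;
    }
}

/* Wrapper: trap errors, run the handler, then reinvoke the inner loop. */
static void
toy_command_loop_2 (void)
{
  for (;;)
    {
      switch (setjmp (error_trap))
        {
        case 0:
          toy_command_loop_1 (); /* leaves only via longjmp */
          break;
        case 1:
          errors_handled++;      /* display the message, clean up */
          break;                 /* loop: re-arm the trap */
        default:
          return;
        }
    }
}
```
@end verbatim

Note that the handler itself does not jump anywhere; finishing it is
enough for the wrapper to go around and start the inner loop again.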
6037
6038   @code{command_loop_2()} is invoked from three places: from
6039 @code{initial_command_loop()} (called from @code{main()} at the end of
6040 internal initialization), from the Lisp function @code{recursive-edit},
6041 and from @code{call_command_loop()}.
6042
6043   @code{call_command_loop()} is called when a macro is started and when
6044 the minibuffer is entered; normal termination of the macro or minibuffer
6045 causes a throw out of the recursive command loop. (To
6046 @code{execute-kbd-macro} for macros and @code{exit} for minibuffers.
6047 Note also that the low-level minibuffer-entering function,
6048 @code{read-minibuffer-internal}, provides its own error handling and
6049 does not need @code{command_loop_2()}'s error encapsulation; so it tells
6050 @code{call_command_loop()} to invoke @code{command_loop_1()} directly.)
6051
6052   Note that both @code{read-minibuffer-internal} and @code{recursive-edit} set up a
6053 catch for @code{exit}; this is why @code{abort-recursive-edit}, which
6054 throws to this catch, exits out of either one.
6055
6056   @code{initial_command_loop()}, called from @code{main()}, sets up a
6057 catch for @code{top-level} when invoking @code{command_loop_2()},
6058 allowing functions to throw all the way to the top level if they really
6059 need to.  Before invoking @code{command_loop_2()},
6060 @code{initial_command_loop()} calls @code{top_level_1()}, which handles
6061 all of the startup stuff (creating the initial frame, handling the
6062 command-line options, loading the user's @file{.emacs} file, etc.).  The
6063 function that actually does this is in Lisp and is pointed to by the
6064 variable @code{top-level}; normally this function is
6065 @code{normal-top-level}.  @code{top_level_1()} is just an error-handling
6066 wrapper similar to @code{command_loop_2()}.  Note also that
6067 @code{initial_command_loop()} sets up a catch for @code{top-level} when
6068 invoking @code{top_level_1()}, just like when it invokes
6069 @code{command_loop_2()}.
6070
6071 @node Specifics of the Event Gathering Mechanism, Specifics About the Emacs Event, Main Loop, Events and the Event Loop
6072 @section Specifics of the Event Gathering Mechanism
6073
6074   Here is an approximate diagram of the collection processes
6075 at work in XEmacs, under TTY's (TTY's are simpler than X
6076 so we'll look at this first):
6077
6078 @noindent
6079 @example
6080  asynch.      asynch.    asynch.   asynch.             [Collectors in
6081 kbd events  kbd events   process   process                the OS]
6082       |         |         output    output
6083       |         |           |         |
6084       |         |           |         |      SIGINT,   [signal handlers
6085       |         |           |         |      SIGQUIT,     in XEmacs]
6086       V         V           V         V      SIGWINCH,
6087      file      file        file      file    SIGALRM
6088      desc.     desc.       desc.     desc.     |
6089      (TTY)     (TTY)       (pipe)    (pipe)    |
6090       |          |          |         |      fake    timeouts
6091       |          |          |         |      file        |
6092       |          |          |         |      desc.       |
6093       |          |          |         |      (pipe)      |
6094       |          |          |         |        |         |
6095       |          |          |         |        |         |
6096       |          |          |         |        |         |
6097       V          V          V         V        V         V
6098       ------>-----------<----------------<----------------
6099                   |
6100                   |
6101                   | [collected using select() in emacs_tty_next_event()
6102                   |  and converted to the appropriate Emacs event]
6103                   |
6104                   |
6105                   V          (above this line is TTY-specific)
6106                 Emacs -----------------------------------------------
6107                 event (below this line is the generic event mechanism)
6108                   |
6109                   |
6110 was there     if not, call
6111 a SIGINT?  emacs_tty_next_event()
6112     |             |
6113     |             |
6114     |             |
6115     V             V
6116     --->------<----
6117            |
6118            |     [collected in event_stream_next_event();
6119            |      SIGINT is converted using maybe_read_quit_event()]
6120            V
6121          Emacs
6122          event
6123            |
6124            \---->------>----- maybe_kbd_translate() ---->---\
6125                                                             |
6126                                                             |
6127                                                             |
6128      command event queue                                    |
6129                                                if not from command
6130   (contains events that were                   event queue, call
6131   read earlier but not processed,              event_stream_next_event()
6132   typically when waiting in a                               |
6133   sit-for, sleep-for, etc. for                              |
6134  a particular event to be received)                         |
6135                |                                            |
6136                |                                            |
6137                V                                            V
6138                ---->------------------------------------<----
6139                                                |
6140                                                | [collected in
6141                                                |  next_event_internal()]
6142                                                |
6143  unread-     unread-       event from          |
6144  command-    command-       keyboard       else, call
6145  events      event           macro      next_event_internal()
6146    |           |               |               |
6147    |           |               |               |
6148    |           |               |               |
6149    V           V               V               V
6150    --------->----------------------<------------
6151                      |
6152                      |      [collected in `next-event', which may loop
6153                      |       more than once if the event it gets is on
6154                      |       a dead frame, device, etc.]
6155                      |
6156                      |
6157                      V
6158             feed into top-level event loop,
6159             which repeatedly calls `next-event'
6160             and then dispatches the event
6161             using `dispatch-event'
6162 @end example
6163
6164 Notice the separation between TTY-specific and generic event mechanism.
6165 When using the Xt-based event loop, the TTY-specific stuff is replaced
6166 but the rest stays the same.
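
The central @code{select()} step of the TTY diagram, including the
``fake file descriptor'' that signal handlers write to, can be sketched
with POSIX calls as follows.  The function @code{wait_for_input} is a
hypothetical helper, not @code{emacs_tty_next_event()}, and it ignores
timeouts and interrupted system calls.

@verbatim
```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/* Block until one of the N descriptors in FDS is readable and return
   its index, or -1 on error. */
static int
wait_for_input (const int *fds, int n)
{
  fd_set readable;
  int i, maxfd = -1;

  assert (n > 0);
  FD_ZERO (&readable);
  for (i = 0; i < n; i++)
    {
      FD_SET (fds[i], &readable);
      if (fds[i] > maxfd)
        maxfd = fds[i];
    }
  if (select (maxfd + 1, &readable, NULL, NULL, NULL) <= 0)
    return -1;
  for (i = 0; i < n; i++)
    if (FD_ISSET (fds[i], &readable))
      return i;
  return -1;
}
```
@end verbatim

A signal handler that writes one byte to the write end of a pipe makes
the read end readable, so the signal shows up in the same
@code{select()} call as keyboard and process input.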
6167
6168 It's also important to realize that only one kind of
6169 system-specific event loop can be operating at a time, and it must be able
6170 to receive all kinds of events simultaneously.  For the two existing
6171 event loops (implemented in @file{event-tty.c} and @file{event-Xt.c},
6172 respectively), the TTY event loop @emph{only} handles TTY consoles,
6173 while the Xt event loop handles @emph{both} TTY and X consoles.  This
6174 situation is different from all of the output handlers, where you simply
6175 have one per console type.
6176
6177   Here's the Xt Event Loop Diagram (notice that below a certain point,
6178 it's the same as the above diagram):
6179
6180 @example
6181 asynch. asynch. asynch. asynch.                 [Collectors in
6182  kbd     kbd    process process                    the OS]
6183 events  events  output  output
6184   |       |       |       |
6185   |       |       |       |     asynch. asynch. [Collectors in the
6186   |       |       |       |       X        X     OS and X Window System]
6187   |       |       |       |     events  events
6188   |       |       |       |       |        |
6189   |       |       |       |       |        |
6190   |       |       |       |       |        |    SIGINT, [signal handlers
6191   |       |       |       |       |        |    SIGQUIT,   in XEmacs]
6192   |       |       |       |       |        |    SIGWINCH,
6193   |       |       |       |       |        |    SIGALRM
6194   |       |       |       |       |        |       |
6195   |       |       |       |       |        |       |
6196   |       |       |       |       |        |       |      timeouts
6197   |       |       |       |       |        |       |          |
6198   |       |       |       |       |        |       |          |
6199   |       |       |       |       |        |       V          |
6200   V       V       V       V       V        V      fake        |
6201  file    file    file    file    file     file    file        |
6202  desc.   desc.   desc.   desc.   desc.    desc.   desc.       |
6203  (TTY)   (TTY)   (pipe)  (pipe) (socket) (socket) (pipe)      |
6204   |       |       |       |       |        |       |          |
6205   |       |       |       |       |        |       |          |
6206   |       |       |       |       |        |       |          |
6207   V       V       V       V       V        V       V          V
6208   --->----------------------------------------<---------<------
6209        |              |               |
6210        |              |               |[collected using select() in
6211        |              |               | _XtWaitForSomething(), called
6212        |              |               | from XtAppProcessEvent(), called
6213        |              |               | in emacs_Xt_next_event();
6214        |              |               | dispatched to various callbacks]
6215        |              |               |
6216        |              |               |
6217   emacs_Xt_        p_s_callback(),    | [popup_selection_callback]
6218   event_handler()  x_u_v_s_callback(),| [x_update_vertical_scrollbar_
6219        |           x_u_h_s_callback(),|  callback]
6220        |           search_callback()  | [x_update_horizontal_scrollbar_
6221        |              |               |  callback]
6222        |              |               |
6223        |              |               |
6224   enqueue_Xt_       signal_special_   |
6225   dispatch_event()  Xt_user_event()   |
6226   [maybe multiple     |               |
6227    times, maybe 0     |               |
6228    times]             |               |
6229        |            enqueue_Xt_       |
6230        |            dispatch_event()  |
6231        |              |               |
6232        |              |               |
6233        V              V               |
6234        -->----------<--               |
6235               |                       |
6236               |                       |
6237            dispatch             Xt_what_callback()
6238            event                  sets flags
6239            queue                      |
6240               |                       |
6241               |                       |
6242               |                       |
6243               |                       |
6244               ---->-----------<--------
6245                    |
6246                    |
6247                    |     [collected and converted as appropriate in
6248                    |            emacs_Xt_next_event()]
6249                    |
6250                    |
6251                    V          (above this line is Xt-specific)
6252                  Emacs ------------------------------------------------
6253                  event (below this line is the generic event mechanism)
6254                    |
6255                    |
6256 was there      if not, call
6257 a SIGINT?   emacs_Xt_next_event()
6258     |              |
6259     |              |
6260     |              |
6261     V              V
6262     --->-------<----
6263            |
6264            |        [collected in event_stream_next_event();
6265            |         SIGINT is converted using maybe_read_quit_event()]
6266            V
6267          Emacs
6268          event
6269            |
6270            \---->------>----- maybe_kbd_translate() -->-----\
6271                                                             |
6272                                                             |
6273                                                             |
6274      command event queue                                    |
6275                                               if not from command
6276   (contains events that were                  event queue, call
6277   read earlier but not processed,             event_stream_next_event()
6278   typically when waiting in a                               |
6279   sit-for, sleep-for, etc. for                              |
6280  a particular event to be received)                         |
6281                |                                            |
6282                |                                            |
6283                V                                            V
6284                ---->----------------------------------<------
6285                                                |
6286                                                | [collected in
6287                                                |  next_event_internal()]
6288                                                |
6289  unread-     unread-       event from          |
6290  command-    command-       keyboard       else, call
6291  events      event           macro      next_event_internal()
6292    |           |               |               |
6293    |           |               |               |
6294    |           |               |               |
6295    V           V               V               V
6296    --------->----------------------<------------
6297                      |
6298                      |      [collected in `next-event', which may loop
6299                      |       more than once if the event it gets is on
6300                      |       a dead frame, device, etc.]
6301                      |
6302                      |
6303                      V
6304             feed into top-level event loop,
6305             which repeatedly calls `next-event'
6306             and then dispatches the event
6307             using `dispatch-event'
6308 @end example
6309
6310 @node Specifics About the Emacs Event, The Event Stream Callback Routines, Specifics of the Event Gathering Mechanism, Events and the Event Loop
6311 @section Specifics About the Emacs Event
6312
6313 @node The Event Stream Callback Routines, Other Event Loop Functions, Specifics About the Emacs Event, Events and the Event Loop
6314 @section The Event Stream Callback Routines
6315
6316 @node Other Event Loop Functions, Converting Events, The Event Stream Callback Routines, Events and the Event Loop
6317 @section Other Event Loop Functions
6318
6319   @code{detect_input_pending()} and @code{input-pending-p} look for
6320 input by calling @code{event_stream->event_pending_p} and looking in
6321 @code{[V]unread-command-event} and the @code{command_event_queue} (they
6322 do not check for an executing keyboard macro, though).
6323
  @code{discard-input} cancels any pending command events (and any
keyboard macros currently executing), and puts all other pending events
onto the @code{command_event_queue}.  There is a comment about a ``race
condition'', which is not a good sign.
6328
6329   @code{next-command-event} and @code{read-char} are higher-level
6330 interfaces to @code{next-event}.  @code{next-command-event} gets the
6331 next @dfn{command} event (i.e.  keypress, mouse event, menu selection,
6332 or scrollbar action), calling @code{dispatch-event} on any others.
6333 @code{read-char} calls @code{next-command-event} and uses
6334 @code{event_to_character()} to return the character equivalent.  With
the right kind of input method support, it is possible for
@code{read-char} to return a Kanji character.
6337
6338 @node Converting Events, Dispatching Events; The Command Builder, Other Event Loop Functions, Events and the Event Loop
6339 @section Converting Events
6340
6341   @code{character_to_event()}, @code{event_to_character()},
6342 @code{event-to-character}, and @code{character-to-event} convert between
6343 characters and keypress events corresponding to the characters.  If the
6344 event was not a keypress, @code{event_to_character()} returns -1 and
6345 @code{event-to-character} returns @code{nil}.  These functions convert
6346 between character representation and the split-up event representation
6347 (keysym plus mod keys).
6348
6349 @node Dispatching Events; The Command Builder,  , Converting Events, Events and the Event Loop
6350 @section Dispatching Events; The Command Builder
6351
6352 Not yet documented.
6353
6354 @node Evaluation; Stack Frames; Bindings, Symbols and Variables, Events and the Event Loop, Top
6355 @chapter Evaluation; Stack Frames; Bindings
6356
6357 @menu
6358 * Evaluation::
6359 * Dynamic Binding; The specbinding Stack; Unwind-Protects::
6360 * Simple Special Forms::
6361 * Catch and Throw::
6362 @end menu
6363
6364 @node Evaluation, Dynamic Binding; The specbinding Stack; Unwind-Protects, Evaluation; Stack Frames; Bindings, Evaluation; Stack Frames; Bindings
6365 @section Evaluation
6366
6367   @code{Feval()} evaluates the form (a Lisp object) that is passed to
6368 it.  Note that evaluation is only non-trivial for two types of objects:
6369 symbols and conses.  A symbol is evaluated simply by calling
6370 @code{symbol-value} on it and returning the value.
6371
6372   Evaluating a cons means calling a function.  First, @code{eval} checks
6373 to see if garbage-collection is necessary, and calls
6374 @code{garbage_collect_1()} if so.  It then increases the evaluation
6375 depth by 1 (@code{lisp_eval_depth}, which is always less than
6376 @code{max_lisp_eval_depth}) and adds an element to the linked list of
@code{struct backtrace} structures (@code{backtrace_list}).  Each such structure
6378 contains a pointer to the function being called plus a list of the
6379 function's arguments.  Originally these values are stored unevalled, and
6380 as they are evaluated, the backtrace structure is updated.  Garbage
6381 collection pays attention to the objects pointed to in the backtrace
6382 structures (garbage collection might happen while a function is being
6383 called or while an argument is being evaluated, and there could easily
6384 be no other references to the arguments in the argument list; once an
6385 argument is evaluated, however, the unevalled version is not needed by
6386 eval, and so the backtrace structure is changed).
6387
6388 At this point, the function to be called is determined by looking at
6389 the car of the cons (if this is a symbol, its function definition is
6390 retrieved and the process repeated).  The function should then consist
6391 of either a @code{Lisp_Subr} (built-in function written in C), a
6392 @code{Lisp_Compiled_Function} object, or a cons whose car is one of the
6393 symbols @code{autoload}, @code{macro} or @code{lambda}.
6394
6395 If the function is a @code{Lisp_Subr}, the lisp object points to a
6396 @code{struct Lisp_Subr} (created by @code{DEFUN()}), which contains a
6397 pointer to the C function, a minimum and maximum number of arguments
6398 (or possibly the special constants @code{MANY} or @code{UNEVALLED}), a
6399 pointer to the symbol referring to that subr, and a couple of other
6400 things.  If the subr wants its arguments @code{UNEVALLED}, they are
6401 passed raw as a list.  Otherwise, an array of evaluated arguments is
6402 created and put into the backtrace structure, and either passed whole
6403 (@code{MANY}) or each argument is passed as a C argument.
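
The argument-count check and the two calling conventions described
above can be sketched in plain C.  This is only an illustration: every
@code{toy_} name is hypothetical, Lisp objects are replaced by
@code{long}, only fixed arities of 1 and 2 are handled, and the
@code{UNEVALLED} case is left out.  Padding omitted optional arguments
with a nil stand-in is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MANY (-1)   /* stand-in for the MANY constant */
#define TOY_NIL  0L     /* stand-in for nil */

/* Toy counterpart of a subr record: name, arity limits, C function. */
struct toy_subr
{
  const char *name;
  int min_args;
  int max_args;         /* or TOY_MANY */
  union
  {
    long (*a1) (long);
    long (*a2) (long, long);
    long (*many) (int nargs, long *args);
  } fn;
};

/* Call SUBR on NARGS already-evaluated arguments.
   Returns 0 on success, -1 on a wrong number of arguments. */
static int
toy_funcall_subr (const struct toy_subr *subr, int nargs, long *args,
                  long *result)
{
  long padded[2] = { TOY_NIL, TOY_NIL };
  int i;

  if (nargs < subr->min_args)
    return -1;

  if (subr->max_args == TOY_MANY)
    {
      /* The argument array is passed whole. */
      *result = subr->fn.many (nargs, args);
      return 0;
    }

  if (nargs > subr->max_args)
    return -1;

  /* This sketch supports only fixed arities of 1 and 2. */
  assert (subr->max_args == 1 || subr->max_args == 2);
  for (i = 0; i < nargs; i++)
    padded[i] = args[i];

  /* Each argument is passed as a separate C argument. */
  if (subr->max_args == 1)
    *result = subr->fn.a1 (padded[0]);
  else
    *result = subr->fn.a2 (padded[0], padded[1]);
  return 0;
}

static long
toy_minus (long a, long b)
{
  return a - b;
}

static long
toy_sum (int nargs, long *args)
{
  long total = 0;
  int i;
  for (i = 0; i < nargs; i++)
    total += args[i];
  return total;
}

static const struct toy_subr toy_minus_subr =
  { "minus", 1, 2, { .a2 = toy_minus } };
static const struct toy_subr toy_sum_subr =
  { "sum", 0, TOY_MANY, { .many = toy_sum } };
```

Calling @code{toy_funcall_subr()} on @code{toy_minus_subr} with a single
argument exercises the padding path, while @code{toy_sum_subr} receives
its arguments as an array, however many there are.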
6404
6405 If the function is a @code{Lisp_Compiled_Function},
6406 @code{funcall_compiled_function()} is called.  If the function is a
6407 lambda list, @code{funcall_lambda()} is called.  If the function is a
6408 macro, [..... fill in] is done.  If the function is an autoload,
6409 @code{do_autoload()} is called to load the definition and then eval
6410 starts over [explain this more].
6411
6412 When @code{Feval()} exits, the evaluation depth is reduced by one, the
6413 debugger is called if appropriate, and the current backtrace structure
6414 is removed from the list.
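
The bookkeeping that surrounds a call (depth counter up, backtrace
record pushed, then both undone on exit) can be sketched as follows.
This is a simplified illustration, not the code in @file{eval.c}: all
@code{toy_} names are hypothetical, the backtrace record holds only a
function name, and exceeding the depth limit sets a flag where the real
evaluator would signal an error.

```c
#include <assert.h>
#include <stddef.h>

/* Toy backtrace record; the real one also tracks the arguments. */
struct toy_backtrace
{
  struct toy_backtrace *next;
  const char *function;
};

typedef long (*toy_fn) (long);

static struct toy_backtrace *toy_backtrace_list;
static int toy_eval_depth;
static int toy_max_eval_depth = 200;  /* hypothetical limit */
static int toy_depth_exceeded;        /* set instead of signalling */

/* Call FN on ARG with the entry and exit bookkeeping around it. */
static long
toy_eval_call (const char *name, toy_fn fn, long arg)
{
  struct toy_backtrace bt;
  long result;

  /* Keep the depth strictly below the limit. */
  if (toy_eval_depth + 1 >= toy_max_eval_depth)
    {
      toy_depth_exceeded = 1;
      return 0;
    }

  /* Entry: one level deeper, new record at the head of the list.
     The record lives in this C stack frame. */
  toy_eval_depth++;
  bt.function = name;
  bt.next = toy_backtrace_list;
  toy_backtrace_list = &bt;

  result = fn (arg);

  /* Exit: undo both steps in reverse order. */
  assert (toy_backtrace_list == &bt);
  toy_backtrace_list = bt.next;
  toy_eval_depth--;
  return result;
}

/* Demo function that recurses through toy_eval_call(). */
static long
toy_countdown (long n)
{
  return n <= 0 ? 0 : 1 + toy_eval_call ("countdown", toy_countdown, n - 1);
}
```

Because each record is a local variable of the calling frame, the list
mirrors the C call stack, and popping a record is a single pointer
assignment.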
6415
6416 Both @code{funcall_compiled_function()} and @code{funcall_lambda()} need
6417 to go through the list of formal parameters to the function and bind
6418 them to the actual arguments, checking for @code{&rest} and
6419 @code{&optional} symbols in the formal parameters and making sure the
6420 number of actual arguments is correct.
6421 @code{funcall_compiled_function()} can do this a little more
6422 efficiently, since the formal parameter list can be checked for sanity
6423 when the compiled function object is created.
6424
6425 @code{funcall_lambda()} simply calls @code{Fprogn} to execute the code
6426 in the lambda list.
6427
6428 @code{funcall_compiled_function()} calls the real byte-code interpreter
6429 @code{execute_optimized_program()} on the byte-code instructions, which
6430 are converted into an internal form for faster execution.
6431
6432 When a compiled function is executed for the first time by
6433 @code{funcall_compiled_function()}, or during the dump phase of building
6434 XEmacs, the byte-code instructions are converted from a
6435 @code{Lisp_String} (which is inefficient to access, especially in the
6436 presence of MULE) into a @code{Lisp_Opaque} object containing an array
6437 of unsigned char, which can be directly executed by the byte-code
6438 interpreter.  At this time the byte code is also analyzed for validity
6439 and transformed into a more optimized form, so that
6440 @code{execute_optimized_program()} can really fly.
6441
6442 Here are some of the optimizations performed by the internal byte-code
6443 transformer:
6444 @enumerate
6445 @item
6446 References to the @code{constants} array are checked for out-of-range
6447 indices, so that the byte interpreter doesn't have to.
6448 @item
6449 References to the @code{constants} array that will be used as a Lisp
6450 variable are checked for being correct non-constant (i.e. not @code{t},
6451 @code{nil}, or @code{keywordp}) symbols, so that the byte interpreter
6452 doesn't have to.
6453 @item
6454 The maximum number of variable bindings in the byte-code is
6455 pre-computed, so that space on the @code{specpdl} stack can be
6456 pre-reserved once for the whole function execution.
6457 @item
6458 All byte-code jumps are relative to the current program counter instead
6459 of the start of the program, thereby saving a register.
6460 @item
6461 One-byte relative jumps are converted from the byte-code form of unsigned
6462 chars offset by 127 to machine-friendly signed chars.
6463 @end enumerate
6464
6465 Of course, this transformation of the @code{instructions} should not be
6466 visible to the user, so @code{Fcompiled_function_instructions()} needs
6467 to know how to convert the optimized opaque object back into a Lisp
6468 string that is identical to the original string from the @file{.elc}
6469 file.  (Actually, the resulting string may (rarely) contain slightly
6470 different, yet equivalent, byte code.)
6471
6472 @code{Ffuncall()} implements Lisp @code{funcall}.  @code{(funcall fun
6473 x1 x2 x3 ...)} is equivalent to @code{(eval (list fun (quote x1) (quote
6474 x2) (quote x3) ...))}.  @code{Ffuncall()} contains its own code to do
6475 the evaluation, however, and is very similar to @code{Feval()}.
6476
6477 From the performance point of view, it is worth knowing that most of the
6478 time in Lisp evaluation is spent executing @code{Lisp_Subr} and
6479 @code{Lisp_Compiled_Function} objects via @code{Ffuncall()} (not
6480 @code{Feval()}).
6481
6482 @code{Fapply()} implements Lisp @code{apply}, which is very similar to
6483 @code{funcall} except that if the last argument is a list, the result is the
6484 same as if each of the arguments in the list had been passed separately.
6485 @code{Fapply()} does some business to expand the last argument if it's a
6486 list, then calls @code{Ffuncall()} to do the work.
6487
6488 @code{apply1()}, @code{call0()}, @code{call1()}, @code{call2()}, and
6489 @code{call3()} call a function, passing it the argument(s) given (the
6490 arguments are given as separate C arguments rather than being passed as
6491 an array).  @code{apply1()} uses @code{Fapply()} while the others use
6492 @code{Ffuncall()} to do the real work.
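
The relationship between the fixed-arity helpers and the array-based
call can be sketched like this.  The sketch assumes a calling
convention in which slot 0 of the argument array holds the function;
the @code{toy_} names and the @code{toy_obj} type are hypothetical
stand-ins, not the real @code{Lisp_Object} machinery.

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_obj toy_obj;
typedef toy_obj (*toy_subr) (int nargs, toy_obj *args);

/* Toy object: either a callable (fn != NULL) or a number. */
struct toy_obj
{
  toy_subr fn;
  long num;
};

/* Array-based call: args[0] is the function, the rest its arguments. */
static toy_obj
toy_funcall (int nargs, toy_obj *args)
{
  assert (nargs >= 1 && args[0].fn != NULL);
  return args[0].fn (nargs - 1, args + 1);
}

/* Convenience wrappers taking separate C arguments. */
static toy_obj
toy_call0 (toy_obj fn)
{
  return toy_funcall (1, &fn);
}

static toy_obj
toy_call1 (toy_obj fn, toy_obj a)
{
  toy_obj argv[2];
  argv[0] = fn;
  argv[1] = a;
  return toy_funcall (2, argv);
}

static toy_obj
toy_call2 (toy_obj fn, toy_obj a, toy_obj b)
{
  toy_obj argv[3];
  argv[0] = fn;
  argv[1] = a;
  argv[2] = b;
  return toy_funcall (3, argv);
}

/* Demo callable: adds up the numeric fields of its arguments. */
static toy_obj
toy_sum (int nargs, toy_obj *args)
{
  toy_obj result = { NULL, 0 };
  int i;
  for (i = 0; i < nargs; i++)
    result.num += args[i].num;
  return result;
}
```

The wrappers do nothing beyond packing their C arguments into a small
stack array, which is why they are cheap to use from C code.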
6493
6494 @node Dynamic Binding; The specbinding Stack; Unwind-Protects, Simple Special Forms, Evaluation, Evaluation; Stack Frames; Bindings
6495 @section Dynamic Binding; The specbinding Stack; Unwind-Protects
6496
6497 @example
6498 struct specbinding
6499 @{
6500   Lisp_Object symbol;
6501   Lisp_Object old_value;
6502   Lisp_Object (*func) (Lisp_Object); /* for unwind-protect */
6503 @};
6504 @end example
6505
6506   @code{struct specbinding} is used for local-variable bindings and
6507 unwind-protects.  @code{specpdl} holds an array of @code{struct specbinding}'s,
6508 @code{specpdl_ptr} points to the beginning of the free bindings in the
6509 array, @code{specpdl_size} specifies the total number of binding slots
6510 in the array, and @code{max_specpdl_size} specifies the maximum number
6511 of bindings the array can be expanded to hold.  @code{grow_specpdl()}
6512 increases the size of the @code{specpdl} array, multiplying its size by
6513 2 but never exceeding @code{max_specpdl_size} (except that if this
6514 number is less than 400, it is first set to 400).
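
The growth policy just described can be sketched as follows.  The
@code{toy_} names, the initial size of 50 slots, and the default limit
are assumptions made for the sketch, and failure is reported with a
return value where the real function would signal an error.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_specbinding
{
  const char *symbol;
  long old_value;
};

static struct toy_specbinding *toy_specpdl;
static int toy_specpdl_size;              /* slots currently allocated */
static int toy_max_specpdl_size = 3000;   /* hypothetical default limit */

/* Double the array, clamped to the limit.
   Returns 0 on success, -1 if the limit is reached or memory runs out. */
static int
toy_grow_specpdl (void)
{
  struct toy_specbinding *grown;
  int new_size;

  if (toy_specpdl_size >= toy_max_specpdl_size)
    {
      /* A limit below 400 is first raised to 400. */
      if (toy_max_specpdl_size < 400)
        toy_max_specpdl_size = 400;
      if (toy_specpdl_size >= toy_max_specpdl_size)
        return -1;
    }

  /* 50 is an arbitrary starting size chosen for this sketch. */
  new_size = toy_specpdl_size > 0 ? toy_specpdl_size * 2 : 50;
  if (new_size > toy_max_specpdl_size)
    new_size = toy_max_specpdl_size;
  assert (new_size > toy_specpdl_size);

  grown = realloc (toy_specpdl, (size_t) new_size * sizeof *grown);
  if (grown == NULL)
    return -1;
  toy_specpdl = grown;
  toy_specpdl_size = new_size;
  return 0;
}
```

Starting from an empty array with the limit set to 100, successive
calls give sizes 50, 100, 200 and 400 (the limit having been raised to
400 on the way), after which further growth is refused.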
6515
6516   @code{specbind()} binds a symbol to a value and is used for local
6517 variables and @code{let} forms.  The symbol and its old value (which
6518 might be @code{Qunbound}, indicating no prior value) are recorded in the
specpdl array, and @code{specpdl_ptr} is advanced by one slot.
6520
6521   @code{record_unwind_protect()} implements an @dfn{unwind-protect},
6522 which, when placed around a section of code, ensures that some specified
6523 cleanup routine will be executed even if the code exits abnormally
6524 (e.g. through a @code{throw} or quit).  @code{record_unwind_protect()}
6525 simply adds a new specbinding to the @code{specpdl} array and stores the
6526 appropriate information in it.  The cleanup routine can either be a C
6527 function, which is stored in the @code{func} field, or a @code{progn}
6528 form, which is stored in the @code{old_value} field.
6529
6530   @code{unbind_to()} removes specbindings from the @code{specpdl} array
6531 until the specified position is reached.  Each specbinding can be one of
6532 three types:
6533
6534 @enumerate
6535 @item
6536 an unwind-protect with a C cleanup function (@code{func} is not 0, and
6537 @code{old_value} holds an argument to be passed to the function);
6538 @item
6539 an unwind-protect with a Lisp form (@code{func} is 0, @code{symbol}
6540 is @code{nil}, and @code{old_value} holds the form to be executed with
6541 @code{Fprogn()}); or
6542 @item
6543 a local-variable binding (@code{func} is 0, @code{symbol} is not
6544 @code{nil}, and @code{old_value} holds the old value, which is stored as
6545 the symbol's value).
6546 @end enumerate
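
A minimal sketch of the stack discipline follows, covering the first
and third entry types.  All @code{toy_} names are hypothetical, values
are plain @code{long}s, the array has a fixed size, and the Lisp-form
case is omitted because the sketch has no evaluator.

```c
#include <assert.h>
#include <stddef.h>

typedef long toy_value;

struct toy_symbol
{
  const char *name;
  toy_value value;
};

struct toy_specbinding
{
  struct toy_symbol *symbol;    /* NULL for an unwind-protect */
  toy_value old_value;          /* saved value, or cleanup argument */
  void (*func) (toy_value);     /* non-NULL for a C cleanup function */
};

#define TOY_SPECPDL_SIZE 16
static struct toy_specbinding toy_specpdl[TOY_SPECPDL_SIZE];
static int toy_specpdl_depth;   /* index of the first free slot */

/* Bind SYM to VALUE, remembering the previous value. */
static void
toy_specbind (struct toy_symbol *sym, toy_value value)
{
  struct toy_specbinding *b;
  assert (toy_specpdl_depth < TOY_SPECPDL_SIZE);
  b = &toy_specpdl[toy_specpdl_depth++];
  b->symbol = sym;
  b->old_value = sym->value;
  b->func = NULL;
  sym->value = value;
}

/* Arrange for FUNC (ARG) to run when the stack is unwound. */
static void
toy_record_unwind_protect (void (*func) (toy_value), toy_value arg)
{
  struct toy_specbinding *b;
  assert (func != NULL && toy_specpdl_depth < TOY_SPECPDL_SIZE);
  b = &toy_specpdl[toy_specpdl_depth++];
  b->symbol = NULL;
  b->old_value = arg;
  b->func = func;
}

/* Pop entries until the stack is back at DEPTH. */
static void
toy_unbind_to (int depth)
{
  while (toy_specpdl_depth > depth)
    {
      struct toy_specbinding *b = &toy_specpdl[--toy_specpdl_depth];
      if (b->func != NULL)
        b->func (b->old_value);             /* C cleanup function */
      else if (b->symbol != NULL)
        b->symbol->value = b->old_value;    /* restore the variable */
      /* (func == NULL, symbol == NULL): Lisp form, not modelled here. */
    }
}

/* Demo cleanup function: records the argument it was called with. */
static toy_value toy_last_cleanup_arg;

static void
toy_note_cleanup (toy_value arg)
{
  toy_last_cleanup_arg = arg;
}
```

A caller saves the current depth, makes any number of bindings, and
later passes the saved depth to @code{toy_unbind_to()}; entries are
undone in the reverse of the order in which they were made.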
6547
6548 @node Simple Special Forms, Catch and Throw, Dynamic Binding; The specbinding Stack; Unwind-Protects, Evaluation; Stack Frames; Bindings
6549 @section Simple Special Forms
6550
6551 @code{or}, @code{and}, @code{if}, @code{cond}, @code{progn},
6552 @code{prog1}, @code{prog2}, @code{setq}, @code{quote}, @code{function},
6553 @code{let*}, @code{let}, @code{while}
6554
6555 All of these are very simple and work as expected, calling
6556 @code{Feval()} or @code{Fprogn()} as necessary and (in the case of
6557 @code{let} and @code{let*}) using @code{specbind()} to create bindings
6558 and @code{unbind_to()} to undo the bindings when finished.
6559
6560 Note that, with the exception of @code{Fprogn}, these functions are
6561 typically called in real life only in interpreted code, since the byte
6562 compiler knows how to convert calls to these functions directly into
6563 byte code.
6564
6565 @node Catch and Throw,  , Simple Special Forms, Evaluation; Stack Frames; Bindings
6566 @section Catch and Throw
6567
6568 @example
6569 struct catchtag
6570 @{
6571   Lisp_Object tag;
6572   Lisp_Object val;
6573   struct catchtag *next;
6574   struct gcpro *gcpro;
6575   jmp_buf jmp;
6576   struct backtrace *backlist;
6577   int lisp_eval_depth;
6578   int pdlcount;
6579 @};
6580 @end example
6581
  @code{catch} is a Lisp function that places a catch around a body of
code.  A catch is a means of non-local exit from the code.  When a catch
is created, a tag is specified, and executing a @code{throw} to this tag
will exit from the body of code caught with this tag; the value of the
@code{catch} is then the value given in the call to @code{throw}.  If
there is no such call, the code will be executed normally.
6588
6589   Information pertaining to a catch is held in a @code{struct catchtag},
6590 which is placed at the head of a linked list pointed to by
6591 @code{catchlist}.  @code{internal_catch()} is passed a C function to
6592 call (@code{Fprogn()} when Lisp @code{catch} is called) and arguments to
6593 give it, and places a catch around the function.  Each @code{struct
6594 catchtag} is held in the stack frame of the @code{internal_catch()}
6595 instance that created the catch.
6596
6597   @code{internal_catch()} is fairly straightforward.  It stores into the
6598 @code{struct catchtag} the tag name and the current values of
6599 @code{backtrace_list}, @code{lisp_eval_depth}, @code{gcprolist}, and the
6600 offset into the @code{specpdl} array, sets a jump point with @code{_setjmp()}
6601 (storing the jump point into the @code{struct catchtag}), and calls the
6602 function.  Control will return to @code{internal_catch()} either when
6603 the function exits normally or through a @code{_longjmp()} to this jump
6604 point.  In the latter case, @code{throw} will store the value to be
6605 returned into the @code{struct catchtag} before jumping.  When it's
6606 done, @code{internal_catch()} removes the @code{struct catchtag} from
6607 the catchlist and returns the proper value.
6608
6609   @code{Fthrow()} goes up through the catchlist until it finds one with
6610 a matching tag.  It then calls @code{unbind_catch()} to restore
6611 everything to what it was when the appropriate catch was set, stores the
6612 return value in the @code{struct catchtag}, and jumps (with
6613 @code{_longjmp()}) to its jump point.
6614
6615   @code{unbind_catch()} removes all catches from the catchlist until it
6616 finds the correct one.  Some of the catches might have been placed for
6617 error-trapping, and if so, the appropriate entries on the handlerlist
6618 must be removed (see ``errors'').  @code{unbind_catch()} also restores
6619 the values of @code{gcprolist}, @code{backtrace_list}, and
@code{lisp_eval_depth}, and calls @code{unbind_to()} to undo any specbindings
6621 created since the catch.
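
The interplay of the catch list with the jump buffer can be sketched in
standard C.  This is a reduced illustration under several assumptions:
tags are integers, values are @code{long}s, the @code{toy_} names are
hypothetical, standard @code{setjmp()} and @code{longjmp()} are used,
and none of the saved state (gcpro list, backtrace list, specpdl
offset) is modelled.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

struct toy_catchtag
{
  int tag;
  volatile long val;            /* written by the thrower before the jump */
  struct toy_catchtag *next;
  jmp_buf jmp;
};

static struct toy_catchtag *toy_catchlist;

/* Call FUNC (ARG) with a catch for TAG around it. */
static long
toy_internal_catch (int tag, long (*func) (long), long arg)
{
  struct toy_catchtag c;        /* lives in this stack frame */
  long result;

  assert (func != NULL);
  c.tag = tag;
  c.val = 0;
  c.next = toy_catchlist;
  toy_catchlist = &c;

  if (setjmp (c.jmp) == 0)
    result = func (arg);        /* normal exit */
  else
    result = c.val;             /* arrived here through toy_throw() */

  toy_catchlist = c.next;
  return result;
}

/* Jump to the innermost catch for TAG, discarding inner catches.
   With no matching catch the sketch simply returns. */
static void
toy_throw (int tag, long val)
{
  struct toy_catchtag *c;
  for (c = toy_catchlist; c != NULL; c = c->next)
    if (c->tag == tag)
      {
        toy_catchlist = c;
        c->val = val;
        longjmp (c->jmp, 1);
      }
}

#define TOY_TAG_OUTER 1
#define TOY_TAG_INNER 2

/* Demo body: throws to the outer tag for large inputs. */
static long
toy_body (long n)
{
  if (n > 10)
    toy_throw (TOY_TAG_OUTER, n * 2);
  return n + 1;
}

/* Demo body that wraps toy_body() in an unrelated inner catch. */
static long
toy_nested (long n)
{
  return toy_internal_catch (TOY_TAG_INNER, toy_body, n) + 100;
}
```

Running @code{toy_nested} under an outer catch shows the unwinding: for
a large input the throw skips the inner catch and the pending addition,
and the outer catch returns the thrown value directly.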
6622
6623
6624 @node Symbols and Variables, Buffers and Textual Representation, Evaluation; Stack Frames; Bindings, Top
6625 @chapter Symbols and Variables
6626
6627 @menu
6628 * Introduction to Symbols::
6629 * Obarrays::
6630 * Symbol Values::
6631 @end menu
6632
6633 @node Introduction to Symbols, Obarrays, Symbols and Variables, Symbols and Variables
6634 @section Introduction to Symbols
6635
6636   A symbol is basically just an object with four fields: a name (a
6637 string), a value (some Lisp object), a function (some Lisp object), and
6638 a property list (usually a list of alternating keyword/value pairs).
6639 What makes symbols special is that there is usually only one symbol with
6640 a given name, and the symbol is referred to by name.  This makes a
6641 symbol a convenient way of calling up data by name, i.e. of implementing
6642 variables. (The variable's value is stored in the @dfn{value slot}.)
6643 Similarly, functions are referenced by name, and the definition of the
6644 function is stored in a symbol's @dfn{function slot}.  This means that
6645 there can be a distinct function and variable with the same name.  The
6646 property list is used as a more general mechanism of associating
6647 additional values with particular names, and once again the namespace is
6648 independent of the function and variable namespaces.
6649
6650 @node Obarrays, Symbol Values, Introduction to Symbols, Symbols and Variables
6651 @section Obarrays
6652
6653   The identity of symbols with their names is accomplished through a
6654 structure called an obarray, which is just a poorly-implemented hash
6655 table mapping from strings to symbols whose name is that string. (I say
6656 ``poorly implemented'' because an obarray appears in Lisp as a vector
6657 with some hidden fields rather than as its own opaque type.  This is an
6658 Emacs Lisp artifact that should be fixed.)
6659
6660   Obarrays are implemented as a vector of some fixed size (which should
6661 be a prime for best results), where each ``bucket'' of the vector
6662 contains one or more symbols, threaded through a hidden @code{next}
6663 field in the symbol.  Lookup of a symbol in an obarray, and adding a
6664 symbol to an obarray, is accomplished through standard hash-table
6665 techniques.
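
The bucket-and-chain layout can be sketched as below.  The hash
function, the table size of 31, and all @code{toy_} names are choices
made for this illustration; the real obarray is a Lisp vector and its
symbols are Lisp objects.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_OBARRAY_SIZE 31     /* a prime, as recommended above */

struct toy_symbol
{
  char *name;
  struct toy_symbol *next;      /* the hidden bucket-chain field */
};

struct toy_obarray
{
  struct toy_symbol *buckets[TOY_OBARRAY_SIZE];
};

/* Simple multiplicative string hash, reduced to a bucket index. */
static unsigned int
toy_hash (const char *name)
{
  unsigned int h = 0;
  while (*name != '\0')
    h = h * 31u + (unsigned char) *name++;
  return h % TOY_OBARRAY_SIZE;
}

/* Look NAME up without creating anything; NULL if absent. */
static struct toy_symbol *
toy_intern_soft (const struct toy_obarray *ob, const char *name)
{
  struct toy_symbol *sym;
  for (sym = ob->buckets[toy_hash (name)]; sym != NULL; sym = sym->next)
    if (strcmp (sym->name, name) == 0)
      return sym;
  return NULL;
}

/* Look NAME up, creating and adding a symbol if it is not there.
   Returns NULL only if memory allocation fails. */
static struct toy_symbol *
toy_intern (struct toy_obarray *ob, const char *name)
{
  unsigned int bucket;
  struct toy_symbol *sym;

  assert (ob != NULL && name != NULL);
  sym = toy_intern_soft (ob, name);
  if (sym != NULL)
    return sym;

  sym = malloc (sizeof *sym);
  if (sym == NULL)
    return NULL;
  sym->name = malloc (strlen (name) + 1);
  if (sym->name == NULL)
    {
      free (sym);
      return NULL;
    }
  strcpy (sym->name, name);

  /* Thread the new symbol onto the front of its bucket's chain. */
  bucket = toy_hash (name);
  sym->next = ob->buckets[bucket];
  ob->buckets[bucket] = sym;
  return sym;
}
```

Interning the same name twice yields the same pointer, which is the
property that lets symbols be compared by identity rather than by name.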
6666
6667   The standard Lisp function for working with symbols and obarrays is
6668 @code{intern}.  This looks up a symbol in an obarray given its name; if
6669 it's not found, a new symbol is automatically created with the specified
6670 name, added to the obarray, and returned.  This is what happens when the
6671 Lisp reader encounters a symbol (or more precisely, encounters the name
6672 of a symbol) in some text that it is reading.  There is a standard
6673 obarray called @code{obarray} that is used for this purpose, although
6674 the Lisp programmer is free to create his own obarrays and @code{intern}
6675 symbols in them.
6676
6677   Note that, once a symbol is in an obarray, it stays there until
6678 something is done about it, and the standard obarray @code{obarray}
6679 always stays around, so once you use any particular variable name, a
6680 corresponding symbol will stay around in @code{obarray} until you exit
6681 XEmacs.
6682
6683   Note that @code{obarray} itself is a variable, and as such there is a
6684 symbol in @code{obarray} whose name is @code{"obarray"} and which
6685 contains @code{obarray} as its value.
6686
6687   Note also that this call to @code{intern} occurs only when in the Lisp
6688 reader, not when the code is executed (at which point the symbol is
6689 already around, stored as such in the definition of the function).
6690
6691   You can create your own obarray using @code{make-vector} (this is
6692 horrible but is an artifact) and intern symbols into that obarray.
6693 Doing that will result in two or more symbols with the same name.
6694 However, at most one of these symbols is in the standard @code{obarray}:
6695 You cannot have two symbols of the same name in any particular obarray.
6696 Note that you cannot add a symbol to an obarray in any fashion other
6697 than using @code{intern}: i.e. you can't take an existing symbol and put
6698 it in an existing obarray.  Nor can you change the name of an existing
6699 symbol. (Since obarrays are vectors, you can violate the consistency of
6700 things by storing directly into the vector, but let's ignore that
6701 possibility.)
6702
6703   Usually symbols are created by @code{intern}, but if you really want,
6704 you can explicitly create a symbol using @code{make-symbol}, giving it
6705 some name.  The resulting symbol is not in any obarray (i.e. it is
6706 @dfn{uninterned}), and you can't add it to any obarray.  Therefore its
6707 primary purpose is as a symbol to use in macros to avoid namespace
6708 pollution.  It can also be used as a carrier of information, but cons
6709 cells could probably be used just as well.
6710
6711   You can also use @code{intern-soft} to look up a symbol but not create
6712 a new one, and @code{unintern} to remove a symbol from an obarray.  This
6713 returns the removed symbol. (Remember: You can't put the symbol back
6714 into any obarray.) Finally, @code{mapatoms} maps over all of the symbols
6715 in an obarray.
6716
6717 @node Symbol Values,  , Obarrays, Symbols and Variables
6718 @section Symbol Values
6719
6720   The value field of a symbol normally contains a Lisp object.  However,
6721 a symbol can be @dfn{unbound}, meaning that it logically has no value.
6722 This is internally indicated by storing a special Lisp object, called
6723 @dfn{the unbound marker} and stored in the global variable
6724 @code{Qunbound}.  The unbound marker is of a special Lisp object type
6725 called @dfn{symbol-value-magic}.  It is impossible for the Lisp
6726 programmer to directly create or access any object of this type.
6727
6728   @strong{You must not let any ``symbol-value-magic'' object escape to
6729 the Lisp level.}  Printing any of these objects will cause the message
6730 @samp{INTERNAL EMACS BUG} to appear as part of the print representation.
6731 (You may see this normally when you call @code{debug_print()} from the
6732 debugger on a Lisp object.) If you let one of these objects escape to
the Lisp level, you will violate a number of assumptions contained in
the C code, and the unbound marker will not function correctly.
6735
6736   When a symbol is created, its value field (and function field) are set
6737 to @code{Qunbound}.  The Lisp programmer can restore these conditions
6738 later using @code{makunbound} or @code{fmakunbound}, and can query to
see whether the value or function fields are @dfn{bound} (i.e. have a
6740 value other than @code{Qunbound}) using @code{boundp} and
6741 @code{fboundp}.  The fields are set to a normal Lisp object using
6742 @code{set} (or @code{setq}) and @code{fset}.
6743
6744   Other symbol-value-magic objects are used as special markers to
6745 indicate variables that have non-normal properties.  This includes any
6746 variables that are tied into C variables (setting the variable magically
6747 sets some global variable in the C code, and likewise for retrieving the
6748 variable's value), variables that magically tie into slots in the
6749 current buffer, variables that are buffer-local, etc.  The
6750 symbol-value-magic object is stored in the value cell in place of
6751 a normal object, and the code to retrieve a symbol's value
6752 (i.e. @code{symbol-value}) knows how to do special things with them.
6753 This means that you should not just fetch the value cell directly if you
6754 want a symbol's value.
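
The reason for going through an accessor can be shown with a heavily
simplified model.  It is not the actual object layout: the tagged cell,
the single ``forward to a C @code{int}'' case, and all @code{toy_}
names are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum toy_cell_kind
{
  TOY_UNBOUND,          /* plays the role of the unbound marker */
  TOY_PLAIN,            /* an ordinary value */
  TOY_FORWARD_INT       /* value lives in a C int variable */
};

struct toy_value_cell
{
  enum toy_cell_kind kind;
  long plain;           /* used when kind == TOY_PLAIN */
  int *forward;         /* used when kind == TOY_FORWARD_INT */
};

struct toy_symbol
{
  const char *name;
  struct toy_value_cell value;
};

/* Fetch the value of SYM into *OUT.
   Returns 1 if the symbol is bound, 0 if it is unbound. */
static int
toy_symbol_value (const struct toy_symbol *sym, long *out)
{
  switch (sym->value.kind)
    {
    case TOY_PLAIN:
      *out = sym->value.plain;
      return 1;
    case TOY_FORWARD_INT:
      assert (sym->value.forward != NULL);
      *out = *sym->value.forward;     /* read through to the C variable */
      return 1;
    default:
      return 0;                       /* unbound */
    }
}

/* Store V as the value of SYM, honouring any forwarding. */
static void
toy_set (struct toy_symbol *sym, long v)
{
  if (sym->value.kind == TOY_FORWARD_INT)
    {
      assert (sym->value.forward != NULL);
      *sym->value.forward = (int) v;  /* write through to the C variable */
    }
  else
    {
      sym->value.kind = TOY_PLAIN;
      sym->value.plain = v;
    }
}
```

Code that read the @code{plain} field directly would get a stale or
meaningless number for a forwarded symbol, which is the hazard the
paragraph above warns about.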
6755
6756   The exact workings of this are rather complex and involved and are
6757 well-documented in comments in @file{buffer.c}, @file{symbols.c}, and
6758 @file{lisp.h}.
6759
6760 @node Buffers and Textual Representation, MULE Character Sets and Encodings, Symbols and Variables, Top
6761 @chapter Buffers and Textual Representation
6762
6763 @menu
6764 * Introduction to Buffers::     A buffer holds a block of text such as a file.
6765 * The Text in a Buffer::        Representation of the text in a buffer.
6766 * Buffer Lists::                Keeping track of all buffers.
6767 * Markers and Extents::         Tagging locations within a buffer.
6768 * Bufbytes and Emchars::        Representation of individual characters.
6769 * The Buffer Object::           The Lisp object corresponding to a buffer.
6770 @end menu
6771
6772 @node Introduction to Buffers, The Text in a Buffer, Buffers and Textual Representation, Buffers and Textual Representation
6773 @section Introduction to Buffers
6774
6775   A buffer is logically just a Lisp object that holds some text.
6776 In this, it is like a string, but a buffer is optimized for
6777 frequent insertion and deletion, while a string is not.  Furthermore:
6778
6779 @enumerate
6780 @item
6781 Buffers are @dfn{permanent} objects, i.e. once you create them, they
6782 remain around, and need to be explicitly deleted before they go away.
6783 @item
6784 Each buffer has a unique name, which is a string.  Buffers are
6785 normally referred to by name.  In this respect, they are like
6786 symbols.
6787 @item
6788 Buffers have a default insertion position, called @dfn{point}.
6789 Inserting text (unless you explicitly give a position) goes at point,
6790 and moves point forward past the text.  This is what is going on when
6791 you type text into Emacs.
6792 @item
6793 Buffers have lots of extra properties associated with them.
6794 @item
6795 Buffers can be @dfn{displayed}.  What this means is that there
6796 exist a number of @dfn{windows}, which are objects that correspond
6797 to some visible section of your display, and each window has
6798 an associated buffer, and the current contents of the buffer
6799 are shown in that section of the display.  The redisplay mechanism
6800 (which takes care of doing this) knows how to look at the
6801 text of a buffer and come up with some reasonable way of displaying
6802 this.  Many of the properties of a buffer control how the
6803 buffer's text is displayed.
6804 @item
6805 One buffer is distinguished and called the @dfn{current buffer}.  It is
6806 stored in the variable @code{current_buffer}.  Buffer operations operate
6807 on this buffer by default.  When you are typing text into a buffer, the
6808 buffer you are typing into is always @code{current_buffer}.  Switching
6809 to a different window changes the current buffer.  Note that Lisp code
6810 can temporarily change the current buffer using @code{set-buffer} (often
6811 enclosed in a @code{save-excursion} so that the former current buffer
6812 gets restored when the code is finished).  However, calling
6813 @code{set-buffer} will NOT cause a permanent change in the current
6814 buffer.  The reason for this is that the top-level event loop sets
6815 @code{current_buffer} to the buffer of the selected window, each time
6816 it finishes executing a user command.
6817 @end enumerate
6818
6819   Make sure you understand the distinction between @dfn{current buffer}
6820 and @dfn{buffer of the selected window}, and the distinction between
6821 @dfn{point} of the current buffer and @dfn{window-point} of the selected
6822 window. (This latter distinction is explained in detail in the section
6823 on windows.)
6824
6825 @node The Text in a Buffer, Buffer Lists, Introduction to Buffers, Buffers and Textual Representation
6826 @section The Text in a Buffer
6827
6828   The text in a buffer consists of a sequence of zero or more
6829 characters.  A @dfn{character} is an integer that logically represents
6830 a letter, number, space, or other unit of text.  Most of the characters
6831 that you will typically encounter belong to the ASCII set of characters,
6832 but there are also characters for various sorts of accented letters,
special symbols, Chinese and Japanese characters (e.g. Kanji, Katakana,
6834 etc.), Cyrillic and Greek letters, etc.  The actual number of possible
6835 characters is quite large.
6836
6837   For now, we can view a character as some non-negative integer that
6838 has some shape that defines how it typically appears (e.g. as an
6839 uppercase A). (The exact way in which a character appears depends on the
6840 font used to display the character.) The internal type of characters in
6841 the C code is an @code{Emchar}; this is just an @code{int}, but using a
6842 symbolic type makes the code clearer.
6843
6844   Between every character in a buffer is a @dfn{buffer position} or
6845 @dfn{character position}.  We can speak of the character before or after
6846 a particular buffer position, and when you insert a character at a
6847 particular position, all characters after that position end up at new
6848 positions.  When we speak of the character @dfn{at} a position, we
6849 really mean the character after the position.  (This inconsistency
6850 between a buffer position being ``between'' two characters and ``on'' a
6851 character is rampant in Emacs.)
6852
6853   Buffer positions are numbered starting at 1.  This means that
6854 position 1 is before the first character, and position 0 is not
6855 valid.  If there are N characters in a buffer, then buffer
6856 position N+1 is after the last one, and position N+2 is not valid.
6857
6858   The internal makeup of the Emchar integer varies depending on whether
6859 we have compiled with MULE support.  If not, the Emchar integer is an
6860 8-bit integer with possible values from 0 - 255.  0 - 127 are the
6861 standard ASCII characters, while 128 - 255 are the characters from the
6862 ISO-8859-1 character set.  If we have compiled with MULE support, an
6863 Emchar is a 19-bit integer, with the various bits having meanings
6864 according to a complex scheme that will be detailed later.  The
6865 characters numbered 0 - 255 still have the same meanings as for the
6866 non-MULE case, though.
6867
6868   Internally, the text in a buffer is represented in a fairly simple
6869 fashion: as a contiguous array of bytes, with a @dfn{gap} of some size
6870 in the middle.  Although the gap is of some substantial size in bytes,
6871 there is no text contained within it: From the perspective of the text
6872 in the buffer, it does not exist.  The gap logically sits at some buffer
6873 position, between two characters (or possibly at the beginning or end of
6874 the buffer).  Insertion of text in a buffer at a particular position is
6875 always accomplished by first moving the gap to that position
6876 (i.e. through some block moving of text), then writing the text into the
6877 beginning of the gap, thereby shrinking the gap.  If the gap shrinks
6878 down to nothing, a new gap is created. (What actually happens is that a
6879 new gap is ``created'' at the end of the buffer's text, which requires
6880 nothing more than changing a couple of indices; then the gap is
6881 ``moved'' to the position where the insertion needs to take place by
6882 moving up in memory all the text after that position.)  Similarly,
6883 deletion occurs by moving the gap to the place where the text is to be
6884 deleted, and then simply expanding the gap to include the deleted text.
6885 (@dfn{Expanding} and @dfn{shrinking} the gap as just described means
6886 just that the internal indices that keep track of where the gap is
6887 located are changed.)
6888
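  The gap mechanism described above can be sketched in C as follows.
This is a minimal illustration, not the code in @file{insdel.c}: the
@code{GapBuf} structure and the @code{gap_*} functions are hypothetical
names, offsets are 0-based for brevity, and the block never grows
(insertion simply fails when the gap is too small).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy gap buffer; not the XEmacs buffer structure. */
typedef struct {
    unsigned char *text;  /* memory block holding the text plus the gap */
    size_t gap_start;     /* 0-based offset of the first gap byte */
    size_t gap_size;      /* number of unused bytes in the gap */
    size_t text_len;      /* bytes of real text, excluding the gap */
} GapBuf;

/* Move the gap so that it begins at text offset pos. */
void gap_move(GapBuf *b, size_t pos)
{
    assert(pos <= b->text_len);
    if (pos < b->gap_start) {
        /* Text in [pos, gap_start) slides up to the far side of the gap. */
        memmove(b->text + pos + b->gap_size, b->text + pos,
                b->gap_start - pos);
    } else if (pos > b->gap_start) {
        /* Text just after the gap slides down to the near side. */
        memmove(b->text + b->gap_start,
                b->text + b->gap_start + b->gap_size,
                pos - b->gap_start);
    }
    b->gap_start = pos;
}

/* Insert len bytes at pos.  Returns 0 on success, -1 if the gap is
   too small (a real implementation would enlarge the block here). */
int gap_insert(GapBuf *b, size_t pos, const unsigned char *s, size_t len)
{
    if (len > b->gap_size)
        return -1;
    gap_move(b, pos);
    memcpy(b->text + b->gap_start, s, len);  /* write into gap start */
    b->gap_start += len;                     /* gap shrinks from the left */
    b->gap_size -= len;
    b->text_len += len;
    return 0;
}

/* Delete len bytes starting at pos by letting the gap swallow them. */
void gap_delete(GapBuf *b, size_t pos, size_t len)
{
    assert(pos + len <= b->text_len);
    gap_move(b, pos);
    b->gap_size += len;   /* only the indices change; no bytes move */
    b->text_len -= len;
}

/* Copy the text, without the gap, into out (at least text_len bytes). */
void gap_copy_out(const GapBuf *b, unsigned char *out)
{
    memcpy(out, b->text, b->gap_start);
    memcpy(out + b->gap_start, b->text + b->gap_start + b->gap_size,
           b->text_len - b->gap_start);
}
```

  Note that @code{gap_delete} moves no text once the gap is in place,
and that a run of insertions at the same spot pays for @code{gap_move}
only once.
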
6889   Note that the total amount of memory allocated for a buffer text never
6890 decreases while the buffer is live.  Therefore, if you load up a
6891 20-megabyte file and then delete all but one character, there will be a
6892 20-megabyte gap, which won't get any smaller (except by inserting
6893 characters back again).  Once the buffer is killed, the memory allocated
6894 for the buffer text will be freed, but it will still be sitting on the
6895 heap, taking up virtual memory, and will not be released back to the
6896 operating system. (However, if you have compiled XEmacs with rel-alloc,
6897 the situation is different.  In this case, the space @emph{will} be
6898 released back to the operating system.  However, this tends to result in a
6899 noticeable speed penalty.)
6900
6901   Astute readers may notice that the text in a buffer is represented as
6902 an array of @emph{bytes}, while (at least in the MULE case) an Emchar is
6903 a 19-bit integer, which clearly cannot fit in a byte.  This means (of
6904 course) that the text in a buffer uses a different representation from
6905 an Emchar: specifically, the 19-bit Emchar becomes a series of one to
6906 four bytes.  The conversion between these two representations is complex
6907 and will be described later.
6908
6909   In the non-MULE case, everything is very simple: An Emchar
6910 is an 8-bit value, which fits neatly into one byte.
6911
6912   If we are given a buffer position and want to retrieve the
6913 character at that position, we need to follow these steps:
6914
6915 @enumerate
6916 @item
6917 Pretend there's no gap, and convert the buffer position into a @dfn{byte
6918 index} that indexes to the appropriate byte in the buffer's stream of
6919 textual bytes.  By convention, byte indices begin at 1, just like buffer
6920 positions.  In the non-MULE case, byte indices and buffer positions are
6921 identical, since one character equals one byte.
6922 @item
6923 Convert the byte index into a @dfn{memory index}, which takes the gap
6924 into account.  The memory index is a direct index into the block of
6925 memory that stores the text of a buffer.  This basically just involves
6926 checking to see if the byte index is past the gap, and if so, adding the
6927 size of the gap to it.  By convention, memory indices begin at 1, just
6928 like buffer positions and byte indices, and when referring to the
6929 position that is @dfn{at} the gap, we always use the memory position at
6930 the @emph{beginning}, not at the end, of the gap.
6931 @item
6932 Fetch the appropriate bytes at the determined memory position.
6933 @item
6934 Convert these bytes into an Emchar.
6935 @end enumerate
6936
6937   In the non-MULE case, steps (3) and (4) boil down to a simple one-byte
6938 memory access.
6939
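  For the non-MULE case, the steps above can be sketched as follows.
The typedef names are the ones used in this section, but the
@code{toy_buffer} structure and the @code{toy_*} functions are
hypothetical stand-ins, not the actual XEmacs definitions.  The sketch
assumes that a @emph{position} at the gap maps to the beginning of the
gap, while fetching the character @emph{at} that position must skip
over the gap.

```c
#include <assert.h>

typedef int Bufpos;            /* character position, 1-based */
typedef int Bytind;            /* byte index, 1-based, gap ignored */
typedef int Memind;            /* memory index, 1-based, gap included */
typedef int Emchar;
typedef unsigned char Bufbyte;

/* Hypothetical toy buffer for the non-MULE case. */
struct toy_buffer {
    Bufbyte *mem;   /* text block; mem[0] corresponds to memory index 1 */
    Bytind gpt;     /* byte index at which the gap sits */
    int gap_size;   /* size of the gap in bytes */
    int nbytes;     /* bytes of text, excluding the gap */
};

/* Step 1: without MULE, one character is one byte. */
Bytind toy_bufpos_to_bytind(const struct toy_buffer *b, Bufpos pos)
{
    (void) b;
    return (Bytind) pos;
}

/* Step 2, for a position: only indices past the gap are shifted, so a
   position at the gap maps to the beginning of the gap. */
Memind toy_bytind_to_memind(const struct toy_buffer *b, Bytind bi)
{
    return (Memind) (bi > b->gpt ? bi + b->gap_size : bi);
}

/* Steps 1 to 4 combined: fetch the character at (i.e. after) pos. */
Emchar toy_fetch_char(const struct toy_buffer *b, Bufpos pos)
{
    assert(pos >= 1 && pos <= b->nbytes);
    Bytind bi = toy_bufpos_to_bytind(b, pos);
    /* The character at the gap position lives after the gap. */
    Memind mi = bi >= b->gpt ? bi + b->gap_size : bi;
    return (Emchar) b->mem[mi - 1];
}
```
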
6940   Note that we have defined three types of positions in a buffer:
6941
6942 @enumerate
6943 @item
6944 @dfn{buffer positions} or @dfn{character positions}, typedef @code{Bufpos}
6945 @item
6946 @dfn{byte indices}, typedef @code{Bytind}
6947 @item
6948 @dfn{memory indices}, typedef @code{Memind}
6949 @end enumerate
6950
6951   All three typedefs are just @code{int}s, but defining them this way makes
6952 things a lot clearer.
6953
6954   Most code works with buffer positions.  In particular, all Lisp code
6955 that refers to text in a buffer uses buffer positions.  Lisp code does
6956 not know that byte indices or memory indices exist.
6957
6958   Finally, we have a typedef for the bytes in a buffer.  This is a
6959 @code{Bufbyte}, which is an unsigned char.  Referring to them as
6960 Bufbytes underscores the fact that we are working with a string of bytes
6961 in the internal Emacs buffer representation rather than in one of a
6962 number of possible alternative representations (e.g. EUC-encoded text,
6963 etc.).
6964
6965 @node Buffer Lists, Markers and Extents, The Text in a Buffer, Buffers and Textual Representation
6966 @section Buffer Lists
6967
6968   Recall from earlier that buffers are @dfn{permanent} objects, i.e. that
6969 they remain around until explicitly deleted.  This entails that there is
6970 a list of all the buffers in existence.  This list is actually an
6971 assoc-list (mapping from the buffer's name to the buffer) and is stored
6972 in the global variable @code{Vbuffer_alist}.
6973
6974   The order of the buffers in the list is important: the buffers are
6975 ordered approximately from most-recently-used to least-recently-used.
6976 Switching to a buffer using @code{switch-to-buffer},
6977 @code{pop-to-buffer}, etc. and switching windows using
6978 @code{other-window}, etc.  usually brings the new current buffer to the
6979 front of the list.  @code{switch-to-buffer}, @code{other-buffer},
6980 etc. look at the beginning of the list to find an alternative buffer to
6981 suggest.  You can also explicitly move a buffer to the end of the list
6982 using @code{bury-buffer}.
6983
6984   In addition to the global ordering in @code{Vbuffer_alist}, each frame
6985 has its own ordering of the list.  These lists always contain the same
6986 elements as in @code{Vbuffer_alist} although possibly in a different
6987 order.  @code{buffer-list} normally returns the list for the selected
6988 frame.  This allows you to work in separate frames without things
6989 interfering with each other.
6990
6991   The standard way to look up a buffer given a name is
6992 @code{get-buffer}, and the standard way to create a new buffer is
6993 @code{get-buffer-create}, which looks up a buffer with a given name,
6994 creating a new one if necessary.  These operations correspond exactly
6995 with the symbol operations @code{intern-soft} and @code{intern},
6996 respectively.  You can also force a new buffer to be created using
6997 @code{generate-new-buffer}, which takes a name and (if necessary) makes
6998 a unique name from this by appending a number, and then creates the
6999 buffer.  This is basically like the symbol operation @code{gensym}.
7000
7001 @node Markers and Extents, Bufbytes and Emchars, Buffer Lists, Buffers and Textual Representation
7002 @section Markers and Extents
7003
7004   Among the things associated with a buffer are things that are
7005 logically attached to certain buffer positions.  This can be used to
7006 keep track of a buffer position when text is inserted and deleted, so
7007 that it remains at the same spot relative to the text around it; to
7008 assign properties to particular sections of text; etc.  There are two
7009 such objects that are useful in this regard: they are @dfn{markers} and
7010 @dfn{extents}.
7011
7012   A @dfn{marker} is simply a flag placed at a particular buffer
7013 position, which is moved around as text is inserted and deleted.
7014 Markers are used for all sorts of purposes, such as the @code{mark} that
7015 is the other end of textual regions to be cut, copied, etc.
7016
7017   An @dfn{extent} is similar to two markers plus some associated
7018 properties, and is used to keep track of regions in a buffer as text is
7019 inserted and deleted, and to add properties (e.g. fonts) to particular
7020 regions of text.  The external interface of extents is explained
7021 elsewhere.
7022
7023   The important thing here is that markers and extents simply contain
7024 buffer positions in them as integers, and every time text is inserted or
7025 deleted, these positions must be updated.  In order to minimize the
7026 amount of shuffling that needs to be done, the positions in markers and
7027 extents (there's one per marker, two per extent) are stored as Meminds.
7028 This means that they only need to be moved when the text is physically
7029 moved in memory; since the gap structure tries to minimize this, it also
7030 minimizes the number of marker and extent indices that need to be
7031 adjusted.  Look in @file{insdel.c} for the details of how this works.
7032
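  The benefit of storing positions as Meminds can be illustrated with a
small sketch.  It is not the code in @file{insdel.c}; the
@code{gapinfo} structure and the @code{gapinfo_*} functions are
hypothetical, and markers are kept in a plain array.  Inserting at the
gap changes no stored position at all; only moving the gap forces an
adjustment, and then only for the markers lying between the old and the
new gap position.

```c
#include <assert.h>
#include <stddef.h>

typedef int Bytind;   /* 1-based byte index, gap ignored */
typedef int Memind;   /* 1-based memory index, gap included */

/* Hypothetical description of where the gap is. */
struct gapinfo {
    Bytind gpt;       /* byte index at which the gap sits */
    int gap_size;     /* size of the gap in bytes */
};

Memind gapinfo_to_memind(const struct gapinfo *g, Bytind bi)
{
    return bi > g->gpt ? bi + g->gap_size : bi;
}

Bytind gapinfo_to_bytind(const struct gapinfo *g, Memind mi)
{
    return mi > g->gpt ? mi - g->gap_size : mi;
}

/* Insert nbytes at the gap: the gap shrinks from the left.  No marker
   is touched, because the text after the gap does not move in memory. */
void gapinfo_insert_at_gap(struct gapinfo *g, int nbytes)
{
    assert(nbytes <= g->gap_size);
    g->gpt += nbytes;
    g->gap_size -= nbytes;
}

/* Move the gap to new_gpt and fix up the markers whose text moved.
   Returns the number of markers that had to be changed. */
int gapinfo_move_gap(struct gapinfo *g, Bytind new_gpt,
                     Memind *markers, size_t nmarkers)
{
    Bytind old = g->gpt;
    int touched = 0;
    for (size_t i = 0; i < nmarkers; i++) {
        Memind m = markers[i];
        if (new_gpt < old && m > new_gpt && m <= old) {
            markers[i] = m + g->gap_size;   /* text slid past the gap */
            touched++;
        } else if (new_gpt > old && m > old + g->gap_size
                   && m <= new_gpt + g->gap_size) {
            markers[i] = m - g->gap_size;   /* text slid back down */
            touched++;
        }
    }
    g->gpt = new_gpt;
    return touched;
}
```

  This sketch still scans every marker when the gap moves; the point is
only that the stored values change for the markers in the moved region
and nowhere else.
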
7033   One other important distinction is that markers are @dfn{temporary}
7034 while extents are @dfn{permanent}.  This means that markers disappear as
7035 soon as there are no more pointers to them, and correspondingly, there
7036 is no way to determine what markers are in a buffer if you are just
7037 given the buffer.  Extents remain in a buffer until they are detached
7038 (which could happen as a result of text being deleted) or the buffer is
7039 deleted, and primitives do exist to enumerate the extents in a buffer.
7040
7041 @node Bufbytes and Emchars, The Buffer Object, Markers and Extents, Buffers and Textual Representation
7042 @section Bufbytes and Emchars
7043
7044   Not yet documented.
7045
7046 @node The Buffer Object,  , Bufbytes and Emchars, Buffers and Textual Representation
7047 @section The Buffer Object
7048
7049   Buffers contain fields not directly accessible by the Lisp programmer.
7050 We describe them here, naming them by the names used in the C code.
7051 Many are accessible indirectly in Lisp programs via Lisp primitives.
7052
7053 @table @code
7054 @item name
7055 The buffer name is a string that names the buffer.  It is guaranteed to
7056 be unique.  @xref{Buffer Names,,, lispref, XEmacs Lisp Programmer's
7057 Manual}.
7058
7059 @item save_modified
7060 This field contains the time when the buffer was last saved, as an
7061 integer.  @xref{Buffer Modification,,, lispref, XEmacs Lisp Programmer's
7062 Manual}.
7063
7064 @item modtime
7065 This field contains the modification time of the visited file.  It is
7066 set when the file is written or read.  Every time the buffer is written
7067 to the file, this field is compared to the modification time of the
7068 file.  @xref{Buffer Modification,,, lispref, XEmacs Lisp Programmer's
7069 Manual}.
7070
7071 @item auto_save_modified
7072 This field contains the time when the buffer was last auto-saved.
7073
7074 @item last_window_start
7075 This field contains the @code{window-start} position in the buffer as of
7076 the last time the buffer was displayed in a window.
7077
7078 @item undo_list
7079 This field points to the buffer's undo list.  @xref{Undo,,, lispref,
7080 XEmacs Lisp Programmer's Manual}.
7081
7082 @item syntax_table_v
7083 This field contains the syntax table for the buffer.  @xref{Syntax
7084 Tables,,, lispref, XEmacs Lisp Programmer's Manual}.
7085
7086 @item downcase_table
7087 This field contains the conversion table for converting text to lower
7088 case.  @xref{Case Tables,,, lispref, XEmacs Lisp Programmer's Manual}.
7089
7090 @item upcase_table
7091 This field contains the conversion table for converting text to upper
7092 case.  @xref{Case Tables,,, lispref, XEmacs Lisp Programmer's Manual}.
7093
7094 @item case_canon_table
7095 This field contains the conversion table for canonicalizing text for
7096 case-folding search.  @xref{Case Tables,,, lispref, XEmacs Lisp
7097 Programmer's Manual}.
7098
7099 @item case_eqv_table
7100 This field contains the equivalence table for case-folding search.
7101 @xref{Case Tables,,, lispref, XEmacs Lisp Programmer's Manual}.
7102
7103 @item display_table
7104 This field contains the buffer's display table, or @code{nil} if it
7105 doesn't have one.  @xref{Display Tables,,, lispref, XEmacs Lisp
7106 Programmer's Manual}.
7107
7108 @item markers
7109 This field contains the chain of all markers that currently point into
7110 the buffer.  Deletion of text in the buffer, and motion of the buffer's
7111 gap, must check each of these markers and perhaps update it.
7112 @xref{Markers,,, lispref, XEmacs Lisp Programmer's Manual}.
7113
7114 @item backed_up
7115 This field is a flag that tells whether a backup file has been made for
7116 the visited file of this buffer.
7117
7118 @item mark
7119 This field contains the mark for the buffer.  The mark is a marker,
7120 hence it is also included on the list @code{markers}.  @xref{The Mark,,,
7121 lispref, XEmacs Lisp Programmer's Manual}.
7122
7123 @item mark_active
7124 This field is non-@code{nil} if the buffer's mark is active.
7125
7126 @item local_var_alist
7127 This field contains the association list describing the variables local
7128 in this buffer, and their values, with the exception of local variables
7129 that have special slots in the buffer object.  (Those slots are omitted
7130 from this table.)  @xref{Buffer-Local Variables,,, lispref, XEmacs Lisp
7131 Programmer's Manual}.
7132
7133 @item modeline_format
7134 This field contains a Lisp object which controls how to display the mode
7135 line for this buffer.  @xref{Modeline Format,,, lispref, XEmacs Lisp
7136 Programmer's Manual}.
7137
7138 @item base_buffer
7139 This field holds the buffer's base buffer (if it is an indirect buffer),
7140 or @code{nil}.
7141 @end table
7142
7143 @node MULE Character Sets and Encodings, The Lisp Reader and Compiler, Buffers and Textual Representation, Top
7144 @chapter MULE Character Sets and Encodings
7145
7146   Recall that there are two primary ways that text is represented in
7147 XEmacs.  The @dfn{buffer} representation sees the text as a series of
7148 bytes (Bufbytes), with a variable number of bytes used per character.
7149 The @dfn{character} representation sees the text as a series of integers
7150 (Emchars), one per character.  The character representation is a cleaner
7151 representation from a theoretical standpoint, and is thus used in many
7152 cases when lots of manipulations on a string need to be done.  However,
7153 the buffer representation is the standard representation used in both
7154 Lisp strings and buffers, and because of this, it is the ``default''
7155 representation that text comes in.  The reason for using this
7156 representation is that it's compact and is compatible with ASCII.
7157
7158 @menu
7159 * Character Sets::
7160 * Encodings::
7161 * Internal Mule Encodings::
7162 * CCL::
7163 @end menu
7164
7165 @node Character Sets, Encodings, MULE Character Sets and Encodings, MULE Character Sets and Encodings
7166 @section Character Sets
7167
7168   A character set (or @dfn{charset}) is an ordered set of characters.  A
7169 particular character in a charset is indexed using one or more
7170 @dfn{position codes}, which are non-negative integers.  The number of
7171 position codes needed to identify a particular character in a charset is
7172 called the @dfn{dimension} of the charset.  In XEmacs/Mule, all charsets
7173 have dimension 1 or 2, and the size of all charsets (except for a few
7174 special cases) is either 94, 96, 94 by 94, or 96 by 96.  The range of
7175 position codes used to index characters from any of these types of
7176 character sets is as follows:
7177
7178 @example
7179 Charset type            Position code 1         Position code 2
7180 ------------------------------------------------------------
7181 94                      33 - 126                N/A
7182 96                      32 - 127                N/A
7183 94x94                   33 - 126                33 - 126
7184 96x96                   32 - 127                32 - 127
7185 @end example
7186
7187   Note that in the above cases position codes do not start at an
7188 expected value such as 0 or 1.  The reason for this will become clear
7189 later.
7190
7191   For example, Latin-1 is a 96-character charset, and JISX0208 (the
7192 Japanese national character set) is a 94x94-character charset.
7193
7194   [Note that, although the ranges above define the @emph{valid} position
7195 codes for a charset, some of the slots in a particular charset may in
7196 fact be empty.  This is the case for JISX0208, for example, where (e.g.)
7197 all the slots whose first position code is in the range 117 - 126 are
7198 empty.]
7199
7200   There are three charsets that do not follow the above rules.  All of
7201 them have one dimension, and have ranges of position codes as follows:
7202
7203 @example
7204 Charset name            Position code 1
7205 ------------------------------------
7206 ASCII                   0 - 127
7207 Control-1               0 - 31
7208 Composite               0 - some large number
7209 @end example
7210
7211   (The upper bound of the position code for composite characters has not
7212 yet been determined, but it will probably be at least 16,383.)
7213
7214   ASCII is the union of two subsidiary character sets: Printing-ASCII
7215 (the printing ASCII character set, consisting of position codes 33 -
7216 126, like for a standard 94-character charset) and Control-ASCII (the
7217 non-printing characters that would appear in a binary file with codes 0
7218 - 32 and 127).
7219
7220   Control-1 contains the non-printing characters that would appear in a
7221 binary file with codes 128 - 159.
7222
7223   Composite contains characters that are generated by overstriking one
7224 or more characters from other charsets.
7225
7226   Note that some characters in ASCII, and all characters in Control-1,
7227 are @dfn{control} (non-printing) characters.  These have no printed
7228 representation but instead control some other function of the printing
7229 (e.g. TAB, code 9, moves the current character position to the next tab
7230 stop).  All other characters in all charsets are @dfn{graphic}
7231 (printing) characters.
7232
7233   When a binary file is read in, the bytes in the file are assigned to
7234 character sets as follows:
7235
7236 @example
7237 Bytes           Character set           Range
7238 --------------------------------------------------
7239 0 - 127         ASCII                   0 - 127
7240 128 - 159       Control-1               0 - 31
7241 160 - 255       Latin-1                 32 - 127
7242 @end example
7243
7244   This is a bit ad-hoc but gets the job done.
7245
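  The byte-to-charset assignment in the table above amounts to a
three-way range test.  The following sketch uses hypothetical names
(@code{toy_bin_char}, @code{toy_classify_byte}); it is not taken from
the XEmacs sources.

```c
#include <assert.h>

/* Hypothetical tags for the three charsets used for binary files. */
enum toy_bin_charset { TOY_BIN_ASCII, TOY_BIN_CONTROL_1, TOY_BIN_LATIN_1 };

struct toy_bin_char {
    enum toy_bin_charset charset;
    int pc1;                      /* position code within the charset */
};

/* Map one byte of a binary file to a charset and position code. */
struct toy_bin_char toy_classify_byte(unsigned char byte)
{
    struct toy_bin_char c;
    if (byte <= 127) {            /* 0 - 127: ASCII, codes 0 - 127 */
        c.charset = TOY_BIN_ASCII;
        c.pc1 = byte;
    } else if (byte <= 159) {     /* 128 - 159: Control-1, codes 0 - 31 */
        c.charset = TOY_BIN_CONTROL_1;
        c.pc1 = byte - 128;
    } else {                      /* 160 - 255: Latin-1, codes 32 - 127 */
        c.charset = TOY_BIN_LATIN_1;
        c.pc1 = byte - 128;
    }
    return c;
}
```
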
7246 @node Encodings, Internal Mule Encodings, Character Sets, MULE Character Sets and Encodings
7247 @section Encodings
7248
7249   An @dfn{encoding} is a way of numerically representing characters from
7250 one or more character sets.  If an encoding only encompasses one
7251 character set, then the position codes for the characters in that
7252 character set could be used directly.  This is not possible, however, if
7253 more than one character set is to be used in the encoding.
7254
7255   For example, the conversion detailed above between bytes in a binary
7256 file and characters is effectively an encoding that encompasses the
7257 three character sets ASCII, Control-1, and Latin-1 in a stream of 8-bit
7258 bytes.
7259
7260   Thus, an encoding can be viewed as a way of encoding characters from a
7261 specified group of character sets using a stream of bytes, each of which
7262 contains a fixed number of bits (but not necessarily 8, as in the common
7263 usage of ``byte'').
7264
7265   Here are descriptions of a couple of common
7266 encodings:
7267
7268 @menu
7269 * Japanese EUC (Extended Unix Code)::
7270 * JIS7::
7271 @end menu
7272
7273 @node Japanese EUC (Extended Unix Code), JIS7, Encodings, Encodings
7274 @subsection Japanese EUC (Extended Unix Code)
7275
7276 This encompasses the character sets Printing-ASCII, Japanese-JISX0208,
7277 Japanese-JISX0212, and Japanese-JISX0201-Kana (half-width katakana, the
7278 right half of JISX0201).  It uses 8-bit bytes.
7279
7280 Note that Printing-ASCII and Japanese-JISX0201-Kana are 94-character
7281 charsets, while Japanese-JISX0208 is a 94x94-character charset.
7282
7283 The encoding is as follows:
7284
7285 @example
7286 Character set            Representation (PC=position-code)
7287 -------------            --------------
7288 Printing-ASCII           PC1
7289 Japanese-JISX0201-Kana   0x8E       | PC1 + 0x80
7290 Japanese-JISX0208        PC1 + 0x80 | PC2 + 0x80
7291 Japanese-JISX0212        0x8F       | PC1 + 0x80 | PC2 + 0x80
7292 @end example
7293
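  As a rough illustration of the table above, here is a sketch of an
encoder for the first three rows (Printing-ASCII,
Japanese-JISX0201-Kana and Japanese-JISX0208).  The function and
enumeration names are hypothetical, and Japanese-JISX0212 is
deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tags for the charsets handled by this sketch. */
enum toy_euc_charset {
    TOY_EUC_PRINTING_ASCII,
    TOY_EUC_JISX0201_KANA,
    TOY_EUC_JISX0208
};

/* Encode one character into out.  pc2 is ignored for the two
   94-character charsets.  Returns the number of bytes written
   (1 or 2), or 0 if a position code is outside 33 - 126. */
size_t toy_euc_encode(enum toy_euc_charset cs, int pc1, int pc2,
                      unsigned char out[2])
{
    if (pc1 < 33 || pc1 > 126)
        return 0;
    switch (cs) {
    case TOY_EUC_PRINTING_ASCII:
        out[0] = (unsigned char) pc1;           /* PC1 */
        return 1;
    case TOY_EUC_JISX0201_KANA:
        out[0] = 0x8E;                          /* 0x8E | PC1 + 0x80 */
        out[1] = (unsigned char) (pc1 + 0x80);
        return 2;
    case TOY_EUC_JISX0208:
        if (pc2 < 33 || pc2 > 126)
            return 0;
        out[0] = (unsigned char) (pc1 + 0x80);  /* PC1 + 0x80 */
        out[1] = (unsigned char) (pc2 + 0x80);  /* PC2 + 0x80 */
        return 2;
    }
    return 0;
}
```

  Because every non-ASCII byte has its high bit set, a decoder can tell
the charsets apart from the first byte alone, which is what makes this
encoding non-modal.
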
7294
7295 @node JIS7,  , Japanese EUC (Extended Unix Code), Encodings
7296 @subsection JIS7
7297
7298 This encompasses the character sets Printing-ASCII,
7299 Japanese-JISX0201-Roman (the left half of JISX0201; this character set
7300 is very similar to Printing-ASCII and is a 94-character charset),
7301 Japanese-JISX0208, and Japanese-JISX0201-Kana.  It uses 7-bit bytes.
7302
7303 Unlike Japanese EUC, this is a @dfn{modal} encoding, which
7304 means that there are multiple states that the encoding can
7305 be in, which affect how the bytes are to be interpreted.
7306 Special sequences of bytes (called @dfn{escape sequences})
7307 are used to change states.
7308
7309   The encoding is as follows:
7310
7311 @example
7312 Character set              Representation (PC=position-code)
7313 -------------              --------------
7314 Printing-ASCII             PC1
7315 Japanese-JISX0201-Roman    PC1
7316 Japanese-JISX0201-Kana     PC1
7317 Japanese-JISX0208          PC1 PC2
7318
7319
7320 Escape sequence   ASCII equivalent   Meaning
7321 ---------------   ----------------   -------
7322 0x1B 0x28 0x4A    ESC ( J            invoke Japanese-JISX0201-Roman
7323 0x1B 0x28 0x49    ESC ( I            invoke Japanese-JISX0201-Kana
7324 0x1B 0x24 0x42    ESC $ B            invoke Japanese-JISX0208
7325 0x1B 0x28 0x42    ESC ( B            invoke Printing-ASCII
7326 @end example
7327
7328   Initially, Printing-ASCII is invoked.
7329
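  The modal behavior can be made concrete with a small decoder sketch.
All names here are hypothetical.  The sketch recognizes only the four
escape sequences listed above, treats every other byte as a position
code in the currently invoked charset, and makes no attempt to handle
control characters such as newlines.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tags for the four charsets of the JIS7 table. */
enum toy_jis_charset {
    TOY_JIS_ASCII, TOY_JIS_ROMAN, TOY_JIS_KANA, TOY_JIS_X0208
};

struct toy_jis_char {
    enum toy_jis_charset charset;
    int pc1;
    int pc2;                     /* 0 for the 94-character charsets */
};

/* Decode len bytes of JIS7 text.  Returns the number of characters
   stored in out, or -1 on a truncated or unknown escape sequence, a
   truncated two-byte character, or a full output array. */
int toy_jis7_decode(const unsigned char *in, size_t len,
                    struct toy_jis_char *out, size_t max_out)
{
    enum toy_jis_charset state = TOY_JIS_ASCII;   /* initial state */
    size_t i = 0, n = 0;

    while (i < len) {
        if (in[i] == 0x1B) {                      /* escape sequence */
            if (i + 2 >= len)
                return -1;
            unsigned char a = in[i + 1], b = in[i + 2];
            if (a == 0x28 && b == 0x4A)      state = TOY_JIS_ROMAN;
            else if (a == 0x28 && b == 0x49) state = TOY_JIS_KANA;
            else if (a == 0x24 && b == 0x42) state = TOY_JIS_X0208;
            else if (a == 0x28 && b == 0x42) state = TOY_JIS_ASCII;
            else
                return -1;
            i += 3;
            continue;
        }
        if (n == max_out)
            return -1;
        out[n].charset = state;
        out[n].pc1 = in[i++];
        out[n].pc2 = 0;
        if (state == TOY_JIS_X0208) {             /* two position codes */
            if (i >= len)
                return -1;
            out[n].pc2 = in[i++];
        }
        n++;
    }
    return (int) n;
}
```

  The same byte value decodes differently depending on the state, so a
decoder cannot start in the middle of a stream without first finding
the most recent escape sequence.
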
7330 @node Internal Mule Encodings, CCL, Encodings, MULE Character Sets and Encodings
7331 @section Internal Mule Encodings
7332
7333 In XEmacs/Mule, each character set is assigned a unique number, called a
7334 @dfn{leading byte}.  This is used in the encodings of a character.
7335 Leading bytes are in the range 0x80 - 0xFF (except for ASCII, which has
7336 a leading byte of 0), although some leading bytes are reserved.
7337
7338 Charsets whose leading byte is in the range 0x80 - 0x9F are called
7339 @dfn{official} and are used for built-in charsets.  Other charsets are
7340 called @dfn{private} and have leading bytes in the range 0xA0 - 0xFF;
7341 these are user-defined charsets.
7342
7343   More specifically:
7344
7345 @example
7346 Character set           Leading byte
7347 -------------           ------------
7348 ASCII                   0
7349 Composite               0x80
7350 Dimension-1 Official    0x81 - 0x8D
7351                           (0x8E is free)
7352 Control-1               0x8F
7353 Dimension-2 Official    0x90 - 0x99
7354                           (0x9A - 0x9D are free;
7355                            0x9E and 0x9F are reserved)
7356 Dimension-1 Private     0xA0 - 0xEF
7357 Dimension-2 Private     0xF0 - 0xFF
7358 @end example
7359
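  The leading-byte ranges in the table above can be expressed as a
simple classification function.  The enumeration and function names
below are hypothetical and serve only to restate the table in code.

```c
#include <assert.h>

/* Hypothetical categories mirroring the rows of the table. */
enum toy_lb_kind {
    TOY_LB_ASCII,
    TOY_LB_COMPOSITE,
    TOY_LB_DIM1_OFFICIAL,
    TOY_LB_CONTROL_1,
    TOY_LB_DIM2_OFFICIAL,
    TOY_LB_DIM1_PRIVATE,
    TOY_LB_DIM2_PRIVATE,
    TOY_LB_UNASSIGNED      /* free, reserved, or out of range */
};

/* Classify a leading byte according to the table. */
enum toy_lb_kind toy_classify_leading_byte(int lb)
{
    if (lb == 0)                  return TOY_LB_ASCII;
    if (lb == 0x80)               return TOY_LB_COMPOSITE;
    if (lb >= 0x81 && lb <= 0x8D) return TOY_LB_DIM1_OFFICIAL;
    if (lb == 0x8F)               return TOY_LB_CONTROL_1;
    if (lb >= 0x90 && lb <= 0x99) return TOY_LB_DIM2_OFFICIAL;
    if (lb >= 0xA0 && lb <= 0xEF) return TOY_LB_DIM1_PRIVATE;
    if (lb >= 0xF0 && lb <= 0xFF) return TOY_LB_DIM2_PRIVATE;
    /* 0x8E and 0x9A - 0x9D are free; 0x9E and 0x9F are reserved. */
    return TOY_LB_UNASSIGNED;
}

/* Official leading bytes lie in 0x80 - 0x9F, private ones above. */
int toy_lb_is_official(int lb)
{
    return lb >= 0x80 && lb <= 0x9F;
}
```
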
7360 There are two internal encodings for characters in XEmacs/Mule.  One is
7361 called @dfn{string encoding} and is an 8-bit encoding that is used for
7362 representing characters in a buffer or string.  It uses 1 to 4 bytes per
7363 character.  The other is called @dfn{character encoding} and is a 19-bit
7364 encoding that is used for representing characters individually in a
7365 variable.
7366
7367 (In the following descriptions, we'll ignore composite characters for
7368 the moment.  We also give a general (structural) overview first,
7369 followed later by the exact details.)
7370
7371 @menu
7372 * Internal String Encoding::
7373 * Internal Character Encoding::
7374 @end menu
7375
7376 @node Internal String Encoding, Internal Character Encoding, Internal Mule Encodings, Internal Mule Encodings
7377 @subsection Internal String Encoding
7378
7379 ASCII characters are encoded using their position code directly.  Other
7380 characters are encoded using their leading byte followed by their
7381 position code(s) with the high bit set.  Characters in private character
7382 sets have their leading byte prefixed with a @dfn{leading byte prefix},
7383 which is either 0x9E or 0x9F. (No character sets are ever assigned these
7384 leading bytes.) Specifically:
7385
7386 @example
7387 Character set           Encoding (PC=position-code, LB=leading-byte)
7388 -------------           --------
7389 ASCII                   PC1  |
7390 Control-1               LB   |  PC1 + 0xA0 |
7391 Dimension-1 official    LB   |  PC1 + 0x80 |
7392 Dimension-1 private     0x9E |  LB         | PC1 + 0x80 |
7393 Dimension-2 official    LB   |  PC1 + 0x80 | PC2 + 0x80 |
7394 Dimension-2 private     0x9F |  LB         | PC1 + 0x80 | PC2 + 0x80
7395 @end example
7396
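  The table above translates into the following encoder sketch.  The
function name is hypothetical, composite characters are ignored, and
leading bytes that the earlier table marks as free or reserved are
rejected.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char Bufbyte;

/* Encode one character, given its charset's leading byte (0 for
   ASCII) and position codes, into out.  pc2 is ignored for
   dimension-1 charsets.  Returns the number of bytes written
   (1 to 4), or 0 if the arguments are not valid. */
size_t toy_string_encode(int lb, int pc1, int pc2, Bufbyte out[4])
{
    size_t n = 0;
    int dim;

    if (lb == 0) {                               /* ASCII: PC1 */
        if (pc1 < 0 || pc1 > 127)
            return 0;
        out[0] = (Bufbyte) pc1;
        return 1;
    }
    if (lb == 0x8F) {                            /* Control-1 */
        if (pc1 < 0 || pc1 > 31)
            return 0;
        out[0] = 0x8F;
        out[1] = (Bufbyte) (pc1 + 0xA0);
        return 2;
    }
    if (lb >= 0x81 && lb <= 0x8D) {              /* dim-1 official */
        dim = 1;
    } else if (lb >= 0x90 && lb <= 0x99) {       /* dim-2 official */
        dim = 2;
    } else if (lb >= 0xA0 && lb <= 0xEF) {       /* dim-1 private */
        dim = 1;
        out[n++] = 0x9E;                         /* leading byte prefix */
    } else if (lb >= 0xF0 && lb <= 0xFF) {       /* dim-2 private */
        dim = 2;
        out[n++] = 0x9F;                         /* leading byte prefix */
    } else {
        return 0;                                /* not handled here */
    }
    if (pc1 < 32 || pc1 > 127)
        return 0;
    out[n++] = (Bufbyte) lb;
    out[n++] = (Bufbyte) (pc1 + 0x80);
    if (dim == 2) {
        if (pc2 < 32 || pc2 > 127)
            return 0;
        out[n++] = (Bufbyte) (pc2 + 0x80);
    }
    return n;
}
```
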
7397   The basic characteristic of this encoding is that the first byte
7398 of all characters is in the range 0x00 - 0x9F, and the second and
7399 following bytes of all characters are in the range 0xA0 - 0xFF.
7400 This means that it is impossible to get out of sync, or more
7401 specifically:
7402
7403 @enumerate
7404 @item
7405 Given any byte position, the beginning of the character it is
7406 within can be determined in constant time.
7407 @item
7408 Given any byte position at the beginning of a character, the
7409 beginning of the next character can be determined in constant
7410 time.
7411 @item
7412 Given any byte position at the beginning of a character, the
7413 beginning of the previous character can be determined in constant
7414 time.
7415 @item
7416 Textual searches can simply treat encoded strings as if they
7417 were encoded in a one-byte-per-character fashion rather than
7418 the actual multi-byte encoding.
7419 @end enumerate
7420
7421   None of the standard non-modal encodings meet all of these
7422 conditions.  For example, EUC satisfies only (2) and (3), while
7423 Shift-JIS and Big5 (not yet described) satisfy only (2). (All
7424 non-modal encodings must satisfy (2), in order to be unambiguous.)
7425
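  Properties (1) through (3) follow directly from the split between
first bytes (0x00 - 0x9F) and following bytes (0xA0 - 0xFF).  The
sketch below, with hypothetical function names and 0-based byte
offsets, shows how each can be implemented.  Since a character occupies
at most four bytes, each loop runs at most three times.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char Bufbyte;

/* A byte starts a character exactly when it is below 0xA0. */
int toy_is_first_byte(Bufbyte b)
{
    return b < 0xA0;
}

/* Property (1): from any byte offset, find the start of the
   character that contains it. */
size_t toy_char_start(const Bufbyte *s, size_t i)
{
    while (i > 0 && !toy_is_first_byte(s[i]))
        i--;
    return i;
}

/* Property (2): from the start of a character, find the start of the
   next one (or len if this is the last character). */
size_t toy_next_char(const Bufbyte *s, size_t len, size_t i)
{
    assert(i < len && toy_is_first_byte(s[i]));
    i++;
    while (i < len && !toy_is_first_byte(s[i]))
        i++;
    return i;
}

/* Property (3): from the start of a character (or from the end of
   the text), find the start of the previous character. */
size_t toy_prev_char(const Bufbyte *s, size_t i)
{
    assert(i > 0);
    return toy_char_start(s, i - 1);
}

/* Counting characters only requires counting first bytes. */
size_t toy_char_count(const Bufbyte *s, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (toy_is_first_byte(s[i]))
            n++;
    return n;
}
```
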
7426 @node Internal Character Encoding,  , Internal String Encoding, Internal Mule Encodings
7427 @subsection Internal Character Encoding
7428
7429   One 19-bit word represents a single character.  The word is
7430 separated into three fields:
7431
7432 @example
7433 Bit number:     18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
7434                 <------------> <------------------> <------------------>
7435 Field:                1                  2                    3
7436 @end example
7437
7438   Note that fields 2 and 3 hold 7 bits each, while field 1 holds 5 bits.
7439
7440 @example
7441 Character set           Field 1         Field 2         Field 3
7442 -------------           -------         -------         -------
7443 ASCII                      0               0              PC1
7444    range:                                                   (00 - 7F)
7445 Control-1                  0               1              PC1
7446    range:                                                   (00 - 1F)
7447 Dimension-1 official       0            LB - 0x80         PC1
7448    range:                                    (01 - 0D)      (20 - 7F)
7449 Dimension-1 private        0            LB - 0x80         PC1
7450    range:                                    (20 - 6F)      (20 - 7F)
7451 Dimension-2 official    LB - 0x8F         PC1             PC2
7452    range:                    (01 - 0A)       (20 - 7F)      (20 - 7F)
7453 Dimension-2 private     LB - 0xE1         PC1             PC2
7454    range:                    (0F - 1E)       (20 - 7F)      (20 - 7F)
7455 Composite                 0x1F             ?               ?
7456 @end example
7457
7458   Note that character codes 0 - 255 are the same as the ``binary encoding''
7459 described above.
7460
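  The field layout and the table above can be restated as a short
sketch.  The function names are hypothetical, composite characters are
not handled, and the arguments are assumed to be within the ranges
given in the table.

```c
#include <assert.h>

typedef int Emchar;

/* Pack the three fields: 5 bits, then 7 bits, then 7 bits. */
Emchar toy_make_emchar(int field1, int field2, int field3)
{
    return (field1 << 14) | (field2 << 7) | field3;
}

int toy_emchar_field1(Emchar c) { return (c >> 14) & 0x1F; }
int toy_emchar_field2(Emchar c) { return (c >> 7) & 0x7F; }
int toy_emchar_field3(Emchar c) { return c & 0x7F; }

/* Build an Emchar from a leading byte (0 for ASCII) and position
   codes, following the table.  Returns -1 for leading bytes that the
   table does not cover. */
Emchar toy_emchar_from_charset(int lb, int pc1, int pc2)
{
    if (lb == 0)                                  /* ASCII */
        return toy_make_emchar(0, 0, pc1);
    if (lb == 0x8F)                               /* Control-1 */
        return toy_make_emchar(0, 1, pc1);
    if ((lb >= 0x81 && lb <= 0x8D)                /* dim-1 official */
        || (lb >= 0xA0 && lb <= 0xEF))            /* dim-1 private */
        return toy_make_emchar(0, lb - 0x80, pc1);
    if (lb >= 0x90 && lb <= 0x99)                 /* dim-2 official */
        return toy_make_emchar(lb - 0x8F, pc1, pc2);
    if (lb >= 0xF0 && lb <= 0xFF)                 /* dim-2 private */
        return toy_make_emchar(lb - 0xE1, pc1, pc2);
    return -1;
}
```

  For instance, a Control-1 character with position code 5 comes out as
0x85, and a character with leading byte 0x81 and position code 0x69
comes out as 0xE9, both within the 0 - 255 range mentioned above.
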
7461 @node CCL,  , Internal Mule Encodings, MULE Character Sets and Encodings
7462 @section CCL
7463
7464 @example
7465 CCL PROGRAM SYNTAX:
7466      CCL_PROGRAM := (CCL_MAIN_BLOCK
7467                      [ CCL_EOF_BLOCK ])
7468
7469      CCL_MAIN_BLOCK := CCL_BLOCK
7470      CCL_EOF_BLOCK := CCL_BLOCK
7471
7472      CCL_BLOCK := STATEMENT | (STATEMENT [STATEMENT ...])
7473      STATEMENT :=
7474              SET | IF | BRANCH | LOOP | REPEAT | BREAK
7475              | READ | WRITE
7476
7477      SET := (REG = EXPRESSION) | (REG SELF_OP EXPRESSION)
7478             | INT-OR-CHAR
7479
7480      EXPRESSION := ARG | (EXPRESSION OP ARG)
7481
7482      IF := (if EXPRESSION CCL_BLOCK CCL_BLOCK)
7483      BRANCH := (branch EXPRESSION CCL_BLOCK [CCL_BLOCK ...])
7484      LOOP := (loop STATEMENT [STATEMENT ...])
7485      BREAK := (break)
7486      REPEAT := (repeat)
7487              | (write-repeat [REG | INT-OR-CHAR | string])
7488              | (write-read-repeat REG [INT-OR-CHAR | string | ARRAY]?)
7489      READ := (read REG) | (read REG REG)
7490              | (read-if REG ARITH_OP ARG CCL_BLOCK CCL_BLOCK)
7491              | (read-branch REG CCL_BLOCK [CCL_BLOCK ...])
7492      WRITE := (write REG) | (write REG REG)
7493              | (write INT-OR-CHAR) | (write STRING) | STRING
7494              | (write REG ARRAY)
7495      END := (end)
7496
7497      REG := r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7
7498      ARG := REG | INT-OR-CHAR
7499      OP :=   + | - | * | / | % | & | '|' | ^ | << | >> | <8 | >8 | //
7500              | < | > | == | <= | >= | !=
7501      SELF_OP :=
7502              += | -= | *= | /= | %= | &= | '|=' | ^= | <<= | >>=
7503      ARRAY := '[' INT-OR-CHAR ... ']'
7504      INT-OR-CHAR := INT | CHAR
7505
7506 MACHINE CODE:
7507
7508 The machine code consists of a vector of 32-bit words.
7509 The first such word specifies the start of the EOF section of the code;
7510 this is the code executed to handle any stuff that needs to be done
7511 (e.g. designating back to ASCII and left-to-right mode) after all
7512 other encoded/decoded data has been written out.  This is not used for
7513 charset CCL programs.
7514
7515 REGISTER: 0..7  -- referred to by RRR or rrr
7516
7517 OPERATOR BIT FIELD (27-bit): XXXXXXXXXXXXXXX RRR TTTTT
7518         TTTTT (5-bit): operator type
7519         RRR (3-bit): register number
7520         XXXXXXXXXXXXXXX (15-bit):
7521                 CCCCCCCCCCCCCCC: constant or address
7522                 000000000000rrr: register number
7523
7524 AAAAA:  00000 +
7525         00001 -
7526         00010 *
7527         00011 /
7528         00100 %
7529         00101 &
7530         00110 |
7531         00111 ^
7532
7533         01000 <<
7534         01001 >>
7535         01010 <8
7536         01011 >8
7537         01100 //
7538         01101 not used
7539         01110 not used
7540         01111 not used
7541
7542         10000 <
7543         10001 >
7544         10010 ==
7545         10011 <=
7546         10100 >=
7547         10101 !=
7548
7549 OPERATORS:      TTTTT RRR XX..
7550
7551 SetCS:          00000 RRR C...C      RRR = C...C
7552 SetCL:          00001 RRR .....      RRR = c...c
7553                 c.............c
7554 SetR:           00010 RRR ..rrr      RRR = rrr
7555 SetA:           00011 RRR ..rrr      RRR = array[rrr]
7556                 C.............C      size of array = C...C
7557                 c.............c      contents = c...c
7558
7559 Jump:           00100 000 c...c      jump to c...c
7560 JumpCond:       00101 RRR c...c      if (!RRR) jump to c...c
7561 WriteJump:      00110 RRR c...c      Write1 RRR, jump to c...c
7562 WriteReadJump:  00111 RRR c...c      Write1, Read1 RRR, jump to c...c
7563 WriteCJump:     01000 000 c...c      Write1 C...C, jump to c...c
7564                 C...C
7565 WriteCReadJump: 01001 RRR c...c      Write1 C...C, Read1 RRR,
7566                 C.............C      and jump to c...c
7567 WriteSJump:     01010 000 c...c      WriteS, jump to c...c
7568                 C.............C
7569                 S.............S
7570                 ...
7571 WriteSReadJump: 01011 RRR c...c      WriteS, Read1 RRR, jump to c...c
7572                 C.............C
7573                 S.............S
7574                 ...
7575 WriteAReadJump: 01100 RRR c...c      WriteA, Read1 RRR, jump to c...c
7576                 C.............C      size of array = C...C
7577                 c.............c      contents = c...c
7578                 ...
7579 Branch:         01101 RRR C...C      if (RRR >= 0 && RRR < C..)
7580                 c.............c      branch to (RRR+1)th address
7581 Read1:          01110 RRR ...        read 1-byte to RRR
7582 Read2:          01111 RRR ..rrr      read 2-byte to RRR and rrr
7583 ReadBranch:     10000 RRR C...C      Read1 and Branch
7584                 c.............c
7585                 ...
7586 Write1:         10001 RRR .....      write 1-byte RRR
7587 Write2:         10010 RRR ..rrr      write 2-byte RRR and rrr
7588 WriteC:         10011 000 .....      write 1-char C...C
7589                 C.............C
7590 WriteS:         10100 000 .....      write C..-byte of string
7591                 C.............C
7592                 S.............S
7593                 ...
7594 WriteA:         10101 RRR .....      write array[RRR]
7595                 C.............C      size of array = C...C
7596                 c.............c      contents = c...c
7597                 ...
7598 End:            10110 000 .....      terminate the execution
7599
7600 SetSelfCS:      10111 RRR C...C      RRR AAAAA= C...C
7601                 ..........AAAAA
7602 SetSelfCL:      11000 RRR .....      RRR AAAAA= c...c
7603                 c.............c
7604                 ..........AAAAA
7605 SetSelfR:       11001 RRR ..Rrr      RRR AAAAA= rrr
7606                 ..........AAAAA
7607 SetExprCL:      11010 RRR ..Rrr      RRR = rrr AAAAA c...c
7608                 c.............c
7609                 ..........AAAAA
7610 SetExprR:       11011 RRR ..rrr      RRR = rrr AAAAA Rrr
7611                 ............Rrr
7612                 ..........AAAAA
7613 JumpCondC:      11100 RRR c...c      if !(RRR AAAAA C..) jump to c...c
7614                 C.............C
7615                 ..........AAAAA
7616 JumpCondR:      11101 RRR c...c      if !(RRR AAAAA rrr) jump to c...c
7617                 ............rrr
7618                 ..........AAAAA
7619 ReadJumpCondC:  11110 RRR c...c      Read1 and JumpCondC
7620                 C.............C
7621                 ..........AAAAA
7622 ReadJumpCondR:  11111 RRR c...c      Read1 and JumpCondR
7623                 ............rrr
7624                 ..........AAAAA
7625 @end example
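
The operator word layout above can be unpacked with plain shifts and
masks.  The following is a minimal sketch in C, assuming the layout
shown in the table (operator type in the low 5 bits, register number in
the next 3 bits, and the constant, address, or second register in the
bits above them).  The names @code{ccl_fields}, @code{ccl_decode_word}
and @code{ccl_encode_word} are invented for this illustration and are
not taken from the XEmacs sources.

@verbatim
```c
#include <stdint.h>

/* Hypothetical container for the three fields of one CCL operator word. */
struct ccl_fields {
    unsigned type;  /* TTTTT: operator type, 5 bits */
    unsigned reg;   /* RRR: register number, 3 bits */
    uint32_t x;     /* XX..: constant, address, or second register */
};

/* Split a word laid out as "XX.. RRR TTTTT" (most significant bits first). */
struct ccl_fields ccl_decode_word(uint32_t word)
{
    struct ccl_fields f;
    f.type = word & 0x1Fu;         /* low 5 bits */
    f.reg  = (word >> 5) & 0x7u;   /* next 3 bits */
    f.x    = word >> 8;            /* everything above */
    return f;
}

/* Build a word from its fields; the inverse of ccl_decode_word(). */
uint32_t ccl_encode_word(unsigned type, unsigned reg, uint32_t x)
{
    return (x << 8) | ((uint32_t) (reg & 0x7u) << 5) | (type & 0x1Fu);
}
```
@end verbatim

For instance, under these assumptions a @code{SetR} operator (type
00010) with RRR = 3 and rrr = 5 stands for ``r3 = r5'', and decoding the
encoded word gives back those three values.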
7626
7627 @node The Lisp Reader and Compiler, Lstreams, MULE Character Sets and Encodings, Top
7628 @chapter The Lisp Reader and Compiler
7629
7630 Not yet documented.
7631
7632 @node Lstreams, Consoles; Devices; Frames; Windows, The Lisp Reader and Compiler, Top
7633 @chapter Lstreams
7634
7635   An @dfn{lstream} is an internal Lisp object that provides a generic
7636 buffering stream implementation.  Conceptually, you send data to the
7637 stream or read data from the stream, not caring what's on the other end
7638 of the stream.  The other end could be another stream, a file
7639 descriptor, a stdio stream, a fixed block of memory, a reallocating
7640 block of memory, etc.  The main purpose of the stream is to provide a
7641 standard interface and to do buffering.  Macros are defined to read or
7642 write characters, so the calling functions do not have to worry about
7643 blocking data together in order to achieve efficiency.
7644
7645 @menu
7646 * Creating an Lstream::         Creating an lstream object.
7647 * Lstream Types::               Different sorts of things that are streamed.
7648 * Lstream Functions::           Functions for working with lstreams.
7649 * Lstream Methods::             Creating new lstream types.
7650 @end menu
7651
7652 @node Creating an Lstream, Lstream Types, Lstreams, Lstreams
7653 @section Creating an Lstream
7654
7655 Lstreams come in different types, depending on what is being interfaced
7656 to.  Although the primitive for creating new lstreams is
7657 @code{Lstream_new()}, generally you do not call this directly.  Instead,
7658 you call some type-specific creation function, which creates the lstream
7659 and initializes it as appropriate for the particular type.
7660
7661 All lstream creation functions take a @var{mode} argument, specifying
7662 what mode the lstream should be opened as.  This controls whether the
7663 lstream is for input or output, and optionally whether data should be
7664 blocked up in units of MULE characters.  Note that some types of
7665 lstreams can only be opened for input; others only for output; and
7666 others can be opened either way.  #### Richard Mlynarik thinks that
7667 there should be a strict separation between input and output streams,
7668 and he's probably right.
7669
7670   @var{mode} is a string, one of
7671
7672 @table @code
7673 @item "r"
7674   Open for reading.
7675 @item "w"
7676   Open for writing.
7677 @item "rc"
7678   Open for reading, but ``read'' never returns partial MULE characters.
7679 @item "wc"
7680   Open for writing, but never writes partial MULE characters.
7681 @end table
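
To make the four mode strings concrete, here is a small sketch in C that
maps each of them to a set of flag bits.  The flag names and the
function @code{parse_lstream_mode} are hypothetical and exist only for
this illustration; the real lstream code may represent modes quite
differently.

@verbatim
```c
#include <string.h>

/* Hypothetical flag bits, one per property named in the table above. */
enum {
    MODE_READ  = 1,  /* "r": open for reading */
    MODE_WRITE = 2,  /* "w": open for writing */
    MODE_CHARS = 4   /* "c" suffix: never split a MULE character */
};

/* Translate one of "r", "w", "rc", "wc" into flag bits.
   Returns -1 for any other string. */
int parse_lstream_mode(const char *mode)
{
    if (strcmp(mode, "r") == 0)  return MODE_READ;
    if (strcmp(mode, "w") == 0)  return MODE_WRITE;
    if (strcmp(mode, "rc") == 0) return MODE_READ | MODE_CHARS;
    if (strcmp(mode, "wc") == 0) return MODE_WRITE | MODE_CHARS;
    return -1;
}
```
@end verbatim

Note that this sketch rejects a combined string such as @code{"rw"},
since only the four strings listed above are described here.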
7682
7683 @node Lstream Types, Lstream Functions, Creating an Lstream, Lstreams
7684 @section Lstream Types
7685
7686 @table @asis
7687 @item stdio
7688
7689 @item filedesc
7690
7691 @item lisp-string
7692
7693 @item fixed-buffer
7694
7695 @item resizing-buffer
7696
7697 @item dynarr
7698
7699 @item lisp-buffer
7700
7701 @item print
7702
7703 @item decoding
7704
7705 @item encoding
7706 @end table
7707
7708 @node Lstream Functions, Lstream Methods, Lstream Types, Lstreams
7709 @section Lstream Functions
7710
7711 @deftypefun {Lstream *} Lstream_new (Lstream_implementation *@var{imp}, const char *@var{mode})
7712 Allocate and return a new Lstream.  This function is not really meant to
7713 be called directly; rather, each stream type should provide its own
7714 stream creation function, which creates the stream and does any other
7715 necessary creation stuff (e.g. opening a file).
7716 @end deftypefun
7717
7718 @deftypefun void Lstream_set_buffering (Lstream *@var{lstr}, Lstream_buffering @var{buffering}, int @var{buffering_size})
7719 Change the buffering of a stream.  See @file{lstream.h}.  By default the
7720 buffering is @code{LSTREAM_BLOCK_BUFFERED}.
7721 @end deftypefun
7722
7723 @deftypefun int Lstream_flush (Lstream *@var{lstr})
7724 Flush out any pending unwritten data in the stream.  Clear any buffered
7725 input data.  Returns 0 on success, -1 on error.
7726 @end deftypefun
7727
7728 @deftypefn Macro int Lstream_putc (Lstream *@var{stream}, int @var{c})
7729 Write out one byte to the stream.  This is a macro and so it is very
7730 efficient.  The @var{c} argument is only evaluated once but the @var{stream}
7731 argument is evaluated more than once.  Returns 0 on success, -1 on
7732 error.
7733 @end deftypefn
7734
7735 @deftypefn Macro int Lstream_getc (Lstream *@var{stream})
7736 Read one byte from the stream.  This is a macro and so it is very
7737 efficient.  The @var{stream} argument is evaluated more than once.  Return
7738 value is -1 for EOF or error.
7739 @end deftypefn
7740
7741 @deftypefn Macro void Lstream_ungetc (Lstream *@var{stream}, int @var{c})
7742 Push one byte back onto the input queue.  This will be the next byte
7743 read from the stream.  Any number of bytes can be pushed back and will
7744 be read in the reverse order they were pushed back---most recent
7745 first. (This is necessary for consistency---if there are a number of
7746 bytes that have been unread and I read and unread a byte, it needs to be
7747 the first to be read again.) This is a macro and so it is very
7748 efficient.  The @var{c} argument is only evaluated once but the @var{stream}
7749 argument is evaluated more than once.
7750 @end deftypefn
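
The push-back ordering described above (most recent first) is simply a
stack.  The following toy model in C shows the behavior; it is not the
lstream implementation, and @code{toy_input}, @code{toy_ungetc} and
@code{toy_getc} are names made up for this sketch.

@verbatim
```c
#include <stddef.h>

/* Toy input source: a NUL-terminated string plus a small pushback stack. */
struct toy_input {
    const char *data;         /* remaining unread input */
    unsigned char unget[16];  /* pushed-back bytes */
    size_t n_unget;           /* number of bytes on the stack */
};

/* Push one byte back; silently ignored if the toy stack is full. */
void toy_ungetc(struct toy_input *in, int c)
{
    if (in->n_unget < sizeof in->unget)
        in->unget[in->n_unget++] = (unsigned char) c;
}

/* Read one byte: pushed-back bytes come first, most recent first,
   then the underlying data.  Returns -1 at end of input. */
int toy_getc(struct toy_input *in)
{
    if (in->n_unget > 0)
        return in->unget[--in->n_unget];
    if (*in->data == '\0')
        return -1;
    return (unsigned char) *in->data++;
}
```
@end verbatim

Pushing back @code{'a'} and then @code{'b'} onto a source holding
@code{"xy"} yields the bytes b, a, x, y in that order.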
7751
7752 @deftypefun int Lstream_fputc (Lstream *@var{stream}, int @var{c})
7753 @deftypefunx int Lstream_fgetc (Lstream *@var{stream})
7754 @deftypefunx void Lstream_fungetc (Lstream *@var{stream}, int @var{c})
7755 Function equivalents of the above macros.
7756 @end deftypefun
7757
7758 @deftypefun ssize_t Lstream_read (Lstream *@var{stream}, void *@var{data}, size_t @var{size})
7759 Read @var{size} bytes of @var{data} from the stream.  Return the number
7760 of bytes read.  0 means EOF. -1 means an error occurred and no bytes
7761 were read.
7762 @end deftypefun
7763
7764 @deftypefun ssize_t Lstream_write (Lstream *@var{stream}, void *@var{data}, size_t @var{size})
7765 Write @var{size} bytes of @var{data} to the stream.  Return the number
7766 of bytes written.  -1 means an error occurred and no bytes were written.
7767 @end deftypefun
7768
7769 @deftypefun void Lstream_unread (Lstream *@var{stream}, void *@var{data}, size_t @var{size})
7770 Push back @var{size} bytes of @var{data} onto the input queue.  The next
7771 call to @code{Lstream_read()} with the same size will read the same
7772 bytes back.  Note that this will be the case even if there is other
7773 pending unread data.
7774 @end deftypefun
7775
7776 @deftypefun int Lstream_close (Lstream *@var{stream})
7777 Close the stream.  All data will be flushed out.
7778 @end deftypefun
7779
7780 @deftypefun void Lstream_reopen (Lstream *@var{stream})
7781 Reopen a closed stream.  This enables I/O on it again.  This is not
7782 meant to be called except from a wrapper routine that reinitializes
7783 variables and such---the close routine may well have freed some
7784 necessary storage structures, for example.
7785 @end deftypefun
7786
7787 @deftypefun void Lstream_rewind (Lstream *@var{stream})
7788 Rewind the stream to the beginning.
7789 @end deftypefun
7790
7791 @node Lstream Methods,  , Lstream Functions, Lstreams
7792 @section Lstream Methods
7793
7794 @deftypefn {Lstream Method} ssize_t reader (Lstream *@var{stream}, unsigned char *@var{data}, size_t @var{size})
7795 Read some data from the stream's end and store it into @var{data}, which
7796 can hold @var{size} bytes.  Return the number of bytes read.  A return
7797 value of 0 means no bytes can be read at this time.  This may be because
7798 of an EOF, or because there is a granularity greater than one byte that
7799 the stream imposes on the returned data, and @var{size} is less than
7800 this granularity. (This will happen frequently for streams that need to
7801 return whole characters, because @code{Lstream_read()} calls the reader
7802 function repeatedly until it has the number of bytes it wants or until 0
7803 is returned.)  The lstream functions do not treat a 0 return as EOF or
7804 do anything special; however, the calling function will interpret any 0
7805 it gets back as EOF.  This will normally not happen unless the caller
7806 calls @code{Lstream_read()} with a very small size.
7807
7808 This function can be @code{NULL} if the stream is output-only.
7809 @end deftypefn
7810
7811 @deftypefn {Lstream Method} ssize_t writer (Lstream *@var{stream}, const unsigned char *@var{data}, size_t @var{size})
7812 Send some data to the stream's end.  Data to be sent is in @var{data}
7813 and is @var{size} bytes.  Return the number of bytes sent.  This
7814 function can send and return fewer bytes than is passed in; in that
7815 case, the function will just be called again until there is no data left
7816 or 0 is returned.  A return value of 0 means that no more data can be
7817 currently stored, but there is no error; the data will be squirreled
7818 away until the writer can accept data. (This is useful, e.g., if you're
7819 dealing with a non-blocking file descriptor and are getting
7820 @code{EWOULDBLOCK} errors.)  This function can be @code{NULL} if the
7821 stream is input-only.
7822 @end deftypefn
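
The calling convention for the writer method (call again until no data
is left or 0 is returned) can be sketched as a simple loop.  This is a
toy model in C under stated assumptions: @code{toy_sink},
@code{toy_writer} and @code{toy_write_all} are hypothetical names, and
the sketch stops on a 0 return instead of buffering the remainder as the
real code is described as doing.

@verbatim
```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>  /* ssize_t (POSIX) */

/* Toy sink: holds up to 64 bytes and takes at most `chunk` per call. */
struct toy_sink {
    unsigned char buf[64];
    size_t used;
    size_t chunk;
};

/* Writer-style method: may accept fewer bytes than offered;
   a return of 0 means "cannot take more right now". */
ssize_t toy_writer(struct toy_sink *s, const unsigned char *data, size_t size)
{
    size_t room = sizeof s->buf - s->used;
    size_t n = size;
    if (n > s->chunk) n = s->chunk;
    if (n > room) n = room;
    memcpy(s->buf + s->used, data, n);
    s->used += n;
    return (ssize_t) n;
}

/* Driver loop: keep calling the writer until all data is accepted,
   it returns 0, or it reports an error.  Returns the number of bytes
   accepted, or -1 if an error occurred before any were accepted. */
ssize_t toy_write_all(struct toy_sink *s, const unsigned char *data, size_t size)
{
    size_t sent = 0;
    while (sent < size) {
        ssize_t n = toy_writer(s, data + sent, size - sent);
        if (n < 0)
            return sent > 0 ? (ssize_t) sent : -1;
        if (n == 0)
            break;  /* sink is full for now; caller keeps the rest */
        sent += (size_t) n;
    }
    return (ssize_t) sent;
}
```
@end verbatim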
7823
7824 @deftypefn {Lstream Method} int rewinder (Lstream *@var{stream})
7825 Rewind the stream.  If this is @code{NULL}, the stream is not seekable.
7826 @end deftypefn
7827
7828 @deftypefn {Lstream Method} int seekable_p (Lstream *@var{stream})
7829 Indicate whether this stream is seekable---i.e. it can be rewound.
7830 This method is ignored if the stream does not have a rewind method.  If
7831 this method is not present, the result is determined by whether a rewind
7832 method is present.
7833 @end deftypefn
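
The interaction between the rewind method and @code{seekable_p} can be
written down as a three-way decision.  The sketch below is in C and uses
an invented, cut-down method table (@code{toy_methods}) with a made-up
per-stream flag; it only mirrors the rules stated above and is not the
actual lstream code.

@verbatim
```c
#include <stddef.h>

struct toy_stream;

/* Hypothetical method table holding just the two methods discussed. */
struct toy_methods {
    int (*rewinder)(struct toy_stream *);
    int (*seekable_p)(struct toy_stream *);
};

struct toy_stream {
    const struct toy_methods *imp;
    int backing_is_seekable;  /* made-up per-stream state */
};

/* Sample methods for the illustration. */
int toy_rewind(struct toy_stream *s) { (void) s; return 0; }
int toy_ask(struct toy_stream *s) { return s->backing_is_seekable; }

/* Apply the rules: no rewinder means not seekable; otherwise ask
   seekable_p if present; otherwise the rewinder alone decides. */
int toy_is_seekable(struct toy_stream *s)
{
    if (s->imp->rewinder == NULL)
        return 0;
    if (s->imp->seekable_p != NULL)
        return s->imp->seekable_p(s) != 0;
    return 1;
}
```
@end verbatim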
7834
7835 @deftypefn {Lstream Method} int flusher (Lstream *@var{stream})
7836 Perform any additional operations necessary to flush the data in this
7837 stream.
7838 @end deftypefn
7839
7840 @deftypefn {Lstream Method} int pseudo_closer (Lstream *@var{stream})
7841 @end deftypefn
7842
7843 @deftypefn {Lstream Method} int closer (Lstream *@var{stream})
7844 Perform any additional operations necessary to close this stream down.
7845 May be @code{NULL}.  This function is called when @code{Lstream_close()}
7846 is called or when the stream is garbage-collected.  When this function
7847 is called, all pending data in the stream will already have been written
7848 out.
7849 @end deftypefn
7850
7851 @deftypefn {Lstream Method} Lisp_Object marker (Lisp_Object @var{lstream}, void (*@var{markfun}) (Lisp_Object))
7852 Mark this object for garbage collection.  Same semantics as a standard
7853 @code{Lisp_Object} marker.  This function can be @code{NULL}.
7854 @end deftypefn
7855
7856 @node Consoles; Devices; Frames; Windows, The Redisplay Mechanism, Lstreams, Top
7857 @chapter Consoles; Devices; Frames; Windows
7858
7859 @menu
7860 * Introduction to Consoles; Devices; Frames; Windows::
7861 * Point::
7862 * Window Hierarchy::
7863 * The Window Object::
7864 @end menu
7865
7866 @node Introduction to Consoles; Devices; Frames; Windows, Point, Consoles; Devices; Frames; Windows, Consoles; Devices; Frames; Windows
7867 @section Introduction to Consoles; Devices; Frames; Windows
7868
7869 A window-system window that you see on the screen is called a
7870 @dfn{frame} in Emacs terminology.  Each frame is subdivided into one or
7871 more non-overlapping panes, called (confusingly) @dfn{windows}.  Each
7872 window displays the text of a buffer in it. (See above on Buffers.) Note
7873 that buffers and windows are independent entities: Two or more windows
7874 can be displaying the same buffer (potentially in different locations),
7875 and a buffer can be displayed in no windows.
7876
7877   A single display screen that contains one or more frames is called
7878 a @dfn{display} (a @dfn{device} in XEmacs terminology).  Under most circumstances, there is only one display.
7879 However, more than one display can exist, for example if you have
7880 a @dfn{multi-headed} console, i.e. one with a single keyboard but
7881 multiple displays. (Typically in such a situation, the various
7882 displays act like one large display, in that the mouse is only
7883 in one of them at a time, and moving the mouse off of one moves
7884 it into another.) In some cases, the different displays will
7885 have different characteristics, e.g. one color and one mono.
7886
7887   XEmacs can display frames on multiple displays.  It can even deal
7888 simultaneously with frames on multiple keyboards (called @dfn{consoles} in
7889 XEmacs terminology).  Here is one case where this might be useful: You
7890 are using XEmacs on your workstation at work, and leave it running.
7891 Then you go home and dial in on a TTY line, and you can use the
7892 already-running XEmacs process to display another frame on your local
7893 TTY.
7894
7895   Thus, there is a hierarchy console -> device (display) -> frame -> window.
7896 There is a separate Lisp object type for each of these four concepts.
7897 Furthermore, there is logically a @dfn{selected console},
7898 @dfn{selected device}, @dfn{selected frame}, and @dfn{selected window}.
7899 Each of these objects is distinguished in various ways, such as being the
7900 default object for various functions that act on objects of that type.
7901 Note that every containing object remembers the ``selected'' object
7902 among the objects that it contains: e.g. not only is there a selected
7903 window, but every frame remembers the last window in it that was
7904 selected, and changing the selected frame causes the remembered window
7905 within it to become the selected window.  Similar relationships apply
7906 for consoles to devices and devices to frames.
7907
7908 @node Point, Window Hierarchy, Introduction to Consoles; Devices; Frames; Windows, Consoles; Devices; Frames; Windows
7909 @section Point
7910
7911   Recall that every buffer has a current insertion position, called
7912 @dfn{point}.  Now, two or more windows may be displaying the same buffer,
7913 and the text cursor in the two windows (i.e. @code{point}) can be in
7914 two different places.  You may ask, how can that be, since each
7915 buffer has only one value of @code{point}?  The answer is that each window
7916 also has a value of @code{point} that is squirreled away in it.  There
7917 is only one selected window, and the value of ``point'' in that buffer
7918 corresponds to that window.  When the selected window is changed
7919 from one window to another displaying the same buffer, the old
7920 value of @code{point} is stored into the old window's ``point'' and the
7921 value of @code{point} from the new window is retrieved and made the
7922 value of @code{point} in the buffer.  This means that @code{window-point}
7923 for the selected window is potentially inaccurate, and if you
7924 want to retrieve the correct value of @code{point} for a window,
7925 you must special-case on the selected window and retrieve the
7926 buffer's point instead.  This is related to why @code{save-window-excursion}
7927 does not save the selected window's value of @code{point}.
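
The hand-off of @code{point} between a buffer and its windows can be
modeled in a few lines of C.  This is only a sketch: the structures
@code{toy_buffer} and @code{toy_window} and both functions are
hypothetical and carry just the fields needed to show the idea.

@verbatim
```c
#include <stddef.h>

/* Toy buffer and window holding only the fields relevant here. */
struct toy_buffer { long point; };
struct toy_window { struct toy_buffer *buffer; long pointm; };

/* Make w the selected window: stash the outgoing window's point,
   then install the incoming window's stashed point in its buffer. */
void toy_select_window(struct toy_window **selected, struct toy_window *w)
{
    struct toy_window *old = *selected;
    if (old == w)
        return;
    if (old != NULL)
        old->pointm = old->buffer->point;
    w->buffer->point = w->pointm;
    *selected = w;
}

/* Correct value of point for w: special-case the selected window,
   whose stashed value may be out of date. */
long toy_window_point(const struct toy_window *selected,
                      const struct toy_window *w)
{
    return w == selected ? w->buffer->point : w->pointm;
}
```
@end verbatim

While a window is selected, editing moves the buffer's point and leaves
the window's stashed value stale, which is why the lookup function has
to special-case the selected window.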
7928
7929 @node Window Hierarchy, The Window Object, Point, Consoles; Devices; Frames; Windows
7930 @section Window Hierarchy
7931 @cindex window hierarchy
7932 @cindex hierarchy of windows
7933
7934   If a frame contains multiple windows (panes), they are always created
7935 by splitting an existing window along the horizontal or vertical axis.
7936 Terminology is a bit confusing here: to @dfn{split a window
7937 horizontally} means to create two side-by-side windows, i.e. to make a
7938 @emph{vertical} cut in a window.  Likewise, to @dfn{split a window
7939 vertically} means to create two windows, one above the other, by making
7940 a @emph{horizontal} cut.
7941
7942   If you split a window and then split again along the same axis, you
7943 will end up with a number of panes all arranged along the same axis.
7944 The precise way in which the splits were made should not be important,
7945 and this is reflected internally.  Internally, all windows are arranged
7946 in a tree, consisting of two types of windows, @dfn{combination} windows
7947 (which have children, and are covered completely by those children) and
7948 @dfn{leaf} windows, which have no children and are visible.  Every
7949 combination window has two or more children, all arranged along the same
7950 axis.  There are (logically) two subtypes of combination windows, depending on
7951 whether their children are horizontally or vertically arrayed.  There is
7952 always one root window, which is either a leaf window (if the frame
7953 contains only one window) or a combination window (if the frame contains
7954 more than one window).  In the latter case, the root window will have
7955 two or more children, either horizontally or vertically arrayed, and
7956 each of those children will be either a leaf window or another
7957 combination window.
7958
7959   Here are some rules:
7960
7961 @enumerate
7962 @item
7963 Horizontal combination windows can never have children that are
7964 horizontal combination windows; same for vertical.
7965
7966 @item
7967 Only leaf windows can be split (obviously) and this splitting does one
7968 of two things: (a) turns the leaf window into a combination window and
7969 creates two new leaf children, or (b) turns the leaf window into one of
7970 the two new leaves and creates the other leaf.  Rule (1) dictates which
7971 of these two outcomes happens.
7972
7973 @item
7974 Every combination window must have at least two children.
7975
7976 @item
7977 Leaf windows can never become combination windows.  They can be deleted,
7978 however.  If this results in a violation of (3), the parent combination
7979 window also gets deleted.
7980
7981 @item
7982 All functions that accept windows must be prepared to accept combination
7983 windows, and do something sane (e.g. signal an error if given one).
7984 Combination windows @emph{do} escape to the Lisp level.
7985
7986 @item
7987 All windows have three fields governing their contents:
7988 these are @dfn{hchild} (a list of horizontally-arrayed children),
7989 @dfn{vchild} (a list of vertically-arrayed children), and @dfn{buffer}
7990 (the buffer contained in a leaf window).  Exactly one of
7991 these will be non-nil.  Remember that @dfn{horizontally-arrayed}
7992 means ``side-by-side'' and @dfn{vertically-arrayed} means
7993 ``one above the other''.
7994
7995 @item
7996 Leaf windows also have markers in their @code{start} (the
7997 first buffer position displayed in the window) and @code{pointm}
7998 (the window's stashed value of @code{point}---see above) fields,
7999 while combination windows have nil in these fields.
8000
8001 @item
8002 The list of children for a window is threaded through the
8003 @code{next} and @code{prev} fields of each child window.
8004
8005 @item
8006 @strong{Deleted windows can be undeleted}.  This happens as a result of
8007 restoring a window configuration, and is unlike frames, displays, and
8008 consoles, which, once deleted, can never be restored.  Deleting a window
8009 does nothing except set a special @code{dead} bit to 1 and clear out the
8010 @code{next}, @code{prev}, @code{hchild}, and @code{vchild} fields, for
8011 GC purposes.
8012
8013 @item
8014 Most frames actually have two top-level windows---one for the
8015 minibuffer and one (the @dfn{root}) for everything else.  The modeline
8016 (if present) separates these two.  The @code{next} field of the root
8017 points to the minibuffer, and the @code{prev} field of the minibuffer
8018 points to the root.  The other @code{next} and @code{prev} fields are
8019 @code{nil}, and the frame points to both of these windows.
8020 Minibuffer-less frames have no minibuffer window, and the @code{next}
8021 and @code{prev} of the root window are @code{nil}.  Minibuffer-only
8022 frames have no root window, and the @code{next} of the minibuffer window
8023 is @code{nil} but the @code{prev} points to itself. (#### This is an
8024 artifact that should be fixed.)
8025 @end enumerate
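
Several of the rules above are structural invariants that can be checked
mechanically.  The following C sketch checks rules 1, 3 and 6 on a toy
window tree.  The structure @code{toy_win} is invented for this example
(the buffer is stood in for by a string, and only @code{next} is kept
from the sibling links); it is not the real window object.

@verbatim
```c
#include <stddef.h>

/* Toy window node with just the fields named in the rules above. */
struct toy_win {
    struct toy_win *hchild;  /* first of the side-by-side children */
    struct toy_win *vchild;  /* first of the stacked children */
    const char *buffer;      /* stand-in for a leaf's buffer */
    struct toy_win *next;    /* next sibling, or NULL */
};

/* Return 1 if the subtree rooted at w obeys rules 1, 3 and 6. */
int toy_win_valid(const struct toy_win *w)
{
    int set = (w->hchild != NULL) + (w->vchild != NULL) + (w->buffer != NULL);
    const struct toy_win *c;
    int n = 0;

    if (set != 1)
        return 0;  /* rule 6: exactly one of the three is non-nil */
    if (w->buffer != NULL)
        return 1;  /* leaf window: nothing more to check */

    for (c = w->hchild ? w->hchild : w->vchild; c != NULL; c = c->next) {
        /* rule 1: a child may not repeat its parent's orientation */
        if ((w->hchild && c->hchild) || (w->vchild && c->vchild))
            return 0;
        if (!toy_win_valid(c))
            return 0;
        n++;
    }
    return n >= 2;  /* rule 3: at least two children */
}
```
@end verbatim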
8026
8027 @node The Window Object,  , Window Hierarchy, Consoles; Devices; Frames; Windows
8028 @section The Window Object
8029
8030   Windows have the following accessible fields:
8031
8032 @table @code
8033 @item frame
8034 The frame that this window is on.
8035
8036 @item mini_p
8037 Non-@code{nil} if this window is a minibuffer window.
8038
8039 @item buffer
8040 The buffer that the window is displaying.  This may change often during
8041 the life of the window.
8042
8043 @item dedicated
8044 Non-@code{nil} if this window is dedicated to its buffer.
8045
8046 @item pointm
8047 @cindex window point internals
8048 This is the value of point in the current buffer when this window is
8049 selected; when it is not selected, it retains its previous value.
8050
8051 @item start
8052 The position in the buffer that is the first character to be displayed
8053 in the window.
8054
8055 @item force_start
8056 If this flag is non-@code{nil}, it says that the window has been
8057 scrolled explicitly by the Lisp program.  This affects what the next
8058 redisplay does if point is off the screen: instead of scrolling the
8059 window to show the text around point, it moves point to a location that
8060 is on the screen.
8061
8062 @item last_modified
8063 The @code{modified} field of the window's buffer, as of the last time
8064 a redisplay completed in this window.
8065
8066 @item last_point
8067 The buffer's value of point, as of the last time
8068 a redisplay completed in this window.
8069
8070 @item left
8071 This is the left-hand edge of the window, measured in columns.  (The
8072 leftmost column on the screen is @w{column 0}.)
8073
8074 @item top
8075 This is the top edge of the window, measured in lines.  (The top line on
8076 the screen is @w{line 0}.)
8077
8078 @item height
8079 The height of the window, measured in lines.
8080
8081 @item width
8082 The width of the window, measured in columns.
8083
8084 @item next
8085 This is the window that is the next in the chain of siblings.  It is
8086 @code{nil} in a window that is the rightmost or bottommost of a group of
8087 siblings.
8088
8089 @item prev
8090 This is the window that is the previous in the chain of siblings.  It is
8091 @code{nil} in a window that is the leftmost or topmost of a group of
8092 siblings.
8093
8094 @item parent
8095 Internally, XEmacs arranges windows in a tree; each group of siblings has
8096 a parent window whose area includes all the siblings.  This field points
8097 to a window's parent.
8098
8099 Parent windows do not display buffers, and play little role in display
8100 except to shape their child windows.  Emacs Lisp programs usually have
8101 no access to the parent windows; they operate on the windows at the
8102 leaves of the tree, which actually display buffers.
8103
8104 @item hscroll
8105 This is the number of columns that the display in the window is scrolled
8106 horizontally to the left.  Normally, this is 0.
8107
8108 @item use_time
8109 This is the last time that the window was selected.  The function
8110 @code{get-lru-window} uses this field.
8111
8112 @item display_table
8113 The window's display table, or @code{nil} if none is specified for it.
8114
8115 @item update_mode_line
8116 Non-@code{nil} means this window's mode line needs to be updated.
8117
8118 @item base_line_number
8119 The line number of a certain position in the buffer, or @code{nil}.
8120 This is used for displaying the line number of point in the mode line.
8121
8122 @item base_line_pos
8123 The position in the buffer for which the line number is known, or
8124 @code{nil} meaning none is known.
8125
8126 @item region_showing
8127 If the region (or part of it) is highlighted in this window, this field
8128 holds the mark position that made one end of that region.  Otherwise,
8129 this field is @code{nil}.
8130 @end table
8131
8132 @node The Redisplay Mechanism, Extents, Consoles; Devices; Frames; Windows, Top
8133 @chapter The Redisplay Mechanism
8134
8135   The redisplay mechanism is one of the most complicated sections of
8136 XEmacs, especially from a conceptual standpoint.  This is doubly so
8137 because, unlike for the basic aspects of the Lisp interpreter, the
8138 computer science theories of how to efficiently handle redisplay are not
8139 well-developed.
8140
8141   When working with the redisplay mechanism, remember the Golden Rules
8142 of Redisplay:
8143
8144 @enumerate
8145 @item
8146 It Is Better To Be Correct Than Fast.
8147 @item
8148 Thou Shalt Not Run Elisp From Within Redisplay.
8149 @item
8150 It Is Better To Be Fast Than Not To Be.
8151 @end enumerate
8152
8153 @menu
8154 * Critical Redisplay Sections::
8155 * Line Start Cache::
8156 * Redisplay Piece by Piece::
8157 @end menu
8158
8159 @node Critical Redisplay Sections, Line Start Cache, The Redisplay Mechanism, The Redisplay Mechanism
8160 @section Critical Redisplay Sections
8161 @cindex critical redisplay sections
8162
8163 Within this section, we are defenseless and assume that the
8164 following cannot happen:
8165
8166 @enumerate
8167 @item
8168 garbage collection
8169 @item
8170 Lisp code evaluation
8171 @item
8172 frame size changes
8173 @end enumerate
8174
8175 We ensure (3) by calling @code{hold_frame_size_changes()}, which
8176 will cause any pending frame size changes to get put on hold
8177 till after the end of the critical section.  (1) follows
8178 automatically if (2) is met.  #### Unfortunately, there are
8179 some places where Lisp code can be called within this section.
8180 We need to remove them.
8181
8182 If @code{Fsignal()} is called during this critical section, we
8183 will @code{abort()}.
8184
8185 If garbage collection is called during this critical section,
8186 we simply return. #### We should abort instead.
8187
8188 #### If a frame-size change does occur we should probably
8189 actually be preempting redisplay.
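
The idea of putting frame size changes on hold can be illustrated with a
toy model in C.  The structure and functions below are hypothetical and
are not the code behind @code{hold_frame_size_changes()}; they only
show a size request being deferred while a hold is active and applied
once the hold is released.

@verbatim
```c
/* Toy frame: current size plus one deferred size request. */
struct toy_frame {
    int width, height;
    int hold_count;  /* > 0 while inside a critical section */
    int pending;     /* nonzero if a size change is waiting */
    int pending_width, pending_height;
};

/* Enter a critical section: size changes are deferred from now on. */
void toy_hold_size_changes(struct toy_frame *f)
{
    f->hold_count++;
}

/* Request a new size; applied at once only outside a critical section. */
void toy_change_size(struct toy_frame *f, int width, int height)
{
    if (f->hold_count > 0) {
        f->pending = 1;
        f->pending_width = width;
        f->pending_height = height;
    } else {
        f->width = width;
        f->height = height;
    }
}

/* Leave a critical section; when the last hold is released,
   apply the most recent deferred request, if any. */
void toy_unhold_size_changes(struct toy_frame *f)
{
    if (f->hold_count > 0 && --f->hold_count == 0 && f->pending) {
        f->width = f->pending_width;
        f->height = f->pending_height;
        f->pending = 0;
    }
}
```
@end verbatim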
8190
8191 @node Line Start Cache, Redisplay Piece by Piece, Critical Redisplay Sections, The Redisplay Mechanism
8192 @section Line Start Cache
8193 @cindex line start cache
8194
8195   The traditional scrolling code in Emacs breaks in a variable-height
8196 world.  It depends on the key assumption that the number of lines that
8197 can be displayed at any given time is fixed.  This led to a complete
8198 separation of the scrolling code from the redisplay code.  In order to
8199 fully support variable-height lines, the scrolling code must actually be
8200 tightly integrated with redisplay.  Only redisplay can determine how
8201 many lines will be displayed on a screen for any given starting point.
8202
8203   What is ideally wanted is a complete list of the starting buffer
8204 position for every possible display line of a buffer along with the
8205 height of that display line.  Maintaining such a full list would be very
8206 expensive.  We settle for having it include information for all areas
8207 which we happen to generate anyhow (i.e. the region currently being
8208 displayed) and for those areas we need to work with.
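
  To make the idea concrete, here is a minimal sketch of what one cache
entry might hold and how the number of lines that fit in a window could
be derived from it.  The names @code{sketch_line_start} and
@code{lines_that_fit} are assumptions made for this example and do not
describe the real cache layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache entry: one display line. */
struct sketch_line_start {
  long start;   /* buffer position where the display line begins */
  int height;   /* pixel height of the display line */
};

/* Count how many whole display lines, starting at entry FIRST, fit in
   a window that is WINDOW_HEIGHT pixels tall. */
size_t lines_that_fit(const struct sketch_line_start *cache, size_t count,
                      size_t first, int window_height)
{
  size_t n = 0;
  int used = 0;

  assert(first <= count);
  for (size_t i = first; i < count; i++) {
    if (used + cache[i].height > window_height)
      break;
    used += cache[i].height;
    n++;
  }
  return n;
}
```

  Because the heights differ from line to line, the answer depends on the
starting entry, which is why a fixed lines-per-window assumption breaks.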
8209
8210   In order to ensure that the cache accurately represents what redisplay
8211 would actually show, it is necessary to invalidate it in many
8212 situations.  If the buffer changes, the starting positions may no longer
8213 be correct.  If a face or an extent has changed then the line heights
8214 may have altered.  These events happen frequently enough that the cache
can end up being constantly disabled.  With this potentially constant
invalidation, when is the cache ever useful?

  Even if the cache is invalidated before every single usage, it is still
necessary.  Scrolling often requires knowledge about display lines which
8220 are actually above or below the visible region.  The cache provides a
8221 convenient light-weight method of storing this information for multiple
8222 display regions.  This knowledge is necessary for the scrolling code to
8223 always obey the First Golden Rule of Redisplay.
8224
8225   If the cache already contains all of the information that the scrolling
8226 routines happen to need so that it doesn't have to go generate it, then
8227 we are able to obey the Third Golden Rule of Redisplay.  The first thing
8228 we do to help out the cache is to always add the displayed region.  This
8229 region had to be generated anyway, so the cache ends up getting the
8230 information basically for free.  In those cases where a user is simply
8231 scrolling around viewing a buffer there is a high probability that this
8232 is sufficient to always provide the needed information.  The second
8233 thing we can do is be smart about invalidating the cache.
8234
8235   TODO---Be smart about invalidating the cache.  Potential places:
8236
8237 @itemize @bullet
8238 @item
8239 Insertions at end-of-line which don't cause line-wraps do not alter the
8240 starting positions of any display lines.  These types of buffer
8241 modifications should not invalidate the cache.  This is actually a large
8242 optimization for redisplay speed as well.
8243 @item
8244 Buffer modifications frequently only affect the display of lines at and
8245 below where they occur.  In these situations we should only invalidate
8246 the part of the cache starting at where the modification occurs.
8247 @end itemize
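
  The second idea in the list above, partial invalidation, might look
roughly like this.  The sketch assumes the cache is a contiguous array of
entries sorted by starting position; @code{sketch_line_start} and
@code{invalidate_from} are hypothetical names, and nothing here reflects
how the cache is actually invalidated today.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache entry: one display line. */
struct sketch_line_start {
  long start;   /* buffer position where the display line begins */
  int height;   /* pixel height of the display line */
};

/* Return how many leading entries remain valid after a modification
   at MODPOS.  The line containing MODPOS and every line below it are
   dropped; lines that lie entirely above MODPOS are kept.  If MODPOS
   precedes the whole cached region, nothing is kept. */
size_t invalidate_from(const struct sketch_line_start *cache, size_t count,
                       long modpos)
{
  size_t at_or_before = 0;

  while (at_or_before < count && cache[at_or_before].start <= modpos) {
    /* The sketch assumes entries are sorted by starting position. */
    assert(at_or_before == 0
           || cache[at_or_before - 1].start < cache[at_or_before].start);
    at_or_before++;
  }
  /* The last line starting at or before MODPOS contains MODPOS. */
  return at_or_before > 0 ? at_or_before - 1 : 0;
}
```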
8248
8249   In case you're wondering, the Second Golden Rule of Redisplay is not
8250 applicable.
8251
8252 @node Redisplay Piece by Piece,  , Line Start Cache, The Redisplay Mechanism
8253 @section Redisplay Piece by Piece
8254 @cindex Redisplay Piece by Piece
8255
As you can begin to see, redisplay is complex and also not well
documented.  Chuck no longer works on XEmacs, so this section is my take
on the workings of redisplay.
8259
8260 Redisplay happens in three phases:
8261
8262 @enumerate
8263 @item
Determine desired display in area that needs redisplay.
Implemented by @code{redisplay.c}.
@item
Compare desired display with current display.
Implemented by @code{redisplay-output.c}.
@item
Output changes.  Implemented by @code{redisplay-output.c},
@code{redisplay-x.c}, @code{redisplay-msw.c} and @code{redisplay-tty.c}.
8272 @end enumerate
8273
8274 Steps 1 and 2 are device-independent and relatively complex.  Step 3 is
8275 mostly device-dependent.
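
The compare and output phases can be pictured with a deliberately tiny
model in which a display is just an array of strings.  The type
@code{sketch_display} and the function @code{update_display} are
hypothetical and far simpler than the real redisplay structures; the
point is only that lines whose desired and current contents match
produce no output.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_LINES 4
#define SKETCH_COLS  16

/* Hypothetical stand-in for a window's display: one string per line. */
struct sketch_display {
  char lines[SKETCH_LINES][SKETCH_COLS];
};

/* Compare DESIRED against CURRENT and "output" only the lines that
   differ.  Returns the number of lines that had to be output. */
int update_display(struct sketch_display *current,
                   const struct sketch_display *desired)
{
  int output = 0;

  assert(current != NULL && desired != NULL);
  for (int i = 0; i < SKETCH_LINES; i++) {
    if (strcmp(current->lines[i], desired->lines[i]) != 0) {
      /* A device-specific method would draw the line here. */
      strcpy(current->lines[i], desired->lines[i]);
      output++;
    }
  }
  return output;
}
```

After a successful update the current display equals the desired one, so
an immediate second update outputs nothing.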
8276
@subheading Determining the desired display
8278
8279 Display attributes are stored in @code{display_line} structures. Each
8280 @code{display_line} consists of a set of @code{display_block}'s and each
8281 @code{display_block} contains a number of @code{rune}'s. Generally
8282 dynarr's of @code{display_line}'s are held by each window representing
8283 the current display and the desired display.
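
The nesting of lines, blocks and runes can be sketched as below.  These
are simplified, hypothetical structures (hence the @code{sketch_}
prefix); the real @code{display_line}, @code{display_block} and
@code{rune} carry many more fields and are held in dynarrs, not in
plain pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical versions of the redisplay structures. */
struct sketch_rune {
  int xpos;    /* left edge in pixels */
  int width;   /* width in pixels */
};

struct sketch_display_block {
  struct sketch_rune *runes;
  size_t num_runes;
};

struct sketch_display_line {
  struct sketch_display_block *blocks;
  size_t num_blocks;
};

/* Total number of runes in all blocks of a line. */
size_t line_rune_count(const struct sketch_display_line *dl)
{
  size_t total = 0;

  assert(dl != NULL);
  for (size_t b = 0; b < dl->num_blocks; b++)
    total += dl->blocks[b].num_runes;
  return total;
}

/* Sum of the rune widths on a line, in pixels. */
int line_pixel_width(const struct sketch_display_line *dl)
{
  int width = 0;

  assert(dl != NULL);
  for (size_t b = 0; b < dl->num_blocks; b++)
    for (size_t r = 0; r < dl->blocks[b].num_runes; r++)
      width += dl->blocks[b].runes[r].width;
  return width;
}
```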
8284
The @code{display_line} structures are tightly tied to buffers, which
presents a problem for redisplay, as this connection is bogus for the
8287 modeline. Hence the @code{display_line} generation routines are
8288 duplicated for generating the modeline. This means that the modeline
8289 display code has many bugs that the standard redisplay code does not.
8290
8291 The guts of @code{display_line} generation are in
8292 @code{create_text_block}, which creates a single display line for the
8293 desired locale. This incrementally parses the characters on the current
8294 line and generates redisplay structures for each.
8295
8296 Gutter redisplay is different. Because the data to display is stored in
8297 a string we cannot use @code{create_text_block}. Instead we use
8298 @code{create_text_string_block} which performs the same function as
8299 @code{create_text_block} but for strings. Many of the complexities of
8300 @code{create_text_block} to do with cursor handling and selective
8301 display have been removed.
8302
8303 @node Extents, Faces, The Redisplay Mechanism, Top
8304 @chapter Extents
8305
8306 @menu
8307 * Introduction to Extents::     Extents are ranges over text, with properties.
8308 * Extent Ordering::             How extents are ordered internally.
8309 * Format of the Extent Info::   The extent information in a buffer or string.
8310 * Zero-Length Extents::         A weird special case.
8311 * Mathematics of Extent Ordering::  A rigorous foundation.
8312 * Extent Fragments::            Cached information useful for redisplay.
8313 @end menu
8314
8315 @node Introduction to Extents, Extent Ordering, Extents, Extents
8316 @section Introduction to Extents
8317
8318   Extents are regions over a buffer, with a start and an end position
8319 denoting the region of the buffer included in the extent.  In
8320 addition, either end can be closed or open, meaning that the endpoint
8321 is or is not logically included in the extent.  Insertion of a character
8322 at a closed endpoint causes the character to go inside the extent;
8323 insertion at an open endpoint causes the character to go outside.
8324
8325   Extent endpoints are stored using memory indices (see @file{insdel.c}),
8326 to minimize the amount of adjusting that needs to be done when
8327 characters are inserted or deleted.
8328
8329   (Formerly, extent endpoints at the gap could be either before or
8330 after the gap, depending on the open/closedness of the endpoint.
8331 The intent of this was to make it so that insertions would
8332 automatically go inside or out of extents as necessary with no
8333 further work needing to be done.  It didn't work out that way,
8334 however, and just ended up complexifying and buggifying all the
8335 rest of the code.)
8336
8337 @node Extent Ordering, Format of the Extent Info, Introduction to Extents, Extents
8338 @section Extent Ordering
8339
8340   Extents are compared using memory indices.  There are two orderings
8341 for extents and both orders are kept current at all times.  The normal
8342 or @dfn{display} order is as follows:
8343
8344 @example
8345 Extent A is ``less than'' extent B,
8346 that is, earlier in the display order,
8347   if:    A-start < B-start,
8348   or if: A-start = B-start, and A-end > B-end
8349 @end example
8350
8351   So if two extents begin at the same position, the larger of them is the
8352 earlier one in the display order (@code{EXTENT_LESS} is true).
8353
8354   For the e-order, the same thing holds:
8355
8356 @example
8357 Extent A is ``less than'' extent B in e-order,
that is, earlier in the e-order,
8359   if:    A-end < B-end,
8360   or if: A-end = B-end, and A-start > B-start
8361 @end example
8362
8363   So if two extents end at the same position, the smaller of them is the
8364 earlier one in the e-order (@code{EXTENT_E_LESS} is true).
8365
8366   The display order and the e-order are complementary orders: any
8367 theorem about the display order also applies to the e-order if you swap
8368 all occurrences of ``display order'' and ``e-order'', ``less than'' and
8369 ``greater than'', and ``extent start'' and ``extent end''.
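
  The two orderings can be written down directly as comparison functions.
This is a sketch only: @code{sketch_extent}, @code{display_less} and
@code{e_less} are hypothetical names standing in for the real extent
structure and the @code{EXTENT_LESS} and @code{EXTENT_E_LESS} tests, and
plain integers stand in for memory indices.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical extent: just its two endpoints. */
struct sketch_extent {
  long start;
  long end;
};

/* True if A comes before B in the display order. */
bool display_less(const struct sketch_extent *a, const struct sketch_extent *b)
{
  assert(a->start <= a->end && b->start <= b->end);
  return a->start < b->start
         || (a->start == b->start && a->end > b->end);
}

/* True if A comes before B in the e-order. */
bool e_less(const struct sketch_extent *a, const struct sketch_extent *b)
{
  assert(a->start <= a->end && b->start <= b->end);
  return a->end < b->end
         || (a->end == b->end && a->start > b->start);
}
```

  With a shared start the larger extent sorts first in the display order;
with a shared end the smaller extent sorts first in the e-order.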
8370
8371 @node Format of the Extent Info, Zero-Length Extents, Extent Ordering, Extents
8372 @section Format of the Extent Info
8373
8374   An extent-info structure consists of a list of the buffer or string's
8375 extents and a @dfn{stack of extents} that lists all of the extents over
8376 a particular position.  The stack-of-extents info is used for
8377 optimization purposes---it basically caches some info that might
8378 be expensive to compute.  Certain otherwise hard computations are easy
8379 given the stack of extents over a particular position, and if the
8380 stack of extents over a nearby position is known (because it was
8381 calculated at some prior point in time), it's easy to move the stack
8382 of extents to the proper position.
8383
8384   Given that the stack of extents is an optimization, and given that
8385 it requires memory, a string's stack of extents is wiped out each
8386 time a garbage collection occurs.  Therefore, any time you retrieve
8387 the stack of extents, it might not be there.  If you need it to
8388 be there, use the @code{_force} version.
8389
8390   Similarly, a string may or may not have an extent_info structure.
8391 (Generally it won't if there haven't been any extents added to the
8392 string.) So use the @code{_force} version if you need the extent_info
8393 structure to be there.
8394
8395   A list of extents is maintained as a double gap array: one gap array
8396 is ordered by start index (the @dfn{display order}) and the other is
8397 ordered by end index (the @dfn{e-order}).  Note that positions in an
8398 extent list should logically be conceived of as referring @emph{to} a
8399 particular extent (as is the norm in programs) rather than sitting
8400 between two extents.  Note also that callers of these functions should
8401 not be aware of the fact that the extent list is implemented as an
8402 array, except for the fact that positions are integers (this should be
generalized to handle integers and linked lists equally well).
8404
8405 @node Zero-Length Extents, Mathematics of Extent Ordering, Format of the Extent Info, Extents
8406 @section Zero-Length Extents
8407
8408   Extents can be zero-length, and will end up that way if their endpoints
are explicitly set that way or if their @code{detachable} property is @code{nil}
8410 and all the text in the extent is deleted. (The exception is open-open
8411 zero-length extents, which are barred from existing because there is
8412 no sensible way to define their properties.  Deletion of the text in
8413 an open-open extent causes it to be converted into a closed-open
8414 extent.)  Zero-length extents are primarily used to represent
8415 annotations, and behave as follows:
8416
8417 @enumerate
8418 @item
8419 Insertion at the position of a zero-length extent expands the extent
8420 if both endpoints are closed; goes after the extent if it is closed-open;
8421 and goes before the extent if it is open-closed.
8422
8423 @item
8424 Deletion of a character on a side of a zero-length extent whose
8425 corresponding endpoint is closed causes the extent to be detached if
8426 it is detachable; if the extent is not detachable or the corresponding
8427 endpoint is open, the extent remains in the buffer, moving as necessary.
8428 @end enumerate
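
  The insertion rule in item 1 amounts to a small decision table, sketched
here.  The enumeration and the function @code{zero_length_insertion} are
hypothetical names introduced for this example only.

```c
#include <assert.h>
#include <stdbool.h>

/* What happens to text inserted at a zero-length extent's position. */
enum sketch_insertion_effect {
  INSERT_EXPANDS_EXTENT,   /* the text ends up inside the extent */
  INSERT_GOES_AFTER,       /* the text ends up after the extent */
  INSERT_GOES_BEFORE       /* the text ends up before the extent */
};

enum sketch_insertion_effect
zero_length_insertion(bool start_open, bool end_open)
{
  /* Open-open zero-length extents are barred from existing. */
  assert(!(start_open && end_open));

  if (!start_open && !end_open)
    return INSERT_EXPANDS_EXTENT;   /* closed-closed */
  if (!start_open)
    return INSERT_GOES_AFTER;       /* closed-open */
  return INSERT_GOES_BEFORE;        /* open-closed */
}
```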
8429
8430   Note that closed-open, non-detachable zero-length extents behave
8431 exactly like markers and that open-closed, non-detachable zero-length
8432 extents behave like the ``point-type'' marker in Mule.
8433
8434 @node Mathematics of Extent Ordering, Extent Fragments, Zero-Length Extents, Extents
8435 @section Mathematics of Extent Ordering
8436 @cindex extent mathematics
8437 @cindex mathematics of extents
8438 @cindex extent ordering
8439
8440 @cindex display order of extents
8441 @cindex extents, display order
8442   The extents in a buffer are ordered by ``display order'' because that
is the order in which the redisplay mechanism needs to process them.
8444 The e-order is an auxiliary ordering used to facilitate operations
8445 over extents.  The operations that can be performed on the ordered
8446 list of extents in a buffer are
8447
8448 @enumerate
8449 @item
8450 Locate where an extent would go if inserted into the list.
8451 @item
8452 Insert an extent into the list.
8453 @item
8454 Remove an extent from the list.
8455 @item
8456 Map over all the extents that overlap a range.
8457 @end enumerate
8458
8459   (4) requires being able to determine the first and last extents
8460 that overlap a range.
8461
8462   NOTE: @dfn{overlap} is used as follows:
8463
8464 @itemize @bullet
8465 @item
8466 two ranges overlap if they have at least one point in common.
8467 Whether the endpoints are open or closed makes a difference here.
8468 @item
8469 a point overlaps a range if the point is contained within the
8470 range; this is equivalent to treating a point @math{P} as the range
8471 @math{[P, P]}.
8472 @item
8473 In the case of an @emph{extent} overlapping a point or range, the extent
8474 is normally treated as having closed endpoints.  This applies
8475 consistently in the discussion of stacks of extents and such below.
8476 Note that this definition of overlap is not necessarily consistent with
8477 the extents that @code{map-extents} maps over, since @code{map-extents}
sometimes pays attention to whether the endpoints of an extent are open
8479 or closed.  But for our purposes, it greatly simplifies things to treat
8480 all extents as having closed endpoints.
8481 @end itemize
8482
8483 First, define @math{>}, @math{<}, @math{<=}, etc. as applied to extents
8484 to mean comparison according to the display order.  Comparison between
8485 an extent @math{E} and an index @math{I} means comparison between
8486 @math{E} and the range @math{[I, I]}.
8487
8488 Also define @math{e>}, @math{e<}, @math{e<=}, etc. to mean comparison
8489 according to the e-order.
8490
8491 For any range @math{R}, define @math{R(0)} to be the starting index of
8492 the range and @math{R(1)} to be the ending index of the range.
8493
8494 For any extent @math{E}, define @math{E(next)} to be the extent directly
8495 following @math{E}, and @math{E(prev)} to be the extent directly
8496 preceding @math{E}.  Assume @math{E(next)} and @math{E(prev)} can be
8497 determined from @math{E} in constant time.  (This is because we store
8498 the extent list as a doubly linked list.)
8499
8500 Similarly, define @math{E(e-next)} and @math{E(e-prev)} to be the
8501 extents directly following and preceding @math{E} in the e-order.
8502
8503 Now:
8504
8505 Let @math{R} be a range.
8506 Let @math{F} be the first extent overlapping @math{R}.
8507 Let @math{L} be the last extent overlapping @math{R}.
8508
8509 Theorem 1: @math{R(1)} lies between @math{L} and @math{L(next)},
8510 i.e. @math{L <= R(1) < L(next)}.
8511
8512   This follows easily from the definition of display order.  The
8513 basic reason that this theorem applies is that the display order
8514 sorts by increasing starting index.
8515
8516   Therefore, we can determine @math{L} just by looking at where we would
8517 insert @math{R(1)} into the list, and if we know @math{F} and are moving
8518 forward over extents, we can easily determine when we've hit @math{L} by
8519 comparing the extent we're at to @math{R(1)}.
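
  As a rough illustration of the forward scan just described, the
following sketch walks an array of extents held in display order and
stops at the first extent that starts after @math{R(1)}, since no later
extent can overlap @math{R}.  The type @code{sketch_extent} and the
function @code{count_overlapping} are hypothetical.  For simplicity the
sketch scans from the head of the array, not from @math{F}, and it
treats every extent as closed, as the discussion here does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical extent: just its two endpoints. */
struct sketch_extent {
  long start;
  long end;
};

/* Count the extents that overlap the closed range [R0, R1].  EXTENTS
   must be sorted in display order.  *VISITED receives the number of
   extents examined before the scan could stop. */
size_t count_overlapping(const struct sketch_extent *extents, size_t count,
                         long r0, long r1, size_t *visited)
{
  size_t hits = 0;
  size_t i;

  assert(r0 <= r1);
  for (i = 0; i < count; i++) {
    if (extents[i].start > r1)
      break;                 /* sorted by start: nothing later overlaps */
    if (extents[i].end >= r0)
      hits++;                /* starts at or before R1, ends at or after R0 */
  }
  *visited = i;
  return hits;
}
```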
8520
8521 @example
8522 Theorem 2: @math{F(e-prev) e< [1, R(0)] e<= F}.
8523 @end example
8524
8525   This is the analog of Theorem 1, and applies because the e-order
8526 sorts by increasing ending index.
8527
8528   Therefore, @math{F} can be found in the same amount of time as
8529 operation (1), i.e. the time that it takes to locate where an extent
8530 would go if inserted into the e-order list.
8531
8532   If the lists were stored as balanced binary trees, then operation (1)
8533 would take logarithmic time, which is usually quite fast.  However,
8534 currently they're stored as simple doubly-linked lists, and instead we
8535 do some caching to try to speed things up.
8536
8537   Define a @dfn{stack of extents} (or @dfn{SOE}) as the set of extents
8538 (ordered in the display order) that overlap an index @math{I}, together
8539 with the SOE's @dfn{previous} extent, which is an extent that precedes
8540 @math{I} in the e-order. (Hopefully there will not be very many extents
8541 between @math{I} and the previous extent.)
8542
8543 Now:
8544
8545 Let @math{I} be an index, let @math{S} be the stack of extents on
8546 @math{I}, let @math{F} be the first extent in @math{S}, and let @math{P}
8547 be @math{S}'s previous extent.
8548
8549 Theorem 3: The first extent in @math{S} is the first extent that overlaps
8550 any range @math{[I, J]}.
8551
8552 Proof: Any extent that overlaps @math{[I, J]} but does not include
8553 @math{I} must have a start index @math{> I}, and thus be greater than
8554 any extent in @math{S}.
8555
8556 Therefore, finding the first extent that overlaps a range @math{R} is
8557 the same as finding the first extent that overlaps @math{R(0)}.
8558
8559 Theorem 4: Let @math{I2} be an index such that @math{I2 > I}, and let
8560 @math{F2} be the first extent that overlaps @math{I2}.  Then, either
8561 @math{F2} is in @math{S} or @math{F2} is greater than any extent in
8562 @math{S}.
8563
8564 Proof: If @math{F2} does not include @math{I} then its start index is
8565 greater than @math{I} and thus it is greater than any extent in
8566 @math{S}, including @math{F}.  Otherwise, @math{F2} includes @math{I}
8567 and thus is in @math{S}, and thus @math{F2 >= F}.
8568
8569 @node Extent Fragments,  , Mathematics of Extent Ordering, Extents
8570 @section Extent Fragments
8571 @cindex extent fragment
8572
8573   Imagine that the buffer is divided up into contiguous, non-overlapping
8574 @dfn{runs} of text such that no extent starts or ends within a run
8575 (extents that abut the run don't count).
8576
8577   An extent fragment is a structure that holds data about the run that
8578 contains a particular buffer position (if the buffer position is at the
8579 junction of two runs, the run after the position is used)---the
8580 beginning and end of the run, a list of all of the extents in that run,
8581 the @dfn{merged face} that results from merging all of the faces
8582 corresponding to those extents, the begin and end glyphs at the
8583 beginning of the run, etc.  This is the information that redisplay needs
8584 in order to display this run.
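
  The notion of a run can be illustrated by computing its boundaries
directly from the extent endpoints.  This is a sketch under stated
assumptions: @code{sketch_extent}, @code{sketch_run} and
@code{find_run} are hypothetical names, and the linear scan over all
extents is exactly the cost that the real code avoids by relying on the
stack of extents.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_extent {
  long start;
  long end;
};

struct sketch_run {
  long begin;
  long end;
};

/* Find the run containing POS in a buffer spanning [BUF_BEGIN, BUF_END].
   Every extent endpoint is a run boundary.  When POS sits exactly on a
   boundary, the run after POS is used. */
struct sketch_run find_run(const struct sketch_extent *extents, size_t count,
                           long buf_begin, long buf_end, long pos)
{
  struct sketch_run run = { buf_begin, buf_end };

  assert(buf_begin <= pos && pos < buf_end);
  for (size_t i = 0; i < count; i++) {
    long endpoints[2] = { extents[i].start, extents[i].end };

    for (int k = 0; k < 2; k++) {
      if (endpoints[k] <= pos && endpoints[k] > run.begin)
        run.begin = endpoints[k];   /* nearest boundary at or before POS */
      if (endpoints[k] > pos && endpoints[k] < run.end)
        run.end = endpoints[k];     /* nearest boundary after POS */
    }
  }
  return run;
}
```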
8585
8586   Extent fragments have to be very quick to update to a new buffer
8587 position when moving linearly through the buffer.  They rely on the
8588 stack-of-extents code, which does the heavy-duty algorithmic work of
determining which extents overlie a particular position.
8590
8591 @node Faces, Glyphs, Extents, Top
8592 @chapter Faces
8593
8594 Not yet documented.
8595
8596 @node Glyphs, Specifiers, Faces, Top
8597 @chapter Glyphs
8598
Glyphs are graphical elements that can be displayed in XEmacs buffers or
gutters.  We use the term graphical element here in the broadest possible
sense, since glyphs can range from something as mundane as text to
something as arcane as a native tab widget.
8603
8604 In XEmacs, glyphs represent the uninstantiated state of graphical
8605 elements, i.e. they hold all the information necessary to produce an
8606 image on-screen but the image does not exist at this stage.
8607
Glyphs are lazily instantiated by calling one of the glyph
functions.  This usually occurs within redisplay when
@code{Fglyph_height} is called.  Instantiation causes an image-instance
to be created and cached.  This cache is on a device basis for all glyphs
except widget-glyphs, and on a window basis for widget-glyphs.  The
caching is done by @code{image_instantiate} and is necessary because it
is generally possible to display an image-instance in multiple
domains.  For instance, if we create a Pixmap, we can display it in
multiple windows even though we only need a single Pixmap instance to
do this.  If caching were not done, it would be necessary to create an
image-instance for every displayable occurrence of a glyph, and for
every usage, which would be extremely memory and CPU intensive.

Widget-glyphs (also known as native widgets) are not cached in this way.
This is because widget-glyph image-instances on screen are toolkit
windows, and thus cannot be reused in multiple XEmacs domains.  Thus
widget-glyphs are cached on a window basis.
8625
8626 Any action on a glyph first consults the cache before actually
8627 instantiating a widget.
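
The consult-the-cache-first behaviour can be sketched as a lookup keyed
by glyph and domain, where the domain would be a device for ordinary
glyphs and a window for widget-glyphs.  Everything below is
hypothetical: @code{sketch_cache}, @code{sketch_instance} and
@code{lookup_or_instantiate} are illustrative names, integer ids stand
in for Lisp objects, and the fixed-size table has no eviction.  It is
not how @code{image_instantiate} is written.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_CACHE_SIZE 8

struct sketch_instance {
  int glyph_id;
  int domain_id;   /* a device, or a window for widget-glyphs */
  int handle;      /* stands in for the instantiated image */
};

struct sketch_cache {
  struct sketch_instance entries[SKETCH_CACHE_SIZE];
  size_t used;
  int instantiations;   /* how many instances were actually built */
};

/* Return the cached instance for (GLYPH_ID, DOMAIN_ID), creating it
   only on first use. */
const struct sketch_instance *
lookup_or_instantiate(struct sketch_cache *cache, int glyph_id, int domain_id)
{
  struct sketch_instance *fresh;

  for (size_t i = 0; i < cache->used; i++)
    if (cache->entries[i].glyph_id == glyph_id
        && cache->entries[i].domain_id == domain_id)
      return &cache->entries[i];

  assert(cache->used < SKETCH_CACHE_SIZE);   /* the sketch never evicts */
  fresh = &cache->entries[cache->used++];
  fresh->glyph_id = glyph_id;
  fresh->domain_id = domain_id;
  fresh->handle = ++cache->instantiations;
  return fresh;
}
```

Asking twice for the same glyph in the same domain returns the same
instance; a different domain forces a second instantiation.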
8628
8629 @section Widget-Glyphs in the MS-Windows Environment
8630
8631 To Do
8632
8633 @section Widget-Glyphs in the X Environment
8634
8635 Widget-glyphs under X make heavy use of lwlib for manipulating the
8636 native toolkit objects. This is primarily so that different toolkits can
8637 be supported for widget-glyphs, just as they are supported for features
8638 such as menubars etc.
8639
8640 Lwlib is extremely poorly documented and quite hairy so here is my
8641 understanding of what goes on.
8642
Lwlib maintains a set of @code{widget_instance}s which mirror the
hierarchical state of Xt widgets.  I think this is so that widgets can be
updated and manipulated generically by the lwlib library.  For instance,
@code{update_one_widget_instance} can cope with multiple types of widget
and multiple types of toolkit.  Each element in the widget hierarchy is
updated from its corresponding @code{widget_instance} by walking the
@code{widget_instance} tree recursively.

This has desirable properties: for example, @code{lw_modify_all_widgets},
which is called from @file{glyphs-x.c}, updates all the properties of a
widget without having to know what the widget is or what toolkit it is
from.  Unfortunately this also has hairy properties such as making the
lwlib code quite complex.  And of course lwlib has to know at some level
what the widget is and how to set its properties.
8657
8658 @node Specifiers, Menus, Glyphs, Top
8659 @chapter Specifiers
8660
8661 Not yet documented.
8662
8663 @node Menus, Subprocesses, Specifiers, Top
8664 @chapter Menus
8665
8666   A menu is set by setting the value of the variable
8667 @code{current-menubar} (which may be buffer-local) and then calling
8668 @code{set-menubar-dirty-flag} to signal a change.  This will cause the
8669 menu to be redrawn at the next redisplay.  The format of the data in
8670 @code{current-menubar} is described in @file{menubar.c}.
8671
  Internally the data in @code{current-menubar} is parsed into a tree of
@code{widget_value}'s (defined in @file{lwlib.h}); this is accomplished
8674 by the recursive function @code{menu_item_descriptor_to_widget_value()},
8675 called by @code{compute_menubar_data()}.  Such a tree is deallocated
8676 using @code{free_widget_value()}.
8677
8678   @code{update_screen_menubars()} is one of the external entry points.
8679 This checks to see, for each screen, if that screen's menubar needs to
8680 be updated.  This is the case if
8681
8682 @enumerate
8683 @item
8684 @code{set-menubar-dirty-flag} was called since the last redisplay.  (This
function sets the C variable @code{menubar_has_changed}.)
8686 @item
8687 The buffer displayed in the screen has changed.
8688 @item
8689 The screen has no menubar currently displayed.
8690 @end enumerate
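
  The three conditions above can be folded into one predicate, sketched
here.  The structure @code{sketch_screen} and the function
@code{screen_menubar_needs_update} are hypothetical, and the flag that
the text calls @code{menubar_has_changed} is passed in as a parameter
so that the example stays self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-screen bookkeeping; integer ids stand in for buffers. */
struct sketch_screen {
  bool menubar_displayed;   /* is a menubar currently shown? */
  int buffer_id;            /* buffer now displayed in the screen */
  int menubar_buffer_id;    /* buffer the menubar was last built for */
};

/* True if the menubar of screen S must be recomputed. */
bool screen_menubar_needs_update(const struct sketch_screen *s,
                                 bool menubar_has_changed)
{
  assert(s != NULL);
  return menubar_has_changed                       /* condition 1 */
         || s->buffer_id != s->menubar_buffer_id   /* condition 2 */
         || !s->menubar_displayed;                 /* condition 3 */
}
```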
8691
8692   @code{set_screen_menubar()} is called for each such screen.  This
8693 function calls @code{compute_menubar_data()} to create the tree of
@code{widget_value}'s, then calls @code{lw_create_widget()},
8695 @code{lw_modify_all_widgets()}, and/or @code{lw_destroy_all_widgets()}
8696 to create the X-Toolkit widget associated with the menu.
8697
8698   @code{update_psheets()}, the other external entry point, actually
8699 changes the menus being displayed.  It uses the widgets fixed by
8700 @code{update_screen_menubars()} and calls various X functions to ensure
8701 that the menus are displayed properly.
8702
8703   The menubar widget is set up so that @code{pre_activate_callback()} is
8704 called when the menu is first selected (i.e. mouse button goes down),
8705 and @code{menubar_selection_callback()} is called when an item is
8706 selected.  @code{pre_activate_callback()} calls the function in
@code{activate-menubar-hook}, which can change the menubar (this is described
8708 in @file{menubar.c}).  If the menubar is changed,
8709 @code{set_screen_menubars()} is called.
8710 @code{menubar_selection_callback()} enqueues a menu event, putting in it
8711 a function to call (either @code{eval} or @code{call-interactively}) and
8712 its argument, which is the callback function or form given in the menu's
8713 description.
8714
8715 @node Subprocesses, Interface to X Windows, Menus, Top
8716 @chapter Subprocesses
8717
8718   The fields of a process are:
8719
8720 @table @code
8721 @item name
8722 A string, the name of the process.
8723
8724 @item command
8725 A list containing the command arguments that were used to start this
8726 process.
8727
8728 @item filter
8729 A function used to accept output from the process instead of a buffer,
8730 or @code{nil}.
8731
8732 @item sentinel
8733 A function called whenever the process receives a signal, or @code{nil}.
8734
8735 @item buffer
8736 The associated buffer of the process.
8737
8738 @item pid
8739 An integer, the Unix process @sc{id}.
8740
8741 @item childp
8742 A flag, non-@code{nil} if this is really a child process.
8743 It is @code{nil} for a network connection.
8744
8745 @item mark
8746 A marker indicating the position of the end of the last output from this
8747 process inserted into the buffer.  This is often but not always the end
8748 of the buffer.
8749
8750 @item kill_without_query
8751 If this is non-@code{nil}, killing XEmacs while this process is still
8752 running does not ask for confirmation about killing the process.
8753
8754 @item raw_status_low
8755 @itemx raw_status_high
8756 These two fields record 16 bits each of the process status returned by
8757 the @code{wait} system call.
8758
8759 @item status
8760 The process status, as @code{process-status} should return it.
8761
8762 @item tick
8763 @itemx update_tick
8764 If these two fields are not equal, a change in the status of the process
8765 needs to be reported, either by running the sentinel or by inserting a
8766 message in the process buffer.
8767
8768 @item pty_flag
8769 Non-@code{nil} if communication with the subprocess uses a @sc{pty};
8770 @code{nil} if it uses a pipe.
8771
8772 @item infd
8773 The file descriptor for input from the process.
8774
8775 @item outfd
8776 The file descriptor for output to the process.
8777
8778 @item subtty
8779 The file descriptor for the terminal that the subprocess is using.  (On
8780 some systems, there is no need to record this, so the value is
8781 @code{-1}.)
8782
8783 @item tty_name
8784 The name of the terminal that the subprocess is using,
8785 or @code{nil} if it is using pipes.
8786 @end table
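
  The role of the @code{tick} and @code{update_tick} fields can be
illustrated with a small sketch.  The structure and function names are
hypothetical, plain integers stand in for the Lisp-level fields, and
the way @code{tick} is advanced here is an assumption made for the
example; only the rule that unequal values mean a pending report comes
from the table above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of a process object. */
struct sketch_process {
  int tick;          /* advanced whenever the status changes */
  int update_tick;   /* value of TICK when a change was last reported */
  int reports;       /* counts reports, for demonstration only */
};

/* Record that the status of P has changed. */
void note_status_change(struct sketch_process *p)
{
  assert(p != NULL);
  p->tick++;
}

/* Report a pending status change, if any.  Returns true if a report
   was made. */
bool report_if_changed(struct sketch_process *p)
{
  assert(p != NULL);
  if (p->tick == p->update_tick)
    return false;
  /* Stands in for running the sentinel or inserting a message. */
  p->reports++;
  p->update_tick = p->tick;
  return true;
}
```

  Several status changes between two checks collapse into a single report
in this sketch, because only the inequality of the two fields is tested.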
8787
8788 @node Interface to X Windows, Index , Subprocesses, Top
8789 @chapter Interface to X Windows
8790
8791 Not yet documented.
8792
8793 @include index.texi
8794
8795 @c Print the tables of contents
8796 @summarycontents
8797 @contents
8798 @c That's all
8799
8800 @bye
8801