   1 \input texinfo  @c -*-texinfo-*-
   2 @c %**start of header
   3 @setfilename ../../info/internals.info
   4 @settitle XEmacs Internals Manual
   5 @c %**end of header
   6
   7 @ifinfo
   8 @dircategory XEmacs Editor
   9 @direntry
  10 * Internals: (internals).       XEmacs Internals Manual.
  11 @end direntry
  12
  13 Copyright @copyright{} 1992 - 1996 Ben Wing.
  14 Copyright @copyright{} 1996, 1997 Sun Microsystems.
  15 Copyright @copyright{} 1994 - 1998 Free Software Foundation.
  16 Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois.
  17
  18
  19 Permission is granted to make and distribute verbatim copies of this
  20 manual provided the copyright notice and this permission notice are
  21 preserved on all copies.
  22
  23 @ignore
  24 Permission is granted to process this file through TeX and print the
  25 results, provided the printed document carries copying permission notice
  26 identical to this one except for the removal of this paragraph (this
  27 paragraph not being relevant to the printed manual).
  28
  29 @end ignore
  30 Permission is granted to copy and distribute modified versions of this
  31 manual under the conditions for verbatim copying, provided that the
  32 entire resulting derived work is distributed under the terms of a
  33 permission notice identical to this one.
  34
  35 Permission is granted to copy and distribute translations of this manual
  36 into another language, under the above conditions for modified versions,
  37 except that this permission notice may be stated in a translation
  38 approved by the Foundation.
  39
  40 Permission is granted to copy and distribute modified versions of this
  41 manual under the conditions for verbatim copying, provided also that the
  42 section entitled ``GNU General Public License'' is included exactly as
  43 in the original, and provided that the entire resulting derived work is
  44 distributed under the terms of a permission notice identical to this
  45 one.
  46
  47 Permission is granted to copy and distribute translations of this manual
  48 into another language, under the above conditions for modified versions,
  49 except that the section entitled ``GNU General Public License'' may be
  50 included in a translation approved by the Free Software Foundation
  51 instead of in the original English.
  52 @end ifinfo
  53
  54 @c Combine indices.
  55 @synindex cp fn
  56 @syncodeindex vr fn
  57 @syncodeindex ky fn
  58 @syncodeindex pg fn
  59 @syncodeindex tp fn
  60
  61 @setchapternewpage odd
  62 @finalout
  63
  64 @titlepage
  65 @title XEmacs Internals Manual
  66 @subtitle Version 1.3, August 1999
  67
  68 @author Ben Wing
  69 @author Martin Buchholz
  70 @author Hrvoje Niksic
  71 @author Matthias Neubauer
  72 @author Olivier Galibert
  73 @page
  74 @vskip 0pt plus 1fill
  75
  76 @noindent
  77 Copyright @copyright{} 1992 - 1996 Ben Wing. @*
  78 Copyright @copyright{} 1996, 1997 Sun Microsystems, Inc. @*
  79 Copyright @copyright{} 1994 - 1998 Free Software Foundation. @*
  80 Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois.
  81
  82 @sp 2
  83 Version 1.3 @*
  84 August 1999.@*
  85
  86 Permission is granted to make and distribute verbatim copies of this
  87 manual provided the copyright notice and this permission notice are
  88 preserved on all copies.
  89
  90 Permission is granted to copy and distribute modified versions of this
  91 manual under the conditions for verbatim copying, provided also that the
  92 section entitled ``GNU General Public License'' is included
  93 exactly as in the original, and provided that the entire resulting
  94 derived work is distributed under the terms of a permission notice
  95 identical to this one.
  96
  97 Permission is granted to copy and distribute translations of this manual
  98 into another language, under the above conditions for modified versions,
  99 except that the section entitled ``GNU General Public License'' may be
 100 included in a translation approved by the Free Software Foundation
 101 instead of in the original English.
 102 @end titlepage
 103 @page
 104
 105 @node Top, A History of Emacs, (dir), (dir)
 106
 107 @ifinfo
 108 This Info file contains v1.3 of the XEmacs Internals Manual, August 1999.
 109 @end ifinfo
 110
 111 @menu
 112 * A History of Emacs::          Times, dates, important events.
 113 * XEmacs From the Outside::     A broad conceptual overview.
 114 * The Lisp Language::           An overview.
 115 * XEmacs From the Perspective of Building::
 116 * XEmacs From the Inside::
 117 * The XEmacs Object System (Abstractly Speaking)::
 118 * How Lisp Objects Are Represented in C::
 119 * Rules When Writing New C Code::
 120 * A Summary of the Various XEmacs Modules::
 121 * Allocation of Objects in XEmacs Lisp::
 122 * Dumping::
 123 * Events and the Event Loop::
 124 * Evaluation; Stack Frames; Bindings::
 125 * Symbols and Variables::
 126 * Buffers and Textual Representation::
 127 * MULE Character Sets and Encodings::
 128 * The Lisp Reader and Compiler::
 129 * Lstreams::
 130 * Consoles; Devices; Frames; Windows::
 131 * The Redisplay Mechanism::
 132 * Extents::
 133 * Faces::
 134 * Glyphs::
 135 * Specifiers::
 136 * Menus::
 137 * Subprocesses::
 138 * Interface to the X Window System::
 139 * Index::
 140
 141 @detailmenu
 142
 143 --- The Detailed Node Listing ---
 144
 145 A History of Emacs
 146
 147 * Through Version 18::          Unification prevails.
 148 * Lucid Emacs::                 One version 19 Emacs.
 149 * GNU Emacs 19::                The other version 19 Emacs.
 150 * GNU Emacs 20::                The other version 20 Emacs.
 151 * XEmacs::                      The continuation of Lucid Emacs.
 152
 153 Rules When Writing New C Code
 154
 155 * General Coding Rules::
 156 * Writing Lisp Primitives::
 157 * Adding Global Lisp Variables::
 158 * Coding for Mule::
 159 * Techniques for XEmacs Developers::
 160
 161 Coding for Mule
 162
 163 * Character-Related Data Types::
 164 * Working With Character and Byte Positions::
 165 * Conversion to and from External Data::
 166 * General Guidelines for Writing Mule-Aware Code::
 167 * An Example of Mule-Aware Code::
 168
 169 A Summary of the Various XEmacs Modules
 170
 171 * Low-Level Modules::
 172 * Basic Lisp Modules::
 173 * Modules for Standard Editing Operations::
 174 * Editor-Level Control Flow Modules::
 175 * Modules for the Basic Displayable Lisp Objects::
 176 * Modules for other Display-Related Lisp Objects::
 177 * Modules for the Redisplay Mechanism::
 178 * Modules for Interfacing with the File System::
 179 * Modules for Other Aspects of the Lisp Interpreter and Object System::
 180 * Modules for Interfacing with the Operating System::
 181 * Modules for Interfacing with X Windows::
 182 * Modules for Internationalization::
 183
 184 Allocation of Objects in XEmacs Lisp
 185
 186 * Introduction to Allocation::
 187 * Garbage Collection::
 188 * GCPROing::
 189 * Garbage Collection - Step by Step::
 190 * Integers and Characters::
 191 * Allocation from Frob Blocks::
 192 * lrecords::
 193 * Low-level allocation::
 194 * Cons::
 195 * Vector::
 196 * Bit Vector::
 197 * Symbol::
 198 * Marker::
 199 * String::
 200 * Compiled Function::
 201
 202 Garbage Collection - Step by Step
 203
 204 * Invocation::
 205 * garbage_collect_1::
 206 * mark_object::
 207 * gc_sweep::
 208 * sweep_lcrecords_1::
 209 * compact_string_chars::
 210 * sweep_strings::
 211 * sweep_bit_vectors_1::
 212
 213 Dumping
 214
 215 * Overview::
 216 * Data descriptions::
 217 * Dumping phase::
 218 * Reloading phase::
 219
 220 Dumping phase
 221
 222 * Object inventory::
 223 * Address allocation::
 224 * The header::
 225 * Data dumping::
 226 * Pointers dumping::
 227
 228 Events and the Event Loop
 229
 230 * Introduction to Events::
 231 * Main Loop::
 232 * Specifics of the Event Gathering Mechanism::
 233 * Specifics About the Emacs Event::
 234 * The Event Stream Callback Routines::
 235 * Other Event Loop Functions::
 236 * Converting Events::
 237 * Dispatching Events; The Command Builder::
 238
 239 Evaluation; Stack Frames; Bindings
 240
 241 * Evaluation::
 242 * Dynamic Binding; The specbinding Stack; Unwind-Protects::
 243 * Simple Special Forms::
 244 * Catch and Throw::
 245
 246 Symbols and Variables
 247
 248 * Introduction to Symbols::
 249 * Obarrays::
 250 * Symbol Values::
 251
 252 Buffers and Textual Representation
 253
 254 * Introduction to Buffers::     A buffer holds a block of text such as a file.
 255 * The Text in a Buffer::        Representation of the text in a buffer.
 256 * Buffer Lists::                Keeping track of all buffers.
 257 * Markers and Extents::         Tagging locations within a buffer.
 258 * Bufbytes and Emchars::        Representation of individual characters.
 259 * The Buffer Object::           The Lisp object corresponding to a buffer.
 260
 261 MULE Character Sets and Encodings
 262
 263 * Character Sets::
 264 * Encodings::
 265 * Internal Mule Encodings::
 266 * CCL::
 267
 268 Encodings
 269
 270 * Japanese EUC (Extended Unix Code)::
 271 * JIS7::
 272
 273 Internal Mule Encodings
 274
 275 * Internal String Encoding::
 276 * Internal Character Encoding::
 277
 278 Lstreams
 279
 280 * Creating an Lstream::         Creating an lstream object.
 281 * Lstream Types::               Different sorts of things that are streamed.
 282 * Lstream Functions::           Functions for working with lstreams.
 283 * Lstream Methods::             Creating new lstream types.
 284
 285 Consoles; Devices; Frames; Windows
 286
 287 * Introduction to Consoles; Devices; Frames; Windows::
 288 * Point::
 289 * Window Hierarchy::
 290 * The Window Object::
 291
 292 The Redisplay Mechanism
 293
 294 * Critical Redisplay Sections::
 295 * Line Start Cache::
 296 * Redisplay Piece by Piece::
 297
 298 Extents
 299
 300 * Introduction to Extents::     Extents are ranges over text, with properties.
 301 * Extent Ordering::             How extents are ordered internally.
 302 * Format of the Extent Info::   The extent information in a buffer or string.
 303 * Zero-Length Extents::         A weird special case.
 304 * Mathematics of Extent Ordering::  A rigorous foundation.
 305 * Extent Fragments::            Cached information useful for redisplay.
 306
 307 @end detailmenu
 308 @end menu
 309
 310 @node A History of Emacs, XEmacs From the Outside, Top, Top
 311 @chapter A History of Emacs
 312 @cindex history of Emacs, a
 313 @cindex Emacs, a history of
 314 @cindex Hackers (Steven Levy)
 315 @cindex Levy, Steven
 316 @cindex ITS (Incompatible Timesharing System)
 317 @cindex Stallman, Richard
 318 @cindex RMS
 319 @cindex MIT
 320 @cindex TECO
 321 @cindex FSF
 322 @cindex Free Software Foundation
 323
 324   XEmacs is a powerful, customizable text editor and development
 325 environment.  It began as Lucid Emacs, which was in turn derived from
 326 GNU Emacs, a program written by Richard Stallman of the Free Software
 327 Foundation.  GNU Emacs dates back to the 1970's, and was modelled
 328 after a package called ``Emacs'', written in 1976, that was a set of
 329 macros on top of TECO, an old, old text editor written at MIT on the
DEC PDP-10 under one of the earliest time-sharing operating systems,
 331 ITS (Incompatible Timesharing System). (ITS dates back well before
 332 Unix.) ITS, TECO, and Emacs were products of a group of people at MIT
 333 who called themselves ``hackers'', who shared an idealistic belief
 334 system about the free exchange of information and were fanatical in
 335 their devotion to and time spent with computers. (The hacker
 336 subculture dates back to the late 1950's at MIT and is described in
 337 detail in Steven Levy's book @cite{Hackers}.  This book also includes
 338 a lot of information about Stallman himself and the development of
 339 Lisp, a programming language developed at MIT that underlies Emacs.)
 340
 341 @menu
 342 * Through Version 18::          Unification prevails.
 343 * Lucid Emacs::                 One version 19 Emacs.
 344 * GNU Emacs 19::                The other version 19 Emacs.
 345 * GNU Emacs 20::                The other version 20 Emacs.
 346 * XEmacs::                      The continuation of Lucid Emacs.
 347 @end menu
 348
 349 @node Through Version 18
 350 @section Through Version 18
 351 @cindex version 18, through
 352 @cindex Gosling, James
 353 @cindex Great Usenet Renaming
 354
 355   Although the history of the early versions of GNU Emacs is unclear,
 356 the history is well-known from the middle of 1985.  A time line is:
 357
 358 @itemize @bullet
 359 @item
 360 GNU Emacs version 15 (15.34) was released sometime in 1984 or 1985 and
 361 shared some code with a version of Emacs written by James Gosling (the
 362 same James Gosling who later created the Java language).
 363 @item
 364 GNU Emacs version 16 (first released version was 16.56) was released on
 365 July 15, 1985.  All Gosling code was removed due to potential copyright
 366 problems with the code.
 367 @item
 368 version 16.57: released on September 16, 1985.
 369 @item
 370 versions 16.58, 16.59: released on September 17, 1985.
 371 @item
version 16.60: released on September 19, 1985.  These later version 16's
incorporated patches from the net, especially for getting Emacs to work
under System V.
 375 @item
 376 version 17.36 (first official v17 release) released on December 20,
 377 1985.  Included a TeX-able user manual.  First official unpatched
 378 version that worked on vanilla System V machines.
 379 @item
 380 version 17.43 (second official v17 release) released on January 25,
 381 1986.
 382 @item
 383 version 17.45 released on January 30, 1986.
 384 @item
 385 version 17.46 released on February 4, 1986.
 386 @item
 387 version 17.48 released on February 10, 1986.
 388 @item
 389 version 17.49 released on February 12, 1986.
 390 @item
 391 version 17.55 released on March 18, 1986.
 392 @item
 393 version 17.57 released on March 27, 1986.
 394 @item
 395 version 17.58 released on April 4, 1986.
 396 @item
 397 version 17.61 released on April 12, 1986.
 398 @item
 399 version 17.63 released on May 7, 1986.
 400 @item
 401 version 17.64 released on May 12, 1986.
 402 @item
 403 version 18.24 (a beta version) released on October 2, 1986.
 404 @item
 405 version 18.30 (a beta version) released on November 15, 1986.
 406 @item
 407 version 18.31 (a beta version) released on November 23, 1986.
 408 @item
 409 version 18.32 (a beta version) released on December 7, 1986.
 410 @item
 411 version 18.33 (a beta version) released on December 12, 1986.
 412 @item
 413 version 18.35 (a beta version) released on January 5, 1987.
 414 @item
 415 version 18.36 (a beta version) released on January 21, 1987.
 416 @item
 417 January 27, 1987: The Great Usenet Renaming.  net.emacs is now
 418 comp.emacs.
 419 @item
 420 version 18.37 (a beta version) released on February 12, 1987.
 421 @item
 422 version 18.38 (a beta version) released on March 3, 1987.
 423 @item
 424 version 18.39 (a beta version) released on March 14, 1987.
 425 @item
 426 version 18.40 (a beta version) released on March 18, 1987.
 427 @item
 428 version 18.41 (the first ``official'' release) released on March 22,
 429 1987.
 430 @item
 431 version 18.45 released on June 2, 1987.
 432 @item
 433 version 18.46 released on June 9, 1987.
 434 @item
 435 version 18.47 released on June 18, 1987.
 436 @item
 437 version 18.48 released on September 3, 1987.
 438 @item
 439 version 18.49 released on September 18, 1987.
 440 @item
 441 version 18.50 released on February 13, 1988.
 442 @item
 443 version 18.51 released on May 7, 1988.
 444 @item
 445 version 18.52 released on September 1, 1988.
 446 @item
 447 version 18.53 released on February 24, 1989.
 448 @item
 449 version 18.54 released on April 26, 1989.
 450 @item
 451 version 18.55 released on August 23, 1989.  This is the earliest version
 452 that is still available by FTP.
 453 @item
 454 version 18.56 released on January 17, 1991.
 455 @item
 456 version 18.57 released late January, 1991.
 457 @item
 458 version 18.58 released ?????.
 459 @item
 460 version 18.59 released October 31, 1992.
 461 @end itemize
 462
 463 @node Lucid Emacs
 464 @section Lucid Emacs
 465 @cindex Lucid Emacs
 466 @cindex Lucid Inc.
 467 @cindex Energize
 468 @cindex Epoch
 469
 470   Lucid Emacs was developed by the (now-defunct) Lucid Inc., a maker of
 471 C++ and Lisp development environments.  It began when Lucid decided they
 472 wanted to use Emacs as the editor and cornerstone of their C++
 473 development environment (called ``Energize'').  They needed many features
 474 that were not available in the existing version of GNU Emacs (version
 475 18.5something), in particular good and integrated support for GUI
 476 elements such as mouse support, multiple fonts, multiple window-system
 477 windows, etc.  A branch of GNU Emacs called Epoch, written at the
 478 University of Illinois, existed that supplied many of these features;
 479 however, Lucid needed more than what existed in Epoch.  At the time, the
 480 Free Software Foundation was working on version 19 of Emacs (this was
 481 sometime around 1991), which was planned to have similar features, and
 482 so Lucid decided to work with the Free Software Foundation.  Their plan
 483 was to add features that they needed, and coordinate with the FSF so
 484 that the features would get included back into Emacs version 19.
 485
 486   Delays in the release of version 19 occurred, however (resulting in it
 487 finally being released more than a year after what was initially
 488 planned), and Lucid encountered unexpected technical resistance in
 489 getting their changes merged back into version 19, so they decided to
 490 release their own version of Emacs, which became Lucid Emacs 19.0.
 491
 492 @cindex Zawinski, Jamie
 493 @cindex Sexton, Harlan
 494 @cindex Benson, Eric
 495 @cindex Devin, Matthieu
 496   The initial authors of Lucid Emacs were Matthieu Devin, Harlan Sexton,
 497 and Eric Benson, and the work was later taken over by Jamie Zawinski,
 498 who became ``Mr. Lucid Emacs'' for many releases.
 499
  A time line for Lucid Emacs/XEmacs is:
 501
 502 @itemize @bullet
 503 @item
 504 version 19.0 shipped with Energize 1.0, April 1992.
 505 @item
 506 version 19.1 released June 4, 1992.
 507 @item
 508 version 19.2 released June 19, 1992.
 509 @item
 510 version 19.3 released September 9, 1992.
 511 @item
 512 version 19.4 released January 21, 1993.
 513 @item
 514 version 19.5 was a repackaging of 19.4 with a few bug fixes and
 515 shipped with Energize 2.0.  Never released to the net.
 516 @item
 517 version 19.6 released April 9, 1993.
 518 @item
 519 version 19.7 was a repackaging of 19.6 with a few bug fixes and
 520 shipped with Energize 2.1.  Never released to the net.
 521 @item
 522 version 19.8 released September 6, 1993.
 523 @item
 524 version 19.9 released January 12, 1994.
 525 @item
 526 version 19.10 released May 27, 1994.
 527 @item
 528 version 19.11 (first XEmacs) released September 13, 1994.
 529 @item
 530 version 19.12 released June 23, 1995.
 531 @item
 532 version 19.13 released September 1, 1995.
 533 @item
 534 version 19.14 released June 23, 1996.
 535 @item
 536 version 20.0 released February 9, 1997.
 537 @item
 538 version 19.15 released March 28, 1997.
 539 @item
 540 version 20.1 (not released to the net) April 15, 1997.
 541 @item
 542 version 20.2 released May 16, 1997.
 543 @item
 544 version 19.16 released October 31, 1997.
 545 @item
version 20.3 (the first stable version of XEmacs 20.x) released November 30,
1997.
@item
version 20.4 released February 28, 1998.
 549 @end itemize
 550
 551 @node GNU Emacs 19
 552 @section GNU Emacs 19
 553 @cindex GNU Emacs 19
 554 @cindex Emacs 19, GNU
 555 @cindex version 19, GNU Emacs
 556 @cindex FSF Emacs
 557
 558   About a year after the initial release of Lucid Emacs, the FSF
 559 released a beta of their version of Emacs 19 (referred to here as ``GNU
 560 Emacs'').  By this time, the current version of Lucid Emacs was
 561 19.6. (Strangely, the first released beta from the FSF was GNU Emacs
19.7.) A time line for GNU Emacs version 19 is:
 563
 564 @itemize @bullet
 565 @item
 566 version 19.8 (beta) released May 27, 1993.
 567 @item
 568 version 19.9 (beta) released May 27, 1993.
 569 @item
 570 version 19.10 (beta) released May 30, 1993.
 571 @item
 572 version 19.11 (beta) released June 1, 1993.
 573 @item
 574 version 19.12 (beta) released June 2, 1993.
 575 @item
 576 version 19.13 (beta) released June 8, 1993.
 577 @item
 578 version 19.14 (beta) released June 17, 1993.
 579 @item
 580 version 19.15 (beta) released June 19, 1993.
 581 @item
 582 version 19.16 (beta) released July 6, 1993.
 583 @item
 584 version 19.17 (beta) released late July, 1993.
 585 @item
 586 version 19.18 (beta) released August 9, 1993.
 587 @item
 588 version 19.19 (beta) released August 15, 1993.
 589 @item
 590 version 19.20 (beta) released November 17, 1993.
 591 @item
 592 version 19.21 (beta) released November 17, 1993.
 593 @item
 594 version 19.22 (beta) released November 28, 1993.
 595 @item
 596 version 19.23 (beta) released May 17, 1994.
 597 @item
 598 version 19.24 (beta) released May 16, 1994.
 599 @item
 600 version 19.25 (beta) released June 3, 1994.
 601 @item
 602 version 19.26 (beta) released September 11, 1994.
 603 @item
 604 version 19.27 (beta) released September 14, 1994.
 605 @item
 606 version 19.28 (first ``official'' release) released November 1, 1994.
 607 @item
 608 version 19.29 released June 21, 1995.
 609 @item
 610 version 19.30 released November 24, 1995.
 611 @item
 612 version 19.31 released May 25, 1996.
 613 @item
 614 version 19.32 released July 31, 1996.
 615 @item
 616 version 19.33 released August 11, 1996.
 617 @item
 618 version 19.34 released August 21, 1996.
 619 @item
 620 version 19.34b released September 6, 1996.
 621 @end itemize
 622
 623 @cindex Mlynarik, Richard
 624   In some ways, GNU Emacs 19 was better than Lucid Emacs; in some ways,
 625 worse.  Lucid soon began incorporating features from GNU Emacs 19 into
 626 Lucid Emacs; the work was mostly done by Richard Mlynarik, who had been
 627 working on and using GNU Emacs for a long time (back as far as version
 628 16 or 17).
 629
 630 @node GNU Emacs 20
 631 @section GNU Emacs 20
 632 @cindex GNU Emacs 20
 633 @cindex Emacs 20, GNU
 634 @cindex version 20, GNU Emacs
 635 @cindex FSF Emacs
 636
 637 On February 2, 1997 work began on GNU Emacs to integrate Mule.  The first
 638 release was made in September of that year.
 639
A time line for Emacs 20 is:
 641
 642 @itemize @bullet
 643 @item
 644 version 20.1 released September 17, 1997.
 645 @item
 646 version 20.2 released September 20, 1997.
 647 @item
 648 version 20.3 released August 19, 1998.
 649 @end itemize
 650
 651 @node XEmacs
 652 @section XEmacs
 653 @cindex XEmacs
 654
 655 @cindex Sun Microsystems
 656 @cindex University of Illinois
 657 @cindex Illinois, University of
 658 @cindex SPARCWorks
 659 @cindex Andreessen, Marc
 660 @cindex Baur, Steve
 661 @cindex Buchholz, Martin
 662 @cindex Kaplan, Simon
 663 @cindex Wing, Ben
 664 @cindex Thompson, Chuck
 665 @cindex Win-Emacs
 666 @cindex Epoch
 667 @cindex Amdahl Corporation
 668   Around the time that Lucid was developing Energize, Sun Microsystems
 669 was developing their own development environment (called ``SPARCWorks'')
 670 and also decided to use Emacs.  They joined forces with the Epoch team
 671 at the University of Illinois and later with Lucid.  The maintainer of
 672 the last-released version of Epoch was Marc Andreessen, but he dropped
 673 out and the Epoch project, headed by Simon Kaplan, lured Chuck Thompson
 674 away from a system administration job to become the primary Lucid Emacs
 675 author for Epoch and Sun.  Chuck's area of specialty became the
 676 redisplay engine (he replaced the old Lucid Emacs redisplay engine with
 677 a ported version from Epoch and then later rewrote it from scratch).
 678 Sun also hired Ben Wing (the author of Win-Emacs, a port of Lucid Emacs
 679 to Microsoft Windows 3.1) in 1993, for what was initially a one-month
 680 contract to fix some event problems but later became a many-year
 681 involvement, punctuated by a six-month contract with Amdahl Corporation.
 682
 683 @cindex rename to XEmacs
 684   In 1994, Sun and Lucid agreed to rename Lucid Emacs to XEmacs (a name
 685 not favorable to either company); the first release called XEmacs was
 686 version 19.11.  In June 1994, Lucid folded and Jamie quit to work for
 687 the newly formed Mosaic Communications Corp., later Netscape
 688 Communications Corp. (co-founded by the same Marc Andreessen, who had
 689 quit his Epoch job to work on a graphical browser for the World Wide
Web).  Chuck then became the primary maintainer of XEmacs, and put out
 691 versions 19.11 through 19.14 in conjunction with Ben.  For 19.12 and
 692 19.13, Chuck added the new redisplay and many other display improvements
 693 and Ben added MULE support (support for Asian and other languages) and
 694 redesigned most of the internal Lisp subsystems to better support the
 695 MULE work and the various other features being added to XEmacs.  After
 696 19.14 Chuck retired as primary maintainer and Steve Baur stepped in.
 697
 698 @cindex MULE merged XEmacs appears
 699   Soon after 19.13 was released, work began in earnest on the MULE
 700 internationalization code and the source tree was divided into two
 701 development paths.  The MULE version was initially called 19.20, but was
 702 soon renamed to 20.0.  In 1996 Martin Buchholz of Sun Microsystems took
 703 over the care and feeding of it and worked on it in parallel with the
 704 19.14 development that was occurring at the same time.  After much work
 705 by Martin, it was decided to release 20.0 ahead of 19.15 in February
 706 1997.  The source tree remained divided until 20.2 when the version 19
 707 source was finally retired at version 19.16.
 708
 709 @cindex Baur, Steve
 710 @cindex Buchholz, Martin
 711 @cindex Jones, Kyle
 712 @cindex Niksic, Hrvoje
 713 @cindex XEmacs goes it alone
  In 1997, Sun finally dropped all pretense of support for XEmacs and
Martin Buchholz left the company in November.  Since then (and, because
Steve Baur was never paid to work on XEmacs, for most of the previous
year as well), XEmacs has existed solely on the contributions of
volunteers from the free software community.  Since 1997, Hrvoje Niksic
and Kyle Jones have figured prominently in XEmacs development.
 720
 721 @cindex merging attempts
 722   Many attempts have been made to merge XEmacs and GNU Emacs, but they
 723 have consistently failed.
 724
 725   A more detailed history is contained in the XEmacs About page.
 726
 727 @node XEmacs From the Outside, The Lisp Language, A History of Emacs, Top
 728 @chapter XEmacs From the Outside
 729 @cindex XEmacs from the outside
 730 @cindex outside, XEmacs from the
 731 @cindex read-eval-print
 732
 733   XEmacs appears to the outside world as an editor, but it is really a
 734 Lisp environment.  At its heart is a Lisp interpreter; it also
 735 ``happens'' to contain many specialized object types (e.g. buffers,
 736 windows, frames, events) that are useful for implementing an editor.
 737 Some of these objects (in particular windows and frames) have
 738 displayable representations, and XEmacs provides a function
 739 @code{redisplay()} that ensures that the display of all such objects
 740 matches their internal state.  Most of the time, a standard Lisp
 741 environment is in a @dfn{read-eval-print} loop---i.e. ``read some Lisp
 742 code, execute it, and print the results''.  XEmacs has a similar loop:
 743
 744 @itemize @bullet
 745 @item
 746 read an event
 747 @item
 748 dispatch the event (i.e. ``do it'')
 749 @item
 750 redisplay
 751 @end itemize
 752
 753   Reading an event is done using the Lisp function @code{next-event},
 754 which waits for something to happen (typically, the user presses a key
 755 or moves the mouse) and returns an event object describing this.
 756 Dispatching an event is done using the Lisp function
 757 @code{dispatch-event}, which looks up the event in a keymap object (a
 758 particular kind of object that associates an event with a Lisp function)
 759 and calls that function.  The function ``does'' what the user has
 760 requested by changing the state of particular frame objects, buffer
 761 objects, etc.  Finally, @code{redisplay()} is called, which updates the
 762 display to reflect those changes just made.  Thus is an ``editor'' born.
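
  As a rough illustration of the loop described above, the following
self-contained C sketch mimics the three steps on a toy editor.
Everything in it is hypothetical and is not taken from the XEmacs
sources: the @code{toy_} names, the one-character ``events'', and the
two-entry ``keymap'' are assumptions made only for this example.  The
real @code{next-event} and @code{dispatch-event} operate on event and
keymap objects and are considerably more elaborate.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_BUFSIZE 64

/* Hypothetical toy editor: internal state plus what is "on screen". */
struct toy_editor {
    char buffer[TOY_BUFSIZE];  /* internal state, changed by commands */
    char screen[TOY_BUFSIZE];  /* what the display currently shows */
    const char *input;         /* pending events, one character each */
};

typedef void (*toy_command)(struct toy_editor *ed, char key);

/* Command: append the key to the buffer. */
void toy_self_insert(struct toy_editor *ed, char key)
{
    size_t len = strlen(ed->buffer);
    if (len + 1 < TOY_BUFSIZE) {
        ed->buffer[len] = key;
        ed->buffer[len + 1] = '\0';
    }
}

/* Command: delete the last character of the buffer, if there is one. */
void toy_delete_backward(struct toy_editor *ed, char key)
{
    size_t len = strlen(ed->buffer);
    (void) key;
    if (len > 0)
        ed->buffer[len - 1] = '\0';
}

/* Step 1: read an event.  Returns '\0' once the input is exhausted. */
char toy_next_event(struct toy_editor *ed)
{
    assert(ed->input != NULL);
    return *ed->input ? *ed->input++ : '\0';
}

/* Step 2: dispatch the event.  The "keymap" maps '<' to deletion and
   every other key to insertion, then the chosen command is called. */
void toy_dispatch_event(struct toy_editor *ed, char key)
{
    toy_command cmd = (key == '<') ? toy_delete_backward : toy_self_insert;
    cmd(ed, key);
}

/* Step 3: redisplay.  Make the screen match the internal state. */
void toy_redisplay(struct toy_editor *ed)
{
    strcpy(ed->screen, ed->buffer);
}

/* The loop itself: read, dispatch, redisplay, until input runs out. */
void toy_command_loop(struct toy_editor *ed)
{
    char key;
    while ((key = toy_next_event(ed)) != '\0') {
        toy_dispatch_event(ed, key);
        toy_redisplay(ed);
    }
}
```

  Running @code{toy_command_loop} on a zero-initialized
@code{struct toy_editor} whose @code{input} is @code{"abx<c"} leaves
both @code{buffer} and @code{screen} holding @code{"abc"}.  The point of
the sketch is only the division of labor: commands change internal
state, and a separate redisplay step brings the display back in sync.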
 763
 764 @cindex bridge, playing
 765 @cindex taxes, doing
 766 @cindex pi, calculating
 767   Note that you do not have to use XEmacs as an editor; you could just
 768 as well make it do your taxes, compute pi, play bridge, etc.  You'd just
 769 have to write functions to do those operations in Lisp.
 770
 771 @node The Lisp Language, XEmacs From the Perspective of Building, XEmacs From the Outside, Top
 772 @chapter The Lisp Language
 773 @cindex Lisp language, the
 774 @cindex Lisp vs. C
 775 @cindex C vs. Lisp
 776 @cindex Lisp vs. Java
 777 @cindex Java vs. Lisp
 778 @cindex dynamic scoping
 779 @cindex scoping, dynamic
 780 @cindex dynamic types
 781 @cindex types, dynamic
 782 @cindex Java
 783 @cindex Common Lisp
 784 @cindex Gosling, James
 785
 786   Lisp is a general-purpose language that is higher-level than C and in
 787 many ways more powerful than C.  Powerful dialects of Lisp such as
 788 Common Lisp are probably much better languages for writing very large
 789 applications than is C. (Unfortunately, for many non-technical
 790 reasons C and its successor C++ have become the dominant languages for
 791 application development.  These languages are both inadequate for
 792 extremely large applications, which is evidenced by the fact that newer,
 793 larger programs are becoming ever harder to write and are requiring ever
more programmers despite great improvements in C development environments;
 795 and by the fact that, although hardware speeds and reliability have been
 796 growing at an exponential rate, most software is still generally
 797 considered to be slow and buggy.)
 798
 799   The new Java language holds promise as a better general-purpose
 800 development language than C.  Java has many features in common with
 801 Lisp that are not shared by C (this is not a coincidence, since
 802 Java was designed by James Gosling, a former Lisp hacker).  This
 803 will be discussed more later.
 804
 805 For those used to C, here is a summary of the basic differences between
 806 C and Lisp:
 807
 808 @enumerate
 809 @item
 810 Lisp has an extremely regular syntax.  Every function, expression,
 811 and control statement is written in the form
 812
 813 @example
 814    (@var{func} @var{arg1} @var{arg2} ...)
 815 @end example
 816
 817 This is as opposed to C, which writes functions as
 818
 819 @example
   @var{func}(@var{arg1}, @var{arg2}, ...)
 821 @end example
 822
 823 but writes expressions involving operators as (e.g.)
 824
 825 @example
 826    @var{arg1} + @var{arg2}
 827 @end example
 828
 829 and writes control statements as (e.g.)
 830
 831 @example
 832    while (@var{expr}) @{ @var{statement1}; @var{statement2}; ... @}
 833 @end example
 834
 835 Lisp equivalents of the latter two would be
 836
 837 @example
 838    (+ @var{arg1} @var{arg2} ...)
 839 @end example
 840
 841 and
 842
 843 @example
 844    (while @var{expr} @var{statement1} @var{statement2} ...)
 845 @end example
 846
 847 @item
 848 Lisp is a safe language.  Assuming there are no bugs in the Lisp
 849 interpreter/compiler, it is impossible to write a program that ``core
 850 dumps'' or otherwise causes the machine to execute an illegal
 851 instruction.  This is very different from C, where perhaps the most
 852 common outcome of a bug is exactly such a crash.  A corollary of this is that
 853 the C operation of casting a pointer is impossible (and unnecessary) in
 854 Lisp, and that it is impossible to access memory outside the bounds of
 855 an array.
 856
 857 @item
 858 Programs and data are written in the same form.  The
 859 parenthesis-enclosing form described above for statements is the same
 860 form used for the most common data type in Lisp, the list.  Thus, it is
 861 possible to represent any Lisp program using Lisp data types, and for
 862 one program to construct Lisp statements and then dynamically
 863 @dfn{evaluate} them, or cause them to execute.
 864
 865 @item
 866 All objects are @dfn{dynamically typed}.  This means that part of every
 867 object is an indication of what type it is.  A Lisp program can
 868 manipulate an object without knowing what type it is, and can query an
 869 object to determine its type.  This means that, correspondingly,
 870 variables and function parameters can hold objects of any type and are
 871 not normally declared as being of any particular type.  This is opposed
 872 to the @dfn{static typing} of C, where variables can hold exactly one
 873 type of object and must be declared as such, and objects do not contain
 874 an indication of their type because it's implicit in the variables they
 875 are stored in.  It is possible in C to have a variable hold different
 876 types of objects (e.g. through the use of @code{void *} pointers or
 877 variable-argument functions), but the type information must then be
 878 passed explicitly in some other fashion, leading to additional program
 879 complexity.
 880
 881 @item
 882 Allocated memory is automatically reclaimed when it is no longer in use.
 883 This operation is called @dfn{garbage collection} and involves looking
 884 through all variables to see what memory is being pointed to, and
 885 reclaiming any memory that is not pointed to and is thus
 886 ``inaccessible'' and out of use.  This is as opposed to C, in which
 887 allocated memory must be explicitly reclaimed using @code{free()}.  If
 888 you simply drop all pointers to memory without freeing it, it becomes
 889 ``leaked'' memory that still takes up space.  Over a long period of
 890 time, this can cause your program to grow and grow until it runs out of
 891 memory.
 892
 893 @item
 894 Lisp has built-in facilities for handling errors and exceptions.  In C,
 895 when an error occurs, usually either the program exits entirely or the
 896 routine in which the error occurs returns a value indicating this.  If
 897 an error occurs in a deeply-nested routine, then every routine currently
 898 called must unwind itself normally and return an error value back up to
 899 the next routine.  This means that every routine must explicitly check
 900 for an error in all the routines it calls; if it does not do so,
 901 unexpected and often random behavior results.  This is an extremely
 902 common source of bugs in C programs.  An alternative would be to do a
 903 non-local exit using @code{longjmp()}, but that is often very dangerous
 904 because the routines that were exited past had no opportunity to clean
 905 up after themselves and may leave things in an inconsistent state,
 906 causing a crash shortly afterwards.
 907
 908 Lisp provides mechanisms to make such non-local exits safe.  When an
 909 error occurs, a routine simply signals that an error of a particular
 910 class has occurred, and a non-local exit takes place.  Any routine can
 911 trap errors occurring in routines it calls by registering an error
 912 handler for some or all classes of errors. (If no handler is registered,
 913 a default handler, generally installed by the top-level event loop, is
 914 executed; this prints out the error and continues.) Routines can also
 915 specify cleanup code (called an @dfn{unwind-protect}) that will be
 916 called when control exits from a block of code, no matter how that exit
 917 occurs---i.e. even if a function deeply nested below it causes a
 918 non-local exit back to the top level.
 919
 920 Note that this facility has appeared in some recent vintages of C, in
 921 particular Visual C++ and other PC compilers written for the Microsoft
 922 Win32 API.
 923
 924 @item
 925 In Emacs Lisp, local variables are @dfn{dynamically scoped}.  This means
 926 that if you declare a local variable in a particular function, and then
 927 call another function, that subfunction can ``see'' the local variable
 928 you declared.  This is actually considered a bug in Emacs Lisp and in
 929 all other early dialects of Lisp, and was corrected in Common Lisp. (In
 930 Common Lisp, you can still declare dynamically scoped variables if you
 931 want to---they are sometimes useful---but variables by default are
 932 @dfn{lexically scoped} as in C.)
 933 @end enumerate
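
As a rough illustration of the error-handling item above, the following
C sketch imitates handler registration and cleanup code using
@code{setjmp()} and @code{longjmp()}.  Every name in it
(@code{signal_error}, @code{run_protected}, and so on) is hypothetical
and chosen for this example; it is not how XEmacs or any Lisp
implements error handling, only a picture of the bookkeeping that Lisp
does for you.

@verbatim
```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names throughout; none of this is XEmacs source code. */

static jmp_buf *current_handler = NULL; /* innermost registered handler */
static int pending_error = 0;           /* class of the error being signalled */
static int cleanup_count = 0;           /* how many cleanups have run */

/* Signal an error of class CODE: exit non-locally to the innermost handler. */
static void signal_error(int code)
{
    assert(code != 0);
    if (current_handler == NULL)
        abort();                        /* no handler registered at all */
    pending_error = code;
    longjmp(*current_handler, 1);
}

/* A deeply nested routine that may fail. */
static void inner(int fail)
{
    if (fail)
        signal_error(42);
}

/* A routine with cleanup code.  It traps the error, runs its cleanup on
   both the normal and the error path, then passes the error on. */
static void middle(int fail)
{
    jmp_buf here;
    jmp_buf *outer = current_handler;
    int failed = 0;

    current_handler = &here;
    if (setjmp(here) == 0)
        inner(fail);
    else
        failed = 1;
    current_handler = outer;            /* unregister our handler */
    cleanup_count++;                    /* the "unwind-protect" part */
    if (failed)
        signal_error(pending_error);    /* resignal to the outer handler */
}

/* Top level: returns 0 on success, or the class of the trapped error. */
int run_protected(int fail)
{
    jmp_buf here;
    jmp_buf *outer = current_handler;
    int code = 0;

    current_handler = &here;
    if (setjmp(here) == 0)
        middle(fail);
    else
        code = pending_error;
    current_handler = outer;
    return code;
}

int cleanups_run(void)
{
    return cleanup_count;
}
```
@end verbatim

Note how much of @code{middle} is handler bookkeeping rather than real
work.  If a routine between the signaller and the handler forgets this
bookkeeping, its cleanup is silently skipped, which is the danger of
@code{longjmp()} described above.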
 934
 935 For those familiar with Lisp, Emacs Lisp is modelled after MacLisp, an
 936 early dialect of Lisp developed at MIT (no relation to the Macintosh
 937 computer).  There is a Common Lisp compatibility package available for
 938 Emacs that provides many of the features of Common Lisp.
 939
 940 The Java language is derived in many ways from C, and shares a similar
 941 syntax, but has the following features in common with Lisp (and different
 942 from C):
 943
 944 @enumerate
 945 @item
 946 Java is a safe language, like Lisp.
 947 @item
 948 Java provides garbage collection, like Lisp.
 949 @item
 950 Java has built-in facilities for handling errors and exceptions, like
 951 Lisp.
 952 @item
 953 Java has a type system that combines the best advantages of both static
 954 and dynamic typing.  Objects (except very simple types) are explicitly
 955 marked with their type, as in dynamic typing; but there is a hierarchy
 956 of types and functions are declared to accept only certain types, thus
 957 providing the increased compile-time error-checking of static typing.
 958 @end enumerate
 959
 960 The Java language also has some negative attributes:
 961
 962 @enumerate
 963 @item
 964 Java uses the edit/compile/run model of software development.  This
 965 makes it hard to use interactively.  For example, to use Java like
 966 @code{bc} it is necessary to write a special purpose, albeit tiny,
application.  In Emacs Lisp, a calculator comes built in without any
effort: one can always just type an expression in the @code{*scratch*}
buffer.
 970 @item
 971 Java tries too hard to enforce, not merely enable, portability, making
 972 ordinary access to standard OS facilities painful.  Java has an
 973 @dfn{agenda}.  I think this is why @code{chdir} is not part of standard
 974 Java, which is inexcusable.
 975 @end enumerate
 976
 977 Unfortunately, there is no perfect language.  Static typing allows a
 978 compiler to catch programmer errors and produce more efficient code, but
 979 makes programming more tedious and less fun.  For the foreseeable future,
 980 an Ideal Editing and Programming Environment (and that is what XEmacs
 981 aspires to) will be programmable in multiple languages: high level ones
 982 like Lisp for user customization and prototyping, and lower level ones
for infrastructure and industrial strength applications.  If I had my
way, XEmacs would be friendly towards the Python, Scheme, C++, ML, and
other language communities.  But there are serious technical
difficulties to achieving that goal.
 987
 988 The word @dfn{application} in the previous paragraph was used
 989 intentionally.  XEmacs implements an API for programs written in Lisp
 990 that makes it a full-fledged application platform, very much like an OS
 991 inside the real OS.
 992
 993 @node XEmacs From the Perspective of Building, XEmacs From the Inside, The Lisp Language, Top
 994 @chapter XEmacs From the Perspective of Building
 995 @cindex XEmacs from the perspective of building
 996 @cindex building, XEmacs from the perspective of
 997
 998 The heart of XEmacs is the Lisp environment, which is written in C.
 999 This is contained in the @file{src/} subdirectory.  Underneath
1000 @file{src/} are two subdirectories of header files: @file{s/} (header
1001 files for particular operating systems) and @file{m/} (header files for
1002 particular machine types).  In practice the distinction between the two
1003 types of header files is blurred.  These header files define or undefine
1004 certain preprocessor constants and macros to indicate particular
1005 characteristics of the associated machine or operating system.  As part
of the configure process, one @file{s/} file and one @file{m/} file are
1007 identified for the particular environment in which XEmacs is being
1008 built.
1009
1010 XEmacs also contains a great deal of Lisp code.  This implements the
1011 operations that make XEmacs useful as an editor as well as just a Lisp
1012 environment, and also contains many add-on packages that allow XEmacs to
1013 browse directories, act as a mail and Usenet news reader, compile Lisp
1014 code, etc.  There is actually more Lisp code than C code associated with
1015 XEmacs, but much of the Lisp code is peripheral to the actual operation
1016 of the editor.  The Lisp code all lies in subdirectories underneath the
1017 @file{lisp/} directory.
1018
1019 The @file{lwlib/} directory contains C code that implements a
1020 generalized interface onto different X widget toolkits and also
1021 implements some widgets of its own that behave like Motif widgets but
1022 are faster, free, and in some cases more powerful.  The code in this
1023 directory compiles into a library and is mostly independent from XEmacs.
1024
1025 The @file{etc/} directory contains various data files associated with
1026 XEmacs.  Some of them are actually read by XEmacs at startup; others
1027 merely contain useful information of various sorts.
1028
1029 The @file{lib-src/} directory contains C code for various auxiliary
1030 programs that are used in connection with XEmacs.  Some of them are used
1031 during the build process; others are used to perform certain functions
1032 that cannot conveniently be placed in the XEmacs executable (e.g. the
1033 @file{movemail} program for fetching mail out of @file{/var/spool/mail},
1034 which must be setgid to @file{mail} on many systems; and the
1035 @file{gnuclient} program, which allows an external script to communicate
1036 with a running XEmacs process).
1037
1038 The @file{man/} directory contains the sources for the XEmacs
1039 documentation.  It is mostly in a form called Texinfo, which can be
1040 converted into either a printed document (by passing it through @TeX{})
1041 or into on-line documentation called @dfn{info files}.
1042
1043 The @file{info/} directory contains the results of formatting the XEmacs
1044 documentation as @dfn{info files}, for on-line use.  These files are
1045 used when you enter the Info system using @kbd{C-h i} or through the
1046 Help menu.
1047
1048 The @file{dynodump/} directory contains auxiliary code used to build
1049 XEmacs on Solaris platforms.
1050
1051 The other directories contain various miscellaneous code and information
1052 that is not normally used or needed.
1053
1054 The first step of building involves running the @file{configure} program
1055 and passing it various parameters to specify any optional features you
1056 want and compiler arguments and such, as described in the @file{INSTALL}
1057 file.  This determines what the build environment is, chooses the
appropriate @file{s/} and @file{m/} files, and runs a series of tests to
1059 determine many details about your environment, such as which library
1060 functions are available and exactly how they work.  The reason for
1061 running these tests is that it allows XEmacs to be compiled on a much
1062 wider variety of platforms than those that the XEmacs developers happen
1063 to be familiar with, including various sorts of hybrid platforms.  This
1064 is especially important now that many operating systems give you a great
1065 deal of control over exactly what features you want installed, and allow
1066 for easy upgrading of parts of a system without upgrading the rest.  It
1067 would be impossible to pre-determine and pre-specify the information for
1068 all possible configurations.
1069
1070 In fact, the @file{s/} and @file{m/} files are basically @emph{evil},
1071 since they contain unmaintainable platform-specific hard-coded
1072 information.  XEmacs has been moving in the direction of having all
1073 system-specific information be determined dynamically by
1074 @file{configure}.  Perhaps someday we can @code{rm -rf src/s src/m}.
1075
When @file{configure} is done running, it generates @file{Makefile}s and
1077 @file{GNUmakefile}s and the file @file{src/config.h} (which describes
1078 the features of your system) from template files.  You then run
1079 @file{make}, which compiles the auxiliary code and programs in
1080 @file{lib-src/} and @file{lwlib/} and the main XEmacs executable in
1081 @file{src/}.  The result of compiling and linking is an executable
1082 called @file{temacs}, which is @emph{not} the final XEmacs executable.
1083 @file{temacs} by itself is not intended to function as an editor or even
1084 display any windows on the screen, and if you simply run it, it will
1085 exit immediately.  The @file{Makefile} runs @file{temacs} with certain
1086 options that cause it to initialize itself, read in a number of basic
1087 Lisp files, and then dump itself out into a new executable called
1088 @file{xemacs}.  This new executable has been pre-initialized and
1089 contains pre-digested Lisp code that is necessary for the editor to
1090 function (this includes most basic editing functions,
1091 e.g. @code{kill-line}, that can be defined in terms of other Lisp
1092 primitives; some initialization code that is called when certain
1093 objects, such as frames, are created; and all of the standard
1094 keybindings and code for the actions they result in).  This executable,
1095 @file{xemacs}, is the executable that you run to use the XEmacs editor.
1096
1097 Although @file{temacs} is not intended to be run as an editor, it can,
1098 by using the incantation @code{temacs -batch -l loadup.el run-temacs}.
1099 This is useful when the dumping procedure described above is broken, or
when using certain program debugging tools such as Purify.  These tools
get mighty confused by the tricks played by the XEmacs build process,
such as allocating memory in one process and freeing it in the next.
1103
1104 @node XEmacs From the Inside, The XEmacs Object System (Abstractly Speaking), XEmacs From the Perspective of Building, Top
1105 @chapter XEmacs From the Inside
1106 @cindex XEmacs from the inside
1107 @cindex inside, XEmacs from the
1108
1109 Internally, XEmacs is quite complex, and can be very confusing.  To
1110 simplify things, it can be useful to think of XEmacs as containing an
1111 event loop that ``drives'' everything, and a number of other subsystems,
1112 such as a Lisp engine and a redisplay mechanism.  Each of these other
1113 subsystems exists simultaneously in XEmacs, and each has a certain
1114 state.  The flow of control continually passes in and out of these
1115 different subsystems in the course of normal operation of the editor.
1116
1117 It is important to keep in mind that, most of the time, the editor is
1118 ``driven'' by the event loop.  Except during initialization and batch
1119 mode, all subsystems are entered directly or indirectly through the
1120 event loop, and ultimately, control exits out of all subsystems back up
1121 to the event loop.  This cycle of entering a subsystem, exiting back out
1122 to the event loop, and starting another iteration of the event loop
1123 occurs once each keystroke, mouse motion, etc.
1124
1125 If you're trying to understand a particular subsystem (other than the
1126 event loop), think of it as a ``daemon'' process or ``servant'' that is
1127 responsible for one particular aspect of a larger system, and
1128 periodically receives commands or environment changes that cause it to
1129 do something.  Ultimately, these commands and environment changes are
1130 always triggered by the event loop.  For example:
1131
1132 @itemize @bullet
1133 @item
1134 The window and frame mechanism is responsible for keeping track of what
1135 windows and frames exist, what buffers are in them, etc.  It is
1136 periodically given commands (usually from the user) to make a change to
1137 the current window/frame state: i.e. create a new frame, delete a
1138 window, etc.
1139
1140 @item
1141 The buffer mechanism is responsible for keeping track of what buffers
1142 exist and what text is in them.  It is periodically given commands
1143 (usually from the user) to insert or delete text, create a buffer, etc.
1144 When it receives a text-change command, it notifies the redisplay
1145 mechanism.
1146
1147 @item
1148 The redisplay mechanism is responsible for making sure that windows and
1149 frames are displayed correctly.  It is periodically told (by the event
1150 loop) to actually ``do its job'', i.e. snoop around and see what the
1151 current state of the environment (mostly of the currently-existing
1152 windows, frames, and buffers) is, and make sure that that state matches
1153 what's actually displayed.  It keeps lots and lots of information around
1154 (such as what is actually being displayed currently, and what the
1155 environment was last time it checked) so that it can minimize the work
1156 it has to do.  It is also helped along in that whenever a relevant
1157 change to the environment occurs, the redisplay mechanism is told about
1158 this, so it has a pretty good idea of where it has to look to find
1159 possible changes and doesn't have to look everywhere.
1160
1161 @item
1162 The Lisp engine is responsible for executing the Lisp code in which most
1163 user commands are written.  It is entered through a call to @code{eval}
1164 or @code{funcall}, which occurs as a result of dispatching an event from
1165 the event loop.  The functions it calls issue commands to the buffer
1166 mechanism, the window/frame subsystem, etc.
1167
1168 @item
1169 The Lisp allocation subsystem is responsible for keeping track of Lisp
1170 objects.  It is given commands from the Lisp engine to allocate objects,
1171 garbage collect, etc.
1172 @end itemize
1173
1174 etc.
1175
1176   The important idea here is that there are a number of independent
1177 subsystems each with its own responsibility and persistent state, just
1178 like different employees in a company, and each subsystem is
1179 periodically given commands from other subsystems.  Commands can flow
1180 from any one subsystem to any other, but there is usually some sort of
1181 hierarchy, with all commands originating from the event subsystem.
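
The relationship between the event loop and two of the subsystems above
can be pictured with the following C sketch.  It is a toy model under
stated assumptions: the structures and function names
(@code{struct editor}, @code{event_loop}, @code{redisplay_update}, and
so on) are invented for this example and do not correspond to the real
XEmacs data structures.  It shows only that each subsystem keeps its
own persistent state, that the buffer mechanism notifies redisplay of
changes, and that redisplay does its job once per iteration of the
loop.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch; these types and names are not the XEmacs ones. */

enum event_type { EV_KEY, EV_DELETE, EV_QUIT };

struct event {
    enum event_type type;
    char ch;                    /* the character typed, for EV_KEY */
};

struct buffer {                 /* state of the buffer subsystem */
    char text[64];
    size_t len;
};

struct redisplay {              /* state of the redisplay subsystem */
    char shown[64];             /* what is currently "on the screen" */
    int dirty;                  /* set when told about a change */
    int redraws;                /* how often real work was needed */
};

struct editor {
    struct buffer buf;
    struct redisplay disp;
};

void editor_init(struct editor *e)
{
    memset(e, 0, sizeof *e);
}

/* The buffer mechanism notifies redisplay whenever the text changes. */
static void redisplay_note_change(struct redisplay *d)
{
    d->dirty = 1;
}

static void buffer_insert(struct editor *e, char ch)
{
    if (e->buf.len + 1 < sizeof e->buf.text) {
        e->buf.text[e->buf.len++] = ch;
        e->buf.text[e->buf.len] = '\0';
        redisplay_note_change(&e->disp);
    }
}

static void buffer_delete_last(struct editor *e)
{
    if (e->buf.len > 0) {
        e->buf.text[--e->buf.len] = '\0';
        redisplay_note_change(&e->disp);
    }
}

/* Redisplay "does its job" only if something relevant changed. */
static void redisplay_update(struct editor *e)
{
    if (!e->disp.dirty)
        return;
    assert(e->buf.len < sizeof e->disp.shown);
    memcpy(e->disp.shown, e->buf.text, e->buf.len + 1);
    e->disp.dirty = 0;
    e->disp.redraws++;
}

/* The event loop: one iteration per event, always ending in redisplay. */
void event_loop(struct editor *e, const struct event *events, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        switch (events[i].type) {
        case EV_KEY:
            buffer_insert(e, events[i].ch);
            break;
        case EV_DELETE:
            buffer_delete_last(e);
            break;
        case EV_QUIT:
            return;
        }
        redisplay_update(e);
    }
}
```
@end verbatim

Because the buffer code sets the @code{dirty} flag, an event that
changes nothing (such as a deletion in an empty buffer) costs redisplay
almost no work, which is the point made in the redisplay item above.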
1182
1183   XEmacs is entered in @code{main()}, which is in @file{emacs.c}.  When
1184 this is called the first time (in a properly-invoked @file{temacs}), it
1185 does the following:
1186
1187 @enumerate
1188 @item
1189 It does some very basic environment initializations, such as determining
1190 where it and its directories (e.g. @file{lisp/} and @file{etc/}) reside
1191 and setting up signal handlers.
1192 @item
1193 It initializes the entire Lisp interpreter.
1194 @item
1195 It sets the initial values of many built-in variables (including many
1196 variables that are visible to Lisp programs), such as the global keymap
1197 object and the built-in faces (a face is an object that describes the
1198 display characteristics of text).  This involves creating Lisp objects
1199 and thus is dependent on step (2).
1200 @item
1201 It performs various other initializations that are relevant to the
1202 particular environment it is running in, such as retrieving environment
1203 variables, determining the current date and the user who is running the
1204 program, examining its standard input, creating any necessary file
1205 descriptors, etc.
1206 @item
1207 At this point, the C initialization is complete.  A Lisp program that
1208 was specified on the command line (usually @file{loadup.el}) is called
(@file{temacs} is normally invoked as @code{temacs -batch -l loadup.el dump}).
@file{loadup.el} loads all of the other Lisp files that are needed for
the operation of the editor, calls the @code{dump-emacs} function to
write out @file{xemacs}, and then kills the @file{temacs} process.
1213 @end enumerate
1214
1215   When @file{xemacs} is then run, it only redoes steps (1) and (4)
1216 above; all variables already contain the values they were set to when
1217 the executable was dumped, and all memory that was allocated with
1218 @code{malloc()} is still around. (XEmacs knows whether it is being run
1219 as @file{xemacs} or @file{temacs} because it sets the global variable
1220 @code{initialized} to 1 after step (4) above.) At this point,
1221 @file{xemacs} calls a Lisp function to do any further initialization,
1222 which includes parsing the command-line (the C code can only do limited
1223 command-line parsing, which includes looking for the @samp{-batch} and
1224 @samp{-l} flags and a few other flags that it needs to know about before
1225 initialization is complete), creating the first frame (or @dfn{window}
1226 in standard window-system parlance), running the user's init file
1227 (usually the file @file{.emacs} in the user's home directory), etc.  The
1228 function to do this is usually called @code{normal-top-level};
1229 @file{loadup.el} tells the C code about this function by setting its
1230 name as the value of the Lisp variable @code{top-level}.
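
The effect of the @code{initialized} flag can be sketched in C as
follows.  This is a simulation, not the code in @file{emacs.c}: only
the variable name @code{initialized} comes from the description above,
and the four @code{init_...} functions and @code{startup} are
hypothetical stand-ins for steps (1) through (4).  Calling
@code{startup} twice in one process stands in for running
@file{temacs} and then the dumped @file{xemacs}, since dumping
preserves the values of static variables.

@verbatim
```c
#include <assert.h>
#include <string.h>

/* Hypothetical simulation; only `initialized' is named in the text. */

static int initialized = 0;     /* survives the dump with the value 1 */
static char trace[16];          /* records which steps were run */

static void record_step(char c)
{
    size_t n = strlen(trace);

    assert(n + 1 < sizeof trace);
    trace[n] = c;
    trace[n + 1] = '\0';
}

static void init_environment(void)       { record_step('1'); }
static void init_lisp(void)              { record_step('2'); }
static void init_builtin_variables(void) { record_step('3'); }
static void init_runtime(void)           { record_step('4'); }

/* Run the startup sequence and return the list of steps performed. */
const char *startup(void)
{
    trace[0] = '\0';
    init_environment();                 /* step (1): always redone */
    if (!initialized) {
        init_lisp();                    /* step (2): only before dumping */
        init_builtin_variables();       /* step (3): depends on step (2) */
    }
    init_runtime();                     /* step (4): always redone */
    initialized = 1;
    return trace;
}
```
@end verbatim

The first call to @code{startup} performs steps 1, 2, 3, and 4; the
second performs only steps 1 and 4, as described for @file{xemacs}
above.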
1231
1232   When the Lisp initialization code is done, the C code enters the event
1233 loop, and stays there for the duration of the XEmacs process.  The code
1234 for the event loop is contained in @file{cmdloop.c}, and is called
1235 @code{Fcommand_loop_1()}.  Note that this event loop could very well be
1236 written in Lisp, and in fact a Lisp version exists; but apparently,
1237 doing this makes XEmacs run noticeably slower.
1238
1239   Notice how much of the initialization is done in Lisp, not in C.
1240 In general, XEmacs tries to move as much code as is possible
1241 into Lisp.  Code that remains in C is code that implements the
1242 Lisp interpreter itself, or code that needs to be very fast, or
1243 code that needs to do system calls or other such stuff that
1244 needs to be done in C, or code that needs to have access to
1245 ``forbidden'' structures. (One conscious aspect of the design of
1246 Lisp under XEmacs is a clean separation between the external
1247 interface to a Lisp object's functionality and its internal
1248 implementation.  Part of this design is that Lisp programs
1249 are forbidden from accessing the contents of the object other
1250 than through using a standard API.  In this respect, XEmacs Lisp
1251 is similar to modern Lisp dialects but differs from GNU Emacs,
1252 which tends to expose the implementation and allow Lisp
1253 programs to look at it directly.  The major advantage of
1254 hiding the implementation is that it allows the implementation
1255 to be redesigned without affecting any Lisp programs, including
1256 those that might want to be ``clever'' by looking directly at
1257 the object's contents and possibly manipulating them.)
1258
1259   Moving code into Lisp makes the code easier to debug and maintain and
1260 makes it much easier for people who are not XEmacs developers to
1261 customize XEmacs, because they can make a change with much less chance
1262 of obscure and unwanted interactions occurring than if they were to
1263 change the C code.
1264
1265 @node The XEmacs Object System (Abstractly Speaking), How Lisp Objects Are Represented in C, XEmacs From the Inside, Top
1266 @chapter The XEmacs Object System (Abstractly Speaking)
1267 @cindex XEmacs object system (abstractly speaking), the
1268 @cindex object system (abstractly speaking), the XEmacs
1269
1270   At the heart of the Lisp interpreter is its management of objects.
1271 XEmacs Lisp contains many built-in objects, some of which are
1272 simple and others of which can be very complex; and some of which
1273 are very common, and others of which are rarely used or are only
1274 used internally. (Since the Lisp allocation system, with its
1275 automatic reclamation of unused storage, is so much more convenient
1276 than @code{malloc()} and @code{free()}, the C code makes extensive use of it
1277 in its internal operations.)
1278
1279   The basic Lisp objects are
1280
1281 @table @code
1282 @item integer
1283 28 or 31 bits of precision, or 60 or 63 bits on 64-bit machines; the
1284 reason for this is described below when the internal Lisp object
1285 representation is described.
1286 @item float
1287 Same precision as a double in C.
1288 @item cons
1289 A simple container for two Lisp objects, used to implement lists and
1290 most other data structures in Lisp.
1291 @item char
1292 An object representing a single character of text; chars behave like
1293 integers in many ways but are logically considered text rather than
numbers and have a different read syntax. (The read syntax for a char
1295 contains the char itself or some textual encoding of it---for example,
1296 a Japanese Kanji character might be encoded as @samp{^[$(B#&^[(B} using the
1297 ISO-2022 encoding standard---rather than the numerical representation
1298 of the char; this way, if the mapping between chars and integers
1299 changes, which is quite possible for Kanji characters and other extended
1300 characters, the same character will still be created.  Note that some
1301 primitives confuse chars and integers.  The worst culprit is @code{eq},
1302 which makes a special exception and considers a char to be @code{eq} to
1303 its integer equivalent, even though in no other case are objects of two
1304 different types @code{eq}.  The reason for this monstrosity is
1305 compatibility with existing code; the separation of char from integer
1306 came fairly recently.)
1307 @item symbol
1308 An object that contains Lisp objects and is referred to by name;
1309 symbols are used to implement variables and named functions
1310 and to provide the equivalent of preprocessor constants in C.
1311 @item vector
1312 A one-dimensional array of Lisp objects providing constant-time access
1313 to any of the objects; access to an arbitrary object in a vector is
1314 faster than for lists, but the operations that can be done on a vector
1315 are more limited.
1316 @item string
1317 Self-explanatory; behaves much like a vector of chars
1318 but has a different read syntax and is stored and manipulated
1319 more compactly.
1320 @item bit-vector
1321 A vector of bits; similar to a string in spirit.
1322 @item compiled-function
1323 An object containing compiled Lisp code, known as @dfn{byte code}.
1324 @item subr
1325 A Lisp primitive, i.e. a Lisp-callable function implemented in C.
1326 @end table
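
To make the role of the cons object concrete, here is a minimal C
sketch of how a container for two objects is enough to build a list.
The layout is an assumption made for illustration: @code{struct obj},
its type tags, and the helper functions are hypothetical and are not
the internal representation that XEmacs actually uses.

@verbatim
```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy objects; not the XEmacs Lisp object layout. */

enum obj_type { T_NIL, T_INT, T_CONS };

struct obj {
    enum obj_type type;         /* every object carries its own type */
    union {
        long integer;
        struct { struct obj *car, *cdr; } cons;
    } u;
};

static struct obj nil_object = { T_NIL, { 0 } };

/* The empty list. */
struct obj *nil(void)
{
    return &nil_object;
}

struct obj *make_int(long value)
{
    struct obj *o = malloc(sizeof *o);

    if (o == NULL)
        abort();
    o->type = T_INT;
    o->u.integer = value;
    return o;
}

/* A cons is just a pair of pointers to two other objects. */
struct obj *cons(struct obj *car, struct obj *cdr)
{
    struct obj *o = malloc(sizeof *o);

    if (o == NULL)
        abort();
    o->type = T_CONS;
    o->u.cons.car = car;
    o->u.cons.cdr = cdr;
    return o;
}

struct obj *car(const struct obj *o)
{
    assert(o->type == T_CONS);
    return o->u.cons.car;
}

struct obj *cdr(const struct obj *o)
{
    assert(o->type == T_CONS);
    return o->u.cons.cdr;
}

/* A list is a chain of conses linked through their cdrs. */
long list_length(const struct obj *list)
{
    long n = 0;

    while (list->type == T_CONS) {
        n++;
        list = list->u.cons.cdr;
    }
    return n;
}
```
@end verbatim

This toy never frees anything.  In the real system, unreachable conses
are reclaimed by the garbage collector, so there is no counterpart to
@code{free()} at this level.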
1327
1328 @cindex closure
1329 Note that there is no basic ``function'' type, as in more powerful
1330 versions of Lisp (where it's called a @dfn{closure}).  XEmacs Lisp does
1331 not provide the closure semantics implemented by Common Lisp and Scheme.
1332 The guts of a function in XEmacs Lisp are represented in one of four
1333 ways: a symbol specifying another function (when one function is an
1334 alias for another), a list (whose first element must be the symbol
1335 @code{lambda}) containing the function's source code, a
1336 compiled-function object, or a subr object. (In other words, given a
1337 symbol specifying the name of a function, calling @code{symbol-function}
1338 to retrieve the contents of the symbol's function cell will return one
1339 of these types of objects.)
1340
1341 XEmacs Lisp also contains numerous specialized objects used to implement
1342 the editor:
1343
1344 @table @code
1345 @item buffer
1346 Stores text like a string, but is optimized for insertion and deletion
1347 and has certain other properties that can be set.
1348 @item frame
1349 An object with various properties whose displayable representation is a
1350 @dfn{window} in window-system parlance.
1351 @item window
1352 A section of a frame that displays the contents of a buffer;
1353 often called a @dfn{pane} in window-system parlance.
1354 @item window-configuration
1355 An object that represents a saved configuration of windows in a frame.
1356 @item device
1357 An object representing a screen on which frames can be displayed;
1358 equivalent to a @dfn{display} in the X Window System and a @dfn{TTY} in
1359 character mode.
1360 @item face
1361 An object specifying the appearance of text or graphics; it has
1362 properties such as font, foreground color, and background color.
1363 @item marker
1364 An object that refers to a particular position in a buffer and moves
1365 around as text is inserted and deleted to stay in the same relative
1366 position to the text around it.
1367 @item extent
1368 Similar to a marker but covers a range of text in a buffer; can also
1369 specify properties of the text, such as a face in which the text is to
1370 be displayed, whether the text is invisible or unmodifiable, etc.
1371 @item event
1372 Generated by calling @code{next-event} and contains information
1373 describing a particular event happening in the system, such as the user
1374 pressing a key or a process terminating.
1375 @item keymap
1376 An object that maps from events (described using lists, vectors, and
1377 symbols rather than with an event object because the mapping is for
1378 classes of events, rather than individual events) to functions to
1379 execute or other events to recursively look up; the functions are
1380 described by name, using a symbol, or using lists to specify the
1381 function's code.
1382 @item glyph
An object that describes the appearance of an image (e.g. a pixmap) on
1384 the screen; glyphs can be attached to the beginning or end of extents
1385 and in some future version of XEmacs will be able to be inserted
1386 directly into a buffer.
1387 @item process
1388 An object that describes a connection to an externally-running process.
1389 @end table
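
The behavior described for markers can be sketched in C as follows.
This is an illustration under stated assumptions, not the XEmacs
implementation: the fixed-size @code{struct buffer}, the
@code{struct marker} holding a bare offset, and all function names are
hypothetical.  One policy choice is also assumed here, namely that a
marker sitting exactly at the insertion point does not advance.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch; not the XEmacs buffer or marker structures. */

#define TEXT_MAX 128
#define MARKERS_MAX 4

struct marker {
    size_t pos;                 /* offset of the marker in the text */
};

struct buffer {
    char text[TEXT_MAX];
    size_t len;
    struct marker *markers[MARKERS_MAX];
    size_t nmarkers;
};

void buffer_init(struct buffer *b, const char *s)
{
    assert(strlen(s) < TEXT_MAX);
    strcpy(b->text, s);
    b->len = strlen(s);
    b->nmarkers = 0;
}

/* Attach marker M to buffer B at POS; returns 1 on success. */
int buffer_add_marker(struct buffer *b, struct marker *m, size_t pos)
{
    if (b->nmarkers == MARKERS_MAX || pos > b->len)
        return 0;
    m->pos = pos;
    b->markers[b->nmarkers++] = m;
    return 1;
}

/* Insert S at offset AT; markers after AT move right with the text. */
int buffer_insert(struct buffer *b, size_t at, const char *s)
{
    size_t n = strlen(s), i;

    if (at > b->len || b->len + n >= TEXT_MAX)
        return 0;
    memmove(b->text + at + n, b->text + at, b->len - at + 1);
    memcpy(b->text + at, s, n);
    b->len += n;
    for (i = 0; i < b->nmarkers; i++)
        if (b->markers[i]->pos > at)    /* assumed policy: no advance at AT */
            b->markers[i]->pos += n;
    return 1;
}

/* Delete the text in [FROM, TO); markers inside collapse to FROM. */
int buffer_delete(struct buffer *b, size_t from, size_t to)
{
    size_t i;

    if (from > to || to > b->len)
        return 0;
    memmove(b->text + from, b->text + to, b->len - to + 1);
    b->len -= to - from;
    for (i = 0; i < b->nmarkers; i++) {
        struct marker *m = b->markers[i];

        if (m->pos >= to)
            m->pos -= to - from;
        else if (m->pos > from)
            m->pos = from;
    }
    return 1;
}
```
@end verbatim

A marker placed before the @samp{w} of @samp{hello world} still points
at that @samp{w} after text is inserted or deleted in front of it; only
when the text around the marker itself is deleted does it fall back to
the start of the deleted region.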
1390
1391   There are some other, less-commonly-encountered general objects:
1392
1393 @table @code
1394 @item hash-table
1395 An object that maps from an arbitrary Lisp object to another arbitrary
1396 Lisp object, using hashing for fast lookup.
1397 @item obarray
1398 A limited form of hash-table that maps from strings to symbols; obarrays
1399 are used to look up a symbol given its name and are not actually their
1400 own object type but are kludgily represented using vectors with hidden
1401 fields (this representation derives from GNU Emacs).
1402 @item specifier
1403 A complex object used to specify the value of a display property; a
1404 default value is given and different values can be specified for
1405 particular frames, buffers, windows, devices, or classes of device.
1406 @item char-table
1407 An object that maps from chars or classes of chars to arbitrary Lisp
1408 objects; internally char tables use a complex nested-vector
1409 representation that is optimized to the way characters are represented
1410 as integers.
1411 @item range-table
1412 An object that maps from ranges of integers to arbitrary Lisp objects.
1413 @end table
1414
1415   And some strange special-purpose objects:
1416
1417 @table @code
1418 @item charset
1419 @itemx coding-system
1420 Objects used when MULE, or multi-lingual/Asian-language, support is
1421 enabled.
1422 @item color-instance
1423 @itemx font-instance
1424 @itemx image-instance
1425 An object that encapsulates a window-system resource; instances are
1426 mostly used internally but are exposed on the Lisp level for cleanness
1427 of the specifier model and because it's occasionally useful for Lisp
1428 programs to create or query the properties of instances.
1429 @item subwindow
1430 An object that encapsulates a @dfn{subwindow} resource, i.e. a
1431 window-system child window that is drawn into by an external process;
1432 this object should be integrated into the glyph system but isn't yet,
1433 and may change form when this is done.
1434 @item tooltalk-message
1435 @itemx tooltalk-pattern
1436 Objects that represent resources used in the ToolTalk interprocess
1437 communication protocol.
1438 @item toolbar-button
1439 An object used in conjunction with the toolbar.
1440 @end table
1441
1442   And objects that are only used internally:
1443
1444 @table @code
1445 @item opaque
1446 A generic object for encapsulating arbitrary memory; this allows you the
1447 generality of @code{malloc()} and the convenience of the Lisp object
1448 system.
1449 @item lstream
1450 A buffering I/O stream, used to provide a unified interface to anything
1451 that can accept output or provide input, such as a file descriptor, a
1452 stdio stream, a chunk of memory, a Lisp buffer, a Lisp string, etc.;
1453 it's a Lisp object to make its memory management more convenient.
1454 @item char-table-entry
1455 Subsidiary objects in the internal char-table representation.
1456 @item extent-auxiliary
1457 @itemx menubar-data
1458 @itemx toolbar-data
1459 Various special-purpose objects that are basically just used to
1460 encapsulate memory for particular subsystems, similar to the more
1461 general ``opaque'' object.
1462 @item symbol-value-forward
1463 @itemx symbol-value-buffer-local
1464 @itemx symbol-value-varalias
1465 @itemx symbol-value-lisp-magic
1466 Special internal-only objects that are placed in the value cell of a
1467 symbol to indicate that there is something special with this variable --
1468 e.g. it has no value, it mirrors another variable, or it mirrors some C
1469 variable; there is really only one kind of object, called a
1470 @dfn{symbol-value-magic}, but it is sort-of halfway kludged into
1471 semi-different object types.
1472 @end table
1473
1474 @cindex permanent objects
1475 @cindex temporary objects
1476   Some types of objects are @dfn{permanent}, meaning that once created,
1477 they do not disappear until explicitly destroyed, using a function such
1478 as @code{delete-buffer}, @code{delete-window}, @code{delete-frame}, etc.
1479 Others will disappear once they are no longer used, through the garbage
1480 collection mechanism.  Buffers, frames, windows, devices, and processes
1481 are among the objects that are permanent.  Note that some objects can go
1482 both ways: Faces can be created either way; extents are normally
1483 permanent, but detached extents (extents not referring to any text, as
1484 happens to some extents when the text they are referring to is deleted)
1485 are temporary.  Note that some permanent objects, such as faces and
1486 coding systems, cannot be deleted.  Note also that windows are unique in
1487 that they can be @emph{undeleted} after having previously been
1488 deleted. (This happens as a result of restoring a window configuration.)
1489
1490 @cindex read syntax
1491   Note that many types of objects have a @dfn{read syntax}, i.e. a way of
1492 specifying an object of that type in Lisp code.  When you load a Lisp
1493 file, or type in code to be evaluated, what really happens is that the
1494 function @code{read} is called, which reads some text and creates an object
1495 based on the syntax of that text; then @code{eval} is called, which
1496 possibly does something special; then this loop repeats until there's
1497 no more text to read. (@code{eval} only actually does something special
1498 with symbols, which causes the symbol's value to be returned,
1499 similar to referencing a variable; and with conses [i.e. lists],
1500 which cause a function invocation.  All other values are returned
1501 unchanged.)
1502
1503   The read syntax
1504
1505 @example
1506 17297
1507 @end example
1508
1509 converts to an integer whose value is 17297.
1510
1511 @example
1512 1.983e-4
1513 @end example
1514
1515 converts to a float whose value is 1.983e-4, or .0001983.
1516
1517 @example
1518 ?b
1519 @end example
1520
1521 converts to a char that represents the lowercase letter b.
1522
1523 @example
1524 ?^[$(B#&^[(B
1525 @end example
1526
1527 (where @samp{^[} actually is an @samp{ESC} character) converts to a
1528 particular Kanji character when using an ISO2022-based coding system for
1529 input. (To decode this goo: @samp{ESC} begins an escape sequence;
1530 @samp{ESC $ (} is a class of escape sequences meaning ``switch to a
1531 94x94 character set''; @samp{ESC $ ( B} means ``switch to Japanese
1532 Kanji''; @samp{#} and @samp{&} collectively index into a 94-by-94 array
1533 of characters [subtract 33 from the ASCII value of each character to get
1534 the corresponding index]; @samp{ESC (} is a class of escape sequences
1535 meaning ``switch to a 94-character set''; @samp{ESC ( B} means ``switch
1536 to US ASCII''.  It is a coincidence that the letter @samp{B} is used to
1537 denote both Japanese Kanji and US ASCII.  If the first @samp{B} were
1538 replaced with an @samp{A}, you'd be requesting a Chinese Hanzi character
1539 from the GB2312 character set.)
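
The index arithmetic mentioned above can be sketched in C.  This is only
an illustration: the function name @code{iso2022_index} is hypothetical
and does not come from the XEmacs sources.  It maps one byte of a
two-byte 94x94 code to a zero-based index by subtracting 33.

```c
#include <assert.h>

/* Hypothetical helper (not part of XEmacs): convert one byte of a
   two-byte 94x94 character code into a zero-based index by
   subtracting 33, the ASCII code of '!'.  Returns -1 if the byte
   lies outside the 94 positions '!' (33) through '~' (126). */
static int
iso2022_index (unsigned char byte)
{
  if (byte < 33 || byte > 126)
    return -1;
  return byte - 33;
}
```

With this helper, @samp{#} (ASCII 35) gives index 2 and @samp{&} (ASCII
38) gives index 5, so the example above selects row 2, column 5 of the
array.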
1540
1541 @example
1542 "foobar"
1543 @end example
1544
1545 converts to a string.
1546
1547 @example
1548 foobar
1549 @end example
1550
1551 converts to a symbol whose name is @code{"foobar"}.  This is done by
1552 looking up the string equivalent in the global variable
1553 @code{obarray}, whose contents should be an obarray.  If no symbol
1554 is found, a new symbol with the name @code{"foobar"} is automatically
1555 created and added to @code{obarray}; this process is called
1556 @dfn{interning} the symbol.
1557 @cindex interning
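
As a rough sketch of what interning means, the following C fragment
implements a toy obarray.  All names (@code{toy_symbol},
@code{toy_obarray}, @code{toy_intern}) are hypothetical, and the real
XEmacs obarray is a Lisp vector with hidden fields rather than a C
array; only the look-up-or-create behavior is meant to carry over.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy symbol: just a name.  Real symbols carry much more. */
struct toy_symbol
{
  char *name;
  struct toy_symbol *next;   /* next symbol in the same bucket */
};

#define TOY_OBARRAY_SIZE 31

/* Toy obarray: a fixed-size array of hash buckets, initially empty. */
static struct toy_symbol *toy_obarray[TOY_OBARRAY_SIZE];

static unsigned
toy_hash (const char *name)
{
  unsigned h = 0;
  while (*name)
    h = h * 31 + (unsigned char) *name++;
  return h % TOY_OBARRAY_SIZE;
}

/* Look NAME up; if it is absent, create the symbol and add it. */
static struct toy_symbol *
toy_intern (const char *name)
{
  unsigned h = toy_hash (name);
  struct toy_symbol *sym;

  for (sym = toy_obarray[h]; sym; sym = sym->next)
    if (strcmp (sym->name, name) == 0)
      return sym;              /* already interned */

  sym = malloc (sizeof *sym);
  if (!sym)
    abort ();
  sym->name = malloc (strlen (name) + 1);
  if (!sym->name)
    abort ();
  strcpy (sym->name, name);
  sym->next = toy_obarray[h];  /* push onto the bucket's chain */
  toy_obarray[h] = sym;
  return sym;
}
```

Interning the same name twice yields the same pointer, which is why two
occurrences of @code{foobar} in Lisp code refer to one symbol.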
1558
1559 @example
1560 (foo . bar)
1561 @end example
1562
1563 converts to a cons cell containing the symbols @code{foo} and @code{bar}.
1564
1565 @example
1566 (1 a 2.5)
1567 @end example
1568
1569 converts to a three-element list containing the specified objects
1570 (note that a list is actually a set of nested conses; see the
1571 XEmacs Lisp Reference).
1572
1573 @example
1574 [1 a 2.5]
1575 @end example
1576
1577 converts to a three-element vector containing the specified objects.
1578
1579 @example
1580 #[... ... ... ...]
1581 @end example
1582
1583 converts to a compiled-function object (the actual contents are not
1584 shown since they are not relevant here; look at a file that ends with
1585 @file{.elc} for examples).
1586
1587 @example
1588 #*01110110
1589 @end example
1590
1591 converts to a bit-vector.
1592
1593 @example
1594 #s(hash-table ... ...)
1595 @end example
1596
1597 converts to a hash table (the actual contents are not shown).
1598
1599 @example
1600 #s(range-table ... ...)
1601 @end example
1602
1603 converts to a range table (the actual contents are not shown).
1604
1605 @example
1606 #s(char-table ... ...)
1607 @end example
1608
1609 converts to a char table (the actual contents are not shown).
1610
1611 Note that the @code{#s()} syntax is the general syntax for structures,
1612 which are not really implemented in XEmacs Lisp but should be.
1613
1614 When an object is printed out (using @code{print} or a related
1615 function), the read syntax is used, so that the same object can be read
1616 in again.
1617
1618 The other objects do not have read syntaxes, usually because it does not
1619 really make sense to create them in this fashion (e.g. processes, where
1620 it doesn't make sense to have a subprocess created as a side effect of
1621 reading some Lisp code), or because they can't be created at all
1622 (e.g. subrs).  Permanent objects, as a rule, do not have a read syntax;
1623 nor do most complex objects, which contain too much state to be easily
1624 initialized through a read syntax.
1625
1626 @node How Lisp Objects Are Represented in C, Rules When Writing New C Code, The XEmacs Object System (Abstractly Speaking), Top
1627 @chapter How Lisp Objects Are Represented in C
1628 @cindex Lisp objects are represented in C, how
1629 @cindex objects are represented in C, how Lisp
1630 @cindex represented in C, how Lisp objects are
1631
1632 Lisp objects are represented in C using a 32-bit or 64-bit machine word
1633 (depending on the processor; e.g. DEC Alphas use 64-bit Lisp objects and
1634 most other processors use 32-bit Lisp objects).  The representation
1635 stuffs a pointer together with a tag, as follows:
1636
1637 @example
1638  [ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ]
1639  [ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ]
1640
1641    <---------------------------------------------------------> <->
1642             a pointer to a structure, or an integer            tag
1643 @end example
1644
1645 A tag of 00 is used for all pointer object types, a tag of 10 is used
1646 for characters, and the other two tags 01 and 11 are joined together to
1647 form the integer object type.  This representation gives us 31-bit
1648 integers and 30-bit characters, while pointers are represented directly
1649 without any bit masking or shifting.  This representation, though,
1650 assumes that pointers to structs are always aligned to multiples of 4,
1651 so the lower 2 bits are always zero.
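
The tagging scheme can be illustrated with a small, self-contained C
sketch.  The names (@code{toy_word}, @code{toy_make_int}, and so on) are
hypothetical stand-ins, not the XEmacs macros, and the sketch assumes
that right-shifting a negative value is an arithmetic shift, which the
C standard leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

typedef intptr_t toy_word;   /* stand-in for Lisp_Object */

/* Low two bits: 00 = pointer, 10 = character, 01 or 11 = integer. */
static int toy_is_pointer (toy_word o) { return (o & 3) == 0; }
static int toy_is_char (toy_word o)    { return (o & 3) == 2; }
static int toy_is_int (toy_word o)     { return (o & 1) == 1; }

/* Integers: shift left one bit and set the low bit. */
static toy_word
toy_make_int (intptr_t n)
{
  return (toy_word) (((uintptr_t) n << 1) | 1);
}

/* Assumed arithmetic shift: drops the tag bit and keeps the sign. */
static intptr_t
toy_int_value (toy_word o)
{
  return o >> 1;
}

/* Characters: shift left two bits and store the tag 10. */
static toy_word
toy_make_char (intptr_t c)
{
  return (toy_word) (((uintptr_t) c << 2) | 2);
}

static intptr_t
toy_char_value (toy_word o)
{
  return o >> 2;
}

/* Pointers are stored unchanged; alignment keeps the low bits 00. */
static toy_word
toy_make_pointer (void *p)
{
  return (toy_word) (intptr_t) p;
}

static void *
toy_pointer_value (toy_word o)
{
  return (void *) o;
}
```

Because the two integer tags differ only in the upper tag bit, that bit
is effectively the lowest bit of the integer itself, which is how the
scheme gets one more bit for integers than for characters.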
1652
1653 Lisp objects use the typedef @code{Lisp_Object}, but the actual C type
1654 used for the Lisp object can vary.  It can be either a simple type
1655 (@code{long} on the DEC Alpha, @code{int} on other machines) or a
1656 structure whose fields are bit fields that line up properly (actually, a
1657 union of structures is used).  Generally the simple integral type is
1658 preferable because it ensures that the compiler will actually use a
1659 machine word to represent the object (some compilers will use more
1660 general and less efficient code for unions and structs even if they can
1661 fit in a machine word).  The union type, however, has the advantage of
1662 stricter type checking.  If you accidentally pass an integer where a Lisp
1663 object is desired, you get a compile error.  The choice of which type
1664 to use is determined by the preprocessor constant @code{USE_UNION_TYPE}
1665 which is defined via the @code{--use-union-type} option to
1666 @code{configure}.
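
The type-checking benefit of the union representation can be seen in a
minimal sketch.  The type and function names here are hypothetical; the
real definitions are in the XEmacs headers and differ in detail.

```c
#include <assert.h>

/* Hypothetical stand-ins for the two possible representations. */
typedef long toy_plain_object;                  /* simple integral type */
typedef union { long word; } toy_union_object;  /* wrapped in a union */

static long plain_word (toy_plain_object o) { return o; }
static long union_word (toy_union_object o) { return o.word; }

static long
toy_union_demo (void)
{
  toy_union_object u;
  u.word = 42;
  /* plain_word (42) compiles silently: an int converts to long.
     union_word (42) would be rejected by the compiler, because an
     integer is never converted to a union type implicitly. */
  return plain_word (42) + union_word (u);
}
```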
1667
1668 Various macros are used to convert between Lisp_Objects and the
1669 corresponding C type.  Macros such as @code{XINT()}, @code{XCHAR()},
1670 @code{XSTRING()}, and @code{XSYMBOL()} do any required bit shifting and/or
1671 masking and cast the result to the appropriate type.  @code{XINT()} needs to be
1672 a bit tricky so that negative numbers are properly sign-extended.  Since
1673 integers are stored left-shifted, if the right-shift operator does an
1674 arithmetic shift (i.e. it leaves the most-significant bit as-is rather
1675 than shifting in a zero, so that it mimics a divide-by-two even for
1676 negative numbers) the shift to remove the tag bit is enough.  This is
1677 the case on all the systems we support.
1678
1679 Note that when @code{ERROR_CHECK_TYPECHECK} is defined, the converter
1680 macros become more complicated---they check the tag bits and/or the
1681 type field in the first four bytes of a record type to ensure that the
1682 object is really of the correct type.  This is great for catching places
1683 where an incorrect type is being dereferenced---this typically results
1684 in a pointer being dereferenced as the wrong type of structure, with
1685 unpredictable (and sometimes not easily traceable) results.
1686
1687 There are similar @code{XSET@var{TYPE}()} macros that construct a Lisp
1688 object.  These macros are of the form @code{XSET@var{TYPE}
1689 (@var{lvalue}, @var{result})}, i.e. they have to be a statement rather
1690 than just used in an expression.  The reason for this is that standard C
1691 doesn't let you ``construct'' a structure (but GCC does).  Granted, this
1692 sometimes isn't too convenient; for the case of integers, at least, you
1693 can use the function @code{make_int()}, which constructs and
1694 @emph{returns} an integer Lisp object.  Note that the
1695 @code{XSET@var{TYPE}()} macros are also affected by
1696 @code{ERROR_CHECK_TYPECHECK} and make sure that the structure is of the
1697 right type in the case of record types, where the type is contained in
1698 the structure.
1699
1700 The C programmer is responsible for @strong{guaranteeing} that a
1701 Lisp_Object is the correct type before using the @code{X@var{TYPE}}
1702 macros.  This is especially important in the case of lists.  Use
1703 @code{XCAR} and @code{XCDR} if a Lisp_Object is certainly a cons cell,
1704 else use @code{Fcar()} and @code{Fcdr()}.  Trust other C code, but not
1705 Lisp code.  On the other hand, if XEmacs has an internal logic error,
1706 it's better to crash immediately, so sprinkle @code{assert()}s and
1707 ``unreachable'' @code{abort()}s liberally about the source code.  Where
1708 performance is an issue, use @code{type_checking_assert},
1709 @code{bufpos_checking_assert}, and @code{gc_checking_assert}, which do
1710 nothing unless the corresponding configure error checking flag was
1711 specified.
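
The difference between an unchecked accessor and a checked one can be
sketched as follows.  The structure and function names are hypothetical;
in particular, the real @code{Fcar()} signals a Lisp error, whereas this
sketch reports the problem through an error flag.

```c
#include <assert.h>
#include <stddef.h>

enum toy_type { TOY_INT, TOY_CONS };

struct toy_lisp
{
  enum toy_type type;
  int value;                     /* used when type == TOY_INT */
  struct toy_lisp *car, *cdr;    /* used when type == TOY_CONS */
};

/* Unchecked accessor in the spirit of XCAR: the caller must already
   know that OBJ is a cons.  The assert only catches internal logic
   errors; it is not a substitute for validating user data. */
static struct toy_lisp *
toy_xcar (struct toy_lisp *obj)
{
  assert (obj->type == TOY_CONS);
  return obj->car;
}

/* Checked accessor in the spirit of Fcar: safe on untrusted data.
   Sets *ERROR to 1 and returns NULL if OBJ is not a cons. */
static struct toy_lisp *
toy_fcar (struct toy_lisp *obj, int *error)
{
  *error = (obj->type != TOY_CONS);
  return *error ? NULL : obj->car;
}
```

Data that arrives from Lisp code would go through the checked form;
data that C code has just constructed itself can use the unchecked one.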
1712
1713 @node Rules When Writing New C Code, A Summary of the Various XEmacs Modules, How Lisp Objects Are Represented in C, Top
1714 @chapter Rules When Writing New C Code
1715 @cindex writing new C code, rules when
1716 @cindex C code, rules when writing new
1717 @cindex code, rules when writing new C
1718
1719 The XEmacs C code is extremely complex and intricate, and there are many
1720 rules that are more or less consistently followed throughout the code.
1721 Many of these rules are not obvious, so they are explained here.  It is
1722 of the utmost importance that you follow them.  If you don't, you may
1723 get something that appears to work, but which will crash in odd
1724 situations, often in code far away from where the actual breakage is.
1725
1726 @menu
1727 * General Coding Rules::
1728 * Writing Lisp Primitives::
1729 * Writing Good Comments::
1730 * Adding Global Lisp Variables::
1731 * Proper Use of Unsigned Types::
1732 * Coding for Mule::
1733 * Techniques for XEmacs Developers::
1734 @end menu
1735
1736 @node General Coding Rules
1737 @section General Coding Rules
1738 @cindex coding rules, general
1739
1740 The C code is actually written in a dialect of C called @dfn{Clean C},
1741 meaning that it can be compiled, mostly warning-free, with either a C or
1742 C++ compiler.  Coding in Clean C has several advantages over plain C.
1743 C++ compilers are more nit-picking, and a number of coding errors have
1744 been found by compiling with C++.  The ability to use both C and C++
1745 tools means that a greater variety of development tools is available to
1746 the developer.
1747
1748 Every module includes @file{<config.h>} (angle brackets so that
1749 @samp{--srcdir} works correctly; @file{config.h} may or may not be in
1750 the same directory as the C sources) and @file{lisp.h}.  @file{config.h}
1751 must always be included before any other header files (including
1752 system header files) to ensure that certain tricks played by various
1753 @file{s/} and @file{m/} files work out correctly.
1754
1755 When including header files, always use angle brackets, not double
1756 quotes, except when the file to be included is always in the same
1757 directory as the including file.  If either file is a generated file,
1758 then that is not likely to be the case.  In order to understand why we
1759 have this rule, imagine what happens when you do a build in the source
1760 directory using @samp{./configure} and another build in another
1761 directory using @samp{../work/configure}.  There will be two different
1762 @file{config.h} files.  Which one will be used if you @samp{#include
1763 "config.h"}?
1764
1765 Almost every module contains a @code{syms_of_*()} function and a
1766 @code{vars_of_*()} function.  The former declares any Lisp primitives
1767 you have defined and defines any symbols you will be using.  The latter
1768 declares any global Lisp variables you have added and initializes global
1769 C variables in the module.  @strong{Important}: There are stringent
1770 requirements on exactly what can go into these functions.  See the
1771 comment in @file{emacs.c}.  The reason for this is to avoid obscure
1772 unwanted interactions during initialization.  If you don't follow these
1773 rules, you'll be sorry!  If you want to do anything that isn't allowed,
1774 create a @code{complex_vars_of_*()} function for it.  Doing this is
1775 tricky, though: you have to make sure your function is called at the
1776 right time so that all the initialization dependencies work out.
1777
1778 Declare each function of these kinds in @file{symsinit.h}.  Make sure
1779 it's called in the appropriate place in @file{emacs.c}.  You never need
1780 to include @file{symsinit.h} directly, because it is included by
1781 @file{lisp.h}.
1782
1783 @strong{All global and static variables that are to be modifiable must
1784 be declared uninitialized.}  This means that you may not use the
1785 ``declare with initializer'' form for these variables, such as @code{int
1786 some_variable = 0;}.  The reason for this has to do with some kludges
1787 done during the dumping process: If possible, the initialized data
1788 segment is re-mapped so that it becomes part of the (unmodifiable) code
1789 segment in the dumped executable.  This allows this memory to be shared
1790 among multiple running XEmacs processes.  XEmacs is careful to place as
1791 much constant data as possible into initialized variables during the
1792 @file{temacs} phase.
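
A minimal sketch of this rule follows, using a hypothetical module
called ``toy''.  The variable and function names are made up for
illustration; only the pattern (declare without an initializer, assign
at run time in @code{vars_of_*()}) reflects the rule above.

```c
#include <assert.h>

/* Not allowed for a modifiable variable under the rule above:
     int toy_counter = 0;
   Declare it without an initializer instead ... */
int toy_counter;
static int toy_limit;

/* ... and assign the starting values at run time, in the module's
   vars_of_*() function (here for the hypothetical module "toy"). */
void
vars_of_toy (void)
{
  toy_counter = 0;
  toy_limit = 100;
}

/* Ordinary code may then modify the variable freely. */
int
toy_increment (void)
{
  if (toy_counter < toy_limit)
    toy_counter++;
  return toy_counter;
}
```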
1793
1794 @cindex copy-on-write
1795 @strong{Please note:} This kludge only works on a few systems nowadays,
1796 and is rapidly becoming irrelevant because most modern operating systems
1797 provide @dfn{copy-on-write} semantics.  All data is initially shared
1798 between processes, and a private copy is automatically made (on a
1799 page-by-page basis) when a process first attempts to write to a page of
1800 memory.
1801
1802 Formerly, there was a requirement that static variables not be declared
1803 inside of functions.  This had to do with another hack along the same
1804 vein as what was just described: old USG systems put statically-declared
1805 variables in the initialized data space, so the header files for those systems had a
1806 @code{#define static} declaration. (That way, the data-segment remapping
1807 described above could still work.) This fails badly on static variables
1808 inside of functions, which suddenly become automatic variables;
1809 therefore, you weren't supposed to have any of them.  This awful kludge
1810 has been removed in XEmacs because
1811
1812 @enumerate
1813 @item
1814 almost all of the systems that used this kludge ended up having
1815 to disable the data-segment remapping anyway;
1816 @item
1817 the only systems that didn't were extremely outdated ones;
1818 @item
1819 this hack completely messed up inline functions.
1820 @end enumerate
1821
1822 The C source code makes heavy use of C preprocessor macros.  One popular
1823 macro style is:
1824
1825 @example
1826 #define FOO(var, value) do @{            \
1827   Lisp_Object FOO_value = (value);      \
1828   ... /* compute using FOO_value */     \
1829   (var) = bar;                          \
1830 @} while (0)
1831 @end example
1832
1833 The @code{do @{...@} while (0)} is a standard trick to allow FOO to have
1834 statement semantics, so that it can safely be used within an @code{if}
1835 statement in C, for example.  Multiple evaluation is prevented by
1836 copying a supplied argument into a local variable, so that
1837 @code{FOO(var,fun(1))} only calls @code{fun} once.
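
Both properties can be checked with a small compilable sketch.  The
macro @code{TOY_DOUBLE} and the helper @code{toy_fun} are hypothetical
and exist only to make the behavior observable.

```c
#include <assert.h>

static int toy_calls;              /* counts calls to toy_fun() */

static int
toy_fun (int x)
{
  toy_calls++;
  return x;
}

/* Hypothetical macro: store twice VALUE in VAR.  VALUE is copied into
   a local first, so it is evaluated once although it is used twice. */
#define TOY_DOUBLE(var, value) do {             \
  int TOY_DOUBLE_value = (value);               \
  (var) = TOY_DOUBLE_value + TOY_DOUBLE_value;  \
} while (0)

static int
toy_demo_macro (int flag)
{
  int result;

  toy_calls = 0;
  /* Because of do { ... } while (0), the macro plus its semicolon
     form one statement, so the else still pairs with this if. */
  if (flag)
    TOY_DOUBLE (result, toy_fun (21));
  else
    result = -1;
  return result;
}
```

Had the macro body been a bare @code{@{...@}} block, the semicolon
after the macro call would have ended the @code{if} statement and the
@code{else} would have been a syntax error.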
1838
1839 Lisp lists are popular data structures in the C code as well as in
1840 Elisp.  There are two sets of macros that iterate over lists.
1841 @code{EXTERNAL_LIST_LOOP_@var{n}} should be used when the list has been
1842 supplied by the user, and cannot be trusted to be acyclic and
1843 @code{nil}-terminated.  A @code{malformed-list} or @code{circular-list} error
1844 will be generated if the list being iterated over is not entirely
1845 kosher.  @code{LIST_LOOP_@var{n}}, on the other hand, is faster and less
1846 safe, and can be used only on trusted lists.
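
The trade-off between the two families of macros can be illustrated
with a toy cons list.  The names below are hypothetical, and the cycle
check shown (a second pointer advancing at half speed) is one common
technique; this sketch does not claim that the XEmacs macros are
implemented exactly this way.

```c
#include <assert.h>
#include <stddef.h>

struct toy_cons
{
  int car;
  struct toy_cons *cdr;     /* NULL plays the role of nil */
};

/* Trusted walk, in the spirit of LIST_LOOP: fast, but it never
   returns if the list is circular. */
static int
toy_trusted_length (const struct toy_cons *list)
{
  int n = 0;
  for (; list; list = list->cdr)
    n++;
  return n;
}

/* Untrusted walk, in the spirit of EXTERNAL_LIST_LOOP: FAST moves two
   cells for every one that SLOW moves; if they ever meet, the list is
   circular.  Returns -1 in that case, where the real macros would
   signal an error. */
static int
toy_external_length (const struct toy_cons *list)
{
  const struct toy_cons *slow = list, *fast = list;
  int n = 0;

  while (fast)
    {
      fast = fast->cdr;
      n++;
      if (!fast)
        break;
      fast = fast->cdr;
      n++;
      slow = slow->cdr;
      if (fast && fast == slow)
        return -1;
    }
  return n;
}
```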
1847
1848 Related macros are @code{GET_EXTERNAL_LIST_LENGTH} and
1849 @code{GET_LIST_LENGTH}, which calculate the length of a list;
1850 @code{GET_EXTERNAL_LIST_LENGTH} also validates the properness of
1851 the list.  The macros @code{EXTERNAL_LIST_LOOP_DELETE_IF} and
1852 @code{LIST_LOOP_DELETE_IF} delete from a Lisp list the elements that
1853 satisfy some predicate.
1854
1855 @node Writing Lisp Primitives
1856 @section Writing Lisp Primitives
1857 @cindex writing Lisp primitives
1858 @cindex Lisp primitives, writing
1859 @cindex primitives, writing Lisp
1860
1861 Lisp primitives are Lisp functions implemented in C.  The details of
1862 interfacing the C function so that Lisp can call it are handled by a few
1863 C macros.  The only way to really understand how to write new C code is
1864 to read the source, but we can explain some things here.
1865
1866 An example of a special form is the definition of @code{prog1}, from
1867 @file{eval.c}.  (An ordinary function would have the same general
1868 appearance.)
1869
1870 @cindex garbage collection protection
1871 @smallexample
1872 @group
1873 DEFUN ("prog1", Fprog1, 1, UNEVALLED, 0, /*
1874 Similar to `progn', but the value of the first form is returned.
1875 \(prog1 FIRST BODY...): All the arguments are evaluated sequentially.
1876 The value of FIRST is saved during evaluation of the remaining args,
1877 whose values are discarded.
1878 */
1879        (args))
1880 @{
1881   /* This function can GC */
1882   REGISTER Lisp_Object val, form, tail;
1883   struct gcpro gcpro1;
1884
1885   val = Feval (XCAR (args));
1886
1887   GCPRO1 (val);
1888
1889   LIST_LOOP_3 (form, XCDR (args), tail)
1890     Feval (form);
1891
1892   UNGCPRO;
1893   return val;
1894 @}
1895 @end group
1896 @end smallexample
1897
1898   Let's start with a precise explanation of the arguments to the
1899 @code{DEFUN} macro.  Here is a template for them:
1900
1901 @example
1902 @group
1903 DEFUN (@var{lname}, @var{fname}, @var{min_args}, @var{max_args}, @var{interactive}, /*
1904 @var{docstring}
1905 */
1906    (@var{arglist}))
1907 @end group
1908 @end example
1909
1910 @table @var
1911 @item lname
1912 This string is the name of the Lisp symbol to define as the function
1913 name; in the example above, it is @code{"prog1"}.
1914
1915 @item fname
1916 This is the C function name for this function.  This is the name that is
1917 used in C code for calling the function.  The name is, by convention,
1918 @samp{F} prepended to the Lisp name, with all dashes (@samp{-}) in the
1919 Lisp name changed to underscores.  Thus, to call this function from C
1920 code, call @code{Fprog1}.  Remember that the arguments are of type
1921 @code{Lisp_Object}; various macros and functions for creating values of
1922 type @code{Lisp_Object} are declared in the file @file{lisp.h}.
1923
1924 Primitives whose names are special characters (e.g. @code{+} or
1925 @code{<}) are named by spelling out, in some fashion, the special
1926 character: e.g. @code{Fplus()} or @code{Flss()}.  Primitives whose names
1927 begin with normal alphanumeric characters but also contain special
1928 characters are spelled out in some creative way, e.g. @code{let*}
1929 becomes @code{FletX()}.
1930
1931 Each function also has an associated structure that holds the data for
1932 the subr object that represents the function in Lisp.  This structure
1933 conveys the Lisp symbol name to the initialization routine that will
1934 create the symbol and store the subr object as its definition.  The C
1935 variable name of this structure is always @samp{S} prepended to the
1936 @var{fname}.  You hardly ever need to be aware of the existence of this
1937 structure, since @code{DEFUN} plus @code{DEFSUBR} takes care of all the
1938 details.
1939
1940 @item min_args
1941 This is the minimum number of arguments that the function requires.  The
1942 function @code{prog1} allows a minimum of one argument.
1943
1944 @item max_args
1945 This is the maximum number of arguments that the function accepts, if
1946 there is a fixed maximum.  Alternatively, it can be @code{UNEVALLED},
1947 indicating a special form that receives unevaluated arguments, or
1948 @code{MANY}, indicating an unlimited number of evaluated arguments (the
1949 C equivalent of @code{&rest}).  Both @code{UNEVALLED} and @code{MANY}
1950 are macros.  If @var{max_args} is a number, it may not be less than
1951 @var{min_args} and it may not be greater than 8. (If you need to add a
1952 function with more than 8 arguments, use the @code{MANY} form.  Resist
1953 the urge to edit the definition of @code{DEFUN} in @file{lisp.h}.  If
1954 you do it anyway, make sure to also add another clause to the switch
1955 statement in @code{primitive_funcall()}.)
1956
1957 @item interactive
1958 This is an interactive specification, a string such as might be used as
1959 the argument of @code{interactive} in a Lisp function.  In the case of
1960 @code{prog1}, it is 0 (a null pointer), indicating that @code{prog1}
1961 cannot be called interactively.  A value of @code{""} indicates a
1962 function that should receive no arguments when called interactively.
1963
1964 @item docstring
1965 This is the documentation string.  It is written just like a
1966 documentation string for a function defined in Lisp; in particular, the
1967 first line should be a single sentence.  Note how the documentation
1968 string is enclosed in a comment, none of the documentation is placed on
1969 the same lines as the comment-start and comment-end characters, and the
1970 comment-start characters are on the same line as the interactive
1971 specification.  @file{make-docfile}, which scans the C files for
1972 documentation strings, is very particular about what it looks for, and
1973 will not properly extract the doc string if it's not in this exact format.
1974
1975 In order to make both @file{etags} and @file{make-docfile} happy, make
1976 sure that the @code{DEFUN} line contains the @var{lname} and
1977 @var{fname}, and that the comment-start characters for the doc string
1978 are on the same line as the interactive specification, and put a newline
1979 directly after them (and before the comment-end characters).
1980
1981 @item arglist
1982 This is the comma-separated list of arguments to the C function.  For a
1983 function with a fixed maximum number of arguments, provide a C argument
1984 for each Lisp argument.  In this case, unlike regular C functions, the
1985 types of the arguments are not declared; they are simply always of type
1986 @code{Lisp_Object}.
1987
1988 The names of the C arguments will be used as the names of the arguments
1989 to the Lisp primitive as displayed in its documentation, modulo the same
1990 concerns described above for @code{F...} names (in particular,
1991 underscores in the C arguments become dashes in the Lisp arguments).
1992
1993 There is one additional kludge: A trailing @samp{_} on the C argument is
1994 discarded when forming the Lisp argument.  This allows C language
1995 reserved words (like @code{default}) or global symbols (like
1996 @code{dirname}) to be used as argument names without compiler warnings
1997 or errors.
1998
1999 A Lisp function with @w{@var{max_args} = @code{UNEVALLED}} is a
2000 @w{@dfn{special form}}; its arguments are not evaluated.  Instead it
2001 receives one argument of type @code{Lisp_Object}, a (Lisp) list of the
2002 unevaluated arguments, conventionally named @code{(args)}.
2003
2004 When a Lisp function has no upper limit on the number of arguments,
2005 specify @w{@var{max_args} = @code{MANY}}.  In this case its implementation in
2006 C actually receives exactly two arguments: the number of Lisp arguments
2007 (an @code{int}) and the address of a block containing their values (a
2008 @w{@code{Lisp_Object *}}).  Only in this case are the C types specified
2009 in the @var{arglist}: @w{@code{(int nargs, Lisp_Object *args)}}.
2010
2011 @end table
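
The calling convention used for @code{MANY} functions can be sketched
outside of XEmacs as follows.  Here @code{toy_value} is a hypothetical
stand-in for @code{Lisp_Object}, and @code{toy_sum} is an ordinary C
function rather than something defined with @code{DEFUN}.

```c
#include <assert.h>

typedef long toy_value;   /* stand-in for Lisp_Object */

/* Shape of a MANY-style primitive: the number of arguments plus the
   address of a block holding their values. */
static toy_value
toy_sum (int nargs, toy_value *args)
{
  toy_value total = 0;
  int i;

  for (i = 0; i < nargs; i++)
    total += args[i];
  return total;
}
```

A caller fills an array with the argument values and passes its length,
for example @code{toy_sum (3, values)}.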
2012
2013 Within the function @code{Fprog1} itself, note the use of the macros
2014 @code{GCPRO1} and @code{UNGCPRO}.  @code{GCPRO1} is used to ``protect''
2015 a variable from garbage collection---to inform the garbage collector
2016 that it must look in that variable and regard the object pointed at by
2017 its contents as an accessible object.  This is necessary whenever you
2018 call @code{Feval} or anything that can directly or indirectly call
2019 @code{Feval} (this includes the @code{QUIT} macro!).  At such a time,
2020 any Lisp object that you intend to refer to again must be protected
2021 somehow.  @code{UNGCPRO} cancels the protection of the variables that
2022 are protected in the current function.  It is necessary to do this
2023 explicitly.
2024
2025 The macro @code{GCPRO1} protects just one local variable.  If you want
2026 to protect two, use @code{GCPRO2} instead; repeating @code{GCPRO1} will
2027 not work.  Macros @code{GCPRO3} and @code{GCPRO4} also exist.
2028
2029 These macros implicitly use local variables such as @code{gcpro1}; you
2030 must declare these explicitly, with type @code{struct gcpro}.  Thus, if
2031 you use @code{GCPRO2}, you must declare @code{gcpro1} and @code{gcpro2}.
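
To see why the @code{gcpro} variables must be declared and why
repeating @code{GCPRO1} does not work, consider the following toy
model.  It is patterned on the way such macros have traditionally been
written in Emacs-family sources, but every name carries a @code{toy_}
prefix and the actual XEmacs definitions may differ in detail.

```c
#include <assert.h>
#include <stddef.h>

typedef long toy_value;            /* stand-in for Lisp_Object */

struct toy_gcpro
{
  struct toy_gcpro *next;          /* previously registered entry */
  toy_value *var;                  /* address of a protected variable */
};

static struct toy_gcpro *toy_gcprolist;   /* head of the chain */

/* Each macro links the caller's local toy_gcproN structures onto the
   global chain. */
#define TOY_GCPRO1(v) do {           \
  toy_gcpro1.next = toy_gcprolist;   \
  toy_gcpro1.var = &(v);             \
  toy_gcprolist = &toy_gcpro1;       \
} while (0)

#define TOY_GCPRO2(v1, v2) do {      \
  toy_gcpro1.next = toy_gcprolist;   \
  toy_gcpro1.var = &(v1);            \
  toy_gcpro2.next = &toy_gcpro1;     \
  toy_gcpro2.var = &(v2);            \
  toy_gcprolist = &toy_gcpro2;       \
} while (0)

/* Pop everything the current function pushed. */
#define TOY_UNGCPRO (toy_gcprolist = toy_gcpro1.next)

/* What a collector would do: walk the chain of protected variables. */
static int
toy_count_protected (void)
{
  int n = 0;
  struct toy_gcpro *p;
  for (p = toy_gcprolist; p; p = p->next)
    n++;
  return n;
}

static int
toy_gcpro_demo (void)
{
  toy_value a = 1, b = 2;
  struct toy_gcpro toy_gcpro1, toy_gcpro2;
  int during;

  TOY_GCPRO2 (a, b);
  during = toy_count_protected ();
  TOY_UNGCPRO;
  return during;
}
```

In this model, a second @code{TOY_GCPRO1} in the same function would
overwrite @code{toy_gcpro1} and make it point at itself, losing the
first variable and corrupting the chain.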
2032
2033 @cindex caller-protects (@code{GCPRO} rule)
2034 Note also that the general rule is @dfn{caller-protects}; i.e. you are
2035 only responsible for protecting those Lisp objects that you create.  Any
2036 objects passed to you as arguments should have been protected by whoever
2037 created them, so you don't in general have to protect them.
2038
2039 In particular, the arguments to any Lisp primitive are always
2040 automatically @code{GCPRO}ed, when called ``normally'' from Lisp code or
2041 bytecode.  So only a few Lisp primitives that are called frequently from
2042 C code, such as @code{Fprogn}, protect their arguments as a service to
2043 their caller.  You don't need to protect your arguments when writing a
2044 new @code{DEFUN}.
2045
2046 @code{GCPRO}ing is perhaps the trickiest and most error-prone part of
2047 XEmacs coding.  It is @strong{extremely} important that you get this
2048 right and use a great deal of discipline when writing this code.
2049 @xref{GCPROing, ,@code{GCPRO}ing}, for full details on how to do this.
2050
2051 What @code{DEFUN} actually does is declare a global structure of type
2052 @code{Lisp_Subr} whose name begins with capital @samp{SF} and which
2053 contains information about the primitive (e.g. a pointer to the
2054 function, its minimum and maximum allowed arguments, a string describing
2055 its Lisp name); @code{DEFUN} then begins a normal C function declaration
2056 using the @code{F...} name.  The Lisp subr object that is the function
2057 definition of a primitive (i.e. the object in the function slot of the
2058 symbol that names the primitive) actually points to this @samp{SF}
2059 structure; when @code{Feval} encounters a subr, it looks in the
2060 structure to find out how to call the C function.
2061
2062 Defining the C function is not enough to make a Lisp primitive
2063 available; you must also create the Lisp symbol for the primitive (the
2064 symbol is @dfn{interned}; @pxref{Obarrays}) and store a suitable subr
2065 object in its function cell. (If you don't do this, the primitive won't
2066 be seen by Lisp code.) The code looks like this:
2067
2068 @example
2069 DEFSUBR (@var{fname});
2070 @end example
2071
2072 @noindent
2073 Here @var{fname} is the same name you used as the second argument to
2074 @code{DEFUN}.
2075
2076 This call to @code{DEFSUBR} should go in the @code{syms_of_*()} function
2077 at the end of the module.  If no such function exists, create it and
2078 make sure to also declare it in @file{symsinit.h} and call it from the
2079 appropriate spot in @code{main()}.  @xref{General Coding Rules}.
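
Why the registration step matters can be seen in a small stand-alone
model.  Everything below is hypothetical (@code{toy_obarray},
@code{toy_lookup}, @code{toy_defsubr}, and @code{syms_of_toy} are
invented names, and the real @code{DEFSUBR} works on subr objects rather
than bare function pointers); the sketch only shows that a C function
stays invisible until a symbol is interned and its function cell filled.

```c
#include <string.h>

typedef long toy_object;
typedef toy_object (*toy_fn) (toy_object);

struct toy_symbol
{
  const char *name;   /* symbol name */
  toy_fn function;    /* "function cell"; NULL when unbound */
};

#define TOY_OBARRAY_SIZE 16
static struct toy_symbol toy_obarray[TOY_OBARRAY_SIZE];
static int toy_obarray_used;

/* Find the symbol called NAME, or return NULL if it is not interned. */
static struct toy_symbol *
toy_lookup (const char *name)
{
  int i;

  for (i = 0; i < toy_obarray_used; i++)
    if (strcmp (toy_obarray[i].name, name) == 0)
      return &toy_obarray[i];
  return NULL;
}

/* Hypothetical analogue of DEFSUBR: intern NAME if necessary and put
   FN in its function cell.  Returns 0 on success, -1 if the table is
   full. */
static int
toy_defsubr (const char *name, toy_fn fn)
{
  struct toy_symbol *sym = toy_lookup (name);

  if (!sym)
    {
      if (toy_obarray_used == TOY_OBARRAY_SIZE)
        return -1;
      sym = &toy_obarray[toy_obarray_used++];
      sym->name = name;
    }
  sym->function = fn;
  return 0;
}

static toy_object
Ftoy_double (toy_object x)
{
  return 2 * x;
}

/* Per-module initializer, in the spirit of syms_of_*(). */
static void
syms_of_toy (void)
{
  toy_defsubr ("toy-double", Ftoy_double);
}
```

Before @code{syms_of_toy ()} runs, @code{toy_lookup ("toy-double")}
returns @code{NULL}; afterwards the function can be reached by name.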
2080
2081 Note that C code cannot call functions by name unless they are defined
2082 in C.  The way to call a function written in Lisp from C is to use
2083 @code{Ffuncall}, which embodies the Lisp function @code{funcall}.  Since
2084 the Lisp function @code{funcall} accepts an unlimited number of
2085 arguments, in C it takes two: the number of Lisp-level arguments, and a
2086 one-dimensional array containing their values.  The first Lisp-level
2087 argument is the Lisp function to call, and the rest are the arguments to
2088 pass to it.  Since @code{Ffuncall} can call the evaluator, you must
2089 protect pointers from garbage collection around the call to
2090 @code{Ffuncall}. (However, @code{Ffuncall} explicitly protects all of
2091 its parameters, so you don't have to protect any pointers passed as
2092 parameters to it.)
2093
The C functions @code{call0}, @code{call1}, @code{call2}, and so on
provide a convenient way to call a Lisp function with a fixed
number of arguments.  They work by calling @code{Ffuncall}.
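
The calling convention described above (a count plus an array whose
first element is the function) can be modelled in a few lines.  This is
a sketch under simplifying assumptions, not XEmacs code:
@code{toy_object}, @code{toy_funcall}, @code{toy_call2}, and
@code{toy_sum} are hypothetical, and garbage-collection protection is
left out entirely.

```c
/* Toy object: either an integer or a function taking an argument
   vector. */
typedef struct toy_object toy_object;
typedef toy_object (*toy_fn) (int nargs, toy_object *args);

struct toy_object
{
  toy_fn fn;     /* non-null when the object is callable */
  long value;    /* integer payload otherwise */
};

static toy_object
toy_int (long v)
{
  toy_object o;

  o.fn = 0;
  o.value = v;
  return o;
}

/* Model of Ffuncall: args[0] is the function to call and
   args[1] .. args[nargs - 1] are the arguments passed to it. */
static toy_object
toy_funcall (int nargs, toy_object *args)
{
  return args[0].fn (nargs - 1, args + 1);
}

/* Model of call2: package a fixed number of arguments into an array
   and hand it to the general entry point. */
static toy_object
toy_call2 (toy_object fn, toy_object a, toy_object b)
{
  toy_object args[3];

  args[0] = fn;
  args[1] = a;
  args[2] = b;
  return toy_funcall (3, args);
}

/* A callable used for the demonstration: it sums its arguments. */
static toy_object
toy_sum (int nargs, toy_object *args)
{
  long total = 0;
  int i;

  for (i = 0; i < nargs; i++)
    total += args[i].value;
  return toy_int (total);
}
```

Note that the count passed to @code{toy_funcall} includes the function
itself, just as the first Lisp-level argument of @code{funcall} is the
function to call.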
2097
2098 @file{eval.c} is a very good file to look through for examples;
2099 @file{lisp.h} contains the definitions for important macros and
2100 functions.
2101
2102 @node Writing Good Comments
2103 @section Writing Good Comments
2104 @cindex writing good comments
2105 @cindex comments, writing good
2106
2107 Comments are a lifeline for programmers trying to understand tricky
2108 code.  In general, the less obvious it is what you are doing, the more
2109 you need a comment, and the more detailed it needs to be.  You should
2110 always be on guard when you're writing code for stuff that's tricky, and
2111 should constantly be putting yourself in someone else's shoes and asking
2112 if that person could figure out without much difficulty what's going
2113 on. (Assume they are a competent programmer who understands the
2114 essentials of how the XEmacs code is structured but doesn't know much
2115 about the module you're working on or any algorithms you're using.) If
2116 you're not sure whether they would be able to, add a comment.  Always
err on the side of more comments rather than fewer.
2118
2119 Generally, when making comments, there is no need to attribute them with
2120 your name or initials.  This especially goes for small,
2121 easy-to-understand, non-opinionated ones.  Also, comments indicating
2122 where, when, and by whom a file was changed are @emph{strongly}
2123 discouraged, and in general will be removed as they are discovered.
2124 This is exactly what @file{ChangeLogs} are there for.  However, it can
2125 occasionally be useful to mark exactly where (but not when or by whom)
2126 changes are made, particularly when making small changes to a file
2127 imported from elsewhere.  These marks help when later on a newer version
2128 of the file is imported and the changes need to be merged. (If
2129 everything were always kept in CVS, there would be no need for this.
2130 But in practice, this often doesn't happen, or the CVS repository is
2131 later on lost or unavailable to the person doing the update.)
2132
2133 When putting in an explicit opinion in a comment, you should
2134 @emph{always} attribute it with your name, and optionally the date.
2135 This also goes for long, complex comments explaining in detail the
2136 workings of something -- by putting your name there, you make it
2137 possible for someone who has questions about how that thing works to
2138 determine who wrote the comment so they can write to them.  Preferably,
2139 use your actual name and not your initials, unless your initials are
2140 generally recognized (e.g. @samp{jwz}).  You can use only your first
2141 name if it's obvious who you are; otherwise, give first and last name.
If you're not a regular contributor, you might consider putting your
email address in; it may be in the ChangeLog, but after a while
ChangeLogs have a tendency to disappear or get
muddled. (E.g. your comment may get copied somewhere else or even into
another program, and tracking down the proper ChangeLog may be very
difficult.)
2148
2149 If you come across an opinion that is not or no longer valid, or you
2150 come across any comment that no longer applies but you want to keep it
2151 around, enclose it in @samp{[[ } and @samp{ ]]} marks and add a comment
2152 afterwards explaining why the preceding comment is no longer valid.  Put
2153 your name on this comment, as explained above.
2154
2155 Just as comments are a lifeline to programmers, incorrect comments are
2156 death.  If you come across an incorrect comment, @strong{immediately}
2157 correct it or flag it as incorrect, as described in the previous
2158 paragraph.  Whenever you work on a section of code, @emph{always} make
2159 sure to update any comments to be correct -- or, at the very least, flag
2160 them as incorrect.
2161
To indicate a ``todo'' or other problem, use four pound signs,
i.e. @samp{####}.
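
A short made-up example may help to show how these conventions look
together.  The function @code{toy_clamp}, the contributor name, and the
date are all placeholders invented for this illustration.

```c
/* Clamp VALUE to the inclusive range [LOW, HIGH]. */
static int
toy_clamp (int value, int low, int high)
{
  /* [[ The caller guarantees LOW <= HIGH, so no check is needed. ]]
     Not true any more: some callers pass the bounds in either order,
     so swap them when necessary.  --Jane Doe, 1999-03-01 */
  if (low > high)
    {
      int tmp = low;
      low = high;
      high = tmp;
    }

  /* #### Reversed bounds are silently swapped; the callers have not
     been audited yet. */
  if (value < low)
    return low;
  if (value > high)
    return high;
  return value;
}
```

The obsolete remark stays readable inside the @samp{[[ ]]} marks, the
attributed follow-up explains why it no longer holds, and the
@samp{####} marker is easy to search for.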
2164
2165 @node Adding Global Lisp Variables
2166 @section Adding Global Lisp Variables
2167 @cindex global Lisp variables, adding
2168 @cindex variables, adding global Lisp
2169
2170 Global variables whose names begin with @samp{Q} are constants whose
2171 value is a symbol of a particular name.  The name of the variable should
2172 be derived from the name of the symbol using the same rules as for Lisp
2173 primitives.  These variables are initialized using a call to
2174 @code{defsymbol()} in the @code{syms_of_*()} function. (This call
2175 interns a symbol, sets the C variable to the resulting Lisp object, and
2176 calls @code{staticpro()} on the C variable to tell the
2177 garbage-collection mechanism about this variable.  What
2178 @code{staticpro()} does is add a pointer to the variable to a large
2179 global array; when garbage-collection happens, all pointers listed in
2180 the array are used as starting points for marking Lisp objects.  This is
2181 important because it's quite possible that the only current reference to
2182 the object is the C variable.  In the case of symbols, the
2183 @code{staticpro()} doesn't matter all that much because the symbol is
2184 contained in @code{obarray}, which is itself @code{staticpro()}ed.
2185 However, it's possible that a naughty user could do something like
2186 uninterning the symbol out of @code{obarray} or even setting
2187 @code{obarray} to a different value [although this is likely to make
2188 XEmacs crash!].)
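
The role of the global array can be illustrated with a toy model.  The
names @code{toy_obj}, @code{toy_roots}, @code{toy_staticpro}, and
@code{toy_mark_roots} are hypothetical, and a real collector marks
whole object graphs rather than single objects; the sketch only shows
why the @emph{address} of the variable is what gets registered.

```c
#include <stddef.h>

/* Toy heap object with a mark bit. */
struct toy_obj
{
  int marked;
};

#define TOY_MAX_ROOTS 32

/* Addresses of the C variables that hold object references. */
static struct toy_obj **toy_roots[TOY_MAX_ROOTS];
static int toy_nroots;

/* Model of staticpro(): remember where the variable lives, not what
   it currently holds, so that later assignments are seen as well.
   Returns 0 on success, -1 if the array is full. */
static int
toy_staticpro (struct toy_obj **var)
{
  if (toy_nroots == TOY_MAX_ROOTS)
    return -1;
  toy_roots[toy_nroots++] = var;
  return 0;
}

/* Mark phase: every registered variable is a starting point. */
static void
toy_mark_roots (void)
{
  int i;

  for (i = 0; i < toy_nroots; i++)
    if (*toy_roots[i] != NULL)
      (*toy_roots[i])->marked = 1;
}
```

An object referenced only from a C variable that was never passed to
@code{toy_staticpro} stays unmarked, which in a real collector means its
storage would be reclaimed.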
2189
2190   @strong{Please note:} It is potentially deadly if you declare a
2191 @samp{Q...}  variable in two different modules.  The two calls to
2192 @code{defsymbol()} are no problem, but some linkers will complain about
2193 multiply-defined symbols.  The most insidious aspect of this is that
2194 often the link will succeed anyway, but then the resulting executable
2195 will sometimes crash in obscure ways during certain operations!  To
2196 avoid this problem, declare any symbols with common names (such as
2197 @code{text}) that are not obviously associated with this particular
2198 module in the module @file{general.c}.
2199
2200   Global variables whose names begin with @samp{V} are variables that
2201 contain Lisp objects.  The convention here is that all global variables
2202 of type @code{Lisp_Object} begin with @samp{V}, and all others don't
2203 (including integer and boolean variables that have Lisp
2204 equivalents). Most of the time, these variables have equivalents in
2205 Lisp, but some don't.  Those that do are declared this way by a call to
2206 @code{DEFVAR_LISP()} in the @code{vars_of_*()} initializer for the
2207 module.  What this does is create a special @dfn{symbol-value-forward}
2208 Lisp object that contains a pointer to the C variable, intern a symbol
2209 whose name is as specified in the call to @code{DEFVAR_LISP()}, and set
2210 its value to the symbol-value-forward Lisp object; it also calls
2211 @code{staticpro()} on the C variable to tell the garbage-collection
2212 mechanism about the variable.  When @code{eval} (or actually
2213 @code{symbol-value}) encounters this special object in the process of
2214 retrieving a variable's value, it follows the indirection to the C
2215 variable and gets its value.  @code{setq} does similar things so that
2216 the C variable gets changed.
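
The indirection can be pictured with the following stand-alone sketch.
It is a simplified model, not the XEmacs data structures:
@code{toy_symbol}, @code{toy_symbol_value}, @code{toy_set}, and
@code{Vtoy_limit} are hypothetical names, and Lisp values are reduced to
plain integers.

```c
/* Toy Lisp value: just an integer here. */
typedef long toy_object;

/* The value cell of a symbol either stores a value directly or
   forwards to a C variable. */
struct toy_symbol
{
  const char *name;
  toy_object *forward;   /* non-null: the value lives in this C variable */
  toy_object value;      /* used only when FORWARD is null */
};

/* Model of symbol-value: follow the indirection if there is one. */
static toy_object
toy_symbol_value (const struct toy_symbol *sym)
{
  return sym->forward ? *sym->forward : sym->value;
}

/* Model of setq: write through to the C variable if forwarded. */
static void
toy_set (struct toy_symbol *sym, toy_object new_value)
{
  if (sym->forward)
    *sym->forward = new_value;
  else
    sym->value = new_value;
}

/* The C-level variable and the symbol that forwards to it; this
   pairing is what a DEFVAR_LISP-style call would set up. */
static toy_object Vtoy_limit = 10;
static struct toy_symbol toy_limit_symbol = { "toy-limit", &Vtoy_limit, 0 };
```

With this arrangement an assignment in C is immediately visible through
the symbol, and a @code{toy_set} on the symbol changes the C variable.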
2217
2218   Whether or not you @code{DEFVAR_LISP()} a variable, you need to
2219 initialize it in the @code{vars_of_*()} function; otherwise it will end
2220 up as all zeroes, which is the integer 0 (@emph{not} @code{nil}), and
2221 this is probably not what you want.  Also, if the variable is not
2222 @code{DEFVAR_LISP()}ed, @strong{you must call} @code{staticpro()} on the
2223 C variable in the @code{vars_of_*()} function.  Otherwise, the
2224 garbage-collection mechanism won't know that the object in this variable
2225 is in use, and will happily collect it and reuse its storage for another
2226 Lisp object, and you will be the one who's unhappy when you can't figure
2227 out how your variable got overwritten.
2228
2229 @node Proper Use of Unsigned Types
2230 @section Proper Use of Unsigned Types
2231 @cindex unsigned types, proper use of
2232 @cindex types, proper use of unsigned
2233
2234 Avoid using @code{unsigned int} and @code{unsigned long} whenever
possible.  Unsigned types are viral: in any arithmetic or comparison that
mixes signed and unsigned operands of the same width, the signed operand
is automatically converted to
unsigned, which is almost certainly not what you want.  Many subtle and
2238 hard-to-find bugs are created by careless use of unsigned types.  In
2239 general, you should almost @emph{never} use an unsigned type to hold a
2240 regular quantity of any sort.  The only exceptions are
2241
2242 @enumerate
2243 @item
2244 When there's a reasonable possibility you will actually need all 32 or
2245 64 bits to store the quantity.
2246 @item
When calling existing APIs that require unsigned types.  In this case,
2248 you should still do all manipulation using signed types, and do the
2249 conversion at the very threshold of the API call.
2250 @item
2251 In existing code that you don't want to modify because you don't
2252 maintain it.
2253 @item
2254 In bit-field structures.
2255 @end enumerate
2256
2257 Other reasonable uses of @code{unsigned int} and @code{unsigned long}
2258 are representing non-quantities -- e.g. bit-oriented flags and such.
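
The kind of bug meant here is easy to reproduce.  The functions below
are illustrative only (their names are invented for this example); they
rely on nothing but the standard C conversion rules.

```c
/* Returns 1 if INDEX is below LENGTH, comparing the values as
   written.  Because LENGTH is unsigned, INDEX is converted to
   unsigned before the comparison, so a negative INDEX turns into a
   huge positive number. */
static int
below_mixed (int index, unsigned int length)
{
  return index < length;
}

/* The same test with the quantity kept signed.  This assumes that
   LENGTH fits in an int. */
static int
below_signed (int index, int length)
{
  return index < length;
}

/* Counting down works as expected with a signed counter.  With an
   unsigned counter the condition "i >= 0" would always be true and
   the loop would never end.  Returns the number of iterations. */
static int
count_down_signed (int n)
{
  int i, steps = 0;

  for (i = n - 1; i >= 0; i--)
    steps++;
  return steps;
}
```

Here @code{below_mixed (-1, 10)} yields 0 although -1 is obviously less
than 10, while @code{below_signed (-1, 10)} yields 1.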
2259
2260 @node Coding for Mule
2261 @section Coding for Mule
2262 @cindex coding for Mule
2263 @cindex Mule, coding for
2264
2265 Although Mule support is not compiled by default in XEmacs, many people
2266 are using it, and we consider it crucial that new code works correctly
2267 with multibyte characters.  This is not hard; it is only a matter of
2268 following several simple user-interface guidelines.  Even if you never
2269 compile with Mule, with a little practice you will find it quite easy
2270 to code Mule-correctly.
2271
2272 Note that these guidelines are not necessarily tied to the current Mule
2273 implementation; they are also a good idea to follow on the grounds of
2274 code generalization for future I18N work.
2275
2276 @menu
2277 * Character-Related Data Types::
2278 * Working With Character and Byte Positions::
2279 * Conversion to and from External Data::
2280 * General Guidelines for Writing Mule-Aware Code::
2281 * An Example of Mule-Aware Code::
2282 @end menu
2283
2284 @node Character-Related Data Types
2285 @subsection Character-Related Data Types
2286 @cindex character-related data types
2287 @cindex data types, character-related
2288
2289 First, let's review the basic character-related datatypes used by
2290 XEmacs.  Note that the separate @code{typedef}s are not mandatory in the
2291 current implementation (all of them boil down to @code{unsigned char} or
2292 @code{int}), but they improve clarity of code a great deal, because one
2293 glance at the declaration can tell the intended use of the variable.
2294
2295 @table @code
2296 @item Emchar
2297 @cindex Emchar
2298 An @code{Emchar} holds a single Emacs character.
2299
2300 Obviously, the equality between characters and bytes is lost in the Mule
2301 world.  Characters can be represented by one or more bytes in the
2302 buffer, and @code{Emchar} is the C type large enough to hold any
2303 character.
2304
2305 Without Mule support, an @code{Emchar} is equivalent to an
2306 @code{unsigned char}.
2307
2308 @item Bufbyte
2309 @cindex Bufbyte
2310 The data representing the text in a buffer or string is logically a set
2311 of @code{Bufbyte}s.
2312
2313 XEmacs does not work with the same character formats all the time; when
2314 reading characters from the outside, it decodes them to an internal
2315 format, and likewise encodes them when writing.  @code{Bufbyte} (in fact
@code{unsigned char}) is the basic unit of the internal format used for
XEmacs buffers and strings.  A @code{Bufbyte *} is the type that points at text
2318 encoded in the variable-width internal encoding.
2319
2320 One character can correspond to one or more @code{Bufbyte}s.  In the
current Mule implementation, an ASCII character is represented by a
single @code{Bufbyte} of the same value, and other characters are represented by a sequence
2323 of two or more @code{Bufbyte}s.
2324
2325 Without Mule support, there are exactly 256 characters, implicitly
2326 Latin-1, and each character is represented using one @code{Bufbyte}, and
2327 there is a one-to-one correspondence between @code{Bufbyte}s and
2328 @code{Emchar}s.
2329
2330 @item Bufpos
2331 @itemx Charcount
2332 @cindex Bufpos
2333 @cindex Charcount
2334 A @code{Bufpos} represents a character position in a buffer or string.
2335 A @code{Charcount} represents a number (count) of characters.
2336 Logically, subtracting two @code{Bufpos} values yields a
2337 @code{Charcount} value.  Although all of these are @code{typedef}ed to
2338 @code{EMACS_INT}, we use them in preference to @code{EMACS_INT} to make
2339 it clear what sort of position is being used.
2340
2341 @code{Bufpos} and @code{Charcount} values are the only ones that are
2342 ever visible to Lisp.
2343
2344 @item Bytind
2345 @itemx Bytecount
2346 @cindex Bytind
2347 @cindex Bytecount
2348 A @code{Bytind} represents a byte position in a buffer or string.  A
2349 @code{Bytecount} represents the distance between two positions, in bytes.
2350 The relationship between @code{Bytind} and @code{Bytecount} is the same
2351 as the relationship between @code{Bufpos} and @code{Charcount}.
2352
2353 @item Extbyte
2354 @itemx Extcount
2355 @cindex Extbyte
2356 @cindex Extcount
2357 When dealing with the outside world, XEmacs works with @code{Extbyte}s,
2358 which are equivalent to @code{unsigned char}.  Obviously, an
2359 @code{Extcount} is the distance between two @code{Extbyte}s.  Extbytes
2360 and Extcounts are not all that frequent in XEmacs code.
2361 @end table
2362
2363 @node Working With Character and Byte Positions
2364 @subsection Working With Character and Byte Positions
2365 @cindex character and byte positions, working with
2366 @cindex byte positions, working with character and
2367 @cindex positions, working with character and byte
2368
2369 Now that we have defined the basic character-related types, we can look
2370 at the macros and functions designed for work with them and for
2371 conversion between them.  Most of these macros are defined in
2372 @file{buffer.h}, and we don't discuss all of them here, but only the
2373 most important ones.  Examining the existing code is the best way to
2374 learn about them.
2375
2376 @table @code
2377 @item MAX_EMCHAR_LEN
2378 @cindex MAX_EMCHAR_LEN
This preprocessor constant is the maximum number of buffer bytes needed to
represent an Emacs character in the variable-width internal encoding.
It is useful when allocating temporary strings to hold a known number of
characters.  For instance:
2383
@example
@group
@{
  Charcount cclen;
  ...
  @{
    /* Allocate space for @var{cclen} characters. */
    Bufbyte *buf = (Bufbyte *)alloca (cclen * MAX_EMCHAR_LEN);
...
@end group
@end example
2395
2396 If you followed the previous section, you can guess that, logically,
2397 multiplying a @code{Charcount} value with @code{MAX_EMCHAR_LEN} produces
2398 a @code{Bytecount} value.
2399
2400 In the current Mule implementation, @code{MAX_EMCHAR_LEN} equals 4.
2401 Without Mule, it is 1.
2402
2403 @item charptr_emchar
2404 @itemx set_charptr_emchar
2405 @cindex charptr_emchar
2406 @cindex set_charptr_emchar
2407 The @code{charptr_emchar} macro takes a @code{Bufbyte} pointer and
2408 returns the @code{Emchar} stored at that position.  If it were a
2409 function, its prototype would be:
2410
@example
Emchar charptr_emchar (Bufbyte *p);
@end example

@code{set_charptr_emchar} stores an @code{Emchar} to the specified byte
position.  It returns the number of bytes stored:

@example
Bytecount set_charptr_emchar (Bufbyte *p, Emchar c);
@end example
2421
2422 It is important to note that @code{set_charptr_emchar} is safe only for
2423 appending a character at the end of a buffer, not for overwriting a
2424 character in the middle.  This is because the width of characters
2425 varies, and @code{set_charptr_emchar} cannot resize the string if it
2426 writes, say, a two-byte character where a single-byte character used to
2427 reside.
2428
A typical use of @code{set_charptr_emchar} can be demonstrated by this
example, which copies characters from buffer @var{buf} to a temporary
string of Bufbytes pointed to by @var{p}.

@example
@group
@{
  Bufpos pos;
  for (pos = beg; pos < end; pos++)
    @{
      Emchar c = BUF_FETCH_CHAR (buf, pos);
      p += set_charptr_emchar (p, c);
    @}
@}
@end group
@end example

Note how @code{set_charptr_emchar} is used to store the @code{Emchar}
and advance the pointer at the same time.
2448
2449 @item INC_CHARPTR
2450 @itemx DEC_CHARPTR
2451 @cindex INC_CHARPTR
2452 @cindex DEC_CHARPTR
2453 These two macros increment and decrement a @code{Bufbyte} pointer,
2454 respectively.  They will adjust the pointer by the appropriate number of
2455 bytes according to the byte length of the character stored there.  Both
2456 macros assume that the memory address is located at the beginning of a
2457 valid character.
2458
2459 Without Mule support, @code{INC_CHARPTR (p)} and @code{DEC_CHARPTR (p)}
2460 simply expand to @code{p++} and @code{p--}, respectively.
2461
@item bytecount_to_charcount
@cindex bytecount_to_charcount
Given a pointer to a text string and a length in bytes, return the
equivalent length in characters.

@example
Charcount bytecount_to_charcount (Bufbyte *p, Bytecount bc);
@end example

@item charcount_to_bytecount
@cindex charcount_to_bytecount
Given a pointer to a text string and a length in characters, return the
equivalent length in bytes.

@example
Bytecount charcount_to_bytecount (Bufbyte *p, Charcount cc);
@end example

@item charptr_n_addr
@cindex charptr_n_addr
Return a pointer to the beginning of the character that lies @var{cc}
characters past @var{p}.

@example
Bufbyte *charptr_n_addr (Bufbyte *p, Charcount cc);
@end example
2488 @end table
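
To get a feeling for how such operations behave on a variable-width
encoding, here is a self-contained toy version.  It is @emph{not} the
Mule implementation: UTF-8 is used as a stand-in for the internal
encoding because it is widely known, only code points 0 through 0xFFFF
are handled, and all the @code{toy_} names are hypothetical counterparts
of the macros described above.

```c
typedef unsigned char toy_byte;   /* plays the role of Bufbyte */
typedef int toy_char;             /* plays the role of Emchar */

#define TOY_MAX_CHAR_LEN 3        /* plays the role of MAX_EMCHAR_LEN */

/* Store C at P and return the number of bytes written (1 to 3).
   Handles code points 0 through 0xFFFF only. */
static int
toy_set_char (toy_byte *p, toy_char c)
{
  if (c < 0x80)
    {
      p[0] = (toy_byte) c;
      return 1;
    }
  if (c < 0x800)
    {
      p[0] = (toy_byte) (0xC0 | (c >> 6));
      p[1] = (toy_byte) (0x80 | (c & 0x3F));
      return 2;
    }
  p[0] = (toy_byte) (0xE0 | (c >> 12));
  p[1] = (toy_byte) (0x80 | ((c >> 6) & 0x3F));
  p[2] = (toy_byte) (0x80 | (c & 0x3F));
  return 3;
}

/* Number of bytes in the character starting at P.  P must point at
   the first byte of a character. */
static int
toy_char_len (const toy_byte *p)
{
  if (p[0] < 0x80)
    return 1;
  if (p[0] < 0xE0)
    return 2;
  return 3;
}

/* Decode the character starting at P. */
static toy_char
toy_char_at (const toy_byte *p)
{
  switch (toy_char_len (p))
    {
    case 1:
      return p[0];
    case 2:
      return ((p[0] & 0x1F) << 6) | (p[1] & 0x3F);
    default:
      return ((p[0] & 0x0F) << 12) | ((p[1] & 0x3F) << 6) | (p[2] & 0x3F);
    }
}

/* Count the characters in the BC bytes starting at P. */
static int
toy_bytecount_to_charcount (const toy_byte *p, int bc)
{
  const toy_byte *end = p + bc;
  int count = 0;

  while (p < end)
    {
      p += toy_char_len (p);   /* the job INC_CHARPTR does */
      count++;
    }
  return count;
}
```

Appending with @code{p += toy_set_char (p, c)} is safe as long as the
buffer was sized with @code{TOY_MAX_CHAR_LEN} bytes per character;
overwriting a one-byte character in the middle with a three-byte one
would clobber its neighbours, which is the hazard noted for
@code{set_charptr_emchar}.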
2489
2490 @node Conversion to and from External Data
2491 @subsection Conversion to and from External Data
2492 @cindex conversion to and from external data
2493 @cindex external data, conversion to and from
2494
2495 When an external function, such as a C library function, returns a
2496 @code{char} pointer, you should almost never treat it as @code{Bufbyte}.
This is because these returned strings may contain 8-bit characters which
2498 can be misinterpreted by XEmacs, and cause a crash.  Likewise, when
2499 exporting a piece of internal text to the outside world, you should
2500 always convert it to an appropriate external encoding, lest the internal
2501 stuff (such as the infamous \201 characters) leak out.
2502
The interface for conversion between the internal and external
representations of text is the set of conversion macros defined in
@file{buffer.h}.  There used to be a fixed set of external formats
supported by these macros, but now any coding system can be used with
them.  The coding system alias mechanism is used to create the
following logical coding systems, which replace the fixed external
formats.  The @code{dontusethis-set-symbol-value-handler} mechanism was
enhanced to make this possible (more work on that is needed, such as
removing the @code{dontusethis-} prefix).
2512
2513 @table @code
2514 @item Qbinary
2515 This is the simplest format and is what we use in the absence of a more
2516 appropriate format.  This converts according to the @code{binary} coding
2517 system:
2518
2519 @enumerate a
2520 @item
On input, bytes 0--255 are converted into (implicitly Latin-1)
characters 0--255.  A non-Mule XEmacs doesn't really know about
different character sets and the fonts to display them, so the bytes can
be treated as text in different 1-byte encodings by simply setting the
appropriate fonts.  So in a sense, non-Mule XEmacs is a multi-lingual
editor if, for example, different fonts are used to display text in
different buffers, faces, or windows.  The specifier mechanism gives the
user complete control over this kind of behavior.
@item
On output, characters 0--255 are converted into bytes 0--255 and other
characters are converted into @samp{~}.
2532 @end enumerate
2533
2534 @item Qfile_name
2535 Format used for filenames.  This is user-definable via either the
2536 @code{file-name-coding-system} or @code{pathname-coding-system} (now
2537 obsolete) variables.
2538
2539 @item Qnative
2540 Format used for the external Unix environment---@code{argv[]}, stuff
2541 from @code{getenv()}, stuff from the @file{/etc/passwd} file, etc.
Currently this is the same as @code{Qfile_name}.  The two should be
2543 distinguished for clarity and possible future separation.
2544
2545 @item Qctext
Compound-text format.  This is the standard X11 format used for data
stored in properties, selections, and the like.  This is an 8-bit
no-lock-shift ISO2022 coding system.  This is a real coding system,
unlike @code{Qfile_name}, which is user-definable.
2550 @end table
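
The rules given for the @code{binary} format are simple enough to model
directly.  The functions below are an illustrative sketch, not the
XEmacs conversion code: @code{toy_binary_encode} and
@code{toy_binary_decode} are hypothetical names, and characters are
represented as plain @code{int} codes.

```c
#include <stdlib.h>

/* Convert N characters (given as integer codes) to external bytes
   following the output rule: codes 0 through 255 map to the byte of
   the same value, anything else becomes '~'.  Returns a malloc()ed
   buffer of N bytes that the caller must free, or NULL if the
   allocation fails. */
static unsigned char *
toy_binary_encode (const int *chars, size_t n)
{
  unsigned char *out = malloc (n ? n : 1);
  size_t i;

  if (!out)
    return NULL;
  for (i = 0; i < n; i++)
    {
      if (chars[i] >= 0 && chars[i] <= 255)
        out[i] = (unsigned char) chars[i];
      else
        out[i] = (unsigned char) '~';
    }
  return out;
}

/* The input rule is the identity on byte values: byte K becomes
   character K.  CHARS must have room for N elements. */
static void
toy_binary_decode (const unsigned char *bytes, size_t n, int *chars)
{
  size_t i;

  for (i = 0; i < n; i++)
    chars[i] = bytes[i];
}
```

Note that the output direction loses information: once a character
outside the range 0--255 has been written as @samp{~}, reading the data
back cannot recover it.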
2551
2552 There are two fundamental macros to convert between external and
2553 internal format.
2554
2555 @code{TO_INTERNAL_FORMAT} converts external data to internal format, and
2556 @code{TO_EXTERNAL_FORMAT} converts the other way around.  The arguments
2557 each of these receives are a source type, a source, a sink type, a sink,
2558 and a coding system (or a symbol naming a coding system).
2559
A typical call looks like
@example
TO_EXTERNAL_FORMAT (LISP_STRING, str, C_STRING_MALLOC, ptr, Qfile_name);
@end example

which means that the contents of the Lisp string @code{str} are written
to a malloc'ed memory area which will be pointed to by @code{ptr} once
the conversion completes.  The conversion will be done using the
@code{file-name} coding system, which the user controls
indirectly by setting or binding the variable
@code{file-name-coding-system}.
2571
2572 Some sources and sinks require two C variables to specify.  We use some
2573 preprocessor magic to allow different source and sink types, and even
2574 different numbers of arguments to specify different types of sources and
2575 sinks.
2576
So we can have a call that looks like
@example
TO_INTERNAL_FORMAT (DATA, (ptr, len),
                    MALLOC, (ptr, len),
                    coding_system);
@end example
2583
2584 The parenthesized argument pairs are required to make the preprocessor
2585 magic work.
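
One general way to build macros of this shape is sketched below.  These
are @emph{not} the definitions from @file{buffer.h}; the @code{TOY_}
macros are hypothetical and only demonstrate two standard preprocessor
techniques: a parenthesized pair counts as a single macro argument and
can be unpacked later, and token pasting on the type keyword selects a
per-type handler.

```c
#include <string.h>

/* Selectors applied to an already-parenthesized pair.
   TOY_PTR_OF ((p, n)) expands to TOY_FIRST (p, n), i.e. to p. */
#define TOY_FIRST(a, b)  a
#define TOY_SECOND(a, b) b
#define TOY_PTR_OF(pair) TOY_FIRST pair
#define TOY_LEN_OF(pair) TOY_SECOND pair

/* Per-type handlers.  Each one stores a pointer and a length
   describing the source into the two given variables. */
#define TOY_SOURCE_DATA(src, ptr_var, len_var) \
  ((ptr_var) = TOY_PTR_OF (src), (len_var) = TOY_LEN_OF (src))
#define TOY_SOURCE_C_STRING(src, ptr_var, len_var) \
  ((ptr_var) = (src), (len_var) = strlen (src) + 1)

/* Front end: TYPE selects the handler by token pasting; SRC is
   either a bare expression or a parenthesized (ptr, len) pair,
   depending on what the selected handler expects. */
#define TOY_GET_SOURCE(type, src, ptr_var, len_var) \
  TOY_SOURCE_##type (src, ptr_var, len_var)
```

With these definitions, @code{TOY_GET_SOURCE (DATA, (buf, 2), p, n)}
sets @code{p} to @code{buf} and @code{n} to 2, while
@code{TOY_GET_SOURCE (C_STRING, "hello", p, n)} sets @code{n} to 6.
Without the inner parentheses the pair would be split into two separate
macro arguments and the call would not match the parameter list.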
2586
2587 Here are the different source and sink types:
2588
2589 @table @code
@item @code{DATA, (ptr, len),}
input data is a fixed buffer of size @var{len} at address @var{ptr}
@item @code{ALLOCA, (ptr, len),}
output data is placed in an @code{alloca()}ed buffer of size @var{len} pointed to by @var{ptr}
@item @code{MALLOC, (ptr, len),}
output data is placed in a @code{malloc()}ed buffer of size @var{len} pointed to by @var{ptr}
@item @code{C_STRING_ALLOCA, ptr,}
equivalent to @code{ALLOCA, (ptr, len_ignored)} on output
@item @code{C_STRING_MALLOC, ptr,}
equivalent to @code{MALLOC, (ptr, len_ignored)} on output
@item @code{C_STRING, ptr,}
equivalent to @code{DATA, (ptr, strlen (ptr) + 1)} on input
@item @code{LISP_STRING, string,}
input or output is a Lisp_Object of type string
@item @code{LISP_BUFFER, buffer,}
output is written to @code{(point)} in Lisp buffer @var{buffer}
@item @code{LISP_LSTREAM, lstream,}
input or output is a Lisp_Object of type lstream
@item @code{LISP_OPAQUE, object,}
input or output is a Lisp_Object of type opaque
2610 @end table
2611
2612 Often, the data is being converted to a '\0'-byte-terminated string,
2613 which is the format required by many external system C APIs.  For these
2614 purposes, a source type of @code{C_STRING} or a sink type of
2615 @code{C_STRING_ALLOCA} or @code{C_STRING_MALLOC} is appropriate.
2616 Otherwise, we should try to keep XEmacs '\0'-byte-clean, which means
2617 using (ptr, len) pairs.
2618
2619 The sinks to be specified must be lvalues, unless they are the lisp
2620 object types @code{LISP_LSTREAM} or @code{LISP_BUFFER}.
2621
2622 For the sink types @code{ALLOCA} and @code{C_STRING_ALLOCA}, the
2623 resulting text is stored in a stack-allocated buffer, which is
2624 automatically freed on returning from the function.  However, the sink
2625 types @code{MALLOC} and @code{C_STRING_MALLOC} return @code{xmalloc()}ed
2626 memory.  The caller is responsible for freeing this memory using
2627 @code{xfree()}.
2628
2629 Note that it doesn't make sense for @code{LISP_STRING} to be a source
2630 for @code{TO_INTERNAL_FORMAT} or a sink for @code{TO_EXTERNAL_FORMAT}.
2631 You'll get an assertion failure if you try.
2632
2633
2634 @node General Guidelines for Writing Mule-Aware Code
2635 @subsection General Guidelines for Writing Mule-Aware Code
2636 @cindex writing Mule-aware code, general guidelines for
2637 @cindex Mule-aware code, general guidelines for writing
2638 @cindex code, general guidelines for writing Mule-aware
2639
2640 This section contains some general guidance on how to write Mule-aware
2641 code, as well as some pitfalls you should avoid.
2642
2643 @table @emph
2644 @item Never use @code{char} and @code{char *}.
2645 In XEmacs, the use of @code{char} and @code{char *} is almost always a
2646 mistake.  If you want to manipulate an Emacs character from ``C'', use
2647 @code{Emchar}.  If you want to examine a specific octet in the internal
2648 format, use @code{Bufbyte}.  If you want a Lisp-visible character, use a
2649 @code{Lisp_Object} and @code{make_char}.  If you want a pointer to move
2650 through the internal text, use @code{Bufbyte *}.  Also note that you
2651 almost certainly do not need @code{Emchar *}.
2652
2653 @item Be careful not to confuse @code{Charcount}, @code{Bytecount}, and @code{Bufpos}.
2654 The whole point of using different types is to avoid confusion about the
2655 use of certain variables.  Lest this effect be nullified, you need to be
2656 careful about using the right types.
2657
2658 @item Always convert external data
2659 It is extremely important to always convert external data, because
XEmacs can crash if unexpected 8-bit sequences are copied to its internal
2661 buffers literally.
2662
2663 This means that when a system function, such as @code{readdir}, returns
2664 a string, you may need to convert it using one of the conversion macros
described in the previous section before passing it on to Lisp.
2666
Actually, most of the basic system functions that accept '\0'-terminated
string arguments, like @code{stat()} and @code{open()}, have been
@strong{encapsulated} so that they @emph{always} do internal-to-external
conversion themselves.  This means you must pass internally
encoded data, typically the @code{XSTRING_DATA} of a Lisp string, to
these functions.  This is actually a design bug, since it unexpectedly
changes the semantics of the system functions.  A better design would be
to provide separate versions of these system functions that accept
Lisp_Objects which are Lisp strings in place of their current
@code{char *} arguments.

@example
int stat_lisp (Lisp_Object path, struct stat *buf); /* Implement me */
@end example
2681
2682 Also note that many internal functions, such as @code{make_string},
2683 accept Bufbytes, which removes the need for them to convert the data
2684 they receive.  This increases efficiency because that way external data
2685 needs to be decoded only once, when it is read.  After that, it is
2686 passed around in internal format.
2687 @end table
2688
2689 @node An Example of Mule-Aware Code
2690 @subsection An Example of Mule-Aware Code
2691 @cindex code, an example of Mule-aware
2692 @cindex Mule-aware code, an example of
2693
2694 As an example of Mule-aware code, we will analyze the @code{string}
2695 function, which conses up a Lisp string from the character arguments it
receives.  Here is the definition, pasted from @file{alloc.c}:

@example
@group
DEFUN ("string", Fstring, 0, MANY, 0, /*
Concatenate all the argument characters and make the result a string.
*/
       (int nargs, Lisp_Object *args))
@{
  Bufbyte *storage = alloca_array (Bufbyte, nargs * MAX_EMCHAR_LEN);
  Bufbyte *p = storage;

  for (; nargs; nargs--, args++)
    @{
      Lisp_Object lisp_char = *args;
      CHECK_CHAR_COERCE_INT (lisp_char);
      p += set_charptr_emchar (p, XCHAR (lisp_char));
    @}
  return make_string (storage, p - storage);
@}
@end group
@end example
2718
2719 Now we can analyze the source line by line.
2720
Obviously, the string will contain as many characters as there are
arguments to the function.  This is why we allocate
@code{MAX_EMCHAR_LEN} * @var{nargs} bytes on the stack, i.e. the
worst-case number of bytes needed for @var{nargs} @code{Emchar}s to fit
in the string.
2725
2726 Then, the loop checks that each element is a character, converting
2727 integers in the process.  Like many other functions in XEmacs, this
2728 function silently accepts integers where characters are expected, for
2729 historical and compatibility reasons.  Unless you know what you are
2730 doing, @code{CHECK_CHAR} will also suffice.  @code{XCHAR (lisp_char)}
2731 extracts the @code{Emchar} from the @code{Lisp_Object}, and
@code{set_charptr_emchar} stores it in @code{storage}, advancing @code{p} in
the process.
2734
2735 Other instructive examples of correct coding under Mule can be found all
2736 over the XEmacs code.  For starters, I recommend
2737 @code{Fnormalize_menu_item_name} in @file{menubar.c}.  After you have
understood this section of the manual and studied the examples, you can
proceed to write new Mule-aware code.
2740
2741 @node Techniques for XEmacs Developers
2742 @section Techniques for XEmacs Developers
2743 @cindex techniques for XEmacs developers
2744 @cindex developers, techniques for XEmacs
2745
2746 @cindex Purify
2747 @cindex Quantify
2748 To make a purified XEmacs, do: @code{make puremacs}.
2749 To make a quantified XEmacs, do: @code{make quantmacs}.
2750
You simply can't dump Quantified and Purified images (unless using the
portable dumper).  Purify gets confused when XEmacs frees memory in one
process that was allocated in a @emph{different} process on a different
machine!  Run it like so:
@example
temacs -batch -l loadup.el run-temacs @var{xemacs-args...}
@end example
2758
2759 @cindex error checking
2760 Before you go through the trouble, are you compiling with all
2761 debugging and error-checking off?  If not, try that first.  Be warned
2762 that while Quantify is directly responsible for quite a few
2763 optimizations which have been made to XEmacs, doing a run which
2764 generates results which can be acted upon is not necessarily a trivial
2765 task.
2766
Also, if you're still willing to do some runs, make sure you configure
2768 with the @samp{--quantify} flag.  That will keep Quantify from starting
2769 to record data until after the loadup is completed and will shut off
2770 recording right before it shuts down (which generates enough bogus data
2771 to throw most results off).  It also enables three additional elisp
2772 commands: @code{quantify-start-recording-data},
2773 @code{quantify-stop-recording-data} and @code{quantify-clear-data}.
2774
2775 If you want to make XEmacs faster, target your favorite slow benchmark,
2776 run a profiler like Quantify, @code{gprof}, or @code{tcov}, and figure
2777 out where the cycles are going.  Specific projects:
2778
2779 @itemize @bullet
2780 @item
2781 Make the garbage collector faster.  Figure out how to write an
2782 incremental garbage collector.
2783 @item
2784 Write a compiler that takes bytecode and spits out C code.
2785 Unfortunately, you will then need a C compiler and a more fully
2786 developed module system.
2787 @item
2788 Speed up redisplay.
2789 @item
2790 Speed up syntax highlighting.  Maybe moving some of the syntax
2791 highlighting capabilities into C would make a difference.
2792 @item
2793 Implement tail recursion in Emacs Lisp (hard!).
2794 @end itemize
2795
Unfortunately, Emacs Lisp is slow, and is going to stay slow.  Function
calls in elisp are especially expensive.  Iterating over a long list can
be around 30 times faster when implemented in C than in Elisp.
2799
2800 Heavily used small code fragments need to be fast.  The traditional way
2801 to implement such code fragments in C is with macros.  But macros in C
2802 are known to be broken.
2803
2804 @cindex macro hygiene
2805 Macro arguments that are repeatedly evaluated may suffer from repeated
2806 side effects or suboptimal performance.
2807
Variable names used in macros may collide with the caller's variables,
causing (at least) unwanted compiler warnings.
2810
2811 In order to solve these problems, and maintain statement semantics, one
2812 should use the @code{do @{ ... @} while (0)} trick while trying to
2813 reference macro arguments exactly once using local variables.
2814
2815 Let's take a look at this poor macro definition:
2816
@example
#define MARK_OBJECT(obj) \
  if (!marked_p (obj)) mark_object (obj), did_mark = 1
@end example
2821
2822 This macro evaluates its argument twice, and also fails if used like this:
@example
  if (flag) MARK_OBJECT (obj); else do_something();
@end example
2826
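Here the @code{else} silently attaches to the @code{if} hidden inside
the macro expansion instead of to @code{if (flag)}.  A much better
definition is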
2828
@example
#define MARK_OBJECT(obj) do @{ \
  Lisp_Object mo_obj = (obj); \
  if (!marked_p (mo_obj))     \
    @{                         \
      mark_object (mo_obj);   \
      did_mark = 1;           \
    @}                         \
@} while (0)
@end example
2839
2840 Notice the elimination of double evaluation by using the local variable
2841 with the obscure name.  Writing safe and efficient macros requires great
2842 care.  The one problem with macros that cannot be portably worked around
2843 is, since a C block has no value, a macro used as an expression rather
2844 than a statement cannot use the techniques just described to avoid
2845 multiple evaluation.
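
The difference between the two styles of definition can be checked with
a small stand-alone sketch.  This is not XEmacs code: the array
@code{marked}, the helper @code{next_index}, and the macro names
@code{MARK_POOR} and @code{MARK_GOOD} are hypothetical stand-ins that
only mimic the shape of the @code{MARK_OBJECT} examples above.

@example
#include <string.h>

static int calls;       /* how often next_index () was evaluated */
static int marked[4];   /* hypothetical stand-in for mark bits */
static int did_mark;

static int
next_index (void)
@{
  return calls++ % 4;   /* an argument with a side effect */
@}

/* Poor: the argument appears twice, so it is evaluated twice. */
#define MARK_POOR(i) \
  if (!marked[(i)]) marked[(i)] = 1, did_mark = 1

/* Better: the argument is copied once into a local variable. */
#define MARK_GOOD(i) do @{  \
  int mg_i = (i);          \
  if (!marked[mg_i])       \
    @{                      \
      marked[mg_i] = 1;    \
      did_mark = 1;        \
    @}                      \
@} while (0)

static void
reset_state (void)
@{
  calls = 0;
  did_mark = 0;
  memset (marked, 0, sizeof (marked));
@}

int
count_poor_evaluations (void)
@{
  reset_state ();
  MARK_POOR (next_index ());
  return calls;
@}

int
count_good_evaluations (void)
@{
  reset_state ();
  MARK_GOOD (next_index ());
  return calls;
@}
@end example

In this sketch @code{count_poor_evaluations} returns 2 and
@code{count_good_evaluations} returns 1.  Note that the poor variant
also marks a different element from the one it tested, because the
second evaluation yields a different index.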
2846
2847 @cindex inline functions
2848 In most cases where a macro has function semantics, an inline function
2849 is a better implementation technique.  Modern compiler optimizers tend
2850 to inline functions even if they have no @code{inline} keyword, and
2851 configure magic ensures that the @code{inline} keyword can be safely
used as an additional compiler hint.  Inline functions used in a single
@file{.c} file are easy.  The function must already be defined to be
@code{static}; just add the @code{inline} keyword to the definition.
2856
@example
inline static int
heavily_used_small_function (int arg)
@{
  ...
@}
@end example
2864
2865 Inline functions in header files are trickier, because we would like to
2866 make the following optimization if the function is @emph{not} inlined
2867 (for example, because we're compiling for debugging).  We would like the
2868 function to be defined externally exactly once, and each calling
2869 translation unit would create an external reference to the function,
2870 instead of including a definition of the inline function in the object
2871 code of every translation unit that uses it.  This optimization is
2872 currently only available for gcc.  But you don't have to worry about the
2873 trickiness; just define your inline functions in header files using this
2874 pattern:
2875
@example
INLINE_HEADER int
i_used_to_be_a_crufty_macro_but_look_at_me_now (int arg);
INLINE_HEADER int
i_used_to_be_a_crufty_macro_but_look_at_me_now (int arg)
@{
  ...
@}
@end example
2885
2886 The declaration right before the definition is to prevent warnings when
2887 compiling with @code{gcc -Wmissing-declarations}.  I consider issuing
2888 this warning for inline functions a gcc bug, but the gcc maintainers disagree.
2889
2890 @cindex inline functions, headers
2891 @cindex header files, inline functions
2892 Every header which contains inline functions, either directly by using
2893 @code{INLINE_HEADER} or indirectly by using @code{DECLARE_LRECORD} must
2894 be added to @file{inline.c}'s includes to make the optimization
described above work.  (Optimization note: if all @code{INLINE_HEADER}
functions are in fact inlined in all translation units, then the linker
can just discard @code{inline.o}, since it contains only unreferenced
code.)
2898
2899 To get started debugging XEmacs, take a look at the @file{.gdbinit} and
2900 @file{.dbxrc} files in the @file{src} directory.  See the section in the
2901 XEmacs FAQ on How to Debug an XEmacs problem with a debugger.
2902
2903 After making source code changes, run @code{make check} to ensure that
you haven't introduced any regressions.  If you want to make XEmacs more
2905 reliable, please improve the test suite in @file{tests/automated}.
2906
2907 Did you make sure you didn't introduce any new compiler warnings?
2908
2909 Before submitting a patch, please try compiling at least once with
2910
@example
configure --with-mule --with-union-type --error-checking=all
@end example
2914
2915 Here are things to know when you create a new source file:
2916
2917 @itemize @bullet
2918 @item
2919 All @file{.c} files should @code{#include <config.h>} first.  Almost all
2920 @file{.c} files should @code{#include "lisp.h"} second.
2921
2922 @item
2923 Generated header files should be included using the @code{#include <...>} syntax,
2924 not the @code{#include "..."} syntax.  The generated headers are:
2925
2926 @file{config.h sheap-adjust.h paths.h Emacs.ad.h}
2927
2928 The basic rule is that you should assume builds using @code{--srcdir}
2929 and the @code{#include <...>} syntax needs to be used when the
2930 to-be-included generated file is in a potentially different directory
2931 @emph{at compile time}.  The non-obvious C rule is that @code{#include "..."}
2932 means to search for the included file in the same directory as the
2933 including file, @emph{not} in the current directory.
2934
2935 @item
Header files should @emph{not} include @code{<config.h>} or
@code{"lisp.h"}.  It is the responsibility of the @file{.c} files that
use them to do so.
2939
2940 @end itemize
2941
2942 @cindex Lisp object types, creating
2943 @cindex creating Lisp object types
2944 @cindex object types, creating Lisp
2945 Here is a checklist of things to do when creating a new lisp object type
2946 named @var{foo}:
2947
2948 @enumerate
2949 @item
2950 create @var{foo}.h
2951 @item
2952 create @var{foo}.c
2953 @item
2954 add definitions of @code{syms_of_@var{foo}}, etc. to @file{@var{foo}.c}
2955 @item
2956 add declarations of @code{syms_of_@var{foo}}, etc. to @file{symsinit.h}
2957 @item
2958 add calls to @code{syms_of_@var{foo}}, etc. to @file{emacs.c}
2959 @item
2960 add definitions of macros like @code{CHECK_@var{FOO}} and
2961 @code{@var{FOO}P} to @file{@var{foo}.h}
2962 @item
2963 add the new type index to @code{enum lrecord_type}
2964 @item
add a @code{DEFINE_LRECORD_IMPLEMENTATION} call to @file{@var{foo}.c}
@item
add an @code{INIT_LRECORD_IMPLEMENTATION} call to
@code{syms_of_@var{foo}} in @file{@var{foo}.c}
2968 @end enumerate
2969
2970 @node A Summary of the Various XEmacs Modules, Allocation of Objects in XEmacs Lisp, Rules When Writing New C Code, Top
2971 @chapter A Summary of the Various XEmacs Modules
2972 @cindex modules, a summary of the various XEmacs
2973
2974   This is accurate as of XEmacs 20.0.
2975
2976 @menu
2977 * Low-Level Modules::
2978 * Basic Lisp Modules::
2979 * Modules for Standard Editing Operations::
2980 * Editor-Level Control Flow Modules::
2981 * Modules for the Basic Displayable Lisp Objects::
2982 * Modules for other Display-Related Lisp Objects::
2983 * Modules for the Redisplay Mechanism::
2984 * Modules for Interfacing with the File System::
2985 * Modules for Other Aspects of the Lisp Interpreter and Object System::
2986 * Modules for Interfacing with the Operating System::
2987 * Modules for Interfacing with X Windows::
2988 * Modules for Internationalization::
2989 @end menu
2990
2991 @node Low-Level Modules
2992 @section Low-Level Modules
2993 @cindex low-level modules
2994 @cindex modules, low-level
2995
2996 @example
2997 config.h
2998 @end example
2999
3000 This is automatically generated from @file{config.h.in} based on the
3001 results of configure tests and user-selected optional features and
3002 contains preprocessor definitions specifying the nature of the
3003 environment in which XEmacs is being compiled.
3004
3005
3006
3007 @example
3008 paths.h
3009 @end example
3010
3011 This is automatically generated from @file{paths.h.in} based on supplied
3012 configure values, and allows for non-standard installed configurations
3013 of the XEmacs directories.  It's currently broken, though.
3014
3015
3016
3017 @example
3018 emacs.c
3019 signal.c
3020 @end example
3021
3022 @file{emacs.c} contains @code{main()} and other code that performs the most
3023 basic environment initializations and handles shutting down the XEmacs
3024 process (this includes @code{kill-emacs}, the normal way that XEmacs is
3025 exited; @code{dump-emacs}, which is used during the build process to
3026 write out the XEmacs executable; @code{run-emacs-from-temacs}, which can
3027 be used to start XEmacs directly when temacs has finished loading all
3028 the Lisp code; and emergency code to handle crashes [XEmacs tries to
3029 auto-save all files before it crashes]).
3030
3031 Low-level code that directly interacts with the Unix signal mechanism,
3032 however, is in @file{signal.c}.  Note that this code does not handle system
3033 dependencies in interfacing to signals; that is handled using the
@file{syssignal.h} header file, described in
@ref{Modules for Interfacing with the Operating System}.
3035
3036
3037
3038 @example
3039 unexaix.c
3040 unexalpha.c
3041 unexapollo.c
3042 unexconvex.c
3043 unexec.c
3044 unexelf.c
3045 unexelfsgi.c
3046 unexencap.c
3047 unexenix.c
3048 unexfreebsd.c
3049 unexfx2800.c
3050 unexhp9k3.c
3051 unexhp9k800.c
3052 unexmips.c
3053 unexnext.c
3054 unexsol2.c
3055 unexsunos4.c
3056 @end example
3057
These modules contain code for dumping out the XEmacs executable on various
3059 different systems. (This process is highly machine-specific and
3060 requires intimate knowledge of the executable format and the memory map
3061 of the process.) Only one of these modules is actually used; this is
3062 chosen by @file{configure}.
3063
3064
3065
3066 @example
3067 ecrt0.c
3068 lastfile.c
3069 pre-crt0.c
3070 @end example
3071
3072 These modules are used in conjunction with the dump mechanism.  On some
3073 systems, an alternative version of the C startup code (the actual code
3074 that receives control from the operating system when the process is
3075 started, and which calls @code{main()}) is required so that the dumping
process works properly; @file{ecrt0.c} provides this.
3077
3078 @file{pre-crt0.c} and @file{lastfile.c} should be the very first and
3079 very last file linked, respectively. (Actually, this is not really true.
3080 @file{lastfile.c} should be after all Emacs modules whose initialized
3081 data should be made constant, and before all other Emacs files and all
3082 libraries.  In particular, the allocation modules @file{gmalloc.c},
3083 @file{alloca.c}, etc. are normally placed past @file{lastfile.c}, and
3084 all of the files that implement Xt widget classes @emph{must} be placed
3085 after @file{lastfile.c} because they contain various structures that
3086 must be statically initialized and into which Xt writes at various
3087 times.) @file{pre-crt0.c} and @file{lastfile.c} contain exported symbols
3088 that are used to determine the start and end of XEmacs' initialized
3089 data space when dumping.
3090
3091
3092
3093 @example
3094 alloca.c
3095 free-hook.c
3096 getpagesize.h
3097 gmalloc.c
3098 malloc.c
3099 mem-limits.h
3100 ralloc.c
3101 vm-limit.c
3102 @end example
3103
3104 These handle basic C allocation of memory.  @file{alloca.c} is an emulation of
3105 the stack allocation function @code{alloca()} on machines that lack
3106 this. (XEmacs makes extensive use of @code{alloca()} in its code.)
3107
3108 @file{gmalloc.c} and @file{malloc.c} are two implementations of the standard C
3109 functions @code{malloc()}, @code{realloc()} and @code{free()}.  They are
3110 often used in place of the standard system-provided @code{malloc()}
3111 because they usually provide a much faster implementation, at the
3112 expense of additional memory use.  @file{gmalloc.c} is a newer implementation
3113 that is much more memory-efficient for large allocations than @file{malloc.c},
3114 and should always be preferred if it works. (At one point, @file{gmalloc.c}
3115 didn't work on some systems where @file{malloc.c} worked; but this should be
3116 fixed now.)
3117
3118 @cindex relocating allocator
3119 @file{ralloc.c} is the @dfn{relocating allocator}.  It provides
3120 functions similar to @code{malloc()}, @code{realloc()} and @code{free()}
3121 that allocate memory that can be dynamically relocated in memory.  The
3122 advantage of this is that allocated memory can be shuffled around to
3123 place all the free memory at the end of the heap, and the heap can then
3124 be shrunk, releasing the memory back to the operating system.  The use
3125 of this can be controlled with the configure option @code{--rel-alloc};
3126 if enabled, memory allocated for buffers will be relocatable, so that if
3127 a very large file is visited and the buffer is later killed, the memory
3128 can be released to the operating system.  (The disadvantage of this
3129 mechanism is that it can be very slow.  On systems with the
3130 @code{mmap()} system call, the XEmacs version of @file{ralloc.c} uses
3131 this to move memory around without actually having to block-copy it,
3132 which can speed things up; but it can still cause noticeable performance
3133 degradation.)
3134
3135 @file{free-hook.c} contains some debugging functions for checking for invalid
3136 arguments to @code{free()}.
3137
3138 @file{vm-limit.c} contains some functions that warn the user when memory is
3139 getting low.  These are callback functions that are called by @file{gmalloc.c}
3140 and @file{malloc.c} at appropriate times.
3141
3142 @file{getpagesize.h} provides a uniform interface for retrieving the size of a
3143 page in virtual memory.  @file{mem-limits.h} provides a uniform interface for
3144 retrieving the total amount of available virtual memory.  Both are
similar in spirit to the @file{sys*.h} files described in
@ref{Modules for Interfacing with the Operating System}.
3146
3147
3148
3149 @example
3150 blocktype.c
3151 blocktype.h
3152 dynarr.c
3153 @end example
3154
3155 These implement a couple of basic C data types to facilitate memory
3156 allocation.  The @code{Blocktype} type efficiently manages the
3157 allocation of fixed-size blocks by minimizing the number of times that
3158 @code{malloc()} and @code{free()} are called.  It allocates memory in
3159 large chunks, subdivides the chunks into blocks of the proper size, and
3160 returns the blocks as requested.  When blocks are freed, they are placed
3161 onto a linked list, so they can be efficiently reused.  This data type
3162 is not much used in XEmacs currently, because it's a fairly new
3163 addition.
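
The chunk-and-free-list idea described above can be sketched as follows.
This is an illustration only and does not reproduce the actual
@code{Blocktype} interface: the names @code{struct pool},
@code{pool_init}, @code{pool_alloc}, @code{pool_free} and the constant
@code{BLOCKS_PER_CHUNK} are all assumptions made for this example.

@example
#include <stdlib.h>

#define BLOCKS_PER_CHUNK 64   /* assumed chunk granularity */

struct pool
@{
  size_t block_size;   /* size of each block, pointer-aligned */
  void *free_list;     /* singly linked list of free blocks */
  int chunks;          /* number of malloc () calls so far */
@};

void
pool_init (struct pool *p, size_t size)
@{
  /* Each free block stores a "next" pointer, so round the block
     size up to a non-zero multiple of the pointer size. */
  size_t unit = sizeof (void *);
  p->block_size = size ? (size + unit - 1) / unit * unit : unit;
  p->free_list = NULL;
  p->chunks = 0;
@}

void *
pool_alloc (struct pool *p)
@{
  void *block;

  if (p->free_list == NULL)
    @{
      /* Out of blocks: get one large chunk and subdivide it.
         Chunks are never returned to the system in this sketch. */
      char *chunk = malloc (p->block_size * BLOCKS_PER_CHUNK);
      int i;

      if (chunk == NULL)
        return NULL;
      p->chunks++;
      for (i = 0; i < BLOCKS_PER_CHUNK; i++)
        @{
          void *b = chunk + (size_t) i * p->block_size;
          *(void **) b = p->free_list;
          p->free_list = b;
        @}
    @}
  block = p->free_list;
  p->free_list = *(void **) block;
  return block;
@}

void
pool_free (struct pool *p, void *block)
@{
  /* Freed blocks go onto the list so they can be reused quickly. */
  *(void **) block = p->free_list;
  p->free_list = block;
@}
@end example

Under these assumptions, many calls to @code{pool_alloc} cost only one
@code{malloc()} per @code{BLOCKS_PER_CHUNK} blocks, and a block that has
just been freed is the next one handed out.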
3164
3165 @cindex dynamic array
3166 The @code{Dynarr} type implements a @dfn{dynamic array}, which is
3167 similar to a standard C array but has no fixed limit on the number of
3168 elements it can contain.  Dynamic arrays can hold elements of any type,
3169 and when you add a new element, the array automatically resizes itself
3170 if it isn't big enough.  Dynarrs are extensively used in the redisplay
3171 mechanism.
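
A growable array of this kind can be sketched in a few lines.  The
following is a simplified illustration restricted to @code{int}
elements; it is not the @code{Dynarr} interface from @file{dynarr.c},
and the names @code{struct int_dynarr}, @code{int_dynarr_add} and
@code{int_dynarr_free} are hypothetical.

@example
#include <stdlib.h>

struct int_dynarr
@{
  int *els;     /* the elements, or NULL while empty */
  size_t len;   /* number of elements in use */
  size_t cap;   /* number of elements allocated */
@};

/* Append VALUE, growing the storage if needed.
   Returns 0 on success and -1 if memory is exhausted. */
int
int_dynarr_add (struct int_dynarr *d, int value)
@{
  if (d->len == d->cap)
    @{
      /* Doubling keeps the amortized cost of an append constant. */
      size_t new_cap = d->cap ? d->cap * 2 : 8;
      int *new_els = realloc (d->els, new_cap * sizeof (*new_els));

      if (new_els == NULL)
        return -1;
      d->els = new_els;
      d->cap = new_cap;
    @}
  d->els[d->len++] = value;
  return 0;
@}

void
int_dynarr_free (struct int_dynarr *d)
@{
  free (d->els);
  d->els = NULL;
  d->len = d->cap = 0;
@}
@end example

The doubling policy is a design choice of this sketch, chosen so that
repeated appends do not call @code{realloc()} every time; the real
implementation may grow its storage differently.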
3172
3173
3174
3175 @example
3176 inline.c
3177 @end example
3178
3179 This module is used in connection with inline functions (available in
3180 some compilers).  Often, inline functions need to have a corresponding
3181 non-inline function that does the same thing.  This module is where they
3182 reside.  It contains no actual code, but defines some special flags that
3183 cause inline functions defined in header files to be rendered as actual
3184 functions.  It then includes all header files that contain any inline
3185 function definitions, so that each one gets a real function equivalent.
3186
3187
3188
3189 @example
3190 debug.c
3191 debug.h
3192 @end example
3193
3194 These functions provide a system for doing internal consistency checks
3195 during code development.  This system is not currently used; instead the
3196 simpler @code{assert()} macro is used along with the various checks
3197 provided by the @samp{--error-check-*} configuration options.
3198
3199
3200
3201 @example
3202 universe.h
3203 @end example
3204
3205 This is not currently used.
3206
3207
3208
3209 @node Basic Lisp Modules
3210 @section Basic Lisp Modules
3211 @cindex Lisp modules, basic
3212 @cindex modules, basic Lisp
3213
3214 @example
3215 lisp-disunion.h
3216 lisp-union.h
3217 lisp.h
3218 lrecord.h
3219 symsinit.h
3220 @end example
3221
3222 These are the basic header files for all XEmacs modules.  Each module
3223 includes @file{lisp.h}, which brings the other header files in.
3224 @file{lisp.h} contains the definitions of the structures and extractor
3225 and constructor macros for the basic Lisp objects and various other
3226 basic definitions for the Lisp environment, as well as some
3227 general-purpose definitions (e.g. @code{min()} and @code{max()}).
3228 @file{lisp.h} includes either @file{lisp-disunion.h} or
3229 @file{lisp-union.h}, depending on whether @code{USE_UNION_TYPE} is
3230 defined.  These files define the typedef of the Lisp object itself (as
3231 described above) and the low-level macros that hide the actual
3232 implementation of the Lisp object.  All extractor and constructor macros
3233 for particular types of Lisp objects are defined in terms of these
3234 low-level macros.
3235
3236 As a general rule, all typedefs should go into the typedefs section of
3237 @file{lisp.h} rather than into a module-specific header file even if the
3238 structure is defined elsewhere.  This allows function prototypes that
3239 use the typedef to be placed into other header files.  Forward structure
3240 declarations (i.e. a simple declaration like @code{struct foo;} where
3241 the structure itself is defined elsewhere) should be placed into the
3242 typedefs section as necessary.
3243
3244 @file{lrecord.h} contains the basic structures and macros that implement
3245 all record-type Lisp objects---i.e. all objects whose type is a field
3246 in their C structure, which includes all objects except the few most
3247 basic ones.
3248
3249 @file{lisp.h} contains prototypes for most of the exported functions in
3250 the various modules.  Lisp primitives defined using @code{DEFUN} that
3251 need to be called by C code should be declared using @code{EXFUN}.
3252 Other function prototypes should be placed either into the appropriate
section of @file{lisp.h}, or into a module-specific header file,
depending on how general-purpose the function is and whether it has
special-purpose argument types requiring definitions not in
@file{lisp.h}.  All initialization functions are prototyped in
3257 @file{symsinit.h}.
3258
3259
3260
3261 @example
3262 alloc.c
3263 @end example
3264
3265 The large module @file{alloc.c} implements all of the basic allocation and
3266 garbage collection for Lisp objects.  The most commonly used Lisp
3267 objects are allocated in chunks, similar to the Blocktype data type
3268 described above; others are allocated in individually @code{malloc()}ed
3269 blocks.  This module provides the foundation on which all other aspects
3270 of the Lisp environment sit, and is the first module initialized at
3271 startup.
3272
3273 Note that @file{alloc.c} provides a series of generic functions that are
3274 not dependent on any particular object type, and interfaces to
3275 particular types of objects using a standardized interface of
3276 type-specific methods.  This scheme is a fundamental principle of
3277 object-oriented programming and is heavily used throughout XEmacs.  The
3278 great advantage of this is that it allows for a clean separation of
3279 functionality into different modules---new classes of Lisp objects, new
3280 event interfaces, new device types, new stream interfaces, etc. can be
3281 added transparently without affecting code anywhere else in XEmacs.
3282 Because the different subsystems are divided into general and specific
3283 code, adding a new subtype within a subsystem will in general not
3284 require changes to the generic subsystem code or affect any of the other
3285 subtypes in the subsystem; this provides a great deal of robustness to
3286 the XEmacs code.
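
The split between generic code and type-specific methods can be
illustrated with a table of function pointers.  The sketch below is a
deliberately tiny model and does not reproduce the real lrecord
structures; every name in it (@code{struct object_methods},
@code{generic_hash}, @code{make_int}, @code{make_pair}, and so on) is an
assumption made for this example.

@example
struct object;

/* One method table per object type. */
struct object_methods
@{
  const char *name;
  long (*hash) (const struct object *obj);
@};

/* Every object starts with a pointer to its type's methods. */
struct object
@{
  const struct object_methods *methods;
@};

struct int_object
@{
  struct object header;
  long value;
@};

struct pair_object
@{
  struct object header;
  long car;
  long cdr;
@};

static long
int_hash (const struct object *obj)
@{
  return ((const struct int_object *) obj)->value;
@}

static long
pair_hash (const struct object *obj)
@{
  const struct pair_object *p = (const struct pair_object *) obj;
  return p->car * 31 + p->cdr;
@}

static const struct object_methods int_methods = @{ "int", int_hash @};
static const struct object_methods pair_methods = @{ "pair", pair_hash @};

/* Generic code: knows nothing about any particular type. */
long
generic_hash (const struct object *obj)
@{
  return obj->methods->hash (obj);
@}

struct int_object
make_int (long value)
@{
  struct int_object o;
  o.header.methods = &int_methods;
  o.value = value;
  return o;
@}

struct pair_object
make_pair (long car, long cdr)
@{
  struct pair_object o;
  o.header.methods = &pair_methods;
  o.car = car;
  o.cdr = cdr;
  return o;
@}
@end example

Adding a third object type in this model means writing a new method
table and constructor; @code{generic_hash} does not change, which is the
property the paragraph above describes.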
3287
3288
3289 @example
3290 eval.c
3291 backtrace.h
3292 @end example
3293
3294 This module contains all of the functions to handle the flow of control.
3295 This includes the mechanisms of defining functions, calling functions,
3296 traversing stack frames, and binding variables; the control primitives
3297 and other special forms such as @code{while}, @code{if}, @code{eval},
3298 @code{let}, @code{and}, @code{or}, @code{progn}, etc.; handling of
3299 non-local exits, unwind-protects, and exception handlers; entering the
3300 debugger; methods for the subr Lisp object type; etc.  It does
3301 @emph{not} include the @code{read} function, the @code{print} function,
3302 or the handling of symbols and obarrays.
3303
3304 @file{backtrace.h} contains some structures related to stack frames and the
3305 flow of control.
3306
3307
3308
3309 @example
3310 lread.c
3311 @end example
3312
3313 This module implements the Lisp reader and the @code{read} function,
3314 which converts text into Lisp objects, according to the read syntax of
3315 the objects, as described above.  This is similar to the parser that is
3316 a part of all compilers.
3317
3318
3319
3320 @example
3321 print.c
3322 @end example
3323
3324 This module implements the Lisp print mechanism and the @code{print}
3325 function and related functions.  This is the inverse of the Lisp reader
3326 -- it converts Lisp objects to a printed, textual representation.
3327 (Hopefully something that can be read back in using @code{read} to get
3328 an equivalent object.)
3329
3330
3331
3332 @example
3333 general.c
3334 symbols.c
3335 symeval.h
3336 @end example
3337
3338 @file{symbols.c} implements the handling of symbols, obarrays, and
3339 retrieving the values of symbols.  Much of the code is devoted to
3340 handling the special @dfn{symbol-value-magic} objects that define
3341 special types of variables---this includes buffer-local variables,
3342 variable aliases, variables that forward into C variables, etc.  This
3343 module is initialized extremely early (right after @file{alloc.c}),
3344 because it is here that the basic symbols @code{t} and @code{nil} are
3345 created, and those symbols are used everywhere throughout XEmacs.
3346
3347 @file{symeval.h} contains the definitions of symbol structures and the
3348 @code{DEFVAR_LISP()} and related macros for declaring variables.
3349
3350
3351
3352 @example
3353 data.c
3354 floatfns.c
3355 fns.c
3356 @end example
3357
3358 These modules implement the methods and standard Lisp primitives for all
3359 the basic Lisp object types other than symbols (which are described
3360 above).  @file{data.c} contains all the predicates (primitives that return
3361 whether an object is of a particular type); the integer arithmetic
3362 functions; and the basic accessor and mutator primitives for the various
3363 object types.  @file{fns.c} contains all the standard predicates for working
3364 with sequences (where, abstractly speaking, a sequence is an ordered set
3365 of objects, and can be represented by a list, string, vector, or
bit-vector); it also contains @code{equal}, perhaps on the grounds that
the bulk of the operation of @code{equal} is comparing sequences.
3368 @file{floatfns.c} contains methods and primitives for floats and floating-point
3369 arithmetic.
3370
3371
3372
3373 @example
3374 bytecode.c
3375 bytecode.h
3376 @end example
3377
3378 @file{bytecode.c} implements the byte-code interpreter and
3379 compiled-function objects, and @file{bytecode.h} contains associated
3380 structures.  Note that the byte-code @emph{compiler} is written in Lisp.
3381
3382
3383
3384
3385 @node Modules for Standard Editing Operations
3386 @section Modules for Standard Editing Operations
3387 @cindex modules for standard editing operations
3388 @cindex editing operations, modules for standard
3389
3390 @example
3391 buffer.c
3392 buffer.h
3393 bufslots.h
3394 @end example
3395
3396 @file{buffer.c} implements the @dfn{buffer} Lisp object type.  This
3397 includes functions that create and destroy buffers; retrieve buffers by
3398 name or by other properties; manipulate lists of buffers (remember that
3399 buffers are permanent objects and stored in various ordered lists);
3400 retrieve or change buffer properties; etc.  It also contains the
3401 definitions of all the built-in buffer-local variables (which can be
3402 viewed as buffer properties).  It does @emph{not} contain code to
3403 manipulate buffer-local variables (that's in @file{symbols.c}, described
3404 above); or code to manipulate the text in a buffer.
3405
3406 @file{buffer.h} defines the structures associated with a buffer and the various
3407 macros for retrieving text from a buffer and special buffer positions
3408 (e.g. @code{point}, the default location for text insertion).  It also
3409 contains macros for working with buffer positions and converting between
3410 their representations as character offsets and as byte offsets (under
3411 MULE, they are different, because characters can be multi-byte).  It is
3412 one of the largest header files.
3413
3414 @file{bufslots.h} defines the fields in the buffer structure that correspond to
3415 the built-in buffer-local variables.  It is its own header file because
3416 it is included many times in @file{buffer.c}, as a way of iterating over all
3417 the built-in buffer-local variables.
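
The technique of expanding one list of slots several times can be
sketched in a single file.  Note the substitution: where
@file{bufslots.h} is a separate file included repeatedly, this sketch
uses a list macro to the same effect so that it stays self-contained.
The slot names and the identifiers @code{BUFFER_SLOTS},
@code{struct buffer_sketch} and @code{slot_index} are hypothetical.

@example
#include <string.h>

/* The single authoritative list of slots. */
#define BUFFER_SLOTS(SLOT) \
  SLOT (fill_column)       \
  SLOT (tab_width)         \
  SLOT (left_margin)

/* First expansion: one structure field per slot. */
struct buffer_sketch
@{
#define DECLARE_FIELD(name) int name;
  BUFFER_SLOTS (DECLARE_FIELD)
#undef DECLARE_FIELD
@};

/* Second expansion: a table of slot names, in the same order. */
static const char *const slot_names[] =
@{
#define SLOT_NAME(name) #name,
  BUFFER_SLOTS (SLOT_NAME)
#undef SLOT_NAME
@};

/* Return the position of NAME in the slot list, or -1. */
int
slot_index (const char *name)
@{
  size_t i;

  for (i = 0; i < sizeof (slot_names) / sizeof (slot_names[0]); i++)
    if (strcmp (slot_names[i], name) == 0)
      return (int) i;
  return -1;
@}
@end example

Because both expansions come from the same list, adding a slot in one
place updates the structure and the name table together.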
3418
3419
3420
3421 @example
3422 insdel.c
3423 insdel.h
3424 @end example
3425
3426 @file{insdel.c} contains low-level functions for inserting and deleting text in
3427 a buffer, keeping track of changed regions for use by redisplay, and
3428 calling any before-change and after-change functions that may have been
3429 registered for the buffer.  It also contains the actual functions that
3430 convert between byte offsets and character offsets.
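
To show why the two kinds of offset differ, here is a minimal sketch of
such a conversion.  It assumes a UTF-8-style encoding, in which every
byte of the form @code{10xxxxxx} continues the preceding character; that
is @emph{not} the internal Mule encoding, and the functions
@code{char_to_byte_offset} and @code{byte_to_char_offset} are
hypothetical names rather than anything defined in @file{insdel.c}.

@example
#include <stddef.h>

/* Assumption: UTF-8-style text, continuation bytes are 10xxxxxx. */
#define IS_CONTINUATION(b) (((b) & 0xC0) == 0x80)

/* Byte offset at which character number CHARPOS starts. */
size_t
char_to_byte_offset (const unsigned char *text, size_t nbytes,
                     size_t charpos)
@{
  size_t bytepos = 0;

  while (charpos > 0 && bytepos < nbytes)
    @{
      bytepos++;   /* step over the leading byte */
      while (bytepos < nbytes && IS_CONTINUATION (text[bytepos]))
        bytepos++;
      charpos--;
    @}
  return bytepos;
@}

/* Number of characters that start before byte offset BYTEPOS. */
size_t
byte_to_char_offset (const unsigned char *text, size_t nbytes,
                     size_t bytepos)
@{
  size_t charpos = 0;
  size_t i;

  for (i = 0; i < bytepos && i < nbytes; i++)
    if (!IS_CONTINUATION (text[i]))
      charpos++;
  return charpos;
@}
@end example

Both functions scan linearly from the start of the text, which is the
simplest correct approach but is slow for large buffers; a real
implementation would need some way to avoid rescanning.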
3431
3432 @file{insdel.h} contains associated headers.
3433
3434
3435
3436 @example
3437 marker.c
3438 @end example
3439
3440 This module implements the @dfn{marker} Lisp object type, which
3441 conceptually is a pointer to a text position in a buffer that moves
3442 around as text is inserted and deleted, so as to remain in the same
3443 relative position.  This module doesn't actually move the markers around
3444 -- that's handled in @file{insdel.c}.  This module just creates them and
3445 implements the primitives for working with them.  As markers are simple
3446 objects, this does not entail much.
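
The bookkeeping that keeps a marker in the same relative position can be
sketched as follows.  This models markers as plain integer positions in
an array, which is an assumption made for illustration; the function
names are hypothetical and the real logic lives in @file{insdel.c}.
The sketch also assumes that a marker sitting exactly at the insertion
point stays in front of the inserted text.

@example
#include <stddef.h>

/* LEN characters were inserted at position POS. */
void
adjust_markers_for_insert (long *markers, size_t n, long pos, long len)
@{
  size_t i;

  for (i = 0; i < n; i++)
    if (markers[i] > pos)
      markers[i] += len;
@}

/* The characters in the range [FROM, TO) were deleted. */
void
adjust_markers_for_delete (long *markers, size_t n, long from, long to)
@{
  size_t i;

  for (i = 0; i < n; i++)
    @{
      if (markers[i] > to)
        markers[i] -= to - from;   /* text after the gap shifts down */
      else if (markers[i] > from)
        markers[i] = from;         /* inside the gap: collapse */
    @}
@}
@end example

For example, with markers at 2, 5 and 9, inserting three characters at
position 5 leaves them at 2, 5 and 12, and then deleting the range from
1 to 6 leaves them at 1, 1 and 7.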
3447
3448 Note that the standard arithmetic primitives (e.g. @code{+}) accept
3449 markers in place of integers and automatically substitute the value of
3450 @code{marker-position} for the marker, i.e. an integer describing the
3451 current buffer position of the marker.
3452
3453
3454
3455 @example
3456 extents.c
3457 extents.h
3458 @end example
3459
3460 This module implements the @dfn{extent} Lisp object type, which is like
3461 a marker that works over a range of text rather than a single position.
3462 Extents are also much more complex and powerful than markers and have a
3463 more efficient (and more algorithmically complex) implementation.  The
3464 implementation is described in detail in comments in @file{extents.c}.
3465
3466 The code in @file{extents.c} works closely with @file{insdel.c} so that
3467 extents are properly moved around as text is inserted and deleted.
3468 There is also code in @file{extents.c} that provides information needed
3469 by the redisplay mechanism for efficient operation. (Remember that
3470 extents can have display properties that affect [sometimes drastically,
3471 as in the @code{invisible} property] the display of the text they
3472 cover.)
3473
3474
3475
3476 @example
3477 editfns.c
3478 @end example
3479
3480 @file{editfns.c} contains the standard Lisp primitives for working with
3481 a buffer's text, and calls the low-level functions in @file{insdel.c}.
3482 It also contains primitives for working with @code{point} (the default
3483 buffer insertion location).
3484
3485 @file{editfns.c} also contains functions for retrieving various
3486 characteristics from the external environment: the current time, the
3487 process ID of the running XEmacs process, the name of the user who ran
3488 this XEmacs process, etc.  It's not clear why this code is in
3489 @file{editfns.c}.
3490
3491
3492
3493 @example
3494 callint.c
3495 cmds.c
3496 commands.h
3497 @end example
3498
3499 @cindex interactive
3500 These modules implement the basic @dfn{interactive} commands,
3501 i.e. user-callable functions.  Commands, as opposed to other functions,
3502 have special ways of getting their parameters interactively (by querying
3503 the user), as opposed to having them passed in a normal function
3504 invocation.  Many commands are not really meant to be called from other
3505 Lisp functions, because they modify global state in a way that's often
3506 undesired as part of other Lisp functions.
3507
3508 @file{callint.c} implements the mechanism for querying the user for
3509 parameters and calling interactive commands.  The bulk of this module is
3510 code that parses the interactive spec that is supplied with an
3511 interactive command.
3512
3513 @file{cmds.c} implements the basic, most commonly used editing commands:
3514 commands to move around the current buffer and insert and delete
3515 characters.  These commands are implemented using the Lisp primitives
3516 defined in @file{editfns.c}.
3517
3518 @file{commands.h} contains associated structure definitions and prototypes.
3519
3520
3521
3522 @example
3523 regex.c
3524 regex.h
3525 search.c
3526 @end example
3527
3528 @file{search.c} implements the Lisp primitives for searching for text in
3529 a buffer, and some of the low-level algorithms for doing this.  In
3530 particular, the fast fixed-string Boyer-Moore search algorithm is
3531 implemented in @file{search.c}.  The low-level algorithms for doing
3532 regular-expression searching, however, are implemented in @file{regex.c}
3533 and @file{regex.h}.  These two modules are largely independent of
3534 XEmacs, and are similar to (and based upon) the regular-expression
3535 routines used in @file{grep} and other GNU utilities.
3536
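As a rough illustration of the skip-table idea behind Boyer-Moore
searching, here is a minimal sketch of the simplified Horspool variant.
It is not the code in @file{search.c}: the name @code{horspool_search}
is hypothetical, and the sketch ignores case folding, multibyte text,
and backward searching.

```c
#include <stddef.h>
#include <limits.h>

/* Hypothetical helper: return the offset of the first occurrence of
   pat[0..m-1] in text[0..n-1], or -1 if there is none. */
static long
horspool_search (const unsigned char *text, size_t n,
                 const unsigned char *pat, size_t m)
{
  size_t skip[UCHAR_MAX + 1];
  size_t i, pos;

  if (m == 0)
    return 0;
  if (m > n)
    return -1;

  /* By default a mismatch lets us jump a whole pattern length. */
  for (i = 0; i <= UCHAR_MAX; i++)
    skip[i] = m;
  /* Characters in the pattern (except its last one) allow shorter jumps. */
  for (i = 0; i + 1 < m; i++)
    skip[pat[i]] = m - 1 - i;

  pos = 0;
  while (pos + m <= n)
    {
      size_t j = m;
      /* Compare right to left within the current window. */
      while (j > 0 && text[pos + j - 1] == pat[j - 1])
        j--;
      if (j == 0)
        return (long) pos;
      /* Shift according to the text character under the window's end. */
      pos += skip[text[pos + m - 1]];
    }
  return -1;
}
```

The point of the table is that most windows are rejected after a single
comparison and skipped in one step, which is why fixed-string searching
can be much faster than a character-by-character scan.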
3537
3538
3539 @example
3540 doprnt.c
3541 @end example
3542
3543 @file{doprnt.c} implements formatted-string processing, similar to the
3544 @code{printf()} function in C.
3545
3546
3547
3548 @example
3549 undo.c
3550 @end example
3551
3552 This module implements the undo mechanism for tracking buffer changes.
3553 Most of this could be implemented in Lisp.
3554
3555
3556
3557 @node Editor-Level Control Flow Modules
3558 @section Editor-Level Control Flow Modules
3559 @cindex control flow modules, editor-level
3560 @cindex modules, editor-level control flow
3561
3562 @example
3563 event-Xt.c
3564 event-msw.c
3565 event-stream.c
3566 event-tty.c
3567 events-mod.h
3568 gpmevent.c
3569 gpmevent.h
3570 events.c
3571 events.h
3572 @end example
3573
3574 These implement the handling of events (user input and other system
3575 notifications).
3576
3577 @file{events.c} and @file{events.h} define the @dfn{event} Lisp object
3578 type and primitives for manipulating it.
3579
3580 @file{event-stream.c} implements the basic functions for working with
3581 event queues, dispatching an event by looking it up in relevant keymaps
3582 and such, and handling timeouts; this includes the primitives
3583 @code{next-event} and @code{dispatch-event}, as well as related
3584 primitives such as @code{sit-for}, @code{sleep-for}, and
3585 @code{accept-process-output}. (@file{event-stream.c} is one of the
3586 hairiest and trickiest modules in XEmacs.  Beware!  You can easily mess
3587 things up here.)
3588
3589 @file{event-Xt.c} and @file{event-tty.c} implement the low-level
3590 interfaces onto retrieving events from Xt (the X toolkit) and from TTY's
3591 (using @code{read()} and @code{select()}), respectively.  The event
3592 interface enforces a clean separation between the specific code for
3593 interfacing with the operating system and the generic code for working
3594 with events, by defining an API of basic, low-level event methods;
3595 @file{event-Xt.c} and @file{event-tty.c} are two different
3596 implementations of this API.  To add support for a new operating system
3597 (e.g. NeXTstep), one merely needs to provide another implementation of
3598 those API functions.
3599
3600 Note that the choice of whether to use @file{event-Xt.c} or
3601 @file{event-tty.c} is made at compile time!  Or at the very latest, it
3602 is made at startup time.  @file{event-Xt.c} handles events for
3603 @emph{both} X and TTY frames; @file{event-tty.c} is only used when X
3604 support is not compiled into XEmacs.  The reason for this is that there
3605 is only one event loop in XEmacs: thus, it needs to be able to receive
3606 events from all different kinds of frames.
3607
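The separation between generic event code and a per-backend set of
low-level event methods can be pictured as a table of function pointers.
The following is only a sketch under that assumption: every name in it
(@code{toy_event}, @code{toy_event_methods}, and so on) is hypothetical
and does not correspond to the actual declarations in the XEmacs
sources.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified event record. */
struct toy_event
{
  int type;      /* e.g. 1 = key press */
  int value;     /* e.g. the character typed */
};

/* The low-level event methods: one such table per backend. */
struct toy_event_methods
{
  const char *backend_name;
  int (*event_pending_p) (void);
  void (*next_event) (struct toy_event *out);
};

/* A stand-in TTY backend; a real one would use read() and select(). */
static int
toy_tty_pending (void)
{
  return 1;
}

static void
toy_tty_next (struct toy_event *out)
{
  out->type = 1;
  out->value = 'x';
}

static const struct toy_event_methods toy_tty_methods =
  { "tty", toy_tty_pending, toy_tty_next };

/* Generic code sees only whichever method table was installed. */
static const struct toy_event_methods *toy_current = &toy_tty_methods;

/* Generic function: fetch one event if any is pending. */
static int
toy_fetch_event (struct toy_event *out)
{
  if (!toy_current->event_pending_p ())
    return 0;
  toy_current->next_event (out);
  return 1;
}
```

In this picture, supporting another window system means writing one more
method table and pointing @code{toy_current} at it; the generic function
does not change.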
3608
3609
3610 @example
3611 keymap.c
3612 keymap.h
3613 @end example
3614
3615 @file{keymap.c} and @file{keymap.h} define the @dfn{keymap} Lisp object
3616 type and associated methods and primitives. (Remember that keymaps are
3617 objects that associate event descriptions with functions to be called to
3618 ``execute'' those events; @code{dispatch-event} looks up events in the
3619 relevant keymaps.)
3620
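The association between event descriptions and the functions that
execute them can be sketched as a simple lookup table.  This is a toy
model, assuming keys are single 8-bit codes and commands take no
arguments; real keymaps are Lisp objects and considerably richer.  All
names below are hypothetical.

```c
#include <stddef.h>

typedef void (*toy_command) (void);

/* Hypothetical keymap: one slot per 8-bit key code, NULL if unbound. */
struct toy_keymap
{
  toy_command binding[256];
};

/* Two trivial commands that act on some editor state. */
static int toy_counter;
static void toy_increment (void) { toy_counter++; }
static void toy_reset (void) { toy_counter = 0; }

static void
toy_define_key (struct toy_keymap *map, unsigned char key, toy_command cmd)
{
  map->binding[key] = cmd;
}

/* Look the key up and run its command; return 0 if the key is unbound. */
static int
toy_dispatch_key (const struct toy_keymap *map, unsigned char key)
{
  toy_command cmd = map->binding[key];
  if (cmd == NULL)
    return 0;
  cmd ();
  return 1;
}
```

A caller would bind keys with @code{toy_define_key} and then feed each
incoming key code to @code{toy_dispatch_key}, which is the lookup step
that dispatching an event performs in this simplified model.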
3621
3622
3623 @example
3624 cmdloop.c
3625 @end example
3626
3627 @file{cmdloop.c} contains functions that implement the actual editor
3628 command loop---i.e. the event loop that cyclically retrieves and
3629 dispatches events.  This code is also rather tricky, just like
3630 @file{event-stream.c}.
3631
3632
3633
3634 @example
3635 macros.c
3636 macros.h
3637 @end example
3638
3639 These two modules contain the basic code for defining keyboard macros.
3640 These functions don't actually do much; most of the code that handles keyboard
3641 macros is mixed in with the event-handling code in @file{event-stream.c}.
3642
3643
3644
3645 @example
3646 minibuf.c
3647 @end example
3648
3649 This contains some miscellaneous code related to the minibuffer (most of
3650 the minibuffer code was moved into Lisp by Richard Mlynarik).  This
3651 includes the primitives for completion (although filename completion is
3652 in @file{dired.c}), the lowest-level interface to the minibuffer (if the
3653 command loop were cleaned up, this too could be in Lisp), and code for
3654 dealing with the echo area (this, too, was mostly moved into Lisp, and
3655 the only code remaining is code to call out to Lisp or provide simple
3656 bootstrapping implementations early in temacs, before the echo-area Lisp
3657 code is loaded).
3658
3659
3660
3661 @node Modules for the Basic Displayable Lisp Objects
3662 @section Modules for the Basic Displayable Lisp Objects
3663 @cindex modules for the basic displayable Lisp objects
3664 @cindex displayable Lisp objects, modules for the basic
3665 @cindex Lisp objects, modules for the basic displayable
3666 @cindex objects, modules for the basic displayable Lisp
3667
3668 @example
3669 console-msw.c
3670 console-msw.h
3671 console-stream.c
3672 console-stream.h
3673 console-tty.c
3674 console-tty.h
3675 console-x.c
3676 console-x.h
3677 console.c
3678 console.h
3679 @end example
3680
3681 These modules implement the @dfn{console} Lisp object type.  A console
3682 contains multiple display devices, but only one keyboard and mouse.
3683 Most of the time, a console will contain exactly one device.
3684
3685 Consoles are the top of a Lisp object inclusion hierarchy.  Consoles
3686 contain devices, which contain frames, which contain windows.
3687
3688
3689
3690 @example
3691 device-msw.c
3692 device-tty.c
3693 device-x.c
3694 device.c
3695 device.h
3696 @end example
3697
3698 These modules implement the @dfn{device} Lisp object type.  This
3699 abstracts a particular screen or connection on which frames are
3700 displayed.  As with Lisp objects, event interfaces, and other
3701 subsystems, the device code is separated into a generic component, which
3702 contains a standardized interface (in the form of a set of methods), and
3703 components that implement that interface for particular device types.
3704
3705 The device subsystem defines all the methods and provides method
3706 services for not only device operations but also for the frame, window,
3707 menubar, scrollbar, toolbar, and other displayable-object subsystems.
3708 The reason for this is that all of these subsystems have the same
3709 subtypes (X, TTY, NeXTstep, Microsoft Windows, etc.) as devices do.
3710
3711
3712
3713 @example
3714 frame-msw.c
3715 frame-tty.c
3716 frame-x.c
3717 frame.c
3718 frame.h
3719 @end example
3720
3721 Each device contains one or more frames in which objects (e.g. text) are
3722 displayed.  A frame corresponds to a window in the window system;
3723 usually this is a top-level window but it could potentially be one of a
3724 number of overlapping child windows within a top-level window, using the
3725 MDI (Multiple Document Interface) protocol in Microsoft Windows or a
3726 similar scheme.
3727
3728 The @file{frame-*} files implement the @dfn{frame} Lisp object type and
3729 provide the generic and device-type-specific operations on frames
3730 (e.g. raising, lowering, resizing, moving, etc.).
3731
3732
3733
3734 @example
3735 window.c
3736 window.h
3737 @end example
3738
3739 @cindex window (in Emacs)
3740 @cindex pane
3741 Each frame consists of one or more non-overlapping @dfn{windows} (better
3742 known as @dfn{panes} in standard window-system terminology) in which a
3743 buffer's text can be displayed.  Windows can also have scrollbars
3744 displayed around their edges.
3745
3746 @file{window.c} and @file{window.h} implement the @dfn{window} Lisp
3747 object type and provide code to manage windows.  Since windows have no
3748 associated resources in the window system (the window system knows only
3749 about the frame; no child windows or anything are used for XEmacs
3750 windows), there is no device-type-specific code here; all of that code
3751 is part of the redisplay mechanism or the code for particular object
3752 types such as scrollbars.
3753
3754
3755
3756 @node Modules for other Display-Related Lisp Objects
3757 @section Modules for other Display-Related Lisp Objects
3758 @cindex modules for other display-related Lisp objects
3759 @cindex display-related Lisp objects, modules for other
3760 @cindex Lisp objects, modules for other display-related
3761
3762 @example
3763 faces.c
3764 faces.h
3765 @end example
3766
3767
3768
3769 @example
3770 bitmaps.h
3771 glyphs-eimage.c
3772 glyphs-msw.c
3773 glyphs-msw.h
3774 glyphs-widget.c
3775 glyphs-x.c
3776 glyphs-x.h
3777 glyphs.c
3778 glyphs.h
3779 @end example
3780
3781
3782
3783 @example
3784 objects-msw.c
3785 objects-msw.h
3786 objects-tty.c
3787 objects-tty.h
3788 objects-x.c
3789 objects-x.h
3790 objects.c
3791 objects.h
3792 @end example
3793
3794
3795
3796 @example
3797 menubar-msw.c
3798 menubar-msw.h
3799 menubar-x.c
3800 menubar.c
3801 menubar.h
3802 @end example
3803
3804
3805
3806 @example
3807 scrollbar-msw.c
3808 scrollbar-msw.h
3809 scrollbar-x.c
3810 scrollbar-x.h
3811 scrollbar.c
3812 scrollbar.h
3813 @end example
3814
3815
3816
3817 @example
3818 toolbar-msw.c
3819 toolbar-x.c
3820 toolbar.c
3821 toolbar.h
3822 @end example
3823
3824
3825
3826 @example
3827 font-lock.c
3828 @end example
3829
3830 This file provides C support for syntax highlighting---i.e.
3831 highlighting different syntactic constructs of a source file in
3832 different colors, for easy reading.  The C support is provided so that
3833 this is fast.
3834
3835
3836
3837 @example
3838 dgif_lib.c
3839 gif_err.c
3840 gif_lib.h
3841 gifalloc.c
3842 @end example
3843
3844 These modules decode GIF-format image files, for use with glyphs.
3845 These files were removed due to Unisys patent infringement concerns.
3846
3847
3848
3849 @node Modules for the Redisplay Mechanism
3850 @section Modules for the Redisplay Mechanism
3851 @cindex modules for the redisplay mechanism
3852 @cindex redisplay mechanism, modules for the
3853
3854 @example
3855 redisplay-output.c
3856 redisplay-msw.c
3857 redisplay-tty.c
3858 redisplay-x.c
3859 redisplay.c
3860 redisplay.h
3861 @end example
3862
3863 These files provide the redisplay mechanism.  As with many other
3864 subsystems in XEmacs, there is a clean separation between the general
3865 and device-specific support.
3866
3867 @file{redisplay.c} contains the bulk of the redisplay engine.  These
3868 functions update the redisplay structures (which describe how the screen
3869 is to appear) to reflect any changes made to the state of any
3870 displayable objects (buffer, frame, window, etc.) since the last time
3871 that redisplay was called.  These functions are highly optimized to
3872 avoid doing more work than necessary (since redisplay is called
3873 extremely often and is potentially a huge time sink), and depend heavily
3874 on notifications from the objects themselves that changes have occurred,
3875 so that redisplay doesn't explicitly have to check each possible object.
3876 The redisplay mechanism also contains a great deal of caching to further
3877 speed things up; some of this caching is contained within the various
3878 displayable objects.
3879
3880 @file{redisplay-output.c} goes through the redisplay structures and converts
3881 them into calls to device-specific methods to actually output the screen
3882 changes.
3883
3884 @file{redisplay-x.c} and @file{redisplay-tty.c} are two implementations
3885 of these redisplay output methods, for X frames and TTY frames,
3886 respectively.
3887
3888
3889
3890 @example
3891 indent.c
3892 @end example
3893
3894 This module contains various functions and Lisp primitives for
3895 converting between buffer positions and screen positions.  These
3896 functions call the redisplay mechanism to do most of the work, and then
3897 examine the redisplay structures to get the necessary information.  This
3898 module needs work.
3899
3900
3901
3902 @example
3903 termcap.c
3904 terminfo.c
3905 tparam.c
3906 @end example
3907
3908 These files contain functions for working with the termcap (BSD-style)
3909 and terminfo (System V style) databases of terminal capabilities and
3910 escape sequences, used when XEmacs is displaying in a TTY.
3911
3912
3913
3914 @example
3915 cm.c
3916 cm.h
3917 @end example
3918
3919 These files provide some miscellaneous TTY-output functions and should
3920 probably be merged into @file{redisplay-tty.c}.
3921
3922
3923
3924 @node Modules for Interfacing with the File System
3925 @section Modules for Interfacing with the File System
3926 @cindex modules for interfacing with the file system
3927 @cindex interfacing with the file system, modules for
3928 @cindex file system, modules for interfacing with the
3929
3930 @example
3931 lstream.c
3932 lstream.h
3933 @end example
3934
3935 These modules implement the @dfn{stream} Lisp object type.  This is an
3936 internal-only Lisp object that implements a generic buffering stream.
3937 The idea is to provide a uniform interface onto all sources and sinks of
3938 data, including file descriptors, stdio streams, chunks of memory, Lisp
3939 buffers, Lisp strings, etc.  That way, I/O functions can be written to
3940 the stream interface and can transparently handle all possible sources
3941 and sinks.  (For example, the @code{read} function can read data from a
3942 file, a string, a buffer, or even a function that is called repeatedly
3943 to return data, without worrying about where the data is coming from or
3944 what-size chunks it is returned in.)
3945
3946 @cindex lstream
3947 Note that in the C code, streams are called @dfn{lstreams} (for ``Lisp
3948 streams'') to distinguish them from other kinds of streams, e.g. stdio
3949 streams and C++ I/O streams.
3950
3951 Similar to other subsystems in XEmacs, lstreams are separated into
3952 generic functions and a set of methods for the different types of
3953 lstreams.  @file{lstream.c} provides implementations of many different
3954 types of streams; others are provided, e.g., in @file{file-coding.c}.
3955
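The split between generic stream functions and per-type methods might
look roughly like the following sketch.  It assumes a stream is just a
read method plus private data; the names (@code{toy_stream},
@code{toy_memory_reader}, @code{toy_read_all}) are hypothetical and are
not the lstream API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stream: generic code calls `reader'; `data' belongs to
   whichever kind of stream installed it. */
struct toy_stream
{
  size_t (*reader) (struct toy_stream *s, char *buf, size_t size);
  void *data;
};

/* One concrete stream type: reads from a fixed block of memory. */
struct toy_memory_source
{
  const char *bytes;
  size_t length;
  size_t offset;
};

static size_t
toy_memory_reader (struct toy_stream *s, char *buf, size_t size)
{
  struct toy_memory_source *src = s->data;
  size_t left = src->length - src->offset;
  size_t n = size < left ? size : left;

  memcpy (buf, src->bytes + src->offset, n);
  src->offset += n;
  return n;                     /* 0 signals end of data */
}

/* Generic function: drain any stream, whatever chunk sizes it yields. */
static size_t
toy_read_all (struct toy_stream *s, char *out, size_t capacity)
{
  size_t total = 0, n;

  while (total < capacity
         && (n = s->reader (s, out + total, capacity - total)) > 0)
    total += n;
  return total;
}
```

A file-descriptor or string-backed source would supply a different
@code{reader} and @code{data}, while @code{toy_read_all} stays the same;
that is the sense in which the consumer need not know where the data
comes from.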
3956
3957
3958 @example
3959 fileio.c
3960 @end example
3961
3962 This implements the basic primitives for interfacing with the file
3963 system.  This includes primitives for reading files into buffers,
3964 writing buffers into files, checking for the presence or accessibility
3965 of files, canonicalizing file names, etc.  Note that these primitives
3966 are usually not invoked directly by the user: There is a great deal of
3967 higher-level Lisp code that implements the user commands such as
3968 @code{find-file} and @code{save-buffer}.  This is similar to the
3969 distinction between the lower-level primitives in @file{editfns.c} and
3970 the higher-level user commands in @file{cmds.c} and
3971 @file{simple.el}.
3972
3973
3974
3975 @example
3976 filelock.c
3977 @end example
3978
3979 This file provides functions for detecting clashes between different
3980 processes (e.g. XEmacs and some external process, or two different
3981 XEmacs processes) modifying the same file.  (XEmacs can optionally use
3982 the @file{lock/} subdirectory to provide a form of ``locking'' between
3983 different XEmacs processes.)  This module is also used by the low-level
3984 functions in @file{insdel.c} to ensure that, if the first modification
3985 is being made to a buffer whose corresponding file has been externally
3986 modified, the user is made aware of this so that the buffer can be
3987 synched up with the external changes if necessary.
3988
3989
3990 @example
3991 filemode.c
3992 @end example
3993
3994 This file provides some miscellaneous functions that construct a
3995 @samp{rwxr-xr-x}-type permissions string (as might appear in an
3996 @file{ls}-style directory listing) given the information returned by the
3997 @code{stat()} system call.
3998
3999
4000
4001 @example
4002 dired.c
4003 ndir.h
4004 @end example
4005
4006 These files implement the XEmacs interface to directory searching.  This
4007 includes a number of primitives for determining the files in a directory
4008 and for doing filename completion. (Remember that generic completion is
4009 handled by a different mechanism, in @file{minibuf.c}.)
4010
4011 @file{ndir.h} is a header file used for the directory-searching
4012 emulation functions provided in @file{sysdep.c} (@pxref{Modules for Interfacing with the Operating System}),
4013 for systems that don't provide any directory-searching functions. (On
4014 those systems, directories can be read directly as files, and parsed.)
4015
4016
4017
4018 @example
4019 realpath.c
4020 @end example
4021
4022 This file provides an implementation of the @code{realpath()} function
4023 for expanding symbolic links, on systems that don't implement it or have
4024 a broken implementation.
4025
4026
4027
4028 @node Modules for Other Aspects of the Lisp Interpreter and Object System
4029 @section Modules for Other Aspects of the Lisp Interpreter and Object System
4030 @cindex modules for other aspects of the Lisp interpreter and object system
4031 @cindex Lisp interpreter and object system, modules for other aspects of the
4032 @cindex interpreter and object system, modules for other aspects of the Lisp
4033 @cindex object system, modules for other aspects of the Lisp interpreter and
4034
4035 @example
4036 elhash.c
4037 elhash.h
4038 hash.c
4039 hash.h
4040 @end example
4041
4042 These files provide two implementations of hash tables.  Files
4043 @file{hash.c} and @file{hash.h} provide a generic C implementation of
4044 hash tables which can stand independently of XEmacs.  Files
4045 @file{elhash.c} and @file{elhash.h} provide a separate implementation of
4046 hash tables that can store only Lisp objects and that knows about Lispy
4047 things like garbage collection; they also implement the @dfn{hash-table} Lisp
4048 object type.
4049
4050
4051 @example
4052 specifier.c
4053 specifier.h
4054 @end example
4055
4056 This module implements the @dfn{specifier} Lisp object type.  This is
4057 primarily used for displayable properties, and allows for values that
4058 are specific to a particular buffer, window, frame, device, or device
4059 class, as well as for a default value.  This is used, for example,
4060 to control the height of the horizontal scrollbar or the appearance of
4061 the @code{default}, @code{bold}, or other faces.  The specifier object
4062 consists of a number of specifications, each of which maps from a
4063 buffer, window, etc. to a value.  The function @code{specifier-instance}
4064 looks up a value given a window (from which a buffer, frame, and device
4065 can be derived).
4066
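The lookup performed when a specifier is instantiated can be pictured as
a search from the most specific locale to the least specific one.  The
sketch below assumes the order window, buffer, frame, device, global
default; that ordering, the integer locale identifiers, and all the
@code{toy_} names are assumptions made for illustration, not a
description of @file{specifier.c}.

```c
#include <stddef.h>

/* Kinds of locale a specification can be attached to, listed from most
   to least specific.  This ordering is an assumption of the sketch. */
enum toy_locale_kind
{
  TOY_WINDOW, TOY_BUFFER, TOY_FRAME, TOY_DEVICE, TOY_GLOBAL,
  TOY_LOCALE_KINDS
};

struct toy_specification
{
  enum toy_locale_kind kind;
  int locale_id;        /* which window, buffer, ...; ignored for global */
  int value;
};

/* Identifiers of the locales a given window belongs to, indexed by
   kind.  The TOY_GLOBAL slot is unused. */
struct toy_domain
{
  int id[TOY_LOCALE_KINDS];
};

/* Return 1 and store the most specific matching value, or return 0 if
   nothing (not even a global default) applies. */
static int
toy_specifier_instance (const struct toy_specification *specs, size_t count,
                        const struct toy_domain *domain, int *value_out)
{
  int kind;
  size_t i;

  for (kind = TOY_WINDOW; kind < TOY_LOCALE_KINDS; kind++)
    for (i = 0; i < count; i++)
      if ((int) specs[i].kind == kind
          && (kind == TOY_GLOBAL || specs[i].locale_id == domain->id[kind]))
        {
          *value_out = specs[i].value;
          return 1;
        }
  return 0;
}
```

With a global value, a value for one frame, and a value for one window
in that frame, the window gets its own value, other windows in the frame
get the frame's value, and everything else falls back to the global one.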
4067
4068 @example
4069 chartab.c
4070 chartab.h
4071 casetab.c
4072 @end example
4073
4074 @file{chartab.c} and @file{chartab.h} implement the @dfn{char table}
4075 Lisp object type, which maps from characters or certain sorts of
4076 character ranges to Lisp objects.  The implementation of this object
4077 type is optimized for the internal representation of characters.  Char
4078 tables come in different types, which affect the allowed object types to
4079 which a character can be mapped and also dictate certain other
4080 properties of the char table.
4081
4082 @cindex case table
4083 @file{casetab.c} implements one sort of char table, the @dfn{case
4084 table}, which maps characters to other characters of possibly different
4085 case.  These are used by XEmacs to implement case-changing primitives
4086 and to do case-insensitive searching.
4087
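To show how a character-to-character mapping supports case-insensitive
comparison, here is a minimal sketch.  It assumes 8-bit characters and
ASCII letters only, and uses a plain array rather than a char table; the
names @code{toy_downcase_table} and @code{toy_equal_ignoring_case} are
hypothetical.

```c
#include <stddef.h>

/* Hypothetical down-case table: maps each 8-bit character to its
   lower-case form, or to itself if it has none. */
static unsigned char toy_downcase_table[256];

/* Assumes an ASCII-compatible character set, where 'A'..'Z' are
   contiguous. */
static void
toy_init_downcase_table (void)
{
  int c;

  for (c = 0; c < 256; c++)
    toy_downcase_table[c] = (unsigned char) c;
  for (c = 'A'; c <= 'Z'; c++)
    toy_downcase_table[c] = (unsigned char) (c - 'A' + 'a');
}

/* Case-insensitive equality: compare characters after translation. */
static int
toy_equal_ignoring_case (const char *a, const char *b)
{
  while (*a && *b)
    {
      if (toy_downcase_table[(unsigned char) *a]
          != toy_downcase_table[(unsigned char) *b])
        return 0;
      a++;
      b++;
    }
  return *a == *b;      /* equal only if both strings ended together */
}
```

Because the comparison consults the table rather than hard-coding rules,
changing the table changes which characters are treated as equivalent,
with no change to the search code.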
4088
4089
4090 @example
4091 syntax.c
4092 syntax.h
4093 @end example
4094
4095 @cindex scanner
4096 This module implements @dfn{syntax tables}, another sort of char table
4097 that maps characters into syntax classes that define the syntax of these
4098 characters (e.g. a parenthesis belongs to a class of @samp{open}
4099 characters that have corresponding @samp{close} characters and can be
4100 nested).  This module also implements the Lisp @dfn{scanner}, a set of
4101 primitives for scanning over text based on syntax tables.  This is used,
4102 for example, to find the matching parenthesis in a command such as
4103 @code{forward-sexp}, and by @file{font-lock.c} to locate quoted strings,
4104 comments, etc.
4105
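The way a scanner uses syntax classes to find a matching parenthesis can
be sketched as below.  This is a toy model with only three classes and
8-bit characters; it ignores strings, comments, and quoting, which a
real scanner must handle.  All names are hypothetical.

```c
#include <stddef.h>

enum toy_syntax { TOY_OTHER, TOY_OPEN, TOY_CLOSE };

/* Hypothetical syntax table: one class per 8-bit character.  Static
   storage starts out zeroed, i.e. every character is TOY_OTHER. */
static enum toy_syntax toy_syntax_table[256];

static void
toy_init_syntax (void)
{
  toy_syntax_table['('] = TOY_OPEN;
  toy_syntax_table[')'] = TOY_CLOSE;
  toy_syntax_table['['] = TOY_OPEN;
  toy_syntax_table[']'] = TOY_CLOSE;
}

/* Given the position of an open character, return the position of the
   close character that balances it, or -1 if the text is unbalanced.
   Only nesting depth is tracked; a mismatched pair such as "(]" is not
   rejected. */
static long
toy_scan_matching_close (const char *text, size_t len, size_t open_pos)
{
  size_t i;
  long depth = 0;

  for (i = open_pos; i < len; i++)
    {
      enum toy_syntax cls = toy_syntax_table[(unsigned char) text[i]];

      if (cls == TOY_OPEN)
        depth++;
      else if (cls == TOY_CLOSE && --depth == 0)
        return (long) i;
    }
  return -1;
}
```

The scanning loop never mentions a specific character; which characters
open and close a group is decided entirely by the table, so a different
major mode can install a different table and reuse the same scanner.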
4106
4107
4108 @example
4109 casefiddle.c
4110 @end example
4111
4112 This module implements various Lisp primitives for upcasing, downcasing
4113 and capitalizing strings or regions of buffers.
4114
4115
4116
4117 @example
4118 rangetab.c
4119 @end example
4120
4121 This module implements the @dfn{range table} Lisp object type, which
4122 provides for a mapping from ranges of integers to arbitrary Lisp
4123 objects.
4124
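A mapping from integer ranges to values can be sketched as a sorted
array searched by bisection.  This is only one plausible representation,
chosen for brevity; it is not claimed to be how @file{rangetab.c} stores
its data.  Values are C strings here instead of Lisp objects, and the
names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical entry: maps every integer in [first, last] to `value'. */
struct toy_range
{
  long first;
  long last;
  const char *value;
};

/* Entries must be sorted by `first' and must not overlap.  Return the
   mapped value, or NULL if `key' falls in no range. */
static const char *
toy_range_lookup (const struct toy_range *table, size_t count, long key)
{
  size_t lo = 0, hi = count;

  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;

      if (key < table[mid].first)
        hi = mid;
      else if (key > table[mid].last)
        lo = mid + 1;
      else
        return table[mid].value;
    }
  return NULL;
}
```

Storing one entry per range, rather than one per integer, keeps the
table small even when the ranges themselves are very wide.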
4125
4126
4127 @example
4128 opaque.c
4129 opaque.h
4130 @end example
4131
4132 This module implements the @dfn{opaque} Lisp object type, an
4133 internal-only Lisp object that encapsulates an arbitrary block of memory
4134 so that it can be managed by the Lisp allocation system.  To create an
4135 opaque object, you call @code{make_opaque()}, passing a pointer to a
4136 block of memory.  An object is created that is big enough to hold the
4137 memory, which is copied into the object's storage.  The object will then
4138 stick around as long as you keep pointers to it, after which it will be
4139 automatically reclaimed.
4140
4141 @cindex mark method
4142 Opaque objects can also have an arbitrary @dfn{mark method} associated
4143 with them, in case the block of memory contains other Lisp objects that
4144 need to be marked for garbage-collection purposes. (If you need other
4145 object methods, such as a finalize method, you should just go ahead and
4146 create a new Lisp object type---it's not hard.)
4147
4148
4149
4150 @example
4151 abbrev.c
4152 @end example
4153
4154 This module provides a few primitives for doing dynamic abbreviation
4155 expansion.  In XEmacs, most of the code for this has been moved into
4156 Lisp.  Some C code remains for speed and because the primitive
4157 @code{self-insert-command} (which is executed for all self-inserting
4158 characters) hooks into the abbrev mechanism. (@code{self-insert-command}
4159 is itself in C only for speed.)
4160
4161
4162
4163 @example
4164 doc.c
4165 @end example
4166
4167 This module provides primitives for retrieving the documentation
4168 strings of functions and variables.  These documentation strings contain
4169 certain special markers that get dynamically expanded (e.g. a
4170 reverse-lookup is performed on some named functions to retrieve their
4171 current key bindings).  Some documentation strings (in particular, for
4172 the built-in primitives and pre-loaded Lisp functions) are stored
4173 externally in a file @file{DOC} in the @file{lib-src/} directory and
4174 need to be fetched from that file. (Part of the build stage involves
4175 building this file, and another part involves constructing an index for
4176 this file and embedding it into the executable, so that the functions in
4177 @file{doc.c} do not have to search the entire @file{DOC} file to find
4178 the appropriate documentation string.)
4179
4180
4181
4182 @example
4183 md5.c
4184 @end example
4185
4186 This module provides a Lisp primitive that implements the MD5 secure
4187 hashing scheme, used to create a large hash value of a string of data such that
4188 the data cannot be derived from the hash value.  This is used for
4189 various security applications on the Internet.
4190
4191
4192
4193
4194 @node Modules for Interfacing with the Operating System
4195 @section Modules for Interfacing with the Operating System
4196 @cindex modules for interfacing with the operating system
4197 @cindex interfacing with the operating system, modules for
4198 @cindex operating system, modules for interfacing with the
4199
4200 @example
4201 callproc.c
4202 process.c
4203 process.h
4204 @end example
4205
4206 These modules allow XEmacs to spawn and communicate with subprocesses
4207 and network connections.
4208
4209 @cindex synchronous subprocesses
4210 @cindex subprocesses, synchronous
4211   @file{callproc.c} implements (through the @code{call-process}
4212 primitive) what are called @dfn{synchronous subprocesses}.  This means
4213 that XEmacs runs a program, waits till it's done, and retrieves its
4214 output.  A typical example might be calling the @file{ls} program to get
4215 a directory listing.
4216
4217 @cindex asynchronous subprocesses
4218 @cindex subprocesses, asynchronous
4219   @file{process.c} and @file{process.h} implement @dfn{asynchronous
4220 subprocesses}.  This means that XEmacs starts a program and then
4221 continues normally, not waiting for the process to finish.  Data can be
4222 sent to the process or retrieved from it as it's running.  This is used
4223 for the @code{shell} command (which provides a front end onto a shell
4224 program such as @file{csh}), the mail and news readers implemented in
4225 XEmacs, etc.  The result of calling @code{start-process} to start a
4226 subprocess is a process object, a particular kind of object used to
4227 communicate with the subprocess.  You can send data to the process by
4228 passing the process object and the data to @code{send-process}, and you
4229 can specify what happens to data retrieved from the process by setting
4230 properties of the process object. (When the process sends data, XEmacs
4231 receives a process event, which says that there is data ready.  When
4232 @code{dispatch-event} is called on this event, it reads the data from
4233 the process and does something with it, as specified by the process
4234 object's properties.  Typically, this means inserting the data into a
4235 buffer or calling a function.) Another property of the process object is
4236 called the @dfn{sentinel}, which is a function that is called when the
4237 process terminates.
4238
4239 @cindex network connections
4240   Process objects are also used for network connections (connections to a
4241 process running on another machine).  Network connections are started
4242 with @code{open-network-stream} but otherwise work just like
4243 subprocesses.
4244
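The synchronous model described above for @file{callproc.c} (run a
program, wait until it is done, retrieve its output) can be illustrated
at the C level with the POSIX @code{popen()} and @code{pclose()}
functions.  This sketch is not how @file{callproc.c} is written; the
helper name @code{toy_call_process} is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/* Run `command' through the shell, wait for it to finish, and copy up
   to size-1 bytes of its standard output into `out'.  Any further
   output is discarded.  Return the status reported by pclose(), or -1
   if the command could not be started. */
static int
toy_call_process (const char *command, char *out, size_t size)
{
  FILE *pipe_stream;
  size_t used = 0, n;

  if (size == 0)
    return -1;
  pipe_stream = popen (command, "r");
  if (pipe_stream == NULL)
    return -1;
  while (used < size - 1
         && (n = fread (out + used, 1, size - 1 - used, pipe_stream)) > 0)
    used += n;
  out[used] = '\0';
  /* pclose() blocks until the child has exited. */
  return pclose (pipe_stream);
}
```

The caller is blocked for the whole lifetime of the child, which is
exactly what distinguishes this model from the asynchronous one, where
output arrives later as process events.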
4245
4246
4247 @example
4248 sysdep.c
4249 sysdep.h
4250 @end example
4251
4252   These modules implement most of the low-level, messy operating-system
4253 interface code.  This includes various device control (ioctl) operations
4254 for file descriptors, TTY's, pseudo-terminals, etc. (usually this stuff
4255 is fairly system-dependent; thus the name of this module), and emulation
4256 of standard library functions and system calls on systems that don't
4257 provide them or have broken versions.
4258
4259
4260
4261 @example
4262 sysdir.h
4263 sysfile.h
4264 sysfloat.h
4265 sysproc.h
4266 syspwd.h
4267 syssignal.h
4268 systime.h
4269 systty.h
4270 syswait.h
4271 @end example
4272
4273 These header files provide consistent interfaces onto system-dependent
4274 header files and system calls.  The idea is that, instead of including a
4275 standard header file like @file{<sys/param.h>} (which may or may not
4276 exist on various systems) or having to worry about whether all systems
4277 provide a particular preprocessor constant, or having to deal with the
4278 four different paradigms for manipulating signals, you just include the
4279 appropriate @file{sys*.h} header file, which includes all the right
4280 system header files, defines any missing preprocessor constants,
4281 provides a uniform interface onto system calls, etc.
4282
4283 @file{sysdir.h} provides a uniform interface onto directory-querying
4284 functions. (In some cases, this is in conjunction with emulation
4285 functions in @file{sysdep.c}.)
4286
4287 @file{sysfile.h} includes all the necessary header files for standard
4288 system calls (e.g. @code{read()}), ensures that all necessary
4289 @code{open()} and @code{stat()} preprocessor constants are defined, and
4290 possibly (usually) substitutes sugared versions of @code{read()},
4291 @code{write()}, etc. that automatically restart interrupted I/O
4292 operations.
4293
4294 @file{sysfloat.h} includes the necessary header files for floating-point
4295 operations.
4296
4297 @file{sysproc.h} includes the necessary header files for calling
4298 @code{select()}, @code{fork()}, @code{execve()}, socket operations, and
4299 the like, and ensures that the @code{FD_*()} macros for descriptor-set
4300 manipulations are available.
4301
4302 @file{syspwd.h} includes the necessary header files for obtaining
4303 information from @file{/etc/passwd} (the functions are emulated under
4304 VMS).
4305
4306 @file{syssignal.h} includes the necessary header files for
4307 signal-handling and provides a uniform interface onto the different
4308 signal-handling and signal-blocking paradigms.
4309
4310 @file{systime.h} includes the necessary header files and provides
4311 uniform interfaces for retrieving the time of day, setting file
4312 access/modification times, getting the amount of time used by the XEmacs
4313 process, etc.
4314
4315 @file{systty.h} buffers against the infinitude of different ways of
4316 controlling TTY's.
4317
4318 @file{syswait.h} provides a uniform way of retrieving the exit status
4319 from a @code{wait()}ed-on process (some systems use a union, others use
4320 an int).
4321
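The idea of a @code{read()} that automatically restarts interrupted I/O,
mentioned above for @file{sysfile.h}, amounts to retrying when the call
fails with @code{EINTR}.  A minimal sketch follows; the wrapper name
@code{toy_read_retrying} is hypothetical and is not the name used in the
XEmacs headers.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <unistd.h>

/* Like read(), but retry if a signal interrupts the call before any
   data has been transferred. */
static ssize_t
toy_read_retrying (int fd, void *buf, size_t count)
{
  ssize_t n;

  do
    n = read (fd, buf, count);
  while (n == -1 && errno == EINTR);
  return n;
}
```

Callers can then treat a return value of -1 as a real error, without
having to special-case signal delivery at every call site.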
4322
4323
4324 @example
4325 hpplay.c
4326 libsst.c
4327 libsst.h
4328 libst.h
4329 linuxplay.c
4330 nas.c
4331 sgiplay.c
4332 sound.c
4333 sunplay.c
4334 @end example
4335
4336 These files implement the ability to play various sounds on some types
4337 of computers.  You have to configure your XEmacs with sound support in
4338 order to get this capability.
4339
4340 @file{sound.c} provides the generic interface.  It implements various
4341 Lisp primitives and variables that let you specify which sounds should
4342 be played in certain conditions. (The conditions are identified by
4343 symbols, which are passed to @code{ding} to make a sound.  Various
4344 standard functions call this function at certain times; if sound support
4345 does not exist, a simple beep results.)
4346
4347 @cindex native sound
4348 @cindex sound, native
4349 @file{sgiplay.c}, @file{sunplay.c}, @file{hpplay.c}, and
4350 @file{linuxplay.c} interface to the machine's speaker for various
4351 different kinds of machines.  This is called @dfn{native} sound.
4352
4353 @cindex sound, network
4354 @cindex network sound
4355 @cindex NAS
4356 @file{nas.c} interfaces to a computer somewhere else on the network
4357 using the NAS (Network Audio Server) protocol, playing sounds on that
4358 machine.  This allows you to run XEmacs on a remote machine, with its
4359 display set to your local machine, and have the sounds be made on your
4360 local machine, provided that you have a NAS server running on your local
4361 machine.
4362
4363 @file{libsst.c}, @file{libsst.h}, and @file{libst.h} provide some
4364 additional functions for playing sound on a Sun SPARC but are not
4365 currently in use.
4366
4367
4368
4369 @example
4370 tooltalk.c
4371 tooltalk.h
4372 @end example
4373
4374 These two modules implement an interface to the ToolTalk protocol, which
4375 is an interprocess communication protocol implemented on some versions
4376 of Unix.  ToolTalk is a high-level protocol that allows processes to
4377 register themselves as providers of particular services; other processes
4378 can then request a service without knowing or caring exactly who is
4379 providing the service.  It is similar in spirit to the DDE protocol
4380 provided under Microsoft Windows.  ToolTalk is a part of the new CDE
4381 (Common Desktop Environment) specification and is used to connect the
4382 parts of the SPARCWorks development environment.
4383
4384
4385
4386 @example
4387 getloadavg.c
4388 @end example
4389
4390 This module provides the ability to retrieve the system's current load
4391 average. (The way to do this is highly system-specific, unfortunately,
4392 and requires a lot of special-case code.)
4393
4394
4395
4396 @example
4397 sunpro.c
4398 @end example
4399
4400 This module provides a small amount of code used internally at Sun to
4401 keep statistics on the usage of XEmacs.
4402
4403
4404
4405 @example
4406 broken-sun.h
4407 strcmp.c
4408 strcpy.c
4409 sunOS-fix.c
4410 @end example
4411
4412 These files provide replacement functions and prototypes to fix numerous
4413 bugs in early releases of SunOS 4.1.
4414
4415
4416
4417 @example
4418 hftctl.c
4419 @end example
4420
4421 This module provides some terminal-control code necessary on versions of
4422 AIX prior to 4.1.
4423
4424
4425
4426 @node Modules for Interfacing with X Windows
4427 @section Modules for Interfacing with X Windows
4428 @cindex modules for interfacing with X Windows
4429 @cindex interfacing with X Windows, modules for
4430 @cindex X Windows, modules for interfacing with
4431
4432 @example
4433 Emacs.ad.h
4434 @end example
4435
4436 A file generated from @file{Emacs.ad}, which contains XEmacs-supplied
4437 fallback resources (so that XEmacs has pretty defaults).
4438
4439
4440
4441 @example
4442 EmacsFrame.c
4443 EmacsFrame.h
4444 EmacsFrameP.h
4445 @end example
4446
4447 These modules implement an Xt widget class that encapsulates a frame.
4448 This is for ease in integrating with Xt.  The EmacsFrame widget covers
4449 the entire X window except for the menubar; the scrollbars are
4450 positioned on top of the EmacsFrame widget.
4451
4452 @strong{Warning:} Abandon hope, all ye who enter here.  This code took
4453 an ungodly amount of time to get right, and is likely to fall apart
4454 mercilessly at the slightest change.  Such is life under Xt.
4455
4456
4457
4458 @example
4459 EmacsManager.c
4460 EmacsManager.h
4461 EmacsManagerP.h
4462 @end example
4463
4464 These modules implement a simple Xt manager (i.e. composite) widget
4465 class that simply lets its children set whatever geometry they want.
4466 It's amazing that Xt doesn't provide this standardly, but on second
4467 thought, it makes sense, considering how amazingly broken Xt is.
4468
4469
4470 @example
4471 EmacsShell-sub.c
4472 EmacsShell.c
4473 EmacsShell.h
4474 EmacsShellP.h
4475 @end example
4476
4477 These modules implement two Xt widget classes that are subclasses of
4478 the TopLevelShell and TransientShell classes.  This is necessary to deal
4479 with more brokenness that Xt has sadistically thrust onto the backs of
4480 developers.
4481
4482
4483
4484 @example
4485 xgccache.c
4486 xgccache.h
4487 @end example
4488
4489 These modules provide functions for maintenance and caching of GC's
4490 (graphics contexts) under the X Window System.  This code is junky and
4491 needs to be rewritten.
4492
4493
4494
4495 @example
4496 select-msw.c
4497 select-x.c
4498 select.c
4499 select.h
4500 @end example
4501
4502 @cindex selections
  These modules provide an interface to the X Window System's concept of
4504 @dfn{selections}, the standard way for X applications to communicate
4505 with each other.
4506
4507
4508
4509 @example
4510 xintrinsic.h
4511 xintrinsicp.h
4512 xmmanagerp.h
4513 xmprimitivep.h
4514 @end example
4515
4516 These header files are similar in spirit to the @file{sys*.h} files and buffer
4517 against different implementations of Xt and Motif.
4518
4519 @itemize @bullet
4520 @item
4521 @file{xintrinsic.h} should be included in place of @file{<Intrinsic.h>}.
4522 @item
4523 @file{xintrinsicp.h} should be included in place of @file{<IntrinsicP.h>}.
4524 @item
4525 @file{xmmanagerp.h} should be included in place of @file{<XmManagerP.h>}.
4526 @item
4527 @file{xmprimitivep.h} should be included in place of @file{<XmPrimitiveP.h>}.
4528 @end itemize
4529
4530
4531
4532 @example
4533 xmu.c
4534 xmu.h
4535 @end example
4536
4537 These files provide an emulation of the Xmu library for those systems
(e.g. HP-UX) that don't provide it as a standard part of X.
4539
4540
4541
4542 @example
4543 ExternalClient-Xlib.c
4544 ExternalClient.c
4545 ExternalClient.h
4546 ExternalClientP.h
4547 ExternalShell.c
4548 ExternalShell.h
4549 ExternalShellP.h
4550 extw-Xlib.c
4551 extw-Xlib.h
4552 extw-Xt.c
4553 extw-Xt.h
4554 @end example
4555
4556 @cindex external widget
4557   These files provide the @dfn{external widget} interface, which allows an
4558 XEmacs frame to appear as a widget in another application.  To do this,
4559 you have to configure with @samp{--external-widget}.
4560
4561 @file{ExternalShell*} provides the server (XEmacs) side of the
4562 connection.
4563
4564 @file{ExternalClient*} provides the client (other application) side of
4565 the connection.  These files are not compiled into XEmacs but are
4566 compiled into libraries that are then linked into your application.
4567
4568 @file{extw-*} is common code that is used for both the client and server.
4569
4570 Don't touch this code; something is liable to break if you do.
4571
4572
4573
4574 @node Modules for Internationalization
4575 @section Modules for Internationalization
4576 @cindex modules for internationalization
4577 @cindex internationalization, modules for
4578
4579 @example
4580 mule-canna.c
4581 mule-ccl.c
4582 mule-charset.c
4583 mule-charset.h
4584 file-coding.c
4585 file-coding.h
4586 mule-mcpath.c
4587 mule-mcpath.h
4588 mule-wnnfns.c
4589 mule.c
4590 @end example
4591
4592 These files implement the MULE (Asian-language) support.  Note that MULE
4593 actually provides a general interface for all sorts of languages, not
4594 just Asian languages (although they are generally the most complicated
4595 to support).  This code is still in beta.
4596
4597 @file{mule-charset.*} and @file{file-coding.*} provide the heart of the
4598 XEmacs MULE support.  @file{mule-charset.*} implements the @dfn{charset}
4599 Lisp object type, which encapsulates a character set (an ordered one- or
4600 two-dimensional set of characters, such as US ASCII or JISX0208 Japanese
4601 Kanji).
4602
4603 @file{file-coding.*} implements the @dfn{coding-system} Lisp object
4604 type, which encapsulates a method of converting between different
4605 encodings.  An encoding is a representation of a stream of characters,
4606 possibly from multiple character sets, using a stream of bytes or words,
4607 and defines (e.g.) which escape sequences are used to specify particular
4608 character sets, how the indices for a character are converted into bytes
4609 (sometimes this involves setting the high bit; sometimes complicated
4610 rearranging of the values takes place, as in the Shift-JIS encoding),
4611 etc.
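
The ``complicated rearranging'' mentioned for Shift-JIS can be made
concrete with a small sketch.  It shows the widely documented arithmetic
that maps one JIS X 0208 code point (two 7-bit bytes) to two Shift-JIS
bytes.  The helper name @code{toy_jis_to_sjis} is hypothetical and is not
taken from @file{file-coding.c}; the real converter also has to handle
escape sequences, ASCII, and half-width katakana, all omitted here.

```c
#include <assert.h>

/* Convert one JIS X 0208 code point (two 7-bit bytes, each in the
   range 0x21..0x7E) to its two Shift-JIS bytes.  Hypothetical helper
   for illustration only; not a function from file-coding.c.  */
static void
toy_jis_to_sjis (unsigned char j1, unsigned char j2,
                 unsigned char *s1, unsigned char *s2)
{
  assert (j1 >= 0x21 && j1 <= 0x7E && j2 >= 0x21 && j2 <= 0x7E);

  /* Two JIS rows share one Shift-JIS lead byte.  Lead bytes occupy
     0x81..0x9F and 0xE0..0xEF, skipping the range in between.  */
  *s1 = (unsigned char) (((j1 + 1) >> 1) + (j1 < 0x5F ? 0x70 : 0xB0));

  if (j1 & 1)
    /* Odd row: trail bytes 0x40..0x9E, skipping 0x7F.  */
    *s2 = (unsigned char) (j2 + (j2 < 0x60 ? 0x1F : 0x20));
  else
    /* Even row: trail bytes 0x9F..0xFC.  */
    *s2 = (unsigned char) (j2 + 0x7E);
}
```

For example, the JIS code point 0x3021 becomes the Shift-JIS byte pair
0x88 0x9F, so neither byte of the result is a simple ``high bit set''
variant of the input.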
4612
4613 @file{mule-ccl.c} provides the CCL (Code Conversion Language)
4614 interpreter.  CCL is similar in spirit to Lisp byte code and is used to
4615 implement converters for custom encodings.
4616
4617 @file{mule-canna.c} and @file{mule-wnnfns.c} implement interfaces to
4618 external programs used to implement the Canna and WNN input methods,
4619 respectively.  This is currently in beta.
4620
4621 @file{mule-mcpath.c} provides some functions to allow for pathnames
4622 containing extended characters.  This code is fragmentary, obsolete, and
completely non-working.  Instead, @code{pathname-coding-system} is used
4624 to specify conversions of names of files and directories.  The standard
4625 C I/O functions like @samp{open()} are wrapped so that conversion occurs
4626 automatically.
4627
4628 @file{mule.c} provides a few miscellaneous things that should probably
4629 be elsewhere.
4630
4631
4632
4633 @example
4634 intl.c
4635 @end example
4636
4637 This provides some miscellaneous internationalization code for
4638 implementing message translation and interfacing to the Ximp input
4639 method.  None of this code is currently working.
4640
4641
4642
4643 @example
4644 iso-wide.h
4645 @end example
4646
4647 This contains leftover code from an earlier implementation of
4648 Asian-language support, and is not currently used.
4649
4650
4651
4652
4653 @node Allocation of Objects in XEmacs Lisp, Dumping, A Summary of the Various XEmacs Modules, Top
4654 @chapter Allocation of Objects in XEmacs Lisp
4655 @cindex allocation of objects in XEmacs Lisp
4656 @cindex objects in XEmacs Lisp, allocation of
4657 @cindex Lisp objects, allocation of in XEmacs
4658
4659 @menu
4660 * Introduction to Allocation::
4661 * Garbage Collection::
4662 * GCPROing::
4663 * Garbage Collection - Step by Step::
4664 * Integers and Characters::
4665 * Allocation from Frob Blocks::
4666 * lrecords::
4667 * Low-level allocation::
4668 * Cons::
4669 * Vector::
4670 * Bit Vector::
4671 * Symbol::
4672 * Marker::
4673 * String::
4674 * Compiled Function::
4675 @end menu
4676
4677 @node Introduction to Allocation
4678 @section Introduction to Allocation
4679 @cindex allocation, introduction to
4680
4681   Emacs Lisp, like all Lisps, has garbage collection.  This means that
4682 the programmer never has to explicitly free (destroy) an object; it
4683 happens automatically when the object becomes inaccessible.  Most
4684 experts agree that garbage collection is a necessity in a modern,
4685 high-level language.  Its omission from C stems from the fact that C was
4686 originally designed to be a nice abstract layer on top of assembly
4687 language, for writing kernels and basic system utilities rather than
4688 large applications.
4689
4690   Lisp objects can be created by any of a number of Lisp primitives.
4691 Most object types have one or a small number of basic primitives
4692 for creating objects.  For conses, the basic primitive is @code{cons};
4693 for vectors, the primitives are @code{make-vector} and @code{vector}; for
4694 symbols, the primitives are @code{make-symbol} and @code{intern}; etc.
4695 Some Lisp objects, especially those that are primarily used internally,
4696 have no corresponding Lisp primitives.  Every Lisp object, though,
4697 has at least one C primitive for creating it.
4698
4699   Recall from section (VII) that a Lisp object, as stored in a 32-bit or
4700 64-bit word, has a few tag bits, and a ``value'' that occupies the
4701 remainder of the bits.  We can separate the different Lisp object types
4702 into three broad categories:
4703
4704 @itemize @bullet
4705 @item
4706 (a) Those for whom the value directly represents the contents of the
4707 Lisp object.  Only two types are in this category: integers and
4708 characters.  No special allocation or garbage collection is necessary
4709 for such objects.  Lisp objects of these types do not need to be
4710 @code{GCPRO}ed.
4711 @end itemize
4712
4713   In the remaining two categories, the type is stored in the object
4714 itself.  The tag for all such objects is the generic @dfn{lrecord}
(@code{Lisp_Type_Record}) tag.  The first bytes of the object's structure are an
integer (actually a char) characterizing the object's type and some
flags, in particular the mark bit used for garbage collection.  A
structure describing the type is accessible through the
@code{lrecord_implementation_table}, indexed with said integer.  This structure
includes the method pointers and a pointer to a string naming the type.
4721
4722 @itemize @bullet
4723 @item
4724 (b) Those lrecords that are allocated in frob blocks (see above).  This
4725 includes the objects that are most common and relatively small, and
4726 includes conses, strings, subrs, floats, compiled functions, symbols,
4727 extents, events, and markers.  With the cleanup of frob blocks done in
4728 19.12, it's not terribly hard to add more objects to this category, but
4729 it's a bit trickier than adding an object type to type (c) (esp. if the
4730 object needs a finalization method), and is not likely to save much
4731 space unless the object is small and there are many of them. (In fact,
4732 if there are very few of them, it might actually waste space.)
4733 @item
4734 (c) Those lrecords that are individually @code{malloc()}ed.  These are
4735 called @dfn{lcrecords}.  All other types are in this category.  Adding a
4736 new type to this category is comparatively easy, and all types added
4737 since 19.8 (when the current allocation scheme was devised, by Richard
4738 Mlynarik), with the exception of the character type, have been in this
4739 category.
4740 @end itemize
4741
4742   Note that bit vectors are a bit of a special case.  They are
4743 simple lrecords as in category (b), but are individually @code{malloc()}ed
4744 like vectors.  You can basically view them as exactly like vectors
4745 except that their type is stored in lrecord fashion rather than
4746 in directly-tagged fashion.
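
The distinction between directly-tagged values and lrecords can be
sketched in a few lines of C.  This is a toy model only: the two-bit tag
layout, the tag values, and every @code{toy_} name are assumptions made
for illustration and do not match the real definitions in the XEmacs
sources.  It shows an integer stored entirely in the word, and a record
whose type is found by indexing a table with the first byte of its
header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model; tag values and layout are assumptions, not XEmacs's.  */
typedef uintptr_t toy_object;

enum toy_tag { TOY_TAG_RECORD = 0, TOY_TAG_INT = 1, TOY_TAG_CHAR = 2 };
#define TOY_TAG_BITS 2
#define TOY_TAG_MASK ((uintptr_t) 3)

/* One entry per record type, analogous in spirit to the
   implementation table described in the text.  */
struct toy_implementation { const char *name; };

static const struct toy_implementation toy_implementation_table[] = {
  { "cons" }, { "symbol" }
};

/* Every record starts with a small type index and a mark flag.  */
struct toy_lrecord_header { unsigned char type; unsigned char mark; };

struct toy_cons
{
  struct toy_lrecord_header header;
  toy_object car, cdr;
};

/* Category (a): the value lives in the word itself.  */
static toy_object
toy_make_int (intptr_t n)
{
  return ((uintptr_t) n << TOY_TAG_BITS) | TOY_TAG_INT;
}

static intptr_t
toy_int_value (toy_object obj)
{
  assert ((obj & TOY_TAG_MASK) == TOY_TAG_INT);
  return (intptr_t) (obj - TOY_TAG_INT) / (1 << TOY_TAG_BITS);
}

/* Categories (b) and (c): the word is a pointer to a record.  */
static toy_object
toy_make_record (void *record)
{
  assert (((uintptr_t) record & TOY_TAG_MASK) == 0); /* needs alignment */
  return (uintptr_t) record | TOY_TAG_RECORD;
}

static const char *
toy_type_name (toy_object obj)
{
  switch (obj & TOY_TAG_MASK)
    {
    case TOY_TAG_INT:  return "integer";
    case TOY_TAG_CHAR: return "character";
    default:
      {
        const struct toy_lrecord_header *h =
          (const struct toy_lrecord_header *) obj;
        return toy_implementation_table[h->type].name;
      }
    }
}
```

Note that in this model only the record case ever dereferences memory,
which is why integers and characters need no allocation, marking, or
protection.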
4747
4748
4749 @node Garbage Collection
4750 @section Garbage Collection
4751 @cindex garbage collection
4752
4753 @cindex mark and sweep
4754   Garbage collection is simple in theory but tricky to implement.
4755 Emacs Lisp uses the oldest garbage collection method, called
4756 @dfn{mark and sweep}.  Garbage collection begins by starting with
4757 all accessible locations (i.e. all variables and other slots where
4758 Lisp objects might occur) and recursively traversing all objects
4759 accessible from those slots, marking each one that is found.
We then go through all of memory, freeing each object that is
not marked and unmarking each object that is marked.  Note
4762 that ``all of memory'' means all currently allocated objects.
4763 Traversing all these objects means traversing all frob blocks,
4764 all vectors (which are chained in one big list), and all
4765 lcrecords (which are likewise chained).
4766
4767   Garbage collection can be invoked explicitly by calling
4768 @code{garbage-collect} but is also called automatically by @code{eval},
4769 once a certain amount of memory has been allocated since the last
4770 garbage collection (according to @code{gc-cons-threshold}).
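
The mark-and-sweep idea described above can be sketched with a toy heap.
Everything here is hypothetical (the @code{toy_} names, the single
cons-like object type, the one chain of all objects); the real collector
in @file{alloc.c} deals with many object types and with frob blocks.
The sketch marks everything reachable from a set of roots, then walks the
chain of all allocated objects, freeing the unmarked ones and clearing
the mark on the survivors.

```c
#include <stddef.h>
#include <stdlib.h>

/* A toy cons-like object.  All names are hypothetical.  */
struct toy_obj
{
  int marked;
  struct toy_obj *car, *cdr;  /* references to other objects, or NULL */
  struct toy_obj *next;       /* chain of all allocated objects */
};

static struct toy_obj *toy_all_objects;

static struct toy_obj *
toy_cons (struct toy_obj *car, struct toy_obj *cdr)
{
  struct toy_obj *o = malloc (sizeof *o);
  if (!o)
    abort ();
  o->marked = 0;
  o->car = car;
  o->cdr = cdr;
  o->next = toy_all_objects;
  toy_all_objects = o;
  return o;
}

/* Mark phase: recurse on the car, iterate on the cdr.  The check of
   the mark flag also stops the traversal on cyclic structures.  */
static void
toy_mark (struct toy_obj *o)
{
  while (o && !o->marked)
    {
      o->marked = 1;
      toy_mark (o->car);
      o = o->cdr;
    }
}

/* Sweep phase: free unmarked objects, unmark the survivors.
   Returns the number of objects freed.  */
static int
toy_sweep (void)
{
  int freed = 0;
  struct toy_obj **link = &toy_all_objects;
  while (*link)
    {
      struct toy_obj *o = *link;
      if (o->marked)
        {
          o->marked = 0;
          link = &o->next;
        }
      else
        {
          *link = o->next;
          free (o);
          freed++;
        }
    }
  return freed;
}

static int
toy_collect (struct toy_obj **roots, size_t nroots)
{
  size_t i;
  for (i = 0; i < nroots; i++)
    toy_mark (roots[i]);
  return toy_sweep ();
}
```

Calling @code{toy_collect} with an empty root set frees every object,
which is exactly what would happen to a live object that no root can
reach.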
4771
4772
4773 @node GCPROing
4774 @section @code{GCPRO}ing
4775 @cindex @code{GCPRO}ing
4776 @cindex garbage collection protection
4777 @cindex protection, garbage collection
4778
4779 @code{GCPRO}ing is one of the ugliest and trickiest parts of Emacs
4780 internals.  The basic idea is that whenever garbage collection
4781 occurs, all in-use objects must be reachable somehow or
4782 other from one of the roots of accessibility.  The roots
4783 of accessibility are:
4784
4785 @enumerate
4786 @item
4787 All objects that have been @code{staticpro()}d or
4788 @code{staticpro_nodump()}ed.  This is used for any global C variables
4789 that hold Lisp objects.  A call to @code{staticpro()} happens implicitly
4790 as a result of any symbols declared with @code{defsymbol()} and any
4791 variables declared with @code{DEFVAR_FOO()}.  You need to explicitly
4792 call @code{staticpro()} (in the @code{vars_of_foo()} method of a module)
4793 for other global C variables holding Lisp objects. (This typically
includes internal lists and such things.)  Use
4795 @code{staticpro_nodump()} only in the rare cases when you do not want
4796 the pointed variable to be saved at dump time but rather recompute it at
4797 startup.
4798
4799 Note that @code{obarray} is one of the @code{staticpro()}d things.
4800 Therefore, all functions and variables get marked through this.
4801 @item
4802 Any shadowed bindings that are sitting on the @code{specpdl} stack.
4803 @item
4804 Any objects sitting in currently active (Lisp) stack frames,
4805 catches, and condition cases.
4806 @item
4807 A couple of special-case places where active objects are
4808 located.
4809 @item
4810 Anything currently marked with @code{GCPRO}.
4811 @end enumerate
4812
  Marking with @code{GCPRO} is necessary because some C functions (quite
a lot, in fact) allocate objects during their operation.  Quite
4815 frequently, there will be no other pointer to the object while the
4816 function is running, and if a garbage collection occurs and the object
4817 needs to be referenced again, bad things will happen.  The solution is
4818 to mark those objects with @code{GCPRO}.  Unfortunately this is easy to
4819 forget, and there is basically no way around this problem.  Here are
4820 some rules, though:
4821
4822 @enumerate
4823 @item
4824 For every @code{GCPRO@var{n}}, there have to be declarations of
4825 @code{struct gcpro gcpro1, gcpro2}, etc.
4826
4827 @item
4828 You @emph{must} @code{UNGCPRO} anything that's @code{GCPRO}ed, and you
4829 @emph{must not} @code{UNGCPRO} if you haven't @code{GCPRO}ed.  Getting
4830 either of these wrong will lead to crashes, often in completely random
4831 places unrelated to where the problem lies.
4832
4833 @item
4834 The way this actually works is that all currently active @code{GCPRO}s
4835 are chained through the @code{struct gcpro} local variables, with the
4836 variable @samp{gcprolist} pointing to the head of the list and the nth
4837 local @code{gcpro} variable pointing to the first @code{gcpro} variable
4838 in the next enclosing stack frame.  Each @code{GCPRO}ed thing is an
4839 lvalue, and the @code{struct gcpro} local variable contains a pointer to
4840 this lvalue.  This is why things will mess up badly if you don't pair up
4841 the @code{GCPRO}s and @code{UNGCPRO}s---you will end up with
4842 @code{gcprolist}s containing pointers to @code{struct gcpro}s or local
4843 @code{Lisp_Object} variables in no-longer-active stack frames.
4844
4845 @item
4846 It is actually possible for a single @code{struct gcpro} to
4847 protect a contiguous array of any number of values, rather than
4848 just a single lvalue.  To effect this, call @code{GCPRO@var{n}} as usual on
4849 the first object in the array and then set @code{gcpro@var{n}.nvars}.
4850
4851 @item
4852 @strong{Strings are relocated.}  What this means in practice is that the
4853 pointer obtained using @code{XSTRING_DATA()} is liable to change at any
4854 time, and you should never keep it around past any function call, or
4855 pass it as an argument to any function that might cause a garbage
4856 collection.  This is why a number of functions accept either a
4857 ``non-relocatable'' @code{char *} pointer or a relocatable Lisp string,
4858 and only access the Lisp string's data at the very last minute.  In some
4859 cases, you may end up having to @code{alloca()} some space and copy the
4860 string's data into it.
4861
4862 @item
4863 By convention, if you have to nest @code{GCPRO}'s, use @code{NGCPRO@var{n}}
4864 (along with @code{struct gcpro ngcpro1, ngcpro2}, etc.), @code{NNGCPRO@var{n}},
4865 etc.  This avoids compiler warnings about shadowed locals.
4866
4867 @item
4868 It is @emph{always} better to err on the side of extra @code{GCPRO}s
4869 rather than too few.  The extra cycles spent on this are
4870 almost never going to make a whit of difference in the
4871 speed of anything.
4872
4873 @item
4874 The general rule to follow is that caller, not callee, @code{GCPRO}s.
4875 That is, you should not have to explicitly @code{GCPRO} any Lisp objects
4876 that are passed in as parameters.
4877
4878 One exception from this rule is if you ever plan to change the parameter
4879 value, and store a new object in it.  In that case, you @emph{must}
4880 @code{GCPRO} the parameter, because otherwise the new object will not be
4881 protected.
4882
4883 So, if you create any Lisp objects (remember, this happens in all sorts
4884 of circumstances, e.g. with @code{Fcons()}, etc.), you are responsible
4885 for @code{GCPRO}ing them, unless you are @emph{absolutely sure} that
4886 there's no possibility that a garbage-collection can occur while you
4887 need to use the object.  Even then, consider @code{GCPRO}ing.
4888
4889 @item
4890 A garbage collection can occur whenever anything calls @code{Feval}, or
4891 whenever a QUIT can occur where execution can continue past
4892 this. (Remember, this is almost anywhere.)
4893
4894 @item
4895 If you have the @emph{least smidgeon of doubt} about whether
4896 you need to @code{GCPRO}, you should @code{GCPRO}.
4897
4898 @item
4899 Beware of @code{GCPRO}ing something that is uninitialized.  If you have
4900 any shade of doubt about this, initialize all your variables to @code{Qnil}.
4901
4902 @item
4903 Be careful of traps, like calling @code{Fcons()} in the argument to
4904 another function.  By the ``caller protects'' law, you should be
4905 @code{GCPRO}ing the newly-created cons, but you aren't.  A certain
4906 number of functions that are commonly called on freshly created stuff
4907 (e.g. @code{nconc2()}, @code{Fsignal()}), break the ``caller protects''
4908 law and go ahead and @code{GCPRO} their arguments so as to simplify
4909 things, but make sure and check if it's OK whenever doing something like
4910 this.
4911
4912 @item
4913 Once again, remember to @code{GCPRO}!  Bugs resulting from insufficient
4914 @code{GCPRO}ing are intermittent and extremely difficult to track down,
4915 often showing up in crashes inside of @code{garbage-collect} or in
4916 weirdly corrupted objects or even in incorrect values in a totally
4917 different section of code.
4918 @end enumerate
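
The chaining of @code{struct gcpro} variables described in the rules
above can be modelled with a small, self-contained sketch.  The
@code{toy_} names and macros are stand-ins written for this example, not
the real @code{GCPRO} macros, and the exact order in which the local
structures are linked is an assumption.  The sketch shows why every
protect must be paired with an unprotect: the list head points at
structures living on the C stack.

```c
#include <stddef.h>

typedef void *toy_object;       /* stand-in for Lisp_Object */

struct toy_gcpro
{
  struct toy_gcpro *next;       /* next entry in the chain */
  toy_object *var;              /* address of the first protected lvalue */
  int nvars;                    /* number of contiguous protected values */
};

/* Head of the chain of all currently active protections.  */
static struct toy_gcpro *toy_gcprolist;

/* These macros expect locals named toy_gcpro1, toy_gcpro2 to exist.  */
#define TOY_GCPRO1(v)                                           \
  (toy_gcpro1.next = toy_gcprolist, toy_gcpro1.var = &(v),      \
   toy_gcpro1.nvars = 1, toy_gcprolist = &toy_gcpro1)

#define TOY_GCPRO2(v1, v2)                                      \
  (toy_gcpro1.next = toy_gcprolist, toy_gcpro1.var = &(v1),     \
   toy_gcpro1.nvars = 1,                                        \
   toy_gcpro2.next = &toy_gcpro1, toy_gcpro2.var = &(v2),       \
   toy_gcpro2.nvars = 1, toy_gcprolist = &toy_gcpro2)

/* Restore the head to what it was before this frame protected anything.  */
#define TOY_UNGCPRO() (toy_gcprolist = toy_gcpro1.next)

/* What a collector would do: walk the chain and visit each value.
   Here we only count the protected values.  */
static int
toy_count_protected (void)
{
  int n = 0;
  struct toy_gcpro *p;
  for (p = toy_gcprolist; p; p = p->next)
    n += p->nvars;
  return n;
}

static int
toy_example (toy_object a, toy_object b)
{
  struct toy_gcpro toy_gcpro1, toy_gcpro2;
  int during;
  TOY_GCPRO2 (a, b);
  during = toy_count_protected ();  /* a collection could happen here */
  TOY_UNGCPRO ();
  return during;
}

/* One structure protecting a contiguous array, via nvars.  */
static int
toy_array_example (void)
{
  toy_object args[3] = { NULL, NULL, NULL };
  struct toy_gcpro toy_gcpro1;
  int during;
  TOY_GCPRO1 (args[0]);
  toy_gcpro1.nvars = 3;
  during = toy_count_protected ();
  TOY_UNGCPRO ();
  return during;
}
```

If @code{toy_example} returned without @code{TOY_UNGCPRO}, the list head
would keep pointing into a dead stack frame, which is the failure mode
described in the rules above.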
4919
4920 @cindex garbage collection, conservative
4921 @cindex conservative garbage collection
  Given the extremely error-prone nature of the @code{GCPRO} scheme, and
the difficulty of tracking down the resulting bugs, it should be considered a deficiency
in the XEmacs code.  A solution to this problem would involve
4925 implementing so-called @dfn{conservative} garbage collection for the C
4926 stack.  That involves looking through all of stack memory and treating
4927 anything that looks like a reference to an object as a reference.  This
4928 will result in a few objects not getting collected when they should, but
4929 it obviates the need for @code{GCPRO}ing, and allows garbage collection
4930 to happen at any point at all, such as during object allocation.
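
  As a rough sketch of the conservative approach, the following toy code
scans an array of machine words standing in for the C stack and marks any
heap object whose address appears there.  The tiny fixed heap and all
@code{toy_} names are hypothetical; a real conservative collector must
also find the actual stack bounds, scan registers, and look up addresses
efficiently.

```c
#include <stddef.h>
#include <stdint.h>

/* A toy heap of four objects.  All names are hypothetical.  */
struct toy_heap_obj { int marked; int payload; };

#define TOY_HEAP_SIZE 4
static struct toy_heap_obj toy_heap[TOY_HEAP_SIZE];

/* Return the heap object whose address equals WORD, or NULL if the
   word does not look like a reference to any object.  */
static struct toy_heap_obj *
toy_lookup (uintptr_t word)
{
  size_t i;
  for (i = 0; i < TOY_HEAP_SIZE; i++)
    if (word == (uintptr_t) &toy_heap[i])
      return &toy_heap[i];
  return NULL;
}

/* Treat every word that looks like an object address as a reference.
   An integer that happens to equal an address keeps that object alive
   too; that is the price of being conservative.
   Returns the number of objects newly marked.  */
static int
toy_scan_conservatively (const uintptr_t *words, size_t nwords)
{
  int newly_marked = 0;
  size_t i;
  for (i = 0; i < nwords; i++)
    {
      struct toy_heap_obj *obj = toy_lookup (words[i]);
      if (obj && !obj->marked)
        {
          obj->marked = 1;
          newly_marked++;
        }
    }
  return newly_marked;
}
```

No cooperation from the scanned code is needed in this model, which is
what removes the need for explicit protection.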
4931
4932 @node Garbage Collection - Step by Step
4933 @section Garbage Collection - Step by Step
4934 @cindex garbage collection - step by step
4935
4936 @menu
4937 * Invocation::
4938 * garbage_collect_1::
4939 * mark_object::
4940 * gc_sweep::
4941 * sweep_lcrecords_1::
4942 * compact_string_chars::
4943 * sweep_strings::
4944 * sweep_bit_vectors_1::
4945 @end menu
4946
4947 @node Invocation
4948 @subsection Invocation
4949 @cindex garbage collection, invocation
4950
The first thing to know about garbage collection is
when and how the garbage collector is invoked.  One might think that this
could happen every time new memory is allocated, e.g. when new objects are
created, but this is @emph{not} the case.  Instead, we have the following
situation:
4956
4957 The entry point of any process of garbage collection is an invocation
4958 of the function @code{garbage_collect_1} in file @code{alloc.c}. The
4959 invocation can occur @emph{explicitly} by calling the function
4960 @code{Fgarbage_collect} (in addition this function provides information
4961 about the freed memory), or can occur @emph{implicitly} in four different
4962 situations:
4963 @enumerate
4964 @item
In function @code{main_1} in file @code{emacs.c}. This function is called
at each startup of XEmacs. The garbage collection is invoked after all
initial creations are completed, but only if the internal
error-checking constant @code{ERROR_CHECK_GC} is defined.
@item
In function @code{disksave_object_finalization} in file
@code{alloc.c}. The only purpose of this function is to clear from
memory the objects that need not be stored with XEmacs when we dump out
an executable. This is only done by @code{Fdump_emacs} or by
@code{Fdump_emacs_data} (both in @code{emacs.c}). The
actual clearing is accomplished by making these objects unreachable and
starting a garbage collection. The function is only used while building
XEmacs.
4978 @item
In function @code{Feval / eval} in file @code{eval.c}. Each time
@code{eval} is called to evaluate a form,
one of the first things that can happen is a call of
@code{garbage_collect_1}. There exist three global variables,
@code{consing_since_gc} (counts the created cons-cells since the last
garbage collection), @code{gc_cons_threshold} (a specified threshold
after which a garbage collection occurs) and @code{always_gc}. If
@code{always_gc} is set or if the threshold is exceeded, the garbage
collection will start.
@item
In function @code{Ffuncall / funcall} in file @code{eval.c}. This
function evaluates calls of elisp functions and triggers garbage
collection in the same way as @code{Feval}.
4992 @end enumerate
4993
The upshot is that garbage collection can occur wherever
@code{Feval} or @code{Ffuncall} is used, either directly or
through another function. Since calls to these two functions are hidden
in various other functions, many calls to @code{garbage_collect_1} are
not obviously foreseeable, and therefore unexpected. Instances
worth remembering are various elisp commands, such as
@code{or}, @code{and}, @code{if}, @code{cond}, @code{while},
@code{setq}, etc., miscellaneous @code{gui_item_...} functions,
everything related to @code{eval} (@code{Feval_buffer}, @code{call0},
...) and @code{Fsignal}. The latter is used to handle signals, such as
the ones raised by every @code{QUIT} macro triggered after
pressing Ctrl-g.
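
The implicit trigger in @code{Feval} and @code{Ffuncall} amounts to a
cheap check at the start of each call.  The sketch below models it with
hypothetical @code{toy_} counterparts of the three variables named above;
the units counted, the default threshold, and the use of a strict
comparison are assumptions, and the stand-in collector only resets the
counter.

```c
/* Toy counterparts of the variables named in the text.  The units and
   the default threshold value are arbitrary assumptions.  */
static long toy_consing_since_gc;
static long toy_gc_cons_threshold = 500000;
static int toy_always_gc;

static int toy_gc_runs;         /* for observation only */

/* Stand-in for the real collector: count the run, reset the counter.  */
static void
toy_garbage_collect_1 (void)
{
  toy_gc_runs++;
  toy_consing_since_gc = 0;
}

/* Allocation never collects by itself; it only records how much
   has been allocated since the last collection.  */
static void
toy_note_allocation (long units)
{
  toy_consing_since_gc += units;
}

/* The check performed at the start of each evaluation or call.  */
static void
toy_eval_prologue (void)
{
  if (toy_always_gc || toy_consing_since_gc > toy_gc_cons_threshold)
    toy_garbage_collect_1 ();
}
```

Because allocation only bumps a counter, a function that allocates
heavily but never evaluates Lisp code will not be interrupted by a
collection in this model.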
5006
5007 @node garbage_collect_1
5008 @subsection @code{garbage_collect_1}
5009 @cindex @code{garbage_collect_1}
5010
5011 We can now describe exactly what happens after the invocation takes
5012 place.
5013 @enumerate
5014 @item
There are several cases in which the garbage collector returns immediately:
when we are already garbage collecting (@code{gc_in_progress}), when
garbage collection is forbidden
(@code{gc_currently_forbidden}), when we are currently displaying something
(@code{in_display}) or when we are preparing for the armageddon of the
whole system (@code{preparing_for_armageddon}).
5021 @item
5022 Next the correct frame in which to put
5023 all the output occurring during garbage collecting is determined. In
5024 order to be able to restore the old display's state after displaying the
5025 message, some data about the current cursor position has to be
5026 saved. The variables @code{pre_gc_cursor} and @code{cursor_changed} take
5027 care of that.
5028 @item
5029 The state of @code{gc_currently_forbidden} must be restored after
5030 the garbage collection, no matter what happens during the process. We
5031 accomplish this by @code{record_unwind_protect}ing the suitable function
5032 @code{restore_gc_inhibit} together with the current value of
5033 @code{gc_currently_forbidden}.
5034 @item
If we are currently running an interactive XEmacs session, the next step
is simply to show the garbage collector's cursor/message.
5037 @item
5038 The following steps are the intrinsic steps of the garbage collector,
5039 therefore @code{gc_in_progress} is set.
5040 @item
5041 For debugging purposes, it is possible to copy the current C stack
5042 frame. However, this seems to be a currently unused feature.
5043 @item
5044 Before actually starting to go over all live objects, references to
5045 objects that are no longer used are pruned. We only have to do this for events
5046 (@code{clear_event_resource}) and for specifiers
5047 (@code{cleanup_specifiers}).
5048 @item
5049 Now the mark phase begins and marks all accessible elements. In order to
5050 start from
5051 all slots that serve as roots of accessibility, the function
5052 @code{mark_object} is called for each root individually to go out from
5053 there to mark all reachable objects. All roots that are traversed are
5054 shown in their processed order:
5055 @itemize @bullet
5056 @item
5057 all constant symbols and static variables that are registered via
5058 @code{staticpro}@ in the dynarr @code{staticpros}.
5059 @xref{Adding Global Lisp Variables}.
5060 @item
5061 all Lisp objects that are created in C functions and that must be
5062 protected from freeing them. They are registered in the global
5063 list @code{gcprolist}.
5064 @xref{GCPROing}.
5065 @item
5066 all local variables (i.e. their name fields @code{symbol} and old
5067 values @code{old_values}) that are bound during the evaluation by the Lisp
5068 engine. They are stored in @code{specbinding} structs pushed on a stack
5069 called @code{specpdl}.
5070 @xref{Dynamic Binding; The specbinding Stack; Unwind-Protects}.
5071 @item
5072 all catch blocks that the Lisp engine encounters during the evaluation
5073 cause the creation of structs @code{catchtag} inserted in the list
@code{catchlist}. Their tag (@code{tag}) and value (@code{val}) fields
are freshly created objects and therefore have to be marked.
5076 @xref{Catch and Throw}.
5077 @item
5078 every function application pushes new structs @code{backtrace}
5079 on the call stack of the Lisp engine (@code{backtrace_list}). The unique
5080 parts that have to be marked are the fields for each function
5081 (@code{function}) and all their arguments (@code{args}).
5082 @xref{Evaluation}.
5083 @item
5084 all objects that are used by the redisplay engine that must not be freed
5085 are marked by a special function called @code{mark_redisplay} (in
5086 @code{redisplay.c}).
5087 @item
5088 all objects created for profiling purposes are allocated by C functions
5089 instead of using the lisp allocation mechanisms. In order to receive the
5090 right ones during the sweep phase, they also have to be marked
manually. That is done by the function @code{mark_profiling_info}.
5092 @end itemize
5093 @item
Hash tables in XEmacs belong to a kind of special object that
makes use of a concept often called ``weak pointers''.
To make a long story short, this kind of pointer is not followed
when determining the live objects during garbage collection.
Any object referenced only by weak pointers is collected
anyway, and the reference to it is cleared. Hash tables use weak
pointers in different ways, resulting in different types of hash
table, namely ``non-weak'', ``weak'', ``key-weak'' and ``value-weak''
(internally also ``key-car-weak'' and ``value-car-weak'') hash tables, each
clearing entries depending on different conditions. More information can
be found in the documentation of the function @code{make-hash-table}.
5105
5106 Because there are complicated dependency rules about when and what to
5107 mark while processing weak hash tables, the standard @code{marker}
5108 method is only active if it is marking non-weak hash tables. As soon as
5109 a weak component is in the table, the hash table entries are ignored
5110 while marking. Instead their marking is done each separately by the
5111 function @code{finish_marking_weak_hash_tables}. This function iterates
5112 over each hash table entry @code{hentries} for each weak hash table in
5113 @code{Vall_weak_hash_tables}. Depending on the type of a table, the
5114 appropriate action is performed.
If a table is acting as @code{HASH_TABLE_KEY_WEAK} and a key is already marked,
everything reachable from the @code{value} component is marked. If it is
acting as a @code{HASH_TABLE_VALUE_WEAK} and the value component is
already marked, marking proceeds only from the
@code{key} component.
5120 If it is a @code{HASH_TABLE_KEY_CAR_WEAK} and the car
5121 of the key entry is already marked, we mark both the @code{key} and
5122 @code{value} components.
5123 Finally, if the table is of the type @code{HASH_TABLE_VALUE_CAR_WEAK}
5124 and the car of the value components is already marked, again both the
5125 @code{key} and the @code{value} components get marked.
5126
There are also lists with comparable properties, called weak
lists. They come in different types, called
@code{simple}, @code{assoc}, @code{key-assoc} and
@code{value-assoc}. You can find further details about them in the
description of the function @code{make-weak-list}. The scheme of their
marking is similar: all weak lists are listed in @code{Vall_weak_lists},
5133 therefore we iterate over them. The marking is advanced until we hit an
5134 already marked pair. Then we know that during a former run all
5135 the rest has been marked completely. Again, depending on the special
5136 type of the weak list, our jobs differ. If it is a @code{WEAK_LIST_SIMPLE}
5137 and the elem is marked, we mark the @code{cons} part. If it is a
5138 @code{WEAK_LIST_ASSOC} and not a pair or a pair with both marked car and
5139 cdr, we mark the @code{cons} and the @code{elem}. If it is a
5140 @code{WEAK_LIST_KEY_ASSOC} and not a pair or a pair with a marked car of
5141 the elem, we mark the @code{cons} and the @code{elem}. Finally, if it is
5142 a @code{WEAK_LIST_VALUE_ASSOC} and not a pair or a pair with a marked
5143 cdr of the elem, we mark both the @code{cons} and the @code{elem}.
5144
Marking objects that are reachable from weak hash tables and weak lists
can cause other objects to become marked, which in turn may require
further marking in other weak objects. Both finishing functions are
therefore repeated as long as they mark previously unmarked objects.
5149
5150 @item
5151 After completing the special marking for the weak hash tables and for the weak
5152 lists, all entries that point to objects that are going to be swept in
5153 the further process are useless, and therefore have to be removed from
5154 the table or the list.
5155
5156 The function @code{prune_weak_hash_tables} does the job for weak hash
5157 tables. Totally unmarked hash tables are removed from the list
5158 @code{Vall_weak_hash_tables}. The other ones are treated more carefully
5159 by scanning over all entries and removing one as soon as one of
5160 the components @code{key} and @code{value} is unmarked.
5161
5162 The same idea applies to the weak lists. It is accomplished by
5163 @code{prune_weak_lists}: An unmarked list is pruned from
5164 @code{Vall_weak_lists} immediately. A marked list is treated more
5165 carefully by going over it and removing just the unmarked pairs.
5166
5167 @item
The function @code{prune_specifiers} checks all specifiers held
in @code{Vall_specifiers} and removes from the list the ones that are
unmarked.
5171
5172 @item
5173 All syntax tables are stored in a list called
5174 @code{Vall_syntax_tables}. The function @code{prune_syntax_tables} walks
5175 through it and unlinks the tables that are unmarked.
5176
5177 @item
Next comes the actual sweeping, performed by the function
@code{gc_sweep}, which does most of the work.
5180 @item
5181 First, all the variables with respect to garbage collection are
5182 reset. @code{consing_since_gc} - the counter of the created cells since
5183 the last garbage collection - is set back to 0, and
5184 @code{gc_in_progress} is not @code{true} anymore.
5185 @item
5186 In case the session is interactive, the displayed cursor and message are
5187 removed again.
5188 @item
5189 The state of @code{gc_inhibit} is restored to the former value by
5190 unwinding the stack.
5191 @item
5192 A small memory reserve is always held back that can be reached by
5193 @code{breathing_space}. If nothing more is left, we create a new reserve
5194 and exit.
5195 @end enumerate
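
The interplay of weak marking and pruning in the steps above can be
pictured with a small, self-contained sketch. It is not the code from
XEmacs: @code{struct obj}, @code{struct entry} and all function names are
made up for illustration, only a key-weak table is modelled, and marking
a value just sets one flag instead of following everything reachable
from it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real Lisp objects look different. */
struct obj { int marked; };
struct entry { struct obj *key, *value; };   /* key == NULL: empty slot */

/* One pass over a key-weak table: a marked key keeps its value alive.
   Returns nonzero if anything was newly marked. */
int
finish_marking_key_weak (struct entry *e, size_t n)
{
  int did_mark = 0;
  size_t i;

  for (i = 0; i < n; i++)
    {
      assert (e[i].key == NULL || e[i].value != NULL);
      if (e[i].key && e[i].key->marked && !e[i].value->marked)
        {
          e[i].value->marked = 1;
          did_mark = 1;
        }
    }
  return did_mark;
}

/* Clear every entry with an unmarked key or value.
   Returns the number of entries kept. */
size_t
prune_table (struct entry *e, size_t n)
{
  size_t i, kept = 0;

  for (i = 0; i < n; i++)
    {
      if (e[i].key == NULL)
        continue;
      if (e[i].key->marked && e[i].value->marked)
        kept++;
      else
        e[i].key = e[i].value = NULL;
    }
  return kept;
}

size_t
mark_and_prune (struct entry *e, size_t n)
{
  /* Repeat until a pass marks nothing new, then prune. */
  while (finish_marking_key_weak (e, n))
    ;
  return prune_table (e, n);
}
```

The loop in @code{mark_and_prune} matters whenever a value marked in one
pass is itself the key of an entry that was visited earlier in the same
pass; a single pass would leave that entry's value unmarked and it would
be pruned by mistake.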
5196
5197 @node mark_object
5198 @subsection @code{mark_object}
5199 @cindex @code{mark_object}
5200
The first thing that is checked while marking an object is whether the
object is a real Lisp object (@code{Lisp_Type_Record}) or just an integer
or a character. Integers and characters are the only two types that are
stored directly, without another level of indirection, and therefore they
don't have to be marked and collected.
5206 @xref{How Lisp Objects Are Represented in C}.
5207
The second case is the one we have to handle: we are
dealing with a pointer to a Lisp object. Even then, there are three
situations in which nothing has to be done while marking. First, the
object is read-only, which prevents it from being garbage collected and
hence from being marked (@code{C_READONLY_RECORD_HEADER_P}). Second, the
object in question is already marked and need not be marked a second
time (checked by @code{MARKED_RECORD_HEADER_P}). Third, it is a special,
unmarkable object (@code{UNMARKABLE_RECORD_HEADER_P}; apparently, these
are objects that sit in some const space and therefore cannot be marked, see
@code{this_one_is_unmarkable} in @code{alloc.c}).
5218
Now the actual marking can take place. We first use the macro
@code{MARK_RECORD_HEADER} to mark the object itself (actually the
special flag in the lrecord header), and then call its special marker
``method'' @code{marker} if available. The marker method marks every
other object that is reachable from our current object. Note that, to
limit the recursion depth, a marker method should not call
@code{mark_object} on its most heavily nested sub-object, but
instead should return it as the next object from which marking has to
be continued.
5227
5228 In case another object was returned, as mentioned before, we reiterate
5229 the whole @code{mark_object} process beginning with this next object.
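
The control flow just described can be sketched as follows. This is a
simplified illustration, not the function from @file{alloc.c}: the
object layout, the flag fields and the names @code{mark_object_sketch}
and @code{mark_cons_sketch} are assumptions of the example, and the
unmarkable case is left out.

```c
#include <assert.h>
#include <stddef.h>

struct obj;
typedef struct obj *(*marker_fn) (struct obj *);

/* Hypothetical layout; the real lrecord header differs. */
struct obj
{
  int marked;           /* stands in for the mark flag in the header */
  int readonly;         /* stands in for the read-only flag */
  marker_fn marker;     /* NULL if nothing else is reachable */
  struct obj *car, *cdr;
};

void mark_object_sketch (struct obj *o);

/* Marker for a cons-like object: the car is marked by a nested call,
   the cdr is handed back so that long lists do not deepen the C stack. */
struct obj *
mark_cons_sketch (struct obj *o)
{
  assert (o != NULL);
  mark_object_sketch (o->car);
  return o->cdr;
}

void
mark_object_sketch (struct obj *o)
{
  while (o != NULL)
    {
      if (o->readonly || o->marked)
        return;                 /* nothing to do for this object */
      o->marked = 1;
      if (o->marker == NULL)
        return;
      o = o->marker (o);        /* continue with the returned object */
    }
}
```

Returning the cdr instead of marking it with a nested call means that a
list of any length is walked by the @code{while} loop with constant
stack depth.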
5230
5231 @node gc_sweep
5232 @subsection @code{gc_sweep}
5233 @cindex @code{gc_sweep}
5234
5235 The job of this function is to free all unmarked records from memory. As
5236 we know, there are different types of objects implemented and managed, and
5237 consequently different ways to free them from memory.
5238 @xref{Introduction to Allocation}.
5239
We start with all objects stored as @code{lcrecords}. All
bulkier objects are allocated and handled using this scheme.
Each object is @code{malloc}ed separately
instead of being placed in one of the contiguous frob blocks. The types
that are currently allocated
using @code{alloc_lcrecord} and
@code{make_lcrecord_list} are: vectors, buffers,
5247 char-table, char-table-entry, console, weak-list, database, device,
5248 ldap, hash-table, command-builder, extent-auxiliary, extent-info, face,
5249 coding-system, frame, image-instance, glyph, popup-data, gui-item,
5250 keymap, charset, color_instance, font_instance, opaque, opaque-list,
5251 process, range-table, specifier, symbol-value-buffer-local,
5252 symbol-value-lisp-magic, symbol-value-varalias, toolbar-button,
tooltalk-message, tooltalk-pattern, window, and window-configuration. We
take care of them first
in order to be able to handle and to finalize items stored in them more
easily. The function @code{sweep_lcrecords_1}, described below,
does the whole job for us.
For a description of the internals, see @ref{lrecords}.
5259
Our next candidates are objects that behave quite differently
from everything else: the strings. They consist of two parts, a
fixed-size portion (@code{struct Lisp_String}) holding the string's
length, its property list and a pointer to the second part, and the
actual string data, which is stored in string-chars blocks comparable to
frob blocks. In these blocks, the data is not only freed, but
holes are also compacted, i.e. all strings are relocated together.
@xref{String}. This compacting phase is performed by the function
@code{compact_string_chars}; the actual sweeping by the function
@code{sweep_strings} is described below.
5270
After that, the other types are swept step by step using the functions
@code{sweep_conses}, @code{sweep_bit_vectors_1},
@code{sweep_compiled_functions}, @code{sweep_floats},
@code{sweep_symbols}, @code{sweep_extents}, @code{sweep_markers} and
@code{sweep_events}.  These handle the fixed-size types cons, float,
compiled-function, symbol, marker, extent, and event, which are stored in
so-called ``frob blocks''. We can therefore do basically the same for
objects of every type, using the same macros, which are defined specifically to
handle everything with respect to fixed-size blocks. The only fixed-size
type that is not handled here is the fixed-size portion of strings,
because we took special care of it earlier.
5282
The only big exception is bit vectors, which are stored differently and
therefore treated differently by the function @code{sweep_bit_vectors_1}
described later.
5286
First, we need some brief information about how
these fixed-size types are managed in general, in order to understand
how the sweeping is done. They all have a fixed size, and are therefore
stored in big blocks of memory, allocated at once, that can hold a
certain number of objects of one type. The macro
@code{DECLARE_FIXED_TYPE_ALLOC} creates the suitable structures for
every type. More precisely, we have the block struct
(holding a pointer to the previous block @code{prev} and the
objects in @code{block[]}), a pointer to the current block
(@code{current_..._block}) and its last index
(@code{current_..._block_index}), and a pointer to the free list that
will be created. There is also a macro @code{FIXED_TYPE_FROM_BLOCK} plus some
related macros that are used to obtain a new object, either from
the free list (@code{ALLOCATE_FIXED_TYPE_1}) if there is an unused object
of that type stored, or by allocating a completely new block using
@code{ALLOCATE_FIXED_TYPE_FROM_BLOCK}.
5303
5304 The rest works as follows: all of them define a
5305 macro @code{UNMARK_...} that is used to unmark the object. They define a
5306 macro @code{ADDITIONAL_FREE_...} that defines additional work that has
5307 to be done when converting an object from in use to not in use (so far,
5308 only markers use it in order to unchain them). Then, they all call
5309 the macro @code{SWEEP_FIXED_TYPE_BLOCK} instantiated with their type name
5310 and their struct name.
5311
This call in particular does the following: we go over all blocks,
starting with the current one and moving towards the oldest.
For each block, we look at every object in it. If the object is already
freed (checked with @code{FREE_STRUCT_P} using the first pointer of the
object), or if it is
set to read-only (@code{C_READONLY_RECORD_HEADER_P}), nothing needs to be
done. If it is unmarked (checked with @code{MARKED_RECORD_HEADER_P}), it
is put in the free list and set free (using the macro
@code{FREE_FIXED_TYPE}); otherwise it stays in the block, but is unmarked
(by @code{UNMARK_...}). While going through one block, we note if the
whole block is empty. If so, the whole block is freed (using
@code{xfree}) and the free list state is set to the state it had before
handling this block.
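
As a rough illustration of this block-wise sweep, here is a minimal
sketch in plain C rather than macros. The types, field names and the
function @code{sweep_blocks} are invented for the example, read-only
objects are not modelled, and, as a simplification of the sketch, the
free list is rebuilt from scratch so that slots which were already free
are simply linked in again.

```c
#include <assert.h>
#include <stdlib.h>

#define OBJS_PER_BLOCK 4        /* tiny on purpose */

struct obj
{
  int in_use;                   /* 0 means the slot is free */
  int marked;
  struct obj *next_free;
};

struct block
{
  struct block *prev;           /* next older block */
  struct obj slot[OBJS_PER_BLOCK];
};

struct heap
{
  struct block *current;        /* newest block */
  struct obj *free_list;
};

/* Sweep all blocks, newest first.  Returns the number of survivors. */
int
sweep_blocks (struct heap *h)
{
  struct block **link;
  int live = 0;

  assert (h != NULL);
  link = &h->current;
  h->free_list = NULL;          /* rebuilt below */
  while (*link != NULL)
    {
      struct block *b = *link;
      struct obj *saved_free_list = h->free_list;
      int empty = 1, i;

      for (i = 0; i < OBJS_PER_BLOCK; i++)
        {
          struct obj *o = &b->slot[i];
          if (o->in_use && o->marked)
            {
              o->marked = 0;    /* survivor: unmark for the next GC */
              empty = 0;
              live++;
            }
          else
            {
              o->in_use = 0;    /* garbage, or free already */
              o->next_free = h->free_list;
              h->free_list = o;
            }
        }

      if (empty && (b != h->current || b->prev != NULL))
        {
          /* Whole block is garbage: unlink it and drop its slots from
             the free list by restoring the saved list head. */
          *link = b->prev;
          h->free_list = saved_free_list;
          free (b);
        }
      else
        link = &b->prev;        /* keep the block, also a sole one */
    }
  return live;
}
```

Saving the free list head before each block is what makes releasing an
empty block cheap: every slot pushed while scanning that block is
discarded in one assignment.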
5325
5326 @node sweep_lcrecords_1
5327 @subsection @code{sweep_lcrecords_1}
5328 @cindex @code{sweep_lcrecords_1}
5329
After zeroing the lcrecord statistics, we go over all
lcrecords two separate times. They are all chained together in a list with
a head called @code{all_lcrecords}.

The first loop calls the @code{finalizer} method of each object, but only
if the object is not read-only
(@code{C_READONLY_RECORD_HEADER_P}), is not marked
(@code{MARKED_RECORD_HEADER_P}), is not already in a free list (list of
freed objects, field @code{free}) and finally has a finalizer
method.

The second loop actually frees the appropriate objects by iterating
through the whole list again. If an object is read-only or marked, it
has to persist; otherwise it is manually freed by calling
@code{xfree}. During this loop, the lcrecord statistics are kept up to
date by calling @code{tick_lcrecord_stats} with the right arguments.
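
The two loops can be sketched like this. The record layout and the
names @code{struct lcrec}, @code{sweep_records} and
@code{count_finalizer} are assumptions made for the example; the free
list check and the statistics are omitted.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record header for the sketch. */
struct lcrec
{
  struct lcrec *next;                   /* chain of all records */
  int marked;
  int readonly;
  void (*finalizer) (struct lcrec *);   /* may be NULL */
};

int finalized_count;                    /* bookkeeping for the example */

void
count_finalizer (struct lcrec *r)
{
  (void) r;
  finalized_count++;
}

/* Returns the number of records freed. */
int
sweep_records (struct lcrec **all)
{
  struct lcrec *r, **link;
  int freed = 0;

  assert (all != NULL);

  /* Pass 1: finalize every record that is about to die, while all
     records, including other garbage, are still intact. */
  for (r = *all; r != NULL; r = r->next)
    if (!r->readonly && !r->marked && r->finalizer != NULL)
      r->finalizer (r);

  /* Pass 2: unlink and free the dead records, unmark the survivors. */
  for (link = all; *link != NULL; )
    {
      r = *link;
      if (r->readonly || r->marked)
        {
          r->marked = 0;
          link = &r->next;
        }
      else
        {
          *link = r->next;
          free (r);
          freed++;
        }
    }
  return freed;
}
```

Running all finalizers before freeing anything means a finalizer never
sees a neighbouring record that has already been released.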
5346
5347 @node compact_string_chars
5348 @subsection @code{compact_string_chars}
5349 @cindex @code{compact_string_chars}
5350
5351 The purpose of this function is to compact all the data parts of the
5352 strings that are held in so-called @code{string_chars_block}, i.e. the
5353 strings that do not exceed a certain maximal length.
5354
The procedure with which this is done is as follows. We keep two
positions in the @code{string_chars_block}s using two pointer/integer
pairs, namely @code{from_sb}/@code{from_pos} and
@code{to_sb}/@code{to_pos}. They stand for the current positions from
which and to which the string being handled is copied.

While going over all chained @code{string_chars_block}s and the
strings they hold, starting at @code{first_string_chars_block}, both pointers
are advanced and eventually a string is copied from @code{from_sb} to
@code{to_sb}, depending on the status of the strings pointed at.
5365
5366 More precisely, we can distinguish between the following actions.
5367 @itemize @bullet
5368 @item
The string at @code{from_sb}'s position could be marked as free, which
is indicated by an invalid value in the pointer that should point back
to the fixed-size string object, and which is checked by
@code{FREE_STRUCT_P}. In this case, the @code{from_sb}/@code{from_pos}
pair is advanced to the next string, and nothing has to be copied.
5374 @item
5375 Also, if a string object itself is unmarked, nothing has to be
5376 copied. We likewise advance the @code{from_sb}/@code{from_pos}
5377 pair as described above.
5378 @item
In all other cases, we have a marked string at hand. The string data
must be moved from the from-position to the to-position. If
there is not enough space in the current @code{to_sb} block, we advance
this pointer to the beginning of the next block before copying. If the
from and to positions are different, we perform the
actual copying using the library function @code{memmove}.
5385 @end itemize
5386
After compacting, the pointer to the current
@code{string_chars_block}, held in @code{current_string_chars_block},
is reset to the last block to which we moved a string,
i.e. @code{to_sb}, and all remaining blocks (we know that they just
carry garbage) are explicitly @code{xfree}d.
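
The from/to scheme is easier to see in a reduced setting. The sketch
below assumes a single arena instead of a chain of blocks, and a
separate array of hypothetical @code{struct str} descriptors in place
of the back pointers stored with the string data; none of these names
come from XEmacs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor of one string's data inside the arena. */
struct str
{
  size_t offset;        /* where the data currently lives */
  size_t len;
  int marked;
};

/* Slide the data of all marked strings toward the start of ARENA.
   STRS must be sorted by offset.  Returns the number of bytes in use. */
size_t
compact (char *arena, struct str *strs, size_t n)
{
  size_t to = 0, i;

  for (i = 0; i < n; i++)
    {
      struct str *s = &strs[i];
      if (!s->marked)
        continue;                       /* garbage: only "from" advances */
      assert (s->offset >= to);         /* data only ever moves down */
      if (s->offset != to)
        memmove (arena + to, arena + s->offset, s->len);
      s->offset = to;                   /* fix up the owner's reference */
      to += s->len;
    }
  return to;
}
```

@code{memmove} rather than @code{memcpy} is needed because source and
destination may overlap when a string moves by less than its own
length.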
5392
5393 @node sweep_strings
5394 @subsection @code{sweep_strings}
5395 @cindex @code{sweep_strings}
5396
The sweeping for the fixed-size string objects is essentially exactly
the same as it is for all other fixed-size types. As before, the freeing
into the suitable free list is done by using the macro
@code{SWEEP_FIXED_TYPE_BLOCK} after defining the right macros
@code{UNMARK_string} and @code{ADDITIONAL_FREE_string}. These two
definitions are a little bit special compared to the ones used
for the other fixed-size types.
5404
@code{UNMARK_string} is defined the same way, except for some additional code
used for updating the bookkeeping information.

For strings, @code{ADDITIONAL_FREE_string} has to do something in
addition: if the string was not allocated in a
@code{string_chars_block} because it exceeded the maximal length, and
was therefore @code{malloc}ed separately, we now also @code{xfree}
it explicitly.
5413
5414 @node sweep_bit_vectors_1
5415 @subsection @code{sweep_bit_vectors_1}
5416 @cindex @code{sweep_bit_vectors_1}
5417
Bit vectors are also one of the rare types that are @code{malloc}ed
individually. Consequently, while sweeping, all bit vectors that are no
longer needed must be freed by hand. This is done in
the expected way: since they are all registered in a list called
@code{all_bit_vectors}, all elements of that list are traversed,
all unmarked bit vectors are unlinked and freed by calling @code{xfree}, and all
remaining ones are unmarked.
In addition, the bookkeeping information used for the garbage
collector's output purposes is updated.
5427
5428 @node Integers and Characters
5429 @section Integers and Characters
5430 @cindex integers and characters
5431 @cindex characters, integers and
5432
5433   Integer and character Lisp objects are created from integers using the
5434 macros @code{XSETINT()} and @code{XSETCHAR()} or the equivalent
5435 functions @code{make_int()} and @code{make_char()}. (These are actually
5436 macros on most systems.)  These functions basically just do some moving
5437 of bits around, since the integral value of the object is stored
5438 directly in the @code{Lisp_Object}.
5439
5440   @code{XSETINT()} and the like will truncate values given to them that
5441 are too big; i.e. you won't get the value you expected but the tag bits
5442 will at least be correct.
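
To make the truncation concrete, here is a toy model. The layout is an
assumption of this sketch only (a 32-bit word whose lowest bit is 1 for
an integer, leaving 31 bits for the value); it is not meant to
reproduce the actual tag layout, and the function names are invented.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for this sketch only: 32-bit word, lowest bit set for ints. */
typedef uint32_t lisp_word;

lisp_word
make_int_sketch (int32_t v)
{
  /* Shifting in unsigned arithmetic silently drops the top bit of V. */
  return ((uint32_t) v << 1) | 1u;
}

int
is_int_sketch (lisp_word w)
{
  return (w & 1u) != 0;
}

int32_t
int_value_sketch (lisp_word w)
{
  int64_t v;

  assert (is_int_sketch (w));
  v = (int64_t) (w >> 1);               /* 31-bit payload */
  if (v >= ((int64_t) 1 << 30))
    v -= (int64_t) 1 << 31;             /* sign-extend from bit 30 */
  return (int32_t) v;
}
```

In this model, any value that fits in 31 bits survives the round trip,
while a larger one comes back changed but still carries a valid integer
tag.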
5443
5444 @node Allocation from Frob Blocks
5445 @section Allocation from Frob Blocks
5446 @cindex allocation from frob blocks
5447 @cindex frob blocks, allocation from
5448
5449 The uninitialized memory required by a @code{Lisp_Object} of a particular type
5450 is allocated using
5451 @code{ALLOCATE_FIXED_TYPE()}.  This only occurs inside of the
5452 lowest-level object-creating functions in @file{alloc.c}:
5453 @code{Fcons()}, @code{make_float()}, @code{Fmake_byte_code()},
5454 @code{Fmake_symbol()}, @code{allocate_extent()},
5455 @code{allocate_event()}, @code{Fmake_marker()}, and
5456 @code{make_uninit_string()}.  The idea is that, for each type, there are
5457 a number of frob blocks (each 2K in size); each frob block is divided up
5458 into object-sized chunks.  Each frob block will have some of these
5459 chunks that are currently assigned to objects, and perhaps some that are
5460 free. (If a frob block has nothing but free chunks, it is freed at the
5461 end of the garbage collection cycle.)  The free chunks are stored in a
5462 free list, which is chained by storing a pointer in the first four bytes
5463 of the chunk. (Except for the free chunks at the end of the last frob
5464 block, which are handled using an index which points past the end of the
5465 last-allocated chunk in the last frob block.)
5466 @code{ALLOCATE_FIXED_TYPE()} first tries to retrieve a chunk from the
5467 free list; if that fails, it calls
5468 @code{ALLOCATE_FIXED_TYPE_FROM_BLOCK()}, which looks at the end of the
5469 last frob block for space, and creates a new frob block if there is
5470 none. (There are actually two versions of these macros, one of which is
5471 more defensive but less efficient and is used for error-checking.)
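
The allocation path just described might look roughly like this when
written as functions instead of macros. Everything here is a sketch
under assumptions: the names @code{allocate_chunk},
@code{free_chunk} and @code{struct allocator} are invented, the block
holds only four chunks, and the free-list link is a full pointer at the
start of the chunk.

```c
#include <assert.h>
#include <stdlib.h>

#define CHUNKS_PER_BLOCK 4              /* tiny on purpose */

union chunk
{
  union chunk *next_free;               /* valid only while the chunk is free */
  double payload[2];                    /* stands in for the real object */
};

struct frob_block
{
  struct frob_block *prev;
  union chunk chunk[CHUNKS_PER_BLOCK];
};

struct allocator
{
  struct frob_block *current;           /* newest block, or NULL */
  int index;                            /* first never-used chunk in it */
  union chunk *free_list;
};

/* Returns uninitialized storage for one object, or NULL if out of memory. */
void *
allocate_chunk (struct allocator *a)
{
  assert (a != NULL);
  if (a->free_list != NULL)
    {
      union chunk *c = a->free_list;    /* reuse a freed chunk first */
      a->free_list = c->next_free;
      return c;
    }
  if (a->current == NULL || a->index == CHUNKS_PER_BLOCK)
    {
      struct frob_block *b = malloc (sizeof *b);
      if (b == NULL)
        return NULL;
      b->prev = a->current;             /* chain the new block in front */
      a->current = b;
      a->index = 0;
    }
  return &a->current->chunk[a->index++];
}

void
free_chunk (struct allocator *a, void *p)
{
  union chunk *c = p;

  assert (a != NULL && p != NULL);
  c->next_free = a->free_list;          /* link stored inside the chunk */
  a->free_list = c;
}
```

Storing the link inside the free chunk itself is why the free list
needs no memory of its own; the price is that a chunk must be at least
as large as a pointer.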
5472
5473 @node lrecords
5474 @section lrecords
5475 @cindex lrecords
5476
5477   [see @file{lrecord.h}]
5478
5479   All lrecords have at the beginning of their structure a @code{struct
5480 lrecord_header}.  This just contains a type number and some flags,
5481 including the mark bit.  All builtin type numbers are defined as
5482 constants in @code{enum lrecord_type}, to allow the compiler to generate
more efficient code for @code{@var{type}P}.  The type number, through the
@code{lrecord_implementations_table}, gives access to a @code{struct
lrecord_implementation}, which is a structure containing method pointers
5486 and such.  There is one of these for each type, and it is a global,
5487 constant, statically-declared structure that is declared in the
5488 @code{DEFINE_LRECORD_IMPLEMENTATION()} macro.
5489
5490   Simple lrecords (of type (b) above) just have a @code{struct
5491 lrecord_header} at their beginning.  lcrecords, however, actually have a
5492 @code{struct lcrecord_header}.  This, in turn, has a @code{struct
5493 lrecord_header} at its beginning, so sanity is preserved; but it also
5494 has a pointer used to chain all lcrecords together, and a special ID
5495 field used to distinguish one lcrecord from another. (This field is used
5496 only for debugging and could be removed, but the space gain is not
5497 significant.)
5498
5499   Simple lrecords are created using @code{ALLOCATE_FIXED_TYPE()}, just
5500 like for other frob blocks.  The only change is that the implementation
5501 pointer must be initialized correctly. (The implementation structure for
5502 an lrecord, or rather the pointer to it, is named @code{lrecord_float},
5503 @code{lrecord_extent}, @code{lrecord_buffer}, etc.)
5504
5505   lcrecords are created using @code{alloc_lcrecord()}.  This takes a
5506 size to allocate and an implementation pointer. (The size needs to be
5507 passed because some lcrecords, such as window configurations, are of
5508 variable size.) This basically just @code{malloc()}s the storage,
5509 initializes the @code{struct lcrecord_header}, and chains the lcrecord
5510 onto the head of the list of all lcrecords, which is stored in the
5511 variable @code{all_lcrecords}.  The calls to @code{alloc_lcrecord()}
5512 generally occur in the lowest-level allocation function for each lrecord
5513 type.
5514
5515 Whenever you create an lrecord, you need to call either
5516 @code{DEFINE_LRECORD_IMPLEMENTATION()} or
5517 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()}.  This needs to be
5518 specified in a @file{.c} file, at the top level.  What this actually
5519 does is define and initialize the implementation structure for the
5520 lrecord. (And possibly declares a function @code{error_check_foo()} that
5521 implements the @code{XFOO()} macro when error-checking is enabled.)  The
5522 arguments to the macros are the actual type name (this is used to
5523 construct the C variable name of the lrecord implementation structure
5524 and related structures using the @samp{##} macro concatenation
5525 operator), a string that names the type on the Lisp level (this may not
5526 be the same as the C type name; typically, the C type name has
5527 underscores, while the Lisp string has dashes), various method pointers,
5528 and the name of the C structure that contains the object.  The methods
5529 are used to encapsulate type-specific information about the object, such
5530 as how to print it or mark it for garbage collection, so that it's easy
5531 to add new object types without having to add a specific case for each
5532 new type in a bunch of different places.
5533
5534   The difference between @code{DEFINE_LRECORD_IMPLEMENTATION()} and
5535 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()} is that the former is
5536 used for fixed-size object types and the latter is for variable-size
5537 object types.  Most object types are fixed-size; some complex
5538 types, however (e.g. window configurations), are variable-size.
5539 Variable-size object types have an extra method, which is called
5540 to determine the actual size of a particular object of that type.
5541 (Currently this is only used for keeping allocation statistics.)
5542
5543   For the purpose of keeping allocation statistics, the allocation
5544 engine keeps a list of all the different types that exist.  Note that,
5545 since @code{DEFINE_LRECORD_IMPLEMENTATION()} is a macro that is
5546 specified at top-level, there is no way for it to initialize the global
5547 data structures containing type information, like
5548 @code{lrecord_implementations_table}.  For this reason a call to
5549 @code{INIT_LRECORD_IMPLEMENTATION} must be added to the same source file
5550 containing @code{DEFINE_LRECORD_IMPLEMENTATION}, but instead of to the
5551 top level, to one of the init functions, typically
@code{syms_of_@var{foo}()}.  @code{INIT_LRECORD_IMPLEMENTATION} must be
5553 called before an object of this type is used.
5554
5555 The type number is also used to index into an array holding the number
5556 of objects of each type and the total memory allocated for objects of
5557 that type.  The statistics in this array are computed during the sweep
5558 stage.  These statistics are returned by the call to
5559 @code{garbage-collect}.
5560
5561   Note that for every type defined with a @code{DEFINE_LRECORD_*()}
5562 macro, there needs to be a @code{DECLARE_LRECORD_IMPLEMENTATION()}
5563 somewhere in a @file{.h} file, and this @file{.h} file needs to be
5564 included by @file{inline.c}.
5565
5566   Furthermore, there should generally be a set of @code{XFOOBAR()},
5567 @code{FOOBARP()}, etc. macros in a @file{.h} (or occasionally @file{.c})
5568 file.  To create one of these, copy an existing model and modify as
5569 necessary.
5570
5571   @strong{Please note:} If you define an lrecord in an external
5572 dynamically-loaded module, you must use @code{DECLARE_EXTERNAL_LRECORD},
5573 @code{DEFINE_EXTERNAL_LRECORD_IMPLEMENTATION}, and
5574 @code{DEFINE_EXTERNAL_LRECORD_SEQUENCE_IMPLEMENTATION} instead of the
5575 non-EXTERNAL forms. These macros will dynamically add new type numbers
5576 to the global enum that records them, whereas the non-EXTERNAL forms
5577 assume that the programmer has already inserted the correct type numbers
5578 into the enum's code at compile-time.
5579
5580   The various methods in the lrecord implementation structure are:
5581
5582 @enumerate
5583 @item
5584 @cindex mark method
5585 A @dfn{mark} method.  This is called during the marking stage and passed
5586 a function pointer (usually the @code{mark_object()} function), which is
5587 used to mark an object.  All Lisp objects that are contained within the
5588 object need to be marked by applying this function to them.  The mark
5589 method should also return a Lisp object, which should be either @code{nil} or
5590 an object to mark. (This can be used in lieu of calling
5591 @code{mark_object()} on the object, to reduce the recursion depth, and
5592 consequently should be the most heavily nested sub-object, such as a
5593 long list.)
5594
5595 @strong{Please note:} When the mark method is called, garbage collection
5596 is in progress, and special precautions need to be taken when accessing
5597 objects; see section (B) above.
5598
5599 If your mark method does not need to do anything, it can be
5600 @code{NULL}.
5601
5602 @item
5603 A @dfn{print} method.  This is called to create a printed representation
5604 of the object, whenever @code{princ}, @code{prin1}, or the like is
5605 called.  It is passed the object, a stream to which the output is to be
5606 directed, and an @code{escapeflag} which indicates whether the object's
5607 printed representation should be @dfn{escaped} so that it is
5608 readable. (This corresponds to the difference between @code{princ} and
5609 @code{prin1}.) Basically, @dfn{escaped} means that strings will have
5610 quotes around them and confusing characters in the strings such as
5611 quotes, backslashes, and newlines will be backslashed; and that special
5612 care will be taken to make symbols print in a readable fashion
5613 (e.g. symbols that look like numbers will be backslashed).  Other
5614 readable objects should perhaps pass @code{escapeflag} on when
5615 sub-objects are printed, so that readability is preserved when necessary
5616 (or if not, always pass in a 1 for @code{escapeflag}).  Non-readable
5617 objects should in general ignore @code{escapeflag}, except that some use
5618 it as an indication that more verbose output should be given.
5619
5620 Sub-objects are printed using @code{print_internal()}, which takes
5621 exactly the same arguments as are passed to the print method.
5622
5623 Literal C strings should be printed using @code{write_c_string()},
5624 or @code{write_string_1()} for non-null-terminated strings.
5625
5626 Functions that do not have a readable representation should check the
5627 @code{print_readably} flag and signal an error if it is set.
5628
5629 If you specify NULL for the print method, the
5630 @code{default_object_printer()} will be used.
5631
5632 @item
5633 A @dfn{finalize} method.  This is called at the beginning of the sweep
5634 stage on lcrecords that are about to be freed, and should be used to
5635 perform any extra object cleanup.  This typically involves freeing any
5636 extra @code{malloc()}ed memory associated with the object, releasing any
5637 operating-system and window-system resources associated with the object
5638 (e.g. pixmaps, fonts), etc.
5639
5640 The finalize method can be NULL if nothing needs to be done.
5641
5642 WARNING #1: The finalize method is also called at the end of the dump
5643 phase; this time with the for_disksave parameter set to non-zero.  The
5644 object is @emph{not} about to disappear, so you have to make sure to
5645 @emph{not} free any extra @code{malloc()}ed memory if you're going to
5646 need it later.  (Also, signal an error if there are any operating-system
5647 and window-system resources here, because they can't be dumped.)
5648
5649 Finalize methods should, as a rule, set to zero any pointers after
5650 they've been freed, and check to make sure pointers are not zero before
5651 freeing.  Although I'm pretty sure that finalize methods are not called
5652 twice on the same object (except for the @code{for_disksave} proviso),
5653 we've gotten nastily burned in some cases by not doing this.
5654
WARNING #2: The finalize method is @emph{only} called for
lcrecords, @emph{not} for simple lrecords.  If you need a
finalize method for simple lrecords, you have to stick
it in the @code{ADDITIONAL_FREE_foo()} macro in @file{alloc.c}.
5659
5660 WARNING #3: Things are in an @emph{extremely} bizarre state
5661 when @code{ADDITIONAL_FREE_foo()} is called, so you have to
5662 be incredibly careful when writing one of these functions.
5663 See the comment in @code{gc_sweep()}.  If you ever have to add
5664 one of these, consider using an lcrecord or dealing with
5665 the problem in a different fashion.
5666
5667 @item
5668 An @dfn{equal} method.  This compares the two objects for similarity,
5669 when @code{equal} is called.  It should compare the contents of the
5670 objects in some reasonable fashion.  It is passed the two objects and a
5671 @dfn{depth} value, which is used to catch circular objects.  To compare
5672 sub-Lisp-objects, call @code{internal_equal()} and bump the depth value
5673 by one.  If this value gets too high, a @code{circular-object} error
5674 will be signaled.
5675
5676 If this is NULL, objects are @code{equal} only when they are @code{eq},
5677 i.e. identical.
5678
5679 @item
5680 A @dfn{hash} method.  This is used to hash objects when they are to be
5681 compared with @code{equal}.  The rule here is that if two objects are
5682 @code{equal}, they @emph{must} hash to the same value; i.e. your hash
5683 function should use some subset of the sub-fields of the object that are
5684 compared in the ``equal'' method.  If you specify this method as
5685 @code{NULL}, the object's pointer will be used as the hash, which will
5686 @emph{fail} if the object has an @code{equal} method, so don't do this.
5687
5688 To hash a sub-Lisp-object, call @code{internal_hash()}.  Bump the
5689 depth by one, just like in the ``equal'' method.
5690
5691 To convert a Lisp object directly into a hash value (using
5692 its pointer), use @code{LISP_HASH()}.  This is what happens when
5693 the hash method is NULL.
5694
5695 To hash two or more values together into a single value, use
5696 @code{HASH2()}, @code{HASH3()}, @code{HASH4()}, etc.
5697
5698 @item
5699 @dfn{getprop}, @dfn{putprop}, @dfn{remprop}, and @dfn{plist} methods.
5700 These are used for object types that have properties.  I don't feel like
5701 documenting them here.  If you create one of these objects, you have to
5702 use different macros to define them,
5703 i.e. @code{DEFINE_LRECORD_IMPLEMENTATION_WITH_PROPS()} or
5704 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION_WITH_PROPS()}.
5705
5706 @item
5707 A @dfn{size_in_bytes} method, when the object is of variable size
5708 (i.e. declared with a @code{_SEQUENCE_IMPLEMENTATION} macro).  This should
5709 simply return the object's size in bytes, exactly as you might expect.
5710 For an example, see the methods for window configurations and opaques.
5711 @end enumerate
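
As a rough illustration of the rule that @code{equal} objects must hash
to the same value, here is a minimal, self-contained sketch.  It does
not use the real XEmacs types or macros: @code{struct point_pair},
@code{pair_equal()}, @code{pair_hash()} and @code{hash2()} are
hypothetical names, with @code{hash2()} only loosely playing the role of
@code{HASH2()}.

```c
/* Hypothetical object with two compared fields and one ignored field. */
struct point_pair
{
  long x;
  long y;
  long cache;                   /* deliberately not compared */
};

/* Hypothetical stand-in for combining two hash values into one. */
static unsigned long
hash2 (unsigned long a, unsigned long b)
{
  return a * 31UL + b;
}

/* "equal" method: compares contents.  DEPTH would be passed on,
   incremented by one, when comparing sub-objects. */
static int
pair_equal (const struct point_pair *a, const struct point_pair *b,
            int depth)
{
  (void) depth;                 /* no sub-objects in this sketch */
  return a->x == b->x && a->y == b->y;
}

/* "hash" method: uses only fields that pair_equal() compares, so
   objects that are equal necessarily hash alike. */
static unsigned long
pair_hash (const struct point_pair *p, int depth)
{
  (void) depth;
  return hash2 ((unsigned long) p->x, (unsigned long) p->y);
}
```

Hashing the @code{cache} field as well would break the rule, since two
objects could then be equal and still hash differently.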
5712
5713 @node Low-level allocation
5714 @section Low-level allocation
5715 @cindex low-level allocation
5716 @cindex allocation, low-level
5717
5718   Memory that you want to allocate directly should be allocated using
5719 @code{xmalloc()} rather than @code{malloc()}.  This implements
5720 error-checking on the return value, and once upon a time did some more
5721 vital stuff (i.e. @code{BLOCK_INPUT}, which is no longer necessary).
5722 Free using @code{xfree()}, and realloc using @code{xrealloc()}.  Note
5723 that @code{xmalloc()} will do a non-local exit if the memory can't be
5724 allocated. (Many functions, however, do not expect this, and thus XEmacs
5725 will likely crash if this happens.  @strong{This is a bug.}  If you can,
5726 you should strive to make your function handle this OK.  However, it's
5727 difficult in the general circumstance, perhaps requiring extra
5728 unwind-protects and such.)
5729
5730   Note that XEmacs provides two separate replacements for the standard
5731 @code{malloc()} library function.  These are called @dfn{old GNU malloc}
5732 (@file{malloc.c}) and @dfn{new GNU malloc} (@file{gmalloc.c}),
5733 respectively.  New GNU malloc is better in pretty much every way than
5734 old GNU malloc, and should be used if possible.  (It used to be that on
5735 some systems, the old one worked but the new one didn't.  I think this
5736 was due specifically to a bug in SunOS, which the new one now works
5737 around; so I don't think the old one ever has to be used any more.) The
5738 primary difference between both of these mallocs and the standard system
5739 malloc is that they are much faster, at the expense of increased space.
5740 The basic idea is that memory is allocated in fixed chunks of powers of
5741 two.  This allows for basically constant malloc time, since the various
5742 chunks can just be kept on a number of free lists. (The standard system
5743 malloc typically allocates arbitrary-sized chunks and has to spend some
5744 time, sometimes a significant amount of time, walking the heap looking
5745 for a free block to use and cleaning things up.)  The new GNU malloc
5746 improves on things by allocating large objects in chunks of 4096 bytes
5747 rather than in ever larger powers of two, which results in ever larger
5748 wastage.  There is a slight speed loss here, but it's of doubtful
5749 significance.
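
The difference between the two rounding policies can be sketched as
follows.  This is not the code in @file{malloc.c} or @file{gmalloc.c};
the function names are made up, and the point at which a request counts
as ``large'' (here, anything above 4096 bytes) is an assumption made for
the sake of the example.

```c
#include <stddef.h>

#define LARGE_UNIT 4096u        /* chunk size for large objects */

/* Old-style policy: round every request up to the next power of two. */
static size_t
round_pow2 (size_t n)
{
  size_t size = 1;
  while (size < n)
    size <<= 1;
  return size;
}

/* New-style policy as described above: powers of two for small
   requests, multiples of 4096 bytes for large ones.  The cutoff
   used here is an assumption. */
static size_t
round_new (size_t n)
{
  if (n <= LARGE_UNIT)
    return round_pow2 (n);
  return (n + LARGE_UNIT - 1) / LARGE_UNIT * LARGE_UNIT;
}
```

For a 9000-byte request, the first policy hands out 16384 bytes and the
second 12288, which is where the reduced wastage comes from.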
5750
5751   NOTE: Apparently there is a third-generation GNU malloc that is
5752 significantly better than the new GNU malloc, and should probably
5753 be included in XEmacs.
5754
5755   There is also the relocating allocator, @file{ralloc.c}.  This actually
5756 moves blocks of memory around so that the @code{sbrk()} pointer can be
5757 shrunk and virtual memory released back to the system.  On some systems,
5758 this is a big win.  On all systems, it causes a noticeable (and
5759 sometimes huge) speed penalty, so I turn it off by default.
5760 @file{ralloc.c} only works with the new GNU malloc in @file{gmalloc.c}.
5761 There are also two versions of @file{ralloc.c}, one of which uses @code{mmap()}
5762 rather than block copies to move data around.  This purports to
5763 be faster, although that depends on the amount of data that would
5764 have had to be block copied and the system-call overhead for
5765 @code{mmap()}.  I don't know exactly how this works, except that the
5766 relocating-allocation routines are pretty much used only for
5767 the memory allocated for a buffer, which is the biggest consumer
5768 of space, esp. of space that may get freed later.
5769
5770   Note that the GNU mallocs have some ``memory warning'' facilities.
5771 XEmacs taps into them and issues a warning through the standard
5772 warning system, when memory gets to 75%, 85%, and 95% full.
5773 (On some systems, the memory warnings are not functional.)
5774
5775   Allocated memory that is going to be used to make a Lisp object
5776 is created using @code{allocate_lisp_storage()}.  This just calls
5777 @code{xmalloc()}.  It used to verify that the pointer to the memory can
5778 fit into a Lisp word, before the current Lisp object representation was
5779 introduced.  @code{allocate_lisp_storage()} is called by
5780 @code{alloc_lcrecord()}, @code{ALLOCATE_FIXED_TYPE()}, and the vector
5781 and bit-vector creation routines.  These routines also call
5782 @code{INCREMENT_CONS_COUNTER()} at the appropriate times; this keeps
5783 statistics on how much memory is allocated, so that garbage-collection
5784 can be invoked when the threshold is reached.
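
The bookkeeping idea behind @code{INCREMENT_CONS_COUNTER()} can be
sketched like this.  Everything here is hypothetical (the names, the
threshold value, and the place where the check happens); it only shows
the general shape of ``count bytes, collect once past a threshold''.

```c
#include <stddef.h>

static size_t consing_since_gc;      /* bytes allocated since last GC */
static size_t gc_threshold = 1000;   /* hypothetical threshold */
static int gc_runs;                  /* how many collections have run */

/* Stand-in for a real collection: just reset the statistics. */
static void
collect_garbage (void)
{
  gc_runs++;
  consing_since_gc = 0;
}

/* Called by the allocation routines with the size just allocated. */
static void
increment_cons_counter (size_t size)
{
  consing_since_gc += size;
}

/* Called at a point where it is safe to collect. */
static void
maybe_collect (void)
{
  if (consing_since_gc > gc_threshold)
    collect_garbage ();
}
```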
5785
5786 @node Cons
5787 @section Cons
5788 @cindex cons
5789
5790   Conses are allocated in standard frob blocks.  The only thing to
5791 note is that conses can be explicitly freed using @code{free_cons()}
5792 and associated functions @code{free_list()} and @code{free_alist()}.  This
5793 immediately puts the conses onto the cons free list, and decrements
5794 the statistics on memory allocation appropriately.  This is used
5795 to good effect by some extremely commonly-used code, to avoid
5796 generating extra objects and thereby triggering GC sooner.
5797 However, you have to be @emph{extremely} careful when doing this.
5798 If you mess this up, you will get BADLY BURNED, and it has happened
5799 before.
5800
5801 @node Vector
5802 @section Vector
5803 @cindex vector
5804
5805   As mentioned above, each vector is @code{malloc()}ed individually, and
5806 all are threaded through the variable @code{all_vectors}.  Vectors are
5807 marked strangely during garbage collection, by kludging the size field.
5808 Note that the @code{struct Lisp_Vector} is declared with its
5809 @code{contents} field being a @emph{stretchy} array of one element.  It
5810 is actually @code{malloc()}ed with the right size, however, and access
5811 to any element through the @code{contents} array works fine.
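
The ``stretchy array'' idiom looks roughly like the following sketch.
@code{struct vec} and @code{make_vec()} are hypothetical and much
simpler than @code{struct Lisp_Vector}.  Note that indexing past the
declared single element is a long-standing idiom that common compilers
accept rather than something the C standard strictly blesses.

```c
#include <stddef.h>
#include <stdlib.h>

struct vec
{
  size_t size;
  long contents[1];             /* really SIZE elements */
};

/* Allocate a vec with room for N elements, all set to zero.
   Returns NULL if the allocation fails. */
static struct vec *
make_vec (size_t n)
{
  size_t bytes = offsetof (struct vec, contents) + n * sizeof (long);
  struct vec *v;
  size_t i;

  if (bytes < sizeof (struct vec))
    bytes = sizeof (struct vec);
  v = malloc (bytes);
  if (v == NULL)
    return NULL;
  v->size = n;
  for (i = 0; i < n; i++)
    v->contents[i] = 0;
  return v;
}
```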
5812
5813 @node Bit Vector
5814 @section Bit Vector
5815 @cindex bit vector
5816 @cindex vector, bit
5817
5818   Bit vectors work exactly like vectors, except for more complicated
5819 code to access an individual bit, and except for the fact that bit
5820 vectors are lrecords while vectors are not. (The only difference here is
5821 that there's an lrecord implementation pointer at the beginning and the
5822 tag field in bit vector Lisp words is ``lrecord'' rather than
5823 ``vector''.)
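
The ``more complicated code to access an individual bit'' amounts to
something like the sketch below.  The names and the choice of
@code{unsigned long} as the storage word are assumptions for
illustration, not the actual bit-vector accessors.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof (unsigned long) * CHAR_BIT)

/* Return bit I (0 or 1) of the packed array BITS. */
static int
bit_ref (const unsigned long *bits, size_t i)
{
  return (int) ((bits[i / BITS_PER_WORD] >> (i % BITS_PER_WORD)) & 1UL);
}

/* Set bit I of BITS to 1 if VALUE is nonzero, else to 0. */
static void
bit_set (unsigned long *bits, size_t i, int value)
{
  unsigned long mask = 1UL << (i % BITS_PER_WORD);

  if (value)
    bits[i / BITS_PER_WORD] |= mask;
  else
    bits[i / BITS_PER_WORD] &= ~mask;
}
```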
5824
5825 @node Symbol
5826 @section Symbol
5827 @cindex symbol
5828
5829   Symbols are also allocated in frob blocks.  Symbols in the awful
5830 horrible obarray structure are chained through their @code{next} field.
5831
5832 Remember that @code{intern} looks up a symbol in an obarray, creating
5833 one if necessary.
5834
5835 @node Marker
5836 @section Marker
5837 @cindex marker
5838
5839   Markers are allocated in frob blocks, as usual.  They are kept
5840 in a buffer unordered, but in a doubly-linked list so that they
5841 can easily be removed. (Formerly this was a singly-linked list,
5842 but in some cases garbage collection took an extraordinarily
5843 long time due to the O(N^2) time required to remove lots of
5844 markers from a buffer.) Markers are removed from a buffer in
5845 the finalize stage, in @code{ADDITIONAL_FREE_marker()}.
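
To see why the doubly-linked list matters, consider this sketch of
constant-time removal.  The structures and function names are
hypothetical and far simpler than the real marker and buffer structures.

```c
#include <stddef.h>

struct marker
{
  struct marker *prev;
  struct marker *next;
  long pos;
};

struct buffer
{
  struct marker *markers;       /* head of the unordered marker list */
};

/* Put M at the front of B's marker list. */
static void
attach_marker (struct buffer *b, struct marker *m)
{
  m->prev = NULL;
  m->next = b->markers;
  if (b->markers != NULL)
    b->markers->prev = m;
  b->markers = m;
}

/* Remove M from B's marker list without walking the list. */
static void
detach_marker (struct buffer *b, struct marker *m)
{
  if (m->prev != NULL)
    m->prev->next = m->next;
  else
    b->markers = m->next;
  if (m->next != NULL)
    m->next->prev = m->prev;
  m->prev = m->next = NULL;
}
```

With a singly-linked list, @code{detach_marker()} would have to search
for the predecessor of @code{m}, which is what made removing N markers
cost O(N^2).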
5846
5847 @node String
5848 @section String
5849 @cindex string
5850
5851   As mentioned above, strings are a special case.  A string is logically
5852 two parts, a fixed-size object (containing the length, property list,
5853 and a pointer to the actual data), and the actual data in the string.
5854 The fixed-size object is a @code{struct Lisp_String} and is allocated in
5855 frob blocks, as usual.  The actual data is stored in special
5856 @dfn{string-chars blocks}, which are 8K blocks of memory.
5857 Currently-allocated strings are simply laid end to end in these
5858 string-chars blocks, with a pointer back to the @code{struct Lisp_String}
5859 stored before each string in the string-chars block.  When a new string
5860 needs to be allocated, the remaining space at the end of the last
5861 string-chars block is used if there's enough, and a new string-chars
5862 block is created otherwise.
5863
5864   There are never any holes in the string-chars blocks due to the string
5865 compaction and relocation that happens at the end of garbage collection.
5866 During the sweep stage of garbage collection, when objects are
5867 reclaimed, the garbage collector goes through all string-chars blocks,
5868 looking for unused strings.  Each chunk of string data is preceded by a
5869 pointer to the corresponding @code{struct Lisp_String}, which indicates
5870 both whether the string is used and how big the string is, i.e. how to
5871 get to the next chunk of string data.  Holes are compressed by
5872 block-copying the next string into the empty space and relocating the
5873 pointer stored in the corresponding @code{struct Lisp_String}.
5874 @strong{This means you have to be careful with strings in your code.}
5875 See the section above on @code{GCPRO}ing.
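
The layout and compaction scheme just described can be modeled in a few
lines.  This is only a toy: @code{struct str}, @code{struct
chunk_header}, the @code{in_use} flag and both functions are
hypothetical, a single block is used, and the real sweep code handles
many details (multiple blocks, big strings, dead-string markers) that
are left out here.

```c
#include <stddef.h>
#include <string.h>

struct str
{
  size_t size;                  /* length of the data, without the NUL */
  char *data;                   /* points into the chars block */
  int in_use;                   /* hypothetical "still alive" flag */
};

/* Stored in front of each chunk of string data. */
struct chunk_header
{
  struct str *owner;
};

#define ALIGN_UP(n) \
  (((n) + sizeof (void *) - 1) / sizeof (void *) * sizeof (void *))

/* Bytes taken in the block by S: back pointer plus padded data. */
static size_t
chunk_size (const struct str *s)
{
  return sizeof (struct chunk_header) + ALIGN_UP (s->size + 1);
}

/* Lay TEXT down at the end of BLOCK and make S point to it.
   Returns 0 if there is not enough room left. */
static int
block_add (char *block, size_t cap, size_t *used,
           struct str *s, const char *text)
{
  struct chunk_header h;

  s->size = strlen (text);
  if (*used + chunk_size (s) > cap)
    return 0;
  h.owner = s;
  memcpy (block + *used, &h, sizeof h);
  s->data = block + *used + sizeof h;
  memcpy (s->data, text, s->size + 1);
  s->in_use = 1;
  *used += chunk_size (s);
  return 1;
}

/* Slide live chunks down over dead ones and fix up each owner's
   data pointer.  Returns the new number of bytes used. */
static size_t
block_compact (char *block, size_t used)
{
  size_t from = 0, to = 0;

  while (from < used)
    {
      struct chunk_header h;
      size_t n;

      memcpy (&h, block + from, sizeof h);
      n = chunk_size (h.owner);
      if (h.owner->in_use)
        {
          if (to != from)
            memmove (block + to, block + from, n);
          h.owner->data = block + to + sizeof h;
          to += n;
        }
      from += n;
    }
  return to;
}
```

Any @code{char *} into the block that was obtained before
@code{block_compact()} ran may be stale afterwards, which is the reason
for the warning above.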
5876
5877   Note that there is one situation not handled: a string that is too big
5878 to fit into a string-chars block.  Such strings, called @dfn{big
5879 strings}, are all @code{malloc()}ed as their own block. (#### Although it
5880 would make more sense for the threshold for big strings to be somewhat
5881 lower, e.g. 1/2 or 1/4 the size of a string-chars block.  It seems that
5882 this was indeed the case formerly---indeed, the threshold was set at
5883 1/8---but Mly forgot about this when rewriting things for 19.8.)
5884
5885 Note also that the string data in string-chars blocks is padded as
5886 necessary so that proper alignment constraints on the @code{struct
5887 Lisp_String} back pointers are maintained.
5888
5889   Finally, strings can be resized.  This happens in Mule when a
5890 character is substituted with a different-length character, or during
5891 modeline frobbing. (You could also export this to Lisp, but it's not
5892 done so currently.) Resizing a string is a potentially tricky process.
5893 If the change is small enough that the padding can absorb it, nothing
5894 other than a simple memory move needs to be done.  Keep in mind,
5895 however, that the string can't shrink too much because the offset to the
5896 next string in the string-chars block is computed by looking at the
5897 length and rounding to the nearest multiple of four or eight.  If the
5898 string would shrink or expand beyond the correct padding, new string
5899 data needs to be allocated at the end of the last string-chars block and
5900 the data moved appropriately.  This leaves some dead string data, which
5901 is marked by putting a special marker of 0xFFFFFFFF in the @code{struct
5902 Lisp_String} pointer before the data (there's no real @code{struct
5903 Lisp_String} to point to and relocate), and storing the size of the dead
5904 string data (which would normally be obtained from the now-non-existent
5905 @code{struct Lisp_String}) at the beginning of the dead string data gap.
5906 The string compactor recognizes this special 0xFFFFFFFF marker and
5907 handles it correctly.
5908
5909 @node Compiled Function
5910 @section Compiled Function
5911 @cindex compiled function
5912 @cindex function, compiled
5913
5914   Not yet documented.
5915
5916
5917 @node Dumping, Events and the Event Loop, Allocation of Objects in XEmacs Lisp, Top
5918 @chapter Dumping
5919 @cindex dumping
5920
5921 @section What is dumping and its justification
5922 @cindex dumping and its justification, what is
5923
5924 The C code of XEmacs is just a Lisp engine with a lot of built-in
5925 primitives useful for writing an editor.  The editor itself is written
5926 mostly in Lisp, and represents around 100K lines of code.  Loading and
5927 executing the initialization of all this code takes a bit of time (five
5928 to ten times the usual startup time of current xemacs) and requires
5929 having all the lisp source files around.  Having to reload them each
5930 time the editor is started would not be acceptable.
5931
5932 The traditional solution to this problem is called dumping: the build
5933 process first creates the lisp engine under the name @file{temacs}, then
5934 runs it until it has finished loading and initializing all the lisp
5935 code, and eventually creates a new executable called @file{xemacs}
5936 including both the object code in @file{temacs} and all the contents of
5937 the memory after the initialization.
5938
5939 This solution, while working, has a huge problem: the creation of the
5940 new executable from the actual contents of memory is an extremely
5941 system-specific process, quite error-prone, and which interferes with a
5942 lot of system libraries (like malloc).  It is even getting worse
5943 nowadays with libraries using constructors which are automatically
5944 called when the program is started (even before main()) which tend to
5945 crash when they are called multiple times, once before dumping and once
5946 after (IRIX 6.x libz.so pulls in some C++ image libraries thru
5947 dependencies which have this problem).  Writing the dumper is also one
5948 of the most difficult parts of porting XEmacs to a new operating system.
5949 Basically, `dumping' is an operation that is just not officially
5950 supported on many operating systems.
5951
5952 The aim of the portable dumper is to solve the same problem as the
5953 system-specific dumper, that is to be able to reload quickly, using only
5954 a small number of files, the fully initialized lisp part of the editor,
5955 without any system-specific hacks.
5956
5957 @menu
5958 * Overview::
5959 * Data descriptions::
5960 * Dumping phase::
5961 * Reloading phase::
5962 * Remaining issues::
5963 @end menu
5964
5965 @node Overview
5966 @section Overview
5967 @cindex dumping overview
5968
5969 The portable dumping system has to:
5970
5971 @enumerate
5972 @item
5973 At dump time, write all initialized, non-quickly-rebuildable data to a
5974 file [Note: currently named @file{xemacs.dmp}, but the name will
5975 change], along with all the information needed for reloading.
5976
5977 @item
5978 When starting xemacs, reload the dump file, relocate it to its new
5979 starting address if needed, and reinitialize all pointers to this
5980 data.  Also, rebuild all the quickly rebuildable data.
5981 @end enumerate
5982
5983 @node Data descriptions
5984 @section Data descriptions
5985 @cindex dumping data descriptions
5986
5987 The most complex task of the dumper is to be able to write lisp objects
5988 (lrecords) and C structs to disk and reload them at a different address,
5989 updating all the pointers they include in the process.  This is done by
5990 using external data descriptions that give information about the layout
5991 of the structures in memory.
5992
5993 The specification of these descriptions is in lrecord.h.  A description
5994 of an lrecord is an array of struct lrecord_description.  Each of these
5995 structs includes a type, an offset in the structure and some optional
5996 parameters depending on the type.  For instance, here is the string
5997 description:
5998
5999 @example
6000 static const struct lrecord_description string_description[] = @{
6001   @{ XD_BYTECOUNT,         offsetof (Lisp_String, size) @},
6002   @{ XD_OPAQUE_DATA_PTR,   offsetof (Lisp_String, data), XD_INDIRECT(0, 1) @},
6003   @{ XD_LISP_OBJECT,       offsetof (Lisp_String, plist) @},
6004   @{ XD_END @}
6005 @};
6006 @end example
6007
6008 The first line indicates a member of type Bytecount, which is used by
6009 the next, indirect directive.  The second means ``there is a pointer to
6010 some opaque data in the field @code{data}''.  The length of said data is
6011 given by the expression @code{XD_INDIRECT(0, 1)}, which means ``the value
6012 in the 0th line of the description (welcome to C) plus one''.  The third
6013 line means ``there is a Lisp_Object member @code{plist} in the Lisp_String
6014 structure''.  @code{XD_END} then ends the description.
6015
6016 This gives us all the information we need to move around what is pointed
6017 to by a structure (C or lrecord) and, by transitivity, everything that
6018 it points to.  The only missing information for dumping is the size of
6019 the structure.  For lrecords, this is part of the
6020 lrecord_implementation, so we don't need to duplicate it.  For C
6021 structures we use a struct struct_description, which includes a size
6022 field and a pointer to an associated array of lrecord_description.
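
To make the idea of a description-driven walk concrete, here is a
stripped-down sketch.  The @code{enum desc_type} values, @code{struct
field_desc}, @code{struct node} and @code{collect_pointers()} are all
hypothetical; the real @code{lrecord_description} entries carry more
types and optional parameters than shown here.

```c
#include <stddef.h>

enum desc_type { D_END, D_INT, D_POINTER };

struct field_desc
{
  enum desc_type type;
  size_t offset;                /* offset of the field in the struct */
};

struct node
{
  int value;
  void *next;
  void *other;
};

static const struct field_desc node_description[] = {
  { D_INT,     offsetof (struct node, value) },
  { D_POINTER, offsetof (struct node, next) },
  { D_POINTER, offsetof (struct node, other) },
  { D_END,     0 }
};

/* Store in OUT (at most MAX entries) every non-null pointer that
   DESC says OBJ contains.  Returns the number stored. */
static size_t
collect_pointers (void *obj, const struct field_desc *desc,
                  void **out, size_t max)
{
  size_t n = 0;

  for (; desc->type != D_END; desc++)
    if (desc->type == D_POINTER)
      {
        void *p = *(void **) ((char *) obj + desc->offset);
        if (p != NULL && n < max)
          out[n++] = p;
      }
  return n;
}
```

The walker knows nothing about @code{struct node} itself; all layout
knowledge lives in the description array, which is what lets one piece
of code handle every described type.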
6023
6024 @node Dumping phase
6025 @section Dumping phase
6026 @cindex dumping phase
6027
6028 Dumping is done by calling the function pdump() (in dumper.c) which is
6029 invoked from Fdump_emacs (in emacs.c).  This function performs a number
6030 of tasks.
6031
6032 @menu
6033 * Object inventory::
6034 * Address allocation::
6035 * The header::
6036 * Data dumping::
6037 * Pointers dumping::
6038 @end menu
6039
6040 @node Object inventory
6041 @subsection Object inventory
6042 @cindex dumping object inventory
6043
6044 The first task is to build the list of the objects to dump.  This
6045 includes:
6046
6047 @itemize @bullet
6048 @item lisp objects
6049 @item C structures
6050 @end itemize
6051
6052 We end up with one @code{pdump_entry_list_elmt} per object group (arrays
6053 of C structs are kept together) which includes a pointer to the first
6054 object of the group, the per-object size and the count of objects in the
6055 group, along with some other information which is initialized later.
6056
6057 These entries are linked together in @code{pdump_entry_list} structures
6058 and can be enumerated thru either:
6059
6060 @enumerate
6061 @item
6062 the @code{pdump_object_table}, an array of @code{pdump_entry_list}, one
6063 per lrecord type, indexed by type number.
6064
6065 @item
6066 the @code{pdump_opaque_data_list}, used for the opaque data which does
6067 not include pointers, and hence does not need descriptions.
6068
6069 @item
6070 the @code{pdump_struct_table}, which is a vector of
6071 @code{struct_description}/@code{pdump_entry_list} pairs, used for
6072 non-opaque C structures.
6073 @end enumerate
6074
6075 This uses a marking strategy similar to the garbage collector.  Some
6076 differences though:
6077
6078 @enumerate
6079 @item
6080 We do not use the mark bit (which does not exist for C structures
6081 anyway); we use a big hash table instead.
6082
6083 @item
6084 We do not use the mark function of lrecords but instead rely on the
6085 external descriptions.  This happens essentially because we need to
6086 follow pointers to C structures and opaque data in addition to
6087 Lisp_Object members.
6088 @end enumerate
6089
6090 This is done by @code{pdump_register_object()}, which handles Lisp_Object
6091 variables, and @code{pdump_register_struct()} which handles C structures,
6092 which both delegate the description management to @code{pdump_register_sub()}.
6093
6094 The hash table doubles as a map object to pdump_entry_list_elmt (i.e.
6095 allows us to look up a pdump_entry_list_elmt with the object it points
6096 to).  Entries are added with @code{pdump_add_entry()} and looked up with
6097 @code{pdump_get_entry()}.  There is no need for entry removal.  The hash
6098 value is computed quite simply from the object pointer by
6099 @code{pdump_make_hash()}.
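
A pointer-keyed table with insertion and lookup but no removal can be
very simple.  The sketch below is not the code in @file{dumper.c}: the
table size, the hash formula, the use of linear probing and all the
names are assumptions chosen to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 1024u   /* hypothetical; must exceed the entry count */

struct entry
{
  const void *obj;              /* key: address of the object */
  size_t size;                  /* example payload */
};

static struct entry table[TABLE_SIZE];

/* Hash an object by its address, dropping the low alignment bits. */
static size_t
make_hash (const void *obj)
{
  return (size_t) (((uintptr_t) obj >> 3) % TABLE_SIZE);
}

/* Return the entry for OBJ, or NULL if it was never added. */
static struct entry *
get_entry (const void *obj)
{
  size_t pos = make_hash (obj);

  while (table[pos].obj != NULL)
    {
      if (table[pos].obj == obj)
        return &table[pos];
      pos = (pos + 1) % TABLE_SIZE;
    }
  return NULL;
}

/* Add an entry for OBJ.  Assumes OBJ is not present and that the
   table is not full. */
static struct entry *
add_entry (const void *obj, size_t size)
{
  size_t pos = make_hash (obj);

  while (table[pos].obj != NULL)
    pos = (pos + 1) % TABLE_SIZE;
  table[pos].obj = obj;
  table[pos].size = size;
  return &table[pos];
}
```

Because entries are never removed, an empty slot reliably ends a probe
sequence, so no tombstones are needed.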
6100
6101 The roots for the marking are:
6102
6103 @enumerate
6104 @item
6105 the @code{staticpro}'ed variables (there is a special @code{staticpro_nodump()}
6106 call for protected variables we do not want to dump).
6107
6108 @item
6109 the variables registered via @code{dump_add_root_object}
6110 (@code{staticpro()} is equivalent to @code{staticpro_nodump()} +
6111 @code{dump_add_root_object()}).
6112
6113 @item
6114 the variables registered via @code{dump_add_root_struct_ptr}, each of
6115 which points to a C structure.
6116 @end enumerate
6117
6118 This does not include the GCPRO'ed variables, the specbinds, the
6119 catchtags, the backlist, the redisplay or the profiling info, since we
6120 do not want to rebuild the actual chain of lisp calls leading up to
6121 the dump-emacs call, only the global variables.
6122
6123 Weak lists and weak hash tables are dumped as if they were their
6124 non-weak equivalent (without changing their type, of course).  This has
6125 not yet been a problem.
6126
6127 @node Address allocation
6128 @subsection Address allocation
6129 @cindex dumping address allocation
6130
6131
6132 The next step is to allocate the offsets of each of the objects in the
6133 final dump file.  This is done by @code{pdump_allocate_offset()} which
6134 is called indirectly by @code{pdump_scan_by_alignment()}.
6135
6136 The strategy to deal with alignment problems uses these facts:
6137
6138 @enumerate
6139 @item
6140 real world alignment requirements are powers of two.
6141
6142 @item
6143 the C compiler is required to adjust the size of a struct so that you
6144 can have an array of them next to each other.  This means you can have an
6145 upper bound on the alignment requirements of a given structure by
6146 looking at which power of two its size is a multiple of.
6147
6148 @item
6149 the non-variant part of variable size lrecords has an alignment
6150 requirement of 4.
6151 @end enumerate
6152
6153 Hence, for each lrecord type, C struct type or opaque data block the
6154 alignment requirement is computed as a power of two, with a minimum of
6155 2^2 for lrecords.  @code{pdump_scan_by_alignment()} then scans all the
6156 @code{pdump_entry_list_elmt}'s, the ones with the highest requirements
6157 first.  This ensures the best packing.
6158
6159 The maximum alignment requirement we take into account is 2^8.
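
The alignment estimate can be sketched as follows: find the largest
power of two that divides the size, capped at 2^8, and apply the
minimum of 2^2 for lrecords.  The function names are hypothetical.

```c
#include <stddef.h>

#define MAX_ALIGN_LOG2 8        /* 2^8 is the largest alignment used */

/* Return the largest K <= MAX_ALIGN_LOG2 such that 2^K divides SIZE. */
static int
align_log2 (size_t size)
{
  int k = 0;

  while (k < MAX_ALIGN_LOG2 && (size & ((size_t) 1 << k)) == 0)
    k++;
  return k;
}

/* Same, with the minimum of 2^2 that applies to lrecords. */
static int
lrecord_align_log2 (size_t size)
{
  int k = align_log2 (size);
  return k < 2 ? 2 : k;
}
```

A 24-byte structure, for instance, gets an alignment of 8 (24 is a
multiple of 8 but not of 16), which may be more than it strictly needs
but is never less.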
6160
6161 @code{pdump_allocate_offset()} only has to do a linear allocation,
6162 starting at offset 256 (this leaves room for the header and keeps the
6163 alignments happy).
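
Here is a sketch of why a plain linear allocation is enough when groups
are visited from the highest alignment down.  @code{struct group} and
@code{allocate_offsets()} are hypothetical, and the sketch assumes that
each group's alignment was derived from its per-object size as described
above, so that every size is a multiple of its own alignment.

```c
#include <stddef.h>

#define DATA_START 256u         /* first offset after the header */
#define MAX_ALIGN_LOG2 8

struct group
{
  size_t size;                  /* per-object size */
  size_t count;                 /* number of objects in the group */
  int align_log2;               /* alignment, as a power of two */
  size_t offset;                /* filled in by allocate_offsets() */
};

/* Assign file offsets to the N groups in G, highest alignment first.
   Returns the offset just past the last group. */
static size_t
allocate_offsets (struct group *g, size_t n)
{
  size_t cur = DATA_START;
  size_t i;
  int a;

  for (a = MAX_ALIGN_LOG2; a >= 0; a--)
    for (i = 0; i < n; i++)
      if (g[i].align_log2 == a)
        {
          g[i].offset = cur;
          cur += g[i].size * g[i].count;
        }
  return cur;
}
```

After all groups of alignment 2^a have been placed, the running offset
is still a multiple of 2^a, and hence of every smaller power of two, so
no padding is ever needed between groups.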
6164
6165 @node The header
6166 @subsection The header
6167 @cindex dumping, the header
6168
6169 The next step creates the file and writes a header with a signature and
6170 some random information in it.  The @code{reloc_address} field, which
6171 indicates at which address the file should be loaded if we want to avoid
6172 post-reload relocation, is set to 0.  It then seeks to offset 256 (base
6173 offset for the objects).
6174
6175 @node Data dumping
6176 @subsection Data dumping
6177 @cindex data dumping
6178 @cindex dumping, data
6179
6180 The data is dumped by @code{pdump_dump_data()}, called from
6181 @code{pdump_scan_by_alignment()}, in the same order as the addresses were
6182 allocated.  This function copies the data to a temporary buffer, relocates
6183 all pointers in the object to the addresses allocated in the address
6184 allocation step, and writes it to the file.  Using the same order means
6185 that, if we are careful with lrecords whose size is not a multiple of 4,
6186 we are assured that the object is always written at the offset in the
6187 file allocated in the address allocation step.
6188
6189 @node Pointers dumping
6190 @subsection Pointers dumping
6191 @cindex pointers dumping
6192 @cindex dumping, pointers
6193
6194 A bunch of tables needed to properly reassign the global pointers are
6195 then written.  They are:
6196
6197 @enumerate
6198 @item
6199 the pdump_root_struct_ptrs dynarr
6200 @item
6201 the pdump_opaques dynarr
6202 @item
6203 a vector of all the offsets to the objects in the file that include a
6204 description (for faster relocation at reload time)
6205 @item
6206 the pdump_root_objects and pdump_weak_object_chains dynarrs.
6207 @end enumerate
6208
6209 For each of the dynarrs we write both the pointer to the variables and
6210 the relocated offset of the object they point to.  Since these variables
6211 are global, the pointers are still valid when restarting the program and
6212 are used to regenerate the global pointers.
6213
6214 The @code{pdump_weak_object_chains} dynarr is a special case.  The
6215 variables it points to are the head of weak linked lists of lisp objects
6216 of the same type.  Not all objects of this list are dumped so the
6217 relocated pointer we associate with them points to the first dumped
6218 object of the list, or Qnil if none is available.  This is also the
6219 reason why they are not used as roots for the purpose of object
6220 enumeration.
6221
6222 Some very important information like the @code{staticpros} and
6223 @code{lrecord_implementations_table} are handled indirectly using
6224 @code{dump_add_opaque} or @code{dump_add_root_struct_ptr}.
6225
6226 This is the end of the dumping part.
6227
6228 @node Reloading phase
6229 @section Reloading phase
6230 @cindex reloading phase
6231 @cindex dumping, reloading phase
6232
6233 @subsection File loading
6234 @cindex dumping, file loading
6235
6236 The file is mmap'ed in memory (which ensures a PAGESIZE alignment, at
6237 least 4096), or if mmap is unavailable or fails, a 256-byte-aligned
6238 malloc is done and the file is loaded.
6239
6240 Some variables are reinitialized from the values found in the header.
6241
6242 The difference between the actual loading address and the reloc_address
6243 is computed and will be used for all the relocations.
6244
6245
6246 @subsection Putting back the pdump_opaques
6247 @cindex dumping, putting back the pdump_opaques
6248
6249 The memory contents are restored in the obvious and trivial way.
6250
6251
6252 @subsection Putting back the pdump_root_struct_ptrs
6253 @cindex dumping, putting back the pdump_root_struct_ptrs
6254
6255 The variables pointed to by pdump_root_struct_ptrs in the dump phase are
6256 reset to the right relocated object addresses.
6257
6258
6259 @subsection Object relocation
6260 @cindex dumping, object relocation
6261
6262 All the objects are relocated using their description and their offset
6263 by @code{pdump_reloc_one}.  This step is unnecessary if the
6264 reloc_address is equal to the file loading address.
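
The arithmetic of relocation is just the addition of a fixed delta to
every stored pointer.  The sketch below works on a flat array of pointer
slots rather than on described objects, uses integer arithmetic to stay
clear of out-of-range pointer computations, and treats a stored value of
0 as a null pointer that must stay null; all of this, including the
names, is an assumption made for illustration and not the behavior of
@code{pdump_reloc_one}.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert one stored pointer from the address the file was written
   for (RELOC_ADDRESS) to the address it was loaded at. */
static uintptr_t
reloc_one (uintptr_t stored, uintptr_t reloc_address,
           uintptr_t load_address)
{
  if (stored == 0)
    return 0;
  return stored - reloc_address + load_address;
}

/* Relocate all N slots; a no-op when the file was loaded at the
   address it was written for. */
static void
reloc_all (uintptr_t *slots, size_t n, uintptr_t reloc_address,
           uintptr_t load_address)
{
  size_t i;

  if (reloc_address == load_address)
    return;
  for (i = 0; i < n; i++)
    slots[i] = reloc_one (slots[i], reloc_address, load_address);
}
```

With a @code{reloc_address} of 0, as written by the dumper, the stored
values are simply offsets into the file.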
6265
6266
6267 @subsection Putting back the pdump_root_objects and pdump_weak_object_chains
6268 @cindex dumping, putting back the pdump_root_objects and pdump_weak_object_chains
6269
6270 Same as Putting back the pdump_root_struct_ptrs.
6271
6272
6273 @subsection Reorganize the hash tables
6274 @cindex dumping, reorganize the hash tables
6275
6276 Since some of the hash values in the lisp hash tables are
6277 address-dependent, their layout is now wrong.  So we go through each of
6278 them and have them re-sorted by calling @code{pdump_reorganize_hash_table}.
6279
6280 @node Remaining issues
6281 @section Remaining issues
6282 @cindex dumping, remaining issues
6283
6284 The build process will have to start a post-dump xemacs, ask it the
6285 loading address (which will, hopefully, be always the same between
6286 different xemacs invocations) and relocate the file to the new address.
6287 This way the object relocation phase will not have to be done, which
6288 means no writes in the objects and that, because of the use of mmap, the
6289 dumped data will be shared between all the xemacs running on the
6290 computer.
6291
6292 Some executable signature will be necessary to ensure that a given dump
6293 file is really associated with a given executable, or random crashes
6294 will occur.  Maybe a random number set at compile or configure time thru
6295 a define.  This will also allow for having differently-compiled xemacsen
6296 on the same system (mule and no-mule come to mind).
6297
6298 The DOC file contents should probably end up in the dump file.
6299
6300
6301 @node Events and the Event Loop, Evaluation; Stack Frames; Bindings, Dumping, Top
6302 @chapter Events and the Event Loop
6303 @cindex events and the event loop
6304 @cindex event loop, events and the
6305
6306 @menu
6307 * Introduction to Events::
6308 * Main Loop::
6309 * Specifics of the Event Gathering Mechanism::
6310 * Specifics About the Emacs Event::
6311 * The Event Stream Callback Routines::
6312 * Other Event Loop Functions::
6313 * Converting Events::
6314 * Dispatching Events; The Command Builder::
6315 @end menu
6316
6317 @node Introduction to Events
6318 @section Introduction to Events
6319 @cindex events, introduction to
6320
6321   An event is an object that encapsulates information about an
6322 interesting occurrence in the operating system.  Events are
6323 generated either by user action, direct (e.g. typing on the
6324 keyboard or moving the mouse) or indirect (moving another
6325 window, thereby generating an expose event on an Emacs frame),
6326 or as a result of some other typically asynchronous action happening,
6327 such as output from a subprocess being ready or a timer expiring.
6328 Events come into the system in an asynchronous fashion (typically
6329 through a callback being called) and are converted into a
6330 synchronous event queue (first-in, first-out) in a process that
6331 we will call @dfn{collection}.
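
A first-in, first-out event queue of the kind collection feeds into can
be sketched as a small ring buffer.  The structures, the fixed capacity
and the function names here are hypothetical and unrelated to the actual
XEmacs event structures.

```c
#include <stddef.h>

#define QUEUE_CAPACITY 64u

struct event
{
  int type;
  int data;
};

struct event_queue
{
  struct event items[QUEUE_CAPACITY];
  size_t head;                  /* index of the oldest event */
  size_t count;                 /* number of queued events */
};

/* Append E to Q.  Returns 0 if the queue is full. */
static int
enqueue_event (struct event_queue *q, struct event e)
{
  if (q->count == QUEUE_CAPACITY)
    return 0;
  q->items[(q->head + q->count) % QUEUE_CAPACITY] = e;
  q->count++;
  return 1;
}

/* Remove the oldest event into *OUT.  Returns 0 if Q is empty. */
static int
dequeue_event (struct event_queue *q, struct event *out)
{
  if (q->count == 0)
    return 0;
  *out = q->items[q->head];
  q->head = (q->head + 1) % QUEUE_CAPACITY;
  q->count--;
  return 1;
}
```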
6332
6333   Note that each application has its own event queue. (It is
6334 immaterial whether the collection process directly puts the
6335 events in the proper application's queue, or puts them into
6336 a single system queue, which is later split up.)
6337
6338   The most basic level of event collection is done by the
6339 operating system or window system.  Typically, XEmacs does
6340 its own event collection as well.  Often there are multiple
6341 layers of collection in XEmacs, with events from various
6342 sources being collected into a queue, which is then combined
6343 with other sources to go into another queue (i.e. a second
6344 level of collection), with perhaps another level on top of
6345 this, etc.
6346
6347   XEmacs has its own types of events (called @dfn{Emacs events}),
6348 which provide an abstract layer on top of the system-dependent
6349 nature of the most basic events that are received.  Part of the
6350 complex nature of the XEmacs event collection process involves
6351 converting from the operating-system events into the proper
6352 Emacs events---there may not be a one-to-one correspondence.
6353
6354   Emacs events are documented in @file{events.h}; I'll discuss them
6355 later.
6356
6357 @node Main Loop
6358 @section Main Loop
6359 @cindex main loop
6360 @cindex events, main loop
6361
6362   The @dfn{command loop} is the top-level loop that the editor is always
6363 running.  It loops endlessly, calling @code{next-event} to retrieve an
6364 event and @code{dispatch-event} to execute it. @code{dispatch-event} does
6365 the appropriate thing with non-user events (process, timeout,
6366 magic, eval, mouse motion); this involves calling a Lisp handler
6367 function, redrawing a newly-exposed part of a frame, reading
6368 subprocess output, etc.  For user events, @code{dispatch-event}
6369 looks up the event in relevant keymaps or menubars; when a
6370 full key sequence or menubar selection is reached, the appropriate
6371 function is executed. @code{dispatch-event} may have to keep state
6372 across calls; this is done in the ``command-builder'' structure
6373 associated with each console (remember, there's usually only
6374 one console), and the engine that looks up keystrokes and
6375 constructs full key sequences is called the @dfn{command builder}.
6376 This is documented elsewhere.
6377
  The guts of the command loop are in @code{command_loop_1()}.  This
function doesn't catch errors, though; that's the job of
@code{command_loop_2()}, which is a condition-case (i.e. error-trapping)
wrapper around @code{command_loop_1()}.  @code{command_loop_1()} never
returns normally, but control may be thrown out of it.
6383
6384   When an error occurs, @code{cmd_error()} is called, which usually
6385 invokes the Lisp error handler in @code{command-error}; however, a
6386 default error handler is provided if @code{command-error} is @code{nil}
6387 (e.g. during startup).  The purpose of the error handler is simply to
6388 display the error message and do associated cleanup; it does not need to
6389 throw anywhere.  When the error handler finishes, the condition-case in
6390 @code{command_loop_2()} will finish and @code{command_loop_2()} will
6391 reinvoke @code{command_loop_1()}.
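
  The division of labor between the two functions can be sketched in
plain C, with @code{setjmp()} and @code{longjmp()} standing in for the
real condition-case machinery.  This is only an illustration: every
@code{sketch_} name, the array of integer ``events'', and the convention
that a negative event signals an error and a zero ends the input are
assumptions made for the example, not taken from the XEmacs sources.

@example
#include <setjmp.h>
#include <stdio.h>

static jmp_buf sketch_error_jmp;   /* stands in for the condition-case */
static const int *sketch_events;   /* hypothetical input, 0-terminated */
static int sketch_pos;             /* index of the next event to read */
static int sketch_last_error;      /* set just before an error is thrown */
static int sketch_errors_seen;     /* counted by the error handler */
static int sketch_events_run;      /* counted by the inner loop */

/* Error handler: report the error and return; no throw is needed.  */
static void
sketch_cmd_error (int code)
@{
  sketch_errors_seen++;
  printf ("error %d\n", code);
@}

/* Inner loop: fetch and "dispatch" events without trapping errors.
   The real loop never returns; this one stops at the 0 terminator
   so that the example ends.  */
static void
sketch_command_loop_1 (void)
@{
  for (;;)
    @{
      int ev = sketch_events[sketch_pos++];
      if (ev == 0)
        return;
      if (ev < 0)
        @{
          sketch_last_error = -ev;
          longjmp (sketch_error_jmp, 1);   /* thrown out of the loop */
        @}
      sketch_events_run++;
    @}
@}

/* Outer wrapper: trap the error, run the handler, then reinvoke the
   inner loop.  */
static void
sketch_command_loop_2 (const int *events)
@{
  sketch_events = events;
  sketch_pos = 0;
  for (;;)
    @{
      if (setjmp (sketch_error_jmp) != 0)
        @{
          sketch_cmd_error (sketch_last_error);
          continue;
        @}
      sketch_command_loop_1 ();
      return;
    @}
@}
@end example

  The state shared across the jump is kept in static variables rather
than in locals of @code{sketch_command_loop_2()}, because C leaves
non-volatile locals that were modified after @code{setjmp()}
indeterminate once @code{longjmp()} has been taken.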
6392
6393   @code{command_loop_2()} is invoked from three places: from
6394 @code{initial_command_loop()} (called from @code{main()} at the end of
6395 internal initialization), from the Lisp function @code{recursive-edit},
6396 and from @code{call_command_loop()}.
6397
  @code{call_command_loop()} is called when a macro is started and when
the minibuffer is entered; normal termination of the macro or minibuffer
causes a throw out of the recursive command loop (to
@code{execute-kbd-macro} for macros and to @code{exit} for minibuffers).
Note also that the low-level minibuffer-entering function,
@code{read-minibuffer-internal}, provides its own error handling and
does not need @code{command_loop_2()}'s error encapsulation, so it tells
@code{call_command_loop()} to invoke @code{command_loop_1()} directly.
6406
  Note that both @code{read-minibuffer-internal} and
@code{recursive-edit} set up a catch for @code{exit}; this is why
@code{abort-recursive-edit}, which throws to this catch, exits out of
either one.
6410
6411   @code{initial_command_loop()}, called from @code{main()}, sets up a
6412 catch for @code{top-level} when invoking @code{command_loop_2()},
6413 allowing functions to throw all the way to the top level if they really
6414 need to.  Before invoking @code{command_loop_2()},
6415 @code{initial_command_loop()} calls @code{top_level_1()}, which handles
6416 all of the startup stuff (creating the initial frame, handling the
6417 command-line options, loading the user's @file{.emacs} file, etc.).  The
6418 function that actually does this is in Lisp and is pointed to by the
6419 variable @code{top-level}; normally this function is
6420 @code{normal-top-level}.  @code{top_level_1()} is just an error-handling
6421 wrapper similar to @code{command_loop_2()}.  Note also that
6422 @code{initial_command_loop()} sets up a catch for @code{top-level} when
6423 invoking @code{top_level_1()}, just like when it invokes
6424 @code{command_loop_2()}.
6425
6426 @node Specifics of the Event Gathering Mechanism
6427 @section Specifics of the Event Gathering Mechanism
6428 @cindex event gathering mechanism, specifics of the
6429
  Here is an approximate diagram of the collection processes
at work in XEmacs under TTYs (TTYs are simpler than X,
so we'll look at them first):
6433
6434 @noindent
6435 @example
6436  asynch.      asynch.    asynch.   asynch.             [Collectors in
6437 kbd events  kbd events   process   process                the OS]
6438       |         |         output    output
6439       |         |           |         |
6440       |         |           |         |      SIGINT,   [signal handlers
6441       |         |           |         |      SIGQUIT,     in XEmacs]
6442       V         V           V         V      SIGWINCH,
6443      file      file        file      file    SIGALRM
6444      desc.     desc.       desc.     desc.     |
6445      (TTY)     (TTY)       (pipe)    (pipe)    |
6446       |          |          |         |      fake    timeouts
6447       |          |          |         |      file        |
6448       |          |          |         |      desc.       |
6449       |          |          |         |      (pipe)      |
6450       |          |          |         |        |         |
6451       |          |          |         |        |         |
6452       |          |          |         |        |         |
6453       V          V          V         V        V         V
6454       ------>-----------<----------------<----------------
6455                   |
6456                   |
6457                   | [collected using select() in emacs_tty_next_event()
6458                   |  and converted to the appropriate Emacs event]
6459                   |
6460                   |
6461                   V          (above this line is TTY-specific)
6462                 Emacs -----------------------------------------------
6463                 event (below this line is the generic event mechanism)
6464                   |
6465                   |
6466 was there     if not, call
6467 a SIGINT?  emacs_tty_next_event()
6468     |             |
6469     |             |
6470     |             |
6471     V             V
6472     --->------<----
6473            |
6474            |     [collected in event_stream_next_event();
6475            |      SIGINT is converted using maybe_read_quit_event()]
6476            V
6477          Emacs
6478          event
6479            |
6480            \---->------>----- maybe_kbd_translate() ---->---\
6481                                                             |
6482                                                             |
6483                                                             |
6484      command event queue                                    |
6485                                                if not from command
6486   (contains events that were                   event queue, call
6487   read earlier but not processed,              event_stream_next_event()
6488   typically when waiting in a                               |
6489   sit-for, sleep-for, etc. for                              |
6490  a particular event to be received)                         |
6491                |                                            |
6492                |                                            |
6493                V                                            V
6494                ---->------------------------------------<----
6495                                                |
6496                                                | [collected in
6497                                                |  next_event_internal()]
6498                                                |
6499  unread-     unread-       event from          |
6500  command-    command-       keyboard       else, call
6501  events      event           macro      next_event_internal()
6502    |           |               |               |
6503    |           |               |               |
6504    |           |               |               |
6505    V           V               V               V
6506    --------->----------------------<------------
6507                      |
6508                      |      [collected in `next-event', which may loop
6509                      |       more than once if the event it gets is on
6510                      |       a dead frame, device, etc.]
6511                      |
6512                      |
6513                      V
6514             feed into top-level event loop,
6515             which repeatedly calls `next-event'
6516             and then dispatches the event
6517             using `dispatch-event'
6518 @end example
6519
Notice the separation between the TTY-specific and the generic event
mechanisms.  When using the Xt-based event loop, the TTY-specific part
is replaced but the rest stays the same.
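
The generic half of the diagram is essentially a fixed order of
precedence among event sources.  The following C fragment is a minimal
sketch of that ordering only.  All @code{sketch_} names are hypothetical,
events are plain integers, @code{SKETCH_NO_EVENT} means ``nothing
available'' (the real lowest layer would block instead), the value used
for the quit event is an assumption, and both
@code{maybe_kbd_translate()} and the separate
@code{unread-command-event} source are left out for brevity.

@example
#include <stddef.h>

#define SKETCH_NO_EVENT (-1)
#define SKETCH_QUIT_EVENT 7   /* assumed code for the quit character */

/* A toy source of integer "events", read front to back.  */
struct sketch_queue
@{
  const int *items;
  size_t count;
  size_t next;
@};

struct sketch_sources
@{
  struct sketch_queue unread_command_events;
  struct sketch_queue keyboard_macro;
  struct sketch_queue command_event_queue;
  int sigint_pending;
  struct sketch_queue device;   /* stands in for emacs_tty_next_event() */
@};

static int
sketch_pop (struct sketch_queue *q)
@{
  return q->next < q->count ? q->items[q->next++] : SKETCH_NO_EVENT;
@}

/* Lowest generic layer: a pending SIGINT wins over the device.  */
static int
sketch_event_stream_next_event (struct sketch_sources *s)
@{
  if (s->sigint_pending)
    @{
      s->sigint_pending = 0;
      return SKETCH_QUIT_EVENT;
    @}
  return sketch_pop (&s->device);
@}

/* Middle layer: previously read but unprocessed events come first.  */
static int
sketch_next_event_internal (struct sketch_sources *s)
@{
  int ev = sketch_pop (&s->command_event_queue);
  return ev != SKETCH_NO_EVENT ? ev : sketch_event_stream_next_event (s);
@}

/* Top layer: unread events, then a running keyboard macro, then the
   layers below.  */
static int
sketch_next_event (struct sketch_sources *s)
@{
  int ev = sketch_pop (&s->unread_command_events);
  if (ev == SKETCH_NO_EVENT)
    ev = sketch_pop (&s->keyboard_macro);
  if (ev == SKETCH_NO_EVENT)
    ev = sketch_next_event_internal (s);
  return ev;
@}
@end example

Only @code{sketch_pop (&s->device)} would change if a different
system-specific event loop were substituted, which mirrors the
separation shown in the diagram.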
6523
It's also important to realize that only one kind of system-specific
event loop can be operating at a time, and that it must be able to
receive all kinds of events simultaneously.  Of the two existing
event loops (implemented in @file{event-tty.c} and @file{event-Xt.c},
respectively), the TTY event loop @emph{only} handles TTY consoles,
while the Xt event loop handles @emph{both} TTY and X consoles.  This
situation is different from that of the output handlers, where you
simply have one per console type.
6532
6533   Here's the Xt Event Loop Diagram (notice that below a certain point,
6534 it's the same as the above diagram):
6535
6536 @example
6537 asynch. asynch. asynch. asynch.                 [Collectors in
6538  kbd     kbd    process process                    the OS]
6539 events  events  output  output
6540   |       |       |       |
6541   |       |       |       |     asynch. asynch. [Collectors in the
6542   |       |       |       |       X        X     OS and X Window System]
6543   |       |       |       |     events  events
6544   |       |       |       |       |        |
6545   |       |       |       |       |        |
6546   |       |       |       |       |        |    SIGINT, [signal handlers
6547   |       |       |       |       |        |    SIGQUIT,   in XEmacs]
6548   |       |       |       |       |        |    SIGWINCH,
6549   |       |       |       |       |        |    SIGALRM
6550   |       |       |       |       |        |       |
6551   |       |       |       |       |        |       |
6552   |       |       |       |       |        |       |      timeouts
6553   |       |       |       |       |        |       |          |
6554   |       |       |       |       |        |       |          |
6555   |       |       |       |       |        |       V          |
6556   V       V       V       V       V        V      fake        |
6557  file    file    file    file    file     file    file        |
6558  desc.   desc.   desc.   desc.   desc.    desc.   desc.       |
6559  (TTY)   (TTY)   (pipe)  (pipe) (socket) (socket) (pipe)      |
6560   |       |       |       |       |        |       |          |
6561   |       |       |       |       |        |       |          |
6562   |       |       |       |       |        |       |          |
6563   V       V       V       V       V        V       V          V
6564   --->----------------------------------------<---------<------
6565        |              |               |
6566        |              |               |[collected using select() in
6567        |              |               | _XtWaitForSomething(), called
6568        |              |               | from XtAppProcessEvent(), called
6569        |              |               | in emacs_Xt_next_event();
6570        |              |               | dispatched to various callbacks]
6571        |              |               |
6572        |              |               |
6573   emacs_Xt_        p_s_callback(),    | [popup_selection_callback]
6574   event_handler()  x_u_v_s_callback(),| [x_update_vertical_scrollbar_
6575        |           x_u_h_s_callback(),|  callback]
6576        |           search_callback()  | [x_update_horizontal_scrollbar_
6577        |              |               |  callback]
6578        |              |               |
6579        |              |               |
6580   enqueue_Xt_       signal_special_   |
6581   dispatch_event()  Xt_user_event()   |
6582   [maybe multiple     |               |
6583    times, maybe 0     |               |
6584    times]             |               |
6585        |            enqueue_Xt_       |
6586        |            dispatch_event()  |
6587        |              |               |
6588        |              |               |
6589        V              V               |
6590        -->----------<--               |
6591               |                       |
6592               |                       |
6593            dispatch             Xt_what_callback()
6594            event                  sets flags
6595            queue                      |
6596               |                       |
6597               |                       |
6598               |                       |
6599               |                       |
6600               ---->-----------<--------
6601                    |
6602                    |
6603                    |     [collected and converted as appropriate in
6604                    |            emacs_Xt_next_event()]
6605                    |
6606                    |
6607                    V          (above this line is Xt-specific)
6608                  Emacs ------------------------------------------------
6609                  event (below this line is the generic event mechanism)
6610                    |
6611                    |
6612 was there      if not, call
6613 a SIGINT?   emacs_Xt_next_event()
6614     |              |
6615     |              |
6616     |              |
6617     V              V
6618     --->-------<----
6619            |
6620            |        [collected in event_stream_next_event();
6621            |         SIGINT is converted using maybe_read_quit_event()]
6622            V
6623          Emacs
6624          event
6625            |
6626            \---->------>----- maybe_kbd_translate() -->-----\
6627                                                             |
6628                                                             |
6629                                                             |
6630      command event queue                                    |
6631                                               if not from command
6632   (contains events that were                  event queue, call
6633   read earlier but not processed,             event_stream_next_event()
6634   typically when waiting in a                               |
6635   sit-for, sleep-for, etc. for                              |
6636  a particular event to be received)                         |
6637                |                                            |
6638                |                                            |
6639                V                                            V
6640                ---->----------------------------------<------
6641                                                |
6642                                                | [collected in
6643                                                |  next_event_internal()]
6644                                                |
6645  unread-     unread-       event from          |
6646  command-    command-       keyboard       else, call
6647  events      event           macro      next_event_internal()
6648    |           |               |               |
6649    |           |               |               |
6650    |           |               |               |
6651    V           V               V               V
6652    --------->----------------------<------------
6653                      |
6654                      |      [collected in `next-event', which may loop
6655                      |       more than once if the event it gets is on
6656                      |       a dead frame, device, etc.]
6657                      |
6658                      |
6659                      V
6660             feed into top-level event loop,
6661             which repeatedly calls `next-event'
6662             and then dispatches the event
6663             using `dispatch-event'
6664 @end example
6665
6666 @node Specifics About the Emacs Event
6667 @section Specifics About the Emacs Event
6668 @cindex event, specifics about the Lisp object
6669
6670 @node The Event Stream Callback Routines
6671 @section The Event Stream Callback Routines
6672 @cindex event stream callback routines, the
6673 @cindex callback routines, the event stream
6674
6675 @node Other Event Loop Functions
6676 @section Other Event Loop Functions
6677 @cindex event loop functions, other
6678
6679   @code{detect_input_pending()} and @code{input-pending-p} look for
6680 input by calling @code{event_stream->event_pending_p} and looking in
6681 @code{[V]unread-command-event} and the @code{command_event_queue} (they
6682 do not check for an executing keyboard macro, though).
6683
  @code{discard-input} cancels any pending command events (and any
keyboard macros currently executing), and puts the remaining
(non-command) events onto the @code{command_event_queue}.  There is a
comment in the source about a ``race condition'', which is not a good
sign.
6688
  @code{next-command-event} and @code{read-char} are higher-level
interfaces to @code{next-event}.  @code{next-command-event} gets the
next @dfn{command} event (i.e. keypress, mouse event, menu selection,
or scrollbar action), calling @code{dispatch-event} on any others.
@code{read-char} calls @code{next-command-event} and uses
@code{event_to_character()} to return the character equivalent.  With
the right kind of input method support, it is possible for
@code{(read-char)} to return a Kanji character.
6697
6698 @node Converting Events
6699 @section Converting Events
6700 @cindex converting events
6701 @cindex events, converting
6702
6703   @code{character_to_event()}, @code{event_to_character()},
6704 @code{event-to-character}, and @code{character-to-event} convert between
6705 characters and keypress events corresponding to the characters.  If the
6706 event was not a keypress, @code{event_to_character()} returns -1 and
6707 @code{event-to-character} returns @code{nil}.  These functions convert
6708 between character representation and the split-up event representation
6709 (keysym plus mod keys).
6710
6711 @node Dispatching Events; The Command Builder
6712 @section Dispatching Events; The Command Builder
6713 @cindex dispatching events; the command builder
6714 @cindex events; the command builder, dispatching
6715 @cindex command builder, dispatching events; the
6716
6717 Not yet documented.
6718
6719 @node Evaluation; Stack Frames; Bindings, Symbols and Variables, Events and the Event Loop, Top
6720 @chapter Evaluation; Stack Frames; Bindings
6721 @cindex evaluation; stack frames; bindings
6722 @cindex stack frames; bindings, evaluation;
6723 @cindex bindings, evaluation; stack frames;
6724
6725 @menu
6726 * Evaluation::
6727 * Dynamic Binding; The specbinding Stack; Unwind-Protects::
6728 * Simple Special Forms::
6729 * Catch and Throw::
6730 @end menu
6731
6732 @node Evaluation
6733 @section Evaluation
6734 @cindex evaluation
6735
6736   @code{Feval()} evaluates the form (a Lisp object) that is passed to
6737 it.  Note that evaluation is only non-trivial for two types of objects:
6738 symbols and conses.  A symbol is evaluated simply by calling
6739 @code{symbol-value} on it and returning the value.
6740
  Evaluating a cons means calling a function.  First, @code{eval} checks
to see if garbage collection is necessary, and calls
@code{garbage_collect_1()} if so.  It then increases the evaluation
depth by 1 (@code{lisp_eval_depth}, which is always less than
@code{max_lisp_eval_depth}) and adds an element to the linked list of
@code{struct backtrace} structures (@code{backtrace_list}).  Each such
structure contains a pointer to the function being called plus a list of
the function's arguments.  Initially these values are stored unevalled,
and as they are evaluated, the backtrace structure is updated.  Garbage
collection pays attention to the objects pointed to in the backtrace
structures: garbage collection might happen while a function is being
called or while an argument is being evaluated, and there could easily
be no other references to the arguments in the argument list.  Once an
argument is evaluated, however, the unevalled version is not needed by
eval, and so the backtrace structure is changed.
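
The bookkeeping described above (a depth counter plus a linked list of
frames that live on the C stack) can be sketched as follows.  The
structure layout, the @code{sketch_} names and the limit of 500 are
assumptions for illustration; the real @code{struct backtrace} holds
Lisp objects and more fields.

@example
#include <stddef.h>

struct sketch_backtrace
@{
  struct sketch_backtrace *next;  /* frame of the calling form */
  const char *function;           /* function being called */
  int nargs;                      /* number of arguments */
  int evalled;                    /* nonzero once args are evaluated */
@};

static struct sketch_backtrace *sketch_backtrace_list;
static int sketch_eval_depth;
static int sketch_max_eval_depth = 500;   /* assumed limit */

/* Enter a call: bump the depth and push BT, which the caller has
   allocated in its own stack frame.  Returns 0, without pushing,
   if the depth would no longer be below the maximum.  */
static int
sketch_push_backtrace (struct sketch_backtrace *bt,
                       const char *function, int nargs)
@{
  if (sketch_eval_depth + 1 >= sketch_max_eval_depth)
    return 0;
  sketch_eval_depth++;
  bt->next = sketch_backtrace_list;
  bt->function = function;
  bt->nargs = nargs;
  bt->evalled = 0;                /* arguments start out unevalled */
  sketch_backtrace_list = bt;
  return 1;
@}

/* Leave a call: pop the newest frame and reduce the depth.  */
static void
sketch_pop_backtrace (void)
@{
  sketch_backtrace_list = sketch_backtrace_list->next;
  sketch_eval_depth--;
@}
@end example

Because every frame is reachable from @code{sketch_backtrace_list}, a
garbage collector could walk this list to find the arguments of every
call in progress, which is the role the text assigns to
@code{backtrace_list}.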
6756
6757 At this point, the function to be called is determined by looking at
6758 the car of the cons (if this is a symbol, its function definition is
6759 retrieved and the process repeated).  The function should then consist
6760 of either a @code{Lisp_Subr} (built-in function written in C), a
6761 @code{Lisp_Compiled_Function} object, or a cons whose car is one of the
6762 symbols @code{autoload}, @code{macro} or @code{lambda}.
6763
If the function is a @code{Lisp_Subr}, the Lisp object points to a
@code{struct Lisp_Subr} (created by @code{DEFUN()}), which contains a
pointer to the C function, a minimum and maximum number of arguments
(or possibly the special constants @code{MANY} or @code{UNEVALLED}), a
pointer to the symbol referring to that subr, and a couple of other
things.  If the subr wants its arguments @code{UNEVALLED}, they are
passed raw as a list.  Otherwise, an array of evaluated arguments is
created and put into the backtrace structure, and is either passed whole
(@code{MANY}) or passed one element at a time, each as a separate C
argument.
6773
If the function is a @code{Lisp_Compiled_Function},
@code{funcall_compiled_function()} is called.  If the function is a
lambda expression, @code{funcall_lambda()} is called.  If the function
is a macro, [..... fill in] is done.  If the function is an autoload,
@code{do_autoload()} is called to load the definition and then eval
starts over [explain this more].
6780
6781 When @code{Feval()} exits, the evaluation depth is reduced by one, the
6782 debugger is called if appropriate, and the current backtrace structure
6783 is removed from the list.
6784
6785 Both @code{funcall_compiled_function()} and @code{funcall_lambda()} need
6786 to go through the list of formal parameters to the function and bind
6787 them to the actual arguments, checking for @code{&rest} and
6788 @code{&optional} symbols in the formal parameters and making sure the
6789 number of actual arguments is correct.
6790 @code{funcall_compiled_function()} can do this a little more
6791 efficiently, since the formal parameter list can be checked for sanity
6792 when the compiled function object is created.
6793
@code{funcall_lambda()} simply calls @code{Fprogn()} to execute the
body of the lambda expression.
6796
6797 @code{funcall_compiled_function()} calls the real byte-code interpreter
6798 @code{execute_optimized_program()} on the byte-code instructions, which
6799 are converted into an internal form for faster execution.
6800
6801 When a compiled function is executed for the first time by
6802 @code{funcall_compiled_function()}, or during the dump phase of building
6803 XEmacs, the byte-code instructions are converted from a
6804 @code{Lisp_String} (which is inefficient to access, especially in the
6805 presence of MULE) into a @code{Lisp_Opaque} object containing an array
6806 of unsigned char, which can be directly executed by the byte-code
6807 interpreter.  At this time the byte code is also analyzed for validity
6808 and transformed into a more optimized form, so that
6809 @code{execute_optimized_program()} can really fly.
6810
6811 Here are some of the optimizations performed by the internal byte-code
6812 transformer:
6813 @enumerate
6814 @item
6815 References to the @code{constants} array are checked for out-of-range
6816 indices, so that the byte interpreter doesn't have to.
6817 @item
6818 References to the @code{constants} array that will be used as a Lisp
6819 variable are checked for being correct non-constant (i.e. not @code{t},
6820 @code{nil}, or @code{keywordp}) symbols, so that the byte interpreter
6821 doesn't have to.
6822 @item
6823 The maximum number of variable bindings in the byte-code is
6824 pre-computed, so that space on the @code{specpdl} stack can be
6825 pre-reserved once for the whole function execution.
6826 @item
6827 All byte-code jumps are relative to the current program counter instead
6828 of the start of the program, thereby saving a register.
6829 @item
6830 One-byte relative jumps are converted from the byte-code form of unsigned
6831 chars offset by 127 to machine-friendly signed chars.
6832 @end enumerate
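
The last transformation in the list is simple enough to show directly.
The two helpers below are a sketch with hypothetical names, not the
code in the XEmacs byte-code transformer.  They assume that the stored
operand lies between 0 and 254, so that the resulting displacement fits
in a @code{signed char}.

@example
/* Convert a one-byte relative jump operand from its stored form (an
   unsigned char offset by 127) to a signed displacement.
   Assumes OPERAND <= 254.  */
static signed char
sketch_decode_relative_jump (unsigned char operand)
@{
  return (signed char) ((int) operand - 127);
@}

/* The inverse conversion, as needed when the original byte code has
   to be reconstructed.  Assumes DISPLACEMENT >= -127.  */
static unsigned char
sketch_encode_relative_jump (signed char displacement)
@{
  return (unsigned char) ((int) displacement + 127);
@}
@end example

With the displacement already signed, an interpreter can add it to the
program counter without subtracting the offset on every jump.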
6833
6834 Of course, this transformation of the @code{instructions} should not be
6835 visible to the user, so @code{Fcompiled_function_instructions()} needs
6836 to know how to convert the optimized opaque object back into a Lisp
6837 string that is identical to the original string from the @file{.elc}
6838 file.  (Actually, the resulting string may (rarely) contain slightly
6839 different, yet equivalent, byte code.)
6840
6841 @code{Ffuncall()} implements Lisp @code{funcall}.  @code{(funcall fun
6842 x1 x2 x3 ...)} is equivalent to @code{(eval (list fun (quote x1) (quote
6843 x2) (quote x3) ...))}.  @code{Ffuncall()} contains its own code to do
6844 the evaluation, however, and is very similar to @code{Feval()}.
6845
6846 From the performance point of view, it is worth knowing that most of the
6847 time in Lisp evaluation is spent executing @code{Lisp_Subr} and
6848 @code{Lisp_Compiled_Function} objects via @code{Ffuncall()} (not
6849 @code{Feval()}).
6850
@code{Fapply()} implements Lisp @code{apply}, which is very similar to
@code{funcall} except that the last argument must be a list, and the
result is the same as if each of the elements of that list had been
passed as a separate argument.  @code{Fapply()} does some business to
expand the last argument, then calls @code{Ffuncall()} to do the work.
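
The ``expansion'' step amounts to flattening the leading arguments and
the elements of the final list into a single argument array.  Here is a
minimal sketch of that idea, using plain integers in place of Lisp
objects and an array in place of the Lisp list.  The @code{sketch_}
names and the fixed limit of 16 arguments are assumptions for the
example.

@example
#include <stddef.h>

#define SKETCH_MAX_ARGS 16

/* A function taking its arguments as an array, in the style of a
   MANY subr.  */
typedef int (*sketch_many_fn) (size_t nargs, const int *args);

/* Stand-in for Ffuncall(): call FN on an argument array.  */
static int
sketch_funcall (sketch_many_fn fn, size_t nargs, const int *args)
@{
  return fn (nargs, args);
@}

/* Stand-in for Fapply(): LEAD holds the leading arguments and LIST
   the elements of the final list argument.  Both are copied into one
   flat array, which is handed to sketch_funcall().  Returns -1 if
   there are too many arguments.  */
static int
sketch_apply (sketch_many_fn fn, size_t nlead, const int *lead,
              size_t nlist, const int *list)
@{
  int flat[SKETCH_MAX_ARGS];
  size_t i, n = 0;

  if (nlead + nlist > SKETCH_MAX_ARGS)
    return -1;
  for (i = 0; i < nlead; i++)
    flat[n++] = lead[i];
  for (i = 0; i < nlist; i++)
    flat[n++] = list[i];
  return sketch_funcall (fn, n, flat);
@}

/* Example function: the sum of all its arguments.  */
static int
sketch_sum (size_t nargs, const int *args)
@{
  int total = 0;
  size_t i;
  for (i = 0; i < nargs; i++)
    total += args[i];
  return total;
@}
@end example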
6856
6857 @code{apply1()}, @code{call0()}, @code{call1()}, @code{call2()}, and
6858 @code{call3()} call a function, passing it the argument(s) given (the
6859 arguments are given as separate C arguments rather than being passed as
6860 an array).  @code{apply1()} uses @code{Fapply()} while the others use
6861 @code{Ffuncall()} to do the real work.
6862
6863 @node Dynamic Binding; The specbinding Stack; Unwind-Protects
6864 @section Dynamic Binding; The specbinding Stack; Unwind-Protects
6865 @cindex dynamic binding; the specbinding stack; unwind-protects
6866 @cindex binding; the specbinding stack; unwind-protects, dynamic
6867 @cindex specbinding stack; unwind-protects, dynamic binding; the
6868 @cindex unwind-protects, dynamic binding; the specbinding stack;
6869
6870 @example
6871 struct specbinding
6872 @{
6873   Lisp_Object symbol;
6874   Lisp_Object old_value;
6875   Lisp_Object (*func) (Lisp_Object); /* for unwind-protect */
6876 @};
6877 @end example
6878
6879   @code{struct specbinding} is used for local-variable bindings and
6880 unwind-protects.  @code{specpdl} holds an array of @code{struct specbinding}'s,
6881 @code{specpdl_ptr} points to the beginning of the free bindings in the
6882 array, @code{specpdl_size} specifies the total number of binding slots
6883 in the array, and @code{max_specpdl_size} specifies the maximum number
6884 of bindings the array can be expanded to hold.  @code{grow_specpdl()}
6885 increases the size of the @code{specpdl} array, multiplying its size by
6886 2 but never exceeding @code{max_specpdl_size} (except that if this
6887 number is less than 400, it is first set to 400).
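
  The growth policy just described can be sketched as follows.  This is
not the actual @code{grow_specpdl()}: the @code{sketch_} names, the
simplified element type, the initial maximum of 3000, the choice to
start an empty array at one slot, and the convention of returning 0 when
no growth is possible are all assumptions made for the example.

@example
#include <stdlib.h>

struct sketch_binding
@{
  int symbol;
  int old_value;
@};

static struct sketch_binding *sketch_specpdl;
static size_t sketch_specpdl_size;              /* slots allocated */
static size_t sketch_max_specpdl_size = 3000;   /* assumed limit */

/* Double the array, but never beyond the maximum.  Returns 1 on
   success and 0 if the array is already at the maximum or memory
   cannot be obtained.  */
static int
sketch_grow_specpdl (void)
@{
  size_t new_size;
  struct sketch_binding *grown;

  if (sketch_max_specpdl_size < 400)
    sketch_max_specpdl_size = 400;    /* the limit is raised first */
  if (sketch_specpdl_size >= sketch_max_specpdl_size)
    return 0;

  new_size = sketch_specpdl_size ? sketch_specpdl_size * 2 : 1;
  if (new_size > sketch_max_specpdl_size)
    new_size = sketch_max_specpdl_size;

  grown = realloc (sketch_specpdl, new_size * sizeof *grown);
  if (grown == NULL)
    return 0;
  sketch_specpdl = grown;
  sketch_specpdl_size = new_size;
  return 1;
@}
@end example

  Note that after a @code{realloc()} any pointer into the old array,
such as one playing the role of @code{specpdl_ptr}, would have to be
recomputed from its offset.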
6888
  @code{specbind()} binds a symbol to a value and is used for local
variables and @code{let} forms.  The symbol and its old value (which
might be @code{Qunbound}, indicating no prior value) are recorded in the
@code{specpdl} array, and @code{specpdl_ptr} is advanced by one slot.
6893
6894   @code{record_unwind_protect()} implements an @dfn{unwind-protect},
6895 which, when placed around a section of code, ensures that some specified
6896 cleanup routine will be executed even if the code exits abnormally
6897 (e.g. through a @code{throw} or quit).  @code{record_unwind_protect()}
6898 simply adds a new specbinding to the @code{specpdl} array and stores the
6899 appropriate information in it.  The cleanup routine can either be a C
6900 function, which is stored in the @code{func} field, or a @code{progn}
6901 form, which is stored in the @code{old_value} field.
6902
6903   @code{unbind_to()} removes specbindings from the @code{specpdl} array
6904 until the specified position is reached.  Each specbinding can be one of
6905 three types:
6906
6907 @enumerate
6908 @item
6909 an unwind-protect with a C cleanup function (@code{func} is not 0, and
6910 @code{old_value} holds an argument to be passed to the function);
6911 @item
6912 an unwind-protect with a Lisp form (@code{func} is 0, @code{symbol}
6913 is @code{nil}, and @code{old_value} holds the form to be executed with
6914 @code{Fprogn()}); or
6915 @item
6916 a local-variable binding (@code{func} is 0, @code{symbol} is not
6917 @code{nil}, and @code{old_value} holds the old value, which is stored as
6918 the symbol's value).
6919 @end enumerate
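
The three cases can be seen side by side in the following sketch.  It
is a toy model, not the real implementation: @code{sketch_obj} is an
integer standing in for @code{Lisp_Object}, symbols are indices into a
small value table with index 0 playing the role of @code{nil},
``evaluating'' a form merely adds it to a counter, and the stack has a
fixed size with no overflow check.  All @code{sketch_} names are
hypothetical.

@example
#include <stddef.h>

typedef int sketch_obj;            /* stand-in for Lisp_Object */
#define SKETCH_NIL 0               /* symbol 0 plays the role of nil */

struct sketch_specbinding
@{
  sketch_obj symbol;
  sketch_obj old_value;
  sketch_obj (*func) (sketch_obj); /* for unwind-protect */
@};

static sketch_obj sketch_values[8];            /* value slots */
static struct sketch_specbinding sketch_specpdl[16];
static struct sketch_specbinding *sketch_specpdl_ptr = sketch_specpdl;
static sketch_obj sketch_forms_run;     /* total of forms "evaluated" */
static sketch_obj sketch_cleanup_arg;   /* last C cleanup argument */

/* Stand-in for Fprogn(): "evaluating" a form just records it.  */
static sketch_obj
sketch_progn (sketch_obj form)
@{
  sketch_forms_run += form;
  return form;
@}

/* An example C cleanup function.  */
static sketch_obj
sketch_cleanup (sketch_obj arg)
@{
  sketch_cleanup_arg = arg;
  return arg;
@}

/* Bind SYMBOL to VALUE, remembering the old value.  */
static void
sketch_specbind (sketch_obj symbol, sketch_obj value)
@{
  sketch_specpdl_ptr->symbol = symbol;
  sketch_specpdl_ptr->old_value = sketch_values[symbol];
  sketch_specpdl_ptr->func = NULL;
  sketch_specpdl_ptr++;
  sketch_values[symbol] = value;
@}

/* Record a cleanup: a C function with its argument, or, when FUNC
   is NULL, a form to be run by sketch_progn().  */
static void
sketch_record_unwind_protect (sketch_obj (*func) (sketch_obj),
                              sketch_obj arg)
@{
  sketch_specpdl_ptr->symbol = SKETCH_NIL;
  sketch_specpdl_ptr->old_value = arg;
  sketch_specpdl_ptr->func = func;
  sketch_specpdl_ptr++;
@}

/* Pop entries until only COUNT remain, undoing each one.  */
static void
sketch_unbind_to (size_t count)
@{
  while ((size_t) (sketch_specpdl_ptr - sketch_specpdl) > count)
    @{
      struct sketch_specbinding *b = --sketch_specpdl_ptr;
      if (b->func != NULL)
        b->func (b->old_value);                   /* C cleanup */
      else if (b->symbol == SKETCH_NIL)
        sketch_progn (b->old_value);              /* Lisp form */
      else
        sketch_values[b->symbol] = b->old_value;  /* variable */
    @}
@}
@end example

Entries are undone in the reverse of the order in which they were
recorded, so cleanups run innermost first.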
6920
6921 @node Simple Special Forms
6922 @section Simple Special Forms
6923 @cindex special forms, simple
6924
6925 @code{or}, @code{and}, @code{if}, @code{cond}, @code{progn},
6926 @code{prog1}, @code{prog2}, @code{setq}, @code{quote}, @code{function},
6927 @code{let*}, @code{let}, @code{while}
6928
6929 All of these are very simple and work as expected, calling
6930 @code{Feval()} or @code{Fprogn()} as necessary and (in the case of
6931 @code{let} and @code{let*}) using @code{specbind()} to create bindings
6932 and @code{unbind_to()} to undo the bindings when finished.
6933
Note that, with the exception of @code{Fprogn()}, these functions are
typically called in real life only in interpreted code, since the byte
compiler knows how to convert calls to these functions directly into
byte code.
6938
6939 @node Catch and Throw
6940 @section Catch and Throw
6941 @cindex catch and throw
6942 @cindex throw, catch and
6943
6944 @example
6945 struct catchtag
6946 @{
6947   Lisp_Object tag;
6948   Lisp_Object val;
6949   struct catchtag *next;
6950   struct gcpro *gcpro;
6951   jmp_buf jmp;
6952   struct backtrace *backlist;
6953   int lisp_eval_depth;
6954   int pdlcount;
6955 @};
6956 @end example
6957
  @code{catch} is a Lisp special form that places a catch around a body
of code.  A catch is a means of non-local exit from the code.  When a
catch is created, a tag is specified; executing a @code{throw} to this
tag exits from the body of code protected by the catch, and the value of
the @code{catch} form is the value given in the call to @code{throw}.
If no such @code{throw} is executed, the body runs normally and
@code{catch} returns the value of its last form.
6964
6965   Information pertaining to a catch is held in a @code{struct catchtag},
6966 which is placed at the head of a linked list pointed to by
6967 @code{catchlist}.  @code{internal_catch()} is passed a C function to
6968 call (@code{Fprogn()} when Lisp @code{catch} is called) and arguments to
6969 give it, and places a catch around the function.  Each @code{struct
6970 catchtag} is held in the stack frame of the @code{internal_catch()}
6971 instance that created the catch.
6972
6973   @code{internal_catch()} is fairly straightforward.  It stores into the
6974 @code{struct catchtag} the tag name and the current values of
6975 @code{backtrace_list}, @code{lisp_eval_depth}, @code{gcprolist}, and the
6976 offset into the @code{specpdl} array, sets a jump point with @code{_setjmp()}
6977 (storing the jump point into the @code{struct catchtag}), and calls the
6978 function.  Control will return to @code{internal_catch()} either when
6979 the function exits normally or through a @code{_longjmp()} to this jump
6980 point.  In the latter case, @code{throw} will store the value to be
6981 returned into the @code{struct catchtag} before jumping.  When it's
6982 done, @code{internal_catch()} removes the @code{struct catchtag} from
6983 the catchlist and returns the proper value.
6984
6985   @code{Fthrow()} goes up through the catchlist until it finds one with
6986 a matching tag.  It then calls @code{unbind_catch()} to restore
6987 everything to what it was when the appropriate catch was set, stores the
6988 return value in the @code{struct catchtag}, and jumps (with
6989 @code{_longjmp()}) to its jump point.
6990
6991   @code{unbind_catch()} removes all catches from the catchlist until it
6992 finds the correct one.  Some of the catches might have been placed for
6993 error-trapping, and if so, the appropriate entries on the handlerlist
6994 must be removed (see ``errors'').  @code{unbind_catch()} also restores
6995 the values of @code{gcprolist}, @code{backtrace_list}, and
6996 @code{lisp_eval_depth}, and calls @code{unbind_to()} to undo any specbindings
6997 created since the catch.
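
  The interplay between the catch list, the jump point, and the stored
return value can be illustrated with a minimal sketch.  This is not the
XEmacs code: every name ending in @code{_sketch} is hypothetical, tags
and values are plain integers, the standard @code{setjmp()} and
@code{longjmp()} stand in for @code{_setjmp()} and @code{_longjmp()},
and the restoration of @code{backtrace_list}, @code{gcprolist}, and the
specbindings is left out.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct catchtag. */
struct catchtag_sketch {
  int tag;                       /* tag this catch responds to */
  volatile int val;              /* value stored by the throw */
  struct catchtag_sketch *next;  /* enclosing catch, or NULL */
  jmp_buf jmp;                   /* jump point set by the catch */
};

/* Head of the linked list of active catches (cf. catchlist). */
static struct catchtag_sketch *catchlist_sketch = NULL;

/* Find the innermost catch with a matching tag, discard the catches
   inside it, store the value, and jump.  Returns only if no catch
   matches (real code would signal an error instead). */
static void throw_sketch(int tag, int val)
{
  struct catchtag_sketch *c;

  for (c = catchlist_sketch; c != NULL; c = c->next) {
    if (c->tag == tag) {
      catchlist_sketch = c;      /* cf. unbind_catch() */
      c->val = val;
      longjmp(c->jmp, 1);
    }
  }
}

/* Call func(arg) with a catch for TAG around it.  The catchtag lives
   in this function's stack frame. */
static int catch_sketch(int tag, int (*func)(int), int arg)
{
  struct catchtag_sketch c;
  int result;

  c.tag = tag;
  c.val = 0;
  c.next = catchlist_sketch;
  catchlist_sketch = &c;

  if (setjmp(c.jmp) == 0)
    result = func(arg);          /* the function exited normally */
  else
    result = c.val;              /* control arrived via longjmp() */

  catchlist_sketch = c.next;     /* remove this catch from the list */
  return result;
}

/* Example bodies used to exercise the sketch. */
static int body_normal_sketch(int x) { return x + 1; }
static int body_throw7_sketch(int x) { throw_sketch(7, x * 2); return -1; }
static int body_nested_sketch(int x)
{
  /* An inner catch for tag 8 does not intercept a throw to tag 7. */
  return catch_sketch(8, body_throw7_sketch, x) + 100;
}
```

  In this sketch, a throw that passes through an inner catch simply
resets the list head to the matching catch, which is the list-unwinding
part of what @code{unbind_catch()} is described as doing.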
6998
6999
7000 @node Symbols and Variables, Buffers and Textual Representation, Evaluation; Stack Frames; Bindings, Top
7001 @chapter Symbols and Variables
7002 @cindex symbols and variables
7003 @cindex variables, symbols and
7004
7005 @menu
7006 * Introduction to Symbols::
7007 * Obarrays::
7008 * Symbol Values::
7009 @end menu
7010
7011 @node Introduction to Symbols
7012 @section Introduction to Symbols
7013 @cindex symbols, introduction to
7014
7015   A symbol is basically just an object with four fields: a name (a
7016 string), a value (some Lisp object), a function (some Lisp object), and
7017 a property list (usually a list of alternating keyword/value pairs).
7018 What makes symbols special is that there is usually only one symbol with
7019 a given name, and the symbol is referred to by name.  This makes a
7020 symbol a convenient way of calling up data by name, i.e. of implementing
7021 variables. (The variable's value is stored in the @dfn{value slot}.)
7022 Similarly, functions are referenced by name, and the definition of the
7023 function is stored in a symbol's @dfn{function slot}.  This means that
7024 there can be a distinct function and variable with the same name.  The
7025 property list is used as a more general mechanism of associating
7026 additional values with particular names, and once again the namespace is
7027 independent of the function and variable namespaces.
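
  As a rough illustration of the four fields, a symbol can be pictured
as the following C structure.  This is only a sketch: the type and field
names are hypothetical, Lisp objects are modelled as C strings, and the
real structure in the C sources has different types and additional
hidden fields.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a Lisp object: here just a C string. */
typedef const char *lobj_sketch;

/* The four fields of a symbol, in simplified form. */
struct symbol_sketch {
  const char *name;      /* the symbol's name */
  lobj_sketch value;     /* value slot: the variable binding */
  lobj_sketch function;  /* function slot: the function definition */
  lobj_sketch plist;     /* property list */
};

/* Build a symbol whose variable and function meanings differ, showing
   that the two namespaces are independent. */
static struct symbol_sketch make_demo_symbol_sketch(void)
{
  struct symbol_sketch s;

  s.name = "list";
  s.value = "(1 2 3)";          /* `list' used as a variable */
  s.function = "#<subr list>";  /* `list' used as a function */
  s.plist = NULL;               /* no properties yet */
  return s;
}
```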
7028
7029 @node Obarrays
7030 @section Obarrays
7031 @cindex obarrays
7032
7033   The identity of symbols with their names is accomplished through a
7034 structure called an obarray, which is just a poorly-implemented hash
7035 table mapping from strings to symbols whose name is that string. (I say
7036 ``poorly implemented'' because an obarray appears in Lisp as a vector
7037 with some hidden fields rather than as its own opaque type.  This is an
7038 Emacs Lisp artifact that should be fixed.)
7039
7040   Obarrays are implemented as a vector of some fixed size (which should
7041 be a prime for best results), where each ``bucket'' of the vector
7042 contains one or more symbols, threaded through a hidden @code{next}
7043 field in the symbol.  Lookup of a symbol in an obarray, and adding a
7044 symbol to an obarray, is accomplished through standard hash-table
7045 techniques.
7046
7047   The standard Lisp function for working with symbols and obarrays is
7048 @code{intern}.  This looks up a symbol in an obarray given its name; if
7049 it's not found, a new symbol is automatically created with the specified
7050 name, added to the obarray, and returned.  This is what happens when the
7051 Lisp reader encounters a symbol (or more precisely, encounters the name
7052 of a symbol) in some text that it is reading.  There is a standard
7053 obarray called @code{obarray} that is used for this purpose, although
7054 the Lisp programmer is free to create his own obarrays and @code{intern}
7055 symbols in them.
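
  The lookup-or-create behavior of @code{intern} can be sketched as a
small chained hash table.  All names below are hypothetical, the hash
function is an arbitrary choice, and the real obarray is a Lisp vector
rather than a C structure; the sketch only shows the bucket chaining
through a @code{next} field and the difference between @code{intern} and
@code{intern-soft}.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define OBARRAY_SIZE_SKETCH 31   /* a prime number of buckets */

struct osym_sketch {
  char *name;
  struct osym_sketch *next;      /* hidden chain through the bucket */
};

struct obarray_sketch {
  struct osym_sketch *buckets[OBARRAY_SIZE_SKETCH];
};

/* Arbitrary string hash, reduced to a bucket number. */
static unsigned hash_name_sketch(const char *name)
{
  unsigned h = 0;

  while (*name != '\0')
    h = h * 31u + (unsigned char) *name++;
  return h % OBARRAY_SIZE_SKETCH;
}

/* Like intern-soft: look the name up, but never create a symbol. */
static struct osym_sketch *
intern_soft_sketch(const struct obarray_sketch *ob, const char *name)
{
  struct osym_sketch *s;

  for (s = ob->buckets[hash_name_sketch(name)]; s != NULL; s = s->next)
    if (strcmp(s->name, name) == 0)
      return s;
  return NULL;
}

/* Like intern: return the existing symbol, or create and add one.
   Returns NULL only if memory runs out. */
static struct osym_sketch *
intern_sketch(struct obarray_sketch *ob, const char *name)
{
  struct osym_sketch *s = intern_soft_sketch(ob, name);
  unsigned h;

  if (s != NULL)
    return s;
  s = malloc(sizeof *s);
  if (s == NULL)
    return NULL;
  s->name = malloc(strlen(name) + 1);
  if (s->name == NULL) {
    free(s);
    return NULL;
  }
  strcpy(s->name, name);
  h = hash_name_sketch(name);
  s->next = ob->buckets[h];      /* push onto the front of the bucket */
  ob->buckets[h] = s;
  return s;
}
```

  Interning the same name twice in one table yields the same pointer,
while interning it in a second table yields a different symbol with the
same name.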
7056
7057   Note that, once a symbol is in an obarray, it stays there until it is
7058 explicitly removed, and the standard obarray @code{obarray}
7059 always stays around, so once you use any particular variable name, a
7060 corresponding symbol will stay around in @code{obarray} until you exit
7061 XEmacs.
7062
7063   Note that @code{obarray} itself is a variable, and as such there is a
7064 symbol in @code{obarray} whose name is @code{"obarray"} and which
7065 contains @code{obarray} as its value.
7066
7067   Note also that the call to @code{intern} described above occurs only in
7068 the Lisp reader, not when the code is executed (at which point the symbol
7069 already exists, stored as such in the definition of the function).
7070
7071   You can create your own obarray using @code{make-vector} (this is
7072 horrible but is an artifact) and intern symbols into that obarray.
7073 Doing that will result in two or more symbols with the same name.
7074 However, at most one of these symbols is in the standard @code{obarray}:
7075 You cannot have two symbols of the same name in any particular obarray.
7076 Note that you cannot add a symbol to an obarray in any fashion other
7077 than using @code{intern}: i.e. you can't take an existing symbol and put
7078 it in an existing obarray.  Nor can you change the name of an existing
7079 symbol. (Since obarrays are vectors, you can violate the consistency of
7080 things by storing directly into the vector, but let's ignore that
7081 possibility.)
7082
7083   Usually symbols are created by @code{intern}, but if you really want,
7084 you can explicitly create a symbol using @code{make-symbol}, giving it
7085 some name.  The resulting symbol is not in any obarray (i.e. it is
7086 @dfn{uninterned}), and you can't add it to any obarray.  Therefore its
7087 primary purpose is as a symbol to use in macros to avoid namespace
7088 pollution.  It can also be used as a carrier of information, but cons
7089 cells could probably be used just as well.
7090
7091   You can also use @code{intern-soft} to look up a symbol but not create
7092 a new one, and @code{unintern} to remove a symbol from an obarray.  This
7093 returns @code{t} if a symbol was removed. (Remember: You can't put the symbol back
7094 into any obarray.) Finally, @code{mapatoms} maps over all of the symbols
7095 in an obarray.
7096
7097 @node Symbol Values
7098 @section Symbol Values
7099 @cindex symbol values
7100 @cindex values, symbol
7101
7102   The value field of a symbol normally contains a Lisp object.  However,
7103 a symbol can be @dfn{unbound}, meaning that it logically has no value.
7104 This is internally indicated by storing a special Lisp object, called
7105 @dfn{the unbound marker} and stored in the global variable
7106 @code{Qunbound}.  The unbound marker is of a special Lisp object type
7107 called @dfn{symbol-value-magic}.  It is impossible for the Lisp
7108 programmer to directly create or access any object of this type.
7109
7110   @strong{You must not let any ``symbol-value-magic'' object escape to
7111 the Lisp level.}  Printing any of these objects will cause the message
7112 @samp{INTERNAL EMACS BUG} to appear as part of the print representation.
7113 (You may see this normally when you call @code{debug_print()} from the
7114 debugger on a Lisp object.) If you let one of these objects escape to
7115 the Lisp level, you will violate a number of assumptions contained in
7116 the C code and cause the unbound marker to malfunction.
7117
7118   When a symbol is created, its value field (and function field) are set
7119 to @code{Qunbound}.  The Lisp programmer can restore these conditions
7120 later using @code{makunbound} or @code{fmakunbound}, and can query to
7121 see whether the value or function fields are @dfn{bound} (i.e. have a
7122 value other than @code{Qunbound}) using @code{boundp} and
7123 @code{fboundp}.  The fields are set to a normal Lisp object using
7124 @code{set} (or @code{setq}) and @code{fset}.
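
  The use of a distinguished marker object for ``no value'' can be
sketched as follows.  The names are hypothetical and Lisp objects are
modelled as untyped pointers; the point is only that binding tests
reduce to a comparison against one unique object that ordinary code
never stores.

```c
#include <assert.h>
#include <stddef.h>

typedef const void *lisp_obj_sketch;

/* A unique address that no ordinary object shares serves as the
   unbound marker (cf. Qunbound). */
static const char unbound_marker_sketch = 0;
#define QUNBOUND_SKETCH ((lisp_obj_sketch) &unbound_marker_sketch)

struct bsym_sketch {
  lisp_obj_sketch value;
  lisp_obj_sketch function;
};

/* A new symbol starts with both fields unbound. */
static void init_symbol_sketch(struct bsym_sketch *s)
{
  s->value = QUNBOUND_SKETCH;
  s->function = QUNBOUND_SKETCH;
}

static int boundp_sketch(const struct bsym_sketch *s)
{
  return s->value != QUNBOUND_SKETCH;
}

static int fboundp_sketch(const struct bsym_sketch *s)
{
  return s->function != QUNBOUND_SKETCH;
}

static void set_sketch(struct bsym_sketch *s, lisp_obj_sketch v)
{
  assert(v != QUNBOUND_SKETCH);  /* the marker never comes from callers */
  s->value = v;
}

static void fset_sketch(struct bsym_sketch *s, lisp_obj_sketch f)
{
  assert(f != QUNBOUND_SKETCH);
  s->function = f;
}

static void makunbound_sketch(struct bsym_sketch *s)
{
  s->value = QUNBOUND_SKETCH;
}

static void fmakunbound_sketch(struct bsym_sketch *s)
{
  s->function = QUNBOUND_SKETCH;
}
```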
7125
7126   Other symbol-value-magic objects are used as special markers to
7127 indicate variables that have non-normal properties.  This includes any
7128 variables that are tied into C variables (setting the variable magically
7129 sets some global variable in the C code, and likewise for retrieving the
7130 variable's value), variables that magically tie into slots in the
7131 current buffer, variables that are buffer-local, etc.  The
7132 symbol-value-magic object is stored in the value cell in place of
7133 a normal object, and the code to retrieve a symbol's value
7134 (i.e. @code{symbol-value}) knows how to do special things with them.
7135 This means that you should not just fetch the value cell directly if you
7136 want a symbol's value.
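
  Why the value cell must not be read directly can be seen in a small
sketch of a variable tied to a C variable.  Everything here is
hypothetical: values are plain @code{int}s, and a tagged structure
stands in for the symbol-value-magic objects, whose real layout is
documented in the source files.  Only the dispatch in the accessor
functions is meant to carry over.

```c
#include <assert.h>
#include <stddef.h>

enum cell_kind_sketch {
  CELL_NORMAL,       /* an ordinary value */
  CELL_UNBOUND,      /* the unbound marker */
  CELL_FORWARD_INT   /* "magic": the value lives in a C int variable */
};

struct value_cell_sketch {
  enum cell_kind_sketch kind;
  int normal;        /* used when kind == CELL_NORMAL */
  int *forward;      /* used when kind == CELL_FORWARD_INT */
};

/* A C-level global that a Lisp variable could be tied to. */
static int c_level_setting_sketch = 10;

/* Retrieve a value.  Returns 0 and stores it in *out, or returns -1
   if the variable is unbound. */
static int symbol_value_sketch(const struct value_cell_sketch *c, int *out)
{
  switch (c->kind) {
  case CELL_NORMAL:
    *out = c->normal;
    return 0;
  case CELL_FORWARD_INT:
    *out = *c->forward;    /* fetch from the C variable */
    return 0;
  default:
    return -1;
  }
}

/* Set a value, honoring the forwarding if there is any. */
static void set_symbol_value_sketch(struct value_cell_sketch *c, int v)
{
  if (c->kind == CELL_FORWARD_INT) {
    *c->forward = v;       /* setting the variable sets the C global */
  } else {
    c->kind = CELL_NORMAL;
    c->normal = v;
  }
}
```

  Reading the @code{normal} field of a forwarding cell directly would
return a stale value, which is the mistake the accessor functions guard
against.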
7137
7138   The exact workings of this are rather complex and involved and are
7139 well-documented in comments in @file{buffer.c}, @file{symbols.c}, and
7140 @file{lisp.h}.
7141
7142 @node Buffers and Textual Representation, MULE Character Sets and Encodings, Symbols and Variables, Top
7143 @chapter Buffers and Textual Representation
7144 @cindex buffers and textual representation
7145 @cindex textual representation, buffers and
7146
7147 @menu
7148 * Introduction to Buffers::     A buffer holds a block of text such as a file.
7149 * The Text in a Buffer::        Representation of the text in a buffer.
7150 * Buffer Lists::                Keeping track of all buffers.
7151 * Markers and Extents::         Tagging locations within a buffer.
7152 * Bufbytes and Emchars::        Representation of individual characters.
7153 * The Buffer Object::           The Lisp object corresponding to a buffer.
7154 @end menu
7155
7156 @node Introduction to Buffers
7157 @section Introduction to Buffers
7158 @cindex buffers, introduction to
7159
7160   A buffer is logically just a Lisp object that holds some text.
7161 In this, it is like a string, but a buffer is optimized for
7162 frequent insertion and deletion, while a string is not.  Furthermore:
7163
7164 @enumerate
7165 @item
7166 Buffers are @dfn{permanent} objects, i.e. once you create them, they
7167 remain around, and need to be explicitly deleted before they go away.
7168 @item
7169 Each buffer has a unique name, which is a string.  Buffers are
7170 normally referred to by name.  In this respect, they are like
7171 symbols.
7172 @item
7173 Buffers have a default insertion position, called @dfn{point}.
7174 Inserting text (unless you explicitly give a position) goes at point,
7175 and moves point forward past the text.  This is what is going on when
7176 you type text into Emacs.
7177 @item
7178 Buffers have lots of extra properties associated with them.
7179 @item
7180 Buffers can be @dfn{displayed}.  What this means is that there
7181 exist a number of @dfn{windows}, which are objects that correspond
7182 to some visible section of your display, and each window has
7183 an associated buffer, and the current contents of the buffer
7184 are shown in that section of the display.  The redisplay mechanism
7185 (which takes care of doing this) knows how to look at the
7186 text of a buffer and come up with some reasonable way of displaying
7187 this.  Many of the properties of a buffer control how the
7188 buffer's text is displayed.
7189 @item
7190 One buffer is distinguished and called the @dfn{current buffer}.  It is
7191 stored in the variable @code{current_buffer}.  Buffer operations operate
7192 on this buffer by default.  When you are typing text into a buffer, the
7193 buffer you are typing into is always @code{current_buffer}.  Switching
7194 to a different window changes the current buffer.  Note that Lisp code
7195 can temporarily change the current buffer using @code{set-buffer} (often
7196 enclosed in a @code{save-excursion} so that the former current buffer
7197 gets restored when the code is finished).  However, calling
7198 @code{set-buffer} will NOT cause a permanent change in the current
7199 buffer.  The reason for this is that the top-level event loop sets
7200 @code{current_buffer} to the buffer of the selected window, each time
7201 it finishes executing a user command.
7202 @end enumerate
7203
7204   Make sure you understand the distinction between @dfn{current buffer}
7205 and @dfn{buffer of the selected window}, and the distinction between
7206 @dfn{point} of the current buffer and @dfn{window-point} of the selected
7207 window. (This latter distinction is explained in detail in the section
7208 on windows.)
7209
7210 @node The Text in a Buffer
7211 @section The Text in a Buffer
7212 @cindex text in a buffer, the
7213 @cindex buffer, the text in a
7214
7215   The text in a buffer consists of a sequence of zero or more
7216 characters.  A @dfn{character} is an integer that logically represents
7217 a letter, number, space, or other unit of text.  Most of the characters
7218 that you will typically encounter belong to the ASCII set of characters,
7219 but there are also characters for various sorts of accented letters,
7220 special symbols, Chinese and Japanese characters (e.g. Kanji, Katakana,
7221 etc.), Cyrillic and Greek letters, etc.  The actual number of possible
7222 characters is quite large.
7223
7224   For now, we can view a character as some non-negative integer that
7225 has some shape that defines how it typically appears (e.g. as an
7226 uppercase A). (The exact way in which a character appears depends on the
7227 font used to display the character.) The internal type of characters in
7228 the C code is an @code{Emchar}; this is just an @code{int}, but using a
7229 symbolic type makes the code clearer.
7230
7231   Between every character in a buffer is a @dfn{buffer position} or
7232 @dfn{character position}.  We can speak of the character before or after
7233 a particular buffer position, and when you insert a character at a
7234 particular position, all characters after that position end up at new
7235 positions.  When we speak of the character @dfn{at} a position, we
7236 really mean the character after the position.  (This schizophrenia
7237 between a buffer position being ``between'' a character and ``on'' a
7238 character is rampant in Emacs.)
7239
7240   Buffer positions are numbered starting at 1.  This means that
7241 position 1 is before the first character, and position 0 is not
7242 valid.  If there are N characters in a buffer, then buffer
7243 position N+1 is after the last one, and position N+2 is not valid.
7244
7245   The internal makeup of the Emchar integer varies depending on whether
7246 we have compiled with MULE support.  If not, the Emchar integer is an
7247 8-bit integer with possible values from 0 - 255.  0 - 127 are the
7248 standard ASCII characters, while 128 - 255 are the characters from the
7249 ISO-8859-1 character set.  If we have compiled with MULE support, an
7250 Emchar is a 19-bit integer, with the various bits having meanings
7251 according to a complex scheme that will be detailed later.  The
7252 characters numbered 0 - 255 still have the same meanings as for the
7253 non-MULE case, though.
7254
7255   Internally, the text in a buffer is represented in a fairly simple
7256 fashion: as a contiguous array of bytes, with a @dfn{gap} of some size
7257 in the middle.  Although the gap is of some substantial size in bytes,
7258 there is no text contained within it: From the perspective of the text
7259 in the buffer, it does not exist.  The gap logically sits at some buffer
7260 position, between two characters (or possibly at the beginning or end of
7261 the buffer).  Insertion of text in a buffer at a particular position is
7262 always accomplished by first moving the gap to that position
7263 (i.e. through some block moving of text), then writing the text into the
7264 beginning of the gap, thereby shrinking the gap.  If the gap shrinks
7265 down to nothing, a new gap is created. (What actually happens is that a
7266 new gap is ``created'' at the end of the buffer's text, which requires
7267 nothing more than changing a couple of indices; then the gap is
7268 ``moved'' to the position where the insertion needs to take place by
7269 moving up in memory all the text after that position.)  Similarly,
7270 deletion occurs by moving the gap to the place where the text is to be
7271 deleted, and then simply expanding the gap to include the deleted text.
7272 (@dfn{Expanding} and @dfn{shrinking} the gap as just described means
7273 just that the internal indices that keep track of where the gap is
7274 located are changed.)
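
  The gap mechanism described above can be sketched in a few functions.
This is not the code from @file{insdel.c}: the names are hypothetical,
offsets are 0-based for brevity (buffer positions in XEmacs start at 1),
the text is treated as plain bytes, and the growth increment of 16 bytes
is an arbitrary choice.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical gap buffer; offsets are 0-based. */
struct gapbuf_sketch {
  char *mem;         /* text before the gap, the gap, text after it */
  size_t gap_start;  /* offset of the first byte of the gap */
  size_t gap_size;   /* number of unused bytes in the gap */
  size_t text_len;   /* bytes of text, not counting the gap */
};

static int gb_init_sketch(struct gapbuf_sketch *b, size_t initial_gap)
{
  assert(initial_gap > 0);
  b->mem = malloc(initial_gap);
  b->gap_start = 0;
  b->gap_size = initial_gap;
  b->text_len = 0;
  return b->mem != NULL ? 0 : -1;
}

/* Move the gap so that it starts at text offset POS, by block-moving
   the text that lies between the old and new gap positions. */
static void gb_move_gap_sketch(struct gapbuf_sketch *b, size_t pos)
{
  if (pos < b->gap_start)
    memmove(b->mem + pos + b->gap_size, b->mem + pos, b->gap_start - pos);
  else if (pos > b->gap_start)
    memmove(b->mem + b->gap_start, b->mem + b->gap_start + b->gap_size,
            pos - b->gap_start);
  b->gap_start = pos;
}

/* Insert N bytes at text offset POS.  Returns 0, or -1 if out of memory. */
static int gb_insert_sketch(struct gapbuf_sketch *b, size_t pos,
                            const char *s, size_t n)
{
  assert(pos <= b->text_len);
  if (n > b->gap_size) {
    /* Gap too small: place it at the end of the text and enlarge the
       memory block, which extends the gap. */
    size_t extra = n - b->gap_size + 16;
    char *m;

    gb_move_gap_sketch(b, b->text_len);
    m = realloc(b->mem, b->text_len + b->gap_size + extra);
    if (m == NULL)
      return -1;
    b->mem = m;
    b->gap_size += extra;
  }
  gb_move_gap_sketch(b, pos);
  memcpy(b->mem + b->gap_start, s, n);  /* write into the start of the gap */
  b->gap_start += n;                    /* the gap shrinks */
  b->gap_size -= n;
  b->text_len += n;
  return 0;
}

/* Delete N bytes starting at POS: the gap expands to swallow them. */
static void gb_delete_sketch(struct gapbuf_sketch *b, size_t pos, size_t n)
{
  assert(pos + n <= b->text_len);
  gb_move_gap_sketch(b, pos);
  b->gap_size += n;
  b->text_len -= n;
}

/* Fetch the byte at text offset POS, skipping over the gap. */
static char gb_char_at_sketch(const struct gapbuf_sketch *b, size_t pos)
{
  assert(pos < b->text_len);
  return b->mem[pos < b->gap_start ? pos : pos + b->gap_size];
}
```

  Note that deletion in this sketch never touches the text bytes
themselves and never shrinks the memory block; only the two gap indices
change.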
7275
7276   Note that the total amount of memory allocated for a buffer's text never
7277 decreases while the buffer is live.  Therefore, if you load up a
7278 20-megabyte file and then delete all but one character, there will be a
7279 20-megabyte gap, which won't get any smaller (except by inserting
7280 characters back again).  Once the buffer is killed, the memory allocated
7281 for the buffer text will be freed, but it will still be sitting on the
7282 heap, taking up virtual memory, and will not be released back to the
7283 operating system. (However, if you have compiled XEmacs with rel-alloc,
7284 the situation is different.  In this case, the space @emph{will} be
7285 released back to the operating system.  However, this tends to result in a
7286 noticeable speed penalty.)
7287
7288   Astute readers may notice that the text in a buffer is represented as
7289 an array of @emph{bytes}, while (at least in the MULE case) an Emchar is
7290 a 19-bit integer, which clearly cannot fit in a byte.  This means (of
7291 course) that the text in a buffer uses a different representation from
7292 an Emchar: specifically, the 19-bit Emchar becomes a series of one to
7293 four bytes.  The conversion between these two representations is complex
7294 and will be described later.
7295
7296   In the non-MULE case, everything is very simple: An Emchar
7297 is an 8-bit value, which fits neatly into one byte.
7298
7299   If we are given a buffer position and want to retrieve the
7300 character at that position, we need to follow these steps:
7301
7302 @enumerate
7303 @item
7304 Pretend there's no gap, and convert the buffer position into a @dfn{byte
7305 index} that indexes to the appropriate byte in the buffer's stream of
7306 textual bytes.  By convention, byte indices begin at 1, just like buffer
7307 positions.  In the non-MULE case, byte indices and buffer positions are
7308 identical, since one character equals one byte.
7309 @item
7310 Convert the byte index into a @dfn{memory index}, which takes the gap
7311 into account.  The memory index is a direct index into the block of
7312 memory that stores the text of a buffer.  This basically just involves
7313 checking to see if the byte index is past the gap, and if so, adding the
7314 size of the gap to it.  By convention, memory indices begin at 1, just
7315 like buffer positions and byte indices, and when referring to the
7316 position that is @dfn{at} the gap, we always use the memory position at
7317 the @emph{beginning}, not at the end, of the gap.
7318 @item
7319 Fetch the appropriate bytes at the determined memory position.
7320 @item
7321 Convert these bytes into an Emchar.
7322 @end enumerate
7323
7324   In the non-MULE case, steps 3 and 4 boil down to a simple one-byte
7325 memory access.
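
  For the non-MULE case, the four steps can be sketched as follows.  The
type and function names are hypothetical (the real typedefs are
introduced just below), and the structure holds only what the sketch
needs.  All indices are 1-based, as in the text.

```c
#include <assert.h>

typedef int Bufpos_sketch;   /* character position */
typedef int Bytind_sketch;   /* byte index, gap ignored */
typedef int Memind_sketch;   /* index into the memory block */

struct textmem_sketch {
  const unsigned char *mem;  /* mem[0] holds memory index 1 */
  Bytind_sketch gap_start;   /* byte index at which the gap sits */
  int gap_size;              /* size of the gap in bytes */
};

/* Step 1, non-MULE case only: one character is one byte. */
static Bytind_sketch bufpos_to_bytind_sketch(Bufpos_sketch pos)
{
  return pos;
}

/* Step 2: a position past the gap is shifted by the gap size.  The
   position at the gap itself maps to the beginning of the gap. */
static Memind_sketch
bytind_to_memind_sketch(const struct textmem_sketch *b, Bytind_sketch bi)
{
  return bi > b->gap_start ? bi + b->gap_size : bi;
}

/* Steps 3 and 4: fetch the character after POS.  The byte following
   the position at the gap lies beyond the gap, hence the >= test. */
static int char_at_sketch(const struct textmem_sketch *b, Bufpos_sketch pos)
{
  Bytind_sketch bi = bufpos_to_bytind_sketch(pos);
  int skip = bi >= b->gap_start ? b->gap_size : 0;

  return b->mem[bi + skip - 1];   /* -1 because indices start at 1 */
}
```

  The sketch separates converting a @emph{position} (where the gap start
is a valid result) from fetching a @emph{character} (which must never
read from inside the gap); this split is a choice made for the sketch.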
7326
7327   Note that we have defined three types of positions in a buffer:
7328
7329 @enumerate
7330 @item
7331 @dfn{buffer positions} or @dfn{character positions}, typedef @code{Bufpos}
7332 @item
7333 @dfn{byte indices}, typedef @code{Bytind}
7334 @item
7335 @dfn{memory indices}, typedef @code{Memind}
7336 @end enumerate
7337
7338   All three typedefs are just @code{int}s, but defining them this way makes
7339 things a lot clearer.
7340
7341   Most code works with buffer positions.  In particular, all Lisp code
7342 that refers to text in a buffer uses buffer positions.  Lisp code does
7343 not know that byte indices or memory indices exist.
7344
7345   Finally, we have a typedef for the bytes in a buffer.  This is a
7346 @code{Bufbyte}, which is an unsigned char.  Referring to them as
7347 Bufbytes underscores the fact that we are working with a string of bytes
7348 in the internal Emacs buffer representation rather than in one of a
7349 number of possible alternative representations (e.g. EUC-encoded text,
7350 etc.).
7351
7352 @node Buffer Lists
7353 @section Buffer Lists
7354 @cindex buffer lists
7355
7356   Recall from earlier that buffers are @dfn{permanent} objects, i.e. that
7357 they remain around until explicitly deleted.  This entails that there is
7358 a list of all the buffers in existence.  This list is actually an
7359 assoc-list (mapping from the buffer's name to the buffer) and is stored
7360 in the global variable @code{Vbuffer_alist}.
7361
7362   The order of the buffers in the list is important: the buffers are
7363 ordered approximately from most-recently-used to least-recently-used.
7364 Switching to a buffer using @code{switch-to-buffer},
7365 @code{pop-to-buffer}, etc. and switching windows using
7366 @code{other-window}, etc.  usually brings the new current buffer to the
7367 front of the list.  @code{switch-to-buffer}, @code{other-buffer},
7368 etc. look at the beginning of the list to find an alternative buffer to
7369 suggest.  You can also explicitly move a buffer to the end of the list
7370 using @code{bury-buffer}.
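
  The reordering behavior can be sketched with a small array of buffer
names.  This is only an illustration: the real list is a Lisp alist in
@code{Vbuffer_alist}, and the names, the fixed capacity, and the array
representation below are all hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_BUFS_SKETCH 8

struct buflist_sketch {
  const char *names[MAX_BUFS_SKETCH];  /* most recently used first */
  int count;
};

/* Return the index of NAME in the list, or -1 if it is absent. */
static int find_buffer_sketch(const struct buflist_sketch *l,
                              const char *name)
{
  int i;

  for (i = 0; i < l->count; i++)
    if (strcmp(l->names[i], name) == 0)
      return i;
  return -1;
}

/* Move the entry at index FROM to index TO, shifting the entries in
   between by one slot. */
static void move_entry_sketch(struct buflist_sketch *l, int from, int to)
{
  const char *n = l->names[from];

  if (from < to)
    memmove(&l->names[from], &l->names[from + 1],
            (size_t) (to - from) * sizeof n);
  else if (from > to)
    memmove(&l->names[to + 1], &l->names[to],
            (size_t) (from - to) * sizeof n);
  l->names[to] = n;
}

/* Selecting a buffer brings it to the front of the list. */
static int select_buffer_sketch(struct buflist_sketch *l, const char *name)
{
  int i = find_buffer_sketch(l, name);

  if (i < 0)
    return -1;
  move_entry_sketch(l, i, 0);
  return 0;
}

/* Burying a buffer sends it to the end of the list. */
static int bury_buffer_sketch(struct buflist_sketch *l, const char *name)
{
  int i = find_buffer_sketch(l, name);

  if (i < 0)
    return -1;
  move_entry_sketch(l, i, l->count - 1);
  return 0;
}
```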
7371
7372   In addition to the global ordering in @code{Vbuffer_alist}, each frame
7373 has its own ordering of the list.  These lists always contain the same
7374 elements as in @code{Vbuffer_alist} although possibly in a different
7375 order.  @code{buffer-list} normally returns the list for the selected
7376 frame.  This allows you to work in separate frames without things
7377 interfering with each other.
7378
7379   The standard way to look up a buffer given a name is
7380 @code{get-buffer}, and the standard way to create a new buffer is
7381 @code{get-buffer-create}, which looks up a buffer with a given name,
7382 creating a new one if necessary.  These operations correspond exactly
7383 with the symbol operations @code{intern-soft} and @code{intern},
7384 respectively.  You can also force a new buffer to be created using
7385 @code{generate-new-buffer}, which takes a name and (if necessary) makes
7386 a unique name from this by appending a number, and then creates the
7387 buffer.  This is basically like the symbol operation @code{gensym}.
7388
7389 @node Markers and Extents
7390 @section Markers and Extents
7391 @cindex markers and extents
7392 @cindex extents, markers and
7393
7394   Among the things associated with a buffer are things that are
7395 logically attached to certain buffer positions.  This can be used to
7396 keep track of a buffer position when text is inserted and deleted, so
7397 that it remains at the same spot relative to the text around it; to
7398 assign properties to particular sections of text; etc.  There are two
7399 such objects that are useful in this regard: they are @dfn{markers} and
7400 @dfn{extents}.
7401
7402   A @dfn{marker} is simply a flag placed at a particular buffer
7403 position, which is moved around as text is inserted and deleted.
7404 Markers are used for all sorts of purposes, such as the @code{mark} that
7405 is the other end of textual regions to be cut, copied, etc.
7406
7407   An @dfn{extent} is similar to two markers plus some associated
7408 properties, and is used to keep track of regions in a buffer as text is
7409 inserted and deleted, and to add properties (e.g. fonts) to particular
7410 regions of text.  The external interface of extents is explained
7411 elsewhere.
7412
7413   The important thing here is that markers and extents simply contain
7414 buffer positions in them as integers, and every time text is inserted or
7415 deleted, these positions must be updated.  In order to minimize the
7416 amount of shuffling that needs to be done, the positions in markers and
7417 extents (there's one per marker, two per extent) are stored in Meminds.
7418 This means that they only need to be moved when the text is physically
7419 moved in memory; since the gap structure tries to minimize this, it also
7420 minimizes the number of marker and extent indices that need to be
7421 adjusted.  Look in @file{insdel.c} for the details of how this works.
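
  The benefit of storing memory indices can be seen in a sketch that
tracks only the gap and a few marker indices.  The names are
hypothetical and this is not the code in @file{insdel.c}; in particular,
a marker sitting exactly at the insertion point stays before the
inserted text here, which is an assumption of the sketch.  Indices are
1-based.

```c
#include <assert.h>

struct gapinfo_sketch {
  int gap_start;  /* memory index of the start of the gap */
  int gap_size;   /* size of the gap in bytes */
};

/* Derive the byte position a marker denotes from its memory index. */
static int marker_position_sketch(const struct gapinfo_sketch *g, int memind)
{
  return memind > g->gap_start ? memind - g->gap_size : memind;
}

/* Insert NBYTES at the gap.  No text moves in memory, so no marker
   needs to be touched; only the gap indices change. */
static void insert_at_gap_sketch(struct gapinfo_sketch *g, int nbytes)
{
  assert(nbytes <= g->gap_size);
  g->gap_start += nbytes;
  g->gap_size -= nbytes;
}

/* Move the gap to NEW_START.  Only markers attached to the text that
   physically moves are adjusted. */
static void move_gap_sketch(struct gapinfo_sketch *g, int new_start,
                            int *markers, int nmarkers)
{
  int i;

  for (i = 0; i < nmarkers; i++) {
    int m = markers[i];

    if (new_start < g->gap_start) {
      /* Text between the new and old gap start moves up in memory. */
      if (m > new_start && m <= g->gap_start)
        markers[i] = m + g->gap_size;
    } else if (m > g->gap_start + g->gap_size
               && m <= new_start + g->gap_size) {
      /* Text between the old and new gap start moves down in memory. */
      markers[i] = m - g->gap_size;
    }
  }
  g->gap_start = new_start;
}
```

  After an insertion at the gap, a marker beyond the gap keeps its
stored index while the position derived from it grows by the number of
bytes inserted.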
7422
7423   One other important distinction is that markers are @dfn{temporary}
7424 while extents are @dfn{permanent}.  This means that markers disappear as
7425 soon as there are no more pointers to them, and correspondingly, there
7426 is no way to determine what markers are in a buffer if you are just
7427 given the buffer.  Extents remain in a buffer until they are detached
7428 (which could happen as a result of text being deleted) or the buffer is
7429 deleted, and primitives do exist to enumerate the extents in a buffer.
7430
7431 @node Bufbytes and Emchars
7432 @section Bufbytes and Emchars
7433 @cindex Bufbytes and Emchars
7434 @cindex Emchars, Bufbytes and
7435
7436   Not yet documented.
7437
7438 @node The Buffer Object
7439 @section The Buffer Object
7440 @cindex buffer object, the
7441 @cindex object, the buffer
7442
7443   Buffers contain fields not directly accessible by the Lisp programmer.
7444 We describe them here, naming them by the names used in the C code.
7445 Many are accessible indirectly in Lisp programs via Lisp primitives.
7446
7447 @table @code
7448 @item name
7449 The buffer name is a string that names the buffer.  It is guaranteed to
7450 be unique.  @xref{Buffer Names,,, lispref, XEmacs Lisp Reference
7451 Manual}.
7452
7453 @item save_modified
7454 This field contains the time when the buffer was last saved, as an
7455 integer.  @xref{Buffer Modification,,, lispref, XEmacs Lisp Reference
7456 Manual}.
7457
7458 @item modtime
7459 This field contains the modification time of the visited file.  It is
7460 set when the file is written or read.  Every time the buffer is written
7461 to the file, this field is compared to the modification time of the
7462 file.  @xref{Buffer Modification,,, lispref, XEmacs Lisp Reference
7463 Manual}.
7464
7465 @item auto_save_modified
7466 This field contains the time when the buffer was last auto-saved.
7467
7468 @item last_window_start
7469 This field contains the @code{window-start} position in the buffer as of
7470 the last time the buffer was displayed in a window.
7471
7472 @item undo_list
7473 This field points to the buffer's undo list.  @xref{Undo,,, lispref,
7474 XEmacs Lisp Reference Manual}.
7475
7476 @item syntax_table_v
7477 This field contains the syntax table for the buffer.  @xref{Syntax
7478 Tables,,, lispref, XEmacs Lisp Reference Manual}.
7479
7480 @item downcase_table
7481 This field contains the conversion table for converting text to lower
7482 case.  @xref{Case Tables,,, lispref, XEmacs Lisp Reference Manual}.
7483
7484 @item upcase_table
7485 This field contains the conversion table for converting text to upper
7486 case.  @xref{Case Tables,,, lispref, XEmacs Lisp Reference Manual}.
7487
7488 @item case_canon_table
7489 This field contains the conversion table for canonicalizing text for
7490 case-folding search.  @xref{Case Tables,,, lispref, XEmacs Lisp
7491 Reference Manual}.
7492
7493 @item case_eqv_table
7494 This field contains the equivalence table for case-folding search.
7495 @xref{Case Tables,,, lispref, XEmacs Lisp Reference Manual}.
7496
7497 @item display_table
7498 This field contains the buffer's display table, or @code{nil} if it
7499 doesn't have one.  @xref{Display Tables,,, lispref, XEmacs Lisp
7500 Reference Manual}.
7501
7502 @item markers
7503 This field contains the chain of all markers that currently point into
7504 the buffer.  Deletion of text in the buffer, and motion of the buffer's
7505 gap, must check each of these markers and perhaps update it.
7506 @xref{Markers,,, lispref, XEmacs Lisp Reference Manual}.
7507
7508 @item backed_up
7509 This field is a flag that tells whether a backup file has been made for
7510 the visited file of this buffer.
7511
7512 @item mark
7513 This field contains the mark for the buffer.  The mark is a marker,
7514 hence it is also included on the list @code{markers}.  @xref{The Mark,,,
7515 lispref, XEmacs Lisp Reference Manual}.
7516
7517 @item mark_active
7518 This field is non-@code{nil} if the buffer's mark is active.
7519
7520 @item local_var_alist
7521 This field contains the association list describing the variables local
7522 in this buffer, and their values, with the exception of local variables
7523 that have special slots in the buffer object.  (Those slots are omitted
7524 from this table.)  @xref{Buffer-Local Variables,,, lispref, XEmacs Lisp
7525 Reference Manual}.
7526
7527 @item modeline_format
7528 This field contains a Lisp object which controls how to display the mode
7529 line for this buffer.  @xref{Modeline Format,,, lispref, XEmacs Lisp
7530 Reference Manual}.
7531
7532 @item base_buffer
7533 This field holds the buffer's base buffer (if it is an indirect buffer),
7534 or @code{nil}.
7535 @end table
7536
7537 @node MULE Character Sets and Encodings, The Lisp Reader and Compiler, Buffers and Textual Representation, Top
7538 @chapter MULE Character Sets and Encodings
7539 @cindex Mule character sets and encodings
7540 @cindex character sets and encodings, Mule
7541 @cindex encodings, Mule character sets and
7542
7543   Recall that there are two primary ways that text is represented in
7544 XEmacs.  The @dfn{buffer} representation sees the text as a series of
7545 bytes (Bufbytes), with a variable number of bytes used per character.
7546 The @dfn{character} representation sees the text as a series of integers
7547 (Emchars), one per character.  The character representation is a cleaner
7548 representation from a theoretical standpoint, and is thus used in many
7549 cases when lots of manipulations on a string need to be done.  However,
7550 the buffer representation is the standard representation used in both
7551 Lisp strings and buffers, and because of this, it is the ``default''
7552 representation that text comes in.  The reason for using this
7553 representation is that it's compact and is compatible with ASCII.
7554
7555 @menu
7556 * Character Sets::
7557 * Encodings::
7558 * Internal Mule Encodings::
7559 * CCL::
7560 @end menu
7561
7562 @node Character Sets
7563 @section Character Sets
7564 @cindex character sets
7565
7566   A character set (or @dfn{charset}) is an ordered set of characters.  A
7567 particular character in a charset is indexed using one or more
7568 @dfn{position codes}, which are non-negative integers.  The number of
7569 position codes needed to identify a particular character in a charset is
7570 called the @dfn{dimension} of the charset.  In XEmacs/Mule, all charsets
7571 have dimension 1 or 2, and the size of all charsets (except for a few
7572 special cases) is either 94, 96, 94 by 94, or 96 by 96.  The range of
7573 position codes used to index characters from any of these types of
7574 character sets is as follows:
7575
7576 @example
7577 Charset type            Position code 1         Position code 2
7578 ------------------------------------------------------------
7579 94                      33 - 126                N/A
7580 96                      32 - 127                N/A
7581 94x94                   33 - 126                33 - 126
7582 96x96                   32 - 127                32 - 127
7583 @end example
7584
7585   Note that in the above cases position codes do not start at an
7586 expected value such as 0 or 1.  The reason for this will become clear
7587 later.
7588
7589   For example, Latin-1 is a 96-character charset, and JISX0208 (the
7590 Japanese national character set) is a 94x94-character charset.
7591
7592   [Note that, although the ranges above define the @emph{valid} position
7593 codes for a charset, some of the slots in a particular charset may in
7594 fact be empty.  This is the case for JISX0208, for example, where (e.g.)
7595 all the slots whose first position code is in the range 118 - 126 are
7596 empty.]
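
  The table of position-code ranges can be restated as a few small
functions.  The enumeration and function names are hypothetical and
cover only the four regular charset types, not the special cases
discussed next.

```c
#include <assert.h>

enum charset_type_sketch { CS_94, CS_96, CS_94x94, CS_96x96 };

/* Number of position codes needed to identify a character. */
static int charset_dimension_sketch(enum charset_type_sketch t)
{
  return (t == CS_94 || t == CS_96) ? 1 : 2;
}

/* Lowest and highest valid position code for the charset type. */
static int low_code_sketch(enum charset_type_sketch t)
{
  return (t == CS_94 || t == CS_94x94) ? 33 : 32;
}

static int high_code_sketch(enum charset_type_sketch t)
{
  return (t == CS_94 || t == CS_94x94) ? 126 : 127;
}

/* Nonzero if the position codes are in range.  C2 is ignored for
   charsets of dimension 1. */
static int valid_codes_sketch(enum charset_type_sketch t, int c1, int c2)
{
  int lo = low_code_sketch(t);
  int hi = high_code_sketch(t);

  if (c1 < lo || c1 > hi)
    return 0;
  if (charset_dimension_sketch(t) == 2 && (c2 < lo || c2 > hi))
    return 0;
  return 1;
}

/* Total number of slots in a charset of the given type. */
static int charset_size_sketch(enum charset_type_sketch t)
{
  int n = high_code_sketch(t) - low_code_sketch(t) + 1;

  return charset_dimension_sketch(t) == 1 ? n : n * n;
}
```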
7597
7598   There are three charsets that do not follow the above rules.  All of
7599 them have one dimension, and have ranges of position codes as follows:
7600
7601 @example
7602 Charset name            Position code 1
7603 ------------------------------------
7604 ASCII                   0 - 127
7605 Control-1               0 - 31
7606 Composite               0 - some large number
7607 @end example
7608
7609   (The upper bound of the position code for composite characters has not
7610 yet been determined, but it will probably be at least 16,383).
7611
  ASCII is the union of two subsidiary character sets: Printing-ASCII
(the printing ASCII character set, consisting of position codes 33 -
126, as for a standard 94-character charset) and Control-ASCII (the
non-printing characters that would appear in a binary file with codes 0
- 32 and 127).
7617
7618   Control-1 contains the non-printing characters that would appear in a
7619 binary file with codes 128 - 159.
7620
7621   Composite contains characters that are generated by overstriking one
7622 or more characters from other charsets.
7623
  Note that some characters in ASCII, and all characters in Control-1,
are @dfn{control} (non-printing) characters.  These have no printed
representation but instead control some other function of the printing
(e.g. TAB, code 9, moves the current character position to the next tab
stop).  All other characters in all charsets are @dfn{graphic}
(printing) characters.
7630
7631   When a binary file is read in, the bytes in the file are assigned to
7632 character sets as follows:
7633
7634 @example
7635 Bytes           Character set           Range
7636 --------------------------------------------------
7637 0 - 127         ASCII                   0 - 127
7638 128 - 159       Control-1               0 - 31
7639 160 - 255       Latin-1                 32 - 127
7640 @end example
7641
7642   This is a bit ad-hoc but gets the job done.
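
  The byte assignment in the table above can be sketched in C as follows.
This is only an illustration: the names @code{binary_charset},
@code{binary_char} and @code{classify_binary_byte} are hypothetical and
do not come from the XEmacs sources.

```c
/* Hypothetical names, for illustration only. */
enum binary_charset { CS_ASCII, CS_CONTROL_1, CS_LATIN_1 };

struct binary_char {
    enum binary_charset charset;
    int position_code;
};

/* Map one byte of a binary file to a (charset, position code) pair,
   following the table above. */
struct binary_char classify_binary_byte(unsigned char byte)
{
    struct binary_char result;

    if (byte <= 127) {
        result.charset = CS_ASCII;
        result.position_code = byte;        /* 0 - 127 */
    } else if (byte <= 159) {
        result.charset = CS_CONTROL_1;
        result.position_code = byte - 128;  /* 0 - 31 */
    } else {
        result.charset = CS_LATIN_1;
        result.position_code = byte - 128;  /* 32 - 127 */
    }
    return result;
}
```

  Note that the two upper ranges both subtract 128, so the position code
alone does not identify the charset; the charset tag has to be kept with it.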
7643
7644 @node Encodings
7645 @section Encodings
7646 @cindex encodings, Mule
7647 @cindex Mule encodings
7648
7649   An @dfn{encoding} is a way of numerically representing characters from
7650 one or more character sets.  If an encoding only encompasses one
7651 character set, then the position codes for the characters in that
7652 character set could be used directly.  This is not possible, however, if
7653 more than one character set is to be used in the encoding.
7654
7655   For example, the conversion detailed above between bytes in a binary
7656 file and characters is effectively an encoding that encompasses the
7657 three character sets ASCII, Control-1, and Latin-1 in a stream of 8-bit
7658 bytes.
7659
7660   Thus, an encoding can be viewed as a way of encoding characters from a
7661 specified group of character sets using a stream of bytes, each of which
7662 contains a fixed number of bits (but not necessarily 8, as in the common
7663 usage of ``byte'').
7664
  Here are descriptions of a couple of common encodings:
7667
7668 @menu
7669 * Japanese EUC (Extended Unix Code)::
7670 * JIS7::
7671 @end menu
7672
7673 @node Japanese EUC (Extended Unix Code)
7674 @subsection Japanese EUC (Extended Unix Code)
7675 @cindex Japanese EUC (Extended Unix Code)
7676 @cindex EUC (Extended Unix Code), Japanese
7677 @cindex Extended Unix Code, Japanese EUC
7678
This encompasses the character sets Printing-ASCII, Japanese-JISX0208,
Japanese-JISX0212, and Japanese-JISX0201-Kana (half-width katakana, the
right half of JISX0201).  It uses 8-bit bytes.

Note that Printing-ASCII and Japanese-JISX0201-Kana are 94-character
charsets, while Japanese-JISX0208 and Japanese-JISX0212 are
94x94-character charsets.
7685
7686 The encoding is as follows:
7687
7688 @example
7689 Character set            Representation (PC=position-code)
7690 -------------            --------------
7691 Printing-ASCII           PC1
7692 Japanese-JISX0201-Kana   0x8E       | PC1 + 0x80
7693 Japanese-JISX0208        PC1 + 0x80 | PC2 + 0x80
Japanese-JISX0212        0x8F       | PC1 + 0x80 | PC2 + 0x80
7695 @end example
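
As a rough sketch of the table above, the following C function encodes
one character from Printing-ASCII, Japanese-JISX0201-Kana or
Japanese-JISX0208 into Japanese EUC.  The names @code{euc_charset},
@code{valid_pc} and @code{euc_encode} are hypothetical, and the sketch
deliberately covers only these three charsets.

```c
#include <stddef.h>

/* Hypothetical names, for illustration only. */
enum euc_charset { EUC_PRINTING_ASCII, EUC_JISX0201_KANA, EUC_JISX0208 };

/* Position codes of 94- and 94x94-character charsets are 33 - 126. */
static int valid_pc(int pc)
{
    return pc >= 33 && pc <= 126;
}

/* Encode one character into OUT, which must have room for 2 bytes.
   PC2 is ignored for the one-dimensional charsets.
   Returns the number of bytes written, or 0 for invalid input. */
size_t euc_encode(enum euc_charset cs, int pc1, int pc2, unsigned char *out)
{
    if (!valid_pc(pc1))
        return 0;

    switch (cs) {
    case EUC_PRINTING_ASCII:
        out[0] = (unsigned char) pc1;
        return 1;
    case EUC_JISX0201_KANA:
        out[0] = 0x8E;                          /* kana prefix byte */
        out[1] = (unsigned char) (pc1 + 0x80);
        return 2;
    case EUC_JISX0208:
        if (!valid_pc(pc2))
            return 0;
        out[0] = (unsigned char) (pc1 + 0x80);
        out[1] = (unsigned char) (pc2 + 0x80);
        return 2;
    }
    return 0;
}
```

Because every non-ASCII byte has its high bit set, a decoder can tell
from the first byte of a character how many bytes follow, without
keeping any state between characters.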
7696
7697
7698 @node JIS7
7699 @subsection JIS7
7700 @cindex JIS7
7701
7702 This encompasses the character sets Printing-ASCII,
7703 Japanese-JISX0201-Roman (the left half of JISX0201; this character set
7704 is very similar to Printing-ASCII and is a 94-character charset),
7705 Japanese-JISX0208, and Japanese-JISX0201-Kana.  It uses 7-bit bytes.
7706
7707 Unlike Japanese EUC, this is a @dfn{modal} encoding, which
7708 means that there are multiple states that the encoding can
7709 be in, which affect how the bytes are to be interpreted.
7710 Special sequences of bytes (called @dfn{escape sequences})
7711 are used to change states.
7712
7713   The encoding is as follows:
7714
7715 @example
7716 Character set              Representation (PC=position-code)
7717 -------------              --------------
7718 Printing-ASCII             PC1
7719 Japanese-JISX0201-Roman    PC1
7720 Japanese-JISX0201-Kana     PC1
7721 Japanese-JISX0208          PC1 PC2
7722
7723
7724 Escape sequence   ASCII equivalent   Meaning
7725 ---------------   ----------------   -------
7726 0x1B 0x28 0x4A    ESC ( J            invoke Japanese-JISX0201-Roman
7727 0x1B 0x28 0x49    ESC ( I            invoke Japanese-JISX0201-Kana
7728 0x1B 0x24 0x42    ESC $ B            invoke Japanese-JISX0208
7729 0x1B 0x28 0x42    ESC ( B            invoke Printing-ASCII
7730 @end example
7731
7732   Initially, Printing-ASCII is invoked.
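
  The modal behaviour can be illustrated with a small decoder that
tracks the currently invoked charset.  This is a simplified sketch, not
the XEmacs decoder: the names @code{jis7_charset}, @code{jis7_char} and
@code{jis7_decode} are hypothetical, only the four escape sequences
listed above are recognized, and every non-escape byte (control
characters included) is attributed to the current charset.

```c
#include <stddef.h>

/* Hypothetical names, for illustration only. */
enum jis7_charset { JIS7_ASCII, JIS7_ROMAN, JIS7_KANA, JIS7_JISX0208 };

struct jis7_char {
    enum jis7_charset charset;
    int pc1;
    int pc2;   /* 0 for one-dimensional charsets */
};

/* Decode LEN bytes from IN into at most MAX_OUT characters.
   Returns the number of characters decoded, or -1 on malformed
   input or when OUT is too small. */
long jis7_decode(const unsigned char *in, size_t len,
                 struct jis7_char *out, size_t max_out)
{
    enum jis7_charset state = JIS7_ASCII;  /* initial invocation */
    size_t i = 0, n = 0;

    while (i < len) {
        if (in[i] == 0x1B) {               /* ESC: change state */
            if (i + 2 >= len)
                return -1;
            if (in[i + 1] == 0x28 && in[i + 2] == 0x4A)
                state = JIS7_ROMAN;
            else if (in[i + 1] == 0x28 && in[i + 2] == 0x49)
                state = JIS7_KANA;
            else if (in[i + 1] == 0x24 && in[i + 2] == 0x42)
                state = JIS7_JISX0208;
            else if (in[i + 1] == 0x28 && in[i + 2] == 0x42)
                state = JIS7_ASCII;
            else
                return -1;
            i += 3;
            continue;
        }
        if (n == max_out)
            return -1;
        out[n].charset = state;
        out[n].pc1 = in[i++];
        out[n].pc2 = 0;
        if (state == JIS7_JISX0208) {      /* two bytes per character */
            if (i == len)
                return -1;
            out[n].pc2 = in[i++];
        }
        n++;
    }
    return (long) n;
}
```

  The same byte value therefore decodes to different characters
depending on the escape sequences that preceded it, which is why a
modal encoding cannot be entered at an arbitrary byte position.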
7733
7734 @node Internal Mule Encodings
7735 @section Internal Mule Encodings
7736 @cindex internal Mule encodings
7737 @cindex Mule encodings, internal
7738 @cindex encodings, internal Mule
7739
7740 In XEmacs/Mule, each character set is assigned a unique number, called a
7741 @dfn{leading byte}.  This is used in the encodings of a character.
7742 Leading bytes are in the range 0x80 - 0xFF (except for ASCII, which has
7743 a leading byte of 0), although some leading bytes are reserved.
7744
7745 Charsets whose leading byte is in the range 0x80 - 0x9F are called
7746 @dfn{official} and are used for built-in charsets.  Other charsets are
7747 called @dfn{private} and have leading bytes in the range 0xA0 - 0xFF;
7748 these are user-defined charsets.
7749
7750   More specifically:
7751
7752 @example
7753 Character set           Leading byte
7754 -------------           ------------
7755 ASCII                   0
7756 Composite               0x80
7757 Dimension-1 Official    0x81 - 0x8D
7758                           (0x8E is free)
7759 Control-1               0x8F
7760 Dimension-2 Official    0x90 - 0x99
7761                           (0x9A - 0x9D are free;
7762                            0x9E and 0x9F are reserved)
7763 Dimension-1 Private     0xA0 - 0xEF
7764 Dimension-2 Private     0xF0 - 0xFF
7765 @end example
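
The ranges in this table translate directly into a classification
function.  The following is a minimal sketch; the enumeration
@code{lb_class} and the function @code{classify_leading_byte} are
hypothetical names chosen for this example.

```c
/* Hypothetical names, for illustration only. */
enum lb_class {
    LB_ASCII,
    LB_COMPOSITE,
    LB_DIM1_OFFICIAL,
    LB_CONTROL_1,
    LB_DIM2_OFFICIAL,
    LB_DIM1_PRIVATE,
    LB_DIM2_PRIVATE,
    LB_UNASSIGNED      /* free, reserved, or out of range */
};

/* Classify a leading byte according to the table above. */
enum lb_class classify_leading_byte(unsigned int lb)
{
    if (lb == 0x00)
        return LB_ASCII;
    if (lb == 0x80)
        return LB_COMPOSITE;
    if (lb >= 0x81 && lb <= 0x8D)
        return LB_DIM1_OFFICIAL;
    if (lb == 0x8F)
        return LB_CONTROL_1;
    if (lb >= 0x90 && lb <= 0x99)
        return LB_DIM2_OFFICIAL;
    if (lb >= 0xA0 && lb <= 0xEF)
        return LB_DIM1_PRIVATE;
    if (lb >= 0xF0 && lb <= 0xFF)
        return LB_DIM2_PRIVATE;
    return LB_UNASSIGNED;  /* 0x01 - 0x7F, 0x8E, 0x9A - 0x9F, > 0xFF */
}
```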
7766
7767 There are two internal encodings for characters in XEmacs/Mule.  One is
7768 called @dfn{string encoding} and is an 8-bit encoding that is used for
7769 representing characters in a buffer or string.  It uses 1 to 4 bytes per
7770 character.  The other is called @dfn{character encoding} and is a 19-bit
7771 encoding that is used for representing characters individually in a
7772 variable.
7773
7774 (In the following descriptions, we'll ignore composite characters for
7775 the moment.  We also give a general (structural) overview first,
7776 followed later by the exact details.)
7777
7778 @menu
7779 * Internal String Encoding::
7780 * Internal Character Encoding::
7781 @end menu
7782
7783 @node Internal String Encoding
7784 @subsection Internal String Encoding
7785 @cindex internal string encoding
7786 @cindex string encoding, internal
7787 @cindex encoding, internal string
7788
7789 ASCII characters are encoded using their position code directly.  Other
7790 characters are encoded using their leading byte followed by their
7791 position code(s) with the high bit set.  Characters in private character
7792 sets have their leading byte prefixed with a @dfn{leading byte prefix},
7793 which is either 0x9E or 0x9F. (No character sets are ever assigned these
7794 leading bytes.) Specifically:
7795
7796 @example
7797 Character set           Encoding (PC=position-code, LB=leading-byte)
7798 -------------           --------
ASCII                   PC1  |
7800 Control-1               LB   |  PC1 + 0xA0 |
7801 Dimension-1 official    LB   |  PC1 + 0x80 |
7802 Dimension-1 private     0x9E |  LB         | PC1 + 0x80 |
7803 Dimension-2 official    LB   |  PC1 + 0x80 | PC2 + 0x80 |
7804 Dimension-2 private     0x9F |  LB         | PC1 + 0x80 | PC2 + 0x80
7805 @end example
7806
  The basic characteristic of this encoding is that the first byte
of every character is in the range 0x00 - 0x9F, and the second and
following bytes of every character are in the range 0xA0 - 0xFF.
7810 This means that it is impossible to get out of sync, or more
7811 specifically:
7812
7813 @enumerate
7814 @item
7815 Given any byte position, the beginning of the character it is
7816 within can be determined in constant time.
7817 @item
7818 Given any byte position at the beginning of a character, the
7819 beginning of the next character can be determined in constant
7820 time.
7821 @item
7822 Given any byte position at the beginning of a character, the
7823 beginning of the previous character can be determined in constant
7824 time.
7825 @item
7826 Textual searches can simply treat encoded strings as if they
7827 were encoded in a one-byte-per-character fashion rather than
7828 the actual multi-byte encoding.
7829 @end enumerate
7830
7831   None of the standard non-modal encodings meet all of these
7832 conditions.  For example, EUC satisfies only (2) and (3), while
7833 Shift-JIS and Big5 (not yet described) satisfy only (2). (All
7834 non-modal encodings must satisfy (2), in order to be unambiguous.)
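
  Properties (1) to (3) follow from the byte ranges alone, as the
sketch below tries to show.  The function names are hypothetical; the
only fact used is that a byte in the range 0x00 - 0x9F starts a
character and a byte in the range 0xA0 - 0xFF continues one.  Since a
character is at most 4 bytes long, each loop runs a bounded number of
times.

```c
#include <stddef.h>

/* A byte begins a character if and only if it is in 0x00 - 0x9F. */
static int is_first_byte(unsigned char b)
{
    return b <= 0x9F;
}

/* Property (1): index of the first byte of the character that
   contains byte position POS. */
size_t char_start(const unsigned char *text, size_t pos)
{
    while (pos > 0 && !is_first_byte(text[pos]))
        pos--;
    return pos;
}

/* Property (2): POS is the start of a character; return the start of
   the next one, or LEN if POS is the last character. */
size_t next_char(const unsigned char *text, size_t len, size_t pos)
{
    pos++;
    while (pos < len && !is_first_byte(text[pos]))
        pos++;
    return pos;
}

/* Property (3): POS is the start of a character and POS > 0; return
   the start of the previous character. */
size_t prev_char(const unsigned char *text, size_t pos)
{
    pos--;
    while (pos > 0 && !is_first_byte(text[pos]))
        pos--;
    return pos;
}
```

  Property (4) holds for the same reason: the first byte of a search
pattern can only match at the first byte of a character in the text, so
a plain byte-wise search never reports a match that starts in the
middle of a character.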
7835
7836 @node Internal Character Encoding
7837 @subsection Internal Character Encoding
7838 @cindex internal character encoding
7839 @cindex character encoding, internal
7840 @cindex encoding, internal character
7841
7842   One 19-bit word represents a single character.  The word is
7843 separated into three fields:
7844
7845 @example
7846 Bit number:     18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
7847                 <------------> <------------------> <------------------>
7848 Field:                1                  2                    3
7849 @end example
7850
7851   Note that fields 2 and 3 hold 7 bits each, while field 1 holds 5 bits.
7852
7853 @example
7854 Character set           Field 1         Field 2         Field 3
7855 -------------           -------         -------         -------
7856 ASCII                      0               0              PC1
7857    range:                                                   (00 - 7F)
7858 Control-1                  0               1              PC1
7859    range:                                                   (00 - 1F)
7860 Dimension-1 official       0            LB - 0x80         PC1
7861    range:                                    (01 - 0D)      (20 - 7F)
7862 Dimension-1 private        0            LB - 0x80         PC1
7863    range:                                    (20 - 6F)      (20 - 7F)
7864 Dimension-2 official    LB - 0x8F         PC1             PC2
7865    range:                    (01 - 0A)       (20 - 7F)      (20 - 7F)
7866 Dimension-2 private     LB - 0xE1         PC1             PC2
7867    range:                    (0F - 1E)       (20 - 7F)      (20 - 7F)
7868 Composite                 0x1F             ?               ?
7869 @end example
7870
7871   Note that character codes 0 - 255 are the same as the ``binary encoding''
7872 described above.
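
  The field layout can be sketched with a few shift-and-mask helpers.
The names below are hypothetical and are not the macros used in the
XEmacs sources; the field arithmetic is taken from the table above.

```c
/* Hypothetical helper names, for illustration only. */

/* Assemble a 19-bit character from its three fields. */
unsigned long make_char(unsigned f1, unsigned f2, unsigned f3)
{
    return ((unsigned long) (f1 & 0x1F) << 14)   /* field 1: 5 bits */
         | ((unsigned long) (f2 & 0x7F) << 7)    /* field 2: 7 bits */
         | (unsigned long) (f3 & 0x7F);          /* field 3: 7 bits */
}

unsigned char_field1(unsigned long c) { return (unsigned) ((c >> 14) & 0x1F); }
unsigned char_field2(unsigned long c) { return (unsigned) ((c >> 7) & 0x7F); }
unsigned char_field3(unsigned long c) { return (unsigned) (c & 0x7F); }

/* Dimension-1 official: field 2 = LB - 0x80, field 3 = PC1. */
unsigned long make_dim1_official(unsigned lb, unsigned pc1)
{
    return make_char(0, lb - 0x80, pc1);
}

/* Dimension-2 official: field 1 = LB - 0x8F, fields 2 and 3 = PC1, PC2. */
unsigned long make_dim2_official(unsigned lb, unsigned pc1, unsigned pc2)
{
    return make_char(lb - 0x8F, pc1, pc2);
}
```

  For instance, a dimension-1 official charset with leading byte 0x81
and position code 0x69 yields the character code 0xE9, which is how the
codes 160 - 255 come to coincide with the binary encoding.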
7873
7874 @node CCL
7875 @section CCL
7876 @cindex CCL
7877
7878 @example
7879 CCL PROGRAM SYNTAX:
7880      CCL_PROGRAM := (CCL_MAIN_BLOCK
7881                      [ CCL_EOF_BLOCK ])
7882
7883      CCL_MAIN_BLOCK := CCL_BLOCK
7884      CCL_EOF_BLOCK := CCL_BLOCK
7885
7886      CCL_BLOCK := STATEMENT | (STATEMENT [STATEMENT ...])
7887      STATEMENT :=
7888              SET | IF | BRANCH | LOOP | REPEAT | BREAK
7889              | READ | WRITE
7890
7891      SET := (REG = EXPRESSION) | (REG SELF_OP EXPRESSION)
7892             | INT-OR-CHAR
7893
7894      EXPRESSION := ARG | (EXPRESSION OP ARG)
7895
7896      IF := (if EXPRESSION CCL_BLOCK CCL_BLOCK)
7897      BRANCH := (branch EXPRESSION CCL_BLOCK [CCL_BLOCK ...])
7898      LOOP := (loop STATEMENT [STATEMENT ...])
7899      BREAK := (break)
7900      REPEAT := (repeat)
7901              | (write-repeat [REG | INT-OR-CHAR | string])
7902              | (write-read-repeat REG [INT-OR-CHAR | string | ARRAY]?)
7903      READ := (read REG) | (read REG REG)
7904              | (read-if REG ARITH_OP ARG CCL_BLOCK CCL_BLOCK)
7905              | (read-branch REG CCL_BLOCK [CCL_BLOCK ...])
7906      WRITE := (write REG) | (write REG REG)
7907              | (write INT-OR-CHAR) | (write STRING) | STRING
7908              | (write REG ARRAY)
7909      END := (end)
7910
7911      REG := r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7
7912      ARG := REG | INT-OR-CHAR
7913      OP :=   + | - | * | / | % | & | '|' | ^ | << | >> | <8 | >8 | //
7914              | < | > | == | <= | >= | !=
7915      SELF_OP :=
7916              += | -= | *= | /= | %= | &= | '|=' | ^= | <<= | >>=
7917      ARRAY := '[' INT-OR-CHAR ... ']'
7918      INT-OR-CHAR := INT | CHAR
7919
7920 MACHINE CODE:
7921
7922 The machine code consists of a vector of 32-bit words.
7923 The first such word specifies the start of the EOF section of the code;
7924 this is the code executed to handle any stuff that needs to be done
7925 (e.g. designating back to ASCII and left-to-right mode) after all
7926 other encoded/decoded data has been written out.  This is not used for
7927 charset CCL programs.
7928
REGISTER: 0..7  -- referred to by RRR or rrr
7930
7931 OPERATOR BIT FIELD (27-bit): XXXXXXXXXXXXXXX RRR TTTTT
7932         TTTTT (5-bit): operator type
7933         RRR (3-bit): register number
        XXXXXXXXXXXXXXX (15-bit):
7935                 CCCCCCCCCCCCCCC: constant or address
7936                 000000000000rrr: register number
7937
AAAAA:  00000 +
7939         00001 -
7940         00010 *
7941         00011 /
7942         00100 %
7943         00101 &
7944         00110 |
        00111 ^
7946
7947         01000 <<
7948         01001 >>
7949         01010 <8
7950         01011 >8
7951         01100 //
7952         01101 not used
7953         01110 not used
7954         01111 not used
7955
7956         10000 <
7957         10001 >
7958         10010 ==
7959         10011 <=
7960         10100 >=
7961         10101 !=
7962
7963 OPERATORS:      TTTTT RRR XX..
7964
7965 SetCS:          00000 RRR C...C      RRR = C...C
7966 SetCL:          00001 RRR .....      RRR = c...c
7967                 c.............c
7968 SetR:           00010 RRR ..rrr      RRR = rrr
7969 SetA:           00011 RRR ..rrr      RRR = array[rrr]
7970                 C.............C      size of array = C...C
7971                 c.............c      contents = c...c
7972
7973 Jump:           00100 000 c...c      jump to c...c
7974 JumpCond:       00101 RRR c...c      if (!RRR) jump to c...c
7975 WriteJump:      00110 RRR c...c      Write1 RRR, jump to c...c
7976 WriteReadJump:  00111 RRR c...c      Write1, Read1 RRR, jump to c...c
7977 WriteCJump:     01000 000 c...c      Write1 C...C, jump to c...c
7978                 C...C
7979 WriteCReadJump: 01001 RRR c...c      Write1 C...C, Read1 RRR,
7980                 C.............C      and jump to c...c
7981 WriteSJump:     01010 000 c...c      WriteS, jump to c...c
7982                 C.............C
7983                 S.............S
7984                 ...
7985 WriteSReadJump: 01011 RRR c...c      WriteS, Read1 RRR, jump to c...c
7986                 C.............C
7987                 S.............S
7988                 ...
7989 WriteAReadJump: 01100 RRR c...c      WriteA, Read1 RRR, jump to c...c
7990                 C.............C      size of array = C...C
7991                 c.............c      contents = c...c
7992                 ...
7993 Branch:         01101 RRR C...C      if (RRR >= 0 && RRR < C..)
7994                 c.............c      branch to (RRR+1)th address
7995 Read1:          01110 RRR ...        read 1-byte to RRR
7996 Read2:          01111 RRR ..rrr      read 2-byte to RRR and rrr
7997 ReadBranch:     10000 RRR C...C      Read1 and Branch
7998                 c.............c
7999                 ...
8000 Write1:         10001 RRR .....      write 1-byte RRR
8001 Write2:         10010 RRR ..rrr      write 2-byte RRR and rrr
8002 WriteC:         10011 000 .....      write 1-char C...CC
8003                 C.............C
8004 WriteS:         10100 000 .....      write C..-byte of string
8005                 C.............C
8006                 S.............S
8007                 ...
8008 WriteA:         10101 RRR .....      write array[RRR]
8009                 C.............C      size of array = C...C
8010                 c.............c      contents = c...c
8011                 ...
8012 End:            10110 000 .....      terminate the execution
8013
8014 SetSelfCS:      10111 RRR C...C      RRR AAAAA= C...C
8015                 ..........AAAAA
8016 SetSelfCL:      11000 RRR .....      RRR AAAAA= c...c
8017                 c.............c
8018                 ..........AAAAA
8019 SetSelfR:       11001 RRR ..Rrr      RRR AAAAA= rrr
8020                 ..........AAAAA
8021 SetExprCL:      11010 RRR ..Rrr      RRR = rrr AAAAA c...c
8022                 c.............c
8023                 ..........AAAAA
8024 SetExprR:       11011 RRR ..rrr      RRR = rrr AAAAA Rrr
8025                 ............Rrr
8026                 ..........AAAAA
8027 JumpCondC:      11100 RRR c...c      if !(RRR AAAAA C..) jump to c...c
8028                 C.............C
8029                 ..........AAAAA
8030 JumpCondR:      11101 RRR c...c      if !(RRR AAAAA rrr) jump to c...c
8031                 ............rrr
8032                 ..........AAAAA
8033 ReadJumpCondC:  11110 RRR c...c      Read1 and JumpCondC
8034                 C.............C
8035                 ..........AAAAA
8036 ReadJumpCondR:  11111 RRR c...c      Read1 and JumpCondR
8037                 ............rrr
8038                 ..........AAAAA
8039 @end example
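
The operator word layout can be sketched as follows.  This is an
assumption-laden illustration: it assumes the fields are packed as
drawn above, with the 5-bit operator type in the low-order bits, the
3-bit register number above it, and the 15-bit operand above that.  The
names @code{ccl_op}, @code{ccl_pack} and @code{ccl_unpack} are
hypothetical.

```c
/* Hypothetical names; the bit positions are an assumption based on
   the diagram XXXXXXXXXXXXXXX RRR TTTTT read left to right. */
struct ccl_op {
    unsigned type;     /* TTTTT: operator type          */
    unsigned reg;      /* RRR:   register number        */
    unsigned operand;  /* X...X: constant, address, rrr */
};

/* Pack the three fields into one code word. */
unsigned long ccl_pack(struct ccl_op op)
{
    return ((unsigned long) (op.operand & 0x7FFF) << 8)
         | ((unsigned long) (op.reg & 0x07) << 5)
         | (unsigned long) (op.type & 0x1F);
}

/* Split a code word back into its three fields. */
struct ccl_op ccl_unpack(unsigned long word)
{
    struct ccl_op op;

    op.type = (unsigned) (word & 0x1F);
    op.reg = (unsigned) ((word >> 5) & 0x07);
    op.operand = (unsigned) ((word >> 8) & 0x7FFF);
    return op;
}
```

Under these assumptions, a SetR operator (type 00010) that copies r3
into r5 would be packed from type 2, register 5 and operand 3.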
8040
8041 @node The Lisp Reader and Compiler, Lstreams, MULE Character Sets and Encodings, Top
8042 @chapter The Lisp Reader and Compiler
8043 @cindex Lisp reader and compiler, the
8044 @cindex reader and compiler, the Lisp
8045 @cindex compiler, the Lisp reader and
8046
8047 Not yet documented.
8048
8049 @node Lstreams, Consoles; Devices; Frames; Windows, The Lisp Reader and Compiler, Top
8050 @chapter Lstreams
8051 @cindex lstreams
8052
8053   An @dfn{lstream} is an internal Lisp object that provides a generic
8054 buffering stream implementation.  Conceptually, you send data to the
8055 stream or read data from the stream, not caring what's on the other end
8056 of the stream.  The other end could be another stream, a file
8057 descriptor, a stdio stream, a fixed block of memory, a reallocating
8058 block of memory, etc.  The main purpose of the stream is to provide a
8059 standard interface and to do buffering.  Macros are defined to read or
8060 write characters, so the calling functions do not have to worry about
8061 blocking data together in order to achieve efficiency.
8062
8063 @menu
8064 * Creating an Lstream::         Creating an lstream object.
8065 * Lstream Types::               Different sorts of things that are streamed.
8066 * Lstream Functions::           Functions for working with lstreams.
8067 * Lstream Methods::             Creating new lstream types.
8068 @end menu
8069
8070 @node Creating an Lstream
8071 @section Creating an Lstream
8072 @cindex lstream, creating an
8073
8074 Lstreams come in different types, depending on what is being interfaced
8075 to.  Although the primitive for creating new lstreams is
8076 @code{Lstream_new()}, generally you do not call this directly.  Instead,
8077 you call some type-specific creation function, which creates the lstream
8078 and initializes it as appropriate for the particular type.
8079
8080 All lstream creation functions take a @var{mode} argument, specifying
8081 what mode the lstream should be opened as.  This controls whether the
lstream is for input or output, and optionally whether data should be
8083 blocked up in units of MULE characters.  Note that some types of
8084 lstreams can only be opened for input; others only for output; and
8085 others can be opened either way.  #### Richard Mlynarik thinks that
8086 there should be a strict separation between input and output streams,
8087 and he's probably right.
8088
8089   @var{mode} is a string, one of
8090
8091 @table @code
8092 @item "r"
8093   Open for reading.
8094 @item "w"
8095   Open for writing.
8096 @item "rc"
8097   Open for reading, but ``read'' never returns partial MULE characters.
8098 @item "wc"
8099   Open for writing, but never writes partial MULE characters.
8100 @end table
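
The four mode strings can be decoded as in the sketch below.  This is
not the parsing code from @file{lstream.c}; the structure
@code{toy_mode} and the function @code{toy_parse_mode} are hypothetical
and only mirror the table above.

```c
#include <stddef.h>

/* Hypothetical names, for illustration only. */
struct toy_mode {
    int for_reading;   /* "r" or "rc" */
    int for_writing;   /* "w" or "wc" */
    int whole_chars;   /* "rc" or "wc": never split a MULE character */
};

/* Parse MODE into OUT.  Returns 0 on success, or -1 if MODE is not
   one of "r", "w", "rc", "wc". */
int toy_parse_mode(const char *mode, struct toy_mode *out)
{
    if (mode == NULL || (mode[0] != 'r' && mode[0] != 'w'))
        return -1;
    if (mode[1] != '\0' && !(mode[1] == 'c' && mode[2] == '\0'))
        return -1;

    out->for_reading = (mode[0] == 'r');
    out->for_writing = (mode[0] == 'w');
    out->whole_chars = (mode[1] == 'c');
    return 0;
}
```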
8101
8102 @node Lstream Types
8103 @section Lstream Types
8104 @cindex lstream types
8105 @cindex types, lstream
8106
8107 @table @asis
8108 @item stdio
8109
8110 @item filedesc
8111
8112 @item lisp-string
8113
8114 @item fixed-buffer
8115
8116 @item resizing-buffer
8117
8118 @item dynarr
8119
8120 @item lisp-buffer
8121
8122 @item print
8123
8124 @item decoding
8125
8126 @item encoding
8127 @end table
8128
8129 @node Lstream Functions
8130 @section Lstream Functions
8131 @cindex lstream functions
8132
8133 @deftypefun {Lstream *} Lstream_new (Lstream_implementation *@var{imp}, const char *@var{mode})
8134 Allocate and return a new Lstream.  This function is not really meant to
8135 be called directly; rather, each stream type should provide its own
8136 stream creation function, which creates the stream and does any other
8137 necessary creation stuff (e.g. opening a file).
8138 @end deftypefun
8139
8140 @deftypefun void Lstream_set_buffering (Lstream *@var{lstr}, Lstream_buffering @var{buffering}, int @var{buffering_size})
Change the buffering of a stream.  See @file{lstream.h}.  By default the
buffering is @code{LSTREAM_BLOCK_BUFFERED}.
8143 @end deftypefun
8144
8145 @deftypefun int Lstream_flush (Lstream *@var{lstr})
8146 Flush out any pending unwritten data in the stream.  Clear any buffered
8147 input data.  Returns 0 on success, -1 on error.
8148 @end deftypefun
8149
8150 @deftypefn Macro int Lstream_putc (Lstream *@var{stream}, int @var{c})
8151 Write out one byte to the stream.  This is a macro and so it is very
8152 efficient.  The @var{c} argument is only evaluated once but the @var{stream}
8153 argument is evaluated more than once.  Returns 0 on success, -1 on
8154 error.
8155 @end deftypefn
8156
8157 @deftypefn Macro int Lstream_getc (Lstream *@var{stream})
8158 Read one byte from the stream.  This is a macro and so it is very
8159 efficient.  The @var{stream} argument is evaluated more than once.  Return
8160 value is -1 for EOF or error.
8161 @end deftypefn
8162
8163 @deftypefn Macro void Lstream_ungetc (Lstream *@var{stream}, int @var{c})
8164 Push one byte back onto the input queue.  This will be the next byte
8165 read from the stream.  Any number of bytes can be pushed back and will
8166 be read in the reverse order they were pushed back---most recent
8167 first. (This is necessary for consistency---if there are a number of
8168 bytes that have been unread and I read and unread a byte, it needs to be
8169 the first to be read again.) This is a macro and so it is very
8170 efficient.  The @var{c} argument is only evaluated once but the @var{stream}
8171 argument is evaluated more than once.
8172 @end deftypefn
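
The last-in, first-out order of pushed-back bytes can be demonstrated
with a toy stream.  This is an independent sketch, not the lstream
implementation: @code{toy_stream}, @code{toy_getc} and
@code{toy_ungetc} are hypothetical, and unlike the real macro the toy
has a fixed pushback capacity and reports overflow.

```c
#include <stddef.h>

#define TOY_UNGET_MAX 16

/* Hypothetical toy stream: a fixed block of input plus a pushback stack. */
struct toy_stream {
    const unsigned char *data;
    size_t len;
    size_t pos;
    unsigned char unget[TOY_UNGET_MAX];
    size_t unget_count;
};

/* Read one byte; pushed-back bytes come first, most recent first.
   Returns -1 at end of input. */
int toy_getc(struct toy_stream *s)
{
    if (s->unget_count > 0)
        return s->unget[--s->unget_count];
    if (s->pos < s->len)
        return s->data[s->pos++];
    return -1;
}

/* Push one byte back.  Returns 0 on success, -1 if the stack is full. */
int toy_ungetc(struct toy_stream *s, int c)
{
    if (s->unget_count == TOY_UNGET_MAX)
        return -1;
    s->unget[s->unget_count++] = (unsigned char) c;
    return 0;
}
```

Reading a byte and immediately pushing it back leaves the stream in the
state it was in before, whatever else had been pushed back earlier.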
8173
8174 @deftypefun int Lstream_fputc (Lstream *@var{stream}, int @var{c})
8175 @deftypefunx int Lstream_fgetc (Lstream *@var{stream})
8176 @deftypefunx void Lstream_fungetc (Lstream *@var{stream}, int @var{c})
8177 Function equivalents of the above macros.
8178 @end deftypefun
8179
8180 @deftypefun ssize_t Lstream_read (Lstream *@var{stream}, void *@var{data}, size_t @var{size})
Read up to @var{size} bytes from the stream into @var{data}.  Return the
number of bytes read.  0 means EOF.  -1 means an error occurred and no
bytes were read.
8184 @end deftypefun
8185
8186 @deftypefun ssize_t Lstream_write (Lstream *@var{stream}, void *@var{data}, size_t @var{size})
8187 Write @var{size} bytes of @var{data} to the stream.  Return the number
8188 of bytes written.  -1 means an error occurred and no bytes were written.
8189 @end deftypefun
8190
8191 @deftypefun void Lstream_unread (Lstream *@var{stream}, void *@var{data}, size_t @var{size})
8192 Push back @var{size} bytes of @var{data} onto the input queue.  The next
8193 call to @code{Lstream_read()} with the same size will read the same
8194 bytes back.  Note that this will be the case even if there is other
8195 pending unread data.
8196 @end deftypefun
8197
8198 @deftypefun int Lstream_close (Lstream *@var{stream})
8199 Close the stream.  All data will be flushed out.
8200 @end deftypefun
8201
8202 @deftypefun void Lstream_reopen (Lstream *@var{stream})
8203 Reopen a closed stream.  This enables I/O on it again.  This is not
8204 meant to be called except from a wrapper routine that reinitializes
8205 variables and such---the close routine may well have freed some
8206 necessary storage structures, for example.
8207 @end deftypefun
8208
8209 @deftypefun void Lstream_rewind (Lstream *@var{stream})
8210 Rewind the stream to the beginning.
8211 @end deftypefun
8212
8213 @node Lstream Methods
8214 @section Lstream Methods
8215 @cindex lstream methods
8216
8217 @deftypefn {Lstream Method} ssize_t reader (Lstream *@var{stream}, unsigned char *@var{data}, size_t @var{size})
8218 Read some data from the stream's end and store it into @var{data}, which
8219 can hold @var{size} bytes.  Return the number of bytes read.  A return
8220 value of 0 means no bytes can be read at this time.  This may be because
8221 of an EOF, or because there is a granularity greater than one byte that
8222 the stream imposes on the returned data, and @var{size} is less than
8223 this granularity. (This will happen frequently for streams that need to
8224 return whole characters, because @code{Lstream_read()} calls the reader
8225 function repeatedly until it has the number of bytes it wants or until 0
8226 is returned.)  The lstream functions do not treat a 0 return as EOF or
8227 do anything special; however, the calling function will interpret any 0
8228 it gets back as EOF.  This will normally not happen unless the caller
8229 calls @code{Lstream_read()} with a very small size.
8230
8231 This function can be @code{NULL} if the stream is output-only.
8232 @end deftypefn
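
The interaction between a read loop and a reader method with a
granularity greater than one byte can be sketched as follows.  This is
a simplified model, not the code in @file{lstream.c}: @code{toy_read},
@code{pair_source} and @code{pair_reader} are hypothetical, and the
reader hands out data only in whole 2-byte units.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical reader signature, modelled on the method above. */
typedef long (*toy_reader)(void *closure, unsigned char *data, size_t size);

/* Call READER repeatedly until SIZE bytes are read or it returns 0.
   Returns the number of bytes read, or -1 if an error occurred before
   any byte was read. */
long toy_read(toy_reader reader, void *closure,
              unsigned char *data, size_t size)
{
    size_t total = 0;

    while (total < size) {
        long got = reader(closure, data + total, size - total);
        if (got < 0)
            return total > 0 ? (long) total : -1;
        if (got == 0)
            break;                /* nothing more can be read now */
        total += (size_t) got;
    }
    return (long) total;
}

/* Example source that only returns whole 2-byte units. */
struct pair_source {
    const unsigned char *data;
    size_t len;
    size_t pos;
};

long pair_reader(void *closure, unsigned char *data, size_t size)
{
    struct pair_source *src = (struct pair_source *) closure;

    if (size < 2 || src->len - src->pos < 2)
        return 0;                 /* too little room, or end of data */
    memcpy(data, src->data + src->pos, 2);
    src->pos += 2;
    return 2;
}
```

With six bytes available, a request for 5 bytes returns 4, because the
last call has room for only one byte.  A request for 1 byte returns 0
even though data remains, which a caller would take for EOF; this is
the small-size case described above.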
8233
8234 @deftypefn {Lstream Method} ssize_t writer (Lstream *@var{stream}, const unsigned char *@var{data}, size_t @var{size})
8235 Send some data to the stream's end.  Data to be sent is in @var{data}
8236 and is @var{size} bytes.  Return the number of bytes sent.  This
8237 function can send and return fewer bytes than is passed in; in that
8238 case, the function will just be called again until there is no data left
8239 or 0 is returned.  A return value of 0 means that no more data can be
8240 currently stored, but there is no error; the data will be squirreled
8241 away until the writer can accept data. (This is useful, e.g., if you're
8242 dealing with a non-blocking file descriptor and are getting
8243 @code{EWOULDBLOCK} errors.)  This function can be @code{NULL} if the
8244 stream is input-only.
8245 @end deftypefn
8246
8247 @deftypefn {Lstream Method} int rewinder (Lstream *@var{stream})
8248 Rewind the stream.  If this is @code{NULL}, the stream is not seekable.
8249 @end deftypefn
8250
8251 @deftypefn {Lstream Method} int seekable_p (Lstream *@var{stream})
8252 Indicate whether this stream is seekable---i.e. it can be rewound.
8253 This method is ignored if the stream does not have a rewind method.  If
8254 this method is not present, the result is determined by whether a rewind
8255 method is present.
8256 @end deftypefn
8257
8258 @deftypefn {Lstream Method} int flusher (Lstream *@var{stream})
8259 Perform any additional operations necessary to flush the data in this
8260 stream.
8261 @end deftypefn
8262
8263 @deftypefn {Lstream Method} int pseudo_closer (Lstream *@var{stream})
8264 @end deftypefn
8265
8266 @deftypefn {Lstream Method} int closer (Lstream *@var{stream})
8267 Perform any additional operations necessary to close this stream down.
8268 May be @code{NULL}.  This function is called when @code{Lstream_close()}
8269 is called or when the stream is garbage-collected.  When this function
8270 is called, all pending data in the stream will already have been written
8271 out.
8272 @end deftypefn
8273
8274 @deftypefn {Lstream Method} Lisp_Object marker (Lisp_Object @var{lstream}, void (*@var{markfun}) (Lisp_Object))
8275 Mark this object for garbage collection.  Same semantics as a standard
8276 @code{Lisp_Object} marker.  This function can be @code{NULL}.
8277 @end deftypefn
8278
8279 @node Consoles; Devices; Frames; Windows, The Redisplay Mechanism, Lstreams, Top
8280 @chapter Consoles; Devices; Frames; Windows
8281 @cindex consoles; devices; frames; windows
8282 @cindex devices; frames; windows, consoles;
8283 @cindex frames; windows, consoles; devices;
8284 @cindex windows, consoles; devices; frames;
8285
8286 @menu
8287 * Introduction to Consoles; Devices; Frames; Windows::
8288 * Point::
8289 * Window Hierarchy::
8290 * The Window Object::
8291 @end menu
8292
8293 @node Introduction to Consoles; Devices; Frames; Windows
8294 @section Introduction to Consoles; Devices; Frames; Windows
8295 @cindex consoles; devices; frames; windows, introduction to
8296 @cindex devices; frames; windows, introduction to consoles;
8297 @cindex frames; windows, introduction to consoles; devices;
8298 @cindex windows, introduction to consoles; devices; frames;
8299
8300 A window-system window that you see on the screen is called a
8301 @dfn{frame} in Emacs terminology.  Each frame is subdivided into one or
8302 more non-overlapping panes, called (confusingly) @dfn{windows}.  Each
8303 window displays the text of a buffer in it. (See above on Buffers.) Note
8304 that buffers and windows are independent entities: Two or more windows
8305 can be displaying the same buffer (potentially in different locations),
8306 and a buffer can be displayed in no windows.
8307
8308   A single display screen that contains one or more frames is called
8309 a @dfn{display}.  Under most circumstances, there is only one display.
8310 However, more than one display can exist, for example if you have
8311 a @dfn{multi-headed} console, i.e. one with a single keyboard but
8312 multiple displays. (Typically in such a situation, the various
8313 displays act like one large display, in that the mouse is only
8314 in one of them at a time, and moving the mouse off of one moves
8315 it into another.) In some cases, the different displays will
8316 have different characteristics, e.g. one color and one mono.
8317
8318   XEmacs can display frames on multiple displays.  It can even deal
8319 simultaneously with frames on multiple keyboards (called @dfn{consoles} in
8320 XEmacs terminology).  Here is one case where this might be useful: You
8321 are using XEmacs on your workstation at work, and leave it running.
8322 Then you go home and dial in on a TTY line, and you can use the
8323 already-running XEmacs process to display another frame on your local
8324 TTY.
8325
8326   Thus, there is a hierarchy console -> display -> frame -> window.
8327 There is a separate Lisp object type for each of these four concepts.
8328 Furthermore, there is logically a @dfn{selected console},
8329 @dfn{selected display}, @dfn{selected frame}, and @dfn{selected window}.
8330 Each of these objects is distinguished in various ways, such as being the
8331 default object for various functions that act on objects of that type.
8332 Note that every containing object remembers the ``selected'' object
8333 among the objects that it contains: e.g. not only is there a selected
8334 window, but every frame remembers the last window in it that was
8335 selected, and changing the selected frame causes the remembered window
within it to become the selected window.  Similar relationships apply
for consoles to displays and displays to frames.  (In the XEmacs sources
and Lisp interface, a display is called a @dfn{device}.)

@node Point
@section Point
@cindex point

  Recall that every buffer has a current insertion position, called
@dfn{point}.  Now, two or more windows may be displaying the same buffer,
and the text cursor in the two windows (i.e. @code{point}) can be in
two different places.  You may ask, how can that be, since each
buffer has only one value of @code{point}?  The answer is that each window
also has a value of @code{point} that is squirreled away in it.  There
is only one selected window, and the value of ``point'' in that buffer
corresponds to that window.  When the selected window is changed
from one window to another displaying the same buffer, the old
value of @code{point} is stored into the old window's ``point'' and the
value of @code{point} from the new window is retrieved and made the
value of @code{point} in the buffer.  This means that @code{window-point}
for the selected window is potentially inaccurate, and if you
want to retrieve the correct value of @code{point} for a window,
you must special-case on the selected window and retrieve the
buffer's point instead.  This is related to why @code{save-window-excursion}
does not save the selected window's value of @code{point}.
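
  The stash-and-restore behavior described above can be sketched as
follows.  The names are hypothetical, and positions are plain integers
here, whereas the real @code{pointm} field is a marker; treat this as a
minimal model rather than the actual implementation.

```c
#include <assert.h>

struct buffer { long point; };          /* the buffer's single point */
struct window { struct buffer *buf;     /* buffer being displayed    */
                long pointm; };         /* this window's stashed point */

/* Change the selected window from OLD to NEW_WIN: stash the buffer's
   point in the old window, then make the new window's stashed point
   the point of its buffer. */
void
select_window (struct window *old, struct window *new_win)
{
  assert (old && new_win);
  old->pointm = old->buf->point;
  new_win->buf->point = new_win->pointm;
}

/* Return the correct point for W.  The selected window is special:
   its stashed value may be stale, so the buffer's point is used. */
long
window_point (const struct window *w, const struct window *selected)
{
  return (w == selected) ? w->buf->point : w->pointm;
}
```

  With two windows on one buffer, moving point in the selected window
changes only the buffer's value; the window's own stashed value catches
up when another window is selected.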

@node Window Hierarchy
@section Window Hierarchy
@cindex window hierarchy
@cindex hierarchy of windows

  If a frame contains multiple windows (panes), they are always created
by splitting an existing window along the horizontal or vertical axis.
Terminology is a bit confusing here: to @dfn{split a window
horizontally} means to create two side-by-side windows, i.e. to make a
@emph{vertical} cut in a window.  Likewise, to @dfn{split a window
vertically} means to create two windows, one above the other, by making
a @emph{horizontal} cut.

  If you split a window and then split again along the same axis, you
will end up with a number of panes all arranged along the same axis.
The precise way in which the splits were made should not be important,
and this is reflected internally.  Internally, all windows are arranged
in a tree, consisting of two types of windows, @dfn{combination} windows
(which have children, and are covered completely by those children) and
@dfn{leaf} windows, which have no children and are visible.  Every
combination window has two or more children, all arranged along the same
axis.  There are (logically) two subtypes of combination windows,
depending on whether their children are horizontally or vertically
arrayed.  There is always one root window, which is either a leaf window
(if the frame contains only one window) or a combination window (if the
frame contains more than one window).  In the latter case, the root
window will have two or more children, either horizontally or vertically
arrayed, and each of those children will be either a leaf window or
another combination window.

  Here are some rules:

@enumerate
@item
Horizontal combination windows can never have children that are
horizontal combination windows; same for vertical.

@item
Only leaf windows can be split (obviously) and this splitting does one
of two things: (a) turns the leaf window into a combination window and
creates two new leaf children, or (b) turns the leaf window into one of
the two new leaves and creates the other leaf.  Rule (1) dictates which
of these two outcomes happens.

@item
Every combination window must have at least two children.

@item
Leaf windows can never become combination windows.  They can be deleted,
however.  If this results in a violation of (3), the parent combination
window also gets deleted.

@item
All functions that accept windows must be prepared to accept combination
windows, and do something sane (e.g. signal an error if so).
Combination windows @emph{do} escape to the Lisp level.

@item
All windows have three fields governing their contents:
these are @dfn{hchild} (a list of horizontally-arrayed children),
@dfn{vchild} (a list of vertically-arrayed children), and @dfn{buffer}
(the buffer contained in a leaf window).  Exactly one of
these will be non-@code{nil}.  Remember that @dfn{horizontally-arrayed}
means ``side-by-side'' and @dfn{vertically-arrayed} means
``one above the other''.

@item
Leaf windows also have markers in their @code{start} (the
first buffer position displayed in the window) and @code{pointm}
(the window's stashed value of @code{point}---see above) fields,
while combination windows have @code{nil} in these fields.

@item
The list of children for a window is threaded through the
@code{next} and @code{prev} fields of each child window.

@item
@strong{Deleted windows can be undeleted}.  This happens as a result of
restoring a window configuration, and is unlike frames, displays, and
consoles, which, once deleted, can never be restored.  Deleting a window
does nothing except set a special @code{dead} bit to 1 and clear out the
@code{next}, @code{prev}, @code{hchild}, and @code{vchild} fields, for
GC purposes.

@item
Most frames actually have two top-level windows---one for the
minibuffer and one (the @dfn{root}) for everything else.  The modeline
(if present) separates these two.  The @code{next} field of the root
points to the minibuffer, and the @code{prev} field of the minibuffer
points to the root.  The other @code{next} and @code{prev} fields are
@code{nil}, and the frame points to both of these windows.
Minibuffer-less frames have no minibuffer window, and the @code{next}
and @code{prev} of the root window are @code{nil}.  Minibuffer-only
frames have no root window, and the @code{next} of the minibuffer window
is @code{nil} but the @code{prev} points to itself. (#### This is an
artifact that should be fixed.)
@end enumerate
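
  The following sketch shows one way a split can keep rules (1), (3) and
(4) intact.  It is an assumption made for illustration, not a transcript
of the real code: all names are hypothetical, the @code{hchild} and
@code{vchild} fields are collapsed into one @code{first_child} pointer
plus a @code{kind} tag, and the minibuffer window is ignored.  When the
parent is not a combination window along the requested axis, a new
combination window takes the leaf's place in the tree and the leaf
becomes its first child, so the leaf itself stays a leaf.

```c
#include <assert.h>
#include <stdlib.h>

enum kind { LEAF, HCOMB, VCOMB };  /* HCOMB: children side by side */

struct window {
  enum kind kind;
  struct window *parent, *first_child, *next, *prev;
};

struct window *
make_window (enum kind kind)
{
  struct window *w = calloc (1, sizeof *w);
  assert (w);
  w->kind = kind;
  return w;
}

int
count_children (const struct window *w)
{
  int n = 0;
  for (const struct window *c = w->first_child; c; c = c->next)
    n++;
  return n;
}

/* Split LEAF along AXIS and return the new leaf, which is placed
   directly after LEAF in the sibling chain. */
struct window *
split_window (struct window *leaf, enum kind axis)
{
  assert (leaf->kind == LEAF && axis != LEAF);

  if (!leaf->parent || leaf->parent->kind != axis)
    {
      /* No suitable parent: interpose a new combination window that
         takes over LEAF's position among its siblings. */
      struct window *comb = make_window (axis);
      comb->parent = leaf->parent;
      comb->next = leaf->next;
      comb->prev = leaf->prev;
      if (comb->prev)
        comb->prev->next = comb;
      else if (comb->parent)
        comb->parent->first_child = comb;
      if (comb->next)
        comb->next->prev = comb;
      comb->first_child = leaf;
      leaf->parent = comb;
      leaf->next = leaf->prev = NULL;
    }

  /* LEAF's parent now combines along AXIS: just add a sibling. */
  struct window *new_leaf = make_window (LEAF);
  new_leaf->parent = leaf->parent;
  new_leaf->prev = leaf;
  new_leaf->next = leaf->next;
  if (leaf->next)
    leaf->next->prev = new_leaf;
  leaf->next = new_leaf;
  return new_leaf;
}
```

  Splitting twice along the same axis therefore yields three siblings
under one combination window, whereas splitting along the other axis
nests a new combination window one level down.  A caller that splits the
root would also have to update the frame's root pointer, which this
sketch leaves out.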

@node The Window Object
@section The Window Object
@cindex window object, the
@cindex object, the window

  Windows have the following accessible fields:

@table @code
@item frame
The frame that this window is on.

@item mini_p
Non-@code{nil} if this window is a minibuffer window.

@item buffer
The buffer that the window is displaying.  This may change often during
the life of the window.

@item dedicated
Non-@code{nil} if this window is dedicated to its buffer.

@item pointm
@cindex window point internals
This is the value of point in the current buffer when this window is
selected; when it is not selected, it retains its previous value.

@item start
The position in the buffer that is the first character to be displayed
in the window.

@item force_start
If this flag is non-@code{nil}, it says that the window has been
scrolled explicitly by the Lisp program.  This affects what the next
redisplay does if point is off the screen: instead of scrolling the
window to show the text around point, it moves point to a location that
is on the screen.

@item last_modified
The @code{modified} field of the window's buffer, as of the last time
a redisplay completed in this window.

@item last_point
The buffer's value of point, as of the last time
a redisplay completed in this window.

@item left
This is the left-hand edge of the window, measured in columns.  (The
leftmost column on the screen is @w{column 0}.)

@item top
This is the top edge of the window, measured in lines.  (The top line on
the screen is @w{line 0}.)

@item height
The height of the window, measured in lines.

@item width
The width of the window, measured in columns.

@item next
This is the window that is the next in the chain of siblings.  It is
@code{nil} in a window that is the rightmost or bottommost of a group of
siblings.

@item prev
This is the window that is the previous in the chain of siblings.  It is
@code{nil} in a window that is the leftmost or topmost of a group of
siblings.

@item parent
Internally, XEmacs arranges windows in a tree; each group of siblings has
a parent window whose area includes all the siblings.  This field points
to a window's parent.

Parent windows do not display buffers, and play little role in display
except to shape their child windows.  Emacs Lisp programs usually have
no access to the parent windows; they operate on the windows at the
leaves of the tree, which actually display buffers.

@item hscroll
This is the number of columns that the display in the window is scrolled
horizontally to the left.  Normally, this is 0.

@item use_time
This is the last time that the window was selected.  The function
@code{get-lru-window} uses this field.

@item display_table
The window's display table, or @code{nil} if none is specified for it.

@item update_mode_line
Non-@code{nil} means this window's mode line needs to be updated.

@item base_line_number
The line number of a certain position in the buffer, or @code{nil}.
This is used for displaying the line number of point in the mode line.

@item base_line_pos
The position in the buffer for which the line number is known, or
@code{nil} meaning none is known.

@item region_showing
If the region (or part of it) is highlighted in this window, this field
holds the mark position that made one end of that region.  Otherwise,
this field is @code{nil}.
@end table

@node The Redisplay Mechanism, Extents, Consoles; Devices; Frames; Windows, Top
@chapter The Redisplay Mechanism
@cindex redisplay mechanism, the

  The redisplay mechanism is one of the most complicated sections of
XEmacs, especially from a conceptual standpoint.  This is doubly so
because, unlike for the basic aspects of the Lisp interpreter, the
computer science theories of how to efficiently handle redisplay are not
well-developed.

  When working with the redisplay mechanism, remember the Golden Rules
of Redisplay:

@enumerate
@item
It Is Better To Be Correct Than Fast.
@item
Thou Shalt Not Run Elisp From Within Redisplay.
@item
It Is Better To Be Fast Than Not To Be.
@end enumerate

@menu
* Critical Redisplay Sections::
* Line Start Cache::
* Redisplay Piece by Piece::
@end menu

@node Critical Redisplay Sections
@section Critical Redisplay Sections
@cindex redisplay sections, critical
@cindex critical redisplay sections

Within this section, we are defenseless and assume that the
following cannot happen:

@enumerate
@item
garbage collection
@item
Lisp code evaluation
@item
frame size changes
@end enumerate

We ensure (3) by calling @code{hold_frame_size_changes()}, which
will cause any pending frame size changes to get put on hold
till after the end of the critical section.  (1) follows
automatically if (2) is met.  #### Unfortunately, there are
some places where Lisp code can be called within this section.
We need to remove them.

If @code{Fsignal()} is called during this critical section, we
will @code{abort()}.

If garbage collection is called during this critical section,
we simply return.  #### We should abort instead.

#### If a frame-size change does occur we should probably
actually be preempting redisplay.

@node Line Start Cache
@section Line Start Cache
@cindex line start cache

  The traditional scrolling code in Emacs breaks in a variable-height
world.  It depends on the key assumption that the number of lines that
can be displayed at any given time is fixed.  This led to a complete
separation of the scrolling code from the redisplay code.  In order to
fully support variable-height lines, the scrolling code must actually be
tightly integrated with redisplay.  Only redisplay can determine how
many lines will be displayed on a screen for any given starting point.

  What is ideally wanted is a complete list of the starting buffer
position for every possible display line of a buffer along with the
height of that display line.  Maintaining such a full list would be very
expensive.  We settle for having it include information for all areas
which we happen to generate anyhow (i.e. the region currently being
displayed) and for those areas we need to work with.

  In order to ensure that the cache accurately represents what redisplay
would actually show, it is necessary to invalidate it in many
situations.  If the buffer changes, the starting positions may no longer
be correct.  If a face or an extent has changed, then the line heights
may have altered.  These events happen frequently enough that the cache
can end up being constantly disabled.  With this potentially constant
invalidation, when is the cache ever useful?

  Even if the cache is invalidated before every single usage, it is
necessary.  Scrolling often requires knowledge about display lines which
are actually above or below the visible region.  The cache provides a
convenient light-weight method of storing this information for multiple
display regions.  This knowledge is necessary for the scrolling code to
always obey the First Golden Rule of Redisplay.

  If the cache already contains all of the information that the scrolling
routines happen to need so that it doesn't have to go generate it, then
we are able to obey the Third Golden Rule of Redisplay.  The first thing
we do to help out the cache is to always add the displayed region.  This
region had to be generated anyway, so the cache ends up getting the
information basically for free.  In those cases where a user is simply
scrolling around viewing a buffer there is a high probability that this
is sufficient to always provide the needed information.  The second
thing we can do is be smart about invalidating the cache.

  TODO---Be smart about invalidating the cache.  Potential places:

@itemize @bullet
@item
Insertions at end-of-line which don't cause line-wraps do not alter the
starting positions of any display lines.  These types of buffer
modifications should not invalidate the cache.  This is actually a large
optimization for redisplay speed as well.
@item
Buffer modifications frequently only affect the display of lines at and
below where they occur.  In these situations we should only invalidate
the part of the cache starting at where the modification occurs.
@end itemize

  In case you're wondering, the Second Golden Rule of Redisplay is not
applicable.
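
  The second TODO item above (invalidate only from the point of
modification onward) might look roughly like the sketch below.
Everything in it is an assumption made for illustration: the struct
layout, the fixed-size array, the inclusive @code{end} field and the
function names are hypothetical and do not describe the existing cache
code.

```c
#include <assert.h>
#include <stddef.h>

/* One cached display line: buffer positions it covers and its height. */
struct line_start { long start, end; int height; };

/* Cached lines, kept sorted by starting position. */
struct ls_cache { struct line_start lines[64]; size_t count; };

/* Return the index of the cached display line containing POS,
   or -1 if the cache holds no information about POS. */
long
find_line (const struct ls_cache *c, long pos)
{
  for (size_t i = 0; i < c->count; i++)
    if (c->lines[i].start <= pos && pos <= c->lines[i].end)
      return (long) i;
  return -1;
}

/* A modification happened at POS.  Keep the lines that end before
   POS and drop the line containing POS plus everything after it.
   Returns the number of cached lines that survive. */
size_t
invalidate_from (struct ls_cache *c, long pos)
{
  size_t keep = 0;
  assert (c);
  while (keep < c->count && c->lines[keep].end < pos)
    keep++;
  c->count = keep;
  return keep;
}
```

  Lines above the modification remain usable for scrolling upward, and
only the lines at and below it have to be regenerated.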

@node Redisplay Piece by Piece
@section Redisplay Piece by Piece
@cindex redisplay piece by piece

As you can begin to see, redisplay is complex and also not well
documented.  Chuck no longer works on XEmacs, so this section is my take
on the workings of redisplay.

Redisplay happens in three phases:

@enumerate
@item
Determine the desired display in the area that needs redisplay.
Implemented by @file{redisplay.c}.
@item
Compare the desired display with the current display.
Implemented by @file{redisplay-output.c}.
@item
Output the changes.  Implemented by @file{redisplay-output.c},
@file{redisplay-x.c}, @file{redisplay-msw.c} and @file{redisplay-tty.c}.
@end enumerate

Steps 1 and 2 are device-independent and relatively complex.  Step 3 is
mostly device-dependent.

Determining the desired display

Display attributes are stored in @code{display_line} structures.  Each
@code{display_line} consists of a set of @code{display_block}s, and each
@code{display_block} contains a number of @code{rune}s.  Generally, each
window holds dynarrs of @code{display_line}s representing the current
display and the desired display.

The @code{display_line} structures are tightly tied to buffers, which
presents a problem for redisplay, as this connection is bogus for the
modeline.  Hence the @code{display_line} generation routines are
duplicated for generating the modeline.  This means that the modeline
display code has many bugs that the standard redisplay code does not.

The guts of @code{display_line} generation are in
@code{create_text_block}, which creates a single display line for the
desired locale.  This incrementally parses the characters on the current
line and generates redisplay structures for each.

Gutter redisplay is different.  Because the data to display is stored in
a string, we cannot use @code{create_text_block}.  Instead we use
@code{create_text_string_block}, which performs the same function as
@code{create_text_block} but for strings.  Many of the complexities of
@code{create_text_block} to do with cursor handling and selective
display have been removed.

@node Extents, Faces, The Redisplay Mechanism, Top
@chapter Extents
@cindex extents

@menu
* Introduction to Extents::     Extents are ranges over text, with properties.
* Extent Ordering::             How extents are ordered internally.
* Format of the Extent Info::   The extent information in a buffer or string.
* Zero-Length Extents::         A weird special case.
* Mathematics of Extent Ordering::  A rigorous foundation.
* Extent Fragments::            Cached information useful for redisplay.
@end menu

@node Introduction to Extents
@section Introduction to Extents
@cindex extents, introduction to

  Extents are regions over a buffer, with a start and an end position
denoting the region of the buffer included in the extent.  In
addition, either end can be closed or open, meaning that the endpoint
is or is not logically included in the extent.  Insertion of a character
at a closed endpoint causes the character to go inside the extent;
insertion at an open endpoint causes the character to go outside.

  Extent endpoints are stored using memory indices (see @file{insdel.c}),
to minimize the amount of adjusting that needs to be done when
characters are inserted or deleted.

  (Formerly, extent endpoints at the gap could be either before or
after the gap, depending on the open/closedness of the endpoint.
The intent of this was to make it so that insertions would
automatically go inside or out of extents as necessary with no
further work needing to be done.  It didn't work out that way,
however, and just ended up complexifying and buggifying all the
rest of the code.)

@node Extent Ordering
@section Extent Ordering
@cindex extent ordering

  Extents are compared using memory indices.  There are two orderings
for extents and both orders are kept current at all times.  The normal
or @dfn{display} order is as follows:

@example
Extent A is ``less than'' extent B,
that is, earlier in the display order,
  if:    A-start < B-start,
  or if: A-start = B-start, and A-end > B-end
@end example

  So if two extents begin at the same position, the larger of them is the
earlier one in the display order (@code{EXTENT_LESS} is true).

  For the e-order, an analogous rule holds:

@example
Extent A is ``less than'' extent B in e-order,
that is, earlier in the e-order,
  if:    A-end < B-end,
  or if: A-end = B-end, and A-start > B-start
@end example

  So if two extents end at the same position, the smaller of them is the
earlier one in the e-order (@code{EXTENT_E_LESS} is true).

  The display order and the e-order are complementary orders: any
theorem about the display order also applies to the e-order if you swap
all occurrences of ``display order'' and ``e-order'', ``less than'' and
``greater than'', and ``extent start'' and ``extent end''.
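
  The two orderings translate directly into comparison functions.  The
sketch below uses hypothetical lower-case names and a bare
start/end struct; it is not the definition of the @code{EXTENT_LESS} and
@code{EXTENT_E_LESS} macros, only a model of the rules stated above.

```c
#include <stdbool.h>
#include <stdlib.h>

struct extent { long start, end; };   /* memory indices */

/* Display order: earlier start first; on a tie, the larger extent first. */
bool
extent_less (const struct extent *a, const struct extent *b)
{
  return a->start < b->start
         || (a->start == b->start && a->end > b->end);
}

/* E-order: earlier end first; on a tie, the smaller extent first. */
bool
extent_e_less (const struct extent *a, const struct extent *b)
{
  return a->end < b->end
         || (a->end == b->end && a->start > b->start);
}

/* qsort-style comparators built on the two predicates. */
int
cmp_display (const void *pa, const void *pb)
{
  const struct extent *a = pa, *b = pb;
  return extent_less (a, b) ? -1 : extent_less (b, a) ? 1 : 0;
}

int
cmp_e_order (const void *pa, const void *pb)
{
  const struct extent *a = pa, *b = pb;
  return extent_e_less (a, b) ? -1 : extent_e_less (b, a) ? 1 : 0;
}
```

  Sorting the same array once with each comparator gives the two views
of the extent list that are kept current side by side.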

@node Format of the Extent Info
@section Format of the Extent Info
@cindex extent info, format of the

  An extent-info structure consists of a list of the buffer or string's
extents and a @dfn{stack of extents} that lists all of the extents over
a particular position.  The stack-of-extents info is used for
optimization purposes---it basically caches some info that might
be expensive to compute.  Certain otherwise hard computations are easy
given the stack of extents over a particular position, and if the
stack of extents over a nearby position is known (because it was
calculated at some prior point in time), it's easy to move the stack
of extents to the proper position.

  Given that the stack of extents is an optimization, and given that
it requires memory, a string's stack of extents is wiped out each
time a garbage collection occurs.  Therefore, any time you retrieve
the stack of extents, it might not be there.  If you need it to
be there, use the @code{_force} version.

  Similarly, a string may or may not have an @code{extent_info} structure.
(Generally it won't if there haven't been any extents added to the
string.)  So use the @code{_force} version if you need the
@code{extent_info} structure to be there.

  A list of extents is maintained as a double gap array: one gap array
is ordered by start index (the @dfn{display order}) and the other is
ordered by end index (the @dfn{e-order}).  Note that positions in an
extent list should logically be conceived of as referring @emph{to} a
particular extent (as is the norm in programs) rather than sitting
between two extents.  Note also that callers of these functions should
not be aware of the fact that the extent list is implemented as an
array, except for the fact that positions are integers (this should be
generalized to handle integers and linked lists equally well).

@node Zero-Length Extents
@section Zero-Length Extents
@cindex zero-length extents
@cindex extents, zero-length

  Extents can be zero-length, and will end up that way if their endpoints
are explicitly set that way or if their detachable property is @code{nil}
and all the text in the extent is deleted.  (The exception is open-open
zero-length extents, which are barred from existing because there is
no sensible way to define their properties.  Deletion of the text in
an open-open extent causes it to be converted into a closed-open
extent.)  Zero-length extents are primarily used to represent
annotations, and behave as follows:

@enumerate
@item
Insertion at the position of a zero-length extent expands the extent
if both endpoints are closed; goes after the extent if it is closed-open;
and goes before the extent if it is open-closed.

@item
Deletion of a character on a side of a zero-length extent whose
corresponding endpoint is closed causes the extent to be detached if
it is detachable; if the extent is not detachable or the corresponding
endpoint is open, the extent remains in the buffer, moving as necessary.
@end enumerate

  Note that closed-open, non-detachable zero-length extents behave
exactly like markers and that open-closed, non-detachable zero-length
extents behave like the ``point-type'' marker in Mule.
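
  The insertion rule (1) above can be written out as a small function.
This is a minimal sketch with hypothetical names; it handles only an
insertion exactly at the position of a zero-length extent and ignores
detachment and deletion.

```c
#include <assert.h>
#include <stdbool.h>

struct extent { long start, end; bool start_open, end_open; };

/* Adjust the zero-length extent E after LEN characters have been
   inserted at its position. */
void
insert_at_zero_length (struct extent *e, long len)
{
  assert (e->start == e->end);               /* zero-length only     */
  assert (!(e->start_open && e->end_open));  /* open-open is barred  */

  if (!e->start_open && !e->end_open)
    {
      /* closed-closed: the text goes inside, so the extent grows */
      e->end += len;
    }
  else if (e->start_open)
    {
      /* open-closed: the text goes before the extent, which moves */
      e->start += len;
      e->end += len;
    }
  /* closed-open: the text goes after the extent, which stays put */
}
```

  The closed-open case leaves the extent untouched, which is the
marker-like behavior noted above.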

@node Mathematics of Extent Ordering
@section Mathematics of Extent Ordering
@cindex mathematics of extent ordering
@cindex extent mathematics
@cindex extent ordering

@cindex display order of extents
@cindex extents, display order
  The extents in a buffer are ordered by ``display order'' because that
is the order that the redisplay mechanism needs to process them in.
The e-order is an auxiliary ordering used to facilitate operations
over extents.  The operations that can be performed on the ordered
list of extents in a buffer are:

@enumerate
@item
Locate where an extent would go if inserted into the list.
@item
Insert an extent into the list.
@item
Remove an extent from the list.
@item
Map over all the extents that overlap a range.
@end enumerate

  (4) requires being able to determine the first and last extents
that overlap a range.

  NOTE: @dfn{overlap} is used as follows:

@itemize @bullet
@item
two ranges overlap if they have at least one point in common.
Whether the endpoints are open or closed makes a difference here.
@item
a point overlaps a range if the point is contained within the
range; this is equivalent to treating a point @math{P} as the range
@math{[P, P]}.
@item
In the case of an @emph{extent} overlapping a point or range, the extent
is normally treated as having closed endpoints.  This applies
consistently in the discussion of stacks of extents and such below.
Note that this definition of overlap is not necessarily consistent with
the extents that @code{map-extents} maps over, since @code{map-extents}
sometimes pays attention to whether the endpoints of an extent are open
or closed.  But for our purposes, it greatly simplifies things to treat
all extents as having closed endpoints.
@end itemize

First, define @math{>}, @math{<}, @math{<=}, etc. as applied to extents
to mean comparison according to the display order.  Comparison between
an extent @math{E} and an index @math{I} means comparison between
@math{E} and the range @math{[I, I]}.

Also define @math{e>}, @math{e<}, @math{e<=}, etc. to mean comparison
according to the e-order.

For any range @math{R}, define @math{R(0)} to be the starting index of
the range and @math{R(1)} to be the ending index of the range.

For any extent @math{E}, define @math{E(next)} to be the extent directly
following @math{E}, and @math{E(prev)} to be the extent directly
preceding @math{E}.  Assume @math{E(next)} and @math{E(prev)} can be
determined from @math{E} in constant time.  (This is because we store
the extent list as a doubly linked list.)

Similarly, define @math{E(e-next)} and @math{E(e-prev)} to be the
extents directly following and preceding @math{E} in the e-order.

Now:

Let @math{R} be a range.
Let @math{F} be the first extent overlapping @math{R}.
Let @math{L} be the last extent overlapping @math{R}.

Theorem 1: @math{R(1)} lies between @math{L} and @math{L(next)},
i.e. @math{L <= R(1) < L(next)}.

  This follows easily from the definition of display order.  The
basic reason that this theorem applies is that the display order
sorts by increasing starting index.

  Therefore, we can determine @math{L} just by looking at where we would
insert @math{R(1)} into the list, and if we know @math{F} and are moving
forward over extents, we can easily determine when we've hit @math{L} by
comparing the extent we're at to @math{R(1)}.
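
  Theorem 1 gives the stopping condition for operation (4).  The sketch
below is an illustration under simplifying assumptions: the extents sit
in a plain array sorted in display order, all endpoints are treated as
closed, the names are hypothetical, and for brevity the scan starts at
the head of the array instead of at @math{F}.

```c
#include <stddef.h>

struct extent { long start, end; };

/* Call FN (if non-NULL) on every extent in EXTS that overlaps the
   range [R0, R1].  EXTS must be sorted in display order.  Returns
   the number of overlapping extents. */
size_t
map_overlapping (const struct extent *exts, size_t n, long r0, long r1,
                 void (*fn) (const struct extent *, void *), void *arg)
{
  size_t hits = 0;
  for (size_t i = 0; i < n; i++)
    {
      /* An extent that starts after R1 is greater than R(1) in the
         display order, so we have passed L and can stop. */
      if (exts[i].start > r1)
        break;
      /* Extents before L may still end too early to overlap. */
      if (exts[i].end >= r0)
        {
          if (fn)
            fn (&exts[i], arg);
          hits++;
        }
    }
  return hits;
}
```

  The early @code{break} is safe only because the list is sorted by
increasing start index; an unsorted list would require a full scan.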

Theorem 2: @math{F(e-prev) e< [1, R(0)] e<= F}.

  This is the analog of Theorem 1, and applies because the e-order
sorts by increasing ending index.

  Therefore, @math{F} can be found in the same amount of time as
operation (1), i.e. the time that it takes to locate where an extent
would go if inserted into the e-order list.

  If the lists were stored as balanced binary trees, then operation (1)
would take logarithmic time, which is usually quite fast.  However,
currently they're stored as simple doubly-linked lists, and instead we
do some caching to try to speed things up.

  Define a @dfn{stack of extents} (or @dfn{SOE}) as the set of extents
(ordered in the display order) that overlap an index @math{I}, together
with the SOE's @dfn{previous} extent, which is an extent that precedes
@math{I} in the e-order.  (Hopefully there will not be very many extents
between @math{I} and the previous extent.)

Now:

Let @math{I} be an index, let @math{S} be the stack of extents on
@math{I}, let @math{F} be the first extent in @math{S}, and let @math{P}
be @math{S}'s previous extent.

Theorem 3: The first extent in @math{S} is the first extent that overlaps
any range @math{[I, J]}.

Proof: Any extent that overlaps @math{[I, J]} but does not include
@math{I} must have a start index @math{> I}, and thus be greater than
any extent in @math{S}.

Therefore, finding the first extent that overlaps a range @math{R} is
the same as finding the first extent that overlaps @math{R(0)}.

Theorem 4: Let @math{I2} be an index such that @math{I2 > I}, and let
@math{F2} be the first extent that overlaps @math{I2}.  Then, either
@math{F2} is in @math{S} or @math{F2} is greater than any extent in
@math{S}.

Proof: If @math{F2} does not include @math{I} then its start index is
greater than @math{I} and thus it is greater than any extent in
@math{S}, including @math{F}.  Otherwise, @math{F2} includes @math{I}
and thus is in @math{S}, and thus @math{F2 >= F}.

@node Extent Fragments
@section Extent Fragments
@cindex extent fragments
@cindex fragments, extent

  Imagine that the buffer is divided up into contiguous, non-overlapping
@dfn{runs} of text such that no extent starts or ends within a run
(extents that abut the run don't count).

  An extent fragment is a structure that holds data about the run that
contains a particular buffer position (if the buffer position is at the
junction of two runs, the run after the position is used)---the
beginning and end of the run, a list of all of the extents in that run,
the @dfn{merged face} that results from merging all of the faces
corresponding to those extents, the begin and end glyphs at the
beginning of the run, etc.  This is the information that redisplay needs
in order to display this run.

  Extent fragments have to be very quick to update to a new buffer
position when moving linearly through the buffer.  They rely on the
stack-of-extents code, which does the heavy-duty algorithmic work of
determining which extents overlie a particular position.
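
  To make the notion of a run concrete, the sketch below computes the
run containing a position by brute force.  It is an illustration only:
the names are hypothetical, extents are treated as half-open
@math{[start, end)} when deciding which ones cover the run, and a linear
scan stands in for the stack-of-extents machinery that the real code
relies on for speed.

```c
#include <stddef.h>

struct extent { long start, end; };

/* The run containing a position, and how many extents cover it. */
struct run { long start, end; size_t n_extents; };

/* Find the run around POS in a buffer spanning [BUF_START, BUF_END).
   Run boundaries are the extent endpoints; if POS sits exactly on a
   boundary, the run after POS is used. */
struct run
run_at (const struct extent *exts, size_t n,
        long buf_start, long buf_end, long pos)
{
  struct run r = { buf_start, buf_end, 0 };
  for (size_t i = 0; i < n; i++)
    {
      const long edge[2] = { exts[i].start, exts[i].end };
      for (int k = 0; k < 2; k++)
        {
          if (edge[k] <= pos && edge[k] > r.start)
            r.start = edge[k];        /* nearest boundary at or before POS */
          if (edge[k] > pos && edge[k] < r.end)
            r.end = edge[k];          /* nearest boundary after POS */
        }
      if (exts[i].start <= pos && pos < exts[i].end)
        r.n_extents++;
    }
  return r;
}
```

  Moving linearly through the buffer then amounts to calling this with
the previous run's end as the new position, which is the access pattern
that extent fragments are meant to make cheap.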
9033
9034 @node Faces, Glyphs, Extents, Top
9035 @chapter Faces
9036 @cindex faces
9037
9038 Not yet documented.
9039
9040 @node Glyphs, Specifiers, Faces, Top
9041 @chapter Glyphs
9042 @cindex glyphs
9043
9044 Glyphs are graphical elements that can be displayed in XEmacs buffers or
9045 gutters. We use the term graphical element here in the broadest possible
9046 sense since glyphs can be as mundane as text or as arcane as a native
9047 tab widget.
9048
9049 In XEmacs, glyphs represent the uninstantiated state of graphical
9050 elements, i.e. they hold all the information necessary to produce an
9051 image on-screen but the image need not exist at this stage, and multiple
9052 screen images can be instantiated from a single glyph.
9053
9054 Glyphs are lazily instantiated by calling one of the glyph
9055 functions. This usually occurs within redisplay when
9056 @code{Fglyph_height} is called. Instantiation causes an image-instance
9057 to be created and cached. This cache is on a per-device basis for all glyphs
except widget-glyphs, and on a per-window basis for widget-glyphs.  The
caching is done by @code{image_instantiate} and is necessary because it
is generally possible to display an image-instance in multiple
domains. For instance, if we create a Pixmap, we can display it in
multiple windows even though we need only a single Pixmap instance to do
so. If caching were not done, it would be necessary to create an
image-instance for every displayable occurrence of a glyph, and for
every usage, which would be extremely memory- and CPU-intensive.
9066
Widget-glyphs (also known as native widgets) are not cached in this way. This is
9068 because widget-glyph image-instances on screen are toolkit windows, and
9069 thus cannot be reused in multiple XEmacs domains. Thus widget-glyphs are
9070 cached on an XEmacs window basis.
9071
9072 Any action on a glyph first consults the cache before actually
9073 instantiating a widget.
9074
9075 @section Glyph Instantiation
9076 @cindex glyph instantiation
9077 @cindex instantiation, glyph
9078
9079 Glyph instantiation is a hairy topic and requires some explanation. The
guts of glyph instantiation are contained within
9081 @code{image_instantiate}. A glyph contains an image which is a
9082 specifier. When a glyph function - for instance @code{Fglyph_height} -
9083 asks for a property of the glyph that can only be determined from its
9084 instantiated state, then the glyph image is instantiated and an image
9085 instance created. The instantiation process is governed by the specifier
9086 code and goes through a series of steps:
9087
9088 @itemize @bullet
9089 @item
9090 Validation. Instantiation of image instances happens dynamically - often
9091 within the guts of redisplay. Thus it is often not feasible to catch
9092 instantiator errors at instantiation time. Instead the instantiator is
validated at the time it is added to the image specifier. Validation is
performed by @code{image_validate}, which at a simple level checks the
keyword-value pairs.
9096 @item
9097 Duplication. The specifier code by default takes a copy of the
9098 instantiator. This is reasonable for most specifiers but in the case of
9099 widget-glyphs can be problematic, since some of the properties in the
9100 instantiator - for instance callbacks - could cause infinite recursion
9101 in the copying process. Thus the image code defines a function -
9102 @code{image_copy_instantiator} - which will selectively copy values.
9103 This is controlled by the way that a keyword is defined either using
9104 @code{IIFORMAT_VALID_KEYWORD} or
9105 @code{IIFORMAT_VALID_NONCOPY_KEYWORD}. Note that the image caching and
9106 redisplay code relies on instantiator copying to ensure that current and
9107 new instantiators are actually different rather than referring to the
9108 same thing.
9109 @item
9110 Normalization. Once the instantiator has been copied it must be
9111 converted into a form that is viable at instantiation time. This can
9112 involve no changes at all, but typically involves things like converting
file names to the actual data. This step is carried out by
@code{image_going_to_add} and @code{normalize_image_instantiator}.
9115 @item
9116 Instantiation. When an image instance is actually required for display
9117 it is instantiated using @code{image_instantiate}. This involves calling
9118 instantiate methods that are specific to the type of image being
9119 instantiated.
9120 @end itemize
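
The validation and duplication steps can be sketched in miniature as
follows.  Everything here is an illustrative assumption: the keyword
table, @code{toy_validate} and @code{toy_copy_instantiator} are
hypothetical, values are plain C strings rather than Lisp objects, and
the @code{copyable} flag merely stands in for the distinction between
@code{IIFORMAT_VALID_KEYWORD} and @code{IIFORMAT_VALID_NONCOPY_KEYWORD}.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical keyword table; COPYABLE says whether the value is
   duplicated or shared when the instantiator is copied. */
struct toy_keyword { const char *name; int copyable; };

static const struct toy_keyword toy_keywords[] = {
  { ":file", 1 },
  { ":data", 1 },
  { ":callback", 0 },   /* shared, never copied */
};

/* One keyword/value pair of a toy instantiator. */
struct toy_pair { const char *keyword; char *value; };

static const struct toy_keyword *
toy_find_keyword (const char *name)
{
  size_t n = sizeof toy_keywords / sizeof toy_keywords[0];

  for (size_t i = 0; i < n; i++)
    if (strcmp (toy_keywords[i].name, name) == 0)
      return &toy_keywords[i];
  return NULL;
}

/* Validation step: return 1 if every keyword is known, else 0. */
int
toy_validate (const struct toy_pair *pairs, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (toy_find_keyword (pairs[i].keyword) == NULL)
      return 0;
  return 1;
}

/* Duplication step: copy SRC into DST, duplicating the values of
   copyable keywords and sharing the pointer for the others.  SRC must
   already have been validated.  Returns 0 on success, or -1 if an
   allocation fails (any copies made so far are freed). */
int
toy_copy_instantiator (const struct toy_pair *src, struct toy_pair *dst,
                       size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      const struct toy_keyword *kw = toy_find_keyword (src[i].keyword);

      assert (kw != NULL);
      dst[i].keyword = src[i].keyword;
      if (!kw->copyable)
        {
          dst[i].value = src[i].value;
          continue;
        }
      size_t len = strlen (src[i].value) + 1;
      dst[i].value = malloc (len);
      if (dst[i].value == NULL)
        {
          while (i-- > 0)
            if (toy_find_keyword (dst[i].keyword)->copyable)
              free (dst[i].value);
          return -1;
        }
      memcpy (dst[i].value, src[i].value, len);
    }
  return 0;
}
```

After a successful copy, the value of a copyable keyword is a distinct
object with equal contents, while a non-copyable value is the very same
pointer in both instantiators.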
9121
9122 The final instantiation phase also involves a number of steps. In order
9123 to understand these we need to describe a number of concepts.
9124
9125 An image is instantiated in a @dfn{domain}, where a domain can be any
one of a device, frame, window or image-instance. The domain gives the
image-instance context and identity, and properties that affect the
appearance of the image-instance may differ for the same glyph
instantiated in different domains. An example is the face used to
display the image-instance.
9131
9132 Although an image is instantiated in a particular domain the
9133 instantiation domain is not necessarily the domain in which the
image-instance is cached. For example, a pixmap can be instantiated in a
window but actually be cached on a per-device basis. The domain in which
the image-instance is actually cached is called the
@dfn{governing-domain}. A governing-domain is currently either a device
or a window. Widget-glyphs and text-glyphs have a window as a
governing-domain; all other image-instances have a device as the
governing-domain. The governing domain for an image-instance is
determined using the @code{governing_domain} image-instance method.
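
The relationship between the instantiation domain, the governing-domain
and the cache can be sketched as follows.  This is only a toy model
under stated assumptions: domains and glyphs are reduced to integer ids,
the cache is a small fixed array, and all of the @code{toy_} names are
hypothetical rather than taken from the XEmacs sources.

```c
#include <assert.h>
#include <stddef.h>

enum toy_domain_kind { TOY_DEVICE, TOY_WINDOW };
enum toy_image_kind { TOY_PIXMAP, TOY_TEXT, TOY_WIDGET };

/* Widget-glyphs and text-glyphs are governed by a window; every other
   kind of image-instance is governed by a device. */
enum toy_domain_kind
toy_governing_domain (enum toy_image_kind kind)
{
  return (kind == TOY_WIDGET || kind == TOY_TEXT) ? TOY_WINDOW : TOY_DEVICE;
}

#define TOY_CACHE_SIZE 16

struct toy_cache_entry
{
  int glyph_id;
  enum toy_domain_kind dkind;
  int domain_id;
  int instance_id;
};

struct toy_cache
{
  struct toy_cache_entry entries[TOY_CACHE_SIZE];
  size_t used;
  int next_instance;
};

/* Return the image-instance id for GLYPH_ID displayed in WINDOW_ID on
   DEVICE_ID.  The cache is keyed on the governing-domain, so an
   existing instance is reused whenever one is already cached there. */
int
toy_instantiate (struct toy_cache *cache, int glyph_id,
                 enum toy_image_kind kind, int device_id, int window_id)
{
  enum toy_domain_kind dkind = toy_governing_domain (kind);
  int domain_id = (dkind == TOY_WINDOW) ? window_id : device_id;

  for (size_t i = 0; i < cache->used; i++)
    {
      const struct toy_cache_entry *e = &cache->entries[i];

      if (e->glyph_id == glyph_id && e->dkind == dkind
          && e->domain_id == domain_id)
        return e->instance_id;          /* cache hit */
    }

  assert (cache->used < TOY_CACHE_SIZE);
  struct toy_cache_entry *e = &cache->entries[cache->used++];
  e->glyph_id = glyph_id;
  e->dkind = dkind;
  e->domain_id = domain_id;
  e->instance_id = cache->next_instance++;
  return e->instance_id;
}
```

In this model a pixmap glyph shown in two windows of one device yields
the same instance both times, whereas a widget glyph yields a separate
instance for each window.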
9142
9143 @section Widget-Glyphs
9144 @cindex widget-glyphs
9145
9146 @section Widget-Glyphs in the MS-Windows Environment
9147 @cindex widget-glyphs in the MS-Windows environment
9148 @cindex MS-Windows environment, widget-glyphs in the
9149
9150 To Do
9151
9152 @section Widget-Glyphs in the X Environment
9153 @cindex widget-glyphs in the X environment
9154 @cindex X environment, widget-glyphs in the
9155
9156 Widget-glyphs under X make heavy use of lwlib (@pxref{Lucid Widget
9157 Library}) for manipulating the native toolkit objects. This is primarily
9158 so that different toolkits can be supported for widget-glyphs, just as
9159 they are supported for features such as menubars etc.
9160
9161 Lwlib is extremely poorly documented and quite hairy so here is my
9162 understanding of what goes on.
9163
Lwlib maintains a set of @code{widget_instance}s which mirror the
hierarchical state of Xt widgets. I think this is so that widgets can be
updated and manipulated generically by the lwlib library. For instance,
@code{update_one_widget_instance} can cope with multiple types of widget
and multiple types of toolkit. Each element in the widget hierarchy is
updated from its corresponding @code{widget_instance} by walking the
@code{widget_instance} tree recursively.

This has desirable properties: for example, @code{lw_modify_all_widgets},
which is called from @file{glyphs-x.c}, updates all the properties of a
widget without having to know what the widget is or what toolkit it is
from. Unfortunately this also has hairy properties, such as making the
lwlib code quite complex. And of course lwlib has to know at some level
what the widget is and how to set its properties.
9178
9179 @node Specifiers, Menus, Glyphs, Top
9180 @chapter Specifiers
9181 @cindex specifiers
9182
9183 Not yet documented.
9184
9185 @node Menus, Subprocesses, Specifiers, Top
9186 @chapter Menus
9187 @cindex menus
9188
9189   A menu is set by setting the value of the variable
9190 @code{current-menubar} (which may be buffer-local) and then calling
9191 @code{set-menubar-dirty-flag} to signal a change.  This will cause the
9192 menu to be redrawn at the next redisplay.  The format of the data in
9193 @code{current-menubar} is described in @file{menubar.c}.
9194
  Internally the data in @code{current-menubar} is parsed into a tree of
@code{widget_value}s (defined in @file{lwlib.h}); this is accomplished
9197 by the recursive function @code{menu_item_descriptor_to_widget_value()},
9198 called by @code{compute_menubar_data()}.  Such a tree is deallocated
9199 using @code{free_widget_value()}.
9200
9201   @code{update_screen_menubars()} is one of the external entry points.
9202 This checks to see, for each screen, if that screen's menubar needs to
9203 be updated.  This is the case if
9204
9205 @enumerate
9206 @item
9207 @code{set-menubar-dirty-flag} was called since the last redisplay.  (This
function sets the C variable @code{menubar_has_changed}.)
9209 @item
9210 The buffer displayed in the screen has changed.
9211 @item
9212 The screen has no menubar currently displayed.
9213 @end enumerate
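
  The three conditions amount to a simple predicate, sketched below.
The structure and its fields are hypothetical stand-ins for the real
screen state; in particular, ``the buffer has changed'' is modelled by
remembering the buffer for which the menubar was last computed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-screen state relevant to menubar updates. */
struct toy_screen
{
  int displayed_buffer_id;  /* buffer currently shown in the screen */
  int menubar_buffer_id;    /* buffer the menubar was last built for */
  bool has_menubar;         /* is a menubar currently displayed? */
};

/* Return true if the menubar of screen S must be recomputed.
   MENUBAR_HAS_CHANGED mirrors the C variable of the same name, set
   when set-menubar-dirty-flag is called. */
bool
toy_menubar_needs_update (const struct toy_screen *s,
                          bool menubar_has_changed)
{
  assert (s != NULL);
  return menubar_has_changed
    || s->displayed_buffer_id != s->menubar_buffer_id
    || !s->has_menubar;
}
```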
9214
9215   @code{set_screen_menubar()} is called for each such screen.  This
9216 function calls @code{compute_menubar_data()} to create the tree of
@code{widget_value}s, then calls @code{lw_create_widget()},
9218 @code{lw_modify_all_widgets()}, and/or @code{lw_destroy_all_widgets()}
9219 to create the X-Toolkit widget associated with the menu.
9220
9221   @code{update_psheets()}, the other external entry point, actually
9222 changes the menus being displayed.  It uses the widgets fixed by
9223 @code{update_screen_menubars()} and calls various X functions to ensure
9224 that the menus are displayed properly.
9225
9226   The menubar widget is set up so that @code{pre_activate_callback()} is
9227 called when the menu is first selected (i.e. mouse button goes down),
9228 and @code{menubar_selection_callback()} is called when an item is
selected.  @code{pre_activate_callback()} calls the function in
@code{activate-menubar-hook}, which can change the menubar (this is described
9231 in @file{menubar.c}).  If the menubar is changed,
9232 @code{set_screen_menubars()} is called.
9233 @code{menubar_selection_callback()} enqueues a menu event, putting in it
9234 a function to call (either @code{eval} or @code{call-interactively}) and
9235 its argument, which is the callback function or form given in the menu's
9236 description.
9237
9238 @node Subprocesses, Interface to the X Window System, Menus, Top
9239 @chapter Subprocesses
9240 @cindex subprocesses
9241
9242   The fields of a process are:
9243
9244 @table @code
9245 @item name
9246 A string, the name of the process.
9247
9248 @item command
9249 A list containing the command arguments that were used to start this
9250 process.
9251
9252 @item filter
9253 A function used to accept output from the process instead of a buffer,
9254 or @code{nil}.
9255
9256 @item sentinel
9257 A function called whenever the process receives a signal, or @code{nil}.
9258
9259 @item buffer
9260 The associated buffer of the process.
9261
9262 @item pid
9263 An integer, the Unix process @sc{id}.
9264
9265 @item childp
9266 A flag, non-@code{nil} if this is really a child process.
9267 It is @code{nil} for a network connection.
9268
9269 @item mark
9270 A marker indicating the position of the end of the last output from this
9271 process inserted into the buffer.  This is often but not always the end
9272 of the buffer.
9273
9274 @item kill_without_query
9275 If this is non-@code{nil}, killing XEmacs while this process is still
9276 running does not ask for confirmation about killing the process.
9277
9278 @item raw_status_low
9279 @itemx raw_status_high
9280 These two fields record 16 bits each of the process status returned by
9281 the @code{wait} system call.
9282
9283 @item status
9284 The process status, as @code{process-status} should return it.
9285
9286 @item tick
9287 @itemx update_tick
9288 If these two fields are not equal, a change in the status of the process
9289 needs to be reported, either by running the sentinel or by inserting a
9290 message in the process buffer.
9291
9292 @item pty_flag
9293 Non-@code{nil} if communication with the subprocess uses a @sc{pty};
9294 @code{nil} if it uses a pipe.
9295
9296 @item infd
9297 The file descriptor for input from the process.
9298
9299 @item outfd
9300 The file descriptor for output to the process.
9301
9302 @item subtty
9303 The file descriptor for the terminal that the subprocess is using.  (On
9304 some systems, there is no need to record this, so the value is
9305 @code{-1}.)
9306
9307 @item tty_name
9308 The name of the terminal that the subprocess is using,
9309 or @code{nil} if it is using pipes.
9310 @end table
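
  As an illustration of the @code{raw_status_low}/@code{raw_status_high}
and @code{tick}/@code{update_tick} pairs, here is a minimal sketch.  The
function names are hypothetical, and the fields are modelled as plain C
integers rather than as the Lisp objects that the process structure
actually holds.

```c
#include <assert.h>
#include <stdint.h>

/* Split a 32-bit wait-style status into two 16-bit halves, in the
   manner of raw_status_low and raw_status_high. */
void
toy_split_status (uint32_t status, uint16_t *low, uint16_t *high)
{
  assert (low != 0 && high != 0);
  *low = (uint16_t) (status & 0xFFFFu);
  *high = (uint16_t) (status >> 16);
}

/* Reassemble the original status from its two halves. */
uint32_t
toy_join_status (uint16_t low, uint16_t high)
{
  return ((uint32_t) high << 16) | (uint32_t) low;
}

/* A status change still has to be reported (by running the sentinel or
   by inserting a message in the process buffer) exactly when the two
   tick counters differ. */
int
toy_status_change_pending (int tick, int update_tick)
{
  return tick != update_tick;
}
```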
9311
9312 @node Interface to the X Window System, Index, Subprocesses, Top
9313 @chapter Interface to the X Window System
9314 @cindex X Window System, interface to the
9315
9316 Mostly undocumented.
9317
9318 @menu
9319 * Lucid Widget Library::        An interface to various widget sets.
9320 @end menu
9321
9322 @node Lucid Widget Library
9323 @section Lucid Widget Library
9324 @cindex Lucid Widget Library
9325 @cindex widget library, Lucid
9326 @cindex library, Lucid Widget
9327
9328 Lwlib is extremely poorly documented and quite hairy.  The author(s)
9329 blame that on X, Xt, and Motif, with some justice, but also sufficient
9330 hypocrisy to avoid drawing the obvious conclusion about their own work.
9331
9332 The Lucid Widget Library is composed of two more or less independent
9333 pieces.  The first, as the name suggests, is a set of widgets.  These
9334 widgets are intended to resemble and improve on widgets provided in the
9335 Motif toolkit but not in the Athena widgets, including menubars and
9336 scrollbars.  Recent additions by Andy Piper integrate some ``modern''
9337 widgets by Edward Falk, including checkboxes, radio buttons, progress
9338 gauges, and index tab controls (aka notebooks).
9339
9340 The second piece of the Lucid widget library is a generic interface to
9341 several toolkits for X (including Xt, the Athena widget set, and Motif,
9342 as well as the Lucid widgets themselves) so that core XEmacs code need
9343 not know which widget set has been used to build the graphical user
9344 interface.
9345
9346 @menu
9347 * Generic Widget Interface::    The lwlib generic widget interface.
9348 * Scrollbars::
9349 * Menubars::
9350 * Checkboxes and Radio Buttons::
9351 * Progress Bars::
9352 * Tab Controls::
9353 @end menu
9354
9355 @node Generic Widget Interface
9356 @subsection Generic Widget Interface
9357 @cindex widget interface, generic
9358
9359 In general in any toolkit a widget may be a composite object.  In Xt,
9360 all widgets have an X window that they manage, but typically a complex
9361 widget will have widget children, each of which manages a subwindow of
9362 the parent widget's X window.  These children may themselves be
9363 composite widgets.  Thus a widget is actually a tree or hierarchy of
9364 widgets.
9365
For each toolkit widget, lwlib maintains a tree of @code{widget_value}s
which mirror the hierarchical state of Xt widgets (including Motif,
Athena, 3D Athena, and Falk's widget sets).  Each @code{widget_value}
has a @code{contents} member, which points to the head of a linked list of
its children.  The linked list of siblings is chained through the
@code{next} member of @code{widget_value}.
9372
9373 @example
9374            +-----------+
9375            | composite |
9376            +-----------+
9377                  |
9378                  | contents
9379                  V
9380              +-------+ next +-------+ next +-------+
9381              | child |----->| child |----->| child |
9382              +-------+      +-------+      +-------+
9383                                 |
9384                                 | contents
9385                                 V
9386                          +-------------+ next +-------------+
9387                          | grand child |----->| grand child |
9388                          +-------------+      +-------------+
9389
9390 The @code{widget_value} hierarchy of a composite widget with two simple
9391 children and one composite child.
9392 @end example
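
The @code{contents}/@code{next} linkage can be sketched in C as follows.
The structure is a deliberately reduced, hypothetical
@code{toy_widget_value}; the real @code{widget_value} in @file{lwlib.h}
carries many more members, and the helper functions here are
illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, reduced widget_value: only the tree linkage is kept. */
struct toy_widget_value
{
  const char *name;
  struct toy_widget_value *contents;  /* head of the list of children */
  struct toy_widget_value *next;      /* next sibling */
};

/* Allocate a node with no children and no siblings; NULL on failure. */
struct toy_widget_value *
toy_make (const char *name)
{
  struct toy_widget_value *wv = calloc (1, sizeof *wv);

  if (wv != NULL)
    wv->name = name;
  return wv;
}

/* Append CHILD to the end of PARENT's list of children. */
void
toy_add_child (struct toy_widget_value *parent,
               struct toy_widget_value *child)
{
  struct toy_widget_value **link;

  assert (parent != NULL && child != NULL);
  for (link = &parent->contents; *link != NULL; link = &(*link)->next)
    ;
  *link = child;
}

/* Count WV and all of its descendants by following contents and next. */
size_t
toy_count_tree (const struct toy_widget_value *wv)
{
  size_t n = 1;

  for (const struct toy_widget_value *c = wv->contents; c; c = c->next)
    n += toy_count_tree (c);
  return n;
}

/* Free WV and all of its descendants. */
void
toy_free_tree (struct toy_widget_value *wv)
{
  struct toy_widget_value *c = wv->contents;

  while (c != NULL)
    {
      struct toy_widget_value *next = c->next;

      toy_free_tree (c);
      c = next;
    }
  free (wv);
}
```

Building the hierarchy drawn above (three children, the middle one with
two children of its own) gives a tree of six nodes.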
9393
9394 The @code{widget_instance} structure maintains the inverse view of the
9395 tree.  As for the @code{widget_value}, siblings are chained through the
9396 @code{next} member.  However, rather than naming children, the
9397 @code{widget_instance} tree links to parents.
9398
9399 @example
9400            +-----------+
9401            | composite |
9402            +-----------+
9403                  A
9404                  | parent
9405                  |
9406              +-------+ next +-------+ next +-------+
9407              | child |----->| child |----->| child |
9408              +-------+      +-------+      +-------+
9409                                 A
9410                                 | parent
9411                                 |
9412                          +-------------+ next +-------------+
9413                          | grand child |----->| grand child |
9414                          +-------------+      +-------------+
9415
The @code{widget_instance} hierarchy of a composite widget with two simple
children and one composite child.
9418 @end example
9419
9420 This permits widgets derived from different toolkits to be updated and
9421 manipulated generically by the lwlib library. For instance
9422 @code{update_one_widget_instance} can cope with multiple types of widget
9423 and multiple types of toolkit. Each element in the widget hierarchy is
9424 updated from its corresponding @code{widget_value} by walking the
9425 @code{widget_value} tree.  This has desirable properties.  For example,
9426 @code{lw_modify_all_widgets} is called from @file{glyphs-x.c} and
9427 updates all the properties of a widget without having to know what the
widget is or what toolkit it is from.  Unfortunately this also has its
hairy properties; the lwlib code is quite complex. And of course lwlib has
9430 to know at some level what the widget is and how to set its properties.
9431
9432 The @code{widget_instance} structure also contains a pointer to the root
9433 of its tree.  Widget instances are further confi
9434
9435
9436 @node Scrollbars
9437 @subsection Scrollbars
9438 @cindex scrollbars
9439
9440 @node Menubars
9441 @subsection Menubars
9442 @cindex menubars
9443
9444 @node Checkboxes and Radio Buttons
9445 @subsection Checkboxes and Radio Buttons
9446 @cindex checkboxes and radio buttons
9447 @cindex radio buttons, checkboxes and
9448 @cindex buttons, checkboxes and radio
9449
9450 @node Progress Bars
9451 @subsection Progress Bars
9452 @cindex progress bars
9453 @cindex bars, progress
9454
9455 @node Tab Controls
9456 @subsection Tab Controls
9457 @cindex tab controls
9458
9459 @include index.texi
9460
9461 @c Print the tables of contents
9462 @summarycontents
9463 @contents
9464 @c That's all
9465
9466 @bye