\input texinfo  @c -*-texinfo-*-
@c %**start of header
@setfilename ../../info/internals.info
@settitle XEmacs Internals Manual
@c %**end of header

@ifinfo
@dircategory XEmacs Editor
@direntry
* Internals: (internals).       XEmacs Internals Manual.
@end direntry

Copyright @copyright{} 1992 - 1996 Ben Wing.
Copyright @copyright{} 1996, 1997 Sun Microsystems.
Copyright @copyright{} 1994 - 1998 Free Software Foundation.
Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois.


Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

@ignore
Permission is granted to process this file through TeX and print the
results, provided the printed document carries copying permission notice
identical to this one except for the removal of this paragraph (this
paragraph not being relevant to the printed manual).

@end ignore
Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided that the
entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

Permission is granted to copy and distribute translations of this manual
into another language, under the above conditions for modified versions,
except that this permission notice may be stated in a translation
approved by the Foundation.

Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided also that the
section entitled ``GNU General Public License'' is included exactly as
in the original, and provided that the entire resulting derived work is
distributed under the terms of a permission notice identical to this
one.

Permission is granted to copy and distribute translations of this manual
into another language, under the above conditions for modified versions,
except that the section entitled ``GNU General Public License'' may be
included in a translation approved by the Free Software Foundation
instead of in the original English.
@end ifinfo

@c Combine indices.
@synindex cp fn
@syncodeindex vr fn
@syncodeindex ky fn
@syncodeindex pg fn
@syncodeindex tp fn

@setchapternewpage odd
@finalout

@titlepage
@title XEmacs Internals Manual
@subtitle Version 1.3, August 1999

@author Ben Wing
@author Martin Buchholz
@author Hrvoje Niksic
@author Matthias Neubauer
@page
@vskip 0pt plus 1fill

@noindent
Copyright @copyright{} 1992 - 1996 Ben Wing. @*
Copyright @copyright{} 1996, 1997 Sun Microsystems, Inc. @*
Copyright @copyright{} 1994 - 1998 Free Software Foundation. @*
Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois.

@sp 2
Version 1.3 @*
August 1999.@*

Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided also that the
section entitled ``GNU General Public License'' is included
exactly as in the original, and provided that the entire resulting
derived work is distributed under the terms of a permission notice
identical to this one.

Permission is granted to copy and distribute translations of this manual
into another language, under the above conditions for modified versions,
except that the section entitled ``GNU General Public License'' may be
included in a translation approved by the Free Software Foundation
instead of in the original English.
@end titlepage
@page

@node Top, A History of Emacs, (dir), (dir)

@ifinfo
This Info file contains v1.3 of the XEmacs Internals Manual.
@end ifinfo

@menu
* A History of Emacs::          Times, dates, important events.
* XEmacs From the Outside::     A broad conceptual overview.
* The Lisp Language::           An overview.
* XEmacs From the Perspective of Building::
* XEmacs From the Inside::
* The XEmacs Object System (Abstractly Speaking)::
* How Lisp Objects Are Represented in C::
* Rules When Writing New C Code::
* A Summary of the Various XEmacs Modules::
* Allocation of Objects in XEmacs Lisp::
* Events and the Event Loop::
* Evaluation; Stack Frames; Bindings::
* Symbols and Variables::
* Buffers and Textual Representation::
* MULE Character Sets and Encodings::
* The Lisp Reader and Compiler::
* Lstreams::
* Consoles; Devices; Frames; Windows::
* The Redisplay Mechanism::
* Extents::
* Faces::
* Glyphs::
* Specifiers::
* Menus::
* Subprocesses::
* Interface to X Windows::
* Index::                   Index including concepts, functions, variables,
                              and other terms.

      --- The Detailed Node Listing ---

Here are other nodes that are inferiors of those already listed,
mentioned here so you can get to them in one step:

A History of Emacs

* Through Version 18::          Unification prevails.
* Lucid Emacs::                 One version 19 Emacs.
* GNU Emacs 19::                The other version 19 Emacs.
* GNU Emacs 20::                The other version 20 Emacs.
* XEmacs::                      The continuation of Lucid Emacs.

Rules When Writing New C Code

* General Coding Rules::
* Writing Lisp Primitives::
* Adding Global Lisp Variables::
* Techniques for XEmacs Developers::

A Summary of the Various XEmacs Modules

* Low-Level Modules::
* Basic Lisp Modules::
* Modules for Standard Editing Operations::
* Editor-Level Control Flow Modules::
* Modules for the Basic Displayable Lisp Objects::
* Modules for other Display-Related Lisp Objects::
* Modules for the Redisplay Mechanism::
* Modules for Interfacing with the File System::
* Modules for Other Aspects of the Lisp Interpreter and Object System::
* Modules for Interfacing with the Operating System::
* Modules for Interfacing with X Windows::
* Modules for Internationalization::

Allocation of Objects in XEmacs Lisp

* Introduction to Allocation::
* Garbage Collection::
* GCPROing::
* Garbage Collection - Step by Step::
* Integers and Characters::
* Allocation from Frob Blocks::
* lrecords::
* Low-level allocation::
* Pure Space::
* Cons::
* Vector::
* Bit Vector::
* Symbol::
* Marker::
* String::
* Compiled Function::

Events and the Event Loop

* Introduction to Events::
* Main Loop::
* Specifics of the Event Gathering Mechanism::
* Specifics About the Emacs Event::
* The Event Stream Callback Routines::
* Other Event Loop Functions::
* Converting Events::
* Dispatching Events; The Command Builder::

Evaluation; Stack Frames; Bindings

* Evaluation::
* Dynamic Binding; The specbinding Stack; Unwind-Protects::
* Simple Special Forms::
* Catch and Throw::

Symbols and Variables

* Introduction to Symbols::
* Obarrays::
* Symbol Values::

Buffers and Textual Representation

* Introduction to Buffers::     A buffer holds a block of text such as a file.
* The Text in a Buffer::        Representation of the text in a buffer.
* Buffer Lists::                Keeping track of all buffers.
* Markers and Extents::         Tagging locations within a buffer.
* Bufbytes and Emchars::        Representation of individual characters.
* The Buffer Object::           The Lisp object corresponding to a buffer.

MULE Character Sets and Encodings

* Character Sets::
* Encodings::
* Internal Mule Encodings::

Encodings

* Japanese EUC (Extended Unix Code)::
* JIS7::

Internal Mule Encodings

* Internal String Encoding::
* Internal Character Encoding::

The Lisp Reader and Compiler

Lstreams

Consoles; Devices; Frames; Windows

* Introduction to Consoles; Devices; Frames; Windows::
* Point::
* Window Hierarchy::

The Redisplay Mechanism

* Critical Redisplay Sections::
* Line Start Cache::

Extents

* Introduction to Extents::     Extents are ranges over text, with properties.
* Extent Ordering::             How extents are ordered internally.
* Format of the Extent Info::   The extent information in a buffer or string.
* Zero-Length Extents::         A weird special case.
* Mathematics of Extent Ordering::      A rigorous foundation.
* Extent Fragments::            Cached information useful for redisplay.

Faces

Glyphs

Specifiers

Menus

Subprocesses

Interface to X Windows

@end menu

@node A History of Emacs, XEmacs From the Outside, Top, Top
@chapter A History of Emacs
@cindex history of Emacs
@cindex Hackers (Steven Levy)
@cindex Levy, Steven
@cindex ITS (Incompatible Timesharing System)
@cindex Stallman, Richard
@cindex RMS
@cindex MIT
@cindex TECO
@cindex FSF
@cindex Free Software Foundation

  XEmacs is a powerful, customizable text editor and development
environment.  It began as Lucid Emacs, which was in turn derived from
GNU Emacs, a program written by Richard Stallman of the Free Software
Foundation.  GNU Emacs itself was begun in the 1980's, but its lineage
goes back to the 1970's: it was modelled after a package called
``Emacs'', written in 1976, that was a set of macros on top of TECO, an
old, old text editor written at MIT on the DEC PDP-10 under one of the
earliest time-sharing operating systems, ITS (Incompatible Timesharing
System).  (ITS dates back well before Unix.)  ITS, TECO, and Emacs were
products of a group of people at MIT who called themselves
``hackers'', who shared an idealistic belief system about the free
exchange of information, and who were fanatical in their devotion to,
and the time they spent with, computers.  (The hacker subculture dates
back to the late 1950's at MIT and is described in detail in Steven
Levy's book @cite{Hackers}.  This book also includes a lot of
information about Stallman himself and the development of Lisp, a
programming language developed at MIT that underlies Emacs.)

@menu
* Through Version 18::          Unification prevails.
* Lucid Emacs::                 One version 19 Emacs.
* GNU Emacs 19::                The other version 19 Emacs.
* GNU Emacs 20::                The other version 20 Emacs.
* XEmacs::                      The continuation of Lucid Emacs.
@end menu

@node Through Version 18
@section Through Version 18
@cindex Gosling, James
@cindex Great Usenet Renaming

  Although the history of the earliest versions of GNU Emacs is unclear,
it is well known from the middle of 1985 onward.  A time line is:

@itemize @bullet
@item
GNU Emacs version 15 (15.34) was released sometime in 1984 or 1985 and
shared some code with a version of Emacs written by James Gosling (the
same James Gosling who later created the Java language).
@item
GNU Emacs version 16 (first released version was 16.56) was released on
July 15, 1985.  All Gosling code was removed due to potential copyright
problems with the code.
@item
version 16.57: released on September 16, 1985.
@item
versions 16.58, 16.59: released on September 17, 1985.
@item
version 16.60: released on September 19, 1985.  These later version 16
releases incorporated patches from the net, especially for getting Emacs
to work under System V.
@item
version 17.36 (first official v17 release) released on December 20,
1985.  Included a TeX-able user manual.  First official unpatched
version that worked on vanilla System V machines.
@item
version 17.43 (second official v17 release) released on January 25,
1986.
@item
version 17.45 released on January 30, 1986.
@item
version 17.46 released on February 4, 1986.
@item
version 17.48 released on February 10, 1986.
@item
version 17.49 released on February 12, 1986.
@item
version 17.55 released on March 18, 1986.
@item
version 17.57 released on March 27, 1986.
@item
version 17.58 released on April 4, 1986.
@item
version 17.61 released on April 12, 1986.
@item
version 17.63 released on May 7, 1986.
@item
version 17.64 released on May 12, 1986.
@item
version 18.24 (a beta version) released on October 2, 1986.
@item
version 18.30 (a beta version) released on November 15, 1986.
@item
version 18.31 (a beta version) released on November 23, 1986.
@item
version 18.32 (a beta version) released on December 7, 1986.
@item
version 18.33 (a beta version) released on December 12, 1986.
@item
version 18.35 (a beta version) released on January 5, 1987.
@item
version 18.36 (a beta version) released on January 21, 1987.
@item
January 27, 1987: The Great Usenet Renaming.  net.emacs is now
comp.emacs.
@item
version 18.37 (a beta version) released on February 12, 1987.
@item
version 18.38 (a beta version) released on March 3, 1987.
@item
version 18.39 (a beta version) released on March 14, 1987.
@item
version 18.40 (a beta version) released on March 18, 1987.
@item
version 18.41 (the first ``official'' release) released on March 22,
1987.
@item
version 18.45 released on June 2, 1987.
@item
version 18.46 released on June 9, 1987.
@item
version 18.47 released on June 18, 1987.
@item
version 18.48 released on September 3, 1987.
@item
version 18.49 released on September 18, 1987.
@item
version 18.50 released on February 13, 1988.
@item
version 18.51 released on May 7, 1988.
@item
version 18.52 released on September 1, 1988.
@item
version 18.53 released on February 24, 1989.
@item
version 18.54 released on April 26, 1989.
@item
version 18.55 released on August 23, 1989.  This is the earliest version
that is still available by FTP.
@item
version 18.56 released on January 17, 1991.
@item
version 18.57 released late January, 1991.
@item
version 18.58 released ?????.
@item
version 18.59 released October 31, 1992.
@end itemize

@node Lucid Emacs
@section Lucid Emacs
@cindex Lucid Emacs
@cindex Lucid Inc.
@cindex Energize
@cindex Epoch

  Lucid Emacs was developed by the (now-defunct) Lucid Inc., a maker of
C++ and Lisp development environments.  It began when Lucid decided they
wanted to use Emacs as the editor and cornerstone of their C++
development environment (called ``Energize'').  They needed many features
that were not available in the existing version of GNU Emacs (version
18.5something), in particular good, integrated support for GUI
features such as the mouse, multiple fonts, multiple window-system
windows, etc.  A branch of GNU Emacs called Epoch, written at the
University of Illinois, already supplied many of these features;
however, Lucid needed more than Epoch offered.  At the time, the
Free Software Foundation was working on version 19 of Emacs (this was
sometime around 1991), which was planned to have similar features, and
so Lucid decided to work with the Free Software Foundation.  Their plan
was to add the features they needed and to coordinate with the FSF so
that those features would be included back into Emacs version 19.

  Delays in the release of version 19 occurred, however (resulting in it
finally being released more than a year after what was initially
planned), and Lucid encountered unexpected technical resistance in
getting their changes merged back into version 19, so they decided to
release their own version of Emacs, which became Lucid Emacs 19.0.

@cindex Zawinski, Jamie
@cindex Sexton, Harlan
@cindex Benson, Eric
@cindex Devin, Matthieu
  The initial authors of Lucid Emacs were Matthieu Devin, Harlan Sexton,
and Eric Benson, and the work was later taken over by Jamie Zawinski,
who became ``Mr. Lucid Emacs'' for many releases.

  A time line for Lucid Emacs/XEmacs is:

@itemize @bullet
@item
version 19.0 shipped with Energize 1.0, April 1992.
@item
version 19.1 released June 4, 1992.
@item
version 19.2 released June 19, 1992.
@item
version 19.3 released September 9, 1992.
@item
version 19.4 released January 21, 1993.
@item
version 19.5 was a repackaging of 19.4 with a few bug fixes and
shipped with Energize 2.0.  Never released to the net.
@item
version 19.6 released April 9, 1993.
@item
version 19.7 was a repackaging of 19.6 with a few bug fixes and
shipped with Energize 2.1.  Never released to the net.
@item
version 19.8 released September 6, 1993.
@item
version 19.9 released January 12, 1994.
@item
version 19.10 released May 27, 1994.
@item
version 19.11 (first XEmacs) released September 13, 1994.
@item
version 19.12 released June 23, 1995.
@item
version 19.13 released September 1, 1995.
@item
version 19.14 released June 23, 1996.
@item
version 20.0 released February 9, 1997.
@item
version 19.15 released March 28, 1997.
@item
version 20.1 (not released to the net), April 15, 1997.
@item
version 20.2 released May 16, 1997.
@item
version 19.16 released October 31, 1997.
@item
version 20.3 (the first stable version of XEmacs 20.x) released November 30,
1997.
@item
version 20.4 released February 28, 1998.
@end itemize

@node GNU Emacs 19
@section GNU Emacs 19
@cindex GNU Emacs 19
@cindex FSF Emacs

  About a year after the initial release of Lucid Emacs, the FSF
released a beta of their version of Emacs 19 (referred to here as ``GNU
Emacs'').  By this time, the current version of Lucid Emacs was
19.6.  (Strangely, the first released beta from the FSF was GNU Emacs
19.7.)  A time line for GNU Emacs version 19 is:

@itemize @bullet
@item
version 19.8 (beta) released May 27, 1993.
@item
version 19.9 (beta) released May 27, 1993.
@item
version 19.10 (beta) released May 30, 1993.
@item
version 19.11 (beta) released June 1, 1993.
@item
version 19.12 (beta) released June 2, 1993.
@item
version 19.13 (beta) released June 8, 1993.
@item
version 19.14 (beta) released June 17, 1993.
@item
version 19.15 (beta) released June 19, 1993.
@item
version 19.16 (beta) released July 6, 1993.
@item
version 19.17 (beta) released late July, 1993.
@item
version 19.18 (beta) released August 9, 1993.
@item
version 19.19 (beta) released August 15, 1993.
@item
version 19.20 (beta) released November 17, 1993.
@item
version 19.21 (beta) released November 17, 1993.
@item
version 19.22 (beta) released November 28, 1993.
@item
version 19.23 (beta) released May 17, 1994.
@item
version 19.24 (beta) released May 16, 1994.
@item
version 19.25 (beta) released June 3, 1994.
@item
version 19.26 (beta) released September 11, 1994.
@item
version 19.27 (beta) released September 14, 1994.
@item
version 19.28 (first ``official'' release) released November 1, 1994.
@item
version 19.29 released June 21, 1995.
@item
version 19.30 released November 24, 1995.
@item
version 19.31 released May 25, 1996.
@item
version 19.32 released July 31, 1996.
@item
version 19.33 released August 11, 1996.
@item
version 19.34 released August 21, 1996.
@item
version 19.34b released September 6, 1996.
@end itemize

@cindex Mlynarik, Richard
  In some ways, GNU Emacs 19 was better than Lucid Emacs; in some ways,
worse.  Lucid soon began incorporating features from GNU Emacs 19 into
Lucid Emacs; the work was mostly done by Richard Mlynarik, who had been
working on and using GNU Emacs for a long time (back as far as version
16 or 17).

@node GNU Emacs 20
@section GNU Emacs 20
@cindex GNU Emacs 20
@cindex FSF Emacs

On February 2, 1997, work began on integrating Mule into GNU Emacs.  The
first release was made in September of that year.

A time line for GNU Emacs 20 is:

@itemize @bullet
@item
version 20.1 released September 17, 1997.
@item
version 20.2 released September 20, 1997.
@item
version 20.3 released August 19, 1998.
@end itemize

@node XEmacs
@section XEmacs
@cindex XEmacs

@cindex Sun Microsystems
@cindex University of Illinois
@cindex Illinois, University of
@cindex SPARCWorks
@cindex Andreessen, Marc
@cindex Baur, Steve
@cindex Buchholz, Martin
@cindex Kaplan, Simon
@cindex Wing, Ben
@cindex Thompson, Chuck
@cindex Win-Emacs
@cindex Epoch
@cindex Amdahl Corporation
  Around the time that Lucid was developing Energize, Sun Microsystems
was developing its own development environment (called ``SPARCWorks'')
and also decided to use Emacs.  They joined forces with the Epoch team
at the University of Illinois and later with Lucid.  The maintainer of
the last-released version of Epoch was Marc Andreessen, but he dropped
out, and the Epoch project, headed by Simon Kaplan, lured Chuck Thompson
away from a system administration job to become the primary Lucid Emacs
author for Epoch and Sun.  Chuck's area of specialty became the
redisplay engine (he replaced the old Lucid Emacs redisplay engine with
a ported version from Epoch and then later rewrote it from scratch).
Sun also hired Ben Wing (the author of Win-Emacs, a port of Lucid Emacs
to Microsoft Windows 3.1) in 1993, for what was initially a one-month
contract to fix some event problems but later became a many-year
involvement, punctuated by a six-month contract with Amdahl Corporation.

@cindex rename to XEmacs
  In 1994, Sun and Lucid agreed to rename Lucid Emacs to XEmacs (a name
not favorable to either company); the first release called XEmacs was
version 19.11.  In June 1994, Lucid folded and Jamie quit to work for
the newly formed Mosaic Communications Corp., later Netscape
Communications Corp. (co-founded by the same Marc Andreessen, who had
quit his Epoch job to work on a graphical browser for the World Wide
Web).  Chuck then became the primary maintainer of XEmacs, and put out
versions 19.11 through 19.14 in conjunction with Ben.  For 19.12 and
19.13, Chuck added the new redisplay and many other display improvements
and Ben added MULE support (support for Asian and other languages) and
redesigned most of the internal Lisp subsystems to better support the
MULE work and the various other features being added to XEmacs.  After
19.14, Chuck retired as primary maintainer and Steve Baur stepped in.

@cindex MULE merged XEmacs appears
  Soon after 19.13 was released, work began in earnest on the MULE
internationalization code, and the source tree was divided into two
development paths.  The MULE version was initially called 19.20, but was
soon renamed to 20.0.  In 1996, Martin Buchholz of Sun Microsystems took
over the care and feeding of it and worked on it in parallel with the
19.14 development that was occurring at the same time.  After much work
by Martin, it was decided to release 20.0 ahead of 19.15 in February
1997.  The source tree remained divided until 20.2, when the version 19
source was finally retired at version 19.16.

@cindex Baur, Steve
@cindex Buchholz, Martin
@cindex Jones, Kyle
@cindex Niksic, Hrvoje
@cindex XEmacs goes it alone
  In 1997, Sun finally dropped all pretense of support for XEmacs, and
Martin Buchholz left the company in November.  Since then (and for
most of the preceding year as well, because Steve Baur was never paid
to work on XEmacs), XEmacs has existed solely on the contributions of
volunteers from the free software community.  Starting in 1997, Hrvoje
Niksic and Kyle Jones have figured prominently in XEmacs development.

@cindex merging attempts
  Many attempts have been made to merge XEmacs and GNU Emacs, but they
have consistently failed.

  A more detailed history is contained in the XEmacs About page.

@node XEmacs From the Outside, The Lisp Language, A History of Emacs, Top
@chapter XEmacs From the Outside
@cindex read-eval-print

  XEmacs appears to the outside world as an editor, but it is really a
Lisp environment.  At its heart is a Lisp interpreter; it also
``happens'' to contain many specialized object types (e.g. buffers,
windows, frames, events) that are useful for implementing an editor.
Some of these objects (in particular windows and frames) have
displayable representations, and XEmacs provides a function
@code{redisplay()} that ensures that the display of all such objects
matches their internal state.  Most of the time, a standard Lisp
environment is in a @dfn{read-eval-print} loop---i.e. ``read some Lisp
code, execute it, and print the results''.  XEmacs has a similar loop:

@itemize @bullet
@item
read an event
@item
dispatch the event (i.e. ``do it'')
@item
redisplay
@end itemize

  Reading an event is done using the Lisp function @code{next-event},
which waits for something to happen (typically, the user presses a key
or moves the mouse) and returns an event object describing what
happened.  Dispatching an event is done using the Lisp function
@code{dispatch-event}, which looks up the event in a keymap object (a
particular kind of object that associates an event with a Lisp function)
and calls that function.  The function ``does'' what the user has
requested by changing the state of particular frame objects, buffer
objects, etc.  Finally, @code{redisplay()} is called, which updates the
display to reflect the changes just made.  Thus is an ``editor'' born.
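
  The loop above can be modelled in a few lines of C.  What follows is
a toy sketch for illustration only, not code from XEmacs.  The types
@code{event} and @code{command} and the functions @code{next_event},
@code{lookup_key}, @code{dispatch_event}, @code{redisplay}, and
@code{editor_loop} are hypothetical stand-ins.  The ``keymap'' is a
single hard-wired test, and the keystrokes come from a string rather
than from a window system.  The sketch shows the division of labor
described above: commands only change internal state, and the display
is brought up to date solely by the redisplay step at the end of each
pass through the loop.

@verbatim
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy model of the read/dispatch/redisplay loop.  Every name here is
   hypothetical; XEmacs itself uses the Lisp functions next-event and
   dispatch-event and the C function redisplay(), all operating on
   real Lisp objects rather than on these plain C stand-ins.  */

typedef struct { char key; } event;       /* stands in for an event object */
typedef void (*command) (const event *);  /* stands in for a Lisp function */

char buffer_text[64];                     /* stands in for a buffer object */
static size_t buffer_len;
static int quit_flag;
static int redisplay_count;
static const char *pending_keys;          /* simulated user input */

/* "Read an event": take the next simulated keystroke.  Returns 0 once
   the input is exhausted; a real editor would block here instead.  */
static int
next_event (event *ev)
{
  if (*pending_keys == '\0')
    return 0;
  ev->key = *pending_keys++;
  return 1;
}

/* Commands change internal state only; none of them draw anything.  */
static void
self_insert (const event *ev)
{
  if (buffer_len + 1 < sizeof buffer_text)
    {
      buffer_text[buffer_len++] = ev->key;
      buffer_text[buffer_len] = '\0';
    }
}

static void
quit_command (const event *ev)
{
  (void) ev;
  quit_flag = 1;
}

/* A trivial "keymap": 'q' quits and every other key inserts itself.  */
static command
lookup_key (const event *ev)
{
  return ev->key == 'q' ? quit_command : self_insert;
}

/* "Dispatch the event": find its binding and call it.  */
static void
dispatch_event (const event *ev)
{
  command cmd = lookup_key (ev);
  assert (cmd != NULL);
  cmd (ev);
}

/* "Redisplay": make the display (here, stdout) match internal state.  */
static void
redisplay (void)
{
  redisplay_count++;
  printf ("[%s]\n", buffer_text);
}

/* Run the loop over KEYS with fresh state; return how many times
   redisplay ran.  */
int
editor_loop (const char *keys)
{
  event ev;

  memset (buffer_text, 0, sizeof buffer_text);
  buffer_len = 0;
  quit_flag = 0;
  redisplay_count = 0;
  pending_keys = keys;

  while (!quit_flag && next_event (&ev))
    {
      dispatch_event (&ev);
      redisplay ();
    }
  return redisplay_count;
}
```
@end verbatim

With this sketch, @code{editor_loop ("abcqxyz")} prints @samp{[a]},
@samp{[ab]}, @samp{[abc]}, and then @samp{[abc]} again after the
@kbd{q} command runs, and it returns 4.  The keys after @kbd{q} are
never read.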

@cindex bridge, playing
@cindex taxes, doing
@cindex pi, calculating
  Note that you do not have to use XEmacs as an editor; you could just
as well make it do your taxes, compute pi, play bridge, etc.  You'd just
have to write functions to do those operations in Lisp.

@node The Lisp Language, XEmacs From the Perspective of Building, XEmacs From the Outside, Top
@chapter The Lisp Language
@cindex Lisp vs. C
@cindex C vs. Lisp
@cindex Lisp vs. Java
@cindex Java vs. Lisp
@cindex dynamic scoping
@cindex scoping, dynamic
@cindex dynamic types
@cindex types, dynamic
@cindex Java
@cindex Common Lisp
@cindex Gosling, James

  Lisp is a general-purpose language that is higher-level than C and in
many ways more powerful than C.  Powerful dialects of Lisp such as
Common Lisp are probably much better languages for writing very large
applications than is C.  (Unfortunately, for many non-technical
reasons, C and its successor C++ have become the dominant languages for
application development.  These languages are both inadequate for
extremely large applications, as is evidenced by the fact that newer,
larger programs are becoming ever harder to write and are requiring ever
more programmers despite great improvements in C development
environments, and by the fact that, although hardware speeds and
reliability have been growing at an exponential rate, most software is
still generally considered to be slow and buggy.)

  The new Java language holds promise as a better general-purpose
development language than C.  Java has many features in common with
Lisp that are not shared by C (this is not a coincidence, since
Java was designed by James Gosling, a former Lisp hacker).  This
will be discussed more later.

For those used to C, here is a summary of the basic differences between
C and Lisp:

@enumerate
@item
Lisp has an extremely regular syntax.  Every function, expression,
and control statement is written in the form

@example
   (@var{func} @var{arg1} @var{arg2} ...)
@end example

This is as opposed to C, which writes function calls as

@example
   func(@var{arg1}, @var{arg2}, ...)
@end example

but writes expressions involving operators as (e.g.)

@example
   @var{arg1} + @var{arg2}
@end example

and writes control statements as (e.g.)

@example
   while (@var{expr}) @{ @var{statement1}; @var{statement2}; ... @}
@end example

Lisp equivalents of the latter two would be

@example
   (+ @var{arg1} @var{arg2} ...)
@end example

and

@example
   (while @var{expr} @var{statement1} @var{statement2} ...)
@end example

@item
Lisp is a safe language.  Assuming there are no bugs in the Lisp
interpreter/compiler, it is impossible to write a program that ``core
dumps'' or otherwise causes the machine to execute an illegal
instruction.  This is very different from C, where perhaps the most
common outcome of a bug is exactly such a crash.  A corollary of this is
that the C operation of casting a pointer is impossible (and
unnecessary) in Lisp, and that it is impossible to access memory outside
the bounds of an array.

@item
Programs and data are written in the same form.  The
parenthesis-enclosing form described above for statements is the same
form used for the most common data type in Lisp, the list.  Thus, it is
possible to represent any Lisp program using Lisp data types, and for
one program to construct Lisp statements and then dynamically
@dfn{evaluate} them, or cause them to execute.

@item
All objects are @dfn{dynamically typed}.  This means that part of every
object is an indication of what type it is.  A Lisp program can
manipulate an object without knowing what type it is, and can query an
object to determine its type.  Correspondingly, variables and function
parameters can hold objects of any type and are not normally declared
as being of any particular type.  This is in contrast to the
@dfn{static typing} of C, where variables can hold exactly one type of
object and must be declared as such, and objects do not contain an
indication of their type because it's implicit in the variables they
are stored in.  It is possible in C to have a variable hold different
types of objects (e.g. through the use of @code{void *} pointers or
variable-argument functions), but the type information must then be
passed explicitly in some other fashion, leading to additional program
complexity.
 841
 842 @item
 843 Allocated memory is automatically reclaimed when it is no longer in use.
 844 This operation is called @dfn{garbage collection} and involves looking
 845 through all variables to see what memory is being pointed to, and
 846 reclaiming any memory that is not pointed to and is thus
 847 ``inaccessible'' and out of use.  This is as opposed to C, in which
 848 allocated memory must be explicitly reclaimed using @code{free()}.  If
 849 you simply drop all pointers to memory without freeing it, it becomes
 850 ``leaked'' memory that still takes up space.  Over a long period of
 851 time, this can cause your program to grow and grow until it runs out of
 852 memory.
 853
 854 @item
 855 Lisp has built-in facilities for handling errors and exceptions.  In C,
 856 when an error occurs, usually either the program exits entirely or the
 857 routine in which the error occurs returns a value indicating this.  If
an error occurs in a deeply-nested routine, then every routine currently
on the call stack must return normally and pass an error value back up
to its caller.  This means that every routine must explicitly check
 861 for an error in all the routines it calls; if it does not do so,
 862 unexpected and often random behavior results.  This is an extremely
 863 common source of bugs in C programs.  An alternative would be to do a
 864 non-local exit using @code{longjmp()}, but that is often very dangerous
 865 because the routines that were exited past had no opportunity to clean
 866 up after themselves and may leave things in an inconsistent state,
 867 causing a crash shortly afterwards.
 868
 869 Lisp provides mechanisms to make such non-local exits safe.  When an
 870 error occurs, a routine simply signals that an error of a particular
 871 class has occurred, and a non-local exit takes place.  Any routine can
 872 trap errors occurring in routines it calls by registering an error
 873 handler for some or all classes of errors. (If no handler is registered,
 874 a default handler, generally installed by the top-level event loop, is
 875 executed; this prints out the error and continues.) Routines can also
 876 specify cleanup code (called an @dfn{unwind-protect}) that will be
 877 called when control exits from a block of code, no matter how that exit
 878 occurs---i.e. even if a function deeply nested below it causes a
 879 non-local exit back to the top level.
 880
Note that a similar facility (structured exception handling) is
available as a non-standard extension in some C compilers, in particular
Visual C++ and other PC compilers written for the Microsoft Win32 API.
 884
 885 @item
 886 In Emacs Lisp, local variables are @dfn{dynamically scoped}.  This means
 887 that if you declare a local variable in a particular function, and then
 888 call another function, that subfunction can ``see'' the local variable
you declared.  This is now widely considered a design flaw in Emacs Lisp
and in many other early dialects of Lisp, and it was corrected in Common
Lisp. (In
 891 Common Lisp, you can still declare dynamically scoped variables if you
 892 want to---they are sometimes useful---but variables by default are
 893 @dfn{lexically scoped} as in C.)
 894 @end enumerate
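
The safe non-local exits and dynamic scoping described above can be modelled in C with a stack of saved bindings and cleanup records, combined with @code{setjmp()} and @code{longjmp()}.  The following is a minimal sketch under that assumption, not the XEmacs implementation: every name in it (@code{spec_bind}, @code{record_unwind}, @code{unbind_to}, @code{throw_to_catch}, @code{run_with_catch}, and the example variable @code{case_fold}) is hypothetical, and bindings are limited to @code{int} variables to keep it short.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical binding stack: each entry either saves the old value of a
   dynamically bound variable or records a cleanup (an unwind-protect). */
typedef void (*cleanup_fn)(void *arg);

struct spec_entry {
  int *var;            /* bound variable, or NULL for a cleanup entry */
  int old_value;       /* value to restore when the binding is undone */
  cleanup_fn cleanup;  /* cleanup to run when var is NULL */
  void *arg;
};

#define SPEC_MAX 64
static struct spec_entry spec_stack[SPEC_MAX];
static size_t spec_depth = 0;

/* Dynamically bind *VAR to VALUE; every callee sees the new value. */
static void spec_bind(int *var, int value) {
  assert(spec_depth < SPEC_MAX);
  spec_stack[spec_depth++] = (struct spec_entry){ var, *var, NULL, NULL };
  *var = value;
}

/* Arrange for FN(ARG) to run however control leaves the current block. */
static void record_unwind(cleanup_fn fn, void *arg) {
  assert(spec_depth < SPEC_MAX);
  spec_stack[spec_depth++] = (struct spec_entry){ NULL, 0, fn, arg };
}

/* Pop entries down to DEPTH, newest first, restoring values and running
   cleanups.  Both normal returns and non-local exits go through here. */
static void unbind_to(size_t depth) {
  while (spec_depth > depth) {
    struct spec_entry *e = &spec_stack[--spec_depth];
    if (e->var)
      *e->var = e->old_value;
    else
      e->cleanup(e->arg);
  }
}

/* A catch point remembers how deep the stack was when it was set up. */
struct catch_point {
  jmp_buf jmp;
  size_t depth;
};
static struct catch_point *current_catch = NULL;

/* Signal: unwind safely down to the catch point, then jump there. */
static void throw_to_catch(void) {
  assert(current_catch != NULL);
  unbind_to(current_catch->depth);
  longjmp(current_catch->jmp, 1);
}

/* Run BODY; return 1 if it exited non-locally, 0 if it returned. */
static int run_with_catch(void (*body)(void)) {
  struct catch_point cp;
  struct catch_point *outer_catch = current_catch;
  cp.depth = spec_depth;
  current_catch = &cp;
  if (setjmp(cp.jmp) == 0) {
    body();
    current_catch = outer_catch;
    return 0;
  }
  current_catch = outer_catch;
  return 1;
}

/* Example: OUTER binds a (hypothetical) global that INNER reads. */
static int case_fold = 0;
static int cleanups_run = 0;

static void count_cleanup(void *arg) { (void)arg; cleanups_run++; }

static void inner(void) {
  if (case_fold)           /* sees the binding made by its caller */
    throw_to_catch();
}

static void outer(void) {
  size_t count = spec_depth;
  spec_bind(&case_fold, 1);
  record_unwind(count_cleanup, NULL);
  inner();
  unbind_to(count);        /* normal exit path */
}
```

Because @code{throw_to_catch} unwinds the stack before calling @code{longjmp()}, calling @code{run_with_catch (outer)} runs the registered cleanup and restores @code{case_fold} to 0 even though @code{outer} never reaches its own @code{unbind_to} call.  That unwinding step is what distinguishes a Lisp-style non-local exit from a raw @code{longjmp()}.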
 895
 896 For those familiar with Lisp, Emacs Lisp is modelled after MacLisp, an
 897 early dialect of Lisp developed at MIT (no relation to the Macintosh
 898 computer).  There is a Common Lisp compatibility package available for
 899 Emacs that provides many of the features of Common Lisp.
 900
 901 The Java language is derived in many ways from C, and shares a similar
 902 syntax, but has the following features in common with Lisp (and different
 903 from C):
 904
 905 @enumerate
 906 @item
 907 Java is a safe language, like Lisp.
 908 @item
 909 Java provides garbage collection, like Lisp.
 910 @item
 911 Java has built-in facilities for handling errors and exceptions, like
 912 Lisp.
 913 @item
 914 Java has a type system that combines the best advantages of both static
 915 and dynamic typing.  Objects (except very simple types) are explicitly
 916 marked with their type, as in dynamic typing; but there is a hierarchy
 917 of types and functions are declared to accept only certain types, thus
 918 providing the increased compile-time error-checking of static typing.
 919 @end enumerate
 920
 921 The Java language also has some negative attributes:
 922
 923 @enumerate
 924 @item
 925 Java uses the edit/compile/run model of software development.  This
 926 makes it hard to use interactively.  For example, to use Java like
@code{bc}, it is necessary to write a special-purpose, albeit tiny,
application.  In Emacs Lisp, a calculator comes built in without any
effort; one can always just type an expression in the @code{*scratch*}
 930 buffer.
 931 @item
 932 Java tries too hard to enforce, not merely enable, portability, making
 933 ordinary access to standard OS facilities painful.  Java has an
 934 @dfn{agenda}.  I think this is why @code{chdir} is not part of standard
 935 Java, which is inexcusable.
 936 @end enumerate
 937
 938 Unfortunately, there is no perfect language.  Static typing allows a
 939 compiler to catch programmer errors and produce more efficient code, but
makes programming more tedious and less fun.  For the foreseeable future,
 941 an Ideal Editing and Programming Environment (and that is what XEmacs
 942 aspires to) will be programmable in multiple languages: high level ones
 943 like Lisp for user customization and prototyping, and lower level ones
 944 for infrastructure and industrial strength applications.  If I had my
way, XEmacs would be friendly towards the Python, Scheme, C++, ML,
and other communities.  But there are serious technical difficulties to
 947 achieving that goal.
 948
 949 The word @dfn{application} in the previous paragraph was used
 950 intentionally.  XEmacs implements an API for programs written in Lisp
 951 that makes it a full-fledged application platform, very much like an OS
 952 inside the real OS.
 953
 954 @node XEmacs From the Perspective of Building, XEmacs From the Inside, The Lisp Language, Top
 955 @chapter XEmacs From the Perspective of Building
 956
 957 The heart of XEmacs is the Lisp environment, which is written in C.
 958 This is contained in the @file{src/} subdirectory.  Underneath
 959 @file{src/} are two subdirectories of header files: @file{s/} (header
 960 files for particular operating systems) and @file{m/} (header files for
 961 particular machine types).  In practice the distinction between the two
 962 types of header files is blurred.  These header files define or undefine
 963 certain preprocessor constants and macros to indicate particular
 964 characteristics of the associated machine or operating system.  As part
of the configure process, one @file{s/} file and one @file{m/} file are
identified for the particular environment in which XEmacs is being
 967 built.
 968
 969 XEmacs also contains a great deal of Lisp code.  This implements the
operations that make XEmacs useful as an editor rather than just a Lisp
environment, and also contains many add-on packages that allow XEmacs to
 972 browse directories, act as a mail and Usenet news reader, compile Lisp
 973 code, etc.  There is actually more Lisp code than C code associated with
 974 XEmacs, but much of the Lisp code is peripheral to the actual operation
 975 of the editor.  The Lisp code all lies in subdirectories underneath the
 976 @file{lisp/} directory.
 977
 978 The @file{lwlib/} directory contains C code that implements a
 979 generalized interface onto different X widget toolkits and also
 980 implements some widgets of its own that behave like Motif widgets but
 981 are faster, free, and in some cases more powerful.  The code in this
 982 directory compiles into a library and is mostly independent from XEmacs.
 983
 984 The @file{etc/} directory contains various data files associated with
 985 XEmacs.  Some of them are actually read by XEmacs at startup; others
 986 merely contain useful information of various sorts.
 987
 988 The @file{lib-src/} directory contains C code for various auxiliary
 989 programs that are used in connection with XEmacs.  Some of them are used
 990 during the build process; others are used to perform certain functions
 991 that cannot conveniently be placed in the XEmacs executable (e.g. the
 992 @file{movemail} program for fetching mail out of @file{/var/spool/mail},
 993 which must be setgid to @file{mail} on many systems; and the
 994 @file{gnuclient} program, which allows an external script to communicate
 995 with a running XEmacs process).
 996
 997 The @file{man/} directory contains the sources for the XEmacs
 998 documentation.  It is mostly in a form called Texinfo, which can be
 999 converted into either a printed document (by passing it through @TeX{})
1000 or into on-line documentation called @dfn{info files}.
1001
1002 The @file{info/} directory contains the results of formatting the XEmacs
1003 documentation as @dfn{info files}, for on-line use.  These files are
1004 used when you enter the Info system using @kbd{C-h i} or through the
1005 Help menu.
1006
1007 The @file{dynodump/} directory contains auxiliary code used to build
1008 XEmacs on Solaris platforms.
1009
1010 The other directories contain various miscellaneous code and information
1011 that is not normally used or needed.
1012
1013 The first step of building involves running the @file{configure} program
1014 and passing it various parameters to specify any optional features you
1015 want and compiler arguments and such, as described in the @file{INSTALL}
1016 file.  This determines what the build environment is, chooses the
appropriate @file{s/} and @file{m/} files, and runs a series of tests to
determine many details about your environment, such as which library
functions are available and exactly how they work.  Running these tests
allows XEmacs to be compiled on a much
1021 wider variety of platforms than those that the XEmacs developers happen
1022 to be familiar with, including various sorts of hybrid platforms.  This
1023 is especially important now that many operating systems give you a great
1024 deal of control over exactly what features you want installed, and allow
1025 for easy upgrading of parts of a system without upgrading the rest.  It
1026 would be impossible to pre-determine and pre-specify the information for
1027 all possible configurations.
1028
1029 In fact, the @file{s/} and @file{m/} files are basically @emph{evil},
1030 since they contain unmaintainable platform-specific hard-coded
1031 information.  XEmacs has been moving in the direction of having all
1032 system-specific information be determined dynamically by
1033 @file{configure}.  Perhaps someday we can @code{rm -rf src/s src/m}.
1034
1035 When configure is done running, it generates @file{Makefile}s and
1036 @file{GNUmakefile}s and the file @file{src/config.h} (which describes
1037 the features of your system) from template files.  You then run
1038 @file{make}, which compiles the auxiliary code and programs in
1039 @file{lib-src/} and @file{lwlib/} and the main XEmacs executable in
1040 @file{src/}.  The result of compiling and linking is an executable
1041 called @file{temacs}, which is @emph{not} the final XEmacs executable.
1042 @file{temacs} by itself is not intended to function as an editor or even
1043 display any windows on the screen, and if you simply run it, it will
1044 exit immediately.  The @file{Makefile} runs @file{temacs} with certain
1045 options that cause it to initialize itself, read in a number of basic
1046 Lisp files, and then dump itself out into a new executable called
1047 @file{xemacs}.  This new executable has been pre-initialized and
1048 contains pre-digested Lisp code that is necessary for the editor to
1049 function (this includes most basic editing functions,
1050 e.g. @code{kill-line}, that can be defined in terms of other Lisp
1051 primitives; some initialization code that is called when certain
1052 objects, such as frames, are created; and all of the standard
1053 keybindings and code for the actions they result in).  This executable,
1054 @file{xemacs}, is the executable that you run to use the XEmacs editor.
1055
Although @file{temacs} is not intended to be run as an editor, it can be,
using the incantation @code{temacs -batch -l loadup.el run-temacs}.
This is useful when the dumping procedure described above is broken, or
when using certain program debugging tools such as Purify.  These tools
are easily confused by the tricks played by the XEmacs build process,
such as allocating memory in one process and freeing it in the next.
1062
1063 @node XEmacs From the Inside, The XEmacs Object System (Abstractly Speaking), XEmacs From the Perspective of Building, Top
1064 @chapter XEmacs From the Inside
1065
1066 Internally, XEmacs is quite complex, and can be very confusing.  To
1067 simplify things, it can be useful to think of XEmacs as containing an
1068 event loop that ``drives'' everything, and a number of other subsystems,
1069 such as a Lisp engine and a redisplay mechanism.  Each of these other
1070 subsystems exists simultaneously in XEmacs, and each has a certain
1071 state.  The flow of control continually passes in and out of these
1072 different subsystems in the course of normal operation of the editor.
1073
1074 It is important to keep in mind that, most of the time, the editor is
1075 ``driven'' by the event loop.  Except during initialization and batch
1076 mode, all subsystems are entered directly or indirectly through the
1077 event loop, and ultimately, control exits out of all subsystems back up
1078 to the event loop.  This cycle of entering a subsystem, exiting back out
1079 to the event loop, and starting another iteration of the event loop
1080 occurs once each keystroke, mouse motion, etc.
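
A drastically simplified sketch of this cycle in C follows.  The event types and the functions @code{dispatch_event}, @code{redisplay}, and @code{command_loop} are hypothetical stand-ins, and the assumption that redisplay runs once after every event is a simplification; a real command loop is considerably more elaborate.

```c
#include <stddef.h>

/* Hypothetical event types and subsystem hooks, for illustration only. */
enum event_type { EV_KEY, EV_MOUSE_MOTION };

struct event {
  enum event_type type;
  int key;                 /* meaningful only for EV_KEY */
};

static int commands_run = 0;
static int redisplays = 0;

/* Enter the relevant subsystem; it does its work and returns here. */
static void dispatch_event(const struct event *ev) {
  if (ev->type == EV_KEY)
    commands_run++;        /* stand-in for running the key's command */
}

/* Stand-in for bringing the display up to date. */
static void redisplay(void) { redisplays++; }

/* One iteration per event: dispatch it, then let redisplay catch up. */
static void command_loop(const struct event *events, size_t n) {
  for (size_t i = 0; i < n; i++) {
    dispatch_event(&events[i]);
    redisplay();
  }
}
```

Given three events of which two are keystrokes, @code{command_loop} runs two commands and three redisplays: control enters a subsystem and returns to the loop once per event.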
1081
1082 If you're trying to understand a particular subsystem (other than the
1083 event loop), think of it as a ``daemon'' process or ``servant'' that is
1084 responsible for one particular aspect of a larger system, and
1085 periodically receives commands or environment changes that cause it to
1086 do something.  Ultimately, these commands and environment changes are
1087 always triggered by the event loop.  For example:
1088
1089 @itemize @bullet
1090 @item
1091 The window and frame mechanism is responsible for keeping track of what
1092 windows and frames exist, what buffers are in them, etc.  It is
1093 periodically given commands (usually from the user) to make a change to
1094 the current window/frame state: i.e. create a new frame, delete a
1095 window, etc.
1096
1097 @item
1098 The buffer mechanism is responsible for keeping track of what buffers
1099 exist and what text is in them.  It is periodically given commands
1100 (usually from the user) to insert or delete text, create a buffer, etc.
1101 When it receives a text-change command, it notifies the redisplay
1102 mechanism.
1103
1104 @item
1105 The redisplay mechanism is responsible for making sure that windows and
1106 frames are displayed correctly.  It is periodically told (by the event
1107 loop) to actually ``do its job'', i.e. snoop around and see what the
1108 current state of the environment (mostly of the currently-existing
1109 windows, frames, and buffers) is, and make sure that that state matches
1110 what's actually displayed.  It keeps lots and lots of information around
1111 (such as what is actually being displayed currently, and what the
1112 environment was last time it checked) so that it can minimize the work
1113 it has to do.  It is also helped along in that whenever a relevant
1114 change to the environment occurs, the redisplay mechanism is told about
1115 this, so it has a pretty good idea of where it has to look to find
1116 possible changes and doesn't have to look everywhere.
1117
1118 @item
1119 The Lisp engine is responsible for executing the Lisp code in which most
1120 user commands are written.  It is entered through a call to @code{eval}
1121 or @code{funcall}, which occurs as a result of dispatching an event from
1122 the event loop.  The functions it calls issue commands to the buffer
1123 mechanism, the window/frame subsystem, etc.
1124
1125 @item
1126 The Lisp allocation subsystem is responsible for keeping track of Lisp
1127 objects.  It is given commands from the Lisp engine to allocate objects,
1128 garbage collect, etc.
1129 @end itemize
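
The notification from the buffer mechanism to the redisplay mechanism can be illustrated with a small C sketch.  It assumes, purely for illustration, that a buffer is a fixed-size character array and that the notification consists of widening a single ``changed region''; the names @code{note_change}, @code{buffer_insert}, and @code{redisplay_buffer} are hypothetical and do not describe the actual XEmacs data structures.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical buffer that records which region changed since the last
   redisplay, so that redisplay knows where to look. */
#define BUF_MAX 256

struct buffer {
  char text[BUF_MAX];
  int len;
  int dirty_start, dirty_end;   /* changed region [start, end); -1 = none */
};

/* Buffer -> redisplay notification: widen the changed region. */
static void note_change(struct buffer *b, int start, int end) {
  if (b->dirty_start < 0 || start < b->dirty_start)
    b->dirty_start = start;
  if (end > b->dirty_end)
    b->dirty_end = end;
}

/* Insert S at 0-based position POS and notify redisplay. */
static void buffer_insert(struct buffer *b, int pos, const char *s) {
  int n = (int)strlen(s);
  assert(pos >= 0 && pos <= b->len && b->len + n < BUF_MAX);
  /* Shift the tail, including the terminating NUL, right by N. */
  memmove(b->text + pos + n, b->text + pos, (size_t)(b->len - pos + 1));
  memcpy(b->text + pos, s, (size_t)n);
  b->len += n;
  /* Text after POS moved, so an already-recorded region moves too. */
  if (b->dirty_start >= 0) {
    if (b->dirty_start >= pos) b->dirty_start += n;
    if (b->dirty_end > pos) b->dirty_end += n;
  }
  note_change(b, pos, pos + n);
}

/* Redisplay looks only at the recorded region, then forgets it.
   Returns how many characters it had to examine. */
static int redisplay_buffer(struct buffer *b) {
  int examined = b->dirty_start < 0 ? 0 : b->dirty_end - b->dirty_start;
  b->dirty_start = b->dirty_end = -1;
  return examined;
}
```

After inserting @samp{,} at position 5 and then @samp{>> } at position 0 of @samp{hello world}, redisplay has to examine only the first nine characters rather than the whole buffer, and a second redisplay with no intervening change examines nothing.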
1130
1131 etc.
1132
1133   The important idea here is that there are a number of independent
1134 subsystems each with its own responsibility and persistent state, just
1135 like different employees in a company, and each subsystem is
1136 periodically given commands from other subsystems.  Commands can flow
1137 from any one subsystem to any other, but there is usually some sort of
1138 hierarchy, with all commands originating from the event subsystem.
1139
1140   XEmacs is entered in @code{main()}, which is in @file{emacs.c}.  When
1141 this is called the first time (in a properly-invoked @file{temacs}), it
1142 does the following:
1143
1144 @enumerate
1145 @item
1146 It does some very basic environment initializations, such as determining
1147 where it and its directories (e.g. @file{lisp/} and @file{etc/}) reside
1148 and setting up signal handlers.
1149 @item
1150 It initializes the entire Lisp interpreter.
1151 @item
1152 It sets the initial values of many built-in variables (including many
1153 variables that are visible to Lisp programs), such as the global keymap
1154 object and the built-in faces (a face is an object that describes the
1155 display characteristics of text).  This involves creating Lisp objects
1156 and thus is dependent on step (2).
1157 @item
1158 It performs various other initializations that are relevant to the
1159 particular environment it is running in, such as retrieving environment
1160 variables, determining the current date and the user who is running the
1161 program, examining its standard input, creating any necessary file
1162 descriptors, etc.
1163 @item
1164 At this point, the C initialization is complete.  A Lisp program that
1165 was specified on the command line (usually @file{loadup.el}) is called
1166 (temacs is normally invoked as @code{temacs -batch -l loadup.el dump}).
1167 @file{loadup.el} loads all of the other Lisp files that are needed for
1168 the operation of the editor, calls the @code{dump-emacs} function to
1169 write out @file{xemacs}, and then kills the temacs process.
1170 @end enumerate
1171
1172   When @file{xemacs} is then run, it only redoes steps (1) and (4)
1173 above; all variables already contain the values they were set to when
1174 the executable was dumped, and all memory that was allocated with
1175 @code{malloc()} is still around. (XEmacs knows whether it is being run
1176 as @file{xemacs} or @file{temacs} because it sets the global variable
1177 @code{initialized} to 1 after step (4) above.) At this point,
1178 @file{xemacs} calls a Lisp function to do any further initialization,
which includes parsing the command line (the C code can only do limited
1180 command-line parsing, which includes looking for the @samp{-batch} and
1181 @samp{-l} flags and a few other flags that it needs to know about before
1182 initialization is complete), creating the first frame (or @dfn{window}
1183 in standard window-system parlance), running the user's init file
1184 (usually the file @file{.emacs} in the user's home directory), etc.  The
1185 function to do this is usually called @code{normal-top-level};
1186 @file{loadup.el} tells the C code about this function by setting its
1187 name as the value of the Lisp variable @code{top-level}.
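
The effect of @code{initialized} on startup can be sketched in C.  The function names below are hypothetical stand-ins for the steps listed above, and calling @code{emacs_startup} twice in one process only imitates running the dumped @file{xemacs}, whose static data was saved with @code{initialized} already set to 1.

```c
/* Hypothetical sketch of the temacs/xemacs startup split.  In a dumped
   executable, static data such as `initialized' keeps the value it had
   when the image was written out, so steps 2 and 3 are skipped. */
static int initialized = 0;
static int lisp_setups = 0, env_setups = 0;

static void init_basic_environment(void) { env_setups++; }   /* step 1 */
static void init_lisp_and_variables(void) { lisp_setups++; } /* steps 2-3 */
static void init_runtime_environment(void) { env_setups++; } /* step 4 */

static void emacs_startup(void) {
  init_basic_environment();
  if (!initialized)
    init_lisp_and_variables();   /* only temacs needs this */
  init_runtime_environment();
  initialized = 1;               /* set after step 4 */
}
```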
1188
1189   When the Lisp initialization code is done, the C code enters the event
1190 loop, and stays there for the duration of the XEmacs process.  The code
1191 for the event loop is contained in @file{keyboard.c}, and is called
1192 @code{Fcommand_loop_1()}.  Note that this event loop could very well be
1193 written in Lisp, and in fact a Lisp version exists; but apparently,
1194 doing this makes XEmacs run noticeably slower.
1195
1196   Notice how much of the initialization is done in Lisp, not in C.
1197 In general, XEmacs tries to move as much code as is possible
1198 into Lisp.  Code that remains in C is code that implements the
1199 Lisp interpreter itself, or code that needs to be very fast, or
1200 code that needs to do system calls or other such stuff that
1201 needs to be done in C, or code that needs to have access to
1202 ``forbidden'' structures. (One conscious aspect of the design of
1203 Lisp under XEmacs is a clean separation between the external
1204 interface to a Lisp object's functionality and its internal
1205 implementation.  Part of this design is that Lisp programs
1206 are forbidden from accessing the contents of the object other
1207 than through using a standard API.  In this respect, XEmacs Lisp
1208 is similar to modern Lisp dialects but differs from GNU Emacs,
1209 which tends to expose the implementation and allow Lisp
1210 programs to look at it directly.  The major advantage of
1211 hiding the implementation is that it allows the implementation
1212 to be redesigned without affecting any Lisp programs, including
1213 those that might want to be ``clever'' by looking directly at
1214 the object's contents and possibly manipulating them.)
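
In C, a similar separation between interface and implementation is commonly obtained with an incomplete (opaque) struct type.  The sketch below is illustrative only: the @code{span} type and its functions are invented for this example and are not part of XEmacs.  In a real program the declarations would live in a header file and the struct definition in a separate @file{.c} file, so callers could not see the fields at all; both halves are shown together here so the block compiles on its own.

```c
#include <stdlib.h>

/* --- Public interface: callers see only an incomplete type. --- */
typedef struct span span;
span *span_new(long start, long end);
long span_start(const span *s);
long span_end(const span *s);
void span_free(span *s);

/* --- Private implementation: the layout can change freely. ---
   This version stores a start and a length; switching to a start/end
   pair would not require any change to callers. */
struct span {
  long start;
  long length;
};

span *span_new(long start, long end) {
  span *s = malloc(sizeof *s);
  if (s) {
    s->start = start;
    s->length = end - start;
  }
  return s;
}

long span_start(const span *s) { return s->start; }
long span_end(const span *s) { return s->start + s->length; }
void span_free(span *s) { free(s); }
```

Callers use only @code{span_new}, @code{span_start}, @code{span_end}, and @code{span_free}, so storing the end position instead of the length later would not break them.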
1215
1216   Moving code into Lisp makes the code easier to debug and maintain and
1217 makes it much easier for people who are not XEmacs developers to
1218 customize XEmacs, because they can make a change with much less chance
1219 of obscure and unwanted interactions occurring than if they were to
1220 change the C code.
1221
1222 @node The XEmacs Object System (Abstractly Speaking), How Lisp Objects Are Represented in C, XEmacs From the Inside, Top
1223 @chapter The XEmacs Object System (Abstractly Speaking)
1224
1225   At the heart of the Lisp interpreter is its management of objects.
1226 XEmacs Lisp contains many built-in objects, some of which are
1227 simple and others of which can be very complex; and some of which
1228 are very common, and others of which are rarely used or are only
1229 used internally. (Since the Lisp allocation system, with its
1230 automatic reclamation of unused storage, is so much more convenient
1231 than @code{malloc()} and @code{free()}, the C code makes extensive use of it
1232 in its internal operations.)
1233
1234   The basic Lisp objects are
1235
1236 @table @code
1237 @item integer
28 or 31 bits of precision, or 60 or 63 bits on 64-bit machines; the
reason for this is explained below, in the description of the internal
representation of Lisp objects.
1241 @item float
1242 Same precision as a double in C.
1243 @item cons
1244 A simple container for two Lisp objects, used to implement lists and
1245 most other data structures in Lisp.
1246 @item char
1247 An object representing a single character of text; chars behave like
1248 integers in many ways but are logically considered text rather than
numbers and have a different read syntax. (The read syntax for a char
1250 contains the char itself or some textual encoding of it---for example,
1251 a Japanese Kanji character might be encoded as @samp{^[$(B#&^[(B} using the
1252 ISO-2022 encoding standard---rather than the numerical representation
1253 of the char; this way, if the mapping between chars and integers
1254 changes, which is quite possible for Kanji characters and other extended
1255 characters, the same character will still be created.  Note that some
1256 primitives confuse chars and integers.  The worst culprit is @code{eq},
1257 which makes a special exception and considers a char to be @code{eq} to
1258 its integer equivalent, even though in no other case are objects of two
1259 different types @code{eq}.  The reason for this monstrosity is
1260 compatibility with existing code; the separation of char from integer
1261 came fairly recently.)
1262 @item symbol
1263 An object that contains Lisp objects and is referred to by name;
1264 symbols are used to implement variables and named functions
1265 and to provide the equivalent of preprocessor constants in C.
1266 @item vector
1267 A one-dimensional array of Lisp objects providing constant-time access
1268 to any of the objects; access to an arbitrary object in a vector is
1269 faster than for lists, but the operations that can be done on a vector
1270 are more limited.
1271 @item string
1272 Self-explanatory; behaves much like a vector of chars
1273 but has a different read syntax and is stored and manipulated
1274 more compactly.
1275 @item bit-vector
1276 A vector of bits; similar to a string in spirit.
1277 @item compiled-function
1278 An object containing compiled Lisp code, known as @dfn{byte code}.
1279 @item subr
1280 A Lisp primitive, i.e. a Lisp-callable function implemented in C.
1281 @end table
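
The reduced integer precision comes from sharing one machine word between the value and a type tag.  The C sketch below shows the idea under assumptions that are not taken from XEmacs: a 32-bit word, a 4-bit tag held in the low bits, and a hypothetical tag value @code{TAG_INT}.  With @code{TAG_BITS} set to 4, integers get 28 bits; setting it to 1 would leave 31.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: the low TAG_BITS of a 32-bit word hold a type tag
   and the remaining VAL_BITS hold a signed integer. */
#define TAG_BITS 4                      /* 4 -> 28-bit ints; 1 -> 31-bit */
#define VAL_BITS (32 - TAG_BITS)
#define TAG_MASK ((1u << TAG_BITS) - 1)
#define TAG_INT 0x1u                    /* hypothetical integer tag */
#define FIXNUM_MAX ((int32_t)((1u << (VAL_BITS - 1)) - 1))
#define FIXNUM_MIN (-FIXNUM_MAX - 1)

typedef uint32_t lisp_word;

/* Pack V, which must fit in VAL_BITS bits, together with the tag. */
static lisp_word make_int(int32_t v) {
  assert(v >= FIXNUM_MIN && v <= FIXNUM_MAX);
  return ((uint32_t)v << TAG_BITS) | TAG_INT;
}

static unsigned word_tag(lisp_word w) { return w & TAG_MASK; }

/* Unpack: drop the tag bits, then sign-extend from VAL_BITS bits.
   Unsigned arithmetic avoids shifting negative signed values. */
static int32_t word_int(lisp_word w) {
  uint32_t u = w >> TAG_BITS;           /* VAL_BITS-bit two's complement */
  uint32_t sign = 1u << (VAL_BITS - 1);
  return (int32_t)(u ^ sign) - (int32_t)sign;
}
```

For example, @code{word_int (make_int (-7))} yields -7, and the largest value that survives the round trip is 134217727 (2 to the 27th power, minus 1).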
1282
1283 @cindex closure
1284 Note that there is no basic ``function'' type, as in more powerful
1285 versions of Lisp (where it's called a @dfn{closure}).  XEmacs Lisp does
1286 not provide the closure semantics implemented by Common Lisp and Scheme.
1287 The guts of a function in XEmacs Lisp are represented in one of four
1288 ways: a symbol specifying another function (when one function is an
1289 alias for another), a list (whose first element must be the symbol
1290 @code{lambda}) containing the function's source code, a
1291 compiled-function object, or a subr object. (In other words, given a
1292 symbol specifying the name of a function, calling @code{symbol-function}
1293 to retrieve the contents of the symbol's function cell will return one
1294 of these types of objects.)
1295
1296 XEmacs Lisp also contains numerous specialized objects used to implement
1297 the editor:
1298
1299 @table @code
1300 @item buffer
1301 Stores text like a string, but is optimized for insertion and deletion
1302 and has certain other properties that can be set.
1303 @item frame
1304 An object with various properties whose displayable representation is a
1305 @dfn{window} in window-system parlance.
1306 @item window
1307 A section of a frame that displays the contents of a buffer;
1308 often called a @dfn{pane} in window-system parlance.
1309 @item window-configuration
1310 An object that represents a saved configuration of windows in a frame.
1311 @item device
1312 An object representing a screen on which frames can be displayed;
1313 equivalent to a @dfn{display} in the X Window System and a @dfn{TTY} in
1314 character mode.
1315 @item face
1316 An object specifying the appearance of text or graphics; it has
1317 properties such as font, foreground color, and background color.
1318 @item marker
1319 An object that refers to a particular position in a buffer and moves
1320 around as text is inserted and deleted to stay in the same relative
1321 position to the text around it.
1322 @item extent
1323 Similar to a marker but covers a range of text in a buffer; can also
1324 specify properties of the text, such as a face in which the text is to
1325 be displayed, whether the text is invisible or unmodifiable, etc.
1326 @item event
1327 Generated by calling @code{next-event} and contains information
1328 describing a particular event happening in the system, such as the user
1329 pressing a key or a process terminating.
1330 @item keymap
1331 An object that maps from events (described using lists, vectors, and
1332 symbols rather than with an event object because the mapping is for
1333 classes of events, rather than individual events) to functions to
1334 execute or other events to recursively look up; the functions are
1335 described by name, using a symbol, or using lists to specify the
1336 function's code.
1337 @item glyph
An object that describes the appearance of an image (e.g.@: a pixmap) on
1339 the screen; glyphs can be attached to the beginning or end of extents
1340 and in some future version of XEmacs will be able to be inserted
1341 directly into a buffer.
1342 @item process
1343 An object that describes a connection to an externally-running process.
1344 @end table
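
How a marker ``stays in the same relative position'' can be sketched in C.  This is an illustration, not the XEmacs implementation: markers are represented here as a plain array of 0-based character offsets (Lisp buffer positions actually start at 1), and the function names are hypothetical.  Whether a marker sitting exactly at the insertion point stays before the new text or advances past it is a policy choice; this sketch keeps it before the new text.

```c
#include <stddef.h>

/* Hypothetical sketch: markers are 0-based character offsets.
   LEN characters were inserted at POS. */
static void adjust_markers_for_insert(long *markers, size_t n,
                                      long pos, long len) {
  for (size_t i = 0; i < n; i++)
    if (markers[i] > pos)        /* a marker exactly at POS stays put */
      markers[i] += len;
}

/* The characters in [FROM, TO) were deleted. */
static void adjust_markers_for_delete(long *markers, size_t n,
                                      long from, long to) {
  for (size_t i = 0; i < n; i++) {
    if (markers[i] >= to)
      markers[i] -= to - from;   /* after the deletion: shift left */
    else if (markers[i] > from)
      markers[i] = from;         /* inside it: collapse to the start */
  }
}
```

With markers at 2, 5, and 9, inserting three characters at position 5 moves only the last marker (to 12); deleting the characters in positions 1 through 5 then collapses the first two markers to 1 and shifts the third to 7.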
1345
1346   There are some other, less-commonly-encountered general objects:
1347
1348 @table @code
1349 @item hash-table
1350 An object that maps from an arbitrary Lisp object to another arbitrary
1351 Lisp object, using hashing for fast lookup.
1352 @item obarray
1353 A limited form of hash-table that maps from strings to symbols; obarrays
1354 are used to look up a symbol given its name and are not actually their
1355 own object type but are kludgily represented using vectors with hidden
1356 fields (this representation derives from GNU Emacs).
1357 @item specifier
1358 A complex object used to specify the value of a display property; a
1359 default value is given and different values can be specified for
1360 particular frames, buffers, windows, devices, or classes of device.
1361 @item char-table
1362 An object that maps from chars or classes of chars to arbitrary Lisp
1363 objects; internally char tables use a complex nested-vector
1364 representation that is optimized to the way characters are represented
1365 as integers.
1366 @item range-table
1367 An object that maps from ranges of integers to arbitrary Lisp objects.
1368 @end table
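
A range-table lookup can be sketched in C as a binary search over sorted ranges.  The conventions here are assumptions of the sketch rather than facts about XEmacs: ranges are half-open (@code{start <= x < end}), sorted, and non-overlapping, and a @code{const char *} stands in for an arbitrary Lisp object.  The function name @code{range_table_get} is hypothetical.

```c
#include <stddef.h>

/* Hypothetical range table entry: maps [start, end) to VALUE. */
struct range_entry {
  long start, end;      /* half-open: start <= x < end */
  const char *value;    /* stand-in for a Lisp object */
};

/* Binary search over N sorted, non-overlapping entries.
   Returns the value for X, or NULL if X falls in no range. */
static const char *range_table_get(const struct range_entry *tab,
                                   size_t n, long x) {
  size_t lo = 0, hi = n;
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (x < tab[mid].start)
      hi = mid;
    else if (x >= tab[mid].end)
      lo = mid + 1;
    else
      return tab[mid].value;
  }
  return NULL;
}
```

Looking up 15 in a table of @code{[0, 10)}, @code{[10, 20)}, and @code{[50, 60)} finds the second entry on the first probe, and a value such as 30 that falls in no range yields @code{NULL}.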
1369
1370   And some strange special-purpose objects:
1371
1372 @table @code
1373 @item charset
1374 @itemx coding-system
1375 Objects used when MULE, or multi-lingual/Asian-language, support is
1376 enabled.
1377 @item color-instance
1378 @itemx font-instance
1379 @itemx image-instance
1380 An object that encapsulates a window-system resource; instances are
1381 mostly used internally but are exposed on the Lisp level for cleanness
of the specifier model and because it's occasionally useful for Lisp
programs to create or query the properties of instances.
1384 @item subwindow
An object that encapsulates a @dfn{subwindow} resource, i.e. a
1386 window-system child window that is drawn into by an external process;
1387 this object should be integrated into the glyph system but isn't yet,
1388 and may change form when this is done.
1389 @item tooltalk-message
1390 @itemx tooltalk-pattern
1391 Objects that represent resources used in the ToolTalk interprocess
1392 communication protocol.
1393 @item toolbar-button
1394 An object used in conjunction with the toolbar.
1395 @end table
1396
1397   And objects that are only used internally:
1398
1399 @table @code
1400 @item opaque
1401 A generic object for encapsulating arbitrary memory; this allows you the
1402 generality of @code{malloc()} and the convenience of the Lisp object
1403 system.
1404 @item lstream
1405 A buffering I/O stream, used to provide a unified interface to anything
1406 that can accept output or provide input, such as a file descriptor, a
1407 stdio stream, a chunk of memory, a Lisp buffer, a Lisp string, etc.;
1408 it's a Lisp object to make its memory management more convenient.
1409 @item char-table-entry
1410 Subsidiary objects in the internal char-table representation.
1411 @item extent-auxiliary
1412 @itemx menubar-data
1413 @itemx toolbar-data
1414 Various special-purpose objects that are basically just used to
1415 encapsulate memory for particular subsystems, similar to the more
1416 general ``opaque'' object.
1417 @item symbol-value-forward
1418 @itemx symbol-value-buffer-local
1419 @itemx symbol-value-varalias
1420 @itemx symbol-value-lisp-magic
Special internal-only objects that are placed in the value cell of a
symbol to indicate that there is something special about this variable,
e.g. it has no value, it mirrors another variable, or it mirrors some C
variable.  There is really only one kind of object, called a
@dfn{symbol-value-magic}, but it is sort of halfway kludged into
semi-different object types.
1427 @end table
1428
1429 @cindex permanent objects
1430 @cindex temporary objects
  Some types of objects are @dfn{permanent}, meaning that once created,
they do not disappear until explicitly destroyed, using a function such
as @code{kill-buffer}, @code{delete-window}, @code{delete-frame}, etc.
Others disappear once they are no longer used, through the garbage
collection mechanism.  Buffers, frames, windows, devices, and processes
are among the objects that are permanent.  Note that some objects can go
both ways: faces can be created either way; extents are normally
permanent, but detached extents (extents not referring to any text, as
happens to some extents when the text they are referring to is deleted)
are temporary.  Note that some permanent objects, such as faces and
coding systems, cannot be deleted.  Note also that windows are unique in
that they can be @emph{undeleted} after having previously been
deleted.  (This happens as a result of restoring a window configuration.)
1444
1445 @cindex read syntax
1446   Note that many types of objects have a @dfn{read syntax}, i.e. a way of
1447 specifying an object of that type in Lisp code.  When you load a Lisp
1448 file, or type in code to be evaluated, what really happens is that the
1449 function @code{read} is called, which reads some text and creates an object
1450 based on the syntax of that text; then @code{eval} is called, which
1451 possibly does something special; then this loop repeats until there's
no more text to read.  (@code{eval} only actually does something special
with symbols, for which it returns the symbol's value, similar to
referencing a variable, and with conses [i.e. lists], which cause a
function invocation.  All other values are returned unchanged.)
1457
1458   The read syntax
1459
1460 @example
1461 17297
1462 @end example
1463
1464 converts to an integer whose value is 17297.
1465
1466 @example
1467 1.983e-4
1468 @end example
1469
1470 converts to a float whose value is 1.983e-4, or .0001983.
1471
1472 @example
1473 ?b
1474 @end example
1475
1476 converts to a char that represents the lowercase letter b.
1477
1478 @example
1479 ?^[$(B#&^[(B
1480 @end example
1481
1482 (where @samp{^[} actually is an @samp{ESC} character) converts to a
1483 particular Kanji character when using an ISO2022-based coding system for
1484 input. (To decode this goo: @samp{ESC} begins an escape sequence;
1485 @samp{ESC $ (} is a class of escape sequences meaning ``switch to a
1486 94x94 character set''; @samp{ESC $ ( B} means ``switch to Japanese
1487 Kanji''; @samp{#} and @samp{&} collectively index into a 94-by-94 array
1488 of characters [subtract 33 from the ASCII value of each character to get
1489 the corresponding index]; @samp{ESC (} is a class of escape sequences
meaning ``switch to a 94 character set''; @samp{ESC ( B} means ``switch
1491 to US ASCII''.  It is a coincidence that the letter @samp{B} is used to
1492 denote both Japanese Kanji and US ASCII.  If the first @samp{B} were
1493 replaced with an @samp{A}, you'd be requesting a Chinese Hanzi character
1494 from the GB2312 character set.)
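
To make the index arithmetic concrete, here is a small C sketch that
finds an @samp{ESC $ ( @var{F}} designation in a byte string and turns
the following two bytes into zero-based row and column indices by
subtracting 33.  The function names are hypothetical, not XEmacs
internals.  For the example above, @samp{#} (ASCII 35) and @samp{&}
(ASCII 38) give row 2 and column 5.

```c
#include <stddef.h>

#define ISO_ESC 0x1B

/* Convert the two bytes of a character from a 94x94 character set
   (each byte in the range 33..126) into zero-based row and column
   indices by subtracting 33.  Returns 0 on success, -1 if either byte
   is outside the 94-character range. */
static int
iso2022_94x94_index (unsigned char b1, unsigned char b2, int *row, int *col)
{
  if (b1 < 33 || b1 > 126 || b2 < 33 || b2 > 126)
    return -1;
  *row = b1 - 33;
  *col = b2 - 33;
  return 0;
}

/* Find the first `ESC $ ( F' designation in S, store its final byte F
   in *FINAL, and decode the byte pair that immediately follows it.
   Returns 0 on success, -1 if no designation plus pair is found. */
static int
decode_first_94x94 (const unsigned char *s, size_t len,
                    unsigned char *final, int *row, int *col)
{
  size_t i;

  for (i = 0; i + 5 < len; i++)
    if (s[i] == ISO_ESC && s[i + 1] == '$' && s[i + 2] == '(')
      {
        *final = s[i + 3];
        return iso2022_94x94_index (s[i + 4], s[i + 5], row, col);
      }
  return -1;
}
```

The sketch decodes only the first character after the designation and
ignores the rest of what ISO 2022 permits, such as other designation
forms and shift functions.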
1495
1496 @example
1497 "foobar"
1498 @end example
1499
1500 converts to a string.
1501
1502 @example
1503 foobar
1504 @end example
1505
1506 converts to a symbol whose name is @code{"foobar"}.  This is done by
1507 looking up the string equivalent in the global variable
1508 @code{obarray}, whose contents should be an obarray.  If no symbol
1509 is found, a new symbol with the name @code{"foobar"} is automatically
1510 created and added to @code{obarray}; this process is called
1511 @dfn{interning} the symbol.
1512 @cindex interning
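
Interning can be pictured with a simplified obarray, namely a
fixed-size hash table of symbols.  The sketch below is a hypothetical
model.  It does not reproduce XEmacs's actual obarray layout or hash
function, but it shows the ``look up, and create if missing'' behavior
that defines interning.

```c
#include <stdlib.h>
#include <string.h>

#define OBARRAY_SIZE 31

struct symbol
{
  char *name;
  struct symbol *next;          /* next symbol in the same bucket */
};

struct obarray
{
  struct symbol *buckets[OBARRAY_SIZE];
};

static unsigned
hash_name (const char *s)
{
  unsigned h = 0;
  while (*s)
    h = h * 31 + (unsigned char) *s++;
  return h % OBARRAY_SIZE;
}

/* Return the symbol named NAME, creating it and adding it to OB if it
   does not exist yet.  Returns NULL only if allocation fails. */
static struct symbol *
intern (struct obarray *ob, const char *name)
{
  unsigned h = hash_name (name);
  struct symbol *sym;

  for (sym = ob->buckets[h]; sym; sym = sym->next)
    if (strcmp (sym->name, name) == 0)
      return sym;               /* already interned */

  sym = malloc (sizeof *sym);
  if (!sym)
    return NULL;
  sym->name = malloc (strlen (name) + 1);
  if (!sym->name)
    {
      free (sym);
      return NULL;
    }
  strcpy (sym->name, name);
  sym->next = ob->buckets[h];   /* add to the front of the bucket */
  ob->buckets[h] = sym;
  return sym;
}
```

Reading @code{foobar} twice therefore yields the very same symbol
object, which is why symbols can be compared with @code{eq}.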
1513
1514 @example
1515 (foo . bar)
1516 @end example
1517
1518 converts to a cons cell containing the symbols @code{foo} and @code{bar}.
1519
1520 @example
1521 (1 a 2.5)
1522 @end example
1523
1524 converts to a three-element list containing the specified objects
1525 (note that a list is actually a set of nested conses; see the
1526 XEmacs Lisp Reference).
1527
1528 @example
1529 [1 a 2.5]
1530 @end example
1531
1532 converts to a three-element vector containing the specified objects.
1533
1534 @example
1535 #[... ... ... ...]
1536 @end example
1537
1538 converts to a compiled-function object (the actual contents are not
1539 shown since they are not relevant here; look at a file that ends with
1540 @file{.elc} for examples).
1541
1542 @example
1543 #*01110110
1544 @end example
1545
1546 converts to a bit-vector.
1547
1548 @example
1549 #s(hash-table ... ...)
1550 @end example
1551
1552 converts to a hash table (the actual contents are not shown).
1553
1554 @example
1555 #s(range-table ... ...)
1556 @end example
1557
1558 converts to a range table (the actual contents are not shown).
1559
1560 @example
1561 #s(char-table ... ...)
1562 @end example
1563
1564 converts to a char table (the actual contents are not shown).
1565
1566 Note that the @code{#s()} syntax is the general syntax for structures,
1567 which are not really implemented in XEmacs Lisp but should be.
1568
1569 When an object is printed out (using @code{print} or a related
1570 function), the read syntax is used, so that the same object can be read
1571 in again.
1572
1573 The other objects do not have read syntaxes, usually because it does not
really make sense to create them in this fashion (e.g. processes, where
1575 it doesn't make sense to have a subprocess created as a side effect of
1576 reading some Lisp code), or because they can't be created at all
1577 (e.g. subrs).  Permanent objects, as a rule, do not have a read syntax;
1578 nor do most complex objects, which contain too much state to be easily
1579 initialized through a read syntax.
1580
1581 @node How Lisp Objects Are Represented in C, Rules When Writing New C Code, The XEmacs Object System (Abstractly Speaking), Top
1582 @chapter How Lisp Objects Are Represented in C
1583
1584 Lisp objects are represented in C using a 32-bit or 64-bit machine word
(depending on the processor; e.g. DEC Alphas use 64-bit Lisp objects and
1586 most other processors use 32-bit Lisp objects).  The representation
1587 stuffs a pointer together with a tag, as follows:
1588
1589 @example
1590  [ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ]
1591  [ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ]
1592
1593    <---> ^ <------------------------------------------------------>
1594     tag  |       a pointer to a structure, or an integer
1595          |
1596        mark bit
1597 @end example
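
The following sketch models this layout with plain 32-bit unsigned
integers, showing how the 3-bit tag (bits 31 to 29), the mark bit
(bit 28), and the 28-bit value field are packed and extracted with
shifts and masks.  All names are hypothetical.

```c
#include <stdint.h>

typedef uint32_t model_object;

#define MODEL_VALBITS   28
#define MODEL_MARKBIT   ((uint32_t) 1 << MODEL_VALBITS)  /* bit 28 */
#define MODEL_VALMASK   (MODEL_MARKBIT - 1)              /* bits 27..0 */
#define MODEL_TAGSHIFT  (MODEL_VALBITS + 1)              /* tag at bit 29 */

static model_object
model_make (unsigned tag, uint32_t value)
{
  return ((uint32_t) (tag & 7) << MODEL_TAGSHIFT) | (value & MODEL_VALMASK);
}

static unsigned model_tag (model_object o) { return (unsigned) (o >> MODEL_TAGSHIFT); }
static uint32_t model_value (model_object o) { return o & MODEL_VALMASK; }
static int model_marked (model_object o) { return (o & MODEL_MARKBIT) != 0; }
static model_object model_mark (model_object o) { return o | MODEL_MARKBIT; }
static model_object model_unmark (model_object o) { return o & ~MODEL_MARKBIT; }
```

Note that a value wider than 28 bits is silently truncated, which is
the root of the memory-size limits discussed below.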
1598
1599 The tag describes the type of the Lisp object.  For integers and chars,
1600 the lower 28 bits contain the value of the integer or char; for all
1601 others, the lower 28 bits contain a pointer.  The mark bit is used
1602 during garbage-collection, and is always 0 when garbage collection is
1603 not happening. (The way that garbage collection works, basically, is that it
1604 loops over all places where Lisp objects could exist---this includes
1605 all global variables in C that contain Lisp objects [including
1606 @code{Vobarray}, the C equivalent of @code{obarray}; through this, all
1607 Lisp variables will get marked], plus various other places---and
1608 recursively scans through the Lisp objects, marking each object it finds
1609 by setting the mark bit.  Then it goes through the lists of all objects
1610 allocated, freeing the ones that are not marked and turning off the mark
1611 bit of the ones that are marked.)
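
The mark and sweep phases just described can be illustrated with a toy
collector.  This is a simplified, hypothetical model.  It keeps a
separate @code{marked} field in each object instead of a bit in the
Lisp object word, and its roots are passed in explicitly rather than
found in C globals.

```c
#include <stdlib.h>

/* Every object is on an allocation list and may point to up to two
   other objects, like the car and cdr of a cons. */
struct gc_obj
{
  int marked;
  struct gc_obj *child[2];
  struct gc_obj *next_alloc;    /* link in the list of all objects */
};

static struct gc_obj *all_objects;

static struct gc_obj *
gc_alloc (struct gc_obj *a, struct gc_obj *b)
{
  struct gc_obj *o = calloc (1, sizeof *o);
  if (!o)
    abort ();
  o->child[0] = a;
  o->child[1] = b;
  o->next_alloc = all_objects;
  all_objects = o;
  return o;
}

/* Mark phase: set the mark on everything reachable from O.  Already
   marked objects stop the recursion, so cycles are handled. */
static void
gc_mark (struct gc_obj *o)
{
  while (o && !o->marked)
    {
      o->marked = 1;
      gc_mark (o->child[0]);
      o = o->child[1];          /* iterate on the second child */
    }
}

/* Mark from ROOTS, then sweep: free unmarked objects and clear the
   mark on survivors.  Returns the number of surviving objects. */
static int
gc_collect (struct gc_obj **roots, int nroots)
{
  struct gc_obj **link = &all_objects;
  int i, live = 0;

  for (i = 0; i < nroots; i++)
    gc_mark (roots[i]);

  while (*link)
    {
      struct gc_obj *o = *link;
      if (o->marked)
        {
          o->marked = 0;
          live++;
          link = &o->next_alloc;
        }
      else
        {
          *link = o->next_alloc;
          free (o);
        }
    }
  return live;
}
```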
1612
1613 Lisp objects use the typedef @code{Lisp_Object}, but the actual C type
1614 used for the Lisp object can vary.  It can be either a simple type
1615 (@code{long} on the DEC Alpha, @code{int} on other machines) or a
1616 structure whose fields are bit fields that line up properly (actually, a
1617 union of structures is used).  Generally the simple integral type is
1618 preferable because it ensures that the compiler will actually use a
1619 machine word to represent the object (some compilers will use more
1620 general and less efficient code for unions and structs even if they can
1621 fit in a machine word).  The union type, however, has the advantage of
1622 stricter type checking (if you accidentally pass an integer where a Lisp
1623 object is desired, you get a compile error), and it makes it easier to
1624 decode Lisp objects when debugging.  The choice of which type to use is
1625 determined by the preprocessor constant @code{USE_UNION_TYPE} which is
1626 defined via the @code{--use-union-type} option to @code{configure}.
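
A hypothetical sketch of the two declaration styles follows.  The
bit-field widths match the 32-bit layout described above, but the order
in which a compiler allocates bit fields within a word is
implementation-defined, so the union version should only be read and
written through its named fields.

```c
#include <stdint.h>

/* Style 1: a plain integral type.  Cheap, but any integer converts to
   it silently. */
typedef uint32_t int_object;

static unsigned
int_object_type (int_object o)
{
  return (unsigned) (o >> 29);  /* top 3 bits */
}

/* Style 2: a union wrapping bit fields.  Passing a bare integer where
   a union_object is expected is a compile-time error. */
typedef union
{
  struct
  {
    unsigned int val     : 28;
    unsigned int markbit : 1;
    unsigned int type    : 3;
  } gu;
  uint32_t i;
} union_object;

static union_object
union_make (unsigned type, uint32_t val)
{
  union_object o;
  o.i = 0;
  o.gu.val = val & 0x0FFFFFFFu;
  o.gu.markbit = 0;
  o.gu.type = type & 7u;
  return o;
}

static unsigned
union_object_type (union_object o)
{
  return o.gu.type;
}

/* int_object_type (42) compiles without complaint, because 42 converts
   to int_object; union_object_type (42) is rejected by the compiler. */
```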
1627
1628 @cindex record type
1629
1630 Note that there are only eight types that the tag can represent, but
1631 many more actual types than this.  This is handled by having one of the
1632 tag types specify a meta-type called a @dfn{record}; for all such
1633 objects, the first four bytes of the pointed-to structure indicate what
1634 the actual type is.
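
A record type can be sketched as a structure whose first member is a
header carrying the type code.  Because the header is the first member,
a pointer to any record can be read as a pointer to its header.  The
structures below are hypothetical stand-ins, not XEmacs's own record
declarations.

```c
#include <stdint.h>

enum record_type { RECORD_BUFFER = 1, RECORD_WINDOW, RECORD_FRAME };

struct record_header
{
  uint32_t type;                /* first four bytes: the actual type */
};

struct buffer_rec
{
  struct record_header header;  /* must be the first member */
  int size;
};

struct window_rec
{
  struct record_header header;  /* must be the first member */
  int height;
};

/* Recover the real type of any record from its first four bytes. */
static uint32_t
record_type_of (const void *record)
{
  return ((const struct record_header *) record)->type;
}
```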
1635
1636 Note also that having 28 bits for pointers and integers restricts a lot
1637 of things to 256 megabytes of memory. (Basically, enough pointers and
1638 indices and whatnot get stuffed into Lisp objects that the total amount
1639 of memory used by XEmacs can't grow above 256 megabytes.  In older
1640 versions of XEmacs and GNU Emacs, the tag was 5 bits wide, allowing for
1641 32 types, which was more than the actual number of types that existed at
1642 the time, and no ``record'' type was necessary.  However, this limited
1643 the editor to 64 megabytes total, which some users who edited large
1644 files might conceivably exceed.)
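
The arithmetic behind these limits, assuming one mark bit in both
layouts: a 3-bit tag leaves 32 - 3 - 1 = 28 value bits, i.e. 2^28 bytes
= 256 megabytes, while a 5-bit tag leaves 26 bits, i.e. 2^26 bytes = 64
megabytes.  A small hypothetical helper makes the calculation explicit.

```c
#include <stdint.h>

/* Bytes addressable through the value field of a 32-bit Lisp object
   with TAG_BITS tag bits and one mark bit. */
static uint64_t
addressable_bytes (unsigned tag_bits)
{
  unsigned value_bits = 32 - tag_bits - 1;  /* 1 bit for the mark */
  return (uint64_t) 1 << value_bits;
}
```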
1645
1646 Also, note that there is an implicit assumption here that all pointers
1647 are low enough that the top bits are all zero and can just be chopped
1648 off.  On standard machines that allocate memory from the bottom up (and
1649 give each process its own address space), this works fine.  Some
1650 machines, however, put the data space somewhere else in memory
1651 (e.g. beginning at 0x80000000).  Those machines cope by defining
1652 @code{DATA_SEG_BITS} in the corresponding @file{m/} or @file{s/} file to
1653 the proper mask.  Then, pointers retrieved from Lisp objects are
1654 automatically OR'ed with this value prior to being used.
1655
1656 A corollary of the previous paragraph is that @strong{(pointers to)
1657 stack-allocated structures cannot be put into Lisp objects}.  The stack
1658 is generally located near the top of memory; if you put such a pointer
1659 into a Lisp object, it will get its top bits chopped off, and you will
1660 lose.
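
Both points can be seen in a small model that treats addresses as
32-bit numbers, chops them to a 28-bit value field, and restores them by
OR'ing in a @code{DATA_SEG_BITS}-style mask.  The mask value 0x80000000
follows the example above and is not taken from any particular
@file{m/} or @file{s/} file.  The names are hypothetical.

```c
#include <stdint.h>

#define MODEL_VALMASK        0x0FFFFFFFu
#define MODEL_DATA_SEG_BITS  0x80000000u   /* example value only */

/* Storing an address in a Lisp object keeps only the low 28 bits. */
static uint32_t
store_address (uint32_t addr)
{
  return addr & MODEL_VALMASK;
}

/* Retrieving it ORs the data-segment bits back in.  This only
   reconstructs addresses whose top bits equal MODEL_DATA_SEG_BITS. */
static uint32_t
load_address (uint32_t field)
{
  return field | MODEL_DATA_SEG_BITS;
}
```

A heap address such as 0x80012340 survives the round trip, but a stack
address such as 0xBFFFF000 comes back as 0x8FFFF000, a different
location.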
1661
1662 Actually, there's an alternative representation of a @code{Lisp_Object},
1663 invented by Kyle Jones, that is used when the
1664 @code{--use-minimal-tagbits} option to @code{configure} is used.  In
1665 this case the 2 lower bits are used for the tag bits.  This
1666 representation assumes that pointers to structs are always aligned to
1667 multiples of 4, so the lower 2 bits are always zero.
1668
1669 @example
1670  [ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ]
1671  [ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ]
1672
1673    <---------------------------------------------------------> <->
1674             a pointer to a structure, or an integer            tag
1675 @end example
1676
1677 A tag of 00 is used for all pointer object types, a tag of 10 is used
1678 for characters, and the other two tags 01 and 11 are joined together to
form the integer object type.  The mark bit is moved to part of the
1680 structure being pointed at (integers and chars do not need to be marked,
1681 since no memory is allocated).  This representation has these
1682 advantages:
1683
1684 @enumerate
1685 @item
31 bits can be used for Lisp integers.
1687 @item
1688 @emph{Any} pointer can be represented directly, and no bit masking
1689 operations are necessary.
1690 @end enumerate
1691
1692 The disadvantages are:
1693
1694 @enumerate
1695 @item
1696 An extra level of indirection is needed when accessing the object types
1697 that were not record types.  So checking whether a Lisp object is a cons
1698 cell becomes a slower operation.
1699 @item
1700 Mark bits can no longer be stored directly in Lisp objects, so another
1701 place for them must be found.  This means that a cons cell requires more
memory than merely room for two Lisp objects, leading to extra memory use.
1703 @end enumerate
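
The encoding rules of this representation can be sketched with
hypothetical names and 32-bit words standing in for real objects.
Integers are shifted left by one with the low bit set, so both
@samp{01} and @samp{11} decode as integers.  Characters are shifted left
by two with tag @samp{10`, and 4-byte-aligned pointers are stored
unchanged with tag @samp{00}.

```c
#include <stdint.h>

typedef uint32_t mt_object;

/* N must lie in [-2^30, 2^30 - 1], the 31-bit integer range. */
static mt_object mt_make_int (int32_t n) { return ((uint32_t) n << 1) | 1u; }
static mt_object mt_make_char (uint32_t c) { return (c << 2) | 2u; }
static mt_object mt_make_pointer (uint32_t addr) { return addr; } /* addr % 4 == 0 */

static int mt_intp (mt_object o) { return (o & 1u) == 1u; }
static int mt_charp (mt_object o) { return (o & 3u) == 2u; }
static int mt_pointerp (mt_object o) { return (o & 3u) == 0u; }

/* Sign-extend the 31-bit payload without relying on an arithmetic
   right shift of a negative value. */
static int32_t
mt_xint (mt_object o)
{
  uint32_t v = o >> 1;                  /* 31-bit payload */
  if (v & 0x40000000u)                  /* payload sign bit set */
    return (int32_t) (v & 0x3FFFFFFFu) - 0x40000000;
  return (int32_t) v;
}

static uint32_t mt_xchar (mt_object o) { return o >> 2; }
```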
1704
1705 Various macros are used to construct Lisp objects and extract the
1706 components.  Macros of the form @code{XINT()}, @code{XCHAR()},
1707 @code{XSTRING()}, @code{XSYMBOL()}, etc. mask out the pointer/integer
1708 field and cast it to the appropriate type.  All of the macros that
1709 construct pointers will @code{OR} with @code{DATA_SEG_BITS} if
1710 necessary.  @code{XINT()} needs to be a bit tricky so that negative
1711 numbers are properly sign-extended: Usually it does this by shifting the
1712 number four bits to the left and then four bits to the right.  This
1713 assumes that the right-shift operator does an arithmetic shift (i.e. it
1714 leaves the most-significant bit as-is rather than shifting in a zero, so
1715 that it mimics a divide-by-two even for negative numbers).  Not all
1716 machines/compilers do this, and on the ones that don't, a more
1717 complicated definition is selected by defining
1718 @code{EXPLICIT_SIGN_EXTEND}.
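
Both approaches to sign extension can be written out for the 28-bit
value field.  The names are hypothetical.  The explicit version is only
in the spirit of what @code{EXPLICIT_SIGN_EXTEND} selects, not its
actual definition.

```c
#include <stdint.h>

#define MODEL_VALMASK 0x0FFFFFFFu   /* 28-bit value field */

/* The shift trick: move the 28-bit field to the top of the word, then
   shift back.  Relies on an arithmetic right shift of a negative
   int32_t (and on the unsigned-to-signed conversion), both of which C
   leaves implementation-defined. */
static int32_t
xint_by_shifting (uint32_t obj)
{
  return (int32_t) (obj << 4) >> 4;
}

/* Explicit sign extension: test bit 27 and subtract 2^28 by hand. */
static int32_t
xint_explicit (uint32_t obj)
{
  uint32_t v = obj & MODEL_VALMASK;
  if (v & 0x08000000u)
    return (int32_t) (v & 0x07FFFFFFu) - 0x08000000;
  return (int32_t) v;
}
```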
1719
1720 Note that when @code{ERROR_CHECK_TYPECHECK} is defined, the extractor
1721 macros become more complicated---they check the tag bits and/or the
1722 type field in the first four bytes of a record type to ensure that the
1723 object is really of the correct type.  This is great for catching places
1724 where an incorrect type is being dereferenced---this typically results
1725 in a pointer being dereferenced as the wrong type of structure, with
1726 unpredictable (and sometimes not easily traceable) results.
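
The trade-off can be illustrated with a hypothetical extractor that has
an unchecked and a checked variant.  A configuration macro selects
between them, much as @code{ERROR_CHECK_TYPECHECK} selects the checking
versions.  The objects are simplified tagged structures, not real
@code{Lisp_Object}s.

```c
#include <stdio.h>
#include <stdlib.h>

enum tc_type { TC_INT, TC_CONS, TC_STRING };

struct tc_object
{
  enum tc_type type;
  void *ptr;
};

struct tc_cons
{
  struct tc_object car, cdr;
};

/* Unchecked: trusts the caller completely. */
#define TC_XCONS_UNCHECKED(obj) ((struct tc_cons *) (obj).ptr)

/* Checked: verify the type first and abort at the point of misuse,
   instead of dereferencing the wrong kind of structure later. */
static struct tc_cons *
tc_xcons_checked (struct tc_object obj, const char *file, int line)
{
  if (obj.type != TC_CONS)
    {
      fprintf (stderr, "%s:%d: object is not a cons\n", file, line);
      abort ();
    }
  return (struct tc_cons *) obj.ptr;
}

#ifdef TC_ERROR_CHECK_TYPECHECK
# define TC_XCONS(obj) tc_xcons_checked ((obj), __FILE__, __LINE__)
#else
# define TC_XCONS(obj) TC_XCONS_UNCHECKED (obj)
#endif
```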
1727
1728 There are similar @code{XSET@var{TYPE}()} macros that construct a Lisp
1729 object.  These macros are of the form @code{XSET@var{TYPE}
1730 (@var{lvalue}, @var{result})},
1731 i.e. they have to be a statement rather than just used in an expression.
1732 The reason for this is that standard C doesn't let you ``construct'' a
1733 structure (but GCC does).  Granted, this sometimes isn't too convenient;
1734 for the case of integers, at least, you can use the function
1735 @code{make_int()}, which constructs and @emph{returns} an integer
1736 Lisp object.  Note that the @code{XSET@var{TYPE}()} macros are also
1737 affected by @code{ERROR_CHECK_TYPECHECK} and make sure that the
1738 structure is of the right type in the case of record types, where the
1739 type is contained in the structure.
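
The difference between the statement-style macro and the function can be
sketched with a hypothetical union type.  Under C89 a union value cannot
be written inline in an expression (C99 compound literals relax this),
so the macro assigns the fields one by one.  The function wraps the
macro and returns the result, which makes it usable inside expressions.

```c
#include <stdint.h>

typedef union
{
  struct
  {
    unsigned int val  : 28;
    unsigned int mark : 1;
    unsigned int type : 3;
  } gu;
  uint32_t i;
} obj_t;

enum { OBJ_TYPE_INT = 0 };

/* Must be used as a statement; N is evaluated exactly once. */
#define XSETINT_MODEL(lvalue, n) do {                   \
    (lvalue).i = 0;                                     \
    (lvalue).gu.type = OBJ_TYPE_INT;                    \
    (lvalue).gu.mark = 0;                               \
    (lvalue).gu.val = (uint32_t) (n) & 0x0FFFFFFFu;     \
  } while (0)

/* Builds the object and returns it. */
static obj_t
make_int_model (int32_t n)
{
  obj_t o;
  XSETINT_MODEL (o, n);
  return o;
}
```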
1740
1741 The C programmer is responsible for @strong{guaranteeing} that a
@code{Lisp_Object} is of the correct type before using the @code{X@var{TYPE}}
macros.  This is especially important in the case of lists.  Use
@code{XCAR} and @code{XCDR} if a @code{Lisp_Object} is certainly a cons cell,
1745 else use @code{Fcar()} and @code{Fcdr()}.  Trust other C code, but not
1746 Lisp code.  On the other hand, if XEmacs has an internal logic error,
1747 it's better to crash immediately, so sprinkle ``unreachable''
1748 @code{abort()}s liberally about the source code.
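
The following hypothetical model shows the @code{XCAR}/@code{Fcar()}
distinction.  The unchecked accessor simply dereferences.  The checked
one accepts nil (whose car is nil) and reports an error for anything
that is not a list.  Here the error is a return code, whereas the real
@code{Fcar()} signals a Lisp error.

```c
#include <stddef.h>

enum kind { K_NIL, K_INT, K_CONS };

struct obj
{
  enum kind kind;
  int ival;                     /* valid when kind == K_INT */
  struct obj *car, *cdr;        /* valid when kind == K_CONS */
};

/* Like XCAR: no check, only for objects known to be conses. */
#define XCAR_MODEL(o) ((o)->car)

/* Like Fcar: validates data that may have come from Lisp.  Returns 0
   and stores the car in *RESULT, or -1 for a wrong-type argument. */
static int
fcar_model (const struct obj *o, const struct obj *nil,
            const struct obj **result)
{
  if (o->kind == K_NIL)
    {
      *result = nil;            /* (car nil) is nil */
      return 0;
    }
  if (o->kind != K_CONS)
    return -1;                  /* wrong-type-argument */
  *result = XCAR_MODEL (o);
  return 0;
}
```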
1749
1750 @node Rules When Writing New C Code, A Summary of the Various XEmacs Modules, How Lisp Objects Are Represented in C, Top
1751 @chapter Rules When Writing New C Code
1752
1753 The XEmacs C Code is extremely complex and intricate, and there are many
1754 rules that are more or less consistently followed throughout the code.
1755 Many of these rules are not obvious, so they are explained here.  It is
1756 of the utmost importance that you follow them.  If you don't, you may
1757 get something that appears to work, but which will crash in odd
1758 situations, often in code far away from where the actual breakage is.
1759
1760 @menu
1761 * General Coding Rules::
1762 * Writing Lisp Primitives::
1763 * Adding Global Lisp Variables::
1764 * Coding for Mule::
1765 * Techniques for XEmacs Developers::
1766 @end menu
1767
1768 @node General Coding Rules
1769 @section General Coding Rules
1770
1771 The C code is actually written in a dialect of C called @dfn{Clean C},
1772 meaning that it can be compiled, mostly warning-free, with either a C or
1773 C++ compiler.  Coding in Clean C has several advantages over plain C.
1774 C++ compilers are more nit-picking, and a number of coding errors have
been found by compiling with C++.  The ability to use both C and C++
tools means that a greater variety of development tools is available to
the developer.
1778
1779 Almost every module contains a @code{syms_of_*()} function and a
1780 @code{vars_of_*()} function.  The former declares any Lisp primitives
1781 you have defined and defines any symbols you will be using.  The latter
1782 declares any global Lisp variables you have added and initializes global
1783 C variables in the module.  For each such function, declare it in
1784 @file{symsinit.h} and make sure it's called in the appropriate place in
1785 @file{emacs.c}.  @strong{Important}: There are stringent requirements on
1786 exactly what can go into these functions.  See the comment in
1787 @file{emacs.c}.  The reason for this is to avoid obscure unwanted
1788 interactions during initialization.  If you don't follow these rules,
1789 you'll be sorry!  If you want to do anything that isn't allowed, create
1790 a @code{complex_vars_of_*()} function for it.  Doing this is tricky,
1791 though: You have to make sure your function is called at the right time
1792 so that all the initialization dependencies work out.
1793
1794 Every module includes @file{<config.h>} (angle brackets so that
1795 @samp{--srcdir} works correctly; @file{config.h} may or may not be in
1796 the same directory as the C sources) and @file{lisp.h}.  @file{config.h}
1797 must always be included before any other header files (including
1798 system header files) to ensure that certain tricks played by various
1799 @file{s/} and @file{m/} files work out correctly.
1800
1801 When including header files, always use angle brackets, not double
1802 quotes, except when the file to be included is in the same directory as
1803 the including file.  If either file is a generated file, then that is
1804 not likely to be the case.  In order to understand why we have this
1805 rule, imagine what happens when you do a build in the source directory
1806 using @samp{./configure} and another build in another directory using
1807 @samp{../work/configure}.  There will be two different @file{config.h}
1808 files.  Which one will be used if you @samp{#include "config.h"}?
1809
1810 @strong{All global and static variables that are to be modifiable must
1811 be declared uninitialized.}  This means that you may not use the
1812 ``declare with initializer'' form for these variables, such as @code{int
1813 some_variable = 0;}.  The reason for this has to do with some kludges
1814 done during the dumping process: If possible, the initialized data
1815 segment is re-mapped so that it becomes part of the (unmodifiable) code
1816 segment in the dumped executable.  This allows this memory to be shared
1817 among multiple running XEmacs processes.  XEmacs is careful to place as
1818 much constant data as possible into initialized variables (in
1819 particular, into what's called the @dfn{pure space}---see below) during
1820 the @file{temacs} phase.
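
Here is a minimal sketch of the rule for a hypothetical module
@file{foo.c}.  The modifiable global is declared without an initializer
and receives its starting value in the module's @code{vars_of_*()}
function (here the hypothetical @code{vars_of_foo()}), which
@file{emacs.c} would call during initialization.

```c
/* Normally declared in symsinit.h and called from emacs.c. */
void vars_of_foo (void);

static int foo_counter;          /* no "= 0": stays out of initialized data */
static const char *foo_prefix;   /* likewise */

void
vars_of_foo (void)
{
  foo_counter = 0;
  foo_prefix = "foo-";
}

static int
foo_next (void)
{
  return ++foo_counter;
}
```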
1821
1822 @cindex copy-on-write
1823 @strong{Please note:} This kludge only works on a few systems nowadays,
1824 and is rapidly becoming irrelevant because most modern operating systems
1825 provide @dfn{copy-on-write} semantics.  All data is initially shared
1826 between processes, and a private copy is automatically made (on a
1827 page-by-page basis) when a process first attempts to write to a page of
1828 memory.
1829
1830 Formerly, there was a requirement that static variables not be declared
1831 inside of functions.  This had to do with another hack along the same
1832 vein as what was just described: old USG systems put statically-declared
1833 variables in the initialized data space, so those header files had a
1834 @code{#define static} declaration. (That way, the data-segment remapping
1835 described above could still work.) This fails badly on static variables
1836 inside of functions, which suddenly become automatic variables;
1837 therefore, you weren't supposed to have any of them.  This awful kludge
1838 has been removed in XEmacs because
1839
1840 @enumerate
1841 @item
1842 almost all of the systems that used this kludge ended up having
1843 to disable the data-segment remapping anyway;
1844 @item
1845 the only systems that didn't were extremely outdated ones;
1846 @item
1847 this hack completely messed up inline functions.
1848 @end enumerate
1849
1850 The C source code makes heavy use of C preprocessor macros.  One popular
1851 macro style is:
1852
1853 @example
1854 #define FOO(var, value) do @{           \
1855   Lisp_Object FOO_value = (value);      \
1856   ... /* compute using FOO_value */     \
1857   (var) = bar;                          \
1858 @} while (0)
1859 @end example
1860
1861 The @code{do @{...@} while (0)} is a standard trick to allow FOO to have
1862 statement semantics, so that it can safely be used within an @code{if}
1863 statement in C, for example.  Multiple evaluation is prevented by
1864 copying a supplied argument into a local variable, so that
1865 @code{FOO(var,fun(1))} only calls @code{fun} once.
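
Here is a runnable instance of this style with a hypothetical macro,
@code{SQUARE_INTO}.  The counter shows that the argument expression is
evaluated only once.  The unbraced @code{if}/@code{else} shows why the
@code{do @{...@} while (0)} wrapper matters.

```c
static int calls;               /* how often the argument was evaluated */

static int
next_value (void)
{
  calls++;
  return 7;
}

#define SQUARE_INTO(var, value) do {                    \
    int SQUARE_INTO_value = (value);                    \
    (var) = SQUARE_INTO_value * SQUARE_INTO_value;      \
  } while (0)

static int
square_or_zero (int use_it)
{
  int result;
  /* The semicolon after the macro call ends the do-while statement, so
     the else still pairs with this if. */
  if (use_it)
    SQUARE_INTO (result, next_value ());
  else
    result = 0;
  return result;
}
```

Had the macro been written as a bare brace block, the semicolon after
it would end the @code{if} statement and the @code{else} would be a
syntax error.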
1866
1867 Lisp lists are popular data structures in the C code as well as in
1868 Elisp.  There are two sets of macros that iterate over lists.
1869 @code{EXTERNAL_LIST_LOOP_@var{n}} should be used when the list has been
1870 supplied by the user, and cannot be trusted to be acyclic and
1871 nil-terminated.  A @code{malformed-list} or @code{circular-list} error
1872 will be generated if the list being iterated over is not entirely
1873 kosher.  @code{LIST_LOOP_@var{n}}, on the other hand, is faster and less
1874 safe, and can be used only on trusted lists.
1875
Related macros are @code{GET_EXTERNAL_LIST_LENGTH} and
@code{GET_LIST_LENGTH}, which calculate the length of a list;
@code{GET_EXTERNAL_LIST_LENGTH} also validates that the list is proper.
The macros @code{EXTERNAL_LIST_LOOP_DELETE_IF} and
@code{LIST_LOOP_DELETE_IF} delete the elements of a Lisp list that
satisfy some predicate.
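
The difference between trusted and untrusted traversal can be sketched
as two length functions over a simplified cons structure.  The names are
hypothetical, and the tortoise-and-hare cycle check is one common
technique, not necessarily the one the XEmacs macros use.

```c
#include <stddef.h>

/* NULL plays the role of nil; a cell whose is_cons flag is 0 stands
   for a non-list object at the end of a dotted list. */
struct cell
{
  int is_cons;
  struct cell *cdr;
};

/* Trusted, like GET_LIST_LENGTH: assumes a proper, acyclic list. */
static long
list_length_trusted (const struct cell *list)
{
  long n = 0;
  for (; list; list = list->cdr)
    n++;
  return n;
}

/* Untrusted, like GET_EXTERNAL_LIST_LENGTH: returns -1 for a dotted
   (malformed) list and -2 for a circular one. */
static long
list_length_external (const struct cell *list)
{
  const struct cell *slow = list;
  long n = 0;

  while (list)
    {
      if (!list->is_cons)
        return -1;              /* malformed-list */
      list = list->cdr;         /* hare: one step per iteration */
      n++;
      if (n % 2 == 0)
        {
          slow = slow->cdr;     /* tortoise: one step every two */
          if (list && list == slow)
            return -2;          /* circular-list */
        }
    }
  return n;
}
```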
1882
1883 @node Writing Lisp Primitives
1884 @section Writing Lisp Primitives
1885
1886 Lisp primitives are Lisp functions implemented in C.  The details of
1887 interfacing the C function so that Lisp can call it are handled by a few
1888 C macros.  The only way to really understand how to write new C code is
1889 to read the source, but we can explain some things here.
1890
1891 An example of a special form is the definition of @code{prog1}, from
1892 @file{eval.c}.  (An ordinary function would have the same general
1893 appearance.)
1894
1895 @cindex garbage collection protection
1896 @smallexample
1897 @group
1898 DEFUN ("prog1", Fprog1, 1, UNEVALLED, 0, /*
1899 Similar to `progn', but the value of the first form is returned.
1900 \(prog1 FIRST BODY...): All the arguments are evaluated sequentially.
1901 The value of FIRST is saved during evaluation of the remaining args,
1902 whose values are discarded.
1903 */
1904        (args))
1905 @{
1906   /* This function can GC */
1907   REGISTER Lisp_Object val, form, tail;
1908   struct gcpro gcpro1;
1909
1910   val = Feval (XCAR (args));
1911
1912   GCPRO1 (val);
1913
1914   LIST_LOOP_3 (form, XCDR (args), tail)
1915     Feval (form);
1916
1917   UNGCPRO;
1918   return val;
1919 @}
1920 @end group
1921 @end smallexample
1922
1923   Let's start with a precise explanation of the arguments to the
1924 @code{DEFUN} macro.  Here is a template for them:
1925
1926 @example
1927 @group
1928 DEFUN (@var{lname}, @var{fname}, @var{min_args}, @var{max_args}, @var{interactive}, /*
1929 @var{docstring}
1930 */
1931    (@var{arglist}))
1932 @end group
1933 @end example
1934
1935 @table @var
1936 @item lname
1937 This string is the name of the Lisp symbol to define as the function
1938 name; in the example above, it is @code{"prog1"}.
1939
1940 @item fname
1941 This is the C function name for this function.  This is the name that is
1942 used in C code for calling the function.  The name is, by convention,
1943 @samp{F} prepended to the Lisp name, with all dashes (@samp{-}) in the
1944 Lisp name changed to underscores.  Thus, to call this function from C
1945 code, call @code{Fprog1}.  Remember that the arguments are of type
1946 @code{Lisp_Object}; various macros and functions for creating values of
1947 type @code{Lisp_Object} are declared in the file @file{lisp.h}.
1948
1949 Primitives whose names are special characters (e.g. @code{+} or
1950 @code{<}) are named by spelling out, in some fashion, the special
1951 character: e.g. @code{Fplus()} or @code{Flss()}.  Primitives whose names
1952 begin with normal alphanumeric characters but also contain special
1953 characters are spelled out in some creative way, e.g. @code{let*}
1954 becomes @code{FletX()}.
1955
1956 Each function also has an associated structure that holds the data for
1957 the subr object that represents the function in Lisp.  This structure
1958 conveys the Lisp symbol name to the initialization routine that will
1959 create the symbol and store the subr object as its definition.  The C
1960 variable name of this structure is always @samp{S} prepended to the
1961 @var{fname}.  You hardly ever need to be aware of the existence of this
1962 structure, since @code{DEFUN} plus @code{DEFSUBR} takes care of all the
1963 details.
1964
1965 @item min_args
This is the minimum number of arguments that the function requires.  The
function @code{prog1} requires at least one argument.
1968
1969 @item max_args
1970 This is the maximum number of arguments that the function accepts, if
1971 there is a fixed maximum.  Alternatively, it can be @code{UNEVALLED},
1972 indicating a special form that receives unevaluated arguments, or
1973 @code{MANY}, indicating an unlimited number of evaluated arguments (the
1974 C equivalent of @code{&rest}).  Both @code{UNEVALLED} and @code{MANY}
1975 are macros.  If @var{max_args} is a number, it may not be less than
1976 @var{min_args} and it may not be greater than 8. (If you need to add a
1977 function with more than 8 arguments, use the @code{MANY} form.  Resist
the urge to edit the definition of @code{DEFUN} in @file{lisp.h}.  If
you do it anyway, make sure to also add another clause to the switch
statement in @code{primitive_funcall()}.)
1981
1982 @item interactive
1983 This is an interactive specification, a string such as might be used as
1984 the argument of @code{interactive} in a Lisp function.  In the case of
1985 @code{prog1}, it is 0 (a null pointer), indicating that @code{prog1}
1986 cannot be called interactively.  A value of @code{""} indicates a
1987 function that should receive no arguments when called interactively.
1988
1989 @item docstring
1990 This is the documentation string.  It is written just like a
1991 documentation string for a function defined in Lisp; in particular, the
first line should be a single sentence.  Note that the documentation
string is enclosed in a comment, that none of the documentation is placed
on the same lines as the comment-start and comment-end characters, and
that the comment-start characters are on the same line as the interactive
specification.  @file{make-docfile}, which scans the C files for
1997 documentation strings, is very particular about what it looks for, and
1998 will not properly extract the doc string if it's not in this exact format.
1999
2000 In order to make both @file{etags} and @file{make-docfile} happy, make
2001 sure that the @code{DEFUN} line contains the @var{lname} and
2002 @var{fname}, and that the comment-start characters for the doc string
2003 are on the same line as the interactive specification, and put a newline
2004 directly after them (and before the comment-end characters).
2005
2006 @item arglist
2007 This is the comma-separated list of arguments to the C function.  For a
2008 function with a fixed maximum number of arguments, provide a C argument
2009 for each Lisp argument.  In this case, unlike regular C functions, the
2010 types of the arguments are not declared; they are simply always of type
2011 @code{Lisp_Object}.
2012
2013 The names of the C arguments will be used as the names of the arguments
2014 to the Lisp primitive as displayed in its documentation, modulo the same
2015 concerns described above for @code{F...} names (in particular,
2016 underscores in the C arguments become dashes in the Lisp arguments).
2017
2018 There is one additional kludge: A trailing `_' on the C argument is
2019 discarded when forming the Lisp argument.  This allows C language
2020 reserved words (like @code{default}) or global symbols (like
2021 @code{dirname}) to be used as argument names without compiler warnings
2022 or errors.
2023
2024 A Lisp function with @w{@var{max_args} = @code{UNEVALLED}} is a
2025 @w{@dfn{special form}}; its arguments are not evaluated.  Instead it
2026 receives one argument of type @code{Lisp_Object}, a (Lisp) list of the
2027 unevaluated arguments, conventionally named @code{(args)}.
2028
2029 When a Lisp function has no upper limit on the number of arguments,
2030 specify @w{@var{max_args} = @code{MANY}}.  In this case its implementation in
2031 C actually receives exactly two arguments: the number of Lisp arguments
2032 (an @code{int}) and the address of a block containing their values (a
@w{@code{Lisp_Object *}}).  Only in this case are the C types specified
in the @var{arglist}: @w{@code{(int nargs, Lisp_Object *args)}}.
2035
2036 @end table
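
The two naming conventions can be expressed as two small string
helpers.  The first forms @samp{F} plus the Lisp name with dashes turned
into underscores.  The second turns underscores in C argument names into
dashes and drops one trailing underscore.  These helpers are
hypothetical illustrations: in XEmacs the names are written by hand, and
names built from special characters (such as @code{Fplus()}) do not
follow this mechanical rule.

```c
#include <stddef.h>
#include <string.h>

/* "buffer-substring" -> "Fbuffer_substring".
   Returns 0 on success, -1 if OUT is too small. */
static int
lisp_to_c_fname (const char *lname, char *out, size_t outsize)
{
  size_t i, len = strlen (lname);

  if (len + 2 > outsize)        /* 'F' + name + NUL */
    return -1;
  out[0] = 'F';
  for (i = 0; i < len; i++)
    out[i + 1] = lname[i] == '-' ? '_' : lname[i];
  out[len + 1] = '\0';
  return 0;
}

/* "start_pos" -> "start-pos", "default_" -> "default".
   Returns 0 on success, -1 if OUT is too small. */
static int
c_arg_to_lisp (const char *carg, char *out, size_t outsize)
{
  size_t i, len = strlen (carg);

  if (len > 0 && carg[len - 1] == '_')
    len--;                      /* drop one trailing underscore */
  if (len + 1 > outsize)
    return -1;
  for (i = 0; i < len; i++)
    out[i] = carg[i] == '_' ? '-' : carg[i];
  out[len] = '\0';
  return 0;
}
```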
2037
2038 Within the function @code{Fprog1} itself, note the use of the macros
2039 @code{GCPRO1} and @code{UNGCPRO}.  @code{GCPRO1} is used to ``protect''
2040 a variable from garbage collection---to inform the garbage collector
2041 that it must look in that variable and regard the object pointed at by
2042 its contents as an accessible object.  This is necessary whenever you
2043 call @code{Feval} or anything that can directly or indirectly call
2044 @code{Feval} (this includes the @code{QUIT} macro!).  At such a time,
2045 any Lisp object that you intend to refer to again must be protected
2046 somehow.  @code{UNGCPRO} cancels the protection of the variables that
2047 are protected in the current function.  It is necessary to do this
2048 explicitly.
2049
2050 The macro @code{GCPRO1} protects just one local variable.  If you want
2051 to protect two, use @code{GCPRO2} instead; repeating @code{GCPRO1} will
2052 not work.  Macros @code{GCPRO3} and @code{GCPRO4} also exist.
2053
2054 These macros implicitly use local variables such as @code{gcpro1}; you
2055 must declare these explicitly, with type @code{struct gcpro}.  Thus, if
2056 you use @code{GCPRO2}, you must declare @code{gcpro1} and @code{gcpro2}.
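
The bookkeeping behind these macros can be modeled as a chain of
stack-allocated records.  Each record points at a protected variable and
at the previous record, and the collector walks the chain.  This is a
simplified, hypothetical model (the @code{_MODEL} names are not
XEmacs's macros).  It shows why @code{gcpro1} and @code{gcpro2} must be
declared by hand and why @code{UNGCPRO} must run before the function
returns.

```c
#include <stddef.h>

typedef void *object;

struct gcpro_model
{
  struct gcpro_model *next;     /* previously protected frame */
  object *var;                  /* address of the protected variable */
  int nvars;
};

static struct gcpro_model *gcprolist_model;

#define GCPRO1_MODEL(v) do {                    \
    gcpro1.next = gcprolist_model;              \
    gcpro1.var = &(v);                          \
    gcpro1.nvars = 1;                           \
    gcprolist_model = &gcpro1;                  \
  } while (0)

#define GCPRO2_MODEL(v1, v2) do {               \
    GCPRO1_MODEL (v1);                          \
    gcpro2.next = gcprolist_model;              \
    gcpro2.var = &(v2);                         \
    gcpro2.nvars = 1;                           \
    gcprolist_model = &gcpro2;                  \
  } while (0)

/* Pop everything this function pushed. */
#define UNGCPRO_MODEL (gcprolist_model = gcpro1.next)

/* What a collector would do: visit every protected slot. */
static int
count_protected (void)
{
  struct gcpro_model *p;
  int n = 0;
  for (p = gcprolist_model; p; p = p->next)
    n += p->nvars;
  return n;
}

static int
protected_during_call (void)
{
  object val = NULL, form = NULL;
  struct gcpro_model gcpro1, gcpro2;    /* declared by hand */
  int seen;

  GCPRO2_MODEL (val, form);
  seen = count_protected ();    /* a real function might call Feval here */
  UNGCPRO_MODEL;
  return seen;
}
```

Forgetting the final @code{UNGCPRO} would leave the global chain
pointing into a stack frame that no longer exists.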

@cindex caller-protects (@code{GCPRO} rule)
Note also that the general rule is @dfn{caller-protects}; i.e. you are
only responsible for protecting those Lisp objects that you create.  Any
objects passed to you as arguments should have been protected by whoever
created them, so you don't in general have to protect them.

In particular, the arguments to any Lisp primitive are always
automatically @code{GCPRO}ed when the primitive is called ``normally''
from Lisp code or bytecode.  So only a few Lisp primitives that are
called frequently from C code, such as @code{Fprogn}, protect their
arguments as a service to their caller.  You don't need to protect your
arguments when writing a new @code{DEFUN}.

@code{GCPRO}ing is perhaps the trickiest and most error-prone part of
XEmacs coding.  It is @strong{extremely} important that you get this
right and use a great deal of discipline when writing this code.
@xref{GCPROing, ,@code{GCPRO}ing}, for full details on how to do this.

What @code{DEFUN} actually does is declare a global structure of type
@code{Lisp_Subr} whose name begins with capital @samp{SF} and which
contains information about the primitive (e.g. a pointer to the
function, its minimum and maximum allowed arguments, a string describing
its Lisp name); @code{DEFUN} then begins a normal C function declaration
using the @code{F...} name.  The Lisp subr object that is the function
definition of a primitive (i.e. the object in the function slot of the
symbol that names the primitive) actually points to this @samp{SF}
structure; when @code{Feval} encounters a subr, it looks in the
structure to find out how to call the C function.
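
The dispatch described above can be pictured with the following minimal
sketch.  The descriptor layout, the toy object type, the
@code{TOY_MANY} marker, and @code{toy_call_subr} are all hypothetical,
not the real @code{Lisp_Subr}; only fixed-arity subrs with at most two
arguments and subrs with no upper limit are handled, and unsupplied
optional arguments are passed as 0, standing in for @code{nil}.

@verbatim
#include <assert.h>

typedef long toy_object;                 /* stand-in for Lisp_Object */

#define TOY_MANY (-2)                    /* hypothetical "no upper limit" */

/* Generic function pointer type, cast back to the right type on call. */
typedef void (*toy_fnptr) (void);
typedef toy_object (*toy_fn0) (void);
typedef toy_object (*toy_fn1) (toy_object);
typedef toy_object (*toy_fn2) (toy_object, toy_object);
typedef toy_object (*toy_fnmany) (int, toy_object *);

/* Hypothetical descriptor, loosely modeled on what DEFUN records. */
struct toy_subr
{
  const char *name;                      /* Lisp-visible name */
  int min_args;
  int max_args;                          /* or TOY_MANY */
  toy_fnptr fn;
};

/* Call SUBR the way an evaluator would after consulting the descriptor.
   Sets *OK to 0 (and returns 0) if the argument count is not accepted. */
static toy_object
toy_call_subr (const struct toy_subr *subr, int nargs, toy_object *args,
               int *ok)
{
  toy_object padded[2] = { 0, 0 };       /* missing optionals become 0 */
  int i;

  *ok = 0;
  if (nargs < subr->min_args)
    return 0;
  if (subr->max_args == TOY_MANY)
    {
      *ok = 1;
      return ((toy_fnmany) subr->fn) (nargs, args);
    }
  if (nargs > subr->max_args || subr->max_args > 2)
    return 0;
  for (i = 0; i < nargs; i++)
    padded[i] = args[i];
  *ok = 1;
  if (subr->max_args == 0)
    return ((toy_fn0) subr->fn) ();
  if (subr->max_args == 1)
    return ((toy_fn1) subr->fn) (padded[0]);
  return ((toy_fn2) subr->fn) (padded[0], padded[1]);
}

static toy_object
toy_Fadd1 (toy_object x)
{
  return x + 1;
}

static toy_object
toy_Fsum (int nargs, toy_object *args)
{
  toy_object total = 0;
  int i;

  for (i = 0; i < nargs; i++)
    total += args[i];
  return total;
}

/* Counterparts of the SF... structures DEFUN would emit. */
static const struct toy_subr toy_SFadd1 =
  { "1+", 1, 1, (toy_fnptr) toy_Fadd1 };
static const struct toy_subr toy_SFsum =
  { "+", 0, TOY_MANY, (toy_fnptr) toy_Fsum };
@end verbatim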

Defining the C function is not enough to make a Lisp primitive
available; you must also create the Lisp symbol for the primitive (the
symbol is @dfn{interned}; @pxref{Obarrays}) and store a suitable subr
object in its function cell.  (If you don't do this, the primitive won't
be seen by Lisp code.)  The code looks like this:

@example
DEFSUBR (@var{fname});
@end example

@noindent
Here @var{fname} is the same name you used as the second argument to
@code{DEFUN}.

This call to @code{DEFSUBR} should go in the @code{syms_of_*()} function
at the end of the module.  If no such function exists, create it and
make sure to also declare it in @file{symsinit.h} and call it from the
appropriate spot in @code{main()}.  @xref{General Coding Rules}.
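
The effect of @code{DEFSUBR} can be sketched as interning a name and
filling the symbol's function cell.  This toy version is not XEmacs code:
@code{toy_intern} is a hypothetical fixed-size table with linear search
standing in for @code{obarray}, and @code{toy_syms_of_module} plays the
role of a module's @code{syms_of_*()} function.  A symbol whose function
cell is still empty is what Lisp code sees if the registration is
forgotten.

@verbatim
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Minimal hypothetical subr descriptor. */
struct toy_subr
{
  const char *name;
  int min_args, max_args;
};

struct toy_symbol
{
  const char *name;
  const struct toy_subr *function;       /* function cell; NULL = void */
};

#define TOY_OBARRAY_SIZE 64
static struct toy_symbol toy_obarray[TOY_OBARRAY_SIZE];
static int toy_nsymbols;

/* Return the symbol named NAME, creating it if it does not exist yet. */
static struct toy_symbol *
toy_intern (const char *name)
{
  int i;

  for (i = 0; i < toy_nsymbols; i++)
    if (strcmp (toy_obarray[i].name, name) == 0)
      return &toy_obarray[i];
  assert (toy_nsymbols < TOY_OBARRAY_SIZE);
  toy_obarray[toy_nsymbols].name = name;
  toy_obarray[toy_nsymbols].function = NULL;
  return &toy_obarray[toy_nsymbols++];
}

/* Analogue of DEFSUBR: make SUBR reachable under its Lisp name. */
static void
toy_defsubr (const struct toy_subr *subr)
{
  toy_intern (subr->name)->function = subr;
}

static const struct toy_subr toy_SFadd1 = { "1+", 1, 1 };

/* Analogue of one module's syms_of_*() function. */
static void
toy_syms_of_module (void)
{
  toy_defsubr (&toy_SFadd1);
}
@end verbatim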

Note that C code cannot call functions by name unless they are defined
in C.  The way to call a function written in Lisp from C is to use
@code{Ffuncall}, which embodies the Lisp function @code{funcall}.  Since
the Lisp function @code{funcall} accepts an unlimited number of
arguments, in C it takes two: the number of Lisp-level arguments, and a
one-dimensional array containing their values.  The first Lisp-level
argument is the Lisp function to call, and the rest are the arguments to
pass to it.  Since @code{Ffuncall} can call the evaluator, you must
protect pointers from garbage collection around the call to
@code{Ffuncall}.  (However, @code{Ffuncall} explicitly protects all of
its parameters, so you don't have to protect any pointers passed as
parameters to it.)

The C functions @code{call0}, @code{call1}, @code{call2}, and so on,
provide a convenient way to call a Lisp function with a fixed number of
arguments.  They work by calling @code{Ffuncall}.
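
A minimal sketch of this calling convention follows, assuming a toy
object type in which a ``function'' is just a C function pointer (real
Lisp functions are Lisp objects); @code{toy_funcall}, @code{toy_call2},
and @code{toy_plus} are hypothetical names.  As with @code{Ffuncall},
element 0 of the array is the function and the remaining elements are
its arguments; the fixed-arity wrapper only packs its parameters into
such an array.

@verbatim
#include <assert.h>

/* Toy object: an integer, or a function taking (nargs, args). */
typedef struct toy_object toy_object;
typedef toy_object (*toy_fn) (int nargs, toy_object *args);
struct toy_object
{
  long num;
  toy_fn fn;                             /* non-NULL for "functions" */
};

/* Like Ffuncall: ARGS[0] is the function, ARGS[1..NARGS-1] its
   arguments. */
static toy_object
toy_funcall (int nargs, toy_object *args)
{
  assert (nargs >= 1 && args[0].fn != 0);
  return args[0].fn (nargs - 1, args + 1);
}

/* Like call2: build the argument block and hand it to toy_funcall. */
static toy_object
toy_call2 (toy_object fn, toy_object a1, toy_object a2)
{
  toy_object args[3];

  args[0] = fn;
  args[1] = a1;
  args[2] = a2;
  return toy_funcall (3, args);
}

/* A sample function with no upper limit on its arguments. */
static toy_object
toy_plus (int nargs, toy_object *args)
{
  toy_object result = { 0, 0 };
  int i;

  for (i = 0; i < nargs; i++)
    result.num += args[i].num;
  return result;
}
@end verbatim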

@file{eval.c} is a very good file to look through for examples;
@file{lisp.h} contains the definitions for important macros and
functions.

@node Adding Global Lisp Variables
@section Adding Global Lisp Variables

Global variables whose names begin with @samp{Q} are constants whose
value is a symbol of a particular name.  The name of the variable should
be derived from the name of the symbol using the same rules as for Lisp
primitives.  These variables are initialized using a call to
@code{defsymbol()} in the @code{syms_of_*()} function.  (This call
interns a symbol, sets the C variable to the resulting Lisp object, and
calls @code{staticpro()} on the C variable to tell the
garbage-collection mechanism about this variable.  What
@code{staticpro()} does is add a pointer to the variable to a large
global array; when garbage collection happens, all pointers listed in
the array are used as starting points for marking Lisp objects.  This is
important because it's quite possible that the only current reference to
the object is the C variable.  In the case of symbols, the
@code{staticpro()} doesn't matter all that much because the symbol is
contained in @code{obarray}, which is itself @code{staticpro()}ed.
However, it's possible that a naughty user could do something like
uninterning the symbol out of @code{obarray} or even setting
@code{obarray} to a different value [although this is likely to make
XEmacs crash!].)
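
The root table that @code{staticpro()} maintains can be sketched as an
array of variable addresses that the mark phase reads.  This is an
illustration under stated assumptions only: @code{toy_staticpro},
@code{toy_mark_roots}, the fixed table size, and the @code{marked} flag
are hypothetical, and a real collector would go on to mark everything
reachable from each root.

@verbatim
#include <assert.h>
#include <stddef.h>

/* Toy heap object; NULL plays the role of "no object". */
struct toy_lisp_obj
{
  int marked;
};
typedef struct toy_lisp_obj *toy_object;

#define TOY_MAX_ROOTS 128
static toy_object *toy_staticvec[TOY_MAX_ROOTS];
static int toy_staticidx;

/* Remember the address of a C global so the collector treats the
   object it currently holds as a root. */
static void
toy_staticpro (toy_object *varaddress)
{
  assert (toy_staticidx < TOY_MAX_ROOTS);
  toy_staticvec[toy_staticidx++] = varaddress;
}

/* Mark phase, roots only: read each registered variable's value now. */
static void
toy_mark_roots (void)
{
  int i;

  for (i = 0; i < toy_staticidx; i++)
    if (*toy_staticvec[i] != NULL)
      (*toy_staticvec[i])->marked = 1;
}

/* Hypothetical module globals in the style of Qfoo and Vbar. */
static toy_object Qtoy_symbol;
static toy_object Vtoy_unprotected;
@end verbatim

Because the table holds the variable's address rather than its current
value, later assignments to @code{Qtoy_symbol} are still seen when the
collector runs, while an unregistered global such as
@code{Vtoy_unprotected} is invisible to it.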

  @strong{Please note:} It is potentially deadly if you declare a
@samp{Q...} variable in two different modules.  The two calls to
@code{defsymbol()} are no problem, but some linkers will complain about
multiply-defined symbols.  The most insidious aspect of this is that
often the link will succeed anyway, but then the resulting executable
will sometimes crash in obscure ways during certain operations!  To
avoid this problem, declare any symbols with common names (such as
@code{text}) that are not obviously associated with this particular
module in the module @file{general.c}.

  Global variables whose names begin with @samp{V} are variables that
contain Lisp objects.  The convention here is that all global variables
of type @code{Lisp_Object} begin with @samp{V}, and all others don't
(including integer and boolean variables that have Lisp
equivalents).  Most of the time, these variables have equivalents in
Lisp, but some don't.  Those that do are declared this way by a call to
@code{DEFVAR_LISP()} in the @code{vars_of_*()} initializer for the
module.  What this does is create a special @dfn{symbol-value-forward}
Lisp object that contains a pointer to the C variable, intern a symbol
whose name is as specified in the call to @code{DEFVAR_LISP()}, and set
its value to the symbol-value-forward Lisp object; it also calls
@code{staticpro()} on the C variable to tell the garbage-collection
mechanism about the variable.  When @code{eval} (or actually
@code{symbol-value}) encounters this special object in the process of
retrieving a variable's value, it follows the indirection to the C
variable and gets its value.  @code{setq} does similar things so that
the C variable gets changed.
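
The forwarding can be sketched with a toy symbol whose value slot may
instead point at a C variable.  The structure, its @code{forward} field,
and @code{toy_symbol_value} and @code{toy_set} are hypothetical
simplifications: in XEmacs the symbol-value-forward object is itself a
Lisp object stored as the symbol's value, not a separate field.

@verbatim
#include <assert.h>
#include <stddef.h>

typedef long toy_object;                 /* stand-in for Lisp_Object */

struct toy_symbol
{
  const char *name;
  toy_object value;                      /* used when forward is NULL */
  toy_object *forward;                   /* non-NULL: value lives here */
};

/* Like symbol-value: follow the indirection if there is one. */
static toy_object
toy_symbol_value (const struct toy_symbol *sym)
{
  return sym->forward ? *sym->forward : sym->value;
}

/* Like setq: write through to the C variable when forwarded. */
static void
toy_set (struct toy_symbol *sym, toy_object val)
{
  if (sym->forward)
    *sym->forward = val;
  else
    sym->value = val;
}

/* A C global and the symbol a DEFVAR_LISP-like call would set up. */
static toy_object Vtoy_fill_column;
static struct toy_symbol toy_Qfill_column =
  { "toy-fill-column", 0, &Vtoy_fill_column };
@end verbatim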

  Whether or not you @code{DEFVAR_LISP()} a variable, you need to
initialize it in the @code{vars_of_*()} function; otherwise it will end
up as all zeroes, which is the integer 0 (@emph{not} @code{nil}), and
this is probably not what you want.  Also, if the variable is not
@code{DEFVAR_LISP()}ed, @strong{you must call} @code{staticpro()} on the
C variable in the @code{vars_of_*()} function.  Otherwise, the
garbage-collection mechanism won't know that the object in this variable
is in use, and will happily collect it and reuse its storage for another
Lisp object, and you will be the one who's unhappy when you can't figure
out how your variable got overwritten.

@node Coding for Mule
@section Coding for Mule
@cindex Coding for Mule

Although Mule support is not compiled in by default in XEmacs, many
people are using it, and we consider it crucial that new code works
correctly with multibyte characters.  This is not hard; it is only a
matter of following several simple coding guidelines.  Even if you never
compile with Mule, with a little practice you will find it quite easy
to code Mule-correctly.

Note that these guidelines are not necessarily tied to the current Mule
implementation; they are also worth following because they make the
code more general, which will help future I18N work.

@menu
* Character-Related Data Types::
* Working With Character and Byte Positions::
* Conversion to and from External Data::
* General Guidelines for Writing Mule-Aware Code::
* An Example of Mule-Aware Code::
@end menu

@node Character-Related Data Types
@subsection Character-Related Data Types

First, let's review the basic character-related datatypes used by
XEmacs.  Note that the separate @code{typedef}s are not mandatory in the
current implementation (all of them boil down to @code{unsigned char} or
@code{int}), but they improve clarity of code a great deal, because one
glance at the declaration can tell the intended use of the variable.

@table @code
@item Emchar
@cindex Emchar
An @code{Emchar} holds a single Emacs character.

Obviously, the one-to-one correspondence between characters and bytes is
lost in the Mule world.  Characters can be represented by one or more
bytes in the buffer, and @code{Emchar} is the C type large enough to hold
any character.

Without Mule support, an @code{Emchar} is equivalent to an
@code{unsigned char}.

@item Bufbyte
@cindex Bufbyte
The data representing the text in a buffer or string is logically a
sequence of @code{Bufbyte}s.

XEmacs does not work with external character formats internally; when
reading characters from the outside, it decodes them to an internal
format, and likewise encodes them when writing.  @code{Bufbyte} (in fact
@code{unsigned char}) is the basic unit of the internal format used for
XEmacs buffers and strings.

One character can correspond to one or more @code{Bufbyte}s.  In the
current implementation, an ASCII character is represented by a single
@code{Bufbyte} with the same value, and extended characters are
represented by a sequence of @code{Bufbyte}s.

Without Mule support, a @code{Bufbyte} is equivalent to an
@code{Emchar}.

@item Bufpos
@itemx Charcount
@cindex Bufpos
@cindex Charcount
A @code{Bufpos} represents a character position in a buffer or string.
A @code{Charcount} represents a number (count) of characters.
Logically, subtracting two @code{Bufpos} values yields a
@code{Charcount} value.  Although both are @code{typedef}ed to
@code{int}, we use them in preference to @code{int} to make it clear
what sort of position is being used.

@code{Bufpos} and @code{Charcount} values are the only ones that are
ever visible to Lisp.

@item Bytind
@itemx Bytecount
@cindex Bytind
@cindex Bytecount
A @code{Bytind} represents a byte position in a buffer or string.  A
@code{Bytecount} represents the distance between two positions in bytes.
The relationship between @code{Bytind} and @code{Bytecount} is the same
as the relationship between @code{Bufpos} and @code{Charcount}.

@item Extbyte
@itemx Extcount
@cindex Extbyte
@cindex Extcount
When dealing with the outside world, XEmacs works with @code{Extbyte}s,
which are equivalent to @code{unsigned char}.  An @code{Extcount} is the
distance between two positions in external data, measured in
@code{Extbyte}s.  @code{Extbyte}s and @code{Extcount}s are not all that
frequent in XEmacs code.
@end table

@node Working With Character and Byte Positions
@subsection Working With Character and Byte Positions

Now that we have defined the basic character-related types, we can look
at the macros and functions designed for working with them and for
converting between them.  Most of these macros are defined in
@file{buffer.h}, and we don't discuss all of them here, but only the
most important ones.  Examining the existing code is the best way to
learn about them.

@table @code
@item MAX_EMCHAR_LEN
@cindex MAX_EMCHAR_LEN
This preprocessor constant is the maximum number of buffer bytes per
Emacs character, i.e. the maximum byte length of an @code{Emchar}.  It
is useful when allocating temporary strings to hold a known number of
characters.  For instance:

@example
@group
@{
  Charcount cclen;
  ...
  @{
    /* Allocate space for @var{cclen} characters. */
    Bufbyte *buf = (Bufbyte *) alloca (cclen * MAX_EMCHAR_LEN);
    ...
  @}
@}
@end group
@end example

If you followed the previous section, you can guess that, logically,
multiplying a @code{Charcount} value by @code{MAX_EMCHAR_LEN} produces
a @code{Bytecount} value.

In the current Mule implementation, @code{MAX_EMCHAR_LEN} equals 4.
Without Mule, it is 1.

@item charptr_emchar
@itemx set_charptr_emchar
@cindex charptr_emchar
@cindex set_charptr_emchar
The @code{charptr_emchar} macro takes a @code{Bufbyte} pointer and
returns the @code{Emchar} stored at that position.  If it were a
function, its prototype would be:

@example
Emchar charptr_emchar (Bufbyte *p);
@end example

@code{set_charptr_emchar} stores an @code{Emchar} at the specified byte
position.  It returns the number of bytes stored:

@example
Bytecount set_charptr_emchar (Bufbyte *p, Emchar c);
@end example

It is important to note that @code{set_charptr_emchar} is safe only for
appending a character at the end of a buffer, not for overwriting a
character in the middle.  This is because the width of characters
varies, and @code{set_charptr_emchar} cannot resize the string if it
writes, say, a two-byte character where a single-byte character used to
reside.

A typical use of @code{set_charptr_emchar} can be demonstrated by this
example, which copies the characters between positions @var{beg} and
@var{end} of buffer @var{buf} into a temporary string of
@code{Bufbyte}s.

@example
@group
@{
  Bufpos pos;
  Bufbyte *storage = alloca_array (Bufbyte, (end - beg) * MAX_EMCHAR_LEN);
  Bufbyte *p = storage;

  for (pos = beg; pos < end; pos++)
    @{
      Emchar c = BUF_FETCH_CHAR (buf, pos);
      p += set_charptr_emchar (p, c);
    @}
@}
@end group
@end example

Note how @code{set_charptr_emchar} is used to store the @code{Emchar}
and advance the pointer @code{p} at the same time.

@item INC_CHARPTR
@itemx DEC_CHARPTR
@cindex INC_CHARPTR
@cindex DEC_CHARPTR
These two macros increment and decrement a @code{Bufbyte} pointer,
respectively.  They will adjust the pointer by the appropriate number of
bytes according to the byte length of the character stored there.  Both
macros assume that the pointer points to the beginning of a valid
character.

Without Mule support, @code{INC_CHARPTR (p)} and @code{DEC_CHARPTR (p)}
simply expand to @code{p++} and @code{p--}, respectively.

@item bytecount_to_charcount
@cindex bytecount_to_charcount
Given a pointer to a text string and a length in bytes, return the
equivalent length in characters.

@example
Charcount bytecount_to_charcount (Bufbyte *p, Bytecount bc);
@end example

@item charcount_to_bytecount
@cindex charcount_to_bytecount
Given a pointer to a text string and a length in characters, return the
equivalent length in bytes.

@example
Bytecount charcount_to_bytecount (Bufbyte *p, Charcount cc);
@end example

@item charptr_n_addr
@cindex charptr_n_addr
Return a pointer to the beginning of the character that is @var{cc}
characters after @var{p}.

@example
Bufbyte *charptr_n_addr (Bufbyte *p, Charcount cc);
@end example
@end table
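
To see how macros of this kind can be built on a variable-width
encoding, here is a self-contained sketch.  It uses UTF-8 purely as a
stand-in; this is @emph{not} the internal Mule format, and every name
(the @code{toy_} types and functions and @code{TOY_MAX_EMCHAR_LEN}) is
hypothetical.  What carries over is the structure: the byte length of a
character is derived from its first byte, so stepping forward needs no
extra bookkeeping, while converting between character and byte counts
means walking the text.

@verbatim
#include <assert.h>

/* Hypothetical stand-ins for Emchar, Bufbyte, Bytecount, Charcount. */
typedef int toy_emchar;
typedef unsigned char toy_bufbyte;
typedef int toy_bytecount;
typedef int toy_charcount;

/* Longest UTF-8 sequence; plays the role of MAX_EMCHAR_LEN. */
#define TOY_MAX_EMCHAR_LEN 4

/* Byte length of the character whose first byte is B. */
static int
toy_rep_bytes (toy_bufbyte b)
{
  assert ((b & 0xC0) != 0x80);   /* must be at the start of a character */
  return b < 0x80 ? 1 : b < 0xE0 ? 2 : b < 0xF0 ? 3 : 4;
}

/* Store C at P and return the number of bytes written. */
static toy_bytecount
toy_set_charptr_emchar (toy_bufbyte *p, toy_emchar c)
{
  static const toy_bufbyte lead[5] = { 0, 0x00, 0xC0, 0xE0, 0xF0 };
  int len = c < 0x80 ? 1 : c < 0x800 ? 2 : c < 0x10000 ? 3 : 4;
  int i;

  for (i = len - 1; i > 0; i--, c >>= 6)
    p[i] = (toy_bufbyte) (0x80 | (c & 0x3F));
  p[0] = (toy_bufbyte) (lead[len] | c);
  return len;
}

/* Return the character that starts at P. */
static toy_emchar
toy_charptr_emchar (const toy_bufbyte *p)
{
  int len = toy_rep_bytes (p[0]);
  toy_emchar c = len == 1 ? p[0] : p[0] & (0xFF >> (len + 1));
  int i;

  for (i = 1; i < len; i++)
    c = (c << 6) | (p[i] & 0x3F);
  return c;
}

/* Step over one whole character.  Going backward relies on UTF-8
   continuation bytes having the bit pattern 10xxxxxx. */
#define TOY_INC_CHARPTR(p) ((p) += toy_rep_bytes (*(p)))
#define TOY_DEC_CHARPTR(p) do { (p)--; } while ((*(p) & 0xC0) == 0x80)

/* Number of characters in the BC bytes starting at P. */
static toy_charcount
toy_bytecount_to_charcount (const toy_bufbyte *p, toy_bytecount bc)
{
  const toy_bufbyte *end = p + bc;
  toy_charcount cc = 0;

  while (p < end)
    {
      TOY_INC_CHARPTR (p);
      cc++;
    }
  return cc;
}

/* Number of bytes occupied by the CC characters starting at P. */
static toy_bytecount
toy_charcount_to_bytecount (const toy_bufbyte *p, toy_charcount cc)
{
  const toy_bufbyte *start = p;

  while (cc-- > 0)
    TOY_INC_CHARPTR (p);
  return (toy_bytecount) (p - start);
}

/* Address of the character that is CC characters after P. */
static const toy_bufbyte *
toy_charptr_n_addr (const toy_bufbyte *p, toy_charcount cc)
{
  return p + toy_charcount_to_bytecount (p, cc);
}
@end verbatim

In this sketch @code{toy_charcount_to_bytecount} takes time proportional
to the number of characters skipped, which is one reason to iterate with
a pointer and @code{TOY_INC_CHARPTR} instead of recomputing addresses
from character offsets.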

@node Conversion to and from External Data
@subsection Conversion to and from External Data

When an external function, such as a C library function, returns a
@code{char} pointer, you should almost never treat it as @code{Bufbyte}.
This is because these returned strings may contain 8-bit characters
which can be misinterpreted by XEmacs and cause a crash.  Likewise, when
exporting a piece of internal text to the outside world, you should
always convert it to an appropriate external encoding, lest the internal
stuff (such as the infamous \201 characters) leak out.

The interface for converting between the internal and external
representations of text consists of the numerous conversion macros
defined in @file{buffer.h}.  Before looking at them, we'll look at the
external formats supported by these macros.

Currently meaningful formats are @code{FORMAT_BINARY},
@code{FORMAT_FILENAME}, @code{FORMAT_OS}, and @code{FORMAT_CTEXT}.  Here
is a description of these.

@table @code
@item FORMAT_BINARY
Binary format.  This is the simplest format and is what we use in the
absence of a more appropriate format.  This converts according to the
@code{binary} coding system:

@enumerate a
@item
On input, bytes 0--255 are converted into characters 0--255.
@item
On output, characters 0--255 are converted into bytes 0--255 and other
characters are converted into `X'.
@end enumerate

@item FORMAT_FILENAME
Format used for filenames.  In the original Mule, this is user-definable
with the @code{pathname-coding-system} variable.  For the moment, we
just use the @code{binary} coding system.

@item FORMAT_OS
Format used for the external Unix environment---@code{argv[]}, stuff
from @code{getenv()}, stuff from the @file{/etc/passwd} file, etc.

This should perhaps be the same as @code{FORMAT_FILENAME}.

@item FORMAT_CTEXT
Compound Text format.  This is the standard X format used for data
stored in properties, selections, and the like.  This is an 8-bit
no-lock-shift ISO 2022 coding system.
@end table
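
The two rules for @code{FORMAT_BINARY} can be written down directly.
This sketch works on arrays of code points rather than on internal
@code{Bufbyte} text, and the function names are hypothetical; it only
makes explicit that the output direction is lossy.

@verbatim
#include <assert.h>
#include <stddef.h>

typedef int toy_emchar;                  /* stand-in for Emchar */

/* Input: each byte 0..255 becomes the character with the same code. */
static void
toy_binary_decode (const unsigned char *in, size_t len, toy_emchar *out)
{
  size_t i;

  for (i = 0; i < len; i++)
    out[i] = in[i];
}

/* Output: characters 0..255 become the same byte; any other character
   is replaced by 'X', so information can be lost in this direction. */
static void
toy_binary_encode (const toy_emchar *in, size_t len, unsigned char *out)
{
  size_t i;

  for (i = 0; i < len; i++)
    out[i] = (in[i] >= 0 && in[i] <= 255) ? (unsigned char) in[i] : 'X';
}
@end verbatim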

The macros to convert between these formats and the internal format, and
vice versa, follow.

@table @code
@item GET_CHARPTR_INT_DATA_ALLOCA
@itemx GET_CHARPTR_EXT_DATA_ALLOCA
These two are the most basic conversion macros.
@code{GET_CHARPTR_INT_DATA_ALLOCA} converts external data to internal
format, and @code{GET_CHARPTR_EXT_DATA_ALLOCA} converts the other way
around.  The arguments each of these receives are @var{ptr} (pointer to
the text to be converted), @var{len} (length of the text in bytes),
@var{fmt} (format of the external text), @var{ptr_out} (lvalue that
receives the converted text), and @var{len_out} (lvalue which will be
assigned the length of the converted text in bytes).  The resulting text
is stored in a stack-allocated buffer.  If the text doesn't need
changing, these macros will do nothing, except for setting
@var{len_out}.

The macros above take many arguments, which makes them unwieldy.  For
this reason, a number of convenience macros are defined with obvious
functionality, but accepting fewer arguments.  The general rule is that
macros with @samp{INT} in their name convert text to internal Emacs
representation, whereas the @samp{EXT} macros convert to external
representation.

@item GET_C_CHARPTR_INT_DATA_ALLOCA
@itemx GET_C_CHARPTR_EXT_DATA_ALLOCA
As their names imply, these macros work on C char pointers, which are
zero-terminated, and thus do not need @var{len} or @var{len_out}
parameters.

@item GET_STRING_EXT_DATA_ALLOCA
@itemx GET_C_STRING_EXT_DATA_ALLOCA
These two macros convert a Lisp string into an external representation.
The difference between them is that @code{GET_STRING_EXT_DATA_ALLOCA}
stores its output in a generic string, providing @var{len_out}, the
length of the resulting external string.  On the other hand,
@code{GET_C_STRING_EXT_DATA_ALLOCA} assumes that the caller will be
satisfied with the output string being zero-terminated.

Note that for Lisp strings only one conversion direction makes sense.

@item GET_C_CHARPTR_EXT_BINARY_DATA_ALLOCA
@itemx GET_CHARPTR_EXT_BINARY_DATA_ALLOCA
@itemx GET_STRING_BINARY_DATA_ALLOCA
@itemx GET_C_STRING_BINARY_DATA_ALLOCA
@itemx GET_C_CHARPTR_EXT_FILENAME_DATA_ALLOCA
@itemx ...
These macros convert internal text to a specific external
representation, with the external format being encoded into the name of
the macro.  Note that the @code{GET_STRING_...} and
@code{GET_C_STRING_...} macros lack the @samp{EXT} tag, because they
only make sense in that direction.

@item GET_C_CHARPTR_INT_BINARY_DATA_ALLOCA
@itemx GET_CHARPTR_INT_BINARY_DATA_ALLOCA
@itemx GET_C_CHARPTR_INT_FILENAME_DATA_ALLOCA
@itemx ...
These macros convert external text of a specific format to its internal
representation, with the external format being encoded into the name of
the macro.
@end table

@node General Guidelines for Writing Mule-Aware Code
@subsection General Guidelines for Writing Mule-Aware Code

This section contains some general guidance on how to write Mule-aware
code, as well as some pitfalls you should avoid.

@table @emph
@item Never use @code{char} and @code{char *}.
In XEmacs, the use of @code{char} and @code{char *} is almost always a
mistake.  If you want to manipulate an Emacs character from C, use
@code{Emchar}.  If you want to examine a specific octet in the internal
format, use @code{Bufbyte}.  If you want a Lisp-visible character, use a
@code{Lisp_Object} and @code{make_char}.  If you want a pointer to move
through the internal text, use @code{Bufbyte *}.  Also note that you
almost certainly do not need @code{Emchar *}.

@item Be careful not to confuse @code{Charcount}, @code{Bytecount}, and @code{Bufpos}.
The whole point of using different types is to avoid confusion about the
use of certain variables.  Lest this effect be nullified, you need to be
careful about using the right types.

@item Always convert external data
It is extremely important to always convert external data, because
XEmacs can crash if unexpected 8-bit sequences are copied to its
internal buffers literally.

This means that when a system function, such as @code{readdir}, returns
a string, you need to convert it using one of the conversion macros
described in the previous section, before passing it further to Lisp.
In the case of @code{readdir}, you would use the
@code{GET_C_CHARPTR_INT_FILENAME_DATA_ALLOCA} macro.

Also note that many internal functions, such as @code{make_string},
accept @code{Bufbyte}s, which removes the need for them to convert the
data they receive.  This increases efficiency because that way external
data needs to be decoded only once, when it is read.  After that, it is
passed around in internal format.
@end table

@node An Example of Mule-Aware Code
@subsection An Example of Mule-Aware Code

As an example of Mule-aware code, we shall analyze the
@code{string} function, which conses up a Lisp string from the character
arguments it receives.  Here is the definition, pasted from
@file{alloc.c}:

@example
@group
DEFUN ("string", Fstring, 0, MANY, 0, /*
Concatenate all the argument characters and make the result a string.
*/
       (int nargs, Lisp_Object *args))
@{
  Bufbyte *storage = alloca_array (Bufbyte, nargs * MAX_EMCHAR_LEN);
  Bufbyte *p = storage;

  for (; nargs; nargs--, args++)
    @{
      Lisp_Object lisp_char = *args;
      CHECK_CHAR_COERCE_INT (lisp_char);
      p += set_charptr_emchar (p, XCHAR (lisp_char));
    @}
  return make_string (storage, p - storage);
@}
@end group
@end example

Now we can analyze the source line by line.

The resulting string will contain as many characters as there are
arguments to the function.  This is why we allocate
@code{MAX_EMCHAR_LEN} * @var{nargs} bytes on the stack, i.e. the
worst-case number of bytes for @var{nargs} @code{Emchar}s to fit in the
string.

Then, the loop checks that each element is a character, converting
integers in the process.  Like many other functions in XEmacs, this
function silently accepts integers where characters are expected, for
historical and compatibility reasons.  Unless you have a specific reason
to accept integers as well, @code{CHECK_CHAR} will suffice in new code.
@code{XCHAR (lisp_char)} extracts the @code{Emchar} from the
@code{Lisp_Object}, and @code{set_charptr_emchar} stores it in
@code{storage}, advancing @code{p} in the process.
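
Stripped of the Lisp machinery, the same allocation strategy can be
tried in standalone C.  In this sketch the arguments are plain code
points, @code{malloc} replaces @code{alloca_array}, UTF-8 again stands in
for the internal encoding, and all names are hypothetical.  The buffer is
sized for the worst case of @var{nargs} times the maximum character
length, and the byte count returned through @code{len_out} plays the role
of @code{p - storage}.

@verbatim
#include <assert.h>
#include <stdlib.h>

/* Longest UTF-8 sequence; plays the role of MAX_EMCHAR_LEN. */
#define TOY_MAX_EMCHAR_LEN 4

/* UTF-8 stand-in for set_charptr_emchar: store C at P, return the
   number of bytes used. */
static int
toy_store_char (unsigned char *p, int c)
{
  static const unsigned char lead[5] = { 0, 0x00, 0xC0, 0xE0, 0xF0 };
  int len = c < 0x80 ? 1 : c < 0x800 ? 2 : c < 0x10000 ? 3 : 4;
  int i;

  for (i = len - 1; i > 0; i--, c >>= 6)
    p[i] = (unsigned char) (0x80 | (c & 0x3F));
  p[0] = (unsigned char) (lead[len] | c);
  return len;
}

/* Counterpart of Fstring: encode NARGS characters from ARGS into a
   buffer sized for the worst case and report the bytes actually used
   in *LEN_OUT.  Returns NULL if allocation fails; the caller frees. */
static unsigned char *
toy_string (int nargs, const int *args, int *len_out)
{
  /* + 1 so that malloc is never asked for zero bytes. */
  unsigned char *storage = malloc ((size_t) nargs * TOY_MAX_EMCHAR_LEN + 1);
  unsigned char *p = storage;

  if (storage == NULL)
    return NULL;
  for (; nargs; nargs--, args++)
    p += toy_store_char (p, *args);
  *len_out = (int) (p - storage);
  return storage;
}
@end verbatim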

Other instructive examples of correct coding under Mule can be found all
over the XEmacs code.  For starters, I recommend
@code{Fnormalize_menu_item_name} in @file{menubar.c}.  After you have
understood this section of the manual and studied the examples, you can
proceed to writing new Mule-aware code.

@node Techniques for XEmacs Developers
@section Techniques for XEmacs Developers

To make a quantified XEmacs, do: @code{make quantmacs}.

You simply can't dump Quantified and Purified images.  Run the image
like so:  @code{quantmacs -batch -l loadup.el run-temacs @var{xemacs-args...}}.

Before you go through the trouble, are you compiling with all
debugging and error-checking off?  If not, try that first.  Be warned
that while Quantify is directly responsible for quite a few
optimizations which have been made to XEmacs, producing a run whose
results can actually be acted upon is not necessarily a trivial task.

Also, if you're still willing to do some runs, make sure you configure
with the @samp{--quantify} flag.  That will keep Quantify from starting
to record data until after the loadup is completed and will shut off
recording right before XEmacs shuts down (which generates enough bogus
data to throw most results off).  It also enables three additional elisp
commands: @code{quantify-start-recording-data},
@code{quantify-stop-recording-data} and @code{quantify-clear-data}.

If you want to make XEmacs faster, target your favorite slow benchmark,
run a profiler like Quantify, @code{gprof}, or @code{tcov}, and figure
out where the cycles are going.  Specific projects:

@itemize @bullet
@item
Make the garbage collector faster.  Figure out how to write an
incremental garbage collector.
@item
Write a compiler that takes bytecode and spits out C code.
Unfortunately, you will then need a C compiler and a more fully
developed module system.
@item
Speed up redisplay.
@item
Speed up syntax highlighting.  Maybe moving some of the syntax
highlighting capabilities into C would make a difference.
@item
Implement tail recursion in Emacs Lisp (hard!).
@end itemize

Unfortunately, Emacs Lisp is slow, and is going to stay slow.  Function
calls in elisp are especially expensive.  Iterating over a long list is
going to be 30 times faster implemented in C than in Elisp.

To get started debugging XEmacs, take a look at the @file{.gdbinit} and
@file{.dbxrc} files in the @file{src} directory.
@xref{Q2.1.15 - How to Debug an XEmacs problem with a debugger,,,
xemacs-faq, XEmacs FAQ}.

After making source code changes, run @code{make check} to ensure that
you haven't introduced any regressions.  If you're feeling ambitious,
you can try to improve the test suite in @file{tests/automated}.

Here are things to know when you create a new source file:

@itemize @bullet
@item
All @file{.c} files should @code{#include <config.h>} first.  Almost all
@file{.c} files should @code{#include "lisp.h"} second.

@item
Generated header files should be included using the @code{#include <...>} syntax,
not the @code{#include "..."} syntax.  The generated headers are:

@file{config.h}, @file{puresize-adjust.h}, @file{sheap-adjust.h},
@file{paths.h}, and @file{Emacs.ad.h}.

The basic rule is that you should assume builds using @code{--srcdir},
so the @code{#include <...>} syntax needs to be used whenever the
to-be-included generated file may be in a different directory
@emph{at compile time}.  The non-obvious C rule is that @code{#include "..."}
means to search for the included file in the same directory as the
including file, @emph{not} in the current directory.

@item
Header files should @emph{not} include @code{<config.h>} and
@code{"lisp.h"}.  It is the responsibility of the @file{.c} files that
use them to do so.

@item
If the header uses @code{INLINE}, either directly or through
@code{DECLARE_LRECORD}, then it must be added to @file{inline.c}'s
includes.

@item
Try configuring and compiling at least once with

@example
configure --with-mule --with-union-type --error-checking=all
@end example

@item
Did I mention that you should run the test suite?
@example
make check
@end example
@end itemize


@node A Summary of the Various XEmacs Modules, Allocation of Objects in XEmacs Lisp, Rules When Writing New C Code, Top
@chapter A Summary of the Various XEmacs Modules

  This is accurate as of XEmacs 20.0.

@menu
* Low-Level Modules::
* Basic Lisp Modules::
* Modules for Standard Editing Operations::
* Editor-Level Control Flow Modules::
* Modules for the Basic Displayable Lisp Objects::
* Modules for other Display-Related Lisp Objects::
* Modules for the Redisplay Mechanism::
* Modules for Interfacing with the File System::
* Modules for Other Aspects of the Lisp Interpreter and Object System::
* Modules for Interfacing with the Operating System::
* Modules for Interfacing with X Windows::
* Modules for Internationalization::
@end menu

@node Low-Level Modules
@section Low-Level Modules

@example
config.h
@end example

This is automatically generated from @file{config.h.in} based on the
results of configure tests and user-selected optional features, and
contains preprocessor definitions specifying the nature of the
environment in which XEmacs is being compiled.



@example
paths.h
@end example

This is automatically generated from @file{paths.h.in} based on supplied
configure values, and allows for non-standard installed configurations
of the XEmacs directories.  It's currently broken, though.



@example
emacs.c
signal.c
@end example

@file{emacs.c} contains @code{main()} and other code that performs the most
basic environment initializations and handles shutting down the XEmacs
process (this includes @code{kill-emacs}, the normal way that XEmacs is
exited; @code{dump-emacs}, which is used during the build process to
write out the XEmacs executable; @code{run-emacs-from-temacs}, which can
be used to start XEmacs directly when temacs has finished loading all
the Lisp code; and emergency code to handle crashes [XEmacs tries to
auto-save all files before it crashes]).

Low-level code that directly interacts with the Unix signal mechanism,
however, is in @file{signal.c}.  Note that this code does not handle system
dependencies in interfacing to signals; that is handled using the
@file{syssignal.h} header file, described in section J below.
2778
2779
2780
@example
unexaix.c
unexalpha.c
unexapollo.c
unexconvex.c
unexec.c
unexelf.c
unexelfsgi.c
unexencap.c
unexenix.c
unexfreebsd.c
unexfx2800.c
unexhp9k3.c
unexhp9k800.c
unexmips.c
unexnext.c
unexsol2.c
unexsunos4.c
@end example

These modules contain code for dumping out the XEmacs executable on
various different systems.  (This process is highly machine-specific and
requires intimate knowledge of the executable format and the memory map
of the process.)  Only one of these modules is actually used; the choice
is made by @file{configure}.



@example
crt0.c
lastfile.c
pre-crt0.c
@end example

These modules are used in conjunction with the dump mechanism.  On some
systems, an alternative version of the C startup code (the actual code
that receives control from the operating system when the process is
started, and which calls @code{main()}) is required so that the dumping
process works properly; @file{crt0.c} provides this.

@file{pre-crt0.c} and @file{lastfile.c} should be the very first and
very last file linked, respectively.  (Actually, this is not really true.
@file{lastfile.c} should be after all Emacs modules whose initialized
data should be made constant, and before all other Emacs files and all
libraries.  In particular, the allocation modules @file{gmalloc.c},
@file{alloca.c}, etc. are normally placed past @file{lastfile.c}, and
all of the files that implement Xt widget classes @emph{must} be placed
after @file{lastfile.c} because they contain various structures that
must be statically initialized and into which Xt writes at various
times.)  @file{pre-crt0.c} and @file{lastfile.c} contain exported symbols
that are used to determine the start and end of XEmacs' initialized
data space when dumping.



@example
alloca.c
free-hook.c
getpagesize.h
gmalloc.c
malloc.c
mem-limits.h
ralloc.c
vm-limit.c
@end example

These handle basic C allocation of memory.  @file{alloca.c} is an emulation of
the stack allocation function @code{alloca()} on machines that lack
it.  (XEmacs makes extensive use of @code{alloca()} in its code.)

@file{gmalloc.c} and @file{malloc.c} are two implementations of the standard C
functions @code{malloc()}, @code{realloc()} and @code{free()}.  They are
often used in place of the standard system-provided @code{malloc()}
because they usually provide a much faster implementation, at the
expense of additional memory use.  @file{gmalloc.c} is a newer implementation
that is much more memory-efficient for large allocations than @file{malloc.c},
and should always be preferred if it works.  (At one point, @file{gmalloc.c}
didn't work on some systems where @file{malloc.c} worked; but this should be
fixed now.)

@cindex relocating allocator
@file{ralloc.c} is the @dfn{relocating allocator}.  It provides
functions similar to @code{malloc()}, @code{realloc()} and @code{free()}
that allocate memory that can be dynamically relocated in memory.  The
advantage of this is that allocated memory can be shuffled around to
place all the free memory at the end of the heap, and the heap can then
be shrunk, releasing the memory back to the operating system.  The use
of this can be controlled with the configure option @code{--rel-alloc};
if enabled, memory allocated for buffers will be relocatable, so that if
a very large file is visited and the buffer is later killed, the memory
can be released to the operating system.  (The disadvantage of this
mechanism is that it can be very slow.  On systems with the
@code{mmap()} system call, the XEmacs version of @file{ralloc.c} uses
this to move memory around without actually having to block-copy it,
which can speed things up; but it can still cause noticeable performance
degradation.)
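
A minimal sketch of the relocation idea follows; it is not the code in
@file{ralloc.c}.  It assumes a small fixed arena and uses hypothetical
names (@code{reloc_alloc}, @code{reloc_deref}, @code{reloc_free},
@code{reloc_compact}).  Callers hold handles instead of raw pointers,
which is what lets the allocator slide live blocks toward the start of
the arena so that all free space ends up in one piece at the end.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical relocating allocator: a fixed arena plus a table of
   handles.  Clients keep handles, never raw pointers, so blocks can move.  */

#define ARENA_SIZE 1024
#define MAX_HANDLES 16

struct reloc_block
{
  char *addr;    /* current location of the data (may change) */
  size_t size;
  int live;
};

static char arena[ARENA_SIZE];
static size_t arena_used;   /* next free byte; holes are reclaimed only by compaction */
static struct reloc_block handles[MAX_HANDLES];

/* Allocate SIZE (> 0) bytes.  Returns a handle, or -1 on failure.  */
int reloc_alloc (size_t size)
{
  if (size == 0 || size > ARENA_SIZE - arena_used)
    return -1;
  for (int h = 0; h < MAX_HANDLES; h++)
    if (!handles[h].live)
      {
        handles[h].addr = arena + arena_used;
        handles[h].size = size;
        handles[h].live = 1;
        arena_used += size;
        return h;
      }
  return -1;
}

/* Current address of the block; valid only until the next compaction.  */
char *reloc_deref (int h)
{
  assert (h >= 0 && h < MAX_HANDLES && handles[h].live);
  return handles[h].addr;
}

void reloc_free (int h)
{
  handles[h].live = 0;
}

/* Slide live blocks, in address order, to the start of the arena and
   update their handles.  Returns the number of contiguous free bytes
   now at the end (what a real allocator could give back to the OS).  */
size_t reloc_compact (void)
{
  char *dest = arena;
  for (;;)
    {
      /* Pick the not-yet-moved live block with the lowest address.  */
      int next = -1;
      for (int h = 0; h < MAX_HANDLES; h++)
        if (handles[h].live && handles[h].addr >= dest
            && (next < 0 || handles[h].addr < handles[next].addr))
          next = h;
      if (next < 0)
        break;
      memmove (dest, handles[next].addr, handles[next].size);
      handles[next].addr = dest;      /* the handle follows the data */
      dest += handles[next].size;
    }
  arena_used = (size_t) (dest - arena);
  return ARENA_SIZE - arena_used;
}
```
@end verbatim

Any address obtained from @code{reloc_deref} is stale after the next
compaction, so a caller must re-fetch it through the handle.  That
indirection is the general cost of relocatable memory.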

@file{free-hook.c} contains some debugging functions for checking for invalid
arguments to @code{free()}.

@file{vm-limit.c} contains some functions that warn the user when memory is
getting low.  These are callback functions that are called by @file{gmalloc.c}
and @file{malloc.c} at appropriate times.

@file{getpagesize.h} provides a uniform interface for retrieving the size of a
page in virtual memory.  @file{mem-limits.h} provides a uniform interface for
retrieving the total amount of available virtual memory.  Both are
similar in spirit to the @file{sys*.h} files described in
@ref{Modules for Interfacing with the Operating System}.



@example
blocktype.c
blocktype.h
dynarr.c
@end example

These implement a couple of basic C data types to facilitate memory
allocation.  The @code{Blocktype} type efficiently manages the
allocation of fixed-size blocks by minimizing the number of times that
@code{malloc()} and @code{free()} are called.  It allocates memory in
large chunks, subdivides the chunks into blocks of the proper size, and
returns the blocks as requested.  When blocks are freed, they are placed
onto a linked list, so they can be efficiently reused.  This data type
is not much used in XEmacs currently, because it's a fairly new
addition.
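
Here is a minimal sketch of such a fixed-size block allocator.  It is
not the interface of @file{blocktype.h}; the names @code{struct
block_pool}, @code{pool_alloc} and @code{pool_free} are hypothetical,
and the chunk size of 64 blocks is an arbitrary choice.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fixed-size block allocator in the style described for
   Blocktype: malloc() is called once per chunk of BLOCKS_PER_CHUNK blocks,
   and freed blocks are kept on a linked free list for reuse.  */

#define BLOCKS_PER_CHUNK 64

struct free_block { struct free_block *next; };

struct block_pool
{
  size_t block_size;              /* bytes per block, suitably rounded */
  struct free_block *free_list;   /* blocks ready to be handed out */
  int chunks_allocated;           /* number of malloc() calls so far */
};

void pool_init (struct block_pool *p, size_t block_size)
{
  size_t align = _Alignof (max_align_t);
  if (block_size < sizeof (struct free_block))
    block_size = sizeof (struct free_block);
  /* Round up so every block inside a chunk is suitably aligned.  */
  p->block_size = (block_size + align - 1) / align * align;
  p->free_list = NULL;
  p->chunks_allocated = 0;
}

void *pool_alloc (struct block_pool *p)
{
  if (!p->free_list)
    {
      /* Get one big chunk and thread all its blocks onto the free list.  */
      char *chunk = malloc (p->block_size * BLOCKS_PER_CHUNK);
      if (!chunk)
        return NULL;
      p->chunks_allocated++;
      for (int i = BLOCKS_PER_CHUNK - 1; i >= 0; i--)
        {
          struct free_block *b =
            (struct free_block *) (chunk + (size_t) i * p->block_size);
          b->next = p->free_list;
          p->free_list = b;
        }
    }
  struct free_block *b = p->free_list;
  p->free_list = b->next;
  return b;
}

/* Return a block to the pool; free() is not called.  */
void pool_free (struct block_pool *p, void *ptr)
{
  struct free_block *b = ptr;
  b->next = p->free_list;
  p->free_list = b;
}
```
@end verbatim

The sketch never hands chunks back to @code{free()}; a freed block simply
becomes the next one returned by @code{pool_alloc}.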

@cindex dynamic array
The @code{Dynarr} type implements a @dfn{dynamic array}, which is
similar to a standard C array but has no fixed limit on the number of
elements it can contain.  Dynamic arrays can hold elements of any type,
and when you add a new element, the array automatically resizes itself
if it isn't big enough.  Dynarrs are extensively used in the redisplay
mechanism.
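
The core of a dynamic array can be sketched as follows, assuming
hypothetical names and a single element type (@code{int}); XEmacs's
@code{Dynarr} is generic over the element type, which this sketch does
not attempt.  Doubling the capacity on overflow keeps the average cost
of an append constant.

@verbatim
```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical int-only dynamic array: heap storage plus a count of
   elements in use and a count of elements allocated.  */
struct int_dynarr
{
  int *base;   /* element storage */
  int cur;     /* number of elements in use */
  int max;     /* number of elements allocated */
};

void int_dynarr_init (struct int_dynarr *d)
{
  d->base = NULL;
  d->cur = d->max = 0;
}

/* Append EL, growing the storage if needed.  Returns 0 if out of memory.  */
int int_dynarr_add (struct int_dynarr *d, int el)
{
  if (d->cur == d->max)
    {
      int newmax = d->max ? 2 * d->max : 8;
      int *nb = realloc (d->base, newmax * sizeof (int)); /* realloc(NULL, n) acts like malloc */
      if (!nb)
        return 0;
      d->base = nb;
      d->max = newmax;
    }
  d->base[d->cur++] = el;
  return 1;
}

int int_dynarr_at (const struct int_dynarr *d, int i)
{
  assert (i >= 0 && i < d->cur);   /* bounds check */
  return d->base[i];
}

void int_dynarr_free (struct int_dynarr *d)
{
  free (d->base);
  int_dynarr_init (d);
}
```
@end verbatim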



@example
inline.c
@end example

This module is used in connection with inline functions (available in
some compilers).  Often, inline functions need to have a corresponding
non-inline function that does the same thing.  This module is where they
reside.  It contains no actual code, but defines some special flags that
cause inline functions defined in header files to be rendered as actual
functions.  It then includes all header files that contain any inline
function definitions, so that each one gets a real function equivalent.
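
One way the trick can be arranged is sketched below.  The macro names
@code{INLINE_HEADER} and @code{EMIT_REAL_INLINE_DEFINITIONS} are
hypothetical, and the ``header'' and the file that includes it are
pasted into one block for brevity.

@verbatim
```c
#include <assert.h>

/* --- what the special module does before including the header --- */
#define EMIT_REAL_INLINE_DEFINITIONS

/* --- what the (hypothetical) header would contain --- */
#ifdef EMIT_REAL_INLINE_DEFINITIONS
# define INLINE_HEADER                 /* plain, out-of-line definition */
#else
# define INLINE_HEADER static inline   /* what every other file gets */
#endif

INLINE_HEADER int
clip_to_range (int x, int lo, int hi)
{
  return x < lo ? lo : x > hi ? hi : x;
}
```
@end verbatim

Every other file includes the header without defining the flag and gets
a @code{static inline} copy; only the special module gets the
out-of-line definition.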



@example
debug.c
debug.h
@end example

These functions provide a system for doing internal consistency checks
during code development.  This system is not currently used; instead the
simpler @code{assert()} macro is used along with the various checks
provided by the @samp{--error-check-*} configuration options.



@example
prefix-args.c
@end example

This is actually the source for a small, self-contained program
used during building.


@example
universe.h
@end example

This is not currently used.



@node Basic Lisp Modules
@section Basic Lisp Modules

@example
emacsfns.h
lisp-disunion.h
lisp-union.h
lisp.h
lrecord.h
symsinit.h
@end example

These are the basic header files for all XEmacs modules.  Each module
includes @file{lisp.h}, which brings the other header files in.
@file{lisp.h} contains the definitions of the structures and extractor
and constructor macros for the basic Lisp objects and various other
basic definitions for the Lisp environment, as well as some
general-purpose definitions (e.g. @code{min()} and @code{max()}).
@file{lisp.h} includes either @file{lisp-disunion.h} or
@file{lisp-union.h}, depending on whether @code{USE_UNION_TYPE} is
defined.  These files define the typedef of the Lisp object itself (as
described above) and the low-level macros that hide the actual
implementation of the Lisp object.  All extractor and constructor macros
for particular types of Lisp objects are defined in terms of these
low-level macros.
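
To illustrate what the low-level macros hide, here is a sketch of a
one-word representation that uses the low bit as a type tag.  The tag
layout and all names are assumptions made for this example; the real
layout in @file{lisp-disunion.h} is different.  Integers stored this way
lose one bit of range, which is the usual cost of such tagging.

@verbatim
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical one-word object: the low bit tags immediate integers; an
   untagged word is a pointer to a heap record (records must be at least
   2-byte aligned so their low bit is 0).  Code outside these macros
   never looks at the bits directly.  */

typedef uintptr_t Lisp_Object_sketch;

struct record_sketch { int type; };

#define TAG_INT 1u

#define MAKE_INT_OBJ(n) \
  ((Lisp_Object_sketch) (((uintptr_t) (intptr_t) (n) << 1) | TAG_INT))
#define INT_OBJ_P(obj)   (((obj) & TAG_INT) != 0)
/* Assumes two's complement and an arithmetic >> on negative intptr_t,
   as on mainstream compilers.  */
#define XINT_OBJ(obj)    ((intptr_t) (obj) >> 1)
#define MAKE_PTR_OBJ(p)  ((Lisp_Object_sketch) (uintptr_t) (p))
#define XPTR_OBJ(obj)    ((struct record_sketch *) (uintptr_t) (obj))
```
@end verbatim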

As a general rule, all typedefs should go into the typedefs section of
@file{lisp.h} rather than into a module-specific header file even if the
structure is defined elsewhere.  This allows function prototypes that
use the typedef to be placed into other header files.  Forward structure
declarations (i.e. a simple declaration like @code{struct foo;} where
the structure itself is defined elsewhere) should be placed into the
typedefs section as necessary.

@file{lrecord.h} contains the basic structures and macros that implement
all record-type Lisp objects---i.e. all objects whose type is a field
in their C structure, which includes all objects except the few most
basic ones.

@file{lisp.h} contains prototypes for most of the exported functions in
the various modules.  Lisp primitives defined using @code{DEFUN} that
need to be called by C code should be declared using @code{EXFUN}.
Other function prototypes should be placed either into the appropriate
section of @file{lisp.h}, or into a module-specific header file,
depending on how general-purpose the function is and whether it has
special-purpose argument types requiring definitions not in
@file{lisp.h}.  All initialization functions are prototyped in
@file{symsinit.h}.



@example
alloc.c
pure.c
puresize.h
@end example

The large module @file{alloc.c} implements all of the basic allocation and
garbage collection for Lisp objects.  The most commonly used Lisp
objects are allocated in chunks, similar to the Blocktype data type
described above; others are allocated in individually @code{malloc()}ed
blocks.  This module provides the foundation on which all other aspects
of the Lisp environment sit, and is the first module initialized at
startup.

Note that @file{alloc.c} provides a series of generic functions that are
not dependent on any particular object type, and interfaces to
particular types of objects using a standardized interface of
type-specific methods.  This scheme is a fundamental principle of
object-oriented programming and is heavily used throughout XEmacs.  The
great advantage of this is that it allows for a clean separation of
functionality into different modules---new classes of Lisp objects, new
event interfaces, new device types, new stream interfaces, etc. can be
added transparently without affecting code anywhere else in XEmacs.
Because the different subsystems are divided into general and specific
code, adding a new subtype within a subsystem will in general not
require changes to the generic subsystem code or affect any of the other
subtypes in the subsystem; this provides a great deal of robustness to
the XEmacs code.
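
The scheme can be sketched as a table of function pointers per type.
All names here are hypothetical, and XEmacs's real method tables have
different and more methods; the point is only that the generic
@code{print_object} never needs to change when a type is added.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct object;

/* One table per type: the type's name plus its methods.  */
struct type_methods
{
  const char *name;
  /* Optional: NULL means "use the generic #<name> form".  */
  void (*print) (const struct object *obj, char *buf, size_t len);
};

/* Every object starts with a pointer to its type's methods.  */
struct object { const struct type_methods *methods; };

/* A concrete type that knows how to print itself.  */
struct position_obj { struct object header; long pos; };

static void
print_position (const struct object *obj, char *buf, size_t len)
{
  const struct position_obj *p = (const struct position_obj *) obj;
  snprintf (buf, len, "#<position %ld>", p->pos);
}

const struct type_methods position_methods = { "position", print_position };

/* A concrete type that relies on the generic fallback.  */
struct opaque_obj { struct object header; };
const struct type_methods opaque_methods = { "opaque", NULL };

/* Generic code: adding a new type needs no change here.  */
void
print_object (const struct object *obj, char *buf, size_t len)
{
  if (obj->methods->print)
    obj->methods->print (obj, buf, len);
  else
    snprintf (buf, len, "#<%s>", obj->methods->name);
}
```
@end verbatim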

@cindex pure space
@file{pure.c} contains the declaration of the @dfn{purespace} array.
Pure space is a hack used to place some constant Lisp data into the code
segment of the XEmacs executable, even though the data needs to be
initialized through function calls.  (See above in section VIII for more
information about this.)  During startup, certain sorts of data are
automatically copied into pure space, and other data is copied manually
in some of the basic Lisp files by calling the function @code{purecopy},
which copies the object if possible (this only works in temacs, of
course) and returns the new object.  In particular, while temacs is
executing, the Lisp reader automatically copies all compiled-function
objects that it reads into pure space.  Since compiled-function objects
are large, are never modified, and typically comprise the majority of
the contents of a compiled-Lisp file, this works well.  While XEmacs is
running, any attempt to modify an object that resides in pure space
causes an error.  Objects in pure space are never garbage collected:
almost all of the time, they're intended to be permanent, and in any
case you can't write into pure space to set the mark bits.

@file{puresize.h} contains the declaration of the size of the pure space
array.  This depends on the optional features that are compiled in, any
extra purespace requested by the user at compile time, and certain other
factors (e.g. 64-bit machines need more pure space because their Lisp
objects are larger).  The smallest size that suffices should be used, so
that there's no wasted space.  If there's not enough pure space, you
will get an error during the build process, specifying how much more
pure space is needed.



@example
eval.c
backtrace.h
@end example

@file{eval.c} contains all of the functions to handle the flow of control.
This includes the mechanisms of defining functions, calling functions,
traversing stack frames, and binding variables; the control primitives
and other special forms such as @code{while}, @code{if}, @code{eval},
@code{let}, @code{and}, @code{or}, @code{progn}, etc.; handling of
non-local exits, unwind-protects, and exception handlers; entering the
debugger; methods for the subr Lisp object type; etc.  It does
@emph{not} include the @code{read} function, the @code{print} function,
or the handling of symbols and obarrays.

@file{backtrace.h} contains some structures related to stack frames and the
flow of control.



@example
lread.c
@end example

This module implements the Lisp reader and the @code{read} function,
which converts text into Lisp objects, according to the read syntax of
the objects, as described above.  This is similar to the parser that is
a part of all compilers.



@example
print.c
@end example

This module implements the Lisp print mechanism and the @code{print}
function and related functions.  This is the inverse of the Lisp reader:
it converts Lisp objects to a printed, textual representation (ideally
one that can be read back in using @code{read} to get an equivalent
object).



@example
general.c
symbols.c
symeval.h
@end example

@file{symbols.c} implements the handling of symbols, obarrays, and
retrieving the values of symbols.  Much of the code is devoted to
handling the special @dfn{symbol-value-magic} objects that define
special types of variables---this includes buffer-local variables,
variable aliases, variables that forward into C variables, etc.  This
module is initialized extremely early (right after @file{alloc.c}),
because it is here that the basic symbols @code{t} and @code{nil} are
created, and those symbols are used everywhere throughout XEmacs.

@file{symeval.h} contains the definitions of symbol structures and the
@code{DEFVAR_LISP()} and related macros for declaring variables.



@example
data.c
floatfns.c
fns.c
@end example

These modules implement the methods and standard Lisp primitives for all
the basic Lisp object types other than symbols (which are described
above).  @file{data.c} contains all the predicates (primitives that return
whether an object is of a particular type); the integer arithmetic
functions; and the basic accessor and mutator primitives for the various
object types.  @file{fns.c} contains all the standard primitives for working
with sequences (where, abstractly speaking, a sequence is an ordered set
of objects, and can be represented by a list, string, vector, or
bit-vector); it also contains @code{equal}, perhaps on the grounds that
the bulk of the operation of @code{equal} is comparing sequences.
@file{floatfns.c} contains methods and primitives for floats and floating-point
arithmetic.



@example
bytecode.c
bytecode.h
@end example

@file{bytecode.c} implements the byte-code interpreter and
compiled-function objects, and @file{bytecode.h} contains associated
structures.  Note that the byte-code @emph{compiler} is written in Lisp.




@node Modules for Standard Editing Operations
@section Modules for Standard Editing Operations

@example
buffer.c
buffer.h
bufslots.h
@end example

@file{buffer.c} implements the @dfn{buffer} Lisp object type.  This
includes functions that create and destroy buffers; retrieve buffers by
name or by other properties; manipulate lists of buffers (remember that
buffers are permanent objects and stored in various ordered lists);
retrieve or change buffer properties; etc.  It also contains the
definitions of all the built-in buffer-local variables (which can be
viewed as buffer properties).  It does @emph{not} contain code to
manipulate buffer-local variables (that's in @file{symbols.c}, described
above), nor code to manipulate the text in a buffer.

@file{buffer.h} defines the structures associated with a buffer and the various
macros for retrieving text from a buffer and special buffer positions
(e.g. @code{point}, the default location for text insertion).  It also
contains macros for working with buffer positions and converting between
their representations as character offsets and as byte offsets (under
MULE, they are different, because characters can be multi-byte).  It is
one of the largest header files.

@file{bufslots.h} defines the fields in the buffer structure that correspond to
the built-in buffer-local variables.  It is its own header file because
it is included many times in @file{buffer.c}, as a way of iterating over all
the built-in buffer-local variables.
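
The iteration trick can be illustrated with a one-file sketch.  Instead
of a separate header that is included repeatedly, it uses a list macro,
@code{FOR_EACH_SLOT}, which is expanded once per purpose with a
different definition of @code{SLOT}; the slot names and every other
identifier are hypothetical.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The single list of slots (stands in for a header like bufslots.h).  */
#define FOR_EACH_SLOT(SLOT) \
  SLOT (tab_width)          \
  SLOT (fill_column)        \
  SLOT (left_margin)

/* Expansion 1: the structure fields.  */
struct buffer_sketch
{
#define SLOT(name) int name;
  FOR_EACH_SLOT (SLOT)
#undef SLOT
};

/* Expansion 2: a table of names and offsets for iterating over slots.  */
struct slot_info { const char *name; size_t offset; };

static const struct slot_info slot_table[] =
{
#define SLOT(name) { #name, offsetof (struct buffer_sketch, name) },
  FOR_EACH_SLOT (SLOT)
#undef SLOT
};

#define N_SLOTS (sizeof slot_table / sizeof slot_table[0])

/* Set every slot to VALUE by walking the table.  */
void reset_slots (struct buffer_sketch *b, int value)
{
  for (size_t i = 0; i < N_SLOTS; i++)
    *(int *) ((char *) b + slot_table[i].offset) = value;
}

/* Look a slot up by name; returns NULL if there is no such slot.  */
int *find_slot (struct buffer_sketch *b, const char *name)
{
  for (size_t i = 0; i < N_SLOTS; i++)
    if (strcmp (slot_table[i].name, name) == 0)
      return (int *) ((char *) b + slot_table[i].offset);
  return NULL;
}
```
@end verbatim

Adding a slot means editing only the list; the structure, the name
table and every loop over the table pick it up automatically, which is
the same benefit a repeatedly included slot header provides.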



@example
insdel.c
insdel.h
@end example

@file{insdel.c} contains low-level functions for inserting and deleting text in
a buffer, keeping track of changed regions for use by redisplay, and
calling any before-change and after-change functions that may have been
registered for the buffer.  It also contains the actual functions that
convert between byte offsets and character offsets.
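
The conversion can be sketched as below.  For illustration the text is
assumed to be UTF-8, in which every byte of the form @code{10xxxxxx}
continues the preceding character; XEmacs's internal MULE encoding
differs, and the function names are hypothetical.  A plain scan like
this is linear in the distance covered, which is why it pays to cache
recently converted positions.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

/* Assumption: UTF-8 text, so a byte starts a character unless it is a
   continuation byte (10xxxxxx).  */
static int starts_char (unsigned char c) { return (c & 0xC0) != 0x80; }

/* Character offset corresponding to byte offset BYTEPOS
   (BYTEPOS must lie on a character boundary).  */
size_t bytepos_to_charpos (const unsigned char *text, size_t bytepos)
{
  size_t chars = 0;
  for (size_t i = 0; i < bytepos; i++)
    if (starts_char (text[i]))
      chars++;
  return chars;
}

/* Byte offset of character offset CHARPOS, looking at most LEN bytes.  */
size_t charpos_to_bytepos (const unsigned char *text, size_t len,
                           size_t charpos)
{
  size_t i = 0;
  while (i < len && charpos > 0)
    {
      i++;                                 /* leave the current char's first byte */
      while (i < len && !starts_char (text[i]))
        i++;                               /* skip its continuation bytes */
      charpos--;
    }
  return i;
}
```
@end verbatim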

@file{insdel.h} contains associated headers.



@example
marker.c
@end example

This module implements the @dfn{marker} Lisp object type, which
conceptually is a pointer to a text position in a buffer that moves
around as text is inserted and deleted, so as to remain in the same
relative position.  This module doesn't actually move the markers
around; that's handled in @file{insdel.c}.  This module just creates
them and implements the primitives for working with them.  As markers
are simple objects, this does not entail much.

Note that the standard arithmetic primitives (e.g. @code{+}) accept
markers in place of integers and automatically substitute the value of
@code{marker-position} for the marker, i.e. an integer describing the
current buffer position of the marker.
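
The adjustment that keeps markers in place can be sketched as follows.
The function names are hypothetical, positions are plain character
offsets, and the sketch makes one particular choice: a marker sitting
exactly at the insertion point stays before the inserted text.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

/* LEN characters were inserted at POS: markers after POS move forward.
   A marker exactly at POS stays put (the sketch's chosen behavior).  */
void adjust_markers_for_insert (size_t *markers, size_t n,
                                size_t pos, size_t len)
{
  for (size_t i = 0; i < n; i++)
    if (markers[i] > pos)
      markers[i] += len;
}

/* The characters in [FROM, TO) were deleted.  */
void adjust_markers_for_delete (size_t *markers, size_t n,
                                size_t from, size_t to)
{
  for (size_t i = 0; i < n; i++)
    {
      if (markers[i] >= to)
        markers[i] -= to - from;   /* after the deleted text: shift back */
      else if (markers[i] > from)
        markers[i] = from;         /* inside it: collapse to its start */
    }
}
```
@end verbatim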



@example
extents.c
extents.h
@end example

This module implements the @dfn{extent} Lisp object type, which is like
a marker that works over a range of text rather than a single position.
Extents are also much more complex and powerful than markers and have a
more efficient (and more algorithmically complex) implementation.  The
implementation is described in detail in comments in @file{extents.c}.

The code in @file{extents.c} works closely with @file{insdel.c} so that
extents are properly moved around as text is inserted and deleted.
There is also code in @file{extents.c} that provides information needed
by the redisplay mechanism for efficient operation.  (Remember that
extents can have display properties that affect [sometimes drastically,
as in the @code{invisible} property] the display of the text they
cover.)



@example
editfns.c
@end example

@file{editfns.c} contains the standard Lisp primitives for working with
a buffer's text, and calls the low-level functions in @file{insdel.c}.
It also contains primitives for working with @code{point} (the default
buffer insertion location).

@file{editfns.c} also contains functions for retrieving various
characteristics from the external environment: the current time, the
process ID of the running XEmacs process, the name of the user who ran
this XEmacs process, etc.  It's not clear why this code is in
@file{editfns.c}.



@example
callint.c
cmds.c
commands.h
@end example

@cindex interactive
These modules implement the basic @dfn{interactive} commands,
i.e. user-callable functions.  Commands, as opposed to other functions,
have special ways of getting their parameters interactively (by querying
the user), as opposed to having them passed in a normal function
invocation.  Many commands are not really meant to be called from other
Lisp functions, because they modify global state in a way that's often
undesired as part of other Lisp functions.

@file{callint.c} implements the mechanism for querying the user for
parameters and calling interactive commands.  The bulk of this module is
code that parses the interactive spec that is supplied with an
interactive command.
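
The parsing step can be sketched as follows for a simplified spec
format, in which entries are separated by newlines and each entry is a
one-letter code followed by a prompt.  The function and structure names
are hypothetical, and features of real specs beyond this basic shape
are ignored.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ARGS 8
#define MAX_PROMPT 64

struct spec_entry
{
  char code;                 /* what kind of argument to read */
  char prompt[MAX_PROMPT];   /* text shown to the user (may be empty) */
};

/* Parse a spec such as "bBuffer: \nnCount: " into OUT (room for MAX_ARGS
   entries).  Returns the entry count, or -1 if there are too many.
   Empty entries are skipped; over-long prompts are truncated.  */
int
parse_interactive_spec (const char *spec, struct spec_entry *out)
{
  int n = 0;
  while (*spec)
    {
      const char *end = strchr (spec, '\n');
      size_t len = end ? (size_t) (end - spec) : strlen (spec);
      if (len > 0)
        {
          size_t plen = len - 1;          /* the prompt follows the code */
          if (n == MAX_ARGS)
            return -1;
          if (plen >= MAX_PROMPT)
            plen = MAX_PROMPT - 1;
          out[n].code = spec[0];
          memcpy (out[n].prompt, spec + 1, plen);
          out[n].prompt[plen] = '\0';
          n++;
        }
      spec += len + (end ? 1 : 0);        /* step past the newline, if any */
    }
  return n;
}
```
@end verbatim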

@file{cmds.c} implements the basic, most commonly used editing commands:
commands to move around the current buffer and insert and delete
characters.  These commands are implemented using the Lisp primitives
defined in @file{editfns.c}.

@file{commands.h} contains associated structure definitions and prototypes.



@example
regex.c
regex.h
search.c
@end example

@file{search.c} implements the Lisp primitives for searching for text in
a buffer, and some of the low-level algorithms for doing this.  In
particular, the fast fixed-string Boyer-Moore search algorithm is
implemented in @file{search.c}.  The low-level algorithms for doing
regular-expression searching, however, are implemented in @file{regex.c}
and @file{regex.h}.  These two modules are largely independent of
XEmacs, and are similar to (and based upon) the regular-expression
routines used in @file{grep} and other GNU utilities.
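
The idea behind Boyer-Moore searching can be sketched with its simplest
variant, Boyer-Moore-Horspool, which uses only the ``bad character''
rule.  This is not the code in @file{search.c}, and the function name is
hypothetical; it shows why a mismatch can skip ahead by up to the length
of the pattern instead of one position.

@verbatim
```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Boyer-Moore-Horspool.  Returns the offset of the first occurrence of
   PAT (M bytes) in TEXT (N bytes), or -1 if there is none.  */
long
bmh_search (const unsigned char *text, size_t n,
            const unsigned char *pat, size_t m)
{
  size_t shift[UCHAR_MAX + 1];
  if (m == 0)
    return 0;
  if (m > n)
    return -1;
  /* A byte that does not occur in the pattern lets us skip all M bytes.  */
  for (size_t c = 0; c <= UCHAR_MAX; c++)
    shift[c] = m;
  /* Bytes that do occur (ignoring the last position) allow smaller skips.  */
  for (size_t i = 0; i + 1 < m; i++)
    shift[pat[i]] = m - 1 - i;
  size_t pos = 0;
  while (pos + m <= n)
    {
      if (memcmp (text + pos, pat, m) == 0)
        return (long) pos;
      /* Shift by the entry for the text byte under the pattern's last slot.  */
      pos += shift[text[pos + m - 1]];
    }
  return -1;
}
```
@end verbatim

For example, searching for @samp{example} in @samp{here is a simple
example} returns 17.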



@example
doprnt.c
@end example

@file{doprnt.c} implements formatted-string processing, similar to the
@code{printf()} function in C.



@example
undo.c
@end example

This module implements the undo mechanism for tracking buffer changes.
Most of this could be implemented in Lisp.



@node Editor-Level Control Flow Modules
@section Editor-Level Control Flow Modules

@example
event-Xt.c
event-stream.c
event-tty.c
events.c
events.h
@end example

These implement the handling of events (user input and other system
notifications).

@file{events.c} and @file{events.h} define the @dfn{event} Lisp object
type and primitives for manipulating it.

@file{event-stream.c} implements the basic functions for working with
event queues, dispatching an event by looking it up in relevant keymaps
and such, and handling timeouts; this includes the primitives
@code{next-event} and @code{dispatch-event}, as well as related
primitives such as @code{sit-for}, @code{sleep-for}, and
@code{accept-process-output}.  (@file{event-stream.c} is one of the
hairiest and trickiest modules in XEmacs.  Beware!  You can easily mess
things up here.)

@file{event-Xt.c} and @file{event-tty.c} implement the low-level
interfaces onto retrieving events from Xt (the X toolkit) and from TTYs
(using @code{read()} and @code{select()}), respectively.  The event
interface enforces a clean separation between the specific code for
interfacing with the operating system and the generic code for working
with events, by defining an API of basic, low-level event methods;
@file{event-Xt.c} and @file{event-tty.c} are two different
implementations of this API.  To add support for a new operating system
(e.g. NeXTstep), one merely needs to provide another implementation of
those API functions.

Note that the choice of whether to use @file{event-Xt.c} or
@file{event-tty.c} is made at compile time!  Or at the very latest, it
is made at startup time.  @file{event-Xt.c} handles events for
@emph{both} X and TTY frames; @file{event-tty.c} is only used when X
support is not compiled into XEmacs.  The reason for this is that there
is only one event loop in XEmacs: thus, it needs to be able to receive
events from all different kinds of frames.



@example
keymap.c
keymap.h
@end example

@file{keymap.c} and @file{keymap.h} define the @dfn{keymap} Lisp object
type and associated methods and primitives.  (Remember that keymaps are
objects that associate event descriptions with functions to be called to
``execute'' those events; @code{dispatch-event} looks up events in the
relevant keymaps.)



@example
keyboard.c
@end example

@file{keyboard.c} contains functions that implement the actual editor
command loop---i.e. the event loop that cyclically retrieves and
dispatches events.  This code is also rather tricky, just like
@file{event-stream.c}.



@example
macros.c
macros.h
@end example

These two modules contain the basic code for defining keyboard macros.
These functions don't actually do much; most of the code that handles keyboard
macros is mixed in with the event-handling code in @file{event-stream.c}.



@example
minibuf.c
@end example

This contains some miscellaneous code related to the minibuffer (most of
the minibuffer code was moved into Lisp by Richard Mlynarik).  This
includes the primitives for completion (although filename completion is
in @file{dired.c}), the lowest-level interface to the minibuffer (if the
command loop were cleaned up, this too could be in Lisp), and code for
dealing with the echo area (this, too, was mostly moved into Lisp, and
the only code remaining is code to call out to Lisp or provide simple
bootstrapping implementations early in temacs, before the echo-area Lisp
code is loaded).



@node Modules for the Basic Displayable Lisp Objects
@section Modules for the Basic Displayable Lisp Objects

@example
device-ns.h
device-stream.c
device-stream.h
device-tty.c
device-tty.h
device-x.c
device-x.h
device.c
device.h
@end example

These modules implement the @dfn{device} Lisp object type.  This
abstracts a particular screen or connection on which frames are
displayed.  As with Lisp objects, event interfaces, and other
subsystems, the device code is separated into a generic component that
contains a standardized interface (in the form of a set of methods) onto
particular device types.

The device subsystem defines all the methods and provides method
services for not only device operations but also for the frame, window,
menubar, scrollbar, toolbar, and other displayable-object subsystems.
The reason for this is that all of these subsystems have the same
subtypes (X, TTY, NeXTstep, Microsoft Windows, etc.) as devices do.



@example
frame-ns.h
frame-tty.c
frame-x.c
frame-x.h
frame.c
frame.h
@end example

Each device contains one or more frames in which objects (e.g. text) are
displayed.  A frame corresponds to a window in the window system;
usually this is a top-level window but it could potentially be one of a
number of overlapping child windows within a top-level window, using the
MDI (Multiple Document Interface) protocol in Microsoft Windows or a
similar scheme.

The @file{frame-*} files implement the @dfn{frame} Lisp object type and
provide the generic and device-type-specific operations on frames
(e.g. raising, lowering, resizing, moving, etc.).



@example
window.c
window.h
@end example

@cindex window (in Emacs)
@cindex pane
Each frame consists of one or more non-overlapping @dfn{windows} (better
known as @dfn{panes} in standard window-system terminology) in which a
buffer's text can be displayed.  Windows can also have scrollbars
displayed around their edges.

@file{window.c} and @file{window.h} implement the @dfn{window} Lisp
object type and provide code to manage windows.  Since windows have no
associated resources in the window system (the window system knows only
about the frame; no child windows or anything are used for XEmacs
windows), there is no device-type-specific code here; all of that code
is part of the redisplay mechanism or the code for particular object
types such as scrollbars.



@node Modules for other Display-Related Lisp Objects
@section Modules for other Display-Related Lisp Objects

@example
faces.c
faces.h
@end example



@example
bitmaps.h
glyphs-ns.h
glyphs-x.c
glyphs-x.h
glyphs.c
glyphs.h
@end example



@example
objects-ns.h
objects-tty.c
objects-tty.h
objects-x.c
objects-x.h
objects.c
objects.h
@end example



@example
menubar-x.c
menubar.c
@end example



@example
scrollbar-x.c
scrollbar-x.h
scrollbar.c
scrollbar.h
@end example



@example
toolbar-x.c
toolbar.c
toolbar.h
@end example



@example
font-lock.c
@end example

This file provides C support for syntax highlighting---i.e.
highlighting different syntactic constructs of a source file in
different colors, for easy reading.  The C support is provided so that
this is fast.



@example
dgif_lib.c
gif_err.c
gif_lib.h
gifalloc.c
@end example

These modules decode GIF-format image files, for use with glyphs.



@node Modules for the Redisplay Mechanism
@section Modules for the Redisplay Mechanism

@example
redisplay-output.c
redisplay-tty.c
redisplay-x.c
redisplay.c
redisplay.h
@end example

These files provide the redisplay mechanism.  As with many other
subsystems in XEmacs, there is a clean separation between the general
and device-specific support.

@file{redisplay.c} contains the bulk of the redisplay engine.  These
functions update the redisplay structures (which describe how the screen
is to appear) to reflect any changes made to the state of any
displayable objects (buffer, frame, window, etc.) since the last time
that redisplay was called.  These functions are highly optimized to
avoid doing more work than necessary (since redisplay is called
extremely often and is potentially a huge time sink), and depend heavily
on notifications from the objects themselves that changes have occurred,
so that redisplay doesn't explicitly have to check each possible object.
The redisplay mechanism also contains a great deal of caching to further
speed things up; some of this caching is contained within the various
displayable objects.

@file{redisplay-output.c} goes through the redisplay structures and converts
them into calls to device-specific methods to actually output the screen
changes.

@file{redisplay-x.c} and @file{redisplay-tty.c} are two implementations
of these redisplay output methods, for X frames and TTY frames,
respectively.



@example
indent.c
@end example

This module contains various functions and Lisp primitives for
converting between buffer positions and screen positions.  These
functions call the redisplay mechanism to do most of the work, and then
examine the redisplay structures to get the necessary information.  This
module needs work.



3637 @example
3638 termcap.c
3639 terminfo.c
3640 tparam.c
3641 @end example
3642
3643 These files contain functions for working with the termcap (BSD-style)
and terminfo (System V-style) databases of terminal capabilities and
3645 escape sequences, used when XEmacs is displaying in a TTY.
3646
3647
3648
3649 @example
3650 cm.c
3651 cm.h
3652 @end example
3653
3654 These files provide some miscellaneous TTY-output functions and should
3655 probably be merged into @file{redisplay-tty.c}.
3656
3657
3658
3659 @node Modules for Interfacing with the File System
3660 @section Modules for Interfacing with the File System
3661
3662 @example
3663 lstream.c
3664 lstream.h
3665 @end example
3666
3667 These modules implement the @dfn{stream} Lisp object type.  This is an
3668 internal-only Lisp object that implements a generic buffering stream.
3669 The idea is to provide a uniform interface onto all sources and sinks of
3670 data, including file descriptors, stdio streams, chunks of memory, Lisp
3671 buffers, Lisp strings, etc.  That way, I/O functions can be written to
3672 the stream interface and can transparently handle all possible sources
3673 and sinks.  (For example, the @code{read} function can read data from a
3674 file, a string, a buffer, or even a function that is called repeatedly
3675 to return data, without worrying about where the data is coming from or
3676 what-size chunks it is returned in.)
3677
3678 @cindex lstream
3679 Note that in the C code, streams are called @dfn{lstreams} (for ``Lisp
3680 streams'') to distinguish them from other kinds of streams, e.g. stdio
3681 streams and C++ I/O streams.
3682
As with other subsystems in XEmacs, lstreams are separated into
generic functions and a set of methods for the different types of
lstreams.  @file{lstream.c} provides implementations of many different
types of streams; others are provided, e.g., in @file{mule-coding.c}.
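To illustrate the idea of generic code written against per-type methods, here is a much-simplified sketch in C.  It is not the actual @file{lstream.h} interface: the names @code{simple_stream}, @code{memory_reader}, @code{make_memory_stream}, and @code{stream_read_all} are hypothetical, and only a single reader method is modeled.  The memory-backed reader deliberately hands out at most three bytes per call, to show that generic code should not care what size chunks the data arrives in.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, much-simplified model of a method-based stream.  Each
   stream type supplies a reader method; generic code sees only this
   struct and never needs to know where the bytes come from. */
typedef struct simple_stream simple_stream;

/* Reader method: copy up to SIZE bytes into BUF and return how many were
   copied (0 at end of data). */
typedef size_t (*stream_reader) (simple_stream *s, char *buf, size_t size);

struct simple_stream
{
  stream_reader reader;   /* type-specific method */
  const char *data;       /* state used by the memory-backed type */
  size_t len;
  size_t pos;
};

/* Reader for a stream backed by a chunk of memory.  It hands out at most
   3 bytes per call, to show that callers must not rely on chunk sizes. */
static size_t
memory_reader (simple_stream *s, char *buf, size_t size)
{
  size_t n = s->len - s->pos;
  if (n > size)
    n = size;
  if (n > 3)
    n = 3;
  memcpy (buf, s->data + s->pos, n);
  s->pos += n;
  return n;
}

static void
make_memory_stream (simple_stream *s, const char *data, size_t len)
{
  s->reader = memory_reader;
  s->data = data;
  s->len = len;
  s->pos = 0;
}

/* Generic code: read everything from any stream into OUT, which has room
   for OUTSIZE bytes including the terminating NUL (OUTSIZE >= 1). */
static size_t
stream_read_all (simple_stream *s, char *out, size_t outsize)
{
  size_t total = 0, n;
  while (total + 1 < outsize
         && (n = s->reader (s, out + total, outsize - 1 - total)) > 0)
    total += n;
  out[total] = '\0';
  return total;
}
```

A file- or buffer-backed stream would only need a different reader function; @code{stream_read_all} would stay unchanged.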
3687
3688
3689
3690 @example
3691 fileio.c
3692 @end example
3693
This module implements the basic primitives for interfacing with the file
system.  These include primitives for reading files into buffers,
writing buffers into files, checking for the presence or accessibility
of files, canonicalizing file names, etc.  Note that these primitives
are usually not invoked directly by the user: there is a great deal of
higher-level Lisp code that implements user commands such as
@code{find-file} and @code{save-buffer}.  This is similar to the
distinction between the lower-level primitives in @file{editfns.c} and
the higher-level user commands in @file{cmds.c} and
@file{simple.el}.
3704
3705
3706
3707 @example
3708 filelock.c
3709 @end example
3710
3711 This file provides functions for detecting clashes between different
3712 processes (e.g. XEmacs and some external process, or two different
3713 XEmacs processes) modifying the same file.  (XEmacs can optionally use
3714 the @file{lock/} subdirectory to provide a form of ``locking'' between
3715 different XEmacs processes.)  This module is also used by the low-level
3716 functions in @file{insdel.c} to ensure that, if the first modification
3717 is being made to a buffer whose corresponding file has been externally
3718 modified, the user is made aware of this so that the buffer can be
synchronized with the external changes if necessary.
3720
3721
3722 @example
3723 filemode.c
3724 @end example
3725
3726 This file provides some miscellaneous functions that construct a
3727 @samp{rwxr-xr-x}-type permissions string (as might appear in an
3728 @file{ls}-style directory listing) given the information returned by the
3729 @code{stat()} system call.
3730
3731
3732
3733 @example
3734 dired.c
3735 ndir.h
3736 @end example
3737
3738 These files implement the XEmacs interface to directory searching.  This
3739 includes a number of primitives for determining the files in a directory
3740 and for doing filename completion. (Remember that generic completion is
3741 handled by a different mechanism, in @file{minibuf.c}.)
3742
@file{ndir.h} is a header file used for the directory-searching
emulation functions provided in @file{sysdep.c}
(@pxref{Modules for Interfacing with the Operating System}), for systems
that don't provide any directory-searching functions.  (On those
systems, directories can be read directly as files and parsed.)
3747
3748
3749
3750 @example
3751 realpath.c
3752 @end example
3753
3754 This file provides an implementation of the @code{realpath()} function
3755 for expanding symbolic links, on systems that don't implement it or have
3756 a broken implementation.
3757
3758
3759
3760 @node Modules for Other Aspects of the Lisp Interpreter and Object System
3761 @section Modules for Other Aspects of the Lisp Interpreter and Object System
3762
3763 @example
3764 elhash.c
3765 elhash.h
3766 hash.c
3767 hash.h
3768 @end example
3769
These files provide two implementations of hash tables.  Files
@file{hash.c} and @file{hash.h} provide a generic C implementation of
hash tables that can stand independently of XEmacs.  Files
@file{elhash.c} and @file{elhash.h} provide a separate implementation
that can store only Lisp objects, knows about Lisp-specific issues such
as garbage collection, and implements the @dfn{hash-table} Lisp object
type.
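For a feel of what a standalone table like the one in @file{hash.c} involves, here is a minimal chained hash table in C mapping strings to integers.  The names and the choice of string hash (djb2) are illustrative assumptions, not the interface of @file{hash.h}; a table like the one in @file{elhash.c} would additionally have to cooperate with the garbage collector, which is not shown.

```c
#include <stdlib.h>
#include <string.h>

/* Minimal chained hash table mapping C strings to ints.  Keys are not
   copied, so the caller must keep them alive.  Names are hypothetical. */
#define STRTAB_BUCKETS 64

struct strtab_entry
{
  const char *key;
  int value;
  struct strtab_entry *next;
};

struct strtab
{
  struct strtab_entry *buckets[STRTAB_BUCKETS];
};

/* djb2 string hash. */
static unsigned long
string_hash (const char *s)
{
  unsigned long h = 5381;
  while (*s)
    h = h * 33 + (unsigned char) *s++;
  return h;
}

static struct strtab_entry *
strtab_find (const struct strtab *t, const char *key)
{
  struct strtab_entry *e = t->buckets[string_hash (key) % STRTAB_BUCKETS];
  for (; e; e = e->next)
    if (strcmp (e->key, key) == 0)
      return e;
  return NULL;
}

/* Insert KEY or overwrite its value; returns 0 if allocation fails. */
static int
strtab_put (struct strtab *t, const char *key, int value)
{
  struct strtab_entry *e = strtab_find (t, key);
  if (!e)
    {
      unsigned long b = string_hash (key) % STRTAB_BUCKETS;
      e = malloc (sizeof *e);
      if (!e)
        return 0;
      e->key = key;
      e->next = t->buckets[b];   /* push onto the bucket's chain */
      t->buckets[b] = e;
    }
  e->value = value;
  return 1;
}

/* Free every entry and leave the table empty. */
static void
strtab_free (struct strtab *t)
{
  for (int i = 0; i < STRTAB_BUCKETS; i++)
    {
      struct strtab_entry *e = t->buckets[i];
      while (e)
        {
          struct strtab_entry *next = e->next;
          free (e);
          e = next;
        }
      t->buckets[i] = NULL;
    }
}
```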
3777
3778
3779 @example
3780 specifier.c
3781 specifier.h
3782 @end example
3783
This module implements the @dfn{specifier} Lisp object type.  Specifiers
are used primarily for displayable properties, and allow for values that
are specific to a particular buffer, window, frame, device, or device
class, as well as a global default value.  They are used, for example,
3788 to control the height of the horizontal scrollbar or the appearance of
3789 the @code{default}, @code{bold}, or other faces.  The specifier object
3790 consists of a number of specifications, each of which maps from a
3791 buffer, window, etc. to a value.  The function @code{specifier-instance}
3792 looks up a value given a window (from which a buffer, frame, and device
3793 can be derived).
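The lookup performed by @code{specifier-instance} can be pictured as trying locales from most to least specific.  The sketch below is a simplified model, not the real implementation: the fixed fallback order window, buffer, frame, device, global is a simplification, device classes and the other refinements of real specifiers are omitted, values are plain @code{int}s, and every name in it is hypothetical.

```c
#include <stddef.h>

/* Simplified, hypothetical model of specifier instancing.  Each
   specification maps one locale to a value; lookup tries the domain's
   window, then its buffer, frame, and device, and finally global
   specifications.  Device classes and other refinements are omitted. */
enum locale_kind { LOC_WINDOW, LOC_BUFFER, LOC_FRAME, LOC_DEVICE, LOC_GLOBAL };

struct specification
{
  enum locale_kind kind;
  int locale_id;     /* which window, buffer, ...; unused for LOC_GLOBAL */
  int value;
};

/* Everything derivable from a window. */
struct domain
{
  int window, buffer, frame, device;
};

/* Store the instance in *OUT and return 1, or return 0 if no
   specification applies. */
static int
specifier_instance_sketch (const struct specification *specs, size_t n,
                           const struct domain *d, int *out)
{
  const int ids[] = { d->window, d->buffer, d->frame, d->device, 0 };
  for (int k = LOC_WINDOW; k <= LOC_GLOBAL; k++)
    for (size_t i = 0; i < n; i++)
      if ((int) specs[i].kind == k
          && (k == LOC_GLOBAL || specs[i].locale_id == ids[k]))
        {
          *out = specs[i].value;
          return 1;
        }
  return 0;
}
```

With a global value of 10 and a buffer-specific value of 30, a window displaying that buffer gets 30, while a window on an unrelated buffer, frame, and device falls back to 10.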
3794
3795
3796 @example
3797 chartab.c
3798 chartab.h
3799 casetab.c
3800 @end example
3801
3802 @file{chartab.c} and @file{chartab.h} implement the @dfn{char table}
3803 Lisp object type, which maps from characters or certain sorts of
3804 character ranges to Lisp objects.  The implementation of this object
3805 type is optimized for the internal representation of characters.  Char
3806 tables come in different types, which affect the allowed object types to
3807 which a character can be mapped and also dictate certain other
3808 properties of the char table.
3809
3810 @cindex case table
3811 @file{casetab.c} implements one sort of char table, the @dfn{case
3812 table}, which maps characters to other characters of possibly different
3813 case.  These are used by XEmacs to implement case-changing primitives
3814 and to do case-insensitive searching.
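The following sketch shows the essence of a case table restricted to 8-bit characters: one map for downcasing and one for upcasing, plus a case-insensitive comparison that folds both sides through the downcase map.  Real case tables are char tables covering all characters, so the @code{struct case_table} here and its functions are hypothetical simplifications that only fill in ASCII letters.

```c
#include <stddef.h>

/* Hypothetical case table over 8-bit characters: two 256-entry maps. */
struct case_table
{
  unsigned char down[256];
  unsigned char up[256];
};

/* Identity mapping by default; only ASCII letters change case. */
static void
init_ascii_case_table (struct case_table *t)
{
  for (int c = 0; c < 256; c++)
    t->down[c] = t->up[c] = (unsigned char) c;
  for (int c = 'A'; c <= 'Z'; c++)
    {
      t->down[c] = (unsigned char) (c - 'A' + 'a');
      t->up[c - 'A' + 'a'] = (unsigned char) c;
    }
}

/* Compare N bytes case-insensitively by folding both sides through the
   downcase map, as a case-insensitive search would.  Returns 1 if equal. */
static int
case_fold_equal (const struct case_table *t,
                 const char *a, const char *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (t->down[(unsigned char) a[i]] != t->down[(unsigned char) b[i]])
      return 0;
  return 1;
}
```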
3815
3816
3817
3818 @example
3819 syntax.c
3820 syntax.h
3821 @end example
3822
3823 @cindex scanner
3824 This module implements @dfn{syntax tables}, another sort of char table
3825 that maps characters into syntax classes that define the syntax of these
3826 characters (e.g. a parenthesis belongs to a class of @samp{open}
3827 characters that have corresponding @samp{close} characters and can be
3828 nested).  This module also implements the Lisp @dfn{scanner}, a set of
3829 primitives for scanning over text based on syntax tables.  This is used,
3830 for example, to find the matching parenthesis in a command such as
3831 @code{forward-sexp}, and by @file{font-lock.c} to locate quoted strings,
3832 comments, etc.
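As an illustration of the scanner, the sketch below maps each byte to a syntax class and then finds the character that closes a given open parenthesis, skipping over string contents and escaped characters, which is the kind of scanning @code{forward-sexp} depends on.  It is a hypothetical simplification: real syntax tables are char tables, record which close character matches which open character, and handle comments, none of which is modeled here.

```c
#include <stddef.h>

/* Hypothetical, greatly simplified syntax table: each byte maps to a
   syntax class. */
enum syntax_class { SYN_OTHER, SYN_OPEN, SYN_CLOSE, SYN_STRING, SYN_ESCAPE };

static void
init_simple_syntax (enum syntax_class table[256])
{
  for (int c = 0; c < 256; c++)
    table[c] = SYN_OTHER;
  table['('] = table['['] = SYN_OPEN;
  table[')'] = table[']'] = SYN_CLOSE;
  table['"'] = SYN_STRING;
  table['\\'] = SYN_ESCAPE;
}

/* TEXT[START] should be an open character.  Return the index of the
   character that closes it, or -1 if there is none.  Delimiters inside
   strings and escaped characters are skipped. */
static long
scan_matching_close (const enum syntax_class table[256],
                     const char *text, size_t len, size_t start)
{
  int depth = 0, in_string = 0;
  for (size_t i = start; i < len; i++)
    {
      enum syntax_class cls = table[(unsigned char) text[i]];
      if (cls == SYN_ESCAPE)
        {
          i++;                    /* skip the escaped character */
          continue;
        }
      if (cls == SYN_STRING)
        {
          in_string = !in_string;
          continue;
        }
      if (in_string)
        continue;
      if (cls == SYN_OPEN)
        depth++;
      else if (cls == SYN_CLOSE && --depth == 0)
        return (long) i;
    }
  return -1;
}
```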
3833
3834
3835
3836 @example
3837 casefiddle.c
3838 @end example
3839
3840 This module implements various Lisp primitives for upcasing, downcasing
3841 and capitalizing strings or regions of buffers.
3842
3843
3844
3845 @example
3846 rangetab.c
3847 @end example
3848
3849 This module implements the @dfn{range table} Lisp object type, which
3850 provides for a mapping from ranges of integers to arbitrary Lisp
3851 objects.
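One straightforward representation is a sorted array of non-overlapping ranges searched with binary search, sketched below.  The inclusive-bounds convention, the @code{int} values (standing in for Lisp objects), and all names are assumptions of the sketch; insertion and merging of ranges are omitted.

```c
#include <stddef.h>

/* Hypothetical range table: sorted, non-overlapping ranges with
   inclusive bounds [start, end], each mapped to a value. */
struct range_entry
{
  long start, end;   /* start <= end */
  int value;
};

/* Return 1 and set *OUT if KEY falls in some range, else return 0. */
static int
range_table_lookup (const struct range_entry *tab, size_t n,
                    long key, int *out)
{
  size_t lo = 0, hi = n;          /* search within [lo, hi) */
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (key < tab[mid].start)
        hi = mid;
      else if (key > tab[mid].end)
        lo = mid + 1;
      else
        {
          *out = tab[mid].value;
          return 1;
        }
    }
  return 0;
}
```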
3852
3853
3854
3855 @example
3856 opaque.c
3857 opaque.h
3858 @end example
3859
3860 This module implements the @dfn{opaque} Lisp object type, an
3861 internal-only Lisp object that encapsulates an arbitrary block of memory
3862 so that it can be managed by the Lisp allocation system.  To create an
3863 opaque object, you call @code{make_opaque()}, passing a pointer to a
3864 block of memory.  An object is created that is big enough to hold the
3865 memory, which is copied into the object's storage.  The object will then
3866 stick around as long as you keep pointers to it, after which it will be
3867 automatically reclaimed.
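The copying behavior described above can be modeled in plain C with a size header followed by a flexible array member.  This is a hypothetical sketch: @code{make_opaque_sketch} is not the real @code{make_opaque()}, and @code{malloc()} and @code{free()} stand in for the Lisp allocation system and the garbage collector.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of an opaque object: a size header followed by a
   private copy of the caller's memory. */
struct opaque_sketch
{
  size_t size;
  unsigned char data[];   /* C99 flexible array member */
};

/* Allocate an object big enough for SIZE bytes and copy SRC into it.
   Later changes to SRC do not affect the object. */
static struct opaque_sketch *
make_opaque_sketch (const void *src, size_t size)
{
  struct opaque_sketch *p = malloc (sizeof *p + size);
  if (!p)
    return NULL;
  p->size = size;
  memcpy (p->data, src, size);
  return p;
}
```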
3868
3869 @cindex mark method
3870 Opaque objects can also have an arbitrary @dfn{mark method} associated
3871 with them, in case the block of memory contains other Lisp objects that
3872 need to be marked for garbage-collection purposes. (If you need other
3873 object methods, such as a finalize method, you should just go ahead and
3874 create a new Lisp object type---it's not hard.)
3875
3876
3877
3878 @example
3879 abbrev.c
3880 @end example
3881
This module provides a few primitives for doing abbreviation (abbrev)
expansion.  In XEmacs, most of the code for this has been moved into
3884 Lisp.  Some C code remains for speed and because the primitive
3885 @code{self-insert-command} (which is executed for all self-inserting
3886 characters) hooks into the abbrev mechanism. (@code{self-insert-command}
3887 is itself in C only for speed.)
3888
3889
3890
3891 @example
3892 doc.c
3893 @end example
3894
This module provides primitives for retrieving the documentation
3896 strings of functions and variables.  These documentation strings contain
3897 certain special markers that get dynamically expanded (e.g. a
3898 reverse-lookup is performed on some named functions to retrieve their
3899 current key bindings).  Some documentation strings (in particular, for
3900 the built-in primitives and pre-loaded Lisp functions) are stored
3901 externally in a file @file{DOC} in the @file{lib-src/} directory and
3902 need to be fetched from that file. (Part of the build stage involves
3903 building this file, and another part involves constructing an index for
3904 this file and embedding it into the executable, so that the functions in
3905 @file{doc.c} do not have to search the entire @file{DOC} file to find
3906 the appropriate documentation string.)
3907
3908
3909
3910 @example
3911 md5.c
3912 @end example
3913
This module provides a Lisp primitive that implements the MD5
message-digest algorithm, which computes a 128-bit hash value of a
string of data such that the data cannot feasibly be derived from the
hash value.  MD5 has been used for various security applications on the
Internet, although it is no longer considered collision-resistant.
3918
3919
3920
3921
3922 @node Modules for Interfacing with the Operating System
3923 @section Modules for Interfacing with the Operating System
3924
3925 @example
3926 callproc.c
3927 process.c
3928 process.h
3929 @end example
3930
3931 These modules allow XEmacs to spawn and communicate with subprocesses
3932 and network connections.
3933
3934 @cindex synchronous subprocesses
3935 @cindex subprocesses, synchronous
3936   @file{callproc.c} implements (through the @code{call-process}
3937 primitive) what are called @dfn{synchronous subprocesses}.  This means
that XEmacs runs a program, waits until it finishes, and retrieves its
3939 output.  A typical example might be calling the @file{ls} program to get
3940 a directory listing.
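The synchronous pattern can be sketched with standard POSIX calls: create a pipe, @code{fork()}, redirect the child's standard output into the pipe and @code{execvp()} the program, then read until end-of-file and @code{waitpid()} for the exit status.  The helper @code{run_synchronously} is hypothetical and much simpler than @file{callproc.c}, which handles many more cases.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run ARGV[0] with arguments ARGV (NULL-terminated),
   store up to OUTSIZE - 1 bytes of its standard output in OUT (always
   NUL-terminated; OUTSIZE >= 1), wait for it to exit, and return its exit
   status, or -1 on failure or abnormal termination. */
static int
run_synchronously (char *const argv[], char *out, size_t outsize)
{
  int fds[2];
  if (pipe (fds) < 0)
    return -1;

  pid_t pid = fork ();
  if (pid < 0)
    {
      close (fds[0]);
      close (fds[1]);
      return -1;
    }
  if (pid == 0)
    {
      /* Child: send standard output into the pipe, then run the program. */
      close (fds[0]);
      dup2 (fds[1], STDOUT_FILENO);
      close (fds[1]);
      execvp (argv[0], argv);
      _exit (127);                /* only reached if exec failed */
    }

  /* Parent: read until end-of-file, keeping what fits in OUT and
     draining the rest so the child never blocks on a full pipe. */
  close (fds[1]);
  size_t total = 0;
  for (;;)
    {
      char buf[256];
      ssize_t n = read (fds[0], buf, sizeof buf);
      if (n < 0 && errno == EINTR)
        continue;
      if (n <= 0)
        break;
      for (ssize_t i = 0; i < n && total + 1 < outsize; i++)
        out[total++] = buf[i];
    }
  out[total] = '\0';
  close (fds[0]);

  int status;
  while (waitpid (pid, &status, 0) < 0)
    if (errno != EINTR)
      return -1;
  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

For example, running @code{echo hello} this way collects @samp{hello} followed by a newline and returns exit status 0.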
3941
3942 @cindex asynchronous subprocesses
3943 @cindex subprocesses, asynchronous
3944   @file{process.c} and @file{process.h} implement @dfn{asynchronous
3945 subprocesses}.  This means that XEmacs starts a program and then
3946 continues normally, not waiting for the process to finish.  Data can be
3947 sent to the process or retrieved from it as it's running.  This is used
3948 for the @code{shell} command (which provides a front end onto a shell
3949 program such as @file{csh}), the mail and news readers implemented in
3950 XEmacs, etc.  The result of calling @code{start-process} to start a
3951 subprocess is a process object, a particular kind of object used to
3952 communicate with the subprocess.  You can send data to the process by
passing the process object and the data to @code{process-send-string}, and you
3954 can specify what happens to data retrieved from the process by setting
3955 properties of the process object. (When the process sends data, XEmacs
3956 receives a process event, which says that there is data ready.  When
3957 @code{dispatch-event} is called on this event, it reads the data from
3958 the process and does something with it, as specified by the process
3959 object's properties.  Typically, this means inserting the data into a
3960 buffer or calling a function.) Another property of the process object is
3961 called the @dfn{sentinel}, which is a function that is called when the
3962 process terminates.
3963
3964 @cindex network connections
3965   Process objects are also used for network connections (connections to a
3966 process running on another machine).  Network connections are started
3967 with @code{open-network-stream} but otherwise work just like
3968 subprocesses.
3969
3970
3971
3972 @example
3973 sysdep.c
3974 sysdep.h
3975 @end example
3976
3977   These modules implement most of the low-level, messy operating-system
3978 interface code.  This includes various device control (ioctl) operations
for file descriptors, TTYs, pseudo-terminals, etc. (usually this stuff
3980 is fairly system-dependent; thus the name of this module), and emulation
3981 of standard library functions and system calls on systems that don't
3982 provide them or have broken versions.
3983
3984
3985
3986 @example
3987 sysdir.h
3988 sysfile.h
3989 sysfloat.h
3990 sysproc.h
3991 syspwd.h
3992 syssignal.h
3993 systime.h
3994 systty.h
3995 syswait.h
3996 @end example
3997
3998 These header files provide consistent interfaces onto system-dependent
3999 header files and system calls.  The idea is that, instead of including a
4000 standard header file like @file{<sys/param.h>} (which may or may not
exist on various systems) or having to worry about whether all systems
provide a particular preprocessor constant, or having to deal with the
four different paradigms for manipulating signals, you just include the
appropriate @file{sys*.h} header file, which includes all the right
system header files, defines any missing preprocessor constants,
provides a uniform interface onto system calls, etc.
4007
4008 @file{sysdir.h} provides a uniform interface onto directory-querying
4009 functions. (In some cases, this is in conjunction with emulation
4010 functions in @file{sysdep.c}.)
4011
4012 @file{sysfile.h} includes all the necessary header files for standard
4013 system calls (e.g. @code{read()}), ensures that all necessary
4014 @code{open()} and @code{stat()} preprocessor constants are defined, and
4015 possibly (usually) substitutes sugared versions of @code{read()},
4016 @code{write()}, etc. that automatically restart interrupted I/O
4017 operations.
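The restarting idea can be sketched as thin wrappers around @code{read()} and @code{write()}.  These are hypothetical illustrations, not the actual definitions in @file{sysfile.h}; in particular, the loop in the write wrapper that continues after short writes is an extra convenience of this sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* read() that restarts if a signal interrupts it before any data is
   transferred (read() then fails with errno == EINTR). */
static ssize_t
read_restarting (int fd, void *buf, size_t nbyte)
{
  ssize_t n;
  do
    n = read (fd, buf, nbyte);
  while (n < 0 && errno == EINTR);
  return n;
}

/* write() that restarts after EINTR and also keeps going after short
   writes, so the caller sees either the full count or -1. */
static ssize_t
write_restarting (int fd, const void *buf, size_t nbyte)
{
  const char *p = buf;
  size_t left = nbyte;
  while (left > 0)
    {
      ssize_t n = write (fd, p, left);
      if (n < 0)
        {
          if (errno == EINTR)
            continue;
          return -1;
        }
      p += n;
      left -= (size_t) n;
    }
  return (ssize_t) nbyte;
}
```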
4018
4019 @file{sysfloat.h} includes the necessary header files for floating-point
4020 operations.
4021
4022 @file{sysproc.h} includes the necessary header files for calling
4023 @code{select()}, @code{fork()}, @code{execve()}, socket operations, and
4024 the like, and ensures that the @code{FD_*()} macros for descriptor-set
4025 manipulations are available.
4026
4027 @file{syspwd.h} includes the necessary header files for obtaining
4028 information from @file{/etc/passwd} (the functions are emulated under
4029 VMS).
4030
4031 @file{syssignal.h} includes the necessary header files for
4032 signal-handling and provides a uniform interface onto the different
4033 signal-handling and signal-blocking paradigms.
4034
4035 @file{systime.h} includes the necessary header files and provides
4036 uniform interfaces for retrieving the time of day, setting file
4037 access/modification times, getting the amount of time used by the XEmacs
4038 process, etc.
4039
4040 @file{systty.h} buffers against the infinitude of different ways of
controlling TTYs.
4042
4043 @file{syswait.h} provides a uniform way of retrieving the exit status
4044 from a @code{wait()}ed-on process (some systems use a union, others use
4045 an int).
4046
4047
4048
4049 @example
4050 hpplay.c
4051 libsst.c
4052 libsst.h
4053 libst.h
4054 linuxplay.c
4055 nas.c
4056 sgiplay.c
4057 sound.c
4058 sunplay.c
4059 @end example
4060
4061 These files implement the ability to play various sounds on some types
4062 of computers.  You have to configure your XEmacs with sound support in
4063 order to get this capability.
4064
@file{sound.c} provides the generic interface.  It implements various
Lisp primitives and variables that let you specify which sounds should
be played under certain conditions.  (The conditions are identified by
symbols, which are passed to @code{ding} to make a sound.)  Various
standard functions call @code{ding} at certain times; if sound support
does not exist, a simple beep results.
4071
4072 @cindex native sound
4073 @cindex sound, native
4074 @file{sgiplay.c}, @file{sunplay.c}, @file{hpplay.c}, and
@file{linuxplay.c} interface to the machine's speaker on various
kinds of machines.  This is called @dfn{native} sound.
4077
4078 @cindex sound, network
4079 @cindex network sound
4080 @cindex NAS
4081 @file{nas.c} interfaces to a computer somewhere else on the network
4082 using the NAS (Network Audio Server) protocol, playing sounds on that
4083 machine.  This allows you to run XEmacs on a remote machine, with its
4084 display set to your local machine, and have the sounds be made on your
4085 local machine, provided that you have a NAS server running on your local
4086 machine.
4087
4088 @file{libsst.c}, @file{libsst.h}, and @file{libst.h} provide some
4089 additional functions for playing sound on a Sun SPARC but are not
4090 currently in use.
4091
4092
4093
4094 @example
4095 tooltalk.c
4096 tooltalk.h
4097 @end example
4098
4099 These two modules implement an interface to the ToolTalk protocol, which
4100 is an interprocess communication protocol implemented on some versions
4101 of Unix.  ToolTalk is a high-level protocol that allows processes to
4102 register themselves as providers of particular services; other processes
4103 can then request a service without knowing or caring exactly who is
4104 providing the service.  It is similar in spirit to the DDE protocol
4105 provided under Microsoft Windows.  ToolTalk is a part of the new CDE
4106 (Common Desktop Environment) specification and is used to connect the
4107 parts of the SPARCWorks development environment.
4108
4109
4110
4111 @example
4112 getloadavg.c
4113 @end example
4114
4115 This module provides the ability to retrieve the system's current load
4116 average. (The way to do this is highly system-specific, unfortunately,
4117 and requires a lot of special-case code.)
4118
4119
4120
4121 @example
4122 sunpro.c
4123 @end example
4124
4125 This module provides a small amount of code used internally at Sun to
4126 keep statistics on the usage of XEmacs.
4127
4128
4129
4130 @example
4131 broken-sun.h
4132 strcmp.c
4133 strcpy.c
4134 sunOS-fix.c
4135 @end example
4136
4137 These files provide replacement functions and prototypes to fix numerous
4138 bugs in early releases of SunOS 4.1.
4139
4140
4141
4142 @example
4143 hftctl.c
4144 @end example
4145
4146 This module provides some terminal-control code necessary on versions of
4147 AIX prior to 4.1.
4148
4149
4150
4151 @example
4152 msdos.c
4153 msdos.h
4154 @end example
4155
4156 These modules are used for MS-DOS support, which does not work in
4157 XEmacs.
4158
4159
4160
4161 @node Modules for Interfacing with X Windows
4162 @section Modules for Interfacing with X Windows
4163
4164 @example
4165 Emacs.ad.h
4166 @end example
4167
This file is generated from @file{Emacs.ad}, which contains XEmacs-supplied
fallback resources (so that XEmacs has pretty defaults).
4170
4171
4172
4173 @example
4174 EmacsFrame.c
4175 EmacsFrame.h
4176 EmacsFrameP.h
4177 @end example
4178
4179 These modules implement an Xt widget class that encapsulates a frame.
4180 This is for ease in integrating with Xt.  The EmacsFrame widget covers
4181 the entire X window except for the menubar; the scrollbars are
4182 positioned on top of the EmacsFrame widget.
4183
4184 @strong{Warning:} Abandon hope, all ye who enter here.  This code took
4185 an ungodly amount of time to get right, and is likely to fall apart
4186 mercilessly at the slightest change.  Such is life under Xt.
4187
4188
4189
4190 @example
4191 EmacsManager.c
4192 EmacsManager.h
4193 EmacsManagerP.h
4194 @end example
4195
4196 These modules implement a simple Xt manager (i.e. composite) widget
class that lets its children set whatever geometry they want.
It's amazing that Xt doesn't provide this as standard, but on second
4199 thought, it makes sense, considering how amazingly broken Xt is.
4200
4201
4202 @example
4203 EmacsShell-sub.c
4204 EmacsShell.c
4205 EmacsShell.h
4206 EmacsShellP.h
4207 @end example
4208
4209 These modules implement two Xt widget classes that are subclasses of
4210 the TopLevelShell and TransientShell classes.  This is necessary to deal
4211 with more brokenness that Xt has sadistically thrust onto the backs of
4212 developers.
4213
4214
4215
4216 @example
4217 xgccache.c
4218 xgccache.h
4219 @end example
4220
These modules provide functions for maintenance and caching of GCs
4222 (graphics contexts) under the X Window System.  This code is junky and
4223 needs to be rewritten.
4224
4225
4226
4227 @example
4228 xselect.c
4229 @end example
4230
4231 @cindex selections
4232   This module provides an interface to the X Window System's concept of
4233 @dfn{selections}, the standard way for X applications to communicate
4234 with each other.
4235
4236
4237
4238 @example
4239 xintrinsic.h
4240 xintrinsicp.h
4241 xmmanagerp.h
4242 xmprimitivep.h
4243 @end example
4244
4245 These header files are similar in spirit to the @file{sys*.h} files and buffer
4246 against different implementations of Xt and Motif.
4247
4248 @itemize @bullet
4249 @item
4250 @file{xintrinsic.h} should be included in place of @file{<Intrinsic.h>}.
4251 @item
4252 @file{xintrinsicp.h} should be included in place of @file{<IntrinsicP.h>}.
4253 @item
4254 @file{xmmanagerp.h} should be included in place of @file{<XmManagerP.h>}.
4255 @item
4256 @file{xmprimitivep.h} should be included in place of @file{<XmPrimitiveP.h>}.
4257 @end itemize
4258
4259
4260
4261 @example
4262 xmu.c
4263 xmu.h
4264 @end example
4265
4266 These files provide an emulation of the Xmu library for those systems
4267 (i.e. HPUX) that don't provide it as a standard part of X.
4268
4269
4270
4271 @example
4272 ExternalClient-Xlib.c
4273 ExternalClient.c
4274 ExternalClient.h
4275 ExternalClientP.h
4276 ExternalShell.c
4277 ExternalShell.h
4278 ExternalShellP.h
4279 extw-Xlib.c
4280 extw-Xlib.h
4281 extw-Xt.c
4282 extw-Xt.h
4283 @end example
4284
4285 @cindex external widget
4286   These files provide the @dfn{external widget} interface, which allows an
4287 XEmacs frame to appear as a widget in another application.  To do this,
4288 you have to configure with @samp{--external-widget}.
4289
4290 @file{ExternalShell*} provides the server (XEmacs) side of the
4291 connection.
4292
4293 @file{ExternalClient*} provides the client (other application) side of
4294 the connection.  These files are not compiled into XEmacs but are
4295 compiled into libraries that are then linked into your application.
4296
4297 @file{extw-*} is common code that is used for both the client and server.
4298
4299 Don't touch this code; something is liable to break if you do.
4300
4301
4302
4303 @node Modules for Internationalization
4304 @section Modules for Internationalization
4305
4306 @example
4307 mule-canna.c
4308 mule-ccl.c
4309 mule-charset.c
4310 mule-charset.h
4311 mule-coding.c
4312 mule-coding.h
4313 mule-mcpath.c
4314 mule-mcpath.h
4315 mule-wnnfns.c
4316 mule.c
4317 @end example
4318
4319 These files implement the MULE (Asian-language) support.  Note that MULE
4320 actually provides a general interface for all sorts of languages, not
4321 just Asian languages (although they are generally the most complicated
4322 to support).  This code is still in beta.
4323
4324 @file{mule-charset.*} and @file{mule-coding.*} provide the heart of the
4325 XEmacs MULE support.  @file{mule-charset.*} implements the @dfn{charset}
4326 Lisp object type, which encapsulates a character set (an ordered one- or
4327 two-dimensional set of characters, such as US ASCII or JISX0208 Japanese
4328 Kanji).
4329
4330 @file{mule-coding.*} implements the @dfn{coding-system} Lisp object
4331 type, which encapsulates a method of converting between different
4332 encodings.  An encoding is a representation of a stream of characters,
4333 possibly from multiple character sets, using a stream of bytes or words,
4334 and defines (e.g.) which escape sequences are used to specify particular
4335 character sets, how the indices for a character are converted into bytes
4336 (sometimes this involves setting the high bit; sometimes complicated
4337 rearranging of the values takes place, as in the Shift-JIS encoding),
4338 etc.
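To make the byte-level rearranging concrete, the sketch below converts a JIS X 0208 code (two 7-bit bytes, each in the range 0x21 to 0x7E) into EUC-JP, which simply sets the high bit of each byte, and into Shift-JIS, which folds pairs of rows into a single lead byte.  The arithmetic is the standard mapping between these encodings; the function names are hypothetical, and a real coding system must also handle ASCII, other character sets, and invalid input.  For example, the Hiragana letter A (JIS 0x2422) comes out as 0xA4 0xA2 in EUC-JP and 0x82 0xA0 in Shift-JIS.

```c
/* Convert a JIS X 0208 code (row byte J1 and column byte J2, each in
   0x21..0x7E) into two other encodings of the same character.
   Function names are hypothetical. */

/* EUC-JP: set the high bit of each byte. */
static void
jis_to_euc (unsigned j1, unsigned j2, unsigned char out[2])
{
  out[0] = (unsigned char) (j1 | 0x80);
  out[1] = (unsigned char) (j2 | 0x80);
}

/* Shift-JIS: each pair of JIS rows shares one lead byte, and the trail
   byte range depends on whether the row is odd or even (0x7F is never
   used as a trail byte). */
static void
jis_to_sjis (unsigned j1, unsigned j2, unsigned char out[2])
{
  out[0] = (unsigned char) (((j1 + 1) >> 1) + (j1 <= 0x5E ? 0x70 : 0xB0));
  if (j1 & 1)
    out[1] = (unsigned char) (j2 + (j2 >= 0x60 ? 0x20 : 0x1F));
  else
    out[1] = (unsigned char) (j2 + 0x7E);
}
```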
4339
4340 @file{mule-ccl.c} provides the CCL (Code Conversion Language)
4341 interpreter.  CCL is similar in spirit to Lisp byte code and is used to
4342 implement converters for custom encodings.
4343
4344 @file{mule-canna.c} and @file{mule-wnnfns.c} implement interfaces to
4345 external programs used to implement the Canna and WNN input methods,
4346 respectively.  This is currently in beta.
4347
4348 @file{mule-mcpath.c} provides some functions to allow for pathnames
4349 containing extended characters.  This code is fragmentary, obsolete, and
completely non-working.  Instead, @code{pathname-coding-system} is used
4351 to specify conversions of names of files and directories.  The standard
4352 C I/O functions like @samp{open()} are wrapped so that conversion occurs
4353 automatically.
4354
4355 @file{mule.c} provides a few miscellaneous things that should probably
4356 be elsewhere.
4357
4358
4359
4360 @example
4361 intl.c
4362 @end example
4363
4364 This provides some miscellaneous internationalization code for
4365 implementing message translation and interfacing to the Ximp input
4366 method.  None of this code is currently working.
4367
4368
4369
4370 @example
4371 iso-wide.h
4372 @end example
4373
4374 This contains leftover code from an earlier implementation of
4375 Asian-language support, and is not currently used.
4376
4377
4378
4379
4380 @node Allocation of Objects in XEmacs Lisp, Events and the Event Loop, A Summary of the Various XEmacs Modules, Top
4381 @chapter Allocation of Objects in XEmacs Lisp
4382
4383 @menu
4384 * Introduction to Allocation::
4385 * Garbage Collection::
4386 * GCPROing::
4387 * Garbage Collection - Step by Step::
4388 * Integers and Characters::
4389 * Allocation from Frob Blocks::
4390 * lrecords::
4391 * Low-level allocation::
4392 * Pure Space::
4393 * Cons::
4394 * Vector::
4395 * Bit Vector::
4396 * Symbol::
4397 * Marker::
4398 * String::
4399 * Compiled Function::
4400 @end menu
4401
4402 @node Introduction to Allocation
4403 @section Introduction to Allocation
4404
4405   Emacs Lisp, like all Lisps, has garbage collection.  This means that
4406 the programmer never has to explicitly free (destroy) an object; it
4407 happens automatically when the object becomes inaccessible.  Most
4408 experts agree that garbage collection is a necessity in a modern,
4409 high-level language.  Its omission from C stems from the fact that C was
4410 originally designed to be a nice abstract layer on top of assembly
4411 language, for writing kernels and basic system utilities rather than
4412 large applications.
4413
4414   Lisp objects can be created by any of a number of Lisp primitives.
4415 Most object types have one or a small number of basic primitives
4416 for creating objects.  For conses, the basic primitive is @code{cons};
4417 for vectors, the primitives are @code{make-vector} and @code{vector}; for
4418 symbols, the primitives are @code{make-symbol} and @code{intern}; etc.
4419 Some Lisp objects, especially those that are primarily used internally,
4420 have no corresponding Lisp primitives.  Every Lisp object, though,
4421 has at least one C primitive for creating it.
4422
  Recall from the discussion of how Lisp objects are represented in C
that a Lisp object, as stored in a 32-bit
4424 or 64-bit word, has a mark bit, a few tag bits, and a ``value'' that
4425 occupies the remainder of the bits.  We can separate the different
4426 Lisp object types into four broad categories:
4427
4428 @itemize @bullet
4429 @item
(a) Those for which the value directly represents the contents of the
4431 Lisp object.  Only two types are in this category: integers and
4432 characters.  No special allocation or garbage collection is necessary
4433 for such objects.  Lisp objects of these types do not need to be
4434 @code{GCPRO}ed.
4435 @end itemize
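To see why integers need no allocation, consider a hypothetical 32-bit layout with the mark bit at the top, three tag bits below it, and a 28-bit value field.  The sketch below is only an illustration; the bit positions, tag numbers, and names are assumptions and do not claim to match XEmacs's actual definitions.

```c
#include <stdint.h>

/* Hypothetical 32-bit layout, for illustration only:
     bit 31      mark bit (used by the garbage collector)
     bits 28-30  type tag
     bits 0-27   value (an integer, a character, or a pointer)  */
typedef uint32_t lisp_word;

#define VALBITS   28
#define MARK_BIT  (1u << 31)
#define TAG_MASK  0x7u
#define VAL_MASK  ((1u << VALBITS) - 1)
#define TAG_INT   0u                 /* made-up tag numbers */
#define TAG_CHAR  1u

static lisp_word
make_tagged (unsigned tag, uint32_t val)
{
  return ((tag & TAG_MASK) << VALBITS) | (val & VAL_MASK);
}

/* The tag ignores the mark bit. */
static unsigned
word_tag (lisp_word w)
{
  return (w >> VALBITS) & TAG_MASK;
}

/* An integer lives entirely inside the word, so creating one allocates
   nothing. */
static lisp_word
make_int_sketch (int32_t n)
{
  return make_tagged (TAG_INT, (uint32_t) n);
}

/* Decode by sign-extending the 28-bit value field. */
static int32_t
int_value_sketch (lisp_word w)
{
  uint32_t v = w & VAL_MASK;
  if (v & (1u << (VALBITS - 1)))
    v |= ~VAL_MASK;
  return (int32_t) v;
}
```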
4436
4437   In the remaining three categories, the value is a pointer to a
4438 structure.
4439
4440 @itemize @bullet
4441 @item
4442 @cindex frob block
(b) Those for which the tag directly specifies the type.  Recall that
there are only three tag bits, giving eight possible tag values; since
integers, characters, and the generic lrecord tag described below take
up three of these, at most five types can be specified this way.  The
most commonly used types are stored in this format; this includes
conses, strings, vectors, and sometimes symbols.
4447 With the exception of vectors, objects in this category are allocated in
4448 @dfn{frob blocks}, i.e. large blocks of memory that are subdivided into
4449 individual objects.  This saves a lot on malloc overhead, since there
4450 are typically quite a lot of these objects around, and the objects are
4451 small.  (A cons, for example, occupies 8 bytes on 32-bit machines---4
4452 bytes for each of the two objects it contains.) Vectors are individually
4453 @code{malloc()}ed since they are of variable size.  (It would be
4454 possible, and desirable, to allocate vectors of certain small sizes out
4455 of frob blocks, but it isn't currently done.) Strings are handled
4456 specially: Each string is allocated in two parts, a fixed size structure
4457 containing a length and a data pointer, and the actual data of the
4458 string.  The former structure is allocated in frob blocks as usual, and
4459 the latter data is stored in @dfn{string chars blocks} and is relocated
4460 during garbage collection to eliminate holes.
4461 @end itemize
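
The following sketch shows the idea behind frob blocks for one fixed-size type: many objects are carved out of each @code{malloc()}ed block, blocks are chained through a @code{prev} pointer, and freed objects are reused from a free list before the current block is consulted.  All names (@code{toy_cons}, @code{toy_alloc_cons}, and so on) and the block size of 128 are hypothetical; the real allocator is built from macros in @code{alloc.c}.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical fixed-size object: a cons with two word-sized slots. */
struct toy_cons
{
  void *car;
  void *cdr;
};

#define TOY_CONS_BLOCK_SIZE 128   /* objects per frob block (assumed) */

/* One frob block: many conses in a single malloc()ed chunk, chained
   to the previously allocated block. */
struct toy_cons_block
{
  struct toy_cons_block *prev;
  struct toy_cons block[TOY_CONS_BLOCK_SIZE];
};

static struct toy_cons_block *current_cons_block;
/* Start "full" so that the first allocation creates a block. */
static int current_cons_block_index = TOY_CONS_BLOCK_SIZE;
static int cons_blocks_allocated;

/* Freed conses are threaded through their car slot. */
static struct toy_cons *cons_free_list;

static struct toy_cons *
toy_alloc_cons (void)
{
  struct toy_cons *c;

  if (cons_free_list)
    {
      /* Reuse a cell freed earlier. */
      c = cons_free_list;
      cons_free_list = (struct toy_cons *) c->car;
      return c;
    }
  if (current_cons_block_index == TOY_CONS_BLOCK_SIZE)
    {
      /* Current block is full: one malloc() buys many objects. */
      struct toy_cons_block *b = malloc (sizeof *b);
      assert (b != NULL);
      b->prev = current_cons_block;
      current_cons_block = b;
      current_cons_block_index = 0;
      cons_blocks_allocated++;
    }
  return &current_cons_block->block[current_cons_block_index++];
}

static void
toy_free_cons (struct toy_cons *c)
{
  c->car = cons_free_list;   /* push onto the free list */
  cons_free_list = c;
}
```

Allocating 300 conses with this sketch calls @code{malloc()} only three times, which is the overhead saving described above.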

  In the remaining two categories, the type is stored in the object
itself.  The tag for all such objects is the generic @dfn{lrecord}
(Lisp_Record) tag.  The first four bytes (or eight, for 64-bit machines)
of the object's structure are a pointer to a structure that describes
the object's type, which includes method pointers and a pointer to a
string naming the type.  Note that it's possible to save some space by
using a one- or two-byte tag, rather than a four- or eight-byte pointer
to store the type, but it's not clear it's worth making the change.
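
A minimal sketch of the lrecord idea: every such object begins with a pointer to a per-type descriptor holding the type's name and method pointers, so a type check is a single pointer comparison.  The structure and function names here are hypothetical, and the descriptor omits most of what a real one contains.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_lrecord_header;

/* Per-type descriptor: a name plus method pointers (only a marker
   method is shown in this sketch). */
struct toy_lrecord_implementation
{
  const char *name;
  struct toy_lrecord_header *(*marker) (struct toy_lrecord_header *);
};

/* Every lrecord starts with a pointer to its type's descriptor. */
struct toy_lrecord_header
{
  const struct toy_lrecord_implementation *implementation;
};

/* A hypothetical "float" lrecord: header first, then the payload. */
struct toy_float
{
  struct toy_lrecord_header header;
  double value;
};

/* Floats reference no other objects, so they need no marker. */
static const struct toy_lrecord_implementation toy_float_impl =
  { "float", NULL };

/* A type test is a single pointer comparison. */
static int
toy_floatp (const struct toy_lrecord_header *h)
{
  return h->implementation == &toy_float_impl;
}

static const char *
toy_type_name (const struct toy_lrecord_header *h)
{
  return h->implementation->name;
}
```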

@itemize @bullet
@item
(c) Those lrecords that are allocated in frob blocks (see above).  This
includes the objects that are most common and relatively small, and
includes floats, compiled functions, symbols (when not in category (b)),
extents, events, and markers.  With the cleanup of frob blocks done in
19.12, it's not terribly hard to add more objects to this category, but
it's a bit trickier than adding an object type to category (d)
(especially if the object needs a finalization method), and is not
likely to save much space unless the object is small and there are many
of them.  (In fact, if there are very few of them, it might actually
waste space.)
@item
(d) Those lrecords that are individually @code{malloc()}ed.  These are
called @dfn{lcrecords}.  All other types are in this category.  Adding a
new type to this category is comparatively easy, and all types added
since 19.8 (when the current allocation scheme was devised, by Richard
Mlynarik), with the exception of the character type, have been in this
category.
@end itemize

  Note that bit vectors are a bit of a special case.  They are
simple lrecords as in category (c), but are individually @code{malloc()}ed
like vectors.  You can basically view them as exactly like vectors
except that their type is stored in lrecord fashion rather than
in directly-tagged fashion.

  Note that FSF Emacs redesigned their object system in 19.29 to follow
a similar scheme.  However, given RMS's expressed dislike for data
abstraction, the FSF scheme is not nearly as clean or as easy to
extend.  (FSF calls items of type (c) @code{Lisp_Misc} and items of type
(d) @code{Lisp_Vectorlike}, with separate tags for each, although
@code{Lisp_Vectorlike} is also used for vectors.)

@node Garbage Collection
@section Garbage Collection
@cindex garbage collection

@cindex mark and sweep
  Garbage collection is simple in theory but tricky to implement.
Emacs Lisp uses the oldest garbage collection method, called
@dfn{mark and sweep}.  Garbage collection starts from all accessible
locations (i.e. all variables and other slots where Lisp objects might
occur) and recursively traverses all objects accessible from those
slots, marking each one that is found.  We then go through all of
memory, freeing each object that is not marked and unmarking each
object that is marked.  Note that ``all of memory'' means all currently
allocated objects.  Traversing all these objects means traversing all
frob blocks, all vectors (which are chained in one big list), and all
lcrecords (which are likewise chained).
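
The two phases can be sketched in C as follows, assuming (hypothetically) that every object has at most two references, a @code{marked} flag, and a link in a single chain of all allocated objects, much like the chained vectors and lcrecords mentioned above.  The recursive marker is the simplest possible version; the @code{mark_object} subsection describes how the real marker iterates instead of recursing in some cases.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical heap object: up to two references and a mark flag. */
struct toy_obj
{
  struct toy_obj *next_allocated;   /* chain of all allocated objects */
  struct toy_obj *ref[2];
  int marked;
};

static struct toy_obj *all_objects;

static struct toy_obj *
toy_new (struct toy_obj *a, struct toy_obj *b)
{
  struct toy_obj *o = calloc (1, sizeof *o);
  assert (o != NULL);
  o->ref[0] = a;
  o->ref[1] = b;
  o->next_allocated = all_objects;
  all_objects = o;
  return o;
}

/* Mark phase: recursively mark everything reachable from OBJ. */
static void
toy_mark (struct toy_obj *obj)
{
  if (obj == NULL || obj->marked)
    return;
  obj->marked = 1;
  toy_mark (obj->ref[0]);
  toy_mark (obj->ref[1]);
}

/* Sweep phase: walk every allocated object, free the unmarked ones
   and unmark the survivors.  Returns the number of objects freed. */
static int
toy_sweep (void)
{
  struct toy_obj **prev = &all_objects;
  int freed = 0;

  while (*prev)
    {
      struct toy_obj *o = *prev;
      if (o->marked)
        {
          o->marked = 0;
          prev = &o->next_allocated;
        }
      else
        {
          *prev = o->next_allocated;   /* unlink, then free */
          free (o);
          freed++;
        }
    }
  return freed;
}

static int
toy_gc (struct toy_obj **roots, int nroots)
{
  for (int i = 0; i < nroots; i++)
    toy_mark (roots[i]);
  return toy_sweep ();
}
```

With one root pointing to a two-object chain, plus an unreachable object and an unreachable self-referencing object, @code{toy_gc} frees exactly the two unreachable ones; the cycle causes no trouble for mark and sweep.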

  Note that, when an object is marked, the mark has to occur inside of
the object's structure, rather than in the 32-bit @code{Lisp_Object}
holding the object's pointer; i.e. you can't just set the pointer's mark
bit.  This is because there may be many pointers to the same object.
This means that the method of marking an object can differ depending on
the type.  The different marking methods are approximately as follows:

@enumerate
@item
For conses, the mark bit of the car is set.
@item
For strings, the mark bit of the string's plist is set.
@item
For symbols, when they are not lrecords, the mark bit of the
symbol's plist is set.
@item
For vectors, the length is negated after adding 1.
@item
For lrecords, the pointer to the structure describing
the type is changed (see below).
@item
Integers and characters do not need to be marked, since
no allocation occurs for them.
@end enumerate

  The details of this are in the @code{mark_object()} function.
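
The vector rule is easy to get wrong, so here is a sketch of the arithmetic.  Adding 1 before negating guarantees that even a zero-length vector gets a negative, and therefore recognizably marked, size; the original length is recovered as @math{-size - 1}.  The structure and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical vector header; only the size field matters here. */
struct toy_vector
{
  long size;   /* >= 0 normally; negative while marked during GC */
};

/* Adding 1 before negating ensures that even a zero-length vector
   ends up with a negative (and hence detectable) size. */
static void
toy_mark_vector (struct toy_vector *v)
{
  assert (v->size >= 0);
  v->size = -(v->size + 1);
}

static int
toy_vector_marked_p (const struct toy_vector *v)
{
  return v->size < 0;
}

/* Recover the real length whether or not the vector is marked. */
static long
toy_vector_length (const struct toy_vector *v)
{
  return v->size < 0 ? -v->size - 1 : v->size;
}

static void
toy_unmark_vector (struct toy_vector *v)
{
  assert (toy_vector_marked_p (v));
  v->size = -v->size - 1;
}
```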

  Note that any code that operates during garbage collection has to be
especially careful, because some objects may be marked and as such may
not look like they normally do.  In particular:

@itemize @bullet
@item
Some object pointers may have their mark bit set.  This will make
@code{FOOBARP()} predicates fail.  Use @code{GC_FOOBARP()} to deal with
this.
@item
Even if you clear the mark bit, @code{FOOBARP()} will still fail
for lrecords because the implementation pointer has been
changed (see below).  @code{GC_FOOBARP()} will correctly deal with
this.
@item
Vectors have their size field munged, so anything that
looks at this field will fail.
@item
Note that @code{XFOOBAR()} macros @emph{will} work correctly on object
pointers with their mark bit set, because the logical shift operations
that remove the tag also remove the mark bit.
@end itemize
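
The difference between @code{FOOBARP()}, @code{GC_FOOBARP()} and @code{XFOOBAR()} can be sketched with a made-up 32-bit layout (mark bit in bit 31, a 3-bit type tag in bits 28 to 30, the value below).  The layout, the tag value and all names are assumptions for illustration, not XEmacs's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: bit 31 = mark bit, bits 28..30 = type tag,
   bits 0..27 = value. */
typedef uint32_t toy_lisp_object;

#define TOY_VALBITS    28
#define TOY_GCTYPEBITS 3
#define TOY_MARKBIT    ((toy_lisp_object) 1 << 31)
#define TOY_TAG_CONS   5u

static toy_lisp_object
toy_make (unsigned tag, uint32_t value)
{
  return ((toy_lisp_object) tag << TOY_VALBITS)
         | (value & (((toy_lisp_object) 1 << TOY_VALBITS) - 1));
}

/* Like FOOBARP(): compares everything above the value bits, so it
   also sees the mark bit and fails on a marked object. */
static int
toy_consp (toy_lisp_object obj)
{
  return (obj >> TOY_VALBITS) == TOY_TAG_CONS;
}

/* Like GC_FOOBARP(): masks the mark bit out before comparing. */
static int
toy_gc_consp (toy_lisp_object obj)
{
  return ((obj >> TOY_VALBITS) & ((1u << TOY_GCTYPEBITS) - 1))
         == TOY_TAG_CONS;
}

/* Like XFOOBAR(): two logical shifts drop both the tag and the
   mark bit, so it works on marked objects too. */
static uint32_t
toy_xvalue (toy_lisp_object obj)
{
  return (toy_lisp_object) (obj << (32 - TOY_VALBITS))
         >> (32 - TOY_VALBITS);
}
```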

  Finally, note that garbage collection can be invoked explicitly by
calling @code{garbage-collect}, but is also called automatically by
@code{eval} once a certain amount of memory has been allocated since the
last garbage collection (according to @code{gc-cons-threshold}).

@node GCPROing
@section @code{GCPRO}ing

@code{GCPRO}ing is one of the ugliest and trickiest parts of Emacs
internals.  The basic idea is that whenever garbage collection
occurs, all in-use objects must be reachable somehow or
other from one of the roots of accessibility.  The roots
of accessibility are:

@enumerate
@item
All objects that have been @code{staticpro()}d.  This is used for
any global C variables that hold Lisp objects.  A call to
@code{staticpro()} happens implicitly as a result of any symbols
declared with @code{defsymbol()} and any variables declared with
@code{DEFVAR_FOO()}.  You need to explicitly call @code{staticpro()}
(in the @code{vars_of_foo()} method of a module) for other global
C variables holding Lisp objects.  (This typically includes
internal lists and such things.)

Note that @code{obarray} is one of the @code{staticpro()}d things.
Therefore, all functions and variables get marked through this.
@item
Any shadowed bindings that are sitting on the @code{specpdl} stack.
@item
Any objects sitting in currently active (Lisp) stack frames,
catches, and condition cases.
@item
A couple of special-case places where active objects are
located.
@item
Anything currently marked with @code{GCPRO}.
@end enumerate
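
The first root set can be pictured as a table of @emph{addresses} of global C variables, which is why a variable only needs to be registered once even if its value changes later.  In this sketch, @code{toy_staticpro}, @code{toy_staticvec}, the variable @code{Vsome_list} used in the test, and the fixed table size are hypothetical stand-ins for @code{staticpro()} and its @code{staticvec} array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object: only a mark flag matters for this sketch. */
struct toy_obj { int marked; };
typedef struct toy_obj *toy_lisp_object;

#define TOY_STATICVEC_SIZE 64

/* Addresses (not values) of C globals holding Lisp objects. */
static toy_lisp_object *toy_staticvec[TOY_STATICVEC_SIZE];
static int toy_staticidx;

/* Register a global so the collector treats it as a root. */
static void
toy_staticpro (toy_lisp_object *varaddress)
{
  assert (toy_staticidx < TOY_STATICVEC_SIZE);
  toy_staticvec[toy_staticidx++] = varaddress;
}

/* Mark phase, first root set: each variable is read at GC time, so a
   value stored after registration is still protected. */
static void
toy_mark_static_roots (void)
{
  for (int i = 0; i < toy_staticidx; i++)
    if (*toy_staticvec[i] != NULL)
      (*toy_staticvec[i])->marked = 1;
}
```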

  Marking with @code{GCPRO} is necessary because some C functions (quite
a lot, in fact) allocate objects during their operation.  Quite
frequently, there will be no other pointer to the object while the
function is running, and if a garbage collection occurs and the object
needs to be referenced again, bad things will happen.  The solution is
to mark those objects with @code{GCPRO}.  Unfortunately this is easy to
forget, and there is basically no way around this problem.  Here are
some rules, though:

@enumerate
@item
For every @code{GCPRO@var{n}}, there have to be declarations of
@code{struct gcpro gcpro1, gcpro2}, etc.

@item
You @emph{must} @code{UNGCPRO} anything that's @code{GCPRO}ed, and you
@emph{must not} @code{UNGCPRO} if you haven't @code{GCPRO}ed.  Getting
either of these wrong will lead to crashes, often in completely random
places unrelated to where the problem lies.

@item
The way this actually works is that all currently active @code{GCPRO}s
are chained through the @code{struct gcpro} local variables, with the
variable @samp{gcprolist} pointing to the head of the list and the nth
local @code{gcpro} variable pointing to the first @code{gcpro} variable
in the next enclosing stack frame.  Each @code{GCPRO}ed thing is an
lvalue, and the @code{struct gcpro} local variable contains a pointer to
this lvalue.  This is why things will mess up badly if you don't pair up
the @code{GCPRO}s and @code{UNGCPRO}s: you will end up with
@code{gcprolist}s containing pointers to @code{struct gcpro}s or local
@code{Lisp_Object} variables in no-longer-active stack frames.

@item
It is actually possible for a single @code{struct gcpro} to
protect a contiguous array of any number of values, rather than
just a single lvalue.  To effect this, call @code{GCPRO@var{n}} as usual on
the first object in the array and then set @code{gcpro@var{n}.nvars}.

@item
@strong{Strings are relocated.}  What this means in practice is that the
pointer obtained using @code{XSTRING_DATA()} is liable to change at any
time, and you should never keep it around past any function call, or
pass it as an argument to any function that might cause a garbage
collection.  This is why a number of functions accept either a
``non-relocatable'' @code{char *} pointer or a relocatable Lisp string,
and only access the Lisp string's data at the very last minute.  In some
cases, you may end up having to @code{alloca()} some space and copy the
string's data into it.

@item
By convention, if you have to nest @code{GCPRO}s, use @code{NGCPRO@var{n}}
(along with @code{struct gcpro ngcpro1, ngcpro2}, etc.), @code{NNGCPRO@var{n}},
etc.  This avoids compiler warnings about shadowed locals.

@item
It is @emph{always} better to err on the side of extra @code{GCPRO}s
rather than too few.  The extra cycles spent on this are
almost never going to make a whit of difference in the
speed of anything.

@item
The general rule to follow is that caller, not callee, @code{GCPRO}s.
That is, you should not have to explicitly @code{GCPRO} any Lisp objects
that are passed in as parameters.

One exception to this rule is if you ever plan to change the parameter
value, and store a new object in it.  In that case, you @emph{must}
@code{GCPRO} the parameter, because otherwise the new object will not be
protected.

So, if you create any Lisp objects (remember, this happens in all sorts
of circumstances, e.g. with @code{Fcons()}, etc.), you are responsible
for @code{GCPRO}ing them, unless you are @emph{absolutely sure} that
there's no possibility that a garbage collection can occur while you
need to use the object.  Even then, consider @code{GCPRO}ing.

@item
A garbage collection can occur whenever anything calls @code{Feval}, or
whenever a @code{QUIT} can occur where execution can continue past
it.  (Remember, this is almost anywhere.)

@item
If you have the @emph{least smidgeon of doubt} about whether
you need to @code{GCPRO}, you should @code{GCPRO}.

@item
Beware of @code{GCPRO}ing something that is uninitialized.  If you have
any shade of doubt about this, initialize all your variables to @code{Qnil}.

@item
Be careful of traps, like calling @code{Fcons()} in the argument to
another function.  By the ``caller protects'' law, you should be
@code{GCPRO}ing the newly-created cons, but you aren't.  A certain
number of functions that are commonly called on freshly created stuff
(e.g. @code{nconc2()}, @code{Fsignal()}) break the ``caller protects''
law and go ahead and @code{GCPRO} their arguments so as to simplify
things, but make sure to check whether it's OK whenever doing something
like this.

@item
Once again, remember to @code{GCPRO}!  Bugs resulting from insufficient
@code{GCPRO}ing are intermittent and extremely difficult to track down,
often showing up in crashes inside of @code{garbage-collect} or in
weirdly corrupted objects or even in incorrect values in a totally
different section of code.
@end enumerate
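
The chaining described in the rules above can be sketched as follows.  The @code{TOY_} macros, @code{struct toy_gcpro} and @code{toy_gcprolist} are simplified, hypothetical stand-ins for @code{GCPRO1}, @code{GCPRO2}, @code{UNGCPRO}, @code{struct gcpro} and @code{gcprolist}; the real definitions differ in detail.  The sketch also protects an array by setting @code{nvars}.

```c
#include <assert.h>
#include <stddef.h>

typedef long toy_lisp_object;          /* stand-in for Lisp_Object */

/* One protection record, living in the protecting function's frame. */
struct toy_gcpro
{
  struct toy_gcpro *next;              /* record in an enclosing frame */
  toy_lisp_object *var;                /* address of the protected lvalue */
  int nvars;                           /* length of a protected array */
};

static struct toy_gcpro *toy_gcprolist;

/* Loosely modelled on GCPRO1 / GCPRO2 / UNGCPRO (hypothetical names). */
#define TOY_GCPRO1(v1)                                                \
  (gcpro1.next = toy_gcprolist, gcpro1.var = &(v1), gcpro1.nvars = 1, \
   toy_gcprolist = &gcpro1)
#define TOY_GCPRO2(v1, v2)                                            \
  (TOY_GCPRO1 (v1), gcpro2.next = &gcpro1, gcpro2.var = &(v2),        \
   gcpro2.nvars = 1, toy_gcprolist = &gcpro2)
#define TOY_UNGCPRO (toy_gcprolist = gcpro1.next)

/* What the mark phase would see: count every protected value. */
static int
toy_count_protected (void)
{
  int n = 0;
  for (struct toy_gcpro *p = toy_gcprolist; p; p = p->next)
    n += p->nvars;
  return n;
}

/* A function following the rules: declare, protect, unprotect. */
static int
toy_function_that_conses (void)
{
  struct toy_gcpro gcpro1, gcpro2;
  toy_lisp_object args[3] = { 0, 0, 0 };
  toy_lisp_object tem = 0;             /* initialized before GCPROing */
  int seen;

  TOY_GCPRO2 (args[0], tem);
  gcpro1.nvars = 3;                    /* gcpro1 now covers all of args */
  tem = 42;                            /* e.g. a freshly consed object */
  seen = toy_count_protected ();       /* a GC here would see them all */
  TOY_UNGCPRO;
  return seen;
}
```

If a function returned without @code{TOY_UNGCPRO}, @code{toy_gcprolist} would be left pointing into a dead stack frame, which is exactly the failure mode described above.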

@cindex garbage collection, conservative
@cindex conservative garbage collection
  Given the extremely error-prone nature of the @code{GCPRO} scheme, and
the difficulty of tracking down the bugs it causes, it should be
considered a deficiency in the XEmacs code.  A solution to this problem
would involve implementing so-called @dfn{conservative} garbage
collection for the C stack.  That involves looking through all of stack
memory and treating anything that looks like a reference to an object
as a reference.  This will result in a few objects not getting collected
when they should, but it obviates the need for @code{GCPRO}ing, and
allows garbage collection to happen at any point at all, such as during
object allocation.
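
The core test of conservative scanning can be sketched as follows.  Scanning the real C stack needs platform-specific knowledge of its bounds, so this hypothetical sketch scans an explicit array of words instead, and only accepts words that point exactly at the start of an object in a single static heap array.  All names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical heap: a single array of equally sized objects. */
struct toy_obj { int marked; int payload; };

#define TOY_HEAP_SIZE 16
static struct toy_obj toy_heap[TOY_HEAP_SIZE];

/* Does WORD look like a pointer to the start of some heap object? */
static struct toy_obj *
toy_plausible_pointer (uintptr_t word)
{
  uintptr_t lo = (uintptr_t) &toy_heap[0];
  uintptr_t hi = (uintptr_t) &toy_heap[TOY_HEAP_SIZE];

  if (word < lo || word >= hi
      || (word - lo) % sizeof (struct toy_obj) != 0)
    return NULL;
  return (struct toy_obj *) word;
}

/* Scan a region of words (standing in for the C stack) and mark
   anything that might be a reference.  Returns how many objects
   were newly marked. */
static int
toy_scan_conservatively (const uintptr_t *words, size_t n)
{
  int marked = 0;
  for (size_t i = 0; i < n; i++)
    {
      struct toy_obj *o = toy_plausible_pointer (words[i]);
      if (o != NULL && !o->marked)
        {
          o->marked = 1;
          marked++;
        }
    }
  return marked;
}
```

An integer that happened to equal an object's address would keep that object alive; that is the ``few objects not getting collected'' cost mentioned above.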

@node Garbage Collection - Step by Step
@section Garbage Collection - Step by Step
@cindex garbage collection step by step

@menu
* Invocation::
* garbage_collect_1::
* mark_object::
* gc_sweep::
* sweep_lcrecords_1::
* compact_string_chars::
* sweep_strings::
* sweep_bit_vectors_1::
@end menu

@node Invocation
@subsection Invocation
@cindex garbage collection, invocation

The first thing that anyone should know about garbage collection is
when and how the garbage collector is invoked.  One might think that
this could happen every time new memory is allocated, e.g. whenever new
objects are created, but this is @emph{not} the case.  Instead, we have
the following situation:

The entry point of any process of garbage collection is an invocation
of the function @code{garbage_collect_1} in file @code{alloc.c}.  The
invocation can occur @emph{explicitly} by calling the function
@code{Fgarbage_collect} (which in addition provides information about
the freed memory), or it can occur @emph{implicitly} in four different
situations:
@enumerate
@item
In function @code{main_1} in file @code{emacs.c}.  This function is
called at each startup of XEmacs.  The garbage collection is invoked
after all initial creations are completed, but only if the special
internal error-checking constant @code{ERROR_CHECK_GC} is defined.
@item
In function @code{disksave_object_finalization} in file
@code{alloc.c}.  The only purpose of this function is to clear from
memory the objects that need not be stored with XEmacs when we dump out
an executable.  This is only done by @code{Fdump_emacs} or by
@code{Fdump_emacs_data} (both in @code{emacs.c}).  The actual clearing
is accomplished by making these objects unreachable and starting a
garbage collection.  The function is only used while building XEmacs.
@item
In function @code{Feval / eval} in file @code{eval.c}.  Each time the
well-known and often-used function @code{eval} is called to evaluate a
form, one of the first things that can happen is a potential call of
@code{garbage_collect_1}.  Three global variables are involved:
@code{consing_since_gc} (counts the cons cells created since the last
garbage collection), @code{gc_cons_threshold} (a specified threshold
after which a garbage collection occurs) and @code{always_gc}.  If
@code{always_gc} is set or if the threshold is exceeded, the garbage
collection will start.
@item
In function @code{Ffuncall / funcall} in file @code{eval.c}.  This
function evaluates calls of Lisp functions and works analogously to
@code{Feval}.
@end enumerate

The upshot is that garbage collection can occur basically anywhere
@code{Feval} or @code{Ffuncall} is used, either directly or through
another function.  Since calls to these two functions are hidden in
various other functions, many calls to @code{garbage_collect_1} are not
obviously foreseeable, and therefore unexpected.  Instances worth
remembering are various Lisp special forms, for example @code{or},
@code{and}, @code{if}, @code{cond}, @code{while}, @code{setq}, etc.,
miscellaneous @code{gui_item_...} functions, everything related to
@code{eval} (@code{Feval_buffer}, @code{call0}, ...) and calls inside
@code{Fsignal}.  The latter is used to handle signals, for example the
ones raised by the @code{QUIT} macro after the user presses @kbd{C-g}.
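
The implicit trigger in @code{eval} and @code{funcall} boils down to a counter check.  This sketch reuses the variable names described above, but the threshold value, the allocation units and the @code{toy_} functions are assumptions for illustration only.

```c
#include <assert.h>

/* Names mirror the variables described above; the logic is a sketch
   and the threshold value is arbitrary. */
static long consing_since_gc;
static long gc_cons_threshold = 1000;
static int always_gc;
static int toy_gc_count;

static void
toy_garbage_collect_1 (void)
{
  toy_gc_count++;
  consing_since_gc = 0;          /* reset after every collection */
}

/* Every allocator bumps the counter (the unit is an assumption). */
static void
toy_allocate (long amount)
{
  consing_since_gc += amount;
}

/* The check performed near the top of eval and funcall.  In this
   sketch a collection starts only here, never inside the allocator. */
static void
toy_maybe_gc (void)
{
  if (always_gc || consing_since_gc > gc_cons_threshold)
    toy_garbage_collect_1 ();
}

static void
toy_eval (long allocation_per_form)
{
  toy_maybe_gc ();
  toy_allocate (allocation_per_form);
}
```

Note that @code{toy_allocate} never collects; a collection can only start at the next @code{toy_eval}, mirroring the point above that garbage collection does not happen every time memory is allocated.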

@node garbage_collect_1
@subsection @code{garbage_collect_1}
@cindex @code{garbage_collect_1}

We can now describe exactly what happens after the invocation takes
place.
@enumerate
@item
There are several cases in which the garbage collector exits immediately:
when we are already garbage collecting (@code{gc_in_progress}), when
garbage collection is somehow forbidden
(@code{gc_currently_forbidden}), when we are currently displaying something
(@code{in_display}) or when we are preparing for the armageddon of the
whole system (@code{preparing_for_armageddon}).
@item
Next, the frame in which to put all the output occurring during garbage
collection is determined.  In order to be able to restore the old
display's state after displaying the message, some data about the
current cursor position has to be saved.  The variables
@code{pre_gc_cursor} and @code{cursor_changed} take care of that.
@item
The state of @code{gc_currently_forbidden} must be restored after
the garbage collection, no matter what happens during the process.  We
accomplish this by @code{record_unwind_protect}ing the suitable function
@code{restore_gc_inhibit} together with the current value of
@code{gc_currently_forbidden}.
@item
If XEmacs is running interactively, the next step is simply to show the
garbage collector's cursor/message.
@item
The following steps are the intrinsic steps of the garbage collector,
therefore @code{gc_in_progress} is set.
@item
For debugging purposes, it is possible to copy the current C stack
frame.  However, this seems to be a currently unused feature.
@item
Before actually starting to go over all live objects, references to
objects that are no longer used are pruned.  We only have to do this for
events (@code{clear_event_resource}) and for specifiers
(@code{cleanup_specifiers}).
@item
Now the mark phase begins and marks all accessible elements.  In order
to start from all slots that serve as roots of accessibility, the
function @code{mark_object} is called for each root individually to go
out from there to mark all reachable objects.  All roots that are
traversed are shown in their processed order:
@itemize @bullet
@item
all constant symbols and static variables that are registered via
@code{staticpro} in the array @code{staticvec}.
@xref{Adding Global Lisp Variables}.
@item
all Lisp objects that are created in C functions and that must be
protected from being freed.  They are registered in the global
list @code{gcprolist}.
@xref{GCPROing}.
@item
all local variables (i.e. their name fields @code{symbol} and old
values @code{old_values}) that are bound during the evaluation by the Lisp
engine.  They are stored in @code{specbinding} structs pushed on a stack
called @code{specpdl}.
@xref{Dynamic Binding; The specbinding Stack; Unwind-Protects}.
@item
all catch blocks that the Lisp engine encounters during the evaluation
cause the creation of @code{catchtag} structs inserted in the list
@code{catchlist}.  Their tag (@code{tag}) and value (@code{val}) fields
are freshly created objects and therefore have to be marked.
@xref{Catch and Throw}.
@item
every function application pushes a new @code{backtrace} struct
on the call stack of the Lisp engine (@code{backtrace_list}).  The only
parts that have to be marked are the fields for each function
(@code{function}) and all their arguments (@code{args}).
@xref{Evaluation}.
@item
all objects that are used by the redisplay engine that must not be freed
are marked by a special function called @code{mark_redisplay} (in
@code{redisplay.c}).
@item
all objects created for profiling purposes are allocated by C functions
instead of using the Lisp allocation mechanisms.  So that the right ones
survive the sweep phase, they also have to be marked manually.  That is
done by the function @code{mark_profiling_info}.
@end itemize
@item
Hash tables in XEmacs are among the special objects that make use of
a concept often called ``weak pointers''.  To make a long story short,
these pointers are not followed when determining which objects are live
during garbage collection.  Any object referenced only by weak pointers
is collected anyway, and the reference to it is cleared.  Hash tables
use weak pointers in different patterns, manifesting in different types
of hash tables, namely ``non-weak'', ``weak'', ``key-weak'' and
``value-weak'' (internally also ``key-car-weak'' and ``value-car-weak'')
hash tables, each clearing entries depending on different conditions.
More information can be found in the documentation of the function
@code{make-hash-table}.

Because there are complicated dependency rules about when and what to
mark while processing weak hash tables, the standard @code{marker}
method is only active if it is marking non-weak hash tables.  As soon as
a weak component is in the table, the hash table entries are ignored
while marking.  Instead, they are marked separately by the
function @code{finish_marking_weak_hash_tables}.  This function iterates
over each hash table entry @code{hentries} for each weak hash table in
@code{Vall_weak_hash_tables}.  Depending on the type of a table, the
appropriate action is performed.
If a table is acting as @code{HASH_TABLE_KEY_WEAK} and a key is already
marked, everything reachable from the @code{value} component is marked.
If it is acting as a @code{HASH_TABLE_VALUE_WEAK} and the value
component is already marked, marking starts only from the
@code{key} component.
If it is a @code{HASH_TABLE_KEY_CAR_WEAK} and the car
of the key entry is already marked, we mark both the @code{key} and
@code{value} components.
Finally, if the table is of the type @code{HASH_TABLE_VALUE_CAR_WEAK}
and the car of the value component is already marked, again both the
@code{key} and the @code{value} components get marked.

Similarly, there are lists with comparable properties called weak
lists.  There are several types of them, called
@code{simple}, @code{assoc}, @code{key-assoc} and
@code{value-assoc}.  You can find further details about them in the
description of the function @code{make-weak-list}.  The scheme of their
marking is similar: all weak lists are listed in @code{Vall_weak_lists},
so we iterate over them.  The marking is advanced until we hit an
already marked pair.  Then we know that during an earlier pass all
the rest has been marked completely.  Again, depending on the
type of the weak list, our jobs differ.  If it is a @code{WEAK_LIST_SIMPLE}
and the elem is marked, we mark the @code{cons} part.  If it is a
@code{WEAK_LIST_ASSOC} and not a pair or a pair with both marked car and
cdr, we mark the @code{cons} and the @code{elem}.  If it is a
@code{WEAK_LIST_KEY_ASSOC} and not a pair or a pair with a marked car of
the elem, we mark the @code{cons} and the @code{elem}.  Finally, if it is
a @code{WEAK_LIST_VALUE_ASSOC} and not a pair or a pair with a marked
cdr of the elem, we mark both the @code{cons} and the @code{elem}.

Because marking objects reachable from weak hash tables and weak lists
can mark further objects, which may in turn allow more entries of other
weak objects to be marked, both finishing functions are called
repeatedly until a pass marks no new objects.

@item
After completing the special marking for the weak hash tables and for
the weak lists, all entries that point to objects that are going to be
swept in the further process are useless, and therefore have to be
removed from the table or the list.

The function @code{prune_weak_hash_tables} does the job for weak hash
tables.  Hash tables that are themselves unmarked are removed from the
list @code{Vall_weak_hash_tables}.  The other ones are treated more
carefully by scanning over all entries and removing each entry whose
@code{key} or @code{value} component is unmarked.

The same idea applies to the weak lists.  It is accomplished by
@code{prune_weak_lists}: an unmarked list is pruned from
@code{Vall_weak_lists} immediately.  A marked list is treated more
carefully by going over it and removing just the unmarked pairs.

@item
The function @code{prune_specifiers} checks all listed specifiers held
in @code{Vall_specifiers} and removes the unmarked ones from the list.

@item
All syntax tables are stored in a list called
@code{Vall_syntax_tables}.  The function @code{prune_syntax_tables} walks
through it and unlinks the tables that are unmarked.

@item
Next comes the sweep phase proper, carried out by the function
@code{gc_sweep}.
@item
Then, all the variables related to garbage collection are reset:
@code{consing_since_gc}, the counter of the cells created since the last
garbage collection, is set back to 0, and @code{gc_in_progress} is
cleared.
@item
In case the session is interactive, the displayed cursor and message are
removed again.
@item
The state of @code{gc_currently_forbidden} is restored to its former
value by unwinding the stack.
@item
A small memory reserve, reachable through @code{breathing_space}, is
always held back.  If it has been used up, a new reserve is created
before exiting.
@end enumerate
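
The fixpoint iteration for a key-weak table, followed by pruning, can be sketched as follows.  Objects here have a single outgoing reference and the table is a plain array; the names and data layout are hypothetical, and only the key-weak rule is shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object with one outgoing reference. */
struct toy_obj { int marked; struct toy_obj *ref; };

struct toy_entry { struct toy_obj *key, *value; int live; };

/* Mark O and everything reachable from it; return how many objects
   were newly marked. */
static int
toy_mark (struct toy_obj *o)
{
  int newly = 0;
  for (; o != NULL && !o->marked; o = o->ref)
    {
      o->marked = 1;
      newly++;
    }
  return newly;
}

/* Key-weak rule: an entry's value is marked only once its key has
   been found reachable by other means. */
static int
toy_finish_marking_key_weak (struct toy_entry *t, size_t n)
{
  int newly = 0;
  for (size_t i = 0; i < n; i++)
    if (t[i].live && t[i].key->marked)
      newly += toy_mark (t[i].value);
  return newly;
}

/* Drop entries whose key or value stayed unmarked. */
static void
toy_prune_weak_table (struct toy_entry *t, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (t[i].live && !(t[i].key->marked && t[i].value->marked))
      t[i].live = 0;
}

static void
toy_gc_weak (struct toy_obj **roots, size_t nroots,
             struct toy_entry *t, size_t n)
{
  for (size_t i = 0; i < nroots; i++)
    toy_mark (roots[i]);
  /* Marking one value may make another entry's key reachable, so
     repeat until a pass marks nothing new. */
  while (toy_finish_marking_key_weak (t, n) > 0)
    ;
  toy_prune_weak_table (t, n);
}
```

In the test below, the entry for @code{k2} comes first in the table but only becomes live after @code{v1} (which references @code{k2}) is marked, so a second pass is needed.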

@node mark_object
@subsection @code{mark_object}
@cindex @code{mark_object}

The first thing that is checked while marking an object is whether the
object is a real Lisp object @code{Lisp_Type_Record} or just an integer
or a character.  Integers and characters are the only two types that are
stored directly, without another level of indirection, and therefore
they don't have to be marked and collected.
@xref{How Lisp Objects Are Represented in C}.

The second case is the one we have to handle: we are dealing with a
pointer to a Lisp object.  Even then, there are three cases in which
nothing has to be done while marking.  The object may be read-only,
which prevents it from being garbage collected, i.e. marked
(@code{C_READONLY_RECORD_HEADER}).  The object in question may already
be marked, and need not be marked a second time (checked by
@code{MARKED_RECORD_HEADER_P}).  Or it may be a special, unmarkable
object (@code{UNMARKABLE_RECORD_HEADER_P}); apparently, these are
objects that sit in some @code{CONST} space and therefore cannot be
marked (see @code{this_one_is_unmarkable} in @code{alloc.c}).

Now the actual marking can take place.  We use the macro
@code{MARK_RECORD_HEADER} once to mark the object itself (actually the
special flag in the lrecord header), and then call its special marker
``method'' @code{marker}, if available.  The marker method marks every
other object that is in reach from our current object.  Note that these
marker methods should not call @code{mark_object} recursively, but
instead should return the next object from where further marking has to
be performed.

In case another object was returned, as mentioned before, we reiterate
the whole @code{mark_object} process beginning with this next object.
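
The ``return the next object'' protocol can be sketched as a loop.  In this hypothetical version, the cons marker marks its car through a normal call but hands its cdr back to the loop, so marking a long list does not nest one call per element.  The read-only and unmarkable checks described above are omitted, and all names are made up.

```c
#include <assert.h>
#include <stddef.h>

struct toy_obj;

/* Hypothetical type descriptor: the marker marks what it must and
   returns one more object for the caller to continue with (or NULL). */
struct toy_impl
{
  struct toy_obj *(*marker) (struct toy_obj *);
};

struct toy_obj
{
  const struct toy_impl *impl;
  int marked;
  struct toy_obj *car, *cdr;       /* used by the cons type below */
};

static void toy_mark_object (struct toy_obj *obj);

/* Cons marker: mark the car through a call, but return the cdr
   instead of recursing on it. */
static struct toy_obj *
toy_mark_cons (struct toy_obj *obj)
{
  toy_mark_object (obj->car);
  return obj->cdr;
}

static const struct toy_impl toy_cons_impl = { toy_mark_cons };
static const struct toy_impl toy_atom_impl = { NULL };  /* no refs */

static void
toy_mark_object (struct toy_obj *obj)
{
  /* Stop at NULL or at an object that is already marked. */
  while (obj != NULL && !obj->marked)
    {
      obj->marked = 1;             /* MARK_RECORD_HEADER analogue */
      obj = obj->impl->marker ? obj->impl->marker (obj) : NULL;
    }
}
```

Marking a 100000-element list this way keeps the call depth bounded regardless of the list's length, and a circular list simply stops at the first cell that is already marked.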

@node gc_sweep
@subsection @code{gc_sweep}
@cindex @code{gc_sweep}

The job of this function is to free all unmarked records from memory.  As
we know, there are different types of objects implemented and managed, and
consequently different ways to free them from memory.
@xref{Introduction to Allocation}.

We start with all objects stored through @code{lcrecords}.  All
bulkier objects are allocated and handled using that scheme of
@code{lcrecords}.  Each object is @code{malloc}ed separately
instead of being placed in one of the contiguous frob blocks.  All types
that are currently stored
using @code{lcrecords}'s @code{alloc_lcrecord} and
@code{make_lcrecord_list} are the types: vectors, buffers,
char-table, char-table-entry, console, weak-list, database, device,
ldap, hash-table, command-builder, extent-auxiliary, extent-info, face,
coding-system, frame, image-instance, glyph, popup-data, gui-item,
keymap, charset, color_instance, font_instance, opaque, opaque-list,
process, range-table, specifier, symbol-value-buffer-local,
symbol-value-lisp-magic, symbol-value-varalias, toolbar-button,
tooltalk-message, tooltalk-pattern, window, and window-configuration.  We
take care of them first, in order to be able to handle and finalize
items stored in them more easily.  The function @code{sweep_lcrecords_1},
described below, does the whole job for us.
@xref{lrecords}, for a description of the internals.

Our next candidates are the other objects that behave quite differently
than everything else: the strings.  They consist of two parts, a
fixed-size portion (@code{struct Lisp_string}) holding the string's
length, its property list and a pointer to the second part, and the
actual string data, which is stored in string-chars blocks comparable to
frob blocks.  In these blocks, the data of unmarked strings is not only
freed; the holes are also compacted away, i.e. all surviving strings are
relocated together.  @xref{String}.  This compacting phase is performed
by the function @code{compact_string_chars}; the actual sweeping, by the
function @code{sweep_strings}, is described below.
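
A sketch of the compaction step, assuming a single chars block and a header table kept in allocation order (which is also the order of the data in the block).  The names and sizes are hypothetical.  Note how every surviving header's data pointer changes, which is why @code{XSTRING_DATA()} pointers must not be held across a possible garbage collection.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_CHARS_BLOCK_SIZE 256
#define TOY_MAX_STRINGS 16

/* Fixed-size part of a string (a frob-block object in the real scheme). */
struct toy_string
{
  int marked;
  size_t size;
  char *data;                      /* points into toy_chars */
};

/* Variable-size part: one "string chars block". */
static char toy_chars[TOY_CHARS_BLOCK_SIZE];
static size_t toy_chars_fill;

/* Headers in allocation order, i.e. in the order of their data. */
static struct toy_string toy_strings[TOY_MAX_STRINGS];
static int toy_nstrings;

static struct toy_string *
toy_make_string (const char *s)
{
  size_t len = strlen (s) + 1;     /* keep the terminating NUL */
  struct toy_string *str;

  assert (toy_nstrings < TOY_MAX_STRINGS
          && toy_chars_fill + len <= TOY_CHARS_BLOCK_SIZE);
  str = &toy_strings[toy_nstrings++];
  str->marked = 0;
  str->size = len;
  str->data = memcpy (toy_chars + toy_chars_fill, s, len);
  toy_chars_fill += len;
  return str;
}

/* Slide the data of every marked string down over the holes left by
   unmarked ones, updating each survivor's data pointer.  Any char *
   obtained before this call may now be stale. */
static void
toy_compact_string_chars (void)
{
  size_t to = 0;
  for (int i = 0; i < toy_nstrings; i++)
    {
      struct toy_string *str = &toy_strings[i];
      if (!str->marked)
        {
          str->data = NULL;        /* header is swept separately */
          continue;
        }
      memmove (toy_chars + to, str->data, str->size);
      str->data = toy_chars + to;
      to += str->size;
    }
  toy_chars_fill = to;
}
```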

After that, the other types are swept step by step using the functions
@code{sweep_conses}, @code{sweep_bit_vectors_1},
@code{sweep_compiled_functions}, @code{sweep_floats},
@code{sweep_symbols}, @code{sweep_extents}, @code{sweep_markers} and
@code{sweep_events}.  They handle the fixed-size types cons, float,
compiled-function, symbol, marker, extent, and event, which are stored
in so-called ``frob blocks'', so we can do basically the same for every
type, using the same macros, which are defined especially to handle
everything with respect to fixed-size blocks.  The only fixed-size type
that is not handled here is the fixed-size portion of strings, because
we took special care of them earlier.
5082
The only big exception is bit vectors, which are stored differently and
are therefore treated differently by the function
@code{sweep_bit_vectors_1}, described later.

First, we need some brief background on how these fixed-size types are
managed in general, in order to understand how the sweeping is done.
They all have a fixed size, and are therefore stored in big blocks of
memory, each allocated at once and able to hold a certain number of
objects of one type.  The macro @code{DECLARE_FIXED_TYPE_ALLOC} creates
the suitable structures for every type.  More precisely, we have the
block struct (holding a pointer to the previous block, @code{prev}, and
the objects, in @code{block[]}), a pointer to the current block
(@code{current_..._block}) and its index just past the last allocated
object (@code{current_..._block_index}), and a pointer to the free list
that will be created.  The macro @code{ALLOCATE_FIXED_TYPE}, plus some
related macros, is used to obtain a new object: either from the free
list (@code{ALLOCATE_FIXED_TYPE_1}) if there is an unused object of that
type stored there, or from the end of the current block, allocating a
completely new block if necessary
(@code{ALLOCATE_FIXED_TYPE_FROM_BLOCK}).

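The allocation path just described can be sketched in plain C.  This is
a minimal sketch under assumptions, not the macros in @file{alloc.c}:
the object type @code{struct toy_cons}, the capacity of 64 objects per
block, and every @code{toy_} name are hypothetical.  An object comes
from the free list when that list is non-empty, and otherwise from the
unused tail of the current block, with a fresh block chained in when the
tail is exhausted.  Freeing pushes the object back onto the free list
through its first field.

@verbatim
#include <stdlib.h>

/* Hypothetical fixed-size object type standing in for a real one. */
struct toy_cons
{
  void *car;
  void *cdr;
};

/* Assumed block capacity; the real blocks are sized in bytes. */
#define TOY_CONS_BLOCK_SIZE 64

/* One frob block: a link to the previous (older) block plus objects. */
struct toy_cons_block
{
  struct toy_cons_block *prev;
  struct toy_cons block[TOY_CONS_BLOCK_SIZE];
};

static struct toy_cons_block *current_toy_cons_block = NULL;
/* Index just past the last object handed out from the current block;
   starting "full" forces a block to be created on first use. */
static int current_toy_cons_block_index = TOY_CONS_BLOCK_SIZE;
/* Free list, chained through the first field (car) of each free object. */
static struct toy_cons *toy_cons_free_list = NULL;

static struct toy_cons *
allocate_toy_cons (void)
{
  struct toy_cons *result;

  if (toy_cons_free_list != NULL)
    {
      /* Reuse a freed object and pop it off the free list. */
      result = toy_cons_free_list;
      toy_cons_free_list = (struct toy_cons *) result->car;
      return result;
    }
  if (current_toy_cons_block_index == TOY_CONS_BLOCK_SIZE)
    {
      /* The current block is exhausted (or absent): chain a new one. */
      struct toy_cons_block *b = malloc (sizeof *b);
      if (b == NULL)
        return NULL;
      b->prev = current_toy_cons_block;
      current_toy_cons_block = b;
      current_toy_cons_block_index = 0;
    }
  return &current_toy_cons_block->block[current_toy_cons_block_index++];
}

static void
free_toy_cons (struct toy_cons *c)
{
  /* Push the object onto the free list, reusing its first field. */
  c->car = toy_cons_free_list;
  toy_cons_free_list = c;
}
@end verbatim
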
The rest works as follows.  Each of these types defines a macro
@code{UNMARK_...} that is used to unmark an object, and a macro
@code{ADDITIONAL_FREE_...} that specifies any additional work that has
to be done when an object goes from in use to not in use (among these
types, so far only markers use it, in order to unchain themselves from
their buffer).  Then each sweep function invokes the macro
@code{SWEEP_FIXED_TYPE_BLOCK}, instantiated with its type name and its
struct name.

In particular, this macro does the following.  We go over all blocks,
starting with the current one and moving towards the oldest.  For each
block, we look at every object in it.  If the object is already free
(checked with @code{FREE_STRUCT_P} using the first pointer of the
object), or if it is read only (@code{C_READONLY_RECORD_HEADER_P}),
nothing needs to be done.  If it is unmarked (checked with
@code{MARKED_RECORD_HEADER_P}), it is put on the free list and marked as
free (using the macro @code{FREE_FIXED_TYPE}); otherwise it stays in the
block, but is unmarked (by @code{UNMARK_...}).  While going through a
block, we note whether the whole block is empty.  If so, the whole block
is freed (using @code{xfree}) and the free list is restored to the state
it had before this block was handled.

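A minimal sketch of such a sweep follows.  It assumes a hypothetical
object type whose @code{freed} and @code{marked} fields stand in for
@code{FREE_STRUCT_P} and @code{MARKED_RECORD_HEADER_P}, omits read-only
objects, and assumes the free list is rebuilt from scratch during the
sweep; under that assumption, restoring the saved list head is enough to
forget all entries that pointed into a block about to be freed.  All
@code{toy_} names are illustrative.

@verbatim
#include <stdlib.h>

/* Assumed tiny block size, to keep the example small. */
#define TOY_BLOCK_SIZE 4

/* Hypothetical fixed-size object.  FREED plays the role of
   FREE_STRUCT_P and MARKED that of MARKED_RECORD_HEADER_P. */
struct toy_obj
{
  int marked;
  int freed;
  struct toy_obj *next_free;   /* free-list link, meaningful only if FREED */
};

struct toy_block
{
  struct toy_block *prev;      /* next older block */
  struct toy_obj block[TOY_BLOCK_SIZE];
};

static struct toy_block *current_block = NULL;
static int current_block_index = TOY_BLOCK_SIZE; /* slots used, newest block */
static struct toy_obj *free_list = NULL;

static void
put_on_free_list (struct toy_obj *o)
{
  o->freed = 1;
  o->next_free = free_list;
  free_list = o;
}

/* Sweep every block, newest first, in the spirit of
   SWEEP_FIXED_TYPE_BLOCK.  Returns the number of live objects. */
static int
sweep_toy_blocks (void)
{
  struct toy_block **link = &current_block; /* pointer to the swept block */
  int limit = current_block_index;          /* newest block may be partial */
  int num_used = 0;

  free_list = NULL;                         /* rebuilt from scratch below */
  while (*link != NULL)
    {
      struct toy_block *b = *link;
      struct toy_obj *old_free_list = free_list;
      int empty = 1;
      int i;

      for (i = 0; i < limit; i++)
        {
          struct toy_obj *o = &b->block[i];

          if (o->freed || !o->marked)
            put_on_free_list (o);           /* already free, or garbage */
          else
            {
              o->marked = 0;                /* survivor: unmark it */
              empty = 0;
              num_used++;
            }
        }

      if (empty)
        {
          /* Nothing alive here: unlink and free the whole block, and
             drop the free-list entries that pointed into it. */
          if (b == current_block)
            current_block_index = TOY_BLOCK_SIZE; /* older blocks are full */
          *link = b->prev;
          free (b);
          free_list = old_free_list;
        }
      else
        link = &b->prev;
      limit = TOY_BLOCK_SIZE;               /* all older blocks are full */
    }
  return num_used;
}
@end verbatim
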
@node sweep_lcrecords_1
@subsection @code{sweep_lcrecords_1}
@cindex @code{sweep_lcrecords_1}

After zeroing all the lcrecord statistics, we go over all lcrecords
twice.  They are all chained together in a list whose head is called
@code{all_lcrecords}.

The first loop calls the @code{finalizer} method of each object, but
only if the object is not read only
(@code{C_READONLY_RECORD_HEADER_P}), is not marked
(@code{MARKED_RECORD_HEADER_P}), is not already in a free list (a list
of freed objects, field @code{free}), and actually has a finalizer
method.

The second loop then frees the appropriate objects by iterating through
the whole list again.  An object that is read only or marked has to
persist (a marked object is unmarked, ready for the next collection);
otherwise it is unlinked from the list and freed manually by calling
@code{xfree}.  During this loop, the lcrecord statistics are kept up to
date by calling @code{tick_lcrecord_stats} with the right arguments.

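The two passes can be sketched as follows.  The fields of
@code{struct toy_lcrecord} are hypothetical and only mirror the roles
named above (mark bit, read-only flag, free-list flag, finalizer); they
are not the real @code{struct lcrecord_header}, and the statistics
bookkeeping is left out.  Walking the list through a pointer to the
previous link lets the second pass unlink a record without a separate
predecessor variable.

@verbatim
#include <stdlib.h>

/* Hypothetical lcrecord: the fields only mirror the roles named in
   the text, not XEmacs's real struct lcrecord_header. */
struct toy_lcrecord
{
  struct toy_lcrecord *next;        /* chain of all lcrecords */
  int marked;
  int read_only;
  int is_free;                      /* sitting in an lcrecord free list */
  void (*finalizer) (struct toy_lcrecord *);
};

static struct toy_lcrecord *all_toy_lcrecords = NULL;

/* Two passes in the spirit of sweep_lcrecords_1.
   Returns the number of records freed. */
static int
sweep_toy_lcrecords (void)
{
  struct toy_lcrecord *h;
  struct toy_lcrecord **prev;
  int num_freed = 0;

  /* Pass 1: run the finalizer of every record about to be freed. */
  for (h = all_toy_lcrecords; h != NULL; h = h->next)
    if (!h->read_only && !h->marked && !h->is_free && h->finalizer != NULL)
      h->finalizer (h);

  /* Pass 2: keep read-only and marked records (unmarking the latter);
     unlink and free everything else. */
  prev = &all_toy_lcrecords;
  while ((h = *prev) != NULL)
    {
      if (h->read_only)
        prev = &h->next;
      else if (h->marked)
        {
          h->marked = 0;
          prev = &h->next;
        }
      else
        {
          *prev = h->next;
          free (h);
          num_freed++;
        }
    }
  return num_freed;
}

/* Example finalizer for the usage below: it only counts its calls. */
static int toy_finalized_count = 0;

static void
count_finalizer (struct toy_lcrecord *h)
{
  (void) h;
  toy_finalized_count++;
}
@end verbatim
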
@node compact_string_chars
@subsection @code{compact_string_chars}
@cindex @code{compact_string_chars}

The purpose of this function is to compact the data parts of all the
strings held in so-called @code{string_chars_block}s, i.e. the strings
that do not exceed a certain maximal length.

The procedure is as follows.  We keep two positions in the
@code{string_chars_block}s using two pointer/integer pairs, namely
@code{from_sb}/@code{from_pos} and @code{to_sb}/@code{to_pos}.  They
stand for the current positions from which and to which the string
being handled is copied.

While going over all the chained @code{string_chars_block}s and the
strings they hold, starting at @code{first_string_chars_block}, both
positions are advanced and, depending on the status of the string
pointed at, the string is copied from @code{from_sb} to @code{to_sb}.

More precisely, we can distinguish between the following actions.
@itemize @bullet
@item
The string at @code{from_sb}'s position could be marked as free, which
is indicated by an invalid value in the back-pointer that should point
to the fixed-size string object; this is checked with
@code{FREE_STRUCT_P}.  In this case, the
@code{from_sb}/@code{from_pos} pair is advanced to the next string, and
nothing has to be copied.
@item
Likewise, if the string object itself is unmarked, nothing has to be
copied, and the @code{from_sb}/@code{from_pos} pair is advanced as
described above.
@item
In all other cases, we have a marked string at hand, and its data must
be moved from the from-position to the to-position.  If there is not
enough space left in the current @code{to_sb} block, we advance this
pointer to the beginning of the next block before copying.  If the
from and to positions differ, we perform the actual copying using the
library function @code{memmove}, and update the data pointer in the
corresponding @code{struct Lisp_String}.
@end itemize

After compacting, the pointer to the current @code{string_chars_block},
held in @code{current_string_chars_block}, is reset to the last block to
which we moved a string, i.e. @code{to_sb}, and all remaining blocks
(which we know carry only garbage) are explicitly @code{xfree}d.

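The compaction loop can be sketched as below, under simplifying
assumptions: all @code{toy_} names are hypothetical, each chunk stores
its own length next to the back pointer, a null back pointer stands in
for a free chunk, and blocks are only 64 bytes.  As in the description
above, live data only ever moves towards the front of the chain, which
is why @code{memmove} (safe for overlapping regions) is used.

@verbatim
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumed block size; the text gives 8K for real string-chars blocks. */
#define TOY_CHARS_BLOCK_SIZE 64

struct toy_chunk;

/* Fixed-size part of a hypothetical string. */
struct toy_string
{
  int marked;
  size_t len;
  struct toy_chunk *chunk;       /* where the data currently lives */
};

/* A chunk of string data: back pointer, length, then the bytes.
   A NULL OWNER stands in for "free" (FREE_STRUCT_P in the text). */
struct toy_chunk
{
  struct toy_string *owner;
  size_t len;
  char data[];                   /* C99 flexible array member */
};

struct toy_chars_block
{
  struct toy_chars_block *next;
  size_t pos;                    /* bytes in use */
  unsigned char *mem;            /* TOY_CHARS_BLOCK_SIZE bytes from malloc */
};

/* Chunk sizes are rounded to a multiple of the header size, which keeps
   every chunk suitably aligned inside a malloc()ed block. */
static size_t
chunk_size (size_t len)
{
  size_t h = sizeof (struct toy_chunk);
  return (h + len + h - 1) / h * h;
}

#define CHUNK_AT(sb, pos) ((struct toy_chunk *) ((sb)->mem + (pos)))

/* Slide every live chunk towards the front of the chain, then free the
   blocks left holding only garbage.  FIRST must not be NULL.  Returns
   the new current block. */
static struct toy_chars_block *
compact_toy_string_chars (struct toy_chars_block *first)
{
  struct toy_chars_block *from_sb, *to_sb = first, *sb, *next;
  size_t from_pos, to_pos = 0;

  for (from_sb = first; from_sb != NULL; from_sb = from_sb->next)
    for (from_pos = 0; from_pos < from_sb->pos; )
      {
        struct toy_chunk *from = CHUNK_AT (from_sb, from_pos);
        struct toy_string *s = from->owner;
        size_t size = chunk_size (from->len);

        from_pos += size;
        if (s == NULL || !s->marked)
          continue;                          /* free or dead: skip it */
        if (to_pos + size > TOY_CHARS_BLOCK_SIZE)
          {
            /* No room left: close this block and move to the next one.
               TO_SB is strictly behind FROM_SB here, so its POS is no
               longer being read. */
            to_sb->pos = to_pos;
            to_sb = to_sb->next;
            to_pos = 0;
          }
        if (CHUNK_AT (to_sb, to_pos) != from)
          {
            memmove (CHUNK_AT (to_sb, to_pos), from, size); /* may overlap */
            s->chunk = CHUNK_AT (to_sb, to_pos);            /* relocate */
          }
        to_pos += size;
      }

  /* Everything after TO_SB now carries only garbage. */
  to_sb->pos = to_pos;
  for (sb = to_sb->next; sb != NULL; sb = next)
    {
      next = sb->next;
      free (sb->mem);
      free (sb);
    }
  to_sb->next = NULL;
  return to_sb;
}
@end verbatim
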
@node sweep_strings
@subsection @code{sweep_strings}
@cindex @code{sweep_strings}

The sweeping of the fixed-size string objects is essentially the same
as for all other fixed-size types.  As before, the freeing into the
suitable free list is done using the macro @code{SWEEP_FIXED_TYPE_BLOCK}
after defining the appropriate macros @code{UNMARK_string} and
@code{ADDITIONAL_FREE_string}.  These two definitions are a little
special compared to the ones used for the other fixed-size types.

@code{UNMARK_string} is defined the same way, except for some
additional code that updates the bookkeeping information.

@code{ADDITIONAL_FREE_string} has to do some extra work: if the string's
data was not allocated in a @code{string_chars_block} because it
exceeded the maximal length, and was therefore @code{malloc}ed
separately, it must also be @code{xfree}d explicitly.

@node sweep_bit_vectors_1
@subsection @code{sweep_bit_vectors_1}
@cindex @code{sweep_bit_vectors_1}

Bit vectors are also one of the rare types that are @code{malloc}ed
individually.  Consequently, while sweeping, all bit vectors that are no
longer needed must be freed by hand.  This is done the expected way:
since they are all registered in a list called @code{all_bit_vectors},
all elements of that list are traversed; every unmarked bit vector is
unlinked and freed by calling @code{xfree}, and every surviving one is
unmarked.  In addition, the bookkeeping information used for the
garbage collector's output is updated.

@node Integers and Characters
@section Integers and Characters

  Integer and character Lisp objects are created from integers using the
macros @code{XSETINT()} and @code{XSETCHAR()} or the equivalent
functions @code{make_int()} and @code{make_char()}. (These are actually
macros on most systems.)  These functions basically just do some moving
of bits around, since the integral value of the object is stored
directly in the @code{Lisp_Object}.

  @code{XSETINT()} and the like will truncate values given to them that
are too big; i.e. you won't get the value you expected but the tag bits
will at least be correct.

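The effect of tagging, including the truncation just mentioned, can be
illustrated with a small sketch.  The word layout here (32-bit words
with a 3-bit tag in the top bits) and all @code{toy_} names are
assumptions for illustration only; the real layout depends on the
configuration and is not described in this section.

@verbatim
#include <stdint.h>

/* Assumed layout, not XEmacs's: a 32-bit word whose top TOY_GCTYPEBITS
   bits hold the type tag and whose low TOY_VALBITS bits hold the value. */
#define TOY_GCTYPEBITS 3
#define TOY_VALBITS (32 - TOY_GCTYPEBITS)
#define TOY_VALMASK ((UINT32_C (1) << TOY_VALBITS) - 1)
#define TOY_TYPE_INT 1u
#define TOY_TYPE_CHAR 2u

typedef uint32_t toy_lisp_word;

static toy_lisp_word
toy_make_tagged (unsigned type, uint32_t bits)
{
  /* Bits that do not fit are dropped, so the tag always stays intact. */
  return ((toy_lisp_word) type << TOY_VALBITS) | (bits & TOY_VALMASK);
}

static toy_lisp_word
toy_make_int (int32_t n)
{
  return toy_make_tagged (TOY_TYPE_INT, (uint32_t) n);
}

static toy_lisp_word
toy_make_char (uint32_t c)
{
  return toy_make_tagged (TOY_TYPE_CHAR, c);
}

static unsigned
toy_type_of (toy_lisp_word w)
{
  return (unsigned) (w >> TOY_VALBITS);
}

/* Recover a signed integer by sign-extending the value field. */
static int32_t
toy_xint (toy_lisp_word w)
{
  uint32_t v = w & TOY_VALMASK;

  if (v & (UINT32_C (1) << (TOY_VALBITS - 1)))
    return (int32_t) v - (int32_t) (UINT32_C (1) << TOY_VALBITS);
  return (int32_t) v;
}
@end verbatim

With this layout, a value such as 2^29 + 5 loses its high bit: the
result still carries the integer tag, but reads back as 5.
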
@node Allocation from Frob Blocks
@section Allocation from Frob Blocks

The uninitialized memory required by a @code{Lisp_Object} of a
particular type is allocated using @code{ALLOCATE_FIXED_TYPE()}.  This
only occurs inside of the
lowest-level object-creating functions in @file{alloc.c}:
@code{Fcons()}, @code{make_float()}, @code{Fmake_byte_code()},
@code{Fmake_symbol()}, @code{allocate_extent()},
@code{allocate_event()}, @code{Fmake_marker()}, and
@code{make_uninit_string()}.  The idea is that, for each type, there are
a number of frob blocks (each 2K in size); each frob block is divided up
into object-sized chunks.  Each frob block will have some of these
chunks that are currently assigned to objects, and perhaps some that are
free. (If a frob block has nothing but free chunks, it is freed at the
end of the garbage collection cycle.)  The free chunks are stored in a
free list, which is chained by storing a pointer in the first four bytes
of the chunk. (Except for the free chunks at the end of the last frob
block, which are handled using an index which points past the end of the
last-allocated chunk in the last frob block.)
@code{ALLOCATE_FIXED_TYPE()} first tries to retrieve a chunk from the
free list; if that fails, it calls
@code{ALLOCATE_FIXED_TYPE_FROM_BLOCK()}, which looks at the end of the
last frob block for space, and creates a new frob block if there is
none. (There are actually two versions of these macros, one of which is
more defensive but less efficient and is used for error-checking.)

@node lrecords
@section lrecords

  [see @file{lrecord.h}]

  All lrecords have at the beginning of their structure a @code{struct
lrecord_header}.  This just contains a pointer to a @code{struct
lrecord_implementation}, which is a structure containing method pointers
and such.  There is one of these for each type, and it is a global,
constant, statically-declared structure that is declared in the
@code{DEFINE_LRECORD_IMPLEMENTATION()} macro.  (This macro actually
declares an array of two @code{struct lrecord_implementation}
structures.  The first one contains all the standard method pointers,
and is used in all normal circumstances.  During garbage collection,
however, the lrecord is @dfn{marked} by bumping its implementation
pointer by one, so that it points to the second structure in the array.
This structure contains a special indication in it that it's a
@dfn{marked-object} structure: the finalize method is the special
function @code{this_marks_a_marked_record()}, and all other methods are
null pointers.  At the end of garbage collection, all lrecords will
either be reclaimed or unmarked by decrementing their implementation
pointers, so no pointer to this second structure survives past garbage
collection.)

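The marking trick can be illustrated with a cut-down sketch.  The
structures and the @code{toy_} names are hypothetical stand-ins for
@code{struct lrecord_header}, @code{struct lrecord_implementation} and
@code{this_marks_a_marked_record()}; only the two-element array and the
pointer bump follow the description above.

@verbatim
#include <stddef.h>

/* Hypothetical, cut-down implementation structure. */
struct toy_impl
{
  const char *name;
  void (*finalizer) (void *);
};

/* Special finalizer whose only job is to identify the "marked" entry. */
static void
this_marks_a_marked_toy_record (void *obj)
{
  (void) obj;
}

/* Two entries per type: [0] for normal use, [1] meaning "marked",
   whose only non-null method is the special finalizer. */
static const struct toy_impl toy_float_impl[2] =
{
  { "float", NULL },
  { "float", this_marks_a_marked_toy_record }
};

struct toy_header
{
  const struct toy_impl *implementation;
};

static int
toy_marked_p (const struct toy_header *h)
{
  return h->implementation->finalizer == this_marks_a_marked_toy_record;
}

/* Marking and unmarking just move the pointer within the array. */
static void
toy_mark (struct toy_header *h)
{
  h->implementation++;
}

static void
toy_unmark (struct toy_header *h)
{
  h->implementation--;
}
@end verbatim
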
  Simple lrecords (of type (c) above) just have a @code{struct
lrecord_header} at their beginning.  lcrecords, however, actually have a
@code{struct lcrecord_header}.  This, in turn, has a @code{struct
lrecord_header} at its beginning, so sanity is preserved; but it also
has a pointer used to chain all lcrecords together, and a special ID
field used to distinguish one lcrecord from another. (This field is used
only for debugging and could be removed, but the space gain is not
significant.)

  Simple lrecords are created using @code{ALLOCATE_FIXED_TYPE()}, just
like for other frob blocks.  The only change is that the implementation
pointer must be initialized correctly. (The implementation structure for
an lrecord, or rather the pointer to it, is named @code{lrecord_float},
@code{lrecord_extent}, @code{lrecord_buffer}, etc.)

  lcrecords are created using @code{alloc_lcrecord()}.  This takes a
size to allocate and an implementation pointer. (The size needs to be
passed because some lcrecords, such as window configurations, are of
variable size.) This basically just @code{malloc()}s the storage,
initializes the @code{struct lcrecord_header}, and chains the lcrecord
onto the head of the list of all lcrecords, which is stored in the
variable @code{all_lcrecords}.  The calls to @code{alloc_lcrecord()}
generally occur in the lowest-level allocation function for each lrecord
type.

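A minimal sketch of what @code{alloc_lcrecord()} is described as doing
follows, assuming a hypothetical header layout (all @code{toy_} names
are illustrative): allocate the requested size, fill in the header, and
push the new record onto the head of the global list.

@verbatim
#include <stdlib.h>

/* Hypothetical, cut-down implementation structure. */
struct toy_impl
{
  const char *name;
};

/* Hypothetical lcrecord header: the lrecord part comes first, then the
   chain link and the debugging ID described above. */
struct toy_lcrecord_header
{
  const struct toy_impl *implementation;   /* the lrecord_header part */
  struct toy_lcrecord_header *next;        /* chain of all lcrecords */
  unsigned int uid;                        /* debugging ID */
};

static struct toy_lcrecord_header *all_toy_lcrecords = NULL;
static unsigned int toy_lcrecord_uid_counter = 0;

/* SIZE is passed explicitly because some lcrecord types have a
   variable size.  Returns NULL on failure, where XEmacs's allocator
   would exit non-locally instead. */
static void *
toy_alloc_lcrecord (size_t size, const struct toy_impl *impl)
{
  struct toy_lcrecord_header *h;

  if (size < sizeof *h)
    return NULL;                 /* every lcrecord begins with the header */
  h = malloc (size);
  if (h == NULL)
    return NULL;
  h->implementation = impl;
  h->uid = toy_lcrecord_uid_counter++;
  h->next = all_toy_lcrecords;   /* push onto the head of the list */
  all_toy_lcrecords = h;
  return h;
}
@end verbatim
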
Whenever you create a new lrecord type, you need to use either
@code{DEFINE_LRECORD_IMPLEMENTATION()} or
@code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()}.  This needs to be
specified in a C file, at the top level.  What this actually does is
define and initialize the implementation structure for the lrecord. (It
may also declare a function @code{error_check_foo()} that implements
the @code{XFOO()} macro when error-checking is enabled.)  The arguments
to the macros are the actual type name (this is used to construct the C
variable name of the lrecord implementation structure and related
structures using the @samp{##} macro concatenation operator), a string
that names the type on the Lisp level (this may not be the same as the C
type name; typically, the C type name has underscores, while the Lisp
string has dashes), various method pointers, and the name of the C
structure that contains the object.  The methods are used to encapsulate
type-specific information about the object, such as how to print it or
mark it for garbage collection, so that it's easy to add new object
types without having to add a specific case for each new type in a bunch
of different places.

  The difference between @code{DEFINE_LRECORD_IMPLEMENTATION()} and
@code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()} is that the former is
used for fixed-size object types and the latter is for variable-size
object types.  Most object types are fixed-size; some complex
types, however (e.g. window configurations), are variable-size.
Variable-size object types have an extra method, which is called
to determine the actual size of a particular object of that type.
(Currently this is only used for keeping allocation statistics.)

  For the purpose of keeping allocation statistics, the allocation
engine keeps a list of all the different types that exist.  Note that,
since @code{DEFINE_LRECORD_IMPLEMENTATION()} is a macro that is
specified at top-level, there is no way for it to add to the list of all
existing types.  What happens instead is that each implementation
structure contains in it a dynamically assigned number that is
particular to that type. (Or rather, it contains a pointer to another
structure that contains this number.  This indirection is used so that
the implementation structure can be declared const.) In the sweep stage
of garbage collection, each lrecord is examined to see if its
implementation structure has its dynamically-assigned number set.  If
not, it must be a new type, and it is added to the list of known types
and a new number assigned.  The number is used to index into an array
holding the number of objects of each type and the total memory
allocated for objects of that type.  The statistics in this array are
also computed during the sweep stage.  These statistics are returned by
the call to @code{garbage-collect} and are printed out at the end of the
loadup phase.

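The lazy numbering of types can be sketched as follows.  The table size
and all @code{toy_} names are assumptions; the point is that the number
lives in a separate, writable structure reached through a pointer, so
the implementation structure itself can stay @code{const}.

@verbatim
#include <stddef.h>

#define TOY_MAX_TYPES 16   /* assumed table size */

/* The dynamically assigned number lives outside the implementation
   structure, so that structure itself can stay const. */
struct toy_type_index
{
  int index;               /* -1 until the type is first seen */
};

struct toy_impl
{
  const char *name;
  struct toy_type_index *type_index;
};

static const struct toy_impl *toy_known_types[TOY_MAX_TYPES];
static int toy_num_types = 0;
static long toy_objects_in_use[TOY_MAX_TYPES];
static long toy_bytes_in_use[TOY_MAX_TYPES];

/* Called during the sweep for each object of type IMPL.  Assigns a
   number the first time a type is seen, then updates its counters.
   Returns the type's number, or -1 if the table is full. */
static int
toy_tick_stats (const struct toy_impl *impl, size_t size)
{
  int i = impl->type_index->index;

  if (i < 0)
    {
      if (toy_num_types == TOY_MAX_TYPES)
        return -1;
      i = toy_num_types++;
      impl->type_index->index = i;   /* allowed: the number is not const */
      toy_known_types[i] = impl;
    }
  toy_objects_in_use[i]++;
  toy_bytes_in_use[i] += (long) size;
  return i;
}
@end verbatim
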
  Note that for every type defined with a @code{DEFINE_LRECORD_*()}
macro, there needs to be a @code{DECLARE_LRECORD_IMPLEMENTATION()}
somewhere in a @file{.h} file, and this @file{.h} file needs to be
included by @file{inline.c}.

  Furthermore, there should generally be a set of @code{XFOOBAR()},
@code{FOOBARP()}, etc. macros in a @file{.h} (or occasionally @file{.c})
file.  To create one of these, copy an existing model and modify as
necessary.

  The various methods in the lrecord implementation structure are:

@enumerate
@item
@cindex mark method
A @dfn{mark} method.  This is called during the marking stage and passed
a function pointer (usually the @code{mark_object()} function), which is
used to mark an object.  All Lisp objects that are contained within the
object need to be marked by applying this function to them.  The mark
method should also return a Lisp object, which should be either nil or
an object to mark. (This can be used in lieu of calling
@code{mark_object()} on the object, to reduce the recursion depth, and
consequently should be the most heavily nested sub-object, such as a
long list.)

@strong{Please note:} When the mark method is called, garbage collection
is in progress, and special precautions need to be taken when accessing
objects; see section (B) above.

If your mark method does not need to do anything, it can be
@code{NULL}.

@item
A @dfn{print} method.  This is called to create a printed representation
of the object, whenever @code{princ}, @code{prin1}, or the like is
called.  It is passed the object, a stream to which the output is to be
directed, and an @code{escapeflag} which indicates whether the object's
printed representation should be @dfn{escaped} so that it is
readable. (This corresponds to the difference between @code{princ} and
@code{prin1}.) Basically, @dfn{escaped} means that strings will have
quotes around them and confusing characters in the strings such as
quotes, backslashes, and newlines will be backslashed; and that special
care will be taken to make symbols print in a readable fashion
(e.g. symbols that look like numbers will be backslashed).  Other
readable objects should perhaps pass @code{escapeflag} on when
sub-objects are printed, so that readability is preserved when necessary
(or if not, always pass in a 1 for @code{escapeflag}).  Non-readable
objects should in general ignore @code{escapeflag}, except that some use
it as an indication that more verbose output should be given.

Sub-objects are printed using @code{print_internal()}, which takes
exactly the same arguments as are passed to the print method.

Literal C strings should be printed using @code{write_c_string()},
or @code{write_string_1()} for non-null-terminated strings.

Print methods for objects that do not have a readable representation
should check the @code{print_readably} flag and signal an error if it is
set.

If you specify NULL for the print method, the
@code{default_object_printer()} will be used.

@item
A @dfn{finalize} method.  This is called at the beginning of the sweep
stage on lcrecords that are about to be freed, and should be used to
perform any extra object cleanup.  This typically involves freeing any
extra @code{malloc()}ed memory associated with the object, releasing any
operating-system and window-system resources associated with the object
(e.g. pixmaps, fonts), etc.

The finalize method can be NULL if nothing needs to be done.

WARNING #1: The finalize method is also called at the end of the dump
phase, this time with the @code{for_disksave} parameter set to non-zero.
The object is @emph{not} about to disappear, so you have to make sure to
@emph{not} free any extra @code{malloc()}ed memory if you're going to
need it later.  (Also, signal an error if there are any operating-system
and window-system resources here, because they can't be dumped.)

Finalize methods should, as a rule, set to zero any pointers after
they've been freed, and check to make sure pointers are not zero before
freeing.  Although I'm pretty sure that finalize methods are not called
twice on the same object (except for the @code{for_disksave} proviso),
we've gotten nastily burned in some cases by not doing this.

WARNING #2: The finalize method is @emph{only} called for
lcrecords, @emph{not} for simple lrecords.  If you need a
finalize method for simple lrecords, you have to stick
it in the @code{ADDITIONAL_FREE_foo()} macro in @file{alloc.c}.

WARNING #3: Things are in an @emph{extremely} bizarre state
when @code{ADDITIONAL_FREE_foo()} is called, so you have to
be incredibly careful when writing one of these functions.
See the comment in @code{gc_sweep()}.  If you ever have to add
one of these, consider using an lcrecord or dealing with
the problem in a different fashion.

@item
An @dfn{equal} method.  This compares the two objects for similarity,
when @code{equal} is called.  It should compare the contents of the
objects in some reasonable fashion.  It is passed the two objects and a
@dfn{depth} value, which is used to catch circular objects.  To compare
sub-Lisp-objects, call @code{internal_equal()} and bump the depth value
by one.  If this value gets too high, a @code{circular-object} error
will be signaled.

If this is NULL, objects are @code{equal} only when they are @code{eq},
i.e. identical.

@item
A @dfn{hash} method.  This is used to hash objects when they are to be
compared with @code{equal}.  The rule here is that if two objects are
@code{equal}, they @emph{must} hash to the same value; i.e. your hash
function should use some subset of the sub-fields of the object that are
compared in the ``equal'' method.  If you specify this method as
@code{NULL}, the object's pointer will be used as the hash, which will
@emph{fail} if the object has an @code{equal} method, so don't do this.

To hash a sub-Lisp-object, call @code{internal_hash()}.  Bump the
depth by one, just like in the ``equal'' method.

To convert a Lisp object directly into a hash value (using
its pointer), use @code{LISP_HASH()}.  This is what happens when
the hash method is NULL.

To hash two or more values together into a single value, use
@code{HASH2()}, @code{HASH3()}, @code{HASH4()}, etc.

@item
@dfn{getprop}, @dfn{putprop}, @dfn{remprop}, and @dfn{plist} methods.
These are used for object types that have properties.  I don't feel like
documenting them here.  If you create one of these objects, you have to
use different macros to define them,
i.e. @code{DEFINE_LRECORD_IMPLEMENTATION_WITH_PROPS()} or
@code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION_WITH_PROPS()}.

@item
A @dfn{size_in_bytes} method, used when the object is of variable size
(i.e. declared with a @code{_SEQUENCE_IMPLEMENTATION} macro).  This
should simply return the object's size in bytes, exactly as you might
expect.  For an example, see the methods for window configurations and
opaques.
@end enumerate

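The mark, equal and hash contracts above can be tied together in one
small sketch.  The type @code{struct toy_pair}, the depth limit, and the
hash combiner are all hypothetical (the combiner is not claimed to be
the real @code{HASH2()}).  The mark method returns its last sub-object
so that the driver loops instead of recursing, and the hash method looks
only at fields that the equal method also compares.

@verbatim
#include <stdlib.h>

/* Hypothetical object type with two sub-object slots; NULL stands
   in for nil. */
struct toy_pair
{
  int marked;
  int tag;                        /* plain data compared by `equal' */
  struct toy_pair *first;
  struct toy_pair *rest;
};

typedef void (*toy_markfn) (struct toy_pair *);

/* Mark method: mark FIRST through the callback, but return REST for
   the caller to mark next, so a long REST chain adds no recursion. */
static struct toy_pair *
toy_pair_mark (struct toy_pair *p, toy_markfn markobj)
{
  markobj (p->first);
  return p->rest;
}

/* Driver in the spirit of mark_object(): loops on the returned object. */
static void
toy_mark_object (struct toy_pair *obj)
{
  while (obj != NULL && !obj->marked)
    {
      obj->marked = 1;
      obj = toy_pair_mark (obj, toy_mark_object);
    }
}

#define TOY_MAX_DEPTH 200   /* assumed recursion limit */

/* Equal method: compares contents, bumping DEPTH for sub-objects. */
static int
toy_pair_equal (const struct toy_pair *a, const struct toy_pair *b,
                int depth)
{
  if (depth > TOY_MAX_DEPTH)
    abort ();                     /* XEmacs signals circular-object instead */
  if (a == b)
    return 1;
  if (a == NULL || b == NULL || a->tag != b->tag)
    return 0;
  return toy_pair_equal (a->first, b->first, depth + 1)
    && toy_pair_equal (a->rest, b->rest, depth + 1);
}

/* Assumed combiner in the spirit of HASH2(); not its real definition. */
static unsigned long
toy_hash2 (unsigned long a, unsigned long b)
{
  return 65599UL * a + b;
}

/* Hash method: uses only TAG and FIRST, a subset of what the equal
   method compares, so equal objects always hash alike. */
static unsigned long
toy_pair_hash (const struct toy_pair *p, int depth)
{
  if (p == NULL || depth > TOY_MAX_DEPTH)
    return 0;
  return toy_hash2 ((unsigned long) p->tag,
                    toy_pair_hash (p->first, depth + 1));
}
@end verbatim
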
@node Low-level allocation
@section Low-level allocation

  Memory that you want to allocate directly should be allocated using
@code{xmalloc()} rather than @code{malloc()}.  This implements
error-checking on the return value, and once upon a time did some more
vital stuff (i.e. @code{BLOCK_INPUT}, which is no longer necessary).
Free using @code{xfree()}, and realloc using @code{xrealloc()}.  Note
that @code{xmalloc()} will do a non-local exit if the memory can't be
allocated. (Many functions, however, do not expect this, and thus XEmacs
will likely crash if this happens.  @strong{This is a bug.}  If you can,
you should strive to make your function handle this OK.  However, it's
difficult in the general circumstance, perhaps requiring extra
unwind-protects and such.)

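The behaviour of @code{xmalloc()} described above can be sketched with
@code{setjmp}/@code{longjmp} standing in for XEmacs's non-local exit;
@code{toy_catch_point} and the other @code{toy_} names are hypothetical.
The wrapper never returns a null pointer, so a caller that allocated
other memory before a failing call is unwound without a chance to free
it unless cleanup was arranged in advance.

@verbatim
#include <setjmp.h>
#include <stdlib.h>

/* Hypothetical stand-in for XEmacs's error handling: one catch point
   that the out-of-memory path jumps back to.  A caller must establish
   it with setjmp (toy_catch_point) before allocating. */
static jmp_buf toy_catch_point;

/* Like malloc(), but never returns NULL: on failure it exits
   non-locally, so callers must be prepared to be unwound. */
static void *
toy_xmalloc (size_t size)
{
  void *p = malloc (size ? size : 1);

  if (p == NULL)
    longjmp (toy_catch_point, 1);
  return p;
}

static void
toy_xfree (void *p)
{
  free (p);
}
@end verbatim
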
  Note that XEmacs provides two separate replacements for the standard
@code{malloc()} library function.  These are called @dfn{old GNU malloc}
(@file{malloc.c}) and @dfn{new GNU malloc} (@file{gmalloc.c}),
respectively.  New GNU malloc is better in pretty much every way than
old GNU malloc, and should be used if possible.  (It used to be that on
some systems, the old one worked but the new one didn't.  I think this
was due specifically to a bug in SunOS, which the new one now works
around; so I don't think the old one ever has to be used any more.) The
primary difference between both of these mallocs and the standard system
malloc is that they are much faster, at the expense of increased space.
The basic idea is that memory is allocated in fixed chunks of powers of
two.  This allows for basically constant malloc time, since the various
chunks can just be kept on a number of free lists. (The standard system
malloc typically allocates arbitrary-sized chunks and has to spend some
time, sometimes a significant amount of time, walking the heap looking
for a free block to use and cleaning things up.)  The new GNU malloc
improves on things by allocating large objects in chunks of 4096 bytes
rather than in ever larger powers of two, which results in ever larger
wastage.  There is a slight speed loss here, but it's of doubtful
significance.

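The bucket arithmetic can be made concrete with a small sketch.  The
8-byte minimum and the 4096-byte threshold at which the second function
switches to whole 4096-byte chunks are assumptions for illustration, not
values taken from @file{malloc.c} or @file{gmalloc.c}; the @code{toy_}
names are hypothetical.

@verbatim
#include <stddef.h>
#include <stdint.h>

/* Round a request up to the next power of two, with an assumed minimum
   of 8 bytes; each power of two would have its own free list.
   Returns 0 if no power of two that large fits in a size_t. */
static size_t
toy_bucket_size (size_t n)
{
  size_t b = 8;

  if (n > SIZE_MAX / 2 + 1)
    return 0;
  while (b < n)
    b <<= 1;
  return b;
}

/* Variant in the spirit of the new GNU malloc as described above:
   requests beyond an assumed 4096-byte threshold are rounded up to
   whole 4096-byte chunks instead of to the next power of two. */
static size_t
toy_rounded_size (size_t n)
{
  if (n <= 4096)
    return toy_bucket_size (n);
  if (n > SIZE_MAX - 4095)
    return 0;
  return (n + 4095) / 4096 * 4096;
}
@end verbatim

For a 9000-byte request, rounding to a power of two gives 16384 bytes
(7384 wasted), while rounding to 4096-byte chunks gives 12288 bytes
(3288 wasted).
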
  NOTE: Apparently there is a third-generation GNU malloc that is
significantly better than the new GNU malloc, and should probably
be included in XEmacs.

  There is also the relocating allocator, @file{ralloc.c}.  This actually
moves blocks of memory around so that the @code{sbrk()} pointer can be
shrunk and virtual memory released back to the system.  On some systems,
this is a big win.  On all systems, it causes a noticeable (and
sometimes huge) speed penalty, so I turn it off by default.
@file{ralloc.c} only works with the new GNU malloc in @file{gmalloc.c}.
There are also two versions of @file{ralloc.c}, one that uses @code{mmap()}
rather than block copies to move data around.  This purports to
be faster, although that depends on the amount of data that would
have had to be block copied and the system-call overhead for
@code{mmap()}.  I don't know exactly how this works, except that the
relocating-allocation routines are pretty much used only for
the memory allocated for a buffer, which is the biggest consumer
of space, esp. of space that may get freed later.

  Note that the GNU mallocs have some ``memory warning'' facilities.
XEmacs taps into them and issues a warning through the standard
warning system, when memory gets to 75%, 85%, and 95% full.
(On some systems, the memory warnings are not functional.)

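A hypothetical helper illustrating only the threshold logic (report each
of the 75%, 85% and 95% levels once as usage rises) might look like the
following; how XEmacs actually receives these warnings from the GNU
mallocs is not shown, and every @code{toy_} name is invented.

@verbatim
#include <stddef.h>

/* Returns the threshold (75, 85 or 95 percent) that USED has newly
   crossed relative to LIMIT, or 0 if there is nothing new to report.
   *LAST_LEVEL remembers the highest level already reported and is reset
   once usage falls back below 75%. */
static int
toy_memory_warning (double used, double limit, int *last_level)
{
  static const int levels[] = { 95, 85, 75 };
  size_t i;

  for (i = 0; i < sizeof levels / sizeof levels[0]; i++)
    if (used * 100.0 >= levels[i] * limit)
      {
        if (levels[i] <= *last_level)
          return 0;                  /* already warned at this level */
        *last_level = levels[i];
        return levels[i];
      }
  *last_level = 0;                   /* back below every threshold */
  return 0;
}
@end verbatim
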
  Allocated memory that is going to be used to make a Lisp object
is created using @code{allocate_lisp_storage()}.  This calls @code{xmalloc()}
but also verifies that the pointer to the memory can fit into
a Lisp word (remember that some bits are taken away for a type
tag and a mark bit).  If not, an error is issued through @code{memory_full()}.
@code{allocate_lisp_storage()} is called by @code{alloc_lcrecord()},
@code{ALLOCATE_FIXED_TYPE()}, and the vector and bit-vector creation
routines.  These routines also call @code{INCREMENT_CONS_COUNTER()} at the
appropriate times; this keeps statistics on how much memory is
allocated, so that garbage-collection can be invoked when the
threshold is reached.

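The check performed by @code{allocate_lisp_storage()} can be sketched as
follows, assuming (for illustration only) that a pointer must fit in the
low 28 bits of a Lisp word; the real number of value bits depends on the
configuration.  The @code{toy_} names are hypothetical, and a null
return stands in for the call to @code{memory_full()}.

@verbatim
#include <stdint.h>
#include <stdlib.h>

/* Assumed layout: a pointer must fit in the low TOY_VALBITS bits of a
   Lisp word, the remaining bits being used for the tag and mark bit. */
#define TOY_VALBITS 28

static int
toy_pointer_fits_p (uintptr_t p)
{
  return (p >> TOY_VALBITS) == 0;
}

/* Sketch of allocate_lisp_storage(): allocate, then check the address.
   XEmacs reports the failure through memory_full(); here NULL is
   returned instead. */
static void *
toy_allocate_lisp_storage (size_t size)
{
  void *p = malloc (size);

  if (p != NULL && !toy_pointer_fits_p ((uintptr_t) p))
    {
      free (p);
      return NULL;
    }
  return p;
}
@end verbatim

With this assumed 28-bit field, typical heap addresses on a 64-bit
system would fail the check, so the sketch's allocator would usually
return a null pointer there.
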
@node Pure Space
@section Pure Space

  Not yet documented.

@node Cons
@section Cons

  Conses are allocated in standard frob blocks.  The only thing to
note is that conses can be explicitly freed using @code{free_cons()}
and associated functions @code{free_list()} and @code{free_alist()}.  This
immediately puts the conses onto the cons free list, and decrements
the statistics on memory allocation appropriately.  This is used
to good effect by some extremely commonly-used code, to avoid
generating extra objects and thereby triggering GC sooner.
However, you have to be @emph{extremely} careful when doing this.
If you mess this up, you will get BADLY BURNED, and it has happened
before.

@node Vector
@section Vector

  As mentioned above, each vector is @code{malloc()}ed individually, and
all are threaded through the variable @code{all_vectors}.  Vectors are
marked strangely during garbage collection, by kludging the size field.
Note that the @code{struct Lisp_Vector} is declared with its
@code{contents} field being a @emph{stretchy} array of one element.  It
is actually @code{malloc()}ed with the right size, however, and access
to any element through the @code{contents} array works fine.

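A sketch of a vector with a trailing variable-length @code{contents}
array follows.  The @code{toy_} names are hypothetical, and the sketch
uses a C99 flexible array member, the standardized form of the
one-element-array trick described above (which formally has undefined
behaviour when indexed past its declared size).

@verbatim
#include <stdlib.h>

/* Hypothetical vector: the header is followed directly by N elements,
   allocated together in one malloc() block. */
struct toy_vector
{
  struct toy_vector *next;     /* threads all vectors, like all_vectors */
  size_t size;
  long contents[];
};

static struct toy_vector *all_toy_vectors = NULL;

static struct toy_vector *
toy_make_vector (size_t n, long init)
{
  struct toy_vector *v = malloc (sizeof *v + n * sizeof v->contents[0]);
  size_t i;

  if (v == NULL)
    return NULL;
  v->size = n;
  for (i = 0; i < n; i++)
    v->contents[i] = init;
  v->next = all_toy_vectors;   /* chain onto the list of all vectors */
  all_toy_vectors = v;
  return v;
}
@end verbatim
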
@node Bit Vector
@section Bit Vector

  Bit vectors work exactly like vectors, except for more complicated
code to access an individual bit, and except for the fact that bit
vectors are lrecords while vectors are not. (The only difference here is
that there's an lrecord implementation pointer at the beginning and the
tag field in bit vector Lisp words is ``lrecord'' rather than
``vector''.)

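The more complicated code for accessing a single bit amounts to picking
the right word and then shifting and masking.  This sketch packs bits
into @code{unsigned long} words; the layout and the @code{toy_} names
are assumptions rather than XEmacs's actual bit-vector structure, and no
bounds checking is done, so callers must keep the index below the size.

@verbatim
#include <limits.h>
#include <stdlib.h>

#define TOY_BITS_PER_WORD (sizeof (unsigned long) * CHAR_BIT)

/* Hypothetical bit vector: the bits are packed into unsigned longs
   stored after the header. */
struct toy_bit_vector
{
  size_t size;                 /* number of bits */
  unsigned long bits[];        /* enough words to hold SIZE bits */
};

static struct toy_bit_vector *
toy_make_bit_vector (size_t nbits)
{
  size_t words = (nbits + TOY_BITS_PER_WORD - 1) / TOY_BITS_PER_WORD;
  struct toy_bit_vector *v
    = calloc (1, sizeof *v + words * sizeof (unsigned long));

  if (v != NULL)
    v->size = nbits;           /* calloc() has already cleared the bits */
  return v;
}

/* Fetch bit N: pick the word, then shift the wanted bit down. */
static int
toy_bit_vector_bit (const struct toy_bit_vector *v, size_t n)
{
  return (int) ((v->bits[n / TOY_BITS_PER_WORD]
                 >> (n % TOY_BITS_PER_WORD)) & 1UL);
}

static void
toy_set_bit_vector_bit (struct toy_bit_vector *v, size_t n, int value)
{
  unsigned long mask = 1UL << (n % TOY_BITS_PER_WORD);

  if (value)
    v->bits[n / TOY_BITS_PER_WORD] |= mask;
  else
    v->bits[n / TOY_BITS_PER_WORD] &= ~mask;
}
@end verbatim
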
@node Symbol
@section Symbol

  Symbols are also allocated in frob blocks.  Note that the code
allows symbols to be either lrecords (category (c) above)
or simple types (category (b) above); they are lrecords by
default (I think), although there is no good reason for this.

  Note that symbols in the awful horrible obarray structure are
chained through their @code{next} field.

Remember that @code{intern} looks up a symbol in an obarray, creating
one if necessary.

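A minimal sketch of an obarray as a fixed array of buckets whose symbols
are chained through @code{next}, with an @code{intern}-style lookup that
creates the symbol when it is missing.  The bucket count, the string
hash, and all @code{toy_} names are assumptions.

@verbatim
#include <stdlib.h>
#include <string.h>

#define TOY_OBARRAY_SIZE 7   /* assumed number of buckets */

/* Hypothetical symbol: symbols in the same bucket are chained
   through NEXT. */
struct toy_symbol
{
  char *name;
  struct toy_symbol *next;
};

/* A simple multiplicative string hash (an assumption). */
static unsigned
toy_hash_string (const char *s)
{
  unsigned h = 0;

  while (*s != '\0')
    h = h * 31u + (unsigned char) *s++;
  return h;
}

/* Look NAME up in OBARRAY, creating and chaining in a new symbol if
   it is not there yet.  Returns NULL only if memory runs out. */
static struct toy_symbol *
toy_intern (struct toy_symbol *obarray[], const char *name)
{
  unsigned bucket = toy_hash_string (name) % TOY_OBARRAY_SIZE;
  struct toy_symbol *s;

  for (s = obarray[bucket]; s != NULL; s = s->next)
    if (strcmp (s->name, name) == 0)
      return s;                          /* already interned */

  s = malloc (sizeof *s);
  if (s == NULL)
    return NULL;
  s->name = malloc (strlen (name) + 1);
  if (s->name == NULL)
    {
      free (s);
      return NULL;
    }
  strcpy (s->name, name);
  s->next = obarray[bucket];             /* chain into the bucket */
  obarray[bucket] = s;
  return s;
}
@end verbatim
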
@node Marker
@section Marker

  Markers are allocated in frob blocks, as usual.  They are kept
in a buffer unordered, but in a doubly-linked list so that they
can easily be removed. (Formerly this was a singly-linked list,
but in some cases garbage collection took an extraordinarily
long time due to the O(N^2) time required to remove lots of
markers from a buffer.) Markers are removed from a buffer in
the finalize stage, in @code{ADDITIONAL_FREE_marker()}.

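The constant-time removal that motivated the doubly-linked list can be
sketched as follows; the @code{toy_} structures are hypothetical and
keep only the links and the owning buffer.

@verbatim
#include <stddef.h>

struct toy_marker;

/* Hypothetical buffer: only the head of its marker list. */
struct toy_buffer
{
  struct toy_marker *markers;
};

/* Hypothetical marker: list links plus the buffer it belongs to. */
struct toy_marker
{
  struct toy_marker *prev;
  struct toy_marker *next;
  struct toy_buffer *buffer;
};

/* Add M at the head; the list is unordered, so any position will do. */
static void
toy_attach_marker (struct toy_marker *m, struct toy_buffer *b)
{
  m->buffer = b;
  m->prev = NULL;
  m->next = b->markers;
  if (b->markers != NULL)
    b->markers->prev = m;
  b->markers = m;
}

/* Remove M in constant time: its own links identify its neighbours,
   so no scan of the buffer's list is needed. */
static void
toy_unchain_marker (struct toy_marker *m)
{
  if (m->buffer == NULL)
    return;                              /* not in any buffer */
  if (m->prev != NULL)
    m->prev->next = m->next;
  else
    m->buffer->markers = m->next;        /* M was the head */
  if (m->next != NULL)
    m->next->prev = m->prev;
  m->prev = m->next = NULL;
  m->buffer = NULL;
}
@end verbatim

With a singly-linked list, removing a marker needs a scan to find its
predecessor, so removing N markers one at a time costs O(N^2) in total.
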
5645 @node String
5646 @section String
5647
5648   As mentioned above, strings are a special case.  A string is logically
5649 two parts, a fixed-size object (containing the length, property list,
5650 and a pointer to the actual data), and the actual data in the string.
5651 The fixed-size object is a @code{struct Lisp_String} and is allocated in
5652 frob blocks, as usual.  The actual data is stored in special
5653 @dfn{string-chars blocks}, which are 8K blocks of memory.
5654 Currently-allocated strings are simply laid end to end in these
5655 string-chars blocks, with a pointer back to the @code{struct Lisp_String}
5656 stored before each string in the string-chars block.  When a new string
5657 needs to be allocated, the remaining space at the end of the last
5658 string-chars block is used if there's enough, and a new string-chars
5659 block is created otherwise.
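
  A minimal C sketch of this end-to-end allocation scheme follows.  It
assumes a simplified layout.  @code{struct string_chars_block},
@code{struct string_chars}, @code{alloc_string_chars()} and
@code{ALIGN_UP} are hypothetical names.  Each chunk is a back pointer
followed by the data, and chunk sizes are rounded up to the size of a
pointer.  Big strings, described below, are simply rejected here.

@verbatim
```c
#include <stdlib.h>

#define STRING_CHARS_BLOCK_SIZE 8192    /* the 8K blocks from the text */
#define ALIGN_UP(n, a) (((n) + (a) - 1) / (a) * (a))

struct lstring;                 /* stands in for struct Lisp_String */

/* Hypothetical simplified string-chars block. */
struct string_chars_block
{
  size_t pos;                   /* offset of the first free byte */
  struct string_chars_block *next;
  char chars[STRING_CHARS_BLOCK_SIZE];
};

/* One chunk: the back pointer, immediately followed by the data. */
struct string_chars
{
  struct lstring *owner;
  char data[];
};

static struct string_chars_block *current_block;

/* Carve room for LEN bytes owned by OWNER out of the current block, or
   start a new block if the remaining space is too small.  Returns NULL
   for "big strings" (malloc()ed separately in XEmacs) or if out of
   memory. */
static struct string_chars *
alloc_string_chars (struct lstring *owner, size_t len)
{
  size_t need = ALIGN_UP (sizeof (struct string_chars) + len,
                          sizeof (struct lstring *));
  struct string_chars *sc;

  if (need > STRING_CHARS_BLOCK_SIZE)
    return NULL;
  if (!current_block || current_block->pos + need > STRING_CHARS_BLOCK_SIZE)
    {
      struct string_chars_block *b = calloc (1, sizeof *b);
      if (!b)
        return NULL;
      b->next = current_block;
      current_block = b;
    }
  sc = (struct string_chars *) (current_block->chars + current_block->pos);
  sc->owner = owner;
  current_block->pos += need;
  return sc;
}
```
@end verbatim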

  There are never any holes in the string-chars blocks, thanks to the
string compaction and relocation that happens at the end of garbage
collection.  During the sweep stage of garbage collection, when objects
are reclaimed, the garbage collector goes through all string-chars
blocks, looking for unused strings.  Each chunk of string data is
preceded by a pointer to the corresponding @code{struct Lisp_String}.
This pointer indicates both whether the string is used and how big the
string is, i.e. how to get to the next chunk of string data.  Holes are
closed up by block-copying the next string into the empty space and
relocating the pointer stored in the corresponding @code{struct
Lisp_String}.  @strong{This means you have to be careful with strings in
your code.}  See the section above on @code{GCPRO}ing.
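
  The compaction pass can be sketched in C as below.  This is a
simplified model, not the XEmacs sweep code.  @code{struct lstring}
(whose @code{marked} flag stands in for the GC mark),
@code{compact_string_chars()}, @code{place_chunk()} and the
@code{CHUNK_*} macros are hypothetical.  The 0xFFFFFFFF dead-data marker
described further on is not handled.

@verbatim
```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct Lisp_String; `marked' plays the role
   of the GC mark bit. */
struct lstring
{
  size_t len;
  char *data;                   /* points into a string-chars block */
  int marked;
};

#define CHUNK_HDR sizeof (struct lstring *)
#define ALIGN_UP(n, a) (((n) + (a) - 1) / (a) * (a))
#define CHUNK_SIZE(len) ALIGN_UP (CHUNK_HDR + (len), CHUNK_HDR)

/* Append a chunk for S holding TEXT at BLOCK + *USED (setup helper). */
static void
place_chunk (char *block, size_t *used, struct lstring *s, const char *text)
{
  memcpy (block + *used, &s, CHUNK_HDR);
  memcpy (block + *used + CHUNK_HDR, text, s->len);
  s->data = block + *used + CHUNK_HDR;
  *used += CHUNK_SIZE (s->len);
}

/* Slide live chunks in BLOCK[0, USED) toward the start of the block,
   updating each owner's data pointer, and return the bytes still used. */
static size_t
compact_string_chars (char *block, size_t used)
{
  size_t from = 0, to = 0;

  while (from < used)
    {
      struct lstring *owner;
      size_t size;

      memcpy (&owner, block + from, CHUNK_HDR);   /* read back pointer */
      size = CHUNK_SIZE (owner->len);              /* locates next chunk */
      if (owner->marked)
        {
          if (to != from)
            memmove (block + to, block + from, size);
          owner->data = block + to + CHUNK_HDR;    /* relocate */
          to += size;
        }
      from += size;
    }
  return to;
}
```
@end verbatim

Only the owner's @code{data} pointer is updated.  Any C variable still
holding the old address is left dangling after compaction, which is why
raw string data pointers must not be kept across anything that can
trigger garbage collection.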

  Note that there is one situation not handled: a string that is too big
to fit into a string-chars block.  Such strings, called @dfn{big
strings}, are all @code{malloc()}ed as their own block.  (#### It would
make more sense for the threshold for big strings to be somewhat lower,
e.g. 1/2 or 1/4 the size of a string-chars block.  It seems that this
was indeed the case formerly (the threshold was set at 1/8), but Mly
forgot about this when rewriting things for 19.8.)

Note also that the string data in string-chars blocks is padded as
necessary so that proper alignment constraints on the @code{struct
Lisp_String} back pointers are maintained.

  Finally, strings can be resized.  This happens in Mule when a
character is substituted with a different-length character, or during
modeline frobbing.  (This could also be exported to Lisp, but currently
it is not.)  Resizing a string is a potentially tricky process.  If the
change is small enough that the padding can absorb it, nothing other
than a simple memory move needs to be done.  Keep in mind, however, that
the string can't shrink too much.  The offset to the next string in the
string-chars block is computed by looking at the length and rounding up
to a multiple of four or eight.  If the string would shrink or expand
beyond the correct padding, new string data needs to be allocated at the
end of the last string-chars block and the data moved appropriately.
This leaves some dead string data.  The dead data is marked by putting
a special marker of 0xFFFFFFFF in the @code{struct Lisp_String} pointer
before the data (there's no real @code{struct Lisp_String} to point to
and relocate).  The size of the dead string data is stored at the
beginning of the dead string data gap, since it would normally be
obtained from the now-nonexistent @code{struct Lisp_String}.  The string
compactor recognizes this special 0xFFFFFFFF marker and handles it
correctly.
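
  The two decisions above can be sketched in C: whether a resize fits
in the existing padding, and how abandoned data is marked.  The helper
names (@code{resize_fits_in_place()}, @code{mark_dead_chunk()},
@code{DEAD_STRING_MARKER}) and the chunk layout are assumptions for
illustration.  Each chunk is taken to be a pointer-sized header followed
by the data, padded to pointer alignment.  The sketch also assumes the
old data area is large enough to hold a @code{size_t}.

@verbatim
```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_HDR sizeof (void *)     /* back pointer before the data */
#define ALIGN_UP(n, a) (((n) + (a) - 1) / (a) * (a))
#define CHUNK_SIZE(len) ALIGN_UP (CHUNK_HDR + (len), CHUNK_HDR)

/* Value stored in place of the back pointer of abandoned data. */
#define DEAD_STRING_MARKER ((uintptr_t) 0xFFFFFFFF)

/* A resize can stay in place only when the padded chunk size does not
   change, because the compactor finds the next chunk from the length. */
static int
resize_fits_in_place (size_t old_len, size_t new_len)
{
  return CHUNK_SIZE (old_len) == CHUNK_SIZE (new_len);
}

/* Turn the chunk at CHUNK, whose data was OLD_LEN bytes, into a dead gap:
   the marker replaces the back pointer and the gap's total size is stored
   at the start of the old data.  Assumes OLD_LEN >= sizeof (size_t). */
static void
mark_dead_chunk (char *chunk, size_t old_len)
{
  uintptr_t marker = DEAD_STRING_MARKER;
  size_t gap = CHUNK_SIZE (old_len);

  memcpy (chunk, &marker, sizeof marker);
  memcpy (chunk + CHUNK_HDR, &gap, sizeof gap);
}
```
@end verbatim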

@node Compiled Function
@section Compiled Function

  Not yet documented.

@node Events and the Event Loop, Evaluation; Stack Frames; Bindings, Allocation of Objects in XEmacs Lisp, Top
@chapter Events and the Event Loop

@menu
* Introduction to Events::
* Main Loop::
* Specifics of the Event Gathering Mechanism::
* Specifics About the Emacs Event::
* The Event Stream Callback Routines::
* Other Event Loop Functions::
* Converting Events::
* Dispatching Events; The Command Builder::
@end menu

@node Introduction to Events
@section Introduction to Events

  An event is an object that encapsulates information about an
interesting occurrence in the operating system.  Events are generated
either by user action, whether direct (e.g. typing on the keyboard or
moving the mouse) or indirect (e.g. moving another window, thereby
generating an expose event on an Emacs frame), or by some other,
typically asynchronous, occurrence.  Examples include output from a
subprocess becoming available or a timer expiring.  Events come into the
system asynchronously (typically through a callback being called) and
are converted into a synchronous, first-in first-out event queue in a
process that we will call @dfn{collection}.
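
  A first-in, first-out queue of the kind collection feeds can be
sketched in C as follows.  The types and functions (@code{struct ev},
@code{struct ev_queue}, @code{enqueue_event()}, @code{dequeue_event()})
are hypothetical.  Real Emacs events are Lisp objects, and this sketch
is not safe to call from a signal handler.

@verbatim
```c
#include <stddef.h>

/* Hypothetical minimal event; the real Emacs event is defined in
   events.h and carries much more information. */
enum ev_type { EV_KEY, EV_PROCESS, EV_TIMEOUT };

struct ev
{
  enum ev_type type;
  int data;
  struct ev *next;
};

struct ev_queue
{
  struct ev *head, *tail;
};

/* Producer side (e.g. called from a callback): append at the tail. */
static void
enqueue_event (struct ev_queue *q, struct ev *e)
{
  e->next = NULL;
  if (q->tail)
    q->tail->next = e;
  else
    q->head = e;
  q->tail = e;
}

/* Consumer side, called synchronously: remove from the head, so events
   come out in the order they went in.  Returns NULL if empty. */
static struct ev *
dequeue_event (struct ev_queue *q)
{
  struct ev *e = q->head;
  if (e)
    {
      q->head = e->next;
      if (!q->head)
        q->tail = NULL;
      e->next = NULL;
    }
  return e;
}
```
@end verbatim

A second level of collection, as described below, would simply dequeue
from one such queue and enqueue into another, possibly merging in
events from other sources.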

  Note that each application has its own event queue.  (It is
immaterial whether the collection process directly puts the events in
the proper application's queue, or puts them into a single system
queue, which is later split up.)

  The most basic level of event collection is done by the operating
system or window system.  Typically, XEmacs does its own event
collection as well.  Often there are multiple layers of collection in
XEmacs.  Events from various sources are collected into a queue, which
is then combined with other sources to go into another queue (i.e. a
second level of collection), with perhaps another level on top of this,
etc.

  XEmacs has its own types of events (called @dfn{Emacs events}), which
provide an abstract layer on top of the system-dependent nature of the
most basic events that are received.  Part of the complexity of the
XEmacs event collection process comes from converting operating-system
events into the proper Emacs events; there may not be a one-to-one
correspondence.

  Emacs events are documented in @file{events.h}; I'll discuss them
later.

@node Main Loop
@section Main Loop

  The @dfn{command loop} is the top-level loop that the editor is always
running.  It loops endlessly, calling @code{next-event} to retrieve an
event and @code{dispatch-event} to execute it.  @code{dispatch-event}
does the appropriate thing with non-user events (process, timeout,
magic, eval, mouse motion); this involves calling a Lisp handler
function, redrawing a newly-exposed part of a frame, reading subprocess
output, etc.  For user events, @code{dispatch-event} looks up the event
in relevant keymaps or menubars; when a full key sequence or menubar
selection is reached, the appropriate function is executed.
@code{dispatch-event} may have to keep state across calls.  This state
is kept in the ``command-builder'' structure associated with each
console (remember, there's usually only one console).  The engine that
looks up keystrokes and constructs full key sequences is called the
@dfn{command builder}.  This is documented elsewhere.

  The guts of the command loop are in @code{command_loop_1()}.  This
function doesn't catch errors, though; that's the job of
@code{command_loop_2()}, which is a condition-case (i.e. error-trapping)
wrapper around @code{command_loop_1()}.  @code{command_loop_1()} never
returns, but may get thrown out of.

  When an error occurs, @code{cmd_error()} is called, which usually
invokes the Lisp error handler in @code{command-error}; however, a
default error handler is provided if @code{command-error} is @code{nil}
(e.g. during startup).  The purpose of the error handler is simply to
display the error message and do associated cleanup; it does not need to
throw anywhere.  When the error handler finishes, the condition-case in
@code{command_loop_2()} will finish and @code{command_loop_2()} will
reinvoke @code{command_loop_1()}.
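
  The division of labor between @code{command_loop_1()} and
@code{command_loop_2()} can be modeled in C with @code{setjmp()} and
@code{longjmp()}.  This is only a sketch.  The real code traps errors
with a Lisp condition-case, and @code{signal_error()},
@code{events_left} and @code{errors_reported} are hypothetical
stand-ins.  Unlike the real @code{command_loop_1()}, this one returns
once its pretend event supply is exhausted, so the example terminates.

@verbatim
```c
#include <setjmp.h>

/* Hypothetical model; the real functions work on Lisp objects and trap
   errors with condition-case rather than a bare jmp_buf. */
static jmp_buf error_trap;
static int events_left;          /* pretend event supply */
static int errors_reported;

/* Like signaling a Lisp error: throw out to the innermost trap. */
static void
signal_error (void)
{
  longjmp (error_trap, 1);
}

/* Like cmd_error(): report the error; no need to throw anywhere. */
static void
cmd_error (void)
{
  errors_reported++;
}

/* Like command_loop_1(): read and execute events, except that this
   model stops when its supply runs out.  Every third event's
   "command" fails. */
static void
command_loop_1 (void)
{
  while (events_left > 0)
    {
      int event = events_left--;
      if (event % 3 == 0)
        signal_error ();
    }
}

/* Like command_loop_2(): an error-trapping wrapper that reports each
   error and then reinvokes command_loop_1(). */
static void
command_loop_2 (void)
{
  for (;;)
    {
      if (setjmp (error_trap) == 0)
        {
          command_loop_1 ();
          return;                /* only reachable in this model */
        }
      cmd_error ();
    }
}
```
@end verbatim

The error handler itself never jumps anywhere.  Control returns to the
@code{setjmp()} in the wrapper, which reports the error and starts a
fresh @code{command_loop_1()}.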

  @code{command_loop_2()} is invoked from three places: from
@code{initial_command_loop()} (called from @code{main()} at the end of
internal initialization), from the Lisp function @code{recursive-edit},
and from @code{call_command_loop()}.

  @code{call_command_loop()} is called when a macro is started and when
the minibuffer is entered.  Normal termination of the macro or minibuffer
causes a throw out of the recursive command loop (to
@code{execute-kbd-macro} for macros and to @code{exit} for minibuffers).
Note also that the low-level minibuffer-entering function,
@code{read-minibuffer-internal}, provides its own error handling and
does not need @code{command_loop_2()}'s error encapsulation, so it tells
@code{call_command_loop()} to invoke @code{command_loop_1()} directly.

  Note that both @code{read-minibuffer-internal} and
@code{recursive-edit} set up a catch for @code{exit}; this is why
@code{abort-recursive-edit}, which throws to this catch, exits out of
either one.

  @code{initial_command_loop()}, called from @code{main()}, sets up a
catch for @code{top-level} when invoking @code{command_loop_2()},
allowing functions to throw all the way to the top level if they really
need to.  Before invoking @code{command_loop_2()},
@code{initial_command_loop()} calls @code{top_level_1()}, which handles
all of the startup work (creating the initial frame, handling the
command-line options, loading the user's @file{.emacs} file, etc.).  The
function that actually does this is written in Lisp and is pointed to by
the variable @code{top-level}; normally this function is
@code{normal-top-level}.  @code{top_level_1()} is just an error-handling
wrapper similar to @code{command_loop_2()}.  Note also that
@code{initial_command_loop()} sets up a catch for @code{top-level} when
invoking @code{top_level_1()}, just like when it invokes
@code{command_loop_2()}.

@node Specifics of the Event Gathering Mechanism
@section Specifics of the Event Gathering Mechanism

  Here is an approximate diagram of the collection processes at work in
XEmacs under TTYs (TTYs are simpler than X, so we'll look at them
first):

@noindent
@example
 asynch.      asynch.    asynch.   asynch.             [Collectors in
kbd events  kbd events   process   process                the OS]
      |         |         output    output
      |         |           |         |
      |         |           |         |      SIGINT,   [signal handlers
      |         |           |         |      SIGQUIT,     in XEmacs]
      V         V           V         V      SIGWINCH,
     file      file        file      file    SIGALRM
     desc.     desc.       desc.     desc.     |
     (TTY)     (TTY)       (pipe)    (pipe)    |
      |          |          |         |      fake    timeouts
      |          |          |         |      file        |
      |          |          |         |      desc.       |
      |          |          |         |      (pipe)      |
      |          |          |         |        |         |
      |          |          |         |        |         |
      |          |          |         |        |         |
      V          V          V         V        V         V
      ------>-----------<----------------<----------------
                  |
                  |
                  | [collected using select() in emacs_tty_next_event()
                  |  and converted to the appropriate Emacs event]
                  |
                  |
                  V          (above this line is TTY-specific)
                Emacs -----------------------------------------------
                event (below this line is the generic event mechanism)
                  |
                  |
was there     if not, call
a SIGINT?  emacs_tty_next_event()
    |             |
    |             |
    |             |
    V             V
    --->------<----
           |
           |     [collected in event_stream_next_event();
           |      SIGINT is converted using maybe_read_quit_event()]
           V
         Emacs
         event
           |
           \---->------>----- maybe_kbd_translate() ---->---\
                                                            |
                                                            |
                                                            |
     command event queue                                    |
                                               if not from command
  (contains events that were                   event queue, call
  read earlier but not processed,              event_stream_next_event()
  typically when waiting in a                               |
  sit-for, sleep-for, etc. for                              |
 a particular event to be received)                         |
               |                                            |
               |                                            |
               V                                            V
               ---->------------------------------------<----
                                               |
                                               | [collected in
                                               |  next_event_internal()]
                                               |
 unread-     unread-       event from          |
 command-    command-       keyboard       else, call
 events      event           macro      next_event_internal()
   |           |               |               |
   |           |               |               |
   |           |               |               |
   V           V               V               V
   --------->----------------------<------------
                     |
                     |      [collected in `next-event', which may loop
                     |       more than once if the event it gets is on
                     |       a dead frame, device, etc.]
                     |
                     |
                     V
            feed into top-level event loop,
            which repeatedly calls `next-event'
            and then dispatches the event
            using `dispatch-event'
@end example

Notice the separation between the TTY-specific and the generic event
mechanisms.  When the Xt-based event loop is used, the TTY-specific part
is replaced, but the rest stays the same.

It's also important to realize that only one kind of system-specific
event loop can be operating at a time, and it must be able to receive
all kinds of events simultaneously.  Of the two existing event loops
(implemented in @file{event-tty.c} and @file{event-Xt.c},
respectively), the TTY event loop @emph{only} handles TTY consoles,
while the Xt event loop handles @emph{both} TTY and X consoles.  This
situation is different from that of the output handlers, where you
simply have one per console type.

  Here's the Xt event loop diagram (notice that below a certain point,
it's the same as the TTY diagram above):

@example
asynch. asynch. asynch. asynch.                 [Collectors in
 kbd     kbd    process process                    the OS]
events  events  output  output
  |       |       |       |
  |       |       |       |     asynch. asynch. [Collectors in the
  |       |       |       |       X        X     OS and X Window System]
  |       |       |       |     events  events
  |       |       |       |       |        |
  |       |       |       |       |        |
  |       |       |       |       |        |    SIGINT, [signal handlers
  |       |       |       |       |        |    SIGQUIT,   in XEmacs]
  |       |       |       |       |        |    SIGWINCH,
  |       |       |       |       |        |    SIGALRM
  |       |       |       |       |        |       |
  |       |       |       |       |        |       |
  |       |       |       |       |        |       |      timeouts
  |       |       |       |       |        |       |          |
  |       |       |       |       |        |       |          |
  |       |       |       |       |        |       V          |
  V       V       V       V       V        V      fake        |
 file    file    file    file    file     file    file        |
 desc.   desc.   desc.   desc.   desc.    desc.   desc.       |
 (TTY)   (TTY)   (pipe)  (pipe) (socket) (socket) (pipe)      |
  |       |       |       |       |        |       |          |
  |       |       |       |       |        |       |          |
  |       |       |       |       |        |       |          |
  V       V       V       V       V        V       V          V
  --->----------------------------------------<---------<------
       |              |               |
       |              |               |[collected using select() in
       |              |               | _XtWaitForSomething(), called
       |              |               | from XtAppProcessEvent(), called
       |              |               | in emacs_Xt_next_event();
       |              |               | dispatched to various callbacks]
       |              |               |
       |              |               |
  emacs_Xt_        p_s_callback(),    | [popup_selection_callback]
  event_handler()  x_u_v_s_callback(),| [x_update_vertical_scrollbar_
       |           x_u_h_s_callback(),|  callback]
       |           search_callback()  | [x_update_horizontal_scrollbar_
       |              |               |  callback]
       |              |               |
       |              |               |
  enqueue_Xt_       signal_special_   |
  dispatch_event()  Xt_user_event()   |
  [maybe multiple     |               |
   times, maybe 0     |               |
   times]             |               |
       |            enqueue_Xt_       |
       |            dispatch_event()  |
       |              |               |
       |              |               |
       V              V               |
       -->----------<--               |
              |                       |
              |                       |
           dispatch             Xt_what_callback()
           event                  sets flags
           queue                      |
              |                       |
              |                       |
              |                       |
              |                       |
              ---->-----------<--------
                   |
                   |
                   |     [collected and converted as appropriate in
                   |            emacs_Xt_next_event()]
                   |
                   |
                   V          (above this line is Xt-specific)
                 Emacs ------------------------------------------------
                 event (below this line is the generic event mechanism)
                   |
                   |
was there      if not, call
a SIGINT?   emacs_Xt_next_event()
    |              |
    |              |
    |              |
    V              V
    --->-------<----
           |
           |        [collected in event_stream_next_event();
           |         SIGINT is converted using maybe_read_quit_event()]
           V
         Emacs
         event
           |
           \---->------>----- maybe_kbd_translate() -->-----\
                                                            |
                                                            |
                                                            |
     command event queue                                    |
                                              if not from command
  (contains events that were                  event queue, call
  read earlier but not processed,             event_stream_next_event()
  typically when waiting in a                               |
  sit-for, sleep-for, etc. for                              |
 a particular event to be received)                         |
               |                                            |
               |                                            |
               V                                            V
               ---->----------------------------------<------
                                               |
                                               | [collected in
                                               |  next_event_internal()]
                                               |
 unread-     unread-       event from          |
 command-    command-       keyboard       else, call
 events      event           macro      next_event_internal()
   |           |               |               |
   |           |               |               |
   |           |               |               |
   V           V               V               V
   --------->----------------------<------------
                     |
                     |      [collected in `next-event', which may loop
                     |       more than once if the event it gets is on
                     |       a dead frame, device, etc.]
                     |
                     |
                     V
            feed into top-level event loop,
            which repeatedly calls `next-event'
            and then dispatches the event
            using `dispatch-event'
@end example

@node Specifics About the Emacs Event
@section Specifics About the Emacs Event

@node The Event Stream Callback Routines
@section The Event Stream Callback Routines

@node Other Event Loop Functions
@section Other Event Loop Functions

  @code{detect_input_pending()} and @code{input-pending-p} look for
input by calling @code{event_stream->event_pending_p} and looking in
@code{[V]unread-command-event} and the @code{command_event_queue} (they
do not check for an executing keyboard macro, though).

  @code{discard-input} cancels any command events pending (and any
keyboard macros currently executing), and puts the others onto the
@code{command_event_queue}.  There is a comment about a ``race
condition'', which is not a good sign.

  @code{next-command-event} and @code{read-char} are higher-level
interfaces to @code{next-event}.  @code{next-command-event} gets the
next @dfn{command} event (i.e. keypress, mouse event, menu selection,
or scrollbar action), calling @code{dispatch-event} on any others.
@code{read-char} calls @code{next-command-event} and uses
@code{event_to_character()} to return the character equivalent.  With
the right kind of input method support, it is possible for
@code{(read-char)} to return a Kanji character.
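
  The behavior of @code{next-command-event} can be sketched in C as a
loop that dispatches everything that is not a command event.  The event
kinds, @code{struct source}, and the C functions @code{next_event()},
@code{dispatch_event()} and @code{next_command_event()} are hypothetical
models of the Lisp-level functions.  The sketch assumes that the event
source eventually yields a command event.

@verbatim
```c
#include <stddef.h>

/* Hypothetical event kinds; real events are Lisp objects (see events.h). */
enum ev_kind
{
  KEY_PRESS, BUTTON_PRESS, MENU_SELECTION, SCROLLBAR_ACTION,
  PROCESS_OUTPUT, TIMEOUT, MAGIC
};

/* Command events: keypresses, mouse events, menu selections and
   scrollbar actions. */
static int
is_command_event (enum ev_kind k)
{
  return k == KEY_PRESS || k == BUTTON_PRESS
    || k == MENU_SELECTION || k == SCROLLBAR_ACTION;
}

/* Hypothetical pre-recorded event source standing in for next-event. */
struct source
{
  const enum ev_kind *events;
  size_t pos;
  int dispatched;               /* non-command events handled so far */
};

static enum ev_kind
next_event (struct source *s)
{
  return s->events[s->pos++];   /* assumes a command event will arrive */
}

static void
dispatch_event (struct source *s, enum ev_kind k)
{
  (void) k;                     /* a real dispatcher would act on K */
  s->dispatched++;
}

/* Like next-command-event: dispatch non-command events and return the
   first command event. */
static enum ev_kind
next_command_event (struct source *s)
{
  for (;;)
    {
      enum ev_kind k = next_event (s);
      if (is_command_event (k))
        return k;
      dispatch_event (s, k);
    }
}
```
@end verbatim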

@node Converting Events
@section Converting Events

  @code{character_to_event()}, @code{event_to_character()},
@code{event-to-character}, and @code{character-to-event} convert between
characters and the keypress events corresponding to them.  If the event
is not a keypress, @code{event_to_character()} returns -1 and
@code{event-to-character} returns @code{nil}.  These functions convert
between the character representation and the split-up event
representation (keysym plus modifier keys).
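
  As an illustration of the split-up representation, here is a C sketch
of the round trip for plain and control characters.
@code{struct key_event}, @code{char_to_event()}, @code{event_to_char()}
and the @code{MOD_*} bits are hypothetical.  The real functions also
handle meta, non-ASCII characters, and special cases such as TAB, RET
and ESC, which the real conversion may treat specially and which this
sketch ignores.

@verbatim
```c
/* Hypothetical split-up keypress: keysym plus modifier bits.  Real
   XEmacs events use Lisp keysyms and more modifiers. */
#define MOD_CONTROL 1
#define MOD_META    2

enum event_type { KEY_PRESS_EVENT, BUTTON_PRESS_EVENT };

struct key_event
{
  enum event_type type;
  int keysym;                   /* here: a plain character code */
  int modifiers;
};

/* Like character_to_event(): control characters 1..26 become a letter
   plus the control modifier; everything else maps to itself. */
static struct key_event
char_to_event (int c)
{
  struct key_event e = { KEY_PRESS_EVENT, c, 0 };
  if (c >= 1 && c <= 26)
    {
      e.keysym = 'a' + c - 1;
      e.modifiers = MOD_CONTROL;
    }
  return e;
}

/* Like event_to_character(): -1 for non-keypresses and for keys that
   have no character equivalent in this simplified model. */
static int
event_to_char (const struct key_event *e)
{
  if (e->type != KEY_PRESS_EVENT)
    return -1;
  if (e->modifiers == 0)
    return e->keysym;
  if (e->modifiers == MOD_CONTROL && e->keysym >= 'a' && e->keysym <= 'z')
    return e->keysym - 'a' + 1;
  return -1;
}
```
@end verbatim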

@node Dispatching Events; The Command Builder
@section Dispatching Events; The Command Builder

Not yet documented.

@node Evaluation; Stack Frames; Bindings, Symbols and Variables, Events and the Event Loop, Top
@chapter Evaluation; Stack Frames; Bindings

@menu
* Evaluation::
* Dynamic Binding; The specbinding Stack; Unwind-Protects::
* Simple Special Forms::
* Catch and Throw::
@end menu

@node Evaluation
@section Evaluation

  @code{Feval()} evaluates the form (a Lisp object) that is passed to
it.  Note that evaluation is only non-trivial for two types of objects:
symbols and conses.  A symbol is evaluated simply by calling
@code{symbol-value} on it and returning the value.

  Evaluating a cons means calling a function.  First, @code{eval} checks
to see if garbage collection is necessary, and calls
@code{garbage_collect_1()} if so.  It then increases the evaluation
depth by 1 (@code{lisp_eval_depth}, which is always less than
@code{max_lisp_eval_depth}) and adds an element to the linked list of
@code{struct backtrace}s (@code{backtrace_list}).  Each such structure
contains a pointer to the function being called plus a list of the
function's arguments.  Initially these values are stored unevalled, and
as they are evaluated, the backtrace structure is updated.  Garbage
collection pays attention to the objects pointed to in the backtrace
structures.  Garbage collection might happen while a function is being
called or while an argument is being evaluated, and there could easily
be no other references to the arguments in the argument list.  Once an
argument is evaluated, however, the unevalled version is no longer
needed by eval, and so the backtrace structure is changed.
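
  The depth check and the backtrace chain can be sketched in C like
this.  @code{lisp_eval_depth}, @code{max_lisp_eval_depth} and
@code{backtrace_list} are named after the variables in the text.
@code{enter_call()}, @code{leave_call()}, the simplified
@code{struct backtrace} and the tiny depth limit are assumptions.
Where this sketch returns 0, the real code signals an error.

@verbatim
```c
#include <stddef.h>

/* Hypothetical, simplified frame; the real struct backtrace holds
   Lisp_Objects for the function and its (possibly unevalled) args. */
struct backtrace
{
  struct backtrace *next;
  const char *function;
  int nargs;
};

static struct backtrace *backtrace_list;
static int lisp_eval_depth;
static int max_lisp_eval_depth = 3;   /* tiny limit for illustration */

/* Push frame BT for a call to FN.  Returns 0, pushing nothing, when the
   depth limit would be exceeded. */
static int
enter_call (struct backtrace *bt, const char *fn, int nargs)
{
  if (lisp_eval_depth + 1 > max_lisp_eval_depth)
    return 0;
  lisp_eval_depth++;
  bt->function = fn;
  bt->nargs = nargs;
  bt->next = backtrace_list;
  backtrace_list = bt;
  return 1;
}

/* Pop BT on the way out, undoing enter_call(). */
static void
leave_call (struct backtrace *bt)
{
  backtrace_list = bt->next;
  lisp_eval_depth--;
}
```
@end verbatim

In this sketch the frames are ordinary variables owned by the caller,
so pushing and popping needs no allocation.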

At this point, the function to be called is determined by looking at
the car of the cons (if this is a symbol, its function definition is
retrieved and the process repeated).  The function should then be
either a @code{Lisp_Subr} (a built-in function written in C), a
@code{Lisp_Compiled_Function} object, or a cons whose car is one of the
symbols @code{autoload}, @code{macro} or @code{lambda}.

If the function is a @code{Lisp_Subr}, the Lisp object points to a
@code{struct Lisp_Subr} (created by @code{DEFUN()}).  This structure
contains a pointer to the C function, a minimum and maximum number of
arguments (or possibly the special constants @code{MANY} or
@code{UNEVALLED}), a pointer to the symbol referring to that subr, and
a couple of other things.  If the subr wants its arguments
@code{UNEVALLED}, they are passed raw as a list.  Otherwise, an array of
evaluated arguments is created and put into the backtrace structure.
The array is then either passed whole (@code{MANY}), or each argument is
passed as a separate C argument.
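
  The argument-count checking and calling conventions for subrs can be
sketched in C as follows.  Everything here is a simplified assumption.
@code{obj} stands in for @code{Lisp_Object}, and @code{MANY} and
@code{UNEVALLED} are given arbitrary negative values.  Only maximum
arities of one and two are handled.  The sample built-ins @code{plus()}
and @code{minus2()} are hypothetical, and missing optional arguments are
passed as 0 in place of @code{nil}.

@verbatim
```c
/* Hypothetical model of built-in functions ("subrs"); real subrs take
   and return Lisp_Objects, and MANY/UNEVALLED are defined in lisp.h. */
typedef long obj;               /* stands in for Lisp_Object; 0 is "nil" */
#define MANY      (-2)          /* assumption: arbitrary sentinel values */
#define UNEVALLED (-1)

typedef obj (*subr_many) (int nargs, obj *args);
typedef obj (*subr_1) (obj);
typedef obj (*subr_2) (obj, obj);

struct subr
{
  const char *name;
  int min_args, max_args;
  union { subr_many many; subr_1 f1; subr_2 f2; } fn;
};

/* Call S on the already-evaluated ARGS.  Returns 0 on a wrong number of
   arguments or an unsupported arity, and 1 on success. */
static int
call_subr (const struct subr *s, int nargs, obj *args, obj *result)
{
  obj a[2] = { 0, 0 };          /* missing optional args default to nil */
  int i;

  if (nargs < s->min_args || (s->max_args != MANY && nargs > s->max_args))
    return 0;
  if (s->max_args == MANY)
    {
      *result = s->fn.many (nargs, args);   /* whole array at once */
      return 1;
    }
  if (s->max_args < 1 || s->max_args > 2)
    return 0;                   /* UNEVALLED, 0 and >2 omitted here */
  for (i = 0; i < nargs; i++)
    a[i] = args[i];
  if (s->max_args == 1)         /* one C argument per Lisp argument */
    *result = s->fn.f1 (a[0]);
  else
    *result = s->fn.f2 (a[0], a[1]);
  return 1;
}

/* Sample built-ins, also hypothetical. */
static obj
plus (int nargs, obj *args)
{
  obj sum = 0;
  int i;
  for (i = 0; i < nargs; i++)
    sum += args[i];
  return sum;
}

static obj
minus2 (obj a, obj b)
{
  return a - b;
}
```
@end verbatim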

If the function is a @code{Lisp_Compiled_Function},
@code{funcall_compiled_function()} is called.  If the function is a
lambda expression, @code{funcall_lambda()} is called.  If the function
is a macro, [..... fill in] is done.  If the function is an autoload,
@code{do_autoload()} is called to load the definition and then eval
starts over [explain this more].

When @code{Feval()} exits, the evaluation depth is reduced by one, the
debugger is called if appropriate, and the current backtrace structure
is removed from the list.

Both @code{funcall_compiled_function()} and @code{funcall_lambda()} need
to go through the list of formal parameters to the function and bind
them to the actual arguments.  While doing so, they check for
@code{&rest} and @code{&optional} symbols in the formal parameters and
make sure the number of actual arguments is correct.
@code{funcall_compiled_function()} can do this a little more
efficiently, since the formal parameter list can be checked for sanity
when the compiled function object is created.
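
  The argument-count check implied by @code{&optional} and @code{&rest}
can be sketched in C.  The function @code{arity_ok()} and the
representation of the formal parameter list as an array of C strings
are hypothetical.  The real code walks a Lisp list and also binds each
parameter.

@verbatim
```c
#include <string.h>

/* Return nonzero if NARGS actual arguments are acceptable for the formal
   parameter list FORMALS, e.g. {"a", "&optional", "b", "&rest", "c"}. */
static int
arity_ok (const char **formals, int nformals, int nargs)
{
  int required = 0, optional = 0, rest = 0;
  int state = 0;                /* 0: required, 1: &optional, 2: &rest */
  int i;

  for (i = 0; i < nformals; i++)
    {
      if (strcmp (formals[i], "&optional") == 0)
        state = 1;
      else if (strcmp (formals[i], "&rest") == 0)
        state = 2;
      else if (state == 0)
        required++;
      else if (state == 1)
        optional++;
      else
        rest = 1;               /* collects any remaining arguments */
    }
  return nargs >= required && (rest || nargs <= required + optional);
}
```
@end verbatim

Because a compiled function's parameter list is fixed when the object
is created, counts like @code{required} and @code{optional} can be
computed once up front instead of on every call.  This is the
efficiency advantage mentioned above.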

@code{funcall_lambda()} simply calls @code{Fprogn} to execute the body
of the lambda expression.

@code{funcall_compiled_function()} calls the real byte-code interpreter,
@code{execute_optimized_program()}, on the byte-code instructions, which
are converted into an internal form for faster execution.

When a compiled function is executed for the first time by
@code{funcall_compiled_function()}, or when it is @code{Fpurecopy()}ed
during the dump phase of building XEmacs, the byte-code instructions are
converted from a @code{Lisp_String} into a @code{Lisp_Opaque} object.
The string form is inefficient to access, especially in the presence of
MULE.  The opaque object contains an array of unsigned char, which can
be directly executed by the byte-code interpreter.  At this time the
byte code is also analyzed for validity and transformed into a more
optimized form, so that @code{execute_optimized_program()} can really
fly.

Here are some of the optimizations performed by the internal byte-code
transformer:
@enumerate
@item
References to the @code{constants} array are checked for out-of-range
indices, so that the byte interpreter doesn't have to.
@item
References to the @code{constants} array that will be used as a Lisp
variable are checked for being correct non-constant (i.e. not @code{t},
@code{nil}, or @code{keywordp}) symbols, so that the byte interpreter
doesn't have to.
@item
The maximum number of variable bindings in the byte code is
pre-computed, so that space on the @code{specpdl} stack can be
pre-reserved once for the whole function execution.
@item
All byte-code jumps are relative to the current program counter instead
of the start of the program, thereby saving a register.
@item
One-byte relative jumps are converted from the byte-code form of unsigned
chars offset by 127 to machine-friendly signed chars.
@end enumerate
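
  The last two transformations can be illustrated with a small C
sketch.  @code{convert_short_jump()} and @code{take_jump()} are
hypothetical helpers.  The sketch rejects the operand 255, whose
displacement (+128) does not fit in a signed char.  Exactly which
program-counter position a jump is relative to is an implementation
detail that it does not model.

@verbatim
```c
#include <limits.h>

/* Convert a one-byte jump operand stored as (displacement + 127) into a
   signed char displacement.  Returns 0 if it is not representable. */
static int
convert_short_jump (unsigned char operand, signed char *disp)
{
  int d = (int) operand - 127;
  if (d > SCHAR_MAX)            /* only operand 255 (+128) fails */
    return 0;
  *disp = (signed char) d;
  return 1;
}

/* With relative jumps the interpreter adds the displacement to the
   current program counter instead of to the start of the program. */
static const unsigned char *
take_jump (const unsigned char *pc, signed char disp)
{
  return pc + disp;
}
```
@end verbatim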
6224
6225 Of course, this transformation of the @code{instructions} should not be
6226 visible to the user, so @code{Fcompiled_function_instructions()} needs
6227 to know how to convert the optimized opaque object back into a Lisp
6228 string that is identical to the original string from the @file{.elc}
6229 file.  (Actually, the resulting string may (rarely) contain slightly
6230 different, yet equivalent, byte code.)
6231
6232 @code{Ffuncall()} implements Lisp @code{funcall}.  @code{(funcall fun
6233 x1 x2 x3 ...)} is equivalent to @code{(eval (list fun (quote x1) (quote
6234 x2) (quote x3) ...))}.  @code{Ffuncall()} contains its own code to do
6235 the evaluation, however, and is very similar to @code{Feval()}.
6236
6237 From the performance point of view, it is worth knowing that most of the
6238 time in Lisp evaluation is spent executing @code{Lisp_Subr} and
6239 @code{Lisp_Compiled_Function} objects via @code{Ffuncall()} (not
6240 @code{Feval()}).
6241
6242 @code{Fapply()} implements Lisp @code{apply}, which is very similar to
6243 @code{funcall} except that if the last argument is a list, the result is the
6244 same as if each of the arguments in the list had been passed separately.
6245 @code{Fapply()} does some business to expand the last argument if it's a
6246 list, then calls @code{Ffuncall()} to do the work.
6247
6248 @code{apply1()}, @code{call0()}, @code{call1()}, @code{call2()}, and
6249 @code{call3()} call a function, passing it the argument(s) given (the
6250 arguments are given as separate C arguments rather than being passed as
6251 an array).  @code{apply1()} uses @code{Fapply()} while the others use
6252 @code{Ffuncall()} to do the real work.
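
The @code{call0()} through @code{call3()} helpers exist so that C code
need not build an argument array by hand.  A minimal sketch of that
packing might look like the following.  It again uses hypothetical
@code{toy_} names, and plain @code{long}s stand in for Lisp objects.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

/* Toy model (hypothetical names): objects are plain longs and a
   "function" takes an argument count plus an argument array. */
typedef long Obj;
typedef Obj (*toy_fn) (int nargs, Obj *args);

/* Stand-in for Ffuncall(). */
static Obj toy_funcall (toy_fn fn, int nargs, Obj *args)
{
  return fn (nargs, args);
}

/* Like call0() .. call2(): accept separate C arguments and pack them
   into the array form that toy_funcall() expects. */
static Obj toy_call0 (toy_fn fn)
{
  return toy_funcall (fn, 0, NULL);
}

static Obj toy_call1 (toy_fn fn, Obj a1)
{
  Obj args[1];
  args[0] = a1;
  return toy_funcall (fn, 1, args);
}

static Obj toy_call2 (toy_fn fn, Obj a1, Obj a2)
{
  Obj args[2];
  args[0] = a1;
  args[1] = a2;
  return toy_funcall (fn, 2, args);
}

/* Example functions used to exercise the helpers. */
static Obj toy_count_args (int nargs, Obj *args)
{
  (void) args;
  return nargs;
}

static Obj toy_difference (int nargs, Obj *args)
{
  return nargs == 2 ? args[0] - args[1] : -1;
}
```
@end verbatim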
6253
6254 @node Dynamic Binding; The specbinding Stack; Unwind-Protects
6255 @section Dynamic Binding; The specbinding Stack; Unwind-Protects
6256
6257 @example
6258 struct specbinding
6259 @{
6260   Lisp_Object symbol;
6261   Lisp_Object old_value;
6262   Lisp_Object (*func) (Lisp_Object); /* for unwind-protect */
6263 @};
6264 @end example
6265
6266   @code{struct specbinding} is used for local-variable bindings and
6267 unwind-protects.  @code{specpdl} holds an array of @code{struct specbinding}'s,
6268 @code{specpdl_ptr} points to the beginning of the free bindings in the
6269 array, @code{specpdl_size} specifies the total number of binding slots
6270 in the array, and @code{max_specpdl_size} specifies the maximum number
6271 of bindings the array can be expanded to hold.  @code{grow_specpdl()}
6272 increases the size of the @code{specpdl} array, multiplying its size by
6273 2 but never exceeding @code{max_specpdl_size} (except that if this
6274 number is less than 400, it is first set to 400).
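
  As a rough illustration, the growth policy just described can be
written as a small C function.  @code{toy_next_specpdl_size()} is a
hypothetical helper, not part of XEmacs.  It only computes the new slot
count.  It does not show what the real @code{grow_specpdl()} does when
the array has already reached its limit.

@verbatim
```c
#include <assert.h>

/* Hypothetical helper, not an XEmacs function: returns the new slot
   count when the specpdl array must grow from SIZE slots.  *MAX_SIZE
   plays the role of max_specpdl_size and may be raised to 400. */
static int toy_next_specpdl_size (int size, int *max_size)
{
  int new_size;

  assert (size > 0);
  if (*max_size < 400)
    *max_size = 400;            /* the limit is first raised to 400 */
  new_size = size * 2;          /* double the array ... */
  if (new_size > *max_size)
    new_size = *max_size;       /* ... but never beyond the limit */
  return new_size;
}
```
@end verbatim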
6275
6276   @code{specbind()} binds a symbol to a value and is used for local
6277 variables and @code{let} forms.  The symbol and its old value (which
6278 might be @code{Qunbound}, indicating no prior value) are recorded in the
specpdl array, and @code{specpdl_ptr} is advanced by one slot.
6280
6281   @code{record_unwind_protect()} implements an @dfn{unwind-protect},
6282 which, when placed around a section of code, ensures that some specified
6283 cleanup routine will be executed even if the code exits abnormally
6284 (e.g. through a @code{throw} or quit).  @code{record_unwind_protect()}
6285 simply adds a new specbinding to the @code{specpdl} array and stores the
6286 appropriate information in it.  The cleanup routine can either be a C
6287 function, which is stored in the @code{func} field, or a @code{progn}
6288 form, which is stored in the @code{old_value} field.
6289
6290   @code{unbind_to()} removes specbindings from the @code{specpdl} array
6291 until the specified position is reached.  Each specbinding can be one of
6292 three types:
6293
6294 @enumerate
6295 @item
6296 an unwind-protect with a C cleanup function (@code{func} is not 0, and
6297 @code{old_value} holds an argument to be passed to the function);
6298 @item
6299 an unwind-protect with a Lisp form (@code{func} is 0, @code{symbol}
6300 is @code{nil}, and @code{old_value} holds the form to be executed with
6301 @code{Fprogn()}); or
6302 @item
6303 a local-variable binding (@code{func} is 0, @code{symbol} is not
6304 @code{nil}, and @code{old_value} holds the old value, which is stored as
6305 the symbol's value).
6306 @end enumerate
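
  The following toy model is a sketch rather than the XEmacs code.  It
shows how @code{specbind()}, @code{record_unwind_protect()}, and
@code{unbind_to()} cooperate on a single stack, under these assumptions:

@itemize @bullet
@item
a ``symbol'' is just a pointer to an @code{int} value slot;
@item
the stack is a fixed-size array instead of a growable one;
@item
the Lisp-form unwind-protect case is left out;
@item
all @code{toy_} names are hypothetical.
@end itemize

@verbatim
```c
#include <assert.h>
#include <stddef.h>

/* Toy model (hypothetical names): a "symbol" is a pointer to an int
   value slot and values are ints.  Only two of the three entry kinds
   are modelled: local bindings and C-function unwind-protects. */
struct toy_specbinding
{
  int *symbol;              /* value slot to restore, or NULL */
  int old_value;            /* saved value, or argument for FUNC */
  void (*func) (int);       /* cleanup function, or NULL */
};

#define TOY_SPECPDL_SIZE 64   /* fixed size; the real array can grow */
static struct toy_specbinding toy_specpdl[TOY_SPECPDL_SIZE];
static struct toy_specbinding *toy_specpdl_ptr = toy_specpdl;

/* Number of entries currently on the stack. */
static int toy_specpdl_depth (void)
{
  return (int) (toy_specpdl_ptr - toy_specpdl);
}

/* Like specbind(): save the old value, then install the new one. */
static void toy_specbind (int *symbol, int value)
{
  assert (toy_specpdl_depth () < TOY_SPECPDL_SIZE);
  toy_specpdl_ptr->symbol = symbol;
  toy_specpdl_ptr->old_value = *symbol;
  toy_specpdl_ptr->func = NULL;
  toy_specpdl_ptr++;
  *symbol = value;
}

/* Like record_unwind_protect() with a C cleanup function. */
static void toy_record_unwind_protect (void (*func) (int), int arg)
{
  assert (toy_specpdl_depth () < TOY_SPECPDL_SIZE);
  toy_specpdl_ptr->symbol = NULL;
  toy_specpdl_ptr->old_value = arg;
  toy_specpdl_ptr->func = func;
  toy_specpdl_ptr++;
}

/* Like unbind_to(): pop entries, newest first, until COUNT remain. */
static void toy_unbind_to (int count)
{
  while (toy_specpdl_depth () > count)
    {
      struct toy_specbinding *b = --toy_specpdl_ptr;
      if (b->func != NULL)
        b->func (b->old_value);        /* run the cleanup */
      else
        *b->symbol = b->old_value;     /* restore the old value */
    }
}

/* Example cleanup function: records the argument it was given. */
static int toy_cleanup_arg;
static void toy_note_cleanup (int arg)
{
  toy_cleanup_arg = arg;
}
```
@end verbatim

@code{toy_unbind_to()} pops entries most recent first.  So if the same
variable is bound twice and the stack is then unwound, each intermediate
value is restored in turn, ending with the value the variable had before
the first binding.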
6307
6308 @node Simple Special Forms
6309 @section Simple Special Forms
6310
6311 @code{or}, @code{and}, @code{if}, @code{cond}, @code{progn},
6312 @code{prog1}, @code{prog2}, @code{setq}, @code{quote}, @code{function},
6313 @code{let*}, @code{let}, @code{while}
6314
6315 All of these are very simple and work as expected, calling
6316 @code{Feval()} or @code{Fprogn()} as necessary and (in the case of
6317 @code{let} and @code{let*}) using @code{specbind()} to create bindings
6318 and @code{unbind_to()} to undo the bindings when finished.
6319
Note that, with the exception of @code{Fprogn}, these functions are
typically called in practice only in interpreted code, since the byte
6322 compiler knows how to convert calls to these functions directly into
6323 byte code.
6324
6325 @node Catch and Throw
6326 @section Catch and Throw
6327
6328 @example
6329 struct catchtag
6330 @{
6331   Lisp_Object tag;
6332   Lisp_Object val;
6333   struct catchtag *next;
6334   struct gcpro *gcpro;
6335   jmp_buf jmp;
6336   struct backtrace *backlist;
6337   int lisp_eval_depth;
6338   int pdlcount;
6339 @};
6340 @end example
6341
  @code{catch} is a Lisp special form that places a catch around a body
of code.  A catch is a means of non-local exit from the code.  When a
catch is created, a tag is specified.  Executing a @code{throw} to this
tag exits from the body of code caught with this tag, and the
@code{catch} form then returns the value given in the call to
@code{throw}.  If no such @code{throw} occurs, the body is executed
normally and @code{catch} returns the value of its last form.
6348
6349   Information pertaining to a catch is held in a @code{struct catchtag},
6350 which is placed at the head of a linked list pointed to by
6351 @code{catchlist}.  @code{internal_catch()} is passed a C function to
6352 call (@code{Fprogn()} when Lisp @code{catch} is called) and arguments to
6353 give it, and places a catch around the function.  Each @code{struct
6354 catchtag} is held in the stack frame of the @code{internal_catch()}
6355 instance that created the catch.
6356
6357   @code{internal_catch()} is fairly straightforward.  It stores into the
6358 @code{struct catchtag} the tag name and the current values of
6359 @code{backtrace_list}, @code{lisp_eval_depth}, @code{gcprolist}, and the
6360 offset into the @code{specpdl} array, sets a jump point with @code{_setjmp()}
6361 (storing the jump point into the @code{struct catchtag}), and calls the
6362 function.  Control will return to @code{internal_catch()} either when
6363 the function exits normally or through a @code{_longjmp()} to this jump
6364 point.  In the latter case, @code{throw} will store the value to be
6365 returned into the @code{struct catchtag} before jumping.  When it's
6366 done, @code{internal_catch()} removes the @code{struct catchtag} from
6367 the catchlist and returns the proper value.
6368
6369   @code{Fthrow()} goes up through the catchlist until it finds one with
6370 a matching tag.  It then calls @code{unbind_catch()} to restore
6371 everything to what it was when the appropriate catch was set, stores the
6372 return value in the @code{struct catchtag}, and jumps (with
6373 @code{_longjmp()}) to its jump point.
6374
6375   @code{unbind_catch()} removes all catches from the catchlist until it
6376 finds the correct one.  Some of the catches might have been placed for
6377 error-trapping, and if so, the appropriate entries on the handlerlist
6378 must be removed (see ``errors'').  @code{unbind_catch()} also restores
6379 the values of @code{gcprolist}, @code{backtrace_list}, and
@code{lisp_eval_depth}, and calls @code{unbind_to()} to undo any specbindings
6381 created since the catch.
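
  The @code{setjmp}/@code{longjmp} pattern behind @code{internal_catch()}
and @code{Fthrow()} can be sketched in standard C as follows.  This is a
simplified model, not the XEmacs implementation:

@itemize @bullet
@item
tags and values are @code{int}s;
@item
it uses the standard @code{setjmp()} and @code{longjmp()} rather than
@code{_setjmp()} and @code{_longjmp()};
@item
it restores only the catch list, not @code{gcprolist},
@code{backtrace_list}, @code{lisp_eval_depth}, or the specpdl.
@end itemize

The @code{val} field is declared @code{volatile} because it is modified
between the @code{setjmp()} and the @code{longjmp()}.

@verbatim
```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model (hypothetical names): tags and values are ints, and a
   catch records only its tag, value, list link and jump buffer. */
struct toy_catchtag
{
  int tag;
  volatile int val;             /* written between setjmp and longjmp */
  struct toy_catchtag *next;
  jmp_buf jmp;
};

static struct toy_catchtag *toy_catchlist = NULL;

/* Like internal_catch(): run FUNC (ARG) with a catch for TAG. */
static int toy_internal_catch (int tag, int (*func) (int), int arg)
{
  struct toy_catchtag c;        /* lives in this stack frame */

  c.tag = tag;
  c.val = 0;
  c.next = toy_catchlist;
  toy_catchlist = &c;
  if (setjmp (c.jmp) == 0)
    c.val = func (arg);         /* normal exit from FUNC */
  /* Otherwise toy_throw() stored c.val and jumped back here. */
  toy_catchlist = c.next;       /* remove this catch */
  return c.val;
}

/* Like Fthrow(): find the innermost catch for TAG and jump to it. */
static void toy_throw (int tag, int val)
{
  struct toy_catchtag *c;

  for (c = toy_catchlist; c != NULL; c = c->next)
    if (c->tag == tag)
      {
        toy_catchlist = c;      /* drop inner catches first */
        c->val = val;
        longjmp (c->jmp, 1);
      }
  abort ();                     /* toy: no catch for TAG */
}

/* Example bodies used to exercise the catch. */
static int toy_body_returns (int x)
{
  return x + 1;
}

static int toy_body_throws (int x)
{
  toy_throw (7, x * 2);
  return -1;                    /* not reached */
}

static int toy_body_nested (int x)
{
  return toy_internal_catch (99, toy_body_throws, x);
}
```
@end verbatim

In the nested case, the throw to tag 7 skips the inner catch for tag 99
entirely.  The outer @code{toy_internal_catch()} then returns the thrown
value.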
6382
6383
6384 @node Symbols and Variables, Buffers and Textual Representation, Evaluation; Stack Frames; Bindings, Top
6385 @chapter Symbols and Variables
6386
6387 @menu
6388 * Introduction to Symbols::
6389 * Obarrays::
6390 * Symbol Values::
6391 @end menu
6392
6393 @node Introduction to Symbols
6394 @section Introduction to Symbols
6395
6396   A symbol is basically just an object with four fields: a name (a
6397 string), a value (some Lisp object), a function (some Lisp object), and
6398 a property list (usually a list of alternating keyword/value pairs).
6399 What makes symbols special is that there is usually only one symbol with
6400 a given name, and the symbol is referred to by name.  This makes a
6401 symbol a convenient way of calling up data by name, i.e. of implementing
6402 variables. (The variable's value is stored in the @dfn{value slot}.)
6403 Similarly, functions are referenced by name, and the definition of the
6404 function is stored in a symbol's @dfn{function slot}.  This means that
6405 there can be a distinct function and variable with the same name.  The
6406 property list is used as a more general mechanism of associating
6407 additional values with particular names, and once again the namespace is
6408 independent of the function and variable namespaces.
6409
6410 @node Obarrays
6411 @section Obarrays
6412
6413   The identity of symbols with their names is accomplished through a
6414 structure called an obarray, which is just a poorly-implemented hash
6415 table mapping from strings to symbols whose name is that string. (I say
6416 ``poorly implemented'' because an obarray appears in Lisp as a vector
6417 with some hidden fields rather than as its own opaque type.  This is an
6418 Emacs Lisp artifact that should be fixed.)
6419
6420   Obarrays are implemented as a vector of some fixed size (which should
6421 be a prime for best results), where each ``bucket'' of the vector
6422 contains one or more symbols, threaded through a hidden @code{next}
6423 field in the symbol.  Lookup of a symbol in an obarray, and adding a
6424 symbol to an obarray, is accomplished through standard hash-table
6425 techniques.
6426
6427   The standard Lisp function for working with symbols and obarrays is
6428 @code{intern}.  This looks up a symbol in an obarray given its name; if
6429 it's not found, a new symbol is automatically created with the specified
6430 name, added to the obarray, and returned.  This is what happens when the
6431 Lisp reader encounters a symbol (or more precisely, encounters the name
6432 of a symbol) in some text that it is reading.  There is a standard
6433 obarray called @code{obarray} that is used for this purpose, although
6434 the Lisp programmer is free to create his own obarrays and @code{intern}
6435 symbols in them.
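
  A minimal C sketch of the bucket-and-chain lookup behind @code{intern}
and @code{intern-soft} follows.  The table size, the hash function, and
the @code{toy_} names are illustrative assumptions, not what XEmacs
uses.  Real obarrays live inside a Lisp vector rather than a C struct.

@verbatim
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy obarray (hypothetical names and sizes): a fixed array of
   buckets, each a chain of symbols threaded through a NEXT field. */
#define TOY_OBARRAY_SIZE 211    /* a prime, for better distribution */

struct toy_symbol
{
  char *name;
  struct toy_symbol *next;      /* next symbol in the same bucket */
};

struct toy_obarray
{
  struct toy_symbol *buckets[TOY_OBARRAY_SIZE];
};

/* Illustrative string hash, reduced to a bucket number. */
static unsigned toy_hash (const char *s)
{
  unsigned h = 0;

  while (*s != '\0')
    h = h * 31 + (unsigned char) *s++;
  return h % TOY_OBARRAY_SIZE;
}

/* Like intern-soft: return the symbol named NAME, or NULL. */
static struct toy_symbol *
toy_intern_soft (const struct toy_obarray *ob, const char *name)
{
  struct toy_symbol *sym;

  for (sym = ob->buckets[toy_hash (name)]; sym != NULL; sym = sym->next)
    if (strcmp (sym->name, name) == 0)
      return sym;
  return NULL;
}

/* Like intern: look NAME up, creating and adding a symbol if needed. */
static struct toy_symbol *
toy_intern (struct toy_obarray *ob, const char *name)
{
  struct toy_symbol *sym = toy_intern_soft (ob, name);
  unsigned bucket;

  if (sym != NULL)
    return sym;
  sym = malloc (sizeof *sym);
  assert (sym != NULL);             /* toy: no out-of-memory handling */
  sym->name = malloc (strlen (name) + 1);
  assert (sym->name != NULL);
  strcpy (sym->name, name);
  bucket = toy_hash (name);
  sym->next = ob->buckets[bucket];  /* push onto the bucket's chain */
  ob->buckets[bucket] = sym;
  return sym;
}
```
@end verbatim

Interning the same name into two different toy obarrays yields two
distinct symbols.  This mirrors the behavior of user-created obarrays.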
6436
  Note that, once a symbol is in an obarray, it stays there until it is
explicitly removed (e.g. with @code{unintern}), and the standard obarray @code{obarray}
6439 always stays around, so once you use any particular variable name, a
6440 corresponding symbol will stay around in @code{obarray} until you exit
6441 XEmacs.
6442
6443   Note that @code{obarray} itself is a variable, and as such there is a
6444 symbol in @code{obarray} whose name is @code{"obarray"} and which
6445 contains @code{obarray} as its value.
6446
  Note also that the reader's call to @code{intern} happens only when the
code is read, not when it is executed (at which point the symbol is
6449 already around, stored as such in the definition of the function).
6450
6451   You can create your own obarray using @code{make-vector} (this is
6452 horrible but is an artifact) and intern symbols into that obarray.
Doing so can result in two or more distinct symbols with the same name.
6454 However, at most one of these symbols is in the standard @code{obarray}:
6455 You cannot have two symbols of the same name in any particular obarray.
6456 Note that you cannot add a symbol to an obarray in any fashion other
6457 than using @code{intern}: i.e. you can't take an existing symbol and put
6458 it in an existing obarray.  Nor can you change the name of an existing
6459 symbol. (Since obarrays are vectors, you can violate the consistency of
6460 things by storing directly into the vector, but let's ignore that
6461 possibility.)
6462
6463   Usually symbols are created by @code{intern}, but if you really want,
6464 you can explicitly create a symbol using @code{make-symbol}, giving it
6465 some name.  The resulting symbol is not in any obarray (i.e. it is
6466 @dfn{uninterned}), and you can't add it to any obarray.  Therefore its
6467 primary purpose is as a symbol to use in macros to avoid namespace
6468 pollution.  It can also be used as a carrier of information, but cons
6469 cells could probably be used just as well.
6470
6471   You can also use @code{intern-soft} to look up a symbol but not create
6472 a new one, and @code{unintern} to remove a symbol from an obarray.  This
6473 returns the removed symbol. (Remember: You can't put the symbol back
6474 into any obarray.) Finally, @code{mapatoms} maps over all of the symbols
6475 in an obarray.
6476
6477 @node Symbol Values
6478 @section Symbol Values
6479
6480   The value field of a symbol normally contains a Lisp object.  However,
6481 a symbol can be @dfn{unbound}, meaning that it logically has no value.
This is indicated internally by storing a special Lisp object, called
@dfn{the unbound marker}, which is kept in the global variable
6484 @code{Qunbound}.  The unbound marker is of a special Lisp object type
6485 called @dfn{symbol-value-magic}.  It is impossible for the Lisp
6486 programmer to directly create or access any object of this type.
6487
6488   @strong{You must not let any ``symbol-value-magic'' object escape to
6489 the Lisp level.}  Printing any of these objects will cause the message
6490 @samp{INTERNAL EMACS BUG} to appear as part of the print representation.
6491 (You may see this normally when you call @code{debug_print()} from the
6492 debugger on a Lisp object.) If you let one of these objects escape to
6493 the Lisp level, you will violate a number of assumptions contained in
6494 the C code and make the unbound marker not function right.
6495
  When a symbol is created, its value and function fields are both set
to @code{Qunbound}.  The Lisp programmer can restore these conditions
later using @code{makunbound} or @code{fmakunbound}, and can test
whether the value or function field is @dfn{bound} (i.e. has a
6500 value other than @code{Qunbound}) using @code{boundp} and
6501 @code{fboundp}.  The fields are set to a normal Lisp object using
6502 @code{set} (or @code{setq}) and @code{fset}.
6503
6504   Other symbol-value-magic objects are used as special markers to
6505 indicate variables that have non-normal properties.  This includes any
6506 variables that are tied into C variables (setting the variable magically
6507 sets some global variable in the C code, and likewise for retrieving the
6508 variable's value), variables that magically tie into slots in the
6509 current buffer, variables that are buffer-local, etc.  The
6510 symbol-value-magic object is stored in the value cell in place of
6511 a normal object, and the code to retrieve a symbol's value
6512 (i.e. @code{symbol-value}) knows how to do special things with them.
6513 This means that you should not just fetch the value cell directly if you
6514 want a symbol's value.
6515
6516   The exact workings of this are rather complex and involved and are
6517 well-documented in comments in @file{buffer.c}, @file{symbols.c}, and
6518 @file{lisp.h}.
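
  As a much-simplified illustration of the idea, a value cell can hold
either a normal value or a marker, and accessors interpret the cell
rather than returning it raw.  This toy model makes strong assumptions.
Values are @code{int}s.  Besides the unbound marker there is only one
kind of magic, which forwards the variable to a C @code{int} variable.
The @code{toy_} names are hypothetical.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

/* Toy value cell (hypothetical names): holds a normal value, the
   unbound marker, or a "magic" object forwarding to a C variable. */
enum toy_cell_kind { TOY_NORMAL, TOY_UNBOUND, TOY_FORWARD_INT };

struct toy_symbol
{
  enum toy_cell_kind kind;
  int value;                  /* meaningful when kind == TOY_NORMAL */
  int *forward;               /* meaningful when kind == TOY_FORWARD_INT */
};

/* Like boundp. */
static int toy_boundp (const struct toy_symbol *s)
{
  return s->kind != TOY_UNBOUND;
}

/* Like symbol-value: interpret the cell instead of returning it raw. */
static int toy_symbol_value (const struct toy_symbol *s)
{
  assert (toy_boundp (s));    /* the real code signals an error instead */
  return s->kind == TOY_FORWARD_INT ? *s->forward : s->value;
}

/* Like set: writing a forwarded variable writes the C variable. */
static void toy_set (struct toy_symbol *s, int v)
{
  if (s->kind == TOY_FORWARD_INT)
    *s->forward = v;
  else
    {
      s->kind = TOY_NORMAL;
      s->value = v;
    }
}

/* Like makunbound, limited here to ordinary variables. */
static void toy_makunbound (struct toy_symbol *s)
{
  assert (s->kind != TOY_FORWARD_INT);
  s->kind = TOY_UNBOUND;
}
```
@end verbatim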
6519
6520 @node Buffers and Textual Representation, MULE Character Sets and Encodings, Symbols and Variables, Top
6521 @chapter Buffers and Textual Representation
6522
6523 @menu
6524 * Introduction to Buffers::     A buffer holds a block of text such as a file.
6525 * The Text in a Buffer::        Representation of the text in a buffer.
6526 * Buffer Lists::                Keeping track of all buffers.
6527 * Markers and Extents::         Tagging locations within a buffer.
6528 * Bufbytes and Emchars::        Representation of individual characters.
6529 * The Buffer Object::           The Lisp object corresponding to a buffer.
6530 @end menu
6531
6532 @node Introduction to Buffers
6533 @section Introduction to Buffers
6534
6535   A buffer is logically just a Lisp object that holds some text.
6536 In this, it is like a string, but a buffer is optimized for
6537 frequent insertion and deletion, while a string is not.  Furthermore:
6538
6539 @enumerate
6540 @item
6541 Buffers are @dfn{permanent} objects, i.e. once you create them, they
6542 remain around, and need to be explicitly deleted before they go away.
6543 @item
6544 Each buffer has a unique name, which is a string.  Buffers are
6545 normally referred to by name.  In this respect, they are like
6546 symbols.
6547 @item
6548 Buffers have a default insertion position, called @dfn{point}.
6549 Inserting text (unless you explicitly give a position) goes at point,
6550 and moves point forward past the text.  This is what is going on when
6551 you type text into Emacs.
6552 @item
6553 Buffers have lots of extra properties associated with them.
6554 @item
6555 Buffers can be @dfn{displayed}.  What this means is that there
6556 exist a number of @dfn{windows}, which are objects that correspond
6557 to some visible section of your display, and each window has
6558 an associated buffer, and the current contents of the buffer
6559 are shown in that section of the display.  The redisplay mechanism
6560 (which takes care of doing this) knows how to look at the
6561 text of a buffer and come up with some reasonable way of displaying
6562 this.  Many of the properties of a buffer control how the
6563 buffer's text is displayed.
6564 @item
6565 One buffer is distinguished and called the @dfn{current buffer}.  It is
6566 stored in the variable @code{current_buffer}.  Buffer operations operate
6567 on this buffer by default.  When you are typing text into a buffer, the
6568 buffer you are typing into is always @code{current_buffer}.  Switching
6569 to a different window changes the current buffer.  Note that Lisp code
6570 can temporarily change the current buffer using @code{set-buffer} (often
6571 enclosed in a @code{save-excursion} so that the former current buffer
6572 gets restored when the code is finished).  However, calling
6573 @code{set-buffer} will NOT cause a permanent change in the current
6574 buffer.  The reason for this is that the top-level event loop sets
6575 @code{current_buffer} to the buffer of the selected window, each time
6576 it finishes executing a user command.
6577 @end enumerate
6578
6579   Make sure you understand the distinction between @dfn{current buffer}
6580 and @dfn{buffer of the selected window}, and the distinction between
6581 @dfn{point} of the current buffer and @dfn{window-point} of the selected
6582 window. (This latter distinction is explained in detail in the section
6583 on windows.)
6584
6585 @node The Text in a Buffer
6586 @section The Text in a Buffer
6587
6588   The text in a buffer consists of a sequence of zero or more
6589 characters.  A @dfn{character} is an integer that logically represents
6590 a letter, number, space, or other unit of text.  Most of the characters
6591 that you will typically encounter belong to the ASCII set of characters,
6592 but there are also characters for various sorts of accented letters,
6593 special symbols, Chinese and Japanese ideograms (i.e. Kanji, Katakana,
6594 etc.), Cyrillic and Greek letters, etc.  The actual number of possible
6595 characters is quite large.
6596
6597   For now, we can view a character as some non-negative integer that
6598 has some shape that defines how it typically appears (e.g. as an
6599 uppercase A). (The exact way in which a character appears depends on the
6600 font used to display the character.) The internal type of characters in
6601 the C code is an @code{Emchar}; this is just an @code{int}, but using a
6602 symbolic type makes the code clearer.
6603
6604   Between every character in a buffer is a @dfn{buffer position} or
6605 @dfn{character position}.  We can speak of the character before or after
6606 a particular buffer position, and when you insert a character at a
6607 particular position, all characters after that position end up at new
6608 positions.  When we speak of the character @dfn{at} a position, we
6609 really mean the character after the position.  (This schizophrenia
6610 between a buffer position being ``between'' a character and ``on'' a
6611 character is rampant in Emacs.)
6612
6613   Buffer positions are numbered starting at 1.  This means that
6614 position 1 is before the first character, and position 0 is not
6615 valid.  If there are N characters in a buffer, then buffer
6616 position N+1 is after the last one, and position N+2 is not valid.
6617
6618   The internal makeup of the Emchar integer varies depending on whether
6619 we have compiled with MULE support.  If not, the Emchar integer is an
6620 8-bit integer with possible values from 0 - 255.  0 - 127 are the
6621 standard ASCII characters, while 128 - 255 are the characters from the
6622 ISO-8859-1 character set.  If we have compiled with MULE support, an
6623 Emchar is a 19-bit integer, with the various bits having meanings
6624 according to a complex scheme that will be detailed later.  The
6625 characters numbered 0 - 255 still have the same meanings as for the
6626 non-MULE case, though.
6627
6628   Internally, the text in a buffer is represented in a fairly simple
6629 fashion: as a contiguous array of bytes, with a @dfn{gap} of some size
6630 in the middle.  Although the gap is of some substantial size in bytes,
6631 there is no text contained within it: From the perspective of the text
6632 in the buffer, it does not exist.  The gap logically sits at some buffer
6633 position, between two characters (or possibly at the beginning or end of
6634 the buffer).  Insertion of text in a buffer at a particular position is
6635 always accomplished by first moving the gap to that position
6636 (i.e. through some block moving of text), then writing the text into the
6637 beginning of the gap, thereby shrinking the gap.  If the gap shrinks
6638 down to nothing, a new gap is created. (What actually happens is that a
6639 new gap is ``created'' at the end of the buffer's text, which requires
6640 nothing more than changing a couple of indices; then the gap is
6641 ``moved'' to the position where the insertion needs to take place by
6642 moving up in memory all the text after that position.)  Similarly,
6643 deletion occurs by moving the gap to the place where the text is to be
6644 deleted, and then simply expanding the gap to include the deleted text.
6645 (@dfn{Expanding} and @dfn{shrinking} the gap as just described means
6646 just that the internal indices that keep track of where the gap is
6647 located are changed.)
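
  The gap mechanics above can be sketched as a small C structure for the
non-MULE case, where one character is one byte.  This is a hypothetical
toy, not the XEmacs data structure.  The field and function names are
invented for illustration, the growth amount is arbitrary, and
out-of-memory handling is reduced to an @code{assert}.

@verbatim
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy gap buffer, non-MULE case (hypothetical names).  Positions are
   1-based as in the text; the C array is 0-based, so the character
   after position P is at array offset P - 1 when P is before the gap. */
struct toy_gap_buffer
{
  unsigned char *mem;   /* text before gap, then gap, then text after */
  int size;             /* total bytes allocated */
  int gpt;              /* position at which the gap sits */
  int gap_size;         /* bytes currently in the gap */
};

static int toy_len (const struct toy_gap_buffer *b)
{
  return b->size - b->gap_size;
}

static void toy_init (struct toy_gap_buffer *b, int initial_gap)
{
  b->mem = malloc ((size_t) initial_gap);
  assert (b->mem != NULL);
  b->size = initial_gap;
  b->gpt = 1;
  b->gap_size = initial_gap;
}

/* Move the gap to POS by sliding the text between POS and the gap
   across it. */
static void toy_move_gap (struct toy_gap_buffer *b, int pos)
{
  assert (pos >= 1 && pos <= toy_len (b) + 1);
  if (pos < b->gpt)        /* text in [pos, gpt) moves up past the gap */
    memmove (b->mem + pos - 1 + b->gap_size, b->mem + pos - 1,
             (size_t) (b->gpt - pos));
  else if (pos > b->gpt)   /* text in [gpt, pos) moves down before it */
    memmove (b->mem + b->gpt - 1, b->mem + b->gpt - 1 + b->gap_size,
             (size_t) (pos - b->gpt));
  b->gpt = pos;
}

/* Insert S at POS: move the gap there, then fill its beginning. */
static void toy_insert (struct toy_gap_buffer *b, int pos, const char *s)
{
  int n = (int) strlen (s);

  if (n > b->gap_size)
    {
      /* Gap too small: put it at the end of the text and enlarge the
         block, so the new space simply extends the gap. */
      int extra = n + 16;  /* arbitrary growth amount */
      unsigned char *p;

      toy_move_gap (b, toy_len (b) + 1);
      p = realloc (b->mem, (size_t) (b->size + extra));
      assert (p != NULL);  /* toy: no out-of-memory handling */
      b->mem = p;
      b->size += extra;
      b->gap_size += extra;
    }
  toy_move_gap (b, pos);
  memcpy (b->mem + pos - 1, s, (size_t) n);
  b->gpt += n;
  b->gap_size -= n;
}

/* Delete the text between positions FROM and TO: move the gap to TO,
   then widen it backwards over the deleted bytes. */
static void toy_delete (struct toy_gap_buffer *b, int from, int to)
{
  assert (1 <= from && from <= to && to <= toy_len (b) + 1);
  toy_move_gap (b, to);
  b->gpt = from;
  b->gap_size += to - from;
}

/* Copy the text, skipping the gap, into OUT (NUL-terminated). */
static void toy_contents (const struct toy_gap_buffer *b, char *out)
{
  int before = b->gpt - 1;

  memcpy (out, b->mem, (size_t) before);
  memcpy (out + before, b->mem + before + b->gap_size,
          (size_t) (toy_len (b) - before));
  out[toy_len (b)] = '\0';
}
```
@end verbatim

@code{toy_delete()} never copies the deleted bytes anywhere.  It only
changes the gap indices so that the gap covers them.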
6648
6649   Note that the total amount of memory allocated for a buffer text never
6650 decreases while the buffer is live.  Therefore, if you load up a
6651 20-megabyte file and then delete all but one character, there will be a
6652 20-megabyte gap, which won't get any smaller (except by inserting
6653 characters back again).  Once the buffer is killed, the memory allocated
6654 for the buffer text will be freed, but it will still be sitting on the
6655 heap, taking up virtual memory, and will not be released back to the
6656 operating system. (However, if you have compiled XEmacs with rel-alloc,
6657 the situation is different.  In this case, the space @emph{will} be
6658 released back to the operating system.  However, this tends to result in a
6659 noticeable speed penalty.)
6660
6661   Astute readers may notice that the text in a buffer is represented as
6662 an array of @emph{bytes}, while (at least in the MULE case) an Emchar is
6663 a 19-bit integer, which clearly cannot fit in a byte.  This means (of
6664 course) that the text in a buffer uses a different representation from
6665 an Emchar: specifically, the 19-bit Emchar becomes a series of one to
6666 four bytes.  The conversion between these two representations is complex
6667 and will be described later.
6668
6669   In the non-MULE case, everything is very simple: An Emchar
6670 is an 8-bit value, which fits neatly into one byte.
6671
6672   If we are given a buffer position and want to retrieve the
6673 character at that position, we need to follow these steps:
6674
6675 @enumerate
6676 @item
6677 Pretend there's no gap, and convert the buffer position into a @dfn{byte
6678 index} that indexes to the appropriate byte in the buffer's stream of
6679 textual bytes.  By convention, byte indices begin at 1, just like buffer
6680 positions.  In the non-MULE case, byte indices and buffer positions are
6681 identical, since one character equals one byte.
6682 @item
6683 Convert the byte index into a @dfn{memory index}, which takes the gap
6684 into account.  The memory index is a direct index into the block of
6685 memory that stores the text of a buffer.  This basically just involves
6686 checking to see if the byte index is past the gap, and if so, adding the
6687 size of the gap to it.  By convention, memory indices begin at 1, just
6688 like buffer positions and byte indices, and when referring to the
6689 position that is @dfn{at} the gap, we always use the memory position at
6690 the @emph{beginning}, not at the end, of the gap.
6691 @item
6692 Fetch the appropriate bytes at the determined memory position.
6693 @item
6694 Convert these bytes into an Emchar.
6695 @end enumerate
6696
  In the non-MULE case, steps (3) and (4) boil down to a simple one-byte
6698 memory access.
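
  For the non-MULE case, the four steps above reduce to a few lines of
C.  The sketch below borrows the typedef names @code{Bufpos},
@code{Bytind}, and @code{Memind} that this chapter uses, but it is
otherwise hypothetical.  It assumes 1-based indices and a gap that
starts at byte index @code{gpt}.

One subtlety concerns a position that sits exactly at the gap.  The
character @emph{after} such a position is physically stored just past
the gap.  So the fetch in @code{toy_char_at()} skips the gap when the
index is @code{>=} @code{gpt}.  By contrast,
@code{toy_bytind_to_memind()} uses @code{>}, following the at-the-gap
convention of step 2.

@verbatim
```c
#include <assert.h>

/* Hypothetical sketch of the lookup steps, non-MULE case only.  The
   typedef names follow this chapter; everything else is illustrative.
   All indices are 1-based, and the gap starts at byte index GPT. */
typedef int Bufpos;
typedef int Bytind;
typedef int Memind;

/* Step 1: without MULE, one character is one byte. */
static Bytind toy_bufpos_to_bytind (Bufpos pos)
{
  return pos;
}

/* Step 2: account for the gap.  A byte index exactly at the gap maps
   to the beginning of the gap, following the convention above. */
static Memind toy_bytind_to_memind (Bytind b, Bytind gpt, int gap_size)
{
  return b > gpt ? b + gap_size : b;
}

/* Steps 3 and 4: fetch the character after position POS.  If POS is
   exactly at the gap, that character is stored just past the gap, so
   the test here is >= rather than >. */
static int toy_char_at (const unsigned char *mem, Bufpos pos,
                        Bytind gpt, int gap_size)
{
  Bytind b = toy_bufpos_to_bytind (pos);
  Memind m = b >= gpt ? b + gap_size : b;

  return mem[m - 1];   /* C arrays are 0-based */
}
```
@end verbatim

Consider the text @samp{ab} and @samp{cd} on either side of a two-byte
gap at byte index 3.  Byte index 3 maps to memory index 3, the start of
the gap.  The character at position 3, however, is @samp{c}, which is
stored at memory index 5.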
6699
6700   Note that we have defined three types of positions in a buffer:
6701
6702 @enumerate
6703 @item
6704 @dfn{buffer positions} or @dfn{character positions}, typedef @code{Bufpos}
6705 @item
6706 @dfn{byte indices}, typedef @code{Bytind}
6707 @item
6708 @dfn{memory indices}, typedef @code{Memind}
6709 @end enumerate
6710
6711   All three typedefs are just @code{int}s, but defining them this way makes
6712 things a lot clearer.
6713
6714   Most code works with buffer positions.  In particular, all Lisp code
6715 that refers to text in a buffer uses buffer positions.  Lisp code does
6716 not know that byte indices or memory indices exist.
6717
6718   Finally, we have a typedef for the bytes in a buffer.  This is a
6719 @code{Bufbyte}, which is an unsigned char.  Referring to them as
6720 Bufbytes underscores the fact that we are working with a string of bytes
6721 in the internal Emacs buffer representation rather than in one of a
6722 number of possible alternative representations (e.g. EUC-encoded text,
6723 etc.).
6724
6725 @node Buffer Lists
6726 @section Buffer Lists
6727
  Recall that buffers are @dfn{permanent} objects, i.e. that
6729 they remain around until explicitly deleted.  This entails that there is
6730 a list of all the buffers in existence.  This list is actually an
6731 assoc-list (mapping from the buffer's name to the buffer) and is stored
6732 in the global variable @code{Vbuffer_alist}.
6733
6734   The order of the buffers in the list is important: the buffers are
6735 ordered approximately from most-recently-used to least-recently-used.
6736 Switching to a buffer using @code{switch-to-buffer},
6737 @code{pop-to-buffer}, etc. and switching windows using
@code{other-window}, etc. usually brings the new current buffer to the
6739 front of the list.  @code{switch-to-buffer}, @code{other-buffer},
6740 etc. look at the beginning of the list to find an alternative buffer to
6741 suggest.  You can also explicitly move a buffer to the end of the list
6742 using @code{bury-buffer}.
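
  The reordering can be pictured with a toy C model in which the buffer
list is simply an array of buffer ids, most recently used first.  This is
only a sketch.  The real list is the alist @code{Vbuffer_alist}, plus
the per-frame lists.  @code{toy_move_to_front()} and @code{toy_bury()}
are hypothetical names.

@verbatim
```c
#include <assert.h>
#include <string.h>

/* Toy model (hypothetical names): the buffer list as an array of
   buffer ids, kept roughly most-recently-used first. */

/* Return the index of ID in LIST, or -1 if absent. */
static int toy_find (const int *list, int n, int id)
{
  int i;

  for (i = 0; i < n; i++)
    if (list[i] == id)
      return i;
  return -1;
}

/* Switching to a buffer: move it to the front of the list. */
static void toy_move_to_front (int *list, int n, int id)
{
  int i = toy_find (list, n, id);

  if (i < 0)
    return;
  memmove (list + 1, list, (size_t) i * sizeof *list);
  list[0] = id;
}

/* Like bury-buffer: move it to the end of the list. */
static void toy_bury (int *list, int n, int id)
{
  int i = toy_find (list, n, id);

  if (i < 0)
    return;
  memmove (list + i, list + i + 1, (size_t) (n - i - 1) * sizeof *list);
  list[n - 1] = id;
}
```
@end verbatim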
6743
6744   In addition to the global ordering in @code{Vbuffer_alist}, each frame
6745 has its own ordering of the list.  These lists always contain the same
6746 elements as in @code{Vbuffer_alist} although possibly in a different
6747 order.  @code{buffer-list} normally returns the list for the selected
6748 frame.  This allows you to work in separate frames without things
6749 interfering with each other.
6750
6751   The standard way to look up a buffer given a name is
6752 @code{get-buffer}, and the standard way to create a new buffer is
6753 @code{get-buffer-create}, which looks up a buffer with a given name,
6754 creating a new one if necessary.  These operations correspond exactly
6755 with the symbol operations @code{intern-soft} and @code{intern},
6756 respectively.  You can also force a new buffer to be created using
6757 @code{generate-new-buffer}, which takes a name and (if necessary) makes
6758 a unique name from this by appending a number, and then creates the
6759 buffer.  This is basically like the symbol operation @code{gensym}.
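
  A sketch of the name-generation step in C follows.  The text above only
says that a number is appended.  This sketch assumes the @samp{NAME<N>}
form, starting at 2, which matches the usual Emacs convention.  The
``is this name in use?'' test is passed in as a callback that stands in
for @code{get-buffer}.  All @code{toy_} names are hypothetical.

@verbatim
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fill OUT with NAME if unused, else with the first unused NAME<N>
   for N = 2, 3, ...  Returns 1 on success, 0 if OUT is too small. */
static int toy_generate_new_buffer_name (const char *name,
                                         int (*name_taken) (const char *),
                                         char *out, size_t outsize)
{
  int n, len;

  len = snprintf (out, outsize, "%s", name);
  if (len < 0 || (size_t) len >= outsize)
    return 0;
  if (!name_taken (out))
    return 1;
  for (n = 2; ; n++)
    {
      len = snprintf (out, outsize, "%s<%d>", name, n);
      if (len < 0 || (size_t) len >= outsize)
        return 0;
      if (!name_taken (out))
        return 1;
    }
}

/* Example predicate standing in for a get-buffer lookup: pretend that
   buffers named "foo" and "foo<2>" already exist. */
static int toy_example_taken (const char *s)
{
  return strcmp (s, "foo") == 0 || strcmp (s, "foo<2>") == 0;
}
```
@end verbatim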
6760
6761 @node Markers and Extents
6762 @section Markers and Extents
6763
6764   Among the things associated with a buffer are things that are
6765 logically attached to certain buffer positions.  This can be used to
6766 keep track of a buffer position when text is inserted and deleted, so
6767 that it remains at the same spot relative to the text around it; to
6768 assign properties to particular sections of text; etc.  There are two
6769 such objects that are useful in this regard: they are @dfn{markers} and
6770 @dfn{extents}.
6771
6772   A @dfn{marker} is simply a flag placed at a particular buffer
6773 position, which is moved around as text is inserted and deleted.
6774 Markers are used for all sorts of purposes, such as the @code{mark} that
6775 is the other end of textual regions to be cut, copied, etc.
6776
6777   An @dfn{extent} is similar to two markers plus some associated
6778 properties, and is used to keep track of regions in a buffer as text is
6779 inserted and deleted, and to add properties (e.g. fonts) to particular
6780 regions of text.  The external interface of extents is explained
6781 elsewhere.
6782
6783   The important thing here is that markers and extents simply contain
6784 buffer positions in them as integers, and every time text is inserted or
6785 deleted, these positions must be updated.  In order to minimize the
6786 amount of shuffling that needs to be done, the positions in markers and
extents (there's one per marker, two per extent) are stored in Meminds.
6788 This means that they only need to be moved when the text is physically
6789 moved in memory; since the gap structure tries to minimize this, it also
6790 minimizes the number of marker and extent indices that need to be
6791 adjusted.  Look in @file{insdel.c} for the details of how this works.
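
  The following toy function shows why storing Meminds keeps the work
small.  It was written for this explanation rather than taken from
@file{insdel.c}.  When the gap moves, only indices that fall inside the
block of text that physically moved are adjusted, and each changes by
exactly the gap size.  The sketch assumes 1-based indices, the
at-the-gap convention from the previous section, and a gap whose size
does not change during the move.

@verbatim
```c
#include <assert.h>

typedef int Bytind;
typedef int Memind;

/* Stored-index convention: a byte index at or before the gap keeps
   its value; one past the gap has the gap size added. */
static Memind toy_bytind_to_memind (Bytind b, Bytind gpt, int gap_size)
{
  return b > gpt ? b + gap_size : b;
}

/* Update the Meminds in M[0..N-1] after the gap (GAP_SIZE bytes) moves
   from OLD_GPT to NEW_GPT.  Only indices inside the block of text that
   physically moved are touched.  Returns how many were changed. */
static int toy_adjust_for_gap_move (Memind *m, int n, Bytind old_gpt,
                                    Bytind new_gpt, int gap_size)
{
  int i, changed = 0;

  for (i = 0; i < n; i++)
    {
      if (new_gpt < old_gpt && m[i] > new_gpt && m[i] <= old_gpt)
        {
          m[i] += gap_size;     /* text slid up past the gap */
          changed++;
        }
      else if (new_gpt > old_gpt && m[i] > old_gpt + gap_size
               && m[i] <= new_gpt + gap_size)
        {
          m[i] -= gap_size;     /* text slid down before the gap */
          changed++;
        }
    }
  return changed;
}
```
@end verbatim

Suppose a five-byte gap moves from byte index 7 to byte index 3 in a
buffer with ten positions.  Only the four stored indices for positions 4
through 7 change; every other marker keeps its stored value.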
6792
6793   One other important distinction is that markers are @dfn{temporary}
6794 while extents are @dfn{permanent}.  This means that markers disappear as
6795 soon as there are no more pointers to them, and correspondingly, there
6796 is no way to determine what markers are in a buffer if you are just
6797 given the buffer.  Extents remain in a buffer until they are detached
6798 (which could happen as a result of text being deleted) or the buffer is
6799 deleted, and primitives do exist to enumerate the extents in a buffer.
6800
6801 @node Bufbytes and Emchars
6802 @section Bufbytes and Emchars
6803
6804   Not yet documented.
6805
6806 @node The Buffer Object
6807 @section The Buffer Object
6808
6809   Buffers contain fields not directly accessible by the Lisp programmer.
6810 We describe them here, naming them by the names used in the C code.
6811 Many are accessible indirectly in Lisp programs via Lisp primitives.
6812
6813 @table @code
6814 @item name
6815 The buffer name is a string that names the buffer.  It is guaranteed to
6816 be unique.  @xref{Buffer Names,,, lispref, XEmacs Lisp Programmer's
6817 Manual}.
6818
6819 @item save_modified
6820 This field contains the time when the buffer was last saved, as an
6821 integer.  @xref{Buffer Modification,,, lispref, XEmacs Lisp Programmer's
6822 Manual}.
6823
6824 @item modtime
6825 This field contains the modification time of the visited file.  It is
6826 set when the file is written or read.  Every time the buffer is written
6827 to the file, this field is compared to the modification time of the
6828 file.  @xref{Buffer Modification,,, lispref, XEmacs Lisp Programmer's
6829 Manual}.
6830
6831 @item auto_save_modified
6832 This field contains the time when the buffer was last auto-saved.
6833
6834 @item last_window_start
6835 This field contains the @code{window-start} position in the buffer as of
6836 the last time the buffer was displayed in a window.
6837
6838 @item undo_list
6839 This field points to the buffer's undo list.  @xref{Undo,,, lispref,
6840 XEmacs Lisp Programmer's Manual}.
6841
6842 @item syntax_table_v
6843 This field contains the syntax table for the buffer.  @xref{Syntax
6844 Tables,,, lispref, XEmacs Lisp Programmer's Manual}.
6845
6846 @item downcase_table
6847 This field contains the conversion table for converting text to lower
6848 case.  @xref{Case Tables,,, lispref, XEmacs Lisp Programmer's Manual}.
6849
6850 @item upcase_table
6851 This field contains the conversion table for converting text to upper
6852 case.  @xref{Case Tables,,, lispref, XEmacs Lisp Programmer's Manual}.
6853
6854 @item case_canon_table
6855 This field contains the conversion table for canonicalizing text for
6856 case-folding search.  @xref{Case Tables,,, lispref, XEmacs Lisp
6857 Programmer's Manual}.
6858
6859 @item case_eqv_table
6860 This field contains the equivalence table for case-folding search.
6861 @xref{Case Tables,,, lispref, XEmacs Lisp Programmer's Manual}.
6862
6863 @item display_table
6864 This field contains the buffer's display table, or @code{nil} if it
6865 doesn't have one.  @xref{Display Tables,,, lispref, XEmacs Lisp
6866 Programmer's Manual}.
6867
6868 @item markers
6869 This field contains the chain of all markers that currently point into
6870 the buffer.  Deletion of text in the buffer, and motion of the buffer's
6871 gap, must check each of these markers and perhaps update it.
6872 @xref{Markers,,, lispref, XEmacs Lisp Programmer's Manual}.
6873
6874 @item backed_up
6875 This field is a flag that tells whether a backup file has been made for
6876 the visited file of this buffer.
6877
6878 @item mark
6879 This field contains the mark for the buffer.  The mark is a marker,
6880 hence it is also included on the list @code{markers}.  @xref{The Mark,,,
6881 lispref, XEmacs Lisp Programmer's Manual}.
6882
6883 @item mark_active
6884 This field is non-@code{nil} if the buffer's mark is active.
6885
6886 @item local_var_alist
6887 This field contains the association list describing the variables local
6888 in this buffer, and their values, with the exception of local variables
6889 that have special slots in the buffer object.  (Those slots are omitted
6890 from this table.)  @xref{Buffer-Local Variables,,, lispref, XEmacs Lisp
6891 Programmer's Manual}.
6892
6893 @item modeline_format
6894 This field contains a Lisp object which controls how to display the mode
6895 line for this buffer.  @xref{Modeline Format,,, lispref, XEmacs Lisp
6896 Programmer's Manual}.
6897
6898 @item base_buffer
6899 This field holds the buffer's base buffer (if it is an indirect buffer),
6900 or @code{nil}.
6901 @end table
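The @code{markers} entry above says that deleting text must check every marker in the chain.  The following is a minimal sketch of that adjustment, not the XEmacs code: it assumes a simplified, singly linked chain of markers holding character positions, and the @code{struct marker} layout and the function @code{adjust_markers_for_delete} are hypothetical.  The real code also has to handle motion of the gap.

```c
#include <stddef.h>

/* Hypothetical, simplified marker.  The real XEmacs marker is a Lisp
   object, and positions must also be adjusted for the buffer gap. */
struct marker {
  long charpos;           /* position the marker points at */
  struct marker *next;    /* next marker in the buffer's chain */
};

/* Adjust every marker in CHAIN after the text in [FROM, TO) has been
   deleted.  Markers inside the deleted region collapse onto FROM;
   markers after it move back by the number of deleted characters;
   markers before it are untouched. */
static void
adjust_markers_for_delete (struct marker *chain, long from, long to)
{
  long len = to - from;

  for (struct marker *m = chain; m != NULL; m = m->next)
    {
      if (m->charpos >= to)
        m->charpos -= len;
      else if (m->charpos > from)
        m->charpos = from;
    }
}
```

Collapsing markers onto @var{from} guarantees that no marker is left pointing into text that no longer exists.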
6902
6903 @node MULE Character Sets and Encodings, The Lisp Reader and Compiler, Buffers and Textual Representation, Top
6904 @chapter MULE Character Sets and Encodings
6905
6906   Recall that there are two primary ways that text is represented in
6907 XEmacs.  The @dfn{buffer} representation sees the text as a series of
6908 bytes (Bufbytes), with a variable number of bytes used per character.
6909 The @dfn{character} representation sees the text as a series of integers
(Emchars), one per character.  The character representation is cleaner
from a theoretical standpoint, and is therefore used in many cases where
a string needs extensive manipulation.  However, the buffer
representation is the standard representation used in both Lisp strings
and buffers, and is thus the ``default'' form in which text is found.
It is used because it is compact and compatible with ASCII.
6917
6918 @menu
6919 * Character Sets::
6920 * Encodings::
6921 * Internal Mule Encodings::
6922 * CCL::
6923 @end menu
6924
6925 @node Character Sets
6926 @section Character Sets
6927
6928   A character set (or @dfn{charset}) is an ordered set of characters.  A
6929 particular character in a charset is indexed using one or more
6930 @dfn{position codes}, which are non-negative integers.  The number of
6931 position codes needed to identify a particular character in a charset is
6932 called the @dfn{dimension} of the charset.  In XEmacs/Mule, all charsets
6933 have dimension 1 or 2, and the size of all charsets (except for a few
6934 special cases) is either 94, 96, 94 by 94, or 96 by 96.  The range of
6935 position codes used to index characters from any of these types of
6936 character sets is as follows:
6937
6938 @example
6939 Charset type            Position code 1         Position code 2
6940 ------------------------------------------------------------
6941 94                      33 - 126                N/A
6942 96                      32 - 127                N/A
6943 94x94                   33 - 126                33 - 126
6944 96x96                   32 - 127                32 - 127
6945 @end example
6946
6947   Note that in the above cases position codes do not start at an
6948 expected value such as 0 or 1.  The reason for this will become clear
6949 later.
6950
6951   For example, Latin-1 is a 96-character charset, and JISX0208 (the
6952 Japanese national character set) is a 94x94-character charset.
6953
  [Note that, although the ranges above define the @emph{valid} position
codes for a charset, some of the slots in a particular charset may in
fact be empty.  This is the case for JISX0208, for example, where all
the slots whose first position code is in the range 117 - 126 are
empty.]
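The ranges in the first table can be checked mechanically.  A minimal sketch, assuming a hypothetical @code{enum charset_type} for the four regular charset types; the three special charsets are left out, and, as the note above points out, a position code that passes this check may still name an empty slot.

```c
#include <stdbool.h>

/* Hypothetical enumeration of the four regular charset types. */
enum charset_type { CS_94, CS_96, CS_94X94, CS_96X96 };

/* A 94-character row uses codes 33 - 126; a 96-character row uses
   codes 32 - 127. */
static bool
pc_in_range (int pc, bool is_96)
{
  return is_96 ? (pc >= 32 && pc <= 127) : (pc >= 33 && pc <= 126);
}

/* Return true if (PC1, PC2) lies inside the valid ranges for a charset
   of TYPE.  PC2 is ignored for dimension-1 charsets. */
static bool
valid_position_codes (enum charset_type type, int pc1, int pc2)
{
  switch (type)
    {
    case CS_94:    return pc_in_range (pc1, false);
    case CS_96:    return pc_in_range (pc1, true);
    case CS_94X94: return pc_in_range (pc1, false) && pc_in_range (pc2, false);
    case CS_96X96: return pc_in_range (pc1, true) && pc_in_range (pc2, true);
    }
  return false;
}
```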
6959
6960   There are three charsets that do not follow the above rules.  All of
6961 them have one dimension, and have ranges of position codes as follows:
6962
6963 @example
6964 Charset name            Position code 1
6965 ------------------------------------
6966 ASCII                   0 - 127
6967 Control-1               0 - 31
6968 Composite               0 - some large number
6969 @end example
6970
  (The upper bound of the position code for composite characters has not
yet been determined, but it will probably be at least 16,383.)
6973
  ASCII is the union of two subsidiary character sets: Printing-ASCII
(the printing ASCII character set, consisting of position codes 33 -
126, as in a standard 94-character charset) and Control-ASCII (the
non-printing characters that would appear in a binary file with codes
0 - 32 and 127).
6979
6980   Control-1 contains the non-printing characters that would appear in a
6981 binary file with codes 128 - 159.
6982
6983   Composite contains characters that are generated by overstriking one
6984 or more characters from other charsets.
6985
  Note that some characters in ASCII, and all characters in Control-1,
are @dfn{control} (non-printing) characters.  These have no printed
representation but instead control some other aspect of printing
(e.g. TAB, code 9, moves the current character position to the next
tab stop).  All other characters in all charsets are @dfn{graphic}
(printing) characters.
6992
6993   When a binary file is read in, the bytes in the file are assigned to
6994 character sets as follows:
6995
6996 @example
6997 Bytes           Character set           Range
6998 --------------------------------------------------
6999 0 - 127         ASCII                   0 - 127
7000 128 - 159       Control-1               0 - 31
7001 160 - 255       Latin-1                 32 - 127
7002 @end example
7003
7004   This is a bit ad-hoc but gets the job done.
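The byte-to-charset table above amounts to two range checks.  A minimal sketch; the enum @code{byte_charset} and the function @code{classify_binary_byte} are hypothetical names.

```c
/* Hypothetical identifiers for the three charsets a raw byte maps to. */
enum byte_charset { BC_ASCII, BC_CONTROL_1, BC_LATIN_1 };

/* Map one byte of a binary file to its charset and position code, as
   in the table above.  Returns the position code and stores the
   charset in *CHARSET. */
static int
classify_binary_byte (unsigned char byte, enum byte_charset *charset)
{
  if (byte < 128)
    {
      *charset = BC_ASCII;
      return byte;              /* 0 - 127   -> 0 - 127 */
    }
  if (byte < 160)
    {
      *charset = BC_CONTROL_1;
      return byte - 128;        /* 128 - 159 -> 0 - 31 */
    }
  *charset = BC_LATIN_1;
  return byte - 128;            /* 160 - 255 -> 32 - 127 */
}
```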
7005
7006 @node Encodings
7007 @section Encodings
7008
7009   An @dfn{encoding} is a way of numerically representing characters from
7010 one or more character sets.  If an encoding only encompasses one
7011 character set, then the position codes for the characters in that
7012 character set could be used directly.  This is not possible, however, if
7013 more than one character set is to be used in the encoding.
7014
7015   For example, the conversion detailed above between bytes in a binary
7016 file and characters is effectively an encoding that encompasses the
7017 three character sets ASCII, Control-1, and Latin-1 in a stream of 8-bit
7018 bytes.
7019
7020   Thus, an encoding can be viewed as a way of encoding characters from a
7021 specified group of character sets using a stream of bytes, each of which
7022 contains a fixed number of bits (but not necessarily 8, as in the common
7023 usage of ``byte'').
7024
7025   Here are descriptions of a couple of common
7026 encodings:
7027
7028 @menu
7029 * Japanese EUC (Extended Unix Code)::
7030 * JIS7::
7031 @end menu
7032
7033 @node Japanese EUC (Extended Unix Code)
7034 @subsection Japanese EUC (Extended Unix Code)
7035
This encompasses the character sets Printing-ASCII, Japanese-JISX0208,
Japanese-JISX0201-Kana (half-width katakana, the right half of
JISX0201), and Japanese-JISX0212.  It uses 8-bit bytes.

Note that Printing-ASCII and Japanese-JISX0201-Kana are 94-character
charsets, while Japanese-JISX0208 and Japanese-JISX0212 are
94x94-character charsets.
7042
7043 The encoding is as follows:
7044
7045 @example
7046 Character set            Representation (PC=position-code)
7047 -------------            --------------
7048 Printing-ASCII           PC1
7049 Japanese-JISX0201-Kana   0x8E       | PC1 + 0x80
7050 Japanese-JISX0208        PC1 + 0x80 | PC2 + 0x80
Japanese-JISX0212        0x8F       | PC1 + 0x80 | PC2 + 0x80
7052 @end example
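Encoding a character with the rows of this table is a simple case analysis.  The sketch below is illustrative only: the enum and function names are hypothetical, the output buffer is assumed to have room for two bytes, and Japanese-JISX0212 is left out.  The 0x8E prefix is the single-shift byte (SS2) that marks a half-width katakana character.

```c
#include <stddef.h>

/* Hypothetical charset identifiers for this sketch. */
enum euc_charset { EUC_ASCII, EUC_JISX0201_KANA, EUC_JISX0208 };

/* Encode one character as Japanese EUC bytes into OUT, which must have
   room for at least 2 bytes.  Returns the number of bytes written. */
static size_t
euc_jp_encode (enum euc_charset cs, int pc1, int pc2, unsigned char *out)
{
  switch (cs)
    {
    case EUC_ASCII:
      out[0] = (unsigned char) pc1;              /* PC1 unchanged */
      return 1;
    case EUC_JISX0201_KANA:
      out[0] = 0x8E;                             /* SS2 prefix */
      out[1] = (unsigned char) (pc1 + 0x80);
      return 2;
    case EUC_JISX0208:
      out[0] = (unsigned char) (pc1 + 0x80);     /* both bytes get the */
      out[1] = (unsigned char) (pc2 + 0x80);     /* high bit set       */
      return 2;
    }
  return 0;
}
```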
7053
7054
7055 @node JIS7
7056 @subsection JIS7
7057
7058 This encompasses the character sets Printing-ASCII,
7059 Japanese-JISX0201-Roman (the left half of JISX0201; this character set
7060 is very similar to Printing-ASCII and is a 94-character charset),
7061 Japanese-JISX0208, and Japanese-JISX0201-Kana.  It uses 7-bit bytes.
7062
7063 Unlike Japanese EUC, this is a @dfn{modal} encoding, which
7064 means that there are multiple states that the encoding can
7065 be in, which affect how the bytes are to be interpreted.
7066 Special sequences of bytes (called @dfn{escape sequences})
7067 are used to change states.
7068
7069   The encoding is as follows:
7070
7071 @example
7072 Character set              Representation (PC=position-code)
7073 -------------              --------------
7074 Printing-ASCII             PC1
7075 Japanese-JISX0201-Roman    PC1
7076 Japanese-JISX0201-Kana     PC1
7077 Japanese-JISX0208          PC1 PC2
7078
7079
7080 Escape sequence   ASCII equivalent   Meaning
7081 ---------------   ----------------   -------
7082 0x1B 0x28 0x4A    ESC ( J            invoke Japanese-JISX0201-Roman
7083 0x1B 0x28 0x49    ESC ( I            invoke Japanese-JISX0201-Kana
7084 0x1B 0x24 0x42    ESC $ B            invoke Japanese-JISX0208
7085 0x1B 0x28 0x42    ESC ( B            invoke Printing-ASCII
7086 @end example
7087
7088   Initially, Printing-ASCII is invoked.
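Because JIS7 is modal, a decoder has to carry the current state from byte to byte.  A minimal sketch, assuming the four escape sequences in the table are the only ones that occur (any other escape sequence is skipped) and treating every other byte as belonging to the currently invoked charset; all names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical charset identifiers for this sketch. */
enum jis7_charset { J_ASCII, J_ROMAN, J_KANA, J_JISX0208 };

/* One decoded character: its charset and position code(s). */
struct jis7_char { enum jis7_charset cs; int pc1, pc2; };

/* Decode LEN bytes of JIS7 from IN into OUT (room for LEN entries).
   Returns the number of characters decoded.  A truncated escape
   sequence or half of a two-byte character at the end is dropped. */
static size_t
jis7_decode (const unsigned char *in, size_t len, struct jis7_char *out)
{
  enum jis7_charset state = J_ASCII;   /* Printing-ASCII is invoked first */
  size_t i = 0, n = 0;

  while (i < len)
    {
      if (in[i] == 0x1B)
        {
          /* Escape sequence: change state, emit nothing. */
          if (i + 2 >= len)
            break;
          unsigned char a = in[i + 1], b = in[i + 2];
          if (a == 0x28 && b == 0x4A)      state = J_ROMAN;     /* ESC ( J */
          else if (a == 0x28 && b == 0x49) state = J_KANA;      /* ESC ( I */
          else if (a == 0x24 && b == 0x42) state = J_JISX0208;  /* ESC $ B */
          else if (a == 0x28 && b == 0x42) state = J_ASCII;     /* ESC ( B */
          i += 3;
        }
      else if (state == J_JISX0208)
        {
          if (i + 1 >= len)
            break;
          out[n].cs = state;
          out[n].pc1 = in[i];
          out[n].pc2 = in[i + 1];
          n++;
          i += 2;
        }
      else
        {
          out[n].cs = state;
          out[n].pc1 = in[i];
          out[n].pc2 = 0;
          n++;
          i += 1;
        }
    }
  return n;
}
```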
7089
7090 @node Internal Mule Encodings
7091 @section Internal Mule Encodings
7092
7093 In XEmacs/Mule, each character set is assigned a unique number, called a
7094 @dfn{leading byte}.  This is used in the encodings of a character.
7095 Leading bytes are in the range 0x80 - 0xFF (except for ASCII, which has
7096 a leading byte of 0), although some leading bytes are reserved.
7097
7098 Charsets whose leading byte is in the range 0x80 - 0x9F are called
7099 @dfn{official} and are used for built-in charsets.  Other charsets are
7100 called @dfn{private} and have leading bytes in the range 0xA0 - 0xFF;
7101 these are user-defined charsets.
7102
7103   More specifically:
7104
7105 @example
7106 Character set           Leading byte
7107 -------------           ------------
7108 ASCII                   0
7109 Composite               0x80
7110 Dimension-1 Official    0x81 - 0x8D
7111                           (0x8E is free)
7112 Control-1               0x8F
7113 Dimension-2 Official    0x90 - 0x99
7114                           (0x9A - 0x9D are free;
7115                            0x9E and 0x9F are reserved)
7116 Dimension-1 Private     0xA0 - 0xEF
7117 Dimension-2 Private     0xF0 - 0xFF
7118 @end example
7119
7120 There are two internal encodings for characters in XEmacs/Mule.  One is
7121 called @dfn{string encoding} and is an 8-bit encoding that is used for
7122 representing characters in a buffer or string.  It uses 1 to 4 bytes per
7123 character.  The other is called @dfn{character encoding} and is a 19-bit
7124 encoding that is used for representing characters individually in a
7125 variable.
7126
7127 (In the following descriptions, we'll ignore composite characters for
7128 the moment.  We also give a general (structural) overview first,
7129 followed later by the exact details.)
7130
7131 @menu
7132 * Internal String Encoding::
7133 * Internal Character Encoding::
7134 @end menu
7135
7136 @node Internal String Encoding
7137 @subsection Internal String Encoding
7138
7139 ASCII characters are encoded using their position code directly.  Other
7140 characters are encoded using their leading byte followed by their
7141 position code(s) with the high bit set.  Characters in private character
7142 sets have their leading byte prefixed with a @dfn{leading byte prefix},
7143 which is either 0x9E or 0x9F. (No character sets are ever assigned these
7144 leading bytes.) Specifically:
7145
7146 @example
7147 Character set           Encoding (PC=position-code, LB=leading-byte)
7148 -------------           --------
ASCII                   PC1
Control-1               LB   | PC1 + 0xA0
Dimension-1 official    LB   | PC1 + 0x80
Dimension-1 private     0x9E | LB         | PC1 + 0x80
Dimension-2 official    LB   | PC1 + 0x80 | PC2 + 0x80
Dimension-2 private     0x9F | LB         | PC1 + 0x80 | PC2 + 0x80
7155 @end example
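The table translates directly into an encoder.  A minimal sketch, taking the leading-byte ranges from the table in the previous section; the function name @code{mule_string_encode} is hypothetical, the output buffer must hold four bytes, and composite characters (leading byte 0x80) are not handled.

```c
#include <stddef.h>

/* Encode one character, given its leading byte LB and position codes,
   in the internal string encoding.  OUT needs room for 4 bytes.
   Returns the byte count, or 0 for a leading byte this sketch does not
   handle (Composite, free, or reserved). */
static size_t
mule_string_encode (int lb, int pc1, int pc2, unsigned char *out)
{
  size_t n = 0;

  if (lb == 0)                              /* ASCII */
    {
      out[n++] = (unsigned char) pc1;
      return n;
    }
  if (lb == 0x8F)                           /* Control-1 */
    {
      out[n++] = 0x8F;
      out[n++] = (unsigned char) (pc1 + 0xA0);
      return n;
    }
  if (lb >= 0xA0 && lb <= 0xEF)             /* dimension-1 private */
    out[n++] = 0x9E;
  else if (lb >= 0xF0 && lb <= 0xFF)        /* dimension-2 private */
    out[n++] = 0x9F;
  else if (!((lb >= 0x81 && lb <= 0x8D) || (lb >= 0x90 && lb <= 0x99)))
    return 0;

  out[n++] = (unsigned char) lb;
  out[n++] = (unsigned char) (pc1 + 0x80);
  if ((lb >= 0x90 && lb <= 0x99) || lb >= 0xF0)   /* dimension 2 */
    out[n++] = (unsigned char) (pc2 + 0x80);
  return n;
}
```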
7156
  The basic characteristic of this encoding is that the first byte
of every character is in the range 0x00 - 0x9F, and the second and
following bytes of every character are in the range 0xA0 - 0xFF.
This means that it is impossible to get out of sync, or more
specifically:
7162
7163 @enumerate
7164 @item
7165 Given any byte position, the beginning of the character it is
7166 within can be determined in constant time.
7167 @item
7168 Given any byte position at the beginning of a character, the
7169 beginning of the next character can be determined in constant
7170 time.
7171 @item
7172 Given any byte position at the beginning of a character, the
7173 beginning of the previous character can be determined in constant
7174 time.
7175 @item
7176 Textual searches can simply treat encoded strings as if they
7177 were encoded in a one-byte-per-character fashion rather than
7178 the actual multi-byte encoding.
7179 @end enumerate
7180
7181   None of the standard non-modal encodings meet all of these
7182 conditions.  For example, EUC satisfies only (2) and (3), while
7183 Shift-JIS and Big5 (not yet described) satisfy only (2). (All
7184 non-modal encodings must satisfy (2), in order to be unambiguous.)
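Properties (1) through (3) all follow from the single test ``is this byte below 0xA0?''.  A minimal sketch of the three operations on a well-formed string; the function names are hypothetical and are not XEmacs's own macros.  Each loop runs at most three times, because no character is longer than four bytes.

```c
#include <stddef.h>

/* A byte starts a character if and only if it is in 0x00 - 0x9F. */
static int
byte_starts_char (unsigned char b)
{
  return b < 0xA0;
}

/* Property (1): from any byte position POS, back up to the start of
   the character containing it. */
static size_t
char_start (const unsigned char *s, size_t pos)
{
  while (pos > 0 && !byte_starts_char (s[pos]))
    pos--;
  return pos;
}

/* Property (2): from the start of a character, skip its trailing
   bytes to reach the start of the next one (or LEN). */
static size_t
next_char (const unsigned char *s, size_t len, size_t pos)
{
  pos++;
  while (pos < len && !byte_starts_char (s[pos]))
    pos++;
  return pos;
}

/* Property (3): from the start of a character, find the start of the
   previous one.  POS must be greater than 0. */
static size_t
prev_char (const unsigned char *s, size_t pos)
{
  return char_start (s, pos - 1);
}
```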
7185
7186 @node Internal Character Encoding
7187 @subsection Internal Character Encoding
7188
7189   One 19-bit word represents a single character.  The word is
7190 separated into three fields:
7191
7192 @example
7193 Bit number:     18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
7194                 <------------> <------------------> <------------------>
7195 Field:                1                  2                    3
7196 @end example
7197
7198   Note that fields 2 and 3 hold 7 bits each, while field 1 holds 5 bits.
7199
7200 @example
7201 Character set           Field 1         Field 2         Field 3
7202 -------------           -------         -------         -------
7203 ASCII                      0               0              PC1
7204    range:                                                   (00 - 7F)
7205 Control-1                  0               1              PC1
7206    range:                                                   (00 - 1F)
7207 Dimension-1 official       0            LB - 0x80         PC1
7208    range:                                    (01 - 0D)      (20 - 7F)
7209 Dimension-1 private        0            LB - 0x80         PC1
7210    range:                                    (20 - 6F)      (20 - 7F)
7211 Dimension-2 official    LB - 0x8F         PC1             PC2
7212    range:                    (01 - 0A)       (20 - 7F)      (20 - 7F)
7213 Dimension-2 private     LB - 0xE1         PC1             PC2
7214    range:                    (0F - 1E)       (20 - 7F)      (20 - 7F)
7215 Composite                 0x1F             ?               ?
7216 @end example
7217
7218   Note that character codes 0 - 255 are the same as the ``binary encoding''
7219 described above.
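Packing and unpacking the 19-bit code is a matter of shifting the three fields into place.  A minimal sketch; the names @code{make_char} and @code{char_leading_byte} are hypothetical, @var{lb} must be a leading byte from the table in the previous section, and composite characters are only recognized, not built.  Note that Control-1 and the dimension-1 official charset with leading byte 0x81 both use a field 2 value of 1; they are told apart by the range of field 3.

```c
#include <assert.h>

/* Field layout from the bit diagram above: field 1 is bits 18 - 14,
   field 2 is bits 13 - 7, and field 3 is bits 6 - 0. */
#define FIELD1(c) (((c) >> 14) & 0x1F)
#define FIELD2(c) (((c) >> 7) & 0x7F)
#define FIELD3(c) ((c) & 0x7F)

static int
pack_fields (int f1, int f2, int f3)
{
  return (f1 << 14) | (f2 << 7) | f3;
}

/* Build the 19-bit code of a character from its leading byte LB and
   position codes.  PC2 is ignored for dimension-1 charsets. */
static int
make_char (int lb, int pc1, int pc2)
{
  /* ASCII or an assignable leading byte; Composite (0x80) excluded. */
  assert (lb == 0 || (lb >= 0x81 && lb <= 0x8D) || lb == 0x8F
          || (lb >= 0x90 && lb <= 0x99) || (lb >= 0xA0 && lb <= 0xFF));

  if (lb == 0)                                     /* ASCII */
    return pack_fields (0, 0, pc1);
  if (lb == 0x8F)                                  /* Control-1 */
    return pack_fields (0, 1, pc1);
  if (lb <= 0x8D || (lb >= 0xA0 && lb <= 0xEF))    /* dimension 1 */
    return pack_fields (0, lb - 0x80, pc1);
  if (lb <= 0x99)                                  /* dimension-2 official */
    return pack_fields (lb - 0x8F, pc1, pc2);
  return pack_fields (lb - 0xE1, pc1, pc2);        /* dimension-2 private */
}

/* Recover the leading byte from a 19-bit character code C. */
static int
char_leading_byte (int c)
{
  int f1 = FIELD1 (c);
  int f2 = FIELD2 (c);

  if (f1 == 0x1F)
    return 0x80;                     /* Composite */
  if (f1 > 0x0A)
    return f1 + 0xE1;                /* dimension-2 private */
  if (f1 > 0)
    return f1 + 0x8F;                /* dimension-2 official */
  if (f2 == 0)
    return 0;                        /* ASCII */
  if (f2 == 1 && FIELD3 (c) < 0x20)
    return 0x8F;                     /* Control-1 */
  return f2 + 0x80;                  /* dimension 1 */
}
```

As the text says, codes 0 - 255 coincide with the binary encoding: for example, Control-1 position code 5 packs to 0x85.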
7220
7221 @node CCL
7222 @section CCL
7223
7224 @example
7225 CCL PROGRAM SYNTAX:
7226      CCL_PROGRAM := (CCL_MAIN_BLOCK
7227                      [ CCL_EOF_BLOCK ])
7228
7229      CCL_MAIN_BLOCK := CCL_BLOCK
7230      CCL_EOF_BLOCK := CCL_BLOCK
7231
7232      CCL_BLOCK := STATEMENT | (STATEMENT [STATEMENT ...])
7233      STATEMENT :=
7234              SET | IF | BRANCH | LOOP | REPEAT | BREAK
7235              | READ | WRITE
7236
7237      SET := (REG = EXPRESSION) | (REG SELF_OP EXPRESSION)
7238             | INT-OR-CHAR
7239
7240      EXPRESSION := ARG | (EXPRESSION OP ARG)
7241
7242      IF := (if EXPRESSION CCL_BLOCK CCL_BLOCK)
7243      BRANCH := (branch EXPRESSION CCL_BLOCK [CCL_BLOCK ...])
7244      LOOP := (loop STATEMENT [STATEMENT ...])
7245      BREAK := (break)
7246      REPEAT := (repeat)
7247              | (write-repeat [REG | INT-OR-CHAR | string])
7248              | (write-read-repeat REG [INT-OR-CHAR | string | ARRAY]?)
7249      READ := (read REG) | (read REG REG)
7250              | (read-if REG ARITH_OP ARG CCL_BLOCK CCL_BLOCK)
7251              | (read-branch REG CCL_BLOCK [CCL_BLOCK ...])
7252      WRITE := (write REG) | (write REG REG)
7253              | (write INT-OR-CHAR) | (write STRING) | STRING
7254              | (write REG ARRAY)
7255      END := (end)
7256
7257      REG := r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7
7258      ARG := REG | INT-OR-CHAR
7259      OP :=   + | - | * | / | % | & | '|' | ^ | << | >> | <8 | >8 | //
7260              | < | > | == | <= | >= | !=
7261      SELF_OP :=
7262              += | -= | *= | /= | %= | &= | '|=' | ^= | <<= | >>=
7263      ARRAY := '[' INT-OR-CHAR ... ']'
7264      INT-OR-CHAR := INT | CHAR
7265
7266 MACHINE CODE:
7267
7268 The machine code consists of a vector of 32-bit words.
7269 The first such word specifies the start of the EOF section of the code;
7270 this is the code executed to handle any stuff that needs to be done
7271 (e.g. designating back to ASCII and left-to-right mode) after all
7272 other encoded/decoded data has been written out.  This is not used for
7273 charset CCL programs.
7274
REGISTER: 0..7  -- referred to by RRR or rrr
7276
7277 OPERATOR BIT FIELD (27-bit): XXXXXXXXXXXXXXX RRR TTTTT
7278         TTTTT (5-bit): operator type
7279         RRR (3-bit): register number
        XXXXXXXXXXXXXXX (15-bit):
7281                 CCCCCCCCCCCCCCC: constant or address
7282                 000000000000rrr: register number
7283
AAAAA:  00000 +
7285         00001 -
7286         00010 *
7287         00011 /
7288         00100 %
7289         00101 &
7290         00110 |
        00111 ^
7292
7293         01000 <<
7294         01001 >>
7295         01010 <8
7296         01011 >8
7297         01100 //
7298         01101 not used
7299         01110 not used
7300         01111 not used
7301
7302         10000 <
7303         10001 >
7304         10010 ==
7305         10011 <=
7306         10100 >=
7307         10101 !=
7308
7309 OPERATORS:      TTTTT RRR XX..
7310
7311 SetCS:          00000 RRR C...C      RRR = C...C
7312 SetCL:          00001 RRR .....      RRR = c...c
7313                 c.............c
7314 SetR:           00010 RRR ..rrr      RRR = rrr
7315 SetA:           00011 RRR ..rrr      RRR = array[rrr]
7316                 C.............C      size of array = C...C
7317                 c.............c      contents = c...c
7318
7319 Jump:           00100 000 c...c      jump to c...c
7320 JumpCond:       00101 RRR c...c      if (!RRR) jump to c...c
7321 WriteJump:      00110 RRR c...c      Write1 RRR, jump to c...c
7322 WriteReadJump:  00111 RRR c...c      Write1, Read1 RRR, jump to c...c
7323 WriteCJump:     01000 000 c...c      Write1 C...C, jump to c...c
7324                 C...C
7325 WriteCReadJump: 01001 RRR c...c      Write1 C...C, Read1 RRR,
7326                 C.............C      and jump to c...c
7327 WriteSJump:     01010 000 c...c      WriteS, jump to c...c
7328                 C.............C
7329                 S.............S
7330                 ...
7331 WriteSReadJump: 01011 RRR c...c      WriteS, Read1 RRR, jump to c...c
7332                 C.............C
7333                 S.............S
7334                 ...
7335 WriteAReadJump: 01100 RRR c...c      WriteA, Read1 RRR, jump to c...c
7336                 C.............C      size of array = C...C
7337                 c.............c      contents = c...c
7338                 ...
7339 Branch:         01101 RRR C...C      if (RRR >= 0 && RRR < C..)
7340                 c.............c      branch to (RRR+1)th address
7341 Read1:          01110 RRR ...        read 1-byte to RRR
7342 Read2:          01111 RRR ..rrr      read 2-byte to RRR and rrr
7343 ReadBranch:     10000 RRR C...C      Read1 and Branch
7344                 c.............c
7345                 ...
7346 Write1:         10001 RRR .....      write 1-byte RRR
7347 Write2:         10010 RRR ..rrr      write 2-byte RRR and rrr
WriteC:         10011 000 .....      write 1-char C...C
7349                 C.............C
7350 WriteS:         10100 000 .....      write C..-byte of string
7351                 C.............C
7352                 S.............S
7353                 ...
7354 WriteA:         10101 RRR .....      write array[RRR]
7355                 C.............C      size of array = C...C
7356                 c.............c      contents = c...c
7357                 ...
7358 End:            10110 000 .....      terminate the execution
7359
7360 SetSelfCS:      10111 RRR C...C      RRR AAAAA= C...C
7361                 ..........AAAAA
7362 SetSelfCL:      11000 RRR .....      RRR AAAAA= c...c
7363                 c.............c
7364                 ..........AAAAA
7365 SetSelfR:       11001 RRR ..Rrr      RRR AAAAA= rrr
7366                 ..........AAAAA
7367 SetExprCL:      11010 RRR ..Rrr      RRR = rrr AAAAA c...c
7368                 c.............c
7369                 ..........AAAAA
7370 SetExprR:       11011 RRR ..rrr      RRR = rrr AAAAA Rrr
7371                 ............Rrr
7372                 ..........AAAAA
7373 JumpCondC:      11100 RRR c...c      if !(RRR AAAAA C..) jump to c...c
7374                 C.............C
7375                 ..........AAAAA
7376 JumpCondR:      11101 RRR c...c      if !(RRR AAAAA rrr) jump to c...c
7377                 ............rrr
7378                 ..........AAAAA
7379 ReadJumpCondC:  11110 RRR c...c      Read1 and JumpCondC
7380                 C.............C
7381                 ..........AAAAA
7382 ReadJumpCondR:  11111 RRR c...c      Read1 and JumpCondR
7383                 ............rrr
7384                 ..........AAAAA
7385 @end example
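The instruction layout can be illustrated with a small decoder.  A minimal sketch, assuming the field order shown in the operator bit-field diagram (TTTTT in the low 5 bits, then RRR, then the argument in everything above bit 7) and treating operator code 00111 as exclusive-or, the @code{^} of the grammar; the struct and function names are hypothetical, and the @code{<8}, @code{>8} and @code{//} operators, which also involve a second result register, are omitted.

```c
#include <stdint.h>

/* Fields of one CCL instruction word. */
struct ccl_insn { int type; int reg; uint32_t arg; };

/* Split WORD into operator type (TTTTT), register (RRR) and argument
   (constant, address, or second register number). */
static struct ccl_insn
ccl_decode (uint32_t word)
{
  struct ccl_insn insn;
  insn.type = (int) (word & 0x1F);
  insn.reg = (int) ((word >> 5) & 0x7);
  insn.arg = word >> 8;
  return insn;
}

/* Apply the operator with 5-bit code OP to A and B.  Comparisons yield
   1 or 0.  *OK is set to 0 for codes this sketch does not handle.  No
   checks for division by zero or out-of-range shift counts. */
static int
ccl_binop (int op, int a, int b, int *ok)
{
  *ok = 1;
  switch (op)
    {
    case 0x00: return a + b;
    case 0x01: return a - b;
    case 0x02: return a * b;
    case 0x03: return a / b;
    case 0x04: return a % b;
    case 0x05: return a & b;
    case 0x06: return a | b;
    case 0x07: return a ^ b;
    case 0x08: return a << b;
    case 0x09: return a >> b;
    case 0x10: return a < b;
    case 0x11: return a > b;
    case 0x12: return a == b;
    case 0x13: return a <= b;
    case 0x14: return a >= b;
    case 0x15: return a != b;
    default:   *ok = 0; return 0;
    }
}
```

For example, the word with type 00000, register 3 and constant 42 decodes as SetCS, i.e. @code{r3 = 42}.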
7386
7387 @node The Lisp Reader and Compiler, Lstreams, MULE Character Sets and Encodings, Top
7388 @chapter The Lisp Reader and Compiler
7389
7390 Not yet documented.
7391
7392 @node Lstreams, Consoles; Devices; Frames; Windows, The Lisp Reader and Compiler, Top
7393 @chapter Lstreams
7394
7395   An @dfn{lstream} is an internal Lisp object that provides a generic
7396 buffering stream implementation.  Conceptually, you send data to the
7397 stream or read data from the stream, not caring what's on the other end
7398 of the stream.  The other end could be another stream, a file
7399 descriptor, a stdio stream, a fixed block of memory, a reallocating
7400 block of memory, etc.  The main purpose of the stream is to provide a
7401 standard interface and to do buffering.  Macros are defined to read or
7402 write characters, so the calling functions do not have to worry about
7403 blocking data together in order to achieve efficiency.
7404
7405 @menu
7406 * Creating an Lstream::         Creating an lstream object.
7407 * Lstream Types::               Different sorts of things that are streamed.
7408 * Lstream Functions::           Functions for working with lstreams.
7409 * Lstream Methods::             Creating new lstream types.
7410 @end menu
7411
7412 @node Creating an Lstream
7413 @section Creating an Lstream
7414
7415 Lstreams come in different types, depending on what is being interfaced
7416 to.  Although the primitive for creating new lstreams is
7417 @code{Lstream_new()}, generally you do not call this directly.  Instead,
7418 you call some type-specific creation function, which creates the lstream
7419 and initializes it as appropriate for the particular type.
7420
7421 All lstream creation functions take a @var{mode} argument, specifying
what mode the lstream should be opened in.  This controls whether the
lstream is for input or output, and optionally whether data should be
7424 blocked up in units of MULE characters.  Note that some types of
7425 lstreams can only be opened for input; others only for output; and
7426 others can be opened either way.  #### Richard Mlynarik thinks that
7427 there should be a strict separation between input and output streams,
7428 and he's probably right.
7429
7430   @var{mode} is a string, one of
7431
7432 @table @code
7433 @item "r"
7434   Open for reading.
7435 @item "w"
7436   Open for writing.
7437 @item "rc"
7438   Open for reading, but ``read'' never returns partial MULE characters.
7439 @item "wc"
7440   Open for writing, but never writes partial MULE characters.
7441 @end table
7442
7443 @node Lstream Types
7444 @section Lstream Types
7445
7446 @table @asis
7447 @item stdio
7448
7449 @item filedesc
7450
7451 @item lisp-string
7452
7453 @item fixed-buffer
7454
7455 @item resizing-buffer
7456
7457 @item dynarr
7458
7459 @item lisp-buffer
7460
7461 @item print
7462
7463 @item decoding
7464
7465 @item encoding
7466 @end table
7467
7468 @node Lstream Functions
7469 @section Lstream Functions
7470
7471 @deftypefun {Lstream *} Lstream_new (Lstream_implementation *@var{imp}, CONST char *@var{mode})
7472 Allocate and return a new Lstream.  This function is not really meant to
7473 be called directly; rather, each stream type should provide its own
7474 stream creation function, which creates the stream and does any other
7475 necessary creation stuff (e.g. opening a file).
7476 @end deftypefun
7477
7478 @deftypefun void Lstream_set_buffering (Lstream *@var{lstr}, Lstream_buffering @var{buffering}, int @var{buffering_size})
7479 Change the buffering of a stream.  See @file{lstream.h}.  By default the
7480 buffering is @code{STREAM_BLOCK_BUFFERED}.
7481 @end deftypefun
7482
7483 @deftypefun int Lstream_flush (Lstream *@var{lstr})
7484 Flush out any pending unwritten data in the stream.  Clear any buffered
7485 input data.  Returns 0 on success, -1 on error.
7486 @end deftypefun
7487
7488 @deftypefn Macro int Lstream_putc (Lstream *@var{stream}, int @var{c})
7489 Write out one byte to the stream.  This is a macro and so it is very
7490 efficient.  The @var{c} argument is only evaluated once but the @var{stream}
7491 argument is evaluated more than once.  Returns 0 on success, -1 on
7492 error.
7493 @end deftypefn
7494
7495 @deftypefn Macro int Lstream_getc (Lstream *@var{stream})
7496 Read one byte from the stream.  This is a macro and so it is very
7497 efficient.  The @var{stream} argument is evaluated more than once.  Return
7498 value is -1 for EOF or error.
7499 @end deftypefn
7500
7501 @deftypefn Macro void Lstream_ungetc (Lstream *@var{stream}, int @var{c})
7502 Push one byte back onto the input queue.  This will be the next byte
7503 read from the stream.  Any number of bytes can be pushed back and will
7504 be read in the reverse order they were pushed back---most recent
7505 first. (This is necessary for consistency---if there are a number of
7506 bytes that have been unread and I read and unread a byte, it needs to be
7507 the first to be read again.) This is a macro and so it is very
7508 efficient.  The @var{c} argument is only evaluated once but the @var{stream}
7509 argument is evaluated more than once.
7510 @end deftypefn
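The pushback rule (most recently unread byte first) falls out naturally if the pushed-back bytes are kept on a stack.  The following toy model is not the lstream implementation: @code{struct toy_stream} and its functions are hypothetical, and the stack has a fixed capacity for brevity, whereas the text above allows any number of bytes to be pushed back.

```c
#include <stddef.h>

/* Toy stream over a fixed block of memory with a pushback stack. */
struct toy_stream {
  const unsigned char *src;     /* underlying data */
  size_t len, pos;
  unsigned char unget[16];      /* pushed-back bytes, top at nunget - 1 */
  size_t nunget;
};

/* Return the next byte, taking pushed-back bytes first; -1 at EOF. */
static int
toy_getc (struct toy_stream *s)
{
  if (s->nunget > 0)
    return s->unget[--s->nunget];    /* most recently pushed back first */
  if (s->pos < s->len)
    return s->src[s->pos++];
  return -1;
}

/* Push C back; silently dropped if the toy stack is full. */
static void
toy_ungetc (struct toy_stream *s, int c)
{
  if (s->nunget < sizeof s->unget)
    s->unget[s->nunget++] = (unsigned char) c;
}
```

Reading a byte and immediately unreading it restores the top of the stack, which gives the consistency property described above.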
7511
7512 @deftypefun int Lstream_fputc (Lstream *@var{stream}, int @var{c})
7513 @deftypefunx int Lstream_fgetc (Lstream *@var{stream})
7514 @deftypefunx void Lstream_fungetc (Lstream *@var{stream}, int @var{c})
7515 Function equivalents of the above macros.
7516 @end deftypefun
7517
7518 @deftypefun ssize_t Lstream_read (Lstream *@var{stream}, void *@var{data}, size_t @var{size})
Read up to @var{size} bytes from the stream into @var{data}.  Return the
number of bytes read.  0 means EOF; -1 means an error occurred and no
bytes were read.
7522 @end deftypefun
7523
7524 @deftypefun ssize_t Lstream_write (Lstream *@var{stream}, void *@var{data}, size_t @var{size})
7525 Write @var{size} bytes of @var{data} to the stream.  Return the number
7526 of bytes written.  -1 means an error occurred and no bytes were written.
7527 @end deftypefun
7528
7529 @deftypefun void Lstream_unread (Lstream *@var{stream}, void *@var{data}, size_t @var{size})
7530 Push back @var{size} bytes of @var{data} onto the input queue.  The next
7531 call to @code{Lstream_read()} with the same size will read the same
7532 bytes back.  Note that this will be the case even if there is other
7533 pending unread data.
7534 @end deftypefun
7535
7536 @deftypefun int Lstream_close (Lstream *@var{stream})
7537 Close the stream.  All data will be flushed out.
7538 @end deftypefun
7539
7540 @deftypefun void Lstream_reopen (Lstream *@var{stream})
7541 Reopen a closed stream.  This enables I/O on it again.  This is not
7542 meant to be called except from a wrapper routine that reinitializes
7543 variables and such---the close routine may well have freed some
7544 necessary storage structures, for example.
7545 @end deftypefun
7546
7547 @deftypefun void Lstream_rewind (Lstream *@var{stream})
7548 Rewind the stream to the beginning.
7549 @end deftypefun
7550
7551 @node Lstream Methods
7552 @section Lstream Methods
7553
7554 @deftypefn {Lstream Method} ssize_t reader (Lstream *@var{stream}, unsigned char *@var{data}, size_t @var{size})
7555 Read some data from the stream's end and store it into @var{data}, which
7556 can hold @var{size} bytes.  Return the number of bytes read.  A return
7557 value of 0 means no bytes can be read at this time.  This may be because
7558 of an EOF, or because there is a granularity greater than one byte that
7559 the stream imposes on the returned data, and @var{size} is less than
7560 this granularity. (This will happen frequently for streams that need to
7561 return whole characters, because @code{Lstream_read()} calls the reader
7562 function repeatedly until it has the number of bytes it wants or until 0
7563 is returned.)  The lstream functions do not treat a 0 return as EOF or
7564 do anything special; however, the calling function will interpret any 0
7565 it gets back as EOF.  This will normally not happen unless the caller
7566 calls @code{Lstream_read()} with a very small size.
7567
7568 This function can be @code{NULL} if the stream is output-only.
7569 @end deftypefn
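The retry loop described above can be sketched as follows.  This is a hedged paraphrase, not @code{Lstream_read()} itself: the hypothetical @code{reader_fn} type takes a plain context pointer instead of an @code{Lstream}, buffering and unread data are ignored, and returning a partial count when an error follows a successful read is an assumption.  The example reader hands out one 2-byte unit per call, imitating a stream whose granularity is larger than one byte.

```c
#include <sys/types.h>
#include <stddef.h>
#include <string.h>

typedef ssize_t (*reader_fn) (void *ctx, unsigned char *data, size_t size);

/* Call READER until SIZE bytes are collected, it returns 0, or it
   fails.  Returns the byte count, or -1 if nothing was read before an
   error. */
static ssize_t
read_loop (reader_fn reader, void *ctx, unsigned char *data, size_t size)
{
  size_t got = 0;

  while (got < size)
    {
      ssize_t n = reader (ctx, data + got, size - got);
      if (n < 0)
        return got > 0 ? (ssize_t) got : -1;
      if (n == 0)
        break;          /* EOF, or room left is below the granularity */
      got += (size_t) n;
    }
  return (ssize_t) got;
}

/* Example reader: returns exactly one 2-byte unit per call, or 0 when
   fewer than 2 bytes of room or of input remain. */
struct pairs { const unsigned char *src; size_t len, pos; };

static ssize_t
pair_reader (void *ctx, unsigned char *data, size_t size)
{
  struct pairs *p = ctx;

  if (size < 2 || p->pos + 2 > p->len)
    return 0;
  memcpy (data, p->src + p->pos, 2);
  p->pos += 2;
  return 2;
}
```

Asking for 5 bytes from this reader yields only 4: the third call has just 1 byte of room, so it returns 0, which the caller then takes as EOF even though input remains.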
7570
7571 @deftypefn {Lstream Method} ssize_t writer (Lstream *@var{stream}, CONST unsigned char *@var{data}, size_t @var{size})
7572 Send some data to the stream's end.  Data to be sent is in @var{data}
7573 and is @var{size} bytes.  Return the number of bytes sent.  This
function can send and return fewer bytes than are passed in; in that
7575 case, the function will just be called again until there is no data left
7576 or 0 is returned.  A return value of 0 means that no more data can be
7577 currently stored, but there is no error; the data will be squirreled
7578 away until the writer can accept data. (This is useful, e.g., if you're
7579 dealing with a non-blocking file descriptor and are getting
7580 @code{EWOULDBLOCK} errors.)  This function can be @code{NULL} if the
7581 stream is input-only.
7582 @end deftypefn
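The writer contract implies a similar loop on the calling side.  Another hedged sketch with hypothetical names; in the real lstream code, bytes the writer does not accept are kept for a later attempt, which this sketch only reports as a short count.  The example writer accepts at most three bytes per call and has room for eight, imitating a non-blocking descriptor that fills up.

```c
#include <sys/types.h>
#include <stddef.h>

typedef ssize_t (*writer_fn) (void *ctx, const unsigned char *data,
                              size_t size);

/* Call WRITER until all SIZE bytes are sent, it returns 0 ("cannot
   take more now"), or it fails.  Returns the number of bytes accepted,
   or -1 on an error before anything was accepted. */
static ssize_t
write_loop (writer_fn writer, void *ctx, const unsigned char *data,
            size_t size)
{
  size_t sent = 0;

  while (sent < size)
    {
      ssize_t n = writer (ctx, data + sent, size - sent);
      if (n < 0)
        return sent > 0 ? (ssize_t) sent : -1;
      if (n == 0)
        break;                 /* no error, just no room right now */
      sent += (size_t) n;
    }
  return (ssize_t) sent;
}

/* Example writer: at most 3 bytes per call, 8 bytes of total room. */
struct sink { unsigned char buf[8]; size_t used; };

static ssize_t
sink_writer (void *ctx, const unsigned char *data, size_t size)
{
  struct sink *s = ctx;
  size_t room = sizeof s->buf - s->used;
  size_t n = size < 3 ? size : 3;

  if (n > room)
    n = room;
  for (size_t i = 0; i < n; i++)
    s->buf[s->used + i] = data[i];
  s->used += n;
  return (ssize_t) n;
}
```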
7583
7584 @deftypefn {Lstream Method} int rewinder (Lstream *@var{stream})
7585 Rewind the stream.  If this is @code{NULL}, the stream is not seekable.
7586 @end deftypefn
7587
7588 @deftypefn {Lstream Method} int seekable_p (Lstream *@var{stream})
7589 Indicate whether this stream is seekable---i.e. it can be rewound.
7590 This method is ignored if the stream does not have a rewind method.  If
7591 this method is not present, the result is determined by whether a rewind
7592 method is present.
7593 @end deftypefn
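The interaction between @code{rewinder} and @code{seekable_p} reduces to a small decision. The sketch below expresses it in C using a hypothetical two-entry method table. @code{stream_methods}, @code{stream_seekable_p}, and the @code{demo_*} methods are invented for illustration.

@verbatim
#include <stddef.h>

/* Hypothetical method table holding only the two relevant methods. */
struct stream_methods
{
  int (*rewinder) (void *stream);
  int (*seekable_p) (void *stream);
};

/* The rule described above: without a rewinder the stream is never
   seekable; with one, ask seekable_p if it exists, else say yes. */
static int
stream_seekable_p (const struct stream_methods *m, void *stream)
{
  if (m->rewinder == NULL)
    return 0;                   /* seekable_p is ignored in this case */
  if (m->seekable_p != NULL)
    return m->seekable_p (stream);
  return 1;
}

/* Example methods for trying the rule out. */
static int demo_rewinder (void *stream) { (void) stream; return 0; }
static int demo_seekable (void *stream) { (void) stream; return 1; }
static int demo_not_seekable (void *stream) { (void) stream; return 0; }
@end verbatim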
7594
7595 @deftypefn {Lstream Method} int flusher (Lstream *@var{stream})
7596 Perform any additional operations necessary to flush the data in this
7597 stream.
7598 @end deftypefn
7599
7600 @deftypefn {Lstream Method} int pseudo_closer (Lstream *@var{stream})
7601 @end deftypefn
7602
7603 @deftypefn {Lstream Method} int closer (Lstream *@var{stream})
7604 Perform any additional operations necessary to close this stream down.
7605 May be @code{NULL}.  This function is called when @code{Lstream_close()}
7606 is called or when the stream is garbage-collected.  When this function
7607 is called, all pending data in the stream will already have been written
7608 out.
7609 @end deftypefn
7610
7611 @deftypefn {Lstream Method} Lisp_Object marker (Lisp_Object @var{lstream}, void (*@var{markfun}) (Lisp_Object))
7612 Mark this object for garbage collection.  Same semantics as a standard
7613 @code{Lisp_Object} marker.  This function can be @code{NULL}.
7614 @end deftypefn
7615
7616 @node Consoles; Devices; Frames; Windows, The Redisplay Mechanism, Lstreams, Top
7617 @chapter Consoles; Devices; Frames; Windows
7618
7619 @menu
7620 * Introduction to Consoles; Devices; Frames; Windows::
7621 * Point::
7622 * Window Hierarchy::
7623 * The Window Object::
7624 @end menu
7625
7626 @node Introduction to Consoles; Devices; Frames; Windows
7627 @section Introduction to Consoles; Devices; Frames; Windows
7628
7629 A window-system window that you see on the screen is called a
7630 @dfn{frame} in Emacs terminology.  Each frame is subdivided into one or
7631 more non-overlapping panes, called (confusingly) @dfn{windows}.  Each
7632 window displays the text of a buffer in it. (See above on Buffers.) Note
7633 that buffers and windows are independent entities: Two or more windows
7634 can be displaying the same buffer (potentially in different locations),
7635 and a buffer can be displayed in no windows.
7636
7637   A single display screen that contains one or more frames is called
7638 a @dfn{display}.  Under most circumstances, there is only one display.
7639 However, more than one display can exist, for example if you have
7640 a @dfn{multi-headed} console, i.e. one with a single keyboard but
7641 multiple displays. (Typically in such a situation, the various
7642 displays act like one large display, in that the mouse is only
7643 in one of them at a time, and moving the mouse off of one moves
7644 it into another.) In some cases, the different displays will
7645 have different characteristics, e.g. one color and one mono.
7646
7647   XEmacs can display frames on multiple displays.  It can even deal
7648 simultaneously with frames on multiple keyboards (called @dfn{consoles} in
7649 XEmacs terminology).  Here is one case where this might be useful: You
7650 are using XEmacs on your workstation at work, and leave it running.
7651 Then you go home and dial in on a TTY line, and you can use the
7652 already-running XEmacs process to display another frame on your local
7653 TTY.
7654
  Thus, there is a hierarchy console -> display -> frame -> window.
(Internally, XEmacs calls a display a @dfn{device}.)
7656 There is a separate Lisp object type for each of these four concepts.
7657 Furthermore, there is logically a @dfn{selected console},
7658 @dfn{selected display}, @dfn{selected frame}, and @dfn{selected window}.
7659 Each of these objects is distinguished in various ways, such as being the
7660 default object for various functions that act on objects of that type.
Note that every containing object remembers the ``selected'' object
7662 among the objects that it contains: e.g. not only is there a selected
7663 window, but every frame remembers the last window in it that was
7664 selected, and changing the selected frame causes the remembered window
7665 within it to become the selected window.  Similar relationships apply
7666 for consoles to devices and devices to frames.
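The ``remembered selection'' pattern can be modeled roughly as follows. This is a simplified sketch with hypothetical structures that cover only devices, frames, and windows. The real XEmacs objects are Lisp objects with many more fields, and the same pattern would repeat one level up for consoles and devices.

@verbatim
#include <stddef.h>

/* Hypothetical, heavily simplified containers.  Each container
   remembers which of its children was selected most recently. */
struct window { int id; };
struct frame  { struct window *selected_window; };
struct device { struct frame *selected_frame; };

/* The globally selected frame and window. */
static struct frame *selected_frame;
static struct window *selected_window;

/* Select W within F.  It becomes the global selection only if F is
   the selected frame; otherwise F just remembers it. */
static void
select_window (struct frame *f, struct window *w)
{
  f->selected_window = w;
  if (f == selected_frame)
    selected_window = w;
}

/* Selecting a frame makes its remembered window the selected window. */
static void
select_frame (struct device *d, struct frame *f)
{
  d->selected_frame = f;
  selected_frame = f;
  selected_window = f->selected_window;
}
@end verbatim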
7667
7668 @node Point
7669 @section Point
7670
7671   Recall that every buffer has a current insertion position, called
7672 @dfn{point}.  Now, two or more windows may be displaying the same buffer,
7673 and the text cursor in the two windows (i.e. @code{point}) can be in
7674 two different places.  You may ask, how can that be, since each
7675 buffer has only one value of @code{point}?  The answer is that each window
7676 also has a value of @code{point} that is squirreled away in it.  There
is only one selected window, and the value of @code{point} in the buffer
it displays corresponds to that window.  When the selected window is changed
7679 from one window to another displaying the same buffer, the old
7680 value of @code{point} is stored into the old window's ``point'' and the
7681 value of @code{point} from the new window is retrieved and made the
7682 value of @code{point} in the buffer.  This means that @code{window-point}
7683 for the selected window is potentially inaccurate, and if you
7684 want to retrieve the correct value of @code{point} for a window,
7685 you must special-case on the selected window and retrieve the
7686 buffer's point instead.  This is related to why @code{save-window-excursion}
7687 does not save the selected window's value of @code{point}.
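The sketch below shows how @code{point} values are swapped when a window is selected, and the special case needed to read a window's point correctly. The structures are hypothetical, and @code{pointm} is modeled as a plain integer rather than a marker.

@verbatim
#include <stddef.h>

/* Hypothetical model: a buffer with a single point, and windows that
   each stash their own copy of point. */
struct buffer { long point; };
struct window { struct buffer *buffer; long pointm; };

static struct window *selected_window;

/* Switch the selected window, swapping point values as described. */
static void
select_window (struct window *w)
{
  struct window *old = selected_window;

  if (old != NULL)
    old->pointm = old->buffer->point;   /* save into the old window */
  selected_window = w;
  w->buffer->point = w->pointm;         /* restore from the new one */
}

/* Correct point for any window: the selected window's stashed value
   may be stale, so consult the buffer instead. */
static long
window_point (const struct window *w)
{
  if (w == selected_window)
    return w->buffer->point;
  return w->pointm;
}
@end verbatim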
7688
7689 @node Window Hierarchy
7690 @section Window Hierarchy
7691 @cindex window hierarchy
7692 @cindex hierarchy of windows
7693
7694   If a frame contains multiple windows (panes), they are always created
7695 by splitting an existing window along the horizontal or vertical axis.
7696 Terminology is a bit confusing here: to @dfn{split a window
7697 horizontally} means to create two side-by-side windows, i.e. to make a
7698 @emph{vertical} cut in a window.  Likewise, to @dfn{split a window
7699 vertically} means to create two windows, one above the other, by making
7700 a @emph{horizontal} cut.
7701
7702   If you split a window and then split again along the same axis, you
7703 will end up with a number of panes all arranged along the same axis.
7704 The precise way in which the splits were made should not be important,
7705 and this is reflected internally.  Internally, all windows are arranged
7706 in a tree, consisting of two types of windows, @dfn{combination} windows
7707 (which have children, and are covered completely by those children) and
7708 @dfn{leaf} windows, which have no children and are visible.  Every
7709 combination window has two or more children, all arranged along the same
axis.  There are (logically) two subtypes of combination windows, depending on
7711 whether their children are horizontally or vertically arrayed.  There is
7712 always one root window, which is either a leaf window (if the frame
7713 contains only one window) or a combination window (if the frame contains
7714 more than one window).  In the latter case, the root window will have
7715 two or more children, either horizontally or vertically arrayed, and
7716 each of those children will be either a leaf window or another
7717 combination window.
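The following is a stripped-down C sketch of this tree. The structure is hypothetical: it keeps only the fields named in the rules that follow (@code{parent}, @code{next}, @code{prev}, @code{hchild}, @code{vchild}, @code{buffer}), with a string standing in for the buffer. @code{count_leaves} is an illustrative helper that walks the tree and counts the visible leaf windows.

@verbatim
#include <stddef.h>

/* Hypothetical window node.  Exactly one of hchild, vchild and buffer
   is non-NULL. */
struct window
{
  struct window *parent;
  struct window *next, *prev;   /* siblings */
  struct window *hchild;        /* first child, side-by-side */
  struct window *vchild;        /* first child, one above the other */
  const char *buffer;           /* stand-in for the displayed buffer */
};

/* Count the leaf (visible) windows under W. */
static int
count_leaves (const struct window *w)
{
  const struct window *child = w->hchild ? w->hchild : w->vchild;
  int n = 0;

  if (child == NULL)
    return 1;                   /* a leaf window */
  for (; child != NULL; child = child->next)
    n += count_leaves (child);
  return n;
}
@end verbatim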
7718
7719   Here are some rules:
7720
7721 @enumerate
7722 @item
7723 Horizontal combination windows can never have children that are
7724 horizontal combination windows; same for vertical.
7725
7726 @item
Only leaf windows can be split (obviously) and this splitting does one
of two things: (a) replaces the leaf window with a new combination
window whose two children are the original leaf and a newly created
leaf, or (b) keeps the leaf window as one of the two new leaves and
creates the other leaf as its sibling.  Rule (1) dictates which of these
two outcomes happens.
7732
7733 @item
7734 Every combination window must have at least two children.
7735
7736 @item
7737 Leaf windows can never become combination windows.  They can be deleted,
7738 however.  If this results in a violation of (3), the parent combination
7739 window also gets deleted.
7740
7741 @item
7742 All functions that accept windows must be prepared to accept combination
7743 windows, and do something sane (e.g. signal an error if so).
7744 Combination windows @emph{do} escape to the Lisp level.
7745
7746 @item
7747 All windows have three fields governing their contents:
7748 these are @dfn{hchild} (a list of horizontally-arrayed children),
7749 @dfn{vchild} (a list of vertically-arrayed children), and @dfn{buffer}
7750 (the buffer contained in a leaf window).  Exactly one of
7751 these will be non-nil.  Remember that @dfn{horizontally-arrayed}
7752 means ``side-by-side'' and @dfn{vertically-arrayed} means
``one above the other''.
7754
7755 @item
7756 Leaf windows also have markers in their @code{start} (the
7757 first buffer position displayed in the window) and @code{pointm}
7758 (the window's stashed value of @code{point}---see above) fields,
7759 while combination windows have nil in these fields.
7760
7761 @item
7762 The list of children for a window is threaded through the
7763 @code{next} and @code{prev} fields of each child window.
7764
7765 @item
7766 @strong{Deleted windows can be undeleted}.  This happens as a result of
7767 restoring a window configuration, and is unlike frames, displays, and
7768 consoles, which, once deleted, can never be restored.  Deleting a window
7769 does nothing except set a special @code{dead} bit to 1 and clear out the
7770 @code{next}, @code{prev}, @code{hchild}, and @code{vchild} fields, for
7771 GC purposes.
7772
7773 @item
7774 Most frames actually have two top-level windows---one for the
7775 minibuffer and one (the @dfn{root}) for everything else.  The modeline
7776 (if present) separates these two.  The @code{next} field of the root
7777 points to the minibuffer, and the @code{prev} field of the minibuffer
7778 points to the root.  The other @code{next} and @code{prev} fields are
7779 @code{nil}, and the frame points to both of these windows.
7780 Minibuffer-less frames have no minibuffer window, and the @code{next}
7781 and @code{prev} of the root window are @code{nil}.  Minibuffer-only
7782 frames have no root window, and the @code{next} of the minibuffer window
7783 is @code{nil} but the @code{prev} points to itself. (#### This is an
7784 artifact that should be fixed.)
7785 @end enumerate
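Rules (1) and (2) together determine what a split does to the tree. The following sketch uses a hypothetical @code{window} structure and an invented @code{split_window} function; it is not the actual XEmacs code. If the parent already combines along the requested axis, the new leaf is added as a sibling. Otherwise a new combination window takes the old leaf's place. Minibuffer windows and window sizes are ignored.

@verbatim
#include <stdlib.h>

enum axis { HORIZONTAL, VERTICAL };   /* HORIZONTAL: side by side */

/* Hypothetical window node; only the tree links are kept. */
struct window
{
  struct window *parent, *next, *prev;
  struct window *hchild, *vchild;     /* first child, if a combination */
  int is_leaf;
};

/* Nonzero if W is a combination whose children run along AXIS. */
static int
combines_along (const struct window *w, enum axis axis)
{
  if (w == NULL)
    return 0;
  return axis == HORIZONTAL ? w->hchild != NULL : w->vchild != NULL;
}

/* Split leaf W along AXIS.  Returns the new leaf, or NULL if W is not
   a leaf or memory runs out.  *ROOT is updated if W was the root. */
static struct window *
split_window (struct window **root, struct window *w, enum axis axis)
{
  struct window *leaf;

  if (!w->is_leaf)
    return NULL;                      /* only leaves can be split */
  leaf = calloc (1, sizeof *leaf);
  if (leaf == NULL)
    return NULL;
  leaf->is_leaf = 1;

  if (!combines_along (w->parent, axis))
    {
      /* Case (a): a new combination along AXIS takes W's place and W
         becomes its first child.  Its parent, if any, runs along the
         other axis, so rule (1) still holds. */
      struct window *combo = calloc (1, sizeof *combo);
      if (combo == NULL)
        {
          free (leaf);
          return NULL;
        }
      combo->parent = w->parent;
      combo->next = w->next;
      combo->prev = w->prev;
      if (w->prev != NULL)
        w->prev->next = combo;
      else if (w->parent == NULL)
        *root = combo;
      else if (w->parent->hchild == w)
        w->parent->hchild = combo;
      else
        w->parent->vchild = combo;
      if (w->next != NULL)
        w->next->prev = combo;
      if (axis == HORIZONTAL)
        combo->hchild = w;
      else
        combo->vchild = w;
      w->parent = combo;
      w->next = w->prev = NULL;
    }

  /* Case (b), and the end of case (a): link LEAF in right after W. */
  leaf->parent = w->parent;
  leaf->prev = w;
  leaf->next = w->next;
  if (w->next != NULL)
    w->next->prev = leaf;
  w->next = leaf;
  return leaf;
}
@end verbatim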
7786
7787 @node The Window Object
7788 @section The Window Object
7789
7790   Windows have the following accessible fields:
7791
7792 @table @code
7793 @item frame
7794 The frame that this window is on.
7795
7796 @item mini_p
7797 Non-@code{nil} if this window is a minibuffer window.
7798
7799 @item buffer
7800 The buffer that the window is displaying.  This may change often during
7801 the life of the window.
7802
7803 @item dedicated
7804 Non-@code{nil} if this window is dedicated to its buffer.
7805
7806 @item pointm
7807 @cindex window point internals
7808 This is the value of point in the current buffer when this window is
7809 selected; when it is not selected, it retains its previous value.
7810
7811 @item start
7812 The position in the buffer that is the first character to be displayed
7813 in the window.
7814
7815 @item force_start
7816 If this flag is non-@code{nil}, it says that the window has been
7817 scrolled explicitly by the Lisp program.  This affects what the next
7818 redisplay does if point is off the screen: instead of scrolling the
7819 window to show the text around point, it moves point to a location that
7820 is on the screen.
7821
7822 @item last_modified
7823 The @code{modified} field of the window's buffer, as of the last time
7824 a redisplay completed in this window.
7825
7826 @item last_point
7827 The buffer's value of point, as of the last time
7828 a redisplay completed in this window.
7829
7830 @item left
7831 This is the left-hand edge of the window, measured in columns.  (The
7832 leftmost column on the screen is @w{column 0}.)
7833
7834 @item top
7835 This is the top edge of the window, measured in lines.  (The top line on
7836 the screen is @w{line 0}.)
7837
7838 @item height
7839 The height of the window, measured in lines.
7840
7841 @item width
7842 The width of the window, measured in columns.
7843
7844 @item next
7845 This is the window that is the next in the chain of siblings.  It is
7846 @code{nil} in a window that is the rightmost or bottommost of a group of
7847 siblings.
7848
7849 @item prev
7850 This is the window that is the previous in the chain of siblings.  It is
7851 @code{nil} in a window that is the leftmost or topmost of a group of
7852 siblings.
7853
7854 @item parent
7855 Internally, XEmacs arranges windows in a tree; each group of siblings has
7856 a parent window whose area includes all the siblings.  This field points
7857 to a window's parent.
7858
7859 Parent windows do not display buffers, and play little role in display
7860 except to shape their child windows.  Emacs Lisp programs usually have
7861 no access to the parent windows; they operate on the windows at the
7862 leaves of the tree, which actually display buffers.
7863
7864 @item hscroll
7865 This is the number of columns that the display in the window is scrolled
7866 horizontally to the left.  Normally, this is 0.
7867
7868 @item use_time
7869 This is the last time that the window was selected.  The function
7870 @code{get-lru-window} uses this field.
7871
7872 @item display_table
7873 The window's display table, or @code{nil} if none is specified for it.
7874
7875 @item update_mode_line
7876 Non-@code{nil} means this window's mode line needs to be updated.
7877
7878 @item base_line_number
7879 The line number of a certain position in the buffer, or @code{nil}.
7880 This is used for displaying the line number of point in the mode line.
7881
7882 @item base_line_pos
7883 The position in the buffer for which the line number is known, or
7884 @code{nil} meaning none is known.
7885
7886 @item region_showing
7887 If the region (or part of it) is highlighted in this window, this field
7888 holds the mark position that made one end of that region.  Otherwise,
7889 this field is @code{nil}.
7890 @end table
7891
7892 @node The Redisplay Mechanism, Extents, Consoles; Devices; Frames; Windows, Top
7893 @chapter The Redisplay Mechanism
7894
7895   The redisplay mechanism is one of the most complicated sections of
7896 XEmacs, especially from a conceptual standpoint.  This is doubly so
7897 because, unlike for the basic aspects of the Lisp interpreter, the
7898 computer science theories of how to efficiently handle redisplay are not
7899 well-developed.
7900
7901   When working with the redisplay mechanism, remember the Golden Rules
7902 of Redisplay:
7903
7904 @enumerate
7905 @item
7906 It Is Better To Be Correct Than Fast.
7907 @item
7908 Thou Shalt Not Run Elisp From Within Redisplay.
7909 @item
7910 It Is Better To Be Fast Than Not To Be.
7911 @end enumerate
7912
7913 @menu
7914 * Critical Redisplay Sections::
7915 * Line Start Cache::
7916 * Redisplay Piece by Piece::
7917 @end menu
7918
7919 @node Critical Redisplay Sections
7920 @section Critical Redisplay Sections
7921 @cindex critical redisplay sections
7922
Within a critical redisplay section, we are defenseless and assume that
the following cannot happen:
7925
7926 @enumerate
7927 @item
7928 garbage collection
7929 @item
7930 Lisp code evaluation
7931 @item
7932 frame size changes
7933 @end enumerate
7934
7935 We ensure (3) by calling @code{hold_frame_size_changes()}, which
7936 will cause any pending frame size changes to get put on hold
7937 till after the end of the critical section.  (1) follows
7938 automatically if (2) is met.  #### Unfortunately, there are
7939 some places where Lisp code can be called within this section.
7940 We need to remove them.
7941
7942 If @code{Fsignal()} is called during this critical section, we
7943 will @code{abort()}.
7944
7945 If garbage collection is called during this critical section,
7946 we simply return. #### We should abort instead.
7947
7948 #### If a frame-size change does occur we should probably
7949 actually be preempting redisplay.
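The idea behind @code{hold_frame_size_changes()} can be modeled with a nesting counter, as in the sketch below. Everything here (@code{hold_size_changes}, @code{unhold_size_changes}, @code{request_size_change}, and the @code{frame} fields) is hypothetical. It only illustrates deferring a size change until the critical section ends and is not the XEmacs implementation.

@verbatim
/* Hypothetical frame with a slot for one pending size change. */
struct frame
{
  int width, height;
  int pending_width, pending_height;
  int size_change_pending;
};

/* Nesting depth of critical sections; changes are held while > 0. */
static int size_change_hold_depth;

static void
apply_size_change (struct frame *f)
{
  f->width = f->pending_width;
  f->height = f->pending_height;
  f->size_change_pending = 0;
}

/* Enter a critical section. */
static void
hold_size_changes (void)
{
  size_change_hold_depth++;
}

/* Leave a critical section; apply a deferred change once the
   outermost section has ended. */
static void
unhold_size_changes (struct frame *f)
{
  if (--size_change_hold_depth == 0 && f->size_change_pending)
    apply_size_change (f);
}

/* Called when, e.g., the window system reports a new size. */
static void
request_size_change (struct frame *f, int width, int height)
{
  f->pending_width = width;
  f->pending_height = height;
  f->size_change_pending = 1;
  if (size_change_hold_depth == 0)
    apply_size_change (f);
}
@end verbatim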
7950
7951 @node Line Start Cache
7952 @section Line Start Cache
7953 @cindex line start cache
7954
7955   The traditional scrolling code in Emacs breaks in a variable height
7956 world.  It depends on the key assumption that the number of lines that
7957 can be displayed at any given time is fixed.  This led to a complete
7958 separation of the scrolling code from the redisplay code.  In order to
7959 fully support variable height lines, the scrolling code must actually be
7960 tightly integrated with redisplay.  Only redisplay can determine how
7961 many lines will be displayed on a screen for any given starting point.
7962
7963   What is ideally wanted is a complete list of the starting buffer
7964 position for every possible display line of a buffer along with the
7965 height of that display line.  Maintaining such a full list would be very
7966 expensive.  We settle for having it include information for all areas
7967 which we happen to generate anyhow (i.e. the region currently being
7968 displayed) and for those areas we need to work with.
7969
7970   In order to ensure that the cache accurately represents what redisplay
7971 would actually show, it is necessary to invalidate it in many
7972 situations.  If the buffer changes, the starting positions may no longer
7973 be correct.  If a face or an extent has changed then the line heights
7974 may have altered.  These events happen frequently enough that the cache
can end up being constantly invalidated.  With this potentially constant
invalidation, when is the cache ever useful?
7977
7978   Even if the cache is invalidated before every single usage, it is
7979 necessary.  Scrolling often requires knowledge about display lines which
7980 are actually above or below the visible region.  The cache provides a
7981 convenient light-weight method of storing this information for multiple
7982 display regions.  This knowledge is necessary for the scrolling code to
7983 always obey the First Golden Rule of Redisplay.
7984
  If the cache already contains all of the information that the scrolling
routines need, so that they do not have to generate it themselves, then
we are able to obey the Third Golden Rule of Redisplay.  The first thing
7988 we do to help out the cache is to always add the displayed region.  This
7989 region had to be generated anyway, so the cache ends up getting the
7990 information basically for free.  In those cases where a user is simply
7991 scrolling around viewing a buffer there is a high probability that this
7992 is sufficient to always provide the needed information.  The second
7993 thing we can do is be smart about invalidating the cache.
7994
7995   TODO---Be smart about invalidating the cache.  Potential places:
7996
7997 @itemize @bullet
7998 @item
7999 Insertions at end-of-line which don't cause line-wraps do not alter the
8000 starting positions of any display lines.  These types of buffer
8001 modifications should not invalidate the cache.  This is actually a large
8002 optimization for redisplay speed as well.
8003 @item
8004 Buffer modifications frequently only affect the display of lines at and
8005 below where they occur.  In these situations we should only invalidate
8006 the part of the cache starting at where the modification occurs.
8007 @end itemize
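The second suggestion above can be sketched as follows. The sketch assumes a hypothetical cache that stores display-line start positions and heights in increasing order of position. @code{line_start_cache}, @code{cache_add}, and @code{cache_invalidate_from} are illustrative names. The sketch keeps only the entries for lines before the one containing the modification.

@verbatim
#include <stddef.h>

/* Hypothetical cache entry: where a display line starts and how tall
   it is in pixels. */
struct line_start
{
  long start;
  int height;
};

/* Entries are kept in increasing order of START. */
struct line_start_cache
{
  struct line_start entries[64];
  size_t count;
};

/* Record a display line that redisplay generated anyway; lines are
   assumed to arrive in increasing order of position. */
static void
cache_add (struct line_start_cache *c, long start, int height)
{
  if (c->count < sizeof c->entries / sizeof c->entries[0])
    {
      c->entries[c->count].start = start;
      c->entries[c->count].height = height;
      c->count++;
    }
}

/* Partial invalidation: a change at POS is assumed to affect only the
   display line containing POS and those after it, so keep only the
   entries for the lines before that one. */
static void
cache_invalidate_from (struct line_start_cache *c, long pos)
{
  size_t n = 0;

  while (n < c->count && c->entries[n].start <= pos)
    n++;
  /* entries[n - 1], if any, is the line containing POS. */
  c->count = n > 0 ? n - 1 : 0;
}
@end verbatim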
8008
8009   In case you're wondering, the Second Golden Rule of Redisplay is not
8010 applicable.
8011
8012 @node Redisplay Piece by Piece
8013 @section Redisplay Piece by Piece
8014 @cindex Redisplay Piece by Piece
8015
As you can begin to see, redisplay is complex and also not well
documented.  Chuck no longer works on XEmacs, so this section is my take
on the workings of redisplay.
8019
Redisplay happens in three phases:

@enumerate
@item
Determine the desired display in the area that needs redisplay.
Implemented by @code{redisplay.c}.
@item
Compare the desired display with the current display.
Implemented by @code{redisplay-output.c}.
@item
Output the changes.  Implemented by @code{redisplay-output.c},
@code{redisplay-x.c}, @code{redisplay-msw.c} and @code{redisplay-tty.c}.
@end enumerate

Steps 1 and 2 are device-independent and relatively complex.  Step 3 is
mostly device-dependent.
8036
@subheading Determining the desired display
8038
8039 Display attributes are stored in @code{display_line} structures. Each
@code{display_line} consists of a set of @code{display_block}s and each
@code{display_block} contains a number of @code{rune}s.  Generally,
each window holds dynarrs of @code{display_line}s representing
8043 the current display and the desired display.
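The following is a much simplified sketch of these nested structures, together with the comparison performed in phase 2. The field names and the @code{find_changed_lines} function are hypothetical. The real structures hold far more information and are stored in dynarrs rather than plain arrays.

@verbatim
#include <stddef.h>

/* Hypothetical, cut-down redisplay structures. */
struct rune { int ch; int face; };              /* one displayed glyph */
struct display_block
{
  const struct rune *runes;
  size_t nrunes;
};
struct display_line
{
  const struct display_block *blocks;
  size_t nblocks;
};

static int
runes_equal (const struct rune *a, const struct rune *b)
{
  return a->ch == b->ch && a->face == b->face;
}

static int
lines_equal (const struct display_line *a, const struct display_line *b)
{
  size_t i, j;

  if (a->nblocks != b->nblocks)
    return 0;
  for (i = 0; i < a->nblocks; i++)
    {
      const struct display_block *x = &a->blocks[i], *y = &b->blocks[i];
      if (x->nrunes != y->nrunes)
        return 0;
      for (j = 0; j < x->nrunes; j++)
        if (!runes_equal (&x->runes[j], &y->runes[j]))
          return 0;
    }
  return 1;
}

/* Phase 2 in miniature: flag the desired lines that differ from the
   current ones; only those would be handed to phase 3 for output. */
static size_t
find_changed_lines (const struct display_line *current, size_t ncurrent,
                    const struct display_line *desired, size_t ndesired,
                    unsigned char *changed)
{
  size_t i, nchanged = 0;

  for (i = 0; i < ndesired; i++)
    {
      changed[i] = i >= ncurrent || !lines_equal (&current[i], &desired[i]);
      nchanged += changed[i];
    }
  return nchanged;
}
@end verbatim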
8044
The @code{display_line} structures are tightly tied to buffers, which
8046 presents a problem for redisplay as this connection is bogus for the
8047 modeline. Hence the @code{display_line} generation routines are
8048 duplicated for generating the modeline. This means that the modeline
8049 display code has many bugs that the standard redisplay code does not.
8050
8051 The guts of @code{display_line} generation are in
8052 @code{create_text_block}, which creates a single display line for the
8053 desired locale. This incrementally parses the characters on the current
8054 line and generates redisplay structures for each.
8055
8056 Gutter redisplay is different. Because the data to display is stored in
8057 a string we cannot use @code{create_text_block}. Instead we use
8058 @code{create_text_string_block} which performs the same function as
8059 @code{create_text_block} but for strings. Many of the complexities of
8060 @code{create_text_block} to do with cursor handling and selective
8061 display have been removed.
8062
8063 @node Extents, Faces, The Redisplay Mechanism, Top
8064 @chapter Extents
8065
8066 @menu
8067 * Introduction to Extents::     Extents are ranges over text, with properties.
8068 * Extent Ordering::             How extents are ordered internally.
8069 * Format of the Extent Info::   The extent information in a buffer or string.
8070 * Zero-Length Extents::         A weird special case.
8071 * Mathematics of Extent Ordering::      A rigorous foundation.
8072 * Extent Fragments::            Cached information useful for redisplay.
8073 @end menu
8074
8075 @node Introduction to Extents
8076 @section Introduction to Extents
8077
8078   Extents are regions over a buffer, with a start and an end position
8079 denoting the region of the buffer included in the extent.  In
8080 addition, either end can be closed or open, meaning that the endpoint
8081 is or is not logically included in the extent.  Insertion of a character
8082 at a closed endpoint causes the character to go inside the extent;
8083 insertion at an open endpoint causes the character to go outside.
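The effect of open and closed endpoints on insertion can be expressed as a single adjustment function. The sketch below assumes a hypothetical @code{extent} structure with plain integer positions. @code{adjust_for_insertion} is an illustrative name, not an XEmacs function.

@verbatim
/* Hypothetical extent with plain integer endpoints. */
struct extent
{
  long start, end;
  int start_open, end_open;     /* nonzero = endpoint is open */
};

/* Adjust E for an insertion of N characters at position POS. */
static void
adjust_for_insertion (struct extent *e, long pos, long n)
{
  /* The start moves if text goes in before it, or exactly at an open
     start (the new text stays outside the extent). */
  if (e->start > pos || (e->start == pos && e->start_open))
    e->start += n;
  /* The end moves if text goes in before it, or exactly at a closed
     end (the new text goes inside the extent). */
  if (e->end > pos || (e->end == pos && !e->end_open))
    e->end += n;
}
@end verbatim

Applied to a zero-length extent, the same two tests produce the insertion behavior described for zero-length extents. A closed-closed extent grows, a closed-open one stays before the new text, and an open-closed one ends up after it.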
8084
8085   Extent endpoints are stored using memory indices (see @file{insdel.c}),
8086 to minimize the amount of adjusting that needs to be done when
8087 characters are inserted or deleted.
8088
8089   (Formerly, extent endpoints at the gap could be either before or
8090 after the gap, depending on the open/closedness of the endpoint.
8091 The intent of this was to make it so that insertions would
8092 automatically go inside or out of extents as necessary with no
8093 further work needing to be done.  It didn't work out that way,
8094 however, and just ended up complexifying and buggifying all the
8095 rest of the code.)
8096
8097 @node Extent Ordering
8098 @section Extent Ordering
8099
8100   Extents are compared using memory indices.  There are two orderings
8101 for extents and both orders are kept current at all times.  The normal
8102 or @dfn{display} order is as follows:
8103
8104 @example
8105 Extent A is ``less than'' extent B,
8106 that is, earlier in the display order,
8107   if:    A-start < B-start,
8108   or if: A-start = B-start, and A-end > B-end
8109 @end example
8110
8111   So if two extents begin at the same position, the larger of them is the
8112 earlier one in the display order (@code{EXTENT_LESS} is true).
8113
8114   For the e-order, the same thing holds:
8115
8116 @example
8117 Extent A is ``less than'' extent B in e-order,
that is, earlier in the e-order,
8119   if:    A-end < B-end,
8120   or if: A-end = B-end, and A-start > B-start
8121 @end example
8122
8123   So if two extents end at the same position, the smaller of them is the
8124 earlier one in the e-order (@code{EXTENT_E_LESS} is true).
8125
8126   The display order and the e-order are complementary orders: any
8127 theorem about the display order also applies to the e-order if you swap
8128 all occurrences of ``display order'' and ``e-order'', ``less than'' and
8129 ``greater than'', and ``extent start'' and ``extent end''.
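The two orderings translate directly into comparison functions. These are sketches modeled on the descriptions of @code{EXTENT_LESS} and @code{EXTENT_E_LESS} above, written against a hypothetical structure with integer endpoints. The real macros work on memory indices and may differ in detail.

@verbatim
/* Hypothetical extent with just its two endpoints. */
struct extent { long start, end; };

/* Display order: earlier start first; among equal starts, the larger
   (longer) extent comes first. */
static int
extent_less (const struct extent *a, const struct extent *b)
{
  return a->start < b->start
         || (a->start == b->start && a->end > b->end);
}

/* E-order: the mirror image, comparing ends first; among equal ends,
   the smaller (later-starting) extent comes first. */
static int
extent_e_less (const struct extent *a, const struct extent *b)
{
  return a->end < b->end
         || (a->end == b->end && a->start > b->start);
}
@end verbatim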
8130
8131 @node Format of the Extent Info
8132 @section Format of the Extent Info
8133
8134   An extent-info structure consists of a list of the buffer or string's
8135 extents and a @dfn{stack of extents} that lists all of the extents over
8136 a particular position.  The stack-of-extents info is used for
8137 optimization purposes---it basically caches some info that might
8138 be expensive to compute.  Certain otherwise hard computations are easy
8139 given the stack of extents over a particular position, and if the
8140 stack of extents over a nearby position is known (because it was
8141 calculated at some prior point in time), it's easy to move the stack
8142 of extents to the proper position.
8143
8144   Given that the stack of extents is an optimization, and given that
8145 it requires memory, a string's stack of extents is wiped out each
8146 time a garbage collection occurs.  Therefore, any time you retrieve
8147 the stack of extents, it might not be there.  If you need it to
8148 be there, use the @code{_force} version.
8149
8150   Similarly, a string may or may not have an extent_info structure.
8151 (Generally it won't if there haven't been any extents added to the
8152 string.) So use the @code{_force} version if you need the extent_info
8153 structure to be there.
8154
8155   A list of extents is maintained as a double gap array: one gap array
8156 is ordered by start index (the @dfn{display order}) and the other is
8157 ordered by end index (the @dfn{e-order}).  Note that positions in an
8158 extent list should logically be conceived of as referring @emph{to} a
8159 particular extent (as is the norm in programs) rather than sitting
8160 between two extents.  Note also that callers of these functions should
8161 not be aware of the fact that the extent list is implemented as an
8162 array, except for the fact that positions are integers (this should be
8163 generalized to handle integers and linked list equally well).
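For readers unfamiliar with gap arrays, here is a minimal fixed-capacity sketch of one holding integers. It is a generic illustration, not XEmacs's gap array code; @code{gap_array}, @code{ga_insert}, and the other names are hypothetical. The key property is that an insertion near the previous one only moves the elements lying between the old and new gap positions.

@verbatim
#include <string.h>

#define GA_CAP 16

/* Elements live in data[0, gap) and data[gap + gapsize, GA_CAP). */
struct gap_array
{
  int data[GA_CAP];
  int gap;                      /* physical index where the gap starts */
  int gapsize;                  /* number of free slots */
};

static void
ga_init (struct gap_array *ga)
{
  ga->gap = 0;
  ga->gapsize = GA_CAP;
}

static int
ga_length (const struct gap_array *ga)
{
  return GA_CAP - ga->gapsize;
}

/* Move the gap so that it starts at logical position POS. */
static void
ga_move_gap (struct gap_array *ga, int pos)
{
  if (pos < ga->gap)
    memmove (&ga->data[pos + ga->gapsize], &ga->data[pos],
             (size_t) (ga->gap - pos) * sizeof (int));
  else if (pos > ga->gap)
    memmove (&ga->data[ga->gap], &ga->data[ga->gap + ga->gapsize],
             (size_t) (pos - ga->gap) * sizeof (int));
  ga->gap = pos;
}

/* Insert VALUE at logical position POS; returns 0 if full or invalid. */
static int
ga_insert (struct gap_array *ga, int pos, int value)
{
  if (ga->gapsize == 0 || pos < 0 || pos > ga_length (ga))
    return 0;
  ga_move_gap (ga, pos);
  ga->data[ga->gap++] = value;
  ga->gapsize--;
  return 1;
}

/* Element at logical position POS (assumed valid). */
static int
ga_get (const struct gap_array *ga, int pos)
{
  return pos < ga->gap ? ga->data[pos] : ga->data[pos + ga->gapsize];
}
@end verbatim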
8164
8165 @node Zero-Length Extents
8166 @section Zero-Length Extents
8167
8168   Extents can be zero-length, and will end up that way if their endpoints
8169 are explicitly set that way or if their detachable property is nil
8170 and all the text in the extent is deleted. (The exception is open-open
8171 zero-length extents, which are barred from existing because there is
8172 no sensible way to define their properties.  Deletion of the text in
8173 an open-open extent causes it to be converted into a closed-open
8174 extent.)  Zero-length extents are primarily used to represent
8175 annotations, and behave as follows:
8176
8177 @enumerate
8178 @item
8179 Insertion at the position of a zero-length extent expands the extent
8180 if both endpoints are closed; goes after the extent if it is closed-open;
8181 and goes before the extent if it is open-closed.
8182
8183 @item
8184 Deletion of a character on a side of a zero-length extent whose
8185 corresponding endpoint is closed causes the extent to be detached if
8186 it is detachable; if the extent is not detachable or the corresponding
8187 endpoint is open, the extent remains in the buffer, moving as necessary.
8188 @end enumerate
8189
8190   Note that closed-open, non-detachable zero-length extents behave
8191 exactly like markers and that open-closed, non-detachable zero-length
8192 extents behave like the ``point-type'' marker in Mule.
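The sketch below shows one reading of the deletion rule. The structure and @code{zextent_delete} are hypothetical. Here the ``side'' of the extent is taken to be the character immediately before its position (start side) or immediately after it (end side). This behavior is an interpretation of the rule above, not the XEmacs implementation.

@verbatim
/* Hypothetical zero-length extent sitting at buffer position POS. */
struct zextent
{
  long pos;
  int start_open, end_open;     /* nonzero = endpoint is open */
  int detachable;
  int detached;
};

/* Apply the deletion of the characters in [FROM, TO) to E. */
static void
zextent_delete (struct zextent *e, long from, long to)
{
  /* Is the character just before (start side) or just after (end
     side) the extent among the deleted ones? */
  int hits_start_side = from < e->pos && to >= e->pos;
  int hits_end_side = from <= e->pos && to > e->pos;

  if (e->detachable
      && ((hits_start_side && !e->start_open)
          || (hits_end_side && !e->end_open)))
    {
      e->detached = 1;
      return;
    }
  if (e->pos >= to)
    e->pos -= to - from;        /* deleted text was before the extent */
  else if (e->pos > from)
    e->pos = from;              /* deleted text surrounded the extent */
}
@end verbatim

A closed-open, non-detachable instance behaves like a marker here: it simply slides back as text before it disappears.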
8193
8194 @node Mathematics of Extent Ordering
8195 @section Mathematics of Extent Ordering
8196 @cindex extent mathematics
8197 @cindex mathematics of extents
8198 @cindex extent ordering
8199
8200 @cindex display order of extents
8201 @cindex extents, display order
8202   The extents in a buffer are ordered by ``display order'' because that
is the order in which the redisplay mechanism needs to process them.
8204 The e-order is an auxiliary ordering used to facilitate operations
8205 over extents.  The operations that can be performed on the ordered
8206 list of extents in a buffer are
8207
8208 @enumerate
8209 @item
8210 Locate where an extent would go if inserted into the list.
8211 @item
8212 Insert an extent into the list.
8213 @item
8214 Remove an extent from the list.
8215 @item
8216 Map over all the extents that overlap a range.
8217 @end enumerate
8218
8219   (4) requires being able to determine the first and last extents
8220 that overlap a range.
8221
8222   NOTE: @dfn{overlap} is used as follows:
8223
8224 @itemize @bullet
8225 @item
8226 two ranges overlap if they have at least one point in common.
8227 Whether the endpoints are open or closed makes a difference here.
8228 @item
8229 a point overlaps a range if the point is contained within the
8230 range; this is equivalent to treating a point @math{P} as the range
8231 @math{[P, P]}.
8232 @item
8233 In the case of an @emph{extent} overlapping a point or range, the extent
8234 is normally treated as having closed endpoints.  This applies
8235 consistently in the discussion of stacks of extents and such below.
8236 Note that this definition of overlap is not necessarily consistent with
8237 the extents that @code{map-extents} maps over, since @code{map-extents}
sometimes pays attention to whether the endpoints of an extent are open
8239 or closed.  But for our purposes, it greatly simplifies things to treat
8240 all extents as having closed endpoints.
8241 @end itemize
8242
8243 First, define @math{>}, @math{<}, @math{<=}, etc. as applied to extents
8244 to mean comparison according to the display order.  Comparison between
8245 an extent @math{E} and an index @math{I} means comparison between
8246 @math{E} and the range @math{[I, I]}.
8247
8248 Also define @math{e>}, @math{e<}, @math{e<=}, etc. to mean comparison
8249 according to the e-order.
8250
8251 For any range @math{R}, define @math{R(0)} to be the starting index of
8252 the range and @math{R(1)} to be the ending index of the range.
8253
8254 For any extent @math{E}, define @math{E(next)} to be the extent directly
8255 following @math{E}, and @math{E(prev)} to be the extent directly
8256 preceding @math{E}.  Assume @math{E(next)} and @math{E(prev)} can be
8257 determined from @math{E} in constant time.  (This is because we store
8258 the extent list as a doubly linked list.)
8259
8260 Similarly, define @math{E(e-next)} and @math{E(e-prev)} to be the
8261 extents directly following and preceding @math{E} in the e-order.
8262
8263 Now:
8264
8265 Let @math{R} be a range.
8266 Let @math{F} be the first extent overlapping @math{R}.
8267 Let @math{L} be the last extent overlapping @math{R}.
8268
8269 Theorem 1: @math{R(1)} lies between @math{L} and @math{L(next)},
8270 i.e. @math{L <= R(1) < L(next)}.
8271
8272   This follows easily from the definition of display order.  The
8273 basic reason that this theorem applies is that the display order
8274 sorts by increasing starting index.
8275
8276   Therefore, we can determine @math{L} just by looking at where we would
8277 insert @math{R(1)} into the list, and if we know @math{F} and are moving
8278 forward over extents, we can easily determine when we've hit @math{L} by
8279 comparing the extent we're at to @math{R(1)}.
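Theorem 1 supplies the stopping condition for a forward walk in display order. The sketch below works over a hypothetical sorted array with integer endpoints. For simplicity it starts from the beginning of the list rather than from @math{F}. It still has to check each extent's end, because an extent that starts at or before @math{R(1)} may end before @math{R(0)}. The names are illustrative.

@verbatim
#include <stddef.h>

/* Hypothetical extent with closed integer endpoints. */
struct extent { long start, end; };

typedef void (*extent_fn) (const struct extent *e, void *arg);

/* Call FN (if non-NULL) on each extent in LIST that overlaps the closed
   range [R0, R1], and return how many there were.  LIST holds N extents
   sorted in display order, so the walk can stop at the first extent
   that starts after R1. */
static size_t
map_overlapping (const struct extent *list, size_t n, long r0, long r1,
                 extent_fn fn, void *arg)
{
  size_t i, mapped = 0;

  for (i = 0; i < n && list[i].start <= r1; i++)
    if (list[i].end >= r0)      /* it may still end before R0 */
      {
        if (fn != NULL)
          fn (&list[i], arg);
        mapped++;
      }
  return mapped;
}
@end verbatim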

Theorem 2: @math{F(e-prev) e< [1, R(0)] e<= F}.

  This is the analog of Theorem 1, and applies because the e-order
sorts by increasing ending index.

  Therefore, @math{F} can be found in the same amount of time as
operation (1), i.e. the time that it takes to locate where an extent
would go if inserted into the e-order list.

  If the lists were stored as balanced binary trees, then operation (1)
would take logarithmic time, which is usually quite fast.  However,
currently they're stored as simple doubly-linked lists, and instead we
do some caching to try to speed things up.
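
The logarithmic-time alternative can be illustrated with a binary search.  This sketch uses a sorted array rather than a balanced binary tree.  The names, the array representation, and the e-order tie-break are all assumptions.  It locates where a key such as @math{[1, R(0)]} would be inserted into the e-order.

```c
#include <assert.h>

/* Hypothetical extent with closed endpoints. */
typedef struct
{
  int start, end;
} range;

/* Assumed e-order: by end index, then later start first. */
static int
e_less (range a, range b)
{
  return a.end < b.end || (a.end == b.end && a.start > b.start);
}

/* Return the index at which KEY would be inserted into LIST (sorted in
   e-order): the first element that is not e-less than KEY.  A sorted
   array stands in for a balanced tree; either way the search takes
   O(log n) comparisons instead of a linear walk. */
int
e_order_locate (const range *list, int n, range key)
{
  int lo = 0, hi = n;            /* the answer lies in [lo, hi] */

  assert (n >= 0);
  while (lo < hi)
    {
      int mid = lo + (hi - lo) / 2;

      if (e_less (list[mid], key))
        lo = mid + 1;
      else
        hi = mid;
    }
  return lo;
}
```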

  Define a @dfn{stack of extents} (or @dfn{SOE}) as the set of extents
(ordered in the display order) that overlap an index @math{I}, together
with the SOE's @dfn{previous} extent, which is an extent that precedes
@math{I} in the e-order.  (Hopefully there will not be very many extents
between @math{I} and the previous extent.)
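
An SOE can be built from scratch with a simple scan.  The sketch below rests on these assumptions:

@itemize @bullet
@item
the @code{soe} record, its fixed capacity, and the function names are hypothetical;
@item
the list is a display-ordered array;
@item
the e-order tie-break is assumed.
@end itemize

For the previous extent, the sketch picks the extent closest to @math{I} in the e-order among those lying wholly before @math{I}.  That is the best case for the hope expressed above.  The definition itself only requires some extent that precedes @math{I}.

```c
#include <assert.h>

#define SOE_MAX 16   /* arbitrary capacity for this sketch */

/* Hypothetical extent with closed endpoints. */
typedef struct
{
  int start, end;
} range;

/* Hypothetical SOE record: indices (into a display-ordered array) of
   the extents overlapping POS, in display order, plus the index of the
   previous extent, or -1 if there is none. */
typedef struct
{
  int pos;
  int count;
  int members[SOE_MAX];
  int previous;
} soe;

/* Assumed e-order: by end index, then later start first. */
static int
e_less (range a, range b)
{
  return a.end < b.end || (a.end == b.end && a.start > b.start);
}

/* Build the SOE for POS from scratch.  An extent precedes POS in the
   e-order exactly when it ends before POS; among those, keep the
   e-greatest, i.e. the one closest to POS.  An extent that starts after
   POS can neither overlap POS nor precede it, and neither can any later
   extent in display order, so the scan stops there. */
void
soe_build (soe *s, const range *list, int n, int pos)
{
  s->pos = pos;
  s->count = 0;
  s->previous = -1;
  for (int i = 0; i < n && list[i].start <= pos; i++)
    {
      if (pos <= list[i].end)
        {
          assert (s->count < SOE_MAX);
          s->members[s->count++] = i;
        }
      else if (s->previous < 0 || e_less (list[s->previous], list[i]))
        s->previous = i;
    }
}
```

The first member, @code{members[0]}, plays the role of @math{F} in the theorems that follow.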

Now:

Let @math{I} be an index, let @math{S} be the stack of extents on
@math{I}, let @math{F} be the first extent in @math{S}, and let @math{P}
be @math{S}'s previous extent.

Theorem 3: If @math{S} is non-empty, the first extent in @math{S} is the
first extent that overlaps any range @math{[I, J]} with @math{J >= I}.

Proof: Any extent that overlaps @math{[I, J]} has an end index of at
least @math{I}, so if it does not include @math{I} it must have a start
index @math{> I}.  It is therefore greater than any extent in @math{S},
since every extent in @math{S} starts at or before @math{I}.

Therefore, finding the first extent that overlaps a range @math{R} is
the same as finding the first extent that overlaps @math{R(0)}, whenever
some extent overlaps @math{R(0)}.
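
Theorem 3 and its caveat can be checked by brute force.  The sketch below uses hypothetical names, extents in a display-ordered array, and closed endpoints.  @code{first_overlapping} finds the first extent overlapping a range.  @code{theorem3_holds} confirms that, whenever some extent overlaps @math{I}, the answer for @math{[I, J]} does not depend on @math{J}.

```c
#include <assert.h>

/* Hypothetical extent with closed endpoints. */
typedef struct
{
  int start, end;
} range;

/* Index of the first extent (in display order) overlapping [R0, R1], or
   -1 if none does.  With closed endpoints an extent overlaps the range
   when start <= R1 and end >= R0. */
int
first_overlapping (const range *list, int n, int r0, int r1)
{
  assert (r0 <= r1);
  for (int i = 0; i < n && list[i].start <= r1; i++)
    if (list[i].end >= r0)
      return i;
  return -1;
}

/* Brute-force check of Theorem 3 for every J in [I, JMAX]: if some
   extent overlaps I, the first extent overlapping [I, J] is always the
   first extent overlapping I.  Returns 1 if the claim holds. */
int
theorem3_holds (const range *list, int n, int i, int jmax)
{
  int f = first_overlapping (list, n, i, i);

  if (f < 0)
    return 1;                   /* S is empty: nothing to check */
  for (int j = i; j <= jmax; j++)
    if (first_overlapping (list, n, i, j) != f)
      return 0;
  return 1;
}
```

If no extent overlaps @math{I}, the two searches can differ.  With extents @math{[1, 2]} and @math{[5, 6]}, nothing overlaps index 3, yet @math{[5, 6]} overlaps @math{[3, 6]}.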

Theorem 4: Let @math{I2} be an index such that @math{I2 > I}, and let
@math{F2} be the first extent that overlaps @math{I2}.  Then, either
@math{F2} is in @math{S} or @math{F2} is greater than any extent in
@math{S}.

Proof: If @math{F2} does not include @math{I}, then, since its end index
is at least @math{I2 > I}, its start index must be greater than
@math{I}, and thus it is greater than any extent in @math{S}, including
@math{F}.  Otherwise, @math{F2} includes @math{I} and thus is in
@math{S}, and thus @math{F2 >= F}.
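
One practical consequence of Theorem 4 is that @math{F2 >= F}.  So when moving forward from @math{I} to @math{I2}, the search for the new first extent can resume at @math{F} instead of at the head of the list.  Below is a minimal sketch, with hypothetical names and a display-ordered array standing in for the extent list.

```c
#include <assert.h>

/* Hypothetical extent with closed endpoints. */
typedef struct
{
  int start, end;
} range;

/* Index of the first extent overlapping POS, scanning LIST (sorted in
   display order) from index FROM; -1 if there is none.  By Theorem 4,
   if FROM is the index of F, the first extent overlapping some I < POS,
   nothing before FROM can be the answer. */
int
first_overlapping_from (const range *list, int n, int from, int pos)
{
  assert (from >= 0);
  for (int i = from; i < n && list[i].start <= pos; i++)
    if (list[i].end >= pos)
      return i;
  return -1;
}
```

Passing the index of @math{F} as @code{from} gives the same result as passing 0, but skips the extents before @math{F}.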

@node Extent Fragments
@section Extent Fragments
@cindex extent fragment

  Imagine that the buffer is divided up into contiguous, non-overlapping
@dfn{runs} of text such that no extent starts or ends within a run
(extents that abut the run don't count).

  An extent fragment is a structure that holds data about the run that
contains a particular buffer position.  (If the buffer position is at
the junction of two runs, the run after the position is used.)  This
data includes:

@itemize @bullet
@item
the beginning and end of the run;
@item
a list of all of the extents in that run;
@item
the @dfn{merged face} that results from merging all of the faces
corresponding to those extents;
@item
the begin and end glyphs at the beginning of the run;
@end itemize

@noindent
and so on.  This is the information that redisplay needs in order to
display the run.

  Extent fragments have to be very quick to update to a new buffer
position when moving linearly through the buffer.  They rely on the
stack-of-extents code, which does the heavy-duty algorithmic work of
determining which extents overlie a particular position.
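
A from-scratch computation of the run containing a position can be sketched as follows.  It is only an illustration, under several assumptions:

@itemize @bullet
@item
the @code{extent} and @code{fragment} types are hypothetical;
@item
extents are treated as half-open intervals here, so that every start and end position is simply a run boundary;
@item
merging faces is modelled as a bitwise OR of attribute bits.
@end itemize

Real extent fragments are updated incrementally with the help of the stack-of-extents code, rather than by rescanning every extent.

```c
#include <assert.h>

/* Hypothetical extent for this sketch: half-open [start, end). */
typedef struct
{
  int start, end;
  unsigned face;         /* hypothetical face attribute bits */
} extent;

typedef struct
{
  int start, end;        /* the run is [start, end) */
  unsigned merged_face;  /* OR of the faces of covering extents */
  int n_extents;         /* number of extents covering the run */
} fragment;

/* Fill F with the run containing POS, in a buffer spanning
   [BUF_START, BUF_END).  The run starts at the last extent endpoint at
   or before POS and ends at the first endpoint after POS, so at a
   junction POS begins the run after it. */
void
fragment_at (fragment *f, const extent *ext, int n,
             int buf_start, int buf_end, int pos)
{
  assert (buf_start <= pos && pos < buf_end);
  f->start = buf_start;
  f->end = buf_end;
  for (int i = 0; i < n; i++)
    {
      int bounds[2] = { ext[i].start, ext[i].end };

      for (int k = 0; k < 2; k++)
        {
          if (bounds[k] <= pos && bounds[k] > f->start)
            f->start = bounds[k];
          if (bounds[k] > pos && bounds[k] < f->end)
            f->end = bounds[k];
        }
    }

  /* No endpoint lies strictly inside the run, so each extent either
     covers the whole run or misses it entirely. */
  f->merged_face = 0;
  f->n_extents = 0;
  for (int i = 0; i < n; i++)
    if (ext[i].start <= f->start && ext[i].end >= f->end)
      {
        f->merged_face |= ext[i].face;
        f->n_extents++;
      }
}
```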

@node Faces, Glyphs, Extents, Top
@chapter Faces

Not yet documented.

@node Glyphs, Specifiers, Faces, Top
@chapter Glyphs

Glyphs are graphical elements that can be displayed in XEmacs buffers or
gutters.  We use the term graphical element here in the broadest possible
sense, since glyphs can range from something as mundane as text to
something as arcane as a native tab widget.

In XEmacs, glyphs represent the uninstantiated state of graphical
elements, i.e. they hold all the information necessary to produce an
image on-screen, but the image does not exist at this stage.

Glyphs are lazily instantiated by calling one of the glyph functions.
This usually occurs within redisplay when @code{Fglyph_height} is
called.  Instantiation causes an image-instance to be created and
cached.  This cache is kept on a per-device basis for all glyphs except
widget-glyphs, and on a per-window basis for widget-glyphs.  The caching
is done by @code{image_instantiate} and is necessary because it is
generally possible to display an image-instance in multiple domains.
For instance, if we create a Pixmap, we can display it in multiple
windows, even though we only need a single Pixmap instance to do this.
Without caching, it would be necessary to create an image-instance for
every displayable occurrence of a glyph, and for every usage, which
would be extremely memory- and CPU-intensive.

Widget-glyphs (also known as native widgets) are not cached per device.
This is because widget-glyph image-instances on screen are toolkit
windows, and thus cannot be reused in multiple XEmacs domains.  Thus
widget-glyphs are cached on a per-window basis.

Any action on a glyph first consults the cache before actually
instantiating a widget.
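
The two caching policies can be contrasted in a short sketch.  Everything in it is hypothetical: the @code{glyph}, @code{window_domain} and cache types, and an integer standing in for an image-instance.  It only shows the keying described above:

@itemize @bullet
@item
ordinary glyphs are looked up by device;
@item
widget-glyphs are looked up by window;
@item
the cache is consulted before anything is instantiated.
@end itemize

```c
#include <assert.h>

#define CACHE_MAX 32    /* arbitrary capacity for this sketch */

/* Hypothetical glyph: all we need is whether it is a widget-glyph. */
typedef struct
{
  const char *name;
  int is_widget;
} glyph;

/* Hypothetical display domain: a window and the device it lives on. */
typedef struct
{
  int window;
  int device;
} window_domain;

typedef struct
{
  const glyph *g;
  int key;              /* device id, or window id for widget-glyphs */
  int instance;         /* stand-in for an image-instance */
} cache_entry;

static cache_entry cache[CACHE_MAX];
static int cache_used;
static int instantiations;   /* how many instances were really made */

/* Return the instance of G for domain W, consulting the cache first. */
int
glyph_instance (const glyph *g, window_domain w)
{
  int key = g->is_widget ? w.window : w.device;

  for (int i = 0; i < cache_used; i++)
    if (cache[i].g == g && cache[i].key == key)
      return cache[i].instance;

  /* Cache miss: "instantiate" and remember the result. */
  assert (cache_used < CACHE_MAX);
  cache[cache_used].g = g;
  cache[cache_used].key = key;
  cache[cache_used].instance = ++instantiations;
  return cache[cache_used++].instance;
}
```

Displaying the same pixmap glyph in two windows on one device instantiates it once.  A widget-glyph shown in both windows is instantiated twice.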

@section Widget-Glyphs in the MS-Windows Environment

To Do

@section Widget-Glyphs in the X Environment

Widget-glyphs under X make heavy use of lwlib for manipulating the
native toolkit objects.  This is primarily so that different toolkits
can be supported for widget-glyphs, just as they are supported for
features such as menubars.

Lwlib is extremely poorly documented and quite hairy, so here is my
understanding of what goes on.

Lwlib maintains a set of @code{widget_instance}s which mirror the
hierarchical state of Xt widgets.  I think this is so that widgets can
be updated and manipulated generically by the lwlib library.  For
instance, @code{update_one_widget_instance} can cope with multiple types
of widget and multiple types of toolkit.  Each element in the widget
hierarchy is updated from its corresponding @code{widget_instance} by
walking the @code{widget_instance} tree recursively.

This design has desirable consequences.  For example,
@code{lw_modify_all_widgets}, which is called from @file{glyphs-x.c},
updates all the properties of a widget without having to know what the
widget is or what toolkit it is from.  Unfortunately, it also has hairy
consequences, such as making the lwlib code quite complex.  And of
course lwlib has to know at some level what the widget is and how to
set its properties.
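
The split between a generic tree walk and toolkit-specific updating can be sketched as follows.  The @code{winst} node, the callback type and the function names are hypothetical, and much simpler than lwlib's actual @code{widget_instance} and @code{widget_value} structures.  The point is only that the recursive walk never needs to know which toolkit it is driving.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror node with first-child / next-sibling links and
   the state the native widget should have.  Not lwlib's real types. */
typedef struct winst
{
  const char *name;
  int enabled;
  struct winst *children;
  struct winst *next;
} winst;

/* A toolkit backend supplies one function that pushes a node's state
   into its native widget. */
typedef void (*toolkit_update_fn) (winst *node, void *toolkit_data);

/* Generic part: walk the tree recursively, letting the backend update
   each native widget.  Returns the number of nodes visited. */
int
update_tree (winst *node, toolkit_update_fn update, void *toolkit_data)
{
  int visited = 0;

  assert (update != NULL);
  for (; node; node = node->next)
    {
      update (node, toolkit_data);
      visited += 1 + update_tree (node->children, update, toolkit_data);
    }
  return visited;
}

/* Example backend: counts enabled widgets instead of touching X. */
void
count_enabled (winst *node, void *data)
{
  if (node->enabled)
    ++*(int *) data;
}
```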

@node Specifiers, Menus, Glyphs, Top
@chapter Specifiers

Not yet documented.

@node Menus, Subprocesses, Specifiers, Top
@chapter Menus

  A menu is set by setting the value of the variable
@code{current-menubar} (which may be buffer-local) and then calling
@code{set-menubar-dirty-flag} to signal a change.  This will cause the
menu to be redrawn at the next redisplay.  The format of the data in
@code{current-menubar} is described in @file{menubar.c}.

  Internally the data in @code{current-menubar} is parsed into a tree of
@code{widget_value} structures (defined in @file{lwlib.h}); this is
accomplished by the recursive function
@code{menu_item_descriptor_to_widget_value()}, called by
@code{compute_menubar_data()}.  Such a tree is deallocated using
@code{free_widget_value()}.
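
The recursive conversion and the matching deallocation can be sketched with hypothetical types.  @code{menu_desc} stands in for the Lisp menu description.  @code{wv} is a tree node with @code{contents} (first child) and @code{next} (sibling) links.  This is not the real @code{widget_value} definition, and the actual @code{menu_item_descriptor_to_widget_value()} handles far more than names and nesting.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical menu description: a name and optional sub-items. */
typedef struct menu_desc
{
  const char *name;
  const struct menu_desc *items;   /* sub-items, or NULL */
  int n_items;
} menu_desc;

/* Hypothetical tree node, loosely in the spirit of widget_value. */
typedef struct wv
{
  char *name;
  struct wv *contents;   /* first child, or NULL */
  struct wv *next;       /* next sibling, or NULL */
} wv;

static void *
xmalloc_or_die (size_t size)
{
  void *p = malloc (size);
  if (!p)
    abort ();
  return p;
}

/* Recursively convert D and its sub-items into a fresh tree. */
wv *
desc_to_wv (const menu_desc *d)
{
  wv *node;
  wv **tail;

  assert (d != NULL && d->name != NULL);
  node = xmalloc_or_die (sizeof *node);
  node->name = xmalloc_or_die (strlen (d->name) + 1);
  strcpy (node->name, d->name);
  node->contents = NULL;
  node->next = NULL;
  tail = &node->contents;          /* children are appended here */
  for (int i = 0; i < d->n_items; i++)
    {
      *tail = desc_to_wv (&d->items[i]);
      tail = &(*tail)->next;
    }
  return node;
}

/* Free a tree built by desc_to_wv(), children and siblings included. */
void
free_wv (wv *node)
{
  while (node)
    {
      wv *next = node->next;
      free_wv (node->contents);
      free (node->name);
      free (node);
      node = next;
    }
}

/* Count the nodes in a tree (used to check the conversion). */
int
count_nodes (const wv *node)
{
  int n = 0;
  for (; node; node = node->next)
    n += 1 + count_nodes (node->contents);
  return n;
}
```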

  @code{update_screen_menubars()} is one of the external entry points.
It checks, for each screen, whether that screen's menubar needs to be
updated.  This is the case if

@enumerate
@item
@code{set-menubar-dirty-flag} has been called since the last redisplay.
(This function sets the C variable @code{menubar_has_changed}.)
@item
The buffer displayed in the screen has changed.
@item
The screen has no menubar currently displayed.
@end enumerate

  @code{set_screen_menubar()} is called for each such screen.  This
function calls @code{compute_menubar_data()} to create the tree of
@code{widget_value} structures, then calls @code{lw_create_widget()},
@code{lw_modify_all_widgets()}, and/or @code{lw_destroy_all_widgets()}
to create, update, or destroy the X Toolkit widget associated with the
menu.

  @code{update_psheets()}, the other external entry point, actually
changes the menus being displayed.  It uses the widgets set up by
@code{update_screen_menubars()} and calls various X functions to ensure
that the menus are displayed properly.

  The menubar widget is set up so that @code{pre_activate_callback()} is
called when the menu is first selected (i.e. when the mouse button goes
down), and @code{menubar_selection_callback()} is called when an item is
selected.  @code{pre_activate_callback()} calls the function in
@code{activate-menubar-hook}, which can change the menubar (this is
described in @file{menubar.c}).  If the menubar is changed,
@code{set_screen_menubars()} is called.
@code{menubar_selection_callback()} enqueues a menu event, putting in it
a function to call (either @code{eval} or @code{call-interactively}) and
its argument, which is the callback function or form given in the menu's
description.
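
The event enqueued by @code{menubar_selection_callback()} can be pictured as a (function, argument) pair placed on a queue.  The following are assumptions made for illustration:

@itemize @bullet
@item
the queue and the @code{menu_event} type;
@item
the rule for choosing between @code{eval} and @code{call-interactively}: a command name is called interactively, and any other form is evaluated.
@end itemize

Real events carry Lisp objects rather than strings.

```c
#include <assert.h>
#include <string.h>

#define QUEUE_MAX 8     /* arbitrary capacity for this sketch */

/* Hypothetical menu event: the Lisp function to call and its argument,
   kept as strings purely for illustration. */
typedef struct
{
  const char *function;   /* "eval" or "call-interactively" */
  const char *argument;   /* the callback from the menu description */
} menu_event;

static menu_event queue[QUEUE_MAX];
static int q_head, q_count;

/* Enqueue a menu event for a selected item.  IS_COMMAND stands in for
   whatever test the real code applies to the callback. */
void
enqueue_menu_event (const char *callback, int is_command)
{
  assert (q_count < QUEUE_MAX);
  menu_event *e = &queue[(q_head + q_count++) % QUEUE_MAX];
  e->function = is_command ? "call-interactively" : "eval";
  e->argument = callback;
}

/* Dequeue the oldest event into *OUT; returns 0 if the queue is empty. */
int
dequeue_menu_event (menu_event *out)
{
  if (q_count == 0)
    return 0;
  *out = queue[q_head];
  q_head = (q_head + 1) % QUEUE_MAX;
  q_count--;
  return 1;
}
```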

@node Subprocesses, Interface to X Windows, Menus, Top
@chapter Subprocesses

  The fields of a process are:

@table @code
@item name
A string, the name of the process.

@item command
A list containing the command arguments that were used to start this
process.

@item filter
A function used to accept output from the process instead of a buffer,
or @code{nil}.

@item sentinel
A function called whenever the status of the process changes (for
example, when it receives a signal or exits), or @code{nil}.

@item buffer
The associated buffer of the process.

@item pid
An integer, the Unix process @sc{id}.

@item childp
A flag, non-@code{nil} if this is really a child process.
It is @code{nil} for a network connection.

@item mark
A marker indicating the position of the end of the last output from this
process inserted into the buffer.  This is often but not always the end
of the buffer.

@item kill_without_query
If this is non-@code{nil}, killing XEmacs while this process is still
running does not ask for confirmation about killing the process.

@item raw_status_low
@itemx raw_status_high
These two fields record 16 bits each of the process status returned by
the @code{wait} system call.

@item status
The process status, as @code{process-status} should return it.

@item tick
@itemx update_tick
If these two fields are not equal, a change in the status of the process
needs to be reported, either by running the sentinel or by inserting a
message in the process buffer.

@item pty_flag
Non-@code{nil} if communication with the subprocess uses a @sc{pty};
@code{nil} if it uses a pipe.

@item infd
The file descriptor for input from the process.

@item outfd
The file descriptor for output to the process.

@item subtty
The file descriptor for the terminal that the subprocess is using.  (On
some systems, there is no need to record this, so the value is
@code{-1}.)

@item tty_name
The name of the terminal that the subprocess is using,
or @code{nil} if it is using pipes.
@end table

@node Interface to X Windows, Index, Subprocesses, Top
@chapter Interface to X Windows

Not yet documented.

@include index.texi

@c Print the tables of contents
@summarycontents
@contents
@c That's all

@bye
