man/internals/internals.texi

\input texinfo  @c -*-texinfo-*-
@c %**start of header
@setfilename ../../info/internals.info
@settitle XEmacs Internals Manual
@c %**end of header

@ifinfo

Copyright @copyright{} 1992 - 1996 Ben Wing.
Copyright @copyright{} 1996, 1997 Sun Microsystems.
Copyright @copyright{} 1994 - 1998 Free Software Foundation.
Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois.


Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

@ignore
Permission is granted to process this file through TeX and print the
results, provided the printed document carries copying permission notice
identical to this one except for the removal of this paragraph (this
paragraph not being relevant to the printed manual).

@end ignore
Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided that the
entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

Permission is granted to copy and distribute translations of this manual
into another language, under the above conditions for modified versions,
except that this permission notice may be stated in a translation
approved by the Foundation.

Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided also that the
section entitled ``GNU General Public License'' is included exactly as
in the original, and provided that the entire resulting derived work is
distributed under the terms of a permission notice identical to this
one.

Permission is granted to copy and distribute translations of this manual
into another language, under the above conditions for modified versions,
except that the section entitled ``GNU General Public License'' may be
included in a translation approved by the Free Software Foundation
instead of in the original English.
@end ifinfo

@c Combine indices.
@synindex cp fn
@syncodeindex vr fn
@syncodeindex ky fn
@syncodeindex pg fn
@syncodeindex tp fn

@setchapternewpage odd
@finalout

@titlepage
@title XEmacs Internals Manual
@subtitle Version 1.2, October 1998

@author Ben Wing
@author Martin Buchholz
@author Hrvoje Niksic
@page
@vskip 0pt plus 1fill

@noindent
Copyright @copyright{} 1992 - 1996 Ben Wing. @*
Copyright @copyright{} 1996, 1997 Sun Microsystems, Inc. @*
Copyright @copyright{} 1994 - 1998 Free Software Foundation. @*
Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois.

@sp 2
Version 1.2 @*
October 1998.@*

Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided also that the
section entitled ``GNU General Public License'' is included
exactly as in the original, and provided that the entire resulting
derived work is distributed under the terms of a permission notice
identical to this one.

Permission is granted to copy and distribute translations of this manual
into another language, under the above conditions for modified versions,
except that the section entitled ``GNU General Public License'' may be
included in a translation approved by the Free Software Foundation
instead of in the original English.
@end titlepage
@page

@node Top, A History of Emacs, (dir), (dir)

@ifinfo
This Info file contains v1.2 of the XEmacs Internals Manual.
@end ifinfo

@menu
* A History of Emacs::          Times, dates, important events.
* XEmacs From the Outside::     A broad conceptual overview.
* The Lisp Language::           An overview.
* XEmacs From the Perspective of Building::
* XEmacs From the Inside::
* The XEmacs Object System (Abstractly Speaking)::
* How Lisp Objects Are Represented in C::
* Rules When Writing New C Code::
* A Summary of the Various XEmacs Modules::
* Allocation of Objects in XEmacs Lisp::
* Events and the Event Loop::
* Evaluation; Stack Frames; Bindings::
* Symbols and Variables::
* Buffers and Textual Representation::
* MULE Character Sets and Encodings::
* The Lisp Reader and Compiler::
* Lstreams::
* Consoles; Devices; Frames; Windows::
* The Redisplay Mechanism::
* Extents::
* Faces and Glyphs::
* Specifiers::
* Menus::
* Subprocesses::
* Interface to X Windows::
* Index::                   Index including concepts, functions, variables,
                              and other terms.

      --- The Detailed Node Listing ---

Here are other nodes that are inferiors of those already listed,
mentioned here so you can get to them in one step:

A History of Emacs

* Through Version 18::          Unification prevails.
* Lucid Emacs::                 One version 19 Emacs.
* GNU Emacs 19::                The other version 19 Emacs.
* GNU Emacs 20::                The other version 20 Emacs.
* XEmacs::                      The continuation of Lucid Emacs.

Rules When Writing New C Code

* General Coding Rules::
* Writing Lisp Primitives::
* Adding Global Lisp Variables::
* Techniques for XEmacs Developers::

A Summary of the Various XEmacs Modules

* Low-Level Modules::
* Basic Lisp Modules::
* Modules for Standard Editing Operations::
* Editor-Level Control Flow Modules::
* Modules for the Basic Displayable Lisp Objects::
* Modules for other Display-Related Lisp Objects::
* Modules for the Redisplay Mechanism::
* Modules for Interfacing with the File System::
* Modules for Other Aspects of the Lisp Interpreter and Object System::
* Modules for Interfacing with the Operating System::
* Modules for Interfacing with X Windows::
* Modules for Internationalization::

Allocation of Objects in XEmacs Lisp

* Introduction to Allocation::
* Garbage Collection::
* GCPROing::
* Integers and Characters::
* Allocation from Frob Blocks::
* lrecords::
* Low-level allocation::
* Pure Space::
* Cons::
* Vector::
* Bit Vector::
* Symbol::
* Marker::
* String::
* Compiled Function::

Events and the Event Loop

* Introduction to Events::
* Main Loop::
* Specifics of the Event Gathering Mechanism::
* Specifics About the Emacs Event::
* The Event Stream Callback Routines::
* Other Event Loop Functions::
* Converting Events::
* Dispatching Events; The Command Builder::

Evaluation; Stack Frames; Bindings

* Evaluation::
* Dynamic Binding; The specbinding Stack; Unwind-Protects::
* Simple Special Forms::
* Catch and Throw::

Symbols and Variables

* Introduction to Symbols::
* Obarrays::
* Symbol Values::

Buffers and Textual Representation

* Introduction to Buffers::     A buffer holds a block of text such as a file.
* The Text in a Buffer::        Representation of the text in a buffer.
* Buffer Lists::                Keeping track of all buffers.
* Markers and Extents::         Tagging locations within a buffer.
* Bufbytes and Emchars::        Representation of individual characters.
* The Buffer Object::           The Lisp object corresponding to a buffer.

MULE Character Sets and Encodings

* Character Sets::
* Encodings::
* Internal Mule Encodings::

Encodings

* Japanese EUC (Extended Unix Code)::
* JIS7::

Internal Mule Encodings

* Internal String Encoding::
* Internal Character Encoding::

The Lisp Reader and Compiler

Lstreams

Consoles; Devices; Frames; Windows

* Introduction to Consoles; Devices; Frames; Windows::
* Point::
* Window Hierarchy::

The Redisplay Mechanism

* Critical Redisplay Sections::
* Line Start Cache::

Extents

* Introduction to Extents::     Extents are ranges over text, with properties.
* Extent Ordering::             How extents are ordered internally.
* Format of the Extent Info::   The extent information in a buffer or string.
* Zero-Length Extents::         A weird special case.
* Mathematics of Extent Ordering::      A rigorous foundation.
* Extent Fragments::            Cached information useful for redisplay.

Faces and Glyphs

Specifiers

Menus

Subprocesses

Interface to X Windows

@end menu

@node A History of Emacs, XEmacs From the Outside, Top, Top
@chapter A History of Emacs
@cindex history of Emacs
@cindex Hackers (Steven Levy)
@cindex Levy, Steven
@cindex ITS (Incompatible Timesharing System)
@cindex Stallman, Richard
@cindex RMS
@cindex MIT
@cindex TECO
@cindex FSF
@cindex Free Software Foundation

  XEmacs is a powerful, customizable text editor and development
environment.  It began as Lucid Emacs, which was in turn derived from
GNU Emacs, a program written by Richard Stallman of the Free Software
Foundation.  GNU Emacs traces its lineage back to the 1970's: it was
modelled after a package called ``Emacs'', written in 1976, that was a
set of macros on top of TECO, an old, old text editor written at MIT on
the DEC PDP-10 under one of the earliest time-sharing operating systems,
ITS (Incompatible Timesharing System).  (ITS dates back well before
Unix.)  ITS, TECO, and Emacs were products of a group of people at MIT
who called themselves ``hackers'', shared an idealistic belief system
about the free exchange of information, and were fanatical in their
devotion to, and the time they spent with, computers.  (The hacker
subculture dates back to the late 1950's at MIT and is described in
detail in Steven Levy's book @cite{Hackers}.  This book also includes
a lot of information about Stallman himself and the development of
Lisp, a programming language developed at MIT that underlies Emacs.)

@menu
* Through Version 18::          Unification prevails.
* Lucid Emacs::                 One version 19 Emacs.
* GNU Emacs 19::                The other version 19 Emacs.
* GNU Emacs 20::                The other version 20 Emacs.
* XEmacs::                      The continuation of Lucid Emacs.
@end menu

@node Through Version 18
@section Through Version 18
@cindex Gosling, James
@cindex Great Usenet Renaming

  Although the early history of GNU Emacs is unclear, it is well known
from the middle of 1985 on.  A time line is:

@itemize @bullet
@item
GNU Emacs version 15 (15.34) was released sometime in 1984 or 1985 and
shared some code with a version of Emacs written by James Gosling (the
same James Gosling who later created the Java language).
@item
GNU Emacs version 16 (first released version was 16.56) was released on
July 15, 1985.  All Gosling code was removed due to potential copyright
problems with the code.
@item
version 16.57: released on September 16, 1985.
@item
versions 16.58, 16.59: released on September 17, 1985.
@item
version 16.60: released on September 19, 1985.  These later version 16's
incorporated patches from the net, especially for getting Emacs to work
under System V.
@item
version 17.36 (first official v17 release) released on December 20,
1985.  Included a TeX-able user manual.  First official unpatched
version that worked on vanilla System V machines.
@item
version 17.43 (second official v17 release) released on January 25,
1986.
@item
version 17.45 released on January 30, 1986.
@item
version 17.46 released on February 4, 1986.
@item
version 17.48 released on February 10, 1986.
@item
version 17.49 released on February 12, 1986.
@item
version 17.55 released on March 18, 1986.
@item
version 17.57 released on March 27, 1986.
@item
version 17.58 released on April 4, 1986.
@item
version 17.61 released on April 12, 1986.
@item
version 17.63 released on May 7, 1986.
@item
version 17.64 released on May 12, 1986.
@item
version 18.24 (a beta version) released on October 2, 1986.
@item
version 18.30 (a beta version) released on November 15, 1986.
@item
version 18.31 (a beta version) released on November 23, 1986.
@item
version 18.32 (a beta version) released on December 7, 1986.
@item
version 18.33 (a beta version) released on December 12, 1986.
@item
version 18.35 (a beta version) released on January 5, 1987.
@item
version 18.36 (a beta version) released on January 21, 1987.
@item
January 27, 1987: The Great Usenet Renaming.  net.emacs became
comp.emacs.
@item
version 18.37 (a beta version) released on February 12, 1987.
@item
version 18.38 (a beta version) released on March 3, 1987.
@item
version 18.39 (a beta version) released on March 14, 1987.
@item
version 18.40 (a beta version) released on March 18, 1987.
@item
version 18.41 (the first ``official'' release) released on March 22,
1987.
@item
version 18.45 released on June 2, 1987.
@item
version 18.46 released on June 9, 1987.
@item
version 18.47 released on June 18, 1987.
@item
version 18.48 released on September 3, 1987.
@item
version 18.49 released on September 18, 1987.
@item
version 18.50 released on February 13, 1988.
@item
version 18.51 released on May 7, 1988.
@item
version 18.52 released on September 1, 1988.
@item
version 18.53 released on February 24, 1989.
@item
version 18.54 released on April 26, 1989.
@item
version 18.55 released on August 23, 1989.  This is the earliest version
that is still available by FTP.
@item
version 18.56 released on January 17, 1991.
@item
version 18.57 released late January, 1991.
@item
version 18.58 released ?????.
@item
version 18.59 released October 31, 1992.
@end itemize

@node Lucid Emacs
@section Lucid Emacs
@cindex Lucid Emacs
@cindex Lucid Inc.
@cindex Energize
@cindex Epoch

  Lucid Emacs was developed by the (now-defunct) Lucid Inc., a maker of
C++ and Lisp development environments.  It began when Lucid decided they
wanted to use Emacs as the editor and cornerstone of their C++
development environment (called ``Energize'').  They needed many features
that were not available in the existing version of GNU Emacs (version
18.5something), in particular good and integrated support for GUI
elements such as mouse support, multiple fonts, multiple window-system
windows, etc.  A branch of GNU Emacs called Epoch, written at the
University of Illinois, already supplied many of these features;
however, Lucid needed more than Epoch offered.  At the time, the
Free Software Foundation was working on version 19 of Emacs (this was
sometime around 1991), which was planned to have similar features, and
so Lucid decided to work with the Free Software Foundation.  Their plan
was to add the features they needed and to coordinate with the FSF so
that those features would be included back into Emacs version 19.

  However, version 19 was delayed (it was finally released more than a
year later than initially planned), and Lucid encountered unexpected
technical resistance to getting their changes merged back into version
19, so they decided to release their own version of Emacs, which became
Lucid Emacs 19.0.

@cindex Zawinski, Jamie
@cindex Sexton, Harlan
@cindex Benson, Eric
@cindex Devin, Matthieu
  The initial authors of Lucid Emacs were Matthieu Devin, Harlan Sexton,
and Eric Benson, and the work was later taken over by Jamie Zawinski,
who became ``Mr. Lucid Emacs'' for many releases.

  A time line for Lucid Emacs/XEmacs is:

@itemize @bullet
@item
version 19.0 shipped with Energize 1.0, April 1992.
@item
version 19.1 released June 4, 1992.
@item
version 19.2 released June 19, 1992.
@item
version 19.3 released September 9, 1992.
@item
version 19.4 released January 21, 1993.
@item
version 19.5 was a repackaging of 19.4 with a few bug fixes and
shipped with Energize 2.0.  Never released to the net.
@item
version 19.6 released April 9, 1993.
@item
version 19.7 was a repackaging of 19.6 with a few bug fixes and
shipped with Energize 2.1.  Never released to the net.
@item
version 19.8 released September 6, 1993.
@item
version 19.9 released January 12, 1994.
@item
version 19.10 released May 27, 1994.
@item
version 19.11 (first XEmacs) released September 13, 1994.
@item
version 19.12 released June 23, 1995.
@item
version 19.13 released September 1, 1995.
@item
version 19.14 released June 23, 1996.
@item
version 20.0 released February 9, 1997.
@item
version 19.15 released March 28, 1997.
@item
version 20.1 (not released to the net) April 15, 1997.
@item
version 20.2 released May 16, 1997.
@item
version 19.16 released October 31, 1997.
@item
version 20.3 (the first stable version of XEmacs 20.x) released November 30,
1997.
@item
version 20.4 released February 28, 1998.
@end itemize

@node GNU Emacs 19
@section GNU Emacs 19
@cindex GNU Emacs 19
@cindex FSF Emacs

  About a year after the initial release of Lucid Emacs, the FSF
released a beta of their version of Emacs 19 (referred to here as ``GNU
Emacs'').  By this time, the current version of Lucid Emacs was
19.6.  (Strangely, the first released beta from the FSF was GNU Emacs
19.7.)  A time line for GNU Emacs version 19 is:

@itemize @bullet
@item
version 19.8 (beta) released May 27, 1993.
@item
version 19.9 (beta) released May 27, 1993.
@item
version 19.10 (beta) released May 30, 1993.
@item
version 19.11 (beta) released June 1, 1993.
@item
version 19.12 (beta) released June 2, 1993.
@item
version 19.13 (beta) released June 8, 1993.
@item
version 19.14 (beta) released June 17, 1993.
@item
version 19.15 (beta) released June 19, 1993.
@item
version 19.16 (beta) released July 6, 1993.
@item
version 19.17 (beta) released late July, 1993.
@item
version 19.18 (beta) released August 9, 1993.
@item
version 19.19 (beta) released August 15, 1993.
@item
version 19.20 (beta) released November 17, 1993.
@item
version 19.21 (beta) released November 17, 1993.
@item
version 19.22 (beta) released November 28, 1993.
@item
version 19.23 (beta) released May 17, 1994.
@item
version 19.24 (beta) released May 16, 1994.
@item
version 19.25 (beta) released June 3, 1994.
@item
version 19.26 (beta) released September 11, 1994.
@item
version 19.27 (beta) released September 14, 1994.
@item
version 19.28 (first ``official'' release) released November 1, 1994.
@item
version 19.29 released June 21, 1995.
@item
version 19.30 released November 24, 1995.
@item
version 19.31 released May 25, 1996.
@item
version 19.32 released July 31, 1996.
@item
version 19.33 released August 11, 1996.
@item
version 19.34 released August 21, 1996.
@item
version 19.34b released September 6, 1996.
@end itemize

@cindex Mlynarik, Richard
  In some ways, GNU Emacs 19 was better than Lucid Emacs; in some ways,
worse.  Lucid soon began incorporating features from GNU Emacs 19 into
Lucid Emacs; the work was mostly done by Richard Mlynarik, who had been
working on and using GNU Emacs for a long time (back as far as version
16 or 17).

@node GNU Emacs 20
@section GNU Emacs 20
@cindex GNU Emacs 20
@cindex FSF Emacs

  On February 2, 1997, work began on GNU Emacs to integrate Mule.  The
first release was made in September of that year.

  A time line for GNU Emacs 20 is:

@itemize @bullet
@item
version 20.1 released September 17, 1997.
@item
version 20.2 released September 20, 1997.
@item
version 20.3 released August 19, 1998.
@end itemize

@node XEmacs
@section XEmacs
@cindex XEmacs

@cindex Sun Microsystems
@cindex University of Illinois
@cindex Illinois, University of
@cindex SPARCWorks
@cindex Andreessen, Marc
@cindex Baur, Steve
@cindex Buchholz, Martin
@cindex Kaplan, Simon
@cindex Wing, Ben
@cindex Thompson, Chuck
@cindex Win-Emacs
@cindex Epoch
@cindex Amdahl Corporation
  Around the time that Lucid was developing Energize, Sun Microsystems
was developing its own development environment (called ``SPARCWorks'')
and also decided to use Emacs.  Sun joined forces with the Epoch team
at the University of Illinois and later with Lucid.  The maintainer of
the last-released version of Epoch was Marc Andreessen, but he dropped
out, and the Epoch project, headed by Simon Kaplan, lured Chuck Thompson
away from a system administration job to become the primary Lucid Emacs
author for Epoch and Sun.  Chuck's area of specialty became the
redisplay engine (he replaced the old Lucid Emacs redisplay engine with
a ported version from Epoch and then later rewrote it from scratch).
Sun also hired Ben Wing (the author of Win-Emacs, a port of Lucid Emacs
to Microsoft Windows 3.1) in 1993, for what was initially a one-month
contract to fix some event problems but later became a many-year
involvement, punctuated by a six-month contract with Amdahl Corporation.

@cindex rename to XEmacs
  In 1994, Sun and Lucid agreed to rename Lucid Emacs to XEmacs (a name
that favored neither company); the first release called XEmacs was
version 19.11.  In June 1994, Lucid folded and Jamie quit to work for
the newly formed Mosaic Communications Corp., later Netscape
Communications Corp. (co-founded by the same Marc Andreessen, who had
quit his Epoch job to work on a graphical browser for the World Wide
Web).  Chuck then became the primary maintainer of XEmacs and put out
versions 19.11 through 19.14 in conjunction with Ben.  For 19.12 and
19.13, Chuck added the new redisplay and many other display
improvements, and Ben added MULE support (support for Asian and other
languages) and redesigned most of the internal Lisp subsystems to better
support the MULE work and the various other features being added to
XEmacs.  After 19.14, Chuck retired as primary maintainer and Steve Baur
stepped in.

@cindex MULE merged XEmacs appears
  Soon after 19.13 was released, work began in earnest on the MULE
internationalization code, and the source tree was divided into two
development paths.  The MULE version was initially called 19.20, but was
soon renamed to 20.0.  In 1996, Martin Buchholz of Sun Microsystems took
over its care and feeding and worked on it in parallel with the 19.14
development.  After much work by Martin, it was decided to release 20.0
ahead of 19.15 in February 1997.  The source tree remained divided until
20.2, when the version 19 source was finally retired at version 19.16.

@cindex Baur, Steve
@cindex Buchholz, Martin
@cindex Jones, Kyle
@cindex Niksic, Hrvoje
@cindex XEmacs goes it alone
  In 1997, Sun finally dropped all pretense of support for XEmacs, and
Martin Buchholz left the company in November.  Since then, XEmacs has
existed solely on the contributions of volunteers from the Free Software
Community; this was largely true for the preceding year as well, because
Steve Baur was never paid to work on XEmacs.  Starting in 1997, Hrvoje
Niksic and Kyle Jones have figured prominently in XEmacs development.

@cindex merging attempts
  Many attempts have been made to merge XEmacs and GNU Emacs, but they
have consistently failed.

  A more detailed history is contained in the XEmacs About page.

@node XEmacs From the Outside, The Lisp Language, A History of Emacs, Top
@chapter XEmacs From the Outside
@cindex read-eval-print

  XEmacs appears to the outside world as an editor, but it is really a
Lisp environment.  At its heart is a Lisp interpreter; it also
``happens'' to contain many specialized object types (e.g. buffers,
windows, frames, events) that are useful for implementing an editor.
Some of these objects (in particular windows and frames) have
displayable representations, and XEmacs provides a function
@code{redisplay()} that ensures that the display of all such objects
matches their internal state.  Most of the time, a standard Lisp
environment is in a @dfn{read-eval-print} loop -- i.e. ``read some Lisp
code, execute it, and print the results''.  XEmacs has a similar loop:

@itemize @bullet
@item
read an event
@item
dispatch the event (i.e. ``do it'')
@item
redisplay
@end itemize

  Reading an event is done using the Lisp function @code{next-event},
which waits for something to happen (typically, the user presses a key
or moves the mouse) and returns an event object describing this.
Dispatching an event is done using the Lisp function
@code{dispatch-event}, which looks up the event in a keymap object (a
particular kind of object that associates an event with a Lisp function)
and calls that function.  The function ``does'' what the user has
requested by changing the state of particular frame objects, buffer
objects, etc.  Finally, @code{redisplay()} is called, which updates the
display to reflect the changes just made.  Thus is an ``editor'' born.

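  As a rough sketch only, the read/dispatch/redisplay cycle described
above can be modelled in a few lines of C.  Everything in this example
is hypothetical and greatly simplified: the names (@code{toy_next_event},
@code{toy_dispatch_event}, @code{toy_redisplay}, @code{toy_command_loop}
and the rest) are not XEmacs's real C functions, an ``event'' is just a
character read from a scripted string, the ``keymap'' is a single lookup
function that binds backspace to a deletion command and every other key
to self-insertion, and ``redisplay'' simply prints the buffer contents.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy model of an editor command loop; none of these
   names exist in XEmacs itself. */

#define TOY_BUF_MAX 64

/* A toy event: just the key that was pressed. */
struct toy_event {
  char key;
};

/* A toy buffer: the editor state that commands modify. */
struct toy_buffer {
  char text[TOY_BUF_MAX];
  size_t len;
};

/* A command changes editor state in response to a key. */
typedef void (*toy_command)(struct toy_buffer *buf, char key);

static int toy_redisplay_count = 0;   /* how many times redisplay ran */

static void toy_buffer_init(struct toy_buffer *buf) {
  memset(buf, 0, sizeof *buf);
}

/* Append the typed character to the buffer (if there is room). */
static void toy_self_insert(struct toy_buffer *buf, char key) {
  if (buf->len + 1 < TOY_BUF_MAX) {
    buf->text[buf->len++] = key;
    buf->text[buf->len] = '\0';
  }
}

/* Delete the last character, if any. */
static void toy_delete_backward(struct toy_buffer *buf, char key) {
  (void) key;
  if (buf->len > 0)
    buf->text[--buf->len] = '\0';
}

/* Toy keymap lookup: backspace deletes, everything else self-inserts. */
static toy_command toy_lookup_key(char key) {
  return key == '\b' ? toy_delete_backward : toy_self_insert;
}

/* "Read an event": take the next key from a scripted input string.
   Returns 0 once the input is exhausted. */
static int toy_next_event(const char **input, struct toy_event *ev) {
  if (**input == '\0')
    return 0;
  ev->key = *(*input)++;
  return 1;
}

/* "Dispatch the event": look it up in the keymap and run the command. */
static void toy_dispatch_event(struct toy_buffer *buf,
                               const struct toy_event *ev) {
  toy_command cmd = toy_lookup_key(ev->key);
  assert(cmd != NULL);
  cmd(buf, ev->key);
}

/* "Redisplay": make the output match the buffer's internal state. */
static void toy_redisplay(const struct toy_buffer *buf) {
  toy_redisplay_count++;
  printf("[%s]\n", buf->text);
}

/* The loop itself: read an event, dispatch it, redisplay, repeat. */
static void toy_command_loop(struct toy_buffer *buf, const char *input) {
  struct toy_event ev;
  while (toy_next_event(&input, &ev)) {
    toy_dispatch_event(buf, &ev);
    toy_redisplay(buf);
  }
}
```

Running @code{toy_command_loop} on a freshly initialized buffer with the
input @code{"helo\blo"} (where @code{\b} is a backspace) reads and
dispatches seven events, redisplays after each one, and leaves the
buffer holding @code{hello}.  The point of the sketch is only the shape
of the loop; the real XEmacs event loop is considerably more involved.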
@cindex bridge, playing
@cindex taxes, doing
@cindex pi, calculating
  Note that you do not have to use XEmacs as an editor; you could just
as well make it do your taxes, compute pi, play bridge, etc.  You'd just
have to write functions to do those operations in Lisp.

@node The Lisp Language, XEmacs From the Perspective of Building, XEmacs From the Outside, Top
@chapter The Lisp Language
@cindex Lisp vs. C
@cindex C vs. Lisp
@cindex Lisp vs. Java
@cindex Java vs. Lisp
@cindex dynamic scoping
@cindex scoping, dynamic
@cindex dynamic types
@cindex types, dynamic
@cindex Java
@cindex Common Lisp
@cindex Gosling, James

  Lisp is a general-purpose language that is higher-level than C and in
many ways more powerful than C.  Powerful dialects of Lisp such as
Common Lisp are probably much better languages for writing very large
applications than is C.  (Unfortunately, for many non-technical
reasons C and its successor C++ have become the dominant languages for
application development.  Both languages are inadequate for extremely
large applications, as evidenced by the fact that newer, larger
programs are becoming ever harder to write and are requiring ever more
programmers despite great advances in C development environments, and
by the fact that, although hardware speeds and reliability have been
growing at an exponential rate, most software is still generally
considered to be slow and buggy.)

  The new Java language holds promise as a better general-purpose
development language than C.  Java has many features in common with
Lisp that are not shared by C (this is not a coincidence, since
Java was designed by James Gosling, a former Lisp hacker).  This
will be discussed more later.

For those used to C, here is a summary of the basic differences between
C and Lisp:

@enumerate
@item
Lisp has an extremely regular syntax.  Every function, expression,
and control statement is written in the form

@example
   (@var{func} @var{arg1} @var{arg2} ...)
@end example

This is as opposed to C, which writes functions as

@example
   func(@var{arg1}, @var{arg2}, ...)
@end example

but writes expressions involving operators as (e.g.)

@example
   @var{arg1} + @var{arg2}
@end example

and writes control statements as (e.g.)

@example
   while (@var{expr}) @{ @var{statement1}; @var{statement2}; ... @}
@end example

Lisp equivalents of the latter two would be

@example
   (+ @var{arg1} @var{arg2} ...)
@end example

and

@example
   (while @var{expr} @var{statement1} @var{statement2} ...)
@end example

 799 @item
 800 Lisp is a safe language.  Assuming there are no bugs in the Lisp
 801 interpreter/compiler, it is impossible to write a program that ``core
 802 dumps'' or otherwise causes the machine to execute an illegal
 803 instruction.  This is very different from C, where perhaps the most
 804 common outcome of a bug is exactly such a crash.  A corollary of this is that
 805 the C operation of casting a pointer is impossible (and unnecessary) in
 806 Lisp, and that it is impossible to access memory outside the bounds of
 807 an array.
 808
 809 @item
 810 Programs and data are written in the same form.  The
 811 parenthesis-enclosing form described above for statements is the same
 812 form used for the most common data type in Lisp, the list.  Thus, it is
 813 possible to represent any Lisp program using Lisp data types, and for
 814 one program to construct Lisp statements and then dynamically
 815 @dfn{evaluate} them, or cause them to execute.
 816
 817 @item
 818 All objects are @dfn{dynamically typed}.  This means that part of every
 819 object is an indication of what type it is.  A Lisp program can
 820 manipulate an object without knowing what type it is, and can query an
 821 object to determine its type.  This means that, correspondingly,
 822 variables and function parameters can hold objects of any type and are
 823 not normally declared as being of any particular type.  This is opposed
 824 to the @dfn{static typing} of C, where variables can hold exactly one
 825 type of object and must be declared as such, and objects do not contain
 826 an indication of their type because it's implicit in the variables they
 827 are stored in.  It is possible in C to have a variable hold different
 828 types of objects (e.g. through the use of @code{void *} pointers or
 829 variable-argument functions), but the type information must then be
 830 passed explicitly in some other fashion, leading to additional program
 831 complexity.
 832
 833 @item
 834 Allocated memory is automatically reclaimed when it is no longer in use.
 835 This operation is called @dfn{garbage collection} and involves looking
 836 through all variables to see what memory is being pointed to, and
 837 reclaiming any memory that is not pointed to and is thus
``inaccessible'' and out of use.  This is unlike C, in which
 839 allocated memory must be explicitly reclaimed using @code{free()}.  If
 840 you simply drop all pointers to memory without freeing it, it becomes
 841 ``leaked'' memory that still takes up space.  Over a long period of
 842 time, this can cause your program to grow and grow until it runs out of
 843 memory.
 844
 845 @item
 846 Lisp has built-in facilities for handling errors and exceptions.  In C,
 847 when an error occurs, usually either the program exits entirely or the
 848 routine in which the error occurs returns a value indicating this.  If
an error occurs in a deeply-nested routine, then every routine currently
on the call stack must return normally, passing an error value back up to
 851 the next routine.  This means that every routine must explicitly check
 852 for an error in all the routines it calls; if it does not do so,
 853 unexpected and often random behavior results.  This is an extremely
 854 common source of bugs in C programs.  An alternative would be to do a
 855 non-local exit using @code{longjmp()}, but that is often very dangerous
 856 because the routines that were exited past had no opportunity to clean
 857 up after themselves and may leave things in an inconsistent state,
 858 causing a crash shortly afterwards.
 859
 860 Lisp provides mechanisms to make such non-local exits safe.  When an
 861 error occurs, a routine simply signals that an error of a particular
 862 class has occurred, and a non-local exit takes place.  Any routine can
 863 trap errors occurring in routines it calls by registering an error
 864 handler for some or all classes of errors. (If no handler is registered,
 865 a default handler, generally installed by the top-level event loop, is
 866 executed; this prints out the error and continues.) Routines can also
 867 specify cleanup code (called an @dfn{unwind-protect}) that will be
 868 called when control exits from a block of code, no matter how that exit
 869 occurs -- i.e. even if a function deeply nested below it causes a
 870 non-local exit back to the top level.
 871
Note that a similar facility has appeared as an extension in some C
compilers, in particular Visual C++ and other PC compilers targeting the
Microsoft Win32 API.
 875
 876 @item
 877 In Emacs Lisp, local variables are @dfn{dynamically scoped}.  This means
 878 that if you declare a local variable in a particular function, and then
 879 call another function, that subfunction can ``see'' the local variable
 880 you declared.  This is actually considered a bug in Emacs Lisp and in
 881 all other early dialects of Lisp, and was corrected in Common Lisp. (In
 882 Common Lisp, you can still declare dynamically scoped variables if you
 883 want to -- they are sometimes useful -- but variables by default are
 884 @dfn{lexically scoped} as in C.)
 885 @end enumerate
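The last two points (safe non-local exits with cleanup code, and dynamic
scoping) can both be built on a single stack of saved state that a
non-local exit unwinds.  The following C sketch illustrates that idea,
assuming @code{setjmp()}/@code{longjmp()} for the non-local exit and a
technique called shallow binding for dynamic variables (the old value of
a global is saved and later restored).  Every name in it
(@code{dyn_bind}, @code{push_cleanup}, @code{unwind_to_depth},
@code{throw_to_catch}) is hypothetical, invented for illustration rather
than taken from the XEmacs sources.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical sketch: one stack records both dynamic variable bindings
   and unwind-protect cleanups; a non-local exit unwinds it via longjmp. */

typedef void (*cleanup_fn)(void *arg);

struct saved_state {
  int *var;            /* dynamically bound variable, or NULL for a cleanup */
  int old_value;       /* value to restore when the binding is undone */
  cleanup_fn cleanup;  /* unwind-protect handler when var is NULL */
  void *arg;
};

#define STATE_MAX 64
static struct saved_state state_stack[STATE_MAX];
static size_t state_depth = 0;

/* Shallow binding: save the old value, then overwrite the global. */
static void dyn_bind(int *var, int value) {
  assert(state_depth < STATE_MAX);
  state_stack[state_depth++] = (struct saved_state){ var, *var, NULL, NULL };
  *var = value;
}

/* Register cleanup code to run however control leaves this point. */
static void push_cleanup(cleanup_fn fn, void *arg) {
  assert(state_depth < STATE_MAX);
  state_stack[state_depth++] = (struct saved_state){ NULL, 0, fn, arg };
}

/* Undo entries newest-first until the stack is back at DEPTH. */
static void unwind_to_depth(size_t depth) {
  while (state_depth > depth) {
    struct saved_state *s = &state_stack[--state_depth];
    if (s->var)
      *s->var = s->old_value;
    else
      s->cleanup(s->arg);
  }
}

/* A catch point remembers where to jump and how deep the stack was. */
struct catch_point { jmp_buf jb; size_t depth; };
static struct catch_point *current_catch = NULL;

static void throw_to_catch(void) {
  unwind_to_depth(current_catch->depth);  /* run cleanups, restore bindings */
  longjmp(current_catch->jb, 1);
}

/* --- demonstration --- */
static int print_level = 0;   /* plays the role of a dynamically scoped variable */
static int seen_level = -1;   /* what the deeply nested callee observed */
static int cleanups_run = 0;

static void note_cleanup(void *arg) { (void) arg; cleanups_run++; }

static void deeply_nested(void) {
  seen_level = print_level;   /* the callee sees the caller's binding */
  throw_to_catch();           /* exit non-locally, like signalling an error */
}

/* Returns 1 if control came back via throw_to_catch, 0 on normal exit. */
static int call_with_catch(void) {
  struct catch_point cp;
  struct catch_point *prev = current_catch;
  cp.depth = state_depth;
  current_catch = &cp;
  if (setjmp(cp.jb) == 0) {
    dyn_bind(&print_level, 5);
    push_cleanup(note_cleanup, NULL);
    deeply_nested();
    unwind_to_depth(cp.depth);   /* normal exit path */
    current_catch = prev;
    return 0;
  }
  current_catch = prev;          /* arrived here via longjmp */
  return 1;
}
```

After @code{call_with_catch()} returns, the callee has seen the value 5,
the global is back to 0, and the cleanup has run exactly once, even
though @code{deeply_nested()} never returned normally.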
 886
 887 For those familiar with Lisp, Emacs Lisp is modelled after MacLisp, an
 888 early dialect of Lisp developed at MIT (no relation to the Macintosh
 889 computer).  There is a Common Lisp compatibility package available for
 890 Emacs that provides many of the features of Common Lisp.
 891
 892 The Java language is derived in many ways from C, and shares a similar
 893 syntax, but has the following features in common with Lisp (and different
 894 from C):
 895
 896 @enumerate
 897 @item
 898 Java is a safe language, like Lisp.
 899 @item
 900 Java provides garbage collection, like Lisp.
 901 @item
 902 Java has built-in facilities for handling errors and exceptions, like
 903 Lisp.
 904 @item
 905 Java has a type system that combines the best advantages of both static
 906 and dynamic typing.  Objects (except very simple types) are explicitly
 907 marked with their type, as in dynamic typing; but there is a hierarchy
 908 of types and functions are declared to accept only certain types, thus
 909 providing the increased compile-time error-checking of static typing.
 910 @end enumerate
 911
 912 The Java language also has some negative attributes:
 913
 914 @enumerate
 915 @item
 916 Java uses the edit/compile/run model of software development.  This
 917 makes it hard to use interactively.  For example, to use Java like
@code{bc}, it is necessary to write a special-purpose, albeit tiny,
application.  In Emacs Lisp, a calculator comes built-in without any
effort; one can always just type an expression in the @code{*scratch*}
 921 buffer.
 922 @item
 923 Java tries too hard to enforce, not merely enable, portability, making
 924 ordinary access to standard OS facilities painful.  Java has an
 925 @dfn{agenda}.  I think this is why @code{chdir} is not part of standard
 926 Java, which is inexcusable.
 927 @end enumerate
 928
 929 Unfortunately, there is no perfect language.  Static typing allows a
 930 compiler to catch programmer errors and produce more efficient code, but
makes programming more tedious and less fun.  For the foreseeable future,
an Ideal Editing and Programming Environment (and that is what XEmacs
aspires to) will be programmable in multiple languages: high-level ones
like Lisp for user customization and prototyping, and lower-level ones
for infrastructure and industrial-strength applications.  If I had my
way, XEmacs would be friendly towards the Python, Scheme, C++, ML, and
other communities.  But there are serious technical difficulties in
achieving that goal.
 939
 940 The word @dfn{application} in the previous paragraph was used
 941 intentionally.  XEmacs implements an API for programs written in Lisp
 942 that makes it a full-fledged application platform, very much like an OS
 943 inside the real OS.
 944
 945 @node XEmacs From the Perspective of Building, XEmacs From the Inside, The Lisp Language, Top
 946 @chapter XEmacs From the Perspective of Building
 947
 948 The heart of XEmacs is the Lisp environment, which is written in C.
 949 This is contained in the @file{src/} subdirectory.  Underneath
 950 @file{src/} are two subdirectories of header files: @file{s/} (header
 951 files for particular operating systems) and @file{m/} (header files for
 952 particular machine types).  In practice the distinction between the two
 953 types of header files is blurred.  These header files define or undefine
 954 certain preprocessor constants and macros to indicate particular
 955 characteristics of the associated machine or operating system.  As part
of the configure process, one @file{s/} file and one @file{m/} file are
 957 identified for the particular environment in which XEmacs is being
 958 built.
 959
 960 XEmacs also contains a great deal of Lisp code.  This implements the
 961 operations that make XEmacs useful as an editor as well as just a Lisp
 962 environment, and also contains many add-on packages that allow XEmacs to
 963 browse directories, act as a mail and Usenet news reader, compile Lisp
 964 code, etc.  There is actually more Lisp code than C code associated with
 965 XEmacs, but much of the Lisp code is peripheral to the actual operation
 966 of the editor.  The Lisp code all lies in subdirectories underneath the
 967 @file{lisp/} directory.
 968
 969 The @file{lwlib/} directory contains C code that implements a
 970 generalized interface onto different X widget toolkits and also
 971 implements some widgets of its own that behave like Motif widgets but
 972 are faster, free, and in some cases more powerful.  The code in this
directory compiles into a library and is mostly independent of XEmacs.
 974
 975 The @file{etc/} directory contains various data files associated with
 976 XEmacs.  Some of them are actually read by XEmacs at startup; others
 977 merely contain useful information of various sorts.
 978
 979 The @file{lib-src/} directory contains C code for various auxiliary
 980 programs that are used in connection with XEmacs.  Some of them are used
 981 during the build process; others are used to perform certain functions
 982 that cannot conveniently be placed in the XEmacs executable (e.g. the
 983 @file{movemail} program for fetching mail out of @file{/var/spool/mail},
 984 which must be setgid to @file{mail} on many systems; and the
 985 @file{gnuclient} program, which allows an external script to communicate
 986 with a running XEmacs process).
 987
 988 The @file{man/} directory contains the sources for the XEmacs
 989 documentation.  It is mostly in a form called Texinfo, which can be
 990 converted into either a printed document (by passing it through @TeX{})
 991 or into on-line documentation called @dfn{info files}.
 992
 993 The @file{info/} directory contains the results of formatting the XEmacs
 994 documentation as @dfn{info files}, for on-line use.  These files are
 995 used when you enter the Info system using @kbd{C-h i} or through the
 996 Help menu.
 997
 998 The @file{dynodump/} directory contains auxiliary code used to build
 999 XEmacs on Solaris platforms.
1000
1001 The other directories contain various miscellaneous code and information
1002 that is not normally used or needed.
1003
1004 The first step of building involves running the @file{configure} program
1005 and passing it various parameters to specify any optional features you
1006 want and compiler arguments and such, as described in the @file{INSTALL}
1007 file.  This determines what the build environment is, chooses the
1008 appropriate @file{s/} and @file{m/} file, and runs a series of tests to
1009 determine many details about your environment, such as which library
functions are available and exactly how they work.  These tests allow
XEmacs to be compiled on a much
1012 wider variety of platforms than those that the XEmacs developers happen
1013 to be familiar with, including various sorts of hybrid platforms.  This
1014 is especially important now that many operating systems give you a great
1015 deal of control over exactly what features you want installed, and allow
1016 for easy upgrading of parts of a system without upgrading the rest.  It
1017 would be impossible to pre-determine and pre-specify the information for
1018 all possible configurations.
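The results of these tests end up as preprocessor definitions in a
generated header (@file{src/config.h}), and the C code tests those
definitions instead of hard-coding knowledge about particular systems.
The sketch below assumes Autoconf-style @code{HAVE_@dots{}} macro
naming; @code{HAVE_STRERROR} and @code{error_message} are illustrative,
not quoted from the XEmacs sources.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical excerpt standing in for a configure-generated config.h:
   configure would define HAVE_STRERROR only if its test program found a
   working strerror() on this system. */
#define HAVE_STRERROR 1

/* Portable wrapper: the code asks "was the feature detected?" rather
   than "which operating system is this?". */
static const char *error_message(int err) {
#ifdef HAVE_STRERROR
  const char *msg = strerror(err);
  return msg ? msg : "unknown error";
#else
  (void) err;
  return "unknown error";
#endif
}
```

If configure did not detect @code{strerror()}, the same source would
still compile and fall back to a generic message.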
1019
1020 In fact, the @file{s/} and @file{m/} files are basically @emph{evil},
1021 since they contain unmaintainable platform-specific hard-coded
1022 information.  XEmacs has been moving in the direction of having all
1023 system-specific information be determined dynamically by
1024 @file{configure}.  Perhaps someday we can @code{rm -rf src/s src/m}.
1025
1026 When configure is done running, it generates @file{Makefile}s and
1027 @file{GNUmakefile}s and the file @file{src/config.h} (which describes
1028 the features of your system) from template files.  You then run
1029 @file{make}, which compiles the auxiliary code and programs in
1030 @file{lib-src/} and @file{lwlib/} and the main XEmacs executable in
1031 @file{src/}.  The result of compiling and linking is an executable
1032 called @file{temacs}, which is @emph{not} the final XEmacs executable.
1033 @file{temacs} by itself is not intended to function as an editor or even
1034 display any windows on the screen, and if you simply run it, it will
1035 exit immediately.  The @file{Makefile} runs @file{temacs} with certain
1036 options that cause it to initialize itself, read in a number of basic
1037 Lisp files, and then dump itself out into a new executable called
1038 @file{xemacs}.  This new executable has been pre-initialized and
1039 contains pre-digested Lisp code that is necessary for the editor to
1040 function (this includes most basic editing functions,
1041 e.g. @code{kill-line}, that can be defined in terms of other Lisp
1042 primitives; some initialization code that is called when certain
1043 objects, such as frames, are created; and all of the standard
1044 keybindings and code for the actions they result in).  This executable,
1045 @file{xemacs}, is the executable that you run to use the XEmacs editor.
1046
Although @file{temacs} is not intended to be run as an editor, it can be,
using the incantation @code{temacs -batch -l loadup.el run-temacs}.
1049 This is useful when the dumping procedure described above is broken, or
1050 when using certain program debugging tools such as Purify.  These tools
1051 get mighty confused by the tricks played by the XEmacs build process,
such as allocating memory in one process and freeing it in the next.
1053
1054 @node XEmacs From the Inside, The XEmacs Object System (Abstractly Speaking), XEmacs From the Perspective of Building, Top
1055 @chapter XEmacs From the Inside
1056
1057 Internally, XEmacs is quite complex, and can be very confusing.  To
1058 simplify things, it can be useful to think of XEmacs as containing an
1059 event loop that ``drives'' everything, and a number of other subsystems,
1060 such as a Lisp engine and a redisplay mechanism.  Each of these other
1061 subsystems exists simultaneously in XEmacs, and each has a certain
1062 state.  The flow of control continually passes in and out of these
1063 different subsystems in the course of normal operation of the editor.
1064
1065 It is important to keep in mind that, most of the time, the editor is
1066 ``driven'' by the event loop.  Except during initialization and batch
1067 mode, all subsystems are entered directly or indirectly through the
1068 event loop, and ultimately, control exits out of all subsystems back up
1069 to the event loop.  This cycle of entering a subsystem, exiting back out
1070 to the event loop, and starting another iteration of the event loop
occurs once for each keystroke, mouse motion, etc.
1072
1073 If you're trying to understand a particular subsystem (other than the
1074 event loop), think of it as a ``daemon'' process or ``servant'' that is
1075 responsible for one particular aspect of a larger system, and
1076 periodically receives commands or environment changes that cause it to
1077 do something.  Ultimately, these commands and environment changes are
1078 always triggered by the event loop.  For example:
1079
1080 @itemize @bullet
1081 @item
1082 The window and frame mechanism is responsible for keeping track of what
1083 windows and frames exist, what buffers are in them, etc.  It is
1084 periodically given commands (usually from the user) to make a change to
1085 the current window/frame state: i.e. create a new frame, delete a
1086 window, etc.
1087
1088 @item
1089 The buffer mechanism is responsible for keeping track of what buffers
1090 exist and what text is in them.  It is periodically given commands
1091 (usually from the user) to insert or delete text, create a buffer, etc.
1092 When it receives a text-change command, it notifies the redisplay
1093 mechanism.
1094
1095 @item
1096 The redisplay mechanism is responsible for making sure that windows and
1097 frames are displayed correctly.  It is periodically told (by the event
1098 loop) to actually ``do its job'', i.e. snoop around and see what the
1099 current state of the environment (mostly of the currently-existing
1100 windows, frames, and buffers) is, and make sure that that state matches
1101 what's actually displayed.  It keeps lots and lots of information around
1102 (such as what is actually being displayed currently, and what the
1103 environment was last time it checked) so that it can minimize the work
1104 it has to do.  It is also helped along in that whenever a relevant
1105 change to the environment occurs, the redisplay mechanism is told about
1106 this, so it has a pretty good idea of where it has to look to find
1107 possible changes and doesn't have to look everywhere.
1108
1109 @item
1110 The Lisp engine is responsible for executing the Lisp code in which most
1111 user commands are written.  It is entered through a call to @code{eval}
1112 or @code{funcall}, which occurs as a result of dispatching an event from
1113 the event loop.  The functions it calls issue commands to the buffer
1114 mechanism, the window/frame subsystem, etc.
1115
1116 @item
1117 The Lisp allocation subsystem is responsible for keeping track of Lisp
1118 objects.  It is given commands from the Lisp engine to allocate objects,
1119 garbage collect, etc.
1120 @end itemize
1121
1122 etc.
1123
1124   The important idea here is that there are a number of independent
1125 subsystems each with its own responsibility and persistent state, just
1126 like different employees in a company, and each subsystem is
1127 periodically given commands from other subsystems.  Commands can flow
1128 from any one subsystem to any other, but there is usually some sort of
1129 hierarchy, with all commands originating from the event subsystem.
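As a rough illustration of this structure, here is a heavily simplified
C model: the event loop hands each event to a dispatcher (standing in
for the Lisp engine), the resulting command changes a buffer, the buffer
notifies redisplay, and redisplay does work only when it has been told
something changed.  Every name is hypothetical; the real subsystems are
far more elaborate.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified model; all names are illustrative. */

enum event_type { EV_KEY, EV_QUIT };
struct event { enum event_type type; char ch; };

/* Buffer subsystem: owns text and notifies redisplay when it changes. */
static char buffer_text[128];
static size_t buffer_len = 0;
static int redisplay_needed = 0;

static void buffer_insert_char(char c) {
  if (buffer_len + 1 < sizeof buffer_text) {
    buffer_text[buffer_len++] = c;
    buffer_text[buffer_len] = '\0';
    redisplay_needed = 1;            /* tell redisplay where to look */
  }
}

/* Redisplay subsystem: brings the "screen" up to date, only if needed. */
static char screen[128];
static int redisplay_count = 0;

static void redisplay(void) {
  if (!redisplay_needed)
    return;
  memcpy(screen, buffer_text, buffer_len + 1);
  redisplay_needed = 0;
  redisplay_count++;
}

/* Stand-in for the Lisp engine: run the command bound to the event. */
static void dispatch(const struct event *ev) {
  if (ev->type == EV_KEY)
    buffer_insert_char(ev->ch);      /* like self-insert-command */
}

/* The event loop drives everything: one iteration per event. */
static void command_loop(const struct event *events, size_t n) {
  for (size_t i = 0; i < n && events[i].type != EV_QUIT; i++) {
    dispatch(&events[i]);
    redisplay();
  }
}
```

Note that no subsystem calls the event loop; control always returns to
it after each event is handled.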
1130
1131   XEmacs is entered in @code{main()}, which is in @file{emacs.c}.  When
1132 this is called the first time (in a properly-invoked @file{temacs}), it
1133 does the following:
1134
1135 @enumerate
1136 @item
1137 It does some very basic environment initializations, such as determining
1138 where it and its directories (e.g. @file{lisp/} and @file{etc/}) reside
1139 and setting up signal handlers.
1140 @item
1141 It initializes the entire Lisp interpreter.
1142 @item
1143 It sets the initial values of many built-in variables (including many
1144 variables that are visible to Lisp programs), such as the global keymap
1145 object and the built-in faces (a face is an object that describes the
1146 display characteristics of text).  This involves creating Lisp objects
1147 and thus is dependent on step (2).
1148 @item
1149 It performs various other initializations that are relevant to the
1150 particular environment it is running in, such as retrieving environment
1151 variables, determining the current date and the user who is running the
1152 program, examining its standard input, creating any necessary file
1153 descriptors, etc.
1154 @item
1155 At this point, the C initialization is complete.  A Lisp program that
1156 was specified on the command line (usually @file{loadup.el}) is called
1157 (temacs is normally invoked as @code{temacs -batch -l loadup.el dump}).
1158 @file{loadup.el} loads all of the other Lisp files that are needed for
1159 the operation of the editor, calls the @code{dump-emacs} function to
1160 write out @file{xemacs}, and then kills the temacs process.
1161 @end enumerate
1162
1163   When @file{xemacs} is then run, it only redoes steps (1) and (4)
1164 above; all variables already contain the values they were set to when
1165 the executable was dumped, and all memory that was allocated with
1166 @code{malloc()} is still around. (XEmacs knows whether it is being run
1167 as @file{xemacs} or @file{temacs} because it sets the global variable
1168 @code{initialized} to 1 after step (4) above.) At this point,
1169 @file{xemacs} calls a Lisp function to do any further initialization,
which includes parsing the command line (the C code can only do limited
1171 command-line parsing, which includes looking for the @samp{-batch} and
1172 @samp{-l} flags and a few other flags that it needs to know about before
1173 initialization is complete), creating the first frame (or @dfn{window}
1174 in standard window-system parlance), running the user's init file
1175 (usually the file @file{.emacs} in the user's home directory), etc.  The
1176 function to do this is usually called @code{normal-top-level};
1177 @file{loadup.el} tells the C code about this function by setting its
1178 name as the value of the Lisp variable @code{top-level}.
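The effect of the @code{initialized} flag can be modelled in a few
lines of C.  This sketch is hypothetical (@code{startup},
@code{log_step} and @code{step_log} are invented for illustration, and
the real @code{main()} does far more); it only records which of the
numbered steps run, first in the role of @file{temacs} and then in the
role of the dumped @file{xemacs}, where the flag is already set because
its value was saved in the dump.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of the startup sequence described above.  Each
   step appends a character to step_log so we can see which steps ran. */

static int initialized = 0;   /* set after step (4); its value is dumped */
static char step_log[16];

static void log_step(char step) {
  size_t n = strlen(step_log);
  step_log[n] = step;
  step_log[n + 1] = '\0';
}

static void startup(void) {
  step_log[0] = '\0';
  log_step('1');              /* basic environment, signal handlers */
  if (!initialized) {
    log_step('2');            /* initialize the Lisp interpreter */
    log_step('3');            /* create keymaps, faces, other built-ins */
  }
  log_step('4');              /* environment variables, date, user, ... */
  if (!initialized) {
    initialized = 1;
    log_step('5');            /* temacs: run loadup.el, which dumps */
  } else {
    log_step('T');            /* xemacs: call the Lisp top-level function */
  }
}
```

Calling @code{startup()} twice shows the difference: the first call runs
steps 1 through 5, and the second (with @code{initialized} already set)
runs only steps 1 and 4 before handing control to Lisp.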
1179
1180   When the Lisp initialization code is done, the C code enters the event
1181 loop, and stays there for the duration of the XEmacs process.  The code
1182 for the event loop is contained in @file{keyboard.c}, and is called
1183 @code{Fcommand_loop_1()}.  Note that this event loop could very well be
1184 written in Lisp, and in fact a Lisp version exists; but apparently,
1185 doing this makes XEmacs run noticeably slower.
1186
  Notice how much of the initialization is done in Lisp, not in C.
In general, XEmacs tries to move as much code as possible into Lisp.
Code that remains in C is code that implements the Lisp interpreter
itself, code that needs to be very fast, code that needs to make system
calls or do other things that can only be done in C, or code that needs
access to ``forbidden'' structures.

  One conscious aspect of the design of Lisp under XEmacs is a clean
separation between the external interface to a Lisp object's
functionality and its internal implementation.  Part of this design is
that Lisp programs are forbidden from accessing the contents of an
object except through a standard API.  In this respect, XEmacs Lisp is
similar to modern Lisp dialects but differs from GNU Emacs, which tends
to expose the implementation and allow Lisp programs to look at it
directly.  The major advantage of hiding the implementation is that it
allows the implementation to be redesigned without affecting any Lisp
programs, including those that might want to be ``clever'' by looking
directly at the object's contents and possibly manipulating them.
1206
1207   Moving code into Lisp makes the code easier to debug and maintain and
1208 makes it much easier for people who are not XEmacs developers to
1209 customize XEmacs, because they can make a change with much less chance
1210 of obscure and unwanted interactions occurring than if they were to
1211 change the C code.
1212
1213 @node The XEmacs Object System (Abstractly Speaking), How Lisp Objects Are Represented in C, XEmacs From the Inside, Top
1214 @chapter The XEmacs Object System (Abstractly Speaking)
1215
1216   At the heart of the Lisp interpreter is its management of objects.
1217 XEmacs Lisp contains many built-in objects, some of which are
1218 simple and others of which can be very complex; and some of which
1219 are very common, and others of which are rarely used or are only
1220 used internally. (Since the Lisp allocation system, with its
1221 automatic reclamation of unused storage, is so much more convenient
1222 than @code{malloc()} and @code{free()}, the C code makes extensive use of it
1223 in its internal operations.)
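Automatic reclamation of this kind is commonly implemented by
mark-and-sweep garbage collection, matching the description given
earlier: mark everything reachable from a set of roots, then reclaim
everything else.  The following is a minimal C sketch over a fixed pool
of cons-like cells; the pool, the root array, and all names are
illustrative assumptions, not XEmacs's actual allocator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mark-and-sweep sketch over a fixed pool of cons cells. */

#define POOL_SIZE 8

struct cons {
  struct cons *car, *cdr;   /* only pointers to other conses, for brevity */
  int in_use;               /* allocated and not yet reclaimed */
  int marked;               /* reached during the current mark phase */
};

static struct cons pool[POOL_SIZE];

static struct cons *alloc_cons(struct cons *car, struct cons *cdr) {
  for (size_t i = 0; i < POOL_SIZE; i++) {
    if (!pool[i].in_use) {
      pool[i].car = car;
      pool[i].cdr = cdr;
      pool[i].in_use = 1;
      pool[i].marked = 0;
      return &pool[i];
    }
  }
  return NULL;  /* a real allocator would collect garbage or grow here */
}

/* Mark phase: everything reachable from C stays alive. */
static void mark(struct cons *c) {
  while (c && !c->marked) {
    c->marked = 1;
    mark(c->car);
    c = c->cdr;   /* loop down the cdr so long lists do not recurse deeply */
  }
}

/* Sweep phase: reclaim unmarked cells and clear marks for next time. */
static size_t sweep(void) {
  size_t freed = 0;
  for (size_t i = 0; i < POOL_SIZE; i++) {
    if (pool[i].in_use && !pool[i].marked) {
      pool[i].in_use = 0;
      freed++;
    }
    pool[i].marked = 0;
  }
  return freed;
}

/* Returns the number of cells reclaimed. */
static size_t garbage_collect(struct cons **roots, size_t nroots) {
  for (size_t i = 0; i < nroots; i++)
    mark(roots[i]);
  return sweep();
}
```

A cell that nothing reachable points to is reclaimed without any
explicit @code{free()} call by the program that allocated it.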
1224
1225   The basic Lisp objects are
1226
1227 @table @code
1228 @item integer
1229 28 or 31 bits of precision, or 60 or 63 bits on 64-bit machines; the
1230 reason for this is described below when the internal Lisp object
1231 representation is described.
1232 @item float
1233 Same precision as a double in C.
1234 @item cons
1235 A simple container for two Lisp objects, used to implement lists and
1236 most other data structures in Lisp.
1237 @item char
1238 An object representing a single character of text; chars behave like
1239 integers in many ways but are logically considered text rather than
numbers and have a different read syntax. (The read syntax for a char
1241 contains the char itself or some textual encoding of it -- for example,
1242 a Japanese Kanji character might be encoded as @samp{^[$(B#&^[(B} using the
1243 ISO-2022 encoding standard -- rather than the numerical representation
1244 of the char; this way, if the mapping between chars and integers
1245 changes, which is quite possible for Kanji characters and other extended
1246 characters, the same character will still be created.  Note that some
1247 primitives confuse chars and integers.  The worst culprit is @code{eq},
1248 which makes a special exception and considers a char to be @code{eq} to
1249 its integer equivalent, even though in no other case are objects of two
1250 different types @code{eq}.  The reason for this monstrosity is
1251 compatibility with existing code; the separation of char from integer
1252 came fairly recently.)
1253 @item symbol
1254 An object that contains Lisp objects and is referred to by name;
1255 symbols are used to implement variables and named functions
1256 and to provide the equivalent of preprocessor constants in C.
1257 @item vector
1258 A one-dimensional array of Lisp objects providing constant-time access
1259 to any of the objects; access to an arbitrary object in a vector is
1260 faster than for lists, but the operations that can be done on a vector
1261 are more limited.
1262 @item string
1263 Self-explanatory; behaves much like a vector of chars
1264 but has a different read syntax and is stored and manipulated
1265 more compactly.
1266 @item bit-vector
1267 A vector of bits; similar to a string in spirit.
1268 @item compiled-function
1269 An object containing compiled Lisp code, known as @dfn{byte code}.
1270 @item subr
1271 A Lisp primitive, i.e. a Lisp-callable function implemented in C.
1272 @end table
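The odd integer precisions above come from reserving some bits of each
machine word for a type tag.  As a minimal sketch, assuming a 32-bit
word with a single low-order tag bit (the names and layout here are
illustrative, not XEmacs's actual representation), 31 bits remain for
the value; with four tag bits the same arithmetic would leave 28.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tagged-integer sketch: TAG_BITS low-order bits of each
   32-bit word hold a type tag, leaving VALUE_BITS for the integer. */

#define WORD_BITS 32
#define TAG_BITS 1
#define VALUE_BITS (WORD_BITS - TAG_BITS)
#define INT_TAG 1u                       /* low bit set => integer */

#define LISP_INT_MAX ((INT32_C(1) << (VALUE_BITS - 1)) - 1)
#define LISP_INT_MIN (-LISP_INT_MAX - 1)

typedef uint32_t lisp_word;

static int fits_in_lisp_int(int64_t n) {
  return n >= LISP_INT_MIN && n <= LISP_INT_MAX;
}

/* Shift the value left to make room for the tag, then set the tag. */
static lisp_word make_int(int32_t n) {
  assert(fits_in_lisp_int(n));
  return ((lisp_word) n << TAG_BITS) | INT_TAG;
}

static int is_int(lisp_word w) { return (w & INT_TAG) != 0; }

/* Shift the tag back out and sign-extend the VALUE_BITS-bit value. */
static int32_t xint(lisp_word w) {
  uint32_t v = w >> TAG_BITS;
  uint32_t sign = (uint32_t) 1 << (VALUE_BITS - 1);
  /* Portable sign extension: flip the sign bit, then subtract it. */
  return (int32_t) (v ^ sign) - (int32_t) sign;
}
```

With one tag bit, the representable range is -1073741824 through
1073741823, that is, plus or minus 2 to the 30th power.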
1273
1274 @cindex closure
1275 Note that there is no basic ``function'' type, as in more powerful
1276 versions of Lisp (where it's called a @dfn{closure}).  XEmacs Lisp does
1277 not provide the closure semantics implemented by Common Lisp and Scheme.
1278 The guts of a function in XEmacs Lisp are represented in one of four
1279 ways: a symbol specifying another function (when one function is an
1280 alias for another), a list (whose first element must be the symbol
1281 @code{lambda}) containing the function's source code, a
1282 compiled-function object, or a subr object. (In other words, given a
1283 symbol specifying the name of a function, calling @code{symbol-function}
1284 to retrieve the contents of the symbol's function cell will return one
1285 of these types of objects.)
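Following a chain of aliases to reach one of the other three
representations is a small loop.  In this hypothetical C sketch (types
and names invented for illustration), a function cell that holds another
symbol is followed until a lambda list, compiled function, or subr is
found; a hop limit guards against alias cycles.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the four shapes a function definition can take. */

enum fn_kind { FN_SYMBOL, FN_LAMBDA_LIST, FN_COMPILED, FN_SUBR };

struct symbol;

struct function_def {
  enum fn_kind kind;
  struct symbol *alias_target;   /* meaningful only when kind == FN_SYMBOL */
};

struct symbol {
  const char *name;
  struct function_def *function; /* the symbol's function cell, or NULL */
};

/* Follow aliases until reaching a lambda list, compiled function, or
   subr.  Returns NULL for an empty function cell or an alias chain that
   is too long, which also catches alias cycles. */
static struct function_def *resolve_function(struct symbol *sym) {
  for (int hops = 0; sym != NULL && hops < 100; hops++) {
    struct function_def *def = sym->function;
    if (def == NULL)
      return NULL;               /* function cell is empty */
    if (def->kind != FN_SYMBOL)
      return def;                /* a real definition */
    sym = def->alias_target;     /* an alias: keep following */
  }
  return NULL;
}
```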
1286
1287 XEmacs Lisp also contains numerous specialized objects used to implement
1288 the editor:
1289
1290 @table @code
1291 @item buffer
1292 Stores text like a string, but is optimized for insertion and deletion
1293 and has certain other properties that can be set.
1294 @item frame
1295 An object with various properties whose displayable representation is a
1296 @dfn{window} in window-system parlance.
1297 @item window
1298 A section of a frame that displays the contents of a buffer;
1299 often called a @dfn{pane} in window-system parlance.
1300 @item window-configuration
1301 An object that represents a saved configuration of windows in a frame.
1302 @item device
1303 An object representing a screen on which frames can be displayed;
1304 equivalent to a @dfn{display} in the X Window System and a @dfn{TTY} in
1305 character mode.
1306 @item face
1307 An object specifying the appearance of text or graphics; it has
1308 properties such as font, foreground color, and background color.
1309 @item marker
1310 An object that refers to a particular position in a buffer and moves
1311 around as text is inserted and deleted to stay in the same relative
1312 position to the text around it.
1313 @item extent
1314 Similar to a marker but covers a range of text in a buffer; can also
1315 specify properties of the text, such as a face in which the text is to
1316 be displayed, whether the text is invisible or unmodifiable, etc.
1317 @item event
1318 Generated by calling @code{next-event} and contains information
1319 describing a particular event happening in the system, such as the user
1320 pressing a key or a process terminating.
1321 @item keymap
1322 An object that maps from events (described using lists, vectors, and
1323 symbols rather than with an event object because the mapping is for
1324 classes of events, rather than individual events) to functions to
1325 execute or other events to recursively look up; the functions are
1326 described by name, using a symbol, or using lists to specify the
1327 function's code.
1328 @item glyph
An object that describes the appearance of an image (e.g. a pixmap) on
1330 the screen; glyphs can be attached to the beginning or end of extents
1331 and in some future version of XEmacs will be able to be inserted
1332 directly into a buffer.
1333 @item process
1334 An object that describes a connection to an externally-running process.
1335 @end table
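The marker behavior described above amounts to a small
position-adjustment rule applied on every insertion and deletion.  The
following C sketch uses hypothetical names; the per-marker flag deciding
whether a marker sitting exactly at the insertion point moves past the
new text is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical marker-adjustment sketch.  Positions count characters
   from 1, as buffer positions do in Lisp. */

struct marker {
  long pos;
  int advances_on_insert;   /* move past text inserted exactly at pos? */
};

/* LEN characters were inserted at position AT. */
static void adjust_for_insertion(struct marker *m, size_t n, long at, long len) {
  for (size_t i = 0; i < n; i++) {
    if (m[i].pos > at || (m[i].pos == at && m[i].advances_on_insert))
      m[i].pos += len;
  }
}

/* The text in [FROM, TO) was deleted; markers inside it collapse to FROM. */
static void adjust_for_deletion(struct marker *m, size_t n, long from, long to) {
  for (size_t i = 0; i < n; i++) {
    if (m[i].pos >= to)
      m[i].pos -= to - from;
    else if (m[i].pos > from)
      m[i].pos = from;
  }
}
```

In both cases each marker keeps pointing at the same piece of
surrounding text, which is what distinguishes a marker from a plain
integer position.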
1336
1337   There are some other, less-commonly-encountered general objects:
1338
1339 @table @code
1340 @item hash-table
1341 An object that maps from an arbitrary Lisp object to another arbitrary
1342 Lisp object, using hashing for fast lookup.
1343 @item obarray
1344 A limited form of hash-table that maps from strings to symbols; obarrays
1345 are used to look up a symbol given its name and are not actually their
1346 own object type but are kludgily represented using vectors with hidden
1347 fields (this representation derives from GNU Emacs).
1348 @item specifier
1349 A complex object used to specify the value of a display property; a
1350 default value is given and different values can be specified for
1351 particular frames, buffers, windows, devices, or classes of device.
1352 @item char-table
1353 An object that maps from chars or classes of chars to arbitrary Lisp
1354 objects; internally char tables use a complex nested-vector
representation that is optimized for the way characters are represented
1356 as integers.
1357 @item range-table
1358 An object that maps from ranges of integers to arbitrary Lisp objects.
1359 @end table
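A range table can be pictured as a sorted array of non-overlapping
integer ranges searched by bisection.  This C sketch is an assumption
about one reasonable layout, not a description of the actual
range-table implementation; integer values stand in for arbitrary Lisp
objects, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical range table: sorted, non-overlapping inclusive ranges,
   each mapped to a value. */

struct range_entry {
  long first, last;   /* inclusive bounds */
  int value;          /* stands in for an arbitrary Lisp object */
};

/* Binary search for the range containing KEY; DFLT if there is none. */
static int range_table_get(const struct range_entry *tab, size_t n,
                           long key, int dflt) {
  size_t lo = 0, hi = n;   /* search within tab[lo, hi) */
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (key < tab[mid].first)
      hi = mid;
    else if (key > tab[mid].last)
      lo = mid + 1;
    else
      return tab[mid].value;
  }
  return dflt;
}
```

Lookup costs a logarithmic number of comparisons regardless of how wide
each range is, which is the point of mapping ranges rather than
individual integers.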
1360
1361   And some strange special-purpose objects:
1362
1363 @table @code
1364 @item charset
1365 @itemx coding-system
1366 Objects used when MULE, or multi-lingual/Asian-language, support is
1367 enabled.
1368 @item color-instance
1369 @itemx font-instance
1370 @itemx image-instance
Objects that encapsulate window-system resources; instances are
mostly used internally but are exposed on the Lisp level for cleanness
of the specifier model and because it's occasionally useful for Lisp
programs to create or query the properties of instances.
1375 @item subwindow
An object that encapsulates a @dfn{subwindow} resource, i.e. a
1377 window-system child window that is drawn into by an external process;
1378 this object should be integrated into the glyph system but isn't yet,
1379 and may change form when this is done.
1380 @item tooltalk-message
1381 @itemx tooltalk-pattern
1382 Objects that represent resources used in the ToolTalk interprocess
1383 communication protocol.
1384 @item toolbar-button
1385 An object used in conjunction with the toolbar.
1386 @end table

  And objects that are only used internally:

@table @code
@item opaque
A generic object for encapsulating arbitrary memory; this gives you the
generality of @code{malloc()} and the convenience of the Lisp object
system.
@item lstream
A buffering I/O stream, used to provide a unified interface to anything
that can accept output or provide input, such as a file descriptor, a
stdio stream, a chunk of memory, a Lisp buffer, a Lisp string, etc.;
it's a Lisp object to make its memory management more convenient.
@item char-table-entry
Subsidiary objects in the internal char-table representation.
@item extent-auxiliary
@itemx menubar-data
@itemx toolbar-data
Various special-purpose objects that are basically just used to
encapsulate memory for particular subsystems, similar to the more
general ``opaque'' object.
@item symbol-value-forward
@itemx symbol-value-buffer-local
@itemx symbol-value-varalias
@itemx symbol-value-lisp-magic
Special internal-only objects that are placed in the value cell of a
symbol to indicate that there is something special about this variable,
e.g. it has no value, it mirrors another variable, or it mirrors some C
variable; there is really only one kind of object, called a
@dfn{symbol-value-magic}, but it is sort-of halfway kludged into
semi-different object types.
@end table

@cindex permanent objects
@cindex temporary objects
  Some types of objects are @dfn{permanent}, meaning that once created,
they do not disappear until explicitly destroyed, using a function such
as @code{delete-buffer}, @code{delete-window}, @code{delete-frame}, etc.
Others will disappear once they are no longer used, through the garbage
collection mechanism.  Buffers, frames, windows, devices, and processes
are among the objects that are permanent.  Note that some objects can go
both ways: faces can be created either way; extents are normally
permanent, but detached extents (extents not referring to any text, as
happens to some extents when the text they are referring to is deleted)
are temporary.  Note that some permanent objects, such as faces and
coding systems, cannot be deleted.  Note also that windows are unique in
that they can be @emph{undeleted} after having previously been
deleted.  (This happens as a result of restoring a window configuration.)

@cindex read syntax
  Note that many types of objects have a @dfn{read syntax}, i.e. a way of
specifying an object of that type in Lisp code.  When you load a Lisp
file, or type in code to be evaluated, what really happens is that the
function @code{read} is called, which reads some text and creates an object
based on the syntax of that text; then @code{eval} is called on that
object, which may do something special with it; then this loop repeats
until there's no more text to read.  (@code{eval} only actually does
something special with symbols, which causes the symbol's value to be
returned, similar to referencing a variable, and with conses [i.e. lists],
which cause a function invocation.  All other values are returned
unchanged.)

  The read syntax

@example
17297
@end example

converts to an integer whose value is 17297.

@example
1.983e-4
@end example

converts to a float whose value is 1.983e-4, i.e. 0.0001983.

@example
?b
@end example

converts to a char that represents the lowercase letter b.

@example
?^[$(B#&^[(B
@end example

(where @samp{^[} actually is an @samp{ESC} character) converts to a
particular Kanji character when using an ISO2022-based coding system for
input.  (To decode this goo: @samp{ESC} begins an escape sequence;
@samp{ESC $ (} is a class of escape sequences meaning ``switch to a
94x94 character set''; @samp{ESC $ ( B} means ``switch to Japanese
Kanji''; @samp{#} and @samp{&} collectively index into a 94-by-94 array
of characters [subtract 33 from the ASCII value of each character to get
the corresponding index]; @samp{ESC (} is a class of escape sequences
meaning ``switch to a 94 character set''; @samp{ESC ( B} means ``switch
to US ASCII''.  It is a coincidence that the letter @samp{B} is used to
denote both Japanese Kanji and US ASCII.  If the first @samp{B} were
replaced with an @samp{A}, you'd be requesting a Chinese Hanzi character
from the GB2312 character set.)

@example
"foobar"
@end example

converts to a string.

@example
foobar
@end example

converts to a symbol whose name is @code{"foobar"}.  This is done by
looking up the string equivalent in the global variable
@code{obarray}, whose contents should be an obarray.  If no symbol
is found, a new symbol with the name @code{"foobar"} is automatically
created and added to @code{obarray}; this process is called
@dfn{interning} the symbol.
@cindex interning

@example
(foo . bar)
@end example

converts to a cons cell containing the symbols @code{foo} and @code{bar}.

@example
(1 a 2.5)
@end example

converts to a three-element list containing the specified objects
(note that a list is actually a chain of nested conses; see the
XEmacs Lisp Reference).

@example
[1 a 2.5]
@end example

converts to a three-element vector containing the specified objects.

@example
#[... ... ... ...]
@end example

converts to a compiled-function object (the actual contents are not
shown since they are not relevant here; look at a file that ends with
@file{.elc} for examples).

@example
#*01110110
@end example

converts to a bit-vector.

@example
#s(hash-table ... ...)
@end example

converts to a hash table (the actual contents are not shown).

@example
#s(range-table ... ...)
@end example

converts to a range table (the actual contents are not shown).

@example
#s(char-table ... ...)
@end example

converts to a char table (the actual contents are not shown).

Note that the @code{#s()} syntax is the general syntax for structures,
which are not really implemented in XEmacs Lisp but should be.

When an object is printed out (using @code{print} or a related
function), the read syntax is used, so that the same object can be read
in again.

The other objects do not have read syntaxes, usually because it does not
really make sense to create them in this fashion (e.g. processes, where
it doesn't make sense to have a subprocess created as a side effect of
reading some Lisp code), or because they can't be created at all
(e.g. subrs).  Permanent objects, as a rule, do not have a read syntax;
nor do most complex objects, which contain too much state to be easily
initialized through a read syntax.

@node How Lisp Objects Are Represented in C, Rules When Writing New C Code, The XEmacs Object System (Abstractly Speaking), Top
@chapter How Lisp Objects Are Represented in C

Lisp objects are represented in C using a 32-bit or 64-bit machine word
(depending on the processor; e.g. DEC Alphas use 64-bit Lisp objects and
most other processors use 32-bit Lisp objects).  The representation
stuffs a pointer together with a tag, as follows:

@example
 [ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ]
 [ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ]

   <---> ^ <------------------------------------------------------>
    tag  |       a pointer to a structure, or an integer
         |
       mark bit
@end example

The tag describes the type of the Lisp object.  For integers and chars,
the lower 28 bits contain the value of the integer or char; for all
others, the lower 28 bits contain a pointer.  The mark bit is used
during garbage collection, and is always 0 when garbage collection is
not happening.  (The way that garbage collection works, basically, is that it
loops over all places where Lisp objects could exist -- this includes
all global variables in C that contain Lisp objects [including
@code{Vobarray}, the C equivalent of @code{obarray}; through this, all
Lisp variables will get marked], plus various other places -- and
recursively scans through the Lisp objects, marking each object it finds
by setting the mark bit.  Then it goes through the lists of all objects
allocated, freeing the ones that are not marked and turning off the mark
bit of the ones that are marked.)

Lisp objects use the typedef @code{Lisp_Object}, but the actual C type
used for the Lisp object can vary.  It can be either a simple type
(@code{long} on the DEC Alpha, @code{int} on other machines) or a
structure whose fields are bit fields that line up properly (actually, a
union of structures is used).  Generally the simple integral type is
preferable because it ensures that the compiler will actually use a
machine word to represent the object (some compilers will use more
general and less efficient code for unions and structs even if they can
fit in a machine word).  The union type, however, has the advantage of
stricter type checking (if you accidentally pass an integer where a Lisp
object is desired, you get a compile error), and it makes it easier to
decode Lisp objects when debugging.  The choice of which type to use is
determined by the preprocessor constant @code{USE_UNION_TYPE} which is
defined via the @code{--use-union-type} option to @code{configure}.

@cindex record type

Note that the three-bit tag can represent only eight types, but there
are many more actual types than this.  This is handled by having one of
the tag types specify a meta-type called a @dfn{record}; for all such
objects, the first four bytes of the pointed-to structure indicate what
the actual type is.

Note also that having 28 bits for pointers and integers restricts a lot
of things to 256 megabytes (2^28 bytes) of memory.  (Basically, enough
pointers and indices and whatnot get stuffed into Lisp objects that the
total amount of memory used by XEmacs can't grow above 256 megabytes.  In
older versions of XEmacs and GNU Emacs, the tag was 5 bits wide, allowing
for 32 types, which was more than the actual number of types that existed
at the time, and no ``record'' type was necessary.  However, with the
mark bit this left only 26 bits for pointers, which limited the editor to
64 megabytes total, an amount that some users who edited large files
might conceivably exceed.)

Also, note that there is an implicit assumption here that all pointers
are low enough that the top bits are all zero and can just be chopped
off.  On standard machines that allocate memory from the bottom up (and
give each process its own address space), this works fine.  Some
machines, however, put the data space somewhere else in memory
(e.g. beginning at 0x80000000).  Those machines cope by defining
@code{DATA_SEG_BITS} in the corresponding @file{m/} or @file{s/} file to
the proper mask.  Then, pointers retrieved from Lisp objects are
automatically OR'ed with this value prior to being used.

A corollary of the previous paragraph is that @strong{(pointers to)
stack-allocated structures cannot be put into Lisp objects}.  The stack
is generally located near the top of memory; if you put such a pointer
into a Lisp object, it will get its top bits chopped off, and you will
lose.

Actually, there's an alternative representation of a @code{Lisp_Object},
invented by Kyle Jones, that is used when the
@code{--use-minimal-tagbits} option to @code{configure} is used.  In
this case the 2 lower bits are used for the tag bits.  This
representation assumes that pointers to structs are always aligned to
multiples of 4, so the lower 2 bits are always zero.

@example
 [ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ]
 [ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ]

   <---------------------------------------------------------> <->
            a pointer to a structure, or an integer            tag
@end example

A tag of 00 is used for all pointer object types, a tag of 10 is used
for characters, and the other two tags 01 and 11 are joined together to
form the integer object type.  The mark bit is moved into the
structure being pointed at (integers and chars do not need to be marked,
since no memory is allocated for them).  This representation has these
advantages:

@enumerate
@item
31 bits can be used for Lisp integers.
@item
@emph{Any} pointer can be represented directly, and no bit masking
operations are necessary.
@end enumerate

The disadvantages are:

@enumerate
@item
An extra level of indirection is needed when accessing the object types
that were not record types.  So checking whether a Lisp object is a cons
cell becomes a slower operation.
@item
Mark bits can no longer be stored directly in Lisp objects, so another
place for them must be found.  This means that a cons cell requires more
memory than merely room for two Lisp objects, leading to extra memory use.
@end enumerate

Various macros are used to construct Lisp objects and extract the
components.  Macros of the form @code{XINT()}, @code{XCHAR()},
@code{XSTRING()}, @code{XSYMBOL()}, etc. mask out the pointer/integer
field and cast it to the appropriate type.  All of the macros that
extract pointers will @code{OR} them with @code{DATA_SEG_BITS} if
necessary.  @code{XINT()} needs to be a bit tricky so that negative
numbers are properly sign-extended: usually it does this by shifting the
number four bits (the width of the tag plus the mark bit) to the left
and then four bits to the right.  This assumes that the right-shift
operator does an arithmetic shift (i.e. it leaves the most-significant
bit as-is rather than shifting in a zero, so that it mimics a
divide-by-two even for negative numbers).  Not all machines/compilers do
this, and on the ones that don't, a more complicated definition is
selected by defining @code{EXPLICIT_SIGN_EXTEND}.

Note that when @code{ERROR_CHECK_TYPECHECK} is defined, the extractor
macros become more complicated -- they check the tag bits and/or the
type field in the first four bytes of a record type to ensure that the
object is really of the correct type.  This is great for catching places
where an incorrect type is being dereferenced -- this typically results
in a pointer being dereferenced as the wrong type of structure, with
unpredictable (and sometimes not easily traceable) results.

There are similar @code{XSET@var{TYPE}()} macros that construct a Lisp
object.  These macros are of the form @code{XSET@var{TYPE}
(@var{lvalue}, @var{result})},
i.e. they must be used as statements rather than within expressions.
The reason for this is that C89 doesn't let you ``construct'' a
structure within an expression (GCC does, as an extension).  Granted,
this sometimes isn't too convenient; for the case of integers, at least,
you can use the function @code{make_int()}, which constructs and
@emph{returns} an integer Lisp object.  Note that the
@code{XSET@var{TYPE}()} macros are also affected by
@code{ERROR_CHECK_TYPECHECK} and make sure that the structure is of the
right type in the case of record types, where the type is contained in
the structure.

The C programmer is responsible for @strong{guaranteeing} that a
@code{Lisp_Object} is of the correct type before using the
@code{X@var{TYPE}} macros.  This is especially important in the case of
lists.  Use @code{XCAR} and @code{XCDR} if a @code{Lisp_Object} is
certainly a cons cell, else use @code{Fcar()} and @code{Fcdr()}.  Trust
other C code, but not Lisp code.  On the other hand, if XEmacs has an
internal logic error, it's better to crash immediately, so sprinkle
``unreachable'' @code{abort()}s liberally about the source code.

@node Rules When Writing New C Code, A Summary of the Various XEmacs Modules, How Lisp Objects Are Represented in C, Top
@chapter Rules When Writing New C Code

The XEmacs C Code is extremely complex and intricate, and there are many
rules that are more or less consistently followed throughout the code.
Many of these rules are not obvious, so they are explained here.  It is
of the utmost importance that you follow them.  If you don't, you may
get something that appears to work, but which will crash in odd
situations, often in code far away from where the actual breakage is.

@menu
* General Coding Rules::
* Writing Lisp Primitives::
* Adding Global Lisp Variables::
* Coding for Mule::
* Techniques for XEmacs Developers::
@end menu

@node General Coding Rules
@section General Coding Rules

The C code is actually written in a dialect of C called @dfn{Clean C},
meaning that it can be compiled, mostly warning-free, with either a C or
C++ compiler.  Coding in Clean C has several advantages over plain C.
C++ compilers are more nit-picky, and a number of coding errors have
been found by compiling with C++.  The ability to use both C and C++
tools means that a greater variety of development tools is available to
the developer.

Almost every module contains a @code{syms_of_*()} function and a
@code{vars_of_*()} function.  The former declares any Lisp primitives
you have defined and defines any symbols you will be using.  The latter
declares any global Lisp variables you have added and initializes global
C variables in the module.  For each such function, declare it in
@file{symsinit.h} and make sure it's called in the appropriate place in
@file{emacs.c}.  @strong{Important}: There are stringent requirements on
exactly what can go into these functions.  See the comment in
@file{emacs.c}.  The reason for this is to avoid obscure unwanted
interactions during initialization.  If you don't follow these rules,
you'll be sorry!  If you want to do anything that isn't allowed, create
a @code{complex_vars_of_*()} function for it.  Doing this is tricky,
though: you have to make sure your function is called at the right time
so that all the initialization dependencies work out.

Every module includes @file{<config.h>} (angle brackets so that
@samp{--srcdir} works correctly; @file{config.h} may or may not be in
the same directory as the C sources) and @file{lisp.h}.  @file{config.h}
must always be included before any other header files (including
system header files) to ensure that certain tricks played by various
@file{s/} and @file{m/} files work out correctly.

@strong{All global and static variables that are to be modifiable must
be declared uninitialized.}  This means that you may not use the
``declare with initializer'' form for these variables, such as @code{int
some_variable = 0;}.  The reason for this has to do with some kludges
done during the dumping process: If possible, the initialized data
segment is re-mapped so that it becomes part of the (unmodifiable) code
segment in the dumped executable.  This allows this memory to be shared
among multiple running XEmacs processes.  XEmacs is careful to place as
much constant data as possible into initialized variables (in
particular, into what's called the @dfn{pure space} -- see below) during
the @file{temacs} phase.

@cindex copy-on-write
@strong{Please note:} This kludge only works on a few systems nowadays,
and is rapidly becoming irrelevant because most modern operating systems
provide @dfn{copy-on-write} semantics.  All data is initially shared
between processes, and a private copy is automatically made (on a
page-by-page basis) when a process first attempts to write to a page of
memory.

Formerly, there was a requirement that static variables not be declared
inside of functions.  This had to do with another hack in the same
vein as what was just described: old USG systems put statically-declared
variables in the initialized data space, so those header files had a
@code{#define static} declaration.  (That way, the data-segment remapping
described above could still work.)  This fails badly on static variables
inside of functions, which suddenly become automatic variables;
therefore, you weren't supposed to have any of them.  This awful kludge
has been removed in XEmacs because

@enumerate
@item
almost all of the systems that used this kludge ended up having
to disable the data-segment remapping anyway;
@item
the only systems that didn't were extremely outdated ones;
@item
this hack completely messed up inline functions.
@end enumerate

The C source code makes heavy use of C preprocessor macros.  One popular
macro style is:

@example
#define FOO(var, value) do @{           \
  Lisp_Object FOO_value = (value);      \
  Lisp_Object FOO_result;               \
  /* ... compute FOO_result using FOO_value ... */ \
  (var) = FOO_result;                   \
@} while (0)
@end example

The @code{do @{...@} while (0)} is a standard trick to allow FOO to have
statement semantics, so that it can safely be used within an @code{if}
statement in C, for example.  Multiple evaluation is prevented by
copying a supplied argument into a local variable, so that
@code{FOO(var,fun(1))} only calls @code{fun} once.

Lisp lists are popular data structures in the C code as well as in
Elisp.  There are two sets of macros that iterate over lists.
@code{EXTERNAL_LIST_LOOP_@var{n}} should be used when the list has been
supplied by the user, and cannot be trusted to be acyclic and
nil-terminated.  A @code{malformed-list} or @code{circular-list} error
will be generated if the list being iterated over is not entirely
kosher.  @code{LIST_LOOP_@var{n}}, on the other hand, is faster and less
safe, and can be used only on trusted lists.

Related macros are @code{GET_EXTERNAL_LIST_LENGTH} and
@code{GET_LIST_LENGTH}, which calculate the length of a list and, in the
case of @code{GET_EXTERNAL_LIST_LENGTH}, validate that the list is
proper.  The macros @code{EXTERNAL_LIST_LOOP_DELETE_IF} and
@code{LIST_LOOP_DELETE_IF} delete the elements satisfying some predicate
from a Lisp list.

@node Writing Lisp Primitives
@section Writing Lisp Primitives

Lisp primitives are Lisp functions implemented in C.  The details of
interfacing the C function so that Lisp can call it are handled by a few
C macros.  The only way to really understand how to write new C code is
to read the source, but we can explain some things here.

An example of a special form is the definition of @code{prog1}, from
@file{eval.c}.  (An ordinary function would have the same general
appearance.)

@cindex garbage collection protection
@smallexample
@group
DEFUN ("prog1", Fprog1, 1, UNEVALLED, 0, /*
Similar to `progn', but the value of the first form is returned.
\(prog1 FIRST BODY...): All the arguments are evaluated sequentially.
The value of FIRST is saved during evaluation of the remaining args,
whose values are discarded.
*/
       (args))
@{
  /* This function can GC */
  REGISTER Lisp_Object val, form, tail;
  struct gcpro gcpro1;

  val = Feval (XCAR (args));

  GCPRO1 (val);

  LIST_LOOP_3 (form, XCDR (args), tail)
    Feval (form);

  UNGCPRO;
  return val;
@}
@end group
@end smallexample

  Let's start with a precise explanation of the arguments to the
@code{DEFUN} macro.  Here is a template for them:

@example
@group
DEFUN (@var{lname}, @var{fname}, @var{min_args}, @var{max_args}, @var{interactive}, /*
@var{docstring}
*/
   (@var{arglist}))
@end group
@end example

@table @var
@item lname
This string is the name of the Lisp symbol to define as the function
name; in the example above, it is @code{"prog1"}.

@item fname
This is the C function name for this function.  This is the name that is
used in C code for calling the function.  The name is, by convention,
@samp{F} prepended to the Lisp name, with all dashes (@samp{-}) in the
Lisp name changed to underscores.  Thus, to call this function from C
code, call @code{Fprog1}.  Remember that the arguments are of type
@code{Lisp_Object}; various macros and functions for creating values of
type @code{Lisp_Object} are declared in the file @file{lisp.h}.

Primitives whose names are special characters (e.g. @code{+} or
@code{<}) are named by spelling out, in some fashion, the special
character: e.g. @code{Fplus()} or @code{Flss()}.  Primitives whose names
begin with normal alphanumeric characters but also contain special
characters are spelled out in some creative way, e.g. @code{let*}
becomes @code{FletX()}.

Each function also has an associated structure that holds the data for
the subr object that represents the function in Lisp.  This structure
conveys the Lisp symbol name to the initialization routine that will
create the symbol and store the subr object as its definition.  The C
variable name of this structure is always @samp{S} prepended to the
@var{fname}.  You hardly ever need to be aware of the existence of this
structure, since @code{DEFUN} plus @code{DEFSUBR} takes care of all the
details.

@item min_args
This is the minimum number of arguments that the function requires.  The
function @code{prog1} requires at least one argument.

@item max_args
This is the maximum number of arguments that the function accepts, if
there is a fixed maximum.  Alternatively, it can be @code{UNEVALLED},
indicating a special form that receives unevaluated arguments, or
@code{MANY}, indicating an unlimited number of evaluated arguments (the
C equivalent of @code{&rest}).  Both @code{UNEVALLED} and @code{MANY}
are macros.  If @var{max_args} is a number, it may not be less than
@var{min_args} and it may not be greater than 8.  (If you need to add a
function with more than 8 arguments, use the @code{MANY} form.  Resist
the urge to edit the definition of @code{DEFUN} in @file{lisp.h}.  If
you do it anyway, make sure to also add another clause to the switch
statement in @code{primitive_funcall()}.)

@item interactive
This is an interactive specification, a string such as might be used as
the argument of @code{interactive} in a Lisp function.  In the case of
@code{prog1}, it is 0 (a null pointer), indicating that @code{prog1}
cannot be called interactively.  A value of @code{""} indicates a
function that should receive no arguments when called interactively.

@item docstring
This is the documentation string.  It is written just like a
documentation string for a function defined in Lisp; in particular, the
first line should be a single sentence.  Note how the documentation
string is enclosed in a comment, none of the documentation is placed on
the same lines as the comment-start and comment-end characters, and the
comment-start characters are on the same line as the interactive
specification.  @file{make-docfile}, which scans the C files for
documentation strings, is very particular about what it looks for, and
will not properly extract the doc string if it's not in this exact format.

In order to make both @file{etags} and @file{make-docfile} happy, make
sure that the @code{DEFUN} line contains the @var{lname} and
@var{fname}, that the comment-start characters for the doc string are
on the same line as the interactive specification, and that a newline
directly follows them (and precedes the comment-end characters).

@item arglist
This is the comma-separated list of arguments to the C function.  For a
function with a fixed maximum number of arguments, provide a C argument
for each Lisp argument.  In this case, unlike regular C functions, the
types of the arguments are not declared; they are simply always of type
@code{Lisp_Object}.

The names of the C arguments will be used as the names of the arguments
to the Lisp primitive as displayed in its documentation, modulo the same
concerns described above for @code{F...} names (in particular,
underscores in the C arguments become dashes in the Lisp arguments).

There is one additional kludge: a trailing @samp{_} on the C argument is
discarded when forming the Lisp argument.  This allows C language
reserved words (like @code{default}) or global symbols (like
@code{dirname}) to be used as argument names without compiler warnings
or errors.

A Lisp function with @w{@var{max_args} = @code{UNEVALLED}} is a
@w{@dfn{special form}}; its arguments are not evaluated.  Instead it
receives one argument of type @code{Lisp_Object}, a (Lisp) list of the
unevaluated arguments, conventionally named @code{(args)}.

When a Lisp function has no upper limit on the number of arguments,
specify @w{@var{max_args} = @code{MANY}}.  In this case its implementation in
C actually receives exactly two arguments: the number of Lisp arguments
(an @code{int}) and the address of a block containing their values (a
@w{@code{Lisp_Object *}}).  This is the only case in which the C types
are specified in the @var{arglist}: @w{@code{(int nargs, Lisp_Object *args)}}.

@end table

Within the function @code{Fprog1} itself, note the use of the macros
@code{GCPRO1} and @code{UNGCPRO}.  @code{GCPRO1} is used to ``protect''
a variable from garbage collection, that is, to inform the garbage
collector that it must look in that variable and regard the object
pointed at by its contents as an accessible object.  This is necessary
whenever you call @code{Feval} or anything that can directly or
indirectly call @code{Feval} (this includes the @code{QUIT} macro!),
because garbage collection can happen during evaluation.  At such a
time, any Lisp object that you intend to refer to again must be
protected somehow.  @code{UNGCPRO} cancels the protection of the
variables that are protected in the current function.  It is necessary
to do this explicitly.

The macro @code{GCPRO1} protects just one local variable.  If you want
to protect two, use @code{GCPRO2} instead; repeating @code{GCPRO1} will
not work.  Macros @code{GCPRO3} and @code{GCPRO4} also exist.

These macros implicitly use local variables such as @code{gcpro1}; you
must declare these explicitly, with type @code{struct gcpro}.  Thus, if
you use @code{GCPRO2}, you must declare @code{gcpro1} and @code{gcpro2}.
2039
2040 @cindex caller-protects (@code{GCPRO} rule)
2041 Note also that the general rule is @dfn{caller-protects}; i.e. you are
2042 only responsible for protecting those Lisp objects that you create.  Any
2043 objects passed to you as arguments should have been protected by whoever
2044 created them, so you don't in general have to protect them.
2045
2046 In particular, the arguments to any Lisp primitive are always
2047 automatically @code{GCPRO}ed, when called ``normally'' from Lisp code or
2048 bytecode.  So only a few Lisp primitives that are called frequently from
2049 C code, such as @code{Fprogn} protect their arguments as a service to
2050 their caller.  You don't need to protect your arguments when writing a
2051 new @code{DEFUN}.
2052
2053 @code{GCPRO}ing is perhaps the trickiest and most error-prone part of
2054 XEmacs coding.  It is @strong{extremely} important that you get this
2055 right and use a great deal of discipline when writing this code.
2056 @xref{GCPROing, ,@code{GCPRO}ing}, for full details on how to do this.
2057
2058 What @code{DEFUN} actually does is declare a global structure of type
2059 @code{Lisp_Subr} whose name begins with capital @samp{SF} and which
2060 contains information about the primitive (e.g. a pointer to the
2061 function, its minimum and maximum allowed arguments, a string describing
2062 its Lisp name); @code{DEFUN} then begins a normal C function declaration
2063 using the @code{F...} name.  The Lisp subr object that is the function
2064 definition of a primitive (i.e. the object in the function slot of the
2065 symbol that names the primitive) actually points to this @samp{SF}
2066 structure; when @code{Feval} encounters a subr, it looks in the
2067 structure to find out how to call the C function.
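A rough model of the @samp{SF} descriptor and of how an evaluator might use it follows.  It assumes a simplified @code{struct toy_subr} (hypothetical; the real @code{Lisp_Subr} has more fields, such as the interactive prompt and documentation) and a hypothetical two-argument primitive named @code{plus2}.

```c
typedef long Lisp_Object;              /* hypothetical stand-in */

/* Simplified descriptor.  Only fixed two-argument functions are
   modeled, so fn always receives exactly two arguments. */
struct toy_subr
{
  const char *name;                    /* Lisp-visible name */
  short min_args, max_args;            /* allowed argument counts */
  Lisp_Object (*fn) (Lisp_Object, Lisp_Object);
};

/* The ordinary C function that DEFUN would begin. */
static Lisp_Object
Fplus2 (Lisp_Object a, Lisp_Object b)
{
  return a + b;
}

/* The SF... structure that DEFUN would emit alongside it. */
static struct toy_subr SFplus2 = { "plus2", 2, 2, Fplus2 };

/* Evaluator side: consult the descriptor before calling the C code.
   Returns 0 on success, -1 on a wrong number of arguments. */
static int
toy_call_subr (const struct toy_subr *subr, int nargs,
               const Lisp_Object *args, Lisp_Object *result)
{
  if (nargs < subr->min_args || nargs > subr->max_args)
    return -1;
  *result = subr->fn (args[0], args[1]);
  return 0;
}
```

In this model the arity check happens before any code of the primitive runs, which is why the argument limits given to @code{DEFUN} must match the C function.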
2068
2069 Defining the C function is not enough to make a Lisp primitive
2070 available; you must also create the Lisp symbol for the primitive (the
2071 symbol is @dfn{interned}; @pxref{Obarrays}) and store a suitable subr
2072 object in its function cell. (If you don't do this, the primitive won't
2073 be seen by Lisp code.) The code looks like this:
2074
2075 @example
2076 DEFSUBR (@var{fname});
2077 @end example
2078
2079 @noindent
2080 Here @var{fname} is the same name you used as the second argument to
2081 @code{DEFUN}.
2082
2083 This call to @code{DEFSUBR} should go in the @code{syms_of_*()} function
2084 at the end of the module.  If no such function exists, create it and
2085 make sure to also declare it in @file{symsinit.h} and call it from the
2086 appropriate spot in @code{main()}.  @xref{General Coding Rules}.
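The effect of @code{DEFSUBR} can be sketched with a toy obarray: interning a name and storing the subr in the symbol's function cell is what makes the primitive visible.  All names below (@code{toy_intern}, @code{toy_defsubr}, @code{syms_of_toymod}, and the Lisp name @code{identity1}) are hypothetical, and the linear-search table merely stands in for the real obarray.

```c
#include <stddef.h>
#include <string.h>

typedef long Lisp_Object;              /* hypothetical stand-in */

struct toy_subr
{
  const char *name;
  Lisp_Object (*fn) (Lisp_Object);
};

struct toy_symbol
{
  const char *name;
  const struct toy_subr *function;     /* function cell; NULL = void */
};

#define TOY_OBARRAY_SIZE 32
static struct toy_symbol toy_obarray[TOY_OBARRAY_SIZE];
static int toy_nsymbols = 0;

/* Look a symbol up without creating it. */
static struct toy_symbol *
toy_find (const char *name)
{
  int i;

  for (i = 0; i < toy_nsymbols; i++)
    if (strcmp (toy_obarray[i].name, name) == 0)
      return &toy_obarray[i];
  return NULL;
}

/* Find a symbol by name, creating it if needed ("interning"). */
static struct toy_symbol *
toy_intern (const char *name)
{
  struct toy_symbol *sym = toy_find (name);

  if (sym || toy_nsymbols == TOY_OBARRAY_SIZE)
    return sym;
  toy_obarray[toy_nsymbols].name = name;
  toy_obarray[toy_nsymbols].function = NULL;
  return &toy_obarray[toy_nsymbols++];
}

/* Analogue of DEFSUBR: intern the name and fill the function cell. */
static void
toy_defsubr (const struct toy_subr *subr)
{
  struct toy_symbol *sym = toy_intern (subr->name);

  if (sym)
    sym->function = subr;
}

static Lisp_Object Fidentity1 (Lisp_Object x) { return x; }
static struct toy_subr SFidentity1 = { "identity1", Fidentity1 };

/* Analogue of a module's syms_of_*() function. */
static void
syms_of_toymod (void)
{
  toy_defsubr (&SFidentity1);
}

/* What Lisp code would see: the function cell of the named symbol. */
static const struct toy_subr *
toy_symbol_function (const char *name)
{
  struct toy_symbol *sym = toy_find (name);

  return sym ? sym->function : NULL;
}
```

Until @code{syms_of_toymod} runs, the name has no function definition, even though the C function exists and is linked in.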
2087
2088 Note that C code cannot call functions by name unless they are defined
2089 in C.  The way to call a function written in Lisp from C is to use
2090 @code{Ffuncall}, which embodies the Lisp function @code{funcall}.  Since
2091 the Lisp function @code{funcall} accepts an unlimited number of
2092 arguments, in C it takes two: the number of Lisp-level arguments, and a
2093 one-dimensional array containing their values.  The first Lisp-level
2094 argument is the Lisp function to call, and the rest are the arguments to
2095 pass to it.  Since @code{Ffuncall} can call the evaluator, you must
2096 protect pointers from garbage collection around the call to
2097 @code{Ffuncall}. (However, @code{Ffuncall} explicitly protects all of
2098 its parameters, so you don't have to protect any pointers passed as
2099 parameters to it.)
2100
The C functions @code{call0}, @code{call1}, @code{call2}, and so on,
provide convenient ways to call a Lisp function with a fixed number of
arguments.  They work by calling @code{Ffuncall}.
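The calling convention can be modeled as follows.  This is a sketch with hypothetical names: a ``function object'' is just an index into a table, arguments are plain @code{long}s, and no garbage collection is involved, so the protection rules above are not exercised.  @code{toy_call2} shows how a @code{call2}-style helper packs its fixed arguments into the @code{(nargs, args)} array, with the function in slot 0.

```c
typedef long Lisp_Object;              /* hypothetical stand-in */

/* Every callable here takes (nargs, args), like a MANY primitive. */
typedef Lisp_Object (*toy_lisp_fn) (int nargs, Lisp_Object *args);

static Lisp_Object
toy_sum (int nargs, Lisp_Object *args)
{
  Lisp_Object total = 0;
  int i;

  for (i = 0; i < nargs; i++)
    total += args[i];
  return total;
}

/* Hypothetical: a "function object" is an index into this table. */
static toy_lisp_fn toy_function_table[] = { toy_sum };

/* Same convention as Ffuncall: args[0] is the function and
   args[1] .. args[nargs - 1] are the arguments passed to it. */
static Lisp_Object
toy_funcall (int nargs, Lisp_Object *args)
{
  toy_lisp_fn fn = toy_function_table[args[0]];

  return fn (nargs - 1, args + 1);
}

/* What a call2-style helper does: pack two fixed arguments. */
static Lisp_Object
toy_call2 (Lisp_Object fn, Lisp_Object arg1, Lisp_Object arg2)
{
  Lisp_Object args[3];

  args[0] = fn;
  args[1] = arg1;
  args[2] = arg2;
  return toy_funcall (3, args);
}
```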
2104
2105 @file{eval.c} is a very good file to look through for examples;
2106 @file{lisp.h} contains the definitions for important macros and
2107 functions.
2108
2109 @node Adding Global Lisp Variables
2110 @section Adding Global Lisp Variables
2111
2112 Global variables whose names begin with @samp{Q} are constants whose
2113 value is a symbol of a particular name.  The name of the variable should
2114 be derived from the name of the symbol using the same rules as for Lisp
2115 primitives.  These variables are initialized using a call to
2116 @code{defsymbol()} in the @code{syms_of_*()} function. (This call
2117 interns a symbol, sets the C variable to the resulting Lisp object, and
2118 calls @code{staticpro()} on the C variable to tell the
2119 garbage-collection mechanism about this variable.  What
2120 @code{staticpro()} does is add a pointer to the variable to a large
2121 global array; when garbage-collection happens, all pointers listed in
2122 the array are used as starting points for marking Lisp objects.  This is
2123 important because it's quite possible that the only current reference to
2124 the object is the C variable.  In the case of symbols, the
2125 @code{staticpro()} doesn't matter all that much because the symbol is
2126 contained in @code{obarray}, which is itself @code{staticpro()}ed.
2127 However, it's possible that a naughty user could do something like
2128 uninterning the symbol out of @code{obarray} or even setting
2129 @code{obarray} to a different value [although this is likely to make
2130 XEmacs crash!].)
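A minimal sketch of a @code{staticpro()}-style root table, using hypothetical @code{toy_} names and a hypothetical @code{Qtoy_symbol} variable.  Because the table records the @emph{address} of each C variable rather than its value, a collection sees whatever object the variable holds at that moment, including values assigned after the @code{staticpro()} call.

```c
typedef long Lisp_Object;              /* hypothetical stand-in */

#define TOY_MAX_STATICPROS 16

/* The "large global array" of pointers to C variables. */
static Lisp_Object *toy_staticvec[TOY_MAX_STATICPROS];
static int toy_staticidx = 0;

/* Analogue of staticpro(): remember the variable's address.
   Returns -1 if the table is full. */
static int
toy_staticpro (Lisp_Object *varaddress)
{
  if (toy_staticidx == TOY_MAX_STATICPROS)
    return -1;
  toy_staticvec[toy_staticidx++] = varaddress;
  return 0;
}

/* Analogue of the root scan at the start of marking: is OBJ held by
   any staticpro'ed variable right now? */
static int
toy_is_static_root (Lisp_Object obj)
{
  int i;

  for (i = 0; i < toy_staticidx; i++)
    if (*toy_staticvec[i] == obj)
      return 1;
  return 0;
}

static Lisp_Object Qtoy_symbol;        /* hypothetical Q... variable */
```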
2131
2132   @strong{Please note:} It is potentially deadly if you declare a
2133 @samp{Q...}  variable in two different modules.  The two calls to
2134 @code{defsymbol()} are no problem, but some linkers will complain about
2135 multiply-defined symbols.  The most insidious aspect of this is that
2136 often the link will succeed anyway, but then the resulting executable
2137 will sometimes crash in obscure ways during certain operations!  To
2138 avoid this problem, declare any symbols with common names (such as
2139 @code{text}) that are not obviously associated with this particular
2140 module in the module @file{general.c}.
2141
2142   Global variables whose names begin with @samp{V} are variables that
2143 contain Lisp objects.  The convention here is that all global variables
2144 of type @code{Lisp_Object} begin with @samp{V}, and all others don't
2145 (including integer and boolean variables that have Lisp
2146 equivalents). Most of the time, these variables have equivalents in
2147 Lisp, but some don't.  Those that do are declared this way by a call to
2148 @code{DEFVAR_LISP()} in the @code{vars_of_*()} initializer for the
2149 module.  What this does is create a special @dfn{symbol-value-forward}
2150 Lisp object that contains a pointer to the C variable, intern a symbol
2151 whose name is as specified in the call to @code{DEFVAR_LISP()}, and set
2152 its value to the symbol-value-forward Lisp object; it also calls
2153 @code{staticpro()} on the C variable to tell the garbage-collection
2154 mechanism about the variable.  When @code{eval} (or actually
2155 @code{symbol-value}) encounters this special object in the process of
2156 retrieving a variable's value, it follows the indirection to the C
2157 variable and gets its value.  @code{setq} does similar things so that
2158 the C variable gets changed.
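The forwarding idea can be sketched as a symbol whose value cell may point at a C variable.  This is a hypothetical model (the real symbol-value-forward object carries type information and more), but it shows why reading or setting the symbol from Lisp and using the C variable from C always agree.

```c
#include <stddef.h>

typedef long Lisp_Object;              /* hypothetical stand-in */

/* The value cell either holds a value directly or forwards to a C
   variable, like a symbol-value-forward object. */
struct toy_symbol
{
  const char *name;
  Lisp_Object value;                   /* used when forward == NULL */
  Lisp_Object *forward;                /* C variable, if forwarded */
};

/* Analogue of DEFVAR_LISP: point the symbol at the C variable. */
static void
toy_defvar_lisp (struct toy_symbol *sym, Lisp_Object *c_var)
{
  sym->forward = c_var;
}

/* Analogue of symbol-value: follow the indirection if present. */
static Lisp_Object
toy_symbol_value (const struct toy_symbol *sym)
{
  return sym->forward ? *sym->forward : sym->value;
}

/* Analogue of setq: write through to the C variable if forwarded. */
static void
toy_set (struct toy_symbol *sym, Lisp_Object newval)
{
  if (sym->forward)
    *sym->forward = newval;
  else
    sym->value = newval;
}

static Lisp_Object Vtoy_var;           /* hypothetical C variable */
static struct toy_symbol toy_var_sym = { "toy-var", 0, NULL };
```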
2159
2160   Whether or not you @code{DEFVAR_LISP()} a variable, you need to
2161 initialize it in the @code{vars_of_*()} function; otherwise it will end
2162 up as all zeroes, which is the integer 0 (@emph{not} @code{nil}), and
2163 this is probably not what you want.  Also, if the variable is not
2164 @code{DEFVAR_LISP()}ed, @strong{you must call} @code{staticpro()} on the
2165 C variable in the @code{vars_of_*()} function.  Otherwise, the
2166 garbage-collection mechanism won't know that the object in this variable
2167 is in use, and will happily collect it and reuse its storage for another
2168 Lisp object, and you will be the one who's unhappy when you can't figure
2169 out how your variable got overwritten.
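A small model of the zero-versus-@code{nil} pitfall and of a correct @code{vars_of_*()} function.  The tagging scheme, @code{TOY_QNIL}, and the other @code{toy_} names are hypothetical and differ from the real XEmacs object representation; the point that carries over is that an all-zero @code{Lisp_Object} is a valid integer, so forgetting the initialization does not fail loudly.

```c
#include <stdint.h>

/* Hypothetical tagging: low bit 0 = integer, low bit 1 = pointer to
   an object.  Real XEmacs tagging differs. */
typedef uintptr_t Lisp_Object;

#define TOY_INTP(x)  (((x) & 1) == 0)
#define TOY_XINT(x)  ((long) ((x) >> 1))

/* nil is a real object here, so its bit pattern is never zero. */
static struct { int dummy; } toy_nil_storage;
#define TOY_QNIL     ((Lisp_Object) &toy_nil_storage | 1)

/* Minimal staticpro() stand-in. */
static Lisp_Object *toy_staticvec[8];
static int toy_staticidx = 0;

static void
toy_staticpro (Lisp_Object *varaddress)
{
  if (toy_staticidx < 8)
    toy_staticvec[toy_staticidx++] = varaddress;
}

/* A C-only Lisp variable (not DEFVAR_LISP'ed).  As a static, it
   starts out as all zero bits, i.e. the integer 0. */
static Lisp_Object Vtoy_list;

/* Analogue of vars_of_*(): initialize AND staticpro the variable. */
static void
vars_of_toymod (void)
{
  Vtoy_list = TOY_QNIL;
  toy_staticpro (&Vtoy_list);
}
```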
2170
2171 @node Coding for Mule
2172 @section Coding for Mule
2173 @cindex Coding for Mule
2174
Although Mule support is not compiled in by default in XEmacs, many
people use it, and we consider it crucial that new code work correctly
with multibyte characters.  This is not hard; it is only a matter of
following several simple coding guidelines.  Even if you never compile
with Mule, with a little practice you will find it quite easy to code
Mule-correctly.
2181
2182 Note that these guidelines are not necessarily tied to the current Mule
2183 implementation; they are also a good idea to follow on the grounds of
2184 code generalization for future I18N work.
2185
2186 @menu
2187 * Character-Related Data Types::
2188 * Working With Character and Byte Positions::
2189 * Conversion to and from External Data::
2190 * General Guidelines for Writing Mule-Aware Code::
2191 * An Example of Mule-Aware Code::
2192 @end menu
2193
2194 @node Character-Related Data Types
2195 @subsection Character-Related Data Types
2196
2197 First, let's review the basic character-related datatypes used by
2198 XEmacs.  Note that the separate @code{typedef}s are not mandatory in the
2199 current implementation (all of them boil down to @code{unsigned char} or
2200 @code{int}), but they improve clarity of code a great deal, because one
2201 glance at the declaration can tell the intended use of the variable.
2202
2203 @table @code
2204 @item Emchar
2205 @cindex Emchar
2206 An @code{Emchar} holds a single Emacs character.
2207
2208 Obviously, the equality between characters and bytes is lost in the Mule
2209 world.  Characters can be represented by one or more bytes in the
2210 buffer, and @code{Emchar} is the C type large enough to hold any
2211 character.
2212
2213 Without Mule support, an @code{Emchar} is equivalent to an
2214 @code{unsigned char}.
2215
2216 @item Bufbyte
2217 @cindex Bufbyte
2218 The data representing the text in a buffer or string is logically a set
2219 of @code{Bufbyte}s.
2220
XEmacs does not operate on text in external formats directly; when
reading characters from the outside, it decodes them to an internal
format, and likewise encodes them when writing.  @code{Bufbyte} (in fact
@code{unsigned char}) is the basic unit of the internal format used for
XEmacs buffers and strings.
2226
2227 One character can correspond to one or more @code{Bufbyte}s.  In the
2228 current implementation, an ASCII character is represented by the same
2229 @code{Bufbyte}, and extended characters are represented by a sequence of
2230 @code{Bufbyte}s.
2231
2232 Without Mule support, a @code{Bufbyte} is equivalent to an
2233 @code{Emchar}.
2234
2235 @item Bufpos
2236 @itemx Charcount
2237 @cindex Bufpos
2238 @cindex Charcount
2239 A @code{Bufpos} represents a character position in a buffer or string.
2240 A @code{Charcount} represents a number (count) of characters.
2241 Logically, subtracting two @code{Bufpos} values yields a
2242 @code{Charcount} value.  Although all of these are @code{typedef}ed to
2243 @code{int}, we use them in preference to @code{int} to make it clear
2244 what sort of position is being used.
2245
2246 @code{Bufpos} and @code{Charcount} values are the only ones that are
2247 ever visible to Lisp.
2248
2249 @item Bytind
2250 @itemx Bytecount
2251 @cindex Bytind
2252 @cindex Bytecount
2253 A @code{Bytind} represents a byte position in a buffer or string.  A
2254 @code{Bytecount} represents the distance between two positions in bytes.
2255 The relationship between @code{Bytind} and @code{Bytecount} is the same
2256 as the relationship between @code{Bufpos} and @code{Charcount}.
2257
2258 @item Extbyte
2259 @itemx Extcount
2260 @cindex Extbyte
2261 @cindex Extcount
2262 When dealing with the outside world, XEmacs works with @code{Extbyte}s,
2263 which are equivalent to @code{unsigned char}.  Obviously, an
2264 @code{Extcount} is the distance between two @code{Extbyte}s.  Extbytes
2265 and Extcounts are not all that frequent in XEmacs code.
2266 @end table
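For illustration, typedefs along the lines described above might look like this in a Mule build (a sketch, not copied from the XEmacs headers), together with hypothetical helpers showing that subtracting two positions yields a count in the same unit:

```c
/* Illustrative typedefs for a Mule build; see lisp.h and buffer.h
   for the real definitions. */
typedef int           Emchar;
typedef unsigned char Bufbyte;
typedef int           Bufpos;
typedef int           Charcount;
typedef int           Bytind;
typedef int           Bytecount;
typedef unsigned char Extbyte;
typedef int           Extcount;

/* Character position minus character position = character count. */
static Charcount
toy_chars_between (Bufpos start, Bufpos end)
{
  return end - start;
}

/* Byte position minus byte position = byte count.  Under Mule this
   can exceed the character count for the same span of text. */
static Bytecount
toy_bytes_between (Bytind start, Bytind end)
{
  return end - start;
}
```

Because all of these are plain @code{int}s, the compiler will not catch a @code{Bytind} passed where a @code{Bufpos} is expected; the distinct names exist for the human reader.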
2267
2268 @node Working With Character and Byte Positions
2269 @subsection Working With Character and Byte Positions
2270
2271 Now that we have defined the basic character-related types, we can look
at the macros and functions designed to work with them and to convert
between them.  Most of these macros are defined in
2274 @file{buffer.h}, and we don't discuss all of them here, but only the
2275 most important ones.  Examining the existing code is the best way to
2276 learn about them.
2277
2278 @table @code
2279 @item MAX_EMCHAR_LEN
2280 @cindex MAX_EMCHAR_LEN
This preprocessor constant is the maximum number of buffer bytes per
Emacs character, i.e. the largest number of @code{Bufbyte}s needed to
represent one @code{Emchar}.  It is useful when allocating temporary
strings that must hold a known number of characters.  For instance:
2285
2286 @example
2287 @group
2288 @{
2289   Charcount cclen;
2290   ...
2291   @{
2292     /* Allocate place for @var{cclen} characters. */
2293     Bufbyte *buf = (Bufbyte *)alloca (cclen * MAX_EMCHAR_LEN);
2294 ...
2295 @end group
2296 @end example
2297
2298 If you followed the previous section, you can guess that, logically,
2299 multiplying a @code{Charcount} value with @code{MAX_EMCHAR_LEN} produces
2300 a @code{Bytecount} value.
2301
2302 In the current Mule implementation, @code{MAX_EMCHAR_LEN} equals 4.
2303 Without Mule, it is 1.
2304
2305 @item charptr_emchar
2306 @itemx set_charptr_emchar
2307 @cindex charptr_emchar
2308 @cindex set_charptr_emchar
2309 The @code{charptr_emchar} macro takes a @code{Bufbyte} pointer and
2310 returns the @code{Emchar} stored at that position.  If it were a
2311 function, its prototype would be:
2312
2313 @example
2314 Emchar charptr_emchar (Bufbyte *p);
2315 @end example
2316
2317 @code{set_charptr_emchar} stores an @code{Emchar} to the specified byte
2318 position.  It returns the number of bytes stored:
2319
2320 @example
2321 Bytecount set_charptr_emchar (Bufbyte *p, Emchar c);
2322 @end example
2323
2324 It is important to note that @code{set_charptr_emchar} is safe only for
2325 appending a character at the end of a buffer, not for overwriting a
2326 character in the middle.  This is because the width of characters
2327 varies, and @code{set_charptr_emchar} cannot resize the string if it
2328 writes, say, a two-byte character where a single-byte character used to
2329 reside.
2330
A typical use of @code{set_charptr_emchar} is shown in this example,
which copies the characters from position @var{beg} up to @var{end} of
buffer @var{buf} into a temporary string of @code{Bufbyte}s, with
@code{p} pointing to where the next character will be stored.
2334
2335 @example
2336 @group
2337 @{
2338   Bufpos pos;
2339   for (pos = beg; pos < end; pos++)
2340     @{
2341       Emchar c = BUF_FETCH_CHAR (buf, pos);
      p += set_charptr_emchar (p, c);
2343     @}
2344 @}
2345 @end group
2346 @end example
2347
Note how @code{set_charptr_emchar} is used to store the @code{Emchar}
and advance the pointer @code{p} at the same time.
2350
2351 @item INC_CHARPTR
2352 @itemx DEC_CHARPTR
2353 @cindex INC_CHARPTR
2354 @cindex DEC_CHARPTR
2355 These two macros increment and decrement a @code{Bufbyte} pointer,
2356 respectively.  They will adjust the pointer by the appropriate number of
2357 bytes according to the byte length of the character stored there.  Both
2358 macros assume that the memory address is located at the beginning of a
2359 valid character.
2360
2361 Without Mule support, @code{INC_CHARPTR (p)} and @code{DEC_CHARPTR (p)}
2362 simply expand to @code{p++} and @code{p--}, respectively.
2363
2364 @item bytecount_to_charcount
2365 @cindex bytecount_to_charcount
2366 Given a pointer to a text string and a length in bytes, return the
2367 equivalent length in characters.
2368
2369 @example
2370 Charcount bytecount_to_charcount (Bufbyte *p, Bytecount bc);
2371 @end example
2372
2373 @item charcount_to_bytecount
2374 @cindex charcount_to_bytecount
2375 Given a pointer to a text string and a length in characters, return the
2376 equivalent length in bytes.
2377
2378 @example
2379 Bytecount charcount_to_bytecount (Bufbyte *p, Charcount cc);
2380 @end example
2381
2382 @item charptr_n_addr
2383 @cindex charptr_n_addr
2384 Return a pointer to the beginning of the character offset @var{cc} (in
2385 characters) from @var{p}.
2386
2387 @example
2388 Bufbyte *charptr_n_addr (Bufbyte *p, Charcount cc);
2389 @end example
2390 @end table
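To show how these operations fit together, here is a self-contained sketch with hypothetical @code{toy_} equivalents.  It uses UTF-8 purely as a stand-in variable-width encoding; the real Mule internal format is different, so only the shape of the operations carries over, not the byte values.  @code{TOY_DEC_CHARPTR} relies on UTF-8 continuation bytes being recognizable, which is an assumption of this sketch.

```c
typedef int Emchar;                    /* hypothetical stand-ins */
typedef unsigned char Bufbyte;
typedef int Bytecount;
typedef int Charcount;

/* Byte length of the character whose first byte is at P. */
static Bytecount
toy_rep_bytes (const Bufbyte *p)
{
  if (*p < 0x80)
    return 1;
  if ((*p & 0xE0) == 0xC0)
    return 2;
  if ((*p & 0xF0) == 0xE0)
    return 3;
  return 4;
}

/* Like set_charptr_emchar: store C (0 <= C <= 0x10FFFF) at P and
   return the number of bytes written. */
static Bytecount
toy_set_charptr_emchar (Bufbyte *p, Emchar c)
{
  if (c < 0x80)
    {
      p[0] = (Bufbyte) c;
      return 1;
    }
  if (c < 0x800)
    {
      p[0] = (Bufbyte) (0xC0 | (c >> 6));
      p[1] = (Bufbyte) (0x80 | (c & 0x3F));
      return 2;
    }
  if (c < 0x10000)
    {
      p[0] = (Bufbyte) (0xE0 | (c >> 12));
      p[1] = (Bufbyte) (0x80 | ((c >> 6) & 0x3F));
      p[2] = (Bufbyte) (0x80 | (c & 0x3F));
      return 3;
    }
  p[0] = (Bufbyte) (0xF0 | (c >> 18));
  p[1] = (Bufbyte) (0x80 | ((c >> 12) & 0x3F));
  p[2] = (Bufbyte) (0x80 | ((c >> 6) & 0x3F));
  p[3] = (Bufbyte) (0x80 | (c & 0x3F));
  return 4;
}

/* Like charptr_emchar: decode the character starting at P. */
static Emchar
toy_charptr_emchar (const Bufbyte *p)
{
  Bytecount len = toy_rep_bytes (p);
  Emchar c = (len == 1) ? p[0] : (p[0] & (0x7F >> len));
  Bytecount i;

  for (i = 1; i < len; i++)
    c = (c << 6) | (p[i] & 0x3F);
  return c;
}

/* Like INC_CHARPTR / DEC_CHARPTR: P must be at a character start. */
#define TOY_INC_CHARPTR(p) ((p) += toy_rep_bytes (p))
#define TOY_DEC_CHARPTR(p) do { (p)--; } while ((*(p) & 0xC0) == 0x80)

/* Like bytecount_to_charcount. */
static Charcount
toy_bytecount_to_charcount (const Bufbyte *p, Bytecount bc)
{
  const Bufbyte *end = p + bc;
  Charcount cc = 0;

  for (; p < end; cc++)
    TOY_INC_CHARPTR (p);
  return cc;
}

/* Like charcount_to_bytecount. */
static Bytecount
toy_charcount_to_bytecount (const Bufbyte *p, Charcount cc)
{
  const Bufbyte *start = p;

  while (cc-- > 0)
    TOY_INC_CHARPTR (p);
  return (Bytecount) (p - start);
}

/* Like charptr_n_addr. */
static const Bufbyte *
toy_charptr_n_addr (const Bufbyte *p, Charcount cc)
{
  return p + toy_charcount_to_bytecount (p, cc);
}
```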
2391
2392 @node Conversion to and from External Data
2393 @subsection Conversion to and from External Data
2394
When an external function, such as a C library function, returns a
@code{char} pointer, you should almost never treat it as @code{Bufbyte}.
This is because the returned string may contain 8-bit characters that
XEmacs can misinterpret, possibly causing a crash.  Likewise, when
exporting a piece of internal text to the outside world, you should
always convert it to an appropriate external encoding, lest the internal
representation (such as the infamous \201 characters) leak out.
2402
The interface for converting between the internal and external
representations of text consists of the numerous conversion macros
defined in @file{buffer.h}.  Before looking at them, we'll look at the
external formats supported by these macros.
2407
2408 Currently meaningful formats are @code{FORMAT_BINARY},
2409 @code{FORMAT_FILENAME}, @code{FORMAT_OS}, and @code{FORMAT_CTEXT}.  Here
2410 is a description of these.
2411
2412 @table @code
2413 @item FORMAT_BINARY
2414 Binary format.  This is the simplest format and is what we use in the
2415 absence of a more appropriate format.  This converts according to the
2416 @code{binary} coding system:
2417
2418 @enumerate a
2419 @item
2420 On input, bytes 0--255 are converted into characters 0--255.
2421 @item
2422 On output, characters 0--255 are converted into bytes 0--255 and other
2423 characters are converted into `X'.
2424 @end enumerate
2425
2426 @item FORMAT_FILENAME
2427 Format used for filenames.  In the original Mule, this is user-definable
2428 with the @code{pathname-coding-system} variable.  For the moment, we
2429 just use the @code{binary} coding system.
2430
2431 @item FORMAT_OS
2432 Format used for the external Unix environment---@code{argv[]}, stuff
2433 from @code{getenv()}, stuff from the @file{/etc/passwd} file, etc.
2434
This should perhaps be the same as @code{FORMAT_FILENAME}.
2436
2437 @item FORMAT_CTEXT
Compound Text format.  This is the standard X format used for data
2439 stored in properties, selections, and the like.  This is an 8-bit
2440 no-lock-shift ISO2022 coding system.
2441 @end table
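The @code{FORMAT_BINARY} rules above can be sketched on plain character codes.  The @code{toy_binary_*} functions are hypothetical and work on arrays of @code{Emchar} values, not on the internal @code{Bufbyte} representation that the real conversion macros handle.

```c
#include <stddef.h>

typedef int Emchar;                    /* hypothetical stand-ins */
typedef unsigned char Extbyte;

/* Input direction: every byte 0..255 becomes the character 0..255. */
static void
toy_binary_decode (const Extbyte *in, size_t len, Emchar *out)
{
  size_t i;

  for (i = 0; i < len; i++)
    out[i] = in[i];
}

/* Output direction: characters 0..255 map to the same byte; any
   other character becomes 'X', so the conversion is lossy. */
static void
toy_binary_encode (const Emchar *in, size_t len, Extbyte *out)
{
  size_t i;

  for (i = 0; i < len; i++)
    out[i] = (in[i] >= 0 && in[i] <= 255) ? (Extbyte) in[i] : (Extbyte) 'X';
}
```

Decoding is lossless, but a round trip through encoding is not: any character above 255 comes back as @samp{X}.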
2442
The macros that convert text between these external formats and the
internal format, in either direction, are described below.
2445
2446 @table @code
2447 @item GET_CHARPTR_INT_DATA_ALLOCA
2448 @itemx GET_CHARPTR_EXT_DATA_ALLOCA
These two are the most basic conversion macros.
@code{GET_CHARPTR_INT_DATA_ALLOCA} converts external data to internal
format, and @code{GET_CHARPTR_EXT_DATA_ALLOCA} converts the other way
around.  The arguments each of these receives are @var{ptr} (pointer to
the text to be converted), @var{len} (length of the text in bytes),
@var{fmt} (format of the external text), @var{ptr_out} (lvalue to which
the new text should be copied), and @var{len_out} (lvalue which will be
assigned the length of the converted text in bytes).  The resulting text
is stored in a stack-allocated buffer.  If the text doesn't need
changing, these macros will do nothing, except for setting
@var{len_out}.
2460
The macros above take many arguments, which makes them unwieldy.  For
this reason, a number of convenience macros are defined with obvious
functionality, but accepting fewer arguments.  The general rule is that
2464 macros with @samp{INT} in their name convert text to internal Emacs
2465 representation, whereas the @samp{EXT} macros convert to external
2466 representation.
2467
2468 @item GET_C_CHARPTR_INT_DATA_ALLOCA
2469 @itemx GET_C_CHARPTR_EXT_DATA_ALLOCA
2470 As their names imply, these macros work on C char pointers, which are
2471 zero-terminated, and thus do not need @var{len} or @var{len_out}
2472 parameters.
2473
2474 @item GET_STRING_EXT_DATA_ALLOCA
2475 @itemx GET_C_STRING_EXT_DATA_ALLOCA
2476 These two macros convert a Lisp string into an external representation.
2477 The difference between them is that @code{GET_STRING_EXT_DATA_ALLOCA}
2478 stores its output to a generic string, providing @var{len_out}, the
2479 length of the resulting external string.  On the other hand,
@code{GET_C_STRING_EXT_DATA_ALLOCA} assumes that the caller will be
satisfied with the output string being zero-terminated.
2482
2483 Note that for Lisp strings only one conversion direction makes sense.
2484
2485 @item GET_C_CHARPTR_EXT_BINARY_DATA_ALLOCA
2486 @itemx GET_CHARPTR_EXT_BINARY_DATA_ALLOCA
2487 @itemx GET_STRING_BINARY_DATA_ALLOCA
2488 @itemx GET_C_STRING_BINARY_DATA_ALLOCA
2489 @itemx GET_C_CHARPTR_EXT_FILENAME_DATA_ALLOCA
2490 @itemx ...
2491 These macros convert internal text to a specific external
2492 representation, with the external format being encoded into the name of
2493 the macro.  Note that the @code{GET_STRING_...} and
2494 @code{GET_C_STRING...}  macros lack the @samp{EXT} tag, because they
2495 only make sense in that direction.
2496
2497 @item GET_C_CHARPTR_INT_BINARY_DATA_ALLOCA
2498 @itemx GET_CHARPTR_INT_BINARY_DATA_ALLOCA
2499 @itemx GET_C_CHARPTR_INT_FILENAME_DATA_ALLOCA
2500 @itemx ...
2501 These macros convert external text of a specific format to its internal
representation, with the external format being encoded into the name of
2503 the macro.
2504 @end table
2505
2506 @node General Guidelines for Writing Mule-Aware Code
2507 @subsection General Guidelines for Writing Mule-Aware Code
2508
2509 This section contains some general guidance on how to write Mule-aware
2510 code, as well as some pitfalls you should avoid.
2511
2512 @table @emph
2513 @item Never use @code{char} and @code{char *}.
2514 In XEmacs, the use of @code{char} and @code{char *} is almost always a
2515 mistake.  If you want to manipulate an Emacs character from ``C'', use
2516 @code{Emchar}.  If you want to examine a specific octet in the internal
2517 format, use @code{Bufbyte}.  If you want a Lisp-visible character, use a
2518 @code{Lisp_Object} and @code{make_char}.  If you want a pointer to move
2519 through the internal text, use @code{Bufbyte *}.  Also note that you
2520 almost certainly do not need @code{Emchar *}.
2521
2522 @item Be careful not to confuse @code{Charcount}, @code{Bytecount}, and @code{Bufpos}.
2523 The whole point of using different types is to avoid confusion about the
2524 use of certain variables.  Lest this effect be nullified, you need to be
2525 careful about using the right types.
2526
2527 @item Always convert external data
2528 It is extremely important to always convert external data, because
XEmacs can crash if unexpected 8-bit sequences are copied to its internal
2530 buffers literally.
2531
2532 This means that when a system function, such as @code{readdir}, returns
2533 a string, you need to convert it using one of the conversion macros
described in the previous section, before passing it further to Lisp.
2535 In the case of @code{readdir}, you would use the
2536 @code{GET_C_CHARPTR_INT_FILENAME_DATA_ALLOCA} macro.
2537
2538 Also note that many internal functions, such as @code{make_string},
2539 accept Bufbytes, which removes the need for them to convert the data
2540 they receive.  This increases efficiency because that way external data
2541 needs to be decoded only once, when it is read.  After that, it is
2542 passed around in internal format.
2543 @end table
2544
2545 @node An Example of Mule-Aware Code
2546 @subsection An Example of Mule-Aware Code
2547
As an example of Mule-aware code, we will analyze the
@code{string} function, which conses up a Lisp string from the character
arguments it receives.  Here is the definition, pasted from
@file{alloc.c}:
2552
2553 @example
2554 @group
2555 DEFUN ("string", Fstring, 0, MANY, 0, /*
2556 Concatenate all the argument characters and make the result a string.
2557 */
2558        (int nargs, Lisp_Object *args))
2559 @{
2560   Bufbyte *storage = alloca_array (Bufbyte, nargs * MAX_EMCHAR_LEN);
2561   Bufbyte *p = storage;
2562
2563   for (; nargs; nargs--, args++)
2564     @{
2565       Lisp_Object lisp_char = *args;
2566       CHECK_CHAR_COERCE_INT (lisp_char);
2567       p += set_charptr_emchar (p, XCHAR (lisp_char));
2568     @}
2569   return make_string (storage, p - storage);
2570 @}
2571 @end group
2572 @end example
2573
2574 Now we can analyze the source line by line.
2575
The resulting string will contain as many characters as there are
arguments to the function.  This is why we allocate
@code{MAX_EMCHAR_LEN} * @var{nargs} bytes on the stack, i.e. the
worst-case number of bytes needed to fit @var{nargs} @code{Emchar}s in
the string.
2580
Then, the loop checks that each element is a character, converting
integers in the process.  Like many other functions in XEmacs, this
function silently accepts integers where characters are expected, for
historical and compatibility reasons.  Unless you know you need this
compatibility, @code{CHECK_CHAR} will suffice in your own code.
@code{XCHAR (lisp_char)} extracts the @code{Emchar} from the
@code{Lisp_Object}, and @code{set_charptr_emchar} stores it in
@code{storage}, advancing @code{p} by the number of bytes written.
Finally, @code{make_string} creates a Lisp string from the
@code{p - storage} bytes that were filled in.
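For experimenting outside XEmacs, the same allocate-worst-case-then-append pattern can be written as a self-contained C function.  This is a sketch under stated assumptions: @code{malloc} replaces @code{alloca}, the caller receives the bytes instead of a Lisp string from @code{make_string}, and UTF-8 stands in for the Mule internal encoding.  All @code{toy_} names are hypothetical.

```c
#include <stdlib.h>

typedef int Emchar;                    /* hypothetical stand-ins */
typedef unsigned char Bufbyte;
typedef int Bytecount;

/* Worst case for the UTF-8 stand-in below. */
#define TOY_MAX_EMCHAR_LEN 4

/* UTF-8 stand-in for set_charptr_emchar (not XEmacs's internal
   format): store C (0 <= C <= 0x10FFFF) at P, return bytes written. */
static Bytecount
toy_set_charptr_emchar (Bufbyte *p, Emchar c)
{
  if (c < 0x80)
    {
      p[0] = (Bufbyte) c;
      return 1;
    }
  if (c < 0x800)
    {
      p[0] = (Bufbyte) (0xC0 | (c >> 6));
      p[1] = (Bufbyte) (0x80 | (c & 0x3F));
      return 2;
    }
  if (c < 0x10000)
    {
      p[0] = (Bufbyte) (0xE0 | (c >> 12));
      p[1] = (Bufbyte) (0x80 | ((c >> 6) & 0x3F));
      p[2] = (Bufbyte) (0x80 | (c & 0x3F));
      return 3;
    }
  p[0] = (Bufbyte) (0xF0 | (c >> 18));
  p[1] = (Bufbyte) (0x80 | ((c >> 12) & 0x3F));
  p[2] = (Bufbyte) (0x80 | ((c >> 6) & 0x3F));
  p[3] = (Bufbyte) (0x80 | (c & 0x3F));
  return 4;
}

/* Same structure as Fstring: allocate the worst case for NARGS
   characters, append each one, and report the bytes actually used.
   The extra byte keeps the allocation valid when NARGS is 0.
   The caller must free the result. */
static Bufbyte *
toy_string_from_chars (const Emchar *chars, int nargs, Bytecount *len_out)
{
  Bufbyte *storage = malloc ((size_t) nargs * TOY_MAX_EMCHAR_LEN + 1);
  Bufbyte *p = storage;
  int i;

  if (storage == NULL)
    return NULL;
  for (i = 0; i < nargs; i++)
    p += toy_set_charptr_emchar (p, chars[i]);
  *len_out = (Bytecount) (p - storage);
  return storage;
}
```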
2589
2590 Other instructive examples of correct coding under Mule can be found all
2591 over the XEmacs code.  For starters, I recommend
2592 @code{Fnormalize_menu_item_name} in @file{menubar.c}.  After you have
2593 understood this section of the manual and studied the examples, you can
proceed to write new Mule-aware code.
2595
2596 @node Techniques for XEmacs Developers
2597 @section Techniques for XEmacs Developers
2598
2599 To make a quantified XEmacs, do: @code{make quantmacs}.
2600
2601 You simply can't dump Quantified and Purified images.  Run the image
2602 like so:  @code{quantmacs -batch -l loadup.el run-temacs @var{xemacs-args...}}.
2603
2604 Before you go through the trouble, are you compiling with all
debugging and error-checking off?  If not, try that first.  Be warned
2606 that while Quantify is directly responsible for quite a few
2607 optimizations which have been made to XEmacs, doing a run which
2608 generates results which can be acted upon is not necessarily a trivial
2609 task.
2610
Also, if you're still willing to do some runs, make sure you configure
2612 with the @samp{--quantify} flag.  That will keep Quantify from starting
2613 to record data until after the loadup is completed and will shut off
2614 recording right before it shuts down (which generates enough bogus data
2615 to throw most results off).  It also enables three additional elisp
2616 commands: @code{quantify-start-recording-data},
2617 @code{quantify-stop-recording-data} and @code{quantify-clear-data}.
2618
2619 If you want to make XEmacs faster, target your favorite slow benchmark,
2620 run a profiler like Quantify, @code{gprof}, or @code{tcov}, and figure
2621 out where the cycles are going.  Specific projects:
2622
2623 @itemize @bullet
2624 @item
2625 Make the garbage collector faster.  Figure out how to write an
2626 incremental garbage collector.
2627 @item
2628 Write a compiler that takes bytecode and spits out C code.
2629 Unfortunately, you will then need a C compiler and a more fully
2630 developed module system.
2631 @item
2632 Speed up redisplay.
2633 @item
2634 Speed up syntax highlighting.  Maybe moving some of the syntax
2635 highlighting capabilities into C would make a difference.
2636 @item
2637 Implement tail recursion in Emacs Lisp (hard!).
2638 @end itemize
2639
2640 Unfortunately, Emacs Lisp is slow, and is going to stay slow.  Function
calls in elisp are especially expensive.  Iterating over a long list can
easily be 30 times faster when implemented in C than in Elisp.
2643
2644 To get started debugging XEmacs, take a look at the @file{gdbinit} and
2645 @file{dbxrc} files in the @file{src} directory.
2646 @xref{Q2.1.15 - How to Debug an XEmacs problem with a debugger,,,
2647 xemacs-faq, XEmacs FAQ}.
2648
2649 After making source code changes, run @code{make check} to ensure that
2650 you haven't introduced any regressions.  If you're feeling ambitious,
2651 you can try to improve the test suite in @file{tests/automated}.
2652
2653 Here are things to know when you create a new source file:
2654
2655 @itemize @bullet
2656 @item
2657 All @file{.c} files should @code{#include <config.h>} first.  Almost all
2658 @file{.c} files should @code{#include "lisp.h"} second.
2659
2660 @item
Generated header files should be included using the @code{#include <...>} syntax,
not the @code{#include "..."} syntax.  The generated headers are:

@file{config.h}, @file{puresize-adjust.h}, @file{sheap-adjust.h},
@file{paths.h}, and @file{Emacs.ad.h}.

The basic rule is that you should assume builds using @code{--srcdir},
so the @code{#include <...>} syntax must be used whenever the generated
file to be included may be in a different directory from the including
file @emph{at compile time}.  The non-obvious C rule is that
@code{#include "..."} means to search for the included file first in the
same directory as the including file, @emph{not} in the current directory.
2672
2673 @item
2674 Header files should @emph{not} include @code{<config.h>} and
@code{"lisp.h"}.  It is the responsibility of the @file{.c} files that
use the header to do so.
2677
2678 @item
2679 If the header uses @code{INLINE}, either directly or through
2680 @code{DECLARE_LRECORD}, then it must be added to @file{inline.c}'s
2681 includes.
2682
2683 @item
Try building at least once with @code{gcc} and these @file{configure}
options:

@example
./configure --with-mule --with-union-type --error-checking=all
@end example
2689
2690 @item
2691 Did I mention that you should run the test suite?
2692 @example
2693 make check
2694 @end example
2695 @end itemize
2696
2697
2698 @node A Summary of the Various XEmacs Modules, Allocation of Objects in XEmacs Lisp, Rules When Writing New C Code, Top
2699 @chapter A Summary of the Various XEmacs Modules
2700
2701   This is accurate as of XEmacs 20.0.
2702
2703 @menu
2704 * Low-Level Modules::
2705 * Basic Lisp Modules::
2706 * Modules for Standard Editing Operations::
2707 * Editor-Level Control Flow Modules::
2708 * Modules for the Basic Displayable Lisp Objects::
2709 * Modules for other Display-Related Lisp Objects::
2710 * Modules for the Redisplay Mechanism::
2711 * Modules for Interfacing with the File System::
2712 * Modules for Other Aspects of the Lisp Interpreter and Object System::
2713 * Modules for Interfacing with the Operating System::
2714 * Modules for Interfacing with X Windows::
2715 * Modules for Internationalization::
2716 @end menu
2717
2718 @node Low-Level Modules
2719 @section Low-Level Modules
2720
2721 @example
2722 config.h
2723 @end example
2724
2725 This is automatically generated from @file{config.h.in} based on the
2726 results of configure tests and user-selected optional features and
2727 contains preprocessor definitions specifying the nature of the
2728 environment in which XEmacs is being compiled.
2729
2730
2731
2732 @example
2733 paths.h
2734 @end example
2735
2736 This is automatically generated from @file{paths.h.in} based on supplied
2737 configure values, and allows for non-standard installed configurations
2738 of the XEmacs directories.  It's currently broken, though.
2739
2740
2741
2742 @example
2743 emacs.c
2744 signal.c
2745 @end example
2746
2747 @file{emacs.c} contains @code{main()} and other code that performs the most
2748 basic environment initializations and handles shutting down the XEmacs
2749 process (this includes @code{kill-emacs}, the normal way that XEmacs is
2750 exited; @code{dump-emacs}, which is used during the build process to
2751 write out the XEmacs executable; @code{run-emacs-from-temacs}, which can
2752 be used to start XEmacs directly when temacs has finished loading all
2753 the Lisp code; and emergency code to handle crashes [XEmacs tries to
2754 auto-save all files before it crashes]).
2755
2756 Low-level code that directly interacts with the Unix signal mechanism,
2757 however, is in @file{signal.c}.  Note that this code does not handle system
2758 dependencies in interfacing to signals; that is handled using the
@file{syssignal.h} header file, described below.
2760
2761
2762
2763 @example
2764 unexaix.c
2765 unexalpha.c
2766 unexapollo.c
2767 unexconvex.c
2768 unexec.c
2769 unexelf.c
2770 unexelfsgi.c
2771 unexencap.c
2772 unexenix.c
2773 unexfreebsd.c
2774 unexfx2800.c
2775 unexhp9k3.c
2776 unexhp9k800.c
2777 unexmips.c
2778 unexnext.c
2779 unexsol2.c
2780 unexsunos4.c
2781 @end example
2782
These modules contain code for dumping out the XEmacs executable on
various systems.  (This process is highly machine-specific and requires
intimate knowledge of the executable format and the memory map of the
process.)  Only one of these modules is actually used; @file{configure}
chooses which one.
2788
2789
2790
2791 @example
2792 crt0.c
2793 lastfile.c
2794 pre-crt0.c
2795 @end example
2796
2797 These modules are used in conjunction with the dump mechanism.  On some
2798 systems, an alternative version of the C startup code (the actual code
2799 that receives control from the operating system when the process is
2800 started, and which calls @code{main()}) is required so that the dumping
2801 process works properly; @file{crt0.c} provides this.
2802
@file{pre-crt0.c} and @file{lastfile.c} should be the very first and
very last files linked, respectively.  (Actually, this is not quite true.
2805 @file{lastfile.c} should be after all Emacs modules whose initialized
2806 data should be made constant, and before all other Emacs files and all
2807 libraries.  In particular, the allocation modules @file{gmalloc.c},
2808 @file{alloca.c}, etc. are normally placed past @file{lastfile.c}, and
2809 all of the files that implement Xt widget classes @emph{must} be placed
2810 after @file{lastfile.c} because they contain various structures that
2811 must be statically initialized and into which Xt writes at various
2812 times.) @file{pre-crt0.c} and @file{lastfile.c} contain exported symbols
2813 that are used to determine the start and end of XEmacs' initialized
2814 data space when dumping.
2815
2816
2817
2818 @example
2819 alloca.c
2820 free-hook.c
2821 getpagesize.h
2822 gmalloc.c
2823 malloc.c
2824 mem-limits.h
2825 ralloc.c
2826 vm-limit.c
2827 @end example
2828
2829 These handle basic C allocation of memory.  @file{alloca.c} is an emulation of
2830 the stack allocation function @code{alloca()} on machines that lack
2831 this. (XEmacs makes extensive use of @code{alloca()} in its code.)
2832
2833 @file{gmalloc.c} and @file{malloc.c} are two implementations of the standard C
2834 functions @code{malloc()}, @code{realloc()} and @code{free()}.  They are
2835 often used in place of the standard system-provided @code{malloc()}
2836 because they usually provide a much faster implementation, at the
2837 expense of additional memory use.  @file{gmalloc.c} is a newer implementation
2838 that is much more memory-efficient for large allocations than @file{malloc.c},
2839 and should always be preferred if it works. (At one point, @file{gmalloc.c}
2840 didn't work on some systems where @file{malloc.c} worked; but this should be
2841 fixed now.)
2842
2843 @cindex relocating allocator
2844 @file{ralloc.c} is the @dfn{relocating allocator}.  It provides
2845 functions similar to @code{malloc()}, @code{realloc()} and @code{free()}
2846 that allocate memory that can be dynamically relocated in memory.  The
2847 advantage of this is that allocated memory can be shuffled around to
2848 place all the free memory at the end of the heap, and the heap can then
2849 be shrunk, releasing the memory back to the operating system.  The use
2850 of this can be controlled with the configure option @code{--rel-alloc};
2851 if enabled, memory allocated for buffers will be relocatable, so that if
2852 a very large file is visited and the buffer is later killed, the memory
2853 can be released to the operating system.  (The disadvantage of this
2854 mechanism is that it can be very slow.  On systems with the
2855 @code{mmap()} system call, the XEmacs version of @file{ralloc.c} uses
2856 this to move memory around without actually having to block-copy it,
2857 which can speed things up; but it can still cause noticeable performance
2858 degradation.)
2859
2860 @file{free-hook.c} contains some debugging functions for checking for invalid
2861 arguments to @code{free()}.
2862
2863 @file{vm-limit.c} contains some functions that warn the user when memory is
2864 getting low.  These are callback functions that are called by @file{gmalloc.c}
2865 and @file{malloc.c} at appropriate times.
2866
2867 @file{getpagesize.h} provides a uniform interface for retrieving the size of a
2868 page in virtual memory.  @file{mem-limits.h} provides a uniform interface for
2869 retrieving the total amount of available virtual memory.  Both are
2870 similar in spirit to the @file{sys*.h} files described in section J, below.
2871
2872
2873
2874 @example
2875 blocktype.c
2876 blocktype.h
2877 dynarr.c
2878 @end example
2879
2880 These implement a couple of basic C data types to facilitate memory
2881 allocation.  The @code{Blocktype} type efficiently manages the
2882 allocation of fixed-size blocks by minimizing the number of times that
2883 @code{malloc()} and @code{free()} are called.  It allocates memory in
2884 large chunks, subdivides the chunks into blocks of the proper size, and
2885 returns the blocks as requested.  When blocks are freed, they are placed
2886 onto a linked list, so they can be efficiently reused.  This data type
2887 is not much used in XEmacs currently, because it's a fairly new
2888 addition.
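
The strategy just described can be illustrated with the following
minimal sketch.  It is not the interface declared in
@file{blocktype.h}: the names @code{block_type}, @code{block_type_alloc}
and @code{block_type_free} are hypothetical.  Blocks are carved out of
large chunks obtained from @code{malloc()}, and freed blocks are
threaded onto a free list through their own storage.

@example
#include <stdlib.h>

/* Hypothetical fixed-size block allocator illustrating the Blocktype idea. */
struct free_block @{ struct free_block *next; @};

struct block_type
@{
  size_t elsize;                 /* size of each block, rounded up */
  size_t per_chunk;              /* blocks carved out of each chunk */
  struct free_block *free_list;  /* freed blocks, ready for reuse */
  char *chunk;                   /* chunk currently being subdivided */
  size_t chunk_used;             /* blocks already handed out from CHUNK */
@};

static void
block_type_init (struct block_type *bt, size_t elsize, size_t per_chunk)
@{
  /* Each block must be able to hold a free-list link, and rounding to a
     multiple of the link size keeps the links aligned (a simplification;
     real objects may need stricter alignment). */
  size_t align = sizeof (struct free_block);
  if (elsize < align)
    elsize = align;
  bt->elsize = (elsize + align - 1) / align * align;
  bt->per_chunk = per_chunk;
  bt->free_list = NULL;
  bt->chunk = NULL;
  bt->chunk_used = per_chunk;    /* forces a new chunk on first allocation */
@}

static void *
block_type_alloc (struct block_type *bt)
@{
  if (bt->free_list)
    @{
      /* Reuse a previously freed block; no call to malloc(). */
      struct free_block *b = bt->free_list;
      bt->free_list = b->next;
      return b;
    @}
  if (bt->chunk_used == bt->per_chunk)
    @{
      /* Current chunk exhausted: one malloc() supplies PER_CHUNK blocks.
         Chunks are never returned to the system in this sketch. */
      bt->chunk = malloc (bt->elsize * bt->per_chunk);
      if (!bt->chunk)
        return NULL;
      bt->chunk_used = 0;
    @}
  return bt->chunk + bt->elsize * bt->chunk_used++;
@}

static void
block_type_free (struct block_type *bt, void *p)
@{
  /* Push the block onto the free list; no call to free(). */
  struct free_block *b = p;
  b->next = bt->free_list;
  bt->free_list = b;
@}
@end example

Only one @code{malloc()} call is made per @code{per_chunk} allocations,
and a freed block is handed out again by the next allocation.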
2889
2890 @cindex dynamic array
2891 The @code{Dynarr} type implements a @dfn{dynamic array}, which is
2892 similar to a standard C array but has no fixed limit on the number of
2893 elements it can contain.  Dynamic arrays can hold elements of any type,
2894 and when you add a new element, the array automatically resizes itself
2895 if it isn't big enough.  Dynarrs are extensively used in the redisplay
2896 mechanism.
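
A dynamic array of this kind can be sketched as follows.  This is an
illustration, not the @file{dynarr.c} interface.  The names
@code{dynarr}, @code{dynarr_add}, @code{dynarr_at} and
@code{dynarr_free} are hypothetical, and each array stores elements of a
size fixed when it is created.  Doubling the capacity whenever the array
fills up keeps the amortized cost of each addition constant.

@example
#include <stdlib.h>
#include <string.h>

/* Hypothetical dynamic array holding elements of ELSIZE bytes each. */
struct dynarr
@{
  size_t elsize;    /* size of one element */
  size_t len;       /* number of elements currently stored */
  size_t max;       /* number of elements that fit without resizing */
  char *base;       /* storage for MAX elements */
@};

static void
dynarr_init (struct dynarr *d, size_t elsize)
@{
  d->elsize = elsize;
  d->len = d->max = 0;
  d->base = NULL;
@}

/* Append one element, growing the storage if it is full.
   Returns 0 on success, -1 if memory is exhausted. */
static int
dynarr_add (struct dynarr *d, const void *el)
@{
  if (d->len == d->max)
    @{
      size_t newmax = d->max ? 2 * d->max : 8;
      char *newbase = realloc (d->base, newmax * d->elsize);
      if (!newbase)
        return -1;
      d->base = newbase;
      d->max = newmax;
    @}
  memcpy (d->base + d->len * d->elsize, el, d->elsize);
  d->len++;
  return 0;
@}

/* Address of element I (no bounds checking in this sketch). */
static void *
dynarr_at (struct dynarr *d, size_t i)
@{
  return d->base + i * d->elsize;
@}

static void
dynarr_free (struct dynarr *d)
@{
  free (d->base);
  dynarr_init (d, d->elsize);
@}
@end example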
2897
2898
2899
2900 @example
2901 inline.c
2902 @end example
2903
2904 This module is used in connection with inline functions (available in
2905 some compilers).  Often, inline functions need to have a corresponding
2906 non-inline function that does the same thing.  This module is where they
2907 reside.  It contains no actual code, but defines some special flags that
2908 cause inline functions defined in header files to be rendered as actual
2909 functions.  It then includes all header files that contain any inline
2910 function definitions, so that each one gets a real function equivalent.
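
The trick can be sketched in a single file.  The flag
@code{EMIT_INLINE_FUNCTIONS} and the macro @code{HEADER_INLINE} are
hypothetical names; the real flags and macros may work differently.  The
sketch assumes C99 inline semantics in ordinary modules, where plain
@code{inline} yields an inline definition without an out-of-line copy.
The module playing the role of @file{inline.c} defines the flag before
including the header, so the same header text becomes one ordinary
external definition that the other modules can call.

@example
/* What inline.c does first: set the flag before including any headers.
   (EMIT_INLINE_FUNCTIONS is a hypothetical name.) */
#define EMIT_INLINE_FUNCTIONS

/* ==== Text that would normally live in a header such as "example.h" ==== */
#ifdef EMIT_INLINE_FUNCTIONS
  /* Emit an ordinary external definition of each function below. */
# define HEADER_INLINE
#else
  /* C99: an inline definition only; calls that are not inlined refer
     to the external definition emitted by inline.c. */
# define HEADER_INLINE inline
#endif

HEADER_INLINE int
square (int x)
@{
  return x * x;
@}
/* ==== end of header text ==== */
@end example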
2911
2912
2913
2914 @example
2915 debug.c
2916 debug.h
2917 @end example
2918
These modules provide a system for doing internal consistency checks
2920 during code development.  This system is not currently used; instead the
2921 simpler @code{assert()} macro is used along with the various checks
2922 provided by the @samp{--error-check-*} configuration options.
2923
2924
2925
2926 @example
2927 prefix-args.c
2928 @end example
2929
2930 This is actually the source for a small, self-contained program
2931 used during building.
2932
2933
2934 @example
2935 universe.h
2936 @end example
2937
2938 This is not currently used.
2939
2940
2941
2942 @node Basic Lisp Modules
2943 @section Basic Lisp Modules
2944
2945 @example
2946 emacsfns.h
2947 lisp-disunion.h
2948 lisp-union.h
2949 lisp.h
2950 lrecord.h
2951 symsinit.h
2952 @end example
2953
2954 These are the basic header files for all XEmacs modules.  Each module
2955 includes @file{lisp.h}, which brings the other header files in.
2956 @file{lisp.h} contains the definitions of the structures and extractor
2957 and constructor macros for the basic Lisp objects and various other
2958 basic definitions for the Lisp environment, as well as some
2959 general-purpose definitions (e.g. @code{min()} and @code{max()}).
2960 @file{lisp.h} includes either @file{lisp-disunion.h} or
2961 @file{lisp-union.h}, depending on whether @code{USE_UNION_TYPE} is
2962 defined.  These files define the typedef of the Lisp object itself (as
2963 described above) and the low-level macros that hide the actual
2964 implementation of the Lisp object.  All extractor and constructor macros
2965 for particular types of Lisp objects are defined in terms of these
2966 low-level macros.
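
The following sketch illustrates how a typedef plus a few low-level
macros can hide whether a Lisp object is a bare machine word or a C
union, so that extractors and constructors built on top never need to
know which.  The names @code{XTYPE}, @code{XINT} and @code{make_int} are
modeled on XEmacs usage.  The two-bit tag layout and the helpers
@code{LISP_WORD} and @code{make_obj} are assumptions, not the layout
actually used by @file{lisp-disunion.h} or @file{lisp-union.h}.

@example
#include <stdint.h>

/* Assumed layout: low two bits hold a type tag, the rest the value. */
#define TAG_BITS 2
#define TAG_MASK ((uintptr_t) ((1 << TAG_BITS) - 1))
enum lisp_tag @{ TAG_INT = 0, TAG_RECORD = 1 @};

#ifdef USE_UNION_TYPE
/* lisp-union.h style: a union, so the compiler rejects accidental
   mixing of Lisp objects and plain integers. */
typedef union @{ uintptr_t word; @} Lisp_Object;
# define LISP_WORD(obj) ((obj).word)
static Lisp_Object
make_obj (uintptr_t w)
@{
  Lisp_Object o;
  o.word = w;
  return o;
@}
#else
/* lisp-disunion.h style: a bare machine word. */
typedef uintptr_t Lisp_Object;
# define LISP_WORD(obj) (obj)
static Lisp_Object
make_obj (uintptr_t w)
@{
  return w;
@}
#endif

/* Everything below uses only LISP_WORD and make_obj, so it is the same
   for both representations. */
#define XTYPE(obj) ((enum lisp_tag) (LISP_WORD (obj) & TAG_MASK))
/* Assumes >> on a negative value is an arithmetic shift. */
#define XINT(obj)  ((intptr_t) LISP_WORD (obj) >> TAG_BITS)

static Lisp_Object
make_int (intptr_t n)
@{
  return make_obj (((uintptr_t) n << TAG_BITS) | TAG_INT);
@}
@end example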
2967
2968 As a general rule, all typedefs should go into the typedefs section of
2969 @file{lisp.h} rather than into a module-specific header file even if the
2970 structure is defined elsewhere.  This allows function prototypes that
2971 use the typedef to be placed into other header files.  Forward structure
2972 declarations (i.e. a simple declaration like @code{struct foo;} where
2973 the structure itself is defined elsewhere) should be placed into the
2974 typedefs section as necessary.
2975
2976 @file{lrecord.h} contains the basic structures and macros that implement
2977 all record-type Lisp objects -- i.e. all objects whose type is a field
2978 in their C structure, which includes all objects except the few most
2979 basic ones.
2980
2981 @file{lisp.h} contains prototypes for most of the exported functions in
2982 the various modules.  Lisp primitives defined using @code{DEFUN} that
2983 need to be called by C code should be declared using @code{EXFUN}.
2984 Other function prototypes should be placed either into the appropriate
section of @file{lisp.h} or into a module-specific header file,
depending on how general-purpose the function is and whether it has
special-purpose argument types requiring definitions not in
@file{lisp.h}.  All initialization functions are prototyped in
2989 @file{symsinit.h}.
2990
2991
2992
2993 @example
2994 alloc.c
2995 pure.c
2996 puresize.h
2997 @end example
2998
2999 The large module @file{alloc.c} implements all of the basic allocation and
3000 garbage collection for Lisp objects.  The most commonly used Lisp
3001 objects are allocated in chunks, similar to the Blocktype data type
3002 described above; others are allocated in individually @code{malloc()}ed
3003 blocks.  This module provides the foundation on which all other aspects
3004 of the Lisp environment sit, and is the first module initialized at
3005 startup.
3006
3007 Note that @file{alloc.c} provides a series of generic functions that are
3008 not dependent on any particular object type, and interfaces to
3009 particular types of objects using a standardized interface of
3010 type-specific methods.  This scheme is a fundamental principle of
3011 object-oriented programming and is heavily used throughout XEmacs.  The
3012 great advantage of this is that it allows for a clean separation of
3013 functionality into different modules -- new classes of Lisp objects, new
3014 event interfaces, new device types, new stream interfaces, etc. can be
3015 added transparently without affecting code anywhere else in XEmacs.
3016 Because the different subsystems are divided into general and specific
3017 code, adding a new subtype within a subsystem will in general not
3018 require changes to the generic subsystem code or affect any of the other
3019 subtypes in the subsystem; this provides a great deal of robustness to
3020 the XEmacs code.
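
The following sketch illustrates generic code calling through a table of
type-specific methods.  It is only loosely inspired by the way record
types describe themselves to @file{alloc.c} (see @file{lrecord.h}).  The
structure @code{object_methods}, its fields, and the @code{point} type
are hypothetical.  Adding a new type means writing a new method table;
@code{object_size} and @code{object_print} need no changes.

@example
#include <stddef.h>
#include <stdio.h>

/* Hypothetical table of type-specific methods. */
struct object_methods
@{
  const char *name;
  size_t (*size) (const void *obj);
  void (*print) (const void *obj, char *buf, size_t len);
@};

/* Every object starts with a pointer to its type's methods. */
struct object_header
@{
  const struct object_methods *methods;
@};

/* Generic code: knows nothing about any concrete type. */
static size_t
object_size (const struct object_header *obj)
@{
  return obj->methods->size (obj);
@}

static void
object_print (const struct object_header *obj, char *buf, size_t len)
@{
  obj->methods->print (obj, buf, len);
@}

/* One concrete type, implemented entirely through its methods. */
struct point
@{
  struct object_header header;   /* must be the first member */
  int x, y;
@};

static size_t
point_size (const void *obj)
@{
  (void) obj;
  return sizeof (struct point);
@}

static void
point_print (const void *obj, char *buf, size_t len)
@{
  const struct point *p = obj;   /* valid because HEADER comes first */
  snprintf (buf, len, "#<point %d %d>", p->x, p->y);
@}

static const struct object_methods point_methods =
  @{ "point", point_size, point_print @};
@end example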
3021
3022 @cindex pure space
3023 @file{pure.c} contains the declaration of the @dfn{purespace} array.
3024 Pure space is a hack used to place some constant Lisp data into the code
3025 segment of the XEmacs executable, even though the data needs to be
3026 initialized through function calls.  (See above in section VIII for more
info about this.)  During startup, certain sorts of data are
3028 automatically copied into pure space, and other data is copied manually
3029 in some of the basic Lisp files by calling the function @code{purecopy},
3030 which copies the object if possible (this only works in temacs, of
3031 course) and returns the new object.  In particular, while temacs is
3032 executing, the Lisp reader automatically copies all compiled-function
3033 objects that it reads into pure space.  Since compiled-function objects
3034 are large, are never modified, and typically comprise the majority of
3035 the contents of a compiled-Lisp file, this works well.  While XEmacs is
3036 running, any attempt to modify an object that resides in pure space
3037 causes an error.  Objects in pure space are never garbage collected --
3038 almost all of the time, they're intended to be permanent, and in any
3039 case you can't write into pure space to set the mark bits.
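
The idea behind @code{purecopy} can be sketched in simplified form by
copying C strings rather than Lisp objects.  The array size, the
@code{dumped} flag, and the functions @code{purecopy_string} and
@code{pure_p} are hypothetical.  When a copy cannot be made, the
original object is returned unchanged.  The address range check is what
would allow other code to refuse to modify pure objects.

@example
#include <stdint.h>
#include <string.h>

#define PURESIZE 64                /* hypothetical; cf. puresize.h */
static char purespace[PURESIZE];   /* cf. the array declared in pure.c */
static size_t pure_bytes_used;
static int dumped;                 /* hypothetical: nonzero once dumped */

/* Copy S into pure space if possible; otherwise return S itself. */
static const char *
purecopy_string (const char *s)
@{
  size_t n = strlen (s) + 1;
  char *dst;

  if (dumped || n > PURESIZE - pure_bytes_used)
    return s;
  dst = purespace + pure_bytes_used;
  memcpy (dst, s, n);
  pure_bytes_used += n;
  return dst;
@}

/* Nonzero if P points into pure space; such data must never be modified. */
static int
pure_p (const void *p)
@{
  uintptr_t a = (uintptr_t) p;
  return a >= (uintptr_t) purespace
         && a < (uintptr_t) (purespace + PURESIZE);
@}
@end example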
3040
3041 @file{puresize.h} contains the declaration of the size of the pure space
3042 array.  This depends on the optional features that are compiled in, any
3043 extra purespace requested by the user at compile time, and certain other
3044 factors (e.g. 64-bit machines need more pure space because their Lisp
3045 objects are larger).  The smallest size that suffices should be used, so
3046 that there's no wasted space.  If there's not enough pure space, you
3047 will get an error during the build process, specifying how much more
3048 pure space is needed.
3049
3050
3051
3052 @example
3053 eval.c
3054 backtrace.h
3055 @end example
3056
3057 This module contains all of the functions to handle the flow of control.
3058 This includes the mechanisms of defining functions, calling functions,
3059 traversing stack frames, and binding variables; the control primitives
3060 and other special forms such as @code{while}, @code{if}, @code{eval},
3061 @code{let}, @code{and}, @code{or}, @code{progn}, etc.; handling of
3062 non-local exits, unwind-protects, and exception handlers; entering the
3063 debugger; methods for the subr Lisp object type; etc.  It does
3064 @emph{not} include the @code{read} function, the @code{print} function,
3065 or the handling of symbols and obarrays.
3066
3067 @file{backtrace.h} contains some structures related to stack frames and the
3068 flow of control.
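
Non-local exits such as @code{catch} and @code{throw} are commonly
implemented in C with @code{setjmp()} and @code{longjmp()}.  The sketch
below shows that general technique under simplifying assumptions.  Tags
and values are plain integers rather than Lisp objects, and there is no
unwind-protect handling.  The names @code{catch_frame},
@code{internal_catch_sketch} and @code{throw_to} are hypothetical rather
than the definitions in @file{eval.c}.

@example
#include <setjmp.h>
#include <stddef.h>

/* One active catch, kept on a stack of frames (hypothetical layout). */
struct catch_frame
@{
  int tag;
  jmp_buf jmp;
  struct catch_frame *next;
@};

static struct catch_frame *catch_list;  /* innermost active catch first */
static int thrown_value;                /* global, so longjmp cannot clobber it */

/* Transfer control to the innermost catch for TAG.  Returns -1 only if
   no such catch exists; a real implementation would signal an error. */
static int
throw_to (int tag, int value)
@{
  struct catch_frame *c;
  for (c = catch_list; c; c = c->next)
    if (c->tag == tag)
      @{
        catch_list = c->next;   /* pop C and every frame inside it */
        thrown_value = value;
        longjmp (c->jmp, 1);
      @}
  return -1;
@}

/* Run BODY; return its value, or the value thrown to TAG within it. */
static int
internal_catch_sketch (int tag, int (*body) (void))
@{
  struct catch_frame c;
  int result;

  c.tag = tag;
  c.next = catch_list;
  catch_list = &c;
  if (setjmp (c.jmp) != 0)
    return thrown_value;        /* arrived here via longjmp from throw_to */
  result = body ();
  catch_list = c.next;          /* normal exit: pop our own frame */
  return result;
@}

/* Example bodies for trying it out. */
static int body_returns (void) @{ return 5; @}
static int body_throws (void) @{ throw_to (1, 42); return 0; @}
@end example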
3069
3070
3071
3072 @example
3073 lread.c
3074 @end example
3075
3076 This module implements the Lisp reader and the @code{read} function,
3077 which converts text into Lisp objects, according to the read syntax of
3078 the objects, as described above.  This is similar to the parser that is
3079 a part of all compilers.
3080
3081
3082
3083 @example
3084 print.c
3085 @end example
3086
3087 This module implements the Lisp print mechanism and the @code{print}
3088 function and related functions.  This is the inverse of the Lisp reader
3089 -- it converts Lisp objects to a printed, textual representation.
3090 (Hopefully something that can be read back in using @code{read} to get
3091 an equivalent object.)
3092
3093
3094
3095 @example
3096 general.c
3097 symbols.c
3098 symeval.h
3099 @end example
3100
3101 @file{symbols.c} implements the handling of symbols, obarrays, and
3102 retrieving the values of symbols.  Much of the code is devoted to
3103 handling the special @dfn{symbol-value-magic} objects that define
3104 special types of variables -- this includes buffer-local variables,
3105 variable aliases, variables that forward into C variables, etc.  This
3106 module is initialized extremely early (right after @file{alloc.c}),
3107 because it is here that the basic symbols @code{t} and @code{nil} are
3108 created, and those symbols are used everywhere throughout XEmacs.
3109
3110 @file{symeval.h} contains the definitions of symbol structures and the
3111 @code{DEFVAR_LISP()} and related macros for declaring variables.
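
The idea of a symbol whose value forwards into a C variable can be
sketched as follows.  This is a hypothetical simplification.  Values are
C @code{int}s rather than Lisp objects, and the names
@code{value_kind}, @code{symbol_value_sketch} and
@code{set_symbol_value_sketch} do not correspond to the actual
symbol-value-magic structures.  The point is that all access goes
through one accessor, which either reads the symbol's own slot or
follows a pointer into C.

@example
/* Hypothetical, heavily simplified symbol representation. */
enum value_kind
@{
  PLAIN_VALUE,        /* value stored directly in the symbol */
  FORWARD_TO_C_INT    /* value lives in a C variable (cf. DEFVAR_INT) */
@};

struct symbol
@{
  const char *name;
  enum value_kind kind;
  union
  @{
    int plain;        /* used when KIND is PLAIN_VALUE */
    int *forward;     /* used when KIND is FORWARD_TO_C_INT */
  @} v;
@};

static int
symbol_value_sketch (const struct symbol *sym)
@{
  return sym->kind == FORWARD_TO_C_INT ? *sym->v.forward : sym->v.plain;
@}

static void
set_symbol_value_sketch (struct symbol *sym, int val)
@{
  if (sym->kind == FORWARD_TO_C_INT)
    *sym->v.forward = val;    /* C code sees the change immediately */
  else
    sym->v.plain = val;
@}

/* A C variable that Lisp code can read and set through a symbol. */
static int fill_column_c = 70;
@end example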
3112
3113
3114
3115 @example
3116 data.c
3117 floatfns.c
3118 fns.c
3119 @end example
3120
3121 These modules implement the methods and standard Lisp primitives for all
3122 the basic Lisp object types other than symbols (which are described
3123 above).  @file{data.c} contains all the predicates (primitives that return
3124 whether an object is of a particular type); the integer arithmetic
3125 functions; and the basic accessor and mutator primitives for the various
object types.  @file{fns.c} contains all the standard primitives for working
with sequences (where, abstractly speaking, a sequence is an ordered set
of objects, and can be represented by a list, string, vector, or
bit-vector); it also contains @code{equal}, perhaps on the grounds that
the bulk of the operation of @code{equal} is comparing sequences.
3131 @file{floatfns.c} contains methods and primitives for floats and floating-point
3132 arithmetic.
3133
3134
3135
3136 @example
3137 bytecode.c
3138 bytecode.h
3139 @end example
3140
3141 @file{bytecode.c} implements the byte-code interpreter and
3142 compiled-function objects, and @file{bytecode.h} contains associated
3143 structures.  Note that the byte-code @emph{compiler} is written in Lisp.
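
For readers unfamiliar with the general technique, here is a toy
stack-based byte-code interpreter.  The opcodes are invented for this
sketch and do not match the byte-code set used by XEmacs.  It shows only
the fetch, decode and execute loop that such an interpreter is built
around.

@example
/* Invented opcodes for this sketch only. */
enum opcode
@{
  OP_CONST,   /* push the next byte as a small integer */
  OP_ADD,     /* pop two values, push their sum */
  OP_MUL,     /* pop two values, push their product */
  OP_DUP,     /* duplicate the top of stack */
  OP_RETURN   /* return the top of stack */
@};

/* Execute CODE and return its result.  No stack-overflow or
   malformed-code checks are done in this sketch. */
static long
run_bytecode (const unsigned char *code)
@{
  long stack[64];
  long *sp = stack;          /* points one past the top element */
  const unsigned char *pc = code;

  for (;;)
    @{
      switch (*pc++)
        @{
        case OP_CONST:
          *sp++ = *pc++;
          break;
        case OP_ADD:
          sp--;
          sp[-1] += sp[0];
          break;
        case OP_MUL:
          sp--;
          sp[-1] *= sp[0];
          break;
        case OP_DUP:
          *sp = sp[-1];
          sp++;
          break;
        case OP_RETURN:
          return sp[-1];
        default:
          return -1;         /* unknown opcode */
        @}
    @}
@}
@end example

For example, the sequence @code{OP_CONST 2, OP_CONST 3, OP_ADD, OP_DUP,
OP_MUL, OP_RETURN} computes (2 + 3) * (2 + 3).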
3144
3145
3146
3147
3148 @node Modules for Standard Editing Operations
3149 @section Modules for Standard Editing Operations
3150
3151 @example
3152 buffer.c
3153 buffer.h
3154 bufslots.h
3155 @end example
3156
3157 @file{buffer.c} implements the @dfn{buffer} Lisp object type.  This
3158 includes functions that create and destroy buffers; retrieve buffers by
3159 name or by other properties; manipulate lists of buffers (remember that
3160 buffers are permanent objects and stored in various ordered lists);
3161 retrieve or change buffer properties; etc.  It also contains the
3162 definitions of all the built-in buffer-local variables (which can be
3163 viewed as buffer properties).  It does @emph{not} contain code to
3164 manipulate buffer-local variables (that's in @file{symbols.c}, described
3165 above); or code to manipulate the text in a buffer.
3166
3167 @file{buffer.h} defines the structures associated with a buffer and the various
3168 macros for retrieving text from a buffer and special buffer positions
3169 (e.g. @code{point}, the default location for text insertion).  It also
3170 contains macros for working with buffer positions and converting between
3171 their representations as character offsets and as byte offsets (under
3172 MULE, they are different, because characters can be multi-byte).  It is
3173 one of the largest header files.
3174
3175 @file{bufslots.h} defines the fields in the buffer structure that correspond to
3176 the built-in buffer-local variables.  It is its own header file because
3177 it is included many times in @file{buffer.c}, as a way of iterating over all
3178 the built-in buffer-local variables.
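
The technique of expanding the same list several times, each time with a
different definition of the per-slot macro, can be sketched in one file.
A single code block cannot include a second file, so the list of slots
is given here as a macro, @code{BUFFER_SLOTS}, standing in for the
contents of @file{bufslots.h}.  The slot names, the @code{const char *}
slot type, and the helper functions are hypothetical, and
@code{MARKED_SLOT} is used only as an illustrative macro name.

@example
#include <stddef.h>
#include <string.h>

/* Stand-in for the contents of bufslots.h: one entry per built-in
   buffer-local variable (hypothetical names). */
#define BUFFER_SLOTS            \
  MARKED_SLOT (name)            \
  MARKED_SLOT (directory)       \
  MARKED_SLOT (major_mode)

/* Expansion 1: one structure field per slot. */
struct buffer
@{
#define MARKED_SLOT(x) const char *x;
  BUFFER_SLOTS
#undef MARKED_SLOT
@};

/* Expansion 2: a table of slot names. */
static const char *const buffer_slot_names[] =
@{
#define MARKED_SLOT(x) #x,
  BUFFER_SLOTS
#undef MARKED_SLOT
@};

#define NUM_BUFFER_SLOTS \
  (sizeof buffer_slot_names / sizeof buffer_slot_names[0])

/* Expansion 3: give every slot of B the same initial value. */
static void
reset_buffer_slots (struct buffer *b, const char *value)
@{
#define MARKED_SLOT(x) b->x = value;
  BUFFER_SLOTS
#undef MARKED_SLOT
@}

/* Look up a slot's index by name; -1 if there is no such slot. */
static int
buffer_slot_index (const char *slot)
@{
  size_t i;
  for (i = 0; i < NUM_BUFFER_SLOTS; i++)
    if (strcmp (buffer_slot_names[i], slot) == 0)
      return (int) i;
  return -1;
@}
@end example

Adding a slot means adding one line to the list; every expansion picks
it up automatically.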
3179
3180
3181
3182 @example
3183 insdel.c
3184 insdel.h
3185 @end example
3186
3187 @file{insdel.c} contains low-level functions for inserting and deleting text in
3188 a buffer, keeping track of changed regions for use by redisplay, and
3189 calling any before-change and after-change functions that may have been
3190 registered for the buffer.  It also contains the actual functions that
3191 convert between byte offsets and character offsets.
3192
3193 @file{insdel.h} contains associated headers.
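
Converting between character offsets and byte offsets means scanning
text in which a character can occupy more than one byte.  The sketch
below uses UTF-8 only as a convenient variable-width encoding.  MULE's
internal representation is different, and a practical implementation
would also want to avoid rescanning from the start of the text each
time.  The function names are hypothetical.

@example
#include <stddef.h>

/* Nonzero if byte C starts a character (not a UTF-8 continuation byte). */
static int
char_start_p (unsigned char c)
@{
  return (c & 0xC0) != 0x80;
@}

/* Byte offset of character number CHARPOS (0-based) in TEXT of LEN
   bytes.  Returns LEN if CHARPOS is at or beyond the end. */
static size_t
charpos_to_bytepos (const unsigned char *text, size_t len, size_t charpos)
@{
  size_t byte;
  for (byte = 0; byte < len; byte++)
    if (char_start_p (text[byte]))
      @{
        if (charpos == 0)
          return byte;
        charpos--;
      @}
  return len;
@}

/* Number of characters that start before byte offset BYTEPOS. */
static size_t
bytepos_to_charpos (const unsigned char *text, size_t bytepos)
@{
  size_t chars = 0, i;
  for (i = 0; i < bytepos; i++)
    if (char_start_p (text[i]))
      chars++;
  return chars;
@}
@end example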
3194
3195
3196
3197 @example
3198 marker.c
3199 @end example
3200
3201 This module implements the @dfn{marker} Lisp object type, which
3202 conceptually is a pointer to a text position in a buffer that moves
3203 around as text is inserted and deleted, so as to remain in the same
3204 relative position.  This module doesn't actually move the markers around
3205 -- that's handled in @file{insdel.c}.  This module just creates them and
3206 implements the primitives for working with them.  As markers are simple
3207 objects, this does not entail much.
3208
3209 Note that the standard arithmetic primitives (e.g. @code{+}) accept
3210 markers in place of integers and automatically substitute the value of
3211 @code{marker-position} for the marker, i.e. an integer describing the
3212 current buffer position of the marker.
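
The adjustment that @file{insdel.c} performs on markers can be sketched
as follows.  This is a simplified illustration with hypothetical names,
using a singly linked chain of markers and plain integer positions.  It
assumes the usual Emacs convention that a marker sitting exactly at an
insertion point stays before the inserted text unless it has been set to
advance.  The field @code{advances} stands in for that setting.

@example
#include <stddef.h>

/* Hypothetical marker: a position plus a link to the next marker. */
struct marker
@{
  long pos;
  int advances;           /* nonzero: move past text inserted at POS */
  struct marker *next;
@};

/* LENGTH characters were inserted at position POS. */
static void
adjust_markers_for_insert (struct marker *chain, long pos, long length)
@{
  struct marker *m;
  for (m = chain; m; m = m->next)
    if (m->pos > pos || (m->pos == pos && m->advances))
      m->pos += length;
@}

/* The text between positions FROM and TO was deleted. */
static void
adjust_markers_for_delete (struct marker *chain, long from, long to)
@{
  struct marker *m;
  for (m = chain; m; m = m->next)
    @{
      if (m->pos >= to)
        m->pos -= to - from;     /* after the deletion: shift back */
      else if (m->pos > from)
        m->pos = from;           /* inside the deletion: collapse to FROM */
    @}
@}
@end example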
3213
3214
3215
3216 @example
3217 extents.c
3218 extents.h
3219 @end example
3220
3221 This module implements the @dfn{extent} Lisp object type, which is like
3222 a marker that works over a range of text rather than a single position.
3223 Extents are also much more complex and powerful than markers and have a
3224 more efficient (and more algorithmically complex) implementation.  The
3225 implementation is described in detail in comments in @file{extents.c}.
3226
3227 The code in @file{extents.c} works closely with @file{insdel.c} so that
3228 extents are properly moved around as text is inserted and deleted.
3229 There is also code in @file{extents.c} that provides information needed
3230 by the redisplay mechanism for efficient operation. (Remember that
3231 extents can have display properties that affect [sometimes drastically,
3232 as in the @code{invisible} property] the display of the text they
3233 cover.)
3234
3235
3236
3237 @example
3238 editfns.c
3239 @end example
3240
3241 @file{editfns.c} contains the standard Lisp primitives for working with
3242 a buffer's text, and calls the low-level functions in @file{insdel.c}.
3243 It also contains primitives for working with @code{point} (the default
3244 buffer insertion location).
3245
3246 @file{editfns.c} also contains functions for retrieving various
3247 characteristics from the external environment: the current time, the
3248 process ID of the running XEmacs process, the name of the user who ran
3249 this XEmacs process, etc.  It's not clear why this code is in
3250 @file{editfns.c}.
3251
3252
3253
3254 @example
3255 callint.c
3256 cmds.c
3257 commands.h
3258 @end example
3259
3260 @cindex interactive
3261 These modules implement the basic @dfn{interactive} commands,
3262 i.e. user-callable functions.  Commands, as opposed to other functions,
3263 have special ways of getting their parameters interactively (by querying
3264 the user), as opposed to having them passed in a normal function
3265 invocation.  Many commands are not really meant to be called from other
3266 Lisp functions, because they modify global state in a way that's often
3267 undesired as part of other Lisp functions.
3268
3269 @file{callint.c} implements the mechanism for querying the user for
3270 parameters and calling interactive commands.  The bulk of this module is
3271 code that parses the interactive spec that is supplied with an
3272 interactive command.
3273
3274 @file{cmds.c} implements the basic, most commonly used editing commands:
3275 commands to move around the current buffer and insert and delete
3276 characters.  These commands are implemented using the Lisp primitives
3277 defined in @file{editfns.c}.
3278
3279 @file{commands.h} contains associated structure definitions and prototypes.
3280
3281
3282
3283 @example
3284 regex.c
3285 regex.h
3286 search.c
3287 @end example
3288
3289 @file{search.c} implements the Lisp primitives for searching for text in
3290 a buffer, and some of the low-level algorithms for doing this.  In
3291 particular, the fast fixed-string Boyer-Moore search algorithm is
3292 implemented in @file{search.c}.  The low-level algorithms for doing
3293 regular-expression searching, however, are implemented in @file{regex.c}
3294 and @file{regex.h}.  These two modules are largely independent of
3295 XEmacs, and are similar to (and based upon) the regular-expression
3296 routines used in @file{grep} and other GNU utilities.
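
To show why a fixed-string search can skip ahead instead of examining
every position, here is a sketch of the Boyer-Moore-Horspool algorithm.
This is a simplified variant of Boyer-Moore that uses only the
bad-character rule.  It is not the code in @file{search.c}, which also
has additional concerns such as case-insensitive matching.  The function
names are hypothetical.

@example
#include <limits.h>
#include <string.h>

/* Offset of the first occurrence of PAT (M bytes) in TEXT (N bytes),
   or -1 if there is none. */
static long
horspool_search (const unsigned char *text, size_t n,
                 const unsigned char *pat, size_t m)
@{
  size_t shift[UCHAR_MAX + 1];
  size_t i, pos = 0;

  if (m == 0)
    return 0;
  if (m > n)
    return -1;

  /* Bad-character table: how far the window may move when its last
     byte is C, based on the last occurrence of C in PAT (excluding
     PAT's final byte). */
  for (i = 0; i <= UCHAR_MAX; i++)
    shift[i] = m;
  for (i = 0; i + 1 < m; i++)
    shift[pat[i]] = m - 1 - i;

  while (pos + m <= n)
    @{
      /* Compare right to left, as Boyer-Moore does. */
      size_t j = m;
      while (j > 0 && text[pos + j - 1] == pat[j - 1])
        j--;
      if (j == 0)
        return (long) pos;
      pos += shift[text[pos + m - 1]];
    @}
  return -1;
@}

/* Convenience wrapper for NUL-terminated strings. */
static long
search_cstring (const char *text, const char *pat)
@{
  return horspool_search ((const unsigned char *) text, strlen (text),
                          (const unsigned char *) pat, strlen (pat));
@}
@end example

Longer patterns tend to allow longer skips, which is why this family of
algorithms performs well on typical fixed-string searches.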
3297
3298
3299
3300 @example
3301 doprnt.c
3302 @end example
3303
@file{doprnt.c} implements formatted-string processing, similar to the
@code{printf()} function in C.
3306
3307
3308
3309 @example
3310 undo.c
3311 @end example
3312
3313 This module implements the undo mechanism for tracking buffer changes.
3314 Most of this could be implemented in Lisp.
3315
3316
3317
3318 @node Editor-Level Control Flow Modules
3319 @section Editor-Level Control Flow Modules
3320
3321 @example
3322 event-Xt.c
3323 event-stream.c
3324 event-tty.c
3325 events.c
3326 events.h
3327 @end example
3328
3329 These implement the handling of events (user input and other system
3330 notifications).
3331
3332 @file{events.c} and @file{events.h} define the @dfn{event} Lisp object
3333 type and primitives for manipulating it.
3334
3335 @file{event-stream.c} implements the basic functions for working with
3336 event queues, dispatching an event by looking it up in relevant keymaps
3337 and such, and handling timeouts; this includes the primitives
3338 @code{next-event} and @code{dispatch-event}, as well as related
3339 primitives such as @code{sit-for}, @code{sleep-for}, and
3340 @code{accept-process-output}. (@file{event-stream.c} is one of the
3341 hairiest and trickiest modules in XEmacs.  Beware!  You can easily mess
3342 things up here.)
3343
3344 @file{event-Xt.c} and @file{event-tty.c} implement the low-level
3345 interfaces onto retrieving events from Xt (the X toolkit) and from TTY's
3346 (using @code{read()} and @code{select()}), respectively.  The event
3347 interface enforces a clean separation between the specific code for
3348 interfacing with the operating system and the generic code for working
3349 with events, by defining an API of basic, low-level event methods;
3350 @file{event-Xt.c} and @file{event-tty.c} are two different
3351 implementations of this API.  To add support for a new operating system
3352 (e.g. NeXTstep), one merely needs to provide another implementation of
3353 those API functions.
3354
3355 Note that the choice of whether to use @file{event-Xt.c} or
3356 @file{event-tty.c} is made at compile time!  Or at the very latest, it
3357 is made at startup time.  @file{event-Xt.c} handles events for
3358 @emph{both} X and TTY frames; @file{event-tty.c} is only used when X
3359 support is not compiled into XEmacs.  The reason for this is that there
3360 is only one event loop in XEmacs: thus, it needs to be able to receive
3361 events from all different kinds of frames.
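
The separation between generic event code and an interchangeable
low-level implementation can be sketched as a table of function pointers
chosen once at startup.  The structure @code{event_stream_sketch}, its
two methods, and the canned-input implementation are hypothetical.  The
real interface implemented by @file{event-Xt.c} and @file{event-tty.c}
has more methods than this.

@example
#include <stddef.h>

/* Hypothetical low-level event API; events are plain ints here. */
struct event_stream_sketch
@{
  int (*event_pending_p) (void);
  int (*next_event) (void);
@};

/* The implementation in use, chosen once at startup. */
static const struct event_stream_sketch *event_stream;

/* One implementation: replays a fixed sequence of keystrokes. */
static const int canned_events[] = @{ 'h', 'i' @};
static size_t canned_index;

static int
canned_event_pending_p (void)
@{
  return canned_index < sizeof canned_events / sizeof canned_events[0];
@}

static int
canned_next_event (void)
@{
  return canned_events[canned_index++];
@}

static const struct event_stream_sketch canned_event_stream =
  @{ canned_event_pending_p, canned_next_event @};

static void
init_event_stream (void)
@{
  /* A real build would pick the X or TTY implementation here. */
  event_stream = &canned_event_stream;
@}

/* Generic code: works with whichever implementation was installed. */
static int
read_all_events (int *buf, size_t max)
@{
  int count = 0;
  while ((size_t) count < max && event_stream->event_pending_p ())
    buf[count++] = event_stream->next_event ();
  return count;
@}
@end example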
3362
3363
3364
3365 @example
3366 keymap.c
3367 keymap.h
3368 @end example
3369
3370 @file{keymap.c} and @file{keymap.h} define the @dfn{keymap} Lisp object
3371 type and associated methods and primitives. (Remember that keymaps are
3372 objects that associate event descriptions with functions to be called to
3373 ``execute'' those events; @code{dispatch-event} looks up events in the
3374 relevant keymaps.)
3375
3376
3377
3378 @example
3379 keyboard.c
3380 @end example
3381
3382 @file{keyboard.c} contains functions that implement the actual editor
3383 command loop -- i.e. the event loop that cyclically retrieves and
3384 dispatches events.  This code is also rather tricky, just like
3385 @file{event-stream.c}.
3386
3387
3388
3389 @example
3390 macros.c
3391 macros.h
3392 @end example
3393
3394 These two modules contain the basic code for defining keyboard macros.
3395 These functions don't actually do much; most of the code that handles keyboard
3396 macros is mixed in with the event-handling code in @file{event-stream.c}.
3397
3398
3399
3400 @example
3401 minibuf.c
3402 @end example
3403
3404 This contains some miscellaneous code related to the minibuffer (most of
3405 the minibuffer code was moved into Lisp by Richard Mlynarik).  This
3406 includes the primitives for completion (although filename completion is
3407 in @file{dired.c}), the lowest-level interface to the minibuffer (if the
3408 command loop were cleaned up, this too could be in Lisp), and code for
3409 dealing with the echo area (this, too, was mostly moved into Lisp, and
3410 the only code remaining is code to call out to Lisp or provide simple
3411 bootstrapping implementations early in temacs, before the echo-area Lisp
3412 code is loaded).
3413
3414
3415
3416 @node Modules for the Basic Displayable Lisp Objects
3417 @section Modules for the Basic Displayable Lisp Objects
3418
3419 @example
3420 device-ns.h
3421 device-stream.c
3422 device-stream.h
3423 device-tty.c
3424 device-tty.h
3425 device-x.c
3426 device-x.h
3427 device.c
3428 device.h
3429 @end example
3430
3431 These modules implement the @dfn{device} Lisp object type.  This
3432 abstracts a particular screen or connection on which frames are
3433 displayed.  As with Lisp objects, event interfaces, and other
3434 subsystems, the device code is separated into a generic component that
3435 contains a standardized interface (in the form of a set of methods) onto
3436 particular device types.
3437
3438 The device subsystem defines all the methods and provides method
3439 services for not only device operations but also for the frame, window,
3440 menubar, scrollbar, toolbar, and other displayable-object subsystems.
3441 The reason for this is that all of these subsystems have the same
3442 subtypes (X, TTY, NeXTstep, Microsoft Windows, etc.) as devices do.
3443
3444
3445
3446 @example
3447 frame-ns.h
3448 frame-tty.c
3449 frame-x.c
3450 frame-x.h
3451 frame.c
3452 frame.h
3453 @end example
3454
3455 Each device contains one or more frames in which objects (e.g. text) are
3456 displayed.  A frame corresponds to a window in the window system;
3457 usually this is a top-level window but it could potentially be one of a
3458 number of overlapping child windows within a top-level window, using the
3459 MDI (Multiple Document Interface) protocol in Microsoft Windows or a
3460 similar scheme.
3461
3462 The @file{frame-*} files implement the @dfn{frame} Lisp object type and
3463 provide the generic and device-type-specific operations on frames
3464 (e.g. raising, lowering, resizing, moving, etc.).
3465
3466
3467
3468 @example
3469 window.c
3470 window.h
3471 @end example
3472
3473 @cindex window (in Emacs)
3474 @cindex pane
3475 Each frame consists of one or more non-overlapping @dfn{windows} (better
3476 known as @dfn{panes} in standard window-system terminology) in which a
3477 buffer's text can be displayed.  Windows can also have scrollbars
3478 displayed around their edges.
3479
3480 @file{window.c} and @file{window.h} implement the @dfn{window} Lisp
3481 object type and provide code to manage windows.  Since windows have no
3482 associated resources in the window system (the window system knows only
3483 about the frame; no child windows or anything are used for XEmacs
3484 windows), there is no device-type-specific code here; all of that code
3485 is part of the redisplay mechanism or the code for particular object
3486 types such as scrollbars.
3487
3488
3489
3490 @node Modules for other Display-Related Lisp Objects
3491 @section Modules for other Display-Related Lisp Objects
3492
3493 @example
3494 faces.c
3495 faces.h
3496 @end example
3497
3498
3499
3500 @example
3501 bitmaps.h
3502 glyphs-ns.h
3503 glyphs-x.c
3504 glyphs-x.h
3505 glyphs.c
3506 glyphs.h
3507 @end example
3508
3509
3510
3511 @example
3512 objects-ns.h
3513 objects-tty.c
3514 objects-tty.h
3515 objects-x.c
3516 objects-x.h
3517 objects.c
3518 objects.h
3519 @end example
3520
3521
3522
3523 @example
3524 menubar-x.c
3525 menubar.c
3526 @end example
3527
3528
3529
3530 @example
3531 scrollbar-x.c
3532 scrollbar-x.h
3533 scrollbar.c
3534 scrollbar.h
3535 @end example
3536
3537
3538
3539 @example
3540 toolbar-x.c
3541 toolbar.c
3542 toolbar.h
3543 @end example
3544
3545
3546
3547 @example
3548 font-lock.c
3549 @end example
3550
3551 This file provides C support for syntax highlighting -- i.e.
3552 highlighting different syntactic constructs of a source file in
3553 different colors, for easy reading.  The C support is provided so that
3554 this is fast.
3555
3556
3557
3558 @example
3559 dgif_lib.c
3560 gif_err.c
3561 gif_lib.h
3562 gifalloc.c
3563 @end example
3564
3565 These modules decode GIF-format image files, for use with glyphs.
3566
3567
3568
3569 @node Modules for the Redisplay Mechanism
3570 @section Modules for the Redisplay Mechanism
3571
3572 @example
3573 redisplay-output.c
3574 redisplay-tty.c
3575 redisplay-x.c
3576 redisplay.c
3577 redisplay.h
3578 @end example
3579
3580 These files provide the redisplay mechanism.  As with many other
3581 subsystems in XEmacs, there is a clean separation between the general
3582 and device-specific support.
3583
3584 @file{redisplay.c} contains the bulk of the redisplay engine.  These
3585 functions update the redisplay structures (which describe how the screen
3586 is to appear) to reflect any changes made to the state of any
3587 displayable objects (buffer, frame, window, etc.) since the last time
3588 that redisplay was called.  These functions are highly optimized to
3589 avoid doing more work than necessary (since redisplay is called
3590 extremely often and is potentially a huge time sink), and depend heavily
3591 on notifications from the objects themselves that changes have occurred,
3592 so that redisplay doesn't explicitly have to check each possible object.
3593 The redisplay mechanism also contains a great deal of caching to further
3594 speed things up; some of this caching is contained within the various
3595 displayable objects.
3596
3597 @file{redisplay-output.c} goes through the redisplay structures and converts
3598 them into calls to device-specific methods to actually output the screen
3599 changes.
3600
3601 @file{redisplay-x.c} and @file{redisplay-tty.c} are two implementations
3602 of these redisplay output methods, for X frames and TTY frames,
3603 respectively.
3604
3605
3606
3607 @example
3608 indent.c
3609 @end example
3610
3611 This module contains various functions and Lisp primitives for
3612 converting between buffer positions and screen positions.  These
3613 functions call the redisplay mechanism to do most of the work, and then
3614 examine the redisplay structures to get the necessary information.  This
3615 module needs work.
3616
3617
3618
3619 @example
3620 termcap.c
3621 terminfo.c
3622 tparam.c
3623 @end example
3624
3625 These files contain functions for working with the termcap (BSD-style)
3626 and terminfo (System V style) databases of terminal capabilities and
3627 escape sequences, used when XEmacs is displaying in a TTY.
3628
3629
3630
3631 @example
3632 cm.c
3633 cm.h
3634 @end example
3635
3636 These files provide some miscellaneous TTY-output functions and should
3637 probably be merged into @file{redisplay-tty.c}.
3638
3639
3640
3641 @node Modules for Interfacing with the File System
3642 @section Modules for Interfacing with the File System
3643
3644 @example
3645 lstream.c
3646 lstream.h
3647 @end example
3648
3649 These modules implement the @dfn{stream} Lisp object type.  This is an
3650 internal-only Lisp object that implements a generic buffering stream.
3651 The idea is to provide a uniform interface onto all sources and sinks of
3652 data, including file descriptors, stdio streams, chunks of memory, Lisp
3653 buffers, Lisp strings, etc.  That way, I/O functions can be written to
3654 the stream interface and can transparently handle all possible sources
3655 and sinks.  (For example, the @code{read} function can read data from a
3656 file, a string, a buffer, or even a function that is called repeatedly
3657 to return data, without worrying about where the data is coming from or
3658 what-size chunks it is returned in.)
3659
3660 @cindex lstream
3661 Note that in the C code, streams are called @dfn{lstreams} (for ``Lisp
3662 streams'') to distinguish them from other kinds of streams, e.g. stdio
3663 streams and C++ I/O streams.
3664
3665 Similar to other subsystems in XEmacs, lstreams are separated into
3666 generic functions and a set of methods for the different types of
3667 lstreams.  @file{lstream.c} provides implementations of many different
3668 types of streams; others are provided, e.g., in @file{mule-coding.c}.
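
The method-table idea can be illustrated with a minimal sketch.  It is
hypothetical and much simpler than @file{lstream.h}: the names
@code{struct stream}, @code{stream_read}, @code{init_memory_stream} and
@code{count_bytes} are invented for this example, only a memory-backed
reader is shown, and buffering, writing, and Lisp integration are
omitted.  The point is that @code{count_bytes} works for any stream
type and does not care what size chunks each read returns.

@verbatim
```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of a generic stream driven by a method table,
   in the spirit of (but not identical to) lstreams. */
struct stream;

struct stream_methods
{
  const char *name;
  /* Read up to SIZE bytes into DATA; return the count, 0 at end. */
  size_t (*reader) (struct stream *s, unsigned char *data, size_t size);
};

struct stream
{
  const struct stream_methods *methods;
  /* Type-specific state; only a memory source is implemented here. */
  const unsigned char *mem;
  size_t mem_len;
  size_t mem_pos;
};

/* Generic entry point: the caller never knows what S reads from. */
size_t
stream_read (struct stream *s, unsigned char *data, size_t size)
{
  return s->methods->reader (s, data, size);
}

/* Reader method for a stream over a fixed block of memory. */
static size_t
memory_reader (struct stream *s, unsigned char *data, size_t size)
{
  size_t left = s->mem_len - s->mem_pos;
  size_t n = size < left ? size : left;
  memcpy (data, s->mem + s->mem_pos, n);
  s->mem_pos += n;
  return n;
}

static const struct stream_methods memory_methods =
  { "memory", memory_reader };

void
init_memory_stream (struct stream *s, const void *mem, size_t len)
{
  s->methods = &memory_methods;
  s->mem = mem;
  s->mem_len = len;
  s->mem_pos = 0;
}

/* A generic consumer that reads in small chunks; it would work
   unchanged for a file, string, or buffer stream. */
size_t
count_bytes (struct stream *s)
{
  unsigned char buf[4];
  size_t total = 0, n;
  while ((n = stream_read (s, buf, sizeof buf)) > 0)
    total += n;
  return total;
}
```
@end verbatim

Adding a new source or sink in this scheme means writing one more
methods table and initializer; none of the consumers change.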
3669
3670
3671
3672 @example
3673 fileio.c
3674 @end example
3675
3676 This implements the basic primitives for interfacing with the file
3677 system.  This includes primitives for reading files into buffers,
3678 writing buffers into files, checking for the presence or accessibility
3679 of files, canonicalizing file names, etc.  Note that these primitives
3680 are usually not invoked directly by the user: There is a great deal of
3681 higher-level Lisp code that implements the user commands such as
3682 @code{find-file} and @code{save-buffer}.  This is similar to the
3683 distinction between the lower-level primitives in @file{editfns.c} and
3684 the higher-level user commands in @file{commands.c} and
3685 @file{simple.el}.
3686
3687
3688
3689 @example
3690 filelock.c
3691 @end example
3692
3693 This file provides functions for detecting clashes between different
3694 processes (e.g. XEmacs and some external process, or two different
3695 XEmacs processes) modifying the same file.  (XEmacs can optionally use
3696 the @file{lock/} subdirectory to provide a form of ``locking'' between
3697 different XEmacs processes.)  This module is also used by the low-level
3698 functions in @file{insdel.c} to ensure that, if the first modification
3699 is being made to a buffer whose corresponding file has been externally
3700 modified, the user is made aware of this so that the buffer can be
3701 synched up with the external changes if necessary.
3702
3703
3704 @example
3705 filemode.c
3706 @end example
3707
3708 This file provides some miscellaneous functions that construct a
3709 @samp{rwxr-xr-x}-type permissions string (as might appear in an
3710 @file{ls}-style directory listing) given the information returned by the
3711 @code{stat()} system call.
3712
3713
3714
3715 @example
3716 dired.c
3717 ndir.h
3718 @end example
3719
3720 These files implement the XEmacs interface to directory searching.  This
3721 includes a number of primitives for determining the files in a directory
3722 and for doing filename completion. (Remember that generic completion is
3723 handled by a different mechanism, in @file{minibuf.c}.)
3724
@file{ndir.h} is a header file used for the directory-searching
emulation functions provided in @file{sysdep.c}
(@pxref{Modules for Interfacing with the Operating System}),
for systems that don't provide any directory-searching functions.  (On
those systems, directories can be read directly as files and parsed.)
3729
3730
3731
3732 @example
3733 realpath.c
3734 @end example
3735
3736 This file provides an implementation of the @code{realpath()} function
3737 for expanding symbolic links, on systems that don't implement it or have
3738 a broken implementation.
3739
3740
3741
3742 @node Modules for Other Aspects of the Lisp Interpreter and Object System
3743 @section Modules for Other Aspects of the Lisp Interpreter and Object System
3744
3745 @example
3746 elhash.c
3747 elhash.h
3748 hash.c
3749 hash.h
3750 @end example
3751
These files provide two implementations of hash tables.  Files
@file{hash.c} and @file{hash.h} provide a generic C implementation of
hash tables that can stand independently of XEmacs.  Files
@file{elhash.c} and @file{elhash.h} provide a separate implementation of
hash tables that can store only Lisp objects, knows about Lispy things
like garbage collection, and implements the @dfn{hash-table} Lisp
object type.
3759
3760
3761 @example
3762 specifier.c
3763 specifier.h
3764 @end example
3765
This module implements the @dfn{specifier} Lisp object type.  This is
primarily used for displayable properties, and allows for values that
are specific to a particular buffer, window, frame, device, or device
class, as well as a global default value.  This is used, for example,
3770 to control the height of the horizontal scrollbar or the appearance of
3771 the @code{default}, @code{bold}, or other faces.  The specifier object
3772 consists of a number of specifications, each of which maps from a
3773 buffer, window, etc. to a value.  The function @code{specifier-instance}
3774 looks up a value given a window (from which a buffer, frame, and device
3775 can be derived).
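
A much-simplified sketch of the lookup idea: each specification maps a
locale to a value, and instancing in a window tries progressively less
specific locales until one matches.  The structures, the names, the
integer values, and the fixed window, buffer, frame, device, global
search order are assumptions made for this illustration; the real
@code{specifier-instance} also deals with tag sets, instantiators, and
fallbacks.

@verbatim
```c
#include <stddef.h>

/* Hypothetical, much-simplified model of specifier instancing. */
struct device { int id; };
struct frame  { struct device *device; };
struct buffer { int id; };
struct window { struct buffer *buffer; struct frame *frame; };

enum locale_type { LOC_WINDOW, LOC_BUFFER, LOC_FRAME, LOC_DEVICE,
                   LOC_GLOBAL };

/* One specification: a locale (NULL for global) mapped to a value. */
struct specification
{
  enum locale_type type;
  const void *locale;
  int value;
};

struct specifier
{
  const struct specification *specs;
  size_t nspecs;
};

/* Look for a specification of kind TYPE for LOCALE; return 1 and
   store its value in *OUT if found, else return 0. */
int
find_spec (const struct specifier *sp, enum locale_type type,
           const void *locale, int *out)
{
  size_t i;
  for (i = 0; i < sp->nspecs; i++)
    if (sp->specs[i].type == type && sp->specs[i].locale == locale)
      {
        *out = sp->specs[i].value;
        return 1;
      }
  return 0;
}

/* Instance SP in window W, most specific locale first.  The buffer,
   frame, and device are all derived from the window. */
int
specifier_instance_sketch (const struct specifier *sp,
                           const struct window *w, int *out)
{
  return find_spec (sp, LOC_WINDOW, w, out)
    || find_spec (sp, LOC_BUFFER, w->buffer, out)
    || find_spec (sp, LOC_FRAME, w->frame, out)
    || find_spec (sp, LOC_DEVICE, w->frame->device, out)
    || find_spec (sp, LOC_GLOBAL, NULL, out);
}
```
@end verbatim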
3776
3777
3778 @example
3779 chartab.c
3780 chartab.h
3781 casetab.c
3782 @end example
3783
3784 @file{chartab.c} and @file{chartab.h} implement the @dfn{char table}
3785 Lisp object type, which maps from characters or certain sorts of
3786 character ranges to Lisp objects.  The implementation of this object
3787 type is optimized for the internal representation of characters.  Char
3788 tables come in different types, which affect the allowed object types to
3789 which a character can be mapped and also dictate certain other
3790 properties of the char table.
3791
3792 @cindex case table
3793 @file{casetab.c} implements one sort of char table, the @dfn{case
3794 table}, which maps characters to other characters of possibly different
3795 case.  These are used by XEmacs to implement case-changing primitives
3796 and to do case-insensitive searching.
3797
3798
3799
3800 @example
3801 syntax.c
3802 syntax.h
3803 @end example
3804
3805 @cindex scanner
3806 This module implements @dfn{syntax tables}, another sort of char table
3807 that maps characters into syntax classes that define the syntax of these
3808 characters (e.g. a parenthesis belongs to a class of @samp{open}
3809 characters that have corresponding @samp{close} characters and can be
3810 nested).  This module also implements the Lisp @dfn{scanner}, a set of
3811 primitives for scanning over text based on syntax tables.  This is used,
3812 for example, to find the matching parenthesis in a command such as
3813 @code{forward-sexp}, and by @file{font-lock.c} to locate quoted strings,
3814 comments, etc.
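
Both ideas can be shown in miniature: a syntax table as a 256-entry
array mapping bytes to classes, and a scanner that skips one balanced
parenthesized expression, roughly what @code{forward-sexp} does over a
list.  The names, the small set of classes, and the ASCII-only table
are assumptions for this sketch; real syntax tables are char tables,
and the real scanner also handles strings, comments, escapes, and
prefix characters.

@verbatim
```c
#include <stddef.h>

/* Hypothetical subset of syntax classes. */
enum syntax_class { SC_WHITESPACE, SC_WORD, SC_PUNCT, SC_OPEN, SC_CLOSE };

/* ASCII-only syntax table: one class per byte value. */
struct syntax_table
{
  enum syntax_class cls[256];
};

/* Fill T with a small "standard" table (assumes ASCII letters and
   digits are contiguous). */
void
init_standard_syntax (struct syntax_table *t)
{
  int c;
  for (c = 0; c < 256; c++)
    t->cls[c] = SC_PUNCT;
  for (c = '0'; c <= '9'; c++)
    t->cls[c] = SC_WORD;
  for (c = 'a'; c <= 'z'; c++)
    t->cls[c] = SC_WORD;
  for (c = 'A'; c <= 'Z'; c++)
    t->cls[c] = SC_WORD;
  t->cls[' '] = t->cls['\t'] = t->cls['\n'] = SC_WHITESPACE;
  t->cls['('] = t->cls['['] = SC_OPEN;
  t->cls[')'] = t->cls[']'] = SC_CLOSE;
}

/* Starting at TEXT[FROM], which must be an open delimiter, return the
   index just past its matching close delimiter, or (size_t) -1 if the
   text ends first.  It only counts nesting depth; it does not check
   that "(" is closed by ")" rather than "]". */
size_t
scan_balanced (const struct syntax_table *t, const char *text,
               size_t len, size_t from)
{
  int depth = 0;
  size_t i;
  for (i = from; i < len; i++)
    {
      enum syntax_class c = t->cls[(unsigned char) text[i]];
      if (c == SC_OPEN)
        depth++;
      else if (c == SC_CLOSE && --depth == 0)
        return i + 1;
    }
  return (size_t) -1;
}
```
@end verbatim

Because the scanner consults only the table, giving @samp{@{} and
@samp{@}} the open and close classes is enough to make it match braces
as well, which is how modes customize parenthesis matching.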
3815
3816
3817
3818 @example
3819 casefiddle.c
3820 @end example
3821
3822 This module implements various Lisp primitives for upcasing, downcasing
3823 and capitalizing strings or regions of buffers.
3824
3825
3826
3827 @example
3828 rangetab.c
3829 @end example
3830
3831 This module implements the @dfn{range table} Lisp object type, which
3832 provides for a mapping from ranges of integers to arbitrary Lisp
3833 objects.
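
One natural representation is a sorted array of non-overlapping ranges
searched with a binary search.  The sketch below assumes that
representation, with integers standing in for Lisp objects; it is not
necessarily how @file{rangetab.c} stores its entries, and insertion,
which must split or merge overlapping ranges, is omitted.

@verbatim
```c
#include <stddef.h>

/* One entry: every integer in [first, last] maps to VALUE. */
struct range_entry
{
  long first;
  long last;
  int value;   /* stands in for a Lisp object in this sketch */
};

/* ENTRIES must be sorted by FIRST and must not overlap.  Return 1 and
   store the value in *OUT if KEY lies in some range, else return 0. */
int
range_table_get (const struct range_entry *entries, size_t n,
                 long key, int *out)
{
  size_t lo = 0, hi = n;        /* search the half-open interval [lo, hi) */
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (key < entries[mid].first)
        hi = mid;
      else if (key > entries[mid].last)
        lo = mid + 1;
      else
        {
          *out = entries[mid].value;
          return 1;
        }
    }
  return 0;
}
```
@end verbatim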
3834
3835
3836
3837 @example
3838 opaque.c
3839 opaque.h
3840 @end example
3841
3842 This module implements the @dfn{opaque} Lisp object type, an
3843 internal-only Lisp object that encapsulates an arbitrary block of memory
3844 so that it can be managed by the Lisp allocation system.  To create an
3845 opaque object, you call @code{make_opaque()}, passing a pointer to a
3846 block of memory.  An object is created that is big enough to hold the
3847 memory, which is copied into the object's storage.  The object will then
3848 stick around as long as you keep pointers to it, after which it will be
3849 automatically reclaimed.
3850
3851 @cindex mark method
3852 Opaque objects can also have an arbitrary @dfn{mark method} associated
3853 with them, in case the block of memory contains other Lisp objects that
3854 need to be marked for garbage-collection purposes. (If you need other
3855 object methods, such as a finalize method, you should just go ahead and
3856 create a new Lisp object type -- it's not hard.)
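
The copy-on-creation behavior of @code{make_opaque()} can be pictured
with a small standalone sketch, assuming a layout of a size header
followed by the copied bytes.  The names @code{struct opaque_sketch} and
@code{make_opaque_sketch}, and the zero-fill behavior for a null
pointer, are hypothetical.  The real function allocates through the
Lisp allocator, so its objects are reclaimed by the garbage collector
rather than by the explicit @code{free()} used in this example.

@verbatim
```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: a size header followed by a copy of the data. */
struct opaque_sketch
{
  size_t size;
  unsigned char data[];        /* C99 flexible array member */
};

/* Allocate an object big enough for SIZE bytes and copy DATA into it
   (zero-filling instead if DATA is NULL, in this sketch).  Return NULL
   if memory is exhausted. */
struct opaque_sketch *
make_opaque_sketch (const void *data, size_t size)
{
  struct opaque_sketch *p = malloc (sizeof *p + size);
  if (!p)
    return NULL;
  p->size = size;
  if (data)
    memcpy (p->data, data, size);
  else
    memset (p->data, 0, size);
  return p;
}
```
@end verbatim

Because the bytes are copied, later changes to the caller's original
block do not affect the object.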
3857
3858
3859
3860 @example
3861 abbrev.c
3862 @end example
3863
This module provides a few primitives for doing abbreviation
expansion.  In XEmacs, most of the code for this has been moved into
3866 Lisp.  Some C code remains for speed and because the primitive
3867 @code{self-insert-command} (which is executed for all self-inserting
3868 characters) hooks into the abbrev mechanism. (@code{self-insert-command}
3869 is itself in C only for speed.)
3870
3871
3872
3873 @example
3874 doc.c
3875 @end example
3876
This module provides primitives for retrieving the documentation
3878 strings of functions and variables.  These documentation strings contain
3879 certain special markers that get dynamically expanded (e.g. a
3880 reverse-lookup is performed on some named functions to retrieve their
3881 current key bindings).  Some documentation strings (in particular, for
3882 the built-in primitives and pre-loaded Lisp functions) are stored
3883 externally in a file @file{DOC} in the @file{lib-src/} directory and
3884 need to be fetched from that file. (Part of the build stage involves
3885 building this file, and another part involves constructing an index for
3886 this file and embedding it into the executable, so that the functions in
3887 @file{doc.c} do not have to search the entire @file{DOC} file to find
3888 the appropriate documentation string.)
3889
3890
3891
3892 @example
3893 md5.c
3894 @end example
3895
This module provides a Lisp primitive that implements the MD5
message-digest algorithm, which computes a 128-bit hash value of a
string of data.  MD5 was designed so that the data cannot feasibly be
derived from the hash value, and it has been used for various security
applications on the Internet, although it is no longer considered
cryptographically secure.
3900
3901
3902
3903
3904 @node Modules for Interfacing with the Operating System
3905 @section Modules for Interfacing with the Operating System
3906
3907 @example
3908 callproc.c
3909 process.c
3910 process.h
3911 @end example
3912
3913 These modules allow XEmacs to spawn and communicate with subprocesses
3914 and network connections.
3915
3916 @cindex synchronous subprocesses
3917 @cindex subprocesses, synchronous
3918   @file{callproc.c} implements (through the @code{call-process}
3919 primitive) what are called @dfn{synchronous subprocesses}.  This means
3920 that XEmacs runs a program, waits till it's done, and retrieves its
3921 output.  A typical example might be calling the @file{ls} program to get
3922 a directory listing.
3923
3924 @cindex asynchronous subprocesses
3925 @cindex subprocesses, asynchronous
3926   @file{process.c} and @file{process.h} implement @dfn{asynchronous
3927 subprocesses}.  This means that XEmacs starts a program and then
3928 continues normally, not waiting for the process to finish.  Data can be
3929 sent to the process or retrieved from it as it's running.  This is used
3930 for the @code{shell} command (which provides a front end onto a shell
3931 program such as @file{csh}), the mail and news readers implemented in
3932 XEmacs, etc.  The result of calling @code{start-process} to start a
3933 subprocess is a process object, a particular kind of object used to
3934 communicate with the subprocess.  You can send data to the process by
3935 passing the process object and the data to @code{send-process}, and you
3936 can specify what happens to data retrieved from the process by setting
3937 properties of the process object. (When the process sends data, XEmacs
3938 receives a process event, which says that there is data ready.  When
3939 @code{dispatch-event} is called on this event, it reads the data from
3940 the process and does something with it, as specified by the process
3941 object's properties.  Typically, this means inserting the data into a
3942 buffer or calling a function.) Another property of the process object is
called the @dfn{sentinel}, which is a function that is called when the
process changes status, for example when it terminates.
3945
3946 @cindex network connections
3947   Process objects are also used for network connections (connections to a
3948 process running on another machine).  Network connections are started
3949 with @code{open-network-stream} but otherwise work just like
3950 subprocesses.
3951
3952
3953
3954 @example
3955 sysdep.c
3956 sysdep.h
3957 @end example
3958
3959   These modules implement most of the low-level, messy operating-system
3960 interface code.  This includes various device control (ioctl) operations
for file descriptors, TTYs, pseudo-terminals, etc. (usually this stuff
3962 is fairly system-dependent; thus the name of this module), and emulation
3963 of standard library functions and system calls on systems that don't
3964 provide them or have broken versions.
3965
3966
3967
3968 @example
3969 sysdir.h
3970 sysfile.h
3971 sysfloat.h
3972 sysproc.h
3973 syspwd.h
3974 syssignal.h
3975 systime.h
3976 systty.h
3977 syswait.h
3978 @end example
3979
3980 These header files provide consistent interfaces onto system-dependent
3981 header files and system calls.  The idea is that, instead of including a
3982 standard header file like @file{<sys/param.h>} (which may or may not
exist on various systems) or having to worry about whether all systems
provide a particular preprocessor constant, or having to deal with the
four different paradigms for manipulating signals, you just include the
appropriate @file{sys*.h} header file, which includes all the right
system header files, defines any missing preprocessor constants,
provides a uniform interface onto system calls, etc.
3989
3990 @file{sysdir.h} provides a uniform interface onto directory-querying
3991 functions. (In some cases, this is in conjunction with emulation
3992 functions in @file{sysdep.c}.)
3993
3994 @file{sysfile.h} includes all the necessary header files for standard
3995 system calls (e.g. @code{read()}), ensures that all necessary
3996 @code{open()} and @code{stat()} preprocessor constants are defined, and
usually substitutes wrapped versions of @code{read()},
@code{write()}, etc. that automatically restart I/O operations
interrupted by signals.
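
Such wrappers follow a standard POSIX pattern: if a call fails with
@code{EINTR} because a signal arrived, it is simply retried.  A minimal
sketch with hypothetical names (the actual wrapper names and their
conditional definitions in @file{sysfile.h} are not reproduced here);
the write variant shown also continues after a partial write, which is
an extra assumption of this example.

@verbatim
```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical wrapper: like read(), but retry if a signal interrupts
   the call before any data has been transferred. */
ssize_t
read_restarting (int fd, void *buf, size_t nbyte)
{
  ssize_t rtnval;
  do
    rtnval = read (fd, buf, nbyte);
  while (rtnval == -1 && errno == EINTR);
  return rtnval;
}

/* Same idea for write(); this version also keeps going after a
   partial write, so that all NBYTE bytes are written or -1 is
   returned on a real error. */
ssize_t
write_all_restarting (int fd, const void *buf, size_t nbyte)
{
  const char *p = buf;
  size_t done = 0;
  while (done < nbyte)
    {
      ssize_t n = write (fd, p + done, nbyte - done);
      if (n == -1)
        {
          if (errno == EINTR)
            continue;           /* interrupted: just try again */
          return -1;
        }
      done += (size_t) n;
    }
  return (ssize_t) done;
}
```
@end verbatim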
4000
4001 @file{sysfloat.h} includes the necessary header files for floating-point
4002 operations.
4003
4004 @file{sysproc.h} includes the necessary header files for calling
4005 @code{select()}, @code{fork()}, @code{execve()}, socket operations, and
4006 the like, and ensures that the @code{FD_*()} macros for descriptor-set
4007 manipulations are available.
4008
4009 @file{syspwd.h} includes the necessary header files for obtaining
4010 information from @file{/etc/passwd} (the functions are emulated under
4011 VMS).
4012
4013 @file{syssignal.h} includes the necessary header files for
4014 signal-handling and provides a uniform interface onto the different
4015 signal-handling and signal-blocking paradigms.
4016
4017 @file{systime.h} includes the necessary header files and provides
4018 uniform interfaces for retrieving the time of day, setting file
4019 access/modification times, getting the amount of time used by the XEmacs
4020 process, etc.
4021
4022 @file{systty.h} buffers against the infinitude of different ways of
controlling TTYs.
4024
4025 @file{syswait.h} provides a uniform way of retrieving the exit status
4026 from a @code{wait()}ed-on process (some systems use a union, others use
4027 an int).
4028
4029
4030
4031 @example
4032 hpplay.c
4033 libsst.c
4034 libsst.h
4035 libst.h
4036 linuxplay.c
4037 nas.c
4038 sgiplay.c
4039 sound.c
4040 sunplay.c
4041 @end example
4042
4043 These files implement the ability to play various sounds on some types
4044 of computers.  You have to configure your XEmacs with sound support in
4045 order to get this capability.
4046
4047 @file{sound.c} provides the generic interface.  It implements various
4048 Lisp primitives and variables that let you specify which sounds should
be played in certain conditions.  (The conditions are identified by
symbols, which are passed to @code{ding} to make a sound.)  Various
standard functions call @code{ding} at certain times; if sound support
does not exist, a simple beep results.
4053
4054 @cindex native sound
4055 @cindex sound, native
4056 @file{sgiplay.c}, @file{sunplay.c}, @file{hpplay.c}, and
4057 @file{linuxplay.c} interface to the machine's speaker for various
different kinds of machines.  This is called @dfn{native} sound.
4059
4060 @cindex sound, network
4061 @cindex network sound
4062 @cindex NAS
4063 @file{nas.c} interfaces to a computer somewhere else on the network
4064 using the NAS (Network Audio Server) protocol, playing sounds on that
4065 machine.  This allows you to run XEmacs on a remote machine, with its
4066 display set to your local machine, and have the sounds be made on your
4067 local machine, provided that you have a NAS server running on your local
4068 machine.
4069
4070 @file{libsst.c}, @file{libsst.h}, and @file{libst.h} provide some
4071 additional functions for playing sound on a Sun SPARC but are not
4072 currently in use.
4073
4074
4075
4076 @example
4077 tooltalk.c
4078 tooltalk.h
4079 @end example
4080
4081 These two modules implement an interface to the ToolTalk protocol, which
4082 is an interprocess communication protocol implemented on some versions
4083 of Unix.  ToolTalk is a high-level protocol that allows processes to
4084 register themselves as providers of particular services; other processes
4085 can then request a service without knowing or caring exactly who is
4086 providing the service.  It is similar in spirit to the DDE protocol
4087 provided under Microsoft Windows.  ToolTalk is a part of the new CDE
4088 (Common Desktop Environment) specification and is used to connect the
4089 parts of the SPARCWorks development environment.
4090
4091
4092
4093 @example
4094 getloadavg.c
4095 @end example
4096
4097 This module provides the ability to retrieve the system's current load
4098 average. (The way to do this is highly system-specific, unfortunately,
4099 and requires a lot of special-case code.)
4100
4101
4102
4103 @example
4104 sunpro.c
4105 @end example
4106
4107 This module provides a small amount of code used internally at Sun to
4108 keep statistics on the usage of XEmacs.
4109
4110
4111
4112 @example
4113 broken-sun.h
4114 strcmp.c
4115 strcpy.c
4116 sunOS-fix.c
4117 @end example
4118
4119 These files provide replacement functions and prototypes to fix numerous
4120 bugs in early releases of SunOS 4.1.
4121
4122
4123
4124 @example
4125 hftctl.c
4126 @end example
4127
4128 This module provides some terminal-control code necessary on versions of
4129 AIX prior to 4.1.
4130
4131
4132
4133 @example
4134 msdos.c
4135 msdos.h
4136 @end example
4137
4138 These modules are used for MS-DOS support, which does not work in
4139 XEmacs.
4140
4141
4142
4143 @node Modules for Interfacing with X Windows
4144 @section Modules for Interfacing with X Windows
4145
4146 @example
4147 Emacs.ad.h
4148 @end example
4149
4150 A file generated from @file{Emacs.ad}, which contains XEmacs-supplied
4151 fallback resources (so that XEmacs has pretty defaults).
4152
4153
4154
4155 @example
4156 EmacsFrame.c
4157 EmacsFrame.h
4158 EmacsFrameP.h
4159 @end example
4160
4161 These modules implement an Xt widget class that encapsulates a frame.
4162 This is for ease in integrating with Xt.  The EmacsFrame widget covers
4163 the entire X window except for the menubar; the scrollbars are
4164 positioned on top of the EmacsFrame widget.
4165
4166 @strong{Warning:} Abandon hope, all ye who enter here.  This code took
4167 an ungodly amount of time to get right, and is likely to fall apart
4168 mercilessly at the slightest change.  Such is life under Xt.
4169
4170
4171
4172 @example
4173 EmacsManager.c
4174 EmacsManager.h
4175 EmacsManagerP.h
4176 @end example
4177
4178 These modules implement a simple Xt manager (i.e. composite) widget
4179 class that simply lets its children set whatever geometry they want.
It's amazing that Xt doesn't provide this as standard, but on second
4181 thought, it makes sense, considering how amazingly broken Xt is.
4182
4183
4184 @example
4185 EmacsShell-sub.c
4186 EmacsShell.c
4187 EmacsShell.h
4188 EmacsShellP.h
4189 @end example
4190
4191 These modules implement two Xt widget classes that are subclasses of
4192 the TopLevelShell and TransientShell classes.  This is necessary to deal
4193 with more brokenness that Xt has sadistically thrust onto the backs of
4194 developers.
4195
4196
4197
4198 @example
4199 xgccache.c
4200 xgccache.h
4201 @end example
4202
These modules provide functions for maintenance and caching of GCs
4204 (graphics contexts) under the X Window System.  This code is junky and
4205 needs to be rewritten.
4206
4207
4208
4209 @example
4210 xselect.c
4211 @end example
4212
4213 @cindex selections
4214   This module provides an interface to the X Window System's concept of
4215 @dfn{selections}, the standard way for X applications to communicate
4216 with each other.
4217
4218
4219
4220 @example
4221 xintrinsic.h
4222 xintrinsicp.h
4223 xmmanagerp.h
4224 xmprimitivep.h
4225 @end example
4226
4227 These header files are similar in spirit to the @file{sys*.h} files and buffer
4228 against different implementations of Xt and Motif.
4229
4230 @itemize @bullet
4231 @item
4232 @file{xintrinsic.h} should be included in place of @file{<Intrinsic.h>}.
4233 @item
4234 @file{xintrinsicp.h} should be included in place of @file{<IntrinsicP.h>}.
4235 @item
4236 @file{xmmanagerp.h} should be included in place of @file{<XmManagerP.h>}.
4237 @item
4238 @file{xmprimitivep.h} should be included in place of @file{<XmPrimitiveP.h>}.
4239 @end itemize
4240
4241
4242
4243 @example
4244 xmu.c
4245 xmu.h
4246 @end example
4247
4248 These files provide an emulation of the Xmu library for those systems
4249 (i.e. HPUX) that don't provide it as a standard part of X.
4250
4251
4252
4253 @example
4254 ExternalClient-Xlib.c
4255 ExternalClient.c
4256 ExternalClient.h
4257 ExternalClientP.h
4258 ExternalShell.c
4259 ExternalShell.h
4260 ExternalShellP.h
4261 extw-Xlib.c
4262 extw-Xlib.h
4263 extw-Xt.c
4264 extw-Xt.h
4265 @end example
4266
4267 @cindex external widget
4268   These files provide the @dfn{external widget} interface, which allows an
4269 XEmacs frame to appear as a widget in another application.  To do this,
4270 you have to configure with @samp{--external-widget}.
4271
4272 @file{ExternalShell*} provides the server (XEmacs) side of the
4273 connection.
4274
4275 @file{ExternalClient*} provides the client (other application) side of
4276 the connection.  These files are not compiled into XEmacs but are
4277 compiled into libraries that are then linked into your application.
4278
4279 @file{extw-*} is common code that is used for both the client and server.
4280
4281 Don't touch this code; something is liable to break if you do.
4282
4283
4284
4285 @node Modules for Internationalization
4286 @section Modules for Internationalization
4287
4288 @example
4289 mule-canna.c
4290 mule-ccl.c
4291 mule-charset.c
4292 mule-charset.h
4293 mule-coding.c
4294 mule-coding.h
4295 mule-mcpath.c
4296 mule-mcpath.h
4297 mule-wnnfns.c
4298 mule.c
4299 @end example
4300
4301 These files implement the MULE (Asian-language) support.  Note that MULE
4302 actually provides a general interface for all sorts of languages, not
4303 just Asian languages (although they are generally the most complicated
4304 to support).  This code is still in beta.
4305
4306 @file{mule-charset.*} and @file{mule-coding.*} provide the heart of the
4307 XEmacs MULE support.  @file{mule-charset.*} implements the @dfn{charset}
4308 Lisp object type, which encapsulates a character set (an ordered one- or
4309 two-dimensional set of characters, such as US ASCII or JISX0208 Japanese
4310 Kanji).
4311
4312 @file{mule-coding.*} implements the @dfn{coding-system} Lisp object
4313 type, which encapsulates a method of converting between different
4314 encodings.  An encoding is a representation of a stream of characters,
4315 possibly from multiple character sets, using a stream of bytes or words,
4316 and defines (e.g.) which escape sequences are used to specify particular
4317 character sets, how the indices for a character are converted into bytes
4318 (sometimes this involves setting the high bit; sometimes complicated
4319 rearranging of the values takes place, as in the Shift-JIS encoding),
4320 etc.
4321
4322 @file{mule-ccl.c} provides the CCL (Code Conversion Language)
4323 interpreter.  CCL is similar in spirit to Lisp byte code and is used to
4324 implement converters for custom encodings.
4325
4326 @file{mule-canna.c} and @file{mule-wnnfns.c} implement interfaces to
4327 external programs used to implement the Canna and WNN input methods,
4328 respectively.  This is currently in beta.
4329
4330 @file{mule-mcpath.c} provides some functions to allow for pathnames
4331 containing extended characters.  This code is fragmentary, obsolete, and
completely non-working.  Instead, the variable
@code{pathname-coding-system} is used to specify conversions of names of
files and directories.  The standard C I/O functions like @code{open()}
are wrapped so that conversion occurs automatically.
4336
4337 @file{mule.c} provides a few miscellaneous things that should probably
4338 be elsewhere.
4339
4340
4341
4342 @example
4343 intl.c
4344 @end example
4345
4346 This provides some miscellaneous internationalization code for
4347 implementing message translation and interfacing to the Ximp input
4348 method.  None of this code is currently working.
4349
4350
4351
4352 @example
4353 iso-wide.h
4354 @end example
4355
4356 This contains leftover code from an earlier implementation of
4357 Asian-language support, and is not currently used.
4358
4359
4360
4361
4362 @node Allocation of Objects in XEmacs Lisp, Events and the Event Loop, A Summary of the Various XEmacs Modules, Top
4363 @chapter Allocation of Objects in XEmacs Lisp
4364
4365 @menu
4366 * Introduction to Allocation::
4367 * Garbage Collection::
4368 * GCPROing::
4369 * Integers and Characters::
4370 * Allocation from Frob Blocks::
4371 * lrecords::
4372 * Low-level allocation::
4373 * Pure Space::
4374 * Cons::
4375 * Vector::
4376 * Bit Vector::
4377 * Symbol::
4378 * Marker::
4379 * String::
4380 * Compiled Function::
4381 @end menu
4382
4383 @node Introduction to Allocation
4384 @section Introduction to Allocation
4385
4386   Emacs Lisp, like all Lisps, has garbage collection.  This means that
4387 the programmer never has to explicitly free (destroy) an object; it
4388 happens automatically when the object becomes inaccessible.  Most
4389 experts agree that garbage collection is a necessity in a modern,
4390 high-level language.  Its omission from C stems from the fact that C was
4391 originally designed to be a nice abstract layer on top of assembly
4392 language, for writing kernels and basic system utilities rather than
4393 large applications.
4394
4395   Lisp objects can be created by any of a number of Lisp primitives.
4396 Most object types have one or a small number of basic primitives
4397 for creating objects.  For conses, the basic primitive is @code{cons};
4398 for vectors, the primitives are @code{make-vector} and @code{vector}; for
4399 symbols, the primitives are @code{make-symbol} and @code{intern}; etc.
4400 Some Lisp objects, especially those that are primarily used internally,
4401 have no corresponding Lisp primitives.  Every Lisp object, though,
4402 has at least one C primitive for creating it.
4403
  Recall from the earlier discussion of how Lisp objects are represented
in C that a Lisp object, as stored in a 32-bit or 64-bit word, has a
mark bit, a few tag bits, and a ``value'' that occupies the remainder of
the bits.  We can separate the different Lisp object types into four
broad categories:
4408
4409 @itemize @bullet
4410 @item
(a) Those for which the value directly represents the contents of the
4412 Lisp object.  Only two types are in this category: integers and
4413 characters.  No special allocation or garbage collection is necessary
4414 for such objects.  Lisp objects of these types do not need to be
4415 @code{GCPRO}ed.
4416 @end itemize
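
To make category (a) concrete, here is a sketch of packing a tag and an
integer value into one 32-bit word and extracting them again.  The
layout (mark bit in bit 31, a three-bit tag in bits 28 to 30, and a
28-bit two's-complement value in bits 0 to 27) and all the names are
assumptions chosen for illustration; the real bit assignments are set
by XEmacs's headers and vary between configurations.

@verbatim
```c
#include <stdint.h>

/* Hypothetical 32-bit layout:  | mark (1) | tag (3) | value (28) |  */
#define VALBITS   28
#define TAG_SHIFT VALBITS
#define TAG_MASK  ((uint32_t) 0x7 << TAG_SHIFT)
#define VAL_MASK  (((uint32_t) 1 << VALBITS) - 1)
#define MARK_BIT  ((uint32_t) 1 << 31)

enum tag { TAG_INT = 0, TAG_CHAR = 1, TAG_CONS = 2 };

typedef uint32_t lisp_word;

/* Build a word; only the low VALBITS bits of VALUE are kept, so the
   caller must ensure that VALUE fits. */
lisp_word
make_tagged (enum tag tag, int32_t value)
{
  return ((uint32_t) tag << TAG_SHIFT) | ((uint32_t) value & VAL_MASK);
}

/* Extract the tag, ignoring the mark bit. */
enum tag
word_tag (lisp_word w)
{
  return (enum tag) ((w & TAG_MASK) >> TAG_SHIFT);
}

/* Recover a signed value by sign-extending the 28-bit field. */
int32_t
word_int (lisp_word w)
{
  int32_t v = (int32_t) (w & VAL_MASK);
  if (v & ((int32_t) 1 << (VALBITS - 1)))
    v -= (int32_t) 1 << VALBITS;
  return v;
}
```
@end verbatim

Since the integer lives in the word itself, there is nothing to
allocate or free, and the garbage collector has nothing to trace.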
4417
4418   In the remaining three categories, the value is a pointer to a
4419 structure.
4420
4421 @itemize @bullet
4422 @item
4423 @cindex frob block
(b) Those for which the tag directly specifies the type.  Recall that
4425 there are only three tag bits; this means that at most five types can be
4426 specified this way.  The most commonly-used types are stored in this
4427 format; this includes conses, strings, vectors, and sometimes symbols.
4428 With the exception of vectors, objects in this category are allocated in
4429 @dfn{frob blocks}, i.e. large blocks of memory that are subdivided into
4430 individual objects.  This saves a lot on malloc overhead, since there
4431 are typically quite a lot of these objects around, and the objects are
4432 small.  (A cons, for example, occupies 8 bytes on 32-bit machines -- 4
4433 bytes for each of the two objects it contains.) Vectors are individually
4434 @code{malloc()}ed since they are of variable size.  (It would be
4435 possible, and desirable, to allocate vectors of certain small sizes out
4436 of frob blocks, but it isn't currently done.) Strings are handled
4437 specially: Each string is allocated in two parts, a fixed size structure
4438 containing a length and a data pointer, and the actual data of the
4439 string.  The former structure is allocated in frob blocks as usual, and
4440 the latter data is stored in @dfn{string chars blocks} and is relocated
4441 during garbage collection to eliminate holes.
4442 @end itemize
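
The frob-block idea can be sketched as follows: conses are carved out
of large blocks, and freed conses are threaded onto a free list that
reuses their own storage, so @code{malloc()} is called once per block
rather than once per object.  This is a hypothetical, simplified
allocator (the block size, the names, and the use of @code{int} in
place of Lisp objects are assumptions); the real one is implemented
with macros in @file{alloc.c} and cooperates with the mark and sweep
phases of garbage collection.

@verbatim
```c
#include <stdlib.h>

/* Hypothetical cons cell; ints stand in for Lisp objects.  While a
   cell is free, its storage is reused as the free-list link. */
struct cons
{
  union
  {
    struct { int car, cdr; } live;
    struct cons *next_free;
  } u;
};

#define CONS_PER_BLOCK 1000

struct cons_block
{
  struct cons_block *prev;      /* chain of all blocks, for sweeping */
  struct cons cells[CONS_PER_BLOCK];
};

static struct cons_block *current_block;
static int next_unused = CONS_PER_BLOCK;  /* forces a block on first use */
static struct cons *free_list;

/* Allocate one cons: reuse a freed cell if possible, else take the
   next never-used cell of the current block, else malloc a new block. */
struct cons *
alloc_cons (int car, int cdr)
{
  struct cons *c;
  if (free_list)
    {
      c = free_list;
      free_list = c->u.next_free;
    }
  else
    {
      if (next_unused == CONS_PER_BLOCK)
        {
          struct cons_block *b = malloc (sizeof *b);
          if (!b)
            return NULL;
          b->prev = current_block;
          current_block = b;
          next_unused = 0;
        }
      c = &current_block->cells[next_unused++];
    }
  c->u.live.car = car;
  c->u.live.cdr = cdr;
  return c;
}

/* Put a cell back on the free list.  In a real collector, the sweep
   phase does this for every cell that was not marked. */
void
free_cons (struct cons *c)
{
  c->u.next_free = free_list;
  free_list = c;
}
```
@end verbatim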
4443
4444   In the remaining two categories, the type is stored in the object
4445 itself.  The tag for all such objects is the generic @dfn{lrecord}
4446 (Lisp_Record) tag.  The first four bytes (or eight, for 64-bit machines)
4447 of the object's structure are a pointer to a structure that describes
4448 the object's type, which includes method pointers and a pointer to a
4449 string naming the type.  Note that it's possible to save some space by
4450 using a one- or two-byte tag, rather than a four- or eight-byte pointer
4451 to store the type, but it's not clear it's worth making the change.
4452
4453 @itemize @bullet
4454 @item
4455 (c) Those lrecords that are allocated in frob blocks (see above).  This
4456 includes the objects that are most common and relatively small, and
4457 includes floats, compiled functions, symbols (when not in category (b)),
4458 extents, events, and markers.  With the cleanup of frob blocks done in
4459 19.12, it's not terribly hard to add more objects to this category, but
4460 it's a bit trickier than adding an object type to type (d) (esp. if the
4461 object needs a finalization method), and is not likely to save much
4462 space unless the object is small and there are many of them. (In fact,
4463 if there are very few of them, it might actually waste space.)
4464 @item
4465 (d) Those lrecords that are individually @code{malloc()}ed.  These are
4466 called @dfn{lcrecords}.  All other types are in this category.  Adding a
4467 new type to this category is comparatively easy, and all types added
4468 since 19.8 (when the current allocation scheme was devised, by Richard
4469 Mlynarik), with the exception of the character type, have been in this
4470 category.
4471 @end itemize
4472
  Note that bit vectors are a special case.  They are
4474 simple lrecords as in category (c), but are individually @code{malloc()}ed
4475 like vectors.  You can basically view them as exactly like vectors
4476 except that their type is stored in lrecord fashion rather than
4477 in directly-tagged fashion.
4478
4479   Note that FSF Emacs redesigned their object system in 19.29 to follow
4480 a similar scheme.  However, given RMS's expressed dislike for data
4481 abstraction, the FSF scheme is not nearly as clean or as easy to
4482 extend. (FSF calls items of type (c) @code{Lisp_Misc} and items of type
4483 (d) @code{Lisp_Vectorlike}, with separate tags for each, although
4484 @code{Lisp_Vectorlike} is also used for vectors.)
4485
4486 @node Garbage Collection
4487 @section Garbage Collection
4488 @cindex garbage collection
4489
4490 @cindex mark and sweep
4491   Garbage collection is simple in theory but tricky to implement.
4492 Emacs Lisp uses the oldest garbage collection method, called
@dfn{mark and sweep}.  Garbage collection begins with
all accessible locations (i.e. all variables and other slots where
Lisp objects might occur) and recursively traverses all objects
accessible from those slots, marking each one that is found.
We then go through all of memory, freeing each object that is
not marked and unmarking each object that is marked.  Note
4499 that ``all of memory'' means all currently allocated objects.
4500 Traversing all these objects means traversing all frob blocks,
4501 all vectors (which are chained in one big list), and all
4502 lcrecords (which are likewise chained).
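
  The two phases can be sketched on a toy heap.  This is a minimal
illustration, not @code{mark_object()} or the real sweep.  It assumes
a fixed array of hypothetical pair objects that refer to each other by
index (@code{-1} standing for nil), with a separate mark flag in each
object.  Marking recurses on one field and loops on the other, which
keeps the recursion shallow for long lists.  It also stops at objects
that are already marked, so cycles terminate.

@verbatim
#include <assert.h>
#include <stddef.h>

/* Toy heap: each object is a pair of references (indices); -1 is nil. */
#define TOY_HEAP_SIZE 8
struct toy_obj {
  int car, cdr;
  int in_use;
  int marked;
};
static struct toy_obj heap[TOY_HEAP_SIZE];

/* Mark phase: mark everything reachable from OBJ. */
static void toy_mark(int obj)
{
  while (obj >= 0 && !heap[obj].marked) {
    heap[obj].marked = 1;
    toy_mark(heap[obj].car);   /* recurse on one branch ...          */
    obj = heap[obj].cdr;       /* ... and iterate on the other one   */
  }
}

/* Sweep phase: free unmarked objects and unmark marked ones.
   Returns the number of objects freed. */
static size_t toy_sweep(void)
{
  size_t freed = 0;
  for (size_t i = 0; i < TOY_HEAP_SIZE; i++) {
    if (!heap[i].in_use)
      continue;
    if (heap[i].marked)
      heap[i].marked = 0;
    else {
      heap[i].in_use = 0;
      freed++;
    }
  }
  return freed;
}

/* A full collection: mark from every root, then sweep. */
static size_t toy_gc(const int roots[], size_t nroots)
{
  for (size_t i = 0; i < nroots; i++)
    toy_mark(roots[i]);
  return toy_sweep();
}
@end verbatim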
4503
4504   Note that, when an object is marked, the mark has to occur
4505 inside of the object's structure, rather than in the 32-bit
4506 @code{Lisp_Object} holding the object's pointer; i.e. you can't just
4507 set the pointer's mark bit.  This is because there may be many
4508 pointers to the same object.  This means that the method of
4509 marking an object can differ depending on the type.  The
4510 different marking methods are approximately as follows:
4511
4512 @enumerate
4513 @item
4514 For conses, the mark bit of the car is set.
4515 @item
4516 For strings, the mark bit of the string's plist is set.
4517 @item
4518 For symbols when not lrecords, the mark bit of the
4519 symbol's plist is set.
4520 @item
4521 For vectors, the length is negated after adding 1.
4522 @item
4523 For lrecords, the pointer to the structure describing
4524 the type is changed (see below).
4525 @item
4526 Integers and characters do not need to be marked, since
4527 no allocation occurs for them.
4528 @end enumerate
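
  The vector case needs a short explanation.  Negating a length of zero
still gives zero, so 1 is added first, and every marked vector then has
a negative size field.  This is also why the size field looks
``munged'' during garbage collection.  The following is a minimal
sketch, assuming a hypothetical @code{struct toy_vector} whose only
field is a signed size.

@verbatim
#include <assert.h>

/* Toy vector header: the signed size field doubles as the mark. */
struct toy_vector { long size; };

static void toy_mark_vector(struct toy_vector *v)
{
  v->size = -(v->size + 1);   /* length 0 becomes -1: always negative */
}

static int toy_vector_marked_p(const struct toy_vector *v)
{
  return v->size < 0;
}

static void toy_unmark_vector(struct toy_vector *v)
{
  v->size = -v->size - 1;     /* exact inverse of toy_mark_vector */
}
@end verbatim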
4529
4530   The details of this are in the @code{mark_object()} function.
4531
  Note that any code that operates during garbage collection has
to be especially careful, because some objects may be marked and
therefore may not look the way they normally do.
In particular:

@itemize @bullet
@item
Some object pointers may have their mark bit set.  This will make
4539 @code{FOOBARP()} predicates fail.  Use @code{GC_FOOBARP()} to deal with
4540 this.
4541 @item
4542 Even if you clear the mark bit, @code{FOOBARP()} will still fail
4543 for lrecords because the implementation pointer has been
4544 changed (see below).  @code{GC_FOOBARP()} will correctly deal with
4545 this.
4546 @item
4547 Vectors have their size field munged, so anything that
4548 looks at this field will fail.
4549 @item
4550 Note that @code{XFOOBAR()} macros @emph{will} work correctly on object
4551 pointers with their mark bit set, because the logical shift operations
4552 that remove the tag also remove the mark bit.
4553 @end itemize
4554
4555   Finally, note that garbage collection can be invoked explicitly
by calling @code{garbage-collect}, but it is also called automatically
4557 by @code{eval}, once a certain amount of memory has been allocated
4558 since the last garbage collection (according to @code{gc-cons-threshold}).
4559
4560 @node GCPROing
4561 @section @code{GCPRO}ing
4562
4563 @code{GCPRO}ing is one of the ugliest and trickiest parts of Emacs
4564 internals.  The basic idea is that whenever garbage collection
4565 occurs, all in-use objects must be reachable somehow or
4566 other from one of the roots of accessibility.  The roots
4567 of accessibility are:
4568
4569 @enumerate
4570 @item
4571 All objects that have been @code{staticpro()}d.  This is used for
4572 any global C variables that hold Lisp objects.  A call to
4573 @code{staticpro()} happens implicitly as a result of any symbols
4574 declared with @code{defsymbol()} and any variables declared with
4575 @code{DEFVAR_FOO()}.  You need to explicitly call @code{staticpro()}
4576 (in the @code{vars_of_foo()} method of a module) for other global
4577 C variables holding Lisp objects. (This typically includes
4578 internal lists and such things.)
4579
4580 Note that @code{obarray} is one of the @code{staticpro()}d things.
4581 Therefore, all functions and variables get marked through this.
4582 @item
4583 Any shadowed bindings that are sitting on the @code{specpdl} stack.
4584 @item
4585 Any objects sitting in currently active (Lisp) stack frames,
4586 catches, and condition cases.
4587 @item
4588 A couple of special-case places where active objects are
4589 located.
4590 @item
4591 Anything currently marked with @code{GCPRO}.
4592 @end enumerate
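
  The essential property of @code{staticpro()} is that it records the
@emph{address} of a C variable, not its current value.  Whatever the
variable holds at collection time is therefore protected.  Below is a
minimal sketch of such a registry.  The names (@code{toy_staticpro()},
@code{toy_staticvec}, @code{toy_Vsome_list}) are hypothetical, and a
plain @code{long} stands in for @code{Lisp_Object}.

@verbatim
#include <assert.h>
#include <stddef.h>

typedef long toy_object;           /* stand-in for Lisp_Object */

#define TOY_MAX_STATICS 64
static toy_object *toy_staticvec[TOY_MAX_STATICS];
static size_t toy_staticidx = 0;

/* Hypothetical global C variable holding a Lisp object. */
static toy_object toy_Vsome_list = 0;

/* Register the ADDRESS of a variable, so later assignments are covered. */
static void toy_staticpro(toy_object *varaddress)
{
  assert(toy_staticidx < TOY_MAX_STATICS);
  toy_staticvec[toy_staticidx++] = varaddress;
}

/* At GC time, read what each registered variable holds right now.
   Writes at most MAX roots to OUT and returns how many were written. */
static size_t toy_static_roots(toy_object out[], size_t max)
{
  size_t n = 0;
  for (size_t i = 0; i < toy_staticidx && n < max; i++)
    out[n++] = *toy_staticvec[i];
  return n;
}
@end verbatim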
4593
  Marking with @code{GCPRO} is necessary because some C functions (quite
a lot, in fact) allocate objects during their operation.  Quite
frequently, there will be no other pointer to the object while the
function is running.  If a garbage collection occurs, the object will be
freed, and any later reference to it will access reclaimed memory.  The
solution is to mark those objects with @code{GCPRO}.  Unfortunately this is easy to
4600 forget, and there is basically no way around this problem.  Here are
4601 some rules, though:
4602
4603 @enumerate
4604 @item
4605 For every @code{GCPRO@var{n}}, there have to be declarations of
4606 @code{struct gcpro gcpro1, gcpro2}, etc.
4607
4608 @item
4609 You @emph{must} @code{UNGCPRO} anything that's @code{GCPRO}ed, and you
4610 @emph{must not} @code{UNGCPRO} if you haven't @code{GCPRO}ed.  Getting
4611 either of these wrong will lead to crashes, often in completely random
4612 places unrelated to where the problem lies.
4613
4614 @item
The way this actually works is that all currently active @code{GCPRO}s
are chained through the @code{struct gcpro} local variables, with the
variable @samp{gcprolist} pointing to the head of the list (the most
recently established @code{struct gcpro}).  Within a function, the local
@code{gcpro} variables are linked to one another, and the last of them
in list order points to the @code{gcpro} variables of the next enclosing
stack frame.  Each @code{GCPRO}ed thing is an
lvalue, and the @code{struct gcpro} local variable contains a pointer to
this lvalue.  This is why things will mess up badly if you don't pair up
the @code{GCPRO}s and @code{UNGCPRO}s -- you will end up with
a @code{gcprolist} containing pointers to @code{struct gcpro}s or local
@code{Lisp_Object} variables in no-longer-active stack frames.
4625
4626 @item
4627 It is actually possible for a single @code{struct gcpro} to
4628 protect a contiguous array of any number of values, rather than
4629 just a single lvalue.  To effect this, call @code{GCPRO@var{n}} as usual on
the first object in the array and then set @code{gcpro@var{n}.nvars}
to the number of elements in the array.
4631
4632 @item
4633 @strong{Strings are relocated.}  What this means in practice is that the
4634 pointer obtained using @code{XSTRING_DATA()} is liable to change at any
4635 time, and you should never keep it around past any function call, or
4636 pass it as an argument to any function that might cause a garbage
4637 collection.  This is why a number of functions accept either a
4638 ``non-relocatable'' @code{char *} pointer or a relocatable Lisp string,
4639 and only access the Lisp string's data at the very last minute.  In some
4640 cases, you may end up having to @code{alloca()} some space and copy the
4641 string's data into it.
4642
4643 @item
By convention, if you have to nest @code{GCPRO}s, use @code{NGCPRO@var{n}}
4645 (along with @code{struct gcpro ngcpro1, ngcpro2}, etc.), @code{NNGCPRO@var{n}},
4646 etc.  This avoids compiler warnings about shadowed locals.
4647
4648 @item
4649 It is @emph{always} better to err on the side of extra @code{GCPRO}s
4650 rather than too few.  The extra cycles spent on this are
4651 almost never going to make a whit of difference in the
4652 speed of anything.
4653
4654 @item
4655 The general rule to follow is that caller, not callee, @code{GCPRO}s.
4656 That is, you should not have to explicitly @code{GCPRO} any Lisp objects
4657 that are passed in as parameters.
4658
One exception to this rule is if you ever plan to change the parameter
4660 value, and store a new object in it.  In that case, you @emph{must}
4661 @code{GCPRO} the parameter, because otherwise the new object will not be
4662 protected.
4663
4664 So, if you create any Lisp objects (remember, this happens in all sorts
4665 of circumstances, e.g. with @code{Fcons()}, etc.), you are responsible
4666 for @code{GCPRO}ing them, unless you are @emph{absolutely sure} that
there's no possibility that a garbage collection can occur while you
4668 need to use the object.  Even then, consider @code{GCPRO}ing.
4669
4670 @item
A garbage collection can occur whenever anything calls @code{Feval}, or
wherever a QUIT can occur and execution can continue past
it.  (Remember, this is almost anywhere.)
4674
4675 @item
4676 If you have the @emph{least smidgeon of doubt} about whether
4677 you need to @code{GCPRO}, you should @code{GCPRO}.
4678
4679 @item
4680 Beware of @code{GCPRO}ing something that is uninitialized.  If you have
4681 any shade of doubt about this, initialize all your variables to @code{Qnil}.
4682
4683 @item
4684 Be careful of traps, like calling @code{Fcons()} in the argument to
4685 another function.  By the ``caller protects'' law, you should be
4686 @code{GCPRO}ing the newly-created cons, but you aren't.  A certain
4687 number of functions that are commonly called on freshly created stuff
(e.g. @code{nconc2()}, @code{Fsignal()}) break the ``caller protects''
law and go ahead and @code{GCPRO} their arguments so as to simplify
things, but make sure to check whether it's OK whenever doing something like
4691 this.
4692
4693 @item
4694 Once again, remember to @code{GCPRO}!  Bugs resulting from insufficient
4695 @code{GCPRO}ing are intermittent and extremely difficult to track down,
4696 often showing up in crashes inside of @code{garbage-collect} or in
4697 weirdly corrupted objects or even in incorrect values in a totally
4698 different section of code.
4699 @end enumerate
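
  The chaining described in the rules above can be modeled in a few
lines.  This is a toy, not the real @code{GCPRO@var{n}} macros.  The
hypothetical @code{toy_gcpro_push()} and @code{toy_gcpro_pop()} handle
one record at a time, and a @code{long} stands in for
@code{Lisp_Object}.  The walk sums the protected values, where a real
collector would mark them.  The model shows why a record must be
popped before its stack frame goes away, and how @code{nvars} lets one
record cover a whole array.

@verbatim
#include <assert.h>
#include <stddef.h>

typedef long toy_object;    /* stand-in for Lisp_Object */

struct toy_gcpro {
  struct toy_gcpro *next;   /* next-older record on the chain */
  toy_object *var;          /* address of the protected lvalue (or array) */
  int nvars;                /* number of contiguous objects at VAR */
};

static struct toy_gcpro *toy_gcprolist = NULL;

/* Push a record protecting NVARS objects starting at VAR. */
static void toy_gcpro_push(struct toy_gcpro *g, toy_object *var, int nvars)
{
  g->next = toy_gcprolist;
  g->var = var;
  g->nvars = nvars;
  toy_gcprolist = g;
}

/* Pop G.  It must be the newest record, or the pairing is wrong. */
static void toy_gcpro_pop(struct toy_gcpro *g)
{
  assert(toy_gcprolist == g);
  toy_gcprolist = g->next;
}

/* Visit every protected slot through the stored addresses, so the
   current values are seen.  A real collector would mark each one. */
static long toy_walk_gcpros(void)
{
  long sum = 0;
  for (struct toy_gcpro *g = toy_gcprolist; g; g = g->next)
    for (int i = 0; i < g->nvars; i++)
      sum += g->var[i];
  return sum;
}
@end verbatim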
4700
4701 @cindex garbage collection, conservative
4702 @cindex conservative garbage collection
  Given the extremely error-prone nature of the @code{GCPRO} scheme, and
the difficulty of tracking down the resulting bugs, it should be considered a deficiency
4705 in the XEmacs code.  A solution to this problem would involve
4706 implementing so-called @dfn{conservative} garbage collection for the C
4707 stack.  That involves looking through all of stack memory and treating
4708 anything that looks like a reference to an object as a reference.  This
4709 will result in a few objects not getting collected when they should, but
4710 it obviates the need for @code{GCPRO}ing, and allows garbage collection
4711 to happen at any point at all, such as during object allocation.
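
  The core of a conservative scan is small, although making it portable
is not.  The sketch below scans an explicit array of words standing in
for the C stack.  Finding the real stack bounds and flushing registers
are platform-specific and are omitted.  Any word that points into a toy
heap of hypothetical @code{struct toy_obj} objects is treated as a
reference.  A word that merely happens to look like such a pointer
keeps its object alive, which is the small cost mentioned above.

@verbatim
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NOBJS 4
struct toy_obj {
  long payload[2];
  int marked;
};
static struct toy_obj toy_heap[TOY_NOBJS];

/* If WORD points anywhere inside a heap object, return that object.
   Assumes a flat address space where converting pointers to integers
   preserves ordering (true on mainstream platforms). */
static struct toy_obj *toy_find_object(uintptr_t word)
{
  uintptr_t lo = (uintptr_t) &toy_heap[0];
  uintptr_t hi = (uintptr_t) &toy_heap[TOY_NOBJS];
  if (word < lo || word >= hi)
    return NULL;
  return &toy_heap[(word - lo) / sizeof (struct toy_obj)];
}

/* Treat every word that looks like a pointer into the heap as a real
   reference.  Integers that happen to match are marked too. */
static void toy_scan_conservatively(const uintptr_t *words, size_t nwords)
{
  for (size_t i = 0; i < nwords; i++) {
    struct toy_obj *o = toy_find_object(words[i]);
    if (o)
      o->marked = 1;
  }
}
@end verbatim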
4712
4713 @node Integers and Characters
4714 @section Integers and Characters
4715
4716   Integer and character Lisp objects are created from integers using the
4717 macros @code{XSETINT()} and @code{XSETCHAR()} or the equivalent
4718 functions @code{make_int()} and @code{make_char()}. (These are actually
4719 macros on most systems.)  These functions basically just do some moving
4720 of bits around, since the integral value of the object is stored
4721 directly in the @code{Lisp_Object}.
4722
4723   @code{XSETINT()} and the like will truncate values given to them that
4724 are too big; i.e. you won't get the value you expected but the tag bits
4725 will at least be correct.
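
  The ``moving of bits around'' and the truncation behaviour can be
illustrated with a toy word layout.  The layout is an assumption for
this example, not the actual XEmacs definition: a mark bit in bit 31, a
3-bit type tag in bits 28 to 30, and a 28-bit value field.  All the
@code{toy_} names are likewise hypothetical.  Extracting the value masks
off both the tag and the mark bit, so it also works on marked words.

@verbatim
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit layout, for illustration only:
   bit 31 = mark bit, bits 28..30 = type tag, bits 0..27 = value. */
#define TOY_VALBITS  28
#define TOY_VALMASK  ((UINT32_C(1) << TOY_VALBITS) - 1)
#define TOY_MARKBIT  (UINT32_C(1) << 31)
#define TOY_TAG_INT  UINT32_C(0)
#define TOY_TAG_CHAR UINT32_C(1)

typedef uint32_t toy_word;

/* Pack VAL under TAG.  Bits that do not fit in the value field are
   silently dropped, but the tag bits are always correct. */
static toy_word toy_make_tagged(uint32_t tag, int32_t val)
{
  return (tag << TOY_VALBITS) | ((uint32_t) val & TOY_VALMASK);
}

static toy_word toy_make_int(int32_t val) { return toy_make_tagged(TOY_TAG_INT, val); }
static toy_word toy_make_char(int32_t ch) { return toy_make_tagged(TOY_TAG_CHAR, ch); }

/* Type tag, with the mark bit masked off. */
static uint32_t toy_xtype(toy_word w)
{
  return (w & ~TOY_MARKBIT) >> TOY_VALBITS;
}

/* Signed value: the mask strips tag and mark bit; then sign-extend
   from bit 27, the top bit of the value field. */
static int32_t toy_xint(toy_word w)
{
  uint32_t v = w & TOY_VALMASK;
  if (v & (UINT32_C(1) << (TOY_VALBITS - 1)))
    return (int32_t) v - (int32_t) (UINT32_C(1) << TOY_VALBITS);
  return (int32_t) v;
}
@end verbatim

For example, packing @math{2^{28}} as an integer yields the value 0 with
a correct integer tag.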
4726
4727 @node Allocation from Frob Blocks
4728 @section Allocation from Frob Blocks
4729
4730 The uninitialized memory required by a @code{Lisp_Object} of a particular type
4731 is allocated using
4732 @code{ALLOCATE_FIXED_TYPE()}.  This only occurs inside of the
4733 lowest-level object-creating functions in @file{alloc.c}:
4734 @code{Fcons()}, @code{make_float()}, @code{Fmake_byte_code()},
4735 @code{Fmake_symbol()}, @code{allocate_extent()},
4736 @code{allocate_event()}, @code{Fmake_marker()}, and
4737 @code{make_uninit_string()}.  The idea is that, for each type, there are
4738 a number of frob blocks (each 2K in size); each frob block is divided up
4739 into object-sized chunks.  Each frob block will have some of these
4740 chunks that are currently assigned to objects, and perhaps some that are
4741 free. (If a frob block has nothing but free chunks, it is freed at the
4742 end of the garbage collection cycle.)  The free chunks are stored in a
4743 free list, which is chained by storing a pointer in the first four bytes
4744 of the chunk. (Except for the free chunks at the end of the last frob
4745 block, which are handled using an index which points past the end of the
4746 last-allocated chunk in the last frob block.)
4747 @code{ALLOCATE_FIXED_TYPE()} first tries to retrieve a chunk from the
4748 free list; if that fails, it calls
4749 @code{ALLOCATE_FIXED_TYPE_FROM_BLOCK()}, which looks at the end of the
4750 last frob block for space, and creates a new frob block if there is
4751 none. (There are actually two versions of these macros, one of which is
4752 more defensive but less efficient and is used for error-checking.)
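
  The allocation path just described can be sketched for a single type.
This is a simplified model, not @code{ALLOCATE_FIXED_TYPE()} itself.
It assumes a hypothetical @code{struct toy_cons}, 2K blocks, and a free
list threaded through each free chunk's first word (here the
@code{car} field).  An index marks the never-used tail of the newest
block.  Freeing entirely empty blocks at the end of a collection is not
shown.

@verbatim
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fixed-size object standing in for a cons. */
struct toy_cons {
  void *car;   /* doubles as the free-list link while the chunk is free */
  void *cdr;
};

#define TOY_FROB_BLOCK_SIZE 2048
#define TOY_CONSES_PER_BLOCK \
  ((TOY_FROB_BLOCK_SIZE - sizeof (void *)) / sizeof (struct toy_cons))

struct toy_cons_block {
  struct toy_cons_block *prev;                   /* chain of all blocks */
  struct toy_cons conses[TOY_CONSES_PER_BLOCK];
};

static struct toy_cons_block *toy_current_block = NULL;
/* First never-used chunk in the current block; starting "full" forces
   a block to be created on the first allocation. */
static size_t toy_current_index = TOY_CONSES_PER_BLOCK;
static struct toy_cons *toy_free_list = NULL;

static struct toy_cons *toy_alloc_cons(void)
{
  if (toy_free_list) {                           /* 1. reuse a freed chunk */
    struct toy_cons *c = toy_free_list;
    toy_free_list = c->car;
    return c;
  }
  if (toy_current_index == TOY_CONSES_PER_BLOCK) {  /* 2. need a new block */
    struct toy_cons_block *b = malloc(sizeof *b);
    if (!b)
      return NULL;
    b->prev = toy_current_block;
    toy_current_block = b;
    toy_current_index = 0;
  }
  return &toy_current_block->conses[toy_current_index++];  /* 3. carve */
}

/* Put C on the free list, linking through its first word. */
static void toy_free_cons(struct toy_cons *c)
{
  c->car = toy_free_list;
  toy_free_list = c;
}
@end verbatim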
4753
4754 @node lrecords
4755 @section lrecords
4756
4757   [see @file{lrecord.h}]
4758
4759   All lrecords have at the beginning of their structure a @code{struct
4760 lrecord_header}.  This just contains a pointer to a @code{struct
4761 lrecord_implementation}, which is a structure containing method pointers
4762 and such.  There is one of these for each type, and it is a global,
4763 constant, statically-declared structure that is declared in the
@code{DEFINE_LRECORD_IMPLEMENTATION()} macro.  This macro actually
declares an array of two @code{struct lrecord_implementation}
structures.  The first one contains all the standard method pointers,
and is used in all normal circumstances.  During garbage collection,
however, the lrecord is @dfn{marked} by bumping its implementation
pointer by one, so that it points to the second structure in the array.
This structure contains a special indication in it that it's a
@dfn{marked-object} structure: the finalize method is the special
function @code{this_marks_a_marked_record()}, and all other methods are
null pointers.  At the end of garbage collection, all lrecords will
either be reclaimed or unmarked by decrementing their implementation
pointers, so no lrecord still points to the second structure once
garbage collection has finished.
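
  The pointer-bumping trick can be sketched with a hypothetical
two-element implementation array for one type.  The @code{toy_} names
below are illustrative stand-ins for @code{struct lrecord_implementation}
and @code{this_marks_a_marked_record()}.  The sketch also shows why an
ordinary type predicate fails on a marked object.  A GC-aware predicate,
which compares against both elements, still succeeds.

@verbatim
#include <assert.h>
#include <stddef.h>

struct toy_lrecord_header;

/* Toy method table; the real one has many more slots. */
struct toy_implementation {
  const char *name;
  void (*finalizer) (struct toy_lrecord_header *);
};

struct toy_lrecord_header {
  const struct toy_implementation *implementation;
};

/* Sentinel: only its address matters; it is not meant to be called. */
static void toy_this_marks_a_marked_record(struct toy_lrecord_header *h)
{
  (void) h;
}

/* One pair per type: [0] normal methods, [1] the "marked" copy. */
static const struct toy_implementation toy_float_impl[2] = {
  { "float", NULL },
  { "float", toy_this_marks_a_marked_record },
};

static void toy_mark_lrecord(struct toy_lrecord_header *h)   { h->implementation++; }
static void toy_unmark_lrecord(struct toy_lrecord_header *h) { h->implementation--; }

static int toy_marked_p(const struct toy_lrecord_header *h)
{
  return h->implementation->finalizer == toy_this_marks_a_marked_record;
}

/* Ordinary type check: fails while the object is marked. */
static int toy_floatp(const struct toy_lrecord_header *h)
{
  return h->implementation == &toy_float_impl[0];
}

/* GC-safe type check: accepts either element of the pair. */
static int toy_gc_floatp(const struct toy_lrecord_header *h)
{
  return h->implementation == &toy_float_impl[0]
      || h->implementation == &toy_float_impl[1];
}
@end verbatim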
4777
4778   Simple lrecords (of type (c) above) just have a @code{struct
4779 lrecord_header} at their beginning.  lcrecords, however, actually have a
4780 @code{struct lcrecord_header}.  This, in turn, has a @code{struct
4781 lrecord_header} at its beginning, so sanity is preserved; but it also
4782 has a pointer used to chain all lcrecords together, and a special ID
4783 field used to distinguish one lcrecord from another. (This field is used
4784 only for debugging and could be removed, but the space gain is not
4785 significant.)
4786
  Simple lrecords are created using @code{ALLOCATE_FIXED_TYPE()}, just
like other objects allocated from frob blocks.  The only change is that the implementation
4789 pointer must be initialized correctly. (The implementation structure for
4790 an lrecord, or rather the pointer to it, is named @code{lrecord_float},
4791 @code{lrecord_extent}, @code{lrecord_buffer}, etc.)
4792
4793   lcrecords are created using @code{alloc_lcrecord()}.  This takes a
4794 size to allocate and an implementation pointer. (The size needs to be
4795 passed because some lcrecords, such as window configurations, are of
4796 variable size.) This basically just @code{malloc()}s the storage,
4797 initializes the @code{struct lcrecord_header}, and chains the lcrecord
4798 onto the head of the list of all lcrecords, which is stored in the
4799 variable @code{all_lcrecords}.  The calls to @code{alloc_lcrecord()}
4800 generally occur in the lowest-level allocation function for each lrecord
4801 type.
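
  What follows is a minimal sketch of this allocation path, together
with the matching part of the sweep.  It assumes a simplified header
(hypothetical @code{struct toy_lcrecord_header}) carrying the
implementation pointer, the chain link, the debugging ID, and a separate
mark flag.  The real header instead embeds a
@code{struct lrecord_header} and marks by bumping the implementation
pointer.  Unmarked lcrecords are finalized, unlinked from the chain, and
freed.

@verbatim
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_lcrecord_header;

struct toy_impl {
  const char *name;
  void (*finalize) (struct toy_lcrecord_header *);   /* may be NULL */
};

struct toy_lcrecord_header {
  const struct toy_impl *implementation;
  struct toy_lcrecord_header *next;   /* chains all lcrecords together */
  unsigned int uid;                   /* debugging ID only */
  int marked;                         /* toy mark flag */
};

static struct toy_lcrecord_header *toy_all_lcrecords = NULL;
static unsigned int toy_next_uid = 0;

/* SIZE is passed because some lcrecord types are variable-size. */
static void *toy_alloc_lcrecord(size_t size, const struct toy_impl *impl)
{
  struct toy_lcrecord_header *h;
  assert(size >= sizeof *h);
  h = calloc(1, size);
  if (!h)
    return NULL;                      /* xmalloc() would exit non-locally */
  h->implementation = impl;
  h->uid = toy_next_uid++;
  h->next = toy_all_lcrecords;        /* push onto the head of the chain */
  toy_all_lcrecords = h;
  return h;
}

/* Finalize and free unmarked lcrecords; unmark the survivors.
   Returns the number of lcrecords freed. */
static size_t toy_sweep_lcrecords(void)
{
  size_t freed = 0;
  struct toy_lcrecord_header **link = &toy_all_lcrecords;
  while (*link) {
    struct toy_lcrecord_header *h = *link;
    if (h->marked) {
      h->marked = 0;
      link = &h->next;
    } else {
      if (h->implementation->finalize)
        h->implementation->finalize(h);
      *link = h->next;                /* unlink before freeing */
      free(h);
      freed++;
    }
  }
  return freed;
}

/* A hypothetical type whose finalizer just counts its calls. */
static int toy_finalized = 0;
static void toy_count_finalize(struct toy_lcrecord_header *h) { (void) h; toy_finalized++; }
static const struct toy_impl toy_counted_impl = { "counted", toy_count_finalize };
@end verbatim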
4802
Whenever you create a new lrecord type, you need to invoke either
@code{DEFINE_LRECORD_IMPLEMENTATION()} or
@code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()}.  This needs to be
specified in a C file, at the top level.  What this actually does is
define and initialize the implementation structure for the lrecord. (It
may also declare a function @code{error_check_foo()} that implements
the @code{XFOO()} macro when error-checking is enabled.)  The arguments
4810 to the macros are the actual type name (this is used to construct the C
4811 variable name of the lrecord implementation structure and related
4812 structures using the @samp{##} macro concatenation operator), a string
4813 that names the type on the Lisp level (this may not be the same as the C
4814 type name; typically, the C type name has underscores, while the Lisp
4815 string has dashes), various method pointers, and the name of the C
4816 structure that contains the object.  The methods are used to encapsulate
4817 type-specific information about the object, such as how to print it or
4818 mark it for garbage collection, so that it's easy to add new object
4819 types without having to add a specific case for each new type in a bunch
4820 of different places.
4821
4822   The difference between @code{DEFINE_LRECORD_IMPLEMENTATION()} and
4823 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()} is that the former is
4824 used for fixed-size object types and the latter is for variable-size
4825 object types.  Most object types are fixed-size; some complex
4826 types, however (e.g. window configurations), are variable-size.
4827 Variable-size object types have an extra method, which is called
4828 to determine the actual size of a particular object of that type.
4829 (Currently this is only used for keeping allocation statistics.)
4830
4831   For the purpose of keeping allocation statistics, the allocation
4832 engine keeps a list of all the different types that exist.  Note that,
4833 since @code{DEFINE_LRECORD_IMPLEMENTATION()} is a macro that is
4834 specified at top-level, there is no way for it to add to the list of all
4835 existing types.  What happens instead is that each implementation
4836 structure contains in it a dynamically assigned number that is
4837 particular to that type. (Or rather, it contains a pointer to another
4838 structure that contains this number.  This evasiveness is done so that
4839 the implementation structure can be declared const.) In the sweep stage
4840 of garbage collection, each lrecord is examined to see if its
4841 implementation structure has its dynamically-assigned number set.  If
4842 not, it must be a new type, and it is added to the list of known types
4843 and a new number assigned.  The number is used to index into an array
4844 holding the number of objects of each type and the total memory
4845 allocated for objects of that type.  The statistics in this array are
4846 also computed during the sweep stage.  These statistics are returned by
4847 the call to @code{garbage-collect} and are printed out at the end of the
4848 loadup phase.
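
  The lazily assigned type number can be sketched as follows.  The
names are hypothetical.  The point is that the @code{const}
implementation structure holds only a pointer to a mutable slot.  That
slot starts out as @code{-1} and receives the next free index the first
time the sweep meets an object of that type.

@verbatim
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_TYPES 16

/* Mutable slot holding a type's number; -1 until the type is first seen. */
struct toy_type_index { int index; };

/* Can be const because it only points at the mutable slot. */
struct toy_impl {
  const char *name;
  struct toy_type_index *type_index;
};

static int toy_ntypes = 0;
static size_t toy_count[TOY_MAX_TYPES];   /* live objects per type */
static size_t toy_bytes[TOY_MAX_TYPES];   /* live bytes per type */

/* Called for each live object during the sweep. */
static void toy_tally(const struct toy_impl *impl, size_t size)
{
  int *ix = &impl->type_index->index;
  if (*ix < 0) {                  /* first object of this type: number it */
    assert(toy_ntypes < TOY_MAX_TYPES);
    *ix = toy_ntypes++;
  }
  toy_count[*ix]++;
  toy_bytes[*ix] += size;
}

/* Two hypothetical types. */
static struct toy_type_index toy_float_ix = { -1 };
static const struct toy_impl toy_float_impl = { "float", &toy_float_ix };
static struct toy_type_index toy_marker_ix = { -1 };
static const struct toy_impl toy_marker_impl = { "marker", &toy_marker_ix };
@end verbatim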
4849
4850   Note that for every type defined with a @code{DEFINE_LRECORD_*()}
4851 macro, there needs to be a @code{DECLARE_LRECORD_IMPLEMENTATION()}
4852 somewhere in a @file{.h} file, and this @file{.h} file needs to be
4853 included by @file{inline.c}.
4854
4855   Furthermore, there should generally be a set of @code{XFOOBAR()},
4856 @code{FOOBARP()}, etc. macros in a @file{.h} (or occasionally @file{.c})
4857 file.  To create one of these, copy an existing model and modify as
4858 necessary.
4859
4860   The various methods in the lrecord implementation structure are:
4861
4862 @enumerate
4863 @item
4864 @cindex mark method
4865 A @dfn{mark} method.  This is called during the marking stage and passed
4866 a function pointer (usually the @code{mark_object()} function), which is
4867 used to mark an object.  All Lisp objects that are contained within the
4868 object need to be marked by applying this function to them.  The mark
method should also return a Lisp object, which should be either nil or
an object to mark.  (This can be used in lieu of calling
@code{mark_object()} on that object, to reduce the recursion depth; it
should therefore be the most heavily nested sub-object, such as a
long list.)
4874
4875 @strong{Please note:} When the mark method is called, garbage collection
4876 is in progress, and special precautions need to be taken when accessing
objects (@pxref{Garbage Collection}).
4878
4879 If your mark method does not need to do anything, it can be
4880 @code{NULL}.
4881
4882 @item
4883 A @dfn{print} method.  This is called to create a printed representation
4884 of the object, whenever @code{princ}, @code{prin1}, or the like is
4885 called.  It is passed the object, a stream to which the output is to be
4886 directed, and an @code{escapeflag} which indicates whether the object's
4887 printed representation should be @dfn{escaped} so that it is
4888 readable. (This corresponds to the difference between @code{princ} and
4889 @code{prin1}.) Basically, @dfn{escaped} means that strings will have
4890 quotes around them and confusing characters in the strings such as
4891 quotes, backslashes, and newlines will be backslashed; and that special
4892 care will be taken to make symbols print in a readable fashion
4893 (e.g. symbols that look like numbers will be backslashed).  Other
4894 readable objects should perhaps pass @code{escapeflag} on when
4895 sub-objects are printed, so that readability is preserved when necessary
4896 (or if not, always pass in a 1 for @code{escapeflag}).  Non-readable
4897 objects should in general ignore @code{escapeflag}, except that some use
4898 it as an indication that more verbose output should be given.
4899
4900 Sub-objects are printed using @code{print_internal()}, which takes
4901 exactly the same arguments as are passed to the print method.
4902
4903 Literal C strings should be printed using @code{write_c_string()},
4904 or @code{write_string_1()} for non-null-terminated strings.
4905
Print methods for objects that do not have a readable representation should check the
4907 @code{print_readably} flag and signal an error if it is set.
4908
4909 If you specify NULL for the print method, the
4910 @code{default_object_printer()} will be used.
4911
4912 @item
4913 A @dfn{finalize} method.  This is called at the beginning of the sweep
4914 stage on lcrecords that are about to be freed, and should be used to
4915 perform any extra object cleanup.  This typically involves freeing any
4916 extra @code{malloc()}ed memory associated with the object, releasing any
4917 operating-system and window-system resources associated with the object
4918 (e.g. pixmaps, fonts), etc.
4919
4920 The finalize method can be NULL if nothing needs to be done.
4921
4922 WARNING #1: The finalize method is also called at the end of the dump
phase, this time with the @code{for_disksave} parameter set to non-zero.  The
4924 object is @emph{not} about to disappear, so you have to make sure to
4925 @emph{not} free any extra @code{malloc()}ed memory if you're going to
4926 need it later.  (Also, signal an error if there are any operating-system
4927 and window-system resources here, because they can't be dumped.)
4928
4929 Finalize methods should, as a rule, set to zero any pointers after
4930 they've been freed, and check to make sure pointers are not zero before
4931 freeing.  Although I'm pretty sure that finalize methods are not called
4932 twice on the same object (except for the @code{for_disksave} proviso),
4933 we've gotten nastily burned in some cases by not doing this.
4934
4935 WARNING #2: The finalize method is @emph{only} called for
lcrecords, @emph{not} for simple lrecords.  If you need a
4937 finalize method for simple lrecords, you have to stick
4938 it in the @code{ADDITIONAL_FREE_foo()} macro in @file{alloc.c}.
4939
4940 WARNING #3: Things are in an @emph{extremely} bizarre state
4941 when @code{ADDITIONAL_FREE_foo()} is called, so you have to
4942 be incredibly careful when writing one of these functions.
4943 See the comment in @code{gc_sweep()}.  If you ever have to add
4944 one of these, consider using an lcrecord or dealing with
4945 the problem in a different fashion.
4946
4947 @item
4948 An @dfn{equal} method.  This compares the two objects for similarity,
4949 when @code{equal} is called.  It should compare the contents of the
4950 objects in some reasonable fashion.  It is passed the two objects and a
4951 @dfn{depth} value, which is used to catch circular objects.  To compare
4952 sub-Lisp-objects, call @code{internal_equal()} and bump the depth value
4953 by one.  If this value gets too high, a @code{circular-object} error
4954 will be signaled.
4955
4956 If this is NULL, objects are @code{equal} only when they are @code{eq},
4957 i.e. identical.
4958
4959 @item
4960 A @dfn{hash} method.  This is used to hash objects when they are to be
4961 compared with @code{equal}.  The rule here is that if two objects are
4962 @code{equal}, they @emph{must} hash to the same value; i.e. your hash
4963 function should use some subset of the sub-fields of the object that are
4964 compared in the ``equal'' method.  If you specify this method as
4965 @code{NULL}, the object's pointer will be used as the hash, which will
4966 @emph{fail} if the object has an @code{equal} method, so don't do this.
4967
4968 To hash a sub-Lisp-object, call @code{internal_hash()}.  Bump the
4969 depth by one, just like in the ``equal'' method.
4970
4971 To convert a Lisp object directly into a hash value (using
4972 its pointer), use @code{LISP_HASH()}.  This is what happens when
4973 the hash method is NULL.
4974
4975 To hash two or more values together into a single value, use
4976 @code{HASH2()}, @code{HASH3()}, @code{HASH4()}, etc.
4977
4978 @item
4979 @dfn{getprop}, @dfn{putprop}, @dfn{remprop}, and @dfn{plist} methods.
4980 These are used for object types that have properties.  I don't feel like
4981 documenting them here.  If you create one of these objects, you have to
4982 use different macros to define them,
4983 i.e. @code{DEFINE_LRECORD_IMPLEMENTATION_WITH_PROPS()} or
4984 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION_WITH_PROPS()}.
4985
4986 @item
A @dfn{size_in_bytes} method, when the object is of variable size
(i.e. declared with a @code{_SEQUENCE_IMPLEMENTATION} macro).  This should
4989 simply return the object's size in bytes, exactly as you might expect.
4990 For an example, see the methods for window configurations and opaques.
4991 @end enumerate
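
  The rule tying the equal and hash methods together is easy to get
wrong, so here is a sketch for a hypothetical font-like type.  The
@code{toy_hash2()} combiner stands in for @code{HASH2()}, whose actual
definition is not reproduced here.  The equal method compares only
@code{family} and @code{size}, so the hash method uses only those
fields.  Hashing @code{cache_id} as well would give two @code{equal}
objects different hashes.

@verbatim
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical object type with two compared fields. */
struct toy_font {
  const char *family;
  int size;
  int cache_id;              /* NOT compared, so it must NOT be hashed */
};

/* djb2 string hash. */
static unsigned long toy_hash_string(const char *s)
{
  unsigned long h = 5381;
  while (*s)
    h = h * 33 + (unsigned char) *s++;
  return h;
}

/* Toy stand-in for HASH2(): mixes two hash values into one. */
static unsigned long toy_hash2(unsigned long a, unsigned long b)
{
  return a * 31 + b;
}

#define TOY_MAX_DEPTH 200

/* "equal" method.  DEPTH would be bumped when comparing Lisp
   sub-objects; real code signals circular-object when it gets too high. */
static int toy_font_equal(const struct toy_font *a, const struct toy_font *b,
                          int depth)
{
  assert(depth <= TOY_MAX_DEPTH);
  return a->size == b->size && strcmp(a->family, b->family) == 0;
}

/* "hash" method: uses only the fields that toy_font_equal compares. */
static unsigned long toy_font_hash(const struct toy_font *f, int depth)
{
  (void) depth;
  return toy_hash2(toy_hash_string(f->family), (unsigned long) f->size);
}
@end verbatim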
4992
4993 @node Low-level allocation
4994 @section Low-level allocation
4995
4996   Memory that you want to allocate directly should be allocated using
4997 @code{xmalloc()} rather than @code{malloc()}.  This implements
4998 error-checking on the return value, and once upon a time did some more
vital stuff (e.g. @code{BLOCK_INPUT}, which is no longer necessary).
5000 Free using @code{xfree()}, and realloc using @code{xrealloc()}.  Note
5001 that @code{xmalloc()} will do a non-local exit if the memory can't be
5002 allocated. (Many functions, however, do not expect this, and thus XEmacs
5003 will likely crash if this happens.  @strong{This is a bug.}  If you can,
5004 you should strive to make your function handle this OK.  However, it's
5005 difficult in the general circumstance, perhaps requiring extra
5006 unwind-protects and such.)
5007
5008   Note that XEmacs provides two separate replacements for the standard
5009 @code{malloc()} library function.  These are called @dfn{old GNU malloc}
5010 (@file{malloc.c}) and @dfn{new GNU malloc} (@file{gmalloc.c}),
5011 respectively.  New GNU malloc is better in pretty much every way than
5012 old GNU malloc, and should be used if possible.  (It used to be that on
5013 some systems, the old one worked but the new one didn't.  I think this
5014 was due specifically to a bug in SunOS, which the new one now works
5015 around; so I don't think the old one ever has to be used any more.) The
5016 primary difference between both of these mallocs and the standard system
5017 malloc is that they are much faster, at the expense of increased space.
5018 The basic idea is that memory is allocated in fixed chunks of powers of
5019 two.  This allows for basically constant malloc time, since the various
5020 chunks can just be kept on a number of free lists. (The standard system
5021 malloc typically allocates arbitrary-sized chunks and has to spend some
5022 time, sometimes a significant amount of time, walking the heap looking
5023 for a free block to use and cleaning things up.)  The new GNU malloc
5024 improves on things by allocating large objects in chunks of 4096 bytes
5025 rather than in ever larger powers of two, which results in ever larger
5026 wastage.  There is a slight speed loss here, but it's of doubtful
5027 significance.
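
  The size-class arithmetic behind the power-of-two scheme can be
sketched as below.  Because every request is rounded up to a fixed
chunk size, finding its free list is a short computation rather than a
search of the heap.  The @code{toy_new_chunk()} variant models the
large-object refinement, under the assumption that ``chunks of 4096
bytes'' means rounding up to a multiple of 4096.  The 8-byte minimum
chunk is also an assumption.

@verbatim
#include <assert.h>
#include <stddef.h>

/* Smallest power of two >= N, with an assumed minimum of 8.
   Assumes N is far below SIZE_MAX. */
static size_t toy_pow2_chunk(size_t n)
{
  size_t c = 8;
  while (c < n)
    c <<= 1;
  return c;
}

/* Index of the free list serving requests of size N
   (8 -> 0, 16 -> 1, 32 -> 2, ...). */
static int toy_bucket(size_t n)
{
  int b = 0;
  for (size_t c = 8; c < n; c <<= 1)
    b++;
  return b;
}

/* Large-object refinement: above 4096 bytes, round up to a multiple
   of 4096 instead of to the next power of two. */
static size_t toy_new_chunk(size_t n)
{
  if (n <= 4096)
    return toy_pow2_chunk(n);
  return (n + 4095) / 4096 * 4096;
}
@end verbatim

For a 9000-byte request the power-of-two rule hands out 16384 bytes,
while rounding to 4096-byte units hands out 12288.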
5028
5029   NOTE: Apparently there is a third-generation GNU malloc that is
5030 significantly better than the new GNU malloc, and should probably
5031 be included in XEmacs.
5032
5033   There is also the relocating allocator, @file{ralloc.c}.  This actually
moves blocks of memory around so that the @code{sbrk()} pointer can be shrunk
and virtual memory released back to the system.  On some systems,
5036 this is a big win.  On all systems, it causes a noticeable (and
5037 sometimes huge) speed penalty, so I turn it off by default.
5038 @file{ralloc.c} only works with the new GNU malloc in @file{gmalloc.c}.
There are also two versions of @file{ralloc.c}; one of them uses @code{mmap()}
rather than block copies to move data around.  This purports to
5041 be faster, although that depends on the amount of data that would
5042 have had to be block copied and the system-call overhead for
5043 @code{mmap()}.  I don't know exactly how this works, except that the
5044 relocating-allocation routines are pretty much used only for
5045 the memory allocated for a buffer, which is the biggest consumer
5046 of space, esp. of space that may get freed later.
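The core idea of a relocating allocator can be sketched as follows, assuming the caller hands over the address of its own pointer so that the allocator can update it after moving the block.  The names @code{r_alloc_sketch}, @code{r_free_sketch} and @code{r_compact_sketch}, the fixed arena and the block table are hypothetical; this is not the interface of @file{ralloc.c}, and it moves data with @code{memmove()} rather than @code{mmap()}.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical relocating allocator over a fixed arena.  Each block
   remembers the address of its owner's pointer so it can be moved. */
#define ARENA_SIZE 4096
#define MAX_BLOCKS 64

struct reloc_block {
  char **owner;                 /* caller's pointer, or NULL once freed */
  size_t offset;                /* current position in the arena */
  size_t size;
};

static char arena[ARENA_SIZE];
static size_t arena_top;        /* plays the role of the sbrk() pointer */
static struct reloc_block blocks[MAX_BLOCKS];   /* in address order */
static int nblocks;

/* Allocate SIZE bytes and store their address in *OWNER. */
int r_alloc_sketch (char **owner, size_t size)
{
  if (nblocks == MAX_BLOCKS || ARENA_SIZE - arena_top < size)
    return -1;
  blocks[nblocks].owner = owner;
  blocks[nblocks].offset = arena_top;
  blocks[nblocks].size = size;
  nblocks++;
  *owner = arena + arena_top;
  arena_top += size;
  return 0;
}

void r_free_sketch (char **owner)
{
  int i;
  for (i = 0; i < nblocks; i++)
    if (blocks[i].owner == owner)
      {
        blocks[i].owner = NULL;         /* leaves a hole for now */
        *owner = NULL;
        return;
      }
}

/* Slide live blocks down over the holes, fix up their owners' pointers
   and lower arena_top, so the freed tail could go back to the system. */
void r_compact_sketch (void)
{
  size_t dest = 0;
  int i, live = 0;
  for (i = 0; i < nblocks; i++)
    {
      struct reloc_block b = blocks[i];
      if (!b.owner)
        continue;
      memmove (arena + dest, arena + b.offset, b.size);
      b.offset = dest;
      *b.owner = arena + dest;          /* the relocation step */
      blocks[live++] = b;
      dest += b.size;
    }
  nblocks = live;
  arena_top = dest;
}
```

Any other copy of a relocatable pointer goes stale after compaction, so code using such memory must always go back through the registered pointer.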
5047
5048   Note that the GNU mallocs have some ``memory warning'' facilities.
5049 XEmacs taps into them and issues a warning through the standard
5050 warning system, when memory gets to 75%, 85%, and 95% full.
5051 (On some systems, the memory warnings are not functional.)
5052
5053   Allocated memory that is going to be used to make a Lisp object
5054 is created using @code{allocate_lisp_storage()}.  This calls @code{xmalloc()}
5055 but also verifies that the pointer to the memory can fit into
5056 a Lisp word (remember that some bits are taken away for a type
5057 tag and a mark bit).  If not, an error is issued through @code{memory_full()}.
5058 @code{allocate_lisp_storage()} is called by @code{alloc_lcrecord()},
5059 @code{ALLOCATE_FIXED_TYPE()}, and the vector and bit-vector creation
5060 routines.  These routines also call @code{INCREMENT_CONS_COUNTER()} at the
5061 appropriate times; this keeps statistics on how much memory is
5062 allocated, so that garbage-collection can be invoked when the
5063 threshold is reached.
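The pointer check and the allocation statistics might look roughly like this.  It is a sketch under assumptions: the tag and mark bits are taken to occupy the top four bits of a @code{uintptr_t}, @code{memory_full()} is replaced by a stand-in that returns @code{NULL} (the real one issues an error), and the names ending in @code{_sketch} are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed layout: the top GCTYPEBITS + GCMARKBITS bits of a Lisp word
   hold the type tag and mark bit; the low VALBITS bits hold the pointer. */
#define GCTYPEBITS 3
#define GCMARKBITS 1
#define VALBITS (sizeof (uintptr_t) * CHAR_BIT - GCTYPEBITS - GCMARKBITS)
#define VALMASK ((((uintptr_t) 1) << VALBITS) - 1)

static size_t consing_since_gc;             /* bytes allocated since last GC */
static size_t gc_cons_threshold = 1 << 20;  /* request GC after ~1 MB */
static int gc_wanted;

/* Nonzero if P survives being squeezed into VALBITS bits. */
static int pointer_fits_in_lisp_word (void *p)
{
  return ((uintptr_t) p & ~VALMASK) == 0;
}

/* Stand-in for memory_full(); the real function issues an error. */
static void *memory_full_sketch (void)
{
  return NULL;
}

void *allocate_lisp_storage_sketch (size_t size)
{
  void *p = malloc (size);
  if (!p || !pointer_fits_in_lisp_word (p))
    {
      free (p);
      return memory_full_sketch ();
    }
  /* The job of INCREMENT_CONS_COUNTER(): count the bytes so that a
     GC can be requested once the threshold is crossed. */
  consing_since_gc += size;
  if (consing_since_gc > gc_cons_threshold)
    gc_wanted = 1;
  return p;
}
```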
5064
5065 @node Pure Space
5066 @section Pure Space
5067
5068   Not yet documented.
5069
5070 @node Cons
5071 @section Cons
5072
5073   Conses are allocated in standard frob blocks.  The only thing to
5074 note is that conses can be explicitly freed using @code{free_cons()}
5075 and associated functions @code{free_list()} and @code{free_alist()}.  This
5076 immediately puts the conses onto the cons free list, and decrements
5077 the statistics on memory allocation appropriately.  This is used
5078 to good effect by some extremely commonly-used code, to avoid
5079 generating extra objects and thereby triggering GC sooner.
5080 However, you have to be @emph{extremely} careful when doing this.
5081 If you mess this up, you will get BADLY BURNED, and it has happened
5082 before.
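A minimal sketch of a cons frob block with an explicit free operation follows, assuming a hypothetical @code{struct cons} of two pointers, with free conses chained through their car.  The names @code{alloc_cons} and @code{free_cons_sketch} are illustrative, not the XEmacs functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical frob-block allocation for conses with explicit freeing. */
struct cons { void *car, *cdr; };

#define CONSES_PER_BLOCK 128

struct cons_block {
  struct cons conses[CONSES_PER_BLOCK];
  struct cons_block *prev;
};

static struct cons_block *current_block;
static int next_unused = CONSES_PER_BLOCK;  /* forces a first block */
static struct cons *cons_free_list;         /* chained through car */
static size_t consing_since_gc;

struct cons *alloc_cons (void *car, void *cdr)
{
  struct cons *c;
  if (cons_free_list)
    {
      c = cons_free_list;                   /* reuse a freed cons first */
      cons_free_list = (struct cons *) c->car;
    }
  else
    {
      if (next_unused == CONSES_PER_BLOCK)
        {
          struct cons_block *b = malloc (sizeof *b);
          if (!b)
            return NULL;
          b->prev = current_block;
          current_block = b;
          next_unused = 0;
        }
      c = &current_block->conses[next_unused++];
    }
  consing_since_gc += sizeof *c;
  c->car = car;
  c->cdr = cdr;
  return c;
}

/* Put C back on the free list immediately and undo its statistics.
   The caller must be certain nothing else still refers to C. */
void free_cons_sketch (struct cons *c)
{
  c->car = cons_free_list;
  c->cdr = NULL;
  cons_free_list = c;
  if (consing_since_gc >= sizeof *c)
    consing_since_gc -= sizeof *c;
}
```

The danger the text warns about is visible here: if any other pointer to a freed cons survives, the next @code{alloc_cons} call hands the same storage out again, and the two users silently overwrite each other.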
5083
5084 @node Vector
5085 @section Vector
5086
5087   As mentioned above, each vector is @code{malloc()}ed individually, and
5088 all are threaded through the variable @code{all_vectors}.  Vectors are
5089 marked strangely during garbage collection, by kludging the size field.
5090 Note that the @code{struct Lisp_Vector} is declared with its
5091 @code{contents} field being a @emph{stretchy} array of one element.  It
5092 is actually @code{malloc()}ed with the right size, however, and access
5093 to any element through the @code{contents} array works fine.
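The layout, the @code{all_vectors} chain and a size-field marking trick can be sketched as follows.  The manual's @code{struct Lisp_Vector} declares a one-element @code{contents} array; this sketch uses the C99 flexible array member, which expresses the same stretchy layout without indexing past a declared bound.  Storing @code{~size} as the mark is only an assumption about what ``kludging the size field'' might look like, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef void *Lisp_Object_sketch;       /* stand-in for a Lisp word */

struct vector_sketch {
  long size;                            /* holds ~size while marked */
  struct vector_sketch *next;           /* chain for all_vectors */
  Lisp_Object_sketch contents[];        /* really SIZE elements */
};

static struct vector_sketch *all_vectors;

struct vector_sketch *make_vector_sketch (long size, Lisp_Object_sketch init)
{
  struct vector_sketch *v;
  long i;
  assert (size >= 0);
  /* One malloc() of exactly the right size for SIZE elements. */
  v = malloc (offsetof (struct vector_sketch, contents)
              + (size_t) size * sizeof (Lisp_Object_sketch));
  if (!v)
    return NULL;
  v->size = size;
  for (i = 0; i < size; i++)
    v->contents[i] = init;
  v->next = all_vectors;                /* thread onto all_vectors */
  all_vectors = v;
  return v;
}

/* A marked vector stores ~size, which is negative for any valid size,
   so no separate mark bit is needed. */
static int vector_marked_p (const struct vector_sketch *v)
{
  return v->size < 0;
}

static long vector_length (const struct vector_sketch *v)
{
  return v->size < 0 ? ~v->size : v->size;
}

static void mark_vector (struct vector_sketch *v)
{
  if (v->size >= 0)
    v->size = ~v->size;
}

static void unmark_vector (struct vector_sketch *v)
{
  if (v->size < 0)
    v->size = ~v->size;
}
```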
5094
5095 @node Bit Vector
5096 @section Bit Vector
5097
5098   Bit vectors work exactly like vectors, except for more complicated
5099 code to access an individual bit, and except for the fact that bit
5100 vectors are lrecords while vectors are not. (The only difference here is
5101 that there's an lrecord implementation pointer at the beginning and the
5102 tag field in bit vector Lisp words is ``lrecord'' rather than
5103 ``vector''.)
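The extra work for bit vectors lies in locating a bit inside a word.  A minimal sketch, assuming the bits are packed into an array of @code{unsigned long} (the function names are hypothetical):

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical bit-vector element access: each element costs a
   divide, a modulus and a mask. */
#define BITS_PER_WORD (sizeof (unsigned long) * CHAR_BIT)

static int bit_vector_ref (const unsigned long *bits, size_t i)
{
  return (int) ((bits[i / BITS_PER_WORD] >> (i % BITS_PER_WORD)) & 1UL);
}

static void bit_vector_set (unsigned long *bits, size_t i, int value)
{
  unsigned long mask = 1UL << (i % BITS_PER_WORD);
  if (value)
    bits[i / BITS_PER_WORD] |= mask;
  else
    bits[i / BITS_PER_WORD] &= ~mask;
}
```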
5104
5105 @node Symbol
5106 @section Symbol
5107
  Symbols are also allocated in frob blocks.  Note that the code
allows symbols to be either lrecords (category (c) above)
or simple types (category (b) above); they are lrecords by
default (I think), although there is no good reason for this.
5112
5113   Note that symbols in the awful horrible obarray structure are
5114 chained through their @code{next} field.
5115
5116 Remember that @code{intern} looks up a symbol in an obarray, creating
5117 one if necessary.
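An obarray lookup along these lines can be sketched as a hash table whose buckets are chains linked through each symbol's @code{next} field.  The hash function, the table size of 211 and the names @code{intern_sketch} and @code{struct symbol_sketch} are assumptions for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical obarray: fixed array of buckets, each a chain of
   symbols linked through their next field. */
#define OBARRAY_SIZE 211

struct symbol_sketch {
  char *name;
  struct symbol_sketch *next;           /* next symbol in the same bucket */
};

static struct symbol_sketch *obarray[OBARRAY_SIZE];

static unsigned long hash_name (const char *s)
{
  unsigned long h = 0;
  while (*s)
    h = h * 31 + (unsigned char) *s++;
  return h % OBARRAY_SIZE;
}

/* Like intern: return the symbol called NAME, creating it if needed. */
struct symbol_sketch *intern_sketch (const char *name)
{
  unsigned long b = hash_name (name);
  struct symbol_sketch *sym;

  for (sym = obarray[b]; sym; sym = sym->next)
    if (strcmp (sym->name, name) == 0)
      return sym;                       /* already interned */

  sym = malloc (sizeof *sym);
  if (!sym || !(sym->name = malloc (strlen (name) + 1)))
    {
      free (sym);
      return NULL;
    }
  strcpy (sym->name, name);
  sym->next = obarray[b];               /* push onto the bucket chain */
  obarray[b] = sym;
  return sym;
}
```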
5118
5119 @node Marker
5120 @section Marker
5121
  Markers are allocated in frob blocks, as usual.  Each buffer keeps
its markers unordered, but in a doubly-linked list so that they
can easily be removed.  (Formerly this was a singly-linked list,
but in some cases garbage collection took an extraordinarily
long time: removing a marker from a singly-linked list means walking
the list to find its predecessor, so removing lots of markers from a
buffer took O(N^2) time.)  Markers are removed from a buffer in
the finalize stage, in @code{ADDITIONAL_FREE_marker()}.
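Constant-time removal from a doubly-linked list is what makes the finalize step cheap.  A sketch with hypothetical @code{struct marker_sketch} and @code{struct buffer_sketch} types:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: a buffer's markers in an unordered doubly-linked list. */
struct marker_sketch {
  struct marker_sketch *prev, *next;
};

struct buffer_sketch {
  struct marker_sketch *markers;        /* head of the list */
};

void buffer_add_marker (struct buffer_sketch *buf, struct marker_sketch *m)
{
  m->prev = NULL;
  m->next = buf->markers;
  if (buf->markers)
    buf->markers->prev = m;
  buf->markers = m;
}

/* O(1): no search for the predecessor, so removing N markers is O(N). */
void buffer_remove_marker (struct buffer_sketch *buf, struct marker_sketch *m)
{
  if (m->prev)
    m->prev->next = m->next;
  else
    buf->markers = m->next;
  if (m->next)
    m->next->prev = m->prev;
  m->prev = m->next = NULL;
}
```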
5129
5130 @node String
5131 @section String
5132
5133   As mentioned above, strings are a special case.  A string is logically
5134 two parts, a fixed-size object (containing the length, property list,
5135 and a pointer to the actual data), and the actual data in the string.
5136 The fixed-size object is a @code{struct Lisp_String} and is allocated in
5137 frob blocks, as usual.  The actual data is stored in special
5138 @dfn{string-chars blocks}, which are 8K blocks of memory.
5139 Currently-allocated strings are simply laid end to end in these
5140 string-chars blocks, with a pointer back to the @code{struct Lisp_String}
5141 stored before each string in the string-chars block.  When a new string
5142 needs to be allocated, the remaining space at the end of the last
5143 string-chars block is used if there's enough, and a new string-chars
5144 block is created otherwise.
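String data allocation can be sketched as a bump allocator that writes a back pointer before each chunk and pads the chunk so the next back pointer stays aligned.  The struct names, the trailing NUL and the use of the pointer size as the alignment unit are assumptions; big strings are simply rejected here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define STRING_CHARS_BLOCK_SIZE 8192

struct string_sketch {                  /* stand-in for struct Lisp_String */
  size_t length;
  char *data;
};

struct string_chars_block {
  size_t pos;                           /* first unused byte */
  struct string_chars_block *prev;
  union { void *align; char chars[STRING_CHARS_BLOCK_SIZE]; } u;
};

static struct string_chars_block *current_string_block;

/* Back pointer + data + NUL, rounded up to a multiple of the pointer
   size so that the next back pointer is aligned. */
static size_t string_chunk_size (size_t length)
{
  size_t a = sizeof (struct string_sketch *);
  return (a + length + 1 + a - 1) / a * a;
}

/* Copy LENGTH bytes of CONTENTS into string-chars space for S. */
int allocate_string_data (struct string_sketch *s, const char *contents,
                          size_t length)
{
  size_t need = string_chunk_size (length);
  struct string_chars_block *b = current_string_block;
  char *chunk;

  if (need > STRING_CHARS_BLOCK_SIZE)
    return -1;                          /* a "big string": not handled */
  if (!b || STRING_CHARS_BLOCK_SIZE - b->pos < need)
    {
      b = malloc (sizeof *b);           /* start a new block */
      if (!b)
        return -1;
      b->pos = 0;
      b->prev = current_string_block;
      current_string_block = b;
    }
  chunk = b->u.chars + b->pos;
  memcpy (chunk, &s, sizeof s);         /* back pointer to the header */
  s->data = chunk + sizeof s;
  s->length = length;
  memcpy (s->data, contents, length);
  s->data[length] = '\0';
  b->pos += need;
  return 0;
}
```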
5145
  The string compaction and relocation that happens at the end of garbage
collection leaves no holes in the string-chars blocks.
During the sweep stage of garbage collection, when objects are
reclaimed, the garbage collector goes through all string-chars blocks,
looking for unused strings.  Each chunk of string data is preceded by a
pointer to the corresponding @code{struct Lisp_String}, which indicates
both whether the string is used and how big the string is, i.e. how to
get to the next chunk of string data.  Holes are closed up by
block-copying the next string into the empty space and relocating the
pointer stored in the corresponding @code{struct Lisp_String}.
5156 @strong{This means you have to be careful with strings in your code.}
5157 See the section above on @code{GCPRO}ing.
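The compaction pass can be sketched on a single block.  In this hypothetical version the string header carries a @code{live} flag set by the mark phase and still records the length of a dead string, so the walk can step over it; all names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical compaction of one string-chars block.  Each chunk is
   [back pointer][data, NUL, padding]. */
#define BLOCK_SIZE 1024

struct str {
  size_t length;
  char *data;
  int live;                             /* set by the mark phase */
};

static union { void *align; char chars[BLOCK_SIZE]; } block;
static size_t block_pos;

static size_t chunk_size (size_t length)
{
  size_t a = sizeof (struct str *);
  return (a + length + 1 + a - 1) / a * a;
}

void add_string (struct str *s, const char *text)
{
  size_t len = strlen (text);
  char *chunk = block.chars + block_pos;
  assert (BLOCK_SIZE - block_pos >= chunk_size (len));
  memcpy (chunk, &s, sizeof s);         /* back pointer */
  s->data = chunk + sizeof s;
  s->length = len;
  s->live = 1;
  memcpy (s->data, text, len + 1);
  block_pos += chunk_size (len);
}

/* Walk the chunks via their back pointers; slide each live one down
   over the holes and relocate its header's data pointer. */
void compact_string_block (void)
{
  size_t from = 0, to = 0;
  while (from < block_pos)
    {
      struct str *s;
      size_t size;
      memcpy (&s, block.chars + from, sizeof s);
      size = chunk_size (s->length);    /* how to reach the next chunk */
      if (s->live)
        {
          memmove (block.chars + to, block.chars + from, size);
          s->data = block.chars + to + sizeof s;
          to += size;
        }
      from += size;
    }
  block_pos = to;
}
```

After @code{compact_string_block()} runs, any C pointer that still holds an old @code{data} address points at some other string's characters, which is why C code must not hold on to raw string-data pointers across anything that can trigger garbage collection.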
5158
  Note that there is one situation the string-chars blocks do not handle:
a string that is too big to fit into a string-chars block.  Such strings,
called @dfn{big strings}, are each @code{malloc()}ed as their own block.
(#### It would make more sense for the threshold for big strings to be
somewhat lower, e.g. 1/2 or 1/4 the size of a string-chars block.  It
seems that this was indeed the case formerly (the threshold was set at
1/8), but Mly forgot about this when rewriting things for 19.8.)
5166
5167 Note also that the string data in string-chars blocks is padded as
5168 necessary so that proper alignment constraints on the @code{struct
5169 Lisp_String} back pointers are maintained.
5170
5171   Finally, strings can be resized.  This happens in Mule when a
5172 character is substituted with a different-length character, or during
modeline frobbing.  (This could also be exported to Lisp, but currently
it is not.)  Resizing a string is a potentially tricky process.
If the change is small enough that the padding can absorb it, nothing
other than a simple memory move needs to be done.  Keep in mind,
however, that the string can't shrink too much either, because the
offset to the next string in the string-chars block is computed by
looking at the length and rounding up to the next multiple of four or
eight.  If the
5180 string would shrink or expand beyond the correct padding, new string
5181 data needs to be allocated at the end of the last string-chars block and
5182 the data moved appropriately.  This leaves some dead string data, which
5183 is marked by putting a special marker of 0xFFFFFFFF in the @code{struct
5184 Lisp_String} pointer before the data (there's no real @code{struct
5185 Lisp_String} to point to and relocate), and storing the size of the dead
5186 string data (which would normally be obtained from the now-non-existent
5187 @code{struct Lisp_String}) at the beginning of the dead string data gap.
5188 The string compactor recognizes this special 0xFFFFFFFF marker and
5189 handles it correctly.
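The in-place test and the dead-data marking can be sketched as follows, assuming pointer-sized alignment and a back-pointer slot the size of @code{uintptr_t}.  The all-ones marker here is the 0xFFFFFFFF value widened to pointer size, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ALIGNMENT sizeof (void *)

/* Back pointer + data + NUL, rounded up to the alignment. */
static size_t padded_chunk_size (size_t length)
{
  size_t n = sizeof (void *) + length + 1;
  return (n + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
}

/* A resize can stay in place only if the padded size does not change;
   otherwise the next string's offset would no longer be found. */
int resize_fits_in_place (size_t old_length, size_t new_length)
{
  return padded_chunk_size (old_length) == padded_chunk_size (new_length);
}

/* Turn the old chunk of a string of LENGTH bytes into dead data: an
   all-ones back pointer followed by the size of the gap. */
void mark_chunk_dead (char *chunk, size_t length)
{
  uintptr_t dead = (uintptr_t) -1;
  size_t size = padded_chunk_size (length);
  memcpy (chunk, &dead, sizeof dead);
  memcpy (chunk + sizeof (void *), &size, sizeof size);
}
```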
5190
5191 @node Compiled Function
5192 @section Compiled Function
5193
5194   Not yet documented.
5195
5196 @node Events and the Event Loop, Evaluation; Stack Frames; Bindings, Allocation of Objects in XEmacs Lisp, Top
5197 @chapter Events and the Event Loop
5198
5199 @menu
5200 * Introduction to Events::
5201 * Main Loop::
5202 * Specifics of the Event Gathering Mechanism::
5203 * Specifics About the Emacs Event::
5204 * The Event Stream Callback Routines::
5205 * Other Event Loop Functions::
5206 * Converting Events::
5207 * Dispatching Events; The Command Builder::
5208 @end menu
5209
5210 @node Introduction to Events
5211 @section Introduction to Events
5212
5213   An event is an object that encapsulates information about an
5214 interesting occurrence in the operating system.  Events are
5215 generated either by user action, direct (e.g. typing on the
5216 keyboard or moving the mouse) or indirect (moving another
5217 window, thereby generating an expose event on an Emacs frame),
5218 or as a result of some other typically asynchronous action happening,
5219 such as output from a subprocess being ready or a timer expiring.
5220 Events come into the system in an asynchronous fashion (typically
5221 through a callback being called) and are converted into a
5222 synchronous event queue (first-in, first-out) in a process that
5223 we will call @dfn{collection}.
5224
5225   Note that each application has its own event queue. (It is
5226 immaterial whether the collection process directly puts the
5227 events in the proper application's queue, or puts them into
5228 a single system queue, which is later split up.)
5229
5230   The most basic level of event collection is done by the
5231 operating system or window system.  Typically, XEmacs does
5232 its own event collection as well.  Often there are multiple
5233 layers of collection in XEmacs, with events from various
5234 sources being collected into a queue, which is then combined
5235 with other sources to go into another queue (i.e. a second
5236 level of collection), with perhaps another level on top of
5237 this, etc.
5238
  XEmacs has its own types of events (called @dfn{Emacs events}),
which provide an abstract layer on top of the system-dependent
5241 nature of the most basic events that are received.  Part of the
5242 complex nature of the XEmacs event collection process involves
5243 converting from the operating-system events into the proper
5244 Emacs events -- there may not be a one-to-one correspondence.
5245
5246   Emacs events are documented in @file{events.h}; I'll discuss them
5247 later.
5248
5249 @node Main Loop
5250 @section Main Loop
5251
5252   The @dfn{command loop} is the top-level loop that the editor is always
5253 running.  It loops endlessly, calling @code{next-event} to retrieve an
5254 event and @code{dispatch-event} to execute it. @code{dispatch-event} does
5255 the appropriate thing with non-user events (process, timeout,
5256 magic, eval, mouse motion); this involves calling a Lisp handler
5257 function, redrawing a newly-exposed part of a frame, reading
5258 subprocess output, etc.  For user events, @code{dispatch-event}
5259 looks up the event in relevant keymaps or menubars; when a
5260 full key sequence or menubar selection is reached, the appropriate
5261 function is executed. @code{dispatch-event} may have to keep state
5262 across calls; this is done in the ``command-builder'' structure
5263 associated with each console (remember, there's usually only
5264 one console), and the engine that looks up keystrokes and
5265 constructs full key sequences is called the @dfn{command builder}.
5266 This is documented elsewhere.
5267
5268   The guts of the command loop are in @code{command_loop_1()}.  This
5269 function doesn't catch errors, though -- that's the job of
5270 @code{command_loop_2()}, which is a condition-case (i.e. error-trapping)
5271 wrapper around @code{command_loop_1()}.  @code{command_loop_1()} never
5272 returns, but may get thrown out of.
5273
5274   When an error occurs, @code{cmd_error()} is called, which usually
5275 invokes the Lisp error handler in @code{command-error}; however, a
5276 default error handler is provided if @code{command-error} is @code{nil}
5277 (e.g. during startup).  The purpose of the error handler is simply to
5278 display the error message and do associated cleanup; it does not need to
5279 throw anywhere.  When the error handler finishes, the condition-case in
5280 @code{command_loop_2()} will finish and @code{command_loop_2()} will
5281 reinvoke @code{command_loop_1()}.
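The relationship between @code{command_loop_2()} and @code{command_loop_1()} can be sketched with @code{setjmp()} and @code{longjmp()} standing in for condition-case.  Everything here is hypothetical: events are strings, a leading @samp{!} makes a command fail, and the inner loop returns when the events run out, whereas the real @code{command_loop_1()} never returns.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

static jmp_buf command_loop_catcher;
static const char *const *pending_events;
static int n_pending, next_pending;
static int commands_run, errors_reported;

/* Stand-in for signalling a Lisp error. */
static void signal_error_sketch (void)
{
  longjmp (command_loop_catcher, 1);
}

/* Like command_loop_1(): read and execute events, trapping nothing. */
static void command_loop_1_sketch (void)
{
  while (next_pending < n_pending)
    {
      const char *ev = pending_events[next_pending++];
      if (ev[0] == '!')                 /* pretend this command fails */
        signal_error_sketch ();
      commands_run++;
    }
}

/* Like command_loop_2(): trap errors escaping the inner loop, report
   them (cmd_error()'s job) and then re-enter the inner loop. */
void command_loop_2_sketch (const char *const *events, int n)
{
  pending_events = events;
  n_pending = n;
  next_pending = 0;
  for (;;)
    {
      if (setjmp (command_loop_catcher) == 0)
        {
          command_loop_1_sketch ();
          return;                       /* only the sketch gets here */
        }
      errors_reported++;                /* display message, clean up */
    }
}
```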
5282
5283   @code{command_loop_2()} is invoked from three places: from
5284 @code{initial_command_loop()} (called from @code{main()} at the end of
5285 internal initialization), from the Lisp function @code{recursive-edit},
5286 and from @code{call_command_loop()}.
5287
5288   @code{call_command_loop()} is called when a macro is started and when
5289 the minibuffer is entered; normal termination of the macro or minibuffer
causes a throw out of the recursive command loop (to
@code{execute-kbd-macro} for macros and to @code{exit} for minibuffers).
Note also that the low-level minibuffer-entering function,
@code{read-minibuffer-internal}, provides its own error handling and
does not need @code{command_loop_2()}'s error encapsulation, so it tells
@code{call_command_loop()} to invoke @code{command_loop_1()} directly.
5296
  Note that both @code{read-minibuffer-internal} and @code{recursive-edit} set up a
5298 catch for @code{exit}; this is why @code{abort-recursive-edit}, which
5299 throws to this catch, exits out of either one.
5300
5301   @code{initial_command_loop()}, called from @code{main()}, sets up a
5302 catch for @code{top-level} when invoking @code{command_loop_2()},
5303 allowing functions to throw all the way to the top level if they really
5304 need to.  Before invoking @code{command_loop_2()},
5305 @code{initial_command_loop()} calls @code{top_level_1()}, which handles
5306 all of the startup stuff (creating the initial frame, handling the
5307 command-line options, loading the user's @file{.emacs} file, etc.).  The
5308 function that actually does this is in Lisp and is pointed to by the
5309 variable @code{top-level}; normally this function is
5310 @code{normal-top-level}.  @code{top_level_1()} is just an error-handling
5311 wrapper similar to @code{command_loop_2()}.  Note also that
5312 @code{initial_command_loop()} sets up a catch for @code{top-level} when
5313 invoking @code{top_level_1()}, just like when it invokes
5314 @code{command_loop_2()}.
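Nested catches such as those for @code{top-level} and @code{exit} can be sketched as a stack of frames, each holding a tag and a @code{jmp_buf}; a throw searches the stack for the innermost matching tag.  This is a hypothetical illustration with string tags and integer values, not the XEmacs implementation.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>
#include <string.h>

struct catch_frame {
  const char *tag;
  jmp_buf jmp;
  struct catch_frame *outer;
};

static struct catch_frame *catch_list;  /* innermost catch first */
static int thrown_value;                /* static, so it survives longjmp */

int catch_sketch (const char *tag, int (*body) (void))
{
  struct catch_frame c;
  int result;
  c.tag = tag;
  c.outer = catch_list;
  catch_list = &c;
  if (setjmp (c.jmp) == 0)
    result = body ();                   /* normal return */
  else
    result = thrown_value;              /* arrived via throw */
  catch_list = c.outer;
  return result;
}

void throw_sketch (const char *tag, int value)
{
  struct catch_frame *c;
  for (c = catch_list; c; c = c->outer)
    if (strcmp (c->tag, tag) == 0)
      {
        thrown_value = value;
        longjmp (c->jmp, 1);            /* skips any inner catches */
      }
  abort ();                             /* no catch: real code signals */
}

/* Example: a throw to "top-level" passes straight through an inner
   catch for "exit". */
static int throw_to_top_level (void)
{
  throw_sketch ("top-level", 42);
  return 0;
}

static int with_exit_catch (void)
{
  return catch_sketch ("exit", throw_to_top_level) + 1;
}

static int throw_to_exit (void)
{
  throw_sketch ("exit", 7);
  return 0;
}
```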
5315
5316 @node Specifics of the Event Gathering Mechanism
5317 @section Specifics of the Event Gathering Mechanism
5318
  Here is an approximate diagram of the collection processes
at work in XEmacs under TTYs (TTYs are simpler than X,
so we'll look at them first):
5322
5323 @noindent
5324 @example
5325  asynch.      asynch.    asynch.   asynch.             [Collectors in
5326 kbd events  kbd events   process   process                the OS]
5327       |         |         output    output
5328       |         |           |         |
5329       |         |           |         |      SIGINT,   [signal handlers
5330       |         |           |         |      SIGQUIT,     in XEmacs]
5331       V         V           V         V      SIGWINCH,
5332      file      file        file      file    SIGALRM
5333      desc.     desc.       desc.     desc.     |
5334      (TTY)     (TTY)       (pipe)    (pipe)    |
5335       |          |          |         |      fake    timeouts
5336       |          |          |         |      file        |
5337       |          |          |         |      desc.       |
5338       |          |          |         |      (pipe)      |
5339       |          |          |         |        |         |
5340       |          |          |         |        |         |
5341       |          |          |         |        |         |
5342       V          V          V         V        V         V
5343       ------>-----------<----------------<----------------
5344                   |
5345                   |
5346                   | [collected using select() in emacs_tty_next_event()
5347                   |  and converted to the appropriate Emacs event]
5348                   |
5349                   |
5350                   V          (above this line is TTY-specific)
5351                 Emacs -----------------------------------------------
5352                 event (below this line is the generic event mechanism)
5353                   |
5354                   |
5355 was there     if not, call
5356 a SIGINT?  emacs_tty_next_event()
5357     |             |
5358     |             |
5359     |             |
5360     V             V
5361     --->------<----
5362            |
5363            |     [collected in event_stream_next_event();
5364            |      SIGINT is converted using maybe_read_quit_event()]
5365            V
5366          Emacs
5367          event
5368            |
5369            \---->------>----- maybe_kbd_translate() ---->---\
5370                                                             |
5371                                                             |
5372                                                             |
5373      command event queue                                    |
5374                                                if not from command
5375   (contains events that were                   event queue, call
5376   read earlier but not processed,              event_stream_next_event()
5377   typically when waiting in a                               |
5378   sit-for, sleep-for, etc. for                              |
5379  a particular event to be received)                         |
5380                |                                            |
5381                |                                            |
5382                V                                            V
5383                ---->------------------------------------<----
5384                                                |
5385                                                | [collected in
5386                                                |  next_event_internal()]
5387                                                |
5388  unread-     unread-       event from          |
5389  command-    command-       keyboard       else, call
5390  events      event           macro      next_event_internal()
5391    |           |               |               |
5392    |           |               |               |
5393    |           |               |               |
5394    V           V               V               V
5395    --------->----------------------<------------
5396                      |
5397                      |      [collected in `next-event', which may loop
5398                      |       more than once if the event it gets is on
5399                      |       a dead frame, device, etc.]
5400                      |
5401                      |
5402                      V
5403             feed into top-level event loop,
5404             which repeatedly calls `next-event'
5405             and then dispatches the event
5406             using `dispatch-event'
5407 @end example
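The ``fake file desc.'' box for signals suggests the self-pipe technique: a signal handler writes a byte to a pipe, so that a single @code{select()} call can wait on terminals, subprocess pipes and signals together.  This POSIX sketch assumes that is the idea; @code{setup_signal_pipe} and @code{wait_for_input} are hypothetical names, and timeouts are omitted.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/select.h>
#include <unistd.h>

static int signal_pipe[2];

/* Signal handler: just make the read end of the pipe readable. */
static void signal_to_pipe (int sig)
{
  int saved_errno = errno;
  unsigned char byte = (unsigned char) sig;
  ssize_t ignored = write (signal_pipe[1], &byte, 1);
  (void) ignored;
  errno = saved_errno;
}

int setup_signal_pipe (int sig)
{
  struct sigaction sa;
  if (pipe (signal_pipe) < 0)
    return -1;
  sa.sa_handler = signal_to_pipe;
  sigemptyset (&sa.sa_mask);
  sa.sa_flags = 0;
  return sigaction (sig, &sa, NULL);
}

/* Block until the signal pipe or one of FDS is readable; return the
   ready descriptor (signal pipe first), or -1 on error. */
int wait_for_input (const int *fds, int nfds)
{
  fd_set readable;
  int i, maxfd = signal_pipe[0];
  FD_ZERO (&readable);
  FD_SET (signal_pipe[0], &readable);
  for (i = 0; i < nfds; i++)
    {
      FD_SET (fds[i], &readable);
      if (fds[i] > maxfd)
        maxfd = fds[i];
    }
  if (select (maxfd + 1, &readable, NULL, NULL, NULL) < 0)
    return -1;
  if (FD_ISSET (signal_pipe[0], &readable))
    return signal_pipe[0];
  for (i = 0; i < nfds; i++)
    if (FD_ISSET (fds[i], &readable))
      return fds[i];
  return -1;
}
```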
5408
Notice the separation between the TTY-specific and the generic event mechanisms.
5410 When using the Xt-based event loop, the TTY-specific stuff is replaced
5411 but the rest stays the same.
5412
It's also important to realize that only one kind of
system-specific event loop can be operating at a time, and it must be
able to receive all kinds of events simultaneously.  For the two existing
5416 event loops (implemented in @file{event-tty.c} and @file{event-Xt.c},
5417 respectively), the TTY event loop @emph{only} handles TTY consoles,
5418 while the Xt event loop handles @emph{both} TTY and X consoles.  This
5419 situation is different from all of the output handlers, where you simply
5420 have one per console type.
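The single active event loop can be sketched as a table of function pointers that the generic code calls through.  The text names only @code{event_stream->event_pending_p}; the struct layout, the callback @code{next_event_cb} and the canned TTY implementation here are assumptions, and the sketch returns instead of blocking when nothing is pending.

```c
#include <assert.h>
#include <stddef.h>

struct emacs_event_sketch {
  int type;
  int value;
};

/* One implementation (TTY or Xt) fills in this table. */
struct event_stream_sketch {
  const char *name;
  int (*event_pending_p) (void);
  void (*next_event_cb) (struct emacs_event_sketch *ev);
};

static struct event_stream_sketch *event_stream;  /* the one active loop */

/* A canned "TTY" implementation with two queued events. */
static int tty_queued = 2;

static int tty_pending (void)
{
  return tty_queued > 0;
}

static void tty_next (struct emacs_event_sketch *ev)
{
  ev->type = 1;
  ev->value = tty_queued--;
}

static struct event_stream_sketch tty_event_stream =
  { "tty", tty_pending, tty_next };

void init_event_stream (void)
{
  event_stream = &tty_event_stream;
}

/* Generic code never knows which implementation is installed. */
int generic_next_event (struct emacs_event_sketch *ev)
{
  if (!event_stream->event_pending_p ())
    return 0;
  event_stream->next_event_cb (ev);
  return 1;
}
```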
5421
5422   Here's the Xt Event Loop Diagram (notice that below a certain point,
5423 it's the same as the above diagram):
5424
5425 @example
5426 asynch. asynch. asynch. asynch.                 [Collectors in
5427  kbd     kbd    process process                    the OS]
5428 events  events  output  output
5429   |       |       |       |
5430   |       |       |       |     asynch. asynch. [Collectors in the
5431   |       |       |       |       X        X     OS and X Window System]
5432   |       |       |       |     events  events
5433   |       |       |       |       |        |
5434   |       |       |       |       |        |
5435   |       |       |       |       |        |    SIGINT, [signal handlers
5436   |       |       |       |       |        |    SIGQUIT,   in XEmacs]
5437   |       |       |       |       |        |    SIGWINCH,
5438   |       |       |       |       |        |    SIGALRM
5439   |       |       |       |       |        |       |
5440   |       |       |       |       |        |       |
5441   |       |       |       |       |        |       |      timeouts
5442   |       |       |       |       |        |       |          |
5443   |       |       |       |       |        |       |          |
5444   |       |       |       |       |        |       V          |
5445   V       V       V       V       V        V      fake        |
5446  file    file    file    file    file     file    file        |
5447  desc.   desc.   desc.   desc.   desc.    desc.   desc.       |
5448  (TTY)   (TTY)   (pipe)  (pipe) (socket) (socket) (pipe)      |
5449   |       |       |       |       |        |       |          |
5450   |       |       |       |       |        |       |          |
5451   |       |       |       |       |        |       |          |
5452   V       V       V       V       V        V       V          V
5453   --->----------------------------------------<---------<------
5454        |              |               |
5455        |              |               |[collected using select() in
5456        |              |               | _XtWaitForSomething(), called
5457        |              |               | from XtAppProcessEvent(), called
5458        |              |               | in emacs_Xt_next_event();
5459        |              |               | dispatched to various callbacks]
5460        |              |               |
5461        |              |               |
5462   emacs_Xt_        p_s_callback(),    | [popup_selection_callback]
5463   event_handler()  x_u_v_s_callback(),| [x_update_vertical_scrollbar_
5464        |           x_u_h_s_callback(),|  callback]
5465        |           search_callback()  | [x_update_horizontal_scrollbar_
5466        |              |               |  callback]
5467        |              |               |
5468        |              |               |
5469   enqueue_Xt_       signal_special_   |
5470   dispatch_event()  Xt_user_event()   |
5471   [maybe multiple     |               |
5472    times, maybe 0     |               |
5473    times]             |               |
5474        |            enqueue_Xt_       |
5475        |            dispatch_event()  |
5476        |              |               |
5477        |              |               |
5478        V              V               |
5479        -->----------<--               |
5480               |                       |
5481               |                       |
5482            dispatch             Xt_what_callback()
5483            event                  sets flags
5484            queue                      |
5485               |                       |
5486               |                       |
5487               |                       |
5488               |                       |
5489               ---->-----------<--------
5490                    |
5491                    |
5492                    |     [collected and converted as appropriate in
5493                    |            emacs_Xt_next_event()]
5494                    |
5495                    |
5496                    V          (above this line is Xt-specific)
5497                  Emacs ------------------------------------------------
5498                  event (below this line is the generic event mechanism)
5499                    |
5500                    |
5501 was there      if not, call
5502 a SIGINT?   emacs_Xt_next_event()
5503     |              |
5504     |              |
5505     |              |
5506     V              V
5507     --->-------<----
5508            |
5509            |        [collected in event_stream_next_event();
5510            |         SIGINT is converted using maybe_read_quit_event()]
5511            V
5512          Emacs
5513          event
5514            |
5515            \---->------>----- maybe_kbd_translate() -->-----\
5516                                                             |
5517                                                             |
5518                                                             |
5519      command event queue                                    |
5520                                               if not from command
5521   (contains events that were                  event queue, call
5522   read earlier but not processed,             event_stream_next_event()
5523   typically when waiting in a                               |
5524   sit-for, sleep-for, etc. for                              |
5525  a particular event to be received)                         |
5526                |                                            |
5527                |                                            |
5528                V                                            V
5529                ---->----------------------------------<------
5530                                                |
5531                                                | [collected in
5532                                                |  next_event_internal()]
5533                                                |
5534  unread-     unread-       event from          |
5535  command-    command-       keyboard       else, call
5536  events      event           macro      next_event_internal()
5537    |           |               |               |
5538    |           |               |               |
5539    |           |               |               |
5540    V           V               V               V
5541    --------->----------------------<------------
5542                      |
5543                      |      [collected in `next-event', which may loop
5544                      |       more than once if the event it gets is on
5545                      |       a dead frame, device, etc.]
5546                      |
5547                      |
5548                      V
5549             feed into top-level event loop,
5550             which repeatedly calls `next-event'
5551             and then dispatches the event
5552             using `dispatch-event'
5553 @end example
5554
5555 @node Specifics About the Emacs Event
5556 @section Specifics About the Emacs Event
5557
5558 @node The Event Stream Callback Routines
5559 @section The Event Stream Callback Routines
5560
5561 @node Other Event Loop Functions
5562 @section Other Event Loop Functions
5563
5564   @code{detect_input_pending()} and @code{input-pending-p} look for
5565 input by calling @code{event_stream->event_pending_p} and looking in
5566 @code{[V]unread-command-event} and the @code{command_event_queue} (they
5567 do not check for an executing keyboard macro, though).
5568
  @code{discard-input} discards any pending command events (and cancels
any keyboard macro currently executing), and puts the other pending
events onto the @code{command_event_queue}.  There is a comment in the
code about a ``race condition'', which is not a good sign.
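The queue-filtering part of @code{discard-input} can be sketched like this, assuming pending events sit in an array and that keypresses, mouse buttons and menu selections count as command events.  The event kinds and the function name are hypothetical, and keyboard-macro cancellation is not shown.

```c
#include <assert.h>
#include <stddef.h>

enum event_kind { KEY_PRESS, BUTTON_PRESS, MENU, PROCESS, TIMEOUT, MAGIC };

static int command_event_p (enum event_kind k)
{
  return k == KEY_PRESS || k == BUTTON_PRESS || k == MENU;
}

/* Keep only the non-command events of PENDING[0..N), in their original
   order; return how many remain. */
size_t discard_command_events (enum event_kind *pending, size_t n)
{
  size_t i, kept = 0;
  for (i = 0; i < n; i++)
    if (!command_event_p (pending[i]))
      pending[kept++] = pending[i];
  return kept;
}
```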
5573
5574   @code{next-command-event} and @code{read-char} are higher-level
5575 interfaces to @code{next-event}.  @code{next-command-event} gets the
5576 next @dfn{command} event (i.e.  keypress, mouse event, menu selection,
5577 or scrollbar action), calling @code{dispatch-event} on any others.
5578 @code{read-char} calls @code{next-command-event} and uses
5579 @code{event_to_character()} to return the character equivalent.  With
the right kind of input method support, it is possible for
@code{(read-char)} to return a Kanji character.
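The loop inside @code{next-command-event} can be sketched as follows.  The helpers @code{next_event_sketch()} and @code{dispatch_event_sketch()} are hypothetical stand-ins for @code{next-event} and @code{dispatch-event} that read from a fixed array instead of real input.

```c
#include <assert.h>
#include <stddef.h>

enum kind { KEY, MOUSE, MENU_SEL, SCROLLBAR, PROCESS_OUTPUT, TIMER };

static const enum kind *input;          /* canned event source */
static size_t input_pos;
static int dispatched;

static enum kind next_event_sketch (void)
{
  return input[input_pos++];
}

static void dispatch_event_sketch (enum kind k)
{
  (void) k;                             /* e.g. run a timer, read output */
  dispatched++;
}

static int command_event_p (enum kind k)
{
  return k == KEY || k == MOUSE || k == MENU_SEL || k == SCROLLBAR;
}

/* Fetch events, dispatching the non-command ones, until a command
   event arrives; return it undispatched. */
enum kind next_command_event_sketch (void)
{
  for (;;)
    {
      enum kind k = next_event_sketch ();
      if (command_event_p (k))
        return k;
      dispatch_event_sketch (k);
    }
}
```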
5582
5583 @node Converting Events
5584 @section Converting Events
5585
5586   @code{character_to_event()}, @code{event_to_character()},
5587 @code{event-to-character}, and @code{character-to-event} convert between
5588 characters and keypress events corresponding to the characters.  If the
5589 event was not a keypress, @code{event_to_character()} returns -1 and
5590 @code{event-to-character} returns @code{nil}.  These functions convert
5591 between character representation and the split-up event representation
5592 (keysym plus mod keys).
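The split-up representation can be sketched for plain ASCII as a keysym plus modifier bits, with control characters mapping to a letter plus the control modifier.  This is a hypothetical simplification: the struct and function names are illustrative, and the real functions handle many more cases (meta, named keys, non-ASCII characters).

```c
#include <assert.h>

#define MOD_CONTROL 1

struct key_event_sketch {
  int keysym;                           /* here just an ASCII code */
  int modifiers;
};

struct key_event_sketch character_to_event_sketch (int c)
{
  struct key_event_sketch ev;
  if (c >= 1 && c <= 26)                /* C-a .. C-z */
    {
      ev.keysym = 'a' + c - 1;
      ev.modifiers = MOD_CONTROL;
    }
  else
    {
      ev.keysym = c;
      ev.modifiers = 0;
    }
  return ev;
}

/* Return the character for EV, or -1 if it has none. */
int event_to_character_sketch (struct key_event_sketch ev)
{
  if (ev.modifiers == 0)
    return ev.keysym;
  if (ev.modifiers == MOD_CONTROL && ev.keysym >= 'a' && ev.keysym <= 'z')
    return ev.keysym - 'a' + 1;
  return -1;
}
```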
5593
5594 @node Dispatching Events; The Command Builder
5595 @section Dispatching Events; The Command Builder
5596
5597 Not yet documented.
5598
5599 @node Evaluation; Stack Frames; Bindings, Symbols and Variables, Events and the Event Loop, Top
5600 @chapter Evaluation; Stack Frames; Bindings
5601
5602 @menu
5603 * Evaluation::
5604 * Dynamic Binding; The specbinding Stack; Unwind-Protects::
5605 * Simple Special Forms::
5606 * Catch and Throw::
5607 @end menu
5608
5609 @node Evaluation
5610 @section Evaluation
5611
5612   @code{Feval()} evaluates the form (a Lisp object) that is passed to
5613 it.  Note that evaluation is only non-trivial for two types of objects:
5614 symbols and conses.  A symbol is evaluated simply by calling
5615 @code{symbol-value} on it and returning the value.
5616
5617   Evaluating a cons means calling a function.  First, @code{eval} checks
5618 to see if garbage-collection is necessary, and calls
5619 @code{garbage_collect_1()} if so.  It then increases the evaluation
5620 depth by 1 (@code{lisp_eval_depth}, which is always less than
5621 @code{max_lisp_eval_depth}) and adds an element to the linked list of
5622 @code{struct backtrace}'s (@code{backtrace_list}).  Each such structure
5623 contains a pointer to the function being called plus a list of the
5624 function's arguments.  Originally these values are stored unevalled, and
5625 as they are evaluated, the backtrace structure is updated.  Garbage
5626 collection pays attention to the objects pointed to in the backtrace
5627 structures (garbage collection might happen while a function is being
5628 called or while an argument is being evaluated, and there could easily
5629 be no other references to the arguments in the argument list; once an
5630 argument is evaluated, however, the unevalled version is not needed by
5631 eval, and so the backtrace structure is changed).
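The call bookkeeping described above can be sketched as follows.  It is hypothetical: arguments are plain @code{int}s, the too-deep case returns -1 instead of signalling an error, the record omits the evaluated/unevaluated distinction, and @code{call_with_backtrace} and @code{recurse} are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

struct backtrace_sketch {
  struct backtrace_sketch *next;        /* caller's record */
  const char *function;
  const int *args;                      /* visible to the GC */
  int nargs;
};

static struct backtrace_sketch *backtrace_list;
static int lisp_eval_depth;
static int max_lisp_eval_depth = 3;

int call_with_backtrace (const char *fn, const int *args, int nargs,
                         int (*body) (const int *, int))
{
  struct backtrace_sketch bt;
  int result;
  if (lisp_eval_depth + 1 > max_lisp_eval_depth)
    return -1;                          /* nesting too deep */
  lisp_eval_depth++;
  bt.next = backtrace_list;             /* push the record */
  bt.function = fn;
  bt.args = args;
  bt.nargs = nargs;
  backtrace_list = &bt;
  result = body (args, nargs);
  backtrace_list = bt.next;             /* pop it on the way out */
  lisp_eval_depth--;
  return result;
}

/* Example body: recurses ARGS[0] levels, recording the deepest
   backtrace it sees. */
static int deepest_seen;

static int recurse (const int *args, int nargs)
{
  int n = args[0] - 1, depth = 0;
  struct backtrace_sketch *b;
  (void) nargs;
  for (b = backtrace_list; b; b = b->next)
    depth++;
  if (depth > deepest_seen)
    deepest_seen = depth;
  if (n <= 0)
    return 0;
  return call_with_backtrace ("recurse", &n, 1, recurse);
}
```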
5632
5633 At this point, the function to be called is determined by looking at
5634 the car of the cons (if this is a symbol, its function definition is
5635 retrieved and the process repeated).  The function should then consist
5636 of either a @code{Lisp_Subr} (built-in function written in C), a
5637 @code{Lisp_Compiled_Function} object, or a cons whose car is one of the
5638 symbols @code{autoload}, @code{macro} or @code{lambda}.
5639
If the function is a @code{Lisp_Subr}, the Lisp object points to a
5641 @code{struct Lisp_Subr} (created by @code{DEFUN()}), which contains a
5642 pointer to the C function, a minimum and maximum number of arguments
5643 (or possibly the special constants @code{MANY} or @code{UNEVALLED}), a
5644 pointer to the symbol referring to that subr, and a couple of other
5645 things.  If the subr wants its arguments @code{UNEVALLED}, they are
5646 passed raw as a list.  Otherwise, an array of evaluated arguments is
5647 created and put into the backtrace structure, and either passed whole
5648 (@code{MANY}) or each argument is passed as a C argument.
5649
If the function is a @code{Lisp_Compiled_Function},
@code{funcall_compiled_function()} is called.  If the function is a
lambda expression, @code{funcall_lambda()} is called.  If the function
is a macro, the macro's expander function (the cdr of the cons) is
applied to the unevaluated arguments, and the resulting expansion is
then evaluated in place of the original form.  If the function is an
autoload, @code{do_autoload()} is called to load the definition, and
then eval starts over using the newly loaded definition.
5656
5657 When @code{Feval()} exits, the evaluation depth is reduced by one, the
5658 debugger is called if appropriate, and the current backtrace structure
5659 is removed from the list.
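
  The entry and exit bookkeeping described above can be pictured with the
following minimal sketch.  It is not the XEmacs code:
@code{struct toy_backtrace}, @code{toy_eval_enter()} and
@code{toy_eval_exit()} are hypothetical, a frame records only a function
name, and where the real code reports an error when the nesting limit is
exceeded, the sketch simply returns -1.

@example
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for struct backtrace. */
struct toy_backtrace
@{
  const char *function;        /* the function being called */
  struct toy_backtrace *next;  /* the caller's frame */
@};

static struct toy_backtrace *toy_backtrace_list = NULL;
static int toy_eval_depth = 0;
static int toy_max_eval_depth = 3;  /* tiny limit, for demonstration */

/* On entry: bump the depth and push FRAME onto the list.  Returns 0,
   or -1 if the nesting limit would be exceeded. */
static int
toy_eval_enter (struct toy_backtrace *frame, const char *function)
@{
  if (toy_eval_depth + 1 > toy_max_eval_depth)
    return -1;
  toy_eval_depth++;
  frame->function = function;
  frame->next = toy_backtrace_list;
  toy_backtrace_list = frame;
  return 0;
@}

/* On exit: pop FRAME, which must be the innermost one. */
static void
toy_eval_exit (struct toy_backtrace *frame)
@{
  assert (toy_backtrace_list == frame);
  toy_backtrace_list = frame->next;
  toy_eval_depth--;
@}
@end example

In the sketch each caller supplies the frame itself (for example as a
local variable), so pushing a frame needs no allocation.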
5660
5661 Both @code{funcall_compiled_function()} and @code{funcall_lambda()} need
5662 to go through the list of formal parameters to the function and bind
5663 them to the actual arguments, checking for @code{&rest} and
5664 @code{&optional} symbols in the formal parameters and making sure the
5665 number of actual arguments is correct.
5666 @code{funcall_compiled_function()} can do this a little more
5667 efficiently, since the formal parameter list can be checked for sanity
5668 when the compiled function object is created.
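
  A minimal sketch of the parameter-list check, assuming the formal
parameters are available as an array of C strings rather than as a Lisp
list.  The names @code{struct toy_arity}, @code{toy_parse_arglist()} and
@code{toy_nargs_ok()} are hypothetical; a @code{max_args} of -1 marks a
@code{&rest} parameter, loosely analogous to @code{MANY} for subrs.

@example
#include <assert.h>
#include <string.h>

/* Hypothetical summary of a formal parameter list. */
struct toy_arity
@{
  int min_args;   /* number of required parameters */
  int max_args;   /* required + optional, or -1 after &rest */
@};

/* Scan the NPARAMS names in PARAMS (e.g. "a", "&optional", "b",
   "&rest", "c") and fill in *OUT.  Returns 0, or -1 if malformed. */
static int
toy_parse_arglist (const char *const *params, int nparams,
                   struct toy_arity *out)
@{
  int required = 0, optional = 0, seen_optional = 0;
  int i;

  for (i = 0; i < nparams; i++)
    @{
      if (strcmp (params[i], "&optional") == 0)
        @{
          if (seen_optional)
            return -1;
          seen_optional = 1;
        @}
      else if (strcmp (params[i], "&rest") == 0)
        @{
          /* Exactly one ordinary name must follow &rest. */
          if (i + 2 != nparams || params[i + 1][0] == '&')
            return -1;
          out->min_args = required;
          out->max_args = -1;
          return 0;
        @}
      else if (seen_optional)
        optional++;
      else
        required++;
    @}
  out->min_args = required;
  out->max_args = required + optional;
  return 0;
@}

/* Is NARGS actual arguments acceptable for ARITY? */
static int
toy_nargs_ok (const struct toy_arity *arity, int nargs)
@{
  if (nargs < arity->min_args)
    return 0;
  return arity->max_args < 0 || nargs <= arity->max_args;
@}
@end example

Because the parse depends only on the parameter list, it can be done
once in advance and only the cheap @code{toy_nargs_ok()} test repeated
per call, which is the kind of saving described above for compiled
functions.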
5669
@code{funcall_lambda()} simply calls @code{Fprogn} to execute the body
of the lambda expression.
5672
5673 @code{funcall_compiled_function()} calls the real byte-code interpreter
5674 @code{execute_optimized_program()} on the byte-code instructions, which
5675 are converted into an internal form for faster execution.
5676
5677 When a compiled function is executed for the first time by
5678 @code{funcall_compiled_function()}, or when it is @code{Fpurecopy()}ed
5679 during the dump phase of building XEmacs, the byte-code instructions are
5680 converted from a @code{Lisp_String} (which is inefficient to access,
5681 especially in the presence of MULE) into a @code{Lisp_Opaque} object
5682 containing an array of unsigned char, which can be directly executed by
5683 the byte-code interpreter.  At this time the byte code is also analyzed
5684 for validity and transformed into a more optimized form, so that
5685 @code{execute_optimized_program()} can really fly.
5686
5687 Here are some of the optimizations performed by the internal byte-code
5688 transformer:
5689 @enumerate
5690 @item
5691 References to the @code{constants} array are checked for out-of-range
5692 indices, so that the byte interpreter doesn't have to.
5693 @item
References to the @code{constants} array that will be used as a Lisp
variable are checked to be valid non-constant symbols (i.e. not
@code{t}, @code{nil}, or a keyword), so that the byte interpreter
doesn't have to.
5698 @item
The maximum number of variable bindings in the byte-code is
5700 pre-computed, so that space on the @code{specpdl} stack can be
5701 pre-reserved once for the whole function execution.
5702 @item
5703 All byte-code jumps are relative to the current program counter instead
5704 of the start of the program, thereby saving a register.
5705 @item
5706 One-byte relative jumps are converted from the byte-code form of unsigned
5707 chars offset by 127 to machine-friendly signed chars.
5708 @end enumerate
5709
5710 Of course, this transformation of the @code{instructions} should not be
5711 visible to the user, so @code{Fcompiled_function_instructions()} needs
5712 to know how to convert the optimized opaque object back into a Lisp
5713 string that is identical to the original string from the @file{.elc}
file.  (In rare cases, the resulting string may actually contain
slightly different, but equivalent, byte code.)
5716
5717 @code{Ffuncall()} implements Lisp @code{funcall}.  @code{(funcall fun
5718 x1 x2 x3 ...)} is equivalent to @code{(eval (list fun (quote x1) (quote
5719 x2) (quote x3) ...))}.  @code{Ffuncall()} contains its own code to do
5720 the evaluation, however, and is very similar to @code{Feval()}.
5721
5722 From the performance point of view, it is worth knowing that most of the
5723 time in Lisp evaluation is spent executing @code{Lisp_Subr} and
5724 @code{Lisp_Compiled_Function} objects via @code{Ffuncall()} (not
5725 @code{Feval()}).
5726
5727 @code{Fapply()} implements Lisp @code{apply}, which is very similar to
5728 @code{funcall} except that if the last argument is a list, the result is the
5729 same as if each of the arguments in the list had been passed separately.
5730 @code{Fapply()} does some business to expand the last argument if it's a
5731 list, then calls @code{Ffuncall()} to do the work.
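
  The argument spreading done by @code{Fapply()} amounts to flattening
the leading arguments and the elements of the final list into one vector
for @code{Ffuncall()}.  The sketch below shows only that step, using
plain @code{int}s and a hypothetical @code{struct toy_cons} in place of
Lisp objects; @code{toy_spread_args()} is likewise hypothetical and is
not the XEmacs implementation.

@example
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy list cell standing in for a Lisp cons of ints. */
struct toy_cons
@{
  int car;
  const struct toy_cons *cdr;   /* NULL plays the role of nil */
@};

/* Spread (apply F ARG1 ... ARGn LIST): copy the NFIXED leading
   arguments, then append each element of LAST.  Writes at most
   CAPACITY values into OUT and returns the count, or -1 if OUT is
   too small.  The result is what would then be handed to funcall. */
static int
toy_spread_args (const int *fixed, int nfixed, const struct toy_cons *last,
                 int *out, int capacity)
@{
  int n = 0;
  int i;

  for (i = 0; i < nfixed; i++)
    @{
      if (n == capacity)
        return -1;
      out[n++] = fixed[i];
    @}
  for (; last != NULL; last = last->cdr)
    @{
      if (n == capacity)
        return -1;
      out[n++] = last->car;
    @}
  return n;
@}
@end example

For @code{(apply f 1 2 '(3 4))}, the leading arguments 1 and 2 followed
by the list @code{(3 4)} become the flat argument vector 1, 2, 3, 4.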
5732
5733 @code{apply1()}, @code{call0()}, @code{call1()}, @code{call2()}, and
5734 @code{call3()} call a function, passing it the argument(s) given (the
5735 arguments are given as separate C arguments rather than being passed as
5736 an array).  @code{apply1()} uses @code{Fapply()} while the others use
5737 @code{Ffuncall()} to do the real work.
5738
5739 @node Dynamic Binding; The specbinding Stack; Unwind-Protects
5740 @section Dynamic Binding; The specbinding Stack; Unwind-Protects
5741
5742 @example
5743 struct specbinding
5744 @{
5745   Lisp_Object symbol;
5746   Lisp_Object old_value;
5747   Lisp_Object (*func) (Lisp_Object); /* for unwind-protect */
5748 @};
5749 @end example
5750
5751   @code{struct specbinding} is used for local-variable bindings and
5752 unwind-protects.  @code{specpdl} holds an array of @code{struct specbinding}'s,
5753 @code{specpdl_ptr} points to the beginning of the free bindings in the
5754 array, @code{specpdl_size} specifies the total number of binding slots
5755 in the array, and @code{max_specpdl_size} specifies the maximum number
5756 of bindings the array can be expanded to hold.  @code{grow_specpdl()}
5757 increases the size of the @code{specpdl} array, multiplying its size by
5758 2 but never exceeding @code{max_specpdl_size} (except that if this
5759 number is less than 400, it is first set to 400).
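
  The growth policy just described can be sketched as follows.  This is
a hypothetical model, not @code{grow_specpdl()} itself: the names are
invented, the starting size of 50 is an assumption, and where the sketch
returns -1 when no further growth is allowed, the real code reports an
error.

@example
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the specpdl globals.  Only the sizing
   policy described above is modelled; the entries are placeholders. */
struct toy_binding @{ int symbol; int old_value; @};

#define TOY_INITIAL_SPECPDL_SIZE 50   /* assumed starting size */

static struct toy_binding *toy_specpdl = NULL;
static int toy_specpdl_size = 0;         /* slots currently allocated */
static int toy_max_specpdl_size = 100;   /* deliberately below 400 */

/* Double the array, but never beyond the maximum, which is first
   raised to 400 if it is smaller.  Returns 0 on success, or -1 if
   no more growth is allowed or realloc fails. */
static int
toy_grow_specpdl (void)
@{
  int new_size;
  struct toy_binding *p;

  if (toy_max_specpdl_size < 400)
    toy_max_specpdl_size = 400;
  if (toy_specpdl_size >= toy_max_specpdl_size)
    return -1;

  new_size = (toy_specpdl_size > 0
              ? 2 * toy_specpdl_size : TOY_INITIAL_SPECPDL_SIZE);
  if (new_size > toy_max_specpdl_size)
    new_size = toy_max_specpdl_size;

  p = realloc (toy_specpdl, new_size * sizeof *p);
  if (p == NULL)
    return -1;
  toy_specpdl = p;
  toy_specpdl_size = new_size;
  return 0;
@}
@end example

Starting from an empty array with a maximum of 100, successive calls
raise the maximum to 400 and then grow the array to 50, 100, 200 and 400
slots; a fifth call fails.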
5760
5761   @code{specbind()} binds a symbol to a value and is used for local
5762 variables and @code{let} forms.  The symbol and its old value (which
5763 might be @code{Qunbound}, indicating no prior value) are recorded in the
@code{specpdl} array, and @code{specpdl_ptr} is advanced by one slot.
5765
5766   @code{record_unwind_protect()} implements an @dfn{unwind-protect},
5767 which, when placed around a section of code, ensures that some specified
5768 cleanup routine will be executed even if the code exits abnormally
5769 (e.g. through a @code{throw} or quit).  @code{record_unwind_protect()}
5770 simply adds a new specbinding to the @code{specpdl} array and stores the
5771 appropriate information in it.  The cleanup routine can either be a C
5772 function, which is stored in the @code{func} field, or a @code{progn}
5773 form, which is stored in the @code{old_value} field.
5774
5775   @code{unbind_to()} removes specbindings from the @code{specpdl} array
5776 until the specified position is reached.  Each specbinding can be one of
5777 three types:
5778
5779 @enumerate
5780 @item
5781 an unwind-protect with a C cleanup function (@code{func} is not 0, and
5782 @code{old_value} holds an argument to be passed to the function);
5783 @item
5784 an unwind-protect with a Lisp form (@code{func} is 0, @code{symbol}
5785 is @code{nil}, and @code{old_value} holds the form to be executed with
5786 @code{Fprogn()}); or
5787 @item
5788 a local-variable binding (@code{func} is 0, @code{symbol} is not
@code{nil}, and @code{old_value} holds the old value, which is restored
as the symbol's value).
5791 @end enumerate
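
  The following minimal sketch models the three kinds of entries and the
way @code{unbind_to()} pops them.  It is a toy under stated assumptions,
not the XEmacs code: symbols are indices into an @code{int} array, Lisp
objects are @code{int}s, the array has a fixed size, and the hypothetical
@code{toy_progn()} merely records which ``form'' it would have run.

@example
#include <assert.h>
#include <stddef.h>

#define TOY_NIL (-1)      /* plays the role of nil in the symbol field */
#define TOY_NSYMS 4
#define TOY_SPECPDL_SLOTS 64

/* Toy mirror of struct specbinding. */
struct toy_specbinding
@{
  int symbol;             /* bound symbol, or TOY_NIL */
  int old_value;          /* old value, cleanup argument, or form */
  void (*func) (int);     /* C cleanup function, or NULL */
@};

static int toy_symbol_value[TOY_NSYMS];
static struct toy_specbinding toy_specpdl[TOY_SPECPDL_SLOTS];
static int toy_depth = 0;       /* index of the first free slot */
static int toy_last_form = 0;   /* last form "run" by toy_progn() */
static int toy_cleanup_arg = 0; /* last argument seen by toy_cleanup() */

/* Hypothetical stand-ins for Fprogn() and for a C cleanup function. */
static void toy_progn (int form) @{ toy_last_form = form; @}
static void toy_cleanup (int arg) @{ toy_cleanup_arg = arg; @}

/* Claim the next free slot (the real code grows the array instead). */
static struct toy_specbinding *
toy_push (void)
@{
  assert (toy_depth < TOY_SPECPDL_SLOTS);
  return &toy_specpdl[toy_depth++];
@}

/* Like specbind(): record the old value, then install the new one. */
static void
toy_specbind (int symbol, int value)
@{
  struct toy_specbinding *b = toy_push ();
  b->symbol = symbol;
  b->old_value = toy_symbol_value[symbol];
  b->func = NULL;
  toy_symbol_value[symbol] = value;
@}

/* Like record_unwind_protect() with a C cleanup function. */
static void
toy_record_unwind_protect (void (*func) (int), int arg)
@{
  struct toy_specbinding *b = toy_push ();
  b->symbol = TOY_NIL;
  b->old_value = arg;
  b->func = func;
@}

/* An unwind-protect whose cleanup is a Lisp form. */
static void
toy_record_unwind_form (int form)
@{
  struct toy_specbinding *b = toy_push ();
  b->symbol = TOY_NIL;
  b->old_value = form;
  b->func = NULL;
@}

/* Like unbind_to(): pop entries, newest first, until COUNT remain. */
static void
toy_unbind_to (int count)
@{
  assert (count >= 0 && count <= toy_depth);
  while (toy_depth > count)
    @{
      struct toy_specbinding *b = &toy_specpdl[--toy_depth];
      if (b->func != NULL)
        b->func (b->old_value);                      /* type 1 */
      else if (b->symbol == TOY_NIL)
        toy_progn (b->old_value);                    /* type 2 */
      else
        toy_symbol_value[b->symbol] = b->old_value;  /* type 3 */
    @}
@}
@end example

Because entries are popped newest first, a symbol bound twice ends up
with the value it had before the outer binding, and cleanups run in the
reverse order of their registration.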
5792
5793 @node Simple Special Forms
5794 @section Simple Special Forms
5795
5796 @code{or}, @code{and}, @code{if}, @code{cond}, @code{progn},
5797 @code{prog1}, @code{prog2}, @code{setq}, @code{quote}, @code{function},
5798 @code{let*}, @code{let}, @code{while}
5799
5800 All of these are very simple and work as expected, calling
5801 @code{Feval()} or @code{Fprogn()} as necessary and (in the case of
5802 @code{let} and @code{let*}) using @code{specbind()} to create bindings
5803 and @code{unbind_to()} to undo the bindings when finished.
5804
Note that, with the exception of @code{Fprogn}, these functions are
5806 typically called in real life only in interpreted code, since the byte
5807 compiler knows how to convert calls to these functions directly into
5808 byte code.
5809
5810 @node Catch and Throw
5811 @section Catch and Throw
5812
5813 @example
5814 struct catchtag
5815 @{
5816   Lisp_Object tag;
5817   Lisp_Object val;
5818   struct catchtag *next;
5819   struct gcpro *gcpro;
5820   jmp_buf jmp;
5821   struct backtrace *backlist;
5822   int lisp_eval_depth;
5823   int pdlcount;
5824 @};
5825 @end example
5826
  @code{catch} is a Lisp special form that places a catch around a body
of code.  A catch is a means of non-local exit from the code.  When a
catch is created, a tag is specified; executing a @code{throw} to this
tag exits from the body of code caught with this tag, and the value of
the @code{catch} form is the value given in the call to @code{throw}.
If there is no such call, the body is executed normally and its value
is returned.
5833
5834   Information pertaining to a catch is held in a @code{struct catchtag},
5835 which is placed at the head of a linked list pointed to by
5836 @code{catchlist}.  @code{internal_catch()} is passed a C function to
5837 call (@code{Fprogn()} when Lisp @code{catch} is called) and arguments to
5838 give it, and places a catch around the function.  Each @code{struct
5839 catchtag} is held in the stack frame of the @code{internal_catch()}
5840 instance that created the catch.
5841
5842   @code{internal_catch()} is fairly straightforward.  It stores into the
5843 @code{struct catchtag} the tag name and the current values of
5844 @code{backtrace_list}, @code{lisp_eval_depth}, @code{gcprolist}, and the
5845 offset into the @code{specpdl} array, sets a jump point with @code{_setjmp()}
5846 (storing the jump point into the @code{struct catchtag}), and calls the
5847 function.  Control will return to @code{internal_catch()} either when
5848 the function exits normally or through a @code{_longjmp()} to this jump
5849 point.  In the latter case, @code{throw} will store the value to be
5850 returned into the @code{struct catchtag} before jumping.  When it's
5851 done, @code{internal_catch()} removes the @code{struct catchtag} from
5852 the catchlist and returns the proper value.
5853
5854   @code{Fthrow()} goes up through the catchlist until it finds one with
5855 a matching tag.  It then calls @code{unbind_catch()} to restore
5856 everything to what it was when the appropriate catch was set, stores the
5857 return value in the @code{struct catchtag}, and jumps (with
5858 @code{_longjmp()}) to its jump point.
5859
5860   @code{unbind_catch()} removes all catches from the catchlist until it
5861 finds the correct one.  Some of the catches might have been placed for
5862 error-trapping, and if so, the appropriate entries on the handlerlist
5863 must be removed (see ``errors'').  @code{unbind_catch()} also restores
5864 the values of @code{gcprolist}, @code{backtrace_list}, and
@code{lisp_eval_depth}, and calls @code{unbind_to()} to undo any specbindings
5866 created since the catch.
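
  The following minimal sketch shows the @code{setjmp}/@code{longjmp}
pattern behind @code{internal_catch()} and @code{Fthrow()}.  It is an
illustration under simplifying assumptions, not the XEmacs code: all
names are hypothetical, tags and values are @code{int}s, it uses the
standard @code{setjmp()} and @code{longjmp()} rather than
@code{_setjmp()} and @code{_longjmp()}, and it saves none of the other
state (@code{backtrace_list}, @code{lisp_eval_depth}, @code{gcprolist},
the @code{specpdl} offset) that the real @code{struct catchtag} records.

@example
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy struct catchtag: tag, thrown value, chain link, jump buffer. */
struct toy_catchtag
@{
  int tag;
  volatile int val;   /* written by toy_throw() before the longjmp */
  struct toy_catchtag *next;
  jmp_buf jmp;
@};

static struct toy_catchtag *toy_catchlist = NULL;

/* Like internal_catch(): call FUNC (ARG) with a catch for TAG. */
static int
toy_internal_catch (int tag, int (*func) (int), int arg)
@{
  struct toy_catchtag c;   /* lives in this stack frame */
  int result;

  c.tag = tag;
  c.val = 0;
  c.next = toy_catchlist;
  toy_catchlist = &c;

  if (setjmp (c.jmp) == 0)
    result = func (arg);   /* normal return */
  else
    result = c.val;        /* reached via toy_throw() */

  toy_catchlist = c.next;  /* remove the catch in either case */
  return result;
@}

/* Like Fthrow(): find the innermost matching catch and jump to it. */
static void
toy_throw (int tag, int val)
@{
  struct toy_catchtag *c;

  for (c = toy_catchlist; c != NULL; c = c->next)
    if (c->tag == tag)
      @{
        toy_catchlist = c;   /* drop inner catches, cf. unbind_catch() */
        c->val = val;
        longjmp (c->jmp, 1);
      @}
  abort ();   /* no matching catch; the real code signals an error */
@}

/* Example bodies: toy_body throws 42 to tag 1 when ARG is nonzero;
   toy_nested_body wraps it in an inner catch for tag 2. */
static int
toy_body (int arg)
@{
  if (arg)
    toy_throw (1, 42);
  return arg + 1;
@}

static int
toy_nested_body (int arg)
@{
  return toy_internal_catch (2, toy_body, arg);
@}
@end example

@code{toy_internal_catch (1, toy_nested_body, 1)} returns 42: the throw
to tag 1 skips the inner catch for tag 2, and the catch list is empty
afterwards.  The @code{val} field is declared @code{volatile} because it
is modified between the @code{setjmp()} and the @code{longjmp()} and then
read in the function that called @code{setjmp()}.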
5867
5868
5869 @node Symbols and Variables, Buffers and Textual Representation, Evaluation; Stack Frames; Bindings, Top
5870 @chapter Symbols and Variables
5871
5872 @menu
5873 * Introduction to Symbols::
5874 * Obarrays::
5875 * Symbol Values::
5876 @end menu
5877
5878 @node Introduction to Symbols
5879 @section Introduction to Symbols
5880
5881   A symbol is basically just an object with four fields: a name (a
5882 string), a value (some Lisp object), a function (some Lisp object), and
a property list (usually a list of alternating property/value pairs).
5884 What makes symbols special is that there is usually only one symbol with
5885 a given name, and the symbol is referred to by name.  This makes a
5886 symbol a convenient way of calling up data by name, i.e. of implementing
5887 variables. (The variable's value is stored in the @dfn{value slot}.)
5888 Similarly, functions are referenced by name, and the definition of the
5889 function is stored in a symbol's @dfn{function slot}.  This means that
5890 there can be a distinct function and variable with the same name.  The
5891 property list is used as a more general mechanism of associating
5892 additional values with particular names, and once again the namespace is
5893 independent of the function and variable namespaces.
5894
5895 @node Obarrays
5896 @section Obarrays
5897
5898   The identity of symbols with their names is accomplished through a
5899 structure called an obarray, which is just a poorly-implemented hash
5900 table mapping from strings to symbols whose name is that string. (I say
5901 ``poorly implemented'' because an obarray appears in Lisp as a vector
5902 with some hidden fields rather than as its own opaque type.  This is an
5903 Emacs Lisp artifact that should be fixed.)
5904
5905   Obarrays are implemented as a vector of some fixed size (which should
5906 be a prime for best results), where each ``bucket'' of the vector
contains zero or more symbols, threaded through a hidden @code{next}
5908 field in the symbol.  Lookup of a symbol in an obarray, and adding a
5909 symbol to an obarray, is accomplished through standard hash-table
5910 techniques.
5911
5912   The standard Lisp function for working with symbols and obarrays is
5913 @code{intern}.  This looks up a symbol in an obarray given its name; if
5914 it's not found, a new symbol is automatically created with the specified
5915 name, added to the obarray, and returned.  This is what happens when the
5916 Lisp reader encounters a symbol (or more precisely, encounters the name
5917 of a symbol) in some text that it is reading.  There is a standard
5918 obarray called @code{obarray} that is used for this purpose, although
5919 the Lisp programmer is free to create his own obarrays and @code{intern}
5920 symbols in them.
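
  A minimal sketch of such a table, covering @code{intern} and
@code{intern-soft}.  Everything here is hypothetical: symbols carry only
a name and the hidden chain link, the table is a plain C struct rather
than a Lisp vector, and @code{toy_hash()} is an arbitrary string hash,
not the one XEmacs uses.

@example
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_OBARRAY_SIZE 7   /* a prime, as recommended above */

/* Toy symbol: only the name and the hidden chain link. */
struct toy_symbol
@{
  char *name;
  struct toy_symbol *next;    /* next symbol in the same bucket */
@};

struct toy_obarray
@{
  struct toy_symbol *bucket[TOY_OBARRAY_SIZE];   /* NULL = empty */
@};

/* Hypothetical string hash, reduced to a bucket number. */
static unsigned
toy_hash (const char *s)
@{
  unsigned h = 0;
  while (*s)
    h = h * 31 + (unsigned char) *s++;
  return h % TOY_OBARRAY_SIZE;
@}

/* Like intern-soft: return the symbol named NAME, or NULL. */
static struct toy_symbol *
toy_intern_soft (struct toy_obarray *ob, const char *name)
@{
  struct toy_symbol *s;
  for (s = ob->bucket[toy_hash (name)]; s != NULL; s = s->next)
    if (strcmp (s->name, name) == 0)
      return s;
  return NULL;
@}

/* Like intern: look NAME up, creating and adding a symbol if needed.
   Returns NULL only if memory runs out. */
static struct toy_symbol *
toy_intern (struct toy_obarray *ob, const char *name)
@{
  struct toy_symbol *s = toy_intern_soft (ob, name);
  unsigned h;

  if (s != NULL)
    return s;
  s = malloc (sizeof *s);
  if (s == NULL)
    return NULL;
  s->name = malloc (strlen (name) + 1);
  if (s->name == NULL)
    @{
      free (s);
      return NULL;
    @}
  strcpy (s->name, name);
  h = toy_hash (name);
  s->next = ob->bucket[h];    /* thread onto the front of the bucket */
  ob->bucket[h] = s;
  return s;
@}
@end example

Interning the same name twice in one table returns the same symbol,
while interning it in a second table yields a distinct symbol with the
same name.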
5921
  Note that, once a symbol is in an obarray, it stays there until it
is explicitly removed, and the standard obarray @code{obarray}
5924 always stays around, so once you use any particular variable name, a
5925 corresponding symbol will stay around in @code{obarray} until you exit
5926 XEmacs.
5927
5928   Note that @code{obarray} itself is a variable, and as such there is a
5929 symbol in @code{obarray} whose name is @code{"obarray"} and which
5930 contains @code{obarray} as its value.
5931
  Note also that this call to @code{intern} occurs only in the Lisp
5933 reader, not when the code is executed (at which point the symbol is
5934 already around, stored as such in the definition of the function).
5935
5936   You can create your own obarray using @code{make-vector} (this is
5937 horrible but is an artifact) and intern symbols into that obarray.
Doing that can result in two or more symbols with the same name.
5939 However, at most one of these symbols is in the standard @code{obarray}:
5940 You cannot have two symbols of the same name in any particular obarray.
5941 Note that you cannot add a symbol to an obarray in any fashion other
5942 than using @code{intern}: i.e. you can't take an existing symbol and put
5943 it in an existing obarray.  Nor can you change the name of an existing
5944 symbol. (Since obarrays are vectors, you can violate the consistency of
5945 things by storing directly into the vector, but let's ignore that
5946 possibility.)
5947
5948   Usually symbols are created by @code{intern}, but if you really want,
5949 you can explicitly create a symbol using @code{make-symbol}, giving it
5950 some name.  The resulting symbol is not in any obarray (i.e. it is
5951 @dfn{uninterned}), and you can't add it to any obarray.  Therefore its
5952 primary purpose is as a symbol to use in macros to avoid namespace
5953 pollution.  It can also be used as a carrier of information, but cons
5954 cells could probably be used just as well.
5955
5956   You can also use @code{intern-soft} to look up a symbol but not create
5957 a new one, and @code{unintern} to remove a symbol from an obarray.  This
5958 returns the removed symbol. (Remember: You can't put the symbol back
5959 into any obarray.) Finally, @code{mapatoms} maps over all of the symbols
5960 in an obarray.
5961
5962 @node Symbol Values
5963 @section Symbol Values
5964
5965   The value field of a symbol normally contains a Lisp object.  However,
5966 a symbol can be @dfn{unbound}, meaning that it logically has no value.
5967 This is internally indicated by storing a special Lisp object, called
5968 @dfn{the unbound marker} and stored in the global variable
5969 @code{Qunbound}.  The unbound marker is of a special Lisp object type
5970 called @dfn{symbol-value-magic}.  It is impossible for the Lisp
5971 programmer to directly create or access any object of this type.
5972
5973   @strong{You must not let any ``symbol-value-magic'' object escape to
5974 the Lisp level.}  Printing any of these objects will cause the message
5975 @samp{INTERNAL EMACS BUG} to appear as part of the print representation.
5976 (You may see this normally when you call @code{debug_print()} from the
5977 debugger on a Lisp object.) If you let one of these objects escape to
5978 the Lisp level, you will violate a number of assumptions contained in
the C code and cause the unbound marker to malfunction.
5980
  When a symbol is created, its value and function fields are set
to @code{Qunbound}.  The Lisp programmer can restore these conditions
later using @code{makunbound} or @code{fmakunbound}, and can query to
see whether the value or function fields are @dfn{bound} (i.e. have a
5985 value other than @code{Qunbound}) using @code{boundp} and
5986 @code{fboundp}.  The fields are set to a normal Lisp object using
5987 @code{set} (or @code{setq}) and @code{fset}.
5988
5989   Other symbol-value-magic objects are used as special markers to
5990 indicate variables that have non-normal properties.  This includes any
5991 variables that are tied into C variables (setting the variable magically
5992 sets some global variable in the C code, and likewise for retrieving the
5993 variable's value), variables that magically tie into slots in the
5994 current buffer, variables that are buffer-local, etc.  The
5995 symbol-value-magic object is stored in the value cell in place of
5996 a normal object, and the code to retrieve a symbol's value
5997 (i.e. @code{symbol-value}) knows how to do special things with them.
5998 This means that you should not just fetch the value cell directly if you
5999 want a symbol's value.
6000
6001   The exact workings of this are rather complex and involved and are
6002 well-documented in comments in @file{buffer.c}, @file{symbols.c}, and
6003 @file{lisp.h}.
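
  The idea of a magic marker in the value cell can be sketched as a
tagged cell whose reader and writer dispatch on the tag.  Everything
below is hypothetical and much simpler than the mechanism in those
files: there are only three kinds of cell, values are @code{int}s, and
the one kind of magic shown forwards to a C @code{int} variable.

@example
#include <assert.h>
#include <stddef.h>

/* Toy value cell: an ordinary value, the unbound marker, or a magic
   marker that forwards to a C variable. */
enum toy_cell_kind @{ TOY_NORMAL, TOY_UNBOUND, TOY_FORWARD_INT @};

struct toy_value_cell
@{
  enum toy_cell_kind kind;
  int value;        /* used when kind == TOY_NORMAL */
  int *c_var;       /* used when kind == TOY_FORWARD_INT */
@};

/* Like symbol-value: interpret magic markers instead of returning
   them.  Stores the value in *OUT and returns 0, or returns -1 if
   the symbol is unbound. */
static int
toy_symbol_value (const struct toy_value_cell *cell, int *out)
@{
  switch (cell->kind)
    @{
    case TOY_NORMAL:
      *out = cell->value;
      return 0;
    case TOY_FORWARD_INT:
      *out = *cell->c_var;    /* read the C variable directly */
      return 0;
    case TOY_UNBOUND:
    default:
      return -1;              /* the real code signals void-variable */
    @}
@}

/* Like set: a forwarding cell writes through to the C variable. */
static void
toy_set (struct toy_value_cell *cell, int value)
@{
  if (cell->kind == TOY_FORWARD_INT)
    *cell->c_var = value;
  else
    @{
      cell->kind = TOY_NORMAL;
      cell->value = value;
    @}
@}
@end example

This is why fetching the cell's contents directly is wrong: for a
forwarded variable the cell holds only the marker, and the current value
lives in the C variable.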
6004
6005 @node Buffers and Textual Representation, MULE Character Sets and Encodings, Symbols and Variables, Top
6006 @chapter Buffers and Textual Representation
6007
6008 @menu
6009 * Introduction to Buffers::     A buffer holds a block of text such as a file.
6010 * The Text in a Buffer::        Representation of the text in a buffer.
6011 * Buffer Lists::                Keeping track of all buffers.
6012 * Markers and Extents::         Tagging locations within a buffer.
6013 * Bufbytes and Emchars::        Representation of individual characters.
6014 * The Buffer Object::           The Lisp object corresponding to a buffer.
6015 @end menu
6016
6017 @node Introduction to Buffers
6018 @section Introduction to Buffers
6019
6020   A buffer is logically just a Lisp object that holds some text.
6021 In this, it is like a string, but a buffer is optimized for
6022 frequent insertion and deletion, while a string is not.  Furthermore:
6023
6024 @enumerate
6025 @item
6026 Buffers are @dfn{permanent} objects, i.e. once you create them, they
6027 remain around, and need to be explicitly deleted before they go away.
6028 @item
6029 Each buffer has a unique name, which is a string.  Buffers are
6030 normally referred to by name.  In this respect, they are like
6031 symbols.
6032 @item
6033 Buffers have a default insertion position, called @dfn{point}.
6034 Inserting text (unless you explicitly give a position) goes at point,
6035 and moves point forward past the text.  This is what is going on when
6036 you type text into Emacs.
6037 @item
6038 Buffers have lots of extra properties associated with them.
6039 @item
6040 Buffers can be @dfn{displayed}.  What this means is that there
6041 exist a number of @dfn{windows}, which are objects that correspond
6042 to some visible section of your display, and each window has
6043 an associated buffer, and the current contents of the buffer
6044 are shown in that section of the display.  The redisplay mechanism
6045 (which takes care of doing this) knows how to look at the
6046 text of a buffer and come up with some reasonable way of displaying
6047 this.  Many of the properties of a buffer control how the
6048 buffer's text is displayed.
6049 @item
6050 One buffer is distinguished and called the @dfn{current buffer}.  It is
6051 stored in the variable @code{current_buffer}.  Buffer operations operate
6052 on this buffer by default.  When you are typing text into a buffer, the
6053 buffer you are typing into is always @code{current_buffer}.  Switching
6054 to a different window changes the current buffer.  Note that Lisp code
6055 can temporarily change the current buffer using @code{set-buffer} (often
6056 enclosed in a @code{save-excursion} so that the former current buffer
6057 gets restored when the code is finished).  However, calling
6058 @code{set-buffer} will NOT cause a permanent change in the current
6059 buffer.  The reason for this is that the top-level event loop sets
6060 @code{current_buffer} to the buffer of the selected window, each time
6061 it finishes executing a user command.
6062 @end enumerate
6063
6064   Make sure you understand the distinction between @dfn{current buffer}
6065 and @dfn{buffer of the selected window}, and the distinction between
6066 @dfn{point} of the current buffer and @dfn{window-point} of the selected
6067 window. (This latter distinction is explained in detail in the section
6068 on windows.)
6069
6070 @node The Text in a Buffer
6071 @section The Text in a Buffer
6072
6073   The text in a buffer consists of a sequence of zero or more
6074 characters.  A @dfn{character} is an integer that logically represents
6075 a letter, number, space, or other unit of text.  Most of the characters
6076 that you will typically encounter belong to the ASCII set of characters,
6077 but there are also characters for various sorts of accented letters,
6078 special symbols, Chinese and Japanese ideograms (i.e. Kanji, Katakana,
6079 etc.), Cyrillic and Greek letters, etc.  The actual number of possible
6080 characters is quite large.
6081
6082   For now, we can view a character as some non-negative integer that
6083 has some shape that defines how it typically appears (e.g. as an
6084 uppercase A). (The exact way in which a character appears depends on the
6085 font used to display the character.) The internal type of characters in
6086 the C code is an @code{Emchar}; this is just an @code{int}, but using a
6087 symbolic type makes the code clearer.
6088
6089   Between every character in a buffer is a @dfn{buffer position} or
6090 @dfn{character position}.  We can speak of the character before or after
6091 a particular buffer position, and when you insert a character at a
6092 particular position, all characters after that position end up at new
6093 positions.  When we speak of the character @dfn{at} a position, we
6094 really mean the character after the position.  (This schizophrenia
6095 between a buffer position being ``between'' a character and ``on'' a
6096 character is rampant in Emacs.)
6097
6098   Buffer positions are numbered starting at 1.  This means that
6099 position 1 is before the first character, and position 0 is not
6100 valid.  If there are N characters in a buffer, then buffer
6101 position N+1 is after the last one, and position N+2 is not valid.
6102
6103   The internal makeup of the Emchar integer varies depending on whether
6104 we have compiled with MULE support.  If not, the Emchar integer is an
6105 8-bit integer with possible values from 0 - 255.  0 - 127 are the
6106 standard ASCII characters, while 128 - 255 are the characters from the
6107 ISO-8859-1 character set.  If we have compiled with MULE support, an
6108 Emchar is a 19-bit integer, with the various bits having meanings
6109 according to a complex scheme that will be detailed later.  The
6110 characters numbered 0 - 255 still have the same meanings as for the
6111 non-MULE case, though.
6112
6113   Internally, the text in a buffer is represented in a fairly simple
6114 fashion: as a contiguous array of bytes, with a @dfn{gap} of some size
6115 in the middle.  Although the gap is of some substantial size in bytes,
6116 there is no text contained within it: From the perspective of the text
6117 in the buffer, it does not exist.  The gap logically sits at some buffer
6118 position, between two characters (or possibly at the beginning or end of
6119 the buffer).  Insertion of text in a buffer at a particular position is
6120 always accomplished by first moving the gap to that position
6121 (i.e. through some block moving of text), then writing the text into the
6122 beginning of the gap, thereby shrinking the gap.  If the gap shrinks
6123 down to nothing, a new gap is created. (What actually happens is that a
6124 new gap is ``created'' at the end of the buffer's text, which requires
6125 nothing more than changing a couple of indices; then the gap is
6126 ``moved'' to the position where the insertion needs to take place by
6127 moving up in memory all the text after that position.)  Similarly,
6128 deletion occurs by moving the gap to the place where the text is to be
6129 deleted, and then simply expanding the gap to include the deleted text.
6130 (@dfn{Expanding} and @dfn{shrinking} the gap as just described means
6131 just that the internal indices that keep track of where the gap is
6132 located are changed.)
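
  A minimal sketch of the insertion and deletion scheme above, using a
hypothetical fixed-size @code{struct toy_gap_buffer} with 0-based
offsets.  It assumes one byte per character and leaves out markers and
the reallocation that happens when the gap is used up.

@example
#include <assert.h>
#include <string.h>

#define TOY_CAPACITY 32   /* fixed store; the real buffer can grow */

/* Text occupies mem[0 .. gap_start-1] and mem[gap_end .. end]. */
struct toy_gap_buffer
@{
  char mem[TOY_CAPACITY];
  int gap_start;   /* first byte of the gap */
  int gap_end;     /* first byte after the gap */
@};

static void
toy_init (struct toy_gap_buffer *b)
@{
  b->gap_start = 0;
  b->gap_end = TOY_CAPACITY;   /* empty buffer: all gap */
@}

/* Number of bytes of text, not counting the gap. */
static int
toy_length (const struct toy_gap_buffer *b)
@{
  return TOY_CAPACITY - (b->gap_end - b->gap_start);
@}

/* Move the gap so that it sits just before text offset POS. */
static void
toy_move_gap (struct toy_gap_buffer *b, int pos)
@{
  assert (pos >= 0 && pos <= toy_length (b));
  if (pos < b->gap_start)
    @{
      /* Text between POS and the gap moves up, past the gap. */
      int n = b->gap_start - pos;
      memmove (b->mem + b->gap_end - n, b->mem + pos, n);
      b->gap_start -= n;
      b->gap_end -= n;
    @}
  else if (pos > b->gap_start)
    @{
      /* Text just after the gap moves down, into it. */
      int n = pos - b->gap_start;
      memmove (b->mem + b->gap_start, b->mem + b->gap_end, n);
      b->gap_start += n;
      b->gap_end += n;
    @}
@}

/* Insert: move the gap to POS, then write into its beginning. */
static int
toy_insert (struct toy_gap_buffer *b, int pos, const char *text, int len)
@{
  if (len > b->gap_end - b->gap_start)
    return -1;                   /* the real code would enlarge the gap */
  toy_move_gap (b, pos);
  memcpy (b->mem + b->gap_start, text, len);
  b->gap_start += len;           /* the gap shrinks */
  return 0;
@}

/* Delete: move the gap to POS, then widen it over the deleted text. */
static void
toy_delete (struct toy_gap_buffer *b, int pos, int len)
@{
  assert (len >= 0 && pos + len <= toy_length (b));
  toy_move_gap (b, pos);
  b->gap_end += len;             /* no bytes are copied */
@}

/* Copy the text, without the gap, into OUT as a C string. */
static void
toy_contents (const struct toy_gap_buffer *b, char *out)
@{
  memcpy (out, b->mem, b->gap_start);
  memcpy (out + b->gap_start, b->mem + b->gap_end,
          TOY_CAPACITY - b->gap_end);
  out[toy_length (b)] = '\0';
@}
@end example

Inserting @samp{hello world} and then a comma at offset 5 moves only the
six bytes @samp{ world} across the gap; a subsequent deletion of the
first seven bytes moves the gap back to the start and then merely widens
it.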
6133
  Note that the total amount of memory allocated for a buffer's text never
6135 decreases while the buffer is live.  Therefore, if you load up a
6136 20-megabyte file and then delete all but one character, there will be a
6137 20-megabyte gap, which won't get any smaller (except by inserting
6138 characters back again).  Once the buffer is killed, the memory allocated
6139 for the buffer text will be freed, but it will still be sitting on the
6140 heap, taking up virtual memory, and will not be released back to the
6141 operating system. (However, if you have compiled XEmacs with rel-alloc,
6142 the situation is different.  In this case, the space @emph{will} be
6143 released back to the operating system.  However, this tends to result in a
6144 noticeable speed penalty.)
6145
6146   Astute readers may notice that the text in a buffer is represented as
6147 an array of @emph{bytes}, while (at least in the MULE case) an Emchar is
6148 a 19-bit integer, which clearly cannot fit in a byte.  This means (of
6149 course) that the text in a buffer uses a different representation from
6150 an Emchar: specifically, the 19-bit Emchar becomes a series of one to
6151 four bytes.  The conversion between these two representations is complex
6152 and will be described later.
6153
6154   In the non-MULE case, everything is very simple: An Emchar
6155 is an 8-bit value, which fits neatly into one byte.
6156
6157   If we are given a buffer position and want to retrieve the
6158 character at that position, we need to follow these steps:
6159
6160 @enumerate
6161 @item
6162 Pretend there's no gap, and convert the buffer position into a @dfn{byte
6163 index} that indexes to the appropriate byte in the buffer's stream of
6164 textual bytes.  By convention, byte indices begin at 1, just like buffer
6165 positions.  In the non-MULE case, byte indices and buffer positions are
6166 identical, since one character equals one byte.
6167 @item
6168 Convert the byte index into a @dfn{memory index}, which takes the gap
6169 into account.  The memory index is a direct index into the block of
6170 memory that stores the text of a buffer.  This basically just involves
6171 checking to see if the byte index is past the gap, and if so, adding the
6172 size of the gap to it.  By convention, memory indices begin at 1, just
6173 like buffer positions and byte indices, and when referring to the
6174 position that is @dfn{at} the gap, we always use the memory position at
6175 the @emph{beginning}, not at the end, of the gap.
6176 @item
6177 Fetch the appropriate bytes at the determined memory position.
6178 @item
6179 Convert these bytes into an Emchar.
6180 @end enumerate
6181
  In the non-MULE case, (3) and (4) boil down to a simple one-byte
6183 memory access.
6184
6185   Note that we have defined three types of positions in a buffer:
6186
6187 @enumerate
6188 @item
6189 @dfn{buffer positions} or @dfn{character positions}, typedef @code{Bufpos}
6190 @item
6191 @dfn{byte indices}, typedef @code{Bytind}
6192 @item
6193 @dfn{memory indices}, typedef @code{Memind}
6194 @end enumerate
6195
6196   All three typedefs are just @code{int}s, but defining them this way makes
6197 things a lot clearer.
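
  Steps 1 and 2 of the lookup can be sketched for the non-MULE case as
follows.  The typedefs are re-declared only so that the sketch stands
alone, and @code{struct toy_text} and the two conversion functions are
hypothetical.  Fetching (steps 3 and 4) is left out; note that, by the
convention above, the index exactly at the gap maps to the beginning of
the gap rather than to the byte after it.

@example
#include <assert.h>

/* Re-declared here so that the sketch stands alone. */
typedef int Bufpos;   /* character position, 1-based */
typedef int Bytind;   /* byte index, 1-based */
typedef int Memind;   /* memory index, 1-based, counts the gap */

/* Hypothetical record of where the gap is. */
struct toy_text
@{
  Bytind gpt;     /* byte index at which the gap sits */
  int gap_size;   /* size of the gap in bytes */
@};

/* Step 1, non-MULE case only: one character is one byte. */
static Bytind
toy_bufpos_to_bytind (Bufpos pos)
@{
  return (Bytind) pos;
@}

/* Step 2: indices past the gap skip over it; the index that is
   exactly at the gap maps to the beginning of the gap. */
static Memind
toy_bytind_to_memind (const struct toy_text *t, Bytind bi)
@{
  return (Memind) (bi > t->gpt ? bi + t->gap_size : bi);
@}
@end example

With a 10-byte gap sitting at byte index 4, byte indices 1 through 4 map
to themselves, and byte index 5 maps to memory index 15.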
6198
6199   Most code works with buffer positions.  In particular, all Lisp code
6200 that refers to text in a buffer uses buffer positions.  Lisp code does
6201 not know that byte indices or memory indices exist.
6202
6203   Finally, we have a typedef for the bytes in a buffer.  This is a
6204 @code{Bufbyte}, which is an unsigned char.  Referring to them as
6205 Bufbytes underscores the fact that we are working with a string of bytes
6206 in the internal Emacs buffer representation rather than in one of a
6207 number of possible alternative representations (e.g. EUC-encoded text,
6208 etc.).
6209
6210 @node Buffer Lists
6211 @section Buffer Lists
6212
  Recall that buffers are @dfn{permanent} objects, i.e. that they
remain around until explicitly deleted.  This means that there must be
a list of all the buffers in existence.  This list is actually an
6216 assoc-list (mapping from the buffer's name to the buffer) and is stored
6217 in the global variable @code{Vbuffer_alist}.
6218
6219   The order of the buffers in the list is important: the buffers are
6220 ordered approximately from most-recently-used to least-recently-used.
6221 Switching to a buffer using @code{switch-to-buffer},
6222 @code{pop-to-buffer}, etc. and switching windows using
6223 @code{other-window}, etc.  usually brings the new current buffer to the
6224 front of the list.  @code{switch-to-buffer}, @code{other-buffer},
6225 etc. look at the beginning of the list to find an alternative buffer to
6226 suggest.  You can also explicitly move a buffer to the end of the list
6227 using @code{bury-buffer}.
6228
6229   In addition to the global ordering in @code{Vbuffer_alist}, each frame
6230 has its own ordering of the list.  These lists always contain the same
6231 elements as in @code{Vbuffer_alist} although possibly in a different
6232 order.  @code{buffer-list} normally returns the list for the selected
6233 frame.  This allows you to work in separate frames without things
6234 interfering with each other.
6235
6236   The standard way to look up a buffer given a name is
6237 @code{get-buffer}, and the standard way to create a new buffer is
6238 @code{get-buffer-create}, which looks up a buffer with a given name,
6239 creating a new one if necessary.  These operations correspond exactly
6240 with the symbol operations @code{intern-soft} and @code{intern},
6241 respectively.  You can also force a new buffer to be created using
6242 @code{generate-new-buffer}, which takes a name and (if necessary) makes
6243 a unique name from this by appending a number, and then creates the
6244 buffer.  This is basically like the symbol operation @code{gensym}.
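
  A minimal C sketch of the ordering rules above may help.  It is not
the XEmacs implementation: the real list is the Lisp assoc-list in
@code{Vbuffer_alist}, per-frame orderings are ignored, and
@code{struct toy_buffer}, @code{toy_get_buffer},
@code{toy_get_buffer_create}, @code{toy_record_buffer} and
@code{toy_bury_buffer} are hypothetical names.  Appending newly created
buffers at the end of the list is an assumption of the sketch.

@verbatim
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one (NAME . BUFFER) entry of Vbuffer_alist.
   The list is kept in most-recently-used-first order. */
struct toy_buffer {
  char name[64];
  struct toy_buffer *next;
};

/* Append B at the end of *LIST (the least recently used position). */
static void
toy_append (struct toy_buffer **list, struct toy_buffer *b)
{
  while (*list)
    list = &(*list)->next;
  b->next = NULL;
  *list = b;
}

/* Unlink B from *LIST; B must be on the list. */
static void
toy_unlink (struct toy_buffer **list, struct toy_buffer *b)
{
  while (*list != b)
    {
      assert (*list != NULL);
      list = &(*list)->next;
    }
  *list = b->next;
  b->next = NULL;
}

/* Like get-buffer (cf. intern-soft): the buffer named NAME, or NULL. */
struct toy_buffer *
toy_get_buffer (struct toy_buffer *list, const char *name)
{
  for (; list; list = list->next)
    if (strcmp (list->name, name) == 0)
      return list;
  return NULL;
}

/* Like get-buffer-create (cf. intern): look NAME up, creating if needed. */
struct toy_buffer *
toy_get_buffer_create (struct toy_buffer **list, const char *name)
{
  struct toy_buffer *b = toy_get_buffer (*list, name);
  if (b)
    return b;
  assert (strlen (name) < sizeof b->name);
  b = calloc (1, sizeof *b);
  if (b == NULL)
    return NULL;                /* out of memory */
  strcpy (b->name, name);
  toy_append (list, b);
  return b;
}

/* Switching to B brings it to the front of the list. */
void
toy_record_buffer (struct toy_buffer **list, struct toy_buffer *b)
{
  toy_unlink (list, b);
  b->next = *list;
  *list = b;
}

/* Like bury-buffer: move B to the end of the list. */
void
toy_bury_buffer (struct toy_buffer **list, struct toy_buffer *b)
{
  toy_unlink (list, b);
  toy_append (list, b);
}
@end verbatim

In this model, switching to a buffer corresponds to
@code{toy_record_buffer}, and the head of the list is the first
candidate an @code{other-buffer}-like function would consider.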
6245
6246 @node Markers and Extents
6247 @section Markers and Extents
6248
  Some of the objects associated with a buffer are logically attached
to particular buffer positions.  They can be used to keep track of a
buffer position as text is inserted and deleted, so that it stays at the
same spot relative to the surrounding text; to assign properties to
particular sections of text; etc.  There are two such kinds of object:
@dfn{markers} and @dfn{extents}.
6256
6257   A @dfn{marker} is simply a flag placed at a particular buffer
6258 position, which is moved around as text is inserted and deleted.
6259 Markers are used for all sorts of purposes, such as the @code{mark} that
6260 is the other end of textual regions to be cut, copied, etc.
6261
6262   An @dfn{extent} is similar to two markers plus some associated
6263 properties, and is used to keep track of regions in a buffer as text is
6264 inserted and deleted, and to add properties (e.g. fonts) to particular
6265 regions of text.  The external interface of extents is explained
6266 elsewhere.
6267
6268   The important thing here is that markers and extents simply contain
6269 buffer positions in them as integers, and every time text is inserted or
6270 deleted, these positions must be updated.  In order to minimize the
6271 amount of shuffling that needs to be done, the positions in markers and
extents (one per marker, two per extent) are stored as Meminds.
6273 This means that they only need to be moved when the text is physically
6274 moved in memory; since the gap structure tries to minimize this, it also
6275 minimizes the number of marker and extent indices that need to be
6276 adjusted.  Look in @file{insdel.c} for the details of how this works.
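
  The following C sketch illustrates why storing Meminds saves work.  It
is a simplification, not the code in @file{insdel.c}: it assumes one byte
per character and 0-based positions, keeps marker Meminds in a plain
array, and ignores extents; @code{struct toy_gap_buffer},
@code{toy_pos_to_memind}, @code{toy_memind_to_pos}, @code{toy_move_gap}
and @code{toy_insert} are hypothetical names.  Inserting at the gap
changes no Memind at all; only moving the gap adjusts the Meminds that
point into the text being moved.

@verbatim
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical gap buffer, one byte per character.  Text before the gap
   occupies text[0, gpt); the gap is text[gpt, gpt + gap); text after the
   gap runs to the end of the array. */
struct toy_gap_buffer {
  char text[64];
  size_t gpt;   /* where the gap starts */
  size_t gap;   /* size of the gap in bytes */
};

/* The character at position POS lives at this memory index. */
size_t
toy_pos_to_memind (const struct toy_gap_buffer *b, size_t pos)
{
  return pos < b->gpt ? pos : pos + b->gap;
}

/* Inverse conversion, for a Memind that does not point into the gap. */
size_t
toy_memind_to_pos (const struct toy_gap_buffer *b, size_t mi)
{
  return mi < b->gpt ? mi : mi - b->gap;
}

/* Move the gap to POS.  Only the text between the old and new gap
   positions is physically moved, so only the Meminds (entries of
   MARKERS) that point into that text are adjusted. */
void
toy_move_gap (struct toy_gap_buffer *b, size_t pos,
              size_t *markers, size_t nmarkers)
{
  size_t i;
  assert (pos <= sizeof b->text - b->gap);
  if (pos < b->gpt)
    {
      /* text[pos, gpt) slides up to just after the gap. */
      memmove (b->text + pos + b->gap, b->text + pos, b->gpt - pos);
      for (i = 0; i < nmarkers; i++)
        if (markers[i] >= pos && markers[i] < b->gpt)
          markers[i] += b->gap;
    }
  else if (pos > b->gpt)
    {
      /* text[gpt + gap, pos + gap) slides down to just before the gap. */
      memmove (b->text + b->gpt, b->text + b->gpt + b->gap, pos - b->gpt);
      for (i = 0; i < nmarkers; i++)
        if (markers[i] >= b->gpt + b->gap && markers[i] < pos + b->gap)
          markers[i] -= b->gap;
    }
  b->gpt = pos;
}

/* Insert LEN bytes at POS: move the gap there, then fill its start.
   Markers after the gap keep their Memind; their position grows by LEN. */
void
toy_insert (struct toy_gap_buffer *b, size_t pos, const char *s, size_t len,
            size_t *markers, size_t nmarkers)
{
  assert (len <= b->gap);
  toy_move_gap (b, pos, markers, nmarkers);
  memcpy (b->text + b->gpt, s, len);
  b->gpt += len;
  b->gap -= len;
}
@end verbatim

In this sketch a marker sitting exactly at the insertion point ends up
after the inserted text; real marker code has to decide that case
explicitly.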
6277
6278   One other important distinction is that markers are @dfn{temporary}
6279 while extents are @dfn{permanent}.  This means that markers disappear as
6280 soon as there are no more pointers to them, and correspondingly, there
is no way from Lisp to determine what markers are in a buffer if you are just
6282 given the buffer.  Extents remain in a buffer until they are detached
6283 (which could happen as a result of text being deleted) or the buffer is
6284 deleted, and primitives do exist to enumerate the extents in a buffer.
6285
6286 @node Bufbytes and Emchars
6287 @section Bufbytes and Emchars
6288
6289   Not yet documented.
6290
6291 @node The Buffer Object
6292 @section The Buffer Object
6293
6294   Buffers contain fields not directly accessible by the Lisp programmer.
6295 We describe them here, naming them by the names used in the C code.
6296 Many are accessible indirectly in Lisp programs via Lisp primitives.
6297
6298 @table @code
6299 @item name
6300 The buffer name is a string that names the buffer.  It is guaranteed to
6301 be unique.  @xref{Buffer Names,,, lispref, XEmacs Lisp Programmer's
6302 Manual}.
6303
6304 @item save_modified
6305 This field contains the time when the buffer was last saved, as an
6306 integer.  @xref{Buffer Modification,,, lispref, XEmacs Lisp Programmer's
6307 Manual}.
6308
6309 @item modtime
6310 This field contains the modification time of the visited file.  It is
6311 set when the file is written or read.  Every time the buffer is written
6312 to the file, this field is compared to the modification time of the
6313 file.  @xref{Buffer Modification,,, lispref, XEmacs Lisp Programmer's
6314 Manual}.
6315
6316 @item auto_save_modified
6317 This field contains the time when the buffer was last auto-saved.
6318
6319 @item last_window_start
6320 This field contains the @code{window-start} position in the buffer as of
6321 the last time the buffer was displayed in a window.
6322
6323 @item undo_list
6324 This field points to the buffer's undo list.  @xref{Undo,,, lispref,
6325 XEmacs Lisp Programmer's Manual}.
6326
6327 @item syntax_table_v
6328 This field contains the syntax table for the buffer.  @xref{Syntax
6329 Tables,,, lispref, XEmacs Lisp Programmer's Manual}.
6330
6331 @item downcase_table
6332 This field contains the conversion table for converting text to lower
6333 case.  @xref{Case Tables,,, lispref, XEmacs Lisp Programmer's Manual}.
6334
6335 @item upcase_table
6336 This field contains the conversion table for converting text to upper
6337 case.  @xref{Case Tables,,, lispref, XEmacs Lisp Programmer's Manual}.
6338
6339 @item case_canon_table
6340 This field contains the conversion table for canonicalizing text for
6341 case-folding search.  @xref{Case Tables,,, lispref, XEmacs Lisp
6342 Programmer's Manual}.
6343
6344 @item case_eqv_table
6345 This field contains the equivalence table for case-folding search.
6346 @xref{Case Tables,,, lispref, XEmacs Lisp Programmer's Manual}.
6347
6348 @item display_table
6349 This field contains the buffer's display table, or @code{nil} if it
6350 doesn't have one.  @xref{Display Tables,,, lispref, XEmacs Lisp
6351 Programmer's Manual}.
6352
6353 @item markers
6354 This field contains the chain of all markers that currently point into
6355 the buffer.  Deletion of text in the buffer, and motion of the buffer's
6356 gap, must check each of these markers and perhaps update it.
6357 @xref{Markers,,, lispref, XEmacs Lisp Programmer's Manual}.
6358
6359 @item backed_up
6360 This field is a flag that tells whether a backup file has been made for
6361 the visited file of this buffer.
6362
6363 @item mark
6364 This field contains the mark for the buffer.  The mark is a marker,
6365 hence it is also included on the list @code{markers}.  @xref{The Mark,,,
6366 lispref, XEmacs Lisp Programmer's Manual}.
6367
6368 @item mark_active
6369 This field is non-@code{nil} if the buffer's mark is active.
6370
6371 @item local_var_alist
6372 This field contains the association list describing the variables local
6373 in this buffer, and their values, with the exception of local variables
6374 that have special slots in the buffer object.  (Those slots are omitted
6375 from this table.)  @xref{Buffer-Local Variables,,, lispref, XEmacs Lisp
6376 Programmer's Manual}.
6377
6378 @item modeline_format
6379 This field contains a Lisp object which controls how to display the mode
6380 line for this buffer.  @xref{Modeline Format,,, lispref, XEmacs Lisp
6381 Programmer's Manual}.
6382
6383 @item base_buffer
6384 This field holds the buffer's base buffer (if it is an indirect buffer),
6385 or @code{nil}.
6386 @end table
6387
6388 @node MULE Character Sets and Encodings, The Lisp Reader and Compiler, Buffers and Textual Representation, Top
6389 @chapter MULE Character Sets and Encodings
6390
  Recall that there are two primary ways that text is represented in
XEmacs.  The @dfn{buffer} representation sees the text as a series of
bytes (Bufbytes), with a variable number of bytes used per character.
The @dfn{character} representation sees the text as a series of integers
(Emchars), one per character.  The character representation is cleaner
from a theoretical standpoint, and is thus often used when a string
needs a lot of manipulation.  However, the buffer representation is the
standard representation used in both Lisp strings and buffers, and
because of this, it is the ``default'' representation in which text
arrives.  This representation is used because it is compact and
compatible with ASCII.
6402
6403 @menu
6404 * Character Sets::
6405 * Encodings::
6406 * Internal Mule Encodings::
6407 * CCL::
6408 @end menu
6409
6410 @node Character Sets
6411 @section Character Sets
6412
6413   A character set (or @dfn{charset}) is an ordered set of characters.  A
6414 particular character in a charset is indexed using one or more
6415 @dfn{position codes}, which are non-negative integers.  The number of
6416 position codes needed to identify a particular character in a charset is
6417 called the @dfn{dimension} of the charset.  In XEmacs/Mule, all charsets
6418 have dimension 1 or 2, and the size of all charsets (except for a few
6419 special cases) is either 94, 96, 94 by 94, or 96 by 96.  The range of
6420 position codes used to index characters from any of these types of
6421 character sets is as follows:
6422
6423 @example
6424 Charset type            Position code 1         Position code 2
6425 ------------------------------------------------------------
6426 94                      33 - 126                N/A
6427 96                      32 - 127                N/A
6428 94x94                   33 - 126                33 - 126
6429 96x96                   32 - 127                32 - 127
6430 @end example
6431
6432   Note that in the above cases position codes do not start at an
6433 expected value such as 0 or 1.  The reason for this will become clear
6434 later.
6435
6436   For example, Latin-1 is a 96-character charset, and JISX0208 (the
6437 Japanese national character set) is a 94x94-character charset.
6438
6439   [Note that, although the ranges above define the @emph{valid} position
6440 codes for a charset, some of the slots in a particular charset may in
fact be empty.  This is the case for JISX0208, for example, where all
the slots whose first position code is in the range 117 - 126 are
empty.]
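
  The ranges in the first table are easy to check mechanically.  The C
sketch below is only an illustration: @code{enum toy_charset_type} and
@code{toy_valid_position_codes} are hypothetical names, and, as the note
above says, codes that pass the check may still name an empty slot.

@verbatim
#include <assert.h>

/* Hypothetical tags for the four regular charset types. */
enum toy_charset_type { TOY_94, TOY_96, TOY_94x94, TOY_96x96 };

/* Return nonzero if PC1 (and, for dimension-2 types, PC2) lie in the
   valid range for TYPE: 33 - 126 for 94-based types, 32 - 127 for
   96-based ones.  PC2 is ignored for dimension-1 types. */
int
toy_valid_position_codes (enum toy_charset_type type, int pc1, int pc2)
{
  int is94, lo, hi;
  assert (type <= TOY_96x96);
  is94 = (type == TOY_94 || type == TOY_94x94);
  lo = is94 ? 33 : 32;
  hi = is94 ? 126 : 127;
  if (pc1 < lo || pc1 > hi)
    return 0;
  if (type == TOY_94x94 || type == TOY_96x96)
    return pc2 >= lo && pc2 <= hi;
  return 1;
}
@end verbatim

For example, a 96-character charset such as Latin-1 accepts position
code 32, while a 94x94 charset such as JISX0208 rejects any code of 127.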
6444
6445   There are three charsets that do not follow the above rules.  All of
6446 them have one dimension, and have ranges of position codes as follows:
6447
6448 @example
6449 Charset name            Position code 1
6450 ------------------------------------
6451 ASCII                   0 - 127
6452 Control-1               0 - 31
6453 Composite               0 - some large number
6454 @end example
6455
6456   (The upper bound of the position code for composite characters has not
yet been determined, but it will probably be at least 16,383.)
6458
6459   ASCII is the union of two subsidiary character sets: Printing-ASCII
6460 (the printing ASCII character set, consisting of position codes 33 -
6461 126, like for a standard 94-character charset) and Control-ASCII (the
6462 non-printing characters that would appear in a binary file with codes 0
6463 - 32 and 127).
6464
6465   Control-1 contains the non-printing characters that would appear in a
6466 binary file with codes 128 - 159.
6467
6468   Composite contains characters that are generated by overstriking one
6469 or more characters from other charsets.
6470
6471   Note that some characters in ASCII, and all characters in Control-1,
6472 are @dfn{control} (non-printing) characters.  These have no printed
representation but instead control some other aspect of printing or
display (e.g. TAB, code 9, moves the current character position to the
next tab stop).  All other characters in all charsets are @dfn{graphic}
6476 (printing) characters.
6477
6478   When a binary file is read in, the bytes in the file are assigned to
6479 character sets as follows:
6480
6481 @example
6482 Bytes           Character set           Range
6483 --------------------------------------------------
6484 0 - 127         ASCII                   0 - 127
6485 128 - 159       Control-1               0 - 31
6486 160 - 255       Latin-1                 32 - 127
6487 @end example
6488
6489   This is a bit ad-hoc but gets the job done.
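
  The assignment in the last table can be written as a small C
function.  This is a sketch, not XEmacs code: @code{enum toy_charset}
and @code{toy_byte_to_charset} are hypothetical names.

@verbatim
#include <assert.h>

/* Hypothetical tags for the three charsets raw bytes are assigned to. */
enum toy_charset { TOY_ASCII, TOY_CONTROL_1, TOY_LATIN_1 };

/* Assign byte B (0 - 255) of a binary file to a charset, storing its
   position code in *PC. */
enum toy_charset
toy_byte_to_charset (int b, int *pc)
{
  assert (b >= 0 && b <= 255);
  if (b < 128)
    {
      *pc = b;                  /* 0 - 127   -> ASCII 0 - 127 */
      return TOY_ASCII;
    }
  *pc = b - 128;
  if (b < 160)
    return TOY_CONTROL_1;       /* 128 - 159 -> Control-1 0 - 31 */
  return TOY_LATIN_1;           /* 160 - 255 -> Latin-1 32 - 127 */
}
@end verbatim

For example, byte 0xE9 maps to Latin-1 position code 0x69.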
6490
6491 @node Encodings
6492 @section Encodings
6493
6494   An @dfn{encoding} is a way of numerically representing characters from
6495 one or more character sets.  If an encoding only encompasses one
6496 character set, then the position codes for the characters in that
6497 character set could be used directly.  This is not possible, however, if
6498 more than one character set is to be used in the encoding.
6499
6500   For example, the conversion detailed above between bytes in a binary
6501 file and characters is effectively an encoding that encompasses the
6502 three character sets ASCII, Control-1, and Latin-1 in a stream of 8-bit
6503 bytes.
6504
6505   Thus, an encoding can be viewed as a way of encoding characters from a
6506 specified group of character sets using a stream of bytes, each of which
6507 contains a fixed number of bits (but not necessarily 8, as in the common
6508 usage of ``byte'').
6509
  Here are descriptions of a couple of common encodings:
6512
6513 @menu
6514 * Japanese EUC (Extended Unix Code)::
6515 * JIS7::
6516 @end menu
6517
6518 @node Japanese EUC (Extended Unix Code)
6519 @subsection Japanese EUC (Extended Unix Code)
6520
This encompasses the character sets Printing-ASCII,
Japanese-JISX0201-Kana (half-width katakana, the right half of
JISX0201), Japanese-JISX0208, and Japanese-JISX0212.  It uses 8-bit
bytes.

Note that Printing-ASCII and Japanese-JISX0201-Kana are 94-character
charsets, while Japanese-JISX0208 and Japanese-JISX0212 are
94x94-character charsets.
6527
6528 The encoding is as follows:
6529
6530 @example
6531 Character set            Representation (PC=position-code)
6532 -------------            --------------
6533 Printing-ASCII           PC1
6534 Japanese-JISX0201-Kana   0x8E       | PC1 + 0x80
6535 Japanese-JISX0208        PC1 + 0x80 | PC2 + 0x80
Japanese-JISX0212        0x8F       | PC1 + 0x80 | PC2 + 0x80
6537 @end example
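
  A minimal C sketch of an encoder for a single character in Japanese
EUC follows.  The enum and @code{toy_euc_encode} are hypothetical names.
It relies on the standard EUC-JP convention that JISX0212 characters are
preceded by the single-shift byte 0x8F (just as JISX0201-Kana characters
are preceded by 0x8E); without that prefix they could not be told apart
from JISX0208 characters.

@verbatim
#include <assert.h>

/* Hypothetical tags for the character sets Japanese EUC encompasses. */
enum toy_euc_charset { EUC_ASCII, EUC_KANA, EUC_JISX0208, EUC_JISX0212 };

/* Encode one character into OUT (room for 3 bytes) and return the number
   of bytes written.  PC2 is ignored for the dimension-1 sets.  All four
   sets are 94-based, so every position code is in 33 - 126. */
int
toy_euc_encode (enum toy_euc_charset cs, int pc1, int pc2, unsigned char *out)
{
  assert (pc1 >= 33 && pc1 <= 126);
  switch (cs)
    {
    case EUC_ASCII:                   /* PC1 */
      out[0] = pc1;
      return 1;
    case EUC_KANA:                    /* 0x8E | PC1 + 0x80 */
      out[0] = 0x8E;
      out[1] = pc1 + 0x80;
      return 2;
    case EUC_JISX0208:                /* PC1 + 0x80 | PC2 + 0x80 */
      assert (pc2 >= 33 && pc2 <= 126);
      out[0] = pc1 + 0x80;
      out[1] = pc2 + 0x80;
      return 2;
    case EUC_JISX0212:                /* 0x8F | PC1 + 0x80 | PC2 + 0x80 */
      assert (pc2 >= 33 && pc2 <= 126);
      out[0] = 0x8F;
      out[1] = pc1 + 0x80;
      out[2] = pc2 + 0x80;
      return 3;
    }
  return 0;                           /* unknown charset */
}
@end verbatim

For example, JISX0208 position codes 0x30 0x21 come out as the bytes
0xB0 0xA1.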
6538
6539
6540 @node JIS7
6541 @subsection JIS7
6542
6543 This encompasses the character sets Printing-ASCII,
6544 Japanese-JISX0201-Roman (the left half of JISX0201; this character set
6545 is very similar to Printing-ASCII and is a 94-character charset),
6546 Japanese-JISX0208, and Japanese-JISX0201-Kana.  It uses 7-bit bytes.
6547
6548 Unlike Japanese EUC, this is a @dfn{modal} encoding, which
6549 means that there are multiple states that the encoding can
6550 be in, which affect how the bytes are to be interpreted.
6551 Special sequences of bytes (called @dfn{escape sequences})
6552 are used to change states.
6553
6554   The encoding is as follows:
6555
6556 @example
6557 Character set              Representation (PC=position-code)
6558 -------------              --------------
6559 Printing-ASCII             PC1
6560 Japanese-JISX0201-Roman    PC1
6561 Japanese-JISX0201-Kana     PC1
6562 Japanese-JISX0208          PC1 PC2
6563
6564
6565 Escape sequence   ASCII equivalent   Meaning
6566 ---------------   ----------------   -------
6567 0x1B 0x28 0x4A    ESC ( J            invoke Japanese-JISX0201-Roman
6568 0x1B 0x28 0x49    ESC ( I            invoke Japanese-JISX0201-Kana
6569 0x1B 0x24 0x42    ESC $ B            invoke Japanese-JISX0208
6570 0x1B 0x28 0x42    ESC ( B            invoke Printing-ASCII
6571 @end example
6572
6573   Initially, Printing-ASCII is invoked.
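
  Because JIS7 is modal, an encoder has to remember which character set
is currently invoked and emit an escape sequence only when that changes.
A minimal C sketch follows.  The names are hypothetical, the caller must
supply a large enough output buffer, and switching back to
Printing-ASCII at the end of the output (in @code{toy_jis7_finish}) is a
convention assumed here so that a later reader starts in the initial
state.

@verbatim
#include <assert.h>
#include <string.h>

/* Hypothetical tags for the character sets JIS7 encompasses. */
enum toy_jis7_charset { JIS7_ASCII, JIS7_ROMAN, JIS7_KANA, JIS7_JISX0208 };

/* Escape sequence that invokes each set, indexed by the enum above. */
static const char *const toy_jis7_escape[] = {
  "\033(B",     /* ESC ( B: Printing-ASCII */
  "\033(J",     /* ESC ( J: Japanese-JISX0201-Roman */
  "\033(I",     /* ESC ( I: Japanese-JISX0201-Kana */
  "\033$B"      /* ESC $ B: Japanese-JISX0208 */
};

/* Encoder state: which set is currently invoked (initially ASCII). */
struct toy_jis7_state { enum toy_jis7_charset current; };

/* Append one character at OUT + *LEN, preceded by an escape sequence if
   CS is not the set currently invoked.  PC2 is used only for JISX0208. */
void
toy_jis7_put (struct toy_jis7_state *st, enum toy_jis7_charset cs,
              int pc1, int pc2, char *out, size_t *len)
{
  assert (pc1 >= 33 && pc1 <= 126);
  if (cs != st->current)
    {
      size_t n = strlen (toy_jis7_escape[cs]);
      memcpy (out + *len, toy_jis7_escape[cs], n);
      *len += n;
      st->current = cs;
    }
  out[(*len)++] = (char) pc1;
  if (cs == JIS7_JISX0208)
    {
      assert (pc2 >= 33 && pc2 <= 126);
      out[(*len)++] = (char) pc2;
    }
}

/* Return to Printing-ASCII at the end of the output (assumed convention). */
void
toy_jis7_finish (struct toy_jis7_state *st, char *out, size_t *len)
{
  if (st->current != JIS7_ASCII)
    {
      memcpy (out + *len, toy_jis7_escape[JIS7_ASCII], 3);
      *len += 3;
      st->current = JIS7_ASCII;
    }
}
@end verbatim

For example, the character A followed by two JISX0208 characters
produces A, then ESC $ B, then four bytes of position codes, and finally
ESC ( B.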
6574
6575 @node Internal Mule Encodings
6576 @section Internal Mule Encodings
6577
6578 In XEmacs/Mule, each character set is assigned a unique number, called a
6579 @dfn{leading byte}.  This is used in the encodings of a character.
6580 Leading bytes are in the range 0x80 - 0xFF (except for ASCII, which has
6581 a leading byte of 0), although some leading bytes are reserved.
6582
6583 Charsets whose leading byte is in the range 0x80 - 0x9F are called
6584 @dfn{official} and are used for built-in charsets.  Other charsets are
6585 called @dfn{private} and have leading bytes in the range 0xA0 - 0xFF;
6586 these are user-defined charsets.
6587
6588   More specifically:
6589
6590 @example
6591 Character set           Leading byte
6592 -------------           ------------
6593 ASCII                   0
6594 Composite               0x80
6595 Dimension-1 Official    0x81 - 0x8D
6596                           (0x8E is free)
6597 Control-1               0x8F
6598 Dimension-2 Official    0x90 - 0x99
6599                           (0x9A - 0x9D are free;
6600                            0x9E and 0x9F are reserved)
6601 Dimension-1 Private     0xA0 - 0xEF
6602 Dimension-2 Private     0xF0 - 0xFF
6603 @end example
6604
6605 There are two internal encodings for characters in XEmacs/Mule.  One is
6606 called @dfn{string encoding} and is an 8-bit encoding that is used for
6607 representing characters in a buffer or string.  It uses 1 to 4 bytes per
6608 character.  The other is called @dfn{character encoding} and is a 19-bit
6609 encoding that is used for representing characters individually in a
6610 variable.
6611
6612 (In the following descriptions, we'll ignore composite characters for
6613 the moment.  We also give a general (structural) overview first,
6614 followed later by the exact details.)
6615
6616 @menu
6617 * Internal String Encoding::
6618 * Internal Character Encoding::
6619 @end menu
6620
6621 @node Internal String Encoding
6622 @subsection Internal String Encoding
6623
6624 ASCII characters are encoded using their position code directly.  Other
6625 characters are encoded using their leading byte followed by their
6626 position code(s) with the high bit set.  Characters in private character
6627 sets have their leading byte prefixed with a @dfn{leading byte prefix},
6628 which is either 0x9E or 0x9F. (No character sets are ever assigned these
6629 leading bytes.) Specifically:
6630
6631 @example
6632 Character set           Encoding (PC=position-code, LB=leading-byte)
6633 -------------           --------
ASCII                   PC1
Control-1               LB   | PC1 + 0xA0
Dimension-1 official    LB   | PC1 + 0x80
Dimension-1 private     0x9E | LB         | PC1 + 0x80
Dimension-2 official    LB   | PC1 + 0x80 | PC2 + 0x80
Dimension-2 private     0x9F | LB         | PC1 + 0x80 | PC2 + 0x80
6640 @end example
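
  The table translates directly into an encoder.  In this C sketch,
@code{enum toy_mule_kind} and @code{toy_mule_string_encode} are
hypothetical names, the caller supplies the charset's leading byte, and
composite characters are left out, as in the text.

@verbatim
#include <assert.h>

/* Hypothetical tags for the rows of the string-encoding table. */
enum toy_mule_kind {
  MULE_ASCII, MULE_CONTROL_1, MULE_DIM1_OFFICIAL, MULE_DIM1_PRIVATE,
  MULE_DIM2_OFFICIAL, MULE_DIM2_PRIVATE
};

/* Encode one character into OUT (room for 4 bytes) using the internal
   string encoding; LB is the charset's leading byte (unused for ASCII)
   and PC2 is used only for dimension-2 charsets.  Returns the length. */
int
toy_mule_string_encode (enum toy_mule_kind kind, int lb, int pc1, int pc2,
                        unsigned char *out)
{
  int n = 0;
  assert (pc1 >= 0 && pc1 <= 127 && pc2 >= 0 && pc2 <= 127);
  switch (kind)
    {
    case MULE_ASCII:
      out[n++] = pc1;                 /* PC1 */
      return n;
    case MULE_CONTROL_1:
      out[n++] = lb;                  /* LB | PC1 + 0xA0 */
      out[n++] = pc1 + 0xA0;
      return n;
    case MULE_DIM1_PRIVATE:
      out[n++] = 0x9E;                /* leading-byte prefix, then ... */
      /* fall through */
    case MULE_DIM1_OFFICIAL:
      out[n++] = lb;                  /* LB | PC1 + 0x80 */
      out[n++] = pc1 + 0x80;
      return n;
    case MULE_DIM2_PRIVATE:
      out[n++] = 0x9F;                /* leading-byte prefix, then ... */
      /* fall through */
    case MULE_DIM2_OFFICIAL:
      out[n++] = lb;                  /* LB | PC1 + 0x80 | PC2 + 0x80 */
      out[n++] = pc1 + 0x80;
      out[n++] = pc2 + 0x80;
      return n;
    }
  return 0;                           /* unknown kind */
}
@end verbatim

A character from a dimension-2 private charset takes four bytes, the
maximum for this encoding.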
6641
  The basic characteristic of this encoding is that the first byte
of every character is in the range 0x00 - 0x9F, and the second and
following bytes of every character are in the range 0xA0 - 0xFF.
6645 This means that it is impossible to get out of sync, or more
6646 specifically:
6647
6648 @enumerate
6649 @item
6650 Given any byte position, the beginning of the character it is
6651 within can be determined in constant time.
6652 @item
6653 Given any byte position at the beginning of a character, the
6654 beginning of the next character can be determined in constant
6655 time.
6656 @item
6657 Given any byte position at the beginning of a character, the
6658 beginning of the previous character can be determined in constant
6659 time.
6660 @item
6661 Textual searches can simply treat encoded strings as if they
6662 were encoded in a one-byte-per-character fashion rather than
6663 the actual multi-byte encoding.
6664 @end enumerate
6665
6666   None of the standard non-modal encodings meet all of these
6667 conditions.  For example, EUC satisfies only (2) and (3), while
6668 Shift-JIS and Big5 (not yet described) satisfy only (2). (All
6669 non-modal encodings must satisfy (2), in order to be unambiguous.)
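
  Properties (1) - (3) of the internal string encoding follow from a
single test on each byte, as this C sketch shows.  The macro and
function names are hypothetical.  The sketch scans byte by byte rather
than computing a character's length from its leading byte; this is still
constant time, because no character is longer than four bytes.

@verbatim
#include <assert.h>
#include <stddef.h>

/* A byte below 0xA0 always begins a character in the internal string
   encoding; a byte of 0xA0 or above never does. */
#define TOY_FIRST_BYTE_P(c) ((unsigned char) (c) < 0xA0)

/* (1) Start of the character containing byte offset POS.  At most three
   steps are taken, since no character is longer than four bytes. */
size_t
toy_char_start (const unsigned char *s, size_t pos)
{
  while (!TOY_FIRST_BYTE_P (s[pos]))
    {
      assert (pos > 0);
      pos--;
    }
  return pos;
}

/* (2) Start of the next character, given the start POS of this one
   and the total length LEN of the string. */
size_t
toy_next_char (const unsigned char *s, size_t len, size_t pos)
{
  assert (pos < len && TOY_FIRST_BYTE_P (s[pos]));
  do
    pos++;
  while (pos < len && !TOY_FIRST_BYTE_P (s[pos]));
  return pos;
}

/* (3) Start of the previous character, given the start POS of this one. */
size_t
toy_prev_char (const unsigned char *s, size_t pos)
{
  assert (pos > 0);
  return toy_char_start (s, pos - 1);
}
@end verbatim

Property (4) holds for the same reason: the first byte of an encoded
search pattern is below 0xA0, so a byte-wise match can never begin in
the middle of a character.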
6670
6671 @node Internal Character Encoding
6672 @subsection Internal Character Encoding
6673
6674   One 19-bit word represents a single character.  The word is
6675 separated into three fields:
6676
6677 @example
6678 Bit number:     18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
6679                 <------------> <------------------> <------------------>
6680 Field:                1                  2                    3
6681 @end example
6682
6683   Note that fields 2 and 3 hold 7 bits each, while field 1 holds 5 bits.
6684
6685 @example
6686 Character set           Field 1         Field 2         Field 3
6687 -------------           -------         -------         -------
6688 ASCII                      0               0              PC1
6689    range:                                                   (00 - 7F)
6690 Control-1                  0               1              PC1
6691    range:                                                   (00 - 1F)
6692 Dimension-1 official       0            LB - 0x80         PC1
6693    range:                                    (01 - 0D)      (20 - 7F)
6694 Dimension-1 private        0            LB - 0x80         PC1
6695    range:                                    (20 - 6F)      (20 - 7F)
6696 Dimension-2 official    LB - 0x8F         PC1             PC2
6697    range:                    (01 - 0A)       (20 - 7F)      (20 - 7F)
6698 Dimension-2 private     LB - 0xE1         PC1             PC2
6699    range:                    (0F - 1E)       (20 - 7F)      (20 - 7F)
6700 Composite                 0x1F             ?               ?
6701 @end example
6702
6703   Note that character codes 0 - 255 are the same as the ``binary encoding''
6704 described above.
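
  Packing the three fields is a matter of shifts.  In this C sketch the
names are hypothetical and composite characters are left out.  Since
codes 0 - 255 match the binary encoding, Latin-1 must end up with
field 2 equal to 1, i.e. leading byte 0x81.

@verbatim
#include <assert.h>

/* Hypothetical tags for the rows of the character-encoding table
   (composite characters are omitted). */
enum toy_emchar_kind {
  EMC_ASCII, EMC_CONTROL_1, EMC_DIM1_OFFICIAL, EMC_DIM1_PRIVATE,
  EMC_DIM2_OFFICIAL, EMC_DIM2_PRIVATE
};

/* Field 1 goes in bits 18 - 14, field 2 in bits 13 - 7, field 3 in 6 - 0. */
static int
toy_pack (int f1, int f2, int f3)
{
  assert (f1 >= 0 && f1 < 32 && f2 >= 0 && f2 < 128 && f3 >= 0 && f3 < 128);
  return (f1 << 14) | (f2 << 7) | f3;
}

/* Build the 19-bit code for a character whose charset has leading byte LB
   (ignored for ASCII and Control-1).  PC2 is used only for dimension 2. */
int
toy_make_emchar (enum toy_emchar_kind kind, int lb, int pc1, int pc2)
{
  switch (kind)
    {
    case EMC_ASCII:         return toy_pack (0, 0, pc1);
    case EMC_CONTROL_1:     return toy_pack (0, 1, pc1);
    case EMC_DIM1_OFFICIAL:
    case EMC_DIM1_PRIVATE:  return toy_pack (0, lb - 0x80, pc1);
    case EMC_DIM2_OFFICIAL: return toy_pack (lb - 0x8F, pc1, pc2);
    case EMC_DIM2_PRIVATE:  return toy_pack (lb - 0xE1, pc1, pc2);
    }
  return -1;                /* unknown kind */
}
@end verbatim

For example, with leading byte 0x81 and position code 0x69,
@code{toy_make_emchar} yields 0xE9, the same value as byte 0xE9 in the
binary encoding.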
6705
6706 @node CCL
6707 @section CCL
6708
6709 @example
6710 CCL PROGRAM SYNTAX:
6711      CCL_PROGRAM := (CCL_MAIN_BLOCK
6712                      [ CCL_EOF_BLOCK ])
6713
6714      CCL_MAIN_BLOCK := CCL_BLOCK
6715      CCL_EOF_BLOCK := CCL_BLOCK
6716
6717      CCL_BLOCK := STATEMENT | (STATEMENT [STATEMENT ...])
6718      STATEMENT :=
6719              SET | IF | BRANCH | LOOP | REPEAT | BREAK
             | READ | WRITE | END
6721
6722      SET := (REG = EXPRESSION) | (REG SELF_OP EXPRESSION)
6723             | INT-OR-CHAR
6724
6725      EXPRESSION := ARG | (EXPRESSION OP ARG)
6726
6727      IF := (if EXPRESSION CCL_BLOCK CCL_BLOCK)
6728      BRANCH := (branch EXPRESSION CCL_BLOCK [CCL_BLOCK ...])
6729      LOOP := (loop STATEMENT [STATEMENT ...])
6730      BREAK := (break)
6731      REPEAT := (repeat)
6732              | (write-repeat [REG | INT-OR-CHAR | string])
6733              | (write-read-repeat REG [INT-OR-CHAR | string | ARRAY]?)
6734      READ := (read REG) | (read REG REG)
6735              | (read-if REG ARITH_OP ARG CCL_BLOCK CCL_BLOCK)
6736              | (read-branch REG CCL_BLOCK [CCL_BLOCK ...])
6737      WRITE := (write REG) | (write REG REG)
6738              | (write INT-OR-CHAR) | (write STRING) | STRING
6739              | (write REG ARRAY)
6740      END := (end)
6741
6742      REG := r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7
6743      ARG := REG | INT-OR-CHAR
6744      OP :=   + | - | * | / | % | & | '|' | ^ | << | >> | <8 | >8 | //
6745              | < | > | == | <= | >= | !=
6746      SELF_OP :=
6747              += | -= | *= | /= | %= | &= | '|=' | ^= | <<= | >>=
6748      ARRAY := '[' INT-OR-CHAR ... ']'
6749      INT-OR-CHAR := INT | CHAR
6750
6751 MACHINE CODE:
6752
6753 The machine code consists of a vector of 32-bit words.
6754 The first such word specifies the start of the EOF section of the code;
this is the code executed to do any final processing that is needed
(e.g. designating back to ASCII and left-to-right mode) after all
6757 other encoded/decoded data has been written out.  This is not used for
6758 charset CCL programs.
6759
REGISTER: 0..7  -- referred to by RRR or rrr
6761
6762 OPERATOR BIT FIELD (27-bit): XXXXXXXXXXXXXXX RRR TTTTT
6763         TTTTT (5-bit): operator type
6764         RRR (3-bit): register number
6765         XXXXXXXXXXXXXXXX (15-bit):
6766                 CCCCCCCCCCCCCCC: constant or address
6767                 000000000000rrr: register number
6768
AAAAA:  00000 +
        00001 -
        00010 *
        00011 /
        00100 %
        00101 &
        00110 |
        00111 ^
6777
6778         01000 <<
6779         01001 >>
6780         01010 <8
6781         01011 >8
6782         01100 //
6783         01101 not used
6784         01110 not used
6785         01111 not used
6786
6787         10000 <
6788         10001 >
6789         10010 ==
6790         10011 <=
6791         10100 >=
6792         10101 !=
6793
6794 OPERATORS:      TTTTT RRR XX..
6795
6796 SetCS:          00000 RRR C...C      RRR = C...C
6797 SetCL:          00001 RRR .....      RRR = c...c
6798                 c.............c
6799 SetR:           00010 RRR ..rrr      RRR = rrr
6800 SetA:           00011 RRR ..rrr      RRR = array[rrr]
6801                 C.............C      size of array = C...C
6802                 c.............c      contents = c...c
6803
6804 Jump:           00100 000 c...c      jump to c...c
6805 JumpCond:       00101 RRR c...c      if (!RRR) jump to c...c
6806 WriteJump:      00110 RRR c...c      Write1 RRR, jump to c...c
6807 WriteReadJump:  00111 RRR c...c      Write1, Read1 RRR, jump to c...c
6808 WriteCJump:     01000 000 c...c      Write1 C...C, jump to c...c
6809                 C...C
6810 WriteCReadJump: 01001 RRR c...c      Write1 C...C, Read1 RRR,
6811                 C.............C      and jump to c...c
6812 WriteSJump:     01010 000 c...c      WriteS, jump to c...c
6813                 C.............C
6814                 S.............S
6815                 ...
6816 WriteSReadJump: 01011 RRR c...c      WriteS, Read1 RRR, jump to c...c
6817                 C.............C
6818                 S.............S
6819                 ...
6820 WriteAReadJump: 01100 RRR c...c      WriteA, Read1 RRR, jump to c...c
6821                 C.............C      size of array = C...C
6822                 c.............c      contents = c...c
6823                 ...
6824 Branch:         01101 RRR C...C      if (RRR >= 0 && RRR < C..)
6825                 c.............c      branch to (RRR+1)th address
6826 Read1:          01110 RRR ...        read 1-byte to RRR
6827 Read2:          01111 RRR ..rrr      read 2-byte to RRR and rrr
6828 ReadBranch:     10000 RRR C...C      Read1 and Branch
6829                 c.............c
6830                 ...
6831 Write1:         10001 RRR .....      write 1-byte RRR
6832 Write2:         10010 RRR ..rrr      write 2-byte RRR and rrr
WriteC:         10011 000 .....      write 1-char C...C
6834                 C.............C
6835 WriteS:         10100 000 .....      write C..-byte of string
6836                 C.............C
6837                 S.............S
6838                 ...
6839 WriteA:         10101 RRR .....      write array[RRR]
6840                 C.............C      size of array = C...C
6841                 c.............c      contents = c...c
6842                 ...
6843 End:            10110 000 .....      terminate the execution
6844
6845 SetSelfCS:      10111 RRR C...C      RRR AAAAA= C...C
6846                 ..........AAAAA
6847 SetSelfCL:      11000 RRR .....      RRR AAAAA= c...c
6848                 c.............c
6849                 ..........AAAAA
6850 SetSelfR:       11001 RRR ..Rrr      RRR AAAAA= rrr
6851                 ..........AAAAA
6852 SetExprCL:      11010 RRR ..Rrr      RRR = rrr AAAAA c...c
6853                 c.............c
6854                 ..........AAAAA
6855 SetExprR:       11011 RRR ..rrr      RRR = rrr AAAAA Rrr
6856                 ............Rrr
6857                 ..........AAAAA
6858 JumpCondC:      11100 RRR c...c      if !(RRR AAAAA C..) jump to c...c
6859                 C.............C
6860                 ..........AAAAA
6861 JumpCondR:      11101 RRR c...c      if !(RRR AAAAA rrr) jump to c...c
6862                 ............rrr
6863                 ..........AAAAA
6864 ReadJumpCondC:  11110 RRR c...c      Read1 and JumpCondC
6865                 C.............C
6866                 ..........AAAAA
6867 ReadJumpCondR:  11111 RRR c...c      Read1 and JumpCondR
6868                 ............rrr
6869                 ..........AAAAA
6870 @end example
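
  Decoding the fixed part of an instruction word takes a few masks and
shifts.  The C sketch below assumes that TTTTT occupies the least
significant bits, as the ``X...X RRR TTTTT'' layout suggests, and treats
all remaining bits as one unsigned field; how negative constants and the
extra words of multi-word instructions are handled is not covered.  The
struct and function names are hypothetical.

@verbatim
#include <assert.h>
#include <stdint.h>

/* Fields of one CCL instruction word, assuming the "X...X RRR TTTTT"
   layout with TTTTT in the least significant bits. */
struct toy_ccl_insn {
  int op;         /* TTTTT: operator type, e.g. 0 = SetCS, 14 = Read1 */
  int reg;        /* RRR: register number, 0 - 7 */
  uint32_t arg;   /* remaining high bits: constant, address, or rrr */
};

struct toy_ccl_insn
toy_ccl_decode (uint32_t word)
{
  struct toy_ccl_insn insn;
  insn.op = (int) (word & 0x1F);
  insn.reg = (int) ((word >> 5) & 0x7);
  insn.arg = word >> 8;
  return insn;
}

uint32_t
toy_ccl_encode (int op, int reg, uint32_t arg)
{
  assert (op >= 0 && op < 32 && reg >= 0 && reg < 8);
  return (arg << 8) | ((uint32_t) reg << 5) | (uint32_t) op;
}
@end verbatim

Under these assumptions, @code{toy_ccl_encode (0, 3, 0x41)} would
represent a SetCS instruction that sets r3 to 0x41.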
6871
6872 @node The Lisp Reader and Compiler, Lstreams, MULE Character Sets and Encodings, Top
6873 @chapter The Lisp Reader and Compiler
6874
6875 Not yet documented.
6876
6877 @node Lstreams, Consoles; Devices; Frames; Windows, The Lisp Reader and Compiler, Top
6878 @chapter Lstreams
6879
6880   An @dfn{lstream} is an internal Lisp object that provides a generic
6881 buffering stream implementation.  Conceptually, you send data to the
6882 stream or read data from the stream, not caring what's on the other end
6883 of the stream.  The other end could be another stream, a file
6884 descriptor, a stdio stream, a fixed block of memory, a reallocating
6885 block of memory, etc.  The main purpose of the stream is to provide a
6886 standard interface and to do buffering.  Macros are defined to read or
6887 write characters, so the calling functions do not have to worry about
6888 blocking data together in order to achieve efficiency.
6889
6890 @menu
6891 * Creating an Lstream::         Creating an lstream object.
6892 * Lstream Types::               Different sorts of things that are streamed.
6893 * Lstream Functions::           Functions for working with lstreams.
6894 * Lstream Methods::             Creating new lstream types.
6895 @end menu
6896
6897 @node Creating an Lstream
6898 @section Creating an Lstream
6899
6900 Lstreams come in different types, depending on what is being interfaced
6901 to.  Although the primitive for creating new lstreams is
6902 @code{Lstream_new()}, generally you do not call this directly.  Instead,
6903 you call some type-specific creation function, which creates the lstream
6904 and initializes it as appropriate for the particular type.
6905
6906 All lstream creation functions take a @var{mode} argument, specifying
what mode the lstream should be opened in.  This controls whether the
lstream is for input or output, and optionally whether data should be
6909 blocked up in units of MULE characters.  Note that some types of
6910 lstreams can only be opened for input; others only for output; and
6911 others can be opened either way.  #### Richard Mlynarik thinks that
6912 there should be a strict separation between input and output streams,
6913 and he's probably right.
6914
6915   @var{mode} is a string, one of
6916
6917 @table @code
6918 @item "r"
6919   Open for reading.
6920 @item "w"
6921   Open for writing.
6922 @item "rc"
6923   Open for reading, but ``read'' never returns partial MULE characters.
6924 @item "wc"
6925   Open for writing, but never writes partial MULE characters.
6926 @end table
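
  The four mode strings can be told apart with a few character
comparisons.  In this C sketch the flag names and
@code{toy_parse_lstream_mode} are hypothetical; they are not the flags
XEmacs itself uses.

@verbatim
#include <assert.h>

/* Hypothetical flag bits for an lstream's mode. */
#define TOY_LSTR_READ      1
#define TOY_LSTR_WRITE     2
#define TOY_LSTR_CHARMODE  4   /* never split a MULE character */

/* Map "r", "w", "rc" or "wc" to flag bits; return -1 for anything else. */
int
toy_parse_lstream_mode (const char *mode)
{
  int flags;
  assert (mode != NULL);
  if (mode[0] == 'r')
    flags = TOY_LSTR_READ;
  else if (mode[0] == 'w')
    flags = TOY_LSTR_WRITE;
  else
    return -1;
  if (mode[1] == '\0')
    return flags;
  if (mode[1] == 'c' && mode[2] == '\0')
    return flags | TOY_LSTR_CHARMODE;
  return -1;
}
@end verbatim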
6927
6928 @node Lstream Types
6929 @section Lstream Types
6930
6931 @table @asis
6932 @item stdio
6933
6934 @item filedesc
6935
6936 @item lisp-string
6937
6938 @item fixed-buffer
6939
6940 @item resizing-buffer
6941
6942 @item dynarr
6943
6944 @item lisp-buffer
6945
6946 @item print
6947
6948 @item decoding
6949
6950 @item encoding
6951 @end table
6952
6953 @node Lstream Functions
6954 @section Lstream Functions
6955
6956 @deftypefun {Lstream *} Lstream_new (Lstream_implementation *@var{imp}, CONST char *@var{mode})
6957 Allocate and return a new Lstream.  This function is not really meant to
6958 be called directly; rather, each stream type should provide its own
6959 stream creation function, which creates the stream and does any other
6960 necessary creation stuff (e.g. opening a file).
6961 @end deftypefun
6962
6963 @deftypefun void Lstream_set_buffering (Lstream *@var{lstr}, Lstream_buffering @var{buffering}, int @var{buffering_size})
6964 Change the buffering of a stream.  See @file{lstream.h}.  By default the
6965 buffering is @code{STREAM_BLOCK_BUFFERED}.
6966 @end deftypefun
6967
6968 @deftypefun int Lstream_flush (Lstream *@var{lstr})
6969 Flush out any pending unwritten data in the stream.  Clear any buffered
6970 input data.  Returns 0 on success, -1 on error.
6971 @end deftypefun
6972
6973 @deftypefn Macro int Lstream_putc (Lstream *@var{stream}, int @var{c})
6974 Write out one byte to the stream.  This is a macro and so it is very
6975 efficient.  The @var{c} argument is only evaluated once but the @var{stream}
6976 argument is evaluated more than once.  Returns 0 on success, -1 on
6977 error.
6978 @end deftypefn
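
  The evaluation rules for @code{Lstream_putc} are typical of a
buffered-write macro: a fast path stores into the buffer, and a function
call handles the full-buffer case.  The following C sketch is a
hypothetical toy (@code{struct toy_stream}, @code{toy_putc},
@code{toy_flush_and_putc}), not the real macro from @file{lstream.h}; it
shows why @var{c} is evaluated once while @var{stream} is evaluated
several times.

@verbatim
#include <assert.h>
#include <string.h>

/* Hypothetical toy stream: a small output buffer flushed into a sink
   that stands in for the other end of the stream. */
struct toy_stream {
  unsigned char buf[4];
  int len;                    /* bytes currently buffered */
  unsigned char sink[64];
  int sink_len;
};

/* Slow path: flush the full buffer, then store C.  0 on success, -1 on
   error. */
int
toy_flush_and_putc (struct toy_stream *s, int c)
{
  assert (s->len == (int) sizeof s->buf);     /* only called when full */
  if (s->sink_len + s->len > (int) sizeof s->sink)
    return -1;                                /* the other end is full */
  memcpy (s->sink + s->sink_len, s->buf, (size_t) s->len);
  s->sink_len += s->len;
  s->buf[0] = (unsigned char) c;
  s->len = 1;
  return 0;
}

/* Fast path.  Only one arm of ?: runs, so C is evaluated exactly once;
   S appears several times, so it must not have side effects. */
#define toy_putc(s, c)                                          \
  ((s)->len < (int) sizeof (s)->buf                             \
   ? ((s)->buf[(s)->len++] = (unsigned char) (c), 0)            \
   : toy_flush_and_putc ((s), (c)))
@end verbatim

Writing @code{toy_putc (streams[i++], ch)} with this macro would
evaluate @code{i++} more than once, which is the hazard the description
of @code{Lstream_putc} warns about.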
6979
6980 @deftypefn Macro int Lstream_getc (Lstream *@var{stream})
6981 Read one byte from the stream.  This is a macro and so it is very
6982 efficient.  The @var{stream} argument is evaluated more than once.  Return
6983 value is -1 for EOF or error.
6984 @end deftypefn
6985
6986 @deftypefn Macro void Lstream_ungetc (Lstream *@var{stream}, int @var{c})
Push one byte back onto the input queue.  This will be the next byte
read from the stream.  Any number of bytes can be pushed back, and they
will be read in the reverse of the order in which they were pushed back
(most recent first).  This is necessary for consistency: if several
bytes have been unread and you then read a byte and unread it again, it
must be the first byte read next.  This is a macro and so it is very
efficient.  The @var{c} argument is only evaluated once but the
@var{stream} argument is evaluated more than once.
6995 @end deftypefn
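The most-recent-first rule is what a stack gives you.  Here is a minimal sketch.  It assumes pushed-back bytes are kept on a fixed-size stack in front of the underlying data; the real Lstream layout may differ.  @code{toy_in}, @code{toy_getc}, and @code{toy_ungetc} are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy input stream: a NUL-terminated source string plus a
   stack of pushed-back bytes.  Not the real Lstream structure. */
typedef struct
{
  const char *src;          /* underlying data */
  size_t src_pos;           /* next unread byte of src */
  unsigned char unget[64];  /* pushback stack; top is unget[n_unget - 1] */
  size_t n_unget;
} toy_in;

/* Return the next byte, or -1 at EOF.  Pushed-back bytes come first,
   most recently pushed first. */
static int
toy_getc (toy_in *s)
{
  if (s->n_unget > 0)
    return s->unget[--s->n_unget];
  if (s->src[s->src_pos] == '\0')
    return -1;
  return (unsigned char) s->src[s->src_pos++];
}

/* Push C back so that it is the next byte returned by toy_getc(). */
static void
toy_ungetc (toy_in *s, int c)
{
  assert (s->n_unget < sizeof s->unget);
  s->unget[s->n_unget++] = (unsigned char) c;
}
```

Suppose @samp{a} and then @samp{b} are unread, and @samp{b} is read and unread again.  The sketch then returns @samp{b} first, @samp{a} second, and only then the underlying data.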
6996
6997 @deftypefun int Lstream_fputc (Lstream *@var{stream}, int @var{c})
6998 @deftypefunx int Lstream_fgetc (Lstream *@var{stream})
6999 @deftypefunx void Lstream_fungetc (Lstream *@var{stream}, int @var{c})
7000 Function equivalents of the above macros.
7001 @end deftypefun
7002
7003 @deftypefun int Lstream_read (Lstream *@var{stream}, void *@var{data}, int @var{size})
Read up to @var{size} bytes from the stream into @var{data}.  Return the
number of bytes read.  0 means EOF; -1 means an error occurred and no
bytes were read.
7007 @end deftypefun
7008
7009 @deftypefun int Lstream_write (Lstream *@var{stream}, void *@var{data}, int @var{size})
Write @var{size} bytes from @var{data} to the stream.  Return the number
7011 of bytes written.  -1 means an error occurred and no bytes were written.
7012 @end deftypefun
7013
7014 @deftypefun void Lstream_unread (Lstream *@var{stream}, void *@var{data}, int @var{size})
Push back @var{size} bytes from @var{data} onto the input queue.  The next
7016 call to @code{Lstream_read()} with the same size will read the same
7017 bytes back.  Note that this will be the case even if there is other
7018 pending unread data.
7019 @end deftypefun
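Here is a minimal sketch of one way to get this behavior.  It assumes pushed-back bytes are kept on a stack in front of the underlying data, and all names are hypothetical.  The block is pushed in reverse, so its first byte ends up on top.  That is what makes a same-size read return the same bytes even when other unread data is already pending.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy input stream: a NUL-terminated source plus a stack
   of unread bytes.  Not the real Lstream structure. */
typedef struct
{
  const char *src;
  size_t src_pos;
  unsigned char unget[64];  /* top of stack is unget[n_unget - 1] */
  size_t n_unget;
} toy_in;

static int
toy_getc (toy_in *s)
{
  if (s->n_unget > 0)
    return s->unget[--s->n_unget];
  if (s->src[s->src_pos] == '\0')
    return -1;  /* EOF */
  return (unsigned char) s->src[s->src_pos++];
}

/* Push SIZE bytes back.  They go on the stack last byte first, so the
   first byte of DATA ends up on top and is read first. */
static void
toy_unread (toy_in *s, const void *data, size_t size)
{
  const unsigned char *p = data;
  assert (s->n_unget + size <= sizeof s->unget);
  for (size_t i = size; i > 0; i--)
    s->unget[s->n_unget++] = p[i - 1];
}

/* Read up to SIZE bytes into DATA; return the count (0 at EOF). */
static size_t
toy_read (toy_in *s, void *data, size_t size)
{
  unsigned char *p = data;
  size_t n = 0;
  int c;
  while (n < size && (c = toy_getc (s)) != -1)
    p[n++] = (unsigned char) c;
  return n;
}
```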
7020
7021 @deftypefun int Lstream_close (Lstream *@var{stream})
7022 Close the stream.  All data will be flushed out.
7023 @end deftypefun
7024
7025 @deftypefun void Lstream_reopen (Lstream *@var{stream})
7026 Reopen a closed stream.  This enables I/O on it again.  This is not
7027 meant to be called except from a wrapper routine that reinitializes
7028 variables and such -- the close routine may well have freed some
7029 necessary storage structures, for example.
7030 @end deftypefun
7031
7032 @deftypefun void Lstream_rewind (Lstream *@var{stream})
7033 Rewind the stream to the beginning.
7034 @end deftypefun
7035
7036 @node Lstream Methods
7037 @section Lstream Methods
7038
7039 @deftypefn {Lstream Method} int reader (Lstream *@var{stream}, unsigned char *@var{data}, int @var{size})
7040 Read some data from the stream's end and store it into @var{data}, which
7041 can hold @var{size} bytes.  Return the number of bytes read.  A return
7042 value of 0 means no bytes can be read at this time.  This may be because
7043 of an EOF, or because there is a granularity greater than one byte that
7044 the stream imposes on the returned data, and @var{size} is less than
7045 this granularity. (This will happen frequently for streams that need to
7046 return whole characters, because @code{Lstream_read()} calls the reader
7047 function repeatedly until it has the number of bytes it wants or until 0
is returned.)  The lstream functions do not treat a 0 return from the
reader as EOF or do anything special with it; however, if
@code{Lstream_read()} ends up returning 0, its caller will interpret
that as EOF.  This will normally not happen unless the caller calls
@code{Lstream_read()} with a very small size.
7052
7053 This function can be @code{NULL} if the stream is output-only.
7054 @end deftypefn
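A small model shows how a reader with a granularity interacts with the read loop.  This is a sketch under stated assumptions, not the code in @file{lstream.c}.  @code{unit_src}, @code{unit_reader}, and @code{toy_read} are hypothetical, and the reader deliberately returns at most three two-byte units per call.

```c
#include <assert.h>

/* Hypothetical source whose data comes in 2-byte units (think of a
   stream that must return whole two-byte characters). */
typedef struct
{
  const unsigned char *src;
  int len;
  int pos;
} unit_src;

/* Reader method: copies only whole units, and at most 3 units per call
   to mimic a reader that returns less than it was asked for.  Returns
   0 when it cannot return anything right now: at EOF, or when SIZE is
   smaller than one unit. */
static int
unit_reader (unit_src *s, unsigned char *data, int size)
{
  int n = 0;
  while (size - n >= 2 && s->len - s->pos >= 2 && n < 6)
    {
      data[n++] = s->src[s->pos++];
      data[n++] = s->src[s->pos++];
    }
  return n;
}

/* Sketch of the read loop: keep calling the reader until SIZE bytes
   are gathered or it returns 0 (or -1 on error). */
static int
toy_read (unit_src *s, unsigned char *data, int size)
{
  int total = 0;
  assert (size >= 0);
  while (total < size)
    {
      int got = unit_reader (s, data + total, size - total);
      if (got < 0)
        return total > 0 ? total : -1;  /* assumed error policy */
      if (got == 0)
        break;                          /* nothing more right now */
      total += got;
    }
  return total;
}
```

Asking for 9 bytes yields 8, since the ninth byte would split a unit.  Asking for 1 byte then yields 0 even though data remains, which is exactly the very-small-size case described above.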
7055
7056 @deftypefn {Lstream Method} int writer (Lstream *@var{stream}, CONST unsigned char *@var{data}, int @var{size})
7057 Send some data to the stream's end.  Data to be sent is in @var{data}
7058 and is @var{size} bytes.  Return the number of bytes sent.  This
function can send and return fewer bytes than were passed in; in that
7060 case, the function will just be called again until there is no data left
7061 or 0 is returned.  A return value of 0 means that no more data can be
7062 currently stored, but there is no error; the data will be squirreled
7063 away until the writer can accept data. (This is useful, e.g., if you're
7064 dealing with a non-blocking file descriptor and are getting
7065 @code{EWOULDBLOCK} errors.)  This function can be @code{NULL} if the
7066 stream is input-only.
7067 @end deftypefn
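The division of labor described above can be modeled as follows: the writer may accept only part of the data, and a return of 0 means ``try again later''.  This is a hedged sketch, and the real Lstream buffering code is more involved.  @code{toy_out}, @code{toy_writer}, @code{toy_flush}, and @code{toy_write} are hypothetical names, and a @code{blocked} flag stands in for @code{EWOULDBLOCK}.

```c
#include <assert.h>

/* Hypothetical output side: a sink standing in for the device, a
   per-call limit, and a "blocked" flag standing in for EWOULDBLOCK. */
typedef struct
{
  unsigned char sink[64];    /* bytes the device has accepted */
  int sink_len;
  int max_per_call;          /* writer accepts at most this many per call */
  int blocked;               /* nonzero: writer accepts nothing (returns 0) */
  unsigned char pending[64]; /* data squirreled away for later */
  int pending_len;
} toy_out;

/* Writer method: may accept fewer bytes than offered; 0 means
   "nothing accepted right now, but no error". */
static int
toy_writer (toy_out *o, const unsigned char *data, int size)
{
  int n;
  if (o->blocked)
    return 0;
  n = size < o->max_per_call ? size : o->max_per_call;
  assert (o->sink_len + n <= (int) sizeof o->sink);
  for (int i = 0; i < n; i++)
    o->sink[o->sink_len++] = data[i];
  return n;
}

/* Offer pending data to the writer until it is all sent or the writer
   returns 0; keep whatever is left for a later attempt. */
static void
toy_flush (toy_out *o)
{
  int off = 0;
  while (off < o->pending_len)
    {
      int n = toy_writer (o, o->pending + off, o->pending_len - off);
      if (n <= 0)
        break;  /* this toy never returns -1; real code must treat it as an error */
      off += n;
    }
  for (int i = off; i < o->pending_len; i++)  /* move leftovers to the front */
    o->pending[i - off] = o->pending[i];
  o->pending_len -= off;
}

static void
toy_write (toy_out *o, const unsigned char *data, int size)
{
  assert (o->pending_len + size <= (int) sizeof o->pending);
  for (int i = 0; i < size; i++)
    o->pending[o->pending_len++] = data[i];
  toy_flush (o);
}
```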
7068
7069 @deftypefn {Lstream Method} int rewinder (Lstream *@var{stream})
7070 Rewind the stream.  If this is @code{NULL}, the stream is not seekable.
7071 @end deftypefn
7072
7073 @deftypefn {Lstream Method} int seekable_p (Lstream *@var{stream})
7074 Indicate whether this stream is seekable -- i.e. it can be rewound.
7075 This method is ignored if the stream does not have a rewind method.  If
7076 this method is not present, the result is determined by whether a rewind
7077 method is present.
7078 @end deftypefn
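The rule for combining @code{rewinder} and @code{seekable_p} fits in a few lines.  This sketch assumes a method table holding just those two function pointers.  @code{toy_impl}, @code{toy_lstream}, @code{toy_seekable}, and the sample methods are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical method table holding just the two methods involved. */
typedef struct toy_lstream toy_lstream;
typedef struct
{
  int (*rewinder) (toy_lstream *);
  int (*seekable_p) (toy_lstream *);
} toy_impl;
struct toy_lstream { const toy_impl *imp; };

/* No rewinder: not seekable, and seekable_p is ignored.  A rewinder
   without seekable_p: seekable.  Otherwise, ask seekable_p. */
static int
toy_seekable (toy_lstream *s)
{
  assert (s != NULL && s->imp != NULL);
  if (s->imp->rewinder == NULL)
    return 0;
  if (s->imp->seekable_p == NULL)
    return 1;
  return s->imp->seekable_p (s);
}

/* Sample methods for trying out the rule. */
static int ok_rewind (toy_lstream *s) { (void) s; return 0; }
static int say_yes (toy_lstream *s) { (void) s; return 1; }
static int say_no (toy_lstream *s) { (void) s; return 0; }
```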
7079
7080 @deftypefn {Lstream Method} int flusher (Lstream *@var{stream})
7081 Perform any additional operations necessary to flush the data in this
7082 stream.
7083 @end deftypefn
7084
7085 @deftypefn {Lstream Method} int pseudo_closer (Lstream *@var{stream})
7086 @end deftypefn
7087
7088 @deftypefn {Lstream Method} int closer (Lstream *@var{stream})
7089 Perform any additional operations necessary to close this stream down.
7090 May be @code{NULL}.  This function is called when @code{Lstream_close()}
7091 is called or when the stream is garbage-collected.  When this function
7092 is called, all pending data in the stream will already have been written
7093 out.
7094 @end deftypefn
7095
7096 @deftypefn {Lstream Method} Lisp_Object marker (Lisp_Object @var{lstream}, void (*@var{markfun}) (Lisp_Object))
7097 Mark this object for garbage collection.  Same semantics as a standard
7098 @code{Lisp_Object} marker.  This function can be @code{NULL}.
7099 @end deftypefn
7100
7101 @node Consoles; Devices; Frames; Windows, The Redisplay Mechanism, Lstreams, Top
7102 @chapter Consoles; Devices; Frames; Windows
7103
7104 @menu
7105 * Introduction to Consoles; Devices; Frames; Windows::
7106 * Point::
7107 * Window Hierarchy::
7108 * The Window Object::
7109 @end menu
7110
7111 @node Introduction to Consoles; Devices; Frames; Windows
7112 @section Introduction to Consoles; Devices; Frames; Windows
7113
7114 A window-system window that you see on the screen is called a
7115 @dfn{frame} in Emacs terminology.  Each frame is subdivided into one or
7116 more non-overlapping panes, called (confusingly) @dfn{windows}.  Each
7117 window displays the text of a buffer in it. (See above on Buffers.) Note
7118 that buffers and windows are independent entities: Two or more windows
7119 can be displaying the same buffer (potentially in different locations),
7120 and a buffer can be displayed in no windows.
7121
  A single display screen that contains one or more frames is called
a @dfn{display} (a @dfn{device} in XEmacs terminology and in the names
of the corresponding Lisp objects).  Under most circumstances, there is
only one display.
7124 However, more than one display can exist, for example if you have
7125 a @dfn{multi-headed} console, i.e. one with a single keyboard but
7126 multiple displays. (Typically in such a situation, the various
7127 displays act like one large display, in that the mouse is only
7128 in one of them at a time, and moving the mouse off of one moves
7129 it into another.) In some cases, the different displays will
7130 have different characteristics, e.g. one color and one mono.
7131
7132   XEmacs can display frames on multiple displays.  It can even deal
7133 simultaneously with frames on multiple keyboards (called @dfn{consoles} in
7134 XEmacs terminology).  Here is one case where this might be useful: You
7135 are using XEmacs on your workstation at work, and leave it running.
7136 Then you go home and dial in on a TTY line, and you can use the
7137 already-running XEmacs process to display another frame on your local
7138 TTY.
7139
7140   Thus, there is a hierarchy console -> display -> frame -> window.
7141 There is a separate Lisp object type for each of these four concepts.
7142 Furthermore, there is logically a @dfn{selected console},
7143 @dfn{selected display}, @dfn{selected frame}, and @dfn{selected window}.
7144 Each of these objects is distinguished in various ways, such as being the
7145 default object for various functions that act on objects of that type.
Note that every containing object remembers the ``selected'' object
7147 among the objects that it contains: e.g. not only is there a selected
7148 window, but every frame remembers the last window in it that was
7149 selected, and changing the selected frame causes the remembered window
7150 within it to become the selected window.  Similar relationships apply
7151 for consoles to devices and devices to frames.
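One way to picture the remembered selection is as a chain of pointers.  Each container records its own selected child, and the globally selected window is whatever the chain leads to.  This is a minimal sketch under that assumption; the XEmacs sources do not necessarily store it this way.  All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: each container remembers its own selected child.
   Real XEmacs objects are Lisp objects with many more fields. */
typedef struct { const char *name; } toy_window;
typedef struct { toy_window *selected_window; } toy_frame;
typedef struct { toy_frame *selected_frame; } toy_device;
typedef struct { toy_device *selected_device; } toy_console;

static toy_console *selected_console;

/* In this model the globally selected window is derived by following
   the chain, so it always agrees with each container's memory. */
static toy_window *
toy_selected_window (void)
{
  assert (selected_console != NULL);
  return selected_console->selected_device->selected_frame->selected_window;
}

static void
toy_select_frame (toy_device *d, toy_frame *f)
{
  d->selected_frame = f;  /* F's remembered window becomes selected */
}
```

Selecting another frame and later coming back finds the window that was last selected in that frame, without any extra bookkeeping.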
7152
7153 @node Point
7154 @section Point
7155
7156   Recall that every buffer has a current insertion position, called
7157 @dfn{point}.  Now, two or more windows may be displaying the same buffer,
7158 and the text cursor in the two windows (i.e. @code{point}) can be in
7159 two different places.  You may ask, how can that be, since each
7160 buffer has only one value of @code{point}?  The answer is that each window
7161 also has a value of @code{point} that is squirreled away in it.  There
is only one selected window, and the value of @code{point} in its
buffer corresponds to that window.  When the selected window is changed
from one window to another displaying the same buffer, the old
value of @code{point} is stored into the old window's ``point'' and the
value of @code{point} from the new window is retrieved and made the
value of @code{point} in the buffer.  This means that the value of
@code{point} stashed in the selected window (its @code{pointm} field) is
potentially inaccurate, and if you want to retrieve the correct value of
@code{point} for a window, you must special-case the selected window and retrieve the
7171 buffer's point instead.  This is related to why @code{save-window-excursion}
7172 does not save the selected window's value of @code{point}.
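The swap can be modeled in a few lines.  This sketch assumes a single buffer point plus a per-window stashed value, named after the @code{pointm} field of the window object.  The functions are hypothetical and ignore markers, buffer switching, and everything else the real code handles.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of buffer point vs. a window's stashed point
   (the pointm field). */
typedef struct { long point; } toy_buffer;
typedef struct { toy_buffer *buffer; long pointm; } toy_window;

static toy_window *selected_window;

static void
toy_select_window (toy_window *w)
{
  assert (w != NULL && w->buffer != NULL);
  if (selected_window != NULL)
    /* Stash the old window's point before leaving it. */
    selected_window->pointm = selected_window->buffer->point;
  selected_window = w;
  /* The new window's stashed point becomes the buffer's point. */
  w->buffer->point = w->pointm;
}

/* The correct point of W: the buffer's point if W is selected, since
   W's own pointm may be stale; otherwise W's stashed value. */
static long
toy_window_point (const toy_window *w)
{
  return w == selected_window ? w->buffer->point : w->pointm;
}
```

While a window is selected, moving point changes only the buffer, so that window's @code{pointm} goes stale until another window is selected.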
7173
7174 @node Window Hierarchy
7175 @section Window Hierarchy
7176 @cindex window hierarchy
7177 @cindex hierarchy of windows
7178
7179   If a frame contains multiple windows (panes), they are always created
7180 by splitting an existing window along the horizontal or vertical axis.
7181 Terminology is a bit confusing here: to @dfn{split a window
7182 horizontally} means to create two side-by-side windows, i.e. to make a
7183 @emph{vertical} cut in a window.  Likewise, to @dfn{split a window
7184 vertically} means to create two windows, one above the other, by making
7185 a @emph{horizontal} cut.
7186
7187   If you split a window and then split again along the same axis, you
7188 will end up with a number of panes all arranged along the same axis.
7189 The precise way in which the splits were made should not be important,
7190 and this is reflected internally.  Internally, all windows are arranged
7191 in a tree, consisting of two types of windows, @dfn{combination} windows
7192 (which have children, and are covered completely by those children) and
7193 @dfn{leaf} windows, which have no children and are visible.  Every
7194 combination window has two or more children, all arranged along the same
axis.  There are (logically) two subtypes of combination windows,
depending on whether their children are horizontally or vertically
arrayed.  There is
7197 always one root window, which is either a leaf window (if the frame
7198 contains only one window) or a combination window (if the frame contains
7199 more than one window).  In the latter case, the root window will have
7200 two or more children, either horizontally or vertically arrayed, and
7201 each of those children will be either a leaf window or another
7202 combination window.
7203
7204   Here are some rules:
7205
7206 @enumerate
7207 @item
7208 Horizontal combination windows can never have children that are
7209 horizontal combination windows; same for vertical.
7210
7211 @item
Only leaf windows can be split (obviously), and this splitting does one
of two things: (a) puts a new combination window in the leaf's place in
the tree, makes the leaf one of its children, and creates a new leaf as
the other child, or (b) keeps the leaf where it is and creates a new
leaf as its sibling.  Rule (1) dictates which of these two outcomes
happens.
7217
7218 @item
7219 Every combination window must have at least two children.
7220
7221 @item
7222 Leaf windows can never become combination windows.  They can be deleted,
7223 however.  If this results in a violation of (3), the parent combination
7224 window also gets deleted.
7225
7226 @item
7227 All functions that accept windows must be prepared to accept combination
7228 windows, and do something sane (e.g. signal an error if so).
7229 Combination windows @emph{do} escape to the Lisp level.
7230
7231 @item
7232 All windows have three fields governing their contents:
these are @dfn{hchild} (the first of the horizontally-arrayed children),
@dfn{vchild} (the first of the vertically-arrayed children), and @dfn{buffer}
(the buffer contained in a leaf window).  Exactly one of
these will be non-nil.  Remember that @dfn{horizontally-arrayed}
means ``side-by-side'' and @dfn{vertically-arrayed} means
``one above the other''.
7239
7240 @item
7241 Leaf windows also have markers in their @code{start} (the
7242 first buffer position displayed in the window) and @code{pointm}
7243 (the window's stashed value of @code{point} -- see above) fields,
7244 while combination windows have nil in these fields.
7245
7246 @item
7247 The list of children for a window is threaded through the
7248 @code{next} and @code{prev} fields of each child window.
7249
7250 @item
7251 @strong{Deleted windows can be undeleted}.  This happens as a result of
7252 restoring a window configuration, and is unlike frames, displays, and
7253 consoles, which, once deleted, can never be restored.  Deleting a window
7254 does nothing except set a special @code{dead} bit to 1 and clear out the
7255 @code{next}, @code{prev}, @code{hchild}, and @code{vchild} fields, for
7256 GC purposes.
7257
7258 @item
7259 Most frames actually have two top-level windows -- one for the
7260 minibuffer and one (the @dfn{root}) for everything else.  The modeline
7261 (if present) separates these two.  The @code{next} field of the root
7262 points to the minibuffer, and the @code{prev} field of the minibuffer
7263 points to the root.  The other @code{next} and @code{prev} fields are
7264 @code{nil}, and the frame points to both of these windows.
7265 Minibuffer-less frames have no minibuffer window, and the @code{next}
7266 and @code{prev} of the root window are @code{nil}.  Minibuffer-only
7267 frames have no root window, and the @code{next} of the minibuffer window
7268 is @code{nil} but the @code{prev} points to itself. (#### This is an
7269 artifact that should be fixed.)
7270 @end enumerate
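A small model of the tree makes rules (1) and (2) concrete.  This is a sketch under simplifying assumptions.  A single @code{child} pointer stands in for @code{hchild}/@code{vchild}, with the @code{kind} field saying which.  There is no minibuffer window.  @code{toy_split} and the other names are hypothetical, not functions from @file{window.c}.

```c
#include <assert.h>
#include <stdlib.h>

/* HCOMB: children side by side; VCOMB: children one above the other. */
enum win_kind { LEAF, HCOMB, VCOMB };

typedef struct win
{
  enum win_kind kind;
  struct win *parent, *next, *prev;
  struct win *child;  /* first child; stands in for hchild/vchild */
} win;

static win *
new_win (enum win_kind kind)
{
  win *w = calloc (1, sizeof *w);
  assert (w != NULL);
  w->kind = kind;
  return w;
}

static int
n_children (const win *w)
{
  int n = 0;
  for (const win *c = w->child; c != NULL; c = c->next)
    n++;
  return n;
}

/* Split leaf W so the two panes are arrayed along AXIS.  Updates *ROOT
   if W was the root, and returns the new leaf. */
static win *
toy_split (win **root, win *w, enum win_kind axis)
{
  win *fresh = new_win (LEAF);
  assert (w->kind == LEAF && axis != LEAF);
  if (w->parent == NULL || w->parent->kind != axis)
    {
      /* Outcome (a): a new combination window takes W's place, and W
         becomes its first child.  Rule (1) holds because the parent
         (if any) is arrayed along the other axis. */
      win *comb = new_win (axis);
      comb->parent = w->parent;
      comb->prev = w->prev;
      comb->next = w->next;
      if (w->next != NULL)
        w->next->prev = comb;
      if (w->prev != NULL)
        w->prev->next = comb;
      else if (w->parent != NULL)
        w->parent->child = comb;
      else
        *root = comb;
      comb->child = w;
      w->parent = comb;
      w->prev = w->next = NULL;
    }
  /* Outcome (b), and the rest of (a): link the new leaf right after W. */
  fresh->parent = w->parent;
  fresh->prev = w;
  fresh->next = w->next;
  if (w->next != NULL)
    w->next->prev = fresh;
  w->next = fresh;
  return fresh;
}
```

Splitting side by side twice leaves one horizontal combination with three leaf children, whichever leaf was split the second time.  This matches the point above that the precise order of the splits should not matter.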
7271
7272 @node The Window Object
7273 @section The Window Object
7274
7275   Windows have the following accessible fields:
7276
7277 @table @code
7278 @item frame
7279 The frame that this window is on.
7280
7281 @item mini_p
7282 Non-@code{nil} if this window is a minibuffer window.
7283
7284 @item buffer
7285 The buffer that the window is displaying.  This may change often during
7286 the life of the window.
7287
7288 @item dedicated
7289 Non-@code{nil} if this window is dedicated to its buffer.
7290
7291 @item pointm
7292 @cindex window point internals
7293 This is the value of point in the current buffer when this window is
7294 selected; when it is not selected, it retains its previous value.
7295
7296 @item start
7297 The position in the buffer that is the first character to be displayed
7298 in the window.
7299
7300 @item force_start
7301 If this flag is non-@code{nil}, it says that the window has been
7302 scrolled explicitly by the Lisp program.  This affects what the next
7303 redisplay does if point is off the screen: instead of scrolling the
7304 window to show the text around point, it moves point to a location that
7305 is on the screen.
7306
7307 @item last_modified
7308 The @code{modified} field of the window's buffer, as of the last time
7309 a redisplay completed in this window.
7310
7311 @item last_point
7312 The buffer's value of point, as of the last time
7313 a redisplay completed in this window.
7314
7315 @item left
7316 This is the left-hand edge of the window, measured in columns.  (The
7317 leftmost column on the screen is @w{column 0}.)
7318
7319 @item top
7320 This is the top edge of the window, measured in lines.  (The top line on
7321 the screen is @w{line 0}.)
7322
7323 @item height
7324 The height of the window, measured in lines.
7325
7326 @item width
7327 The width of the window, measured in columns.
7328
7329 @item next
7330 This is the window that is the next in the chain of siblings.  It is
7331 @code{nil} in a window that is the rightmost or bottommost of a group of
7332 siblings.
7333
7334 @item prev
7335 This is the window that is the previous in the chain of siblings.  It is
7336 @code{nil} in a window that is the leftmost or topmost of a group of
7337 siblings.
7338
7339 @item parent
7340 Internally, XEmacs arranges windows in a tree; each group of siblings has
7341 a parent window whose area includes all the siblings.  This field points
7342 to a window's parent.
7343
7344 Parent windows do not display buffers, and play little role in display
7345 except to shape their child windows.  Emacs Lisp programs usually have
7346 no access to the parent windows; they operate on the windows at the
7347 leaves of the tree, which actually display buffers.
7348
7349 @item hscroll
7350 This is the number of columns that the display in the window is scrolled
7351 horizontally to the left.  Normally, this is 0.
7352
7353 @item use_time
7354 This is the last time that the window was selected.  The function
7355 @code{get-lru-window} uses this field.
7356
7357 @item display_table
7358 The window's display table, or @code{nil} if none is specified for it.
7359
7360 @item update_mode_line
7361 Non-@code{nil} means this window's mode line needs to be updated.
7362
7363 @item base_line_number
7364 The line number of a certain position in the buffer, or @code{nil}.
7365 This is used for displaying the line number of point in the mode line.
7366
7367 @item base_line_pos
7368 The position in the buffer for which the line number is known, or
7369 @code{nil} meaning none is known.
7370
7371 @item region_showing
7372 If the region (or part of it) is highlighted in this window, this field
7373 holds the mark position that made one end of that region.  Otherwise,
7374 this field is @code{nil}.
7375 @end table
7376
7377 @node The Redisplay Mechanism, Extents, Consoles; Devices; Frames; Windows, Top
7378 @chapter The Redisplay Mechanism
7379
7380   The redisplay mechanism is one of the most complicated sections of
7381 XEmacs, especially from a conceptual standpoint.  This is doubly so
7382 because, unlike for the basic aspects of the Lisp interpreter, the
7383 computer science theories of how to efficiently handle redisplay are not
7384 well-developed.
7385
7386   When working with the redisplay mechanism, remember the Golden Rules
7387 of Redisplay:
7388
7389 @enumerate
7390 @item
7391 It Is Better To Be Correct Than Fast.
7392 @item
7393 Thou Shalt Not Run Elisp From Within Redisplay.
7394 @item
7395 It Is Better To Be Fast Than Not To Be.
7396 @end enumerate
7397
7398 @menu
7399 * Critical Redisplay Sections::
7400 * Line Start Cache::
7401 @end menu
7402
7403 @node Critical Redisplay Sections
7404 @section Critical Redisplay Sections
7405 @cindex critical redisplay sections
7406
Within a critical section of redisplay, we are defenseless and assume
that the following cannot happen:
7409
7410 @enumerate
7411 @item
7412 garbage collection
7413 @item
7414 Lisp code evaluation
7415 @item
7416 frame size changes
7417 @end enumerate
7418
7419 We ensure (3) by calling @code{hold_frame_size_changes()}, which
7420 will cause any pending frame size changes to get put on hold
7421 till after the end of the critical section.  (1) follows
7422 automatically if (2) is met.  #### Unfortunately, there are
7423 some places where Lisp code can be called within this section.
7424 We need to remove them.
7425
7426 If @code{Fsignal()} is called during this critical section, we
7427 will @code{abort()}.
7428
7429 If garbage collection is called during this critical section,
7430 we simply return. #### We should abort instead.
7431
7432 #### If a frame-size change does occur we should probably
7433 actually be preempting redisplay.
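The two defensive measures described above are holding frame size changes and refusing to collect garbage.  Both can be sketched as a flag that the deferred operations check.  This is only a model of the idea, not of the XEmacs code.  All names other than @code{hold_frame_size_changes()} are hypothetical, and that function's real signature and behavior may differ.

```c
#include <assert.h>

/* Hypothetical model of a frame whose size changes can be held. */
typedef struct
{
  int width, height;
  int pending;                 /* a size change is waiting */
  int pending_width, pending_height;
} toy_frame;

static int in_critical_section;
static int gc_runs;

static void
toy_change_frame_size (toy_frame *f, int width, int height)
{
  if (in_critical_section)
    {
      /* Put the change on hold; the last request wins. */
      f->pending = 1;
      f->pending_width = width;
      f->pending_height = height;
      return;
    }
  f->width = width;
  f->height = height;
}

static void
toy_garbage_collect (void)
{
  if (in_critical_section)
    return;  /* the current behavior described above; it suggests aborting instead */
  gc_runs++;
}

static void
toy_enter_critical_section (void)
{
  assert (!in_critical_section);
  in_critical_section = 1;  /* plays the role of hold_frame_size_changes() */
}

static void
toy_exit_critical_section (toy_frame *f)
{
  in_critical_section = 0;
  if (f->pending)
    {
      f->pending = 0;
      toy_change_frame_size (f, f->pending_width, f->pending_height);
    }
}
```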
7434
7435 @node Line Start Cache
7436 @section Line Start Cache
7437 @cindex line start cache
7438
7439   The traditional scrolling code in Emacs breaks in a variable height
7440 world.  It depends on the key assumption that the number of lines that
7441 can be displayed at any given time is fixed.  This led to a complete
7442 separation of the scrolling code from the redisplay code.  In order to
7443 fully support variable height lines, the scrolling code must actually be
7444 tightly integrated with redisplay.  Only redisplay can determine how
7445 many lines will be displayed on a screen for any given starting point.
7446
7447   What is ideally wanted is a complete list of the starting buffer
7448 position for every possible display line of a buffer along with the
7449 height of that display line.  Maintaining such a full list would be very
7450 expensive.  We settle for having it include information for all areas
7451 which we happen to generate anyhow (i.e. the region currently being
7452 displayed) and for those areas we need to work with.
7453
7454   In order to ensure that the cache accurately represents what redisplay
7455 would actually show, it is necessary to invalidate it in many
7456 situations.  If the buffer changes, the starting positions may no longer
7457 be correct.  If a face or an extent has changed then the line heights
7458 may have altered.  These events happen frequently enough that the cache
can end up being constantly invalidated.  With this potentially constant
invalidation, when is the cache ever useful?
7461
  Even if the cache is invalidated before every single use, it is still
7463 necessary.  Scrolling often requires knowledge about display lines which
7464 are actually above or below the visible region.  The cache provides a
7465 convenient light-weight method of storing this information for multiple
7466 display regions.  This knowledge is necessary for the scrolling code to
7467 always obey the First Golden Rule of Redisplay.
7468
  If the cache already contains all of the information that the scrolling
routines need, so that they do not have to generate it themselves, then
we are able to obey the Third Golden Rule of Redisplay.  The first thing
7472 we do to help out the cache is to always add the displayed region.  This
7473 region had to be generated anyway, so the cache ends up getting the
7474 information basically for free.  In those cases where a user is simply
7475 scrolling around viewing a buffer there is a high probability that this
7476 is sufficient to always provide the needed information.  The second
7477 thing we can do is be smart about invalidating the cache.
7478
7479   TODO -- Be smart about invalidating the cache.  Potential places:
7480
7481 @itemize @bullet
7482 @item
7483 Insertions at end-of-line which don't cause line-wraps do not alter the
7484 starting positions of any display lines.  These types of buffer
7485 modifications should not invalidate the cache.  This is actually a large
7486 optimization for redisplay speed as well.
7487 @item
7488 Buffer modifications frequently only affect the display of lines at and
7489 below where they occur.  In these situations we should only invalidate
7490 the part of the cache starting at where the modification occurs.
7491 @end itemize
7492
7493   In case you're wondering, the Second Golden Rule of Redisplay is not
7494 applicable.
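The second TODO item amounts to truncating a sorted cache at the line that contains the modification.  Here is a hedged sketch of such a cache.  The layout and every name in it (@code{line_start_cache}, @code{lsc_find}, @code{lsc_invalidate_from}) are assumptions for illustration, not the structure used by the redisplay code.

```c
#include <assert.h>

/* Hypothetical line start cache: (start position, height) pairs for
   consecutive display lines, kept sorted by start position. */
#define LSC_MAX 64

typedef struct
{
  long start;  /* buffer position where the display line begins */
  int height;  /* pixel height of the display line */
} lsc_entry;

typedef struct
{
  lsc_entry e[LSC_MAX];
  int n;
} line_start_cache;

/* Append a line; callers add lines in increasing buffer order. */
static void
lsc_add (line_start_cache *c, long start, int height)
{
  assert (c->n < LSC_MAX);
  assert (c->n == 0 || c->e[c->n - 1].start < start);
  c->e[c->n].start = start;
  c->e[c->n].height = height;
  c->n++;
}

/* Binary search: index of the last entry starting at or before POS
   (the cached line containing POS), or -1 if POS precedes them all. */
static int
lsc_find (const line_start_cache *c, long pos)
{
  int lo = 0, hi = c->n - 1, found = -1;
  while (lo <= hi)
    {
      int mid = lo + (hi - lo) / 2;
      if (c->e[mid].start <= pos)
        {
          found = mid;
          lo = mid + 1;
        }
      else
        hi = mid - 1;
    }
  return found;
}

/* Partial invalidation: drop the line containing POS and every line
   after it, keeping the lines above the modification. */
static void
lsc_invalidate_from (line_start_cache *c, long pos)
{
  int i = lsc_find (c, pos);
  c->n = i < 0 ? 0 : i;
}
```

A modification inside the third cached line keeps the first two entries.  A modification above everything cached empties the cache, as before.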
7495
7496 @node Extents, Faces and Glyphs, The Redisplay Mechanism, Top
7497 @chapter Extents
7498
7499 @menu
7500 * Introduction to Extents::     Extents are ranges over text, with properties.
7501 * Extent Ordering::             How extents are ordered internally.
7502 * Format of the Extent Info::   The extent information in a buffer or string.
7503 * Zero-Length Extents::         A weird special case.
7504 * Mathematics of Extent Ordering::      A rigorous foundation.
7505 * Extent Fragments::            Cached information useful for redisplay.
7506 @end menu
7507
7508 @node Introduction to Extents
7509 @section Introduction to Extents
7510
7511   Extents are regions over a buffer, with a start and an end position
7512 denoting the region of the buffer included in the extent.  In
7513 addition, either end can be closed or open, meaning that the endpoint
7514 is or is not logically included in the extent.  Insertion of a character
7515 at a closed endpoint causes the character to go inside the extent;
7516 insertion at an open endpoint causes the character to go outside.
7517
7518   Extent endpoints are stored using memory indices (see @file{insdel.c}),
7519 to minimize the amount of adjusting that needs to be done when
7520 characters are inserted or deleted.
7521
7522   (Formerly, extent endpoints at the gap could be either before or
7523 after the gap, depending on the open/closedness of the endpoint.
7524 The intent of this was to make it so that insertions would
7525 automatically go inside or out of extents as necessary with no
7526 further work needing to be done.  It didn't work out that way,
7527 however, and just ended up complexifying and buggifying all the
7528 rest of the code.)
7529
7530 @node Extent Ordering
7531 @section Extent Ordering
7532
7533   Extents are compared using memory indices.  There are two orderings
7534 for extents and both orders are kept current at all times.  The normal
7535 or @dfn{display} order is as follows:
7536
7537 @example
7538 Extent A is ``less than'' extent B,
7539 that is, earlier in the display order,
7540   if:    A-start < B-start,
7541   or if: A-start = B-start, and A-end > B-end
7542 @end example
7543
7544   So if two extents begin at the same position, the larger of them is the
7545 earlier one in the display order (@code{EXTENT_LESS} is true).
7546
7547   For the e-order, the same thing holds:
7548
7549 @example
7550 Extent A is ``less than'' extent B in e-order,
that is, earlier in the e-order,
7552   if:    A-end < B-end,
7553   or if: A-end = B-end, and A-start > B-start
7554 @end example
7555
7556   So if two extents end at the same position, the smaller of them is the
7557 earlier one in the e-order (@code{EXTENT_E_LESS} is true).
7558
7559   The display order and the e-order are complementary orders: any
7560 theorem about the display order also applies to the e-order if you swap
7561 all occurrences of ``display order'' and ``e-order'', ``less than'' and
7562 ``greater than'', and ``extent start'' and ``extent end''.
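The two orderings translate directly into comparison predicates.  This sketch assumes a bare start/end pair per extent and uses hypothetical names.  The real @code{EXTENT_LESS} and @code{EXTENT_E_LESS} macros operate on XEmacs extent objects and may be written differently.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical extent with start/end memory indices only. */
typedef struct { long start, end; } toy_extent;

/* Display order: A < B if A starts earlier, or both start at the same
   place and A is larger (ends later). */
static int
extent_less (const toy_extent *a, const toy_extent *b)
{
  return a->start < b->start || (a->start == b->start && a->end > b->end);
}

/* E-order: A e< B if A ends earlier, or both end at the same place and
   A is smaller (starts later). */
static int
extent_e_less (const toy_extent *a, const toy_extent *b)
{
  return a->end < b->end || (a->end == b->end && a->start > b->start);
}

/* qsort() adapters built from the "less than" predicates. */
static int
cmp_display (const void *pa, const void *pb)
{
  const toy_extent *a = pa, *b = pb;
  return extent_less (a, b) ? -1 : extent_less (b, a) ? 1 : 0;
}

static int
cmp_e_order (const void *pa, const void *pb)
{
  const toy_extent *a = pa, *b = pb;
  return extent_e_less (a, b) ? -1 : extent_e_less (b, a) ? 1 : 0;
}
```

Sorting the extents [5,9], [2,8], [2,4], and [3,8] both ways shows the complementary relationship.  In display order [2,8] comes before [2,4].  In the e-order [3,8] comes before [2,8].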
7563
7564 @node Format of the Extent Info
7565 @section Format of the Extent Info
7566
7567   An extent-info structure consists of a list of the buffer or string's
7568 extents and a @dfn{stack of extents} that lists all of the extents over
7569 a particular position.  The stack-of-extents info is used for
7570 optimization purposes -- it basically caches some info that might
7571 be expensive to compute.  Certain otherwise hard computations are easy
7572 given the stack of extents over a particular position, and if the
7573 stack of extents over a nearby position is known (because it was
7574 calculated at some prior point in time), it's easy to move the stack
7575 of extents to the proper position.
7576
7577   Given that the stack of extents is an optimization, and given that
7578 it requires memory, a string's stack of extents is wiped out each
7579 time a garbage collection occurs.  Therefore, any time you retrieve
7580 the stack of extents, it might not be there.  If you need it to
7581 be there, use the @code{_force} version.
7582
7583   Similarly, a string may or may not have an extent_info structure.
7584 (Generally it won't if there haven't been any extents added to the
7585 string.) So use the @code{_force} version if you need the extent_info
7586 structure to be there.
7587
7588   A list of extents is maintained as a double gap array: one gap array
7589 is ordered by start index (the @dfn{display order}) and the other is
7590 ordered by end index (the @dfn{e-order}).  Note that positions in an
7591 extent list should logically be conceived of as referring @emph{to} a
7592 particular extent (as is the norm in programs) rather than sitting
7593 between two extents.  Note also that callers of these functions should
7594 not be aware of the fact that the extent list is implemented as an
7595 array, except for the fact that positions are integers (this should be
generalized to handle integers and linked lists equally well).
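A gap array keeps its unused slots in one contiguous gap and moves that gap to wherever the next insertion or deletion happens, so runs of nearby operations are cheap.  Here is a minimal, hypothetical sketch of a single gap array whose logical positions refer to elements, as described above.  It is not the code in @file{extents.c}; the extent list would keep two such arrays, one per ordering.

```c
#include <assert.h>

/* Hypothetical fixed-capacity gap array of ints; in the extent list the
   elements would be extents. */
#define GA_CAP 16

typedef struct
{
  int els[GA_CAP];
  int gap_start;  /* physical index where the gap begins */
  int gap_len;    /* number of unused slots */
} gap_array;

static void ga_init (gap_array *ga) { ga->gap_start = 0; ga->gap_len = GA_CAP; }
static int ga_count (const gap_array *ga) { return GA_CAP - ga->gap_len; }

/* Element at logical position POS; positions refer to elements and
   skip over the gap. */
static int
ga_get (const gap_array *ga, int pos)
{
  assert (pos >= 0 && pos < ga_count (ga));
  return ga->els[pos < ga->gap_start ? pos : pos + ga->gap_len];
}

/* Move the gap so that it begins at logical position POS. */
static void
ga_move_gap (gap_array *ga, int pos)
{
  while (ga->gap_start > pos)
    {
      ga->gap_start--;
      ga->els[ga->gap_start + ga->gap_len] = ga->els[ga->gap_start];
    }
  while (ga->gap_start < pos)
    {
      ga->els[ga->gap_start] = ga->els[ga->gap_start + ga->gap_len];
      ga->gap_start++;
    }
}

/* Insert VAL so that it ends up at position POS. */
static void
ga_insert (gap_array *ga, int pos, int val)
{
  assert (ga->gap_len > 0 && pos >= 0 && pos <= ga_count (ga));
  ga_move_gap (ga, pos);
  ga->els[ga->gap_start++] = val;
  ga->gap_len--;
}

/* Delete the element at position POS by letting the gap absorb it. */
static void
ga_delete (gap_array *ga, int pos)
{
  assert (pos >= 0 && pos < ga_count (ga));
  ga_move_gap (ga, pos);
  ga->gap_len++;
}
```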
7597
7598 @node Zero-Length Extents
7599 @section Zero-Length Extents
7600
7601   Extents can be zero-length, and will end up that way if their endpoints
7602 are explicitly set that way or if their detachable property is nil
7603 and all the text in the extent is deleted. (The exception is open-open
7604 zero-length extents, which are barred from existing because there is
7605 no sensible way to define their properties.  Deletion of the text in
7606 an open-open extent causes it to be converted into a closed-open
7607 extent.)  Zero-length extents are primarily used to represent
7608 annotations, and behave as follows:
7609
7610 @enumerate
7611 @item
7612 Insertion at the position of a zero-length extent expands the extent
7613 if both endpoints are closed; goes after the extent if it is closed-open;
7614 and goes before the extent if it is open-closed.
7615
7616 @item
7617 Deletion of a character on a side of a zero-length extent whose
7618 corresponding endpoint is closed causes the extent to be detached if
7619 it is detachable; if the extent is not detachable or the corresponding
7620 endpoint is open, the extent remains in the buffer, moving as necessary.
7621 @end enumerate
7622
7623   Note that closed-open, non-detachable zero-length extents behave
7624 exactly like markers and that open-closed, non-detachable zero-length
7625 extents behave like the ``point-type'' marker in Mule.
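Rule (1) for zero-length extents can be restated as a small function of the two endpoint flags.  This is a sketch with a hypothetical extent representation (@code{zl_extent}, @code{insert_at_zero_length}).  It covers only insertion, not deletion or detachment.

```c
#include <assert.h>

/* Hypothetical extent: endpoints plus open/closed flags. */
typedef struct
{
  long start, end;
  int start_open, end_open;  /* nonzero if that endpoint is open */
} zl_extent;

/* Insert N characters at POS, where the zero-length extent E sits,
   following rule (1) for zero-length extents. */
static void
insert_at_zero_length (zl_extent *e, long pos, long n)
{
  assert (e->start == pos && e->end == pos && n > 0);
  /* Open-open zero-length extents cannot exist. */
  assert (!(e->start_open && e->end_open));
  if (!e->start_open && !e->end_open)
    e->end += n;                  /* closed-closed: text goes inside */
  else if (e->start_open)
    {
      e->start += n;              /* open-closed: text goes before */
      e->end += n;
    }
  /* closed-open: text goes after; the extent stays put, like a marker */
}
```

The closed-open case leaves the extent in front of the new text, which is the marker-like behavior noted above.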
7626
7627 @node Mathematics of Extent Ordering
7628 @section Mathematics of Extent Ordering
7629 @cindex extent mathematics
7630 @cindex mathematics of extents
7631 @cindex extent ordering
7632
7633 @cindex display order of extents
7634 @cindex extents, display order
  The extents in a buffer are ordered by ``display order'' because that
is the order in which the redisplay mechanism needs to process them.
7637 The e-order is an auxiliary ordering used to facilitate operations
7638 over extents.  The operations that can be performed on the ordered
7639 list of extents in a buffer are
7640
7641 @enumerate
7642 @item
7643 Locate where an extent would go if inserted into the list.
7644 @item
7645 Insert an extent into the list.
7646 @item
7647 Remove an extent from the list.
7648 @item
7649 Map over all the extents that overlap a range.
7650 @end enumerate

  Operation (4) requires being able to determine the first and last
extents that overlap a range.

  NOTE: @dfn{overlap} is used as follows:

@itemize @bullet
@item
Two ranges overlap if they have at least one point in common.
Whether the endpoints are open or closed makes a difference here.
@item
A point overlaps a range if the point is contained within the
range; this is equivalent to treating a point @math{P} as the range
@math{[P, P]}.
@item
In the case of an @emph{extent} overlapping a point or range, the extent
is normally treated as having closed endpoints.  This applies
consistently in the discussion of stacks of extents and such below.
Note that this definition of overlap is not necessarily consistent with
the extents that @code{map-extents} maps over, since @code{map-extents}
sometimes pays attention to whether the endpoints of an extent are open
or closed.  But for our purposes, it greatly simplifies things to treat
all extents as having closed endpoints.
@end itemize
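
  These definitions can be expressed directly in C.  The sketch assumes that positions are points on the real line, so @math{[0, 5)} and @math{[5, 10]} share no point.  The names @code{struct range}, @code{ranges_overlap} and @code{extent_overlaps} are hypothetical.  Two ranges meet exactly when each one is non-empty and each starts no later than the other ends.  A tie counts only if both endpoints involved are closed.

```c
#include <assert.h>
#include <stdbool.h>

/* A range whose endpoints may each be open or closed (hypothetical).  */
struct range
{
  long start, end;
  bool start_open, end_open;
};

/* True if a lower bound S and an upper bound E admit a common point:
   S < E, or S == E with both bounds closed.  */
static bool
bound_before (long s, bool s_open, long e, bool e_open)
{
  return s < e || (s == e && !s_open && !e_open);
}

/* Two ranges overlap if they have at least one point in common: both
   must be non-empty and each must start before the other ends.  */
static bool
ranges_overlap (struct range a, struct range b)
{
  return bound_before (a.start, a.start_open, a.end, a.end_open)
         && bound_before (b.start, b.start_open, b.end, b.end_open)
         && bound_before (a.start, a.start_open, b.end, b.end_open)
         && bound_before (b.start, b.start_open, a.end, a.end_open);
}

/* A point P is treated as the closed range [P, P].  */
static struct range
point_range (long p)
{
  struct range r = { p, p, false, false };
  return r;
}

/* Extents are treated as closed for overlap, whatever their flags.  */
static bool
extent_overlaps (struct range extent, struct range r)
{
  extent.start_open = false;
  extent.end_open = false;
  return ranges_overlap (extent, r);
}
```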

First, define @math{>}, @math{<}, @math{<=}, etc. as applied to extents
to mean comparison according to the display order.  Comparison between
an extent @math{E} and an index @math{I} means comparison between
@math{E} and the range @math{[I, I]}.

Also define @math{e>}, @math{e<}, @math{e<=}, etc. to mean comparison
according to the e-order.

For any range @math{R}, define @math{R(0)} to be the starting index of
the range and @math{R(1)} to be the ending index of the range.

For any extent @math{E}, define @math{E(next)} to be the extent directly
following @math{E}, and @math{E(prev)} to be the extent directly
preceding @math{E}.  Assume @math{E(next)} and @math{E(prev)} can be
determined from @math{E} in constant time.  (This is because we store
the extent list as a doubly-linked list.)

Similarly, define @math{E(e-next)} and @math{E(e-prev)} to be the
extents directly following and preceding @math{E} in the e-order.
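
  As a concrete sketch, the two orderings can be written as three-way comparison functions.  This section states only the primary keys: the starting index for the display order and the ending index for the e-order.  The tie-breaking rules used here are assumptions made for illustration, as are the names.  Ties put the longer extent first in the display order and the shorter extent first in the e-order.

```c
#include <assert.h>

struct extent { long start, end; };   /* hypothetical: endpoints only */

/* Display order: increasing start; ASSUMED tie-break: larger end
   first.  Returns a negative, zero or positive value.  */
static int
display_cmp (struct extent a, struct extent b)
{
  if (a.start != b.start)
    return a.start < b.start ? -1 : 1;
  if (a.end != b.end)
    return a.end > b.end ? -1 : 1;
  return 0;
}

/* E-order: increasing end; ASSUMED tie-break: larger start (the
   shorter extent) first.  */
static int
e_order_cmp (struct extent a, struct extent b)
{
  if (a.end != b.end)
    return a.end < b.end ? -1 : 1;
  if (a.start != b.start)
    return a.start > b.start ? -1 : 1;
  return 0;
}

/* Comparing an extent with an index I means comparing with [I, I].  */
static int
display_cmp_index (struct extent e, long i)
{
  struct extent point = { i, i };
  return display_cmp (e, point);
}
```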

Now:

Let @math{R} be a range.
Let @math{F} be the first extent overlapping @math{R}.
Let @math{L} be the last extent overlapping @math{R}.

Theorem 1: @math{R(1)} lies between @math{L} and @math{L(next)},
i.e. @math{L <= R(1) < L(next)}.

  This follows easily from the definition of display order.  The
basic reason that this theorem applies is that the display order
sorts by increasing starting index.

  Therefore, we can determine @math{L} just by looking at where we would
insert @math{R(1)} into the list, and if we know @math{F} and are moving
forward over extents, we can easily determine when we've hit @math{L} by
comparing the extent we're at to @math{R(1)}.
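
  The forward scan can be sketched as follows.  Any extent that compares greater than @math{R(1)} starts after @math{R(1)}, so neither it nor anything after it in the display order can overlap @math{R}.  Extents met before that point are tested individually, because an extent lying between @math{F} and @math{L} need not itself overlap @math{R}.  The array representation, the display-order tie-break and all names are assumptions.  For simplicity the scan starts at the head of the list; a real implementation would start at @math{F}.

```c
#include <assert.h>
#include <stddef.h>

struct extent { long start, end; };   /* hypothetical: endpoints only */

/* Display order: increasing start; ASSUMED tie-break: larger end first.  */
static int
display_cmp (struct extent a, struct extent b)
{
  if (a.start != b.start)
    return a.start < b.start ? -1 : 1;
  if (a.end != b.end)
    return a.end > b.end ? -1 : 1;
  return 0;
}

/* Extents are treated as closed when testing overlap with [R0, R1].  */
static int
overlaps (struct extent e, long r0, long r1)
{
  return e.start <= r1 && e.end >= r0;
}

/* Call FN (if non-null) on every extent in LIST, which must be sorted
   in display order, that overlaps [R0, R1].  Returns how many such
   extents were found.  */
static size_t
map_overlapping (const struct extent *list, size_t n, long r0, long r1,
                 void (*fn) (struct extent, void *), void *arg)
{
  struct extent r1_point = { r1, r1 };
  size_t found = 0;
  assert (r0 <= r1);
  for (size_t k = 0; k < n; k++)
    {
      /* Past R(1): this extent and all later ones start after R(1).  */
      if (display_cmp (list[k], r1_point) > 0)
        break;
      /* An extent before that point may still miss R, so test it.  */
      if (overlaps (list[k], r0, r1))
        {
          if (fn)
            fn (list[k], arg);
          found++;
        }
    }
  return found;
}
```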

Theorem 2: @math{F(e-prev) e< [1, R(0)] e<= F}.

  This is the analog of Theorem 1, and applies because the e-order
sorts by increasing ending index.

  Therefore, @math{F} can be found in the same amount of time as
operation (1), i.e. the time that it takes to locate where an extent
would go if inserted into the e-order list.

  If the lists were stored as balanced binary trees, then operation (1)
would take time logarithmic in the number of extents, which is usually
quite fast.  However, currently they're stored as simple doubly-linked
lists, and instead we do some caching to try to speed things up.
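
  To illustrate the logarithmic bound, here is operation (1) over a sorted array, using binary search.  This swaps in a sorted array for the balanced tree mentioned above, since both locate a position in O(log n) comparisons.  It is a sketch only: @code{locate_insertion} is a hypothetical function and the display-order tie-break is assumed.  A plain doubly-linked list, by contrast, must be walked element by element.

```c
#include <assert.h>
#include <stddef.h>

struct extent { long start, end; };   /* hypothetical: endpoints only */

/* Display order: increasing start; ASSUMED tie-break: larger end first.  */
static int
display_cmp (struct extent a, struct extent b)
{
  if (a.start != b.start)
    return a.start < b.start ? -1 : 1;
  if (a.end != b.end)
    return a.end > b.end ? -1 : 1;
  return 0;
}

/* Operation (1): return the index at which E would be inserted into
   LIST (sorted in display order), placing it after any equal extents.
   Each iteration halves [LO, HI), so this takes O(log n) steps.  */
static size_t
locate_insertion (const struct extent *list, size_t n, struct extent e)
{
  size_t lo = 0, hi = n;
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (display_cmp (list[mid], e) <= 0)
        lo = mid + 1;
      else
        hi = mid;
    }
  return lo;
}
```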

  Define a @dfn{stack of extents} (or @dfn{SOE}) as the set of extents
(ordered in the display order) that overlap an index @math{I}, together
with the SOE's @dfn{previous} extent, which is an extent that precedes
@math{I} in the e-order.  (Hopefully there will not be very many extents
between @math{I} and the previous extent.)
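
  A stack of extents for an index @math{I} could be computed naively as below.  Several details are assumptions: the fixed capacity @code{SOE_MAX}, the names, and taking the previous extent to be the closest extent that ends before @math{I}.  A real implementation would update an existing stack incrementally rather than scan the whole list.

```c
#include <assert.h>
#include <stddef.h>

#define SOE_MAX 16                 /* hypothetical fixed capacity */

struct extent { long start, end; };   /* hypothetical: endpoints only */

struct soe
{
  long index;                      /* the index I */
  size_t count;                    /* number of extents overlapping I */
  size_t members[SOE_MAX];         /* their positions, in display order */
  long previous;                   /* position of previous extent, or -1 */
};

/* Build the stack of extents on I from LIST, sorted in display order.
   ASSUMPTION: the previous extent is the extent with the largest end
   that still ends before I (the first such one on ties).  */
static void
soe_build (struct soe *s, const struct extent *list, size_t n, long i)
{
  s->index = i;
  s->count = 0;
  s->previous = -1;
  for (size_t k = 0; k < n; k++)
    {
      if (list[k].start <= i && i <= list[k].end)
        {
          assert (s->count < SOE_MAX);
          s->members[s->count++] = k;
        }
      else if (list[k].end < i
               && (s->previous < 0 || list[k].end > list[s->previous].end))
        s->previous = (long) k;
    }
}
```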

Now:

Let @math{I} be an index, let @math{S} be the stack of extents on
@math{I}, let @math{F} be the first extent in @math{S}, and let @math{P}
be @math{S}'s previous extent.

Theorem 3: If @math{S} is not empty, the first extent in @math{S} is the
first extent that overlaps any range @math{[I, J]} with @math{J >= I}.

Proof: Any extent that overlaps @math{[I, J]} but does not include
@math{I} must have a start index @math{> I}, and thus be greater than
any extent in @math{S}.

Therefore, finding the first extent that overlaps a range @math{R} is
the same as finding the first extent that overlaps @math{R(0)}.
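
  The reduction in Theorem 3 can be checked with a small C sketch.  Theorem 3 presupposes that @math{S} is not empty.  The fallback in @code{first_overlapping_range} for an empty stack is an addition of this sketch, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct extent { long start, end; };   /* hypothetical: endpoints only */

/* Position of the first extent in LIST (sorted in display order, hence
   by increasing start) that overlaps the closed range [R0, R1], or -1.  */
static long
first_overlapping (const struct extent *list, size_t n, long r0, long r1)
{
  assert (r0 <= r1);
  for (size_t k = 0; k < n; k++)
    {
      if (list[k].start > r1)
        break;                     /* later extents start even later */
      if (list[k].end >= r0)
        return (long) k;
    }
  return -1;
}

/* Theorem 3 as a shortcut: if the stack at R0 is not empty, its first
   extent is the answer for the whole range.  The fallback for an empty
   stack is an addition of this sketch.  */
static long
first_overlapping_range (const struct extent *list, size_t n,
                         long r0, long r1)
{
  long f = first_overlapping (list, n, r0, r0);
  return f >= 0 ? f : first_overlapping (list, n, r0, r1);
}
```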

Theorem 4: Let @math{I2} be an index such that @math{I2 > I}, and let
@math{F2} be the first extent that overlaps @math{I2}.  Then, either
@math{F2} is in @math{S} or @math{F2} is greater than any extent in
@math{S}.

Proof: If @math{F2} does not include @math{I} then its start index is
greater than @math{I} and thus it is greater than any extent in
@math{S}, including @math{F}.  Otherwise, @math{F2} includes @math{I}
and thus is in @math{S}, and thus @math{F2 >= F}.
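
  Theorem 4 lets the search for @math{F2} begin at @math{F} instead of at the head of the list.  This works because every extent before @math{F} ends before @math{I}, and hence before @math{I2}.  A sketch follows; the array representation and the function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct extent { long start, end; };   /* hypothetical: endpoints only */

/* Return the position of the first extent in LIST (display order)
   overlapping I2, starting the search at position F, or -1.  F must be
   the position of the first extent overlapping some index I <= I2, or
   0 to search the whole list.  */
static long
first_overlapping_from (const struct extent *list, size_t n, size_t f,
                        long i2)
{
  for (size_t k = f; k < n; k++)
    {
      if (list[k].start > i2)
        break;                     /* later extents start even later */
      if (list[k].end >= i2)
        return (long) k;
    }
  return -1;
}
```

In the usage below, the first extent overlapping index 4 is at position 1.  Starting there gives the same answer for index 7 as a search from the head of the list.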

@node Extent Fragments
@section Extent Fragments
@cindex extent fragment

  Imagine that the buffer is divided up into contiguous, non-overlapping
@dfn{runs} of text such that no extent starts or ends within a run
(extents that abut the run don't count).
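
  Under this definition, the run containing a position is bounded by the nearest extent endpoints around it.  The sketch below computes those bounds with a linear scan.  Several details are assumptions: runs are represented as half-open ranges @math{[beg, end)}, and the buffer bounds @code{buf_beg} and @code{buf_end} and the function name are hypothetical.  A position sitting exactly on an endpoint is assigned to the run after it, following the junction rule for extent fragments.

```c
#include <assert.h>
#include <stddef.h>

struct extent { long start, end; };   /* hypothetical: endpoints only */

/* Compute the run [*BEG, *END) containing POS in a buffer spanning
   [BUF_BEG, BUF_END).  The bounds are the nearest extent endpoints at
   or before POS and strictly after POS, so no extent starts or ends
   inside the run; a POS on an endpoint gets the run after it.  */
static void
run_at (const struct extent *list, size_t n, long pos,
        long buf_beg, long buf_end, long *beg, long *end)
{
  assert (buf_beg <= pos && pos < buf_end);
  long b = buf_beg, e = buf_end;
  for (size_t k = 0; k < n; k++)
    {
      long pts[2] = { list[k].start, list[k].end };
      for (int j = 0; j < 2; j++)
        {
          if (pts[j] <= pos && pts[j] > b)
            b = pts[j];
          if (pts[j] > pos && pts[j] < e)
            e = pts[j];
        }
    }
  *beg = b;
  *end = e;
}
```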

  An extent fragment is a structure that holds data about the run that
contains a particular buffer position (if the buffer position is at the
junction of two runs, the run after the position is used): the
beginning and end of the run, a list of all of the extents in that run,
the @dfn{merged face} that results from merging all of the faces
corresponding to those extents, the begin and end glyphs at the
beginning of the run, etc.  This is the information that redisplay needs
in order to display this run.
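
  A stripped-down fragment can be pictured as below.  The structure is hypothetical: it records only the run bounds and the extents covering the run, and leaves out the merged face and the glyphs.  No extent starts or ends strictly inside a run, so each extent either covers the whole run or does not touch its interior.  A single containment test therefore suffices.

```c
#include <assert.h>
#include <stddef.h>

#define FRAG_MAX 16                /* hypothetical fixed capacity */

struct extent { long start, end; };   /* hypothetical: endpoints only */

/* Hypothetical, simplified extent fragment; the real structure also
   carries the merged face and begin/end glyphs.  */
struct extent_fragment
{
  long beg, end;                   /* the run [beg, end) */
  size_t count;
  size_t extents[FRAG_MAX];        /* display-order positions in the run */
};

/* Fill FRAG for the run [BEG, END): collect every extent that covers
   the whole run.  Extents that merely abut the run are left out.  */
static void
fragment_fill (struct extent_fragment *frag,
               const struct extent *list, size_t n, long beg, long end)
{
  assert (beg < end);
  frag->beg = beg;
  frag->end = end;
  frag->count = 0;
  for (size_t k = 0; k < n; k++)
    if (list[k].start <= beg && list[k].end >= end)
      {
        assert (frag->count < FRAG_MAX);
        frag->extents[frag->count++] = k;
      }
}
```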

  Extent fragments have to be very quick to update to a new buffer
position when moving linearly through the buffer.  They rely on the
stack-of-extents code, which does the heavy-duty algorithmic work of
determining which extents overlap a particular position.

@node Faces and Glyphs, Specifiers, Extents, Top
@chapter Faces and Glyphs

Not yet documented.

@node Specifiers, Menus, Faces and Glyphs, Top
@chapter Specifiers

Not yet documented.

@node Menus, Subprocesses, Specifiers, Top
@chapter Menus

  A menu is set by setting the value of the variable
@code{current-menubar} (which may be buffer-local) and then calling
@code{set-menubar-dirty-flag} to signal a change.  This will cause the
menu to be redrawn at the next redisplay.  The format of the data in
@code{current-menubar} is described in @file{menubar.c}.

  Internally the data in @code{current-menubar} is parsed into a tree of
@code{widget_value} structures (defined in @file{lwlib.h}); this is
accomplished by the recursive function
@code{menu_item_descriptor_to_widget_value()}, called by
@code{compute_menubar_data()}.  Such a tree is deallocated using
@code{free_widget_value()}.
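
  The shape of such a tree, and how it is built and freed recursively, can be sketched as follows.  The structures @code{struct menu_desc} and @code{struct menu_node} and the functions @code{desc_to_node} and @code{free_node} are hypothetical simplifications.  They are not the definitions in @file{lwlib.h} or @file{menubar.c}.  In this sketch, sub-items hang off @code{contents} and siblings are chained through @code{next}.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical menu description: a name plus optional sub-items.  */
struct menu_desc
{
  const char *name;
  const struct menu_desc *items;
  size_t n_items;
};

/* Hypothetical tree node: children via CONTENTS, siblings via NEXT.  */
struct menu_node
{
  char *name;
  struct menu_node *contents;
  struct menu_node *next;
};

static void *
xcalloc_or_die (size_t size)
{
  void *p = calloc (1, size);
  if (!p)
    abort ();
  return p;
}

/* Recursively convert a description into a tree of nodes.  */
static struct menu_node *
desc_to_node (const struct menu_desc *d)
{
  struct menu_node *node = xcalloc_or_die (sizeof *node);
  node->name = xcalloc_or_die (strlen (d->name) + 1);
  strcpy (node->name, d->name);
  struct menu_node **tail = &node->contents;
  for (size_t k = 0; k < d->n_items; k++)
    {
      *tail = desc_to_node (&d->items[k]);
      tail = &(*tail)->next;
    }
  return node;
}

/* Recursively free a node, its children and its later siblings.  */
static void
free_node (struct menu_node *node)
{
  while (node)
    {
      struct menu_node *next = node->next;
      free_node (node->contents);
      free (node->name);
      free (node);
      node = next;
    }
}

static size_t
count_nodes (const struct menu_node *node)
{
  size_t n = 0;
  for (; node; node = node->next)
    n += 1 + count_nodes (node->contents);
  return n;
}
```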

  @code{update_screen_menubars()} is one of the external entry points.
It checks, for each screen, whether that screen's menubar needs to be
updated.  This is the case if any of the following holds:

@enumerate
@item
@code{set-menubar-dirty-flag} was called since the last redisplay.  (This
function sets the C variable @code{menubar_has_changed}.)
@item
The buffer displayed in the screen has changed.
@item
The screen has no menubar currently displayed.
@end enumerate
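
  The three conditions amount to a simple predicate.  In this sketch only @code{menubar_has_changed} comes from the text.  @code{struct screen_menu_state}, its fields and @code{screen_menubar_needs_update} are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Set by set-menubar-dirty-flag, per the text.  */
static bool menubar_has_changed;

/* Hypothetical per-screen bookkeeping; not the real structure.  */
struct screen_menu_state
{
  const void *buffer;              /* buffer now displayed in the screen */
  const void *menubar_buffer;      /* buffer when the menubar was built */
  bool menubar_displayed;
};

static bool
screen_menubar_needs_update (const struct screen_menu_state *s)
{
  return menubar_has_changed                 /* condition 1 */
         || s->buffer != s->menubar_buffer   /* condition 2 */
         || !s->menubar_displayed;           /* condition 3 */
}
```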

  @code{set_screen_menubar()} is called for each such screen.  This
function calls @code{compute_menubar_data()} to create the tree of
@code{widget_value} structures, then calls @code{lw_create_widget()},
@code{lw_modify_all_widgets()}, and/or @code{lw_destroy_all_widgets()}
to create, update, or destroy the X Toolkit widget associated with the
menu.

  @code{update_psheets()}, the other external entry point, actually
changes the menus being displayed.  It uses the widgets updated by
@code{update_screen_menubars()} and calls various X functions to ensure
that the menus are displayed properly.

  The menubar widget is set up so that @code{pre_activate_callback()} is
called when the menu is first selected (i.e. when the mouse button goes
down), and @code{menubar_selection_callback()} is called when an item is
selected.  @code{pre_activate_callback()} calls the functions in
@code{activate-menubar-hook}, which can change the menubar (this is
described in @file{menubar.c}).  If the menubar is changed,
@code{set_screen_menubars()} is called.
@code{menubar_selection_callback()} enqueues a menu event, putting in it
a function to call (either @code{eval} or @code{call-interactively}) and
its argument, which is the callback function or form given in the menu's
description.

@node Subprocesses, Interface to X Windows, Menus, Top
@chapter Subprocesses

  The fields of a process are:

@table @code
@item name
A string, the name of the process.

@item command
A list containing the command arguments that were used to start this
process.

@item filter
A function used to accept output from the process instead of a buffer,
or @code{nil}.

@item sentinel
A function called whenever the status of the process changes (for
example, when it exits or receives a signal), or @code{nil}.

@item buffer
The associated buffer of the process.

@item pid
An integer, the Unix process @sc{id}.

@item childp
A flag, non-@code{nil} if this is really a child process.
It is @code{nil} for a network connection.

@item mark
A marker indicating the position of the end of the last output from this
process inserted into the buffer.  This is often but not always the end
of the buffer.

@item kill_without_query
If this is non-@code{nil}, killing XEmacs while this process is still
running does not ask for confirmation about killing the process.

@item raw_status_low
@itemx raw_status_high
These two fields record 16 bits each of the process status returned by
the @code{wait} system call.

@item status
The process status, as @code{process-status} should return it.

@item tick
@itemx update_tick
If these two fields are not equal, a change in the status of the process
needs to be reported, either by running the sentinel or by inserting a
message in the process buffer.

@item pty_flag
Non-@code{nil} if communication with the subprocess uses a @sc{pty};
@code{nil} if it uses a pipe.

@item infd
The file descriptor for input from the process.

@item outfd
The file descriptor for output to the process.

@item subtty
The file descriptor for the terminal that the subprocess is using.  (On
some systems, there is no need to record this, so the value is
@code{-1}.)

@item tty_name
The name of the terminal that the subprocess is using,
or @code{nil} if it is using pipes.
@end table
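
  The sketch below illustrates the @code{raw_status_low}, @code{raw_status_high}, @code{tick} and @code{update_tick} fields.  It splits a @code{wait} status into two 16-bit halves, reassembles it, and decodes it with the POSIX @code{WIFEXITED} and @code{WEXITSTATUS} macros.  The structure is a hypothetical subset of the process fields, and a 32-bit @code{unsigned int} is assumed.  The tick policy is an assumption consistent with the description above: @code{tick} is bumped when a status is recorded and copied to @code{update_tick} once the change has been reported.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical subset of the process fields described above.  */
struct proc_status_fields
{
  unsigned raw_status_low;         /* low 16 bits of the wait status */
  unsigned raw_status_high;        /* high 16 bits of the wait status */
  int tick, update_tick;           /* unequal => change not yet reported */
};

/* Record STATUS as returned by waitpid.  ASSUMED policy: bump TICK so
   that the change gets reported later.  */
static void
record_wait_status (struct proc_status_fields *p, int status)
{
  unsigned u = (unsigned) status;
  p->raw_status_low = u & 0xFFFFu;
  p->raw_status_high = (u >> 16) & 0xFFFFu;
  p->tick++;
}

/* Reassemble the two halves into the original status word.  */
static int
raw_wait_status (const struct proc_status_fields *p)
{
  return (int) ((p->raw_status_high << 16) | p->raw_status_low);
}

static bool
status_change_pending (const struct proc_status_fields *p)
{
  return p->tick != p->update_tick;
}

/* Usage helper: run a child that exits with CODE and return the status
   reported by waitpid, or -1 on failure.  */
static int
child_exit_status (int code)
{
  int status;
  pid_t pid = fork ();
  if (pid < 0)
    return -1;
  if (pid == 0)
    _exit (code);
  if (waitpid (pid, &status, 0) != pid)
    return -1;
  return status;
}
```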

@node Interface to X Windows, Index, Subprocesses, Top
@chapter Interface to X Windows

Not yet documented.

@include index.texi

@c Print the tables of contents
@summarycontents
@contents
@c That's all

@bye
