   1 \input texinfo  @c -*-texinfo-*-
   2 @c %**start of header
   3 @setfilename ../../info/internals.info
   4 @settitle XEmacs Internals Manual
   5 @c %**end of header
   6
   7 @ifinfo
   8 @dircategory XEmacs Editor
   9 @direntry
  10 * Internals: (internals).       XEmacs Internals Manual.
  11 @end direntry
  12
  13 Copyright @copyright{} 1992 - 1996 Ben Wing.
  14 Copyright @copyright{} 1996, 1997 Sun Microsystems.
  15 Copyright @copyright{} 1994 - 1998 Free Software Foundation.
  16 Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois.
  17
  18
  19 Permission is granted to make and distribute verbatim copies of this
  20 manual provided the copyright notice and this permission notice are
  21 preserved on all copies.
  22
  23 @ignore
  24 Permission is granted to process this file through TeX and print the
  25 results, provided the printed document carries copying permission notice
  26 identical to this one except for the removal of this paragraph (this
  27 paragraph not being relevant to the printed manual).
  28
  29 @end ignore
  30 Permission is granted to copy and distribute modified versions of this
  31 manual under the conditions for verbatim copying, provided that the
  32 entire resulting derived work is distributed under the terms of a
  33 permission notice identical to this one.
  34
  35 Permission is granted to copy and distribute translations of this manual
  36 into another language, under the above conditions for modified versions,
  37 except that this permission notice may be stated in a translation
  38 approved by the Foundation.
  39
  40 Permission is granted to copy and distribute modified versions of this
  41 manual under the conditions for verbatim copying, provided also that the
  42 section entitled ``GNU General Public License'' is included exactly as
  43 in the original, and provided that the entire resulting derived work is
  44 distributed under the terms of a permission notice identical to this
  45 one.
  46
  47 Permission is granted to copy and distribute translations of this manual
  48 into another language, under the above conditions for modified versions,
  49 except that the section entitled ``GNU General Public License'' may be
  50 included in a translation approved by the Free Software Foundation
  51 instead of in the original English.
  52 @end ifinfo
  53
  54 @c Combine indices.
  55 @synindex cp fn
  56 @syncodeindex vr fn
  57 @syncodeindex ky fn
  58 @syncodeindex pg fn
  59 @syncodeindex tp fn
  60
  61 @setchapternewpage odd
  62 @finalout
  63
  64 @titlepage
  65 @title XEmacs Internals Manual
  66 @subtitle Version 1.2, October 1998
  67
  68 @author Ben Wing
  69 @author Martin Buchholz
  70 @author Hrvoje Niksic
  71 @page
  72 @vskip 0pt plus 1fill
  73
  74 @noindent
  75 Copyright @copyright{} 1992 - 1996 Ben Wing. @*
  76 Copyright @copyright{} 1996, 1997 Sun Microsystems, Inc. @*
  77 Copyright @copyright{} 1994 - 1998 Free Software Foundation. @*
  78 Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois.
  79
  80 @sp 2
  81 Version 1.2 @*
  82 October 1998.@*
  83
  84 Permission is granted to make and distribute verbatim copies of this
  85 manual provided the copyright notice and this permission notice are
  86 preserved on all copies.
  87
  88 Permission is granted to copy and distribute modified versions of this
  89 manual under the conditions for verbatim copying, provided also that the
  90 section entitled ``GNU General Public License'' is included
  91 exactly as in the original, and provided that the entire resulting
  92 derived work is distributed under the terms of a permission notice
  93 identical to this one.
  94
  95 Permission is granted to copy and distribute translations of this manual
  96 into another language, under the above conditions for modified versions,
  97 except that the section entitled ``GNU General Public License'' may be
  98 included in a translation approved by the Free Software Foundation
  99 instead of in the original English.
 100 @end titlepage
 101 @page
 102
 103 @node Top, A History of Emacs, (dir), (dir)
 104
 105 @ifinfo
 106 This Info file contains v1.2 of the XEmacs Internals Manual.
 107 @end ifinfo
 108
 109 @menu
 110 * A History of Emacs::          Times, dates, important events.
 111 * XEmacs From the Outside::     A broad conceptual overview.
 112 * The Lisp Language::           An overview.
 113 * XEmacs From the Perspective of Building::
 114 * XEmacs From the Inside::
 115 * The XEmacs Object System (Abstractly Speaking)::
 116 * How Lisp Objects Are Represented in C::
 117 * Rules When Writing New C Code::
 118 * A Summary of the Various XEmacs Modules::
 119 * Allocation of Objects in XEmacs Lisp::
 120 * Events and the Event Loop::
 121 * Evaluation; Stack Frames; Bindings::
 122 * Symbols and Variables::
 123 * Buffers and Textual Representation::
 124 * MULE Character Sets and Encodings::
 125 * The Lisp Reader and Compiler::
 126 * Lstreams::
 127 * Consoles; Devices; Frames; Windows::
 128 * The Redisplay Mechanism::
 129 * Extents::
 130 * Faces and Glyphs::
 131 * Specifiers::
 132 * Menus::
 133 * Subprocesses::
 134 * Interface to X Windows::
 135 * Index::                   Index including concepts, functions, variables,
 136                               and other terms.
 137
 138       --- The Detailed Node Listing ---
 139
 140 Here are other nodes that are inferiors of those already listed,
 141 mentioned here so you can get to them in one step:
 142
 143 A History of Emacs
 144
 145 * Through Version 18::          Unification prevails.
 146 * Lucid Emacs::                 One version 19 Emacs.
 147 * GNU Emacs 19::                The other version 19 Emacs.
 148 * XEmacs::                      The continuation of Lucid Emacs.
 149
 150 Rules When Writing New C Code
 151
 152 * General Coding Rules::
 153 * Writing Lisp Primitives::
 154 * Adding Global Lisp Variables::
 155 * Techniques for XEmacs Developers::
 156
 157 A Summary of the Various XEmacs Modules
 158
 159 * Low-Level Modules::
 160 * Basic Lisp Modules::
 161 * Modules for Standard Editing Operations::
 162 * Editor-Level Control Flow Modules::
 163 * Modules for the Basic Displayable Lisp Objects::
 164 * Modules for other Display-Related Lisp Objects::
 165 * Modules for the Redisplay Mechanism::
 166 * Modules for Interfacing with the File System::
 167 * Modules for Other Aspects of the Lisp Interpreter and Object System::
 168 * Modules for Interfacing with the Operating System::
 169 * Modules for Interfacing with X Windows::
 170 * Modules for Internationalization::
 171
 172 Allocation of Objects in XEmacs Lisp
 173
 174 * Introduction to Allocation::
 175 * Garbage Collection::
 176 * GCPROing::
 177 * Integers and Characters::
 178 * Allocation from Frob Blocks::
 179 * lrecords::
 180 * Low-level allocation::
 181 * Pure Space::
 182 * Cons::
 183 * Vector::
 184 * Bit Vector::
 185 * Symbol::
 186 * Marker::
 187 * String::
 188 * Compiled Function::
 189
 190 Events and the Event Loop
 191
 192 * Introduction to Events::
 193 * Main Loop::
 194 * Specifics of the Event Gathering Mechanism::
 195 * Specifics About the Emacs Event::
 196 * The Event Stream Callback Routines::
 197 * Other Event Loop Functions::
 198 * Converting Events::
 199 * Dispatching Events; The Command Builder::
 200
 201 Evaluation; Stack Frames; Bindings
 202
 203 * Evaluation::
 204 * Dynamic Binding; The specbinding Stack; Unwind-Protects::
 205 * Simple Special Forms::
 206 * Catch and Throw::
 207
 208 Symbols and Variables
 209
 210 * Introduction to Symbols::
 211 * Obarrays::
 212 * Symbol Values::
 213
 214 Buffers and Textual Representation
 215
 216 * Introduction to Buffers::     A buffer holds a block of text such as a file.
 217 * The Text in a Buffer::        Representation of the text in a buffer.
 218 * Buffer Lists::                Keeping track of all buffers.
 219 * Markers and Extents::         Tagging locations within a buffer.
 220 * Bufbytes and Emchars::        Representation of individual characters.
 221 * The Buffer Object::           The Lisp object corresponding to a buffer.
 222
 223 MULE Character Sets and Encodings
 224
 225 * Character Sets::
 226 * Encodings::
 227 * Internal Mule Encodings::
 228
 229 Encodings
 230
 231 * Japanese EUC (Extended Unix Code)::
 232 * JIS7::
 233
 234 Internal Mule Encodings
 235
 236 * Internal String Encoding::
 237 * Internal Character Encoding::
 238
 239 The Lisp Reader and Compiler
 240
 241 Lstreams
 242
 243 Consoles; Devices; Frames; Windows
 244
 245 * Introduction to Consoles; Devices; Frames; Windows::
 246 * Point::
 247 * Window Hierarchy::
 248
 249 The Redisplay Mechanism
 250
 251 * Critical Redisplay Sections::
 252 * Line Start Cache::
 253
 254 Extents
 255
 256 * Introduction to Extents::     Extents are ranges over text, with properties.
 257 * Extent Ordering::             How extents are ordered internally.
 258 * Format of the Extent Info::   The extent information in a buffer or string.
 259 * Zero-Length Extents::         A weird special case.
 260 * Mathematics of Extent Ordering::      A rigorous foundation.
 261 * Extent Fragments::            Cached information useful for redisplay.
 262
 263 Faces and Glyphs
 264
 265 Specifiers
 266
 267 Menus
 268
 269 Subprocesses
 270
 271 Interface to X Windows
 272
 273 @end menu
 274
 275 @node A History of Emacs, XEmacs From the Outside, Top, Top
 276 @chapter A History of Emacs
 277 @cindex history of Emacs
 278 @cindex Hackers (Steven Levy)
 279 @cindex Levy, Steven
 280 @cindex ITS (Incompatible Timesharing System)
 281 @cindex Stallman, Richard
 282 @cindex RMS
 283 @cindex MIT
 284 @cindex TECO
 285 @cindex FSF
 286 @cindex Free Software Foundation
 287
 288   XEmacs is a powerful, customizable text editor and development
 289 environment.  It began as Lucid Emacs, which was in turn derived from
 290 GNU Emacs, a program written by Richard Stallman of the Free Software
 291 Foundation.  GNU Emacs dates back to the 1970's, and was modelled
 292 after a package called ``Emacs'', written in 1976, that was a set of
 293 macros on top of TECO, an old, old text editor written at MIT on the
 294 DEC PDP 10 under one of the earliest time-sharing operating systems,
 295 ITS (Incompatible Timesharing System). (ITS dates back well before
 296 Unix.) ITS, TECO, and Emacs were products of a group of people at MIT
 297 who called themselves ``hackers'', who shared an idealistic belief
 298 system about the free exchange of information and were fanatical in
 299 their devotion to and time spent with computers. (The hacker
 300 subculture dates back to the late 1950's at MIT and is described in
 301 detail in Steven Levy's book @cite{Hackers}.  This book also includes
 302 a lot of information about Stallman himself and the development of
 303 Lisp, a programming language developed at MIT that underlies Emacs.)
 304
 305 @menu
 306 * Through Version 18::          Unification prevails.
 307 * Lucid Emacs::                 One version 19 Emacs.
 308 * GNU Emacs 19::                The other version 19 Emacs.
 309 * GNU Emacs 20::                The other version 20 Emacs.
 310 * XEmacs::                      The continuation of Lucid Emacs.
 311 @end menu
 312
 313 @node Through Version 18
 314 @section Through Version 18
 315 @cindex Gosling, James
 316 @cindex Great Usenet Renaming
 317
 318   Although the history of the early versions of GNU Emacs is unclear,
 319 it is well known from the middle of 1985 onward.  A time line is:
 320
 321 @itemize @bullet
 322 @item
 323 GNU Emacs version 15 (15.34) was released sometime in 1984 or 1985 and
 324 shared some code with a version of Emacs written by James Gosling (the
 325 same James Gosling who later created the Java language).
 326 @item
 327 GNU Emacs version 16 (first released version was 16.56) was released on
 328 July 15, 1985.  All Gosling code was removed due to potential copyright
 329 problems with the code.
 330 @item
 331 version 16.57: released on September 16, 1985.
 332 @item
 333 versions 16.58, 16.59: released on September 17, 1985.
 334 @item
 335 version 16.60: released on September 19, 1985.  These later version 16's
 336 incorporated patches from the net, esp. for getting Emacs to work under
 337 System V.
 338 @item
 339 version 17.36 (first official v17 release) released on December 20,
 340 1985.  Included a TeX-able user manual.  First official unpatched
 341 version that worked on vanilla System V machines.
 342 @item
 343 version 17.43 (second official v17 release) released on January 25,
 344 1986.
 345 @item
 346 version 17.45 released on January 30, 1986.
 347 @item
 348 version 17.46 released on February 4, 1986.
 349 @item
 350 version 17.48 released on February 10, 1986.
 351 @item
 352 version 17.49 released on February 12, 1986.
 353 @item
 354 version 17.55 released on March 18, 1986.
 355 @item
 356 version 17.57 released on March 27, 1986.
 357 @item
 358 version 17.58 released on April 4, 1986.
 359 @item
 360 version 17.61 released on April 12, 1986.
 361 @item
 362 version 17.63 released on May 7, 1986.
 363 @item
 364 version 17.64 released on May 12, 1986.
 365 @item
 366 version 18.24 (a beta version) released on October 2, 1986.
 367 @item
 368 version 18.30 (a beta version) released on November 15, 1986.
 369 @item
 370 version 18.31 (a beta version) released on November 23, 1986.
 371 @item
 372 version 18.32 (a beta version) released on December 7, 1986.
 373 @item
 374 version 18.33 (a beta version) released on December 12, 1986.
 375 @item
 376 version 18.35 (a beta version) released on January 5, 1987.
 377 @item
 378 version 18.36 (a beta version) released on January 21, 1987.
 379 @item
 380 January 27, 1987: The Great Usenet Renaming.  net.emacs is now
 381 comp.emacs.
 382 @item
 383 version 18.37 (a beta version) released on February 12, 1987.
 384 @item
 385 version 18.38 (a beta version) released on March 3, 1987.
 386 @item
 387 version 18.39 (a beta version) released on March 14, 1987.
 388 @item
 389 version 18.40 (a beta version) released on March 18, 1987.
 390 @item
 391 version 18.41 (the first ``official'' release) released on March 22,
 392 1987.
 393 @item
 394 version 18.45 released on June 2, 1987.
 395 @item
 396 version 18.46 released on June 9, 1987.
 397 @item
 398 version 18.47 released on June 18, 1987.
 399 @item
 400 version 18.48 released on September 3, 1987.
 401 @item
 402 version 18.49 released on September 18, 1987.
 403 @item
 404 version 18.50 released on February 13, 1988.
 405 @item
 406 version 18.51 released on May 7, 1988.
 407 @item
 408 version 18.52 released on September 1, 1988.
 409 @item
 410 version 18.53 released on February 24, 1989.
 411 @item
 412 version 18.54 released on April 26, 1989.
 413 @item
 414 version 18.55 released on August 23, 1989.  This is the earliest version
 415 that is still available by FTP.
 416 @item
 417 version 18.56 released on January 17, 1991.
 418 @item
 419 version 18.57 released late January, 1991.
 420 @item
 421 version 18.58 released ?????.
 422 @item
 423 version 18.59 released October 31, 1992.
 424 @end itemize
 425
 426 @node Lucid Emacs
 427 @section Lucid Emacs
 428 @cindex Lucid Emacs
 429 @cindex Lucid Inc.
 430 @cindex Energize
 431 @cindex Epoch
 432
 433   Lucid Emacs was developed by the (now-defunct) Lucid Inc., a maker of
 434 C++ and Lisp development environments.  It began when Lucid decided they
 435 wanted to use Emacs as the editor and cornerstone of their C++
 436 development environment (called ``Energize'').  They needed many features
 437 that were not available in the existing version of GNU Emacs (version
 438 18.5something), in particular good and integrated support for GUI
 439 elements such as mouse support, multiple fonts, multiple window-system
 440 windows, etc.  A branch of GNU Emacs called Epoch, written at the
 441 University of Illinois, existed that supplied many of these features;
 442 however, Lucid needed more than what existed in Epoch.  At the time, the
 443 Free Software Foundation was working on version 19 of Emacs (this was
 444 sometime around 1991), which was planned to have similar features, and
 445 so Lucid decided to work with the Free Software Foundation.  Their plan
 446 was to add features that they needed, and coordinate with the FSF so
 447 that the features would get included back into Emacs version 19.
 448
 449   Delays in the release of version 19 occurred, however (resulting in it
 450 finally being released more than a year after what was initially
 451 planned), and Lucid encountered unexpected technical resistance in
 452 getting their changes merged back into version 19, so they decided to
 453 release their own version of Emacs, which became Lucid Emacs 19.0.
 454
 455 @cindex Zawinski, Jamie
 456 @cindex Sexton, Harlan
 457 @cindex Benson, Eric
 458 @cindex Devin, Matthieu
 459   The initial authors of Lucid Emacs were Matthieu Devin, Harlan Sexton,
 460 and Eric Benson, and the work was later taken over by Jamie Zawinski,
 461 who became ``Mr. Lucid Emacs'' for many releases.
 462
 463   A time line for Lucid Emacs/XEmacs is:
 464
 465 @itemize @bullet
 466 @item
 467 version 19.0 shipped with Energize 1.0, April 1992.
 468 @item
 469 version 19.1 released June 4, 1992.
 470 @item
 471 version 19.2 released June 19, 1992.
 472 @item
 473 version 19.3 released September 9, 1992.
 474 @item
 475 version 19.4 released January 21, 1993.
 476 @item
 477 version 19.5 was a repackaging of 19.4 with a few bug fixes and
 478 shipped with Energize 2.0.  Never released to the net.
 479 @item
 480 version 19.6 released April 9, 1993.
 481 @item
 482 version 19.7 was a repackaging of 19.6 with a few bug fixes and
 483 shipped with Energize 2.1.  Never released to the net.
 484 @item
 485 version 19.8 released September 6, 1993.
 486 @item
 487 version 19.9 released January 12, 1994.
 488 @item
 489 version 19.10 released May 27, 1994.
 490 @item
 491 version 19.11 (first XEmacs) released September 13, 1994.
 492 @item
 493 version 19.12 released June 23, 1995.
 494 @item
 495 version 19.13 released September 1, 1995.
 496 @item
 497 version 19.14 released June 23, 1996.
 498 @item
 499 version 20.0 released February 9, 1997.
 500 @item
 501 version 19.15 released March 28, 1997.
 502 @item
 503 version 20.1 (not released to the net) April 15, 1997.
 504 @item
 505 version 20.2 released May 16, 1997.
 506 @item
 507 version 19.16 released October 31, 1997.
 508 @item
 509 version 20.3 (the first stable version of XEmacs 20.x) released November 30, 1997.
 510 @item
 511 version 20.4 released February 28, 1998.
 512 @end itemize
 513
 514 @node GNU Emacs 19
 515 @section GNU Emacs 19
 516 @cindex GNU Emacs 19
 517 @cindex FSF Emacs
 518
 519   About a year after the initial release of Lucid Emacs, the FSF
 520 released a beta of their version of Emacs 19 (referred to here as ``GNU
 521 Emacs'').  By this time, the current version of Lucid Emacs was
 522 19.6. (Strangely, the first released beta from the FSF was GNU Emacs
 523 19.7.) A time line for GNU Emacs version 19 is:
 524
 525 @itemize @bullet
 526 @item
 527 version 19.8 (beta) released May 27, 1993.
 528 @item
 529 version 19.9 (beta) released May 27, 1993.
 530 @item
 531 version 19.10 (beta) released May 30, 1993.
 532 @item
 533 version 19.11 (beta) released June 1, 1993.
 534 @item
 535 version 19.12 (beta) released June 2, 1993.
 536 @item
 537 version 19.13 (beta) released June 8, 1993.
 538 @item
 539 version 19.14 (beta) released June 17, 1993.
 540 @item
 541 version 19.15 (beta) released June 19, 1993.
 542 @item
 543 version 19.16 (beta) released July 6, 1993.
 544 @item
 545 version 19.17 (beta) released late July, 1993.
 546 @item
 547 version 19.18 (beta) released August 9, 1993.
 548 @item
 549 version 19.19 (beta) released August 15, 1993.
 550 @item
 551 version 19.20 (beta) released November 17, 1993.
 552 @item
 553 version 19.21 (beta) released November 17, 1993.
 554 @item
 555 version 19.22 (beta) released November 28, 1993.
 556 @item
 557 version 19.23 (beta) released May 17, 1994.
 558 @item
 559 version 19.24 (beta) released May 16, 1994.
 560 @item
 561 version 19.25 (beta) released June 3, 1994.
 562 @item
 563 version 19.26 (beta) released September 11, 1994.
 564 @item
 565 version 19.27 (beta) released September 14, 1994.
 566 @item
 567 version 19.28 (first ``official'' release) released November 1, 1994.
 568 @item
 569 version 19.29 released June 21, 1995.
 570 @item
 571 version 19.30 released November 24, 1995.
 572 @item
 573 version 19.31 released May 25, 1996.
 574 @item
 575 version 19.32 released July 31, 1996.
 576 @item
 577 version 19.33 released August 11, 1996.
 578 @item
 579 version 19.34 released August 21, 1996.
 580 @item
 581 version 19.34b released September 6, 1996.
 582 @end itemize
 583
 584 @cindex Mlynarik, Richard
 585   In some ways, GNU Emacs 19 was better than Lucid Emacs; in some ways,
 586 worse.  Lucid soon began incorporating features from GNU Emacs 19 into
 587 Lucid Emacs; the work was mostly done by Richard Mlynarik, who had been
 588 working on and using GNU Emacs for a long time (back as far as version
 589 16 or 17).
 590
 591 @node GNU Emacs 20
 592 @section GNU Emacs 20
 593 @cindex GNU Emacs 20
 594 @cindex FSF Emacs
 595
 596 On February 2, 1997, work began on GNU Emacs to integrate Mule.  The first
 597 release was made in September of that year.
 598
 599 A time line for Emacs 20 is:
 600
 601 @itemize @bullet
 602 @item
 603 version 20.1 released September 17, 1997.
 604 @item
 605 version 20.2 released September 20, 1997.
 606 @item
 607 version 20.3 released August 19, 1998.
 608 @end itemize
 609
 610 @node XEmacs
 611 @section XEmacs
 612 @cindex XEmacs
 613
 614 @cindex Sun Microsystems
 615 @cindex University of Illinois
 616 @cindex Illinois, University of
 617 @cindex SPARCWorks
 618 @cindex Andreessen, Marc
 619 @cindex Baur, Steve
 620 @cindex Buchholz, Martin
 621 @cindex Kaplan, Simon
 622 @cindex Wing, Ben
 623 @cindex Thompson, Chuck
 624 @cindex Win-Emacs
 625 @cindex Epoch
 626 @cindex Amdahl Corporation
 627   Around the time that Lucid was developing Energize, Sun Microsystems
 628 was developing their own development environment (called ``SPARCWorks'')
 629 and also decided to use Emacs.  They joined forces with the Epoch team
 630 at the University of Illinois and later with Lucid.  The maintainer of
 631 the last-released version of Epoch was Marc Andreessen, but he dropped
 632 out and the Epoch project, headed by Simon Kaplan, lured Chuck Thompson
 633 away from a system administration job to become the primary Lucid Emacs
 634 author for Epoch and Sun.  Chuck's area of specialty became the
 635 redisplay engine (he replaced the old Lucid Emacs redisplay engine with
 636 a ported version from Epoch and then later rewrote it from scratch).
 637 Sun also hired Ben Wing (the author of Win-Emacs, a port of Lucid Emacs
 638 to Microsoft Windows 3.1) in 1993, for what was initially a one-month
 639 contract to fix some event problems but later became a many-year
 640 involvement, punctuated by a six-month contract with Amdahl Corporation.
 641
 642 @cindex rename to XEmacs
 643   In 1994, Sun and Lucid agreed to rename Lucid Emacs to XEmacs (a name
 644 that favored neither company); the first release called XEmacs was
 645 version 19.11.  In June 1994, Lucid folded and Jamie Zawinski quit to
 646 work for the newly formed Mosaic Communications Corp., later Netscape
 647 Communications Corp. (co-founded by the same Marc Andreessen, who had
 648 quit his Epoch job to work on a graphical browser for the World Wide
 649 Web).  Chuck then became the primary maintainer of XEmacs, and put out
 650 versions 19.11 through 19.14 in conjunction with Ben.  For 19.12 and
 651 19.13, Chuck added the new redisplay and many other display improvements
 652 and Ben added MULE support (support for Asian and other languages) and
 653 redesigned most of the internal Lisp subsystems to better support the
 654 MULE work and the various other features being added to XEmacs.  After
 655 19.14 Chuck retired as primary maintainer and Steve Baur stepped in.
 656
 657 @cindex MULE merged XEmacs appears
 658   Soon after 19.13 was released, work began in earnest on the MULE
 659 internationalization code and the source tree was divided into two
 660 development paths.  The MULE version was initially called 19.20, but was
 661 soon renamed to 20.0.  In 1996 Martin Buchholz of Sun Microsystems took
 662 over the care and feeding of it and worked on it in parallel with the
 663 19.14 development that was occurring at the same time.  After much work
 664 by Martin, it was decided to release 20.0 ahead of 19.15 in February
 665 1997.  The source tree remained divided until 20.2 when the version 19
 666 source was finally retired at version 19.16.
 667
 668 @cindex Baur, Steve
 669 @cindex Buchholz, Martin
 670 @cindex Jones, Kyle
 671 @cindex Niksic, Hrvoje
 672 @cindex XEmacs goes it alone
 673   In 1997, Sun finally dropped all pretense of support for XEmacs and
 674 Martin Buchholz left the company in November.  Since then (and, because
 675 Steve Baur was never paid to work on XEmacs, for most of the previous
 676 year as well), XEmacs has existed solely on the contributions of
 677 volunteers from the free software community.  Starting in 1997, Hrvoje
 678 Niksic and Kyle Jones have figured prominently in XEmacs development.
 679
 680 @cindex merging attempts
 681   Many attempts have been made to merge XEmacs and GNU Emacs, but they
 682 have consistently failed.
 683
 684   A more detailed history is contained in the XEmacs About page.
 685
 686 @node XEmacs From the Outside, The Lisp Language, A History of Emacs, Top
 687 @chapter XEmacs From the Outside
 688 @cindex read-eval-print
 689
 690   XEmacs appears to the outside world as an editor, but it is really a
 691 Lisp environment.  At its heart is a Lisp interpreter; it also
 692 ``happens'' to contain many specialized object types (e.g. buffers,
 693 windows, frames, events) that are useful for implementing an editor.
 694 Some of these objects (in particular windows and frames) have
 695 displayable representations, and XEmacs provides a function
 696 @code{redisplay()} that ensures that the display of all such objects
 697 matches their internal state.  Most of the time, a standard Lisp
 698 environment is in a @dfn{read-eval-print} loop -- i.e. ``read some Lisp
 699 code, execute it, and print the results''.  XEmacs has a similar loop:
 700
 701 @itemize @bullet
 702 @item
 703 read an event
 704 @item
 705 dispatch the event (i.e. ``do it'')
 706 @item
 707 redisplay
 708 @end itemize
 709
 710   Reading an event is done using the Lisp function @code{next-event},
 711 which waits for something to happen (typically, the user presses a key
 712 or moves the mouse) and returns an event object describing this.
 713 Dispatching an event is done using the Lisp function
 714 @code{dispatch-event}, which looks up the event in a keymap object (a
 715 particular kind of object that associates an event with a Lisp function)
 716 and calls that function.  The function ``does'' what the user has
 717 requested by changing the state of particular frame objects, buffer
 718 objects, etc.  Finally, @code{redisplay()} is called, which updates the
 719 display to reflect those changes just made.  Thus is an ``editor'' born.
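
  The read/dispatch/redisplay cycle described above can be illustrated
with a small, self-contained C sketch.  This is only a toy model and not
XEmacs code: every name in it (@code{toy_editor}, @code{toy_keymap},
@code{toy_next_event}, @code{toy_dispatch_event}, @code{toy_redisplay}
and so on) is hypothetical, events are reduced to single characters, the
keymap is a flat table of function pointers, and ``redisplay'' merely
counts how often it was called.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical editor state: one small text buffer and a redisplay count. */
struct toy_editor {
    char text[64];
    size_t len;
    int redisplays;
};

/* A command is a function run in response to one event (here, one key). */
typedef void (*toy_command)(struct toy_editor *ed, char key);

/* A toy "keymap": a table that associates each 7-bit key with a command. */
struct toy_keymap {
    toy_command bindings[128];
};

/* Command: append the key to the buffer if there is room. */
void toy_self_insert(struct toy_editor *ed, char key)
{
    if (ed->len + 1 < sizeof ed->text) {
        ed->text[ed->len++] = key;
        ed->text[ed->len] = '\0';
    }
}

/* Command: remove the last character, if any. */
void toy_delete_backward(struct toy_editor *ed, char key)
{
    (void) key;
    if (ed->len > 0)
        ed->text[--ed->len] = '\0';
}

/* Bind printable ASCII to self-insertion and backspace to deletion. */
void toy_keymap_init(struct toy_keymap *km)
{
    int c;
    for (c = 0; c < 128; c++)
        km->bindings[c] = NULL;
    for (c = ' '; c <= '~'; c++)
        km->bindings[c] = toy_self_insert;
    km->bindings['\b'] = toy_delete_backward;
}

/* Read step: return the next pending key, or 0 when the input runs out. */
char toy_next_event(const char **pending)
{
    return **pending ? *(*pending)++ : '\0';
}

/* Dispatch step: look the event up in the keymap and call its command. */
void toy_dispatch_event(const struct toy_keymap *km,
                        struct toy_editor *ed, char key)
{
    unsigned char k = (unsigned char) key;
    if (k < 128 && km->bindings[k] != NULL)
        km->bindings[k](ed, key);
}

/* Redisplay step: a real editor would repaint; this one only counts. */
void toy_redisplay(struct toy_editor *ed)
{
    ed->redisplays++;
}

/* The loop itself: read an event, dispatch it, redisplay, repeat. */
void toy_event_loop(const struct toy_keymap *km,
                    struct toy_editor *ed, const char *input)
{
    char key;
    assert(km != NULL && ed != NULL && input != NULL);
    while ((key = toy_next_event(&input)) != '\0') {
        toy_dispatch_event(km, ed, key);
        toy_redisplay(ed);
    }
}
```

  Under these assumptions, initializing a keymap with
@code{toy_keymap_init}, zeroing a @code{toy_editor}, and feeding the
loop the four keys @code{"ab\bc"} leaves the buffer holding @code{ac}
after four redisplays.  The point of the sketch is that the commands
only change internal state; the display is brought up to date by a
separate step after each dispatch.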
 720
 721 @cindex bridge, playing
 722 @cindex taxes, doing
 723 @cindex pi, calculating
 724   Note that you do not have to use XEmacs as an editor; you could just
 725 as well make it do your taxes, compute pi, play bridge, etc.  You'd just
 726 have to write functions to do those operations in Lisp.
 727
 728 @node The Lisp Language, XEmacs From the Perspective of Building, XEmacs From the Outside, Top
 729 @chapter The Lisp Language
 730 @cindex Lisp vs. C
 731 @cindex C vs. Lisp
 732 @cindex Lisp vs. Java
 733 @cindex Java vs. Lisp
 734 @cindex dynamic scoping
 735 @cindex scoping, dynamic
 736 @cindex dynamic types
 737 @cindex types, dynamic
 738 @cindex Java
 739 @cindex Common Lisp
 740 @cindex Gosling, James
 741
 742   Lisp is a general-purpose language that is higher-level than C and in
 743 many ways more powerful than C.  Powerful dialects of Lisp such as
 744 Common Lisp are probably much better languages for writing very large
 745 applications than is C. (Unfortunately, for many non-technical
 746 reasons C and its successor C++ have become the dominant languages for
 747 application development.  These languages are both inadequate for
 748 extremely large applications, which is evidenced by the fact that newer,
 749 larger programs are becoming ever harder to write and are requiring ever
 750 more programmers despite great improvements in C development environments;
 751 and by the fact that, although hardware speeds and reliability have been
 752 growing at an exponential rate, most software is still generally
 753 considered to be slow and buggy.)
 754
 755   The new Java language holds promise as a better general-purpose
 756 development language than C.  Java has many features in common with
 757 Lisp that are not shared by C (this is not a coincidence, since
 758 Java was designed by James Gosling, a former Lisp hacker).  This
 759 will be discussed more later.
 760
 761 For those used to C, here is a summary of the basic differences between
 762 C and Lisp:
 763
 764 @enumerate
 765 @item
 766 Lisp has an extremely regular syntax.  Every function, expression,
 767 and control statement is written in the form
 768
 769 @example
 770    (@var{func} @var{arg1} @var{arg2} ...)
 771 @end example
 772
 773 This is as opposed to C, which writes functions as
 774
 775 @example
 776    func(@var{arg1}, @var{arg2}, ...)
 777 @end example
 778
 779 but writes expressions involving operators as (e.g.)
 780
 781 @example
 782    @var{arg1} + @var{arg2}
 783 @end example
 784
 785 and writes control statements as (e.g.)
 786
 787 @example
 788    while (@var{expr}) @{ @var{statement1}; @var{statement2}; ... @}
 789 @end example
 790
 791 Lisp equivalents of the latter two would be
 792
 793 @example
 794    (+ @var{arg1} @var{arg2} ...)
 795 @end example
 796
 797 and
 798
 799 @example
 800    (while @var{expr} @var{statement1} @var{statement2} ...)
 801 @end example
 802
 803 @item
 804 Lisp is a safe language.  Assuming there are no bugs in the Lisp
 805 interpreter/compiler, it is impossible to write a program that ``core
 806 dumps'' or otherwise causes the machine to execute an illegal
 807 instruction.  This is very different from C, where perhaps the most
 808 common outcome of a bug is exactly such a crash.  A corollary of this is that
 809 the C operation of casting a pointer is impossible (and unnecessary) in
 810 Lisp, and that it is impossible to access memory outside the bounds of
 811 an array.
 812
 813 @item
 814 Programs and data are written in the same form.  The
 815 parenthesis-enclosing form described above for statements is the same
 816 form used for the most common data type in Lisp, the list.  Thus, it is
 817 possible to represent any Lisp program using Lisp data types, and for
 818 one program to construct Lisp statements and then dynamically
 819 @dfn{evaluate} them, or cause them to execute.
 820
 821 @item
 822 All objects are @dfn{dynamically typed}.  This means that part of every
 823 object is an indication of what type it is.  A Lisp program can
 824 manipulate an object without knowing what type it is, and can query an
 825 object to determine its type.  This means that, correspondingly,
 826 variables and function parameters can hold objects of any type and are
 827 not normally declared as being of any particular type.  This is opposed
 828 to the @dfn{static typing} of C, where variables can hold exactly one
 829 type of object and must be declared as such, and objects do not contain
 830 an indication of their type because it's implicit in the variables they
 831 are stored in.  It is possible in C to have a variable hold different
 832 types of objects (e.g. through the use of @code{void *} pointers or
 833 variable-argument functions), but the type information must then be
 834 passed explicitly in some other fashion, leading to additional program
 835 complexity.
 836
 837 @item
 838 Allocated memory is automatically reclaimed when it is no longer in use.
 839 This operation is called @dfn{garbage collection} and involves looking
 840 through all variables to see what memory is being pointed to, and
 841 reclaiming any memory that is not pointed to and is thus
 842 ``inaccessible'' and out of use.  This is as opposed to C, in which
 843 allocated memory must be explicitly reclaimed using @code{free()}.  If
 844 you simply drop all pointers to memory without freeing it, it becomes
 845 ``leaked'' memory that still takes up space.  Over a long period of
 846 time, this can cause your program to grow and grow until it runs out of
 847 memory.
 848
 849 @item
 850 Lisp has built-in facilities for handling errors and exceptions.  In C,
 851 when an error occurs, usually either the program exits entirely or the
 852 routine in which the error occurs returns a value indicating this.  If
 853 an error occurs in a deeply-nested routine, then every routine currently
 854 called must unwind itself normally and return an error value back up to
 855 the next routine.  This means that every routine must explicitly check
 856 for an error in all the routines it calls; if it does not do so,
 857 unexpected and often random behavior results.  This is an extremely
 858 common source of bugs in C programs.  An alternative would be to do a
 859 non-local exit using @code{longjmp()}, but that is often very dangerous
 860 because the routines that were exited past had no opportunity to clean
 861 up after themselves and may leave things in an inconsistent state,
 862 causing a crash shortly afterwards.
 863
 864 Lisp provides mechanisms to make such non-local exits safe.  When an
 865 error occurs, a routine simply signals that an error of a particular
 866 class has occurred, and a non-local exit takes place.  Any routine can
 867 trap errors occurring in routines it calls by registering an error
 868 handler for some or all classes of errors. (If no handler is registered,
 869 a default handler, generally installed by the top-level event loop, is
 870 executed; this prints out the error and continues.) Routines can also
 871 specify cleanup code (called an @dfn{unwind-protect}) that will be
 872 called when control exits from a block of code, no matter how that exit
 873 occurs -- i.e. even if a function deeply nested below it causes a
 874 non-local exit back to the top level.
 875
 876 Note that this facility has appeared in some recent vintages of C, in
 877 particular Visual C++ and other PC compilers written for the Microsoft
 878 Win32 API.
 879
 880 @item
 881 In Emacs Lisp, local variables are @dfn{dynamically scoped}.  This means
 882 that if you declare a local variable in a particular function, and then
 883 call another function, that subfunction can ``see'' the local variable
 884 you declared.  This is actually considered a bug in Emacs Lisp and in
 885 all other early dialects of Lisp, and was corrected in Common Lisp. (In
 886 Common Lisp, you can still declare dynamically scoped variables if you
 887 want to -- they are sometimes useful -- but variables by default are
 888 @dfn{lexically scoped} as in C.)
 889 @end enumerate
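
Several of the points above can be seen in a few lines of Emacs Lisp.
The following is only a minimal sketch: the names @code{sketch-depth},
@code{sketch-callee} and @code{sketch-caller} are made up for
illustration and are not part of XEmacs.

@example
;; Programs are data: build a form as a list, then evaluate it.
(eval (list '+ 1 2))                    ; evaluates (+ 1 2)

;; Dynamic typing: one variable holds objects of different types,
;; and each object can be asked what it is.
(let ((x 42))
  (integerp x)                          ; non-nil
  (setq x "forty-two")
  (stringp x))                          ; non-nil

;; Error handling: trap a signalled error, and run cleanup code no
;; matter how control leaves the protected form.
(condition-case err
    (unwind-protect
        (/ 1 0)                         ; signals `arith-error'
      (message "cleanup runs first"))
  (arith-error (message "trapped: %S" err)))

;; Dynamic scoping: the callee sees the caller's local binding.
(defvar sketch-depth 0)
(defun sketch-callee () sketch-depth)
(defun sketch-caller ()
  (let ((sketch-depth 1))
    (sketch-callee)))
(sketch-caller)                         ; returns 1, not 0
@end example

Under lexical scoping, as in C or Common Lisp's default, the binding
made inside @code{sketch-caller} would be invisible to
@code{sketch-callee} unless the variable were explicitly declared
dynamic.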
 890
 891 For those familiar with Lisp, Emacs Lisp is modelled after MacLisp, an
 892 early dialect of Lisp developed at MIT (no relation to the Macintosh
 893 computer).  There is a Common Lisp compatibility package available for
 894 Emacs that provides many of the features of Common Lisp.
 895
 896 The Java language is derived in many ways from C, and shares a similar
 897 syntax, but has the following features in common with Lisp (and different
 898 from C):
 899
 900 @enumerate
 901 @item
 902 Java is a safe language, like Lisp.
 903 @item
 904 Java provides garbage collection, like Lisp.
 905 @item
 906 Java has built-in facilities for handling errors and exceptions, like
 907 Lisp.
 908 @item
 909 Java has a type system that combines the best advantages of both static
 910 and dynamic typing.  Objects (except very simple types) are explicitly
 911 marked with their type, as in dynamic typing; but there is a hierarchy
 912 of types and functions are declared to accept only certain types, thus
 913 providing the increased compile-time error-checking of static typing.
 914 @end enumerate
 915
 916 The Java language also has some negative attributes:
 917
 918 @enumerate
 919 @item
 920 Java uses the edit/compile/run model of software development.  This
 921 makes it hard to use interactively.  For example, to use Java like
 922 @code{bc} it is necessary to write a special purpose, albeit tiny,
 923 application.  In Emacs Lisp, a calculator comes built-in without any
 924 effort -- one can always just type an expression in the @code{*scratch*}
 925 buffer.
 926 @item
 927 Java tries too hard to enforce, not merely enable, portability, making
 928 ordinary access to standard OS facilities painful.  Java has an
 929 @dfn{agenda}.  I think this is why @code{chdir} is not part of standard
 930 Java, which is inexcusable.
 931 @end enumerate
 932
 933 Unfortunately, there is no perfect language.  Static typing allows a
 934 compiler to catch programmer errors and produce more efficient code, but
 935 makes programming more tedious and less fun.  For the foreseeable future,
 936 an Ideal Editing and Programming Environment (and that is what XEmacs
 937 aspires to) will be programmable in multiple languages: high level ones
 938 like Lisp for user customization and prototyping, and lower level ones
 939 for infrastructure and industrial strength applications.  If I had my
 940 way, XEmacs would be friendly towards the Python, Scheme, C++, ML, and
 941 other communities.  But there are serious technical difficulties to
 942 achieving that goal.
 943
 944 The word @dfn{application} in the previous paragraph was used
 945 intentionally.  XEmacs implements an API for programs written in Lisp
 946 that makes it a full-fledged application platform, very much like an OS
 947 inside the real OS.
 948
 949 @node XEmacs From the Perspective of Building, XEmacs From the Inside, The Lisp Language, Top
 950 @chapter XEmacs From the Perspective of Building
 951
 952 The heart of XEmacs is the Lisp environment, which is written in C.
 953 This is contained in the @file{src/} subdirectory.  Underneath
 954 @file{src/} are two subdirectories of header files: @file{s/} (header
 955 files for particular operating systems) and @file{m/} (header files for
 956 particular machine types).  In practice the distinction between the two
 957 types of header files is blurred.  These header files define or undefine
 958 certain preprocessor constants and macros to indicate particular
 959 characteristics of the associated machine or operating system.  As part
 960 of the configure process, one @file{s/} file and one @file{m/} file are
 961 identified for the particular environment in which XEmacs is being
 962 built.
 963
 964 XEmacs also contains a great deal of Lisp code.  This implements the
 965 operations that make XEmacs useful as an editor as well as just a Lisp
 966 environment, and also contains many add-on packages that allow XEmacs to
 967 browse directories, act as a mail and Usenet news reader, compile Lisp
 968 code, etc.  There is actually more Lisp code than C code associated with
 969 XEmacs, but much of the Lisp code is peripheral to the actual operation
 970 of the editor.  The Lisp code all lies in subdirectories underneath the
 971 @file{lisp/} directory.
 972
 973 The @file{lwlib/} directory contains C code that implements a
 974 generalized interface onto different X widget toolkits and also
 975 implements some widgets of its own that behave like Motif widgets but
 976 are faster, free, and in some cases more powerful.  The code in this
 977 directory compiles into a library and is mostly independent from XEmacs.
 978
 979 The @file{etc/} directory contains various data files associated with
 980 XEmacs.  Some of them are actually read by XEmacs at startup; others
 981 merely contain useful information of various sorts.
 982
 983 The @file{lib-src/} directory contains C code for various auxiliary
 984 programs that are used in connection with XEmacs.  Some of them are used
 985 during the build process; others are used to perform certain functions
 986 that cannot conveniently be placed in the XEmacs executable (e.g. the
 987 @file{movemail} program for fetching mail out of @file{/var/spool/mail},
 988 which must be setgid to @file{mail} on many systems; and the
 989 @file{gnuclient} program, which allows an external script to communicate
 990 with a running XEmacs process).
 991
 992 The @file{man/} directory contains the sources for the XEmacs
 993 documentation.  It is mostly in a form called Texinfo, which can be
 994 converted into either a printed document (by passing it through @TeX{})
 995 or into on-line documentation called @dfn{info files}.
 996
 997 The @file{info/} directory contains the results of formatting the XEmacs
 998 documentation as @dfn{info files}, for on-line use.  These files are
 999 used when you enter the Info system using @kbd{C-h i} or through the
1000 Help menu.
1001
1002 The @file{dynodump/} directory contains auxiliary code used to build
1003 XEmacs on Solaris platforms.
1004
1005 The other directories contain various miscellaneous code and information
1006 that is not normally used or needed.
1007
1008 The first step of building involves running the @file{configure} program
1009 and passing it various parameters to specify any optional features you
1010 want and compiler arguments and such, as described in the @file{INSTALL}
1011 file.  This determines what the build environment is, chooses the
1012 appropriate @file{s/} and @file{m/} files, and runs a series of tests to
1013 determine many details about your environment, such as which library
1014 functions are available and exactly how they work.  The reason for
1015 running these tests is that it allows XEmacs to be compiled on a much
1016 wider variety of platforms than those that the XEmacs developers happen
1017 to be familiar with, including various sorts of hybrid platforms.  This
1018 is especially important now that many operating systems give you a great
1019 deal of control over exactly what features you want installed, and allow
1020 for easy upgrading of parts of a system without upgrading the rest.  It
1021 would be impossible to pre-determine and pre-specify the information for
1022 all possible configurations.
1023
1024 In fact, the @file{s/} and @file{m/} files are basically @emph{evil},
1025 since they contain unmaintainable platform-specific hard-coded
1026 information.  XEmacs has been moving in the direction of having all
1027 system-specific information be determined dynamically by
1028 @file{configure}.  Perhaps someday we can @code{rm -rf src/s src/m}.
1029
1030 When @file{configure} is done running, it generates @file{Makefile}s and
1031 @file{GNUmakefile}s and the file @file{src/config.h} (which describes
1032 the features of your system) from template files.  You then run
1033 @file{make}, which compiles the auxiliary code and programs in
1034 @file{lib-src/} and @file{lwlib/} and the main XEmacs executable in
1035 @file{src/}.  The result of compiling and linking is an executable
1036 called @file{temacs}, which is @emph{not} the final XEmacs executable.
1037 @file{temacs} by itself is not intended to function as an editor or even
1038 display any windows on the screen, and if you simply run it, it will
1039 exit immediately.  The @file{Makefile} runs @file{temacs} with certain
1040 options that cause it to initialize itself, read in a number of basic
1041 Lisp files, and then dump itself out into a new executable called
1042 @file{xemacs}.  This new executable has been pre-initialized and
1043 contains pre-digested Lisp code that is necessary for the editor to
1044 function (this includes most basic editing functions,
1045 e.g. @code{kill-line}, that can be defined in terms of other Lisp
1046 primitives; some initialization code that is called when certain
1047 objects, such as frames, are created; and all of the standard
1048 keybindings and code for the actions they result in).  This executable,
1049 @file{xemacs}, is the executable that you run to use the XEmacs editor.
1050
1051 Although @file{temacs} is not intended to be run as an editor, it can,
1052 by using the incantation @code{temacs -batch -l loadup.el run-temacs}.
1053 This is useful when the dumping procedure described above is broken, or
1054 when using certain program debugging tools such as Purify.  These tools
1055 get mighty confused by the tricks played by the XEmacs build process,
1056 such as allocating memory in one process and freeing it in the next.
1057
1058 @node XEmacs From the Inside, The XEmacs Object System (Abstractly Speaking), XEmacs From the Perspective of Building, Top
1059 @chapter XEmacs From the Inside
1060
1061 Internally, XEmacs is quite complex, and can be very confusing.  To
1062 simplify things, it can be useful to think of XEmacs as containing an
1063 event loop that ``drives'' everything, and a number of other subsystems,
1064 such as a Lisp engine and a redisplay mechanism.  Each of these other
1065 subsystems exists simultaneously in XEmacs, and each has a certain
1066 state.  The flow of control continually passes in and out of these
1067 different subsystems in the course of normal operation of the editor.
1068
1069 It is important to keep in mind that, most of the time, the editor is
1070 ``driven'' by the event loop.  Except during initialization and batch
1071 mode, all subsystems are entered directly or indirectly through the
1072 event loop, and ultimately, control exits out of all subsystems back up
1073 to the event loop.  This cycle of entering a subsystem, exiting back out
1074 to the event loop, and starting another iteration of the event loop
1075 occurs once each keystroke, mouse motion, etc.
1076
1077 If you're trying to understand a particular subsystem (other than the
1078 event loop), think of it as a ``daemon'' process or ``servant'' that is
1079 responsible for one particular aspect of a larger system, and
1080 periodically receives commands or environment changes that cause it to
1081 do something.  Ultimately, these commands and environment changes are
1082 always triggered by the event loop.  For example:
1083
1084 @itemize @bullet
1085 @item
1086 The window and frame mechanism is responsible for keeping track of what
1087 windows and frames exist, what buffers are in them, etc.  It is
1088 periodically given commands (usually from the user) to make a change to
1089 the current window/frame state: i.e. create a new frame, delete a
1090 window, etc.
1091
1092 @item
1093 The buffer mechanism is responsible for keeping track of what buffers
1094 exist and what text is in them.  It is periodically given commands
1095 (usually from the user) to insert or delete text, create a buffer, etc.
1096 When it receives a text-change command, it notifies the redisplay
1097 mechanism.
1098
1099 @item
1100 The redisplay mechanism is responsible for making sure that windows and
1101 frames are displayed correctly.  It is periodically told (by the event
1102 loop) to actually ``do its job'', i.e. snoop around and see what the
1103 current state of the environment (mostly of the currently-existing
1104 windows, frames, and buffers) is, and make sure that that state matches
1105 what's actually displayed.  It keeps lots and lots of information around
1106 (such as what is actually being displayed currently, and what the
1107 environment was last time it checked) so that it can minimize the work
1108 it has to do.  It is also helped along in that whenever a relevant
1109 change to the environment occurs, the redisplay mechanism is told about
1110 this, so it has a pretty good idea of where it has to look to find
1111 possible changes and doesn't have to look everywhere.
1112
1113 @item
1114 The Lisp engine is responsible for executing the Lisp code in which most
1115 user commands are written.  It is entered through a call to @code{eval}
1116 or @code{funcall}, which occurs as a result of dispatching an event from
1117 the event loop.  The functions it calls issue commands to the buffer
1118 mechanism, the window/frame subsystem, etc.
1119
1120 @item
1121 The Lisp allocation subsystem is responsible for keeping track of Lisp
1122 objects.  It is given commands from the Lisp engine to allocate objects,
1123 garbage collect, etc.
1124 @end itemize
1125
1126 etc.
1127
1128   The important idea here is that there are a number of independent
1129 subsystems each with its own responsibility and persistent state, just
1130 like different employees in a company, and each subsystem is
1131 periodically given commands from other subsystems.  Commands can flow
1132 from any one subsystem to any other, but there is usually some sort of
1133 hierarchy, with all commands originating from the event subsystem.
1134
1135   XEmacs is entered in @code{main()}, which is in @file{emacs.c}.  When
1136 this is called the first time (in a properly-invoked @file{temacs}), it
1137 does the following:
1138
1139 @enumerate
1140 @item
1141 It does some very basic environment initializations, such as determining
1142 where it and its directories (e.g. @file{lisp/} and @file{etc/}) reside
1143 and setting up signal handlers.
1144 @item
1145 It initializes the entire Lisp interpreter.
1146 @item
1147 It sets the initial values of many built-in variables (including many
1148 variables that are visible to Lisp programs), such as the global keymap
1149 object and the built-in faces (a face is an object that describes the
1150 display characteristics of text).  This involves creating Lisp objects
1151 and thus is dependent on step (2).
1152 @item
1153 It performs various other initializations that are relevant to the
1154 particular environment it is running in, such as retrieving environment
1155 variables, determining the current date and the user who is running the
1156 program, examining its standard input, creating any necessary file
1157 descriptors, etc.
1158 @item
1159 At this point, the C initialization is complete.  A Lisp program that
1160 was specified on the command line (usually @file{loadup.el}) is called
1161 (@file{temacs} is normally invoked as @code{temacs -batch -l loadup.el dump}).
1162 @file{loadup.el} loads all of the other Lisp files that are needed for
1163 the operation of the editor, calls the @code{dump-emacs} function to
1164 write out @file{xemacs}, and then kills the @file{temacs} process.
1165 @end enumerate
1166
1167   When @file{xemacs} is then run, it only redoes steps (1) and (4)
1168 above; all variables already contain the values they were set to when
1169 the executable was dumped, and all memory that was allocated with
1170 @code{malloc()} is still around. (XEmacs knows whether it is being run
1171 as @file{xemacs} or @file{temacs} because it sets the global variable
1172 @code{initialized} to 1 after step (4) above.) At this point,
1173 @file{xemacs} calls a Lisp function to do any further initialization,
1174 which includes parsing the command-line (the C code can only do limited
1175 command-line parsing, which includes looking for the @samp{-batch} and
1176 @samp{-l} flags and a few other flags that it needs to know about before
1177 initialization is complete), creating the first frame (or @dfn{window}
1178 in standard window-system parlance), running the user's init file
1179 (usually the file @file{.emacs} in the user's home directory), etc.  The
1180 function to do this is usually called @code{normal-top-level};
1181 @file{loadup.el} tells the C code about this function by setting its
1182 name as the value of the Lisp variable @code{top-level}.
1183
1184   When the Lisp initialization code is done, the C code enters the event
1185 loop, and stays there for the duration of the XEmacs process.  The code
1186 for the event loop is contained in @file{keyboard.c}, and is called
1187 @code{Fcommand_loop_1()}.  Note that this event loop could very well be
1188 written in Lisp, and in fact a Lisp version exists; but apparently,
1189 doing this makes XEmacs run noticeably slower.
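
Conceptually, the event loop amounts to something like the following
Lisp.  This is a simplified sketch, not the actual code: it assumes the
Lisp primitives @code{next-event} and @code{dispatch-event}, and it
leaves out everything else the real loop has to do, such as recovering
from errors and triggering redisplay.

@example
;; Fetch the next event and hand it to whatever handles it, forever.
(while t
  (dispatch-event (next-event)))
@end example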
1190
1191   Notice how much of the initialization is done in Lisp, not in C.
1192 In general, XEmacs tries to move as much code as is possible
1193 into Lisp.  Code that remains in C is code that implements the
1194 Lisp interpreter itself, or code that needs to be very fast, or
1195 code that needs to do system calls or other such stuff that
1196 needs to be done in C, or code that needs to have access to
1197 ``forbidden'' structures. (One conscious aspect of the design of
1198 Lisp under XEmacs is a clean separation between the external
1199 interface to a Lisp object's functionality and its internal
1200 implementation.  Part of this design is that Lisp programs
1201 are forbidden from accessing the contents of the object other
1202 than through using a standard API.  In this respect, XEmacs Lisp
1203 is similar to modern Lisp dialects but differs from GNU Emacs,
1204 which tends to expose the implementation and allow Lisp
1205 programs to look at it directly.  The major advantage of
1206 hiding the implementation is that it allows the implementation
1207 to be redesigned without affecting any Lisp programs, including
1208 those that might want to be ``clever'' by looking directly at
1209 the object's contents and possibly manipulating them.)
1210
1211   Moving code into Lisp makes the code easier to debug and maintain and
1212 makes it much easier for people who are not XEmacs developers to
1213 customize XEmacs, because they can make a change with much less chance
1214 of obscure and unwanted interactions occurring than if they were to
1215 change the C code.
1216
1217 @node The XEmacs Object System (Abstractly Speaking), How Lisp Objects Are Represented in C, XEmacs From the Inside, Top
1218 @chapter The XEmacs Object System (Abstractly Speaking)
1219
1220   At the heart of the Lisp interpreter is its management of objects.
1221 XEmacs Lisp contains many built-in objects, some of which are
1222 simple and others of which can be very complex; and some of which
1223 are very common, and others of which are rarely used or are only
1224 used internally. (Since the Lisp allocation system, with its
1225 automatic reclamation of unused storage, is so much more convenient
1226 than @code{malloc()} and @code{free()}, the C code makes extensive use of it
1227 in its internal operations.)
1228
1229   The basic Lisp objects are
1230
1231 @table @code
1232 @item integer
1233 28 or 31 bits of precision, or 60 or 63 bits on 64-bit machines; the
1234 reason for this is described below when the internal Lisp object
1235 representation is described.
1236 @item float
1237 Same precision as a double in C.
1238 @item cons
1239 A simple container for two Lisp objects, used to implement lists and
1240 most other data structures in Lisp.
1241 @item char
1242 An object representing a single character of text; chars behave like
1243 integers in many ways but are logically considered text rather than
1244 numbers and have a different read syntax. (The read syntax for a char
1245 contains the char itself or some textual encoding of it -- for example,
1246 a Japanese Kanji character might be encoded as @samp{^[$(B#&^[(B} using the
1247 ISO-2022 encoding standard -- rather than the numerical representation
1248 of the char; this way, if the mapping between chars and integers
1249 changes, which is quite possible for Kanji characters and other extended
1250 characters, the same character will still be created.  Note that some
1251 primitives confuse chars and integers.  The worst culprit is @code{eq},
1252 which makes a special exception and considers a char to be @code{eq} to
1253 its integer equivalent, even though in no other case are objects of two
1254 different types @code{eq}.  The reason for this monstrosity is
1255 compatibility with existing code; the separation of char from integer
1256 came fairly recently.)
1257 @item symbol
1258 An object that contains Lisp objects and is referred to by name;
1259 symbols are used to implement variables and named functions
1260 and to provide the equivalent of preprocessor constants in C.
1261 @item vector
1262 A one-dimensional array of Lisp objects providing constant-time access
1263 to any of the objects; access to an arbitrary object in a vector is
1264 faster than for lists, but the operations that can be done on a vector
1265 are more limited.
1266 @item string
1267 Self-explanatory; behaves much like a vector of chars
1268 but has a different read syntax and is stored and manipulated
1269 more compactly.
1270 @item bit-vector
1271 A vector of bits; similar to a string in spirit.
1272 @item compiled-function
1273 An object containing compiled Lisp code, known as @dfn{byte code}.
1274 @item subr
1275 A Lisp primitive, i.e. a Lisp-callable function implemented in C.
1276 @end table
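
As a rough illustration, each of the following expressions writes an
object of one of these types using its read syntax (or fetches one, in
the case of a subr) and tests it with a type predicate.  This is a
sketch that assumes the predicates named here exist in the XEmacs being
examined and that float support was compiled in; under those
assumptions each form should return non-@code{nil}.

@example
(integerp 42)
(floatp 1.5)
(consp '(a . b))
(characterp ?a)
(symbolp 'foo)
(vectorp [1 2 3])
(stringp "abc")
(bit-vector-p #*1011)
(subrp (symbol-function 'car))
@end example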
1277
1278 @cindex closure
1279 Note that there is no basic ``function'' type, as in more powerful
1280 versions of Lisp (where it's called a @dfn{closure}).  XEmacs Lisp does
1281 not provide the closure semantics implemented by Common Lisp and Scheme.
1282 The guts of a function in XEmacs Lisp are represented in one of four
1283 ways: a symbol specifying another function (when one function is an
1284 alias for another), a list (whose first element must be the symbol
1285 @code{lambda}) containing the function's source code, a
1286 compiled-function object, or a subr object. (In other words, given a
1287 symbol specifying the name of a function, calling @code{symbol-function}
1288 to retrieve the contents of the symbol's function cell will return one
1289 of these types of objects.)
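
The four representations can be observed with @code{symbol-function}.
In this sketch, @code{sketch-double} and @code{sketch-first} are
hypothetical names introduced only for the example, and the
compiled-function case is produced with @code{byte-compile}, which is
assumed to be available.

@example
;; A subr: a primitive implemented in C.
(symbol-function 'car)

;; A list beginning with `lambda': the source of an interpreted function.
(defun sketch-double (n) (* 2 n))
(symbol-function 'sketch-double)

;; A symbol: an alias for another function.
(defalias 'sketch-first 'car)
(symbol-function 'sketch-first)

;; A compiled-function object, after byte compilation.
(byte-compile 'sketch-double)
(symbol-function 'sketch-double)
@end example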
1290
1291 XEmacs Lisp also contains numerous specialized objects used to implement
1292 the editor:
1293
1294 @table @code
1295 @item buffer
1296 Stores text like a string, but is optimized for insertion and deletion
1297 and has certain other properties that can be set.
1298 @item frame
1299 An object with various properties whose displayable representation is a
1300 @dfn{window} in window-system parlance.
1301 @item window
1302 A section of a frame that displays the contents of a buffer;
1303 often called a @dfn{pane} in window-system parlance.
1304 @item window-configuration
1305 An object that represents a saved configuration of windows in a frame.
1306 @item device
1307 An object representing a screen on which frames can be displayed;
1308 equivalent to a @dfn{display} in the X Window System and a @dfn{TTY} in
1309 character mode.
1310 @item face
1311 An object specifying the appearance of text or graphics; it has
1312 properties such as font, foreground color, and background color.
1313 @item marker
1314 An object that refers to a particular position in a buffer and moves
1315 around as text is inserted and deleted to stay in the same relative
1316 position to the text around it.
1317 @item extent
1318 Similar to a marker but covers a range of text in a buffer; can also
1319 specify properties of the text, such as a face in which the text is to
1320 be displayed, whether the text is invisible or unmodifiable, etc.
1321 @item event
1322 Generated by calling @code{next-event} and contains information
1323 describing a particular event happening in the system, such as the user
1324 pressing a key or a process terminating.
1325 @item keymap
1326 An object that maps from events (described using lists, vectors, and
1327 symbols rather than with an event object because the mapping is for
1328 classes of events, rather than individual events) to functions to
1329 execute or other events to recursively look up; the functions are
1330 described by name, using a symbol, or using lists to specify the
1331 function's code.
1332 @item glyph
1333 An object that describes the appearance of an image (e.g.  pixmap) on
1334 the screen; glyphs can be attached to the beginning or end of extents
1335 and in some future version of XEmacs will be able to be inserted
1336 directly into a buffer.
1337 @item process
1338 An object that describes a connection to an externally-running process.
1339 @end table
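
The way markers and extents follow the text they refer to can be
sketched as follows.  The buffer name @code{*sketch*} is arbitrary, and
the example assumes the extent functions @code{make-extent} and
@code{set-extent-property}; it is an illustration rather than a
recommended idiom.

@example
(let ((buf (get-buffer-create "*sketch*")))
  (save-excursion
    (set-buffer buf)
    (erase-buffer)
    (insert "world")
    ;; A marker between "wo" and "rld", i.e. at buffer position 3.
    (let ((m (copy-marker 3)))
      (goto-char (point-min))
      (insert "hello ")         ; six characters, all before the marker
      ;; An extent covering "hello", displayed in the `bold' face.
      (set-extent-property (make-extent 1 6) 'face 'bold)
      ;; The marker has moved along with its text: 3 + 6 = 9.
      (marker-position m))))
@end example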
1340
1341   There are some other, less-commonly-encountered general objects:
1342
1343 @table @code
1344 @item hash-table
1345 An object that maps from an arbitrary Lisp object to another arbitrary
1346 Lisp object, using hashing for fast lookup.
1347 @item obarray
1348 A limited form of hash-table that maps from strings to symbols; obarrays
1349 are used to look up a symbol given its name and are not actually their
1350 own object type but are kludgily represented using vectors with hidden
1351 fields (this representation derives from GNU Emacs).
1352 @item specifier
1353 A complex object used to specify the value of a display property; a
1354 default value is given and different values can be specified for
1355 particular frames, buffers, windows, devices, or classes of device.
1356 @item char-table
1357 An object that maps from chars or classes of chars to arbitrary Lisp
1358 objects; internally char tables use a complex nested-vector
1359 representation that is optimized to the way characters are represented
1360 as integers.
1361 @item range-table
1362 An object that maps from ranges of integers to arbitrary Lisp objects.
1363 @end table
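
A short sketch of two of these mapping objects follows.  It assumes the
keyword-argument form of @code{make-hash-table} and the range-table
functions @code{make-range-table}, @code{put-range-table} and
@code{get-range-table}; the keys and values are arbitrary.

@example
;; hash-table: arbitrary Lisp objects as keys and as values.
(let ((table (make-hash-table :test 'equal)))
  (puthash "buffer" 'noun table)
  (gethash "buffer" table))             ; looks up the value stored above

;; range-table: a whole range of integers maps to one value.
(let ((ranges (make-range-table)))
  (put-range-table 0 9 'digit ranges)
  (get-range-table 5 ranges))           ; 5 lies inside the range 0..9
@end example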
1364
1365   And some strange special-purpose objects:
1366
1367 @table @code
1368 @item charset
1369 @itemx coding-system
1370 Objects used when MULE, or multi-lingual/Asian-language, support is
1371 enabled.
1372 @item color-instance
1373 @itemx font-instance
1374 @itemx image-instance
1375 An object that encapsulates a window-system resource; instances are
1376 mostly used internally but are exposed on the Lisp level for cleanness
1377 of the specifier model and because it's occasionally useful for Lisp
1378 programs to create or query the properties of instances.
1379 @item subwindow
1380 An object that encapsulates a @dfn{subwindow} resource, i.e. a
1381 window-system child window that is drawn into by an external process;
1382 this object should be integrated into the glyph system but isn't yet,
1383 and may change form when this is done.
1384 @item tooltalk-message
1385 @itemx tooltalk-pattern
1386 Objects that represent resources used in the ToolTalk interprocess
1387 communication protocol.
1388 @item toolbar-button
1389 An object used in conjunction with the toolbar.
1390 @end table
1391
1392   And objects that are only used internally:
1393
1394 @table @code
1395 @item opaque
1396 A generic object for encapsulating arbitrary memory; this allows you the
1397 generality of @code{malloc()} and the convenience of the Lisp object
1398 system.
1399 @item lstream
1400 A buffering I/O stream, used to provide a unified interface to anything
1401 that can accept output or provide input, such as a file descriptor, a
1402 stdio stream, a chunk of memory, a Lisp buffer, a Lisp string, etc.;
1403 it's a Lisp object to make its memory management more convenient.
1404 @item char-table-entry
1405 Subsidiary objects in the internal char-table representation.
1406 @item extent-auxiliary
1407 @itemx menubar-data
1408 @itemx toolbar-data
1409 Various special-purpose objects that are basically just used to
1410 encapsulate memory for particular subsystems, similar to the more
1411 general ``opaque'' object.
1412 @item symbol-value-forward
1413 @itemx symbol-value-buffer-local
1414 @itemx symbol-value-varalias
1415 @itemx symbol-value-lisp-magic
1416 Special internal-only objects that are placed in the value cell of a
1417 symbol to indicate that there is something special with this variable --
1418 e.g. it has no value, it mirrors another variable, or it mirrors some C
1419 variable; there is really only one kind of object, called a
1420 @dfn{symbol-value-magic}, but it is sort-of halfway kludged into
1421 semi-different object types.
1422 @end table
1423
1424 @cindex permanent objects
1425 @cindex temporary objects
1426   Some types of objects are @dfn{permanent}, meaning that once created,
1427 they do not disappear until explicitly destroyed, using a function such
1428 as @code{delete-buffer}, @code{delete-window}, @code{delete-frame}, etc.
1429 Others will disappear once they are no longer used, through the garbage
1430 collection mechanism.  Buffers, frames, windows, devices, and processes
1431 are among the objects that are permanent.  Note that some objects can go
1432 both ways: Faces can be created either way; extents are normally
1433 permanent, but detached extents (extents not referring to any text, as
1434 happens to some extents when the text they are referring to is deleted)
1435 are temporary.  Note that some permanent objects, such as faces and
1436 coding systems, cannot be deleted.  Note also that windows are unique in
1437 that they can be @emph{undeleted} after having previously been
1438 deleted. (This happens as a result of restoring a window configuration.)
1439
1440 @cindex read syntax
1441   Note that many types of objects have a @dfn{read syntax}, i.e. a way of
1442 specifying an object of that type in Lisp code.  When you load a Lisp
1443 file, or type in code to be evaluated, what really happens is that the
1444 function @code{read} is called, which reads some text and creates an object
1445 based on the syntax of that text; then @code{eval} is called, which
1446 possibly does something special; then this loop repeats until there's
1447 no more text to read. (@code{eval} only actually does something special
1448 with symbols, which causes the symbol's value to be returned,
1449 similar to referencing a variable; and with conses [i.e. lists],
1450 which cause a function invocation.  All other values are returned
1451 unchanged.)
1452
1453   The read syntax
1454
1455 @example
1456 17297
1457 @end example
1458
1459 converts to an integer whose value is 17297.
1460
1461 @example
1462 1.983e-4
1463 @end example
1464
1465 converts to a float whose value is 1.983e-4, or .0001983.
1466
1467 @example
1468 ?b
1469 @end example
1470
1471 converts to a char that represents the lowercase letter b.
1472
1473 @example
1474 ?^[$(B#&^[(B
1475 @end example
1476
1477 (where @samp{^[} actually is an @samp{ESC} character) converts to a
1478 particular Kanji character when using an ISO2022-based coding system for
1479 input. (To decode this goo: @samp{ESC} begins an escape sequence;
1480 @samp{ESC $ (} is a class of escape sequences meaning ``switch to a
1481 94x94 character set''; @samp{ESC $ ( B} means ``switch to Japanese
1482 Kanji''; @samp{#} and @samp{&} collectively index into a 94-by-94 array
1483 of characters [subtract 33 from the ASCII value of each character to get
1484 the corresponding index]; @samp{ESC (} is a class of escape sequences
1485 meaning ``switch to a 94 character set''; @samp{ESC (B} means ``switch
1486 to US ASCII''.  It is a coincidence that the letter @samp{B} is used to
1487 denote both Japanese Kanji and US ASCII.  If the first @samp{B} were
1488 replaced with an @samp{A}, you'd be requesting a Chinese Hanzi character
1489 from the GB2312 character set.)
1490
1491 @example
1492 "foobar"
1493 @end example
1494
1495 converts to a string.
1496
1497 @example
1498 foobar
1499 @end example
1500
1501 converts to a symbol whose name is @code{"foobar"}.  This is done by
1502 looking up the string equivalent in the global variable
1503 @code{obarray}, whose contents should be an obarray.  If no symbol
1504 is found, a new symbol with the name @code{"foobar"} is automatically
1505 created and added to @code{obarray}; this process is called
1506 @dfn{interning} the symbol.
1507 @cindex interning
1508
1509 @example
1510 (foo . bar)
1511 @end example
1512
1513 converts to a cons cell containing the symbols @code{foo} and @code{bar}.
1514
1515 @example
1516 (1 a 2.5)
1517 @end example
1518
1519 converts to a three-element list containing the specified objects
1520 (note that a list is actually a set of nested conses; see the
1521 XEmacs Lisp Reference).
1522
1523 @example
1524 [1 a 2.5]
1525 @end example
1526
1527 converts to a three-element vector containing the specified objects.
1528
1529 @example
1530 #[... ... ... ...]
1531 @end example
1532
1533 converts to a compiled-function object (the actual contents are not
1534 shown since they are not relevant here; look at a file that ends with
1535 @file{.elc} for examples).
1536
1537 @example
1538 #*01110110
1539 @end example
1540
1541 converts to a bit-vector.
1542
1543 @example
1544 #s(hash-table ... ...)
1545 @end example
1546
1547 converts to a hash table (the actual contents are not shown).
1548
1549 @example
1550 #s(range-table ... ...)
1551 @end example
1552
1553 converts to a range table (the actual contents are not shown).
1554
1555 @example
1556 #s(char-table ... ...)
1557 @end example
1558
1559 converts to a char table (the actual contents are not shown).
1560
1561 Note that the @code{#s()} syntax is the general syntax for structures,
1562 which are not really implemented in XEmacs Lisp but should be.
1563
1564 When an object is printed out (using @code{print} or a related
1565 function), the read syntax is used, so that the same object can be read
1566 in again.
1567
1568 The other objects do not have read syntaxes, usually because it does not
1569 really make sense to create them in this fashion (e.g. processes, where
1570 it doesn't make sense to have a subprocess created as a side effect of
1571 reading some Lisp code), or because they can't be created at all
1572 (e.g. subrs).  Permanent objects, as a rule, do not have a read syntax;
1573 nor do most complex objects, which contain too much state to be easily
1574 initialized through a read syntax.
1575
1576 @node How Lisp Objects Are Represented in C, Rules When Writing New C Code, The XEmacs Object System (Abstractly Speaking), Top
1577 @chapter How Lisp Objects Are Represented in C
1578
1579 Lisp objects are represented in C using a 32-bit or 64-bit machine word
1580 (depending on the processor; e.g. DEC Alphas use 64-bit Lisp objects and
1581 most other processors use 32-bit Lisp objects).  The representation
1582 stuffs a pointer together with a tag, as follows:
1583
1584 @example
1585  [ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ]
1586  [ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ]
1587
1588    <---> ^ <------------------------------------------------------>
1589     tag  |       a pointer to a structure, or an integer
1590          |
1591        mark bit
1592 @end example
1593
1594 The tag describes the type of the Lisp object.  For integers and chars,
1595 the lower 28 bits contain the value of the integer or char; for all
1596 others, the lower 28 bits contain a pointer.  The mark bit is used
1597 during garbage-collection, and is always 0 when garbage collection is
1598 not happening. (The way that garbage collection works, basically, is that it
1599 loops over all places where Lisp objects could exist -- this includes
1600 all global variables in C that contain Lisp objects [including
1601 @code{Vobarray}, the C equivalent of @code{obarray}; through this, all
1602 Lisp variables will get marked], plus various other places -- and
1603 recursively scans through the Lisp objects, marking each object it finds
1604 by setting the mark bit.  Then it goes through the lists of all objects
1605 allocated, freeing the ones that are not marked and turning off the mark
1606 bit of the ones that are marked.)
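
As a rough illustration of the layout above, the following C sketch
packs a 3-bit tag, a mark bit and a 28-bit field into one 32-bit word.
Only the bit positions come from the diagram; the names
(@code{toy_make_word}, @code{TOY_TAG_SHIFT} and so on) are hypothetical
and are not the macros XEmacs itself uses.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions taken from the diagram: tag in bits 31-29, mark bit in
   bit 28, pointer or integer in bits 27-0.  All names are hypothetical. */
#define TOY_TAG_SHIFT   29
#define TOY_MARK_BIT    (UINT32_C (1) << 28)
#define TOY_VALUE_MASK  (TOY_MARK_BIT - 1)

/* Combine a 3-bit tag with a 28-bit value; the mark bit starts clear. */
static uint32_t
toy_make_word (unsigned tag, uint32_t value)
{
  assert (tag < 8);
  return ((uint32_t) tag << TOY_TAG_SHIFT) | (value & TOY_VALUE_MASK);
}

static unsigned toy_tag (uint32_t w)      { return w >> TOY_TAG_SHIFT; }
static uint32_t toy_value (uint32_t w)    { return w & TOY_VALUE_MASK; }

/* What a collector would do while marking, and again while sweeping. */
static uint32_t toy_mark (uint32_t w)     { return w | TOY_MARK_BIT; }
static uint32_t toy_unmark (uint32_t w)   { return w & ~TOY_MARK_BIT; }
static int      toy_marked_p (uint32_t w) { return (w & TOY_MARK_BIT) != 0; }
```

Setting and clearing the mark bit leaves both the tag and the value
untouched, which is why the bit can live inside the word itself.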
1607
1608 Lisp objects use the typedef @code{Lisp_Object}, but the actual C type
1609 used for the Lisp object can vary.  It can be either a simple type
1610 (@code{long} on the DEC Alpha, @code{int} on other machines) or a
1611 structure whose fields are bit fields that line up properly (actually, a
1612 union of structures is used).  Generally the simple integral type is
1613 preferable because it ensures that the compiler will actually use a
1614 machine word to represent the object (some compilers will use more
1615 general and less efficient code for unions and structs even if they can
1616 fit in a machine word).  The union type, however, has the advantage of
1617 stricter type checking (if you accidentally pass an integer where a Lisp
1618 object is desired, you get a compile error), and it makes it easier to
1619 decode Lisp objects when debugging.  The choice of which type to use is
1620 determined by the preprocessor constant @code{USE_UNION_TYPE} which is
1621 defined via the @code{--use-union-type} option to @code{configure}.
1622
1623 @cindex record type
1624
1625 Note that there are only eight types that the tag can represent, but
1626 many more actual types than this.  This is handled by having one of the
1627 tag types specify a meta-type called a @dfn{record}; for all such
1628 objects, the first four bytes of the pointed-to structure indicate what
1629 the actual type is.
1630
1631 Note also that having 28 bits for pointers and integers restricts a lot
1632 of things to 256 megabytes of memory. (Basically, enough pointers and
1633 indices and whatnot get stuffed into Lisp objects that the total amount
1634 of memory used by XEmacs can't grow above 256 megabytes.  In older
1635 versions of XEmacs and GNU Emacs, the tag was 5 bits wide, allowing for
1636 32 types, which was more than the actual number of types that existed at
1637 the time, and no ``record'' type was necessary.  However, this limited
1638 the editor to 64 megabytes total, which some users who edited large
1639 files might conceivably exceed.)
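
The two limits quoted above follow directly from the field widths.  The
helper below is a hypothetical function written only to show the
arithmetic; it assumes one mark bit in addition to the tag bits, as in
the diagram.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes addressable through the pointer field of a word of WORD_BITS
   bits, after TAG_BITS tag bits and one mark bit are taken away.
   Hypothetical helper, for the arithmetic only. */
static uint64_t
toy_addressable_bytes (unsigned word_bits, unsigned tag_bits)
{
  assert (tag_bits + 1 < word_bits && word_bits <= 64);
  return UINT64_C (1) << (word_bits - tag_bits - 1);
}
```

With a 32-bit word, a 3-bit tag leaves 28 bits, i.e. 2^28 bytes or 256
megabytes, while a 5-bit tag leaves 26 bits, i.e. 64 megabytes.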
1640
1641 Also, note that there is an implicit assumption here that all pointers
1642 are low enough that the top bits are all zero and can just be chopped
1643 off.  On standard machines that allocate memory from the bottom up (and
1644 give each process its own address space), this works fine.  Some
1645 machines, however, put the data space somewhere else in memory
1646 (e.g. beginning at 0x80000000).  Those machines cope by defining
1647 @code{DATA_SEG_BITS} in the corresponding @file{m/} or @file{s/} file to
1648 the proper mask.  Then, pointers retrieved from Lisp objects are
1649 automatically OR'ed with this value prior to being used.
1650
1651 A corollary of the previous paragraph is that @strong{(pointers to)
1652 stack-allocated structures cannot be put into Lisp objects}.  The stack
1653 is generally located near the top of memory; if you put such a pointer
1654 into a Lisp object, it will get its top bits chopped off, and you will
1655 lose.
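
The effect of chopping off the top bits, and of OR'ing them back in, can
be sketched as follows.  The function name and the sample addresses used
with it are hypothetical; the real extraction is done by the
@code{X@var{TYPE}()} macros.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_VALUE_MASK  UINT32_C (0x0FFFFFFF)   /* low 28 bits */

/* Store a 32-bit address in the 28-bit field, then recover it by
   OR-ing with DATA_SEG_BITS, as described above.  Hypothetical name. */
static uint32_t
toy_round_trip (uint32_t address, uint32_t data_seg_bits)
{
  uint32_t stored = address & TOY_VALUE_MASK;   /* top bits chopped off */
  return stored | data_seg_bits;
}
```

A heap address such as 0x80012340 survives the round trip when
@code{DATA_SEG_BITS} is 0x80000000, but a stack-like address such as
0xBFFFF000 comes back as a different address, because its top bits do
not match the mask.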
1656
1657 Actually, there's an alternative representation of a @code{Lisp_Object},
1658 invented by Kyle Jones, that is used when the
1659 @code{--use-minimal-tagbits} option to @code{configure} is used.  In
1660 this case the 2 lower bits are used for the tag bits.  This
1661 representation assumes that pointers to structs are always aligned to
1662 multiples of 4, so the lower 2 bits are always zero.
1663
1664 @example
1665  [ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ]
1666  [ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ]
1667
1668    <---------------------------------------------------------> <->
1669             a pointer to a structure, or an integer            tag
1670 @end example
1671
1672 A tag of 00 is used for all pointer object types, a tag of 10 is used
1673 for characters, and the other two tags 01 and 11 are joined together to
1674 form the integer object type.  The markbit is moved to part of the
1675 structure being pointed at (integers and chars do not need to be marked,
1676 since no memory is allocated).  This representation has these
1677 advantages:
1678
1679 @enumerate
1680 @item
1681 31 bits can be used for Lisp Integers.
1682 @item
1683 @emph{Any} pointer can be represented directly, and no bit masking
1684 operations are necessary.
1685 @end enumerate
1686
1687 The disadvantages are:
1688
1689 @enumerate
1690 @item
1691 An extra level of indirection is needed when accessing the object types
1692 that were not record types.  So checking whether a Lisp object is a cons
1693 cell becomes a slower operation.
1694 @item
1695 Mark bits can no longer be stored directly in Lisp objects, so another
1696 place for them must be found.  This means that a cons cell requires more
1697 memory than merely room for 2 lisp objects, leading to extra memory use.
1698 @end enumerate
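
A minimal sketch of the low-bit tagging scheme, assuming a 32-bit word.
The tag values are the ones given above; the function names are
hypothetical, and the integer decoder is written without relying on
implementation-defined shifts.

```c
#include <assert.h>
#include <stdint.h>

/* Low-bit tags as described above: 00 pointer, 10 character, and
   01/11 (low bit set) integer.  Function names are hypothetical. */
static int toy_int_p (uint32_t w)     { return (w & 1u) != 0; }
static int toy_char_p (uint32_t w)    { return (w & 3u) == 2u; }
static int toy_pointer_p (uint32_t w) { return (w & 3u) == 0u; }

/* N must fit in 31 bits: -2^30 <= N < 2^30. */
static uint32_t toy_make_int (int32_t n)    { return ((uint32_t) n << 1) | 1u; }
static uint32_t toy_make_char (uint32_t c)  { return (c << 2) | 2u; }
static uint32_t toy_char_value (uint32_t w) { return w >> 2; }

/* Decode the 31-bit two's complement payload by hand. */
static int32_t
toy_int_value (uint32_t w)
{
  uint32_t payload = w >> 1;
  if (payload & UINT32_C (0x40000000))          /* sign bit of the field */
    return (int32_t) (payload & UINT32_C (0x3FFFFFFF)) - 0x40000000;
  return (int32_t) payload;
}
```

Because a pointer to a 4-byte-aligned structure already ends in 00, it
needs no masking at all in this scheme.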
1699
1700 Various macros are used to construct Lisp objects and extract the
1701 components.  Macros of the form @code{XINT()}, @code{XCHAR()},
1702 @code{XSTRING()}, @code{XSYMBOL()}, etc. mask out the pointer/integer
1703 field and cast it to the appropriate type.  All of the macros that
1704 construct pointers will @code{OR} with @code{DATA_SEG_BITS} if
1705 necessary.  @code{XINT()} needs to be a bit tricky so that negative
1706 numbers are properly sign-extended: Usually it does this by shifting the
1707 number four bits to the left and then four bits to the right.  This
1708 assumes that the right-shift operator does an arithmetic shift (i.e. it
1709 leaves the most-significant bit as-is rather than shifting in a zero, so
1710 that it mimics a divide-by-two even for negative numbers).  Not all
1711 machines/compilers do this, and on the ones that don't, a more
1712 complicated definition is selected by defining
1713 @code{EXPLICIT_SIGN_EXTEND}.
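
The two approaches to sign extension can be sketched as below, assuming
the 32-bit layout with a 28-bit integer field.  These functions are
hypothetical illustrations, not the actual definition of @code{XINT()}.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_VALUE_BITS  28
#define TOY_VALUE_MASK  ((UINT32_C (1) << TOY_VALUE_BITS) - 1)

/* Shift-based extraction: relies on >> being an arithmetic shift for
   negative values, which C leaves implementation-defined. */
static int32_t
toy_xint_shift (uint32_t word)
{
  return (int32_t) (word << 4) >> 4;
}

/* Explicit sign extension: test bit 27 and subtract 2^28 by hand. */
static int32_t
toy_xint_explicit (uint32_t word)
{
  uint32_t v = word & TOY_VALUE_MASK;
  if (v & (UINT32_C (1) << (TOY_VALUE_BITS - 1)))
    return (int32_t) v - (INT32_C (1) << TOY_VALUE_BITS);
  return (int32_t) v;
}

/* Build a word with tag TAG (3 bits) holding the 28-bit integer N. */
static uint32_t
toy_make_int_word (unsigned tag, int32_t n)
{
  return ((uint32_t) tag << 29) | ((uint32_t) n & TOY_VALUE_MASK);
}
```

On a compiler whose right shift is arithmetic, both extractors return
the same value; the explicit version gives that value everywhere.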
1714
1715 Note that when @code{ERROR_CHECK_TYPECHECK} is defined, the extractor
1716 macros become more complicated -- they check the tag bits and/or the
1717 type field in the first four bytes of a record type to ensure that the
1718 object is really of the correct type.  This is great for catching places
1719 where an incorrect type is being dereferenced -- this typically results
1720 in a pointer being dereferenced as the wrong type of structure, with
1721 unpredictable (and sometimes not easily traceable) results.
1722
1723 There are similar @code{XSET@var{TYPE}()} macros that construct a Lisp
1724 object.  These macros are of the form @code{XSET@var{TYPE}
1725 (@var{lvalue}, @var{result})},
1726 i.e. they have to be a statement rather than just used in an expression.
1727 The reason for this is that standard C doesn't let you ``construct'' a
1728 structure (but GCC does).  Granted, this sometimes isn't too convenient;
1729 for the case of integers, at least, you can use the function
1730 @code{make_int()}, which constructs and @emph{returns} an integer
1731 Lisp object.  Note that the @code{XSET@var{TYPE}()} macros are also
1732 affected by @code{ERROR_CHECK_TYPECHECK} and make sure that the
1733 structure is of the right type in the case of record types, where the
1734 type is contained in the structure.
1735
1736 The C programmer is responsible for @strong{guaranteeing} that a
1737 Lisp_Object is of the correct type before using the @code{X@var{TYPE}}
1738 macros.  This is especially important in the case of lists.  Use
1739 @code{XCAR} and @code{XCDR} if a Lisp_Object is certainly a cons cell,
1740 else use @code{Fcar()} and @code{Fcdr()}.  Trust other C code, but not
1741 Lisp code.  On the other hand, if XEmacs has an internal logic error,
1742 it's better to crash immediately, so sprinkle ``unreachable''
1743 @code{abort()}s liberally about the source code.
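
The difference between an unchecked extractor and a checked accessor can
be illustrated with a toy object model.  Everything here is hypothetical:
the structure is not the real @code{Lisp_Object}, the @code{assert}
merely stands in for what @code{ERROR_CHECK_TYPECHECK} enables, and the
checked accessor returns @code{NULL} where real code would signal a Lisp
error.

```c
#include <assert.h>
#include <stddef.h>

/* A toy object model; the real Lisp_Object is the tagged word described
   earlier.  All names here are hypothetical. */
enum toy_type { TOY_INT, TOY_CONS };

struct toy_object
{
  enum toy_type type;
  int integer;                  /* valid when type == TOY_INT */
  struct toy_object *car, *cdr; /* valid when type == TOY_CONS */
};

/* Unchecked accessor, in the spirit of XCAR: the caller guarantees that
   OBJ is a cons.  The assert stands in for ERROR_CHECK_TYPECHECK. */
static struct toy_object *
toy_xcar (struct toy_object *obj)
{
  assert (obj->type == TOY_CONS);
  return obj->car;
}

/* Checked accessor, in the spirit of Fcar: usable on untrusted input.
   Returns NULL instead of signalling a Lisp error. */
static struct toy_object *
toy_checked_car (struct toy_object *obj)
{
  if (obj == NULL || obj->type != TOY_CONS)
    return NULL;
  return toy_xcar (obj);
}
```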
1744
1745 @node Rules When Writing New C Code, A Summary of the Various XEmacs Modules, How Lisp Objects Are Represented in C, Top
1746 @chapter Rules When Writing New C Code
1747
1748 The XEmacs C Code is extremely complex and intricate, and there are many
1749 rules that are more or less consistently followed throughout the code.
1750 Many of these rules are not obvious, so they are explained here.  It is
1751 of the utmost importance that you follow them.  If you don't, you may
1752 get something that appears to work, but which will crash in odd
1753 situations, often in code far away from where the actual breakage is.
1754
1755 @menu
1756 * General Coding Rules::
1757 * Writing Lisp Primitives::
1758 * Adding Global Lisp Variables::
1759 * Coding for Mule::
1760 * Techniques for XEmacs Developers::
1761 @end menu
1762
1763 @node General Coding Rules
1764 @section General Coding Rules
1765
1766 The C code is actually written in a dialect of C called @dfn{Clean C},
1767 meaning that it can be compiled, mostly warning-free, with either a C or
1768 C++ compiler.  Coding in Clean C has several advantages over plain C.
1769 C++ compilers are more nit-picking, and a number of coding errors have
1770 been found by compiling with C++.  The ability to use both C and C++
1771 tools means that a greater variety of development tools are available to
1772 the developer.
1773
1774 Almost every module contains a @code{syms_of_*()} function and a
1775 @code{vars_of_*()} function.  The former declares any Lisp primitives
1776 you have defined and defines any symbols you will be using.  The latter
1777 declares any global Lisp variables you have added and initializes global
1778 C variables in the module.  For each such function, declare it in
1779 @file{symsinit.h} and make sure it's called in the appropriate place in
1780 @file{emacs.c}.  @strong{Important}: There are stringent requirements on
1781 exactly what can go into these functions.  See the comment in
1782 @file{emacs.c}.  The reason for this is to avoid obscure unwanted
1783 interactions during initialization.  If you don't follow these rules,
1784 you'll be sorry!  If you want to do anything that isn't allowed, create
1785 a @code{complex_vars_of_*()} function for it.  Doing this is tricky,
1786 though: You have to make sure your function is called at the right time
1787 so that all the initialization dependencies work out.
1788
1789 Every module includes @file{<config.h>} (angle brackets so that
1790 @samp{--srcdir} works correctly; @file{config.h} may or may not be in
1791 the same directory as the C sources) and @file{lisp.h}.  @file{config.h}
1792 must always be included before any other header files (including
1793 system header files) to ensure that certain tricks played by various
1794 @file{s/} and @file{m/} files work out correctly.
1795
1796 @strong{All global and static variables that are to be modifiable must
1797 be declared uninitialized.}  This means that you may not use the
1798 ``declare with initializer'' form for these variables, such as @code{int
1799 some_variable = 0;}.  The reason for this has to do with some kludges
1800 done during the dumping process: If possible, the initialized data
1801 segment is re-mapped so that it becomes part of the (unmodifiable) code
1802 segment in the dumped executable.  This allows this memory to be shared
1803 among multiple running XEmacs processes.  XEmacs is careful to place as
1804 much constant data as possible into initialized variables (in
1805 particular, into what's called the @dfn{pure space} -- see below) during
1806 the @file{temacs} phase.
1807
1808 @cindex copy-on-write
1809 @strong{Please note:} This kludge only works on a few systems nowadays,
1810 and is rapidly becoming irrelevant because most modern operating systems
1811 provide @dfn{copy-on-write} semantics.  All data is initially shared
1812 between processes, and a private copy is automatically made (on a
1813 page-by-page basis) when a process first attempts to write to a page of
1814 memory.
1815
1816 Formerly, there was a requirement that static variables not be declared
1817 inside of functions.  This had to do with another hack along the same
1818 vein as what was just described: old USG systems put statically-declared
1819 variables in the initialized data space, so those header files had a
1820 @code{#define static} declaration. (That way, the data-segment remapping
1821 described above could still work.) This fails badly on static variables
1822 inside of functions, which suddenly become automatic variables;
1823 therefore, you weren't supposed to have any of them.  This awful kludge
1824 has been removed in XEmacs because
1825
1826 @enumerate
1827 @item
1828 almost all of the systems that used this kludge ended up having
1829 to disable the data-segment remapping anyway;
1830 @item
1831 the only systems that didn't were extremely outdated ones;
1832 @item
1833 this hack completely messed up inline functions.
1834 @end enumerate
1835
1836 The C source code makes heavy use of C preprocessor macros.  One popular
1837 macro style is:
1838
1839 @example
1840 #define FOO(var, value) do @{           \
1841   Lisp_Object FOO_value = (value);      \
1842   ... /* compute using FOO_value */     \
1843   (var) = bar;                          \
1844 @} while (0)
1845 @end example
1846
1847 The @code{do @{...@} while (0)} is a standard trick to allow FOO to have
1848 statement semantics, so that it can safely be used within an @code{if}
1849 statement in C, for example.  Multiple evaluation is prevented by
1850 copying a supplied argument into a local variable, so that
1851 @code{FOO(var,fun(1))} only calls @code{fun} once.
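
A concrete instance of this macro style, with a counter that shows the
argument being evaluated only once.  The macro and function names are
hypothetical and do not appear in the XEmacs sources.

```c
#include <assert.h>

static int toy_calls;           /* how many times toy_next() has run */

static int
toy_next (void)
{
  return ++toy_calls;
}

/* Hypothetical macro in the style described above: VALUE is copied
   into a local exactly once, then used twice. */
#define TOY_SET_SQUARE(var, value) do {                 \
  int TOY_SET_SQUARE_value = (value);                   \
  (var) = TOY_SET_SQUARE_value * TOY_SET_SQUARE_value;  \
} while (0)

static int
toy_demo (int flag)
{
  int result;
  if (flag)
    TOY_SET_SQUARE (result, toy_next () + 2);  /* expands to one statement */
  else
    result = -1;
  return result;
}
```

Without the @code{do @{...@} while (0)} wrapper, the semicolon after the
macro call would end the @code{if} statement and leave the @code{else}
dangling.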
1852
1853 Lisp lists are popular data structures in the C code as well as in
1854 Elisp.  There are two sets of macros that iterate over lists.
1855 @code{EXTERNAL_LIST_LOOP_@var{n}} should be used when the list has been
1856 supplied by the user, and cannot be trusted to be acyclic and
1857 nil-terminated.  A @code{malformed-list} or @code{circular-list} error
1858 will be generated if the list being iterated over is not entirely
1859 kosher.  @code{LIST_LOOP_@var{n}}, on the other hand, is faster and less
1860 safe, and can be used only on trusted lists.
1861
1862 Related macros are @code{GET_EXTERNAL_LIST_LENGTH} and
1863 @code{GET_LIST_LENGTH}, which calculate the length of a list;
1864 @code{GET_EXTERNAL_LIST_LENGTH} also validates the properness of
1865 the list.  The macros @code{EXTERNAL_LIST_LOOP_DELETE_IF} and
1866 @code{LIST_LOOP_DELETE_IF} delete from a Lisp list the elements that
1867 satisfy some predicate.
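
The distinction between trusted and untrusted traversal can be sketched
on a toy cons structure.  The version for untrusted lists uses a
tortoise-and-hare check, which is one standard way to detect cycles and
is not necessarily how the XEmacs macros do it; all names below are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum toy_kind { TOY_NIL, TOY_CONS, TOY_OTHER };

struct toy_value
{
  enum toy_kind kind;
  struct toy_value *cdr;        /* meaningful only for TOY_CONS */
};

enum { TOY_LIST_MALFORMED = -1, TOY_LIST_CIRCULAR = -2 };

/* Fast version: assumes an acyclic, nil-terminated list. */
static long
toy_trusted_length (const struct toy_value *list)
{
  long n = 0;
  while (list->kind == TOY_CONS)
    {
      n++;
      list = list->cdr;
    }
  return n;
}

/* Safe version: the tortoise advances once for every two steps of the
   hare, so the two can only meet again if the list loops back. */
static long
toy_external_length (const struct toy_value *list)
{
  const struct toy_value *hare = list, *tortoise = list;
  long n = 0;
  while (hare->kind == TOY_CONS)
    {
      hare = hare->cdr;
      n++;
      if ((n & 1) == 0)
        {
          tortoise = tortoise->cdr;
          if (hare == tortoise)
            return TOY_LIST_CIRCULAR;
        }
    }
  return hare->kind == TOY_NIL ? n : TOY_LIST_MALFORMED;
}
```

Calling the trusted version on a circular list would never return,
which is why it may only be used on lists the C code built itself.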
1868
1869 @node Writing Lisp Primitives
1870 @section Writing Lisp Primitives
1871
1872 Lisp primitives are Lisp functions implemented in C.  The details of
1873 interfacing the C function so that Lisp can call it are handled by a few
1874 C macros.  The only way to really understand how to write new C code is
1875 to read the source, but we can explain some things here.
1876
1877 An example of a special form is the definition of @code{prog1}, from
1878 @file{eval.c}.  (An ordinary function would have the same general
1879 appearance.)
1880
1881 @cindex garbage collection protection
1882 @smallexample
1883 @group
1884 DEFUN ("prog1", Fprog1, 1, UNEVALLED, 0, /*
1885 Similar to `progn', but the value of the first form is returned.
1886 \(prog1 FIRST BODY...): All the arguments are evaluated sequentially.
1887 The value of FIRST is saved during evaluation of the remaining args,
1888 whose values are discarded.
1889 */
1890        (args))
1891 @{
1892   /* This function can GC */
1893   REGISTER Lisp_Object val, form, tail;
1894   struct gcpro gcpro1;
1895
1896   val = Feval (XCAR (args));
1897
1898   GCPRO1 (val);
1899
1900   LIST_LOOP_3 (form, XCDR (args), tail)
1901     Feval (form);
1902
1903   UNGCPRO;
1904   return val;
1905 @}
1906 @end group
1907 @end smallexample
1908
1909   Let's start with a precise explanation of the arguments to the
1910 @code{DEFUN} macro.  Here is a template for them:
1911
1912 @example
1913 @group
1914 DEFUN (@var{lname}, @var{fname}, @var{min_args}, @var{max_args}, @var{interactive}, /*
1915 @var{docstring}
1916 */
1917    (@var{arglist}))
1918 @end group
1919 @end example
1920
1921 @table @var
1922 @item lname
1923 This string is the name of the Lisp symbol to define as the function
1924 name; in the example above, it is @code{"prog1"}.
1925
1926 @item fname
1927 This is the C function name for this function.  This is the name that is
1928 used in C code for calling the function.  The name is, by convention,
1929 @samp{F} prepended to the Lisp name, with all dashes (@samp{-}) in the
1930 Lisp name changed to underscores.  Thus, to call this function from C
1931 code, call @code{Fprog1}.  Remember that the arguments are of type
1932 @code{Lisp_Object}; various macros and functions for creating values of
1933 type @code{Lisp_Object} are declared in the file @file{lisp.h}.
1934
1935 Primitives whose names are special characters (e.g. @code{+} or
1936 @code{<}) are named by spelling out, in some fashion, the special
1937 character: e.g. @code{Fplus()} or @code{Flss()}.  Primitives whose names
1938 begin with normal alphanumeric characters but also contain special
1939 characters are spelled out in some creative way, e.g. @code{let*}
1940 becomes @code{FletX()}.
1941
1942 Each function also has an associated structure that holds the data for
1943 the subr object that represents the function in Lisp.  This structure
1944 conveys the Lisp symbol name to the initialization routine that will
1945 create the symbol and store the subr object as its definition.  The C
1946 variable name of this structure is always @samp{S} prepended to the
1947 @var{fname}.  You hardly ever need to be aware of the existence of this
1948 structure, since @code{DEFUN} plus @code{DEFSUBR} takes care of all the
1949 details.
1950
1951 @item min_args
1952 This is the minimum number of arguments that the function requires.  The
1953 function @code{prog1} allows a minimum of one argument.
1954
1955 @item max_args
1956 This is the maximum number of arguments that the function accepts, if
1957 there is a fixed maximum.  Alternatively, it can be @code{UNEVALLED},
1958 indicating a special form that receives unevaluated arguments, or
1959 @code{MANY}, indicating an unlimited number of evaluated arguments (the
1960 C equivalent of @code{&rest}).  Both @code{UNEVALLED} and @code{MANY}
1961 are macros.  If @var{max_args} is a number, it may not be less than
1962 @var{min_args} and it may not be greater than 8. (If you need to add a
1963 function with more than 8 arguments, use the @code{MANY} form.  Resist
1964 the urge to edit the definition of @code{DEFUN} in @file{lisp.h}.  If
1965 you do it anyway, make sure to also add another clause to the switch
1966 statement in @code{primitive_funcall().})
1967
1968 @item interactive
1969 This is an interactive specification, a string such as might be used as
1970 the argument of @code{interactive} in a Lisp function.  In the case of
1971 @code{prog1}, it is 0 (a null pointer), indicating that @code{prog1}
1972 cannot be called interactively.  A value of @code{""} indicates a
1973 function that should receive no arguments when called interactively.
1974
1975 @item docstring
1976 This is the documentation string.  It is written just like a
1977 documentation string for a function defined in Lisp; in particular, the
1978 first line should be a single sentence.  Note how the documentation
1979 string is enclosed in a comment, none of the documentation is placed on
1980 the same lines as the comment-start and comment-end characters, and the
1981 comment-start characters are on the same line as the interactive
1982 specification.  @file{make-docfile}, which scans the C files for
1983 documentation strings, is very particular about what it looks for, and
1984 will not properly extract the doc string if it's not in this exact format.
1985
1986 In order to make both @file{etags} and @file{make-docfile} happy, make
1987 sure that the @code{DEFUN} line contains the @var{lname} and
1988 @var{fname}, and that the comment-start characters for the doc string
1989 are on the same line as the interactive specification, and put a newline
1990 directly after them (and before the comment-end characters).
1991
1992 @item arglist
1993 This is the comma-separated list of arguments to the C function.  For a
1994 function with a fixed maximum number of arguments, provide a C argument
1995 for each Lisp argument.  In this case, unlike regular C functions, the
1996 types of the arguments are not declared; they are simply always of type
1997 @code{Lisp_Object}.
1998
1999 The names of the C arguments will be used as the names of the arguments
2000 to the Lisp primitive as displayed in its documentation, modulo the same
2001 concerns described above for @code{F...} names (in particular,
2002 underscores in the C arguments become dashes in the Lisp arguments).
2003
2004 There is one additional kludge: A trailing `_' on the C argument is
2005 discarded when forming the Lisp argument.  This allows C language
2006 reserved words (like @code{default}) or global symbols (like
2007 @code{dirname}) to be used as argument names without compiler warnings
2008 or errors.
2009
2010 A Lisp function with @w{@var{max_args} = @code{UNEVALLED}} is a
2011 @w{@dfn{special form}}; its arguments are not evaluated.  Instead it
2012 receives one argument of type @code{Lisp_Object}, a (Lisp) list of the
2013 unevaluated arguments, conventionally named @code{(args)}.
2014
2015 When a Lisp function has no upper limit on the number of arguments,
2016 specify @w{@var{max_args} = @code{MANY}}.  In this case its implementation in
2017 C actually receives exactly two arguments: the number of Lisp arguments
2018 (an @code{int}) and the address of a block containing their values (a
2019 @w{@code{Lisp_Object *}}).  In this case only are the C types specified
2020 in the @var{arglist}: @w{@code{(int nargs, Lisp_Object *args)}}.
2021
2022 @end table
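
The calling convention for @code{MANY} functions described in the table
above can be sketched without any XEmacs headers.  Here @code{Toy_Object}
is a hypothetical stand-in for @code{Lisp_Object}, and plain C integers
take the place of Lisp integers.

```c
#include <assert.h>

typedef long Toy_Object;        /* stand-in for Lisp_Object */

/* Shape of a MANY-style primitive: a count plus a block of values.
   The name and the use of plain integers are hypothetical. */
static Toy_Object
toy_plus (int nargs, Toy_Object *args)
{
  Toy_Object sum = 0;
  int i;
  for (i = 0; i < nargs; i++)
    sum += args[i];
  return sum;
}
```

A real primitive of this shape would also check the type of each
argument before using it, since the values come from Lisp code.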
2023
2024 Within the function @code{Fprog1} itself, note the use of the macros
2025 @code{GCPRO1} and @code{UNGCPRO}.  @code{GCPRO1} is used to ``protect''
2026 a variable from garbage collection---to inform the garbage collector
2027 that it must look in that variable and regard the object pointed at by
2028 its contents as an accessible object.  This is necessary whenever you
2029 call @code{Feval} or anything that can directly or indirectly call
2030 @code{Feval} (this includes the @code{QUIT} macro!).  At such a time,
2031 any Lisp object that you intend to refer to again must be protected
2032 somehow.  @code{UNGCPRO} cancels the protection of the variables that
2033 are protected in the current function.  It is necessary to do this
2034 explicitly.
2035
2036 The macro @code{GCPRO1} protects just one local variable.  If you want
2037 to protect two, use @code{GCPRO2} instead; repeating @code{GCPRO1} will
2038 not work.  Macros @code{GCPRO3} and @code{GCPRO4} also exist.
2039
2040 These macros implicitly use local variables such as @code{gcpro1}; you
2041 must declare these explicitly, with type @code{struct gcpro}.  Thus, if
2042 you use @code{GCPRO2}, you must declare @code{gcpro1} and @code{gcpro2}.
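
One way to picture what these macros do is as a chain of stack-allocated
records, each pointing at a protected variable, which the garbage
collector can walk.  The sketch below is a simplified model under that
assumption; the @code{TOY_} names are hypothetical and the real
definitions in @file{lisp.h} should be consulted for the details.

```c
#include <assert.h>
#include <stddef.h>

typedef long Toy_Object;        /* stand-in for Lisp_Object */

struct toy_gcpro
{
  struct toy_gcpro *next;       /* previously protected record */
  Toy_Object *var;              /* address of the protected variable */
  int nvars;                    /* number of consecutive objects at VAR */
};

static struct toy_gcpro *toy_gcprolist;   /* head of the chain */

/* Both macros expect locals named gcpro1 (and gcpro2) to be in scope. */
#define TOY_GCPRO2(v1, v2) do {                                         \
  gcpro1.next = toy_gcprolist; gcpro1.var = &(v1); gcpro1.nvars = 1;    \
  gcpro2.next = &gcpro1;       gcpro2.var = &(v2); gcpro2.nvars = 1;    \
  toy_gcprolist = &gcpro2;                                              \
} while (0)

#define TOY_UNGCPRO do { toy_gcprolist = gcpro1.next; } while (0)

/* What a collector would do: walk the chain and visit each variable. */
static int
toy_count_roots (void)
{
  const struct toy_gcpro *p;
  int n = 0;
  for (p = toy_gcprolist; p != NULL; p = p->next)
    n += p->nvars;
  return n;
}

static int
toy_roots_while_protected (void)
{
  Toy_Object a = 1, b = 2;
  struct toy_gcpro gcpro1, gcpro2;
  int n;

  TOY_GCPRO2 (a, b);
  n = toy_count_roots ();
  TOY_UNGCPRO;
  return n;
}
```

In this model a second single-variable macro in the same function would
reuse the record @code{gcpro1} and overwrite its contents, which
suggests why repeating @code{GCPRO1} cannot protect two variables.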
2043
2044 @cindex caller-protects (@code{GCPRO} rule)
2045 Note also that the general rule is @dfn{caller-protects}; i.e. you are
2046 only responsible for protecting those Lisp objects that you create.  Any
2047 objects passed to you as arguments should have been protected by whoever
2048 created them, so you don't in general have to protect them.
2049
2050 In particular, the arguments to any Lisp primitive are always
2051 automatically @code{GCPRO}ed, when called ``normally'' from Lisp code or
2052 bytecode.  So only a few Lisp primitives that are called frequently from
2053 C code, such as @code{Fprogn}, protect their arguments as a service to
2054 their caller.  You don't need to protect your arguments when writing a
2055 new @code{DEFUN}.
2056
2057 @code{GCPRO}ing is perhaps the trickiest and most error-prone part of
2058 XEmacs coding.  It is @strong{extremely} important that you get this
2059 right and use a great deal of discipline when writing this code.
2060 @xref{GCPROing, ,@code{GCPRO}ing}, for full details on how to do this.
2061
2062 What @code{DEFUN} actually does is declare a global structure of type
2063 @code{Lisp_Subr} whose name begins with capital @samp{SF} and which
2064 contains information about the primitive (e.g. a pointer to the
2065 function, its minimum and maximum allowed arguments, a string describing
2066 its Lisp name); @code{DEFUN} then begins a normal C function declaration
2067 using the @code{F...} name.  The Lisp subr object that is the function
2068 definition of a primitive (i.e. the object in the function slot of the
2069 symbol that names the primitive) actually points to this @samp{SF}
2070 structure; when @code{Feval} encounters a subr, it looks in the
2071 structure to find out how to call the C function.
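
As a rough illustration of the idea, the sketch below models a subr
record that carries a name, an argument range, and a function pointer,
together with a caller that consults the record before invoking the C
function.  All @code{toy_} names are hypothetical, and the layout is an
assumption made for the example; the real @code{Lisp_Subr} structure and
the calling conventions used by @code{Feval} are more involved.

```c
#include <assert.h>
#include <stddef.h>

typedef long toy_object;                /* stand-in for Lisp_Object */
typedef toy_object (*toy_subr_fn) (int nargs, const toy_object *args);

/* Simplified model of the record that describes one primitive. */
struct toy_subr
{
  int min_args;                         /* fewest arguments accepted */
  int max_args;                         /* most arguments accepted */
  const char *name;                     /* Lisp-level name */
  toy_subr_fn fn;                       /* the C function to call */
};

/* The C function behind the primitive: add up all arguments. */
toy_object
toy_Fadd (int nargs, const toy_object *args)
{
  toy_object sum = 0;
  int i;

  for (i = 0; i < nargs; i++)
    sum += args[i];
  return sum;
}

/* The record a DEFUN-like macro would generate for toy_Fadd. */
struct toy_subr toy_SFadd = { 1, 3, "toy-add", toy_Fadd };

/* Look in the record to decide whether and how to call the function.
   Returns 1 and stores the value in *RESULT on success, or 0 when the
   argument count is outside the allowed range. */
int
toy_call_subr (const struct toy_subr *subr, int nargs,
               const toy_object *args, toy_object *result)
{
  assert (subr != NULL && subr->fn != NULL && result != NULL);
  if (nargs < subr->min_args || nargs > subr->max_args)
    return 0;
  *result = subr->fn (nargs, args);
  return 1;
}
```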
2072
2073 Defining the C function is not enough to make a Lisp primitive
2074 available; you must also create the Lisp symbol for the primitive (the
2075 symbol is @dfn{interned}; @pxref{Obarrays}) and store a suitable subr
2076 object in its function cell. (If you don't do this, the primitive won't
2077 be seen by Lisp code.) The code looks like this:
2078
2079 @example
2080 DEFSUBR (@var{fname});
2081 @end example
2082
2083 @noindent
2084 Here @var{fname} is the same name you used as the second argument to
2085 @code{DEFUN}.
2086
2087 This call to @code{DEFSUBR} should go in the @code{syms_of_*()} function
2088 at the end of the module.  If no such function exists, create it and
2089 make sure to also declare it in @file{symsinit.h} and call it from the
2090 appropriate spot in @code{main()}.  @xref{General Coding Rules}.
2091
2092 Note that C code cannot call functions by name unless they are defined
2093 in C.  The way to call a function written in Lisp from C is to use
2094 @code{Ffuncall}, which embodies the Lisp function @code{funcall}.  Since
2095 the Lisp function @code{funcall} accepts an unlimited number of
2096 arguments, in C it takes two: the number of Lisp-level arguments, and a
2097 one-dimensional array containing their values.  The first Lisp-level
2098 argument is the Lisp function to call, and the rest are the arguments to
2099 pass to it.  Since @code{Ffuncall} can call the evaluator, you must
2100 protect pointers from garbage collection around the call to
2101 @code{Ffuncall}. (However, @code{Ffuncall} explicitly protects all of
2102 its parameters, so you don't have to protect any pointers passed as
2103 parameters to it.)
2104
2105 The C functions @code{call0}, @code{call1}, @code{call2}, and so on,
2106 provide handy ways to call a Lisp function conveniently with a fixed
2107 number of arguments.  They work by calling @code{Ffuncall}.
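
The calling convention can be sketched with a small stand-alone model.
The code below is not XEmacs code: @code{toy_value},
@code{toy_funcall}, and @code{toy_call2} are hypothetical stand-ins
that only mirror the shape described above, where the callee sits in
element 0 of the array and a fixed-arity wrapper packs its arguments
before delegating.

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_value toy_value;
typedef toy_value (*toy_callable) (int nargs, const toy_value *args);

/* A value is either a number or something callable. */
struct toy_value
{
  long num;                    /* payload when the value is a number */
  toy_callable fn;             /* non-NULL when the value is callable */
};

toy_value
toy_make_number (long n)
{
  toy_value v;
  v.num = n;
  v.fn = NULL;
  return v;
}

toy_value
toy_make_function (toy_callable fn)
{
  toy_value v;
  v.num = 0;
  v.fn = fn;
  return v;
}

/* args[0] is the function to call; args[1] onward are its arguments. */
toy_value
toy_funcall (int nargs, const toy_value *args)
{
  assert (nargs >= 1 && args[0].fn != NULL);
  return args[0].fn (nargs - 1, args + 1);
}

/* Fixed-arity convenience wrapper: pack the array, then delegate. */
toy_value
toy_call2 (toy_value fn, toy_value arg1, toy_value arg2)
{
  toy_value args[3];

  args[0] = fn;
  args[1] = arg1;
  args[2] = arg2;
  return toy_funcall (3, args);
}

/* A sample callee that sums its numeric arguments. */
toy_value
toy_plus (int nargs, const toy_value *args)
{
  long sum = 0;
  int i;

  for (i = 0; i < nargs; i++)
    sum += args[i].num;
  return toy_make_number (sum);
}
```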
2108
2109 @file{eval.c} is a very good file to look through for examples;
2110 @file{lisp.h} contains the definitions for important macros and
2111 functions.
2112
2113 @node Adding Global Lisp Variables
2114 @section Adding Global Lisp Variables
2115
2116 Global variables whose names begin with @samp{Q} are constants whose
2117 value is a symbol of a particular name.  The name of the variable should
2118 be derived from the name of the symbol using the same rules as for Lisp
2119 primitives.  These variables are initialized using a call to
2120 @code{defsymbol()} in the @code{syms_of_*()} function. (This call
2121 interns a symbol, sets the C variable to the resulting Lisp object, and
2122 calls @code{staticpro()} on the C variable to tell the
2123 garbage-collection mechanism about this variable.  What
2124 @code{staticpro()} does is add a pointer to the variable to a large
2125 global array; when garbage-collection happens, all pointers listed in
2126 the array are used as starting points for marking Lisp objects.  This is
2127 important because it's quite possible that the only current reference to
2128 the object is the C variable.  In the case of symbols, the
2129 @code{staticpro()} doesn't matter all that much because the symbol is
2130 contained in @code{obarray}, which is itself @code{staticpro()}ed.
2131 However, it's possible that a naughty user could do something like
2132 uninterning the symbol out of @code{obarray} or even setting
2133 @code{obarray} to a different value [although this is likely to make
2134 XEmacs crash!].)
2135
2136   @strong{Please note:} It is potentially deadly if you declare a
2137 @samp{Q...}  variable in two different modules.  The two calls to
2138 @code{defsymbol()} are no problem, but some linkers will complain about
2139 multiply-defined symbols.  The most insidious aspect of this is that
2140 often the link will succeed anyway, but then the resulting executable
2141 will sometimes crash in obscure ways during certain operations!  To
2142 avoid this problem, declare any symbols with common names (such as
2143 @code{text}) that are not obviously associated with this particular
2144 module in the module @file{general.c}.
2145
2146   Global variables whose names begin with @samp{V} are variables that
2147 contain Lisp objects.  The convention here is that all global variables
2148 of type @code{Lisp_Object} begin with @samp{V}, and all others don't
2149 (including integer and boolean variables that have Lisp
2150 equivalents). Most of the time, these variables have equivalents in
2151 Lisp, but some don't.  Those that do are declared this way by a call to
2152 @code{DEFVAR_LISP()} in the @code{vars_of_*()} initializer for the
2153 module.  What this does is create a special @dfn{symbol-value-forward}
2154 Lisp object that contains a pointer to the C variable, intern a symbol
2155 whose name is as specified in the call to @code{DEFVAR_LISP()}, and set
2156 its value to the symbol-value-forward Lisp object; it also calls
2157 @code{staticpro()} on the C variable to tell the garbage-collection
2158 mechanism about the variable.  When @code{eval} (or actually
2159 @code{symbol-value}) encounters this special object in the process of
2160 retrieving a variable's value, it follows the indirection to the C
2161 variable and gets its value.  @code{setq} does similar things so that
2162 the C variable gets changed.
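
The indirection can be pictured with the following stand-alone sketch.
It is only a model under simplifying assumptions: the @code{toy_} names
are hypothetical, a plain pointer plays the role of the
symbol-value-forward object, and @code{long} stands in for
@code{Lisp_Object}.

```c
#include <assert.h>
#include <stddef.h>

typedef long toy_object;              /* stand-in for Lisp_Object */

struct toy_symbol
{
  const char *name;
  toy_object value;                   /* used when forward is NULL */
  toy_object *forward;                /* non-NULL: value lives in a C variable */
};

/* Model of symbol-value: follow the indirection when there is one. */
toy_object
toy_symbol_value (const struct toy_symbol *sym)
{
  assert (sym != NULL);
  return sym->forward != NULL ? *sym->forward : sym->value;
}

/* Model of setq: write through to the C variable when forwarded. */
void
toy_set (struct toy_symbol *sym, toy_object newval)
{
  assert (sym != NULL);
  if (sym->forward != NULL)
    *sym->forward = newval;
  else
    sym->value = newval;
}

/* A C-level variable and the symbol that forwards to it. */
toy_object Vtoy_fill_column = 70;
struct toy_symbol toy_fill_column_symbol =
  { "toy-fill-column", 0, &Vtoy_fill_column };
```

Because both paths end at the same storage, a change made from C is
immediately visible through the symbol, and a change made through the
symbol is immediately visible to C.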
2163
2164   Whether or not you @code{DEFVAR_LISP()} a variable, you need to
2165 initialize it in the @code{vars_of_*()} function; otherwise it will end
2166 up as all zeroes, which is the integer 0 (@emph{not} @code{nil}), and
2167 this is probably not what you want.  Also, if the variable is not
2168 @code{DEFVAR_LISP()}ed, @strong{you must call} @code{staticpro()} on the
2169 C variable in the @code{vars_of_*()} function.  Otherwise, the
2170 garbage-collection mechanism won't know that the object in this variable
2171 is in use, and will happily collect it and reuse its storage for another
2172 Lisp object, and you will be the one who's unhappy when you can't figure
2173 out how your variable got overwritten.
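
To make the failure mode concrete, here is a toy mark-and-sweep model.
It is a sketch under stated assumptions rather than the XEmacs
collector: the @code{toy_} names are hypothetical, objects have no
references to other objects, and ``freeing'' just clears a flag.  Note
that the root array stores the @emph{address} of each variable, so
whatever the variable holds at collection time is what gets marked.

```c
#include <assert.h>
#include <stddef.h>

struct toy_object
{
  int marked;                          /* set during the mark phase */
  int live;                            /* cleared when the object is freed */
};

#define TOY_MAX_ROOTS 16

/* Addresses of the C variables that the collector must scan. */
static struct toy_object **toy_roots[TOY_MAX_ROOTS];
static int toy_nroots;

/* Model of staticpro(): remember where the variable lives. */
void
toy_staticpro (struct toy_object **varaddr)
{
  assert (varaddr != NULL && toy_nroots < TOY_MAX_ROOTS);
  toy_roots[toy_nroots++] = varaddr;
}

/* Mark what the registered variables point at, then free the rest. */
void
toy_collect (struct toy_object *heap, int nobjects)
{
  int i;

  for (i = 0; i < nobjects; i++)
    heap[i].marked = 0;
  for (i = 0; i < toy_nroots; i++)
    if (*toy_roots[i] != NULL)
      (*toy_roots[i])->marked = 1;
  for (i = 0; i < nobjects; i++)
    if (!heap[i].marked)
      heap[i].live = 0;
}

static struct toy_object toy_heap[2] = { { 0, 1 }, { 0, 1 } };

/* Two global variables; only the first one gets registered. */
struct toy_object *Vtoy_registered = &toy_heap[0];
struct toy_object *Vtoy_forgotten = &toy_heap[1];

/* Returns 10 * (first object still live) + (second object still live). */
int
toy_demo (void)
{
  toy_staticpro (&Vtoy_registered);
  toy_collect (toy_heap, 2);
  return toy_heap[0].live * 10 + toy_heap[1].live;
}
```

In the model, the object held only by @code{Vtoy_forgotten} is freed
even though the variable still points at it, which is the situation the
paragraph above warns about.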
2174
2175 @node Coding for Mule
2176 @section Coding for Mule
2177 @cindex Coding for Mule
2178
2179 Although Mule support is not compiled by default in XEmacs, many people
2180 are using it, and we consider it crucial that new code works correctly
2181 with multibyte characters.  This is not hard; it is only a matter of
2182 following several simple user-interface guidelines.  Even if you never
2183 compile with Mule, with a little practice you will find it quite easy
2184 to code Mule-correctly.
2185
2186 Note that these guidelines are not necessarily tied to the current Mule
2187 implementation; they are also a good idea to follow on the grounds of
2188 code generalization for future I18N work.
2189
2190 @menu
2191 * Character-Related Data Types::
2192 * Working With Character and Byte Positions::
2193 * Conversion to and from External Data::
2194 * General Guidelines for Writing Mule-Aware Code::
2195 * An Example of Mule-Aware Code::
2196 @end menu
2197
2198 @node Character-Related Data Types
2199 @subsection Character-Related Data Types
2200
2201 First, let's review the basic character-related datatypes used by
2202 XEmacs.  Note that the separate @code{typedef}s are not mandatory in the
2203 current implementation (all of them boil down to @code{unsigned char} or
2204 @code{int}), but they improve clarity of code a great deal, because one
2205 glance at the declaration can tell the intended use of the variable.
2206
2207 @table @code
2208 @item Emchar
2209 @cindex Emchar
2210 An @code{Emchar} holds a single Emacs character.
2211
2212 Obviously, the equality between characters and bytes is lost in the Mule
2213 world.  Characters can be represented by one or more bytes in the
2214 buffer, and @code{Emchar} is the C type large enough to hold any
2215 character.
2216
2217 Without Mule support, an @code{Emchar} is equivalent to an
2218 @code{unsigned char}.
2219
2220 @item Bufbyte
2221 @cindex Bufbyte
2222 The data representing the text in a buffer or string is logically a set
2223 of @code{Bufbyte}s.
2224
XEmacs does not work with the same character format all the time; when
reading characters from the outside, it decodes them to an internal
format, and likewise encodes them when writing.  @code{Bufbyte} (in fact
@code{unsigned char}) is the basic unit of the internal format used by
XEmacs buffers and strings.
2230
One character can correspond to one or more @code{Bufbyte}s.  In the
current implementation, an ASCII character is represented by a single
@code{Bufbyte} of the same value, and extended characters are
represented by a sequence of @code{Bufbyte}s.
2235
2236 Without Mule support, a @code{Bufbyte} is equivalent to an
2237 @code{Emchar}.
2238
2239 @item Bufpos
2240 @itemx Charcount
2241 @cindex Bufpos
2242 @cindex Charcount
2243 A @code{Bufpos} represents a character position in a buffer or string.
2244 A @code{Charcount} represents a number (count) of characters.
2245 Logically, subtracting two @code{Bufpos} values yields a
2246 @code{Charcount} value.  Although all of these are @code{typedef}ed to
2247 @code{int}, we use them in preference to @code{int} to make it clear
2248 what sort of position is being used.
2249
2250 @code{Bufpos} and @code{Charcount} values are the only ones that are
2251 ever visible to Lisp.
2252
2253 @item Bytind
2254 @itemx Bytecount
2255 @cindex Bytind
2256 @cindex Bytecount
2257 A @code{Bytind} represents a byte position in a buffer or string.  A
2258 @code{Bytecount} represents the distance between two positions in bytes.
2259 The relationship between @code{Bytind} and @code{Bytecount} is the same
2260 as the relationship between @code{Bufpos} and @code{Charcount}.
2261
2262 @item Extbyte
2263 @itemx Extcount
2264 @cindex Extbyte
2265 @cindex Extcount
2266 When dealing with the outside world, XEmacs works with @code{Extbyte}s,
2267 which are equivalent to @code{unsigned char}.  Obviously, an
2268 @code{Extcount} is the distance between two @code{Extbyte}s.  Extbytes
2269 and Extcounts are not all that frequent in XEmacs code.
2270 @end table
2271
2272 @node Working With Character and Byte Positions
2273 @subsection Working With Character and Byte Positions
2274
2275 Now that we have defined the basic character-related types, we can look
at the macros and functions designed for working with them and for
2277 conversion between them.  Most of these macros are defined in
2278 @file{buffer.h}, and we don't discuss all of them here, but only the
2279 most important ones.  Examining the existing code is the best way to
2280 learn about them.
2281
2282 @table @code
2283 @item MAX_EMCHAR_LEN
2284 @cindex MAX_EMCHAR_LEN
This preprocessor constant is the maximum number of buffer bytes per
Emacs character, i.e. the largest byte length any @code{Emchar} can have
in a buffer.  It is useful when allocating temporary strings that must
hold a known number of characters.
2288 For instance:
2289
2290 @example
2291 @group
2292 @{
2293   Charcount cclen;
2294   ...
2295   @{
2296     /* Allocate place for @var{cclen} characters. */
2297     Bufbyte *buf = (Bufbyte *)alloca (cclen * MAX_EMCHAR_LEN);
2298 ...
2299 @end group
2300 @end example
2301
2302 If you followed the previous section, you can guess that, logically,
2303 multiplying a @code{Charcount} value with @code{MAX_EMCHAR_LEN} produces
2304 a @code{Bytecount} value.
2305
2306 In the current Mule implementation, @code{MAX_EMCHAR_LEN} equals 4.
2307 Without Mule, it is 1.
2308
2309 @item charptr_emchar
2310 @itemx set_charptr_emchar
2311 @cindex charptr_emchar
2312 @cindex set_charptr_emchar
2313 The @code{charptr_emchar} macro takes a @code{Bufbyte} pointer and
2314 returns the @code{Emchar} stored at that position.  If it were a
2315 function, its prototype would be:
2316
2317 @example
2318 Emchar charptr_emchar (Bufbyte *p);
2319 @end example
2320
2321 @code{set_charptr_emchar} stores an @code{Emchar} to the specified byte
2322 position.  It returns the number of bytes stored:
2323
2324 @example
2325 Bytecount set_charptr_emchar (Bufbyte *p, Emchar c);
2326 @end example
2327
2328 It is important to note that @code{set_charptr_emchar} is safe only for
2329 appending a character at the end of a buffer, not for overwriting a
2330 character in the middle.  This is because the width of characters
2331 varies, and @code{set_charptr_emchar} cannot resize the string if it
2332 writes, say, a two-byte character where a single-byte character used to
2333 reside.
2334
2335 A typical use of @code{set_charptr_emchar} can be demonstrated by this
2336 example, which copies characters from buffer @var{buf} to a temporary
2337 string of Bufbytes.
2338
2339 @example
2340 @group
2341 @{
2342   Bufpos pos;
2343   for (pos = beg; pos < end; pos++)
2344     @{
2345       Emchar c = BUF_FETCH_CHAR (buf, pos);
      p += set_charptr_emchar (p, c);
2347     @}
2348 @}
2349 @end group
2350 @end example
2351
Note how @code{set_charptr_emchar} is used to store the @code{Emchar}
and advance the pointer @code{p} at the same time.
2354
2355 @item INC_CHARPTR
2356 @itemx DEC_CHARPTR
2357 @cindex INC_CHARPTR
2358 @cindex DEC_CHARPTR
2359 These two macros increment and decrement a @code{Bufbyte} pointer,
2360 respectively.  They will adjust the pointer by the appropriate number of
2361 bytes according to the byte length of the character stored there.  Both
2362 macros assume that the memory address is located at the beginning of a
2363 valid character.
2364
2365 Without Mule support, @code{INC_CHARPTR (p)} and @code{DEC_CHARPTR (p)}
2366 simply expand to @code{p++} and @code{p--}, respectively.
2367
2368 @item bytecount_to_charcount
2369 @cindex bytecount_to_charcount
2370 Given a pointer to a text string and a length in bytes, return the
2371 equivalent length in characters.
2372
2373 @example
2374 Charcount bytecount_to_charcount (Bufbyte *p, Bytecount bc);
2375 @end example
2376
2377 @item charcount_to_bytecount
2378 @cindex charcount_to_bytecount
2379 Given a pointer to a text string and a length in characters, return the
2380 equivalent length in bytes.
2381
2382 @example
2383 Bytecount charcount_to_bytecount (Bufbyte *p, Charcount cc);
2384 @end example
2385
2386 @item charptr_n_addr
2387 @cindex charptr_n_addr
2388 Return a pointer to the beginning of the character offset @var{cc} (in
2389 characters) from @var{p}.
2390
2391 @example
2392 Bufbyte *charptr_n_addr (Bufbyte *p, Charcount cc);
2393 @end example
2394 @end table
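
The relationships among these operations can be tried out with a small
stand-alone model.  The sketch below is not XEmacs code: the
@code{toy_} names are hypothetical, and it uses UTF-8 as the
variable-width encoding purely for illustration, whereas the Mule
internal format is different.  What carries over is the pattern: store
a character and advance by the returned byte count, size buffers by the
worst case, and convert between byte counts and character counts by
walking the text.

```c
#include <assert.h>

typedef unsigned char toy_bufbyte;    /* one byte of internal text */
typedef int toy_emchar;               /* one whole character */
typedef int toy_bytecount;            /* a length measured in bytes */
typedef int toy_charcount;            /* a length measured in characters */

#define TOY_MAX_EMCHAR_LEN 4          /* worst-case bytes per character */

/* Store C at P and return the number of bytes written (1 to 4). */
toy_bytecount
toy_set_charptr_emchar (toy_bufbyte *p, toy_emchar c)
{
  assert (c >= 0 && c <= 0x10FFFF);
  if (c < 0x80)
    {
      p[0] = (toy_bufbyte) c;
      return 1;
    }
  if (c < 0x800)
    {
      p[0] = (toy_bufbyte) (0xC0 | (c >> 6));
      p[1] = (toy_bufbyte) (0x80 | (c & 0x3F));
      return 2;
    }
  if (c < 0x10000)
    {
      p[0] = (toy_bufbyte) (0xE0 | (c >> 12));
      p[1] = (toy_bufbyte) (0x80 | ((c >> 6) & 0x3F));
      p[2] = (toy_bufbyte) (0x80 | (c & 0x3F));
      return 3;
    }
  p[0] = (toy_bufbyte) (0xF0 | (c >> 18));
  p[1] = (toy_bufbyte) (0x80 | ((c >> 12) & 0x3F));
  p[2] = (toy_bufbyte) (0x80 | ((c >> 6) & 0x3F));
  p[3] = (toy_bufbyte) (0x80 | (c & 0x3F));
  return 4;
}

/* Byte length of the character starting at P, read from its lead byte.
   P must point at the beginning of a valid character. */
toy_bytecount
toy_char_length (const toy_bufbyte *p)
{
  if (*p < 0x80)
    return 1;
  if ((*p & 0xE0) == 0xC0)
    return 2;
  if ((*p & 0xF0) == 0xE0)
    return 3;
  return 4;
}

/* Number of characters in the BC bytes starting at P. */
toy_charcount
toy_bytecount_to_charcount (const toy_bufbyte *p, toy_bytecount bc)
{
  const toy_bufbyte *end = p + bc;
  toy_charcount cc = 0;

  while (p < end)
    {
      p += toy_char_length (p);
      cc++;
    }
  return cc;
}

/* Number of bytes occupied by the first CC characters starting at P. */
toy_bytecount
toy_charcount_to_bytecount (const toy_bufbyte *p, toy_charcount cc)
{
  const toy_bufbyte *start = p;

  while (cc-- > 0)
    p += toy_char_length (p);
  return (toy_bytecount) (p - start);
}

/* Encode N characters into BUF, which must have room for
   N * TOY_MAX_EMCHAR_LEN bytes.  Returns the bytes actually used. */
toy_bytecount
toy_encode (const toy_emchar *chars, toy_charcount n, toy_bufbyte *buf)
{
  toy_bufbyte *p = buf;
  toy_charcount i;

  for (i = 0; i < n; i++)
    p += toy_set_charptr_emchar (p, chars[i]);
  return (toy_bytecount) (p - buf);
}
```

In this model the three characters @code{'a'}, @code{0xE9}, and
@code{0x20AC} occupy one, two, and three bytes respectively, so a
three-character string uses six bytes of a twelve-byte worst-case
buffer.  The byte lengths under Mule will differ, but the bookkeeping
is the same.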
2395
2396 @node Conversion to and from External Data
2397 @subsection Conversion to and from External Data
2398
2399 When an external function, such as a C library function, returns a
2400 @code{char} pointer, you should almost never treat it as @code{Bufbyte}.
This is because these returned strings may contain 8-bit characters which
2402 can be misinterpreted by XEmacs, and cause a crash.  Likewise, when
2403 exporting a piece of internal text to the outside world, you should
2404 always convert it to an appropriate external encoding, lest the internal
2405 stuff (such as the infamous \201 characters) leak out.
2406
Conversion between the internal and external representations of text is
handled by the numerous conversion macros defined in @file{buffer.h}.
Before looking at them, we'll look at the external formats supported by
these macros.
2411
2412 Currently meaningful formats are @code{FORMAT_BINARY},
2413 @code{FORMAT_FILENAME}, @code{FORMAT_OS}, and @code{FORMAT_CTEXT}.  Here
2414 is a description of these.
2415
2416 @table @code
2417 @item FORMAT_BINARY
2418 Binary format.  This is the simplest format and is what we use in the
2419 absence of a more appropriate format.  This converts according to the
2420 @code{binary} coding system:
2421
2422 @enumerate a
2423 @item
2424 On input, bytes 0--255 are converted into characters 0--255.
2425 @item
2426 On output, characters 0--255 are converted into bytes 0--255 and other
2427 characters are converted into `X'.
2428 @end enumerate
2429
2430 @item FORMAT_FILENAME
2431 Format used for filenames.  In the original Mule, this is user-definable
2432 with the @code{pathname-coding-system} variable.  For the moment, we
2433 just use the @code{binary} coding system.
2434
2435 @item FORMAT_OS
2436 Format used for the external Unix environment---@code{argv[]}, stuff
2437 from @code{getenv()}, stuff from the @file{/etc/passwd} file, etc.
2438
Perhaps this should be the same as @code{FORMAT_FILENAME}.
2440
2441 @item FORMAT_CTEXT
Compound-text format.  This is the standard X format used for data
2443 stored in properties, selections, and the like.  This is an 8-bit
2444 no-lock-shift ISO2022 coding system.
2445 @end table
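
The two rules given for @code{FORMAT_BINARY} are simple enough to write
down directly.  The sketch below is a stand-alone model, not the
conversion code from XEmacs: the function names are hypothetical, and
characters are represented as plain @code{int} values.

```c
#include <assert.h>
#include <stddef.h>

/* On input, bytes 0-255 become characters 0-255. */
void
toy_binary_decode (const unsigned char *ext, size_t len, int *chars)
{
  size_t i;

  assert (ext != NULL && chars != NULL);
  for (i = 0; i < len; i++)
    chars[i] = ext[i];
}

/* On output, characters 0-255 become bytes 0-255; any other
   character is replaced by 'X'. */
void
toy_binary_encode (const int *chars, size_t len, unsigned char *ext)
{
  size_t i;

  assert (chars != NULL && ext != NULL);
  for (i = 0; i < len; i++)
    ext[i] = (chars[i] >= 0 && chars[i] <= 255)
             ? (unsigned char) chars[i]
             : (unsigned char) 'X';
}
```

Note that the output direction loses information: once a character
outside the range 0--255 has been written as @samp{X}, decoding cannot
recover it.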
2446
2447 The macros to convert between these formats and the internal format, and
2448 vice versa, follow.
2449
2450 @table @code
2451 @item GET_CHARPTR_INT_DATA_ALLOCA
2452 @itemx GET_CHARPTR_EXT_DATA_ALLOCA
2453 These two are the most basic conversion macros.
2454 @code{GET_CHARPTR_INT_DATA_ALLOCA} converts external data to internal
2455 format, and @code{GET_CHARPTR_EXT_DATA_ALLOCA} converts the other way
2456 around.  The arguments each of these receives are @var{ptr} (pointer to
the text in external format), @var{len} (length of the text in bytes),
2458 @var{fmt} (format of the external text), @var{ptr_out} (lvalue to which
2459 new text should be copied), and @var{len_out} (lvalue which will be
2460 assigned the length of the internal text in bytes).  The resulting text
2461 is stored to a stack-allocated buffer.  If the text doesn't need
2462 changing, these macros will do nothing, except for setting
2463 @var{len_out}.
2464
The macros above take many arguments, which makes them unwieldy.  For
this reason, a number of convenience macros are defined with obvious
functionality, but accepting fewer arguments.  The general rule is that
2468 macros with @samp{INT} in their name convert text to internal Emacs
2469 representation, whereas the @samp{EXT} macros convert to external
2470 representation.
2471
2472 @item GET_C_CHARPTR_INT_DATA_ALLOCA
2473 @itemx GET_C_CHARPTR_EXT_DATA_ALLOCA
2474 As their names imply, these macros work on C char pointers, which are
2475 zero-terminated, and thus do not need @var{len} or @var{len_out}
2476 parameters.
2477
2478 @item GET_STRING_EXT_DATA_ALLOCA
2479 @itemx GET_C_STRING_EXT_DATA_ALLOCA
2480 These two macros convert a Lisp string into an external representation.
2481 The difference between them is that @code{GET_STRING_EXT_DATA_ALLOCA}
2482 stores its output to a generic string, providing @var{len_out}, the
2483 length of the resulting external string.  On the other hand,
@code{GET_C_STRING_EXT_DATA_ALLOCA} assumes that the caller will be
satisfied with the output string being zero-terminated.
2486
2487 Note that for Lisp strings only one conversion direction makes sense.
2488
2489 @item GET_C_CHARPTR_EXT_BINARY_DATA_ALLOCA
2490 @itemx GET_CHARPTR_EXT_BINARY_DATA_ALLOCA
2491 @itemx GET_STRING_BINARY_DATA_ALLOCA
2492 @itemx GET_C_STRING_BINARY_DATA_ALLOCA
2493 @itemx GET_C_CHARPTR_EXT_FILENAME_DATA_ALLOCA
2494 @itemx ...
2495 These macros convert internal text to a specific external
2496 representation, with the external format being encoded into the name of
2497 the macro.  Note that the @code{GET_STRING_...} and
2498 @code{GET_C_STRING...}  macros lack the @samp{EXT} tag, because they
2499 only make sense in that direction.
2500
2501 @item GET_C_CHARPTR_INT_BINARY_DATA_ALLOCA
2502 @itemx GET_CHARPTR_INT_BINARY_DATA_ALLOCA
2503 @itemx GET_C_CHARPTR_INT_FILENAME_DATA_ALLOCA
2504 @itemx ...
2505 These macros convert external text of a specific format to its internal
representation, with the external format being encoded into the name of
2507 the macro.
2508 @end table
2509
2510 @node General Guidelines for Writing Mule-Aware Code
2511 @subsection General Guidelines for Writing Mule-Aware Code
2512
2513 This section contains some general guidance on how to write Mule-aware
2514 code, as well as some pitfalls you should avoid.
2515
2516 @table @emph
2517 @item Never use @code{char} and @code{char *}.
2518 In XEmacs, the use of @code{char} and @code{char *} is almost always a
2519 mistake.  If you want to manipulate an Emacs character from ``C'', use
2520 @code{Emchar}.  If you want to examine a specific octet in the internal
2521 format, use @code{Bufbyte}.  If you want a Lisp-visible character, use a
2522 @code{Lisp_Object} and @code{make_char}.  If you want a pointer to move
2523 through the internal text, use @code{Bufbyte *}.  Also note that you
2524 almost certainly do not need @code{Emchar *}.
2525
2526 @item Be careful not to confuse @code{Charcount}, @code{Bytecount}, and @code{Bufpos}.
2527 The whole point of using different types is to avoid confusion about the
2528 use of certain variables.  Lest this effect be nullified, you need to be
2529 careful about using the right types.
2530
2531 @item Always convert external data
2532 It is extremely important to always convert external data, because
XEmacs can crash if unexpected 8-bit sequences are copied to its internal
2534 buffers literally.
2535
2536 This means that when a system function, such as @code{readdir}, returns
2537 a string, you need to convert it using one of the conversion macros
described in the previous section, before passing it further to Lisp.
2539 In the case of @code{readdir}, you would use the
2540 @code{GET_C_CHARPTR_INT_FILENAME_DATA_ALLOCA} macro.
2541
2542 Also note that many internal functions, such as @code{make_string},
2543 accept Bufbytes, which removes the need for them to convert the data
2544 they receive.  This increases efficiency because that way external data
2545 needs to be decoded only once, when it is read.  After that, it is
2546 passed around in internal format.
2547 @end table
2548
2549 @node An Example of Mule-Aware Code
2550 @subsection An Example of Mule-Aware Code
2551
As an example of Mule-aware code, we will analyze the
@code{string} function, which conses up a Lisp string from the character
arguments it receives.  Here is the definition, pasted from
@file{alloc.c}:
2556
2557 @example
2558 @group
2559 DEFUN ("string", Fstring, 0, MANY, 0, /*
2560 Concatenate all the argument characters and make the result a string.
2561 */
2562        (int nargs, Lisp_Object *args))
2563 @{
2564   Bufbyte *storage = alloca_array (Bufbyte, nargs * MAX_EMCHAR_LEN);
2565   Bufbyte *p = storage;
2566
2567   for (; nargs; nargs--, args++)
2568     @{
2569       Lisp_Object lisp_char = *args;
2570       CHECK_CHAR_COERCE_INT (lisp_char);
2571       p += set_charptr_emchar (p, XCHAR (lisp_char));
2572     @}
2573   return make_string (storage, p - storage);
2574 @}
2575 @end group
2576 @end example
2577
2578 Now we can analyze the source line by line.
2579
Obviously, the resulting string will contain as many characters as there
are arguments to the function.  This is why we allocate
@code{MAX_EMCHAR_LEN} * @var{nargs}
2582 bytes on the stack, i.e. the worst-case number of bytes for @var{nargs}
2583 @code{Emchar}s to fit in the string.
2584
2585 Then, the loop checks that each element is a character, converting
2586 integers in the process.  Like many other functions in XEmacs, this
2587 function silently accepts integers where characters are expected, for
2588 historical and compatibility reasons.  Unless you know what you are
2589 doing, @code{CHECK_CHAR} will also suffice.  @code{XCHAR (lisp_char)}
2590 extracts the @code{Emchar} from the @code{Lisp_Object}, and
2591 @code{set_charptr_emchar} stores it to storage, increasing @code{p} in
2592 the process.
2593
2594 Other instructive examples of correct coding under Mule can be found all
2595 over the XEmacs code.  For starters, I recommend
2596 @code{Fnormalize_menu_item_name} in @file{menubar.c}.  After you have
2597 understood this section of the manual and studied the examples, you can
proceed to write new Mule-aware code.
2599
2600 @node Techniques for XEmacs Developers
2601 @section Techniques for XEmacs Developers
2602
2603 To make a quantified XEmacs, do: @code{make quantmacs}.
2604
2605 You simply can't dump Quantified and Purified images.  Run the image
2606 like so:  @code{quantmacs -batch -l loadup.el run-temacs @var{xemacs-args...}}.
2607
2608 Before you go through the trouble, are you compiling with all
debugging and error-checking off?  If not, try that first.  Be warned
2610 that while Quantify is directly responsible for quite a few
2611 optimizations which have been made to XEmacs, doing a run which
2612 generates results which can be acted upon is not necessarily a trivial
2613 task.
2614
Also, if you're still willing to do some runs, make sure you configure
2616 with the @samp{--quantify} flag.  That will keep Quantify from starting
2617 to record data until after the loadup is completed and will shut off
2618 recording right before it shuts down (which generates enough bogus data
2619 to throw most results off).  It also enables three additional elisp
2620 commands: @code{quantify-start-recording-data},
2621 @code{quantify-stop-recording-data} and @code{quantify-clear-data}.
2622
2623 If you want to make XEmacs faster, target your favorite slow benchmark,
2624 run a profiler like Quantify, @code{gprof}, or @code{tcov}, and figure
2625 out where the cycles are going.  Specific projects:
2626
2627 @itemize @bullet
2628 @item
2629 Make the garbage collector faster.  Figure out how to write an
2630 incremental garbage collector.
2631 @item
2632 Write a compiler that takes bytecode and spits out C code.
2633 Unfortunately, you will then need a C compiler and a more fully
2634 developed module system.
2635 @item
2636 Speed up redisplay.
2637 @item
2638 Speed up syntax highlighting.  Maybe moving some of the syntax
2639 highlighting capabilities into C would make a difference.
2640 @item
2641 Implement tail recursion in Emacs Lisp (hard!).
2642 @end itemize
2643
2644 Unfortunately, Emacs Lisp is slow, and is going to stay slow.  Function
2645 calls in elisp are especially expensive.  Iterating over a long list is
2646 going to be 30 times faster implemented in C than in Elisp.
2647
2648 To get started debugging XEmacs, take a look at the @file{gdbinit} and
2649 @file{dbxrc} files in the @file{src} directory.
2650 @xref{Q2.1.15 - How to Debug an XEmacs problem with a debugger,,,
2651 xemacs-faq, XEmacs FAQ}.
2652
2653 After making source code changes, run @code{make check} to ensure that
2654 you haven't introduced any regressions.  If you're feeling ambitious,
2655 you can try to improve the test suite in @file{tests/automated}.
2656
2657 Here are things to know when you create a new source file:
2658
2659 @itemize @bullet
2660 @item
2661 All @file{.c} files should @code{#include <config.h>} first.  Almost all
2662 @file{.c} files should @code{#include "lisp.h"} second.
2663
2664 @item
2665 Generated header files should be included using the @code{#include <...>} syntax,
2666 not the @code{#include "..."} syntax.  The generated headers are:
2667
2668 @file{config.h puresize-adjust.h sheap-adjust.h paths.h Emacs.ad.h}
2669
The basic rule is that you should assume builds using @code{--srcdir},
so the @code{#include <...>} syntax needs to be used when the
2672 to-be-included generated file is in a potentially different directory
2673 @emph{at compile time}.  The non-obvious C rule is that @code{#include "..."}
2674 means to search for the included file in the same directory as the
2675 including file, @emph{not} in the current directory.
2676
2677 @item
2678 Header files should @emph{not} include @code{<config.h>} and
@code{"lisp.h"}.  That is the responsibility of the @file{.c} files that
include the header.
2681
2682 @item
2683 If the header uses @code{INLINE}, either directly or through
2684 @code{DECLARE_LRECORD}, then it must be added to @file{inline.c}'s
2685 includes.
2686
2687 @item
2688 Try compiling at least once with
2689
2690 @example
2691 gcc --with-mule --with-union-type --error-checking=all
2692 @end example
2693
2694 @item
2695 Did I mention that you should run the test suite?
2696 @example
2697 make check
2698 @end example
2699 @end itemize
2700
2701
2702 @node A Summary of the Various XEmacs Modules, Allocation of Objects in XEmacs Lisp, Rules When Writing New C Code, Top
2703 @chapter A Summary of the Various XEmacs Modules
2704
2705   This is accurate as of XEmacs 20.0.
2706
2707 @menu
2708 * Low-Level Modules::
2709 * Basic Lisp Modules::
2710 * Modules for Standard Editing Operations::
2711 * Editor-Level Control Flow Modules::
2712 * Modules for the Basic Displayable Lisp Objects::
2713 * Modules for other Display-Related Lisp Objects::
2714 * Modules for the Redisplay Mechanism::
2715 * Modules for Interfacing with the File System::
2716 * Modules for Other Aspects of the Lisp Interpreter and Object System::
2717 * Modules for Interfacing with the Operating System::
2718 * Modules for Interfacing with X Windows::
2719 * Modules for Internationalization::
2720 @end menu
2721
2722 @node Low-Level Modules
2723 @section Low-Level Modules
2724
2725 @example
2726 config.h
2727 @end example
2728
2729 This is automatically generated from @file{config.h.in} based on the
2730 results of configure tests and user-selected optional features and
2731 contains preprocessor definitions specifying the nature of the
2732 environment in which XEmacs is being compiled.
2733
2734
2735
2736 @example
2737 paths.h
2738 @end example
2739
2740 This is automatically generated from @file{paths.h.in} based on supplied
2741 configure values, and allows for non-standard installed configurations
2742 of the XEmacs directories.  It's currently broken, though.
2743
2744
2745
2746 @example
2747 emacs.c
2748 signal.c
2749 @end example
2750
2751 @file{emacs.c} contains @code{main()} and other code that performs the most
2752 basic environment initializations and handles shutting down the XEmacs
2753 process (this includes @code{kill-emacs}, the normal way that XEmacs is
2754 exited; @code{dump-emacs}, which is used during the build process to
2755 write out the XEmacs executable; @code{run-emacs-from-temacs}, which can
2756 be used to start XEmacs directly when temacs has finished loading all
2757 the Lisp code; and emergency code to handle crashes [XEmacs tries to
2758 auto-save all files before it crashes]).
2759
2760 Low-level code that directly interacts with the Unix signal mechanism,
2761 however, is in @file{signal.c}.  Note that this code does not handle system
2762 dependencies in interfacing to signals; that is handled using the
2763 @file{syssignal.h} header file, described in section J below.
2764
2765
2766
2767 @example
2768 unexaix.c
2769 unexalpha.c
2770 unexapollo.c
2771 unexconvex.c
2772 unexec.c
2773 unexelf.c
2774 unexelfsgi.c
2775 unexencap.c
2776 unexenix.c
2777 unexfreebsd.c
2778 unexfx2800.c
2779 unexhp9k3.c
2780 unexhp9k800.c
2781 unexmips.c
2782 unexnext.c
2783 unexsol2.c
2784 unexsunos4.c
2785 @end example
2786
2787 These modules contain code for dumping out the XEmacs executable on
2788 various systems.  (This process is highly machine-specific and
2789 requires intimate knowledge of the executable format and the memory map
2790 of the process.)  Only one of these modules is actually used; it is
2791 chosen by @file{configure}.
2792
2793
2794
2795 @example
2796 crt0.c
2797 lastfile.c
2798 pre-crt0.c
2799 @end example
2800
2801 These modules are used in conjunction with the dump mechanism.  On some
2802 systems, an alternative version of the C startup code (the actual code
2803 that receives control from the operating system when the process is
2804 started, and which calls @code{main()}) is required so that the dumping
2805 process works properly; @file{crt0.c} provides this.
2806
2807 @file{pre-crt0.c} and @file{lastfile.c} should be the very first and
2808 very last file linked, respectively. (Actually, this is not really true.
2809 @file{lastfile.c} should be after all Emacs modules whose initialized
2810 data should be made constant, and before all other Emacs files and all
2811 libraries.  In particular, the allocation modules @file{gmalloc.c},
2812 @file{alloca.c}, etc. are normally placed past @file{lastfile.c}, and
2813 all of the files that implement Xt widget classes @emph{must} be placed
2814 after @file{lastfile.c} because they contain various structures that
2815 must be statically initialized and into which Xt writes at various
2816 times.) @file{pre-crt0.c} and @file{lastfile.c} contain exported symbols
2817 that are used to determine the start and end of XEmacs' initialized
2818 data space when dumping.
2819
2820
2821
2822 @example
2823 alloca.c
2824 free-hook.c
2825 getpagesize.h
2826 gmalloc.c
2827 malloc.c
2828 mem-limits.h
2829 ralloc.c
2830 vm-limit.c
2831 @end example
2832
2833 These handle basic C allocation of memory.  @file{alloca.c} is an emulation of
2834 the stack allocation function @code{alloca()} on machines that lack
2835 this. (XEmacs makes extensive use of @code{alloca()} in its code.)
2836
2837 @file{gmalloc.c} and @file{malloc.c} are two implementations of the standard C
2838 functions @code{malloc()}, @code{realloc()} and @code{free()}.  They are
2839 often used in place of the standard system-provided @code{malloc()}
2840 because they usually provide a much faster implementation, at the
2841 expense of additional memory use.  @file{gmalloc.c} is a newer implementation
2842 that is much more memory-efficient for large allocations than @file{malloc.c},
2843 and should always be preferred if it works. (At one point, @file{gmalloc.c}
2844 didn't work on some systems where @file{malloc.c} worked; but this should be
2845 fixed now.)
2846
2847 @cindex relocating allocator
2848 @file{ralloc.c} is the @dfn{relocating allocator}.  It provides
2849 functions similar to @code{malloc()}, @code{realloc()} and @code{free()}
2850 that allocate memory that can be dynamically relocated in memory.  The
2851 advantage of this is that allocated memory can be shuffled around to
2852 place all the free memory at the end of the heap, and the heap can then
2853 be shrunk, releasing the memory back to the operating system.  The use
2854 of this can be controlled with the configure option @code{--rel-alloc};
2855 if enabled, memory allocated for buffers will be relocatable, so that if
2856 a very large file is visited and the buffer is later killed, the memory
2857 can be released to the operating system.  (The disadvantage of this
2858 mechanism is that it can be very slow.  On systems with the
2859 @code{mmap()} system call, the XEmacs version of @file{ralloc.c} uses
2860 this to move memory around without actually having to block-copy it,
2861 which can speed things up; but it can still cause noticeable performance
2862 degradation.)
2863
2864 @file{free-hook.c} contains some debugging functions for checking for invalid
2865 arguments to @code{free()}.
2866
2867 @file{vm-limit.c} contains some functions that warn the user when memory is
2868 getting low.  These are callback functions that are called by @file{gmalloc.c}
2869 and @file{malloc.c} at appropriate times.
2870
2871 @file{getpagesize.h} provides a uniform interface for retrieving the size of a
2872 page in virtual memory.  @file{mem-limits.h} provides a uniform interface for
2873 retrieving the total amount of available virtual memory.  Both are
2874 similar in spirit to the @file{sys*.h} files described in section J, below.
2875
2876
2877
2878 @example
2879 blocktype.c
2880 blocktype.h
2881 dynarr.c
2882 @end example
2883
2884 These implement a couple of basic C data types to facilitate memory
2885 allocation.  The @code{Blocktype} type efficiently manages the
2886 allocation of fixed-size blocks by minimizing the number of times that
2887 @code{malloc()} and @code{free()} are called.  It allocates memory in
2888 large chunks, subdivides the chunks into blocks of the proper size, and
2889 returns the blocks as requested.  When blocks are freed, they are placed
2890 onto a linked list, so they can be efficiently reused.  This data type
2891 is not much used in XEmacs currently, because it's a fairly new
2892 addition.
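
The chunk-and-free-list scheme described above can be sketched as
follows.  This is a minimal illustration under stated assumptions, not
the actual @code{Blocktype} interface: the @code{sketch_pool} names and
the chunk size of 64 blocks are hypothetical.

@example
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names and chunk size; not the real Blocktype API. */
#define SKETCH_BLOCKS_PER_CHUNK 64

struct sketch_free_node @{ struct sketch_free_node *next; @};

struct sketch_pool
@{
  size_t block_size;                   /* rounded-up size of one block */
  struct sketch_free_node *free_list;  /* blocks ready for reuse */
  int chunks_allocated;                /* number of malloc() calls so far */
@};

static void
sketch_pool_init (struct sketch_pool *pool, size_t block_size)
@{
  /* Round up so that every block can hold a free-list link and stays
     aligned for pointers. */
  size_t align = sizeof (struct sketch_free_node);
  if (block_size < align)
    block_size = align;
  pool->block_size = (block_size + align - 1) / align * align;
  pool->free_list = NULL;
  pool->chunks_allocated = 0;
@}

static void *
sketch_pool_alloc (struct sketch_pool *pool)
@{
  struct sketch_free_node *node;

  if (pool->free_list == NULL)
    @{
      /* One malloc() call serves the next SKETCH_BLOCKS_PER_CHUNK
         requests: carve the chunk into blocks and chain them up. */
      char *chunk = malloc (pool->block_size * SKETCH_BLOCKS_PER_CHUNK);
      int i;
      if (chunk == NULL)
        return NULL;
      pool->chunks_allocated++;
      for (i = 0; i < SKETCH_BLOCKS_PER_CHUNK; i++)
        @{
          node = (struct sketch_free_node *) (chunk + i * pool->block_size);
          node->next = pool->free_list;
          pool->free_list = node;
        @}
    @}
  node = pool->free_list;
  pool->free_list = node->next;
  return node;
@}

static void
sketch_pool_free (struct sketch_pool *pool, void *block)
@{
  /* The freed block itself stores the link to the next free block. */
  struct sketch_free_node *node = block;
  node->next = pool->free_list;
  pool->free_list = node;
@}
@end example

In this sketch chunks are never handed back to @code{free()}; a freed
block simply becomes the first candidate for the next allocation.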
2893
2894 @cindex dynamic array
2895 The @code{Dynarr} type implements a @dfn{dynamic array}, which is
2896 similar to a standard C array but has no fixed limit on the number of
2897 elements it can contain.  Dynamic arrays can hold elements of any type,
2898 and when you add a new element, the array automatically resizes itself
2899 if it isn't big enough.  Dynarrs are extensively used in the redisplay
2900 mechanism.
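
A growable array of this kind can be sketched as follows.  The names
(@code{sketch_dynarr}, @code{sketch_dynarr_add()}, and so on) and the
doubling policy are assumptions made for illustration; the real
@code{Dynarr} interface differs.

@example
#include <stdlib.h>
#include <string.h>

/* Hypothetical names; not the real Dynarr interface. */
struct sketch_dynarr
@{
  size_t elsize;    /* size of one element in bytes */
  size_t count;     /* elements currently stored */
  size_t capacity;  /* elements that fit before the next resize */
  char *data;
@};

static void
sketch_dynarr_init (struct sketch_dynarr *d, size_t elsize)
@{
  d->elsize = elsize;
  d->count = 0;
  d->capacity = 0;
  d->data = NULL;
@}

/* Append one element, doubling the storage when it is full.
   Returns 0 on success and -1 if memory is exhausted. */
static int
sketch_dynarr_add (struct sketch_dynarr *d, const void *element)
@{
  if (d->count == d->capacity)
    @{
      size_t new_capacity = d->capacity ? d->capacity * 2 : 8;
      char *new_data = realloc (d->data, new_capacity * d->elsize);
      if (new_data == NULL)
        return -1;
      d->data = new_data;
      d->capacity = new_capacity;
    @}
  memcpy (d->data + d->count * d->elsize, element, d->elsize);
  d->count++;
  return 0;
@}

/* Return a pointer to element INDEX; the caller casts it to the
   element type.  No bounds checking is done in this sketch. */
static void *
sketch_dynarr_at (struct sketch_dynarr *d, size_t index)
@{
  return d->data + index * d->elsize;
@}

static void
sketch_dynarr_free (struct sketch_dynarr *d)
@{
  free (d->data);
  d->data = NULL;
  d->count = 0;
  d->capacity = 0;
@}
@end example

Because the element size is stored in the array itself, the same
functions serve elements of any type.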
2901
2902
2903
2904 @example
2905 inline.c
2906 @end example
2907
2908 This module is used in connection with inline functions (available in
2909 some compilers).  Often, inline functions need to have a corresponding
2910 non-inline function that does the same thing.  This module is where they
2911 reside.  It contains no actual code, but defines some special flags that
2912 cause inline functions defined in header files to be rendered as actual
2913 functions.  It then includes all header files that contain any inline
2914 function definitions, so that each one gets a real function equivalent.
2915
2916
2917
2918 @example
2919 debug.c
2920 debug.h
2921 @end example
2922
2923 These functions provide a system for doing internal consistency checks
2924 during code development.  This system is not currently used; instead the
2925 simpler @code{assert()} macro is used along with the various checks
2926 provided by the @samp{--error-check-*} configuration options.
2927
2928
2929
2930 @example
2931 prefix-args.c
2932 @end example
2933
2934 This is actually the source for a small, self-contained program
2935 used during building.
2936
2937
2938 @example
2939 universe.h
2940 @end example
2941
2942 This is not currently used.
2943
2944
2945
2946 @node Basic Lisp Modules
2947 @section Basic Lisp Modules
2948
2949 @example
2950 emacsfns.h
2951 lisp-disunion.h
2952 lisp-union.h
2953 lisp.h
2954 lrecord.h
2955 symsinit.h
2956 @end example
2957
2958 These are the basic header files for all XEmacs modules.  Each module
2959 includes @file{lisp.h}, which brings the other header files in.
2960 @file{lisp.h} contains the definitions of the structures and extractor
2961 and constructor macros for the basic Lisp objects and various other
2962 basic definitions for the Lisp environment, as well as some
2963 general-purpose definitions (e.g. @code{min()} and @code{max()}).
2964 @file{lisp.h} includes either @file{lisp-disunion.h} or
2965 @file{lisp-union.h}, depending on whether @code{USE_UNION_TYPE} is
2966 defined.  These files define the typedef of the Lisp object itself (as
2967 described above) and the low-level macros that hide the actual
2968 implementation of the Lisp object.  All extractor and constructor macros
2969 for particular types of Lisp objects are defined in terms of these
2970 low-level macros.
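
The layering of type-specific macros on top of low-level macros can be
sketched as follows.  The bit layout, the type codes and all
@code{SKETCH_} names are hypothetical and chosen only for illustration;
they are not the definitions found in @file{lisp-disunion.h} or
@file{lisp-union.h}.  The sketch handles only non-negative values.

@example
#include <stdint.h>

/* Hypothetical layout: a single machine word whose low two bits
   hold a type tag and whose remaining bits hold the value. */
typedef uintptr_t Sketch_Object;

enum sketch_type @{ SKETCH_TYPE_INT = 0, SKETCH_TYPE_CHAR = 1 @};

#define SKETCH_TAG_BITS 2
#define SKETCH_TAG_MASK (((uintptr_t) 1 << SKETCH_TAG_BITS) - 1)

/* Low-level macros: the only code that knows the bit layout. */
#define SKETCH_TYPE(obj) ((enum sketch_type) ((obj) & SKETCH_TAG_MASK))
#define SKETCH_VALUE(obj) ((obj) >> SKETCH_TAG_BITS)
#define SKETCH_MAKE(type, value) ((((Sketch_Object) (value)) << SKETCH_TAG_BITS) | (Sketch_Object) (type))

/* Type-specific predicate, constructor and extractor, written purely
   in terms of the low-level macros above. */
#define SKETCH_INTP(obj) (SKETCH_TYPE (obj) == SKETCH_TYPE_INT)
#define SKETCH_MAKE_INT(n) SKETCH_MAKE (SKETCH_TYPE_INT, n)
#define SKETCH_XINT(obj) ((long) SKETCH_VALUE (obj))
@end example

If the representation changed (for instance, to a structure with
bit-fields), only the three low-level macros would need rewriting.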
2971
2972 As a general rule, all typedefs should go into the typedefs section of
2973 @file{lisp.h} rather than into a module-specific header file even if the
2974 structure is defined elsewhere.  This allows function prototypes that
2975 use the typedef to be placed into other header files.  Forward structure
2976 declarations (i.e. a simple declaration like @code{struct foo;} where
2977 the structure itself is defined elsewhere) should be placed into the
2978 typedefs section as necessary.
2979
2980 @file{lrecord.h} contains the basic structures and macros that implement
2981 all record-type Lisp objects -- i.e. all objects whose type is a field
2982 in their C structure, which includes all objects except the few most
2983 basic ones.
2984
2985 @file{lisp.h} contains prototypes for most of the exported functions in
2986 the various modules.  Lisp primitives defined using @code{DEFUN} that
2987 need to be called by C code should be declared using @code{EXFUN}.
2988 Other function prototypes should be placed either into the appropriate
2989 section of @file{lisp.h}, or into a module-specific header file,
2990 depending on how general-purpose the function is and whether it has
2991 special-purpose argument types requiring definitions not in
2992 @file{lisp.h}.  All initialization functions are prototyped in
2993 @file{symsinit.h}.
2994
2995
2996
2997 @example
2998 alloc.c
2999 pure.c
3000 puresize.h
3001 @end example
3002
3003 The large module @file{alloc.c} implements all of the basic allocation and
3004 garbage collection for Lisp objects.  The most commonly used Lisp
3005 objects are allocated in chunks, similar to the Blocktype data type
3006 described above; others are allocated in individually @code{malloc()}ed
3007 blocks.  This module provides the foundation on which all other aspects
3008 of the Lisp environment sit, and is the first module initialized at
3009 startup.
3010
3011 Note that @file{alloc.c} provides a series of generic functions that are
3012 not dependent on any particular object type, and interfaces to
3013 particular types of objects using a standardized interface of
3014 type-specific methods.  This scheme is a fundamental principle of
3015 object-oriented programming and is heavily used throughout XEmacs.  The
3016 great advantage of this is that it allows for a clean separation of
3017 functionality into different modules -- new classes of Lisp objects, new
3018 event interfaces, new device types, new stream interfaces, etc. can be
3019 added transparently without affecting code anywhere else in XEmacs.
3020 Because the different subsystems are divided into general and specific
3021 code, adding a new subtype within a subsystem will in general not
3022 require changes to the generic subsystem code or affect any of the other
3023 subtypes in the subsystem; this provides a great deal of robustness to
3024 the XEmacs code.
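
The split between generic code and type-specific methods can be
sketched as follows, using the marking phase of a collector as the
generic operation.  All names here (@code{sketch_header},
@code{sketch_methods}, and so on) are hypothetical; this is not the
interface that @file{alloc.c} actually uses.

@example
#include <stddef.h>

struct sketch_header;

/* The methods that every object type supplies. */
struct sketch_methods
@{
  const char *name;
  /* Mark everything this object refers to; may be NULL. */
  void (*mark) (struct sketch_header *obj);
@};

/* Every object starts with this header, so its type can be found
   from the object itself. */
struct sketch_header
@{
  const struct sketch_methods *methods;
  int marked;
@};

/* Generic code: knows nothing about any particular type. */
static void
sketch_mark_object (struct sketch_header *obj)
@{
  if (obj == NULL || obj->marked)
    return;
  obj->marked = 1;
  if (obj->methods->mark != NULL)
    obj->methods->mark (obj);
@}

/* Type-specific code for a pair of references. */
struct sketch_pair
@{
  struct sketch_header header;  /* must be the first member */
  struct sketch_header *car;
  struct sketch_header *cdr;
@};

static void
sketch_mark_pair (struct sketch_header *obj)
@{
  struct sketch_pair *pair = (struct sketch_pair *) obj;
  sketch_mark_object (pair->car);
  sketch_mark_object (pair->cdr);
@}

static const struct sketch_methods sketch_pair_methods =
  @{ "pair", sketch_mark_pair @};

/* Type-specific code for an object that refers to nothing. */
struct sketch_leaf
@{
  struct sketch_header header;
  int value;
@};

static const struct sketch_methods sketch_leaf_methods = @{ "leaf", NULL @};
@end example

Adding a third object type means writing one more structure and one
more method table; @code{sketch_mark_object()} is left untouched.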
3025
3026 @cindex pure space
3027 @file{pure.c} contains the declaration of the @dfn{purespace} array.
3028 Pure space is a hack used to place some constant Lisp data into the code
3029 segment of the XEmacs executable, even though the data needs to be
3030 initialized through function calls.  (See above in section VIII for more
3031 info about this.)  During startup, certain sorts of data are
3032 automatically copied into pure space, and other data is copied manually
3033 in some of the basic Lisp files by calling the function @code{purecopy},
3034 which copies the object if possible (this only works in temacs, of
3035 course) and returns the new object.  In particular, while temacs is
3036 executing, the Lisp reader automatically copies all compiled-function
3037 objects that it reads into pure space.  Since compiled-function objects
3038 are large, are never modified, and typically comprise the majority of
3039 the contents of a compiled-Lisp file, this works well.  While XEmacs is
3040 running, any attempt to modify an object that resides in pure space
3041 causes an error.  Objects in pure space are never garbage collected --
3042 almost all of the time, they're intended to be permanent, and in any
3043 case you can't write into pure space to set the mark bits.
3044
3045 @file{puresize.h} contains the declaration of the size of the pure space
3046 array.  This depends on the optional features that are compiled in, any
3047 extra purespace requested by the user at compile time, and certain other
3048 factors (e.g. 64-bit machines need more pure space because their Lisp
3049 objects are larger).  The smallest size that suffices should be used, so
3050 that there's no wasted space.  If there's not enough pure space, you
3051 will get an error during the build process, specifying how much more
3052 pure space is needed.
3053
3054
3055
3056 @example
3057 eval.c
3058 backtrace.h
3059 @end example
3060
3061 This module contains all of the functions to handle the flow of control.
3062 This includes the mechanisms of defining functions, calling functions,
3063 traversing stack frames, and binding variables; the control primitives
3064 and other special forms such as @code{while}, @code{if}, @code{eval},
3065 @code{let}, @code{and}, @code{or}, @code{progn}, etc.; handling of
3066 non-local exits, unwind-protects, and exception handlers; entering the
3067 debugger; methods for the subr Lisp object type; etc.  It does
3068 @emph{not} include the @code{read} function, the @code{print} function,
3069 or the handling of symbols and obarrays.
3070
3071 @file{backtrace.h} contains some structures related to stack frames and the
3072 flow of control.
3073
3074
3075
3076 @example
3077 lread.c
3078 @end example
3079
3080 This module implements the Lisp reader and the @code{read} function,
3081 which converts text into Lisp objects, according to the read syntax of
3082 the objects, as described above.  This is similar to the parser that is
3083 a part of all compilers.
3084
3085
3086
3087 @example
3088 print.c
3089 @end example
3090
3091 This module implements the Lisp print mechanism and the @code{print}
3092 function and related functions.  This is the inverse of the Lisp reader
3093 -- it converts Lisp objects to a printed, textual representation.
3094 (Hopefully something that can be read back in using @code{read} to get
3095 an equivalent object.)
3096
3097
3098
3099 @example
3100 general.c
3101 symbols.c
3102 symeval.h
3103 @end example
3104
3105 @file{symbols.c} implements the handling of symbols, obarrays, and
3106 retrieving the values of symbols.  Much of the code is devoted to
3107 handling the special @dfn{symbol-value-magic} objects that define
3108 special types of variables -- this includes buffer-local variables,
3109 variable aliases, variables that forward into C variables, etc.  This
3110 module is initialized extremely early (right after @file{alloc.c}),
3111 because it is here that the basic symbols @code{t} and @code{nil} are
3112 created, and those symbols are used everywhere throughout XEmacs.
3113
3114 @file{symeval.h} contains the definitions of symbol structures and the
3115 @code{DEFVAR_LISP()} and related macros for declaring variables.
3116
3117
3118
3119 @example
3120 data.c
3121 floatfns.c
3122 fns.c
3123 @end example
3124
3125 These modules implement the methods and standard Lisp primitives for all
3126 the basic Lisp object types other than symbols (which are described
3127 above).  @file{data.c} contains all the predicates (primitives that return
3128 whether an object is of a particular type); the integer arithmetic
3129 functions; and the basic accessor and mutator primitives for the various
3130 object types.  @file{fns.c} contains all the standard predicates for working
3131 with sequences (where, abstractly speaking, a sequence is an ordered set
3132 of objects, and can be represented by a list, string, vector, or
3133 bit-vector); it also contains @code{equal}, perhaps on the grounds that
3134 the bulk of the operation of @code{equal} is comparing sequences.
3135 @file{floatfns.c} contains methods and primitives for floats and floating-point
3136 arithmetic.
3137
3138
3139
3140 @example
3141 bytecode.c
3142 bytecode.h
3143 @end example
3144
3145 @file{bytecode.c} implements the byte-code interpreter and
3146 compiled-function objects, and @file{bytecode.h} contains associated
3147 structures.  Note that the byte-code @emph{compiler} is written in Lisp.
3148
3149
3150
3151
3152 @node Modules for Standard Editing Operations
3153 @section Modules for Standard Editing Operations
3154
3155 @example
3156 buffer.c
3157 buffer.h
3158 bufslots.h
3159 @end example
3160
3161 @file{buffer.c} implements the @dfn{buffer} Lisp object type.  This
3162 includes functions that create and destroy buffers; retrieve buffers by
3163 name or by other properties; manipulate lists of buffers (remember that
3164 buffers are permanent objects and stored in various ordered lists);
3165 retrieve or change buffer properties; etc.  It also contains the
3166 definitions of all the built-in buffer-local variables (which can be
3167 viewed as buffer properties).  It does @emph{not} contain code to
3168 manipulate buffer-local variables (that's in @file{symbols.c}, described
3169 above); or code to manipulate the text in a buffer.
3170
3171 @file{buffer.h} defines the structures associated with a buffer and the various
3172 macros for retrieving text from a buffer and special buffer positions
3173 (e.g. @code{point}, the default location for text insertion).  It also
3174 contains macros for working with buffer positions and converting between
3175 their representations as character offsets and as byte offsets (under
3176 MULE, they are different, because characters can be multi-byte).  It is
3177 one of the largest header files.
3178
3179 @file{bufslots.h} defines the fields in the buffer structure that correspond to
3180 the built-in buffer-local variables.  It is its own header file because
3181 it is included many times in @file{buffer.c}, as a way of iterating over all
3182 the built-in buffer-local variables.
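
The technique of expanding one slot list several times, each time with
a different macro definition, can be sketched in a single file as
follows.  The slot names and all @code{SKETCH_} macros are hypothetical,
and the list is kept in a macro here rather than in a header that is
included repeatedly.

@example
#include <string.h>

/* Hypothetical slot list; each entry is passed to the macro SLOT. */
#define SKETCH_BUFFER_SLOTS(SLOT) SLOT (fill_column) SLOT (tab_width) SLOT (left_margin)

/* Expansion 1: one structure member per slot. */
#define SKETCH_SLOT_AS_FIELD(name) int name;
struct sketch_buffer @{ SKETCH_BUFFER_SLOTS (SKETCH_SLOT_AS_FIELD) @};

/* Expansion 2: a table of the slot names as strings. */
#define SKETCH_SLOT_AS_NAME(name) #name,
static const char *const sketch_slot_names[] =
  @{ SKETCH_BUFFER_SLOTS (SKETCH_SLOT_AS_NAME) @};

#define SKETCH_SLOT_COUNT (sizeof sketch_slot_names / sizeof sketch_slot_names[0])

/* Expansion 3: code that visits every slot of a buffer. */
#define SKETCH_SLOT_RESET(name) buf->name = 0;
static void
sketch_reset_slots (struct sketch_buffer *buf)
@{
  SKETCH_BUFFER_SLOTS (SKETCH_SLOT_RESET)
@}
@end example

A slot added to the list appears automatically in the structure, in
the name table and in the reset function.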
3183
3184
3185
3186 @example
3187 insdel.c
3188 insdel.h
3189 @end example
3190
3191 @file{insdel.c} contains low-level functions for inserting and deleting text in
3192 a buffer, keeping track of changed regions for use by redisplay, and
3193 calling any before-change and after-change functions that may have been
3194 registered for the buffer.  It also contains the actual functions that
3195 convert between byte offsets and character offsets.
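
The following sketch shows why the two kinds of offsets differ once
characters can occupy more than one byte.  It assumes UTF-8 text purely
for illustration, which is not the internal encoding used under MULE,
and it scans linearly from the start of the text, which a real
implementation would avoid.  All names are hypothetical.

@example
#include <stddef.h>

/* In UTF-8, every byte except a continuation byte (binary 10xxxxxx)
   begins a new character. */
static int
sketch_is_lead_byte (unsigned char byte)
@{
  return (byte & 0xC0) != 0x80;
@}

/* Return the byte offset at which character number CHAR_OFFSET
   begins, or TEXT_BYTES if the text has fewer characters. */
static size_t
sketch_char_to_byte (const unsigned char *text, size_t text_bytes,
                     size_t char_offset)
@{
  size_t byte = 0;
  while (char_offset > 0 && byte < text_bytes)
    @{
      /* Step over one character: its lead byte, then any
         continuation bytes that follow it. */
      byte++;
      while (byte < text_bytes && !sketch_is_lead_byte (text[byte]))
        byte++;
      char_offset--;
    @}
  return byte;
@}

/* Return the number of characters that begin before BYTE_OFFSET. */
static size_t
sketch_byte_to_char (const unsigned char *text, size_t byte_offset)
@{
  size_t chars = 0;
  size_t i;
  for (i = 0; i < byte_offset; i++)
    if (sketch_is_lead_byte (text[i]))
      chars++;
  return chars;
@}
@end example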
3196
3197 @file{insdel.h} contains associated headers.
3198
3199
3200
3201 @example
3202 marker.c
3203 @end example
3204
3205 This module implements the @dfn{marker} Lisp object type, which
3206 conceptually is a pointer to a text position in a buffer that moves
3207 around as text is inserted and deleted, so as to remain in the same
3208 relative position.  This module doesn't actually move the markers around
3209 -- that's handled in @file{insdel.c}.  This module just creates them and
3210 implements the primitives for working with them.  As markers are simple
3211 objects, this does not entail much.
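
The adjustment that keeps a marker in the same relative position can
be sketched as follows.  The names are hypothetical and positions are
plain integers; the sketch also assumes that a marker sitting exactly
at the insertion point stays in front of the inserted text.

@example
#include <stddef.h>

struct sketch_marker @{ long pos; @};

/* LEN characters were inserted at position AT: every marker after
   the insertion point moves forward by LEN. */
static void
sketch_adjust_for_insert (struct sketch_marker *markers, size_t count,
                          long at, long len)
@{
  size_t i;
  for (i = 0; i < count; i++)
    if (markers[i].pos > at)
      markers[i].pos += len;
@}

/* The text from FROM up to (but not including) TO was deleted:
   markers after the region move back, and markers inside the
   region collapse to its start. */
static void
sketch_adjust_for_delete (struct sketch_marker *markers, size_t count,
                          long from, long to)
@{
  size_t i;
  for (i = 0; i < count; i++)
    @{
      if (markers[i].pos >= to)
        markers[i].pos -= to - from;
      else if (markers[i].pos > from)
        markers[i].pos = from;
    @}
@}
@end example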
3212
3213 Note that the standard arithmetic primitives (e.g. @code{+}) accept
3214 markers in place of integers and automatically substitute the value of
3215 @code{marker-position} for the marker, i.e. an integer describing the
3216 current buffer position of the marker.
3217
3218
3219
3220 @example
3221 extents.c
3222 extents.h
3223 @end example
3224
3225 This module implements the @dfn{extent} Lisp object type, which is like
3226 a marker that works over a range of text rather than a single position.
3227 Extents are also much more complex and powerful than markers and have a
3228 more efficient (and more algorithmically complex) implementation.  The
3229 implementation is described in detail in comments in @file{extents.c}.
3230
3231 The code in @file{extents.c} works closely with @file{insdel.c} so that
3232 extents are properly moved around as text is inserted and deleted.
3233 There is also code in @file{extents.c} that provides information needed
3234 by the redisplay mechanism for efficient operation. (Remember that
3235 extents can have display properties that affect [sometimes drastically,
3236 as in the @code{invisible} property] the display of the text they
3237 cover.)
3238
3239
3240
3241 @example
3242 editfns.c
3243 @end example
3244
3245 @file{editfns.c} contains the standard Lisp primitives for working with
3246 a buffer's text, and calls the low-level functions in @file{insdel.c}.
3247 It also contains primitives for working with @code{point} (the default
3248 buffer insertion location).
3249
3250 @file{editfns.c} also contains functions for retrieving various
3251 characteristics from the external environment: the current time, the
3252 process ID of the running XEmacs process, the name of the user who ran
3253 this XEmacs process, etc.  It's not clear why this code is in
3254 @file{editfns.c}.
3255
3256
3257
3258 @example
3259 callint.c
3260 cmds.c
3261 commands.h
3262 @end example
3263
3264 @cindex interactive
3265 These modules implement the basic @dfn{interactive} commands,
3266 i.e. user-callable functions.  Commands, as opposed to other functions,
3267 have special ways of getting their parameters interactively (by querying
3268 the user), as opposed to having them passed in a normal function
3269 invocation.  Many commands are not really meant to be called from other
3270 Lisp functions, because they modify global state in a way that's often
3271 undesired as part of other Lisp functions.
3272
3273 @file{callint.c} implements the mechanism for querying the user for
3274 parameters and calling interactive commands.  The bulk of this module is
3275 code that parses the interactive spec that is supplied with an
3276 interactive command.
3277
3278 @file{cmds.c} implements the basic, most commonly used editing commands:
3279 commands to move around the current buffer and insert and delete
3280 characters.  These commands are implemented using the Lisp primitives
3281 defined in @file{editfns.c}.
3282
3283 @file{commands.h} contains associated structure definitions and prototypes.
3284
3285
3286
3287 @example
3288 regex.c
3289 regex.h
3290 search.c
3291 @end example
3292
3293 @file{search.c} implements the Lisp primitives for searching for text in
3294 a buffer, and some of the low-level algorithms for doing this.  In
3295 particular, the fast fixed-string Boyer-Moore search algorithm is
3296 implemented in @file{search.c}.  The low-level algorithms for doing
3297 regular-expression searching, however, are implemented in @file{regex.c}
3298 and @file{regex.h}.  These two modules are largely independent of
3299 XEmacs, and are similar to (and based upon) the regular-expression
3300 routines used in @file{grep} and other GNU utilities.
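
The idea behind fast fixed-string searching can be sketched with the
Horspool simplification of the Boyer-Moore algorithm, which keeps only
the bad-character shift table.  This is an illustration with
hypothetical names, not the code in @file{search.c}, which among other
things must also handle case folding and backward searches.

@example
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Return the offset of the first occurrence of PAT in TEXT,
   or -1 if there is none. */
static long
sketch_horspool_search (const unsigned char *text, size_t text_len,
                        const unsigned char *pat, size_t pat_len)
@{
  size_t shift[UCHAR_MAX + 1];
  size_t i, pos;

  if (pat_len == 0)
    return 0;
  if (pat_len > text_len)
    return -1;

  /* A byte that does not occur in the pattern lets the search skip
     a whole pattern length; a byte that does occur allows a skip
     that lines up its last occurrence (ignoring the final pattern
     byte) with the text. */
  for (i = 0; i <= UCHAR_MAX; i++)
    shift[i] = pat_len;
  for (i = 0; i + 1 < pat_len; i++)
    shift[pat[i]] = pat_len - 1 - i;

  pos = 0;
  while (pos <= text_len - pat_len)
    @{
      if (memcmp (text + pos, pat, pat_len) == 0)
        return (long) pos;
      /* Shift according to the text byte under the pattern's end. */
      pos += shift[text[pos + pat_len - 1]];
    @}
  return -1;
@}
@end example

The longer the pattern, the larger the typical shift, so most bytes of
the text are never examined at all.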
3301
3302
3303
3304 @example
3305 doprnt.c
3306 @end example
3307
3308 @file{doprnt.c} implements formatted-string processing, similar to
3309 the @code{printf()} function in C.
3310
3311
3312
3313 @example
3314 undo.c
3315 @end example
3316
3317 This module implements the undo mechanism for tracking buffer changes.
3318 Most of this could be implemented in Lisp.
3319
3320
3321
3322 @node Editor-Level Control Flow Modules
3323 @section Editor-Level Control Flow Modules
3324
3325 @example
3326 event-Xt.c
3327 event-stream.c
3328 event-tty.c
3329 events.c
3330 events.h
3331 @end example
3332
3333 These implement the handling of events (user input and other system
3334 notifications).
3335
3336 @file{events.c} and @file{events.h} define the @dfn{event} Lisp object
3337 type and primitives for manipulating it.
3338
3339 @file{event-stream.c} implements the basic functions for working with
3340 event queues, dispatching an event by looking it up in relevant keymaps
3341 and such, and handling timeouts; this includes the primitives
3342 @code{next-event} and @code{dispatch-event}, as well as related
3343 primitives such as @code{sit-for}, @code{sleep-for}, and
3344 @code{accept-process-output}. (@file{event-stream.c} is one of the
3345 hairiest and trickiest modules in XEmacs.  Beware!  You can easily mess
3346 things up here.)
3347
3348 @file{event-Xt.c} and @file{event-tty.c} implement the low-level
3349 interfaces for retrieving events from Xt (the X toolkit) and from TTYs
3350 (using @code{read()} and @code{select()}), respectively.  The event
3351 interface enforces a clean separation between the specific code for
3352 interfacing with the operating system and the generic code for working
3353 with events, by defining an API of basic, low-level event methods;
3354 @file{event-Xt.c} and @file{event-tty.c} are two different
3355 implementations of this API.  To add support for a new operating system
3356 (e.g. NeXTstep), one merely needs to provide another implementation of
3357 those API functions.
3358
3359 Note that the choice of whether to use @file{event-Xt.c} or
3360 @file{event-tty.c} is made at compile time!  Or at the very latest, it
3361 is made at startup time.  @file{event-Xt.c} handles events for
3362 @emph{both} X and TTY frames; @file{event-tty.c} is only used when X
3363 support is not compiled into XEmacs.  The reason for this is that there
3364 is only one event loop in XEmacs: thus, it needs to be able to receive
3365 events from all different kinds of frames.
3366
3367
3368
3369 @example
3370 keymap.c
3371 keymap.h
3372 @end example
3373
3374 @file{keymap.c} and @file{keymap.h} define the @dfn{keymap} Lisp object
3375 type and associated methods and primitives. (Remember that keymaps are
3376 objects that associate event descriptions with functions to be called to
3377 ``execute'' those events; @code{dispatch-event} looks up events in the
3378 relevant keymaps.)
3379
3380
3381
3382 @example
3383 keyboard.c
3384 @end example
3385
3386 @file{keyboard.c} contains functions that implement the actual editor
3387 command loop -- i.e. the event loop that cyclically retrieves and
3388 dispatches events.  This code is also rather tricky, just like
3389 @file{event-stream.c}.
3390
3391
3392
3393 @example
3394 macros.c
3395 macros.h
3396 @end example
3397
3398 These two modules contain the basic code for defining keyboard macros.
3399 These functions don't actually do much; most of the code that handles keyboard
3400 macros is mixed in with the event-handling code in @file{event-stream.c}.
3401
3402
3403
3404 @example
3405 minibuf.c
3406 @end example
3407
3408 This contains some miscellaneous code related to the minibuffer (most of
3409 the minibuffer code was moved into Lisp by Richard Mlynarik).  This
3410 includes the primitives for completion (although filename completion is
3411 in @file{dired.c}), the lowest-level interface to the minibuffer (if the
3412 command loop were cleaned up, this too could be in Lisp), and code for
3413 dealing with the echo area (this, too, was mostly moved into Lisp, and
3414 the only code remaining is code to call out to Lisp or provide simple
3415 bootstrapping implementations early in temacs, before the echo-area Lisp
3416 code is loaded).
3417
3418
3419
3420 @node Modules for the Basic Displayable Lisp Objects
3421 @section Modules for the Basic Displayable Lisp Objects
3422
3423 @example
3424 device-ns.h
3425 device-stream.c
3426 device-stream.h
3427 device-tty.c
3428 device-tty.h
3429 device-x.c
3430 device-x.h
3431 device.c
3432 device.h
3433 @end example
3434
3435 These modules implement the @dfn{device} Lisp object type.  This
3436 abstracts a particular screen or connection on which frames are
3437 displayed.  As with Lisp objects, event interfaces, and other
3438 subsystems, the device code is separated into a generic component that
3439 contains a standardized interface (in the form of a set of methods) onto
3440 particular device types.
3441
3442 The device subsystem defines all the methods and provides method
3443 services for not only device operations but also for the frame, window,
3444 menubar, scrollbar, toolbar, and other displayable-object subsystems.
3445 The reason for this is that all of these subsystems have the same
3446 subtypes (X, TTY, NeXTstep, Microsoft Windows, etc.) as devices do.
3447
3448
3449
3450 @example
3451 frame-ns.h
3452 frame-tty.c
3453 frame-x.c
3454 frame-x.h
3455 frame.c
3456 frame.h
3457 @end example
3458
3459 Each device contains one or more frames in which objects (e.g. text) are
3460 displayed.  A frame corresponds to a window in the window system;
3461 usually this is a top-level window but it could potentially be one of a
3462 number of overlapping child windows within a top-level window, using the
3463 MDI (Multiple Document Interface) protocol in Microsoft Windows or a
3464 similar scheme.
3465
3466 The @file{frame-*} files implement the @dfn{frame} Lisp object type and
3467 provide the generic and device-type-specific operations on frames
3468 (e.g. raising, lowering, resizing, moving, etc.).
3469
3470
3471
3472 @example
3473 window.c
3474 window.h
3475 @end example
3476
3477 @cindex window (in Emacs)
3478 @cindex pane
3479 Each frame consists of one or more non-overlapping @dfn{windows} (better
3480 known as @dfn{panes} in standard window-system terminology) in which a
3481 buffer's text can be displayed.  Windows can also have scrollbars
3482 displayed around their edges.
3483
3484 @file{window.c} and @file{window.h} implement the @dfn{window} Lisp
3485 object type and provide code to manage windows.  Since windows have no
3486 associated resources in the window system (the window system knows only
3487 about the frame; no child windows or anything are used for XEmacs
3488 windows), there is no device-type-specific code here; all of that code
3489 is part of the redisplay mechanism or the code for particular object
3490 types such as scrollbars.
3491
3492
3493
3494 @node Modules for other Display-Related Lisp Objects
3495 @section Modules for other Display-Related Lisp Objects
3496
3497 @example
3498 faces.c
3499 faces.h
3500 @end example
3501
3502
3503
3504 @example
3505 bitmaps.h
3506 glyphs-ns.h
3507 glyphs-x.c
3508 glyphs-x.h
3509 glyphs.c
3510 glyphs.h
3511 @end example
3512
3513
3514
3515 @example
3516 objects-ns.h
3517 objects-tty.c
3518 objects-tty.h
3519 objects-x.c
3520 objects-x.h
3521 objects.c
3522 objects.h
3523 @end example
3524
3525
3526
3527 @example
3528 menubar-x.c
3529 menubar.c
3530 @end example
3531
3532
3533
3534 @example
3535 scrollbar-x.c
3536 scrollbar-x.h
3537 scrollbar.c
3538 scrollbar.h
3539 @end example
3540
3541
3542
3543 @example
3544 toolbar-x.c
3545 toolbar.c
3546 toolbar.h
3547 @end example
3548
3549
3550
3551 @example
3552 font-lock.c
3553 @end example
3554
3555 This file provides C support for syntax highlighting -- i.e.
3556 highlighting different syntactic constructs of a source file in
3557 different colors, for easy reading.  The C support is provided so that
3558 this is fast.
3559
3560
3561
3562 @example
3563 dgif_lib.c
3564 gif_err.c
3565 gif_lib.h
3566 gifalloc.c
3567 @end example
3568
3569 These modules decode GIF-format image files, for use with glyphs.
3570
3571
3572
3573 @node Modules for the Redisplay Mechanism
3574 @section Modules for the Redisplay Mechanism
3575
3576 @example
3577 redisplay-output.c
3578 redisplay-tty.c
3579 redisplay-x.c
3580 redisplay.c
3581 redisplay.h
3582 @end example
3583
3584 These files provide the redisplay mechanism.  As with many other
3585 subsystems in XEmacs, there is a clean separation between the general
3586 and device-specific support.
3587
3588 @file{redisplay.c} contains the bulk of the redisplay engine.  These
3589 functions update the redisplay structures (which describe how the screen
3590 is to appear) to reflect any changes made to the state of any
3591 displayable objects (buffer, frame, window, etc.) since the last time
3592 that redisplay was called.  These functions are highly optimized to
3593 avoid doing more work than necessary (since redisplay is called
3594 extremely often and is potentially a huge time sink), and depend heavily
3595 on notifications from the objects themselves that changes have occurred,
3596 so that redisplay doesn't explicitly have to check each possible object.
3597 The redisplay mechanism also contains a great deal of caching to further
3598 speed things up; some of this caching is contained within the various
3599 displayable objects.
3600
3601 @file{redisplay-output.c} goes through the redisplay structures and converts
3602 them into calls to device-specific methods to actually output the screen
3603 changes.
3604
3605 @file{redisplay-x.c} and @file{redisplay-tty.c} are two implementations
3606 of these redisplay output methods, for X frames and TTY frames,
3607 respectively.
3608
3609
3610
3611 @example
3612 indent.c
3613 @end example
3614
3615 This module contains various functions and Lisp primitives for
3616 converting between buffer positions and screen positions.  These
3617 functions call the redisplay mechanism to do most of the work, and then
3618 examine the redisplay structures to get the necessary information.  This
3619 module needs work.
3620
3621
3622
3623 @example
3624 termcap.c
3625 terminfo.c
3626 tparam.c
3627 @end example
3628
3629 These files contain functions for working with the termcap (BSD-style)
3630 and terminfo (System V style) databases of terminal capabilities and
3631 escape sequences, used when XEmacs is displaying in a TTY.
3632
3633
3634
3635 @example
3636 cm.c
3637 cm.h
3638 @end example
3639
3640 These files provide some miscellaneous TTY-output functions and should
3641 probably be merged into @file{redisplay-tty.c}.
3642
3643
3644
3645 @node Modules for Interfacing with the File System
3646 @section Modules for Interfacing with the File System
3647
3648 @example
3649 lstream.c
3650 lstream.h
3651 @end example
3652
3653 These modules implement the @dfn{stream} Lisp object type.  This is an
3654 internal-only Lisp object that implements a generic buffering stream.
3655 The idea is to provide a uniform interface onto all sources and sinks of
3656 data, including file descriptors, stdio streams, chunks of memory, Lisp
3657 buffers, Lisp strings, etc.  That way, I/O functions can be written to
3658 the stream interface and can transparently handle all possible sources
3659 and sinks.  (For example, the @code{read} function can read data from a
3660 file, a string, a buffer, or even a function that is called repeatedly
3661 to return data, without worrying about where the data is coming from or
3662 what-size chunks it is returned in.)
3663
3664 @cindex lstream
3665 Note that in the C code, streams are called @dfn{lstreams} (for ``Lisp
3666 streams'') to distinguish them from other kinds of streams, e.g. stdio
3667 streams and C++ I/O streams.
3668
3669 Similar to other subsystems in XEmacs, lstreams are separated into
3670 generic functions and a set of methods for the different types of
3671 lstreams.  @file{lstream.c} provides implementations of many different
3672 types of streams; others are provided, e.g., in @file{mule-coding.c}.
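
The split between generic functions and per-type methods can be sketched
in C as follows.  This is only an illustration of the idea, not the
interface declared in @file{lstream.h}; every name in it
(@code{stream_methods}, @code{memory_source}, @code{stream_read}) is
hypothetical.

```c
#include <stddef.h>
#include <string.h>

struct stream;

/* Hypothetical method table: one per kind of stream. */
struct stream_methods {
  const char *name;
  /* Read up to SIZE bytes into BUF; return the count, 0 at end of data. */
  size_t (*reader) (struct stream *s, char *buf, size_t size);
};

struct stream {
  const struct stream_methods *methods;
  void *data;                   /* type-specific state */
};

/* Type-specific part: a stream over a fixed chunk of memory. */
struct memory_source {
  const char *bytes;
  size_t length;
  size_t pos;
};

static size_t
memory_reader (struct stream *s, char *buf, size_t size)
{
  struct memory_source *m = s->data;
  size_t avail = m->length - m->pos;
  size_t n = size < avail ? size : avail;

  memcpy (buf, m->bytes + m->pos, n);
  m->pos += n;
  return n;
}

static const struct stream_methods memory_stream_methods = {
  "memory", memory_reader
};

/* Generic part: callers use only this, whatever the stream's type.
   It keeps calling the reader method until SIZE bytes have arrived
   or the source is exhausted, so chunk sizes do not matter.  */
size_t
stream_read (struct stream *s, char *buf, size_t size)
{
  size_t total = 0;

  while (total < size)
    {
      size_t n = s->methods->reader (s, buf + total, size - total);
      if (n == 0)
        break;
      total += n;
    }
  return total;
}
```

A stream over a file descriptor or a Lisp string would supply its own
reader method and state, and @code{stream_read} would work unchanged.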
3673
3674
3675
3676 @example
3677 fileio.c
3678 @end example
3679
3680 This implements the basic primitives for interfacing with the file
3681 system.  This includes primitives for reading files into buffers,
3682 writing buffers into files, checking for the presence or accessibility
3683 of files, canonicalizing file names, etc.  Note that these primitives
3684 are usually not invoked directly by the user: There is a great deal of
3685 higher-level Lisp code that implements the user commands such as
3686 @code{find-file} and @code{save-buffer}.  This is similar to the
3687 distinction between the lower-level primitives in @file{editfns.c} and
3688 the higher-level user commands in @file{commands.c} and
3689 @file{simple.el}.
3690
3691
3692
3693 @example
3694 filelock.c
3695 @end example
3696
3697 This file provides functions for detecting clashes between different
3698 processes (e.g. XEmacs and some external process, or two different
3699 XEmacs processes) modifying the same file.  (XEmacs can optionally use
3700 the @file{lock/} subdirectory to provide a form of ``locking'' between
3701 different XEmacs processes.)  This module is also used by the low-level
3702 functions in @file{insdel.c} to ensure that, if the first modification
3703 is being made to a buffer whose corresponding file has been externally
3704 modified, the user is made aware of this so that the buffer can be
3705 synched up with the external changes if necessary.
3706
3707
3708 @example
3709 filemode.c
3710 @end example
3711
3712 This file provides some miscellaneous functions that construct a
3713 @samp{rwxr-xr-x}-type permissions string (as might appear in an
3714 @file{ls}-style directory listing) given the information returned by the
3715 @code{stat()} system call.
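
As a rough sketch of what such a function does (this is not the code in
@file{filemode.c}, and the name @code{permission_string} is made up for
the example), the nine permission characters can be derived from the
@code{st_mode} field as shown below.  The sketch ignores the file-type
character and the setuid, setgid, and sticky bits.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Fill BUF (at least 10 bytes) with the nine permission characters
   for MODE, followed by a terminating NUL.  */
void
permission_string (mode_t mode, char *buf)
{
  static const char letters[] = "rwxrwxrwx";
  static const mode_t bits[] = {
    S_IRUSR, S_IWUSR, S_IXUSR,
    S_IRGRP, S_IWGRP, S_IXGRP,
    S_IROTH, S_IWOTH, S_IXOTH
  };
  int i;

  /* Each position shows its letter if the bit is set, else a dash. */
  for (i = 0; i < 9; i++)
    buf[i] = (mode & bits[i]) ? letters[i] : '-';
  buf[9] = '\0';
}
```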
3716
3717
3718
3719 @example
3720 dired.c
3721 ndir.h
3722 @end example
3723
3724 These files implement the XEmacs interface to directory searching.  This
3725 includes a number of primitives for determining the files in a directory
3726 and for doing filename completion. (Remember that generic completion is
3727 handled by a different mechanism, in @file{minibuf.c}.)
3728
@file{ndir.h} is a header file used for the directory-searching
emulation functions provided in @file{sysdep.c}
(@pxref{Modules for Interfacing with the Operating System}), for systems
that don't provide any directory-searching functions. (On those systems,
directories can be read directly as files, and parsed.)
3733
3734
3735
3736 @example
3737 realpath.c
3738 @end example
3739
3740 This file provides an implementation of the @code{realpath()} function
3741 for expanding symbolic links, on systems that don't implement it or have
3742 a broken implementation.
3743
3744
3745
3746 @node Modules for Other Aspects of the Lisp Interpreter and Object System
3747 @section Modules for Other Aspects of the Lisp Interpreter and Object System
3748
3749 @example
3750 elhash.c
3751 elhash.h
3752 hash.c
3753 hash.h
3754 @end example
3755
These files provide two implementations of hash tables.  Files
@file{hash.c} and @file{hash.h} provide a generic C implementation of
hash tables which can stand independently of XEmacs.  Files
@file{elhash.c} and @file{elhash.h} provide a separate implementation of
hash tables that can store only Lisp objects and that knows about Lispy
things like garbage collection; they also implement the @dfn{hash-table}
Lisp object type.
3763
3764
3765 @example
3766 specifier.c
3767 specifier.h
3768 @end example
3769
This module implements the @dfn{specifier} Lisp object type.  This is
primarily used for displayable properties, and allows for values that
are specific to a particular buffer, window, frame, device, or device
class, as well as for a default value.  This is used, for example,
to control the height of the horizontal scrollbar or the appearance of
the @code{default}, @code{bold}, or other faces.  The specifier object
consists of a number of specifications, each of which maps from a
buffer, window, etc. to a value.  The function @code{specifier-instance}
looks up a value given a window (from which a buffer, frame, and device
can be derived).
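
The lookup can be pictured with the following sketch.  It is a
simplification under stated assumptions, not the code in
@file{specifier.c}: the search order (window, then buffer, frame,
device, and finally a global default) is assumed for the example, values
are plain integers, device classes are left out, and all the names are
hypothetical.

```c
#include <stddef.h>

enum locale_kind {
  LOC_WINDOW, LOC_BUFFER, LOC_FRAME, LOC_DEVICE, LOC_GLOBAL, LOC_KINDS
};

/* One specification: "in this locale, the value is VALUE". */
struct specification {
  enum locale_kind kind;
  const void *locale;           /* which window, buffer, ...; NULL if global */
  int value;
};

/* The window being asked about and the objects derived from it. */
struct domain {
  const void *window, *buffer, *frame, *device;
};

/* Return the value of the most specific matching specification,
   or FALLBACK if none matches.  The order is an assumption.  */
int
specifier_lookup (const struct specification *specs, size_t count,
                  const struct domain *d, int fallback)
{
  const void *wanted[LOC_KINDS];
  int kind;
  size_t i;

  wanted[LOC_WINDOW] = d->window;
  wanted[LOC_BUFFER] = d->buffer;
  wanted[LOC_FRAME] = d->frame;
  wanted[LOC_DEVICE] = d->device;
  wanted[LOC_GLOBAL] = NULL;

  for (kind = 0; kind < LOC_KINDS; kind++)
    for (i = 0; i < count; i++)
      if ((int) specs[i].kind == kind && specs[i].locale == wanted[kind])
        return specs[i].value;
  return fallback;
}
```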
3780
3781
3782 @example
3783 chartab.c
3784 chartab.h
3785 casetab.c
3786 @end example
3787
3788 @file{chartab.c} and @file{chartab.h} implement the @dfn{char table}
3789 Lisp object type, which maps from characters or certain sorts of
3790 character ranges to Lisp objects.  The implementation of this object
3791 type is optimized for the internal representation of characters.  Char
3792 tables come in different types, which affect the allowed object types to
3793 which a character can be mapped and also dictate certain other
3794 properties of the char table.
3795
3796 @cindex case table
3797 @file{casetab.c} implements one sort of char table, the @dfn{case
3798 table}, which maps characters to other characters of possibly different
3799 case.  These are used by XEmacs to implement case-changing primitives
3800 and to do case-insensitive searching.
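
To make the idea concrete, here is a minimal sketch of a case table
restricted to 8-bit characters and of a case-insensitive comparison
built on it.  The real case tables are char tables and handle the full
character range; the structure and function names below are invented
for the example.

```c
/* Hypothetical 8-bit case table: maps each character to its
   lower-case counterpart (or to itself).  */
struct case_table {
  unsigned char down[256];
};

void
case_table_init_ascii (struct case_table *t)
{
  int c;

  for (c = 0; c < 256; c++)
    t->down[c] = (unsigned char) c;
  for (c = 'A'; c <= 'Z'; c++)
    t->down[c] = (unsigned char) (c - 'A' + 'a');
}

/* Return 1 if A and B are equal once both are mapped through T. */
int
case_table_equal (const struct case_table *t, const char *a, const char *b)
{
  for (; *a && *b; a++, b++)
    if (t->down[(unsigned char) *a] != t->down[(unsigned char) *b])
      return 0;
  return *a == *b;              /* both strings must end together */
}
```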
3801
3802
3803
3804 @example
3805 syntax.c
3806 syntax.h
3807 @end example
3808
3809 @cindex scanner
3810 This module implements @dfn{syntax tables}, another sort of char table
3811 that maps characters into syntax classes that define the syntax of these
3812 characters (e.g. a parenthesis belongs to a class of @samp{open}
3813 characters that have corresponding @samp{close} characters and can be
3814 nested).  This module also implements the Lisp @dfn{scanner}, a set of
3815 primitives for scanning over text based on syntax tables.  This is used,
3816 for example, to find the matching parenthesis in a command such as
3817 @code{forward-sexp}, and by @file{font-lock.c} to locate quoted strings,
3818 comments, etc.
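
The following sketch shows, in a much reduced form, how a scanner can
use syntax classes to find a matching parenthesis.  It is not the
algorithm in @file{syntax.c}: it knows only three classes, works on
8-bit characters, does not check that the kinds of bracket agree, and
ignores strings and comments.  All names are hypothetical.

```c
#include <stddef.h>

enum syntax_class { SYN_OTHER, SYN_OPEN, SYN_CLOSE };

struct syntax_table {
  enum syntax_class class_of[256];
};

void
syntax_table_init (struct syntax_table *t)
{
  int c;

  for (c = 0; c < 256; c++)
    t->class_of[c] = SYN_OTHER;
  t->class_of['('] = SYN_OPEN;
  t->class_of[')'] = SYN_CLOSE;
  t->class_of['['] = SYN_OPEN;
  t->class_of[']'] = SYN_CLOSE;
}

/* TEXT[START] should be an open character.  Return the index of the
   close character that balances it, or -1 if there is none.  */
long
scan_matching_close (const struct syntax_table *t, const char *text,
                     size_t length, size_t start)
{
  long depth = 0;
  size_t i;

  for (i = start; i < length; i++)
    switch (t->class_of[(unsigned char) text[i]])
      {
      case SYN_OPEN:
        depth++;
        break;
      case SYN_CLOSE:
        if (--depth == 0)
          return (long) i;
        break;
      default:
        break;
      }
  return -1;
}
```

Because the scanner consults only the table, changing the table (say, to
treat @samp{<} and @samp{>} as open and close characters) changes what
the same code considers balanced.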
3819
3820
3821
3822 @example
3823 casefiddle.c
3824 @end example
3825
3826 This module implements various Lisp primitives for upcasing, downcasing
3827 and capitalizing strings or regions of buffers.
3828
3829
3830
3831 @example
3832 rangetab.c
3833 @end example
3834
3835 This module implements the @dfn{range table} Lisp object type, which
3836 provides for a mapping from ranges of integers to arbitrary Lisp
3837 objects.
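
A range table can be thought of along the following lines.  The sketch
assumes a sorted array of non-overlapping, inclusive ranges searched by
bisection; whether @file{rangetab.c} stores its ranges this way is not
stated here, and the names are hypothetical.

```c
#include <stddef.h>

/* One entry: every integer in FIRST..LAST (inclusive) maps to VALUE. */
struct range_entry {
  long first, last;
  const void *value;
};

/* ENTRIES must be sorted by FIRST and must not overlap.
   Return the value for KEY, or NULL if no range contains it.  */
const void *
range_table_get (const struct range_entry *entries, size_t count, long key)
{
  size_t lo = 0, hi = count;

  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;

      if (key < entries[mid].first)
        hi = mid;
      else if (key > entries[mid].last)
        lo = mid + 1;
      else
        return entries[mid].value;
    }
  return NULL;
}
```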
3838
3839
3840
3841 @example
3842 opaque.c
3843 opaque.h
3844 @end example
3845
3846 This module implements the @dfn{opaque} Lisp object type, an
3847 internal-only Lisp object that encapsulates an arbitrary block of memory
3848 so that it can be managed by the Lisp allocation system.  To create an
3849 opaque object, you call @code{make_opaque()}, passing a pointer to a
3850 block of memory.  An object is created that is big enough to hold the
3851 memory, which is copied into the object's storage.  The object will then
3852 stick around as long as you keep pointers to it, after which it will be
3853 automatically reclaimed.
3854
3855 @cindex mark method
3856 Opaque objects can also have an arbitrary @dfn{mark method} associated
3857 with them, in case the block of memory contains other Lisp objects that
3858 need to be marked for garbage-collection purposes. (If you need other
3859 object methods, such as a finalize method, you should just go ahead and
3860 create a new Lisp object type -- it's not hard.)
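
The important point about creating an opaque object is that the memory
is copied, so the caller's block need not outlive the object.  The
sketch below illustrates just that point with ordinary @code{malloc()};
it is not the real @code{make_opaque()}, whose exact signature is not
given here, and the result would be reclaimed by the garbage collector
rather than by @code{free()}.  The names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an opaque object: a size followed by a
   private copy of the caller's data.  */
struct opaque_sketch {
  size_t size;
  unsigned char data[];         /* C99 flexible array member */
};

/* Allocate an object big enough for SIZE bytes and copy PTR into it.
   Return NULL if memory is exhausted.  */
struct opaque_sketch *
make_opaque_sketch (const void *ptr, size_t size)
{
  struct opaque_sketch *o = malloc (sizeof *o + size);

  if (o == NULL)
    return NULL;
  o->size = size;
  memcpy (o->data, ptr, size);
  return o;
}
```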
3861
3862
3863
3864 @example
3865 abbrev.c
3866 @end example
3867
This module provides a few primitives for doing dynamic abbreviation
expansion.  In XEmacs, most of the code for this has been moved into
Lisp.  Some C code remains for speed and because the primitive
@code{self-insert-command} (which is executed for all self-inserting
characters) hooks into the abbrev mechanism. (@code{self-insert-command}
is itself in C only for speed.)
3874
3875
3876
3877 @example
3878 doc.c
3879 @end example
3880
This module provides primitives for retrieving the documentation
strings of functions and variables.  These documentation strings contain
certain special markers that get dynamically expanded (e.g. a
reverse-lookup is performed on some named functions to retrieve their
current key bindings).  Some documentation strings (in particular, for
the built-in primitives and pre-loaded Lisp functions) are stored
externally in a file @file{DOC} in the @file{lib-src/} directory and
need to be fetched from that file. (Part of the build stage involves
building this file, and another part involves constructing an index for
this file and embedding it into the executable, so that the functions in
@file{doc.c} do not have to search the entire @file{DOC} file to find
the appropriate documentation string.)
3893
3894
3895
3896 @example
3897 md5.c
3898 @end example
3899
This module provides a Lisp primitive that implements the MD5 secure
hashing scheme, which computes a large hash value from a string of data
such that the data cannot be derived from the hash value.  This is used
for various security applications on the Internet.
3904
3905
3906
3907
3908 @node Modules for Interfacing with the Operating System
3909 @section Modules for Interfacing with the Operating System
3910
3911 @example
3912 callproc.c
3913 process.c
3914 process.h
3915 @end example
3916
3917 These modules allow XEmacs to spawn and communicate with subprocesses
3918 and network connections.
3919
3920 @cindex synchronous subprocesses
3921 @cindex subprocesses, synchronous
3922   @file{callproc.c} implements (through the @code{call-process}
3923 primitive) what are called @dfn{synchronous subprocesses}.  This means
3924 that XEmacs runs a program, waits till it's done, and retrieves its
3925 output.  A typical example might be calling the @file{ls} program to get
3926 a directory listing.
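
At the C library level, ``run a program, wait till it's done, and
retrieve its output'' looks roughly like the sketch below.  It uses the
POSIX @code{popen()} and @code{pclose()} functions for brevity and is
not how @file{callproc.c} itself is written; the function name
@code{run_synchronously} is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/wait.h>

/* Run COMMAND through the shell, copy up to SIZE-1 bytes of its output
   into BUF (SIZE must be at least 1), and wait for it to finish.
   Return its exit status, or -1 on failure or abnormal exit.  */
int
run_synchronously (const char *command, char *buf, size_t size)
{
  FILE *p = popen (command, "r");
  size_t n;
  int status;

  if (p == NULL)
    return -1;
  n = fread (buf, 1, size - 1, p);
  buf[n] = '\0';
  /* Discard any further output so the child can run to completion. */
  while (fgetc (p) != EOF)
    ;
  status = pclose (p);          /* blocks until the child has exited */
  if (status == -1 || !WIFEXITED (status))
    return -1;
  return WEXITSTATUS (status);
}
```

The caller does nothing else while the command runs, which is exactly
what distinguishes a synchronous subprocess from an asynchronous one.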
3927
3928 @cindex asynchronous subprocesses
3929 @cindex subprocesses, asynchronous
3930   @file{process.c} and @file{process.h} implement @dfn{asynchronous
3931 subprocesses}.  This means that XEmacs starts a program and then
3932 continues normally, not waiting for the process to finish.  Data can be
3933 sent to the process or retrieved from it as it's running.  This is used
3934 for the @code{shell} command (which provides a front end onto a shell
3935 program such as @file{csh}), the mail and news readers implemented in
3936 XEmacs, etc.  The result of calling @code{start-process} to start a
3937 subprocess is a process object, a particular kind of object used to
3938 communicate with the subprocess.  You can send data to the process by
3939 passing the process object and the data to @code{send-process}, and you
3940 can specify what happens to data retrieved from the process by setting
3941 properties of the process object. (When the process sends data, XEmacs
3942 receives a process event, which says that there is data ready.  When
3943 @code{dispatch-event} is called on this event, it reads the data from
3944 the process and does something with it, as specified by the process
3945 object's properties.  Typically, this means inserting the data into a
3946 buffer or calling a function.) Another property of the process object is
3947 called the @dfn{sentinel}, which is a function that is called when the
3948 process terminates.
3949
3950 @cindex network connections
3951   Process objects are also used for network connections (connections to a
3952 process running on another machine).  Network connections are started
3953 with @code{open-network-stream} but otherwise work just like
3954 subprocesses.
3955
3956
3957
3958 @example
3959 sysdep.c
3960 sysdep.h
3961 @end example
3962
3963   These modules implement most of the low-level, messy operating-system
3964 interface code.  This includes various device control (ioctl) operations
3965 for file descriptors, TTY's, pseudo-terminals, etc. (usually this stuff
3966 is fairly system-dependent; thus the name of this module), and emulation
3967 of standard library functions and system calls on systems that don't
3968 provide them or have broken versions.
3969
3970
3971
3972 @example
3973 sysdir.h
3974 sysfile.h
3975 sysfloat.h
3976 sysproc.h
3977 syspwd.h
3978 syssignal.h
3979 systime.h
3980 systty.h
3981 syswait.h
3982 @end example
3983
These header files provide consistent interfaces onto system-dependent
header files and system calls.  The idea is that, instead of including a
standard header file like @file{<sys/param.h>} (which may or may not
exist on various systems) or having to worry about whether all systems
provide a particular preprocessor constant, or having to deal with the
four different paradigms for manipulating signals, you just include the
appropriate @file{sys*.h} header file, which includes all the right
system header files, defines any missing preprocessor constants,
provides a uniform interface onto system calls, etc.
3993
3994 @file{sysdir.h} provides a uniform interface onto directory-querying
3995 functions. (In some cases, this is in conjunction with emulation
3996 functions in @file{sysdep.c}.)
3997
3998 @file{sysfile.h} includes all the necessary header files for standard
3999 system calls (e.g. @code{read()}), ensures that all necessary
4000 @code{open()} and @code{stat()} preprocessor constants are defined, and
4001 possibly (usually) substitutes sugared versions of @code{read()},
4002 @code{write()}, etc. that automatically restart interrupted I/O
4003 operations.
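
A wrapper that restarts an interrupted @code{read()} can be as small as
the sketch below.  This shows the general technique only; the name
@code{read_restarting} is made up, and the actual substitution in
@file{sysfile.h} may be arranged differently.

```c
#include <errno.h>
#include <unistd.h>

/* Like read(), but try again if a signal interrupted the call
   before any data was transferred.  */
ssize_t
read_restarting (int fd, void *buf, size_t size)
{
  ssize_t n;

  do
    n = read (fd, buf, size);
  while (n < 0 && errno == EINTR);
  return n;
}
```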
4004
4005 @file{sysfloat.h} includes the necessary header files for floating-point
4006 operations.
4007
4008 @file{sysproc.h} includes the necessary header files for calling
4009 @code{select()}, @code{fork()}, @code{execve()}, socket operations, and
4010 the like, and ensures that the @code{FD_*()} macros for descriptor-set
4011 manipulations are available.
4012
4013 @file{syspwd.h} includes the necessary header files for obtaining
4014 information from @file{/etc/passwd} (the functions are emulated under
4015 VMS).
4016
4017 @file{syssignal.h} includes the necessary header files for
4018 signal-handling and provides a uniform interface onto the different
4019 signal-handling and signal-blocking paradigms.
4020
4021 @file{systime.h} includes the necessary header files and provides
4022 uniform interfaces for retrieving the time of day, setting file
4023 access/modification times, getting the amount of time used by the XEmacs
4024 process, etc.
4025
4026 @file{systty.h} buffers against the infinitude of different ways of
4027 controlling TTY's.
4028
4029 @file{syswait.h} provides a uniform way of retrieving the exit status
4030 from a @code{wait()}ed-on process (some systems use a union, others use
4031 an int).
4032
4033
4034
4035 @example
4036 hpplay.c
4037 libsst.c
4038 libsst.h
4039 libst.h
4040 linuxplay.c
4041 nas.c
4042 sgiplay.c
4043 sound.c
4044 sunplay.c
4045 @end example
4046
4047 These files implement the ability to play various sounds on some types
4048 of computers.  You have to configure your XEmacs with sound support in
4049 order to get this capability.
4050
@file{sound.c} provides the generic interface.  It implements various
Lisp primitives and variables that let you specify which sounds should
be played in certain conditions.  (The conditions are identified by
symbols, which are passed to @code{ding} to make a sound.  Various
standard functions call this function at certain times; if sound support
does not exist, a simple beep results.)
4057
4058 @cindex native sound
4059 @cindex sound, native
@file{sgiplay.c}, @file{sunplay.c}, @file{hpplay.c}, and
@file{linuxplay.c} interface to the machine's speaker for various
different kinds of machines.  This is called @dfn{native} sound.
4063
4064 @cindex sound, network
4065 @cindex network sound
4066 @cindex NAS
4067 @file{nas.c} interfaces to a computer somewhere else on the network
4068 using the NAS (Network Audio Server) protocol, playing sounds on that
4069 machine.  This allows you to run XEmacs on a remote machine, with its
4070 display set to your local machine, and have the sounds be made on your
4071 local machine, provided that you have a NAS server running on your local
4072 machine.
4073
4074 @file{libsst.c}, @file{libsst.h}, and @file{libst.h} provide some
4075 additional functions for playing sound on a Sun SPARC but are not
4076 currently in use.
4077
4078
4079
4080 @example
4081 tooltalk.c
4082 tooltalk.h
4083 @end example
4084
4085 These two modules implement an interface to the ToolTalk protocol, which
4086 is an interprocess communication protocol implemented on some versions
4087 of Unix.  ToolTalk is a high-level protocol that allows processes to
4088 register themselves as providers of particular services; other processes
4089 can then request a service without knowing or caring exactly who is
4090 providing the service.  It is similar in spirit to the DDE protocol
4091 provided under Microsoft Windows.  ToolTalk is a part of the new CDE
4092 (Common Desktop Environment) specification and is used to connect the
4093 parts of the SPARCWorks development environment.
4094
4095
4096
4097 @example
4098 getloadavg.c
4099 @end example
4100
4101 This module provides the ability to retrieve the system's current load
4102 average. (The way to do this is highly system-specific, unfortunately,
4103 and requires a lot of special-case code.)
4104
4105
4106
4107 @example
4108 sunpro.c
4109 @end example
4110
4111 This module provides a small amount of code used internally at Sun to
4112 keep statistics on the usage of XEmacs.
4113
4114
4115
4116 @example
4117 broken-sun.h
4118 strcmp.c
4119 strcpy.c
4120 sunOS-fix.c
4121 @end example
4122
4123 These files provide replacement functions and prototypes to fix numerous
4124 bugs in early releases of SunOS 4.1.
4125
4126
4127
4128 @example
4129 hftctl.c
4130 @end example
4131
4132 This module provides some terminal-control code necessary on versions of
4133 AIX prior to 4.1.
4134
4135
4136
4137 @example
4138 msdos.c
4139 msdos.h
4140 @end example
4141
4142 These modules are used for MS-DOS support, which does not work in
4143 XEmacs.
4144
4145
4146
4147 @node Modules for Interfacing with X Windows
4148 @section Modules for Interfacing with X Windows
4149
4150 @example
4151 Emacs.ad.h
4152 @end example
4153
4154 A file generated from @file{Emacs.ad}, which contains XEmacs-supplied
4155 fallback resources (so that XEmacs has pretty defaults).
4156
4157
4158
4159 @example
4160 EmacsFrame.c
4161 EmacsFrame.h
4162 EmacsFrameP.h
4163 @end example
4164
4165 These modules implement an Xt widget class that encapsulates a frame.
4166 This is for ease in integrating with Xt.  The EmacsFrame widget covers
4167 the entire X window except for the menubar; the scrollbars are
4168 positioned on top of the EmacsFrame widget.
4169
4170 @strong{Warning:} Abandon hope, all ye who enter here.  This code took
4171 an ungodly amount of time to get right, and is likely to fall apart
4172 mercilessly at the slightest change.  Such is life under Xt.
4173
4174
4175
4176 @example
4177 EmacsManager.c
4178 EmacsManager.h
4179 EmacsManagerP.h
4180 @end example
4181
4182 These modules implement a simple Xt manager (i.e. composite) widget
4183 class that simply lets its children set whatever geometry they want.
4184 It's amazing that Xt doesn't provide this standardly, but on second
4185 thought, it makes sense, considering how amazingly broken Xt is.
4186
4187
4188 @example
4189 EmacsShell-sub.c
4190 EmacsShell.c
4191 EmacsShell.h
4192 EmacsShellP.h
4193 @end example
4194
4195 These modules implement two Xt widget classes that are subclasses of
4196 the TopLevelShell and TransientShell classes.  This is necessary to deal
4197 with more brokenness that Xt has sadistically thrust onto the backs of
4198 developers.
4199
4200
4201
4202 @example
4203 xgccache.c
4204 xgccache.h
4205 @end example
4206
4207 These modules provide functions for maintenance and caching of GC's
4208 (graphics contexts) under the X Window System.  This code is junky and
4209 needs to be rewritten.
4210
4211
4212
4213 @example
4214 xselect.c
4215 @end example
4216
4217 @cindex selections
4218   This module provides an interface to the X Window System's concept of
4219 @dfn{selections}, the standard way for X applications to communicate
4220 with each other.
4221
4222
4223
4224 @example
4225 xintrinsic.h
4226 xintrinsicp.h
4227 xmmanagerp.h
4228 xmprimitivep.h
4229 @end example
4230
4231 These header files are similar in spirit to the @file{sys*.h} files and buffer
4232 against different implementations of Xt and Motif.
4233
4234 @itemize @bullet
4235 @item
4236 @file{xintrinsic.h} should be included in place of @file{<Intrinsic.h>}.
4237 @item
4238 @file{xintrinsicp.h} should be included in place of @file{<IntrinsicP.h>}.
4239 @item
4240 @file{xmmanagerp.h} should be included in place of @file{<XmManagerP.h>}.
4241 @item
4242 @file{xmprimitivep.h} should be included in place of @file{<XmPrimitiveP.h>}.
4243 @end itemize
4244
4245
4246
4247 @example
4248 xmu.c
4249 xmu.h
4250 @end example
4251
These files provide an emulation of the Xmu library for those systems
(e.g. HP-UX) that don't provide it as a standard part of X.
4254
4255
4256
4257 @example
4258 ExternalClient-Xlib.c
4259 ExternalClient.c
4260 ExternalClient.h
4261 ExternalClientP.h
4262 ExternalShell.c
4263 ExternalShell.h
4264 ExternalShellP.h
4265 extw-Xlib.c
4266 extw-Xlib.h
4267 extw-Xt.c
4268 extw-Xt.h
4269 @end example
4270
4271 @cindex external widget
4272   These files provide the @dfn{external widget} interface, which allows an
4273 XEmacs frame to appear as a widget in another application.  To do this,
4274 you have to configure with @samp{--external-widget}.
4275
4276 @file{ExternalShell*} provides the server (XEmacs) side of the
4277 connection.
4278
4279 @file{ExternalClient*} provides the client (other application) side of
4280 the connection.  These files are not compiled into XEmacs but are
4281 compiled into libraries that are then linked into your application.
4282
4283 @file{extw-*} is common code that is used for both the client and server.
4284
4285 Don't touch this code; something is liable to break if you do.
4286
4287
4288
4289 @node Modules for Internationalization
4290 @section Modules for Internationalization
4291
4292 @example
4293 mule-canna.c
4294 mule-ccl.c
4295 mule-charset.c
4296 mule-charset.h
4297 mule-coding.c
4298 mule-coding.h
4299 mule-mcpath.c
4300 mule-mcpath.h
4301 mule-wnnfns.c
4302 mule.c
4303 @end example
4304
4305 These files implement the MULE (Asian-language) support.  Note that MULE
4306 actually provides a general interface for all sorts of languages, not
4307 just Asian languages (although they are generally the most complicated
4308 to support).  This code is still in beta.
4309
4310 @file{mule-charset.*} and @file{mule-coding.*} provide the heart of the
4311 XEmacs MULE support.  @file{mule-charset.*} implements the @dfn{charset}
4312 Lisp object type, which encapsulates a character set (an ordered one- or
4313 two-dimensional set of characters, such as US ASCII or JISX0208 Japanese
4314 Kanji).
4315
4316 @file{mule-coding.*} implements the @dfn{coding-system} Lisp object
4317 type, which encapsulates a method of converting between different
4318 encodings.  An encoding is a representation of a stream of characters,
4319 possibly from multiple character sets, using a stream of bytes or words,
4320 and defines (e.g.) which escape sequences are used to specify particular
4321 character sets, how the indices for a character are converted into bytes
4322 (sometimes this involves setting the high bit; sometimes complicated
4323 rearranging of the values takes place, as in the Shift-JIS encoding),
4324 etc.
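
The two cases mentioned in parentheses can be illustrated with JIS X
0208.  In the sketch below, a character is given by its two 7-bit
position bytes; EUC-JP is obtained simply by setting the high bit of
each, while Shift-JIS rearranges the values.  The formulas are the
standard ones for these encodings, but the function names are
hypothetical and this is not code from @file{mule-coding.c}.

```c
/* J1 and J2 are the two JIS X 0208 position bytes of one character,
   each in the range 0x21..0x7E.  */

/* EUC-JP: set the high bit of both bytes. */
void
jis_to_euc (unsigned char j1, unsigned char j2, unsigned char out[2])
{
  out[0] = (unsigned char) (j1 | 0x80);
  out[1] = (unsigned char) (j2 | 0x80);
}

/* Shift-JIS: two JIS rows share one lead byte, and the trail byte
   records which of the two rows was meant.  */
void
jis_to_sjis (unsigned char j1, unsigned char j2, unsigned char out[2])
{
  out[0] = (unsigned char) (((j1 + 1) >> 1) + (j1 < 0x5F ? 0x70 : 0xB0));
  if (j1 & 1)
    out[1] = (unsigned char) (j2 + (j2 < 0x60 ? 0x1F : 0x20)); /* skips 0x7F */
  else
    out[1] = (unsigned char) (j2 + 0x7E);
}
```

For example, the Hiragana letter A at JIS position 0x24 0x22 becomes
0xA4 0xA2 in EUC-JP and 0x82 0xA0 in Shift-JIS.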
4325
4326 @file{mule-ccl.c} provides the CCL (Code Conversion Language)
4327 interpreter.  CCL is similar in spirit to Lisp byte code and is used to
4328 implement converters for custom encodings.
4329
4330 @file{mule-canna.c} and @file{mule-wnnfns.c} implement interfaces to
4331 external programs used to implement the Canna and WNN input methods,
4332 respectively.  This is currently in beta.
4333
@file{mule-mcpath.c} provides some functions to allow for pathnames
containing extended characters.  This code is fragmentary, obsolete, and
completely non-working.  Instead, @code{pathname-coding-system} is used
to specify conversions of names of files and directories.  The standard
C I/O functions like @code{open()} are wrapped so that conversion occurs
automatically.
4340
4341 @file{mule.c} provides a few miscellaneous things that should probably
4342 be elsewhere.
4343
4344
4345
4346 @example
4347 intl.c
4348 @end example
4349
4350 This provides some miscellaneous internationalization code for
4351 implementing message translation and interfacing to the Ximp input
4352 method.  None of this code is currently working.
4353
4354
4355
4356 @example
4357 iso-wide.h
4358 @end example
4359
4360 This contains leftover code from an earlier implementation of
4361 Asian-language support, and is not currently used.
4362
4363
4364
4365
4366 @node Allocation of Objects in XEmacs Lisp, Events and the Event Loop, A Summary of the Various XEmacs Modules, Top
4367 @chapter Allocation of Objects in XEmacs Lisp
4368
4369 @menu
4370 * Introduction to Allocation::
4371 * Garbage Collection::
4372 * GCPROing::
4373 * Integers and Characters::
4374 * Allocation from Frob Blocks::
4375 * lrecords::
4376 * Low-level allocation::
4377 * Pure Space::
4378 * Cons::
4379 * Vector::
4380 * Bit Vector::
4381 * Symbol::
4382 * Marker::
4383 * String::
4384 * Compiled Function::
4385 @end menu
4386
4387 @node Introduction to Allocation
4388 @section Introduction to Allocation
4389
4390   Emacs Lisp, like all Lisps, has garbage collection.  This means that
4391 the programmer never has to explicitly free (destroy) an object; it
4392 happens automatically when the object becomes inaccessible.  Most
4393 experts agree that garbage collection is a necessity in a modern,
4394 high-level language.  Its omission from C stems from the fact that C was
4395 originally designed to be a nice abstract layer on top of assembly
4396 language, for writing kernels and basic system utilities rather than
4397 large applications.
4398
4399   Lisp objects can be created by any of a number of Lisp primitives.
4400 Most object types have one or a small number of basic primitives
4401 for creating objects.  For conses, the basic primitive is @code{cons};
4402 for vectors, the primitives are @code{make-vector} and @code{vector}; for
4403 symbols, the primitives are @code{make-symbol} and @code{intern}; etc.
4404 Some Lisp objects, especially those that are primarily used internally,
4405 have no corresponding Lisp primitives.  Every Lisp object, though,
4406 has at least one C primitive for creating it.
4407
4408   Recall from section (VII) that a Lisp object, as stored in a 32-bit
4409 or 64-bit word, has a mark bit, a few tag bits, and a ``value'' that
4410 occupies the remainder of the bits.  We can separate the different
4411 Lisp object types into four broad categories:
4412
4413 @itemize @bullet
4414 @item
4415 (a) Those for whom the value directly represents the contents of the
4416 Lisp object.  Only two types are in this category: integers and
4417 characters.  No special allocation or garbage collection is necessary
4418 for such objects.  Lisp objects of these types do not need to be
4419 @code{GCPRO}ed.
4420 @end itemize
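
  For category (a), the whole object lives in the word itself.  The
sketch below packs an integer into a 32-bit word under an assumed
layout (mark bit on top, then three tag bits, then a 28-bit value); the
actual bit positions and tag numbers are not specified in this section,
so the constants and names here are purely illustrative.

```c
#include <stdint.h>

typedef uint32_t lisp_word;

/* Assumed layout: bit 31 = mark, bits 28-30 = tag, bits 0-27 = value. */
#define VALUE_BITS 28
#define VALUE_MASK ((UINT32_C (1) << VALUE_BITS) - 1)
#define TAG_MASK   UINT32_C (7)
#define MARK_BIT   (UINT32_C (1) << 31)

enum tag { TAG_INT = 0, TAG_CHAR = 1, TAG_RECORD = 2 };  /* hypothetical */

/* Pack a (28-bit) integer directly into the word: nothing is allocated. */
lisp_word
make_int_word (int32_t n)
{
  return ((lisp_word) TAG_INT << VALUE_BITS) | ((uint32_t) n & VALUE_MASK);
}

unsigned
word_tag (lisp_word w)
{
  return (unsigned) ((w >> VALUE_BITS) & TAG_MASK);
}

/* Recover the integer, sign-extending from 28 bits. */
int32_t
word_int (lisp_word w)
{
  uint32_t v = w & VALUE_MASK;

  if (v & (UINT32_C (1) << (VALUE_BITS - 1)))
    return (int32_t) v - (int32_t) (UINT32_C (1) << VALUE_BITS);
  return (int32_t) v;
}
```

Since there is no pointer to follow, there is nothing for the garbage
collector to mark or free, which is why such objects need no protection.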
4421
4422   In the remaining three categories, the value is a pointer to a
4423 structure.
4424
4425 @itemize @bullet
4426 @item
4427 @cindex frob block
(b) Those for whom the tag directly specifies the type.  Recall that
there are only three tag bits, giving eight possible tag values; since
integers, characters, and lrecords (see below) each need a tag of their
own, at most five types can be specified this way.  The most commonly
used types are stored in this format; this includes conses, strings,
vectors, and sometimes symbols.  With the exception of vectors, objects
in this category are allocated in @dfn{frob blocks}, i.e. large blocks
of memory that are subdivided into individual objects.  This saves a lot
on malloc overhead, since there are typically quite a lot of these
objects around, and the objects are small.  (A cons, for example,
occupies 8 bytes on 32-bit machines -- 4 bytes for each of the two
objects it contains.) Vectors are individually @code{malloc()}ed since
they are of variable size.  (It would be possible, and desirable, to
allocate vectors of certain small sizes out of frob blocks, but it isn't
currently done.) Strings are handled specially: Each string is allocated
in two parts, a fixed-size structure containing a length and a data
pointer, and the actual data of the string.  The former structure is
allocated in frob blocks as usual, and the latter data is stored in
@dfn{string chars blocks} and is relocated during garbage collection to
eliminate holes.
4446 @end itemize
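
  The idea behind frob blocks can be sketched as follows for conses.
This is a minimal illustration, not the allocator in @file{alloc.c}: the
block size, the use of the first field to chain free cells, and all of
the names are assumptions made for the example.

```c
#include <stdlib.h>

struct cons_cell {
  void *car, *cdr;
};

#define CELLS_PER_BLOCK 1024    /* arbitrary size chosen for the sketch */

/* One large block, obtained with a single malloc() call. */
struct cons_block {
  struct cons_block *prev;
  struct cons_cell cells[CELLS_PER_BLOCK];
};

static struct cons_block *current_block;
static int next_unused = CELLS_PER_BLOCK;  /* forces a block on first use */
static struct cons_cell *free_list;        /* freed cells, chained via car */

struct cons_cell *
alloc_cons (void)
{
  struct cons_cell *c;

  if (free_list != NULL)        /* reuse a freed cell if there is one */
    {
      c = free_list;
      free_list = c->car;
      return c;
    }
  if (next_unused == CELLS_PER_BLOCK)   /* current block is used up */
    {
      struct cons_block *b = malloc (sizeof *b);

      if (b == NULL)
        return NULL;
      b->prev = current_block;
      current_block = b;
      next_unused = 0;
    }
  return &current_block->cells[next_unused++];
}

/* What a sweep would do with a cell found to be unreachable. */
void
free_cons (struct cons_cell *c)
{
  c->car = free_list;
  free_list = c;
}
```

With this scheme @code{malloc()} is called once per 1024 conses rather
than once per cons, and a cell carries no per-object malloc overhead.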
4447
4448   In the remaining two categories, the type is stored in the object
4449 itself.  The tag for all such objects is the generic @dfn{lrecord}
4450 (Lisp_Record) tag.  The first four bytes (or eight, for 64-bit machines)
4451 of the object's structure are a pointer to a structure that describes
4452 the object's type, which includes method pointers and a pointer to a
4453 string naming the type.  Note that it's possible to save some space by
4454 using a one- or two-byte tag, rather than a four- or eight-byte pointer
4455 to store the type, but it's not clear it's worth making the change.
4456
4457 @itemize @bullet
4458 @item
4459 (c) Those lrecords that are allocated in frob blocks (see above).  This
4460 includes the objects that are most common and relatively small, and
4461 includes floats, compiled functions, symbols (when not in category (b)),
4462 extents, events, and markers.  With the cleanup of frob blocks done in
4463 19.12, it's not terribly hard to add more objects to this category, but
4464 it's a bit trickier than adding an object type to type (d) (esp. if the
4465 object needs a finalization method), and is not likely to save much
4466 space unless the object is small and there are many of them. (In fact,
4467 if there are very few of them, it might actually waste space.)
4468 @item
4469 (d) Those lrecords that are individually @code{malloc()}ed.  These are
4470 called @dfn{lcrecords}.  All other types are in this category.  Adding a
4471 new type to this category is comparatively easy, and all types added
4472 since 19.8 (when the current allocation scheme was devised, by Richard
4473 Mlynarik), with the exception of the character type, have been in this
4474 category.
4475 @end itemize
4476
4477   Note that bit vectors are a bit of a special case.  They are
4478 simple lrecords as in category (c), but are individually @code{malloc()}ed
4479 like vectors.  You can basically view them as exactly like vectors
4480 except that their type is stored in lrecord fashion rather than
4481 in directly-tagged fashion.
4482
4483   Note that FSF Emacs redesigned their object system in 19.29 to follow
4484 a similar scheme.  However, given RMS's expressed dislike for data
4485 abstraction, the FSF scheme is not nearly as clean or as easy to
4486 extend. (FSF calls items of type (c) @code{Lisp_Misc} and items of type
4487 (d) @code{Lisp_Vectorlike}, with separate tags for each, although
4488 @code{Lisp_Vectorlike} is also used for vectors.)
4489
4490 @node Garbage Collection
4491 @section Garbage Collection
4492 @cindex garbage collection
4493
4494 @cindex mark and sweep
4495   Garbage collection is simple in theory but tricky to implement.
4496 Emacs Lisp uses the oldest garbage collection method, called
4497 @dfn{mark and sweep}.  Garbage collection begins by starting with
4498 all accessible locations (i.e. all variables and other slots where
4499 Lisp objects might occur) and recursively traversing all objects
4500 accessible from those slots, marking each one that is found.
4501 We then go through all of memory and free each object that is
4502 not marked, and unmark each object that is marked.  Note
4503 that ``all of memory'' means all currently allocated objects.
4504 Traversing all these objects means traversing all frob blocks,
4505 all vectors (which are chained in one big list), and all
4506 lcrecords (which are likewise chained).
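
  The two phases described above can be sketched in a few lines of C.
This is a toy model rather than the XEmacs collector: @code{toy_obj},
@code{toy_new()}, @code{toy_mark()}, @code{toy_sweep()} and
@code{toy_collect()} are hypothetical names, each object holds at most
one reference (so marking is a loop rather than a recursion), and every
object sits on a single chain instead of in frob blocks and separate
vector and lcrecord lists.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy object: one outgoing reference plus an in-object mark. */
typedef struct toy_obj
{
  struct toy_obj *ref;       /* object reachable from this one, or NULL */
  struct toy_obj *all_next;  /* chain of every allocated object */
  int marked;
} toy_obj;

static toy_obj *toy_all_objects = NULL;

static toy_obj *
toy_new (toy_obj *ref)
{
  toy_obj *o = malloc (sizeof *o);
  if (o == NULL)
    abort ();
  o->ref = ref;
  o->marked = 0;
  o->all_next = toy_all_objects;   /* chain onto the list of all objects */
  toy_all_objects = o;
  return o;
}

/* Mark phase: follow references from a root, marking as we go. */
static void
toy_mark (toy_obj *o)
{
  while (o != NULL && !o->marked)
    {
      o->marked = 1;
      o = o->ref;
    }
}

/* Sweep phase: free unmarked objects, unmark the survivors.
   Returns the number of objects freed. */
static int
toy_sweep (void)
{
  int freed = 0;
  toy_obj **link = &toy_all_objects;
  while (*link != NULL)
    {
      toy_obj *o = *link;
      if (o->marked)
        {
          o->marked = 0;
          link = &o->all_next;
        }
      else
        {
          *link = o->all_next;
          free (o);
          freed++;
        }
    }
  return freed;
}

static int
toy_collect (toy_obj **roots, size_t nroots)
{
  size_t i;
  assert (roots != NULL || nroots == 0);
  for (i = 0; i < nroots; i++)
    toy_mark (roots[i]);
  return toy_sweep ();
}
```

  Calling @code{toy_collect()} with an array of root pointers frees
everything not reachable from those roots and leaves the survivors
unmarked, ready for the next cycle.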
4507
4508   Note that, when an object is marked, the mark has to occur
4509 inside of the object's structure, rather than in the 32-bit
4510 @code{Lisp_Object} holding the object's pointer; i.e. you can't just
4511 set the pointer's mark bit.  This is because there may be many
4512 pointers to the same object.  This means that the method of
4513 marking an object can differ depending on the type.  The
4514 different marking methods are approximately as follows:
4515
4516 @enumerate
4517 @item
4518 For conses, the mark bit of the car is set.
4519 @item
4520 For strings, the mark bit of the string's plist is set.
4521 @item
4522 For symbols when not lrecords, the mark bit of the
4523 symbol's plist is set.
4524 @item
4525 For vectors, the length is negated after adding 1.
4526 @item
4527 For lrecords, the pointer to the structure describing
4528 the type is changed (see below).
4529 @item
4530 Integers and characters do not need to be marked, since
4531 no allocation occurs for them.
4532 @end enumerate
4533
4534   The details of this are in the @code{mark_object()} function.
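
  As an illustration of the vector case, the sketch below marks a vector
by storing @code{-(size + 1)} in its size field.  Adding 1 before
negating presumably exists so that a zero-length vector can still be
told apart when marked, since negating zero alone would change nothing.
The type and function names are hypothetical and are not taken from
@file{alloc.c}.

```c
#include <assert.h>

/* Hypothetical stand-in for a Lisp vector; only the size field matters. */
typedef struct toy_vector
{
  long size;
} toy_vector;

static int
toy_vector_marked_p (const toy_vector *v)
{
  return v->size < 0;
}

/* Mark: add 1, then negate, so even size 0 becomes negative (-1). */
static void
toy_mark_vector (toy_vector *v)
{
  assert (!toy_vector_marked_p (v));
  v->size = -(v->size + 1);
}

/* Unmark: undo the transformation above. */
static void
toy_unmark_vector (toy_vector *v)
{
  assert (toy_vector_marked_p (v));
  v->size = -v->size - 1;
}

/* Length that is safe to use whether or not the vector is marked. */
static long
toy_vector_length (const toy_vector *v)
{
  return toy_vector_marked_p (v) ? -v->size - 1 : v->size;
}
```

  Code that reads the raw size field of a marked vector sees a negative
number, which is why such code has to go through an accessor like
@code{toy_vector_length()} during garbage collection.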
4535
4536   Note that any code that operates during garbage collection has
4537 to be especially careful because of the fact that some objects
4538 may be marked and as such may not look like they normally do.
4539 In particular:
4540
4541 @itemize @bullet
@item
4542 Some object pointers may have their mark bit set.  This will make
4543 @code{FOOBARP()} predicates fail.  Use @code{GC_FOOBARP()} to deal with
4544 this.
4545 @item
4546 Even if you clear the mark bit, @code{FOOBARP()} will still fail
4547 for lrecords because the implementation pointer has been
4548 changed (see below).  @code{GC_FOOBARP()} will correctly deal with
4549 this.
4550 @item
4551 Vectors have their size field munged, so anything that
4552 looks at this field will fail.
4553 @item
4554 Note that @code{XFOOBAR()} macros @emph{will} work correctly on object
4555 pointers with their mark bit set, because the logical shift operations
4556 that remove the tag also remove the mark bit.
4557 @end itemize
4558
4559   Finally, note that garbage collection can be invoked explicitly
4560 by calling @code{garbage-collect} but is also called automatically
4561 by @code{eval}, once a certain amount of memory has been allocated
4562 since the last garbage collection (according to @code{gc-cons-threshold}).
4563
4564 @node GCPROing
4565 @section @code{GCPRO}ing
4566
4567 @code{GCPRO}ing is one of the ugliest and trickiest parts of Emacs
4568 internals.  The basic idea is that whenever garbage collection
4569 occurs, all in-use objects must be reachable somehow or
4570 other from one of the roots of accessibility.  The roots
4571 of accessibility are:
4572
4573 @enumerate
4574 @item
4575 All objects that have been @code{staticpro()}d.  This is used for
4576 any global C variables that hold Lisp objects.  A call to
4577 @code{staticpro()} happens implicitly as a result of any symbols
4578 declared with @code{defsymbol()} and any variables declared with
4579 @code{DEFVAR_FOO()}.  You need to explicitly call @code{staticpro()}
4580 (in the @code{vars_of_foo()} method of a module) for other global
4581 C variables holding Lisp objects. (This typically includes
4582 internal lists and such things.)
4583
4584 Note that @code{obarray} is one of the @code{staticpro()}d things.
4585 Therefore, all functions and variables get marked through this.
4586 @item
4587 Any shadowed bindings that are sitting on the @code{specpdl} stack.
4588 @item
4589 Any objects sitting in currently active (Lisp) stack frames,
4590 catches, and condition cases.
4591 @item
4592 A couple of special-case places where active objects are
4593 located.
4594 @item
4595 Anything currently marked with @code{GCPRO}.
4596 @end enumerate
4597
4598   Marking with @code{GCPRO} is necessary because some C functions (quite
4599 a lot, in fact), allocate objects during their operation.  Quite
4600 frequently, there will be no other pointer to the object while the
4601 function is running, and if a garbage collection occurs and the object
4602 needs to be referenced again, bad things will happen.  The solution is
4603 to mark those objects with @code{GCPRO}.  Unfortunately this is easy to
4604 forget, and there is basically no way around this problem.  Here are
4605 some rules, though:
4606
4607 @enumerate
4608 @item
4609 For every @code{GCPRO@var{n}}, there have to be declarations of
4610 @code{struct gcpro gcpro1, gcpro2}, etc.
4611
4612 @item
4613 You @emph{must} @code{UNGCPRO} anything that's @code{GCPRO}ed, and you
4614 @emph{must not} @code{UNGCPRO} if you haven't @code{GCPRO}ed.  Getting
4615 either of these wrong will lead to crashes, often in completely random
4616 places unrelated to where the problem lies.
4617
4618 @item
4619 The way this actually works is that all currently active @code{GCPRO}s
4620 are chained through the @code{struct gcpro} local variables, with the
4621 variable @code{gcprolist} pointing to the head of the list and the nth
4622 local @code{gcpro} variable pointing to the first @code{gcpro} variable
4623 in the next enclosing stack frame.  Each @code{GCPRO}ed thing is an
4624 lvalue, and the @code{struct gcpro} local variable contains a pointer to
4625 this lvalue.  This is why things will mess up badly if you don't pair up
4626 the @code{GCPRO}s and @code{UNGCPRO}s -- you will end up with
4627 @code{gcprolist}s containing pointers to @code{struct gcpro}s or local
4628 @code{Lisp_Object} variables in no-longer-active stack frames.
4629
4630 @item
4631 It is actually possible for a single @code{struct gcpro} to
4632 protect a contiguous array of any number of values, rather than
4633 just a single lvalue.  To effect this, call @code{GCPRO@var{n}} as usual on
4634 the first object in the array and then set @code{gcpro@var{n}.nvars}.
4635
4636 @item
4637 @strong{Strings are relocated.}  What this means in practice is that the
4638 pointer obtained using @code{XSTRING_DATA()} is liable to change at any
4639 time, and you should never keep it around past any function call, or
4640 pass it as an argument to any function that might cause a garbage
4641 collection.  This is why a number of functions accept either a
4642 ``non-relocatable'' @code{char *} pointer or a relocatable Lisp string,
4643 and only access the Lisp string's data at the very last minute.  In some
4644 cases, you may end up having to @code{alloca()} some space and copy the
4645 string's data into it.
4646
4647 @item
4648 By convention, if you have to nest @code{GCPRO}s, use @code{NGCPRO@var{n}}
4649 (along with @code{struct gcpro ngcpro1, ngcpro2}, etc.), @code{NNGCPRO@var{n}},
4650 etc.  This avoids compiler warnings about shadowed locals.
4651
4652 @item
4653 It is @emph{always} better to err on the side of extra @code{GCPRO}s
4654 rather than too few.  The extra cycles spent on this are
4655 almost never going to make a whit of difference in the
4656 speed of anything.
4657
4658 @item
4659 The general rule to follow is that caller, not callee, @code{GCPRO}s.
4660 That is, you should not have to explicitly @code{GCPRO} any Lisp objects
4661 that are passed in as parameters.
4662
4663 One exception from this rule is if you ever plan to change the parameter
4664 value, and store a new object in it.  In that case, you @emph{must}
4665 @code{GCPRO} the parameter, because otherwise the new object will not be
4666 protected.
4667
4668 So, if you create any Lisp objects (remember, this happens in all sorts
4669 of circumstances, e.g. with @code{Fcons()}, etc.), you are responsible
4670 for @code{GCPRO}ing them, unless you are @emph{absolutely sure} that
4671 there's no possibility that a garbage-collection can occur while you
4672 need to use the object.  Even then, consider @code{GCPRO}ing.
4673
4674 @item
4675 A garbage collection can occur whenever anything calls @code{Feval}, or
4676 whenever a QUIT can occur where execution can continue past
4677 this. (Remember, this is almost anywhere.)
4678
4679 @item
4680 If you have the @emph{least smidgeon of doubt} about whether
4681 you need to @code{GCPRO}, you should @code{GCPRO}.
4682
4683 @item
4684 Beware of @code{GCPRO}ing something that is uninitialized.  If you have
4685 any shade of doubt about this, initialize all your variables to @code{Qnil}.
4686
4687 @item
4688 Be careful of traps, like calling @code{Fcons()} in the argument to
4689 another function.  By the ``caller protects'' law, you should be
4690 @code{GCPRO}ing the newly-created cons, but you aren't.  A certain
4691 number of functions that are commonly called on freshly created stuff
4692 (e.g. @code{nconc2()}, @code{Fsignal()}), break the ``caller protects''
4693 law and go ahead and @code{GCPRO} their arguments so as to simplify
4694 things, but make sure to check that it's OK whenever doing something like
4695 this.
4696
4697 @item
4698 Once again, remember to @code{GCPRO}!  Bugs resulting from insufficient
4699 @code{GCPRO}ing are intermittent and extremely difficult to track down,
4700 often showing up in crashes inside of @code{garbage-collect} or in
4701 weirdly corrupted objects or even in incorrect values in a totally
4702 different section of code.
4703 @end enumerate
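
  The chaining described in the rules above can be modelled with a small
self-contained sketch.  It is an assumption-laden toy, not the XEmacs
macros: @code{toy_lisp_object} stands in for @code{Lisp_Object}, the
@code{TOY_} macros imitate the general shape of @code{GCPRO@var{n}} and
@code{UNGCPRO}, and the exact order in which the local records are
linked is a guess at the convention.

```c
#include <assert.h>
#include <stddef.h>

typedef long toy_lisp_object;   /* hypothetical stand-in for Lisp_Object */

struct toy_gcpro
{
  struct toy_gcpro *next;   /* next record on the chain */
  toy_lisp_object *var;     /* address of the protected lvalue */
  int nvars;                /* number of contiguous values protected */
};

static struct toy_gcpro *toy_gcprolist = NULL;

/* Each macro expects locals named toy_gcpro1, toy_gcpro2, ... in scope. */
#define TOY_GCPRO1(v)                                              \
  do {                                                             \
    toy_gcpro1.next = toy_gcprolist;                               \
    toy_gcpro1.var = &(v);                                         \
    toy_gcpro1.nvars = 1;                                          \
    toy_gcprolist = &toy_gcpro1;                                   \
  } while (0)

#define TOY_GCPRO2(v1, v2)                                         \
  do {                                                             \
    toy_gcpro1.next = toy_gcprolist;                               \
    toy_gcpro1.var = &(v1);                                        \
    toy_gcpro1.nvars = 1;                                          \
    toy_gcpro2.next = &toy_gcpro1;                                 \
    toy_gcpro2.var = &(v2);                                        \
    toy_gcpro2.nvars = 1;                                          \
    toy_gcprolist = &toy_gcpro2;                                   \
  } while (0)

/* Pop every record pushed by this frame in one step. */
#define TOY_UNGCPRO() (toy_gcprolist = toy_gcpro1.next)

/* What a collector would do: walk the chain and visit each value.
   Here we only count the protected values. */
static int
toy_count_protected (void)
{
  int n = 0;
  struct toy_gcpro *p;
  for (p = toy_gcprolist; p != NULL; p = p->next)
    n += p->nvars;
  return n;
}

static int
toy_inner (toy_lisp_object arg)
{
  toy_lisp_object tmp = arg + 1;   /* pretend this is freshly consed */
  struct toy_gcpro toy_gcpro1;
  int seen;

  TOY_GCPRO1 (tmp);
  seen = toy_count_protected ();   /* a collection here would see tmp */
  TOY_UNGCPRO ();
  return seen;
}

static int
toy_outer (void)
{
  toy_lisp_object a = 1, b = 2;
  struct toy_gcpro toy_gcpro1, toy_gcpro2;
  int seen;

  TOY_GCPRO2 (a, b);
  seen = toy_inner (a);
  TOY_UNGCPRO ();
  return seen;
}
```

  Because the records live in stack frames, forgetting
@code{TOY_UNGCPRO()} in @code{toy_inner()} would leave
@code{toy_gcprolist} pointing into a frame that no longer exists, which
is the failure mode the rules warn about.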
4704
4705 @cindex garbage collection, conservative
4706 @cindex conservative garbage collection
4707   Given the extremely error-prone nature of the @code{GCPRO} scheme, and
4708 the difficulty of tracking down the resulting bugs, it should be considered a
4709 deficiency in the XEmacs code.  A solution to this problem would involve
4710 implementing so-called @dfn{conservative} garbage collection for the C
4711 stack.  That involves looking through all of stack memory and treating
4712 anything that looks like a reference to an object as a reference.  This
4713 will result in a few objects not getting collected when they should, but
4714 it obviates the need for @code{GCPRO}ing, and allows garbage collection
4715 to happen at any point at all, such as during object allocation.
4716
4717 @node Integers and Characters
4718 @section Integers and Characters
4719
4720   Integer and character Lisp objects are created from integers using the
4721 macros @code{XSETINT()} and @code{XSETCHAR()} or the equivalent
4722 functions @code{make_int()} and @code{make_char()}. (These are actually
4723 macros on most systems.)  These functions basically just do some moving
4724 of bits around, since the integral value of the object is stored
4725 directly in the @code{Lisp_Object}.
4726
4727   @code{XSETINT()} and the like will truncate values given to them that
4728 are too big; i.e. you won't get the value you expected but the tag bits
4729 will at least be correct.
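
  The truncation behaviour can be seen in a small model of a tagged
32-bit word.  The layout used here (mark bit on top, three tag bits,
28 value bits) and the tag value are assumptions made for the sake of
the example; the @code{toy_} functions are hypothetical and are not the
real @code{XSETINT()} or @code{make_int()}.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: bit 31 mark, bits 30..28 tag, bits 27..0 value. */
enum { TOY_VALBITS = 28, TOY_TAG_INT = 1 };
#define TOY_VALMASK ((UINT32_C (1) << TOY_VALBITS) - 1)

static unsigned
toy_tag (uint32_t obj)
{
  return (obj >> TOY_VALBITS) & 7u;
}

/* Store VALUE in the low bits; anything that does not fit is cut off. */
static uint32_t
toy_make_int (int32_t value)
{
  return ((uint32_t) TOY_TAG_INT << TOY_VALBITS)
         | ((uint32_t) value & TOY_VALMASK);
}

/* Recover the value, sign-extending from bit 27. */
static int32_t
toy_int_value (uint32_t obj)
{
  uint32_t raw = obj & TOY_VALMASK;
  assert (toy_tag (obj) == (unsigned) TOY_TAG_INT);
  if (raw & (UINT32_C (1) << (TOY_VALBITS - 1)))
    return (int32_t) raw - (int32_t) (UINT32_C (1) << TOY_VALBITS);
  return (int32_t) raw;
}
```

  In this model a value that needs more than 28 bits wraps around when
it is read back, but the tag bits of the resulting word are still those
of an integer.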
4730
4731 @node Allocation from Frob Blocks
4732 @section Allocation from Frob Blocks
4733
4734 The uninitialized memory required by a @code{Lisp_Object} of a particular type
4735 is allocated using
4736 @code{ALLOCATE_FIXED_TYPE()}.  This only occurs inside of the
4737 lowest-level object-creating functions in @file{alloc.c}:
4738 @code{Fcons()}, @code{make_float()}, @code{Fmake_byte_code()},
4739 @code{Fmake_symbol()}, @code{allocate_extent()},
4740 @code{allocate_event()}, @code{Fmake_marker()}, and
4741 @code{make_uninit_string()}.  The idea is that, for each type, there are
4742 a number of frob blocks (each 2K in size); each frob block is divided up
4743 into object-sized chunks.  Each frob block will have some of these
4744 chunks that are currently assigned to objects, and perhaps some that are
4745 free. (If a frob block has nothing but free chunks, it is freed at the
4746 end of the garbage collection cycle.)  The free chunks are stored in a
4747 free list, which is chained by storing a pointer in the first four bytes
4748 of the chunk. (Except for the free chunks at the end of the last frob
4749 block, which are handled using an index which points past the end of the
4750 last-allocated chunk in the last frob block.)
4751 @code{ALLOCATE_FIXED_TYPE()} first tries to retrieve a chunk from the
4752 free list; if that fails, it calls
4753 @code{ALLOCATE_FIXED_TYPE_FROM_BLOCK()}, which looks at the end of the
4754 last frob block for space, and creates a new frob block if there is
4755 none. (There are actually two versions of these macros, one of which is
4756 more defensive but less efficient and is used for error-checking.)
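
  A minimal sketch of this scheme for a single object type follows.  It
is not the @code{ALLOCATE_FIXED_TYPE()} macro itself: all names are
hypothetical, the block header is folded into the 2K budget as an
assumption, and empty blocks are never returned to the system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_FROB_BLOCK_BYTES 2048

typedef struct toy_cons
{
  void *car;
  void *cdr;
} toy_cons;

/* A chunk is either a live object or a link in the free list. */
typedef union toy_chunk
{
  toy_cons object;
  union toy_chunk *next_free;
} toy_chunk;

#define TOY_CHUNKS_PER_BLOCK \
  ((TOY_FROB_BLOCK_BYTES - sizeof (void *)) / sizeof (toy_chunk))

typedef struct toy_frob_block
{
  struct toy_frob_block *prev;
  toy_chunk chunks[TOY_CHUNKS_PER_BLOCK];
} toy_frob_block;

static toy_frob_block *toy_current_block = NULL;
/* Index one past the last-allocated chunk in the last block;
   "full" initially so that the first allocation creates a block. */
static size_t toy_next_index = TOY_CHUNKS_PER_BLOCK;
static toy_chunk *toy_free_list = NULL;

static toy_cons *
toy_allocate_cons (void)
{
  /* First choice: reuse a chunk from the free list. */
  if (toy_free_list != NULL)
    {
      toy_chunk *chunk = toy_free_list;
      toy_free_list = chunk->next_free;
      return &chunk->object;
    }
  /* Otherwise take the next chunk at the end of the last block,
     creating a new block when that one is exhausted. */
  if (toy_next_index == TOY_CHUNKS_PER_BLOCK)
    {
      toy_frob_block *block = malloc (sizeof *block);
      if (block == NULL)
        return NULL;
      block->prev = toy_current_block;
      toy_current_block = block;
      toy_next_index = 0;
    }
  return &toy_current_block->chunks[toy_next_index++].object;
}

/* Put a chunk back on the free list by storing the link inside it. */
static void
toy_free_cons (toy_cons *cons)
{
  toy_chunk *chunk = (toy_chunk *) cons;
  assert (cons != NULL);
  chunk->next_free = toy_free_list;
  toy_free_list = chunk;
}
```

  Storing the free-list link inside the chunk itself is what keeps the
per-object overhead at zero; the price is that a freed chunk's contents
are overwritten immediately.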
4757
4758 @node lrecords
4759 @section lrecords
4760
4761   [see @file{lrecord.h}]
4762
4763   All lrecords have at the beginning of their structure a @code{struct
4764 lrecord_header}.  This just contains a pointer to a @code{struct
4765 lrecord_implementation}, which is a structure containing method pointers
4766 and such.  There is one of these for each type, and it is a global,
4767 constant, statically-declared structure that is declared in the
4768 @code{DEFINE_LRECORD_IMPLEMENTATION()} macro. (This macro actually
4769 declares an array of two @code{struct lrecord_implementation}
4770 structures.  The first one contains all the standard method pointers,
4771 and is used in all normal circumstances.  During garbage collection,
4772 however, the lrecord is @dfn{marked} by bumping its implementation
4773 pointer by one, so that it points to the second structure in the array.
4774 This structure contains a special indication in it that it's a
4775 @dfn{marked-object} structure: the finalize method is the special
4776 function @code{this_marks_a_marked_record()}, and all other methods are
4777 null pointers.  At the end of garbage collection, all lrecords will
4778 either be reclaimed or unmarked by decrementing their implementation
4779 pointers, so this second structure pointer will never remain past
4780 garbage collection.)
4781
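  The two-element implementation array and the pointer-bumping mark can
be illustrated with the following sketch.  The structures are cut down
to two fields and every identifier is hypothetical; only the idea of
stepping the pointer from element 0 to element 1 and recognizing the
second element by its finalizer comes from the description above.

```c
#include <assert.h>
#include <stddef.h>

struct toy_lrecord_implementation
{
  const char *name;
  void (*finalizer) (void *obj);
};

struct toy_lrecord_header
{
  const struct toy_lrecord_implementation *implementation;
};

/* Sentinel whose only purpose is to identify the "marked" structure. */
static void
toy_marks_a_marked_record (void *obj)
{
  (void) obj;
}

/* Element 0 is the normal method table, element 1 the marked twin. */
static const struct toy_lrecord_implementation toy_lrecord_float[2] = {
  { "float", NULL },
  { NULL, toy_marks_a_marked_record }
};

static int
toy_lrecord_marked_p (const struct toy_lrecord_header *h)
{
  return h->implementation->finalizer == toy_marks_a_marked_record;
}

static void
toy_mark_lrecord (struct toy_lrecord_header *h)
{
  assert (!toy_lrecord_marked_p (h));
  h->implementation++;          /* now points at element 1 */
}

static void
toy_unmark_lrecord (struct toy_lrecord_header *h)
{
  assert (toy_lrecord_marked_p (h));
  h->implementation--;          /* back to the normal table */
}
```

  While an object is marked in this model, its @code{name} and every
other method are unavailable, which mirrors why ordinary type
predicates cannot be trusted on lrecords during garbage collection.
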
4782   Simple lrecords (of type (c) above) just have a @code{struct
4783 lrecord_header} at their beginning.  lcrecords, however, actually have a
4784 @code{struct lcrecord_header}.  This, in turn, has a @code{struct
4785 lrecord_header} at its beginning, so sanity is preserved; but it also
4786 has a pointer used to chain all lcrecords together, and a special ID
4787 field used to distinguish one lcrecord from another. (This field is used
4788 only for debugging and could be removed, but the space gain is not
4789 significant.)
4790
4791   Simple lrecords are created using @code{ALLOCATE_FIXED_TYPE()}, just
4792 like for other frob blocks.  The only change is that the implementation
4793 pointer must be initialized correctly. (The implementation structure for
4794 an lrecord, or rather the pointer to it, is named @code{lrecord_float},
4795 @code{lrecord_extent}, @code{lrecord_buffer}, etc.)
4796
4797   lcrecords are created using @code{alloc_lcrecord()}.  This takes a
4798 size to allocate and an implementation pointer. (The size needs to be
4799 passed because some lcrecords, such as window configurations, are of
4800 variable size.) This basically just @code{malloc()}s the storage,
4801 initializes the @code{struct lcrecord_header}, and chains the lcrecord
4802 onto the head of the list of all lcrecords, which is stored in the
4803 variable @code{all_lcrecords}.  The calls to @code{alloc_lcrecord()}
4804 generally occur in the lowest-level allocation function for each lrecord
4805 type.
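
  The steps performed by @code{alloc_lcrecord()} can be sketched as
follows.  This is a simplified model under stated assumptions: the
structures and the @code{toy_} names are hypothetical, plain
@code{malloc()} is used where the real code goes through its own
allocation layer, and an allocation failure simply returns
@code{NULL}.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct toy_implementation
{
  const char *name;
};

struct toy_lcrecord_header
{
  /* Plays the role of the embedded struct lrecord_header. */
  const struct toy_implementation *implementation;
  /* Chains all lcrecords together. */
  struct toy_lcrecord_header *next;
  /* Distinguishes one lcrecord from another (debugging aid). */
  unsigned int uid;
};

static struct toy_lcrecord_header *toy_all_lcrecords = NULL;
static unsigned int toy_next_uid = 0;

/* SIZE is passed explicitly because some record types vary in size. */
static void *
toy_alloc_lcrecord (size_t size, const struct toy_implementation *impl)
{
  struct toy_lcrecord_header *h;
  assert (size >= sizeof *h);
  h = malloc (size);
  if (h == NULL)
    return NULL;
  memset (h, 0, size);
  h->implementation = impl;
  h->uid = toy_next_uid++;
  h->next = toy_all_lcrecords;     /* push onto the head of the list */
  toy_all_lcrecords = h;
  return h;
}

/* Example record type: the header must come first. */
struct toy_buffer
{
  struct toy_lcrecord_header header;
  int point;
};

static const struct toy_implementation toy_lrecord_buffer = { "buffer" };
```

  A type-specific constructor would call
@code{toy_alloc_lcrecord (sizeof (struct toy_buffer), &toy_lrecord_buffer)}
and then fill in its own fields; the sweep phase can later find the
record by walking @code{toy_all_lcrecords}.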
4806
4807 Whenever you create an lrecord, you need to call either
4808 @code{DEFINE_LRECORD_IMPLEMENTATION()} or
4809 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()}.  This needs to be
4810 specified in a C file, at the top level.  What this actually does is
4811 define and initialize the implementation structure for the lrecord. (And
4812 possibly declares a function @code{error_check_foo()} that implements
4813 the @code{XFOO()} macro when error-checking is enabled.)  The arguments
4814 to the macros are the actual type name (this is used to construct the C
4815 variable name of the lrecord implementation structure and related
4816 structures using the @samp{##} macro concatenation operator), a string
4817 that names the type on the Lisp level (this may not be the same as the C
4818 type name; typically, the C type name has underscores, while the Lisp
4819 string has dashes), various method pointers, and the name of the C
4820 structure that contains the object.  The methods are used to encapsulate
4821 type-specific information about the object, such as how to print it or
4822 mark it for garbage collection, so that it's easy to add new object
4823 types without having to add a specific case for each new type in a bunch
4824 of different places.
4825
4826   The difference between @code{DEFINE_LRECORD_IMPLEMENTATION()} and
4827 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()} is that the former is
4828 used for fixed-size object types and the latter is for variable-size
4829 object types.  Most object types are fixed-size; some complex
4830 types, however (e.g. window configurations), are variable-size.
4831 Variable-size object types have an extra method, which is called
4832 to determine the actual size of a particular object of that type.
4833 (Currently this is only used for keeping allocation statistics.)
4834
4835   For the purpose of keeping allocation statistics, the allocation
4836 engine keeps a list of all the different types that exist.  Note that,
4837 since @code{DEFINE_LRECORD_IMPLEMENTATION()} is a macro that is
4838 specified at top-level, there is no way for it to add to the list of all
4839 existing types.  What happens instead is that each implementation
4840 structure contains in it a dynamically assigned number that is
4841 particular to that type. (Or rather, it contains a pointer to another
4842 structure that contains this number.  This evasiveness is done so that
4843 the implementation structure can be declared const.) In the sweep stage
4844 of garbage collection, each lrecord is examined to see if its
4845 implementation structure has its dynamically-assigned number set.  If
4846 not, it must be a new type, and it is added to the list of known types
4847 and a new number assigned.  The number is used to index into an array
4848 holding the number of objects of each type and the total memory
4849 allocated for objects of that type.  The statistics in this array are
4850 also computed during the sweep stage.  These statistics are returned by
4851 the call to @code{garbage-collect} and are printed out at the end of the
4852 loadup phase.
4853
4854   Note that for every type defined with a @code{DEFINE_LRECORD_*()}
4855 macro, there needs to be a @code{DECLARE_LRECORD_IMPLEMENTATION()}
4856 somewhere in a @file{.h} file, and this @file{.h} file needs to be
4857 included by @file{inline.c}.
4858
4859   Furthermore, there should generally be a set of @code{XFOOBAR()},
4860 @code{FOOBARP()}, etc. macros in a @file{.h} (or occasionally @file{.c})
4861 file.  To create one of these, copy an existing model and modify as
4862 necessary.
4863
4864   The various methods in the lrecord implementation structure are:
4865
4866 @enumerate
4867 @item
4868 @cindex mark method
4869 A @dfn{mark} method.  This is called during the marking stage and passed
4870 a function pointer (usually the @code{mark_object()} function), which is
4871 used to mark an object.  All Lisp objects that are contained within the
4872 object need to be marked by applying this function to them.  The mark
4873 method should also return a Lisp object, which should be either nil or
4874 an object to mark. (This can be used in lieu of calling
4875 @code{mark_object()} on the object, to reduce the recursion depth, and
4876 consequently should be the most heavily nested sub-object, such as a
4877 long list.)
4878
4879 @strong{Please note:} When the mark method is called, garbage collection
4880 is in progress, and special precautions need to be taken when accessing
4881 objects; see @ref{Garbage Collection}.
4882
4883 If your mark method does not need to do anything, it can be
4884 @code{NULL}.
4885
4886 @item
4887 A @dfn{print} method.  This is called to create a printed representation
4888 of the object, whenever @code{princ}, @code{prin1}, or the like is
4889 called.  It is passed the object, a stream to which the output is to be
4890 directed, and an @code{escapeflag} which indicates whether the object's
4891 printed representation should be @dfn{escaped} so that it is
4892 readable. (This corresponds to the difference between @code{princ} and
4893 @code{prin1}.) Basically, @dfn{escaped} means that strings will have
4894 quotes around them and confusing characters in the strings such as
4895 quotes, backslashes, and newlines will be backslashed; and that special
4896 care will be taken to make symbols print in a readable fashion
4897 (e.g. symbols that look like numbers will be backslashed).  Other
4898 readable objects should perhaps pass @code{escapeflag} on when
4899 sub-objects are printed, so that readability is preserved when necessary
4900 (or if not, always pass in a 1 for @code{escapeflag}).  Non-readable
4901 objects should in general ignore @code{escapeflag}, except that some use
4902 it as an indication that more verbose output should be given.
4903
4904 Sub-objects are printed using @code{print_internal()}, which takes
4905 exactly the same arguments as are passed to the print method.
4906
4907 Literal C strings should be printed using @code{write_c_string()},
4908 or @code{write_string_1()} for non-null-terminated strings.
4909
4910 Print methods for objects that do not have a readable representation should check the
4911 @code{print_readably} flag and signal an error if it is set.
4912
4913 If you specify @code{NULL} for the print method, the
4914 @code{default_object_printer()} will be used.
4915
4916 @item
4917 A @dfn{finalize} method.  This is called at the beginning of the sweep
4918 stage on lcrecords that are about to be freed, and should be used to
4919 perform any extra object cleanup.  This typically involves freeing any
4920 extra @code{malloc()}ed memory associated with the object, releasing any
4921 operating-system and window-system resources associated with the object
4922 (e.g. pixmaps, fonts), etc.
4923
4924 The finalize method can be @code{NULL} if nothing needs to be done.
4925
4926 WARNING #1: The finalize method is also called at the end of the dump
4927 phase; this time with the @code{for_disksave} parameter set to non-zero.  The
4928 object is @emph{not} about to disappear, so you have to make sure to
4929 @emph{not} free any extra @code{malloc()}ed memory if you're going to
4930 need it later.  (Also, signal an error if there are any operating-system
4931 and window-system resources here, because they can't be dumped.)
4932
4933 Finalize methods should, as a rule, set to zero any pointers after
4934 they've been freed, and check to make sure pointers are not zero before
4935 freeing.  Although I'm pretty sure that finalize methods are not called
4936 twice on the same object (except for the @code{for_disksave} proviso),
4937 we've gotten nastily burned in some cases by not doing this.
4938
4939 WARNING #2: The finalize method is @emph{only} called for
4940 lcrecords, @emph{not} for simple lrecords.  If you need a
4941 finalize method for simple lrecords, you have to stick
4942 it in the @code{ADDITIONAL_FREE_foo()} macro in @file{alloc.c}.
4943
4944 WARNING #3: Things are in an @emph{extremely} bizarre state
4945 when @code{ADDITIONAL_FREE_foo()} is called, so you have to
4946 be incredibly careful when writing one of these functions.
4947 See the comment in @code{gc_sweep()}.  If you ever have to add
4948 one of these, consider using an lcrecord or dealing with
4949 the problem in a different fashion.
4950
4951 @item
4952 An @dfn{equal} method.  This compares the two objects for similarity,
4953 when @code{equal} is called.  It should compare the contents of the
4954 objects in some reasonable fashion.  It is passed the two objects and a
4955 @dfn{depth} value, which is used to catch circular objects.  To compare
4956 sub-Lisp-objects, call @code{internal_equal()} and bump the depth value
4957 by one.  If this value gets too high, a @code{circular-object} error
4958 will be signaled.
4959
4960 If this is @code{NULL}, objects are @code{equal} only when they are @code{eq},
4961 i.e. identical.
4962
4963 @item
4964 A @dfn{hash} method.  This is used to hash objects when they are to be
4965 compared with @code{equal}.  The rule here is that if two objects are
4966 @code{equal}, they @emph{must} hash to the same value; i.e. your hash
4967 function should use some subset of the sub-fields of the object that are
4968 compared in the ``equal'' method.  If you specify this method as
4969 @code{NULL}, the object's pointer will be used as the hash, which will
4970 @emph{fail} if the object has an @code{equal} method, so don't do this.
4971
4972 To hash a sub-Lisp-object, call @code{internal_hash()}.  Bump the
4973 depth by one, just like in the ``equal'' method.
4974
4975 To convert a Lisp object directly into a hash value (using
4976 its pointer), use @code{LISP_HASH()}.  This is what happens when
4977 the hash method is @code{NULL}.
4978
4979 To hash two or more values together into a single value, use
4980 @code{HASH2()}, @code{HASH3()}, @code{HASH4()}, etc.
4981
4982 @item
4983 @dfn{getprop}, @dfn{putprop}, @dfn{remprop}, and @dfn{plist} methods.
4984 These are used for object types that have properties.  I don't feel like
4985 documenting them here.  If you create one of these objects, you have to
4986 use different macros to define them,
4987 i.e. @code{DEFINE_LRECORD_IMPLEMENTATION_WITH_PROPS()} or
4988 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION_WITH_PROPS()}.
4989
4990 @item
4991 A @dfn{size_in_bytes} method, when the object is of variable size
4992 (i.e. declared with a @code{_SEQUENCE_IMPLEMENTATION} macro).  This should
4993 simply return the object's size in bytes, exactly as you might expect.
4994 For an example, see the methods for window configurations and opaques.
4995 @end enumerate
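
  The contract between the equal and hash methods, and the fallback
behaviour when they are @code{NULL}, can be sketched with a cut-down
method table.  Everything here is hypothetical: the table has only two
slots, @code{toy_hash2()} is a stand-in combiner rather than
@code{HASH2()}, and an @code{assert()} takes the place of the
@code{circular-object} error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { TOY_MAX_DEPTH = 200 };   /* arbitrary limit for this sketch */

struct toy_methods
{
  int (*equal) (const void *a, const void *b, int depth);
  unsigned long (*hash) (const void *obj, int depth);
};

struct toy_object
{
  const struct toy_methods *methods;
};

/* Example type with two fields that take part in equality. */
struct toy_point
{
  struct toy_object header;   /* must come first */
  int x, y;
};

static unsigned long
toy_hash2 (unsigned long a, unsigned long b)
{
  return a * 31ul + b;
}

static int
toy_point_equal (const void *a, const void *b, int depth)
{
  const struct toy_point *p = a, *q = b;
  (void) depth;               /* no sub-objects to recurse into */
  return p->x == q->x && p->y == q->y;
}

/* Hash exactly the fields that toy_point_equal() compares. */
static unsigned long
toy_point_hash (const void *obj, int depth)
{
  const struct toy_point *p = obj;
  (void) depth;
  return toy_hash2 ((unsigned long) p->x, (unsigned long) p->y);
}

static const struct toy_methods toy_point_methods =
  { toy_point_equal, toy_point_hash };
static const struct toy_methods toy_opaque_methods = { NULL, NULL };

static int
toy_internal_equal (const struct toy_object *a,
                    const struct toy_object *b, int depth)
{
  assert (depth < TOY_MAX_DEPTH);   /* stand-in for the circularity error */
  if (a == b)
    return 1;                       /* eq objects are always equal */
  if (a->methods != b->methods || a->methods->equal == NULL)
    return 0;                       /* no equal method: eq only */
  return a->methods->equal (a, b, depth);
}

static unsigned long
toy_internal_hash (const struct toy_object *obj, int depth)
{
  assert (depth < TOY_MAX_DEPTH);
  if (obj->methods->hash == NULL)
    return (unsigned long) (uintptr_t) obj;   /* hash on the pointer */
  return obj->methods->hash (obj, depth);
}
```

  If @code{toy_point_methods} supplied an equal method but left the
hash slot @code{NULL}, two distinct but equal points would hash on
their addresses and land in different buckets, which is the failure
the hash-method rule is meant to prevent.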
4996
4997 @node Low-level allocation
4998 @section Low-level allocation
4999
5000   Memory that you want to allocate directly should be allocated using
5001 @code{xmalloc()} rather than @code{malloc()}.  This implements
5002 error-checking on the return value, and once upon a time did some more
5003 vital stuff (i.e. @code{BLOCK_INPUT}, which is no longer necessary).
5004 Free using @code{xfree()}, and realloc using @code{xrealloc()}.  Note
5005 that @code{xmalloc()} will do a non-local exit if the memory can't be
5006 allocated. (Many functions, however, do not expect this, and thus XEmacs
5007 will likely crash if this happens.  @strong{This is a bug.}  If you can,
5008 you should strive to make your function handle this OK.  However, it's
5009 difficult in the general circumstance, perhaps requiring extra
5010 unwind-protects and such.)
5011
5012   Note that XEmacs provides two separate replacements for the standard
5013 @code{malloc()} library function.  These are called @dfn{old GNU malloc}
5014 (@file{malloc.c}) and @dfn{new GNU malloc} (@file{gmalloc.c}),
5015 respectively.  New GNU malloc is better in pretty much every way than
5016 old GNU malloc, and should be used if possible.  (It used to be that on
5017 some systems, the old one worked but the new one didn't.  I think this
5018 was due specifically to a bug in SunOS, which the new one now works
5019 around; so I don't think the old one ever has to be used any more.) The
5020 primary difference between both of these mallocs and the standard system
5021 malloc is that they are much faster, at the expense of increased space.
5022 The basic idea is that memory is allocated in fixed chunks of powers of
5023 two.  This allows for basically constant malloc time, since the various
5024 chunks can just be kept on a number of free lists. (The standard system
5025 malloc typically allocates arbitrary-sized chunks and has to spend some
5026 time, sometimes a significant amount of time, walking the heap looking
5027 for a free block to use and cleaning things up.)  The new GNU malloc
5028 improves on things by allocating large objects in chunks of 4096 bytes
5029 rather than in ever larger powers of two, which results in ever larger
5030 wastage.  There is a slight speed loss here, but it's of doubtful
5031 significance.
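
  The space trade-off between the two rounding policies can be made
concrete with a little arithmetic.  The functions below are
hypothetical and are not taken from @file{malloc.c} or
@file{gmalloc.c}; in particular, the minimum chunk size of 8 bytes and
the choice of one 4096-byte page as the boundary for ``large'' requests
are assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE ((size_t) 4096)

/* Old-style policy: round every request up to a power of two. */
static size_t
toy_pow2_chunk (size_t n)
{
  size_t chunk = 8;
  assert (n <= ((size_t) -1) / 2);   /* keep the doubling from overflowing */
  while (chunk < n)
    chunk *= 2;
  return chunk;
}

/* New-style policy: powers of two for small requests, whole pages
   for anything larger than one page. */
static size_t
toy_page_chunk (size_t n)
{
  assert (n <= ((size_t) -1) / 2);
  if (n <= TOY_PAGE)
    return toy_pow2_chunk (n);
  return (n + TOY_PAGE - 1) / TOY_PAGE * TOY_PAGE;
}
```

  For a 9000-byte request, @code{toy_pow2_chunk()} reserves 16384 bytes
while @code{toy_page_chunk()} reserves 12288; the gap widens as
requests grow, since the power-of-two policy can waste almost half of
each chunk.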
5032
5033   NOTE: Apparently there is a third-generation GNU malloc that is
5034 significantly better than the new GNU malloc, and should probably
5035 be included in XEmacs.
5036
5037   There is also the relocating allocator, @file{ralloc.c}.  This actually
5038 moves blocks of memory around so that the @code{sbrk()} pointer can be shrunk
5039 and virtual memory released back to the system.  On some systems,
5040 this is a big win.  On all systems, it causes a noticeable (and
5041 sometimes huge) speed penalty, so I turn it off by default.
5042 @file{ralloc.c} only works with the new GNU malloc in @file{gmalloc.c}.
5043 There are also two versions of @file{ralloc.c}, one that uses @code{mmap()}
5044 rather than block copies to move data around.  This purports to
5045 be faster, although that depends on the amount of data that would
5046 have had to be block copied and the system-call overhead for
5047 @code{mmap()}.  I don't know exactly how this works, except that the
5048 relocating-allocation routines are pretty much used only for
5049 the memory allocated for a buffer, which is the biggest consumer
5050 of space, esp. of space that may get freed later.
5051
5052   Note that the GNU mallocs have some ``memory warning'' facilities.
5053 XEmacs taps into them and issues a warning through the standard
5054 warning system, when memory gets to 75%, 85%, and 95% full.
5055 (On some systems, the memory warnings are not functional.)
5056
5057   Allocated memory that is going to be used to make a Lisp object
5058 is created using @code{allocate_lisp_storage()}.  This calls @code{xmalloc()}
5059 but also verifies that the pointer to the memory can fit into
5060 a Lisp word (remember that some bits are taken away for a type
5061 tag and a mark bit).  If not, an error is issued through @code{memory_full()}.
5062 @code{allocate_lisp_storage()} is called by @code{alloc_lcrecord()},
5063 @code{ALLOCATE_FIXED_TYPE()}, and the vector and bit-vector creation
5064 routines.  These routines also call @code{INCREMENT_CONS_COUNTER()} at the
5065 appropriate times; this keeps statistics on how much memory is
5066 allocated, so that garbage-collection can be invoked when the
5067 threshold is reached.
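
  The two checks described above can be sketched roughly as follows.
This is only an illustration: the number of reserved bits
(@code{SKETCH_TAG_BITS}), the threshold value, and all function and
variable names are assumptions, not the definitions used in
@file{alloc.c}.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_TAG_BITS 4   /* assumed: bits reserved for tag and mark */

static size_t consing_since_gc;          /* bytes allocated since last GC */
static size_t gc_cons_threshold = 1000;  /* assumed threshold for the demo */

/* Nonzero if ADDR survives having its top SKETCH_TAG_BITS bits taken
   away, i.e. if it can be stored in the value part of a Lisp word. */
static int pointer_fits_in_lisp_word(uintptr_t addr)
{
    unsigned value_bits =
        (unsigned)(sizeof(uintptr_t) * CHAR_BIT) - SKETCH_TAG_BITS;
    return (addr >> value_bits) == 0;
}

/* Record SIZE freshly allocated bytes; return nonzero once enough has
   been allocated that a garbage collection should be requested. */
static int increment_cons_counter_sketch(size_t size)
{
    assert(size > 0);
    consing_since_gc += size;
    return consing_since_gc >= gc_cons_threshold;
}
```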
5068
5069 @node Pure Space
5070 @section Pure Space
5071
5072   Not yet documented.
5073
5074 @node Cons
5075 @section Cons
5076
5077   Conses are allocated in standard frob blocks.  The only thing to
5078 note is that conses can be explicitly freed using @code{free_cons()}
5079 and associated functions @code{free_list()} and @code{free_alist()}.  This
5080 immediately puts the conses onto the cons free list, and decrements
5081 the statistics on memory allocation appropriately.  This is used
5082 to good effect by some extremely commonly-used code, to avoid
5083 generating extra objects and thereby triggering GC sooner.
5084 However, you have to be @emph{extremely} careful when doing this.
5085 If you mess this up, you will get BADLY BURNED, and it has happened
5086 before.
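
  A minimal sketch of the free-list idea follows.  It is not the frob
block code: the struct layout, the use of plain @code{malloc()} for
fresh conses, and the names @code{alloc_cons_sketch()} and
@code{free_cons_sketch()} are all assumptions made to keep the example
short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct sketch_cons {
    int car, cdr;                     /* stand-ins for Lisp objects */
    struct sketch_cons *next_free;    /* link used only while free */
};

static struct sketch_cons *cons_free_list;
static long conses_in_use;            /* allocation statistics */

static struct sketch_cons *alloc_cons_sketch(int car, int cdr)
{
    struct sketch_cons *c = cons_free_list;
    if (c)
        cons_free_list = c->next_free;     /* reuse a freed cons */
    else if (!(c = malloc(sizeof *c)))
        return NULL;
    c->car = car;
    c->cdr = cdr;
    c->next_free = NULL;
    conses_in_use++;
    return c;
}

/* Put C straight back on the free list and fix up the statistics.
   The caller must guarantee nothing else still refers to C. */
static void free_cons_sketch(struct sketch_cons *c)
{
    assert(c != NULL);
    c->next_free = cons_free_list;
    cons_free_list = c;
    conses_in_use--;
}

/* Returns 1 if a freed cons is handed out again by the next request. */
static int cons_reuse_demo(void)
{
    struct sketch_cons *a = alloc_cons_sketch(1, 2);
    struct sketch_cons *b;
    if (!a)
        return 0;
    free_cons_sketch(a);
    b = alloc_cons_sketch(3, 4);
    return b == a && conses_in_use == 1;
}
```

  The demo also shows why this is dangerous: after the free, any stale
reference to the old cons silently sees the new contents.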
5087
5088 @node Vector
5089 @section Vector
5090
5091   As mentioned above, each vector is @code{malloc()}ed individually, and
5092 all are threaded through the variable @code{all_vectors}.  Vectors are
5093 marked strangely during garbage collection, by kludging the size field.
5094 Note that the @code{struct Lisp_Vector} is declared with its
5095 @code{contents} field being a @emph{stretchy} array of one element.  It
5096 is actually @code{malloc()}ed with the right size, however, and access
5097 to any element through the @code{contents} array works fine.
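
  The ``stretchy array'' idiom looks roughly like this.  The struct
below is a simplified stand-in with @code{int} elements, and the names
@code{sketch_vector}, @code{make_sketch_vector()} and
@code{all_vectors_sketch} are invented for the example; only the
one-element trailing array and the threading through a global list are
taken from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct sketch_vector {
    size_t size;
    struct sketch_vector *next;   /* threads all vectors together */
    int contents[1];              /* really `size' elements long */
};

static struct sketch_vector *all_vectors_sketch;

static struct sketch_vector *make_sketch_vector(size_t n)
{
    size_t header = offsetof(struct sketch_vector, contents);
    size_t bytes;
    struct sketch_vector *v;

    assert(n <= (SIZE_MAX - header) / sizeof(int));
    bytes = header + n * sizeof(int);
    if (bytes < sizeof(struct sketch_vector))
        bytes = sizeof(struct sketch_vector);
    v = calloc(1, bytes);          /* allocated with the right size */
    if (!v)
        return NULL;
    v->size = n;
    v->next = all_vectors_sketch;
    all_vectors_sketch = v;
    return v;
}

/* Returns 1 if element 4 of a 5-element vector is usable. */
static int vector_demo(void)
{
    struct sketch_vector *v = make_sketch_vector(5);
    if (!v)
        return 0;
    v->contents[4] = 42;
    return v->size == 5 && v->contents[4] == 42 && all_vectors_sketch == v;
}
```

  Indexing past a one-element array is the traditional form of this
trick; C99 offers flexible array members (@code{int contents[];}) for
the same purpose.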
5098
5099 @node Bit Vector
5100 @section Bit Vector
5101
5102   Bit vectors work exactly like vectors, except for more complicated
5103 code to access an individual bit, and except for the fact that bit
5104 vectors are lrecords while vectors are not. (The only difference here is
5105 that there's an lrecord implementation pointer at the beginning and the
5106 tag field in bit vector Lisp words is ``lrecord'' rather than
5107 ``vector''.)
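
  The ``more complicated code to access an individual bit'' amounts to
a word index plus a shift and a mask.  A minimal sketch, assuming the
bits are packed into @code{unsigned long} words (the helper names are
hypothetical):

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SKETCH_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Fetch bit I: pick the word, shift the bit down, mask it off. */
static int sketch_bit_ref(const unsigned long *bits, size_t i)
{
    return (int)((bits[i / SKETCH_WORD_BITS] >> (i % SKETCH_WORD_BITS)) & 1UL);
}

/* Store VALUE (0 or 1) into bit I. */
static void sketch_bit_set(unsigned long *bits, size_t i, int value)
{
    unsigned long mask = 1UL << (i % SKETCH_WORD_BITS);
    assert(value == 0 || value == 1);
    if (value)
        bits[i / SKETCH_WORD_BITS] |= mask;
    else
        bits[i / SKETCH_WORD_BITS] &= ~mask;
}

/* Returns 1 if setting and clearing bits behaves as expected. */
static int bit_vector_demo(void)
{
    unsigned long bits[2] = { 0UL, 0UL };
    size_t high = SKETCH_WORD_BITS + 1;   /* second bit of second word */

    sketch_bit_set(bits, 3, 1);
    sketch_bit_set(bits, high, 1);
    sketch_bit_set(bits, 3, 0);
    return sketch_bit_ref(bits, 3) == 0
        && sketch_bit_ref(bits, high) == 1
        && bits[0] == 0UL && bits[1] == 2UL;
}
```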
5108
5109 @node Symbol
5110 @section Symbol
5111
5112   Symbols are also allocated in frob blocks.  Note that the code
5113 exists for symbols to be either lrecords (category (c) above)
5114 or simple types (category (b) above), and that they are lrecords by
5115 default (I think), although there is no good reason for this.
5116
5117   Note that symbols in the awful horrible obarray structure are
5118 chained through their @code{next} field.
5119
5120 Remember that @code{intern} looks up a symbol in an obarray, creating
5121 one if necessary.
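
  The chaining can be pictured with the following sketch of a hash
table whose buckets are linked through a @code{next} field.  The real
obarray is a Lisp vector holding symbols; the fixed-size C array, the
hash function, and the names @code{sketch_symbol} and
@code{sketch_intern()} are simplifications assumed for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_OBARRAY_SIZE 17

struct sketch_symbol {
    const char *name;             /* caller keeps the name alive */
    struct sketch_symbol *next;   /* next symbol in the same bucket */
};

static struct sketch_symbol *sketch_obarray[SKETCH_OBARRAY_SIZE];

static unsigned sketch_hash(const char *s)
{
    unsigned h = 0;
    while (*s)
        h = h * 31u + (unsigned char)*s++;
    return h % SKETCH_OBARRAY_SIZE;
}

/* Look NAME up in the obarray, creating the symbol if necessary. */
static struct sketch_symbol *sketch_intern(const char *name)
{
    unsigned bucket;
    struct sketch_symbol *sym;

    assert(name != NULL);
    bucket = sketch_hash(name);
    for (sym = sketch_obarray[bucket]; sym; sym = sym->next)
        if (strcmp(sym->name, name) == 0)
            return sym;                    /* already interned */

    sym = malloc(sizeof *sym);
    if (!sym)
        return NULL;
    sym->name = name;
    sym->next = sketch_obarray[bucket];    /* chain onto the bucket */
    sketch_obarray[bucket] = sym;
    return sym;
}
```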
5122
5123 @node Marker
5124 @section Marker
5125
5126   Markers are allocated in frob blocks, as usual.  They are kept
5127 in a buffer unordered, but in a doubly-linked list so that they
5128 can easily be removed. (Formerly this was a singly-linked list,
5129 but in some cases garbage collection took an extraordinarily
5130 long time due to the O(N^2) time required to remove lots of
5131 markers from a buffer.) Markers are removed from a buffer in
5132 the finalize stage, in @code{ADDITIONAL_FREE_marker()}.
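
  The reason the doubly-linked list helps is that unlinking a marker no
longer requires walking the list to find its predecessor.  A minimal
sketch, with invented struct and function names:

```c
#include <assert.h>
#include <stddef.h>

struct sketch_marker {
    struct sketch_marker *prev, *next;
    long position;
};

struct sketch_buffer {
    struct sketch_marker *markers;   /* unordered list head */
};

static void attach_marker(struct sketch_buffer *buf, struct sketch_marker *m)
{
    m->prev = NULL;
    m->next = buf->markers;
    if (buf->markers)
        buf->markers->prev = m;
    buf->markers = m;
}

/* Constant-time removal: no search for the predecessor is needed. */
static void detach_marker(struct sketch_buffer *buf, struct sketch_marker *m)
{
    assert(buf != NULL && m != NULL);
    if (m->prev)
        m->prev->next = m->next;
    else
        buf->markers = m->next;
    if (m->next)
        m->next->prev = m->prev;
    m->prev = m->next = NULL;
}

/* Attaches three markers, detaches the middle one, and returns how
   many are left on the buffer's list. */
static int marker_demo(void)
{
    struct sketch_buffer buf = { NULL };
    struct sketch_marker a = { NULL, NULL, 1 };
    struct sketch_marker b = { NULL, NULL, 2 };
    struct sketch_marker c = { NULL, NULL, 3 };
    struct sketch_marker *m;
    int count = 0;

    attach_marker(&buf, &a);
    attach_marker(&buf, &b);
    attach_marker(&buf, &c);
    detach_marker(&buf, &b);
    for (m = buf.markers; m; m = m->next)
        count++;
    return count;
}
```

  With a singly-linked list each removal costs a scan of the list, so
removing N markers costs on the order of N^2 steps, which is the
slowdown mentioned above.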
5133
5134 @node String
5135 @section String
5136
5137   As mentioned above, strings are a special case.  A string is logically
5138 two parts, a fixed-size object (containing the length, property list,
5139 and a pointer to the actual data), and the actual data in the string.
5140 The fixed-size object is a @code{struct Lisp_String} and is allocated in
5141 frob blocks, as usual.  The actual data is stored in special
5142 @dfn{string-chars blocks}, which are 8K blocks of memory.
5143 Currently-allocated strings are simply laid end to end in these
5144 string-chars blocks, with a pointer back to the @code{struct Lisp_String}
5145 stored before each string in the string-chars block.  When a new string
5146 needs to be allocated, the remaining space at the end of the last
5147 string-chars block is used if there's enough, and a new string-chars
5148 block is created otherwise.
5149
5150   There are never any holes in the string-chars blocks due to the string
5151 compaction and relocation that happens at the end of garbage collection.
5152 During the sweep stage of garbage collection, when objects are
5153 reclaimed, the garbage collector goes through all string-chars blocks,
5154 looking for unused strings.  Each chunk of string data is preceded by a
5155 pointer to the corresponding @code{struct Lisp_String}, which indicates
5156 both whether the string is used and how big the string is, i.e. how to
5157 get to the next chunk of string data.  Holes are compressed by
5158 block-copying the next string into the empty space and relocating the
5159 pointer stored in the corresponding @code{struct Lisp_String}.
5160 @strong{This means you have to be careful with strings in your code.}
5161 See the section above on @code{GCPRO}ing.
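
  The compaction pass can be sketched as follows.  This is a toy model
under several assumptions: a single small block, a @code{live} flag in
the string object standing in for the mark bit, and invented names
throughout.  What it does share with the description above is the back
pointer stored before each string's data and the relocation of the
string object's data pointer when the data slides down.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_BLOCK_SIZE 256

struct sketch_string {
    size_t length;
    unsigned char *data;   /* points into the string-chars block */
    int live;              /* stand-in for "marked during GC" */
};

/* Back pointer plus data, padded so the next back pointer is aligned. */
static size_t chunk_size(size_t length)
{
    size_t raw = sizeof(struct sketch_string *) + length;
    size_t align = sizeof(void *);
    return (raw + align - 1) / align * align;
}

/* Lay a new string at the end of the block. */
static void append_string(unsigned char *block, size_t *used,
                          struct sketch_string *s, const char *text)
{
    s->length = strlen(text);
    s->live = 1;
    assert(*used + chunk_size(s->length) <= SKETCH_BLOCK_SIZE);
    memcpy(block + *used, &s, sizeof s);          /* back pointer */
    s->data = block + *used + sizeof s;
    memcpy(s->data, text, s->length);
    *used += chunk_size(s->length);
}

/* Slide live chunks down over dead ones; return the new fill level. */
static size_t compact_block(unsigned char *block, size_t used)
{
    size_t from = 0, to = 0;
    while (from < used) {
        struct sketch_string *owner;
        size_t size;
        memcpy(&owner, block + from, sizeof owner);
        size = chunk_size(owner->length);
        if (owner->live) {
            if (to != from)
                memmove(block + to, block + from, size);
            owner->data = block + to + sizeof owner;   /* relocate */
            to += size;
        }
        from += size;
    }
    return to;
}

/* Returns 1 if the hole left by a dead string is squeezed out. */
static int compaction_demo(void)
{
    unsigned char block[SKETCH_BLOCK_SIZE];
    struct sketch_string a, b, c;
    size_t used = 0, before;

    append_string(block, &used, &a, "aaa");
    append_string(block, &used, &b, "bbbbb");
    append_string(block, &used, &c, "cc");
    before = used;
    b.live = 0;
    used = compact_block(block, used);
    return used == before - chunk_size(5)
        && memcmp(a.data, "aaa", 3) == 0
        && memcmp(c.data, "cc", 2) == 0
        && c.data == block + chunk_size(3) + sizeof(struct sketch_string *);
}
```

  Note how @code{c.data} changes during compaction: any raw pointer
into string data that was held across the call would now be stale.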
5162
5163   Note that there is one situation not handled: a string that is too big
5164 to fit into a string-chars block.  Such strings, called @dfn{big
5165 strings}, are all @code{malloc()}ed as their own block. (#### Although it
5166 would make more sense for the threshold for big strings to be somewhat
5167 lower, e.g. 1/2 or 1/4 the size of a string-chars block.  It seems that
5168 this was indeed the case formerly -- indeed, the threshold was set at
5169 1/8 -- but Mly forgot about this when rewriting things for 19.8.)
5170
5171 Note also that the string data in string-chars blocks is padded as
5172 necessary so that proper alignment constraints on the @code{struct
5173 Lisp_String} back pointers are maintained.
5174
5175   Finally, strings can be resized.  This happens in Mule when a
5176 character is substituted with a different-length character, or during
5177 modeline frobbing. (You could also export this to Lisp, but it's not
5178 done so currently.) Resizing a string is a potentially tricky process.
5179 If the change is small enough that the padding can absorb it, nothing
5180 other than a simple memory move needs to be done.  Keep in mind,
5181 however, that the string can't shrink too much because the offset to the
5182 next string in the string-chars block is computed by looking at the
5183 length and rounding to the nearest multiple of four or eight.  If the
5184 string would shrink or expand beyond the correct padding, new string
5185 data needs to be allocated at the end of the last string-chars block and
5186 the data moved appropriately.  This leaves some dead string data, which
5187 is marked by putting a special marker of 0xFFFFFFFF in the @code{struct
5188 Lisp_String} pointer before the data (there's no real @code{struct
5189 Lisp_String} to point to and relocate), and storing the size of the dead
5190 string data (which would normally be obtained from the now-non-existent
5191 @code{struct Lisp_String}) at the beginning of the dead string data gap.
5192 The string compactor recognizes this special 0xFFFFFFFF marker and
5193 handles it correctly.
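
  The ``can the padding absorb it'' test can be sketched as below,
assuming lengths are rounded @emph{up} to the alignment and that a
resize may stay in place only when the rounded size is unchanged.  Both
helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Space occupied by LEN bytes of string data once padded to ALIGN. */
static size_t padded_size(size_t len, size_t align)
{
    assert(align > 0);
    return (len + align - 1) / align * align;
}

/* Nonzero if a string can change from OLD_LEN to NEW_LEN without
   moving, i.e. the offset to the next string stays the same. */
static int resize_fits_in_place(size_t old_len, size_t new_len, size_t align)
{
    return padded_size(old_len, align) == padded_size(new_len, align);
}
```

  With an alignment of four, a 5-byte string can grow to 7 bytes in
place, but neither growing to 9 nor shrinking to 4 bytes keeps the
padded size at 8, so both of those require fresh string data.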
5194
5195 @node Compiled Function
5196 @section Compiled Function
5197
5198   Not yet documented.
5199
5200 @node Events and the Event Loop, Evaluation; Stack Frames; Bindings, Allocation of Objects in XEmacs Lisp, Top
5201 @chapter Events and the Event Loop
5202
5203 @menu
5204 * Introduction to Events::
5205 * Main Loop::
5206 * Specifics of the Event Gathering Mechanism::
5207 * Specifics About the Emacs Event::
5208 * The Event Stream Callback Routines::
5209 * Other Event Loop Functions::
5210 * Converting Events::
5211 * Dispatching Events; The Command Builder::
5212 @end menu
5213
5214 @node Introduction to Events
5215 @section Introduction to Events
5216
5217   An event is an object that encapsulates information about an
5218 interesting occurrence in the operating system.  Events are
5219 generated either by user action, direct (e.g. typing on the
5220 keyboard or moving the mouse) or indirect (moving another
5221 window, thereby generating an expose event on an Emacs frame),
5222 or as a result of some other typically asynchronous action happening,
5223 such as output from a subprocess being ready or a timer expiring.
5224 Events come into the system in an asynchronous fashion (typically
5225 through a callback being called) and are converted into a
5226 synchronous event queue (first-in, first-out) in a process that
5227 we will call @dfn{collection}.
5228
5229   Note that each application has its own event queue. (It is
5230 immaterial whether the collection process directly puts the
5231 events in the proper application's queue, or puts them into
5232 a single system queue, which is later split up.)
5233
5234   The most basic level of event collection is done by the
5235 operating system or window system.  Typically, XEmacs does
5236 its own event collection as well.  Often there are multiple
5237 layers of collection in XEmacs, with events from various
5238 sources being collected into a queue, which is then combined
5239 with other sources to go into another queue (i.e. a second
5240 level of collection), with perhaps another level on top of
5241 this, etc.
5242
5243   XEmacs has its own types of events (called @dfn{Emacs events}),
5244 which provides an abstract layer on top of the system-dependent
5245 nature of the most basic events that are received.  Part of the
5246 complex nature of the XEmacs event collection process involves
5247 converting from the operating-system events into the proper
5248 Emacs events -- there may not be a one-to-one correspondence.
5249
5250   Emacs events are documented in @file{events.h}; I'll discuss them
5251 later.
5252
5253 @node Main Loop
5254 @section Main Loop
5255
5256   The @dfn{command loop} is the top-level loop that the editor is always
5257 running.  It loops endlessly, calling @code{next-event} to retrieve an
5258 event and @code{dispatch-event} to execute it. @code{dispatch-event} does
5259 the appropriate thing with non-user events (process, timeout,
5260 magic, eval, mouse motion); this involves calling a Lisp handler
5261 function, redrawing a newly-exposed part of a frame, reading
5262 subprocess output, etc.  For user events, @code{dispatch-event}
5263 looks up the event in relevant keymaps or menubars; when a
5264 full key sequence or menubar selection is reached, the appropriate
5265 function is executed. @code{dispatch-event} may have to keep state
5266 across calls; this is done in the ``command-builder'' structure
5267 associated with each console (remember, there's usually only
5268 one console), and the engine that looks up keystrokes and
5269 constructs full key sequences is called the @dfn{command builder}.
5270 This is documented elsewhere.
5271
5272   The guts of the command loop are in @code{command_loop_1()}.  This
5273 function doesn't catch errors, though -- that's the job of
5274 @code{command_loop_2()}, which is a condition-case (i.e. error-trapping)
5275 wrapper around @code{command_loop_1()}.  @code{command_loop_1()} never
5276 returns, but may get thrown out of.
5277
5278   When an error occurs, @code{cmd_error()} is called, which usually
5279 invokes the Lisp error handler in @code{command-error}; however, a
5280 default error handler is provided if @code{command-error} is @code{nil}
5281 (e.g. during startup).  The purpose of the error handler is simply to
5282 display the error message and do associated cleanup; it does not need to
5283 throw anywhere.  When the error handler finishes, the condition-case in
5284 @code{command_loop_2()} will finish and @code{command_loop_2()} will
5285 reinvoke @code{command_loop_1()}.
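
  The relationship between the two functions can be modelled with
@code{setjmp()} and @code{longjmp()}.  This is only an analogy: the
real code uses the Lisp condition-case machinery, and the real
@code{command_loop_1()} never returns, whereas the sketch's inner loop
returns when its canned list of commands runs out so that the example
terminates.  All names ending in @code{_sketch} are invented.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

static jmp_buf error_catch;       /* plays the role of the condition-case */
static const int *pending;        /* 1 = command that signals an error */
static size_t n_pending, next_index;
static int errors_handled;

/* Inner loop: runs commands and does not catch errors itself. */
static void command_loop_1_sketch(void)
{
    while (next_index < n_pending) {
        int fails = pending[next_index++];
        if (fails)
            longjmp(error_catch, 1);      /* "signal an error" */
    }
}

/* Outer wrapper: traps the error, handles it, reinvokes the inner loop.
   Returns the number of errors that were handled. */
static int command_loop_2_sketch(const int *commands, size_t count)
{
    assert(commands != NULL);
    pending = commands;
    n_pending = count;
    next_index = 0;
    errors_handled = 0;
    for (;;) {
        if (setjmp(error_catch) == 0) {
            command_loop_1_sketch();
            return errors_handled;        /* only happens in this sketch */
        }
        errors_handled++;                 /* the error handler ran */
    }
}

static int command_loop_demo(void)
{
    static const int commands[] = { 0, 1, 0, 1, 0 };
    return command_loop_2_sketch(commands, 5);
}
```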
5286
5287   @code{command_loop_2()} is invoked from three places: from
5288 @code{initial_command_loop()} (called from @code{main()} at the end of
5289 internal initialization), from the Lisp function @code{recursive-edit},
5290 and from @code{call_command_loop()}.
5291
5292   @code{call_command_loop()} is called when a macro is started and when
5293 the minibuffer is entered; normal termination of the macro or minibuffer
5294 causes a throw out of the recursive command loop. (To
5295 @code{execute-kbd-macro} for macros and @code{exit} for minibuffers.
5296 Note also that the low-level minibuffer-entering function,
5297 @code{read-minibuffer-internal}, provides its own error handling and
5298 does not need @code{command_loop_2()}'s error encapsulation; so it tells
5299 @code{call_command_loop()} to invoke @code{command_loop_1()} directly.)
5300
5301   Note that both @code{read-minibuffer-internal} and @code{recursive-edit} set up a
5302 catch for @code{exit}; this is why @code{abort-recursive-edit}, which
5303 throws to this catch, exits out of either one.
5304
5305   @code{initial_command_loop()}, called from @code{main()}, sets up a
5306 catch for @code{top-level} when invoking @code{command_loop_2()},
5307 allowing functions to throw all the way to the top level if they really
5308 need to.  Before invoking @code{command_loop_2()},
5309 @code{initial_command_loop()} calls @code{top_level_1()}, which handles
5310 all of the startup stuff (creating the initial frame, handling the
5311 command-line options, loading the user's @file{.emacs} file, etc.).  The
5312 function that actually does this is in Lisp and is pointed to by the
5313 variable @code{top-level}; normally this function is
5314 @code{normal-top-level}.  @code{top_level_1()} is just an error-handling
5315 wrapper similar to @code{command_loop_2()}.  Note also that
5316 @code{initial_command_loop()} sets up a catch for @code{top-level} when
5317 invoking @code{top_level_1()}, just like when it invokes
5318 @code{command_loop_2()}.
5319
5320 @node Specifics of the Event Gathering Mechanism
5321 @section Specifics of the Event Gathering Mechanism
5322
5323   Here is an approximate diagram of the collection processes
5324 at work in XEmacs, under TTY's (TTY's are simpler than X
5325 so we'll look at this first):
5326
5327 @noindent
5328 @example
5329  asynch.      asynch.    asynch.   asynch.             [Collectors in
5330 kbd events  kbd events   process   process                the OS]
5331       |         |         output    output
5332       |         |           |         |
5333       |         |           |         |      SIGINT,   [signal handlers
5334       |         |           |         |      SIGQUIT,     in XEmacs]
5335       V         V           V         V      SIGWINCH,
5336      file      file        file      file    SIGALRM
5337      desc.     desc.       desc.     desc.     |
5338      (TTY)     (TTY)       (pipe)    (pipe)    |
5339       |          |          |         |      fake    timeouts
5340       |          |          |         |      file        |
5341       |          |          |         |      desc.       |
5342       |          |          |         |      (pipe)      |
5343       |          |          |         |        |         |
5344       |          |          |         |        |         |
5345       |          |          |         |        |         |
5346       V          V          V         V        V         V
5347       ------>-----------<----------------<----------------
5348                   |
5349                   |
5350                   | [collected using select() in emacs_tty_next_event()
5351                   |  and converted to the appropriate Emacs event]
5352                   |
5353                   |
5354                   V          (above this line is TTY-specific)
5355                 Emacs -----------------------------------------------
5356                 event (below this line is the generic event mechanism)
5357                   |
5358                   |
5359 was there     if not, call
5360 a SIGINT?  emacs_tty_next_event()
5361     |             |
5362     |             |
5363     |             |
5364     V             V
5365     --->------<----
5366            |
5367            |     [collected in event_stream_next_event();
5368            |      SIGINT is converted using maybe_read_quit_event()]
5369            V
5370          Emacs
5371          event
5372            |
5373            \---->------>----- maybe_kbd_translate() ---->---\
5374                                                             |
5375                                                             |
5376                                                             |
5377      command event queue                                    |
5378                                                if not from command
5379   (contains events that were                   event queue, call
5380   read earlier but not processed,              event_stream_next_event()
5381   typically when waiting in a                               |
5382   sit-for, sleep-for, etc. for                              |
5383  a particular event to be received)                         |
5384                |                                            |
5385                |                                            |
5386                V                                            V
5387                ---->------------------------------------<----
5388                                                |
5389                                                | [collected in
5390                                                |  next_event_internal()]
5391                                                |
5392  unread-     unread-       event from          |
5393  command-    command-       keyboard       else, call
5394  events      event           macro      next_event_internal()
5395    |           |               |               |
5396    |           |               |               |
5397    |           |               |               |
5398    V           V               V               V
5399    --------->----------------------<------------
5400                      |
5401                      |      [collected in `next-event', which may loop
5402                      |       more than once if the event it gets is on
5403                      |       a dead frame, device, etc.]
5404                      |
5405                      |
5406                      V
5407             feed into top-level event loop,
5408             which repeatedly calls `next-event'
5409             and then dispatches the event
5410             using `dispatch-event'
5411 @end example
5412
5413 Notice the separation between the TTY-specific and generic event mechanisms.
5414 When using the Xt-based event loop, the TTY-specific stuff is replaced
5415 but the rest stays the same.
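
  The bottom of the TTY diagram, where several file descriptors
(including the ``fake'' pipe fed by signal handlers) funnel into one
@code{select()} call, can be sketched with plain POSIX calls.  This is
not the code in @file{event-tty.c}; @code{first_ready_fd()} and
@code{select_demo()} are hypothetical helpers, and the demo writes to
the pipe directly rather than from a signal handler.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Wait up to TIMEOUT_USEC for any of FDS to become readable; return
   the first ready descriptor, or -1 on timeout or error. */
static int first_ready_fd(const int *fds, int count, long timeout_usec)
{
    fd_set readable;
    struct timeval tv;
    int i, maxfd = -1;

    assert(fds != NULL && count > 0);
    FD_ZERO(&readable);
    for (i = 0; i < count; i++) {
        FD_SET(fds[i], &readable);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }
    tv.tv_sec = timeout_usec / 1000000;
    tv.tv_usec = timeout_usec % 1000000;
    if (select(maxfd + 1, &readable, NULL, NULL, &tv) <= 0)
        return -1;
    for (i = 0; i < count; i++)
        if (FD_ISSET(fds[i], &readable))
            return fds[i];
    return -1;
}

/* One pipe stands in for a TTY, another for the fake signal pipe.
   Returns 1 if data on the signal pipe wakes up the collector. */
static int select_demo(void)
{
    int tty_pipe[2], signal_pipe[2], fds[2], ready = -1;

    if (pipe(tty_pipe) != 0)
        return 0;
    if (pipe(signal_pipe) != 0) {
        close(tty_pipe[0]);
        close(tty_pipe[1]);
        return 0;
    }
    fds[0] = tty_pipe[0];
    fds[1] = signal_pipe[0];
    if (write(signal_pipe[1], "!", 1) == 1)
        ready = first_ready_fd(fds, 2, 100000);
    {
        int ok = (ready == signal_pipe[0]);
        close(tty_pipe[0]);
        close(tty_pipe[1]);
        close(signal_pipe[0]);
        close(signal_pipe[1]);
        return ok;
    }
}
```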
5416
5417 It's also important to realize that only one kind of
5418 system-specific event loop can be operating at a time, and it must be able
5419 to receive all kinds of events simultaneously.  For the two existing
5420 event loops (implemented in @file{event-tty.c} and @file{event-Xt.c},
5421 respectively), the TTY event loop @emph{only} handles TTY consoles,
5422 while the Xt event loop handles @emph{both} TTY and X consoles.  This
5423 situation is different from all of the output handlers, where you simply
5424 have one per console type.
5425
5426   Here's the Xt Event Loop Diagram (notice that below a certain point,
5427 it's the same as the above diagram):
5428
5429 @example
5430 asynch. asynch. asynch. asynch.                 [Collectors in
5431  kbd     kbd    process process                    the OS]
5432 events  events  output  output
5433   |       |       |       |
5434   |       |       |       |     asynch. asynch. [Collectors in the
5435   |       |       |       |       X        X     OS and X Window System]
5436   |       |       |       |     events  events
5437   |       |       |       |       |        |
5438   |       |       |       |       |        |
5439   |       |       |       |       |        |    SIGINT, [signal handlers
5440   |       |       |       |       |        |    SIGQUIT,   in XEmacs]
5441   |       |       |       |       |        |    SIGWINCH,
5442   |       |       |       |       |        |    SIGALRM
5443   |       |       |       |       |        |       |
5444   |       |       |       |       |        |       |
5445   |       |       |       |       |        |       |      timeouts
5446   |       |       |       |       |        |       |          |
5447   |       |       |       |       |        |       |          |
5448   |       |       |       |       |        |       V          |
5449   V       V       V       V       V        V      fake        |
5450  file    file    file    file    file     file    file        |
5451  desc.   desc.   desc.   desc.   desc.    desc.   desc.       |
5452  (TTY)   (TTY)   (pipe)  (pipe) (socket) (socket) (pipe)      |
5453   |       |       |       |       |        |       |          |
5454   |       |       |       |       |        |       |          |
5455   |       |       |       |       |        |       |          |
5456   V       V       V       V       V        V       V          V
5457   --->----------------------------------------<---------<------
5458        |              |               |
5459        |              |               |[collected using select() in
5460        |              |               | _XtWaitForSomething(), called
5461        |              |               | from XtAppProcessEvent(), called
5462        |              |               | in emacs_Xt_next_event();
5463        |              |               | dispatched to various callbacks]
5464        |              |               |
5465        |              |               |
5466   emacs_Xt_        p_s_callback(),    | [popup_selection_callback]
5467   event_handler()  x_u_v_s_callback(),| [x_update_vertical_scrollbar_
5468        |           x_u_h_s_callback(),|  callback]
5469        |           search_callback()  | [x_update_horizontal_scrollbar_
5470        |              |               |  callback]
5471        |              |               |
5472        |              |               |
5473   enqueue_Xt_       signal_special_   |
5474   dispatch_event()  Xt_user_event()   |
5475   [maybe multiple     |               |
5476    times, maybe 0     |               |
5477    times]             |               |
5478        |            enqueue_Xt_       |
5479        |            dispatch_event()  |
5480        |              |               |
5481        |              |               |
5482        V              V               |
5483        -->----------<--               |
5484               |                       |
5485               |                       |
5486            dispatch             Xt_what_callback()
5487            event                  sets flags
5488            queue                      |
5489               |                       |
5490               |                       |
5491               |                       |
5492               |                       |
5493               ---->-----------<--------
5494                    |
5495                    |
5496                    |     [collected and converted as appropriate in
5497                    |            emacs_Xt_next_event()]
5498                    |
5499                    |
5500                    V          (above this line is Xt-specific)
5501                  Emacs ------------------------------------------------
5502                  event (below this line is the generic event mechanism)
5503                    |
5504                    |
5505 was there      if not, call
5506 a SIGINT?   emacs_Xt_next_event()
5507     |              |
5508     |              |
5509     |              |
5510     V              V
5511     --->-------<----
5512            |
5513            |        [collected in event_stream_next_event();
5514            |         SIGINT is converted using maybe_read_quit_event()]
5515            V
5516          Emacs
5517          event
5518            |
5519            \---->------>----- maybe_kbd_translate() -->-----\
5520                                                             |
5521                                                             |
5522                                                             |
5523      command event queue                                    |
5524                                               if not from command
5525   (contains events that were                  event queue, call
5526   read earlier but not processed,             event_stream_next_event()
5527   typically when waiting in a                               |
5528   sit-for, sleep-for, etc. for                              |
5529  a particular event to be received)                         |
5530                |                                            |
5531                |                                            |
5532                V                                            V
5533                ---->----------------------------------<------
5534                                                |
5535                                                | [collected in
5536                                                |  next_event_internal()]
5537                                                |
5538  unread-     unread-       event from          |
5539  command-    command-       keyboard       else, call
5540  events      event           macro      next_event_internal()
5541    |           |               |               |
5542    |           |               |               |
5543    |           |               |               |
5544    V           V               V               V
5545    --------->----------------------<------------
5546                      |
5547                      |      [collected in `next-event', which may loop
5548                      |       more than once if the event it gets is on
5549                      |       a dead frame, device, etc.]
5550                      |
5551                      |
5552                      V
5553             feed into top-level event loop,
5554             which repeatedly calls `next-event'
5555             and then dispatches the event
5556             using `dispatch-event'
5557 @end example
5558
5559 @node Specifics About the Emacs Event
5560 @section Specifics About the Emacs Event
5561
5562 @node The Event Stream Callback Routines
5563 @section The Event Stream Callback Routines
5564
5565 @node Other Event Loop Functions
5566 @section Other Event Loop Functions
5567
5568   @code{detect_input_pending()} and @code{input-pending-p} look for
5569 input by calling @code{event_stream->event_pending_p} and looking in
5570 @code{[V]unread-command-event} and the @code{command_event_queue} (they
5571 do not check for an executing keyboard macro, though).
5572
5573   @code{discard-input} cancels any command events pending (and any
5574 keyboard macros currently executing), and puts the others onto the
5575 @code{command_event_queue}.  There is a comment about a ``race
5576 condition'', which is not a good sign.
5577
5578   @code{next-command-event} and @code{read-char} are higher-level
5579 interfaces to @code{next-event}.  @code{next-command-event} gets the
5580 next @dfn{command} event (i.e.  keypress, mouse event, menu selection,
5581 or scrollbar action), calling @code{dispatch-event} on any others.
5582 @code{read-char} calls @code{next-command-event} and uses
5583 @code{event_to_character()} to return the character equivalent.  With
5584 the right kind of input method support, it is possible for @code{(read-char)}
5585 to return a Kanji character.
5586
5587 @node Converting Events
5588 @section Converting Events
5589
5590   @code{character_to_event()}, @code{event_to_character()},
5591 @code{event-to-character}, and @code{character-to-event} convert between
5592 characters and keypress events corresponding to the characters.  If the
5593 event was not a keypress, @code{event_to_character()} returns -1 and
5594 @code{event-to-character} returns @code{nil}.  These functions convert
5595 between character representation and the split-up event representation
5596 (keysym plus mod keys).
5597
5598 @node Dispatching Events; The Command Builder
5599 @section Dispatching Events; The Command Builder
5600
5601 Not yet documented.
5602
5603 @node Evaluation; Stack Frames; Bindings, Symbols and Variables, Events and the Event Loop, Top
5604 @chapter Evaluation; Stack Frames; Bindings
5605
5606 @menu
5607 * Evaluation::
5608 * Dynamic Binding; The specbinding Stack; Unwind-Protects::
5609 * Simple Special Forms::
5610 * Catch and Throw::
5611 @end menu
5612
5613 @node Evaluation
5614 @section Evaluation
5615
5616   @code{Feval()} evaluates the form (a Lisp object) that is passed to
5617 it.  Note that evaluation is only non-trivial for two types of objects:
5618 symbols and conses.  A symbol is evaluated simply by calling
5619 @code{symbol-value} on it and returning the value.
5620
5621   Evaluating a cons means calling a function.  First, @code{eval} checks
5622 to see if garbage-collection is necessary, and calls
5623 @code{garbage_collect_1()} if so.  It then increases the evaluation
5624 depth by 1 (@code{lisp_eval_depth}, which is always less than
5625 @code{max_lisp_eval_depth}) and adds an element to the linked list of
5626 @code{struct backtrace}'s (@code{backtrace_list}).  Each such structure
5627 contains a pointer to the function being called plus a list of the
5628 function's arguments.  Originally these values are stored unevalled, and
5629 as they are evaluated, the backtrace structure is updated.  Garbage
5630 collection pays attention to the objects pointed to in the backtrace
5631 structures (garbage collection might happen while a function is being
5632 called or while an argument is being evaluated, and there could easily
5633 be no other references to the arguments in the argument list; once an
5634 argument is evaluated, however, the unevalled version is not needed by
5635 eval, and so the backtrace structure is changed).
5636
5637 At this point, the function to be called is determined by looking at
5638 the car of the cons (if this is a symbol, its function definition is
5639 retrieved and the process repeated).  The function should then consist
5640 of either a @code{Lisp_Subr} (built-in function written in C), a
5641 @code{Lisp_Compiled_Function} object, or a cons whose car is one of the
5642 symbols @code{autoload}, @code{macro} or @code{lambda}.
5643
5644 If the function is a @code{Lisp_Subr}, the Lisp object points to a
5645 @code{struct Lisp_Subr} (created by @code{DEFUN()}), which contains a
5646 pointer to the C function, a minimum and maximum number of arguments
5647 (or possibly the special constants @code{MANY} or @code{UNEVALLED}), a
5648 pointer to the symbol referring to that subr, and a couple of other
5649 things.  If the subr wants its arguments @code{UNEVALLED}, they are
5650 passed raw as a list.  Otherwise, an array of evaluated arguments is
5651 created and put into the backtrace structure, and either passed whole
5652 (@code{MANY}) or each argument is passed as a C argument.
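
  The argument-count handling just described can be sketched in plain C.
This is a toy model under stated assumptions, not the code in
@file{eval.c}: every name beginning with @code{toy_} or @code{TOY_} is
hypothetical, Lisp objects are replaced by @code{int}s, missing optional
arguments are assumed to be filled in with 0 as a stand-in for
@code{nil}, and the @code{UNEVALLED} case is left out because the toy has
no unevaluated forms.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

#define TOY_MANY (-1)            /* stand-in for the MANY constant */

struct toy_subr
{
  int min_args;
  int max_args;                  /* a count (1 or 2 here), or TOY_MANY */
  union
  {
    int (*a1) (int);
    int (*a2) (int, int);
    int (*many) (int nargs, const int *args);
  } fn;
};

/* Call SUBR on NARGS already-evaluated arguments.  Returns 0 and stores
   the value in *RESULT, or returns -1 for a wrong argument count. */
static int
toy_call_subr (const struct toy_subr *subr, int nargs, const int *args,
               int *result)
{
  int padded[2] = { 0, 0 };      /* 0 plays the role of nil (assumption) */
  int i;

  assert (subr != NULL && result != NULL);
  if (nargs < subr->min_args)
    return -1;
  if (subr->max_args == TOY_MANY)
    {
      *result = subr->fn.many (nargs, args);   /* array passed whole */
      return 0;
    }
  if (nargs > subr->max_args || subr->max_args > 2)
    return -1;
  for (i = 0; i < nargs; i++)
    padded[i] = args[i];
  switch (subr->max_args)        /* each argument becomes a C argument */
    {
    case 1: *result = subr->fn.a1 (padded[0]); return 0;
    case 2: *result = subr->fn.a2 (padded[0], padded[1]); return 0;
    default: return -1;          /* the toy handles only 1 or 2 */
    }
}

/* Three sample "subrs" with different calling conventions. */
static int toy_negate (int x) { return -x; }
static int toy_minus (int x, int y) { return x - y; }

static int
toy_sum (int nargs, const int *args)
{
  int i, total = 0;
  for (i = 0; i < nargs; i++)
    total += args[i];
  return total;
}
```
@end verbatim

  A union of function-pointer types is used here only so that the toy
stays valid C; it is a design choice of the sketch, not a description of
@code{struct Lisp_Subr}.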
5653
5654 If the function is a @code{Lisp_Compiled_Function},
5655 @code{funcall_compiled_function()} is called.  If the function is a
5656 lambda list, @code{funcall_lambda()} is called.  If the function is a
5657 macro, [..... fill in] is done.  If the function is an autoload,
5658 @code{do_autoload()} is called to load the definition and then eval
5659 starts over [explain this more].
5660
5661 When @code{Feval()} exits, the evaluation depth is reduced by one, the
5662 debugger is called if appropriate, and the current backtrace structure
5663 is removed from the list.
5664
5665 Both @code{funcall_compiled_function()} and @code{funcall_lambda()} need
5666 to go through the list of formal parameters to the function and bind
5667 them to the actual arguments, checking for @code{&rest} and
5668 @code{&optional} symbols in the formal parameters and making sure the
5669 number of actual arguments is correct.
5670 @code{funcall_compiled_function()} can do this a little more
5671 efficiently, since the formal parameter list can be checked for sanity
5672 when the compiled function object is created.
5673
5674 @code{funcall_lambda()} simply calls @code{Fprogn} to execute the code
5675 in the lambda list.
5676
5677 @code{funcall_compiled_function()} calls the real byte-code interpreter
5678 @code{execute_optimized_program()} on the byte-code instructions, which
5679 are converted into an internal form for faster execution.
5680
5681 When a compiled function is executed for the first time by
5682 @code{funcall_compiled_function()}, or when it is @code{Fpurecopy()}ed
5683 during the dump phase of building XEmacs, the byte-code instructions are
5684 converted from a @code{Lisp_String} (which is inefficient to access,
5685 especially in the presence of MULE) into a @code{Lisp_Opaque} object
5686 containing an array of unsigned char, which can be directly executed by
5687 the byte-code interpreter.  At this time the byte code is also analyzed
5688 for validity and transformed into a more optimized form, so that
5689 @code{execute_optimized_program()} can really fly.
5690
5691 Here are some of the optimizations performed by the internal byte-code
5692 transformer:
5693 @enumerate
5694 @item
5695 References to the @code{constants} array are checked for out-of-range
5696 indices, so that the byte interpreter doesn't have to.
5697 @item
5698 References to the @code{constants} array that will be used as a Lisp
5699 variable are checked for being correct non-constant (i.e. not @code{t},
5700 @code{nil}, or @code{keywordp}) symbols, so that the byte interpreter
5701 doesn't have to.
5702 @item
The maximum number of variable bindings in the byte-code is
5704 pre-computed, so that space on the @code{specpdl} stack can be
5705 pre-reserved once for the whole function execution.
5706 @item
5707 All byte-code jumps are relative to the current program counter instead
5708 of the start of the program, thereby saving a register.
5709 @item
5710 One-byte relative jumps are converted from the byte-code form of unsigned
5711 chars offset by 127 to machine-friendly signed chars.
5712 @end enumerate
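
  The arithmetic behind the last item can be sketched as follows.  The
sketch assumes only what the item states, namely that a one-byte
relative jump operand is stored as an unsigned char biased by 127.  The
function names are hypothetical and are not taken from
@file{bytecode.c}; in particular, how the real transformer treats a
displacement that does not fit in a signed char is not covered here.

@verbatim
```c
#include <assert.h>
#include <limits.h>

/* Turn a biased one-byte jump operand (0..255, meaning
   displacement + 127) into a plain signed displacement. */
static int
toy_unbias_jump (unsigned char biased)
{
  return (int) biased - 127;               /* range -127 .. 128 */
}

/* Inverse operation, as needed when the original byte code has to be
   reconstructed from the transformed form. */
static int
toy_rebias_jump (int displacement)
{
  assert (displacement >= -127 && displacement <= 128);
  return displacement + 127;
}

/* Whether a displacement can be stored in a signed char at all.  The
   biased operand 255 (displacement 128) cannot on the usual 8-bit
   char; the toy reports this instead of guessing what real code does. */
static int
toy_fits_signed_char (int displacement)
{
  return displacement >= SCHAR_MIN && displacement <= SCHAR_MAX;
}
```
@end verbatim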
5713
5714 Of course, this transformation of the @code{instructions} should not be
5715 visible to the user, so @code{Fcompiled_function_instructions()} needs
5716 to know how to convert the optimized opaque object back into a Lisp
5717 string that is identical to the original string from the @file{.elc}
5718 file.  (Actually, the resulting string may (rarely) contain slightly
5719 different, yet equivalent, byte code.)
5720
5721 @code{Ffuncall()} implements Lisp @code{funcall}.  @code{(funcall fun
5722 x1 x2 x3 ...)} is equivalent to @code{(eval (list fun (quote x1) (quote
5723 x2) (quote x3) ...))}.  @code{Ffuncall()} contains its own code to do
5724 the evaluation, however, and is very similar to @code{Feval()}.
5725
5726 From the performance point of view, it is worth knowing that most of the
5727 time in Lisp evaluation is spent executing @code{Lisp_Subr} and
5728 @code{Lisp_Compiled_Function} objects via @code{Ffuncall()} (not
5729 @code{Feval()}).
5730
5731 @code{Fapply()} implements Lisp @code{apply}, which is very similar to
5732 @code{funcall} except that if the last argument is a list, the result is the
5733 same as if each of the arguments in the list had been passed separately.
5734 @code{Fapply()} does some business to expand the last argument if it's a
5735 list, then calls @code{Ffuncall()} to do the work.
5736
5737 @code{apply1()}, @code{call0()}, @code{call1()}, @code{call2()}, and
5738 @code{call3()} call a function, passing it the argument(s) given (the
5739 arguments are given as separate C arguments rather than being passed as
5740 an array).  @code{apply1()} uses @code{Fapply()} while the others use
5741 @code{Ffuncall()} to do the real work.
5742
5743 @node Dynamic Binding; The specbinding Stack; Unwind-Protects
5744 @section Dynamic Binding; The specbinding Stack; Unwind-Protects
5745
5746 @example
5747 struct specbinding
5748 @{
5749   Lisp_Object symbol;
5750   Lisp_Object old_value;
5751   Lisp_Object (*func) (Lisp_Object); /* for unwind-protect */
5752 @};
5753 @end example
5754
5755   @code{struct specbinding} is used for local-variable bindings and
5756 unwind-protects.  @code{specpdl} holds an array of @code{struct specbinding}'s,
5757 @code{specpdl_ptr} points to the beginning of the free bindings in the
5758 array, @code{specpdl_size} specifies the total number of binding slots
5759 in the array, and @code{max_specpdl_size} specifies the maximum number
5760 of bindings the array can be expanded to hold.  @code{grow_specpdl()}
5761 increases the size of the @code{specpdl} array, multiplying its size by
5762 2 but never exceeding @code{max_specpdl_size} (except that if this
5763 number is less than 400, it is first set to 400).
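
  The sizing rule can be written down as a small function.  This is a
sketch of the rule as stated in the paragraph above, not the body of
@code{grow_specpdl()}; the function name and the convention of returning
0 when the array is already at its maximum are assumptions made for
illustration.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

/* Compute the size the array should grow to.  *MAX_SIZE is raised to
   400 first if it is smaller.  Returns 0 when CURRENT has already
   reached the maximum (a real implementation would presumably signal
   an error at that point). */
static size_t
toy_next_specpdl_size (size_t current, size_t *max_size)
{
  size_t doubled;

  assert (current > 0 && max_size != NULL);
  if (*max_size < 400)
    *max_size = 400;
  if (current >= *max_size)
    return 0;
  doubled = current * 2;
  return doubled > *max_size ? *max_size : doubled;
}
```
@end verbatim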
5764
  @code{specbind()} binds a symbol to a value and is used for local
variables and @code{let} forms.  The symbol and its old value (which
might be @code{Qunbound}, indicating no prior value) are recorded in the
specpdl array, and @code{specpdl_ptr} is advanced by one slot.
5769
5770   @code{record_unwind_protect()} implements an @dfn{unwind-protect},
5771 which, when placed around a section of code, ensures that some specified
5772 cleanup routine will be executed even if the code exits abnormally
5773 (e.g. through a @code{throw} or quit).  @code{record_unwind_protect()}
5774 simply adds a new specbinding to the @code{specpdl} array and stores the
5775 appropriate information in it.  The cleanup routine can either be a C
5776 function, which is stored in the @code{func} field, or a @code{progn}
5777 form, which is stored in the @code{old_value} field.
5778
5779   @code{unbind_to()} removes specbindings from the @code{specpdl} array
5780 until the specified position is reached.  Each specbinding can be one of
5781 three types:
5782
5783 @enumerate
5784 @item
5785 an unwind-protect with a C cleanup function (@code{func} is not 0, and
5786 @code{old_value} holds an argument to be passed to the function);
5787 @item
5788 an unwind-protect with a Lisp form (@code{func} is 0, @code{symbol}
5789 is @code{nil}, and @code{old_value} holds the form to be executed with
5790 @code{Fprogn()}); or
5791 @item
5792 a local-variable binding (@code{func} is 0, @code{symbol} is not
5793 @code{nil}, and @code{old_value} holds the old value, which is stored as
5794 the symbol's value).
5795 @end enumerate
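
  The interplay of binding, unwind-protect and unbinding can be
illustrated with a toy stack.  Everything named @code{toy_} or
@code{TOY_} below is hypothetical: values are plain @code{int}s, the
array has a fixed size, a null @code{symbol} pointer stands in for
@code{nil}, and the Lisp-form kind of unwind-protect is only noted in a
comment because the toy has no evaluator.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_symbol { const char *name; int value; } toy_symbol;

typedef struct toy_specbinding
{
  toy_symbol *symbol;            /* NULL plays the role of nil */
  int old_value;                 /* old value, or cleanup argument */
  void (*func) (int);            /* C cleanup function, or NULL */
} toy_specbinding;

#define TOY_SPECPDL_SLOTS 16
static toy_specbinding toy_specpdl[TOY_SPECPDL_SLOTS];
static toy_specbinding *toy_specpdl_ptr = toy_specpdl;  /* first free slot */

static size_t
toy_specpdl_depth (void)
{
  return (size_t) (toy_specpdl_ptr - toy_specpdl);
}

/* Record the old value of SYM, then give SYM its new value. */
static void
toy_specbind (toy_symbol *sym, int value)
{
  assert (sym != NULL && toy_specpdl_depth () < TOY_SPECPDL_SLOTS);
  toy_specpdl_ptr->symbol = sym;
  toy_specpdl_ptr->old_value = sym->value;
  toy_specpdl_ptr->func = NULL;
  toy_specpdl_ptr++;
  sym->value = value;
}

/* Arrange for FUNC (ARG) to run when the stack is unwound. */
static void
toy_record_unwind_protect (void (*func) (int), int arg)
{
  assert (func != NULL && toy_specpdl_depth () < TOY_SPECPDL_SLOTS);
  toy_specpdl_ptr->symbol = NULL;
  toy_specpdl_ptr->old_value = arg;
  toy_specpdl_ptr->func = func;
  toy_specpdl_ptr++;
}

/* Pop entries, newest first, until only DEPTH of them remain. */
static void
toy_unbind_to (size_t depth)
{
  while (toy_specpdl_depth () > depth)
    {
      toy_specbinding *b = --toy_specpdl_ptr;
      if (b->func != NULL)
        b->func (b->old_value);             /* C cleanup function */
      else if (b->symbol != NULL)
        b->symbol->value = b->old_value;    /* restore the variable */
      /* else: a Lisp form to run; not modelled in this toy */
    }
}

/* Sample cleanup routine so that the effect can be observed. */
static int toy_cleanup_total = 0;
static void toy_add_to_total (int n) { toy_cleanup_total += n; }
```
@end verbatim

  A caller would remember @code{toy_specpdl_depth ()} before making its
bindings and pass that value to @code{toy_unbind_to ()} on the way out.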
5796
5797 @node Simple Special Forms
5798 @section Simple Special Forms
5799
5800 @code{or}, @code{and}, @code{if}, @code{cond}, @code{progn},
5801 @code{prog1}, @code{prog2}, @code{setq}, @code{quote}, @code{function},
5802 @code{let*}, @code{let}, @code{while}
5803
5804 All of these are very simple and work as expected, calling
5805 @code{Feval()} or @code{Fprogn()} as necessary and (in the case of
5806 @code{let} and @code{let*}) using @code{specbind()} to create bindings
5807 and @code{unbind_to()} to undo the bindings when finished.
5808
Note that, with the exception of @code{Fprogn}, these functions are
5810 typically called in real life only in interpreted code, since the byte
5811 compiler knows how to convert calls to these functions directly into
5812 byte code.
5813
5814 @node Catch and Throw
5815 @section Catch and Throw
5816
5817 @example
5818 struct catchtag
5819 @{
5820   Lisp_Object tag;
5821   Lisp_Object val;
5822   struct catchtag *next;
5823   struct gcpro *gcpro;
5824   jmp_buf jmp;
5825   struct backtrace *backlist;
5826   int lisp_eval_depth;
5827   int pdlcount;
5828 @};
5829 @end example
5830
5831   @code{catch} is a Lisp function that places a catch around a body of
5832 code.  A catch is a means of non-local exit from the code.  When a catch
5833 is created, a tag is specified, and executing a @code{throw} to this tag
5834 will exit from the body of code caught with this tag, and its value will
5835 be the value given in the call to @code{throw}.  If there is no such
5836 call, the code will be executed normally.
5837
5838   Information pertaining to a catch is held in a @code{struct catchtag},
5839 which is placed at the head of a linked list pointed to by
5840 @code{catchlist}.  @code{internal_catch()} is passed a C function to
5841 call (@code{Fprogn()} when Lisp @code{catch} is called) and arguments to
5842 give it, and places a catch around the function.  Each @code{struct
5843 catchtag} is held in the stack frame of the @code{internal_catch()}
5844 instance that created the catch.
5845
5846   @code{internal_catch()} is fairly straightforward.  It stores into the
5847 @code{struct catchtag} the tag name and the current values of
5848 @code{backtrace_list}, @code{lisp_eval_depth}, @code{gcprolist}, and the
5849 offset into the @code{specpdl} array, sets a jump point with @code{_setjmp()}
5850 (storing the jump point into the @code{struct catchtag}), and calls the
5851 function.  Control will return to @code{internal_catch()} either when
5852 the function exits normally or through a @code{_longjmp()} to this jump
5853 point.  In the latter case, @code{throw} will store the value to be
5854 returned into the @code{struct catchtag} before jumping.  When it's
5855 done, @code{internal_catch()} removes the @code{struct catchtag} from
5856 the catchlist and returns the proper value.
5857
5858   @code{Fthrow()} goes up through the catchlist until it finds one with
5859 a matching tag.  It then calls @code{unbind_catch()} to restore
5860 everything to what it was when the appropriate catch was set, stores the
5861 return value in the @code{struct catchtag}, and jumps (with
5862 @code{_longjmp()}) to its jump point.
5863
5864   @code{unbind_catch()} removes all catches from the catchlist until it
5865 finds the correct one.  Some of the catches might have been placed for
5866 error-trapping, and if so, the appropriate entries on the handlerlist
5867 must be removed (see ``errors'').  @code{unbind_catch()} also restores
5868 the values of @code{gcprolist}, @code{backtrace_list}, and
@code{lisp_eval_depth}, and calls @code{unbind_to()} to undo any specbindings
5870 created since the catch.
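
  The control flow described in this section can be reproduced in
miniature with the standard @code{setjmp()} and @code{longjmp()}.  The
following is a sketch under several assumptions and is not the XEmacs
code: all @code{toy_} names are hypothetical, tags are C strings
compared by content, values are @code{int}s, and the backtrace, gcpro
and specpdl bookkeeping is omitted entirely.  The @code{val} member is
declared @code{volatile} because it is written between the
@code{setjmp()} and the @code{longjmp()}.

@verbatim
```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <string.h>

struct toy_catchtag
{
  const char *tag;
  volatile int val;              /* value handed over by toy_throw */
  struct toy_catchtag *next;
  jmp_buf jmp;
};

static struct toy_catchtag *toy_catchlist = NULL;

/* Call FUNC (ARG) with a catch for TAG around it.  The catchtag lives
   in this function's stack frame. */
static int
toy_internal_catch (const char *tag, int (*func) (int), int arg)
{
  struct toy_catchtag c;
  int result;

  assert (tag != NULL && func != NULL);
  c.tag = tag;
  c.val = 0;
  c.next = toy_catchlist;
  toy_catchlist = &c;

  if (setjmp (c.jmp) != 0)
    {
      /* Arrived here through longjmp from toy_throw. */
      toy_catchlist = c.next;
      return c.val;
    }
  result = func (arg);           /* normal exit */
  toy_catchlist = c.next;
  return result;
}

/* Find the innermost catch for TAG, drop the catches inside it, store
   VALUE and jump.  With no matching catch the toy simply returns. */
static void
toy_throw (const char *tag, int value)
{
  struct toy_catchtag *c;

  for (c = toy_catchlist; c != NULL; c = c->next)
    if (strcmp (c->tag, tag) == 0)
      {
        toy_catchlist = c;       /* the part played by unbind_catch() */
        c->val = value;
        longjmp (c->jmp, 1);
      }
}

/* Sample bodies: one returns normally, one throws, one nests a catch. */
static int toy_body_returns (int n) { return n + 1; }

static int
toy_body_throws (int n)
{
  toy_throw ("done", n * 2);
  return -1;                     /* reached only if nothing caught it */
}

static int
toy_body_nested (int n)
{
  return toy_internal_catch ("inner", toy_body_throws, n) + 100;
}
```
@end verbatim

  In @code{toy_body_nested ()} the throw to @code{"done"} passes
straight through the inner catch, so the addition of 100 never happens.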
5871
5872
5873 @node Symbols and Variables, Buffers and Textual Representation, Evaluation; Stack Frames; Bindings, Top
5874 @chapter Symbols and Variables
5875
5876 @menu
5877 * Introduction to Symbols::
5878 * Obarrays::
5879 * Symbol Values::
5880 @end menu
5881
5882 @node Introduction to Symbols
5883 @section Introduction to Symbols
5884
5885   A symbol is basically just an object with four fields: a name (a
5886 string), a value (some Lisp object), a function (some Lisp object), and
5887 a property list (usually a list of alternating keyword/value pairs).
5888 What makes symbols special is that there is usually only one symbol with
5889 a given name, and the symbol is referred to by name.  This makes a
5890 symbol a convenient way of calling up data by name, i.e. of implementing
5891 variables. (The variable's value is stored in the @dfn{value slot}.)
5892 Similarly, functions are referenced by name, and the definition of the
5893 function is stored in a symbol's @dfn{function slot}.  This means that
5894 there can be a distinct function and variable with the same name.  The
5895 property list is used as a more general mechanism of associating
5896 additional values with particular names, and once again the namespace is
5897 independent of the function and variable namespaces.
5898
5899 @node Obarrays
5900 @section Obarrays
5901
5902   The identity of symbols with their names is accomplished through a
5903 structure called an obarray, which is just a poorly-implemented hash
5904 table mapping from strings to symbols whose name is that string. (I say
5905 ``poorly implemented'' because an obarray appears in Lisp as a vector
5906 with some hidden fields rather than as its own opaque type.  This is an
5907 Emacs Lisp artifact that should be fixed.)
5908
5909   Obarrays are implemented as a vector of some fixed size (which should
5910 be a prime for best results), where each ``bucket'' of the vector
5911 contains one or more symbols, threaded through a hidden @code{next}
5912 field in the symbol.  Lookup of a symbol in an obarray, and adding a
5913 symbol to an obarray, is accomplished through standard hash-table
5914 techniques.
5915
5916   The standard Lisp function for working with symbols and obarrays is
5917 @code{intern}.  This looks up a symbol in an obarray given its name; if
5918 it's not found, a new symbol is automatically created with the specified
5919 name, added to the obarray, and returned.  This is what happens when the
5920 Lisp reader encounters a symbol (or more precisely, encounters the name
5921 of a symbol) in some text that it is reading.  There is a standard
5922 obarray called @code{obarray} that is used for this purpose, although
5923 the Lisp programmer is free to create his own obarrays and @code{intern}
5924 symbols in them.
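
  As a rough illustration of the bucket-and-chain layout and of what
@code{intern} does, here is a minimal sketch in C.  It is not the code
in @file{symbols.c}: the @code{toy_} names, the hash function and the
bucket count of 17 are all assumptions chosen for the example, and a
symbol is reduced to its name plus the hidden @code{next} link.
@code{toy_intern_soft ()} models a lookup that never creates a symbol.

@verbatim
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_OBARRAY_SIZE 17      /* a prime, as recommended above */

struct toy_symbol
{
  char *name;
  struct toy_symbol *next;       /* hidden link to the next symbol in
                                    the same bucket */
};

struct toy_obarray
{
  struct toy_symbol *buckets[TOY_OBARRAY_SIZE];
};

static unsigned
toy_hash (const char *name)
{
  unsigned h = 0;
  while (*name != '\0')
    h = h * 31u + (unsigned char) *name++;
  return h % TOY_OBARRAY_SIZE;
}

/* Look NAME up; return NULL if no such symbol is in the obarray. */
static struct toy_symbol *
toy_intern_soft (struct toy_obarray *ob, const char *name)
{
  struct toy_symbol *s;

  assert (ob != NULL && name != NULL);
  for (s = ob->buckets[toy_hash (name)]; s != NULL; s = s->next)
    if (strcmp (s->name, name) == 0)
      return s;
  return NULL;
}

/* Look NAME up, creating and adding a new symbol if necessary.
   Returns NULL only if memory runs out. */
static struct toy_symbol *
toy_intern (struct toy_obarray *ob, const char *name)
{
  unsigned h = toy_hash (name);
  struct toy_symbol *s = toy_intern_soft (ob, name);

  if (s != NULL)
    return s;
  s = (struct toy_symbol *) malloc (sizeof *s);
  if (s == NULL)
    return NULL;
  s->name = (char *) malloc (strlen (name) + 1);
  if (s->name == NULL)
    {
      free (s);
      return NULL;
    }
  strcpy (s->name, name);
  s->next = ob->buckets[h];      /* push onto the front of the chain */
  ob->buckets[h] = s;
  return s;
}
```
@end verbatim

  Interning the same name twice yields the same pointer, which is the
property that lets symbols be compared by identity.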
5925
5926   Note that, once a symbol is in an obarray, it stays there until
5927 something is done about it, and the standard obarray @code{obarray}
5928 always stays around, so once you use any particular variable name, a
5929 corresponding symbol will stay around in @code{obarray} until you exit
5930 XEmacs.
5931
5932   Note that @code{obarray} itself is a variable, and as such there is a
5933 symbol in @code{obarray} whose name is @code{"obarray"} and which
5934 contains @code{obarray} as its value.
5935
5936   Note also that this call to @code{intern} occurs only when in the Lisp
5937 reader, not when the code is executed (at which point the symbol is
5938 already around, stored as such in the definition of the function).
5939
5940   You can create your own obarray using @code{make-vector} (this is
5941 horrible but is an artifact) and intern symbols into that obarray.
5942 Doing that will result in two or more symbols with the same name.
5943 However, at most one of these symbols is in the standard @code{obarray}:
5944 You cannot have two symbols of the same name in any particular obarray.
5945 Note that you cannot add a symbol to an obarray in any fashion other
5946 than using @code{intern}: i.e. you can't take an existing symbol and put
5947 it in an existing obarray.  Nor can you change the name of an existing
5948 symbol. (Since obarrays are vectors, you can violate the consistency of
5949 things by storing directly into the vector, but let's ignore that
5950 possibility.)
5951
5952   Usually symbols are created by @code{intern}, but if you really want,
5953 you can explicitly create a symbol using @code{make-symbol}, giving it
5954 some name.  The resulting symbol is not in any obarray (i.e. it is
5955 @dfn{uninterned}), and you can't add it to any obarray.  Therefore its
5956 primary purpose is as a symbol to use in macros to avoid namespace
5957 pollution.  It can also be used as a carrier of information, but cons
5958 cells could probably be used just as well.
5959
5960   You can also use @code{intern-soft} to look up a symbol but not create
5961 a new one, and @code{unintern} to remove a symbol from an obarray.  This
5962 returns the removed symbol. (Remember: You can't put the symbol back
5963 into any obarray.) Finally, @code{mapatoms} maps over all of the symbols
5964 in an obarray.
5965
5966 @node Symbol Values
5967 @section Symbol Values
5968
5969   The value field of a symbol normally contains a Lisp object.  However,
5970 a symbol can be @dfn{unbound}, meaning that it logically has no value.
5971 This is internally indicated by storing a special Lisp object, called
5972 @dfn{the unbound marker} and stored in the global variable
5973 @code{Qunbound}.  The unbound marker is of a special Lisp object type
5974 called @dfn{symbol-value-magic}.  It is impossible for the Lisp
5975 programmer to directly create or access any object of this type.
5976
5977   @strong{You must not let any ``symbol-value-magic'' object escape to
5978 the Lisp level.}  Printing any of these objects will cause the message
5979 @samp{INTERNAL EMACS BUG} to appear as part of the print representation.
5980 (You may see this normally when you call @code{debug_print()} from the
5981 debugger on a Lisp object.) If you let one of these objects escape to
5982 the Lisp level, you will violate a number of assumptions contained in
5983 the C code and make the unbound marker not function right.
5984
5985   When a symbol is created, its value field (and function field) are set
5986 to @code{Qunbound}.  The Lisp programmer can restore these conditions
5987 later using @code{makunbound} or @code{fmakunbound}, and can query to
see whether the value or function fields are @dfn{bound} (i.e. have a
5989 value other than @code{Qunbound}) using @code{boundp} and
5990 @code{fboundp}.  The fields are set to a normal Lisp object using
5991 @code{set} (or @code{setq}) and @code{fset}.
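
  The idea of a distinguished marker object can be shown in a few lines
of C.  This is a toy model only: the @code{toy_} names are hypothetical,
a Lisp object is reduced to a struct holding an @code{int}, and the
marker is recognized purely by its address.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

struct toy_object { int payload; };

/* One distinguished object; only its address matters. */
static struct toy_object toy_unbound_marker;
#define TOY_QUNBOUND (&toy_unbound_marker)

struct toy_symbol
{
  const char *name;
  struct toy_object *value;
  struct toy_object *function;
};

static void
toy_init_symbol (struct toy_symbol *sym, const char *name)
{
  assert (sym != NULL);
  sym->name = name;
  sym->value = TOY_QUNBOUND;     /* new symbols start out unbound */
  sym->function = TOY_QUNBOUND;
}

static int
toy_boundp (const struct toy_symbol *sym)
{
  return sym->value != TOY_QUNBOUND;
}

static int
toy_fboundp (const struct toy_symbol *sym)
{
  return sym->function != TOY_QUNBOUND;
}

static void
toy_set (struct toy_symbol *sym, struct toy_object *val)
{
  assert (val != TOY_QUNBOUND);  /* the marker must not leak in as data */
  sym->value = val;
}

static void
toy_makunbound (struct toy_symbol *sym)
{
  sym->value = TOY_QUNBOUND;
}
```
@end verbatim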
5992
5993   Other symbol-value-magic objects are used as special markers to
5994 indicate variables that have non-normal properties.  This includes any
5995 variables that are tied into C variables (setting the variable magically
5996 sets some global variable in the C code, and likewise for retrieving the
5997 variable's value), variables that magically tie into slots in the
5998 current buffer, variables that are buffer-local, etc.  The
5999 symbol-value-magic object is stored in the value cell in place of
6000 a normal object, and the code to retrieve a symbol's value
6001 (i.e. @code{symbol-value}) knows how to do special things with them.
6002 This means that you should not just fetch the value cell directly if you
6003 want a symbol's value.
6004
6005   The exact workings of this are rather complex and involved and are
6006 well-documented in comments in @file{buffer.c}, @file{symbols.c}, and
6007 @file{lisp.h}.
6008
6009 @node Buffers and Textual Representation, MULE Character Sets and Encodings, Symbols and Variables, Top
6010 @chapter Buffers and Textual Representation
6011
6012 @menu
6013 * Introduction to Buffers::     A buffer holds a block of text such as a file.
6014 * The Text in a Buffer::        Representation of the text in a buffer.
6015 * Buffer Lists::                Keeping track of all buffers.
6016 * Markers and Extents::         Tagging locations within a buffer.
6017 * Bufbytes and Emchars::        Representation of individual characters.
6018 * The Buffer Object::           The Lisp object corresponding to a buffer.
6019 @end menu
6020
6021 @node Introduction to Buffers
6022 @section Introduction to Buffers
6023
6024   A buffer is logically just a Lisp object that holds some text.
6025 In this, it is like a string, but a buffer is optimized for
6026 frequent insertion and deletion, while a string is not.  Furthermore:
6027
6028 @enumerate
6029 @item
6030 Buffers are @dfn{permanent} objects, i.e. once you create them, they
6031 remain around, and need to be explicitly deleted before they go away.
6032 @item
6033 Each buffer has a unique name, which is a string.  Buffers are
6034 normally referred to by name.  In this respect, they are like
6035 symbols.
6036 @item
6037 Buffers have a default insertion position, called @dfn{point}.
6038 Inserting text (unless you explicitly give a position) goes at point,
6039 and moves point forward past the text.  This is what is going on when
6040 you type text into Emacs.
6041 @item
6042 Buffers have lots of extra properties associated with them.
6043 @item
6044 Buffers can be @dfn{displayed}.  What this means is that there
6045 exist a number of @dfn{windows}, which are objects that correspond
6046 to some visible section of your display, and each window has
6047 an associated buffer, and the current contents of the buffer
6048 are shown in that section of the display.  The redisplay mechanism
6049 (which takes care of doing this) knows how to look at the
6050 text of a buffer and come up with some reasonable way of displaying
6051 this.  Many of the properties of a buffer control how the
6052 buffer's text is displayed.
6053 @item
6054 One buffer is distinguished and called the @dfn{current buffer}.  It is
6055 stored in the variable @code{current_buffer}.  Buffer operations operate
6056 on this buffer by default.  When you are typing text into a buffer, the
6057 buffer you are typing into is always @code{current_buffer}.  Switching
6058 to a different window changes the current buffer.  Note that Lisp code
6059 can temporarily change the current buffer using @code{set-buffer} (often
6060 enclosed in a @code{save-excursion} so that the former current buffer
6061 gets restored when the code is finished).  However, calling
6062 @code{set-buffer} will NOT cause a permanent change in the current
6063 buffer.  The reason for this is that the top-level event loop sets
6064 @code{current_buffer} to the buffer of the selected window, each time
6065 it finishes executing a user command.
6066 @end enumerate
6067
6068   Make sure you understand the distinction between @dfn{current buffer}
6069 and @dfn{buffer of the selected window}, and the distinction between
6070 @dfn{point} of the current buffer and @dfn{window-point} of the selected
6071 window. (This latter distinction is explained in detail in the section
6072 on windows.)
6073
6074 @node The Text in a Buffer
6075 @section The Text in a Buffer
6076
6077   The text in a buffer consists of a sequence of zero or more
6078 characters.  A @dfn{character} is an integer that logically represents
6079 a letter, number, space, or other unit of text.  Most of the characters
6080 that you will typically encounter belong to the ASCII set of characters,
6081 but there are also characters for various sorts of accented letters,
6082 special symbols, Chinese and Japanese ideograms (i.e. Kanji, Katakana,
6083 etc.), Cyrillic and Greek letters, etc.  The actual number of possible
6084 characters is quite large.
6085
6086   For now, we can view a character as some non-negative integer that
6087 has some shape that defines how it typically appears (e.g. as an
6088 uppercase A). (The exact way in which a character appears depends on the
6089 font used to display the character.) The internal type of characters in
6090 the C code is an @code{Emchar}; this is just an @code{int}, but using a
6091 symbolic type makes the code clearer.
6092
6093   Between every character in a buffer is a @dfn{buffer position} or
6094 @dfn{character position}.  We can speak of the character before or after
6095 a particular buffer position, and when you insert a character at a
6096 particular position, all characters after that position end up at new
6097 positions.  When we speak of the character @dfn{at} a position, we
6098 really mean the character after the position.  (This schizophrenia
6099 between a buffer position being ``between'' a character and ``on'' a
6100 character is rampant in Emacs.)
6101
6102   Buffer positions are numbered starting at 1.  This means that
6103 position 1 is before the first character, and position 0 is not
6104 valid.  If there are N characters in a buffer, then buffer
6105 position N+1 is after the last one, and position N+2 is not valid.
6106
6107   The internal makeup of the Emchar integer varies depending on whether
6108 we have compiled with MULE support.  If not, the Emchar integer is an
6109 8-bit integer with possible values from 0 - 255.  0 - 127 are the
6110 standard ASCII characters, while 128 - 255 are the characters from the
6111 ISO-8859-1 character set.  If we have compiled with MULE support, an
6112 Emchar is a 19-bit integer, with the various bits having meanings
6113 according to a complex scheme that will be detailed later.  The
6114 characters numbered 0 - 255 still have the same meanings as for the
6115 non-MULE case, though.
6116
6117   Internally, the text in a buffer is represented in a fairly simple
6118 fashion: as a contiguous array of bytes, with a @dfn{gap} of some size
6119 in the middle.  Although the gap is of some substantial size in bytes,
6120 there is no text contained within it: From the perspective of the text
6121 in the buffer, it does not exist.  The gap logically sits at some buffer
6122 position, between two characters (or possibly at the beginning or end of
6123 the buffer).  Insertion of text in a buffer at a particular position is
6124 always accomplished by first moving the gap to that position
6125 (i.e. through some block moving of text), then writing the text into the
6126 beginning of the gap, thereby shrinking the gap.  If the gap shrinks
6127 down to nothing, a new gap is created. (What actually happens is that a
6128 new gap is ``created'' at the end of the buffer's text, which requires
6129 nothing more than changing a couple of indices; then the gap is
6130 ``moved'' to the position where the insertion needs to take place by
6131 moving up in memory all the text after that position.)  Similarly,
6132 deletion occurs by moving the gap to the place where the text is to be
6133 deleted, and then simply expanding the gap to include the deleted text.
6134 (@dfn{Expanding} and @dfn{shrinking} the gap as just described means
6135 just that the internal indices that keep track of where the gap is
6136 located are changed.)
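
  A gap buffer of this kind fits in a few dozen lines of C.  The sketch
below is a simplified illustration rather than the XEmacs
implementation: the @code{toy_} names are hypothetical, the memory block
has a fixed capacity (so a new gap is never created), positions are
0-based offsets for convenience in C, and one character is one byte.

@verbatim
```c
#include <assert.h>
#include <string.h>

#define TOY_CAPACITY 32

struct toy_text
{
  unsigned char mem[TOY_CAPACITY];
  int gap_start;                 /* memory offset of the first gap byte */
  int gap_size;
};

static void
toy_init (struct toy_text *t)
{
  t->gap_start = 0;
  t->gap_size = TOY_CAPACITY;    /* empty text: the gap is everything */
}

static int
toy_length (const struct toy_text *t)
{
  return TOY_CAPACITY - t->gap_size;
}

/* Move the gap so that it sits at text offset POS. */
static void
toy_move_gap (struct toy_text *t, int pos)
{
  assert (pos >= 0 && pos <= toy_length (t));
  if (pos < t->gap_start)        /* slide text up, past the gap */
    memmove (t->mem + pos + t->gap_size, t->mem + pos,
             (size_t) (t->gap_start - pos));
  else if (pos > t->gap_start)   /* slide text down, into the gap */
    memmove (t->mem + t->gap_start,
             t->mem + t->gap_start + t->gap_size,
             (size_t) (pos - t->gap_start));
  t->gap_start = pos;
}

/* Insert S at POS; the gap shrinks.  Returns -1 if it cannot be done. */
static int
toy_insert (struct toy_text *t, int pos, const char *s)
{
  int n = (int) strlen (s);

  if (pos < 0 || pos > toy_length (t) || n > t->gap_size)
    return -1;
  toy_move_gap (t, pos);
  memcpy (t->mem + t->gap_start, s, (size_t) n);
  t->gap_start += n;
  t->gap_size -= n;
  return 0;
}

/* Delete N bytes starting at POS; the gap just grows over them. */
static int
toy_delete (struct toy_text *t, int pos, int n)
{
  if (pos < 0 || n < 0 || pos + n > toy_length (t))
    return -1;
  toy_move_gap (t, pos);
  t->gap_size += n;
  return 0;
}

/* Copy the text, without the gap, into OUT as a C string. */
static void
toy_contents (const struct toy_text *t, char *out)
{
  int after = toy_length (t) - t->gap_start;

  memcpy (out, t->mem, (size_t) t->gap_start);
  memcpy (out + t->gap_start, t->mem + t->gap_start + t->gap_size,
          (size_t) after);
  out[toy_length (t)] = '\0';
}
```
@end verbatim

  Note that @code{toy_delete ()} touches no text bytes beyond what
moving the gap requires; only the two indices change.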
6137
6138   Note that the total amount of memory allocated for a buffer text never
6139 decreases while the buffer is live.  Therefore, if you load up a
6140 20-megabyte file and then delete all but one character, there will be a
6141 20-megabyte gap, which won't get any smaller (except by inserting
6142 characters back again).  Once the buffer is killed, the memory allocated
6143 for the buffer text will be freed, but it will still be sitting on the
6144 heap, taking up virtual memory, and will not be released back to the
6145 operating system. (However, if you have compiled XEmacs with rel-alloc,
6146 the situation is different.  In this case, the space @emph{will} be
6147 released back to the operating system.  However, this tends to result in a
6148 noticeable speed penalty.)
6149
6150   Astute readers may notice that the text in a buffer is represented as
6151 an array of @emph{bytes}, while (at least in the MULE case) an Emchar is
6152 a 19-bit integer, which clearly cannot fit in a byte.  This means (of
6153 course) that the text in a buffer uses a different representation from
6154 an Emchar: specifically, the 19-bit Emchar becomes a series of one to
6155 four bytes.  The conversion between these two representations is complex
6156 and will be described later.
6157
6158   In the non-MULE case, everything is very simple: An Emchar
6159 is an 8-bit value, which fits neatly into one byte.
6160
6161   If we are given a buffer position and want to retrieve the
6162 character at that position, we need to follow these steps:
6163
6164 @enumerate
6165 @item
6166 Pretend there's no gap, and convert the buffer position into a @dfn{byte
6167 index} that indexes to the appropriate byte in the buffer's stream of
6168 textual bytes.  By convention, byte indices begin at 1, just like buffer
6169 positions.  In the non-MULE case, byte indices and buffer positions are
6170 identical, since one character equals one byte.
6171 @item
6172 Convert the byte index into a @dfn{memory index}, which takes the gap
6173 into account.  The memory index is a direct index into the block of
6174 memory that stores the text of a buffer.  This basically just involves
6175 checking to see if the byte index is past the gap, and if so, adding the
6176 size of the gap to it.  By convention, memory indices begin at 1, just
6177 like buffer positions and byte indices, and when referring to the
6178 position that is @dfn{at} the gap, we always use the memory position at
6179 the @emph{beginning}, not at the end, of the gap.
6180 @item
6181 Fetch the appropriate bytes at the determined memory position.
6182 @item
6183 Convert these bytes into an Emchar.
6184 @end enumerate
6185
  In the non-MULE case, (3) and (4) boil down to a simple one-byte
6187 memory access.
6188
6189   Note that we have defined three types of positions in a buffer:
6190
6191 @enumerate
6192 @item
6193 @dfn{buffer positions} or @dfn{character positions}, typedef @code{Bufpos}
6194 @item
6195 @dfn{byte indices}, typedef @code{Bytind}
6196 @item
6197 @dfn{memory indices}, typedef @code{Memind}
6198 @end enumerate
6199
6200   All three typedefs are just @code{int}s, but defining them this way makes
6201 things a lot clearer.
6202
6203   Most code works with buffer positions.  In particular, all Lisp code
6204 that refers to text in a buffer uses buffer positions.  Lisp code does
6205 not know that byte indices or memory indices exist.
6206
6207   Finally, we have a typedef for the bytes in a buffer.  This is a
6208 @code{Bufbyte}, which is an unsigned char.  Referring to them as
6209 Bufbytes underscores the fact that we are working with a string of bytes
6210 in the internal Emacs buffer representation rather than in one of a
6211 number of possible alternative representations (e.g. EUC-encoded text,
6212 etc.).
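
  For the non-MULE case, the four lookup steps given earlier in this
section can be sketched with the typedefs just introduced.  The struct
layout, its field names and the @code{toy_} function names are
assumptions made for this example and should not be read as the actual
buffer macros.  All indices are 1-based, as in the text.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

typedef int Bufpos;
typedef int Bytind;
typedef int Memind;
typedef unsigned char Bufbyte;

struct toy_buffer_text
{
  const Bufbyte *beg;            /* start of the block of memory */
  Bytind gap_at;                 /* byte index at which the gap sits */
  int gap_size;
};

/* Step 1: with one byte per character the two are identical. */
static Bytind
toy_bufpos_to_bytind (Bufpos pos)
{
  return (Bytind) pos;
}

/* Step 2: indices past the gap are shifted by the gap size.  The
   position at the gap maps to the beginning of the gap. */
static Memind
toy_bytind_to_memind (const struct toy_buffer_text *t, Bytind ind)
{
  assert (t != NULL && ind >= 1);
  return ind > t->gap_at ? ind + t->gap_size : ind;
}

/* Steps 3 and 4: fetch the character "at" POS, i.e. the one after
   that position.  When POS is exactly at the gap, that character lies
   beyond the gap, hence the >= here. */
static Bufbyte
toy_char_at (const struct toy_buffer_text *t, Bufpos pos)
{
  Bytind ind = toy_bufpos_to_bytind (pos);
  Memind mem = ind >= t->gap_at ? ind + t->gap_size : ind;

  return t->beg[mem - 1];        /* memory indices start at 1 */
}
```
@end verbatim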
6213
6214 @node Buffer Lists
6215 @section Buffer Lists
6216
  Recall from earlier that buffers are @dfn{permanent} objects, i.e. that
6218 they remain around until explicitly deleted.  This entails that there is
6219 a list of all the buffers in existence.  This list is actually an
6220 assoc-list (mapping from the buffer's name to the buffer) and is stored
6221 in the global variable @code{Vbuffer_alist}.
6222
6223   The order of the buffers in the list is important: the buffers are
6224 ordered approximately from most-recently-used to least-recently-used.
6225 Switching to a buffer using @code{switch-to-buffer},
6226 @code{pop-to-buffer}, etc. and switching windows using
6227 @code{other-window}, etc.  usually brings the new current buffer to the
6228 front of the list.  @code{switch-to-buffer}, @code{other-buffer},
6229 etc. look at the beginning of the list to find an alternative buffer to
6230 suggest.  You can also explicitly move a buffer to the end of the list
6231 using @code{bury-buffer}.
6232
6233   In addition to the global ordering in @code{Vbuffer_alist}, each frame
6234 has its own ordering of the list.  These lists always contain the same
6235 elements as in @code{Vbuffer_alist} although possibly in a different
6236 order.  @code{buffer-list} normally returns the list for the selected
6237 frame.  This allows you to work in separate frames without things
6238 interfering with each other.
6239
6240   The standard way to look up a buffer given a name is
6241 @code{get-buffer}, and the standard way to create a new buffer is
6242 @code{get-buffer-create}, which looks up a buffer with a given name,
6243 creating a new one if necessary.  These operations correspond exactly
6244 with the symbol operations @code{intern-soft} and @code{intern},
6245 respectively.  You can also force a new buffer to be created using
6246 @code{generate-new-buffer}, which takes a name and (if necessary) makes
6247 a unique name from this by appending a number, and then creates the
6248 buffer.  This is basically like the symbol operation @code{gensym}.
6249
6250 @node Markers and Extents
6251 @section Markers and Extents
6252
6253   Among the things associated with a buffer are things that are
6254 logically attached to certain buffer positions.  This can be used to
6255 keep track of a buffer position when text is inserted and deleted, so
6256 that it remains at the same spot relative to the text around it; to
6257 assign properties to particular sections of text; etc.  There are two
6258 such objects that are useful in this regard: they are @dfn{markers} and
6259 @dfn{extents}.
6260
6261   A @dfn{marker} is simply a flag placed at a particular buffer
6262 position, which is moved around as text is inserted and deleted.
6263 Markers are used for all sorts of purposes, such as the @code{mark} that
6264 is the other end of textual regions to be cut, copied, etc.
6265
6266   An @dfn{extent} is similar to two markers plus some associated
6267 properties, and is used to keep track of regions in a buffer as text is
6268 inserted and deleted, and to add properties (e.g. fonts) to particular
6269 regions of text.  The external interface of extents is explained
6270 elsewhere.
6271
6272   The important thing here is that markers and extents simply contain
6273 buffer positions in them as integers, and every time text is inserted or
6274 deleted, these positions must be updated.  In order to minimize the
6275 amount of shuffling that needs to be done, the positions in markers and
extents (there's one per marker, two per extent) are stored in Meminds.
6277 This means that they only need to be moved when the text is physically
6278 moved in memory; since the gap structure tries to minimize this, it also
6279 minimizes the number of marker and extent indices that need to be
6280 adjusted.  Look in @file{insdel.c} for the details of how this works.
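
The effect of storing positions as memory indices can be illustrated with
a small model of a gap buffer.  This is only a sketch under simplifying
assumptions (zero-based indices, a single gap); the structure
@code{struct gapbuf} and both function names are hypothetical and are not
the definitions used in @file{insdel.c}.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: text occupies memory [0, gap_start) and
   [gap_end, ...); the bytes in [gap_start, gap_end) are the gap. */
struct gapbuf
{
  size_t gap_start;   /* memory index of the first byte of the gap */
  size_t gap_end;     /* memory index just past the gap */
};

/* Convert a byte index (which ignores the gap) to a memory index
   (which counts it). */
size_t
bytind_to_memind (const struct gapbuf *b, size_t bytind)
{
  size_t gap_size = b->gap_end - b->gap_start;
  return bytind < b->gap_start ? bytind : bytind + gap_size;
}

/* Convert a memory index back to a byte index.  A memory index that
   lies inside the gap does not refer to any text. */
size_t
memind_to_bytind (const struct gapbuf *b, size_t memind)
{
  size_t gap_size = b->gap_end - b->gap_start;
  assert (memind < b->gap_start || memind >= b->gap_end);
  return memind < b->gap_start ? memind : memind - gap_size;
}
```

In this model, inserting text at the gap only advances @code{gap_start}.
A position stored as a memory index after the gap keeps its value, even
though the byte index it corresponds to has grown by the number of bytes
inserted.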
6281
6282   One other important distinction is that markers are @dfn{temporary}
6283 while extents are @dfn{permanent}.  This means that markers disappear as
6284 soon as there are no more pointers to them, and correspondingly, there
6285 is no way to determine what markers are in a buffer if you are just
6286 given the buffer.  Extents remain in a buffer until they are detached
6287 (which could happen as a result of text being deleted) or the buffer is
6288 deleted, and primitives do exist to enumerate the extents in a buffer.
6289
6290 @node Bufbytes and Emchars
6291 @section Bufbytes and Emchars
6292
6293   Not yet documented.
6294
6295 @node The Buffer Object
6296 @section The Buffer Object
6297
6298   Buffers contain fields not directly accessible by the Lisp programmer.
6299 We describe them here, naming them by the names used in the C code.
6300 Many are accessible indirectly in Lisp programs via Lisp primitives.
6301
6302 @table @code
6303 @item name
6304 The buffer name is a string that names the buffer.  It is guaranteed to
6305 be unique.  @xref{Buffer Names,,, lispref, XEmacs Lisp Programmer's
6306 Manual}.
6307
6308 @item save_modified
6309 This field contains the time when the buffer was last saved, as an
6310 integer.  @xref{Buffer Modification,,, lispref, XEmacs Lisp Programmer's
6311 Manual}.
6312
6313 @item modtime
6314 This field contains the modification time of the visited file.  It is
6315 set when the file is written or read.  Every time the buffer is written
6316 to the file, this field is compared to the modification time of the
6317 file.  @xref{Buffer Modification,,, lispref, XEmacs Lisp Programmer's
6318 Manual}.
6319
6320 @item auto_save_modified
6321 This field contains the time when the buffer was last auto-saved.
6322
6323 @item last_window_start
6324 This field contains the @code{window-start} position in the buffer as of
6325 the last time the buffer was displayed in a window.
6326
6327 @item undo_list
6328 This field points to the buffer's undo list.  @xref{Undo,,, lispref,
6329 XEmacs Lisp Programmer's Manual}.
6330
6331 @item syntax_table_v
6332 This field contains the syntax table for the buffer.  @xref{Syntax
6333 Tables,,, lispref, XEmacs Lisp Programmer's Manual}.
6334
6335 @item downcase_table
6336 This field contains the conversion table for converting text to lower
6337 case.  @xref{Case Tables,,, lispref, XEmacs Lisp Programmer's Manual}.
6338
6339 @item upcase_table
6340 This field contains the conversion table for converting text to upper
6341 case.  @xref{Case Tables,,, lispref, XEmacs Lisp Programmer's Manual}.
6342
6343 @item case_canon_table
6344 This field contains the conversion table for canonicalizing text for
6345 case-folding search.  @xref{Case Tables,,, lispref, XEmacs Lisp
6346 Programmer's Manual}.
6347
6348 @item case_eqv_table
6349 This field contains the equivalence table for case-folding search.
6350 @xref{Case Tables,,, lispref, XEmacs Lisp Programmer's Manual}.
6351
6352 @item display_table
6353 This field contains the buffer's display table, or @code{nil} if it
6354 doesn't have one.  @xref{Display Tables,,, lispref, XEmacs Lisp
6355 Programmer's Manual}.
6356
6357 @item markers
6358 This field contains the chain of all markers that currently point into
6359 the buffer.  Deletion of text in the buffer, and motion of the buffer's
6360 gap, must check each of these markers and perhaps update it.
6361 @xref{Markers,,, lispref, XEmacs Lisp Programmer's Manual}.
6362
6363 @item backed_up
6364 This field is a flag that tells whether a backup file has been made for
6365 the visited file of this buffer.
6366
6367 @item mark
6368 This field contains the mark for the buffer.  The mark is a marker,
6369 hence it is also included on the list @code{markers}.  @xref{The Mark,,,
6370 lispref, XEmacs Lisp Programmer's Manual}.
6371
6372 @item mark_active
6373 This field is non-@code{nil} if the buffer's mark is active.
6374
6375 @item local_var_alist
6376 This field contains the association list describing the variables local
6377 in this buffer, and their values, with the exception of local variables
6378 that have special slots in the buffer object.  (Those slots are omitted
6379 from this table.)  @xref{Buffer-Local Variables,,, lispref, XEmacs Lisp
6380 Programmer's Manual}.
6381
6382 @item modeline_format
6383 This field contains a Lisp object which controls how to display the mode
6384 line for this buffer.  @xref{Modeline Format,,, lispref, XEmacs Lisp
6385 Programmer's Manual}.
6386
6387 @item base_buffer
6388 This field holds the buffer's base buffer (if it is an indirect buffer),
6389 or @code{nil}.
6390 @end table
6391
6392 @node MULE Character Sets and Encodings, The Lisp Reader and Compiler, Buffers and Textual Representation, Top
6393 @chapter MULE Character Sets and Encodings
6394
6395   Recall that there are two primary ways that text is represented in
6396 XEmacs.  The @dfn{buffer} representation sees the text as a series of
6397 bytes (Bufbytes), with a variable number of bytes used per character.
6398 The @dfn{character} representation sees the text as a series of integers
6399 (Emchars), one per character.  The character representation is a cleaner
6400 representation from a theoretical standpoint, and is thus used in many
6401 cases when lots of manipulations on a string need to be done.  However,
6402 the buffer representation is the standard representation used in both
6403 Lisp strings and buffers, and because of this, it is the ``default''
6404 representation that text comes in.  The reason for using this
6405 representation is that it's compact and is compatible with ASCII.
6406
6407 @menu
6408 * Character Sets::
6409 * Encodings::
6410 * Internal Mule Encodings::
6411 * CCL::
6412 @end menu
6413
6414 @node Character Sets
6415 @section Character Sets
6416
6417   A character set (or @dfn{charset}) is an ordered set of characters.  A
6418 particular character in a charset is indexed using one or more
6419 @dfn{position codes}, which are non-negative integers.  The number of
6420 position codes needed to identify a particular character in a charset is
6421 called the @dfn{dimension} of the charset.  In XEmacs/Mule, all charsets
6422 have dimension 1 or 2, and the size of all charsets (except for a few
6423 special cases) is either 94, 96, 94 by 94, or 96 by 96.  The range of
6424 position codes used to index characters from any of these types of
6425 character sets is as follows:
6426
6427 @example
6428 Charset type            Position code 1         Position code 2
6429 ------------------------------------------------------------
6430 94                      33 - 126                N/A
6431 96                      32 - 127                N/A
6432 94x94                   33 - 126                33 - 126
6433 96x96                   32 - 127                32 - 127
6434 @end example
6435
6436   Note that in the above cases position codes do not start at an
6437 expected value such as 0 or 1.  The reason for this will become clear
6438 later.
6439
6440   For example, Latin-1 is a 96-character charset, and JISX0208 (the
6441 Japanese national character set) is a 94x94-character charset.
6442
6443   [Note that, although the ranges above define the @emph{valid} position
6444 codes for a charset, some of the slots in a particular charset may in
6445 fact be empty.  This is the case for JISX0208, for example, where (e.g.)
6446 all the slots whose first position code is in the range 118 - 127 are
6447 empty.]
6448
6449   There are three charsets that do not follow the above rules.  All of
6450 them have one dimension, and have ranges of position codes as follows:
6451
6452 @example
6453 Charset name            Position code 1
6454 ------------------------------------
6455 ASCII                   0 - 127
6456 Control-1               0 - 31
6457 Composite               0 - some large number
6458 @end example
6459
6460   (The upper bound of the position code for composite characters has not
6461 yet been determined, but it will probably be at least 16,383).
6462
6463   ASCII is the union of two subsidiary character sets: Printing-ASCII
6464 (the printing ASCII character set, consisting of position codes 33 -
6465 126, like for a standard 94-character charset) and Control-ASCII (the
6466 non-printing characters that would appear in a binary file with codes 0
6467 - 32 and 127).
6468
6469   Control-1 contains the non-printing characters that would appear in a
6470 binary file with codes 128 - 159.
6471
6472   Composite contains characters that are generated by overstriking one
6473 or more characters from other charsets.
6474
6475   Note that some characters in ASCII, and all characters in Control-1,
6476 are @dfn{control} (non-printing) characters.  These have no printed
6477 representation but instead control some other function of the printing
(e.g. TAB, or code 9, moves the current character position to the next tab
6479 stop).  All other characters in all charsets are @dfn{graphic}
6480 (printing) characters.
6481
6482   When a binary file is read in, the bytes in the file are assigned to
6483 character sets as follows:
6484
6485 @example
6486 Bytes           Character set           Range
6487 --------------------------------------------------
6488 0 - 127         ASCII                   0 - 127
6489 128 - 159       Control-1               0 - 31
6490 160 - 255       Latin-1                 32 - 127
6491 @end example
6492
6493   This is a bit ad-hoc but gets the job done.
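
The assignment in the table above can be written as a small C function.
This is a sketch for illustration only; the enumeration, the structure
and the function name are hypothetical and do not appear in the XEmacs
sources.

```c
#include <assert.h>

/* Hypothetical identifiers for the three charsets in the table. */
enum binary_charset { CS_ASCII, CS_CONTROL_1, CS_LATIN_1 };

struct charset_pos
{
  enum binary_charset charset;
  int position_code;
};

/* Map one byte of a binary file to a charset and position code. */
struct charset_pos
classify_binary_byte (int byte)
{
  struct charset_pos r;
  assert (byte >= 0 && byte <= 255);
  if (byte < 128)
    {
      r.charset = CS_ASCII;        /* bytes 0 - 127 -> codes 0 - 127 */
      r.position_code = byte;
    }
  else if (byte < 160)
    {
      r.charset = CS_CONTROL_1;    /* bytes 128 - 159 -> codes 0 - 31 */
      r.position_code = byte - 128;
    }
  else
    {
      r.charset = CS_LATIN_1;      /* bytes 160 - 255 -> codes 32 - 127 */
      r.position_code = byte - 128;
    }
  return r;
}
```

Note that the same subtraction of 128 serves both Control-1 and Latin-1;
only the charset differs.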
6494
6495 @node Encodings
6496 @section Encodings
6497
6498   An @dfn{encoding} is a way of numerically representing characters from
6499 one or more character sets.  If an encoding only encompasses one
6500 character set, then the position codes for the characters in that
6501 character set could be used directly.  This is not possible, however, if
6502 more than one character set is to be used in the encoding.
6503
6504   For example, the conversion detailed above between bytes in a binary
6505 file and characters is effectively an encoding that encompasses the
6506 three character sets ASCII, Control-1, and Latin-1 in a stream of 8-bit
6507 bytes.
6508
6509   Thus, an encoding can be viewed as a way of encoding characters from a
6510 specified group of character sets using a stream of bytes, each of which
6511 contains a fixed number of bits (but not necessarily 8, as in the common
6512 usage of ``byte'').
6513
6514   Here are descriptions of a couple of common
6515 encodings:
6516
6517 @menu
6518 * Japanese EUC (Extended Unix Code)::
6519 * JIS7::
6520 @end menu
6521
6522 @node Japanese EUC (Extended Unix Code)
6523 @subsection Japanese EUC (Extended Unix Code)
6524
This encompasses the character sets Printing-ASCII, Japanese-JISX0208,
and Japanese-JISX0201-Kana (half-width katakana, the right half of
JISX0201).  It uses 8-bit bytes.
6528
6529 Note that Printing-ASCII and Japanese-JISX0201-Kana are 94-character
6530 charsets, while Japanese-JISX0208 is a 94x94-character charset.
6531
6532 The encoding is as follows:
6533
6534 @example
6535 Character set            Representation (PC=position-code)
6536 -------------            --------------
6537 Printing-ASCII           PC1
6538 Japanese-JISX0201-Kana   0x8E       | PC1 + 0x80
6539 Japanese-JISX0208        PC1 + 0x80 | PC2 + 0x80
Japanese-JISX0212        0x8F       | PC1 + 0x80 | PC2 + 0x80
6541 @end example
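
As an illustration, the rows for Printing-ASCII, Japanese-JISX0201-Kana
and Japanese-JISX0208 might be implemented along the following lines.
This is a minimal sketch; the enumeration and the function
@code{euc_jp_encode} are hypothetical names, not part of XEmacs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical identifiers for three of the charsets in the table. */
enum euc_charset { EUC_PRINTING_ASCII, EUC_JISX0201_KANA, EUC_JISX0208 };

/* Encode one character into OUT, which must have room for 2 bytes.
   PC2 is ignored for the one-dimensional charsets.  Returns the
   number of bytes written. */
size_t
euc_jp_encode (enum euc_charset cs, int pc1, int pc2, unsigned char *out)
{
  assert (pc1 >= 33 && pc1 <= 126);
  switch (cs)
    {
    case EUC_PRINTING_ASCII:
      out[0] = (unsigned char) pc1;
      return 1;
    case EUC_JISX0201_KANA:
      out[0] = 0x8E;
      out[1] = (unsigned char) (pc1 + 0x80);
      return 2;
    case EUC_JISX0208:
      assert (pc2 >= 33 && pc2 <= 126);
      out[0] = (unsigned char) (pc1 + 0x80);
      out[1] = (unsigned char) (pc2 + 0x80);
      return 2;
    }
  return 0;   /* unknown charset */
}
```

For example, the JISX0208 character with position codes 0x24 and 0x22 is
encoded as the two bytes 0xA4 0xA2.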
6542
6543
6544 @node JIS7
6545 @subsection JIS7
6546
6547 This encompasses the character sets Printing-ASCII,
6548 Japanese-JISX0201-Roman (the left half of JISX0201; this character set
6549 is very similar to Printing-ASCII and is a 94-character charset),
6550 Japanese-JISX0208, and Japanese-JISX0201-Kana.  It uses 7-bit bytes.
6551
6552 Unlike Japanese EUC, this is a @dfn{modal} encoding, which
6553 means that there are multiple states that the encoding can
6554 be in, which affect how the bytes are to be interpreted.
6555 Special sequences of bytes (called @dfn{escape sequences})
6556 are used to change states.
6557
6558   The encoding is as follows:
6559
6560 @example
6561 Character set              Representation (PC=position-code)
6562 -------------              --------------
6563 Printing-ASCII             PC1
6564 Japanese-JISX0201-Roman    PC1
6565 Japanese-JISX0201-Kana     PC1
6566 Japanese-JISX0208          PC1 PC2
6567
6568
6569 Escape sequence   ASCII equivalent   Meaning
6570 ---------------   ----------------   -------
6571 0x1B 0x28 0x4A    ESC ( J            invoke Japanese-JISX0201-Roman
6572 0x1B 0x28 0x49    ESC ( I            invoke Japanese-JISX0201-Kana
6573 0x1B 0x24 0x42    ESC $ B            invoke Japanese-JISX0208
6574 0x1B 0x28 0x42    ESC ( B            invoke Printing-ASCII
6575 @end example
6576
6577   Initially, Printing-ASCII is invoked.
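
The modal nature of JIS7 can be seen in a small decoder loop that tracks
the current state.  The following sketch merely counts the characters in
a JIS7 byte stream; the enumeration and function names are hypothetical,
and error handling (unknown escape sequences, a truncated two-byte
character at the end of the input) is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the four states of the encoding. */
enum jis7_state { JIS7_ASCII, JIS7_ROMAN, JIS7_KANA, JIS7_X0208 };

/* If the bytes at P begin with one of the escape sequences in the
   table above, store the new state in *STATE and return 1;
   otherwise return 0 and leave *STATE alone. */
int
jis7_parse_escape (const unsigned char *p, size_t len,
                   enum jis7_state *state)
{
  if (len < 3 || p[0] != 0x1B)
    return 0;
  if (p[1] == 0x28 && p[2] == 0x4A)
    *state = JIS7_ROMAN;          /* ESC ( J */
  else if (p[1] == 0x28 && p[2] == 0x49)
    *state = JIS7_KANA;           /* ESC ( I */
  else if (p[1] == 0x24 && p[2] == 0x42)
    *state = JIS7_X0208;          /* ESC $ B */
  else if (p[1] == 0x28 && p[2] == 0x42)
    *state = JIS7_ASCII;          /* ESC ( B */
  else
    return 0;
  return 1;
}

/* Count the characters in a JIS7 byte stream. */
size_t
jis7_count_chars (const unsigned char *p, size_t len)
{
  enum jis7_state state = JIS7_ASCII;   /* the initial state */
  size_t i = 0, n = 0;
  assert (p != NULL || len == 0);
  while (i < len)
    {
      if (jis7_parse_escape (p + i, len - i, &state))
        i += 3;
      else
        {
          /* Only JISX0208 uses two position codes per character. */
          i += (state == JIS7_X0208) ? 2 : 1;
          n++;
        }
    }
  return n;
}
```

The same byte value is interpreted differently depending on which escape
sequence was seen last, which is exactly what makes the encoding modal.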
6578
6579 @node Internal Mule Encodings
6580 @section Internal Mule Encodings
6581
6582 In XEmacs/Mule, each character set is assigned a unique number, called a
6583 @dfn{leading byte}.  This is used in the encodings of a character.
6584 Leading bytes are in the range 0x80 - 0xFF (except for ASCII, which has
6585 a leading byte of 0), although some leading bytes are reserved.
6586
6587 Charsets whose leading byte is in the range 0x80 - 0x9F are called
6588 @dfn{official} and are used for built-in charsets.  Other charsets are
6589 called @dfn{private} and have leading bytes in the range 0xA0 - 0xFF;
6590 these are user-defined charsets.
6591
6592   More specifically:
6593
6594 @example
6595 Character set           Leading byte
6596 -------------           ------------
6597 ASCII                   0
6598 Composite               0x80
6599 Dimension-1 Official    0x81 - 0x8D
6600                           (0x8E is free)
6601 Control-1               0x8F
6602 Dimension-2 Official    0x90 - 0x99
6603                           (0x9A - 0x9D are free;
6604                            0x9E and 0x9F are reserved)
6605 Dimension-1 Private     0xA0 - 0xEF
6606 Dimension-2 Private     0xF0 - 0xFF
6607 @end example
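
The table can be read as a classification of leading-byte values.  The
following sketch expresses it in C; the enumeration and the function
@code{classify_leading_byte} are hypothetical names chosen for this
example.

```c
#include <assert.h>

/* Hypothetical names for the rows of the table above. */
enum lb_class
{
  LB_ASCII, LB_COMPOSITE, LB_DIM1_OFFICIAL, LB_CONTROL_1,
  LB_DIM2_OFFICIAL, LB_DIM1_PRIVATE, LB_DIM2_PRIVATE,
  LB_UNASSIGNED      /* free or reserved values */
};

/* Classify a leading byte according to the table. */
enum lb_class
classify_leading_byte (int lb)
{
  assert (lb >= 0 && lb <= 0xFF);
  if (lb == 0)
    return LB_ASCII;
  if (lb == 0x80)
    return LB_COMPOSITE;
  if (lb >= 0x81 && lb <= 0x8D)
    return LB_DIM1_OFFICIAL;
  if (lb == 0x8F)
    return LB_CONTROL_1;
  if (lb >= 0x90 && lb <= 0x99)
    return LB_DIM2_OFFICIAL;
  if (lb >= 0xA0 && lb <= 0xEF)
    return LB_DIM1_PRIVATE;
  if (lb >= 0xF0)
    return LB_DIM2_PRIVATE;
  return LB_UNASSIGNED;   /* 0x01 - 0x7F, 0x8E, 0x9A - 0x9F */
}
```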
6608
6609 There are two internal encodings for characters in XEmacs/Mule.  One is
6610 called @dfn{string encoding} and is an 8-bit encoding that is used for
6611 representing characters in a buffer or string.  It uses 1 to 4 bytes per
6612 character.  The other is called @dfn{character encoding} and is a 19-bit
6613 encoding that is used for representing characters individually in a
6614 variable.
6615
6616 (In the following descriptions, we'll ignore composite characters for
6617 the moment.  We also give a general (structural) overview first,
6618 followed later by the exact details.)
6619
6620 @menu
6621 * Internal String Encoding::
6622 * Internal Character Encoding::
6623 @end menu
6624
6625 @node Internal String Encoding
6626 @subsection Internal String Encoding
6627
6628 ASCII characters are encoded using their position code directly.  Other
6629 characters are encoded using their leading byte followed by their
6630 position code(s) with the high bit set.  Characters in private character
6631 sets have their leading byte prefixed with a @dfn{leading byte prefix},
6632 which is either 0x9E or 0x9F. (No character sets are ever assigned these
6633 leading bytes.) Specifically:
6634
6635 @example
6636 Character set           Encoding (PC=position-code, LB=leading-byte)
6637 -------------           --------
ASCII                   PC1  |
6639 Control-1               LB   |  PC1 + 0xA0 |
6640 Dimension-1 official    LB   |  PC1 + 0x80 |
6641 Dimension-1 private     0x9E |  LB         | PC1 + 0x80 |
6642 Dimension-2 official    LB   |  PC1 + 0x80 | PC2 + 0x80 |
6643 Dimension-2 private     0x9F |  LB         | PC1 + 0x80 | PC2 + 0x80
6644 @end example
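
The table translates fairly directly into code.  The following is a
sketch only, ignoring composite characters; @code{string_encode} is a
hypothetical name, and the leading-byte ranges it tests are the ones
given in the leading-byte table earlier in this section.

```c
#include <assert.h>
#include <stddef.h>

/* Encode one character in the internal string encoding.  LB is the
   leading byte of its charset (0 for ASCII); PC2 is ignored for
   one-dimensional charsets.  OUT must have room for 4 bytes.
   Returns the number of bytes written. */
size_t
string_encode (int lb, int pc1, int pc2, unsigned char *out)
{
  size_t n = 0;
  assert (lb == 0
          || (lb >= 0x81 && lb <= 0x99 && lb != 0x8E)
          || (lb >= 0xA0 && lb <= 0xFF));
  if (lb == 0)                    /* ASCII: position code as is */
    {
      out[n++] = (unsigned char) pc1;
      return n;
    }
  if (lb >= 0xF0)
    out[n++] = 0x9F;              /* prefix, dimension-2 private */
  else if (lb >= 0xA0)
    out[n++] = 0x9E;              /* prefix, dimension-1 private */
  out[n++] = (unsigned char) lb;
  /* Control-1 position codes are 0 - 31, so they are offset by 0xA0
     rather than 0x80 to keep the byte in the range 0xA0 - 0xFF. */
  out[n++] = (unsigned char) (pc1 + (lb == 0x8F ? 0xA0 : 0x80));
  if ((lb >= 0x90 && lb <= 0x99) || lb >= 0xF0)
    out[n++] = (unsigned char) (pc2 + 0x80);
  return n;
}
```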
6645
6646   The basic characteristic of this encoding is that the first byte
6647 of all characters is in the range 0x00 - 0x9F, and the second and
following bytes of all characters are in the range 0xA0 - 0xFF.
6649 This means that it is impossible to get out of sync, or more
6650 specifically:
6651
6652 @enumerate
6653 @item
6654 Given any byte position, the beginning of the character it is
6655 within can be determined in constant time.
6656 @item
6657 Given any byte position at the beginning of a character, the
6658 beginning of the next character can be determined in constant
6659 time.
6660 @item
6661 Given any byte position at the beginning of a character, the
6662 beginning of the previous character can be determined in constant
6663 time.
6664 @item
6665 Textual searches can simply treat encoded strings as if they
6666 were encoded in a one-byte-per-character fashion rather than
6667 the actual multi-byte encoding.
6668 @end enumerate
6669
6670   None of the standard non-modal encodings meet all of these
6671 conditions.  For example, EUC satisfies only (2) and (3), while
6672 Shift-JIS and Big5 (not yet described) satisfy only (2). (All
6673 non-modal encodings must satisfy (2), in order to be unambiguous.)
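
Properties (1) through (3) follow from the byte ranges alone, as the
following sketch suggests.  The function names are hypothetical.  Each
loop runs at most three times, because a character occupies at most four
bytes, which is why the operations take constant time.

```c
#include <assert.h>
#include <stddef.h>

/* Return the index of the first byte of the character that contains
   S[POS].  Non-initial bytes are always in the range 0xA0 - 0xFF. */
size_t
char_start (const unsigned char *s, size_t pos)
{
  while (pos > 0 && s[pos] >= 0xA0)
    pos--;
  return pos;
}

/* POS is the beginning of a character in S, which has LEN bytes.
   Return the beginning of the next character (or LEN at the end). */
size_t
next_char (const unsigned char *s, size_t len, size_t pos)
{
  assert (pos < len && s[pos] < 0xA0);
  pos++;
  while (pos < len && s[pos] >= 0xA0)
    pos++;
  return pos;
}

/* POS is the beginning of a character other than the first.
   Return the beginning of the previous character. */
size_t
prev_char (const unsigned char *s, size_t pos)
{
  assert (pos > 0);
  return char_start (s, pos - 1);
}
```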
6674
6675 @node Internal Character Encoding
6676 @subsection Internal Character Encoding
6677
6678   One 19-bit word represents a single character.  The word is
6679 separated into three fields:
6680
6681 @example
6682 Bit number:     18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
6683                 <------------> <------------------> <------------------>
6684 Field:                1                  2                    3
6685 @end example
6686
6687   Note that fields 2 and 3 hold 7 bits each, while field 1 holds 5 bits.
6688
6689 @example
6690 Character set           Field 1         Field 2         Field 3
6691 -------------           -------         -------         -------
6692 ASCII                      0               0              PC1
6693    range:                                                   (00 - 7F)
6694 Control-1                  0               1              PC1
6695    range:                                                   (00 - 1F)
6696 Dimension-1 official       0            LB - 0x80         PC1
6697    range:                                    (01 - 0D)      (20 - 7F)
6698 Dimension-1 private        0            LB - 0x80         PC1
6699    range:                                    (20 - 6F)      (20 - 7F)
6700 Dimension-2 official    LB - 0x8F         PC1             PC2
6701    range:                    (01 - 0A)       (20 - 7F)      (20 - 7F)
6702 Dimension-2 private     LB - 0xE1         PC1             PC2
6703    range:                    (0F - 1E)       (20 - 7F)      (20 - 7F)
6704 Composite                 0x1F             ?               ?
6705 @end example
6706
6707   Note that character codes 0 - 255 are the same as the ``binary encoding''
6708 described above.
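
The following sketch packs the three fields into a 19-bit value according
to the table above, ignoring composite characters.  The function names
@code{pack_fields} and @code{char_encode} are hypothetical and are used
here only to make the arithmetic concrete.

```c
#include <assert.h>

/* Combine field 1 (5 bits), field 2 (7 bits) and field 3 (7 bits). */
unsigned long
pack_fields (unsigned f1, unsigned f2, unsigned f3)
{
  assert (f1 < 32 && f2 < 128 && f3 < 128);
  return ((unsigned long) f1 << 14) | ((unsigned long) f2 << 7) | f3;
}

/* Compute the character encoding from a leading byte and position
   codes.  PC2 is ignored for one-dimensional charsets. */
unsigned long
char_encode (int lb, int pc1, int pc2)
{
  if (lb == 0)                          /* ASCII */
    return pack_fields (0, 0, pc1);
  if (lb == 0x8F)                       /* Control-1 */
    return pack_fields (0, 1, pc1);
  if (lb >= 0x81 && lb <= 0x8D)         /* dimension-1 official */
    return pack_fields (0, lb - 0x80, pc1);
  if (lb >= 0xA0 && lb <= 0xEF)         /* dimension-1 private */
    return pack_fields (0, lb - 0x80, pc1);
  if (lb >= 0x90 && lb <= 0x99)         /* dimension-2 official */
    return pack_fields (lb - 0x8F, pc1, pc2);
  assert (lb >= 0xF0 && lb <= 0xFF);    /* dimension-2 private */
  return pack_fields (lb - 0xE1, pc1, pc2);
}
```

With these definitions, a Control-1 character with position code 5 gets
the code 133 (128 + 5), matching the byte value it would have in a binary
file.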
6709
6710 @node CCL
6711 @section CCL
6712
6713 @example
6714 CCL PROGRAM SYNTAX:
6715      CCL_PROGRAM := (CCL_MAIN_BLOCK
6716                      [ CCL_EOF_BLOCK ])
6717
6718      CCL_MAIN_BLOCK := CCL_BLOCK
6719      CCL_EOF_BLOCK := CCL_BLOCK
6720
6721      CCL_BLOCK := STATEMENT | (STATEMENT [STATEMENT ...])
6722      STATEMENT :=
6723              SET | IF | BRANCH | LOOP | REPEAT | BREAK
6724              | READ | WRITE
6725
6726      SET := (REG = EXPRESSION) | (REG SELF_OP EXPRESSION)
6727             | INT-OR-CHAR
6728
6729      EXPRESSION := ARG | (EXPRESSION OP ARG)
6730
6731      IF := (if EXPRESSION CCL_BLOCK CCL_BLOCK)
6732      BRANCH := (branch EXPRESSION CCL_BLOCK [CCL_BLOCK ...])
6733      LOOP := (loop STATEMENT [STATEMENT ...])
6734      BREAK := (break)
6735      REPEAT := (repeat)
6736              | (write-repeat [REG | INT-OR-CHAR | string])
6737              | (write-read-repeat REG [INT-OR-CHAR | string | ARRAY]?)
6738      READ := (read REG) | (read REG REG)
6739              | (read-if REG ARITH_OP ARG CCL_BLOCK CCL_BLOCK)
6740              | (read-branch REG CCL_BLOCK [CCL_BLOCK ...])
6741      WRITE := (write REG) | (write REG REG)
6742              | (write INT-OR-CHAR) | (write STRING) | STRING
6743              | (write REG ARRAY)
6744      END := (end)
6745
6746      REG := r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7
6747      ARG := REG | INT-OR-CHAR
6748      OP :=   + | - | * | / | % | & | '|' | ^ | << | >> | <8 | >8 | //
6749              | < | > | == | <= | >= | !=
6750      SELF_OP :=
6751              += | -= | *= | /= | %= | &= | '|=' | ^= | <<= | >>=
6752      ARRAY := '[' INT-OR-CHAR ... ']'
6753      INT-OR-CHAR := INT | CHAR
6754
6755 MACHINE CODE:
6756
6757 The machine code consists of a vector of 32-bit words.
6758 The first such word specifies the start of the EOF section of the code;
6759 this is the code executed to handle any stuff that needs to be done
6760 (e.g. designating back to ASCII and left-to-right mode) after all
6761 other encoded/decoded data has been written out.  This is not used for
6762 charset CCL programs.
6763
REGISTER: 0..7  -- referred to by RRR or rrr
6765
6766 OPERATOR BIT FIELD (27-bit): XXXXXXXXXXXXXXX RRR TTTTT
6767         TTTTT (5-bit): operator type
6768         RRR (3-bit): register number
        XXXXXXXXXXXXXXX (15-bit):
6770                 CCCCCCCCCCCCCCC: constant or address
6771                 000000000000rrr: register number
6772
AAAAA:  00000 +
6774         00001 -
6775         00010 *
6776         00011 /
6777         00100 %
6778         00101 &
6779         00110 |
6780         00111 ~
6781
6782         01000 <<
6783         01001 >>
6784         01010 <8
6785         01011 >8
6786         01100 //
6787         01101 not used
6788         01110 not used
6789         01111 not used
6790
6791         10000 <
6792         10001 >
6793         10010 ==
6794         10011 <=
6795         10100 >=
6796         10101 !=
6797
6798 OPERATORS:      TTTTT RRR XX..
6799
6800 SetCS:          00000 RRR C...C      RRR = C...C
6801 SetCL:          00001 RRR .....      RRR = c...c
6802                 c.............c
6803 SetR:           00010 RRR ..rrr      RRR = rrr
6804 SetA:           00011 RRR ..rrr      RRR = array[rrr]
6805                 C.............C      size of array = C...C
6806                 c.............c      contents = c...c
6807
6808 Jump:           00100 000 c...c      jump to c...c
6809 JumpCond:       00101 RRR c...c      if (!RRR) jump to c...c
6810 WriteJump:      00110 RRR c...c      Write1 RRR, jump to c...c
6811 WriteReadJump:  00111 RRR c...c      Write1, Read1 RRR, jump to c...c
6812 WriteCJump:     01000 000 c...c      Write1 C...C, jump to c...c
6813                 C...C
6814 WriteCReadJump: 01001 RRR c...c      Write1 C...C, Read1 RRR,
6815                 C.............C      and jump to c...c
6816 WriteSJump:     01010 000 c...c      WriteS, jump to c...c
6817                 C.............C
6818                 S.............S
6819                 ...
6820 WriteSReadJump: 01011 RRR c...c      WriteS, Read1 RRR, jump to c...c
6821                 C.............C
6822                 S.............S
6823                 ...
6824 WriteAReadJump: 01100 RRR c...c      WriteA, Read1 RRR, jump to c...c
6825                 C.............C      size of array = C...C
6826                 c.............c      contents = c...c
6827                 ...
6828 Branch:         01101 RRR C...C      if (RRR >= 0 && RRR < C..)
6829                 c.............c      branch to (RRR+1)th address
6830 Read1:          01110 RRR ...        read 1-byte to RRR
6831 Read2:          01111 RRR ..rrr      read 2-byte to RRR and rrr
6832 ReadBranch:     10000 RRR C...C      Read1 and Branch
6833                 c.............c
6834                 ...
6835 Write1:         10001 RRR .....      write 1-byte RRR
6836 Write2:         10010 RRR ..rrr      write 2-byte RRR and rrr
6837 WriteC:         10011 000 .....      write 1-char C...CC
6838                 C.............C
6839 WriteS:         10100 000 .....      write C..-byte of string
6840                 C.............C
6841                 S.............S
6842                 ...
6843 WriteA:         10101 RRR .....      write array[RRR]
6844                 C.............C      size of array = C...C
6845                 c.............c      contents = c...c
6846                 ...
6847 End:            10110 000 .....      terminate the execution
6848
6849 SetSelfCS:      10111 RRR C...C      RRR AAAAA= C...C
6850                 ..........AAAAA
6851 SetSelfCL:      11000 RRR .....      RRR AAAAA= c...c
6852                 c.............c
6853                 ..........AAAAA
6854 SetSelfR:       11001 RRR ..Rrr      RRR AAAAA= rrr
6855                 ..........AAAAA
6856 SetExprCL:      11010 RRR ..Rrr      RRR = rrr AAAAA c...c
6857                 c.............c
6858                 ..........AAAAA
6859 SetExprR:       11011 RRR ..rrr      RRR = rrr AAAAA Rrr
6860                 ............Rrr
6861                 ..........AAAAA
6862 JumpCondC:      11100 RRR c...c      if !(RRR AAAAA C..) jump to c...c
6863                 C.............C
6864                 ..........AAAAA
6865 JumpCondR:      11101 RRR c...c      if !(RRR AAAAA rrr) jump to c...c
6866                 ............rrr
6867                 ..........AAAAA
6868 ReadJumpCondC:  11110 RRR c...c      Read1 and JumpCondC
6869                 C.............C
6870                 ..........AAAAA
6871 ReadJumpCondR:  11111 RRR c...c      Read1 and JumpCondR
6872                 ............rrr
6873                 ..........AAAAA
6874 @end example
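
As a rough illustration of the operator bit field, the following sketch
splits a code word into its three parts.  It assumes that the layout
@samp{XXXXXXXXXXXXXXX RRR TTTTT} is written with the most significant bit
on the left, so that TTTTT occupies the five low-order bits; that bit
order, the structure and both function names are assumptions made for
this example and should be checked against the CCL sources.

```c
#include <assert.h>

/* Hypothetical decoded form of one operator word. */
struct ccl_insn
{
  unsigned type;   /* TTTTT: operator type, 5 bits */
  unsigned reg;    /* RRR: register number, 3 bits */
  unsigned arg;    /* X...X: constant, address or register, 15 bits */
};

/* Build an operator word (assumed layout: X...X RRR TTTTT, with
   TTTTT in the low-order bits). */
unsigned long
ccl_encode_word (unsigned type, unsigned reg, unsigned arg)
{
  assert (type < 32 && reg < 8 && arg < 0x8000);
  return ((unsigned long) arg << 8) | (reg << 5) | type;
}

/* Split an operator word into its fields under the same assumption. */
struct ccl_insn
ccl_decode_word (unsigned long word)
{
  struct ccl_insn insn;
  insn.type = (unsigned) (word & 0x1F);
  insn.reg = (unsigned) ((word >> 5) & 0x7);
  insn.arg = (unsigned) ((word >> 8) & 0x7FFF);
  return insn;
}
```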
6875
6876 @node The Lisp Reader and Compiler, Lstreams, MULE Character Sets and Encodings, Top
6877 @chapter The Lisp Reader and Compiler
6878
6879 Not yet documented.
6880
6881 @node Lstreams, Consoles; Devices; Frames; Windows, The Lisp Reader and Compiler, Top
6882 @chapter Lstreams
6883
6884   An @dfn{lstream} is an internal Lisp object that provides a generic
6885 buffering stream implementation.  Conceptually, you send data to the
6886 stream or read data from the stream, not caring what's on the other end
6887 of the stream.  The other end could be another stream, a file
6888 descriptor, a stdio stream, a fixed block of memory, a reallocating
6889 block of memory, etc.  The main purpose of the stream is to provide a
6890 standard interface and to do buffering.  Macros are defined to read or
6891 write characters, so the calling functions do not have to worry about
6892 blocking data together in order to achieve efficiency.
6893
6894 @menu
6895 * Creating an Lstream::         Creating an lstream object.
6896 * Lstream Types::               Different sorts of things that are streamed.
6897 * Lstream Functions::           Functions for working with lstreams.
6898 * Lstream Methods::             Creating new lstream types.
6899 @end menu
6900
6901 @node Creating an Lstream
6902 @section Creating an Lstream
6903
6904 Lstreams come in different types, depending on what is being interfaced
6905 to.  Although the primitive for creating new lstreams is
6906 @code{Lstream_new()}, generally you do not call this directly.  Instead,
6907 you call some type-specific creation function, which creates the lstream
6908 and initializes it as appropriate for the particular type.
6909
6910 All lstream creation functions take a @var{mode} argument, specifying
6911 what mode the lstream should be opened as.  This controls whether the
lstream is for input or output, and optionally whether data should be
6913 blocked up in units of MULE characters.  Note that some types of
6914 lstreams can only be opened for input; others only for output; and
6915 others can be opened either way.  #### Richard Mlynarik thinks that
6916 there should be a strict separation between input and output streams,
6917 and he's probably right.
6918
  @var{mode} is a string, one of:
6920
6921 @table @code
6922 @item "r"
6923   Open for reading.
6924 @item "w"
6925   Open for writing.
6926 @item "rc"
6927   Open for reading, but ``read'' never returns partial MULE characters.
6928 @item "wc"
6929   Open for writing, but never writes partial MULE characters.
6930 @end table
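
For illustration, a creation function might interpret the four mode
strings along the following lines.  This is only a sketch: the structure
@code{struct lstream_mode} and the function @code{parse_lstream_mode} are
hypothetical and do not reflect how @file{lstream.c} actually stores the
mode.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical decoded form of a mode string. */
struct lstream_mode
{
  int readable;      /* opened with "r" or "rc" */
  int writable;      /* opened with "w" or "wc" */
  int whole_chars;   /* never split a MULE character ("rc", "wc") */
};

/* Return 1 and fill in *M if MODE is one of "r", "w", "rc" or "wc";
   return 0 for any other string. */
int
parse_lstream_mode (const char *mode, struct lstream_mode *m)
{
  assert (mode != NULL && m != NULL);
  if (mode[0] != 'r' && mode[0] != 'w')
    return 0;
  if (mode[1] != '\0' && strcmp (mode + 1, "c") != 0)
    return 0;
  m->readable = (mode[0] == 'r');
  m->writable = (mode[0] == 'w');
  m->whole_chars = (mode[1] == 'c');
  return 1;
}
```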
6931
6932 @node Lstream Types
6933 @section Lstream Types
6934
6935 @table @asis
6936 @item stdio
6937
6938 @item filedesc
6939
6940 @item lisp-string
6941
6942 @item fixed-buffer
6943
6944 @item resizing-buffer
6945
6946 @item dynarr
6947
6948 @item lisp-buffer
6949
6950 @item print
6951
6952 @item decoding
6953
6954 @item encoding
6955 @end table
6956
6957 @node Lstream Functions
6958 @section Lstream Functions
6959
6960 @deftypefun {Lstream *} Lstream_new (Lstream_implementation *@var{imp}, CONST char *@var{mode})
6961 Allocate and return a new Lstream.  This function is not really meant to
6962 be called directly; rather, each stream type should provide its own
6963 stream creation function, which creates the stream and does any other
6964 necessary creation stuff (e.g. opening a file).
6965 @end deftypefun
6966
6967 @deftypefun void Lstream_set_buffering (Lstream *@var{lstr}, Lstream_buffering @var{buffering}, int @var{buffering_size})
6968 Change the buffering of a stream.  See @file{lstream.h}.  By default the
6969 buffering is @code{STREAM_BLOCK_BUFFERED}.
6970 @end deftypefun
6971
6972 @deftypefun int Lstream_flush (Lstream *@var{lstr})
6973 Flush out any pending unwritten data in the stream.  Clear any buffered
6974 input data.  Returns 0 on success, -1 on error.
6975 @end deftypefun
6976
6977 @deftypefn Macro int Lstream_putc (Lstream *@var{stream}, int @var{c})
6978 Write out one byte to the stream.  This is a macro and so it is very
6979 efficient.  The @var{c} argument is only evaluated once but the @var{stream}
6980 argument is evaluated more than once.  Returns 0 on success, -1 on
6981 error.
6982 @end deftypefn
6983
6984 @deftypefn Macro int Lstream_getc (Lstream *@var{stream})
6985 Read one byte from the stream.  This is a macro and so it is very
6986 efficient.  The @var{stream} argument is evaluated more than once.  Return
6987 value is -1 for EOF or error.
6988 @end deftypefn
6989
6990 @deftypefn Macro void Lstream_ungetc (Lstream *@var{stream}, int @var{c})
6991 Push one byte back onto the input queue.  This will be the next byte
6992 read from the stream.  Any number of bytes can be pushed back and will
6993 be read in the reverse order they were pushed back -- most recent
6994 first. (This is necessary for consistency -- if there are a number of
6995 bytes that have been unread and I read and unread a byte, it needs to be
6996 the first to be read again.) This is a macro and so it is very
6997 efficient.  The @var{c} argument is only evaluated once but the @var{stream}
6998 argument is evaluated more than once.
6999 @end deftypefn
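
The most-recent-first ordering described above is what a stack of pushed-back bytes gives you.  A minimal sketch, assuming a fixed-size pushback area; @code{toy_input}, @code{toy_ungetc} and @code{toy_getc} are hypothetical names, and the real implementation may store unread data differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical input stream with a small pushback stack. */
typedef struct {
  const unsigned char *src;   /* underlying data */
  size_t len, pos;
  unsigned char unread[8];    /* pushed-back bytes, top is last */
  size_t unread_count;
} toy_input;

/* Push one byte back; it becomes the next byte read. */
static void toy_ungetc(toy_input *s, int c)
{
  assert(s->unread_count < sizeof s->unread);
  s->unread[s->unread_count++] = (unsigned char) c;
}

/* Read one byte: pushed-back bytes first (newest first), then the
   underlying data.  Returns -1 at end of data. */
static int toy_getc(toy_input *s)
{
  if (s->unread_count > 0)
    return s->unread[--s->unread_count];
  if (s->pos < s->len)
    return s->src[s->pos++];
  return -1;
}
```

Pushing back @code{'a'} and then @code{'b'} makes the next two reads return @code{'b'} and then @code{'a'}, after which reading continues from the underlying data.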
7000
7001 @deftypefun int Lstream_fputc (Lstream *@var{stream}, int @var{c})
7002 @deftypefunx int Lstream_fgetc (Lstream *@var{stream})
7003 @deftypefunx void Lstream_fungetc (Lstream *@var{stream}, int @var{c})
7004 Function equivalents of the above macros.
7005 @end deftypefun
7006
7007 @deftypefun int Lstream_read (Lstream *@var{stream}, void *@var{data}, int @var{size})
7008 Read @var{size} bytes of @var{data} from the stream.  Return the number
7009 of bytes read.  0 means EOF. -1 means an error occurred and no bytes
7010 were read.
7011 @end deftypefun
7012
7013 @deftypefun int Lstream_write (Lstream *@var{stream}, void *@var{data}, int @var{size})
7014 Write @var{size} bytes of @var{data} to the stream.  Return the number
7015 of bytes written.  -1 means an error occurred and no bytes were written.
7016 @end deftypefun
7017
7018 @deftypefun void Lstream_unread (Lstream *@var{stream}, void *@var{data}, int @var{size})
7019 Push back @var{size} bytes of @var{data} onto the input queue.  The next
7020 call to @code{Lstream_read()} with the same size will read the same
7021 bytes back.  Note that this will be the case even if there is other
7022 pending unread data.
7023 @end deftypefun
7024
7025 @deftypefun int Lstream_close (Lstream *@var{stream})
7026 Close the stream.  All data will be flushed out.
7027 @end deftypefun
7028
7029 @deftypefun void Lstream_reopen (Lstream *@var{stream})
7030 Reopen a closed stream.  This enables I/O on it again.  This is not
7031 meant to be called except from a wrapper routine that reinitializes
7032 variables and such -- the close routine may well have freed some
7033 necessary storage structures, for example.
7034 @end deftypefun
7035
7036 @deftypefun void Lstream_rewind (Lstream *@var{stream})
7037 Rewind the stream to the beginning.
7038 @end deftypefun
7039
7040 @node Lstream Methods
7041 @section Lstream Methods
7042
7043 @deftypefn {Lstream Method} int reader (Lstream *@var{stream}, unsigned char *@var{data}, int @var{size})
7044 Read some data from the stream's end and store it into @var{data}, which
7045 can hold @var{size} bytes.  Return the number of bytes read.  A return
7046 value of 0 means no bytes can be read at this time.  This may be because
7047 of an EOF, or because there is a granularity greater than one byte that
7048 the stream imposes on the returned data, and @var{size} is less than
7049 this granularity. (This will happen frequently for streams that need to
7050 return whole characters, because @code{Lstream_read()} calls the reader
7051 function repeatedly until it has the number of bytes it wants or until 0
7052 is returned.)  The lstream functions do not treat a 0 return as EOF or
7053 do anything special; however, the calling function will interpret any 0
7054 it gets back as EOF.  This will normally not happen unless the caller
7055 calls @code{Lstream_read()} with a very small size.
7056
7057 This function can be @code{NULL} if the stream is output-only.
7058 @end deftypefn
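
The interaction between a reader with a granularity and the repeated calls made on its behalf can be sketched as follows.  This is an illustration under stated assumptions, not the real code: @code{toy_source}, @code{toy_reader} and @code{toy_read} are hypothetical, and the reader is assumed to hand out only whole three-byte units.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical data source. */
typedef struct {
  const unsigned char *src;
  int len;
  int pos;
} toy_source;

/* Hypothetical reader method: returns whole 3-byte units only. */
static int toy_reader(toy_source *s, unsigned char *data, int size)
{
  int units = size / 3;
  int avail = (s->len - s->pos) / 3;
  int n;
  if (units > avail)
    units = avail;
  n = units * 3;
  memcpy(data, s->src + s->pos, (size_t) n);
  s->pos += n;
  return n;   /* 0 means EOF, or SIZE is below the granularity */
}

/* Call the reader until SIZE bytes are read or it returns 0. */
static int toy_read(toy_source *s, unsigned char *data, int size)
{
  int total = 0;
  while (total < size) {
    int n = toy_reader(s, data + total, size - total);
    if (n < 0)
      return total > 0 ? total : -1;
    if (n == 0)
      break;
    total += n;
  }
  return total;
}
```

With nine bytes available, a request for seven bytes returns six, and a following request for two bytes returns 0 even though three bytes remain.  That is the small-size case in which a caller would mistake the 0 for EOF.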
7059
7060 @deftypefn {Lstream Method} int writer (Lstream *@var{stream}, CONST unsigned char *@var{data}, int @var{size})
7061 Send some data to the stream's end.  Data to be sent is in @var{data}
7062 and is @var{size} bytes.  Return the number of bytes sent.  This
7063 function can send and return fewer bytes than is passed in; in that
7064 case, the function will just be called again until there is no data left
7065 or 0 is returned.  A return value of 0 means that no more data can be
7066 currently stored, but there is no error; the data will be squirreled
7067 away until the writer can accept data. (This is useful, e.g., if you're
7068 dealing with a non-blocking file descriptor and are getting
7069 @code{EWOULDBLOCK} errors.)  This function can be @code{NULL} if the
7070 stream is input-only.
7071 @end deftypefn
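
The calling convention for the writer can be sketched in the same way.  The names @code{toy_sink}, @code{toy_writer} and @code{toy_write_all} are hypothetical; the sink is assumed to accept at most @code{chunk} bytes per call and to have a fixed capacity, which stands in for a descriptor that would otherwise block.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sink with limited capacity. */
typedef struct {
  unsigned char out[8];
  int used;
  int chunk;   /* most bytes accepted per call */
} toy_sink;

/* Hypothetical writer method: may accept fewer bytes than offered,
   and returns 0 when it cannot accept any right now. */
static int toy_writer(toy_sink *s, const unsigned char *data, int size)
{
  int room = (int) sizeof (s->out) - s->used;
  int n = size;
  if (n > s->chunk)
    n = s->chunk;
  if (n > room)
    n = room;
  memcpy(s->out + s->used, data, (size_t) n);
  s->used += n;
  return n;
}

/* Call the writer until everything is sent or it returns 0.
   Returns the number of bytes accepted; the caller must keep the
   remainder and retry later. */
static int toy_write_all(toy_sink *s, const unsigned char *data, int size)
{
  int sent = 0;
  while (sent < size) {
    int n = toy_writer(s, data + sent, size - sent);
    if (n < 0)
      return sent > 0 ? sent : -1;
    if (n == 0)
      break;
    sent += n;
  }
  return sent;
}
```

A short count from @code{toy_write_all} is not an error in this sketch; it only tells the caller how much data is still waiting to be written.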
7072
7073 @deftypefn {Lstream Method} int rewinder (Lstream *@var{stream})
7074 Rewind the stream.  If this is @code{NULL}, the stream is not seekable.
7075 @end deftypefn
7076
7077 @deftypefn {Lstream Method} int seekable_p (Lstream *@var{stream})
7078 Indicate whether this stream is seekable -- i.e. it can be rewound.
7079 This method is ignored if the stream does not have a rewind method.  If
7080 this method is not present, the result is determined by whether a rewind
7081 method is present.
7082 @end deftypefn
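
The relationship between the two methods amounts to a three-way decision.  A minimal sketch, assuming a method table with just these two slots; all of the @code{toy_} names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_stream toy_stream;

/* Hypothetical method table holding only the two relevant slots. */
typedef struct {
  int (*rewinder) (toy_stream *);
  int (*seekable_p) (toy_stream *);
} toy_methods;

struct toy_stream {
  const toy_methods *imp;
};

/* Nonzero if the stream can be rewound. */
static int toy_seekable(toy_stream *s)
{
  if (s->imp->rewinder == NULL)
    return 0;                     /* seekable_p is ignored */
  if (s->imp->seekable_p == NULL)
    return 1;                     /* a rewinder alone is enough */
  return s->imp->seekable_p (s) != 0;
}

/* Stand-in methods for trying the function out. */
static int toy_rewind(toy_stream *s) { (void) s; return 0; }
static int toy_always(toy_stream *s) { (void) s; return 1; }
static int toy_never(toy_stream *s) { (void) s; return 0; }
```

A stream type can therefore supply a rewinder and still report itself as not seekable in particular cases.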
7083
7084 @deftypefn {Lstream Method} int flusher (Lstream *@var{stream})
7085 Perform any additional operations necessary to flush the data in this
7086 stream.
7087 @end deftypefn
7088
7089 @deftypefn {Lstream Method} int pseudo_closer (Lstream *@var{stream})
7090 @end deftypefn
7091
7092 @deftypefn {Lstream Method} int closer (Lstream *@var{stream})
7093 Perform any additional operations necessary to close this stream down.
7094 May be @code{NULL}.  This function is called when @code{Lstream_close()}
7095 is called or when the stream is garbage-collected.  When this function
7096 is called, all pending data in the stream will already have been written
7097 out.
7098 @end deftypefn
7099
7100 @deftypefn {Lstream Method} Lisp_Object marker (Lisp_Object @var{lstream}, void (*@var{markfun}) (Lisp_Object))
7101 Mark this object for garbage collection.  Same semantics as a standard
7102 @code{Lisp_Object} marker.  This function can be @code{NULL}.
7103 @end deftypefn
7104
7105 @node Consoles; Devices; Frames; Windows, The Redisplay Mechanism, Lstreams, Top
7106 @chapter Consoles; Devices; Frames; Windows
7107
7108 @menu
7109 * Introduction to Consoles; Devices; Frames; Windows::
7110 * Point::
7111 * Window Hierarchy::
7112 * The Window Object::
7113 @end menu
7114
7115 @node Introduction to Consoles; Devices; Frames; Windows
7116 @section Introduction to Consoles; Devices; Frames; Windows
7117
7118 A window-system window that you see on the screen is called a
7119 @dfn{frame} in Emacs terminology.  Each frame is subdivided into one or
7120 more non-overlapping panes, called (confusingly) @dfn{windows}.  Each
7121 window displays the text of a buffer in it. (See above on Buffers.) Note
7122 that buffers and windows are independent entities: Two or more windows
7123 can be displaying the same buffer (potentially in different locations),
7124 and a buffer can be displayed in no windows.
7125
  A single display screen that contains one or more frames is called
a @dfn{display} (a @dfn{device} in XEmacs terminology).  Under most
circumstances, there is only one display.
7128 However, more than one display can exist, for example if you have
7129 a @dfn{multi-headed} console, i.e. one with a single keyboard but
7130 multiple displays. (Typically in such a situation, the various
7131 displays act like one large display, in that the mouse is only
7132 in one of them at a time, and moving the mouse off of one moves
7133 it into another.) In some cases, the different displays will
7134 have different characteristics, e.g. one color and one mono.
7135
7136   XEmacs can display frames on multiple displays.  It can even deal
7137 simultaneously with frames on multiple keyboards (called @dfn{consoles} in
7138 XEmacs terminology).  Here is one case where this might be useful: You
7139 are using XEmacs on your workstation at work, and leave it running.
7140 Then you go home and dial in on a TTY line, and you can use the
7141 already-running XEmacs process to display another frame on your local
7142 TTY.
7143
7144   Thus, there is a hierarchy console -> display -> frame -> window.
7145 There is a separate Lisp object type for each of these four concepts.
7146 Furthermore, there is logically a @dfn{selected console},
7147 @dfn{selected display}, @dfn{selected frame}, and @dfn{selected window}.
7148 Each of these objects is distinguished in various ways, such as being the
7149 default object for various functions that act on objects of that type.
Note that every containing object remembers the ``selected'' object

7151 among the objects that it contains: e.g. not only is there a selected
7152 window, but every frame remembers the last window in it that was
7153 selected, and changing the selected frame causes the remembered window
7154 within it to become the selected window.  Similar relationships apply
7155 for consoles to devices and devices to frames.
7156
7157 @node Point
7158 @section Point
7159
7160   Recall that every buffer has a current insertion position, called
7161 @dfn{point}.  Now, two or more windows may be displaying the same buffer,
7162 and the text cursor in the two windows (i.e. @code{point}) can be in
7163 two different places.  You may ask, how can that be, since each
7164 buffer has only one value of @code{point}?  The answer is that each window
also has a value of @code{point} that is squirreled away in it.  There
is only one selected window, and the buffer's own value of @code{point}
is the one that belongs to that window.  When the selected window is changed
7168 from one window to another displaying the same buffer, the old
7169 value of @code{point} is stored into the old window's ``point'' and the
7170 value of @code{point} from the new window is retrieved and made the
7171 value of @code{point} in the buffer.  This means that @code{window-point}
7172 for the selected window is potentially inaccurate, and if you
7173 want to retrieve the correct value of @code{point} for a window,
7174 you must special-case on the selected window and retrieve the
7175 buffer's point instead.  This is related to why @code{save-window-excursion}
7176 does not save the selected window's value of @code{point}.
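
The exchange of values on a change of selected window can be sketched as follows.  This is an illustration only: the structures and functions are hypothetical, and positions are plain integers here, whereas the real window fields hold markers.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int point; } toy_buffer;

typedef struct {
  toy_buffer *buffer;
  int pointm;            /* stashed point, stale while selected */
} toy_window;

typedef struct { toy_window *selected; } toy_state;

/* Stash the buffer's point in the old window, then load the new
   window's stashed value into its buffer. */
static void toy_select_window(toy_state *st, toy_window *w)
{
  toy_window *old = st->selected;
  if (old == w)
    return;
  if (old != NULL)
    old->pointm = old->buffer->point;
  st->selected = w;
  w->buffer->point = w->pointm;
}

/* Correct value of point for any window: special-case the selected
   window and read the buffer instead of the stashed value. */
static int toy_window_point(const toy_state *st, const toy_window *w)
{
  return w == st->selected ? w->buffer->point : w->pointm;
}
```

While a window is selected its stashed value lags behind the buffer, which is why @code{toy_window_point} consults the buffer in that case.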
7177
7178 @node Window Hierarchy
7179 @section Window Hierarchy
7180 @cindex window hierarchy
7181 @cindex hierarchy of windows
7182
7183   If a frame contains multiple windows (panes), they are always created
7184 by splitting an existing window along the horizontal or vertical axis.
7185 Terminology is a bit confusing here: to @dfn{split a window
7186 horizontally} means to create two side-by-side windows, i.e. to make a
7187 @emph{vertical} cut in a window.  Likewise, to @dfn{split a window
7188 vertically} means to create two windows, one above the other, by making
7189 a @emph{horizontal} cut.
7190
7191   If you split a window and then split again along the same axis, you
7192 will end up with a number of panes all arranged along the same axis.
7193 The precise way in which the splits were made should not be important,
7194 and this is reflected internally.  Internally, all windows are arranged
7195 in a tree, consisting of two types of windows, @dfn{combination} windows
7196 (which have children, and are covered completely by those children) and
7197 @dfn{leaf} windows, which have no children and are visible.  Every
7198 combination window has two or more children, all arranged along the same
axis.  There are (logically) two subtypes of combination windows, depending on
7200 whether their children are horizontally or vertically arrayed.  There is
7201 always one root window, which is either a leaf window (if the frame
7202 contains only one window) or a combination window (if the frame contains
7203 more than one window).  In the latter case, the root window will have
7204 two or more children, either horizontally or vertically arrayed, and
7205 each of those children will be either a leaf window or another
7206 combination window.
7207
7208   Here are some rules:
7209
7210 @enumerate
7211 @item
7212 Horizontal combination windows can never have children that are
7213 horizontal combination windows; same for vertical.
7214
7215 @item
7216 Only leaf windows can be split (obviously) and this splitting does one
7217 of two things: (a) turns the leaf window into a combination window and
7218 creates two new leaf children, or (b) turns the leaf window into one of
7219 the two new leaves and creates the other leaf.  Rule (1) dictates which
7220 of these two outcomes happens.
7221
7222 @item
7223 Every combination window must have at least two children.
7224
7225 @item
7226 Leaf windows can never become combination windows.  They can be deleted,
7227 however.  If this results in a violation of (3), the parent combination
7228 window also gets deleted.
7229
7230 @item
7231 All functions that accept windows must be prepared to accept combination
windows, and do something sane (e.g. signal an error if given one).
7233 Combination windows @emph{do} escape to the Lisp level.
7234
7235 @item
7236 All windows have three fields governing their contents:
7237 these are @dfn{hchild} (a list of horizontally-arrayed children),
7238 @dfn{vchild} (a list of vertically-arrayed children), and @dfn{buffer}
7239 (the buffer contained in a leaf window).  Exactly one of
7240 these will be non-nil.  Remember that @dfn{horizontally-arrayed}
7241 means ``side-by-side'' and @dfn{vertically-arrayed} means
``one above the other''.
7243
7244 @item
7245 Leaf windows also have markers in their @code{start} (the
7246 first buffer position displayed in the window) and @code{pointm}
7247 (the window's stashed value of @code{point} -- see above) fields,
7248 while combination windows have nil in these fields.
7249
7250 @item
7251 The list of children for a window is threaded through the
7252 @code{next} and @code{prev} fields of each child window.
7253
7254 @item
7255 @strong{Deleted windows can be undeleted}.  This happens as a result of
7256 restoring a window configuration, and is unlike frames, displays, and
7257 consoles, which, once deleted, can never be restored.  Deleting a window
7258 does nothing except set a special @code{dead} bit to 1 and clear out the
7259 @code{next}, @code{prev}, @code{hchild}, and @code{vchild} fields, for
7260 GC purposes.
7261
7262 @item
7263 Most frames actually have two top-level windows -- one for the
7264 minibuffer and one (the @dfn{root}) for everything else.  The modeline
7265 (if present) separates these two.  The @code{next} field of the root
7266 points to the minibuffer, and the @code{prev} field of the minibuffer
7267 points to the root.  The other @code{next} and @code{prev} fields are
7268 @code{nil}, and the frame points to both of these windows.
7269 Minibuffer-less frames have no minibuffer window, and the @code{next}
7270 and @code{prev} of the root window are @code{nil}.  Minibuffer-only
7271 frames have no root window, and the @code{next} of the minibuffer window
7272 is @code{nil} but the @code{prev} points to itself. (#### This is an
7273 artifact that should be fixed.)
7274 @end enumerate
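
Several of these rules can be expressed as a consistency check over the tree.  The following is a sketch, not the real window structure: @code{toy_window} is a hypothetical type that uses C pointers where the real fields are Lisp objects, and a string stands in for the buffer.

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_window toy_window;

struct toy_window {
  toy_window *hchild;    /* first of the side-by-side children */
  toy_window *vchild;    /* first of the stacked children */
  const char *buffer;    /* set only in leaf windows */
  toy_window *next, *prev, *parent;
};

/* Nonzero if W and everything below it obeys the rules:
   exactly one of hchild, vchild and buffer is set; a combination
   window has at least two children; and no child is a combination
   window along the same axis as its parent. */
static int toy_window_valid(const toy_window *w)
{
  int set = (w->hchild != NULL) + (w->vchild != NULL) + (w->buffer != NULL);
  const toy_window *c;
  int children = 0;

  if (set != 1)
    return 0;
  if (w->buffer != NULL)
    return 1;                       /* a leaf */
  for (c = w->hchild ? w->hchild : w->vchild; c != NULL; c = c->next) {
    children++;
    if (w->hchild != NULL && c->hchild != NULL)
      return 0;
    if (w->vchild != NULL && c->vchild != NULL)
      return 0;
    if (!toy_window_valid(c))
      return 0;
  }
  return children >= 2;
}
```

The children are walked through @code{next}, matching the way the list of children is threaded through the sibling fields.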
7275
7276 @node The Window Object
7277 @section The Window Object
7278
7279   Windows have the following accessible fields:
7280
7281 @table @code
7282 @item frame
7283 The frame that this window is on.
7284
7285 @item mini_p
7286 Non-@code{nil} if this window is a minibuffer window.
7287
7288 @item buffer
7289 The buffer that the window is displaying.  This may change often during
7290 the life of the window.
7291
7292 @item dedicated
7293 Non-@code{nil} if this window is dedicated to its buffer.
7294
7295 @item pointm
7296 @cindex window point internals
7297 This is the value of point in the current buffer when this window is
7298 selected; when it is not selected, it retains its previous value.
7299
7300 @item start
7301 The position in the buffer that is the first character to be displayed
7302 in the window.
7303
7304 @item force_start
7305 If this flag is non-@code{nil}, it says that the window has been
7306 scrolled explicitly by the Lisp program.  This affects what the next
7307 redisplay does if point is off the screen: instead of scrolling the
7308 window to show the text around point, it moves point to a location that
7309 is on the screen.
7310
7311 @item last_modified
7312 The @code{modified} field of the window's buffer, as of the last time
7313 a redisplay completed in this window.
7314
7315 @item last_point
7316 The buffer's value of point, as of the last time
7317 a redisplay completed in this window.
7318
7319 @item left
7320 This is the left-hand edge of the window, measured in columns.  (The
7321 leftmost column on the screen is @w{column 0}.)
7322
7323 @item top
7324 This is the top edge of the window, measured in lines.  (The top line on
7325 the screen is @w{line 0}.)
7326
7327 @item height
7328 The height of the window, measured in lines.
7329
7330 @item width
7331 The width of the window, measured in columns.
7332
7333 @item next
7334 This is the window that is the next in the chain of siblings.  It is
7335 @code{nil} in a window that is the rightmost or bottommost of a group of
7336 siblings.
7337
7338 @item prev
7339 This is the window that is the previous in the chain of siblings.  It is
7340 @code{nil} in a window that is the leftmost or topmost of a group of
7341 siblings.
7342
7343 @item parent
7344 Internally, XEmacs arranges windows in a tree; each group of siblings has
7345 a parent window whose area includes all the siblings.  This field points
7346 to a window's parent.
7347
7348 Parent windows do not display buffers, and play little role in display
7349 except to shape their child windows.  Emacs Lisp programs usually have
7350 no access to the parent windows; they operate on the windows at the
7351 leaves of the tree, which actually display buffers.
7352
7353 @item hscroll
7354 This is the number of columns that the display in the window is scrolled
7355 horizontally to the left.  Normally, this is 0.
7356
7357 @item use_time
7358 This is the last time that the window was selected.  The function
7359 @code{get-lru-window} uses this field.
7360
7361 @item display_table
7362 The window's display table, or @code{nil} if none is specified for it.
7363
7364 @item update_mode_line
7365 Non-@code{nil} means this window's mode line needs to be updated.
7366
7367 @item base_line_number
7368 The line number of a certain position in the buffer, or @code{nil}.
7369 This is used for displaying the line number of point in the mode line.
7370
7371 @item base_line_pos
7372 The position in the buffer for which the line number is known, or
7373 @code{nil} meaning none is known.
7374
7375 @item region_showing
7376 If the region (or part of it) is highlighted in this window, this field
7377 holds the mark position that made one end of that region.  Otherwise,
7378 this field is @code{nil}.
7379 @end table
7380
7381 @node The Redisplay Mechanism, Extents, Consoles; Devices; Frames; Windows, Top
7382 @chapter The Redisplay Mechanism
7383
7384   The redisplay mechanism is one of the most complicated sections of
7385 XEmacs, especially from a conceptual standpoint.  This is doubly so
7386 because, unlike for the basic aspects of the Lisp interpreter, the
7387 computer science theories of how to efficiently handle redisplay are not
7388 well-developed.
7389
7390   When working with the redisplay mechanism, remember the Golden Rules
7391 of Redisplay:
7392
7393 @enumerate
7394 @item
7395 It Is Better To Be Correct Than Fast.
7396 @item
7397 Thou Shalt Not Run Elisp From Within Redisplay.
7398 @item
7399 It Is Better To Be Fast Than Not To Be.
7400 @end enumerate
7401
7402 @menu
7403 * Critical Redisplay Sections::
7404 * Line Start Cache::
7405 @end menu
7406
7407 @node Critical Redisplay Sections
7408 @section Critical Redisplay Sections
7409 @cindex critical redisplay sections
7410
7411 Within this section, we are defenseless and assume that the
7412 following cannot happen:
7413
7414 @enumerate
7415 @item
7416 garbage collection
7417 @item
7418 Lisp code evaluation
7419 @item
7420 frame size changes
7421 @end enumerate
7422
7423 We ensure (3) by calling @code{hold_frame_size_changes()}, which
7424 will cause any pending frame size changes to get put on hold
7425 till after the end of the critical section.  (1) follows
7426 automatically if (2) is met.  #### Unfortunately, there are
7427 some places where Lisp code can be called within this section.
7428 We need to remove them.
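
One way to picture what putting size changes on hold means is a counter that defers requests until the hold is released.  This is purely a sketch under that assumption: every @code{toy_} name is hypothetical, and the real mechanism behind @code{hold_frame_size_changes()} may be organized differently.

```c
#include <assert.h>

/* Hypothetical frame with one deferred size request. */
typedef struct {
  int width, height;
  int pending;
  int new_width, new_height;
} toy_frame;

static int toy_hold_count;

/* Enter a section in which size changes must not take effect. */
static void toy_hold_size_changes(void)
{
  toy_hold_count++;
}

/* Apply a size change now, or remember it if changes are on hold. */
static void toy_request_size(toy_frame *f, int width, int height)
{
  if (toy_hold_count > 0) {
    f->pending = 1;
    f->new_width = width;
    f->new_height = height;
  } else {
    f->width = width;
    f->height = height;
  }
}

/* Leave the section; apply a deferred change once no hold remains. */
static void toy_unhold_size_changes(toy_frame *f)
{
  assert(toy_hold_count > 0);
  if (--toy_hold_count == 0 && f->pending) {
    f->width = f->new_width;
    f->height = f->new_height;
    f->pending = 0;
  }
}
```

Code inside the held section can then rely on the frame dimensions staying fixed, which is the property the critical section needs.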
7429
7430 If @code{Fsignal()} is called during this critical section, we
7431 will @code{abort()}.
7432
7433 If garbage collection is called during this critical section,
7434 we simply return. #### We should abort instead.
7435
7436 #### If a frame-size change does occur we should probably
7437 actually be preempting redisplay.
7438
7439 @node Line Start Cache
7440 @section Line Start Cache
7441 @cindex line start cache
7442
7443   The traditional scrolling code in Emacs breaks in a variable height
7444 world.  It depends on the key assumption that the number of lines that
7445 can be displayed at any given time is fixed.  This led to a complete
7446 separation of the scrolling code from the redisplay code.  In order to
7447 fully support variable height lines, the scrolling code must actually be
7448 tightly integrated with redisplay.  Only redisplay can determine how
7449 many lines will be displayed on a screen for any given starting point.
7450
7451   What is ideally wanted is a complete list of the starting buffer
7452 position for every possible display line of a buffer along with the
7453 height of that display line.  Maintaining such a full list would be very
7454 expensive.  We settle for having it include information for all areas
7455 which we happen to generate anyhow (i.e. the region currently being
7456 displayed) and for those areas we need to work with.
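
As an illustration of why the heights are stored with the start positions, the sketch below counts how many cached display lines fit in a window of a given pixel height.  The layout is an assumption made for this example: @code{toy_line} and @code{toy_lines_that_fit} are hypothetical and do not describe the real cache entries.

```c
#include <assert.h>

/* Hypothetical cache entry: where a display line starts in the
   buffer and how tall it is in pixels. */
typedef struct {
  int start;
  int height;
} toy_line;

/* Number of whole cached lines, beginning at index FIRST, that fit
   in WINDOW_HEIGHT pixels. */
static int toy_lines_that_fit(const toy_line *cache, int count,
                              int first, int window_height)
{
  int used = 0;
  int n = 0;
  int i;
  for (i = first; i < count; i++) {
    if (used + cache[i].height > window_height)
      break;
    used += cache[i].height;
    n++;
  }
  return n;
}
```

With fixed-height lines the answer would be a constant; with variable heights it depends on the starting line, so the scrolling code has to ask.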
7457
7458   In order to ensure that the cache accurately represents what redisplay
7459 would actually show, it is necessary to invalidate it in many
7460 situations.  If the buffer changes, the starting positions may no longer
7461 be correct.  If a face or an extent has changed then the line heights
7462 may have altered.  These events happen frequently enough that the cache
7463 can end up being constantly disabled.  With this potentially constant
invalidation, when is the cache ever useful?
7465
7466   Even if the cache is invalidated before every single usage, it is
7467 necessary.  Scrolling often requires knowledge about display lines which
7468 are actually above or below the visible region.  The cache provides a
7469 convenient light-weight method of storing this information for multiple
7470 display regions.  This knowledge is necessary for the scrolling code to
7471 always obey the First Golden Rule of Redisplay.
7472
7473   If the cache already contains all of the information that the scrolling
7474 routines happen to need so that it doesn't have to go generate it, then
7475 we are able to obey the Third Golden Rule of Redisplay.  The first thing
7476 we do to help out the cache is to always add the displayed region.  This
7477 region had to be generated anyway, so the cache ends up getting the
7478 information basically for free.  In those cases where a user is simply
7479 scrolling around viewing a buffer there is a high probability that this
7480 is sufficient to always provide the needed information.  The second
7481 thing we can do is be smart about invalidating the cache.
7482
7483   TODO -- Be smart about invalidating the cache.  Potential places:
7484
7485 @itemize @bullet
7486 @item
7487 Insertions at end-of-line which don't cause line-wraps do not alter the
7488 starting positions of any display lines.  These types of buffer
7489 modifications should not invalidate the cache.  This is actually a large
7490 optimization for redisplay speed as well.
7491 @item
7492 Buffer modifications frequently only affect the display of lines at and
7493 below where they occur.  In these situations we should only invalidate
7494 the part of the cache starting at where the modification occurs.
7495 @end itemize
7496
7497   In case you're wondering, the Second Golden Rule of Redisplay is not
7498 applicable.
7499
7500 @node Extents, Faces and Glyphs, The Redisplay Mechanism, Top
7501 @chapter Extents
7502
7503 @menu
7504 * Introduction to Extents::     Extents are ranges over text, with properties.
7505 * Extent Ordering::             How extents are ordered internally.
7506 * Format of the Extent Info::   The extent information in a buffer or string.
7507 * Zero-Length Extents::         A weird special case.
7508 * Mathematics of Extent Ordering::      A rigorous foundation.
7509 * Extent Fragments::            Cached information useful for redisplay.
7510 @end menu
7511
7512 @node Introduction to Extents
7513 @section Introduction to Extents
7514
7515   Extents are regions over a buffer, with a start and an end position
7516 denoting the region of the buffer included in the extent.  In
7517 addition, either end can be closed or open, meaning that the endpoint
7518 is or is not logically included in the extent.  Insertion of a character
7519 at a closed endpoint causes the character to go inside the extent;
7520 insertion at an open endpoint causes the character to go outside.
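
The effect of open and closed endpoints on insertion can be sketched for an extent of nonzero length.  The names are hypothetical, and plain character positions are used for simplicity where the real code works with memory indices.

```c
#include <assert.h>

/* Hypothetical extent with open/closed flags for each endpoint. */
typedef struct {
  int start, end;
  int start_open, end_open;
} toy_extent;

/* Adjust E for LEN characters inserted at POS.  Text inserted at a
   closed endpoint ends up inside the extent; text inserted at an
   open endpoint ends up outside it. */
static void toy_adjust_for_insertion(toy_extent *e, int pos, int len)
{
  assert(e->start < e->end);
  if (pos < e->start || (pos == e->start && e->start_open))
    e->start += len;   /* inserted text lands before the extent */
  if (pos < e->end || (pos == e->end && !e->end_open))
    e->end += len;     /* inserted text lands before the end */
}
```

An insertion at a closed start leaves @code{start} alone and moves @code{end}, so the new text is covered; at an open start both endpoints move and the new text stays outside.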
7521
7522   Extent endpoints are stored using memory indices (see @file{insdel.c}),
7523 to minimize the amount of adjusting that needs to be done when
7524 characters are inserted or deleted.
7525
7526   (Formerly, extent endpoints at the gap could be either before or
7527 after the gap, depending on the open/closedness of the endpoint.
7528 The intent of this was to make it so that insertions would
7529 automatically go inside or out of extents as necessary with no
7530 further work needing to be done.  It didn't work out that way,
7531 however, and just ended up complexifying and buggifying all the
7532 rest of the code.)
7533
7534 @node Extent Ordering
7535 @section Extent Ordering
7536
7537   Extents are compared using memory indices.  There are two orderings
7538 for extents and both orders are kept current at all times.  The normal
7539 or @dfn{display} order is as follows:
7540
7541 @example
7542 Extent A is ``less than'' extent B,
7543 that is, earlier in the display order,
7544   if:    A-start < B-start,
7545   or if: A-start = B-start, and A-end > B-end
7546 @end example
7547
7548   So if two extents begin at the same position, the larger of them is the
7549 earlier one in the display order (@code{EXTENT_LESS} is true).
7550
7551   For the e-order, the same thing holds:
7552
7553 @example
7554 Extent A is ``less than'' extent B in e-order,
that is, earlier in the e-order,
7556   if:    A-end < B-end,
7557   or if: A-end = B-end, and A-start > B-start
7558 @end example
7559
7560   So if two extents end at the same position, the smaller of them is the
7561 earlier one in the e-order (@code{EXTENT_E_LESS} is true).
7562
7563   The display order and the e-order are complementary orders: any
7564 theorem about the display order also applies to the e-order if you swap
7565 all occurrences of ``display order'' and ``e-order'', ``less than'' and
7566 ``greater than'', and ``extent start'' and ``extent end''.
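
Both orderings are simple two-key comparisons.  In the sketch below, @code{toy_extent_less} and @code{toy_extent_e_less} are hypothetical stand-ins written from the definitions above; they are not the real @code{EXTENT_LESS} and @code{EXTENT_E_LESS}.

```c
#include <assert.h>

/* Hypothetical extent reduced to its two endpoints. */
typedef struct {
  int start, end;
} toy_extent;

/* Display order: earlier start first; on a tie, the later end
   (the larger extent) first. */
static int toy_extent_less(const toy_extent *a, const toy_extent *b)
{
  return a->start < b->start
    || (a->start == b->start && a->end > b->end);
}

/* E-order: earlier end first; on a tie, the later start
   (the smaller extent) first. */
static int toy_extent_e_less(const toy_extent *a, const toy_extent *b)
{
  return a->end < b->end
    || (a->end == b->end && a->start > b->start);
}
```

Swapping the roles of @code{start} and @code{end} and reversing the tie-break turns one comparison into the other, which is the symmetry the preceding paragraph describes.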
7567
7568 @node Format of the Extent Info
7569 @section Format of the Extent Info
7570
7571   An extent-info structure consists of a list of the buffer or string's
7572 extents and a @dfn{stack of extents} that lists all of the extents over
7573 a particular position.  The stack-of-extents info is used for
7574 optimization purposes -- it basically caches some info that might
7575 be expensive to compute.  Certain otherwise hard computations are easy
7576 given the stack of extents over a particular position, and if the
7577 stack of extents over a nearby position is known (because it was
7578 calculated at some prior point in time), it's easy to move the stack
7579 of extents to the proper position.
7580
7581   Given that the stack of extents is an optimization, and given that
7582 it requires memory, a string's stack of extents is wiped out each
7583 time a garbage collection occurs.  Therefore, any time you retrieve
7584 the stack of extents, it might not be there.  If you need it to
7585 be there, use the @code{_force} version.
7586
7587   Similarly, a string may or may not have an extent_info structure.
7588 (Generally it won't if there haven't been any extents added to the
7589 string.) So use the @code{_force} version if you need the extent_info
7590 structure to be there.
7591
7592   A list of extents is maintained as a double gap array: one gap array
7593 is ordered by start index (the @dfn{display order}) and the other is
7594 ordered by end index (the @dfn{e-order}).  Note that positions in an
7595 extent list should logically be conceived of as referring @emph{to} a
7596 particular extent (as is the norm in programs) rather than sitting
7597 between two extents.  Note also that callers of these functions should
7598 not be aware of the fact that the extent list is implemented as an
7599 array, except for the fact that positions are integers (this should be
generalized to handle integers and linked lists equally well).
7601
7602 @node Zero-Length Extents
7603 @section Zero-Length Extents
7604
7605   Extents can be zero-length, and will end up that way if their endpoints
7606 are explicitly set that way or if their detachable property is nil
7607 and all the text in the extent is deleted. (The exception is open-open
7608 zero-length extents, which are barred from existing because there is
7609 no sensible way to define their properties.  Deletion of the text in
7610 an open-open extent causes it to be converted into a closed-open
7611 extent.)  Zero-length extents are primarily used to represent
7612 annotations, and behave as follows:
7613
7614 @enumerate
7615 @item
7616 Insertion at the position of a zero-length extent expands the extent
7617 if both endpoints are closed; goes after the extent if it is closed-open;
7618 and goes before the extent if it is open-closed.
7619
7620 @item
7621 Deletion of a character on a side of a zero-length extent whose
7622 corresponding endpoint is closed causes the extent to be detached if
7623 it is detachable; if the extent is not detachable or the corresponding
7624 endpoint is open, the extent remains in the buffer, moving as necessary.
7625 @end enumerate
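
The insertion rule for zero-length extents can be sketched as a three-way case split.  The names are hypothetical, and plain character positions are used where the real code works with memory indices.

```c
#include <assert.h>

/* Hypothetical extent with open/closed flags for each endpoint. */
typedef struct {
  int start, end;
  int start_open, end_open;
} toy_extent;

/* Adjust zero-length extent E for LEN characters inserted at its
   position. */
static void toy_insert_at_zero_length(toy_extent *e, int len)
{
  assert(e->start == e->end);
  assert(!(e->start_open && e->end_open));   /* open-open is barred */

  if (!e->start_open && !e->end_open) {
    e->end += len;          /* closed-closed: the extent expands */
  } else if (e->start_open) {
    e->start += len;        /* open-closed: text goes before, */
    e->end += len;          /* so the extent moves along */
  }
  /* closed-open: text goes after the extent, which stays put */
}
```

In the closed-open case the extent keeps its position while text accumulates after it, and in the open-closed case it rides ahead of the inserted text.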
7626
7627   Note that closed-open, non-detachable zero-length extents behave
7628 exactly like markers and that open-closed, non-detachable zero-length
7629 extents behave like the ``point-type'' marker in Mule.
7630
7631 @node Mathematics of Extent Ordering
7632 @section Mathematics of Extent Ordering
7633 @cindex extent mathematics
7634 @cindex mathematics of extents
7635 @cindex extent ordering
7636
7637 @cindex display order of extents
7638 @cindex extents, display order
7639   The extents in a buffer are ordered by ``display order'' because that
is the order in which the redisplay mechanism needs to process them.
7641 The e-order is an auxiliary ordering used to facilitate operations
7642 over extents.  The operations that can be performed on the ordered
7643 list of extents in a buffer are
7644
7645 @enumerate
7646 @item
7647 Locate where an extent would go if inserted into the list.
7648 @item
7649 Insert an extent into the list.
7650 @item
7651 Remove an extent from the list.
7652 @item
7653 Map over all the extents that overlap a range.
7654 @end enumerate
7655
7656   (4) requires being able to determine the first and last extents
7657 that overlap a range.
7658
7659   NOTE: @dfn{overlap} is used as follows:
7660
7661 @itemize @bullet
7662 @item
7663 two ranges overlap if they have at least one point in common.
7664 Whether the endpoints are open or closed makes a difference here.
7665 @item
7666 a point overlaps a range if the point is contained within the
7667 range; this is equivalent to treating a point @math{P} as the range
7668 @math{[P, P]}.
7669 @item
7670 In the case of an @emph{extent} overlapping a point or range, the extent
7671 is normally treated as having closed endpoints.  This applies
7672 consistently in the discussion of stacks of extents and such below.
7673 Note that this definition of overlap is not necessarily consistent with
7674 the extents that @code{map-extents} maps over, since @code{map-extents}
7675 sometimes pays attention to whether the endpoints of an extents are open
7676 or closed.  But for our purposes, it greatly simplifies things to treat
7677 all extents as having closed endpoints.
7678 @end itemize
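
  Under the simplification that all endpoints are closed, the overlap
test reduces to two comparisons.  The following Python sketch shows this;
the names @code{ranges_overlap} and @code{point_overlaps} are
hypothetical, and ranges are modeled as @code{(start, end)} tuples purely
for illustration.

```python
def ranges_overlap(a, b):
    """True if closed ranges a = (start, end) and b share a point."""
    return a[0] <= b[1] and b[0] <= a[1]


def point_overlaps(p, r):
    """A point P is treated as the closed range [P, P]."""
    return ranges_overlap((p, p), r)
```

  With closed endpoints, two ranges that merely touch, such as
@math{[1, 3]} and @math{[3, 5]}, count as overlapping.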

First, define @math{>}, @math{<}, @math{<=}, etc. as applied to extents
to mean comparison according to the display order.  Comparison between
an extent @math{E} and an index @math{I} means comparison between
@math{E} and the range @math{[I, I]}.

Also define @math{e>}, @math{e<}, @math{e<=}, etc. to mean comparison
according to the e-order.

For any range @math{R}, define @math{R(0)} to be the starting index of
the range and @math{R(1)} to be the ending index of the range.

For any extent @math{E}, define @math{E(next)} to be the extent directly
following @math{E}, and @math{E(prev)} to be the extent directly
preceding @math{E}.  Assume @math{E(next)} and @math{E(prev)} can be
determined from @math{E} in constant time.  (This is because we store
the extent list as a doubly linked list.)

Similarly, define @math{E(e-next)} and @math{E(e-prev)} to be the
extents directly following and preceding @math{E} in the e-order.

Now:

Let @math{R} be a range.
Let @math{F} be the first extent overlapping @math{R}.
Let @math{L} be the last extent overlapping @math{R}.

Theorem 1: @math{R(1)} lies between @math{L} and @math{L(next)},
i.e. @math{L <= R(1) < L(next)}.

  This follows easily from the definition of display order.  The
basic reason that this theorem applies is that the display order
sorts by increasing starting index.

  Therefore, we can determine @math{L} just by looking at where we would
insert @math{R(1)} into the list, and if we know @math{F} and are moving
forward over extents, we can easily determine when we've hit @math{L} by
comparing the extent we're at to @math{R(1)}.
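
  The stopping rule can be sketched in Python as follows.  This is a
model, not the XEmacs code: it assumes that the display order sorts by
increasing start and, for equal starts, by decreasing end, and the names
@code{display_key} and @code{map_overlapping} are hypothetical.  For
simplicity the loop starts at the head of the list instead of at
@math{F}, and it also checks each candidate's end, since an extent that
starts before @math{R(1)} may still end before @math{R(0)}.

```python
def display_key(extent):
    # Assumed display order: increasing start, then decreasing end.
    return (extent[0], -extent[1])


def map_overlapping(extents, r):
    """Return the extents overlapping closed range r, in display order.

    extents must already be sorted by display_key.
    """
    stop = display_key((r[1], r[1]))    # R(1) compared as the range [R(1), R(1)]
    found = []
    for e in extents:
        if display_key(e) > stop:
            break                       # e > R(1): nothing later can overlap
        if e[1] >= r[0]:                # e starts at or before R(1); check its end
            found.append(e)
    return found
```

  For the extents (1, 10), (3, 4), (5, 6), (8, 9) and (12, 15) and the
range @math{[5, 8]}, the loop collects (1, 10), (5, 6) and (8, 9) and
stops at (12, 15), the first extent greater than @math{R(1)}.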

Theorem 2: @math{F(e-prev) e< [1, R(0)] e<= F}.

  This is the analog of Theorem 1, and applies because the e-order
sorts by increasing ending index.

  Therefore, @math{F} can be found in the same amount of time as
operation (1), i.e. the time that it takes to locate where an extent
would go if inserted into the e-order list.

  If the lists were stored as balanced binary trees, then operation (1)
would take logarithmic time, which is usually quite fast.  However,
currently they're stored as simple doubly-linked lists, and instead we
do some caching to try to speed things up.
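
  To illustrate what a logarithmic-time operation (1) would look like,
the following Python sketch uses binary search over a sorted array as a
stand-in for a balanced tree.  It is not how XEmacs stores extents; the
names @code{locate} and @code{insert} are hypothetical, and the display
order is assumed to sort by increasing start and then by decreasing end.

```python
import bisect


def display_key(extent):
    # Assumed display order: increasing start, then decreasing end.
    return (extent[0], -extent[1])


def locate(keys, extent):
    """Operation (1): index at which extent would be inserted."""
    return bisect.bisect_left(keys, display_key(extent))


def insert(extents, keys, extent):
    """Operation (2): insert extent, keeping both lists in display order."""
    i = locate(keys, extent)
    extents.insert(i, extent)
    keys.insert(i, display_key(extent))
    return i
```

  Only the lookup is logarithmic here; inserting into a Python list is
still linear, which is one reason a real implementation would need a
tree, or caching of the kind described in the text.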

  Define a @dfn{stack of extents} (or @dfn{SOE}) as the set of extents
(ordered in the display order) that overlap an index @math{I}, together
with the SOE's @dfn{previous} extent, which is an extent that precedes
@math{I} in the e-order.  (Hopefully there will not be very many extents
between @math{I} and the previous extent.)

Now:

Let @math{I} be an index, let @math{S} be the stack of extents on
@math{I}, let @math{F} be the first extent in @math{S}, and let @math{P}
be @math{S}'s previous extent.

Theorem 3: The first extent in @math{S} is the first extent that overlaps
any range @math{[I, J]}.

Proof: Any extent that overlaps @math{[I, J]} but does not include
@math{I} must have a start index @math{> I}, and thus be greater than
any extent in @math{S}.

Therefore, finding the first extent that overlaps a range @math{R} is
the same as finding the first extent that overlaps @math{R(0)}.

Theorem 4: Let @math{I2} be an index such that @math{I2 > I}, and let
@math{F2} be the first extent that overlaps @math{I2}.  Then, either
@math{F2} is in @math{S} or @math{F2} is greater than any extent in
@math{S}.

Proof: If @math{F2} does not include @math{I} then its start index is
greater than @math{I} and thus it is greater than any extent in
@math{S}, including @math{F}.  Otherwise, @math{F2} includes @math{I}
and thus is in @math{S}, and thus @math{F2 >= F}.
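
  Theorems 3 and 4 can be checked on small examples with a brute-force
Python model.  The sketch below is illustrative only: it recomputes the
stack from scratch instead of caching it, it omits the SOE's previous
extent, and it assumes the display order sorts by increasing start and
then by decreasing end.  The names @code{stack_of_extents} and
@code{first_overlapping} are hypothetical.

```python
def display_key(extent):
    # Assumed display order: increasing start, then decreasing end.
    return (extent[0], -extent[1])


def stack_of_extents(extents, i):
    """Extents overlapping index i (closed endpoints), in display order."""
    return sorted((e for e in extents if e[0] <= i <= e[1]), key=display_key)


def first_overlapping(extents, r):
    """Brute-force first extent (display order) overlapping closed range r."""
    hits = [e for e in extents if e[0] <= r[1] and r[0] <= e[1]]
    return min(hits, key=display_key) if hits else None
```

  For the extents (1, 10), (3, 4), (4, 8), (6, 7) and (9, 12), the stack
on index 4 is (1, 10), (3, 4), (4, 8), and its first element is also the
first extent overlapping @math{[4, J]} for every @math{J >= 4}, as
Theorem 3 states.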

@node Extent Fragments
@section Extent Fragments
@cindex extent fragment

  Imagine that the buffer is divided up into contiguous, non-overlapping
@dfn{runs} of text such that no extent starts or ends within a run
(extents that abut the run don't count).

  An extent fragment is a structure that holds data about the run that
contains a particular buffer position (if the buffer position is at the
junction of two runs, the run after the position is used): the
beginning and end of the run, a list of all of the extents in that run,
the @dfn{merged face} that results from merging all of the faces
corresponding to those extents, the begin and end glyphs at the
beginning of the run, etc.  This is the information that redisplay needs
in order to display this run.

  Extent fragments have to be very quick to update to a new buffer
position when moving linearly through the buffer.  They rely on the
stack-of-extents code, which does the heavy-duty algorithmic work of
determining which extents overlie a particular position.
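
  The notion of a run can be made concrete with a short Python sketch.
It is a brute-force model, not the incremental update that the real
extent-fragment code performs: it assumes all extents lie within the
buffer, that the position satisfies @code{buf_start <= pos < buf_end},
and that zero-length extents are ignored.  The name @code{run_at} and its
parameters are hypothetical.

```python
def run_at(extents, pos, buf_start, buf_end):
    """Return (run_start, run_end, covering) for the run containing pos.

    extents are (start, end) pairs; a run is the span between two
    neighbouring extent boundaries.
    """
    bounds = set([buf_start, buf_end])
    for start, end in extents:
        bounds.add(start)
        bounds.add(end)
    # At a junction, pos is itself a boundary, so the run after it is chosen.
    run_start = max(b for b in bounds if b <= pos)
    run_end = min(b for b in bounds if b > pos)
    covering = [e for e in extents if e[0] <= run_start and e[1] >= run_end]
    return run_start, run_end, covering
```

  With the extents (1, 10), (3, 6) and (6, 8) in a buffer spanning 1 to
20, position 6 sits at the junction of two runs, and the sketch returns
the run from 6 to 8, covered by (1, 10) and (6, 8).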

@node Faces and Glyphs, Specifiers, Extents, Top
@chapter Faces and Glyphs

Not yet documented.

@node Specifiers, Menus, Faces and Glyphs, Top
@chapter Specifiers

Not yet documented.

@node Menus, Subprocesses, Specifiers, Top
@chapter Menus

  A menu is set by setting the value of the variable
@code{current-menubar} (which may be buffer-local) and then calling
@code{set-menubar-dirty-flag} to signal a change.  This will cause the
menu to be redrawn at the next redisplay.  The format of the data in
@code{current-menubar} is described in @file{menubar.c}.

  Internally the data in @code{current-menubar} is parsed into a tree of
@code{widget_value}s (defined in @file{lwlib.h}); this is accomplished
by the recursive function @code{menu_item_descriptor_to_widget_value()},
called by @code{compute_menubar_data()}.  Such a tree is deallocated
using @code{free_widget_value()}.

  @code{update_screen_menubars()} is one of the external entry points.
This checks to see, for each screen, if that screen's menubar needs to
be updated.  This is the case if

@enumerate
@item
@code{set-menubar-dirty-flag} was called since the last redisplay.  (This
function sets the C variable @code{menubar_has_changed}.)
@item
The buffer displayed in the screen has changed.
@item
The screen has no menubar currently displayed.
@end enumerate

  @code{set_screen_menubar()} is called for each such screen.  This
function calls @code{compute_menubar_data()} to create the tree of
@code{widget_value}s, then calls @code{lw_create_widget()},
@code{lw_modify_all_widgets()}, and/or @code{lw_destroy_all_widgets()}
to create the X-Toolkit widget associated with the menu.
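
  The three update conditions amount to a simple disjunction, which the
following Python model makes explicit.  It is only a sketch of the
decision, not of the C code: the function names and the tuple layout for
a screen are hypothetical, and the global dirty flag is passed in as an
ordinary argument.

```python
def screen_needs_menubar_update(dirty_flag, buffer_changed, has_menubar):
    """True if any of the three update conditions holds for a screen."""
    return dirty_flag or buffer_changed or not has_menubar


def screens_to_update(screens, dirty_flag):
    """Names of screens whose menubar would be rebuilt.

    screens is a list of (name, buffer_changed, has_menubar) tuples.
    """
    return [name for name, changed, has_bar in screens
            if screen_needs_menubar_update(dirty_flag, changed, has_bar)]
```

  Because the dirty flag is global, setting it selects every screen,
whereas the other two conditions are evaluated per screen.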

  @code{update_psheets()}, the other external entry point, actually
changes the menus being displayed.  It uses the widgets fixed by
@code{update_screen_menubars()} and calls various X functions to ensure
that the menus are displayed properly.

  The menubar widget is set up so that @code{pre_activate_callback()} is
called when the menu is first selected (i.e. mouse button goes down),
and @code{menubar_selection_callback()} is called when an item is
selected.  @code{pre_activate_callback()} calls the function in
@code{activate-menubar-hook}, which can change the menubar (this is
described in @file{menubar.c}).  If the menubar is changed,
@code{set_screen_menubars()} is called.
@code{menubar_selection_callback()} enqueues a menu event, putting in it
a function to call (either @code{eval} or @code{call-interactively}) and
its argument, which is the callback function or form given in the menu's
description.

@node Subprocesses, Interface to X Windows, Menus, Top
@chapter Subprocesses

  The fields of a process are:

@table @code
@item name
A string, the name of the process.

@item command
A list containing the command arguments that were used to start this
process.

@item filter
A function used to accept output from the process instead of a buffer,
or @code{nil}.

@item sentinel
A function called whenever the process receives a signal, or @code{nil}.

@item buffer
The associated buffer of the process.

@item pid
An integer, the Unix process @sc{id}.

@item childp
A flag, non-@code{nil} if this is really a child process.
It is @code{nil} for a network connection.

@item mark
A marker indicating the position of the end of the last output from this
process inserted into the buffer.  This is often but not always the end
of the buffer.

@item kill_without_query
If this is non-@code{nil}, killing XEmacs while this process is still
running does not ask for confirmation about killing the process.

@item raw_status_low
@itemx raw_status_high
These two fields record 16 bits each of the process status returned by
the @code{wait} system call.

@item status
The process status, as @code{process-status} should return it.

@item tick
@itemx update_tick
If these two fields are not equal, a change in the status of the process
needs to be reported, either by running the sentinel or by inserting a
message in the process buffer.

@item pty_flag
Non-@code{nil} if communication with the subprocess uses a @sc{pty};
@code{nil} if it uses a pipe.

@item infd
The file descriptor for input from the process.

@item outfd
The file descriptor for output to the process.

@item subtty
The file descriptor for the terminal that the subprocess is using.  (On
some systems, there is no need to record this, so the value is
@code{-1}.)

@item tty_name
The name of the terminal that the subprocess is using,
or @code{nil} if it is using pipes.
@end table
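
  As an illustration of the @code{raw_status_low} and
@code{raw_status_high} pair and of the @code{tick} and
@code{update_tick} pair, here is a small Python model.  It assumes a
32-bit status word with the low field holding bits 0 through 15; that
layout and all three function names are assumptions made for the sketch,
not taken from the XEmacs sources.

```python
def split_status(status):
    """Split a 32-bit status word into (low, high) 16-bit halves."""
    return status & 0xFFFF, (status >> 16) & 0xFFFF


def join_status(low, high):
    """Rebuild the status word from its two 16-bit halves."""
    return (high << 16) | low


def status_change_pending(tick, update_tick):
    """A status change still needs reporting while the counters differ."""
    return tick != update_tick
```

  Splitting and rejoining is lossless for any value that fits in 32
bits, so nothing is lost by storing the status in two fields.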

@node Interface to X Windows, Index, Subprocesses, Top
@chapter Interface to X Windows

Not yet documented.

@include index.texi

@c Print the tables of contents
@summarycontents
@contents
@c That's all

@bye