\input texinfo  @c -*-texinfo-*-
@c %**start of header
@setfilename ../../info/internals.info
@settitle XEmacs Internals Manual
@c %**end of header

@ifinfo

Copyright @copyright{} 1992 - 1996 Ben Wing.
Copyright @copyright{} 1996, 1997 Sun Microsystems.
Copyright @copyright{} 1994, 1995 Free Software Foundation.
Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois.


Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

@ignore
Permission is granted to process this file through TeX and print the
results, provided the printed document carries copying permission notice
identical to this one except for the removal of this paragraph (this
paragraph not being relevant to the printed manual).

@end ignore
Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided that the
entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

Permission is granted to copy and distribute translations of this manual
into another language, under the above conditions for modified versions,
except that this permission notice may be stated in a translation
approved by the Foundation.

Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided also that the
section entitled ``GNU General Public License'' is included exactly as
in the original, and provided that the entire resulting derived work is
distributed under the terms of a permission notice identical to this
one.

Permission is granted to copy and distribute translations of this manual
into another language, under the above conditions for modified versions,
except that the section entitled ``GNU General Public License'' may be
included in a translation approved by the Free Software Foundation
instead of in the original English.
@end ifinfo

@c Combine indices.
@synindex cp fn
@syncodeindex vr fn
@syncodeindex ky fn
@syncodeindex pg fn
@syncodeindex tp fn

@setchapternewpage odd
@finalout

@titlepage
@title XEmacs Internals Manual
@subtitle Version 1.1, March 1997

@author Ben Wing
@author Martin Buchholz
@page
@vskip 0pt plus 1fill

@noindent
Copyright @copyright{} 1992 - 1996 Ben Wing. @*
Copyright @copyright{} 1996 Sun Microsystems, Inc. @*
Copyright @copyright{} 1994 Free Software Foundation. @*
Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois.

@sp 2
Version 1.1 @*
March, 1997.@*

Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided also that the
section entitled ``GNU General Public License'' is included
exactly as in the original, and provided that the entire resulting
derived work is distributed under the terms of a permission notice
identical to this one.

Permission is granted to copy and distribute translations of this manual
into another language, under the above conditions for modified versions,
except that the section entitled ``GNU General Public License'' may be
included in a translation approved by the Free Software Foundation
instead of in the original English.
@end titlepage
@page

@node Top, A History of Emacs, (dir), (dir)

@ifinfo
This Info file contains v1.1 of the XEmacs Internals Manual.
@end ifinfo

@menu
* A History of Emacs::          Times, dates, important events.
* XEmacs From the Outside::     A broad conceptual overview.
* The Lisp Language::           An overview.
* XEmacs From the Perspective of Building::
* XEmacs From the Inside::
* The XEmacs Object System (Abstractly Speaking)::
* How Lisp Objects Are Represented in C::
* Rules When Writing New C Code::
* A Summary of the Various XEmacs Modules::
* Allocation of Objects in XEmacs Lisp::
* Events and the Event Loop::
* Evaluation; Stack Frames; Bindings::
* Symbols and Variables::
* Buffers and Textual Representation::
* MULE Character Sets and Encodings::
* The Lisp Reader and Compiler::
* Lstreams::
* Consoles; Devices; Frames; Windows::
* The Redisplay Mechanism::
* Extents::
* Faces and Glyphs::
* Specifiers::
* Menus::
* Subprocesses::
* Interface to X Windows::
* Index::                   Index including concepts, functions, variables,
                              and other terms.

      --- The Detailed Node Listing ---

Here are other nodes that are inferiors of those already listed,
mentioned here so you can get to them in one step:

A History of Emacs

* Through Version 18::          Unification prevails.
* Lucid Emacs::                 One version 19 Emacs.
* GNU Emacs 19::                The other version 19 Emacs.
* GNU Emacs 20::                The other version 20 Emacs.
* XEmacs::                      The continuation of Lucid Emacs.

Rules When Writing New C Code

* General Coding Rules::
* Writing Lisp Primitives::
* Adding Global Lisp Variables::
* Techniques for XEmacs Developers::

A Summary of the Various XEmacs Modules

* Low-Level Modules::
* Basic Lisp Modules::
* Modules for Standard Editing Operations::
* Editor-Level Control Flow Modules::
* Modules for the Basic Displayable Lisp Objects::
* Modules for other Display-Related Lisp Objects::
* Modules for the Redisplay Mechanism::
* Modules for Interfacing with the File System::
* Modules for Other Aspects of the Lisp Interpreter and Object System::
* Modules for Interfacing with the Operating System::
* Modules for Interfacing with X Windows::
* Modules for Internationalization::

Allocation of Objects in XEmacs Lisp

* Introduction to Allocation::
* Garbage Collection::
* GCPROing::
* Integers and Characters::
* Allocation from Frob Blocks::
* lrecords::
* Low-level allocation::
* Pure Space::
* Cons::
* Vector::
* Bit Vector::
* Symbol::
* Marker::
* String::
* Bytecode::

Events and the Event Loop

* Introduction to Events::
* Main Loop::
* Specifics of the Event Gathering Mechanism::
* Specifics About the Emacs Event::
* The Event Stream Callback Routines::
* Other Event Loop Functions::
* Converting Events::
* Dispatching Events; The Command Builder::

Evaluation; Stack Frames; Bindings

* Evaluation::
* Dynamic Binding; The specbinding Stack; Unwind-Protects::
* Simple Special Forms::
* Catch and Throw::

Symbols and Variables

* Introduction to Symbols::
* Obarrays::
* Symbol Values::

Buffers and Textual Representation

* Introduction to Buffers::     A buffer holds a block of text such as a file.
* The Text in a Buffer::        Representation of the text in a buffer.
* Buffer Lists::                Keeping track of all buffers.
* Markers and Extents::         Tagging locations within a buffer.
* Bufbytes and Emchars::        Representation of individual characters.
* The Buffer Object::           The Lisp object corresponding to a buffer.

MULE Character Sets and Encodings

* Character Sets::
* Encodings::
* Internal Mule Encodings::

Encodings

* Japanese EUC (Extended Unix Code)::
* JIS7::

Internal Mule Encodings

* Internal String Encoding::
* Internal Character Encoding::

The Lisp Reader and Compiler

Lstreams

Consoles; Devices; Frames; Windows

* Introduction to Consoles; Devices; Frames; Windows::
* Point::
* Window Hierarchy::

The Redisplay Mechanism

* Critical Redisplay Sections::
* Line Start Cache::

Extents

* Introduction to Extents::     Extents are ranges over text, with properties.
* Extent Ordering::             How extents are ordered internally.
* Format of the Extent Info::   The extent information in a buffer or string.
* Zero-Length Extents::         A weird special case.
* Mathematics of Extent Ordering::      A rigorous foundation.
* Extent Fragments::            Cached information useful for redisplay.

Faces and Glyphs

Specifiers

Menus

Subprocesses

Interface to X Windows

@end menu

@node A History of Emacs, XEmacs From the Outside, Top, Top
@chapter A History of Emacs
@cindex history of Emacs
@cindex Hackers (Steven Levy)
@cindex Levy, Steven
@cindex ITS (Incompatible Timesharing System)
@cindex Stallman, Richard
@cindex RMS
@cindex MIT
@cindex TECO
@cindex FSF
@cindex Free Software Foundation

  XEmacs is a powerful, customizable text editor and development
environment.  It began as Lucid Emacs, which was in turn derived from
GNU Emacs, a program written by Richard Stallman of the Free Software
Foundation.  GNU Emacs dates back to the 1970's, and was modelled
after a package called ``Emacs'', written in 1976, that was a set of
macros on top of TECO, an old, old text editor written at MIT on the
DEC PDP-10 under one of the earliest time-sharing operating systems,
ITS (Incompatible Timesharing System). (ITS dates back well before
Unix.) ITS, TECO, and Emacs were products of a group of people at MIT
who called themselves ``hackers'', shared an idealistic belief system
about the free exchange of information, and were fanatical in their
devotion to (and time spent with) computers. (The hacker subculture
dates back to the late 1950's at MIT and is described in detail in
Steven Levy's book @cite{Hackers}.  This book also includes a lot of
information about Stallman himself and the development of Lisp, a
programming language developed at MIT that underlies Emacs.)

@menu
* Through Version 18::          Unification prevails.
* Lucid Emacs::                 One version 19 Emacs.
* GNU Emacs 19::                The other version 19 Emacs.
* GNU Emacs 20::                The other version 20 Emacs.
* XEmacs::                      The continuation of Lucid Emacs.
@end menu

@node Through Version 18
@section Through Version 18
@cindex Gosling, James
@cindex Great Usenet Renaming

  Although the history of the early versions of GNU Emacs is unclear,
the history is well-known from the middle of 1985.  A time line is:

@itemize @bullet
@item
GNU Emacs version 15 (15.34) was released sometime in 1984 or 1985 and
shared some code with a version of Emacs written by James Gosling (the
same James Gosling who later created the Java language).
@item
GNU Emacs version 16 (first released version was 16.56) was released on
July 15, 1985.  All Gosling code was removed due to potential copyright
problems with the code.
@item
version 16.57: released on September 16, 1985.
@item
versions 16.58, 16.59: released on September 17, 1985.
@item
version 16.60: released on September 19, 1985.  These later version 16
releases incorporated patches from the net, especially for getting Emacs
to work under System V.
@item
version 17.36 (first official v17 release) released on December 20,
1985.  Included a TeX-able user manual.  First official unpatched
version that worked on vanilla System V machines.
@item
version 17.43 (second official v17 release) released on January 25,
1986.
@item
version 17.45 released on January 30, 1986.
@item
version 17.46 released on February 4, 1986.
@item
version 17.48 released on February 10, 1986.
@item
version 17.49 released on February 12, 1986.
@item
version 17.55 released on March 18, 1986.
@item
version 17.57 released on March 27, 1986.
@item
version 17.58 released on April 4, 1986.
@item
version 17.61 released on April 12, 1986.
@item
version 17.63 released on May 7, 1986.
@item
version 17.64 released on May 12, 1986.
@item
version 18.24 (a beta version) released on October 2, 1986.
@item
version 18.30 (a beta version) released on November 15, 1986.
@item
version 18.31 (a beta version) released on November 23, 1986.
@item
version 18.32 (a beta version) released on December 7, 1986.
@item
version 18.33 (a beta version) released on December 12, 1986.
@item
version 18.35 (a beta version) released on January 5, 1987.
@item
version 18.36 (a beta version) released on January 21, 1987.
@item
January 27, 1987: The Great Usenet Renaming.  net.emacs is now
comp.emacs.
@item
version 18.37 (a beta version) released on February 12, 1987.
@item
version 18.38 (a beta version) released on March 3, 1987.
@item
version 18.39 (a beta version) released on March 14, 1987.
@item
version 18.40 (a beta version) released on March 18, 1987.
@item
version 18.41 (the first ``official'' release) released on March 22,
1987.
@item
version 18.45 released on June 2, 1987.
@item
version 18.46 released on June 9, 1987.
@item
version 18.47 released on June 18, 1987.
@item
version 18.48 released on September 3, 1987.
@item
version 18.49 released on September 18, 1987.
@item
version 18.50 released on February 13, 1988.
@item
version 18.51 released on May 7, 1988.
@item
version 18.52 released on September 1, 1988.
@item
version 18.53 released on February 24, 1989.
@item
version 18.54 released on April 26, 1989.
@item
version 18.55 released on August 23, 1989.  This is the earliest version
that is still available by FTP.
@item
version 18.56 released on January 17, 1991.
@item
version 18.57 released late January, 1991.
@item
version 18.58 released ?????.
@item
version 18.59 released October 31, 1992.
@end itemize

@node Lucid Emacs
@section Lucid Emacs
@cindex Lucid Emacs
@cindex Lucid Inc.
@cindex Energize
@cindex Epoch

  Lucid Emacs was developed by the (now-defunct) Lucid Inc., a maker of
C++ and Lisp development environments.  It began when Lucid decided they
wanted to use Emacs as the editor and cornerstone of their C++
development environment (called ``Energize'').  They needed many features
that were not available in the existing version of GNU Emacs (version
18.5something), in particular good and integrated support for GUI
elements such as mouse support, multiple fonts, multiple window-system
windows, etc.  A branch of GNU Emacs called Epoch, written at the
University of Illinois, existed that supplied many of these features;
however, Lucid needed more than what existed in Epoch.  At the time, the
Free Software Foundation was working on version 19 of Emacs (this was
sometime around 1991), which was planned to have similar features, and
so Lucid decided to work with the Free Software Foundation.  Their plan
was to add features that they needed, and coordinate with the FSF so
that the features would get included back into Emacs version 19.

  Delays in the release of version 19 occurred, however (resulting in it
finally being released more than a year after what was initially
planned), and Lucid encountered unexpected technical resistance in
getting their changes merged back into version 19, so they decided to
release their own version of Emacs, which became Lucid Emacs 19.0.

@cindex Zawinski, Jamie
@cindex Sexton, Harlan
@cindex Benson, Eric
@cindex Devin, Matthieu
  The initial authors of Lucid Emacs were Matthieu Devin, Harlan Sexton,
and Eric Benson, and the work was later taken over by Jamie Zawinski,
who became ``Mr. Lucid Emacs'' for many releases.

  A time line for Lucid Emacs/XEmacs is:

@itemize @bullet
@item
version 19.0 shipped with Energize 1.0, April 1992.
@item
version 19.1 released June 4, 1992.
@item
version 19.2 released June 19, 1992.
@item
version 19.3 released September 9, 1992.
@item
version 19.4 released January 21, 1993.
@item
version 19.5 was a repackaging of 19.4 with a few bug fixes and
shipped with Energize 2.0.  Never released to the net.
@item
version 19.6 released April 9, 1993.
@item
version 19.7 was a repackaging of 19.6 with a few bug fixes and
shipped with Energize 2.1.  Never released to the net.
@item
version 19.8 released September 6, 1993.
@item
version 19.9 released January 12, 1994.
@item
version 19.10 released May 27, 1994.
@item
version 19.11 (first XEmacs) released September 13, 1994.
@item
version 19.12 released June 23, 1995.
@item
version 19.13 released September 1, 1995.
@item
version 19.14 released June 23, 1996.
@item
version 20.0 released February 9, 1997.
@item
version 19.15 released March 28, 1997.
@item
version 20.1 (not released to the net) April 15, 1997.
@item
version 20.2 released May 16, 1997.
@item
version 19.16 released October 31, 1997.
@item
version 20.3 (the first stable version of XEmacs 20.x) released November 30,
1997.
@item
version 20.4 released February 28, 1998.
@end itemize

@node GNU Emacs 19
@section GNU Emacs 19
@cindex GNU Emacs 19
@cindex FSF Emacs

  About a year after the initial release of Lucid Emacs, the FSF
released a beta of their version of Emacs 19 (referred to here as ``GNU
Emacs'').  By this time, the current version of Lucid Emacs was
19.6. (Strangely, the first released beta from the FSF was GNU Emacs
19.7.) A time line for GNU Emacs version 19 is:

@itemize @bullet
@item
version 19.8 (beta) released May 27, 1993.
@item
version 19.9 (beta) released May 27, 1993.
@item
version 19.10 (beta) released May 30, 1993.
@item
version 19.11 (beta) released June 1, 1993.
@item
version 19.12 (beta) released June 2, 1993.
@item
version 19.13 (beta) released June 8, 1993.
@item
version 19.14 (beta) released June 17, 1993.
@item
version 19.15 (beta) released June 19, 1993.
@item
version 19.16 (beta) released July 6, 1993.
@item
version 19.17 (beta) released late July, 1993.
@item
version 19.18 (beta) released August 9, 1993.
@item
version 19.19 (beta) released August 15, 1993.
@item
version 19.20 (beta) released November 17, 1993.
@item
version 19.21 (beta) released November 17, 1993.
@item
version 19.22 (beta) released November 28, 1993.
@item
version 19.23 (beta) released May 17, 1994.
@item
version 19.24 (beta) released May 16, 1994.
@item
version 19.25 (beta) released June 3, 1994.
@item
version 19.26 (beta) released September 11, 1994.
@item
version 19.27 (beta) released September 14, 1994.
@item
version 19.28 (first ``official'' release) released November 1, 1994.
@item
version 19.29 released June 21, 1995.
@item
version 19.30 released November 24, 1995.
@item
version 19.31 released May 25, 1996.
@item
version 19.32 released July 31, 1996.
@item
version 19.33 released August 11, 1996.
@item
version 19.34 released August 21, 1996.
@item
version 19.34b released September 6, 1996.
@end itemize

@cindex Mlynarik, Richard
  In some ways, GNU Emacs 19 was better than Lucid Emacs; in some ways,
worse.  Lucid soon began incorporating features from GNU Emacs 19 into
Lucid Emacs; the work was mostly done by Richard Mlynarik, who had been
working on and using GNU Emacs for a long time (back as far as version
16 or 17).

@node GNU Emacs 20
@section GNU Emacs 20
@cindex GNU Emacs 20
@cindex FSF Emacs

On February 2, 1997, work began on integrating Mule into GNU Emacs.  The
first release of GNU Emacs 20 was made in September of that year.

A timeline for Emacs 20 is:

@itemize @bullet
@item
version 20.1 released September 17, 1997.
@item
version 20.2 released September 20, 1997.
@item
version 20.3 released August 19, 1998.
@end itemize

@node XEmacs
@section XEmacs
@cindex XEmacs

@cindex Sun Microsystems
@cindex University of Illinois
@cindex Illinois, University of
@cindex SPARCWorks
@cindex Andreessen, Marc
@cindex Baur, Steve
@cindex Buchholz, Martin
@cindex Kaplan, Simon
@cindex Wing, Ben
@cindex Thompson, Chuck
@cindex Win-Emacs
@cindex Epoch
@cindex Amdahl Corporation
  Around the time that Lucid was developing Energize, Sun Microsystems
was developing their own development environment (called ``SPARCWorks'')
and also decided to use Emacs.  They joined forces with the Epoch team
at the University of Illinois and later with Lucid.  The maintainer of
the last-released version of Epoch was Marc Andreessen, but he dropped
out, and the Epoch project, headed by Simon Kaplan, lured Chuck Thompson
away from a system administration job to become the primary Lucid Emacs
author for Epoch and Sun.  Chuck's area of specialty became the
redisplay engine (he replaced the old Lucid Emacs redisplay engine with
a ported version from Epoch and then later rewrote it from scratch).
Sun also hired Ben Wing (the author of Win-Emacs, a port of Lucid Emacs
to Microsoft Windows 3.1) in 1993, for what was initially a one-month
contract to fix some event problems but later became a many-year
involvement, punctuated by a six-month contract with Amdahl Corporation.

@cindex rename to XEmacs
  In 1994, Sun and Lucid agreed to rename Lucid Emacs to XEmacs (a name
not favorable to either company); the first release called XEmacs was
version 19.11.  In June 1994, Lucid folded and Jamie quit to work for
the newly formed Mosaic Communications Corp., later Netscape
Communications Corp. (co-founded by the same Marc Andreessen, who had
quit his Epoch job to work on a graphical browser for the World Wide
Web).  Chuck then became the primary maintainer of XEmacs and put out
versions 19.11 through 19.14 in conjunction with Ben.  For 19.12 and
19.13, Chuck added the new redisplay and many other display improvements,
and Ben added MULE support (support for Asian and other languages) and
redesigned most of the internal Lisp subsystems to better support the
MULE work and the various other features being added to XEmacs.  After
19.14, Chuck retired as primary maintainer and Steve Baur stepped in.

@cindex MULE merged XEmacs appears
  Soon after 19.13 was released, work began in earnest on the MULE
internationalization code and the source tree was divided into two
development paths.  The MULE version was initially called 19.20, but was
soon renamed to 20.0.  In 1996 Martin Buchholz of Sun Microsystems took
over the care and feeding of it and worked on it in parallel with the
19.14 development that was occurring at the same time.  After much work
by Martin, it was decided to release 20.0 ahead of 19.15 in February
1997.  The source tree remained divided until 20.2, when the version 19
source was finally retired at version 19.16.

@cindex Baur, Steve
@cindex Buchholz, Martin
@cindex Jones, Kyle
@cindex Niksic, Hrvoje
@cindex XEmacs goes it alone
  In 1997, Sun finally dropped all pretense of support for XEmacs, and
Martin Buchholz left the company in November.  Since then (and, because
Steve Baur was never paid to work on XEmacs, for most of the preceding
year as well), XEmacs has existed solely on the contributions of
volunteers from the Free Software Community.  Starting in 1997, Hrvoje
Niksic and Kyle Jones have figured prominently in XEmacs development.

@cindex merging attempts
  Many attempts have been made to merge XEmacs and GNU Emacs, but they
have consistently failed.

  A more detailed history is contained in the XEmacs About page.

@node XEmacs From the Outside, The Lisp Language, A History of Emacs, Top
@chapter XEmacs From the Outside
@cindex read-eval-print

  XEmacs appears to the outside world as an editor, but it is really a
Lisp environment.  At its heart is a Lisp interpreter; it also
``happens'' to contain many specialized object types (e.g. buffers,
windows, frames, events) that are useful for implementing an editor.
Some of these objects (in particular windows and frames) have
displayable representations, and XEmacs provides a function
@code{redisplay()} that ensures that the display of all such objects
matches their internal state.  Most of the time, a standard Lisp
environment is in a @dfn{read-eval-print} loop -- i.e. ``read some Lisp
code, execute it, and print the results''.  XEmacs has a similar loop:

@itemize @bullet
@item
read an event
@item
dispatch the event (i.e. ``do it'')
@item
redisplay
@end itemize

  Reading an event is done using the Lisp function @code{next-event},
which waits for something to happen (typically, the user presses a key
or moves the mouse) and returns an event object describing what
happened.  Dispatching an event is done using the Lisp function
@code{dispatch-event}, which looks up the event in a keymap object (a
particular kind of object that associates an event with a Lisp function)
and calls that function.  The function ``does'' what the user has
requested by changing the state of particular frame objects, buffer
objects, etc.  Finally, @code{redisplay()} is called, which updates the
display to reflect the changes just made.  Thus is an ``editor'' born.
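
  The read-dispatch-redisplay cycle can be illustrated with a small,
self-contained C sketch.  It is a toy model under heavily simplified
assumptions, not XEmacs code: events are plain character codes, the
``keymap'' is a tiny lookup table in which only DEL is bound (to a
deletion command) and every other key falls back to self-insertion, and
``redisplay'' just copies the buffer text into a display array.  All
names (@code{toy_editor}, @code{toy_editor_init}, @code{toy_next_event},
@code{toy_dispatch_event}, @code{toy_redisplay},
@code{toy_command_loop}) are hypothetical stand-ins for
@code{next-event}, @code{dispatch-event} and @code{redisplay()}.

```c
#include <assert.h>
#include <string.h>

/* Toy model of the top-level loop: read an event, dispatch it through
   a keymap, then redisplay.  All names here are hypothetical; this is
   not how XEmacs itself is written.  */

#define TOY_TEXT_SIZE 64
#define TOY_DEL 127                /* DEL: bound to a deletion command */

struct toy_editor
{
  char buffer[TOY_TEXT_SIZE];      /* text that commands modify */
  size_t point;                    /* insertion position (end of text) */
  char display[TOY_TEXT_SIZE];     /* what the "screen" currently shows */
  int redisplays;                  /* redisplay passes performed so far */
};

/* A command receives the editor state and the event that invoked it.  */
typedef void (*toy_command) (struct toy_editor *ed, int key);

struct toy_keymap_entry
{
  int key;                         /* an event: here just a character code */
  toy_command command;             /* function bound to that event */
};

static void
toy_self_insert (struct toy_editor *ed, int key)
{
  assert (ed->point < TOY_TEXT_SIZE);
  if (ed->point + 1 < TOY_TEXT_SIZE)      /* keep room for the NUL */
    {
      ed->buffer[ed->point++] = (char) key;
      ed->buffer[ed->point] = '\0';
    }
}

static void
toy_delete_backward (struct toy_editor *ed, int key)
{
  (void) key;
  if (ed->point > 0)
    ed->buffer[--ed->point] = '\0';
}

/* The keymap: only DEL is bound; other keys fall back to self-insert.  */
static const struct toy_keymap_entry toy_keymap[] = {
  { TOY_DEL, toy_delete_backward },
};

/* Stand-in for next-event: return the next pending event, or -1.  */
static int
toy_next_event (const char **input)
{
  if (**input == '\0')
    return -1;
  return (unsigned char) *(*input)++;
}

/* Stand-in for dispatch-event: look the event up and call its command.  */
static void
toy_dispatch_event (struct toy_editor *ed, int key)
{
  size_t i;
  for (i = 0; i < sizeof toy_keymap / sizeof toy_keymap[0]; i++)
    if (toy_keymap[i].key == key)
      {
        toy_keymap[i].command (ed, key);
        return;
      }
  toy_self_insert (ed, key);
}

/* Stand-in for redisplay(): make the display match the internal state.  */
static void
toy_redisplay (struct toy_editor *ed)
{
  strcpy (ed->display, ed->buffer);
  ed->redisplays++;
}

void
toy_editor_init (struct toy_editor *ed)
{
  memset (ed, 0, sizeof *ed);
}

/* The loop itself: read an event, dispatch it, redisplay; repeat.  */
void
toy_command_loop (struct toy_editor *ed, const char *input)
{
  int event;
  while ((event = toy_next_event (&input)) != -1)
    {
      toy_dispatch_event (ed, event);
      toy_redisplay (ed);
    }
}
```

  Running @code{toy_command_loop} over the input @code{"abc\177"} (three
letters followed by DEL) leaves @code{ab} in both the buffer and the
display, and redisplay runs four times, once per event.  Redisplaying
after every single event keeps the sketch simple; it is an assumption of
the sketch, not a statement about how often XEmacs actually redraws.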

@cindex bridge, playing
@cindex taxes, doing
@cindex pi, calculating
  Note that you do not have to use XEmacs as an editor; you could just
as well make it do your taxes, compute pi, play bridge, etc.  You'd just
have to write functions to do those operations in Lisp.

@node The Lisp Language, XEmacs From the Perspective of Building, XEmacs From the Outside, Top
@chapter The Lisp Language
@cindex Lisp vs. C
@cindex C vs. Lisp
@cindex Lisp vs. Java
@cindex Java vs. Lisp
@cindex dynamic scoping
@cindex scoping, dynamic
@cindex dynamic types
@cindex types, dynamic
@cindex Java
@cindex Common Lisp
@cindex Gosling, James

  Lisp is a general-purpose language that is higher-level than C and in
many ways more powerful than C.  Powerful dialects of Lisp such as
Common Lisp are probably much better languages for writing very large
applications than is C. (Unfortunately, for many non-technical
reasons, C and its successor C++ have become the dominant languages for
application development.  These languages are both inadequate for
extremely large applications, which is evidenced by the fact that newer,
larger programs are becoming ever harder to write and are requiring ever
more programmers despite great improvements in C development
environments; and by the fact that, although hardware speeds and
reliability have been growing at an exponential rate, most software is
still generally considered to be slow and buggy.)

  The new Java language holds promise as a better general-purpose
development language than C.  Java has many features in common with
Lisp that are not shared by C (this is not a coincidence, since
Java was designed by James Gosling, a former Lisp hacker).  This
will be discussed more later.

For those used to C, here is a summary of the basic differences between
C and Lisp:

@enumerate
@item
Lisp has an extremely regular syntax.  Every function, expression,
and control statement is written in the form

@example
   (@var{func} @var{arg1} @var{arg2} ...)
@end example

This is as opposed to C, which writes functions as

@example
   func(@var{arg1}, @var{arg2}, ...)
@end example

but writes expressions involving operators as (e.g.)

@example
   @var{arg1} + @var{arg2}
@end example

and writes control statements as (e.g.)

@example
   while (@var{expr}) @{ @var{statement1}; @var{statement2}; ... @}
@end example

Lisp equivalents of the latter two would be

@example
   (+ @var{arg1} @var{arg2} ...)
@end example

and

@example
   (while @var{expr} @var{statement1} @var{statement2} ...)
@end example

@item
Lisp is a safe language.  Assuming there are no bugs in the Lisp
interpreter/compiler, it is impossible to write a program that ``core
dumps'' or otherwise causes the machine to execute an illegal
instruction.  This is very different from C, where perhaps the most
common outcome of a bug is exactly such a crash.  A corollary of this is that
the C operation of casting a pointer is impossible (and unnecessary) in
Lisp, and that it is impossible to access memory outside the bounds of
an array.

@item
Programs and data are written in the same form.  The
parenthesis-enclosing form described above for statements is the same
form used for the most common data type in Lisp, the list.  Thus, it is
possible to represent any Lisp program using Lisp data types, and for
one program to construct Lisp statements and then dynamically
@dfn{evaluate} them, or cause them to execute.

@item
All objects are @dfn{dynamically typed}.  This means that part of every
object is an indication of what type it is.  A Lisp program can
manipulate an object without knowing what type it is, and can query an
object to determine its type.  Correspondingly, variables and function
parameters can hold objects of any type and are not normally declared
as being of any particular type.  This is opposed to the
@dfn{static typing} of C, where variables can hold exactly one
type of object and must be declared as such, and objects do not contain
an indication of their type because it's implicit in the variables they
are stored in.  It is possible in C to have a variable hold different
types of objects (e.g. through the use of @code{void *} pointers or
variable-argument functions), but the type information must then be
passed explicitly in some other fashion, leading to additional program
complexity.
 831
 832 @item
 833 Allocated memory is automatically reclaimed when it is no longer in use.
 834 This operation is called @dfn{garbage collection} and involves looking
 835 through all variables to see what memory is being pointed to, and
 836 reclaiming any memory that is not pointed to and is thus
``inaccessible'' and out of use.  In C, by contrast, allocated memory
must be explicitly reclaimed using @code{free()}.  If
 839 you simply drop all pointers to memory without freeing it, it becomes
 840 ``leaked'' memory that still takes up space.  Over a long period of
 841 time, this can cause your program to grow and grow until it runs out of
 842 memory.
 843
 844 @item
 845 Lisp has built-in facilities for handling errors and exceptions.  In C,
 846 when an error occurs, usually either the program exits entirely or the
 847 routine in which the error occurs returns a value indicating this.  If
an error occurs in a deeply-nested routine, then every routine in the
current call chain must return normally, passing an error value back up
to its caller.  This means that every routine must explicitly check
 851 for an error in all the routines it calls; if it does not do so,
 852 unexpected and often random behavior results.  This is an extremely
 853 common source of bugs in C programs.  An alternative would be to do a
 854 non-local exit using @code{longjmp()}, but that is often very dangerous
 855 because the routines that were exited past had no opportunity to clean
 856 up after themselves and may leave things in an inconsistent state,
 857 causing a crash shortly afterwards.
 858
 859 Lisp provides mechanisms to make such non-local exits safe.  When an
 860 error occurs, a routine simply signals that an error of a particular
 861 class has occurred, and a non-local exit takes place.  Any routine can
 862 trap errors occurring in routines it calls by registering an error
 863 handler for some or all classes of errors. (If no handler is registered,
 864 a default handler, generally installed by the top-level event loop, is
 865 executed; this prints out the error and continues.) Routines can also
 866 specify cleanup code (called an @dfn{unwind-protect}) that will be
 867 called when control exits from a block of code, no matter how that exit
 868 occurs -- i.e. even if a function deeply nested below it causes a
 869 non-local exit back to the top level.
 870
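Note that a similar facility (structured exception handling, using
@code{__try}, @code{__except}, and @code{__finally}) is available as a
vendor extension in some C compilers, in particular Visual C++ and other
PC compilers targeting the Microsoft Win32 API; it is not part of
standard C.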
 874
 875 @item
 876 In Emacs Lisp, local variables are @dfn{dynamically scoped}.  This means
 877 that if you declare a local variable in a particular function, and then
 878 call another function, that subfunction can ``see'' the local variable
 879 you declared.  This is actually considered a bug in Emacs Lisp and in
 880 all other early dialects of Lisp, and was corrected in Common Lisp. (In
 881 Common Lisp, you can still declare dynamically scoped variables if you
 882 want to -- they are sometimes useful -- but variables by default are
 883 @dfn{lexically scoped} as in C.)
 884 @end enumerate
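
  The error-handling facilities described in the list above can be
approximated in C, which shows what the Lisp interpreter does on the
programmer's behalf.  The following is a minimal sketch, assuming plain
@code{setjmp()} and @code{longjmp()}: handlers form a linked stack of
catch frames, cleanup code (the analogue of @code{unwind-protect}) is
pushed on a separate stack, and signalling an error runs the pending
cleanups before jumping to the innermost handler.  All of the names are
hypothetical; this is not the interpreter's actual code.

@verbatim
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the XEmacs API. */
typedef void (*cleanup_fn) (void *arg);

struct cleanup { cleanup_fn fn; void *arg; };

#define MAX_CLEANUPS 64          /* no overflow check in this sketch */
static struct cleanup cleanup_stack[MAX_CLEANUPS];
static int cleanup_depth;

/* One frame per active error handler, linked innermost first. */
struct catch_frame
{
  jmp_buf env;
  int saved_depth;               /* cleanup depth when the handler was set */
  struct catch_frame *prev;
};
static struct catch_frame *current_catch;
static int last_error_code;

/* Register cleanup code, like the unwind forms of unwind-protect. */
static void push_cleanup (cleanup_fn fn, void *arg)
{
  cleanup_stack[cleanup_depth].fn = fn;
  cleanup_stack[cleanup_depth].arg = arg;
  cleanup_depth++;
}

/* Pop and run cleanups, innermost first, until DEPTH is reached. */
static void unwind_to (int depth)
{
  while (cleanup_depth > depth)
    {
      cleanup_depth--;
      cleanup_stack[cleanup_depth].fn (cleanup_stack[cleanup_depth].arg);
    }
}

/* Signal an error: run every cleanup registered since the innermost
   handler was installed, then jump to it.  Assumes a handler exists. */
static void signal_error (int code)
{
  struct catch_frame *c = current_catch;
  unwind_to (c->saved_depth);
  current_catch = c->prev;
  last_error_code = code;
  longjmp (c->env, 1);
}

/* Call BODY (ARG) with a handler installed.  Returns 0 on normal exit,
   or the nonzero code passed to signal_error. */
static int call_with_handler (void (*body) (void *), void *arg)
{
  struct catch_frame frame;
  frame.saved_depth = cleanup_depth;
  frame.prev = current_catch;
  current_catch = &frame;
  if (setjmp (frame.env) == 0)
    {
      body (arg);
      current_catch = frame.prev;    /* normal exit: remove the handler */
      return 0;
    }
  return last_error_code;
}

/* Demo: a resource that must be released however the body exits. */
static int resource_open;
static void release_resource (void *unused) { (void) unused; resource_open = 0; }

static void risky_body (void *arg)
{
  int depth = cleanup_depth;
  resource_open = 1;
  push_cleanup (release_resource, NULL);
  if (*(int *) arg)
    signal_error (42);               /* non-local exit past this frame */
  unwind_to (depth);                 /* normal exit runs the cleanup too */
}
@end verbatim

With a nonzero flag, @code{call_with_handler (risky_body, &flag)}
returns 42 and the resource is still released, because the cleanups run
before the jump; that is exactly the step a bare @code{longjmp()} skips.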
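
  Dynamic scoping, the last item in the list above, is commonly
implemented by @dfn{shallow binding}: every variable has a single value
cell, and a local binding saves the old value on a stack, installs the
new one, and restores the old one when the binding form exits.  The
sketch below assumes integer-valued variables and uses hypothetical
names throughout (@code{fill-width} is not a real XEmacs variable).  A
real implementation must also undo bindings during a non-local exit, so
the unbinding step has to share the unwinding machinery used for
cleanup code.

@verbatim
#include <stddef.h>

/* Shallow binding: one value cell per variable, plus a stack of the
   values that local bindings displaced.  All names are hypothetical. */
struct symbol { const char *name; int value; };

struct saved_binding { struct symbol *sym; int old_value; };

#define MAX_BINDINGS 64          /* no overflow check in this sketch */
static struct saved_binding binding_stack[MAX_BINDINGS];
static size_t binding_depth;

/* Give SYM a new value for the dynamic extent of a binding form. */
static void bind_var (struct symbol *sym, int new_value)
{
  binding_stack[binding_depth].sym = sym;
  binding_stack[binding_depth].old_value = sym->value;
  binding_depth++;
  sym->value = new_value;
}

/* Undo, innermost first, every binding made since DEPTH. */
static void unbind_to_depth (size_t depth)
{
  while (binding_depth > depth)
    {
      binding_depth--;
      binding_stack[binding_depth].sym->value
        = binding_stack[binding_depth].old_value;
    }
}

static struct symbol fill_width = { "fill-width", 70 };

/* INNER binds nothing; it sees whichever binding is currently active. */
static int inner (void) { return fill_width.value; }

/* OUTER behaves like (let ((fill-width 40)) (inner)). */
static int outer (void)
{
  size_t depth = binding_depth;
  int result;
  bind_var (&fill_width, 40);
  result = inner ();
  unbind_to_depth (depth);
  return result;
}
@end verbatim

@code{inner} declares nothing, yet when called from @code{outer} it sees
40 rather than 70; under lexical scoping it would always see the global
value.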
 885
 886 For those familiar with Lisp, Emacs Lisp is modelled after MacLisp, an
 887 early dialect of Lisp developed at MIT (no relation to the Macintosh
 888 computer).  There is a Common Lisp compatibility package available for
 889 Emacs that provides many of the features of Common Lisp.
 890
The Java language is derived in many ways from C and shares a similar
syntax, but it has the following features in common with Lisp (and
unlike C):
 894
 895 @enumerate
 896 @item
 897 Java is a safe language, like Lisp.
 898 @item
 899 Java provides garbage collection, like Lisp.
 900 @item
 901 Java has built-in facilities for handling errors and exceptions, like
 902 Lisp.
 903 @item
Java has a type system that combines the best advantages of both static
and dynamic typing.  Objects (but not primitive values such as
@code{int}) are explicitly marked with their type, as in dynamic typing;
but there is a hierarchy of types, and functions are declared to accept
only certain types, thus
 908 providing the increased compile-time error-checking of static typing.
 909 @end enumerate
 910
 911 @node XEmacs From the Perspective of Building, XEmacs From the Inside, The Lisp Language, Top
 912 @chapter XEmacs From the Perspective of Building
 913
 914   The heart of XEmacs is the Lisp environment, which is written in C.
 915 This is contained in the @file{src/} subdirectory.  Underneath
 916 @file{src/} are two subdirectories of header files: @file{s/} (header
 917 files for particular operating systems) and @file{m/} (header files for
 918 particular machine types).  In practice the distinction between the two
 919 types of header files is blurred.  These header files define or undefine
 920 certain preprocessor constants and macros to indicate particular
 921 characteristics of the associated machine or operating system.  As part
of the configure process, one @file{s/} file and one @file{m/} file are
 923 identified for the particular environment in which XEmacs is being
 924 built.
 925
 926   XEmacs also contains a great deal of Lisp code.  This implements the
 927 operations that make XEmacs useful as an editor as well as just a
 928 Lisp environment, and also contains many add-on packages that allow
 929 XEmacs to browse directories, act as a mail and Usenet news reader,
 930 compile Lisp code, etc.  There is actually more Lisp code than
 931 C code associated with XEmacs, but much of the Lisp code is
 932 peripheral to the actual operation of the editor.  The Lisp code
 933 all lies in subdirectories underneath the @file{lisp/} directory.
 934
 935   The @file{lwlib/} directory contains C code that implements a
 936 generalized interface onto different X widget toolkits and also
 937 implements some widgets of its own that behave like Motif widgets but
 938 are faster, free, and in some cases more powerful.  The code in this
 939 directory compiles into a library and is mostly independent from XEmacs.
 940
 941   The @file{etc/} directory contains various data files associated with
 942 XEmacs.  Some of them are actually read by XEmacs at startup; others
 943 merely contain useful information of various sorts.
 944
 945   The @file{lib-src/} directory contains C code for various auxiliary
 946 programs that are used in connection with XEmacs.  Some of them are used
 947 during the build process; others are used to perform certain functions
 948 that cannot conveniently be placed in the XEmacs executable (e.g. the
 949 @file{movemail} program for fetching mail out of @file{/var/spool/mail},
which must be setgid to the @code{mail} group on many systems; and the
 951 @file{gnuclient} program, which allows an external script to communicate
 952 with a running XEmacs process).
 953
 954   The @file{man/} directory contains the sources for the XEmacs
 955 documentation.  It is mostly in a form called Texinfo, which can be
 956 converted into either a printed document (by passing it through @TeX{})
 957 or into on-line documentation called @dfn{info files}.
 958
 959   The @file{info/} directory contains the results of formatting the
 960 XEmacs documentation as @dfn{info files}, for on-line use.  These files
 961 are used when you enter the Info system using @kbd{C-h i} or through the
 962 Help menu.
 963
 964   The @file{dynodump/} directory contains auxiliary code used to build
 965 XEmacs on Solaris platforms.
 966
 967   The other directories contain various miscellaneous code and
 968 information that is not normally used or needed.
 969
 970   The first step of building involves running the @file{configure}
 971 program and passing it various parameters to specify any optional
 972 features you want and compiler arguments and such, as described in the
 973 @file{INSTALL} file.  This determines what the build environment is,
chooses the appropriate @file{s/} and @file{m/} files, and runs a series
of tests to determine many details about your environment, such as which
library functions are available and exactly how they work. (The
@file{s/} and @file{m/} files only contain information that cannot be
conveniently detected in this fashion.) Running these tests allows
XEmacs to be compiled on a much wider variety of
 980 platforms than those that the XEmacs developers happen to be familiar
 981 with, including various sorts of hybrid platforms.  This is especially
 982 important now that many operating systems give you a great deal of
 983 control over exactly what features you want installed, and allow for
 984 easy upgrading of parts of a system without upgrading the rest.  It
 985 would be impossible to pre-determine and pre-specify the information for
 986 all possible configurations.
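
  The results of these configure tests reach the C code as preprocessor
macros in the generated @file{src/config.h}, and the sources test them
with @code{#ifdef}.  The following is a minimal sketch, assuming an
autoconf-style macro name @code{HAVE_STRERROR} used here purely for
illustration; because nothing defines it in this standalone fragment,
the fallback branch is the one that gets compiled.

@verbatim
#include <stdio.h>
#include <string.h>

/* In a real build, feature macros like this come from the generated
   src/config.h.  HAVE_STRERROR is an assumed, autoconf-style name and
   is deliberately left undefined here. */
/* #define HAVE_STRERROR 1 */

/* Return a printable description of an errno value. */
static const char *describe_errno (int errnum)
{
#ifdef HAVE_STRERROR
  return strerror (errnum);          /* the library provides it */
#else
  static char buf[32];               /* portable fallback */
  snprintf (buf, sizeof buf, "error %d", errnum);
  return buf;
#endif
}
@end verbatim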
 987
 988   When configure is done running, it generates @file{Makefile}s and the
 989 file @file{src/config.h} (which describes the features of your system)
 990 from template files.  You then run @file{make}, which compiles the
 991 auxiliary code and programs in @file{lib-src/} and @file{lwlib/} and the
 992 main XEmacs executable in @file{src/}.  The result of compiling and
 993 linking is an executable called @file{temacs}, which is @emph{not} the
 994 final XEmacs executable.  @file{temacs} by itself is not intended to
 995 function as an editor or even display any windows on the screen, and if
 996 you simply run it, it will exit immediately.  The @file{Makefile} runs
 997 @file{temacs} with certain options that cause it to initialize itself,
 998 read in a number of basic Lisp files, and then dump itself out into a
 999 new executable called @file{xemacs}.  This new executable has been
1000 pre-initialized and contains pre-digested Lisp code that is necessary
1001 for the editor to function (this includes most basic Lisp functions,
1002 e.g. @code{not}, that can be defined in terms of other Lisp primitives;
1003 some initialization code that is called when certain objects, such as
1004 frames, are created; and all of the standard keybindings and code for
1005 the actions they result in).  This executable, @file{xemacs}, is the
1006 executable that you run to use the XEmacs editor.
1007
Although @file{temacs} is not intended to be run as an editor, it can be,
by using the incantation @code{temacs -batch -l loadup.el run-temacs}.
This is useful when the dumping procedure described above is broken, or
when using certain program debugging tools such as Purify.  These tools
get mighty confused by the tricks played by the XEmacs build process,
such as allocating memory in one process and freeing it in the next.
1014
1015 @node XEmacs From the Inside, The XEmacs Object System (Abstractly Speaking), XEmacs From the Perspective of Building, Top
1016 @chapter XEmacs From the Inside
1017
1018   Internally, XEmacs is quite complex, and can be very confusing.  To
1019 simplify things, it can be useful to think of XEmacs as containing an
1020 event loop that ``drives'' everything, and a number of other subsystems,
1021 such as a Lisp engine and a redisplay mechanism.  Each of these other
1022 subsystems exists simultaneously in XEmacs, and each has a certain
1023 state.  The flow of control continually passes in and out of these
1024 different subsystems in the course of normal operation of the editor.
1025
1026   It is important to keep in mind that, most of the time, the editor is
1027 ``driven'' by the event loop.  Except during initialization and batch
1028 mode, all subsystems are entered directly or indirectly through the
1029 event loop, and ultimately, control exits out of all subsystems back up
1030 to the event loop.  This cycle of entering a subsystem, exiting back out
1031 to the event loop, and starting another iteration of the event loop
occurs once for each keystroke, mouse motion, etc.
1033
1034   If you're trying to understand a particular subsystem (other than the
1035 event loop), think of it as a ``daemon'' process or ``servant'' that is
1036 responsible for one particular aspect of a larger system, and
1037 periodically receives commands or environment changes that cause it to
1038 do something.  Ultimately, these commands and environment changes are
1039 always triggered by the event loop.  For example:
1040
1041 @itemize @bullet
1042 @item
1043 The window and frame mechanism is responsible for keeping track of what
1044 windows and frames exist, what buffers are in them, etc.  It is
1045 periodically given commands (usually from the user) to make a change to
the current window/frame state, e.g. create a new frame, delete a
1047 window, etc.
1048
1049 @item
1050 The buffer mechanism is responsible for keeping track of what buffers
1051 exist and what text is in them.  It is periodically given commands
1052 (usually from the user) to insert or delete text, create a buffer, etc.
1053 When it receives a text-change command, it notifies the redisplay
1054 mechanism.
1055
1056 @item
1057 The redisplay mechanism is responsible for making sure that windows and
1058 frames are displayed correctly.  It is periodically told (by the event
1059 loop) to actually ``do its job'', i.e. snoop around and see what the
1060 current state of the environment (mostly of the currently-existing
1061 windows, frames, and buffers) is, and make sure that that state matches
1062 what's actually displayed.  It keeps lots and lots of information around
1063 (such as what is actually being displayed currently, and what the
1064 environment was last time it checked) so that it can minimize the work
1065 it has to do.  It is also helped along in that whenever a relevant
1066 change to the environment occurs, the redisplay mechanism is told about
1067 this, so it has a pretty good idea of where it has to look to find
1068 possible changes and doesn't have to look everywhere.
1069
1070 @item
1071 The Lisp engine is responsible for executing the Lisp code in which most
1072 user commands are written.  It is entered through a call to @code{eval}
1073 or @code{funcall}, which occurs as a result of dispatching an event from
1074 the event loop.  The functions it calls issue commands to the buffer
1075 mechanism, the window/frame subsystem, etc.
1076
1077 @item
1078 The Lisp allocation subsystem is responsible for keeping track of Lisp
1079 objects.  It is given commands from the Lisp engine to allocate objects,
1080 garbage collect, etc.
1081 @end itemize
1082
Other subsystems work in a similar fashion.
1084
1085   The important idea here is that there are a number of independent
1086 subsystems each with its own responsibility and persistent state, just
1087 like different employees in a company, and each subsystem is
1088 periodically given commands from other subsystems.  Commands can flow
1089 from any one subsystem to any other, but there is usually some sort of
1090 hierarchy, with all commands originating from the event subsystem.
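
  The event-driven structure described above can be sketched in a few
lines of C.  In this hypothetical, heavily simplified model the event
loop dispatches each event to the buffer subsystem, the buffer subsystem
notifies redisplay of any change by setting a flag, and redisplay does
work only when that flag is set.  None of these names come from the
XEmacs sources.

@verbatim
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event kinds fed to the loop. */
enum event_kind { EV_INSERT_CHAR, EV_MOUSE_MOTION, EV_QUIT };

struct event { enum event_kind kind; char ch; };

/* Buffer subsystem state: the text itself. */
static char buffer_text[256];
static size_t buffer_len;

/* Redisplay subsystem state: whether anything changed, and a counter
   standing in for actual drawing. */
static bool redisplay_needed;
static int redraw_count;

static void buffer_insert (char ch)
{
  if (buffer_len < sizeof buffer_text - 1)
    buffer_text[buffer_len++] = ch;
  redisplay_needed = true;           /* notify redisplay of the change */
}

static void redisplay (void)
{
  if (!redisplay_needed)
    return;                          /* nothing changed: skip the work */
  redraw_count++;
  redisplay_needed = false;
}

/* The event loop: dispatch each event, then let redisplay catch up. */
static void command_loop (const struct event *events, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      if (events[i].kind == EV_QUIT)
        break;
      if (events[i].kind == EV_INSERT_CHAR)
        buffer_insert (events[i].ch);
      /* mouse motion changes no state in this sketch */
      redisplay ();
    }
}
@end verbatim

Fed an insertion, a mouse motion, another insertion, and a quit event,
this loop redraws only twice: the mouse motion changes nothing, so
redisplay has nothing to do.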
1091
1092   XEmacs is entered in @code{main()}, which is in @file{emacs.c}.  When
1093 this is called the first time (in a properly-invoked @file{temacs}), it
1094 does the following:
1095
1096 @enumerate
1097 @item
1098 It does some very basic environment initializations, such as determining
1099 where it and its directories (e.g. @file{lisp/} and @file{etc/}) reside
1100 and setting up signal handlers.
1101 @item
1102 It initializes the entire Lisp interpreter.
1103 @item
1104 It sets the initial values of many built-in variables (including many
1105 variables that are visible to Lisp programs), such as the global keymap
1106 object and the built-in faces (a face is an object that describes the
1107 display characteristics of text).  This involves creating Lisp objects
1108 and thus is dependent on step (2).
1109 @item
1110 It performs various other initializations that are relevant to the
1111 particular environment it is running in, such as retrieving environment
1112 variables, determining the current date and the user who is running the
1113 program, examining its standard input, creating any necessary file
1114 descriptors, etc.
1115 @item
1116 At this point, the C initialization is complete.  A Lisp program that
was specified on the command line (usually @file{loadup.el}) is loaded
1118 (temacs is normally invoked as @code{temacs -batch -l loadup.el dump}).
1119 @file{loadup.el} loads all of the other Lisp files that are needed for
1120 the operation of the editor, calls the @code{dump-emacs} function to
1121 write out @file{xemacs}, and then kills the temacs process.
1122 @end enumerate
1123
1124   When @file{xemacs} is then run, it only redoes steps (1) and (4)
1125 above; all variables already contain the values they were set to when
1126 the executable was dumped, and all memory that was allocated with
1127 @code{malloc()} is still around. (XEmacs knows whether it is being run
1128 as @file{xemacs} or @file{temacs} because it sets the global variable
1129 @code{initialized} to 1 after step (4) above.) At this point,
1130 @file{xemacs} calls a Lisp function to do any further initialization,
which includes parsing the command line (the C code can only do limited
1132 command-line parsing, which includes looking for the @samp{-batch} and
1133 @samp{-l} flags and a few other flags that it needs to know about before
1134 initialization is complete), creating the first frame (or @dfn{window}
1135 in standard window-system parlance), running the user's init file
1136 (usually the file @file{.emacs} in the user's home directory), etc.  The
1137 function to do this is usually called @code{normal-top-level};
1138 @file{loadup.el} tells the C code about this function by setting its
1139 name as the value of the Lisp variable @code{top-level}.
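
  The difference between a @file{temacs} run and an @file{xemacs} run
comes down to which of the steps above are repeated.  The sketch below
models this with the @code{initialized} flag mentioned above; the step
functions are hypothetical stand-ins, and a second call to
@code{startup()} plays the role of running the dumped executable, whose
data (including @code{initialized}) was carried over from @file{temacs}.

@verbatim
/* Simulated process state.  Only "initialized" is named after the real
   variable; the counters and step functions are hypothetical. */
static int initialized;              /* 0 in temacs, 1 in the dumped image */
static int basic_env_inits, lisp_inits, var_inits, env_inits;

static void init_basic_environment (void)   { basic_env_inits++; }  /* step 1 */
static void init_lisp_interpreter (void)    { lisp_inits++; }       /* step 2 */
static void init_builtin_variables (void)   { var_inits++; }        /* step 3 */
static void init_runtime_environment (void) { env_inits++; }        /* step 4 */

static void startup (void)
{
  init_basic_environment ();         /* always redone */
  if (!initialized)
    {
      /* Only temacs does these; xemacs inherits their results. */
      init_lisp_interpreter ();
      init_builtin_variables ();     /* depends on step 2 */
    }
  init_runtime_environment ();       /* always redone */
  initialized = 1;                   /* set after step 4 */
  /* Step 5 would now load loadup.el and dump (temacs), or run the
     function named by top-level (xemacs). */
}
@end verbatim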
1140
1141   When the Lisp initialization code is done, the C code enters the event
1142 loop, and stays there for the duration of the XEmacs process.  The code
for the event loop is contained in @file{cmdloop.c}, and is called
1144 @code{Fcommand_loop_1()}.  Note that this event loop could very well be
1145 written in Lisp, and in fact a Lisp version exists; but apparently,
1146 doing this makes XEmacs run noticeably slower.
1147
1148   Notice how much of the initialization is done in Lisp, not in C.
1149 In general, XEmacs tries to move as much code as is possible
1150 into Lisp.  Code that remains in C is code that implements the
1151 Lisp interpreter itself, or code that needs to be very fast, or
code that must make system calls or perform other operations that
can only be done in C, or code that needs to have access to
1154 ``forbidden'' structures. (One conscious aspect of the design of
1155 Lisp under XEmacs is a clean separation between the external
1156 interface to a Lisp object's functionality and its internal
1157 implementation.  Part of this design is that Lisp programs
1158 are forbidden from accessing the contents of the object other
1159 than through using a standard API.  In this respect, XEmacs Lisp
1160 is similar to modern Lisp dialects but differs from GNU Emacs,
1161 which tends to expose the implementation and allow Lisp
1162 programs to look at it directly.  The major advantage of
1163 hiding the implementation is that it allows the implementation
1164 to be redesigned without affecting any Lisp programs, including
1165 those that might want to be ``clever'' by looking directly at
1166 the object's contents and possibly manipulating them.)
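
  The same separation of interface from implementation can be expressed
in C with an opaque type: callers see only a declaration of the
structure and a set of functions, so the layout can change without
breaking them.  The @code{span} type below is a hypothetical
illustration of the technique, not an XEmacs data structure; in real
code the two halves would live in a header and a separate @file{.c}
file.

@verbatim
#include <stdlib.h>

/* ---- Public interface: what a header file would expose ---- */
typedef struct span span;            /* opaque: callers never see fields */
span *span_make (long start, long end);
long span_length (const span *s);
void span_free (span *s);

/* ---- Private implementation: would live in its own .c file ---- */
struct span
{
  long start;        /* could become start + length without */
  long end;          /* changing a single caller */
};

span *span_make (long start, long end)
{
  span *s = malloc (sizeof *s);
  if (s != NULL)
    {
      s->start = start < end ? start : end;   /* normalize the order */
      s->end = start < end ? end : start;
    }
  return s;
}

long span_length (const span *s) { return s->end - s->start; }

void span_free (span *s) { free (s); }
@end verbatim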
1167
1168   Moving code into Lisp makes the code easier to debug and maintain and
1169 makes it much easier for people who are not XEmacs developers to
1170 customize XEmacs, because they can make a change with much less chance
1171 of obscure and unwanted interactions occurring than if they were to
1172 change the C code.
1173
1174 @node The XEmacs Object System (Abstractly Speaking), How Lisp Objects Are Represented in C, XEmacs From the Inside, Top
1175 @chapter The XEmacs Object System (Abstractly Speaking)
1176
1177   At the heart of the Lisp interpreter is its management of objects.
1178 XEmacs Lisp contains many built-in objects, some of which are
1179 simple and others of which can be very complex; and some of which
1180 are very common, and others of which are rarely used or are only
1181 used internally. (Since the Lisp allocation system, with its
1182 automatic reclamation of unused storage, is so much more convenient
1183 than @code{malloc()} and @code{free()}, the C code makes extensive use of it
1184 in its internal operations.)
1185
1186   The basic Lisp objects are
1187
1188 @table @code
1189 @item integer
1190 28 bits of precision, or 60 bits on 64-bit machines; the reason for this
1191 is described below when the internal Lisp object representation is
1192 described.
1193 @item float
1194 Same precision as a double in C.
1195 @item cons
1196 A simple container for two Lisp objects, used to implement lists and
1197 most other data structures in Lisp.
1198 @item char
1199 An object representing a single character of text; chars behave like
1200 integers in many ways but are logically considered text rather than
numbers and have a different read syntax. (The read syntax for a char
1202 contains the char itself or some textual encoding of it -- for example,
1203 a Japanese Kanji character might be encoded as @samp{^[$(B#&^[(B} using the
1204 ISO-2022 encoding standard -- rather than the numerical representation
1205 of the char; this way, if the mapping between chars and integers
1206 changes, which is quite possible for Kanji characters and other extended
1207 characters, the same character will still be created.  Note that some
1208 primitives confuse chars and integers.  The worst culprit is @code{eq},
1209 which makes a special exception and considers a char to be @code{eq} to
1210 its integer equivalent, even though in no other case are objects of two
1211 different types @code{eq}.  The reason for this monstrosity is
1212 compatibility with existing code; the separation of char from integer
1213 came fairly recently.)
1214 @item symbol
1215 An object that contains Lisp objects and is referred to by name;
1216 symbols are used to implement variables and named functions
1217 and to provide the equivalent of preprocessor constants in C.
1218 @item vector
1219 A one-dimensional array of Lisp objects providing constant-time access
1220 to any of the objects; access to an arbitrary object in a vector is
1221 faster than for lists, but the operations that can be done on a vector
1222 are more limited.
1223 @item string
1224 Self-explanatory; behaves much like a vector of chars
1225 but has a different read syntax and is stored and manipulated
1226 more compactly and efficiently.
1227 @item bit-vector
1228 A vector of bits; similar to a string in spirit.
1229 @item compiled-function
1230 An object describing compiled Lisp code, known as @dfn{byte code}.
1231 @item subr
1232 An object describing a Lisp primitive.
1233 @end table
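
  The integer limits above follow from packing every Lisp object into a
single machine word: some bits of the word record the object's type, and
an integer's value gets the rest.  The following is a minimal sketch,
assuming that 4 bits of a 32-bit word are reserved, which matches the
28-bit figure (the same scheme on a 64-bit word leaves 60 bits).  The
tag value and bit positions are assumptions; the actual representation
is described later.

@verbatim
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout, for illustration only: the low 4 bits of a 32-bit
   word hold type information, leaving 28 bits for an integer. */
typedef uint32_t lisp_word;

#define TAG_BITS 4
#define VAL_BITS (32 - TAG_BITS)                         /* 28 */
#define TAG_MASK ((lisp_word) ((1u << TAG_BITS) - 1))
#define TAG_INT  ((lisp_word) 0x1)                       /* hypothetical */
#define LISP_INT_MAX ((int32_t) ((1u << (VAL_BITS - 1)) - 1)) /*  134217727 */
#define LISP_INT_MIN (-LISP_INT_MAX - 1)                      /* -134217728 */

static bool int_fits (int32_t n)
{
  return n >= LISP_INT_MIN && n <= LISP_INT_MAX;
}

/* Pack N, which must satisfy int_fits, together with the integer tag.
   Unsigned arithmetic keeps the shift well defined for negative N. */
static lisp_word make_int (int32_t n)
{
  return ((lisp_word) n << TAG_BITS) | TAG_INT;
}

static bool is_int (lisp_word w)
{
  return (w & TAG_MASK) == TAG_INT;
}

/* Unpack, sign-extending the 28-bit two's-complement value by hand. */
static int32_t xint (lisp_word w)
{
  uint32_t v = w >> TAG_BITS;
  if (v & (1u << (VAL_BITS - 1)))
    return (int32_t) v - (int32_t) (1u << VAL_BITS);
  return (int32_t) v;
}
@end verbatim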
1234
1235 @cindex closure
1236   Note that there is no basic ``function'' type, as in more powerful
1237 versions of Lisp (where it's called a @dfn{closure}).  XEmacs Lisp does
1238 not provide the closure semantics implemented by Common Lisp and Scheme.
1239 The guts of a function in XEmacs Lisp are represented in one of four
1240 ways: a symbol specifying another function (when one function is an
1241 alias for another), a list containing the function's source code, a
1242 bytecode object, or a subr object. (In other words, given a symbol
1243 specifying the name of a function, calling @code{symbol-function} to
1244 retrieve the contents of the symbol's function cell will return one of
1245 these types of objects.)
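
  The four possible contents of a function cell can be modelled as a
tagged union.  The sketch below, with hypothetical names and layout,
also shows how a caller gets from a symbol to something it can actually
run by following aliases, roughly what the Lisp function
@code{indirect-function} does; it gives up after a fixed number of hops
so that an alias cycle cannot loop forever.

@verbatim
#include <stddef.h>

/* The four shapes listed above.  Layout and names are hypothetical. */
enum fn_kind { FN_SYMBOL_ALIAS, FN_LAMBDA_LIST, FN_BYTECODE, FN_SUBR };

struct symbol;

struct function_cell
{
  enum fn_kind kind;
  union
  {
    struct symbol *alias;        /* FN_SYMBOL_ALIAS: another function */
    const char *source;          /* FN_LAMBDA_LIST: stand-in for a list */
    const unsigned char *code;   /* FN_BYTECODE: stand-in for bytecode */
    int (*subr) (int);           /* FN_SUBR: a C primitive */
  } u;
};

struct symbol { const char *name; struct function_cell fn; };

/* A demo primitive, in the role of the Lisp function 1+. */
static int add_one (int x) { return x + 1; }

/* Follow aliases until a non-alias definition is found.  Returns NULL
   for a dangling alias or a cycle. */
static const struct function_cell *resolve_function (const struct symbol *s)
{
  for (int hops = 0; hops < 100 && s != NULL; hops++)
    {
      if (s->fn.kind != FN_SYMBOL_ALIAS)
        return &s->fn;
      s = s->fn.u.alias;
    }
  return NULL;
}
@end verbatim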
1246
1247   XEmacs Lisp also contains numerous specialized objects used to
1248 implement the editor:
1249
1250 @table @code
1251 @item buffer
1252 Stores text like a string, but is optimized for insertion and deletion
1253 and has certain other properties that can be set.
1254 @item frame
1255 An object with various properties whose displayable representation is a
1256 @dfn{window} in window-system parlance.
1257 @item window
1258 A section of a frame that displays the contents of a buffer;
1259 often called a @dfn{pane} in window-system parlance.
1260 @item window-configuration
1261 An object that represents a saved configuration of windows in a frame.
1262 @item device
1263 An object representing a screen on which frames can be displayed;
1264 equivalent to a @dfn{display} in the X Window System and a @dfn{TTY} in
1265 character mode.
1266 @item face
1267 An object specifying the appearance of text or graphics; it contains
1268 characteristics such as font, foreground color, and background color.
1269 @item marker
1270 An object that refers to a particular position in a buffer and moves
1271 around as text is inserted and deleted to stay in the same relative
1272 position to the text around it.
1273 @item extent
1274 Similar to a marker but covers a range of text in a buffer; can also
1275 specify properties of the text, such as a face in which the text is to
1276 be displayed, whether the text is invisible or unmodifiable, etc.
1277 @item event
1278 Generated by calling @code{next-event} and contains information
1279 describing a particular event happening in the system, such as the user
1280 pressing a key or a process terminating.
1281 @item keymap
1282 An object that maps from events (described using lists, vectors, and
1283 symbols rather than with an event object because the mapping is for
1284 classes of events, rather than individual events) to functions to
1285 execute or other events to recursively look up; the functions are
1286 described by name, using a symbol, or using lists to specify the
1287 function's code.
1288 @item glyph
An object that describes the appearance of an image (e.g. a pixmap) on
the screen; glyphs can be attached to the beginning or end of extents,
and in some future version of XEmacs they will be insertable directly
into a buffer.
1293 @item process
1294 An object that describes a connection to an externally-running process.
1295 @end table
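
  Markers, described in the table above, have to be adjusted whenever
text changes.  The following is a minimal sketch, assuming markers are
stored as plain positions in an array (a hypothetical representation):
insertion shifts markers that lie after the insertion point, and
deletion pulls markers after the deleted region back and collapses
markers inside it onto its start.  This sketch leaves a marker that sits
exactly at the insertion point where it is, which is the usual default
for markers.

@verbatim
#include <stddef.h>

/* Hypothetical marker representation: just a buffer position. */
struct marker { long pos; };

/* Adjust markers after inserting LEN characters at position AT. */
static void adjust_markers_for_insert (struct marker *m, size_t n,
                                       long at, long len)
{
  for (size_t i = 0; i < n; i++)
    if (m[i].pos > at)
      m[i].pos += len;
}

/* Adjust markers after deleting the text from FROM up to TO (FROM < TO). */
static void adjust_markers_for_delete (struct marker *m, size_t n,
                                       long from, long to)
{
  for (size_t i = 0; i < n; i++)
    {
      if (m[i].pos >= to)
        m[i].pos -= to - from;       /* after the region: shift back */
      else if (m[i].pos > from)
        m[i].pos = from;             /* inside it: collapse to the start */
    }
}
@end verbatim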
1296
1297   There are some other, less-commonly-encountered general objects:
1298
1299 @table @code
1300 @item hashtable
1301 An object that maps from an arbitrary Lisp object to another arbitrary
1302 Lisp object, using hashing for fast lookup.
1303 @item obarray
1304 A limited form of hashtable that maps from strings to symbols; obarrays
1305 are used to look up a symbol given its name and are not actually their
1306 own object type but are kludgily represented using vectors with hidden
1307 fields (this representation derives from GNU Emacs).
1308 @item specifier
1309 A complex object used to specify the value of a display property; a
1310 default value is given and different values can be specified for
1311 particular frames, buffers, windows, devices, or classes of device.
1312 @item char-table
1313 An object that maps from chars or classes of chars to arbitrary Lisp
1314 objects; internally char tables use a complex nested-vector
1315 representation that is optimized to the way characters are represented
1316 as integers.
1317 @item range-table
1318 An object that maps from ranges of integers to arbitrary Lisp objects.
1319 @end table
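
  An obarray can be sketched as a vector of hash buckets, each holding a
chain of symbols linked through a field that Lisp code never sees, in
the spirit of the hidden fields mentioned above.  The hash function,
table size, and structure layout below are assumptions; the two
functions are modelled on the Lisp functions @code{intern} (find or
create) and @code{intern-soft} (find only).

@verbatim
#include <stdlib.h>
#include <string.h>

/* Sizes, hash function, and layout are assumptions for illustration. */
#define OBARRAY_SIZE 211

struct symbol
{
  char *name;
  struct symbol *next_in_bucket;     /* the "hidden" chain link */
};

struct obarray { struct symbol *buckets[OBARRAY_SIZE]; };

static unsigned long hash_name (const char *s)
{
  unsigned long h = 0;
  while (*s)
    h = h * 31 + (unsigned char) *s++;
  return h % OBARRAY_SIZE;
}

/* Return the symbol named NAME, or NULL if it was never interned. */
static struct symbol *intern_soft (struct obarray *ob, const char *name)
{
  for (struct symbol *s = ob->buckets[hash_name (name)]; s; s = s->next_in_bucket)
    if (strcmp (s->name, name) == 0)
      return s;
  return NULL;
}

/* Return the symbol named NAME, creating and adding it if necessary.
   Returns NULL only if memory runs out. */
static struct symbol *intern (struct obarray *ob, const char *name)
{
  struct symbol *s = intern_soft (ob, name);
  if (s)
    return s;
  s = malloc (sizeof *s);
  if (!s)
    return NULL;
  s->name = malloc (strlen (name) + 1);
  if (!s->name)
    {
      free (s);
      return NULL;
    }
  strcpy (s->name, name);
  unsigned long b = hash_name (name);
  s->next_in_bucket = ob->buckets[b];
  ob->buckets[b] = s;
  return s;
}
@end verbatim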
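
  A range table can be sketched as a sorted array of non-overlapping,
inclusive ranges searched with binary search, so that a lookup takes
time logarithmic in the number of ranges.  This representation, and the
use of plain @code{int} values instead of Lisp objects, are assumptions
for illustration.

@verbatim
#include <stddef.h>

/* Sorted, non-overlapping, inclusive ranges, each mapped to a value.
   The representation is an assumption. */
struct range_entry { long first, last; int value; };

/* Look up KEY by binary search.  Returns 1 and stores the value in *OUT
   if some range contains KEY, otherwise returns 0. */
static int range_table_lookup (const struct range_entry *t, size_t n,
                               long key, int *out)
{
  size_t lo = 0, hi = n;             /* search the interval [lo, hi) */
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (key < t[mid].first)
        hi = mid;
      else if (key > t[mid].last)
        lo = mid + 1;
      else
        {
          *out = t[mid].value;
          return 1;
        }
    }
  return 0;
}
@end verbatim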
1320
1321   And some strange special-purpose objects:
1322
1323 @table @code
1324 @item charset
1325 @itemx coding-system
1326 Objects used when MULE, or multi-lingual/Asian-language, support is
1327 enabled.
1328 @item color-instance
1329 @itemx font-instance
1330 @itemx image-instance
Objects that encapsulate a window-system resource; instances are
mostly used internally but are exposed on the Lisp level for cleanness
of the specifier model and because it's occasionally useful for Lisp
programs to create or query the properties of instances.
1335 @item subwindow
An object that encapsulates a @dfn{subwindow} resource, i.e. a
1337 window-system child window that is drawn into by an external process;
1338 this object should be integrated into the glyph system but isn't yet,
1339 and may change form when this is done.
1340 @item tooltalk-message
1341 @itemx tooltalk-pattern
1342 Objects that represent resources used in the ToolTalk interprocess
1343 communication protocol.
1344 @item toolbar-button
1345 An object used in conjunction with the toolbar.
1346 @item x-resource
1347 An object that encapsulates certain miscellaneous resources in the X
Window System, used only when Epoch support is enabled.
1349 @end table
1350
1351   And objects that are only used internally:
1352
1353 @table @asis
1354 @item opaque
1355 A generic object for encapsulating arbitrary memory; this allows you the
1356 generality of @code{malloc()} and the convenience of the Lisp object
1357 system.
1358 @item lstream
1359 A buffering I/O stream, used to provide a unified interface to anything
1360 that can accept output or provide input, such as a file descriptor, a
1361 stdio stream, a chunk of memory, a Lisp buffer, a Lisp string, etc.;
1362 it's a Lisp object to make its memory management more convenient.
1363 @item char-table-entry
1364 Subsidiary objects in the internal char-table representation.
1365 @item extent-auxiliary
1366 @itemx menubar-data
1367 @itemx toolbar-data
1368 Various special-purpose objects that are basically just used to
1369 encapsulate memory for particular subsystems, similar to the more
1370 general ``opaque'' object.
1371 @item symbol-value-forward
1372 @itemx symbol-value-buffer-local
1373 @itemx symbol-value-varalias
1374 @itemx symbol-value-lisp-magic
1375 Special internal-only objects that are placed in the value cell of a
symbol to indicate that there is something special about this variable --
1377 e.g. it has no value, it mirrors another variable, or it mirrors some C
1378 variable; there is really only one kind of object, called a
1379 @dfn{symbol-value-magic}, but it is sort-of halfway kludged into
1380 semi-different object types.
1381 @end table
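
  The case where a variable mirrors a C variable can be sketched as a
value cell that holds either an ordinary value or a pointer to the C
variable, with every read and write going through that pointer.  This is
a hypothetical model with integer values only; the real
symbol-value-magic objects listed above also cover buffer-local values,
aliases, and Lisp-level magic.

@verbatim
/* Hypothetical forwarding value cell; values are plain ints. */
enum cell_kind { CELL_ORDINARY, CELL_FORWARD_INT };

struct value_cell
{
  enum cell_kind kind;
  union { int value; int *c_var; } u;
};

struct symbol { const char *name; struct value_cell cell; };

/* Read a variable, following a forward to the C variable if present. */
static int symbol_value (const struct symbol *s)
{
  return s->cell.kind == CELL_FORWARD_INT ? *s->cell.u.c_var : s->cell.u.value;
}

/* Set a variable; for a forwarded variable this writes the C variable
   itself, so C code sees the new value immediately. */
static void set_symbol_value (struct symbol *s, int v)
{
  if (s->cell.kind == CELL_FORWARD_INT)
    *s->cell.u.c_var = v;
  else
    s->cell.u.value = v;
}

/* A C variable exported to Lisp under a hypothetical name. */
static int c_demo_width = 8;
static struct symbol demo_width_sym =
  { "demo-width", { CELL_FORWARD_INT, { .c_var = &c_demo_width } } };
@end verbatim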
1382
1383 @cindex permanent objects
1384 @cindex temporary objects
1385   Some types of objects are @dfn{permanent}, meaning that once created,
1386 they do not disappear until explicitly destroyed, using a function such
as @code{kill-buffer}, @code{delete-window}, @code{delete-frame}, etc.
Others will disappear once they are no longer used, through the garbage
1389 collection mechanism.  Buffers, frames, windows, devices, and processes
1390 are among the objects that are permanent.  Note that some objects can go
1391 both ways: Faces can be created either way; extents are normally
1392 permanent, but detached extents (extents not referring to any text, as
1393 happens to some extents when the text they are referring to is deleted)
1394 are temporary.  Note that some permanent objects, such as faces and
1395 coding systems, cannot be deleted.  Note also that windows are unique in
1396 that they can be @emph{undeleted} after having previously been
1397 deleted. (This happens as a result of restoring a window configuration.)

@cindex read syntax
  Note that many types of objects have a @dfn{read syntax}, i.e. a way of
specifying an object of that type in Lisp code.  When you load a Lisp
file, or type in code to be evaluated, what really happens is that the
function @code{read} is called, which reads some text and creates an object
based on the syntax of that text; then @code{eval} is called, which
possibly does something special; then this loop repeats until there's
no more text to read. (@code{eval} only does something special with
two kinds of objects: symbols, for which it returns the symbol's value,
as when referencing a variable; and conses [i.e. lists], which it treats
as a function invocation.  All other values are returned unchanged.)

  The read syntax

@example
17297
@end example

converts to an integer whose value is 17297.

@example
1.983e-4
@end example

converts to a float whose value is 1.983e-4, or .0001983.

@example
?b
@end example

converts to a char that represents the lowercase letter b.

@example
?^[$(B#&^[(B
@end example

(where @samp{^[} actually is an @samp{ESC} character) converts to a
particular Kanji character when using an ISO2022-based coding system for
input. (To decode this gook: @samp{ESC} begins an escape sequence;
@samp{ESC $ (} is a class of escape sequences meaning ``switch to a
94x94 character set''; @samp{ESC $ ( B} means ``switch to Japanese
Kanji''; @samp{#} and @samp{&} collectively index into a 94-by-94 array
of characters [subtract 33 from the ASCII value of each character to get
the corresponding index]; @samp{ESC (} is a class of escape sequences
meaning ``switch to a 94 character set''; @samp{ESC ( B} means ``switch
to US ASCII''.  It is a coincidence that the letter @samp{B} is used to
denote both Japanese Kanji and US ASCII.  If the first @samp{B} were
replaced with an @samp{A}, you'd be requesting a Chinese Hanzi character
from the GB2312 character set.)
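
  The index arithmetic in the parenthetical above can be checked with a
few lines of C.  This is only an illustrative sketch: the function names
@code{iso2022_94_index} and @code{iso2022_94x94_position} are
hypothetical and do not exist in XEmacs, and the code handles only the
two bytes that follow the escape sequence, not the escape sequences
themselves.

```c
#include <assert.h>

/* Hypothetical helper, not part of XEmacs: map one byte of a
   94-character set (0x21 '!' through 0x7E '~') to a 0-based index
   0..93 by subtracting 33, as the text describes.  Returns -1 for
   bytes outside that range. */
int iso2022_94_index (unsigned char byte)
{
  if (byte < 0x21 || byte > 0x7E)
    return -1;
  return byte - 33;
}

/* Split the two bytes of a 94x94 character into (row, column)
   indices and return a single linear position row * 94 + column.
   Returns -1 if either byte is out of range. */
int iso2022_94x94_position (unsigned char first, unsigned char second,
                            int *row, int *col)
{
  *row = iso2022_94_index (first);
  *col = iso2022_94_index (second);
  if (*row < 0 || *col < 0)
    return -1;
  return *row * 94 + *col;
}
```

For @samp{#&} this gives row 2 and column 5 (counting from 0), because
@samp{#} is ASCII 35 and @samp{&} is ASCII 38.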

@example
"foobar"
@end example

converts to a string.

@example
foobar
@end example

converts to a symbol whose name is @code{"foobar"}.  This is done by
looking up the string equivalent in the global variable
@code{obarray}, whose contents should be an obarray.  If no symbol
is found, a new symbol with the name @code{"foobar"} is automatically
created and added to @code{obarray}; this process is called
@dfn{interning} the symbol.
@cindex interning
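
  Interning can be sketched as a lookup-or-create operation on a hash
table.  The following is a toy model, assuming a fixed-size chained hash
table; the names @code{toy_intern} and @code{struct toy_obarray} are
hypothetical, and real symbols carry value, function, and property-list
cells that this sketch omits.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A toy symbol: just a name and a link to the next symbol in the
   same hash bucket. */
struct toy_symbol
{
  char *name;
  struct toy_symbol *next;
};

#define TOY_OBARRAY_SIZE 31

/* A toy obarray: a fixed-size hash table of symbol chains. */
struct toy_obarray
{
  struct toy_symbol *buckets[TOY_OBARRAY_SIZE];
};

static unsigned toy_hash (const char *s)
{
  unsigned h = 0;
  while (*s)
    h = h * 31 + (unsigned char) *s++;
  return h % TOY_OBARRAY_SIZE;
}

/* Look NAME up in OB; if no symbol with that name exists, create one
   and add it.  Either way, return the unique symbol for NAME. */
struct toy_symbol *toy_intern (struct toy_obarray *ob, const char *name)
{
  unsigned h = toy_hash (name);
  struct toy_symbol *sym;

  for (sym = ob->buckets[h]; sym; sym = sym->next)
    if (strcmp (sym->name, name) == 0)
      return sym;                 /* found: reuse the existing symbol */

  sym = malloc (sizeof *sym);     /* not found: create and link it in */
  assert (sym != NULL);
  sym->name = malloc (strlen (name) + 1);
  assert (sym->name != NULL);
  strcpy (sym->name, name);
  sym->next = ob->buckets[h];
  ob->buckets[h] = sym;
  return sym;
}
```

Because reading @code{foobar} twice yields the same symbol object,
symbols can be compared with @code{eq}, which in this model amounts to
a pointer comparison.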

@example
(foo . bar)
@end example

converts to a cons cell containing the symbols @code{foo} and @code{bar}.

@example
(1 a 2.5)
@end example

converts to a three-element list containing the specified objects
(note that a list is actually a chain of nested conses; see the
XEmacs Lisp Reference).

@example
[1 a 2.5]
@end example

converts to a three-element vector containing the specified objects.

@example
#[... ... ... ...]
@end example

converts to a compiled-function object (the actual contents are not
shown since they are not relevant here; look at a file that ends with
@file{.elc} for examples).

@example
#*01110110
@end example

converts to a bit-vector.

@example
#s(range-table ... ...)
@end example

converts to a range table (the actual contents are not shown).

@example
#s(char-table ... ...)
@end example

converts to a char table (the actual contents are not shown).
(Note that the @code{#s} syntax is the general syntax for structures,
which are not really implemented in XEmacs Lisp but should be.)

  When an object is printed out (using @code{print} or a related
function), the read syntax is used, so that an equivalent object can be
read back in.

  The other objects do not have read syntaxes, usually because it does
not really make sense to create them in this fashion (e.g. processes,
where it doesn't make sense to have a subprocess created as a side
effect of reading some Lisp code), or because they can't be created at
all (e.g. subrs).  Permanent objects, as a rule, do not have a read
syntax; nor do most complex objects, which contain too much state to be
easily initialized through a read syntax.

@node How Lisp Objects Are Represented in C, Rules When Writing New C Code, The XEmacs Object System (Abstractly Speaking), Top
@chapter How Lisp Objects Are Represented in C

  Lisp objects are represented in C using a 32- or 64-bit machine word
(depending on the processor; e.g. DEC Alphas use 64-bit Lisp objects and
most other processors use 32-bit Lisp objects).  The representation
stuffs a pointer together with a tag, as follows:

@example
 [ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ]
 [ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ]

   ^ <---> <------------------------------------------------------>
   |  tag         a pointer to a structure, or an integer
   |
   `---> mark bit
@end example

  The tag describes the type of the Lisp object.  For integers and
chars, the lower 28 bits contain the value of the integer or char; for
all others, the lower 28 bits contain a pointer.  The mark bit is used
during garbage collection, and is always 0 when garbage collection is
not happening.  Many macros that extract parts of a Lisp object
expect that the mark bit is 0, and will produce incorrect results if
it's not. (The way that garbage collection works, basically, is that it
loops over all places where Lisp objects could exist -- this includes
all global variables in C that contain Lisp objects [including
@code{Vobarray}, the C equivalent of @code{obarray}; through this, all
Lisp variables will get marked], plus various other places -- and
recursively scans through the Lisp objects, marking each object it finds
by setting the mark bit.  Then it goes through the lists of all objects
allocated, freeing the ones that are not marked and turning off the
mark bit of the ones that are marked.)
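
  The layout in the diagram can be modeled with ordinary shifts and
masks.  The following sketch assumes a 32-bit word with the mark bit in
bit 31, a 3-bit tag in bits 28 through 30, and a 28-bit value field; the
@code{toy_} names are hypothetical and are not the XEmacs macros.  The
@code{toy_tag_naive} function shows why extractor macros that assume a
clear mark bit go wrong while garbage collection is in progress.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout, following the diagram above. */
#define TOY_VALBITS 28
#define TOY_VALMASK 0x0FFFFFFFu   /* bits 0-27: value or pointer */
#define TOY_TAGMASK 0x70000000u   /* bits 28-30: 3-bit tag */
#define TOY_MARKBIT 0x80000000u   /* bit 31: mark bit */

typedef uint32_t toy_lisp_object;

toy_lisp_object toy_make (unsigned tag, uint32_t value)
{
  assert (tag < 8);               /* only eight tag values fit */
  return ((uint32_t) tag << TOY_VALBITS) | (value & TOY_VALMASK);
}

unsigned toy_tag (toy_lisp_object obj)
{
  return (obj & TOY_TAGMASK) >> TOY_VALBITS;
}

/* Wrong if the mark bit is set: it gets shifted into the tag. */
unsigned toy_tag_naive (toy_lisp_object obj)
{
  return obj >> TOY_VALBITS;
}

uint32_t toy_value (toy_lisp_object obj)
{
  return obj & TOY_VALMASK;
}

/* The collector sets the mark bit on reachable objects and clears it
   again during the sweep. */
toy_lisp_object toy_set_mark (toy_lisp_object obj) { return obj | TOY_MARKBIT; }
toy_lisp_object toy_clear_mark (toy_lisp_object obj) { return obj & ~TOY_MARKBIT; }
int toy_marked_p (toy_lisp_object obj) { return (obj & TOY_MARKBIT) != 0; }
```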

  Lisp objects use the typedef @code{Lisp_Object}, but the actual C type
used for the Lisp object can vary.  It can be either a simple type
(@code{long} on the DEC Alpha, @code{int} on other machines) or a
structure whose fields are bit fields that line up properly (in
practice, a union of such structures is used).  Generally the simple
integral type is preferable because it ensures that the compiler will
actually use a machine word to represent the object (some compilers will
use more general and less efficient code for unions and structs even if
they can fit in a machine word).  The union type, however, has the
advantage of stricter type checking (if you accidentally pass an integer
where a Lisp object is desired, you get a compile error), and it makes
it easier to decode Lisp objects when debugging.  The choice of which
type to use is determined by the presence or absence of the preprocessor
constant @code{USE_UNION_TYPE}.

@cindex record type
  Note that the tag is only three bits wide, so it can represent only
eight types, but there are many more actual types than this.  This is
handled by having one of the tag types specify a meta-type called a
@dfn{record}; for all such objects, the first four bytes of the
pointed-to structure indicate what the actual type is.
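
  The record scheme can be sketched in C by giving every record
structure a common first member.  This is a hypothetical model: the type
codes and structure names are invented for illustration, and an
@code{enum} stands in for the four-byte type field mentioned above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical type codes stored in the header of every record. */
enum toy_record_type { TOY_RECORD_BUFFER, TOY_RECORD_WINDOW, TOY_RECORD_FACE };

/* Every record structure begins with the same header, so code that
   only knows "this is a record" can read the header to learn the
   real type. */
struct toy_record_header
{
  enum toy_record_type type;
};

struct toy_buffer
{
  struct toy_record_header header;   /* must be the first member */
  const char *name;
};

struct toy_window
{
  struct toy_record_header header;   /* must be the first member */
  int height;
};

/* Given a pointer extracted from a Lisp object whose tag says
   "record", read the real type from the common header.  C allows a
   pointer to a structure to be converted to a pointer to its first
   member. */
enum toy_record_type toy_record_type_of (const void *record)
{
  return ((const struct toy_record_header *) record)->type;
}

const char *toy_type_name (const void *record)
{
  switch (toy_record_type_of (record))
    {
    case TOY_RECORD_BUFFER: return "buffer";
    case TOY_RECORD_WINDOW: return "window";
    case TOY_RECORD_FACE:   return "face";
    }
  return "unknown";
}
```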

  Note also that having 28 bits for pointers and integers restricts a
lot of things to 256 megabytes of memory. (Basically, enough pointers
and indices and whatnot get stuffed into Lisp objects that the total
amount of memory used by XEmacs can't grow above 256 megabytes.  In
older versions of XEmacs and GNU Emacs, the tag was 5 bits wide,
allowing for 32 types, which was more than the actual number of types
that existed at the time, and no ``record'' type was necessary.
However, this limited the editor to 64 megabytes total, which some users
who edited large files might conceivably exceed.)
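
  The memory limits above follow from simple arithmetic, assuming a
32-bit word with one mark bit: a 3-bit tag leaves 28 value bits, and
2^28 bytes is 256 megabytes, while the older 5-bit tag left 26 value
bits, and 2^26 bytes is 64 megabytes.  A minimal check in C (the helper
names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Width of the value field left in a WORD_BITS-bit word after one
   mark bit and a TAG_BITS-bit tag. */
unsigned value_bits_for (unsigned word_bits, unsigned tag_bits)
{
  assert (word_bits > tag_bits + 1);
  return word_bits - 1 - tag_bits;
}

/* Number of distinct byte addresses a VALUE_BITS-bit field can hold. */
uint64_t addressable_bytes (unsigned value_bits)
{
  assert (value_bits < 64);
  return UINT64_C (1) << value_bits;
}
```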

  Also, note that there is an implicit assumption here that all pointers
are low enough that the top bits are all zero and can just be chopped
off.  On standard machines that allocate memory from the bottom up (and
give each process its own address space), this works fine.  Some
machines, however, put the data space somewhere else in memory
(e.g. beginning at 0x80000000).  Those machines cope by defining
@code{DATA_SEG_BITS} in the corresponding @file{m/} or @file{s/} file to
the proper mask.  Then, pointers retrieved from Lisp objects are
automatically OR'ed with this value prior to being used.
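
  The @code{DATA_SEG_BITS} round trip can be simulated with 32-bit
integers standing in for addresses.  This sketch assumes a data segment
beginning at 0x80000000, as in the example above; the @code{toy_} names
and the mask value are illustrative only and do not come from any
particular @file{m/} or @file{s/} file.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_VALMASK 0x0FFFFFFFu        /* low 28 bits kept in the object */

/* Assumed example value: every data address on this hypothetical
   machine has the top bit set. */
#define TOY_DATA_SEG_BITS 0x80000000u

/* Storing an address in a Lisp object keeps only the low 28 bits. */
uint32_t toy_store_address (uint32_t addr)
{
  return addr & TOY_VALMASK;
}

/* Retrieving it ORs the data-segment bits back in. */
uint32_t toy_load_address (uint32_t stored)
{
  return stored | TOY_DATA_SEG_BITS;
}
```

An address near the top of memory, such as 0xBFFFF000, does not survive
this round trip: the stored value is 0x0FFFF000, and OR'ing in
0x80000000 yields 0x8FFFF000 rather than the original address.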

  A corollary of the previous paragraph is that @strong{(pointers to)
stack-allocated structures cannot be put into Lisp objects}.  The stack
is generally located near the top of memory; if you put such a pointer
into a Lisp object, it will get its top bits chopped off, and you will
lose.

  Various macros are used to construct Lisp objects and extract the
components.  Macros of the form @code{XINT()}, @code{XCHAR()},
@code{XSTRING()}, @code{XSYMBOL()}, etc. mask out the pointer/integer
field and cast it to the appropriate type.  All of the macros that
construct pointers will @code{OR} with @code{DATA_SEG_BITS} if
necessary.  @code{XINT()} needs to be a bit tricky so that negative
numbers are properly sign-extended: usually it does this by shifting the
number four bits to the left and then four bits to the right.  This
assumes that the right-shift operator does an arithmetic shift (i.e. it
copies the sign bit into the vacated high-order bits rather than
shifting in zeros, so that it mimics a divide-by-two even for negative
numbers).  Not all machines/compilers do this, and on the ones that
don't, a more complicated definition is selected by defining
@code{EXPLICIT_SIGN_EXTEND}.
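
  Both approaches to sign extension can be sketched in C.  This is a
hypothetical model using the 28-bit value field described above, not the
actual XEmacs macro definitions: @code{toy_xint_shift} uses the shift
trick, which relies on implementation-defined behavior of signed
conversion and right shift, and @code{toy_xint_explicit} shows one
portable alternative in the spirit of @code{EXPLICIT_SIGN_EXTEND}; the
real definition may differ.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_VALBITS 28
#define TOY_VALMASK 0x0FFFFFFFu

/* Store a signed integer in the low 28 bits, with INT_TAG above it. */
uint32_t toy_make_int (int32_t n, unsigned int_tag)
{
  return ((uint32_t) int_tag << TOY_VALBITS) | ((uint32_t) n & TOY_VALMASK);
}

/* Shift trick: move the field to the top of the word, then shift it
   back.  Correct only if converting to int32_t wraps and >> on a
   negative int32_t is arithmetic; GCC does both, but the C standard
   leaves them implementation-defined. */
int32_t toy_xint_shift (uint32_t obj)
{
  return (int32_t) (obj << (32 - TOY_VALBITS)) >> (32 - TOY_VALBITS);
}

/* Portable alternative: if the field's top bit (bit 27) is set, the
   value is negative, so subtract 2^28. */
int32_t toy_xint_explicit (uint32_t obj)
{
  int32_t field = (int32_t) (obj & TOY_VALMASK);
  if (field & (INT32_C (1) << (TOY_VALBITS - 1)))
    field -= INT32_C (1) << TOY_VALBITS;
  return field;
}
```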

  Note that when @code{ERROR_CHECK_TYPECHECK} is defined, the extractor
macros become more complicated -- they check the tag bits and/or the
type field in the first four bytes of a record type to ensure that the
object is really of the correct type.  This is great for catching places
where an object of the wrong type is being dereferenced; without the
check, such a mistake typically results in a pointer being dereferenced
as the wrong type of structure, with unpredictable (and sometimes not
easily traceable) results.

  There are similar @code{XSET@var{TYPE}()} macros that construct a Lisp object.
These macros are of the form @code{XSET@var{TYPE} (@var{lvalue}, @var{result})},
i.e. they have to be used as statements rather than in expressions.
The reason for this is that standard C (C89) doesn't let you ``construct'' a
structure value inside an expression (GCC does, as an extension).  Granted,
this sometimes isn't too convenient; for the case of integers, at least,
you can use the function @code{make_int()}, which constructs and
@emph{returns} an integer Lisp object.  Note that the
@code{XSET@var{TYPE}()} macros are also affected by
@code{ERROR_CHECK_TYPECHECK} and make sure that the structure is of the
right type in the case of record types, where the type is contained in
the structure.
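
  The difference between a statement-style constructor and a
function-style one can be sketched as follows.  A one-member structure
stands in for the union type; @code{TOY_XSETINT}, @code{toy_make_int},
and the tag value are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* A struct-wrapped representation, standing in for the union type
   used when USE_UNION_TYPE is defined. */
typedef struct { uint32_t word; } toy_object;

#define TOY_VALBITS 28
#define TOY_VALMASK 0x0FFFFFFFu
#define TOY_INT_TAG 1u            /* hypothetical tag for integers */

/* Statement-style constructor: C89 cannot build a struct value in an
   expression, so the macro assigns to an lvalue.  The do/while (0)
   wrapper makes it behave like a single statement. */
#define TOY_XSETINT(lvalue, n)                                  \
  do {                                                          \
    (lvalue).word = ((uint32_t) TOY_INT_TAG << TOY_VALBITS)     \
                    | ((uint32_t) (n) & TOY_VALMASK);           \
  } while (0)

/* Function-style constructor in the spirit of make_int(): it returns
   the object, so it can appear inside larger expressions. */
toy_object toy_make_int (int32_t n)
{
  toy_object obj;
  TOY_XSETINT (obj, n);
  return obj;
}
```

C99 compound literals lift the restriction on building a structure
inside an expression, but the statement form also works with C89
compilers.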

@node Rules When Writing New C Code, A Summary of the Various XEmacs Modules, How Lisp Objects Are Represented in C, Top
@chapter Rules When Writing New C Code

  The XEmacs C code is extremely complex and intricate, and there are
many rules that are more or less consistently followed throughout the code.
Many of these rules are not obvious, so they are explained here.  It is
of the utmost importance that you follow them.  If you don't, you may get
something that appears to work, but which will crash in odd situations,
often in code far away from where the actual breakage is.

@menu
* General Coding Rules::
* Writing Lisp Primitives::
* Adding Global Lisp Variables::
* Coding for Mule::
* Techniques for XEmacs Developers::
@end menu

@node General Coding Rules
@section General Coding Rules

  Almost every module contains a @code{syms_of_*()} function and a
@code{vars_of_*()} function.  The former declares any Lisp primitives
you have defined and defines any symbols you will be using.  The latter
declares any global Lisp variables you have added and initializes global
C variables in the module.  For each such function, declare it in
@file{symsinit.h} and make sure it's called in the appropriate place in
@file{emacs.c}.  @strong{Important}: There are stringent requirements on
exactly what can go into these functions.  See the comment in
@file{emacs.c}.  The reason for this is to avoid obscure unwanted
interactions during initialization.  If you don't follow these rules,
you'll be sorry!  If you want to do anything that isn't allowed, create
a @code{complex_vars_of_*()} function for it.  Doing this is tricky,
though: You have to make sure your function is called at the right time
so that all the initialization dependencies work out.

  Every module includes @file{<config.h>} (angle brackets so that
@samp{--srcdir} works correctly; @file{config.h} may or may not be in
the same directory as the C sources) and @file{lisp.h}.  @file{config.h}
should always be included before any other header files (including
system header files) to ensure that certain tricks played by various
@file{s/} and @file{m/} files work out correctly.

  @strong{All global and static variables that are to be modifiable must
be declared uninitialized.}  This means that you may not use the ``declare
with initializer'' form for these variables, such as @code{int
some_variable = 0;}.  The reason for this has to do with some kludges
done during the dumping process: If possible, the initialized data
segment is re-mapped so that it becomes part of the (unmodifiable) code
segment in the dumped executable.  This allows this memory to be shared
among multiple running XEmacs processes.  XEmacs is careful to place as
much constant data as possible into initialized variables (in
particular, into what's called the @dfn{pure space} -- see below) during
the @file{temacs} phase.
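
  The rule above, combined with the @code{vars_of_*()} convention,
leads to a pattern like the following.  This is a minimal sketch for a
hypothetical module @file{foo.c}; @code{foo_counter} and @code{foo_next}
are invented names.  The variable is declared without an initializer,
and @code{vars_of_foo()} gives it its starting value when it is called
during initialization.

```c
#include <assert.h>

/* Declared WITHOUT an initializer, per the rule above.  C guarantees
   that variables with static storage start out as zero; the intended
   starting value is assigned in vars_of_foo(). */
static int foo_counter;

/* Per-module initializer for the hypothetical module "foo".  In
   XEmacs, emacs.c would call it at the appropriate point during
   startup, and it would also hold the module's DEFVAR_... calls. */
void vars_of_foo (void)
{
  foo_counter = 0;
}

int foo_next (void)
{
  assert (foo_counter >= 0);
  return ++foo_counter;
}
```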

@cindex copy-on-write
  @strong{Please note:} This kludge only works on a few systems
nowadays, and is rapidly becoming irrelevant because most modern
operating systems provide @dfn{copy-on-write} semantics.  All data is
initially shared between processes, and a private copy is automatically
made (on a page-by-page basis) when a process first attempts to write to
a page of memory.

  Formerly, there was a requirement that static variables not be
declared inside of functions.  This had to do with another hack in
the same vein as what was just described: old USG systems put
statically-declared variables in the initialized data space, so the
header files for those systems had a @code{#define static} declaration.
(That way, the data-segment remapping described above could still work.)
This fails badly on static variables inside of functions, which suddenly
become automatic variables; therefore, you weren't supposed to have any
of them.  This awful kludge has been removed in XEmacs because

@enumerate
@item
almost all of the systems that used this kludge ended up having
to disable the data-segment remapping anyway;
@item
the only systems that didn't were extremely outdated ones;
@item
this hack completely messed up inline functions.
@end enumerate

@node Writing Lisp Primitives
@section Writing Lisp Primitives

  Lisp primitives are Lisp functions implemented in C.  The details of
interfacing the C function so that Lisp can call it are handled by a few
C macros.  The only way to really understand how to write new C code is
to read the source, but we can explain some things here.

  An example of a special form is the definition of @code{or}, from
@file{eval.c}.  (An ordinary function would have the same general
appearance.)

@cindex garbage collection protection
@smallexample
@group
DEFUN ("or", For, 0, UNEVALLED, 0, /*
Eval args until one of them yields non-nil, then return that value.
The remaining args are not evalled at all.
If all args return nil, return nil.
*/
       (args))
@{
  /* This function can GC */
  Lisp_Object val = Qnil;
  struct gcpro gcpro1;

  GCPRO1 (args);

  while (!NILP (args))
    @{
      val = Feval (XCAR (args));
      if (!NILP (val))
        break;
      args = XCDR (args);
    @}

  UNGCPRO;
  return val;
@}
@end group
@end smallexample

  Let's start with a precise explanation of the arguments to the
@code{DEFUN} macro.  Here is a template for them:

@example
DEFUN (@var{lname}, @var{fname}, @var{min}, @var{max}, @var{interactive}, /*
@var{docstring}
*/
   (@var{arglist}) )
@end example

@table @var
@item lname
This string is the name of the Lisp symbol to define as the function
name; in the example above, it is @code{"or"}.

@item fname
This is the C function name for this function.  This is the name that is
used in C code for calling the function.  The name is, by convention,
@samp{F} prepended to the Lisp name, with all dashes (@samp{-}) in the
Lisp name changed to underscores.  Thus, to call this function from C
code, call @code{For}.  Remember that the arguments are of type
@code{Lisp_Object}; various macros and functions for creating values of
type @code{Lisp_Object} are declared in the file @file{lisp.h}.

Primitives whose names are special characters (e.g. @code{+} or
@code{<}) are named by spelling out, in some fashion, the special
character: e.g. @code{Fplus()} or @code{Flss()}.  Primitives whose names
begin with normal alphanumeric characters but also contain special
characters are spelled out in some creative way, e.g. @code{let*}
becomes @code{FletX()}.

Each function also has an associated structure that holds the data for
the subr object that represents the function in Lisp.  This structure
conveys the Lisp symbol name to the initialization routine that will
create the symbol and store the subr object as its definition.  The C
variable name of this structure is always @samp{S} prepended to the
@var{fname}.  You hardly ever need to be aware of the existence of this
structure.

@item min
This is the minimum number of arguments that the function requires.  The
function @code{or} allows a minimum of zero arguments.

@item max
This is the maximum number of arguments that the function accepts, if
there is a fixed maximum.  Alternatively, it can be @code{UNEVALLED},
indicating a special form that receives unevaluated arguments, or
@code{MANY}, indicating an unlimited number of evaluated arguments (the
equivalent of @code{&rest}).  Both @code{UNEVALLED} and @code{MANY} are
macros.  If @var{max} is a number, it may not be less than @var{min} and
it may not be greater than 8. (If you need to add a function with
more than 8 arguments, either use the @code{MANY} form or edit the
definition of @code{DEFUN} in @file{lisp.h}.  If you do the latter,
make sure to also add another clause to the switch statement in
@code{primitive_funcall()}.)

@item interactive
This is an interactive specification, a string such as might be used as
the argument of @code{interactive} in a Lisp function.  In the case of
@code{or}, it is 0 (a null pointer), indicating that @code{or} cannot be
called interactively.  A value of @code{""} indicates a function that
should receive no arguments when called interactively.

@item docstring
This is the documentation string.  It is written just like a
documentation string for a function defined in Lisp; in particular, the
first line should be a single sentence.  Note how the documentation
string is enclosed in a comment, none of the documentation is placed on
the same lines as the comment-start and comment-end characters, and the
comment-start characters are on the same line as the interactive
specification.  @file{make-docfile}, which scans the C files for
documentation strings, is very particular about what it looks for, and
will not properly extract the doc string if it's not in this exact format.

You are free to put the various arguments to @code{DEFUN} on separate
lines to avoid overly long lines.  However, make sure to put the
comment-start characters for the doc string on the same line as the
interactive specification, and put a newline directly after them (and
before the comment-end characters).

@item arglist
This is the comma-separated list of arguments to the C function.  For a
function with a fixed maximum number of arguments, provide a C argument
for each Lisp argument.  In this case, unlike regular C functions, the
types of the arguments are not declared; they are simply always of type
@code{Lisp_Object}.

The names of the C arguments will be used as the names of the arguments
to the Lisp primitive as displayed in its documentation, modulo the same
concerns described above for @code{F...} names (in particular,
underscores in the C arguments become dashes in the Lisp arguments).

There is one additional kludge: A trailing @samp{_} on the C argument is
discarded when forming the Lisp argument.  This allows C language
reserved words (like @code{default}) or global symbols (like
@code{dirname}) to be used as argument names without compiler warnings
or errors.

A Lisp function with @w{@var{max} = @code{UNEVALLED}} is a
@w{@dfn{special form}}; its arguments are not evaluated.  Instead it
receives one argument of type @code{Lisp_Object}, a (Lisp) list of the
unevaluated arguments, conventionally named @code{(args)}.

When a Lisp function has no upper limit on the number of arguments,
specify @w{@var{max} = @code{MANY}}.  In this case its implementation in
C actually receives exactly two arguments: the number of Lisp arguments
(an @code{int}) and the address of a block containing their values (a
@w{@code{Lisp_Object *}}).  Only in this case are the C types specified
in the @var{arglist}: @w{@code{(int nargs, Lisp_Object *args)}}.

@end table
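
  As an illustration of the @code{MANY} calling convention, here is a
toy primitive that adds its arguments.  This is a sketch only:
@code{toy_Fplus} is hypothetical (it is not the real @code{Fplus}), and
a plain @code{long} stands in for @code{Lisp_Object} so that no XEmacs
headers are needed.  A real @code{MANY} primitive would be declared with
@code{DEFUN} and would need to check that each argument is a number.

```c
#include <assert.h>

/* Stand-in for Lisp_Object; here just a plain integer. */
typedef long toy_object;

/* A MANY-style function receives exactly two C arguments: the number
   of Lisp arguments and a pointer to a block holding their values. */
toy_object toy_Fplus (int nargs, toy_object *args)
{
  toy_object sum = 0;
  int i;

  assert (nargs >= 0);
  for (i = 0; i < nargs; i++)
    sum += args[i];
  return sum;
}
```

A caller passes an array together with its length, e.g.
@code{toy_Fplus (3, vals)} for a three-element array @code{vals}.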

  Within the function @code{For} itself, note the use of the macros
@code{GCPRO1} and @code{UNGCPRO}.  @code{GCPRO1} is used to ``protect''
a variable from garbage collection, i.e. to inform the garbage collector
that it must look in that variable and regard its contents as an
accessible object.  This is necessary whenever you call @code{Feval} or
anything that can directly or indirectly call @code{Feval} (this
includes the @code{QUIT} macro!).  At such a time, any Lisp object that
you intend to refer to again must be protected somehow.  @code{UNGCPRO}
cancels the protection of the variables that are protected in the
current function.  You must do this explicitly before the function
returns.

  The macro @code{GCPRO1} protects just one local variable.  If you want
to protect two, use @code{GCPRO2} instead; repeating @code{GCPRO1} will
not work.  Macros @code{GCPRO3} and @code{GCPRO4} also exist.

  These macros implicitly use local variables such as @code{gcpro1}; you
must declare these explicitly, with type @code{struct gcpro}.  Thus, if
you use @code{GCPRO2}, you must declare @code{gcpro1} and @code{gcpro2}.
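
  The bookkeeping behind these macros can be modeled as a stack-shaped
linked list of records, each pointing at a protected variable, which the
collector walks to find additional roots.  The following is a
simplified, hypothetical sketch (the @code{toy_} names and the
one-variable-per-node layout are assumptions, not the XEmacs
definitions); it shows why the @code{gcpro1}, @code{gcpro2} nodes must
be declared by the caller and why @code{UNGCPRO} must run before the
function returns.

```c
#include <assert.h>
#include <stddef.h>

typedef long toy_object;           /* stand-in for Lisp_Object */

/* One node per protected variable; nodes live in the caller's frame. */
struct toy_gcpro
{
  struct toy_gcpro *next;
  toy_object *var;                 /* address of the protected variable */
};

struct toy_gcpro *toy_gcprolist;   /* list head; NULL when empty */

/* Push nodes onto the list.  The macros refer to toy_gcpro1 and
   toy_gcpro2 by name, so the caller must declare them. */
#define TOY_GCPRO1(v)                                          \
  (toy_gcpro1.next = toy_gcprolist, toy_gcpro1.var = &(v),     \
   toy_gcprolist = &toy_gcpro1)

#define TOY_GCPRO2(v1, v2)                                     \
  (toy_gcpro1.next = toy_gcprolist, toy_gcpro1.var = &(v1),    \
   toy_gcpro2.next = &toy_gcpro1, toy_gcpro2.var = &(v2),      \
   toy_gcprolist = &toy_gcpro2)

/* Pop everything this function pushed by restoring the old head,
   which toy_gcpro1 (the bottom node) remembers. */
#define TOY_UNGCPRO (toy_gcprolist = toy_gcpro1.next)

/* Count protected variables, as a collector scanning roots would. */
int toy_count_protected (void)
{
  int n = 0;
  struct toy_gcpro *p;
  for (p = toy_gcprolist; p != NULL; p = p->next)
    n++;
  return n;
}
```

In this model, using @code{TOY_GCPRO1} twice in one function would
overwrite the single @code{toy_gcpro1} node and make it point to itself,
which is one way to see why repeating @code{GCPRO1} does not work.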

@cindex caller-protects (@code{GCPRO} rule)
  Note also that the general rule is @dfn{caller-protects}; i.e. you
are only responsible for protecting those Lisp objects that you create.
Any objects passed to you as parameters should have been protected
by whoever created them, so you don't in general have to protect them.
@code{For} is an exception; it protects its parameters to provide
extra assurance against Lisp primitives elsewhere that are incorrectly
written, and against malicious self-modifying code.  There are a few
other standard functions that also do this.

@code{GCPRO}ing is perhaps the trickiest and most error-prone part
of XEmacs coding.  It is @strong{extremely} important that you get this
right and use a great deal of discipline when writing this code.
@xref{GCPROing, ,@code{GCPRO}ing}, for full details on how to do this.

  What @code{DEFUN} actually does is declare a global structure of
type @code{Lisp_Subr} whose name begins with capital @samp{SF} and
which contains information about the primitive (e.g. a pointer to the
function, its minimum and maximum allowed arguments, a string describing
its Lisp name); @code{DEFUN} then begins a normal C function
declaration using the @code{F...} name.  The Lisp subr object that is
the function definition of a primitive (i.e. the object in the function
slot of the symbol that names the primitive) actually points to this
@samp{SF} structure; when @code{Feval} encounters a subr, it looks in the
structure to find out how to call the C function.
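
  The effect of @code{DEFUN} can be approximated with a much smaller
macro.  The sketch below is hypothetical: @code{TOY_DEFUN1} handles only
one-argument functions and omits the docstring and interactive
specification, and @code{struct toy_subr} is a cut-down stand-in for the
real subr structure.  It does show the two pieces described above: a
descriptor named @samp{S} plus the C function name, and the start of an
ordinary C function definition.

```c
#include <assert.h>
#include <string.h>

typedef long toy_object;        /* stand-in for Lisp_Object */

/* Cut-down analogue of the subr descriptor. */
struct toy_subr
{
  const char *name;             /* Lisp name, e.g. "1+" */
  int min_args, max_args;
  toy_object (*fn) (toy_object);
};

/* Declare the function, emit the S<fname> descriptor pointing at it,
   then begin the definition so a body can follow the macro. */
#define TOY_DEFUN1(lname, fname, arg)                          \
  toy_object fname (toy_object);                               \
  struct toy_subr S##fname = { lname, 1, 1, fname };           \
  toy_object fname (toy_object arg)

TOY_DEFUN1 ("1+", toy_Fadd1, number)
{
  return number + 1;
}

/* What an evaluator would do with the descriptor: check the argument
   count, then call through the function pointer. */
toy_object toy_call_subr1 (const struct toy_subr *subr, toy_object arg)
{
  assert (subr->min_args <= 1 && 1 <= subr->max_args);
  return subr->fn (arg);
}
```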

  Defining the C function is not enough to make a Lisp primitive
available; you must also create the Lisp symbol for the primitive (the
symbol is @dfn{interned}; @pxref{Obarrays}) and store a suitable subr
object in its function cell. (If you don't do this, the primitive won't
be seen by Lisp code.) The code looks like this:

@example
DEFSUBR (@var{fname});
@end example

@noindent
Here @var{fname} is the name you used as the second argument to
@code{DEFUN}.

  This call to @code{DEFSUBR} should go in the @code{syms_of_*()}
function at the end of the module.  If no such function exists, create
it and make sure to also declare it in @file{symsinit.h} and call it
from the appropriate spot in @code{main()}.  @xref{General Coding
Rules}.

  Note that C code cannot call functions by name unless they are defined
in C.  The way to call a function written in Lisp from C is to use
@code{Ffuncall}, which embodies the Lisp function @code{funcall}.  Since
the Lisp function @code{funcall} accepts an unlimited number of
arguments, in C it takes two: the number of Lisp-level arguments, and a
one-dimensional array containing their values.  The first Lisp-level
argument is the Lisp function to call, and the rest are the arguments to
pass to it.  Since @code{Ffuncall} can call the evaluator, you must
protect pointers from garbage collection around the call to
@code{Ffuncall}. (However, @code{Ffuncall} explicitly protects all of
its parameters, so you don't have to protect any pointers passed
as parameters to it.)

  The C functions @code{call0}, @code{call1}, @code{call2}, and so on,
provide convenient ways to call a Lisp function with a fixed
number of arguments.  They work by calling @code{Ffuncall}.
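
  The relationship between @code{call2} and @code{Ffuncall} can be
sketched as follows.  This toy version is hypothetical: functions are
represented by small integers indexing a table, @code{long} stands in
for @code{Lisp_Object}, and garbage collection is ignored entirely.

```c
#include <assert.h>

typedef long toy_object;                  /* stand-in for Lisp_Object */
typedef toy_object (*toy_fn) (int nargs, toy_object *args);

/* One callable "function" for the table below. */
static toy_object toy_sub (int nargs, toy_object *args)
{
  assert (nargs == 2);
  return args[0] - args[1];
}

/* Hypothetical function table: a "function object" is an index. */
static toy_fn toy_function_table[] = { toy_sub };

/* Like Ffuncall: args[0] is the function, the rest are its
   arguments. */
toy_object toy_funcall (int nargs, toy_object *args)
{
  assert (nargs >= 1);
  assert (args[0] >= 0
          && args[0] < (toy_object) (sizeof toy_function_table
                                     / sizeof toy_function_table[0]));
  return toy_function_table[args[0]] (nargs - 1, args + 1);
}

/* Like call2: package FN and two arguments into an array and hand it
   to the MANY-style funcall. */
toy_object toy_call2 (toy_object fn, toy_object a1, toy_object a2)
{
  toy_object args[3];
  args[0] = fn;
  args[1] = a1;
  args[2] = a2;
  return toy_funcall (3, args);
}
```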

  @file{eval.c} is a very good file to look through for examples;
@file{lisp.h} contains the definitions for some important macros and
functions.

@node Adding Global Lisp Variables
@section Adding Global Lisp Variables

  Global variables whose names begin with @samp{Q} are constants whose
value is a symbol of a particular name.  The name of the variable should
be derived from the name of the symbol using the same rules as for Lisp
primitives.  These variables are initialized using a call to
@code{defsymbol()} in the @code{syms_of_*()} function. (This call
interns a symbol, sets the C variable to the resulting Lisp object, and
calls @code{staticpro()} on the C variable to tell the
garbage-collection mechanism about this variable.  What
@code{staticpro()} does is add a pointer to the variable to a large
global array; when garbage-collection happens, all pointers listed in
the array are used as starting points for marking Lisp objects.  This is
important because it's quite possible that the only current reference to
the object is the C variable.  In the case of symbols, the
@code{staticpro()} doesn't matter all that much because the symbol is
contained in @code{obarray}, which is itself @code{staticpro()}ed.
However, it's possible that a naughty user could do something like
uninterning the symbol out of @code{obarray} or even setting
@code{obarray} to a different value [although this is likely to make
XEmacs crash!].)
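
  The root registry that @code{staticpro()} maintains can be sketched as
an array of variable addresses.  This is a hypothetical model (the
@code{toy_} names and the fixed capacity are assumptions); the key point
it shows is that the address of the C variable is recorded, not its
current value, so the collector always sees whatever the variable holds
at collection time.

```c
#include <assert.h>

typedef long toy_object;          /* stand-in for Lisp_Object */

#define TOY_MAX_STATICS 64

/* Global array of addresses of C variables that hold Lisp objects. */
static toy_object *toy_staticvec[TOY_MAX_STATICS];
static int toy_staticidx;

/* Register the ADDRESS of a variable as a garbage-collection root. */
void toy_staticpro (toy_object *varaddress)
{
  assert (toy_staticidx < TOY_MAX_STATICS);
  toy_staticvec[toy_staticidx++] = varaddress;
}

/* A mark phase would start by visiting every registered root. */
int toy_root_count (void)
{
  return toy_staticidx;
}

toy_object toy_root_value (int i)
{
  assert (i >= 0 && i < toy_staticidx);
  return *toy_staticvec[i];       /* read the variable's current value */
}
```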

  @strong{Please note:} It is potentially deadly if you define a
@samp{Q...} variable in two different modules.  The two calls to
@code{defsymbol()} are no problem, but some linkers will complain about
multiply-defined symbols.  The most insidious aspect of this is that
often the link will succeed anyway, but then the resulting executable
will sometimes crash in obscure ways during certain operations!  To
avoid this problem, declare any symbols with common names (such as
@code{text}) that are not obviously associated with this particular
module in the module @file{general.c}.

  Global variables whose names begin with @samp{V} are variables that
contain Lisp objects.  The convention here is that all global variables
of type @code{Lisp_Object} begin with @samp{V}, and all others don't
(including integer and boolean variables that have Lisp
equivalents). Most of the time, these variables have equivalents in
Lisp, but some don't.  Those that do are made visible to Lisp by a call to
@code{DEFVAR_LISP()} in the @code{vars_of_*()} initializer for the
module.  What this does is create a special @dfn{symbol-value-forward}
Lisp object that contains a pointer to the C variable, intern a symbol
whose name is as specified in the call to @code{DEFVAR_LISP()}, and set
its value to the symbol-value-forward Lisp object; it also calls
@code{staticpro()} on the C variable to tell the garbage-collection
mechanism about the variable.  When @code{eval} (or actually
@code{symbol-value}) encounters this special object in the process of
retrieving a variable's value, it follows the indirection to the C
variable and gets its value.  @code{setq} does similar things so that
the C variable gets changed.
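
  The forwarding mechanism can be modeled with a small indirection
structure.  This sketch is hypothetical: a flag in
@code{struct toy_symbol} replaces the type check on the value cell that
a real implementation would perform, and a @code{long} stands in for
@code{Lisp_Object}.

```c
#include <assert.h>

typedef long toy_object;         /* stand-in for Lisp_Object */

/* Forwarding object: points at the C variable that holds the value. */
struct toy_forward
{
  toy_object *c_variable;
};

struct toy_symbol
{
  const char *name;
  int forwarded;                 /* nonzero if the value cell forwards */
  toy_object value;              /* used when not forwarded */
  struct toy_forward *forward;   /* used when forwarded */
};

/* Roughly what symbol-value does: follow the indirection if the value
   cell holds a forwarding object. */
toy_object toy_symbol_value (const struct toy_symbol *sym)
{
  return sym->forwarded ? *sym->forward->c_variable : sym->value;
}

/* Roughly what setq does: write through to the C variable. */
void toy_set (struct toy_symbol *sym, toy_object newval)
{
  if (sym->forwarded)
    *sym->forward->c_variable = newval;
  else
    sym->value = newval;
}
```

Because the C variable and the Lisp variable share one storage
location, a change made from either side is immediately visible from the
other.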

  Whether or not you @code{DEFVAR_LISP()} a variable, you need to
initialize it in the @code{vars_of_*()} function; otherwise it will end
up as all zeroes, which is the integer 0 (@emph{not} @code{nil}), and
this is probably not what you want.  Also, if the variable is not
@code{DEFVAR_LISP()}ed, @strong{you must call} @code{staticpro()} on the
C variable in the @code{vars_of_*()} function.  Otherwise, the
garbage-collection mechanism won't know that the object in this variable
is in use, and will happily collect it and reuse its storage for another
Lisp object, and you will be the one who's unhappy when you can't figure
out how your variable got overwritten.
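
The following sketch models why an unregistered variable gets
clobbered.  It is a toy, not the XEmacs collector: objects are just
slots in a fixed pool, 0 means ``no object'', @code{toy_staticpro()}
(a hypothetical stand-in for @code{staticpro()}) records the
@emph{address} of a C variable, and @code{toy_gc()} keeps only objects
referenced directly by registered variables, without tracing
references between objects.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of staticpro(): only objects reachable from registered C
   variables survive a collection.  Names are hypothetical; an object
   is an index into a small pool, and 0 means "no object". */

#define POOL_SIZE 8
#define MAX_ROOTS 8

static int pool_in_use[POOL_SIZE + 1];   /* slot 0 is never used */
static int pool_marked[POOL_SIZE + 1];
static int *roots[MAX_ROOTS];
static int nroots;

/* Allocate the lowest free slot; return its index, or 0 if full. */
static int
toy_alloc (void)
{
  int i;
  for (i = 1; i <= POOL_SIZE; i++)
    if (!pool_in_use[i])
      {
        pool_in_use[i] = 1;
        return i;
      }
  return 0;
}

/* Like staticpro(): remember the ADDRESS of the C variable, so that
   later assignments to it are seen by the collector. */
static void
toy_staticpro (int *varaddr)
{
  assert (nroots < MAX_ROOTS);
  roots[nroots++] = varaddr;
}

/* Mark every object a root currently refers to, then free the rest. */
static void
toy_gc (void)
{
  int i;
  for (i = 1; i <= POOL_SIZE; i++)
    pool_marked[i] = 0;
  for (i = 0; i < nroots; i++)
    if (*roots[i] != 0)
      pool_marked[*roots[i]] = 1;
  for (i = 1; i <= POOL_SIZE; i++)
    if (pool_in_use[i] && !pool_marked[i])
      pool_in_use[i] = 0;   /* the slot may now be handed out again */
}
```

With one registered and one unregistered variable, a collection frees
the slot held only by the unregistered variable, and the next
allocation hands out that same slot again, which is the silent
overwrite described above.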

@node Coding for Mule
@section Coding for Mule
@cindex Coding for Mule

Although Mule support is not compiled in by default in XEmacs, many
people are using it, and we consider it crucial that new code works
correctly with multibyte characters.  This is not hard; it is only a
matter of following several simple coding guidelines.  Even if you never
compile with Mule, with a little practice you will find it quite easy
to code Mule-correctly.

Note that these guidelines are not necessarily tied to the current Mule
implementation; they are also a good idea to follow on the grounds of
code generalization for future I18N work.

@menu
* Character-Related Data Types::
* Working With Character and Byte Positions::
* Conversion to and from External Data::
* General Guidelines for Writing Mule-Aware Code::
* An Example of Mule-Aware Code::
@end menu

@node Character-Related Data Types
@subsection Character-Related Data Types

First, let's review the basic character-related datatypes used by
XEmacs.  Note that the separate @code{typedef}s are not strictly
necessary in the current implementation (all of them boil down to
@code{unsigned char} or @code{int}), but they improve the clarity of the
code a great deal, because one glance at a declaration tells you the
intended use of the variable.

@table @code
@item Emchar
@cindex Emchar
An @code{Emchar} holds a single Emacs character.

Obviously, the equality between characters and bytes is lost in the Mule
world.  Characters can be represented by one or more bytes in the
buffer, and @code{Emchar} is the C type large enough to hold any
character.

Without Mule support, an @code{Emchar} is equivalent to an
@code{unsigned char}.

@item Bufbyte
@cindex Bufbyte
The data representing the text in a buffer or string is logically a
sequence of @code{Bufbyte}s.

XEmacs does not operate on external character formats directly; when
reading characters from the outside world, it decodes them into an
internal format, and likewise encodes them when writing.  A
@code{Bufbyte} (in fact an @code{unsigned char}) is the basic unit of
this internal format for buffers and strings.

One character can correspond to one or more @code{Bufbyte}s.  In the
current implementation, an ASCII character is represented by a single
@code{Bufbyte} with the same value, and extended characters are
represented by a sequence of @code{Bufbyte}s.

Without Mule support, a @code{Bufbyte} is equivalent to an
@code{Emchar}.

@item Bufpos
@itemx Charcount
@cindex Bufpos
@cindex Charcount
A @code{Bufpos} represents a character position in a buffer or string.
A @code{Charcount} represents a number (count) of characters.
Logically, subtracting two @code{Bufpos} values yields a
@code{Charcount} value.  Although all of these are @code{typedef}ed to
@code{int}, we use them in preference to @code{int} to make it clear
what sort of position is being used.

@code{Bufpos} and @code{Charcount} values are the only ones that are
ever visible to Lisp.

@item Bytind
@itemx Bytecount
@cindex Bytind
@cindex Bytecount
A @code{Bytind} represents a byte position in a buffer or string.  A
@code{Bytecount} represents the distance between two positions in bytes.
The relationship between @code{Bytind} and @code{Bytecount} is the same
as the relationship between @code{Bufpos} and @code{Charcount}.

@item Extbyte
@itemx Extcount
@cindex Extbyte
@cindex Extcount
When dealing with the outside world, XEmacs works with @code{Extbyte}s,
which are equivalent to @code{unsigned char}.  An @code{Extcount} is the
distance between two @code{Extbyte} positions.  @code{Extbyte}s and
@code{Extcount}s are not all that frequent in XEmacs code.
@end table
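
To see why the character and byte position types must be kept apart,
here is a standalone sketch that maps a character position to a byte
position.  It assumes UTF-8 as a stand-in variable-width encoding (this
is @emph{not} the internal Mule format), redefines the type names
locally with plain @code{typedef}s, numbers positions from 0, and uses a
hypothetical helper name, @code{bufpos_to_bytind}.

```c
#include <assert.h>

/* Local stand-ins for the XEmacs typedefs.  UTF-8 is used only as an
   example of a variable-width encoding; it is NOT the Mule format. */
typedef unsigned char Bufbyte;
typedef int Bufpos;      /* character position */
typedef int Charcount;   /* number of characters */
typedef int Bytind;      /* byte position */
typedef int Bytecount;   /* number of bytes */

/* In UTF-8, bytes of the form 10xxxxxx continue a character; every
   other byte starts a new one. */
static int
starts_char (Bufbyte b)
{
  return (b & 0xC0) != 0x80;
}

/* Hypothetical helper: the byte position at which character POS of
   TEXT (LEN bytes, positions counted from 0) starts.  POS may equal
   the number of characters, meaning "just past the end". */
static Bytind
bufpos_to_bytind (const Bufbyte *text, Bytecount len, Bufpos pos)
{
  Bytind i;
  Charcount seen = 0;   /* characters that start before byte I */

  assert (pos >= 0);
  for (i = 0; i < len; i++)
    if (starts_char (text[i]))
      {
        if (seen == pos)
          return i;
        seen++;
      }
  assert (seen == pos);
  return len;
}
```

For the five-character text @samp{h@'ello} stored in six bytes,
character position 2 starts at byte position 3, so a @code{Charcount}
of 2 spans a @code{Bytecount} of 3; mixing the two kinds of value up
would silently point into the middle of a character.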

@node Working With Character and Byte Positions
@subsection Working With Character and Byte Positions

Now that we have defined the basic character-related types, we can look
at the macros and functions designed for working with them and for
converting between them.  Most of these macros are defined in
@file{buffer.h}, and we don't discuss all of them here, but only the
most important ones.  Examining the existing code is the best way to
learn about them.

@table @code
@item MAX_EMCHAR_LEN
@cindex MAX_EMCHAR_LEN
This preprocessor constant is the maximum number of buffer bytes per
Emacs character, i.e.@: the largest number of @code{Bufbyte}s needed to
represent a single @code{Emchar}.  It is useful when allocating
temporary strings to hold a known number of characters.  For instance:

@example
@group
@{
  Charcount cclen;
  ...
  @{
    /* Allocate place for @var{cclen} characters. */
    Bufbyte *tmp_buf = (Bufbyte *)alloca (cclen * MAX_EMCHAR_LEN);
    ...
  @}
@}
@end group
@end example

If you followed the previous section, you can guess that, logically,
multiplying a @code{Charcount} value by @code{MAX_EMCHAR_LEN} produces
a @code{Bytecount} value.

In the current Mule implementation, @code{MAX_EMCHAR_LEN} equals 4.
Without Mule, it is 1.

@item charptr_emchar
@itemx set_charptr_emchar
@cindex charptr_emchar
@cindex set_charptr_emchar
The @code{charptr_emchar} macro takes a @code{Bufbyte} pointer and
returns the @code{Emchar} stored at that position.  If it were a
function, its prototype would be:

@example
Emchar charptr_emchar (Bufbyte *p);
@end example

@code{set_charptr_emchar} stores an @code{Emchar} to the specified byte
position.  It returns the number of bytes stored:

@example
Bytecount set_charptr_emchar (Bufbyte *p, Emchar c);
@end example

It is important to note that @code{set_charptr_emchar} is safe only for
appending a character at the end of the text being built, not for
overwriting a character in the middle.  This is because the width of
characters varies, and @code{set_charptr_emchar} cannot resize the
string if it writes, say, a two-byte character where a single-byte
character used to reside.

A typical use of @code{set_charptr_emchar} can be demonstrated by this
example, which copies the characters between positions @var{beg} and
@var{end} of buffer @var{buf} into a temporary string of
@code{Bufbyte}s pointed to by @var{p}:

@example
@group
@{
  /* p must have room for (end - beg) * MAX_EMCHAR_LEN bytes. */
  Bufpos pos;
  for (pos = beg; pos < end; pos++)
    @{
      Emchar c = BUF_FETCH_CHAR (buf, pos);
      p += set_charptr_emchar (p, c);
    @}
@}
@end group
@end example

Note how @code{set_charptr_emchar} is used to store the @code{Emchar}
and advance the pointer at the same time.

@item INC_CHARPTR
@itemx DEC_CHARPTR
@cindex INC_CHARPTR
@cindex DEC_CHARPTR
These two macros increment and decrement a @code{Bufbyte} pointer,
respectively.  They will adjust the pointer by the appropriate number of
bytes according to the byte length of the character stored there.  Both
macros assume that the pointer points to the beginning of a valid
character.

Without Mule support, @code{INC_CHARPTR (p)} and @code{DEC_CHARPTR (p)}
simply expand to @code{p++} and @code{p--}, respectively.

@item bytecount_to_charcount
@cindex bytecount_to_charcount
Given a pointer to a text string and a length in bytes, return the
equivalent length in characters.

@example
Charcount bytecount_to_charcount (Bufbyte *p, Bytecount bc);
@end example

@item charcount_to_bytecount
@cindex charcount_to_bytecount
Given a pointer to a text string and a length in characters, return the
equivalent length in bytes.

@example
Bytecount charcount_to_bytecount (Bufbyte *p, Charcount cc);
@end example

@item charptr_n_addr
@cindex charptr_n_addr
Return a pointer to the beginning of the character that is @var{cc}
characters past @var{p}.

@example
Bufbyte *charptr_n_addr (Bufbyte *p, Charcount cc);
@end example
@end table
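
The operations above can be approximated by ordinary C code to show
what they do.  In the sketch below every name (@code{toy_charptr_emchar}
and friends) is hypothetical, and UTF-8 is substituted for the real
internal encoding; it does follow the same contract, though: decoding
and encoding at a @code{Bufbyte} pointer, stepping forward and backward
by whole characters (backward stepping works because continuation bytes
are recognizable on their own), and converting between byte and
character counts.

```c
#include <assert.h>

/* UTF-8 stand-ins for the buffer.h operations described above.  The
   real macros use the internal Mule encoding, not UTF-8, and all
   names here are hypothetical. */
typedef unsigned char Bufbyte;
typedef int Emchar;
typedef int Bytecount;
typedef int Charcount;

#define TOY_MAX_EMCHAR_LEN 4   /* longest UTF-8 sequence */

/* Byte length of the character whose first byte is B. */
static Bytecount
toy_char_len (Bufbyte b)
{
  if (b < 0x80)
    return 1;
  if ((b & 0xE0) == 0xC0)
    return 2;
  if ((b & 0xF0) == 0xE0)
    return 3;
  return 4;
}

/* Like charptr_emchar: decode the character that starts at P. */
static Emchar
toy_charptr_emchar (const Bufbyte *p)
{
  Bytecount len = toy_char_len (p[0]), i;
  /* Keep only the payload bits of the lead byte. */
  Emchar c = (len == 1) ? p[0] : (p[0] & (0x7F >> len));
  for (i = 1; i < len; i++)
    c = (c << 6) | (p[i] & 0x3F);
  return c;
}

/* Like set_charptr_emchar: encode C at P, return the bytes written. */
static Bytecount
toy_set_charptr_emchar (Bufbyte *p, Emchar c)
{
  Bytecount len, i;
  assert (c >= 0 && c <= 0x10FFFF);
  if (c < 0x80)
    {
      p[0] = (Bufbyte) c;
      return 1;
    }
  len = c < 0x800 ? 2 : c < 0x10000 ? 3 : 4;
  /* Continuation bytes carry 6 bits each, filled from the end. */
  for (i = len - 1; i > 0; i--, c >>= 6)
    p[i] = (Bufbyte) (0x80 | (c & 0x3F));
  /* Lead byte: LEN one-bits, a zero bit, then the remaining payload. */
  p[0] = (Bufbyte) ((0xFF00 >> len) | c);
  return len;
}

/* Like INC_CHARPTR / DEC_CHARPTR: move P by one whole character. */
#define TOY_INC_CHARPTR(p) do { (p) += toy_char_len (*(p)); } while (0)
#define TOY_DEC_CHARPTR(p) do { (p)--; } while ((*(p) & 0xC0) == 0x80)

/* Like charptr_n_addr: the address CC characters past P. */
static const Bufbyte *
toy_charptr_n_addr (const Bufbyte *p, Charcount cc)
{
  while (cc-- > 0)
    TOY_INC_CHARPTR (p);
  return p;
}

/* Like bytecount_to_charcount and charcount_to_bytecount. */
static Charcount
toy_bytecount_to_charcount (const Bufbyte *p, Bytecount bc)
{
  const Bufbyte *end = p + bc;
  Charcount cc = 0;
  while (p < end)
    {
      TOY_INC_CHARPTR (p);
      cc++;
    }
  return cc;
}

static Bytecount
toy_charcount_to_bytecount (const Bufbyte *p, Charcount cc)
{
  return (Bytecount) (toy_charptr_n_addr (p, cc) - p);
}
```

Storing three characters that need 1, 2 and 3 bytes advances the
pointer by 6 bytes in total; counting those 6 bytes gives back 3
characters, and two @code{TOY_DEC_CHARPTR} steps from the end land on
the start of the second character.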

@node Conversion to and from External Data
@subsection Conversion to and from External Data

When an external function, such as a C library function, returns a
@code{char} pointer, you should almost never treat the result as
@code{Bufbyte} data.  This is because such strings may contain 8-bit
characters which can be misinterpreted by XEmacs and cause a crash.
Likewise, when exporting a piece of internal text to the outside world,
you should always convert it to an appropriate external encoding, lest
internal representation details (such as the infamous \201 characters)
leak out.

The interface for converting between the internal and external
representations of text consists of the numerous conversion macros
defined in @file{buffer.h}.  Before looking at them, we'll look at the
external formats supported by these macros.

Currently meaningful formats are @code{FORMAT_BINARY},
@code{FORMAT_FILENAME}, @code{FORMAT_OS}, and @code{FORMAT_CTEXT}.  Here
is a description of these.

@table @code
@item FORMAT_BINARY
Binary format.  This is the simplest format and is what we use in the
absence of a more appropriate format.  This converts according to the
@code{binary} coding system:

@enumerate a
@item
On input, bytes 0--255 are converted into characters 0--255.
@item
On output, characters 0--255 are converted into bytes 0--255 and other
characters are converted into `X'.
@end enumerate

@item FORMAT_FILENAME
Format used for filenames.  In the original Mule, this is user-definable
with the @code{pathname-coding-system} variable.  For the moment, we
just use the @code{binary} coding system.

@item FORMAT_OS
Format used for the external Unix environment: @code{argv[]}, strings
from @code{getenv()}, entries from the @file{/etc/passwd} file, etc.

This should perhaps be the same as @code{FORMAT_FILENAME}.

@item FORMAT_CTEXT
Compound Text format.  This is the standard X format used for data
stored in properties, selections, and the like.  This is an 8-bit
no-lock-shift ISO 2022 coding system.
@end table
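
The two rules of @code{FORMAT_BINARY} are simple enough to state
directly in code.  This sketch works on arrays of @code{Emchar}s instead
of the internal byte format, and its function names are hypothetical;
it is meant only to make the lossless input direction and the lossy
output direction concrete.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the binary coding system's two directions, on arrays of
   characters rather than the internal byte format.  Function names
   are hypothetical. */
typedef int Emchar;
typedef unsigned char Extbyte;

/* Input: every byte 0..255 becomes the character with the same code. */
static void
toy_binary_decode (const Extbyte *in, size_t len, Emchar *out)
{
  size_t i;
  for (i = 0; i < len; i++)
    out[i] = in[i];
}

/* Output: characters 0..255 become the same byte; any other character
   cannot be represented and becomes 'X'. */
static void
toy_binary_encode (const Emchar *in, size_t len, Extbyte *out)
{
  size_t i;
  for (i = 0; i < len; i++)
    {
      assert (in[i] >= 0);
      out[i] = in[i] <= 255 ? (Extbyte) in[i] : 'X';
    }
}
```

Decoding and then re-encoding any byte sequence gives back the original
bytes, while a character whose code is above 255 (for instance 0x20AC)
comes out as @samp{X}.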

The macros to convert between these formats and the internal format, and
vice versa, follow.

@table @code
@item GET_CHARPTR_INT_DATA_ALLOCA
@itemx GET_CHARPTR_EXT_DATA_ALLOCA
These two are the most basic conversion macros.
@code{GET_CHARPTR_INT_DATA_ALLOCA} converts external data to internal
format, and @code{GET_CHARPTR_EXT_DATA_ALLOCA} converts the other way
around.  The arguments each of these receives are @var{ptr} (pointer to
the text to be converted), @var{len} (length of that text in bytes),
@var{fmt} (format of the external text), @var{ptr_out} (lvalue which
will receive the converted text), and @var{len_out} (lvalue which will
be assigned the length of the converted text in bytes).  The resulting
text is stored in a stack-allocated buffer.  If the text doesn't need
changing, these macros will do nothing, except for setting
@var{len_out}.

The macros above take many arguments, which makes them unwieldy.  For
this reason, a number of convenience macros are defined with obvious
functionality, but accepting fewer arguments.  The general rule is that
macros with @samp{INT} in their name convert text to internal Emacs
representation, whereas the @samp{EXT} macros convert to external
representation.

@item GET_C_CHARPTR_INT_DATA_ALLOCA
@itemx GET_C_CHARPTR_EXT_DATA_ALLOCA
As their names imply, these macros work on C char pointers, which are
zero-terminated, and thus do not need @var{len} or @var{len_out}
parameters.

@item GET_STRING_EXT_DATA_ALLOCA
@itemx GET_C_STRING_EXT_DATA_ALLOCA
These two macros convert a Lisp string into an external representation.
The difference between them is that @code{GET_STRING_EXT_DATA_ALLOCA}
stores its output in a generic string, providing @var{len_out}, the
length of the resulting external string.  On the other hand,
@code{GET_C_STRING_EXT_DATA_ALLOCA} assumes that the caller will be
satisfied with a zero-terminated output string.

Note that for Lisp strings only one conversion direction makes sense,
since the contents of a Lisp string are already in internal format.

@item GET_C_CHARPTR_EXT_BINARY_DATA_ALLOCA
@itemx GET_CHARPTR_EXT_BINARY_DATA_ALLOCA
@itemx GET_STRING_BINARY_DATA_ALLOCA
@itemx GET_C_STRING_BINARY_DATA_ALLOCA
@itemx GET_C_CHARPTR_EXT_FILENAME_DATA_ALLOCA
@itemx ...
These macros convert internal text to a specific external
representation, with the external format being encoded into the name of
the macro.  Note that the @code{GET_STRING_...} and
@code{GET_C_STRING_...} macros lack the @samp{EXT} tag, because they
only make sense in that direction.

@item GET_C_CHARPTR_INT_BINARY_DATA_ALLOCA
@itemx GET_CHARPTR_INT_BINARY_DATA_ALLOCA
@itemx GET_C_CHARPTR_INT_FILENAME_DATA_ALLOCA
@itemx ...
These macros convert external text of a specific format to its internal
representation, with the external format being encoded into the name of
the macro.
@end table
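
The results of these macros live in stack memory obtained with
@code{alloca()}, which is released when the function that called
@code{alloca()} returns; a helper that hands back such memory therefore
has to expand inside the caller's own frame, which a macro does and a
function cannot.  The sketch below imitates that pattern with a
hypothetical macro, @code{TOY_GET_CHARPTR_INT_BINARY_DATA_ALLOCA}, using
UTF-8 as a stand-in internal encoding; it is not the definition from
@file{buffer.h}, and it assumes a system that provides
@file{alloca.h}, such as one using glibc.

```c
#include <alloca.h>   /* alloca() is not ISO C; assumes glibc or a BSD libc */
#include <assert.h>
#include <stddef.h>

typedef unsigned char Bufbyte;
typedef unsigned char Extbyte;
typedef int Bytecount;

/* Store external byte B, read as character B as the binary format
   does, at OUT in internal form.  UTF-8 stands in for the internal
   encoding.  Returns the number of bytes written (1 or 2). */
static Bytecount
toy_store_binary_char (Bufbyte *out, Extbyte b)
{
  if (b < 0x80)
    {
      out[0] = b;
      return 1;
    }
  out[0] = (Bufbyte) (0xC0 | (b >> 6));
  out[1] = (Bufbyte) (0x80 | (b & 0x3F));
  return 2;
}

/* Hypothetical macro in the style of GET_CHARPTR_INT_DATA_ALLOCA, with
   the format fixed to binary.  alloca() memory lasts until the calling
   FUNCTION returns (not just until the end of this do-block), so the
   result stays usable after the macro.  Locals are prefixed toy_ to
   avoid capturing the caller's variable names. */
#define TOY_GET_CHARPTR_INT_BINARY_DATA_ALLOCA(ptr, len, ptr_out, len_out) \
  do {                                                                   \
    const Extbyte *toy_in = (ptr);                                       \
    Bytecount toy_len = (len), toy_i;                                    \
    Bufbyte *toy_buf, *toy_p;                                            \
    assert (toy_len >= 0);                                               \
    /* Worst case: 2 bytes per input byte; +1 avoids alloca (0). */      \
    toy_buf = (Bufbyte *) alloca ((size_t) toy_len * 2 + 1);             \
    toy_p = toy_buf;                                                     \
    for (toy_i = 0; toy_i < toy_len; toy_i++)                            \
      toy_p += toy_store_binary_char (toy_p, toy_in[toy_i]);             \
    (ptr_out) = toy_buf;                                                 \
    (len_out) = (Bytecount) (toy_p - toy_buf);                           \
  } while (0)
```

Converting the three external bytes @samp{a}, 0xE9 and @samp{b} yields
four internal bytes, because only the byte above 127 needs a two-byte
sequence.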

@node General Guidelines for Writing Mule-Aware Code
@subsection General Guidelines for Writing Mule-Aware Code

This section contains some general guidance on how to write Mule-aware
code, as well as some pitfalls you should avoid.

@table @emph
@item Never use @code{char} and @code{char *}.
In XEmacs, the use of @code{char} and @code{char *} is almost always a
mistake.  If you want to manipulate an Emacs character from ``C'', use
@code{Emchar}.  If you want to examine a specific octet in the internal
format, use @code{Bufbyte}.  If you want a Lisp-visible character, use a
@code{Lisp_Object} and @code{make_char}.  If you want a pointer to move
through the internal text, use @code{Bufbyte *}.  Also note that you
almost certainly do not need @code{Emchar *}.

@item Be careful not to confuse @code{Charcount}, @code{Bytecount}, and @code{Bufpos}.
The whole point of using different types is to avoid confusion about the
use of certain variables.  To preserve this benefit, you need to be
careful to use the right types.

@item Always convert external data
It is extremely important to always convert external data, because
XEmacs can crash if unexpected 8-bit sequences are copied literally into
its internal buffers.

This means that when a system function, such as @code{readdir}, returns
a string, you need to convert it using one of the conversion macros
described in the previous section before passing it further to Lisp.
In the case of @code{readdir}, you would use the
@code{GET_C_CHARPTR_INT_FILENAME_DATA_ALLOCA} macro.

Also note that many internal functions, such as @code{make_string},
accept @code{Bufbyte}s, which removes the need for them to convert the
data they receive.  This increases efficiency because that way external
data needs to be decoded only once, when it is read.  After that, it is
passed around in internal format.
@end table

@node An Example of Mule-Aware Code
@subsection An Example of Mule-Aware Code

As an example of Mule-aware code, we will analyze the @code{string}
function, which conses up a Lisp string from the character arguments it
receives.  Here is the definition, pasted from @file{alloc.c}:

@example
@group
DEFUN ("string", Fstring, 0, MANY, 0, /*
Concatenate all the argument characters and make the result a string.
*/
       (int nargs, Lisp_Object *args))
@{
  Bufbyte *storage = alloca_array (Bufbyte, nargs * MAX_EMCHAR_LEN);
  Bufbyte *p = storage;

  for (; nargs; nargs--, args++)
    @{
      Lisp_Object lisp_char = *args;
      CHECK_CHAR_COERCE_INT (lisp_char);
      p += set_charptr_emchar (p, XCHAR (lisp_char));
    @}
  return make_string (storage, p - storage);
@}
@end group
@end example

Now we can analyze the source line by line.

The resulting string will contain exactly as many characters as there
are arguments to the function.  This is why we allocate
@code{MAX_EMCHAR_LEN} * @var{nargs} bytes on the stack, i.e.@: the
worst-case number of bytes for @var{nargs} @code{Emchar}s to fit in the
string.

Then, the loop checks that each element is a character, converting
integers in the process.  Like many other functions in XEmacs, this
function silently accepts integers where characters are expected, for
historical and compatibility reasons.  Unless you have a specific reason
to accept integers as well, @code{CHECK_CHAR} will suffice.
@code{XCHAR (lisp_char)} extracts the @code{Emchar} from the
@code{Lisp_Object}, and @code{set_charptr_emchar} stores it to
@code{storage}, advancing @code{p} in the process.

Finally, @code{make_string} creates the Lisp string from the
@code{p - storage} bytes actually used, which may be fewer than the
worst-case amount allocated.

Other instructive examples of correct coding under Mule can be found all
over the XEmacs code.  For starters, I recommend
@code{Fnormalize_menu_item_name} in @file{menubar.c}.  After you have
understood this section of the manual and studied the examples, you can
proceed to write new Mule-aware code.
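
Outside XEmacs, the same allocate-for-the-worst-case-then-trim pattern
can be shown with a standalone function.  In this sketch (all names
hypothetical), @code{malloc()} replaces both @code{alloca_array} and
@code{make_string}, the arguments are already @code{Emchar}s so no type
checking is needed, and UTF-8 stands in for the internal encoding.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned char Bufbyte;
typedef int Emchar;
typedef int Bytecount;

#define TOY_MAX_EMCHAR_LEN 4   /* longest UTF-8 sequence */

/* UTF-8 stand-in for set_charptr_emchar (not the Mule encoding):
   encode C at P and return the number of bytes written. */
static Bytecount
toy_set_charptr_emchar (Bufbyte *p, Emchar c)
{
  Bytecount len, i;
  assert (c >= 0 && c <= 0x10FFFF);
  if (c < 0x80)
    {
      p[0] = (Bufbyte) c;
      return 1;
    }
  len = c < 0x800 ? 2 : c < 0x10000 ? 3 : 4;
  for (i = len - 1; i > 0; i--, c >>= 6)
    p[i] = (Bufbyte) (0x80 | (c & 0x3F));
  p[0] = (Bufbyte) ((0xFF00 >> len) | c);
  return len;
}

/* Analogue of Fstring: encode NARGS characters from ARGS into a new
   heap string, returned with its byte length in *LEN_OUT.  The caller
   must free() the result. */
static Bufbyte *
toy_string (int nargs, const Emchar *args, Bytecount *len_out)
{
  /* Worst case: every character needs TOY_MAX_EMCHAR_LEN bytes. */
  Bufbyte *storage = malloc ((size_t) nargs * TOY_MAX_EMCHAR_LEN + 1);
  Bufbyte *p = storage;
  Bufbyte *result;

  assert (storage != NULL);
  for (; nargs; nargs--, args++)
    p += toy_set_charptr_emchar (p, *args);

  /* Like make_string (storage, p - storage): keep only the bytes used. */
  *len_out = (Bytecount) (p - storage);
  result = malloc ((size_t) *len_out + 1);
  assert (result != NULL);
  memcpy (result, storage, (size_t) *len_out);
  result[*len_out] = '\0';
  free (storage);
  return result;
}
```

Three characters needing 1, 2 and 3 bytes produce a 6-byte result, even
though 12 bytes were reserved for the worst case.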

@node Techniques for XEmacs Developers
@section Techniques for XEmacs Developers

To make a quantified XEmacs, do: @code{make quantmacs}.

You simply can't dump Quantified and Purified images.  Run the image
like so: @code{quantmacs -batch -l loadup.el run-temacs -q}.

Before you go through the trouble, are you compiling with all
debugging and error-checking off?  If not, try that first.  Be warned
that while Quantify is directly responsible for quite a few
optimizations which have been made to XEmacs, doing a run which
generates results which can be acted upon is not necessarily a trivial
task.

Also, if you're still willing to do some runs, make sure you configure
with the @samp{--quantify} flag.  That will keep Quantify from starting
to record data until after the loadup is completed, and will shut off
recording right before XEmacs shuts down (shutdown generates enough
bogus data to throw most results off).  It also enables three additional
elisp commands: @code{quantify-start-recording-data},
@code{quantify-stop-recording-data} and @code{quantify-clear-data}.

To get started debugging XEmacs, take a look at the @file{gdbinit} and
@file{dbxrc} files in the @file{src} directory.
@xref{Q2.1.15 - How to Debug an XEmacs problem with a debugger,,,
xemacs-faq, XEmacs FAQ}.

Here are things to know when you create a new source file:

@itemize @bullet
@item
All .c files should @code{#include <config.h>} first.  Almost all .c
files should @code{#include "lisp.h"} second.

@item
Generated header files should be included using the @code{<>} syntax,
not the @code{""} syntax.  The generated headers are:

@file{config.h}, @file{puresize-adjust.h}, @file{sheap-adjust.h},
@file{paths.h}, and @file{Emacs.ad.h}.

The basic rule is that you should assume builds using @code{--srcdir},
so the @code{<>} syntax needs to be used whenever the generated file to
be included may be in a different directory @emph{at compile time}.

@item
Header files should not include @code{<config.h>} and @code{"lisp.h"}.
It is the responsibility of the .c files that use them to do so.

@item
If the header uses @code{INLINE}, either directly or through
@code{DECLARE_LRECORD}, then it must be added to @file{inline.c}'s
includes.

@item
Try compiling at least once with GCC, using these configure options:

@example
configure --with-mule --with-union-type --error-checking=all
@end example
@end itemize

@node A Summary of the Various XEmacs Modules, Allocation of Objects in XEmacs Lisp, Rules When Writing New C Code, Top
@chapter A Summary of the Various XEmacs Modules

  This is accurate as of XEmacs 20.0.

@menu
* Low-Level Modules::
* Basic Lisp Modules::
* Modules for Standard Editing Operations::
* Editor-Level Control Flow Modules::
* Modules for the Basic Displayable Lisp Objects::
* Modules for other Display-Related Lisp Objects::
* Modules for the Redisplay Mechanism::
* Modules for Interfacing with the File System::
* Modules for Other Aspects of the Lisp Interpreter and Object System::
* Modules for Interfacing with the Operating System::
* Modules for Interfacing with X Windows::
* Modules for Internationalization::
@end menu

@node Low-Level Modules
@section Low-Level Modules

@example
   size  name
-------  ---------------------
  18150  config.h
@end example

This is automatically generated from @file{config.h.in} based on the
results of configure tests and user-selected optional features, and
contains preprocessor definitions specifying the nature of the
environment in which XEmacs is being compiled.



@example
   2347  paths.h
@end example

This is automatically generated from @file{paths.h.in} based on supplied
configure values, and allows for non-standard installed configurations
of the XEmacs directories.  It's currently broken, though.



@example
  47878  emacs.c
  20239  signal.c
@end example

@file{emacs.c} contains @code{main()} and other code that performs the most
basic environment initializations and handles shutting down the XEmacs
process (this includes @code{kill-emacs}, the normal way that XEmacs is
exited; @code{dump-emacs}, which is used during the build process to
write out the XEmacs executable; @code{run-emacs-from-temacs}, which can
be used to start XEmacs directly when temacs has finished loading all
the Lisp code; and emergency code to handle crashes [XEmacs tries to
auto-save all files before it crashes]).

Low-level code that directly interacts with the Unix signal mechanism,
however, is in @file{signal.c}.  Note that this code does not handle system
dependencies in interfacing to signals; that is handled using the
@file{syssignal.h} header file
(@pxref{Modules for Interfacing with the Operating System}).



@example
  23458  unexaix.c
   9893  unexalpha.c
  11302  unexapollo.c
  16544  unexconvex.c
  31967  unexec.c
  30959  unexelf.c
  35791  unexelfsgi.c
   3207  unexencap.c
   7276  unexenix.c
  20539  unexfreebsd.c
   1153  unexfx2800.c
  13432  unexhp9k3.c
  11049  unexhp9k800.c
   9165  unexmips.c
   8981  unexnext.c
   1673  unexsol2.c
  19261  unexsunos4.c
@end example

These modules contain code for dumping out the XEmacs executable on
various different systems.  (This process is highly machine-specific and
requires intimate knowledge of the executable format and the memory map
of the process.)  Only one of these modules is actually used; this is
chosen by @file{configure}.



@example
  15715  crt0.c
   1484  lastfile.c
   1115  pre-crt0.c
@end example

These modules are used in conjunction with the dump mechanism.  On some
systems, an alternative version of the C startup code (the actual code
that receives control from the operating system when the process is
started, and which calls @code{main()}) is required so that the dumping
process works properly; @file{crt0.c} provides this.

@file{pre-crt0.c} and @file{lastfile.c} should be the very first and
very last file linked, respectively.  (Actually, this is not really true.
@file{lastfile.c} should be after all Emacs modules whose initialized
data should be made constant, and before all other Emacs files and all
libraries.  In particular, the allocation modules @file{gmalloc.c},
@file{alloca.c}, etc. are normally placed past @file{lastfile.c}, and
all of the files that implement Xt widget classes @emph{must} be placed
after @file{lastfile.c} because they contain various structures that
must be statically initialized and into which Xt writes at various
times.)  @file{pre-crt0.c} and @file{lastfile.c} contain exported symbols
that are used to determine the start and end of XEmacs' initialized
data space when dumping.



@example
  14786  alloca.c
  16678  free-hook.c
   1692  getpagesize.h
  41936  gmalloc.c
  25141  malloc.c
   3802  mem-limits.h
  39011  ralloc.c
   3436  vm-limit.c
@end example

These handle basic C allocation of memory.  @file{alloca.c} is an emulation of
the stack allocation function @code{alloca()} on machines that lack
it.  (XEmacs makes extensive use of @code{alloca()} in its code.)

@file{gmalloc.c} and @file{malloc.c} are two implementations of the standard C
functions @code{malloc()}, @code{realloc()} and @code{free()}.  They are
often used in place of the standard system-provided @code{malloc()}
because they usually provide a much faster implementation, at the
expense of additional memory use.  @file{gmalloc.c} is a newer implementation
that is much more memory-efficient for large allocations than @file{malloc.c},
and should always be preferred if it works.  (At one point, @file{gmalloc.c}
didn't work on some systems where @file{malloc.c} worked; but this should be
fixed now.)

@cindex relocating allocator
@file{ralloc.c} is the @dfn{relocating allocator}.  It provides functions
similar to @code{malloc()}, @code{realloc()} and @code{free()} that allocate
memory that can be dynamically relocated in memory.  The advantage of
this is that allocated memory can be shuffled around to place all the
free memory at the end of the heap, and the heap can then be shrunk,
releasing the memory back to the operating system.  The use of this can
be controlled with the configure option @code{--rel-alloc}; if enabled,
memory allocated for buffers will be relocatable, so that if a very
large file is visited and the buffer is later killed, the memory can be
released to the operating system.  (The disadvantage of this mechanism
is that it can be very slow.  On systems with the @code{mmap()} system
call, the XEmacs version of @file{ralloc.c} uses this to move memory
around without actually having to block-copy it, which can speed things
up; but it can still cause noticeable performance degradation.)
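
The core idea of a relocating allocator is that the allocator, not the
caller, decides when a block moves, so it must know where the caller
keeps its pointer.  The toy below (hypothetical names, a fixed static
arena, no @code{mmap()}, and far simpler than @file{ralloc.c}) records
the address of each owner's pointer; compaction slides live blocks down
and rewrites those pointers, so that all free space ends up in one run
at the end.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy relocating allocator over a fixed arena.  All names are
   hypothetical and the design is far simpler than ralloc.c. */

#define ARENA_SIZE 1024
#define MAX_BLOCKS 16

struct toy_block
{
  void **owner;    /* the caller's pointer variable; NULL once freed */
  size_t offset;   /* where the block currently lives in the arena */
  size_t size;
};

static unsigned char arena[ARENA_SIZE];
static struct toy_block blocks[MAX_BLOCKS];   /* in address order */
static int nblocks;
static size_t arena_top;                      /* first unused byte */

/* Allocate SIZE bytes and store their address in *OWNER.  The
   allocator remembers OWNER so it can update it after moving. */
static void
toy_r_alloc (void **owner, size_t size)
{
  assert (nblocks < MAX_BLOCKS && arena_top + size <= ARENA_SIZE);
  blocks[nblocks].owner = owner;
  blocks[nblocks].offset = arena_top;
  blocks[nblocks].size = size;
  nblocks++;
  *owner = arena + arena_top;
  arena_top += size;
}

/* Mark the block owned by OWNER as free; compaction reclaims it. */
static void
toy_r_free (void **owner)
{
  int i;
  for (i = 0; i < nblocks; i++)
    if (blocks[i].owner == owner)
      {
        blocks[i].owner = NULL;
        *owner = NULL;
        return;
      }
}

/* Slide live blocks toward the start of the arena, rewriting each
   owner's pointer.  Afterwards all free space is one run at the end,
   which a real allocator could return to the operating system. */
static void
toy_compact (void)
{
  int i, live = 0;
  size_t top = 0;
  for (i = 0; i < nblocks; i++)
    {
      if (blocks[i].owner == NULL)
        continue;
      memmove (arena + top, arena + blocks[i].offset, blocks[i].size);
      blocks[i].offset = top;
      *blocks[i].owner = arena + top;
      top += blocks[i].size;
      blocks[live++] = blocks[i];
    }
  nblocks = live;
  arena_top = top;
}
```

After freeing the first of two blocks and compacting, the second block's
owner pointer has moved to the start of the arena and its contents are
intact.  The price of this design is visible too: every allocation must
be reached only through the registered pointer, since any other copy of
the address goes stale when the block moves.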

@file{free-hook.c} contains some debugging functions for checking for invalid
arguments to @code{free()}.

@file{vm-limit.c} contains some functions that warn the user when memory is
getting low.  These are callback functions that are called by @file{gmalloc.c}
and @file{malloc.c} at appropriate times.

@file{getpagesize.h} provides a uniform interface for retrieving the size of a
page in virtual memory.  @file{mem-limits.h} provides a uniform interface for
retrieving the total amount of available virtual memory.  Both are
similar in spirit to the @file{sys*.h} files
(@pxref{Modules for Interfacing with the Operating System}).



@example
   2659  blocktype.c
   1410  blocktype.h
   7194  dynarr.c
   2671  dynarr.h
@end example

These implement a couple of basic C data types to facilitate memory
allocation.  The @code{Blocktype} type efficiently manages the
allocation of fixed-size blocks by minimizing the number of times that
@code{malloc()} and @code{free()} are called.  It allocates memory in
large chunks, subdivides the chunks into blocks of the proper size, and
returns the blocks as requested.  When blocks are freed, they are placed
onto a linked list, so they can be efficiently reused.  This data type
is not much used in XEmacs currently, because it's a fairly new
addition.
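
The fixed-size block strategy just described can be sketched in a few
functions.  This illustrates the technique only and is not the
interface of @file{blocktype.h}: the names are hypothetical, each chunk
holds a fixed 64 blocks, and chunks are never given back to the system.

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch of the Blocktype technique.  Names are hypothetical; this is
   not blocktype.h's interface.  Chunks are never freed here. */

#define BLOCKS_PER_CHUNK 64

struct free_block
{
  struct free_block *next;
};

struct toy_blocktype
{
  size_t block_size;              /* rounded-up size of one block */
  struct free_block *free_list;   /* blocks ready to hand out */
  int mallocs;                    /* number of malloc() calls so far */
};

static void
toy_blocktype_init (struct toy_blocktype *bt, size_t block_size)
{
  size_t unit = sizeof (struct free_block);
  /* Round up to whole pointer-sized units, so that every block can
     hold a free-list link and stays pointer-aligned. */
  bt->block_size = (block_size + unit - 1) / unit * unit;
  if (bt->block_size == 0)
    bt->block_size = unit;
  bt->free_list = NULL;
  bt->mallocs = 0;
}

/* Get one large chunk from malloc() and thread its blocks onto the
   free list, lowest address first. */
static void
toy_blocktype_refill (struct toy_blocktype *bt)
{
  char *chunk = malloc (bt->block_size * BLOCKS_PER_CHUNK);
  int i;

  assert (chunk != NULL);
  bt->mallocs++;
  for (i = BLOCKS_PER_CHUNK - 1; i >= 0; i--)
    {
      struct free_block *b =
        (struct free_block *) (chunk + i * bt->block_size);
      b->next = bt->free_list;
      bt->free_list = b;
    }
}

static void *
toy_blocktype_alloc (struct toy_blocktype *bt)
{
  struct free_block *b;
  if (bt->free_list == NULL)
    toy_blocktype_refill (bt);
  b = bt->free_list;
  bt->free_list = b->next;
  return b;
}

/* Freed blocks go back on the list for reuse, not back to free(). */
static void
toy_blocktype_free (struct toy_blocktype *bt, void *p)
{
  struct free_block *b = p;
  b->next = bt->free_list;
  bt->free_list = b;
}
```

Allocating 100 blocks calls @code{malloc()} only twice, and a freed
block is handed out again by the next allocation without any further
call to @code{malloc()}.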

@cindex dynamic array
The @code{Dynarr} type implements a @dfn{dynamic array}, which is
similar to a standard C array but has no fixed limit on the number of
elements it can contain.  Dynamic arrays can hold elements of any type,
and when you add a new element, the array automatically resizes itself
if it isn't big enough.  Dynarrs are extensively used in the redisplay
mechanism.
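
A dynamic array of this kind can be sketched as a struct holding a base
pointer, an element size, and counts of used and allocated elements.
The names below are hypothetical and are not those of @file{dynarr.h};
the growth policy (doubling, starting at 8 elements) is an assumption of
this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Minimal dynamic array in the spirit of Dynarr.  Names and growth
   policy are hypothetical, not dynarr.h's. */
struct toy_dynarr
{
  char *base;       /* storage for MAX elements */
  size_t elsize;    /* size of one element in bytes */
  int cur;          /* number of elements in use */
  int max;          /* number of elements allocated */
};

static void
toy_dynarr_init (struct toy_dynarr *d, size_t elsize)
{
  d->base = NULL;
  d->elsize = elsize;
  d->cur = d->max = 0;
}

/* Append a copy of the element at EL, doubling the storage if full. */
static void
toy_dynarr_add (struct toy_dynarr *d, const void *el)
{
  if (d->cur == d->max)
    {
      int newmax = d->max ? d->max * 2 : 8;
      char *newbase = realloc (d->base, (size_t) newmax * d->elsize);
      assert (newbase != NULL);
      d->base = newbase;
      d->max = newmax;
    }
  memcpy (d->base + (size_t) d->cur * d->elsize, el, d->elsize);
  d->cur++;
}

/* Address of element N (0-based). */
static void *
toy_dynarr_at (struct toy_dynarr *d, int n)
{
  assert (n >= 0 && n < d->cur);
  return d->base + (size_t) n * d->elsize;
}

static void
toy_dynarr_free (struct toy_dynarr *d)
{
  free (d->base);
  toy_dynarr_init (d, d->elsize);
}
```

Appending 100 @code{int}s grows the storage through 8, 16, 32, 64 and
finally 128 elements; doubling keeps the number of reallocations
logarithmic in the final size.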



@example
   2058  inline.c
@end example

This module is used in connection with inline functions (available in
some compilers).  Often, inline functions need to have a corresponding
non-inline function that does the same thing.  This module is where they
reside.  It contains no actual code, but defines some special flags that
cause inline functions defined in header files to be rendered as actual
functions.  It then includes all header files that contain any inline
function definitions, so that each one gets a real function equivalent.



@example
   6489  debug.c
   2267  debug.h
@end example

These functions provide a system for doing internal consistency checks
during code development.  This system is not currently used; instead, the
simpler @code{assert()} macro is used along with the various checks
provided by the @samp{--error-check-*} configuration options.



@example
   1643  prefix-args.c
@end example

This is actually the source for a small, self-contained program
used during building.


@example
    904  universe.h
@end example

This is not currently used.



@node Basic Lisp Modules
@section Basic Lisp Modules

@example
   size  name
-------  ---------------------
  70167  emacsfns.h
   6305  lisp-disunion.h
   7086  lisp-union.h
  54929  lisp.h
  14235  lrecord.h
  10728  symsinit.h
@end example

These are the basic header files for all XEmacs modules.  Each module
includes @file{lisp.h}, which brings the other header files in.
@file{lisp.h} contains the definitions of the structures and extractor
and constructor macros for the basic Lisp objects and various other
basic definitions for the Lisp environment, as well as some
general-purpose definitions (e.g.@: @code{min()} and @code{max()}).
@file{lisp.h} includes either @file{lisp-disunion.h} or
@file{lisp-union.h}, depending on whether @code{USE_UNION_TYPE} is
defined.  These files define the typedef of the Lisp object itself (as
described above) and the low-level macros that hide the actual
implementation of the Lisp object.  All extractor and constructor macros
for particular types of Lisp objects are defined in terms of these
low-level macros.
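
The split between the two representations can be sketched as follows.
The tag values, the two-bit tag width and all names are invented for
illustration and do not match the actual XEmacs layout; the point is
only that type-specific macros such as @code{TOY_XINT} are written in
terms of a few low-level macros, so either representation can be chosen
at compile time (here by defining @code{TOY_USE_UNION_TYPE}).  The
integer representation assumes two's complement and an arithmetic right
shift of negative values, as GCC provides.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical Lisp object with two interchangeable representations,
   mimicking the lisp-union.h / lisp-disunion.h split.  The tag values
   and 2-bit tag width are invented for this sketch. */
enum toy_tag { TOY_TAG_INT = 0, TOY_TAG_CHAR = 1 };

#ifdef TOY_USE_UNION_TYPE
/* Struct form: tag and value are separate fields, so the compiler
   rejects code that treats an object as a plain integer. */
typedef struct { enum toy_tag tag; intptr_t val; } toy_object;

static toy_object
toy_make (enum toy_tag t, intptr_t v)
{
  toy_object o;
  o.tag = t;
  o.val = v;
  return o;
}
# define TOY_TAG(o) ((o).tag)
# define TOY_VAL(o) ((o).val)
#else
/* Integer form: value shifted left, tag in the low 2 bits. */
typedef intptr_t toy_object;
# define toy_make(t, v) \
  ((toy_object) (((uintptr_t) (v) << 2) | (uintptr_t) (t)))
# define TOY_TAG(o) ((enum toy_tag) ((o) & 3))
# define TOY_VAL(o) ((o) >> 2)
#endif

/* Type-specific macros, written only in terms of the low-level ones,
   so they work unchanged with either representation. */
#define TOY_MAKE_INT(n)  toy_make (TOY_TAG_INT, (n))
#define TOY_INTP(o)      (TOY_TAG (o) == TOY_TAG_INT)
#define TOY_XINT(o)      ((long) TOY_VAL (o))
#define TOY_MAKE_CHAR(c) toy_make (TOY_TAG_CHAR, (c))
#define TOY_CHARP(o)     (TOY_TAG (o) == TOY_TAG_CHAR)
#define TOY_XCHAR(o)     ((int) TOY_VAL (o))
```

With either representation, @code{TOY_XINT (TOY_MAKE_INT (-5))} yields
-5; the struct form additionally lets the compiler catch code that
mistakenly treats an object as a plain integer.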

As a general rule, all typedefs should go into the typedefs section of
@file{lisp.h} rather than into a module-specific header file even if the
structure is defined elsewhere.  This allows function prototypes that
use the typedef to be placed into @file{emacsfns.h}.  Forward structure
declarations (i.e.@: a simple declaration like @code{struct foo;} where
the structure itself is defined elsewhere) should be placed into the
typedefs section as necessary.

@file{lrecord.h} contains the basic structures and macros that implement
all record-type Lisp objects, i.e.@: all objects whose type is stored in
a field of their C structure, which includes all objects except the few
most basic ones.
2804
2805 @file{emacsfns.h} contains prototypes for most of the exported functions
2806 in the various modules. (In particular, prototypes for Lisp primitives
2807 should always go into this header file.  Prototypes for other functions
2808 can either go here or in a module-specific header file, depending on how
2809 general-purpose the function is and whether it has special-purpose
2810 argument types requiring definitions not in @file{lisp.h}.)  All
2811 initialization functions are prototyped in @file{symsinit.h}.
2812
2813
2814
2815 @example
2816  120478  alloc.c
2817    1029  pure.c
2818    2506  puresize.h
2819 @end example
2820
2821 The large module @file{alloc.c} implements all of the basic allocation and
2822 garbage collection for Lisp objects.  The most commonly used Lisp
2823 objects are allocated in chunks, similar to the Blocktype data type
2824 described above; others are allocated in individually @code{malloc()}ed
2825 blocks.  This module provides the foundation on which all other aspects
2826 of the Lisp environment sit, and is the first module initialized at
2827 startup.
2828
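A minimal sketch of chunked allocation follows.  It assumes a single hypothetical fixed-size object type.  Objects are carved out of large @code{malloc()}ed blocks, and freed objects are recycled through a free list.  The names (@code{alloc_obj}, @code{free_obj}, @code{OBJS_PER_BLOCK}) are illustrative rather than those used in @file{alloc.c}.  The garbage collector itself is left out: in a mark-and-sweep collector, the sweep phase is what would call something like @code{free_obj()} on each unmarked object.

@verbatim
```c
#include <assert.h>
#include <stdlib.h>

/* A hypothetical fixed-size object, standing in for e.g. a cons. */
struct obj {
  struct obj *next_free;   /* used only while on the free list */
  long payload[2];
};

#define OBJS_PER_BLOCK 64

/* Objects are carved out of large blocks rather than malloc()ed one
   at a time; blocks are chained so a collector could walk them. */
struct obj_block {
  struct obj_block *prev;
  struct obj objs[OBJS_PER_BLOCK];
};

static struct obj_block *current_block;
static int next_index = OBJS_PER_BLOCK;   /* forces a block on first use */
static struct obj *free_list;

struct obj *
alloc_obj (void)
{
  if (free_list)                     /* reuse a freed object first */
    {
      struct obj *o = free_list;
      free_list = o->next_free;
      return o;
    }
  if (next_index == OBJS_PER_BLOCK)  /* current block used up */
    {
      struct obj_block *b = malloc (sizeof *b);
      if (b == NULL)
        return NULL;                 /* a real allocator would report memory-full */
      b->prev = current_block;
      current_block = b;
      next_index = 0;
    }
  return &current_block->objs[next_index++];
}

/* Put O back on the free list for reuse. */
void
free_obj (struct obj *o)
{
  o->next_free = free_list;
  free_list = o;
}
```
@end verbatim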
Note that @file{alloc.c} provides a series of generic functions that are
not dependent on any particular object type, and interfaces to
particular types of objects using a standardized interface of
type-specific methods.  This scheme is a fundamental principle of
object-oriented programming and is heavily used throughout XEmacs.  The
great advantage of this is that it allows for a clean separation of
functionality into different modules -- new classes of Lisp objects, new
event interfaces, new device types, new stream interfaces, etc. can be
added transparently without affecting code anywhere else in XEmacs.
Because the different subsystems are divided into general and specific
code, adding a new subtype within a subsystem will in general not
require changes to the generic subsystem code or affect any of the other
subtypes in the subsystem; this provides a great deal of robustness to
the XEmacs code.

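The separation between generic code and type-specific methods can be sketched as follows.  Each object begins with a header pointing at a table of function pointers for its type, and generic functions dispatch through that table.  As a result, adding a new type touches none of the generic code.  The structures and names here are hypothetical and far simpler than the real record header and method table in @file{lrecord.h}.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-type method table. */
struct object_methods {
  const char *name;
  void (*print) (const void *obj, char *buf, size_t len);
  int (*equal) (const void *a, const void *b);
};

/* Every object starts with a header pointing at its type's methods. */
struct header { const struct object_methods *methods; };

/* Generic code: dispatches through the table, knows no concrete type. */
void
generic_print (const struct header *obj, char *buf, size_t len)
{
  obj->methods->print (obj, buf, len);
}

int
generic_equal (const struct header *a, const struct header *b)
{
  return a->methods == b->methods && a->methods->equal (a, b);
}

/* One concrete type.  Adding it required no change to the code above. */
struct point { struct header h; int x, y; };

static void
point_print (const void *obj, char *buf, size_t len)
{
  const struct point *p = obj;   /* the header is the first member */
  snprintf (buf, len, "#<%s %d %d>", p->h.methods->name, p->x, p->y);
}

static int
point_equal (const void *a, const void *b)
{
  const struct point *p = a, *q = b;
  return p->x == q->x && p->y == q->y;
}

const struct object_methods point_methods = {
  "point", point_print, point_equal
};
```
@end verbatim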
@cindex pure space
@file{pure.c} contains the declaration of the @dfn{purespace} array.
Pure space is a hack used to place some constant Lisp data into the code
segment of the XEmacs executable, even though the data needs to be
initialized through function calls.  (See above in section VIII for more
info about this.)  During startup, certain sorts of data are
automatically copied into pure space, and other data is copied manually
in some of the basic Lisp files by calling the function @code{purecopy},
which copies the object if possible (this only works in temacs, of
course) and returns the new object.  In particular, while temacs is
executing, the Lisp reader automatically copies all compiled-function
objects that it reads into pure space.  Since compiled-function objects
are large, are never modified, and typically comprise the majority of
the contents of a compiled-Lisp file, this works well.  While XEmacs is
running, any attempt to modify an object that resides in pure space
causes an error.  Objects in pure space are never garbage collected --
almost all of the time, they're intended to be permanent, and in any
case you can't write into pure space to set the mark bits.

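As a rough illustration of two of the rules above, here is a minimal sketch that handles only strings.  First, pure objects live in one statically allocated array.  Second, mutators refuse to touch them.  @code{PURESIZE}, @code{purecopy_string()}, @code{purified()}, and @code{set_first_char()} are hypothetical names.  Returning a failure code stands in for the Lisp error that the real code signals.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical size; the real value comes from puresize.h and
   depends on the configuration. */
#define PURESIZE 4096

static char purespace[PURESIZE];
static size_t pure_bytes_used;

/* Nonzero if P points into pure space.  Compared as integers, since
   relational comparison of unrelated pointers is undefined in C. */
int
purified (const void *p)
{
  return (uintptr_t) p - (uintptr_t) purespace < PURESIZE;
}

/* Copy string S into pure space and return the copy, or return NULL
   if it does not fit (the caller then keeps using the original). */
char *
purecopy_string (const char *s)
{
  size_t n = strlen (s) + 1;
  char *copy;
  if (n > PURESIZE - pure_bytes_used)
    return NULL;
  copy = purespace + pure_bytes_used;
  memcpy (copy, s, n);
  pure_bytes_used += n;
  return copy;
}

/* Example mutator: refuses to modify pure data (returns -1 where
   real code would signal an error). */
int
set_first_char (char *s, char c)
{
  if (purified (s))
    return -1;
  s[0] = c;
  return 0;
}
```
@end verbatim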
@file{puresize.h} contains the declaration of the size of the pure space
array.  This depends on the optional features that are compiled in, any
extra purespace requested by the user at compile time, and certain other
factors (e.g. 64-bit machines need more pure space because their Lisp
objects are larger).  The smallest size that suffices should be used, so
that there's no wasted space.  If there's not enough pure space, you
will get an error during the build process, specifying how much more
pure space is needed.



@example
 122243  eval.c
   2305  backtrace.h
@end example

This module contains all of the functions to handle the flow of control.
This includes the mechanisms of defining functions, calling functions,
traversing stack frames, and binding variables; the control primitives
and other special forms such as @code{while}, @code{if}, @code{eval},
@code{let}, @code{and}, @code{or}, @code{progn}, etc.; handling of
non-local exits, unwind-protects, and exception handlers; entering the
debugger; methods for the subr Lisp object type; etc.  It does
@emph{not} include the @code{read} function, the @code{print} function,
or the handling of symbols and obarrays.

@file{backtrace.h} contains some structures related to stack frames and the
flow of control.

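Non-local exits in C are usually built on @code{setjmp()} and @code{longjmp()}.  The sketch below shows, in simplified form, how a catch, a throw, and unwind-protect cleanups can fit together.  Each catch remembers how deep the cleanup stack was when it was established, and a throw runs the pending cleanups before jumping back to the catch.  The function names echo ones found in the Emacs sources, but the signatures are hypothetical and use plain @code{int}s in place of Lisp objects.

@verbatim
```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>

/* Hypothetical unwind-protect stack of cleanup functions. */
#define MAX_UNWIND 32
static void (*unwind_stack[MAX_UNWIND]) (void);
static int unwind_depth;

void
record_unwind_protect (void (*cleanup) (void))
{
  assert (unwind_depth < MAX_UNWIND);
  unwind_stack[unwind_depth++] = cleanup;
}

/* Run pending cleanups, innermost first, until only COUNT remain. */
void
unbind_to (int count)
{
  while (unwind_depth > count)
    unwind_stack[--unwind_depth] ();
}

struct catch_frame {
  int tag;                   /* stands in for a Lisp catch tag */
  int unwind_count;          /* cleanup depth when the catch was set up */
  jmp_buf jmp;
  struct catch_frame *next;  /* enclosing catch */
};

static struct catch_frame *catch_list;
static int thrown_value;     /* global, so it is valid after longjmp */

/* Non-local exit: run cleanups, then jump to the matching catch. */
void
throw_to (int tag, int value)
{
  struct catch_frame *c;
  for (c = catch_list; c; c = c->next)
    if (c->tag == tag)
      {
        thrown_value = value;
        unbind_to (c->unwind_count);
        catch_list = c;      /* discard inner catches */
        longjmp (c->jmp, 1);
      }
  abort ();                  /* real code would signal a no-catch error */
}

/* Call BODY (ARG) inside a catch for TAG.  Returns BODY's value, or
   the thrown value if BODY (or anything it calls) throws to TAG. */
int
internal_catch (int tag, int (*body) (int), int arg)
{
  struct catch_frame c;
  int result;
  c.tag = tag;
  c.unwind_count = unwind_depth;
  c.next = catch_list;
  catch_list = &c;
  if (setjmp (c.jmp) == 0)
    result = body (arg);
  else
    result = thrown_value;
  catch_list = c.next;
  unbind_to (c.unwind_count);
  return result;
}

/* Example body: protects a cleanup, and throws if ARG is negative. */
static int cleanups_run;
static void note_cleanup (void) { cleanups_run++; }

int
example_body (int arg)
{
  int count = unwind_depth;
  record_unwind_protect (note_cleanup);
  if (arg < 0)
    throw_to (1, -arg);
  unbind_to (count);         /* a normal exit runs the cleanup too */
  return arg * 2;
}
```
@end verbatim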


@example
  64949  lread.c
@end example

This module implements the Lisp reader and the @code{read} function,
which converts text into Lisp objects, according to the read syntax of
the objects, as described above.  This is similar to the parser that is
a part of all compilers.



@example
  40900  print.c
@end example

This module implements the Lisp print mechanism and the @code{print}
function and related functions.  This is the inverse of the Lisp reader
-- it converts Lisp objects to a printed, textual representation.
(Hopefully something that can be read back in using @code{read} to get
an equivalent object.)



@example
   4518  general.c
  60220  symbols.c
   9966  symeval.h
@end example

@file{symbols.c} implements the handling of symbols, obarrays, and
retrieving the values of symbols.  Much of the code is devoted to
handling the special @dfn{symbol-value-magic} objects that define
special types of variables -- this includes buffer-local variables,
variable aliases, variables that forward into C variables, etc.  This
module is initialized extremely early (right after @file{alloc.c}),
because it is here that the basic symbols @code{t} and @code{nil} are
created, and those symbols are used everywhere throughout XEmacs.

@file{symeval.h} contains the definitions of symbol structures and the
@code{DEFVAR_LISP()} and related macros for declaring variables.

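To make ``variables that forward into C variables'' concrete, here is a minimal sketch of a symbol whose value cell either holds a value directly or points at a C variable, so that Lisp and C share one piece of storage.  The structure, the @code{DEFVAR_C_LONG} macro, and the variable @code{my_limit} are hypothetical.  The real symbol-value-magic objects, and the real @code{DEFVAR_LISP()} family, cover many more cases, such as buffer-local variables.

@verbatim
```c
#include <assert.h>

/* Hypothetical value cell: an ordinary value stored in the symbol,
   or a "magic" forward to a C variable. */
enum value_kind { PLAIN_VALUE, FORWARD_TO_C_LONG };

struct symbol {
  const char *name;
  enum value_kind kind;
  union {
    long plain;            /* ordinary value */
    long *c_variable;      /* forwarded: the value lives in C */
  } v;
};

long
symbol_value (const struct symbol *sym)
{
  return sym->kind == FORWARD_TO_C_LONG ? *sym->v.c_variable : sym->v.plain;
}

void
set_symbol_value (struct symbol *sym, long val)
{
  if (sym->kind == FORWARD_TO_C_LONG)
    *sym->v.c_variable = val;    /* C code sees the change at once */
  else
    sym->v.plain = val;
}

/* Hypothetical macro in the spirit of DEFVAR_LISP(): wire SYM, named
   LNAME in Lisp, to the C variable CVAR. */
#define DEFVAR_C_LONG(sym, lname, cvar)   \
  do {                                    \
    (sym)->name = (lname);                \
    (sym)->kind = FORWARD_TO_C_LONG;      \
    (sym)->v.c_variable = &(cvar);        \
  } while (0)

/* A hypothetical C variable exported to Lisp. */
long my_limit = 70;
```
@end verbatim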


@example
  48973  data.c
  25694  floatfns.c
  71049  fns.c
@end example

These modules implement the methods and standard Lisp primitives for all
the basic Lisp object types other than symbols (which are described
above).  @file{data.c} contains all the predicates (primitives that return
whether an object is of a particular type); the integer arithmetic
functions; and the basic accessor and mutator primitives for the various
object types.  @file{fns.c} contains all the standard primitives for working
with sequences (where, abstractly speaking, a sequence is an ordered set
of objects, and can be represented by a list, string, vector, or
bit-vector); it also contains @code{equal}, perhaps on the grounds that
the bulk of the operation of @code{equal} is comparing sequences.
@file{floatfns.c} contains methods and primitives for floats and floating-point
arithmetic.



@example
  23555  bytecode.c
   3358  bytecode.h
@end example

@file{bytecode.c} implements the byte-code interpreter, and @file{bytecode.h} contains
associated structures.  Note that the byte-code @emph{compiler} is
written in Lisp.

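For readers unfamiliar with byte-code interpreters, the following toy stack machine shows the general shape such an interpreter usually has: a loop that fetches an opcode and dispatches on it with a @code{switch}.  The opcodes and their encoding are invented for this sketch and bear no relation to the real XEmacs instruction set.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

/* Invented opcodes for this sketch; operands follow inline. */
enum opcode {
  OP_CONST,          /* push the following word */
  OP_ADD,            /* pop two values, push their sum */
  OP_MUL,            /* pop two values, push their product */
  OP_DUP,            /* push a copy of the top value */
  OP_GOTO_IF_ZERO,   /* pop; if zero, jump to the following address */
  OP_RETURN          /* return the top value */
};

/* Run CODE and return the value left by OP_RETURN.  No stack or
   bounds checking is done, to keep the dispatch loop visible. */
long
exec_bytecode (const long *code)
{
  long stack[64];
  int sp = 0;            /* index of the next free stack slot */
  size_t pc = 0;         /* index of the next instruction word */

  for (;;)
    switch ((enum opcode) code[pc++])
      {
      case OP_CONST:
        stack[sp++] = code[pc++];
        break;
      case OP_ADD:
        sp--;
        stack[sp - 1] += stack[sp];
        break;
      case OP_MUL:
        sp--;
        stack[sp - 1] *= stack[sp];
        break;
      case OP_DUP:
        stack[sp] = stack[sp - 1];
        sp++;
        break;
      case OP_GOTO_IF_ZERO:
        if (stack[--sp] == 0)
          pc = (size_t) code[pc];
        else
          pc++;          /* skip the jump address */
        break;
      case OP_RETURN:
        return stack[sp - 1];
      default:
        assert (!"invalid opcode");
        return 0;
      }
}
```
@end verbatim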



@node Modules for Standard Editing Operations
@section Modules for Standard Editing Operations

@example
   size  name
-------  ---------------------
  82900  buffer.c
  60964  buffer.h
   6059  bufslots.h
@end example

@file{buffer.c} implements the @dfn{buffer} Lisp object type.  This
includes functions that create and destroy buffers; retrieve buffers by
name or by other properties; manipulate lists of buffers (remember that
buffers are permanent objects and stored in various ordered lists);
retrieve or change buffer properties; etc.  It also contains the
definitions of all the built-in buffer-local variables (which can be
viewed as buffer properties).  It does @emph{not} contain code to
manipulate buffer-local variables (that's in @file{symbols.c}, described
above); or code to manipulate the text in a buffer.

@file{buffer.h} defines the structures associated with a buffer and the various
macros for retrieving text from a buffer and special buffer positions
(e.g. @code{point}, the default location for text insertion).  It also
contains macros for working with buffer positions and converting between
their representations as character offsets and as byte offsets (under
MULE, they are different, because characters can be multi-byte).  It is
one of the largest header files.

@file{bufslots.h} defines the fields in the buffer structure that correspond to
the built-in buffer-local variables.  It is its own header file because
it is included many times in @file{buffer.c}, as a way of iterating over all
the built-in buffer-local variables.

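Including the same list of slots several times, each time with a different definition of the per-slot macro, is a technique often called an ``X-macro''.  The sketch below shows it with a macro list instead of a separate header file, so that it fits in one file.  A single list generates the structure fields, a name/offset table, and code that visits every slot.  The slot names and functions are hypothetical and are not taken from @file{bufslots.h} or @file{buffer.c}.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The slot list.  The real source keeps it in its own header and
   #includes that header repeatedly; a macro list is used here so the
   example is one file.  Slot names are hypothetical. */
#define BUFFER_SLOTS        \
  SLOT (fill_column)        \
  SLOT (tab_width)          \
  SLOT (truncate_lines)

/* Expansion 1: the fields of the buffer structure. */
struct buffer {
#define SLOT(name) long name;
  BUFFER_SLOTS
#undef SLOT
};

/* Expansion 2: a table of slot names and offsets, for iteration. */
struct slot_info { const char *name; size_t offset; };

static const struct slot_info buffer_slot_table[] = {
#define SLOT(name) { #name, offsetof (struct buffer, name) },
  BUFFER_SLOTS
#undef SLOT
};

#define N_BUFFER_SLOTS \
  (sizeof buffer_slot_table / sizeof buffer_slot_table[0])

/* Expansion 3: code that touches every slot exactly once. */
void
reset_buffer_slots (struct buffer *b, long dflt)
{
#define SLOT(name) b->name = dflt;
  BUFFER_SLOTS
#undef SLOT
}

/* Generic lookup by name, driven by the table. */
long *
buffer_slot (struct buffer *b, const char *name)
{
  size_t i;
  for (i = 0; i < N_BUFFER_SLOTS; i++)
    if (strcmp (buffer_slot_table[i].name, name) == 0)
      return (long *) ((char *) b + buffer_slot_table[i].offset);
  return NULL;
}
```
@end verbatim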


@example
  79888  insdel.c
   6103  insdel.h
@end example

@file{insdel.c} contains low-level functions for inserting and deleting text in
a buffer, keeping track of changed regions for use by redisplay, and
calling any before-change and after-change functions that may have been
registered for the buffer.  It also contains the actual functions that
convert between byte offsets and character offsets.

@file{insdel.h} contains associated headers.

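When characters can be several bytes long, converting a character offset to a byte offset means walking over the text.  The minimal sketch below shows the uncached conversion in both directions.  As an assumption for illustration, it uses UTF-8 lead-byte rules to find each character's length.  MULE's internal representation is different, so only the structure of the loops carries over.  The function names are hypothetical, and a practical implementation would likely cache recently computed positions rather than always scanning from the start.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

/* Length in bytes of the character whose first byte is C.
   Assumption for illustration: UTF-8 lead-byte rules (MULE's internal
   encoding is different).  Only valid lead bytes are expected. */
static int
char_len (unsigned char c)
{
  if (c < 0x80)
    return 1;
  if (c < 0xE0)
    return 2;
  if (c < 0xF0)
    return 3;
  return 4;
}

/* Byte offset of character number CHARPOS (0-based) in TEXT. */
size_t
charpos_to_bytepos (const unsigned char *text, size_t charpos)
{
  size_t byte = 0;
  while (charpos-- > 0)
    byte += char_len (text[byte]);
  return byte;
}

/* Character offset of BYTEPOS, which must lie on a character boundary. */
size_t
bytepos_to_charpos (const unsigned char *text, size_t bytepos)
{
  size_t byte = 0, chars = 0;
  while (byte < bytepos)
    {
      byte += char_len (text[byte]);
      chars++;
    }
  assert (byte == bytepos);   /* otherwise BYTEPOS was mid-character */
  return chars;
}
```
@end verbatim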


@example
  10975  marker.c
@end example

This module implements the @dfn{marker} Lisp object type, which
conceptually is a pointer to a text position in a buffer that moves
around as text is inserted and deleted, so as to remain in the same
relative position.  This module doesn't actually move the markers around
-- that's handled in @file{insdel.c}.  This module just creates them and
implements the primitives for working with them.  As markers are simple
objects, this does not entail much.

Note that the standard arithmetic primitives (e.g. @code{+}) accept
markers in place of integers and automatically substitute the value of
@code{marker-position} for the marker, i.e. an integer describing the
current buffer position of the marker.

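The adjustment that @file{insdel.c} performs on markers can be sketched as follows.  After an insertion, markers past the insertion point move forward by the length of the inserted text.  After a deletion, markers inside the deleted region collapse to its start, and markers after it move back.  The structure and function names are hypothetical.  Leaving a marker that sits exactly at the insertion point where it is (before the new text) is a choice made for this sketch.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical marker: a buffer position plus a link to the next
   marker in the same buffer, so insertion and deletion code can find
   every marker that needs adjusting. */
struct marker {
  long pos;
  struct marker *next;
};

/* Called after LEN characters are inserted at POS. */
void
adjust_markers_for_insert (struct marker *m, long pos, long len)
{
  assert (len >= 0);
  for (; m != NULL; m = m->next)
    if (m->pos > pos)          /* a marker exactly at POS stays put */
      m->pos += len;
}

/* Called after the text between FROM and TO (FROM <= TO) is deleted. */
void
adjust_markers_for_delete (struct marker *m, long from, long to)
{
  assert (from <= to);
  for (; m != NULL; m = m->next)
    if (m->pos >= to)
      m->pos -= to - from;     /* after the region: move back */
    else if (m->pos > from)
      m->pos = from;           /* inside the region: collapse */
}
```
@end verbatim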


@example
 193714  extents.c
  15686  extents.h
@end example

This module implements the @dfn{extent} Lisp object type, which is like
a marker that works over a range of text rather than a single position.
Extents are also much more complex and powerful than markers and have a
more efficient (and more algorithmically complex) implementation.  The
implementation is described in detail in comments in @file{extents.c}.

The code in @file{extents.c} works closely with @file{insdel.c} so that
extents are properly moved around as text is inserted and deleted.
There is also code in @file{extents.c} that provides information needed
by the redisplay mechanism for efficient operation. (Remember that
extents can have display properties that affect [sometimes drastically,
as in the @code{invisible} property] the display of the text they
cover.)



@example
  60155  editfns.c
@end example

@file{editfns.c} contains the standard Lisp primitives for working with
a buffer's text, and calls the low-level functions in @file{insdel.c}.
It also contains primitives for working with @code{point} (the default
buffer insertion location).

@file{editfns.c} also contains functions for retrieving various
characteristics from the external environment: the current time, the
process ID of the running XEmacs process, the name of the user who ran
this XEmacs process, etc.  It's not clear why this code is in
@file{editfns.c}.



@example
  26081  callint.c
  12577  cmds.c
   2749  commands.h
@end example

@cindex interactive
These modules implement the basic @dfn{interactive} commands,
i.e. user-callable functions.  Commands, as opposed to other functions,
have special ways of getting their parameters interactively (by querying
the user), as opposed to having them passed in a normal function
invocation.  Many commands are not really meant to be called from other
Lisp functions, because they modify global state in a way that's often
undesired as part of other Lisp functions.

@file{callint.c} implements the mechanism for querying the user for
parameters and calling interactive commands.  The bulk of this module is
code that parses the interactive spec that is supplied with an
interactive command.

@file{cmds.c} implements the basic, most commonly used editing commands:
commands to move around the current buffer and insert and delete
characters.  These commands are implemented using the Lisp primitives
defined in @file{editfns.c}.

@file{commands.h} contains associated structure definitions and prototypes.



@example
 194863  regex.c
  18968  regex.h
  79800  search.c
@end example

@file{search.c} implements the Lisp primitives for searching for text in
a buffer, and some of the low-level algorithms for doing this.  In
particular, the fast fixed-string Boyer-Moore search algorithm is
implemented in @file{search.c}.  The low-level algorithms for doing
regular-expression searching, however, are implemented in @file{regex.c}
and @file{regex.h}.  These two modules are largely independent of
XEmacs, and are similar to (and based upon) the regular-expression
routines used in @file{grep} and other GNU utilities.


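For reference, here is a minimal sketch of fixed-string search in the Boyer-Moore family.  It is the simpler Horspool variant, which uses only a ``bad character'' shift table, swapped in for brevity.  It is not the code in @file{search.c}, and it ignores complications such as case-insensitive matching.

@verbatim
```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Return the offset of the first occurrence of PAT (length M) in TEXT
   (length N), or -1.  Horspool's simplification of Boyer-Moore: only
   the "bad character" shift table is used. */
long
horspool_search (const unsigned char *text, size_t n,
                 const unsigned char *pat, size_t m)
{
  size_t shift[UCHAR_MAX + 1];
  size_t i, pos;

  if (m == 0)
    return 0;
  if (m > n)
    return -1;

  /* By default, slide the pattern past the examined character. */
  for (i = 0; i <= UCHAR_MAX; i++)
    shift[i] = m;
  /* Characters occurring in the pattern (except its last position)
     allow only a shorter slide. */
  for (i = 0; i + 1 < m; i++)
    shift[pat[i]] = m - 1 - i;

  pos = 0;
  while (pos <= n - m)
    {
      /* Compare right to left, as Boyer-Moore does. */
      size_t j = m;
      while (j > 0 && text[pos + j - 1] == pat[j - 1])
        j--;
      if (j == 0)
        return (long) pos;
      /* Shift by the text character under the pattern's last position. */
      pos += shift[text[pos + m - 1]];
    }
  return -1;
}
```
@end verbatim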

@example
  20476  doprnt.c
@end example

@file{doprnt.c} implements formatted-string processing, similar to the
@code{printf()} function in C.



@example
  15372  undo.c
@end example

This module implements the undo mechanism for tracking buffer changes.
Most of this could be implemented in Lisp.



@node Editor-Level Control Flow Modules
@section Editor-Level Control Flow Modules

@example
   size  name
-------  ---------------------
  84546  event-Xt.c
 121483  event-stream.c
   6658  event-tty.c
  49271  events.c
  14459  events.h
@end example

These implement the handling of events (user input and other system
notifications).

@file{events.c} and @file{events.h} define the @dfn{event} Lisp object
type and primitives for manipulating it.

@file{event-stream.c} implements the basic functions for working with
event queues, dispatching an event by looking it up in relevant keymaps
and such, and handling timeouts; this includes the primitives
@code{next-event} and @code{dispatch-event}, as well as related
primitives such as @code{sit-for}, @code{sleep-for}, and
@code{accept-process-output}. (@file{event-stream.c} is one of the
hairiest and trickiest modules in XEmacs.  Beware!  You can easily mess
things up here.)

@file{event-Xt.c} and @file{event-tty.c} implement the low-level
interfaces onto retrieving events from Xt (the X toolkit) and from TTYs
(using @code{read()} and @code{select()}), respectively.  The event
interface enforces a clean separation between the specific code for
interfacing with the operating system and the generic code for working
with events, by defining an API of basic, low-level event methods;
@file{event-Xt.c} and @file{event-tty.c} are two different
implementations of this API.  To add support for a new operating system
(e.g. NeXTstep), one merely needs to provide another implementation of
those API functions.

Note that the choice of whether to use @file{event-Xt.c} or
@file{event-tty.c} is made at compile time!  Or at the very latest, it
is made at startup time.  @file{event-Xt.c} handles events for
@emph{both} X and TTY frames; @file{event-tty.c} is only used when X
support is not compiled into XEmacs.  The reason for this is that there
is only one event loop in XEmacs: thus, it needs to be able to receive
events from all different kinds of frames.



@example
 129583  keymap.c
   2621  keymap.h
@end example

@file{keymap.c} and @file{keymap.h} define the @dfn{keymap} Lisp object
type and associated methods and primitives. (Remember that keymaps are
objects that associate event descriptions with functions to be called to
``execute'' those events; @code{dispatch-event} looks up events in the
relevant keymaps.)



@example
  25212  keyboard.c
@end example

@file{keyboard.c} contains functions that implement the actual editor
command loop -- i.e. the event loop that cyclically retrieves and
dispatches events.  This code is also rather tricky, just like
@file{event-stream.c}.



@example
   9973  macros.c
   1397  macros.h
@end example

These two modules contain the basic code for defining keyboard macros.
These functions don't actually do much; most of the code that handles keyboard
macros is mixed in with the event-handling code in @file{event-stream.c}.



@example
  23234  minibuf.c
@end example

This contains some miscellaneous code related to the minibuffer (most of
the minibuffer code was moved into Lisp by Richard Mlynarik).  This
includes the primitives for completion (although filename completion is
in @file{dired.c}), the lowest-level interface to the minibuffer (if the
command loop were cleaned up, this too could be in Lisp), and code for
dealing with the echo area (this, too, was mostly moved into Lisp, and
the only code remaining is code to call out to Lisp or provide simple
bootstrapping implementations early in temacs, before the echo-area Lisp
code is loaded).



@node Modules for the Basic Displayable Lisp Objects
@section Modules for the Basic Displayable Lisp Objects

@example
   size  name
-------  ---------------------
    985  device-ns.h
   6454  device-stream.c
   1196  device-stream.h
   9526  device-tty.c
   8660  device-tty.h
  43798  device-x.c
  11667  device-x.h
  26056  device.c
  22993  device.h
@end example

These modules implement the @dfn{device} Lisp object type.  This
abstracts a particular screen or connection on which frames are
displayed.  As with Lisp objects, event interfaces, and other
subsystems, the device code is separated into a generic component that
contains a standardized interface (in the form of a set of methods) onto
particular device types.

The device subsystem defines all the methods and provides method
services for not only device operations but also for the frame, window,
menubar, scrollbar, toolbar, and other displayable-object subsystems.
The reason for this is that all of these subsystems have the same
subtypes (X, TTY, NeXTstep, Microsoft Windows, etc.) as devices do.



@example
    934  frame-ns.h
   2303  frame-tty.c
  69205  frame-x.c
   5976  frame-x.h
  68175  frame.c
  15080  frame.h
@end example

Each device contains one or more frames in which objects (e.g. text) are
displayed.  A frame corresponds to a window in the window system;
usually this is a top-level window but it could potentially be one of a
number of overlapping child windows within a top-level window, using the
MDI (Multiple Document Interface) protocol in Microsoft Windows or a
similar scheme.

The @file{frame-*} files implement the @dfn{frame} Lisp object type and
provide the generic and device-type-specific operations on frames
(e.g. raising, lowering, resizing, moving, etc.).



@example
 160783  window.c
  15974  window.h
@end example

@cindex window (in Emacs)
@cindex pane
Each frame consists of one or more non-overlapping @dfn{windows} (better
known as @dfn{panes} in standard window-system terminology) in which a
buffer's text can be displayed.  Windows can also have scrollbars
displayed around their edges.

@file{window.c} and @file{window.h} implement the @dfn{window} Lisp
object type and provide code to manage windows.  Since windows have no
associated resources in the window system (the window system knows only
about the frame; no child windows or anything are used for XEmacs
windows), there is no device-type-specific code here; all of that code
is part of the redisplay mechanism or the code for particular object
types such as scrollbars.



@node Modules for other Display-Related Lisp Objects
@section Modules for other Display-Related Lisp Objects

@example
   size  name
-------  ---------------------
  54397  faces.c
  15173  faces.h
@end example



@example
   4961  bitmaps.h
    954  glyphs-ns.h
 105345  glyphs-x.c
   4288  glyphs-x.h
  72102  glyphs.c
  16356  glyphs.h
@end example



@example
    952  objects-ns.h
   9971  objects-tty.c
   1465  objects-tty.h
  32326  objects-x.c
   2806  objects-x.h
  31944  objects.c
   6809  objects.h
@end example



@example
  57511  menubar-x.c
  11243  menubar.c
@end example



@example
  25012  scrollbar-x.c
   2554  scrollbar-x.h
  26954  scrollbar.c
   2778  scrollbar.h
@end example



@example
  23117  toolbar-x.c
  43456  toolbar.c
   4280  toolbar.h
@end example



@example
  25070  font-lock.c
@end example

This file provides C support for syntax highlighting -- i.e.
highlighting different syntactic constructs of a source file in
different colors, for easy reading.  The C support is provided so that
this is fast.



@example
  32180  dgif_lib.c
   3999  gif_err.c
  10697  gif_lib.h
   9371  gifalloc.c
@end example

These modules decode GIF-format image files, for use with glyphs.



@node Modules for the Redisplay Mechanism
@section Modules for the Redisplay Mechanism

@example
   size  name
-------  ---------------------
  38692  redisplay-output.c
  40835  redisplay-tty.c
  65069  redisplay-x.c
 234142  redisplay.c
  17026  redisplay.h
@end example

These files provide the redisplay mechanism.  As with many other
subsystems in XEmacs, there is a clean separation between the general
and device-specific support.

@file{redisplay.c} contains the bulk of the redisplay engine.  These
functions update the redisplay structures (which describe how the screen
is to appear) to reflect any changes made to the state of any
displayable objects (buffer, frame, window, etc.) since the last time
that redisplay was called.  These functions are highly optimized to
avoid doing more work than necessary (since redisplay is called
extremely often and is potentially a huge time sink), and depend heavily
on notifications from the objects themselves that changes have occurred,
so that redisplay doesn't explicitly have to check each possible object.
The redisplay mechanism also contains a great deal of caching to further
speed things up; some of this caching is contained within the various
displayable objects.

@file{redisplay-output.c} goes through the redisplay structures and converts
them into calls to device-specific methods to actually output the screen
changes.

@file{redisplay-x.c} and @file{redisplay-tty.c} are two implementations
of these redisplay output methods, for X frames and TTY frames,
respectively.



@example
  14129  indent.c
@end example

This module contains various functions and Lisp primitives for
converting between buffer positions and screen positions.  These
functions call the redisplay mechanism to do most of the work, and then
examine the redisplay structures to get the necessary information.  This
module needs work.



@example
  14754  termcap.c
   2141  terminfo.c
   7253  tparam.c
@end example

These files contain functions for working with the termcap (BSD-style)
and terminfo (System V style) databases of terminal capabilities and
escape sequences, used when XEmacs is displaying in a TTY.



@example
  10869  cm.c
   5876  cm.h
@end example

These files provide some miscellaneous TTY-output functions and should
probably be merged into @file{redisplay-tty.c}.



@node Modules for Interfacing with the File System
@section Modules for Interfacing with the File System

@example
   size  name
-------  ---------------------
  43362  lstream.c
  14240  lstream.h
@end example

These modules implement the @dfn{stream} Lisp object type.  This is an
internal-only Lisp object that implements a generic buffering stream.
The idea is to provide a uniform interface onto all sources and sinks of
data, including file descriptors, stdio streams, chunks of memory, Lisp
buffers, Lisp strings, etc.  That way, I/O functions can be written to
the stream interface and can transparently handle all possible sources
and sinks.  (For example, the @code{read} function can read data from a
file, a string, a buffer, or even a function that is called repeatedly
to return data, without worrying about where the data is coming from or
what-size chunks it is returned in.)

@cindex lstream
Note that in the C code, streams are called @dfn{lstreams} (for ``Lisp
streams'') to distinguish them from other kinds of streams, e.g. stdio
streams and C++ I/O streams.

Similar to other subsystems in XEmacs, lstreams are separated into
generic functions and a set of methods for the different types of
lstreams.  @file{lstream.c} provides implementations of many different
types of streams; others are provided, e.g., in @file{mule-coding.c}.



@example
 126926  fileio.c
@end example

This implements the basic primitives for interfacing with the file
system.  This includes primitives for reading files into buffers,
writing buffers into files, checking for the presence or accessibility
of files, canonicalizing file names, etc.  Note that these primitives
are usually not invoked directly by the user: there is a great deal of
higher-level Lisp code that implements the user commands such as
@code{find-file} and @code{save-buffer}.  This is similar to the
distinction between the lower-level primitives in @file{editfns.c} and
the higher-level user commands in @file{cmds.c} and
@file{simple.el}.



@example
  10960  filelock.c
@end example

This file provides functions for detecting clashes between different
processes (e.g. XEmacs and some external process, or two different
XEmacs processes) modifying the same file.  (XEmacs can optionally use
the @file{lock/} subdirectory to provide a form of ``locking'' between
different XEmacs processes.)  This module is also used by the low-level
functions in @file{insdel.c} to ensure that, if the first modification
is being made to a buffer whose corresponding file has been externally
modified, the user is made aware of this so that the buffer can be
synched up with the external changes if necessary.


@example
   4527  filemode.c
@end example

This file provides some miscellaneous functions that construct a
@samp{rwxr-xr-x}-type permissions string (as might appear in an
@file{ls}-style directory listing) given the information returned by the
@code{stat()} system call.

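A minimal sketch of such a function is shown below.  It uses the standard POSIX file-type and permission macros from @file{sys/stat.h}.  The name @code{mode_string()} is an assumption, and unlike a complete implementation the sketch ignores the setuid, setgid, and sticky bits.

@verbatim
```c
#define _XOPEN_SOURCE 700   /* expose S_ISLNK() and the S_IF* constants */
#include <assert.h>
#include <string.h>
#include <sys/stat.h>

/* Fill BUF (at least 11 bytes) with an ls-style string such as
   "drwxr-xr-x" describing MODE, the st_mode field filled in by stat().
   Simplified: setuid, setgid, and sticky bits are ignored. */
void
mode_string (mode_t mode, char *buf)
{
  static const mode_t bits[9] = {
    S_IRUSR, S_IWUSR, S_IXUSR,
    S_IRGRP, S_IWGRP, S_IXGRP,
    S_IROTH, S_IWOTH, S_IXOTH
  };
  static const char letters[] = "rwxrwxrwx";
  int i;

  buf[0] = S_ISDIR (mode) ? 'd'
         : S_ISLNK (mode) ? 'l'
         : S_ISCHR (mode) ? 'c'
         : S_ISBLK (mode) ? 'b'
         : S_ISFIFO (mode) ? 'p'
         : '-';
  for (i = 0; i < 9; i++)
    buf[i + 1] = (mode & bits[i]) ? letters[i] : '-';
  buf[10] = '\0';
}
```
@end verbatim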


@example
  22855  dired.c
   2094  ndir.h
@end example

These files implement the XEmacs interface to directory searching.  This
includes a number of primitives for determining the files in a directory
and for doing filename completion. (Remember that generic completion is
handled by a different mechanism, in @file{minibuf.c}.)

@file{ndir.h} is a header file used for the directory-searching
emulation functions provided in @file{sysdep.c} (see section J below),
for systems that don't provide any directory-searching functions. (On
those systems, directories can be read directly as files, and parsed.)



@example
   4311  realpath.c
@end example

This file provides an implementation of the @code{realpath()} function
for expanding symbolic links, on systems that don't implement it or have
a broken implementation.



@node Modules for Other Aspects of the Lisp Interpreter and Object System
@section Modules for Other Aspects of the Lisp Interpreter and Object System

@example
   size  name
-------  ---------------------
  22290  elhash.c
   2454  elhash.h
  12169  hash.c
   3369  hash.h
@end example

These files implement the @dfn{hashtable} Lisp object type.
@file{hash.c} and @file{hash.h} provide a generic C implementation of
hash tables (which can stand independently of XEmacs), and
@file{elhash.c} and @file{elhash.h} provide a Lisp interface onto the C
hash tables using the hashtable Lisp object type.

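As an illustration of a self-contained C hash table of the kind described above, here is a minimal sketch using open addressing with linear probing and string keys.  The design choices (fixed size, no deletion, keys not copied) and all names are assumptions made for brevity.  The sketch does not reproduce the algorithm or interface of @file{hash.c}.

@verbatim
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct entry { const char *key; void *value; };

struct hash_table {
  struct entry *slots;
  size_t size;      /* number of slots; must be a power of two */
  size_t count;     /* number of occupied slots */
};

static size_t
hash_string (const char *s)
{
  size_t h = 5381;                   /* djb2-style string hash */
  while (*s)
    h = h * 33 + (unsigned char) *s++;
  return h;
}

struct hash_table *
make_hash_table (size_t size)
{
  struct hash_table *t;
  assert (size >= 2 && (size & (size - 1)) == 0);
  t = malloc (sizeof *t);
  if (t == NULL)
    return NULL;
  t->slots = calloc (size, sizeof *t->slots);   /* every key starts NULL */
  if (t->slots == NULL)
    {
      free (t);
      return NULL;
    }
  t->size = size;
  t->count = 0;
  return t;
}

/* Return the slot holding KEY, or the empty slot where it belongs.
   Terminates because puthash always leaves at least one slot empty. */
static struct entry *
find_slot (const struct hash_table *t, const char *key)
{
  size_t i = hash_string (key) & (t->size - 1);
  while (t->slots[i].key != NULL && strcmp (t->slots[i].key, key) != 0)
    i = (i + 1) & (t->size - 1);     /* linear probing */
  return &t->slots[i];
}

/* Map KEY to VALUE; KEY is not copied.  Returns 0, or -1 if the table
   is full (this sketch never grows the table). */
int
puthash (struct hash_table *t, const char *key, void *value)
{
  struct entry *e = find_slot (t, key);
  if (e->key == NULL)
    {
      if (t->count + 1 >= t->size)
        return -1;
      e->key = key;
      t->count++;
    }
  e->value = value;
  return 0;
}

/* Return the value for KEY, or NULL if KEY is absent. */
void *
gethash (const struct hash_table *t, const char *key)
{
  const struct entry *e = find_slot (t, key);
  return e->key != NULL ? e->value : NULL;
}
```
@end verbatim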
3594
3595
3596 @example
3597   95691  specifier.c
3598   11167  specifier.h
3599 @end example
3600
3601 This module implements the @dfn{specifier} Lisp object type.  This is
3602 primarily used for displayable properties, and allows for values that
3603 are specific to a particular buffer, window, frame, device, or device
3604 class, as well as a default value existing.  This is used, for example,
3605 to control the height of the horizontal scrollbar or the appearance of
3606 the @code{default}, @code{bold}, or other faces.  The specifier object
3607 consists of a number of specifications, each of which maps from a
3608 buffer, window, etc. to a value.  The function @code{specifier-instance}
3609 looks up a value given a window (from which a buffer, frame, and device
3610 can be derived).
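The search that @code{specifier-instance} performs can be pictured as a walk from the most specific locale to the least specific one.  The sketch below assumes the order window, buffer, frame, device, then global, and it ignores device classes, tag sets, and instantiators entirely.  All types and names in it are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical locale kinds, ordered from most to least specific. */
enum locale_kind { LOC_WINDOW, LOC_BUFFER, LOC_FRAME, LOC_DEVICE, LOC_GLOBAL };

/* One specification: "in this locale, the value is VALUE". */
struct specification {
  enum locale_kind kind;
  const void *locale;   /* the window/buffer/frame/device; NULL for global */
  int value;
};

/* A window determines its buffer, frame, and device, so it can serve
   as the lookup domain for all of them. */
struct window_domain {
  const void *window, *buffer, *frame, *device;
};

/* Return the value of the most specific matching specification,
   or FALLBACK if none matches. */
int specifier_instance_sketch (const struct specification *specs, size_t n,
                               const struct window_domain *w, int fallback)
{
  const void *chain[4];
  int kind;
  size_t i;

  assert (w != NULL);
  chain[LOC_WINDOW] = w->window;
  chain[LOC_BUFFER] = w->buffer;
  chain[LOC_FRAME] = w->frame;
  chain[LOC_DEVICE] = w->device;

  for (kind = LOC_WINDOW; kind <= LOC_GLOBAL; kind++)
    for (i = 0; i < n; i++)
      if (specs[i].kind == (enum locale_kind) kind
          && (kind == LOC_GLOBAL || specs[i].locale == chain[kind]))
        return specs[i].value;
  return fallback;
}
```

For example, with a global value and a frame-specific value present, a window on that frame gets the frame's value, because the frame is searched before the global locale.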
3611
3612
3613 @example
3614   43058  chartab.c
3615    6503  chartab.h
3616    9918  casetab.c
3617 @end example
3618
3619 @file{chartab.c} and @file{chartab.h} implement the @dfn{char table}
3620 Lisp object type, which maps from characters or certain sorts of
3621 character ranges to Lisp objects.  The implementation of this object
3622 type is optimized for the internal representation of characters.  Char
3623 tables come in different types, which affect the allowed object types to
3624 which a character can be mapped and also dictate certain other
3625 properties of the char table.
3626
3627 @cindex case table
3628 @file{casetab.c} implements one sort of char table, the @dfn{case
3629 table}, which maps characters to other characters of possibly different
3630 case.  These are used by XEmacs to implement case-changing primitives
3631 and to do case-insensitive searching.
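A case table can be thought of as a pair of per-character maps.  The following sketch restricts itself to single-byte ASCII text (an assumption; the real case tables are char tables covering all characters) and shows how folding both sides through the downcase map yields a case-insensitive comparison.  The type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-byte case table: one map to lowercase, one to
   uppercase. */
struct case_table {
  unsigned char down[256];
  unsigned char up[256];
};

/* Map ASCII letters to their other case; every other byte maps to itself. */
void init_ascii_case_table (struct case_table *ct)
{
  int c;
  for (c = 0; c < 256; c++)
    ct->down[c] = ct->up[c] = (unsigned char) c;
  for (c = 'A'; c <= 'Z'; c++)
    {
      ct->down[c] = (unsigned char) (c - 'A' + 'a');
      ct->up[c - 'A' + 'a'] = (unsigned char) c;
    }
}

/* Compare N bytes of A and B ignoring case, by folding both through
   the DOWN map. */
int case_fold_equal (const struct case_table *ct,
                     const char *a, const char *b, size_t n)
{
  size_t i;
  assert (ct != NULL);
  for (i = 0; i < n; i++)
    if (ct->down[(unsigned char) a[i]] != ct->down[(unsigned char) b[i]])
      return 0;
  return 1;
}
```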
3632
3633
3634
3635 @example
3636   49593  syntax.c
3637   10200  syntax.h
3638 @end example
3639
3640 @cindex scanner
3641 This module implements @dfn{syntax tables}, another sort of char table
3642 that maps characters into syntax classes that define the syntax of these
3643 characters (e.g. a parenthesis belongs to a class of @samp{open}
3644 characters that have corresponding @samp{close} characters and can be
3645 nested).  This module also implements the Lisp @dfn{scanner}, a set of
3646 primitives for scanning over text based on syntax tables.  This is used,
3647 for example, to find the matching parenthesis in a command such as
3648 @code{forward-sexp}, and by @file{font-lock.c} to locate quoted strings,
3649 comments, etc.
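To give a feel for how the scanner consults a syntax table, here is a much-simplified sketch of finding a matching close parenthesis by tracking nesting depth.  The four syntax classes and the function names are hypothetical.  The real scanner additionally handles strings, comments, escape characters, and mismatched delimiters.

```c
#include <assert.h>
#include <stddef.h>

/* A tiny, hypothetical subset of syntax classes. */
enum syntax_class { SYN_WHITESPACE, SYN_WORD, SYN_OPEN, SYN_CLOSE };

/* A syntax table maps each byte to a class. */
struct syntax_table {
  unsigned char cls[256];
};

void init_lisp_like_syntax (struct syntax_table *st)
{
  int c;
  for (c = 0; c < 256; c++)
    st->cls[c] = SYN_WORD;
  st->cls[' '] = st->cls['\t'] = st->cls['\n'] = SYN_WHITESPACE;
  st->cls['('] = st->cls['['] = SYN_OPEN;
  st->cls[')'] = st->cls[']'] = SYN_CLOSE;
}

/* POS must index an open character.  Return the index just past its
   matching close character, or -1 if the text ends first.  Only the
   nesting depth is tracked; pair types are not checked. */
long scan_matching_close (const struct syntax_table *st,
                          const char *text, size_t len, size_t pos)
{
  long depth = 0;
  size_t i;
  assert (pos < len && st->cls[(unsigned char) text[pos]] == SYN_OPEN);
  for (i = pos; i < len; i++)
    {
      unsigned char cls = st->cls[(unsigned char) text[i]];
      if (cls == SYN_OPEN)
        depth++;
      else if (cls == SYN_CLOSE && --depth == 0)
        return (long) (i + 1);
    }
  return -1;
}
```

Because the classification lives in the table rather than in the loop, changing which characters count as delimiters (as different major modes do) requires no change to the scanning code.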
3650
3651
3652
3653 @example
3654   10438  casefiddle.c
3655 @end example
3656
3657 This module implements various Lisp primitives for upcasing, downcasing
3658 and capitalizing strings or regions of buffers.
3659
3660
3661
3662 @example
3663   20234  rangetab.c
3664 @end example
3665
3666 This module implements the @dfn{range table} Lisp object type, which
3667 provides for a mapping from ranges of integers to arbitrary Lisp
3668 objects.
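One plausible representation for such a mapping is a sorted array of non-overlapping ranges searched by bisection.  The sketch below assumes inclusive ranges and integer values; it illustrates the idea and is not a description of how @file{rangetab.c} actually stores its data.  All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One inclusive range [start, end] mapped to a value.  The array is
   assumed sorted by START, with no overlapping ranges. */
struct range_entry {
  long start, end;
  int value;
};

/* Binary search for the range containing KEY; return a pointer to its
   value, or NULL if KEY lies outside every range. */
const int *range_table_get (const struct range_entry *r, size_t n, long key)
{
  size_t lo = 0, hi = n;        /* candidates are r[lo] .. r[hi - 1] */
  assert (n == 0 || r != NULL);
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (key < r[mid].start)
        hi = mid;
      else if (key > r[mid].end)
        lo = mid + 1;
      else
        return &r[mid].value;
    }
  return NULL;
}
```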
3669
3670
3671
3672 @example
3673    3201  opaque.c
3674    2206  opaque.h
3675 @end example
3676
3677 This module implements the @dfn{opaque} Lisp object type, an
3678 internal-only Lisp object that encapsulates an arbitrary block of memory
3679 so that it can be managed by the Lisp allocation system.  To create an
3680 opaque object, you call @code{make_opaque()}, passing a pointer to a
3681 block of memory.  An object is created that is big enough to hold the
3682 memory, which is copied into the object's storage.  The object will then
3683 stick around as long as you keep pointers to it, after which it will be
3684 automatically reclaimed.
3685
3686 @cindex mark method
3687 Opaque objects can also have an arbitrary @dfn{mark method} associated
3688 with them, in case the block of memory contains other Lisp objects that
3689 need to be marked for garbage-collection purposes. (If you need other
3690 object methods, such as a finalize method, you should just go ahead and
3691 create a new Lisp object type -- it's not hard.)
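The essential trick behind opaque objects, allocating a header plus enough room for a copy of the caller's bytes, can be sketched in plain C.  The structure layout, the @code{markfn} field, and the name @code{make_opaque_sketch} are hypothetical.  In particular, the real @code{make_opaque()} goes through the Lisp allocator and its storage is reclaimed by the garbage collector, whereas this sketch uses @code{malloc()} and leaves freeing to the caller.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical opaque object: a header followed by a copy of the
   caller's bytes.  MARKFN, if non-NULL, stands for the method a garbage
   collector would call to mark Lisp objects stored in the payload. */
struct opaque {
  size_t size;
  void (*markfn) (void *data);
  unsigned char data[];         /* flexible array member (C99) */
};

/* Allocate room for SIZE bytes and copy DATA in, so the caller's own
   block need not outlive this call. */
struct opaque *make_opaque_sketch (const void *data, size_t size,
                                   void (*markfn) (void *))
{
  struct opaque *p = malloc (sizeof *p + size);
  if (!p)
    return NULL;
  p->size = size;
  p->markfn = markfn;
  if (size > 0)
    {
      assert (data != NULL);
      memcpy (p->data, data, size);
    }
  return p;
}
```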
3692
3693
3694
3695 @example
3696    8783  abbrev.c
3697 @end example
3698
This module provides a few primitives for doing abbreviation
3700 expansion.  In XEmacs, most of the code for this has been moved into
3701 Lisp.  Some C code remains for speed and because the primitive
3702 @code{self-insert-command} (which is executed for all self-inserting
3703 characters) hooks into the abbrev mechanism. (@code{self-insert-command}
3704 is itself in C only for speed.)
3705
3706
3707
3708 @example
3709   21934  doc.c
3710 @end example
3711
This module provides primitives for retrieving the documentation
3713 strings of functions and variables.  These documentation strings contain
3714 certain special markers that get dynamically expanded (e.g. a
3715 reverse-lookup is performed on some named functions to retrieve their
3716 current key bindings).  Some documentation strings (in particular, for
3717 the built-in primitives and pre-loaded Lisp functions) are stored
3718 externally in a file @file{DOC} in the @file{lib-src/} directory and
3719 need to be fetched from that file. (Part of the build stage involves
3720 building this file, and another part involves constructing an index for
3721 this file and embedding it into the executable, so that the functions in
3722 @file{doc.c} do not have to search the entire @file{DOC} file to find
3723 the appropriate documentation string.)
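The payoff of the embedded index is that fetching a string becomes a seek plus a sequential read.  The sketch below assumes a file in which each string starts at a known byte offset and is terminated by the ASCII unit separator (octal 037) or by end of file.  The function name @code{fetch_doc_string} is hypothetical, and the exact @file{DOC} format may differ in detail.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read the documentation string that starts at byte OFFSET of the open
   file F, stopping at the next 037 byte or at end of file.  Returns a
   malloc()ed, NUL-terminated string, or NULL on error. */
char *fetch_doc_string (FILE *f, long offset)
{
  size_t len = 0, cap = 64;
  char *buf = malloc (cap);
  int c;

  assert (f != NULL && offset >= 0);
  if (!buf || fseek (f, offset, SEEK_SET) != 0)
    {
      free (buf);
      return NULL;
    }
  while ((c = getc (f)) != EOF && c != 037)
    {
      if (len + 1 >= cap)       /* keep room for this byte and the NUL */
        {
          char *nbuf = realloc (buf, cap *= 2);
          if (!nbuf)
            {
              free (buf);
              return NULL;
            }
          buf = nbuf;
        }
      buf[len++] = (char) c;
    }
  buf[len] = '\0';
  return buf;
}
```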
3724
3725
3726
3727 @example
3728   13197  md5.c
3729 @end example
3730
This module provides a Lisp primitive that implements the MD5 message
digest algorithm, which computes a 128-bit hash value of a string of
data such that the data cannot feasibly be derived from the hash value.
This is used for various security applications on the Internet.  (MD5
is no longer considered collision-resistant, so it is unsuitable for
new security-sensitive uses.)
3735
3736
3737
3738
3739 @node Modules for Interfacing with the Operating System
3740 @section Modules for Interfacing with the Operating System
3741
3742 @example
3743    size  name
3744 -------  ---------------------
3745   33533  callproc.c
3746   89697  process.c
3747    4663  process.h
3748 @end example
3749
3750 These modules allow XEmacs to spawn and communicate with subprocesses
3751 and network connections.
3752
3753 @cindex synchronous subprocesses
3754 @cindex subprocesses, synchronous
3755   @file{callproc.c} implements (through the @code{call-process}
3756 primitive) what are called @dfn{synchronous subprocesses}.  This means
3757 that XEmacs runs a program, waits till it's done, and retrieves its
3758 output.  A typical example might be calling the @file{ls} program to get
3759 a directory listing.
3760
3761 @cindex asynchronous subprocesses
3762 @cindex subprocesses, asynchronous
3763   @file{process.c} and @file{process.h} implement @dfn{asynchronous
3764 subprocesses}.  This means that XEmacs starts a program and then
3765 continues normally, not waiting for the process to finish.  Data can be
3766 sent to the process or retrieved from it as it's running.  This is used
3767 for the @code{shell} command (which provides a front end onto a shell
3768 program such as @file{csh}), the mail and news readers implemented in
3769 XEmacs, etc.  The result of calling @code{start-process} to start a
3770 subprocess is a process object, a particular kind of object used to
3771 communicate with the subprocess.  You can send data to the process by
passing the process object and the data to @code{process-send-string}, and you
3773 can specify what happens to data retrieved from the process by setting
3774 properties of the process object. (When the process sends data, XEmacs
3775 receives a process event, which says that there is data ready.  When
3776 @code{dispatch-event} is called on this event, it reads the data from
3777 the process and does something with it, as specified by the process
3778 object's properties.  Typically, this means inserting the data into a
3779 buffer or calling a function.) Another property of the process object is
3780 called the @dfn{sentinel}, which is a function that is called when the
3781 process terminates.
3782
3783 @cindex network connections
3784   Process objects are also used for network connections (connections to a
3785 process running on another machine).  Network connections are started
3786 with @code{open-network-stream} but otherwise work just like
3787 subprocesses.
3788
3789
3790
3791 @example
3792  136029  sysdep.c
3793    5986  sysdep.h
3794 @end example
3795
3796   These modules implement most of the low-level, messy operating-system
3797 interface code.  This includes various device control (ioctl) operations
3798 for file descriptors, TTY's, pseudo-terminals, etc. (usually this stuff
3799 is fairly system-dependent; thus the name of this module), and emulation
3800 of standard library functions and system calls on systems that don't
3801 provide them or have broken versions.
3802
3803
3804
3805 @example
3806    3605  sysdir.h
3807    6708  sysfile.h
3808    2027  sysfloat.h
3809    2918  sysproc.h
3810     745  syspwd.h
3811    7643  syssignal.h
3812    6892  systime.h
3813   12477  systty.h
3814    3487  syswait.h
3815 @end example
3816
3817 These header files provide consistent interfaces onto system-dependent
3818 header files and system calls.  The idea is that, instead of including a
3819 standard header file like @file{<sys/param.h>} (which may or may not
exist on various systems) or having to worry about whether all systems
provide a particular preprocessor constant, or having to deal with the
four different paradigms for manipulating signals, you just include the
appropriate @file{sys*.h} header file, which includes all the right
system header files, defines any missing preprocessor constants,
3825 provides a uniform interface onto system calls, etc.
3826
3827 @file{sysdir.h} provides a uniform interface onto directory-querying
3828 functions. (In some cases, this is in conjunction with emulation
3829 functions in @file{sysdep.c}.)
3830
3831 @file{sysfile.h} includes all the necessary header files for standard
3832 system calls (e.g. @code{read()}), ensures that all necessary
3833 @code{open()} and @code{stat()} preprocessor constants are defined, and
3834 possibly (usually) substitutes sugared versions of @code{read()},
3835 @code{write()}, etc. that automatically restart interrupted I/O
3836 operations.
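The restart-on-interrupt idea can be sketched as a wrapper that simply retries when @code{read()} fails with @code{EINTR}.  The name @code{restarting_read} is hypothetical, and the way @file{sysfile.h} actually substitutes its versions may differ.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* A read() that transparently retries when a signal interrupts it
   before any data has been transferred (errno == EINTR).  Any other
   result, including other errors, is returned unchanged. */
ssize_t restarting_read (int fd, void *buf, size_t nbyte)
{
  ssize_t n;
  assert (buf != NULL || nbyte == 0);
  do
    n = read (fd, buf, nbyte);
  while (n < 0 && errno == EINTR);
  return n;
}
```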
3837
3838 @file{sysfloat.h} includes the necessary header files for floating-point
3839 operations.
3840
3841 @file{sysproc.h} includes the necessary header files for calling
3842 @code{select()}, @code{fork()}, @code{execve()}, socket operations, and
3843 the like, and ensures that the @code{FD_*()} macros for descriptor-set
3844 manipulations are available.
3845
3846 @file{syspwd.h} includes the necessary header files for obtaining
3847 information from @file{/etc/passwd} (the functions are emulated under
3848 VMS).
3849
3850 @file{syssignal.h} includes the necessary header files for
3851 signal-handling and provides a uniform interface onto the different
3852 signal-handling and signal-blocking paradigms.
3853
3854 @file{systime.h} includes the necessary header files and provides
3855 uniform interfaces for retrieving the time of day, setting file
3856 access/modification times, getting the amount of time used by the XEmacs
3857 process, etc.
3858
3859 @file{systty.h} buffers against the infinitude of different ways of
3860 controlling TTY's.
3861
3862 @file{syswait.h} provides a uniform way of retrieving the exit status
3863 from a @code{wait()}ed-on process (some systems use a union, others use
3864 an int).
3865
3866
3867
3868 @example
3869    7940  hpplay.c
3870   10920  libsst.c
3871    1480  libsst.h
3872    3260  libst.h
3873   15355  linuxplay.c
3874   15849  nas.c
3875   19133  sgiplay.c
3876   15411  sound.c
3877    7358  sunplay.c
3878 @end example
3879
3880 These files implement the ability to play various sounds on some types
3881 of computers.  You have to configure your XEmacs with sound support in
3882 order to get this capability.
3883
3884 @file{sound.c} provides the generic interface.  It implements various
3885 Lisp primitives and variables that let you specify which sounds should
3886 be played in certain conditions. (The conditions are identified by
symbols, which are passed to @code{ding} to make a sound.  Various
standard functions call this function at certain times; if sound support
does not exist, a simple beep results.)
3890
3891 @cindex native sound
3892 @cindex sound, native
3893 @file{sgiplay.c}, @file{sunplay.c}, @file{hpplay.c}, and
@file{linuxplay.c} interface to the machine's speaker on various
different kinds of machines.  This is called @dfn{native} sound.
3896
3897 @cindex sound, network
3898 @cindex network sound
3899 @cindex NAS
3900 @file{nas.c} interfaces to a computer somewhere else on the network
3901 using the NAS (Network Audio Server) protocol, playing sounds on that
3902 machine.  This allows you to run XEmacs on a remote machine, with its
3903 display set to your local machine, and have the sounds be made on your
3904 local machine, provided that you have a NAS server running on your local
3905 machine.
3906
3907 @file{libsst.c}, @file{libsst.h}, and @file{libst.h} provide some
3908 additional functions for playing sound on a Sun SPARC but are not
3909 currently in use.
3910
3911
3912
3913 @example
3914   44368  tooltalk.c
3915    2137  tooltalk.h
3916 @end example
3917
3918 These two modules implement an interface to the ToolTalk protocol, which
3919 is an interprocess communication protocol implemented on some versions
3920 of Unix.  ToolTalk is a high-level protocol that allows processes to
3921 register themselves as providers of particular services; other processes
3922 can then request a service without knowing or caring exactly who is
3923 providing the service.  It is similar in spirit to the DDE protocol
3924 provided under Microsoft Windows.  ToolTalk is a part of the new CDE
3925 (Common Desktop Environment) specification and is used to connect the
3926 parts of the SPARCWorks development environment.
3927
3928
3929
3930 @example
3931   22695  getloadavg.c
3932 @end example
3933
3934 This module provides the ability to retrieve the system's current load
3935 average. (The way to do this is highly system-specific, unfortunately,
3936 and requires a lot of special-case code.)
3937
3938
3939
3940 @example
3941  148520  energize.c
3942    6896  energize.h
3943 @end example
3944
3945 This module provides code to interface to an Energize server (when
3946 XEmacs is used as part of Lucid's Energize development environment) and
3947 provides some other Energize-specific functions.  Much of the code in
3948 this module should be made more general-purpose and moved elsewhere, but
3949 is no longer very relevant now that Lucid is defunct.  It also hasn't
3950 worked since version 19.12, since nobody has been maintaining it.
3951
3952
3953
3954 @example
3955    2861  sunpro.c
3956 @end example
3957
3958 This module provides a small amount of code used internally at Sun to
3959 keep statistics on the usage of XEmacs.
3960
3961
3962
3963 @example
3964    5548  broken-sun.h
3965    3468  strcmp.c
3966    2179  strcpy.c
3967    1650  sunOS-fix.c
3968 @end example
3969
3970 These files provide replacement functions and prototypes to fix numerous
3971 bugs in early releases of SunOS 4.1.
3972
3973
3974
3975 @example
3976   11669  hftctl.c
3977 @end example
3978
3979 This module provides some terminal-control code necessary on versions of
3980 AIX prior to 4.1.
3981
3982
3983
3984 @example
3985    1776  acldef.h
3986    1602  chpdef.h
3987    9032  uaf.h
3988     105  vlimit.h
3989    7145  vms-pp.c
3990    1158  vms-pwd.h
3991   26532  vmsfns.c
3992    6038  vmsmap.c
3993     695  vmspaths.h
3994   17482  vmsproc.c
3995     469  vmsproc.h
3996 @end example
3997
3998 All of these files are used for VMS support, which has never worked in
3999 XEmacs.
4000
4001
4002
4003 @example
4004   28316  msdos.c
4005    1472  msdos.h
4006 @end example
4007
4008 These modules are used for MS-DOS support, which does not work in
4009 XEmacs.
4010
4011
4012
4013 @node Modules for Interfacing with X Windows
4014 @section Modules for Interfacing with X Windows
4015
4016 @example
4017    size  name
4018 -------  ---------------------
4019    3196  Emacs.ad.h
4020 @end example
4021
This file is generated from @file{Emacs.ad}, which contains XEmacs-supplied
4023 fallback resources (so that XEmacs has pretty defaults).
4024
4025
4026
4027 @example
4028   24242  EmacsFrame.c
4029    6979  EmacsFrame.h
4030    3351  EmacsFrameP.h
4031 @end example
4032
4033 These modules implement an Xt widget class that encapsulates a frame.
4034 This is for ease in integrating with Xt.  The EmacsFrame widget covers
4035 the entire X window except for the menubar; the scrollbars are
4036 positioned on top of the EmacsFrame widget.
4037
4038 @strong{Warning:} Abandon hope, all ye who enter here.  This code took
4039 an ungodly amount of time to get right, and is likely to fall apart
4040 mercilessly at the slightest change.  Such is life under Xt.
4041
4042
4043
4044 @example
4045    8178  EmacsManager.c
4046    1967  EmacsManager.h
4047    1895  EmacsManagerP.h
4048 @end example
4049
4050 These modules implement a simple Xt manager (i.e. composite) widget
4051 class that simply lets its children set whatever geometry they want.
4052 It's amazing that Xt doesn't provide this standardly, but on second
4053 thought, it makes sense, considering how amazingly broken Xt is.
4054
4055
4056 @example
4057   13188  EmacsShell-sub.c
4058    4588  EmacsShell.c
4059    2180  EmacsShell.h
4060    3133  EmacsShellP.h
4061 @end example
4062
4063 These modules implement two Xt widget classes that are subclasses of
4064 the TopLevelShell and TransientShell classes.  This is necessary to deal
4065 with more brokenness that Xt has sadistically thrust onto the backs of
4066 developers.
4067
4068
4069
4070 @example
4071    9673  xgccache.c
4072    1111  xgccache.h
4073 @end example
4074
4075 These modules provide functions for maintenance and caching of GC's
4076 (graphics contexts) under the X Window System.  This code is junky and
4077 needs to be rewritten.
4078
4079
4080
4081 @example
4082   69181  xselect.c
4083 @end example
4084
4085 @cindex selections
4086   This module provides an interface to the X Window System's concept of
4087 @dfn{selections}, the standard way for X applications to communicate
4088 with each other.
4089
4090
4091
4092 @example
4093     929  xintrinsic.h
4094    1038  xintrinsicp.h
4095    1579  xmmanagerp.h
4096    1585  xmprimitivep.h
4097 @end example
4098
4099 These header files are similar in spirit to the @file{sys*.h} files and buffer
4100 against different implementations of Xt and Motif.
4101
4102 @itemize @bullet
4103 @item
4104 @file{xintrinsic.h} should be included in place of @file{<Intrinsic.h>}.
4105 @item
4106 @file{xintrinsicp.h} should be included in place of @file{<IntrinsicP.h>}.
4107 @item
4108 @file{xmmanagerp.h} should be included in place of @file{<XmManagerP.h>}.
4109 @item
4110 @file{xmprimitivep.h} should be included in place of @file{<XmPrimitiveP.h>}.
4111 @end itemize
4112
4113
4114
4115 @example
4116   16930  xmu.c
4117     936  xmu.h
4118 @end example
4119
4120 These files provide an emulation of the Xmu library for those systems
4121 (i.e. HPUX) that don't provide it as a standard part of X.
4122
4123
4124
4125 @example
4126    4201  ExternalClient-Xlib.c
4127   18083  ExternalClient.c
4128    2035  ExternalClient.h
4129    2104  ExternalClientP.h
4130   22684  ExternalShell.c
4131    1709  ExternalShell.h
4132    1971  ExternalShellP.h
4133    2478  extw-Xlib.c
4134    1481  extw-Xlib.h
4135    6565  extw-Xt.c
4136    1430  extw-Xt.h
4137 @end example
4138
4139 @cindex external widget
4140   These files provide the @dfn{external widget} interface, which allows an
4141 XEmacs frame to appear as a widget in another application.  To do this,
4142 you have to configure with @samp{--external-widget}.
4143
4144 @file{ExternalShell*} provides the server (XEmacs) side of the
4145 connection.
4146
4147 @file{ExternalClient*} provides the client (other application) side of
4148 the connection.  These files are not compiled into XEmacs but are
4149 compiled into libraries that are then linked into your application.
4150
4151 @file{extw-*} is common code that is used for both the client and server.
4152
4153 Don't touch this code; something is liable to break if you do.
4154
4155
4156
4157 @example
4158   31014  epoch.c
4159 @end example
4160
4161 This file provides some additional, Epoch-compatible, functionality for
4162 interfacing to the X Window System.
4163
4164
4165
4166 @node Modules for Internationalization
4167 @section Modules for Internationalization
4168
4169 @example
4170    size  name
4171 -------  ---------------------
4172   42836  mule-canna.c
4173   16737  mule-ccl.c
4174   41080  mule-charset.c
4175   30176  mule-charset.h
4176  146844  mule-coding.c
4177   16588  mule-coding.h
4178    6996  mule-mcpath.c
4179    2899  mule-mcpath.h
4180   57158  mule-wnnfns.c
4181    3351  mule.c
4182 @end example
4183
4184 These files implement the MULE (Asian-language) support.  Note that MULE
4185 actually provides a general interface for all sorts of languages, not
4186 just Asian languages (although they are generally the most complicated
4187 to support).  This code is still in beta.
4188
4189 @file{mule-charset.*} and @file{mule-coding.*} provide the heart of the
4190 XEmacs MULE support.  @file{mule-charset.*} implements the @dfn{charset}
4191 Lisp object type, which encapsulates a character set (an ordered one- or
4192 two-dimensional set of characters, such as US ASCII or JISX0208 Japanese
4193 Kanji).
4194
4195 @file{mule-coding.*} implements the @dfn{coding-system} Lisp object
4196 type, which encapsulates a method of converting between different
4197 encodings.  An encoding is a representation of a stream of characters,
4198 possibly from multiple character sets, using a stream of bytes or words,
4199 and defines (e.g.) which escape sequences are used to specify particular
4200 character sets, how the indices for a character are converted into bytes
4201 (sometimes this involves setting the high bit; sometimes complicated
4202 rearranging of the values takes place, as in the Shift-JIS encoding),
4203 etc.
4204
4205 @file{mule-ccl.c} provides the CCL (Code Conversion Language)
4206 interpreter.  CCL is similar in spirit to Lisp byte code and is used to
4207 implement converters for custom encodings.
4208
4209 @file{mule-canna.c} and @file{mule-wnnfns.c} implement interfaces to
4210 external programs used to implement the Canna and WNN input methods,
4211 respectively.  This is currently in beta.
4212
4213 @file{mule-mcpath.c} provides some functions to allow for pathnames
4214 containing extended characters.  This code is fragmentary, obsolete, and
4215 completely non-working.  Instead, @var{pathname-coding-system} is used
4216 to specify conversions of names of files and directories.  The standard
4217 C I/O functions like @samp{open()} are wrapped so that conversion occurs
4218 automatically.
4219
4220 @file{mule.c} provides a few miscellaneous things that should probably
4221 be elsewhere.
4222
4223
4224
4225 @example
4226    9400  intl.c
4227 @end example
4228
4229 This provides some miscellaneous internationalization code for
4230 implementing message translation and interfacing to the Ximp input
4231 method.  None of this code is currently working.
4232
4233
4234
4235 @example
4236    1764  iso-wide.h
4237 @end example
4238
4239 This contains leftover code from an earlier implementation of
4240 Asian-language support, and is not currently used.
4241
4242
4243
4244
4245 @node Allocation of Objects in XEmacs Lisp, Events and the Event Loop, A Summary of the Various XEmacs Modules, Top
4246 @chapter Allocation of Objects in XEmacs Lisp
4247
4248 @menu
4249 * Introduction to Allocation::
4250 * Garbage Collection::
4251 * GCPROing::
4252 * Integers and Characters::
4253 * Allocation from Frob Blocks::
4254 * lrecords::
4255 * Low-level allocation::
4256 * Pure Space::
4257 * Cons::
4258 * Vector::
4259 * Bit Vector::
4260 * Symbol::
4261 * Marker::
4262 * String::
4263 * Bytecode::
4264 @end menu
4265
4266 @node Introduction to Allocation
4267 @section Introduction to Allocation
4268
4269   Emacs Lisp, like all Lisps, has garbage collection.  This means that
4270 the programmer never has to explicitly free (destroy) an object; it
4271 happens automatically when the object becomes inaccessible.  Most
4272 experts agree that garbage collection is a necessity in a modern,
4273 high-level language.  Its omission from C stems from the fact that C was
4274 originally designed to be a nice abstract layer on top of assembly
4275 language, for writing kernels and basic system utilities rather than
4276 large applications.
4277
4278   Lisp objects can be created by any of a number of Lisp primitives.
4279 Most object types have one or a small number of basic primitives
4280 for creating objects.  For conses, the basic primitive is @code{cons};
4281 for vectors, the primitives are @code{make-vector} and @code{vector}; for
4282 symbols, the primitives are @code{make-symbol} and @code{intern}; etc.
4283 Some Lisp objects, especially those that are primarily used internally,
4284 have no corresponding Lisp primitives.  Every Lisp object, though,
4285 has at least one C primitive for creating it.
4286
4287   Recall from section (VII) that a Lisp object, as stored in a 32-bit
4288 or 64-bit word, has a mark bit, a few tag bits, and a ``value'' that
4289 occupies the remainder of the bits.  We can separate the different
4290 Lisp object types into four broad categories:
4291
4292 @itemize @bullet
4293 @item
4294 (a) Those for whom the value directly represents the contents of the
4295 Lisp object.  Only two types are in this category: integers and
4296 characters.  No special allocation or garbage collection is necessary
4297 for such objects.  Lisp objects of these types do not need to be
4298 @code{GCPRO}ed.
4299 @end itemize
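To make category (a) concrete, the following sketch packs a 28-bit integer directly into a 32-bit word.  The layout (one mark bit at the top, three tag bits, 28 value bits) and the names @code{make_int_sketch}, @code{lisp_tag_of}, and @code{xint_sketch} are assumptions made for illustration; the actual layout depends on the configuration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit layout: [mark:1][tag:3][value:28]. */
typedef uint32_t lisp_word;

enum { VALBITS = 28, TAGBITS = 3 };
enum lisp_tag { TAG_INT = 0, TAG_CHAR = 1, TAG_CONS = 2 };

#define VALMASK  ((UINT32_C (1) << VALBITS) - 1)
#define MARKBIT  (UINT32_C (1) << (VALBITS + TAGBITS))

/* Store N directly in the value bits; no allocation is involved. */
lisp_word make_int_sketch (int32_t n)
{
  assert (n >= -(INT32_C (1) << (VALBITS - 1))
          && n < (INT32_C (1) << (VALBITS - 1)));
  return ((lisp_word) TAG_INT << VALBITS) | ((lisp_word) n & VALMASK);
}

/* The tag field, ignoring the mark bit above it. */
enum lisp_tag lisp_tag_of (lisp_word w)
{
  return (enum lisp_tag) ((w >> VALBITS) & ((1u << TAGBITS) - 1));
}

/* Recover the integer: masking drops the mark and tag bits, and the
   28-bit field is then sign-extended from bit 27. */
int32_t xint_sketch (lisp_word w)
{
  int32_t v = (int32_t) (w & VALMASK);
  if (v & (INT32_C (1) << (VALBITS - 1)))
    v -= INT32_C (1) << VALBITS;
  return v;
}
```

Because the whole object lives inside the word, there is nothing for the garbage collector to mark or free, which is why such objects never need to be @code{GCPRO}ed.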
4300
4301   In the remaining three categories, the value is a pointer to a
4302 structure.
4303
4304 @itemize @bullet
4305 @item
4306 @cindex frob block
4307 (b) Those for whom the tag directly specifies the type.  Recall that
4308 there are only three tag bits; this means that at most five types can be
4309 specified this way.  The most commonly-used types are stored in this
4310 format; this includes conses, strings, vectors, and sometimes symbols.
4311 With the exception of vectors, objects in this category are allocated in
4312 @dfn{frob blocks}, i.e. large blocks of memory that are subdivided into
4313 individual objects.  This saves a lot on malloc overhead, since there
4314 are typically quite a lot of these objects around, and the objects are
4315 small.  (A cons, for example, occupies 8 bytes on 32-bit machines -- 4
4316 bytes for each of the two objects it contains.) Vectors are individually
4317 @code{malloc()}ed since they are of variable size.  (It would be
4318 possible, and desirable, to allocate vectors of certain small sizes out
4319 of frob blocks, but it isn't currently done.) Strings are handled
4320 specially: Each string is allocated in two parts, a fixed size structure
4321 containing a length and a data pointer, and the actual data of the
4322 string.  The former structure is allocated in frob blocks as usual, and
4323 the latter data is stored in @dfn{string chars blocks} and is relocated
4324 during garbage collection to eliminate holes.
4325 @end itemize
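The frob-block idea for conses can be sketched as follows.  Cells are handed out sequentially from large @code{malloc()}ed blocks, and freed cells are threaded onto a free list through their car slot, so no per-object @code{malloc()} overhead is paid.  The structure and function names and the block size of 1000 are hypothetical, and the sketch never returns blocks to the system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A hypothetical cons cell: two pointer-sized slots. */
struct cons { void *car, *cdr; };

#define CONSES_PER_BLOCK 1000

struct cons_block {
  struct cons_block *prev;              /* chain of all blocks */
  struct cons cells[CONSES_PER_BLOCK];
};

struct cons_allocator {
  struct cons_block *current;   /* most recently allocated block */
  size_t next_unused;           /* first never-used cell in CURRENT */
  struct cons *free_list;       /* freed cells, linked through CAR */
};

struct cons *alloc_cons (struct cons_allocator *a)
{
  struct cons *c;
  if (a->free_list)             /* reuse a freed cell first */
    {
      c = a->free_list;
      a->free_list = (struct cons *) c->car;
      return c;
    }
  if (!a->current || a->next_unused == CONSES_PER_BLOCK)
    {
      struct cons_block *b = malloc (sizeof *b);
      if (!b)
        return NULL;
      b->prev = a->current;
      a->current = b;
      a->next_unused = 0;
    }
  return &a->current->cells[a->next_unused++];
}

void free_cons (struct cons_allocator *a, struct cons *c)
{
  assert (c != NULL);
  c->car = a->free_list;
  a->free_list = c;
}
```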
4326
4327   In the remaining two categories, the type is stored in the object
4328 itself.  The tag for all such objects is the generic @dfn{lrecord}
4329 (Lisp_Record) tag.  The first four bytes (or eight, for 64-bit machines)
4330 of the object's structure are a pointer to a structure that describes
4331 the object's type, which includes method pointers and a pointer to a
4332 string naming the type.  Note that it's possible to save some space by
4333 using a one- or two-byte tag, rather than a four- or eight-byte pointer
4334 to store the type, but it's not clear it's worth making the change.
4335
4336 @itemize @bullet
4337 @item
4338 (c) Those lrecords that are allocated in frob blocks (see above).  This
4339 includes the objects that are most common and relatively small, and
4340 includes floats, bytecodes, symbols (when not in category (b)), extents,
4341 events, and markers.  With the cleanup of frob blocks done in 19.12,
4342 it's not terribly hard to add more objects to this category, but it's a
4343 bit trickier than adding an object type to type (d) (esp. if the object
4344 needs a finalization method), and is not likely to save much space
4345 unless the object is small and there are many of them. (In fact, if
4346 there are very few of them, it might actually waste space.)
4347 @item
4348 (d) Those lrecords that are individually @code{malloc()}ed.  These are
4349 called @dfn{lcrecords}.  All other types are in this category.  Adding a
4350 new type to this category is comparatively easy, and all types added
4351 since 19.8 (when the current allocation scheme was devised, by Richard
4352 Mlynarik), with the exception of the character type, have been in this
4353 category.
4354 @end itemize
4355
4356   Note that bit vectors are a bit of a special case.  They are
4357 simple lrecords as in category (c), but are individually @code{malloc()}ed
4358 like vectors.  You can basically view them as exactly like vectors
4359 except that their type is stored in lrecord fashion rather than
4360 in directly-tagged fashion.
4361
4362   Note that FSF Emacs redesigned their object system in 19.29 to follow
4363 a similar scheme.  However, given RMS's expressed dislike for data
4364 abstraction, the FSF scheme is not nearly as clean or as easy to
4365 extend. (FSF calls items of type (c) @code{Lisp_Misc} and items of type
4366 (d) @code{Lisp_Vectorlike}, with separate tags for each, although
4367 @code{Lisp_Vectorlike} is also used for vectors.)
4368
4369 @node Garbage Collection
4370 @section Garbage Collection
4371 @cindex garbage collection
4372
4373 @cindex mark and sweep
4374   Garbage collection is simple in theory but tricky to implement.
4375 Emacs Lisp uses the oldest garbage collection method, called
4376 @dfn{mark and sweep}.  Garbage collection begins by starting with
4377 all accessible locations (i.e. all variables and other slots where
4378 Lisp objects might occur) and recursively traversing all objects
4379 accessible from those slots, marking each one that is found.
4380 We then go through all of memory and free each object that is
4381 not marked, and unmarking each object that is marked.  Note
4382 that ``all of memory'' means all currently allocated objects.
4383 Traversing all these objects means traversing all frob blocks,
4384 all vectors (which are chained in one big list), and all
4385 lcrecords (which are likewise chained).
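The two phases can be illustrated on a toy heap of cons-like objects referenced by index.  This is a sketch under simplifying assumptions: a fixed-size pool, a separate @code{marked} flag instead of a mark bit stored inside the object, and explicitly supplied roots.  All names are hypothetical.  Note that @code{mark} recurses on the car but loops on the cdr, so marking a long list does not need a deep C stack.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 16
#define NIL (-1)

/* A hypothetical heap of cons-like objects identified by index. */
struct obj {
  int in_use;
  int marked;
  int car, cdr;         /* indices of referenced objects, or NIL */
};

static struct obj pool[POOL_SIZE];

/* Mark phase: mark everything reachable from I. */
static void mark (int i)
{
  while (i != NIL && !pool[i].marked)
    {
      assert (i >= 0 && i < POOL_SIZE && pool[i].in_use);
      pool[i].marked = 1;
      mark (pool[i].car);       /* recurse on the car ... */
      i = pool[i].cdr;          /* ... but iterate on the cdr */
    }
}

/* Mark from ROOTS, then sweep the whole pool: free each unmarked
   object and unmark each marked one.  Returns the number freed. */
int collect_garbage (const int *roots, size_t nroots)
{
  size_t r;
  int i, freed = 0;
  for (r = 0; r < nroots; r++)
    mark (roots[r]);
  for (i = 0; i < POOL_SIZE; i++)
    {
      if (!pool[i].in_use)
        continue;
      if (pool[i].marked)
        pool[i].marked = 0;
      else
        {
          pool[i].in_use = 0;
          freed++;
        }
    }
  return freed;
}
```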
4386
4387   Note that, when an object is marked, the mark has to occur
4388 inside of the object's structure, rather than in the 32-bit
4389 @code{Lisp_Object} holding the object's pointer; i.e. you can't just
4390 set the pointer's mark bit.  This is because there may be many
4391 pointers to the same object.  This means that the method of
4392 marking an object can differ depending on the type.  The
4393 different marking methods are approximately as follows:
4394
4395 @enumerate
4396 @item
4397 For conses, the mark bit of the car is set.
4398 @item
4399 For strings, the mark bit of the string's plist is set.
4400 @item
4401 For symbols when not lrecords, the mark bit of the
4402 symbol's plist is set.
4403 @item
4404 For vectors, the length is negated after adding 1.
4405 @item
4406 For lrecords, the pointer to the structure describing
4407 the type is changed (see below).
4408 @item
4409 Integers and characters do not need to be marked, since
4410 no allocation occurs for them.
4411 @end enumerate
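The vector encoding and its inverse are easy to check in a few lines of C.  The structure and function names below are hypothetical; only the arithmetic follows the description above.

```c
#include <assert.h>

/* Hypothetical vector header whose SIZE field doubles as the mark. */
struct vector_header { long size; };

void mark_vector (struct vector_header *v)
{
  assert (v->size >= 0);            /* not already marked */
  v->size = -(v->size + 1);         /* 0 -> -1, 3 -> -4, ... */
}

int vector_marked_p (const struct vector_header *v)
{
  return v->size < 0;
}

void unmark_vector (struct vector_header *v)
{
  assert (v->size < 0);
  v->size = -v->size - 1;           /* exact inverse of mark_vector() */
}
```

Had the length simply been negated, @code{mark_vector} would leave an empty vector's size at 0, and the sweep phase could not tell that it had been marked.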
4412
4413   The details of this are in the @code{mark_object()} function.
4414
4415   Note that any code that operates during garbage collection has
4416 to be especially careful because of the fact that some objects
4417 may be marked and as such may not look like they normally do.
4418 In particular:
4419
4420 @itemize @bullet
4421 Some object pointers may have their mark bit set.  This will make
4422 @code{FOOBARP()} predicates fail.  Use @code{GC_FOOBARP()} to deal with
4423 this.
4424 @item
4425 Even if you clear the mark bit, @code{FOOBARP()} will still fail
4426 for lrecords because the implementation pointer has been
4427 changed (see below).  @code{GC_FOOBARP()} will correctly deal with
4428 this.
4429 @item
4430 Vectors have their size field munged, so anything that
4431 looks at this field will fail.
4432 @item
4433 Note that @code{XFOOBAR()} macros @emph{will} work correctly on object
4434 pointers with their mark bit set, because the logical shift operations
4435 that remove the tag also remove the mark bit.
4436 @end itemize
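The difference between @code{FOOBARP()} and @code{GC_FOOBARP()} can be shown with a hypothetical 32-bit layout: bit 31 is the mark bit, bits 28 to 30 hold the type tag, and bits 0 to 27 hold the value.  This is not necessarily XEmacs's actual layout.  The plain type extraction keeps the mark bit, so a marked object matches no type.  The GC-safe version masks the mark bit off.  This sketch extracts the value with a mask, which has the same effect as the shifts mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: [mark:1][tag:3][value:28]. */
#define VALBITS    28
#define GCTYPEBITS 3
#define MARKBIT    ((uint32_t) 1 << 31)
#define VALMASK    (((uint32_t) 1 << VALBITS) - 1)
#define TYPEMASK   (((uint32_t) 1 << GCTYPEBITS) - 1)

enum toy_type { Toy_Int = 0, Toy_Cons = 1, Toy_String = 2 };

typedef uint32_t toy_obj;

static toy_obj make_obj(enum toy_type t, uint32_t val)
{
  assert((uint32_t) t <= TYPEMASK);
  return ((uint32_t) t << VALBITS) | (val & VALMASK);
}

/* Ordinary type extraction: the mark bit is NOT stripped. */
static uint32_t xtype(toy_obj o)   { return o >> VALBITS; }

/* GC-safe type extraction: the mark bit is masked off. */
static uint32_t xgctype(toy_obj o) { return (o >> VALBITS) & TYPEMASK; }

static int consp(toy_obj o)    { return xtype(o) == Toy_Cons; }
static int gc_consp(toy_obj o) { return xgctype(o) == Toy_Cons; }

/* Value extraction drops both the tag and the mark bit. */
static uint32_t xvalue(toy_obj o) { return o & VALMASK; }
```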
4437
  Finally, note that garbage collection can be invoked explicitly
by calling @code{garbage-collect}, but it is also invoked automatically
by @code{eval} once a certain amount of memory has been allocated
since the last garbage collection (according to @code{gc-cons-threshold}).
4442
4443 @node GCPROing
4444 @section @code{GCPRO}ing
4445
4446 @code{GCPRO}ing is one of the ugliest and trickiest parts of Emacs
4447 internals.  The basic idea is that whenever garbage collection
4448 occurs, all in-use objects must be reachable somehow or
4449 other from one of the roots of accessibility.  The roots
4450 of accessibility are:
4451
4452 @enumerate
4453 @item
4454 All objects that have been @code{staticpro()}d.  This is used for
4455 any global C variables that hold Lisp objects.  A call to
4456 @code{staticpro()} happens implicitly as a result of any symbols
4457 declared with @code{defsymbol()} and any variables declared with
4458 @code{DEFVAR_FOO()}.  You need to explicitly call @code{staticpro()}
(in the @code{vars_of_foo()} function of a module) for other global
C variables holding Lisp objects. (These typically include
internal lists and similar structures.)
4462
4463 Note that @code{obarray} is one of the @code{staticpro()}d things.
4464 Therefore, all functions and variables get marked through this.
4465 @item
4466 Any shadowed bindings that are sitting on the @code{specpdl} stack.
4467 @item
4468 Any objects sitting in currently active (Lisp) stack frames,
4469 catches, and condition cases.
4470 @item
4471 A couple of special-case places where active objects are
4472 located.
4473 @item
4474 Anything currently marked with @code{GCPRO}.
4475 @end enumerate
4476
  Marking with @code{GCPRO} is necessary because some C functions (quite
a lot, in fact) allocate objects during their operation.  Quite
frequently, there will be no other pointer to the object while the
function is running.  If a garbage collection occurs at that point, the
object is freed, and any later reference to it will use freed memory.  The solution is
4482 to mark those objects with @code{GCPRO}.  Unfortunately this is easy to
4483 forget, and there is basically no way around this problem.  Here are
4484 some rules, though:
4485
4486 @enumerate
4487 @item
4488 For every @code{GCPRO@var{n}}, there have to be declarations of
4489 @code{struct gcpro gcpro1, gcpro2}, etc.
4490
4491 @item
4492 You @emph{must} @code{UNGCPRO} anything that's @code{GCPRO}ed, and you
4493 @emph{must not} @code{UNGCPRO} if you haven't @code{GCPRO}ed.  Getting
4494 either of these wrong will lead to crashes, often in completely random
4495 places unrelated to where the problem lies.
4496
4497 @item
The way this actually works is that all currently active @code{GCPRO}s
are chained through the @code{struct gcpro} local variables.  The
variable @samp{gcprolist} points to the head of the list, which is the
most recently linked @code{struct gcpro}.  Each @code{struct gcpro} points
to the one linked before it, so the first local @code{gcpro} variable in a
stack frame points to the most recently linked @code{gcpro} variable in
the next enclosing stack frame.  Each @code{GCPRO}ed thing is an
lvalue, and the @code{struct gcpro} local variable contains a pointer to
this lvalue.  This is why things will mess up badly if you don't pair up
the @code{GCPRO}s and @code{UNGCPRO}s: you will end up with a
@code{gcprolist} containing pointers to @code{struct gcpro}s or local
@code{Lisp_Object} variables in no-longer-active stack frames.
4508
4509 @item
4510 It is actually possible for a single @code{struct gcpro} to
4511 protect a contiguous array of any number of values, rather than
4512 just a single lvalue.  To effect this, call @code{GCPRO@var{n}} as usual on
the first object in the array and then set @code{gcpro@var{n}.nvars} to
the number of elements in the array.
4514
4515 @item
4516 @strong{Strings are relocated.}  What this means in practice is that the
4517 pointer obtained using @code{XSTRING_DATA()} is liable to change at any
4518 time, and you should never keep it around past any function call, or
4519 pass it as an argument to any function that might cause a garbage
4520 collection.  This is why a number of functions accept either a
4521 ``non-relocatable'' @code{char *} pointer or a relocatable Lisp string,
4522 and only access the Lisp string's data at the very last minute.  In some
4523 cases, you may end up having to @code{alloca()} some space and copy the
4524 string's data into it.
4525
4526 @item
By convention, if you have to nest @code{GCPRO}s, use @code{NGCPRO@var{n}}
4528 (along with @code{struct gcpro ngcpro1, ngcpro2}, etc.), @code{NNGCPRO@var{n}},
4529 etc.  This avoids compiler warnings about shadowed locals.
4530
4531 @item
4532 It is @emph{always} better to err on the side of extra @code{GCPRO}s
4533 rather than too few.  The extra cycles spent on this are
4534 almost never going to make a whit of difference in the
4535 speed of anything.
4536
4537 @item
4538 The general rule to follow is that caller, not callee, @code{GCPRO}s.
4539 That is, you should not have to explicitly @code{GCPRO} any Lisp objects
4540 that are passed in as parameters.
4541
One exception to this rule is if you ever plan to change the parameter
4543 value, and store a new object in it.  In that case, you @emph{must}
4544 @code{GCPRO} the parameter, because otherwise the new object will not be
4545 protected.
4546
4547 So, if you create any Lisp objects (remember, this happens in all sorts
4548 of circumstances, e.g. with @code{Fcons()}, etc.), you are responsible
4549 for @code{GCPRO}ing them, unless you are @emph{absolutely sure} that
there's no possibility that a garbage collection can occur while you
4551 need to use the object.  Even then, consider @code{GCPRO}ing.
4552
4553 @item
A garbage collection can occur whenever anything calls @code{Feval}, or
whenever @code{QUIT} is checked at a point where execution can continue
past it. (Remember, this is almost anywhere.)
4557
4558 @item
4559 If you have the @emph{least smidgeon of doubt} about whether
4560 you need to @code{GCPRO}, you should @code{GCPRO}.
4561
4562 @item
4563 Beware of @code{GCPRO}ing something that is uninitialized.  If you have
4564 any shade of doubt about this, initialize all your variables to @code{Qnil}.
4565
4566 @item
4567 Be careful of traps, like calling @code{Fcons()} in the argument to
4568 another function.  By the ``caller protects'' law, you should be
4569 @code{GCPRO}ing the newly-created cons, but you aren't.  A certain
4570 number of functions that are commonly called on freshly created stuff
(e.g. @code{nconc2()}, @code{Fsignal()}) break the ``caller protects''
law and go ahead and @code{GCPRO} their arguments so as to simplify
things, but make sure to check that this is the case whenever you do
something like this.
4575
4576 @item
4577 Once again, remember to @code{GCPRO}!  Bugs resulting from insufficient
4578 @code{GCPRO}ing are intermittent and extremely difficult to track down,
4579 often showing up in crashes inside of @code{garbage-collect} or in
4580 weirdly corrupted objects or even in incorrect values in a totally
4581 different section of code.
4582 @end enumerate
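The chaining rules above can be modeled in a few lines.  The following is a simplified re-creation under stated assumptions: @code{Lisp_Object} is a stand-in @code{long}, the field names follow the description above, and the macros are written as comma expressions.  The real macros in XEmacs may differ in detail, for example with extra debugging checks.  The sketch shows a nested pair of frames, the @code{nvars} array trick, and the invariant that every @code{GCPRO} is undone by its @code{UNGCPRO}.

```c
#include <assert.h>
#include <stddef.h>

typedef long Lisp_Object;   /* hypothetical stand-in */

struct gcpro {
  struct gcpro *next;   /* the protection linked before this one */
  Lisp_Object *var;     /* address of the first protected lvalue */
  int nvars;            /* number of contiguous lvalues protected */
};

static struct gcpro *gcprolist = NULL;

#define GCPRO1(v) \
  (gcpro1.next = gcprolist, gcpro1.var = &(v), gcpro1.nvars = 1, \
   gcprolist = &gcpro1)
#define GCPRO2(v1, v2) \
  (gcpro1.next = gcprolist, gcpro1.var = &(v1), gcpro1.nvars = 1, \
   gcpro2.next = &gcpro1, gcpro2.var = &(v2), gcpro2.nvars = 1, \
   gcprolist = &gcpro2)
#define UNGCPRO (gcprolist = gcpro1.next)

/* What the collector does with this root: visit every protected slot. */
static int count_protected(void)
{
  int n = 0;
  for (struct gcpro *p = gcprolist; p != NULL; p = p->next)
    n += p->nvars;
  return n;
}

static int inner(void)
{
  Lisp_Object args[3] = { 0, 0, 0 };   /* initialize before protecting */
  struct gcpro gcpro1;
  GCPRO1(args[0]);
  gcpro1.nvars = 3;                    /* protect the whole array */
  int seen = count_protected();        /* a GC here would see all of them */
  UNGCPRO;
  return seen;
}

static int outer(void)
{
  Lisp_Object a = 0, b = 0;
  struct gcpro gcpro1, gcpro2;
  GCPRO2(a, b);
  int seen = inner();                  /* inner's protection chains onto ours */
  UNGCPRO;
  assert(gcprolist == NULL);           /* every GCPRO was paired with UNGCPRO */
  return seen;
}
```

Forgetting @code{UNGCPRO} in @code{inner()} would leave @code{gcprolist} pointing at a dead stack frame, which is exactly the failure mode described above.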
4583
4584 @cindex garbage collection, conservative
4585 @cindex conservative garbage collection
  Given the extremely error-prone nature of the @code{GCPRO} scheme and
the difficulty of tracking down the bugs it causes, the scheme should be
considered a deficiency in the XEmacs code.  A solution to this problem would involve
4589 implementing so-called @dfn{conservative} garbage collection for the C
4590 stack.  That involves looking through all of stack memory and treating
4591 anything that looks like a reference to an object as a reference.  This
4592 will result in a few objects not getting collected when they should, but
4593 it obviates the need for @code{GCPRO}ing, and allows garbage collection
4594 to happen at any point at all, such as during object allocation.
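The core of conservative stack scanning is a test of whether an arbitrary machine word could be a reference to a heap object.  In the sketch below, a plain array stands in for the C stack, and the heap is a hypothetical array of fixed-size objects.  Finding the real stack bounds and register contents is platform-specific and is left out.  A word counts as a reference only if it points exactly at the start of an object.  An integer that happens to look like such a pointer only causes an object to be retained unnecessarily; it never causes a live object to be freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy heap of fixed-size objects. */
#define NOBJS 4
struct toy_obj { int marked; long payload; };
static struct toy_obj objects[NOBJS];

/* Return the object WORD points to, or NULL if it cannot be one. */
static struct toy_obj *looks_like_object(uintptr_t word)
{
  uintptr_t lo = (uintptr_t) &objects[0];
  uintptr_t hi = (uintptr_t) &objects[NOBJS];
  if (word < lo || word >= hi)
    return NULL;                        /* outside the heap */
  if ((word - lo) % sizeof (struct toy_obj) != 0)
    return NULL;                        /* points into the middle of an object */
  return (struct toy_obj *) word;
}

/* Mark every object that some word in STACK might refer to.
   Returns the number of newly marked objects. */
static int conservative_scan(const uintptr_t *stack, size_t nwords)
{
  assert(stack != NULL || nwords == 0);
  int found = 0;
  for (size_t i = 0; i < nwords; i++)
    {
      struct toy_obj *o = looks_like_object(stack[i]);
      if (o != NULL && !o->marked)
        {
          o->marked = 1;
          found++;
        }
    }
  return found;
}
```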
4595
4596 @node Integers and Characters
4597 @section Integers and Characters
4598
4599   Integer and character Lisp objects are created from integers using the
4600 macros @code{XSETINT()} and @code{XSETCHAR()} or the equivalent
4601 functions @code{make_int()} and @code{make_char()}. (These are actually
4602 macros on most systems.)  These functions basically just do some moving
4603 of bits around, since the integral value of the object is stored
4604 directly in the @code{Lisp_Object}.
4605
4606   @code{XSETINT()} and the like will truncate values given to them that
4607 are too big; i.e. you won't get the value you expected but the tag bits
4608 will at least be correct.
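The truncation behavior follows directly from how the value is packed.  The sketch below assumes a 32-bit word with 28 value bits and the tag in the top bits.  It is hypothetical and is not the exact XEmacs layout.  Packing keeps only the low 28 bits, so an out-of-range value loses its high bits while the tag stays intact.  Unpacking sign-extends from bit 27.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: top 4 bits for tag and mark, low 28 for the value. */
#define VALBITS 28
#define VALMASK (((uint32_t) 1 << VALBITS) - 1)
#define TAG_INT ((uint32_t) 1)

typedef uint32_t toy_obj;

/* Like make_int(): keep the low VALBITS bits of N and add the tag.
   Too-large values are silently truncated, but the tag is always right. */
static toy_obj toy_make_int(int32_t n)
{
  return (TAG_INT << VALBITS) | ((uint32_t) n & VALMASK);
}

static uint32_t toy_tag(toy_obj o) { return o >> VALBITS; }

/* Like XINT(): recover the value, sign-extending from bit VALBITS-1. */
static int32_t toy_xint(toy_obj o)
{
  assert(toy_tag(o) == TAG_INT);
  uint32_t v = o & VALMASK;
  if (v & ((uint32_t) 1 << (VALBITS - 1)))
    return (int32_t) v - (int32_t) ((uint32_t) 1 << VALBITS);
  return (int32_t) v;
}
```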
4609
4610 @node Allocation from Frob Blocks
4611 @section Allocation from Frob Blocks
4612
4613 The uninitialized memory required by a @code{Lisp_Object} of a particular type
4614 is allocated using
4615 @code{ALLOCATE_FIXED_TYPE()}.  This only occurs inside of the
4616 lowest-level object-creating functions in @file{alloc.c}:
4617 @code{Fcons()}, @code{make_float()}, @code{Fmake_byte_code()},
4618 @code{Fmake_symbol()}, @code{allocate_extent()},
4619 @code{allocate_event()}, @code{Fmake_marker()}, and
4620 @code{make_uninit_string()}.  The idea is that, for each type, there are
4621 a number of frob blocks (each 2K in size); each frob block is divided up
4622 into object-sized chunks.  Each frob block will have some of these
4623 chunks that are currently assigned to objects, and perhaps some that are
4624 free. (If a frob block has nothing but free chunks, it is freed at the
4625 end of the garbage collection cycle.)  The free chunks are stored in a
4626 free list, which is chained by storing a pointer in the first four bytes
4627 of the chunk. (Except for the free chunks at the end of the last frob
4628 block, which are handled using an index which points past the end of the
4629 last-allocated chunk in the last frob block.)
4630 @code{ALLOCATE_FIXED_TYPE()} first tries to retrieve a chunk from the
4631 free list; if that fails, it calls
4632 @code{ALLOCATE_FIXED_TYPE_FROM_BLOCK()}, which looks at the end of the
4633 last frob block for space, and creates a new frob block if there is
4634 none. (There are actually two versions of these macros, one of which is
4635 more defensive but less efficient and is used for error-checking.)
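The allocation order just described (free list first, then the unused tail of the newest block, then a fresh block) can be sketched as follows.  The type @code{struct toy_cons}, the union-based chunk, and the function names are hypothetical.  They are not the actual @code{ALLOCATE_FIXED_TYPE()} macros.  The union makes the ``pointer stored in the first bytes of a free chunk'' explicit and type-safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical object type; real conses hold two Lisp_Objects. */
struct toy_cons { long car, cdr; };

/* A chunk holds either a live object or, when free, a link to the
   next free chunk stored in its first bytes. */
union cons_chunk {
  union cons_chunk *next_free;
  struct toy_cons obj;
};

#define FROB_BLOCK_BYTES 2048   /* the text gives 2K */
#define CHUNKS_PER_BLOCK \
  ((FROB_BLOCK_BYTES - sizeof (void *)) / sizeof (union cons_chunk))

struct cons_block {
  struct cons_block *prev;      /* chain of all cons blocks */
  union cons_chunk chunks[CHUNKS_PER_BLOCK];
};

static struct cons_block *current_block = NULL;
/* Index just past the last-allocated chunk of the newest block. */
static size_t current_block_index = CHUNKS_PER_BLOCK;
static union cons_chunk *free_list = NULL;

static struct toy_cons *allocate_cons(void)
{
  /* 1. Reuse a chunk from the free list if possible. */
  if (free_list != NULL)
    {
      union cons_chunk *c = free_list;
      free_list = c->next_free;
      return &c->obj;
    }
  /* 2. Otherwise carve from the newest block, creating one if needed. */
  if (current_block_index == CHUNKS_PER_BLOCK)
    {
      struct cons_block *b = malloc(sizeof *b);
      if (b == NULL)
        abort();
      b->prev = current_block;
      current_block = b;
      current_block_index = 0;
    }
  return &current_block->chunks[current_block_index++].obj;
}

/* Return a chunk to the free list, as the sweep stage would. */
static void free_cons_chunk(struct toy_cons *obj)
{
  assert(obj != NULL);
  union cons_chunk *c = (union cons_chunk *) obj;
  c->next_free = free_list;
  free_list = c;
}
```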
4636
4637 @node lrecords
4638 @section lrecords
4639
4640   [see @file{lrecord.h}]
4641
4642   All lrecords have at the beginning of their structure a @code{struct
4643 lrecord_header}.  This just contains a pointer to a @code{struct
4644 lrecord_implementation}, which is a structure containing method pointers
4645 and such.  There is one of these for each type, and it is a global,
4646 constant, statically-declared structure that is declared in the
4647 @code{DEFINE_LRECORD_IMPLEMENTATION()} macro. (This macro actually
4648 declares an array of two @code{struct lrecord_implementation}
4649 structures.  The first one contains all the standard method pointers,
4650 and is used in all normal circumstances.  During garbage collection,
4651 however, the lrecord is @dfn{marked} by bumping its implementation
4652 pointer by one, so that it points to the second structure in the array.
4653 This structure contains a special indication in it that it's a
4654 @dfn{marked-object} structure: the finalize method is the special
4655 function @code{this_marks_a_marked_record()}, and all other methods are
4656 null pointers.  At the end of garbage collection, all lrecords will
4657 either be reclaimed or unmarked by decrementing their implementation
4658 pointers, so this second structure pointer will never remain past
garbage collection.)
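The ``bump the implementation pointer'' marking scheme works because the two implementation structures are adjacent elements of one array.  In the sketch below, the method table is trimmed to a name and a finalizer, and all names other than @code{this_marks_a_marked_record} are hypothetical.  Marking increments the pointer, unmarking decrements it, and a record counts as marked when its finalizer is the sentinel function.

```c
#include <assert.h>
#include <stddef.h>

struct lrecord_header;

/* Hypothetical, heavily trimmed method table. */
struct lrecord_implementation {
  const char *name;
  void (*finalizer)(struct lrecord_header *, int for_disksave);
};

struct lrecord_header {
  const struct lrecord_implementation *implementation;
};

/* Sentinel finalizer identifying the "marked" copy. */
static void this_marks_a_marked_record(struct lrecord_header *h, int d)
{
  (void) h;
  (void) d;
  assert(!"a marked record must never actually be finalized");
}

/* [0] holds the normal methods; [1] is the marked-object copy. */
static const struct lrecord_implementation lrecord_toy[2] = {
  { "toy", NULL },
  { "toy", this_marks_a_marked_record },
};

static int marked_record_p(const struct lrecord_header *h)
{
  return h->implementation->finalizer == this_marks_a_marked_record;
}

static void mark_record(struct lrecord_header *h)   { h->implementation++; }
static void unmark_record(struct lrecord_header *h) { h->implementation--; }
```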
4660
4661   Simple lrecords (of type (c) above) just have a @code{struct
4662 lrecord_header} at their beginning.  lcrecords, however, actually have a
4663 @code{struct lcrecord_header}.  This, in turn, has a @code{struct
4664 lrecord_header} at its beginning, so sanity is preserved; but it also
4665 has a pointer used to chain all lcrecords together, and a special ID
4666 field used to distinguish one lcrecord from another. (This field is used
4667 only for debugging and could be removed, but the space gain is not
4668 significant.)
4669
4670   Simple lrecords are created using @code{ALLOCATE_FIXED_TYPE()}, just
like other objects allocated from frob blocks.  The only change is that the implementation
4672 pointer must be initialized correctly. (The implementation structure for
4673 an lrecord, or rather the pointer to it, is named @code{lrecord_float},
4674 @code{lrecord_extent}, @code{lrecord_buffer}, etc.)
4675
4676   lcrecords are created using @code{alloc_lcrecord()}.  This takes a
4677 size to allocate and an implementation pointer. (The size needs to be
4678 passed because some lcrecords, such as window configurations, are of
4679 variable size.) This basically just @code{malloc()}s the storage,
4680 initializes the @code{struct lcrecord_header}, and chains the lcrecord
4681 onto the head of the list of all lcrecords, which is stored in the
4682 variable @code{all_lcrecords}.  The calls to @code{alloc_lcrecord()}
4683 generally occur in the lowest-level allocation function for each lrecord
4684 type.
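The three steps of @code{alloc_lcrecord()} (allocate, initialize the header, push onto @code{all_lcrecords}) look roughly like this.  The field names @code{lheader}, @code{next}, and @code{uid}, the counter @code{lrecord_uid_counter}, and the example type @code{struct toy_config} are assumptions for illustration.  Plain @code{malloc()} stands in for XEmacs's own allocator.  The caller passes the size explicitly, which is what allows variable-size lcrecords.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct lrecord_implementation { const char *name; };   /* trimmed */

struct lrecord_header {
  const struct lrecord_implementation *implementation;
};

/* Hypothetical lcrecord header, following the description above. */
struct lcrecord_header {
  struct lrecord_header lheader;   /* must come first */
  struct lcrecord_header *next;    /* chains all lcrecords together */
  unsigned int uid;                /* debugging ID */
};

static struct lcrecord_header *all_lcrecords = NULL;
static unsigned int lrecord_uid_counter = 0;

static void *alloc_lcrecord(size_t size,
                            const struct lrecord_implementation *impl)
{
  assert(size >= sizeof (struct lcrecord_header));
  struct lcrecord_header *h = malloc(size);
  if (h == NULL)
    abort();
  memset(h, 0, size);
  h->lheader.implementation = impl;
  h->uid = lrecord_uid_counter++;
  h->next = all_lcrecords;          /* push onto the head of the chain */
  all_lcrecords = h;
  return h;
}

/* Hypothetical variable-size lcrecord type. */
static const struct lrecord_implementation lrecord_toy_config = { "toy-config" };
struct toy_config { struct lcrecord_header header; int nsaved; };
```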
4685
4686 Whenever you create an lrecord, you need to call either
4687 @code{DEFINE_LRECORD_IMPLEMENTATION()} or
4688 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()}.  This needs to be
4689 specified in a C file, at the top level.  What this actually does is
4690 define and initialize the implementation structure for the lrecord. (And
4691 possibly declares a function @code{error_check_foo()} that implements
4692 the @code{XFOO()} macro when error-checking is enabled.)  The arguments
4693 to the macros are the actual type name (this is used to construct the C
4694 variable name of the lrecord implementation structure and related
4695 structures using the @samp{##} macro concatenation operator), a string
4696 that names the type on the Lisp level (this may not be the same as the C
4697 type name; typically, the C type name has underscores, while the Lisp
4698 string has dashes), various method pointers, and the name of the C
4699 structure that contains the object.  The methods are used to encapsulate
4700 type-specific information about the object, such as how to print it or
4701 mark it for garbage collection, so that it's easy to add new object
4702 types without having to add a specific case for each new type in a bunch
4703 of different places.
4704
4705   The difference between @code{DEFINE_LRECORD_IMPLEMENTATION()} and
4706 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()} is that the former is
4707 used for fixed-size object types and the latter is for variable-size
4708 object types.  Most object types are fixed-size; some complex
4709 types, however (e.g. window configurations), are variable-size.
4710 Variable-size object types have an extra method, which is called
4711 to determine the actual size of a particular object of that type.
4712 (Currently this is only used for keeping allocation statistics.)
4713
4714   For the purpose of keeping allocation statistics, the allocation
4715 engine keeps a list of all the different types that exist.  Note that,
4716 since @code{DEFINE_LRECORD_IMPLEMENTATION()} is a macro that is
4717 specified at top-level, there is no way for it to add to the list of all
4718 existing types.  What happens instead is that each implementation
4719 structure contains in it a dynamically assigned number that is
4720 particular to that type. (Or rather, it contains a pointer to another
4721 structure that contains this number.  This evasiveness is done so that
4722 the implementation structure can be declared const.) In the sweep stage
4723 of garbage collection, each lrecord is examined to see if its
4724 implementation structure has its dynamically-assigned number set.  If
4725 not, it must be a new type, and it is added to the list of known types
4726 and a new number assigned.  The number is used to index into an array
4727 holding the number of objects of each type and the total memory
4728 allocated for objects of that type.  The statistics in this array are
4729 also computed during the sweep stage.  These statistics are returned by
4730 the call to @code{garbage-collect} and are printed out at the end of the
4731 loadup phase.
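The lazy numbering can be sketched with a @code{const} implementation structure that points at a small mutable slot.  The names @code{struct type_slot}, @code{tally()}, and the statistics array are hypothetical.  The two implementation structures reuse the names from the text only as illustrative stand-ins.  Index 0 means ``not yet assigned'', so the first live object of each type seen during the sweep assigns that type's number.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TYPES 16

/* Mutable per-type slot; 0 means "no number assigned yet". */
struct type_slot { int index; };

/* The implementation stays const; only the slot it points to changes. */
struct lrecord_implementation {
  const char *name;
  struct type_slot *slot;
};

struct type_stats { size_t count; size_t bytes; };

static struct type_stats stats[MAX_TYPES];
static const struct lrecord_implementation *known_types[MAX_TYPES];
static int n_known_types = 0;

/* Called for each live lrecord during the sweep. */
static void tally(const struct lrecord_implementation *impl, size_t size)
{
  if (impl->slot->index == 0)          /* first object of a new type */
    {
      assert(n_known_types + 1 < MAX_TYPES);
      impl->slot->index = ++n_known_types;
      known_types[impl->slot->index] = impl;
    }
  stats[impl->slot->index].count++;
  stats[impl->slot->index].bytes += size;
}

/* Illustrative stand-ins for two implementation structures. */
static struct type_slot float_slot, extent_slot;
static const struct lrecord_implementation lrecord_float  = { "float",  &float_slot };
static const struct lrecord_implementation lrecord_extent = { "extent", &extent_slot };
```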
4732
4733   Note that for every type defined with a @code{DEFINE_LRECORD_*()}
4734 macro, there needs to be a @code{DECLARE_LRECORD_IMPLEMENTATION()}
4735 somewhere in a @file{.h} file, and this @file{.h} file needs to be
4736 included by @file{inline.c}.
4737
4738   Furthermore, there should generally be a set of @code{XFOOBAR()},
4739 @code{FOOBARP()}, etc. macros in a @file{.h} (or occasionally @file{.c})
4740 file.  To create one of these, copy an existing model and modify as
4741 necessary.
4742
4743   The various methods in the lrecord implementation structure are:
4744
4745 @enumerate
4746 @item
4747 @cindex mark method
4748 A @dfn{mark} method.  This is called during the marking stage and passed
4749 a function pointer (usually the @code{mark_object()} function), which is
4750 used to mark an object.  All Lisp objects that are contained within the
4751 object need to be marked by applying this function to them.  The mark
4752 method should also return a Lisp object, which should be either nil or
4753 an object to mark. (This can be used in lieu of calling
4754 @code{mark_object()} on the object, to reduce the recursion depth, and
4755 consequently should be the most heavily nested sub-object, such as a
4756 long list.)
4757
4758 @strong{Please note:} When the mark method is called, garbage collection
4759 is in progress, and special precautions need to be taken when accessing
4760 objects; see section (B) above.
4761
4762 If your mark method does not need to do anything, it can be
4763 @code{NULL}.
4764
4765 @item
4766 A @dfn{print} method.  This is called to create a printed representation
4767 of the object, whenever @code{princ}, @code{prin1}, or the like is
4768 called.  It is passed the object, a stream to which the output is to be
4769 directed, and an @code{escapeflag} which indicates whether the object's
4770 printed representation should be @dfn{escaped} so that it is
4771 readable. (This corresponds to the difference between @code{princ} and
4772 @code{prin1}.) Basically, @dfn{escaped} means that strings will have
4773 quotes around them and confusing characters in the strings such as
4774 quotes, backslashes, and newlines will be backslashed; and that special
4775 care will be taken to make symbols print in a readable fashion
4776 (e.g. symbols that look like numbers will be backslashed).  Other
4777 readable objects should perhaps pass @code{escapeflag} on when
4778 sub-objects are printed, so that readability is preserved when necessary
4779 (or if not, always pass in a 1 for @code{escapeflag}).  Non-readable
4780 objects should in general ignore @code{escapeflag}, except that some use
4781 it as an indication that more verbose output should be given.
4782
4783 Sub-objects are printed using @code{print_internal()}, which takes
4784 exactly the same arguments as are passed to the print method.
4785
4786 Literal C strings should be printed using @code{write_c_string()},
4787 or @code{write_string_1()} for non-null-terminated strings.
4788
4789 Functions that do not have a readable representation should check the
4790 @code{print_readably} flag and signal an error if it is set.
4791
4792 If you specify NULL for the print method, the
4793 @code{default_object_printer()} will be used.
4794
4795 @item
4796 A @dfn{finalize} method.  This is called at the beginning of the sweep
4797 stage on lcrecords that are about to be freed, and should be used to
4798 perform any extra object cleanup.  This typically involves freeing any
4799 extra @code{malloc()}ed memory associated with the object, releasing any
4800 operating-system and window-system resources associated with the object
4801 (e.g. pixmaps, fonts), etc.
4802
4803 The finalize method can be NULL if nothing needs to be done.
4804
4805 WARNING #1: The finalize method is also called at the end of the dump
4806 phase; this time with the for_disksave parameter set to non-zero.  The
4807 object is @emph{not} about to disappear, so you have to make sure to
4808 @emph{not} free any extra @code{malloc()}ed memory if you're going to
4809 need it later.  (Also, signal an error if there are any operating-system
4810 and window-system resources here, because they can't be dumped.)
4811
4812 Finalize methods should, as a rule, set to zero any pointers after
4813 they've been freed, and check to make sure pointers are not zero before
4814 freeing.  Although I'm pretty sure that finalize methods are not called
4815 twice on the same object (except for the @code{for_disksave} proviso),
4816 we've gotten nastily burned in some cases by not doing this.
4817
4818 WARNING #2: The finalize method is @emph{only} called for
4819 lcrecords, @emph{not} for simply lrecords.  If you need a
4820 finalize method for simple lrecords, you have to stick
4821 it in the @code{ADDITIONAL_FREE_foo()} macro in @file{alloc.c}.
4822
4823 WARNING #3: Things are in an @emph{extremely} bizarre state
4824 when @code{ADDITIONAL_FREE_foo()} is called, so you have to
4825 be incredibly careful when writing one of these functions.
4826 See the comment in @code{gc_sweep()}.  If you ever have to add
4827 one of these, consider using an lcrecord or dealing with
4828 the problem in a different fashion.
4829
4830 @item
4831 An @dfn{equal} method.  This compares the two objects for similarity,
4832 when @code{equal} is called.  It should compare the contents of the
4833 objects in some reasonable fashion.  It is passed the two objects and a
4834 @dfn{depth} value, which is used to catch circular objects.  To compare
4835 sub-Lisp-objects, call @code{internal_equal()} and bump the depth value
4836 by one.  If this value gets too high, a @code{circular-object} error
4837 will be signaled.
4838
4839 If this is NULL, objects are @code{equal} only when they are @code{eq},
4840 i.e. identical.
4841
4842 @item
4843 A @dfn{hash} method.  This is used to hash objects when they are to be
4844 compared with @code{equal}.  The rule here is that if two objects are
4845 @code{equal}, they @emph{must} hash to the same value; i.e. your hash
4846 function should use some subset of the sub-fields of the object that are
4847 compared in the ``equal'' method.  If you specify this method as
4848 @code{NULL}, the object's pointer will be used as the hash, which will
4849 @emph{fail} if the object has an @code{equal} method, so don't do this.
4850
4851 To hash a sub-Lisp-object, call @code{internal_hash()}.  Bump the
4852 depth by one, just like in the ``equal'' method.
4853
4854 To convert a Lisp object directly into a hash value (using
4855 its pointer), use @code{LISP_HASH()}.  This is what happens when
4856 the hash method is NULL.
4857
4858 To hash two or more values together into a single value, use
4859 @code{HASH2()}, @code{HASH3()}, @code{HASH4()}, etc.
4860
4861 @item
4862 @dfn{getprop}, @dfn{putprop}, @dfn{remprop}, and @dfn{plist} methods.
4863 These are used for object types that have properties.  I don't feel like
4864 documenting them here.  If you create one of these objects, you have to
4865 use different macros to define them,
4866 i.e. @code{DEFINE_LRECORD_IMPLEMENTATION_WITH_PROPS()} or
4867 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION_WITH_PROPS()}.
4868
4869 @item
4870 A @dfn{size_in_bytes} method, when the object is of variable-size.
4871 (i.e. declared with a @code{_SEQUENCE_IMPLEMENTATION} macro.)  This should
4872 simply return the object's size in bytes, exactly as you might expect.
4873 For an example, see the methods for window configurations and opaques.
4874 @end enumerate
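The contract between the @dfn{equal} and @dfn{hash} methods is easiest to see on a small example.  The hypothetical @code{struct node} below has a label, a sub-object, and a cache field that both methods deliberately ignore.  @code{HASH2()} is written here as a simple multiply-and-add combiner, which may not match XEmacs's definition.  Both methods bump the depth on each sub-object, so a circular structure is caught instead of recursing forever.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object: a label, a sub-object, and an ignored cache. */
struct node {
  long label;
  struct node *child;
  long cache;                 /* not compared, so it must not be hashed */
};

#define MAX_DEPTH 200
#define HASH2(a, b) ((a) * 31u + (b))   /* hypothetical combiner */

/* Returns 1 if equal, 0 if not, -1 where XEmacs would signal
   circular-object because the depth got too high. */
static int node_equal(const struct node *x, const struct node *y, int depth)
{
  assert(depth >= 0);
  if (depth > MAX_DEPTH)
    return -1;
  if (x == y)
    return 1;                 /* eq implies equal */
  if (x == NULL || y == NULL || x->label != y->label)
    return 0;
  return node_equal(x->child, y->child, depth + 1);
}

/* Uses only fields that node_equal() compares, so equal => same hash. */
static unsigned long node_hash(const struct node *x, int depth)
{
  if (x == NULL || depth > MAX_DEPTH)
    return 0;
  return HASH2((unsigned long) x->label, node_hash(x->child, depth + 1));
}
```

Hashing @code{cache} as well would break the rule, because two @code{equal} objects with different caches would then hash differently.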
4875
4876 @node Low-level allocation
4877 @section Low-level allocation
4878
4879   Memory that you want to allocate directly should be allocated using
4880 @code{xmalloc()} rather than @code{malloc()}.  This implements
4881 error-checking on the return value, and once upon a time did some more
4882 vital stuff (i.e. @code{BLOCK_INPUT}, which is no longer necessary).
4883 Free using @code{xfree()}, and realloc using @code{xrealloc()}.  Note
4884 that @code{xmalloc()} will do a non-local exit if the memory can't be
4885 allocated. (Many functions, however, do not expect this, and thus XEmacs
4886 will likely crash if this happens.  @strong{This is a bug.}  If you can,
4887 you should strive to make your function handle this OK.  However, it's
4888 difficult in the general circumstance, perhaps requiring extra
4889 unwind-protects and such.)
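What ``non-local exit'' means for a caller can be shown with @code{setjmp()} and @code{longjmp()} standing in for XEmacs's error machinery.  The single catch pointer @code{toy_catch} and the test hook @code{toy_fail_next_malloc} are hypothetical; the hook exists only so the failure path can be exercised deterministically.  A caller that wants to survive an allocation failure has to establish a catch point and restore any state it changed, which is the job an unwind-protect does in XEmacs.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>

/* Hypothetical test hook: make the next allocation fail. */
static int toy_fail_next_malloc = 0;

/* Hypothetical innermost "catch" that memory_full() jumps to. */
static jmp_buf *toy_catch = NULL;

static void memory_full(void)
{
  assert(toy_catch != NULL);   /* someone must be prepared to catch */
  longjmp(*toy_catch, 1);      /* non-local exit, like signaling an error */
}

/* Never returns NULL: it either succeeds or exits via memory_full(). */
static void *xmalloc(size_t size)
{
  void *p;
  if (toy_fail_next_malloc)
    {
      toy_fail_next_malloc = 0;
      p = NULL;
    }
  else
    p = malloc(size ? size : 1);
  if (p == NULL)
    memory_full();
  return p;
}

static void xfree(void *p) { free(p); }

/* A caller written to tolerate the non-local exit. */
static int try_allocate(size_t size)
{
  jmp_buf here;
  jmp_buf *saved = toy_catch;
  toy_catch = &here;
  if (setjmp(here) != 0)
    {
      toy_catch = saved;       /* restore state, as an unwind-protect would */
      return 0;
    }
  void *p = xmalloc(size);
  xfree(p);
  toy_catch = saved;
  return 1;
}
```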
4890
4891   Note that XEmacs provides two separate replacements for the standard
4892 @code{malloc()} library function.  These are called @dfn{old GNU malloc}
4893 (@file{malloc.c}) and @dfn{new GNU malloc} (@file{gmalloc.c}),
4894 respectively.  New GNU malloc is better in pretty much every way than
4895 old GNU malloc, and should be used if possible.  (It used to be that on
4896 some systems, the old one worked but the new one didn't.  I think this
4897 was due specifically to a bug in SunOS, which the new one now works
4898 around; so I don't think the old one ever has to be used any more.) The
4899 primary difference between both of these mallocs and the standard system
4900 malloc is that they are much faster, at the expense of increased space.
4901 The basic idea is that memory is allocated in fixed chunks of powers of
4902 two.  This allows for basically constant malloc time, since the various
4903 chunks can just be kept on a number of free lists. (The standard system
4904 malloc typically allocates arbitrary-sized chunks and has to spend some
4905 time, sometimes a significant amount of time, walking the heap looking
4906 for a free block to use and cleaning things up.)  The new GNU malloc
4907 improves on things by allocating large objects in chunks of 4096 bytes
4908 rather than in ever larger powers of two, which results in ever larger
4909 wastage.  There is a slight speed loss here, but it's of doubtful
4910 significance.
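The space trade-off between the two rounding schemes can be checked with a little arithmetic.  In the sketch below, the minimum class of 8 bytes and the half-page cutoff at which page rounding takes over are assumptions.  The real logic in @file{gmalloc.c} is more involved.  For a 9000-byte request, power-of-two rounding gives 16384 bytes, while rounding to a multiple of 4096 gives 12288.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE 4096

/* Old-style: round every request up to a power of two (minimum 8). */
static size_t pow2_class(size_t n)
{
  size_t c = 8;
  while (c < n)
    c <<= 1;
  return c;
}

/* New-style, as described above: small requests keep power-of-two
   classes; large ones round up to a multiple of 4096 bytes. */
static size_t page_class(size_t n)
{
  assert(n > 0);
  if (n <= PAGE / 2)
    return pow2_class(n);
  return (n + PAGE - 1) / PAGE * PAGE;
}

/* Which free list serves a given power-of-two class (8 -> 0, 16 -> 1, ...). */
static int free_list_index(size_t cls)
{
  int i = 0;
  while (((size_t) 8 << i) < cls)
    i++;
  return i;
}
```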
4911
4912   NOTE: Apparently there is a third-generation GNU malloc that is
4913 significantly better than the new GNU malloc, and should probably
4914 be included in XEmacs.
4915
4916   There is also the relocating allocator, @file{ralloc.c}.  This actually
4917 moves blocks of memory around so that the @code{sbrk()} pointer shrunk
4918 and virtual memory released back to the system.  On some systems,
4919 this is a big win.  On all systems, it causes a noticeable (and
4920 sometimes huge) speed penalty, so I turn it off by default.
4921 @file{ralloc.c} only works with the new GNU malloc in @file{gmalloc.c}.
4922 There are also two versions of @file{ralloc.c}, one that uses @code{mmap()}
4923 rather than block copies to move data around.  This purports to
4924 be faster, although that depends on the amount of data that would
4925 have had to be block copied and the system-call overhead for
4926 @code{mmap()}.  I don't know exactly how this works, except that the
4927 relocating-allocation routines are pretty much used only for
4928 the memory allocated for a buffer, which is the biggest consumer
4929 of space, esp. of space that may get freed later.
4930
4931   Note that the GNU mallocs have some ``memory warning'' facilities.
4932 XEmacs taps into them and issues a warning through the standard
4933 warning system, when memory gets to 75%, 85%, and 95% full.
4934 (On some systems, the memory warnings are not functional.)
4935
4936   Allocated memory that is going to be used to make a Lisp object
4937 is created using @code{allocate_lisp_storage()}.  This calls @code{xmalloc()}
4938 but also verifies that the pointer to the memory can fit into
4939 a Lisp word (remember that some bits are taken away for a type
4940 tag and a mark bit).  If not, an error is issued through @code{memory_full()}.
4941 @code{allocate_lisp_storage()} is called by @code{alloc_lcrecord()},
4942 @code{ALLOCATE_FIXED_TYPE()}, and the vector and bit-vector creation
4943 routines.  These routines also call @code{INCREMENT_CONS_COUNTER()} at the
4944 appropriate times; this keeps statistics on how much memory is
4945 allocated, so that garbage-collection can be invoked when the
4946 threshold is reached.
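The two checks performed around Lisp storage can be isolated from the allocation itself.  The first is whether an address fits in the value field of a Lisp word.  The second is whether enough memory has been consed to trigger a collection.  In this sketch, the 28-bit value field, the threshold of 1000 bytes, and the function names are hypothetical.  Synthetic addresses are used because real @code{malloc()} results on a 64-bit host would rarely fit in 28 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: 28 value bits remain after the tag and mark bit. */
#define VALBITS 28
#define VALMASK ((((uintptr_t) 1) << VALBITS) - 1)

/* Would this address survive being stored in a Lisp word? */
static int pointer_fits_p(uintptr_t addr)
{
  return (addr & ~VALMASK) == 0;
}

/* Counter in the spirit of INCREMENT_CONS_COUNTER(); the threshold is
   made up (the real one comes from gc-cons-threshold). */
static long consing_since_gc = 0;
static long gc_cons_threshold = 1000;

/* Bookkeeping after obtaining SIZE bytes at ADDR.  Returns 0 where
   XEmacs would instead report the problem through memory_full(). */
static int note_lisp_allocation(uintptr_t addr, size_t size)
{
  assert(size > 0);
  if (!pointer_fits_p(addr))
    return 0;
  consing_since_gc += (long) size;
  return 1;
}

static int gc_needed_p(void)
{
  return consing_since_gc > gc_cons_threshold;
}
```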
4947
4948 @node Pure Space
4949 @section Pure Space
4950
4951   Not yet documented.
4952
4953 @node Cons
4954 @section Cons
4955
4956   Conses are allocated in standard frob blocks.  The only thing to
4957 note is that conses can be explicitly freed using @code{free_cons()}
4958 and associated functions @code{free_list()} and @code{free_alist()}.  This
4959 immediately puts the conses onto the cons free list, and decrements
4960 the statistics on memory allocation appropriately.  This is used
4961 to good effect by some extremely commonly-used code, to avoid
4962 generating extra objects and thereby triggering GC sooner.
4963 However, you have to be @emph{extremely} careful when doing this.
4964 If you mess this up, you will get BADLY BURNED, and it has happened
4965 before.
4966
4967 @node Vector
4968 @section Vector
4969
4970   As mentioned above, each vector is @code{malloc()}ed individually, and
4971 all are threaded through the variable @code{all_vectors}.  Vectors are
4972 marked strangely during garbage collection, by kludging the size field.
4973 Note that the @code{struct Lisp_Vector} is declared with its
4974 @code{contents} field being a @emph{stretchy} array of one element.  It
4975 is actually @code{malloc()}ed with the right size, however, and access
4976 to any element through the @code{contents} array works fine.
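The ``stretchy array'' technique amounts to allocating the header plus exactly as many elements as needed.  The sketch below uses a C99 flexible array member (@code{contents[]}), the standard-conforming form of the @code{contents[1]} idiom described above.  The type @code{struct toy_vector}, its @code{next} field, and @code{make_toy_vector()} are hypothetical.  Each new vector is threaded onto a global list, as @code{all_vectors} is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef long Lisp_Object;   /* hypothetical stand-in */

struct toy_vector {
  long size;
  struct toy_vector *next;  /* threads all vectors together */
  Lisp_Object contents[];   /* really SIZE elements */
};

static struct toy_vector *all_vectors = NULL;

static struct toy_vector *make_toy_vector(long size, Lisp_Object init)
{
  assert(size >= 0);
  size_t bytes = offsetof(struct toy_vector, contents)
                 + (size_t) size * sizeof (Lisp_Object);
  struct toy_vector *v = malloc(bytes);
  if (v == NULL)
    abort();
  v->size = size;
  for (long i = 0; i < size; i++)
    v->contents[i] = init;
  v->next = all_vectors;    /* push onto the list of all vectors */
  all_vectors = v;
  return v;
}
```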
4977
4978 @node Bit Vector
4979 @section Bit Vector
4980
4981   Bit vectors work exactly like vectors, except for more complicated
4982 code to access an individual bit, and except for the fact that bit
4983 vectors are lrecords while vectors are not. (The only difference here is
4984 that there's an lrecord implementation pointer at the beginning and the
4985 tag field in bit vector Lisp words is ``lrecord'' rather than
4986 ``vector''.)
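The ``more complicated code to access an individual bit'' is the usual word-index and bit-offset arithmetic.  The fixed four-word capacity and the names @code{toy_bit_ref()} and @code{toy_bit_set()} in this sketch are hypothetical and chosen for brevity.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof (unsigned long) * CHAR_BIT)

/* Hypothetical layout: a length in bits plus packed words. */
struct toy_bit_vector {
  size_t size;              /* number of bits */
  unsigned long bits[4];    /* fixed capacity, for the sketch only */
};

static int toy_bit_ref(const struct toy_bit_vector *v, size_t n)
{
  assert(n < v->size && v->size <= 4 * BITS_PER_WORD);
  return (int) ((v->bits[n / BITS_PER_WORD] >> (n % BITS_PER_WORD)) & 1UL);
}

static void toy_bit_set(struct toy_bit_vector *v, size_t n, int value)
{
  assert(n < v->size && v->size <= 4 * BITS_PER_WORD);
  unsigned long mask = 1UL << (n % BITS_PER_WORD);
  if (value)
    v->bits[n / BITS_PER_WORD] |= mask;
  else
    v->bits[n / BITS_PER_WORD] &= ~mask;
}
```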
4987
4988 @node Symbol
4989 @section Symbol
4990
4991   Symbols are also allocated in frob blocks.  Note that the code
4992 exists for symbols to be either lrecords (category (c) above)
4993 or simple types (category (b) above), and are lrecords by
4994 default (I think), although there is no good reason for this.
4995
4996   Note that symbols in the awful horrible obarray structure are
4997 chained through their @code{next} field.
4998
4999 Remember that @code{intern} looks up a symbol in an obarray, creating
5000 one if necessary.
5001
5002 @node Marker
5003 @section Marker
5004
  Markers are allocated in frob blocks, as usual.  Each buffer keeps
its markers in an unordered, doubly-linked list so that they
can easily be removed. (Formerly this was a singly-linked list,
5008 but in some cases garbage collection took an extraordinarily
5009 long time due to the O(N^2) time required to remove lots of
5010 markers from a buffer.) Markers are removed from a buffer in
5011 the finalize stage, in @code{ADDITIONAL_FREE_marker()}.
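The reason the doubly-linked list helps is that a marker can be unlinked in constant time, without walking the list to find its predecessor.  The structures below are hypothetical stand-ins for the real marker and buffer types.

```c
#include <assert.h>
#include <stddef.h>

struct toy_marker {
  struct toy_marker *prev, *next;
  long position;
};

struct toy_buffer { struct toy_marker *markers; };

static void attach_marker(struct toy_buffer *buf, struct toy_marker *m)
{
  m->prev = NULL;
  m->next = buf->markers;
  if (buf->markers != NULL)
    buf->markers->prev = m;
  buf->markers = m;
}

/* O(1) removal: the predecessor is reachable directly through PREV. */
static void detach_marker(struct toy_buffer *buf, struct toy_marker *m)
{
  if (m->prev != NULL)
    m->prev->next = m->next;
  else
    {
      assert(buf->markers == m);
      buf->markers = m->next;
    }
  if (m->next != NULL)
    m->next->prev = m->prev;
  m->prev = m->next = NULL;
}
```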

@node String
@section String

  As mentioned above, strings are a special case.  A string logically
consists of two parts: a fixed-size object (containing the length,
property list, and a pointer to the actual data), and the actual data
in the string.  The fixed-size object is a @code{struct Lisp_String}
and is allocated in frob blocks, as usual.  The actual data is stored
in special @dfn{string-chars blocks}, which are 8K blocks of memory.
Currently-allocated strings are simply laid end to end in these
string-chars blocks, with a pointer back to the @code{struct Lisp_String}
stored before each string in the string-chars block.  When a new string
needs to be allocated, the remaining space at the end of the last
string-chars block is used if there's enough, and a new string-chars
block is created otherwise.
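
  The allocation policy just described is essentially a bump allocator.
The sketch below assumes 8K blocks and pads each chunk to a multiple of
the back pointer's size; every name in it is hypothetical, and a request
too large for any block (a big string, described below) is simply
refused.

@example
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_BLOCK_SIZE 8192   /* an 8K string-chars block */

/* Hypothetical fixed-size header, standing in for struct Lisp_String. */
struct toy_string
@{
  size_t size;
  unsigned char *data;
@};

/* Round N up to a multiple of the back pointer's size so that the
   next chunk's back pointer stays aligned (an assumption). */
static size_t
toy_align (size_t n)
@{
  size_t a = sizeof (struct toy_string *);
  return (n + a - 1) / a * a;
@}

struct toy_block
@{
  struct toy_block *next;        /* previously filled blocks */
  size_t pos;                    /* first free byte in DATA */
  union
  @{
    struct toy_string *align;    /* forces pointer alignment */
    unsigned char bytes[TOY_BLOCK_SIZE];
  @} data;
@};

static struct toy_block *toy_current_block;

/* Return room for LEN bytes of string data owned by OWNER, just after
   a back pointer to OWNER.  The tail of the current block is used if
   the chunk fits; otherwise a new block is started.  Returns NULL for
   a would-be big string or if malloc() fails. */
static unsigned char *
toy_allocate_string_chars (struct toy_string *owner, size_t len)
@{
  size_t need = toy_align (sizeof (struct toy_string *) + len);
  unsigned char *chunk;

  if (need > TOY_BLOCK_SIZE)
    return NULL;
  if (!toy_current_block
      || TOY_BLOCK_SIZE - toy_current_block->pos < need)
    @{
      struct toy_block *b = malloc (sizeof *b);
      if (!b)
        return NULL;
      b->next = toy_current_block;
      b->pos = 0;
      toy_current_block = b;
    @}
  chunk = toy_current_block->data.bytes + toy_current_block->pos;
  memcpy (chunk, &owner, sizeof owner);   /* the back pointer */
  toy_current_block->pos += need;
  return chunk + sizeof owner;
@}
@end example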

  After garbage collection there are no holes in the string-chars
blocks, thanks to the string compaction and relocation that happens at
the end of garbage collection.  During the sweep stage of garbage
collection, when objects are reclaimed, the garbage collector goes
through all string-chars blocks, looking for unused strings.  Each chunk
of string data is preceded by a pointer to the corresponding
@code{struct Lisp_String}, which indicates both whether the string is
used and how big the string is, i.e. how to get to the next chunk of
string data.  Holes are closed up by block-copying the next string into
the empty space and relocating the pointer stored in the corresponding
@code{struct Lisp_String}.
@strong{This means you have to be careful with strings in your code.}
See the section above on @code{GCPRO}ing.
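
  Compaction of one block can be sketched as follows.  The
@code{marked} field is a hypothetical stand-in for however the collector
actually records that a string is in use, the chunk layout (a back
pointer followed by data padded to the pointer size) is an assumption,
and the 0xFFFFFFFF dead-data marker described later in this section is
not handled.

@example
#include <stddef.h>
#include <string.h>

/* Hypothetical string header; MARKED stands in for however the
   collector records that the string is still in use. */
struct toy_string
@{
  size_t size;           /* bytes of string data */
  unsigned char *data;   /* just past the chunk's back pointer */
  int marked;
@};

#define TOY_PTR_SIZE (sizeof (struct toy_string *))

/* Bytes occupied by a chunk: back pointer plus data, padded. */
static size_t
toy_chunk_size (size_t len)
@{
  size_t n = TOY_PTR_SIZE + len;
  return (n + TOY_PTR_SIZE - 1) / TOY_PTR_SIZE * TOY_PTR_SIZE;
@}

/* Squeeze out the holes left by dead strings in BLOCK[0, USED) and
   return the new fill level.  Live chunks slide down, and their
   owners' data pointers are relocated. */
static size_t
toy_compact_string_chars (unsigned char *block, size_t used)
@{
  size_t from = 0, to = 0;

  while (from < used)
    @{
      struct toy_string *s;
      size_t chunk;

      /* The back pointer gives both liveness and length. */
      memcpy (&s, block + from, TOY_PTR_SIZE);
      chunk = toy_chunk_size (s->size);
      if (s->marked)
        @{
          if (to != from)
            memmove (block + to, block + from, chunk);
          s->data = block + to + TOY_PTR_SIZE;
          to += chunk;
        @}
      from += chunk;
    @}
  return to;
@}
@end example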

  Note that there is one situation the string-chars scheme does not
handle: a string that is too big to fit into a string-chars block.  Such
strings, called @dfn{big strings}, are each @code{malloc()}ed as their
own block.  (#### It would make more sense for the threshold for big
strings to be somewhat lower, e.g. 1/2 or 1/4 the size of a string-chars
block.  It seems that this was indeed the case formerly (the threshold
was set at 1/8), but Mly forgot about this when rewriting things for
19.8.)

Note also that the string data in string-chars blocks is padded as
necessary so that proper alignment constraints on the @code{struct
Lisp_String} back pointers are maintained.

  Finally, strings can be resized.  This happens in Mule when a
character is substituted with a different-length character, or during
modeline frobbing.  (This could also be exported to Lisp, but currently
it is not.)  Resizing a string is a potentially tricky process.
If the change is small enough that the padding can absorb it, nothing
other than a simple memory move needs to be done.  Keep in mind,
however, that the string can't shrink too much because the offset to the
next string in the string-chars block is computed by looking at the
length and rounding it up to a multiple of four or eight.  If the
string would shrink or expand beyond the correct padding, new string
data needs to be allocated at the end of the last string-chars block and
the data moved appropriately.  This leaves some dead string data, which
is marked by putting a special marker of 0xFFFFFFFF in the @code{struct
Lisp_String} pointer before the data (there's no real @code{struct
Lisp_String} to point to and relocate), and storing the size of the dead
string data (which would normally be obtained from the now-non-existent
@code{struct Lisp_String}) at the beginning of the dead string data gap.
The string compactor recognizes this special 0xFFFFFFFF marker and
handles it correctly.
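
  The in-place test amounts to comparing padded sizes.  This sketch
assumes an alignment of eight and, for simplicity, pads only the data
length rather than the whole chunk including its back pointer; the names
are hypothetical.

@example
#include <stddef.h>

/* Assumed padding granularity; the text says four or eight. */
#define TOY_ALIGNMENT 8

/* LEN rounded up to the next multiple of TOY_ALIGNMENT. */
static size_t
toy_padded_size (size_t len)
@{
  return (len + TOY_ALIGNMENT - 1) / TOY_ALIGNMENT * TOY_ALIGNMENT;
@}

/* An in-place resize (a simple memory move) is possible only if the
   padded size, and therefore the offset to the next chunk, does not
   change.  Otherwise new data must be allocated and the old space
   marked dead. */
static int
toy_can_resize_in_place (size_t old_len, size_t new_len)
@{
  return toy_padded_size (old_len) == toy_padded_size (new_len);
@}
@end example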

@node Bytecode
@section Bytecode

  Not yet documented.

@node Events and the Event Loop, Evaluation; Stack Frames; Bindings, Allocation of Objects in XEmacs Lisp, Top
@chapter Events and the Event Loop

@menu
* Introduction to Events::
* Main Loop::
* Specifics of the Event Gathering Mechanism::
* Specifics About the Emacs Event::
* The Event Stream Callback Routines::
* Other Event Loop Functions::
* Converting Events::
* Dispatching Events; The Command Builder::
@end menu

@node Introduction to Events
@section Introduction to Events

  An event is an object that encapsulates information about an
interesting occurrence in the operating system.  Events are
generated either by user action, direct (e.g. typing on the
keyboard or moving the mouse) or indirect (moving another
window, thereby generating an expose event on an Emacs frame),
or as a result of some other, typically asynchronous, action,
such as output from a subprocess becoming ready or a timer expiring.
Events come into the system in an asynchronous fashion (typically
through a callback being called) and are converted into a
synchronous event queue (first-in, first-out) in a process that
we will call @dfn{collection}.
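
  As a concrete picture of the synchronous end of collection, here is a
minimal first-in, first-out queue that an asynchronous callback could
append to and the event loop could drain.  The types and function names
are hypothetical; they are not the structures from @file{events.h}.

@example
#include <stddef.h>

/* Hypothetical toy event. */
struct toy_event
@{
  int type;
  struct toy_event *next;
@};

/* First-in, first-out: append at the tail, remove from the head. */
struct toy_event_queue
@{
  struct toy_event *head, *tail;
@};

static void
toy_enqueue_event (struct toy_event_queue *q, struct toy_event *e)
@{
  e->next = NULL;
  if (q->tail)
    q->tail->next = e;
  else
    q->head = e;
  q->tail = e;
@}

/* Return the oldest queued event, or NULL if the queue is empty. */
static struct toy_event *
toy_dequeue_event (struct toy_event_queue *q)
@{
  struct toy_event *e = q->head;

  if (e)
    @{
      q->head = e->next;
      if (!q->head)
        q->tail = NULL;
      e->next = NULL;
    @}
  return e;
@}
@end example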

  Note that each application has its own event queue.  (It is
immaterial whether the collection process directly puts the
events in the proper application's queue, or puts them into
a single system queue, which is later split up.)

  The most basic level of event collection is done by the
operating system or window system.  Typically, XEmacs does
its own event collection as well.  Often there are multiple
layers of collection in XEmacs, with events from various
sources being collected into a queue, which is then combined
with other sources to go into another queue (i.e. a second
level of collection), with perhaps another level on top of
this, etc.

  XEmacs has its own types of events (called @dfn{Emacs events}),
which provide an abstract layer on top of the system-dependent
nature of the most basic events that are received.  Part of the
complex nature of the XEmacs event collection process involves
converting from the operating-system events into the proper
Emacs events -- there may not be a one-to-one correspondence.

  Emacs events are documented in @file{events.h}; I'll discuss them
later.

@node Main Loop
@section Main Loop

  The @dfn{command loop} is the top-level loop that the editor is always
running.  It loops endlessly, calling @code{next-event} to retrieve an
event and @code{dispatch-event} to execute it.  @code{dispatch-event} does
the appropriate thing with non-user events (process, timeout,
magic, eval, mouse motion); this involves calling a Lisp handler
function, redrawing a newly-exposed part of a frame, reading
subprocess output, etc.  For user events, @code{dispatch-event}
looks up the event in relevant keymaps or menubars; when a
full key sequence or menubar selection is reached, the appropriate
function is executed.  @code{dispatch-event} may have to keep state
across calls; this is done in the ``command-builder'' structure
associated with each console (remember, there's usually only
one console), and the engine that looks up keystrokes and
constructs full key sequences is called the @dfn{command builder}.
This is documented elsewhere.

  The guts of the command loop are in @code{command_loop_1()}.  This
function doesn't catch errors, though -- that's the job of
@code{command_loop_2()}, which is a condition-case (i.e. error-trapping)
wrapper around @code{command_loop_1()}.  @code{command_loop_1()} never
returns, but may get thrown out of.

  When an error occurs, @code{cmd_error()} is called, which usually
invokes the Lisp error handler in @code{command-error}; however, a
default error handler is provided if @code{command-error} is @code{nil}
(e.g. during startup).  The purpose of the error handler is simply to
display the error message and do associated cleanup; it does not need to
throw anywhere.  When the error handler finishes, the condition-case in
@code{command_loop_2()} will finish and @code{command_loop_2()} will
reinvoke @code{command_loop_1()}.
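
  The division of labor between @code{command_loop_2()} and
@code{command_loop_1()} can be sketched with @code{setjmp()} and
@code{longjmp()}.  This is a simplification under stated assumptions: an
error is modeled as a @code{longjmp()} to a single handler, the event
stream is a counter, and the inner loop returns when the counter runs
out so that the sketch terminates.  None of the names are XEmacs
functions.

@example
#include <setjmp.h>

/* Hypothetical simplification: signaling an error longjmp()s to the
   innermost condition-case, modeled here by one jmp_buf. */
static jmp_buf toy_error_handler;
static int toy_events_left;       /* stands in for the event stream */
static int toy_errors_reported;

static void
toy_signal_error (void)
@{
  longjmp (toy_error_handler, 1);
@}

/* Like command_loop_1(): read and dispatch events.  The real loop
   never returns; this one stops when the toy events run out. */
static void
toy_command_loop_1 (void)
@{
  while (toy_events_left > 0)
    @{
      int event = toy_events_left--;

      if (event % 3 == 0)           /* pretend these events fail */
        toy_signal_error ();
    @}
@}

/* Like command_loop_2(): trap errors, report them, and reinvoke the
   inner loop. */
static void
toy_command_loop_2 (void)
@{
  for (;;)
    @{
      if (setjmp (toy_error_handler) == 0)
        @{
          toy_command_loop_1 ();
          return;                   /* only possible in the sketch */
        @}
      toy_errors_reported++;        /* cmd_error() would display it */
    @}
@}
@end example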

  @code{command_loop_2()} is invoked from three places: from
@code{initial_command_loop()} (called from @code{main()} at the end of
internal initialization), from the Lisp function @code{recursive-edit},
and from @code{call_command_loop()}.

  @code{call_command_loop()} is called when a macro is started and when
the minibuffer is entered; normal termination of the macro or minibuffer
causes a throw out of the recursive command loop (to
@code{execute-kbd-macro} for macros and to @code{exit} for minibuffers).
Note also that the low-level minibuffer-entering function,
@code{read-minibuffer-internal}, provides its own error handling and
does not need @code{command_loop_2()}'s error encapsulation, so it tells
@code{call_command_loop()} to invoke @code{command_loop_1()} directly.

  Note that both @code{read-minibuffer-internal} and @code{recursive-edit} set up a
catch for @code{exit}; this is why @code{abort-recursive-edit}, which
throws to this catch, exits out of either one.

  @code{initial_command_loop()}, called from @code{main()}, sets up a
catch for @code{top-level} when invoking @code{command_loop_2()},
allowing functions to throw all the way to the top level if they really
need to.  Before invoking @code{command_loop_2()},
@code{initial_command_loop()} calls @code{top_level_1()}, which handles
all of the startup stuff (creating the initial frame, handling the
command-line options, loading the user's @file{.emacs} file, etc.).  The
function that actually does this is in Lisp and is pointed to by the
variable @code{top-level}; normally this function is
@code{normal-top-level}.  @code{top_level_1()} is just an error-handling
wrapper similar to @code{command_loop_2()}.  Note also that
@code{initial_command_loop()} sets up a catch for @code{top-level} when
invoking @code{top_level_1()}, just like when it invokes
@code{command_loop_2()}.

@node Specifics of the Event Gathering Mechanism
@section Specifics of the Event Gathering Mechanism

  Here is an approximate diagram of the collection processes
at work in XEmacs, under TTYs (TTYs are simpler than X,
so we'll look at them first):

@noindent
@example
 asynch.      asynch.    asynch.   asynch.                [Collectors in
kbd events  kbd events   process   process                   the OS]
      |         |         output    output
      |         |           |         |
      |         |           |         |      SIGINT,      [signal handlers
      |         |           |         |      SIGQUIT,        in XEmacs]
      V         V           V         V      SIGWINCH,
     file      file        file      file    SIGALRM
     desc.     desc.       desc.     desc.     |
     (TTY)     (TTY)       (pipe)    (pipe)    |
      |          |          |         |      fake    timeouts
      |          |          |         |      file        |
      |          |          |         |      desc.       |
      |          |          |         |      (pipe)      |
      |          |          |         |        |         |
      |          |          |         |        |         |
      |          |          |         |        |         |
      V          V          V         V        V         V
      ------>-----------<----------------<----------------
                   |
                   |
                   |   [collected using select() in emacs_tty_next_event()
                   |    and converted to the appropriate Emacs event]
                   |
                   |
                   V              (above this line is TTY-specific)
                 Emacs    ------------------------------------------------
                 event    (below this line is the generic event mechanism)
                   |
                   |
was there      if not, call
a SIGINT?   emacs_tty_next_event()
    |              |
    |              |
    |              |
    V              V
    --->-------<----
           |
           |        [collected in event_stream_next_event();
           |         SIGINT is converted using maybe_read_quit_event()]
           V
         Emacs
         event
           |
           \---->------>----- maybe_kbd_translate() ---->---\
                                                            |
                                                            |
                                                            |
     command event queue                                    |
                                                 if not from command
  (contains events that were                     event queue, call
  read earlier but not processed,                event_stream_next_event()
  typically when waiting in a                               |
  sit-for, sleep-for, etc. for                              |
 a particular event to be received)                         |
               |                                            |
               |                                            |
               V                                            V
               ---->------------------------------------<----
                                               |
                                               |   [collected in
                                               |    next_event_internal()]
                                               |
 unread-     unread-       event from          |
 command-    command-       keyboard       else, call
 events      event           macro      next_event_internal()
   |           |               |               |
   |           |               |               |
   |           |               |               |
   V           V               V               V
   --------->----------------------<------------
                     |
                     |      [collected in `next-event', which may loop
                     |       more than once if the event it gets is on
                     |       a dead frame, device, etc.]
                     |
                     |
                     V
            feed into top-level event loop,
            which repeatedly calls `next-event'
            and then dispatches the event
            using `dispatch-event'
@end example

Notice the separation between the TTY-specific and the generic event
mechanisms.  When using the Xt-based event loop, the TTY-specific stuff
is replaced but the rest stays the same.

It's also important to realize that only one kind of system-specific
event loop can operate at a time, and it must be able to receive all
kinds of events simultaneously.  Of the two existing event loops
(implemented in @file{event-tty.c} and @file{event-Xt.c},
respectively), the TTY event loop @emph{only} handles TTY consoles,
while the Xt event loop handles @emph{both} TTY and X consoles.  This
situation is different from that of the output handlers, where you
simply have one per console type.

  Here's the Xt Event Loop Diagram (notice that below a certain point,
it's the same as the above diagram):

@example
asynch. asynch. asynch. asynch.                 [Collectors in
 kbd     kbd    process process                    the OS]
events  events  output  output
  |       |       |       |
  |       |       |       |     asynch. asynch.   [Collectors in the
  |       |       |       |       X        X       OS and X Window System]
  |       |       |       |     events  events
  |       |       |       |       |        |
  |       |       |       |       |        |
  |       |       |       |       |        |    SIGINT,   [signal handlers
  |       |       |       |       |        |    SIGQUIT,     in XEmacs]
  |       |       |       |       |        |    SIGWINCH,
  |       |       |       |       |        |    SIGALRM
  |       |       |       |       |        |       |
  |       |       |       |       |        |       |
  |       |       |       |       |        |       |      timeouts
  |       |       |       |       |        |       |          |
  |       |       |       |       |        |       |          |
  |       |       |       |       |        |       V          |
  V       V       V       V       V        V      fake        |
 file    file    file    file    file     file    file        |
 desc.   desc.   desc.   desc.   desc.    desc.   desc.       |
 (TTY)   (TTY)   (pipe)  (pipe) (socket) (socket) (pipe)      |
  |       |       |       |       |        |       |          |
  |       |       |       |       |        |       |          |
  |       |       |       |       |        |       |          |
  V       V       V       V       V        V       V          V
  --->----------------------------------------<---------<------
       |              |               |
       |              |               |   [collected using select() in
       |              |               |   _XtWaitForSomething(), called
       |              |               |   from XtAppProcessEvent(), called
       |              |               |   in emacs_Xt_next_event();
       |              |               |   dispatched to various callbacks]
       |              |               |
       |              |               |
  emacs_Xt_        p_s_callback(),    |   [popup_selection_callback]
  event_handler()  x_u_v_s_callback(),|   [x_update_vertical_scrollbar_
       |           x_u_h_s_callback(),|    callback]
       |           search_callback()  |   [x_update_horizontal_scrollbar_
       |              |               |    callback]
       |              |               |
       |              |               |
  enqueue_Xt_       signal_special_   |
  dispatch_event()  Xt_user_event()   |
  [maybe multiple     |               |
   times, maybe 0     |               |
   times]             |               |
       |            enqueue_Xt_       |
       |            dispatch_event()  |
       |              |               |
       |              |               |
       V              V               |
       -->----------<--               |
              |                       |
              |                       |
           dispatch             Xt_what_callback()
           event                  sets flags
           queue                      |
              |                       |
              |                       |
              |                       |
              |                       |
              ---->-----------<--------
                   |
                   |
                   |     [collected and converted as appropriate in
                   |            emacs_Xt_next_event()]
                   |
                   |
                   V              (above this line is Xt-specific)
                 Emacs    ------------------------------------------------
                 event    (below this line is the generic event mechanism)
                   |
                   |
was there      if not, call
a SIGINT?   emacs_Xt_next_event()
    |              |
    |              |
    |              |
    V              V
    --->-------<----
           |
           |        [collected in event_stream_next_event();
           |         SIGINT is converted using maybe_read_quit_event()]
           V
         Emacs
         event
           |
           \---->------>----- maybe_kbd_translate() -->-----\
                                                            |
                                                            |
                                                            |
     command event queue                                    |
                                                 if not from command
  (contains events that were                     event queue, call
  read earlier but not processed,                event_stream_next_event()
  typically when waiting in a                               |
  sit-for, sleep-for, etc. for                              |
 a particular event to be received)                         |
               |                                            |
               |                                            |
               V                                            V
               ---->----------------------------------<------
                                               |
                                               |   [collected in
                                               |    next_event_internal()]
                                               |
 unread-     unread-       event from          |
 command-    command-       keyboard       else, call
 events      event           macro      next_event_internal()
   |           |               |               |
   |           |               |               |
   |           |               |               |
   V           V               V               V
   --------->----------------------<------------
                     |
                     |      [collected in `next-event', which may loop
                     |       more than once if the event it gets is on
                     |       a dead frame, device, etc.]
                     |
                     |
                     V
            feed into top-level event loop,
            which repeatedly calls `next-event'
            and then dispatches the event
            using `dispatch-event'
@end example

@node Specifics About the Emacs Event
@section Specifics About the Emacs Event

@node The Event Stream Callback Routines
@section The Event Stream Callback Routines

@node Other Event Loop Functions
@section Other Event Loop Functions

  @code{detect_input_pending()} and @code{input-pending-p} look for
input by calling @code{event_stream->event_pending_p} and looking in
@code{[V]unread-command-event} and the @code{command_event_queue} (they
do not check for an executing keyboard macro, though).
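
  A cut-down model of this check follows.  It assumes, as the
@code{event_stream->event_pending_p} call suggests, that the current
event loop is reached through a table of function pointers; the
structure layout, the zero-argument callback and every name here are
hypothetical.

@example
/* Hypothetical, cut-down table of event-loop callbacks; only the
   callback used below is shown. */
struct toy_event_stream
@{
  int (*event_pending_p) (void);
@};

static struct toy_event_stream *toy_current_event_stream;

/* Hypothetical stand-ins for unread-command-event and the length of
   the command_event_queue. */
static int toy_unread_command_event_p;
static int toy_command_event_queue_length;

/* Like detect_input_pending(): an executing keyboard macro is
   deliberately not consulted. */
static int
toy_detect_input_pending (void)
@{
  return toy_unread_command_event_p
         || toy_command_event_queue_length > 0
         || toy_current_event_stream->event_pending_p ();
@}

/* One event loop's implementation of the callback. */
static int toy_tty_input_waiting;

static int
toy_tty_event_pending_p (void)
@{
  return toy_tty_input_waiting;
@}

static struct toy_event_stream toy_tty_event_stream =
  @{ toy_tty_event_pending_p @};
@end example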

  @code{discard-input} cancels any command events pending (and any
keyboard macros currently executing), and puts the non-command events
onto the @code{command_event_queue}.  There is a comment about a ``race
condition'', which is not a good sign.

  @code{next-command-event} and @code{read-char} are higher-level
interfaces to @code{next-event}.  @code{next-command-event} gets the
next @dfn{command} event (i.e. keypress, mouse event, menu selection,
or scrollbar action), calling @code{dispatch-event} on any others.
@code{read-char} calls @code{next-command-event} and uses
@code{event_to_character()} to return the character equivalent.  With
the right kind of input method support, it is possible for
@code{(read-char)} to return a Kanji character.

@node Converting Events
@section Converting Events

  @code{character_to_event()}, @code{event_to_character()},
@code{event-to-character}, and @code{character-to-event} convert between
characters and keypress events corresponding to the characters.  If the
event was not a keypress, @code{event_to_character()} returns -1 and
@code{event-to-character} returns @code{nil}.  These functions convert
between character representation and the split-up event representation
(keysym plus mod keys).
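
  The split-up representation can be sketched as below.  The modifier
bits, the event types and the rule that characters 1 through 26 become a
letter plus Control are assumptions made for illustration; the real
conversion handles more cases (for example, some control characters may
correspond to named keys instead).

@example
/* Hypothetical split-up keypress: a key plus modifier bits. */
#define TOY_MOD_CONTROL 1
#define TOY_MOD_META    2

enum toy_event_type @{ TOY_KEY_PRESS, TOY_BUTTON_PRESS @};

struct toy_event
@{
  enum toy_event_type type;
  int key;          /* e.g. 'a' */
  int modifiers;    /* TOY_MOD_ bits */
@};

/* Character to keypress event: in this sketch, characters 1 through
   26 become a lowercase letter plus Control; anything else is taken
   as the key itself. */
static struct toy_event
toy_character_to_event (int c)
@{
  struct toy_event e;

  e.type = TOY_KEY_PRESS;
  e.modifiers = 0;
  if (c >= 1 && c <= 26)
    @{
      e.key = 'a' + c - 1;
      e.modifiers = TOY_MOD_CONTROL;
    @}
  else
    e.key = c;
  return e;
@}

/* Keypress event to character: -1 if E is not a keypress or has no
   character equivalent in this toy model. */
static int
toy_event_to_character (const struct toy_event *e)
@{
  if (e->type != TOY_KEY_PRESS)
    return -1;
  if (e->modifiers == 0)
    return e->key;
  if (e->modifiers == TOY_MOD_CONTROL && e->key >= 'a' && e->key <= 'z')
    return e->key - 'a' + 1;
  return -1;
@}
@end example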

@node Dispatching Events; The Command Builder
@section Dispatching Events; The Command Builder

Not yet documented.

@node Evaluation; Stack Frames; Bindings, Symbols and Variables, Events and the Event Loop, Top
@chapter Evaluation; Stack Frames; Bindings

@menu
* Evaluation::
* Dynamic Binding; The specbinding Stack; Unwind-Protects::
* Simple Special Forms::
* Catch and Throw::
@end menu

@node Evaluation
@section Evaluation

  @code{Feval()} evaluates the form (a Lisp object) that is passed to
it.  Note that evaluation is only non-trivial for two types of objects:
symbols and conses; all other objects evaluate to themselves.  A symbol
is evaluated simply by calling @code{symbol-value} on it and returning
the value.

  Evaluating a cons means calling a function.  First, @code{eval} checks
to see if garbage collection is necessary, and calls
@code{Fgarbage_collect()} if so.  It then increases the evaluation depth
(@code{lisp_eval_depth}) by 1, signaling an error if it exceeds
@code{max_lisp_eval_depth}, and adds an
element to the linked list of @code{struct backtrace}'s
(@code{backtrace_list}).  Each such structure contains a pointer to the
function being called plus a list of the function's arguments.
Originally these values are stored unevalled, and as they are evaluated,
the backtrace structure is updated.  Garbage collection pays attention
to the objects pointed to in the backtrace structures (garbage
collection might happen while a function is being called or while an
argument is being evaluated, and there could easily be no other
references to the arguments in the argument list; once an argument is
evaluated, however, the unevalled version is not needed by eval, and so
the backtrace structure is changed).

  At this point, the function to be called is determined by looking at
the car of the cons (if this is a symbol, its function definition is
retrieved and the process repeated).  The function should then consist
of either a @code{Lisp_Subr} (built-in function), a
@code{Lisp_Compiled_Function} object, or a cons whose car is the symbol
@code{autoload}, @code{macro} or @code{lambda}.

If the function is a @code{Lisp_Subr}, the Lisp object points to a
@code{struct Lisp_Subr} (created by @code{DEFUN()}), which contains a
pointer to the C function, a minimum and maximum number of arguments
(possibly the special constants @code{MANY} or @code{UNEVALLED}), a
pointer to the symbol referring to that subr, and a couple of other
things.  If the subr wants its arguments @code{UNEVALLED}, they are
passed raw as a list.  Otherwise, an array of evaluated arguments is
created and put into the backtrace structure, and either passed whole
(@code{MANY}) or each argument is passed as a C argument.
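
  Dispatching on the maximum argument count might look like this.  The
value of @code{TOY_MANY}, the structure layout and the two example
functions are hypothetical; the @code{UNEVALLED} case, in which the raw
argument list is passed, is left out, and checking the argument count
against the minimum and maximum is assumed to have happened already.

@example
typedef long toy_object;        /* hypothetical stand-in for Lisp_Object */
#define TOY_MANY (-2)           /* hypothetical marker value */

/* Cut-down analogue of struct Lisp_Subr. */
struct toy_subr
@{
  void (*fn) (void);            /* real signature given by MAX_ARGS */
  short min_args, max_args;
  const char *name;
@};

typedef toy_object (*toy_fn0) (void);
typedef toy_object (*toy_fn1) (toy_object);
typedef toy_object (*toy_fn2) (toy_object, toy_object);
typedef toy_object (*toy_fn_many) (int, toy_object *);

/* Call SUBR on NARGS already-evaluated ARGS.  A MANY subr gets the
   whole array; otherwise each argument becomes a C argument, with
   missing optional ones passed as 0 (standing in for nil). */
static toy_object
toy_call_subr (const struct toy_subr *subr, int nargs, toy_object *args)
@{
  toy_object a[2] = @{ 0, 0 @};
  int i;

  if (subr->max_args == TOY_MANY)
    return ((toy_fn_many) subr->fn) (nargs, args);
  for (i = 0; i < nargs && i < 2; i++)
    a[i] = args[i];
  switch (subr->max_args)
    @{
    case 0: return ((toy_fn0) subr->fn) ();
    case 1: return ((toy_fn1) subr->fn) (a[0]);
    case 2: return ((toy_fn2) subr->fn) (a[0], a[1]);
    default: return 0;          /* larger counts omitted here */
    @}
@}

/* Two example subrs. */
static toy_object
toy_add2 (toy_object x, toy_object y)
@{
  return x + y;
@}

static toy_object
toy_sum (int nargs, toy_object *args)
@{
  toy_object total = 0;
  int i;

  for (i = 0; i < nargs; i++)
    total += args[i];
  return total;
@}
@end example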

  If the function is a @code{Lisp_Compiled_Function} object or a lambda,
@code{apply_lambda()} is called.  If the function is a macro,
[..... fill in] is done.  If the function is an autoload,
@code{do_autoload()} is called to load the definition and then eval
starts over [explain this more].

  When @code{Feval} exits, the evaluation depth is reduced by one, the
debugger is called if appropriate, and the current backtrace structure
is removed from the list.

  @code{apply_lambda()} is passed a function, a list of arguments, and a
flag indicating whether to evaluate the arguments.  It creates an array
of (possibly) evaluated arguments and fixes up the backtrace structure,
just like eval does.  Then it calls @code{funcall_lambda()}.

  @code{funcall_lambda()} goes through the formal arguments to the
function and binds them to the actual arguments, checking for
@code{&rest} and @code{&optional} symbols in the formal arguments and
making sure the number of actual arguments is correct.  Then either
@code{progn} or @code{byte-code} is called to actually execute the body
and return a value.

  @code{Ffuncall()} implements Lisp @code{funcall}.  @code{(funcall fun
x1 x2 x3 ...)} is equivalent to @code{(eval (list fun (quote x1) (quote
x2) (quote x3) ...))}.  @code{Ffuncall()} contains its own code to do
the evaluation, however, and is almost identical to eval.

  @code{Fapply()} implements Lisp @code{apply}, which is very similar to
@code{funcall} except that if the last argument is a list, the result is the
same as if each of the arguments in the list had been passed separately.
@code{Fapply()} does some business to expand the last argument if it's a
list, then calls @code{Ffuncall()} to do the work.

  @code{apply1()}, @code{call0()}, @code{call1()}, @code{call2()}, and
@code{call3()} call a function, passing it the argument(s) given (the
arguments are given as separate C arguments rather than being passed as
an array).  @code{apply1()} uses @code{apply} while the others use
@code{funcall}.

@node Dynamic Binding; The specbinding Stack; Unwind-Protects
@section Dynamic Binding; The specbinding Stack; Unwind-Protects

@example
struct specbinding
@{
  Lisp_Object symbol, old_value;
  Lisp_Object (*func) (Lisp_Object); /* for unwind-protect */
@};
@end example

  @code{struct specbinding} is used for local-variable bindings and
unwind-protects.  @code{specpdl} holds an array of @code{struct specbinding}'s,
@code{specpdl_ptr} points to the beginning of the free bindings in the
array, @code{specpdl_size} specifies the total number of binding slots
in the array, and @code{max_specpdl_size} specifies the maximum number
of bindings the array can be expanded to hold.  @code{grow_specpdl()}
increases the size of the @code{specpdl} array, multiplying its size by
2 but never exceeding @code{max_specpdl_size} (except that if this
number is less than 400, it is first set to 400).
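
  The growth policy can be written as a small function.  This sketch
uses hypothetical copies of the two size variables, only computes the
new slot count (the real function must also enlarge the array itself),
and reports failure with -1; what the real code does when the limit is
reached is not described here.

@example
/* Hypothetical copies of specpdl_size and max_specpdl_size. */
static int toy_specpdl_size = 50;
static int toy_max_specpdl_size = 600;

/* Apply the growth policy and return the new slot count, or -1 if
   the limit has already been reached. */
static int
toy_grow_specpdl_size (void)
@{
  int new_size;

  if (toy_max_specpdl_size < 400)
    toy_max_specpdl_size = 400;
  if (toy_specpdl_size >= toy_max_specpdl_size)
    return -1;
  new_size = toy_specpdl_size * 2;
  if (new_size > toy_max_specpdl_size)
    new_size = toy_max_specpdl_size;
  toy_specpdl_size = new_size;
  return new_size;
@}
@end example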
5592
5593   @code{specbind()} binds a symbol to a value and is used for local
5594 variables and @code{let} forms.  The symbol and its old value (which
5595 might be @code{Qunbound}, indicating no prior value) are recorded in the
5596 specpdl array, and @code{specpdl_size} is increased by 1.
5597
5598   @code{record_unwind_protect()} implements an @dfn{unwind-protect},
5599 which, when placed around a section of code, ensures that some specified
5600 cleanup routine will be executed even if the code exits abnormally
5601 (e.g. through a @code{throw} or quit).  @code{record_unwind_protect()}
5602 simply adds a new specbinding to the @code{specpdl} array and stores the
5603 appropriate information in it.  The cleanup routine can either be a C
5604 function, which is stored in the @code{func} field, or a @code{progn}
5605 form, which is stored in the @code{old_value} field.
5606
5607   @code{unbind_to()} removes specbindings from the @code{specpdl} array
5608 until the specified position is reached.  Each specbinding can be one of
5609 three types:
5610
5611 @enumerate
5612 @item
5613 an unwind-protect with a C cleanup function (@code{func} is not 0, and
5614 @code{old_value} holds an argument to be passed to the function);
5615 @item
5616 an unwind-protect with a Lisp form (@code{func} is 0, @code{symbol}
5617 is @code{nil}, and @code{old_value} holds the form to be executed with
5618 @code{Fprogn()}); or
5619 @item
5620 a local-variable binding (@code{func} is 0, @code{symbol} is not
@code{nil}, and @code{old_value} holds the old value, which is restored as
the symbol's value).
5623 @end enumerate
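  The three cases handled by @code{unbind_to()} can be modeled with a
small stack.  The following is a minimal, self-contained sketch, not the
XEmacs code: @code{Lisp_Object} is replaced by a @code{long}, a symbol is
a hypothetical @code{struct toy_symbol} with a single value cell, the
array has a fixed size (no @code{grow_specpdl()}), and the Lisp-form
unwind-protect (case 2) is omitted because it would need an evaluator.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins for the real types (assumptions of this sketch). */
typedef long Lisp_Object;
struct toy_symbol { const char *name; Lisp_Object value; };

struct specbinding
{
  struct toy_symbol *symbol;        /* NULL for an unwind-protect */
  Lisp_Object old_value;            /* saved value, or cleanup argument */
  void (*func) (Lisp_Object);       /* C cleanup function, or NULL */
};

#define SPECPDL_SLOTS 64            /* fixed size: no grow_specpdl() here */
static struct specbinding specpdl[SPECPDL_SLOTS];
static struct specbinding *specpdl_ptr = specpdl;

static int
specpdl_depth (void)
{
  return (int) (specpdl_ptr - specpdl);
}

static void
check_specpdl_room (void)
{
  if (specpdl_ptr == specpdl + SPECPDL_SLOTS)
    abort ();                       /* the real code grows the array or signals */
}

/* Save SYM's old value in the first free slot, then give it NEWVAL. */
static void
specbind (struct toy_symbol *sym, Lisp_Object newval)
{
  check_specpdl_room ();
  specpdl_ptr->symbol = sym;
  specpdl_ptr->old_value = sym->value;
  specpdl_ptr->func = NULL;
  specpdl_ptr++;
  sym->value = newval;
}

/* Arrange for FUNC (ARG) to be called when this slot is unwound. */
static void
record_unwind_protect (void (*func) (Lisp_Object), Lisp_Object arg)
{
  check_specpdl_room ();
  specpdl_ptr->symbol = NULL;
  specpdl_ptr->old_value = arg;
  specpdl_ptr->func = func;
  specpdl_ptr++;
}

/* Pop slots, newest first, until only COUNT remain, undoing each one. */
static void
unbind_to (int count)
{
  while (specpdl_depth () > count)
    {
      specpdl_ptr--;
      if (specpdl_ptr->func != NULL)          /* case 1: C cleanup */
        specpdl_ptr->func (specpdl_ptr->old_value);
      else if (specpdl_ptr->symbol != NULL)   /* case 3: restore binding */
        specpdl_ptr->symbol->value = specpdl_ptr->old_value;
    }
}

/* Example cleanup function: adds its argument to a counter. */
static int cleanup_total;

static void
add_to_cleanup_total (Lisp_Object arg)
{
  cleanup_total += (int) arg;
}
```

In this model, a @code{let} form amounts to saving @code{specpdl_depth()},
calling @code{specbind()} once per variable, running the body, and then
calling @code{unbind_to()} with the saved depth.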
5624
5625 @node Simple Special Forms
5626 @section Simple Special Forms
5627
5628 @code{or}, @code{and}, @code{if}, @code{cond}, @code{progn},
5629 @code{prog1}, @code{prog2}, @code{setq}, @code{quote}, @code{function},
5630 @code{let*}, @code{let}, @code{while}
5631
5632   All of these are very simple and work as expected, calling
5633 @code{Feval()} or @code{Fprogn()} as necessary and (in the case of
5634 @code{let} and @code{let*}) using @code{specbind()} to create bindings
5635 and @code{unbind_to()} to undo the bindings when finished.  Note that
5636 these functions do a lot of @code{GCPRO}ing to protect their arguments
5637 from garbage collection because they call @code{Feval()} (@pxref{Garbage
5638 Collection}).
5639
5640 @node Catch and Throw
5641 @section Catch and Throw
5642
5643 @example
5644 struct catchtag
5645 @{
5646   Lisp_Object tag;
5647   Lisp_Object val;
5648   struct catchtag *next;
5649   struct gcpro *gcpro;
5650   jmp_buf jmp;
5651   struct backtrace *backlist;
5652   int lisp_eval_depth;
5653   int pdlcount;
5654 @};
5655 @end example
5656
  @code{catch} is a Lisp special form that places a catch around a body
of code.  A catch is a means of non-local exit from the code.  When a
catch is created, a tag is specified, and executing a @code{throw} to
this tag will exit from the body of code caught with this tag; the
@code{catch} form then returns the value given in the call to
@code{throw}.  If no such @code{throw} occurs, the body is executed
normally and @code{catch} returns the value of its last form.
5663
5664   Information pertaining to a catch is held in a @code{struct catchtag},
5665 which is placed at the head of a linked list pointed to by
5666 @code{catchlist}.  @code{internal_catch()} is passed a C function to
5667 call (@code{Fprogn()} when Lisp @code{catch} is called) and arguments to
5668 give it, and places a catch around the function.  Each @code{struct
5669 catchtag} is held in the stack frame of the @code{internal_catch()}
5670 instance that created the catch.
5671
5672   @code{internal_catch()} is fairly straightforward.  It stores into the
@code{struct catchtag} the tag and the current values of
5674 @code{backtrace_list}, @code{lisp_eval_depth}, @code{gcprolist}, and the
5675 offset into the @code{specpdl} array, sets a jump point with @code{_setjmp()}
5676 (storing the jump point into the @code{struct catchtag}), and calls the
5677 function.  Control will return to @code{internal_catch()} either when
5678 the function exits normally or through a @code{_longjmp()} to this jump
5679 point.  In the latter case, @code{throw} will store the value to be
returned into the @code{struct catchtag} before jumping.  In either
case, @code{internal_catch()} then removes the @code{struct catchtag}
from the catchlist and returns the proper value.
5683
5684   @code{Fthrow()} goes up through the catchlist until it finds one with
5685 a matching tag.  It then calls @code{unbind_catch()} to restore
5686 everything to what it was when the appropriate catch was set, stores the
5687 return value in the @code{struct catchtag}, and jumps (with
5688 @code{_longjmp()}) to its jump point.
5689
5690   @code{unbind_catch()} removes all catches from the catchlist until it
5691 finds the correct one.  Some of the catches might have been placed for
5692 error-trapping, and if so, the appropriate entries on the handlerlist
5693 must be removed (see ``errors'').  @code{unbind_catch()} also restores
5694 the values of @code{gcprolist}, @code{backtrace_list}, and
@code{lisp_eval_depth}, and calls @code{unbind_to()} to undo any specbindings
5696 created since the catch.
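  The catch and throw protocol can be sketched with standard C
@code{setjmp()} and @code{longjmp()} in place of @code{_setjmp()} and
@code{_longjmp()}.  This is a simplified sketch under stated assumptions:
tags are compared as plain @code{long} values, and the extra state that
@code{unbind_catch()} restores (@code{gcprolist}, @code{backtrace_list},
@code{lisp_eval_depth}, and the specpdl offset) is left out, so unwinding
reduces to resetting @code{catchlist}.  The example body functions are
hypothetical.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

typedef long Lisp_Object;          /* toy stand-in */

struct catchtag
{
  Lisp_Object tag;
  volatile Lisp_Object val;        /* volatile: written before longjmp */
  struct catchtag *next;
  jmp_buf jmp;
};

static struct catchtag *catchlist;

/* Call FUNC (ARG) with a catch for TAG around it. */
static Lisp_Object
internal_catch (Lisp_Object tag, Lisp_Object (*func) (Lisp_Object),
                Lisp_Object arg)
{
  struct catchtag c;               /* lives in this stack frame */

  c.tag = tag;
  c.val = 0;
  c.next = catchlist;
  catchlist = &c;

  if (setjmp (c.jmp) == 0)
    c.val = func (arg);            /* normal return */
  /* Otherwise Fthrow() stored the value in c.val and jumped here. */

  catchlist = c.next;              /* remove this catch */
  return c.val;
}

/* Throw VALUE to the innermost catch for TAG. */
static void
Fthrow (Lisp_Object tag, Lisp_Object value)
{
  struct catchtag *c;

  for (c = catchlist; c != NULL; c = c->next)
    if (c->tag == tag)
      {
        catchlist = c;             /* drop inner catches (unbind_catch) */
        c->val = value;
        longjmp (c->jmp, 1);
      }
  abort ();                        /* no catch: real code signals an error */
}

/* Example bodies. */
static Lisp_Object
add_one (Lisp_Object x)
{
  return x + 1;
}

static Lisp_Object
throw_to_42 (Lisp_Object x)
{
  Fthrow (42, x * 10);
  return -1;                       /* not reached */
}

static Lisp_Object
nested_catch_then_throw (Lisp_Object x)
{
  return internal_catch (7, throw_to_42, x);
}
```

The nested case shows why @code{Fthrow()} must discard the inner catch:
the jump bypasses the inner @code{internal_catch()} frame entirely, so
that frame never gets to unlink its own @code{struct catchtag}.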
5697
5698
5699 @node Symbols and Variables, Buffers and Textual Representation, Evaluation; Stack Frames; Bindings, Top
5700 @chapter Symbols and Variables
5701
5702 @menu
5703 * Introduction to Symbols::
5704 * Obarrays::
5705 * Symbol Values::
5706 @end menu
5707
5708 @node Introduction to Symbols
5709 @section Introduction to Symbols
5710
5711   A symbol is basically just an object with four fields: a name (a
5712 string), a value (some Lisp object), a function (some Lisp object), and
5713 a property list (usually a list of alternating keyword/value pairs).
5714 What makes symbols special is that there is usually only one symbol with
5715 a given name, and the symbol is referred to by name.  This makes a
5716 symbol a convenient way of calling up data by name, i.e. of implementing
5717 variables. (The variable's value is stored in the @dfn{value slot}.)
5718 Similarly, functions are referenced by name, and the definition of the
5719 function is stored in a symbol's @dfn{function slot}.  This means that
5720 there can be a distinct function and variable with the same name.  The
5721 property list is used as a more general mechanism of associating
5722 additional values with particular names, and once again the namespace is
5723 independent of the function and variable namespaces.
5724
5725 @node Obarrays
5726 @section Obarrays
5727
5728   The identity of symbols with their names is accomplished through a
5729 structure called an obarray, which is just a poorly-implemented hash
5730 table mapping from strings to symbols whose name is that string. (I say
5731 ``poorly implemented'' because an obarray appears in Lisp as a vector
5732 with some hidden fields rather than as its own opaque type.  This is an
5733 Emacs Lisp artifact that should be fixed.)
5734
5735   Obarrays are implemented as a vector of some fixed size (which should
5736 be a prime for best results), where each ``bucket'' of the vector
contains zero or more symbols, threaded through a hidden @code{next}
5738 field in the symbol.  Lookup of a symbol in an obarray, and adding a
5739 symbol to an obarray, is accomplished through standard hash-table
5740 techniques.
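  The bucket structure can be sketched in C.  This is a minimal sketch,
not the XEmacs implementation: the obarray is a plain C array of bucket
heads rather than a Lisp vector, @code{struct symbol} keeps only a name
and the hidden @code{next} link, and @code{hash_name()} is a hypothetical
hash function.  Passing the obarray explicitly shows how separate
obarrays can each hold a symbol with the same name.

```c
#include <stdlib.h>
#include <string.h>

#define OBARRAY_SIZE 31            /* a prime, as recommended above */

struct symbol
{
  char *name;
  struct symbol *next;             /* hidden link to the next symbol in the bucket */
};

typedef struct symbol *obarray_t[OBARRAY_SIZE];

static obarray_t obarray;          /* the standard obarray */

/* Hypothetical string hash reduced to a bucket number. */
static size_t
hash_name (const char *s)
{
  size_t h = 0;
  while (*s)
    h = h * 31 + (unsigned char) *s++;
  return h % OBARRAY_SIZE;
}

/* Like intern-soft: find NAME in OB, or return NULL. */
static struct symbol *
intern_soft (struct symbol **ob, const char *name)
{
  struct symbol *sym;
  for (sym = ob[hash_name (name)]; sym != NULL; sym = sym->next)
    if (strcmp (sym->name, name) == 0)
      return sym;
  return NULL;
}

/* Like intern: find NAME in OB, creating and adding it if absent. */
static struct symbol *
intern (struct symbol **ob, const char *name)
{
  struct symbol *sym = intern_soft (ob, name);
  size_t bucket;

  if (sym != NULL)
    return sym;
  sym = malloc (sizeof *sym);
  if (sym == NULL || (sym->name = malloc (strlen (name) + 1)) == NULL)
    abort ();
  strcpy (sym->name, name);
  bucket = hash_name (name);
  sym->next = ob[bucket];          /* push onto the bucket's chain */
  ob[bucket] = sym;
  return sym;
}
```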
5741
5742   The standard Lisp function for working with symbols and obarrays is
5743 @code{intern}.  This looks up a symbol in an obarray given its name; if
5744 it's not found, a new symbol is automatically created with the specified
5745 name, added to the obarray, and returned.  This is what happens when the
5746 Lisp reader encounters a symbol (or more precisely, encounters the name
5747 of a symbol) in some text that it is reading.  There is a standard
5748 obarray called @code{obarray} that is used for this purpose, although
5749 the Lisp programmer is free to create his own obarrays and @code{intern}
5750 symbols in them.
5751
  Note that, once a symbol is in an obarray, it stays there until it is
explicitly removed (e.g. with @code{unintern}), and the standard obarray
@code{obarray} always stays around, so once you use any particular
variable name, a corresponding symbol will stay around in @code{obarray}
until you exit XEmacs.
5757
5758   Note that @code{obarray} itself is a variable, and as such there is a
5759 symbol in @code{obarray} whose name is @code{"obarray"} and which
5760 contains @code{obarray} as its value.
5761
  Note also that the reader's call to @code{intern} happens only when
code is read, not when it is executed (at which point the symbol already
exists, stored as such in the definition of the function).
5765
5766   You can create your own obarray using @code{make-vector} (this is
5767 horrible but is an artifact) and intern symbols into that obarray.
Doing that can result in two or more symbols with the same name.
5769 However, at most one of these symbols is in the standard @code{obarray}:
5770 You cannot have two symbols of the same name in any particular obarray.
5771 Note that you cannot add a symbol to an obarray in any fashion other
5772 than using @code{intern}: i.e. you can't take an existing symbol and put
5773 it in an existing obarray.  Nor can you change the name of an existing
5774 symbol. (Since obarrays are vectors, you can violate the consistency of
5775 things by storing directly into the vector, but let's ignore that
5776 possibility.)
5777
5778   Usually symbols are created by @code{intern}, but if you really want,
5779 you can explicitly create a symbol using @code{make-symbol}, giving it
5780 some name.  The resulting symbol is not in any obarray (i.e. it is
5781 @dfn{uninterned}), and you can't add it to any obarray.  Therefore its
5782 primary purpose is as a symbol to use in macros to avoid namespace
5783 pollution.  It can also be used as a carrier of information, but cons
5784 cells could probably be used just as well.
5785
5786   You can also use @code{intern-soft} to look up a symbol but not create
5787 a new one, and @code{unintern} to remove a symbol from an obarray.  This
5788 returns the removed symbol. (Remember: You can't put the symbol back
5789 into any obarray.) Finally, @code{mapatoms} maps over all of the symbols
5790 in an obarray.
5791
5792 @node Symbol Values
5793 @section Symbol Values
5794
5795   The value field of a symbol normally contains a Lisp object.  However,
5796 a symbol can be @dfn{unbound}, meaning that it logically has no value.
This is indicated internally by storing a special Lisp object, called
@dfn{the unbound marker}, which is held in the global variable
@code{Qunbound}.  The unbound marker is of a special Lisp object type
5800 called @dfn{symbol-value-magic}.  It is impossible for the Lisp
5801 programmer to directly create or access any object of this type.
5802
5803   @strong{You must not let any ``symbol-value-magic'' object escape to
5804 the Lisp level.}  Printing any of these objects will cause the message
5805 @samp{INTERNAL EMACS BUG} to appear as part of the print representation.
5806 (You may see this normally when you call @code{debug_print()} from the
5807 debugger on a Lisp object.) If you let one of these objects escape to
the Lisp level, you will violate a number of assumptions contained in
the C code and keep the unbound marker from working correctly.
5810
  When a symbol is created, its value and function fields are both set
to @code{Qunbound}.  The Lisp programmer can restore these conditions
later using @code{makunbound} or @code{fmakunbound}, and can query
whether the value or function field is @dfn{bound} (i.e. has a
value other than @code{Qunbound}) using @code{boundp} and
5816 @code{fboundp}.  The fields are set to a normal Lisp object using
5817 @code{set} (or @code{setq}) and @code{fset}.
5818
5819   Other symbol-value-magic objects are used as special markers to
5820 indicate variables that have non-normal properties.  This includes any
5821 variables that are tied into C variables (setting the variable magically
5822 sets some global variable in the C code, and likewise for retrieving the
5823 variable's value), variables that magically tie into slots in the
5824 current buffer, variables that are buffer-local, etc.  The
5825 symbol-value-magic object is stored in the value cell in place of
5826 a normal object, and the code to retrieve a symbol's value
5827 (i.e. @code{symbol-value}) knows how to do special things with them.
5828 This means that you should not just fetch the value cell directly if you
5829 want a symbol's value.
5830
  The exact workings of this are rather complex and are documented in
detail in comments in @file{buffer.c}, @file{symbols.c}, and
@file{lisp.h}.
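  The idea of a value cell that may hold a magic object can be
illustrated with a toy model.  Everything in the following sketch is
hypothetical and far simpler than the real code in @file{symbols.c}: a
cell is a tagged union, only two kinds of magic are modeled (the unbound
marker and forwarding to a C @code{int} variable), and the accessors
mimic @code{boundp}, @code{symbol-value}, and @code{set} without ever
handing the magic object itself to the caller.

```c
#include <stdbool.h>
#include <stddef.h>

enum magic_type { MAGIC_UNBOUND, MAGIC_FORWARD_INT };

struct symbol_value_magic
{
  enum magic_type type;
  int *c_var;                      /* target of MAGIC_FORWARD_INT */
};

/* A value cell holds an ordinary value or a pointer to a magic object. */
struct value_cell
{
  bool is_magic;
  union { long obj; struct symbol_value_magic *magic; } u;
};

/* The single unbound marker, a toy counterpart of Qunbound. */
static struct symbol_value_magic unbound_marker = { MAGIC_UNBOUND, NULL };

/* Like boundp: only the unbound marker counts as "no value". */
static bool
cell_boundp (const struct value_cell *cell)
{
  return !(cell->is_magic && cell->u.magic->type == MAGIC_UNBOUND);
}

/* Like symbol-value: store the value in *OUT and return true, or return
   false for an unbound cell (Emacs would signal void-variable). */
static bool
cell_value (const struct value_cell *cell, long *out)
{
  if (!cell->is_magic)
    {
      *out = cell->u.obj;
      return true;
    }
  if (cell->u.magic->type == MAGIC_FORWARD_INT)
    {
      *out = *cell->u.magic->c_var;  /* read through to the C variable */
      return true;
    }
  return false;
}

/* Like set: a forwarded cell writes through to its C variable. */
static void
cell_set (struct value_cell *cell, long val)
{
  if (cell->is_magic && cell->u.magic->type == MAGIC_FORWARD_INT)
    *cell->u.magic->c_var = (int) val;
  else
    {
      cell->is_magic = false;        /* replaces the unbound marker */
      cell->u.obj = val;
    }
}
```

This is why code that wants a symbol's value must go through an accessor
rather than reading the value cell directly: a raw read of a forwarded
cell would return the magic object, not the C variable's value.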
5834
5835 @node Buffers and Textual Representation, MULE Character Sets and Encodings, Symbols and Variables, Top
5836 @chapter Buffers and Textual Representation
5837
5838 @menu
5839 * Introduction to Buffers::     A buffer holds a block of text such as a file.
5840 * The Text in a Buffer::        Representation of the text in a buffer.
5841 * Buffer Lists::                Keeping track of all buffers.
5842 * Markers and Extents::         Tagging locations within a buffer.
5843 * Bufbytes and Emchars::        Representation of individual characters.
5844 * The Buffer Object::           The Lisp object corresponding to a buffer.
5845 @end menu
5846
5847 @node Introduction to Buffers
5848 @section Introduction to Buffers
5849
5850   A buffer is logically just a Lisp object that holds some text.
5851 In this, it is like a string, but a buffer is optimized for
5852 frequent insertion and deletion, while a string is not.  Furthermore:
5853
5854 @enumerate
5855 @item
5856 Buffers are @dfn{permanent} objects, i.e. once you create them, they
5857 remain around, and need to be explicitly deleted before they go away.
5858 @item
5859 Each buffer has a unique name, which is a string.  Buffers are
5860 normally referred to by name.  In this respect, they are like
5861 symbols.
5862 @item
5863 Buffers have a default insertion position, called @dfn{point}.
5864 Inserting text (unless you explicitly give a position) goes at point,
5865 and moves point forward past the text.  This is what is going on when
5866 you type text into Emacs.
5867 @item
5868 Buffers have lots of extra properties associated with them.
5869 @item
5870 Buffers can be @dfn{displayed}.  What this means is that there
5871 exist a number of @dfn{windows}, which are objects that correspond
5872 to some visible section of your display, and each window has
5873 an associated buffer, and the current contents of the buffer
5874 are shown in that section of the display.  The redisplay mechanism
5875 (which takes care of doing this) knows how to look at the
5876 text of a buffer and come up with some reasonable way of displaying
5877 this.  Many of the properties of a buffer control how the
5878 buffer's text is displayed.
5879 @item
5880 One buffer is distinguished and called the @dfn{current buffer}.  It is
5881 stored in the variable @code{current_buffer}.  Buffer operations operate
5882 on this buffer by default.  When you are typing text into a buffer, the
5883 buffer you are typing into is always @code{current_buffer}.  Switching
5884 to a different window changes the current buffer.  Note that Lisp code
5885 can temporarily change the current buffer using @code{set-buffer} (often
5886 enclosed in a @code{save-excursion} so that the former current buffer
5887 gets restored when the code is finished).  However, calling
5888 @code{set-buffer} will NOT cause a permanent change in the current
5889 buffer.  The reason for this is that the top-level event loop sets
5890 @code{current_buffer} to the buffer of the selected window, each time
5891 it finishes executing a user command.
5892 @end enumerate
5893
5894   Make sure you understand the distinction between @dfn{current buffer}
5895 and @dfn{buffer of the selected window}, and the distinction between
5896 @dfn{point} of the current buffer and @dfn{window-point} of the selected
5897 window. (This latter distinction is explained in detail in the section
5898 on windows.)
5899
5900 @node The Text in a Buffer
5901 @section The Text in a Buffer
5902
5903   The text in a buffer consists of a sequence of zero or more
5904 characters.  A @dfn{character} is an integer that logically represents
5905 a letter, number, space, or other unit of text.  Most of the characters
5906 that you will typically encounter belong to the ASCII set of characters,
5907 but there are also characters for various sorts of accented letters,
special symbols, Chinese and Japanese characters (e.g. Kanji and Kana),
Cyrillic and Greek letters, etc.  The actual number of possible
5910 characters is quite large.
5911
5912   For now, we can view a character as some non-negative integer that
5913 has some shape that defines how it typically appears (e.g. as an
5914 uppercase A). (The exact way in which a character appears depends on the
5915 font used to display the character.) The internal type of characters in
5916 the C code is an @code{Emchar}; this is just an @code{int}, but using a
5917 symbolic type makes the code clearer.
5918
  Between each pair of adjacent characters in a buffer, as well as at
the beginning and end of the buffer, is a @dfn{buffer position} or
@dfn{character position}.  We can speak of the character before or after
5921 a particular buffer position, and when you insert a character at a
5922 particular position, all characters after that position end up at new
5923 positions.  When we speak of the character @dfn{at} a position, we
5924 really mean the character after the position.  (This schizophrenia
5925 between a buffer position being ``between'' a character and ``on'' a
5926 character is rampant in Emacs.)
5927
5928   Buffer positions are numbered starting at 1.  This means that
5929 position 1 is before the first character, and position 0 is not
5930 valid.  If there are N characters in a buffer, then buffer
5931 position N+1 is after the last one, and position N+2 is not valid.
5932
5933   The internal makeup of the Emchar integer varies depending on whether
5934 we have compiled with MULE support.  If not, the Emchar integer is an
5935 8-bit integer with possible values from 0 - 255.  0 - 127 are the
5936 standard ASCII characters, while 128 - 255 are the characters from the
5937 ISO-8859-1 character set.  If we have compiled with MULE support, an
5938 Emchar is a 19-bit integer, with the various bits having meanings
5939 according to a complex scheme that will be detailed later.  The
5940 characters numbered 0 - 255 still have the same meanings as for the
5941 non-MULE case, though.
5942
5943   Internally, the text in a buffer is represented in a fairly simple
5944 fashion: as a contiguous array of bytes, with a @dfn{gap} of some size
5945 in the middle.  Although the gap is of some substantial size in bytes,
5946 there is no text contained within it: From the perspective of the text
5947 in the buffer, it does not exist.  The gap logically sits at some buffer
5948 position, between two characters (or possibly at the beginning or end of
5949 the buffer).  Insertion of text in a buffer at a particular position is
5950 always accomplished by first moving the gap to that position
5951 (i.e. through some block moving of text), then writing the text into the
5952 beginning of the gap, thereby shrinking the gap.  If the gap shrinks
5953 down to nothing, a new gap is created. (What actually happens is that a
5954 new gap is ``created'' at the end of the buffer's text, which requires
5955 nothing more than changing a couple of indices; then the gap is
5956 ``moved'' to the position where the insertion needs to take place by
5957 moving up in memory all the text after that position.)  Similarly,
5958 deletion occurs by moving the gap to the place where the text is to be
5959 deleted, and then simply expanding the gap to include the deleted text.
5960 (@dfn{Expanding} and @dfn{shrinking} the gap as just described means
5961 just that the internal indices that keep track of where the gap is
5962 located are changed.)
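  The gap mechanics described above can be sketched for the non-MULE
case, where one character is one byte.  This is a minimal sketch, not the
code in @file{insdel.c}: the names are hypothetical, indices are 0-based
for C convenience (unlike the 1-based conventions used in this chapter),
and the 64 bytes of slack added when the gap grows are an arbitrary
choice.

```c
#include <stdlib.h>
#include <string.h>

struct gap_buffer
{
  unsigned char *text;             /* text plus gap, SIZE bytes in all */
  size_t size;                     /* bytes allocated */
  size_t gap_start;                /* logical position of the gap */
  size_t gap_len;                  /* bytes in the gap */
};

static size_t
gb_length (const struct gap_buffer *b)
{
  return b->size - b->gap_len;
}

/* Move the gap to logical position POS by block-moving the text between. */
static void
gb_move_gap (struct gap_buffer *b, size_t pos)
{
  if (pos < b->gap_start)          /* text [pos, gap_start) moves up */
    memmove (b->text + pos + b->gap_len, b->text + pos, b->gap_start - pos);
  else if (pos > b->gap_start)     /* text after the gap moves down */
    memmove (b->text + b->gap_start, b->text + b->gap_start + b->gap_len,
             pos - b->gap_start);
  b->gap_start = pos;
}

/* Enlarge the gap by EXTRA bytes: put it at the end of the text, where
   growing it means only reallocating and changing indices. */
static void
gb_grow_gap (struct gap_buffer *b, size_t extra)
{
  unsigned char *p;

  gb_move_gap (b, gb_length (b));
  p = realloc (b->text, b->size + extra);
  if (p == NULL)
    abort ();
  b->text = p;
  b->size += extra;
  b->gap_len += extra;
}

/* Insert N bytes from S at logical position POS. */
static void
gb_insert (struct gap_buffer *b, size_t pos, const char *s, size_t n)
{
  if (b->gap_len < n)
    gb_grow_gap (b, n - b->gap_len + 64);
  gb_move_gap (b, pos);
  memcpy (b->text + b->gap_start, s, n);   /* write into start of gap */
  b->gap_start += n;                       /* gap shrinks from the front */
  b->gap_len -= n;
}

/* Delete N bytes at POS (POS + N must not exceed the length):
   the gap simply absorbs them. */
static void
gb_delete (struct gap_buffer *b, size_t pos, size_t n)
{
  gb_move_gap (b, pos);
  b->gap_len += n;
}

/* Byte at logical (0-based) character index I. */
static int
gb_char_at (const struct gap_buffer *b, size_t i)
{
  return i < b->gap_start ? b->text[i] : b->text[i + b->gap_len];
}
```

Note that @code{gb_delete()} never frees memory, which matches the
observation below that a buffer's text allocation does not shrink.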
5963
  Note that the total amount of memory allocated for a buffer's text never
5965 decreases while the buffer is live.  Therefore, if you load up a
5966 20-megabyte file and then delete all but one character, there will be a
5967 20-megabyte gap, which won't get any smaller (except by inserting
5968 characters back again).  Once the buffer is killed, the memory allocated
5969 for the buffer text will be freed, but it will still be sitting on the
5970 heap, taking up virtual memory, and will not be released back to the
5971 operating system. (However, if you have compiled XEmacs with rel-alloc,
5972 the situation is different.  In this case, the space @emph{will} be
5973 released back to the operating system.  However, this tends to result in a
5974 noticeable speed penalty.)
5975
5976   Astute readers may notice that the text in a buffer is represented as
5977 an array of @emph{bytes}, while (at least in the MULE case) an Emchar is
5978 a 19-bit integer, which clearly cannot fit in a byte.  This means (of
5979 course) that the text in a buffer uses a different representation from
5980 an Emchar: specifically, the 19-bit Emchar becomes a series of one to
5981 four bytes.  The conversion between these two representations is complex
5982 and will be described later.
5983
5984   In the non-MULE case, everything is very simple: An Emchar
5985 is an 8-bit value, which fits neatly into one byte.
5986
5987   If we are given a buffer position and want to retrieve the
5988 character at that position, we need to follow these steps:
5989
5990 @enumerate
5991 @item
5992 Pretend there's no gap, and convert the buffer position into a @dfn{byte
5993 index} that indexes to the appropriate byte in the buffer's stream of
5994 textual bytes.  By convention, byte indices begin at 1, just like buffer
5995 positions.  In the non-MULE case, byte indices and buffer positions are
5996 identical, since one character equals one byte.
5997 @item
5998 Convert the byte index into a @dfn{memory index}, which takes the gap
5999 into account.  The memory index is a direct index into the block of
6000 memory that stores the text of a buffer.  This basically just involves
6001 checking to see if the byte index is past the gap, and if so, adding the
6002 size of the gap to it.  By convention, memory indices begin at 1, just
6003 like buffer positions and byte indices, and when referring to the
6004 position that is @dfn{at} the gap, we always use the memory position at
6005 the @emph{beginning}, not at the end, of the gap.
6006 @item
6007 Fetch the appropriate bytes at the determined memory position.
6008 @item
6009 Convert these bytes into an Emchar.
6010 @end enumerate
6011
  In the non-MULE case, steps (3) and (4) boil down to a simple one-byte
6013 memory access.
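  The four steps can be sketched for the non-MULE case using the 1-based
conventions above.  The structure and function names are hypothetical;
XEmacs uses its own macros for these conversions.  Note the asymmetry: a
byte index exactly at the gap converts to the memory index at the
beginning of the gap, but the character at that position is the byte
just after the gap, so fetching tests @code{>=} where the conversion
tests @code{>}.

```c
typedef int Bufpos;
typedef int Bytind;
typedef int Memind;

struct text_sketch
{
  const unsigned char *beg;        /* memory index 1 is beg[0] */
  Bytind gpt;                      /* byte index at which the gap sits */
  int gap_size;                    /* number of bytes in the gap */
};

/* Step 1: without MULE, one character is one byte. */
static Bytind
bufpos_to_bytind (Bufpos pos)
{
  return (Bytind) pos;
}

/* Step 2: add the gap size to indices past the gap.  An index exactly
   at the gap maps to the beginning of the gap. */
static Memind
bytind_to_memind (const struct text_sketch *t, Bytind ind)
{
  return ind > t->gpt ? ind + t->gap_size : ind;
}

/* Steps 3 and 4: the character at POS is the byte after POS, so an
   index equal to the gap position must skip over the gap. */
static int
fetch_char (const struct text_sketch *t, Bufpos pos)
{
  Bytind ind = bufpos_to_bytind (pos);
  int offset = ind >= t->gpt ? t->gap_size : 0;
  return t->beg[ind - 1 + offset];
}
```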
6014
6015   Note that we have defined three types of positions in a buffer:
6016
6017 @enumerate
6018 @item
6019 @dfn{buffer positions} or @dfn{character positions}, typedef @code{Bufpos}
6020 @item
6021 @dfn{byte indices}, typedef @code{Bytind}
6022 @item
6023 @dfn{memory indices}, typedef @code{Memind}
6024 @end enumerate
6025
6026   All three typedefs are just @code{int}s, but defining them this way makes
6027 things a lot clearer.
6028
6029   Most code works with buffer positions.  In particular, all Lisp code
6030 that refers to text in a buffer uses buffer positions.  Lisp code does
6031 not know that byte indices or memory indices exist.
6032
6033   Finally, we have a typedef for the bytes in a buffer.  This is a
6034 @code{Bufbyte}, which is an unsigned char.  Referring to them as
6035 Bufbytes underscores the fact that we are working with a string of bytes
6036 in the internal Emacs buffer representation rather than in one of a
6037 number of possible alternative representations (e.g. EUC-encoded text,
6038 etc.).
6039
6040 @node Buffer Lists
6041 @section Buffer Lists
6042
  Recall that buffers are @dfn{permanent} objects, i.e. that
6044 they remain around until explicitly deleted.  This entails that there is
6045 a list of all the buffers in existence.  This list is actually an
6046 assoc-list (mapping from the buffer's name to the buffer) and is stored
6047 in the global variable @code{Vbuffer_alist}.
6048
6049   The order of the buffers in the list is important: the buffers are
6050 ordered approximately from most-recently-used to least-recently-used.
6051 Switching to a buffer using @code{switch-to-buffer},
@code{pop-to-buffer}, etc., and switching windows using
@code{other-window}, etc., usually brings the new current buffer to the
6054 front of the list.  @code{switch-to-buffer}, @code{other-buffer},
6055 etc. look at the beginning of the list to find an alternative buffer to
6056 suggest.  You can also explicitly move a buffer to the end of the list
6057 using @code{bury-buffer}.
6058
6059   In addition to the global ordering in @code{Vbuffer_alist}, each frame
6060 has its own ordering of the list.  These lists always contain the same
6061 elements as in @code{Vbuffer_alist} although possibly in a different
6062 order.  @code{buffer-list} normally returns the list for the selected
6063 frame.  This allows you to work in separate frames without things
6064 interfering with each other.
6065
6066   The standard way to look up a buffer given a name is
6067 @code{get-buffer}, and the standard way to create a new buffer is
6068 @code{get-buffer-create}, which looks up a buffer with a given name,
creating a new one if necessary.  These operations correspond exactly
to the symbol operations @code{intern-soft} and @code{intern},
6071 respectively.  You can also force a new buffer to be created using
6072 @code{generate-new-buffer}, which takes a name and (if necessary) makes
6073 a unique name from this by appending a number, and then creates the
6074 buffer.  This is basically like the symbol operation @code{gensym}.
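  The name-uniquifying step can be sketched in C.  This is an
illustration under assumptions: the @samp{NAME<2>}, @samp{NAME<3>},
@dots{} suffix pattern is the one Emacs uses for buffer names, but
@code{unique_buffer_name()} and the @code{name_taken_fn} callback, which
stands in for a lookup such as @code{get-buffer}, are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical predicate: true if a live buffer is already called NAME. */
typedef bool (*name_taken_fn) (const char *name, void *ctx);

/* Store in OUT (of size OUTLEN) NAME itself if unused, otherwise the
   first unused NAME<2>, NAME<3>, ...  Returns false if OUT is too small. */
static bool
unique_buffer_name (const char *name, name_taken_fn taken, void *ctx,
                    char *out, size_t outlen)
{
  int n = 1;
  int len = snprintf (out, outlen, "%s", name);

  while (len >= 0 && (size_t) len < outlen && taken (out, ctx))
    len = snprintf (out, outlen, "%s<%d>", name, ++n);
  return len >= 0 && (size_t) len < outlen;
}

/* Example predicate: CTX is a NULL-terminated array of existing names. */
static bool
name_in_list (const char *name, void *ctx)
{
  const char *const *p;

  for (p = ctx; *p != NULL; p++)
    if (strcmp (*p, name) == 0)
      return true;
  return false;
}
```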
6075
6076 @node Markers and Extents
6077 @section Markers and Extents
6078
6079   Among the things associated with a buffer are things that are
6080 logically attached to certain buffer positions.  This can be used to
6081 keep track of a buffer position when text is inserted and deleted, so
6082 that it remains at the same spot relative to the text around it; to
6083 assign properties to particular sections of text; etc.  There are two
6084 such objects that are useful in this regard: they are @dfn{markers} and
6085 @dfn{extents}.
6086
6087   A @dfn{marker} is simply a flag placed at a particular buffer
6088 position, which is moved around as text is inserted and deleted.
6089 Markers are used for all sorts of purposes, such as the @code{mark} that
6090 is the other end of textual regions to be cut, copied, etc.
6091
6092   An @dfn{extent} is similar to two markers plus some associated
6093 properties, and is used to keep track of regions in a buffer as text is
6094 inserted and deleted, and to add properties (e.g. fonts) to particular
6095 regions of text.  The external interface of extents is explained
6096 elsewhere.
6097
6098   The important thing here is that markers and extents simply contain
6099 buffer positions in them as integers, and every time text is inserted or
6100 deleted, these positions must be updated.  In order to minimize the
6101 amount of shuffling that needs to be done, the positions in markers and
extents (there's one per marker, two per extent) are stored as Meminds.
6103 This means that they only need to be moved when the text is physically
6104 moved in memory; since the gap structure tries to minimize this, it also
6105 minimizes the number of marker and extent indices that need to be
6106 adjusted.  Look in @file{insdel.c} for the details of how this works.
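  The benefit of memory indices can be seen in a sketch of the marker
adjustment needed when the gap moves.  This is a minimal illustration,
not the @file{insdel.c} code: offsets are 0-based, @code{struct marker}
is a hypothetical singly linked chain, and a position exactly at the gap
is represented by the gap's beginning, as described earlier.  Only
markers in the block of text that the move physically shifts are
touched; in this model, inserting text into the gap needs no marker
updates at all as long as the gap does not have to be enlarged.

```c
#include <stddef.h>

struct marker
{
  size_t memind;                   /* 0-based memory offset */
  struct marker *next;
};

/* Adjust CHAIN after the gap (of GAP_LEN bytes) moves from logical
   position OLD_GAP to NEW_GAP. */
static void
adjust_markers_for_gap_move (struct marker *chain, size_t old_gap,
                             size_t new_gap, size_t gap_len)
{
  struct marker *m;

  for (m = chain; m != NULL; m = m->next)
    {
      if (new_gap < old_gap
          && m->memind > new_gap && m->memind <= old_gap)
        m->memind += gap_len;      /* this text moved up, above the gap */
      else if (new_gap > old_gap
               && m->memind > old_gap + gap_len
               && m->memind <= new_gap + gap_len)
        m->memind -= gap_len;      /* this text moved down, below the gap */
    }
}

/* Convert a marker's memory offset back to a logical position. */
static size_t
marker_position (const struct marker *m, size_t gap_start, size_t gap_len)
{
  return m->memind <= gap_start ? m->memind : m->memind - gap_len;
}
```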
6107
6108   One other important distinction is that markers are @dfn{temporary}
6109 while extents are @dfn{permanent}.  This means that markers disappear as
6110 soon as there are no more pointers to them, and correspondingly, there
6111 is no way to determine what markers are in a buffer if you are just
6112 given the buffer.  Extents remain in a buffer until they are detached
6113 (which could happen as a result of text being deleted) or the buffer is
6114 deleted, and primitives do exist to enumerate the extents in a buffer.
6115
6116 @node Bufbytes and Emchars
6117 @section Bufbytes and Emchars
6118
6119   Not yet documented.
6120
6121 @node The Buffer Object
6122 @section The Buffer Object
6123
6124   Buffers contain fields not directly accessible by the Lisp programmer.
6125 We describe them here, naming them by the names used in the C code.
6126 Many are accessible indirectly in Lisp programs via Lisp primitives.
6127
6128 @table @code
6129 @item name
6130 The buffer name is a string that names the buffer.  It is guaranteed to
6131 be unique.  @xref{Buffer Names,,, lispref, XEmacs Lisp Programmer's
6132 Manual}.
6133
6134 @item save_modified
6135 This field contains the time when the buffer was last saved, as an
6136 integer.  @xref{Buffer Modification,,, lispref, XEmacs Lisp Programmer's
6137 Manual}.
6138
6139 @item modtime
6140 This field contains the modification time of the visited file.  It is
6141 set when the file is written or read.  Every time the buffer is written
6142 to the file, this field is compared to the modification time of the
6143 file.  @xref{Buffer Modification,,, lispref, XEmacs Lisp Programmer's
6144 Manual}.
6145
6146 @item auto_save_modified
6147 This field contains the time when the buffer was last auto-saved.
6148
6149 @item last_window_start
6150 This field contains the @code{window-start} position in the buffer as of
6151 the last time the buffer was displayed in a window.
6152
6153 @item undo_list
6154 This field points to the buffer's undo list.  @xref{Undo,,, lispref,
6155 XEmacs Lisp Programmer's Manual}.
6156
6157 @item syntax_table_v
6158 This field contains the syntax table for the buffer.  @xref{Syntax
6159 Tables,,, lispref, XEmacs Lisp Programmer's Manual}.
6160
6161 @item downcase_table
6162 This field contains the conversion table for converting text to lower
6163 case.  @xref{Case Tables,,, lispref, XEmacs Lisp Programmer's Manual}.
6164
6165 @item upcase_table
6166 This field contains the conversion table for converting text to upper
6167 case.  @xref{Case Tables,,, lispref, XEmacs Lisp Programmer's Manual}.
6168
6169 @item case_canon_table
6170 This field contains the conversion table for canonicalizing text for
6171 case-folding search.  @xref{Case Tables,,, lispref, XEmacs Lisp
6172 Programmer's Manual}.
6173
6174 @item case_eqv_table
6175 This field contains the equivalence table for case-folding search.
6176 @xref{Case Tables,,, lispref, XEmacs Lisp Programmer's Manual}.
6177
6178 @item display_table
6179 This field contains the buffer's display table, or @code{nil} if it
6180 doesn't have one.  @xref{Display Tables,,, lispref, XEmacs Lisp
6181 Programmer's Manual}.
6182
6183 @item markers
6184 This field contains the chain of all markers that currently point into
6185 the buffer.  Deletion of text in the buffer, and motion of the buffer's
6186 gap, must check each of these markers and perhaps update it.
6187 @xref{Markers,,, lispref, XEmacs Lisp Programmer's Manual}.
6188
6189 @item backed_up
6190 This field is a flag that tells whether a backup file has been made for
6191 the visited file of this buffer.
6192
6193 @item mark
6194 This field contains the mark for the buffer.  The mark is a marker,
6195 hence it is also included on the list @code{markers}.  @xref{The Mark,,,
6196 lispref, XEmacs Lisp Programmer's Manual}.
6197
6198 @item mark_active
6199 This field is non-@code{nil} if the buffer's mark is active.
6200
6201 @item local_var_alist
6202 This field contains the association list describing the variables local
6203 in this buffer, and their values, with the exception of local variables
6204 that have special slots in the buffer object.  (Those slots are omitted
6205 from this table.)  @xref{Buffer-Local Variables,,, lispref, XEmacs Lisp
6206 Programmer's Manual}.
6207
6208 @item modeline_format
6209 This field contains a Lisp object which controls how to display the mode
6210 line for this buffer.  @xref{Modeline Format,,, lispref, XEmacs Lisp
6211 Programmer's Manual}.
6212
6213 @item base_buffer
6214 This field holds the buffer's base buffer (if it is an indirect buffer),
6215 or @code{nil}.
6216 @end table
6217
6218 @node MULE Character Sets and Encodings, The Lisp Reader and Compiler, Buffers and Textual Representation, Top
6219 @chapter MULE Character Sets and Encodings
6220
6221   Recall that there are two primary ways that text is represented in
6222 XEmacs.  The @dfn{buffer} representation sees the text as a series of
6223 bytes (Bufbytes), with a variable number of bytes used per character.
6224 The @dfn{character} representation sees the text as a series of integers
6225 (Emchars), one per character.  The character representation is a cleaner
6226 representation from a theoretical standpoint, and is thus used in many
6227 cases when lots of manipulations on a string need to be done.  However,
6228 the buffer representation is the standard representation used in both
Lisp strings and buffers, and because of this, it is the ``default''
representation in which text is found.  This representation is used
because it is compact and compatible with ASCII.
6232
6233 @menu
6234 * Character Sets::
6235 * Encodings::
6236 * Internal Mule Encodings::
6237 * CCL::
6238 @end menu
6239
6240 @node Character Sets
6241 @section Character Sets
6242
6243   A character set (or @dfn{charset}) is an ordered set of characters.  A
6244 particular character in a charset is indexed using one or more
6245 @dfn{position codes}, which are non-negative integers.  The number of
6246 position codes needed to identify a particular character in a charset is
6247 called the @dfn{dimension} of the charset.  In XEmacs/Mule, all charsets
6248 have dimension 1 or 2, and the size of all charsets (except for a few
6249 special cases) is either 94, 96, 94 by 94, or 96 by 96.  The range of
6250 position codes used to index characters from any of these types of
6251 character sets is as follows:
6252
6253 @example
6254 Charset type            Position code 1         Position code 2
6255 ------------------------------------------------------------
6256 94                      33 - 126                N/A
6257 96                      32 - 127                N/A
6258 94x94                   33 - 126                33 - 126
6259 96x96                   32 - 127                32 - 127
6260 @end example
6261
6262   Note that in the above cases position codes do not start at an
6263 expected value such as 0 or 1.  The reason for this will become clear
6264 later.
6265
6266   For example, Latin-1 is a 96-character charset, and JISX0208 (the
6267 Japanese national character set) is a 94x94-character charset.
6268
6269   [Note that, although the ranges above define the @emph{valid} position
6270 codes for a charset, some of the slots in a particular charset may in
fact be empty.  This is the case for JISX0208, for example, where all
the slots whose first position code is in the range 117 - 126 are
empty.]
6274
6275   There are three charsets that do not follow the above rules.  All of
6276 them have one dimension, and have ranges of position codes as follows:
6277
6278 @example
6279 Charset name            Position code 1
6280 ------------------------------------
6281 ASCII                   0 - 127
6282 Control-1               0 - 31
6283 Composite               0 - some large number
6284 @end example
6285
  (The upper bound of the position code for composite characters has not
yet been determined, but it will probably be at least 16,383.)
6288
6289   ASCII is the union of two subsidiary character sets: Printing-ASCII
6290 (the printing ASCII character set, consisting of position codes 33 -
126, as in a standard 94-character charset) and Control-ASCII (the
6292 non-printing characters that would appear in a binary file with codes 0
6293 - 32 and 127).
6294
6295   Control-1 contains the non-printing characters that would appear in a
6296 binary file with codes 128 - 159.
6297
6298   Composite contains characters that are generated by overstriking one
6299 or more characters from other charsets.
6300
6301   Note that some characters in ASCII, and all characters in Control-1,
6302 are @dfn{control} (non-printing) characters.  These have no printed
6303 representation but instead control some other function of the printing
(e.g. TAB, code 9, moves the current character position to the next tab
6305 stop).  All other characters in all charsets are @dfn{graphic}
6306 (printing) characters.
6307
6308   When a binary file is read in, the bytes in the file are assigned to
6309 character sets as follows:
6310
6311 @example
6312 Bytes           Character set           Range
6313 --------------------------------------------------
6314 0 - 127         ASCII                   0 - 127
6315 128 - 159       Control-1               0 - 31
6316 160 - 255       Latin-1                 32 - 127
6317 @end example
6318
6319   This is a bit ad-hoc but gets the job done.
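As a minimal sketch of the byte-to-charset assignment above, the following C function maps one byte of a binary file to a charset and a position code.  The enum and function names are hypothetical, chosen for illustration; they are not XEmacs's own identifiers.

```c
#include <assert.h>

/* Hypothetical tags for the three charsets used for binary files. */
enum binary_charset { CS_ASCII, CS_CONTROL_1, CS_LATIN_1 };

/* Hypothetical helper: map one byte (0 - 255) to its charset, storing
   the position code in *POSITION_CODE, following the table above. */
static enum binary_charset
binary_byte_to_char (unsigned int byte, unsigned int *position_code)
{
  assert (byte <= 255);
  if (byte < 128)
    {
      *position_code = byte;         /* ASCII: 0 - 127 */
      return CS_ASCII;
    }
  *position_code = byte - 128;
  if (byte < 160)
    return CS_CONTROL_1;             /* Control-1: 0 - 31 */
  return CS_LATIN_1;                 /* Latin-1: 32 - 127 */
}
```

Control-1 and Latin-1 both subtract 128; the two charsets simply split the upper half of the byte range at 160.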
6320
6321 @node Encodings
6322 @section Encodings
6323
6324   An @dfn{encoding} is a way of numerically representing characters from
6325 one or more character sets.  If an encoding only encompasses one
6326 character set, then the position codes for the characters in that
6327 character set could be used directly.  This is not possible, however, if
6328 more than one character set is to be used in the encoding.
6329
6330   For example, the conversion detailed above between bytes in a binary
6331 file and characters is effectively an encoding that encompasses the
6332 three character sets ASCII, Control-1, and Latin-1 in a stream of 8-bit
6333 bytes.
6334
6335   Thus, an encoding can be viewed as a way of encoding characters from a
6336 specified group of character sets using a stream of bytes, each of which
6337 contains a fixed number of bits (but not necessarily 8, as in the common
6338 usage of ``byte'').
6339
  Here are descriptions of a couple of common encodings:
6342
6343 @menu
6344 * Japanese EUC (Extended Unix Code)::
6345 * JIS7::
6346 @end menu
6347
6348 @node Japanese EUC (Extended Unix Code)
6349 @subsection Japanese EUC (Extended Unix Code)
6350
This encompasses the character sets Printing-ASCII, Japanese-JISX0208,
Japanese-JISX0212, and Japanese-JISX0201-Kana (half-width katakana, the
right half of JISX0201).  It uses 8-bit bytes.

Note that Printing-ASCII and Japanese-JISX0201-Kana are 94-character
charsets, while Japanese-JISX0208 and Japanese-JISX0212 are
94x94-character charsets.
6357
6358 The encoding is as follows:
6359
6360 @example
6361 Character set            Representation (PC=position-code)
6362 -------------            --------------
6363 Printing-ASCII           PC1
6364 Japanese-JISX0201-Kana   0x8E       | PC1 + 0x80
6365 Japanese-JISX0208        PC1 + 0x80 | PC2 + 0x80
Japanese-JISX0212        0x8F       | PC1 + 0x80 | PC2 + 0x80
6367 @end example
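The table above translates directly into code.  The following C sketch encodes one character into Japanese EUC; the enum and function names are hypothetical.  It follows standard EUC-JP in prefixing half-width katakana with the single-shift byte 0x8E and JISX0212 characters with the single-shift byte 0x8F, which keeps JISX0212 characters distinct from JISX0208 characters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tags for the charsets Japanese EUC encompasses. */
enum euc_charset { EUC_ASCII, EUC_JISX0201_KANA, EUC_JISX0208, EUC_JISX0212 };

/* Hypothetical helper: encode one character, given its charset and
   position code(s), into OUT (room for at least 3 bytes).  PC2 is
   ignored for the dimension-1 charsets.  Returns bytes written. */
static size_t
euc_jp_encode (enum euc_charset cs, unsigned char pc1, unsigned char pc2,
               unsigned char *out)
{
  switch (cs)
    {
    case EUC_ASCII:
      out[0] = pc1;                             /* PC1 */
      return 1;
    case EUC_JISX0201_KANA:
      out[0] = 0x8E;                            /* single shift 2 */
      out[1] = (unsigned char) (pc1 + 0x80);
      return 2;
    case EUC_JISX0208:
      out[0] = (unsigned char) (pc1 + 0x80);
      out[1] = (unsigned char) (pc2 + 0x80);
      return 2;
    case EUC_JISX0212:
      out[0] = 0x8F;                            /* single shift 3 */
      out[1] = (unsigned char) (pc1 + 0x80);
      out[2] = (unsigned char) (pc2 + 0x80);
      return 3;
    }
  assert (0 && "unknown charset");
  return 0;
}
```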
6368
6369
6370 @node JIS7
6371 @subsection JIS7
6372
6373 This encompasses the character sets Printing-ASCII,
6374 Japanese-JISX0201-Roman (the left half of JISX0201; this character set
6375 is very similar to Printing-ASCII and is a 94-character charset),
6376 Japanese-JISX0208, and Japanese-JISX0201-Kana.  It uses 7-bit bytes.
6377
6378 Unlike Japanese EUC, this is a @dfn{modal} encoding, which
6379 means that there are multiple states that the encoding can
6380 be in, which affect how the bytes are to be interpreted.
6381 Special sequences of bytes (called @dfn{escape sequences})
6382 are used to change states.
6383
6384   The encoding is as follows:
6385
6386 @example
6387 Character set              Representation (PC=position-code)
6388 -------------              --------------
6389 Printing-ASCII             PC1
6390 Japanese-JISX0201-Roman    PC1
6391 Japanese-JISX0201-Kana     PC1
6392 Japanese-JISX0208          PC1 PC2
6393
6394
6395 Escape sequence   ASCII equivalent   Meaning
6396 ---------------   ----------------   -------
6397 0x1B 0x28 0x4A    ESC ( J            invoke Japanese-JISX0201-Roman
6398 0x1B 0x28 0x49    ESC ( I            invoke Japanese-JISX0201-Kana
6399 0x1B 0x24 0x42    ESC $ B            invoke Japanese-JISX0208
6400 0x1B 0x28 0x42    ESC ( B            invoke Printing-ASCII
6401 @end example
6402
6403   Initially, Printing-ASCII is invoked.
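Because JIS7 is modal, an encoder has to remember which charset is currently invoked and emit an escape sequence only when that changes.  The following C sketch does this with a hypothetical state structure and enum; only the escape sequences themselves come from the table above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tags for the charsets JIS7 encompasses. */
enum jis7_charset { JIS7_ASCII, JIS7_ROMAN, JIS7_KANA, JIS7_JISX0208 };

/* Escape sequence that invokes each charset, indexed by the enum above. */
static const unsigned char jis7_escape[4][3] = {
  { 0x1B, 0x28, 0x42 },   /* ESC ( B  Printing-ASCII */
  { 0x1B, 0x28, 0x4A },   /* ESC ( J  Japanese-JISX0201-Roman */
  { 0x1B, 0x28, 0x49 },   /* ESC ( I  Japanese-JISX0201-Kana */
  { 0x1B, 0x24, 0x42 },   /* ESC $ B  Japanese-JISX0208 */
};

/* Hypothetical encoder state: the currently invoked charset.
   Initialize with JIS7_ASCII, since Printing-ASCII is invoked initially. */
struct jis7_state { enum jis7_charset current; };

/* Append one character to OUT (room for at least 5 bytes), first
   emitting an escape sequence if the charset changes.
   Returns the number of bytes written. */
static size_t
jis7_encode (struct jis7_state *st, enum jis7_charset cs,
             unsigned char pc1, unsigned char pc2, unsigned char *out)
{
  size_t n = 0;
  assert (pc1 < 0x80 && pc2 < 0x80);   /* JIS7 uses 7-bit bytes */
  if (cs != st->current)
    {
      out[n++] = jis7_escape[cs][0];
      out[n++] = jis7_escape[cs][1];
      out[n++] = jis7_escape[cs][2];
      st->current = cs;
    }
  out[n++] = pc1;
  if (cs == JIS7_JISX0208)             /* the only 94x94 charset here */
    out[n++] = pc2;
  return n;
}
```

A complete encoder would normally also emit @code{ESC ( B} at the end of the text, so that the output ends with Printing-ASCII invoked; that step is omitted here.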
6404
6405 @node Internal Mule Encodings
6406 @section Internal Mule Encodings
6407
6408 In XEmacs/Mule, each character set is assigned a unique number, called a
6409 @dfn{leading byte}.  This is used in the encodings of a character.
6410 Leading bytes are in the range 0x80 - 0xFF (except for ASCII, which has
6411 a leading byte of 0), although some leading bytes are reserved.
6412
6413 Charsets whose leading byte is in the range 0x80 - 0x9F are called
6414 @dfn{official} and are used for built-in charsets.  Other charsets are
6415 called @dfn{private} and have leading bytes in the range 0xA0 - 0xFF;
6416 these are user-defined charsets.
6417
6418   More specifically:
6419
6420 @example
6421 Character set           Leading byte
6422 -------------           ------------
6423 ASCII                   0
6424 Composite               0x80
6425 Dimension-1 Official    0x81 - 0x8D
6426                           (0x8E is free)
6427 Control-1               0x8F
6428 Dimension-2 Official    0x90 - 0x99
6429                           (0x9A - 0x9D are free;
6430                            0x9E and 0x9F are reserved)
6431 Dimension-1 Private     0xA0 - 0xEF
6432 Dimension-2 Private     0xF0 - 0xFF
6433 @end example
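As an illustration, the leading-byte ranges above can be checked with a simple C function.  The enum and function are hypothetical; the 0x8E slot and the range 0x9A - 0x9F, which are free or reserved, are reported as not assigned.

```c
#include <assert.h>

/* Hypothetical classification of leading bytes, per the table above. */
enum lb_class {
  LB_ASCII, LB_COMPOSITE, LB_DIM1_OFFICIAL, LB_CONTROL_1,
  LB_DIM2_OFFICIAL, LB_DIM1_PRIVATE, LB_DIM2_PRIVATE, LB_UNASSIGNED
};

/* Hypothetical helper: classify a leading byte (0 - 0xFF). */
static enum lb_class
classify_leading_byte (unsigned int lb)
{
  assert (lb <= 0xFF);
  if (lb == 0)                  return LB_ASCII;
  if (lb < 0x80)                return LB_UNASSIGNED;  /* not a leading byte */
  if (lb == 0x80)               return LB_COMPOSITE;
  if (lb <= 0x8D)               return LB_DIM1_OFFICIAL;
  if (lb == 0x8F)               return LB_CONTROL_1;
  if (lb >= 0x90 && lb <= 0x99) return LB_DIM2_OFFICIAL;
  if (lb >= 0xA0 && lb <= 0xEF) return LB_DIM1_PRIVATE;
  if (lb >= 0xF0)               return LB_DIM2_PRIVATE;
  return LB_UNASSIGNED;         /* 0x8E, 0x9A - 0x9F: free or reserved */
}
```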
6434
6435 There are two internal encodings for characters in XEmacs/Mule.  One is
6436 called @dfn{string encoding} and is an 8-bit encoding that is used for
6437 representing characters in a buffer or string.  It uses 1 to 4 bytes per
6438 character.  The other is called @dfn{character encoding} and is a 19-bit
6439 encoding that is used for representing characters individually in a
6440 variable.
6441
6442 (In the following descriptions, we'll ignore composite characters for
6443 the moment.  We also give a general (structural) overview first,
6444 followed later by the exact details.)
6445
6446 @menu
6447 * Internal String Encoding::
6448 * Internal Character Encoding::
6449 @end menu
6450
6451 @node Internal String Encoding
6452 @subsection Internal String Encoding
6453
6454 ASCII characters are encoded using their position code directly.  Other
6455 characters are encoded using their leading byte followed by their
6456 position code(s) with the high bit set.  Characters in private character
6457 sets have their leading byte prefixed with a @dfn{leading byte prefix},
6458 which is either 0x9E or 0x9F. (No character sets are ever assigned these
6459 leading bytes.) Specifically:
6460
6461 @example
6462 Character set           Encoding (PC=position-code, LB=leading-byte)
6463 -------------           --------
ASCII                   PC1  |
6465 Control-1               LB   |  PC1 + 0xA0 |
6466 Dimension-1 official    LB   |  PC1 + 0x80 |
6467 Dimension-1 private     0x9E |  LB         | PC1 + 0x80 |
6468 Dimension-2 official    LB   |  PC1 + 0x80 | PC2 + 0x80 |
6469 Dimension-2 private     0x9F |  LB         | PC1 + 0x80 | PC2 + 0x80
6470 @end example
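The following C sketch builds the string encoding of a single non-composite character from its leading byte and position codes, following the table above.  The function name and signature are hypothetical, not the XEmacs API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: encode one character in the internal string
   encoding.  LB is the charset's leading byte (0 for ASCII); PC2 is
   ignored for dimension-1 charsets.  OUT must have room for 4 bytes.
   Returns the number of bytes written.  Composite characters (LB 0x80)
   are not handled. */
static size_t
string_encode_char (unsigned int lb, unsigned int pc1, unsigned int pc2,
                    unsigned char *out)
{
  size_t n = 0;
  int dimension_2 = (lb >= 0x90 && lb <= 0x99) || lb >= 0xF0;

  assert (lb == 0 || (lb >= 0x81 && lb <= 0xFF));
  if (lb == 0)                          /* ASCII: position code as is */
    {
      out[n++] = (unsigned char) pc1;
      return n;
    }
  if (lb == 0x8F)                       /* Control-1 */
    {
      out[n++] = 0x8F;
      out[n++] = (unsigned char) (pc1 + 0xA0);
      return n;
    }
  if (lb >= 0xA0 && lb <= 0xEF)         /* dimension-1 private: prefix */
    out[n++] = 0x9E;
  else if (lb >= 0xF0)                  /* dimension-2 private: prefix */
    out[n++] = 0x9F;
  out[n++] = (unsigned char) lb;
  out[n++] = (unsigned char) (pc1 + 0x80);
  if (dimension_2)
    out[n++] = (unsigned char) (pc2 + 0x80);
  return n;
}
```

For example, a dimension-2 official character with leading byte 0x92 and position codes 0x24 and 0x22 (values chosen purely for illustration) is encoded as the three bytes 0x92 0xA4 0xA2.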
6471
6472   The basic characteristic of this encoding is that the first byte
6473 of all characters is in the range 0x00 - 0x9F, and the second and
following bytes of all characters are in the range 0xA0 - 0xFF.
6475 This means that it is impossible to get out of sync, or more
6476 specifically:
6477
6478 @enumerate
6479 @item
6480 Given any byte position, the beginning of the character it is
6481 within can be determined in constant time.
6482 @item
6483 Given any byte position at the beginning of a character, the
6484 beginning of the next character can be determined in constant
6485 time.
6486 @item
6487 Given any byte position at the beginning of a character, the
6488 beginning of the previous character can be determined in constant
6489 time.
6490 @item
6491 Textual searches can simply treat encoded strings as if they
6492 were encoded in a one-byte-per-character fashion rather than
6493 the actual multi-byte encoding.
6494 @end enumerate
6495
6496   None of the standard non-modal encodings meet all of these
6497 conditions.  For example, EUC satisfies only (2) and (3), while
6498 Shift-JIS and Big5 (not yet described) satisfy only (2). (All
6499 non-modal encodings must satisfy (2), in order to be unambiguous.)
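Properties (1) through (3) follow from the byte ranges alone.  In this C sketch (hypothetical function names; composite characters ignored, as above), finding the start of a character only has to step backward over bytes in the range 0xA0 - 0xFF, and the length of a character is determined by its first byte.  Since no character is longer than 4 bytes, each operation takes constant time.

```c
#include <assert.h>
#include <stddef.h>

/* A byte begins a character iff it is below 0xA0; the second and
   following bytes of every character are 0xA0 - 0xFF. */
#define BYTE_STARTS_CHAR(b) ((b) < 0xA0)

/* Hypothetical helper for (1): start of the character containing byte
   offset POS.  The loop runs at most 3 times. */
static size_t
char_start (const unsigned char *s, size_t pos)
{
  while (!BYTE_STARTS_CHAR (s[pos]))
    {
      assert (pos > 0);
      pos--;
    }
  return pos;
}

/* Hypothetical helper for (2): start of the next character; POS must be
   at a character start.  The length depends only on the first byte. */
static size_t
next_char (const unsigned char *s, size_t pos)
{
  unsigned char b = s[pos];
  assert (BYTE_STARTS_CHAR (b) && b != 0x80);  /* composite not handled */
  if (b < 0x80)  return pos + 1;   /* ASCII */
  if (b <= 0x8F) return pos + 2;   /* dimension-1 official, Control-1 */
  if (b <= 0x9D) return pos + 3;   /* dimension-2 official */
  if (b == 0x9E) return pos + 3;   /* prefix + dimension-1 private */
  return pos + 4;                  /* 0x9F: prefix + dimension-2 private */
}

/* Hypothetical helper for (3): start of the previous character; POS (> 0)
   must be at a character start. */
static size_t
prev_char (const unsigned char *s, size_t pos)
{
  assert (pos > 0);
  return char_start (s, pos - 1);
}
```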
6500
6501 @node Internal Character Encoding
6502 @subsection Internal Character Encoding
6503
6504   One 19-bit word represents a single character.  The word is
6505 separated into three fields:
6506
6507 @example
6508 Bit number:     18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
6509                 <------------> <------------------> <------------------>
6510 Field:                1                  2                    3
6511 @end example
6512
6513   Note that fields 2 and 3 hold 7 bits each, while field 1 holds 5 bits.
6514
6515 @example
6516 Character set           Field 1         Field 2         Field 3
6517 -------------           -------         -------         -------
6518 ASCII                      0               0              PC1
6519    range:                                                   (00 - 7F)
6520 Control-1                  0               1              PC1
6521    range:                                                   (00 - 1F)
6522 Dimension-1 official       0            LB - 0x80         PC1
6523    range:                                    (01 - 0D)      (20 - 7F)
6524 Dimension-1 private        0            LB - 0x80         PC1
6525    range:                                    (20 - 6F)      (20 - 7F)
6526 Dimension-2 official    LB - 0x8F         PC1             PC2
6527    range:                    (01 - 0A)       (20 - 7F)      (20 - 7F)
6528 Dimension-2 private     LB - 0xE1         PC1             PC2
6529    range:                    (0F - 1E)       (20 - 7F)      (20 - 7F)
6530 Composite                 0x1F             ?               ?
6531 @end example
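A C sketch of the packing described by the table above: it assembles the 19-bit code from a leading byte and position codes.  The function names are hypothetical; composite characters are not handled.

```c
#include <assert.h>

/* Hypothetical helper: assemble a 19-bit character code from field 1
   (bits 18 - 14), field 2 (bits 13 - 7), and field 3 (bits 6 - 0). */
static unsigned int
make_char_code (unsigned int f1, unsigned int f2, unsigned int f3)
{
  assert (f1 < 0x20 && f2 < 0x80 && f3 < 0x80);
  return (f1 << 14) | (f2 << 7) | f3;
}

/* Hypothetical helper: character code for leading byte LB (0 for ASCII)
   and position codes PC1, PC2 (PC2 ignored for dimension 1). */
static unsigned int
char_code (unsigned int lb, unsigned int pc1, unsigned int pc2)
{
  assert (lb == 0 || (lb >= 0x81 && lb <= 0xFF));  /* no composite */
  if (lb == 0)                                   /* ASCII */
    return make_char_code (0, 0, pc1);
  if (lb == 0x8F)                                /* Control-1 */
    return make_char_code (0, 1, pc1);
  if (lb <= 0x8D || (lb >= 0xA0 && lb <= 0xEF))  /* dimension 1 */
    return make_char_code (0, lb - 0x80, pc1);
  if (lb >= 0x90 && lb <= 0x99)                  /* dimension-2 official */
    return make_char_code (lb - 0x8F, pc1, pc2);
  assert (lb >= 0xF0);                           /* dimension-2 private */
  return make_char_code (lb - 0xE1, pc1, pc2);
}
```

Assuming Latin-1 has leading byte 0x81, @code{char_code (0x81, 0x41, 0)} gives 0xC1, the same value as byte 0xC1 in the binary encoding.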
6532
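  Note that character codes 0 - 255 are the same as the ``binary encoding''
described above.  This holds because Latin-1, a dimension-1 official
charset, has leading byte 0x81, so its field 2 value (1) is the same as
that of Control-1; the two position-code ranges (00 - 1F and 20 - 7F)
do not overlap, and together they fill codes 0x80 - 0xFF.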
6535
6536 @node CCL
6537 @section CCL
6538
6539 @example
6540 CCL PROGRAM SYNTAX:
6541         CCL_PROGRAM := (CCL_MAIN_BLOCK
6542                         [ CCL_EOF_BLOCK ])
6543
6544         CCL_MAIN_BLOCK := CCL_BLOCK
6545         CCL_EOF_BLOCK := CCL_BLOCK
6546
6547         CCL_BLOCK := STATEMENT | (STATEMENT [STATEMENT ...])
6548         STATEMENT :=
6549                 SET | IF | BRANCH | LOOP | REPEAT | BREAK
6550                 | READ | WRITE
6551
6552         SET := (REG = EXPRESSION) | (REG SELF_OP EXPRESSION)
6553                | INT-OR-CHAR
6554
6555         EXPRESSION := ARG | (EXPRESSION OP ARG)
6556
6557         IF := (if EXPRESSION CCL_BLOCK CCL_BLOCK)
6558         BRANCH := (branch EXPRESSION CCL_BLOCK [CCL_BLOCK ...])
6559         LOOP := (loop STATEMENT [STATEMENT ...])
6560         BREAK := (break)
6561         REPEAT := (repeat)
6562                 | (write-repeat [REG | INT-OR-CHAR | string])
6563                 | (write-read-repeat REG [INT-OR-CHAR | string | ARRAY]?)
6564         READ := (read REG) | (read REG REG)
6565                 | (read-if REG ARITH_OP ARG CCL_BLOCK CCL_BLOCK)
6566                 | (read-branch REG CCL_BLOCK [CCL_BLOCK ...])
6567         WRITE := (write REG) | (write REG REG)
6568                 | (write INT-OR-CHAR) | (write STRING) | STRING
6569                 | (write REG ARRAY)
6570         END := (end)
6571
6572         REG := r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7
6573         ARG := REG | INT-OR-CHAR
6574         OP :=   + | - | * | / | % | & | '|' | ^ | << | >> | <8 | >8 | //
6575                 | < | > | == | <= | >= | !=
6576         SELF_OP :=
6577                 += | -= | *= | /= | %= | &= | '|=' | ^= | <<= | >>=
6578         ARRAY := '[' INT-OR-CHAR ... ']'
6579         INT-OR-CHAR := INT | CHAR
6580
6581 MACHINE CODE:
6582
6583 The machine code consists of a vector of 32-bit words.
6584 The first such word specifies the start of the EOF section of the code;
6585 this is the code executed to handle any stuff that needs to be done
6586 (e.g. designating back to ASCII and left-to-right mode) after all
6587 other encoded/decoded data has been written out.  This is not used for
6588 charset CCL programs.
6589
REGISTER: 0..7  -- referred to by RRR or rrr
6591
6592 OPERATOR BIT FIELD (27-bit): XXXXXXXXXXXXXXX RRR TTTTT
6593         TTTTT (5-bit): operator type
6594         RRR (3-bit): register number
        XXXXXXXXXXXXXXX (15-bit):
6596                 CCCCCCCCCCCCCCC: constant or address
6597                 000000000000rrr: register number
6598
AAAAA:  00000 +
6600         00001 -
6601         00010 *
6602         00011 /
6603         00100 %
6604         00101 &
6605         00110 |
        00111 ^
6607
6608         01000 <<
6609         01001 >>
6610         01010 <8
6611         01011 >8
6612         01100 //
6613         01101 not used
6614         01110 not used
6615         01111 not used
6616
6617         10000 <
6618         10001 >
6619         10010 ==
6620         10011 <=
6621         10100 >=
6622         10101 !=
6623
6624 OPERATORS:      TTTTT RRR XX..
6625
6626 SetCS:          00000 RRR C...C         RRR = C...C
6627 SetCL:          00001 RRR .....         RRR = c...c
6628                 c.............c
6629 SetR:           00010 RRR ..rrr         RRR = rrr
6630 SetA:           00011 RRR ..rrr         RRR = array[rrr]
6631                 C.............C         size of array = C...C
6632                 c.............c         contents = c...c
6633
6634 Jump:           00100 000 c...c         jump to c...c
6635 JumpCond:       00101 RRR c...c         if (!RRR) jump to c...c
6636 WriteJump:      00110 RRR c...c         Write1 RRR, jump to c...c
6637 WriteReadJump:  00111 RRR c...c         Write1, Read1 RRR, jump to c...c
6638 WriteCJump:     01000 000 c...c         Write1 C...C, jump to c...c
6639                 C...C
6640 WriteCReadJump: 01001 RRR c...c         Write1 C...C, Read1 RRR,
6641                 C.............C         and jump to c...c
6642 WriteSJump:     01010 000 c...c         WriteS, jump to c...c
6643                 C.............C
6644                 S.............S
6645                 ...
6646 WriteSReadJump: 01011 RRR c...c         WriteS, Read1 RRR, jump to c...c
6647                 C.............C
6648                 S.............S
6649                 ...
6650 WriteAReadJump: 01100 RRR c...c         WriteA, Read1 RRR, jump to c...c
6651                 C.............C         size of array = C...C
6652                 c.............c         contents = c...c
6653                 ...
6654 Branch:         01101 RRR C...C         if (RRR >= 0 && RRR < C..)
6655                 c.............c         branch to (RRR+1)th address
6656 Read1:          01110 RRR ...           read 1-byte to RRR
6657 Read2:          01111 RRR ..rrr         read 2-byte to RRR and rrr
6658 ReadBranch:     10000 RRR C...C         Read1 and Branch
6659                 c.............c
6660                 ...
6661 Write1:         10001 RRR .....         write 1-byte RRR
6662 Write2:         10010 RRR ..rrr         write 2-byte RRR and rrr
WriteC:         10011 000 .....         write 1-char C...C
6664                 C.............C
6665 WriteS:         10100 000 .....         write C..-byte of string
6666                 C.............C
6667                 S.............S
6668                 ...
6669 WriteA:         10101 RRR .....         write array[RRR]
6670                 C.............C         size of array = C...C
6671                 c.............c         contents = c...c
6672                 ...
6673 End:            10110 000 .....         terminate the execution
6674
6675 SetSelfCS:      10111 RRR C...C         RRR AAAAA= C...C
6676                 ..........AAAAA
6677 SetSelfCL:      11000 RRR .....         RRR AAAAA= c...c
6678                 c.............c
6679                 ..........AAAAA
6680 SetSelfR:       11001 RRR ..Rrr         RRR AAAAA= rrr
6681                 ..........AAAAA
6682 SetExprCL:      11010 RRR ..Rrr         RRR = rrr AAAAA c...c
6683                 c.............c
6684                 ..........AAAAA
6685 SetExprR:       11011 RRR ..rrr         RRR = rrr AAAAA Rrr
6686                 ............Rrr
6687                 ..........AAAAA
6688 JumpCondC:      11100 RRR c...c         if !(RRR AAAAA C..) jump to c...c
6689                 C.............C
6690                 ..........AAAAA
6691 JumpCondR:      11101 RRR c...c         if !(RRR AAAAA rrr) jump to c...c
6692                 ............rrr
6693                 ..........AAAAA
6694 ReadJumpCondC:  11110 RRR c...c         Read1 and JumpCondC
6695                 C.............C
6696                 ..........AAAAA
6697 ReadJumpCondR:  11111 RRR c...c         Read1 and JumpCondR
6698                 ............rrr
6699                 ..........AAAAA
6700 @end example
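To make the operator layout concrete, the following C sketch splits a word into its fields.  It assumes the diagram reads with the least significant bits on the right (TTTTT in bits 0 - 4, RRR in bits 5 - 7, and the X field above them) and takes the 15-bit width of the X field at face value; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoded form of one operator word,
   laid out as XXXXXXXXXXXXXXX RRR TTTTT. */
struct ccl_op {
  unsigned int type;   /* TTTTT: operator type, bits 0 - 4 */
  unsigned int reg;    /* RRR: register number, bits 5 - 7 */
  unsigned int arg;    /* X field: constant, address, or register */
};

/* Hypothetical decoder: split WORD into its three fields. */
static struct ccl_op
ccl_decode (uint32_t word)
{
  struct ccl_op op;
  op.type = word & 0x1F;
  op.reg  = (word >> 5) & 0x7;
  op.arg  = (word >> 8) & 0x7FFF;
  return op;
}

/* When the X field names a second register (000000000000rrr),
   only its low 3 bits are significant. */
static unsigned int
ccl_arg_register (struct ccl_op op)
{
  return op.arg & 0x7;
}

/* Hypothetical inverse of ccl_decode, handy for building test words. */
static uint32_t
ccl_encode (unsigned int type, unsigned int reg, unsigned int arg)
{
  assert (type < 0x20 && reg < 8 && arg < 0x8000);
  return ((uint32_t) arg << 8) | ((uint32_t) reg << 5) | type;
}
```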
6701
6702 @node The Lisp Reader and Compiler, Lstreams, MULE Character Sets and Encodings, Top
6703 @chapter The Lisp Reader and Compiler
6704
6705 Not yet documented.
6706
6707 @node Lstreams, Consoles; Devices; Frames; Windows, The Lisp Reader and Compiler, Top
6708 @chapter Lstreams
6709
6710   An @dfn{lstream} is an internal Lisp object that provides a generic
6711 buffering stream implementation.  Conceptually, you send data to the
6712 stream or read data from the stream, not caring what's on the other end
6713 of the stream.  The other end could be another stream, a file
6714 descriptor, a stdio stream, a fixed block of memory, a reallocating
6715 block of memory, etc.  The main purpose of the stream is to provide a
6716 standard interface and to do buffering.  Macros are defined to read or
6717 write characters, so the calling functions do not have to worry about
6718 blocking data together in order to achieve efficiency.
6719
6720 @menu
6721 * Creating an Lstream::         Creating an lstream object.
6722 * Lstream Types::               Different sorts of things that are streamed.
6723 * Lstream Functions::           Functions for working with lstreams.
6724 * Lstream Methods::             Creating new lstream types.
6725 @end menu
6726
6727 @node Creating an Lstream
6728 @section Creating an Lstream
6729
6730 Lstreams come in different types, depending on what is being interfaced
6731 to.  Although the primitive for creating new lstreams is
6732 @code{Lstream_new()}, generally you do not call this directly.  Instead,
6733 you call some type-specific creation function, which creates the lstream
6734 and initializes it as appropriate for the particular type.
6735
6736 All lstream creation functions take a @var{mode} argument, specifying
6737 what mode the lstream should be opened as.  This controls whether the
lstream is for input or output, and optionally whether data should be
6739 blocked up in units of MULE characters.  Note that some types of
6740 lstreams can only be opened for input; others only for output; and
6741 others can be opened either way.  #### Richard Mlynarik thinks that
6742 there should be a strict separation between input and output streams,
6743 and he's probably right.
6744
6745   @var{mode} is a string, one of
6746
6747 @table @code
6748 @item "r"
6749   Open for reading.
6750 @item "w"
6751   Open for writing.
6752 @item "rc"
6753   Open for reading, but ``read'' never returns partial MULE characters.
6754 @item "wc"
6755   Open for writing, but never writes partial MULE characters.
6756 @end table
6757
6758 @node Lstream Types
6759 @section Lstream Types
6760
6761 @table @asis
6762 @item stdio
6763
6764 @item filedesc
6765
6766 @item lisp-string
6767
6768 @item fixed-buffer
6769
6770 @item resizing-buffer
6771
6772 @item dynarr
6773
6774 @item lisp-buffer
6775
6776 @item print
6777
6778 @item decoding
6779
6780 @item encoding
6781 @end table
6782
6783 @node Lstream Functions
6784 @section Lstream Functions
6785
6786 @deftypefun {Lstream *} Lstream_new (Lstream_implementation *@var{imp}, CONST char *@var{mode})
6787 Allocate and return a new Lstream.  This function is not really meant to
6788 be called directly; rather, each stream type should provide its own
6789 stream creation function, which creates the stream and does any other
6790 necessary creation stuff (e.g. opening a file).
6791 @end deftypefun
6792
6793 @deftypefun void Lstream_set_buffering (Lstream *@var{lstr}, Lstream_buffering @var{buffering}, int @var{buffering_size})
6794 Change the buffering of a stream.  See @file{lstream.h}.  By default the
6795 buffering is @code{STREAM_BLOCK_BUFFERED}.
6796 @end deftypefun
6797
6798 @deftypefun int Lstream_flush (Lstream *@var{lstr})
6799 Flush out any pending unwritten data in the stream.  Clear any buffered
6800 input data.  Returns 0 on success, -1 on error.
6801 @end deftypefun
6802
6803 @deftypefn Macro int Lstream_putc (Lstream *@var{stream}, int @var{c})
6804 Write out one byte to the stream.  This is a macro and so it is very
6805 efficient.  The @var{c} argument is only evaluated once but the @var{stream}
6806 argument is evaluated more than once.  Returns 0 on success, -1 on
6807 error.
6808 @end deftypefn
6809
6810 @deftypefn Macro int Lstream_getc (Lstream *@var{stream})
6811 Read one byte from the stream.  This is a macro and so it is very
6812 efficient.  The @var{stream} argument is evaluated more than once.  Return
6813 value is -1 for EOF or error.
6814 @end deftypefn
6815
6816 @deftypefn Macro void Lstream_ungetc (Lstream *@var{stream}, int @var{c})
6817 Push one byte back onto the input queue.  This will be the next byte
6818 read from the stream.  Any number of bytes can be pushed back and will
6819 be read in the reverse order they were pushed back -- most recent
6820 first. (This is necessary for consistency -- if there are a number of
6821 bytes that have been unread and I read and unread a byte, it needs to be
6822 the first to be read again.) This is a macro and so it is very
6823 efficient.  The @var{c} argument is only evaluated once but the @var{stream}
6824 argument is evaluated more than once.
6825 @end deftypefn
6826
6827 @deftypefun int Lstream_fputc (Lstream *@var{stream}, int @var{c})
6828 @deftypefunx int Lstream_fgetc (Lstream *@var{stream})
6829 @deftypefunx void Lstream_fungetc (Lstream *@var{stream}, int @var{c})
6830 Function equivalents of the above macros.
6831 @end deftypefun
6832
6833 @deftypefun int Lstream_read (Lstream *@var{stream}, void *@var{data}, int @var{size})
Read up to @var{size} bytes from the stream into @var{data}.  Return
the number of bytes read.  0 means EOF; -1 means an error occurred and
no bytes were read.
6837 @end deftypefun
6838
6839 @deftypefun int Lstream_write (Lstream *@var{stream}, void *@var{data}, int @var{size})
6840 Write @var{size} bytes of @var{data} to the stream.  Return the number
6841 of bytes written.  -1 means an error occurred and no bytes were written.
6842 @end deftypefun
6843
6844 @deftypefun void Lstream_unread (Lstream *@var{stream}, void *@var{data}, int @var{size})
6845 Push back @var{size} bytes of @var{data} onto the input queue.  The next
6846 call to @code{Lstream_read()} with the same size will read the same
6847 bytes back.  Note that this will be the case even if there is other
6848 pending unread data.
6849 @end deftypefun
6850
6851 @deftypefun int Lstream_close (Lstream *@var{stream})
6852 Close the stream.  All data will be flushed out.
6853 @end deftypefun
6854
6855 @deftypefun void Lstream_reopen (Lstream *@var{stream})
6856 Reopen a closed stream.  This enables I/O on it again.  This is not
6857 meant to be called except from a wrapper routine that reinitializes
6858 variables and such -- the close routine may well have freed some
6859 necessary storage structures, for example.
6860 @end deftypefun
6861
6862 @deftypefun void Lstream_rewind (Lstream *@var{stream})
6863 Rewind the stream to the beginning.
6864 @end deftypefun
6865
6866 @node Lstream Methods
6867 @section Lstream Methods
6868
6869 @deftypefn {Lstream Method} int reader (Lstream *@var{stream}, unsigned char *@var{data}, int @var{size})
6870 Read some data from the stream's end and store it into @var{data}, which
6871 can hold @var{size} bytes.  Return the number of bytes read.  A return
6872 value of 0 means no bytes can be read at this time.  This may be because
6873 of an EOF, or because there is a granularity greater than one byte that
6874 the stream imposes on the returned data, and @var{size} is less than
6875 this granularity. (This will happen frequently for streams that need to
6876 return whole characters, because @code{Lstream_read()} calls the reader
6877 function repeatedly until it has the number of bytes it wants or until 0
6878 is returned.)  The lstream functions do not treat a 0 return as EOF or
6879 do anything special; however, the calling function will interpret any 0
6880 it gets back as EOF.  This will normally not happen unless the caller
6881 calls @code{Lstream_read()} with a very small size.
6882
6883 This function can be @code{NULL} if the stream is output-only.
6884 @end deftypefn
6885
6886 @deftypefn {Lstream Method} int writer (Lstream *@var{stream}, CONST unsigned char *@var{data}, int @var{size})
6887 Send some data to the stream's end.  Data to be sent is in @var{data}
6888 and is @var{size} bytes.  Return the number of bytes sent.  This
function can send and return fewer bytes than are passed in; in that
6890 case, the function will just be called again until there is no data left
6891 or 0 is returned.  A return value of 0 means that no more data can be
6892 currently stored, but there is no error; the data will be squirreled
6893 away until the writer can accept data. (This is useful, e.g., if you're
6894 dealing with a non-blocking file descriptor and are getting
6895 @code{EWOULDBLOCK} errors.)  This function can be @code{NULL} if the
6896 stream is input-only.
6897 @end deftypefn
6898
6899 @deftypefn {Lstream Method} int rewinder (Lstream *@var{stream})
6900 Rewind the stream.  If this is @code{NULL}, the stream is not seekable.
6901 @end deftypefn
6902
6903 @deftypefn {Lstream Method} int seekable_p (Lstream *@var{stream})
6904 Indicate whether this stream is seekable -- i.e. it can be rewound.
6905 This method is ignored if the stream does not have a rewind method.  If
6906 this method is not present, the result is determined by whether a rewind
6907 method is present.
6908 @end deftypefn
6909
6910 @deftypefn {Lstream Method} int flusher (Lstream *@var{stream})
6911 Perform any additional operations necessary to flush the data in this
6912 stream.
6913 @end deftypefn
6914
6915 @deftypefn {Lstream Method} int pseudo_closer (Lstream *@var{stream})
6916 @end deftypefn
6917
6918 @deftypefn {Lstream Method} int closer (Lstream *@var{stream})
6919 Perform any additional operations necessary to close this stream down.
6920 May be @code{NULL}.  This function is called when @code{Lstream_close()}
6921 is called or when the stream is garbage-collected.  When this function
6922 is called, all pending data in the stream will already have been written
6923 out.
6924 @end deftypefn
6925
6926 @deftypefn {Lstream Method} Lisp_Object marker (Lisp_Object @var{lstream}, void (*@var{markfun}) (Lisp_Object))
6927 Mark this object for garbage collection.  Same semantics as a standard
6928 @code{Lisp_Object} marker.  This function can be @code{NULL}.
6929 @end deftypefn

@node Consoles; Devices; Frames; Windows, The Redisplay Mechanism, Lstreams, Top
@chapter Consoles; Devices; Frames; Windows

@menu
* Introduction to Consoles; Devices; Frames; Windows::
* Point::
* Window Hierarchy::
* The Window Object::
@end menu

@node Introduction to Consoles; Devices; Frames; Windows
@section Introduction to Consoles; Devices; Frames; Windows

A window-system window that you see on the screen is called a
@dfn{frame} in Emacs terminology.  Each frame is subdivided into one or
more non-overlapping panes, called (confusingly) @dfn{windows}.  Each
window displays the text of a buffer (see above on Buffers).  Note
that buffers and windows are independent entities: two or more windows
can display the same buffer (potentially at different locations),
and a buffer can be displayed in no window at all.

  A single display screen that contains one or more frames is called
a @dfn{display}; XEmacs represents a display internally as a
@dfn{device}.  Under most circumstances, there is only one display.
However, more than one display can exist, for example if you have
a @dfn{multi-headed} console, i.e. one with a single keyboard but
multiple displays.  (Typically in such a situation, the various
displays act like one large display, in that the mouse is only
in one of them at a time, and moving the mouse off of one moves
it into another.)  In some cases, the different displays will
have different characteristics, e.g. one color and one monochrome.

  XEmacs can display frames on multiple displays.  It can even deal
simultaneously with frames on multiple keyboards (called @dfn{consoles} in
XEmacs terminology).  Here is one case where this might be useful: you
are using XEmacs on your workstation at work, and leave it running.
Then you go home and dial in on a TTY line, and you can use the
already-running XEmacs process to display another frame on your local
TTY.

  Thus, there is a hierarchy console -> device -> frame -> window.
There is a separate Lisp object type for each of these four concepts.
Furthermore, there is logically a @dfn{selected console},
@dfn{selected device}, @dfn{selected frame}, and @dfn{selected window}.
Each of these selected objects is distinguished in various ways, such as
being the default object for functions that act on objects of that type.
Note that every containing object remembers the ``selected'' object
among the objects that it contains: e.g. not only is there a selected
window, but every frame remembers the last window in it that was
selected, and changing the selected frame causes the remembered window
within it to become the selected window.  Similar relationships apply
for consoles to devices and devices to frames.
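
  The ``remembered selection'' behavior described above can be modelled with a few pointers.  The following C sketch is only an illustration under simplifying assumptions: the structures contain nothing but the remembered child, and @code{selected_window}, @code{select_window_in_frame}, and @code{select_frame_on_device} are hypothetical names rather than XEmacs functions.

@verbatim
```c
#include <assert.h>

/* Hypothetical, minimal containers: each one remembers only which of
   its children was selected most recently. */
struct window  { int id; };
struct frame   { struct window *selected_window; };
struct device  { struct frame *selected_frame; };
struct console { struct device *selected_device; };

/* The one globally selected console. */
static struct console *selected_console;

/* The selected window is found by following the remembered
   selections down from the selected console. */
static struct window *
selected_window (void)
{
  return selected_console->selected_device->selected_frame->selected_window;
}

/* Selecting a window records it in its frame. */
static void
select_window_in_frame (struct frame *f, struct window *w)
{
  f->selected_window = w;
}

/* Selecting a frame records it in its device; the frame's remembered
   window thereby becomes the selected window. */
static void
select_frame_on_device (struct device *d, struct frame *f)
{
  d->selected_frame = f;
}
```
@end verbatim

  Because the selected window is derived by following the chain from the selected console, changing the selected frame brings that frame's remembered window into effect with no extra bookkeeping.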

@node Point
@section Point

  Recall that every buffer has a current insertion position, called
@dfn{point}.  Now, two or more windows may be displaying the same buffer,
and the text cursor in the two windows (i.e. @code{point}) can be in
two different places.  You may ask, how can that be, since each
buffer has only one value of @code{point}?  The answer is that each window
also has a value of @code{point} that is squirreled away in it.  There
is only one selected window, and the buffer's own value of @code{point}
corresponds to that window.  When the selected window is changed
from one window to another displaying the same buffer, the old
value of @code{point} is stored into the old window's ``point'' and the
value of @code{point} from the new window is retrieved and made the
value of @code{point} in the buffer.  This means that the value of
@code{point} stashed in the selected window is potentially out of date,
and if you want to retrieve the correct value of @code{point} for a
window, you must special-case the selected window and retrieve the
buffer's point instead.  This is related to why @code{save-window-excursion}
does not save the selected window's value of @code{point}.
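
  The exchange of point between a buffer and its windows can be sketched in C as follows.  This is a simplified model, assuming one integer point per buffer and one stashed @code{pointm} per window (XEmacs itself uses markers); @code{select_window} and @code{window_point} here are hypothetical helpers, not the XEmacs primitives with similar names.

@verbatim
```c
#include <assert.h>

/* Hypothetical simplification: a buffer has a single point, and each
   window stashes its own copy in pointm (XEmacs uses markers). */
struct buffer { long point; };
struct window { struct buffer *buffer; long pointm; };

static struct window *selected;   /* the selected window, or NULL */

/* Switch the selected window, exchanging point as described above. */
static void
select_window (struct window *w)
{
  struct window *old = selected;
  if (old == w)
    return;
  if (old)
    old->pointm = old->buffer->point;  /* save point into the old window */
  w->buffer->point = w->pointm;        /* restore the new window's point */
  selected = w;
}

/* Reliable point of any window: the selected window's pointm may be
   stale, so use its buffer's point instead. */
static long
window_point (const struct window *w)
{
  return w == selected ? w->buffer->point : w->pointm;
}
```
@end verbatim

  The special case in @code{window_point} is exactly the one the text warns about: while a window is selected, only the buffer holds its up-to-date point.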

@node Window Hierarchy
@section Window Hierarchy
@cindex window hierarchy
@cindex hierarchy of windows

  If a frame contains multiple windows (panes), they are always created
by splitting an existing window along the horizontal or vertical axis.
Terminology is a bit confusing here: to @dfn{split a window
horizontally} means to create two side-by-side windows, i.e. to make a
@emph{vertical} cut in a window.  Likewise, to @dfn{split a window
vertically} means to create two windows, one above the other, by making
a @emph{horizontal} cut.

  If you split a window and then split again along the same axis, you
will end up with a number of panes all arranged along the same axis.
The precise way in which the splits were made should not be important,
and this is reflected internally.  Internally, all windows are arranged
in a tree, consisting of two types of windows, @dfn{combination} windows
(which have children, and are covered completely by those children) and
@dfn{leaf} windows, which have no children and are visible.  Every
combination window has two or more children, all arranged along the same
axis.  There are (logically) two subtypes of combination windows,
depending on whether their children are horizontally or vertically
arrayed.  There is always one root window, which is either a leaf window
(if the frame contains only one window) or a combination window (if the
frame contains more than one window).  In the latter case, the root
window will have two or more children, either horizontally or vertically
arrayed, and each of those children will be either a leaf window or
another combination window.

  Here are some rules:

@enumerate
@item
Horizontal combination windows can never have children that are
horizontal combination windows; same for vertical.

@item
Only leaf windows can be split (obviously) and this splitting does one
of two things: (a) replaces the leaf window in the tree with a new
combination window, which gets the old leaf and one newly created leaf
as its two children, or (b) keeps the leaf window as one of the two
leaves and creates the other leaf as its new sibling.  Rule (1)
dictates which of these two outcomes happens.

@item
Every combination window must have at least two children.

@item
Leaf windows can never become combination windows.  They can be deleted,
however.  If this results in a violation of (3), the parent combination
window also gets deleted.

@item
All functions that accept windows must be prepared to accept combination
windows, and do something sane (e.g. signal an error if so).
Combination windows @emph{do} escape to the Lisp level.

@item
All windows have three fields governing their contents:
these are @dfn{hchild} (a list of horizontally-arrayed children),
@dfn{vchild} (a list of vertically-arrayed children), and @dfn{buffer}
(the buffer contained in a leaf window).  Exactly one of
these will be non-@code{nil}.  Remember that @dfn{horizontally-arrayed}
means ``side-by-side'' and @dfn{vertically-arrayed} means
``one above the other''.

@item
Leaf windows also have markers in their @code{start} (the
first buffer position displayed in the window) and @code{pointm}
(the window's stashed value of @code{point}; see above) fields,
while combination windows have @code{nil} in these fields.

@item
The list of children for a window is threaded through the
@code{next} and @code{prev} fields of each child window.

@item
@strong{Deleted windows can be undeleted}.  This happens as a result of
restoring a window configuration, and is unlike frames, devices, and
consoles, which, once deleted, can never be restored.  Deleting a window
does nothing except set a special @code{dead} bit to 1 and clear out the
@code{next}, @code{prev}, @code{hchild}, and @code{vchild} fields, for
GC purposes.

@item
Most frames actually have two top-level windows: one for the
minibuffer and one (the @dfn{root}) for everything else.  The modeline
(if present) separates these two.  The @code{next} field of the root
points to the minibuffer, and the @code{prev} field of the minibuffer
points to the root.  The other @code{next} and @code{prev} fields are
@code{nil}, and the frame points to both of these windows.
Minibuffer-less frames have no minibuffer window, and the @code{next}
and @code{prev} of the root window are @code{nil}.  Minibuffer-only
frames have no root window, and the @code{next} of the minibuffer window
is @code{nil} but the @code{prev} points to itself.  (#### This is an
artifact that should be fixed.)
@end enumerate
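
  Several of the rules above can be checked mechanically.  The following C sketch uses a hypothetical, cut-down window record containing only the fields named in the rules, and a hypothetical checker @code{check_window_tree} that verifies rules (1), (3), and (6) plus the @code{next}/@code{prev}/@code{parent} threading for a subtree, returning the number of leaf windows or -1 on a violation.  It does not model the minibuffer linkage of rule (10), dead windows, or the real XEmacs types.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced window record.  Children are threaded
   through next/prev, starting at hchild or vchild. */
struct window
{
  struct window *hchild;   /* first horizontally arrayed child */
  struct window *vchild;   /* first vertically arrayed child */
  void *buffer;            /* non-NULL only for leaf windows */
  struct window *next, *prev, *parent;
};

/* Number of leaf windows under W, or -1 if a rule is violated. */
static int
check_window_tree (const struct window *w)
{
  const struct window *child, *first;
  int nset = (w->hchild != NULL) + (w->vchild != NULL) + (w->buffer != NULL);
  int nchildren = 0, leaves = 0;

  if (nset != 1)                     /* rule 6: exactly one is non-nil */
    return -1;
  if (w->buffer)
    return 1;                        /* a leaf window */

  first = w->hchild ? w->hchild : w->vchild;
  if (first->prev != NULL)
    return -1;
  for (child = first; child; child = child->next)
    {
      int n;
      if (child->parent != w)
        return -1;
      /* rule 1: no horizontal inside horizontal, same for vertical */
      if ((w->hchild && child->hchild) || (w->vchild && child->vchild))
        return -1;
      if (child->next && child->next->prev != child)
        return -1;
      n = check_window_tree (child);
      if (n < 0)
        return -1;
      leaves += n;
      nchildren++;
    }
  return nchildren >= 2 ? leaves : -1;   /* rule 3 */
}
```
@end verbatim

  Counting leaves this way also illustrates that combination windows only shape the layout: every visible pane is a leaf.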

@node The Window Object
@section The Window Object

  Windows have the following accessible fields:

@table @code
@item frame
The frame that this window is on.

@item mini_p
Non-@code{nil} if this window is a minibuffer window.

@item buffer
The buffer that the window is displaying.  This may change often during
the life of the window.

@item dedicated
Non-@code{nil} if this window is dedicated to its buffer.

@item pointm
@cindex window point internals
This is the value of point in the current buffer when this window is
selected; when it is not selected, it retains its previous value.

@item start
The position in the buffer that is the first character to be displayed
in the window.

@item force_start
If this flag is non-@code{nil}, it says that the window has been
scrolled explicitly by the Lisp program.  This affects what the next
redisplay does if point is off the screen: instead of scrolling the
window to show the text around point, it moves point to a location that
is on the screen.

@item last_modified
The @code{modified} field of the window's buffer, as of the last time
a redisplay completed in this window.

@item last_point
The buffer's value of point, as of the last time
a redisplay completed in this window.

@item left
This is the left-hand edge of the window, measured in columns.  (The
leftmost column on the screen is @w{column 0}.)

@item top
This is the top edge of the window, measured in lines.  (The top line on
the screen is @w{line 0}.)

@item height
The height of the window, measured in lines.

@item width
The width of the window, measured in columns.

@item next
This is the window that is the next in the chain of siblings.  It is
@code{nil} in a window that is the rightmost or bottommost of a group of
siblings.

@item prev
This is the window that is the previous in the chain of siblings.  It is
@code{nil} in a window that is the leftmost or topmost of a group of
siblings.

@item parent
Internally, XEmacs arranges windows in a tree; each group of siblings has
a parent window whose area includes all the siblings.  This field points
to a window's parent.

Parent windows do not display buffers, and play little role in display
except to shape their child windows.  Emacs Lisp programs usually have
no access to the parent windows; they operate on the windows at the
leaves of the tree, which actually display buffers.

@item hscroll
This is the number of columns that the display in the window is scrolled
horizontally to the left.  Normally, this is 0.

@item use_time
This is the last time that the window was selected.  The function
@code{get-lru-window} uses this field.

@item display_table
The window's display table, or @code{nil} if none is specified for it.

@item update_mode_line
Non-@code{nil} means this window's mode line needs to be updated.

@item base_line_number
The line number of a certain position in the buffer, or @code{nil}.
This is used for displaying the line number of point in the mode line.

@item base_line_pos
The position in the buffer for which the line number is known, or
@code{nil} meaning none is known.

@item region_showing
If the region (or part of it) is highlighted in this window, this field
holds the mark position that made one end of that region.  Otherwise,
this field is @code{nil}.
@end table
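
  As an illustration of how the geometry fields and the sibling links fit together, here is a C sketch that finds the leaf window containing a given column and line by descending the tree.  The record type and @code{window_at} are hypothetical; in particular, the @code{is_leaf} flag stands in for checking whether @code{buffer} is non-@code{nil}, and real XEmacs windows carry far more state than shown.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical window record: geometry plus tree links. */
struct window
{
  int left, top, width, height;     /* in columns and lines */
  struct window *hchild, *vchild;   /* first child, if a combination */
  struct window *next;              /* next sibling */
  int is_leaf;
};

/* Return the leaf window under W containing (COL, LINE), or NULL. */
static struct window *
window_at (struct window *w, int col, int line)
{
  struct window *child;
  if (col < w->left || col >= w->left + w->width
      || line < w->top || line >= w->top + w->height)
    return NULL;
  if (w->is_leaf)
    return w;
  for (child = w->hchild ? w->hchild : w->vchild; child; child = child->next)
    {
      struct window *found = window_at (child, col, line);
      if (found)
        return found;
    }
  return NULL;
}
```
@end verbatim

  Since siblings partition their parent's area, at most one child can contain the position, so the first match found is the answer.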

@node The Redisplay Mechanism, Extents, Consoles; Devices; Frames; Windows, Top
@chapter The Redisplay Mechanism

  The redisplay mechanism is one of the most complicated sections of
XEmacs, especially from a conceptual standpoint.  This is doubly so
because, unlike the basic aspects of the Lisp interpreter, redisplay
lacks a well-developed body of computer science theory describing how
to handle it efficiently.

  When working with the redisplay mechanism, remember the Golden Rules
of Redisplay:

@enumerate
@item
It Is Better To Be Correct Than Fast.
@item
Thou Shalt Not Run Elisp From Within Redisplay.
@item
It Is Better To Be Fast Than Not To Be.
@end enumerate

@menu
* Critical Redisplay Sections::
* Line Start Cache::
@end menu

@node Critical Redisplay Sections
@section Critical Redisplay Sections
@cindex critical redisplay sections

Within a critical section of the redisplay code, we are defenseless and
assume that the following cannot happen:

@enumerate
@item
garbage collection
@item
Lisp code evaluation
@item
frame size changes
@end enumerate

We ensure (3) by calling @code{hold_frame_size_changes()}, which
puts any pending frame size changes on hold until after the end of
the critical section.  (1) follows automatically if (2) is met.
#### Unfortunately, there are some places where Lisp code can be
called within this section.  We need to remove them.

If @code{Fsignal()} is called during this critical section, we
will @code{abort()}.

If garbage collection is called during this critical section,
we simply return.  #### We should abort instead.

#### If a frame-size change does occur we should probably
actually be preempting redisplay.
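
  The policy described above (defer frame size changes, refuse to garbage-collect, treat a signal as fatal) can be sketched as follows.  Everything here is hypothetical: the flags and functions only imitate the roles of @code{hold_frame_size_changes()}, the garbage collector, and @code{Fsignal()}, and do not reflect their real implementations.

@verbatim
```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical state; names are illustrative only. */
static int in_critical_section;
static int frame_size_changes_held;
static int frame_size_change_pending;
static int frame_size_changes_applied;

/* A frame size change requested while changes are held is deferred. */
static void
request_frame_size_change (void)
{
  if (frame_size_changes_held)
    frame_size_change_pending = 1;
  else
    frame_size_changes_applied++;
}

static void
enter_critical_section (void)
{
  in_critical_section = 1;
  frame_size_changes_held = 1;   /* the role of hold_frame_size_changes() */
}

/* Leaving the section releases the hold and applies any deferred change. */
static void
exit_critical_section (void)
{
  in_critical_section = 0;
  frame_size_changes_held = 0;
  if (frame_size_change_pending)
    {
      frame_size_change_pending = 0;
      frame_size_changes_applied++;
    }
}

/* Garbage collection simply returns inside the section. */
static int
maybe_garbage_collect (void)
{
  if (in_critical_section)
    return 0;
  /* ... a real collector would run here ... */
  return 1;
}

/* Signalling an error inside the section is fatal. */
static void
signal_error (void)
{
  if (in_critical_section)
    abort ();
  /* ... a real implementation would perform a non-local exit ... */
}
```
@end verbatim

  The sketch's @code{maybe_garbage_collect} returns quietly, matching the current behavior described above rather than the suggested @code{abort()}.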

@node Line Start Cache
@section Line Start Cache
@cindex line start cache

  The traditional scrolling code in Emacs breaks in a variable height
world.  It depends on the key assumption that the number of lines that
can be displayed at any given time is fixed.  This led to a complete
separation of the scrolling code from the redisplay code.  In order to
fully support variable height lines, the scrolling code must actually be
tightly integrated with redisplay.  Only redisplay can determine how
many lines will be displayed on a screen for any given starting point.
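
  Why the starting point matters is easy to see in a small example.  The C sketch below assumes that the pixel height of every display line is already known (the array @code{heights} and the function @code{lines_that_fit} are hypothetical); it shows that the number of lines fitting in a window depends on where the display starts, so no fixed line count can be assumed.

@verbatim
```c
#include <assert.h>

/* HEIGHTS[i] is the (hypothetical, precomputed) pixel height of
   display line i.  Return how many whole lines, starting at line
   FIRST, fit in a window WINDOW_HEIGHT pixels tall. */
static int
lines_that_fit (const int *heights, int nlines, int first, int window_height)
{
  int used = 0, count = 0, i;
  assert (first >= 0 && first <= nlines);
  for (i = first; i < nlines; i++)
    {
      if (used + heights[i] > window_height)
        break;                  /* this line would be cut off */
      used += heights[i];
      count++;
    }
  return count;
}
```
@end verbatim

  With a window 50 pixels tall and line heights 10, 10, 30, 10, 10, 10, 10, starting at the first line fits three lines while starting at the fourth fits four; with fixed-height lines both answers would be the same.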

  What is ideally wanted is a complete list of the starting buffer
position for every possible display line of a buffer along with the
height of that display line.  Maintaining such a full list would be very
expensive.  We settle for having it include information for all areas
which we happen to generate anyhow (i.e. the region currently being
displayed) and for those areas we need to work with.

  In order to ensure that the cache accurately represents what redisplay
would actually show, it is necessary to invalidate it in many
situations.  If the buffer changes, the starting positions may no longer
be correct.  If a face or an extent has changed then the line heights
may have altered.  These events happen frequently enough that the cache
can end up being constantly invalidated.  With this potentially constant
invalidation, when is the cache ever useful?

  Even if the cache is invalidated before every single usage, it is
necessary.  Scrolling often requires knowledge about display lines which
are actually above or below the visible region.  The cache provides a
convenient light-weight method of storing this information for multiple
display regions.  This knowledge is necessary for the scrolling code to
always obey the First Golden Rule of Redisplay.

  If the cache already contains all of the information that the scrolling
routines need, so that they do not have to generate it themselves, then
we are able to obey the Third Golden Rule of Redisplay.  The first thing
we do to help out the cache is to always add the displayed region.  This
region had to be generated anyway, so the cache ends up getting the
information basically for free.  In those cases where a user is simply
scrolling around viewing a buffer there is a high probability that this
is sufficient to always provide the needed information.  The second
thing we can do is be smart about invalidating the cache.

  TODO -- Be smart about invalidating the cache.  Potential places:

@itemize @bullet
@item
Insertions at end-of-line which don't cause line-wraps do not alter the
starting positions of any display lines.  These types of buffer
modifications should not invalidate the cache.  This is actually a large
optimization for redisplay speed as well.
@item
Buffer modifications frequently only affect the display of lines at and
below where they occur.  In these situations we should only invalidate
the part of the cache starting at where the modification occurs.
@end itemize
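
  Here is a minimal C sketch of a line start cache that records the starting buffer position and height of display lines, looks up the cached line that starts at or before a position, and implements the second invalidation idea from the list above (discard only the part of the cache from the modified line onward).  The fixed capacity, the type names, and the @code{lsc_} functions are hypothetical; as the TODO says, this partial invalidation is a proposal rather than a description of the existing code.

@verbatim
```c
#include <assert.h>

#define LSC_MAX 128

/* Hypothetical cache entry: where a display line starts, and its height. */
struct line_start
{
  long start;
  int height;
};

struct line_start_cache
{
  struct line_start entries[LSC_MAX];  /* sorted by start */
  int n;
};

/* Append a display line; lines are added in buffer order, e.g. the
   region that redisplay has just generated. */
static void
lsc_add (struct line_start_cache *c, long start, int height)
{
  assert (c->n < LSC_MAX);
  assert (c->n == 0 || c->entries[c->n - 1].start < start);
  c->entries[c->n].start = start;
  c->entries[c->n].height = height;
  c->n++;
}

/* Index of the last cached line starting at or before POS, or -1. */
static int
lsc_find (const struct line_start_cache *c, long pos)
{
  int lo = 0, hi = c->n - 1, found = -1;
  while (lo <= hi)                      /* binary search */
    {
      int mid = lo + (hi - lo) / 2;
      if (c->entries[mid].start <= pos)
        {
          found = mid;
          lo = mid + 1;
        }
      else
        hi = mid - 1;
    }
  return found;
}

/* A modification at POS invalidates its line and everything after it. */
static void
lsc_invalidate_from (struct line_start_cache *c, long pos)
{
  int i = lsc_find (c, pos);
  c->n = i < 0 ? 0 : i;
}
```
@end verbatim

  Because entries are kept sorted by start position, lookup is a binary search, and invalidating from a position is just a truncation of the array.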

  In case you're wondering, the Second Golden Rule of Redisplay is not
applicable.

@node Extents, Faces and Glyphs, The Redisplay Mechanism, Top
@chapter Extents

@menu
* Introduction to Extents::     Extents are ranges over text, with properties.
* Extent Ordering::             How extents are ordered internally.
* Format of the Extent Info::   The extent information in a buffer or string.
* Zero-Length Extents::         A weird special case.
* Mathematics of Extent Ordering::      A rigorous foundation.
* Extent Fragments::            Cached information useful for redisplay.
@end menu

@node Introduction to Extents
@section Introduction to Extents

  Extents are regions over a buffer, with a start and an end position
denoting the region of the buffer included in the extent.  In
addition, either end can be closed or open, meaning that the endpoint
is or is not logically included in the extent.  Insertion of a character
at a closed endpoint causes the character to go inside the extent;
insertion at an open endpoint causes the character to go outside.
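
  The open/closed rule can be expressed as a small adjustment function.  The following C sketch assumes a simple extent record with integer positions and one flag per endpoint; the type and @code{adjust_for_insertion} are hypothetical and do not reflect the XEmacs representation, which uses memory indices as described below.

@verbatim
```c
#include <assert.h>

/* Hypothetical extent record: integer positions plus one flag per
   endpoint (nonzero means the endpoint is open). */
struct extent
{
  long start, end;
  int start_open, end_open;
};

/* Adjust E for the insertion of LEN characters at position POS.
   Endpoints after POS always move.  An endpoint exactly at POS moves
   only if the new text must land before it: for the start endpoint
   that is the open case, for the end endpoint the closed case. */
static void
adjust_for_insertion (struct extent *e, long pos, long len)
{
  assert (len >= 0);
  if (e->start > pos || (e->start == pos && e->start_open))
    e->start += len;
  if (e->end > pos || (e->end == pos && !e->end_open))
    e->end += len;
}
```
@end verbatim

  Applied to a zero-length extent, the same two conditions give the behaviors listed under Zero-Length Extents: closed-closed extents expand, closed-open ones stay put, and open-closed ones move past the inserted text.  An open-open zero-length extent would end up with its start after its end, which is one way to see why such extents are not allowed.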

  Extent endpoints are stored using memory indices (see @file{insdel.c}),
to minimize the amount of adjusting that needs to be done when
characters are inserted or deleted.

  (Formerly, extent endpoints at the gap could be either before or
after the gap, depending on the open/closedness of the endpoint.
The intent of this was to make it so that insertions would
automatically go inside or out of extents as necessary with no
further work needing to be done.  It didn't work out that way,
however, and just ended up complexifying and buggifying all the
rest of the code.)

@node Extent Ordering
@section Extent Ordering

  Extents are compared using memory indices.  There are two orderings
for extents and both orders are kept current at all times.  The normal
or @dfn{display} order is as follows:

@example
Extent A is ``less than'' extent B, that is, earlier in the display order,
if:    A-start < B-start,
or if: A-start = B-start, and A-end > B-end
@end example

  So if two extents begin at the same position, the larger of them is the
earlier one in the display order (@code{EXTENT_LESS} is true).

  For the e-order, the analogous definition holds:

@example
Extent A is ``less than'' extent B in e-order, that is, earlier in the e-order,
if:    A-end < B-end,
or if: A-end = B-end, and A-start > B-start
@end example

  So if two extents end at the same position, the smaller of them is the
earlier one in the e-order (@code{EXTENT_E_LESS} is true).

  The display order and the e-order are complementary orders: any
theorem about the display order also applies to the e-order if you swap
all occurrences of ``display order'' and ``e-order'', ``less than'' and
``greater than'', and ``extent start'' and ``extent end''.
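
  The two orderings translate directly into comparison predicates.  In this C sketch, @code{extent_less} and @code{extent_e_less} are hypothetical functions written in the spirit of the @code{EXTENT_LESS} and @code{EXTENT_E_LESS} tests mentioned above; they compare plain integer positions and are not the actual XEmacs macros.

@verbatim
```c
#include <assert.h>

struct extent { long start, end; };   /* hypothetical */

/* Display order: earlier start first; for equal starts, the larger
   extent (later end) comes first. */
static int
extent_less (const struct extent *a, const struct extent *b)
{
  return a->start < b->start
         || (a->start == b->start && a->end > b->end);
}

/* E-order: earlier end first; for equal ends, the smaller extent
   (later start) comes first. */
static int
extent_e_less (const struct extent *a, const struct extent *b)
{
  return a->end < b->end
         || (a->end == b->end && a->start > b->start);
}
```
@end verbatim

  For example, of two extents starting at 5, the one ending at 20 precedes the one ending at 10 in the display order, while of two extents ending at 20, the one starting at 15 precedes the one starting at 5 in the e-order.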

@node Format of the Extent Info
@section Format of the Extent Info

  An extent-info structure consists of a list of the buffer or string's
extents and a @dfn{stack of extents} that lists all of the extents over
a particular position.  The stack-of-extents info is used for
optimization purposes -- it basically caches some info that might
be expensive to compute.  Certain otherwise hard computations are easy
given the stack of extents over a particular position, and if the
stack of extents over a nearby position is known (because it was
calculated at some prior point in time), it's easy to move the stack
of extents to the proper position.

  Given that the stack of extents is an optimization, and given that
it requires memory, a string's stack of extents is wiped out each
time a garbage collection occurs.  Therefore, any time you retrieve
the stack of extents, it might not be there.  If you need it to
be there, use the @code{_force} version.

  Similarly, a string may or may not have an @code{extent_info} structure.
(Generally it won't if there haven't been any extents added to the
string.)  So use the @code{_force} version if you need the
@code{extent_info} structure to be there.

  A list of extents is maintained as a double gap array: one gap array
is ordered by start index (the @dfn{display order}) and the other is
ordered by end index (the @dfn{e-order}).  Note that positions in an
extent list should logically be conceived of as referring @emph{to} a
particular extent (as is the norm in programs) rather than sitting
between two extents.  Note also that callers of these functions should
not be aware of the fact that the extent list is implemented as an
array, except for the fact that positions are integers (this should be
generalized to handle integers and linked lists equally well).
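
  A gap array keeps its elements in one block of memory with a movable hole, so insertions near the previous edit point are cheap.  The following minimal C sketch shows the idea for a fixed-capacity array of integers; it is an illustration under that simplifying assumption, and @code{struct gap_array} and the @code{ga_} functions are hypothetical rather than the XEmacs implementation.  An extent list in this style would use two such arrays, one kept in display order and one in e-order.

@verbatim
```c
#include <assert.h>
#include <string.h>

#define GA_CAPACITY 64

/* Hypothetical fixed-capacity gap array of ints.  Elements live in
   data[0, gap_start) and data[gap_start + gap_len, GA_CAPACITY). */
struct gap_array
{
  int data[GA_CAPACITY];
  int gap_start;
  int gap_len;
};

static void
ga_init (struct gap_array *ga)
{
  ga->gap_start = 0;
  ga->gap_len = GA_CAPACITY;
}

static int
ga_count (const struct gap_array *ga)
{
  return GA_CAPACITY - ga->gap_len;
}

/* Move the gap so that it starts at logical position POS. */
static void
ga_move_gap (struct gap_array *ga, int pos)
{
  if (pos < ga->gap_start)
    /* elements [pos, gap_start) move to just after the gap */
    memmove (ga->data + pos + ga->gap_len, ga->data + pos,
             (ga->gap_start - pos) * sizeof (int));
  else if (pos > ga->gap_start)
    /* elements just after the gap move down into it */
    memmove (ga->data + ga->gap_start,
             ga->data + ga->gap_start + ga->gap_len,
             (pos - ga->gap_start) * sizeof (int));
  ga->gap_start = pos;
}

/* Insert VALUE so that it ends up at logical position POS. */
static void
ga_insert (struct gap_array *ga, int pos, int value)
{
  assert (pos >= 0 && pos <= ga_count (ga) && ga->gap_len > 0);
  ga_move_gap (ga, pos);
  ga->data[ga->gap_start++] = value;
  ga->gap_len--;
}

/* Element at logical position POS, skipping over the gap. */
static int
ga_get (const struct gap_array *ga, int pos)
{
  assert (pos >= 0 && pos < ga_count (ga));
  return ga->data[pos < ga->gap_start ? pos : pos + ga->gap_len];
}
```
@end verbatim

  Moving the gap costs time proportional to the distance moved, which is why a gap array performs well when successive operations are close together.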

@node Zero-Length Extents
@section Zero-Length Extents

  Extents can be zero-length, and will end up that way if their endpoints
are explicitly set that way or if their detachable property is nil
and all the text in the extent is deleted. (The exception is open-open
zero-length extents, which are barred from existing because there is
no sensible way to define their properties.  Deletion of the text in
an open-open extent causes it to be converted into a closed-open
extent.)  Zero-length extents are primarily used to represent
annotations, and behave as follows:

@enumerate
@item
Insertion at the position of a zero-length extent expands the extent
if both endpoints are closed; goes after the extent if it is closed-open;
and goes before the extent if it is open-closed.

@item
Deletion of a character on a side of a zero-length extent whose
corresponding endpoint is closed causes the extent to be detached if
it is detachable; if the extent is not detachable or the corresponding
endpoint is open, the extent remains in the buffer, moving as necessary.
@end enumerate

  Note that closed-open, non-detachable zero-length extents behave
exactly like markers and that open-closed, non-detachable zero-length
extents behave like the ``point-type'' marker in Mule.
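
  The deletion rule (item 2 above) can be written out for the case of a single deleted character.  This C sketch takes one reading of ``side'': the character just before the extent faces its start endpoint and the character just after it faces its end endpoint.  The record type and @code{zextent_delete_char} are hypothetical, and the sketch models nothing but one zero-length extent.

@verbatim
```c
#include <assert.h>

/* Hypothetical zero-length extent sitting at buffer position POS.
   Flags are 0 or 1. */
struct zextent
{
  long pos;
  int start_open, end_open;
  int detachable;
  int detached;
};

/* Delete the single character occupying [CH_POS, CH_POS + 1).  The
   character just before the extent faces its start endpoint; the one
   just after it faces its end endpoint. */
static void
zextent_delete_char (struct zextent *e, long ch_pos)
{
  int before = (ch_pos + 1 == e->pos);
  int after = (ch_pos == e->pos);

  assert (!(e->start_open && e->end_open));  /* open-open is barred */
  if (e->detached)
    return;
  if (e->detachable
      && ((before && !e->start_open) || (after && !e->end_open)))
    {
      e->detached = 1;
      return;
    }
  if (ch_pos < e->pos)
    e->pos--;          /* text before the extent went away */
}
```
@end verbatim

  A closed-open, non-detachable extent in this sketch moves back when text before it is deleted and otherwise stays put, which is the marker-like behavior noted above.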

@node Mathematics of Extent Ordering
@section Mathematics of Extent Ordering
@cindex extent mathematics
@cindex mathematics of extents
@cindex extent ordering

@cindex display order of extents
@cindex extents, display order
  The extents in a buffer are ordered by ``display order'' because that
is the order in which the redisplay mechanism needs to process them.
The e-order is an auxiliary ordering used to facilitate operations
over extents.  The operations that can be performed on the ordered
list of extents in a buffer are

@enumerate
@item
Locate where an extent would go if inserted into the list.
@item
Insert an extent into the list.
@item
Remove an extent from the list.
@item
Map over all the extents that overlap a range.
@end enumerate

  (4) requires being able to determine the first and last extents
that overlap a range.

  NOTE: @dfn{overlap} is used as follows:

@itemize @bullet
@item
two ranges overlap if they have at least one point in common.
Whether the endpoints are open or closed makes a difference here.
@item
a point overlaps a range if the point is contained within the
range; this is equivalent to treating a point @math{P} as the range
@math{[P, P]}.
@item
In the case of an @emph{extent} overlapping a point or range, the extent
is normally treated as having closed endpoints.  This applies
consistently in the discussion of stacks of extents and such below.
Note that this definition of overlap is not necessarily consistent with
the extents that @code{map-extents} maps over, since @code{map-extents}
sometimes pays attention to whether the endpoints of an extent are open
or closed.  But for our purposes, it greatly simplifies things to treat
all extents as having closed endpoints.
@end itemize

First, define @math{>}, @math{<}, @math{<=}, etc. as applied to extents
to mean comparison according to the display order.  Comparison between
an extent @math{E} and an index @math{I} means comparison between
@math{E} and the range @math{[I, I]}.

Also define @math{e>}, @math{e<}, @math{e<=}, etc. to mean comparison
according to the e-order.

For any range @math{R}, define @math{R(0)} to be the starting index of
the range and @math{R(1)} to be the ending index of the range.

For any extent @math{E}, define @math{E(next)} to be the extent directly
following @math{E}, and @math{E(prev)} to be the extent directly
preceding @math{E}.  Assume @math{E(next)} and @math{E(prev)} can be
determined from @math{E} in constant time.  (This is because we store
the extent list as a doubly linked list.)

Similarly, define @math{E(e-next)} and @math{E(e-prev)} to be the
extents directly following and preceding @math{E} in the e-order.

Now:

Let @math{R} be a range.
Let @math{F} be the first extent overlapping @math{R}.
Let @math{L} be the last extent overlapping @math{R}.

Theorem 1: @math{R(1)} lies between @math{L} and @math{L(next)},
i.e. @math{L <= R(1) < L(next)}.

  This follows easily from the definition of display order.  The
basic reason that this theorem applies is that the display order
sorts by increasing starting index.

  Therefore, we can determine @math{L} just by looking at where we would
insert @math{R(1)} into the list, and if we know @math{F} and are moving
forward over extents, we can easily determine when we've hit @math{L} by
comparing the extent we're at to @math{R(1)}.
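
  The forward scan described above can be sketched in C.  For simplicity this hypothetical @code{count_overlapping} scans from the beginning of an array already sorted in display order, rather than starting at @math{F} as the real code would, and treats extent endpoints as closed, as in the definition of overlap above.  It stops as soon as the current extent lies beyond @math{R(1)}.

@verbatim
```c
#include <assert.h>

struct extent { long start, end; };   /* hypothetical, closed endpoints */

/* Count the extents in EXTENTS (sorted in display order) that overlap
   the range [R0, R1].  Because the display order sorts by increasing
   start, the scan can stop at the first extent that starts after R1. */
static int
count_overlapping (const struct extent *extents, int n, long r0, long r1)
{
  int i, count = 0;
  assert (r0 <= r1);
  for (i = 0; i < n; i++)
    {
      if (extents[i].start > r1)
        break;                  /* this and all later extents lie after R */
      if (extents[i].end >= r0)
        count++;                /* start <= R1 and end >= R0: overlap */
    }
  return count;
}
```
@end verbatim

  Extents that start at or before @math{R(1)} but end before @math{R(0)} are skipped rather than ending the scan, since a later extent in the display order may still overlap the range.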

Theorem 2: @math{F(e-prev) e< [1, R(0)] e<= F}.

  This is the analog of Theorem 1, and applies because the e-order
sorts by increasing ending index.

  Therefore, @math{F} can be found in the same amount of time as
operation (1), i.e. the time that it takes to locate where an extent
would go if inserted into the e-order list.

  If the lists were stored as balanced binary trees, then operation (1)
would take logarithmic time, which is usually quite fast.  However,
currently they're stored as simple doubly-linked lists, and instead we
do some caching to try to speed things up.

  Define a @dfn{stack of extents} (or @dfn{SOE}) as the set of extents
(ordered in the display order) that overlap an index @math{I}, together
with the SOE's @dfn{previous} extent, which is an extent that precedes
@math{I} in the e-order.  (Hopefully there will not be very many extents
between @math{I} and the previous extent.)

Now:

Let @math{I} be an index, let @math{S} be the stack of extents on
@math{I}, let @math{F} be the first extent in @math{S}, and let @math{P}
be @math{S}'s previous extent.

Theorem 3: The first extent in @math{S} is the first extent that overlaps
any range @math{[I, J]}.

Proof: Any extent that overlaps @math{[I, J]} but does not include
@math{I} must have a start index @math{> I}, and thus be greater than
any extent in @math{S}.

Therefore, finding the first extent that overlaps a range @math{R} is
the same as finding the first extent that overlaps @math{R(0)}.

Theorem 4: Let @math{I2} be an index such that @math{I2 > I}, and let
@math{F2} be the first extent that overlaps @math{I2}.  Then, either
@math{F2} is in @math{S} or @math{F2} is greater than any extent in
@math{S}.

Proof: If @math{F2} does not include @math{I} then its start index is
greater than @math{I} and thus it is greater than any extent in
@math{S}, including @math{F}.  Otherwise, @math{F2} includes @math{I}
and thus is in @math{S}, and thus @math{F2 >= F}.
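
  The following C sketch computes a stack of extents by brute force and can be used to check Theorem 3 on small examples.  The functions @code{stack_of_extents} and @code{first_overlapping} are hypothetical; the sketch omits the SOE's previous extent and the incremental updating that makes the real structure worthwhile, and it assumes the array is sorted in display order.

@verbatim
```c
#include <assert.h>

struct extent { long start, end; };   /* hypothetical, closed endpoints */

/* Stack of extents at index I: indices (in display order) of the
   extents that overlap [I, I].  The SOE's "previous extent" is not
   modelled.  Returns the number of indices written to OUT. */
static int
stack_of_extents (const struct extent *ext, int n, long i, int *out)
{
  int k, count = 0;
  /* The array is sorted by start, so stop at the first start past I. */
  for (k = 0; k < n && ext[k].start <= i; k++)
    if (ext[k].end >= i)
      out[count++] = k;
  return count;
}

/* Brute force: index of the first extent in display order that
   overlaps [R0, R1], or -1 if there is none. */
static int
first_overlapping (const struct extent *ext, int n, long r0, long r1)
{
  int k;
  assert (r0 <= r1);
  for (k = 0; k < n; k++)
    if (ext[k].start <= r1 && ext[k].end >= r0)
      return k;
  return -1;
}
```
@end verbatim

  For instance, with extents [0,4], [1,8], [3,3], and [6,7], the stack at index 3 holds the first three, and the first of them, [0,4], is also the first extent overlapping [3,7], as Theorem 3 predicts.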

@node Extent Fragments
@section Extent Fragments
@cindex extent fragment

  Imagine that the buffer is divided up into contiguous, non-overlapping
@dfn{runs} of text such that no extent starts or ends within a run
(extents that abut the run don't count).

  An extent fragment is a structure that holds data about the run that
contains a particular buffer position (if the buffer position is at the
junction of two runs, the run after the position is used): the
beginning and end of the run, a list of all of the extents in that run,
the @dfn{merged face} that results from merging all of the faces
corresponding to those extents, the begin and end glyphs at the
beginning of the run, etc.  This is the information that redisplay needs
in order to display this run.

  Extent fragments have to be very quick to update to a new buffer
position when moving linearly through the buffer.  They rely on the
stack-of-extents code, which does the heavy-duty algorithmic work of
determining which extents overlie a particular position.
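
  Finding the run that contains a position amounts to locating the nearest extent endpoints on either side of it.  This C sketch does so by a linear scan and also counts the extents that cover the whole run; @code{run_at} and its parameters are hypothetical, and real extent fragments obtain this information far more cheaply from the stack of extents, as described above.

@verbatim
```c
#include <assert.h>

struct extent { long start, end; };   /* hypothetical, closed endpoints */

/* Find the run containing POS in a buffer spanning [BEG, END).  The run
   is [*RUN_START, *RUN_END), bounded by the nearest extent endpoints (or
   buffer limits) at or before POS and strictly after POS, so a position
   at a junction gets the run after it.  Returns the number of extents
   that cover the whole run. */
static int
run_at (const struct extent *ext, int n, long beg, long end, long pos,
        long *run_start, long *run_end)
{
  long rs = beg, re = end;
  int k, j, covering = 0;

  assert (beg <= pos && pos < end);
  for (k = 0; k < n; k++)
    {
      long pts[2];
      pts[0] = ext[k].start;
      pts[1] = ext[k].end;
      for (j = 0; j < 2; j++)
        {
          if (pts[j] <= pos && pts[j] > rs)
            rs = pts[j];            /* closest boundary at or before POS */
          if (pts[j] > pos && pts[j] < re)
            re = pts[j];            /* closest boundary after POS */
        }
    }
  /* No extent starts or ends strictly inside [rs, re), so an extent
     belongs to the run exactly when it spans the whole run. */
  for (k = 0; k < n; k++)
    if (ext[k].start <= rs && ext[k].end >= re)
      covering++;
  *run_start = rs;
  *run_end = re;
  return covering;
}
```
@end verbatim

  Choosing the boundary at or before the position as the run start is what makes a position at a junction belong to the run after it.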

@node Faces and Glyphs, Specifiers, Extents, Top
@chapter Faces and Glyphs

Not yet documented.

@node Specifiers, Menus, Faces and Glyphs, Top
@chapter Specifiers

Not yet documented.

@node Menus, Subprocesses, Specifiers, Top
@chapter Menus

  A menu is set by setting the value of the variable
@code{current-menubar} (which may be buffer-local) and then calling
@code{set-menubar-dirty-flag} to signal a change.  This will cause the
menu to be redrawn at the next redisplay.  The format of the data in
@code{current-menubar} is described in @file{menubar.c}.

  Internally the data in @code{current-menubar} is parsed into a tree of
@code{widget_value} structures (defined in @file{lwlib.h}); this is
accomplished by the recursive function
@code{menu_item_descriptor_to_widget_value()}, called by
@code{compute_menubar_data()}.  Such a tree is deallocated using
@code{free_widget_value()}.

  @code{update_screen_menubars()} is one of the external entry points.
This checks to see, for each screen, if that screen's menubar needs to
be updated.  This is the case if

@enumerate
@item
@code{set-menubar-dirty-flag} was called since the last redisplay.  (This
function sets the C variable @code{menubar_has_changed}.)
@item
The buffer displayed in the screen has changed.
@item
The screen has no menubar currently displayed.
@end enumerate

  @code{set_screen_menubar()} is called for each such screen.  This
function calls @code{compute_menubar_data()} to create the tree of
@code{widget_value} structures, then calls @code{lw_create_widget()},
@code{lw_modify_all_widgets()}, and/or @code{lw_destroy_all_widgets()}
to create, update, or destroy the X Toolkit widget associated with the
menubar.
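
  The text does not say how @code{set_screen_menubar()} chooses among
@code{lw_create_widget()}, @code{lw_modify_all_widgets()}, and
@code{lw_destroy_all_widgets()}.  The C sketch below encodes one
plausible policy, purely as an assumption: create a widget when none
exists, modify the existing one otherwise, and destroy it when no menubar
data is left.  The enumeration and @code{choose_menubar_action()} are
hypothetical.

```c
#include <stdbool.h>

/* Which lwlib call a set_screen_menubar()-style routine might make. */
enum menubar_action
{
  MENUBAR_NO_ACTION,
  MENUBAR_CREATE,    /* would correspond to lw_create_widget () */
  MENUBAR_MODIFY,    /* would correspond to lw_modify_all_widgets () */
  MENUBAR_DESTROY    /* would correspond to lw_destroy_all_widgets () */
};

/* Assumed policy: build a widget when none exists, update an existing
   widget in place, and tear it down once the menubar is empty. */
static enum menubar_action
choose_menubar_action (bool widget_exists, bool have_menu_data)
{
  if (!have_menu_data)
    return widget_exists ? MENUBAR_DESTROY : MENUBAR_NO_ACTION;
  return widget_exists ? MENUBAR_MODIFY : MENUBAR_CREATE;
}
```

Modifying an existing widget in place, rather than destroying and
recreating it, would avoid rebuilding the whole toolkit widget on every
change; whether XEmacs makes exactly this trade-off is not stated here.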

  @code{update_psheets()}, the other external entry point, actually
changes the menus being displayed.  It uses the widgets updated by
@code{update_screen_menubars()} and calls various X functions to ensure
that the menus are displayed properly.

  The menubar widget is set up so that @code{pre_activate_callback()} is
called when the menu is first selected (i.e.@: when the mouse button goes
down), and @code{menubar_selection_callback()} is called when an item is
selected.  @code{pre_activate_callback()} runs the functions in
@code{activate-menubar-hook}, which can change the menubar (this is
described in @file{menubar.c}).  If the menubar is changed,
@code{set_screen_menubars()} is called.
@code{menubar_selection_callback()} enqueues a menu event containing a
function to call (either @code{eval} or @code{call-interactively}) and
its argument, which is the callback function or form given in the menu's
description.
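
  The enqueue step performed by @code{menubar_selection_callback()} can
be pictured with the following C sketch.  Everything in it is
hypothetical: the callback is modelled as either a command name or a
printed form, the queue is a small fixed-size FIFO rather than the real
XEmacs event queue, and the rule that commands go through
@code{call-interactively} while forms go to @code{eval} is an assumption
about how the choice is made.

```c
#include <stddef.h>

/* Hypothetical: a menu callback is either a command or a form. */
enum callback_kind { CALLBACK_COMMAND, CALLBACK_FORM };

struct menu_callback
{
  enum callback_kind kind;
  const char *text;             /* command name or printed form */
};

/* The function the event loop will apply to the callback. */
enum menu_function { MENU_CALL_INTERACTIVELY, MENU_EVAL };

/* Simplified menu event: a function to call and its argument. */
struct menu_event
{
  enum menu_function function;
  const struct menu_callback *arg;
};

/* A small fixed-size FIFO standing in for the real event queue. */
#define MENU_QUEUE_SIZE 8
static struct menu_event menu_queue[MENU_QUEUE_SIZE];
static size_t menu_queue_head, menu_queue_len;

/* Enqueue an event for CB; returns 0 if the queue is full. */
static int
enqueue_menu_event (const struct menu_callback *cb)
{
  struct menu_event *ev;

  if (menu_queue_len == MENU_QUEUE_SIZE)
    return 0;
  ev = &menu_queue[(menu_queue_head + menu_queue_len) % MENU_QUEUE_SIZE];
  /* Assumed rule: commands go through call-interactively,
     forms are passed to eval. */
  ev->function = (cb->kind == CALLBACK_COMMAND
                  ? MENU_CALL_INTERACTIVELY : MENU_EVAL);
  ev->arg = cb;
  menu_queue_len++;
  return 1;
}

/* Dequeue the oldest event into *OUT; returns 0 if the queue is empty. */
static int
dequeue_menu_event (struct menu_event *out)
{
  if (menu_queue_len == 0)
    return 0;
  *out = menu_queue[menu_queue_head];
  menu_queue_head = (menu_queue_head + 1) % MENU_QUEUE_SIZE;
  menu_queue_len--;
  return 1;
}
```

One likely reason for queuing the selection instead of calling the
callback directly is that the command then runs from the normal command
loop rather than from inside an X Toolkit callback.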

@node Subprocesses, Interface to X Windows, Menus, Top
@chapter Subprocesses

  The fields of a process are:

@table @code
@item name
A string, the name of the process.

@item command
A list containing the command arguments that were used to start this
process.

@item filter
A function used to accept output from the process instead of a buffer,
or @code{nil}.

@item sentinel
A function called whenever the status of the process changes (for
example, when it exits or is stopped or killed by a signal), or
@code{nil}.

@item buffer
The associated buffer of the process.

@item pid
An integer, the Unix process @sc{id}.

@item childp
A flag, non-@code{nil} if this is really a child process.
It is @code{nil} for a network connection.

@item mark
A marker indicating the position of the end of the last output from this
process inserted into the buffer.  This is often but not always the end
of the buffer.

@item kill_without_query
If this is non-@code{nil}, killing XEmacs while this process is still
running does not ask for confirmation about killing the process.

@item raw_status_low
@itemx raw_status_high
These two fields record 16 bits each of the process status returned by
the @code{wait} system call.

@item status
The process status, as @code{process-status} should return it.

@item tick
@itemx update_tick
If these two fields are not equal, a change in the status of the process
needs to be reported, either by running the sentinel or by inserting a
message in the process buffer.

@item pty_flag
Non-@code{nil} if communication with the subprocess uses a @sc{pty};
@code{nil} if it uses a pipe.

@item infd
The file descriptor for input from the process.

@item outfd
The file descriptor for output to the process.

@item subtty
The file descriptor for the terminal that the subprocess is using.  (On
some systems, there is no need to record this, so the value is
@code{-1}.)

@item tty_name
The name of the terminal that the subprocess is using,
or @code{nil} if it is using pipes.
@end table
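
  The table can be summarized as a C structure.  This is a sketch under
stated assumptions: the field names follow the table, but the C types
(including the placeholder type @code{sketch_lisp_object}) are guesses,
and the helpers @code{store_raw_status()}, @code{fetch_raw_status()}, and
@code{status_change_pending()} are hypothetical illustrations of the
@code{raw_status_low}/@code{raw_status_high} split and the
@code{tick}/@code{update_tick} comparison.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder for a Lisp object; the real code uses Lisp_Object. */
typedef void *sketch_lisp_object;

/* Hypothetical C view of the process fields.  Names follow the table;
   the types are assumptions made for this sketch. */
struct sketch_process
{
  sketch_lisp_object name, command, filter, sentinel, buffer;
  sketch_lisp_object mark, status, tty_name;
  int pid;
  bool childp, kill_without_query, pty_flag;
  uint16_t raw_status_low, raw_status_high;  /* 16 bits each */
  int tick, update_tick;
  int infd, outfd, subtty;                   /* subtty may be -1 */
};

/* Record a wait() status as two 16-bit halves. */
static void
store_raw_status (struct sketch_process *p, uint32_t status)
{
  p->raw_status_low = (uint16_t) (status & 0xFFFFu);
  p->raw_status_high = (uint16_t) (status >> 16);
}

/* Reassemble the full status from the two halves. */
static uint32_t
fetch_raw_status (const struct sketch_process *p)
{
  return ((uint32_t) p->raw_status_high << 16) | p->raw_status_low;
}

/* A status change is still unreported while the two ticks differ. */
static bool
status_change_pending (const struct sketch_process *p)
{
  return p->tick != p->update_tick;
}
```

A caller would pass the reassembled value to the standard
@code{WIFEXITED}, @code{WIFSIGNALED}, and related macros from
@file{<sys/wait.h>} to decode it.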

@node Interface to X Windows, Index, Subprocesses, Top
@chapter Interface to X Windows

Not yet documented.

@include index.texi

@c Print the tables of contents
@summarycontents
@contents
@c That's all

@bye