   1 \input texinfo  @c -*-texinfo-*-
   2 @c %**start of header
   3 @setfilename ../../info/internals.info
   4 @settitle XEmacs Internals Manual
   5 @c %**end of header
   6
   7 @ifinfo
   8 @dircategory XEmacs Editor
   9 @direntry
  10 * Internals: (internals).       XEmacs Internals Manual.
  11 @end direntry
  12
  13 Copyright @copyright{} 1992 - 1996 Ben Wing.
  14 Copyright @copyright{} 1996, 1997 Sun Microsystems.
  15 Copyright @copyright{} 1994 - 1998 Free Software Foundation.
  16 Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois.
  17
  18
  19 Permission is granted to make and distribute verbatim copies of this
  20 manual provided the copyright notice and this permission notice are
  21 preserved on all copies.
  22
  23 @ignore
  24 Permission is granted to process this file through TeX and print the
  25 results, provided the printed document carries copying permission notice
  26 identical to this one except for the removal of this paragraph (this
  27 paragraph not being relevant to the printed manual).
  28
  29 @end ignore
  30 Permission is granted to copy and distribute modified versions of this
  31 manual under the conditions for verbatim copying, provided that the
  32 entire resulting derived work is distributed under the terms of a
  33 permission notice identical to this one.
  34
  35 Permission is granted to copy and distribute translations of this manual
  36 into another language, under the above conditions for modified versions,
  37 except that this permission notice may be stated in a translation
  38 approved by the Foundation.
  39
  40 Permission is granted to copy and distribute modified versions of this
  41 manual under the conditions for verbatim copying, provided also that the
  42 section entitled ``GNU General Public License'' is included exactly as
  43 in the original, and provided that the entire resulting derived work is
  44 distributed under the terms of a permission notice identical to this
  45 one.
  46
  47 Permission is granted to copy and distribute translations of this manual
  48 into another language, under the above conditions for modified versions,
  49 except that the section entitled ``GNU General Public License'' may be
  50 included in a translation approved by the Free Software Foundation
  51 instead of in the original English.
  52 @end ifinfo
  53
  54 @c Combine indices.
  55 @synindex cp fn
  56 @syncodeindex vr fn
  57 @syncodeindex ky fn
  58 @syncodeindex pg fn
  59 @syncodeindex tp fn
  60
  61 @setchapternewpage odd
  62 @finalout
  63
  64 @titlepage
  65 @title XEmacs Internals Manual
  66 @subtitle Version 1.4, March 2001
  67
  68 @author Ben Wing
  69 @author Martin Buchholz
  70 @author Hrvoje Niksic
  71 @author Matthias Neubauer
  72 @author Olivier Galibert
  73 @page
  74 @vskip 0pt plus 1fill
  75
  76 @noindent
  77 Copyright @copyright{} 1992 - 1996, 2001 Ben Wing. @*
  78 Copyright @copyright{} 1996, 1997 Sun Microsystems, Inc. @*
  79 Copyright @copyright{} 1994 - 1998 Free Software Foundation. @*
  80 Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois.
  81
  82 @sp 2
  83 Version 1.4 @*
  84 March 2001.@*
  85
  86 Permission is granted to make and distribute verbatim copies of this
  87 manual provided the copyright notice and this permission notice are
  88 preserved on all copies.
  89
  90 Permission is granted to copy and distribute modified versions of this
  91 manual under the conditions for verbatim copying, provided also that the
  92 section entitled ``GNU General Public License'' is included
  93 exactly as in the original, and provided that the entire resulting
  94 derived work is distributed under the terms of a permission notice
  95 identical to this one.
  96
  97 Permission is granted to copy and distribute translations of this manual
  98 into another language, under the above conditions for modified versions,
  99 except that the section entitled ``GNU General Public License'' may be
 100 included in a translation approved by the Free Software Foundation
 101 instead of in the original English.
 102 @end titlepage
 103 @page
 104
 105 @node Top, A History of Emacs, (dir), (dir)
 106
 107 @ifinfo
 108 This Info file contains v1.4 of the XEmacs Internals Manual, March 2001.
 109 @end ifinfo
 110
 111 @menu
 112 * A History of Emacs::          Times, dates, important events.
 113 * XEmacs From the Outside::     A broad conceptual overview.
 114 * The Lisp Language::           An overview.
 115 * XEmacs From the Perspective of Building::
 116 * XEmacs From the Inside::
 117 * The XEmacs Object System (Abstractly Speaking)::
 118 * How Lisp Objects Are Represented in C::
 119 * Rules When Writing New C Code::
 120 * Regression Testing XEmacs::
 121 * A Summary of the Various XEmacs Modules::
 122 * Allocation of Objects in XEmacs Lisp::
 123 * Dumping::
 124 * Events and the Event Loop::
 125 * Evaluation; Stack Frames; Bindings::
 126 * Symbols and Variables::
 127 * Buffers and Textual Representation::
 128 * MULE Character Sets and Encodings::
 129 * The Lisp Reader and Compiler::
 130 * Lstreams::
 131 * Consoles; Devices; Frames; Windows::
 132 * The Redisplay Mechanism::
 133 * Extents::
 134 * Faces::
 135 * Glyphs::
 136 * Specifiers::
 137 * Menus::
 138 * Subprocesses::
 139 * Interface to the X Window System::
 140 * Index::
 141
 142 @detailmenu
 143
 144 --- The Detailed Node Listing ---
 145
 146 A History of Emacs
 147
 148 * Through Version 18::          Unification prevails.
 149 * Lucid Emacs::                 One version 19 Emacs.
 150 * GNU Emacs 19::                The other version 19 Emacs.
 151 * GNU Emacs 20::                The other version 20 Emacs.
 152 * XEmacs::                      The continuation of Lucid Emacs.
 153
 154 Rules When Writing New C Code
 155
 156 * General Coding Rules::
 157 * Writing Lisp Primitives::
 158 * Adding Global Lisp Variables::
 159 * Coding for Mule::
 160 * Techniques for XEmacs Developers::
 161
 162 Coding for Mule
 163
 164 * Character-Related Data Types::
 165 * Working With Character and Byte Positions::
 166 * Conversion to and from External Data::
 167 * General Guidelines for Writing Mule-Aware Code::
 168 * An Example of Mule-Aware Code::
 169
 170 Regression Testing XEmacs
 171
 172 A Summary of the Various XEmacs Modules
 173
 174 * Low-Level Modules::
 175 * Basic Lisp Modules::
 176 * Modules for Standard Editing Operations::
 177 * Editor-Level Control Flow Modules::
 178 * Modules for the Basic Displayable Lisp Objects::
 179 * Modules for other Display-Related Lisp Objects::
 180 * Modules for the Redisplay Mechanism::
 181 * Modules for Interfacing with the File System::
 182 * Modules for Other Aspects of the Lisp Interpreter and Object System::
 183 * Modules for Interfacing with the Operating System::
 184 * Modules for Interfacing with X Windows::
 185 * Modules for Internationalization::
 186 * Modules for Regression Testing::
 187
 188 Allocation of Objects in XEmacs Lisp
 189
 190 * Introduction to Allocation::
 191 * Garbage Collection::
 192 * GCPROing::
 193 * Garbage Collection - Step by Step::
 194 * Integers and Characters::
 195 * Allocation from Frob Blocks::
 196 * lrecords::
 197 * Low-level allocation::
 198 * Cons::
 199 * Vector::
 200 * Bit Vector::
 201 * Symbol::
 202 * Marker::
 203 * String::
 204 * Compiled Function::
 205
 206 Garbage Collection - Step by Step
 207
 208 * Invocation::
 209 * garbage_collect_1::
 210 * mark_object::
 211 * gc_sweep::
 212 * sweep_lcrecords_1::
 213 * compact_string_chars::
 214 * sweep_strings::
 215 * sweep_bit_vectors_1::
 216
 217 Dumping
 218
 219 * Overview::
 220 * Data descriptions::
 221 * Dumping phase::
 222 * Reloading phase::
 223
 224 Dumping phase
 225
 226 * Object inventory::
 227 * Address allocation::
 228 * The header::
 229 * Data dumping::
 230 * Pointers dumping::
 231
 232 Events and the Event Loop
 233
 234 * Introduction to Events::
 235 * Main Loop::
 236 * Specifics of the Event Gathering Mechanism::
 237 * Specifics About the Emacs Event::
 238 * The Event Stream Callback Routines::
 239 * Other Event Loop Functions::
 240 * Converting Events::
 241 * Dispatching Events; The Command Builder::
 242
 243 Evaluation; Stack Frames; Bindings
 244
 245 * Evaluation::
 246 * Dynamic Binding; The specbinding Stack; Unwind-Protects::
 247 * Simple Special Forms::
 248 * Catch and Throw::
 249
 250 Symbols and Variables
 251
 252 * Introduction to Symbols::
 253 * Obarrays::
 254 * Symbol Values::
 255
 256 Buffers and Textual Representation
 257
 258 * Introduction to Buffers::     A buffer holds a block of text such as a file.
 259 * The Text in a Buffer::        Representation of the text in a buffer.
 260 * Buffer Lists::                Keeping track of all buffers.
 261 * Markers and Extents::         Tagging locations within a buffer.
 262 * Bufbytes and Emchars::        Representation of individual characters.
 263 * The Buffer Object::           The Lisp object corresponding to a buffer.
 264
 265 MULE Character Sets and Encodings
 266
 267 * Character Sets::
 268 * Encodings::
 269 * Internal Mule Encodings::
 270 * CCL::
 271
 272 Encodings
 273
 274 * Japanese EUC (Extended Unix Code)::
 275 * JIS7::
 276
 277 Internal Mule Encodings
 278
 279 * Internal String Encoding::
 280 * Internal Character Encoding::
 281
 282 Lstreams
 283
 284 * Creating an Lstream::         Creating an lstream object.
 285 * Lstream Types::               Different sorts of things that are streamed.
 286 * Lstream Functions::           Functions for working with lstreams.
 287 * Lstream Methods::             Creating new lstream types.
 288
 289 Consoles; Devices; Frames; Windows
 290
 291 * Introduction to Consoles; Devices; Frames; Windows::
 292 * Point::
 293 * Window Hierarchy::
 294 * The Window Object::
 295
 296 The Redisplay Mechanism
 297
 298 * Critical Redisplay Sections::
 299 * Line Start Cache::
 300 * Redisplay Piece by Piece::
 301
 302 Extents
 303
 304 * Introduction to Extents::     Extents are ranges over text, with properties.
 305 * Extent Ordering::             How extents are ordered internally.
 306 * Format of the Extent Info::   The extent information in a buffer or string.
 307 * Zero-Length Extents::         A weird special case.
 308 * Mathematics of Extent Ordering::  A rigorous foundation.
 309 * Extent Fragments::            Cached information useful for redisplay.
 310
 311 @end detailmenu
 312 @end menu
 313
 314 @node A History of Emacs, XEmacs From the Outside, Top, Top
 315 @chapter A History of Emacs
 316 @cindex history of Emacs, a
 317 @cindex Emacs, a history of
 318 @cindex Hackers (Steven Levy)
 319 @cindex Levy, Steven
 320 @cindex ITS (Incompatible Timesharing System)
 321 @cindex Stallman, Richard
 322 @cindex RMS
 323 @cindex MIT
 324 @cindex TECO
 325 @cindex FSF
 326 @cindex Free Software Foundation
 327
 328   XEmacs is a powerful, customizable text editor and development
 329 environment.  It began as Lucid Emacs, which was in turn derived from
 330 GNU Emacs, a program written by Richard Stallman of the Free Software
 331 Foundation.  GNU Emacs dates back to the 1970's, and was modelled
 332 after a package called ``Emacs'', written in 1976, that was a set of
 333 macros on top of TECO, an old, old text editor written at MIT on the
 334 DEC PDP 10 under one of the earliest time-sharing operating systems,
 335 ITS (Incompatible Timesharing System). (ITS dates back well before
 336 Unix.) ITS, TECO, and Emacs were products of a group of people at MIT
 337 who called themselves ``hackers'', who shared an idealistic belief
 338 system about the free exchange of information and were fanatical in
 339 their devotion to and time spent with computers. (The hacker
 340 subculture dates back to the late 1950's at MIT and is described in
 341 detail in Steven Levy's book @cite{Hackers}.  This book also includes
 342 a lot of information about Stallman himself and the development of
 343 Lisp, a programming language developed at MIT that underlies Emacs.)
 344
 345 @menu
 346 * Through Version 18::          Unification prevails.
 347 * Lucid Emacs::                 One version 19 Emacs.
 348 * GNU Emacs 19::                The other version 19 Emacs.
 349 * GNU Emacs 20::                The other version 20 Emacs.
 350 * XEmacs::                      The continuation of Lucid Emacs.
 351 @end menu
 352
 353 @node Through Version 18
 354 @section Through Version 18
 355 @cindex version 18, through
 356 @cindex Gosling, James
 357 @cindex Great Usenet Renaming
 358
 359   Although the history of the early versions of GNU Emacs is unclear,
 360 it is well known from the middle of 1985 onward.  A time line is:
 361
 362 @itemize @bullet
 363 @item
 364 GNU Emacs version 15 (15.34) was released sometime in 1984 or 1985 and
 365 shared some code with a version of Emacs written by James Gosling (the
 366 same James Gosling who later created the Java language).
 367 @item
 368 GNU Emacs version 16 (first released version was 16.56) was released on
 369 July 15, 1985.  All Gosling code was removed due to potential copyright
 370 problems with the code.
 371 @item
 372 version 16.57: released on September 16, 1985.
 373 @item
 374 versions 16.58, 16.59: released on September 17, 1985.
 375 @item
 376 version 16.60: released on September 19, 1985.  These later version 16
 377 releases incorporated patches from the net, especially for getting Emacs
 378 to work under System V.
 379 @item
 380 version 17.36 (first official v17 release) released on December 20,
 381 1985.  Included a TeX-able user manual.  First official unpatched
 382 version that worked on vanilla System V machines.
 383 @item
 384 version 17.43 (second official v17 release) released on January 25,
 385 1986.
 386 @item
 387 version 17.45 released on January 30, 1986.
 388 @item
 389 version 17.46 released on February 4, 1986.
 390 @item
 391 version 17.48 released on February 10, 1986.
 392 @item
 393 version 17.49 released on February 12, 1986.
 394 @item
 395 version 17.55 released on March 18, 1986.
 396 @item
 397 version 17.57 released on March 27, 1986.
 398 @item
 399 version 17.58 released on April 4, 1986.
 400 @item
 401 version 17.61 released on April 12, 1986.
 402 @item
 403 version 17.63 released on May 7, 1986.
 404 @item
 405 version 17.64 released on May 12, 1986.
 406 @item
 407 version 18.24 (a beta version) released on October 2, 1986.
 408 @item
 409 version 18.30 (a beta version) released on November 15, 1986.
 410 @item
 411 version 18.31 (a beta version) released on November 23, 1986.
 412 @item
 413 version 18.32 (a beta version) released on December 7, 1986.
 414 @item
 415 version 18.33 (a beta version) released on December 12, 1986.
 416 @item
 417 version 18.35 (a beta version) released on January 5, 1987.
 418 @item
 419 version 18.36 (a beta version) released on January 21, 1987.
 420 @item
 421 January 27, 1987: The Great Usenet Renaming.  net.emacs is now
 422 comp.emacs.
 423 @item
 424 version 18.37 (a beta version) released on February 12, 1987.
 425 @item
 426 version 18.38 (a beta version) released on March 3, 1987.
 427 @item
 428 version 18.39 (a beta version) released on March 14, 1987.
 429 @item
 430 version 18.40 (a beta version) released on March 18, 1987.
 431 @item
 432 version 18.41 (the first ``official'' release) released on March 22,
 433 1987.
 434 @item
 435 version 18.45 released on June 2, 1987.
 436 @item
 437 version 18.46 released on June 9, 1987.
 438 @item
 439 version 18.47 released on June 18, 1987.
 440 @item
 441 version 18.48 released on September 3, 1987.
 442 @item
 443 version 18.49 released on September 18, 1987.
 444 @item
 445 version 18.50 released on February 13, 1988.
 446 @item
 447 version 18.51 released on May 7, 1988.
 448 @item
 449 version 18.52 released on September 1, 1988.
 450 @item
 451 version 18.53 released on February 24, 1989.
 452 @item
 453 version 18.54 released on April 26, 1989.
 454 @item
 455 version 18.55 released on August 23, 1989.  This is the earliest version
 456 that is still available by FTP.
 457 @item
 458 version 18.56 released on January 17, 1991.
 459 @item
 460 version 18.57 released late January, 1991.
 461 @item
 462 version 18.58 released ?????.
 463 @item
 464 version 18.59 released October 31, 1992.
 465 @end itemize
 466
 467 @node Lucid Emacs
 468 @section Lucid Emacs
 469 @cindex Lucid Emacs
 470 @cindex Lucid Inc.
 471 @cindex Energize
 472 @cindex Epoch
 473
 474   Lucid Emacs was developed by the (now-defunct) Lucid Inc., a maker of
 475 C++ and Lisp development environments.  It began when Lucid decided they
 476 wanted to use Emacs as the editor and cornerstone of their C++
 477 development environment (called ``Energize'').  They needed many features
 478 that were not available in the existing version of GNU Emacs (version
 479 18.5something), in particular good and integrated support for GUI
 480 elements such as mouse support, multiple fonts, multiple window-system
 481 windows, etc.  A branch of GNU Emacs called Epoch, written at the
 482 University of Illinois, existed that supplied many of these features;
 483 however, Lucid needed more than what existed in Epoch.  At the time, the
 484 Free Software Foundation was working on version 19 of Emacs (this was
 485 sometime around 1991), which was planned to have similar features, and
 486 so Lucid decided to work with the Free Software Foundation.  Their plan
 487 was to add features that they needed, and coordinate with the FSF so
 488 that the features would get included back into Emacs version 19.
 489
 490   The release of version 19 was delayed, however (it finally appeared
 491 more than a year later than initially planned), and Lucid
 492 encountered unexpected technical resistance in
 493 getting their changes merged back into version 19, so they decided to
 494 release their own version of Emacs, which became Lucid Emacs 19.0.
 495
 496 @cindex Zawinski, Jamie
 497 @cindex Sexton, Harlan
 498 @cindex Benson, Eric
 499 @cindex Devin, Matthieu
 500   The initial authors of Lucid Emacs were Matthieu Devin, Harlan Sexton,
 501 and Eric Benson, and the work was later taken over by Jamie Zawinski,
 502 who became ``Mr. Lucid Emacs'' for many releases.
 503
 504   A time line for Lucid Emacs is:
 505
 506 @itemize @bullet
 507 @item
 508 version 19.0 shipped with Energize 1.0, April 1992.
 509 @item
 510 version 19.1 released June 4, 1992.
 511 @item
 512 version 19.2 released June 19, 1992.
 513 @item
 514 version 19.3 released September 9, 1992.
 515 @item
 516 version 19.4 released January 21, 1993.
 517 @item
 518 version 19.5 was a repackaging of 19.4 with a few bug fixes and
 519 shipped with Energize 2.0.  Never released to the net.
 520 @item
 521 version 19.6 released April 9, 1993.
 522 @item
 523 version 19.7 was a repackaging of 19.6 with a few bug fixes and
 524 shipped with Energize 2.1.  Never released to the net.
 525 @item
 526 version 19.8 released September 6, 1993.
 527 @item
 528 version 19.9 released January 12, 1994.
 529 @item
 530 version 19.10 released May 27, 1994.
 531 @item
 532 version 19.11 (first XEmacs) released September 13, 1994.
 533 @item
 534 version 19.12 released June 23, 1995.
 535 @item
 536 version 19.13 released September 1, 1995.
 537 @item
 538 version 19.14 released June 23, 1996.
 539 @item
 540 version 20.0 released February 9, 1997.
 541 @item
 542 version 19.15 released March 28, 1997.
 543 @item
 544 version 20.1 (not released to the net) April 15, 1997.
 545 @item
 546 version 20.2 released May 16, 1997.
 547 @item
 548 version 19.16 released October 31, 1997.
 549 @item
 550 version 20.3 (the first stable version of XEmacs 20.x) released November 30,
 551 1997.
 552 @item
 553 version 20.4 released February 28, 1998.
 554 @item
 555 version 21.1.2 released May 14, 1999. (The version naming scheme was
 556 changed at this point: [a] the second version number is odd for stable
 557 versions, even for beta versions; [b] a third version number is added,
 558 replacing the ``beta xxx'' ending for beta versions and allowing for
 559 periodic maintenance releases for stable versions.  Therefore, 21.0 was
 560 never ``officially'' released; similarly for 21.2, etc.)
 561 @item
 562 version 21.1.3 released June 26, 1999.
 563 @item
 564 version 21.1.4 released July 8, 1999.
 565 @item
 566 version 21.1.6 released August 14, 1999. (There was no 21.1.5.)
 567 @item
 568 version 21.1.7 released September 26, 1999.
 569 @item
 570 version 21.1.8 released November 2, 1999.
 571 @item
 572 version 21.1.9 released February 13, 2000.
 573 @item
 574 version 21.1.10 released May 7, 2000.
 575 @item
 576 version 21.1.10a released June 24, 2000.
 577 @item
 578 version 21.1.11 released July 18, 2000.
 579 @item
 580 version 21.1.12 released August 5, 2000.
 581 @item
 582 version 21.1.13 released January 7, 2001.
 583 @item
 584 version 21.1.14 released January 27, 2001.
 585 @end itemize
 586
 587 @node GNU Emacs 19
 588 @section GNU Emacs 19
 589 @cindex GNU Emacs 19
 590 @cindex Emacs 19, GNU
 591 @cindex version 19, GNU Emacs
 592 @cindex FSF Emacs
 593
 594   About a year after the initial release of Lucid Emacs, the FSF
 595 released a beta of their version of Emacs 19 (referred to here as ``GNU
 596 Emacs'').  By this time, the current version of Lucid Emacs was
 597 19.6. (Strangely, the first released beta from the FSF was GNU Emacs
 598 19.7.)  A time line for GNU Emacs version 19 is:
 599
 600 @itemize @bullet
 601 @item
 602 version 19.8 (beta) released May 27, 1993.
 603 @item
 604 version 19.9 (beta) released May 27, 1993.
 605 @item
 606 version 19.10 (beta) released May 30, 1993.
 607 @item
 608 version 19.11 (beta) released June 1, 1993.
 609 @item
 610 version 19.12 (beta) released June 2, 1993.
 611 @item
 612 version 19.13 (beta) released June 8, 1993.
 613 @item
 614 version 19.14 (beta) released June 17, 1993.
 615 @item
 616 version 19.15 (beta) released June 19, 1993.
 617 @item
 618 version 19.16 (beta) released July 6, 1993.
 619 @item
 620 version 19.17 (beta) released late July, 1993.
 621 @item
 622 version 19.18 (beta) released August 9, 1993.
 623 @item
 624 version 19.19 (beta) released August 15, 1993.
 625 @item
 626 version 19.20 (beta) released November 17, 1993.
 627 @item
 628 version 19.21 (beta) released November 17, 1993.
 629 @item
 630 version 19.22 (beta) released November 28, 1993.
 631 @item
 632 version 19.23 (beta) released May 17, 1994.
 633 @item
 634 version 19.24 (beta) released May 16, 1994.
 635 @item
 636 version 19.25 (beta) released June 3, 1994.
 637 @item
 638 version 19.26 (beta) released September 11, 1994.
 639 @item
 640 version 19.27 (beta) released September 14, 1994.
 641 @item
 642 version 19.28 (first ``official'' release) released November 1, 1994.
 643 @item
 644 version 19.29 released June 21, 1995.
 645 @item
 646 version 19.30 released November 24, 1995.
 647 @item
 648 version 19.31 released May 25, 1996.
 649 @item
 650 version 19.32 released July 31, 1996.
 651 @item
 652 version 19.33 released August 11, 1996.
 653 @item
 654 version 19.34 released August 21, 1996.
 655 @item
 656 version 19.34b released September 6, 1996.
 657 @end itemize
 658
 659 @cindex Mlynarik, Richard
 660   In some ways, GNU Emacs 19 was better than Lucid Emacs; in some ways,
 661 worse.  Lucid soon began incorporating features from GNU Emacs 19 into
 662 Lucid Emacs; the work was mostly done by Richard Mlynarik, who had been
 663 working on and using GNU Emacs for a long time (back as far as version
 664 16 or 17).
 665
 666 @node GNU Emacs 20
 667 @section GNU Emacs 20
 668 @cindex GNU Emacs 20
 669 @cindex Emacs 20, GNU
 670 @cindex version 20, GNU Emacs
 671 @cindex FSF Emacs
 672
 673 On February 2, 1997 work began on GNU Emacs to integrate Mule.  The first
 674 release was made in September of that year.
 675
 676 A time line for Emacs 20 is:
 677
 678 @itemize @bullet
 679 @item
 680 version 20.1 released September 17, 1997.
 681 @item
 682 version 20.2 released September 20, 1997.
 683 @item
 684 version 20.3 released August 19, 1998.
 685 @end itemize
 686
 687 @node XEmacs
 688 @section XEmacs
 689 @cindex XEmacs
 690
 691 @cindex Sun Microsystems
 692 @cindex University of Illinois
 693 @cindex Illinois, University of
 694 @cindex SPARCWorks
 695 @cindex Andreessen, Marc
 696 @cindex Baur, Steve
 697 @cindex Buchholz, Martin
 698 @cindex Kaplan, Simon
 699 @cindex Wing, Ben
 700 @cindex Thompson, Chuck
 701 @cindex Win-Emacs
 702 @cindex Epoch
 703 @cindex Amdahl Corporation
 704   Around the time that Lucid was developing Energize, Sun Microsystems
 705 was developing their own development environment (called ``SPARCWorks'')
 706 and also decided to use Emacs.  They joined forces with the Epoch team
 707 at the University of Illinois and later with Lucid.  The maintainer of
 708 the last-released version of Epoch was Marc Andreessen, but he dropped
 709 out and the Epoch project, headed by Simon Kaplan, lured Chuck Thompson
 710 away from a system administration job to become the primary Lucid Emacs
 711 author for Epoch and Sun.  Chuck's area of specialty became the
 712 redisplay engine (he replaced the old Lucid Emacs redisplay engine with
 713 a ported version from Epoch and then later rewrote it from scratch).
 714 Sun also hired Ben Wing (the author of Win-Emacs, a port of Lucid Emacs
 715 to Microsoft Windows 3.1) in 1993, for what was initially a one-month
 716 contract to fix some event problems but later became a many-year
 717 involvement, punctuated by a six-month contract with Amdahl Corporation.
 718
 719 @cindex rename to XEmacs
 720   In 1994, Sun and Lucid agreed to rename Lucid Emacs to XEmacs (a name
 721 not favorable to either company); the first release called XEmacs was
 722 version 19.11.  In June 1994, Lucid folded and Jamie quit to work for
 723 the newly formed Mosaic Communications Corp., later Netscape
 724 Communications Corp. (co-founded by the same Marc Andreessen, who had
 725 quit his Epoch job to work on a graphical browser for the World Wide
 726 Web).  Chuck then became the primary maintainer of XEmacs, and put out
 727 versions 19.11 through 19.14 in conjunction with Ben.  For 19.12 and
 728 19.13, Chuck added the new redisplay and many other display improvements
 729 and Ben added MULE support (support for Asian and other languages) and
 730 redesigned most of the internal Lisp subsystems to better support the
 731 MULE work and the various other features being added to XEmacs.  After
 732 19.14 Chuck retired as primary maintainer and Steve Baur stepped in.
 733
 734 @cindex MULE merged XEmacs appears
 735   Soon after 19.13 was released, work began in earnest on the MULE
 736 internationalization code and the source tree was divided into two
 737 development paths.  The MULE version was initially called 19.20, but was
 738 soon renamed to 20.0.  In 1996 Martin Buchholz of Sun Microsystems took
 739 over the care and feeding of it and worked on it in parallel with the
 740 19.14 development that was occurring at the same time.  After much work
 741 by Martin, it was decided to release 20.0 ahead of 19.15 in February
 742 1997.  The source tree remained divided until 20.2 when the version 19
 743 source was finally retired at version 19.16.
 744
 745 @cindex Baur, Steve
 746 @cindex Buchholz, Martin
 747 @cindex Jones, Kyle
 748 @cindex Niksic, Hrvoje
 749 @cindex XEmacs goes it alone
 750   In 1997, Sun finally dropped all pretense of support for XEmacs and
 751 Martin Buchholz left the company in November.  Since then (and for
 752 most of the previous year, because Steve Baur was never paid to work
 753 on XEmacs), XEmacs has existed solely on the contributions of volunteers
 754 from the Free Software Community.  Starting from 1997, Hrvoje Niksic and
 755 Kyle Jones have figured prominently in XEmacs development.
 756
 757 @cindex merging attempts
 758   Many attempts have been made to merge XEmacs and GNU Emacs, but they
 759 have consistently failed.
 760
 761   A more detailed history is contained in the XEmacs About page.
 762
 763   A time line for XEmacs is:
 764
 765 @itemize @bullet
 766 @item
 767 version 19.11 (first XEmacs) released September 13, 1994.
 768 @item
 769 version 19.12 released June 23, 1995.
 770 @item
 771 version 19.13 released September 1, 1995.
 772 @item
 773 version 19.14 released June 23, 1996.
 774 @item
 775 version 20.0 released February 9, 1997.
 776 @item
 777 version 19.15 released March 28, 1997.
 778 @item
 779 version 20.1 (not released to the net) April 15, 1997.
 780 @item
 781 version 20.2 released May 16, 1997.
 782 @item
 783 version 19.16 released October 31, 1997.
 784 @item
 785 version 20.3 (the first stable version of XEmacs 20.x) released November 30,
 786 1997.
 787 @item
 788 version 20.4 released February 28, 1998.
 789 @item
 790 version 21.0.60 released December 10, 1998. (The version naming scheme was
 791 changed at this point: [a] the second version number is odd for stable
 792 versions, even for beta versions; [b] a third version number is added,
 793 replacing the ``beta xxx'' ending for beta versions and allowing for
 794 periodic maintenance releases for stable versions.  Therefore, 21.0 was
 795 never ``officially'' released; similarly for 21.2, etc.)
 796 @item
 797 version 21.0.61 released January 4, 1999.
 798 @item
 799 version 21.0.63 released February 3, 1999.
 800 @item
 801 version 21.0.64 released March 1, 1999.
 802 @item
 803 version 21.0.65 released March 5, 1999.
 804 @item
 805 version 21.0.66 released March 12, 1999.
 806 @item
 807 version 21.0.67 released March 25, 1999.
 808 @item
 809 version 21.1.2 released May 14, 1999. (This is the followup to 21.0.67.
 810 The second version number was bumped to indicate the beginning of the
 811 ``stable'' series.)
 812 @item
 813 version 21.1.3 released June 26, 1999.
 814 @item
 815 version 21.1.4 released July 8, 1999.
 816 @item
 817 version 21.1.6 released August 14, 1999. (There was no 21.1.5.)
 818 @item
 819 version 21.1.7 released September 26, 1999.
 820 @item
 821 version 21.1.8 released November 2, 1999.
 822 @item
 823 version 21.1.9 released February 13, 2000.
 824 @item
 825 version 21.1.10 released May 7, 2000.
 826 @item
 827 version 21.1.10a released June 24, 2000.
 828 @item
 829 version 21.1.11 released July 18, 2000.
 830 @item
 831 version 21.1.12 released August 5, 2000.
 832 @item
 833 version 21.1.13 released January 7, 2001.
 834 @item
 835 version 21.1.14 released January 27, 2001.
 836 @item
 837 version 21.2.9 released February 3, 1999.
 838 @item
 839 version 21.2.10 released February 5, 1999.
 840 @item
 841 version 21.2.11 released March 1, 1999.
 842 @item
 843 version 21.2.12 released March 5, 1999.
 844 @item
 845 version 21.2.13 released March 12, 1999.
 846 @item
 847 version 21.2.14 released May 14, 1999.
 848 @item
 849 version 21.2.15 released June 4, 1999.
 850 @item
 851 version 21.2.16 released June 11, 1999.
 852 @item
 853 version 21.2.17 released June 22, 1999.
 854 @item
 855 version 21.2.18 released July 14, 1999.
 856 @item
 857 version 21.2.19 released July 30, 1999.
 858 @item
 859 version 21.2.20 released November 10, 1999.
 860 @item
 861 version 21.2.21 released November 28, 1999.
 862 @item
 863 version 21.2.22 released November 29, 1999.
 864 @item
 865 version 21.2.23 released December 7, 1999.
 866 @item
 867 version 21.2.24 released December 14, 1999.
 868 @item
 869 version 21.2.25 released December 24, 1999.
 870 @item
 871 version 21.2.26 released December 31, 1999.
 872 @item
 873 version 21.2.27 released January 18, 2000.
 874 @item
 875 version 21.2.28 released February 7, 2000.
 876 @item
 877 version 21.2.29 released February 16, 2000.
 878 @item
 879 version 21.2.30 released February 21, 2000.
 880 @item
 881 version 21.2.31 released February 23, 2000.
 882 @item
 883 version 21.2.32 released March 20, 2000.
 884 @item
 885 version 21.2.33 released May 1, 2000.
 886 @item
 887 version 21.2.34 released May 28, 2000.
 888 @item
 889 version 21.2.35 released July 19, 2000.
 890 @item
 891 version 21.2.36 released October 4, 2000.
 892 @item
 893 version 21.2.37 released November 14, 2000.
 894 @item
 895 version 21.2.38 released December 5, 2000.
 896 @item
 897 version 21.2.39 released December 31, 2000.
 898 @item
 899 version 21.2.40 released January 8, 2001.
 900 @item
 901 version 21.2.41 released January 17, 2001.
 902 @item
 903 version 21.2.42 released January 20, 2001.
 904 @item
 905 version 21.2.43 released January 26, 2001.
 906 @item
 907 version 21.2.44 released February 8, 2001.
 908 @item
 909 version 21.2.45 released February 23, 2001.
 910 @item
 911 version 21.2.46 released March 21, 2001.
 912 @end itemize
 913
 914 @node XEmacs From the Outside, The Lisp Language, A History of Emacs, Top
 915 @chapter XEmacs From the Outside
 916 @cindex XEmacs from the outside
 917 @cindex outside, XEmacs from the
 918 @cindex read-eval-print
 919
 920   XEmacs appears to the outside world as an editor, but it is really a
 921 Lisp environment.  At its heart is a Lisp interpreter; it also
 922 ``happens'' to contain many specialized object types (e.g. buffers,
 923 windows, frames, events) that are useful for implementing an editor.
 924 Some of these objects (in particular windows and frames) have
 925 displayable representations, and XEmacs provides a function
 926 @code{redisplay()} that ensures that the display of all such objects
 927 matches their internal state.  Most of the time, a standard Lisp
 928 environment is in a @dfn{read-eval-print} loop---i.e. ``read some Lisp
 929 code, execute it, and print the results''.  XEmacs has a similar loop:
 930
 931 @itemize @bullet
 932 @item
 933 read an event
 934 @item
 935 dispatch the event (i.e. ``do it'')
 936 @item
 937 redisplay
 938 @end itemize
 939
 940   Reading an event is done using the Lisp function @code{next-event},
 941 which waits for something to happen (typically, the user presses a key
 942 or moves the mouse) and returns an event object describing this.
 943 Dispatching an event is done using the Lisp function
 944 @code{dispatch-event}, which looks up the event in a keymap object (a
 945 particular kind of object that associates an event with a Lisp function)
 946 and calls that function.  The function ``does'' what the user has
 947 requested by changing the state of particular frame objects, buffer
 948 objects, etc.  Finally, @code{redisplay()} is called, which updates the
 949 display to reflect those changes just made.  Thus is an ``editor'' born.
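
  The three steps of this loop can be pictured with a small C model.
This is only a sketch under simplifying assumptions: events are reduced
to single characters read from a canned string, the ``keymap'' is a
two-entry table, and @code{redisplay} merely counts its calls.  All of
the C names below are hypothetical stand-ins chosen for illustration;
they are not taken from the XEmacs sources.

```c
#include <stddef.h>

/* Toy model of the read / dispatch / redisplay cycle.  Every name here
   is hypothetical; none of it is taken from the XEmacs sources. */

typedef void (*command_fn)(void);

static int point;            /* stand-in for "editor state" */
static int redisplay_count;  /* how many times the display was refreshed */

static void forward_char(void)  { point++; }
static void backward_char(void) { if (point > 0) point--; }

/* A "keymap": associates an event (here just a key) with a function. */
struct binding { char key; command_fn fn; };
static const struct binding keymap[] = {
  { 'f', forward_char },
  { 'b', backward_char },
};

/* Read the next event from a canned input string; '\0' means "no more". */
static char next_event(const char **input)
{
  char key = **input;
  if (key != '\0')
    (*input)++;
  return key;
}

/* Look the event up in the keymap and call the bound function, if any.
   Unbound keys are silently ignored in this sketch. */
static void dispatch_event(char key)
{
  size_t i;
  for (i = 0; i < sizeof keymap / sizeof keymap[0]; i++)
    if (keymap[i].key == key)
      {
        keymap[i].fn();
        return;
      }
}

/* A real redisplay would update the screen; this one only counts. */
static void redisplay(void) { redisplay_count++; }

/* The loop itself: read an event, dispatch it, redisplay. */
static void command_loop(const char *input)
{
  char key;
  while ((key = next_event(&input)) != '\0')
    {
      dispatch_event(key);
      redisplay();
    }
}
```

  Calling @code{command_loop("ffbxf")} runs five iterations of the loop:
the state changes only for the keys that have a binding, but redisplay
is invoked after every event.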
 950
 951 @cindex bridge, playing
 952 @cindex taxes, doing
 953 @cindex pi, calculating
 954   Note that you do not have to use XEmacs as an editor; you could just
 955 as well make it do your taxes, compute pi, play bridge, etc.  You'd just
 956 have to write functions to do those operations in Lisp.
 957
 958 @node The Lisp Language, XEmacs From the Perspective of Building, XEmacs From the Outside, Top
 959 @chapter The Lisp Language
 960 @cindex Lisp language, the
 961 @cindex Lisp vs. C
 962 @cindex C vs. Lisp
 963 @cindex Lisp vs. Java
 964 @cindex Java vs. Lisp
 965 @cindex dynamic scoping
 966 @cindex scoping, dynamic
 967 @cindex dynamic types
 968 @cindex types, dynamic
 969 @cindex Java
 970 @cindex Common Lisp
 971 @cindex Gosling, James
 972
 973   Lisp is a general-purpose language that is higher-level than C and in
 974 many ways more powerful than C.  Powerful dialects of Lisp such as
 975 Common Lisp are probably much better languages for writing very large
 976 applications than is C. (Unfortunately, for many non-technical
 977 reasons C and its successor C++ have become the dominant languages for
 978 application development.  These languages are both inadequate for
 979 extremely large applications, which is evidenced by the fact that newer,
 980 larger programs are becoming ever harder to write and are requiring ever
 981 more programmers despite great improvements in C development environments;
 982 and by the fact that, although hardware speeds and reliability have been
 983 growing at an exponential rate, most software is still generally
 984 considered to be slow and buggy.)
 985
 986   The new Java language holds promise as a better general-purpose
 987 development language than C.  Java has many features in common with
 988 Lisp that are not shared by C (this is not a coincidence, since
 989 Java was designed by James Gosling, a former Lisp hacker).  This
 990 will be discussed more later.
 991
 992 For those used to C, here is a summary of the basic differences between
 993 C and Lisp:
 994
 995 @enumerate
 996 @item
 997 Lisp has an extremely regular syntax.  Every function, expression,
 998 and control statement is written in the form
 999
1000 @example
1001    (@var{func} @var{arg1} @var{arg2} ...)
1002 @end example
1003
1004 This is as opposed to C, which writes functions as
1005
1006 @example
1007    func(@var{arg1}, @var{arg2}, ...)
1008 @end example
1009
1010 but writes expressions involving operators as (e.g.)
1011
1012 @example
1013    @var{arg1} + @var{arg2}
1014 @end example
1015
1016 and writes control statements as (e.g.)
1017
1018 @example
1019    while (@var{expr}) @{ @var{statement1}; @var{statement2}; ... @}
1020 @end example
1021
1022 Lisp equivalents of the latter two would be
1023
1024 @example
1025    (+ @var{arg1} @var{arg2} ...)
1026 @end example
1027
1028 and
1029
1030 @example
1031    (while @var{expr} @var{statement1} @var{statement2} ...)
1032 @end example
1033
1034 @item
1035 Lisp is a safe language.  Assuming there are no bugs in the Lisp
1036 interpreter/compiler, it is impossible to write a program that ``core
1037 dumps'' or otherwise causes the machine to execute an illegal
1038 instruction.  This is very different from C, where perhaps the most
1039 common outcome of a bug is exactly such a crash.  A corollary of this is that
1040 the C operation of casting a pointer is impossible (and unnecessary) in
1041 Lisp, and that it is impossible to access memory outside the bounds of
1042 an array.
1043
1044 @item
1045 Programs and data are written in the same form.  The
1046 parenthesis-enclosing form described above for statements is the same
1047 form used for the most common data type in Lisp, the list.  Thus, it is
1048 possible to represent any Lisp program using Lisp data types, and for
1049 one program to construct Lisp statements and then dynamically
1050 @dfn{evaluate} them, or cause them to execute.
1051
1052 @item
1053 All objects are @dfn{dynamically typed}.  This means that part of every
1054 object is an indication of what type it is.  A Lisp program can
1055 manipulate an object without knowing what type it is, and can query an
1056 object to determine its type.  This means that, correspondingly,
1057 variables and function parameters can hold objects of any type and are
1058 not normally declared as being of any particular type.  This is opposed
1059 to the @dfn{static typing} of C, where variables can hold exactly one
1060 type of object and must be declared as such, and objects do not contain
1061 an indication of their type because it's implicit in the variables they
1062 are stored in.  It is possible in C to have a variable hold different
1063 types of objects (e.g. through the use of @code{void *} pointers or
1064 variable-argument functions), but the type information must then be
1065 passed explicitly in some other fashion, leading to additional program
1066 complexity.
1067
1068 @item
1069 Allocated memory is automatically reclaimed when it is no longer in use.
1070 This operation is called @dfn{garbage collection} and involves looking
1071 through all variables to see what memory is being pointed to, and
1072 reclaiming any memory that is not pointed to and is thus
1073 ``inaccessible'' and out of use.  This is as opposed to C, in which
1074 allocated memory must be explicitly reclaimed using @code{free()}.  If
1075 you simply drop all pointers to memory without freeing it, it becomes
1076 ``leaked'' memory that still takes up space.  Over a long period of
1077 time, this can cause your program to grow and grow until it runs out of
1078 memory.
1079
1080 @item
1081 Lisp has built-in facilities for handling errors and exceptions.  In C,
1082 when an error occurs, usually either the program exits entirely or the
1083 routine in which the error occurs returns a value indicating this.  If
1084 an error occurs in a deeply-nested routine, then every routine currently
1085 called must unwind itself normally and return an error value back up to
1086 the next routine.  This means that every routine must explicitly check
1087 for an error in all the routines it calls; if it does not do so,
1088 unexpected and often random behavior results.  This is an extremely
1089 common source of bugs in C programs.  An alternative would be to do a
1090 non-local exit using @code{longjmp()}, but that is often very dangerous
1091 because the routines that were exited past had no opportunity to clean
1092 up after themselves and may leave things in an inconsistent state,
1093 causing a crash shortly afterwards.
1094
1095 Lisp provides mechanisms to make such non-local exits safe.  When an
1096 error occurs, a routine simply signals that an error of a particular
1097 class has occurred, and a non-local exit takes place.  Any routine can
1098 trap errors occurring in routines it calls by registering an error
1099 handler for some or all classes of errors. (If no handler is registered,
1100 a default handler, generally installed by the top-level event loop, is
1101 executed; this prints out the error and continues.) Routines can also
1102 specify cleanup code (called an @dfn{unwind-protect}) that will be
1103 called when control exits from a block of code, no matter how that exit
1104 occurs---i.e. even if a function deeply nested below it causes a
1105 non-local exit back to the top level.
1106
1107 Note that this facility has appeared in some recent vintages of C, in
1108 particular Visual C++ and other PC compilers written for the Microsoft
1109 Win32 API.
1110
1111 @item
1112 In Emacs Lisp, local variables are @dfn{dynamically scoped}.  This means
1113 that if you declare a local variable in a particular function, and then
1114 call another function, that subfunction can ``see'' the local variable
1115 you declared.  This is actually considered a bug in Emacs Lisp and in
1116 all other early dialects of Lisp, and was corrected in Common Lisp. (In
1117 Common Lisp, you can still declare dynamically scoped variables if you
1118 want to---they are sometimes useful---but variables by default are
1119 @dfn{lexically scoped} as in C.)
1120 @end enumerate
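
C programmers can get a feel for the dynamic scoping described in the
last item by imitating it with a global variable that is saved and
restored around a call.  The following is a minimal sketch, not a
description of how the Lisp interpreter is implemented: the names
@code{fill_column}, @code{line_width} and @code{line_width_with_binding}
are hypothetical, and unlike a real Lisp binding the old value is not
restored if the callee exits non-locally.

```c
/* Emulating a dynamically scoped variable in C.  All names are
   hypothetical illustrations, not XEmacs code. */

static int fill_column = 70;   /* the outermost ("global") binding */

/* Takes no arguments, yet its result depends on whatever binding of
   fill_column is in effect when it is called. */
static int line_width(void)
{
  return fill_column;
}

/* Rough equivalent of binding the variable around a call: save the old
   value, install the new one, run the body, then restore the old value
   on the way out.  A non-local exit from the body (e.g. longjmp) would
   skip the restore; this sketch does not handle that case. */
static int line_width_with_binding(int value)
{
  int saved = fill_column;
  int result;

  fill_column = value;
  result = line_width();   /* the callee "sees" the temporary binding */
  fill_column = saved;
  return result;
}
```

An ordinary C local variable in @code{line_width_with_binding} would be
invisible to @code{line_width}; that is the lexical scoping the last
item contrasts with.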
1121
1122 For those familiar with Lisp, Emacs Lisp is modelled after MacLisp, an
1123 early dialect of Lisp developed at MIT (no relation to the Macintosh
1124 computer).  There is a Common Lisp compatibility package available for
1125 Emacs that provides many of the features of Common Lisp.
1126
1127 The Java language is derived in many ways from C, and shares a similar
1128 syntax, but has the following features in common with Lisp (and different
1129 from C):
1130
1131 @enumerate
1132 @item
1133 Java is a safe language, like Lisp.
1134 @item
1135 Java provides garbage collection, like Lisp.
1136 @item
1137 Java has built-in facilities for handling errors and exceptions, like
1138 Lisp.
1139 @item
1140 Java has a type system that combines the best advantages of both static
1141 and dynamic typing.  Objects (except very simple types) are explicitly
1142 marked with their type, as in dynamic typing; but there is a hierarchy
1143 of types and functions are declared to accept only certain types, thus
1144 providing the increased compile-time error-checking of static typing.
1145 @end enumerate
1146
1147 The Java language also has some negative attributes:
1148
1149 @enumerate
1150 @item
1151 Java uses the edit/compile/run model of software development.  This
1152 makes it hard to use interactively.  For example, to use Java like
1153 @code{bc} it is necessary to write a special purpose, albeit tiny,
1154 application.  In Emacs Lisp, a calculator comes built-in without any
1155 effort: one can always just type an expression in the @code{*scratch*}
1156 buffer.
1157 @item
1158 Java tries too hard to enforce, not merely enable, portability, making
1159 ordinary access to standard OS facilities painful.  Java has an
1160 @dfn{agenda}.  I think this is why @code{chdir} is not part of standard
1161 Java, which is inexcusable.
1162 @end enumerate
1163
1164 Unfortunately, there is no perfect language.  Static typing allows a
1165 compiler to catch programmer errors and produce more efficient code, but
1166 makes programming more tedious and less fun.  For the foreseeable future,
1167 an Ideal Editing and Programming Environment (and that is what XEmacs
1168 aspires to) will be programmable in multiple languages: high level ones
1169 like Lisp for user customization and prototyping, and lower level ones
1170 for infrastructure and industrial strength applications.  If I had my
1171 way, XEmacs would be friendly towards the Python, Scheme, C++, ML,
1172 etc.@: communities.  But there are serious technical difficulties to
1173 achieving that goal.
1174
1175 The word @dfn{application} in the previous paragraph was used
1176 intentionally.  XEmacs implements an API for programs written in Lisp
1177 that makes it a full-fledged application platform, very much like an OS
1178 inside the real OS.
1179
1180 @node XEmacs From the Perspective of Building, XEmacs From the Inside, The Lisp Language, Top
1181 @chapter XEmacs From the Perspective of Building
1182 @cindex XEmacs from the perspective of building
1183 @cindex building, XEmacs from the perspective of
1184
1185 The heart of XEmacs is the Lisp environment, which is written in C.
1186 This is contained in the @file{src/} subdirectory.  Underneath
1187 @file{src/} are two subdirectories of header files: @file{s/} (header
1188 files for particular operating systems) and @file{m/} (header files for
1189 particular machine types).  In practice the distinction between the two
1190 types of header files is blurred.  These header files define or undefine
1191 certain preprocessor constants and macros to indicate particular
1192 characteristics of the associated machine or operating system.  As part
1193 of the configure process, one @file{s/} file and one @file{m/} file are
1194 identified for the particular environment in which XEmacs is being
1195 built.
1196
1197 XEmacs also contains a great deal of Lisp code.  This implements the
1198 operations that make XEmacs useful as an editor as well as just a Lisp
1199 environment, and also contains many add-on packages that allow XEmacs to
1200 browse directories, act as a mail and Usenet news reader, compile Lisp
1201 code, etc.  There is actually more Lisp code than C code associated with
1202 XEmacs, but much of the Lisp code is peripheral to the actual operation
1203 of the editor.  The Lisp code all lies in subdirectories underneath the
1204 @file{lisp/} directory.
1205
1206 The @file{lwlib/} directory contains C code that implements a
1207 generalized interface onto different X widget toolkits and also
1208 implements some widgets of its own that behave like Motif widgets but
1209 are faster, free, and in some cases more powerful.  The code in this
1210 directory compiles into a library and is mostly independent from XEmacs.
1211
1212 The @file{etc/} directory contains various data files associated with
1213 XEmacs.  Some of them are actually read by XEmacs at startup; others
1214 merely contain useful information of various sorts.
1215
1216 The @file{lib-src/} directory contains C code for various auxiliary
1217 programs that are used in connection with XEmacs.  Some of them are used
1218 during the build process; others are used to perform certain functions
1219 that cannot conveniently be placed in the XEmacs executable (e.g. the
1220 @file{movemail} program for fetching mail out of @file{/var/spool/mail},
1221 which must be setgid to @file{mail} on many systems; and the
1222 @file{gnuclient} program, which allows an external script to communicate
1223 with a running XEmacs process).
1224
1225 The @file{man/} directory contains the sources for the XEmacs
1226 documentation.  It is mostly in a form called Texinfo, which can be
1227 converted into either a printed document (by passing it through @TeX{})
1228 or into on-line documentation called @dfn{info files}.
1229
1230 The @file{info/} directory contains the results of formatting the XEmacs
1231 documentation as @dfn{info files}, for on-line use.  These files are
1232 used when you enter the Info system using @kbd{C-h i} or through the
1233 Help menu.
1234
1235 The @file{dynodump/} directory contains auxiliary code used to build
1236 XEmacs on Solaris platforms.
1237
1238 The other directories contain various miscellaneous code and information
1239 that is not normally used or needed.
1240
1241 The first step of building involves running the @file{configure} program
1242 and passing it various parameters to specify any optional features you
1243 want and compiler arguments and such, as described in the @file{INSTALL}
1244 file.  This determines what the build environment is, chooses the
1245 appropriate @file{s/} and @file{m/} files, and runs a series of tests to
1246 determine many details about your environment, such as which library
1247 functions are available and exactly how they work.  The reason for
1248 running these tests is that it allows XEmacs to be compiled on a much
1249 wider variety of platforms than those that the XEmacs developers happen
1250 to be familiar with, including various sorts of hybrid platforms.  This
1251 is especially important now that many operating systems give you a great
1252 deal of control over exactly what features you want installed, and allow
1253 for easy upgrading of parts of a system without upgrading the rest.  It
1254 would be impossible to pre-determine and pre-specify the information for
1255 all possible configurations.
1256
1257 In fact, the @file{s/} and @file{m/} files are basically @emph{evil},
1258 since they contain unmaintainable platform-specific hard-coded
1259 information.  XEmacs has been moving in the direction of having all
1260 system-specific information be determined dynamically by
1261 @file{configure}.  Perhaps someday we can @code{rm -rf src/s src/m}.
1262
1263 When @file{configure} is done running, it generates @file{Makefile}s and
1264 @file{GNUmakefile}s and the file @file{src/config.h} (which describes
1265 the features of your system) from template files.  You then run
1266 @file{make}, which compiles the auxiliary code and programs in
1267 @file{lib-src/} and @file{lwlib/} and the main XEmacs executable in
1268 @file{src/}.  The result of compiling and linking is an executable
1269 called @file{temacs}, which is @emph{not} the final XEmacs executable.
1270 @file{temacs} by itself is not intended to function as an editor or even
1271 display any windows on the screen, and if you simply run it, it will
1272 exit immediately.  The @file{Makefile} runs @file{temacs} with certain
1273 options that cause it to initialize itself, read in a number of basic
1274 Lisp files, and then dump itself out into a new executable called
1275 @file{xemacs}.  This new executable has been pre-initialized and
1276 contains pre-digested Lisp code that is necessary for the editor to
1277 function (this includes most basic editing functions,
1278 e.g. @code{kill-line}, that can be defined in terms of other Lisp
1279 primitives; some initialization code that is called when certain
1280 objects, such as frames, are created; and all of the standard
1281 keybindings and code for the actions they result in).  This executable,
1282 @file{xemacs}, is the executable that you run to use the XEmacs editor.
1283
1284 Although @file{temacs} is not intended to be run as an editor, it can,
1285 by using the incantation @code{temacs -batch -l loadup.el run-temacs}.
1286 This is useful when the dumping procedure described above is broken, or
1287 when using certain program debugging tools such as Purify.  These tools
1288 get mighty confused by the tricks played by the XEmacs build process,
1289 such as allocating memory in one process, and freeing it in the next.
1290
1291 @node XEmacs From the Inside, The XEmacs Object System (Abstractly Speaking), XEmacs From the Perspective of Building, Top
1292 @chapter XEmacs From the Inside
1293 @cindex XEmacs from the inside
1294 @cindex inside, XEmacs from the
1295
1296 Internally, XEmacs is quite complex, and can be very confusing.  To
1297 simplify things, it can be useful to think of XEmacs as containing an
1298 event loop that ``drives'' everything, and a number of other subsystems,
1299 such as a Lisp engine and a redisplay mechanism.  Each of these other
1300 subsystems exists simultaneously in XEmacs, and each has a certain
1301 state.  The flow of control continually passes in and out of these
1302 different subsystems in the course of normal operation of the editor.
1303
1304 It is important to keep in mind that, most of the time, the editor is
1305 ``driven'' by the event loop.  Except during initialization and batch
1306 mode, all subsystems are entered directly or indirectly through the
1307 event loop, and ultimately, control exits out of all subsystems back up
1308 to the event loop.  This cycle of entering a subsystem, exiting back out
1309 to the event loop, and starting another iteration of the event loop
1310 occurs once each keystroke, mouse motion, etc.
1311
1312 If you're trying to understand a particular subsystem (other than the
1313 event loop), think of it as a ``daemon'' process or ``servant'' that is
1314 responsible for one particular aspect of a larger system, and
1315 periodically receives commands or environment changes that cause it to
1316 do something.  Ultimately, these commands and environment changes are
1317 always triggered by the event loop.  For example:
1318
1319 @itemize @bullet
1320 @item
1321 The window and frame mechanism is responsible for keeping track of what
1322 windows and frames exist, what buffers are in them, etc.  It is
1323 periodically given commands (usually from the user) to make a change to
1324 the current window/frame state: i.e. create a new frame, delete a
1325 window, etc.
1326
1327 @item
1328 The buffer mechanism is responsible for keeping track of what buffers
1329 exist and what text is in them.  It is periodically given commands
1330 (usually from the user) to insert or delete text, create a buffer, etc.
1331 When it receives a text-change command, it notifies the redisplay
1332 mechanism.
1333
1334 @item
1335 The redisplay mechanism is responsible for making sure that windows and
1336 frames are displayed correctly.  It is periodically told (by the event
1337 loop) to actually ``do its job'', i.e. snoop around and see what the
1338 current state of the environment (mostly of the currently-existing
1339 windows, frames, and buffers) is, and make sure that state matches
1340 what's actually displayed.  It keeps lots and lots of information around
1341 (such as what is actually being displayed currently, and what the
1342 environment was last time it checked) so that it can minimize the work
1343 it has to do.  It is also helped along in that whenever a relevant
1344 change to the environment occurs, the redisplay mechanism is told about
1345 this, so it has a pretty good idea of where it has to look to find
1346 possible changes and doesn't have to look everywhere.
1347
1348 @item
1349 The Lisp engine is responsible for executing the Lisp code in which most
1350 user commands are written.  It is entered through a call to @code{eval}
1351 or @code{funcall}, which occurs as a result of dispatching an event from
1352 the event loop.  The functions it calls issue commands to the buffer
1353 mechanism, the window/frame subsystem, etc.
1354
1355 @item
1356 The Lisp allocation subsystem is responsible for keeping track of Lisp
1357 objects.  It is given commands from the Lisp engine to allocate objects,
1358 garbage collect, etc.
1359 @end itemize
1360
1361 etc.
1362
1363   The important idea here is that there are a number of independent
1364 subsystems each with its own responsibility and persistent state, just
1365 like different employees in a company, and each subsystem is
1366 periodically given commands from other subsystems.  Commands can flow
1367 from any one subsystem to any other, but there is usually some sort of
1368 hierarchy, with all commands originating from the event subsystem.
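
  As an illustration of one subsystem giving commands to another, the
following C sketch models the buffer mechanism flagging a change and
the redisplay mechanism later looking only at the flagged buffers.  It
is a toy under stated assumptions: the fixed-size arrays, the
@code{changed} flag and every function name are hypothetical, and the
real data structures are far richer.

```c
#include <string.h>

/* Toy model of one subsystem notifying another.  All names are
   hypothetical; this is not how XEmacs stores buffers. */

#define MAX_BUFFERS 4
#define TEXT_SIZE   64

struct buffer {
  char text[TEXT_SIZE];
  int  changed;            /* set by the buffer mechanism on every edit */
};

static struct buffer buffers[MAX_BUFFERS];
static char displayed[MAX_BUFFERS][TEXT_SIZE];  /* what is "on screen" */
static int  buffers_redrawn;                    /* work done by redisplay */

/* Buffer mechanism: append text, then record that something changed.
   Returns 0 on success, -1 on a bad buffer id or if the text does not fit. */
int buffer_insert(int id, const char *s)
{
  struct buffer *b;

  if (id < 0 || id >= MAX_BUFFERS)
    return -1;
  b = &buffers[id];
  if (strlen(b->text) + strlen(s) >= TEXT_SIZE)
    return -1;
  strcat(b->text, s);
  b->changed = 1;
  return 0;
}

/* Redisplay mechanism: bring the "screen" up to date, touching only
   the buffers that were flagged since the last call. */
void redisplay(void)
{
  int id;

  for (id = 0; id < MAX_BUFFERS; id++)
    if (buffers[id].changed)
      {
        strcpy(displayed[id], buffers[id].text);
        buffers[id].changed = 0;
        buffers_redrawn++;
      }
}
```

  A second call to @code{redisplay} with no intervening edit does no
work at all, which is the point of having each subsystem keep its own
persistent state.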
1369
1370   XEmacs is entered in @code{main()}, which is in @file{emacs.c}.  When
1371 this is called the first time (in a properly-invoked @file{temacs}), it
1372 does the following:
1373
1374 @enumerate
1375 @item
1376 It does some very basic environment initializations, such as determining
1377 where it and its directories (e.g. @file{lisp/} and @file{etc/}) reside
1378 and setting up signal handlers.
1379 @item
1380 It initializes the entire Lisp interpreter.
1381 @item
1382 It sets the initial values of many built-in variables (including many
1383 variables that are visible to Lisp programs), such as the global keymap
1384 object and the built-in faces (a face is an object that describes the
1385 display characteristics of text).  This involves creating Lisp objects
1386 and thus is dependent on step (2).
1387 @item
1388 It performs various other initializations that are relevant to the
1389 particular environment it is running in, such as retrieving environment
1390 variables, determining the current date and the user who is running the
1391 program, examining its standard input, creating any necessary file
1392 descriptors, etc.
1393 @item
1394 At this point, the C initialization is complete.  A Lisp program that
1395 was specified on the command line (usually @file{loadup.el}) is called
1396 (temacs is normally invoked as @code{temacs -batch -l loadup.el dump}).
1397 @file{loadup.el} loads all of the other Lisp files that are needed for
1398 the operation of the editor, calls the @code{dump-emacs} function to
1399 write out @file{xemacs}, and then kills the temacs process.
1400 @end enumerate
1401
1402   When @file{xemacs} is then run, it only redoes steps (1) and (4)
1403 above; all variables already contain the values they were set to when
1404 the executable was dumped, and all memory that was allocated with
1405 @code{malloc()} is still around. (XEmacs knows whether it is being run
1406 as @file{xemacs} or @file{temacs} because it sets the global variable
1407 @code{initialized} to 1 after step (4) above.) At this point,
1408 @file{xemacs} calls a Lisp function to do any further initialization,
1409 which includes parsing the command-line (the C code can only do limited
1410 command-line parsing, which includes looking for the @samp{-batch} and
1411 @samp{-l} flags and a few other flags that it needs to know about before
1412 initialization is complete), creating the first frame (or @dfn{window}
1413 in standard window-system parlance), running the user's init file
1414 (usually the file @file{.emacs} in the user's home directory), etc.  The
1415 function to do this is usually called @code{normal-top-level};
1416 @file{loadup.el} tells the C code about this function by setting its
1417 name as the value of the Lisp variable @code{top-level}.
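
  The difference between the two startup paths can be sketched in C as
follows.  Only the variable name @code{initialized} comes from the
description above; the four step functions are hypothetical
placeholders that merely count their calls, and the sketch imitates the
dumped executable by calling @code{startup} a second time in the same
process rather than by actually dumping.

```c
/* Sketch of the two startup paths.  Only the name `initialized' comes
   from the text; everything else is a hypothetical placeholder. */

static int initialized;   /* 0 in temacs; dumped as 1 into xemacs */
static int calls[5];      /* calls[n]: how often step n has run (n = 1..4) */

static void init_basic_environment(void)   { calls[1]++; }  /* step (1) */
static void init_lisp_interpreter(void)    { calls[2]++; }  /* step (2) */
static void init_builtin_variables(void)   { calls[3]++; }  /* step (3) */
static void init_runtime_environment(void) { calls[4]++; }  /* step (4) */

void startup(void)
{
  init_basic_environment();
  if (!initialized)
    {
      /* Needed only once: the results survive in the dumped image. */
      init_lisp_interpreter();
      init_builtin_variables();
    }
  init_runtime_environment();
  initialized = 1;          /* set after step (4), as described above */
}
```

  After two calls, steps (1) and (4) have each run twice while steps
(2) and (3) have run only once.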
1418
1419   When the Lisp initialization code is done, the C code enters the event
1420 loop, and stays there for the duration of the XEmacs process.  The code
1421 for the event loop is contained in @file{cmdloop.c}, and is called
1422 @code{Fcommand_loop_1()}.  Note that this event loop could very well be
1423 written in Lisp, and in fact a Lisp version exists; but apparently,
1424 doing this makes XEmacs run noticeably slower.
1425
1426   Notice how much of the initialization is done in Lisp, not in C.
1427 In general, XEmacs tries to move as much code as is possible
1428 into Lisp.  Code that remains in C is code that implements the
1429 Lisp interpreter itself, or code that needs to be very fast, or
1430 code that needs to do system calls or other such stuff that
1431 needs to be done in C, or code that needs to have access to
1432 ``forbidden'' structures. (One conscious aspect of the design of
1433 Lisp under XEmacs is a clean separation between the external
1434 interface to a Lisp object's functionality and its internal
1435 implementation.  Part of this design is that Lisp programs
1436 are forbidden from accessing the contents of the object other
1437 than through using a standard API.  In this respect, XEmacs Lisp
1438 is similar to modern Lisp dialects but differs from GNU Emacs,
1439 which tends to expose the implementation and allow Lisp
1440 programs to look at it directly.  The major advantage of
1441 hiding the implementation is that it allows the implementation
1442 to be redesigned without affecting any Lisp programs, including
1443 those that might want to be ``clever'' by looking directly at
1444 the object's contents and possibly manipulating them.)
1445
1446   Moving code into Lisp makes the code easier to debug and maintain and
1447 makes it much easier for people who are not XEmacs developers to
1448 customize XEmacs, because they can make a change with much less chance
1449 of obscure and unwanted interactions occurring than if they were to
1450 change the C code.
1451
1452 @node The XEmacs Object System (Abstractly Speaking), How Lisp Objects Are Represented in C, XEmacs From the Inside, Top
1453 @chapter The XEmacs Object System (Abstractly Speaking)
1454 @cindex XEmacs object system (abstractly speaking), the
1455 @cindex object system (abstractly speaking), the XEmacs
1456
1457   At the heart of the Lisp interpreter is its management of objects.
1458 XEmacs Lisp contains many built-in objects, some of which are
1459 simple and others of which can be very complex; and some of which
1460 are very common, and others of which are rarely used or are only
1461 used internally. (Since the Lisp allocation system, with its
1462 automatic reclamation of unused storage, is so much more convenient
1463 than @code{malloc()} and @code{free()}, the C code makes extensive use of it
1464 in its internal operations.)
1465
1466   The basic Lisp objects are
1467
1468 @table @code
1469 @item integer
1470 28 or 31 bits of precision, or 60 or 63 bits on 64-bit machines; the
1471 reason for this is described below when the internal Lisp object
1472 representation is described.
1473 @item float
1474 Same precision as a double in C.
1475 @item cons
1476 A simple container for two Lisp objects, used to implement lists and
1477 most other data structures in Lisp.
1478 @item char
1479 An object representing a single character of text; chars behave like
1480 integers in many ways but are logically considered text rather than
1481 numbers and have a different read syntax. (The read syntax for a char
1482 contains the char itself or some textual encoding of it---for example,
1483 a Japanese Kanji character might be encoded as @samp{^[$(B#&^[(B} using the
1484 ISO-2022 encoding standard---rather than the numerical representation
1485 of the char; this way, if the mapping between chars and integers
1486 changes, which is quite possible for Kanji characters and other extended
1487 characters, the same character will still be created.  Note that some
1488 primitives confuse chars and integers.  The worst culprit is @code{eq},
1489 which makes a special exception and considers a char to be @code{eq} to
1490 its integer equivalent, even though in no other case are objects of two
1491 different types @code{eq}.  The reason for this monstrosity is
1492 compatibility with existing code; the separation of char from integer
1493 came fairly recently.)
1494 @item symbol
1495 An object that contains Lisp objects and is referred to by name;
1496 symbols are used to implement variables and named functions
1497 and to provide the equivalent of preprocessor constants in C.
1498 @item vector
1499 A one-dimensional array of Lisp objects providing constant-time access
1500 to any of the objects; access to an arbitrary object in a vector is
1501 faster than for lists, but the operations that can be done on a vector
1502 are more limited.
1503 @item string
1504 Self-explanatory; behaves much like a vector of chars
1505 but has a different read syntax and is stored and manipulated
1506 more compactly.
1507 @item bit-vector
1508 A vector of bits; similar to a string in spirit.
1509 @item compiled-function
1510 An object containing compiled Lisp code, known as @dfn{byte code}.
1511 @item subr
1512 A Lisp primitive, i.e. a Lisp-callable function implemented in C.
1513 @end table
1514
1515 @cindex closure
1516 Note that there is no basic ``function'' type, as in more powerful
1517 versions of Lisp (where it's called a @dfn{closure}).  XEmacs Lisp does
1518 not provide the closure semantics implemented by Common Lisp and Scheme.
1519 The guts of a function in XEmacs Lisp are represented in one of four
1520 ways: a symbol specifying another function (when one function is an
1521 alias for another), a list (whose first element must be the symbol
1522 @code{lambda}) containing the function's source code, a
1523 compiled-function object, or a subr object. (In other words, given a
1524 symbol specifying the name of a function, calling @code{symbol-function}
1525 to retrieve the contents of the symbol's function cell will return one
1526 of these types of objects.)
1527
1528 XEmacs Lisp also contains numerous specialized objects used to implement
1529 the editor:
1530
1531 @table @code
1532 @item buffer
1533 Stores text like a string, but is optimized for insertion and deletion
1534 and has certain other properties that can be set.
1535 @item frame
1536 An object with various properties whose displayable representation is a
1537 @dfn{window} in window-system parlance.
1538 @item window
1539 A section of a frame that displays the contents of a buffer;
1540 often called a @dfn{pane} in window-system parlance.
1541 @item window-configuration
1542 An object that represents a saved configuration of windows in a frame.
1543 @item device
1544 An object representing a screen on which frames can be displayed;
1545 equivalent to a @dfn{display} in the X Window System and a @dfn{TTY} in
1546 character mode.
1547 @item face
1548 An object specifying the appearance of text or graphics; it has
1549 properties such as font, foreground color, and background color.
1550 @item marker
1551 An object that refers to a particular position in a buffer and moves
1552 around as text is inserted and deleted to stay in the same relative
1553 position to the text around it.
1554 @item extent
1555 Similar to a marker but covers a range of text in a buffer; can also
1556 specify properties of the text, such as a face in which the text is to
1557 be displayed, whether the text is invisible or unmodifiable, etc.
1558 @item event
1559 Generated by calling @code{next-event} and contains information
1560 describing a particular event happening in the system, such as the user
1561 pressing a key or a process terminating.
1562 @item keymap
1563 An object that maps from events (described using lists, vectors, and
1564 symbols rather than with an event object because the mapping is for
1565 classes of events, rather than individual events) to functions to
1566 execute or other events to recursively look up; the functions are
1567 described by name, using a symbol, or using lists to specify the
1568 function's code.
1569 @item glyph
1570 An object that describes the appearance of an image (e.g. a pixmap) on
1571 the screen; glyphs can be attached to the beginning or end of extents
1572 and in some future version of XEmacs will be able to be inserted
1573 directly into a buffer.
1574 @item process
1575 An object that describes a connection to an externally-running process.
1576 @end table
1577
1578   There are some other, less-commonly-encountered general objects:
1579
1580 @table @code
1581 @item hash-table
1582 An object that maps from an arbitrary Lisp object to another arbitrary
1583 Lisp object, using hashing for fast lookup.
1584 @item obarray
1585 A limited form of hash-table that maps from strings to symbols; obarrays
1586 are used to look up a symbol given its name and are not actually their
1587 own object type but are kludgily represented using vectors with hidden
1588 fields (this representation derives from GNU Emacs).
1589 @item specifier
1590 A complex object used to specify the value of a display property; a
1591 default value is given and different values can be specified for
1592 particular frames, buffers, windows, devices, or classes of device.
1593 @item char-table
1594 An object that maps from chars or classes of chars to arbitrary Lisp
1595 objects; internally char tables use a complex nested-vector
1596 representation that is optimized to the way characters are represented
1597 as integers.
1598 @item range-table
1599 An object that maps from ranges of integers to arbitrary Lisp objects.
1600 @end table
1601
1602   And some strange special-purpose objects:
1603
1604 @table @code
1605 @item charset
1606 @itemx coding-system
1607 Objects used when MULE, or multi-lingual/Asian-language, support is
1608 enabled.
1609 @item color-instance
1610 @itemx font-instance
1611 @itemx image-instance
1612 An object that encapsulates a window-system resource; instances are
1613 mostly used internally but are exposed on the Lisp level for cleanness
1614 of the specifier model and because it's occasionally useful for Lisp
1615 programs to create or query the properties of instances.
1616 @item subwindow
1617 An object that encapsulates a @dfn{subwindow} resource, i.e. a
1618 window-system child window that is drawn into by an external process;
1619 this object should be integrated into the glyph system but isn't yet,
1620 and may change form when this is done.
1621 @item tooltalk-message
1622 @itemx tooltalk-pattern
1623 Objects that represent resources used in the ToolTalk interprocess
1624 communication protocol.
1625 @item toolbar-button
1626 An object used in conjunction with the toolbar.
1627 @end table
1628
1629   And objects that are only used internally:
1630
1631 @table @code
1632 @item opaque
1633 A generic object for encapsulating arbitrary memory; this allows you the
1634 generality of @code{malloc()} and the convenience of the Lisp object
1635 system.
1636 @item lstream
1637 A buffering I/O stream, used to provide a unified interface to anything
1638 that can accept output or provide input, such as a file descriptor, a
1639 stdio stream, a chunk of memory, a Lisp buffer, a Lisp string, etc.;
1640 it's a Lisp object to make its memory management more convenient.
1641 @item char-table-entry
1642 Subsidiary objects in the internal char-table representation.
1643 @item extent-auxiliary
1644 @itemx menubar-data
1645 @itemx toolbar-data
1646 Various special-purpose objects that are basically just used to
1647 encapsulate memory for particular subsystems, similar to the more
1648 general ``opaque'' object.
1649 @item symbol-value-forward
1650 @itemx symbol-value-buffer-local
1651 @itemx symbol-value-varalias
1652 @itemx symbol-value-lisp-magic
1653 Special internal-only objects that are placed in the value cell of a
1654 symbol to indicate that there is something special with this variable --
1655 e.g. it has no value, it mirrors another variable, or it mirrors some C
1656 variable; there is really only one kind of object, called a
1657 @dfn{symbol-value-magic}, but it is sort-of halfway kludged into
1658 semi-different object types.
1659 @end table
1660
1661 @cindex permanent objects
1662 @cindex temporary objects
1663   Some types of objects are @dfn{permanent}, meaning that once created,
1664 they do not disappear until explicitly destroyed, using a function such
1665 as @code{delete-buffer}, @code{delete-window}, @code{delete-frame}, etc.
1666 Others will disappear once they are no longer used, through the garbage
1667 collection mechanism.  Buffers, frames, windows, devices, and processes
1668 are among the objects that are permanent.  Note that some objects can go
1669 both ways: Faces can be created either way; extents are normally
1670 permanent, but detached extents (extents not referring to any text, as
1671 happens to some extents when the text they are referring to is deleted)
1672 are temporary.  Note that some permanent objects, such as faces and
1673 coding systems, cannot be deleted.  Note also that windows are unique in
1674 that they can be @emph{undeleted} after having previously been
1675 deleted. (This happens as a result of restoring a window configuration.)
1676
1677 @cindex read syntax
1678   Note that many types of objects have a @dfn{read syntax}, i.e. a way of
1679 specifying an object of that type in Lisp code.  When you load a Lisp
1680 file, or type in code to be evaluated, what really happens is that the
1681 function @code{read} is called, which reads some text and creates an object
1682 based on the syntax of that text; then @code{eval} is called, which
1683 possibly does something special; then this loop repeats until there's
1684 no more text to read. (@code{eval} only actually does something special
1685 with symbols, which causes the symbol's value to be returned,
1686 similar to referencing a variable; and with conses [i.e. lists],
1687 which cause a function invocation.  All other values are returned
1688 unchanged.)
1689
1690   The read syntax
1691
1692 @example
1693 17297
1694 @end example
1695
1696 converts to an integer whose value is 17297.
1697
1698 @example
1699 1.983e-4
1700 @end example
1701
1702 converts to a float whose value is 1.983e-4, or .0001983.
1703
1704 @example
1705 ?b
1706 @end example
1707
1708 converts to a char that represents the lowercase letter b.
1709
1710 @example
1711 ?^[$(B#&^[(B
1712 @end example
1713
1714 (where @samp{^[} actually is an @samp{ESC} character) converts to a
1715 particular Kanji character when using an ISO2022-based coding system for
1716 input. (To decode this goo: @samp{ESC} begins an escape sequence;
1717 @samp{ESC $ (} is a class of escape sequences meaning ``switch to a
1718 94x94 character set''; @samp{ESC $ ( B} means ``switch to Japanese
1719 Kanji''; @samp{#} and @samp{&} collectively index into a 94-by-94 array
1720 of characters [subtract 33 from the ASCII value of each character to get
1721 the corresponding index]; @samp{ESC (} is a class of escape sequences
1722 meaning ``switch to a 94 character set''; @samp{ESC (B} means ``switch
1723 to US ASCII''.  It is a coincidence that the letter @samp{B} is used to
1724 denote both Japanese Kanji and US ASCII.  If the first @samp{B} were
1725 replaced with an @samp{A}, you'd be requesting a Chinese Hanzi character
1726 from the GB2312 character set.)
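
The index arithmetic in the parenthetical above can be sketched in C.
This is only an illustration: @code{sketch_iso2022_index} is a
hypothetical helper, not a function in the XEmacs sources, and it
assumes an ASCII execution character set.

```c
#include <assert.h>

/* Hypothetical helper (not part of XEmacs): map one byte of a 94x94
   character-set code, which lies in the ASCII range 33..126, to a
   zero-based row or column index by subtracting 33. */
static int
sketch_iso2022_index (unsigned char byte)
{
  assert (byte >= 33 && byte <= 126);
  return byte - 33;
}
```

Under these assumptions, @samp{#} (ASCII 35) maps to index 2 and
@samp{&} (ASCII 38) maps to index 5.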
1727
1728 @example
1729 "foobar"
1730 @end example
1731
1732 converts to a string.
1733
1734 @example
1735 foobar
1736 @end example
1737
1738 converts to a symbol whose name is @code{"foobar"}.  This is done by
1739 looking up the string equivalent in the global variable
1740 @code{obarray}, whose contents should be an obarray.  If no symbol
1741 is found, a new symbol with the name @code{"foobar"} is automatically
1742 created and added to @code{obarray}; this process is called
1743 @dfn{interning} the symbol.
1744 @cindex interning
1745
1746 @example
1747 (foo . bar)
1748 @end example
1749
1750 converts to a cons cell containing the symbols @code{foo} and @code{bar}.
1751
1752 @example
1753 (1 a 2.5)
1754 @end example
1755
1756 converts to a three-element list containing the specified objects
1757 (note that a list is actually a set of nested conses; see the
1758 XEmacs Lisp Reference).
1759
1760 @example
1761 [1 a 2.5]
1762 @end example
1763
1764 converts to a three-element vector containing the specified objects.
1765
1766 @example
1767 #[... ... ... ...]
1768 @end example
1769
1770 converts to a compiled-function object (the actual contents are not
1771 shown since they are not relevant here; look at a file that ends with
1772 @file{.elc} for examples).
1773
1774 @example
1775 #*01110110
1776 @end example
1777
1778 converts to a bit-vector.
1779
1780 @example
1781 #s(hash-table ... ...)
1782 @end example
1783
1784 converts to a hash table (the actual contents are not shown).
1785
1786 @example
1787 #s(range-table ... ...)
1788 @end example
1789
1790 converts to a range table (the actual contents are not shown).
1791
1792 @example
1793 #s(char-table ... ...)
1794 @end example
1795
1796 converts to a char table (the actual contents are not shown).
1797
1798 Note that the @code{#s()} syntax is the general syntax for structures,
1799 which are not really implemented in XEmacs Lisp but should be.
1800
1801 When an object is printed out (using @code{print} or a related
1802 function), the read syntax is used, so that the same object can be read
1803 in again.
1804
1805 The other objects do not have read syntaxes, usually because it does not
1806 really make sense to create them in this fashion (e.g. processes, where
1807 it doesn't make sense to have a subprocess created as a side effect of
1808 reading some Lisp code), or because they can't be created at all
1809 (e.g. subrs).  Permanent objects, as a rule, do not have a read syntax;
1810 nor do most complex objects, which contain too much state to be easily
1811 initialized through a read syntax.
1812
1813 @node How Lisp Objects Are Represented in C, Rules When Writing New C Code, The XEmacs Object System (Abstractly Speaking), Top
1814 @chapter How Lisp Objects Are Represented in C
1815 @cindex Lisp objects are represented in C, how
1816 @cindex objects are represented in C, how Lisp
1817 @cindex represented in C, how Lisp objects are
1818
1819 Lisp objects are represented in C using a 32-bit or 64-bit machine word
1820 (depending on the processor; e.g. DEC Alphas use 64-bit Lisp objects and
1821 most other processors use 32-bit Lisp objects).  The representation
1822 stuffs a pointer together with a tag, as follows:
1823
1824 @example
1825  [ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ]
1826  [ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ]
1827
1828    <---------------------------------------------------------> <->
1829             a pointer to a structure, or an integer            tag
1830 @end example
1831
1832 A tag of 00 is used for all pointer object types, a tag of 10 is used
1833 for characters, and the other two tags 01 and 11 are joined together to
1834 form the integer object type.  This representation gives us 31-bit
1835 integers and 30-bit characters, while pointers are represented directly
1836 without any bit masking or shifting.  This representation, though,
1837 assumes that pointers to structs are always aligned to multiples of 4,
1838 so the lower 2 bits are always zero.
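
The tag layout just described can be sketched as follows.  This is a
minimal model, not the real @file{lisp.h} code: the type
@code{sketch_obj} and every @code{sketch_} function are hypothetical
names chosen for illustration, and a two's-complement machine is
assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the integral flavor of Lisp_Object. */
typedef intptr_t sketch_obj;

/* Low two bits: 00 = pointer, 10 = character, 01 or 11 = integer.
   Because both integer tags have the lowest bit set, testing that
   single bit is enough to recognize an integer. */
static int sketch_is_int (sketch_obj o)     { return (o & 1) == 1; }
static int sketch_is_char (sketch_obj o)    { return (o & 3) == 2; }
static int sketch_is_pointer (sketch_obj o) { return (o & 3) == 0; }

/* A pointer is stored as-is; this relies on structures being aligned
   to a multiple of 4, so that the tag bits are already 00. */
static sketch_obj
sketch_from_pointer (void *p)
{
  assert (((uintptr_t) p & 3) == 0);
  return (sketch_obj) (uintptr_t) p;
}

/* A character code is shifted left past the two tag bits and then
   tagged 10, which leaves 30 bits for the code on a 32-bit word. */
static sketch_obj
sketch_from_char (uint32_t c)
{
  return (sketch_obj) (((uintptr_t) c << 2) | 2);
}
```

Note that in this model the pointer case needs no masking or shifting
at all, which is the point of giving pointers the 00 tag.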
1839
1840 Lisp objects use the typedef @code{Lisp_Object}, but the actual C type
1841 used for the Lisp object can vary.  It can be either a simple type
1842 (@code{long} on the DEC Alpha, @code{int} on other machines) or a
1843 structure whose fields are bit fields that line up properly (actually, a
1844 union of structures is used).  The choice of which type to use is
1845 determined by the preprocessor constant @code{USE_UNION_TYPE} which is
1846 defined via the @code{--use-union-type} option to @code{configure}.
1847
1848 Generally the simple integral type is preferable because it ensures that
1849 the compiler will actually use a machine word to represent the object
1850 (some compilers will use more general and less efficient code for unions
1851 and structs even if they can fit in a machine word).  The union type,
1852 however, has the advantage of stricter @emph{static} type checking.
1853 Places where a @code{Lisp_Object} is mistakenly passed to a routine
1854 expecting an @code{int} (or vice-versa), or a check is written @samp{if
1855 (foo)} (instead of @samp{if (!NILP (foo))}), will be flagged as errors.
1856 (None of these lead to the expected results!  @code{Qnil} is not
1857 represented as 0 (so @samp{if (foo)} will *ALWAYS* be true for a
1858 @code{Lisp_Object}), and the representation of an integer as a
1859 @code{Lisp_Object} is not just the integer's numeric value, but usually
1860 2x the integer +/- 1.)
1861
1862 There used to be a claim that the union type simplified debugging.
1863 There may have been a grain of truth to this pre-19.8, when there was no
1864 @samp{lrecord} type and all objects had a separate type appearing in the
1865 tag.  Nowadays, however, there is no debugging gain, and in fact
1866 frequent debugging *@emph{loss}*, since many debuggers don't handle
1867 unions very well, and usually there is no way to directly specify a
1868 union from a debugging prompt.
1869
1870 Furthermore, release builds should *@emph{not}* be done with union type
1871 because (a) you may get less efficiency, with compilers that can't
1872 figure out how to optimize the union into a machine word; (b) even
1873 worse, the union type often triggers miscompilation, especially when
1874 combined with Mule and error-checking.  This has been the case at
1875 various times when using GCC and MS VC, at least with @samp{--pdump}.
1876 Therefore, be warned!
1877
1878 As of 2002 4Q, miscompilation is known to happen with current versions
1879 of @strong{Microsoft VC++} and @strong{GCC in combination with Mule,
1880 pdump, and KKCC} (no error checking).
1881
1882 Various macros are used to convert between @code{Lisp_Object}s and the
1883 corresponding C type.  Macros of the form @code{XINT()}, @code{XCHAR()},
1884 @code{XSTRING()}, and @code{XSYMBOL()} do any required bit shifting and/or
1885 masking and cast the result to the appropriate type.  @code{XINT()} needs to be
1886 a bit tricky so that negative numbers are properly sign-extended.  Since
1887 integers are stored left-shifted, if the right-shift operator does an
1888 arithmetic shift (i.e. it leaves the most-significant bit as-is rather
1889 than shifting in a zero, so that it mimics a divide-by-two even for
1890 negative numbers) the shift to remove the tag bit is enough.  This is
1891 the case on all the systems we support.
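
As a rough sketch of the shifting involved (assuming the one-bit
integer tag described earlier, and using the hypothetical names
@code{sketch_make_int} and @code{sketch_xint} rather than the real
macros):

```c
#include <stdint.h>

/* Hypothetical stand-in for the integral flavor of Lisp_Object. */
typedef intptr_t sketch_obj;

/* Store N shifted left by one bit with the low tag bit set, i.e. as
   2*N + 1.  The shift is done on an unsigned type because
   left-shifting a negative signed value is undefined in C. */
static sketch_obj
sketch_make_int (intptr_t n)
{
  return (sketch_obj) (((uintptr_t) n << 1) | 1);
}

/* Drop the tag bit again.  Right-shifting a negative signed value is
   implementation-defined in C; this sketch assumes an arithmetic
   shift, which preserves the sign. */
static intptr_t
sketch_xint (sketch_obj o)
{
  return o >> 1;
}
```

With an arithmetic shift, a negative value such as -5 survives the
round trip; with a logical shift it would come back as a large
positive number, which is why the real macro has to be careful.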
1892
1893 Note that when @code{ERROR_CHECK_TYPECHECK} is defined, the converter
1894 macros become more complicated---they check the tag bits and/or the
1895 type field in the first four bytes of a record type to ensure that the
1896 object is really of the correct type.  This is great for catching places
1897 where an incorrect type is being dereferenced---this typically results
1898 in a pointer being dereferenced as the wrong type of structure, with
1899 unpredictable (and sometimes not easily traceable) results.
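
The idea of a checked converter can be sketched like this.  The
structure layout, the flag @code{SKETCH_ERROR_CHECK_TYPECHECK}, and
all other @code{sketch_} names are assumptions made for the example;
they only imitate the real mechanism.

```c
#include <assert.h>
#include <stddef.h>

/* Defined here so that the checking variant is the one compiled. */
#define SKETCH_ERROR_CHECK_TYPECHECK 1

enum sketch_type { SKETCH_CONS, SKETCH_STRING };

/* Every record starts with a header that names its type. */
struct sketch_header { enum sketch_type type; };

struct sketch_cons   { struct sketch_header header; void *car, *cdr; };
struct sketch_string { struct sketch_header header; const char *data; };

/* With checking enabled, look at the header before trusting the cast;
   without it, the check compiles away to nothing. */
#ifdef SKETCH_ERROR_CHECK_TYPECHECK
# define SKETCH_CHECK_TYPE(p, t) \
    assert (((struct sketch_header *) (p))->type == (t))
#else
# define SKETCH_CHECK_TYPE(p, t) ((void) 0)
#endif

/* Hypothetical analogue of a converter macro for conses. */
static struct sketch_cons *
sketch_xcons (void *p)
{
  SKETCH_CHECK_TYPE (p, SKETCH_CONS);
  return (struct sketch_cons *) p;
}
```

In this sketch, passing a pointer to a @code{struct sketch_string} to
@code{sketch_xcons} aborts at the point of the mistake instead of
silently treating the string's fields as a car and a cdr.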
1900
1901 There are similar @code{XSET@var{TYPE}()} macros that construct a Lisp
1902 object.  These macros are of the form @code{XSET@var{TYPE}
1903 (@var{lvalue}, @var{result})}, i.e. they have to be a statement rather
1904 than just used in an expression.  The reason for this is that standard C
1905 doesn't let you ``construct'' a structure (but GCC does).  Granted, this
1906 sometimes isn't too convenient; for the case of integers, at least, you
1907 can use the function @code{make_int()}, which constructs and
1908 @emph{returns} an integer Lisp object.  Note that the
1909 @code{XSET@var{TYPE}()} macros are also affected by
1910 @code{ERROR_CHECK_TYPECHECK} and make sure that the structure is of the
1911 right type in the case of record types, where the type is contained in
1912 the structure.
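
The difference between the statement-style setter and a constructor
function can be illustrated with a toy union type.  Everything named
@code{sketch_} or @code{SKETCH_} below is hypothetical, and the
integer encoding assumes the one-bit integer tag described earlier.

```c
#include <stdint.h>

/* Hypothetical stand-in for the union flavor of Lisp_Object. */
typedef union { intptr_t bits; } sketch_obj;

/* Statement-style setter: it assigns into an existing lvalue, so it
   cannot appear in the middle of an expression. */
#define SKETCH_XSETINT(lvalue, n) do {                          \
  (lvalue).bits = (intptr_t) (((uintptr_t) (n) << 1) | 1);      \
} while (0)

/* Function-style constructor: it builds the object in a local
   variable and returns it, so the call can be used as an expression. */
static sketch_obj
sketch_make_int (intptr_t n)
{
  sketch_obj result;
  SKETCH_XSETINT (result, n);
  return result;
}
```

A caller that already has a variable to fill can use the setter
directly; a caller that wants to pass a freshly built object as a
function argument needs the function form.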
1913
1914 The C programmer is responsible for @strong{guaranteeing} that a
1915 Lisp_Object is the correct type before using the @code{X@var{TYPE}}
1916 macros.  This is especially important in the case of lists.  Use
1917 @code{XCAR} and @code{XCDR} if a Lisp_Object is certainly a cons cell,
1918 else use @code{Fcar()} and @code{Fcdr()}.  Trust other C code, but not
1919 Lisp code.  On the other hand, if XEmacs has an internal logic error,
1920 it's better to crash immediately, so sprinkle @code{assert()}s and
1921 ``unreachable'' @code{abort()}s liberally about the source code.  Where
1922 performance is an issue, use @code{type_checking_assert},
1923 @code{bufpos_checking_assert}, and @code{gc_checking_assert}, which do
1924 nothing unless the corresponding configure error checking flag was
1925 specified.
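
The distinction between trusting and validating accessors might look
like this in miniature.  The object model is invented for the example
(a flag instead of a tag, @code{NULL} standing in for @code{nil}, an
error code instead of a signaled Lisp error); only the division of
responsibility mirrors the text.

```c
#include <assert.h>
#include <stddef.h>

/* Toy object: either a cons (is_cons != 0) or some other object. */
struct sketch_obj { int is_cons; struct sketch_obj *car, *cdr; };

/* In the spirit of XCAR: the caller guarantees that O is a cons.  The
   assert documents that guarantee and catches internal logic errors
   immediately. */
static struct sketch_obj *
sketch_xcar (struct sketch_obj *o)
{
  assert (o != NULL && o->is_cons);
  return o->car;
}

/* In the spirit of Fcar: O may come from Lisp code, so validate it.
   Returns 0 and stores the car in *OUT on success, or -1 if O is
   neither a cons nor nil.  As in Lisp, the car of nil is nil. */
static int
sketch_fcar (struct sketch_obj *o, struct sketch_obj **out)
{
  if (o == NULL)
    {
      *out = NULL;
      return 0;
    }
  if (!o->is_cons)
    return -1;
  *out = o->car;
  return 0;
}
```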
1926
1927 @node Rules When Writing New C Code, Regression Testing XEmacs, How Lisp Objects Are Represented in C, Top
1928 @chapter Rules When Writing New C Code
1929 @cindex writing new C code, rules when
1930 @cindex C code, rules when writing new
1931 @cindex code, rules when writing new C
1932
1933 The XEmacs C code is extremely complex and intricate, and there are many
1934 rules that are more or less consistently followed throughout the code.
1935 Many of these rules are not obvious, so they are explained here.  It is
1936 of the utmost importance that you follow them.  If you don't, you may
1937 get something that appears to work, but which will crash in odd
1938 situations, often in code far away from where the actual breakage is.
1939
1940 @menu
1941 * General Coding Rules::
1942 * Writing Lisp Primitives::
1943 * Writing Good Comments::
1944 * Adding Global Lisp Variables::
1945 * Proper Use of Unsigned Types::
1946 * Coding for Mule::
1947 * Techniques for XEmacs Developers::
1948 @end menu
1949
1950 @node General Coding Rules
1951 @section General Coding Rules
1952 @cindex coding rules, general
1953
1954 The C code is actually written in a dialect of C called @dfn{Clean C},
1955 meaning that it can be compiled, mostly warning-free, with either a C or
1956 C++ compiler.  Coding in Clean C has several advantages over plain C.
1957 C++ compilers are more nit-picking, and a number of coding errors have
1958 been found by compiling with C++.  The ability to use both C and C++
1959 tools means that a greater variety of development tools is available to
1960 the developer.
1961
1962 Every module includes @file{<config.h>} (angle brackets so that
1963 @samp{--srcdir} works correctly; @file{config.h} may or may not be in
1964 the same directory as the C sources) and @file{lisp.h}.  @file{config.h}
1965 must always be included before any other header files (including
1966 system header files) to ensure that certain tricks played by various
1967 @file{s/} and @file{m/} files work out correctly.
1968
1969 When including header files, always use angle brackets, not double
1970 quotes, except when the file to be included is always in the same
1971 directory as the including file.  If either file is a generated file,
1972 then that is not likely to be the case.  In order to understand why we
1973 have this rule, imagine what happens when you do a build in the source
1974 directory using @samp{./configure} and another build in another
1975 directory using @samp{../work/configure}.  There will be two different
1976 @file{config.h} files.  Which one will be used if you @samp{#include
1977 "config.h"}?
1978
1979 Almost every module contains a @code{syms_of_*()} function and a
1980 @code{vars_of_*()} function.  The former declares any Lisp primitives
1981 you have defined and defines any symbols you will be using.  The latter
1982 declares any global Lisp variables you have added and initializes global
1983 C variables in the module.  @strong{Important}: There are stringent
1984 requirements on exactly what can go into these functions.  See the
1985 comment in @file{emacs.c}.  The reason for this is to avoid obscure
1986 unwanted interactions during initialization.  If you don't follow these
1987 rules, you'll be sorry!  If you want to do anything that isn't allowed,
1988 create a @code{complex_vars_of_*()} function for it.  Doing this is
1989 tricky, though: you have to make sure your function is called at the
1990 right time so that all the initialization dependencies work out.
1991
1992 Declare each function of these kinds in @file{symsinit.h}.  Make sure
1993 it's called in the appropriate place in @file{emacs.c}.  You never need
1994 to include @file{symsinit.h} directly, because it is included by
1995 @file{lisp.h}.
1996
1997 @strong{All global and static variables that are to be modifiable must
1998 be declared uninitialized.}  This means that you may not use the
1999 ``declare with initializer'' form for these variables, such as @code{int
2000 some_variable = 0;}.  The reason for this has to do with some kludges
2001 done during the dumping process: If possible, the initialized data
2002 segment is re-mapped so that it becomes part of the (unmodifiable) code
2003 segment in the dumped executable.  This allows this memory to be shared
2004 among multiple running XEmacs processes.  XEmacs is careful to place as
2005 much constant data as possible into initialized variables during the
2006 @file{temacs} phase.
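
In practice the rule leads to the following pattern.  This is a sketch
only: @code{vars_of_sketch} and both variables are made-up names that
follow the @code{vars_of_*()} convention described earlier.

```c
/* Modifiable globals and statics are declared without an initializer,
   so they are not placed in the initialized data segment ... */
static int sketch_counter;
static const char *sketch_mode_name;

/* ... and receive their starting values from an initialization
   function instead.  (Hypothetical function, modeled on the
   vars_of_*() convention.) */
static void
vars_of_sketch (void)
{
  sketch_counter = 0;
  sketch_mode_name = "fundamental";
}
```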
2007
2008 @cindex copy-on-write
2009 @strong{Please note:} This kludge only works on a few systems nowadays,
2010 and is rapidly becoming irrelevant because most modern operating systems
2011 provide @dfn{copy-on-write} semantics.  All data is initially shared
2012 between processes, and a private copy is automatically made (on a
2013 page-by-page basis) when a process first attempts to write to a page of
2014 memory.
2015
2016 Formerly, there was a requirement that static variables not be declared
2017 inside of functions.  This had to do with another hack in the same
2018 vein as what was just described: old USG systems put statically-declared
2019 variables in the initialized data space, so those header files had a
2020 @code{#define static} declaration. (That way, the data-segment remapping
2021 described above could still work.) This fails badly on static variables
2022 inside of functions, which suddenly become automatic variables;
2023 therefore, you weren't supposed to have any of them.  This awful kludge
2024 has been removed in XEmacs because
2025
2026 @enumerate
2027 @item
2028 almost all of the systems that used this kludge ended up having
2029 to disable the data-segment remapping anyway;
2030 @item
2031 the only systems that didn't were extremely outdated ones;
2032 @item
2033 this hack completely messed up inline functions.
2034 @end enumerate
2035
2036 The C source code makes heavy use of C preprocessor macros.  One popular
2037 macro style is:
2038
2039 @example
2040 #define FOO(var, value) do @{            \
2041   Lisp_Object FOO_value = (value);      \
2042   ... /* compute using FOO_value */     \
2043   (var) = bar;                          \
2044 @} while (0)
2045 @end example
2046
2047 The @code{do @{...@} while (0)} is a standard trick to allow FOO to have
2048 statement semantics, so that it can safely be used within an @code{if}
2049 statement in C, for example.  Multiple evaluation is prevented by
2050 copying a supplied argument into a local variable, so that
2051 @code{FOO(var,fun(1))} only calls @code{fun} once.
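
A small self-contained instance of this macro style shows both
properties.  @code{SKETCH_RAISE_TO}, @code{sketch_fun}, and
@code{sketch_demo} are hypothetical names invented for the example.

```c
/* Counts how often sketch_fun is called. */
static int sketch_calls;

static int
sketch_fun (int x)
{
  sketch_calls++;
  return x;
}

/* Raise VAR to at least VALUE.  VALUE is copied into a local first,
   so it is evaluated once even though the body uses it twice. */
#define SKETCH_RAISE_TO(var, value) do {        \
  int SKETCH_value = (value);                   \
  if ((var) < SKETCH_value)                     \
    (var) = SKETCH_value;                       \
} while (0)

/* The do-while wrapper lets the macro sit in an if/else like any
   other single statement, trailing semicolon included. */
static int
sketch_demo (int flag)
{
  int v = 0;
  if (flag)
    SKETCH_RAISE_TO (v, sketch_fun (7));
  else
    v = -1;
  return v;
}
```

Had the macro body been written with bare braces instead of
@code{do ... while (0)}, the semicolon after the macro call would end
the @code{if} statement and the @code{else} would fail to compile.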
2052
2053 Lisp lists are popular data structures in the C code as well as in
2054 Elisp.  There are two sets of macros that iterate over lists.
2055 @code{EXTERNAL_LIST_LOOP_@var{n}} should be used when the list has been
2056 supplied by the user, and cannot be trusted to be acyclic and
2057 @code{nil}-terminated.  A @code{malformed-list} or @code{circular-list} error
2058 will be generated if the list being iterated over is not entirely
2059 kosher.  @code{LIST_LOOP_@var{n}}, on the other hand, is faster and less
2060 safe, and can be used only on trusted lists.
2061
2062 Related macros are @code{GET_EXTERNAL_LIST_LENGTH} and
2063 @code{GET_LIST_LENGTH}, which calculate the length of a list;
2064 @code{GET_EXTERNAL_LIST_LENGTH} also validates that the list is
2065 proper.  The macros @code{EXTERNAL_LIST_LOOP_DELETE_IF} and
2066 @code{LIST_LOOP_DELETE_IF} delete from a Lisp list the elements that satisfy some
2067 predicate.
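
To give a feel for what validating an untrusted list involves, here is
a sketch of a length computation that also detects circularity.  It
uses the well-known tortoise-and-hare technique on a toy cons
structure; whether the real macros work exactly this way is not
claimed here, and all @code{sketch_} names are hypothetical.

```c
#include <stddef.h>

/* Toy cons cell; NULL plays the role of nil. */
struct sketch_cons { int car; struct sketch_cons *cdr; };

/* Return the length of LIST, or -1 if the list is circular.  The hare
   advances one cell per iteration and the tortoise one cell every
   second iteration; if the hare ever lands on the tortoise, the list
   must loop back on itself. */
static long
sketch_external_list_length (const struct sketch_cons *list)
{
  const struct sketch_cons *hare = list;
  const struct sketch_cons *tortoise = list;
  long len = 0;

  while (hare != NULL)
    {
      hare = hare->cdr;
      len++;
      if ((len & 1) == 0)
        tortoise = tortoise->cdr;
      if (hare != NULL && hare == tortoise)
        return -1;
    }
  return len;
}
```

Because the toy @code{cdr} field can only hold another cons or
@code{NULL}, this sketch cannot represent a dotted (malformed) list; a
version working on real Lisp objects would also have to check that
each cdr is a cons or @code{nil}.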
2068
2069 @node Writing Lisp Primitives
2070 @section Writing Lisp Primitives
2071 @cindex writing Lisp primitives
2072 @cindex Lisp primitives, writing
2073 @cindex primitives, writing Lisp
2074
2075 Lisp primitives are Lisp functions implemented in C.  The details of
2076 interfacing the C function so that Lisp can call it are handled by a few
2077 C macros.  The only way to really understand how to write new C code is
2078 to read the source, but we can explain some things here.
2079
2080 An example of a special form is the definition of @code{prog1}, from
2081 @file{eval.c}.  (An ordinary function would have the same general
2082 appearance.)
2083
2084 @cindex garbage collection protection
2085 @smallexample
2086 @group
2087 DEFUN ("prog1", Fprog1, 1, UNEVALLED, 0, /*
2088 Similar to `progn', but the value of the first form is returned.
2089 \(prog1 FIRST BODY...): All the arguments are evaluated sequentially.
2090 The value of FIRST is saved during evaluation of the remaining args,
2091 whose values are discarded.
2092 */
2093        (args))
2094 @{
2095   /* This function can GC */
2096   REGISTER Lisp_Object val, form, tail;
2097   struct gcpro gcpro1;
2098
2099   val = Feval (XCAR (args));
2100
2101   GCPRO1 (val);
2102
2103   LIST_LOOP_3 (form, XCDR (args), tail)
2104     Feval (form);
2105
2106   UNGCPRO;
2107   return val;
2108 @}
2109 @end group
2110 @end smallexample
2111
2112   Let's start with a precise explanation of the arguments to the
2113 @code{DEFUN} macro.  Here is a template for them:
2114
2115 @example
2116 @group
2117 DEFUN (@var{lname}, @var{fname}, @var{min_args}, @var{max_args}, @var{interactive}, /*
2118 @var{docstring}
2119 */
2120    (@var{arglist}))
2121 @end group
2122 @end example
2123
2124 @table @var
2125 @item lname
2126 This string is the name of the Lisp symbol to define as the function
2127 name; in the example above, it is @code{"prog1"}.
2128
2129 @item fname
2130 This is the C function name for this function.  This is the name that is
2131 used in C code for calling the function.  The name is, by convention,
2132 @samp{F} prepended to the Lisp name, with all dashes (@samp{-}) in the
2133 Lisp name changed to underscores.  Thus, to call this function from C
2134 code, call @code{Fprog1}.  Remember that the arguments are of type
2135 @code{Lisp_Object}; various macros and functions for creating values of
2136 type @code{Lisp_Object} are declared in the file @file{lisp.h}.
2137
2138 Primitives whose names are special characters (e.g. @code{+} or
2139 @code{<}) are named by spelling out, in some fashion, the special
2140 character: e.g. @code{Fplus()} or @code{Flss()}.  Primitives whose names
2141 begin with normal alphanumeric characters but also contain special
2142 characters are spelled out in some creative way, e.g. @code{let*}
2143 becomes @code{FletX()}.
2144
2145 Each function also has an associated structure that holds the data for
2146 the subr object that represents the function in Lisp.  This structure
2147 conveys the Lisp symbol name to the initialization routine that will
2148 create the symbol and store the subr object as its definition.  The C
2149 variable name of this structure is always @samp{S} prepended to the
2150 @var{fname}.  You hardly ever need to be aware of the existence of this
2151 structure, since @code{DEFUN} plus @code{DEFSUBR} takes care of all the
2152 details.
2153
2154 @item min_args
2155 This is the minimum number of arguments that the function requires.  The
2156 function @code{prog1} allows a minimum of one argument.
2157
2158 @item max_args
2159 This is the maximum number of arguments that the function accepts, if
2160 there is a fixed maximum.  Alternatively, it can be @code{UNEVALLED},
2161 indicating a special form that receives unevaluated arguments, or
2162 @code{MANY}, indicating an unlimited number of evaluated arguments (the
2163 C equivalent of @code{&rest}).  Both @code{UNEVALLED} and @code{MANY}
2164 are macros.  If @var{max_args} is a number, it may not be less than
2165 @var{min_args} and it may not be greater than 8. (If you need to add a
2166 function with more than 8 arguments, use the @code{MANY} form.  Resist
2167 the urge to edit the definition of @code{DEFUN} in @file{lisp.h}.  If
2168 you do it anyway, make sure to also add another clause to the switch
2169 statement in @code{primitive_funcall().})
2170
2171 @item interactive
2172 This is an interactive specification, a string such as might be used as
2173 the argument of @code{interactive} in a Lisp function.  In the case of
2174 @code{prog1}, it is 0 (a null pointer), indicating that @code{prog1}
2175 cannot be called interactively.  A value of @code{""} indicates a
2176 function that should receive no arguments when called interactively.
2177
2178 @item docstring
2179 This is the documentation string.  It is written just like a
2180 documentation string for a function defined in Lisp; in particular, the
2181 first line should be a single sentence.  Note how the documentation
2182 string is enclosed in a comment, none of the documentation is placed on
2183 the same lines as the comment-start and comment-end characters, and the
2184 comment-start characters are on the same line as the interactive
2185 specification.  @file{make-docfile}, which scans the C files for
2186 documentation strings, is very particular about what it looks for, and
2187 will not properly extract the doc string if it's not in this exact format.
2188
2189 In order to make both @file{etags} and @file{make-docfile} happy, make
2190 sure that the @code{DEFUN} line contains the @var{lname} and
2191 @var{fname}, and that the comment-start characters for the doc string
2192 are on the same line as the interactive specification, and put a newline
2193 directly after them (and before the comment-end characters).
2194
2195 @item arglist
2196 This is the comma-separated list of arguments to the C function.  For a
2197 function with a fixed maximum number of arguments, provide a C argument
2198 for each Lisp argument.  In this case, unlike regular C functions, the
2199 types of the arguments are not declared; they are simply always of type
2200 @code{Lisp_Object}.
2201
2202 The names of the C arguments will be used as the names of the arguments
2203 to the Lisp primitive as displayed in its documentation, modulo the same
2204 concerns described above for @code{F...} names (in particular,
2205 underscores in the C arguments become dashes in the Lisp arguments).
2206
2207 There is one additional kludge: a trailing @samp{_} on the C argument is
2208 discarded when forming the Lisp argument.  This allows C language
2209 reserved words (like @code{default}) or global symbols (like
2210 @code{dirname}) to be used as argument names without compiler warnings
2211 or errors.
2212
2213 A Lisp function with @w{@var{max_args} = @code{UNEVALLED}} is a
2214 @w{@dfn{special form}}; its arguments are not evaluated.  Instead it
2215 receives one argument of type @code{Lisp_Object}, a (Lisp) list of the
2216 unevaluated arguments, conventionally named @code{(args)}.
2217
2218 When a Lisp function has no upper limit on the number of arguments,
2219 specify @w{@var{max_args} = @code{MANY}}.  In this case its implementation in
2220 C actually receives exactly two arguments: the number of Lisp arguments
2221 (an @code{int}) and the address of a block containing their values (a
2222 @w{@code{Lisp_Object *}}).  Only in this case are the C types specified
2223 in the @var{arglist}: @w{@code{(int nargs, Lisp_Object *args)}}.
2224
2225 @end table
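
As a sketch of how these pieces fit together, here is a hypothetical
primitive with one required and one optional argument.  The Lisp name
@code{my-value-or-default} and its behavior are invented for
illustration, and @code{NILP} is assumed to be the usual test for
@code{nil} from @file{lisp.h}.  This is a fragment meant to be compiled
as part of an XEmacs source file, not a standalone program.

@example
@group
DEFUN ("my-value-or-default", Fmy_value_or_default, 1, 2, 0, /*
Return VALUE, or DEFAULT if VALUE is nil.
*/
       (value, default_))
@{
  /* Both arguments are Lisp_Objects; an omitted optional argument
     arrives as nil.  The trailing underscore keeps the C reserved
     word `default' out of the C code, while the Lisp-level argument
     is still displayed as DEFAULT. */
  return NILP (value) ? default_ : value;
@}
@end group
@end example

@noindent
Note that the comment-start characters sit on the @code{DEFUN} line
right after the interactive specification (here 0, so the function
cannot be called interactively), and that the doc string occupies its
own lines, as @file{make-docfile} requires.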
2226
2227 Within the function @code{Fprog1} itself, note the use of the macros
2228 @code{GCPRO1} and @code{UNGCPRO}.  @code{GCPRO1} is used to ``protect''
2229 a variable from garbage collection---to inform the garbage collector
2230 that it must look in that variable and regard the object pointed at by
2231 its contents as an accessible object.  This is necessary whenever you
2232 call @code{Feval} or anything that can directly or indirectly call
2233 @code{Feval} (this includes the @code{QUIT} macro!).  At such a time,
2234 any Lisp object that you intend to refer to again must be protected
2235 somehow.  @code{UNGCPRO} cancels the protection of the variables that
2236 are protected in the current function.  It is necessary to do this
2237 explicitly.
2238
2239 The macro @code{GCPRO1} protects just one local variable.  If you want
2240 to protect two, use @code{GCPRO2} instead; repeating @code{GCPRO1} will
2241 not work.  Macros @code{GCPRO3} and @code{GCPRO4} also exist.
2242
2243 These macros implicitly use local variables such as @code{gcpro1}; you
2244 must declare these explicitly, with type @code{struct gcpro}.  Thus, if
2245 you use @code{GCPRO2}, you must declare @code{gcpro1} and @code{gcpro2}.
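
The following fragment sketches the declaration rule for
@code{GCPRO2}.  The helper @code{eval_three} is hypothetical, and
@code{Qnil} and @code{Fcons} are assumed to be available from
@file{lisp.h}; the fragment is meant to live inside an XEmacs source
file rather than to be compiled on its own.

@example
@group
/* Hypothetical helper: evaluate three forms in order and return
   their values as (FIRST SECOND . THIRD). */
static Lisp_Object
eval_three (Lisp_Object form1, Lisp_Object form2, Lisp_Object form3)
@{
  /* Initialize before protecting, so the garbage collector never
     looks at an uninitialized variable. */
  Lisp_Object first = Qnil, second = Qnil, third;
  /* GCPRO2 uses gcpro1 and gcpro2 implicitly; both must be declared. */
  struct gcpro gcpro1, gcpro2;

  GCPRO2 (first, second);
  first = Feval (form1);
  second = Feval (form2);    /* may GC; FIRST is protected */
  third = Feval (form3);     /* may GC; FIRST and SECOND are protected */
  UNGCPRO;                   /* must be done explicitly */

  return Fcons (first, Fcons (second, third));
@}
@end group
@end example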
2246
2247 @cindex caller-protects (@code{GCPRO} rule)
2248 Note also that the general rule is @dfn{caller-protects}; i.e. you are
2249 only responsible for protecting those Lisp objects that you create.  Any
2250 objects passed to you as arguments should have been protected by whoever
2251 created them, so you don't in general have to protect them.
2252
2253 In particular, the arguments to any Lisp primitive are always
2254 automatically @code{GCPRO}ed, when called ``normally'' from Lisp code or
2255 bytecode.  So only a few Lisp primitives that are called frequently from
2256 C code, such as @code{Fprogn}, protect their arguments as a service to
2257 their caller.  You don't need to protect your arguments when writing a
2258 new @code{DEFUN}.
2259
2260 @code{GCPRO}ing is perhaps the trickiest and most error-prone part of
2261 XEmacs coding.  It is @strong{extremely} important that you get this
2262 right and use a great deal of discipline when writing this code.
2263 @xref{GCPROing, ,@code{GCPRO}ing}, for full details on how to do this.
2264
2265 What @code{DEFUN} actually does is declare a global structure of type
2266 @code{Lisp_Subr} whose name begins with capital @samp{SF} and which
2267 contains information about the primitive (e.g. a pointer to the
2268 function, its minimum and maximum allowed arguments, a string describing
2269 its Lisp name); @code{DEFUN} then begins a normal C function declaration
2270 using the @code{F...} name.  The Lisp subr object that is the function
2271 definition of a primitive (i.e. the object in the function slot of the
2272 symbol that names the primitive) actually points to this @samp{SF}
2273 structure; when @code{Feval} encounters a subr, it looks in the
2274 structure to find out how to call the C function.
2275
2276 Defining the C function is not enough to make a Lisp primitive
2277 available; you must also create the Lisp symbol for the primitive (the
2278 symbol is @dfn{interned}; @pxref{Obarrays}) and store a suitable subr
2279 object in its function cell. (If you don't do this, the primitive won't
2280 be seen by Lisp code.) The code looks like this:
2281
2282 @example
2283 DEFSUBR (@var{fname});
2284 @end example
2285
2286 @noindent
2287 Here @var{fname} is the same name you used as the second argument to
2288 @code{DEFUN}.
2289
2290 This call to @code{DEFSUBR} should go in the @code{syms_of_*()} function
2291 at the end of the module.  If no such function exists, create it and
2292 make sure to also declare it in @file{symsinit.h} and call it from the
2293 appropriate spot in @code{main()}.  @xref{General Coding Rules}.
2294
2295 Note that C code cannot call functions by name unless they are defined
2296 in C.  The way to call a function written in Lisp from C is to use
2297 @code{Ffuncall}, which embodies the Lisp function @code{funcall}.  Since
2298 the Lisp function @code{funcall} accepts an unlimited number of
2299 arguments, in C it takes two: the number of Lisp-level arguments, and a
2300 one-dimensional array containing their values.  The first Lisp-level
2301 argument is the Lisp function to call, and the rest are the arguments to
2302 pass to it.  Since @code{Ffuncall} can call the evaluator, you must
2303 protect pointers from garbage collection around the call to
2304 @code{Ffuncall}. (However, @code{Ffuncall} explicitly protects all of
2305 its parameters, so you don't have to protect any pointers passed as
2306 parameters to it.)
2307
2308 The C functions @code{call0}, @code{call1}, @code{call2}, and so on,
2309 provide handy ways to call a Lisp function with a fixed
2310 number of arguments.  They work by calling @code{Ffuncall}.
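
As a rough illustration, the two hypothetical helpers below call a Lisp
function with two arguments, first through @code{Ffuncall} directly and
then through @code{call2}.  The helper names and their parameters are
invented for this sketch; only @code{Ffuncall} and @code{call2} come
from the text above.  The fragment belongs inside an XEmacs source file.

@example
@group
/* Call the Lisp function FN on A and B using Ffuncall directly. */
static Lisp_Object
apply_to_two (Lisp_Object fn, Lisp_Object a, Lisp_Object b)
@{
  Lisp_Object args[3];

  args[0] = fn;    /* first Lisp-level argument: the function itself */
  args[1] = a;     /* the remaining ones are passed on to FN */
  args[2] = b;
  return Ffuncall (3, args);
@}

/* The same call, written with the fixed-arity convenience function. */
static Lisp_Object
apply_to_two_short (Lisp_Object fn, Lisp_Object a, Lisp_Object b)
@{
  return call2 (fn, a, b);
@}
@end group
@end example

@noindent
In both versions, any other Lisp object that the caller created and
intends to use after the call still has to be protected around it.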
2311
2312 @file{eval.c} is a very good file to look through for examples;
2313 @file{lisp.h} contains the definitions for important macros and
2314 functions.
2315
2316 @node Writing Good Comments
2317 @section Writing Good Comments
2318 @cindex writing good comments
2319 @cindex comments, writing good
2320
2321 Comments are a lifeline for programmers trying to understand tricky
2322 code.  In general, the less obvious it is what you are doing, the more
2323 you need a comment, and the more detailed it needs to be.  You should
2324 always be on guard when you're writing code for stuff that's tricky, and
2325 should constantly be putting yourself in someone else's shoes and asking
2326 if that person could figure out without much difficulty what's going
2327 on. (Assume they are a competent programmer who understands the
2328 essentials of how the XEmacs code is structured but doesn't know much
2329 about the module you're working on or any algorithms you're using.) If
2330 you're not sure whether they would be able to, add a comment.  Always
2331 err on the side of more comments, rather than fewer.
2332
2333 Generally, when making comments, there is no need to attribute them with
2334 your name or initials.  This especially goes for small,
2335 easy-to-understand, non-opinionated ones.  Also, comments indicating
2336 where, when, and by whom a file was changed are @emph{strongly}
2337 discouraged, and in general will be removed as they are discovered.
2338 This is exactly what @file{ChangeLogs} are there for.  However, it can
2339 occasionally be useful to mark exactly where (but not when or by whom)
2340 changes are made, particularly when making small changes to a file
2341 imported from elsewhere.  These marks help when later on a newer version
2342 of the file is imported and the changes need to be merged. (If
2343 everything were always kept in CVS, there would be no need for this.
2344 But in practice, this often doesn't happen, or the CVS repository is
2345 later on lost or unavailable to the person doing the update.)
2346
2347 When putting in an explicit opinion in a comment, you should
2348 @emph{always} attribute it with your name, and optionally the date.
2349 This also goes for long, complex comments explaining in detail the
2350 workings of something -- by putting your name there, you make it
2351 possible for someone who has questions about how that thing works to
2352 determine who wrote the comment so they can write to them.  Preferably,
2353 use your actual name and not your initials, unless your initials are
2354 generally recognized (e.g. @samp{jwz}).  You can use only your first
2355 name if it's obvious who you are; otherwise, give first and last name.
2356 If you're not a regular contributor, you might consider putting your
2357 email address in -- it may be in the ChangeLog, but after a while
2358 ChangeLogs tend to disappear or get
2359 muddled. (E.g. your comment may get copied somewhere else or even into
2360 another program, and tracking down the proper ChangeLog may be very
2361 difficult.)
2362
2363 If you come across an opinion that is not or no longer valid, or you
2364 come across any comment that no longer applies but you want to keep it
2365 around, enclose it in @samp{[[ } and @samp{ ]]} marks and add a comment
2366 afterwards explaining why the preceding comment is no longer valid.  Put
2367 your name on this comment, as explained above.
2368
2369 Just as comments are a lifeline to programmers, incorrect comments are
2370 death.  If you come across an incorrect comment, @strong{immediately}
2371 correct it or flag it as incorrect, as described in the previous
2372 paragraph.  Whenever you work on a section of code, @emph{always} make
2373 sure to update any comments to be correct -- or, at the very least, flag
2374 them as incorrect.
2375
2376 To indicate a ``todo'' or other problem, use four pound signs --
2377 i.e. @samp{####}.
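
The conventions above might look like this in practice.  The function
@code{count_used_slots}, the history it describes, and the contributor
name are all made up for the sake of the example.

@example
@group
/* Return the number of nonzero entries among the first N slots of TABLE.

   [[ N never exceeds 16, so a linear scan is always cheap. ]]
   The 16-slot limit is gone; callers may now pass large tables.
   Jane Hacker (hypothetical contributor) */
static int
count_used_slots (const int *table, int n)
@{
  int i, used = 0;

  /* #### This scan is O(n); cache the count if it becomes a hot spot. */
  for (i = 0; i < n; i++)
    if (table[i] != 0)
      used++;
  return used;
@}
@end group
@end example

@noindent
The outdated remark is kept but bracketed, the correction that follows
it is signed, and the open problem is marked with @samp{####} so that
it can be found by searching.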
2378
2379 @node Adding Global Lisp Variables
2380 @section Adding Global Lisp Variables
2381 @cindex global Lisp variables, adding
2382 @cindex variables, adding global Lisp
2383
2384 Global variables whose names begin with @samp{Q} are constants whose
2385 value is a symbol of a particular name.  The name of the variable should
2386 be derived from the name of the symbol using the same rules as for Lisp
2387 primitives.  These variables are initialized using a call to
2388 @code{defsymbol()} in the @code{syms_of_*()} function. (This call
2389 interns a symbol, sets the C variable to the resulting Lisp object, and
2390 calls @code{staticpro()} on the C variable to tell the
2391 garbage-collection mechanism about this variable.  What
2392 @code{staticpro()} does is add a pointer to the variable to a large
2393 global array; when garbage-collection happens, all pointers listed in
2394 the array are used as starting points for marking Lisp objects.  This is
2395 important because it's quite possible that the only current reference to
2396 the object is the C variable.  In the case of symbols, the
2397 @code{staticpro()} doesn't matter all that much because the symbol is
2398 contained in @code{obarray}, which is itself @code{staticpro()}ed.
2399 However, it's possible that a naughty user could do something like
2400 uninterning the symbol out of @code{obarray} or even setting
2401 @code{obarray} to a different value [although this is likely to make
2402 XEmacs crash!].)
2403
2404   @strong{Please note:} It is potentially deadly if you declare a
2405 @samp{Q...} variable in two different modules.  The two calls to
2406 @code{defsymbol()} are no problem, but some linkers will complain about
2407 multiply-defined symbols.  The most insidious aspect of this is that
2408 often the link will succeed anyway, but then the resulting executable
2409 will sometimes crash in obscure ways during certain operations!  To
2410 avoid this problem, declare any symbols with common names (such as
2411 @code{text}) that are not obviously associated with this particular
2412 module in the module @file{general.c}.
2413
2414   Global variables whose names begin with @samp{V} are variables that
2415 contain Lisp objects.  The convention here is that all global variables
2416 of type @code{Lisp_Object} begin with @samp{V}, and all others don't
2417 (including integer and boolean variables that have Lisp
2418 equivalents).  Most of the time, these variables have equivalents in
2419 Lisp, but some don't.  Those that do are declared this way by a call to
2420 @code{DEFVAR_LISP()} in the @code{vars_of_*()} initializer for the
2421 module.  What this does is create a special @dfn{symbol-value-forward}
2422 Lisp object that contains a pointer to the C variable, intern a symbol
2423 whose name is as specified in the call to @code{DEFVAR_LISP()}, and set
2424 its value to the symbol-value-forward Lisp object; it also calls
2425 @code{staticpro()} on the C variable to tell the garbage-collection
2426 mechanism about the variable.  When @code{eval} (or actually
2427 @code{symbol-value}) encounters this special object in the process of
2428 retrieving a variable's value, it follows the indirection to the C
2429 variable and gets its value.  @code{setq} does similar things so that
2430 the C variable gets changed.
2431
2432   Whether or not you @code{DEFVAR_LISP()} a variable, you need to
2433 initialize it in the @code{vars_of_*()} function; otherwise it will end
2434 up as all zeroes, which is the integer 0 (@emph{not} @code{nil}), and
2435 this is probably not what you want.  Also, if the variable is not
2436 @code{DEFVAR_LISP()}ed, @strong{you must call} @code{staticpro()} on the
2437 C variable in the @code{vars_of_*()} function.  Otherwise, the
2438 garbage-collection mechanism won't know that the object in this variable
2439 is in use, and will happily collect it and reuse its storage for another
2440 Lisp object, and you will be the one who's unhappy when you can't figure
2441 out how your variable got overwritten.
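
Putting the three cases together, a module might contain something like
the following fragment.  The module name @code{mymodule} and all
variable names are hypothetical, and the placement of the doc comment
inside @code{DEFVAR_LISP()} is assumed to follow the same convention as
@code{DEFUN}; check an existing @code{vars_of_*()} function before
copying it.

@example
@group
Lisp_Object Qmy_feature;       /* the symbol `my-feature' */
Lisp_Object Vmy_setting;       /* visible to Lisp as `my-setting' */
static Lisp_Object Vmy_cache;  /* used from C only */

void
syms_of_mymodule (void)
@{
  /* Interns the symbol, stores it in the C variable, and
     staticpro()s the variable. */
  defsymbol (&Qmy_feature, "my-feature");
@}

void
vars_of_mymodule (void)
@{
  DEFVAR_LISP ("my-setting", &Vmy_setting /*
Hypothetical option used only to illustrate DEFVAR_LISP.
*/ );
  Vmy_setting = Qnil;      /* otherwise it would be the integer 0 */

  Vmy_cache = Qnil;
  staticpro (&Vmy_cache);  /* not DEFVAR_LISP()ed, so protect by hand */
@}
@end group
@end example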
2442
2443 @node Proper Use of Unsigned Types
2444 @section Proper Use of Unsigned Types
2445 @cindex unsigned types, proper use of
2446 @cindex types, proper use of unsigned
2447
2448 Avoid using @code{unsigned int} and @code{unsigned long} whenever
2449 possible.  Unsigned types are viral -- in arithmetic or comparisons that
2450 mix signed and unsigned operands, the signed operand is generally converted to
2451 unsigned, which is almost certainly not what you want.  Many subtle and
2452 hard-to-find bugs are created by careless use of unsigned types.  In
2453 general, you should almost @emph{never} use an unsigned type to hold a
2454 regular quantity of any sort.  The only exceptions are
2455
2456 @enumerate
2457 @item
2458 When there's a reasonable possibility you will actually need all 32 or
2459 64 bits to store the quantity.
2460 @item
2461 When calling existing APIs that require unsigned types.  In this case,
2462 you should still do all manipulation using signed types, and do the
2463 conversion at the very threshold of the API call.
2464 @item
2465 In existing code that you don't want to modify because you don't
2466 maintain it.
2467 @item
2468 In bit-field structures.
2469 @end enumerate
2470
2471 Other reasonable uses of @code{unsigned int} and @code{unsigned long}
2472 are representing non-quantities -- e.g. bit-oriented flags and such.
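
A small, self-contained sketch of the problem follows.  The function
names are invented for illustration; the behavior shown is that of
standard C when an @code{int} meets an @code{unsigned int}.

@example
@group
/* Mixed comparison: A is converted to unsigned int before the
   comparison, so a negative A turns into a huge positive value. */
static int
less_than_mixed (int a, unsigned int b)
@{
  return a < b;
@}

/* Signed throughout: behaves the way the arithmetic reads. */
static int
less_than_signed (int a, int b)
@{
  return a < b;
@}

/* The second exception above: do the arithmetic in a signed type
   and convert only at the threshold of an API that demands an
   unsigned value. */
static unsigned int
remaining_for_api (int capacity, int used)
@{
  int remaining = capacity - used;   /* may legitimately go negative */

  if (remaining < 0)
    remaining = 0;
  return (unsigned int) remaining;
@}
@end group
@end example

@noindent
Here @code{less_than_mixed (-1, 1)} yields 0 while
@code{less_than_signed (-1, 1)} yields 1, even though the two functions
differ only in the declared type of one parameter.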
2473
2474 @node Coding for Mule
2475 @section Coding for Mule
2476 @cindex coding for Mule
2477 @cindex Mule, coding for
2478
2479 Although Mule support is not compiled by default in XEmacs, many people
2480 are using it, and we consider it crucial that new code works correctly
2481 with multibyte characters.  This is not hard; it is only a matter of
2482 following several simple user-interface guidelines.  Even if you never
2483 compile with Mule, with a little practice you will find it quite easy
2484 to code Mule-correctly.
2485
2486 Note that these guidelines are not necessarily tied to the current Mule
2487 implementation; they are also a good idea to follow on the grounds of
2488 code generalization for future I18N work.
2489
2490 @menu
2491 * Character-Related Data Types::
2492 * Working With Character and Byte Positions::
2493 * Conversion to and from External Data::
2494 * General Guidelines for Writing Mule-Aware Code::
2495 * An Example of Mule-Aware Code::
2496 @end menu
2497
2498 @node Character-Related Data Types
2499 @subsection Character-Related Data Types
2500 @cindex character-related data types
2501 @cindex data types, character-related
2502
2503 First, let's review the basic character-related datatypes used by
2504 XEmacs.  Note that the separate @code{typedef}s are not mandatory in the
2505 current implementation (all of them boil down to @code{unsigned char} or
2506 @code{int}), but they improve clarity of code a great deal, because one
2507 glance at the declaration can tell the intended use of the variable.
2508
2509 @table @code
2510 @item Emchar
2511 @cindex Emchar
2512 An @code{Emchar} holds a single Emacs character.
2513
2514 Obviously, the equality between characters and bytes is lost in the Mule
2515 world.  Characters can be represented by one or more bytes in the
2516 buffer, and @code{Emchar} is the C type large enough to hold any
2517 character.
2518
2519 Without Mule support, an @code{Emchar} is equivalent to an
2520 @code{unsigned char}.
2521
2522 @item Bufbyte
2523 @cindex Bufbyte
2524 The data representing the text in a buffer or string is logically a sequence
2525 of @code{Bufbyte}s.
2526
2527 XEmacs does not work with the same character formats all the time; when
2528 reading characters from the outside, it decodes them to an internal
2529 format, and likewise encodes them when writing.  @code{Bufbyte} (in fact
2530 @code{unsigned char}) is the basic unit of the XEmacs internal buffer and
2531 string format.  A @code{Bufbyte *} is the type that points at text
2532 encoded in the variable-width internal encoding.
2533
2534 One character can correspond to one or more @code{Bufbyte}s.  In the
2535 current Mule implementation, an ASCII character is represented by a
2536 single @code{Bufbyte} of the same value, and other characters are represented by a sequence
2537 of two or more @code{Bufbyte}s.
2538
2539 Without Mule support, there are exactly 256 characters, implicitly
2540 Latin-1, and each character is represented using one @code{Bufbyte}, and
2541 there is a one-to-one correspondence between @code{Bufbyte}s and
2542 @code{Emchar}s.
2543
2544 @item Bufpos
2545 @itemx Charcount
2546 @cindex Bufpos
2547 @cindex Charcount
2548 A @code{Bufpos} represents a character position in a buffer or string.
2549 A @code{Charcount} represents a number (count) of characters.
2550 Logically, subtracting two @code{Bufpos} values yields a
2551 @code{Charcount} value.  Although all of these are @code{typedef}ed to
2552 @code{EMACS_INT}, we use them in preference to @code{EMACS_INT} to make
2553 it clear what sort of position is being used.
2554
2555 @code{Bufpos} and @code{Charcount} values are the only ones that are
2556 ever visible to Lisp.
2557
2558 @item Bytind
2559 @itemx Bytecount
2560 @cindex Bytind
2561 @cindex Bytecount
2562 A @code{Bytind} represents a byte position in a buffer or string.  A
2563 @code{Bytecount} represents the distance between two positions, in bytes.
2564 The relationship between @code{Bytind} and @code{Bytecount} is the same
2565 as the relationship between @code{Bufpos} and @code{Charcount}.
2566
2567 @item Extbyte
2568 @itemx Extcount
2569 @cindex Extbyte
2570 @cindex Extcount
2571 When dealing with the outside world, XEmacs works with @code{Extbyte}s,
2572 which are equivalent to @code{unsigned char}.  Obviously, an
2573 @code{Extcount} is the distance between two @code{Extbyte}s.  @code{Extbyte}s
2574 and @code{Extcount}s are not all that frequent in XEmacs code.
2575 @end table
2576
2577 @node Working With Character and Byte Positions
2578 @subsection Working With Character and Byte Positions
2579 @cindex character and byte positions, working with
2580 @cindex byte positions, working with character and
2581 @cindex positions, working with character and byte
2582
2583 Now that we have defined the basic character-related types, we can look
2584 at the macros and functions designed for working with them and for
2585 conversion between them.  Most of these macros are defined in
2586 @file{buffer.h}, and we don't discuss all of them here, but only the
2587 most important ones.  Examining the existing code is the best way to
2588 learn about them.
2589
2590 @table @code
2591 @item MAX_EMCHAR_LEN
2592 @cindex MAX_EMCHAR_LEN
2593 This preprocessor constant is the maximum number of buffer bytes needed to
2594 represent an Emacs character in the variable-width internal encoding.
2595 It is useful when allocating temporary strings to hold a known number of
2596 characters.  For instance:
2597
2598 @example
2599 @group
2600 @{
2601   Charcount cclen;
2602   ...
2603   @{
2604     /* Allocate place for @var{cclen} characters. */
2605     Bufbyte *buf = (Bufbyte *)alloca (cclen * MAX_EMCHAR_LEN);
2606 ...
2607 @end group
2608 @end example
2609
2610 If you followed the previous section, you can guess that, logically,
2611 multiplying a @code{Charcount} value by @code{MAX_EMCHAR_LEN} produces
2612 a @code{Bytecount} value.
2613
2614 In the current Mule implementation, @code{MAX_EMCHAR_LEN} equals 4.
2615 Without Mule, it is 1.
2616
2617 @item charptr_emchar
2618 @itemx set_charptr_emchar
2619 @cindex charptr_emchar
2620 @cindex set_charptr_emchar
2621 The @code{charptr_emchar} macro takes a @code{Bufbyte} pointer and
2622 returns the @code{Emchar} stored at that position.  If it were a
2623 function, its prototype would be:
2624
2625 @example
2626 Emchar charptr_emchar (Bufbyte *p);
2627 @end example
2628
2629 @code{set_charptr_emchar} stores an @code{Emchar} to the specified byte
2630 position.  It returns the number of bytes stored:
2631
2632 @example
2633 Bytecount set_charptr_emchar (Bufbyte *p, Emchar c);
2634 @end example
2635
2636 It is important to note that @code{set_charptr_emchar} is safe only for
2637 appending a character at the end of a buffer, not for overwriting a
2638 character in the middle.  This is because the width of characters
2639 varies, and @code{set_charptr_emchar} cannot resize the string if it
2640 writes, say, a two-byte character where a single-byte character used to
2641 reside.
2642
2643 A typical use of @code{set_charptr_emchar} can be demonstrated by this
2644 example, which copies characters from buffer @var{buf} to a temporary
2645 string of @code{Bufbyte}s, written through the pointer @var{p}.
2646
2647 @example
2648 @group
2649 @{
2650   Bufpos pos;
2651   for (pos = beg; pos < end; pos++)
2652     @{
2653       Emchar c = BUF_FETCH_CHAR (buf, pos);
2654       p += set_charptr_emchar (p, c);
2655     @}
2656 @}
2657 @end group
2658 @end example
2659
2660 Note how @code{set_charptr_emchar} is used to store the @code{Emchar}
2661 and advance the pointer @var{p} at the same time.
2662
2663 @item INC_CHARPTR
2664 @itemx DEC_CHARPTR
2665 @cindex INC_CHARPTR
2666 @cindex DEC_CHARPTR
2667 These two macros increment and decrement a @code{Bufbyte} pointer,
2668 respectively.  They will adjust the pointer by the appropriate number of
2669 bytes according to the byte length of the character stored there.  Both
2670 macros assume that the memory address is located at the beginning of a
2671 valid character.
2672
2673 Without Mule support, @code{INC_CHARPTR (p)} and @code{DEC_CHARPTR (p)}
2674 simply expand to @code{p++} and @code{p--}, respectively.
2675
2676 @item bytecount_to_charcount
2677 @cindex bytecount_to_charcount
2678 Given a pointer to a text string and a length in bytes, return the
2679 equivalent length in characters.
2680
2681 @example
2682 Charcount bytecount_to_charcount (Bufbyte *p, Bytecount bc);
2683 @end example
2684
2685 @item charcount_to_bytecount
2686 @cindex charcount_to_bytecount
2687 Given a pointer to a text string and a length in characters, return the
2688 equivalent length in bytes.
2689
2690 @example
2691 Bytecount charcount_to_bytecount (Bufbyte *p, Charcount cc);
2692 @end example
2693
2694 @item charptr_n_addr
2695 @cindex charptr_n_addr
2696 Return a pointer to the beginning of the character offset @var{cc} (in
2697 characters) from @var{p}.
2698
2699 @example
2700 Bufbyte *charptr_n_addr (Bufbyte *p, Charcount cc);
2701 @end example
2702 @end table
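
To make the difference between character counts and byte counts
concrete, here is a self-contained sketch of the two conversion
functions.  It is not the XEmacs implementation: it uses UTF-8 as a
stand-in for the Mule internal encoding, all @code{sketch_} names are
invented, and plain C types replace @code{Bufbyte}, @code{Charcount}
and @code{Bytecount}.  Like the real macros, it assumes that the
pointer is located at the beginning of a valid character.

@example
/* Byte length of the character starting at P, judged from its first
   byte.  UTF-8 rules, used here only as a stand-in encoding. */
static long
sketch_char_len (const unsigned char *p)
@{
  if (*p < 0x80) return 1;
  if (*p < 0xE0) return 2;
  if (*p < 0xF0) return 3;
  return 4;
@}

/* Number of characters contained in the first BC bytes at P. */
static long
sketch_bytecount_to_charcount (const unsigned char *p, long bc)
@{
  const unsigned char *end = p + bc;
  long cc = 0;

  while (p < end)
    @{
      p += sketch_char_len (p);
      cc++;
    @}
  return cc;
@}

/* Number of bytes occupied by the first CC characters at P. */
static long
sketch_charcount_to_bytecount (const unsigned char *p, long cc)
@{
  const unsigned char *start = p;

  while (cc-- > 0)
    p += sketch_char_len (p);
  return (long) (p - start);
@}
@end example

@noindent
Both conversions have to walk the text character by character, which is
why code should avoid converting back and forth more often than
necessary.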
2703
2704 @node Conversion to and from External Data
2705 @subsection Conversion to and from External Data
2706 @cindex conversion to and from external data
2707 @cindex external data, conversion to and from
2708
2709 When an external function, such as a C library function, returns a
2710 @code{char} pointer, you should almost never treat it as @code{Bufbyte}.
2711 This is because these returned strings may contain 8-bit characters which
2712 can be misinterpreted by XEmacs, and cause a crash.  Likewise, when
2713 exporting a piece of internal text to the outside world, you should
2714 always convert it to an appropriate external encoding, lest the internal
2715 stuff (such as the infamous \201 characters) leak out.
2716
2717 The interface for converting between the internal and external
2718 representations of text consists of the numerous conversion macros defined in
2719 @file{buffer.h}.  There used to be a fixed set of external formats
2720 supported by these macros, but now any coding system can be used with
2721 them.  The coding system alias mechanism is used to create the
2722 following logical coding systems, which replace the fixed external
2723 formats.  The @code{dontusethis-set-symbol-value-handler} mechanism was
2724 enhanced to make this possible (more work on that is needed, such as
2725 removing the @code{dontusethis-} prefix).
2726
2727 @table @code
2728 @item Qbinary
2729 This is the simplest format and is what we use in the absence of a more
2730 appropriate format.  This converts according to the @code{binary} coding
2731 system:
2732
2733 @enumerate a
2734 @item
2735 On input, bytes 0--255 are converted into (implicitly Latin-1)
2736 characters 0--255.  A non-Mule XEmacs doesn't really know about
2737 different character sets and the fonts to display them, so the bytes can
2738 be treated as text in different 1-byte encodings by simply setting the
2739 appropriate fonts.  So in a sense, non-Mule XEmacs is a multi-lingual
2740 editor if, for example, different fonts are used to display text in
2741 different buffers, faces, or windows.  The specifier mechanism gives the
2742 user complete control over this kind of behavior.
2743 @item
2744 On output, characters 0--255 are converted into bytes 0--255 and other
2745 characters are converted into @samp{~}.
2746 @end enumerate
2747
2748 @item Qfile_name
2749 Format used for filenames.  This is user-definable via either the
2750 @code{file-name-coding-system} or @code{pathname-coding-system} (now
2751 obsolete) variables.
2752
2753 @item Qnative
2754 Format used for the external Unix environment---@code{argv[]}, stuff
2755 from @code{getenv()}, stuff from the @file{/etc/passwd} file, etc.
2756 Currently this is the same as @code{Qfile_name}.  The two should be
2757 distinguished for clarity and possible future separation.
2758
2759 @item Qctext
2760 Compound-text format.  This is the standard X11 format used for data
2761 stored in properties, selections, and the like.  This is an 8-bit
2762 no-lock-shift ISO2022 coding system.  This is a real coding system,
2763 unlike @code{Qfile_name}, which is user-definable.
2764 @end table
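
The behavior described for @code{Qbinary} can be sketched in a few
lines of standalone C.  This is only a model of the mapping, not the
coding-system code itself: the @code{sketch_} functions are invented,
and a plain @code{int} stands in for @code{Emchar}.

@example
/* Input direction: each external byte becomes the character with
   the same code (implicitly Latin-1). */
static void
sketch_binary_decode (const unsigned char *in, int len, int *out)
@{
  int i;

  for (i = 0; i < len; i++)
    out[i] = in[i];
@}

/* Output direction: characters 0-255 map to the byte with the same
   value, and every other character is written as `~'. */
static void
sketch_binary_encode (const int *in, int len, unsigned char *out)
@{
  int i;

  for (i = 0; i < len; i++)
    out[i] = (in[i] >= 0 && in[i] <= 255) ? (unsigned char) in[i] : '~';
@}
@end example

@noindent
The output direction is lossy: any character outside the range 0--255
comes back as @samp{~}, so a round trip through this format only
preserves text that fits in one byte per character.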
2765
2766 There are two fundamental macros to convert between external and
2767 internal format.
2768
2769 @code{TO_INTERNAL_FORMAT} converts external data to internal format, and
2770 @code{TO_EXTERNAL_FORMAT} converts the other way around.  The arguments
2771 each of these receives are a source type, a source, a sink type, a sink,
2772 and a coding system (or a symbol naming a coding system).
2773
2774 A typical call looks like
2775 @example
2776 TO_EXTERNAL_FORMAT (LISP_STRING, str, C_STRING_MALLOC, ptr, Qfile_name);
2777 @end example
2778
2779 which means that the contents of the Lisp string @code{str} are written
2780 to a malloc'ed memory area which will be pointed to by @code{ptr} once
2781 the macro completes.  The conversion will be done using the
2782 @code{file-name} coding system, which will be controlled by the user
2783 indirectly by setting or binding the variable
2784 @code{file-name-coding-system}.
2785
2786 Some sources and sinks require two C variables to specify.  We use some
2787 preprocessor magic to allow different source and sink types, and even
2788 different numbers of arguments to specify different types of sources and
2789 sinks.
2790
2791 So we can have a call that looks like
2792 @example
2793 TO_INTERNAL_FORMAT (DATA, (ptr, len),
2794                     MALLOC, (ptr, len),
2795                     coding_system);
2796 @end example
2797
2798 The parenthesized argument pairs are required to make the preprocessor
2799 magic work.
2800
2801 Here are the different source and sink types:
2802
2803 @table @code
2804 @item @code{DATA, (ptr, len),}
2805 input data is a fixed buffer of size @var{len} at address @var{ptr}
2806 @item @code{ALLOCA, (ptr, len),}
2807 output data is placed in an alloca()ed buffer of size @var{len} pointed to by @var{ptr}
2808 @item @code{MALLOC, (ptr, len),}
2809 output data is in a malloc()ed buffer of size @var{len} pointed to by @var{ptr}
@item @code{C_STRING_ALLOCA, ptr,}
equivalent to @code{ALLOCA, (ptr, len_ignored)} on output
@item @code{C_STRING_MALLOC, ptr,}
equivalent to @code{MALLOC, (ptr, len_ignored)} on output
2814 @item @code{C_STRING, ptr,}
2815 equivalent to @code{DATA, (ptr, strlen (ptr) + 1)} on input
2816 @item @code{LISP_STRING, string,}
2817 input or output is a Lisp_Object of type string
2818 @item @code{LISP_BUFFER, buffer,}
2819 output is written to @code{(point)} in lisp buffer @var{buffer}
2820 @item @code{LISP_LSTREAM, lstream,}
2821 input or output is a Lisp_Object of type lstream
2822 @item @code{LISP_OPAQUE, object,}
2823 input or output is a Lisp_Object of type opaque
2824 @end table
2825
2826 Often, the data is being converted to a '\0'-byte-terminated string,
2827 which is the format required by many external system C APIs.  For these
2828 purposes, a source type of @code{C_STRING} or a sink type of
2829 @code{C_STRING_ALLOCA} or @code{C_STRING_MALLOC} is appropriate.
2830 Otherwise, we should try to keep XEmacs '\0'-byte-clean, which means
2831 using (ptr, len) pairs.
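To see why (ptr, len) pairs matter, here is a small self-contained sketch in plain C.  The helper @code{count_byte} is a hypothetical name, not an XEmacs function.  A routine that takes an explicit length treats an embedded '\0' as ordinary data, whereas @code{strlen} stops at the first one.

```c
#include <stddef.h>
#include <string.h>

/* Count occurrences of BYTE in the LEN bytes starting at PTR.
   Embedded '\0' bytes are ordinary data here. */
static size_t
count_byte (const unsigned char *ptr, size_t len, unsigned char byte)
{
  size_t i, n = 0;

  for (i = 0; i < len; i++)
    if (ptr[i] == byte)
      n++;
  return n;
}
```

For the five bytes @code{'a', 'b', '\0', 'a', 'b'}, @code{strlen} reports 2, while @code{count_byte} with the full length finds both occurrences of @code{'a'}.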
2832
2833 The sinks to be specified must be lvalues, unless they are the lisp
2834 object types @code{LISP_LSTREAM} or @code{LISP_BUFFER}.
2835
2836 For the sink types @code{ALLOCA} and @code{C_STRING_ALLOCA}, the
2837 resulting text is stored in a stack-allocated buffer, which is
2838 automatically freed on returning from the function.  However, the sink
2839 types @code{MALLOC} and @code{C_STRING_MALLOC} return @code{xmalloc()}ed
2840 memory.  The caller is responsible for freeing this memory using
2841 @code{xfree()}.
2842
2843 Note that it doesn't make sense for @code{LISP_STRING} to be a source
2844 for @code{TO_INTERNAL_FORMAT} or a sink for @code{TO_EXTERNAL_FORMAT}.
2845 You'll get an assertion failure if you try.
2846
2847
2848 @node General Guidelines for Writing Mule-Aware Code
2849 @subsection General Guidelines for Writing Mule-Aware Code
2850 @cindex writing Mule-aware code, general guidelines for
2851 @cindex Mule-aware code, general guidelines for writing
2852 @cindex code, general guidelines for writing Mule-aware
2853
2854 This section contains some general guidance on how to write Mule-aware
2855 code, as well as some pitfalls you should avoid.
2856
2857 @table @emph
2858 @item Never use @code{char} and @code{char *}.
2859 In XEmacs, the use of @code{char} and @code{char *} is almost always a
2860 mistake.  If you want to manipulate an Emacs character from ``C'', use
2861 @code{Emchar}.  If you want to examine a specific octet in the internal
2862 format, use @code{Bufbyte}.  If you want a Lisp-visible character, use a
2863 @code{Lisp_Object} and @code{make_char}.  If you want a pointer to move
2864 through the internal text, use @code{Bufbyte *}.  Also note that you
2865 almost certainly do not need @code{Emchar *}.
2866
2867 @item Be careful not to confuse @code{Charcount}, @code{Bytecount}, and @code{Bufpos}.
2868 The whole point of using different types is to avoid confusion about the
2869 use of certain variables.  Lest this effect be nullified, you need to be
2870 careful about using the right types.
2871
2872 @item Always convert external data
It is extremely important to always convert external data, because
XEmacs can crash if unexpected 8-bit sequences are copied to its internal
buffers literally.

This means that when a system function, such as @code{readdir}, returns
a string, you may need to convert it using one of the conversion macros
described in the previous section, before passing it further to Lisp.
2880
Actually, most of the basic system functions that accept '\0'-terminated
string arguments, like @code{stat()} and @code{open()}, have been
@strong{encapsulated} so that they @emph{always} do internal to
external conversion themselves.  This means you must pass internally
encoded data, typically the @code{XSTRING_DATA} of a Lisp_String, to
these functions.  This is actually a design bug, since it unexpectedly
changes the semantics of the system functions.  A better design would be
to provide separate versions of these system functions that accept
Lisp_Objects which are Lisp strings in place of their current
@code{char *} arguments.
2891
2892 @example
2893 int stat_lisp (Lisp_Object path, struct stat *buf); /* Implement me */
2894 @end example
2895
2896 Also note that many internal functions, such as @code{make_string},
2897 accept Bufbytes, which removes the need for them to convert the data
2898 they receive.  This increases efficiency because that way external data
2899 needs to be decoded only once, when it is read.  After that, it is
2900 passed around in internal format.
2901 @end table
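The distinction between character counts and byte counts can be illustrated with a small sketch.  It assumes UTF-8 as a stand-in multibyte encoding (the XEmacs internal format is different), and @code{Byte}, @code{Byte_Count}, @code{Char_Count} and @code{bytes_to_chars} are hypothetical names, not the XEmacs definitions.

```c
#include <stddef.h>

typedef unsigned char Byte;     /* one octet of encoded text */
typedef ptrdiff_t Byte_Count;   /* a length measured in octets */
typedef ptrdiff_t Char_Count;   /* a length measured in characters */

/* Number of characters in LEN bytes of well-formed UTF-8 text at P.
   Every character has exactly one byte that is not a continuation
   byte (bit pattern 10xxxxxx), so counting those bytes counts
   characters. */
static Char_Count
bytes_to_chars (const Byte *p, Byte_Count len)
{
  Char_Count chars = 0;
  Byte_Count i;

  for (i = 0; i < len; i++)
    if ((p[i] & 0xC0) != 0x80)
      chars++;
  return chars;
}
```

Both count types are plain typedefs here, so the compiler will not catch a mix-up; the names only document which unit a variable is measured in, which is why care is still needed.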
2902
2903 @node An Example of Mule-Aware Code
2904 @subsection An Example of Mule-Aware Code
2905 @cindex code, an example of Mule-aware
2906 @cindex Mule-aware code, an example of
2907
2908 As an example of Mule-aware code, we will analyze the @code{string}
2909 function, which conses up a Lisp string from the character arguments it
2910 receives.  Here is the definition, pasted from @code{alloc.c}:
2911
2912 @example
2913 @group
2914 DEFUN ("string", Fstring, 0, MANY, 0, /*
2915 Concatenate all the argument characters and make the result a string.
2916 */
2917        (int nargs, Lisp_Object *args))
2918 @{
2919   Bufbyte *storage = alloca_array (Bufbyte, nargs * MAX_EMCHAR_LEN);
2920   Bufbyte *p = storage;
2921
2922   for (; nargs; nargs--, args++)
2923     @{
2924       Lisp_Object lisp_char = *args;
2925       CHECK_CHAR_COERCE_INT (lisp_char);
2926       p += set_charptr_emchar (p, XCHAR (lisp_char));
2927     @}
2928   return make_string (storage, p - storage);
2929 @}
2930 @end group
2931 @end example
2932
2933 Now we can analyze the source line by line.
2934
Obviously, the string will contain as many characters as there are
arguments to the function.  This is why we allocate
@code{MAX_EMCHAR_LEN} * @var{nargs} bytes on the stack, i.e. the
worst-case number of bytes for @var{nargs} @code{Emchar}s to fit in the
string.

Then, the loop checks that each element is a character, converting
integers in the process.  Like many other functions in XEmacs, this
function silently accepts integers where characters are expected, for
historical and compatibility reasons.  Unless you know that you need
this behavior, plain @code{CHECK_CHAR} will suffice in your own code.
@code{XCHAR (lisp_char)} extracts the @code{Emchar} from the
@code{Lisp_Object}, and @code{set_charptr_emchar} stores it at @code{p}
and returns the number of bytes written, which is used to advance
@code{p}.  Finally, @code{make_string} builds the Lisp string from the
@code{p - storage} bytes that were actually used.
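The same pattern (reserve the worst case, write each character, advance by the number of bytes written) can be tried outside XEmacs.  The sketch below assumes UTF-8 as the multibyte encoding instead of the XEmacs internal format; @code{MAX_UTF8_LEN}, @code{put_utf8} and @code{encode_all} are hypothetical names.

```c
#include <stddef.h>

#define MAX_UTF8_LEN 4   /* worst-case bytes per code point in UTF-8 */

/* Store code point C (assumed <= 0x10FFFF and not a surrogate) at P
   as UTF-8 and return the number of bytes written. */
static size_t
put_utf8 (unsigned char *p, unsigned long c)
{
  if (c < 0x80)
    {
      p[0] = (unsigned char) c;
      return 1;
    }
  if (c < 0x800)
    {
      p[0] = (unsigned char) (0xC0 | (c >> 6));
      p[1] = (unsigned char) (0x80 | (c & 0x3F));
      return 2;
    }
  if (c < 0x10000)
    {
      p[0] = (unsigned char) (0xE0 | (c >> 12));
      p[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
      p[2] = (unsigned char) (0x80 | (c & 0x3F));
      return 3;
    }
  p[0] = (unsigned char) (0xF0 | (c >> 18));
  p[1] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
  p[2] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
  p[3] = (unsigned char) (0x80 | (c & 0x3F));
  return 4;
}

/* Encode N code points into STORAGE, which must hold at least
   N * MAX_UTF8_LEN bytes; return the number of bytes actually used. */
static size_t
encode_all (unsigned char *storage, const unsigned long *chars, size_t n)
{
  unsigned char *p = storage;
  size_t i;

  for (i = 0; i < n; i++)
    p += put_utf8 (p, chars[i]);
  return (size_t) (p - storage);
}
```

The caller sizes the buffer for the worst case up front, so the loop never needs a bounds check, and the final length is simply the distance the write pointer has travelled.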
2948
2949 Other instructive examples of correct coding under Mule can be found all
2950 over the XEmacs code.  For starters, I recommend
2951 @code{Fnormalize_menu_item_name} in @file{menubar.c}.  After you have
understood this section of the manual and studied the examples, you can
proceed to write new Mule-aware code.
2954
2955 @node Techniques for XEmacs Developers
2956 @section Techniques for XEmacs Developers
2957 @cindex techniques for XEmacs developers
2958 @cindex developers, techniques for XEmacs
2959
2960 @cindex Purify
2961 @cindex Quantify
2962 To make a purified XEmacs, do: @code{make puremacs}.
2963 To make a quantified XEmacs, do: @code{make quantmacs}.
2964
You simply can't dump Quantified and Purified images (unless using the
portable dumper).  Purify gets confused when XEmacs frees memory in one
process that was allocated in a @emph{different} process on a different
machine!  Run it like so:
2969 @example
2970 temacs -batch -l loadup.el run-temacs @var{xemacs-args...}
2971 @end example
2972
2973 @cindex error checking
2974 Before you go through the trouble, are you compiling with all
2975 debugging and error-checking off?  If not, try that first.  Be warned
2976 that while Quantify is directly responsible for quite a few
2977 optimizations which have been made to XEmacs, doing a run which
2978 generates results which can be acted upon is not necessarily a trivial
2979 task.
2980
Also, if you're still willing to do some runs, make sure you configure
2982 with the @samp{--quantify} flag.  That will keep Quantify from starting
2983 to record data until after the loadup is completed and will shut off
2984 recording right before it shuts down (which generates enough bogus data
2985 to throw most results off).  It also enables three additional elisp
2986 commands: @code{quantify-start-recording-data},
2987 @code{quantify-stop-recording-data} and @code{quantify-clear-data}.
2988
2989 If you want to make XEmacs faster, target your favorite slow benchmark,
2990 run a profiler like Quantify, @code{gprof}, or @code{tcov}, and figure
2991 out where the cycles are going.  In many cases you can localize the
2992 problem (because a particular new feature or even a single patch
2993 elicited it).  Don't hesitate to use brute force techniques like a
2994 global counter incremented at strategic places, especially in
2995 combination with other performance indications (@emph{e.g.}, degree of
2996 buffer fragmentation into extents).
2997
2998 Specific projects:
2999
3000 @itemize @bullet
3001 @item
3002 Make the garbage collector faster.  Figure out how to write an
3003 incremental garbage collector.
3004 @item
3005 Write a compiler that takes bytecode and spits out C code.
3006 Unfortunately, you will then need a C compiler and a more fully
3007 developed module system.
3008 @item
3009 Speed up redisplay.
3010 @item
3011 Speed up syntax highlighting.  It was suggested that ``maybe moving some
3012 of the syntax highlighting capabilities into C would make a
3013 difference.''  Wrong idea, I think.  When processing one large file a
3014 particular low-level routine was being called 40 @emph{million} times
3015 simply for @emph{one} call to @code{newline-and-indent}.  Syntax
3016 highlighting needs to be rewritten to use a reliable, fast parser, then
3017 to trust the pre-parsed structure, and only do re-highlighting locally
3018 to a text change.  Modern machines are fast enough to implement such
3019 parsers in Lisp; but no machine will ever be fast enough to deal with
3020 quadratic (or worse) algorithms!
3021 @item
3022 Implement tail recursion in Emacs Lisp (hard!).
3023 @end itemize
3024
Unfortunately, Emacs Lisp is slow, and is going to stay slow.  Function
calls in elisp are especially expensive.  Iterating over a long list can
be something like 30 times faster when implemented in C than in Elisp.

Heavily used small code fragments need to be fast.  The traditional way
to implement such code fragments in C is with macros.  But macros in C
have well-known problems.
3032
3033 @cindex macro hygiene
3034 Macro arguments that are repeatedly evaluated may suffer from repeated
3035 side effects or suboptimal performance.
3036
3037 Variable names used in macros may collide with caller's variables,
3038 causing (at least) unwanted compiler warnings.
3039
3040 In order to solve these problems, and maintain statement semantics, one
3041 should use the @code{do @{ ... @} while (0)} trick while trying to
3042 reference macro arguments exactly once using local variables.
3043
3044 Let's take a look at this poor macro definition:
3045
3046 @example
3047 #define MARK_OBJECT(obj) \
3048   if (!marked_p (obj)) mark_object (obj), did_mark = 1
3049 @end example
3050
This macro evaluates its argument twice, and also misbehaves if used like
this, because the @code{else} attaches to the @code{if} inside the macro:
3052 @example
3053   if (flag) MARK_OBJECT (obj); else do_something();
3054 @end example
3055
3056 A much better definition is
3057
3058 @example
3059 #define MARK_OBJECT(obj) do @{ \
3060   Lisp_Object mo_obj = (obj); \
3061   if (!marked_p (mo_obj))     \
3062     @{                         \
3063       mark_object (mo_obj);   \
3064       did_mark = 1;           \
3065     @}                         \
3066 @} while (0)
3067 @end example
3068
Notice the elimination of double evaluation by using the local variable
with the obscure name.  Writing safe and efficient macros requires great
care.  One problem with macros cannot be portably worked around: since a
C block has no value, a macro used as an expression rather than a
statement cannot use the techniques just described to avoid multiple
evaluation.
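The difference between the two definitions can be observed in a stand-alone program.  The sketch below replaces the real marking machinery with toy stand-ins: objects are small integers indexing a @code{marked} array, and the argument expression @code{next_obj ()} has a side effect so that the number of evaluations is visible.  All names other than @code{marked_p}, @code{mark_object} and @code{did_mark} are hypothetical.

```c
/* Toy stand-ins for the marking machinery. */
static int marked[8];
static int did_mark;
static int evaluations;   /* how often the argument was evaluated */

static int marked_p (int obj) { return marked[obj]; }
static void mark_object (int obj) { marked[obj] = 1; }

/* An argument expression with a side effect: returns 0, 1, 2, ... */
static int next_obj (void) { return evaluations++; }

#define MARK_OBJECT_UNSAFE(obj) \
  if (!marked_p (obj)) mark_object (obj), did_mark = 1

#define MARK_OBJECT_SAFE(obj) do {   \
  int mo_obj = (obj);                \
  if (!marked_p (mo_obj))            \
    {                                \
      mark_object (mo_obj);          \
      did_mark = 1;                  \
    }                                \
} while (0)

/* Each function returns how many times the argument was evaluated. */
static int
evaluations_unsafe (void)
{
  evaluations = 0;
  MARK_OBJECT_UNSAFE (next_obj ());
  return evaluations;
}

static int
evaluations_safe (void)
{
  evaluations = 0;
  MARK_OBJECT_SAFE (next_obj ());
  return evaluations;
}
```

With the unsafe macro the argument is evaluated twice, so the object that gets tested (0) is not the object that gets marked (1).  The safe version evaluates it once and tests and marks the same object.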
3075
3076 @cindex inline functions
In most cases where a macro has function semantics, an inline function
is a better implementation technique.  Modern compiler optimizers tend
to inline functions even if they have no @code{inline} keyword, and
configure magic ensures that the @code{inline} keyword can be safely
used as an additional compiler hint.  Inline functions used in a single
@file{.c} file are easy.  The function must already be defined to be
@code{static}.  Just add the @code{inline} keyword to the
definition.
3085
3086 @example
3087 inline static int
3088 heavily_used_small_function (int arg)
3089 @{
3090   ...
3091 @}
3092 @end example
3093
3094 Inline functions in header files are trickier, because we would like to
3095 make the following optimization if the function is @emph{not} inlined
3096 (for example, because we're compiling for debugging).  We would like the
3097 function to be defined externally exactly once, and each calling
3098 translation unit would create an external reference to the function,
3099 instead of including a definition of the inline function in the object
3100 code of every translation unit that uses it.  This optimization is
3101 currently only available for gcc.  But you don't have to worry about the
3102 trickiness; just define your inline functions in header files using this
3103 pattern:
3104
3105 @example
3106 INLINE_HEADER int
3107 i_used_to_be_a_crufty_macro_but_look_at_me_now (int arg);
3108 INLINE_HEADER int
3109 i_used_to_be_a_crufty_macro_but_look_at_me_now (int arg)
3110 @{
3111   ...
3112 @}
3113 @end example
3114
3115 The declaration right before the definition is to prevent warnings when
3116 compiling with @code{gcc -Wmissing-declarations}.  I consider issuing
3117 this warning for inline functions a gcc bug, but the gcc maintainers disagree.
3118
3119 @cindex inline functions, headers
3120 @cindex header files, inline functions
Every header which contains inline functions, either directly by using
@code{INLINE_HEADER} or indirectly by using @code{DECLARE_LRECORD}, must
be added to @file{inline.c}'s includes to make the optimization
described above work.  (Optimization note: if all @code{INLINE_HEADER}
functions are in fact inlined in all translation units, then the linker
can just discard @code{inline.o}, since it contains only unreferenced code.)
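The idea of "one external definition, referenced from everywhere else" can also be expressed with the ISO C99 inline rules.  The sketch below uses those rules rather than the @code{INLINE_HEADER} macro, whose definition is not shown here, and the function name @code{clamp_to_byte} is hypothetical.  For brevity both parts appear in one file; in a real tree the first part would live in a header and the last line in exactly one @file{.c} file, which is the role @file{inline.c} plays.

```c
/* --- would live in a header, seen by every translation unit --- */

/* With only this declaration visible, C99 treats the definition as an
   "inline definition": the compiler may inline calls, but this
   translation unit does not provide an external definition. */
inline int
clamp_to_byte (int x)
{
  return x < 0 ? 0 : x > 255 ? 255 : x;
}

/* --- would live in exactly one .c file --- */

/* Redeclaring the function with extern makes this translation unit
   emit the single external definition that non-inlined calls from
   other files link against. */
extern inline int clamp_to_byte (int x);
```

If every call is inlined everywhere, the external definition is never referenced, which matches the observation above that the linker can then discard the object file that holds it.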
3127
3128 To get started debugging XEmacs, take a look at the @file{.gdbinit} and
3129 @file{.dbxrc} files in the @file{src} directory.  See the section in the
3130 XEmacs FAQ on How to Debug an XEmacs problem with a debugger.
3131
3132 After making source code changes, run @code{make check} to ensure that
you haven't introduced any regressions.  If you want to make XEmacs more
3134 reliable, please improve the test suite in @file{tests/automated}.
3135
3136 Did you make sure you didn't introduce any new compiler warnings?
3137
3138 Before submitting a patch, please try compiling at least once with
3139
3140 @example
3141 configure --with-mule --use-union-type --error-checking=all
3142 @end example
3143
3144 Here are things to know when you create a new source file:
3145
3146 @itemize @bullet
3147 @item
3148 All @file{.c} files should @code{#include <config.h>} first.  Almost all
3149 @file{.c} files should @code{#include "lisp.h"} second.
3150
3151 @item
Generated header files should be included using the @code{#include <...>} syntax,
not the @code{#include "..."} syntax.  The generated headers are:

@file{config.h}, @file{sheap-adjust.h}, @file{paths.h}, @file{Emacs.ad.h}

The basic rule is that you should assume builds use @code{--srcdir},
so the @code{#include <...>} syntax needs to be used when the
to-be-included generated file is in a potentially different directory
@emph{at compile time}.  The non-obvious C rule is that @code{#include "..."}
means to search for the included file in the same directory as the
including file, @emph{not} in the current directory.
3163
3164 @item
Header files should @emph{not} include @code{<config.h>} and
@code{"lisp.h"}.  It is the responsibility of the @file{.c} files that
use them to do so.
3168
3169 @end itemize
3170
3171 @cindex Lisp object types, creating
3172 @cindex creating Lisp object types
3173 @cindex object types, creating Lisp
3174 Here is a checklist of things to do when creating a new lisp object type
3175 named @var{foo}:
3176
3177 @enumerate
3178 @item
3179 create @var{foo}.h
3180 @item
3181 create @var{foo}.c
3182 @item
3183 add definitions of @code{syms_of_@var{foo}}, etc. to @file{@var{foo}.c}
3184 @item
3185 add declarations of @code{syms_of_@var{foo}}, etc. to @file{symsinit.h}
3186 @item
3187 add calls to @code{syms_of_@var{foo}}, etc. to @file{emacs.c}
3188 @item
3189 add definitions of macros like @code{CHECK_@var{FOO}} and
3190 @code{@var{FOO}P} to @file{@var{foo}.h}
3191 @item
3192 add the new type index to @code{enum lrecord_type}
3193 @item
add a @code{DEFINE_LRECORD_IMPLEMENTATION} call to @file{@var{foo}.c}
@item
add an @code{INIT_LRECORD_IMPLEMENTATION} call to @code{syms_of_@var{foo}} in @file{@var{foo}.c}
3197 @end enumerate
3198
3199
3200 @node Regression Testing XEmacs, A Summary of the Various XEmacs Modules, Rules When Writing New C Code, Top
3201 @chapter Regression Testing XEmacs
3202 @cindex testing, regression
3203
3204 The source directory @file{tests/automated} contains XEmacs' automated
3205 test suite.  The usual way of running all the tests is running
3206 @code{make check} from the top-level source directory.
3207
3208 The test suite is unfinished and it's still lacking some essential
3209 features.  It is nevertheless recommended that you run the tests to
3210 confirm that XEmacs behaves correctly.
3211
3212 If you want to run a specific test case, you can do it from the
3213 command-line like this:
3214
3215 @example
3216 $ xemacs -batch -l test-harness.elc -f batch-test-emacs TEST-FILE
3217 @end example
3218
3219 If something goes wrong, you can run the test suite interactively by
3220 loading @file{test-harness.el} into a running XEmacs and typing
3221 @kbd{M-x test-emacs-test-file RET <filename> RET}.  You will see a log of
3222 passed and failed tests, which should allow you to investigate the
3223 source of the error and ultimately fix the bug.
3224
Adding a new test file is trivial: just create a new file in
@file{tests/automated} and it will be run.  There is no need to
byte-compile any of the files in this directory; the test-harness will
take care of any necessary byte-compilation.

Look at the existing test cases for examples of how to write tests.
3231 It all boils down to your imagination and judicious use of the macros
3232 @code{Assert}, @code{Check-Error}, @code{Check-Error-Message}, and
3233 @code{Check-Message}.
3234
3235 Here's a simple example checking case-sensitive and case-insensitive
3236 comparisons from @file{case-tests.el}.
3237
3238 @example
3239 (with-temp-buffer
3240   (insert "Test Buffer")
3241   (let ((case-fold-search t))
3242     (goto-char (point-min))
3243     (Assert (eq (search-forward "test buffer" nil t) 12))
3244     (goto-char (point-min))
3245     (Assert (eq (search-forward "Test buffer" nil t) 12))
3246     (goto-char (point-min))
3247     (Assert (eq (search-forward "Test Buffer" nil t) 12))
3248
3249     (setq case-fold-search nil)
3250     (goto-char (point-min))
3251     (Assert (not (search-forward "test buffer" nil t)))
3252     (goto-char (point-min))
3253     (Assert (not (search-forward "Test buffer" nil t)))
3254     (goto-char (point-min))
3255     (Assert (eq (search-forward "Test Buffer" nil t) 12))))
3256 @end example
3257
3258 This example could be inserted in a file in @file{tests/automated}, and
3259 it would be a complete test, automatically executed when you run
3260 @kbd{make check} after building XEmacs.  More complex tests may require
3261 substantial temporary scaffolding to create the environment that elicits
3262 the bugs, but the top-level Makefile and @file{test-harness.el} handle
3263 the running and collection of results from the @code{Assert},
3264 @code{Check-Error}, @code{Check-Error-Message}, and @code{Check-Message}
3265 macros.
3266
In general, you should avoid using functionality from packages in your
tests, because you can't be sure that everyone will have the required
package.  However, if you've got a test that works, by all means add it.
Simply wrap the test in an appropriate conditional, add a notice that the test
was skipped, and update the @code{skipped-test-reasons} hash table.
Here's an example from @file{syntax-tests.el}:
3273
3274 @example
3275 ;; Test forward-comment at buffer boundaries
3276 (with-temp-buffer
3277
3278   ;; try to use exactly what you need: featurep, boundp, fboundp
3279   (if (not (fboundp 'c-mode))
3280
3281       ;; We should provide a standard function for this boilerplate,
3282       ;; probably called `Skip-Test' -- check for that API with C-h f
3283       (let* ((reason "c-mode unavailable")
3284              (count (gethash reason skipped-test-reasons)))
3285         (puthash reason (if (null count) 1 (1+ count))
3286                  skipped-test-reasons)
3287         (Print-Skip "comment and parse-partial-sexp tests" reason))
3288
3289     ;; and here's the test code
3290     (c-mode)
3291     (insert "// comment\n")
3292     (forward-comment -2)
3293     (Assert (eq (point) (point-min)))
3294     (let ((point (point)))
3295       (insert "/* comment */")
3296       (goto-char point)
3297       (forward-comment 2)
3298       (Assert (eq (point) (point-max)))
3299       (parse-partial-sexp point (point-max)))))
3300 @end example
3301
3302 @code{Skip-Test} is intended for use with features that are normally
3303 present in typical configurations.  For truly optional features, or
tests that apply to one of several alternative implementations (e.g., to
3305 GTK widgets, but not Athena, Motif, MS Windows, or Carbon), simply
3306 silently omit the test.
3307
3308
3309 @node A Summary of the Various XEmacs Modules, Allocation of Objects in XEmacs Lisp, Regression Testing XEmacs, Top
3310 @chapter A Summary of the Various XEmacs Modules
3311 @cindex modules, a summary of the various XEmacs
3312
3313   This is accurate as of XEmacs 20.0.
3314
3315 @menu
3316 * Low-Level Modules::
3317 * Basic Lisp Modules::
3318 * Modules for Standard Editing Operations::
3319 * Editor-Level Control Flow Modules::
3320 * Modules for the Basic Displayable Lisp Objects::
3321 * Modules for other Display-Related Lisp Objects::
3322 * Modules for the Redisplay Mechanism::
3323 * Modules for Interfacing with the File System::
3324 * Modules for Other Aspects of the Lisp Interpreter and Object System::
3325 * Modules for Interfacing with the Operating System::
3326 * Modules for Interfacing with X Windows::
3327 * Modules for Internationalization::
3328 * Modules for Regression Testing::
3329 @end menu
3330
3331 @node Low-Level Modules
3332 @section Low-Level Modules
3333 @cindex low-level modules
3334 @cindex modules, low-level
3335
3336 @example
3337 config.h
3338 @end example
3339
3340 This is automatically generated from @file{config.h.in} based on the
3341 results of configure tests and user-selected optional features and
3342 contains preprocessor definitions specifying the nature of the
3343 environment in which XEmacs is being compiled.
3344
3345
3346
3347 @example
3348 paths.h
3349 @end example
3350
3351 This is automatically generated from @file{paths.h.in} based on supplied
3352 configure values, and allows for non-standard installed configurations
3353 of the XEmacs directories.  It's currently broken, though.
3354
3355
3356
3357 @example
3358 emacs.c
3359 signal.c
3360 @end example
3361
3362 @file{emacs.c} contains @code{main()} and other code that performs the most
3363 basic environment initializations and handles shutting down the XEmacs
3364 process (this includes @code{kill-emacs}, the normal way that XEmacs is
3365 exited; @code{dump-emacs}, which is used during the build process to
3366 write out the XEmacs executable; @code{run-emacs-from-temacs}, which can
3367 be used to start XEmacs directly when temacs has finished loading all
3368 the Lisp code; and emergency code to handle crashes [XEmacs tries to
3369 auto-save all files before it crashes]).
3370
Low-level code that directly interacts with the Unix signal mechanism,
however, is in @file{signal.c}.  Note that this code does not handle system
dependencies in interfacing to signals; that is handled using the
@file{syssignal.h} header file, described in @ref{Modules for Interfacing with the Operating System}.
3375
3376
3377
3378 @example
3379 unexaix.c
3380 unexalpha.c
3381 unexapollo.c
3382 unexconvex.c
3383 unexec.c
3384 unexelf.c
3385 unexelfsgi.c
3386 unexencap.c
3387 unexenix.c
3388 unexfreebsd.c
3389 unexfx2800.c
3390 unexhp9k3.c
3391 unexhp9k800.c
3392 unexmips.c
3393 unexnext.c
3394 unexsol2.c
3395 unexsunos4.c
3396 @end example
3397
These modules contain code for dumping out the XEmacs executable on various
3399 different systems. (This process is highly machine-specific and
3400 requires intimate knowledge of the executable format and the memory map
3401 of the process.) Only one of these modules is actually used; this is
3402 chosen by @file{configure}.
3403
3404
3405
3406 @example
3407 ecrt0.c
3408 lastfile.c
3409 pre-crt0.c
3410 @end example
3411
3412 These modules are used in conjunction with the dump mechanism.  On some
3413 systems, an alternative version of the C startup code (the actual code
3414 that receives control from the operating system when the process is
3415 started, and which calls @code{main()}) is required so that the dumping
process works properly; @file{ecrt0.c} provides this.
3417
3418 @file{pre-crt0.c} and @file{lastfile.c} should be the very first and
3419 very last file linked, respectively. (Actually, this is not really true.
3420 @file{lastfile.c} should be after all Emacs modules whose initialized
3421 data should be made constant, and before all other Emacs files and all
3422 libraries.  In particular, the allocation modules @file{gmalloc.c},
3423 @file{alloca.c}, etc. are normally placed past @file{lastfile.c}, and
3424 all of the files that implement Xt widget classes @emph{must} be placed
3425 after @file{lastfile.c} because they contain various structures that
3426 must be statically initialized and into which Xt writes at various
3427 times.) @file{pre-crt0.c} and @file{lastfile.c} contain exported symbols
3428 that are used to determine the start and end of XEmacs' initialized
3429 data space when dumping.
3430
3431
3432
3433 @example
3434 alloca.c
3435 free-hook.c
3436 getpagesize.h
3437 gmalloc.c
3438 malloc.c
3439 mem-limits.h
3440 ralloc.c
3441 vm-limit.c
3442 @end example
3443
3444 These handle basic C allocation of memory.  @file{alloca.c} is an emulation of
3445 the stack allocation function @code{alloca()} on machines that lack
3446 this. (XEmacs makes extensive use of @code{alloca()} in its code.)
3447
3448 @file{gmalloc.c} and @file{malloc.c} are two implementations of the standard C
3449 functions @code{malloc()}, @code{realloc()} and @code{free()}.  They are
3450 often used in place of the standard system-provided @code{malloc()}
3451 because they usually provide a much faster implementation, at the
3452 expense of additional memory use.  @file{gmalloc.c} is a newer implementation
3453 that is much more memory-efficient for large allocations than @file{malloc.c},
3454 and should always be preferred if it works. (At one point, @file{gmalloc.c}
3455 didn't work on some systems where @file{malloc.c} worked; but this should be
3456 fixed now.)
3457
3458 @cindex relocating allocator
3459 @file{ralloc.c} is the @dfn{relocating allocator}.  It provides
3460 functions similar to @code{malloc()}, @code{realloc()} and @code{free()}
3461 that allocate memory that can be dynamically relocated in memory.  The
3462 advantage of this is that allocated memory can be shuffled around to
3463 place all the free memory at the end of the heap, and the heap can then
3464 be shrunk, releasing the memory back to the operating system.  The use
3465 of this can be controlled with the configure option @code{--rel-alloc};
3466 if enabled, memory allocated for buffers will be relocatable, so that if
3467 a very large file is visited and the buffer is later killed, the memory
3468 can be released to the operating system.  (The disadvantage of this
3469 mechanism is that it can be very slow.  On systems with the
3470 @code{mmap()} system call, the XEmacs version of @file{ralloc.c} uses
3471 this to move memory around without actually having to block-copy it,
3472 which can speed things up; but it can still cause noticeable performance
3473 degradation.)
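The core idea of a relocating allocator can be shown with a toy version.  This is a sketch under simplifying assumptions, not the code in @file{ralloc.c}: it uses a small fixed arena instead of the real heap, it only suits byte data (no alignment handling), and the names @code{toy_r_alloc}, @code{toy_r_free}, @code{struct bloc} and the size constants are hypothetical.  The allocator is handed the @emph{address} of the caller's pointer variable, so it can rewrite that pointer whenever it moves the data.

```c
#include <stddef.h>
#include <string.h>

#define ARENA_SIZE 256
#define MAX_BLOCS  8

struct bloc
{
  void **handle;   /* address of the owner's pointer variable */
  size_t size;     /* size of the allocation in bytes */
};

static unsigned char arena[ARENA_SIZE];
static struct bloc blocs[MAX_BLOCS];   /* kept in address order */
static size_t n_blocs, arena_used;

/* Allocate SIZE bytes and store the address in *HANDLE.  The allocator
   remembers HANDLE so that it can update *HANDLE when the data moves.
   Returns 0 on success, -1 if there is no room. */
static int
toy_r_alloc (void **handle, size_t size)
{
  if (n_blocs == MAX_BLOCS || size > ARENA_SIZE - arena_used)
    return -1;
  *handle = arena + arena_used;
  blocs[n_blocs].handle = handle;
  blocs[n_blocs].size = size;
  n_blocs++;
  arena_used += size;
  return 0;
}

/* Free the bloc owned by HANDLE, slide every later bloc down to close
   the gap, and point each owner at its bloc's new address. */
static void
toy_r_free (void **handle)
{
  unsigned char *dst;
  size_t i, j, gap;

  for (i = 0; i < n_blocs && blocs[i].handle != handle; i++)
    ;
  if (i == n_blocs)
    return;                       /* not one of ours */

  dst = (unsigned char *) *handle;
  gap = blocs[i].size;
  for (j = i + 1; j < n_blocs; j++)
    {
      memmove (dst, *blocs[j].handle, blocs[j].size);
      *blocs[j].handle = dst;     /* relocate the owner's pointer */
      dst += blocs[j].size;
      blocs[j - 1] = blocs[j];
    }
  n_blocs--;
  arena_used -= gap;              /* free space is now all at the end */
  *handle = NULL;
}
```

After a free, all live data is contiguous at the start of the arena, which is the property that lets a real implementation shrink the heap.  The price is that callers must never cache a copy of a relocatable pointer across a call that may move memory.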
3474
3475 @file{free-hook.c} contains some debugging functions for checking for invalid
3476 arguments to @code{free()}.
3477
3478 @file{vm-limit.c} contains some functions that warn the user when memory is
3479 getting low.  These are callback functions that are called by @file{gmalloc.c}
3480 and @file{malloc.c} at appropriate times.
3481
@file{getpagesize.h} provides a uniform interface for retrieving the size of a
page in virtual memory.  @file{mem-limits.h} provides a uniform interface for
retrieving the total amount of available virtual memory.  Both are
similar in spirit to the @file{sys*.h} files described in @ref{Modules for Interfacing with the Operating System}.
3486
3487
3488
3489 @example
3490 blocktype.c
3491 blocktype.h
3492 dynarr.c
3493 @end example
3494
3495 These implement a couple of basic C data types to facilitate memory
3496 allocation.  The @code{Blocktype} type efficiently manages the
3497 allocation of fixed-size blocks by minimizing the number of times that
3498 @code{malloc()} and @code{free()} are called.  It allocates memory in
3499 large chunks, subdivides the chunks into blocks of the proper size, and
3500 returns the blocks as requested.  When blocks are freed, they are placed
3501 onto a linked list, so they can be efficiently reused.  This data type
3502 is not much used in XEmacs currently, because it's a fairly new
3503 addition.
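The allocation strategy described for @code{Blocktype} can be sketched as follows.  This is an illustration of the technique, not the code in @file{blocktype.c}; @code{struct pool}, @code{pool_init}, @code{pool_alloc}, @code{pool_free} and @code{BLOCKS_PER_CHUNK} are hypothetical names, and chunks are never returned to the system in this simplified version.

```c
#include <stdlib.h>

#define BLOCKS_PER_CHUNK 64

struct pool
{
  size_t block_size;   /* rounded up so a free block can hold a pointer */
  void *free_list;     /* linked list threaded through the free blocks */
  size_t chunks;       /* number of malloc() calls made so far */
};

static void
pool_init (struct pool *p, size_t block_size)
{
  size_t unit = sizeof (void *);

  if (block_size < unit)
    block_size = unit;
  /* Round up to a multiple of the pointer size to keep links aligned. */
  p->block_size = (block_size + unit - 1) / unit * unit;
  p->free_list = NULL;
  p->chunks = 0;
}

static void *
pool_alloc (struct pool *p)
{
  void *block;

  if (p->free_list == NULL)
    {
      /* Out of free blocks: grab one large chunk and carve it up. */
      char *chunk = malloc (p->block_size * BLOCKS_PER_CHUNK);
      size_t i;

      if (chunk == NULL)
        return NULL;
      p->chunks++;
      for (i = 0; i < BLOCKS_PER_CHUNK; i++)
        {
          void *b = chunk + i * p->block_size;
          *(void **) b = p->free_list;
          p->free_list = b;
        }
    }
  block = p->free_list;
  p->free_list = *(void **) block;
  return block;
}

/* Return BLOCK to the pool; it becomes the next block handed out. */
static void
pool_free (struct pool *p, void *block)
{
  *(void **) block = p->free_list;
  p->free_list = block;
}
```

Storing the list link inside the free block itself means the free list costs no extra memory, and both allocation and freeing are a couple of pointer assignments once a chunk exists.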
3504
3505 @cindex dynamic array
3506 The @code{Dynarr} type implements a @dfn{dynamic array}, which is
3507 similar to a standard C array but has no fixed limit on the number of
3508 elements it can contain.  Dynamic arrays can hold elements of any type,
3509 and when you add a new element, the array automatically resizes itself
3510 if it isn't big enough.  Dynarrs are extensively used in the redisplay
3511 mechanism.
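A dynamic array of this kind can be sketched in a few lines of C.  The code below illustrates the general technique and is not the @code{Dynarr} implementation; @code{struct dynarr} and the @code{dynarr_*} functions are hypothetical names, and the doubling growth policy is an assumption.

```c
#include <stdlib.h>
#include <string.h>

struct dynarr
{
  void *base;      /* element storage */
  size_t elsize;   /* size of one element in bytes */
  size_t cur;      /* number of elements in use */
  size_t max;      /* number of elements allocated */
};

static void
dynarr_init (struct dynarr *d, size_t elsize)
{
  d->base = NULL;
  d->elsize = elsize;
  d->cur = 0;
  d->max = 0;
}

/* Append a copy of *EL, growing the storage if it is full.
   Returns 0 on success, -1 if out of memory. */
static int
dynarr_add (struct dynarr *d, const void *el)
{
  if (d->cur == d->max)
    {
      size_t new_max = d->max ? d->max * 2 : 8;
      void *new_base = realloc (d->base, new_max * d->elsize);

      if (new_base == NULL)
        return -1;
      d->base = new_base;
      d->max = new_max;
    }
  memcpy ((char *) d->base + d->cur * d->elsize, el, d->elsize);
  d->cur++;
  return 0;
}

/* Address of element I; the caller casts to the element type. */
static void *
dynarr_at (const struct dynarr *d, size_t i)
{
  return (char *) d->base + i * d->elsize;
}

static void
dynarr_free (struct dynarr *d)
{
  free (d->base);
  dynarr_init (d, d->elsize);
}
```

Because growing may move the storage, a pointer obtained from @code{dynarr_at} is only valid until the next call that adds an element.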
3512
3513
3514
3515 @example
3516 inline.c
3517 @end example
3518
3519 This module is used in connection with inline functions (available in
3520 some compilers).  Often, inline functions need to have a corresponding
3521 non-inline function that does the same thing.  This module is where they
3522 reside.  It contains no actual code, but defines some special flags that
3523 cause inline functions defined in header files to be rendered as actual
3524 functions.  It then includes all header files that contain any inline
3525 function definitions, so that each one gets a real function equivalent.
3526
3527
3528
3529 @example
3530 debug.c
3531 debug.h
3532 @end example
3533
3534 These functions provide a system for doing internal consistency checks
3535 during code development.  This system is not currently used; instead the
3536 simpler @code{assert()} macro is used along with the various checks
3537 provided by the @samp{--error-check-*} configuration options.
3538
3539
3540
3541 @example
3542 universe.h
3543 @end example
3544
3545 This is not currently used.
3546
3547
3548
3549 @node Basic Lisp Modules
3550 @section Basic Lisp Modules
3551 @cindex Lisp modules, basic
3552 @cindex modules, basic Lisp
3553
3554 @example
3555 lisp-disunion.h
3556 lisp-union.h
3557 lisp.h
3558 lrecord.h
3559 symsinit.h
3560 @end example
3561
3562 These are the basic header files for all XEmacs modules.  Each module
3563 includes @file{lisp.h}, which brings the other header files in.
3564 @file{lisp.h} contains the definitions of the structures and extractor
3565 and constructor macros for the basic Lisp objects and various other
3566 basic definitions for the Lisp environment, as well as some
3567 general-purpose definitions (e.g. @code{min()} and @code{max()}).
3568 @file{lisp.h} includes either @file{lisp-disunion.h} or
3569 @file{lisp-union.h}, depending on whether @code{USE_UNION_TYPE} is
3570 defined.  These files define the typedef of the Lisp object itself (as
3571 described above) and the low-level macros that hide the actual
3572 implementation of the Lisp object.  All extractor and constructor macros
3573 for particular types of Lisp objects are defined in terms of these
3574 low-level macros.
3575
3576 As a general rule, all typedefs should go into the typedefs section of
3577 @file{lisp.h} rather than into a module-specific header file even if the
3578 structure is defined elsewhere.  This allows function prototypes that
3579 use the typedef to be placed into other header files.  Forward structure
3580 declarations (i.e. a simple declaration like @code{struct foo;} where
3581 the structure itself is defined elsewhere) should be placed into the
3582 typedefs section as necessary.
3583
3584 @file{lrecord.h} contains the basic structures and macros that implement
3585 all record-type Lisp objects---i.e. all objects whose type is a field
3586 in their C structure, which includes all objects except the few most
3587 basic ones.
3588
3589 @file{lisp.h} contains prototypes for most of the exported functions in
3590 the various modules.  Lisp primitives defined using @code{DEFUN} that
3591 need to be called by C code should be declared using @code{EXFUN}.
3592 Other function prototypes should be placed either into the appropriate
3593 section of @file{lisp.h}, or into a module-specific header file,
3594 depending on how general-purpose the function is and whether it has
3595 special-purpose argument types requiring definitions not in
3596 @file{lisp.h}.  All initialization functions are prototyped in
3597 @file{symsinit.h}.
3598
3599
3600
3601 @example
3602 alloc.c
3603 @end example
3604
3605 The large module @file{alloc.c} implements all of the basic allocation and
3606 garbage collection for Lisp objects.  The most commonly used Lisp
3607 objects are allocated in chunks, similar to the Blocktype data type
3608 described above; others are allocated in individually @code{malloc()}ed
3609 blocks.  This module provides the foundation on which all other aspects
3610 of the Lisp environment sit, and is the first module initialized at
3611 startup.
3612
3613 Note that @file{alloc.c} provides a series of generic functions that are
3614 not dependent on any particular object type, and interfaces to
3615 particular types of objects using a standardized interface of
3616 type-specific methods.  This scheme is a fundamental principle of
3617 object-oriented programming and is heavily used throughout XEmacs.  The
3618 great advantage of this is that it allows for a clean separation of
3619 functionality into different modules---new classes of Lisp objects, new
3620 event interfaces, new device types, new stream interfaces, etc. can be
3621 added transparently without affecting code anywhere else in XEmacs.
3622 Because the different subsystems are divided into general and specific
3623 code, adding a new subtype within a subsystem will in general not
3624 require changes to the generic subsystem code or affect any of the other
3625 subtypes in the subsystem; this provides a great deal of robustness to
3626 the XEmacs code.
3627
3628
3629 @example
3630 eval.c
3631 backtrace.h
3632 @end example
3633
3634 This module contains all of the functions to handle the flow of control.
3635 This includes the mechanisms of defining functions, calling functions,
3636 traversing stack frames, and binding variables; the control primitives
3637 and other special forms such as @code{while}, @code{if}, @code{eval},
3638 @code{let}, @code{and}, @code{or}, @code{progn}, etc.; handling of
3639 non-local exits, unwind-protects, and exception handlers; entering the
3640 debugger; methods for the subr Lisp object type; etc.  It does
3641 @emph{not} include the @code{read} function, the @code{print} function,
3642 or the handling of symbols and obarrays.
3643
3644 @file{backtrace.h} contains some structures related to stack frames and the
3645 flow of control.
3646
3647
3648
3649 @example
3650 lread.c
3651 @end example
3652
3653 This module implements the Lisp reader and the @code{read} function,
3654 which converts text into Lisp objects, according to the read syntax of
3655 the objects, as described above.  This is similar to the parser that is
3656 a part of all compilers.
3657
3658
3659
3660 @example
3661 print.c
3662 @end example
3663
3664 This module implements the Lisp print mechanism and the @code{print}
3665 function and related functions.  This is the inverse of the Lisp reader:
3666 it converts Lisp objects to a printed, textual representation,
3667 ideally one that can be read back in using @code{read} to get
3668 an equivalent object.
3669
3670
3671
3672 @example
3673 general.c
3674 symbols.c
3675 symeval.h
3676 @end example
3677
3678 @file{symbols.c} implements the handling of symbols, obarrays, and
3679 retrieving the values of symbols.  Much of the code is devoted to
3680 handling the special @dfn{symbol-value-magic} objects that define
3681 special types of variables---this includes buffer-local variables,
3682 variable aliases, variables that forward into C variables, etc.  This
3683 module is initialized extremely early (right after @file{alloc.c}),
3684 because it is here that the basic symbols @code{t} and @code{nil} are
3685 created, and those symbols are used everywhere throughout XEmacs.
3686
3687 @file{symeval.h} contains the definitions of symbol structures and the
3688 @code{DEFVAR_LISP()} and related macros for declaring variables.
3689
3690
3691
3692 @example
3693 data.c
3694 floatfns.c
3695 fns.c
3696 @end example
3697
3698 These modules implement the methods and standard Lisp primitives for all
3699 the basic Lisp object types other than symbols (which are described
3700 above).  @file{data.c} contains all the predicates (primitives that return
3701 whether an object is of a particular type); the integer arithmetic
3702 functions; and the basic accessor and mutator primitives for the various
3703 object types.  @file{fns.c} contains all the standard predicates for working
3704 with sequences (where, abstractly speaking, a sequence is an ordered set
3705 of objects, and can be represented by a list, string, vector, or
3706 bit-vector); it also contains @code{equal}, perhaps on the grounds that
3707 the bulk of the operation of @code{equal} is comparing sequences.
3708 @file{floatfns.c} contains methods and primitives for floats and floating-point
3709 arithmetic.
3710
3711
3712
3713 @example
3714 bytecode.c
3715 bytecode.h
3716 @end example
3717
3718 @file{bytecode.c} implements the byte-code interpreter and
3719 compiled-function objects, and @file{bytecode.h} contains associated
3720 structures.  Note that the byte-code @emph{compiler} is written in Lisp.
3721
3722
3723
3724
3725 @node Modules for Standard Editing Operations
3726 @section Modules for Standard Editing Operations
3727 @cindex modules for standard editing operations
3728 @cindex editing operations, modules for standard
3729
3730 @example
3731 buffer.c
3732 buffer.h
3733 bufslots.h
3734 @end example
3735
3736 @file{buffer.c} implements the @dfn{buffer} Lisp object type.  This
3737 includes functions that create and destroy buffers; retrieve buffers by
3738 name or by other properties; manipulate lists of buffers (remember that
3739 buffers are permanent objects and stored in various ordered lists);
3740 retrieve or change buffer properties; etc.  It also contains the
3741 definitions of all the built-in buffer-local variables (which can be
3742 viewed as buffer properties).  It does @emph{not} contain code to
3743 manipulate buffer-local variables (that's in @file{symbols.c}, described
3744 above); or code to manipulate the text in a buffer.
3745
3746 @file{buffer.h} defines the structures associated with a buffer and the various
3747 macros for retrieving text from a buffer and special buffer positions
3748 (e.g. @code{point}, the default location for text insertion).  It also
3749 contains macros for working with buffer positions and converting between
3750 their representations as character offsets and as byte offsets (under
3751 MULE, they are different, because characters can be multi-byte).  It is
3752 one of the largest header files.
3753
3754 @file{bufslots.h} defines the fields in the buffer structure that correspond to
3755 the built-in buffer-local variables.  It is its own header file because
3756 it is included many times in @file{buffer.c}, as a way of iterating over all
3757 the built-in buffer-local variables.
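
The technique of expanding one list of slots several times can be
sketched as follows.  The real code includes the header file repeatedly
with a macro redefined each time; so that the example fits in a single
file, this sketch substitutes a list macro for the header.  The slot
names, their @code{long} type, and all @code{demo_} identifiers are
invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The single list of slots.  Stands in for the contents of the header. */
#define DEMO_BUFFER_SLOTS(SLOT) \
  SLOT(fill_column)             \
  SLOT(tab_width)               \
  SLOT(left_margin)

/* Expansion 1: declare one structure field per slot. */
struct demo_buffer {
#define DEMO_DECLARE_SLOT(name) long name;
  DEMO_BUFFER_SLOTS(DEMO_DECLARE_SLOT)
#undef DEMO_DECLARE_SLOT
};

/* Expansion 2: build a table of slot names. */
static const char *const demo_slot_names[] = {
#define DEMO_NAME_SLOT(name) #name,
  DEMO_BUFFER_SLOTS(DEMO_NAME_SLOT)
#undef DEMO_NAME_SLOT
};

/* Expansion 3: visit every slot, here to reset it. */
void demo_reset_slots(struct demo_buffer *b)
{
#define DEMO_RESET_SLOT(name) b->name = 0;
  DEMO_BUFFER_SLOTS(DEMO_RESET_SLOT)
#undef DEMO_RESET_SLOT
}

size_t demo_slot_count(void)
{
  return sizeof demo_slot_names / sizeof demo_slot_names[0];
}

const char *demo_slot_name(size_t i)
{
  assert(i < demo_slot_count());
  return demo_slot_names[i];
}
```

Adding a slot to the list updates the structure, the name table, and the
reset function together, which is the point of keeping the list in one
place.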
3758
3759
3760
3761 @example
3762 insdel.c
3763 insdel.h
3764 @end example
3765
3766 @file{insdel.c} contains low-level functions for inserting and deleting text in
3767 a buffer, keeping track of changed regions for use by redisplay, and
3768 calling any before-change and after-change functions that may have been
3769 registered for the buffer.  It also contains the actual functions that
3770 convert between byte offsets and character offsets.
3771
3772 @file{insdel.h} contains associated headers.
3773
3774
3775
3776 @example
3777 marker.c
3778 @end example
3779
3780 This module implements the @dfn{marker} Lisp object type, which
3781 conceptually is a pointer to a text position in a buffer that moves
3782 around as text is inserted and deleted, so as to remain in the same
3783 relative position.  This module doesn't actually move the markers around
3784 -- that's handled in @file{insdel.c}.  This module just creates them and
3785 implements the primitives for working with them.  As markers are simple
3786 objects, this does not entail much.
3787
3788 Note that the standard arithmetic primitives (e.g. @code{+}) accept
3789 markers in place of integers and automatically substitute the value of
3790 @code{marker-position} for the marker, i.e. an integer describing the
3791 current buffer position of the marker.
3792
3793
3794
3795 @example
3796 extents.c
3797 extents.h
3798 @end example
3799
3800 This module implements the @dfn{extent} Lisp object type, which is like
3801 a marker that works over a range of text rather than a single position.
3802 Extents are also much more complex and powerful than markers and have a
3803 more efficient (and more algorithmically complex) implementation.  The
3804 implementation is described in detail in comments in @file{extents.c}.
3805
3806 The code in @file{extents.c} works closely with @file{insdel.c} so that
3807 extents are properly moved around as text is inserted and deleted.
3808 There is also code in @file{extents.c} that provides information needed
3809 by the redisplay mechanism for efficient operation. (Remember that
3810 extents can have display properties that affect [sometimes drastically,
3811 as in the @code{invisible} property] the display of the text they
3812 cover.)
3813
3814
3815
3816 @example
3817 editfns.c
3818 @end example
3819
3820 @file{editfns.c} contains the standard Lisp primitives for working with
3821 a buffer's text, and calls the low-level functions in @file{insdel.c}.
3822 It also contains primitives for working with @code{point} (the default
3823 buffer insertion location).
3824
3825 @file{editfns.c} also contains functions for retrieving various
3826 characteristics from the external environment: the current time, the
3827 process ID of the running XEmacs process, the name of the user who ran
3828 this XEmacs process, etc.  It's not clear why this code is in
3829 @file{editfns.c}.
3830
3831
3832
3833 @example
3834 callint.c
3835 cmds.c
3836 commands.h
3837 @end example
3838
3839 @cindex interactive
3840 These modules implement the basic @dfn{interactive} commands,
3841 i.e. user-callable functions.  Commands, as opposed to other functions,
3842 have special ways of getting their parameters interactively (by querying
3843 the user), as opposed to having them passed in a normal function
3844 invocation.  Many commands are not really meant to be called from other
3845 Lisp functions, because they modify global state in a way that's often
3846 undesired as part of other Lisp functions.
3847
3848 @file{callint.c} implements the mechanism for querying the user for
3849 parameters and calling interactive commands.  The bulk of this module is
3850 code that parses the interactive spec that is supplied with an
3851 interactive command.
3852
3853 @file{cmds.c} implements the basic, most commonly used editing commands:
3854 commands to move around the current buffer and insert and delete
3855 characters.  These commands are implemented using the Lisp primitives
3856 defined in @file{editfns.c}.
3857
3858 @file{commands.h} contains associated structure definitions and prototypes.
3859
3860
3861
3862 @example
3863 regex.c
3864 regex.h
3865 search.c
3866 @end example
3867
3868 @file{search.c} implements the Lisp primitives for searching for text in
3869 a buffer, and some of the low-level algorithms for doing this.  In
3870 particular, the fast fixed-string Boyer-Moore search algorithm is
3871 implemented in @file{search.c}.  The low-level algorithms for doing
3872 regular-expression searching, however, are implemented in @file{regex.c}
3873 and @file{regex.h}.  These two modules are largely independent of
3874 XEmacs, and are similar to (and based upon) the regular-expression
3875 routines used in @file{grep} and other GNU utilities.
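
For readers unfamiliar with this family of algorithms, here is a minimal
sketch of the Horspool simplification of Boyer-Moore, which keeps only
the bad-character shift table.  It is not the code in @file{search.c},
which is more elaborate; the function name @code{demo_horspool_search()}
is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Return the offset of the first occurrence of pat (length m) in
   text (length n), or -1 if there is none. */
long demo_horspool_search(const char *text, size_t n,
                          const char *pat, size_t m)
{
  size_t shift[UCHAR_MAX + 1];
  size_t i;

  assert(text != NULL && pat != NULL);
  if (m == 0)
    return 0;
  if (m > n)
    return -1;

  /* A character absent from the pattern allows a shift of m.  For a
     character in the pattern (other than its last position), shift so
     that its rightmost occurrence lines up with the text. */
  for (i = 0; i <= UCHAR_MAX; i++)
    shift[i] = m;
  for (i = 0; i + 1 < m; i++)
    shift[(unsigned char) pat[i]] = m - 1 - i;

  i = 0;
  while (i + m <= n) {
    /* Compare right to left, as Boyer-Moore does. */
    size_t j = m;
    while (j > 0 && text[i + j - 1] == pat[j - 1])
      j--;
    if (j == 0)
      return (long) i;
    /* Skip ahead based on the text character under the pattern's end. */
    i += shift[(unsigned char) text[i + m - 1]];
  }
  return -1;
}
```

The speed comes from the shift table: on a mismatch the pattern usually
moves forward by several characters at once, so many text characters are
never examined.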
3876
3877
3878
3879 @example
3880 doprnt.c
3881 @end example
3882
3883 @file{doprnt.c} implements formatted-string processing, similar to
3884 the @code{printf()} function in C.
3885
3886
3887
3888 @example
3889 undo.c
3890 @end example
3891
3892 This module implements the undo mechanism for tracking buffer changes.
3893 Most of this could be implemented in Lisp.
3894
3895
3896
3897 @node Editor-Level Control Flow Modules
3898 @section Editor-Level Control Flow Modules
3899 @cindex control flow modules, editor-level
3900 @cindex modules, editor-level control flow
3901
3902 @example
3903 event-Xt.c
3904 event-msw.c
3905 event-stream.c
3906 event-tty.c
3907 events-mod.h
3908 gpmevent.c
3909 gpmevent.h
3910 events.c
3911 events.h
3912 @end example
3913
3914 These implement the handling of events (user input and other system
3915 notifications).
3916
3917 @file{events.c} and @file{events.h} define the @dfn{event} Lisp object
3918 type and primitives for manipulating it.
3919
3920 @file{event-stream.c} implements the basic functions for working with
3921 event queues, dispatching an event by looking it up in relevant keymaps
3922 and such, and handling timeouts; this includes the primitives
3923 @code{next-event} and @code{dispatch-event}, as well as related
3924 primitives such as @code{sit-for}, @code{sleep-for}, and
3925 @code{accept-process-output}. (@file{event-stream.c} is one of the
3926 hairiest and trickiest modules in XEmacs.  Beware!  You can easily mess
3927 things up here.)
3928
3929 @file{event-Xt.c} and @file{event-tty.c} implement the low-level
3930 interfaces for retrieving events from Xt (the X toolkit) and from TTYs
3931 (using @code{read()} and @code{select()}), respectively.  The event
3932 interface enforces a clean separation between the specific code for
3933 interfacing with the operating system and the generic code for working
3934 with events, by defining an API of basic, low-level event methods;
3935 @file{event-Xt.c} and @file{event-tty.c} are two different
3936 implementations of this API.  To add support for a new operating system
3937 (e.g. NeXTstep), one merely needs to provide another implementation of
3938 those API functions.
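
The separation described above can be sketched as a table of function
pointers with two interchangeable implementations.  This is an
illustration under invented names, not the actual event API: the
structure @code{demo_event_methods}, its two methods, and both back ends
are assumptions, and neither back end talks to a real window system or
terminal.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int keycode; } demo_event;

/* The "API of event methods": generic code sees only this table. */
typedef struct {
  const char *name;
  int (*pending)(void *state);               /* nonzero if an event waits */
  void (*next)(void *state, demo_event *e);  /* fetch the next event */
} demo_event_methods;

/* Back end 1: events are the bytes of a string, TTY fashion. */
typedef struct { const char *bytes; } demo_tty_state;

static int demo_tty_pending(void *s)
{
  demo_tty_state *t = s;
  return *t->bytes != '\0';
}

static void demo_tty_next(void *s, demo_event *e)
{
  demo_tty_state *t = s;
  e->keycode = (unsigned char) *t->bytes++;
}

const demo_event_methods demo_tty_methods =
  { "tty", demo_tty_pending, demo_tty_next };

/* Back end 2: events come from a prepared queue of key codes. */
typedef struct { const int *codes; size_t count, pos; } demo_queue_state;

static int demo_queue_pending(void *s)
{
  demo_queue_state *q = s;
  return q->pos < q->count;
}

static void demo_queue_next(void *s, demo_event *e)
{
  demo_queue_state *q = s;
  e->keycode = q->codes[q->pos++];
}

const demo_event_methods demo_queue_methods =
  { "queue", demo_queue_pending, demo_queue_next };

/* Generic code: works with any back end, knowing nothing about it. */
long demo_sum_keycodes(const demo_event_methods *m, void *state)
{
  long sum = 0;
  demo_event e;

  assert(m != NULL);
  while (m->pending(state)) {
    m->next(state, &e);
    sum += e.keycode;
  }
  return sum;
}
```

Supporting a further source of events in this sketch means writing one
more pair of functions and one more table; @code{demo_sum_keycodes()}
does not change.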
3939
3940 Note that the choice of whether to use @file{event-Xt.c} or
3941 @file{event-tty.c} is made at compile time!  Or at the very latest, it
3942 is made at startup time.  @file{event-Xt.c} handles events for
3943 @emph{both} X and TTY frames; @file{event-tty.c} is only used when X
3944 support is not compiled into XEmacs.  The reason for this is that there
3945 is only one event loop in XEmacs: thus, it needs to be able to receive
3946 events from all different kinds of frames.
3947
3948
3949
3950 @example
3951 keymap.c
3952 keymap.h
3953 @end example
3954
3955 @file{keymap.c} and @file{keymap.h} define the @dfn{keymap} Lisp object
3956 type and associated methods and primitives. (Remember that keymaps are
3957 objects that associate event descriptions with functions to be called to
3958 ``execute'' those events; @code{dispatch-event} looks up events in the
3959 relevant keymaps.)
3960
3961
3962
3963 @example
3964 cmdloop.c
3965 @end example
3966
3967 @file{cmdloop.c} contains functions that implement the actual editor
3968 command loop---i.e. the event loop that cyclically retrieves and
3969 dispatches events.  This code is also rather tricky, just like
3970 @file{event-stream.c}.
3971
3972
3973
3974 @example
3975 macros.c
3976 macros.h
3977 @end example
3978
3979 These two modules contain the basic code for defining keyboard macros.
3980 These functions don't actually do much; most of the code that handles keyboard
3981 macros is mixed in with the event-handling code in @file{event-stream.c}.
3982
3983
3984
3985 @example
3986 minibuf.c
3987 @end example
3988
3989 This contains some miscellaneous code related to the minibuffer (most of
3990 the minibuffer code was moved into Lisp by Richard Mlynarik).  This
3991 includes the primitives for completion (although filename completion is
3992 in @file{dired.c}), the lowest-level interface to the minibuffer (if the
3993 command loop were cleaned up, this too could be in Lisp), and code for
3994 dealing with the echo area (this, too, was mostly moved into Lisp, and
3995 the only code remaining is code to call out to Lisp or provide simple
3996 bootstrapping implementations early in temacs, before the echo-area Lisp
3997 code is loaded).
3998
3999
4000
4001 @node Modules for the Basic Displayable Lisp Objects
4002 @section Modules for the Basic Displayable Lisp Objects
4003 @cindex modules for the basic displayable Lisp objects
4004 @cindex displayable Lisp objects, modules for the basic
4005 @cindex Lisp objects, modules for the basic displayable
4006 @cindex objects, modules for the basic displayable Lisp
4007
4008 @example
4009 console-msw.c
4010 console-msw.h
4011 console-stream.c
4012 console-stream.h
4013 console-tty.c
4014 console-tty.h
4015 console-x.c
4016 console-x.h
4017 console.c
4018 console.h
4019 @end example
4020
4021 These modules implement the @dfn{console} Lisp object type.  A console
4022 contains multiple display devices, but only one keyboard and mouse.
4023 Most of the time, a console will contain exactly one device.
4024
4025 Consoles are the top of a Lisp object inclusion hierarchy.  Consoles
4026 contain devices, which contain frames, which contain windows.
4027
4028
4029
4030 @example
4031 device-msw.c
4032 device-tty.c
4033 device-x.c
4034 device.c
4035 device.h
4036 @end example
4037
4038 These modules implement the @dfn{device} Lisp object type.  This
4039 abstracts a particular screen or connection on which frames are
4040 displayed.  As with Lisp objects, event interfaces, and other
4041 subsystems, the device code is separated into a generic component,
4042 which defines a standardized interface (in the form of a set of
4043 methods), and the particular device types that implement it.
4044
4045 The device subsystem defines all the methods and provides method
4046 services for not only device operations but also for the frame, window,
4047 menubar, scrollbar, toolbar, and other displayable-object subsystems.
4048 The reason for this is that all of these subsystems have the same
4049 subtypes (X, TTY, NeXTstep, Microsoft Windows, etc.) as devices do.
4050
4051
4052
4053 @example
4054 frame-msw.c
4055 frame-tty.c
4056 frame-x.c
4057 frame.c
4058 frame.h
4059 @end example
4060
4061 Each device contains one or more frames in which objects (e.g. text) are
4062 displayed.  A frame corresponds to a window in the window system;
4063 usually this is a top-level window but it could potentially be one of a
4064 number of overlapping child windows within a top-level window, using the
4065 MDI (Multiple Document Interface) protocol in Microsoft Windows or a
4066 similar scheme.
4067
4068 The @file{frame-*} files implement the @dfn{frame} Lisp object type and
4069 provide the generic and device-type-specific operations on frames
4070 (e.g. raising, lowering, resizing, moving, etc.).
4071
4072
4073
4074 @example
4075 window.c
4076 window.h
4077 @end example
4078
4079 @cindex window (in Emacs)
4080 @cindex pane
4081 Each frame consists of one or more non-overlapping @dfn{windows} (better
4082 known as @dfn{panes} in standard window-system terminology) in which a
4083 buffer's text can be displayed.  Windows can also have scrollbars
4084 displayed around their edges.
4085
4086 @file{window.c} and @file{window.h} implement the @dfn{window} Lisp
4087 object type and provide code to manage windows.  Since windows have no
4088 associated resources in the window system (the window system knows only
4089 about the frame; no child windows or anything are used for XEmacs
4090 windows), there is no device-type-specific code here; all of that code
4091 is part of the redisplay mechanism or the code for particular object
4092 types such as scrollbars.
4093
4094
4095
4096 @node Modules for other Display-Related Lisp Objects
4097 @section Modules for other Display-Related Lisp Objects
4098 @cindex modules for other display-related Lisp objects
4099 @cindex display-related Lisp objects, modules for other
4100 @cindex Lisp objects, modules for other display-related
4101
4102 @example
4103 faces.c
4104 faces.h
4105 @end example
4106
4107
4108
4109 @example
4110 bitmaps.h
4111 glyphs-eimage.c
4112 glyphs-msw.c
4113 glyphs-msw.h
4114 glyphs-widget.c
4115 glyphs-x.c
4116 glyphs-x.h
4117 glyphs.c
4118 glyphs.h
4119 @end example
4120
4121
4122
4123 @example
4124 objects-msw.c
4125 objects-msw.h
4126 objects-tty.c
4127 objects-tty.h
4128 objects-x.c
4129 objects-x.h
4130 objects.c
4131 objects.h
4132 @end example
4133
4134
4135
4136 @example
4137 menubar-msw.c
4138 menubar-msw.h
4139 menubar-x.c
4140 menubar.c
4141 menubar.h
4142 @end example
4143
4144
4145
4146 @example
4147 scrollbar-msw.c
4148 scrollbar-msw.h
4149 scrollbar-x.c
4150 scrollbar-x.h
4151 scrollbar.c
4152 scrollbar.h
4153 @end example
4154
4155
4156
4157 @example
4158 toolbar-msw.c
4159 toolbar-x.c
4160 toolbar.c
4161 toolbar.h
4162 @end example
4163
4164
4165
4166 @example
4167 font-lock.c
4168 @end example
4169
4170 This file provides C support for syntax highlighting---i.e.
4171 highlighting different syntactic constructs of a source file in
4172 different colors, for easy reading.  The C support is provided so that
4173 this is fast.
4174
4175 As of 21.4.10, bugs introduced at the very end of the 21.2 series in the
4176 ``syntax properties'' code were fixed, and highlighting is acceptably
4177 quick again.  However, presumably more improvements are possible, and
4178 the places to look are probably here, in the defun-traversing code, and
4179 in @file{syntax.c}, in the comment-traversing code.
4180
4181
4182 @example
4183 dgif_lib.c
4184 gif_err.c
4185 gif_lib.h
4186 gifalloc.c
4187 @end example
4188
4189 These modules decode GIF-format image files, for use with glyphs.
4190 These files were removed due to Unisys patent infringement concerns.
4191
4192
4193
4194 @node Modules for the Redisplay Mechanism
4195 @section Modules for the Redisplay Mechanism
4196 @cindex modules for the redisplay mechanism
4197 @cindex redisplay mechanism, modules for the
4198
4199 @example
4200 redisplay-output.c
4201 redisplay-msw.c
4202 redisplay-tty.c
4203 redisplay-x.c
4204 redisplay.c
4205 redisplay.h
4206 @end example
4207
4208 These files provide the redisplay mechanism.  As with many other
4209 subsystems in XEmacs, there is a clean separation between the general
4210 and device-specific support.
4211
4212 @file{redisplay.c} contains the bulk of the redisplay engine.  These
4213 functions update the redisplay structures (which describe how the screen
4214 is to appear) to reflect any changes made to the state of any
4215 displayable objects (buffer, frame, window, etc.) since the last time
4216 that redisplay was called.  These functions are highly optimized to
4217 avoid doing more work than necessary (since redisplay is called
4218 extremely often and is potentially a huge time sink), and depend heavily
4219 on notifications from the objects themselves that changes have occurred,
4220 so that redisplay doesn't explicitly have to check each possible object.
4221 The redisplay mechanism also contains a great deal of caching to further
4222 speed things up; some of this caching is contained within the various
4223 displayable objects.
4224
4225 @file{redisplay-output.c} goes through the redisplay structures and converts
4226 them into calls to device-specific methods to actually output the screen
4227 changes.
4228
4229 @file{redisplay-x.c} and @file{redisplay-tty.c} are two implementations
4230 of these redisplay output methods, for X frames and TTY frames,
4231 respectively.
4232
4233
4234
4235 @example
4236 indent.c
4237 @end example
4238
4239 This module contains various functions and Lisp primitives for
4240 converting between buffer positions and screen positions.  These
4241 functions call the redisplay mechanism to do most of the work, and then
4242 examine the redisplay structures to get the necessary information.  This
4243 module needs work.
4244
4245
4246
4247 @example
4248 termcap.c
4249 terminfo.c
4250 tparam.c
4251 @end example
4252
4253 These files contain functions for working with the termcap (BSD-style)
4254 and terminfo (System V style) databases of terminal capabilities and
4255 escape sequences, used when XEmacs is displaying in a TTY.
4256
4257
4258
4259 @example
4260 cm.c
4261 cm.h
4262 @end example
4263
4264 These files provide some miscellaneous TTY-output functions and should
4265 probably be merged into @file{redisplay-tty.c}.
4266
4267
4268
4269 @node Modules for Interfacing with the File System
4270 @section Modules for Interfacing with the File System
4271 @cindex modules for interfacing with the file system
4272 @cindex interfacing with the file system, modules for
4273 @cindex file system, modules for interfacing with the
4274
4275 @example
4276 lstream.c
4277 lstream.h
4278 @end example
4279
4280 These modules implement the @dfn{stream} Lisp object type.  This is an
4281 internal-only Lisp object that implements a generic buffering stream.
4282 The idea is to provide a uniform interface onto all sources and sinks of
4283 data, including file descriptors, stdio streams, chunks of memory, Lisp
4284 buffers, Lisp strings, etc.  That way, I/O functions can be written to
4285 the stream interface and can transparently handle all possible sources
4286 and sinks.  (For example, the @code{read} function can read data from a
4287 file, a string, a buffer, or even a function that is called repeatedly
4288 to return data, without worrying about where the data is coming from or
4289 what-size chunks it is returned in.)
4290
4291 @cindex lstream
4292 Note that in the C code, streams are called @dfn{lstreams} (for ``Lisp
4293 streams'') to distinguish them from other kinds of streams, e.g. stdio
4294 streams and C++ I/O streams.
4295
4296 Similar to other subsystems in XEmacs, lstreams are separated into
4297 generic functions and a set of methods for the different types of
4298 lstreams.  @file{lstream.c} provides implementations of many different
4299 types of streams; others are provided, e.g., in @file{file-coding.c}.
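
The division into generic functions and per-type methods can be
sketched as follows.  This is a simplified, read-only illustration and
not the lstream interface: the names (@code{demo_stream},
@code{demo_stream_getc()}, and so on), the 16-byte buffer, and the two
sample sources are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct demo_stream demo_stream;

struct demo_stream {
  /* Method: store up to size bytes in buf; return the count, or 0
     when the source is exhausted. */
  size_t (*reader)(demo_stream *s, char *buf, size_t size);
  void *source;          /* private data of the particular source */
  char buffer[16];       /* generic buffering */
  size_t buf_len, buf_pos;
};

/* Generic: return the next byte, or -1 at end of data. */
int demo_stream_getc(demo_stream *s)
{
  if (s->buf_pos == s->buf_len) {
    s->buf_len = s->reader(s, s->buffer, sizeof s->buffer);
    s->buf_pos = 0;
    if (s->buf_len == 0)
      return -1;
  }
  return (unsigned char) s->buffer[s->buf_pos++];
}

/* Generic: read everything into out as a NUL-terminated string. */
size_t demo_stream_read_all(demo_stream *s, char *out, size_t outsize)
{
  size_t n = 0;
  int c;

  assert(outsize > 0);
  while (n + 1 < outsize && (c = demo_stream_getc(s)) != -1)
    out[n++] = (char) c;
  out[n] = '\0';
  return n;
}

/* Source 1: a C string, consumed through a cursor. */
static size_t demo_string_reader(demo_stream *s, char *buf, size_t size)
{
  const char **cursor = s->source;
  size_t n = strlen(*cursor);
  if (n > size)
    n = size;
  memcpy(buf, *cursor, n);
  *cursor += n;
  return n;
}

/* Source 2: a generator called repeatedly, one digit per call. */
static size_t demo_digit_reader(demo_stream *s, char *buf, size_t size)
{
  int *next = s->source;
  if (size == 0 || *next > 9)
    return 0;
  buf[0] = (char) ('0' + (*next)++);
  return 1;
}

static void demo_open(demo_stream *s,
                      size_t (*reader)(demo_stream *, char *, size_t),
                      void *source)
{
  s->reader = reader;
  s->source = source;
  s->buf_len = 0;
  s->buf_pos = 0;
}

void demo_open_string(demo_stream *s, const char **cursor)
{
  demo_open(s, demo_string_reader, (void *) cursor);
}

void demo_open_digits(demo_stream *s, int *counter)
{
  demo_open(s, demo_digit_reader, counter);
}
```

The caller of @code{demo_stream_read_all()} gets the same result whether
the source delivers data sixteen bytes at a time or one byte at a time,
which is the property the text describes for @code{read}.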
4300
4301
4302
4303 @example
4304 fileio.c
4305 @end example
4306
4307 This implements the basic primitives for interfacing with the file
4308 system.  This includes primitives for reading files into buffers,
4309 writing buffers into files, checking for the presence or accessibility
4310 of files, canonicalizing file names, etc.  Note that these primitives
4311 are usually not invoked directly by the user: There is a great deal of
4312 higher-level Lisp code that implements the user commands such as
4313 @code{find-file} and @code{save-buffer}.  This is similar to the
4314 distinction between the lower-level primitives in @file{editfns.c} and
4315 the higher-level user commands in @file{cmds.c} and
4316 @file{simple.el}.
4317
4318
4319
4320 @example
4321 filelock.c
4322 @end example
4323
4324 This file provides functions for detecting clashes between different
4325 processes (e.g. XEmacs and some external process, or two different
4326 XEmacs processes) modifying the same file.  (XEmacs can optionally use
4327 the @file{lock/} subdirectory to provide a form of ``locking'' between
4328 different XEmacs processes.)  This module is also used by the low-level
4329 functions in @file{insdel.c} to ensure that, if the first modification
4330 is being made to a buffer whose corresponding file has been externally
4331 modified, the user is made aware of this so that the buffer can be
4332 synched up with the external changes if necessary.
4333
4334
4335 @example
4336 filemode.c
4337 @end example
4338
4339 This file provides some miscellaneous functions that construct a
4340 @samp{rwxr-xr-x}-type permissions string (as might appear in an
4341 @file{ls}-style directory listing) given the information returned by the
4342 @code{stat()} system call.
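
As an illustration of the idea, the following sketch builds the nine
permission characters from a @code{st_mode} value using the standard
POSIX permission macros.  It is not the code in @file{filemode.c}: the
function name @code{demo_mode_string()} is hypothetical, and the sketch
deliberately ignores the file-type character and the setuid, setgid, and
sticky bits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Fill buf with a string such as "rwxr-xr-x" describing the
   permission bits of mode.  buf must hold at least 10 bytes. */
void demo_mode_string(mode_t mode, char buf[10])
{
  static const mode_t bits[9] = {
    S_IRUSR, S_IWUSR, S_IXUSR,   /* owner */
    S_IRGRP, S_IWGRP, S_IXGRP,   /* group */
    S_IROTH, S_IWOTH, S_IXOTH    /* others */
  };
  static const char letters[] = "rwxrwxrwx";
  size_t i;

  assert(buf != NULL);
  for (i = 0; i < 9; i++)
    buf[i] = (mode & bits[i]) ? letters[i] : '-';
  buf[9] = '\0';
}
```

A caller would typically pass the @code{st_mode} member of the
@code{struct stat} filled in by @code{stat()}.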
4343
4344
4345
4346 @example
4347 dired.c
4348 ndir.h
4349 @end example
4350
4351 These files implement the XEmacs interface to directory searching.  This
4352 includes a number of primitives for determining the files in a directory
4353 and for doing filename completion. (Remember that generic completion is
4354 handled by a different mechanism, in @file{minibuf.c}.)
4355
4356 @file{ndir.h} is a header file used for the directory-searching
4357 emulation functions provided in @file{sysdep.c} (see section J below),
4358 for systems that don't provide any directory-searching functions. (On
4359 those systems, directories can be read directly as files, and parsed.)
4360
4361
4362
4363 @example
4364 realpath.c
4365 @end example
4366
4367 This file provides an implementation of the @code{realpath()} function
4368 for expanding symbolic links, on systems that don't implement it or have
4369 a broken implementation.
4370
4371
4372
4373 @node Modules for Other Aspects of the Lisp Interpreter and Object System
4374 @section Modules for Other Aspects of the Lisp Interpreter and Object System
4375 @cindex modules for other aspects of the Lisp interpreter and object system
4376 @cindex Lisp interpreter and object system, modules for other aspects of the
4377 @cindex interpreter and object system, modules for other aspects of the Lisp
4378 @cindex object system, modules for other aspects of the Lisp interpreter and
4379
4380 @example
4381 elhash.c
4382 elhash.h
4383 hash.c
4384 hash.h
4385 @end example
4386
4387 These files provide two implementations of hash tables.  Files
4388 @file{hash.c} and @file{hash.h} provide a generic C implementation of
4389 hash tables which can stand independently of XEmacs.  Files
4390 @file{elhash.c} and @file{elhash.h} provide a separate implementation of
4391 hash tables that can store only Lisp objects and that knows about Lispy
4392 things like garbage collection; they implement the @dfn{hash-table} Lisp
4393 object type.
4394
4395
4396 @example
4397 specifier.c
4398 specifier.h
4399 @end example
4400
4401 This module implements the @dfn{specifier} Lisp object type.  This is
4402 primarily used for displayable properties, and allows for values that
4403 are specific to a particular buffer, window, frame, device, or device
4404 class, as well as for a default value.  This is used, for example,
4405 to control the height of the horizontal scrollbar or the appearance of
4406 the @code{default}, @code{bold}, or other faces.  The specifier object
4407 consists of a number of specifications, each of which maps from a
4408 buffer, window, etc. to a value.  The function @code{specifier-instance}
4409 looks up a value given a window (from which a buffer, frame, and device
4410 can be derived).
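
The lookup can be pictured with a toy model, sketched here under loud assumptions: all names are hypothetical, values are plain integers, device classes are left out, and the search order (window, then buffer, frame, device, and finally the default) is an assumption of this sketch rather than something stated above.

```c
#include <stddef.h>

/* Toy locale kinds, listed in the order in which this sketch searches
   them.  The ordering is an assumption.  */
enum toy_locale_kind
{
  TOY_WINDOW, TOY_BUFFER, TOY_FRAME, TOY_DEVICE, TOY_DEFAULT, TOY_NKINDS
};

/* One specification: maps a locale (kind plus id) to a value.  */
struct toy_spec
{
  enum toy_locale_kind kind;
  int locale_id;            /* ignored for TOY_DEFAULT */
  int value;
};

/* The ids of the window being displayed and of the buffer, frame, and
   device derived from it, indexed by locale kind.  */
struct toy_domain
{
  int ids[TOY_NKINDS];
};

/* Return the value of the first specification that applies to DOMAIN,
   or FALLBACK if none applies.  */
int
toy_specifier_instance (const struct toy_spec *specs, size_t nspecs,
                        const struct toy_domain *domain, int fallback)
{
  int kind;
  size_t i;

  for (kind = 0; kind < TOY_NKINDS; kind++)
    for (i = 0; i < nspecs; i++)
      if ((int) specs[i].kind == kind
          && (kind == TOY_DEFAULT
              || specs[i].locale_id == domain->ids[kind]))
        return specs[i].value;
  return fallback;
}
```

With a default specification of 10 and a specification of 20 for frame 3, any window on frame 3 that has no specification of its own yields 20, and windows on other frames yield 10.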
4411
4412
4413 @example
4414 chartab.c
4415 chartab.h
4416 casetab.c
4417 @end example
4418
4419 @file{chartab.c} and @file{chartab.h} implement the @dfn{char table}
4420 Lisp object type, which maps from characters or certain sorts of
4421 character ranges to Lisp objects.  The implementation of this object
4422 type is optimized for the internal representation of characters.  Char
4423 tables come in different types, which affect the allowed object types to
4424 which a character can be mapped and also dictate certain other
4425 properties of the char table.
4426
4427 @cindex case table
4428 @file{casetab.c} implements one sort of char table, the @dfn{case
4429 table}, which maps characters to other characters of possibly different
4430 case.  These are used by XEmacs to implement case-changing primitives
4431 and to do case-insensitive searching.
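
To make the use of such a table concrete, here is a minimal sketch of a case-insensitive comparison driven by a downcase table.  It is not the @file{casetab.c} implementation: the names are hypothetical, the table is a flat 256-entry array rather than a char table, and only the ASCII letters are mapped.

```c
/* Hypothetical 256-entry downcase table: every byte maps to itself
   except the ASCII capitals, which map to their lower-case forms.
   Assumes an ASCII-compatible execution character set.  */
void
toy_init_downcase (unsigned char table[256])
{
  int i;

  for (i = 0; i < 256; i++)
    table[i] = (unsigned char) i;
  for (i = 'A'; i <= 'Z'; i++)
    table[i] = (unsigned char) (i - 'A' + 'a');
}

/* Return 1 if A and B are equal once both are mapped through TABLE,
   0 otherwise.  */
int
toy_equal_ignoring_case (const unsigned char table[256],
                         const char *a, const char *b)
{
  while (*a && *b)
    {
      if (table[(unsigned char) *a] != table[(unsigned char) *b])
        return 0;
      a++;
      b++;
    }
  /* Equal only if both strings ended at the same time.  */
  return *a == *b;
}
```

Keeping the mapping in a table rather than in code means that a different table can be swapped in without touching the comparison loop.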
4432
4433
4434
4435 @example
4436 syntax.c
4437 syntax.h
4438 @end example
4439
4440 @cindex scanner
4441 This module implements @dfn{syntax tables}, another sort of char table
4442 that maps characters into syntax classes that define the syntax of these
4443 characters (e.g. a parenthesis belongs to a class of @samp{open}
4444 characters that have corresponding @samp{close} characters and can be
4445 nested).  This module also implements the Lisp @dfn{scanner}, a set of
4446 primitives for scanning over text based on syntax tables.  This is used,
4447 for example, to find the matching parenthesis in a command such as
4448 @code{forward-sexp}, and by @file{font-lock.c} to locate quoted strings,
4449 comments, etc.
4450
4451 @c #### Break this out into a separate node somewhere!
4452 Syntax codes are implemented as bitfields in an int.  Bits 0-6 contain
4453 the syntax code itself, bit 7 is a special prefix flag used for Lisp,
4454 and bits 16-23 contain comment syntax flags.  From the Lisp programmer's
4455 point of view, there are 11 flags: 2 styles X 2 characters X @{start,
4456 end@} flags for two-character comment delimiters, 2 style flags for
4457 one-character comment delimiters, and the prefix flag.
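
The bit layout just described can be sketched with a few accessor macros.  The field positions come from the paragraph above; the macro and function names are hypothetical, and the sketch deliberately does not say which of the eight comment bits carries which flag.

```c
/* Accessors for the three fields of a syntax code stored in an int.  */
#define TOY_SYNTAX_CODE(v)         ((v) & 0x7F)          /* bits 0-6   */
#define TOY_SYNTAX_PREFIX_P(v)     (((v) >> 7) & 1)      /* bit 7      */
#define TOY_SYNTAX_COMMENT_BITS(v) (((v) >> 16) & 0xFF)  /* bits 16-23 */

/* Pack the three fields into a single int; out-of-range input bits
   are masked off.  */
int
toy_make_syntax (int code, int prefix, int comment_bits)
{
  return (code & 0x7F)
    | ((prefix & 1) << 7)
    | ((comment_bits & 0xFF) << 16);
}
```

Bits 8-15 are left untouched by this sketch, since the text above assigns them no meaning.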
4458
4459 Internally, however, the characters used in multi-character delimiters
4460 will have non-comment-character syntax classes (@emph{e.g.}, the
4461 @samp{/} in C's @samp{/*} comment-start delimiter has ``punctuation''
4462 (here meaning ``operator-like'') class in C modes).  Thus a mixed
4463 comment style, such as C++'s @samp{//} to end of line, is represented by
4464 giving @samp{/} the ``punctuation'' class and the ``style b first
4465 character of start sequence'' and ``style b second character of start
4466 sequence'' flags.  The fact that the class is @emph{not} comment-start allows
4467 the syntax scanner to recognize that this is a multi-character
4468 delimiter.  The @samp{newline} character is given (single-character)
4469 ``comment-end'' @emph{class} and the ``style b first character of end
4470 sequence'' @emph{flag}.  The ``comment-end'' class allows the scanner to
4471 determine that no second character is needed to terminate the comment.
4472
4473
4474 @example
4475 casefiddle.c
4476 @end example
4477
4478 This module implements various Lisp primitives for upcasing, downcasing
4479 and capitalizing strings or regions of buffers.
4480
4481
4482
4483 @example
4484 rangetab.c
4485 @end example
4486
4487 This module implements the @dfn{range table} Lisp object type, which
4488 provides for a mapping from ranges of integers to arbitrary Lisp
4489 objects.
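
A minimal sketch of such a mapping follows.  It assumes, purely for illustration, that ranges are inclusive at both ends, do not overlap, and are kept sorted; the names are hypothetical, and C strings stand in for Lisp objects.

```c
#include <stddef.h>

/* One entry of a toy range table: the inclusive range FIRST..LAST
   maps to VALUE.  */
struct toy_range
{
  long first;
  long last;
  const char *value;
};

/* Binary search over TAB, whose N entries must be sorted by FIRST and
   must not overlap.  Returns NULL if POS lies in no range.  */
const char *
toy_range_lookup (const struct toy_range *tab, size_t n, long pos)
{
  size_t lo = 0, hi = n;

  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;

      if (pos < tab[mid].first)
        hi = mid;
      else if (pos > tab[mid].last)
        lo = mid + 1;
      else
        return tab[mid].value;
    }
  return NULL;
}
```

Storing one entry per range rather than one per integer keeps the table small when long runs of integers map to the same object.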
4490
4491
4492
4493 @example
4494 opaque.c
4495 opaque.h
4496 @end example
4497
4498 This module implements the @dfn{opaque} Lisp object type, an
4499 internal-only Lisp object that encapsulates an arbitrary block of memory
4500 so that it can be managed by the Lisp allocation system.  To create an
4501 opaque object, you call @code{make_opaque()}, passing a pointer to a
4502 block of memory.  An object is created that is big enough to hold the
4503 memory, which is copied into the object's storage.  The object will then
4504 stick around as long as you keep pointers to it, after which it will be
4505 automatically reclaimed.
4506
4507 @cindex mark method
4508 Opaque objects can also have an arbitrary @dfn{mark method} associated
4509 with them, in case the block of memory contains other Lisp objects that
4510 need to be marked for garbage-collection purposes. (If you need other
4511 object methods, such as a finalize method, you should just go ahead and
4512 create a new Lisp object type---it's not hard.)
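
The copy-into-the-object behavior can be illustrated with a toy version.  This is only a sketch: @code{toy_make_opaque} is a hypothetical stand-in, not the real @code{make_opaque()}, its argument list is an assumption, and the result is a plain @code{malloc()}ed block that the caller must free instead of a garbage-collected Lisp object.

```c
#include <stdlib.h>
#include <string.h>

/* Toy opaque object: a size followed by a private copy of the data.  */
struct toy_opaque
{
  size_t size;
  unsigned char data[];   /* C99 flexible array member */
};

/* Allocate an object big enough to hold SIZE bytes and copy the block
   at PTR into it.  Returns NULL if allocation fails.  */
struct toy_opaque *
toy_make_opaque (const void *ptr, size_t size)
{
  struct toy_opaque *op = malloc (sizeof *op + size);

  if (op == NULL)
    return NULL;
  op->size = size;
  if (size > 0)
    memcpy (op->data, ptr, size);
  return op;
}
```

Because the data is copied, later changes to the caller's original block do not affect the object's contents.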
4513
4514
4515
4516 @example
4517 abbrev.c
4518 @end example
4519
4520 This module provides a few primitives for doing dynamic abbreviation
4521 expansion.  In XEmacs, most of the code for this has been moved into
4522 Lisp.  Some C code remains for speed and because the primitive
4523 @code{self-insert-command} (which is executed for all self-inserting
4524 characters) hooks into the abbrev mechanism. (@code{self-insert-command}
4525 is itself in C only for speed.)
4526
4527
4528
4529 @example
4530 doc.c
4531 @end example
4532
4533 This module provides primitives for retrieving the documentation
4534 strings of functions and variables.  These documentation strings contain
4535 certain special markers that get dynamically expanded (e.g. a
4536 reverse-lookup is performed on some named functions to retrieve their
4537 current key bindings).  Some documentation strings (in particular, for
4538 the built-in primitives and pre-loaded Lisp functions) are stored
4539 externally in a file @file{DOC} in the @file{lib-src/} directory and
4540 need to be fetched from that file. (Part of the build stage involves
4541 building this file, and another part involves constructing an index for
4542 this file and embedding it into the executable, so that the functions in
4543 @file{doc.c} do not have to search the entire @file{DOC} file to find
4544 the appropriate documentation string.)
4545
4546
4547
4548 @example
4549 md5.c
4550 @end example
4551
4552 This module provides a Lisp primitive that implements the MD5 secure
4553 hashing scheme, used to create a large hash value of a string of data such that
4554 the data cannot be derived from the hash value.  This is used for
4555 various security applications on the Internet.
4556
4557
4558
4559
4560 @node Modules for Interfacing with the Operating System
4561 @section Modules for Interfacing with the Operating System
4562 @cindex modules for interfacing with the operating system
4563 @cindex interfacing with the operating system, modules for
4564 @cindex operating system, modules for interfacing with the
4565
4566 @example
4567 callproc.c
4568 process.c
4569 process.h
4570 @end example
4571
4572 These modules allow XEmacs to spawn and communicate with subprocesses
4573 and network connections.
4574
4575 @cindex synchronous subprocesses
4576 @cindex subprocesses, synchronous
4577   @file{callproc.c} implements (through the @code{call-process}
4578 primitive) what are called @dfn{synchronous subprocesses}.  This means
4579 that XEmacs runs a program, waits till it's done, and retrieves its
4580 output.  A typical example might be calling the @file{ls} program to get
4581 a directory listing.
4582
4583 @cindex asynchronous subprocesses
4584 @cindex subprocesses, asynchronous
4585   @file{process.c} and @file{process.h} implement @dfn{asynchronous
4586 subprocesses}.  This means that XEmacs starts a program and then
4587 continues normally, not waiting for the process to finish.  Data can be
4588 sent to the process or retrieved from it as it's running.  This is used
4589 for the @code{shell} command (which provides a front end onto a shell
4590 program such as @file{csh}), the mail and news readers implemented in
4591 XEmacs, etc.  The result of calling @code{start-process} to start a
4592 subprocess is a process object, a particular kind of object used to
4593 communicate with the subprocess.  You can send data to the process by
4594 passing the process object and the data to @code{send-process}, and you
4595 can specify what happens to data retrieved from the process by setting
4596 properties of the process object. (When the process sends data, XEmacs
4597 receives a process event, which says that there is data ready.  When
4598 @code{dispatch-event} is called on this event, it reads the data from
4599 the process and does something with it, as specified by the process
4600 object's properties.  Typically, this means inserting the data into a
4601 buffer or calling a function.) Another property of the process object is
4602 called the @dfn{sentinel}, which is a function that is called when the
4603 process terminates.
4604
4605 @cindex network connections
4606   Process objects are also used for network connections (connections to a
4607 process running on another machine).  Network connections are started
4608 with @code{open-network-stream} but otherwise work just like
4609 subprocesses.
4610
4611
4612
4613 @example
4614 sysdep.c
4615 sysdep.h
4616 @end example
4617
4618   These modules implement most of the low-level, messy operating-system
4619 interface code.  This includes various device control (ioctl) operations
4620 for file descriptors, TTY's, pseudo-terminals, etc. (usually this stuff
4621 is fairly system-dependent; thus the name of this module), and emulation
4622 of standard library functions and system calls on systems that don't
4623 provide them or have broken versions.
4624
4625
4626
4627 @example
4628 sysdir.h
4629 sysfile.h
4630 sysfloat.h
4631 sysproc.h
4632 syspwd.h
4633 syssignal.h
4634 systime.h
4635 systty.h
4636 syswait.h
4637 @end example
4638
4639 These header files provide consistent interfaces onto system-dependent
4640 header files and system calls.  The idea is that, instead of including a
4641 standard header file like @file{<sys/param.h>} (which may or may not
4642 exist on various systems) or having to worry about whether all systems
4643 provide a particular preprocessor constant, or having to deal with the
4644 four different paradigms for manipulating signals, you just include the
4645 appropriate @file{sys*.h} header file, which includes all the right
4646 system header files, defines any missing preprocessor constants,
4647 provides a uniform interface onto system calls, etc.
4648
4649 @file{sysdir.h} provides a uniform interface onto directory-querying
4650 functions. (In some cases, this is in conjunction with emulation
4651 functions in @file{sysdep.c}.)
4652
4653 @file{sysfile.h} includes all the necessary header files for standard
4654 system calls (e.g. @code{read()}), ensures that all necessary
4655 @code{open()} and @code{stat()} preprocessor constants are defined, and
4656 possibly (usually) substitutes sugared versions of @code{read()},
4657 @code{write()}, etc. that automatically restart interrupted I/O
4658 operations.
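
The restart-on-interrupt idea can be sketched as a small wrapper around @code{read()}.  The function name is hypothetical and this is not the definition used in @file{sysfile.h}; it only assumes the POSIX convention that an interrupted call fails with @code{errno} set to @code{EINTR}.

```c
#include <errno.h>
#include <unistd.h>

/* Like read(), but transparently retry when the call is interrupted
   by a signal before any data was transferred.  */
ssize_t
toy_read_restarting (int fd, void *buf, size_t len)
{
  ssize_t n;

  do
    n = read (fd, buf, len);
  while (n < 0 && errno == EINTR);
  return n;
}
```

Callers then see only genuine errors, end of file, or data, and never have to write the retry loop themselves.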
4659
4660 @file{sysfloat.h} includes the necessary header files for floating-point
4661 operations.
4662
4663 @file{sysproc.h} includes the necessary header files for calling
4664 @code{select()}, @code{fork()}, @code{execve()}, socket operations, and
4665 the like, and ensures that the @code{FD_*()} macros for descriptor-set
4666 manipulations are available.
4667
4668 @file{syspwd.h} includes the necessary header files for obtaining
4669 information from @file{/etc/passwd} (the functions are emulated under
4670 VMS).
4671
4672 @file{syssignal.h} includes the necessary header files for
4673 signal-handling and provides a uniform interface onto the different
4674 signal-handling and signal-blocking paradigms.
4675
4676 @file{systime.h} includes the necessary header files and provides
4677 uniform interfaces for retrieving the time of day, setting file
4678 access/modification times, getting the amount of time used by the XEmacs
4679 process, etc.
4680
4681 @file{systty.h} buffers against the infinitude of different ways of
4682 controlling TTY's.
4683
4684 @file{syswait.h} provides a uniform way of retrieving the exit status
4685 from a @code{wait()}ed-on process (some systems use a union, others use
4686 an int).
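
On a system that follows POSIX, where the status is an @code{int}, retrieving the exit status looks roughly like the following sketch.  The function name is hypothetical, and the code uses the POSIX macros directly rather than whatever interface @file{syswait.h} itself defines.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits immediately with CODE, wait for it, and
   return its exit status.  Returns -1 if fork() or waitpid() fails or
   if the child did not exit normally.  */
int
toy_child_exit_status (int code)
{
  pid_t pid = fork ();
  int status;

  if (pid < 0)
    return -1;
  if (pid == 0)
    _exit (code);               /* child */
  if (waitpid (pid, &status, 0) != pid)
    return -1;
  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

The point of a wrapper header is that code like this can be written once, against one set of macros, regardless of how the underlying system represents the status.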
4687
4688
4689
4690 @example
4691 hpplay.c
4692 libsst.c
4693 libsst.h
4694 libst.h
4695 linuxplay.c
4696 nas.c
4697 sgiplay.c
4698 sound.c
4699 sunplay.c
4700 @end example
4701
4702 These files implement the ability to play various sounds on some types
4703 of computers.  You have to configure your XEmacs with sound support in
4704 order to get this capability.
4705
4706 @file{sound.c} provides the generic interface.  It implements various
4707 Lisp primitives and variables that let you specify which sounds should
4708 be played in certain conditions. (The conditions are identified by
4709 symbols, which are passed to @code{ding} to make a sound.  Various
4710 standard functions call this function at certain times; if sound support
4711 does not exist, a simple beep results.)
4712
4713 @cindex native sound
4714 @cindex sound, native
4715 @file{sgiplay.c}, @file{sunplay.c}, @file{hpplay.c}, and
4716 @file{linuxplay.c} interface to the machine's speaker for various
4717 kinds of machines.  This is called @dfn{native} sound.
4718
4719 @cindex sound, network
4720 @cindex network sound
4721 @cindex NAS
4722 @file{nas.c} interfaces to a computer somewhere else on the network
4723 using the NAS (Network Audio Server) protocol, playing sounds on that
4724 machine.  This allows you to run XEmacs on a remote machine, with its
4725 display set to your local machine, and have the sounds be made on your
4726 local machine, provided that you have a NAS server running on your local
4727 machine.
4728
4729 @file{libsst.c}, @file{libsst.h}, and @file{libst.h} provide some
4730 additional functions for playing sound on a Sun SPARC but are not
4731 currently in use.
4732
4733
4734
4735 @example
4736 tooltalk.c
4737 tooltalk.h
4738 @end example
4739
4740 These two modules implement an interface to the ToolTalk protocol, which
4741 is an interprocess communication protocol implemented on some versions
4742 of Unix.  ToolTalk is a high-level protocol that allows processes to
4743 register themselves as providers of particular services; other processes
4744 can then request a service without knowing or caring exactly who is
4745 providing the service.  It is similar in spirit to the DDE protocol
4746 provided under Microsoft Windows.  ToolTalk is a part of the new CDE
4747 (Common Desktop Environment) specification and is used to connect the
4748 parts of the SPARCWorks development environment.
4749
4750
4751
4752 @example
4753 getloadavg.c
4754 @end example
4755
4756 This module provides the ability to retrieve the system's current load
4757 average. (The way to do this is highly system-specific, unfortunately,
4758 and requires a lot of special-case code.)
4759
4760
4761
4762 @example
4763 sunpro.c
4764 @end example
4765
4766 This module provides a small amount of code used internally at Sun to
4767 keep statistics on the usage of XEmacs.
4768
4769
4770
4771 @example
4772 broken-sun.h
4773 strcmp.c
4774 strcpy.c
4775 sunOS-fix.c
4776 @end example
4777
4778 These files provide replacement functions and prototypes to fix numerous
4779 bugs in early releases of SunOS 4.1.
4780
4781
4782
4783 @example
4784 hftctl.c
4785 @end example
4786
4787 This module provides some terminal-control code necessary on versions of
4788 AIX prior to 4.1.
4789
4790
4791
4792 @node Modules for Interfacing with X Windows
4793 @section Modules for Interfacing with X Windows
4794 @cindex modules for interfacing with X Windows
4795 @cindex interfacing with X Windows, modules for
4796 @cindex X Windows, modules for interfacing with
4797
4798 @example
4799 Emacs.ad.h
4800 @end example
4801
4802 A file generated from @file{Emacs.ad}, which contains XEmacs-supplied
4803 fallback resources (so that XEmacs has pretty defaults).
4804
4805
4806
4807 @example
4808 EmacsFrame.c
4809 EmacsFrame.h
4810 EmacsFrameP.h
4811 @end example
4812
4813 These modules implement an Xt widget class that encapsulates a frame.
4814 This is for ease in integrating with Xt.  The EmacsFrame widget covers
4815 the entire X window except for the menubar; the scrollbars are
4816 positioned on top of the EmacsFrame widget.
4817
4818 @strong{Warning:} Abandon hope, all ye who enter here.  This code took
4819 an ungodly amount of time to get right, and is likely to fall apart
4820 mercilessly at the slightest change.  Such is life under Xt.
4821
4822
4823
4824 @example
4825 EmacsManager.c
4826 EmacsManager.h
4827 EmacsManagerP.h
4828 @end example
4829
4830 These modules implement a simple Xt manager (i.e. composite) widget
4831 class that simply lets its children set whatever geometry they want.
4832 It's amazing that Xt doesn't provide this standardly, but on second
4833 thought, it makes sense, considering how amazingly broken Xt is.
4834
4835
4836 @example
4837 EmacsShell-sub.c
4838 EmacsShell.c
4839 EmacsShell.h
4840 EmacsShellP.h
4841 @end example
4842
4843 These modules implement two Xt widget classes that are subclasses of
4844 the TopLevelShell and TransientShell classes.  This is necessary to deal
4845 with more brokenness that Xt has sadistically thrust onto the backs of
4846 developers.
4847
4848
4849
4850 @example
4851 xgccache.c
4852 xgccache.h
4853 @end example
4854
4855 These modules provide functions for maintenance and caching of GC's
4856 (graphics contexts) under the X Window System.  This code is junky and
4857 needs to be rewritten.
4858
4859
4860
4861 @example
4862 select-msw.c
4863 select-x.c
4864 select.c
4865 select.h
4866 @end example
4867
4868 @cindex selections
4869   This module provides an interface to the X Window System's concept of
4870 @dfn{selections}, the standard way for X applications to communicate
4871 with each other.
4872
4873
4874
4875 @example
4876 xintrinsic.h
4877 xintrinsicp.h
4878 xmmanagerp.h
4879 xmprimitivep.h
4880 @end example
4881
4882 These header files are similar in spirit to the @file{sys*.h} files and buffer
4883 against different implementations of Xt and Motif.
4884
4885 @itemize @bullet
4886 @item
4887 @file{xintrinsic.h} should be included in place of @file{<Intrinsic.h>}.
4888 @item
4889 @file{xintrinsicp.h} should be included in place of @file{<IntrinsicP.h>}.
4890 @item
4891 @file{xmmanagerp.h} should be included in place of @file{<XmManagerP.h>}.
4892 @item
4893 @file{xmprimitivep.h} should be included in place of @file{<XmPrimitiveP.h>}.
4894 @end itemize
4895
4896
4897
4898 @example
4899 xmu.c
4900 xmu.h
4901 @end example
4902
4903 These files provide an emulation of the Xmu library for those systems
4904 (i.e. HPUX) that don't provide it as a standard part of X.
4905
4906
4907
4908 @example
4909 ExternalClient-Xlib.c
4910 ExternalClient.c
4911 ExternalClient.h
4912 ExternalClientP.h
4913 ExternalShell.c
4914 ExternalShell.h
4915 ExternalShellP.h
4916 extw-Xlib.c
4917 extw-Xlib.h
4918 extw-Xt.c
4919 extw-Xt.h
4920 @end example
4921
4922 @cindex external widget
4923   These files provide the @dfn{external widget} interface, which allows an
4924 XEmacs frame to appear as a widget in another application.  To do this,
4925 you have to configure with @samp{--external-widget}.
4926
4927 @file{ExternalShell*} provides the server (XEmacs) side of the
4928 connection.
4929
4930 @file{ExternalClient*} provides the client (other application) side of
4931 the connection.  These files are not compiled into XEmacs but are
4932 compiled into libraries that are then linked into your application.
4933
4934 @file{extw-*} is common code that is used for both the client and server.
4935
4936 Don't touch this code; something is liable to break if you do.
4937
4938
4939
4940 @node Modules for Internationalization
4941 @section Modules for Internationalization
4942 @cindex modules for internationalization
4943 @cindex internationalization, modules for
4944
4945 @example
4946 mule-canna.c
4947 mule-ccl.c
4948 mule-charset.c
4949 mule-charset.h
4950 file-coding.c
4951 file-coding.h
4952 mule-mcpath.c
4953 mule-mcpath.h
4954 mule-wnnfns.c
4955 mule.c
4956 @end example
4957
4958 These files implement the MULE (Asian-language) support.  Note that MULE
4959 actually provides a general interface for all sorts of languages, not
4960 just Asian languages (although they are generally the most complicated
4961 to support).  This code is still in beta.
4962
4963 @file{mule-charset.*} and @file{file-coding.*} provide the heart of the
4964 XEmacs MULE support.  @file{mule-charset.*} implements the @dfn{charset}
4965 Lisp object type, which encapsulates a character set (an ordered one- or
4966 two-dimensional set of characters, such as US ASCII or JISX0208 Japanese
4967 Kanji).
4968
4969 @file{file-coding.*} implements the @dfn{coding-system} Lisp object
4970 type, which encapsulates a method of converting between different
4971 encodings.  An encoding is a representation of a stream of characters,
4972 possibly from multiple character sets, using a stream of bytes or words,
4973 and defines (e.g.) which escape sequences are used to specify particular
4974 character sets, how the indices for a character are converted into bytes
4975 (sometimes this involves setting the high bit; sometimes complicated
4976 rearranging of the values takes place, as in the Shift-JIS encoding),
4977 etc.
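
The two kinds of conversion mentioned above can be made concrete for JIS X 0208.  This sketch is not taken from @file{file-coding.c}; the function names are hypothetical, and the input is assumed to be the usual two-byte JIS form with both bytes in the range 0x21 to 0x7E.

```c
/* EUC-JP: the simple case, where each byte just gets its high bit set.  */
void
toy_jis_to_euc (unsigned char j1, unsigned char j2, unsigned char out[2])
{
  out[0] = (unsigned char) (j1 | 0x80);
  out[1] = (unsigned char) (j2 | 0x80);
}

/* Shift-JIS: two JIS rows are folded into one lead byte, and the
   trail byte is shifted depending on whether the row byte is odd or
   even.  */
void
toy_jis_to_sjis (unsigned char j1, unsigned char j2, unsigned char out[2])
{
  out[0] = (unsigned char) (((j1 + 1) >> 1) + (j1 < 0x5F ? 0x70 : 0xB0));
  if (j1 & 1)
    /* Odd row byte: trail bytes start at 0x40 and skip over 0x7F.  */
    out[1] = (unsigned char) (j2 + (j2 < 0x60 ? 0x1F : 0x20));
  else
    /* Even row byte: trail bytes start at 0x9F.  */
    out[1] = (unsigned char) (j2 + 0x7E);
}
```

For example, the JIS bytes 0x30 0x21 become 0xB0 0xA1 in EUC-JP and 0x88 0x9F in Shift-JIS.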
4978
4979 @file{mule-ccl.c} provides the CCL (Code Conversion Language)
4980 interpreter.  CCL is similar in spirit to Lisp byte code and is used to
4981 implement converters for custom encodings.
4982
4983 @file{mule-canna.c} and @file{mule-wnnfns.c} implement interfaces to
4984 external programs used to implement the Canna and WNN input methods,
4985 respectively.  This is currently in beta.
4986
4987 @file{mule-mcpath.c} provides some functions to allow for pathnames
4988 containing extended characters.  This code is fragmentary, obsolete, and
4989 completely non-working.  Instead, @code{pathname-coding-system} is used
4990 to specify conversions of names of files and directories.  The standard
4991 C I/O functions like @code{open()} are wrapped so that conversion occurs
4992 automatically.
4993
4994 @file{mule.c} provides a few miscellaneous things that should probably
4995 be elsewhere.
4996
4997
4998
4999 @example
5000 intl.c
5001 @end example
5002
5003 This provides some miscellaneous internationalization code for
5004 implementing message translation and interfacing to the Ximp input
5005 method.  None of this code is currently working.
5006
5007
5008
5009 @example
5010 iso-wide.h
5011 @end example
5012
5013 This contains leftover code from an earlier implementation of
5014 Asian-language support, and is not currently used.
5015
5016
5017
5018
5019 @node Modules for Regression Testing
5020 @section Modules for Regression Testing
5021 @cindex modules for regression testing
5022 @cindex regression testing, modules for
5023
5024 @example
5025 test-harness.el
5026 base64-tests.el
5027 byte-compiler-tests.el
5028 case-tests.el
5029 ccl-tests.el
5030 c-tests.el
5031 database-tests.el
5032 extent-tests.el
5033 hash-table-tests.el
5034 lisp-tests.el
5035 md5-tests.el
5036 mule-tests.el
5037 regexp-tests.el
5038 symbol-tests.el
5039 syntax-tests.el
5040 @end example
5041
5042 @file{test-harness.el} defines the macros @code{Assert},
5043 @code{Check-Error}, @code{Check-Error-Message}, and
5044 @code{Check-Message}.  The other files are test files, testing various
5045 XEmacs modules.
5046
5047
5048
5049 @node Allocation of Objects in XEmacs Lisp, Dumping, A Summary of the Various XEmacs Modules, Top
5050 @chapter Allocation of Objects in XEmacs Lisp
5051 @cindex allocation of objects in XEmacs Lisp
5052 @cindex objects in XEmacs Lisp, allocation of
5053 @cindex Lisp objects, allocation of in XEmacs
5054
5055 @menu
5056 * Introduction to Allocation::
5057 * Garbage Collection::
5058 * GCPROing::
5059 * Garbage Collection - Step by Step::
5060 * Integers and Characters::
5061 * Allocation from Frob Blocks::
5062 * lrecords::
5063 * Low-level allocation::
5064 * Cons::
5065 * Vector::
5066 * Bit Vector::
5067 * Symbol::
5068 * Marker::
5069 * String::
5070 * Compiled Function::
5071 @end menu
5072
5073 @node Introduction to Allocation
5074 @section Introduction to Allocation
5075 @cindex allocation, introduction to
5076
5077   Emacs Lisp, like all Lisps, has garbage collection.  This means that
5078 the programmer never has to explicitly free (destroy) an object; it
5079 happens automatically when the object becomes inaccessible.  Most
5080 experts agree that garbage collection is a necessity in a modern,
5081 high-level language.  Its omission from C stems from the fact that C was
5082 originally designed to be a nice abstract layer on top of assembly
5083 language, for writing kernels and basic system utilities rather than
5084 large applications.
5085
5086   Lisp objects can be created by any of a number of Lisp primitives.
5087 Most object types have one or a small number of basic primitives
5088 for creating objects.  For conses, the basic primitive is @code{cons};
5089 for vectors, the primitives are @code{make-vector} and @code{vector}; for
5090 symbols, the primitives are @code{make-symbol} and @code{intern}; etc.
5091 Some Lisp objects, especially those that are primarily used internally,
5092 have no corresponding Lisp primitives.  Every Lisp object, though,
5093 has at least one C primitive for creating it.
5094
5095   Recall from section (VII) that a Lisp object, as stored in a 32-bit or
5096 64-bit word, has a few tag bits, and a ``value'' that occupies the
5097 remainder of the bits.  We can separate the different Lisp object types
5098 into three broad categories:
5099
5100 @itemize @bullet
5101 @item
5102 (a) Those for whom the value directly represents the contents of the
5103 Lisp object.  Only two types are in this category: integers and
5104 characters.  No special allocation or garbage collection is necessary
5105 for such objects.  Lisp objects of these types do not need to be
5106 @code{GCPRO}ed.
5107 @end itemize
5108
5109   In the remaining two categories, the type is stored in the object
5110 itself.  The tag for all such objects is the generic @dfn{lrecord}
5111 (@code{Lisp_Type_Record}) tag.  The first bytes of the object's structure are an
5112 integer (actually a char) characterising the object's type and some
5113 flags, in particular the mark bit used for garbage collection.  A
5114 structure describing the type is accessible through the
5115 @code{lrecord_implementation_table}, indexed with said integer.  This structure
5116 includes the method pointers and a pointer to a string naming the type.
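
The arrangement described in the preceding paragraph can be sketched as follows.  All names carry a @code{toy_} prefix to stress that they are hypothetical; the field widths, the single method pointer, and the two-entry table are assumptions made for illustration only.

```c
#include <stddef.h>

/* Header found at the start of every record: a small integer that
   identifies the type, plus flags such as the mark bit.  */
struct toy_lrecord_header
{
  unsigned int type : 8;   /* index into the implementation table */
  unsigned int mark : 1;   /* set while the object is marked by GC */
};

/* Per-type description: method pointers and a name.  */
struct toy_lrecord_implementation
{
  const char *name;                /* string naming the type */
  void (*marker) (void *object);   /* example method; may be NULL */
};

static const struct toy_lrecord_implementation toy_implementation_table[] = {
  { "cons", NULL },
  { "string", NULL }
};

/* An example record type; the header must be the first member.  */
struct toy_cons
{
  struct toy_lrecord_header header;
  void *car, *cdr;
};

/* Look up the name of OBJECT's type through its header, or return
   NULL if the type index is out of range.  */
const char *
toy_type_name (const void *object)
{
  const struct toy_lrecord_header *h
    = (const struct toy_lrecord_header *) object;
  size_t n = sizeof toy_implementation_table
    / sizeof toy_implementation_table[0];

  return h->type < n ? toy_implementation_table[h->type].name : NULL;
}
```

Because every record starts with the same header, generic code can find an object's methods without knowing its concrete type.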
5117
5118 @itemize @bullet
5119 @item
5120 (b) Those lrecords that are allocated in frob blocks (see above).  This
5121 includes the objects that are most common and relatively small, and
5122 includes conses, strings, subrs, floats, compiled functions, symbols,
5123 extents, events, and markers.  With the cleanup of frob blocks done in
5124 19.12, it's not terribly hard to add more objects to this category, but
5125 it's a bit trickier than adding an object type to type (c) (esp. if the
5126 object needs a finalization method), and is not likely to save much
5127 space unless the object is small and there are many of them. (In fact,
5128 if there are very few of them, it might actually waste space.)
5129 @item
5130 (c) Those lrecords that are individually @code{malloc()}ed.  These are
5131 called @dfn{lcrecords}.  All other types are in this category.  Adding a
5132 new type to this category is comparatively easy, and all types added
5133 since 19.8 (when the current allocation scheme was devised, by Richard
5134 Mlynarik), with the exception of the character type, have been in this
5135 category.
5136 @end itemize
5137
5138   Note that bit vectors are a bit of a special case.  They are
5139 simple lrecords as in category (b), but are individually @code{malloc()}ed
5140 like vectors.  You can basically view them as exactly like vectors
5141 except that their type is stored in lrecord fashion rather than
5142 in directly-tagged fashion.
5143
5144
5145 @node Garbage Collection
5146 @section Garbage Collection
5147 @cindex garbage collection
5148
5149 @cindex mark and sweep
5150   Garbage collection is simple in theory but tricky to implement.
5151 Emacs Lisp uses the oldest garbage collection method, called
5152 @dfn{mark and sweep}.  Garbage collection begins by starting with
5153 all accessible locations (i.e. all variables and other slots where
5154 Lisp objects might occur) and recursively traversing all objects
5155 accessible from those slots, marking each one that is found.
5156 We then go through all of memory and free each object that is
5157 not marked, and unmark each object that is marked.  Note
5158 that ``all of memory'' means all currently allocated objects.
5159 Traversing all these objects means traversing all frob blocks,
5160 all vectors (which are chained in one big list), and all
5161 lcrecords (which are likewise chained).
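
The two phases can be seen in a toy collector.  This is a sketch under heavy simplification, not the XEmacs collector: there is a single cons-like object type, every object is individually @code{malloc()}ed and chained on one list, and all names are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

struct toy_obj
{
  int marked;
  struct toy_obj *car, *cdr;    /* references to other objects, or NULL */
  struct toy_obj *next_alloc;   /* chain of all allocated objects */
};

static struct toy_obj *toy_all_objects;

struct toy_obj *
toy_alloc (struct toy_obj *car, struct toy_obj *cdr)
{
  struct toy_obj *o = malloc (sizeof *o);

  if (o == NULL)
    abort ();
  o->marked = 0;
  o->car = car;
  o->cdr = cdr;
  o->next_alloc = toy_all_objects;
  toy_all_objects = o;
  return o;
}

/* Mark phase: recurse on the car, iterate on the cdr.  Objects that
   are already marked are skipped, so cycles terminate.  */
static void
toy_mark (struct toy_obj *o)
{
  while (o != NULL && !o->marked)
    {
      o->marked = 1;
      toy_mark (o->car);
      o = o->cdr;
    }
}

/* Mark everything reachable from ROOTS, then sweep the allocation
   chain: free unmarked objects and unmark the survivors.  Returns the
   number of objects freed.  */
size_t
toy_gc (struct toy_obj **roots, size_t nroots)
{
  struct toy_obj **link = &toy_all_objects;
  size_t i, freed = 0;

  for (i = 0; i < nroots; i++)
    toy_mark (roots[i]);
  while (*link != NULL)
    {
      struct toy_obj *o = *link;

      if (o->marked)
        {
          o->marked = 0;
          link = &o->next_alloc;
        }
      else
        {
          *link = o->next_alloc;
          free (o);
          freed++;
        }
    }
  return freed;
}
```

Note that an unreachable cycle is reclaimed just like any other garbage, since the sweep looks only at mark bits.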
5162
5163   Garbage collection can be invoked explicitly by calling
5164 @code{garbage-collect} but is also called automatically by @code{eval},
5165 once a certain amount of memory has been allocated since the last
5166 garbage collection (according to @code{gc-cons-threshold}).
5167
5168
5169 @node GCPROing
5170 @section @code{GCPRO}ing
5171 @cindex @code{GCPRO}ing
5172 @cindex garbage collection protection
5173 @cindex protection, garbage collection
5174
5175 @code{GCPRO}ing is one of the ugliest and trickiest parts of Emacs
5176 internals.  The basic idea is that whenever garbage collection
5177 occurs, all in-use objects must be reachable somehow or
5178 other from one of the roots of accessibility.  The roots
5179 of accessibility are:
5180
5181 @enumerate
5182 @item
5183 All objects that have been @code{staticpro()}ed or
5184 @code{staticpro_nodump()}ed.  This is used for any global C variables
5185 that hold Lisp objects.  A call to @code{staticpro()} happens implicitly
5186 as a result of any symbols declared with @code{defsymbol()} and any
5187 variables declared with @code{DEFVAR_FOO()}.  You need to explicitly
5188 call @code{staticpro()} (in the @code{vars_of_foo()} method of a module)
5189 for other global C variables holding Lisp objects. (This typically
5190 includes internal lists and such things.)  Use
5191 @code{staticpro_nodump()} only in the rare cases when you do not want
5192 the pointed-to variable to be saved at dump time but rather recompute it at
5193 startup.
5194
5195 Note that @code{obarray} is one of the @code{staticpro()}d things.
5196 Therefore, all functions and variables get marked through this.
5197 @item
5198 Any shadowed bindings that are sitting on the @code{specpdl} stack.
5199 @item
5200 Any objects sitting in currently active (Lisp) stack frames,
5201 catches, and condition cases.
5202 @item
5203 A couple of special-case places where active objects are
5204 located.
5205 @item
5206 Anything currently marked with @code{GCPRO}.
5207 @end enumerate
5208
  Marking with @code{GCPRO} is necessary because some C functions (quite
a lot, in fact) allocate objects during their operation.  Quite
5211 frequently, there will be no other pointer to the object while the
5212 function is running, and if a garbage collection occurs and the object
5213 needs to be referenced again, bad things will happen.  The solution is
5214 to mark those objects with @code{GCPRO}.  Unfortunately this is easy to
5215 forget, and there is basically no way around this problem.  Here are
5216 some rules, though:
5217
5218 @enumerate
5219 @item
5220 For every @code{GCPRO@var{n}}, there have to be declarations of
5221 @code{struct gcpro gcpro1, gcpro2}, etc.
5222
5223 @item
5224 You @emph{must} @code{UNGCPRO} anything that's @code{GCPRO}ed, and you
5225 @emph{must not} @code{UNGCPRO} if you haven't @code{GCPRO}ed.  Getting
5226 either of these wrong will lead to crashes, often in completely random
5227 places unrelated to where the problem lies.
5228
5229 @item
5230 The way this actually works is that all currently active @code{GCPRO}s
5231 are chained through the @code{struct gcpro} local variables, with the
5232 variable @samp{gcprolist} pointing to the head of the list and the nth
5233 local @code{gcpro} variable pointing to the first @code{gcpro} variable
5234 in the next enclosing stack frame.  Each @code{GCPRO}ed thing is an
5235 lvalue, and the @code{struct gcpro} local variable contains a pointer to
5236 this lvalue.  This is why things will mess up badly if you don't pair up
5237 the @code{GCPRO}s and @code{UNGCPRO}s---you will end up with
5238 @code{gcprolist}s containing pointers to @code{struct gcpro}s or local
5239 @code{Lisp_Object} variables in no-longer-active stack frames.
5240
5241 @item
5242 It is actually possible for a single @code{struct gcpro} to
5243 protect a contiguous array of any number of values, rather than
5244 just a single lvalue.  To effect this, call @code{GCPRO@var{n}} as usual on
5245 the first object in the array and then set @code{gcpro@var{n}.nvars}.
5246
5247 @item
5248 @strong{Strings are relocated.}  What this means in practice is that the
5249 pointer obtained using @code{XSTRING_DATA()} is liable to change at any
5250 time, and you should never keep it around past any function call, or
5251 pass it as an argument to any function that might cause a garbage
5252 collection.  This is why a number of functions accept either a
5253 ``non-relocatable'' @code{char *} pointer or a relocatable Lisp string,
5254 and only access the Lisp string's data at the very last minute.  In some
5255 cases, you may end up having to @code{alloca()} some space and copy the
5256 string's data into it.
5257
5258 @item
By convention, if you have to nest @code{GCPRO}s, use @code{NGCPRO@var{n}}
5260 (along with @code{struct gcpro ngcpro1, ngcpro2}, etc.), @code{NNGCPRO@var{n}},
5261 etc.  This avoids compiler warnings about shadowed locals.
5262
5263 @item
5264 It is @emph{always} better to err on the side of extra @code{GCPRO}s
5265 rather than too few.  The extra cycles spent on this are
5266 almost never going to make a whit of difference in the
5267 speed of anything.
5268
5269 @item
5270 The general rule to follow is that caller, not callee, @code{GCPRO}s.
5271 That is, you should not have to explicitly @code{GCPRO} any Lisp objects
5272 that are passed in as parameters.
5273
5274 One exception from this rule is if you ever plan to change the parameter
5275 value, and store a new object in it.  In that case, you @emph{must}
5276 @code{GCPRO} the parameter, because otherwise the new object will not be
5277 protected.
5278
5279 So, if you create any Lisp objects (remember, this happens in all sorts
5280 of circumstances, e.g. with @code{Fcons()}, etc.), you are responsible
5281 for @code{GCPRO}ing them, unless you are @emph{absolutely sure} that
5282 there's no possibility that a garbage-collection can occur while you
5283 need to use the object.  Even then, consider @code{GCPRO}ing.
5284
5285 @item
5286 A garbage collection can occur whenever anything calls @code{Feval}, or
5287 whenever a QUIT can occur where execution can continue past
5288 this. (Remember, this is almost anywhere.)
5289
5290 @item
5291 If you have the @emph{least smidgeon of doubt} about whether
5292 you need to @code{GCPRO}, you should @code{GCPRO}.
5293
5294 @item
5295 Beware of @code{GCPRO}ing something that is uninitialized.  If you have
5296 any shade of doubt about this, initialize all your variables to @code{Qnil}.
5297
5298 @item
5299 Be careful of traps, like calling @code{Fcons()} in the argument to
5300 another function.  By the ``caller protects'' law, you should be
5301 @code{GCPRO}ing the newly-created cons, but you aren't.  A certain
5302 number of functions that are commonly called on freshly created stuff
(e.g. @code{nconc2()}, @code{Fsignal()}) break the ``caller protects''
law and go ahead and @code{GCPRO} their arguments so as to simplify
things, but make sure to check whether it's OK whenever doing something
like this.
5307
5308 @item
5309 Once again, remember to @code{GCPRO}!  Bugs resulting from insufficient
5310 @code{GCPRO}ing are intermittent and extremely difficult to track down,
5311 often showing up in crashes inside of @code{garbage-collect} or in
5312 weirdly corrupted objects or even in incorrect values in a totally
5313 different section of code.
5314 @end enumerate
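
The chaining described in the rules above can be pictured with a small
stand-alone model.  This is only a sketch under assumptions: the
@code{Lisp_Object} typedef is a plain @code{long}, the macros are
simplified versions written for this example, and the helper
@code{count_protected()} is hypothetical (it merely walks the chain the
way a collector would when looking for protected values).

@verbatim
```c
#include <assert.h>
#include <stddef.h>

typedef long Lisp_Object;          /* stand-in for the real object word */

struct gcpro {
  struct gcpro *next;              /* record of the enclosing protection */
  Lisp_Object *var;                /* address of the first protected lvalue */
  int nvars;                       /* number of contiguous values covered */
};

static struct gcpro *gcprolist;    /* head of the chain of active records */

/* Simplified macros: push a record on entry, pop it on exit. */
#define GCPRO1(v) \
  (gcpro1.next = gcprolist, gcpro1.var = &(v), gcpro1.nvars = 1, \
   gcprolist = &gcpro1)
#define UNGCPRO (gcprolist = gcpro1.next)

/* Hypothetical helper: count every value reachable through the chain. */
static int count_protected(void)
{
  int n = 0;
  for (struct gcpro *p = gcprolist; p != NULL; p = p->next)
    n += p->nvars;
  return n;
}

/* Protects a contiguous array of three values with one record. */
static int inner(void)
{
  Lisp_Object tmp[3] = { 1, 2, 3 };
  struct gcpro gcpro1;
  GCPRO1(tmp[0]);
  gcpro1.nvars = 3;                /* extend the record to the whole array */
  int seen = count_protected();    /* a collection here would see 4 values */
  UNGCPRO;
  return seen;
}

static int outer(void)
{
  Lisp_Object obj = 42;
  struct gcpro gcpro1;
  GCPRO1(obj);
  int seen = inner();              /* one value from here, three from inner */
  UNGCPRO;                         /* must pair with the GCPRO1 above */
  return seen;
}
```
@end verbatim

Forgetting the @code{UNGCPRO} in @code{inner()} would leave
@code{gcprolist} pointing into a stack frame that no longer exists,
which is exactly the failure mode the rules warn about.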
5315
If you don't understand whether to @code{GCPRO} in a particular
instance, ask on the mailing lists.  A general hint is that @code{prog1}
is the canonical example: it must keep the value of its first form
protected while the remaining forms are evaluated.
5319
5320 @cindex garbage collection, conservative
5321 @cindex conservative garbage collection
  Given the extremely error-prone nature of the @code{GCPRO} scheme, and
the difficulty of tracking down the resulting bugs, it should be
considered a deficiency
in the XEmacs code.  A solution to this problem would involve
5325 implementing so-called @dfn{conservative} garbage collection for the C
5326 stack.  That involves looking through all of stack memory and treating
5327 anything that looks like a reference to an object as a reference.  This
5328 will result in a few objects not getting collected when they should, but
5329 it obviates the need for @code{GCPRO}ing, and allows garbage collection
5330 to happen at any point at all, such as during object allocation.
5331
5332 @node Garbage Collection - Step by Step
5333 @section Garbage Collection - Step by Step
5334 @cindex garbage collection - step by step
5335
5336 @menu
5337 * Invocation::
5338 * garbage_collect_1::
5339 * mark_object::
5340 * gc_sweep::
5341 * sweep_lcrecords_1::
5342 * compact_string_chars::
5343 * sweep_strings::
5344 * sweep_bit_vectors_1::
5345 @end menu
5346
5347 @node Invocation
5348 @subsection Invocation
5349 @cindex garbage collection, invocation
5350
The first thing that anyone should know about garbage collection is
when and how the garbage collector is invoked. One might think that this
could happen every time new memory is allocated, e.g. when new objects are
created, but this is @emph{not} the case. Instead, we have the following
situation:
5356
5357 The entry point of any process of garbage collection is an invocation
5358 of the function @code{garbage_collect_1} in file @code{alloc.c}. The
5359 invocation can occur @emph{explicitly} by calling the function
5360 @code{Fgarbage_collect} (in addition this function provides information
5361 about the freed memory), or can occur @emph{implicitly} in four different
5362 situations:
5363 @enumerate
5364 @item
In function @code{main_1} in file @code{emacs.c}. This function is called
at each startup of xemacs. The garbage collection is invoked after all
initial creations are completed, but only if the internal
error-checking constant @code{ERROR_CHECK_GC} is defined.
5369 @item
5370 In function @code{disksave_object_finalization} in file
5371 @code{alloc.c}. The only purpose of this function is to clear the
5372 objects from memory which need not be stored with xemacs when we dump out
5373 an executable. This is only done by @code{Fdump_emacs} or by
5374 @code{Fdump_emacs_data} respectively (both in @code{emacs.c}). The
5375 actual clearing is accomplished by making these objects unreachable and
5376 starting a garbage collection. The function is only used while building
5377 xemacs.
5378 @item
In function @code{Feval / eval} in file @code{eval.c}. Each time the
well-known and often-used function eval is called to evaluate a form,
one of the first things that can happen is a call of
@code{garbage_collect_1}. There exist three global variables,
5383 @code{consing_since_gc} (counts the created cons-cells since the last
5384 garbage collection), @code{gc_cons_threshold} (a specified threshold
5385 after which a garbage collection occurs) and @code{always_gc}. If
5386 @code{always_gc} is set or if the threshold is exceeded, the garbage
5387 collection will start.
5388 @item
In function @code{Ffuncall / funcall} in file @code{eval.c}. This
function evaluates calls of elisp functions and works analogously to
@code{Feval}.
5392 @end enumerate
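
The check made on entry to @code{Feval} can be sketched as follows.
The three variable names come from the description above; everything
else (the stub collector, @code{note_allocation()} and
@code{eval_entry_check()}) is hypothetical and exists only to make the
sketch self-contained.

@verbatim
```c
#include <assert.h>

static long consing_since_gc;          /* allocation since the last GC */
static long gc_cons_threshold = 1000;  /* assumed value for this sketch */
static int always_gc;                  /* debugging switch: collect always */
static int gc_runs;                    /* counts calls of the stub below */

/* Stand-in for garbage_collect_1: only resets the counter. */
static void garbage_collect_1_stub(void)
{
  gc_runs++;
  consing_since_gc = 0;
}

/* Allocation only bumps the counter; it never collects by itself. */
static void note_allocation(long amount)
{
  consing_since_gc += amount;
}

/* What happens at the start of an evaluation. */
static void eval_entry_check(void)
{
  if (always_gc || consing_since_gc > gc_cons_threshold)
    garbage_collect_1_stub();
}
```
@end verbatim

The point of the sketch is that allocation by itself never triggers a
collection; only the next pass through the check does.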
5393
The upshot is that garbage collection can basically occur wherever
@code{Feval} or @code{Ffuncall} is used, either directly or
through another function. Since calls to these two functions are hidden
in various other functions, many calls to @code{garbage_collect_1} are
not obviously foreseeable, and therefore unexpected. Instances
worth remembering are various elisp special forms, such as
@code{or}, @code{and}, @code{if}, @code{cond}, @code{while},
@code{setq}, etc., miscellaneous @code{gui_item_...} functions,
everything related to @code{eval} (@code{Feval_buffer}, @code{call0},
...) and @code{Fsignal}. The latter is used to handle signals, such as
the ones raised by every @code{QUIT} macro triggered after
pressing Ctrl-g.
5406
5407 @node garbage_collect_1
5408 @subsection @code{garbage_collect_1}
5409 @cindex @code{garbage_collect_1}
5410
5411 We can now describe exactly what happens after the invocation takes
5412 place.
5413 @enumerate
5414 @item
5415 There are several cases in which the garbage collector is left immediately:
5416 when we are already garbage collecting (@code{gc_in_progress}), when
5417 the garbage collection is somehow forbidden
5418 (@code{gc_currently_forbidden}), when we are currently displaying something
5419 (@code{in_display}) or when we are preparing for the armageddon of the
5420 whole system (@code{preparing_for_armageddon}).
5421 @item
5422 Next the correct frame in which to put
5423 all the output occurring during garbage collecting is determined. In
5424 order to be able to restore the old display's state after displaying the
5425 message, some data about the current cursor position has to be
5426 saved. The variables @code{pre_gc_cursor} and @code{cursor_changed} take
5427 care of that.
5428 @item
5429 The state of @code{gc_currently_forbidden} must be restored after
5430 the garbage collection, no matter what happens during the process. We
5431 accomplish this by @code{record_unwind_protect}ing the suitable function
5432 @code{restore_gc_inhibit} together with the current value of
5433 @code{gc_currently_forbidden}.
5434 @item
If we are currently running an interactive xemacs session, the next step
is simply to show the garbage collector's cursor/message.
5437 @item
5438 The following steps are the intrinsic steps of the garbage collector,
5439 therefore @code{gc_in_progress} is set.
5440 @item
5441 For debugging purposes, it is possible to copy the current C stack
5442 frame. However, this seems to be a currently unused feature.
5443 @item
5444 Before actually starting to go over all live objects, references to
5445 objects that are no longer used are pruned. We only have to do this for events
5446 (@code{clear_event_resource}) and for specifiers
5447 (@code{cleanup_specifiers}).
5448 @item
Now the mark phase begins and marks all accessible elements. The function
@code{mark_object} is called for each root of accessibility individually
and marks all objects reachable from it. The roots are listed here in
the order in which they are processed:
5455 @itemize @bullet
5456 @item
5457 all constant symbols and static variables that are registered via
5458 @code{staticpro}@ in the dynarr @code{staticpros}.
5459 @xref{Adding Global Lisp Variables}.
5460 @item
5461 all Lisp objects that are created in C functions and that must be
5462 protected from freeing them. They are registered in the global
5463 list @code{gcprolist}.
5464 @xref{GCPROing}.
5465 @item
5466 all local variables (i.e. their name fields @code{symbol} and old
5467 values @code{old_values}) that are bound during the evaluation by the Lisp
5468 engine. They are stored in @code{specbinding} structs pushed on a stack
5469 called @code{specpdl}.
5470 @xref{Dynamic Binding; The specbinding Stack; Unwind-Protects}.
5471 @item
5472 all catch blocks that the Lisp engine encounters during the evaluation
5473 cause the creation of structs @code{catchtag} inserted in the list
@code{catchlist}. Their tag (@code{tag}) and value (@code{val}) fields
5475 are freshly created objects and therefore have to be marked.
5476 @xref{Catch and Throw}.
5477 @item
5478 every function application pushes new structs @code{backtrace}
5479 on the call stack of the Lisp engine (@code{backtrace_list}). The unique
5480 parts that have to be marked are the fields for each function
5481 (@code{function}) and all their arguments (@code{args}).
5482 @xref{Evaluation}.
5483 @item
5484 all objects that are used by the redisplay engine that must not be freed
5485 are marked by a special function called @code{mark_redisplay} (in
5486 @code{redisplay.c}).
5487 @item
all objects created for profiling purposes are allocated by C functions
instead of using the lisp allocation mechanisms. In order to retain the
right ones during the sweep phase, they also have to be marked
manually. That is done by the function @code{mark_profiling_info}.
5493 @item
Hash tables in XEmacs belong to a class of special objects that
make use of a concept often called ``weak pointers''.
To make a long story short, pointers of this kind are not followed
when the live objects are determined during garbage collection.
Any object referenced only by weak pointers is collected
anyway, and the reference to it is cleared. Hash tables use weak
pointers in different ways, which gives rise to different types of hash
tables, namely ``non-weak'', ``weak'', ``key-weak'' and ``value-weak''
(internally also ``key-car-weak'' and ``value-car-weak'') hash tables, each
clearing entries depending on different conditions. More information can
be found in the documentation of the function @code{make-hash-table}.
5505
Because there are complicated dependency rules about when and what to
mark while processing weak hash tables, the standard @code{marker}
method only marks the entries of non-weak hash tables. As soon as
a table has a weak component, its entries are ignored
while marking. Instead, they are marked separately by the
function @code{finish_marking_weak_hash_tables}. This function iterates
over each hash table entry in @code{hentries} for each weak hash table in
@code{Vall_weak_hash_tables}. Depending on the type of a table, the
appropriate action is performed.
If a table is acting as @code{HASH_TABLE_KEY_WEAK} and a key is already
marked, everything reachable from the @code{value} component is marked.
If it is acting as a @code{HASH_TABLE_VALUE_WEAK} and the value component
is already marked, everything reachable from the
@code{key} component is marked.
If it is a @code{HASH_TABLE_KEY_CAR_WEAK} and the car
of the key entry is already marked, we mark both the @code{key} and
@code{value} components.
Finally, if the table is of the type @code{HASH_TABLE_VALUE_CAR_WEAK}
and the car of the value component is already marked, again both the
@code{key} and the @code{value} components get marked.
5526
Lists with comparable properties also exist; they are called weak
lists. They come in different types called
@code{simple}, @code{assoc}, @code{key-assoc} and
@code{value-assoc}. You can find further details about them in the
description of the function @code{make-weak-list}. The scheme of their
marking is similar: all weak lists are listed in @code{Vall_weak_lists},
therefore we iterate over them. The marking is advanced until we hit an
already marked pair. Then we know that during a former run all
the rest has been marked completely. Again, depending on the
type of the weak list, our jobs differ. If it is a @code{WEAK_LIST_SIMPLE}
5537 and the elem is marked, we mark the @code{cons} part. If it is a
5538 @code{WEAK_LIST_ASSOC} and not a pair or a pair with both marked car and
5539 cdr, we mark the @code{cons} and the @code{elem}. If it is a
5540 @code{WEAK_LIST_KEY_ASSOC} and not a pair or a pair with a marked car of
5541 the elem, we mark the @code{cons} and the @code{elem}. Finally, if it is
5542 a @code{WEAK_LIST_VALUE_ASSOC} and not a pair or a pair with a marked
5543 cdr of the elem, we mark both the @code{cons} and the @code{elem}.
5544
Marking objects reachable from weak hash tables and weak lists can
cause other objects to become marked, which in turn can require further
marking of other weak objects. Both finishing functions are therefore
rerun as long as they mark previously unmarked objects.
5549
5550 @item
5551 After completing the special marking for the weak hash tables and for the weak
5552 lists, all entries that point to objects that are going to be swept in
5553 the further process are useless, and therefore have to be removed from
5554 the table or the list.
5555
5556 The function @code{prune_weak_hash_tables} does the job for weak hash
5557 tables. Totally unmarked hash tables are removed from the list
5558 @code{Vall_weak_hash_tables}. The other ones are treated more carefully
5559 by scanning over all entries and removing one as soon as one of
5560 the components @code{key} and @code{value} is unmarked.
5561
5562 The same idea applies to the weak lists. It is accomplished by
5563 @code{prune_weak_lists}: An unmarked list is pruned from
5564 @code{Vall_weak_lists} immediately. A marked list is treated more
5565 carefully by going over it and removing just the unmarked pairs.
5566
5567 @item
5568 The function @code{prune_specifiers} checks all listed specifiers held
5569 in @code{Vall_specifiers} and removes the ones from the lists that are
5570 unmarked.
5571
5572 @item
5573 All syntax tables are stored in a list called
5574 @code{Vall_syntax_tables}. The function @code{prune_syntax_tables} walks
5575 through it and unlinks the tables that are unmarked.
5576
5577 @item
Next comes the sweep phase proper, which is carried out by the function
@code{gc_sweep}.
@item
Afterwards, all the variables with respect to garbage collection are
reset. @code{consing_since_gc} (the counter of the created cells since
the last garbage collection) is set back to 0, and
@code{gc_in_progress} is cleared.
5585 @item
5586 In case the session is interactive, the displayed cursor and message are
5587 removed again.
5588 @item
The state of @code{gc_currently_forbidden} is restored to its former
value by unwinding the stack.
5591 @item
5592 A small memory reserve is always held back that can be reached by
5593 @code{breathing_space}. If nothing more is left, we create a new reserve
5594 and exit.
5595 @end enumerate
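
The repeat-until-stable marking of weak tables and the pruning that
follows can be illustrated for the key-weak case alone.  This is a toy
model, not the XEmacs code: the structs @code{obj} and @code{entry} and
all four function names are hypothetical, and each object has at most
one outgoing pointer to keep the sketch short.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

struct obj { int marked; struct obj *ref; };   /* ref: one outgoing pointer */
struct entry { struct obj *key, *value; };     /* one key-weak table entry */

/* Mark o and what it reaches; report whether anything was newly marked. */
static int mark_from(struct obj *o)
{
  int did_mark = 0;
  for (; o != NULL && !o->marked; o = o->ref) {
    o->marked = 1;
    did_mark = 1;
  }
  return did_mark;
}

/* One pass of the key-weak rule: a marked key keeps its value alive. */
static int finish_marking_key_weak(struct entry *e, size_t n)
{
  int did_mark = 0;
  for (size_t i = 0; i < n; i++)
    if (e[i].key->marked && mark_from(e[i].value))
      did_mark = 1;
  return did_mark;
}

/* Drop every entry with an unmarked component; return the new count. */
static size_t prune_entries(struct entry *e, size_t n)
{
  size_t kept = 0;
  for (size_t i = 0; i < n; i++)
    if (e[i].key->marked && e[i].value->marked)
      e[kept++] = e[i];
  return kept;
}

/* Rerun the marking pass until it marks nothing new, then prune. */
static size_t weak_phase(struct entry *e, size_t n)
{
  while (finish_marking_key_weak(e, n))
    ;
  return prune_entries(e, n);
}
```
@end verbatim

A single pass is not enough in general: an entry whose key becomes
marked late in one pass may sit in front of the entry that marked it,
so the pass has to be repeated.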
5596
5597 @node mark_object
5598 @subsection @code{mark_object}
5599 @cindex @code{mark_object}
5600
The first thing that is checked while marking an object is whether the
object is a real Lisp object (@code{Lisp_Type_Record}) or just an integer
or a character. Integers and characters are the only two types that are
stored directly, without another level of indirection, and therefore they
don't have to be marked and collected.
5606 @xref{How Lisp Objects Are Represented in C}.
5607
The second case is the one we have to handle: we are
dealing with a pointer to a Lisp object. Even then, there are three
situations in which nothing has to be done while marking. First, the
object is read-only, which prevents it from being garbage collected
and therefore from being marked (@code{C_READONLY_RECORD_HEADER}).
Second, the object is
already marked and need not be marked a second time (checked by
@code{MARKED_RECORD_HEADER_P}). Third, it is a special, unmarkable object
(@code{UNMARKABLE_RECORD_HEADER_P}; apparently, these are objects that
sit in some const space and can therefore not be marked, see
@code{this_one_is_unmarkable} in @code{alloc.c}).
5618
Now the actual marking can take place. We use the macro
@code{MARK_RECORD_HEADER} to mark the object itself (actually the
special flag in the lrecord header), and call its
``method'' @code{marker} if available. The marker method marks every
other object that is reachable from our current object. Note that these
marker methods should not call @code{mark_object} recursively, but
instead should return the next object from which further marking has to
be performed.
5627
5628 In case another object was returned, as mentioned before, we reiterate
5629 the whole @code{mark_object} process beginning with this next object.
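
The loop structure just described can be sketched in a few lines.
This is a simplified model under assumptions: the struct
@code{lrecord} here is a made-up stand-in with only the fields the
sketch needs, and @code{mark_cons_like()} is a hypothetical marker for
a two-slot object.  It recurses on the first slot only and hands the
second slot back to the loop, so that long chains are walked without
growing the C stack.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

struct lrecord;
typedef struct lrecord *(*marker_fn)(struct lrecord *);

struct lrecord {
  int marked;
  int readonly;
  marker_fn marker;            /* NULL for objects without children */
  struct lrecord *car, *cdr;   /* used by the cons-like marker only */
};

static void mark_object(struct lrecord *obj);

/* Marker for a two-slot object: returns the object to continue with. */
static struct lrecord *mark_cons_like(struct lrecord *obj)
{
  if (obj->cdr == NULL)
    return obj->car;           /* nothing else to do: continue with car */
  mark_object(obj->car);       /* bounded recursion on one slot */
  return obj->cdr;             /* the loop below continues with cdr */
}

static void mark_object(struct lrecord *obj)
{
  while (obj != NULL) {
    if (obj->readonly || obj->marked)
      return;                  /* nothing to do for this object */
    obj->marked = 1;           /* mark the object itself */
    if (obj->marker == NULL)
      return;                  /* no children to follow */
    obj = obj->marker(obj);    /* reiterate with the returned object */
  }
}
```
@end verbatim

Because already marked objects end the loop immediately, cyclic
structures terminate as well.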
5630
5631 @node gc_sweep
5632 @subsection @code{gc_sweep}
5633 @cindex @code{gc_sweep}
5634
5635 The job of this function is to free all unmarked records from memory. As
5636 we know, there are different types of objects implemented and managed, and
5637 consequently different ways to free them from memory.
5638 @xref{Introduction to Allocation}.
5639
We start with all objects stored through @code{lcrecords}. All
bulkier objects are allocated and handled using the
@code{lcrecords} scheme. Each object is @code{malloc}ed separately
instead of being placed in one of the contiguous frob blocks. The types
that are currently stored
using @code{alloc_lcrecord} and
@code{make_lcrecord_list} are: vectors, buffers,
char-table, char-table-entry, console, weak-list, database, device,
ldap, hash-table, command-builder, extent-auxiliary, extent-info, face,
coding-system, frame, image-instance, glyph, popup-data, gui-item,
keymap, charset, color_instance, font_instance, opaque, opaque-list,
process, range-table, specifier, symbol-value-buffer-local,
symbol-value-lisp-magic, symbol-value-varalias, toolbar-button,
tooltalk-message, tooltalk-pattern, window, and window-configuration. We
take care of them first
in order to be able to handle and to finalize items stored in them more
easily. The function @code{sweep_lcrecords_1}, described below,
does the whole job for us.
For a description of the internals, see @ref{lrecords}.
5659
Our next candidates are the other objects that behave quite differently
than everything else: the strings. They consist of two parts: a
fixed-size portion (@code{struct Lisp_String}) holding the string's
length, its property list and a pointer to the second part, and the
actual string data, which is stored in string-chars blocks comparable to
frob blocks. In these blocks, the data is not only freed, but the
holes are also compacted, i.e. all strings are relocated together.
@xref{String}. This compacting phase is performed by the function
@code{compact_string_chars}; the actual sweeping is done by the function
@code{sweep_strings}. Both are described below.
5670
After that, the other types are swept step by step using the functions
@code{sweep_conses}, @code{sweep_bit_vectors_1},
@code{sweep_compiled_functions}, @code{sweep_floats},
@code{sweep_symbols}, @code{sweep_extents}, @code{sweep_markers} and
@code{sweep_events}.  They handle the fixed-size types cons, float,
compiled-function, symbol, marker, extent, and event, which are stored in
so-called ``frob blocks''; therefore we can basically do the same for
every type, using the same macros, which are defined specifically to
handle everything with respect to fixed-size blocks. The only fixed-size
type that is not handled here is the fixed-size portion of strings,
because we took special care of them earlier.

The only big exception is bit vectors, which are stored differently and
therefore treated differently by the function @code{sweep_bit_vectors_1}
described later.
5686
First, we need some brief information about how
these fixed-size types are managed in general, in order to understand
how the sweeping is done. They all have a fixed size, and are therefore
stored in big blocks of memory, each allocated at once, that can hold a
certain number of objects of one type. The macro
@code{DECLARE_FIXED_TYPE_ALLOC} creates the suitable structures for
every type. More precisely, we have the block struct
(holding a pointer to the previous block @code{prev} and the
objects in @code{block[]}), a pointer to the current block
(@code{current_..._block}) and its last index
(@code{current_..._block_index}), and a pointer to the free list that
will be created. There is also a macro @code{FIXED_TYPE_FROM_BLOCK} plus
some related macros that are used to obtain a new object, either from
the free list (@code{ALLOCATE_FIXED_TYPE_1}) if there is an unused object
of that type stored or by allocating a completely new block using
@code{ALLOCATE_FIXED_TYPE_FROM_BLOCK}.
5703
5704 The rest works as follows: all of them define a
5705 macro @code{UNMARK_...} that is used to unmark the object. They define a
5706 macro @code{ADDITIONAL_FREE_...} that defines additional work that has
5707 to be done when converting an object from in use to not in use (so far,
5708 only markers use it in order to unchain them). Then, they all call
5709 the macro @code{SWEEP_FIXED_TYPE_BLOCK} instantiated with their type name
5710 and their struct name.
5711
This call in particular does the following: we go over all blocks,
starting with the current one and moving towards the oldest.
For each block, we look at every object in it. If the object is already
freed (checked with @code{FREE_STRUCT_P} using the first pointer of the
object), or if it is
set to read only (@code{C_READONLY_RECORD_HEADER_P}), nothing must be
done. If it is unmarked (checked with @code{MARKED_RECORD_HEADER_P}), it
is put in the free list and set free (using the macro
@code{FREE_FIXED_TYPE}); otherwise it stays in the block, but is unmarked
(by @code{UNMARK_...}). While going through one block, we note if the
whole block is empty. If so, the whole block is freed (using
@code{xfree}) and the free list state is set to the state it had before
handling this block.
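
The block sweep can be modelled with ordinary functions instead of
macros.  The sketch below is an illustration under assumptions, not the
real macro expansion: the type @code{cell}, its fields and all function
names are hypothetical, and the free list is rebuilt from scratch on
every sweep, which is a simplification.

@verbatim
```c
#include <assert.h>
#include <stdlib.h>

#define PER_BLOCK 4

struct cell {
  int in_use;                  /* 0 plays the role of FREE_STRUCT_P */
  int marked;
  struct cell *chain;          /* link used while on the free list */
};

struct cell_block {
  struct cell_block *prev;     /* next older block */
  struct cell block[PER_BLOCK];
};

static struct cell_block *current_cell_block;
static struct cell *cell_free_list;

/* Allocate a zeroed block and make it the current one. */
static struct cell_block *new_block(void)
{
  struct cell_block *b = calloc(1, sizeof *b);
  if (b == NULL)
    abort();
  b->prev = current_cell_block;
  current_cell_block = b;
  return b;
}

static int free_list_length(void)
{
  int n = 0;
  for (struct cell *c = cell_free_list; c != NULL; c = c->chain)
    n++;
  return n;
}

/* Sweep from the current block towards the oldest; return blocks freed. */
static int sweep_cells(void)
{
  int blocks_freed = 0;
  struct cell_block **link = &current_cell_block;
  cell_free_list = NULL;
  while (*link != NULL) {
    struct cell_block *b = *link;
    struct cell *saved_free_list = cell_free_list;
    int live = 0;
    for (int i = 0; i < PER_BLOCK; i++) {
      struct cell *c = &b->block[i];
      if (c->in_use && c->marked) {
        c->marked = 0;                 /* survivor: unmark for next cycle */
        live++;
      } else {
        c->in_use = 0;                 /* garbage or already free */
        c->chain = cell_free_list;
        cell_free_list = c;
      }
    }
    if (live == 0) {
      *link = b->prev;                 /* unlink the empty block */
      cell_free_list = saved_free_list; /* forget cells of this block */
      free(b);
      blocks_freed++;
    } else {
      link = &b->prev;
    }
  }
  return blocks_freed;
}
```
@end verbatim

Restoring the saved free list before freeing an empty block matters:
otherwise the free list would keep pointers into memory that has just
been returned to the system.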
5725
5726 @node sweep_lcrecords_1
5727 @subsection @code{sweep_lcrecords_1}
5728 @cindex @code{sweep_lcrecords_1}
5729
5730 After nullifying the complete lcrecord statistics, we go over all
5731 lcrecords two separate times. They are all chained together in a list with
5732 a head called @code{all_lcrecords}.
5733
The first loop calls for each object its @code{finalizer} method, but only
in the case that it is not read only
(@code{C_READONLY_RECORD_HEADER_P}), it is not marked
(@code{MARKED_RECORD_HEADER_P}), it is not already in a free list (list of
freed objects, field @code{free}) and finally it owns a finalizer
method.

The second loop actually frees the appropriate objects by iterating
through the whole list again. In case an object is read only or marked, it
has to persist, otherwise it is manually freed by calling
@code{xfree}. During this loop, the lcrecord statistics are kept up to
date by calling @code{tick_lcrecord_stats} with the right arguments.
5747 @node compact_string_chars
5748 @subsection @code{compact_string_chars}
5749 @cindex @code{compact_string_chars}
5750
The purpose of this function is to compact all the data parts of the
strings that are held in so-called @code{string_chars_block}s, i.e. the
strings that do not exceed a certain maximal length.

The procedure is as follows. We keep two
positions in the @code{string_chars_block}s using two pointer/integer
pairs, namely @code{from_sb}/@code{from_pos} and
@code{to_sb}/@code{to_pos}. They stand for the positions from
which and to which the string currently being handled is copied.

While going over all chained @code{string_chars_block}s and the
strings they hold, starting at @code{first_string_chars_block}, both
positions are advanced and, where necessary, a string is copied from
@code{from_sb} to
@code{to_sb}, depending on the status of the string pointed at.
5765
5766 More precisely, we can distinguish between the following actions.
5767 @itemize @bullet
5768 @item
The string at @code{from_sb}'s position could be marked as free, which
is indicated by an invalid value in the pointer that should point back
to the fixed-size string object, and which is checked by
@code{FREE_STRUCT_P}. In this case, the @code{from_sb}/@code{from_pos}
is advanced to the next string, and nothing has to be copied.
5774 @item
5775 Also, if a string object itself is unmarked, nothing has to be
5776 copied. We likewise advance the @code{from_sb}/@code{from_pos}
5777 pair as described above.
5778 @item
In all other cases, we have a marked string at hand. The string data
must be moved from the from-position to the to-position. In case
there is not enough space in the current @code{to_sb} block, we advance
this pointer to the beginning of the next block before copying. In case the
from and to positions are different, we perform the
actual copying using the library function @code{memmove}.
5785 @end itemize
5786
After compacting, the pointer to the current
@code{string_chars_block}, sitting in @code{current_string_chars_block},
is reset to the last block to which we moved a string,
i.e. @code{to_sb}, and all remaining blocks (we know that they just
carry garbage) are explicitly @code{xfree}d.
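
The two-position scheme can be tried out on a small model.  The
following sketch is not the XEmacs implementation: the block size, the
chunk header @code{hdr}, the struct names and the functions
@code{store()} and @code{compact_chars()} are all hypothetical, and
headers are copied with @code{memcpy} so that the sketch does not
depend on alignment.

@verbatim
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK 64

struct lisp_string { int marked; char *data; };        /* fixed-size part */
struct hdr { struct lisp_string *string; size_t len; }; /* NULL: free chunk */
struct chars_block { struct chars_block *next; size_t pos; char data[BLOCK]; };

/* Append text to sb as one chunk; the caller ensures that it fits. */
static void store(struct chars_block *sb, struct lisp_string *s,
                  const char *text)
{
  struct hdr h = { s, strlen(text) + 1 };
  memcpy(sb->data + sb->pos, &h, sizeof h);
  memcpy(sb->data + sb->pos + sizeof h, text, h.len);
  s->data = sb->data + sb->pos + sizeof h;
  sb->pos += sizeof h + h.len;
}

/* Slide marked chunks towards the front; return the last block in use. */
static struct chars_block *compact_chars(struct chars_block *first)
{
  struct chars_block *to_sb = first;
  size_t to_pos = 0;
  for (struct chars_block *from_sb = first; from_sb != NULL;
       from_sb = from_sb->next) {
    size_t from_pos = 0, end = from_sb->pos;
    while (from_pos < end) {
      struct hdr h;
      memcpy(&h, from_sb->data + from_pos, sizeof h);
      size_t full = sizeof h + h.len;
      if (h.string != NULL && h.string->marked) {
        if (to_pos + full > BLOCK) {   /* no room left: next target block */
          to_sb->pos = to_pos;
          to_sb = to_sb->next;
          to_pos = 0;
        }
        if (to_sb != from_sb || to_pos != from_pos)
          memmove(to_sb->data + to_pos, from_sb->data + from_pos, full);
        h.string->data = to_sb->data + to_pos + sizeof h;  /* relocate */
        to_pos += full;
      }
      from_pos += full;                /* free or unmarked: just skip */
    }
  }
  to_sb->pos = to_pos;
  struct chars_block *dead = to_sb->next;   /* the rest carries garbage */
  to_sb->next = NULL;
  while (dead != NULL) {
    struct chars_block *n = dead->next;
    free(dead);
    dead = n;
  }
  return to_sb;
}
```
@end verbatim

Note how the fixed-size part is updated with the new data address after
each move.  This is the relocation that makes pointers obtained from
@code{XSTRING_DATA()} unsafe to keep across a garbage collection.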
5792
5793 @node sweep_strings
5794 @subsection @code{sweep_strings}
5795 @cindex @code{sweep_strings}
5796
The sweeping for the fixed-size string objects is essentially exactly
the same as it is for all other fixed-size types. As before, the freeing
into the suitable free list is done by using the macro
@code{SWEEP_FIXED_TYPE_BLOCK} after defining the right macros
@code{UNMARK_string} and @code{ADDITIONAL_FREE_string}. These two
definitions are a little bit special compared to the ones used
for the other fixed-size types.

@code{UNMARK_string} is defined the same way, except for some additional
code used for updating the bookkeeping information.

For strings, @code{ADDITIONAL_FREE_string} has to do something in
addition: in case the string was not allocated in a
@code{string_chars_block} because it exceeded the maximal length, and
was therefore @code{malloc}ed separately, we now also @code{xfree}
it explicitly.
5813
5814 @node sweep_bit_vectors_1
5815 @subsection @code{sweep_bit_vectors_1}
5816 @cindex @code{sweep_bit_vectors_1}
5817
5818 Bit vectors are also one of the rare types that are @code{malloc}ed
5819 individually. Consequently, while sweeping, all bit vectors that are no
5820 longer needed must be freed by hand. This is done in
5821 the expected way: since they are all registered in a list called
5822 @code{all_bit_vectors}, all elements of that list are traversed,
5823 all unmarked bit vectors are unlinked and freed by calling @code{xfree},
5824 and all remaining ones are unmarked.
5825 In addition, the bookkeeping information used for the garbage
5826 collector's output purposes is updated.
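
A sweep over a list of individually @code{malloc}ed objects might look
roughly like the following sketch.  The struct and the function names
(@code{toy_bit_vector}, @code{toy_push}, @code{toy_sweep}) are
hypothetical and much simpler than the real ones; plain @code{free}
stands in for @code{xfree}.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a bit vector; not the real XEmacs struct. */
struct toy_bit_vector {
  struct toy_bit_vector *next;  /* link in the list of all bit vectors */
  int marked;                   /* set by the mark phase if reachable */
};

/* Allocate a node and push it onto the front of LIST. */
struct toy_bit_vector *
toy_push (struct toy_bit_vector *list, int marked)
{
  struct toy_bit_vector *v = malloc (sizeof *v);
  assert (v != NULL);
  v->next = list;
  v->marked = marked;
  return v;
}

/* Sweep LIST: free unmarked entries, clear the mark on survivors.
   Stores the number of survivors in *N_USED (the bookkeeping) and
   returns the new head of the list. */
struct toy_bit_vector *
toy_sweep (struct toy_bit_vector *list, size_t *n_used)
{
  struct toy_bit_vector **prev = &list;
  size_t used = 0;

  assert (n_used != NULL);
  while (*prev)
    {
      struct toy_bit_vector *v = *prev;
      if (v->marked)
        {
          v->marked = 0;      /* unmark for the next collection */
          used++;
          prev = &v->next;
        }
      else
        {
          *prev = v->next;    /* unlink ... */
          free (v);           /* ... and release the storage */
        }
    }
  *n_used = used;
  return list;
}
```

Walking the list through a pointer to the previous link avoids a special
case for removing the head.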
5827
5828 @node Integers and Characters
5829 @section Integers and Characters
5830 @cindex integers and characters
5831 @cindex characters, integers and
5832
5833   Integer and character Lisp objects are created from integers using the
5834 macros @code{XSETINT()} and @code{XSETCHAR()} or the equivalent
5835 functions @code{make_int()} and @code{make_char()}. (These are actually
5836 macros on most systems.)  These functions basically just do some moving
5837 of bits around, since the integral value of the object is stored
5838 directly in the @code{Lisp_Object}.
5839
5840   @code{XSETINT()} and the like will truncate values given to them that
5841 are too big; i.e. you won't get the value you expected but the tag bits
5842 will at least be correct.
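
To make the truncation concrete, here is a toy encoding that keeps a
two-bit tag in the low bits of a 32-bit word.  The layout, the tag value
and the names @code{toy_make_int}, @code{toy_intp} and @code{toy_xint}
are assumptions made for this sketch only; they are not the actual
@code{Lisp_Object} representation.

```c
#include <stdint.h>

#define TOY_TAG_BITS 2u
#define TOY_TAG_MASK ((1u << TOY_TAG_BITS) - 1u)
#define TOY_TAG_INT  1u

typedef uint32_t toy_object;

/* Pack VALUE above the tag bits.  The shift silently discards the top
   TOY_TAG_BITS bits of VALUE, which is the truncation in question. */
toy_object
toy_make_int (int32_t value)
{
  return ((uint32_t) value << TOY_TAG_BITS) | TOY_TAG_INT;
}

/* Nonzero if OBJ carries the integer tag. */
int
toy_intp (toy_object obj)
{
  return (obj & TOY_TAG_MASK) == TOY_TAG_INT;
}

/* Recover the (possibly truncated) integer, sign-extending from the
   30 value bits without relying on signed right shifts. */
int32_t
toy_xint (toy_object obj)
{
  uint32_t bits = obj >> TOY_TAG_BITS;
  uint32_t sign = 1u << (32u - TOY_TAG_BITS - 1u);
  return (int32_t) (bits ^ sign) - (int32_t) sign;
}
```

Values that fit in 30 bits survive the round trip; a larger value such
as @code{1 << 30} comes back changed, but the tag bits are still those
of an integer.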
5843
5844 @node Allocation from Frob Blocks
5845 @section Allocation from Frob Blocks
5846 @cindex allocation from frob blocks
5847 @cindex frob blocks, allocation from
5848
5849 The uninitialized memory required by a @code{Lisp_Object} of a particular type
5850 is allocated using
5851 @code{ALLOCATE_FIXED_TYPE()}.  This only occurs inside of the
5852 lowest-level object-creating functions in @file{alloc.c}:
5853 @code{Fcons()}, @code{make_float()}, @code{Fmake_byte_code()},
5854 @code{Fmake_symbol()}, @code{allocate_extent()},
5855 @code{allocate_event()}, @code{Fmake_marker()}, and
5856 @code{make_uninit_string()}.  The idea is that, for each type, there are
5857 a number of frob blocks (each 2K in size); each frob block is divided up
5858 into object-sized chunks.  Each frob block will have some of these
5859 chunks that are currently assigned to objects, and perhaps some that are
5860 free. (If a frob block has nothing but free chunks, it is freed at the
5861 end of the garbage collection cycle.)  The free chunks are stored in a
5862 free list, which is chained by storing a pointer in the first four bytes
5863 of the chunk. (Except for the free chunks at the end of the last frob
5864 block, which are handled using an index which points past the end of the
5865 last-allocated chunk in the last frob block.)
5866 @code{ALLOCATE_FIXED_TYPE()} first tries to retrieve a chunk from the
5867 free list; if that fails, it calls
5868 @code{ALLOCATE_FIXED_TYPE_FROM_BLOCK()}, which looks at the end of the
5869 last frob block for space, and creates a new frob block if there is
5870 none. (There are actually two versions of these macros, one of which is
5871 more defensive but less efficient and is used for error-checking.)
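
The scheme can be sketched in a few lines.  The following is an
illustration only, with hypothetical names (@code{toy_chunk},
@code{toy_block}, @code{toy_pool}, @code{toy_alloc}, @code{toy_free})
and an arbitrary 16-byte payload; the real macros are generated per type
and differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_BLOCK_BYTES 2048

/* Hypothetical fixed-size object.  A free chunk reuses its own storage
   to hold the free-list link. */
union toy_chunk {
  union toy_chunk *next_free;
  double payload[2];
};

#define TOY_CHUNKS_PER_BLOCK \
  ((TOY_BLOCK_BYTES - sizeof (void *)) / sizeof (union toy_chunk))

struct toy_block {
  struct toy_block *prev;
  union toy_chunk chunks[TOY_CHUNKS_PER_BLOCK];
};

struct toy_pool {
  struct toy_block *current;   /* most recently created block */
  size_t index;                /* first never-used chunk in CURRENT */
  union toy_chunk *free_list;  /* chunks that were explicitly freed */
};

/* Take a chunk from the free list if possible, else from the end of
   the last block, creating a new block when that one is full. */
void *
toy_alloc (struct toy_pool *pool)
{
  assert (pool != NULL);
  if (pool->free_list)
    {
      union toy_chunk *c = pool->free_list;
      pool->free_list = c->next_free;
      return c;
    }
  if (pool->current == NULL || pool->index == TOY_CHUNKS_PER_BLOCK)
    {
      struct toy_block *b = malloc (sizeof *b);
      if (b == NULL)
        return NULL;
      b->prev = pool->current;
      pool->current = b;
      pool->index = 0;
    }
  return &pool->current->chunks[pool->index++];
}

/* Put chunk P back on the free list. */
void
toy_free (struct toy_pool *pool, void *p)
{
  union toy_chunk *c = p;
  c->next_free = pool->free_list;
  pool->free_list = c;
}
```

Because a freed chunk is handed out again before the end-of-block index
moves, recently freed memory is reused first.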
5872
5873 @node lrecords
5874 @section lrecords
5875 @cindex lrecords
5876
5877   [see @file{lrecord.h}]
5878
5879   All lrecords have at the beginning of their structure a @code{struct
5880 lrecord_header}.  This just contains a type number and some flags,
5881 including the mark bit.  All builtin type numbers are defined as
5882 constants in @code{enum lrecord_type}, to allow the compiler to generate
5883 more efficient code for @code{@var{type}P}.  The type number, through the
5884 @code{lrecord_implementations_table}, gives access to a @code{struct
5885 lrecord_implementation}, which is a structure containing method pointers
5886 and such.  There is one of these for each type, and it is a global,
5887 constant, statically-declared structure that is declared in the
5888 @code{DEFINE_LRECORD_IMPLEMENTATION()} macro.
5889
5890   Simple lrecords (of type (b) above) just have a @code{struct
5891 lrecord_header} at their beginning.  lcrecords, however, actually have a
5892 @code{struct lcrecord_header}.  This, in turn, has a @code{struct
5893 lrecord_header} at its beginning, so sanity is preserved; but it also
5894 has a pointer used to chain all lcrecords together, and a special ID
5895 field used to distinguish one lcrecord from another. (This field is used
5896 only for debugging and could be removed, but the space gain is not
5897 significant.)
5898
5899   Simple lrecords are created using @code{ALLOCATE_FIXED_TYPE()}, just
5900 like for other frob blocks.  The only change is that the implementation
5901 pointer must be initialized correctly. (The implementation structure for
5902 an lrecord, or rather the pointer to it, is named @code{lrecord_float},
5903 @code{lrecord_extent}, @code{lrecord_buffer}, etc.)
5904
5905   lcrecords are created using @code{alloc_lcrecord()}.  This takes a
5906 size to allocate and an implementation pointer. (The size needs to be
5907 passed because some lcrecords, such as window configurations, are of
5908 variable size.) This basically just @code{malloc()}s the storage,
5909 initializes the @code{struct lcrecord_header}, and chains the lcrecord
5910 onto the head of the list of all lcrecords, which is stored in the
5911 variable @code{all_lcrecords}.  The calls to @code{alloc_lcrecord()}
5912 generally occur in the lowest-level allocation function for each lrecord
5913 type.
5914
5915 Whenever you create an lrecord, you need to call either
5916 @code{DEFINE_LRECORD_IMPLEMENTATION()} or
5917 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()}.  This needs to be
5918 specified in a @file{.c} file, at the top level.  What this actually
5919 does is define and initialize the implementation structure for the
5920 lrecord. (And possibly declares a function @code{error_check_foo()} that
5921 implements the @code{XFOO()} macro when error-checking is enabled.)  The
5922 arguments to the macros are the actual type name (this is used to
5923 construct the C variable name of the lrecord implementation structure
5924 and related structures using the @samp{##} macro concatenation
5925 operator), a string that names the type on the Lisp level (this may not
5926 be the same as the C type name; typically, the C type name has
5927 underscores, while the Lisp string has dashes), various method pointers,
5928 and the name of the C structure that contains the object.  The methods
5929 are used to encapsulate type-specific information about the object, such
5930 as how to print it or mark it for garbage collection, so that it's easy
5931 to add new object types without having to add a specific case for each
5932 new type in a bunch of different places.
5933
5934   The difference between @code{DEFINE_LRECORD_IMPLEMENTATION()} and
5935 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()} is that the former is
5936 used for fixed-size object types and the latter is for variable-size
5937 object types.  Most object types are fixed-size; some complex
5938 types, however (e.g. window configurations), are variable-size.
5939 Variable-size object types have an extra method, which is called
5940 to determine the actual size of a particular object of that type.
5941 (Currently this is only used for keeping allocation statistics.)
5942
5943   For the purpose of keeping allocation statistics, the allocation
5944 engine keeps a list of all the different types that exist.  Note that,
5945 since @code{DEFINE_LRECORD_IMPLEMENTATION()} is a macro that is
5946 specified at top-level, there is no way for it to initialize the global
5947 data structures containing type information, like
5948 @code{lrecord_implementations_table}.  For this reason a call to
5949 @code{INIT_LRECORD_IMPLEMENTATION} must be added to the same source file
5950 containing @code{DEFINE_LRECORD_IMPLEMENTATION}, not at the
5951 top level but inside one of the init functions, typically
5952 @code{syms_of_@var{foo}()}.  @code{INIT_LRECORD_IMPLEMENTATION} must be
5953 called before an object of this type is used.
5954
5955 The type number is also used to index into an array holding the number
5956 of objects of each type and the total memory allocated for objects of
5957 that type.  The statistics in this array are computed during the sweep
5958 stage.  These statistics are returned by the call to
5959 @code{garbage-collect}.
5960
5961   Note that for every type defined with a @code{DEFINE_LRECORD_*()}
5962 macro, there needs to be a @code{DECLARE_LRECORD_IMPLEMENTATION()}
5963 somewhere in a @file{.h} file, and this @file{.h} file needs to be
5964 included by @file{inline.c}.
5965
5966   Furthermore, there should generally be a set of @code{XFOOBAR()},
5967 @code{FOOBARP()}, etc. macros in a @file{.h} (or occasionally @file{.c})
5968 file.  To create one of these, copy an existing model and modify as
5969 necessary.
5970
5971   @strong{Please note:} If you define an lrecord in an external
5972 dynamically-loaded module, you must use @code{DECLARE_EXTERNAL_LRECORD},
5973 @code{DEFINE_EXTERNAL_LRECORD_IMPLEMENTATION}, and
5974 @code{DEFINE_EXTERNAL_LRECORD_SEQUENCE_IMPLEMENTATION} instead of the
5975 non-EXTERNAL forms. These macros will dynamically add new type numbers
5976 to the global enum that records them, whereas the non-EXTERNAL forms
5977 assume that the programmer has already inserted the correct type numbers
5978 into the enum's code at compile-time.
5979
5980   The various methods in the lrecord implementation structure are:
5981
5982 @enumerate
5983 @item
5984 @cindex mark method
5985 A @dfn{mark} method.  This is called during the marking stage and passed
5986 a function pointer (usually the @code{mark_object()} function), which is
5987 used to mark an object.  All Lisp objects that are contained within the
5988 object need to be marked by applying this function to them.  The mark
5989 method should also return a Lisp object, which should be either @code{nil} or
5990 an object to mark. (This can be used in lieu of calling
5991 @code{mark_object()} on the object, to reduce the recursion depth, and
5992 consequently should be the most heavily nested sub-object, such as a
5993 long list.)
5994
5995 @strong{Please note:} When the mark method is called, garbage collection
5996 is in progress, and special precautions need to be taken when accessing
5997 objects; see section (B) above.
5998
5999 If your mark method does not need to do anything, it can be
6000 @code{NULL}.
6001
6002 @item
6003 A @dfn{print} method.  This is called to create a printed representation
6004 of the object, whenever @code{princ}, @code{prin1}, or the like is
6005 called.  It is passed the object, a stream to which the output is to be
6006 directed, and an @code{escapeflag} which indicates whether the object's
6007 printed representation should be @dfn{escaped} so that it is
6008 readable. (This corresponds to the difference between @code{princ} and
6009 @code{prin1}.) Basically, @dfn{escaped} means that strings will have
6010 quotes around them and confusing characters in the strings such as
6011 quotes, backslashes, and newlines will be backslashed; and that special
6012 care will be taken to make symbols print in a readable fashion
6013 (e.g. symbols that look like numbers will be backslashed).  Other
6014 readable objects should perhaps pass @code{escapeflag} on when
6015 sub-objects are printed, so that readability is preserved when necessary
6016 (or if not, always pass in a 1 for @code{escapeflag}).  Non-readable
6017 objects should in general ignore @code{escapeflag}, except that some use
6018 it as an indication that more verbose output should be given.
6019
6020 Sub-objects are printed using @code{print_internal()}, which takes
6021 exactly the same arguments as are passed to the print method.
6022
6023 Literal C strings should be printed using @code{write_c_string()},
6024 or @code{write_string_1()} for non-null-terminated strings.
6025
6026 Functions that do not have a readable representation should check the
6027 @code{print_readably} flag and signal an error if it is set.
6028
6029 If you specify NULL for the print method, the
6030 @code{default_object_printer()} will be used.
6031
6032 @item
6033 A @dfn{finalize} method.  This is called at the beginning of the sweep
6034 stage on lcrecords that are about to be freed, and should be used to
6035 perform any extra object cleanup.  This typically involves freeing any
6036 extra @code{malloc()}ed memory associated with the object, releasing any
6037 operating-system and window-system resources associated with the object
6038 (e.g. pixmaps, fonts), etc.
6039
6040 The finalize method can be NULL if nothing needs to be done.
6041
6042 WARNING #1: The finalize method is also called at the end of the dump
6043 phase; this time with the @code{for_disksave} parameter set to non-zero.  The
6044 object is @emph{not} about to disappear, so you have to make sure to
6045 @emph{not} free any extra @code{malloc()}ed memory if you're going to
6046 need it later.  (Also, signal an error if there are any operating-system
6047 and window-system resources here, because they can't be dumped.)
6048
6049 Finalize methods should, as a rule, set to zero any pointers after
6050 they've been freed, and check to make sure pointers are not zero before
6051 freeing.  Although I'm pretty sure that finalize methods are not called
6052 twice on the same object (except for the @code{for_disksave} proviso),
6053 we've gotten nastily burned in some cases by not doing this.
6054
6055 WARNING #2: The finalize method is @emph{only} called for
6056 lcrecords, @emph{not} for simple lrecords.  If you need a
6057 finalize method for simple lrecords, you have to stick
6058 it in the @code{ADDITIONAL_FREE_foo()} macro in @file{alloc.c}.
6059
6060 WARNING #3: Things are in an @emph{extremely} bizarre state
6061 when @code{ADDITIONAL_FREE_foo()} is called, so you have to
6062 be incredibly careful when writing one of these functions.
6063 See the comment in @code{gc_sweep()}.  If you ever have to add
6064 one of these, consider using an lcrecord or dealing with
6065 the problem in a different fashion.
6066
6067 @item
6068 An @dfn{equal} method.  This compares the two objects for similarity,
6069 when @code{equal} is called.  It should compare the contents of the
6070 objects in some reasonable fashion.  It is passed the two objects and a
6071 @dfn{depth} value, which is used to catch circular objects.  To compare
6072 sub-Lisp-objects, call @code{internal_equal()} and bump the depth value
6073 by one.  If this value gets too high, a @code{circular-object} error
6074 will be signaled.
6075
6076 If this is NULL, objects are @code{equal} only when they are @code{eq},
6077 i.e. identical.
6078
6079 @item
6080 A @dfn{hash} method.  This is used to hash objects when they are to be
6081 compared with @code{equal}.  The rule here is that if two objects are
6082 @code{equal}, they @emph{must} hash to the same value; i.e. your hash
6083 function should use some subset of the sub-fields of the object that are
6084 compared in the ``equal'' method.  If you specify this method as
6085 @code{NULL}, the object's pointer will be used as the hash, which will
6086 @emph{fail} if the object has an @code{equal} method, so don't do this.
6087
6088 To hash a sub-Lisp-object, call @code{internal_hash()}.  Bump the
6089 depth by one, just like in the ``equal'' method.
6090
6091 To convert a Lisp object directly into a hash value (using
6092 its pointer), use @code{LISP_HASH()}.  This is what happens when
6093 the hash method is NULL.
6094
6095 To hash two or more values together into a single value, use
6096 @code{HASH2()}, @code{HASH3()}, @code{HASH4()}, etc.
6097
6098 @item
6099 @dfn{getprop}, @dfn{putprop}, @dfn{remprop}, and @dfn{plist} methods.
6100 These are used for object types that have properties.  I don't feel like
6101 documenting them here.  If you create one of these objects, you have to
6102 use different macros to define them,
6103 i.e. @code{DEFINE_LRECORD_IMPLEMENTATION_WITH_PROPS()} or
6104 @code{DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION_WITH_PROPS()}.
6105
6106 @item
6107 A @dfn{size_in_bytes} method, when the object is of variable size
6108 (i.e. declared with a @code{_SEQUENCE_IMPLEMENTATION} macro).  This should
6109 simply return the object's size in bytes, exactly as you might expect.
6110 For an example, see the methods for window configurations and opaques.
6111 @end enumerate
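
The relationship between the ``equal'' and ``hash'' methods, and the
fallbacks used when they are @code{NULL}, can be illustrated with a
stripped-down method table.  Everything here (@code{toy_methods},
@code{toy_point}, @code{toy_equal}, @code{toy_hash}) is hypothetical and
only mirrors the rules stated above; it is not the layout of
@code{struct lrecord_implementation}.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, much reduced method table. */
struct toy_methods {
  const char *name;
  int (*equal) (const void *a, const void *b, int depth);
  unsigned long (*hash) (const void *obj, int depth);
};

struct toy_point { int x, y; };

int
toy_point_equal (const void *a, const void *b, int depth)
{
  const struct toy_point *p = a, *q = b;
  (void) depth;  /* no sub-objects, so nothing to recurse into */
  return p->x == q->x && p->y == q->y;
}

/* Hashes exactly the fields that toy_point_equal compares, so equal
   points always hash alike. */
unsigned long
toy_point_hash (const void *obj, int depth)
{
  const struct toy_point *p = obj;
  (void) depth;
  return (unsigned long) (unsigned int) p->x * 31ul
    + (unsigned long) (unsigned int) p->y;
}

const struct toy_methods toy_point_methods =
  { "point", toy_point_equal, toy_point_hash };
const struct toy_methods toy_opaque_methods =
  { "opaque", NULL, NULL };

/* With no equal method, objects are equal only when identical. */
int
toy_equal (const struct toy_methods *m, const void *a, const void *b)
{
  if (a == b)
    return 1;
  return m->equal ? m->equal (a, b, 0) : 0;
}

/* With no hash method, the pointer itself serves as the hash. */
unsigned long
toy_hash (const struct toy_methods *m, const void *obj)
{
  return m->hash ? m->hash (obj, 0) : (unsigned long) (uintptr_t) obj;
}
```

A type that supplied @code{toy_point_equal} but left the hash method
@code{NULL} would give two equal points different hashes, which is the
failure the text warns about.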
6112
6113 @node Low-level allocation
6114 @section Low-level allocation
6115 @cindex low-level allocation
6116 @cindex allocation, low-level
6117
6118   Memory that you want to allocate directly should be allocated using
6119 @code{xmalloc()} rather than @code{malloc()}.  This implements
6120 error-checking on the return value, and once upon a time did some more
6121 vital stuff (i.e. @code{BLOCK_INPUT}, which is no longer necessary).
6122 Free using @code{xfree()}, and realloc using @code{xrealloc()}.  Note
6123 that @code{xmalloc()} will do a non-local exit if the memory can't be
6124 allocated. (Many functions, however, do not expect this, and thus XEmacs
6125 will likely crash if this happens.  @strong{This is a bug.}  If you can,
6126 you should strive to make your function handle this OK.  However, it's
6127 difficult in the general circumstance, perhaps requiring extra
6128 unwind-protects and such.)
6129
6130   Note that XEmacs provides two separate replacements for the standard
6131 @code{malloc()} library function.  These are called @dfn{old GNU malloc}
6132 (@file{malloc.c}) and @dfn{new GNU malloc} (@file{gmalloc.c}),
6133 respectively.  New GNU malloc is better in pretty much every way than
6134 old GNU malloc, and should be used if possible.  (It used to be that on
6135 some systems, the old one worked but the new one didn't.  I think this
6136 was due specifically to a bug in SunOS, which the new one now works
6137 around; so I don't think the old one ever has to be used any more.) The
6138 primary difference between both of these mallocs and the standard system
6139 malloc is that they are much faster, at the expense of increased space.
6140 The basic idea is that memory is allocated in fixed chunks of powers of
6141 two.  This allows for basically constant malloc time, since the various
6142 chunks can just be kept on a number of free lists. (The standard system
6143 malloc typically allocates arbitrary-sized chunks and has to spend some
6144 time, sometimes a significant amount of time, walking the heap looking
6145 for a free block to use and cleaning things up.)  The new GNU malloc
6146 improves on things by allocating large objects in chunks of 4096 bytes
6147 rather than in ever larger powers of two, which would result in ever larger
6148 wastage.  There is a slight speed loss here, but it's of doubtful
6149 significance.
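
The difference in wastage is easy to see with a little arithmetic.  The
helpers below are hypothetical and only compute the size that each
rounding policy would grant for a request; they are not taken from
@file{malloc.c} or @file{gmalloc.c}.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE 4096u

/* Smallest power of two that is >= N (saturating for huge N). */
size_t
toy_round_pow2 (size_t n)
{
  size_t c = 1;
  while (c < n && c <= SIZE_MAX / 2)
    c <<= 1;
  return c;
}

/* Smallest multiple of TOY_PAGE that is >= N (N assumed to be well
   below SIZE_MAX, so the addition cannot overflow). */
size_t
toy_round_page (size_t n)
{
  return (n + TOY_PAGE - 1) / TOY_PAGE * TOY_PAGE;
}

/* Bytes granted beyond what was asked for. */
size_t
toy_waste (size_t request, size_t granted)
{
  return granted - request;
}
```

For a request of 70000 bytes, power-of-two rounding grants 131072 bytes,
while rounding to 4096-byte chunks grants 73728.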
6150
6151   NOTE: Apparently there is a third-generation GNU malloc that is
6152 significantly better than the new GNU malloc, and should probably
6153 be included in XEmacs.
6154
6155   There is also the relocating allocator, @file{ralloc.c}.  This actually
6156 moves blocks of memory around so that the @code{sbrk()} pointer can be shrunk
6157 and virtual memory released back to the system.  On some systems,
6158 this is a big win.  On all systems, it causes a noticeable (and
6159 sometimes huge) speed penalty, so I turn it off by default.
6160 @file{ralloc.c} only works with the new GNU malloc in @file{gmalloc.c}.
6161 There are also two versions of @file{ralloc.c}, one that uses @code{mmap()}
6162 rather than block copies to move data around.  This purports to
6163 be faster, although that depends on the amount of data that would
6164 have had to be block copied and the system-call overhead for
6165 @code{mmap()}.  I don't know exactly how this works, except that the
6166 relocating-allocation routines are pretty much used only for
6167 the memory allocated for a buffer, which is the biggest consumer
6168 of space, esp. of space that may get freed later.
6169
6170   Note that the GNU mallocs have some ``memory warning'' facilities.
6171 XEmacs taps into them and issues a warning through the standard
6172 warning system, when memory gets to 75%, 85%, and 95% full.
6173 (On some systems, the memory warnings are not functional.)
6174
6175   Allocated memory that is going to be used to make a Lisp object
6176 is created using @code{allocate_lisp_storage()}.  This just calls
6177 @code{xmalloc()}.  It used to verify that the pointer to the memory can
6178 fit into a Lisp word, before the current Lisp object representation was
6179 introduced.  @code{allocate_lisp_storage()} is called by
6180 @code{alloc_lcrecord()}, @code{ALLOCATE_FIXED_TYPE()}, and the vector
6181 and bit-vector creation routines.  These routines also call
6182 @code{INCREMENT_CONS_COUNTER()} at the appropriate times; this keeps
6183 statistics on how much memory is allocated, so that garbage-collection
6184 can be invoked when the threshold is reached.
6185
6186 @node Cons
6187 @section Cons
6188 @cindex cons
6189
6190   Conses are allocated in standard frob blocks.  The only thing to
6191 note is that conses can be explicitly freed using @code{free_cons()}
6192 and associated functions @code{free_list()} and @code{free_alist()}.  This
6193 immediately puts the conses onto the cons free list, and decrements
6194 the statistics on memory allocation appropriately.  This is used
6195 to good effect by some extremely commonly-used code, to avoid
6196 generating extra objects and thereby triggering GC sooner.
6197 However, you have to be @emph{extremely} careful when doing this.
6198 If you mess this up, you will get BADLY BURNED, and it has happened
6199 before.
6200
6201 @node Vector
6202 @section Vector
6203 @cindex vector
6204
6205   As mentioned above, each vector is @code{malloc()}ed individually, and
6206 all are threaded through the variable @code{all_vectors}.  Vectors are
6207 marked strangely during garbage collection, by kludging the size field.
6208 Note that the @code{struct Lisp_Vector} is declared with its
6209 @code{contents} field being a @emph{stretchy} array of one element.  It
6210 is actually @code{malloc()}ed with the right size, however, and access
6211 to any element through the @code{contents} array works fine.
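
The idiom looks roughly like this.  Note that the sketch uses a C99
flexible array member where the text describes a one-element array, and
that @code{toy_vector} and @code{toy_make_vector} are made-up names with
@code{int} elements instead of Lisp objects.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical vector with a stretchy tail. */
struct toy_vector {
  size_t size;
  int contents[];  /* C99 flexible array member */
};

/* Allocate a vector of N elements, each set to INIT.  The header and
   the elements live in a single malloc()ed block.  Returns NULL if the
   size overflows or memory is exhausted. */
struct toy_vector *
toy_make_vector (size_t n, int init)
{
  struct toy_vector *v;
  size_t i;

  if (n > (SIZE_MAX - sizeof *v) / sizeof (int))
    return NULL;
  v = malloc (sizeof *v + n * sizeof v->contents[0]);
  if (v == NULL)
    return NULL;
  v->size = n;
  for (i = 0; i < n; i++)
    v->contents[i] = init;
  return v;
}
```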
6212
6213 @node Bit Vector
6214 @section Bit Vector
6215 @cindex bit vector
6216 @cindex vector, bit
6217
6218   Bit vectors work exactly like vectors, except for more complicated
6219 code to access an individual bit, and except for the fact that bit
6220 vectors are lrecords while vectors are not. (The only difference here is
6221 that there's an lrecord implementation pointer at the beginning and the
6222 tag field in bit vector Lisp words is ``lrecord'' rather than
6223 ``vector''.)
6224
6225 @node Symbol
6226 @section Symbol
6227 @cindex symbol
6228
6229   Symbols are also allocated in frob blocks.  Symbols in the awful
6230 horrible obarray structure are chained through their @code{next} field.
6231
6232 Remember that @code{intern} looks up a symbol in an obarray, creating
6233 one if necessary.
6234
6235 @node Marker
6236 @section Marker
6237 @cindex marker
6238
6239   Markers are allocated in frob blocks, as usual.  They are kept
6240 in a buffer unordered, but in a doubly-linked list so that they
6241 can easily be removed. (Formerly this was a singly-linked list,
6242 but in some cases garbage collection took an extraordinarily
6243 long time due to the O(N^2) time required to remove lots of
6244 markers from a buffer.) Markers are removed from a buffer in
6245 the finalize stage, in @code{ADDITIONAL_FREE_marker()}.
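
The reason the doubly-linked list helps is that a marker can be unlinked
without searching for its predecessor.  A minimal sketch, using
hypothetical @code{toy_marker} and @code{toy_buffer} structs rather than
the real ones:

```c
#include <stddef.h>

struct toy_marker {
  struct toy_marker *prev, *next;
  long pos;
};

struct toy_buffer {
  struct toy_marker *markers;  /* unordered list of markers */
};

/* Add M at the front of B's marker list. */
void
toy_attach (struct toy_buffer *b, struct toy_marker *m)
{
  m->prev = NULL;
  m->next = b->markers;
  if (m->next)
    m->next->prev = m;
  b->markers = m;
}

/* Remove M from B's marker list in constant time. */
void
toy_detach (struct toy_buffer *b, struct toy_marker *m)
{
  if (m->prev)
    m->prev->next = m->next;
  else
    b->markers = m->next;
  if (m->next)
    m->next->prev = m->prev;
  m->prev = m->next = NULL;
}
```

With a singly-linked list, each removal has to walk the list to find the
previous element, which is where the quadratic behavior came from.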
6246
6247 @node String
6248 @section String
6249 @cindex string
6250
6251   As mentioned above, strings are a special case.  A string is logically
6252 two parts, a fixed-size object (containing the length, property list,
6253 and a pointer to the actual data), and the actual data in the string.
6254 The fixed-size object is a @code{struct Lisp_String} and is allocated in
6255 frob blocks, as usual.  The actual data is stored in special
6256 @dfn{string-chars blocks}, which are 8K blocks of memory.
6257 Currently-allocated strings are simply laid end to end in these
6258 string-chars blocks, with a pointer back to the @code{struct Lisp_String}
6259 stored before each string in the string-chars block.  When a new string
6260 needs to be allocated, the remaining space at the end of the last
6261 string-chars block is used if there's enough, and a new string-chars
6262 block is created otherwise.
6263
6264   There are never any holes in the string-chars blocks due to the string
6265 compaction and relocation that happens at the end of garbage collection.
6266 During the sweep stage of garbage collection, when objects are
6267 reclaimed, the garbage collector goes through all string-chars blocks,
6268 looking for unused strings.  Each chunk of string data is preceded by a
6269 pointer to the corresponding @code{struct Lisp_String}, which indicates
6270 both whether the string is used and how big the string is, i.e. how to
6271 get to the next chunk of string data.  Holes are compressed by
6272 block-copying the next string into the empty space and relocating the
6273 pointer stored in the corresponding @code{struct Lisp_String}.
6274 @strong{This means you have to be careful with strings in your code.}
6275 See the section above on @code{GCPRO}ing.
6276
6277   Note that there is one situation not handled: a string that is too big
6278 to fit into a string-chars block.  Such strings, called @dfn{big
6279 strings}, are all @code{malloc()}ed as their own block. (#### Although it
6280 would make more sense for the threshold for big strings to be somewhat
6281 lower, e.g. 1/2 or 1/4 the size of a string-chars block.  It seems that
6282 this was indeed the case formerly---indeed, the threshold was set at
6283 1/8---but Mly forgot about this when rewriting things for 19.8.)
6284
6285 Note also that the string data in string-chars blocks is padded as
6286 necessary so that proper alignment constraints on the @code{struct
6287 Lisp_String} back pointers are maintained.
6288
6289   Finally, strings can be resized.  This happens in Mule when a
6290 character is substituted with a different-length character, or during
6291 modeline frobbing. (You could also export this to Lisp, but it's not
6292 done so currently.) Resizing a string is a potentially tricky process.
6293 If the change is small enough that the padding can absorb it, nothing
6294 other than a simple memory move needs to be done.  Keep in mind,
6295 however, that the string can't shrink too much because the offset to the
6296 next string in the string-chars block is computed by looking at the
6297 length and rounding to the nearest multiple of four or eight.  If the
6298 string would shrink or expand beyond the correct padding, new string
6299 data needs to be allocated at the end of the last string-chars block and
6300 the data moved appropriately.  This leaves some dead string data, which
6301 is marked by putting a special marker of 0xFFFFFFFF in the @code{struct
6302 Lisp_String} pointer before the data (there's no real @code{struct
6303 Lisp_String} to point to and relocate), and storing the size of the dead
6304 string data (which would normally be obtained from the now-non-existent
6305 @code{struct Lisp_String}) at the beginning of the dead string data gap.
6306 The string compactor recognizes this special 0xFFFFFFFF marker and
6307 handles it correctly.
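
Whether a resize can be absorbed by the padding comes down to comparing
padded sizes.  The sketch below assumes that each chunk consists of the
back pointer, the string data and one terminating byte, rounded up to
pointer alignment; that layout and the names @code{toy_fullsize} and
@code{toy_fits_in_place} are assumptions made for illustration.

```c
#include <stddef.h>

struct toy_string;  /* hypothetical; only pointers to it are used */

#define TOY_ALIGNMENT sizeof (struct toy_string *)

/* Space taken in a string-chars block by a string of LEN bytes:
   back pointer + data + terminator, rounded up to the alignment. */
size_t
toy_fullsize (size_t len)
{
  size_t raw = sizeof (struct toy_string *) + len + 1;
  return (raw + TOY_ALIGNMENT - 1) / TOY_ALIGNMENT * TOY_ALIGNMENT;
}

/* Nonzero if changing the length from OLD_LEN to NEW_LEN leaves the
   offset to the next string unchanged, so no reallocation is needed. */
int
toy_fits_in_place (size_t old_len, size_t new_len)
{
  return toy_fullsize (old_len) == toy_fullsize (new_len);
}
```

Both growing and shrinking are covered by the same test, since either
one may move the string across a padding boundary.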
6308
6309 @node Compiled Function
6310 @section Compiled Function
6311 @cindex compiled function
6312 @cindex function, compiled
6313
6314   Not yet documented.
6315
6316
6317 @node Dumping, Events and the Event Loop, Allocation of Objects in XEmacs Lisp, Top
6318 @chapter Dumping
6319 @cindex dumping
6320
6321 @section What is dumping and its justification
6322 @cindex dumping and its justification, what is
6323
6324 The C code of XEmacs is just a Lisp engine with a lot of built-in
6325 primitives useful for writing an editor.  The editor itself is written
6326 mostly in Lisp, and represents around 100K lines of code.  Loading and
6327 executing the initialization of all this code takes a bit of time (five
6328 to ten times the usual startup time of current xemacs) and requires
6329 having all the lisp source files around.  Having to reload them each
6330 time the editor is started would not be acceptable.
6331
6332 The traditional solution to this problem is called dumping: the build
6333 process first creates the lisp engine under the name @file{temacs}, then
6334 runs it until it has finished loading and initializing all the lisp
6335 code, and eventually creates a new executable called @file{xemacs}
6336 including both the object code in @file{temacs} and all the contents of
6337 the memory after the initialization.
6338
6339 This solution, while working, has a huge problem: the creation of the
6340 new executable from the actual contents of memory is an extremely
6341 system-specific and quite error-prone process, which interferes with a
6342 lot of system libraries (like @code{malloc}).  It is even getting worse
6343 nowadays with libraries using constructors which are automatically
6344 called when the program is started (even before @code{main()}) and which tend to
6345 crash when they are called multiple times, once before dumping and once
6346 after (IRIX 6.x @file{libz.so} pulls in some C++ image libraries through
6347 dependencies which have this problem).  Writing the dumper is also one
6348 of the most difficult parts of porting XEmacs to a new operating system.
6349 Basically, `dumping' is an operation that is just not officially
6350 supported on many operating systems.
6351
6352 The aim of the portable dumper is to solve the same problem as the
6353 system-specific dumper, that is to be able to reload quickly, using only
6354 a small number of files, the fully initialized lisp part of the editor,
6355 without any system-specific hacks.
6356
6357 @menu
6358 * Overview::
6359 * Data descriptions::
6360 * Dumping phase::
6361 * Reloading phase::
6362 * Remaining issues::
6363 @end menu
6364
6365 @node Overview
6366 @section Overview
6367 @cindex dumping overview
6368
6369 The portable dumping system has to:
6370
6371 @enumerate
6372 @item
6373 At dump time, write all initialized, non-quickly-rebuildable data to a
6374 file [Note: currently named @file{xemacs.dmp}, but the name will
6375 change], along with all the information needed for reloading.
6376
6377 @item
6378 When starting xemacs, reload the dump file, relocate it to its new
6379 starting address if needed, and reinitialize all pointers to this
6380 data.  Also, rebuild all the quickly rebuildable data.
6381 @end enumerate
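
The relocation step amounts to shifting every stored pointer by the
difference between the old and the new base address.  The following
sketch shows the idea on an array of hypothetical @code{toy_cell}
structs; it assumes a flat address space in which pointers can be
adjusted through @code{uintptr_t}, and it is not the dumper's actual
code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct toy_cell {
  int value;
  struct toy_cell *next;
};

/* CELLS holds N cells that were copied byte for byte from OLD_BASE.
   Rewrite each non-NULL next pointer so that it points into CELLS
   instead of into the old location. */
void
toy_relocate (struct toy_cell *cells, size_t n,
              const struct toy_cell *old_base)
{
  uintptr_t delta = (uintptr_t) cells - (uintptr_t) old_base;
  size_t i;

  for (i = 0; i < n; i++)
    if (cells[i].next != NULL)
      cells[i].next =
        (struct toy_cell *) ((uintptr_t) cells[i].next + delta);
}
```

If the data happens to be reloaded at its old address, the delta is zero
and the pointers are left as they are.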
6382
6383 @node Data descriptions
6384 @section Data descriptions
6385 @cindex dumping data descriptions
6386
6387 The most complex task of the dumper is to be able to write lisp objects
6388 (lrecords) and C structs to disk and reload them at a different address,
6389 updating all the pointers they include in the process.  This is done by
6390 using external data descriptions that give information about the layout
6391 of the structures in memory.
6392
6393 The specification of these descriptions is in @file{lrecord.h}.  A description
6394 of an lrecord is an array of @code{struct lrecord_description}.  Each of these
6395 structs includes a type, an offset in the structure and some optional
6396 parameters depending on the type.  For instance, here is the string
6397 description:
6398
6399 @example
6400 static const struct lrecord_description string_description[] = @{
6401   @{ XD_BYTECOUNT,         offsetof (Lisp_String, size) @},
6402   @{ XD_OPAQUE_DATA_PTR,   offsetof (Lisp_String, data), XD_INDIRECT(0, 1) @},
6403   @{ XD_LISP_OBJECT,       offsetof (Lisp_String, plist) @},
6404   @{ XD_END @}
6405 @};
6406 @end example
6407
6408 The first line indicates a member of type Bytecount, which is used by
6409 the next, indirect directive.  The second means "there is a pointer to
6410 some opaque data in the field @code{data}".  The length of said data is
6411 given by the expression @code{XD_INDIRECT(0, 1)}, which means "the value
6412 in the 0th line of the description (welcome to C) plus one".  The third
6413 line means "there is a Lisp_Object member @code{plist} in the Lisp_String
6414 structure".  @code{XD_END} then ends the description.
6415
This gives us all the information we need to move around what is pointed
to by a structure (C or lrecord) and, by transitivity, everything that
it points to.  The only missing information for dumping is the size of
the structure.  For lrecords, this is part of the
@code{lrecord_implementation}, so we don't need to duplicate it.  For C
structures we use a @code{struct struct_description}, which includes a
size field and a pointer to an associated array of
@code{lrecord_description}.
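
As a rough illustration of how such a description can be interpreted,
the following self-contained C sketch evaluates the indirect length of
the opaque data for one object.  All names in it (@code{toy_desc},
@code{TOY_*}, @code{toy_string}, @code{toy_opaque_length}) are
hypothetical simplifications and not the definitions found in
@file{lrecord.h}; in particular, encoding the indirect expression as a
line/delta pair is an assumption made for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the real description types. */
enum toy_type { TOY_BYTECOUNT, TOY_OPAQUE_DATA_PTR, TOY_END };

struct toy_desc {
  enum toy_type type;
  size_t offset;        /* offset of the member in the structure */
  int indirect_line;    /* TOY_OPAQUE_DATA_PTR: line holding the count */
  long indirect_delta;  /* TOY_OPAQUE_DATA_PTR: constant added to it */
};

struct toy_string {
  long size;            /* byte count, excluding the terminating NUL */
  char *data;
};

static const struct toy_desc toy_string_description[] = {
  { TOY_BYTECOUNT,       offsetof (struct toy_string, size), 0, 0 },
  { TOY_OPAQUE_DATA_PTR, offsetof (struct toy_string, data), 0, 1 },
  { TOY_END, 0, 0, 0 }
};

/* Return the length of the opaque data described by line LINE of DESC
   for the object at OBJ: the value of the referenced count member plus
   the constant delta. */
static long
toy_opaque_length (const void *obj, const struct toy_desc *desc, int line)
{
  const struct toy_desc *ref;
  long count;

  assert (desc[line].type == TOY_OPAQUE_DATA_PTR);
  ref = &desc[desc[line].indirect_line];
  assert (ref->type == TOY_BYTECOUNT);
  /* Read the count member the indirection refers to. */
  memcpy (&count, (const char *) obj + ref->offset, sizeof count);
  return count + desc[line].indirect_delta;
}
```

For a five-byte string, line 1 of this toy description yields a length
of six, which accounts for the terminating NUL.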
6423
6424 @node Dumping phase
6425 @section Dumping phase
6426 @cindex dumping phase
6427
6428 Dumping is done by calling the function pdump() (in dumper.c) which is
6429 invoked from Fdump_emacs (in emacs.c).  This function performs a number
6430 of tasks.
6431
6432 @menu
6433 * Object inventory::
6434 * Address allocation::
6435 * The header::
6436 * Data dumping::
6437 * Pointers dumping::
6438 @end menu
6439
6440 @node Object inventory
6441 @subsection Object inventory
6442 @cindex dumping object inventory
6443
6444 The first task is to build the list of the objects to dump.  This
6445 includes:
6446
6447 @itemize @bullet
6448 @item lisp objects
6449 @item C structures
6450 @end itemize
6451
6452 We end up with one @code{pdump_entry_list_elmt} per object group (arrays
6453 of C structs are kept together) which includes a pointer to the first
6454 object of the group, the per-object size and the count of objects in the
6455 group, along with some other information which is initialized later.
6456
These entries are linked together in @code{pdump_entry_list} structures
and can be enumerated through one of the following:
6459
6460 @enumerate
6461 @item
6462 the @code{pdump_object_table}, an array of @code{pdump_entry_list}, one
6463 per lrecord type, indexed by type number.
6464
6465 @item
6466 the @code{pdump_opaque_data_list}, used for the opaque data which does
6467 not include pointers, and hence does not need descriptions.
6468
6469 @item
6470 the @code{pdump_struct_table}, which is a vector of
6471 @code{struct_description}/@code{pdump_entry_list} pairs, used for
6472 non-opaque C structures.
6473 @end enumerate
6474
This uses a marking strategy similar to that of the garbage collector,
with some differences:
6477
6478 @enumerate
6479 @item
6480 We do not use the mark bit (which does not exist for C structures
6481 anyway); we use a big hash table instead.
6482
6483 @item
6484 We do not use the mark function of lrecords but instead rely on the
6485 external descriptions.  This happens essentially because we need to
6486 follow pointers to C structures and opaque data in addition to
6487 Lisp_Object members.
6488 @end enumerate
6489
This is done by @code{pdump_register_object()}, which handles Lisp_Object
variables, and @code{pdump_register_struct()}, which handles C
structures; both delegate the description management to
@code{pdump_register_sub()}.
6493
The hash table doubles as a map from objects to
@code{pdump_entry_list_elmt}s (i.e. it allows us to look up the
@code{pdump_entry_list_elmt} for a given object pointer).  Entries are
added with @code{pdump_add_entry()} and looked up with
@code{pdump_get_entry()}.  There is no need for entry removal.  The hash
value is computed quite simply from the object pointer by
@code{pdump_make_hash()}.
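
A pointer-keyed table without removal can be kept very simple.  The
sketch below is only an illustration of that idea, not the code in
@file{dumper.c}: the names @code{toy_make_hash}, @code{toy_add_entry}
and @code{toy_get_entry}, the fixed table size, and the particular hash
function are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_HASH_SIZE 1024   /* must exceed the number of entries */

struct toy_entry {
  const void *obj;   /* key: address of the registered object */
  size_t size;       /* payload: per-object size */
};

static struct toy_entry toy_table[TOY_HASH_SIZE];

/* Derive a bucket index from the object address (assumed hash). */
static size_t
toy_make_hash (const void *obj)
{
  return (size_t) (((uintptr_t) obj >> 3) % TOY_HASH_SIZE);
}

/* Look up OBJ; return its entry, or NULL if it was never registered. */
static struct toy_entry *
toy_get_entry (const void *obj)
{
  size_t pos = toy_make_hash (obj);

  while (toy_table[pos].obj != NULL)
    {
      if (toy_table[pos].obj == obj)
        return &toy_table[pos];
      pos = (pos + 1) % TOY_HASH_SIZE;   /* linear probing */
    }
  return NULL;
}

/* Register OBJ.  Entries are never removed, so probing needs no
   tombstones: an empty slot always ends a probe sequence. */
static void
toy_add_entry (const void *obj, size_t size)
{
  size_t pos = toy_make_hash (obj);

  assert (obj != NULL);
  while (toy_table[pos].obj != NULL)
    pos = (pos + 1) % TOY_HASH_SIZE;
  toy_table[pos].obj = obj;
  toy_table[pos].size = size;
}
```

Looking an object up before registering it returns NULL, which is how a
marking pass can tell whether the object has already been visited.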
6500
6501 The roots for the marking are:
6502
6503 @enumerate
6504 @item
6505 the @code{staticpro}'ed variables (there is a special @code{staticpro_nodump()}
6506 call for protected variables we do not want to dump).
6507
6508 @item
6509 the variables registered via @code{dump_add_root_object}
6510 (@code{staticpro()} is equivalent to @code{staticpro_nodump()} +
6511 @code{dump_add_root_object()}).
6512
6513 @item
6514 the variables registered via @code{dump_add_root_struct_ptr}, each of
6515 which points to a C structure.
6516 @end enumerate
6517
This does not include the GCPRO'ed variables, the specbinds, the
catchtags, the backlist, the redisplay or the profiling info, since we
do not want to rebuild the actual chain of lisp calls that leads up to
the dump-emacs call, only the global variables.
6522
6523 Weak lists and weak hash tables are dumped as if they were their
6524 non-weak equivalent (without changing their type, of course).  This has
6525 not yet been a problem.
6526
6527 @node Address allocation
6528 @subsection Address allocation
6529 @cindex dumping address allocation
6530
6531
6532 The next step is to allocate the offsets of each of the objects in the
6533 final dump file.  This is done by @code{pdump_allocate_offset()} which
6534 is called indirectly by @code{pdump_scan_by_alignment()}.
6535
6536 The strategy to deal with alignment problems uses these facts:
6537
6538 @enumerate
6539 @item
6540 real world alignment requirements are powers of two.
6541
6542 @item
the C compiler is required to adjust the size of a struct so that you
can have an array of them next to each other.  This means you can get an
upper bound on the alignment requirement of a given structure by
finding the largest power of two that divides its size.
6547
6548 @item
6549 the non-variant part of variable size lrecords has an alignment
6550 requirement of 4.
6551 @end enumerate
6552
6553 Hence, for each lrecord type, C struct type or opaque data block the
6554 alignment requirement is computed as a power of two, with a minimum of
6555 2^2 for lrecords.  @code{pdump_scan_by_alignment()} then scans all the
6556 @code{pdump_entry_list_elmt}'s, the ones with the highest requirements
6557 first.  This ensures the best packing.
6558
6559 The maximum alignment requirement we take into account is 2^8.
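
The size-based bound can be computed with a few bit tests.  The function
below is a minimal sketch under the assumptions stated in this section
(a cap of 2^8 and a caller-supplied minimum such as 2^2 for lrecords);
the name @code{toy_align_log2} is hypothetical and does not appear in
the XEmacs sources.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ALIGN_LOG2 8   /* the 2^8 cap mentioned above */

/* Return log2 of the largest power of two that divides SIZE, capped at
   2^8 and raised to at least 2^MIN_LOG2. */
static int
toy_align_log2 (size_t size, int min_log2)
{
  int log2 = 0;

  assert (size > 0);
  /* Count trailing zero bits, stopping at the cap. */
  while (log2 < TOY_MAX_ALIGN_LOG2 && (size & ((size_t) 1 << log2)) == 0)
    log2++;
  return log2 < min_log2 ? min_log2 : log2;
}
```

For example, a 24-byte structure gets an alignment bound of 2^3, since 8
is the largest power of two dividing 24.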
6560
6561 @code{pdump_allocate_offset()} only has to do a linear allocation,
6562 starting at offset 256 (this leaves room for the header and keeps the
6563 alignments happy).
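
The reason a plain linear allocation suffices can be seen in the
following sketch.  It assumes that every group's size is a multiple of
its own alignment (which follows from deriving the alignment from the
size) and ignores the special handling of variable-size lrecords; the
names @code{toy_group} and @code{toy_allocate_offsets} are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_BASE_OFFSET 256

struct toy_group {
  size_t size;      /* per-object size, a multiple of 1 << align_log2 */
  size_t count;     /* number of objects in the group */
  int align_log2;   /* alignment requirement as a power of two */
  size_t offset;    /* filled in by toy_allocate_offsets() */
};

/* Assign file offsets to N groups, highest alignment first.
   Returns the offset just past the last allocated object. */
static size_t
toy_allocate_offsets (struct toy_group *groups, size_t n)
{
  size_t cur = TOY_BASE_OFFSET;
  size_t i;
  int align;

  for (align = 8; align >= 0; align--)
    for (i = 0; i < n; i++)
      if (groups[i].align_log2 == align)
        {
          /* No padding is needed: everything allocated so far has a
             size that is a multiple of an equal or larger power of
             two, and the base offset is a multiple of 2^8. */
          assert (cur % ((size_t) 1 << align) == 0);
          groups[i].offset = cur;
          cur += groups[i].size * groups[i].count;
        }
  return cur;
}
```

With groups of alignment 2^2, 2^4 and 2^3, the 2^4 group is placed at
offset 256, followed by the 2^3 group and then the 2^2 group, with no
gaps between them.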
6564
6565 @node The header
6566 @subsection The header
6567 @cindex dumping, the header
6568
The next step creates the file and writes a header containing a
signature and some miscellaneous information.  The @code{reloc_address}
field, which indicates at which address the file should be loaded if we
want to avoid post-reload relocation, is set to 0.  It then seeks to
offset 256 (the base offset for the objects).
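
A minimal sketch of this step, using standard C I/O, might look as
follows.  The header layout, the signature string and the name
@code{toy_write_header} are assumptions made for illustration; only the
@code{reloc_address} field and the base offset of 256 come from the
description above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_SIGNATURE "ToyDumpFile"   /* hypothetical signature string */
#define TOY_BASE_OFFSET 256

struct toy_header {
  char signature[16];
  unsigned long reloc_address;   /* 0: no preferred load address yet */
};

/* Write the header to F and leave the file position at the base offset
   for the objects.  Return 0 on success, -1 on failure. */
static int
toy_write_header (FILE *f)
{
  struct toy_header h;

  memset (&h, 0, sizeof h);
  strncpy (h.signature, TOY_SIGNATURE, sizeof h.signature - 1);
  h.reloc_address = 0;
  /* The header has to fit in the space reserved before the objects. */
  assert (sizeof h <= TOY_BASE_OFFSET);
  if (fwrite (&h, sizeof h, 1, f) != 1)
    return -1;
  return fseek (f, TOY_BASE_OFFSET, SEEK_SET) == 0 ? 0 : -1;
}
```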
6574
6575 @node Data dumping
6576 @subsection Data dumping
6577 @cindex data dumping
6578 @cindex dumping, data
6579
The data is dumped by @code{pdump_dump_data()}, called from
@code{pdump_scan_by_alignment()}, in the same order as the addresses
were allocated.  This function copies the data to a temporary buffer,
relocates all pointers in the object to the addresses allocated in the
address allocation step, and writes it to the file.  Using the same
order means that, if we are careful with lrecords whose size is not a
multiple of 4, we are guaranteed that each object is written at the
offset in the file allocated for it in the address allocation step.
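
The copy-then-relocate idea can be sketched for a structure with a
single pointer member.  This is an illustration only: @code{toy_node}
and @code{toy_dump_node} are hypothetical, and the real code finds the
pointer members through the descriptions rather than knowing them in
advance.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct toy_node {
  long value;
  struct toy_node *next;   /* the only pointer member */
};

/* Write NODE to F at the current position, with its `next' member
   replaced by NEXT_OFFSET, the file offset allocated for the node it
   points to (0 for a null pointer).  Only the temporary copy is
   modified; the in-memory object is left untouched. */
static int
toy_dump_node (FILE *f, const struct toy_node *node, uintptr_t next_offset)
{
  struct toy_node copy;

  assert (sizeof next_offset == sizeof copy.next);
  memcpy (&copy, node, sizeof copy);
  /* Store the offset in the pointer-sized slot. */
  memcpy (&copy.next, &next_offset, sizeof copy.next);
  return fwrite (&copy, sizeof copy, 1, f) == 1 ? 0 : -1;
}
```

Working on a copy matters because the running process continues to use
the original objects after the dump has been written.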
6588
6589 @node Pointers dumping
6590 @subsection Pointers dumping
6591 @cindex pointers dumping
6592 @cindex dumping, pointers
6593
A number of tables needed to properly reassign the global pointers are
then written.  They are:
6596
6597 @enumerate
6598 @item
6599 the pdump_root_struct_ptrs dynarr
6600 @item
6601 the pdump_opaques dynarr
6602 @item
6603 a vector of all the offsets to the objects in the file that include a
6604 description (for faster relocation at reload time)
6605 @item
6606 the pdump_root_objects and pdump_weak_object_chains dynarrs.
6607 @end enumerate
6608
6609 For each of the dynarrs we write both the pointer to the variables and
6610 the relocated offset of the object they point to.  Since these variables
6611 are global, the pointers are still valid when restarting the program and
6612 are used to regenerate the global pointers.
6613
The @code{pdump_weak_object_chains} dynarr is a special case.  The
variables it points to are the heads of weak linked lists of lisp objects
of the same type.  Not all objects in such a list are dumped, so the
relocated pointer we associate with each head points to the first dumped
object of the list, or @code{Qnil} if none is available.  This is also
the reason why they are not used as roots for the purpose of object
enumeration.
6621
Some very important information, like the @code{staticpros} and the
@code{lrecord_implementations_table}, is handled indirectly using
@code{dump_add_opaque} or @code{dump_add_root_struct_ptr}.
6625
6626 This is the end of the dumping part.
6627
6628 @node Reloading phase
6629 @section Reloading phase
6630 @cindex reloading phase
6631 @cindex dumping, reloading phase
6632
6633 @subsection File loading
6634 @cindex dumping, file loading
6635
The file is mmap'ed into memory (which ensures a PAGESIZE alignment, at
least 4096) or, if mmap is unavailable or fails, a 256-byte-aligned
malloc is done and the file is loaded into it.
6639
6640 Some variables are reinitialized from the values found in the header.
6641
6642 The difference between the actual loading address and the reloc_address
6643 is computed and will be used for all the relocations.
6644
6645
6646 @subsection Putting back the pdump_opaques
6647 @cindex dumping, putting back the pdump_opaques
6648
6649 The memory contents are restored in the obvious and trivial way.
6650
6651
6652 @subsection Putting back the pdump_root_struct_ptrs
6653 @cindex dumping, putting back the pdump_root_struct_ptrs
6654
6655 The variables pointed to by pdump_root_struct_ptrs in the dump phase are
6656 reset to the right relocated object addresses.
6657
6658
6659 @subsection Object relocation
6660 @cindex dumping, object relocation
6661
6662 All the objects are relocated using their description and their offset
6663 by @code{pdump_reloc_one}.  This step is unnecessary if the
6664 reloc_address is equal to the file loading address.
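
For a single pointer slot, the relocation amounts to adding the
difference computed during file loading.  The sketch below assumes that
pointers in the file are stored as addresses relative to
@code{reloc_address} and that a stored value of 0 denotes a null
pointer (no object can live at offset 0, since objects start at offset
256); the name @code{toy_reloc_pointer} is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Adjust the pointer stored at OFFSET inside the loaded image at BASE
   by DELTA, the actual load address minus reloc_address.  Null
   pointers stay null. */
static void
toy_reloc_pointer (char *base, size_t offset, intptr_t delta)
{
  uintptr_t p;

  assert (base != NULL);
  memcpy (&p, base + offset, sizeof p);
  if (p != 0)
    {
      p += (uintptr_t) delta;
      memcpy (base + offset, &p, sizeof p);
    }
}
```

When the delta is zero, every slot already holds the right address,
which is why the whole step can be skipped in that case.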
6665
6666
6667 @subsection Putting back the pdump_root_objects and pdump_weak_object_chains
6668 @cindex dumping, putting back the pdump_root_objects and pdump_weak_object_chains
6669
6670 Same as Putting back the pdump_root_struct_ptrs.
6671
6672
6673 @subsection Reorganize the hash tables
6674 @cindex dumping, reorganize the hash tables
6675
6676 Since some of the hash values in the lisp hash tables are
6677 address-dependent, their layout is now wrong.  So we go through each of
6678 them and have them resorted by calling @code{pdump_reorganize_hash_table}.
6679
6680 @node Remaining issues
6681 @section Remaining issues
6682 @cindex dumping, remaining issues
6683
The build process will have to start a post-dump xemacs, ask it for the
loading address (which will, hopefully, always be the same between
different xemacs invocations) and relocate the file to the new address.
This way the object relocation phase will not have to be done, which
means no writes to the objects and, because of the use of mmap, the
dumped data will be shared between all the xemacs processes running on
the computer.
6691
Some executable signature will be necessary to ensure that a given dump
file is really associated with a given executable, or random crashes
will occur.  One option is a random number set at compile or configure
time through a define.  This will also allow for having
differently-compiled xemacsen on the same system (mule and no-mule come
to mind).
6697
6698 The DOC file contents should probably end up in the dump file.
6699
6700
6701 @node Events and the Event Loop, Evaluation; Stack Frames; Bindings, Dumping, Top
6702 @chapter Events and the Event Loop
6703 @cindex events and the event loop
6704 @cindex event loop, events and the
6705
6706 @menu
6707 * Introduction to Events::
6708 * Main Loop::
6709 * Specifics of the Event Gathering Mechanism::
6710 * Specifics About the Emacs Event::
6711 * The Event Stream Callback Routines::
6712 * Other Event Loop Functions::
6713 * Converting Events::
6714 * Dispatching Events; The Command Builder::
6715 @end menu
6716
6717 @node Introduction to Events
6718 @section Introduction to Events
6719 @cindex events, introduction to
6720
6721   An event is an object that encapsulates information about an
6722 interesting occurrence in the operating system.  Events are
6723 generated either by user action, direct (e.g. typing on the
6724 keyboard or moving the mouse) or indirect (moving another
6725 window, thereby generating an expose event on an Emacs frame),
6726 or as a result of some other typically asynchronous action happening,
6727 such as output from a subprocess being ready or a timer expiring.
6728 Events come into the system in an asynchronous fashion (typically
6729 through a callback being called) and are converted into a
6730 synchronous event queue (first-in, first-out) in a process that
6731 we will call @dfn{collection}.
6732
6733   Note that each application has its own event queue. (It is
6734 immaterial whether the collection process directly puts the
6735 events in the proper application's queue, or puts them into
6736 a single system queue, which is later split up.)
6737
6738   The most basic level of event collection is done by the
6739 operating system or window system.  Typically, XEmacs does
6740 its own event collection as well.  Often there are multiple
6741 layers of collection in XEmacs, with events from various
6742 sources being collected into a queue, which is then combined
6743 with other sources to go into another queue (i.e. a second
6744 level of collection), with perhaps another level on top of
6745 this, etc.
6746
6747   XEmacs has its own types of events (called @dfn{Emacs events}),
6748 which provides an abstract layer on top of the system-dependent
6749 nature of the most basic events that are received.  Part of the
6750 complex nature of the XEmacs event collection process involves
6751 converting from the operating-system events into the proper
6752 Emacs events---there may not be a one-to-one correspondence.
6753
6754   Emacs events are documented in @file{events.h}; I'll discuss them
6755 later.
6756
6757 @node Main Loop
6758 @section Main Loop
6759 @cindex main loop
6760 @cindex events, main loop
6761
6762   The @dfn{command loop} is the top-level loop that the editor is always
6763 running.  It loops endlessly, calling @code{next-event} to retrieve an
6764 event and @code{dispatch-event} to execute it. @code{dispatch-event} does
6765 the appropriate thing with non-user events (process, timeout,
6766 magic, eval, mouse motion); this involves calling a Lisp handler
6767 function, redrawing a newly-exposed part of a frame, reading
6768 subprocess output, etc.  For user events, @code{dispatch-event}
6769 looks up the event in relevant keymaps or menubars; when a
6770 full key sequence or menubar selection is reached, the appropriate
6771 function is executed. @code{dispatch-event} may have to keep state
6772 across calls; this is done in the ``command-builder'' structure
6773 associated with each console (remember, there's usually only
6774 one console), and the engine that looks up keystrokes and
6775 constructs full key sequences is called the @dfn{command builder}.
6776 This is documented elsewhere.
6777
6778   The guts of the command loop are in @code{command_loop_1()}.  This
6779 function doesn't catch errors, though---that's the job of
6780 @code{command_loop_2()}, which is a condition-case (i.e. error-trapping)
6781 wrapper around @code{command_loop_1()}.  @code{command_loop_1()} never
6782 returns, but may get thrown out of.
6783
6784   When an error occurs, @code{cmd_error()} is called, which usually
6785 invokes the Lisp error handler in @code{command-error}; however, a
6786 default error handler is provided if @code{command-error} is @code{nil}
6787 (e.g. during startup).  The purpose of the error handler is simply to
6788 display the error message and do associated cleanup; it does not need to
6789 throw anywhere.  When the error handler finishes, the condition-case in
6790 @code{command_loop_2()} will finish and @code{command_loop_2()} will
6791 reinvoke @code{command_loop_1()}.
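
  The relationship between the two functions can be modelled with
@code{setjmp()} and @code{longjmp()}.  This is only an analogy for the
condition-case mechanism, not the actual implementation: the names
below are hypothetical, commands are plain integers, and the inner loop
returns when its input runs out so that the example terminates.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf toy_error_target;   /* stands in for the condition-case */
static int toy_next;               /* index of the next command to run */
static int toy_errors_handled;     /* bumped by the error handler */

/* Run commands until the input is exhausted.  A negative command
   signals an error by jumping to the handler. */
static void
toy_command_loop_1 (const int *commands, int n)
{
  while (toy_next < n)
    {
      int cmd = commands[toy_next++];
      if (cmd < 0)
        longjmp (toy_error_target, 1);
    }
}

/* Error-trapping wrapper: after each error, run the handler and then
   re-enter the inner loop. */
static void
toy_command_loop_2 (const int *commands, int n)
{
  assert (n >= 0);
  if (setjmp (toy_error_target) != 0)
    toy_errors_handled++;          /* display the message, clean up */
  toy_command_loop_1 (commands, n);
}
```

Each error unwinds to the wrapper, which handles it and restarts the
inner loop at the next command, just as the text describes.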
6792
6793   @code{command_loop_2()} is invoked from three places: from
6794 @code{initial_command_loop()} (called from @code{main()} at the end of
6795 internal initialization), from the Lisp function @code{recursive-edit},
6796 and from @code{call_command_loop()}.
6797
  @code{call_command_loop()} is called when a macro is started and when
the minibuffer is entered; normal termination of the macro or minibuffer
causes a throw out of the recursive command loop.  (The throw goes to
@code{execute-kbd-macro} for macros and to @code{exit} for minibuffers.
Note also that the low-level minibuffer-entering function,
@code{read-minibuffer-internal}, provides its own error handling and
does not need @code{command_loop_2()}'s error encapsulation; so it tells
@code{call_command_loop()} to invoke @code{command_loop_1()} directly.)
6806
  Note that both @code{read-minibuffer-internal} and
@code{recursive-edit} set up a catch for @code{exit}; this is why
@code{abort-recursive-edit}, which throws to this catch, exits out of
either one.
6810
6811   @code{initial_command_loop()}, called from @code{main()}, sets up a
6812 catch for @code{top-level} when invoking @code{command_loop_2()},
6813 allowing functions to throw all the way to the top level if they really
6814 need to.  Before invoking @code{command_loop_2()},
6815 @code{initial_command_loop()} calls @code{top_level_1()}, which handles
6816 all of the startup stuff (creating the initial frame, handling the
6817 command-line options, loading the user's @file{.emacs} file, etc.).  The
6818 function that actually does this is in Lisp and is pointed to by the
6819 variable @code{top-level}; normally this function is
6820 @code{normal-top-level}.  @code{top_level_1()} is just an error-handling
6821 wrapper similar to @code{command_loop_2()}.  Note also that
6822 @code{initial_command_loop()} sets up a catch for @code{top-level} when
6823 invoking @code{top_level_1()}, just like when it invokes
6824 @code{command_loop_2()}.
6825
6826 @node Specifics of the Event Gathering Mechanism
6827 @section Specifics of the Event Gathering Mechanism
6828 @cindex event gathering mechanism, specifics of the
6829
6830   Here is an approximate diagram of the collection processes
6831 at work in XEmacs, under TTY's (TTY's are simpler than X
6832 so we'll look at this first):
6833
6834 @noindent
6835 @example
6836  asynch.      asynch.    asynch.   asynch.             [Collectors in
6837 kbd events  kbd events   process   process                the OS]
6838       |         |         output    output
6839       |         |           |         |
6840       |         |           |         |      SIGINT,   [signal handlers
6841       |         |           |         |      SIGQUIT,     in XEmacs]
6842       V         V           V         V      SIGWINCH,
6843      file      file        file      file    SIGALRM
6844      desc.     desc.       desc.     desc.     |
6845      (TTY)     (TTY)       (pipe)    (pipe)    |
6846       |          |          |         |      fake    timeouts
6847       |          |          |         |      file        |
6848       |          |          |         |      desc.       |
6849       |          |          |         |      (pipe)      |
6850       |          |          |         |        |         |
6851       |          |          |         |        |         |
6852       |          |          |         |        |         |
6853       V          V          V         V        V         V
6854       ------>-----------<----------------<----------------
6855                   |
6856                   |
6857                   | [collected using select() in emacs_tty_next_event()
6858                   |  and converted to the appropriate Emacs event]
6859                   |
6860                   |
6861                   V          (above this line is TTY-specific)
6862                 Emacs -----------------------------------------------
6863                 event (below this line is the generic event mechanism)
6864                   |
6865                   |
6866 was there     if not, call
6867 a SIGINT?  emacs_tty_next_event()
6868     |             |
6869     |             |
6870     |             |
6871     V             V
6872     --->------<----
6873            |
6874            |     [collected in event_stream_next_event();
6875            |      SIGINT is converted using maybe_read_quit_event()]
6876            V
6877          Emacs
6878          event
6879            |
6880            \---->------>----- maybe_kbd_translate() ---->---\
6881                                                             |
6882                                                             |
6883                                                             |
6884      command event queue                                    |
6885                                                if not from command
6886   (contains events that were                   event queue, call
6887   read earlier but not processed,              event_stream_next_event()
6888   typically when waiting in a                               |
6889   sit-for, sleep-for, etc. for                              |
6890  a particular event to be received)                         |
6891                |                                            |
6892                |                                            |
6893                V                                            V
6894                ---->------------------------------------<----
6895                                                |
6896                                                | [collected in
6897                                                |  next_event_internal()]
6898                                                |
6899  unread-     unread-       event from          |
6900  command-    command-       keyboard       else, call
6901  events      event           macro      next_event_internal()
6902    |           |               |               |
6903    |           |               |               |
6904    |           |               |               |
6905    V           V               V               V
6906    --------->----------------------<------------
6907                      |
6908                      |      [collected in `next-event', which may loop
6909                      |       more than once if the event it gets is on
6910                      |       a dead frame, device, etc.]
6911                      |
6912                      |
6913                      V
6914             feed into top-level event loop,
6915             which repeatedly calls `next-event'
6916             and then dispatches the event
6917             using `dispatch-event'
6918 @end example
6919
6920 Notice the separation between TTY-specific and generic event mechanism.
6921 When using the Xt-based event loop, the TTY-specific stuff is replaced
6922 but the rest stays the same.
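
The TTY-specific collection step at the top of the diagram, including
the ``fake'' file descriptor fed by the signal handlers, can be sketched
with POSIX @code{select()} and a pipe.  The function names are
hypothetical, and the sketch leaves out timeouts and the handling of
interrupted system calls.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Block until one of the N descriptors in FDS is readable and return
   its index in FDS, or -1 on error. */
static int
toy_wait_for_input (const int *fds, int n)
{
  fd_set readable;
  int i, maxfd = -1;

  FD_ZERO (&readable);
  for (i = 0; i < n; i++)
    {
      assert (fds[i] >= 0 && fds[i] < FD_SETSIZE);
      FD_SET (fds[i], &readable);
      if (fds[i] > maxfd)
        maxfd = fds[i];
    }
  if (select (maxfd + 1, &readable, NULL, NULL, NULL) <= 0)
    return -1;
  for (i = 0; i < n; i++)
    if (FD_ISSET (fds[i], &readable))
      return i;
  return -1;
}

/* What a signal handler would do: make the fake descriptor readable by
   writing one byte to the pipe.  write() is async-signal-safe. */
static void
toy_signal_event (int pipe_write_end)
{
  char byte = 0;

  if (write (pipe_write_end, &byte, 1) < 0)
    {
      /* Nothing useful can be done about a failure here. */
    }
}
```

Turning signals into readable descriptors lets a single
@code{select()} call wait for keyboard input, process output and
signals at the same time.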
6923
It's also important to realize that only one kind of system-specific
event loop can be operating at a time, and it must be able to receive
all kinds of events simultaneously.  For the two existing event loops
(implemented in @file{event-tty.c} and @file{event-Xt.c},
respectively), the TTY event loop @emph{only} handles TTY consoles,
while the Xt event loop handles @emph{both} TTY and X consoles.  This
situation is different from all of the output handlers, where you simply
have one per console type.
6932
6933   Here's the Xt Event Loop Diagram (notice that below a certain point,
6934 it's the same as the above diagram):
6935
6936 @example
6937 asynch. asynch. asynch. asynch.                 [Collectors in
6938  kbd     kbd    process process                    the OS]
6939 events  events  output  output
6940   |       |       |       |
6941   |       |       |       |     asynch. asynch. [Collectors in the
6942   |       |       |       |       X        X     OS and X Window System]
6943   |       |       |       |     events  events
6944   |       |       |       |       |        |
6945   |       |       |       |       |        |
6946   |       |       |       |       |        |    SIGINT, [signal handlers
6947   |       |       |       |       |        |    SIGQUIT,   in XEmacs]
6948   |       |       |       |       |        |    SIGWINCH,
6949   |       |       |       |       |        |    SIGALRM
6950   |       |       |       |       |        |       |
6951   |       |       |       |       |        |       |
6952   |       |       |       |       |        |       |      timeouts
6953   |       |       |       |       |        |       |          |
6954   |       |       |       |       |        |       |          |
6955   |       |       |       |       |        |       V          |
6956   V       V       V       V       V        V      fake        |
6957  file    file    file    file    file     file    file        |
6958  desc.   desc.   desc.   desc.   desc.    desc.   desc.       |
6959  (TTY)   (TTY)   (pipe)  (pipe) (socket) (socket) (pipe)      |
6960   |       |       |       |       |        |       |          |
6961   |       |       |       |       |        |       |          |
6962   |       |       |       |       |        |       |          |
6963   V       V       V       V       V        V       V          V
6964   --->----------------------------------------<---------<------
6965        |              |               |
6966        |              |               |[collected using select() in
6967        |              |               | _XtWaitForSomething(), called
6968        |              |               | from XtAppProcessEvent(), called
6969        |              |               | in emacs_Xt_next_event();
6970        |              |               | dispatched to various callbacks]
6971        |              |               |
6972        |              |               |
6973   emacs_Xt_        p_s_callback(),    | [popup_selection_callback]
6974   event_handler()  x_u_v_s_callback(),| [x_update_vertical_scrollbar_
6975        |           x_u_h_s_callback(),|  callback]
6976        |           search_callback()  | [x_update_horizontal_scrollbar_
6977        |              |               |  callback]
6978        |              |               |
6979        |              |               |
6980   enqueue_Xt_       signal_special_   |
6981   dispatch_event()  Xt_user_event()   |
6982   [maybe multiple     |               |
6983    times, maybe 0     |               |
6984    times]             |               |
6985        |            enqueue_Xt_       |
6986        |            dispatch_event()  |
6987        |              |               |
6988        |              |               |
6989        V              V               |
6990        -->----------<--               |
6991               |                       |
6992               |                       |
6993            dispatch             Xt_what_callback()
6994            event                  sets flags
6995            queue                      |
6996               |                       |
6997               |                       |
6998               |                       |
6999               |                       |
7000               ---->-----------<--------
7001                    |
7002                    |
7003                    |     [collected and converted as appropriate in
7004                    |            emacs_Xt_next_event()]
7005                    |
7006                    |
7007                    V          (above this line is Xt-specific)
7008                  Emacs ------------------------------------------------
7009                  event (below this line is the generic event mechanism)
7010                    |
7011                    |
7012 was there      if not, call
7013 a SIGINT?   emacs_Xt_next_event()
7014     |              |
7015     |              |
7016     |              |
7017     V              V
7018     --->-------<----
7019            |
7020            |        [collected in event_stream_next_event();
7021            |         SIGINT is converted using maybe_read_quit_event()]
7022            V
7023          Emacs
7024          event
7025            |
7026            \---->------>----- maybe_kbd_translate() -->-----\
7027                                                             |
7028                                                             |
7029                                                             |
7030      command event queue                                    |
7031                                               if not from command
7032   (contains events that were                  event queue, call
7033   read earlier but not processed,             event_stream_next_event()
7034   typically when waiting in a                               |
7035   sit-for, sleep-for, etc. for                              |
7036  a particular event to be received)                         |
7037                |                                            |
7038                |                                            |
7039                V                                            V
7040                ---->----------------------------------<------
7041                                                |
7042                                                | [collected in
7043                                                |  next_event_internal()]
7044                                                |
7045  unread-     unread-       event from          |
7046  command-    command-       keyboard       else, call
7047  events      event           macro      next_event_internal()
7048    |           |               |               |
7049    |           |               |               |
7050    |           |               |               |
7051    V           V               V               V
7052    --------->----------------------<------------
7053                      |
7054                      |      [collected in `next-event', which may loop
7055                      |       more than once if the event it gets is on
7056                      |       a dead frame, device, etc.]
7057                      |
7058                      |
7059                      V
7060             feed into top-level event loop,
7061             which repeatedly calls `next-event'
7062             and then dispatches the event
7063             using `dispatch-event'
7064 @end example
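
The generic half of both diagrams is a chain of fallbacks: each layer
first looks at its own pending sources and only then calls the layer
below it.  The following sketch flattens that chain into a priority
list.  It is a simplification under stated assumptions: the names are
hypothetical, events are plain integers, and the three sample sources
merely stand in for the unread events, the command event queue and the
system-specific event stream.

```c
#include <assert.h>
#include <stddef.h>

/* A source returns 1 and stores an event in *EV if it has one, else 0. */
typedef int (*toy_source) (int *ev);

/* Sample sources, highest priority first. */
static int toy_unread_events (int *ev) { (void) ev; return 0; }  /* empty */
static int toy_command_queue (int *ev) { *ev = 42; return 1; }
static int toy_event_stream (int *ev) { *ev = 7; return 1; }

static const toy_source toy_sources[] = {
  toy_unread_events, toy_command_queue, toy_event_stream
};

/* Try each of the N sources in priority order.  Return the index of
   the source that supplied the event, or -1 if none had one. */
static int
toy_next_event (const toy_source *sources, size_t n, int *ev)
{
  size_t i;

  assert (ev != NULL);
  for (i = 0; i < n; i++)
    if (sources[i] (ev))
      return (int) i;
  return -1;
}
```

Here the first source is empty, so the event comes from the second one
and the system-specific source is never consulted.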
7065
7066 @node Specifics About the Emacs Event
7067 @section Specifics About the Emacs Event
7068 @cindex event, specifics about the Lisp object
7069
7070 @node The Event Stream Callback Routines
7071 @section The Event Stream Callback Routines
7072 @cindex event stream callback routines, the
7073 @cindex callback routines, the event stream
7074
7075 @node Other Event Loop Functions
7076 @section Other Event Loop Functions
7077 @cindex event loop functions, other
7078
7079   @code{detect_input_pending()} and @code{input-pending-p} look for
7080 input by calling @code{event_stream->event_pending_p} and looking in
7081 @code{[V]unread-command-event} and the @code{command_event_queue} (they
7082 do not check for an executing keyboard macro, though).
7083
7084   @code{discard-input} cancels any command events pending (and any
7085 keyboard macros currently executing), and puts the others onto the
7086 @code{command_event_queue}.  There is a comment about a ``race
7087 condition'', which is not a good sign.
7088
7089   @code{next-command-event} and @code{read-char} are higher-level
7090 interfaces to @code{next-event}.  @code{next-command-event} gets the
7091 next @dfn{command} event (i.e.  keypress, mouse event, menu selection,
7092 or scrollbar action), calling @code{dispatch-event} on any others.
7093 @code{read-char} calls @code{next-command-event} and uses
7094 @code{event_to_character()} to return the character equivalent.  With
7095 the right kind of input method support, it is possible for @code{(read-char)}
7096 to return a Kanji character.
7097
7098 @node Converting Events
7099 @section Converting Events
7100 @cindex converting events
7101 @cindex events, converting
7102
7103   @code{character_to_event()}, @code{event_to_character()},
7104 @code{event-to-character}, and @code{character-to-event} convert between
7105 characters and keypress events corresponding to the characters.  If the
7106 event was not a keypress, @code{event_to_character()} returns -1 and
7107 @code{event-to-character} returns @code{nil}.  These functions convert
7108 between character representation and the split-up event representation
7109 (keysym plus mod keys).
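
  As a rough illustration of the split-up representation, here is a
minimal sketch in C.  The @code{toy_} names, the event structure, and
the single @code{TOY_MOD_CONTROL} flag are assumptions made for this
example only; the real functions deal with many more cases (other
modifiers, special keys, MULE characters).

```c
#include <assert.h>

/* Hypothetical, simplified stand-ins; not the real XEmacs definitions. */
enum toy_event_kind { TOY_KEY_PRESS, TOY_BUTTON_PRESS };
#define TOY_MOD_CONTROL 1u

struct toy_event {
  enum toy_event_kind kind;
  int keysym;            /* base key, e.g. 'a' */
  unsigned modifiers;    /* bit set of TOY_MOD_* flags */
};

/* Split a 7-bit character into a keysym plus modifier bits. */
static struct toy_event toy_character_to_event(int c)
{
  struct toy_event ev;

  assert(c >= 0 && c < 128);
  ev.kind = TOY_KEY_PRESS;
  ev.keysym = c;
  ev.modifiers = 0;
  if (c < 32) {                      /* control character: C-@ .. C-_ */
    ev.keysym = c + '@';
    if (ev.keysym >= 'A' && ev.keysym <= 'Z')
      ev.keysym += 'a' - 'A';        /* represent ^A as control + 'a' */
    ev.modifiers = TOY_MOD_CONTROL;
  }
  return ev;
}

/* Recombine keysym and modifiers; -1 means "no character equivalent". */
static int toy_event_to_character(const struct toy_event *ev)
{
  int k = ev->keysym;

  if (ev->kind != TOY_KEY_PRESS)
    return -1;                       /* not a keypress at all */
  if (!(ev->modifiers & TOY_MOD_CONTROL))
    return k;
  if (k >= 'a' && k <= 'z')
    k -= 'a' - 'A';
  if (k < '@' || k > '_')
    return -1;                       /* control + this key has no ASCII code */
  return k - '@';
}
```

  In this sketch, character 1 becomes the keysym @code{'a'} with the
control flag set, and converting that event back yields 1 again; a
button-press event converts to -1.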
7110
7111 @node Dispatching Events; The Command Builder
7112 @section Dispatching Events; The Command Builder
7113 @cindex dispatching events; the command builder
7114 @cindex events; the command builder, dispatching
7115 @cindex command builder, dispatching events; the
7116
7117 Not yet documented.
7118
7119 @node Evaluation; Stack Frames; Bindings, Symbols and Variables, Events and the Event Loop, Top
7120 @chapter Evaluation; Stack Frames; Bindings
7121 @cindex evaluation; stack frames; bindings
7122 @cindex stack frames; bindings, evaluation;
7123 @cindex bindings, evaluation; stack frames;
7124
7125 @menu
7126 * Evaluation::
7127 * Dynamic Binding; The specbinding Stack; Unwind-Protects::
7128 * Simple Special Forms::
7129 * Catch and Throw::
7130 @end menu
7131
7132 @node Evaluation
7133 @section Evaluation
7134 @cindex evaluation
7135
7136   @code{Feval()} evaluates the form (a Lisp object) that is passed to
7137 it.  Note that evaluation is only non-trivial for two types of objects:
7138 symbols and conses.  A symbol is evaluated simply by calling
7139 @code{symbol-value} on it and returning the value.
7140
7141   Evaluating a cons means calling a function.  First, @code{eval} checks
7142 to see if garbage-collection is necessary, and calls
7143 @code{garbage_collect_1()} if so.  It then increases the evaluation
7144 depth by 1 (@code{lisp_eval_depth}, which is always less than
7145 @code{max_lisp_eval_depth}) and adds an element to the linked list of
7146 @code{struct backtrace}'s (@code{backtrace_list}).  Each such structure
7147 contains a pointer to the function being called plus a list of the
7148 function's arguments.  Originally these values are stored unevalled, and
7149 as they are evaluated, the backtrace structure is updated.  Garbage
7150 collection pays attention to the objects pointed to in the backtrace
7151 structures (garbage collection might happen while a function is being
7152 called or while an argument is being evaluated, and there could easily
7153 be no other references to the arguments in the argument list; once an
7154 argument is evaluated, however, the unevalled version is not needed by
7155 eval, and so the backtrace structure is changed).
7156
7157 At this point, the function to be called is determined by looking at
7158 the car of the cons (if this is a symbol, its function definition is
7159 retrieved and the process repeated).  The function should then consist
7160 of either a @code{Lisp_Subr} (built-in function written in C), a
7161 @code{Lisp_Compiled_Function} object, or a cons whose car is one of the
7162 symbols @code{autoload}, @code{macro} or @code{lambda}.
7163
7164 If the function is a @code{Lisp_Subr}, the lisp object points to a
7165 @code{struct Lisp_Subr} (created by @code{DEFUN()}), which contains a
7166 pointer to the C function, a minimum and maximum number of arguments
7167 (or possibly the special constants @code{MANY} or @code{UNEVALLED}), a
7168 pointer to the symbol referring to that subr, and a couple of other
7169 things.  If the subr wants its arguments @code{UNEVALLED}, they are
7170 passed raw as a list.  Otherwise, an array of evaluated arguments is
7171 created and put into the backtrace structure, and either passed whole
7172 (@code{MANY}) or each argument is passed as a C argument.
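
The difference between passing the argument array whole and passing
each argument as a separate C argument can be sketched as follows.  This
is a minimal illustration, not the real dispatch code: the
@code{toy_subr} structure, the integer stand-in for @code{Lisp_Object},
and the value chosen for @code{TOY_MANY} are assumptions, and the
@code{UNEVALLED} case is left out.

```c
#include <assert.h>

typedef int toy_obj;             /* stand-in for Lisp_Object */
#define TOY_NIL  0               /* stand-in for nil */
#define TOY_MANY (-2)            /* illustrative marker value */

struct toy_subr {
  int min_args;
  int max_args;                  /* 1, 2, or TOY_MANY in this sketch */
  union {
    toy_obj (*a1)(toy_obj);
    toy_obj (*a2)(toy_obj, toy_obj);
    toy_obj (*many)(int nargs, toy_obj *args);
  } fn;
};

/* Call SUBR on NARGS already-evaluated arguments. */
static toy_obj toy_call_subr(const struct toy_subr *subr,
                             int nargs, toy_obj *args)
{
  toy_obj padded[2] = { TOY_NIL, TOY_NIL };
  int i;

  assert(nargs >= subr->min_args);
  if (subr->max_args == TOY_MANY)
    return subr->fn.many(nargs, args);       /* pass the array whole */

  assert(subr->max_args == 1 || subr->max_args == 2);
  assert(nargs <= subr->max_args);
  for (i = 0; i < nargs; i++)
    padded[i] = args[i];                     /* omitted optionals stay nil */
  if (subr->max_args == 1)
    return subr->fn.a1(padded[0]);
  return subr->fn.a2(padded[0], padded[1]);  /* one C argument each */
}

/* Three sample "subrs" to dispatch to. */
static toy_obj toy_negate(toy_obj x) { return -x; }
static toy_obj toy_minus(toy_obj a, toy_obj b) { return a - b; }
static toy_obj toy_plus(int nargs, toy_obj *args)
{
  toy_obj sum = 0;
  int i;
  for (i = 0; i < nargs; i++)
    sum += args[i];
  return sum;
}
```

Padding missing optional arguments with a nil stand-in is a
simplification chosen for this sketch.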
7173
7174 If the function is a @code{Lisp_Compiled_Function},
7175 @code{funcall_compiled_function()} is called.  If the function is a
7176 lambda list, @code{funcall_lambda()} is called.  If the function is a
7177 macro, [..... fill in] is done.  If the function is an autoload,
7178 @code{do_autoload()} is called to load the definition and then eval
7179 starts over [explain this more].
7180
7181 When @code{Feval()} exits, the evaluation depth is reduced by one, the
7182 debugger is called if appropriate, and the current backtrace structure
7183 is removed from the list.
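
The entry and exit bookkeeping described above can be sketched roughly
as follows.  The @code{toy_} names, the fields of the toy structure, and
the limit of 500 are assumptions for illustration; the real
@code{struct backtrace} holds Lisp objects, and the real evaluator
signals a Lisp error where this sketch returns -1.

```c
#include <assert.h>
#include <stddef.h>

struct toy_backtrace {
  struct toy_backtrace *next;
  const char *function;          /* stand-in for the function object */
  int nargs;
  int evalled;                   /* have the arguments been evaluated yet? */
};

static struct toy_backtrace *toy_backtrace_list = NULL;
static int toy_lisp_eval_depth = 0;
static int toy_max_lisp_eval_depth = 500;   /* arbitrary limit for the sketch */

/* Entry bookkeeping.  Returns 0, or -1 if the depth limit would be
   reached, so that the depth always stays below the maximum. */
static int toy_eval_enter(struct toy_backtrace *bt,
                          const char *function, int nargs)
{
  if (toy_lisp_eval_depth + 1 >= toy_max_lisp_eval_depth)
    return -1;
  toy_lisp_eval_depth++;
  bt->function = function;
  bt->nargs = nargs;
  bt->evalled = 0;               /* arguments start out unevalled */
  bt->next = toy_backtrace_list;
  toy_backtrace_list = bt;       /* push onto the head of the list */
  return 0;
}

/* Exit bookkeeping: undo exactly what toy_eval_enter() did. */
static void toy_eval_leave(struct toy_backtrace *bt)
{
  assert(toy_backtrace_list == bt);
  toy_backtrace_list = bt->next;
  toy_lisp_eval_depth--;
}
```

Because each record lives in the caller's stack frame, pushing and
popping it costs no heap allocation.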
7184
7185 Both @code{funcall_compiled_function()} and @code{funcall_lambda()} need
7186 to go through the list of formal parameters to the function and bind
7187 them to the actual arguments, checking for @code{&rest} and
7188 @code{&optional} symbols in the formal parameters and making sure the
7189 number of actual arguments is correct.
7190 @code{funcall_compiled_function()} can do this a little more
7191 efficiently, since the formal parameter list can be checked for sanity
7192 when the compiled function object is created.
7193
7194 @code{funcall_lambda()} simply calls @code{Fprogn} to execute the code
7195 in the lambda list.
7196
7197 @code{funcall_compiled_function()} calls the real byte-code interpreter
7198 @code{execute_optimized_program()} on the byte-code instructions, which
7199 are converted into an internal form for faster execution.
7200
7201 When a compiled function is executed for the first time by
7202 @code{funcall_compiled_function()}, or during the dump phase of building
7203 XEmacs, the byte-code instructions are converted from a
7204 @code{Lisp_String} (which is inefficient to access, especially in the
7205 presence of MULE) into a @code{Lisp_Opaque} object containing an array
7206 of unsigned char, which can be directly executed by the byte-code
7207 interpreter.  At this time the byte code is also analyzed for validity
7208 and transformed into a more optimized form, so that
7209 @code{execute_optimized_program()} can really fly.
7210
7211 Here are some of the optimizations performed by the internal byte-code
7212 transformer:
7213 @enumerate
7214 @item
7215 References to the @code{constants} array are checked for out-of-range
7216 indices, so that the byte interpreter doesn't have to.
7217 @item
7218 References to the @code{constants} array that will be used as a Lisp
7219 variable are checked for being correct non-constant (i.e. not @code{t},
7220 @code{nil}, or @code{keywordp}) symbols, so that the byte interpreter
7221 doesn't have to.
7222 @item
7223 The maximum number of variable bindings in the byte-code is
7224 pre-computed, so that space on the @code{specpdl} stack can be
7225 pre-reserved once for the whole function execution.
7226 @item
7227 All byte-code jumps are relative to the current program counter instead
7228 of the start of the program, thereby saving a register.
7229 @item
7230 One-byte relative jumps are converted from the byte-code form of unsigned
7231 chars offset by 127 to machine-friendly signed chars.
7232 @end enumerate
7233
7234 Of course, this transformation of the @code{instructions} should not be
7235 visible to the user, so @code{Fcompiled_function_instructions()} needs
7236 to know how to convert the optimized opaque object back into a Lisp
7237 string that is identical to the original string from the @file{.elc}
7238 file.  (Actually, the resulting string may (rarely) contain slightly
7239 different, yet equivalent, byte code.)
7240
7241 @code{Ffuncall()} implements Lisp @code{funcall}.  @code{(funcall fun
7242 x1 x2 x3 ...)} is equivalent to @code{(eval (list fun (quote x1) (quote
7243 x2) (quote x3) ...))}.  @code{Ffuncall()} contains its own code to do
7244 the evaluation, however, and is very similar to @code{Feval()}.
7245
7246 From the performance point of view, it is worth knowing that most of the
7247 time in Lisp evaluation is spent executing @code{Lisp_Subr} and
7248 @code{Lisp_Compiled_Function} objects via @code{Ffuncall()} (not
7249 @code{Feval()}).
7250
7251 @code{Fapply()} implements Lisp @code{apply}, which is very similar to
7252 @code{funcall} except that if the last argument is a list, the result is the
7253 same as if each of the arguments in the list had been passed separately.
7254 @code{Fapply()} does some business to expand the last argument if it's a
7255 list, then calls @code{Ffuncall()} to do the work.
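
The spreading of the last argument can be sketched as below.  This is a
toy model under stated assumptions: arguments are plain integers, the
list is a hand-built chain of @code{toy_cons} cells, and the fixed
@code{TOY_MAX_ARGS} limit exists only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

struct toy_cons { int car; struct toy_cons *cdr; };
typedef int (*toy_fn)(int nargs, const int *args);

#define TOY_MAX_ARGS 16          /* arbitrary limit for the sketch */

/* Stand-in for Ffuncall(): call FN on an array of arguments. */
static int toy_funcall(toy_fn fn, int nargs, const int *args)
{
  return fn(nargs, args);
}

/* Stand-in for Fapply(): copy the leading arguments, append the
   elements of the trailing list, then hand off to toy_funcall(). */
static int toy_apply(toy_fn fn, int nlead, const int *lead,
                     const struct toy_cons *last)
{
  int flat[TOY_MAX_ARGS];
  int n = 0;
  int i;

  for (i = 0; i < nlead; i++) {
    assert(n < TOY_MAX_ARGS);
    flat[n++] = lead[i];
  }
  for (; last != NULL; last = last->cdr) {
    assert(n < TOY_MAX_ARGS);
    flat[n++] = last->car;
  }
  return toy_funcall(fn, n, flat);
}

/* Sample function to apply. */
static int toy_sum(int nargs, const int *args)
{
  int sum = 0;
  int i;
  for (i = 0; i < nargs; i++)
    sum += args[i];
  return sum;
}
```

Applying @code{toy_sum} to the leading arguments 1 and 2 followed by the
list (3 4) gives the same result as calling it directly on 1, 2, 3, 4.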
7256
7257 @code{apply1()}, @code{call0()}, @code{call1()}, @code{call2()}, and
7258 @code{call3()} call a function, passing it the argument(s) given (the
7259 arguments are given as separate C arguments rather than being passed as
7260 an array).  @code{apply1()} uses @code{Fapply()} while the others use
7261 @code{Ffuncall()} to do the real work.
7262
7263 @node Dynamic Binding; The specbinding Stack; Unwind-Protects
7264 @section Dynamic Binding; The specbinding Stack; Unwind-Protects
7265 @cindex dynamic binding; the specbinding stack; unwind-protects
7266 @cindex binding; the specbinding stack; unwind-protects, dynamic
7267 @cindex specbinding stack; unwind-protects, dynamic binding; the
7268 @cindex unwind-protects, dynamic binding; the specbinding stack;
7269
7270 @example
7271 struct specbinding
7272 @{
7273   Lisp_Object symbol;
7274   Lisp_Object old_value;
7275   Lisp_Object (*func) (Lisp_Object); /* for unwind-protect */
7276 @};
7277 @end example
7278
7279   @code{struct specbinding} is used for local-variable bindings and
7280 unwind-protects.  @code{specpdl} holds an array of @code{struct specbinding}'s,
7281 @code{specpdl_ptr} points to the beginning of the free bindings in the
7282 array, @code{specpdl_size} specifies the total number of binding slots
7283 in the array, and @code{max_specpdl_size} specifies the maximum number
7284 of bindings the array can be expanded to hold.  @code{grow_specpdl()}
7285 increases the size of the @code{specpdl} array, multiplying its size by
7286 2 but never exceeding @code{max_specpdl_size} (except that if this
7287 number is less than 400, it is first set to 400).
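
  The growth rule can be written down as a small function.  This is only
a sketch of the size arithmetic as described above; the function name is
hypothetical, the array itself is not reallocated here, and returning -1
stands in for the error the real code would signal.

```c
#include <assert.h>

/* Compute the next size of the binding array.  *MAX_SIZE is raised to
   400 first if it is smaller.  Returns the new size, or -1 if the array
   is already at its maximum. */
static int toy_next_specpdl_size(int size, int *max_size)
{
  assert(size > 0);
  if (*max_size < 400)
    *max_size = 400;
  if (size >= *max_size)
    return -1;                   /* cannot grow any further */
  size *= 2;
  if (size > *max_size)
    size = *max_size;            /* never exceed the maximum */
  return size;
}
```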
7288
7289   @code{specbind()} binds a symbol to a value and is used for local
7290 variables and @code{let} forms.  The symbol and its old value (which
7291 might be @code{Qunbound}, indicating no prior value) are recorded in the
7292 specpdl array, and @code{specpdl_ptr} is advanced by one slot.
7293
7294   @code{record_unwind_protect()} implements an @dfn{unwind-protect},
7295 which, when placed around a section of code, ensures that some specified
7296 cleanup routine will be executed even if the code exits abnormally
7297 (e.g. through a @code{throw} or quit).  @code{record_unwind_protect()}
7298 simply adds a new specbinding to the @code{specpdl} array and stores the
7299 appropriate information in it.  The cleanup routine can either be a C
7300 function, which is stored in the @code{func} field, or a @code{progn}
7301 form, which is stored in the @code{old_value} field.
7302
7303   @code{unbind_to()} removes specbindings from the @code{specpdl} array
7304 until the specified position is reached.  Each specbinding can be one of
7305 three types:
7306
7307 @enumerate
7308 @item
7309 an unwind-protect with a C cleanup function (@code{func} is not 0, and
7310 @code{old_value} holds an argument to be passed to the function);
7311 @item
7312 an unwind-protect with a Lisp form (@code{func} is 0, @code{symbol}
7313 is @code{nil}, and @code{old_value} holds the form to be executed with
7314 @code{Fprogn()}); or
7315 @item
7316 a local-variable binding (@code{func} is 0, @code{symbol} is not
7317 @code{nil}, and @code{old_value} holds the old value, which is stored as
7318 the symbol's value).
7319 @end enumerate
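
  The interplay of binding, unwind-protect, and unbinding can be
sketched with a small fixed-size stack.  Everything here is a simplified
assumption: objects are integers, a null @code{symbol} pointer plays the
role of @code{nil}, the cleanup function returns nothing, the stack does
not grow, and the Lisp-form kind of unwind-protect is omitted.

```c
#include <assert.h>
#include <stddef.h>

typedef int toy_obj;                   /* stand-in for Lisp_Object */
struct toy_symbol { toy_obj value; };

struct toy_specbinding {
  struct toy_symbol *symbol;           /* NULL plays the role of nil */
  toy_obj old_value;
  void (*func)(toy_obj);               /* non-NULL: C cleanup function */
};

#define TOY_SPECPDL_SIZE 32
static struct toy_specbinding toy_specpdl[TOY_SPECPDL_SIZE];
static int toy_specpdl_depth = 0;      /* index of the first free slot */

/* Bind SYM to VALUE, remembering the old value for later restoration. */
static void toy_specbind(struct toy_symbol *sym, toy_obj value)
{
  struct toy_specbinding *b;
  assert(toy_specpdl_depth < TOY_SPECPDL_SIZE);
  b = &toy_specpdl[toy_specpdl_depth++];
  b->symbol = sym;
  b->old_value = sym->value;
  b->func = NULL;
  sym->value = value;
}

/* Arrange for FUNC (ARG) to run when the stack is unwound past here. */
static void toy_record_unwind_protect(void (*func)(toy_obj), toy_obj arg)
{
  struct toy_specbinding *b;
  assert(toy_specpdl_depth < TOY_SPECPDL_SIZE);
  b = &toy_specpdl[toy_specpdl_depth++];
  b->symbol = NULL;
  b->old_value = arg;
  b->func = func;
}

/* Pop entries, newest first, until the stack is back at depth COUNT. */
static void toy_unbind_to(int count)
{
  while (toy_specpdl_depth > count) {
    struct toy_specbinding *b = &toy_specpdl[--toy_specpdl_depth];
    if (b->func != NULL)
      b->func(b->old_value);             /* unwind-protect, C function */
    else if (b->symbol != NULL)
      b->symbol->value = b->old_value;   /* local-variable binding */
  }
}

/* Sample cleanup function that records its argument. */
static toy_obj toy_cleanup_log = 0;
static void toy_note_cleanup(toy_obj arg) { toy_cleanup_log = arg; }
```

  A caller records the current depth before binding anything and later
passes that depth to @code{toy_unbind_to()}, which undoes everything
added since, in reverse order.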
7320
7321 @node Simple Special Forms
7322 @section Simple Special Forms
7323 @cindex special forms, simple
7324
7325 @code{or}, @code{and}, @code{if}, @code{cond}, @code{progn},
7326 @code{prog1}, @code{prog2}, @code{setq}, @code{quote}, @code{function},
7327 @code{let*}, @code{let}, @code{while}
7328
7329 All of these are very simple and work as expected, calling
7330 @code{Feval()} or @code{Fprogn()} as necessary and (in the case of
7331 @code{let} and @code{let*}) using @code{specbind()} to create bindings
7332 and @code{unbind_to()} to undo the bindings when finished.
7333
7334 Note that, with the exception of @code{Fprogn}, these functions are
7335 typically called in real life only in interpreted code, since the byte
7336 compiler knows how to convert calls to these functions directly into
7337 byte code.
7338
7339 @node Catch and Throw
7340 @section Catch and Throw
7341 @cindex catch and throw
7342 @cindex throw, catch and
7343
7344 @example
7345 struct catchtag
7346 @{
7347   Lisp_Object tag;
7348   Lisp_Object val;
7349   struct catchtag *next;
7350   struct gcpro *gcpro;
7351   jmp_buf jmp;
7352   struct backtrace *backlist;
7353   int lisp_eval_depth;
7354   int pdlcount;
7355 @};
7356 @end example
7357
7358   @code{catch} is a Lisp function that places a catch around a body of
7359 code.  A catch is a means of non-local exit from the code.  When a catch
7360 is created, a tag is specified, and executing a @code{throw} to this tag
7361 will exit from the body of code caught with this tag; the @code{catch}
7362 then returns the value given in the call to @code{throw}.  If there is no
7363 such call, the body runs to completion normally.
7364
7365   Information pertaining to a catch is held in a @code{struct catchtag},
7366 which is placed at the head of a linked list pointed to by
7367 @code{catchlist}.  @code{internal_catch()} is passed a C function to
7368 call (@code{Fprogn()} when Lisp @code{catch} is called) and arguments to
7369 give it, and places a catch around the function.  Each @code{struct
7370 catchtag} is held in the stack frame of the @code{internal_catch()}
7371 instance that created the catch.
7372
7373   @code{internal_catch()} is fairly straightforward.  It stores into the
7374 @code{struct catchtag} the tag name and the current values of
7375 @code{backtrace_list}, @code{lisp_eval_depth}, @code{gcprolist}, and the
7376 offset into the @code{specpdl} array, sets a jump point with @code{_setjmp()}
7377 (storing the jump point into the @code{struct catchtag}), and calls the
7378 function.  Control will return to @code{internal_catch()} either when
7379 the function exits normally or through a @code{_longjmp()} to this jump
7380 point.  In the latter case, @code{throw} will store the value to be
7381 returned into the @code{struct catchtag} before jumping.  When it's
7382 done, @code{internal_catch()} removes the @code{struct catchtag} from
7383 the catchlist and returns the proper value.
7384
7385   @code{Fthrow()} goes up through the catchlist until it finds one with
7386 a matching tag.  It then calls @code{unbind_catch()} to restore
7387 everything to what it was when the appropriate catch was set, stores the
7388 return value in the @code{struct catchtag}, and jumps (with
7389 @code{_longjmp()}) to its jump point.
7390
7391   @code{unbind_catch()} removes all catches from the catchlist until it
7392 finds the correct one.  Some of the catches might have been placed for
7393 error-trapping, and if so, the appropriate entries on the handlerlist
7394 must be removed (see ``errors'').  @code{unbind_catch()} also restores
7395 the values of @code{gcprolist}, @code{backtrace_list}, and
7396 @code{lisp_eval_depth}, and calls @code{unbind_to()} to undo any specbindings
7397 created since the catch.
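
  The control flow of catch and throw can be sketched with the standard
C @code{setjmp()} and @code{longjmp()}.  This is a minimal model, not
the real code: tags and values are integers, the standard functions are
used in place of @code{_setjmp()} and @code{_longjmp()}, and none of the
saved state (@code{gcprolist}, @code{backtrace_list}, the evaluation
depth, the @code{specpdl} offset) is restored.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

struct toy_catchtag {
  int tag;
  volatile int val;              /* written by the thrower, read after the jump */
  struct toy_catchtag *next;
  jmp_buf jmp;
};

static struct toy_catchtag *toy_catchlist = NULL;

/* Call FUNC (ARG) with a catch for TAG established around it. */
static int toy_internal_catch(int tag, int (*func)(int), int arg)
{
  struct toy_catchtag c;         /* lives in this stack frame */

  c.tag = tag;
  c.val = 0;
  c.next = toy_catchlist;
  toy_catchlist = &c;            /* push onto the head of the list */

  if (setjmp(c.jmp) == 0)
    c.val = func(arg);           /* normal exit */
  /* otherwise toy_throw() stored c.val and jumped here */

  toy_catchlist = c.next;        /* remove our own catch */
  return c.val;
}

/* Find the innermost catch for TAG and jump to it with VALUE. */
static void toy_throw(int tag, int value)
{
  struct toy_catchtag *c;

  for (c = toy_catchlist; c != NULL; c = c->next)
    if (c->tag == tag) {
      toy_catchlist = c;         /* discard any inner catches */
      c->val = value;
      longjmp(c->jmp, 1);
    }
  assert(!"no catch for this tag");
}

/* Sample bodies: throw to tag 7 on negative input. */
static int toy_body(int arg)
{
  if (arg < 0)
    toy_throw(7, -arg);
  return arg + 1;
}

static int toy_nested(int arg)
{
  return toy_internal_catch(8, toy_body, arg) * 10;
}
```

  With a catch for tag 7 around @code{toy_nested}, a throw from
@code{toy_body} skips the inner catch for tag 8 entirely, so the
multiplication by 10 never happens.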
7398
7399
7400 @node Symbols and Variables, Buffers and Textual Representation, Evaluation; Stack Frames; Bindings, Top
7401 @chapter Symbols and Variables
7402 @cindex symbols and variables
7403 @cindex variables, symbols and
7404
7405 @menu
7406 * Introduction to Symbols::
7407 * Obarrays::
7408 * Symbol Values::
7409 @end menu
7410
7411 @node Introduction to Symbols
7412 @section Introduction to Symbols
7413 @cindex symbols, introduction to
7414
7415   A symbol is basically just an object with four fields: a name (a
7416 string), a value (some Lisp object), a function (some Lisp object), and
7417 a property list (usually a list of alternating keyword/value pairs).
7418 What makes symbols special is that there is usually only one symbol with
7419 a given name, and the symbol is referred to by name.  This makes a
7420 symbol a convenient way of calling up data by name, i.e. of implementing
7421 variables. (The variable's value is stored in the @dfn{value slot}.)
7422 Similarly, functions are referenced by name, and the definition of the
7423 function is stored in a symbol's @dfn{function slot}.  This means that
7424 there can be a distinct function and variable with the same name.  The
7425 property list is used as a more general mechanism of associating
7426 additional values with particular names, and once again the namespace is
7427 independent of the function and variable namespaces.
7428
7429 @node Obarrays
7430 @section Obarrays
7431 @cindex obarrays
7432
7433   The identity of symbols with their names is accomplished through a
7434 structure called an obarray, which is just a poorly-implemented hash
7435 table mapping from strings to symbols whose name is that string. (I say
7436 ``poorly implemented'' because an obarray appears in Lisp as a vector
7437 with some hidden fields rather than as its own opaque type.  This is an
7438 Emacs Lisp artifact that should be fixed.)
7439
7440   Obarrays are implemented as a vector of some fixed size (which should
7441 be a prime for best results), where each ``bucket'' of the vector
7442 contains zero or more symbols, threaded through a hidden @code{next}
7443 field in the symbol.  Lookup of a symbol in an obarray, and adding a
7444 symbol to an obarray, is accomplished through standard hash-table
7445 techniques.
7446
7447   The standard Lisp function for working with symbols and obarrays is
7448 @code{intern}.  This looks up a symbol in an obarray given its name; if
7449 it's not found, a new symbol is automatically created with the specified
7450 name, added to the obarray, and returned.  This is what happens when the
7451 Lisp reader encounters a symbol (or more precisely, encounters the name
7452 of a symbol) in some text that it is reading.  There is a standard
7453 obarray called @code{obarray} that is used for this purpose, although
7454 the Lisp programmer is free to create his own obarrays and @code{intern}
7455 symbols in them.
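
  The bucket-and-chain arrangement and the behavior of @code{intern} can
be sketched as follows.  This is an illustrative model only: the
structure layout, the hash function, and the bucket count of 31 are
assumptions, and a toy symbol carries nothing but its name and the
chain link.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct toy_symbol {
  char *name;
  struct toy_symbol *next;       /* hidden link to the next symbol in the bucket */
};

#define TOY_OBARRAY_SIZE 31      /* a prime, as recommended above */
struct toy_obarray { struct toy_symbol *buckets[TOY_OBARRAY_SIZE]; };

/* Simple multiplicative string hash, reduced to a bucket index. */
static unsigned toy_hash(const char *s)
{
  unsigned h = 0;
  while (*s != '\0')
    h = h * 31u + (unsigned char) *s++;
  return h % TOY_OBARRAY_SIZE;
}

/* Look NAME up without creating anything; NULL if it is not interned. */
static struct toy_symbol *toy_intern_soft(struct toy_obarray *ob,
                                          const char *name)
{
  struct toy_symbol *sym;
  for (sym = ob->buckets[toy_hash(name)]; sym != NULL; sym = sym->next)
    if (strcmp(sym->name, name) == 0)
      return sym;
  return NULL;
}

/* Look NAME up, creating and adding a new symbol if necessary. */
static struct toy_symbol *toy_intern(struct toy_obarray *ob, const char *name)
{
  unsigned b = toy_hash(name);
  struct toy_symbol *sym = toy_intern_soft(ob, name);

  if (sym != NULL)
    return sym;                  /* already there: same object every time */
  sym = malloc(sizeof *sym);
  assert(sym != NULL);
  sym->name = malloc(strlen(name) + 1);
  assert(sym->name != NULL);
  strcpy(sym->name, name);
  sym->next = ob->buckets[b];    /* thread onto the front of the bucket */
  ob->buckets[b] = sym;
  return sym;
}
```

  Interning the same name twice in one obarray returns the same object,
while interning it in a second obarray creates a distinct symbol with
the same name.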
7456
7457   Note that, once a symbol is in an obarray, it stays there until
7458 something is done about it, and the standard obarray @code{obarray}
7459 always stays around, so once you use any particular variable name, a
7460 corresponding symbol will stay around in @code{obarray} until you exit
7461 XEmacs.
7462
7463   Note that @code{obarray} itself is a variable, and as such there is a
7464 symbol in @code{obarray} whose name is @code{"obarray"} and which
7465 contains @code{obarray} as its value.
7466
7467   Note also that this call to @code{intern} occurs only when in the Lisp
7468 reader, not when the code is executed (at which point the symbol is
7469 already around, stored as such in the definition of the function).
7470
7471   You can create your own obarray using @code{make-vector} (this is
7472 horrible but is an artifact) and intern symbols into that obarray.
7473 Doing that will result in two or more symbols with the same name.
7474 However, at most one of these symbols is in the standard @code{obarray}:
7475 You cannot have two symbols of the same name in any particular obarray.
7476 Note that you cannot add a symbol to an obarray in any fashion other
7477 than using @code{intern}: i.e. you can't take an existing symbol and put
7478 it in an existing obarray.  Nor can you change the name of an existing
7479 symbol. (Since obarrays are vectors, you can violate the consistency of
7480 things by storing directly into the vector, but let's ignore that
7481 possibility.)
7482
7483   Usually symbols are created by @code{intern}, but if you really want,
7484 you can explicitly create a symbol using @code{make-symbol}, giving it
7485 some name.  The resulting symbol is not in any obarray (i.e. it is
7486 @dfn{uninterned}), and you can't add it to any obarray.  Therefore its
7487 primary purpose is as a symbol to use in macros to avoid namespace
7488 pollution.  It can also be used as a carrier of information, but cons
7489 cells could probably be used just as well.
7490
7491   You can also use @code{intern-soft} to look up a symbol but not create
7492 a new one, and @code{unintern} to remove a symbol from an obarray.  This
7493 returns the removed symbol. (Remember: You can't put the symbol back
7494 into any obarray.) Finally, @code{mapatoms} maps over all of the symbols
7495 in an obarray.
7496
7497 @node Symbol Values
7498 @section Symbol Values
7499 @cindex symbol values
7500 @cindex values, symbol
7501
7502   The value field of a symbol normally contains a Lisp object.  However,
7503 a symbol can be @dfn{unbound}, meaning that it logically has no value.
7504 This is internally indicated by storing a special Lisp object, called
7505 @dfn{the unbound marker} and stored in the global variable
7506 @code{Qunbound}.  The unbound marker is of a special Lisp object type
7507 called @dfn{symbol-value-magic}.  It is impossible for the Lisp
7508 programmer to directly create or access any object of this type.
7509
7510   @strong{You must not let any ``symbol-value-magic'' object escape to
7511 the Lisp level.}  Printing any of these objects will cause the message
7512 @samp{INTERNAL EMACS BUG} to appear as part of the print representation.
7513 (You may see this normally when you call @code{debug_print()} from the
7514 debugger on a Lisp object.) If you let one of these objects escape to
7515 the Lisp level, you will violate a number of assumptions contained in
7516 the C code and make the unbound marker not function right.
7517
7518   When a symbol is created, its value field (and function field) are set
7519 to @code{Qunbound}.  The Lisp programmer can restore these conditions
7520 later using @code{makunbound} or @code{fmakunbound}, and can query to
7521 see whether the value or function fields are @dfn{bound} (i.e. have a
7522 value other than @code{Qunbound}) using @code{boundp} and
7523 @code{fboundp}.  The fields are set to a normal Lisp object using
7524 @code{set} (or @code{setq}) and @code{fset}.
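
  The idea of a distinguished marker object can be sketched as below.
This is a toy model under stated assumptions: objects are untyped
pointers, the marker is simply the address of a private variable, and an
integer return code stands in for the Lisp error that a real lookup of
an unbound variable would signal.

```c
#include <assert.h>
#include <stddef.h>

typedef const void *toy_obj;               /* stand-in for Lisp_Object */

static const char toy_unbound_marker = 0;  /* only its address matters */
#define TOY_QUNBOUND ((toy_obj) &toy_unbound_marker)

struct toy_symbol {
  toy_obj value;
  toy_obj function;
};

/* A freshly created symbol has neither a value nor a function. */
static void toy_init_symbol(struct toy_symbol *sym)
{
  sym->value = TOY_QUNBOUND;
  sym->function = TOY_QUNBOUND;
}

static int toy_boundp(const struct toy_symbol *sym)
{
  return sym->value != TOY_QUNBOUND;
}

static void toy_set(struct toy_symbol *sym, toy_obj value)
{
  assert(value != TOY_QUNBOUND);   /* callers never get to store the marker */
  sym->value = value;
}

static void toy_makunbound(struct toy_symbol *sym)
{
  sym->value = TOY_QUNBOUND;
}

/* Fetch the value into *OUT.  Returns 0 on success and -1 if the symbol
   is unbound, so the marker itself is never handed to the caller. */
static int toy_symbol_value(const struct toy_symbol *sym, toy_obj *out)
{
  if (!toy_boundp(sym))
    return -1;
  *out = sym->value;
  return 0;
}
```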
7525
7526   Other symbol-value-magic objects are used as special markers to
7527 indicate variables that have non-normal properties.  This includes any
7528 variables that are tied into C variables (setting the variable magically
7529 sets some global variable in the C code, and likewise for retrieving the
7530 variable's value), variables that magically tie into slots in the
7531 current buffer, variables that are buffer-local, etc.  The
7532 symbol-value-magic object is stored in the value cell in place of
7533 a normal object, and the code to retrieve a symbol's value
7534 (i.e. @code{symbol-value}) knows how to do special things with them.
7535 This means that you should not just fetch the value cell directly if you
7536 want a symbol's value.
7537
7538   The exact workings of this are rather complex and involved and are
7539 well-documented in comments in @file{buffer.c}, @file{symbols.c}, and
7540 @file{lisp.h}.
7541
7542 @node Buffers and Textual Representation, MULE Character Sets and Encodings, Symbols and Variables, Top
7543 @chapter Buffers and Textual Representation
7544 @cindex buffers and textual representation
7545 @cindex textual representation, buffers and
7546
7547 @menu
7548 * Introduction to Buffers::     A buffer holds a block of text such as a file.
7549 * The Text in a Buffer::        Representation of the text in a buffer.
7550 * Buffer Lists::                Keeping track of all buffers.
7551 * Markers and Extents::         Tagging locations within a buffer.
7552 * Bufbytes and Emchars::        Representation of individual characters.
7553 * The Buffer Object::           The Lisp object corresponding to a buffer.
7554 @end menu
7555
7556 @node Introduction to Buffers
7557 @section Introduction to Buffers
7558 @cindex buffers, introduction to
7559
7560   A buffer is logically just a Lisp object that holds some text.
7561 In this, it is like a string, but a buffer is optimized for
7562 frequent insertion and deletion, while a string is not.  Furthermore:
7563
7564 @enumerate
7565 @item
7566 Buffers are @dfn{permanent} objects, i.e. once you create them, they
7567 remain around, and need to be explicitly deleted before they go away.
7568 @item
7569 Each buffer has a unique name, which is a string.  Buffers are
7570 normally referred to by name.  In this respect, they are like
7571 symbols.
7572 @item
7573 Buffers have a default insertion position, called @dfn{point}.
7574 Inserting text (unless you explicitly give a position) goes at point,
7575 and moves point forward past the text.  This is what is going on when
7576 you type text into Emacs.
7577 @item
7578 Buffers have lots of extra properties associated with them.
7579 @item
7580 Buffers can be @dfn{displayed}.  What this means is that there
7581 exist a number of @dfn{windows}, which are objects that correspond
7582 to some visible section of your display, and each window has
7583 an associated buffer, and the current contents of the buffer
7584 are shown in that section of the display.  The redisplay mechanism
7585 (which takes care of doing this) knows how to look at the
7586 text of a buffer and come up with some reasonable way of displaying
7587 this.  Many of the properties of a buffer control how the
7588 buffer's text is displayed.
7589 @item
7590 One buffer is distinguished and called the @dfn{current buffer}.  It is
7591 stored in the variable @code{current_buffer}.  Buffer operations operate
7592 on this buffer by default.  When you are typing text into a buffer, the
7593 buffer you are typing into is always @code{current_buffer}.  Switching
7594 to a different window changes the current buffer.  Note that Lisp code
7595 can temporarily change the current buffer using @code{set-buffer} (often
7596 enclosed in a @code{save-excursion} so that the former current buffer
7597 gets restored when the code is finished).  However, calling
7598 @code{set-buffer} will NOT cause a permanent change in the current
7599 buffer.  The reason for this is that the top-level event loop sets
7600 @code{current_buffer} to the buffer of the selected window, each time
7601 it finishes executing a user command.
7602 @end enumerate
7603
7604   Make sure you understand the distinction between @dfn{current buffer}
7605 and @dfn{buffer of the selected window}, and the distinction between
7606 @dfn{point} of the current buffer and @dfn{window-point} of the selected
7607 window. (This latter distinction is explained in detail in the section
7608 on windows.)
7609
7610 @node The Text in a Buffer
7611 @section The Text in a Buffer
7612 @cindex text in a buffer, the
7613 @cindex buffer, the text in a
7614
7615   The text in a buffer consists of a sequence of zero or more
7616 characters.  A @dfn{character} is an integer that logically represents
7617 a letter, number, space, or other unit of text.  Most of the characters
7618 that you will typically encounter belong to the ASCII set of characters,
7619 but there are also characters for various sorts of accented letters,
7620 special symbols, Chinese and Japanese ideograms (i.e. Kanji, Katakana,
7621 etc.), Cyrillic and Greek letters, etc.  The actual number of possible
7622 characters is quite large.
7623
7624   For now, we can view a character as some non-negative integer that
7625 has some shape that defines how it typically appears (e.g. as an
7626 uppercase A). (The exact way in which a character appears depends on the
7627 font used to display the character.) The internal type of characters in
7628 the C code is an @code{Emchar}; this is just an @code{int}, but using a
7629 symbolic type makes the code clearer.
7630
7631   Between every character in a buffer is a @dfn{buffer position} or
7632 @dfn{character position}.  We can speak of the character before or after
7633 a particular buffer position, and when you insert a character at a
7634 particular position, all characters after that position end up at new
7635 positions.  When we speak of the character @dfn{at} a position, we
7636 really mean the character after the position.  (This schizophrenia
7637 between a buffer position being ``between'' a character and ``on'' a
7638 character is rampant in Emacs.)
7639
7640   Buffer positions are numbered starting at 1.  This means that
7641 position 1 is before the first character, and position 0 is not
7642 valid.  If there are N characters in a buffer, then buffer
7643 position N+1 is after the last one, and position N+2 is not valid.
7644
7645   The internal makeup of the Emchar integer varies depending on whether
7646 we have compiled with MULE support.  If not, the Emchar integer is an
7647 8-bit integer with possible values from 0 - 255.  0 - 127 are the
7648 standard ASCII characters, while 128 - 255 are the characters from the
7649 ISO-8859-1 character set.  If we have compiled with MULE support, an
7650 Emchar is a 19-bit integer, with the various bits having meanings
7651 according to a complex scheme that will be detailed later.  The
7652 characters numbered 0 - 255 still have the same meanings as for the
7653 non-MULE case, though.
7654
7655   Internally, the text in a buffer is represented in a fairly simple
7656 fashion: as a contiguous array of bytes, with a @dfn{gap} of some size
7657 in the middle.  Although the gap is of some substantial size in bytes,
7658 there is no text contained within it: From the perspective of the text
7659 in the buffer, it does not exist.  The gap logically sits at some buffer
7660 position, between two characters (or possibly at the beginning or end of
7661 the buffer).  Insertion of text in a buffer at a particular position is
7662 always accomplished by first moving the gap to that position
7663 (i.e. through some block moving of text), then writing the text into the
7664 beginning of the gap, thereby shrinking the gap.  If the gap shrinks
7665 down to nothing, a new gap is created. (What actually happens is that a
7666 new gap is ``created'' at the end of the buffer's text, which requires
7667 nothing more than changing a couple of indices; then the gap is
7668 ``moved'' to the position where the insertion needs to take place by
7669 moving up in memory all the text after that position.)  Similarly,
7670 deletion occurs by moving the gap to the place where the text is to be
7671 deleted, and then simply expanding the gap to include the deleted text.
7672 (@dfn{Expanding} and @dfn{shrinking} the gap as just described means
7673 just that the internal indices that keep track of where the gap is
7674 located are changed.)
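
  The gap mechanism can be sketched in a few lines of C.  This is a
minimal model, not the real buffer code: the array has a fixed capacity
and never grows, offsets are 0-based rather than 1-based buffer
positions, one character is one byte, and all names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define TOY_CAPACITY 64
struct toy_buffer {
  char text[TOY_CAPACITY];
  int gap_start;                 /* index of the first byte of the gap */
  int gap_end;                   /* index one past the last byte of the gap */
};

static void toy_init(struct toy_buffer *b)
{
  b->gap_start = 0;
  b->gap_end = TOY_CAPACITY;     /* empty buffer: the gap is everything */
}

static int toy_length(const struct toy_buffer *b)
{
  return TOY_CAPACITY - (b->gap_end - b->gap_start);
}

/* Move the gap so that it begins at text offset POS. */
static void toy_move_gap(struct toy_buffer *b, int pos)
{
  assert(pos >= 0 && pos <= toy_length(b));
  if (pos < b->gap_start) {      /* slide text before the gap upward */
    int n = b->gap_start - pos;
    memmove(b->text + b->gap_end - n, b->text + pos, (size_t) n);
    b->gap_start -= n;
    b->gap_end -= n;
  } else if (pos > b->gap_start) {   /* slide text after the gap downward */
    int n = pos - b->gap_start;
    memmove(b->text + b->gap_start, b->text + b->gap_end, (size_t) n);
    b->gap_start += n;
    b->gap_end += n;
  }
}

/* Insert S at POS: move the gap there, then fill its beginning. */
static void toy_insert(struct toy_buffer *b, int pos, const char *s)
{
  int n = (int) strlen(s);
  assert(n <= b->gap_end - b->gap_start);   /* real code would make a new gap */
  toy_move_gap(b, pos);
  memcpy(b->text + b->gap_start, s, (size_t) n);
  b->gap_start += n;             /* the gap shrinks */
}

/* Delete N characters at POS: the gap simply swallows them. */
static void toy_delete(struct toy_buffer *b, int pos, int n)
{
  assert(n >= 0 && pos >= 0 && pos + n <= toy_length(b));
  toy_move_gap(b, pos);
  b->gap_end += n;               /* the gap expands; no bytes are copied */
}

/* Fetch the character at POS, skipping over the gap if necessary. */
static char toy_char_at(const struct toy_buffer *b, int pos)
{
  assert(pos >= 0 && pos < toy_length(b));
  if (pos >= b->gap_start)
    pos += b->gap_end - b->gap_start;
  return b->text[pos];
}
```

  Note that deletion copies no text at all once the gap is in place, and
that repeated insertions at the same spot (the common case when typing)
need no gap movement after the first one.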
7675
  Note that the total amount of memory allocated for a buffer's text never
7677 decreases while the buffer is live.  Therefore, if you load up a
7678 20-megabyte file and then delete all but one character, there will be a
7679 20-megabyte gap, which won't get any smaller (except by inserting
7680 characters back again).  Once the buffer is killed, the memory allocated
7681 for the buffer text will be freed, but it will still be sitting on the
7682 heap, taking up virtual memory, and will not be released back to the
7683 operating system. (However, if you have compiled XEmacs with rel-alloc,
7684 the situation is different.  In this case, the space @emph{will} be
7685 released back to the operating system.  However, this tends to result in a
7686 noticeable speed penalty.)
7687
7688   Astute readers may notice that the text in a buffer is represented as
7689 an array of @emph{bytes}, while (at least in the MULE case) an Emchar is
7690 a 19-bit integer, which clearly cannot fit in a byte.  This means (of
7691 course) that the text in a buffer uses a different representation from
7692 an Emchar: specifically, the 19-bit Emchar becomes a series of one to
7693 four bytes.  The conversion between these two representations is complex
7694 and will be described later.
7695
7696   In the non-MULE case, everything is very simple: An Emchar
7697 is an 8-bit value, which fits neatly into one byte.
7698
7699   If we are given a buffer position and want to retrieve the
7700 character at that position, we need to follow these steps:
7701
7702 @enumerate
7703 @item
7704 Pretend there's no gap, and convert the buffer position into a @dfn{byte
7705 index} that indexes to the appropriate byte in the buffer's stream of
7706 textual bytes.  By convention, byte indices begin at 1, just like buffer
7707 positions.  In the non-MULE case, byte indices and buffer positions are
7708 identical, since one character equals one byte.
7709 @item
7710 Convert the byte index into a @dfn{memory index}, which takes the gap
7711 into account.  The memory index is a direct index into the block of
7712 memory that stores the text of a buffer.  This basically just involves
7713 checking to see if the byte index is past the gap, and if so, adding the
7714 size of the gap to it.  By convention, memory indices begin at 1, just
7715 like buffer positions and byte indices, and when referring to the
7716 position that is @dfn{at} the gap, we always use the memory position at
7717 the @emph{beginning}, not at the end, of the gap.
7718 @item
7719 Fetch the appropriate bytes at the determined memory position.
7720 @item
7721 Convert these bytes into an Emchar.
7722 @end enumerate
7723
  In the non-MULE case, steps 3 and 4 boil down to a simple one-byte
7725 memory access.
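
  For the non-MULE case, the four steps can be sketched as follows.
The structure @code{text} and the function names are assumptions made
for this example, not the actual macros from the source.  The sketch
also assumes that the character @emph{at} a position is the byte that
follows it, so a position at the gap reads from just past the end of
the gap even though its memory index is the beginning of the gap.

```c
#include <assert.h>

typedef int Bufpos;            /* character position, 1-based */
typedef int Bytind;            /* byte index, 1-based, ignores the gap */
typedef int Memind;            /* memory index, 1-based, includes the gap */
typedef unsigned char Bufbyte;
typedef int Emchar;

/* Hypothetical descriptor of a buffer's text for this sketch. */
struct text {
  const Bufbyte *mem;          /* block holding the text and the gap */
  Bytind gap_pos;              /* byte index of the position at the gap */
  int gap_size;                /* size of the gap in bytes */
};

/* Step 1 (non-MULE only): one character is one byte. */
Bytind bufpos_to_bytind(Bufpos pos)
{
  return pos;
}

/* Step 2: a position strictly past the gap is shifted by the gap size;
   the position at the gap maps to the beginning of the gap. */
Memind bytind_to_memind(const struct text *t, Bytind bi)
{
  return bi > t->gap_pos ? bi + t->gap_size : bi;
}

/* Steps 3 and 4 (non-MULE): fetch one byte and widen it to an Emchar. */
Emchar fetch_char(const struct text *t, Bufpos pos)
{
  Bytind bi = bufpos_to_bytind(pos);
  /* The byte following a position at or past the gap lies after the gap. */
  Memind mi = bi >= t->gap_pos ? bi + t->gap_size : bi;
  assert(mi >= 1);
  return (Emchar) t->mem[mi - 1];   /* indices are 1-based */
}
```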
7726
7727   Note that we have defined three types of positions in a buffer:
7728
7729 @enumerate
7730 @item
7731 @dfn{buffer positions} or @dfn{character positions}, typedef @code{Bufpos}
7732 @item
7733 @dfn{byte indices}, typedef @code{Bytind}
7734 @item
7735 @dfn{memory indices}, typedef @code{Memind}
7736 @end enumerate
7737
7738   All three typedefs are just @code{int}s, but defining them this way makes
7739 things a lot clearer.
7740
7741   Most code works with buffer positions.  In particular, all Lisp code
7742 that refers to text in a buffer uses buffer positions.  Lisp code does
7743 not know that byte indices or memory indices exist.
7744
7745   Finally, we have a typedef for the bytes in a buffer.  This is a
7746 @code{Bufbyte}, which is an unsigned char.  Referring to them as
7747 Bufbytes underscores the fact that we are working with a string of bytes
7748 in the internal Emacs buffer representation rather than in one of a
7749 number of possible alternative representations (e.g. EUC-encoded text,
7750 etc.).
7751
7752 @node Buffer Lists
7753 @section Buffer Lists
7754 @cindex buffer lists
7755
  Recall from earlier that buffers are @dfn{permanent} objects, i.e.  that
7757 they remain around until explicitly deleted.  This entails that there is
7758 a list of all the buffers in existence.  This list is actually an
7759 assoc-list (mapping from the buffer's name to the buffer) and is stored
7760 in the global variable @code{Vbuffer_alist}.
7761
7762   The order of the buffers in the list is important: the buffers are
7763 ordered approximately from most-recently-used to least-recently-used.
7764 Switching to a buffer using @code{switch-to-buffer},
7765 @code{pop-to-buffer}, etc. and switching windows using
7766 @code{other-window}, etc.  usually brings the new current buffer to the
7767 front of the list.  @code{switch-to-buffer}, @code{other-buffer},
7768 etc. look at the beginning of the list to find an alternative buffer to
7769 suggest.  You can also explicitly move a buffer to the end of the list
7770 using @code{bury-buffer}.
7771
7772   In addition to the global ordering in @code{Vbuffer_alist}, each frame
7773 has its own ordering of the list.  These lists always contain the same
7774 elements as in @code{Vbuffer_alist} although possibly in a different
7775 order.  @code{buffer-list} normally returns the list for the selected
7776 frame.  This allows you to work in separate frames without things
7777 interfering with each other.
7778
7779   The standard way to look up a buffer given a name is
7780 @code{get-buffer}, and the standard way to create a new buffer is
7781 @code{get-buffer-create}, which looks up a buffer with a given name,
7782 creating a new one if necessary.  These operations correspond exactly
to the symbol operations @code{intern-soft} and @code{intern},
7784 respectively.  You can also force a new buffer to be created using
7785 @code{generate-new-buffer}, which takes a name and (if necessary) makes
7786 a unique name from this by appending a number, and then creates the
7787 buffer.  This is basically like the symbol operation @code{gensym}.
7788
7789 @node Markers and Extents
7790 @section Markers and Extents
7791 @cindex markers and extents
7792 @cindex extents, markers and
7793
7794   Among the things associated with a buffer are things that are
7795 logically attached to certain buffer positions.  This can be used to
7796 keep track of a buffer position when text is inserted and deleted, so
7797 that it remains at the same spot relative to the text around it; to
7798 assign properties to particular sections of text; etc.  There are two
7799 such objects that are useful in this regard: they are @dfn{markers} and
7800 @dfn{extents}.
7801
7802   A @dfn{marker} is simply a flag placed at a particular buffer
7803 position, which is moved around as text is inserted and deleted.
7804 Markers are used for all sorts of purposes, such as the @code{mark} that
7805 is the other end of textual regions to be cut, copied, etc.
7806
7807   An @dfn{extent} is similar to two markers plus some associated
7808 properties, and is used to keep track of regions in a buffer as text is
7809 inserted and deleted, and to add properties (e.g. fonts) to particular
7810 regions of text.  The external interface of extents is explained
7811 elsewhere.
7812
7813   The important thing here is that markers and extents simply contain
7814 buffer positions in them as integers, and every time text is inserted or
7815 deleted, these positions must be updated.  In order to minimize the
7816 amount of shuffling that needs to be done, the positions in markers and
extents (there's one per marker, two per extent) are stored as Meminds.
7818 This means that they only need to be moved when the text is physically
7819 moved in memory; since the gap structure tries to minimize this, it also
7820 minimizes the number of marker and extent indices that need to be
7821 adjusted.  Look in @file{insdel.c} for the details of how this works.
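
  To see why storing memory indices saves work, consider the following
sketch.  It assumes markers are kept as plain 0-based memory offsets in
an array, with a marker at the gap recorded at the gap's beginning; the
function @code{adjust_markers_for_gap_move} is a hypothetical name, and
the real logic in @file{insdel.c} is more involved.  Only markers that
lie in the block of text that was physically moved are touched.
Inserting text at the gap moves no text, so no marker needs adjusting.

```c
#include <assert.h>
#include <stddef.h>

/* Adjust marker memory offsets after the gap has moved from OLD_START
   to NEW_START.  Returns the number of markers that were changed. */
size_t adjust_markers_for_gap_move(size_t *markers, size_t n,
                                   size_t old_start, size_t new_start,
                                   size_t gap_size)
{
  size_t touched = 0;
  for (size_t i = 0; i < n; i++) {
    size_t m = markers[i];
    if (new_start < old_start && m > new_start && m <= old_start) {
      /* This text was moved up, past the end of the gap. */
      markers[i] = m + gap_size;
      touched++;
    } else if (new_start > old_start
               && m > old_start + gap_size
               && m <= new_start + gap_size) {
      /* This text was moved down, in front of the gap. */
      markers[i] = m - gap_size;
      touched++;
    }
  }
  return touched;
}
```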
7822
7823   One other important distinction is that markers are @dfn{temporary}
7824 while extents are @dfn{permanent}.  This means that markers disappear as
7825 soon as there are no more pointers to them, and correspondingly, there
7826 is no way to determine what markers are in a buffer if you are just
7827 given the buffer.  Extents remain in a buffer until they are detached
7828 (which could happen as a result of text being deleted) or the buffer is
7829 deleted, and primitives do exist to enumerate the extents in a buffer.
7830
7831 @node Bufbytes and Emchars
7832 @section Bufbytes and Emchars
7833 @cindex Bufbytes and Emchars
7834 @cindex Emchars, Bufbytes and
7835
7836   Not yet documented.
7837
7838 @node The Buffer Object
7839 @section The Buffer Object
7840 @cindex buffer object, the
7841 @cindex object, the buffer
7842
7843   Buffers contain fields not directly accessible by the Lisp programmer.
7844 We describe them here, naming them by the names used in the C code.
7845 Many are accessible indirectly in Lisp programs via Lisp primitives.
7846
7847 @table @code
7848 @item name
7849 The buffer name is a string that names the buffer.  It is guaranteed to
7850 be unique.  @xref{Buffer Names,,, lispref, XEmacs Lisp Reference
7851 Manual}.
7852
7853 @item save_modified
7854 This field contains the time when the buffer was last saved, as an
7855 integer.  @xref{Buffer Modification,,, lispref, XEmacs Lisp Reference
7856 Manual}.
7857
7858 @item modtime
7859 This field contains the modification time of the visited file.  It is
7860 set when the file is written or read.  Every time the buffer is written
7861 to the file, this field is compared to the modification time of the
7862 file.  @xref{Buffer Modification,,, lispref, XEmacs Lisp Reference
7863 Manual}.
7864
7865 @item auto_save_modified
7866 This field contains the time when the buffer was last auto-saved.
7867
7868 @item last_window_start
7869 This field contains the @code{window-start} position in the buffer as of
7870 the last time the buffer was displayed in a window.
7871
7872 @item undo_list
7873 This field points to the buffer's undo list.  @xref{Undo,,, lispref,
7874 XEmacs Lisp Reference Manual}.
7875
7876 @item syntax_table_v
7877 This field contains the syntax table for the buffer.  @xref{Syntax
7878 Tables,,, lispref, XEmacs Lisp Reference Manual}.
7879
7880 @item downcase_table
7881 This field contains the conversion table for converting text to lower
7882 case.  @xref{Case Tables,,, lispref, XEmacs Lisp Reference Manual}.
7883
7884 @item upcase_table
7885 This field contains the conversion table for converting text to upper
7886 case.  @xref{Case Tables,,, lispref, XEmacs Lisp Reference Manual}.
7887
7888 @item case_canon_table
7889 This field contains the conversion table for canonicalizing text for
7890 case-folding search.  @xref{Case Tables,,, lispref, XEmacs Lisp
7891 Reference Manual}.
7892
7893 @item case_eqv_table
7894 This field contains the equivalence table for case-folding search.
7895 @xref{Case Tables,,, lispref, XEmacs Lisp Reference Manual}.
7896
7897 @item display_table
7898 This field contains the buffer's display table, or @code{nil} if it
7899 doesn't have one.  @xref{Display Tables,,, lispref, XEmacs Lisp
7900 Reference Manual}.
7901
7902 @item markers
7903 This field contains the chain of all markers that currently point into
7904 the buffer.  Deletion of text in the buffer, and motion of the buffer's
7905 gap, must check each of these markers and perhaps update it.
7906 @xref{Markers,,, lispref, XEmacs Lisp Reference Manual}.
7907
7908 @item backed_up
7909 This field is a flag that tells whether a backup file has been made for
7910 the visited file of this buffer.
7911
7912 @item mark
7913 This field contains the mark for the buffer.  The mark is a marker,
7914 hence it is also included on the list @code{markers}.  @xref{The Mark,,,
7915 lispref, XEmacs Lisp Reference Manual}.
7916
7917 @item mark_active
7918 This field is non-@code{nil} if the buffer's mark is active.
7919
7920 @item local_var_alist
7921 This field contains the association list describing the variables local
7922 in this buffer, and their values, with the exception of local variables
7923 that have special slots in the buffer object.  (Those slots are omitted
7924 from this table.)  @xref{Buffer-Local Variables,,, lispref, XEmacs Lisp
7925 Reference Manual}.
7926
7927 @item modeline_format
7928 This field contains a Lisp object which controls how to display the mode
7929 line for this buffer.  @xref{Modeline Format,,, lispref, XEmacs Lisp
7930 Reference Manual}.
7931
7932 @item base_buffer
7933 This field holds the buffer's base buffer (if it is an indirect buffer),
7934 or @code{nil}.
7935 @end table
7936
7937 @node MULE Character Sets and Encodings, The Lisp Reader and Compiler, Buffers and Textual Representation, Top
7938 @chapter MULE Character Sets and Encodings
7939 @cindex Mule character sets and encodings
7940 @cindex character sets and encodings, Mule
7941 @cindex encodings, Mule character sets and
7942
7943   Recall that there are two primary ways that text is represented in
7944 XEmacs.  The @dfn{buffer} representation sees the text as a series of
7945 bytes (Bufbytes), with a variable number of bytes used per character.
7946 The @dfn{character} representation sees the text as a series of integers
7947 (Emchars), one per character.  The character representation is a cleaner
7948 representation from a theoretical standpoint, and is thus used in many
7949 cases when lots of manipulations on a string need to be done.  However,
7950 the buffer representation is the standard representation used in both
7951 Lisp strings and buffers, and because of this, it is the ``default''
7952 representation that text comes in.  The reason for using this
7953 representation is that it's compact and is compatible with ASCII.
7954
7955 @menu
7956 * Character Sets::
7957 * Encodings::
7958 * Internal Mule Encodings::
7959 * CCL::
7960 @end menu
7961
7962 @node Character Sets
7963 @section Character Sets
7964 @cindex character sets
7965
7966   A character set (or @dfn{charset}) is an ordered set of characters.  A
7967 particular character in a charset is indexed using one or more
7968 @dfn{position codes}, which are non-negative integers.  The number of
7969 position codes needed to identify a particular character in a charset is
7970 called the @dfn{dimension} of the charset.  In XEmacs/Mule, all charsets
7971 have dimension 1 or 2, and the size of all charsets (except for a few
7972 special cases) is either 94, 96, 94 by 94, or 96 by 96.  The range of
7973 position codes used to index characters from any of these types of
7974 character sets is as follows:
7975
7976 @example
7977 Charset type            Position code 1         Position code 2
7978 ------------------------------------------------------------
7979 94                      33 - 126                N/A
7980 96                      32 - 127                N/A
7981 94x94                   33 - 126                33 - 126
7982 96x96                   32 - 127                32 - 127
7983 @end example
7984
7985   Note that in the above cases position codes do not start at an
7986 expected value such as 0 or 1.  The reason for this will become clear
7987 later.
7988
7989   For example, Latin-1 is a 96-character charset, and JISX0208 (the
7990 Japanese national character set) is a 94x94-character charset.
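
  The ranges in the table above translate directly into a validity
check.  The enumeration @code{charset_type} and the function
@code{valid_position_codes} below are names invented for this sketch.

```c
#include <assert.h>

enum charset_type { CS_94, CS_96, CS_94x94, CS_96x96 };

/* Return 1 if the position codes are in range for TYPE, else 0.
   PC2 is ignored for dimension-1 charsets. */
int valid_position_codes(enum charset_type type, int pc1, int pc2)
{
  int is94 = (type == CS_94 || type == CS_94x94);
  int low  = is94 ? 33 : 32;
  int high = is94 ? 126 : 127;
  int dimension = (type == CS_94x94 || type == CS_96x96) ? 2 : 1;

  if (pc1 < low || pc1 > high)
    return 0;
  if (dimension == 2 && (pc2 < low || pc2 > high))
    return 0;
  return 1;
}
```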
7991
  [Note that, although the ranges above define the @emph{valid} position
codes for a charset, some of the slots in a particular charset may in
fact be empty.  This is the case for JISX0208, for example, where (e.g.)
all the slots whose first position code is in the range 117 - 126 are
empty.]
7997
7998   There are three charsets that do not follow the above rules.  All of
7999 them have one dimension, and have ranges of position codes as follows:
8000
8001 @example
8002 Charset name            Position code 1
8003 ------------------------------------
8004 ASCII                   0 - 127
8005 Control-1               0 - 31
8006 Composite               0 - some large number
8007 @end example
8008
  (The upper bound of the position code for composite characters has not
yet been determined, but it will probably be at least 16,383.)
8011
8012   ASCII is the union of two subsidiary character sets: Printing-ASCII
8013 (the printing ASCII character set, consisting of position codes 33 -
8014 126, like for a standard 94-character charset) and Control-ASCII (the
8015 non-printing characters that would appear in a binary file with codes 0
8016 - 32 and 127).
8017
8018   Control-1 contains the non-printing characters that would appear in a
8019 binary file with codes 128 - 159.
8020
8021   Composite contains characters that are generated by overstriking one
8022 or more characters from other charsets.
8023
  Note that some characters in ASCII, and all characters in Control-1,
are @dfn{control} (non-printing) characters.  These have no printed
representation but instead control some other aspect of printing
(e.g. TAB, code 9, moves the current character position to the next tab
stop).  All other characters in all charsets are @dfn{graphic}
(printing) characters.
8030
8031   When a binary file is read in, the bytes in the file are assigned to
8032 character sets as follows:
8033
8034 @example
8035 Bytes           Character set           Range
8036 --------------------------------------------------
8037 0 - 127         ASCII                   0 - 127
8038 128 - 159       Control-1               0 - 31
8039 160 - 255       Latin-1                 32 - 127
8040 @end example
8041
8042   This is a bit ad-hoc but gets the job done.
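
  As a minimal sketch, the table above amounts to the following
mapping from a byte to a charset and position code.  The type and
function names are hypothetical.

```c
#include <assert.h>

enum binary_charset { BIN_ASCII, BIN_CONTROL_1, BIN_LATIN_1 };

struct charset_pos {
  enum binary_charset charset;
  int pc1;                      /* position code within the charset */
};

/* Assign one byte of a binary file to a charset and position code. */
struct charset_pos classify_binary_byte(unsigned char b)
{
  struct charset_pos r;
  if (b < 128) {
    r.charset = BIN_ASCII;      /* bytes 0 - 127 map to 0 - 127 */
    r.pc1 = b;
  } else if (b < 160) {
    r.charset = BIN_CONTROL_1;  /* bytes 128 - 159 map to 0 - 31 */
    r.pc1 = b - 128;
  } else {
    r.charset = BIN_LATIN_1;    /* bytes 160 - 255 map to 32 - 127 */
    r.pc1 = b - 128;
  }
  return r;
}
```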
8043
8044 @node Encodings
8045 @section Encodings
8046 @cindex encodings, Mule
8047 @cindex Mule encodings
8048
8049   An @dfn{encoding} is a way of numerically representing characters from
8050 one or more character sets.  If an encoding only encompasses one
8051 character set, then the position codes for the characters in that
8052 character set could be used directly.  This is not possible, however, if
8053 more than one character set is to be used in the encoding.
8054
8055   For example, the conversion detailed above between bytes in a binary
8056 file and characters is effectively an encoding that encompasses the
8057 three character sets ASCII, Control-1, and Latin-1 in a stream of 8-bit
8058 bytes.
8059
8060   Thus, an encoding can be viewed as a way of encoding characters from a
8061 specified group of character sets using a stream of bytes, each of which
8062 contains a fixed number of bits (but not necessarily 8, as in the common
8063 usage of ``byte'').
8064
8065   Here are descriptions of a couple of common
8066 encodings:
8067
8068 @menu
8069 * Japanese EUC (Extended Unix Code)::
8070 * JIS7::
8071 @end menu
8072
8073 @node Japanese EUC (Extended Unix Code)
8074 @subsection Japanese EUC (Extended Unix Code)
8075 @cindex Japanese EUC (Extended Unix Code)
8076 @cindex EUC (Extended Unix Code), Japanese
8077 @cindex Extended Unix Code, Japanese EUC
8078
This encompasses the character sets Printing-ASCII, Japanese-JISX0208,
Japanese-JISX0212, and Japanese-JISX0201-Kana (half-width katakana, the
right half of JISX0201).  It uses 8-bit bytes.

Note that Printing-ASCII and Japanese-JISX0201-Kana are 94-character
charsets, while Japanese-JISX0208 and Japanese-JISX0212 are
94x94-character charsets.
8085
8086 The encoding is as follows:
8087
8088 @example
8089 Character set            Representation (PC=position-code)
8090 -------------            --------------
8091 Printing-ASCII           PC1
8092 Japanese-JISX0201-Kana   0x8E       | PC1 + 0x80
8093 Japanese-JISX0208        PC1 + 0x80 | PC2 + 0x80
Japanese-JISX0212        0x8F       | PC1 + 0x80 | PC2 + 0x80
8095 @end example
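
  The following sketch encodes a single character according to the
first three rows of the table.  The enumeration and the function
@code{euc_encode} are illustrative names only, and no range checking of
the position codes is done.

```c
#include <assert.h>
#include <stddef.h>

enum euc_charset { EUC_ASCII, EUC_JISX0201_KANA, EUC_JISX0208 };

/* Write the EUC bytes for one character into OUT, which must have room
   for two bytes, and return the number of bytes written.  PC2 is
   ignored for the dimension-1 charsets. */
size_t euc_encode(enum euc_charset cs, int pc1, int pc2, unsigned char *out)
{
  switch (cs) {
  case EUC_ASCII:
    out[0] = (unsigned char) pc1;
    return 1;
  case EUC_JISX0201_KANA:
    out[0] = 0x8E;                          /* single-shift prefix */
    out[1] = (unsigned char) (pc1 + 0x80);
    return 2;
  case EUC_JISX0208:
    out[0] = (unsigned char) (pc1 + 0x80);
    out[1] = (unsigned char) (pc2 + 0x80);
    return 2;
  }
  return 0;                                 /* unknown charset */
}
```

  Because every byte of a non-ASCII character has its high bit set, a
decoder can tell ASCII from the Japanese charsets by looking at a
single byte, with no state to track.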
8096
8097
8098 @node JIS7
8099 @subsection JIS7
8100 @cindex JIS7
8101
8102 This encompasses the character sets Printing-ASCII,
8103 Japanese-JISX0201-Roman (the left half of JISX0201; this character set
8104 is very similar to Printing-ASCII and is a 94-character charset),
8105 Japanese-JISX0208, and Japanese-JISX0201-Kana.  It uses 7-bit bytes.
8106
8107 Unlike Japanese EUC, this is a @dfn{modal} encoding, which
8108 means that there are multiple states that the encoding can
8109 be in, which affect how the bytes are to be interpreted.
8110 Special sequences of bytes (called @dfn{escape sequences})
8111 are used to change states.
8112
8113   The encoding is as follows:
8114
8115 @example
8116 Character set              Representation (PC=position-code)
8117 -------------              --------------
8118 Printing-ASCII             PC1
8119 Japanese-JISX0201-Roman    PC1
8120 Japanese-JISX0201-Kana     PC1
8121 Japanese-JISX0208          PC1 PC2
8122
8123
8124 Escape sequence   ASCII equivalent   Meaning
8125 ---------------   ----------------   -------
8126 0x1B 0x28 0x4A    ESC ( J            invoke Japanese-JISX0201-Roman
8127 0x1B 0x28 0x49    ESC ( I            invoke Japanese-JISX0201-Kana
8128 0x1B 0x24 0x42    ESC $ B            invoke Japanese-JISX0208
8129 0x1B 0x28 0x42    ESC ( B            invoke Printing-ASCII
8130 @end example
8131
8132   Initially, Printing-ASCII is invoked.
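
  A modal encoder has to remember which charset is currently invoked
and emit an escape sequence only when it changes.  The sketch below
shows the idea; @code{jis7_encoder} and @code{jis7_put} are hypothetical
names, and the caller is assumed to initialize the state to
Printing-ASCII.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum jis7_charset { JIS7_ASCII, JIS7_ROMAN, JIS7_KANA, JIS7_JISX0208 };

struct jis7_encoder {
  enum jis7_charset state;      /* charset currently invoked */
};

/* Append one character to OUT (room for 5 bytes), preceded by an
   escape sequence if CS is not the invoked charset.  Returns the
   number of bytes written.  PC2 is used only for JISX0208. */
size_t jis7_put(struct jis7_encoder *e, enum jis7_charset cs,
                int pc1, int pc2, unsigned char *out)
{
  /* Escape sequences, indexed by enum jis7_charset. */
  static const char *const esc[] = {
    "\x1b(B",   /* Printing-ASCII */
    "\x1b(J",   /* Japanese-JISX0201-Roman */
    "\x1b(I",   /* Japanese-JISX0201-Kana */
    "\x1b$B"    /* Japanese-JISX0208 */
  };
  size_t n = 0;

  if (cs != e->state) {
    memcpy(out, esc[cs], 3);
    n = 3;
    e->state = cs;
  }
  out[n++] = (unsigned char) pc1;
  if (cs == JIS7_JISX0208)
    out[n++] = (unsigned char) pc2;
  return n;
}
```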
8133
8134 @node Internal Mule Encodings
8135 @section Internal Mule Encodings
8136 @cindex internal Mule encodings
8137 @cindex Mule encodings, internal
8138 @cindex encodings, internal Mule
8139
8140 In XEmacs/Mule, each character set is assigned a unique number, called a
8141 @dfn{leading byte}.  This is used in the encodings of a character.
8142 Leading bytes are in the range 0x80 - 0xFF (except for ASCII, which has
8143 a leading byte of 0), although some leading bytes are reserved.
8144
8145 Charsets whose leading byte is in the range 0x80 - 0x9F are called
8146 @dfn{official} and are used for built-in charsets.  Other charsets are
8147 called @dfn{private} and have leading bytes in the range 0xA0 - 0xFF;
8148 these are user-defined charsets.
8149
8150   More specifically:
8151
8152 @example
8153 Character set           Leading byte
8154 -------------           ------------
8155 ASCII                   0
8156 Composite               0x80
8157 Dimension-1 Official    0x81 - 0x8D
8158                           (0x8E is free)
8159 Control-1               0x8F
8160 Dimension-2 Official    0x90 - 0x99
8161                           (0x9A - 0x9D are free;
8162                            0x9E and 0x9F are reserved)
8163 Dimension-1 Private     0xA0 - 0xEF
8164 Dimension-2 Private     0xF0 - 0xFF
8165 @end example
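
  The table can be read as a classification of leading-byte values.
The sketch below uses invented names (@code{lb_class},
@code{classify_leading_byte}) and lumps the free and reserved values
together with the unused range 0x01 - 0x7F.

```c
#include <assert.h>

enum lb_class {
  LB_ASCII, LB_COMPOSITE, LB_DIM1_OFFICIAL, LB_CONTROL_1,
  LB_DIM2_OFFICIAL, LB_DIM1_PRIVATE, LB_DIM2_PRIVATE,
  LB_UNASSIGNED   /* free, reserved, or outside the leading-byte range */
};

/* Classify a leading byte according to the table above. */
enum lb_class classify_leading_byte(int lb)
{
  if (lb == 0x00)               return LB_ASCII;
  if (lb == 0x80)               return LB_COMPOSITE;
  if (lb >= 0x81 && lb <= 0x8D) return LB_DIM1_OFFICIAL;
  if (lb == 0x8F)               return LB_CONTROL_1;
  if (lb >= 0x90 && lb <= 0x99) return LB_DIM2_OFFICIAL;
  if (lb >= 0xA0 && lb <= 0xEF) return LB_DIM1_PRIVATE;
  if (lb >= 0xF0 && lb <= 0xFF) return LB_DIM2_PRIVATE;
  return LB_UNASSIGNED;
}
```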
8166
8167 There are two internal encodings for characters in XEmacs/Mule.  One is
8168 called @dfn{string encoding} and is an 8-bit encoding that is used for
8169 representing characters in a buffer or string.  It uses 1 to 4 bytes per
8170 character.  The other is called @dfn{character encoding} and is a 19-bit
8171 encoding that is used for representing characters individually in a
8172 variable.
8173
8174 (In the following descriptions, we'll ignore composite characters for
8175 the moment.  We also give a general (structural) overview first,
8176 followed later by the exact details.)
8177
8178 @menu
8179 * Internal String Encoding::
8180 * Internal Character Encoding::
8181 @end menu
8182
8183 @node Internal String Encoding
8184 @subsection Internal String Encoding
8185 @cindex internal string encoding
8186 @cindex string encoding, internal
8187 @cindex encoding, internal string
8188
8189 ASCII characters are encoded using their position code directly.  Other
8190 characters are encoded using their leading byte followed by their
8191 position code(s) with the high bit set.  Characters in private character
8192 sets have their leading byte prefixed with a @dfn{leading byte prefix},
8193 which is either 0x9E or 0x9F. (No character sets are ever assigned these
8194 leading bytes.) Specifically:
8195
8196 @example
8197 Character set           Encoding (PC=position-code, LB=leading-byte)
8198 -------------           --------
ASCII                   PC1
Control-1               LB   | PC1 + 0xA0
Dimension-1 official    LB   | PC1 + 0x80
Dimension-1 private     0x9E | LB         | PC1 + 0x80
Dimension-2 official    LB   | PC1 + 0x80 | PC2 + 0x80
Dimension-2 private     0x9F | LB         | PC1 + 0x80 | PC2 + 0x80
8205 @end example
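
  As an illustration, the rows of this table can be turned into an
encoder for a single non-composite character.  The function
@code{string_encode} is a hypothetical name; it takes the leading-byte
ranges from the table in the previous section and does not validate the
position codes.

```c
#include <assert.h>
#include <stddef.h>

/* Encode one non-composite character into OUT (room for 4 bytes).
   LB is the charset's leading byte (0 for ASCII).  Returns the number
   of bytes written, or 0 if LB is not an assigned leading byte. */
size_t string_encode(int lb, int pc1, int pc2, unsigned char *out)
{
  if (lb == 0x00) {                       /* ASCII */
    out[0] = (unsigned char) pc1;
    return 1;
  }
  if (lb == 0x8F) {                       /* Control-1 */
    out[0] = 0x8F;
    out[1] = (unsigned char) (pc1 + 0xA0);
    return 2;
  }
  if (lb >= 0x81 && lb <= 0x8D) {         /* dimension-1 official */
    out[0] = (unsigned char) lb;
    out[1] = (unsigned char) (pc1 + 0x80);
    return 2;
  }
  if (lb >= 0x90 && lb <= 0x99) {         /* dimension-2 official */
    out[0] = (unsigned char) lb;
    out[1] = (unsigned char) (pc1 + 0x80);
    out[2] = (unsigned char) (pc2 + 0x80);
    return 3;
  }
  if (lb >= 0xA0 && lb <= 0xEF) {         /* dimension-1 private */
    out[0] = 0x9E;                        /* leading byte prefix */
    out[1] = (unsigned char) lb;
    out[2] = (unsigned char) (pc1 + 0x80);
    return 3;
  }
  if (lb >= 0xF0 && lb <= 0xFF) {         /* dimension-2 private */
    out[0] = 0x9F;                        /* leading byte prefix */
    out[1] = (unsigned char) lb;
    out[2] = (unsigned char) (pc1 + 0x80);
    out[3] = (unsigned char) (pc2 + 0x80);
    return 4;
  }
  return 0;
}
```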
8206
  The basic characteristic of this encoding is that the first byte
of every character is in the range 0x00 - 0x9F, and the second and
following bytes of every character are in the range 0xA0 - 0xFF.
8210 This means that it is impossible to get out of sync, or more
8211 specifically:
8212
8213 @enumerate
8214 @item
8215 Given any byte position, the beginning of the character it is
8216 within can be determined in constant time.
8217 @item
8218 Given any byte position at the beginning of a character, the
8219 beginning of the next character can be determined in constant
8220 time.
8221 @item
8222 Given any byte position at the beginning of a character, the
8223 beginning of the previous character can be determined in constant
8224 time.
8225 @item
8226 Textual searches can simply treat encoded strings as if they
8227 were encoded in a one-byte-per-character fashion rather than
8228 the actual multi-byte encoding.
8229 @end enumerate
8230
8231   None of the standard non-modal encodings meet all of these
8232 conditions.  For example, EUC satisfies only (2) and (3), while
8233 Shift-JIS and Big5 (not yet described) satisfy only (2). (All
8234 non-modal encodings must satisfy (2), in order to be unambiguous.)
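
  Properties (1) and (2) follow from a one-byte test, as the sketch
below shows.  The function names are assumptions made for the example
and offsets are 0-based.  Since a character occupies at most four
bytes, each loop runs at most three times, which is what makes the
operations constant-time.

```c
#include <assert.h>
#include <stddef.h>

/* A byte below 0xA0 begins a character; 0xA0 and above continues one. */
int is_first_byte(unsigned char b)
{
  return b < 0xA0;
}

/* Return the offset at which the character containing offset I begins. */
size_t char_start(const unsigned char *s, size_t i)
{
  while (i > 0 && !is_first_byte(s[i]))
    i--;
  return i;
}

/* Given offset I at the beginning of a character (I < LEN), return the
   offset of the next character, or LEN if there is none. */
size_t next_char(const unsigned char *s, size_t len, size_t i)
{
  assert(i < len);
  i++;
  while (i < len && !is_first_byte(s[i]))
    i++;
  return i;
}
```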
8235
8236 @node Internal Character Encoding
8237 @subsection Internal Character Encoding
8238 @cindex internal character encoding
8239 @cindex character encoding, internal
8240 @cindex encoding, internal character
8241
8242   One 19-bit word represents a single character.  The word is
8243 separated into three fields:
8244
8245 @example
8246 Bit number:     18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
8247                 <------------> <------------------> <------------------>
8248 Field:                1                  2                    3
8249 @end example
8250
8251   Note that fields 2 and 3 hold 7 bits each, while field 1 holds 5 bits.
8252
8253 @example
8254 Character set           Field 1         Field 2         Field 3
8255 -------------           -------         -------         -------
8256 ASCII                      0               0              PC1
8257    range:                                                   (00 - 7F)
8258 Control-1                  0               1              PC1
8259    range:                                                   (00 - 1F)
8260 Dimension-1 official       0            LB - 0x80         PC1
8261    range:                                    (01 - 0D)      (20 - 7F)
8262 Dimension-1 private        0            LB - 0x80         PC1
8263    range:                                    (20 - 6F)      (20 - 7F)
8264 Dimension-2 official    LB - 0x8F         PC1             PC2
8265    range:                    (01 - 0A)       (20 - 7F)      (20 - 7F)
8266 Dimension-2 private     LB - 0xE1         PC1             PC2
8267    range:                    (0F - 1E)       (20 - 7F)      (20 - 7F)
8268 Composite                 0x1F             ?               ?
8269 @end example
8270
8271   Note that character codes 0 - 255 are the same as the ``binary encoding''
8272 described above.
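
  The field layout can be sketched as a packing function.  The names
@code{pack_fields} and @code{make_emchar} are hypothetical; the field
values follow the table above, composite characters are left out, and
position codes are not validated.

```c
#include <assert.h>

typedef int Emchar;

/* Pack field 1 (5 bits), field 2 (7 bits) and field 3 (7 bits). */
Emchar pack_fields(int f1, int f2, int f3)
{
  return (f1 << 14) | (f2 << 7) | f3;
}

/* Build an Emchar from a leading byte and position codes.  Returns -1
   if LB is not an assigned, non-composite leading byte. */
Emchar make_emchar(int lb, int pc1, int pc2)
{
  if (lb == 0x00)                           /* ASCII */
    return pack_fields(0, 0, pc1);
  if (lb == 0x8F)                           /* Control-1 */
    return pack_fields(0, 1, pc1);
  if ((lb >= 0x81 && lb <= 0x8D)            /* dimension-1 official */
      || (lb >= 0xA0 && lb <= 0xEF))        /* dimension-1 private */
    return pack_fields(0, lb - 0x80, pc1);
  if (lb >= 0x90 && lb <= 0x99)             /* dimension-2 official */
    return pack_fields(lb - 0x8F, pc1, pc2);
  if (lb >= 0xF0 && lb <= 0xFF)             /* dimension-2 private */
    return pack_fields(lb - 0xE1, pc1, pc2);
  return -1;
}
```

  With this layout, a Control-1 character with position code 5 comes
out as 133, and a character with leading byte 0x81 and position code
0x69 comes out as 233, so both land in the range 128 - 255.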
8273
8274 @node CCL
8275 @section CCL
8276 @cindex CCL
8277
8278 @example
8279 CCL PROGRAM SYNTAX:
8280      CCL_PROGRAM := (CCL_MAIN_BLOCK
8281                      [ CCL_EOF_BLOCK ])
8282
8283      CCL_MAIN_BLOCK := CCL_BLOCK
8284      CCL_EOF_BLOCK := CCL_BLOCK
8285
8286      CCL_BLOCK := STATEMENT | (STATEMENT [STATEMENT ...])
8287      STATEMENT :=
8288              SET | IF | BRANCH | LOOP | REPEAT | BREAK
8289              | READ | WRITE
8290
8291      SET := (REG = EXPRESSION) | (REG SELF_OP EXPRESSION)
8292             | INT-OR-CHAR
8293
8294      EXPRESSION := ARG | (EXPRESSION OP ARG)
8295
8296      IF := (if EXPRESSION CCL_BLOCK CCL_BLOCK)
8297      BRANCH := (branch EXPRESSION CCL_BLOCK [CCL_BLOCK ...])
8298      LOOP := (loop STATEMENT [STATEMENT ...])
8299      BREAK := (break)
8300      REPEAT := (repeat)
             | (write-repeat [REG | INT-OR-CHAR | STRING])
             | (write-read-repeat REG [INT-OR-CHAR | STRING | ARRAY]?)
8303      READ := (read REG) | (read REG REG)
8304              | (read-if REG ARITH_OP ARG CCL_BLOCK CCL_BLOCK)
8305              | (read-branch REG CCL_BLOCK [CCL_BLOCK ...])
8306      WRITE := (write REG) | (write REG REG)
8307              | (write INT-OR-CHAR) | (write STRING) | STRING
8308              | (write REG ARRAY)
8309      END := (end)
8310
8311      REG := r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7
8312      ARG := REG | INT-OR-CHAR
8313      OP :=   + | - | * | / | % | & | '|' | ^ | << | >> | <8 | >8 | //
8314              | < | > | == | <= | >= | !=
8315      SELF_OP :=
8316              += | -= | *= | /= | %= | &= | '|=' | ^= | <<= | >>=
8317      ARRAY := '[' INT-OR-CHAR ... ']'
8318      INT-OR-CHAR := INT | CHAR
8319
8320 MACHINE CODE:
8321
8322 The machine code consists of a vector of 32-bit words.
8323 The first such word specifies the start of the EOF section of the code;
8324 this is the code executed to handle any stuff that needs to be done
8325 (e.g. designating back to ASCII and left-to-right mode) after all
8326 other encoded/decoded data has been written out.  This is not used for
8327 charset CCL programs.
8328
8329 REGISTER: 0..7  -- referred by RRR or rrr
8330
OPERATOR BIT FIELD (27-bit): XXXXXXXXXXXXXXX RRR TTTTT
        TTTTT (5-bit): operator type
        RRR (3-bit): register number
        XXXXXXXXXXXXXXX (15-bit):
                CCCCCCCCCCCCCCC: constant or address
                000000000000rrr: register number

AAAAA:  00000 +
8339         00001 -
8340         00010 *
8341         00011 /
8342         00100 %
8343         00101 &
8344         00110 |
        00111 ^
8346
8347         01000 <<
8348         01001 >>
8349         01010 <8
8350         01011 >8
8351         01100 //
8352         01101 not used
8353         01110 not used
8354         01111 not used
8355
8356         10000 <
8357         10001 >
8358         10010 ==
8359         10011 <=
8360         10100 >=
8361         10101 !=
8362
8363 OPERATORS:      TTTTT RRR XX..
8364
8365 SetCS:          00000 RRR C...C      RRR = C...C
8366 SetCL:          00001 RRR .....      RRR = c...c
8367                 c.............c
8368 SetR:           00010 RRR ..rrr      RRR = rrr
8369 SetA:           00011 RRR ..rrr      RRR = array[rrr]
8370                 C.............C      size of array = C...C
8371                 c.............c      contents = c...c
8372
8373 Jump:           00100 000 c...c      jump to c...c
8374 JumpCond:       00101 RRR c...c      if (!RRR) jump to c...c
8375 WriteJump:      00110 RRR c...c      Write1 RRR, jump to c...c
8376 WriteReadJump:  00111 RRR c...c      Write1, Read1 RRR, jump to c...c
8377 WriteCJump:     01000 000 c...c      Write1 C...C, jump to c...c
8378                 C...C
8379 WriteCReadJump: 01001 RRR c...c      Write1 C...C, Read1 RRR,
8380                 C.............C      and jump to c...c
8381 WriteSJump:     01010 000 c...c      WriteS, jump to c...c
8382                 C.............C
8383                 S.............S
8384                 ...
8385 WriteSReadJump: 01011 RRR c...c      WriteS, Read1 RRR, jump to c...c
8386                 C.............C
8387                 S.............S
8388                 ...
8389 WriteAReadJump: 01100 RRR c...c      WriteA, Read1 RRR, jump to c...c
8390                 C.............C      size of array = C...C
8391                 c.............c      contents = c...c
8392                 ...
8393 Branch:         01101 RRR C...C      if (RRR >= 0 && RRR < C..)
8394                 c.............c      branch to (RRR+1)th address
8395 Read1:          01110 RRR ...        read 1-byte to RRR
8396 Read2:          01111 RRR ..rrr      read 2-byte to RRR and rrr
8397 ReadBranch:     10000 RRR C...C      Read1 and Branch
8398                 c.............c
8399                 ...
8400 Write1:         10001 RRR .....      write 1-byte RRR
8401 Write2:         10010 RRR ..rrr      write 2-byte RRR and rrr
8402 WriteC:         10011 000 .....      write 1-char C...CC
8403                 C.............C
8404 WriteS:         10100 000 .....      write C..-byte of string
8405                 C.............C
8406                 S.............S
8407                 ...
8408 WriteA:         10101 RRR .....      write array[RRR]
8409                 C.............C      size of array = C...C
8410                 c.............c      contents = c...c
8411                 ...
8412 End:            10110 000 .....      terminate the execution
8413
8414 SetSelfCS:      10111 RRR C...C      RRR AAAAA= C...C
8415                 ..........AAAAA
8416 SetSelfCL:      11000 RRR .....      RRR AAAAA= c...c
8417                 c.............c
8418                 ..........AAAAA
8419 SetSelfR:       11001 RRR ..Rrr      RRR AAAAA= rrr
8420                 ..........AAAAA
8421 SetExprCL:      11010 RRR ..Rrr      RRR = rrr AAAAA c...c
8422                 c.............c
8423                 ..........AAAAA
8424 SetExprR:       11011 RRR ..rrr      RRR = rrr AAAAA Rrr
8425                 ............Rrr
8426                 ..........AAAAA
8427 JumpCondC:      11100 RRR c...c      if !(RRR AAAAA C..) jump to c...c
8428                 C.............C
8429                 ..........AAAAA
8430 JumpCondR:      11101 RRR c...c      if !(RRR AAAAA rrr) jump to c...c
8431                 ............rrr
8432                 ..........AAAAA
8433 ReadJumpCondC:  11110 RRR c...c      Read1 and JumpCondC
8434                 C.............C
8435                 ..........AAAAA
8436 ReadJumpCondR:  11111 RRR c...c      Read1 and JumpCondR
8437                 ............rrr
8438                 ..........AAAAA
8439 @end example
8440
8441 @node The Lisp Reader and Compiler, Lstreams, MULE Character Sets and Encodings, Top
8442 @chapter The Lisp Reader and Compiler
8443 @cindex Lisp reader and compiler, the
8444 @cindex reader and compiler, the Lisp
8445 @cindex compiler, the Lisp reader and
8446
8447 Not yet documented.
8448
8449 @node Lstreams, Consoles; Devices; Frames; Windows, The Lisp Reader and Compiler, Top
8450 @chapter Lstreams
8451 @cindex lstreams
8452
8453   An @dfn{lstream} is an internal Lisp object that provides a generic
8454 buffering stream implementation.  Conceptually, you send data to the
8455 stream or read data from the stream, not caring what's on the other end
8456 of the stream.  The other end could be another stream, a file
8457 descriptor, a stdio stream, a fixed block of memory, a reallocating
8458 block of memory, etc.  The main purpose of the stream is to provide a
8459 standard interface and to do buffering.  Macros are defined to read or
8460 write characters, so the calling functions do not have to worry about
8461 blocking data together in order to achieve efficiency.
8462
8463 @menu
8464 * Creating an Lstream::         Creating an lstream object.
8465 * Lstream Types::               Different sorts of things that are streamed.
8466 * Lstream Functions::           Functions for working with lstreams.
8467 * Lstream Methods::             Creating new lstream types.
8468 @end menu
8469
8470 @node Creating an Lstream
8471 @section Creating an Lstream
8472 @cindex lstream, creating an
8473
8474 Lstreams come in different types, depending on what is being interfaced
8475 to.  Although the primitive for creating new lstreams is
8476 @code{Lstream_new()}, generally you do not call this directly.  Instead,
8477 you call some type-specific creation function, which creates the lstream
8478 and initializes it as appropriate for the particular type.
8479
8480 All lstream creation functions take a @var{mode} argument, specifying
what mode the lstream should be opened in.  This controls whether the
lstream is for input or output, and optionally whether data should be
8483 blocked up in units of MULE characters.  Note that some types of
8484 lstreams can only be opened for input; others only for output; and
8485 others can be opened either way.  #### Richard Mlynarik thinks that
8486 there should be a strict separation between input and output streams,
8487 and he's probably right.
8488
8489   @var{mode} is a string, one of
8490
8491 @table @code
8492 @item "r"
8493   Open for reading.
8494 @item "w"
8495   Open for writing.
8496 @item "rc"
8497   Open for reading, but ``read'' never returns partial MULE characters.
8498 @item "wc"
8499   Open for writing, but never writes partial MULE characters.
8500 @end table
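
Each mode string can be read as a direction letter followed by an
optional @samp{c} flag.  The following is a minimal sketch of that
decoding, not the code in @file{lstream.c}; the names @code{toy_mode}
and @code{toy_parse_mode} are hypothetical and exist only for this
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical decoded form of an lstream mode string. */
struct toy_mode {
  int readable;     /* 1 if opened with "r" or "rc" */
  int writable;     /* 1 if opened with "w" or "wc" */
  int whole_chars;  /* 1 if partial MULE characters must be avoided */
};

/* Decode MODE into *OUT.  Returns 0 on success, -1 if MODE is not one
   of "r", "w", "rc", "wc". */
int
toy_parse_mode (const char *mode, struct toy_mode *out)
{
  assert (mode != NULL && out != NULL);
  memset (out, 0, sizeof (*out));

  if (mode[0] == 'r')
    out->readable = 1;
  else if (mode[0] == 'w')
    out->writable = 1;
  else
    return -1;

  if (mode[1] == 'c')
    {
      out->whole_chars = 1;
      return mode[2] == '\0' ? 0 : -1;
    }
  return mode[1] == '\0' ? 0 : -1;
}
```

In this sketch a string such as @code{"rw"} is rejected, which matches
the table above: no listed mode opens a stream in both directions.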
8501
8502 @node Lstream Types
8503 @section Lstream Types
8504 @cindex lstream types
8505 @cindex types, lstream
8506
8507 @table @asis
8508 @item stdio
8509
8510 @item filedesc
8511
8512 @item lisp-string
8513
8514 @item fixed-buffer
8515
8516 @item resizing-buffer
8517
8518 @item dynarr
8519
8520 @item lisp-buffer
8521
8522 @item print
8523
8524 @item decoding
8525
8526 @item encoding
8527 @end table
8528
8529 @node Lstream Functions
8530 @section Lstream Functions
8531 @cindex lstream functions
8532
8533 @deftypefun {Lstream *} Lstream_new (Lstream_implementation *@var{imp}, const char *@var{mode})
8534 Allocate and return a new Lstream.  This function is not really meant to
8535 be called directly; rather, each stream type should provide its own
8536 stream creation function, which creates the stream and does any other
8537 necessary creation stuff (e.g. opening a file).
8538 @end deftypefun
8539
8540 @deftypefun void Lstream_set_buffering (Lstream *@var{lstr}, Lstream_buffering @var{buffering}, int @var{buffering_size})
8541 Change the buffering of a stream.  See @file{lstream.h}.  By default the
8542 buffering is @code{STREAM_BLOCK_BUFFERED}.
8543 @end deftypefun
8544
8545 @deftypefun int Lstream_flush (Lstream *@var{lstr})
8546 Flush out any pending unwritten data in the stream.  Clear any buffered
8547 input data.  Returns 0 on success, -1 on error.
8548 @end deftypefun
8549
8550 @deftypefn Macro int Lstream_putc (Lstream *@var{stream}, int @var{c})
8551 Write out one byte to the stream.  This is a macro and so it is very
8552 efficient.  The @var{c} argument is only evaluated once but the @var{stream}
8553 argument is evaluated more than once.  Returns 0 on success, -1 on
8554 error.
8555 @end deftypefn
8556
8557 @deftypefn Macro int Lstream_getc (Lstream *@var{stream})
8558 Read one byte from the stream.  This is a macro and so it is very
8559 efficient.  The @var{stream} argument is evaluated more than once.  Return
8560 value is -1 for EOF or error.
8561 @end deftypefn
8562
8563 @deftypefn Macro void Lstream_ungetc (Lstream *@var{stream}, int @var{c})
8564 Push one byte back onto the input queue.  This will be the next byte
8565 read from the stream.  Any number of bytes can be pushed back and will
8566 be read in the reverse order they were pushed back---most recent
8567 first. (This is necessary for consistency---if there are a number of
8568 bytes that have been unread and I read and unread a byte, it needs to be
8569 the first to be read again.) This is a macro and so it is very
8570 efficient.  The @var{c} argument is only evaluated once but the @var{stream}
8571 argument is evaluated more than once.
8572 @end deftypefn
8573
8574 @deftypefun int Lstream_fputc (Lstream *@var{stream}, int @var{c})
8575 @deftypefunx int Lstream_fgetc (Lstream *@var{stream})
8576 @deftypefunx void Lstream_fungetc (Lstream *@var{stream}, int @var{c})
8577 Function equivalents of the above macros.
8578 @end deftypefun
8579
8580 @deftypefun ssize_t Lstream_read (Lstream *@var{stream}, void *@var{data}, size_t @var{size})
8581 Read @var{size} bytes of @var{data} from the stream.  Return the number
of bytes read.  0 means EOF; -1 means an error occurred and no bytes
8583 were read.
8584 @end deftypefun
8585
8586 @deftypefun ssize_t Lstream_write (Lstream *@var{stream}, void *@var{data}, size_t @var{size})
8587 Write @var{size} bytes of @var{data} to the stream.  Return the number
8588 of bytes written.  -1 means an error occurred and no bytes were written.
8589 @end deftypefun
8590
8591 @deftypefun void Lstream_unread (Lstream *@var{stream}, void *@var{data}, size_t @var{size})
8592 Push back @var{size} bytes of @var{data} onto the input queue.  The next
8593 call to @code{Lstream_read()} with the same size will read the same
8594 bytes back.  Note that this will be the case even if there is other
8595 pending unread data.
8596 @end deftypefun
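
The pushback ordering described for @code{Lstream_ungetc()} and
@code{Lstream_unread()} can be illustrated with a toy model.  This is a
sketch under simplifying assumptions (a fixed-size pushback stack over
an in-memory source), not the real lstream implementation; all
@code{toy_} names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_UNGET_MAX 64

/* Hypothetical input stream: a fixed source plus a pushback stack. */
struct toy_in {
  const unsigned char *src;            /* underlying data */
  size_t src_len, src_pos;
  unsigned char unget[TOY_UNGET_MAX];  /* pushed-back bytes, top at end */
  size_t unget_len;
};

/* Push one byte back; it becomes the next byte read. */
void
toy_ungetc (struct toy_in *s, int c)
{
  assert (s->unget_len < TOY_UNGET_MAX);
  s->unget[s->unget_len++] = (unsigned char) c;
}

/* Read one byte, most recently pushed-back byte first; -1 at EOF. */
int
toy_getc (struct toy_in *s)
{
  if (s->unget_len > 0)
    return s->unget[--s->unget_len];
  if (s->src_pos < s->src_len)
    return s->src[s->src_pos++];
  return -1;
}

/* Push back SIZE bytes so that they are read again in their original
   order: the last byte is pushed first, so the first byte ends up on
   top of the stack. */
void
toy_unread (struct toy_in *s, const unsigned char *data, size_t size)
{
  while (size > 0)
    toy_ungetc (s, data[--size]);
}
```

Pushing @samp{a} and then @samp{b} with @code{toy_ungetc()} yields
@samp{b} and then @samp{a} on reading, whereas @code{toy_unread()} of
the two bytes @samp{pq} yields @samp{p} and then @samp{q}, even when
other unread data is already pending.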
8597
8598 @deftypefun int Lstream_close (Lstream *@var{stream})
8599 Close the stream.  All data will be flushed out.
8600 @end deftypefun
8601
8602 @deftypefun void Lstream_reopen (Lstream *@var{stream})
8603 Reopen a closed stream.  This enables I/O on it again.  This is not
8604 meant to be called except from a wrapper routine that reinitializes
8605 variables and such---the close routine may well have freed some
8606 necessary storage structures, for example.
8607 @end deftypefun
8608
8609 @deftypefun void Lstream_rewind (Lstream *@var{stream})
8610 Rewind the stream to the beginning.
8611 @end deftypefun
8612
8613 @node Lstream Methods
8614 @section Lstream Methods
8615 @cindex lstream methods
8616
8617 @deftypefn {Lstream Method} ssize_t reader (Lstream *@var{stream}, unsigned char *@var{data}, size_t @var{size})
8618 Read some data from the stream's end and store it into @var{data}, which
8619 can hold @var{size} bytes.  Return the number of bytes read.  A return
8620 value of 0 means no bytes can be read at this time.  This may be because
8621 of an EOF, or because there is a granularity greater than one byte that
8622 the stream imposes on the returned data, and @var{size} is less than
8623 this granularity. (This will happen frequently for streams that need to
8624 return whole characters, because @code{Lstream_read()} calls the reader
8625 function repeatedly until it has the number of bytes it wants or until 0
8626 is returned.)  The lstream functions do not treat a 0 return as EOF or
8627 do anything special; however, the calling function will interpret any 0
8628 it gets back as EOF.  This will normally not happen unless the caller
8629 calls @code{Lstream_read()} with a very small size.
8630
8631 This function can be @code{NULL} if the stream is output-only.
8632 @end deftypefn
8633
8634 @deftypefn {Lstream Method} ssize_t writer (Lstream *@var{stream}, const unsigned char *@var{data}, size_t @var{size})
8635 Send some data to the stream's end.  Data to be sent is in @var{data}
8636 and is @var{size} bytes.  Return the number of bytes sent.  This
8637 function can send and return fewer bytes than is passed in; in that
8638 case, the function will just be called again until there is no data left
8639 or 0 is returned.  A return value of 0 means that no more data can be
8640 currently stored, but there is no error; the data will be squirreled
8641 away until the writer can accept data. (This is useful, e.g., if you're
8642 dealing with a non-blocking file descriptor and are getting
8643 @code{EWOULDBLOCK} errors.)  This function can be @code{NULL} if the
8644 stream is input-only.
8645 @end deftypefn
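
The calling convention for the writer method (call again after a short
write, stop without error on a return of 0) can be sketched as follows.
The sink, its fields, and the @code{toy_} function names are
hypothetical, and @code{size_t} is used in place of @code{ssize_t} to
keep the example portable; the real loop lives in the lstream code and
may differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sink that accepts at most CHUNK bytes per call and
   refuses data once CAPACITY bytes are stored. */
struct toy_sink {
  unsigned char store[32];
  size_t used, capacity, chunk;
};

/* Stand-in for a "writer" method: returns the number of bytes taken,
   which may be fewer than SIZE, or 0 if nothing can be taken now. */
size_t
toy_writer (struct toy_sink *k, const unsigned char *data, size_t size)
{
  size_t room = k->capacity - k->used;
  size_t n = size < k->chunk ? size : k->chunk;
  if (n > room)
    n = room;
  memcpy (k->store + k->used, data, n);
  k->used += n;
  return n;
}

/* Call the writer repeatedly until all data is sent or it returns 0.
   Returns the number of bytes sent; the caller keeps the remainder
   buffered for a later attempt. */
size_t
toy_write_all (struct toy_sink *k, const unsigned char *data, size_t size)
{
  size_t sent = 0;
  assert (k->capacity <= sizeof (k->store));
  while (sent < size)
    {
      size_t n = toy_writer (k, data + sent, size - sent);
      if (n == 0)
        break;  /* not an error: the sink cannot take more for now */
      sent += n;
    }
  return sent;
}
```

With a capacity of 8 and a chunk size of 3, writing five bytes takes
two calls to the writer; a second five-byte write stores three bytes
and leaves two for later.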
8646
8647 @deftypefn {Lstream Method} int rewinder (Lstream *@var{stream})
8648 Rewind the stream.  If this is @code{NULL}, the stream is not seekable.
8649 @end deftypefn
8650
8651 @deftypefn {Lstream Method} int seekable_p (Lstream *@var{stream})
8652 Indicate whether this stream is seekable---i.e. it can be rewound.
8653 This method is ignored if the stream does not have a rewind method.  If
8654 this method is not present, the result is determined by whether a rewind
8655 method is present.
8656 @end deftypefn
8657
8658 @deftypefn {Lstream Method} int flusher (Lstream *@var{stream})
8659 Perform any additional operations necessary to flush the data in this
8660 stream.
8661 @end deftypefn
8662
8663 @deftypefn {Lstream Method} int pseudo_closer (Lstream *@var{stream})
8664 @end deftypefn
8665
8666 @deftypefn {Lstream Method} int closer (Lstream *@var{stream})
8667 Perform any additional operations necessary to close this stream down.
8668 May be @code{NULL}.  This function is called when @code{Lstream_close()}
8669 is called or when the stream is garbage-collected.  When this function
8670 is called, all pending data in the stream will already have been written
8671 out.
8672 @end deftypefn
8673
8674 @deftypefn {Lstream Method} Lisp_Object marker (Lisp_Object @var{lstream}, void (*@var{markfun}) (Lisp_Object))
8675 Mark this object for garbage collection.  Same semantics as a standard
8676 @code{Lisp_Object} marker.  This function can be @code{NULL}.
8677 @end deftypefn
8678
8679 @node Consoles; Devices; Frames; Windows, The Redisplay Mechanism, Lstreams, Top
8680 @chapter Consoles; Devices; Frames; Windows
8681 @cindex consoles; devices; frames; windows
8682 @cindex devices; frames; windows, consoles;
8683 @cindex frames; windows, consoles; devices;
8684 @cindex windows, consoles; devices; frames;
8685
8686 @menu
8687 * Introduction to Consoles; Devices; Frames; Windows::
8688 * Point::
8689 * Window Hierarchy::
8690 * The Window Object::
8691 @end menu
8692
8693 @node Introduction to Consoles; Devices; Frames; Windows
8694 @section Introduction to Consoles; Devices; Frames; Windows
8695 @cindex consoles; devices; frames; windows, introduction to
8696 @cindex devices; frames; windows, introduction to consoles;
8697 @cindex frames; windows, introduction to consoles; devices;
8698 @cindex windows, introduction to consoles; devices; frames;
8699
8700 A window-system window that you see on the screen is called a
8701 @dfn{frame} in Emacs terminology.  Each frame is subdivided into one or
8702 more non-overlapping panes, called (confusingly) @dfn{windows}.  Each
8703 window displays the text of a buffer in it. (See above on Buffers.) Note
8704 that buffers and windows are independent entities: Two or more windows
8705 can be displaying the same buffer (potentially in different locations),
8706 and a buffer can be displayed in no windows.
8707
8708   A single display screen that contains one or more frames is called
8709 a @dfn{display}.  Under most circumstances, there is only one display.
8710 However, more than one display can exist, for example if you have
8711 a @dfn{multi-headed} console, i.e. one with a single keyboard but
8712 multiple displays. (Typically in such a situation, the various
8713 displays act like one large display, in that the mouse is only
8714 in one of them at a time, and moving the mouse off of one moves
8715 it into another.) In some cases, the different displays will
8716 have different characteristics, e.g. one color and one mono.
8717
8718   XEmacs can display frames on multiple displays.  It can even deal
8719 simultaneously with frames on multiple keyboards (called @dfn{consoles} in
8720 XEmacs terminology).  Here is one case where this might be useful: You
8721 are using XEmacs on your workstation at work, and leave it running.
8722 Then you go home and dial in on a TTY line, and you can use the
8723 already-running XEmacs process to display another frame on your local
8724 TTY.
8725
8726   Thus, there is a hierarchy console -> display -> frame -> window.
8727 There is a separate Lisp object type for each of these four concepts.
8728 Furthermore, there is logically a @dfn{selected console},
8729 @dfn{selected display}, @dfn{selected frame}, and @dfn{selected window}.
8730 Each of these objects is distinguished in various ways, such as being the
8731 default object for various functions that act on objects of that type.
8732 Note that every containing object remembers the ``selected'' object
8733 among the objects that it contains: e.g. not only is there a selected
8734 window, but every frame remembers the last window in it that was
8735 selected, and changing the selected frame causes the remembered window
within it to become the selected window.  Similar relationships apply
for consoles to displays and displays to frames.  (A display in this
sense is represented at the Lisp level by a @dfn{device} object.)
8738
8739 @node Point
8740 @section Point
8741 @cindex point
8742
8743   Recall that every buffer has a current insertion position, called
8744 @dfn{point}.  Now, two or more windows may be displaying the same buffer,
8745 and the text cursor in the two windows (i.e. @code{point}) can be in
8746 two different places.  You may ask, how can that be, since each
8747 buffer has only one value of @code{point}?  The answer is that each window
8748 also has a value of @code{point} that is squirreled away in it.  There
is only one selected window, and the buffer's value of @code{point}
8750 corresponds to that window.  When the selected window is changed
8751 from one window to another displaying the same buffer, the old
8752 value of @code{point} is stored into the old window's ``point'' and the
8753 value of @code{point} from the new window is retrieved and made the
8754 value of @code{point} in the buffer.  This means that @code{window-point}
8755 for the selected window is potentially inaccurate, and if you
8756 want to retrieve the correct value of @code{point} for a window,
8757 you must special-case on the selected window and retrieve the
8758 buffer's point instead.  This is related to why @code{save-window-excursion}
8759 does not save the selected window's value of @code{point}.
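
The swap of point values on window selection can be modelled in a few
lines.  This is only a sketch: the structures and the names
@code{toy_select_window} and @code{toy_window_point} are hypothetical,
and plain integers stand in for the markers that real windows use.

```c
#include <assert.h>
#include <stddef.h>

struct toy_buffer { long point; };

struct toy_window {
  struct toy_buffer *buffer;
  long pointm;  /* stashed point, valid while the window is not selected */
};

static struct toy_window *toy_selected;  /* the one selected window */

/* Make W the selected window, swapping point values as described above. */
void
toy_select_window (struct toy_window *w)
{
  assert (w != NULL && w->buffer != NULL);
  if (toy_selected != NULL)
    toy_selected->pointm = toy_selected->buffer->point;  /* stash old */
  w->buffer->point = w->pointm;                          /* restore new */
  toy_selected = w;
}

/* Return the correct point for W, special-casing the selected window. */
long
toy_window_point (const struct toy_window *w)
{
  return w == toy_selected ? w->buffer->point : w->pointm;
}
```

While a window is selected, its @code{pointm} field goes stale as soon
as point moves in the buffer, which is why @code{toy_window_point()}
consults the buffer for the selected window.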
8760
8761 @node Window Hierarchy
8762 @section Window Hierarchy
8763 @cindex window hierarchy
8764 @cindex hierarchy of windows
8765
8766   If a frame contains multiple windows (panes), they are always created
8767 by splitting an existing window along the horizontal or vertical axis.
8768 Terminology is a bit confusing here: to @dfn{split a window
8769 horizontally} means to create two side-by-side windows, i.e. to make a
8770 @emph{vertical} cut in a window.  Likewise, to @dfn{split a window
8771 vertically} means to create two windows, one above the other, by making
8772 a @emph{horizontal} cut.
8773
8774   If you split a window and then split again along the same axis, you
8775 will end up with a number of panes all arranged along the same axis.
8776 The precise way in which the splits were made should not be important,
8777 and this is reflected internally.  Internally, all windows are arranged
8778 in a tree, consisting of two types of windows, @dfn{combination} windows
8779 (which have children, and are covered completely by those children) and
8780 @dfn{leaf} windows, which have no children and are visible.  Every
8781 combination window has two or more children, all arranged along the same
8782 axis.  There are (logically) two subtypes of windows, depending on
8783 whether their children are horizontally or vertically arrayed.  There is
8784 always one root window, which is either a leaf window (if the frame
8785 contains only one window) or a combination window (if the frame contains
8786 more than one window).  In the latter case, the root window will have
8787 two or more children, either horizontally or vertically arrayed, and
8788 each of those children will be either a leaf window or another
8789 combination window.
8790
8791   Here are some rules:
8792
8793 @enumerate
8794 @item
8795 Horizontal combination windows can never have children that are
8796 horizontal combination windows; same for vertical.
8797
8798 @item
8799 Only leaf windows can be split (obviously) and this splitting does one
8800 of two things: (a) turns the leaf window into a combination window and
8801 creates two new leaf children, or (b) turns the leaf window into one of
8802 the two new leaves and creates the other leaf.  Rule (1) dictates which
8803 of these two outcomes happens.
8804
8805 @item
8806 Every combination window must have at least two children.
8807
8808 @item
8809 Leaf windows can never become combination windows.  They can be deleted,
8810 however.  If this results in a violation of (3), the parent combination
8811 window also gets deleted.
8812
8813 @item
8814 All functions that accept windows must be prepared to accept combination
windows, and do something sane (e.g. signal an error when given one).
8816 Combination windows @emph{do} escape to the Lisp level.
8817
8818 @item
8819 All windows have three fields governing their contents:
8820 these are @dfn{hchild} (a list of horizontally-arrayed children),
8821 @dfn{vchild} (a list of vertically-arrayed children), and @dfn{buffer}
8822 (the buffer contained in a leaf window).  Exactly one of
8823 these will be non-@code{nil}.  Remember that @dfn{horizontally-arrayed}
means ``side-by-side'' and @dfn{vertically-arrayed} means
``one above the other''.
8826
8827 @item
8828 Leaf windows also have markers in their @code{start} (the
8829 first buffer position displayed in the window) and @code{pointm}
8830 (the window's stashed value of @code{point}---see above) fields,
8831 while combination windows have @code{nil} in these fields.
8832
8833 @item
8834 The list of children for a window is threaded through the
8835 @code{next} and @code{prev} fields of each child window.
8836
8837 @item
8838 @strong{Deleted windows can be undeleted}.  This happens as a result of
8839 restoring a window configuration, and is unlike frames, displays, and
8840 consoles, which, once deleted, can never be restored.  Deleting a window
8841 does nothing except set a special @code{dead} bit to 1 and clear out the
8842 @code{next}, @code{prev}, @code{hchild}, and @code{vchild} fields, for
8843 GC purposes.
8844
8845 @item
8846 Most frames actually have two top-level windows---one for the
8847 minibuffer and one (the @dfn{root}) for everything else.  The modeline
8848 (if present) separates these two.  The @code{next} field of the root
8849 points to the minibuffer, and the @code{prev} field of the minibuffer
8850 points to the root.  The other @code{next} and @code{prev} fields are
8851 @code{nil}, and the frame points to both of these windows.
8852 Minibuffer-less frames have no minibuffer window, and the @code{next}
8853 and @code{prev} of the root window are @code{nil}.  Minibuffer-only
8854 frames have no root window, and the @code{next} of the minibuffer window
8855 is @code{nil} but the @code{prev} points to itself. (#### This is an
8856 artifact that should be fixed.)
8857 @end enumerate
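
The interaction of rules (1) and (2) when splitting can be sketched
with a toy tree.  This is an illustration under stated assumptions, not
the code in @file{window.c}: outcome (a) is modelled by placing a new
combination window above the leaf, so that the leaf itself stays a
leaf as rule (4) requires; an integer stands in for the buffer; and all
@code{toy_} names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum toy_axis { TOY_HORIZ, TOY_VERT };

struct toy_win {
  struct toy_win *parent, *next, *prev;
  struct toy_win *hchild, *vchild;  /* first child, if a combination */
  int buffer;                       /* nonzero buffer id, if a leaf */
};

static struct toy_win *
toy_new_win (int buffer)
{
  struct toy_win *w = calloc (1, sizeof (*w));
  assert (w != NULL);
  w->buffer = buffer;
  return w;
}

/* Split leaf W along AXIS and return the new leaf, placed after W.
   *ROOT is updated if W was the root. */
struct toy_win *
toy_split (struct toy_win **root, struct toy_win *w, enum toy_axis axis)
{
  struct toy_win *p = w->parent;
  struct toy_win *leaf;
  int same_axis;

  assert (w->buffer != 0 && !w->hchild && !w->vchild);  /* leaves only */
  leaf = toy_new_win (w->buffer);
  same_axis = p && (axis == TOY_HORIZ ? p->hchild != NULL
                                      : p->vchild != NULL);

  if (!same_axis)
    {
      /* Put a new combination window where W was, so that rule (1)
         (no nesting along the same axis) continues to hold. */
      struct toy_win *comb = toy_new_win (0);
      comb->parent = p;
      comb->next = w->next;
      comb->prev = w->prev;
      if (w->next) w->next->prev = comb;
      if (w->prev) w->prev->next = comb;
      if (p && p->hchild == w) p->hchild = comb;
      if (p && p->vchild == w) p->vchild = comb;
      if (*root == w) *root = comb;
      if (axis == TOY_HORIZ) comb->hchild = w; else comb->vchild = w;
      w->parent = comb;
      w->next = w->prev = NULL;
    }

  /* Thread the new leaf into the sibling chain right after W. */
  leaf->parent = w->parent;
  leaf->prev = w;
  leaf->next = w->next;
  if (w->next) w->next->prev = leaf;
  w->next = leaf;
  return leaf;
}
```

Splitting the same window twice along one axis produces three siblings
under a single combination window, while splitting along the other axis
introduces a nested combination window of the opposite orientation.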
8858
8859 @node The Window Object
8860 @section The Window Object
8861 @cindex window object, the
8862 @cindex object, the window
8863
8864   Windows have the following accessible fields:
8865
8866 @table @code
8867 @item frame
8868 The frame that this window is on.
8869
8870 @item mini_p
8871 Non-@code{nil} if this window is a minibuffer window.
8872
8873 @item buffer
8874 The buffer that the window is displaying.  This may change often during
8875 the life of the window.
8876
8877 @item dedicated
8878 Non-@code{nil} if this window is dedicated to its buffer.
8879
8880 @item pointm
8881 @cindex window point internals
8882 This is the value of point in the current buffer when this window is
8883 selected; when it is not selected, it retains its previous value.
8884
8885 @item start
8886 The position in the buffer that is the first character to be displayed
8887 in the window.
8888
8889 @item force_start
8890 If this flag is non-@code{nil}, it says that the window has been
8891 scrolled explicitly by the Lisp program.  This affects what the next
8892 redisplay does if point is off the screen: instead of scrolling the
8893 window to show the text around point, it moves point to a location that
8894 is on the screen.
8895
8896 @item last_modified
8897 The @code{modified} field of the window's buffer, as of the last time
8898 a redisplay completed in this window.
8899
8900 @item last_point
8901 The buffer's value of point, as of the last time
8902 a redisplay completed in this window.
8903
8904 @item left
8905 This is the left-hand edge of the window, measured in columns.  (The
8906 leftmost column on the screen is @w{column 0}.)
8907
8908 @item top
8909 This is the top edge of the window, measured in lines.  (The top line on
8910 the screen is @w{line 0}.)
8911
8912 @item height
8913 The height of the window, measured in lines.
8914
8915 @item width
8916 The width of the window, measured in columns.
8917
8918 @item next
8919 This is the window that is the next in the chain of siblings.  It is
8920 @code{nil} in a window that is the rightmost or bottommost of a group of
8921 siblings.
8922
8923 @item prev
8924 This is the window that is the previous in the chain of siblings.  It is
8925 @code{nil} in a window that is the leftmost or topmost of a group of
8926 siblings.
8927
8928 @item parent
8929 Internally, XEmacs arranges windows in a tree; each group of siblings has
8930 a parent window whose area includes all the siblings.  This field points
8931 to a window's parent.
8932
8933 Parent windows do not display buffers, and play little role in display
8934 except to shape their child windows.  Emacs Lisp programs usually have
8935 no access to the parent windows; they operate on the windows at the
8936 leaves of the tree, which actually display buffers.
8937
8938 @item hscroll
8939 This is the number of columns that the display in the window is scrolled
8940 horizontally to the left.  Normally, this is 0.
8941
8942 @item use_time
8943 This is the last time that the window was selected.  The function
8944 @code{get-lru-window} uses this field.
8945
8946 @item display_table
8947 The window's display table, or @code{nil} if none is specified for it.
8948
8949 @item update_mode_line
8950 Non-@code{nil} means this window's mode line needs to be updated.
8951
8952 @item base_line_number
8953 The line number of a certain position in the buffer, or @code{nil}.
8954 This is used for displaying the line number of point in the mode line.
8955
8956 @item base_line_pos
8957 The position in the buffer for which the line number is known, or
8958 @code{nil} meaning none is known.
8959
8960 @item region_showing
8961 If the region (or part of it) is highlighted in this window, this field
8962 holds the mark position that made one end of that region.  Otherwise,
8963 this field is @code{nil}.
8964 @end table
8965
8966 @node The Redisplay Mechanism, Extents, Consoles; Devices; Frames; Windows, Top
8967 @chapter The Redisplay Mechanism
8968 @cindex redisplay mechanism, the
8969
8970   The redisplay mechanism is one of the most complicated sections of
8971 XEmacs, especially from a conceptual standpoint.  This is doubly so
8972 because, unlike for the basic aspects of the Lisp interpreter, the
8973 computer science theories of how to efficiently handle redisplay are not
8974 well-developed.
8975
8976   When working with the redisplay mechanism, remember the Golden Rules
8977 of Redisplay:
8978
8979 @enumerate
8980 @item
8981 It Is Better To Be Correct Than Fast.
8982 @item
8983 Thou Shalt Not Run Elisp From Within Redisplay.
8984 @item
8985 It Is Better To Be Fast Than Not To Be.
8986 @end enumerate
8987
8988 @menu
8989 * Critical Redisplay Sections::
8990 * Line Start Cache::
8991 * Redisplay Piece by Piece::
8992 @end menu
8993
8994 @node Critical Redisplay Sections
8995 @section Critical Redisplay Sections
8996 @cindex redisplay sections, critical
8997 @cindex critical redisplay sections
8998
8999 Within this section, we are defenseless and assume that the
9000 following cannot happen:
9001
9002 @enumerate
9003 @item
9004 garbage collection
9005 @item
9006 Lisp code evaluation
9007 @item
9008 frame size changes
9009 @end enumerate
9010
9011 We ensure (3) by calling @code{hold_frame_size_changes()}, which
9012 will cause any pending frame size changes to get put on hold
9013 till after the end of the critical section.  (1) follows
9014 automatically if (2) is met.  #### Unfortunately, there are
9015 some places where Lisp code can be called within this section.
9016 We need to remove them.
9017
9018 If @code{Fsignal()} is called during this critical section, we
9019 will @code{abort()}.
9020
9021 If garbage collection is called during this critical section,
9022 we simply return. #### We should abort instead.
9023
9024 #### If a frame-size change does occur we should probably
9025 actually be preempting redisplay.
9026
9027 @node Line Start Cache
9028 @section Line Start Cache
9029 @cindex line start cache
9030
9031   The traditional scrolling code in Emacs breaks in a variable height
9032 world.  It depends on the key assumption that the number of lines that
9033 can be displayed at any given time is fixed.  This led to a complete
9034 separation of the scrolling code from the redisplay code.  In order to
9035 fully support variable height lines, the scrolling code must actually be
9036 tightly integrated with redisplay.  Only redisplay can determine how
9037 many lines will be displayed on a screen for any given starting point.
9038
9039   What is ideally wanted is a complete list of the starting buffer
9040 position for every possible display line of a buffer along with the
9041 height of that display line.  Maintaining such a full list would be very
9042 expensive.  We settle for having it include information for all areas
9043 which we happen to generate anyhow (i.e. the region currently being
9044 displayed) and for those areas we need to work with.
9045
9046   In order to ensure that the cache accurately represents what redisplay
9047 would actually show, it is necessary to invalidate it in many
9048 situations.  If the buffer changes, the starting positions may no longer
9049 be correct.  If a face or an extent has changed then the line heights
9050 may have altered.  These events happen frequently enough that the cache
can end up being constantly disabled.  With this potentially constant
invalidation, when is the cache ever useful?
9053
9054   Even if the cache is invalidated before every single usage, it is
9055 necessary.  Scrolling often requires knowledge about display lines which
9056 are actually above or below the visible region.  The cache provides a
9057 convenient light-weight method of storing this information for multiple
9058 display regions.  This knowledge is necessary for the scrolling code to
9059 always obey the First Golden Rule of Redisplay.
9060
9061   If the cache already contains all of the information that the scrolling
9062 routines happen to need so that it doesn't have to go generate it, then
9063 we are able to obey the Third Golden Rule of Redisplay.  The first thing
9064 we do to help out the cache is to always add the displayed region.  This
9065 region had to be generated anyway, so the cache ends up getting the
9066 information basically for free.  In those cases where a user is simply
9067 scrolling around viewing a buffer there is a high probability that this
9068 is sufficient to always provide the needed information.  The second
9069 thing we can do is be smart about invalidating the cache.
9070
9071   TODO---Be smart about invalidating the cache.  Potential places:
9072
9073 @itemize @bullet
9074 @item
9075 Insertions at end-of-line which don't cause line-wraps do not alter the
9076 starting positions of any display lines.  These types of buffer
9077 modifications should not invalidate the cache.  This is actually a large
9078 optimization for redisplay speed as well.
9079 @item
9080 Buffer modifications frequently only affect the display of lines at and
9081 below where they occur.  In these situations we should only invalidate
9082 the part of the cache starting at where the modification occurs.
9083 @end itemize
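
The second idea in the list above, invalidating only from the point of
modification onward, might look roughly like this.  The sketch assumes
that the cache is an array sorted by starting position and that a
change never affects lines lying wholly before it; the structure layout
and the name @code{toy_invalidate_from} are hypothetical and do not
describe the existing cache code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache entry: one display line. */
struct toy_line {
  long start;   /* buffer position where the display line begins */
  int height;   /* pixel height of the display line */
};

struct toy_cache {
  struct toy_line lines[128];  /* sorted by increasing start */
  size_t count;
};

/* Drop cached lines that a modification at POS could affect, keeping
   those that lie wholly before it.  Returns the number of entries kept. */
size_t
toy_invalidate_from (struct toy_cache *c, long pos)
{
  size_t keep = 0;
  assert (c->count <= sizeof (c->lines) / sizeof (c->lines[0]));
  /* Line i is kept only if the following line starts strictly before
     POS, i.e. the modification lies entirely after line i.  The last
     cached line is always dropped because its end is unknown. */
  while (keep + 1 < c->count && c->lines[keep + 1].start < pos)
    keep++;
  c->count = keep;
  return keep;
}
```

The comparison is deliberately conservative: a change exactly at the
start of a line also drops the line before it, since an insertion there
could alter where that earlier line wraps.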
9084
9085   In case you're wondering, the Second Golden Rule of Redisplay is not
9086 applicable.
9087
9088 @node Redisplay Piece by Piece
9089 @section Redisplay Piece by Piece
9090 @cindex redisplay piece by piece
9091
As you can begin to see, redisplay is complex and also not well
documented.  Chuck no longer works on XEmacs, so this section is my take
on the workings of redisplay.
9095
9096 Redisplay happens in three phases:
9097
9098 @enumerate
9099 @item
Determine the desired display in the area that needs redisplay.
Implemented by @file{redisplay.c}.
@item
Compare the desired display with the current display.
Implemented by @file{redisplay-output.c}.
@item
Output the changes.  Implemented by @file{redisplay-output.c},
@file{redisplay-x.c}, @file{redisplay-msw.c} and @file{redisplay-tty.c}.
9108 @end enumerate
9109
9110 Steps 1 and 2 are device-independent and relatively complex.  Step 3 is
9111 mostly device-dependent.
9112
@subheading Determining the Desired Display
9114
Display attributes are stored in @code{display_line} structures.  Each
@code{display_line} consists of a set of @code{display_block}s, and each
@code{display_block} contains a number of @code{rune}s.  Generally each
window holds dynarrs of @code{display_line}s, representing the current
display and the desired display.
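
The nesting described above can be pictured with a minimal sketch.  The
@code{toy_} types below are assumptions made for illustration only: the
real @code{display_line}, @code{display_block} and @code{rune} structures
carry far more state, and the single @code{width} field is invented here.

```c
#include <stddef.h>

/* Hypothetical stand-in for a rune: one displayed element. */
struct toy_rune {
    int width;                      /* width in pixels (assumed field) */
};

/* Hypothetical stand-in for a display_block: a run of runes. */
struct toy_display_block {
    struct toy_rune *runes;
    size_t nrunes;
};

/* Hypothetical stand-in for a display_line: a set of blocks. */
struct toy_display_line {
    struct toy_display_block *blocks;
    size_t nblocks;
};

/* Walk line -> blocks -> runes and sum the rune widths. */
int
toy_display_line_width(const struct toy_display_line *dl)
{
    int total = 0;

    for (size_t b = 0; b < dl->nblocks; b++)
        for (size_t r = 0; r < dl->blocks[b].nrunes; r++)
            total += dl->blocks[b].runes[r].width;
    return total;
}
```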
9120
The @code{display_line} structures are tightly tied to buffers, which
presents a problem for redisplay because this connection is bogus for the
modeline.  Hence the @code{display_line} generation routines are
duplicated for generating the modeline.  As a result, the modeline
display code has many bugs that the standard redisplay code does not.
9126
9127 The guts of @code{display_line} generation are in
9128 @code{create_text_block}, which creates a single display line for the
9129 desired locale. This incrementally parses the characters on the current
9130 line and generates redisplay structures for each.
9131
9132 Gutter redisplay is different. Because the data to display is stored in
9133 a string we cannot use @code{create_text_block}. Instead we use
9134 @code{create_text_string_block} which performs the same function as
9135 @code{create_text_block} but for strings. Many of the complexities of
9136 @code{create_text_block} to do with cursor handling and selective
9137 display have been removed.
9138
9139 @node Extents, Faces, The Redisplay Mechanism, Top
9140 @chapter Extents
9141 @cindex extents
9142
9143 @menu
9144 * Introduction to Extents::     Extents are ranges over text, with properties.
9145 * Extent Ordering::             How extents are ordered internally.
9146 * Format of the Extent Info::   The extent information in a buffer or string.
9147 * Zero-Length Extents::         A weird special case.
9148 * Mathematics of Extent Ordering::  A rigorous foundation.
9149 * Extent Fragments::            Cached information useful for redisplay.
9150 @end menu
9151
9152 @node Introduction to Extents
9153 @section Introduction to Extents
9154 @cindex extents, introduction to
9155
9156   Extents are regions over a buffer, with a start and an end position
9157 denoting the region of the buffer included in the extent.  In
9158 addition, either end can be closed or open, meaning that the endpoint
9159 is or is not logically included in the extent.  Insertion of a character
9160 at a closed endpoint causes the character to go inside the extent;
9161 insertion at an open endpoint causes the character to go outside.
9162
9163   Extent endpoints are stored using memory indices (see @file{insdel.c}),
9164 to minimize the amount of adjusting that needs to be done when
9165 characters are inserted or deleted.
9166
9167   (Formerly, extent endpoints at the gap could be either before or
9168 after the gap, depending on the open/closedness of the endpoint.
9169 The intent of this was to make it so that insertions would
9170 automatically go inside or out of extents as necessary with no
9171 further work needing to be done.  It didn't work out that way,
9172 however, and just ended up complexifying and buggifying all the
9173 rest of the code.)
9174
9175 @node Extent Ordering
9176 @section Extent Ordering
9177 @cindex extent ordering
9178
9179   Extents are compared using memory indices.  There are two orderings
9180 for extents and both orders are kept current at all times.  The normal
9181 or @dfn{display} order is as follows:
9182
9183 @example
9184 Extent A is ``less than'' extent B,
9185 that is, earlier in the display order,
9186   if:    A-start < B-start,
9187   or if: A-start = B-start, and A-end > B-end
9188 @end example
9189
9190   So if two extents begin at the same position, the larger of them is the
9191 earlier one in the display order (@code{EXTENT_LESS} is true).
9192
9193   For the e-order, the same thing holds:
9194
9195 @example
9196 Extent A is ``less than'' extent B in e-order,
that is, earlier in the e-order,
9198   if:    A-end < B-end,
9199   or if: A-end = B-end, and A-start > B-start
9200 @end example
9201
9202   So if two extents end at the same position, the smaller of them is the
9203 earlier one in the e-order (@code{EXTENT_E_LESS} is true).
9204
9205   The display order and the e-order are complementary orders: any
9206 theorem about the display order also applies to the e-order if you swap
9207 all occurrences of ``display order'' and ``e-order'', ``less than'' and
9208 ``greater than'', and ``extent start'' and ``extent end''.
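
The two orderings can be written down directly as comparison functions.
The sketch below is an assumption-laden illustration: @code{toy_extent}
and the two lower-case functions are hypothetical stand-ins, not the
actual @code{EXTENT_LESS} and @code{EXTENT_E_LESS} definitions.

```c
#include <stdbool.h>

/* Hypothetical extent: just its two endpoints. */
struct toy_extent {
    long start;
    long end;
};

/* Display order: earlier start first; on equal starts, larger extent first. */
bool
toy_extent_less(const struct toy_extent *a, const struct toy_extent *b)
{
    return a->start < b->start
        || (a->start == b->start && a->end > b->end);
}

/* E-order: earlier end first; on equal ends, smaller extent first. */
bool
toy_extent_e_less(const struct toy_extent *a, const struct toy_extent *b)
{
    return a->end < b->end
        || (a->end == b->end && a->start > b->start);
}
```

Note how the second function is obtained from the first by swapping
``start'' and ``end'' and reversing the tie-break, which is the
complementarity described above.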
9209
9210 @node Format of the Extent Info
9211 @section Format of the Extent Info
9212 @cindex extent info, format of the
9213
9214   An extent-info structure consists of a list of the buffer or string's
9215 extents and a @dfn{stack of extents} that lists all of the extents over
9216 a particular position.  The stack-of-extents info is used for
9217 optimization purposes---it basically caches some info that might
9218 be expensive to compute.  Certain otherwise hard computations are easy
9219 given the stack of extents over a particular position, and if the
9220 stack of extents over a nearby position is known (because it was
9221 calculated at some prior point in time), it's easy to move the stack
9222 of extents to the proper position.
9223
9224   Given that the stack of extents is an optimization, and given that
9225 it requires memory, a string's stack of extents is wiped out each
9226 time a garbage collection occurs.  Therefore, any time you retrieve
9227 the stack of extents, it might not be there.  If you need it to
9228 be there, use the @code{_force} version.
9229
  Similarly, a string may or may not have an @code{extent_info} structure.
(Generally it won't if there haven't been any extents added to the
string.)  So use the @code{_force} version if you need the @code{extent_info}
structure to be there.
9234
9235   A list of extents is maintained as a double gap array: one gap array
9236 is ordered by start index (the @dfn{display order}) and the other is
9237 ordered by end index (the @dfn{e-order}).  Note that positions in an
9238 extent list should logically be conceived of as referring @emph{to} a
9239 particular extent (as is the norm in programs) rather than sitting
9240 between two extents.  Note also that callers of these functions should
9241 not be aware of the fact that the extent list is implemented as an
array, except for the fact that positions are integers (this should be
generalized to handle integers and linked lists equally well).
9244
9245 @node Zero-Length Extents
9246 @section Zero-Length Extents
9247 @cindex zero-length extents
9248 @cindex extents, zero-length
9249
9250   Extents can be zero-length, and will end up that way if their endpoints
9251 are explicitly set that way or if their detachable property is @code{nil}
9252 and all the text in the extent is deleted. (The exception is open-open
9253 zero-length extents, which are barred from existing because there is
9254 no sensible way to define their properties.  Deletion of the text in
9255 an open-open extent causes it to be converted into a closed-open
9256 extent.)  Zero-length extents are primarily used to represent
9257 annotations, and behave as follows:
9258
9259 @enumerate
9260 @item
9261 Insertion at the position of a zero-length extent expands the extent
9262 if both endpoints are closed; goes after the extent if it is closed-open;
9263 and goes before the extent if it is open-closed.
9264
9265 @item
9266 Deletion of a character on a side of a zero-length extent whose
9267 corresponding endpoint is closed causes the extent to be detached if
9268 it is detachable; if the extent is not detachable or the corresponding
9269 endpoint is open, the extent remains in the buffer, moving as necessary.
9270 @end enumerate
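
The insertion rule (item 1 above) amounts to a small decision table.  A
minimal sketch, assuming a hypothetical @code{toy_zero_length_insert}
function and result enumeration that do not exist in the XEmacs sources:

```c
/* Where inserted text ends up relative to a zero-length extent. */
enum toy_insert_effect {
    TOY_INSERT_EXPANDS,   /* text ends up inside the extent */
    TOY_INSERT_AFTER,     /* text goes after the extent */
    TOY_INSERT_BEFORE,    /* text goes before the extent */
    TOY_INSERT_INVALID    /* open-open zero-length extents cannot exist */
};

/* START_OPEN and END_OPEN are nonzero if that endpoint is open. */
enum toy_insert_effect
toy_zero_length_insert(int start_open, int end_open)
{
    if (!start_open && !end_open)
        return TOY_INSERT_EXPANDS;      /* closed-closed */
    if (!start_open && end_open)
        return TOY_INSERT_AFTER;        /* closed-open */
    if (start_open && !end_open)
        return TOY_INSERT_BEFORE;       /* open-closed */
    return TOY_INSERT_INVALID;          /* open-open */
}
```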
9271
9272   Note that closed-open, non-detachable zero-length extents behave
9273 exactly like markers and that open-closed, non-detachable zero-length
9274 extents behave like the ``point-type'' marker in Mule.
9275
9276 @node Mathematics of Extent Ordering
9277 @section Mathematics of Extent Ordering
9278 @cindex mathematics of extent ordering
9279 @cindex extent mathematics
9280 @cindex extent ordering
9281
9282 @cindex display order of extents
9283 @cindex extents, display order
  The extents in a buffer are ordered by ``display order'' because that
is the order in which the redisplay mechanism needs to process them.
9286 The e-order is an auxiliary ordering used to facilitate operations
9287 over extents.  The operations that can be performed on the ordered
9288 list of extents in a buffer are
9289
9290 @enumerate
9291 @item
9292 Locate where an extent would go if inserted into the list.
9293 @item
9294 Insert an extent into the list.
9295 @item
9296 Remove an extent from the list.
9297 @item
9298 Map over all the extents that overlap a range.
9299 @end enumerate
9300
9301   (4) requires being able to determine the first and last extents
9302 that overlap a range.
9303
9304   NOTE: @dfn{overlap} is used as follows:
9305
9306 @itemize @bullet
9307 @item
9308 two ranges overlap if they have at least one point in common.
9309 Whether the endpoints are open or closed makes a difference here.
9310 @item
9311 a point overlaps a range if the point is contained within the
9312 range; this is equivalent to treating a point @math{P} as the range
9313 @math{[P, P]}.
9314 @item
9315 In the case of an @emph{extent} overlapping a point or range, the extent
9316 is normally treated as having closed endpoints.  This applies
9317 consistently in the discussion of stacks of extents and such below.
9318 Note that this definition of overlap is not necessarily consistent with
9319 the extents that @code{map-extents} maps over, since @code{map-extents}
sometimes pays attention to whether the endpoints of an extent are open
9321 or closed.  But for our purposes, it greatly simplifies things to treat
9322 all extents as having closed endpoints.
9323 @end itemize
9324
9325 First, define @math{>}, @math{<}, @math{<=}, etc. as applied to extents
9326 to mean comparison according to the display order.  Comparison between
9327 an extent @math{E} and an index @math{I} means comparison between
9328 @math{E} and the range @math{[I, I]}.
9329
9330 Also define @math{e>}, @math{e<}, @math{e<=}, etc. to mean comparison
9331 according to the e-order.
9332
9333 For any range @math{R}, define @math{R(0)} to be the starting index of
9334 the range and @math{R(1)} to be the ending index of the range.
9335
9336 For any extent @math{E}, define @math{E(next)} to be the extent directly
9337 following @math{E}, and @math{E(prev)} to be the extent directly
9338 preceding @math{E}.  Assume @math{E(next)} and @math{E(prev)} can be
9339 determined from @math{E} in constant time.  (This is because we store
9340 the extent list as a doubly linked list.)
9341
9342 Similarly, define @math{E(e-next)} and @math{E(e-prev)} to be the
9343 extents directly following and preceding @math{E} in the e-order.
9344
9345 Now:
9346
9347 Let @math{R} be a range.
9348 Let @math{F} be the first extent overlapping @math{R}.
9349 Let @math{L} be the last extent overlapping @math{R}.
9350
9351 Theorem 1: @math{R(1)} lies between @math{L} and @math{L(next)},
9352 i.e. @math{L <= R(1) < L(next)}.
9353
9354   This follows easily from the definition of display order.  The
9355 basic reason that this theorem applies is that the display order
9356 sorts by increasing starting index.
9357
9358   Therefore, we can determine @math{L} just by looking at where we would
9359 insert @math{R(1)} into the list, and if we know @math{F} and are moving
9360 forward over extents, we can easily determine when we've hit @math{L} by
9361 comparing the extent we're at to @math{R(1)}.
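
The stopping test can be sketched as follows.  This is a simplified
illustration, not the real mapping code: it assumes a plain array sorted
in display order, uses a hypothetical @code{toy_extent} type, treats all
endpoints as closed, and for brevity starts scanning at the head of the
list rather than at @math{F}.  Each extent before the stopping point is
still tested against @math{R(0)}, since an early extent may end before the
range begins.

```c
#include <stddef.h>

/* Hypothetical extent: just its two endpoints, both treated as closed. */
struct toy_extent {
    long start;
    long end;
};

/* Count the extents in LIST (sorted in display order) that overlap the
   range [r0, r1].  The scan stops at the first extent whose start lies
   beyond r1, because display order sorts by increasing start. */
size_t
toy_count_overlapping(const struct toy_extent *list, size_t n,
                      long r0, long r1)
{
    size_t hits = 0;

    for (size_t i = 0; i < n; i++) {
        if (list[i].start > r1)
            break;              /* past R(1): nothing later can overlap */
        if (list[i].end >= r0)
            hits++;             /* starts by r1 and ends at or after r0 */
    }
    return hits;
}
```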
9362
Theorem 2: @math{F(e-prev) e< [1, R(0)] e<= F}.
9366
9367   This is the analog of Theorem 1, and applies because the e-order
9368 sorts by increasing ending index.
9369
9370   Therefore, @math{F} can be found in the same amount of time as
9371 operation (1), i.e. the time that it takes to locate where an extent
9372 would go if inserted into the e-order list.
9373
9374   If the lists were stored as balanced binary trees, then operation (1)
9375 would take logarithmic time, which is usually quite fast.  However,
9376 currently they're stored as simple doubly-linked lists, and instead we
9377 do some caching to try to speed things up.
9378
9379   Define a @dfn{stack of extents} (or @dfn{SOE}) as the set of extents
9380 (ordered in the display order) that overlap an index @math{I}, together
9381 with the SOE's @dfn{previous} extent, which is an extent that precedes
9382 @math{I} in the e-order. (Hopefully there will not be very many extents
9383 between @math{I} and the previous extent.)
9384
9385 Now:
9386
9387 Let @math{I} be an index, let @math{S} be the stack of extents on
9388 @math{I}, let @math{F} be the first extent in @math{S}, and let @math{P}
9389 be @math{S}'s previous extent.
9390
9391 Theorem 3: The first extent in @math{S} is the first extent that overlaps
9392 any range @math{[I, J]}.
9393
9394 Proof: Any extent that overlaps @math{[I, J]} but does not include
9395 @math{I} must have a start index @math{> I}, and thus be greater than
9396 any extent in @math{S}.
9397
9398 Therefore, finding the first extent that overlaps a range @math{R} is
9399 the same as finding the first extent that overlaps @math{R(0)}.
9400
9401 Theorem 4: Let @math{I2} be an index such that @math{I2 > I}, and let
9402 @math{F2} be the first extent that overlaps @math{I2}.  Then, either
9403 @math{F2} is in @math{S} or @math{F2} is greater than any extent in
9404 @math{S}.
9405
9406 Proof: If @math{F2} does not include @math{I} then its start index is
9407 greater than @math{I} and thus it is greater than any extent in
9408 @math{S}, including @math{F}.  Otherwise, @math{F2} includes @math{I}
9409 and thus is in @math{S}, and thus @math{F2 >= F}.
9410
9411 @node Extent Fragments
9412 @section Extent Fragments
9413 @cindex extent fragments
9414 @cindex fragments, extent
9415
9416   Imagine that the buffer is divided up into contiguous, non-overlapping
9417 @dfn{runs} of text such that no extent starts or ends within a run
9418 (extents that abut the run don't count).
9419
9420   An extent fragment is a structure that holds data about the run that
9421 contains a particular buffer position (if the buffer position is at the
9422 junction of two runs, the run after the position is used)---the
9423 beginning and end of the run, a list of all of the extents in that run,
9424 the @dfn{merged face} that results from merging all of the faces
9425 corresponding to those extents, the begin and end glyphs at the
9426 beginning of the run, etc.  This is the information that redisplay needs
9427 in order to display this run.
9428
9429   Extent fragments have to be very quick to update to a new buffer
9430 position when moving linearly through the buffer.  They rely on the
9431 stack-of-extents code, which does the heavy-duty algorithmic work of
determining which extents overlie a particular position.
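
To make the notion of a run concrete, here is a minimal sketch that finds
the boundaries of the run containing a position by brute force.  It is an
illustration only: the @code{toy_} names are hypothetical, and the real
extent-fragment code relies on the stack of extents instead of scanning
every extent.

```c
#include <stddef.h>

/* Hypothetical extent: just its two endpoints. */
struct toy_extent {
    long start;
    long end;
};

/* Boundaries of one run of text. */
struct toy_run {
    long begin;
    long end;
};

/* Return the run containing POS: the nearest extent endpoint at or
   before POS, and the nearest extent endpoint after POS.  When POS sits
   on a junction, the run after the position is chosen.  The buffer
   bounds are used when no endpoint lies on that side. */
struct toy_run
toy_run_containing(const struct toy_extent *ext, size_t n,
                   long buf_begin, long buf_end, long pos)
{
    struct toy_run run = { buf_begin, buf_end };

    for (size_t i = 0; i < n; i++) {
        long edges[2] = { ext[i].start, ext[i].end };

        for (int k = 0; k < 2; k++) {
            long e = edges[k];

            if (e <= pos && e > run.begin)
                run.begin = e;
            if (e > pos && e < run.end)
                run.end = e;
        }
    }
    return run;
}
```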
9433
9434 @node Faces, Glyphs, Extents, Top
9435 @chapter Faces
9436 @cindex faces
9437
9438 Not yet documented.
9439
9440 @node Glyphs, Specifiers, Faces, Top
9441 @chapter Glyphs
9442 @cindex glyphs
9443
9444 Glyphs are graphical elements that can be displayed in XEmacs buffers or
9445 gutters. We use the term graphical element here in the broadest possible
9446 sense since glyphs can be as mundane as text or as arcane as a native
9447 tab widget.
9448
9449 In XEmacs, glyphs represent the uninstantiated state of graphical
9450 elements, i.e. they hold all the information necessary to produce an
9451 image on-screen but the image need not exist at this stage, and multiple
9452 screen images can be instantiated from a single glyph.
9453
Glyphs are lazily instantiated by calling one of the glyph
functions.  This usually occurs within redisplay when
@code{Fglyph_height} is called.  Instantiation causes an image-instance
to be created and cached.  This cache is on a per-device basis for all glyphs
except widget-glyphs, and on a per-window basis for widget-glyphs.  The
caching is done by @code{image_instantiate} and is necessary because it
is generally possible to display an image-instance in multiple
domains.  For instance, if we create a Pixmap, we can display it in
multiple windows even though only a single Pixmap instance is needed.
If caching weren't done, it would be necessary to create an
image-instance for every displayable occurrence of a glyph, and for
every usage, which would be extremely memory and CPU intensive.
9466
Widget-glyphs (also known as native widgets) are not cached in this way.
This is because widget-glyph image-instances on screen are toolkit
windows, and thus cannot be reused in multiple XEmacs domains.  Thus
widget-glyphs are cached on an XEmacs window basis.
9471
9472 Any action on a glyph first consults the cache before actually
9473 instantiating a widget.
9474
9475 @section Glyph Instantiation
9476 @cindex glyph instantiation
9477 @cindex instantiation, glyph
9478
Glyph instantiation is a hairy topic and requires some explanation.  The
guts of glyph instantiation are contained within
@code{image_instantiate}.  A glyph contains an image, which is a
specifier.  When a glyph function (for instance @code{Fglyph_height})
asks for a property of the glyph that can only be determined from its
instantiated state, the glyph image is instantiated and an image
instance is created.  The instantiation process is governed by the
specifier code and goes through a series of steps:
9487
9488 @itemize @bullet
9489 @item
Validation.  Instantiation of image instances happens dynamically, often
within the guts of redisplay.  Thus it is often not feasible to catch
instantiator errors at instantiation time.  Instead the instantiator is
validated at the time it is added to the image specifier.  This step is
performed by @code{image_validate}, which at a simple level validates
keyword-value pairs.
@item
Duplication.  The specifier code by default takes a copy of the
instantiator.  This is reasonable for most specifiers but in the case of
widget-glyphs can be problematic, since some of the properties in the
instantiator (for instance callbacks) could cause infinite recursion
in the copying process.  Thus the image code defines a function,
@code{image_copy_instantiator}, which will selectively copy values.
This is controlled by the way that a keyword is defined, using either
@code{IIFORMAT_VALID_KEYWORD} or
@code{IIFORMAT_VALID_NONCOPY_KEYWORD}.  Note that the image caching and
redisplay code relies on instantiator copying to ensure that current and
new instantiators are actually different rather than referring to the
same thing.
@item
Normalization.  Once the instantiator has been copied it must be
converted into a form that is viable at instantiation time.  This can
involve no changes at all, but typically involves things like converting
file names to the actual data.  This step is performed by
@code{image_going_to_add} and @code{normalize_image_instantiator}.
9515 @item
9516 Instantiation. When an image instance is actually required for display
9517 it is instantiated using @code{image_instantiate}. This involves calling
9518 instantiate methods that are specific to the type of image being
9519 instantiated.
9520 @end itemize
9521
9522 The final instantiation phase also involves a number of steps. In order
9523 to understand these we need to describe a number of concepts.
9524
An image is instantiated in a @dfn{domain}, where a domain can be any
one of a device, frame, window or image-instance.  The domain gives the
image-instance context and identity, and properties that affect the
appearance of the image-instance may differ for the same glyph
instantiated in different domains.  An example is the face used to
display the image-instance.
9531
Although an image is instantiated in a particular domain, the
instantiation domain is not necessarily the domain in which the
image-instance is cached.  For example, a pixmap can be instantiated in a
window but actually be cached on a per-device basis.  The domain in which
the image-instance is actually cached is called the
@dfn{governing-domain}.  A governing-domain is currently either a device
or a window.  Widget-glyphs and text-glyphs have a window as a
governing-domain; all other image-instances have a device as the
governing-domain.  The governing domain for an image-instance is
determined using the @code{governing_domain} image-instance method.
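
The rule for choosing a governing-domain can be summarized in a few
lines.  The sketch below is purely illustrative: the enumerations and
@code{toy_governing_domain} are hypothetical names, and the real
@code{governing_domain} method is dispatched per image-instance type
rather than through a single switch.

```c
/* Hypothetical image-instance types, for illustration only. */
enum toy_image_type {
    TOY_IMAGE_TEXT,
    TOY_IMAGE_PIXMAP,
    TOY_IMAGE_WIDGET
};

/* The two kinds of governing-domain named in the text. */
enum toy_domain {
    TOY_DOMAIN_DEVICE,
    TOY_DOMAIN_WINDOW
};

/* Widget-glyphs and text-glyphs are cached per window;
   everything else is cached per device. */
enum toy_domain
toy_governing_domain(enum toy_image_type type)
{
    switch (type) {
    case TOY_IMAGE_WIDGET:
    case TOY_IMAGE_TEXT:
        return TOY_DOMAIN_WINDOW;
    default:
        return TOY_DOMAIN_DEVICE;
    }
}
```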
9542
9543 @section Widget-Glyphs
9544 @cindex widget-glyphs
9545
9546 @section Widget-Glyphs in the MS-Windows Environment
9547 @cindex widget-glyphs in the MS-Windows environment
9548 @cindex MS-Windows environment, widget-glyphs in the
9549
9550 To Do
9551
9552 @section Widget-Glyphs in the X Environment
9553 @cindex widget-glyphs in the X environment
9554 @cindex X environment, widget-glyphs in the
9555
9556 Widget-glyphs under X make heavy use of lwlib (@pxref{Lucid Widget
9557 Library}) for manipulating the native toolkit objects. This is primarily
9558 so that different toolkits can be supported for widget-glyphs, just as
9559 they are supported for features such as menubars etc.
9560
9561 Lwlib is extremely poorly documented and quite hairy so here is my
9562 understanding of what goes on.
9563
Lwlib maintains a set of @code{widget_instance}s which mirror the
hierarchical state of Xt widgets.  I think this is so that widgets can be
updated and manipulated generically by the lwlib library.  For instance,
@code{update_one_widget_instance} can cope with multiple types of widget
and multiple types of toolkit.  Each element in the widget hierarchy is
updated from its corresponding @code{widget_instance} by walking the
@code{widget_instance} tree recursively.

This has desirable properties: for example, @code{lw_modify_all_widgets},
which is called from @file{glyphs-x.c}, updates all the properties of a
widget without having to know what the widget is or what toolkit it is
from.  Unfortunately this also has hairy properties, such as making the
lwlib code quite complex.  And of course lwlib has to know at some level
what the widget is and how to set its properties.
9578
9579 @node Specifiers, Menus, Glyphs, Top
9580 @chapter Specifiers
9581 @cindex specifiers
9582
9583 Not yet documented.
9584
9585 @node Menus, Subprocesses, Specifiers, Top
9586 @chapter Menus
9587 @cindex menus
9588
9589   A menu is set by setting the value of the variable
9590 @code{current-menubar} (which may be buffer-local) and then calling
9591 @code{set-menubar-dirty-flag} to signal a change.  This will cause the
9592 menu to be redrawn at the next redisplay.  The format of the data in
9593 @code{current-menubar} is described in @file{menubar.c}.
9594
  Internally the data in @code{current-menubar} is parsed into a tree of
@code{widget_value}s (defined in @file{lwlib.h}); this is accomplished
by the recursive function @code{menu_item_descriptor_to_widget_value()},
called by @code{compute_menubar_data()}.  Such a tree is deallocated
using @code{free_widget_value()}.
9600
9601   @code{update_screen_menubars()} is one of the external entry points.
9602 This checks to see, for each screen, if that screen's menubar needs to
9603 be updated.  This is the case if
9604
9605 @enumerate
9606 @item
@code{set-menubar-dirty-flag} was called since the last redisplay.  (This
function sets the C variable @code{menubar_has_changed}.)
9609 @item
9610 The buffer displayed in the screen has changed.
9611 @item
9612 The screen has no menubar currently displayed.
9613 @end enumerate
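
The three conditions above combine into a single predicate.  A minimal
sketch, assuming a hypothetical @code{toy_screen} structure whose fields
are invented for illustration (only @code{menubar_has_changed} is a name
taken from the text):

```c
#include <stdbool.h>

/* Hypothetical per-screen state; none of these fields are real. */
struct toy_screen {
    bool has_menubar;       /* is a menubar currently displayed? */
    int shown_buffer;       /* id of the buffer displayed now */
    int menubar_buffer;     /* id of the buffer the menubar was built for */
};

/* True if the screen's menubar needs to be recomputed. */
bool
toy_menubar_needs_update(const struct toy_screen *s,
                         bool menubar_has_changed)
{
    return menubar_has_changed                    /* condition 1 */
        || s->shown_buffer != s->menubar_buffer   /* condition 2 */
        || !s->has_menubar;                       /* condition 3 */
}
```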
9614
9615   @code{set_screen_menubar()} is called for each such screen.  This
9616 function calls @code{compute_menubar_data()} to create the tree of
@code{widget_value}s, then calls @code{lw_create_widget()},
9618 @code{lw_modify_all_widgets()}, and/or @code{lw_destroy_all_widgets()}
9619 to create the X-Toolkit widget associated with the menu.
9620
9621   @code{update_psheets()}, the other external entry point, actually
9622 changes the menus being displayed.  It uses the widgets fixed by
9623 @code{update_screen_menubars()} and calls various X functions to ensure
9624 that the menus are displayed properly.
9625
9626   The menubar widget is set up so that @code{pre_activate_callback()} is
9627 called when the menu is first selected (i.e. mouse button goes down),
9628 and @code{menubar_selection_callback()} is called when an item is
selected.  @code{pre_activate_callback()} calls the function in
@code{activate-menubar-hook}, which can change the menubar (this is described
in @file{menubar.c}).  If the menubar is changed,
9632 @code{set_screen_menubars()} is called.
9633 @code{menubar_selection_callback()} enqueues a menu event, putting in it
9634 a function to call (either @code{eval} or @code{call-interactively}) and
9635 its argument, which is the callback function or form given in the menu's
9636 description.
9637
9638 @node Subprocesses, Interface to the X Window System, Menus, Top
9639 @chapter Subprocesses
9640 @cindex subprocesses
9641
9642   The fields of a process are:
9643
9644 @table @code
9645 @item name
9646 A string, the name of the process.
9647
9648 @item command
9649 A list containing the command arguments that were used to start this
9650 process.
9651
9652 @item filter
9653 A function used to accept output from the process instead of a buffer,
9654 or @code{nil}.
9655
9656 @item sentinel
9657 A function called whenever the process receives a signal, or @code{nil}.
9658
9659 @item buffer
9660 The associated buffer of the process.
9661
9662 @item pid
9663 An integer, the Unix process @sc{id}.
9664
9665 @item childp
9666 A flag, non-@code{nil} if this is really a child process.
9667 It is @code{nil} for a network connection.
9668
9669 @item mark
9670 A marker indicating the position of the end of the last output from this
9671 process inserted into the buffer.  This is often but not always the end
9672 of the buffer.
9673
9674 @item kill_without_query
9675 If this is non-@code{nil}, killing XEmacs while this process is still
9676 running does not ask for confirmation about killing the process.
9677
9678 @item raw_status_low
9679 @itemx raw_status_high
9680 These two fields record 16 bits each of the process status returned by
9681 the @code{wait} system call.
9682
9683 @item status
9684 The process status, as @code{process-status} should return it.
9685
9686 @item tick
9687 @itemx update_tick
9688 If these two fields are not equal, a change in the status of the process
9689 needs to be reported, either by running the sentinel or by inserting a
9690 message in the process buffer.
9691
9692 @item pty_flag
9693 Non-@code{nil} if communication with the subprocess uses a @sc{pty};
9694 @code{nil} if it uses a pipe.
9695
9696 @item infd
9697 The file descriptor for input from the process.
9698
9699 @item outfd
9700 The file descriptor for output to the process.
9701
9702 @item subtty
9703 The file descriptor for the terminal that the subprocess is using.  (On
9704 some systems, there is no need to record this, so the value is
9705 @code{-1}.)
9706
9707 @item tty_name
9708 The name of the terminal that the subprocess is using,
9709 or @code{nil} if it is using pipes.
9710 @end table
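
Two of the fields above lend themselves to a short illustration: the
split of the wait status into two 16-bit halves, and the tick comparison
that signals a pending status report.  The sketch assumes a hypothetical
@code{toy_process} structure with plain C integers; in the real process
object these fields are not declared this way.

```c
/* Hypothetical subset of the process fields, as plain C integers. */
struct toy_process {
    unsigned raw_status_low;    /* low 16 bits of the wait status */
    unsigned raw_status_high;   /* high 16 bits of the wait status */
    int tick;
    int update_tick;
};

/* Record STATUS as two 16-bit halves. */
void
toy_store_raw_status(struct toy_process *p, unsigned long status)
{
    p->raw_status_low = (unsigned) (status & 0xFFFF);
    p->raw_status_high = (unsigned) ((status >> 16) & 0xFFFF);
}

/* Reassemble the 32-bit status from its two halves. */
unsigned long
toy_load_raw_status(const struct toy_process *p)
{
    return ((unsigned long) p->raw_status_high << 16) | p->raw_status_low;
}

/* Nonzero if a status change still needs to be reported. */
int
toy_status_change_pending(const struct toy_process *p)
{
    return p->tick != p->update_tick;
}
```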
9711
9712 @node Interface to the X Window System, Index, Subprocesses, Top
9713 @chapter Interface to the X Window System
9714 @cindex X Window System, interface to the
9715
9716 Mostly undocumented.
9717
9718 @menu
9719 * Lucid Widget Library::        An interface to various widget sets.
9720 @end menu
9721
9722 @node Lucid Widget Library
9723 @section Lucid Widget Library
9724 @cindex Lucid Widget Library
9725 @cindex widget library, Lucid
9726 @cindex library, Lucid Widget
9727
9728 Lwlib is extremely poorly documented and quite hairy.  The author(s)
9729 blame that on X, Xt, and Motif, with some justice, but also sufficient
9730 hypocrisy to avoid drawing the obvious conclusion about their own work.
9731
9732 The Lucid Widget Library is composed of two more or less independent
9733 pieces.  The first, as the name suggests, is a set of widgets.  These
9734 widgets are intended to resemble and improve on widgets provided in the
9735 Motif toolkit but not in the Athena widgets, including menubars and
9736 scrollbars.  Recent additions by Andy Piper integrate some ``modern''
9737 widgets by Edward Falk, including checkboxes, radio buttons, progress
9738 gauges, and index tab controls (aka notebooks).
9739
9740 The second piece of the Lucid widget library is a generic interface to
9741 several toolkits for X (including Xt, the Athena widget set, and Motif,
9742 as well as the Lucid widgets themselves) so that core XEmacs code need
9743 not know which widget set has been used to build the graphical user
9744 interface.
9745
9746 @menu
9747 * Generic Widget Interface::    The lwlib generic widget interface.
9748 * Scrollbars::
9749 * Menubars::
9750 * Checkboxes and Radio Buttons::
9751 * Progress Bars::
9752 * Tab Controls::
9753 @end menu
9754
9755 @node Generic Widget Interface
9756 @subsection Generic Widget Interface
9757 @cindex widget interface, generic
9758
9759 In general in any toolkit a widget may be a composite object.  In Xt,
9760 all widgets have an X window that they manage, but typically a complex
9761 widget will have widget children, each of which manages a subwindow of
9762 the parent widget's X window.  These children may themselves be
9763 composite widgets.  Thus a widget is actually a tree or hierarchy of
9764 widgets.
9765
For each toolkit widget, lwlib maintains a tree of @code{widget_value}s
which mirrors the hierarchical state of Xt widgets (including Motif,
Athena, 3D Athena, and Falk's widget sets).  Each @code{widget_value}
has a @code{contents} member, which points to the head of a linked list of
its children.  The linked list of siblings is chained through the
@code{next} member of @code{widget_value}.
9772
9773 @example
9774            +-----------+
9775            | composite |
9776            +-----------+
9777                  |
9778                  | contents
9779                  V
9780              +-------+ next +-------+ next +-------+
9781              | child |----->| child |----->| child |
9782              +-------+      +-------+      +-------+
9783                                 |
9784                                 | contents
9785                                 V
9786                          +-------------+ next +-------------+
9787                          | grand child |----->| grand child |
9788                          +-------------+      +-------------+
9789
9790 The @code{widget_value} hierarchy of a composite widget with two simple
9791 children and one composite child.
9792 @end example
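
The @code{contents}/@code{next} linkage shown in the diagram can be walked
with a short recursive function.  This is a sketch under stated
assumptions: @code{toy_widget_value} keeps only the two link members named
in the text plus an invented @code{name} field, whereas the real
@code{widget_value} in @file{lwlib.h} has many more members.

```c
#include <stddef.h>

/* Hypothetical cut-down widget_value: only the links matter here. */
struct toy_widget_value {
    const char *name;                   /* invented, for illustration */
    struct toy_widget_value *contents;  /* head of the list of children */
    struct toy_widget_value *next;      /* next sibling */
};

/* Count WV, its later siblings, and all of their descendants. */
size_t
toy_count_widget_values(const struct toy_widget_value *wv)
{
    size_t n = 0;

    for (; wv != NULL; wv = wv->next)
        n += 1 + toy_count_widget_values(wv->contents);
    return n;
}
```

Applied to the root of the hierarchy in the diagram (one composite, three
children, two grandchildren), the walk visits six nodes.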

The @code{widget_instance} structure maintains the inverse view of the
tree.  As with @code{widget_value}, siblings are chained through the
@code{next} member.  However, rather than naming its children, each
@code{widget_instance} links to its parent.

@example
           +-----------+
           | composite |
           +-----------+
                 A
                 | parent
                 |
             +-------+ next +-------+ next +-------+
             | child |----->| child |----->| child |
             +-------+      +-------+      +-------+
                                A
                                | parent
                                |
                         +-------------+ next +-------------+
                         | grand child |----->| grand child |
                         +-------------+      +-------------+

The @code{widget_instance} hierarchy of a composite widget with two
simple children and one composite child.
@end example
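
The inverse view can be sketched in the same way.  This is an
illustration under stated assumptions, not the lwlib definition:
@code{sketch_instance}, @code{instance_depth}, @code{instance_root} and
@code{example_grandchild_depth} are hypothetical names, and the
structure keeps only the @code{parent} and @code{next} links shown in
the diagram above.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for lwlib's widget_instance.  Only
   the upward `parent' link and the sibling `next' link are modelled. */
typedef struct sketch_instance
{
  const char *name;               /* label used only in this sketch */
  struct sketch_instance *parent; /* NULL at the top of the tree */
  struct sketch_instance *next;   /* next sibling */
} sketch_instance;

/* Number of parent links between INST and the top of its tree. */
int
instance_depth (const sketch_instance *inst)
{
  int depth = 0;

  assert (inst != NULL);
  while (inst->parent != NULL)
    {
      inst = inst->parent;
      depth++;
    }
  return depth;
}

/* Follow parent links from INST until the top of the tree is reached. */
const sketch_instance *
instance_root (const sketch_instance *inst)
{
  assert (inst != NULL);
  while (inst->parent != NULL)
    inst = inst->parent;
  return inst;
}

/* Build the hierarchy drawn in the diagram and return the depth of the
   first grand child. */
int
example_grandchild_depth (void)
{
  sketch_instance composite = { "composite", NULL, NULL };
  sketch_instance child3 = { "child 3", &composite, NULL };
  sketch_instance child2 = { "child 2", &composite, &child3 };
  sketch_instance child1 = { "child 1", &composite, &child2 };
  sketch_instance grand2 = { "grand child 2", &child2, NULL };
  sketch_instance grand1 = { "grand child 1", &child2, &grand2 };

  assert (instance_root (&grand1) == &composite);
  assert (instance_root (&child1) == &composite);
  return instance_depth (&grand1);
}
```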

This permits widgets derived from different toolkits to be updated and
manipulated generically by the lwlib library.  For instance,
@code{update_one_widget_instance} can cope with multiple types of widget
and multiple types of toolkit.  Each element in the widget hierarchy is
updated from its corresponding @code{widget_value} by walking the
@code{widget_value} tree.  This has desirable properties.  For example,
@code{lw_modify_all_widgets} is called from @file{glyphs-x.c} and
updates all the properties of a widget without having to know what the
widget is or what toolkit it is from.  Unfortunately, it also has its
hairy side: the lwlib code is quite complex, and lwlib still has to
know at some level what the widget is and how to set its properties.
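
The general shape of such a toolkit-independent update can be sketched
as a tree walk that dispatches on the toolkit of each node.  Everything
in the example is an assumption made for illustration: the
@code{sketch_} names, the two-entry toolkit enumeration, the single
integer property, and the merging of value and instance into one node
are not taken from lwlib, whose real update code is considerably more
involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toolkit tags; lwlib's own dispatch is more elaborate. */
enum sketch_toolkit { SKETCH_MOTIF, SKETCH_ATHENA, SKETCH_TOOLKIT_COUNT };

/* One node holds both the desired state (the "value" side) and the
   state last pushed to the toolkit (the "instance" side). */
typedef struct sketch_node
{
  enum sketch_toolkit toolkit;
  int desired;                  /* state requested by the value tree */
  int shown;                    /* state the toolkit widget displays */
  struct sketch_node *contents; /* first child */
  struct sketch_node *next;     /* next sibling */
} sketch_node;

typedef void (*sketch_updater) (sketch_node *);

/* Stand-ins for toolkit-specific code; real code would call into the
   toolkit here rather than copy an integer. */
static void update_motif (sketch_node *n) { n->shown = n->desired; }
static void update_athena (sketch_node *n) { n->shown = n->desired; }

static const sketch_updater updaters[SKETCH_TOOLKIT_COUNT] =
  { update_motif, update_athena };

/* Walk NODE, its siblings and all their descendants, updating every
   node whose displayed state is stale.  Returns the number updated. */
int
sketch_update_tree (sketch_node *node)
{
  int updated = 0;

  for (; node != NULL; node = node->next)
    {
      if (node->shown != node->desired)
        {
          updaters[node->toolkit] (node);
          updated++;
        }
      updated += sketch_update_tree (node->contents);
    }
  return updated;
}

/* A Motif parent with two Athena children, one already up to date. */
int
sketch_update_example (void)
{
  sketch_node leaf2 = { SKETCH_ATHENA, 7, 7, NULL, NULL };
  sketch_node leaf1 = { SKETCH_ATHENA, 5, 0, NULL, &leaf2 };
  sketch_node root = { SKETCH_MOTIF, 1, 0, &leaf1, NULL };
  int updated = sketch_update_tree (&root);

  assert (root.shown == 1 && leaf1.shown == 5 && leaf2.shown == 7);
  return updated;
}
```

The caller never inspects the toolkit tag itself; only the dispatch
table does, which is the property the text above describes.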

The @code{widget_instance} structure also contains a pointer to the root
of its tree.  Widget instances are further confi


@node Scrollbars
@subsection Scrollbars
@cindex scrollbars

@node Menubars
@subsection Menubars
@cindex menubars

@node Checkboxes and Radio Buttons
@subsection Checkboxes and Radio Buttons
@cindex checkboxes and radio buttons
@cindex radio buttons, checkboxes and
@cindex buttons, checkboxes and radio

@node Progress Bars
@subsection Progress Bars
@cindex progress bars
@cindex bars, progress

@node Tab Controls
@subsection Tab Controls
@cindex tab controls

@include index.texi

@c Print the tables of contents
@summarycontents
@contents
@c That's all

@bye