   1 This is ../info/internals.info, produced by makeinfo version 4.0 from
   2 internals/internals.texi.
   3
   4 INFO-DIR-SECTION XEmacs Editor
   5 START-INFO-DIR-ENTRY
   6 * Internals: (internals).       XEmacs Internals Manual.
   7 END-INFO-DIR-ENTRY
   8
   9    Copyright (C) 1992 - 1996 Ben Wing.  Copyright (C) 1996, 1997 Sun
  10 Microsystems.  Copyright (C) 1994 - 1998 Free Software Foundation.
  11 Copyright (C) 1994, 1995 Board of Trustees, University of Illinois.
  12
  13    Permission is granted to make and distribute verbatim copies of this
  14 manual provided the copyright notice and this permission notice are
  15 preserved on all copies.
  16
  17    Permission is granted to copy and distribute modified versions of
  18 this manual under the conditions for verbatim copying, provided that the
  19 entire resulting derived work is distributed under the terms of a
  20 permission notice identical to this one.
  21
  22    Permission is granted to copy and distribute translations of this
  23 manual into another language, under the above conditions for modified
  24 versions, except that this permission notice may be stated in a
  25 translation approved by the Foundation.
  26
  27    Permission is granted to copy and distribute modified versions of
  28 this manual under the conditions for verbatim copying, provided also
  29 that the section entitled "GNU General Public License" is included
  30 exactly as in the original, and provided that the entire resulting
  31 derived work is distributed under the terms of a permission notice
  32 identical to this one.
  33
  34    Permission is granted to copy and distribute translations of this
  35 manual into another language, under the above conditions for modified
  36 versions, except that the section entitled "GNU General Public License"
  37 may be included in a translation approved by the Free Software
  38 Foundation instead of in the original English.
  39
  40 \1f
  41 File: internals.info,  Node: The XEmacs Object System (Abstractly Speaking),  Next: How Lisp Objects Are Represented in C,  Prev: XEmacs From the Inside,  Up: Top
  42
  43 The XEmacs Object System (Abstractly Speaking)
  44 **********************************************
  45
  46    At the heart of the Lisp interpreter is its management of objects.
  47 XEmacs Lisp contains many built-in objects, some of which are simple
  48 and others of which can be very complex; and some of which are very
  49 common, and others of which are rarely used or are only used
  50 internally. (Since the Lisp allocation system, with its automatic
  51 reclamation of unused storage, is so much more convenient than
  52 `malloc()' and `free()', the C code makes extensive use of it in its
  53 internal operations.)
  54
  55    The basic Lisp objects are
  56
  57 `integer'
  58      28 or 31 bits of precision, or 60 or 63 bits on 64-bit machines;
  59      the reason for this is described below when the internal Lisp
  60      object representation is described.
  61
  62 `float'
  63      Same precision as a double in C.
  64
  65 `cons'
  66      A simple container for two Lisp objects, used to implement lists
  67      and most other data structures in Lisp.
  68
  69 `char'
  70      An object representing a single character of text; chars behave
  71      like integers in many ways but are logically considered text
  72      rather than numbers and have a different read syntax. (The read
  73      syntax for a char contains the char itself or some textual
  74      encoding of it--for example, a Japanese Kanji character might be
  75      encoded as `^[$(B#&^[(B' using the ISO-2022 encoding
  76      standard--rather than the numerical representation of the char;
  77      this way, if the mapping between chars and integers changes, which
  78      is quite possible for Kanji characters and other extended
  79      characters, the same character will still be created.  Note that
  80      some primitives confuse chars and integers.  The worst culprit is
  81      `eq', which makes a special exception and considers a char to be
  82      `eq' to its integer equivalent, even though in no other case are
  83      objects of two different types `eq'.  The reason for this
  84      monstrosity is compatibility with existing code; the separation of
  85      char from integer came fairly recently.)
  86
  87 `symbol'
  88      An object that contains Lisp objects and is referred to by name;
  89      symbols are used to implement variables and named functions and to
  90      provide the equivalent of preprocessor constants in C.
  91
  92 `vector'
  93      A one-dimensional array of Lisp objects providing constant-time
  94      access to any of the objects; access to an arbitrary object in a
  95      vector is faster than for lists, but the operations that can be
  96      done on a vector are more limited.
  97
  98 `string'
  99      Self-explanatory; behaves much like a vector of chars but has a
 100      different read syntax and is stored and manipulated more compactly.
 101
 102 `bit-vector'
 103      A vector of bits; similar to a string in spirit.
 104
 105 `compiled-function'
 106      An object containing compiled Lisp code, known as "byte code".
 107
 108 `subr'
 109      A Lisp primitive, i.e. a Lisp-callable function implemented in C.
 110
 111    Note that there is no basic "function" type, as in more powerful
 112 versions of Lisp (where it's called a "closure").  XEmacs Lisp does not
 113 provide the closure semantics implemented by Common Lisp and Scheme.
 114 The guts of a function in XEmacs Lisp are represented in one of four
 115 ways: a symbol specifying another function (when one function is an
 116 alias for another), a list (whose first element must be the symbol
 117 `lambda') containing the function's source code, a compiled-function
 118 object, or a subr object. (In other words, given a symbol specifying
 119 the name of a function, calling `symbol-function' to retrieve the
 120 contents of the symbol's function cell will return one of these types
 121 of objects.)
 122
 123    XEmacs Lisp also contains numerous specialized objects used to
 124 implement the editor:
 125
 126 `buffer'
 127      Stores text like a string, but is optimized for insertion and
 128      deletion and has certain other properties that can be set.
 129
 130 `frame'
 131      An object with various properties whose displayable representation
 132      is a "window" in window-system parlance.
 133
 134 `window'
 135      A section of a frame that displays the contents of a buffer; often
 136      called a "pane" in window-system parlance.
 137
 138 `window-configuration'
 139      An object that represents a saved configuration of windows in a
 140      frame.
 141
 142 `device'
 143      An object representing a screen on which frames can be displayed;
 144      equivalent to a "display" in the X Window System and a "TTY" in
 145      character mode.
 146
 147 `face'
 148      An object specifying the appearance of text or graphics; it has
 149      properties such as font, foreground color, and background color.
 150
 151 `marker'
 152      An object that refers to a particular position in a buffer and
 153      moves around as text is inserted and deleted to stay in the same
 154      relative position to the text around it.
 155
 156 `extent'
 157      Similar to a marker but covers a range of text in a buffer; can
 158      also specify properties of the text, such as a face in which the
 159      text is to be displayed, whether the text is invisible or
 160      unmodifiable, etc.
 161
 162 `event'
 163      Generated by calling `next-event' and contains information
 164      describing a particular event happening in the system, such as the
 165      user pressing a key or a process terminating.
 166
 167 `keymap'
 168      An object that maps from events (described using lists, vectors,
 169      and symbols rather than with an event object because the mapping
 170      is for classes of events, rather than individual events) to
 171      functions to execute or other events to recursively look up; the
 172      functions are described by name, using a symbol, or using lists to
 173      specify the function's code.
 174
 175 `glyph'
 176      An object that describes the appearance of an image (e.g.  pixmap)
 177      on the screen; glyphs can be attached to the beginning or end of
 178      extents and in some future version of XEmacs will be able to be
 179      inserted directly into a buffer.
 180
 181 `process'
 182      An object that describes a connection to an externally-running
 183      process.
 184
 185    There are some other, less-commonly-encountered general objects:
 186
 187 `hash-table'
 188      An object that maps from an arbitrary Lisp object to another
 189      arbitrary Lisp object, using hashing for fast lookup.
 190
 191 `obarray'
 192      A limited form of hash-table that maps from strings to symbols;
 193      obarrays are used to look up a symbol given its name and are not
 194      actually their own object type but are kludgily represented using
 195      vectors with hidden fields (this representation derives from GNU
 196      Emacs).
 197
 198 `specifier'
 199      A complex object used to specify the value of a display property; a
 200      default value is given and different values can be specified for
 201      particular frames, buffers, windows, devices, or classes of device.
 202
 203 `char-table'
 204      An object that maps from chars or classes of chars to arbitrary
 205      Lisp objects; internally char tables use a complex nested-vector
 206      representation that is optimized to the way characters are
 207      represented as integers.
 208
 209 `range-table'
 210      An object that maps from ranges of integers to arbitrary Lisp
 211      objects.
 212
 213    And some strange special-purpose objects:
 214
 215 `charset'
 216 `coding-system'
 217      Objects used when MULE, or multi-lingual/Asian-language, support is
 218      enabled.
 219
 220 `color-instance'
 221 `font-instance'
 222 `image-instance'
 223      An object that encapsulates a window-system resource; instances are
 224      mostly used internally but are exposed on the Lisp level for
 225      cleanness of the specifier model and because it's occasionally
 226      useful for Lisp programs to create or query the properties of
 227      instances.
 228
 229 `subwindow'
 230      An object that encapsulates a "subwindow" resource, i.e. a
 231      window-system child window that is drawn into by an external
 232      process; this object should be integrated into the glyph system
 233      but isn't yet, and may change form when this is done.
 234
 235 `tooltalk-message'
 236 `tooltalk-pattern'
 237      Objects that represent resources used in the ToolTalk interprocess
 238      communication protocol.
 239
 240 `toolbar-button'
 241      An object used in conjunction with the toolbar.
 242
 243    And objects that are only used internally:
 244
 245 `opaque'
 246      A generic object for encapsulating arbitrary memory; this allows
 247      you the generality of `malloc()' and the convenience of the Lisp
 248      object system.
 249
 250 `lstream'
 251      A buffering I/O stream, used to provide a unified interface to
 252      anything that can accept output or provide input, such as a file
 253      descriptor, a stdio stream, a chunk of memory, a Lisp buffer, a
 254      Lisp string, etc.; it's a Lisp object to make its memory
 255      management more convenient.
 256
 257 `char-table-entry'
 258      Subsidiary objects in the internal char-table representation.
 259
 260 `extent-auxiliary'
 261 `menubar-data'
 262 `toolbar-data'
 263      Various special-purpose objects that are basically just used to
 264      encapsulate memory for particular subsystems, similar to the more
 265      general "opaque" object.
 266
 267 `symbol-value-forward'
 268 `symbol-value-buffer-local'
 269 `symbol-value-varalias'
 270 `symbol-value-lisp-magic'
 271      Special internal-only objects that are placed in the value cell of
 272      a symbol to indicate that there is something special with this
 273      variable - e.g. it has no value, it mirrors another variable, or
 274      it mirrors some C variable; there is really only one kind of
 275      object, called a "symbol-value-magic", but it is sort-of halfway
 276      kludged into semi-different object types.
 277
 278    Some types of objects are "permanent", meaning that once created,
 279 they do not disappear until explicitly destroyed, using a function such
 280 as `delete-buffer', `delete-window', `delete-frame', etc.  Others will
 281 disappear once they are no longer used, through the garbage collection
 282 mechanism.  Buffers, frames, windows, devices, and processes are among
 283 the objects that are permanent.  Note that some objects can go both
 284 ways: Faces can be created either way; extents are normally permanent,
 285 but detached extents (extents not referring to any text, as happens to
 286 some extents when the text they are referring to is deleted) are
 287 temporary.  Note that some permanent objects, such as faces and coding
 288 systems, cannot be deleted.  Note also that windows are unique in that
 289 they can be _undeleted_ after having previously been deleted. (This
 290 happens as a result of restoring a window configuration.)
 291
 292    Note that many types of objects have a "read syntax", i.e. a way of
 293 specifying an object of that type in Lisp code.  When you load a Lisp
 294 file, or type in code to be evaluated, what really happens is that the
 295 function `read' is called, which reads some text and creates an object
 296 based on the syntax of that text; then `eval' is called, which possibly
 297 does something special; then this loop repeats until there's no more
 298 text to read. (`eval' only actually does something special with
 299 symbols, which causes the symbol's value to be returned, similar to
 300 referencing a variable; and with conses [i.e. lists], which cause a
 301 function invocation.  All other values are returned unchanged.)
 302
 303    The read syntax
 304
 305      17297
 306
 307    converts to an integer whose value is 17297.
 308
 309      1.983e-4
 310
 311    converts to a float whose value is 1.983e-4, or .0001983.
 312
 313      ?b
 314
 315    converts to a char that represents the lowercase letter b.
 316
 317      ?^[$(B#&^[(B
 318
 319    (where `^[' actually is an `ESC' character) converts to a particular
 320 Kanji character when using an ISO2022-based coding system for input.
 321 (To decode this goo: `ESC' begins an escape sequence; `ESC $ (' is a
 322 class of escape sequences meaning "switch to a 94x94 character set";
 323 `ESC $ ( B' means "switch to Japanese Kanji"; `#' and `&' collectively
 324 index into a 94-by-94 array of characters [subtract 33 from the ASCII
 325 value of each character to get the corresponding index]; `ESC (' is a
 326 class of escape sequences meaning "switch to a 94 character set"; `ESC
 327 (B' means "switch to US ASCII".  It is a coincidence that the letter
 328 `B' is used to denote both Japanese Kanji and US ASCII.  If the first
 329 `B' were replaced with an `A', you'd be requesting a Chinese Hanzi
 330 character from the GB2312 character set.)
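
   The index arithmetic described in the parenthetical above can be
sketched in C.  This is only an illustration; the helper name
`index_94' is hypothetical and does not appear in the XEmacs sources.

```c
#include <assert.h>

/* Hypothetical helper: map one byte of a 94x94 character code (an
   ASCII value in the range 33..126) to its zero-based row or column
   index by subtracting 33.  */
static int
index_94 (unsigned char byte)
{
  assert (byte >= 33 && byte <= 126);
  return byte - 33;
}
```

   For the example above, `#' (ASCII 35) gives index 2 and `&' (ASCII
38) gives index 5.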
 331
 332      "foobar"
 333
 334    converts to a string.
 335
 336      foobar
 337
 338    converts to a symbol whose name is `"foobar"'.  This is done by
 339 looking up the string equivalent in the global variable `obarray',
 340 whose contents should be an obarray.  If no symbol is found, a new
 341 symbol with the name `"foobar"' is automatically created and added to
 342 `obarray'; this process is called "interning" the symbol.
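
   The idea of interning can be sketched with a toy hash table in C.
Everything here is an assumption made for illustration: the names
`toy_symbol', `toy_obarray' and `toy_intern' are hypothetical, and the
real obarray is a Lisp vector with hidden fields rather than a plain C
array.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy symbol: only a name.  Real symbols also carry a value cell,
   a function cell and a property list.  */
struct toy_symbol
{
  char *name;
  struct toy_symbol *next;      /* next symbol in the same bucket */
};

#define TOY_OBARRAY_SIZE 64
static struct toy_symbol *toy_obarray[TOY_OBARRAY_SIZE];

static unsigned
toy_hash (const char *s)
{
  unsigned h = 0;
  while (*s)
    h = h * 31 + (unsigned char) *s++;
  return h % TOY_OBARRAY_SIZE;
}

/* Return the unique symbol named NAME, creating it on first use.  */
static struct toy_symbol *
toy_intern (const char *name)
{
  unsigned h = toy_hash (name);
  struct toy_symbol *sym;

  for (sym = toy_obarray[h]; sym; sym = sym->next)
    if (strcmp (sym->name, name) == 0)
      return sym;               /* already interned */

  sym = (struct toy_symbol *) malloc (sizeof *sym);
  assert (sym);
  sym->name = (char *) malloc (strlen (name) + 1);
  assert (sym->name);
  strcpy (sym->name, name);
  sym->next = toy_obarray[h];   /* push onto the bucket's chain */
  toy_obarray[h] = sym;
  return sym;
}
```

   Calling `toy_intern ("foobar")' twice returns the same pointer,
which is why two occurrences of the same symbol name in Lisp source
refer to the same object.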
 343
 344      (foo . bar)
 345
 346    converts to a cons cell containing the symbols `foo' and `bar'.
 347
 348      (1 a 2.5)
 349
 350    converts to a three-element list containing the specified objects
 351 (note that a list is actually a set of nested conses; see the XEmacs
 352 Lisp Reference).
 353
 354      [1 a 2.5]
 355
 356    converts to a three-element vector containing the specified objects.
 357
 358      #[... ... ... ...]
 359
 360    converts to a compiled-function object (the actual contents are not
 361 shown since they are not relevant here; look at a file that ends with
 362 `.elc' for examples).
 363
 364      #*01110110
 365
 366    converts to a bit-vector.
 367
 368      #s(hash-table ... ...)
 369
 370    converts to a hash table (the actual contents are not shown).
 371
 372      #s(range-table ... ...)
 373
 374    converts to a range table (the actual contents are not shown).
 375
 376      #s(char-table ... ...)
 377
 378    converts to a char table (the actual contents are not shown).
 379
 380    Note that the `#s()' syntax is the general syntax for structures,
 381 which are not really implemented in XEmacs Lisp but should be.
 382
 383    When an object is printed out (using `print' or a related function),
 384 the read syntax is used, so that the same object can be read in again.
 385
 386    The other objects do not have read syntaxes, usually because it does
 387 not really make sense to create them in this fashion (e.g. processes,
 388 where it doesn't make sense to have a subprocess created as a side
 389 effect of reading some Lisp code), or because they can't be created at
 390 all (e.g. subrs).  Permanent objects, as a rule, do not have a read
 391 syntax; nor do most complex objects, which contain too much state to be
 392 easily initialized through a read syntax.
 393
 394 \1f
 395 File: internals.info,  Node: How Lisp Objects Are Represented in C,  Next: Rules When Writing New C Code,  Prev: The XEmacs Object System (Abstractly Speaking),  Up: Top
 396
 397 How Lisp Objects Are Represented in C
 398 *************************************
 399
 400    Lisp objects are represented in C using a 32-bit or 64-bit machine
 401 word (depending on the processor; i.e. DEC Alphas use 64-bit Lisp
 402 objects and most other processors use 32-bit Lisp objects).  The
 403 representation stuffs a pointer together with a tag, as follows:
 404
 405       [ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ]
 406       [ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ]
 407
 408         <---------------------------------------------------------> <->
 409                  a pointer to a structure, or an integer            tag
 410
 411    A tag of 00 is used for all pointer object types, a tag of 10 is used
 412 for characters, and the other two tags 01 and 11 are joined together to
 413 form the integer object type.  This representation gives us 31-bit
 414 integers and 30-bit characters, while pointers are represented directly
 415 without any bit masking or shifting.  This representation, though,
 416 assumes that pointers to structs are always aligned to multiples of 4,
 417 so the lower 2 bits are always zero.
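
   The tagging scheme can be sketched in portable C as follows.  This
is a minimal illustration under stated assumptions, not the code in
`lisp.h': the `toy_' names are hypothetical, and `intptr_t' stands in
for the machine word, so on a 64-bit host the integers have 63 bits
rather than 31.

```c
#include <assert.h>
#include <stdint.h>

typedef intptr_t toy_object;    /* stand-in for Lisp_Object */

#define TOY_TAG_BITS   2
#define TOY_TAG_MASK   3
#define TOY_TAG_RECORD 0        /* 00: pointer to a structure */
#define TOY_TAG_CHAR   2        /* 10: character */
/* Tags 01 and 11 both mean "integer", so only bit 0 is tested and
   the integer payload gets one extra bit.  */

static int toy_is_int (toy_object o)    { return (o & 1) != 0; }
static int toy_is_char (toy_object o)   { return (o & TOY_TAG_MASK) == TOY_TAG_CHAR; }
static int toy_is_record (toy_object o) { return (o & TOY_TAG_MASK) == TOY_TAG_RECORD; }

static toy_object
toy_make_int (intptr_t n)
{
  return (toy_object) (((uintptr_t) n << 1) | 1);
}

static toy_object
toy_make_char (intptr_t c)
{
  return (toy_object) (((uintptr_t) c << TOY_TAG_BITS) | TOY_TAG_CHAR);
}

static toy_object
toy_make_record (void *p)
{
  /* The scheme relies on structures being aligned to multiples of 4.  */
  assert (((uintptr_t) p & TOY_TAG_MASK) == 0);
  return (toy_object) p;        /* stored as-is: no shift, no mask */
}

static void *
toy_record_ptr (toy_object o)
{
  return (void *) o;
}
```

   Note that `toy_make_int (2)' ends in the bits 01 and `toy_make_int
(3)' ends in 11; both are recognized as integers because only the
lowest bit is examined.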
 418
 419    Lisp objects use the typedef `Lisp_Object', but the actual C type
 420 used for the Lisp object can vary.  It can be either a simple type
 421 (`long' on the DEC Alpha, `int' on other machines) or a structure whose
 422 fields are bit fields that line up properly (actually, a union of
 423 structures is used).  Generally the simple integral type is preferable
 424 because it ensures that the compiler will actually use a machine word
 425 to represent the object (some compilers will use more general and less
 426 efficient code for unions and structs even if they can fit in a machine
 427 word).  The union type, however, has the advantage of stricter type
 428 checking.  If you accidentally pass an integer where a Lisp object is
 429 desired, you get a compile error.  The choice of which type to use is
 430 determined by the preprocessor constant `USE_UNION_TYPE' which is
 431 defined via the `--use-union-type' option to `configure'.
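
   The trade-off between the two representations can be illustrated
with a small sketch.  The names below are hypothetical
(`TOY_USE_UNION_TYPE' merely imitates the role of `USE_UNION_TYPE'),
and the real union has tag and value bit fields rather than a single
member.

```c
typedef long toy_word;

#ifdef TOY_USE_UNION_TYPE
/* A union is a distinct type: passing a bare integer where a
   toy_object is expected is rejected by the compiler.  */
typedef union { toy_word word; } toy_object;
#define TOY_WORD(obj) ((obj).word)
static toy_object
toy_wrap (toy_word w)
{
  toy_object o;
  o.word = w;
  return o;
}
#else
/* A plain integral type is certain to fit in a machine word, but
   integers convert to it silently.  */
typedef toy_word toy_object;
#define TOY_WORD(obj) (obj)
static toy_object
toy_wrap (toy_word w)
{
  return w;
}
#endif

/* Extract the two low bits of the word behind OBJ.  */
static toy_word
toy_low_bits (toy_object obj)
{
  return TOY_WORD (obj) & 3;
}
```

   With `TOY_USE_UNION_TYPE' defined, a call such as `toy_low_bits
(6)' fails to compile; without it, the same call is accepted without
complaint.  Code that always goes through `toy_wrap' and `TOY_WORD'
compiles either way.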
 432
 433    Various macros are used to convert between Lisp_Objects and the
 434 corresponding C type.  Macros of the form `XINT()', `XCHAR()',
 435 `XSTRING()', `XSYMBOL()', do any required bit shifting and/or masking
 436 and cast it to the appropriate type.  `XINT()' needs to be a bit tricky
 437 so that negative numbers are properly sign-extended.  Since integers
 438 are stored left-shifted, if the right-shift operator does an arithmetic
 439 shift (i.e. it leaves the most-significant bit as-is rather than
 440 shifting in a zero, so that it mimics a divide-by-two even for negative
 441 numbers) the shift to remove the tag bit is enough.  This is the case
 442 on all the systems we support.
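
   The sign-extension point can be seen in a short sketch.  The `toy_'
names are hypothetical stand-ins for `make_int()' and `XINT()', and a
single tag bit is assumed.  ISO C leaves the result of right-shifting
a negative signed value implementation-defined, so the first decoder
relies on the same assumption the text makes; the second one avoids
it.

```c
#include <stdint.h>

/* Store N shifted left by one bit, with the integer tag bit set.  */
static intptr_t
toy_make_int (intptr_t n)
{
  return (intptr_t) (((uintptr_t) n << 1) | 1);
}

/* Assumes `>>' on a negative signed value is an arithmetic shift,
   i.e. that copies of the sign bit are shifted in from the left.  */
static intptr_t
toy_xint (intptr_t obj)
{
  return obj >> 1;
}

/* Does not depend on the behaviour of `>>': clearing the tag bit
   leaves an even number, which divides by two exactly.  */
static intptr_t
toy_xint_portable (intptr_t obj)
{
  return (obj - 1) / 2;
}
```

   With a logical shift, `toy_xint (toy_make_int (-5))' would come out
as a large positive number instead of -5.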
 443
 444    Note that when `ERROR_CHECK_TYPECHECK' is defined, the converter
 445 macros become more complicated--they check the tag bits and/or the type
 446 field in the first four bytes of a record type to ensure that the
 447 object is really of the correct type.  This is great for catching places
 448 where an incorrect type is being dereferenced--this typically results
 449 in a pointer being dereferenced as the wrong type of structure, with
 450 unpredictable (and sometimes not easily traceable) results.
 451
 452    There are similar `XSETTYPE()' macros that construct a Lisp object.
 453 These macros are of the form `XSETTYPE (LVALUE, RESULT)', i.e. they
 454 have to be a statement rather than just used in an expression.  The
 455 reason for this is that standard C doesn't let you "construct" a
 456 structure (but GCC does).  Granted, this sometimes isn't too
 457 convenient; for the case of integers, at least, you can use the
 458 function `make_int()', which constructs and _returns_ an integer Lisp
 459 object.  Note that the `XSETTYPE()' macros are also affected by
 460 `ERROR_CHECK_TYPECHECK' and make sure that the structure is of the
 461 right type in the case of record types, where the type is contained in
 462 the structure.
 463
 464    The C programmer is responsible for *guaranteeing* that a
 465 Lisp_Object is the correct type before using the `XTYPE' macros.  This
 466 is especially important in the case of lists.  Use `XCAR' and `XCDR' if
 467 a Lisp_Object is certainly a cons cell, else use `Fcar()' and `Fcdr()'.
 468 Trust other C code, but not Lisp code.  On the other hand, if XEmacs
 469 has an internal logic error, it's better to crash immediately, so
 470 sprinkle `assert()'s and "unreachable" `abort()'s liberally about the
 471 source code.  Where performance is an issue, use `type_checking_assert',
 472 `bufpos_checking_assert', and `gc_checking_assert', which do nothing
 473 unless the corresponding configure error checking flag was specified.
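
   The difference between the two kinds of accessor can be sketched
with a toy object model.  All names here are hypothetical.  In
particular, the real `Fcar()' signals a Lisp error when given
something that is not a list, whereas this sketch simply returns
`NULL'.

```c
#include <assert.h>
#include <stddef.h>

enum toy_type { TOY_NIL, TOY_INT, TOY_CONS };

struct toy_object
{
  enum toy_type type;
  long value;                       /* used when type == TOY_INT */
  struct toy_object *car, *cdr;     /* used when type == TOY_CONS */
};

/* Like XCAR: the caller guarantees that OBJ is a cons.  The assert
   stands in for type_checking_assert and catches internal logic
   errors early.  */
static struct toy_object *
toy_xcar (struct toy_object *obj)
{
  assert (obj->type == TOY_CONS);
  return obj->car;
}

/* Like Fcar: safe on any object, for data that came from Lisp.  */
static struct toy_object *
toy_fcar (struct toy_object *obj)
{
  if (obj->type == TOY_NIL)
    return obj;                     /* the car of nil is nil */
  if (obj->type != TOY_CONS)
    return NULL;                    /* real code signals an error here */
  return obj->car;
}
```

   Use the `toy_xcar' style only where C code has already established
the type; use the `toy_fcar' style for anything handed in by Lisp
code.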
 474
 475 \1f
 476 File: internals.info,  Node: Rules When Writing New C Code,  Next: A Summary of the Various XEmacs Modules,  Prev: How Lisp Objects Are Represented in C,  Up: Top
 477
 478 Rules When Writing New C Code
 479 *****************************
 480
 481    The XEmacs C code is extremely complex and intricate, and there are
 482 many rules that are more or less consistently followed throughout the
 483 code.  Many of these rules are not obvious, so they are explained here.
 484 It is of the utmost importance that you follow them.  If you don't,
 485 you may get something that appears to work, but which will crash in odd
 486 situations, often in code far away from where the actual breakage is.
 487
 488 * Menu:
 489
 490 * General Coding Rules::
 491 * Writing Lisp Primitives::
 492 * Writing Good Comments::
 493 * Adding Global Lisp Variables::
 494 * Proper Use of Unsigned Types::
 495 * Coding for Mule::
 496 * Techniques for XEmacs Developers::
 497
 498 \1f
 499 File: internals.info,  Node: General Coding Rules,  Next: Writing Lisp Primitives,  Up: Rules When Writing New C Code
 500
 501 General Coding Rules
 502 ====================
 503
 504    The C code is actually written in a dialect of C called "Clean C",
 505 meaning that it can be compiled, mostly warning-free, with either a C or
 506 C++ compiler.  Coding in Clean C has several advantages over plain C.
 507 C++ compilers are more nit-picking, and a number of coding errors have
 508 been found by compiling with C++.  The ability to use both C and C++
 509 tools means that a greater variety of development tools are available to
 510 the developer.
 511
 512    Every module includes `<config.h>' (angle brackets so that
 513 `--srcdir' works correctly; `config.h' may or may not be in the same
 514 directory as the C sources) and `lisp.h'.  `config.h' must always be
 515 included before any other header files (including system header files)
 516 to ensure that certain tricks played by various `s/' and `m/' files
 517 work out correctly.
 518
 519    When including header files, always use angle brackets, not double
 520 quotes, except when the file to be included is always in the same
 521 directory as the including file.  If either file is a generated file,
 522 then that is not likely to be the case.  In order to understand why we
 523 have this rule, imagine what happens when you do a build in the source
 524 directory using `./configure' and another build in another directory
 525 using `../work/configure'.  There will be two different `config.h'
 526 files.  Which one will be used if you `#include "config.h"'?
 527
 528    Almost every module contains a `syms_of_*()' function and a
 529 `vars_of_*()' function.  The former declares any Lisp primitives you
 530 have defined and defines any symbols you will be using.  The latter
 531 declares any global Lisp variables you have added and initializes global
 532 C variables in the module.  *Important*: There are stringent
 533 requirements on exactly what can go into these functions.  See the
 534 comment in `emacs.c'.  The reason for this is to avoid obscure unwanted
 535 interactions during initialization.  If you don't follow these rules,
 536 you'll be sorry!  If you want to do anything that isn't allowed, create
 537 a `complex_vars_of_*()' function for it.  Doing this is tricky, though:
 538 you have to make sure your function is called at the right time so that
 539 all the initialization dependencies work out.
 540
 541    Declare each function of these kinds in `symsinit.h'.  Make sure
 542 it's called in the appropriate place in `emacs.c'.  You never need to
 543 include `symsinit.h' directly, because it is included by `lisp.h'.
 544
 545    *All global and static variables that are to be modifiable must be
 546 declared uninitialized.*  This means that you may not use the "declare
 547 with initializer" form for these variables, such as `int some_variable
 548 = 0;'.  The reason for this has to do with some kludges done during the
 549 dumping process: If possible, the initialized data segment is re-mapped
 550 so that it becomes part of the (unmodifiable) code segment in the
 551 dumped executable.  This allows this memory to be shared among multiple
 552 running XEmacs processes.  XEmacs is careful to place as much constant
 553 data as possible into initialized variables during the `temacs' phase.
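
   In practice the rule looks like the following sketch.  The module
name `toy' and everything prefixed with it are hypothetical; only the
`vars_of_*()' naming pattern comes from the text above.

```c
#include <assert.h>

/* Allowed: a modifiable global declared without an initializer.
   The forbidden form would be `int toy_max_depth = 200;'.  */
int toy_max_depth;

static int toy_vars_initialized;

/* Hypothetical vars_of_*() function: run-time initialization of the
   module's C variables happens here instead.  */
void
vars_of_toy (void)
{
  toy_max_depth = 200;
  toy_vars_initialized = 1;
}

/* Return nonzero if DEPTH is within the configured limit.  */
int
toy_depth_ok (int depth)
{
  assert (toy_vars_initialized);    /* catch use before initialization */
  return depth <= toy_max_depth;
}
```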
 554
 555    *Please note:* This kludge only works on a few systems nowadays, and
 556 is rapidly becoming irrelevant because most modern operating systems
 557 provide "copy-on-write" semantics.  All data is initially shared
 558 between processes, and a private copy is automatically made (on a
 559 page-by-page basis) when a process first attempts to write to a page of
 560 memory.
 561
 562    Formerly, there was a requirement that static variables not be
 563 declared inside of functions.  This had to do with another hack in
 564 the same vein as what was just described: old USG systems put
 565 statically-declared variables in the initialized data space, so the
 566 header files for those systems had a `#define static' declaration. (That way, the
 567 data-segment remapping described above could still work.) This fails
 568 badly on static variables inside of functions, which suddenly become
 569 automatic variables; therefore, you weren't supposed to have any of
 570 them.  This awful kludge has been removed in XEmacs because
 571
 572   1. almost all of the systems that used this kludge ended up having to
 573      disable the data-segment remapping anyway;
 574
 575   2. the only systems that didn't were extremely outdated ones;
 576
 577   3. this hack completely messed up inline functions.
 578
 579    The C source code makes heavy use of C preprocessor macros.  One
 580 popular macro style is:
 581
 582      #define FOO(var, value) do {            \
 583        Lisp_Object FOO_value = (value);      \
 584        ... /* compute using FOO_value */     \
 585        (var) = bar;                          \
 586      } while (0)
 587
 588    The `do {...} while (0)' is a standard trick to allow FOO to have
 589 statement semantics, so that it can safely be used within an `if'
 590 statement in C, for example.  Multiple evaluation is prevented by
 591 copying a supplied argument into a local variable, so that
 592 `FOO(var,fun(1))' only calls `fun' once.
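
   A self-contained sketch of both points follows.  The macro and
function names are hypothetical; `toy_next' counts its own calls so
that multiple evaluation becomes visible.

```c
static int toy_calls;               /* number of calls to toy_next() */

static int
toy_next (void)
{
  return ++toy_calls;
}

/* Unsafe: VALUE appears twice in the expansion, so it is evaluated
   twice.  */
#define TOY_DOUBLE_UNSAFE(var, value) ((var) = (value) + (value))

/* Safe: VALUE is copied into a local variable first, and the
   do/while (0) wrapper makes the expansion a single statement.  */
#define TOY_DOUBLE(var, value) do {                     \
  int TOY_DOUBLE_value = (value);                       \
  (var) = TOY_DOUBLE_value + TOY_DOUBLE_value;          \
} while (0)
```

   Because of the wrapper, `if (c) TOY_DOUBLE (x, toy_next ()); else
x = -1;' parses as intended.  A bare `{ ... }' block followed by the
semicolon would leave the `else' without a matching `if'.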
 593
 594    Lisp lists are popular data structures in the C code as well as in
 595 Elisp.  There are two sets of macros that iterate over lists.
 596 `EXTERNAL_LIST_LOOP_N' should be used when the list has been supplied
 597 by the user, and cannot be trusted to be acyclic and `nil'-terminated.
 598 A `malformed-list' or `circular-list' error will be generated if the
 599 list being iterated over is not entirely kosher.  `LIST_LOOP_N', on the
 600 other hand, is faster and less safe, and can be used only on trusted
 601 lists.
 602
 603    Related macros are `GET_EXTERNAL_LIST_LENGTH' and `GET_LIST_LENGTH',
 604 which calculate the length of a list and, in the case of
 605 `GET_EXTERNAL_LIST_LENGTH', also validate that the list is proper.  The
 606 macros `EXTERNAL_LIST_LOOP_DELETE_IF' and `LIST_LOOP_DELETE_IF' delete
 607 from a Lisp list the elements that satisfy some predicate.
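
   The difference between the trusting and the validating traversal
can be sketched on a toy cons type.  The names are hypothetical and
the cycle check shown is the classic tortoise-and-hare technique,
chosen for illustration; the actual macros may detect circularity
differently, and they signal Lisp errors rather than returning -1.

```c
#include <stddef.h>

struct toy_cons
{
  int car;
  struct toy_cons *cdr;             /* NULL plays the role of nil */
};

/* In the spirit of GET_LIST_LENGTH: fast, but never returns if the
   list is circular.  */
static long
toy_list_length (const struct toy_cons *list)
{
  long len = 0;
  for (; list; list = list->cdr)
    len++;
  return len;
}

/* In the spirit of GET_EXTERNAL_LIST_LENGTH: returns -1 for a
   circular list instead of looping forever.  */
static long
toy_external_list_length (const struct toy_cons *list)
{
  const struct toy_cons *hare = list, *tortoise = list;
  long len = 0;

  while (hare)
    {
      hare = hare->cdr;
      len++;
      if ((len & 1) == 0)
        tortoise = tortoise->cdr;   /* advances at half speed */
      if (hare && hare == tortoise)
        return -1;                  /* the hare caught up: a cycle */
    }
  return len;
}
```

   A dotted (non-`nil'-terminated) list cannot be expressed with this
typed `cdr' field, so only the circularity check is shown.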
 608
 609 \1f
 610 File: internals.info,  Node: Writing Lisp Primitives,  Next: Writing Good Comments,  Prev: General Coding Rules,  Up: Rules When Writing New C Code
 611
 612 Writing Lisp Primitives
 613 =======================
 614
 615    Lisp primitives are Lisp functions implemented in C.  The details of
 616 interfacing the C function so that Lisp can call it are handled by a few
 617 C macros.  The only way to really understand how to write new C code is
 618 to read the source, but we can explain some things here.
 619
 620    An example of a special form is the definition of `prog1', from
 621 `eval.c'.  (An ordinary function would have the same general
 622 appearance.)
 623
 624      DEFUN ("prog1", Fprog1, 1, UNEVALLED, 0, /*
 625      Similar to `progn', but the value of the first form is returned.
 626      \(prog1 FIRST BODY...): All the arguments are evaluated sequentially.
 627      The value of FIRST is saved during evaluation of the remaining args,
 628      whose values are discarded.
 629      */
 630             (args))
 631      {
 632        /* This function can GC */
 633        REGISTER Lisp_Object val, form, tail;
 634        struct gcpro gcpro1;
 635
 636        val = Feval (XCAR (args));
 637
 638        GCPRO1 (val);
 639
 640        LIST_LOOP_3 (form, XCDR (args), tail)
 641          Feval (form);
 642
 643        UNGCPRO;
 644        return val;
 645      }
 646
 647    Let's start with a precise explanation of the arguments to the
 648 `DEFUN' macro.  Here is a template for them:
 649
 650      DEFUN (LNAME, FNAME, MIN_ARGS, MAX_ARGS, INTERACTIVE, /*
 651      DOCSTRING
 652      */
 653         (ARGLIST))
 654
 655 LNAME
 656      This string is the name of the Lisp symbol to define as the
 657      function name; in the example above, it is `"prog1"'.
 658
 659 FNAME
 660      This is the C function name for this function.  This is the name
 661      that is used in C code for calling the function.  The name is, by
 662      convention, `F' prepended to the Lisp name, with all dashes (`-')
 663      in the Lisp name changed to underscores.  Thus, to call this
 664      function from C code, call `Fprog1'.  Remember that the arguments
 665      are of type `Lisp_Object'; various macros and functions for
 666      creating values of type `Lisp_Object' are declared in the file
 667      `lisp.h'.
 668
 669      Primitives whose names are special characters (e.g. `+' or `<')
 670      are named by spelling out, in some fashion, the special character:
 671      e.g. `Fplus()' or `Flss()'.  Primitives whose names begin with
 672      normal alphanumeric characters but also contain special characters
 673      are spelled out in some creative way, e.g. `let*' becomes
 674      `FletX()'.
 675
 676      Each function also has an associated structure that holds the data
 677      for the subr object that represents the function in Lisp.  This
 678      structure conveys the Lisp symbol name to the initialization
 679      routine that will create the symbol and store the subr object as
 680      its definition.  The C variable name of this structure is always
 681      `S' prepended to the FNAME.  You hardly ever need to be aware of
 682      the existence of this structure, since `DEFUN' plus `DEFSUBR'
 683      takes care of all the details.
 684
 685 MIN_ARGS
 686      This is the minimum number of arguments that the function
 687      requires.  The function `prog1' allows a minimum of one argument.
 688
 689 MAX_ARGS
 690      This is the maximum number of arguments that the function accepts,
 691      if there is a fixed maximum.  Alternatively, it can be `UNEVALLED',
 692      indicating a special form that receives unevaluated arguments, or
 693      `MANY', indicating an unlimited number of evaluated arguments (the
 694      C equivalent of `&rest').  Both `UNEVALLED' and `MANY' are macros.
 695      If MAX_ARGS is a number, it may not be less than MIN_ARGS and it
 696      may not be greater than 8. (If you need to add a function with
 697      more than 8 arguments, use the `MANY' form.  Resist the urge to
 698      edit the definition of `DEFUN' in `lisp.h'.  If you do it anyway,
 699      make sure to also add another clause to the switch statement in
 700      `primitive_funcall()'.)
 701
 702 INTERACTIVE
 703      This is an interactive specification, a string such as might be
 704      used as the argument of `interactive' in a Lisp function.  In the
 705      case of `prog1', it is 0 (a null pointer), indicating that `prog1'
 706      cannot be called interactively.  A value of `""' indicates a
 707      function that should receive no arguments when called
 708      interactively.
 709
 710 DOCSTRING
 711      This is the documentation string.  It is written just like a
 712      documentation string for a function defined in Lisp; in
 713      particular, the first line should be a single sentence.  Note how
 714      the documentation string is enclosed in a comment, none of the
 715      documentation is placed on the same lines as the comment-start and
 716      comment-end characters, and the comment-start characters are on
 717      the same line as the interactive specification.  `make-docfile',
 718      which scans the C files for documentation strings, is very
 719      particular about what it looks for, and will not properly extract
 720      the doc string if it's not in this exact format.
 721
 722      In order to make both `etags' and `make-docfile' happy, make sure
 723      that the `DEFUN' line contains the LNAME and FNAME, and that the
 724      comment-start characters for the doc string are on the same line
 725      as the interactive specification, and put a newline directly after
 726      them (and before the comment-end characters).
 727
 728 ARGLIST
 729      This is the comma-separated list of arguments to the C function.
 730      For a function with a fixed maximum number of arguments, provide a
 731      C argument for each Lisp argument.  In this case, unlike regular C
 732      functions, the types of the arguments are not declared; they are
 733      simply always of type `Lisp_Object'.
 734
 735      The names of the C arguments will be used as the names of the
 736      arguments to the Lisp primitive as displayed in its documentation,
 737      modulo the same concerns described above for `F...' names (in
 738      particular, underscores in the C arguments become dashes in the
 739      Lisp arguments).
 740
 741      There is one additional kludge: A trailing `_' on the C argument is
 742      discarded when forming the Lisp argument.  This allows C language
 743      reserved words (like `default') or global symbols (like `dirname')
 744      to be used as argument names without compiler warnings or errors.
 745
 746      A Lisp function with MAX_ARGS = `UNEVALLED' is a "special form";
 747      its arguments are not evaluated.  Instead it receives one argument
 748      of type `Lisp_Object', a (Lisp) list of the unevaluated arguments,
 749      conventionally named `(args)'.
 750
 751      When a Lisp function has no upper limit on the number of arguments,
 752      specify MAX_ARGS = `MANY'.  In this case its implementation in C
 753      actually receives exactly two arguments: the number of Lisp
 754      arguments (an `int') and the address of a block containing their
 755      values (a `Lisp_Object *').  Only in this case are the C types
 756      specified in the ARGLIST: `(int nargs, Lisp_Object *args)'.
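
   The `MANY' calling convention can be tried out away from the XEmacs
sources.  The following is only a standalone mock, not XEmacs code: it
assumes `Lisp_Object' can be stood in for by a plain `long', and
`mock_sum' is a hypothetical name.  A real primitive would be introduced
with `DEFUN' and would use the accessor macros from `lisp.h'.

```c
/* Standalone mock of the (int nargs, Lisp_Object *args) convention.
   ASSUMPTION: Lisp_Object is modelled as a plain long; in XEmacs it
   is an opaque tagged type declared in lisp.h.  */
typedef long Lisp_Object;

/* Hypothetical stand-in for a primitive declared with MIN_ARGS = 0
   and MAX_ARGS = MANY: it receives the argument count and the address
   of a block holding the argument values.  */
Lisp_Object
mock_sum (int nargs, Lisp_Object *args)
{
  Lisp_Object total = 0;
  int i;

  for (i = 0; i < nargs; i++)
    total += args[i];
  return total;
}
```

   Calling `mock_sum (3, v)' on an array `v' holding 1, 2 and 3 yields
6 in this mock; with `nargs' equal to 0 the block is never read.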
 757
 758    Within the function `Fprog1' itself, note the use of the macros
 759 `GCPRO1' and `UNGCPRO'.  `GCPRO1' is used to "protect" a variable from
 760 garbage collection--to inform the garbage collector that it must look
 761 in that variable and regard the object pointed at by its contents as an
 762 accessible object.  This is necessary whenever you call `Feval' or
 763 anything that can directly or indirectly call `Feval' (this includes
 764 the `QUIT' macro!).  At such a time, any Lisp object that you intend to
 765 refer to again must be protected somehow.  `UNGCPRO' cancels the
 766 protection of the variables that are protected in the current function.
 767 It is necessary to do this explicitly.
 768
 769    The macro `GCPRO1' protects just one local variable.  If you want to
 770 protect two, use `GCPRO2' instead; repeating `GCPRO1' will not work.
 771 Macros `GCPRO3' and `GCPRO4' also exist.
 772
 773    These macros implicitly use local variables such as `gcpro1'; you
 774 must declare these explicitly, with type `struct gcpro'.  Thus, if you
 775 use `GCPRO2', you must declare `gcpro1' and `gcpro2'.
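
   One way to picture these rules is a chain of records, one per
protected variable, linked through the locals `gcpro1', `gcpro2', and
so on.  The sketch below is a simplified standalone model written for
this explanation; the real macros live in `lisp.h' and may differ in
detail.  `Lisp_Object' is assumed to be a plain `long', and
`count_protected' and `protect_two' are hypothetical helpers.

```c
#include <stddef.h>

typedef long Lisp_Object;    /* ASSUMPTION: stand-in for the real type */

/* One link in the chain of protected variables.  */
struct gcpro
{
  struct gcpro *next;
  Lisp_Object *var;          /* address of the protected variable */
  int nvars;
};

struct gcpro *gcprolist;     /* head of the chain; NULL when empty */

/* Simplified models of the macros.  Note that they refer to the
   locals gcpro1 and gcpro2 by name, which is why the caller must
   declare them.  */
#define GCPRO1(v)                                                   \
  (gcpro1.next = gcprolist, gcpro1.var = &(v), gcpro1.nvars = 1,    \
   gcprolist = &gcpro1)

#define GCPRO2(v1, v2)                                              \
  (gcpro1.next = gcprolist, gcpro1.var = &(v1), gcpro1.nvars = 1,   \
   gcpro2.next = &gcpro1, gcpro2.var = &(v2), gcpro2.nvars = 1,     \
   gcprolist = &gcpro2)

#define UNGCPRO (gcprolist = gcpro1.next)

/* Count the variables a collector would find by walking the chain.  */
int
count_protected (void)
{
  int n = 0;
  struct gcpro *p;

  for (p = gcprolist; p != NULL; p = p->next)
    n += p->nvars;
  return n;
}

/* Hypothetical function: declare the records, protect two locals,
   look at the chain, then release the protection explicitly.  */
int
protect_two (void)
{
  Lisp_Object a = 1, b = 2;
  struct gcpro gcpro1, gcpro2;
  int seen;

  GCPRO2 (a, b);
  seen = count_protected ();
  UNGCPRO;
  return seen;
}
```

   In this model a second `GCPRO1' in the same function would reuse the
same `gcpro1' record instead of adding a new link, which is consistent
with the rule that repeating `GCPRO1' will not work.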
 776
 777    Note also that the general rule is "caller-protects"; i.e. you are
 778 only responsible for protecting those Lisp objects that you create.  Any
 779 objects passed to you as arguments should have been protected by whoever
 780 created them, so you don't in general have to protect them.
 781
 782    In particular, the arguments to any Lisp primitive are always
 783 automatically `GCPRO'ed, when called "normally" from Lisp code or
 784 bytecode.  So only a few Lisp primitives that are called frequently from
 785 C code, such as `Fprogn', protect their arguments as a service to their
 786 caller.  You don't need to protect your arguments when writing a new
 787 `DEFUN'.
 788
 789    `GCPRO'ing is perhaps the trickiest and most error-prone part of
 790 XEmacs coding.  It is *extremely* important that you get this right and
 791 use a great deal of discipline when writing this code.  *Note
 792 `GCPRO'ing: GCPROing, for full details on how to do this.
 793
 794    What `DEFUN' actually does is declare a global structure of type
 795 `Lisp_Subr' whose name begins with capital `SF' and which contains
 796 information about the primitive (e.g. a pointer to the function, its
 797 minimum and maximum allowed arguments, a string describing its Lisp
 798 name); `DEFUN' then begins a normal C function declaration using the
 799 `F...' name.  The Lisp subr object that is the function definition of a
 800 primitive (i.e. the object in the function slot of the symbol that
 801 names the primitive) actually points to this `SF' structure; when
 802 `Feval' encounters a subr, it looks in the structure to find out how to
 803 call the C function.
 804
 805    Defining the C function is not enough to make a Lisp primitive
 806 available; you must also create the Lisp symbol for the primitive (the
 807 symbol is "interned"; *note Obarrays::) and store a suitable subr
 808 object in its function cell. (If you don't do this, the primitive won't
 809 be seen by Lisp code.) The code looks like this:
 810
 811      DEFSUBR (FNAME);
 812
 813 Here FNAME is the same name you used as the second argument to `DEFUN'.
 814
 815    This call to `DEFSUBR' should go in the `syms_of_*()' function at
 816 the end of the module.  If no such function exists, create it and make
 817 sure to also declare it in `symsinit.h' and call it from the
 818 appropriate spot in `main()'.  *Note General Coding Rules::.
 819
 820    Note that C code cannot call functions by name unless they are
 821 defined in C.  The way to call a function written in Lisp from C is to
 822 use `Ffuncall', which embodies the Lisp function `funcall'.  Since the
 823 Lisp function `funcall' accepts an unlimited number of arguments, in C
 824 it takes two: the number of Lisp-level arguments, and a one-dimensional
 825 array containing their values.  The first Lisp-level argument is the
 826 Lisp function to call, and the rest are the arguments to pass to it.
 827 Since `Ffuncall' can call the evaluator, you must protect pointers from
 828 garbage collection around the call to `Ffuncall'. (However, `Ffuncall'
 829 explicitly protects all of its parameters, so you don't have to protect
 830 any pointers passed as parameters to it.)
 831
 832    The C functions `call0', `call1', `call2', and so on, provide handy
 833 ways to call a Lisp function conveniently with a fixed number of
 834 arguments.  They work by calling `Ffuncall'.
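
   The relationship between the array-based convention and the
fixed-argument helpers can be sketched with a toy model.  Everything
below is a mock written for illustration and assumes a greatly
simplified `Lisp_Object'; the names `mock_funcall', `mock_call2',
`mock_add', `mock_make_num' and `mock_make_fn' are hypothetical and are
not the XEmacs functions.

```c
#include <stddef.h>

/* ASSUMPTION: a toy Lisp_Object that is either a number or a
   callable wrapping a C function pointer.  */
typedef struct Lisp_Object Lisp_Object;
typedef Lisp_Object (*mock_fn) (int nargs, Lisp_Object *args);

struct Lisp_Object
{
  long num;
  mock_fn fn;                /* non-NULL when the object is callable */
};

Lisp_Object
mock_make_num (long n)
{
  Lisp_Object obj;
  obj.num = n;
  obj.fn = NULL;
  return obj;
}

Lisp_Object
mock_make_fn (mock_fn f)
{
  Lisp_Object obj;
  obj.num = 0;
  obj.fn = f;
  return obj;
}

/* Toy funcall: args[0] is the function to call, the remaining
   elements are the arguments to pass to it.  */
Lisp_Object
mock_funcall (int nargs, Lisp_Object *args)
{
  return args[0].fn (nargs - 1, args + 1);
}

/* Toy call2: pack a fixed argument list into an array and hand it
   to the array-based entry point.  */
Lisp_Object
mock_call2 (Lisp_Object fn, Lisp_Object arg0, Lisp_Object arg1)
{
  Lisp_Object args[3];

  args[0] = fn;
  args[1] = arg0;
  args[2] = arg1;
  return mock_funcall (3, args);
}

/* Hypothetical callee standing in for a function written in Lisp.  */
Lisp_Object
mock_add (int nargs, Lisp_Object *args)
{
  Lisp_Object result = mock_make_num (0);
  int i;

  for (i = 0; i < nargs; i++)
    result.num += args[i].num;
  return result;
}
```

   The mock has no garbage collector, so it says nothing about
protection; it only shows how a fixed-argument wrapper reduces to the
count-plus-array form.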
 835
 836    `eval.c' is a very good file to look through for examples; `lisp.h'
 837 contains the definitions for important macros and functions.
 838
 839 \1f
 840 File: internals.info,  Node: Writing Good Comments,  Next: Adding Global Lisp Variables,  Prev: Writing Lisp Primitives,  Up: Rules When Writing New C Code
 841
 842 Writing Good Comments
 843 =====================
 844
 845    Comments are a lifeline for programmers trying to understand tricky
 846 code.  In general, the less obvious it is what you are doing, the more
 847 you need a comment, and the more detailed it needs to be.  You should
 848 always be on guard when you're writing code for stuff that's tricky, and
 849 should constantly be putting yourself in someone else's shoes and asking
 850 if that person could figure out without much difficulty what's going
 851 on. (Assume they are a competent programmer who understands the
 852 essentials of how the XEmacs code is structured but doesn't know much
 853 about the module you're working on or any algorithms you're using.) If
 854 you're not sure whether they would be able to, add a comment.  Always
 855 err on the side of more comments, rather than fewer.
 856
 857    Generally, when making comments, there is no need to attribute them
 858 with your name or initials.  This especially goes for small,
 859 easy-to-understand, non-opinionated ones.  Also, comments indicating
 860 where, when, and by whom a file was changed are _strongly_ discouraged,
 861 and in general will be removed as they are discovered.  This is exactly
 862 what `ChangeLogs' are there for.  However, it can occasionally be
 863 useful to mark exactly where (but not when or by whom) changes are
 864 made, particularly when making small changes to a file imported from
 865 elsewhere.  These marks help when later on a newer version of the file
 866 is imported and the changes need to be merged. (If everything were
 867 always kept in CVS, there would be no need for this.  But in practice,
 868 this often doesn't happen, or the CVS repository is later on lost or
 869 unavailable to the person doing the update.)
 870
 871    When putting in an explicit opinion in a comment, you should
 872 _always_ attribute it with your name, and optionally the date.  This
 873 also goes for long, complex comments explaining in detail the workings
 874 of something - by putting your name there, you make it possible for
 875 someone who has questions about how that thing works to determine who
 876 wrote the comment so they can write to them.  Preferably, use your
 877 actual name and not your initials, unless your initials are generally
 878 recognized (e.g. `jwz').  You can use only your first name if it's
 879 obvious who you are; otherwise, give first and last name.  If you're
 880 not a regular contributor, you might consider putting your email
 881 address in - it may be in the ChangeLog, but after a while ChangeLogs
 882 tend to disappear or get muddled. (E.g. your comment
 883 may get copied somewhere else or even into another program, and
 884 tracking down the proper ChangeLog may be very difficult.)
 885
 886    If you come across an opinion that is not or no longer valid, or you
 887 come across any comment that no longer applies but you want to keep it
 888 around, enclose it in `[[ ' and ` ]]' marks and add a comment
 889 afterwards explaining why the preceding comment is no longer valid.  Put
 890 your name on this comment, as explained above.
 891
 892    Just as comments are a lifeline to programmers, incorrect comments
 893 are death.  If you come across an incorrect comment, *immediately*
 894 correct it or flag it as incorrect, as described in the previous
 895 paragraph.  Whenever you work on a section of code, _always_ make sure
 896 to update any comments to be correct - or, at the very least, flag them
 897 as incorrect.
 898
 899    To indicate a "todo" or other problem, use four pound signs - i.e.
 900 `####'.
 901
 902 \1f
 903 File: internals.info,  Node: Adding Global Lisp Variables,  Next: Proper Use of Unsigned Types,  Prev: Writing Good Comments,  Up: Rules When Writing New C Code
 904
 905 Adding Global Lisp Variables
 906 ============================
 907
 908    Global variables whose names begin with `Q' are constants whose
 909 value is a symbol of a particular name.  The name of the variable should
 910 be derived from the name of the symbol using the same rules as for Lisp
 911 primitives.  These variables are initialized using a call to
 912 `defsymbol()' in the `syms_of_*()' function. (This call interns a
 913 symbol, sets the C variable to the resulting Lisp object, and calls
 914 `staticpro()' on the C variable to tell the garbage-collection
 915 mechanism about this variable.  What `staticpro()' does is add a
 916 pointer to the variable to a large global array; when
 917 garbage-collection happens, all pointers listed in the array are used
 918 as starting points for marking Lisp objects.  This is important because
 919 it's quite possible that the only current reference to the object is
 920 the C variable.  In the case of symbols, the `staticpro()' doesn't
 921 matter all that much because the symbol is contained in `obarray',
 922 which is itself `staticpro()'ed.  However, it's possible that a naughty
 923 user could do something like uninterning the symbol out of `obarray' or
 924 even setting `obarray' to a different value [although this is likely to
 925 make XEmacs crash!].)
 926
 927    *Please note:* It is potentially deadly if you declare a `Q...'
 928 variable in two different modules.  The two calls to `defsymbol()' are
 929 no problem, but some linkers will complain about multiply-defined
 930 symbols.  The most insidious aspect of this is that often the link will
 931 succeed anyway, but then the resulting executable will sometimes crash
 932 in obscure ways during certain operations!  To avoid this problem,
 933 declare any symbols with common names (such as `text') that are not
 934 obviously associated with this particular module in the module
 935 `general.c'.
 936
 937    Global variables whose names begin with `V' are variables that
 938 contain Lisp objects.  The convention here is that all global variables
 939 of type `Lisp_Object' begin with `V', and all others don't (including
 940 integer and boolean variables that have Lisp equivalents). Most of the
 941 time, these variables have equivalents in Lisp, but some don't.  Those
 942 that do are declared this way by a call to `DEFVAR_LISP()' in the
 943 `vars_of_*()' initializer for the module.  What this does is create a
 944 special "symbol-value-forward" Lisp object that contains a pointer to
 945 the C variable, intern a symbol whose name is as specified in the call
 946 to `DEFVAR_LISP()', and set its value to the symbol-value-forward Lisp
 947 object; it also calls `staticpro()' on the C variable to tell the
 948 garbage-collection mechanism about the variable.  When `eval' (or
 949 actually `symbol-value') encounters this special object in the process
 950 of retrieving a variable's value, it follows the indirection to the C
 951 variable and gets its value.  `setq' does similar things so that the C
 952 variable gets changed.
 953
 954    Whether or not you `DEFVAR_LISP()' a variable, you need to
 955 initialize it in the `vars_of_*()' function; otherwise it will end up
 956 as all zeroes, which is the integer 0 (_not_ `nil'), and this is
 957 probably not what you want.  Also, if the variable is not
 958 `DEFVAR_LISP()'ed, *you must call* `staticpro()' on the C variable in
 959 the `vars_of_*()' function.  Otherwise, the garbage-collection
 960 mechanism won't know that the object in this variable is in use, and
 961 will happily collect it and reuse its storage for another Lisp object,
 962 and you will be the one who's unhappy when you can't figure out how
 963 your variable got overwritten.
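
   The effect of registering (or forgetting to register) a variable can
be modelled in a few lines.  This is a toy model under stated
assumptions, not the XEmacs implementation: `Lisp_Object' is a plain
`long', the root table has a small fixed size, and every `mock_' name
as well as `vars_of_mock' is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef long Lisp_Object;          /* ASSUMPTION: stand-in type */

#define MOCK_MAX_ROOTS 16

/* The "large global array": it stores the addresses of variables,
   not their current values.  */
Lisp_Object *mock_roots[MOCK_MAX_ROOTS];
int mock_nroots;

void
mock_staticpro (Lisp_Object *varaddress)
{
  assert (mock_nroots < MOCK_MAX_ROOTS);
  mock_roots[mock_nroots++] = varaddress;
}

/* Would a collector that starts from the registered variables
   find OBJ?  */
int
mock_is_reachable (Lisp_Object obj)
{
  int i;

  for (i = 0; i < mock_nroots; i++)
    if (*mock_roots[i] == obj)
      return 1;
  return 0;
}

Lisp_Object Vmock_protected;
Lisp_Object Vmock_forgotten;

/* Hypothetical initializer in the style of a vars_of_*() function.  */
void
vars_of_mock (void)
{
  Vmock_protected = 101;
  Vmock_forgotten = 202;
  mock_staticpro (&Vmock_protected);
  /* Vmock_forgotten is deliberately left unregistered.  */
}
```

   Because the table holds addresses, a value assigned to
`Vmock_protected' later on is still found by `mock_is_reachable',
whereas nothing stored in `Vmock_forgotten' ever is.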
 964
 965 \1f
 966 File: internals.info,  Node: Proper Use of Unsigned Types,  Next: Coding for Mule,  Prev: Adding Global Lisp Variables,  Up: Rules When Writing New C Code
 967
 968 Proper Use of Unsigned Types
 969 ============================
 970
 971    Avoid using `unsigned int' and `unsigned long' whenever possible.
 972 Unsigned types are viral - in arithmetic or comparisons that mix
 973 signed and unsigned operands, the signed operand is usually converted to
 974 unsigned, which is almost certainly not what you want.  Many subtle and
 975 hard-to-find bugs are created by careless use of unsigned types.  In
 976 general, you should almost _never_ use an unsigned type to hold a
 977 regular quantity of any sort.  The only exceptions are
 978
 979   1. When there's a reasonable possibility you will actually need all
 980      32 or 64 bits to store the quantity.
 981
 982   2. When calling existing APIs that require unsigned types.  In this
 983      case, you should still do all manipulation using signed types, and
 984      do the conversion at the very threshold of the API call.
 985
 986   3. In existing code that you don't want to modify because you don't
 987      maintain it.
 988
 989   4. In bit-field structures.
 990
 991    Other reasonable uses of `unsigned int' and `unsigned long' are
 992 representing non-quantities - e.g. bit-oriented flags and such.
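
   The conversion problem is easy to reproduce in isolation.  The
functions below are a small standalone illustration, not XEmacs code;
`mock_api_reserve' is a hypothetical API invented to show the second
exception in the list above.

```c
/* Signed comparison: -1 < 1 holds, as expected.  */
int
less_than_signed (int a, int b)
{
  return a < b;
}

/* Mixed comparison: A is implicitly converted to unsigned int before
   the comparison, so a negative A turns into a very large value.  */
int
less_than_mixed (int a, unsigned int b)
{
  return a < b;
}

/* Hypothetical external API that insists on an unsigned length.  */
unsigned int
mock_api_reserve (unsigned int nbytes)
{
  return nbytes;
}

/* Do the arithmetic in signed types and convert only at the
   threshold of the API call.  Returns -1 for a negative difference
   instead of passing a huge unsigned value along.  */
int
reserve_difference (int end, int start)
{
  int nbytes = end - start;

  if (nbytes < 0)
    return -1;
  return (int) mock_api_reserve ((unsigned int) nbytes);
}
```

   With these definitions `less_than_signed (-1, 1)' is true while
`less_than_mixed (-1, 1u)' is false.  Many compilers can warn about
the mixed comparison, but only if the relevant warnings are enabled.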
 993
 994 \1f
 995 File: internals.info,  Node: Coding for Mule,  Next: Techniques for XEmacs Developers,  Prev: Proper Use of Unsigned Types,  Up: Rules When Writing New C Code
 996
 997 Coding for Mule
 998 ===============
 999
1000    Although Mule support is not compiled by default in XEmacs, many
1001 people are using it, and we consider it crucial that new code works
1002 correctly with multibyte characters.  This is not hard; it is only a
1003 matter of following several simple user-interface guidelines.  Even if
1004 you never compile with Mule, with a little practice you will find it
1005 quite easy to code Mule-correctly.
1006
1007    Note that these guidelines are not necessarily tied to the current
1008 Mule implementation; they are also a good idea to follow on the grounds
1009 of code generalization for future I18N work.
1010
1011 * Menu:
1012
1013 * Character-Related Data Types::
1014 * Working With Character and Byte Positions::
1015 * Conversion to and from External Data::
1016 * General Guidelines for Writing Mule-Aware Code::
1017 * An Example of Mule-Aware Code::
1018
1019 \1f
1020 File: internals.info,  Node: Character-Related Data Types,  Next: Working With Character and Byte Positions,  Up: Coding for Mule
1021
1022 Character-Related Data Types
1023 ----------------------------
1024
1025    First, let's review the basic character-related datatypes used by
1026 XEmacs.  Note that the separate `typedef's are not mandatory in the
1027 current implementation (all of them boil down to `unsigned char' or
1028 `int'), but they improve clarity of code a great deal, because one
1029 glance at the declaration can tell the intended use of the variable.
1030
1031 `Emchar'
1032      An `Emchar' holds a single Emacs character.
1033
1034      Obviously, the equality between characters and bytes is lost in
1035      the Mule world.  Characters can be represented by one or more
1036      bytes in the buffer, and `Emchar' is the C type large enough to
1037      hold any character.
1038
1039      Without Mule support, an `Emchar' is equivalent to an `unsigned
1040      char'.
1041
1042 `Bufbyte'
1043      The data representing the text in a buffer or string is logically
1044      a sequence of `Bufbyte's.
1045
1046      XEmacs does not work with the same character formats all the time;
1047      when reading characters from the outside, it decodes them to an
1048      internal format, and likewise encodes them when writing.
1049      `Bufbyte' (in fact `unsigned char') is the basic unit of the
1050      internal format of XEmacs buffers and strings.  A `Bufbyte *' is
1051      the type that points at text encoded in the variable-width internal
1052      encoding.
1053
1054      One character can correspond to one or more `Bufbyte's.  In the
1055      current Mule implementation, an ASCII character is represented by
1056      a single `Bufbyte' of the same value, and other characters are
1057      represented by a sequence of two or more `Bufbyte's.
1058
1059      Without Mule support, there are exactly 256 characters, implicitly
1060      Latin-1, and each character is represented using one `Bufbyte', and
1061      there is a one-to-one correspondence between `Bufbyte's and
1062      `Emchar's.
1063
1064 `Bufpos'
1065 `Charcount'
1066      A `Bufpos' represents a character position in a buffer or string.
1067      A `Charcount' represents a number (count) of characters.
1068      Logically, subtracting two `Bufpos' values yields a `Charcount'
1069      value.  Although all of these are `typedef'ed to `EMACS_INT', we
1070      use them in preference to `EMACS_INT' to make it clear what sort
1071      of position is being used.
1072
1073      `Bufpos' and `Charcount' values are the only ones that are ever
1074      visible to Lisp.
1075
1076 `Bytind'
1077 `Bytecount'
1078      A `Bytind' represents a byte position in a buffer or string.  A
1079      `Bytecount' represents the distance between two positions, in
1080      bytes.  The relationship between `Bytind' and `Bytecount' is the
1081      same as the relationship between `Bufpos' and `Charcount'.
1082
1083 `Extbyte'
1084 `Extcount'
1085      When dealing with the outside world, XEmacs works with `Extbyte's,
1086      which are equivalent to `unsigned char'.  Obviously, an `Extcount'
1087      is the distance between two `Extbyte's.  Extbytes and Extcounts
1088      are not all that frequent in XEmacs code.
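
   The difference between counting bytes and counting characters can be
shown with a toy encoding.  The sketch below is not the XEmacs
conversion code: it assumes, purely for illustration, that any byte of
value 0xA0 or above continues the preceding character, it models the
integer typedefs as `long', and `mock_bytecount_to_charcount' is a
hypothetical name.

```c
typedef unsigned char Bufbyte;
typedef long Bytecount;      /* ASSUMPTION: EMACS_INT modelled as long */
typedef long Charcount;

/* Count the characters in the first LEN bytes of TEXT.
   ASSUMPTION (toy encoding): a byte below 0xA0 starts a new
   character; a byte of 0xA0 or above continues the previous one.  */
Charcount
mock_bytecount_to_charcount (const Bufbyte *text, Bytecount len)
{
  Charcount chars = 0;
  Bytecount i;

  for (i = 0; i < len; i++)
    if (text[i] < 0xA0)
      chars++;
  return chars;
}
```

   For pure ASCII text the two counts coincide; as soon as a multibyte
character is present the byte count is larger, which is why the
separate typedefs make it clear which kind of quantity a variable
holds.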
1089