This is Info file ../../info/internals.info, produced by Makeinfo
version 1.68 from the input file internals.texi.

INFO-DIR-SECTION XEmacs Editor
START-INFO-DIR-ENTRY
* Internals: (internals).	XEmacs Internals Manual.
END-INFO-DIR-ENTRY

   Copyright (C) 1992 - 1996 Ben Wing.  Copyright (C) 1996, 1997 Sun
Microsystems.  Copyright (C) 1994 - 1998 Free Software Foundation.
Copyright (C) 1994, 1995 Board of Trustees, University of Illinois.

   Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

   Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided that the
entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

   Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that this permission notice may be stated in a
translation approved by the Foundation.

   Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided also
that the section entitled "GNU General Public License" is included
exactly as in the original, and provided that the entire resulting
derived work is distributed under the terms of a permission notice
identical to this one.

   Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that the section entitled "GNU General Public License"
may be included in a translation approved by the Free Software
Foundation instead of in the original English.


File: internals.info,  Node: Buffer Lists,  Next: Markers and Extents,  Prev: The Text in a Buffer,  Up: Buffers and Textual Representation

Buffer Lists
============

   Recall from earlier that buffers are "permanent" objects, i.e. that they
remain around until explicitly deleted.  This entails that there is a
list of all the buffers in existence.  This list is actually an
assoc-list (mapping from the buffer's name to the buffer) and is stored
in the global variable `Vbuffer_alist'.

   The order of the buffers in the list is important: the buffers are
ordered approximately from most-recently-used to least-recently-used.
Switching to a buffer using `switch-to-buffer', `pop-to-buffer', etc.
and switching windows using `other-window', etc.  usually brings the
new current buffer to the front of the list.  `switch-to-buffer',
`other-buffer', etc. look at the beginning of the list to find an
alternative buffer to suggest.  You can also explicitly move a buffer
to the end of the list using `bury-buffer'.
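   The C sketch below shows how these operations affect the ordering.  It is a hypothetical illustration, not XEmacs code.  The array `blist' and the functions `blist_select', `blist_bury' and `blist_other' are invented here.  The real implementation works on Lisp lists (`Vbuffer_alist' and the per-frame lists), not on a C array of names.

```c
#include <string.h>

/* Hypothetical MRU-ordered list of buffer names; index 0 is the most
   recently used.  XEmacs itself keeps Lisp lists, not a C array. */
#define MAX_LIST 16
static const char *blist[MAX_LIST];
static int blist_len;

static int blist_index(const char *name)
{
  for (int i = 0; i < blist_len; i++)
    if (strcmp(blist[i], name) == 0)
      return i;
  return -1;
}

/* Roughly what switch-to-buffer does to the order: move NAME to the front. */
static void blist_select(const char *name)
{
  int i = blist_index(name);
  if (i < 0)
    return;
  const char *b = blist[i];
  memmove(&blist[1], &blist[0], (size_t) i * sizeof blist[0]);
  blist[0] = b;
}

/* Roughly what bury-buffer does: move NAME to the end. */
static void blist_bury(const char *name)
{
  int i = blist_index(name);
  if (i < 0)
    return;
  const char *b = blist[i];
  memmove(&blist[i], &blist[i + 1],
          (size_t) (blist_len - i - 1) * sizeof blist[0]);
  blist[blist_len - 1] = b;
}

/* Like other-buffer: the most recently used buffer other than CURRENT. */
static const char *blist_other(const char *current)
{
  for (int i = 0; i < blist_len; i++)
    if (strcmp(blist[i], current) != 0)
      return blist[i];
  return NULL;
}
```

   Starting from the list (a b c), selecting `c' gives (c a b).  At that point `blist_other' suggests `a'.  Burying `c' then restores (a b c).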

   In addition to the global ordering in `Vbuffer_alist', each frame
has its own ordering of the list.  These lists always contain the same
elements as in `Vbuffer_alist' although possibly in a different order.
`buffer-list' normally returns the list for the selected frame.  This
allows you to work in separate frames without things interfering with
each other.

   The standard way to look up a buffer given a name is `get-buffer',
and the standard way to create a new buffer is `get-buffer-create',
which looks up a buffer with a given name, creating a new one if
necessary.  These operations correspond exactly with the symbol
operations `intern-soft' and `intern', respectively.  You can also
force a new buffer to be created using `generate-new-buffer', which
takes a name and (if necessary) makes a unique name from this by
appending a number, and then creates the buffer.  This is basically
like the symbol operation `gensym'.
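   A small fixed-size name table can illustrate the correspondence with `intern-soft', `intern' and `gensym'.  Every name in this sketch is hypothetical: `struct toy_buffer', `toy_get_buffer', `toy_get_buffer_create' and `toy_generate_new_buffer'.  The `NAME<N>' form of the appended number is also an assumption made for the example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, fixed-size stand-in for the buffer table; the real
   code uses Vbuffer_alist and Lisp objects. */
#define MAX_BUFFERS 64
#define MAX_NAME 64

struct toy_buffer { char name[MAX_NAME]; };

static struct toy_buffer buffers[MAX_BUFFERS];
static int n_buffers = 0;

/* Like get-buffer (intern-soft): look up, never create. */
static struct toy_buffer *toy_get_buffer(const char *name)
{
  for (int i = 0; i < n_buffers; i++)
    if (strcmp(buffers[i].name, name) == 0)
      return &buffers[i];
  return NULL;
}

/* Like get-buffer-create (intern): look up, creating if absent.
   Returns NULL only if the toy table is full. */
static struct toy_buffer *toy_get_buffer_create(const char *name)
{
  struct toy_buffer *b = toy_get_buffer(name);
  if (b || n_buffers == MAX_BUFFERS)
    return b;
  b = &buffers[n_buffers++];
  snprintf(b->name, sizeof b->name, "%s", name);
  return b;
}

/* Like generate-new-buffer (gensym): make the name unique by appending
   <N> (an assumed format), then create the buffer. */
static struct toy_buffer *toy_generate_new_buffer(const char *name)
{
  char candidate[MAX_NAME];
  snprintf(candidate, sizeof candidate, "%s", name);
  for (int n = 2; toy_get_buffer(candidate); n++)
    snprintf(candidate, sizeof candidate, "%s<%d>", name, n);
  return toy_get_buffer_create(candidate);
}
```

   After `foo' has been created, two further calls to `toy_generate_new_buffer' with the same name produce buffers named `foo<2>' and `foo<3>'.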


File: internals.info,  Node: Markers and Extents,  Next: Bufbytes and Emchars,  Prev: Buffer Lists,  Up: Buffers and Textual Representation

Markers and Extents
===================

   Among the things associated with a buffer are things that are
logically attached to certain buffer positions.  This can be used to
keep track of a buffer position when text is inserted and deleted, so
that it remains at the same spot relative to the text around it; to
assign properties to particular sections of text; etc.  There are two
such objects that are useful in this regard: they are "markers" and
"extents".

   A "marker" is simply a flag placed at a particular buffer position,
which is moved around as text is inserted and deleted.  Markers are
used for all sorts of purposes, such as the `mark' that is the other
end of textual regions to be cut, copied, etc.

   An "extent" is similar to two markers plus some associated
properties, and is used to keep track of regions in a buffer as text is
inserted and deleted, and to add properties (e.g. fonts) to particular
regions of text.  The external interface of extents is explained
elsewhere.

   The important thing here is that markers and extents simply contain
buffer positions in them as integers, and every time text is inserted or
deleted, these positions must be updated.  In order to minimize the
amount of shuffling that needs to be done, the positions in markers and
extents (there's one per marker, two per extent) are stored as Meminds (memory indices).
This means that they only need to be moved when the text is physically
moved in memory; since the gap structure tries to minimize this, it also
minimizes the number of marker and extent indices that need to be
adjusted.  Look in `insdel.c' for the details of how this works.
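   A toy gap buffer in C can illustrate the idea.  This is a sketch under simplifying assumptions, not the code in `insdel.c'.  Positions are 0-based bytes and the buffer has a fixed size.  The names `toy_gap_buffer', `pos_to_memind', `memind_to_pos', `move_gap' and `insert_byte' are hypothetical.  A marker stores a memory index, and converting that index to a buffer position only requires knowing where the gap is.

```c
#include <string.h>

/* Hypothetical toy gap buffer; buffer positions are 0-based here. */
#define TOY_SIZE 32
#define TOY_MAX_MARKERS 8

struct toy_gap_buffer {
  char mem[TOY_SIZE];
  int gap;                      /* memind where the gap starts */
  int gap_size;                 /* number of unused bytes in the gap */
  int markers[TOY_MAX_MARKERS]; /* marker positions stored as meminds */
  int n_markers;
};

static int pos_to_memind(const struct toy_gap_buffer *b, int pos)
{
  return pos < b->gap ? pos : pos + b->gap_size;
}

static int memind_to_pos(const struct toy_gap_buffer *b, int m)
{
  return m < b->gap ? m : m - b->gap_size;
}

/* Move the gap so that it starts at buffer position TO.  Only the text
   that physically moves, and only markers pointing into it, change. */
static void move_gap(struct toy_gap_buffer *b, int to)
{
  if (to < b->gap) {
    int lo = to, hi = b->gap;                         /* moves up */
    memmove(b->mem + lo + b->gap_size, b->mem + lo, hi - lo);
    for (int i = 0; i < b->n_markers; i++)
      if (b->markers[i] >= lo && b->markers[i] < hi)
        b->markers[i] += b->gap_size;
  } else if (to > b->gap) {
    int lo = b->gap + b->gap_size, hi = to + b->gap_size; /* moves down */
    memmove(b->mem + b->gap, b->mem + lo, hi - lo);
    for (int i = 0; i < b->n_markers; i++)
      if (b->markers[i] >= lo && b->markers[i] < hi)
        b->markers[i] -= b->gap_size;
  }
  b->gap = to;
}

/* Insert one byte at buffer position POS; assumes gap_size > 0. */
static void insert_byte(struct toy_gap_buffer *b, int pos, char c)
{
  move_gap(b, pos);
  b->mem[b->gap++] = c;
  b->gap_size--;
  /* No marker is adjusted here: text after the gap keeps its meminds,
     so its buffer positions shift automatically. */
}
```

   `insert_byte' never adjusts a marker.  Only `move_gap' touches markers, and only those pointing into the text it physically moves.  In this sketch, a marker sitting exactly at the insertion point ends up after the inserted byte.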

   One other important distinction is that markers are "temporary"
while extents are "permanent".  This means that markers disappear as
soon as there are no more pointers to them, and correspondingly, there
is no way to determine what markers are in a buffer if you are just
given the buffer.  Extents remain in a buffer until they are detached
(which could happen as a result of text being deleted) or the buffer is
deleted, and primitives do exist to enumerate the extents in a buffer.


File: internals.info,  Node: Bufbytes and Emchars,  Next: The Buffer Object,  Prev: Markers and Extents,  Up: Buffers and Textual Representation

Bufbytes and Emchars
====================

   Not yet documented.


File: internals.info,  Node: The Buffer Object,  Prev: Bufbytes and Emchars,  Up: Buffers and Textual Representation

The Buffer Object
=================

   Buffers contain fields not directly accessible by the Lisp
programmer.  We describe them here, naming them by the names used in
the C code.  Many are accessible indirectly in Lisp programs via Lisp
primitives.

`name'
     The buffer name is a string that names the buffer.  It is
     guaranteed to be unique.  *Note Buffer Names: (lispref)Buffer
     Names.

`save_modified'
     This field contains the time when the buffer was last saved, as an
     integer.  *Note Buffer Modification: (lispref)Buffer Modification.

`modtime'
     This field contains the modification time of the visited file.  It
     is set when the file is written or read.  Every time the buffer is
     written to the file, this field is compared to the modification
     time of the file.  *Note Buffer Modification: (lispref)Buffer
     Modification.

`auto_save_modified'
     This field contains the time when the buffer was last auto-saved.

`last_window_start'
     This field contains the `window-start' position in the buffer as of
     the last time the buffer was displayed in a window.

`undo_list'
     This field points to the buffer's undo list.  *Note Undo:
     (lispref)Undo.

`syntax_table_v'
     This field contains the syntax table for the buffer.  *Note Syntax
     Tables: (lispref)Syntax Tables.

`downcase_table'
     This field contains the conversion table for converting text to
     lower case.  *Note Case Tables: (lispref)Case Tables.

`upcase_table'
     This field contains the conversion table for converting text to
     upper case.  *Note Case Tables: (lispref)Case Tables.

`case_canon_table'
     This field contains the conversion table for canonicalizing text
     for case-folding search.  *Note Case Tables: (lispref)Case Tables.

`case_eqv_table'
     This field contains the equivalence table for case-folding search.
     *Note Case Tables: (lispref)Case Tables.

`display_table'
     This field contains the buffer's display table, or `nil' if it
     doesn't have one.  *Note Display Tables: (lispref)Display Tables.

`markers'
     This field contains the chain of all markers that currently point
     into the buffer.  Deletion of text in the buffer, and motion of
     the buffer's gap, must check each of these markers and perhaps
     update it.  *Note Markers: (lispref)Markers.

`backed_up'
     This field is a flag that tells whether a backup file has been
     made for the visited file of this buffer.

`mark'
     This field contains the mark for the buffer.  The mark is a marker,
     hence it is also included on the list `markers'.  *Note The Mark:
     (lispref)The Mark.

`mark_active'
     This field is non-`nil' if the buffer's mark is active.

`local_var_alist'
     This field contains the association list describing the variables
     local in this buffer, and their values, with the exception of
     local variables that have special slots in the buffer object.
     (Those slots are omitted from this table.)  *Note Buffer-Local
     Variables: (lispref)Buffer-Local Variables.

`modeline_format'
     This field contains a Lisp object which controls how to display
     the mode line for this buffer.  *Note Modeline Format:
     (lispref)Modeline Format.

`base_buffer'
     This field holds the buffer's base buffer (if it is an indirect
     buffer), or `nil'.


File: internals.info,  Node: MULE Character Sets and Encodings,  Next: The Lisp Reader and Compiler,  Prev: Buffers and Textual Representation,  Up: Top

MULE Character Sets and Encodings
*********************************

   Recall that there are two primary ways that text is represented in
XEmacs.  The "buffer" representation sees the text as a series of bytes
(Bufbytes), with a variable number of bytes used per character.  The
"character" representation sees the text as a series of integers
(Emchars), one per character.  The character representation is a cleaner
representation from a theoretical standpoint, and is thus used in many
cases when lots of manipulations on a string need to be done.  However,
the buffer representation is the standard representation used in both
Lisp strings and buffers, and because of this, it is the "default"
representation that text comes in.  The reason for using this
representation is that it's compact and is compatible with ASCII.

* Menu:

* Character Sets::
* Encodings::
* Internal Mule Encodings::
* CCL::


File: internals.info,  Node: Character Sets,  Next: Encodings,  Up: MULE Character Sets and Encodings

Character Sets
==============

   A character set (or "charset") is an ordered set of characters.  A
particular character in a charset is indexed using one or more
"position codes", which are non-negative integers.  The number of
position codes needed to identify a particular character in a charset is
called the "dimension" of the charset.  In XEmacs/Mule, all charsets
have dimension 1 or 2, and the size of all charsets (except for a few
special cases) is either 94, 96, 94 by 94, or 96 by 96.  The range of
position codes used to index characters from any of these types of
character sets is as follows:

     Charset type            Position code 1         Position code 2
     ------------------------------------------------------------
     94                      33 - 126                N/A
     96                      32 - 127                N/A
     94x94                   33 - 126                33 - 126
     96x96                   32 - 127                32 - 127

   Note that in the above cases position codes do not start at an
expected value such as 0 or 1.  The reason for this will become clear
later.

   For example, Latin-1 is a 96-character charset, and JISX0208 (the
Japanese national character set) is a 94x94-character charset.

   [Note that, although the ranges above define the *valid* position
codes for a charset, some of the slots in a particular charset may in
fact be empty.  This is the case for JISX0208, for example, where (e.g.)
all the slots whose first position code is in the range 117 - 126 are
empty.]
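   The ranges in the first table above translate directly into a validity check.  The enumeration `charset_type' and the functions `pc_in_range' and `valid_position_codes' are hypothetical names used only for illustration.  As the note above says, a valid position code does not guarantee that the slot actually holds a character.

```c
#include <stdbool.h>

/* Hypothetical tags for the four standard charset shapes. */
enum charset_type { CS_94, CS_96, CS_94x94, CS_96x96 };

/* Range check for a single position code of a charset of type T. */
static bool pc_in_range(enum charset_type t, int pc)
{
  if (t == CS_94 || t == CS_94x94)
    return pc >= 33 && pc <= 126;
  return pc >= 32 && pc <= 127;
}

/* True if PC1 (and, for dimension-2 charsets, PC2) are valid position
   codes.  PC2 is ignored for dimension-1 charsets. */
static bool valid_position_codes(enum charset_type t, int pc1, int pc2)
{
  bool dim2 = (t == CS_94x94 || t == CS_96x96);
  return pc_in_range(t, pc1) && (!dim2 || pc_in_range(t, pc2));
}
```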

   There are three charsets that do not follow the above rules.  All of
them have one dimension, and have ranges of position codes as follows:

     Charset name            Position code 1
     ------------------------------------
     ASCII                   0 - 127
     Control-1               0 - 31
     Composite               0 - some large number

   (The upper bound of the position code for composite characters has
not yet been determined, but it will probably be at least 16,383).

   ASCII is the union of two subsidiary character sets: Printing-ASCII
(the printing ASCII character set, consisting of position codes 33 -
126, as in a standard 94-character charset) and Control-ASCII (the
non-printing characters that would appear in a binary file with codes
0 - 32 and 127).

   Control-1 contains the non-printing characters that would appear in a
binary file with codes 128 - 159.

   Composite contains characters that are generated by overstriking one
or more characters from other charsets.

   Note that some characters in ASCII, and all characters in Control-1,
are "control" (non-printing) characters.  These have no printed
representation but instead control some other function of the printing
(e.g. TAB, code 9, moves the current character position to the next tab
stop).  All other characters in all charsets are "graphic" (printing)
characters.

   When a binary file is read in, the bytes in the file are assigned to
character sets as follows:

     Bytes           Character set           Range
     --------------------------------------------------
     0 - 127         ASCII                   0 - 127
     128 - 159       Control-1               0 - 31
     160 - 255       Latin-1                 32 - 127

   This is a bit ad-hoc but gets the job done.
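   The byte-to-charset assignment is simple enough to write out as a C function.  The names `binary_charset', `struct charset_pc' and `classify_binary_byte' are hypothetical, not XEmacs internals.

```c
/* Hypothetical tags for the three charsets used for binary files. */
enum binary_charset { BIN_ASCII, BIN_CONTROL_1, BIN_LATIN_1 };

struct charset_pc { enum binary_charset charset; int pc; };

/* Assign a raw byte to a (charset, position code) pair, following the
   table above. */
static struct charset_pc classify_binary_byte(unsigned char byte)
{
  struct charset_pc r;
  if (byte < 128) {
    r.charset = BIN_ASCII;     r.pc = byte;        /* 0 - 127 */
  } else if (byte < 160) {
    r.charset = BIN_CONTROL_1; r.pc = byte - 128;  /* 0 - 31 */
  } else {
    r.charset = BIN_LATIN_1;   r.pc = byte - 128;  /* 32 - 127 */
  }
  return r;
}
```

   For example, byte 0xE9 maps to Latin-1 with position code 0x69, and byte 0x85 maps to Control-1 with position code 5.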


File: internals.info,  Node: Encodings,  Next: Internal Mule Encodings,  Prev: Character Sets,  Up: MULE Character Sets and Encodings

Encodings
=========

   An "encoding" is a way of numerically representing characters from
one or more character sets.  If an encoding only encompasses one
character set, then the position codes for the characters in that
character set could be used directly.  This is not possible, however, if
more than one character set is to be used in the encoding.

   For example, the conversion detailed above between bytes in a binary
file and characters is effectively an encoding that encompasses the
three character sets ASCII, Control-1, and Latin-1 in a stream of 8-bit
bytes.

   Thus, an encoding can be viewed as a way of encoding characters from
a specified group of character sets using a stream of bytes, each of
which contains a fixed number of bits (but not necessarily 8, as in the
common usage of "byte").

   Here are descriptions of a couple of common encodings:

* Menu:

* Japanese EUC (Extended Unix Code)::
* JIS7::


File: internals.info,  Node: Japanese EUC (Extended Unix Code),  Next: JIS7,  Up: Encodings

Japanese EUC (Extended Unix Code)
---------------------------------

   This encompasses the character sets Printing-ASCII,
Japanese-JISX0201-Kana (half-width katakana, the right half of
JISX0201), Japanese-JISX0208, and Japanese-JISX0212.  It uses 8-bit
bytes.

   Note that Printing-ASCII and Japanese-JISX0201-Kana are 94-character
charsets, while Japanese-JISX0208 and Japanese-JISX0212 are
94x94-character charsets.

   The encoding is as follows:

     Character set            Representation (PC=position-code)
     -------------            --------------
     Printing-ASCII           PC1
     Japanese-JISX0201-Kana   0x8E       | PC1 + 0x80
     Japanese-JISX0208        PC1 + 0x80 | PC2 + 0x80
     Japanese-JISX0212        0x8F       | PC1 + 0x80 | PC2 + 0x80
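   The table above can be turned into a small encoder.  The names `euc_charset' and `euc_jp_encode' are hypothetical.  EUC-JP writes JIS X 0212 characters with the single-shift byte 0x8F in front of their two bytes, and JISX0201-Kana characters with the single-shift byte 0x8E.

```c
#include <stddef.h>

/* Hypothetical tags for the four EUC-JP charsets. */
enum euc_charset { EUC_ASCII, EUC_JISX0201_KANA, EUC_JISX0208, EUC_JISX0212 };

/* Encode one character into OUT (room for at least 3 bytes) and
   return the number of bytes written. */
static size_t euc_jp_encode(enum euc_charset cs, int pc1, int pc2,
                            unsigned char *out)
{
  switch (cs) {
  case EUC_ASCII:
    out[0] = (unsigned char) pc1;
    return 1;
  case EUC_JISX0201_KANA:
    out[0] = 0x8E;                          /* single shift 2 */
    out[1] = (unsigned char) (pc1 + 0x80);
    return 2;
  case EUC_JISX0208:
    out[0] = (unsigned char) (pc1 + 0x80);
    out[1] = (unsigned char) (pc2 + 0x80);
    return 2;
  case EUC_JISX0212:
    out[0] = 0x8F;                          /* single shift 3 */
    out[1] = (unsigned char) (pc1 + 0x80);
    out[2] = (unsigned char) (pc2 + 0x80);
    return 3;
  }
  return 0;
}
```

   For instance, the JISX0208 character with position codes 0x24 0x22 (the hiragana letter "a") is encoded as the bytes 0xA4 0xA2.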


File: internals.info,  Node: JIS7,  Prev: Japanese EUC (Extended Unix Code),  Up: Encodings

JIS7
----

   This encompasses the character sets Printing-ASCII,
Japanese-JISX0201-Roman (the left half of JISX0201; this character set
is very similar to Printing-ASCII and is a 94-character charset),
Japanese-JISX0208, and Japanese-JISX0201-Kana.  It uses 7-bit bytes.

   Unlike Japanese EUC, this is a "modal" encoding, which means that
there are multiple states that the encoding can be in, which affect how
the bytes are to be interpreted.  Special sequences of bytes (called
"escape sequences") are used to change states.

   The encoding is as follows:

     Character set              Representation (PC=position-code)
     -------------              --------------
     Printing-ASCII             PC1
     Japanese-JISX0201-Roman    PC1
     Japanese-JISX0201-Kana     PC1
     Japanese-JISX0208          PC1 PC2
     
     
     Escape sequence   ASCII equivalent   Meaning
     ---------------   ----------------   -------
     0x1B 0x28 0x4A    ESC ( J            invoke Japanese-JISX0201-Roman
     0x1B 0x28 0x49    ESC ( I            invoke Japanese-JISX0201-Kana
     0x1B 0x24 0x42    ESC $ B            invoke Japanese-JISX0208
     0x1B 0x28 0x42    ESC ( B            invoke Printing-ASCII

   Initially, Printing-ASCII is invoked.
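   Because JIS7 is modal, a decoder has to carry the current state from one byte to the next.  The following C sketch uses the four escape sequences in the table to turn a JIS7 byte string into (charset, position code) pairs.  All of its names are hypothetical.  It ignores details that a real decoder must handle, such as control characters appearing while Japanese-JISX0208 is invoked.

```c
#include <stddef.h>

/* Hypothetical tags for the charsets JIS7 can switch between. */
enum jis7_charset { J_ASCII, J_ROMAN, J_KANA, J_JISX0208 };

struct jis7_char { enum jis7_charset cs; int pc1, pc2; };

/* Decode LEN bytes of JIS7 into OUT (room for LEN entries) and return
   the number of characters produced.  Unknown escape sequences are
   skipped; a truncated escape sequence or a lone trailing byte in
   two-byte mode ends decoding. */
static size_t jis7_decode(const unsigned char *in, size_t len,
                          struct jis7_char *out)
{
  enum jis7_charset state = J_ASCII;   /* Printing-ASCII is invoked first */
  size_t i = 0, n = 0;

  while (i < len) {
    if (in[i] == 0x1B) {               /* ESC: change state */
      if (i + 2 >= len)
        break;
      unsigned char a = in[i + 1], b = in[i + 2];
      if (a == '(' && b == 'J')      state = J_ROMAN;
      else if (a == '(' && b == 'I') state = J_KANA;
      else if (a == '$' && b == 'B') state = J_JISX0208;
      else if (a == '(' && b == 'B') state = J_ASCII;
      i += 3;
    } else if (state == J_JISX0208) {  /* two position codes per char */
      if (i + 1 >= len)
        break;
      out[n].cs = state;
      out[n].pc1 = in[i];
      out[n].pc2 = in[i + 1];
      n++;
      i += 2;
    } else {                           /* one position code per char */
      out[n].cs = state;
      out[n].pc1 = in[i];
      out[n].pc2 = 0;
      n++;
      i++;
    }
  }
  return n;
}
```

   Decoding the bytes `A ESC $ B 0x24 0x22 ESC ( B b' yields three characters: ASCII `A', the JISX0208 character 0x24 0x22, and ASCII `b'.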


File: internals.info,  Node: Internal Mule Encodings,  Next: CCL,  Prev: Encodings,  Up: MULE Character Sets and Encodings

Internal Mule Encodings
=======================

   In XEmacs/Mule, each character set is assigned a unique number,
called a "leading byte".  This is used in the encodings of a character.
Leading bytes are in the range 0x80 - 0xFF (except for ASCII, which has
a leading byte of 0), although some leading bytes are reserved.

   Charsets whose leading byte is in the range 0x80 - 0x9F are called
"official" and are used for built-in charsets.  Other charsets are
called "private" and have leading bytes in the range 0xA0 - 0xFF; these
are user-defined charsets.

   More specifically:

     Character set           Leading byte
     -------------           ------------
     ASCII                   0
     Composite               0x80
     Dimension-1 Official    0x81 - 0x8D
                               (0x8E is free)
     Control-1               0x8F
     Dimension-2 Official    0x90 - 0x99
                               (0x9A - 0x9D are free;
                                0x9E and 0x9F are reserved)
     Dimension-1 Private     0xA0 - 0xEF
     Dimension-2 Private     0xF0 - 0xFF
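   The leading-byte table can be expressed as a classification function.  The enumeration `lb_category' and the function `classify_leading_byte' are hypothetical names.  Free and reserved leading bytes are grouped together as unassigned.

```c
/* Hypothetical category tags mirroring the leading-byte table. */
enum lb_category {
  LB_ASCII, LB_COMPOSITE, LB_DIM1_OFFICIAL, LB_CONTROL_1,
  LB_DIM2_OFFICIAL, LB_DIM1_PRIVATE, LB_DIM2_PRIVATE, LB_UNASSIGNED
};

static enum lb_category classify_leading_byte(int lb)
{
  if (lb == 0)                  return LB_ASCII;
  if (lb == 0x80)               return LB_COMPOSITE;
  if (lb >= 0x81 && lb <= 0x8D) return LB_DIM1_OFFICIAL;
  if (lb == 0x8F)               return LB_CONTROL_1;
  if (lb >= 0x90 && lb <= 0x99) return LB_DIM2_OFFICIAL;
  if (lb >= 0xA0 && lb <= 0xEF) return LB_DIM1_PRIVATE;
  if (lb >= 0xF0 && lb <= 0xFF) return LB_DIM2_PRIVATE;
  return LB_UNASSIGNED;  /* 0x8E, 0x9A - 0x9F, or out of range */
}
```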

   There are two internal encodings for characters in XEmacs/Mule.  One
is called "string encoding" and is an 8-bit encoding that is used for
representing characters in a buffer or string.  It uses 1 to 4 bytes per
character.  The other is called "character encoding" and is a 19-bit
encoding that is used for representing characters individually in a
variable.

   (In the following descriptions, we'll ignore composite characters for
the moment.  We also give a general (structural) overview first,
followed later by the exact details.)

* Menu:

* Internal String Encoding::
* Internal Character Encoding::


File: internals.info,  Node: Internal String Encoding,  Next: Internal Character Encoding,  Up: Internal Mule Encodings

Internal String Encoding
------------------------

   ASCII characters are encoded using their position code directly.
Other characters are encoded using their leading byte followed by their
position code(s) with the high bit set.  Characters in private character
sets have their leading byte prefixed with a "leading byte prefix",
which is either 0x9E or 0x9F. (No character sets are ever assigned these
leading bytes.) Specifically:

     Character set           Encoding (PC=position-code, LB=leading-byte)
     -------------           --------
     ASCII                   PC1
     Control-1               LB   | PC1 + 0xA0
     Dimension-1 official    LB   | PC1 + 0x80
     Dimension-1 private     0x9E | LB         | PC1 + 0x80
     Dimension-2 official    LB   | PC1 + 0x80 | PC2 + 0x80
     Dimension-2 private     0x9F | LB         | PC1 + 0x80 | PC2 + 0x80
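   The following C sketch produces the string encoding of a single character from its leading byte and position codes.  It is hypothetical: `string_encode', `is_private' and `dimension' are invented names.  It does not handle composite characters, and it derives the dimension of a charset from the leading-byte ranges given in the previous section.

```c
#include <stddef.h>

/* Leading-byte range checks taken from the leading-byte table. */
static int is_private(int lb) { return lb >= 0xA0; }
static int dimension(int lb)
{
  return ((lb >= 0x90 && lb <= 0x99) || lb >= 0xF0) ? 2 : 1;
}

/* Write the string encoding of one character to OUT (room for 4
   bytes) and return the number of bytes written. */
static size_t string_encode(int lb, int pc1, int pc2, unsigned char *out)
{
  size_t n = 0;
  if (lb == 0) {                      /* ASCII: the position code itself */
    out[n++] = (unsigned char) pc1;
    return n;
  }
  if (lb == 0x8F) {                   /* Control-1 */
    out[n++] = 0x8F;
    out[n++] = (unsigned char) (pc1 + 0xA0);
    return n;
  }
  if (is_private(lb))                 /* leading byte prefix */
    out[n++] = dimension(lb) == 2 ? 0x9F : 0x9E;
  out[n++] = (unsigned char) lb;
  out[n++] = (unsigned char) (pc1 + 0x80);
  if (dimension(lb) == 2)
    out[n++] = (unsigned char) (pc2 + 0x80);
  return n;
}
```

   For example, a dimension-2 private character with leading byte 0xF2 and position codes 0x30 0x31 is encoded as the four bytes 0x9F 0xF2 0xB0 0xB1.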

   The basic characteristic of this encoding is that the first byte of
every character is in the range 0x00 - 0x9F, and the second and
following bytes of every character are in the range 0xA0 - 0xFF.  This
means that it is impossible to get out of sync; more specifically:

  1. Given any byte position, the beginning of the character it is
     within can be determined in constant time.

  2. Given any byte position at the beginning of a character, the
     beginning of the next character can be determined in constant time.

  3. Given any byte position at the beginning of a character, the
     beginning of the previous character can be determined in constant
     time.

  4. Textual searches can simply treat encoded strings as if they were
     encoded in a one-byte-per-character fashion rather than the actual
     multi-byte encoding.
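   This hypothetical C sketch shows that properties (1) through (3) follow from a single test on each byte.  Each loop steps over at most three trailing bytes because no character is longer than four bytes, and that bound is what makes the operations constant-time.  The function names are invented for the example.

```c
#include <stddef.h>

/* True if BYTE can only be the first byte of a character. */
static int is_first_byte(unsigned char byte) { return byte < 0xA0; }

/* (1) Start of the character containing byte position POS. */
static size_t char_start(const unsigned char *s, size_t pos)
{
  while (pos > 0 && !is_first_byte(s[pos]))
    pos--;
  return pos;
}

/* (2) Start of the character after the one starting at POS. */
static size_t next_char(const unsigned char *s, size_t len, size_t pos)
{
  pos++;
  while (pos < len && !is_first_byte(s[pos]))
    pos++;
  return pos;
}

/* (3) Start of the character before the one starting at POS (POS > 0). */
static size_t prev_char(const unsigned char *s, size_t pos)
{
  return char_start(s, pos - 1);
}
```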

   None of the standard non-modal encodings meet all of these
conditions.  For example, EUC satisfies only (2) and (3), while
Shift-JIS and Big5 (not yet described) satisfy only (2). (All non-modal
encodings must satisfy (2), in order to be unambiguous.)


File: internals.info,  Node: Internal Character Encoding,  Prev: Internal String Encoding,  Up: Internal Mule Encodings

Internal Character Encoding
---------------------------

   One 19-bit word represents a single character.  The word is
separated into three fields:

     Bit number:     18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
                     <------------> <------------------> <------------------>
     Field:                1                  2                    3

   Note that fields 2 and 3 hold 7 bits each, while field 1 holds 5
bits.

     Character set           Field 1         Field 2         Field 3
     -------------           -------         -------         -------
     ASCII                      0               0              PC1
        range:                                                   (00 - 7F)
     Control-1                  0               1              PC1
        range:                                                   (00 - 1F)
     Dimension-1 official       0            LB - 0x80         PC1
        range:                                    (01 - 0D)      (20 - 7F)
     Dimension-1 private        0            LB - 0x80         PC1
        range:                                    (20 - 6F)      (20 - 7F)
     Dimension-2 official    LB - 0x8F         PC1             PC2
        range:                    (01 - 0A)       (20 - 7F)      (20 - 7F)
     Dimension-2 private     LB - 0xE1         PC1             PC2
        range:                    (0F - 1E)       (20 - 7F)      (20 - 7F)
     Composite                 0x1F             ?               ?

   Note that character codes 0 - 255 are the same as the "binary
encoding" described above.
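   Packing and unpacking the three fields can be sketched as follows.  The functions `make_emchar' and `emchar_leading_byte' are hypothetical names, and composite characters are not handled.  Only the leading byte is recovered on the way back.  The position codes are simply field 3 for dimension-1 charsets, or fields 2 and 3 for dimension-2 charsets.

```c
/* Build the 19-bit character code.  LB is the leading byte (0 for
   ASCII); composite characters are not handled. */
static int make_emchar(int lb, int pc1, int pc2)
{
  if (lb == 0)    return pc1;                      /* ASCII */
  if (lb == 0x8F) return (1 << 7) | pc1;           /* Control-1 */
  if (lb >= 0x90 && lb <= 0x99)                    /* dim-2 official */
    return ((lb - 0x8F) << 14) | (pc1 << 7) | pc2;
  if (lb >= 0xF0)                                  /* dim-2 private */
    return ((lb - 0xE1) << 14) | (pc1 << 7) | pc2;
  return ((lb - 0x80) << 7) | pc1;                 /* dim-1 */
}

/* Recover the leading byte from a (non-composite) character code. */
static int emchar_leading_byte(int c)
{
  int f1 = (c >> 14) & 0x1F, f2 = (c >> 7) & 0x7F;
  if (f1 == 0) {
    if (f2 == 0) return 0;                         /* ASCII */
    if (f2 == 1 && (c & 0x7F) < 0x20) return 0x8F; /* Control-1 */
    return f2 + 0x80;                              /* dim-1 */
  }
  return f1 <= 0x0A ? f1 + 0x8F : f1 + 0xE1;       /* dim-2 */
}
```

   Suppose Latin-1 has leading byte 0x81, which is the value that makes codes 0xA0 - 0xFF coincide with the binary encoding.  Then `make_emchar (0x81, 0x69, 0)' gives 0xE9, the same number as the Latin-1 byte for e-acute.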


File: internals.info,  Node: CCL,  Prev: Internal Mule Encodings,  Up: MULE Character Sets and Encodings

CCL
===

     CCL PROGRAM SYNTAX:
          CCL_PROGRAM := (CCL_MAIN_BLOCK
                          [ CCL_EOF_BLOCK ])
     
          CCL_MAIN_BLOCK := CCL_BLOCK
          CCL_EOF_BLOCK := CCL_BLOCK
     
          CCL_BLOCK := STATEMENT | (STATEMENT [STATEMENT ...])
          STATEMENT :=
                  SET | IF | BRANCH | LOOP | REPEAT | BREAK
                  | READ | WRITE
     
          SET := (REG = EXPRESSION) | (REG SELF_OP EXPRESSION)
                 | INT-OR-CHAR
     
          EXPRESSION := ARG | (EXPRESSION OP ARG)
     
          IF := (if EXPRESSION CCL_BLOCK CCL_BLOCK)
          BRANCH := (branch EXPRESSION CCL_BLOCK [CCL_BLOCK ...])
          LOOP := (loop STATEMENT [STATEMENT ...])
          BREAK := (break)
          REPEAT := (repeat)
                  | (write-repeat [REG | INT-OR-CHAR | string])
                  | (write-read-repeat REG [INT-OR-CHAR | string | ARRAY]?)
          READ := (read REG) | (read REG REG)
                  | (read-if REG ARITH_OP ARG CCL_BLOCK CCL_BLOCK)
                  | (read-branch REG CCL_BLOCK [CCL_BLOCK ...])
          WRITE := (write REG) | (write REG REG)
                  | (write INT-OR-CHAR) | (write STRING) | STRING
                  | (write REG ARRAY)
          END := (end)
     
          REG := r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7
          ARG := REG | INT-OR-CHAR
          OP :=   + | - | * | / | % | & | '|' | ^ | << | >> | <8 | >8 | //
                  | < | > | == | <= | >= | !=
          SELF_OP :=
                  += | -= | *= | /= | %= | &= | '|=' | ^= | <<= | >>=
          ARRAY := '[' INT-OR-CHAR ... ']'
          INT-OR-CHAR := INT | CHAR
     
     MACHINE CODE:
     
     The machine code consists of a vector of 32-bit words.
     The first such word specifies the start of the EOF section of the code;
     this is the code executed to handle any stuff that needs to be done
     (e.g. designating back to ASCII and left-to-right mode) after all
     other encoded/decoded data has been written out.  This is not used for
     charset CCL programs.
     
     REGISTER: 0..7  -- referred to by RRR or rrr
     
     OPERATOR BIT FIELD (27-bit): XXXXXXXXXXXXXXX RRR TTTTT
             TTTTT (5-bit): operator type
             RRR (3-bit): register number
             XXXXXXXXXXXXXXX (15-bit):
                     CCCCCCCCCCCCCCC: constant or address
                     000000000000rrr: register number
     
     AAAAA:  00000 +
             00001 -
             00010 *
             00011 /
             00100 %
             00101 &
             00110 |
             00111 ^
     
             01000 <<
             01001 >>
             01010 <8
             01011 >8
             01100 //
             01101 not used
             01110 not used
             01111 not used
     
             10000 <
             10001 >
             10010 ==
             10011 <=
             10100 >=
             10101 !=
     
     OPERATORS:      TTTTT RRR XX..
     
     SetCS:          00000 RRR C...C      RRR = C...C
     SetCL:          00001 RRR .....      RRR = c...c
                     c.............c
     SetR:           00010 RRR ..rrr      RRR = rrr
     SetA:           00011 RRR ..rrr      RRR = array[rrr]
                     C.............C      size of array = C...C
                     c.............c      contents = c...c
     
     Jump:           00100 000 c...c      jump to c...c
     JumpCond:       00101 RRR c...c      if (!RRR) jump to c...c
     WriteJump:      00110 RRR c...c      Write1 RRR, jump to c...c
     WriteReadJump:  00111 RRR c...c      Write1, Read1 RRR, jump to c...c
     WriteCJump:     01000 000 c...c      Write1 C...C, jump to c...c
                     C...C
     WriteCReadJump: 01001 RRR c...c      Write1 C...C, Read1 RRR,
                     C.............C      and jump to c...c
     WriteSJump:     01010 000 c...c      WriteS, jump to c...c
                     C.............C
                     S.............S
                     ...
     WriteSReadJump: 01011 RRR c...c      WriteS, Read1 RRR, jump to c...c
                     C.............C
                     S.............S
                     ...
     WriteAReadJump: 01100 RRR c...c      WriteA, Read1 RRR, jump to c...c
                     C.............C      size of array = C...C
                     c.............c      contents = c...c
                     ...
     Branch:         01101 RRR C...C      if (RRR >= 0 && RRR < C..)
                     c.............c      branch to (RRR+1)th address
     Read1:          01110 RRR ...        read 1-byte to RRR
     Read2:          01111 RRR ..rrr      read 2-byte to RRR and rrr
     ReadBranch:     10000 RRR C...C      Read1 and Branch
                     c.............c
                     ...
     Write1:         10001 RRR .....      write 1-byte RRR
     Write2:         10010 RRR ..rrr      write 2-byte RRR and rrr
     WriteC:         10011 000 .....      write 1-char C...C
                     C.............C
     WriteS:         10100 000 .....      write C..-byte of string
                     C.............C
                     S.............S
                     ...
     WriteA:         10101 RRR .....      write array[RRR]
                     C.............C      size of array = C...C
                     c.............c      contents = c...c
                     ...
     End:            10110 000 .....      terminate the execution
     
     SetSelfCS:      10111 RRR C...C      RRR AAAAA= C...C
                     ..........AAAAA
     SetSelfCL:      11000 RRR .....      RRR AAAAA= c...c
                     c.............c
                     ..........AAAAA
     SetSelfR:       11001 RRR ..Rrr      RRR AAAAA= rrr
                     ..........AAAAA
     SetExprCL:      11010 RRR ..Rrr      RRR = rrr AAAAA c...c
                     c.............c
                     ..........AAAAA
     SetExprR:       11011 RRR ..rrr      RRR = rrr AAAAA Rrr
                     ............Rrr
                     ..........AAAAA
     JumpCondC:      11100 RRR c...c      if !(RRR AAAAA C..) jump to c...c
                     C.............C
                     ..........AAAAA
     JumpCondR:      11101 RRR c...c      if !(RRR AAAAA rrr) jump to c...c
                     ............rrr
                     ..........AAAAA
     ReadJumpCondC:  11110 RRR c...c      Read1 and JumpCondC
                     C.............C
                     ..........AAAAA
     ReadJumpCondR:  11111 RRR c...c      Read1 and JumpCondR
                     ............rrr
                     ..........AAAAA
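   Reading the operator bit field above with the most significant bit on the left, an instruction word can be split into its fields as in this C sketch.  The names `ccl_insn', `ccl_decode' and `ccl_encode' are hypothetical.  The sketch only separates the fields.  It does not interpret the operand, nor the extra words that some instructions use.

```c
#include <stdint.h>

/* Hypothetical decoded form of one CCL instruction word. */
struct ccl_insn { unsigned op; unsigned reg; uint32_t operand; };

/* Split a word laid out as X...X RRR TTTTT (TTTTT in the low bits). */
static struct ccl_insn ccl_decode(uint32_t word)
{
  struct ccl_insn i;
  i.op = word & 0x1F;          /* TTTTT: operator type */
  i.reg = (word >> 5) & 0x7;   /* RRR: register number */
  i.operand = word >> 8;       /* X...X: constant, address or rrr */
  return i;
}

static uint32_t ccl_encode(unsigned op, unsigned reg, uint32_t operand)
{
  return (operand << 8) | ((uint32_t) (reg & 0x7) << 5) | (op & 0x1F);
}
```

   For example, `ccl_encode (0x00, 1, 42)' builds a SetCS instruction meaning `r1 = 42'.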


File: internals.info,  Node: The Lisp Reader and Compiler,  Next: Lstreams,  Prev: MULE Character Sets and Encodings,  Up: Top

The Lisp Reader and Compiler
****************************

   Not yet documented.


File: internals.info,  Node: Lstreams,  Next: Consoles; Devices; Frames; Windows,  Prev: The Lisp Reader and Compiler,  Up: Top

Lstreams
********

   An "lstream" is an internal Lisp object that provides a generic
buffering stream implementation.  Conceptually, you send data to the
stream or read data from the stream, not caring what's on the other end
of the stream.  The other end could be another stream, a file
descriptor, a stdio stream, a fixed block of memory, a reallocating
block of memory, etc.  The main purpose of the stream is to provide a
standard interface and to do buffering.  Macros are defined to read or
write characters, so the calling functions do not have to worry about
blocking data together in order to achieve efficiency.
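   A toy output stream in C can illustrate the buffering idea and the use of macros for character I/O.  This is not the real `Lstream' structure.  The names `toy_stream', `toy_stream_flush', `toy_stream_putc_1' and `TOY_STREAM_PUTC' are hypothetical.  The "other end" is just an array standing in for a file descriptor or another stream.

```c
#include <string.h>

/* Hypothetical toy output stream: bytes collect in BUF and are passed
   to the other end (SINK) only when BUF is full or on flush. */
#define TOY_BUFSIZE 4

struct toy_stream {
  unsigned char buf[TOY_BUFSIZE];
  int used;
  unsigned char sink[64];   /* stands in for a file descriptor, etc. */
  int sink_len;
  int flushes;              /* how many times data was passed on */
};

/* Returns 0 on success, -1 if the sink is full. */
static int toy_stream_flush(struct toy_stream *s)
{
  if (s->used == 0)
    return 0;
  if (s->sink_len + s->used > (int) sizeof s->sink)
    return -1;
  memcpy(s->sink + s->sink_len, s->buf, s->used);
  s->sink_len += s->used;
  s->used = 0;
  s->flushes++;
  return 0;
}

/* Slow path, called only when the buffer is full. */
static int toy_stream_putc_1(struct toy_stream *s, int c)
{
  if (toy_stream_flush(s) < 0)
    return -1;
  s->buf[s->used++] = (unsigned char) c;
  return 0;
}

/* Fast path as a macro: C is evaluated exactly once (only one branch
   runs), but S is evaluated more than once. */
#define TOY_STREAM_PUTC(s, c)                          \
  ((s)->used < TOY_BUFSIZE                             \
   ? ((s)->buf[(s)->used++] = (unsigned char) (c), 0)  \
   : toy_stream_putc_1 ((s), (c)))
```

   The fast path of `TOY_STREAM_PUTC' only stores a byte.  The out-of-line function runs once per full buffer, which is where the efficiency comes from.  As with `Lstream_putc', the stream argument should not have side effects because it is evaluated more than once.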

* Menu:

* Creating an Lstream::         Creating an lstream object.
* Lstream Types::               Different sorts of things that are streamed.
* Lstream Functions::           Functions for working with lstreams.
* Lstream Methods::             Creating new lstream types.


File: internals.info,  Node: Creating an Lstream,  Next: Lstream Types,  Up: Lstreams

Creating an Lstream
===================

   Lstreams come in different types, depending on what is being
interfaced to.  Although the primitive for creating new lstreams is
`Lstream_new()', generally you do not call this directly.  Instead, you
call some type-specific creation function, which creates the lstream
and initializes it as appropriate for the particular type.

   All lstream creation functions take a MODE argument, specifying what
mode the lstream should be opened as.  This controls whether the
lstream is for input or output, and optionally whether data should be
blocked up in units of MULE characters.  Note that some types of
lstreams can only be opened for input; others only for output; and
others can be opened either way.  #### Richard Mlynarik thinks that
there should be a strict separation between input and output streams,
and he's probably right.

   MODE is a string, one of

`"r"'
     Open for reading.

`"w"'
     Open for writing.

`"rc"'
     Open for reading, but "read" never returns partial MULE characters.

`"wc"'
     Open for writing, but never writes partial MULE characters.
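   The four MODE strings amount to two independent choices: direction
and character-based blocking.  The following sketch makes that
explicit.  The flag names and `parse_lstream_mode' are hypothetical;
the real lstream code records this information in its own way.

```c
#include <string.h>

/* Hypothetical flag bits describing an lstream's mode. */
enum
{
  MODE_READ      = 1 << 0,
  MODE_WRITE     = 1 << 1,
  MODE_CHARBASED = 1 << 2      /* never split a MULE character */
};

/* Translate "r", "w", "rc" or "wc" into flags; -1 if MODE is invalid. */
static int
parse_lstream_mode (const char *mode)
{
  if (strcmp (mode, "r") == 0)
    return MODE_READ;
  if (strcmp (mode, "w") == 0)
    return MODE_WRITE;
  if (strcmp (mode, "rc") == 0)
    return MODE_READ | MODE_CHARBASED;
  if (strcmp (mode, "wc") == 0)
    return MODE_WRITE | MODE_CHARBASED;
  return -1;
}
```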


File: internals.info,  Node: Lstream Types,  Next: Lstream Functions,  Prev: Creating an Lstream,  Up: Lstreams

Lstream Types
=============

   The following lstream types are defined:

stdio

filedesc

lisp-string

fixed-buffer

resizing-buffer

dynarr

lisp-buffer

print

decoding

encoding

File: internals.info,  Node: Lstream Functions,  Next: Lstream Methods,  Prev: Lstream Types,  Up: Lstreams

Lstream Functions
=================

 - Function: Lstream * Lstream_new (Lstream_implementation *IMP, CONST
          char *MODE)
     Allocate and return a new Lstream.  This function is not really
     meant to be called directly; rather, each stream type should
     provide its own stream creation function, which creates the stream
     and does any other necessary setup (e.g. opening a file).

 - Function: void Lstream_set_buffering (Lstream *LSTR,
          Lstream_buffering BUFFERING, int BUFFERING_SIZE)
     Change the buffering of a stream.  See `lstream.h'.  By default the
     buffering is `STREAM_BLOCK_BUFFERED'.

 - Function: int Lstream_flush (Lstream *LSTR)
     Flush out any pending unwritten data in the stream.  Clear any
     buffered input data.  Returns 0 on success, -1 on error.

 - Macro: int Lstream_putc (Lstream *STREAM, int C)
     Write out one byte to the stream.  This is a macro and so it is
     very efficient.  The C argument is only evaluated once but the
     STREAM argument is evaluated more than once.  Returns 0 on
     success, -1 on error.

 - Macro: int Lstream_getc (Lstream *STREAM)
     Read one byte from the stream.  This is a macro and so it is very
     efficient.  The STREAM argument is evaluated more than once.
     Return value is -1 for EOF or error.

 - Macro: void Lstream_ungetc (Lstream *STREAM, int C)
     Push one byte back onto the input queue.  This will be the next
     byte read from the stream.  Any number of bytes can be pushed back;
     they will be read in the reverse of the order in which they were
     pushed back, most recent first.  (This is necessary for
     consistency: if a number of bytes have been unread and you then
     read and unread a byte, that byte needs to be the first one read
     again.)  This is a macro and so it is very efficient.  The C
     argument is only evaluated once, but the STREAM argument is
     evaluated more than once.

 - Function: int Lstream_fputc (Lstream *STREAM, int C)
 - Function: int Lstream_fgetc (Lstream *STREAM)
 - Function: void Lstream_fungetc (Lstream *STREAM, int C)
     Function equivalents of the above macros.
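   The following hypothetical in-memory input stream illustrates two
points made above: the macro fast path mentions STREAM several times,
so STREAM must not be an expression with side effects, and pushed-back
bytes come out most recent first.  The names `mini_istream',
`mini_getc', `mini_ungetc', and `mini_fgetc' are invented, and the
layout is not that of the real `Lstream'.

```c
#include <stddef.h>

/* Hypothetical in-memory input stream; not the real Lstream layout. */
typedef struct
{
  const unsigned char *data;   /* bytes to be read */
  size_t len, pos;
  unsigned char unget[16];     /* pushed-back bytes, used as a stack */
  size_t nunget;
} mini_istream;

/* Slow path, also usable as a real function (compare Lstream_fgetc).
   Pushed-back bytes are returned first, most recent first. */
static int
mini_fgetc (mini_istream *s)
{
  if (s->nunget > 0)
    return s->unget[--s->nunget];
  return s->pos < s->len ? s->data[s->pos++] : -1;   /* -1 at EOF */
}

/* Macro fast path: STREAM is evaluated several times. */
#define mini_getc(stream)                                 \
  ((stream)->nunget == 0 && (stream)->pos < (stream)->len \
   ? (int) (stream)->data[(stream)->pos++]                \
   : mini_fgetc (stream))

/* C is evaluated once, STREAM twice.  No overflow check in this sketch. */
#define mini_ungetc(stream, c)                            \
  ((void) ((stream)->unget[(stream)->nunget++] = (unsigned char) (c)))
```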

 - Function: int Lstream_read (Lstream *STREAM, void *DATA, int SIZE)
     Read up to SIZE bytes from the stream into DATA.  Return the
     number of bytes read.  0 means EOF; -1 means an error occurred and
     no bytes were read.

 - Function: int Lstream_write (Lstream *STREAM, void *DATA, int SIZE)
     Write SIZE bytes from DATA to the stream.  Return the number of
     bytes written.  -1 means an error occurred and no bytes were
     written.

 - Function: void Lstream_unread (Lstream *STREAM, void *DATA, int SIZE)
     Push back SIZE bytes of DATA onto the input queue.  The next call
     to `Lstream_read()' with the same size will read the same bytes
     back, even if there is other pending unread data.

 - Function: int Lstream_close (Lstream *STREAM)
     Close the stream.  All data will be flushed out.

 - Function: void Lstream_reopen (Lstream *STREAM)
     Reopen a closed stream.  This enables I/O on it again.  This is not
     meant to be called except from a wrapper routine that reinitializes
     variables and such; the close routine may, for example, have freed
     some necessary storage structures.

 - Function: void Lstream_rewind (Lstream *STREAM)
     Rewind the stream to the beginning.


File: internals.info,  Node: Lstream Methods,  Prev: Lstream Functions,  Up: Lstreams

Lstream Methods
===============

 - Lstream Method: int reader (Lstream *STREAM, unsigned char *DATA,
          int SIZE)
     Read some data from the other end of the stream and store it into
     DATA, which can hold SIZE bytes.  Return the number of bytes read.
     A return value of 0 means no bytes can be read at this time.  This
     may be because of an EOF, or because the stream imposes a
     granularity greater than one byte on the returned data and SIZE is
     less than this granularity.  (This will happen frequently for
     streams that need to return whole characters, because
     `Lstream_read()' calls the reader function repeatedly until it has
     the number of bytes it wants or until 0 is returned.)  The lstream
     functions do not treat a 0 return as EOF or do anything special;
     however, the calling function will interpret any 0 it gets back as
     EOF.  This will normally not happen unless the caller calls
     `Lstream_read()' with a very small size.

     This function can be `NULL' if the stream is output-only.
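   The interaction between a reader with coarse granularity and the
repeated calls made on the caller's side can be sketched as follows.
`unit_source', `unit_reader', and `read_up_to' are hypothetical: the
reader delivers at most 4 bytes per call and only whole 2-byte units,
standing in for a stream that must not split characters, and
`read_up_to' models the calling loop described above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical source delivering whole 2-byte units, at most 4 bytes
   per call. */
typedef struct
{
  const unsigned char *data;
  size_t len, pos;
} unit_source;

/* Reader method: store up to SIZE bytes in DATA and return the count.
   Returns 0 both at EOF and when SIZE is smaller than one unit. */
static int
unit_reader (unit_source *src, unsigned char *data, size_t size)
{
  size_t n = src->len - src->pos;       /* bytes still available */
  if (n > size)
    n = size;
  if (n > 4)
    n = 4;
  n -= n % 2;                           /* whole units only */
  memcpy (data, src->data + src->pos, n);
  src->pos += n;
  return (int) n;
}

/* Caller-side loop: call the reader until SIZE bytes are collected
   or it returns 0. */
static int
read_up_to (unit_source *src, unsigned char *data, size_t size)
{
  size_t got = 0;
  while (got < size)
    {
      int n = unit_reader (src, data + got, size - got);
      if (n < 0)                        /* error: -1 only if nothing read */
        return got > 0 ? (int) got : -1;
      if (n == 0)
        break;
      got += (size_t) n;
    }
  return (int) got;
}
```

   Asking for 7 bytes from "0123456789" yields 6: two reader calls
return 4 and 2 bytes, and the third call, offered a single byte,
returns 0.  A caller that asks for just 1 byte gets 0 back, which it
cannot distinguish from a real EOF.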

 - Lstream Method: int writer (Lstream *STREAM, CONST unsigned char
          *DATA, int SIZE)
     Send some data to the other end of the stream.  Data to be sent is
     in DATA and is SIZE bytes long.  Return the number of bytes sent.
     This function can send and return fewer bytes than were passed in;
     in that case, the function will just be called again until there
     is no data left or 0 is returned.  A return value of 0 means that
     no more data can be stored at the moment, but there is no error;
     the data will be squirreled away until the writer can accept data.
     (This is useful, e.g., if you're dealing with a non-blocking file
     descriptor and are getting `EWOULDBLOCK' errors.)  This function
     can be `NULL' if the stream is input-only.
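   A sketch of the writer contract, under stated assumptions: the sink
`slow_sink' has limited room, as a non-blocking descriptor might, and
the hypothetical writer `slow_writer' accepts at most 3 bytes per call.
The caller-side loop `push_data' (also hypothetical) keeps calling the
writer until everything is sent or it returns 0, and reports how many
bytes are still pending.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sink standing in for a non-blocking descriptor. */
typedef struct
{
  unsigned char out[32];
  size_t len;    /* bytes accepted so far */
  size_t room;   /* bytes it can accept before it "would block" */
} slow_sink;

/* Writer method: accepts at most 3 bytes per call and never more than
   ROOM; returning 0 means "nothing accepted now", not an error. */
static int
slow_writer (slow_sink *sink, const unsigned char *data, size_t size)
{
  size_t n = size < sink->room ? size : sink->room;
  if (n > 3)
    n = 3;
  memcpy (sink->out + sink->len, data, n);
  sink->len += n;
  sink->room -= n;
  return (int) n;
}

/* Caller side: returns the number of bytes still pending, which a real
   stream would keep in its buffer and try to send again later. */
static size_t
push_data (slow_sink *sink, const unsigned char *data, size_t size)
{
  while (size > 0)
    {
      int n = slow_writer (sink, data, size);
      if (n <= 0)      /* 0: would block (a real stream treats -1 as error) */
        break;
      data += n;
      size -= (size_t) n;
    }
  return size;
}
```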

 - Lstream Method: int rewinder (Lstream *STREAM)
     Rewind the stream.  If this is `NULL', the stream is not seekable.

 - Lstream Method: int seekable_p (Lstream *STREAM)
     Indicate whether this stream is seekable, i.e. whether it can be rewound.
     This method is ignored if the stream does not have a rewind
     method.  If this method is not present, the result is determined
     by whether a rewind method is present.
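   The rules for `rewinder' and `seekable_p' combine as in the sketch
below.  The two-slot table `seek_methods', the function
`stream_seekable_p', and the example methods are hypothetical; the
real `Lstream_implementation' has many more slots.

```c
#include <stddef.h>

/* Hypothetical method table with only the two relevant slots. */
typedef struct
{
  int (*rewinder) (void *stream);
  int (*seekable_p) (void *stream);
} seek_methods;

/* No rewinder: not seekable.  Rewinder but no seekable_p: seekable.
   Otherwise ask seekable_p. */
static int
stream_seekable_p (const seek_methods *m, void *stream)
{
  if (m->rewinder == NULL)
    return 0;
  if (m->seekable_p == NULL)
    return 1;
  return m->seekable_p (stream) != 0;
}

/* Example methods, for illustration only. */
static int example_rewind (void *stream) { (void) stream; return 0; }
static int example_yes (void *stream) { (void) stream; return 1; }
static int example_no (void *stream) { (void) stream; return 0; }
```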

 - Lstream Method: int flusher (Lstream *STREAM)
     Perform any additional operations necessary to flush the data in
     this stream.

 - Lstream Method: int pseudo_closer (Lstream *STREAM)

 - Lstream Method: int closer (Lstream *STREAM)
     Perform any additional operations necessary to close this stream
     down.  May be `NULL'.  This function is called when
     `Lstream_close()' is called or when the stream is
     garbage-collected.  When this function is called, all pending data
     in the stream will already have been written out.

 - Lstream Method: Lisp_Object marker (Lisp_Object LSTREAM, void
          (*MARKFUN) (Lisp_Object))
     Mark this object for garbage collection.  Same semantics as a
     standard `Lisp_Object' marker.  This function can be `NULL'.


File: internals.info,  Node: Consoles; Devices; Frames; Windows,  Next: The Redisplay Mechanism,  Prev: Lstreams,  Up: Top

Consoles; Devices; Frames; Windows
**********************************

* Menu:

* Introduction to Consoles; Devices; Frames; Windows::
* Point::
* Window Hierarchy::
* The Window Object::


File: internals.info,  Node: Introduction to Consoles; Devices; Frames; Windows,  Next: Point,  Up: Consoles; Devices; Frames; Windows

Introduction to Consoles; Devices; Frames; Windows
==================================================

   A window-system window that you see on the screen is called a
"frame" in Emacs terminology.  Each frame is subdivided into one or
more non-overlapping panes, called (confusingly) "windows".  Each
window displays the text of a buffer.  (See above on Buffers.)  Note
that buffers and windows are independent entities: two or more windows
can be displaying the same buffer (potentially in different locations),
and a buffer can be displayed in no windows.

   A single display screen that contains one or more frames is called a
"display"; the corresponding internal object is called a "device".
Under most circumstances, there is only one display.
However, more than one display can exist, for example if you have a
"multi-headed" console, i.e. one with a single keyboard but multiple
displays. (Typically in such a situation, the various displays act like
one large display, in that the mouse is only in one of them at a time,
and moving the mouse off of one moves it into another.) In some cases,
the different displays will have different characteristics, e.g. one
color and one mono.

   XEmacs can display frames on multiple displays.  It can even deal
simultaneously with frames on multiple keyboards (called "consoles" in
XEmacs terminology).  Here is one case where this might be useful: You
are using XEmacs on your workstation at work, and leave it running.
Then you go home and dial in on a TTY line, and you can use the
already-running XEmacs process to display another frame on your local
TTY.

   Thus, there is a hierarchy console -> display (device) -> frame ->
window.  There is a separate Lisp object type for each of these four
concepts: console, device, frame, and window.  Furthermore, there is
logically a "selected console", "selected device", "selected frame",
and "selected window".  Each of these objects is distinguished in
various ways, such as being the default object for various functions
that act on objects of that type.  Note that every containing object
remembers the "selected" object among the objects that it contains:
e.g. not only is there a selected window, but every frame remembers
the last window in it that was selected, and changing the selected
frame causes the remembered window within it to become the selected
window.  Similar relationships apply for consoles to devices and
devices to frames.
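   One way to obtain the "remembered selection" behavior is sketched
below: each frame stores its own selected window, and the global
selected window is derived from the selected frame, so changing the
selected frame automatically changes the selected window.  The reduced
`window' and `frame' structures and the functions
`select_window_in_frame', `select_frame', and `selected_window' are
hypothetical simplifications, not XEmacs's actual code.

```c
#include <stddef.h>

/* Hypothetical, heavily reduced objects. */
typedef struct window { int id; } window;

typedef struct frame
{
  window *selected_window;   /* last window selected in this frame */
} frame;

static frame *selected_frame;

/* Select W, which lives in frame F: F remembers W and becomes the
   selected frame. */
static void
select_window_in_frame (frame *f, window *w)
{
  f->selected_window = w;
  selected_frame = f;
}

/* Selecting a frame makes its remembered window the selected one. */
static void
select_frame (frame *f)
{
  selected_frame = f;
}

/* The globally selected window is derived from the selected frame. */
static window *
selected_window (void)
{
  return selected_frame ? selected_frame->selected_window : NULL;
}
```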


File: internals.info,  Node: Point,  Next: Window Hierarchy,  Prev: Introduction to Consoles; Devices; Frames; Windows,  Up: Consoles; Devices; Frames; Windows

Point
=====

   Recall that every buffer has a current insertion position, called
"point".  Now, two or more windows may be displaying the same buffer,
and the text cursor in the two windows (i.e. `point') can be in two
different places.  You may ask, how can that be, since each buffer has
only one value of `point'?  The answer is that each window also has a
value of `point' that is squirreled away in it.  There is only one
selected window, and the value of `point' in its buffer corresponds to
that window.  When the selected window is changed from one window to
another displaying the same buffer, the old value of `point' is stored
into the old window's "point" and the value of `point' from the new
window is retrieved and made the value of `point' in the buffer.  This
means that `window-point' for the selected window is potentially
inaccurate, and if you want to retrieve the correct value of `point'
for a window, you must special-case the selected window and retrieve
the buffer's point instead.  This is related to why
`save-window-excursion' does not save the selected window's value of
`point'.
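   The swapping of `point' described above can be sketched as follows.
The structures and the functions `select_window' and `window_point'
are hypothetical simplifications; in particular, the real `pointm'
field holds a marker, not an integer.

```c
#include <stddef.h>

/* Hypothetical reduced objects. */
typedef struct buffer { long point; } buffer;
typedef struct window { buffer *buf; long pointm; } window;

static window *selected_window;

/* Correct point for W: the selected window's stashed value may be
   stale, so use its buffer's point instead. */
static long
window_point (const window *w)
{
  return w == selected_window ? w->buf->point : w->pointm;
}

/* Change the selected window, moving point values as described. */
static void
select_window (window *w)
{
  if (selected_window)
    selected_window->pointm = selected_window->buf->point; /* stash old */
  w->buf->point = w->pointm;                               /* restore new */
  selected_window = w;
}
```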


File: internals.info,  Node: Window Hierarchy,  Next: The Window Object,  Prev: Point,  Up: Consoles; Devices; Frames; Windows

Window Hierarchy
================

   If a frame contains multiple windows (panes), they are always created
by splitting an existing window along the horizontal or vertical axis.
Terminology is a bit confusing here: to "split a window horizontally"
means to create two side-by-side windows, i.e. to make a *vertical* cut
in a window.  Likewise, to "split a window vertically" means to create
two windows, one above the other, by making a *horizontal* cut.

   If you split a window and then split again along the same axis, you
will end up with a number of panes all arranged along the same axis.
The precise way in which the splits were made should not be important,
and this is reflected internally.  Internally, all windows are arranged
in a tree, consisting of two types of windows, "combination" windows
(which have children, and are covered completely by those children) and
"leaf" windows, which have no children and are visible.  Every
combination window has two or more children, all arranged along the same
axis.  There are (logically) two subtypes of combination windows,
depending on whether their children are horizontally or vertically
arrayed.  There is
always one root window, which is either a leaf window (if the frame
contains only one window) or a combination window (if the frame contains
more than one window).  In the latter case, the root window will have
two or more children, either horizontally or vertically arrayed, and
each of those children will be either a leaf window or another
combination window.

   Here are some rules:

  1. Horizontal combination windows can never have children that are
     horizontal combination windows; same for vertical.

  2. Only leaf windows can be split (obviously), and this splitting does
     one of two things: (a) replaces the leaf window in the tree with a
     new combination window whose children are the original leaf and a
     new leaf, or (b) leaves the leaf window where it is and creates the
     new leaf as its sibling.  Rule (1) dictates which of these two
     outcomes happens.

  3. Every combination window must have at least two children.

  4. Leaf windows can never become combination windows.  They can be
     deleted, however.  If this results in a violation of (3), the
     parent combination window also gets deleted.

  5. All functions that accept windows must be prepared to accept
     combination windows, and do something sane (e.g. signal an error
     if so).  Combination windows *do* escape to the Lisp level.

  6. All windows have three fields governing their contents: these are
     "hchild" (a list of horizontally-arrayed children), "vchild" (a
     list of vertically-arrayed children), and "buffer" (the buffer
     contained in a leaf window).  Exactly one of these will be
     non-nil.  Remember that "horizontally-arrayed" means
     "side-by-side" and "vertically-arrayed" means "one above the
     other".

  7. Leaf windows also have markers in their `start' (the first buffer
     position displayed in the window) and `pointm' (the window's
     stashed value of `point' - see above) fields, while combination
     windows have nil in these fields.

  8. The list of children for a window is threaded through the `next'
     and `prev' fields of each child window.

  9. *Deleted windows can be undeleted*.  This happens as a result of
     restoring a window configuration, and is unlike frames, displays,
     and consoles, which, once deleted, can never be restored.
     Deleting a window does nothing except set a special `dead' bit to
     1 and clear out the `next', `prev', `hchild', and `vchild' fields,
     for GC purposes.

 10. Most frames actually have two top-level windows - one for the
     minibuffer and one (the "root") for everything else.  The modeline
     (if present) separates these two.  The `next' field of the root
     points to the minibuffer, and the `prev' field of the minibuffer
     points to the root.  The other `next' and `prev' fields are `nil',
     and the frame points to both of these windows.  Minibuffer-less
     frames have no minibuffer window, and the `next' and `prev' of the
     root window are `nil'.  Minibuffer-only frames have no root
     window, and the `next' of the minibuffer window is `nil' but the
     `prev' points to itself. (#### This is an artifact that should be
     fixed.)
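   The tree representation and the splitting rules can be sketched in
C.  The reduced `window' structure and the functions `new_leaf',
`split_window', and `count_leaves' are hypothetical: they ignore the
minibuffer window (rule 10), window sizes, and deletion, and a string
stands in for the buffer object.  In keeping with rule (4), splitting
never turns a leaf into a combination window; when a new combination
window is needed, it takes the leaf's place in the tree and the leaf
becomes its first child.

```c
#include <stdlib.h>

/* Hypothetical reduced window.  Exactly one of HCHILD, VCHILD and
   BUFFER is non-NULL (rule 6); siblings are chained through NEXT and
   PREV (rule 8). */
typedef struct window
{
  struct window *hchild, *vchild;   /* first child of a combination */
  struct window *next, *prev, *parent;
  const char *buffer;
} window;

/* Allocate a leaf window; allocation failure is ignored here. */
static window *
new_leaf (const char *buffer)
{
  window *w = calloc (1, sizeof *w);
  w->buffer = buffer;
  return w;
}

/* Split leaf W, adding a new leaf showing BUFFER right after it;
   HORIZONTAL != 0 means side by side.  If W's parent already arrays
   its children along that axis, the new leaf joins them (rule 1);
   otherwise a new combination window takes W's place.  Returns the
   combination window that now contains W. */
static window *
split_window (window *w, int horizontal, const char *buffer)
{
  window *p = w->parent;
  window *leaf = new_leaf (buffer);
  int same_axis = p != NULL
    && (horizontal ? p->hchild != NULL : p->vchild != NULL);

  if (!same_axis)
    {
      window *combo = calloc (1, sizeof *combo);
      /* COMBO takes over W's position among its siblings. */
      combo->parent = p;
      combo->next = w->next;
      combo->prev = w->prev;
      if (w->next)
        w->next->prev = combo;
      if (w->prev)
        w->prev->next = combo;
      else if (p && p->hchild == w)
        p->hchild = combo;
      else if (p)
        p->vchild = combo;
      if (horizontal)
        combo->hchild = w;
      else
        combo->vchild = w;
      w->parent = combo;
      w->next = w->prev = NULL;
      p = combo;
    }

  /* Link the new leaf in directly after W. */
  leaf->parent = p;
  leaf->prev = w;
  leaf->next = w->next;
  if (w->next)
    w->next->prev = leaf;
  w->next = leaf;
  return p;
}

/* Count the visible (leaf) windows in the subtree rooted at W. */
static int
count_leaves (const window *w)
{
  const window *c = w->hchild ? w->hchild : w->vchild;
  int n = 0;
  if (c == NULL)
    return 1;
  for (; c != NULL; c = c->next)
    n += count_leaves (c);
  return n;
}
```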


File: internals.info,  Node: The Window Object,  Prev: Window Hierarchy,  Up: Consoles; Devices; Frames; Windows

The Window Object
=================

   Windows have the following accessible fields:

`frame'
     The frame that this window is on.

`mini_p'
     Non-`nil' if this window is a minibuffer window.

`buffer'
     The buffer that the window is displaying.  This may change often
     during the life of the window.

`dedicated'
     Non-`nil' if this window is dedicated to its buffer.

`pointm'
     This is the value of point in the current buffer when this window
     is selected; when it is not selected, it retains its previous
     value.

`start'
     The position in the buffer that is the first character to be
     displayed in the window.

`force_start'
     If this flag is non-`nil', it says that the window has been
     scrolled explicitly by the Lisp program.  This affects what the
     next redisplay does if point is off the screen: instead of
     scrolling the window to show the text around point, it moves point
     to a location that is on the screen.

`last_modified'
     The `modified' field of the window's buffer, as of the last time a
     redisplay completed in this window.

`last_point'
     The buffer's value of point, as of the last time a redisplay
     completed in this window.

`left'
     This is the left-hand edge of the window, measured in columns.
     (The leftmost column on the screen is column 0.)

`top'
     This is the top edge of the window, measured in lines.  (The top
     line on the screen is line 0.)

`height'
     The height of the window, measured in lines.

`width'
     The width of the window, measured in columns.

`next'
     This is the window that is the next in the chain of siblings.  It
     is `nil' in a window that is the rightmost or bottommost of a
     group of siblings.

`prev'
     This is the window that is the previous in the chain of siblings.
     It is `nil' in a window that is the leftmost or topmost of a group
     of siblings.

`parent'
     Internally, XEmacs arranges windows in a tree; each group of
     siblings has a parent window whose area includes all the siblings.
     This field points to a window's parent.

     Parent windows do not display buffers, and play little role in
     display except to shape their child windows.  Emacs Lisp programs
     usually have no access to the parent windows; they operate on the
     windows at the leaves of the tree, which actually display buffers.

`hscroll'
     This is the number of columns that the display in the window is
     scrolled horizontally to the left.  Normally, this is 0.

`use_time'
     This is the last time that the window was selected.  The function
     `get-lru-window' uses this field.
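   As an illustration of how `use_time' can drive a least-recently-used
choice, here is a hypothetical helper that picks the candidate with the
smallest `use_time'.  The `win' structure and `lru_window' are invented
for this sketch; the real `get-lru-window' has additional arguments and
rules that are not modelled here.

```c
#include <stddef.h>

/* Hypothetical reduced window record. */
typedef struct
{
  const char *name;
  long use_time;     /* larger = selected more recently */
} win;

/* Return the least recently used of N candidates, or NULL if N is 0. */
static const win *
lru_window (const win *ws, size_t n)
{
  const win *best = NULL;
  for (size_t i = 0; i < n; i++)
    if (best == NULL || ws[i].use_time < best->use_time)
      best = &ws[i];
  return best;
}
```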

`display_table'
     The window's display table, or `nil' if none is specified for it.

`update_mode_line'
     Non-`nil' means this window's mode line needs to be updated.

`base_line_number'
     The line number of a certain position in the buffer, or `nil'.
     This is used for displaying the line number of point in the mode
     line.

`base_line_pos'
     The position in the buffer for which the line number is known, or
     `nil' meaning none is known.

`region_showing'
     If the region (or part of it) is highlighted in this window, this
     field holds the mark position that made one end of that region.
     Otherwise, this field is `nil'.