This is ../info/internals.info, produced by makeinfo version 4.0b from
internals/internals.texi.

INFO-DIR-SECTION XEmacs Editor
START-INFO-DIR-ENTRY
* Internals: (internals).       XEmacs Internals Manual.
END-INFO-DIR-ENTRY

   Copyright (C) 1992 - 1996 Ben Wing.  Copyright (C) 1996, 1997 Sun
Microsystems.  Copyright (C) 1994 - 1998 Free Software Foundation.
Copyright (C) 1994, 1995 Board of Trustees, University of Illinois.

   Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

   Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided that the
entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

   Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that this permission notice may be stated in a
translation approved by the Foundation.

   Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided also
that the section entitled "GNU General Public License" is included
exactly as in the original, and provided that the entire resulting
derived work is distributed under the terms of a permission notice
identical to this one.

   Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that the section entitled "GNU General Public License"
may be included in a translation approved by the Free Software
Foundation instead of in the original English.


File: internals.info,  Node: The Text in a Buffer,  Next: Buffer Lists,  Prev: Introduction to Buffers,  Up: Buffers and Textual Representation

The Text in a Buffer
====================

   The text in a buffer consists of a sequence of zero or more
characters.  A "character" is an integer that logically represents a
letter, number, space, or other unit of text.  Most of the characters
that you will typically encounter belong to the ASCII set of characters,
but there are also characters for various sorts of accented letters,
special symbols, Chinese ideograms and Japanese scripts (e.g. Kanji and
Katakana), Cyrillic and Greek letters, etc.  The actual number of
possible characters is quite large.

   For now, we can view a character as some non-negative integer that
has some shape that defines how it typically appears (e.g. as an
uppercase A). (The exact way in which a character appears depends on the
font used to display the character.) The internal type of characters in
the C code is an `Emchar'; this is just an `int', but using a symbolic
type makes the code clearer.

   Between every character in a buffer is a "buffer position" or
"character position".  We can speak of the character before or after a
particular buffer position, and when you insert a character at a
particular position, all characters after that position end up at new
positions.  When we speak of the character "at" a position, we really
mean the character after the position.  (This inconsistency between a
buffer position being "between" characters and "on" a character is
rampant in Emacs.)

   Buffer positions are numbered starting at 1.  This means that
position 1 is before the first character, and position 0 is not valid.
If there are N characters in a buffer, then buffer position N+1 is
after the last one, and position N+2 is not valid.

   The internal makeup of the Emchar integer varies depending on whether
we have compiled with MULE support.  If not, the Emchar integer is an
8-bit integer with possible values from 0 - 255.  0 - 127 are the
standard ASCII characters, while 128 - 255 are the characters from the
ISO-8859-1 character set.  If we have compiled with MULE support, an
Emchar is a 19-bit integer, with the various bits having meanings
according to a complex scheme that will be detailed later.  The
characters numbered 0 - 255 still have the same meanings as for the
non-MULE case, though.

   Internally, the text in a buffer is represented in a fairly simple
fashion: as a contiguous array of bytes, with a "gap" of some size in
the middle.  Although the gap is of some substantial size in bytes,
there is no text contained within it: From the perspective of the text
in the buffer, it does not exist.  The gap logically sits at some buffer
position, between two characters (or possibly at the beginning or end of
the buffer).  Insertion of text in a buffer at a particular position is
always accomplished by first moving the gap to that position (i.e.
through some block moving of text), then writing the text into the
beginning of the gap, thereby shrinking the gap.  If the gap shrinks
down to nothing, a new gap is created. (What actually happens is that a
new gap is "created" at the end of the buffer's text, which requires
nothing more than changing a couple of indices; then the gap is "moved"
to the position where the insertion needs to take place by moving up in
memory all the text after that position.)  Similarly, deletion occurs
by moving the gap to the place where the text is to be deleted, and
then simply expanding the gap to include the deleted text.
("Expanding" and "shrinking" the gap as just described means just that
the internal indices that keep track of where the gap is located are
changed.)
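
   The gap mechanism described above can be sketched in a few lines of
C.  This is a minimal sketch, not the XEmacs implementation: the
`gapbuf' structure and the `gb_*' functions are hypothetical names
invented for this example, offsets are 0-based for simplicity, and the
amount by which the gap grows (64 bytes) is arbitrary.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal gap buffer; names are illustrative only. */
typedef struct {
  unsigned char *data;  /* text before the gap, the gap, text after it */
  size_t gap_start;     /* 0-based offset of the first gap byte */
  size_t gap_size;      /* number of bytes in the gap */
  size_t text_len;      /* number of text bytes (the gap is not counted) */
} gapbuf;

int gb_init(gapbuf *b, size_t gap)
{
  b->data = malloc(gap ? gap : 1);
  if (!b->data) return -1;
  b->gap_start = 0;
  b->gap_size = gap;
  b->text_len = 0;
  return 0;
}

/* Move the gap so that it starts at 0-based text offset POS. */
void gb_move_gap(gapbuf *b, size_t pos)
{
  assert(pos <= b->text_len);
  if (pos < b->gap_start)       /* shift text [pos, gap_start) up past the gap */
    memmove(b->data + pos + b->gap_size, b->data + pos, b->gap_start - pos);
  else if (pos > b->gap_start)  /* shift text just after the gap down into it */
    memmove(b->data + b->gap_start, b->data + b->gap_start + b->gap_size,
            pos - b->gap_start);
  b->gap_start = pos;
}

/* Insert N bytes from S at text offset POS. */
int gb_insert(gapbuf *b, size_t pos, const void *s, size_t n)
{
  if (b->gap_size < n) {
    /* Gap too small: move it to the end of the text and enlarge it there. */
    size_t new_gap = n + 64;
    unsigned char *p;
    gb_move_gap(b, b->text_len);
    p = realloc(b->data, b->text_len + new_gap);
    if (!p) return -1;
    b->data = p;
    b->gap_size = new_gap;
  }
  gb_move_gap(b, pos);
  memcpy(b->data + b->gap_start, s, n);  /* write into the start of the gap */
  b->gap_start += n;                     /* ...which shrinks the gap */
  b->gap_size -= n;
  b->text_len += n;
  return 0;
}

/* Delete N bytes starting at text offset POS: only indices change. */
void gb_delete(gapbuf *b, size_t pos, size_t n)
{
  assert(pos + n <= b->text_len);
  gb_move_gap(b, pos);
  b->gap_size += n;   /* the gap expands over the deleted text */
  b->text_len -= n;
}

/* Fetch the byte at text offset POS, skipping over the gap. */
unsigned char gb_byte_at(const gapbuf *b, size_t pos)
{
  assert(pos < b->text_len);
  return b->data[pos < b->gap_start ? pos : pos + b->gap_size];
}
```

   Note that `gb_delete' copies at most the bytes needed to bring the
gap to the deletion point; the deleted text itself is never touched.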

   Note that the total amount of memory allocated for a buffer's text
never decreases while the buffer is live.  Therefore, if you load up a
20-megabyte file and then delete all but one character, there will be a
20-megabyte gap, which won't get any smaller (except by inserting
characters back again).  Once the buffer is killed, the memory allocated
for the buffer text will be freed, but it will still be sitting on the
heap, taking up virtual memory, and will not be released back to the
operating system. (However, if you have compiled XEmacs with rel-alloc,
the situation is different.  In this case, the space _will_ be released
back to the operating system.  However, this tends to result in a
noticeable speed penalty.)

   Astute readers may notice that the text in a buffer is represented as
an array of _bytes_, while (at least in the MULE case) an Emchar is a
19-bit integer, which clearly cannot fit in a byte.  This means (of
course) that the text in a buffer uses a different representation from
an Emchar: specifically, the 19-bit Emchar becomes a series of one to
four bytes.  The conversion between these two representations is complex
and will be described later.

   In the non-MULE case, everything is very simple: An Emchar is an
8-bit value, which fits neatly into one byte.

   If we are given a buffer position and want to retrieve the character
at that position, we need to follow these steps:

  1. Pretend there's no gap, and convert the buffer position into a
     "byte index" that indexes to the appropriate byte in the buffer's
     stream of textual bytes.  By convention, byte indices begin at 1,
     just like buffer positions.  In the non-MULE case, byte indices
     and buffer positions are identical, since one character equals one
     byte.

  2. Convert the byte index into a "memory index", which takes the gap
     into account.  The memory index is a direct index into the block of
     memory that stores the text of a buffer.  This basically just
     involves checking to see if the byte index is past the gap, and if
     so, adding the size of the gap to it.  By convention, memory
     indices begin at 1, just like buffer positions and byte indices,
     and when referring to the position that is "at" the gap, we always
     use the memory position at the _beginning_, not at the end, of the
     gap.

  3. Fetch the appropriate bytes at the determined memory position.

  4. Convert these bytes into an Emchar.

   In the non-MULE case, (3) and (4) boil down to a simple one-byte
memory access.
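
   For the non-MULE case, the four steps above can be sketched as
follows.  The typedef names come from this manual, but everything else
is an assumption made for illustration: the `text_sketch' structure,
its fields, and the function names are hypothetical and are not the
actual XEmacs definitions.

```c
#include <assert.h>

typedef int Bufpos;             /* buffer (character) position, 1-based */
typedef int Bytind;             /* byte index, 1-based */
typedef int Memind;             /* memory index, 1-based */
typedef int Emchar;
typedef unsigned char Bufbyte;

/* Hypothetical description of a buffer's text storage. */
struct text_sketch {
  const Bufbyte *mem;   /* block of memory holding the text and the gap */
  Bytind gap_start;     /* byte index at which the gap sits */
  int gap_size;         /* size of the gap in bytes */
  int nbytes;           /* number of text bytes, not counting the gap */
};

/* Step 1: without MULE, one character is one byte. */
Bytind bufpos_to_bytind_nomule(Bufpos pos)
{
  return pos;
}

/* Step 2: a position past the gap is shifted by the gap size; the
   position "at" the gap maps to the beginning of the gap. */
Memind bytind_to_memind(const struct text_sketch *t, Bytind bi)
{
  return bi > t->gap_start ? bi + t->gap_size : bi;
}

/* Steps 3 and 4: fetch the character "at" (i.e. after) POS.  When POS
   is at the gap, that character lives just past the end of the gap. */
Emchar fetch_char_nomule(const struct text_sketch *t, Bufpos pos)
{
  Bytind bi = bufpos_to_bytind_nomule(pos);
  Memind mi;
  assert(bi >= 1 && bi <= t->nbytes);
  mi = bi >= t->gap_start ? bi + t->gap_size : bi;
  return t->mem[mi - 1];        /* memory indices are 1-based */
}
```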

   Note that we have defined three types of positions in a buffer:

  1. "buffer positions" or "character positions", typedef `Bufpos'

  2. "byte indices", typedef `Bytind'

  3. "memory indices", typedef `Memind'

   All three typedefs are just `int's, but defining them this way makes
things a lot clearer.

   Most code works with buffer positions.  In particular, all Lisp code
that refers to text in a buffer uses buffer positions.  Lisp code does
not know that byte indices or memory indices exist.

   Finally, we have a typedef for the bytes in a buffer.  This is a
`Bufbyte', which is an unsigned char.  Referring to them as Bufbytes
underscores the fact that we are working with a string of bytes in the
internal Emacs buffer representation rather than in one of a number of
possible alternative representations (e.g. EUC-encoded text, etc.).


File: internals.info,  Node: Buffer Lists,  Next: Markers and Extents,  Prev: The Text in a Buffer,  Up: Buffers and Textual Representation

Buffer Lists
============

   Recall from earlier that buffers are "permanent" objects, i.e. that
they remain around until explicitly deleted.  This entails that there
is a list of all the buffers in existence.  This list is actually an
assoc-list (mapping from the buffer's name to the buffer) and is stored
in the global variable `Vbuffer_alist'.

   The order of the buffers in the list is important: the buffers are
ordered approximately from most-recently-used to least-recently-used.
Switching to a buffer using `switch-to-buffer', `pop-to-buffer', etc.
and switching windows using `other-window', etc.  usually brings the
new current buffer to the front of the list.  `switch-to-buffer',
`other-buffer', etc. look at the beginning of the list to find an
alternative buffer to suggest.  You can also explicitly move a buffer
to the end of the list using `bury-buffer'.

   In addition to the global ordering in `Vbuffer_alist', each frame
has its own ordering of the list.  These lists always contain the same
elements as in `Vbuffer_alist' although possibly in a different order.
`buffer-list' normally returns the list for the selected frame.  This
allows you to work in separate frames without things interfering with
each other.

   The standard way to look up a buffer given a name is `get-buffer',
and the standard way to create a new buffer is `get-buffer-create',
which looks up a buffer with a given name, creating a new one if
necessary.  These operations correspond exactly with the symbol
operations `intern-soft' and `intern', respectively.  You can also
force a new buffer to be created using `generate-new-buffer', which
takes a name and (if necessary) makes a unique name from this by
appending a number, and then creates the buffer.  This is basically
like the symbol operation `gensym'.


File: internals.info,  Node: Markers and Extents,  Next: Bufbytes and Emchars,  Prev: Buffer Lists,  Up: Buffers and Textual Representation

Markers and Extents
===================

   Some of the objects associated with a buffer are logically attached
to particular buffer positions.  Such objects can be used to keep track
of a buffer position when text is inserted and deleted, so that it
remains at the same spot relative to the text around it; to assign
properties to particular sections of text; etc.  There are two kinds of
such objects: "markers" and "extents".

   A "marker" is simply a flag placed at a particular buffer position,
which is moved around as text is inserted and deleted.  Markers are
used for all sorts of purposes, such as the `mark' that is the other
end of textual regions to be cut, copied, etc.

   An "extent" is similar to two markers plus some associated
properties, and is used to keep track of regions in a buffer as text is
inserted and deleted, and to add properties (e.g. fonts) to particular
regions of text.  The external interface of extents is explained
elsewhere.

   The important thing here is that markers and extents simply contain
buffer positions in them as integers, and every time text is inserted or
deleted, these positions must be updated.  In order to minimize the
amount of shuffling that needs to be done, the positions in markers and
extents (there's one per marker, two per extent) are stored as Meminds.
This means that they only need to be moved when the text is physically
moved in memory; since the gap structure tries to minimize this, it also
minimizes the number of marker and extent indices that need to be
adjusted.  Look in `insdel.c' for the details of how this works.

   One other important distinction is that markers are "temporary"
while extents are "permanent".  This means that markers disappear as
soon as there are no more pointers to them, and correspondingly, there
is no way to determine what markers are in a buffer if you are just
given the buffer.  Extents remain in a buffer until they are detached
(which could happen as a result of text being deleted) or the buffer is
deleted, and primitives do exist to enumerate the extents in a buffer.


File: internals.info,  Node: Bufbytes and Emchars,  Next: The Buffer Object,  Prev: Markers and Extents,  Up: Buffers and Textual Representation

Bufbytes and Emchars
====================

   Not yet documented.


File: internals.info,  Node: The Buffer Object,  Prev: Bufbytes and Emchars,  Up: Buffers and Textual Representation

The Buffer Object
=================

   Buffers contain fields not directly accessible by the Lisp
programmer.  We describe them here, naming them by the names used in
the C code.  Many are accessible indirectly in Lisp programs via Lisp
primitives.

`name'
     The buffer name is a string that names the buffer.  It is
     guaranteed to be unique.  *Note Buffer Names: (lispref)Buffer
     Names.

`save_modified'
     This field contains the time when the buffer was last saved, as an
     integer.  *Note Buffer Modification: (lispref)Buffer Modification.

`modtime'
     This field contains the modification time of the visited file.  It
     is set when the file is written or read.  Every time the buffer is
     written to the file, this field is compared to the modification
     time of the file.  *Note Buffer Modification: (lispref)Buffer
     Modification.

`auto_save_modified'
     This field contains the time when the buffer was last auto-saved.

`last_window_start'
     This field contains the `window-start' position in the buffer as of
     the last time the buffer was displayed in a window.

`undo_list'
     This field points to the buffer's undo list.  *Note Undo:
     (lispref)Undo.

`syntax_table_v'
     This field contains the syntax table for the buffer.  *Note Syntax
     Tables: (lispref)Syntax Tables.

`downcase_table'
     This field contains the conversion table for converting text to
     lower case.  *Note Case Tables: (lispref)Case Tables.

`upcase_table'
     This field contains the conversion table for converting text to
     upper case.  *Note Case Tables: (lispref)Case Tables.

`case_canon_table'
     This field contains the conversion table for canonicalizing text
     for case-folding search.  *Note Case Tables: (lispref)Case Tables.

`case_eqv_table'
     This field contains the equivalence table for case-folding search.
     *Note Case Tables: (lispref)Case Tables.

`display_table'
     This field contains the buffer's display table, or `nil' if it
     doesn't have one.  *Note Display Tables: (lispref)Display Tables.

`markers'
     This field contains the chain of all markers that currently point
     into the buffer.  Deletion of text in the buffer, and motion of
     the buffer's gap, must check each of these markers and perhaps
     update it.  *Note Markers: (lispref)Markers.

`backed_up'
     This field is a flag that tells whether a backup file has been
     made for the visited file of this buffer.

`mark'
     This field contains the mark for the buffer.  The mark is a marker,
     hence it is also included on the list `markers'.  *Note The Mark:
     (lispref)The Mark.

`mark_active'
     This field is non-`nil' if the buffer's mark is active.

`local_var_alist'
     This field contains the association list describing the variables
     local in this buffer, and their values, with the exception of
     local variables that have special slots in the buffer object.
     (Those slots are omitted from this table.)  *Note Buffer-Local
     Variables: (lispref)Buffer-Local Variables.

`modeline_format'
     This field contains a Lisp object which controls how to display
     the mode line for this buffer.  *Note Modeline Format:
     (lispref)Modeline Format.

`base_buffer'
     This field holds the buffer's base buffer (if it is an indirect
     buffer), or `nil'.


File: internals.info,  Node: MULE Character Sets and Encodings,  Next: The Lisp Reader and Compiler,  Prev: Buffers and Textual Representation,  Up: Top

MULE Character Sets and Encodings
*********************************

   Recall that there are two primary ways that text is represented in
XEmacs.  The "buffer" representation sees the text as a series of bytes
(Bufbytes), with a variable number of bytes used per character.  The
"character" representation sees the text as a series of integers
(Emchars), one per character.  The character representation is a cleaner
representation from a theoretical standpoint, and is thus used in many
cases when lots of manipulations on a string need to be done.  However,
the buffer representation is the standard representation used in both
Lisp strings and buffers, and because of this, it is the "default"
representation that text comes in.  The reason for using this
representation is that it's compact and is compatible with ASCII.

* Menu:

* Character Sets::
* Encodings::
* Internal Mule Encodings::
* CCL::


File: internals.info,  Node: Character Sets,  Next: Encodings,  Prev: MULE Character Sets and Encodings,  Up: MULE Character Sets and Encodings

Character Sets
==============

   A character set (or "charset") is an ordered set of characters.  A
particular character in a charset is indexed using one or more
"position codes", which are non-negative integers.  The number of
position codes needed to identify a particular character in a charset is
called the "dimension" of the charset.  In XEmacs/Mule, all charsets
have dimension 1 or 2, and the size of all charsets (except for a few
special cases) is either 94, 96, 94 by 94, or 96 by 96.  The range of
position codes used to index characters from any of these types of
character sets is as follows:

     Charset type            Position code 1         Position code 2
     ------------------------------------------------------------
     94                      33 - 126                N/A
     96                      32 - 127                N/A
     94x94                   33 - 126                33 - 126
     96x96                   32 - 127                32 - 127

   Note that in the above cases position codes do not start at an
expected value such as 0 or 1.  The reason for this will become clear
later.
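
   The table above can be restated as a small validity check.  This is
only a sketch of the ranges given in the table; the `charset_type' tags
and the function names are hypothetical and do not correspond to
anything in the XEmacs sources.

```c
#include <assert.h>

/* Hypothetical tags for the four regular charset shapes. */
typedef enum { CS_94, CS_96, CS_94x94, CS_96x96 } charset_type;

/* Number of position codes needed to identify a character. */
int charset_dimension(charset_type t)
{
  return (t == CS_94 || t == CS_96) ? 1 : 2;
}

/* Is PC within the position-code range for charsets of type T? */
int position_code_in_range(charset_type t, int pc)
{
  int is94 = (t == CS_94 || t == CS_94x94);
  return is94 ? (pc >= 33 && pc <= 126) : (pc >= 32 && pc <= 127);
}

/* Are the position codes valid for T?  PC2 is ignored when the
   charset has dimension 1. */
int position_codes_valid(charset_type t, int pc1, int pc2)
{
  if (!position_code_in_range(t, pc1))
    return 0;
  return charset_dimension(t) == 1 ? 1 : position_code_in_range(t, pc2);
}
```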

   For example, Latin-1 is a 96-character charset, and JISX0208 (the
Japanese national character set) is a 94x94-character charset.

   [Note that, although the ranges above define the _valid_ position
codes for a charset, some of the slots in a particular charset may in
fact be empty.  This is the case for JISX0208, for example, where (e.g.)
all the slots whose first position code is in the range 118 - 126 are
empty.]

   There are three charsets that do not follow the above rules.  All of
them have one dimension, and have ranges of position codes as follows:

     Charset name            Position code 1
     ------------------------------------
     ASCII                   0 - 127
     Control-1               0 - 31
     Composite               0 - some large number

   (The upper bound of the position code for composite characters has
not yet been determined, but it will probably be at least 16,383.)

   ASCII is the union of two subsidiary character sets: Printing-ASCII
(the printing ASCII character set, consisting of position codes 33 -
126, like for a standard 94-character charset) and Control-ASCII (the
non-printing characters that would appear in a binary file with codes 0
- 32 and 127).

   Control-1 contains the non-printing characters that would appear in a
binary file with codes 128 - 159.

   Composite contains characters that are generated by overstriking one
or more characters from other charsets.

   Note that some characters in ASCII, and all characters in Control-1,
are "control" (non-printing) characters.  These have no printed
representation but instead control some other function of the printing
(e.g. TAB, code 9, moves the current character position to the next tab
stop).  All other characters in all charsets are "graphic" (printing)
characters.

   When a binary file is read in, the bytes in the file are assigned to
character sets as follows:

     Bytes           Character set           Range
     --------------------------------------------------
     0 - 127         ASCII                   0 - 127
     128 - 159       Control-1               0 - 31
     160 - 255       Latin-1                 32 - 127

   This is a bit ad-hoc but gets the job done.
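
   The assignment in the table above amounts to the following mapping,
shown here as a small sketch.  The `binary_charset' tags, the
`charset_pos' structure, and the function name are hypothetical names
chosen for this example.

```c
#include <assert.h>

/* Hypothetical tags for the three charsets used for binary files. */
typedef enum { BIN_ASCII, BIN_CONTROL_1, BIN_LATIN_1 } binary_charset;

typedef struct {
  binary_charset cs;    /* charset the byte is assigned to */
  int pc1;              /* position code within that charset */
} charset_pos;

/* Assign one byte of a binary file to a charset and position code. */
charset_pos classify_binary_byte(unsigned char b)
{
  charset_pos r;
  if (b < 128) {                /* 0 - 127: ASCII, 0 - 127 */
    r.cs = BIN_ASCII;
    r.pc1 = b;
  } else if (b < 160) {         /* 128 - 159: Control-1, 0 - 31 */
    r.cs = BIN_CONTROL_1;
    r.pc1 = b - 128;
  } else {                      /* 160 - 255: Latin-1, 32 - 127 */
    r.cs = BIN_LATIN_1;
    r.pc1 = b - 128;
  }
  return r;
}
```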


File: internals.info,  Node: Encodings,  Next: Internal Mule Encodings,  Prev: Character Sets,  Up: MULE Character Sets and Encodings

Encodings
=========

   An "encoding" is a way of numerically representing characters from
one or more character sets.  If an encoding only encompasses one
character set, then the position codes for the characters in that
character set could be used directly.  This is not possible, however, if
more than one character set is to be used in the encoding.

   For example, the conversion detailed above between bytes in a binary
file and characters is effectively an encoding that encompasses the
three character sets ASCII, Control-1, and Latin-1 in a stream of 8-bit
bytes.

   Thus, an encoding can be viewed as a way of encoding characters from
a specified group of character sets using a stream of bytes, each of
which contains a fixed number of bits (but not necessarily 8, as in the
common usage of "byte").

   Here are descriptions of a couple of common encodings:

* Menu:

* Japanese EUC (Extended Unix Code)::
* JIS7::


File: internals.info,  Node: Japanese EUC (Extended Unix Code),  Next: JIS7,  Prev: Encodings,  Up: Encodings

Japanese EUC (Extended Unix Code)
---------------------------------

   This encompasses the character sets Printing-ASCII,
Japanese-JISX0201-Kana (half-width katakana, the right half of
JISX0201), Japanese-JISX0208, and Japanese-JISX0212.  It uses 8-bit
bytes.

   Note that Printing-ASCII and Japanese-JISX0201-Kana are 94-character
charsets, while Japanese-JISX0208 and Japanese-JISX0212 are
94x94-character charsets.

   The encoding is as follows:

     Character set            Representation (PC=position-code)
     -------------            --------------
     Printing-ASCII           PC1
     Japanese-JISX0201-Kana   0x8E       | PC1 + 0x80
     Japanese-JISX0208        PC1 + 0x80 | PC2 + 0x80
     Japanese-JISX0212        0x8F       | PC1 + 0x80 | PC2 + 0x80
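
   As an illustration, the first three rows of the table above can be
turned into an encoder for a single character.  This is a sketch only:
the `euc_charset' tags and the function `euc_jp_encode' are
hypothetical names, and Japanese-JISX0212 is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tags for three of the charsets in the table above. */
typedef enum {
  EUC_PRINTING_ASCII,
  EUC_JISX0201_KANA,
  EUC_JISX0208
} euc_charset;

/* Encode one character into OUT and return the number of bytes
   written (at most 2).  PC2 is ignored for 94-character charsets. */
size_t euc_jp_encode(euc_charset cs, int pc1, int pc2, unsigned char out[2])
{
  assert(pc1 >= 33 && pc1 <= 126);
  switch (cs) {
  case EUC_PRINTING_ASCII:
    out[0] = (unsigned char) pc1;
    return 1;
  case EUC_JISX0201_KANA:
    out[0] = 0x8E;
    out[1] = (unsigned char) (pc1 + 0x80);
    return 2;
  case EUC_JISX0208:
    assert(pc2 >= 33 && pc2 <= 126);
    out[0] = (unsigned char) (pc1 + 0x80);
    out[1] = (unsigned char) (pc2 + 0x80);
    return 2;
  }
  return 0;
}
```

   Because every byte of a non-ASCII character has its high bit set,
ASCII bytes in the output always stand for themselves.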


File: internals.info,  Node: JIS7,  Prev: Japanese EUC (Extended Unix Code),  Up: Encodings

JIS7
----

   This encompasses the character sets Printing-ASCII,
Japanese-JISX0201-Roman (the left half of JISX0201; this character set
is very similar to Printing-ASCII and is a 94-character charset),
Japanese-JISX0208, and Japanese-JISX0201-Kana.  It uses 7-bit bytes.

   Unlike Japanese EUC, this is a "modal" encoding, which means that
there are multiple states that the encoding can be in, which affect how
the bytes are to be interpreted.  Special sequences of bytes (called
"escape sequences") are used to change states.

   The encoding is as follows:

     Character set              Representation (PC=position-code)
     -------------              --------------
     Printing-ASCII             PC1
     Japanese-JISX0201-Roman    PC1
     Japanese-JISX0201-Kana     PC1
     Japanese-JISX0208          PC1 PC2
     
     
     Escape sequence   ASCII equivalent   Meaning
     ---------------   ----------------   -------
     0x1B 0x28 0x4A    ESC ( J            invoke Japanese-JISX0201-Roman
     0x1B 0x28 0x49    ESC ( I            invoke Japanese-JISX0201-Kana
     0x1B 0x24 0x42    ESC $ B            invoke Japanese-JISX0208
     0x1B 0x28 0x42    ESC ( B            invoke Printing-ASCII

   Initially, Printing-ASCII is invoked.
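
   The modal behavior can be sketched as a small state machine that
recognizes the four escape sequences listed above.  This is a
simplified sketch under stated assumptions: the `jis7_state' and
`jis7_char' types and the function `jis7_decode' are hypothetical, every
non-escape byte is treated as a position code, and any other escape
sequence is reported as an error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical decoder states, one per invocable charset. */
typedef enum { JIS7_ASCII, JIS7_ROMAN, JIS7_KANA, JIS7_X0208 } jis7_state;

typedef struct {
  jis7_state cs;        /* charset in effect when the character was read */
  int pc1, pc2;         /* position codes; pc2 is 0 for 1-byte charsets */
} jis7_char;

/* Decode N bytes from IN into OUT (room for MAX_OUT characters).
   Returns the number of characters decoded, or -1 on malformed input
   or if OUT is too small. */
int jis7_decode(const unsigned char *in, size_t n,
                jis7_char *out, size_t max_out)
{
  jis7_state st = JIS7_ASCII;   /* initially, Printing-ASCII is invoked */
  size_t i = 0, count = 0;

  while (i < n) {
    if (in[i] == 0x1B) {        /* ESC: switch state */
      unsigned char a, b;
      if (i + 2 >= n) return -1;
      a = in[i + 1];
      b = in[i + 2];
      if (a == 0x28 && b == 0x4A)      st = JIS7_ROMAN;
      else if (a == 0x28 && b == 0x49) st = JIS7_KANA;
      else if (a == 0x24 && b == 0x42) st = JIS7_X0208;
      else if (a == 0x28 && b == 0x42) st = JIS7_ASCII;
      else return -1;
      i += 3;
      continue;
    }
    if (count == max_out) return -1;
    out[count].cs = st;
    out[count].pc1 = in[i];
    out[count].pc2 = 0;
    if (st == JIS7_X0208) {     /* two position codes per character */
      if (i + 1 >= n) return -1;
      out[count].pc2 = in[i + 1];
      i += 2;
    } else {
      i += 1;
    }
    count++;
  }
  return (int) count;
}
```

   The same byte value decodes differently depending on the state, so a
decoder that starts in the middle of the stream cannot know how to
interpret what it sees.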


File: internals.info,  Node: Internal Mule Encodings,  Next: CCL,  Prev: Encodings,  Up: MULE Character Sets and Encodings

Internal Mule Encodings
=======================

   In XEmacs/Mule, each character set is assigned a unique number,
called a "leading byte".  This is used in the encodings of a character.
Leading bytes are in the range 0x80 - 0xFF (except for ASCII, which has
a leading byte of 0), although some leading bytes are reserved.

   Charsets whose leading byte is in the range 0x80 - 0x9F are called
"official" and are used for built-in charsets.  Other charsets are
called "private" and have leading bytes in the range 0xA0 - 0xFF; these
are user-defined charsets.

   More specifically:

     Character set           Leading byte
     -------------           ------------
     ASCII                   0
     Composite               0x80
     Dimension-1 Official    0x81 - 0x8D
                               (0x8E is free)
     Control-1               0x8F
     Dimension-2 Official    0x90 - 0x99
                               (0x9A - 0x9D are free;
                                0x9E and 0x9F are reserved)
     Dimension-1 Private     0xA0 - 0xEF
     Dimension-2 Private     0xF0 - 0xFF

   There are two internal encodings for characters in XEmacs/Mule.  One
is called "string encoding" and is an 8-bit encoding that is used for
representing characters in a buffer or string.  It uses 1 to 4 bytes per
character.  The other is called "character encoding" and is a 19-bit
encoding that is used for representing characters individually in a
variable.

   (In the following descriptions, we'll ignore composite characters for
the moment.  We also give a general (structural) overview first,
followed later by the exact details.)

* Menu:

* Internal String Encoding::
* Internal Character Encoding::


File: internals.info,  Node: Internal String Encoding,  Next: Internal Character Encoding,  Prev: Internal Mule Encodings,  Up: Internal Mule Encodings

Internal String Encoding
------------------------

   ASCII characters are encoded using their position code directly.
Other characters are encoded using their leading byte followed by their
position code(s) with the high bit set.  Characters in private character
sets have their leading byte prefixed with a "leading byte prefix",
which is either 0x9E or 0x9F. (No character sets are ever assigned these
leading bytes.) Specifically:

     Character set           Encoding (PC=position-code, LB=leading-byte)
     -------------           --------
     ASCII                   PC1
     Control-1               LB   |  PC1 + 0xA0
     Dimension-1 official    LB   |  PC1 + 0x80
     Dimension-1 private     0x9E |  LB         | PC1 + 0x80
     Dimension-2 official    LB   |  PC1 + 0x80 | PC2 + 0x80
     Dimension-2 private     0x9F |  LB         | PC1 + 0x80 | PC2 + 0x80
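
   The table above can be sketched as an encoder for one character.
The `charset_kind' tags and the function `string_encode' are
hypothetical names used only for this illustration; the caller is
assumed to pass a leading byte that is valid for the given kind.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tags for the rows of the table above. */
typedef enum {
  KIND_ASCII, KIND_CONTROL_1,
  KIND_DIM1_OFFICIAL, KIND_DIM1_PRIVATE,
  KIND_DIM2_OFFICIAL, KIND_DIM2_PRIVATE
} charset_kind;

/* Encode one character into OUT and return the number of bytes
   written (1 to 4).  LB is ignored for ASCII; PC2 is ignored for
   dimension-1 charsets. */
size_t string_encode(charset_kind kind, unsigned char lb,
                     int pc1, int pc2, unsigned char out[4])
{
  size_t n = 0;

  switch (kind) {
  case KIND_ASCII:
    out[n++] = (unsigned char) pc1;
    return n;
  case KIND_CONTROL_1:
    out[n++] = lb;
    out[n++] = (unsigned char) (pc1 + 0xA0);
    return n;
  case KIND_DIM1_PRIVATE:
    out[n++] = 0x9E;            /* leading byte prefix */
    break;
  case KIND_DIM2_PRIVATE:
    out[n++] = 0x9F;            /* leading byte prefix */
    break;
  default:
    break;
  }
  out[n++] = lb;
  out[n++] = (unsigned char) (pc1 + 0x80);
  if (kind == KIND_DIM2_OFFICIAL || kind == KIND_DIM2_PRIVATE)
    out[n++] = (unsigned char) (pc2 + 0x80);
  return n;
}
```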

   The basic characteristic of this encoding is that the first byte of
every character is in the range 0x00 - 0x9F, and the second and
following bytes of every character are in the range 0xA0 - 0xFF.  This
means that it is impossible to get out of sync, or more specifically:

  1. Given any byte position, the beginning of the character it is
     within can be determined in constant time.

  2. Given any byte position at the beginning of a character, the
     beginning of the next character can be determined in constant time.

  3. Given any byte position at the beginning of a character, the
     beginning of the previous character can be determined in constant
     time.

  4. Textual searches can simply treat encoded strings as if they were
     encoded in a one-byte-per-character fashion rather than the actual
     multi-byte encoding.

   None of the standard non-modal encodings meet all of these
conditions.  For example, EUC satisfies only (2) and (3), while
Shift-JIS and Big5 (not yet described) satisfy only (2). (All non-modal
encodings must satisfy (2), in order to be unambiguous.)
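
   Properties (1) through (3) follow directly from the two byte ranges:
a byte below 0xA0 always starts a character.  The following sketch
shows one way to exploit this; the function names are hypothetical.
Since a character occupies at most four bytes, each loop runs at most
three times on well-formed text, which is why the operations take
constant time.

```c
#include <assert.h>
#include <stddef.h>

/* A byte in the range 0x00 - 0x9F is the first byte of a character. */
int is_first_byte(unsigned char b)
{
  return b < 0xA0;
}

/* Property (1): index of the first byte of the character that the
   byte at index I belongs to. */
size_t char_start(const unsigned char *s, size_t i)
{
  while (i > 0 && !is_first_byte(s[i]))
    i--;
  return i;
}

/* Property (2): given I at the start of a character, index of the
   start of the next character (LEN if there is none). */
size_t next_char(const unsigned char *s, size_t len, size_t i)
{
  i++;
  while (i < len && !is_first_byte(s[i]))
    i++;
  return i;
}

/* Property (3): given I > 0 at the start of a character, index of the
   start of the previous character. */
size_t prev_char(const unsigned char *s, size_t i)
{
  assert(i > 0);
  return char_start(s, i - 1);
}
```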


File: internals.info,  Node: Internal Character Encoding,  Prev: Internal String Encoding,  Up: Internal Mule Encodings

Internal Character Encoding
---------------------------

   One 19-bit word represents a single character.  The word is
separated into three fields:

     Bit number:     18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
                     <------------> <------------------> <------------------>
     Field:                1                  2                    3

   Note that fields 2 and 3 hold 7 bits each, while field 1 holds 5
bits.

     Character set           Field 1         Field 2         Field 3
     -------------           -------         -------         -------
     ASCII                      0               0              PC1
        range:                                                   (00 - 7F)
     Control-1                  0               1              PC1
        range:                                                   (00 - 1F)
     Dimension-1 official       0            LB - 0x80         PC1
        range:                                    (01 - 0D)      (20 - 7F)
     Dimension-1 private        0            LB - 0x80         PC1
        range:                                    (20 - 6F)      (20 - 7F)
     Dimension-2 official    LB - 0x8F         PC1             PC2
        range:                    (01 - 0A)       (20 - 7F)      (20 - 7F)
     Dimension-2 private     LB - 0xE1         PC1             PC2
        range:                    (0F - 1E)       (20 - 7F)      (20 - 7F)
     Composite                 0x1F             ?               ?

   Note that character codes 0 - 255 are the same as the "binary
encoding" described above.
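
   The field layout and the table above can be sketched with a few
shifts and masks.  The typedef `Emchar' is the one named in this manual
(an `int'); the function names are hypothetical and are not the macros
used in the XEmacs sources.  Composite characters are not handled.

```c
#include <assert.h>

typedef int Emchar;

/* Pack the three fields: 5 bits, 7 bits, 7 bits. */
Emchar emchar_pack(int f1, int f2, int f3)
{
  assert(f1 >= 0 && f1 <= 0x1F);
  assert(f2 >= 0 && f2 <= 0x7F);
  assert(f3 >= 0 && f3 <= 0x7F);
  return (f1 << 14) | (f2 << 7) | f3;
}

int emchar_field1(Emchar c) { return (c >> 14) & 0x1F; }
int emchar_field2(Emchar c) { return (c >> 7) & 0x7F; }
int emchar_field3(Emchar c) { return c & 0x7F; }

/* Dimension-1 charsets, official or private: field 2 is LB - 0x80. */
Emchar emchar_dim1(int lb, int pc1)
{
  return emchar_pack(0, lb - 0x80, pc1);
}

/* Dimension-2 official charsets: field 1 is LB - 0x8F. */
Emchar emchar_dim2_official(int lb, int pc1, int pc2)
{
  return emchar_pack(lb - 0x8F, pc1, pc2);
}

/* Dimension-2 private charsets: field 1 is LB - 0xE1. */
Emchar emchar_dim2_private(int lb, int pc1, int pc2)
{
  return emchar_pack(lb - 0xE1, pc1, pc2);
}
```

   For example, a dimension-1 official charset with leading byte 0x81
and position code 0x69 packs to 0xE9, which lies in the 0 - 255 range
mentioned above.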


File: internals.info,  Node: CCL,  Prev: Internal Mule Encodings,  Up: MULE Character Sets and Encodings

CCL
===

     CCL PROGRAM SYNTAX:
          CCL_PROGRAM := (CCL_MAIN_BLOCK
                          [ CCL_EOF_BLOCK ])
     
          CCL_MAIN_BLOCK := CCL_BLOCK
          CCL_EOF_BLOCK := CCL_BLOCK
     
          CCL_BLOCK := STATEMENT | (STATEMENT [STATEMENT ...])
          STATEMENT :=
                  SET | IF | BRANCH | LOOP | REPEAT | BREAK
                  | READ | WRITE
     
          SET := (REG = EXPRESSION) | (REG SELF_OP EXPRESSION)
                 | INT-OR-CHAR
     
          EXPRESSION := ARG | (EXPRESSION OP ARG)
     
          IF := (if EXPRESSION CCL_BLOCK CCL_BLOCK)
          BRANCH := (branch EXPRESSION CCL_BLOCK [CCL_BLOCK ...])
          LOOP := (loop STATEMENT [STATEMENT ...])
          BREAK := (break)
          REPEAT := (repeat)
                  | (write-repeat [REG | INT-OR-CHAR | string])
                  | (write-read-repeat REG [INT-OR-CHAR | string | ARRAY]?)
          READ := (read REG) | (read REG REG)
                  | (read-if REG ARITH_OP ARG CCL_BLOCK CCL_BLOCK)
                  | (read-branch REG CCL_BLOCK [CCL_BLOCK ...])
          WRITE := (write REG) | (write REG REG)
                  | (write INT-OR-CHAR) | (write STRING) | STRING
                  | (write REG ARRAY)
          END := (end)
     
          REG := r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7
          ARG := REG | INT-OR-CHAR
          OP :=   + | - | * | / | % | & | '|' | ^ | << | >> | <8 | >8 | //
                  | < | > | == | <= | >= | !=
          SELF_OP :=
                  += | -= | *= | /= | %= | &= | '|=' | ^= | <<= | >>=
          ARRAY := '[' INT-OR-CHAR ... ']'
          INT-OR-CHAR := INT | CHAR
     
     MACHINE CODE:
     
     The machine code consists of a vector of 32-bit words.
     The first such word specifies the start of the EOF section of the code;
     this is the code executed to perform any final cleanup
     (e.g. designating back to ASCII and left-to-right mode) after all
     other encoded/decoded data has been written out.  The EOF section is
     not used for charset CCL programs.
     
     REGISTER: 0..7  -- referred to as RRR or rrr
     
     OPERATOR BIT FIELD (27-bit): XXXXXXXXXXXXXXX RRR TTTTT
             TTTTT (5-bit): operator type
             RRR (3-bit): register number
             XXXXXXXXXXXXXXX (15-bit):
                     CCCCCCCCCCCCCCC: constant or address
                     000000000000rrr: register number
     
     AAAAA:  00000 +
             00001 -
             00010 *
             00011 /
             00100 %
             00101 &
             00110 |
             00111 ^
     
             01000 <<
             01001 >>
             01010 <8
             01011 >8
             01100 //
             01101 not used
             01110 not used
             01111 not used
     
             10000 <
             10001 >
             10010 ==
             10011 <=
             10100 >=
             10101 !=
     
     OPERATORS:      TTTTT RRR XX..
     
     SetCS:          00000 RRR C...C      RRR = C...C
     SetCL:          00001 RRR .....      RRR = c...c
                     c.............c
     SetR:           00010 RRR ..rrr      RRR = rrr
     SetA:           00011 RRR ..rrr      RRR = array[rrr]
                     C.............C      size of array = C...C
                     c.............c      contents = c...c
     
     Jump:           00100 000 c...c      jump to c...c
     JumpCond:       00101 RRR c...c      if (!RRR) jump to c...c
     WriteJump:      00110 RRR c...c      Write1 RRR, jump to c...c
     WriteReadJump:  00111 RRR c...c      Write1, Read1 RRR, jump to c...c
     WriteCJump:     01000 000 c...c      Write1 C...C, jump to c...c
                     C...C
     WriteCReadJump: 01001 RRR c...c      Write1 C...C, Read1 RRR,
                     C.............C      and jump to c...c
     WriteSJump:     01010 000 c...c      WriteS, jump to c...c
                     C.............C
                     S.............S
                     ...
     WriteSReadJump: 01011 RRR c...c      WriteS, Read1 RRR, jump to c...c
                     C.............C
                     S.............S
                     ...
     WriteAReadJump: 01100 RRR c...c      WriteA, Read1 RRR, jump to c...c
                     C.............C      size of array = C...C
                     c.............c      contents = c...c
                     ...
     Branch:         01101 RRR C...C      if (RRR >= 0 && RRR < C..)
                     c.............c      branch to (RRR+1)th address
     Read1:          01110 RRR ...        read 1-byte to RRR
     Read2:          01111 RRR ..rrr      read 2-byte to RRR and rrr
     ReadBranch:     10000 RRR C...C      Read1 and Branch
                     c.............c
                     ...
     Write1:         10001 RRR .....      write 1-byte RRR
     Write2:         10010 RRR ..rrr      write 2-byte RRR and rrr
     WriteC:         10011 000 .....      write 1-char C...C
                     C.............C
     WriteS:         10100 000 .....      write C..-byte of string
                     C.............C
                     S.............S
                     ...
     WriteA:         10101 RRR .....      write array[RRR]
                     C.............C      size of array = C...C
                     c.............c      contents = c...c
                     ...
     End:            10110 000 .....      terminate the execution
     
     SetSelfCS:      10111 RRR C...C      RRR AAAAA= C...C
                     ..........AAAAA
     SetSelfCL:      11000 RRR .....      RRR AAAAA= c...c
                     c.............c
                     ..........AAAAA
     SetSelfR:       11001 RRR ..rrr      RRR AAAAA= rrr
                     ..........AAAAA
     SetExprCL:      11010 RRR ..rrr      RRR = rrr AAAAA c...c
                     c.............c
                     ..........AAAAA
     SetExprR:       11011 RRR ..rrr      RRR = rrr AAAAA Rrr
                     ............Rrr
                     ..........AAAAA
     JumpCondC:      11100 RRR c...c      if !(RRR AAAAA C..) jump to c...c
                     C.............C
                     ..........AAAAA
     JumpCondR:      11101 RRR c...c      if !(RRR AAAAA rrr) jump to c...c
                     ............rrr
                     ..........AAAAA
     ReadJumpCondC:  11110 RRR c...c      Read1 and JumpCondC
                     C.............C
                     ..........AAAAA
     ReadJumpCondR:  11111 RRR c...c      Read1 and JumpCondR
                     ............rrr
                     ..........AAAAA
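
   As an illustration of the operator bit field described above, the
following C sketch splits a word into its three fields and packs them
again.  It assumes that the layout `XXXXXXXXXXXXXXX RRR TTTTT' is written
most-significant bit first, so that TTTTT occupies the low five bits.
The names `ccl_fields', `ccl_decode' and `ccl_encode' are hypothetical
and are not taken from the XEmacs sources.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of one CCL operator word, assuming the layout
   "XXXXXXXXXXXXXXX RRR TTTTT" is written most-significant bit first. */
struct ccl_fields {
    unsigned type;  /* TTTTT: 5-bit operator type */
    unsigned reg;   /* RRR:   3-bit register number */
    unsigned arg;   /* X...X: 15-bit constant, address, or register */
};

/* Hypothetical helper: split a word into its three fields. */
struct ccl_fields ccl_decode(uint32_t word)
{
    struct ccl_fields f;
    f.type = word & 0x1F;           /* low 5 bits */
    f.reg  = (word >> 5) & 0x07;    /* next 3 bits */
    f.arg  = (word >> 8) & 0x7FFF;  /* next 15 bits */
    return f;
}

/* Hypothetical helper: the inverse of ccl_decode(). */
uint32_t ccl_encode(unsigned type, unsigned reg, unsigned arg)
{
    assert(type < 32 && reg < 8 && arg < 32768);
    return ((uint32_t) arg << 8) | ((uint32_t) reg << 5) | type;
}
```

   Under this assumption, `ccl_encode (0x05, 3, 42)' would build a
JumpCond word that tests register 3 and jumps to address 42.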


File: internals.info,  Node: The Lisp Reader and Compiler,  Next: Lstreams,  Prev: MULE Character Sets and Encodings,  Up: Top

The Lisp Reader and Compiler
****************************

   Not yet documented.


File: internals.info,  Node: Lstreams,  Next: Consoles; Devices; Frames; Windows,  Prev: The Lisp Reader and Compiler,  Up: Top

Lstreams
********

   An "lstream" is an internal Lisp object that provides a generic
buffering stream implementation.  Conceptually, you send data to the
stream or read data from the stream, not caring what's on the other end
of the stream.  The other end could be another stream, a file
descriptor, a stdio stream, a fixed block of memory, a reallocating
block of memory, etc.  The main purpose of the stream is to provide a
standard interface and to do buffering.  Macros are defined to read or
write characters, so the calling functions do not have to worry about
blocking data together in order to achieve efficiency.

* Menu:

* Creating an Lstream::         Creating an lstream object.
* Lstream Types::               Different sorts of things that are streamed.
* Lstream Functions::           Functions for working with lstreams.
* Lstream Methods::             Creating new lstream types.


File: internals.info,  Node: Creating an Lstream,  Next: Lstream Types,  Prev: Lstreams,  Up: Lstreams

Creating an Lstream
===================

   Lstreams come in different types, depending on what is being
interfaced to.  Although the primitive for creating new lstreams is
`Lstream_new()', generally you do not call this directly.  Instead, you
call some type-specific creation function, which creates the lstream
and initializes it as appropriate for the particular type.

   All lstream creation functions take a MODE argument, specifying what
mode the lstream should be opened in.  This controls whether the
lstream is for input or output, and optionally whether data should be
blocked up in units of MULE characters.  Note that some types of
lstreams can only be opened for input; others only for output; and
others can be opened either way.  #### Richard Mlynarik thinks that
there should be a strict separation between input and output streams,
and he's probably right.

   MODE is a string, one of

`"r"'
     Open for reading.

`"w"'
     Open for writing.

`"rc"'
     Open for reading, but "read" never returns partial MULE characters.

`"wc"'
     Open for writing, but never writes partial MULE characters.
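
   The four mode strings above could be mapped to flags roughly as
follows.  This is only a sketch: the flag names and the function
`parse_mode' are hypothetical and do not come from `lstream.h'.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flag bits; the names are not taken from lstream.h. */
enum {
    MODE_READ  = 1 << 0,
    MODE_WRITE = 1 << 1,
    MODE_CHARS = 1 << 2   /* never split a MULE character */
};

/* Return the flags for MODE, or -1 if MODE is not one of
   "r", "w", "rc", "wc". */
int parse_mode(const char *mode)
{
    assert(mode != NULL);
    if (strcmp(mode, "r") == 0)  return MODE_READ;
    if (strcmp(mode, "w") == 0)  return MODE_WRITE;
    if (strcmp(mode, "rc") == 0) return MODE_READ | MODE_CHARS;
    if (strcmp(mode, "wc") == 0) return MODE_WRITE | MODE_CHARS;
    return -1;
}
```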


File: internals.info,  Node: Lstream Types,  Next: Lstream Functions,  Prev: Creating an Lstream,  Up: Lstreams

Lstream Types
=============

stdio

filedesc

lisp-string

fixed-buffer

resizing-buffer

dynarr

lisp-buffer

print

decoding

encoding

File: internals.info,  Node: Lstream Functions,  Next: Lstream Methods,  Prev: Lstream Types,  Up: Lstreams

Lstream Functions
=================

 - Function: Lstream * Lstream_new (Lstream_implementation *IMP, const
          char *MODE)
     Allocate and return a new Lstream.  This function is not really
     meant to be called directly; rather, each stream type should
     provide its own stream creation function, which creates the stream
     and does any other necessary creation stuff (e.g. opening a file).

 - Function: void Lstream_set_buffering (Lstream *LSTR,
          Lstream_buffering BUFFERING, int BUFFERING_SIZE)
     Change the buffering of a stream.  See `lstream.h'.  By default the
     buffering is `LSTREAM_BLOCK_BUFFERED'.

 - Function: int Lstream_flush (Lstream *LSTR)
     Flush out any pending unwritten data in the stream.  Clear any
     buffered input data.  Returns 0 on success, -1 on error.

 - Macro: int Lstream_putc (Lstream *STREAM, int C)
     Write out one byte to the stream.  This is a macro and so it is
     very efficient.  The C argument is only evaluated once but the
     STREAM argument is evaluated more than once.  Returns 0 on
     success, -1 on error.

 - Macro: int Lstream_getc (Lstream *STREAM)
     Read one byte from the stream.  This is a macro and so it is very
     efficient.  The STREAM argument is evaluated more than once.
     Return value is -1 for EOF or error.

 - Macro: void Lstream_ungetc (Lstream *STREAM, int C)
     Push one byte back onto the input queue.  This will be the next
     byte read from the stream.  Any number of bytes can be pushed back
     and will be read in the reverse order they were pushed back--most
     recent first. (This is necessary for consistency--if several bytes
     have already been pushed back and the caller then reads one byte
     and pushes it back again, that byte must be the first to be read
     again.) This is a macro
     and so it is very efficient.  The C argument is only evaluated
     once but the STREAM argument is evaluated more than once.

 - Function: int Lstream_fputc (Lstream *STREAM, int C)
 - Function: int Lstream_fgetc (Lstream *STREAM)
 - Function: void Lstream_fungetc (Lstream *STREAM, int C)
     Function equivalents of the above macros.
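
     The "most recent first" behavior of pushed-back bytes can be
     modeled with a small stack.  The sketch below is a toy model, not
     the real Lstream structure; `toy_stream', `toy_getc' and
     `toy_ungetc' are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_UNGET 16

/* Toy input stream: a byte array plus a stack of pushed-back bytes.
   This is an illustration only, not the real Lstream structure. */
struct toy_stream {
    const unsigned char *data;
    size_t len, pos;
    unsigned char unget[TOY_MAX_UNGET];
    size_t n_unget;
};

/* Push one byte back; it becomes the next byte read. */
void toy_ungetc(struct toy_stream *s, int c)
{
    assert(s->n_unget < TOY_MAX_UNGET);
    s->unget[s->n_unget++] = (unsigned char) c;
}

/* Read one byte, most recently pushed-back byte first; -1 at EOF. */
int toy_getc(struct toy_stream *s)
{
    if (s->n_unget > 0)
        return s->unget[--s->n_unget];
    if (s->pos < s->len)
        return s->data[s->pos++];
    return -1;
}
```

     Pushing back `a' and then `b' makes the next two reads return `b'
     and then `a', after which reading continues with the underlying
     data.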

 - Function: ssize_t Lstream_read (Lstream *STREAM, void *DATA, size_t
          SIZE)
     Read up to SIZE bytes from the stream into DATA.  Return the number
     of bytes read.  0 means EOF. -1 means an error occurred and no bytes
     were read.

 - Function: ssize_t Lstream_write (Lstream *STREAM, void *DATA, size_t
          SIZE)
     Write SIZE bytes of DATA to the stream.  Return the number of
     bytes written.  -1 means an error occurred and no bytes were
     written.

 - Function: void Lstream_unread (Lstream *STREAM, void *DATA, size_t
          SIZE)
     Push back SIZE bytes of DATA onto the input queue.  The next call
     to `Lstream_read()' with the same size will read the same bytes
     back.  Note that this will be the case even if there is other
     pending unread data.
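
     One way to get this behavior with a pushback stack is to push the
     block in reverse, so that its first byte ends up on top.  The
     following is a sketch under that assumption; `toy_unread',
     `toy_unread_bytes' and `toy_read_unread' are hypothetical names,
     and the real implementation may differ.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_UNREAD 64

/* Toy pushback store, an illustration only.  Bytes are popped from
   the top of the stack. */
struct toy_unread {
    unsigned char stack[TOY_MAX_UNREAD];
    size_t n;
};

/* Push SIZE bytes so that they are read back in their original order:
   the last byte goes on first, leaving the first byte on top. */
void toy_unread_bytes(struct toy_unread *u, const void *data, size_t size)
{
    const unsigned char *p = data;
    assert(size <= TOY_MAX_UNREAD - u->n);
    while (size > 0)
        u->stack[u->n++] = p[--size];
}

/* Pop up to SIZE pushed-back bytes into DATA; return how many. */
size_t toy_read_unread(struct toy_unread *u, void *data, size_t size)
{
    unsigned char *p = data;
    size_t got = 0;
    while (got < size && u->n > 0)
        p[got++] = u->stack[--u->n];
    return got;
}
```

     With this scheme, bytes that were already pending stay underneath
     the newly pushed block and are read only after it.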

 - Function: int Lstream_close (Lstream *STREAM)
     Close the stream.  All data will be flushed out.

 - Function: void Lstream_reopen (Lstream *STREAM)
     Reopen a closed stream.  This enables I/O on it again.  This is not
     meant to be called except from a wrapper routine that reinitializes
     variables and such--the close routine may well have freed some
     necessary storage structures, for example.

 - Function: void Lstream_rewind (Lstream *STREAM)
     Rewind the stream to the beginning.


File: internals.info,  Node: Lstream Methods,  Prev: Lstream Functions,  Up: Lstreams

Lstream Methods
===============

 - Lstream Method: ssize_t reader (Lstream *STREAM, unsigned char
          *DATA, size_t SIZE)
     Read some data from the stream's end and store it into DATA, which
     can hold SIZE bytes.  Return the number of bytes read.  A return
     value of 0 means no bytes can be read at this time.  This may be
     because of an EOF, or because there is a granularity greater than
     one byte that the stream imposes on the returned data, and SIZE is
     less than this granularity. (This will happen frequently for
     streams that need to return whole characters, because
     `Lstream_read()' calls the reader function repeatedly until it has
     the number of bytes it wants or until 0 is returned.)  The lstream
     functions do not treat a 0 return as EOF or do anything special;
     however, the calling function will interpret any 0 it gets back as
     EOF.  Such a spurious EOF normally occurs only if the caller calls
     `Lstream_read()' with a very small size.

     This function can be `NULL' if the stream is output-only.
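
     The interaction between a reader method with a granularity and
     the generic read loop can be sketched as follows.  This is a toy
     model: the names are hypothetical, the "characters" are simply
     two-byte units, and `long' stands in for `ssize_t' to keep the
     example within standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { TOY_UNIT = 2, TOY_MAX_PER_CALL = 4 };

/* Toy source of 2-byte "characters"; an illustration only. */
struct toy_src {
    const unsigned char *data;
    size_t len, pos;
};

/* Reader method sketch: returns whole units only, so it returns 0
   both at EOF and when SIZE is smaller than one unit. */
long toy_reader(struct toy_src *s, unsigned char *data, size_t size)
{
    size_t n = s->len - s->pos;          /* bytes still available */
    if (n > size)
        n = size;
    if (n > TOY_MAX_PER_CALL)
        n = TOY_MAX_PER_CALL;
    n -= n % TOY_UNIT;                   /* whole units only */
    memcpy(data, s->data + s->pos, n);
    s->pos += n;
    return (long) n;
}

/* Generic read loop sketch: call the reader until SIZE bytes have
   been collected or the reader returns 0. */
long toy_read(struct toy_src *s, unsigned char *data, size_t size)
{
    size_t total = 0;
    assert(data != NULL);
    while (total < size) {
        long got = toy_reader(s, data + total, size - total);
        if (got <= 0)
            break;
        total += (size_t) got;
    }
    return (long) total;
}
```

     Asking this toy stream for 5 bytes yields 4, because the one
     remaining byte of space cannot hold a whole unit.  Asking for a
     single byte yields 0, which a caller would read as EOF even though
     data remains.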

 - Lstream Method: ssize_t writer (Lstream *STREAM, const unsigned char
          *DATA, size_t SIZE)
     Send some data to the stream's end.  Data to be sent is in DATA
     and is SIZE bytes.  Return the number of bytes sent.  This
     function can send and return fewer bytes than is passed in; in that
     case, the function will just be called again until there is no
     data left or 0 is returned.  A return value of 0 means that no
     more data can be currently stored, but there is no error; the data
     will be squirreled away until the writer can accept data. (This is
     useful, e.g., if you're dealing with a non-blocking file
     descriptor and are getting `EWOULDBLOCK' errors.)  This function
     can be `NULL' if the stream is input-only.
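
     The calling convention for the writer can be sketched in the same
     way.  In this toy model the sink accepts at most three bytes per
     call and has room for eight; all names are hypothetical and `long'
     stands in for `ssize_t'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { TOY_CHUNK = 3 };

/* Toy sink that accepts at most TOY_CHUNK bytes per call and has
   limited room; an illustration only. */
struct toy_sink {
    unsigned char buf[8];
    size_t used;
};

/* Writer method sketch: may accept fewer bytes than offered; 0 means
   "cannot accept data right now", which is not an error. */
long toy_writer(struct toy_sink *k, const unsigned char *data, size_t size)
{
    size_t room = sizeof k->buf - k->used;
    size_t n = size < room ? size : room;
    if (n > TOY_CHUNK)
        n = TOY_CHUNK;
    memcpy(k->buf + k->used, data, n);
    k->used += n;
    return (long) n;
}

/* Write loop sketch: call the writer until nothing is left or it
   returns 0.  Returns the number of bytes accepted; the caller keeps
   the remainder pending for a later attempt. */
size_t toy_write_all(struct toy_sink *k, const unsigned char *data,
                     size_t size)
{
    size_t done = 0;
    assert(data != NULL);
    while (done < size) {
        long sent = toy_writer(k, data + done, size - done);
        if (sent <= 0)
            break;
        done += (size_t) sent;
    }
    return done;
}
```

     Offering ten bytes to an empty sink results in calls that accept
     3, 3, 2 and then 0 bytes, so eight bytes are written and two
     remain pending.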

 - Lstream Method: int rewinder (Lstream *STREAM)
     Rewind the stream.  If this is `NULL', the stream is not seekable.

 - Lstream Method: int seekable_p (Lstream *STREAM)
     Indicate whether this stream is seekable--i.e. it can be rewound.
     This method is ignored if the stream does not have a rewind
     method.  If this method is not present, the result is determined
     by whether a rewind method is present.
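
     The rule for combining the two methods can be written out as a
     small decision function.  The method table below is a hypothetical
     stand-in containing only the two methods discussed here.

```c
#include <assert.h>
#include <stddef.h>

struct toy_lstream;  /* opaque in this sketch */

/* Hypothetical method table with only the two relevant methods. */
struct toy_methods {
    int (*rewinder)(struct toy_lstream *);
    int (*seekable_p)(struct toy_lstream *);
};

/* A stream is seekable only if it has a rewinder; when it also has a
   seekable_p method, that method has the final say. */
int toy_is_seekable(const struct toy_methods *m, struct toy_lstream *s)
{
    assert(m != NULL);
    if (m->rewinder == NULL)
        return 0;               /* seekable_p is ignored */
    if (m->seekable_p == NULL)
        return 1;               /* decided by the rewinder's presence */
    return m->seekable_p(s) != 0;
}

/* Example methods used to exercise the function above. */
int toy_rewind_ok(struct toy_lstream *s) { (void) s; return 0; }
int toy_always_seekable(struct toy_lstream *s) { (void) s; return 1; }
int toy_never_seekable(struct toy_lstream *s) { (void) s; return 0; }
```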

 - Lstream Method: int flusher (Lstream *STREAM)
     Perform any additional operations necessary to flush the data in
     this stream.

 - Lstream Method: int pseudo_closer (Lstream *STREAM)

 - Lstream Method: int closer (Lstream *STREAM)
     Perform any additional operations necessary to close this stream
     down.  May be `NULL'.  This function is called when
     `Lstream_close()' is called or when the stream is
     garbage-collected.  When this function is called, all pending data
     in the stream will already have been written out.

 - Lstream Method: Lisp_Object marker (Lisp_Object LSTREAM, void
          (*MARKFUN) (Lisp_Object))
     Mark this object for garbage collection.  Same semantics as a
     standard `Lisp_Object' marker.  This function can be `NULL'.


File: internals.info,  Node: Consoles; Devices; Frames; Windows,  Next: The Redisplay Mechanism,  Prev: Lstreams,  Up: Top

Consoles; Devices; Frames; Windows
**********************************

* Menu:

* Introduction to Consoles; Devices; Frames; Windows::
* Point::
* Window Hierarchy::
* The Window Object::


File: internals.info,  Node: Introduction to Consoles; Devices; Frames; Windows,  Next: Point,  Prev: Consoles; Devices; Frames; Windows,  Up: Consoles; Devices; Frames; Windows

Introduction to Consoles; Devices; Frames; Windows
==================================================

   A window-system window that you see on the screen is called a
"frame" in Emacs terminology.  Each frame is subdivided into one or
more non-overlapping panes, called (confusingly) "windows".  Each
window displays the text of a buffer in it. (See above on Buffers.) Note
that buffers and windows are independent entities: Two or more windows
can be displaying the same buffer (potentially in different locations),
and a buffer can be displayed in no windows.

   A single display screen that contains one or more frames is called a
"display".  Under most circumstances, there is only one display.
However, more than one display can exist, for example if you have a
"multi-headed" console, i.e. one with a single keyboard but multiple
displays. (Typically in such a situation, the various displays act like
one large display, in that the mouse is only in one of them at a time,
and moving the mouse off of one moves it into another.) In some cases,
the different displays will have different characteristics, e.g. one
color and one mono.

   XEmacs can display frames on multiple displays.  It can even deal
simultaneously with frames on multiple keyboards (called "consoles" in
XEmacs terminology).  Here is one case where this might be useful: You
are using XEmacs on your workstation at work, and leave it running.
Then you go home and dial in on a TTY line, and you can use the
already-running XEmacs process to display another frame on your local
TTY.

   Thus, there is a hierarchy console -> display -> frame -> window.
There is a separate Lisp object type for each of these four concepts.
Furthermore, there is logically a "selected console", "selected
display", "selected frame", and "selected window".  Each of these
objects is distinguished in various ways, such as being the default
object for various functions that act on objects of that type.  Note
that every containing object remembers the "selected" object among the
objects that it contains: e.g. not only is there a selected window, but
every frame remembers the last window in it that was selected, and
changing the selected frame causes the remembered window within it to
become the selected window.  Similar relationships hold between
consoles and displays, and between displays and frames.  (The Lisp
object that represents a display is called a "device".)
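
   The way a container remembers its selected member can be sketched
for the frame and window levels as follows.  The structures are toy
stand-ins with hypothetical names; the real Lisp objects carry far more
state, and the same pattern would repeat one level up in each case.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the real objects; all names are hypothetical. */
struct toy_window { int id; };
struct toy_frame  { struct toy_window *selected_window; };

/* The globally selected frame and window. */
struct toy_selection {
    struct toy_frame  *frame;
    struct toy_window *window;
};

/* Select window W of frame F; F remembers W as its selected window. */
void toy_select_window(struct toy_selection *sel,
                       struct toy_frame *f, struct toy_window *w)
{
    assert(sel != NULL && f != NULL && w != NULL);
    f->selected_window = w;
    sel->frame = f;
    sel->window = w;
}

/* Select frame F; the window F remembers becomes the selected window. */
void toy_select_frame(struct toy_selection *sel, struct toy_frame *f)
{
    assert(sel != NULL && f != NULL && f->selected_window != NULL);
    sel->frame = f;
    sel->window = f->selected_window;
}
```

   Selecting a second frame and then returning to the first restores
whichever window was last selected in the first frame.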


File: internals.info,  Node: Point,  Next: Window Hierarchy,  Prev: Introduction to Consoles; Devices; Frames; Windows,  Up: Consoles; Devices; Frames; Windows

Point
=====

   Recall that every buffer has a current insertion position, called
"point".  Now, two or more windows may be displaying the same buffer,
and the text cursor in the two windows (i.e. `point') can be in two
different places.  You may ask, how can that be, since each buffer has
only one value of `point'?  The answer is that each window also has a
value of `point' that is squirreled away in it.  There is only one
selected window, and the value of `point' in its buffer belongs to that
window.  When the selected window is changed from one window to
another displaying the same buffer, the old value of `point' is stored
into the old window's "point" and the value of `point' from the new
window is retrieved and made the value of `point' in the buffer.  This
means that `window-point' for the selected window is potentially
inaccurate, and if you want to retrieve the correct value of `point'
for a window, you must special-case on the selected window and retrieve
the buffer's point instead.  This is related to why
`save-window-excursion' does not save the selected window's value of
`point'.
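
   The exchange of point values described above can be modeled in a
few lines of C.  This is a simplified sketch with hypothetical names
(`toy_buffer', `toy_window', `saved_point'); it is not the actual
XEmacs code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; the field and type names are hypothetical. */
struct toy_buffer { long point; };
struct toy_window { struct toy_buffer *buffer; long saved_point; };

/* Change the selected window from OLD_WIN to NEW_WIN: store the
   buffer's point in the old window, then load the new window's
   saved point into its buffer. */
void toy_select_window(struct toy_window *old_win,
                       struct toy_window *new_win)
{
    assert(new_win != NULL && new_win->buffer != NULL);
    if (old_win != NULL)
        old_win->saved_point = old_win->buffer->point;
    new_win->buffer->point = new_win->saved_point;
}

/* Correct point for window W: the selected window is special-cased
   because its saved value may be stale. */
long toy_window_point(const struct toy_window *w,
                      const struct toy_window *selected)
{
    return w == selected ? w->buffer->point : w->saved_point;
}
```

   While a window is selected, its own saved value is not updated as
point moves, which is why the lookup function consults the buffer for
the selected window.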