XEmacs 21.2.46 "Urania".

diff --git a/info/internals.info-2 b/info/internals.info-2

index 8d78cb4..805e7ef 100644
--- a/info/internals.info-2
+++ b/info/internals.info-2
@@ -1,9 +1,9 @@
-This is Info file ../../info/internals.info, produced by Makeinfo
-version 1.68 from the input file internals.texi.
+This is ../info/internals.info, produced by makeinfo version 4.0 from
+internals/internals.texi.
  
  INFO-DIR-SECTION XEmacs Editor
  START-INFO-DIR-ENTRY
-* Internals: (internals).      XEmacs Internals Manual.
+* Internals: (internals).       XEmacs Internals Manual.
  END-INFO-DIR-ENTRY
  
     Copyright (C) 1992 - 1996 Ben Wing.  Copyright (C) 1996, 1997 Sun
@@ -71,18 +71,18 @@ internal operations.)
       like integers in many ways but are logically considered text
       rather than numbers and have a different read syntax. (the read
       syntax for a char contains the char itself or some textual
-     encoding of it - for example, a Japanese Kanji character might be
-     encoded as `^[$(B#&^[(B' using the ISO-2022 encoding standard -
-     rather than the numerical representation of the char; this way, if
-     the mapping between chars and integers changes, which is quite
-     possible for Kanji characters and other extended characters, the
-     same character will still be created.  Note that some primitives
-     confuse chars and integers.  The worst culprit is `eq', which
-     makes a special exception and considers a char to be `eq' to its
-     integer equivalent, even though in no other case are objects of two
-     different types `eq'.  The reason for this monstrosity is
-     compatibility with existing code; the separation of char from
-     integer came fairly recently.)
+     encoding of it--for example, a Japanese Kanji character might be
+     encoded as `^[$(B#&^[(B' using the ISO-2022 encoding
+     standard--rather than the numerical representation of the char;
+     this way, if the mapping between chars and integers changes, which
+     is quite possible for Kanji characters and other extended
+     characters, the same character will still be created.  Note that
+     some primitives confuse chars and integers.  The worst culprit is
+     `eq', which makes a special exception and considers a char to be
+     `eq' to its integer equivalent, even though in no other case are
+     objects of two different types `eq'.  The reason for this
+     monstrosity is compatibility with existing code; the separation of
+     char from integer came fairly recently.)
  
  `symbol'
       An object that contains Lisp objects and is referred to by name;
@@ -286,7 +286,7 @@ but detached extents (extents not referring to any text, as happens to
  some extents when the text they are referring to is deleted) are
  temporary.  Note that some permanent objects, such as faces and coding
  systems, cannot be deleted.  Note also that windows are unique in that
-they can be *undeleted* after having previously been deleted. (This
+they can be _undeleted_ after having previously been deleted. (This
  happens as a result of restoring a window configuration.)
  
     Note that many types of objects have a "read syntax", i.e. a way of
@@ -405,24 +405,16 @@ representation stuffs a pointer together with a tag, as follows:
        [ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ]
        [ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ]
       
-        <---> ^ <------------------------------------------------------>
-         tag  |       a pointer to a structure, or an integer
-              |
-            mark bit
-
-   The tag describes the type of the Lisp object.  For integers and
-chars, the lower 28 bits contain the value of the integer or char; for
-all others, the lower 28 bits contain a pointer.  The mark bit is used
-during garbage-collection, and is always 0 when garbage collection is
-not happening. (The way that garbage collection works, basically, is
-that it loops over all places where Lisp objects could exist - this
-includes all global variables in C that contain Lisp objects [including
-`Vobarray', the C equivalent of `obarray'; through this, all Lisp
-variables will get marked], plus various other places - and recursively
-scans through the Lisp objects, marking each object it finds by setting
-the mark bit.  Then it goes through the lists of all objects allocated,
-freeing the ones that are not marked and turning off the mark bit of
-the ones that are marked.)
+        <---------------------------------------------------------> <->
+                 a pointer to a structure, or an integer            tag
+
+   A tag of 00 is used for all pointer object types, a tag of 10 is used
+for characters, and the other two tags 01 and 11 are joined together to
+form the integer object type.  This representation gives us 31 bit
+integers and 30 bit characters, while pointers are represented directly
+without any bit masking or shifting.  This representation, though,
+assumes that pointers to structs are always aligned to multiples of 4,
+so the lower 2 bits are always zero.
  
     Lisp objects use the typedef `Lisp_Object', but the actual C type
  used for the Lisp object can vary.  It can be either a simple type
@@ -433,99 +425,27 @@ because it ensures that the compiler will actually use a machine word
  to represent the object (some compilers will use more general and less
  efficient code for unions and structs even if they can fit in a machine
  word).  The union type, however, has the advantage of stricter type
-checking (if you accidentally pass an integer where a Lisp object is
-desired, you get a compile error), and it makes it easier to decode
-Lisp objects when debugging.  The choice of which type to use is
+checking.  If you accidentally pass an integer where a Lisp object is
+desired, you get a compile error.  The choice of which type to use is
  determined by the preprocessor constant `USE_UNION_TYPE' which is
  defined via the `--use-union-type' option to `configure'.
  
-   Note that there are only eight types that the tag can represent, but
-many more actual types than this.  This is handled by having one of the
-tag types specify a meta-type called a "record"; for all such objects,
-the first four bytes of the pointed-to structure indicate what the
-actual type is.
-
-   Note also that having 28 bits for pointers and integers restricts a
-lot of things to 256 megabytes of memory. (Basically, enough pointers
-and indices and whatnot get stuffed into Lisp objects that the total
-amount of memory used by XEmacs can't grow above 256 megabytes.  In
-older versions of XEmacs and GNU Emacs, the tag was 5 bits wide,
-allowing for 32 types, which was more than the actual number of types
-that existed at the time, and no "record" type was necessary.  However,
-this limited the editor to 64 megabytes total, which some users who
-edited large files might conceivably exceed.)
-
-   Also, note that there is an implicit assumption here that all
-pointers are low enough that the top bits are all zero and can just be
-chopped off.  On standard machines that allocate memory from the bottom
-up (and give each process its own address space), this works fine.  Some
-machines, however, put the data space somewhere else in memory (e.g.
-beginning at 0x80000000).  Those machines cope by defining
-`DATA_SEG_BITS' in the corresponding `m/' or `s/' file to the proper
-mask.  Then, pointers retrieved from Lisp objects are automatically
-OR'ed with this value prior to being used.
-
-   A corollary of the previous paragraph is that *(pointers to)
-stack-allocated structures cannot be put into Lisp objects*.  The stack
-is generally located near the top of memory; if you put such a pointer
-into a Lisp object, it will get its top bits chopped off, and you will
-lose.
-
-   Actually, there's an alternative representation of a `Lisp_Object',
-invented by Kyle Jones, that is used when the `--use-minimal-tagbits'
-option to `configure' is used.  In this case the 2 lower bits are used
-for the tag bits.  This representation assumes that pointers to structs
-are always aligned to multiples of 4, so the lower 2 bits are always
-zero.
-
-      [ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ]
-      [ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ]
-     
-        <---------------------------------------------------------> <->
-                 a pointer to a structure, or an integer            tag
-
-   A tag of 00 is used for all pointer object types, a tag of 10 is used
-for characters, and the other two tags 01 and 11 are joined together to
-form the integer object type.  The markbit is moved to part of the
-structure being pointed at (integers and chars do not need to be marked,
-since no memory is allocated).  This representation has these
-advantages:
-
-  1. 31 bits can be used for Lisp Integers.
-
-  2. *Any* pointer can be represented directly, and no bit masking
-     operations are necessary.
-
-   The disadvantages are:
-
-  1. An extra level of indirection is needed when accessing the object
-     types that were not record types.  So checking whether a Lisp
-     object is a cons cell becomes a slower operation.
-
-  2. Mark bits can no longer be stored directly in Lisp objects, so
-     another place for them must be found.  This means that a cons cell
-     requires more memory than merely room for 2 lisp objects, leading
-     to extra memory use.
-
-   Various macros are used to construct Lisp objects and extract the
-components.  Macros of the form `XINT()', `XCHAR()', `XSTRING()',
-`XSYMBOL()', etc. mask out the pointer/integer field and cast it to the
-appropriate type.  All of the macros that construct pointers will `OR'
-with `DATA_SEG_BITS' if necessary.  `XINT()' needs to be a bit tricky
-so that negative numbers are properly sign-extended: Usually it does
-this by shifting the number four bits to the left and then four bits to
-the right.  This assumes that the right-shift operator does an
-arithmetic shift (i.e. it leaves the most-significant bit as-is rather
-than shifting in a zero, so that it mimics a divide-by-two even for
-negative numbers).  Not all machines/compilers do this, and on the ones
-that don't, a more complicated definition is selected by defining
-`EXPLICIT_SIGN_EXTEND'.
-
-   Note that when `ERROR_CHECK_TYPECHECK' is defined, the extractor
-macros become more complicated - they check the tag bits and/or the
-type field in the first four bytes of a record type to ensure that the
+   Various macros are used to convert between Lisp_Objects and the
+corresponding C type.  Macros of the form `XINT()', `XCHAR()',
+`XSTRING()', `XSYMBOL()', etc. do any required bit shifting and/or masking
+and cast the result to the appropriate type.  `XINT()' needs to be a bit tricky
+so that negative numbers are properly sign-extended.  Since integers
+are stored left-shifted, if the right-shift operator does an arithmetic
+shift (i.e. it leaves the most-significant bit as-is rather than
+shifting in a zero, so that it mimics a divide-by-two even for negative
+numbers) the shift to remove the tag bit is enough.  This is the case
+on all the systems we support.
+
+   Note that when `ERROR_CHECK_TYPECHECK' is defined, the converter
+macros become more complicated--they check the tag bits and/or the type
+field in the first four bytes of a record type to ensure that the
  object is really of the correct type.  This is great for catching places
-where an incorrect type is being dereferenced - this typically results
+where an incorrect type is being dereferenced--this typically results
  in a pointer being dereferenced as the wrong type of structure, with
  unpredictable (and sometimes not easily traceable) results.
  
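The hunk above describes a two-bit tag scheme: 00 for pointers, 10 for characters, and 01/11 for integers. It also notes that `XINT()' relies on an arithmetic right shift. The following is a minimal stand-alone sketch of that encoding, not XEmacs's `lisp.h'. All `sketch_*' names are hypothetical, `Lisp_Object' is assumed to be a plain `intptr_t' (the non-union representation), and the `ERROR_CHECK_TYPECHECK' variants are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the non-union representation: one machine word. */
typedef intptr_t Lisp_Object;

/* Low-bit tags: 00 = pointer, 10 = char, x1 (01 or 11) = integer. */
enum { SKETCH_TAG_POINTER = 0, SKETCH_TAG_CHAR = 2 };

/* Integers: shift left one bit and set the low bit, leaving 31 value
   bits on a 32-bit machine. */
Lisp_Object sketch_make_int (intptr_t n)
{
  return (Lisp_Object) (((uintptr_t) n << 1) | 1u);
}

int sketch_intp (Lisp_Object obj)
{
  return ((uintptr_t) obj & 1u) == 1u;
}

/* Removing the tag is one right shift.  For negative values this relies
   on >> being an arithmetic shift (implementation-defined in C, but what
   GCC and Clang do), which sign-extends the result. */
intptr_t sketch_xint (Lisp_Object obj)
{
  return obj >> 1;
}

/* Characters: shift left two bits and use tag 10. */
Lisp_Object sketch_make_char (uint32_t c)
{
  return (Lisp_Object) (((uintptr_t) c << 2) | SKETCH_TAG_CHAR);
}

int sketch_charp (Lisp_Object obj)
{
  return ((uintptr_t) obj & 3u) == SKETCH_TAG_CHAR;
}

uint32_t sketch_xchar (Lisp_Object obj)
{
  return (uint32_t) ((uintptr_t) obj >> 2);
}

/* Pointers are stored unchanged, which only works if the two low bits
   are already zero, i.e. the object is aligned to a multiple of 4. */
Lisp_Object sketch_wrap_pointer (void *p)
{
  assert (((uintptr_t) p & 3u) == 0);
  return (Lisp_Object) (uintptr_t) p;
}

int sketch_pointerp (Lisp_Object obj)
{
  return ((uintptr_t) obj & 3u) == SKETCH_TAG_POINTER;
}

void *sketch_xpointer (Lisp_Object obj)
{
  return (void *) (uintptr_t) obj;
}
```

Under this sketch, `sketch_xint (sketch_make_int (-5))' yields -5 again. The stored word is -9, and an arithmetic shift of -9 by one bit gives -5.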
@@ -533,22 +453,24 @@ unpredictable (and sometimes not easily traceable) results.
  These macros are of the form `XSETTYPE (LVALUE, RESULT)', i.e. they
  have to be a statement rather than just used in an expression.  The
  reason for this is that standard C doesn't let you "construct" a
-structure (but GCC does).  Granted, this sometimes isn't too convenient;
-for the case of integers, at least, you can use the function
-`make_int()', which constructs and *returns* an integer Lisp object.
-Note that the `XSETTYPE()' macros are also affected by
+structure (but GCC does).  Granted, this sometimes isn't too
+convenient; for the case of integers, at least, you can use the
+function `make_int()', which constructs and _returns_ an integer Lisp
+object.  Note that the `XSETTYPE()' macros are also affected by
  `ERROR_CHECK_TYPECHECK' and make sure that the structure is of the
  right type in the case of record types, where the type is contained in
  the structure.
  
     The C programmer is responsible for *guaranteeing* that a
-Lisp_Object is is the correct type before using the `XTYPE' macros.
-This is especially important in the case of lists.  Use `XCAR' and
-`XCDR' if a Lisp_Object is certainly a cons cell, else use `Fcar()' and
-`Fcdr()'.  Trust other C code, but not Lisp code.  On the other hand,
-if XEmacs has an internal logic error, it's better to crash
-immediately, so sprinkle "unreachable" `abort()'s liberally about the
-source code.
+Lisp_Object is the correct type before using the `XTYPE' macros.  This
+is especially important in the case of lists.  Use `XCAR' and `XCDR' if
+a Lisp_Object is certainly a cons cell, else use `Fcar()' and `Fcdr()'.
+Trust other C code, but not Lisp code.  On the other hand, if XEmacs
+has an internal logic error, it's better to crash immediately, so
+sprinkle `assert()'s and "unreachable" `abort()'s liberally about the
+source code.  Where performance is an issue, use `type_checking_assert',
+`bufpos_checking_assert', and `gc_checking_assert', which do nothing
+unless the corresponding configure error checking flag was specified.
  
  \1f
  File: internals.info,  Node: Rules When Writing New C Code,  Next: A Summary of the Various XEmacs Modules,  Prev: How Lisp Objects Are Represented in C,  Up: Top
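The hunk above recommends `type_checking_assert' and related macros where performance matters. In effect, these are assertions that compile to nothing unless error checking was configured. Below is a minimal sketch of that pattern. A hypothetical `SKETCH_ERROR_CHECK_TYPES' switch stands in for the real configure flag, the other names are also hypothetical, and XEmacs's own definitions may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical switch standing in for a configure error-checking flag. */
#ifdef SKETCH_ERROR_CHECK_TYPES
# define sketch_type_checking_assert(expr) assert (expr)
#else
# define sketch_type_checking_assert(expr) ((void) 0)
#endif

struct sketch_string
{
  long length;
  const char *data;
};

/* An accessor that pays for its sanity check only in error-checking
   builds; in normal builds the check is discarded by the preprocessor. */
long sketch_string_length (const struct sketch_string *s)
{
  sketch_type_checking_assert (s != NULL && s->length >= 0);
  return s->length;
}
```

The disabled form discards its argument unevaluated. For that reason, an expression passed to such a macro must not have side effects that the program depends on.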
@@ -567,7 +489,9 @@ situations, often in code far away from where the actual breakage is.
  
  * General Coding Rules::
  * Writing Lisp Primitives::
+* Writing Good Comments::
  * Adding Global Lisp Variables::
+* Proper Use of Unsigned Types::
  * Coding for Mule::
  * Techniques for XEmacs Developers::
  
@@ -585,21 +509,6 @@ been found by compiling with C++.  The ability to use both C and C++
  tools means that a greater variety of development tools are available to
  the developer.
  
-   Almost every module contains a `syms_of_*()' function and a
-`vars_of_*()' function.  The former declares any Lisp primitives you
-have defined and defines any symbols you will be using.  The latter
-declares any global Lisp variables you have added and initializes global
-C variables in the module.  For each such function, declare it in
-`symsinit.h' and make sure it's called in the appropriate place in
-`emacs.c'.  *Important*: There are stringent requirements on exactly
-what can go into these functions.  See the comment in `emacs.c'.  The
-reason for this is to avoid obscure unwanted interactions during
-initialization.  If you don't follow these rules, you'll be sorry!  If
-you want to do anything that isn't allowed, create a
-`complex_vars_of_*()' function for it.  Doing this is tricky, though:
-You have to make sure your function is called at the right time so that
-all the initialization dependencies work out.
-
     Every module includes `<config.h>' (angle brackets so that
  `--srcdir' works correctly; `config.h' may or may not be in the same
  directory as the C sources) and `lisp.h'.  `config.h' must always be
@@ -607,6 +516,32 @@ included before any other header files (including system header files)
  to ensure that certain tricks played by various `s/' and `m/' files
  work out correctly.
  
+   When including header files, always use angle brackets, not double
+quotes, except when the file to be included is always in the same
+directory as the including file.  If either file is a generated file,
+then that is not likely to be the case.  In order to understand why we
+have this rule, imagine what happens when you do a build in the source
+directory using `./configure' and another build in another directory
+using `../work/configure'.  There will be two different `config.h'
+files.  Which one will be used if you `#include "config.h"'?
+
+   Almost every module contains a `syms_of_*()' function and a
+`vars_of_*()' function.  The former declares any Lisp primitives you
+have defined and defines any symbols you will be using.  The latter
+declares any global Lisp variables you have added and initializes global
+C variables in the module.  *Important*: There are stringent
+requirements on exactly what can go into these functions.  See the
+comment in `emacs.c'.  The reason for this is to avoid obscure unwanted
+interactions during initialization.  If you don't follow these rules,
+you'll be sorry!  If you want to do anything that isn't allowed, create
+a `complex_vars_of_*()' function for it.  Doing this is tricky, though:
+you have to make sure your function is called at the right time so that
+all the initialization dependencies work out.
+
+   Declare each function of these kinds in `symsinit.h'.  Make sure
+it's called in the appropriate place in `emacs.c'.  You never need to
+include `symsinit.h' directly, because it is included by `lisp.h'.
+
     *All global and static variables that are to be modifiable must be
  declared uninitialized.*  This means that you may not use the "declare
  with initializer" form for these variables, such as `int some_variable
@@ -615,8 +550,7 @@ dumping process: If possible, the initialized data segment is re-mapped
  so that it becomes part of the (unmodifiable) code segment in the
  dumped executable.  This allows this memory to be shared among multiple
  running XEmacs processes.  XEmacs is careful to place as much constant
-data as possible into initialized variables (in particular, into what's
-called the "pure space" - see below) during the `temacs' phase.
+data as possible into initialized variables during the `temacs' phase.
  
     *Please note:* This kludge only works on a few systems nowadays, and
  is rapidly becoming irrelevant because most modern operating systems
@@ -645,10 +579,10 @@ them.  This awful kludge has been removed in XEmacs because
     The C source code makes heavy use of C preprocessor macros.  One
  popular macro style is:
  
-     #define FOO(var, value) do {              \
-       Lisp_Object FOO_value = (value);        \
-       ... /* compute using FOO_value */       \
-       (var) = bar;                            \
+     #define FOO(var, value) do {            \
+       Lisp_Object FOO_value = (value);      \
+       ... /* compute using FOO_value */     \
+       (var) = bar;                          \
       } while (0)
  
     The `do {...} while (0)' is a standard trick to allow FOO to have
@@ -660,9 +594,9 @@ copying a supplied argument into a local variable, so that
     Lisp lists are popular data structures in the C code as well as in
  Elisp.  There are two sets of macros that iterate over lists.
  `EXTERNAL_LIST_LOOP_N' should be used when the list has been supplied
-by the user, and cannot be trusted to be acyclic and nil-terminated.  A
-`malformed-list' or `circular-list' error will be generated if the list
-being iterated over is not entirely kosher.  `LIST_LOOP_N', on the
+by the user, and cannot be trusted to be acyclic and `nil'-terminated.
+A `malformed-list' or `circular-list' error will be generated if the
+list being iterated over is not entirely kosher.  `LIST_LOOP_N', on the
  other hand, is faster and less safe, and can be used only on trusted
  lists.
  
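The hunk above says only that `EXTERNAL_LIST_LOOP_N' signals `circular-list' when a user-supplied list is cyclic. Floyd's tortoise-and-hare walk is one standard way to detect a cycle in constant memory, sketched below on a plain C linked list. The manual does not say whether XEmacs's macros use this exact technique. `sketch_cons' and `sketch_list_length' are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cons cell.  Real Lisp lists hold Lisp_Objects, and a cdr
   that is neither a cons nor nil (the `malformed-list' case) has no
   analogue in this pointer-only model. */
struct sketch_cons
{
  int car;
  struct sketch_cons *cdr;
};

enum sketch_list_status { SKETCH_LIST_OK, SKETCH_LIST_CIRCULAR };

/* Count the cells of a list that cannot be trusted to be acyclic.
   The hare advances two cells per step and the tortoise one; if they
   ever land on the same cell, the list is circular. */
enum sketch_list_status
sketch_list_length (const struct sketch_cons *list, size_t *len_out)
{
  const struct sketch_cons *tortoise = list;
  const struct sketch_cons *hare = list;
  size_t len = 0;

  assert (len_out != NULL);
  while (hare != NULL)
    {
      hare = hare->cdr;
      len++;
      if (hare == NULL)
        break;
      hare = hare->cdr;
      len++;
      tortoise = tortoise->cdr;
      if (hare == tortoise)
        return SKETCH_LIST_CIRCULAR;
    }
  *len_out = len;
  return SKETCH_LIST_OK;
}
```

A trusted-list loop in the style of `LIST_LOOP_N' would skip the tortoise entirely. That saves work but never terminates on a cyclic list.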
@@ -673,7 +607,7 @@ macros `EXTERNAL_LIST_LOOP_DELETE_IF' and `LIST_LOOP_DELETE_IF' delete
  elements from a lisp list satisfying some predicate.
  
  \1f
-File: internals.info,  Node: Writing Lisp Primitives,  Next: Adding Global Lisp Variables,  Prev: General Coding Rules,  Up: Rules When Writing New C Code
+File: internals.info,  Node: Writing Lisp Primitives,  Next: Writing Good Comments,  Prev: General Coding Rules,  Up: Rules When Writing New C Code
  
  Writing Lisp Primitives
  =======================
@@ -870,7 +804,7 @@ call the C function.
  
     Defining the C function is not enough to make a Lisp primitive
  available; you must also create the Lisp symbol for the primitive (the
-symbol is "interned"; *note Obarrays::.) and store a suitable subr
+symbol is "interned"; *note Obarrays::) and store a suitable subr
  object in its function cell. (If you don't do this, the primitive won't
  be seen by Lisp code.) The code looks like this:
  
@@ -903,7 +837,70 @@ arguments.  They work by calling `Ffuncall'.
  contains the definitions for important macros and functions.
  
  \1f
-File: internals.info,  Node: Adding Global Lisp Variables,  Next: Coding for Mule,  Prev: Writing Lisp Primitives,  Up: Rules When Writing New C Code
+File: internals.info,  Node: Writing Good Comments,  Next: Adding Global Lisp Variables,  Prev: Writing Lisp Primitives,  Up: Rules When Writing New C Code
+
+Writing Good Comments
+=====================
+
+   Comments are a lifeline for programmers trying to understand tricky
+code.  In general, the less obvious it is what you are doing, the more
+you need a comment, and the more detailed it needs to be.  You should
+always be on guard when you're writing code for stuff that's tricky, and
+should constantly be putting yourself in someone else's shoes and asking
+if that person could figure out without much difficulty what's going
+on. (Assume they are a competent programmer who understands the
+essentials of how the XEmacs code is structured but doesn't know much
+about the module you're working on or any algorithms you're using.) If
+you're not sure whether they would be able to, add a comment.  Always
+err on the side of more comments, rather than less.
+
+   Generally, when making comments, there is no need to attribute them
+with your name or initials.  This especially goes for small,
+easy-to-understand, non-opinionated ones.  Also, comments indicating
+where, when, and by whom a file was changed are _strongly_ discouraged,
+and in general will be removed as they are discovered.  This is exactly
+what `ChangeLogs' are there for.  However, it can occasionally be
+useful to mark exactly where (but not when or by whom) changes are
+made, particularly when making small changes to a file imported from
+elsewhere.  These marks help when later on a newer version of the file
+is imported and the changes need to be merged. (If everything were
+always kept in CVS, there would be no need for this.  But in practice,
+this often doesn't happen, or the CVS repository is later on lost or
+unavailable to the person doing the update.)
+
+   When putting in an explicit opinion in a comment, you should
+_always_ attribute it with your name, and optionally the date.  This
+also goes for long, complex comments explaining in detail the workings
+of something - by putting your name there, you make it possible for
+someone who has questions about how that thing works to determine who
+wrote the comment so they can write to them.  Preferably, use your
+actual name and not your initials, unless your initials are generally
+recognized (e.g. `jwz').  You can use only your first name if it's
+obvious who you are; otherwise, give first and last name.  If you're
+not a regular contributor, you might consider putting your email
+address in - it may be in the ChangeLog, but after a while ChangeLogs
+tend to disappear or get muddled. (E.g. your comment
+may get copied somewhere else or even into another program, and
+tracking down the proper ChangeLog may be very difficult.)
+
+   If you come across an opinion that is not or no longer valid, or you
+come across any comment that no longer applies but you want to keep it
+around, enclose it in `[[ ' and ` ]]' marks and add a comment
+afterwards explaining why the preceding comment is no longer valid.  Put
+your name on this comment, as explained above.
+
+   Just as comments are a lifeline to programmers, incorrect comments
+are death.  If you come across an incorrect comment, *immediately*
+correct it or flag it as incorrect, as described in the previous
+paragraph.  Whenever you work on a section of code, _always_ make sure
+to update any comments to be correct - or, at the very least, flag them
+as incorrect.
+
+   To indicate a "todo" or other problem, use four pound signs - i.e.
+`####'.
+
+\1f
+File: internals.info,  Node: Adding Global Lisp Variables,  Next: Proper Use of Unsigned Types,  Prev: Writing Good Comments,  Up: Rules When Writing New C Code
  
  Adding Global Lisp Variables
  ============================
@@ -956,7 +953,7 @@ variable gets changed.
  
     Whether or not you `DEFVAR_LISP()' a variable, you need to
  initialize it in the `vars_of_*()' function; otherwise it will end up
-as all zeroes, which is the integer 0 (*not* `nil'), and this is
+as all zeroes, which is the integer 0 (_not_ `nil'), and this is
  probably not what you want.  Also, if the variable is not
  `DEFVAR_LISP()'ed, *you must call* `staticpro()' on the C variable in
  the `vars_of_*()' function.  Otherwise, the garbage-collection
@@ -966,7 +963,36 @@ and you will be the one who's unhappy when you can't figure out how
  your variable got overwritten.
  
  \1f
-File: internals.info,  Node: Coding for Mule,  Next: Techniques for XEmacs Developers,  Prev: Adding Global Lisp Variables,  Up: Rules When Writing New C Code
+File: internals.info,  Node: Proper Use of Unsigned Types,  Next: Coding for Mule,  Prev: Adding Global Lisp Variables,  Up: Rules When Writing New C Code
+
+Proper Use of Unsigned Types
+============================
+
+   Avoid using `unsigned int' and `unsigned long' whenever possible.
+Unsigned types are viral - in any arithmetic or comparison involving
+mixed signed and unsigned types, the signed operand is usually converted to
+unsigned, which is almost certainly not what you want.  Many subtle and
+hard-to-find bugs are created by careless use of unsigned types.  In
+general, you should almost _never_ use an unsigned type to hold a
+regular quantity of any sort.  The only exceptions are
+
+  1. When there's a reasonable possibility you will actually need all
+     32 or 64 bits to store the quantity.
+
+  2. When calling existing APIs that require unsigned types.  In this
+     case, you should still do all manipulation using signed types, and
+     do the conversion at the very threshold of the API call.
+
+  3. In existing code that you don't want to modify because you don't
+     maintain it.
+
+  4. In bit-field structures.
+
+   Other reasonable uses of `unsigned int' and `unsigned long' are
+representing non-quantities - e.g. bit-oriented flags and such.
+
+\1f
+File: internals.info,  Node: Coding for Mule,  Next: Techniques for XEmacs Developers,  Prev: Proper Use of Unsigned Types,  Up: Rules When Writing New C Code
  
  Coding for Mule
  ===============
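The `Proper Use of Unsigned Types' node in the hunk above calls unsigned types viral. The reason is C's usual arithmetic conversions: when an `int' meets an `unsigned int' in a comparison or subtraction, the `int' is converted to `unsigned int'. The hypothetical functions below are not part of XEmacs. They show two common symptoms of this, plus the pattern from exception 2 of computing in signed types and converting only at the API boundary.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* The int operand is converted to unsigned int before comparing,
   so a negative value turns into a very large positive one. */
int sketch_mixed_less (int a, unsigned int b)
{
  return a < b;
}

/* Unsigned subtraction wraps: for count == 0 this yields UINT_MAX, not -1.
   For the same reason, "for (i = count - 1; i >= 0; i--)" with an
   unsigned i never terminates. */
unsigned int sketch_unsigned_last_index (unsigned int count)
{
  return count - 1;
}

/* The signed version gives the arithmetically expected answer. */
int sketch_signed_last_index (int count)
{
  return count - 1;
}

/* Exception 2: keep the quantity signed while computing, and convert
   only at the call into an API that takes an unsigned size. */
size_t sketch_to_api_size (long n)
{
  assert (n >= 0);
  return (size_t) n;
}
```

With these definitions, `sketch_mixed_less (-1, 1u)' is 0 (false), because -1 is compared as `UINT_MAX'.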
@@ -1017,27 +1043,32 @@ glance at the declaration can tell the intended use of the variable.
       The data representing the text in a buffer or string is logically
       a set of `Bufbyte's.
  
-     XEmacs does not work with character formats all the time; when
-     reading characters from the outside, it decodes them to an
+     XEmacs does not work with the same character formats all the time;
+     when reading characters from the outside, it decodes them to an
       internal format, and likewise encodes them when writing.
       `Bufbyte' (in fact `unsigned char') is the basic unit of XEmacs
-     internal buffers and strings format.
+     internal buffers and strings format.  A `Bufbyte *' is the type
+     that points at text encoded in the variable-width internal
+     encoding.
  
       One character can correspond to one or more `Bufbyte's.  In the
-     current implementation, an ASCII character is represented by the
-     same `Bufbyte', and extended characters are represented by a
-     sequence of `Bufbyte's.
+     current Mule implementation, an ASCII character is represented by
+     the same `Bufbyte', and other characters are represented by a
+     sequence of two or more `Bufbyte's.
  
-     Without Mule support, a `Bufbyte' is equivalent to an `Emchar'.
+     Without Mule support, there are exactly 256 characters, implicitly
+     Latin-1, and each character is represented using one `Bufbyte', and
+     there is a one-to-one correspondence between `Bufbyte's and
+     `Emchar's.
  
  `Bufpos'
  `Charcount'
       A `Bufpos' represents a character position in a buffer or string.
       A `Charcount' represents a number (count) of characters.
       Logically, subtracting two `Bufpos' values yields a `Charcount'
-     value.  Although all of these are `typedef'ed to `int', we use
-     them in preference to `int' to make it clear what sort of position
-     is being used.
+     value.  Although all of these are `typedef'ed to `EMACS_INT', we
+     use them in preference to `EMACS_INT' to make it clear what sort
+     of position is being used.
  
       `Bufpos' and `Charcount' values are the only ones that are ever
       visible to Lisp.
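One character may occupy several `Bufbyte's, so a `Charcount' and a `Bytecount' for the same stretch of text can differ, and converting between them means walking the text. The sketch below assumes, for illustration only, that a character's byte length is determined by its first byte according to UTF-8's rules. XEmacs's actual Mule internal encoding uses different byte values. The typedefs mirror the names in the hunk above but are local stand-ins, and `sketch_bytecount_to_charcount' is hypothetical.

```c
#include <assert.h>

/* Local stand-ins for the typedefs described above; `long' is assumed
   in place of EMACS_INT. */
typedef unsigned char Bufbyte;
typedef long Bytecount;
typedef long Charcount;

/* Number of bytes in the character whose first byte is b.  Assumption
   for illustration: UTF-8 rules, in which ASCII bytes stand for
   themselves and the first byte of any other character gives its length. */
static int sketch_char_bytes (Bufbyte b)
{
  if (b < 0x80)
    return 1;
  if ((b & 0xE0) == 0xC0)
    return 2;
  if ((b & 0xF0) == 0xE0)
    return 3;
  return 4;
}

/* Convert a Bytecount into a Charcount by stepping over one whole
   character at a time; assumes len bytes of well-formed text. */
Charcount
sketch_bytecount_to_charcount (const Bufbyte *ptr, Bytecount len)
{
  Bytecount i = 0;
  Charcount chars = 0;

  assert (len >= 0);
  while (i < len)
    {
      i += sketch_char_bytes (ptr[i]);
      chars++;
    }
  return chars;
}
```

Under these assumptions, the six bytes of "a", "é" and "€" count as three characters. Pure ASCII text gives equal byte and character counts.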
@@ -1045,9 +1076,9 @@ glance at the declaration can tell the intended use of the variable.
  `Bytind'
  `Bytecount'
       A `Bytind' represents a byte position in a buffer or string.  A
-     `Bytecount' represents the distance between two positions in bytes.
-     The relationship between `Bytind' and `Bytecount' is the same as
-     the relationship between `Bufpos' and `Charcount'.
+     `Bytecount' represents the distance between two positions, in
+     bytes.  The relationship between `Bytind' and `Bytecount' is the
+     same as the relationship between `Bufpos' and `Charcount'.
  
  `Extbyte'
  `Extcount'