X-Git-Url: http://git.chise.org/gitweb/?a=blobdiff_plain;f=man%2Finternals%2Finternals.texi;h=d7df3da90eb2b5fe8c137e6bc804bcbdbc7028fe;hb=HEAD;hp=1314398607f4bb533c02170a2d04514928b22d67;hpb=02f4d2761a98c5cb9d5b423d2361160a5d8c9ee4;p=chise%2Fxemacs-chise.git.1 diff --git a/man/internals/internals.texi b/man/internals/internals.texi index 1314398..d7df3da 100644 --- a/man/internals/internals.texi +++ b/man/internals/internals.texi @@ -12,7 +12,7 @@ Copyright @copyright{} 1992 - 1996 Ben Wing. Copyright @copyright{} 1996, 1997 Sun Microsystems. -Copyright @copyright{} 1994 - 1998 Free Software Foundation. +Copyright @copyright{} 1994 - 1998, 2002, 2003 Free Software Foundation. Copyright @copyright{} 1994, 1995 Board of Trustees, University of Illinois. @@ -117,6 +117,7 @@ This Info file contains v1.4 of the XEmacs Internals Manual, March 2001. * The XEmacs Object System (Abstractly Speaking):: * How Lisp Objects Are Represented in C:: * Rules When Writing New C Code:: +* Regression Testing XEmacs:: * A Summary of the Various XEmacs Modules:: * Allocation of Objects in XEmacs Lisp:: * Dumping:: @@ -166,6 +167,8 @@ Coding for Mule * General Guidelines for Writing Mule-Aware Code:: * An Example of Mule-Aware Code:: +Regression Testing XEmacs + A Summary of the Various XEmacs Modules * Low-Level Modules:: @@ -180,6 +183,7 @@ A Summary of the Various XEmacs Modules * Modules for Interfacing with the Operating System:: * Modules for Interfacing with X Windows:: * Modules for Internationalization:: +* Modules for Regression Testing:: Allocation of Objects in XEmacs Lisp @@ -1837,16 +1841,43 @@ Lisp objects use the typedef @code{Lisp_Object}, but the actual C type used for the Lisp object can vary. It can be either a simple type (@code{long} on the DEC Alpha, @code{int} on other machines) or a structure whose fields are bit fields that line up properly (actually, a -union of structures is used). Generally the simple integral type is -preferable because it ensures that the compiler will actually use a -machine word to represent the object (some compilers will use more -general and less efficient code for unions and structs even if they can -fit in a machine word). The union type, however, has the advantage of -stricter type checking. If you accidentally pass an integer where a Lisp -object is desired, you get a compile error. The choice of which type -to use is determined by the preprocessor constant @code{USE_UNION_TYPE} -which is defined via the @code{--use-union-type} option to -@code{configure}. +union of structures is used). The choice of which type to use is +determined by the preprocessor constant @code{USE_UNION_TYPE} which is +defined via the @code{--use-union-type} option to @code{configure}. + +Generally the simple integral type is preferable because it ensures that +the compiler will actually use a machine word to represent the object +(some compilers will use more general and less efficient code for unions +and structs even if they can fit in a machine word). The union type, +however, has the advantage of stricter @emph{static} type checking. +Places where a @code{Lisp_Object} is mistakenly passed to a routine +expecting an @code{int} (or vice-versa), or a check is written @samp{if +(foo)} (instead of @samp{if (!NILP (foo))}, will be flagged as errors. +None of these lead to the expected results! @code{Qnil} is not +represented as 0 (so @samp{if (foo)} will *ALWAYS* be true for a +@code{Lisp_Object}), and the representation of an integer as a +@code{Lisp_Object} is not just the integer's numeric value, but usually +2x the integer +/- 1.) + +There used to be a claim that the union type simplified debugging. +There may have been a grain of truth to this pre-19.8, when there was no +@samp{lrecord} type and all objects had a separate type appearing in the +tag. Nowadays, however, there is no debugging gain, and in fact +frequent debugging *@emph{loss}*, since many debuggers don't handle +unions very well, and usually there is no way to directly specify a +union from a debugging prompt. + +Furthermore, release builds should *@emph{not}* be done with union type +because (a) you may get less efficiency, with compilers that can't +figure out how to optimize the union into a machine word; (b) even +worse, the union type often triggers miscompilation, especially when +combined with Mule and error-checking. This has been the case at +various times when using GCC and MS VC, at least with @samp{--pdump}. +Therefore, be warned! + +As of 2002 4Q, miscompilation is known to happen with current versions +of @strong{Microsoft VC++} and @strong{GCC in combination with Mule, +pdump, and KKCC} (no error checking). Various macros are used to convert between Lisp_Objects and the corresponding C type. Macros of the form @code{XINT()}, @code{XCHAR()}, @@ -1893,7 +1924,7 @@ performance is an issue, use @code{type_checking_assert}, nothing unless the corresponding configure error checking flag was specified. -@node Rules When Writing New C Code, A Summary of the Various XEmacs Modules, How Lisp Objects Are Represented in C, Top +@node Rules When Writing New C Code, Regression Testing XEmacs, How Lisp Objects Are Represented in C, Top @chapter Rules When Writing New C Code @cindex writing new C code, rules when @cindex C code, rules when writing new @@ -1907,6 +1938,7 @@ get something that appears to work, but which will crash in odd situations, often in code far away from where the actual breakage is. @menu +* A Reader's Guide to XEmacs Coding Conventions:: * General Coding Rules:: * Writing Lisp Primitives:: * Writing Good Comments:: @@ -1916,6 +1948,99 @@ situations, often in code far away from where the actual breakage is. * Techniques for XEmacs Developers:: @end menu +@node A Reader's Guide to XEmacs Coding Conventions +@section A Reader's Guide to XEmacs Coding Conventions +@cindex coding conventions +@cindex reader's guide +@cindex coding rules, naming + +Of course the low-level implementation language of XEmacs is C, but much +of that uses the Lisp engine to do its work. However, because the code +is ``inside'' of the protective containment shell around the ``reactor +core,'' you'll see lots of complex ``plumbing'' needed to do the work +and ``safety mechanisms,'' whose failure results in a meltdown. This +section provides a quick overview (or review) of the various components +of the implementation of Lisp objects. + + Two typographic conventions help to identify C objects that implement +Lisp objects. The first is that capitalized identifiers, especially +beginning with the letters @samp{Q}, @samp{V}, @samp{F}, and @samp{S}, +for C variables and functions, and C macros with beginning with the +letter @samp{X}, are used to implement Lisp. The second is that where +Lisp uses the hyphen @samp{-} in symbol names, the corresponding C +identifiers use the underscore @samp{_}. Of course, since XEmacs Lisp +contains interfaces to many external libraries, those external names +will follow the coding conventions their authors chose, and may overlap +the ``XEmacs name space.'' However these cases are usually pretty +obvious. + + All Lisp objects are handled indirectly. The @code{Lisp_Object} +type is usually a pointer to a structure, except for a very small number +of types with immediate representations (currently characters and +integers). However, these types cannot be directly operated on in C +code, either, so they can also be considered indirect. Types that do +not have an immediate representation always have a C typedef +@code{Lisp_@var{type}} for a corresponding structure. +@c #### mention l(c)records here? + + In older code, it was common practice to pass around pointers to +@code{Lisp_@var{type}}, but this is now deprecated in favor of using +@code{Lisp_Object} for all function arguments and return values that are +Lisp objects. The @code{X@var{type}} macro is used to extract the +pointer and cast it to @code{(Lisp_@var{type} *)} for the desired type. + + @strong{Convention}: macros whose names begin with @samp{X} operate on +@code{Lisp_Object}s and do no type-checking. Many such macros are type +extractors, but others implement Lisp operations in C (@emph{e.g.}, +@code{XCAR} implements the Lisp @code{car} function). These are unsafe, +and must only be used where types of all data have already been checked. +Such macros are only applied to @code{Lisp_Object}s. In internal +implementations where the pointer has already been converted, the +structure is operated on directly using the C @code{->} member access +operator. + + The @code{@var{type}P}, @code{CHECK_@var{type}}, and +@code{CONCHECK_@var{type}} macros are used to test types. The first +returns a Boolean value, and the latter signal errors. (The +@samp{CONCHECK} variety allows execution to be CONtinued under some +circumstances, thus the name.) Functions which expect to be passed user +data invariably call @samp{CHECK} macros on arguments. + + There are many types of specialized Lisp objects implemented in C, but +the most pervasive type is the @dfn{symbol}. Symbols are used as +identifiers, variables, and functions. + + @strong{Convention}: Global variables whose names begin with @samp{Q} +are constants whose value is a symbol. The name of the variable should +be derived from the name of the symbol using the same rules as for Lisp +primitives. Such variables allow the C code to check whether a +particular @code{Lisp_Object} is equal to a given symbol. Symbols are +Lisp objects, so these variables may be passed to Lisp primitives. (An +alternative to the use of @samp{Q...} variables is to call the +@code{intern} function at initialization in the +@code{vars_of_@var{module}} function, which is hardly less efficient.) + + @strong{Convention}: Global variables whose names begin with @samp{V} +are variables that contain Lisp objects. The convention here is that +all global variables of type @code{Lisp_Object} begin with @samp{V}, and +no others do (not even integer and boolean variables that have Lisp +equivalents). Most of the time, these variables have equivalents in +Lisp, which are defined via the @samp{DEFVAR} family of macros, but some +don't. Since the variable's value is a @code{Lisp_Object}, it can be +passed to Lisp primitives. + + The implementation of Lisp primitives is more complex. +@strong{Convention}: Global variables with names beginning with @samp{S} +contain a structure that allows the Lisp engine to identify and call a C +function. In modern versions of XEmacs, these identifiers are almost +always completely hidden in the @code{DEFUN} and @code{SUBR} macros, but +you will encounter them if you look at very old versions of XEmacs or at +GNU Emacs. @strong{Convention}: Functions with names beginning with +@samp{F} implement Lisp primitives. Of course all their arguments and +their return values must be Lisp_Objects. (This is hidden in the +@code{DEFUN} macro.) + + @node General Coding Rules @section General Coding Rules @cindex coding rules, general @@ -2375,10 +2500,15 @@ XEmacs crash!].) @code{defsymbol()} are no problem, but some linkers will complain about multiply-defined symbols. The most insidious aspect of this is that often the link will succeed anyway, but then the resulting executable -will sometimes crash in obscure ways during certain operations! To -avoid this problem, declare any symbols with common names (such as +will sometimes crash in obscure ways during certain operations! + +To avoid this problem, declare any symbols with common names (such as @code{text}) that are not obviously associated with this particular -module in the module @file{general.c}. +module in the file @file{general-slots.h}. The ``-slots'' suffix +indicates that this is a file that is included multiple times in +@file{general.c}. Redefinition of preprocessor macros allows the +effects to be different in each context, so this is actually more +convenient and less error-prone than doing it in your module. Global variables whose names begin with @samp{V} are variables that contain Lisp objects. The convention here is that all global variables @@ -2957,7 +3087,14 @@ commands: @code{quantify-start-recording-data}, If you want to make XEmacs faster, target your favorite slow benchmark, run a profiler like Quantify, @code{gprof}, or @code{tcov}, and figure -out where the cycles are going. Specific projects: +out where the cycles are going. In many cases you can localize the +problem (because a particular new feature or even a single patch +elicited it). Don't hesitate to use brute force techniques like a +global counter incremented at strategic places, especially in +combination with other performance indications (@emph{e.g.}, degree of +buffer fragmentation into extents). + +Specific projects: @itemize @bullet @item @@ -2970,8 +3107,16 @@ developed module system. @item Speed up redisplay. @item -Speed up syntax highlighting. Maybe moving some of the syntax -highlighting capabilities into C would make a difference. +Speed up syntax highlighting. It was suggested that ``maybe moving some +of the syntax highlighting capabilities into C would make a +difference.'' Wrong idea, I think. When processing one 400kB file a +particular low-level routine was being called 40 @emph{million} times +simply for @emph{one} call to @code{newline-and-indent}. Syntax +highlighting needs to be rewritten to use a reliable, fast parser, then +to trust the pre-parsed structure, and only do re-highlighting locally +to a text change. Modern machines are fast enough to implement such +parsers in Lisp; but no machine will ever be fast enough to deal with +quadratic (or worse) algorithms! @item Implement tail recursion in Emacs Lisp (hard!). @end itemize @@ -3103,21 +3248,24 @@ All @file{.c} files should @code{#include } first. Almost all @file{.c} files should @code{#include "lisp.h"} second. @item -Generated header files should be included using the @code{#include <...>} syntax, -not the @code{#include "..."} syntax. The generated headers are: +Generated header files should be included using the @samp{#include <...>} +syntax, not the @samp{#include "..."} syntax. The generated headers are: @file{config.h sheap-adjust.h paths.h Emacs.ad.h} -The basic rule is that you should assume builds using @code{--srcdir} -and the @code{#include <...>} syntax needs to be used when the +The basic rule is that you should assume builds using @samp{--srcdir} +and the @samp{#include <...>} syntax needs to be used when the to-be-included generated file is in a potentially different directory -@emph{at compile time}. The non-obvious C rule is that @code{#include "..."} -means to search for the included file in the same directory as the -including file, @emph{not} in the current directory. +@emph{at compile time}. The non-obvious C rule is that +@samp{#include "..."} means to search for the included file in the same +directory as the including file, @emph{not} in the current directory. +Normally this is not a problem but when building with @samp{--srcdir}, +@file{make} will search the @samp{VPATH} for you, while the C compiler +knows nothing about it. @item -Header files should @emph{not} include @code{} and -@code{"lisp.h"}. It is the responsibility of the @file{.c} files that +Header files should @emph{not} include @samp{} and +@samp{"lisp.h"}. It is the responsibility of the @file{.c} files that use it to do so. @end itemize @@ -3150,7 +3298,117 @@ add a DEFINE_LRECORD_IMPLEMENTATION call to @file{@var{foo}.c} add an INIT_LRECORD_IMPLEMENTATION call to @code{syms_of_@var{foo}.c} @end enumerate -@node A Summary of the Various XEmacs Modules, Allocation of Objects in XEmacs Lisp, Rules When Writing New C Code, Top + +@node Regression Testing XEmacs, A Summary of the Various XEmacs Modules, Rules When Writing New C Code, Top +@chapter Regression Testing XEmacs +@cindex testing, regression + +The source directory @file{tests/automated} contains XEmacs' automated +test suite. The usual way of running all the tests is running +@code{make check} from the top-level source directory. + +The test suite is unfinished and it's still lacking some essential +features. It is nevertheless recommended that you run the tests to +confirm that XEmacs behaves correctly. + +If you want to run a specific test case, you can do it from the +command-line like this: + +@example +$ xemacs -batch -l test-harness.elc -f batch-test-emacs TEST-FILE +@end example + +If something goes wrong, you can run the test suite interactively by +loading @file{test-harness.el} into a running XEmacs and typing +@kbd{M-x test-emacs-test-file RET RET}. You will see a log of +passed and failed tests, which should allow you to investigate the +source of the error and ultimately fix the bug. + +Adding a new test file is trivial: just create a new file here and it +will be run. There is no need to byte-compile any of the files in +this directory---the test-harness will take care of any necessary +byte-compilation. + +Look at the existing test cases for the examples of coding test cases. +It all boils down to your imagination and judicious use of the macros +@code{Assert}, @code{Check-Error}, @code{Check-Error-Message}, and +@code{Check-Message}. + +Here's a simple example checking case-sensitive and case-insensitive +comparisons from @file{case-tests.el}. + +@example +(with-temp-buffer + (insert "Test Buffer") + (let ((case-fold-search t)) + (goto-char (point-min)) + (Assert (eq (search-forward "test buffer" nil t) 12)) + (goto-char (point-min)) + (Assert (eq (search-forward "Test buffer" nil t) 12)) + (goto-char (point-min)) + (Assert (eq (search-forward "Test Buffer" nil t) 12)) + + (setq case-fold-search nil) + (goto-char (point-min)) + (Assert (not (search-forward "test buffer" nil t))) + (goto-char (point-min)) + (Assert (not (search-forward "Test buffer" nil t))) + (goto-char (point-min)) + (Assert (eq (search-forward "Test Buffer" nil t) 12)))) +@end example + +This example could be inserted in a file in @file{tests/automated}, and +it would be a complete test, automatically executed when you run +@kbd{make check} after building XEmacs. More complex tests may require +substantial temporary scaffolding to create the environment that elicits +the bugs, but the top-level Makefile and @file{test-harness.el} handle +the running and collection of results from the @code{Assert}, +@code{Check-Error}, @code{Check-Error-Message}, and @code{Check-Message} +macros. + +In general, you should avoid using functionality from packages in your +tests, because you can't be sure that everyone will have the required +package. However, if you've got a test that works, by all means add it. +Simply wrap the test in an appropriate test, add a notice that the test +was skipped, and update the @code{skipped-test-reasons} hashtable. +Here's an example from @file{syntax-tests.el}: + +@example +;; Test forward-comment at buffer boundaries +(with-temp-buffer + + ;; try to use exactly what you need: featurep, boundp, fboundp + (if (not (fboundp 'c-mode)) + + ;; We should provide a standard function for this boilerplate, + ;; probably called `Skip-Test' -- check for that API with C-h f + (let* ((reason "c-mode unavailable") + (count (gethash reason skipped-test-reasons))) + (puthash reason (if (null count) 1 (1+ count)) + skipped-test-reasons) + (Print-Skip "comment and parse-partial-sexp tests" reason)) + + ;; and here's the test code + (c-mode) + (insert "// comment\n") + (forward-comment -2) + (Assert (eq (point) (point-min))) + (let ((point (point))) + (insert "/* comment */") + (goto-char point) + (forward-comment 2) + (Assert (eq (point) (point-max))) + (parse-partial-sexp point (point-max))))) +@end example + +@code{Skip-Test} is intended for use with features that are normally +present in typical configurations. For truly optional features, or +tests that apply to one of several alternative implementations (eg, to +GTK widgets, but not Athena, Motif, MS Windows, or Carbon), simply +silently omit the test. + + +@node A Summary of the Various XEmacs Modules, Allocation of Objects in XEmacs Lisp, Regression Testing XEmacs, Top @chapter A Summary of the Various XEmacs Modules @cindex modules, a summary of the various XEmacs @@ -3169,6 +3427,7 @@ add an INIT_LRECORD_IMPLEMENTATION call to @code{syms_of_@var{foo}.c} * Modules for Interfacing with the Operating System:: * Modules for Interfacing with X Windows:: * Modules for Internationalization:: +* Modules for Regression Testing:: @end menu @node Low-Level Modules @@ -4015,6 +4274,11 @@ highlighting different syntactic constructs of a source file in different colors, for easy reading. The C support is provided so that this is fast. +As of 21.4.10, bugs introduced at the very end of the 21.2 series in the +``syntax properties'' code were fixed, and highlighting is acceptably +quick again. However, presumably more improvements are possible, and +the places to look are probably here, in the defun-traversing code, and +in @file{syntax.c}, in the comment-traversing code. @example @@ -4286,6 +4550,58 @@ for example, to find the matching parenthesis in a command such as @code{forward-sexp}, and by @file{font-lock.c} to locate quoted strings, comments, etc. +@c #### Break this out into a separate node somewhere! +Syntax codes are implemented as bitfields in an int. Bits 0-6 contain +the syntax code itself, bit 7 is a special prefix flag used for Lisp, +and bits 16-23 contain comment syntax flags. From the Lisp programmer's +point of view, there are 11 flags: 2 styles X 2 characters X @{start, +end@} flags for two-character comment delimiters, 2 style flags for +one-character comment delimiters, and the prefix flag. + +Internally, however, the characters used in multi-character delimiters +will have non-comment-character syntax classes (@emph{e.g.}, the +@samp{/} in C's @samp{/*} comment-start delimiter has ``punctuation'' +(here meaning ``operator-like'') class in C modes). Thus in a mixed +comment style, such as C++'s @samp{//} to end of line, is represented by +giving @samp{/} the ``punctuation'' class and the ``style b first +character of start sequence'' and ``style b second character of start +sequence'' flags. The fact that class is @emph{not} punctuation allows +the syntax scanner to recognize that this is a multi-character +delimiter. The @samp{newline} character is given (single-character) +``comment-end'' @emph{class} and the ``style b first character of end +sequence'' @emph{flag}. The ``comment-end'' class allows the scanner to +determine that no second character is needed to terminate the comment. + +There used to be a syntax class @samp{Sextword}. A character of +@samp{Sextword} class is a word-constituent but a word boundary may +exist between two such characters. Ken'ichi HANDA +explains the purpose of the Sextword syntax category: + +@quotation +Japanese words are not separated by spaces, which makes finding word +boundaries very difficult. Theoretically it's impossible without +using natural language processing techniques. But, by defining +pseudo-words as below (much simplified for letting you understand it +easily) for Japanese, we can have a convenient forward-word function +for Japanese. + +@display +A Japanese word is a sequence of characters that consists of +zero or more Kanji characters followed by zero or more +Hiragana characters. +@end display + +Then, the problem is that now we can't say that a sequence of +word-constituents makes up a word. For instance, both Hiragana "A" +and Kanji "KAN" are word-constituents but the sequence of these two +letters can't be a single word. + +So, we introduced Sextword for Japanese letters. +@end quotation + +There seems to have been some controversy about this category, as it has +been removed, readded, and removed again. Currently neither GNU Emacs +(21.3.99) nor XEmacs (21.5.17) seems to use it. @example @@ -4803,13 +5119,13 @@ respectively. This is currently in beta. @file{mule-mcpath.c} provides some functions to allow for pathnames containing extended characters. This code is fragmentary, obsolete, and -completely non-working. Instead, @var{pathname-coding-system} is used +completely non-working. Instead, @code{pathname-coding-system} is used to specify conversions of names of files and directories. The standard C I/O functions like @samp{open()} are wrapped so that conversion occurs automatically. -@file{mule.c} provides a few miscellaneous things that should probably -be elsewhere. +@file{mule.c} contains a few miscellaneous things. It currently seems +to be unused and probably should be removed. @@ -4833,6 +5149,37 @@ Asian-language support, and is not currently used. +@node Modules for Regression Testing +@section Modules for Regression Testing +@cindex modules for regression testing +@cindex regression testing, modules for + +@example +test-harness.el +base64-tests.el +byte-compiler-tests.el +case-tests.el +ccl-tests.el +c-tests.el +database-tests.el +extent-tests.el +hash-table-tests.el +lisp-tests.el +md5-tests.el +mule-tests.el +regexp-tests.el +symbol-tests.el +syntax-tests.el +tag-tests.el +@end example + +@file{test-harness.el} defines the macros @code{Assert}, +@code{Check-Error}, @code{Check-Error-Message}, and +@code{Check-Message}. The other files are test files, testing various +XEmacs modules. + + + @node Allocation of Objects in XEmacs Lisp, Dumping, A Summary of the Various XEmacs Modules, Top @chapter Allocation of Objects in XEmacs Lisp @cindex allocation of objects in XEmacs Lisp @@ -5100,6 +5447,10 @@ weirdly corrupted objects or even in incorrect values in a totally different section of code. @end enumerate +If you don't understand whether to @code{GCPRO} in a particular +instance, ask on the mailing lists. A general hint is that @code{prog1} +is the canonical example. + @cindex garbage collection, conservative @cindex conservative garbage collection Given the extremely error-prone nature of the @code{GCPRO} scheme, and @@ -9234,6 +9585,15 @@ elements, i.e. they hold all the information necessary to produce an image on-screen but the image need not exist at this stage, and multiple screen images can be instantiated from a single glyph. +@c #### find a place for this discussion +@c The decision to make image specifiers a separate type is debatable. +@c In fact, the design decision to create a separate image specifier +@c type, rather than make glyphs themselves be specifiers, is +@c debatable---the other properties of glyphs are rarely used and could +@c conceivably have been incorporated into the glyph's instantiator. +@c The rarely used glyph types (buffer, pointer, icon) could also have +@c been incorporated into the instantiator. + Glyphs are lazily instantiated by calling one of the glyph functions. This usually occurs within redisplay when @code{Fglyph_height} is called. Instantiation causes an image-instance