3 @setfilename emacs-mime
4 @settitle Emacs MIME Manual
10 * Emacs MIME: (emacs-mime). The MIME de/composition library.
15 @setchapternewpage odd
19 This file documents the Emacs MIME interface functionality.
21 Copyright (C) 1998, 1999, 2000, 2001, 2002 Free Software Foundation, Inc.
23 Permission is granted to copy, distribute and/or modify this document
24 under the terms of the GNU Free Documentation License, Version 1.1 or
25 any later version published by the Free Software Foundation; with no
26 Invariant Sections, with the Front-Cover texts being ``A GNU
27 Manual'', and with the Back-Cover Texts as in (a) below. A copy of the
28 license is included in the section entitled ``GNU Free Documentation
29 License'' in the Emacs manual.
31 (a) The FSF's Back-Cover Text is: ``You have freedom to copy and modify
32 this GNU Manual, like GNU software. Copies published by the Free
33 Software Foundation raise funds for GNU development.''
35 This document is part of a collection distributed under the GNU Free
36 Documentation License. If you want to distribute this document
37 separately from the collection, you can do so by adding a copy of the
38 license to the document, as described in section 6 of the license.
44 @title Emacs MIME Manual
46 @author by Lars Magne Ingebrigtsen
49 @vskip 0pt plus 1filll
Copyright @copyright{} 1998, 1999, 2000, 2001, 2002 Free Software
Foundation, Inc.
53 Permission is granted to copy, distribute and/or modify this document
54 under the terms of the GNU Free Documentation License, Version 1.1 or
55 any later version published by the Free Software Foundation; with the
56 Invariant Sections being none, with the Front-Cover texts being ``A GNU
57 Manual'', and with the Back-Cover Texts as in (a) below. A copy of the
58 license is included in the section entitled ``GNU Free Documentation
59 License'' in the Emacs manual.
61 (a) The FSF's Back-Cover Text is: ``You have freedom to copy and modify
62 this GNU Manual, like GNU software. Copies published by the Free
63 Software Foundation raise funds for GNU development.''
65 This document is part of a collection distributed under the GNU Free
66 Documentation License. If you want to distribute this document
67 separately from the collection, you can do so by adding a copy of the
68 license to the document, as described in section 6 of the license.
This manual documents the libraries used to compose and display
@sc{mime} messages.
80 This manual is directed at users who want to modify the behaviour of
81 the MIME encoding/decoding process or want a more detailed picture of
82 how the Emacs MIME library works, and people who want to write
83 functions and commands that manipulate @sc{mime} elements.
85 @sc{mime} is short for @dfn{Multipurpose Internet Mail Extensions}.
86 This standard is documented in a number of RFCs; mainly RFC2045 (Format
87 of Internet Message Bodies), RFC2046 (Media Types), RFC2047 (Message
88 Header Extensions for Non-ASCII Text), RFC2048 (Registration
89 Procedures), RFC2049 (Conformance Criteria and Examples). It is highly
recommended that anyone who intends to write @sc{mime}-compliant software
read at least RFC2045 and RFC2047.
94 * Decoding and Viewing:: A framework for decoding and viewing.
95 * Composing:: MML; a language for describing @sc{mime} parts.
96 * Interface Functions:: An abstraction over the basic functions.
97 * Basic Functions:: Utility and basic parsing functions.
98 * Standards:: A summary of RFCs and working documents used.
99 * Index:: Function and variable index.
103 @node Decoding and Viewing
104 @chapter Decoding and Viewing
106 This chapter deals with decoding and viewing @sc{mime} messages on a
109 The main idea is to first analyze a @sc{mime} article, and then allow
110 other programs to do things based on the list of @dfn{handles} that are
111 returned as a result of this analysis.
114 * Dissection:: Analyzing a @sc{mime} message.
115 * Non-MIME:: Analyzing a non-@sc{mime} message.
116 * Handles:: Handle manipulations.
117 * Display:: Displaying handles.
118 * Display Customization:: Variables that affect display.
119 * New Viewers:: How to write your own viewers.
@code{mm-dissect-buffer} is the function responsible for dissecting
a @sc{mime} article. If given a multipart message, it will recursively
descend the message, following the structure, and return a tree of
@sc{mime} handles that describes the structure of the message.
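
As a rough sketch of what this looks like from Lisp, the following
dissects a small multipart message and collects the media type of each
part. The helper name @code{my-mime-part-types} and the sample message
are made up for illustration, and the sketch assumes the
@code{mm-handle-media-type} and @code{mm-destroy-parts} functions from
@file{mm-decode.el}, which are not described in this section.

@lisp
(require 'mm-decode)

(defun my-mime-part-types (message)
  "Dissect MESSAGE, a string, and return the media types it contains.
Hypothetical helper, for illustration only."
  (with-temp-buffer
    (insert message)
    (let ((handles (mm-dissect-buffer)))
      (unwind-protect
          (if (stringp (car handles))
              ;; Multipart: the car is the media type, the cdr the parts.
              (cons (car handles)
                    (mapcar #'mm-handle-media-type (cdr handles)))
            ;; Single part: HANDLES is itself one handle.
            (list (mm-handle-media-type handles)))
        ;; Release the buffers that hold the undecoded parts.
        (mm-destroy-parts handles)))))

(my-mime-part-types
 (concat "MIME-Version: 1.0\n"
         "Content-Type: multipart/mixed; boundary=\"xyz\"\n\n"
         "--xyz\nContent-Type: text/plain\n\nHello.\n"
         "--xyz\nContent-Type: text/enriched\n\n<bold>Hello.</bold>\n"
         "--xyz--\n"))
@end lisp

For this message the call should return a list that starts with
@samp{multipart/mixed}, followed by @samp{text/plain} and
@samp{text/enriched}.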
Gnus also understands some non-@sc{mime} attachments, such as
postscript, uuencode, binhex, shar, forward, gnatsweb and pgp. Each of
these features can be disabled by adding an item to
@code{mm-uu-configure-list}. For example,
(add-to-list 'mm-uu-configure-list '(pgp-signed . disabled))
163 Non-@sc{mime} forwarded message.
171 PGP signed clear text.
174 @findex pgp-encrypted
175 PGP encrypted clear text.
182 @findex emacs-sources
183 Emacs source code. This item works only in the groups matching
184 @code{mm-uu-emacs-sources-regexp}.
191 A @sc{mime} handle is a list that fully describes a @sc{mime}
194 The following macros can be used to access elements in a handle:
197 @item mm-handle-buffer
198 @findex mm-handle-buffer
199 Return the buffer that holds the contents of the undecoded @sc{mime}
203 @findex mm-handle-type
204 Return the parsed @code{Content-Type} of the part.
206 @item mm-handle-encoding
207 @findex mm-handle-encoding
208 Return the @code{Content-Transfer-Encoding} of the part.
210 @item mm-handle-undisplayer
211 @findex mm-handle-undisplayer
Return the object that can be used to remove the displayed part (if it
has been displayed).
215 @item mm-handle-set-undisplayer
216 @findex mm-handle-set-undisplayer
217 Set the undisplayer object.
219 @item mm-handle-disposition
220 @findex mm-handle-disposition
221 Return the parsed @code{Content-Disposition} of the part.
@item mm-handle-description
@findex mm-handle-description
225 Return the description of the part.
227 @item mm-get-content-id
228 Returns the handle(s) referred to by @code{Content-ID}.
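
To make the accessors a little more concrete, here is a small sketch
that summarizes a single-part handle as a property list. The function
name @code{my-mime-describe-handle} and the property names are
hypothetical; the sketch assumes the handle came from
@code{mm-dissect-buffer} and uses @code{mail-content-type-get} to pick
the charset out of the parsed @code{Content-Type}.

@lisp
(require 'mm-decode)

(defun my-mime-describe-handle (handle)
  "Return a property list summarizing HANDLE, a single-part handle.
Hypothetical helper, for illustration only."
  (list :type (car (mm-handle-type handle))
        :charset (mail-content-type-get (mm-handle-type handle) 'charset)
        :encoding (mm-handle-encoding handle)
        :disposition (car (mm-handle-disposition handle))))
@end lisp

Entries for headers that the part does not carry simply come out as
@code{nil}.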
236 Functions for displaying, removing and saving.
239 @item mm-display-part
240 @findex mm-display-part
244 @findex mm-remove-part
245 Remove the part (if it has been displayed).
248 @findex mm-inlinable-p
249 Say whether a @sc{mime} type can be displayed inline.
251 @item mm-automatic-display-p
252 @findex mm-automatic-display-p
253 Say whether a @sc{mime} type should be displayed automatically.
255 @item mm-destroy-part
256 @findex mm-destroy-part
257 Free all resources occupied by a part.
261 Offer to save the part in a file.
265 Offer to pipe the part to some process.
267 @item mm-interactively-view-part
268 @findex mm-interactively-view-part
269 Prompt for a mailcap method to use to view the part.
274 @node Display Customization
275 @section Display Customization
279 @item mm-inline-media-tests
280 This is an alist where the key is a @sc{mime} type, the second element
281 is a function to display the part @dfn{inline} (i.e., inside Emacs), and
282 the third element is a form to be @code{eval}ed to say whether the part
283 can be displayed inline.
285 This variable specifies whether a part @emph{can} be displayed inline,
286 and, if so, how to do it. It does not say whether parts are
287 @emph{actually} displayed inline.
289 @item mm-inlined-types
290 This, on the other hand, says what types are to be displayed inline, if
291 they satisfy the conditions set by the variable above. It's a list of
292 @sc{mime} media types.
294 @item mm-automatic-display
295 This is a list of types that are to be displayed ``automatically'', but
296 only if the above variable allows it. That is, only inlinable parts can
297 be displayed automatically.
299 @item mm-attachment-override-types
300 Some @sc{mime} agents create parts that have a content-disposition of
301 @samp{attachment}. This variable allows overriding that disposition and
302 displaying the part inline. (Note that the disposition is only
303 overridden if we are able to, and want to, display the part inline.)
305 @item mm-discouraged-alternatives
306 List of @sc{mime} types that are discouraged when viewing
307 @samp{multipart/alternative}. Viewing agents are supposed to view the
308 last possible part of a message, as that is supposed to be the richest.
309 However, users may prefer other types instead, and this list says what
310 types are most unwanted. If, for instance, @samp{text/html} parts are
very unwanted, and @samp{text/richtext} parts are somewhat unwanted,
312 you could say something like:
(setq mm-discouraged-alternatives
      '("text/html" "text/richtext")
      mm-automatic-display
      (remove "text/html" mm-automatic-display))
321 @item mm-inline-large-images-p
322 When displaying inline images that are larger than the window, XEmacs
323 does not enable scrolling, which means that you cannot see the whole
324 image. To prevent this, the library tries to determine the image size
325 before displaying it inline, and if it doesn't fit the window, the
326 library will display it externally (e.g. with @samp{ImageMagick} or
327 @samp{xv}). Setting this variable to @code{t} disables this check and
328 makes the library display all inline images as inline, regardless of
@item mm-inline-override-types
332 @code{mm-inlined-types} may include regular expressions, for example to
333 specify that all @samp{text/.*} parts be displayed inline. If a user
334 prefers to have a type that matches such a regular expression be treated
335 as an attachment, that can be accomplished by setting this variable to a
336 list containing that type. For example assuming @code{mm-inlined-types}
337 includes @samp{text/.*}, then including @samp{text/html} in this
338 variable will cause @samp{text/html} parts to be treated as attachments.
340 @item mm-inline-text-html-renderer
341 This selects the function used to render @sc{html}. The predefined
342 renderers are selected by the symbols @code{w3},
343 @code{w3m}@footnote{See @uref{http://emacs-w3m.namazu.org/} for more
344 information about emacs-w3m}, @code{links}, @code{lynx} or
345 @code{html2text}. You can also specify a function, which will be
346 called with a @sc{mime} handle as the argument.
348 @item mm-inline-text-html-with-images
Spammers sometimes put @samp{<img>} tags in @sc{html} mails. This is
likely intended to verify whether you have read the mail. You can
prevent your personal information from leaking by setting this option
to @code{nil} (which is the default).
353 It is currently ignored by Emacs/w3. For emacs-w3m, you may use the
354 command @kbd{t} on the image anchor to show an image even if it is
355 @code{nil}.@footnote{The command @kbd{T} will load all images. If you
356 have set the option @code{w3m-key-binding} to @code{info}, use @kbd{i}
359 @item mm-inline-text-html-with-w3m-keymap
360 You can use emacs-w3m command keys in the inlined text/html part by
361 setting this option to non-@code{nil}. The default value is @code{t}.
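
As an illustration of how some of these variables might be combined in
an init file, consider the following sketch. The particular choices are
only assumptions about what a user could want, not recommended
settings.

@lisp
(require 'mm-decode)

;; Prefer other alternatives over HTML when a message offers several.
(add-to-list 'mm-discouraged-alternatives "text/html")
;; Do not load images referenced from HTML mail (the documented default).
(setq mm-inline-text-html-with-images nil)
;; Allow vCard parts to be shown inline even if marked as attachments.
(add-to-list 'mm-attachment-override-types "text/x-vcard")
@end lisp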
369 Here's an example viewer for displaying @code{text/enriched} inline:
(defun mm-display-enriched-inline (handle)
  (let (text)
    (with-temp-buffer
      (mm-insert-part handle)
      (save-window-excursion
        (enriched-decode (point-min) (point-max))
        (setq text (buffer-string))))
    (mm-insert-inline handle text)))
382 We see that the function takes a @sc{mime} handle as its parameter. It
383 then goes to a temporary buffer, inserts the text of the part, does some
384 work on the text, stores the result, goes back to the buffer it was
385 called from and inserts the result.
387 The two important helper functions here are @code{mm-insert-part} and
388 @code{mm-insert-inline}. The first function inserts the text of the
389 handle in the current buffer. It handles charset and/or content
390 transfer decoding. The second function just inserts whatever text you
391 tell it to insert, but it also sets things up so that the text can be
``undisplayed'' in a convenient manner.
398 @cindex MIME Composing
400 @cindex MIME Meta Language
402 Creating a @sc{mime} message is boring and non-trivial. Therefore, a
403 library called @code{mml} has been defined that parses a language called
404 MML (@sc{mime} Meta Language) and generates @sc{mime} messages.
406 @findex mml-generate-mime
407 The main interface function is @code{mml-generate-mime}. It will
408 examine the contents of the current (narrowed-to) buffer and return a
409 string containing the @sc{mime} message.
412 * Simple MML Example:: An example MML document.
413 * MML Definition:: All valid MML elements.
414 * Advanced MML Example:: Another example MML document.
415 * Encoding Customization:: Variables that affect encoding.
416 * Charset Translation:: How charsets are mapped from @sc{mule} to @sc{mime}.
417 * Conversion:: Going from @sc{mime} to MML and vice versa.
418 * Flowed text:: Soft and hard newlines.
422 @node Simple MML Example
423 @section Simple MML Example
425 Here's a simple @samp{multipart/alternative}:
<#multipart type=alternative>
This is a plain text part.
<#part type=text/enriched>
<center>This is a centered enriched part</center>
<#/multipart>
435 After running this through @code{mml-generate-mime}, we get this:
Content-Type: multipart/alternative; boundary="=-=-="


--=-=-=


This is a plain text part.

--=-=-=
Content-Type: text/enriched


<center>This is a centered enriched part</center>

--=-=-=--
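
If you want to try this from Lisp, something along the following lines
should do. The helper @code{my-mml-to-mime-string} is a made-up name
for illustration; it just puts the MML text in a temporary buffer and
calls @code{mml-generate-mime} there.

@lisp
(require 'mml)

(defun my-mml-to-mime-string (mml)
  "Return the MIME rendering of the MML markup in the string MML.
Hypothetical helper, for illustration only."
  (with-temp-buffer
    (insert mml)
    (mml-generate-mime)))

(my-mml-to-mime-string
 (concat "<#multipart type=alternative>\n"
         "This is a plain text part.\n"
         "<#part type=text/enriched>\n"
         "<center>This is a centered enriched part</center>\n"
         "<#/multipart>\n"))
@end lisp

The returned string contains the headers and body of the generated
entity; the exact boundary string is chosen by the library.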
457 @section MML Definition
459 The MML language is very simple. It looks a bit like an SGML
460 application, but it's not.
462 The main concept of MML is the @dfn{part}. Each part can be of a
463 different type or use a different charset. The way to delineate a part
464 is with a @samp{<#part ...>} tag. Multipart parts can be introduced
465 with the @samp{<#multipart ...>} tag. Parts are ended by the
466 @samp{<#/part>} or @samp{<#/multipart>} tags. Parts started with the
467 @samp{<#part ...>} tags are also closed by the next open tag.
There's also the @samp{<#external ...>} tag. It introduces
@samp{message/external-body} parts.
Each tag can contain zero or more parameters of the form
@samp{parameter=value}. The values may be enclosed in quotation marks,
but that's not necessary unless the value contains white space. So
@samp{filename=/home/user/#hello$^yes} is perfectly valid.
477 The following parameters have meaning in MML; parameters that have no
478 meaning are ignored. The MML parameter names are the same as the
479 @sc{mime} parameter names; the things in the parentheses say which
480 header it will be used in.
484 The @sc{mime} type of the part (@code{Content-Type}).
487 Use the contents of the file in the body of the part
488 (@code{Content-Disposition}).
The contents of the body of the part are to be encoded in the character
set specified (@code{Content-Type}). @xref{Charset Translation}.
495 Might be used to suggest a file name if the part is to be saved
496 to a file (@code{Content-Type}).
499 Valid values are @samp{inline} and @samp{attachment}
500 (@code{Content-Disposition}).
Valid values are @samp{7bit}, @samp{8bit}, @samp{quoted-printable} and
@samp{base64} (@code{Content-Transfer-Encoding}). @xref{Charset
Translation}.
508 A description of the part (@code{Content-Description}).
511 RFC822 date when the part was created (@code{Content-Disposition}).
513 @item modification-date
514 RFC822 date when the part was modified (@code{Content-Disposition}).
517 RFC822 date when the part was read (@code{Content-Disposition}).
520 Who to encrypt/sign the part to. This field is used to override any
521 auto-detection based on the To/CC headers.
524 The size (in octets) of the part (@code{Content-Disposition}).
What technology to sign this MML part with (@code{smime}, @code{pgp}
or @code{pgpmime})
531 What technology to encrypt this MML part with (@code{smime},
532 @code{pgp} or @code{pgpmime})
536 Parameters for @samp{application/octet-stream}:
540 Type of the part; informal---meant for human readers
541 (@code{Content-Type}).
544 Parameters for @samp{message/external-body}:
548 A word indicating the supported access mechanism by which the file may
549 be obtained. Values include @samp{ftp}, @samp{anon-ftp}, @samp{tftp},
550 @samp{localfile}, and @samp{mailserver}. (@code{Content-Type}.)
553 The RFC822 date after which the file may no longer be fetched.
554 (@code{Content-Type}.)
557 The size (in octets) of the file. (@code{Content-Type}.)
560 Valid values are @samp{read} and @samp{read-write}
561 (@code{Content-Type}).
565 Parameters for @samp{sign=smime}:
570 File containing key and certificate for signer.
574 Parameters for @samp{encrypt=smime}:
579 File containing certificate for recipient.
584 @node Advanced MML Example
585 @section Advanced MML Example
587 Here's a complex multipart message. It's a @samp{multipart/mixed} that
588 contains many parts, one of which is a @samp{multipart/alternative}.
<#multipart type=mixed>
<#part type=image/jpeg filename=~/rms.jpg disposition=inline>
<#multipart type=alternative>
This is a plain text part.
<#part type=text/enriched name=enriched.txt>
<center>This is a centered enriched part</center>
<#/multipart>
This is a new plain text part.
<#part disposition=attachment>
This plain text part is an attachment.
<#/multipart>
604 And this is the resulting @sc{mime} message:
607 Content-Type: multipart/mixed; boundary="=-=-="
615 Content-Type: image/jpeg;
617 Content-Disposition: inline;
619 Content-Transfer-Encoding: base64
621 /9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRof
622 Hh0aHBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/wAALCAAwADABAREA/8QAHwAA
623 AQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQR
624 BRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RF
625 RkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ip
626 qrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/9oACAEB
627 AAA/AO/rifFHjldNuGsrDa0qcSSHkA+gHrXKw+LtWLrMb+RgTyhbr+HSug07xNqV9fQtZrNI
628 AyiaE/NuBPOOOP0rvRNE880KOC8TbXXGCv1FPqjrF4LDR7u5L7SkTFT/ALWOP1xXgTuXfc7E
629 sx6nua6rwp4IvvEM8chCxWxOdzn7wz6V9AaB4S07w9p5itow0rDLSY5Pt9K43xO66P4xs71m
630 2QXiGCbA4yOVJ9+1aYORkdK434lyNH4ahCnG66VT9Nj15JFbPdX0MS43M4VQf5/yr2vSpLnw
631 5ZW8dlCZ8KFXjOPX0/mK6rSPEGt3Angu44fNEReHYNvIH3TzXDeKNO8RX+kSX2ouZkicTIOc
632 L+g7E810ulFjpVtv3bwgB3HJyK5L4quY/C9sVxk3ij/xx6850u7t1mtp/wDlpEw3An3Jr3Dw
633 34gsbWza4nBlhC5LDsaW6+IFgupQyCF3iHH7gA7c9R9ay7zx6t7aX9jHC4smhfBkGCvHGfrm
634 tLQ7hbnRrV1GPkAP1x1/Hr+Ncr8Vzjwrbf8AX6v/AKA9eQRyYlQk8Yx9K6XTNbkgia2ciSIn
635 7p5Ga9Atte0LTLKO6it4i7dVRFJDcZ4PvXN+JvEMF9bILVGXJLSZ4zkjivRPDaeX4b08HOTC
636 pOffmua+KkbS+GLVUGT9tT/0B68eeIpIFYjB70+OOVXyoOM9+M1eaWeCLzHPyHGO/NVWvJJm
637 jQ8KGH1NfQWhXSXmh2c8eArRLwO3HSv/2Q==
640 Content-Type: multipart/alternative; boundary="==-=-="
646 This is a plain text part.
649 Content-Type: text/enriched;
653 <center>This is a centered enriched part</center>
659 This is a new plain text part.
662 Content-Disposition: attachment
665 This plain text part is an attachment.
670 @node Encoding Customization
671 @section Encoding Customization
675 @item mm-body-charset-encoding-alist
676 @vindex mm-body-charset-encoding-alist
677 Mapping from MIME charset to encoding to use. This variable is
678 usually used except, e.g., when other requirements force a specific
679 encoding (digitally signed messages require 7bit encodings). The
680 default is @code{((iso-2022-jp . 7bit) (iso-2022-jp-2 . 7bit))}. As
681 an example, if you do not want to have ISO-8859-1 characters
682 quoted-printable encoded, you may add @code{(iso-8859-1 . 8bit)} to
683 this variable. You can override this setting on a per-message basis
684 by using the @code{encoding} MML tag (@pxref{MML Definition}).
686 @item mm-coding-system-priorities
687 @vindex mm-coding-system-priorities
Prioritize coding systems to use for outgoing messages. The default
is @code{nil}, which means to use the defaults in Emacs. It is a list of
coding system symbols (aliases of coding systems do not work; use
@kbd{M-x describe-coding-system} to make sure you are not specifying
an alias in this variable). For example, if you have configured Emacs
to prefer UTF-8, but want outgoing messages to be sent in
ISO-8859-1 if possible, you can set this variable to
@code{(iso-latin-1)}. You can override this setting on a per-message
basis by using the @code{charset} MML tag (@pxref{MML Definition}).
698 @item mm-content-transfer-encoding-defaults
699 @vindex mm-content-transfer-encoding-defaults
700 Mapping from MIME types to encoding to use. This variable is usually
701 used except, e.g., when other requirements force a safer encoding
702 (digitally signed messages require 7bit encoding). Besides the normal
703 MIME encodings, @code{qp-or-base64} may be used to indicate that for
704 each case the most efficient of quoted-printable and base64 should be
705 used. You can override this setting on a per-message basis by using
706 the @code{encoding} MML tag (@pxref{MML Definition}).
708 @item mm-use-ultra-safe-encoding
709 @vindex mm-use-ultra-safe-encoding
When this is non-@code{nil}, textual parts are encoded as
quoted-printable if they contain lines longer than 76 characters or
lines starting with "From " in the body. Non-7bit encodings (8bit, binary)
are generally disallowed. This reduces the probability that a
non-8bit-clean MTA or MDA changes the message. This should never be set
directly, but bound by other functions when necessary (e.g., when
encoding messages that are to be digitally signed).
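
The two examples mentioned above could be written like this in an init
file. This is only a sketch of those two settings, assuming that
@file{mm-bodies.el} is the library that defines
@code{mm-body-charset-encoding-alist} and that loading it also makes
@code{mm-coding-system-priorities} available.

@lisp
(require 'mm-bodies)

;; Send ISO-8859-1 text as 8bit rather than quoted-printable.
(add-to-list 'mm-body-charset-encoding-alist '(iso-8859-1 . 8bit))
;; Prefer Latin-1 for outgoing text when it can represent the message.
(setq mm-coding-system-priorities '(iso-latin-1))
@end lisp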
720 @node Charset Translation
721 @section Charset Translation
724 During translation from MML to @sc{mime}, for each @sc{mime} part which
725 has been composed inside Emacs, an appropriate charset has to be chosen.
727 @vindex mail-parse-charset
728 If you are running a non-@sc{mule} Emacs, this process is simple: If the
729 part contains any non-ASCII (8-bit) characters, the @sc{mime} charset
730 given by @code{mail-parse-charset} (a symbol) is used. (Never set this
731 variable directly, though. If you want to change the default charset,
732 please consult the documentation of the package which you use to process
734 @xref{Various Message Variables, , Various Message Variables, message,
735 Message Manual}, for example.)
736 If there are only ASCII characters, the @sc{mime} charset US-ASCII is
742 @vindex mm-mime-mule-charset-alist
743 Things are slightly more complicated when running Emacs with @sc{mule}
744 support. In this case, a list of the @sc{mule} charsets used in the
745 part is obtained, and the @sc{mule} charsets are translated to @sc{mime}
746 charsets by consulting the variable @code{mm-mime-mule-charset-alist}.
747 If this results in a single @sc{mime} charset, this is used to encode
748 the part. But if the resulting list of @sc{mime} charsets contains more
749 than one element, two things can happen: If it is possible to encode the
750 part via UTF-8, this charset is used. (For this, Emacs must support
751 the @code{utf-8} coding system, and the part must consist entirely of
752 characters which have Unicode counterparts.) If UTF-8 is not available
753 for some reason, the part is split into several ones, so that each one
754 can be encoded with a single @sc{mime} charset. The part can only be
755 split at line boundaries, though---if more than one @sc{mime} charset is
756 required to encode a single line, it is not possible to encode the part.
When running Emacs with @sc{mule} support, the preferences for which
coding system to use are inherited from Emacs itself. This means that
if Emacs is set up to prefer UTF-8, it will be used when encoding
messages. You can modify this by altering the
@code{mm-coding-system-priorities} variable though (@pxref{Encoding
Customization}).
The charset to be used can be overridden by setting the @code{charset}
766 MML tag (@pxref{MML Definition}) when composing the message.
The encoding of characters (quoted-printable, 8bit, etc.) is orthogonal
to the discussion here, and is controlled by the variables
@code{mm-body-charset-encoding-alist} and
@code{mm-content-transfer-encoding-defaults} (@pxref{Encoding
Customization}).
778 A (multipart) @sc{mime} message can be converted to MML with the
779 @code{mime-to-mml} function. It works on the message in the current
780 buffer, and substitutes MML markup for @sc{mime} boundaries.
781 Non-textual parts do not have their contents in the buffer, but instead
782 have the contents in separate buffers that are referred to from the MML
786 An MML message can be converted back to @sc{mime} by the
787 @code{mml-to-mime} function.
These functions are in certain senses ``lossy''---you will not get back
an identical message if you run @code{mime-to-mml} and then
@code{mml-to-mime}. Not only will trivial things like the order of the
headers differ, but the contents of the headers may also be different.
For instance, the original message may use base64 encoding on text,
while @code{mml-to-mime} may decide to use quoted-printable encoding, and
797 In essence, however, these two functions should be the inverse of each
798 other. The resulting contents of the message should remain equivalent,
804 @cindex format=flowed
806 The Emacs @sc{mime} library will respect the @code{use-hard-newlines}
807 variable (@pxref{Hard and Soft Newlines, ,Hard and Soft Newlines,
808 emacs, Emacs Manual}) when encoding a message, and the
809 ``format=flowed'' Content-Type parameter when decoding a message.
On encoding text, lines terminated by soft newline characters are
filled together and wrapped after the column decided by
@code{fill-flowed-encode-column}. This variable controls how the text
will look in a client that does not support flowed text; the default
is to wrap after 66 characters. If hard newline characters are not
present in the buffer, no flow encoding occurs.
818 On decoding flowed text, lines with soft newline characters are filled
819 together and wrapped after the column decided by
820 @code{fill-flowed-display-column}. The default is to wrap after
826 @node Interface Functions
827 @chapter Interface Functions
828 @cindex interface functions
831 The @code{mail-parse} library is an abstraction over the actual
832 low-level libraries that are described in the next chapter.
834 Standards change, and so programs have to change to fit in the new
835 mold. For instance, RFC2045 describes a syntax for the
836 @code{Content-Type} header that only allows ASCII characters in the
837 parameter list. RFC2231 expands on RFC2045 syntax to provide a scheme
838 for continuation headers and non-ASCII characters.
The traditional way to deal with this is just to update the library
functions to parse the new syntax. However, this is sometimes the wrong
thing to do. In some instances it may be vital to be able to understand
both the old syntax and the new syntax, and if there is only one
library, one must choose between the old version of the library and the
new version of the library.
The Emacs @sc{mime} library takes a different tack. It defines a
series of low-level libraries (@file{rfc2047.el}, @file{rfc2231.el}
and so on) that parse strictly according to the corresponding
standard. However, normal programs would not use the functions
provided by these libraries directly, but instead use the functions
provided by the @code{mail-parse} library. The functions in this
library are just aliases to the corresponding functions in the latest
low-level libraries. Using this scheme, programs get a consistent
interface they can use, and library developers are free to
write code that handles new standards.
858 The following functions are defined by this library:
861 @item mail-header-parse-content-type
862 @findex mail-header-parse-content-type
Parse a @code{Content-Type} header and return a list in the following
format:
("type/subtype"
 (attribute1 . value1)
 (attribute2 . value2)
 @dots{})
(mail-header-parse-content-type
 "image/gif; name=\"b980912.gif\"")
@result{} ("image/gif" (name . "b980912.gif"))
881 @item mail-header-parse-content-disposition
882 @findex mail-header-parse-content-disposition
Parse a @code{Content-Disposition} header and return a list in the same
format as the function above.
886 @item mail-content-type-get
887 @findex mail-content-type-get
Takes two parameters---a list in the format above, and an attribute.
Returns the value of the attribute.
(mail-content-type-get
 '("image/gif" (name . "b980912.gif")) 'name)
@result{} "b980912.gif"
897 @item mail-header-encode-parameter
898 @findex mail-header-encode-parameter
899 Takes a parameter string and returns an encoded version of the string.
900 This is used for parameters in headers like @code{Content-Type} and
901 @code{Content-Disposition}.
903 @item mail-header-remove-comments
904 @findex mail-header-remove-comments
905 Return a comment-free version of a header.
(mail-header-remove-comments
 "Gnus/5.070027 (Pterodactyl Gnus v0.27) (Finnish Landrace)")
@result{} "Gnus/5.070027 "
913 @item mail-header-remove-whitespace
914 @findex mail-header-remove-whitespace
915 Remove linear white space from a header. Space inside quoted strings
916 and comments is preserved.
(mail-header-remove-whitespace
 "image/gif; name=\"Name with spaces\"")
@result{} "image/gif;name=\"Name with spaces\""
924 @item mail-header-get-comment
925 @findex mail-header-get-comment
926 Return the last comment in a header.
(mail-header-get-comment
 "Gnus/5.070027 (Pterodactyl Gnus v0.27) (Finnish Landrace)")
@result{} "Finnish Landrace"
934 @item mail-header-parse-address
935 @findex mail-header-parse-address
936 Parse an address and return a list containing the mailbox and the
(mail-header-parse-address
 "Hrvoje Niksic <hniksic@@srce.hr>")
@result{} ("hniksic@@srce.hr" . "Hrvoje Niksic")
945 @item mail-header-parse-addresses
946 @findex mail-header-parse-addresses
Parse a string with a list of addresses and return a list of elements like
the one described above.
(mail-header-parse-addresses
 "Hrvoje Niksic <hniksic@@srce.hr>, Steinar Bang <sb@@metis.no>")
@result{} (("hniksic@@srce.hr" . "Hrvoje Niksic")
     ("sb@@metis.no" . "Steinar Bang"))
957 @item mail-header-parse-date
958 @findex mail-header-parse-date
959 Parse a date string and return an Emacs time structure.
961 @item mail-narrow-to-head
962 @findex mail-narrow-to-head
963 Narrow the buffer to the header section of the buffer. Point is placed
964 at the beginning of the narrowed buffer.
966 @item mail-header-narrow-to-field
967 @findex mail-header-narrow-to-field
968 Narrow the buffer to the header under point. Understands continuation
971 @item mail-header-fold-field
972 @findex mail-header-fold-field
973 Fold the header under point.
975 @item mail-header-unfold-field
976 @findex mail-header-unfold-field
977 Unfold the header under point.
979 @item mail-header-field-value
980 @findex mail-header-field-value
981 Return the value of the field under point.
983 @item mail-encode-encoded-word-region
984 @findex mail-encode-encoded-word-region
985 Encode the non-ASCII words in the region. For instance,
986 @samp{Naïve} is encoded as @samp{=?iso-8859-1?q?Na=EFve?=}.
988 @item mail-encode-encoded-word-buffer
989 @findex mail-encode-encoded-word-buffer
990 Encode the non-ASCII words in the current buffer. This function is
991 meant to be called narrowed to the headers of a message.
993 @item mail-encode-encoded-word-string
994 @findex mail-encode-encoded-word-string
995 Encode the words that need encoding in a string, and return the result.
(mail-encode-encoded-word-string
 "This is naïve, baby")
@result{} "This is =?iso-8859-1?q?na=EFve,?= baby"
1003 @item mail-decode-encoded-word-region
1004 @findex mail-decode-encoded-word-region
1005 Decode the encoded words in the region.
1007 @item mail-decode-encoded-word-string
1008 @findex mail-decode-encoded-word-string
1009 Decode the encoded words in the string and return the result.
1012 (mail-decode-encoded-word-string
1013 "This is =?iso-8859-1?q?na=EFve,?= baby")
1014 @result{} "This is naïve, baby"
1019 Currently, @code{mail-parse} is an abstraction over @code{ietf-drums},
1020 @code{rfc2047}, @code{rfc2045} and @code{rfc2231}. These are documented
1021 in the subsequent sections.
1025 @node Basic Functions
1026 @chapter Basic Functions
1028 This chapter describes the basic, ground-level functions for parsing and
1029 handling. Covered here are parsing @code{From} lines, removing comments
1030 from header lines, decoding encoded words, parsing date headers and so
1031 on. High-level functionality is dealt with in the next chapter
1032 (@pxref{Decoding and Viewing}).
1035 * rfc2045:: Encoding @code{Content-Type} headers.
1036 * rfc2231:: Parsing @code{Content-Type} headers.
1037 * ietf-drums:: Handling mail headers defined by RFC822bis.
1038 * rfc2047:: En/decoding encoded words in headers.
1039 * time-date:: Functions for parsing dates and manipulating time.
1040 * qp:: Quoted-Printable en/decoding.
1041 * base64:: Base64 en/decoding.
1042 * binhex:: Binhex decoding.
1043 * uudecode:: Uuencode decoding.
1044 * rfc1843:: Decoding HZ-encoded text.
1045 * mailcap:: How parts are displayed is specified by the @file{.mailcap} file
1052 RFC2045 is the ``main'' @sc{mime} document, and as such, one would
1053 imagine that there would be a lot to implement. But there isn't, since
1054 most of the implementation details are delegated to the subsequent
1057 So @file{rfc2045.el} has only a single function:
1060 @item rfc2045-encode-string
1061 @findex rfc2045-encode-string
1062 Takes a parameter and a value and returns a @samp{PARAM=VALUE} string.
1063 @var{value} will be quoted if there are non-safe characters in it.
1070 RFC2231 defines a syntax for the @code{Content-Type} and
1071 @code{Content-Disposition} headers. Its snappy name is @dfn{MIME
1072 Parameter Value and Encoded Word Extensions: Character Sets, Languages,
1075 In short, these headers look something like this:
1078 Content-Type: application/x-stuff;
1079 title*0*=us-ascii'en'This%20is%20even%20more%20;
1080 title*1*=%2A%2A%2Afun%2A%2A%2A%20;
1084 They usually aren't this bad, though.
1086 The following functions are defined by this library:
1089 @item rfc2231-parse-string
1090 @findex rfc2231-parse-string
1091 Parse a @code{Content-Type} header and return a list describing its
1095 (rfc2231-parse-string
1096 "application/x-stuff;
1097 title*0*=us-ascii'en'This%20is%20even%20more%20;
1098 title*1*=%2A%2A%2Afun%2A%2A%2A%20;
1099 title*2=\"isn't it!\"")
1100 @result{} ("application/x-stuff"
1101 (title . "This is even more ***fun*** isn't it!"))
1104 @item rfc2231-get-value
1105 @findex rfc2231-get-value
1106 Takes one of the lists in the format above and returns
1107 the value of the specified attribute.
1109 @item rfc2231-encode-string
1110 @findex rfc2231-encode-string
1111 Encode a parameter in headers like @code{Content-Type} and
1112 @code{Content-Disposition}.
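As a rough sketch of how the parsing and lookup functions fit together,
the following parses a simple header value and fetches one attribute
from the resulting list. The header string is made up for illustration,
and the result assumes the list format shown above.

@example
(require 'rfc2231)

;; Parse the header value, then look up the `charset' attribute
;; in the list that the parser returns.
(let ((ct (rfc2231-parse-string "text/plain; charset=utf-8")))
  (rfc2231-get-value ct 'charset))
@result{} "utf-8"
@end example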
1120 @dfn{drums} is an IETF working group that is working on the replacement
1123 The functions provided by this library include:
1126 @item ietf-drums-remove-comments
1127 @findex ietf-drums-remove-comments
1128 Remove the comments from the argument and return the results.
1130 @item ietf-drums-remove-whitespace
1131 @findex ietf-drums-remove-whitespace
1132 Remove linear white space from the string and return the results.
1133 Spaces inside quoted strings and comments are left untouched.
1135 @item ietf-drums-get-comment
1136 @findex ietf-drums-get-comment
1137 Return the last comment from the string.
1139 @item ietf-drums-parse-address
1140 @findex ietf-drums-parse-address
1141 Parse an address string and return a list that contains the mailbox and
1142 the plain text name.
1144 @item ietf-drums-parse-addresses
1145 @findex ietf-drums-parse-addresses
1146 Parse a string that contains any number of comma-separated addresses and
1147 return a list that contains mailbox/plain text pairs.
1149 @item ietf-drums-parse-date
1150 @findex ietf-drums-parse-date
1151 Parse a date string and return an Emacs time structure.
1153 @item ietf-drums-narrow-to-header
1154 @findex ietf-drums-narrow-to-header
1155 Narrow the buffer to the header section of the current buffer.
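A small sketch of two of these functions in use. The addresses are the
ones from the @code{mail-parse} examples earlier in this manual; the
results assume the mailbox/name pair format shown there.

@example
(require 'ietf-drums)

;; Split a single address into its mailbox and plain text name.
(ietf-drums-parse-address "Hrvoje Niksic <hniksic@@srce.hr>")
@result{} ("hniksic@@srce.hr" . "Hrvoje Niksic")

;; Extract the text of a parenthesized comment.
(ietf-drums-get-comment "sb@@metis.no (Steinar Bang)")
@result{} "Steinar Bang"
@end example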
1163 RFC2047 (Message Header Extensions for Non-ASCII Text) specifies how
1164 non-ASCII text in headers is to be encoded. This is actually rather
1165 complicated, so a number of variables are necessary to tweak what this
1168 The following variables are tweakable:
1171 @item rfc2047-default-charset
1172 @vindex rfc2047-default-charset
1173 Characters in this charset should not be decoded by this library.
1174 This defaults to @code{iso-8859-1}.
1176 @item rfc2047-header-encoding-alist
1177 @vindex rfc2047-header-encoding-alist
1178 This is an alist of header / encoding-type pairs. Its main purpose is
1179 to prevent encoding of certain headers.
1181 The keys can either be header regexps, or @code{t}.
1183 The values can be either @code{nil}, in which case the header(s) in
1184 question won't be encoded, or @code{mime}, which means that they will be
1187 @item rfc2047-charset-encoding-alist
1188 @vindex rfc2047-charset-encoding-alist
1189 RFC2047 specifies two forms of encoding---@code{Q} (a
1190 Quoted-Printable-like encoding) and @code{B} (base64). This alist
1191 specifies which charset should use which encoding.
1193 @item rfc2047-encoding-function-alist
1194 @vindex rfc2047-encoding-function-alist
1195 This is an alist of encoding / function pairs. The encodings are
1196 @code{Q}, @code{B} and @code{nil}.
1198 @item rfc2047-q-encoding-alist
1199 @vindex rfc2047-q-encoding-alist
1200 The @code{Q} encoding isn't quite the same for all headers. Some
1201 headers allow a narrower range of characters, and that is what this
1202 variable is for. It's an alist of header regexps / allowable character
1205 @item rfc2047-encoded-word-regexp
1206 @vindex rfc2047-encoded-word-regexp
1207 When decoding words, this library looks for matches to this regexp.
1211 Those were the variables, and these are the functions:
1214 @item rfc2047-narrow-to-field
1215 @findex rfc2047-narrow-to-field
1216 Narrow the buffer to the header on the current line.
1218 @item rfc2047-encode-message-header
1219 @findex rfc2047-encode-message-header
1220 Should be called narrowed to the header of a message. Encodes according
1221 to @code{rfc2047-header-encoding-alist}.
1223 @item rfc2047-encode-region
1224 @findex rfc2047-encode-region
1225 Encodes all encodable words in the region specified.
1227 @item rfc2047-encode-string
1228 @findex rfc2047-encode-string
1229 Encode a string and return the results.
1231 @item rfc2047-decode-region
1232 @findex rfc2047-decode-region
1233 Decode the encoded words in the region.
1235 @item rfc2047-decode-string
1236 @findex rfc2047-decode-string
1237 Decode a string and return the results.
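For illustration, here is the encoded word from the @code{mail-parse}
examples decoded directly with this library. This is only a sketch; the
output of the encoding functions depends on the variables described
above, so only decoding is shown.

@example
(require 'rfc2047)

;; Decode a single Q-encoded word in the iso-8859-1 charset.
(rfc2047-decode-string "=?iso-8859-1?q?Na=EFve?=")
@result{} "Naïve"
@end example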
1245 While not really a part of the @sc{mime} library, it is convenient to
1246 document this library here. It deals with parsing @code{Date} headers
1247 and manipulating time. (Not by using tesseracts, though, I'm sorry to
1250 These functions convert between five formats: A date string, an Emacs
1251 time structure, a decoded time list, a second number, and a day number.
1253 Here's a bunch of time/date/second/day examples:
1256 (parse-time-string "Sat Sep 12 12:21:54 1998 +0200")
1257 @result{} (54 21 12 12 9 1998 6 nil 7200)
1259 (date-to-time "Sat Sep 12 12:21:54 1998 +0200")
1260 @result{} (13818 19266)
1262 (time-to-seconds '(13818 19266))
1263 @result{} 905595714.0
1265 (seconds-to-time 905595714.0)
1266 @result{} (13818 19266 0)
1268 (time-to-days '(13818 19266))
1271 (days-to-time 729644)
1272 @result{} (961933 65536)
1274 (time-since '(13818 19266))
1277 (time-less-p '(13818 19266) '(13818 19145))
1280 (subtract-time '(13818 19266) '(13818 19145))
1283 (days-between "Sat Sep 12 12:21:54 1998 +0200"
1284 "Sat Sep 07 12:21:54 1998 +0200")
1287 (date-leap-year-p 2000)
1290 (time-to-day-in-year '(13818 19266))
1293 (time-to-number-of-days
1295 (date-to-time "Mon, 01 Jan 2001 02:22:26 GMT")))
1296 @result{} 4.146122685185185
1299 And finally, we have @code{safe-date-to-time}, which does the same as
1300 @code{date-to-time}, but returns a zero time if the date is
1301 syntactically malformed.
1303 The five data representations used are the following:
1307 An RFC822 (or similar) date string. For instance: @code{"Sat Sep 12
1308 12:21:54 1998 +0200"}.
1311 An internal Emacs time. For instance: @code{(13818 19266)}.
1314 A floating point representation of the internal Emacs time. For
1315 instance: @code{905595714.0}.
1318 An integer number representing the number of days since 00000101. For
1319 instance: @code{729644}.
1322 A list of decoded time. For instance: @code{(54 21 12 12 9 1998 6 t
1326 All the examples above represent the same moment.
1328 These are the functions available:
1332 Take a date and return a time.
1334 @item time-to-seconds
1335 Take a time and return seconds.
1337 @item seconds-to-time
1338 Take seconds and return a time.
1341 Take a time and return days.
1344 Take days and return a time.
1347 Take a date and return days.
1349 @item time-to-number-of-days
1350 Take a time and return the number of days that represents.
1352 @item safe-date-to-time
1353 Take a date and return a time. If the date is not syntactically valid,
1354 return a ``zero'' time.
1357 Take two times and say whether the first time is less (i.e., earlier)
1358 than the second time.
1361 Take a time and return a time saying how long it was since that time.
1364 Take two times and subtract the second from the first. That is, return
1365 the time between the two times.
1368 Take two dates and return the number of days between those two dates.
1370 @item date-leap-year-p
1371 Take a year number and say whether it's a leap year.
1373 @item time-to-day-in-year
1374 Take a time and return the day number within the year that the time is
1383 This library deals with decoding and encoding Quoted-Printable text.
1385 Very briefly explained, qp encoding means translating all 8-bit
1386 characters (and lots of control characters) into things that look like
1387 @samp{=EF}; that is, an equal sign followed by the byte encoded as a hex
1390 The following functions are defined by the library:
1393 @item quoted-printable-decode-region
1394 @findex quoted-printable-decode-region
1395 QP-decode all the encoded text in the specified region.
1397 @item quoted-printable-decode-string
1398 @findex quoted-printable-decode-string
1399 Decode the QP-encoded text in a string and return the results.
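A minimal sketch, assuming the decoder is left to return raw bytes and
the charset conversion is done as a separate step with the built-in
@code{decode-coding-string}:

@example
(require 'qp)

;; "=EF" is the byte 0xEF, which is the letter i-diaeresis in
;; iso-8859-1.  Decode the QP text to bytes, then the bytes to
;; characters.
(decode-coding-string
 (quoted-printable-decode-string "Na=EFve")
 'iso-8859-1)
@result{} "Naïve"
@end example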
1401 @item quoted-printable-encode-region
1402 @findex quoted-printable-encode-region
1403 QP-encode all the encodable characters in the specified region. The third
1404 optional parameter @var{fold} specifies whether to fold long lines.
1405 (Long here means 72.)
1407 @item quoted-printable-encode-string
1408 @findex quoted-printable-encode-string
1409 QP-encode all the encodable characters in a string and return the
1419 Base64 is an encoding that encodes three bytes into four characters,
1420 thereby increasing the size by about 33%. The alphabet used for
1421 encoding is very resistant to mangling during transit.
1423 The following functions are defined by this library:
1426 @item base64-encode-region
1427 @findex base64-encode-region
1428 base64 encode the selected region. Return the length of the encoded
1429 text. Optional third argument @var{no-line-break} means do not break
1430 long lines into shorter lines.
1432 @item base64-encode-string
1433 @findex base64-encode-string
1434 base64 encode a string and return the result.
1436 @item base64-decode-region
1437 @findex base64-decode-region
1438 base64 decode the selected region. Return the length of the decoded
1439 text. If the region can't be decoded, return @code{nil} and don't
1442 @item base64-decode-string
1443 @findex base64-decode-string
1444 base64 decode a string and return the result. If the string can't be
1445 decoded, @code{nil} is returned.
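A short round trip, as a sketch of the string functions. The five bytes
of the input become eight characters, the last of which is padding:

@example
(base64-encode-string "Hello")
@result{} "SGVsbG8="

(base64-decode-string "SGVsbG8=")
@result{} "Hello"
@end example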
1456 @code{binhex} is an encoding that originated in Macintosh environments.
1457 The following function is supplied to deal with these:
1460 @item binhex-decode-region
1461 @findex binhex-decode-region
1462 Decode the encoded text in the region. If given a third parameter, only
1463 decode the @code{binhex} header and return the filename.
1473 @code{uuencode} is probably still the most popular encoding of binaries
1474 used on Usenet, although @code{base64} rules the mail world.
1476 The following function is supplied by this package:
1479 @item uudecode-decode-region
1480 @findex uudecode-decode-region
1481 Decode the text in the region.
1491 RFC1843 deals with mixing Chinese and ASCII characters in messages. In
1492 essence, RFC1843 switches between ASCII and Chinese by doing this:
1495 This sentence is in ASCII.
1496 The next sentence is in GB.~@{<:Ky2;S@{#,NpJ)l6HK!#~@}Bye.
1499 Simple enough, and widely used in China.
1501 The following functions are available to handle this encoding:
1504 @item rfc1843-decode-region
1505 Decode HZ-encoded text in the region.
1507 @item rfc1843-decode-string
1508 Decode an HZ-encoded string and return the result.
1516 The @file{~/.mailcap} file is parsed by most @sc{mime}-aware message
1517 handlers and describes how elements are supposed to be displayed.
1518 Here's an example file:
1522 audio/wav; wavplayer %s
1523 application/msword; catdoc %s ; copiousoutput ; nametemplate=%s.doc
1526 This says that all image files should be displayed with @code{gimp},
1527 that WAVE audio files should be played by @code{wavplayer}, and that
1528 MS-WORD files should be inlined by @code{catdoc}.
1530 The @code{mailcap} library parses this file, and provides functions for
1534 @item mailcap-mime-data
1535 @vindex mailcap-mime-data
1536 This variable is an alist of alists containing backup viewing rules.
1540 Interface functions:
1543 @item mailcap-parse-mailcaps
1544 @findex mailcap-parse-mailcaps
1545 Parse the @file{~/.mailcap} file.
1547 @item mailcap-mime-info
1548 Takes a @sc{mime} type as its argument and returns the matching viewer.
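A sketch of how the two interface functions might be combined. No
result is shown, because the viewer returned depends on the mailcap
entries and backup rules present on the system; with the example file
above, one would expect the @code{wavplayer} entry to match.

@example
(require 'mailcap)

;; Read the mailcap entries first, then ask which viewer matches
;; a given MIME type.
(mailcap-parse-mailcaps)
(mailcap-mime-info "audio/wav")
@end example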
1558 The Emacs @sc{mime} library implements handling of various elements
1559 according to a (somewhat) large number of RFCs, drafts and standards
1560 documents. This chapter lists the relevant ones. They can all be
1561 fetched from @uref{http://quimby.gnus.org/notes/}.
1566 Standard for the Format of ARPA Internet Text Messages.
1569 Standard for Interchange of USENET Messages
1572 Format of Internet Message Bodies
1578 Message Header Extensions for Non-ASCII Text
1581 Registration Procedures
1584 Conformance Criteria and Examples
1587 @sc{mime} Parameter Value and Encoded Word Extensions: Character Sets,
1588 Languages, and Continuations
1591 HZ - A Data Format for Exchanging Files of Arbitrarily Mixed Chinese and
1594 @item draft-ietf-drums-msg-fmt-05.txt
1595 Draft for the successor of RFC822
1598 The @sc{mime} Multipart/Related Content-type
1601 The Multipart/Report Content Type for the Reporting of Mail System
1602 Administrative Messages
1605 Communicating Presentation Information in Internet Messages: The
1606 Content-Disposition Header Field
1609 Documentation of the text/plain format parameter for flowed text.
1625 @c coding: iso-8859-1