From: yamaoka Date: Sun, 24 Feb 2002 23:44:36 +0000 (+0000) Subject: Synch with Oort Gnus. X-Git-Tag: t-gnus-6_15_6-01-quimby~36 X-Git-Url: http://git.chise.org/gitweb/?a=commitdiff_plain;h=18c92b4231f205ecc4bd5eddbd8ce011613a3f0c;p=elisp%2Fgnus.git- Synch with Oort Gnus. --- diff --git a/GNUS-NEWS b/GNUS-NEWS index 8e64d5d..2290ba9 100644 --- a/GNUS-NEWS +++ b/GNUS-NEWS @@ -8,6 +8,11 @@ For older news, see Gnus info node "New Features". * Changes in Oort Gnus +** deuglify.el (gnus-article-outlook-deuglify-article) + +A new file from Raymond Scholz for deuglifying +broken Outlook (Express) articles. + ** (require 'gnus-load) If you use a stand-alone Gnus distribution, you'd better add (require diff --git a/lisp/ChangeLog b/lisp/ChangeLog index faa4f44..e7f1354 100644 --- a/lisp/ChangeLog +++ b/lisp/ChangeLog @@ -1,3 +1,41 @@ +2002-02-24 ShengHuo ZHU + + * deuglify.el: Fix comments. + +2002-02-23 ShengHuo ZHU + + * html2text.el (html2text-clean-anchor): If there is no HREF, + insert nothing. + + * mml.el (mml-generate-mime-1): Add cdr. + From: andre@slamdunknetworks.com + + * mm-view.el (mm-text-html-renderer-alist): Add html2text. + (mm-text-html-washer-alist): Ditto. + + * mm-decode.el (mm-text-html-renderer): Add html2text. + + * html2text.el: Face lift. + + * html2text.el: New file from Joakim Hove . + +2002-02-22 ShengHuo ZHU + + * gnus-sum.el: Add gnus-article-outlook-deuglify-article. + + * deuglify.el: Change copy right. Add autoload. Add coding-system. + + * deuglify.el: New file. The original file name is + gnus-outlook-deuglify.el from Raymond Scholz . + + * mm-decode.el (mm-display-external): Use + mm-file-name-rewrite-functions. From + +2002-02-22 Paul Jarc + + * nnmaildir.el (nnmaildir-request-list): Report the highest + article number, not the total number of articles. + 2002-02-21 ShengHuo ZHU * gnus-sum.el: Move uu key map here. 
diff --git a/lisp/deuglify.el b/lisp/deuglify.el new file mode 100644 index 0000000..e3a9bf3 --- /dev/null +++ b/lisp/deuglify.el @@ -0,0 +1,441 @@ +;;; deuglify.el --- deuglify broken Outlook (Express) articles + +;; Copyright (C) 2002 Free Software Foundation, Inc. +;; Copyright (C) 2001,2002 Raymond Scholz + +;; Author: Raymond Scholz +;; Thomas Steffen (unwrapping algorithm, +;; based on an idea of Stefan Monnier) +;; Keywords: mail, news + +;; This file is part of GNU Emacs. + +;; GNU Emacs is free software; you can redistribute it and/or modify +;; it under the terms of the GNU General Public License as published by +;; the Free Software Foundation; either version 2, or (at your option) +;; any later version. + +;; GNU Emacs is distributed in the hope that it will be useful, +;; but WITHOUT ANY WARRANTY; without even the implied warranty of +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +;; GNU General Public License for more details. + +;; You should have received a copy of the GNU General Public License +;; along with GNU Emacs; see the file COPYING. If not, write to the +;; Free Software Foundation, Inc., 59 Temple Place - Suite 330, +;; Boston, MA 02111-1307, USA. + +;;; Commentary: + +;; This file enables Gnus to repair broken citations produced by +;; common user agents like MS Outlook (Express). It may repair +;; articles of other user agents too. +;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +;; +;; Outlook sometimes wraps cited lines before sending a message as +;; seen in this example: +;; +;; Example #1 +;; ---------- +;; +;; John Doe wrote: +;; +;; > This sentence no verb. This sentence no verb. This sentence +;; no +;; > verb. This sentence no verb. This sentence no verb. This +;; > sentence no verb. +;; +;; The function `gnus-outlook-unwrap-lines' tries to recognize those +;; erroneously wrapped lines and will unwrap them. I.e. 
putting the +;; wrapped parts ("no" in this example) back where they belong (at the +;; end of the cited line above). +;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; +;; Note that some people not only use broken user agents but also +;; practice a bad citation style by omitting blank lines between the +;; cited text and their own text. +;: +;; Example #2 +;; ---------- +;; +;; John Doe wrote: +;; +;; > This sentence no verb. This sentence no verb. This sentence no +;; You forgot in all your sentences. +;; > verb. This sentence no verb. This sentence no verb. This +;; > sentence no verb. +;; +;; Unwrapping "You forgot in all your sentences." would be illegal as +;; this part wasn't intended to be cited text. +;; `gnus-outlook-unwrap-lines' will only unwrap lines if the resulting +;; citation line will be of a certain maximum length. You can control +;; this by adjusting `gnus-outlook-deuglify-unwrap-max'. Also +;; unwrapping will only be done if the line above the (possibly) +;; wrapped line has a minimum length of `gnus-outlook-deuglify-unwrap-min'. +;; +;; Furthermore no unwrapping will be undertaken if the last character +;; is one of the chars specified in +;; `gnus-outlook-deuglify-unwrap-stop-chars'. Setting this to ".?!" +;; inhibits unwrapping if the cited line ends with a full stop, +;; question mark or exclamation mark. Note that this variable +;; defaults to `nil', triggering a few false positives but generally +;; giving you better results. +;; +;; Unwrapping works on every level of citation. Thus you will be able +;; repair broken citations of broken user agents citing broken +;; citations of broken user agents citing broken citations... +;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; +;; Citations are commonly introduced with an attribution line +;; indicating who wrote the cited text. 
Outlook adds superfluous +;; information that can be found in the header of the message to this +;; line and often wraps it. +;; +;; If that weren't enough, lots of people write their own text above +;; the cited text and cite the complete original article below. +;; +;; Example #3 +;; ---------- +;; +;; Hey, John. There's no in all your sentences! +;; +;; John Doe wrote in message +;; news:a87usw8$dklsssa$2@some.news.server... +;; > This sentence no verb. This sentence no verb. This sentence +;; no +;; > verb. This sentence no verb. This sentence no verb. This +;; > sentence no verb. +;; > +;; > Bye, John +;; +;; Repairing the attribution line will be done by function +;; `gnus-outlook-repair-attribution' which calls other function that +;; try to recognize and repair broken attribution lines. See variable +;; `gnus-outlook-deuglify-attrib-cut-regexp' for stuff that should be +;; cut off from the beginning of an attribution line and variable +;; `gnus-outlook-deuglify-attrib-verb-regexp' for the verbs that are +;; required to be found in an attribution line. These function return +;; the point where the repaired attribution line starts. +;; +;; Rearranging the article so that the cited text appears above the +;; new text will be done by function +;; `gnus-outlook-rearrange-citation'. This function calls +;; `gnus-outlook-repair-attribution' to find and repair an attribution +;; line. +;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; +;; Well, and that's what the message will look like after applying +;; deuglification: +;; +;; Example #3 (deuglified) +;; ----------------------- +;; +;; John Doe wrote: +;; +;; > This sentence no verb. This sentence no verb. This sentence no +;; > verb. This sentence no verb. This sentence no verb. This +;; > sentence no verb. +;; > +;; > Bye, John +;; +;; Hey, John. There's no in all your sentences! 
+;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; +;; Usage +;; ----- +;; +;; Press `W k' in the Summary Buffer. +;; +;; Non recommended usage :-) +;; --------------------- +;; +;; To automatically invoke deuglification on every article you read, +;; put something like that in your .gnus: +;; +;; (add-hook 'gnus-article-decode-hook 'gnus-outlook-unwrap-lines) +;; +;; or _one_ of the following lines: +;; +;; ;; repair broken attribution lines +;; (add-hook 'gnus-article-decode-hook 'gnus-outlook-repair-attribution) +;; +;; ;; repair broken attribution lines and citations +;; (add-hook 'gnus-article-decode-hook 'gnus-outlook-rearrange-citation) +;; +;; Note that there always may be some false positives, so I suggest +;; using the manual invocation. After deuglification you may want to +;; refill the whole article using `W w'. +;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; +;; Limitations +;; ----------- +;; +;; As I said before there may (or will) be a few false positives on +;; unwrapping cited lines with `gnus-outlook-unwrap-lines'. +;; +;; `gnus-outlook-repair-attribution' will only fix the first +;; attribution line found in the article. Furthermore it fixed to +;; certain kinds of attributions. And there may be horribly many +;; false positives, vanishing lines and so on -- so don't trust your +;; eyes. Again I recommend manual invocation. +;; +;; `gnus-outlook-rearrange-citation' carries all the limitations of +;; `gnus-outlook-repair-attribution'. +;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; +;; See ChangeLog for other changes. +;; +;; Revision 1.5 2002/01/27 14:39:17 rscholz +;; * New variable `gnus-outlook-deuglify-no-wrap-chars' to inhibit +;; unwrapping if one these chars is first in the possibly wrapped line. +;; * Improved rearranging of the article. 
+;; * New function `gnus-outlook-repair-attribution-block' for repairing +;; those big "Original Message (following some headers)" attributions. +;; +;; Revision 1.4 2002/01/03 14:05:00 rscholz +;; Renamed `gnus-outlook-deuglify-article' to +;; `gnus-article-outlook-deuglify-article'. +;; Made it easier to deuglify the article while being in Gnus' Article +;; Edit Mode. (suggested by Phil Nitschke) +;; +;; +;; Revision 1.3 2002/01/02 23:35:54 rscholz +;; Fix a bug that caused succeeding long attribution lines to be +;; unwrapped. Minor doc fixes and regular expression tuning. +;; +;; Revision 1.2 2001/12/30 20:14:34 rscholz +;; Clean up source. +;; +;; Revision 1.1 2001/12/30 20:13:32 rscholz +;; Initial revision +;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +;;; Code: + +(require 'gnus-art) +(require 'gnus-sum) + +(defconst gnus-outlook-deuglify-version "1.5 Gnus version" + "Version of gnus-outlook-deuglify.") + +;;; User Customizable Variables: + +(defgroup gnus-outlook-deuglify nil + "Deuglify articles generated by broken user agents like MS +Outlook (Express).") + +;;;###autoload +(defcustom gnus-outlook-deuglify-unwrap-min 45 + "Minimum length of the cited line above the (possibly) wrapped line." + :type 'number + :group 'gnus-outlook-deuglify) + +;;;###autoload +(defcustom gnus-outlook-deuglify-unwrap-max 95 + "Maximum length of the cited line after unwrapping." + :type 'number + :group 'gnus-outlook-deuglify) + +(defcustom gnus-outlook-deuglify-cite-marks ">|#%" + "Characters that indicate cited lines." + :type 'string + :group 'gnus-outlook-deuglify) + +(defcustom gnus-outlook-deuglify-unwrap-stop-chars nil ;; ".?!" or nil + "Characters that inhibit unwrapping if they are the last one on the +cited line above the possible wrapped line." 
+ :type 'string + :group 'gnus-outlook-deuglify) + +(defcustom gnus-outlook-deuglify-no-wrap-chars "`" + "Characters that inhibit unwrapping if they are the first one in the +possibly wrapped line." + :type 'string + :group 'gnus-outlook-deuglify) + +(defcustom gnus-outlook-deuglify-attrib-cut-regexp + "\\(On \\|Am \\)?\\(Mon\\|Tue\\|Wed\\|Thu\\|Fri\\|Sat\\|Sun\\),[^,]+, " + "Regular expression matching the beginning of an attribution line +that should be cut off." + :type 'string + :group 'gnus-outlook-deuglify) + +(defcustom gnus-outlook-deuglify-attrib-verb-regexp + "wrote\\|writes\\|says\\|schrieb\\|schreibt\\|meinte\\|skrev\\|a écrit\\|schreef" + "Regular expression matching the verb used in an attribution line." + :type 'string + :group 'gnus-outlook-deuglify) + +(defcustom gnus-outlook-deuglify-attrib-end-regexp + ": *\\|\\.\\.\\." + "Regular expression matching the end of an attribution line." + :type 'string + :group 'gnus-outlook-deuglify) + + +;; Functions + +;; TODO: don't kill MIME parts +;;;###autoload +(defun gnus-outlook-unwrap-lines () + "Unwrap lines that appear to be wrapped citation lines. You can +control what lines will be unwrapped by frobbing +`gnus-outlook-deuglify-unwrap-min' and +`gnus-outlook-deuglify-unwrap-max', indicating the miminum and maximum +length of an unwrapped citation line." 
+ (interactive) + (save-excursion + (let ((case-fold-search nil) + (inhibit-read-only t) + (cite-marks gnus-outlook-deuglify-cite-marks) + (no-wrap gnus-outlook-deuglify-no-wrap-chars) + (stop-chars gnus-outlook-deuglify-unwrap-stop-chars)) + (gnus-with-article-buffer + (article-goto-body) + (while (re-search-forward + (concat + "^\\([ \t" cite-marks "]*\\)" + "\\([" cite-marks "].*[^\n " stop-chars "]\\)[ \t]?\n" + "\\1\\([^\n " cite-marks no-wrap "]+.*\\)$") + nil t) + (let ((len12 (- (match-end 2) (match-beginning 1))) + (len3 (- (match-end 3) (match-beginning 3)))) + (if (and (> len12 gnus-outlook-deuglify-unwrap-min) + (< (+ len12 len3) gnus-outlook-deuglify-unwrap-max)) + (progn + (replace-match "\\1\\2 \\3") + (goto-char (match-beginning 0)))))))))) + +;; TODO: respect signatures, don't kill MIME parts +(defun gnus-outlook-rearrange-article (from-where) + "Put the text from `from-where' to the end of buffer at the top of +the article buffer." + (save-excursion + (let ((inhibit-read-only t) + (cite-marks gnus-outlook-deuglify-cite-marks)) + (gnus-with-article-buffer + (unless (search-forward-regexp + (concat "^[ \t]*[^" cite-marks "\n]") nil t) + (kill-region from-where (point-max)) + (article-goto-body) + (yank) + (insert "\n")))))) + +;; John Doe wrote in message +;; news:a87usw8$dklsssa$2@some.news.server... + +(defun gnus-outlook-repair-attribution-outlook () + "Repair a broken attribution line (Outlook)." + (save-excursion + (let ((case-fold-search nil) + (inhibit-read-only t) + (cite-marks gnus-outlook-deuglify-cite-marks)) + (gnus-with-article-buffer + (article-goto-body) + (if (re-search-forward + (concat "^\\([^" cite-marks "].+\\)" + "\\(" gnus-outlook-deuglify-attrib-verb-regexp "\\)" + "\\(.*\n?[^\n" cite-marks "].*\\)?" 
+ "\\(" gnus-outlook-deuglify-attrib-end-regexp "\\)$") + nil t) + (progn + (replace-match "\\1\\2\\4") + (match-beginning 0))))))) + + +;; ----- Original Message ----- +;; From: "John Doe" +;; To: "Doe Foundation" +;; Sent: Monday, November 19, 2001 12:13 PM +;; Subject: More Doenuts + +(defun gnus-outlook-repair-attribution-block () + "Repair a big broken attribution block." + (save-excursion + (let ((case-fold-search nil) + (inhibit-read-only t) + (cite-marks gnus-outlook-deuglify-cite-marks)) + (gnus-with-article-buffer + (article-goto-body) + (if (re-search-forward + (concat "^----* ?[^-]+ ?----*\n" + "[^\n]+: \\([^\n]+\\)\n" + "[^\n]+: [^\n]+\n" + "[^\n]+: [^\n]+\n" + "[^\n]+: [^\n]+$") + nil t) + (progn + (replace-match "\\1 wrote:") + (match-beginning 0))))))) + +;; On Wed, 16 Jan 2002 23:23:30 +0100, John Doe wrote: + +(defun gnus-outlook-repair-attribution-other () + "Repair a broken attribution line (other user agents than Outlook)." + (save-excursion + (let ((case-fold-search nil) + (inhibit-read-only t) + (cite-marks gnus-outlook-deuglify-cite-marks)) + (gnus-with-article-buffer + (article-goto-body) + (if (re-search-forward + (concat "^\\("gnus-outlook-deuglify-attrib-cut-regexp"\\)?" + "\\([^" cite-marks "].+\\)\n\\([^\n" cite-marks "].*\\)?" + "\\(" gnus-outlook-deuglify-attrib-verb-regexp "\\).*" + "\\(" gnus-outlook-deuglify-attrib-end-regexp "\\)$") + nil t) + (progn + (replace-match "\\4 \\5\\6\\7") + (match-beginning 0))))))) + +;;;###autoload +(defun gnus-outlook-repair-attribution () + "Repair a broken attribution line." + (interactive) + (or + (gnus-outlook-repair-attribution-other) + (gnus-outlook-repair-attribution-block) + (gnus-outlook-repair-attribution-outlook))) + +(defun gnus-outlook-rearrange-citation () + "Repair broken citations." 
+ (let ((attrib-start (gnus-outlook-repair-attribution))) + ;; rearrange citations if an attribution line has been recognized + (if attrib-start + (gnus-outlook-rearrange-article attrib-start)))) + +;;;###autoload +(defun gnus-outlook-deuglify-article () + "Deuglify broken Outlook (Express) articles." + (interactive) + ;; apply treatment of dumb quotes + (gnus-article-treat-dumbquotes) + ;; repair wrapped cited lines + (gnus-outlook-unwrap-lines) + ;; repair attribution line + (gnus-outlook-rearrange-citation)) + +;;;###autoload +(defun gnus-article-outlook-deuglify-article () + "Deuglify broken Outlook (Express) articles and redisplay." + (interactive) + (gnus-outlook-deuglify-article) + (with-current-buffer (or gnus-article-buffer (current-buffer)) + (gnus-article-prepare-display))) + +(provide 'deuglify) + +;; Local Variables: +;; coding: iso-8859-1 +;; End: + +;;; deuglify.el ends here diff --git a/lisp/gnus-sum.el b/lisp/gnus-sum.el index ce1fd07..9ebc676 100644 --- a/lisp/gnus-sum.el +++ b/lisp/gnus-sum.el @@ -55,6 +55,9 @@ (autoload 'gnus-mailing-list-insinuate "gnus-ml" nil t) (autoload 'turn-on-gnus-mailing-list-mode "gnus-ml" nil t) (autoload 'mm-uu-dissect "mm-uu") +(autoload 'gnus-article-outlook-deuglify-article "deuglify" + "Deuglify broken Outlook (Express) articles and redisplay." + t) (defcustom gnus-kill-summary-on-exit t "*If non-nil, kill the summary buffer when you exit from it. @@ -1768,7 +1771,8 @@ increase the score of each group you read." "m" gnus-summary-toggle-mime "a" gnus-article-strip-headers-in-body ;; mnemonic: wash archive "p" gnus-article-verify-x-pgp-sig - "d" gnus-article-treat-dumbquotes) + "d" gnus-article-treat-dumbquotes + "k" gnus-article-outlook-deuglify-article) (gnus-define-keys (gnus-summary-wash-hide-map "W" gnus-summary-wash-map) "a" gnus-article-hide @@ -2017,7 +2021,9 @@ increase the score of each group you read." 
["Html" gnus-article-wash-html t] ["URLs" gnus-article-unsplit-urls t] ["Verify X-PGP-Sig" gnus-article-verify-x-pgp-sig t] - ["HZ" gnus-article-decode-HZ t]) + ["HZ" gnus-article-decode-HZ t] + ["OutlooK deuglify" gnus-article-outlook-deuglify-article t] + ) ("Output" ["Save in default format" gnus-summary-save-article ,@(if (featurep 'xemacs) '(t) diff --git a/lisp/html2text.el b/lisp/html2text.el new file mode 100644 index 0000000..22ae79b --- /dev/null +++ b/lisp/html2text.el @@ -0,0 +1,568 @@ +;;; html2text.el --- a simple html to plain text converter + +;; Copyright (C) 2002 Free Software Foundation, Inc. + +;; Author: Joakim Hove + +;; This file is part of GNU Emacs. + +;; GNU Emacs is free software; you can redistribute it and/or modify +;; it under the terms of the GNU General Public License as published by +;; the Free Software Foundation; either version 2, or (at your option) +;; any later version. + +;; GNU Emacs is distributed in the hope that it will be useful, +;; but WITHOUT ANY WARRANTY; without even the implied warranty of +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +;; GNU General Public License for more details. + +;; You should have received a copy of the GNU General Public License +;; along with GNU Emacs; see the file COPYING. If not, write to the +;; Free Software Foundation, Inc., 59 Temple Place - Suite 330, +;; Boston, MA 02111-1307, USA. + +;;; Commentary: + +;; These functions provide a simple way to wash/clean html infected +;; mails. Definitely do not work in all cases, but some improvement +;; in readability is generally obtained. Formatting is only done in +;; the buffer, so the next time you enter the article it will be +;; "re-htmlized". +;; +;; The main function is "html2text" + +;;; Code: + +;; +;; +;; + +(eval-when-compile + (require 'cl)) + +(defvar html2text-format-single-element-list '(("hr" . html2text-clean-hr))) + +(defvar html2text-replace-list + '(("&nbsp;" . " ") ("&gt;" . ">") ("&lt;" . "<") ("&quot;" . 
"\"")) + "The map of entity to text. + +This is an alist were each element is a dotted pair consisting of an +old string, and a replacement string. This replacement is done by the +function \"html2text-substitute\" which basically performs a +replace-string operation for every element in the list. This is +completely verbatim - without any use of REGEXP.") + +(defvar html2text-remove-tag-list + '("html" "body" "p" "img" "dir" "head" "div" "br" "font" "title" "meta") + "A list of removable tags. + +This is a list of tags which should be removed, without any +formatting. Observe that if you the tags in the list are presented +*without* any \"<\" or \">\". All occurences of a tag appearing in +this list are removed, irrespective of whether it is a closing or +opening tag, or if the tag has additional attributes. The actual +deletion is done by the function \"html2text-remove-tags\". + +For instance the text: + +\"Here comes something big .\" + +will be reduced to: + +\"Here comes something big.\" + +If this list contains the element \"font\".") + +(defvar html2text-format-tag-list + '(("b" . html2text-clean-bold) + ("u" . html2text-clean-underline) + ("i" . html2text-clean-italic) + ("blockquote" . html2text-clean-blockquote) + ("a" . html2text-clean-anchor) + ("ul" . html2text-clean-ul) + ("ol" . html2text-clean-ol) + ("dl" . html2text-clean-dl) + ("center" . html2text-clean-center)) + "An alist of tags and processing functions. + +This is an alist where each dotted pair consists of a tag, and then +the name of a function to be called when this tag is found. The +function is called with the arguments p1, p2, p3 and p4. These are +demontrated below: + +\" This is bold text \" + ^ ^ ^ ^ + | | | | +p1 p2 p3 p4 + +Then the called function will typically format the text somewhat and +remove the tags.") + +(defvar html2text-remove-tag-list2 '("li" "dt" "dd" "meta") + "Another list of removable tags. 
+ +This is a list of tags which are removed similarly to the list +`html2text-remove-tag-list' - but these tags are retained for the +formatting, and then moved afterward.") + +;; +;; +;; + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +;; +;; +;; + +(defun html2text-buffer-head () + (if (string= mode-name "Article") + (beginning-of-buffer) + (beginning-of-buffer) + ) + ) + +(defun html2text-replace-string (from-string to-string p1 p2) + (goto-char p1) + (let ((delta (- (string-width to-string) (string-width from-string))) + (change 0)) + (while (search-forward from-string p2 t) + (replace-match to-string) + (setq change (+ change delta)) + ) + change + ) + ) + +;; +;; +;; + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +;; +;; i.e. +;; + +(defun html2text-attr-value (attr-list attr) + (nth 1 (assoc attr attr-list)) + ) + +(defun html2text-get-attr (p1 p2 tag) + (goto-char p1) + (re-search-forward " +[^ ]" p2 t) + (let* ((attr-string (buffer-substring-no-properties (1- (point)) (1- p2))) + (tmp-list (split-string attr-string)) + (attr-list) + (counter 0) + (prev (car tmp-list)) + (this (nth 1 tmp-list)) + (next (nth 2 tmp-list)) + (index 1)) + + (cond + ;; size=3 + ((string-match "[^ ]=[^ ]" prev) + (let ((attr (nth 0 (split-string prev "="))) + (value (nth 1 (split-string prev "=")))) + (setq attr-list (cons (list attr value) attr-list)) + ) + ) + ;; size= 3 + ((string-match "[^ ]=\\'" prev) + (setq attr-list (cons (list (substring prev 0 -1) this) attr-list)) + ) + ) + + (while (< index (length tmp-list)) + (cond + ;; size=3 + ((string-match "[^ ]=[^ ]" this) + (let ((attr (nth 0 (split-string this "="))) + (value (nth 1 (split-string this "=")))) + (setq attr-list (cons (list attr value) attr-list)) + ) + ) + ;; size =3 + ((string-match "\\`=[^ ]" this) + (setq attr-list (cons (list 
prev (substring this 1)) attr-list))) + + ;; size= 3 + ((string-match "[^ ]=\\'" this) + (setq attr-list (cons (list (substring this 0 -1) next) attr-list)) + ) + + ;; size = 3 + ((string= "=" this) + (setq attr-list (cons (list prev next) attr-list)) + ) + ) + (setq index (1+ index)) + (setq prev this) + (setq this next) + (setq next (nth (1+ index) tmp-list)) + ) + + ;; + ;; Tags with no accompanying "=" i.e. value=nil + ;; + (setq prev (car tmp-list)) + (setq this (nth 1 tmp-list)) + (setq next (nth 2 tmp-list)) + (setq index 1) + + (if (not (string-match "=" prev)) + (progn + (if (not (string= (substring this 0 1) "=")) + (setq attr-list (cons (list prev nil) attr-list)) + ) + ) + ) + + (while (< index (1- (length tmp-list))) + (if (not (string-match "=" this)) + (if (not (or (string= (substring next 0 1) "=") + (string= (substring prev -1) "="))) + (setq attr-list (cons (list this nil) attr-list)) + ) + ) + (setq index (1+ index)) + (setq prev this) + (setq this next) + (setq next (nth (1+ index) tmp-list)) + ) + + (if this + (progn + (if (not (string-match "=" this)) + (progn + (if (not (string= (substring prev -1) "=")) + (setq attr-list (cons (list this nil) attr-list)) + ) + ) + ) + ) + ) + attr-list ;; return - value + ) + ) + +;; +;; +;; + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +;; +;; +;; +(defun html2text-clean-list-items (p1 p2 list-type) + (goto-char p1) + (let ((item-nr 0) + (items 0)) + (while (re-search-forward "
<li>" p2 t) + (setq items (1+ items))) + (goto-char p1) + (while (< item-nr items) + (setq item-nr (1+ item-nr)) + (re-search-forward "<li>" (point-max) t) + (cond + ((string= list-type "ul") (insert " o ")) + ((string= list-type "ol") (insert (format " %s: " item-nr))) + (t (insert " x "))) + ) + ) + ) + +(defun html2text-clean-dtdd (p1 p2) + (goto-char p1) + (let ((items 0) + (item-nr 0)) + (while (re-search-forward "<dt>" p2 t) + (setq items (1+ items))) + (goto-char p1) + (while (< item-nr items) + (setq item-nr (1+ item-nr)) + (re-search-forward "<dt>\\([ ]*\\)" (point-max) t) + (if (match-string 1) + (kill-region (point) (- (point) (string-width (match-string 1)))) + ) + (let ((def-p1 (point)) + (def-p2 0)) + (re-search-forward "\\([ ]*\\)\\(
</dt>\\|<dd>
    \\)" (point-max) t) + (if (match-string 1) + (progn + (let* ((mw1 (string-width (match-string 1))) + (mw2 (string-width (match-string 2))) + (mw (+ mw1 mw2))) + (goto-char (- (point) mw)) + (kill-region (point) (+ (point) mw1)) + (setq def-p2 (point)) + ) + ) + (setq def-p2 (- (point) (string-width (match-string 2))))) + (put-text-property def-p1 def-p2 'face 'bold) + ) + ) + ) + ) + +(defun html2text-delete-tags (p1 p2 p3 p4) + (kill-region p1 p2) + (kill-region (- p3 (- p2 p1)) (- p4 (- p2 p1))) + ) + +(defun html2text-delete-single-tag (p1 p2) + (kill-region p1 p2) + ) + +(defun html2text-clean-hr (p1 p2) + (html2text-delete-single-tag p1 p2) + (goto-char p1) + (newline 1) + (insert (make-string fill-column ?-)) + ) + +(defun html2text-clean-ul (p1 p2 p3 p4) + (html2text-delete-tags p1 p2 p3 p4) + (html2text-clean-list-items p1 (- p3 (- p1 p2)) "ul") + ) + +(defun html2text-clean-ol (p1 p2 p3 p4) + (html2text-delete-tags p1 p2 p3 p4) + (html2text-clean-list-items p1 (- p3 (- p1 p2)) "ol") + ) + +(defun html2text-clean-dl (p1 p2 p3 p4) + (html2text-delete-tags p1 p2 p3 p4) + (html2text-clean-dtdd p1 (- p3 (- p1 p2))) + ) + +(defun html2text-clean-center (p1 p2 p3 p4) + (html2text-delete-tags p1 p2 p3 p4) + (center-region p1 (- p3 (- p2 p1))) + ) + +(defun html2text-clean-bold (p1 p2 p3 p4) + (put-text-property p2 p3 'face 'bold) + (html2text-delete-tags p1 p2 p3 p4) + ) + +(defun html2text-clean-title (p1 p2 p3 p4) + (put-text-property p2 p3 'face 'bold) + (html2text-delete-tags p1 p2 p3 p4) + ) + +(defun html2text-clean-underline (p1 p2 p3 p4) + (put-text-property p2 p3 'face 'underline) + (html2text-delete-tags p1 p2 p3 p4) + ) + +(defun html2text-clean-italic (p1 p2 p3 p4) + (put-text-property p2 p3 'face 'italic) + (html2text-delete-tags p1 p2 p3 p4) + ) + +(defun html2text-clean-font (p1 p2 p3 p4) + (html2text-delete-tags p1 p2 p3 p4) + ) + +(defun html2text-clean-blockquote (p1 p2 p3 p4) + (html2text-delete-tags p1 p2 p3 p4) + ) + +(defun 
html2text-clean-anchor (p1 p2 p3 p4) + ;; If someone can explain how to make the URL clickable I will + ;; surely improve upon this. + (let* ((attr-list (html2text-get-attr p1 p2 "a")) + (href (html2text-attr-value attr-list "href"))) + (kill-region p1 p4) + (when href + (goto-char p1) + (insert (substring href 1 -1 )) + (put-text-property p1 (point) 'face 'bold)))) + +;; +;; +;; + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +;; +;; +;; + +(defun html2text-fix-paragraph (p1 p2) + (goto-char p1) + (let ((has-br-line) + (refill-start) + (refill-stop)) + (if (re-search-forward "
<br>$" p2 t) + (setq has-br-line t) + ) + (if has-br-line + (progn + (goto-char p1) + (if (re-search-forward ".+[^<][^b][^r][^>]$" p2 t) + (progn + (beginning-of-line) + (setq refill-start (point)) + (goto-char p2) + (re-search-backward ".+[^<][^b][^r][^>]$" refill-start t) + (next-line 1) + (end-of-line) + ;; refill-stop should ideally be adjusted to + ;; accomodate the "<br>" strings which are removed + ;; between refill-start and refill-stop. Can simply + ;; be returned from my-replace-string + (setq refill-stop (+ (point) + (html2text-replace-string + "<br>" "" + refill-start (point)))) + ;; (message "Point = %s refill-stop = %s" (point) refill-stop) + ;; (sleep-for 4) + (fill-region refill-start refill-stop) + ) + ) + ) + ) + ) + (html2text-replace-string "<br>" "" p1 p2) + ) + +;; +;; This one is interactive ... +;; +(defun html2text-fix-paragraphs () + "This _tries_ to fix up the paragraphs - this is done in quite a ad-hook +fashion, quite close to pure guess-work. It does work in some cases though." + (interactive) + (html2text-buffer-head) + (replace-regexp "^<br>$" "") + ;; Removing lonely <br> on a single line, if they are left intact we + ;; dont have any paragraphs at all. + (html2text-buffer-head) + (while (< (point) (point-max)) + (let ((p1 (point))) + (forward-paragraph 1) + ;;(message "Kaller fix med p1=%s p2=%s " p1 (1- (point))) (sleep-for 5) + (html2text-fix-paragraph p1 (1- (point))) + (goto-char p1) + (if (< (point) (point-max)) + (forward-paragraph 1)) + ) + ) + ) + +;; +;;
    +;; + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +;; +;; +;; + +(defun html2text-remove-tags (tag-list) + "Removes the tags listed in the list \"html2text-remove-tag-list\". +See the documentation for that variable." + (interactive) + (dolist (tag tag-list) + (html2text-buffer-head) + (while (re-search-forward (format "\\(]*>\\)" tag) (point-max) t) + (let ((p1 (point))) + (search-backward "<") + (kill-region (point) p1) + ) + ) + ) + ) + +(defun html2text-format-tags () + "See the variable \"html2text-format-tag-list\" for documentation" + (interactive) + (dolist (tag-and-function html2text-format-tag-list) + (let ((tag (car tag-and-function)) + (function (cdr tag-and-function))) + (html2text-buffer-head) + (while (re-search-forward (format "\\(<%s\\( [^>]*\\)?>\\)" tag) + (point-max) t) + (let ((p1) + (p2 (point)) + (p3) (p4) + (attr (match-string 1))) + (search-backward "<" (point-min) t) + (setq p1 (point)) + (re-search-forward (format "" tag) (point-max) t) + (setq p4 (point)) + (search-backward "]*\\)?>\\)" tag) + (point-max) t) + (let ((p1) + (p2 (point))) + (search-backward "<" (point-min) t) + (setq p1 (point)) + (funcall function p1 p2) + ) + ) + ) + ) + ) + +;; +;; Main function +;; + +;;;###autoload +(defun html2text () + "Convert HTML to plain text in the current buffer." 
+  (interactive)
+  (save-excursion
+    (let ((case-fold-search t)
+          (buffer-read-only))
+      (html2text-remove-tags html2text-remove-tag-list)
+      (html2text-format-tags)
+      (html2text-remove-tags html2text-remove-tag-list2)
+      (html2text-substitute)
+      (html2text-format-single-elements)
+      (html2text-fix-paragraphs))))
+
+;;
+;;
+;;
+
+;;; html2text.el ends here
diff --git a/lisp/mm-decode.el b/lisp/mm-decode.el
index c7f7bc9..66f203c 100644
--- a/lisp/mm-decode.el
+++ b/lisp/mm-decode.el
@@ -100,7 +100,8 @@
   (cond ((locate-library "w3") 'w3)
         ((locate-library "w3m") 'w3m)
         ((executable-find "links") 'links)
-        ((executable-find "lynx") 'lynx))
+        ((executable-find "lynx") 'lynx)
+        (t 'html2text))
   "Render of HTML contents.
 It is one of defined renderer types, or a rendering function.
 The defined renderer types are:
@@ -108,11 +109,13 @@ The defined renderer types are:
 `w3m' : using emacs-w3m;
 `links': using links;
 `lynx' : using lynx;
+`html2text' : using html2text;
 `nil' : using external viewer."
   :type '(choice (symbol w3)
                  (symbol w3m)
                  (symbol links)
                  (symbol lynx)
+                 (symbol html2text)
                  (symbol nil)
                  (function))
   :version "21.3"
@@ -683,8 +686,10 @@ external if displayed external."
           (make-directory dir)
           (set-file-modes dir 448)
           (if filename
-              (setq file (expand-file-name (file-name-nondirectory filename)
-                                           dir))
+              (setq file (expand-file-name
+                          (gnus-map-function mm-file-name-rewrite-functions
+                                             (file-name-nondirectory filename))
+                          dir))
             (setq file (make-temp-name (expand-file-name "mm." dir))))
           (let ((coding-system-for-write mm-binary-coding-system))
             (write-region (point-min) (point-max) file nil 'nomesg))
diff --git a/lisp/mm-view.el b/lisp/mm-view.el
index e35cf29..f021ec6 100644
--- a/lisp/mm-view.el
+++ b/lisp/mm-view.el
@@ -34,6 +34,7 @@
   (autoload 'vcard-parse-string "vcard")
   (autoload 'vcard-format-string "vcard")
   (autoload 'fill-flowed "flow-fill")
+  (autoload 'html2text "html2text")
   (unless (fboundp 'diff-mode)
     (autoload 'diff-mode "diff-mode" "" t nil)))
 
@@ -44,7 +45,8 @@
            mm-links-remove-leading-blank
            "links" "-dump" file)
     (lynx mm-inline-render-with-stdin nil
-          "lynx" "-dump" "-force_html" "-stdin"))
+          "lynx" "-dump" "-force_html" "-stdin")
+    (html2text mm-inline-render-with-function html2text))
   "The attributes of renderer types for text/html.")
 
 (defvar mm-text-html-washer-alist
@@ -54,7 +56,8 @@
            mm-links-remove-leading-blank
            "links" "-dump" file)
     (lynx mm-inline-wash-with-stdin nil
-          "lynx" "-dump" "-force_html" "-stdin"))
+          "lynx" "-dump" "-force_html" "-stdin")
+    (html2text html2text))
   "The attributes of washer types for text/html.")
 
 ;;; Internal variables.
diff --git a/lisp/mml.el b/lisp/mml.el
index 3e40080..91a0343 100644
--- a/lisp/mml.el
+++ b/lisp/mml.el
@@ -433,7 +433,8 @@ If MML is non-nil, return the buffer up till the correspondent mml tag."
             (let (use-hard-newlines)
               (when (and (string= type "text/plain")
                          (or (null (assq 'format cont))
-                             (string= (assq 'format cont) "flowed"))
+                             (string= (cdr (assq 'format cont))
+                                      "flowed"))
                 (setq use-hard-newlines
                       (text-property-any
                        (point-min) (point-max) 'hard 't)))
diff --git a/lisp/nnmaildir.el b/lisp/nnmaildir.el
index 25b15a8..094704a 100644
--- a/lisp/nnmaildir.el
+++ b/lisp/nnmaildir.el
@@ -804,7 +804,10 @@ by nnmaildir-request-article.")
                                               'read-only)
                 ct-min (nnmaildir--article-count group))
           (insert (nnmaildir--grp-get-name group) " ")
-          (princ (car ct-min) nntp-server-buffer)
+          (princ (nnmaildir--nlist-last-num
+                  (nnmaildir--lists-get-nlist
+                   (nnmaildir--grp-get-lists group)))
+                 nntp-server-buffer)
           (insert " ")
           (princ (cdr ct-min) nntp-server-buffer)
           (insert " " (if ro "n" "y") "\n"))
@@ -832,8 +835,8 @@ by nnmaildir-request-article.")
               (princ (cdr ct-min) nntp-server-buffer)
               (insert " ")
               (princ (nnmaildir--nlist-last-num
-                      (nnmaildir--lists-get-nlist
-                       (nnmaildir--grp-get-lists group)))
+                      (nnmaildir--lists-get-nlist
+                       (nnmaildir--grp-get-lists group)))
                      nntp-server-buffer)
               (insert " " gname "\n")))))
     'group)
diff --git a/texi/ChangeLog b/texi/ChangeLog
index 50a45cd..27001f3 100644
--- a/texi/ChangeLog
+++ b/texi/ChangeLog
@@ -1,3 +1,12 @@
+2002-02-23 ShengHuo ZHU
+
+	* gnusref.tex (subsection*{Notes}): Addition.
+	Suggested by Felix Natter
+
+2002-02-22 ShengHuo ZHU
+
+	* gnus.texi (Splitting Mail): Addition.
+
 2002-02-20 ShengHuo ZHU
 
 	* gnus.texi (Slave Gnusae): Addition.
diff --git a/texi/gnus-ja.texi b/texi/gnus-ja.texi
index f0a1931..166d624 100644
--- a/texi/gnus-ja.texi
+++ b/texi/gnus-ja.texi
@@ -11742,6 +11742,10 @@ nnmail-split-history} $B$r;H$&;v$,$G$-$^$9!#$3$l$+$i%9%W!<%k$7D>$=$&$H$9$k(B
 @code{gnus-summary-respool-trace} $B$H4XO"$9$kL?(B
 $BNa(B (@pxref{Mail Group Commands}) $B$r;H$&;v$,$G$-$^$9!#(B
 
+@vindex nnmail-split-header-length-limit
+@code{nnmail-split-header-length-limit} $B$N@)8B$h$jD9$$%X%C%@!<9T$O!"J,3d(B
+$B4X?t$N=hM}BP>]$+$i=|30$5$l$^$9!#(B
+
 Gnus $B$O$"$J$?$K<+J,<+?H$KD7$MJV$C$F$/$k$h$&$J2DG=@-$N$"$kA4$F$N5!2q$rDs(B
 $B6!$7$^$9!#Nc$($P!"$"$J$?$N>e;J$+$i$/$kA4$F$N%a!<%k$r4^$s$@%0%k!<%W$r:n$C(B
 $B$?$H$7$^$7$g$&!#$=$l$+$i!"6vH/E*$K$=$N%0%k!<%W$N9XFI