((section 2 "Outdated egg!" (p "This is an egg for CHICKEN 4, the unsupported old release.  You're almost certainly looking for " (int-link "/eggref/5/html-parser" "the CHICKEN 5 version of this egg") ", if it exists.") (p "If it does not exist, there may be equivalent functionality provided by another egg; have a look at the " (link "https://wiki.call-cc.org/chicken-projects/egg-index-5.html" "egg index") ". Otherwise, please consider porting this egg to the current version of CHICKEN.") (tags "egg")) (section 2 "html-parser" (toc) (section 3 "Description" (p "A permissive, scalable HTML parser.")) (section 3 "Author" (p (int-link "/users/alex-shinn" "Alex Shinn"))) (section 3 "Documentation" (p (tt "html-parser") " is a permissive HTML parser for people who prefer the scalable interface described in Oleg Kiselyov's SSAX parser; it also provides simple convenience utilities.  It handles all invalid HTML by inserting \"virtual\" start and end tags as needed to maintain the proper tree structure required by the foldts down/up logic.  A major goal of this parser is bug-for-bug compatibility with the way common web browsers parse HTML.") (section 4 "Main interface" (section 5 "make-html-parser" (def (sig (procedure "(make-html-parser . keys)" (id make-html-parser))) (p "Returns a procedure of two arguments, an initial seed and an optional input port, which parses the HTML document from the port with the callbacks specified in the plist " (tt "KEYS") ". The keys are written as normal, quoted symbols (such as " (tt "'start:") "), for portability and to avoid making this a macro.  
The following callbacks are recognized:") (pre " START: TAG ATTRS SEED VIRTUAL?\n     fdown in foldts, called when a start-tag is encountered.\n   TAG:         tag name\n   ATTRS:       tag attributes as an alist\n   SEED:        current seed value\n   VIRTUAL?:    #t iff this start tag was inserted to fix the HTML tree") (pre " END: TAG ATTRS PARENT-SEED SEED VIRTUAL?\n     fup in foldts, called when an end-tag is encountered.\n   TAG:         tag name\n   ATTRS:       tag attributes of the corresponding start tag\n   PARENT-SEED: parent seed value (i.e. seed passed to the start tag)\n   SEED:        current seed value\n   VIRTUAL?:    #t iff this end tag was inserted to fix the HTML tree") (pre " TEXT: TEXT SEED\n     fhere in foldts, called when any text is encountered.  May be\n     called multiple times between a start and end tag, so\n     concatenate the pieces yourself (e.g. with string-append)\n     if you need the text as a single string.\n   TEXT:        entity-decoded text\n   SEED:        current seed value") (pre " COMMENT: TEXT SEED\n     fhere on comment data") (pre " DECL: NAME ATTRS SEED\n     fhere on declaration data") (pre " PROCESS: LIST SEED\n     fhere on processing-instruction data") (p "In addition, entity mappings may be overridden with the " (tt "ENTITIES:") " keyword.")))) (section 4 "Convenience functions" (section 5 "html->sxml" (def (sig (procedure "(html->sxml [port])" (id html->sxml))) (p "Returns the SXML representation of the document from " (tt "PORT") ", using the default parsing options."))) (section 5 "html-strip" (def (sig (procedure "(html-strip [port])" (id html-strip))) (p "Returns a string representation of the document from " (tt "PORT") " with all tags removed.  No whitespace reduction or other rendering is done."))))) (section 3 "Examples" (p "This is the definition of the " (tt "html->sxml") " convenience function included in the egg:") (highlight scheme " (define html->sxml\n   (let ((parse\n          (make-html-parser\n           'start: (lambda (tag attrs seed virtual?) 
'())\n           'end:   (lambda (tag attrs parent-seed seed virtual?)\n                     `((,tag ,@(if (pair? attrs)\n                                   `((@ ,@attrs) ,@(reverse seed))\n                                   (reverse seed)))\n                       ,@parent-seed))\n           'decl:    (lambda (tag attrs seed) `((*DECL* ,tag ,@attrs) ,@seed))\n           'process: (lambda (attrs seed) `((*PI* ,@attrs) ,@seed))\n           'comment: (lambda (text seed) `((*COMMENT* ,text) ,@seed))\n           'text:    (lambda (text seed) (cons text seed))\n           )))\n     (lambda o\n       (reverse (apply parse '() o)))))") (p "The parser for " (tt "html-strip") " could be defined as follows.  It writes the text to the current output port instead of accumulating it in the seed, so returning a string, as " (tt "html-strip") " does, requires capturing that output (for example with " (tt "with-output-to-string") "):") (highlight scheme " (make-html-parser\n   'start: (lambda (tag attrs seed virtual?) seed)\n   'end:   (lambda (tag attrs parent-seed seed virtual?) seed)\n   'text:  (lambda (text seed) (display text) seed))")) (section 3 "Changelog" (ul (li "0.1 Import upstream as of 2009-01-25"))) (section 3 "License" (p "BSD-style license: " (link "http://synthcode.com/license.txt") "."))))