* Silex / LALR                                                  -*- org -*-

Silex is a direct translation of the lex interface.  It is barely
integrated into Scheme.  You must lex a file.l into a file.scm and
source that file; then you run lexer-init (which accepts a string or
port) to initialize the input.  It returns one token every time you
call the lexer.

** Annoying things about Silex

- The lex grammar must reside in a separate file, which makes
  recompilation annoying.
- The lex grammar follows the lex specification exactly; Scheme may
  not be used to build up regular expressions, for example.
- It is flexible, so actions can return anything.  But because lalr
  expects a token pair, this leads to lots of redundant actions,
  e.g. (cons '+ '+), (cons '- '-), ...
- It is impossible to relex a file.l after you call lexer-init; this
  may be a bug.
- No predefined macros (regexps) such as whitespace or digit.
- Only the metacharacter '\n' is understood, not \t; you must use \9
  for tab.  In the same vein, you get the limited "silex" regular
  expressions rather than whatever the system provides.

** Annoying things about lalr

- Productions return #f by default, not $1.  Simple syntax like
    a: b | c ;   =>   (a (b) (c))
  is therefore not possible; you must return $1 explicitly:
    =>   (a (b) : $1 (c) : $1)
- Scope is not obeyed, as it is in PLT (and Bigloo?).  Because I have
  used neither and do not completely follow the explanation in "Lexer
  and Parser Generators in Scheme.pdf", I don't know how limiting
  this is.
- Does not allow string literals (or regexps, of course) in the
  grammar.  Any parser derived from YACC will do the same, as YACC is
  purely token-based.

* SLLGen

Parser used in EoPL and in numerous university courses.  Ample
examples are available (somewhere other than EoPL, though that is
authoritative).

Scans (lexes) and parses an LL(1) grammar.  Uses s-exprs, although it
does not provide a syntactic form (such as macros) for integrating
them into Scheme.  Allows literals in the grammar itself.

Studying SLLGen examples, as I find them, should explain how to
properly use a lexer in Scheme and what to look for / look down upon.
Search Google for "sllgen".  For example:

- http://www.mathcs.carleton.edu/faculty/rkirchne/cs217/interp3-1.scm
- http://courses.cs.byu.edu/cs330/cs330/lectures/ppt/Lecture16/sld001.htm
- http://www.cs.rit.edu/~afb/20013/plc/hmwk7/
  [sllgen source -- from 1999; its define-datatype has an error: an
  extra ellipsis in a syntax form]
- http://www.cs.xu.edu/csci300/00f/Programs/interpreter-1.scm.txt
- http://tao.uab.es/cgi-bin/archzoom.cgi/jao@gnu.org--2004/libs--eopl--1.0--patch-1/define-datatype.scm?download
  [newer version of define-datatype -- 2000]

Some lecture notes are at ~/scheme/doc/SLLGEN Lecture Notes.ps
[originally from
http://www.cs.ucla.edu/~palsberg/course/purdue/cs565/S97/lectures/Print-systemnotes1.ps].

Tested sllgen.scm with test-sllgen2.scm (included as text in
sllgen.scm).  Works fine.  But I didn't have define-datatype loaded,
so I am not exercising the parser...

"Actions" for the lex grammar are:
  ;;; action-opcode ::= skip | make-symbol | make-number | make-string
Some examples use "symbol", "number" and "string" without the make-
prefix, but the sllgen.scm code I have uses the make- prefix.

The main point of SLLGEN seems to be to generate a syntax tree which
can be evaluated by an interpreter you write.  The "actions" in the
grammar are not evaluated, but are instead "datatypes" [1] which are
joined together to form a parse tree.  You pass this tree to an
EVAL-EXP function of your own making, along the lines of the eval in
SICP 4.1, roughly as sketched below.
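As a sanity check on the shape such an eval-exp would take, here is a
minimal sketch.  It assumes the EoPL define-datatype / cases / list-of
forms mentioned above; the datatype, its variants, and the helpers
(apply-env, apply-primitive) are names I made up for illustration, not
anything defined by SLLGEN.

#+begin_src scheme
;; Minimal sketch only: datatype, variant names, and helpers are
;; hypothetical.  define-datatype, cases and list-of come from the
;; EoPL define-datatype.scm mentioned above.
(define-datatype expression expression?
  (lit-exp     (datum number?))
  (var-exp     (id symbol?))
  (primapp-exp (prim symbol?) (rands (list-of expression?))))

;; Toy environment and primitive table, just enough to run the sketch.
(define (apply-env env id)
  (cond ((assq id env) => cdr)
        (else (error "unbound variable" id))))

(define (apply-primitive prim args)
  (case prim
    ((+) (apply + args))
    ((*) (apply * args))
    (else (error "unknown primitive" prim))))

;; The interpreter proper: walk the parse tree, one case per variant.
(define (eval-exp exp env)
  (cases expression exp
    (lit-exp (datum) datum)
    (var-exp (id) (apply-env env id))
    (primapp-exp (prim rands)
      (apply-primitive prim
                       (map (lambda (r) (eval-exp r env)) rands)))))

;; (eval-exp (primapp-exp '+ (list (lit-exp 1) (var-exp 'x)))
;;           '((x . 41)))
;; => 42
#+end_src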
(See
http://groups-beta.google.com/group/comp.lang.scheme/msg/6bb3b81e7d572391?dmode=print&hl=en
for an example of eval-exp.)

There are a couple of bad things about this:

- I imagine a significant amount of memory will be used, because the
  entire parse tree is kept in memory.  (Perhaps we could avoid
  emitting a parse tree.)
- SLLGEN seems to be a teaching tool rather than production code.

It does appear you can parse streams with sllgen:make-stream-parser
(though that is intended for REPL loops).  See the code at the end of
http://www.math.grin.edu/~stone/courses/languages/Lambda-using-SLLGEN.ss

Overall, I find many, many references to using SLLGEN to implement an
interpreter (transforming input into a parse tree) and no references
to executing code on the fly [even if that is possible].  I conclude
SLLGEN should not be investigated further at this time, as it will
not be practical for non-interpreter use.  It may be reopened,
though, if I need to convert arbitrary input into s-expr format.

[1] The exact purpose and use of the "datatype" records is still
unknown.

* Bigloo

Doc at
http://www-sop.inria.fr/mimosa/fp/Bigloo/doc/bigloo-11.html#Lalr(1)-parsing

Provides lalr-grammar, regular-grammar, and read/lalrp.  Lexing and
parsing are integrated into the language, but seem very tied to
Bigloo.  Extracting the lalr part might be possible, but the
regular-grammar machinery is very tightly integrated with Bigloo
regular expressions.

- regular-grammar code appears to be in runtime/Rgc/*
- lalr-grammar code appears to be in runtime/Lalr/*
- runtime/Ieee/input.scm contains read/lalrp, which is a wrapper.
- runtime/Eval/expanders.scm contains
  (install-expander 'lalr-grammar expand-lalr-grammar), which -may-
  be like installing a macro.

Finally, I noticed the use of multiple-value-bind, and there are
probably other non-standard functions in use.

* PLT

Referenced in "Lexer and Parser Generators in Scheme.pdf" along with
GT (cannot find source for GT).

Location: plt/collects/parser-tools (lex.ss and yacc.ss)
http://download.plt-scheme.org/scheme/plt/collects/parser-tools/

Difficulty of separating: contains PLT-specific stuff, not only the
module system but also syntax stuff I don't recognize, such as
define-for-syntax.

** Examples

A simple BIND (zone) file example is in
~/scheme/doc/plt-lexer-example.scm, pulled off some guy's blog.
[Note: another BIND parsing example is in "Lex and YACC HOWTO.pdf",
section 3.2.]

** Scott's parser tools

There are a couple of references to "Scott's parser tools"; these
refer to the lexer/parser in PLT, whose author is Scott Owens.

* Parsing of input @ http://okmij.org/ftp/Scheme/parsing.html

Some lower-level parsing functions.  Probably useful for theoretical
background.

"The following simple functions surprisingly often suffice when
parsing an input stream.  They either skip, or build and return
tokens according to the inclusion or delimiter semantics."

* CPS Recursive Ascent Parsing

http://mysite.verizon.net/vze4rvw7/RecurAscentPaper.pdf
[saved locally as CPS Recursive Ascent Parsing.pdf]

Gives some examples of recursive descent and top-down parsers and
works up to a hand-written recursive ASCENT parser.  Interesting (I
basically know how to write a recursive descent parser now; see the
sketch below), but the CPS style requires as much work as manually
writing a recursive descent parser, and is more confusing.
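For my own reference, a minimal recursive-descent sketch over an
already-tokenized list, one procedure per nonterminal.  The toy
grammar and the token representation (numbers plus the symbols + and
*) are my own, purely for illustration.

#+begin_src scheme
;; Toy grammar:
;;   expr   -> term   ( '+' term )*
;;   term   -> factor ( '*' factor )*
;;   factor -> number
;; Each parse procedure takes a token list and returns
;; (cons parse-tree remaining-tokens).
(define (parse-factor toks)
  (if (and (pair? toks) (number? (car toks)))
      (cons (car toks) (cdr toks))
      (error "number expected" toks)))

;; Left-associative binary operator level, shared by term and expr.
(define (binary-level parse-sub op)
  (lambda (toks)
    (let loop ((r (parse-sub toks)))
      (if (and (pair? (cdr r)) (eq? (cadr r) op))
          (let ((r2 (parse-sub (cddr r))))
            (loop (cons (list op (car r) (car r2)) (cdr r2))))
          r))))

(define parse-term (binary-level parse-factor '*))
(define parse-expr (binary-level parse-term  '+))

;; (parse-expr '(1 + 2 * 3))  =>  ((+ 1 (* 2 3)))   ; tree, no tokens left
#+end_src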
* Essence

http://www.informatik.uni-freiburg.de/proglang/software/essence/doc/html/essence.html

"Essence is a generator for LR(k) and SLR(k) parsers in Scheme."

Essence appears to take tokenized input and thus requires a lexer.
It should be Class III (s-exprs, integrated into Scheme).  No study
has been made of Essence.

* Taurus

Recursive descent parser from 1992.  No research yet.

Taurus, Zebu, etc. are discussed at
http://www-2.cs.cmu.edu/afs/cs/project/ai-repository/ai/lang/scheme/code/parsing/0.html.
All the parsers on that page are very old.

* Parse::RecDescent

The Apache config example at
http://groups-beta.google.com/group/comp.lang.perl.misc/msg/52d131d22f07f35b?dmode=print&hl=en
uses two interesting tricks: it autovivifies the %return hash with
the field name (key) and value, and apparently 'field(s)' gathers the
output of its subrules into a list automatically (whereas I manually
use CONS for the devices in the ioscan parser).

-> This is explicitly documented in the manual: "Since a repeated
subrule may match many instances of the subrule itself, the value
associated with it is not a simple scalar, but rather a reference to
a list of scalars, each of which is the value associated with one of
the individual subrule matches."

* Packrat

Packrat parsing is a memoizing, backtracking recursive-descent
parsing technique that runs in time and space linear in the size of
the input text (the memoization idea is sketched at the end of this
section).

- [documentation]
- [theoretical background papers]
- packrat egg
- json egg [uses packrat to implement a parser]
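To keep the core idea on record, a toy sketch of the memoization step
only.  This is not the packrat egg's API (which I have not looked
at); it just shows (rule . position) results being cached so
backtracking never re-parses the same input twice.  SRFI-69 hash
tables are assumed; the rule representation is my own.

#+begin_src scheme
;; Toy sketch, NOT the packrat egg's API.  A "rule" is a procedure
;; taking (input-string position) and returning (cons value next-pos)
;; on success or #f on failure.  Assumes SRFI-69 hash tables.
(define cache (make-hash-table))        ; keys compared with equal?

;; Wrap a rule so each (name . position) is computed at most once.
(define (memo-rule name rule)
  (lambda (input pos)
    (let ((key (cons name pos)))
      (if (hash-table-exists? cache key)
          (hash-table-ref cache key)
          (let ((result (rule input pos)))
            (hash-table-set! cache key result)
            result)))))

;; Ordered choice: try each alternative at the same position; this is
;; where the backtracking happens, and memoization keeps it linear.
(define (alt . rules)
  (lambda (input pos)
    (let loop ((rs rules))
      (and (pair? rs)
           (or ((car rs) input pos)
               (loop (cdr rs)))))))

;; Terminal rule: match one literal character.
(define (char-rule ch)
  (lambda (input pos)
    (and (< pos (string-length input))
         (char=? (string-ref input pos) ch)
         (cons ch (+ pos 1)))))

;; (define a-or-b (memo-rule 'a-or-b (alt (char-rule #\a) (char-rule #\b))))
;; (a-or-b "banana" 0)  =>  (#\b . 1)   ; cached for position 0 thereafter
#+end_src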