* Silex / LALR                                                 -*- org -*-

Silex is a direct translation of the lex interface.  It is barely
integrated into Scheme: you must lex a file.l into a file.scm and source
that file, then run lexer-init (which accepts a string or port) to
initialize the input.  It returns one token every time you call the
lexer.

** Annoying things about Silex

- The lex grammar must reside in a separate file, making recompilation
  annoying.
- The lex grammar follows the Lex specs exactly; Scheme may not be used
  to build up regular expressions, for example.
- It is flexible, so actions can return anything.  Because lalr expects
  a token pair, this leads to lots of redundant actions, e.g.
  (cons '+ '+), (cons '- '-), ...
- It is impossible to relex a file.l after you call lexer-init; this may
  be a bug.
- No predefined macros (regexps) such as whitespace or digit.
- Only understands the metachar '\n', not \t; you must use \9 for tab.
  In the same vein, you get limited "silex" regular expressions rather
  than whatever the system provides.

** Annoying things about lalr

- Productions return #f by default, not $1.  Simple syntax like
  a: b | c ;  =>  (a (b) (c))
  is therefore not possible; you must return $1 explicitly:
  =>  (a (b) : $1 (c) : $1)
- Scope is not obeyed, as it is in PLT (and Bigloo?).  Because I have
  used neither and do not completely follow the explanation in Lexer and
  Parser Generators in Scheme.pdf, I don't know how limiting this is.
- Does not allow string literals (or regexps, of course) in the grammar.
  Any parser derived from YACC will do the same, as YACC is purely
  token-based.  Bison allows string literals.
- Does not allow char literals in the grammar.  I placed an
  experimental, unapplied patch in the lalr egg repo to allow them; it
  disables the check for symbols only.  lalr-driver.scm uses eq? (memq)
  to compare tokens, and eq? is legal for char comparison on Chicken.
  For string matching, this would have to be expanded to equal?.

** Examples

I wrote a simple wiki parser for the quack-pretty-lambda.txt wiki-format
file.  It's in .
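For orientation, here is a rough sketch from memory of how the silex/lalr
pieces fit together.  The .l fragment, the yytext/yycontinue/lexer-init
names, and the lalr-parser shape are all assumptions to verify against
the real SILex and lalr docs, not working code:

```scheme
;; file.l (SILex source; note \9 for tab, as noted above):
;;
;;   %%
;;   [ \9]+   (yycontinue)                          ; skip blanks
;;   \+       (cons '+ '+)                          ; redundant pair actions
;;   [0-9]+   (cons 'NUM (string->number yytext))
;;   <<EOF>>  '*eoi*

;; After silex turns file.l into file.scm:
(load "file.scm")
(lexer-init 'string "1 + 2")     ; presumably also 'port <port>

;; lalr grammar with the explicit ": $1"-style returns described above:
(define parser
  (lalr-parser
   (NUM +)                       ; terminal tokens
   (e (e + e) : (+ $1 $3)        ; every production returns explicitly,
      (NUM)   : $1)))            ; since the default is #f, not $1
```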
The wiki parser is pretty hard to understand, but once you have
understood it, it's not too hard to add things.  However, it is very
difficult to debug: almost no error reporting or tracing is done, and
you cannot really restart after an error without exiting the Scheme
session.

There is a port of the bison.info mfcalc example in .

`chicken-doc lalr` contains a calculator example, which is nearly
identical to the bison port above.

* SLLGen

Parser used in EoPL; also in numerous university courses.  Ample
examples are available (somewhere other than EoPL, though that is
authoritative).

Scans (lexes) and parses an LL(1) grammar.  Uses s-exprs, although it
does not provide a syntactic form (such as macros) for integrating them
into Scheme.  Allows literals in the grammar itself.

Studying SLLGen examples, as I find them, should explain how to properly
use a lexer in Scheme and what to look for / look down upon.  Search
Google for "sllgen".  For example:

- http://www.mathcs.carleton.edu/faculty/rkirchne/cs217/interp3-1.scm
- http://courses.cs.byu.edu/cs330/cs330/lectures/ppt/Lecture16/sld001.htm
- http://www.cs.rit.edu/~afb/20013/plc/hmwk7/
  [sllgen source -- from 1999; its define-datatype has an error: extra
  ellipsis in syntax form]
- http://www.cs.xu.edu/csci300/00f/Programs/interpreter-1.scm.txt
- http://tao.uab.es/cgi-bin/archzoom.cgi/jao@gnu.org--2004/libs--eopl--1.0--patch-1/define-datatype.scm?download
  [newer version of define-datatype - 2000]

Some lecture notes are at ~/scheme/doc/SLLGEN Lecture Notes.ps
[originally from
http://www.cs.ucla.edu/~palsberg/course/purdue/cs565/S97/lectures/Print-systemnotes1.ps].

Tested sllgen.scm with test-sllgen2.scm (included as text in
sllgen.scm); it works fine.  But I didn't have define-datatype loaded,
so I am not exercising the parser...

"Actions" for the lex grammar are:
;;; action-opcode ::= skip | make-symbol | make-number | make-string
Some examples use "symbol", "number" and "string" without the make-
prefix, but the sllgen.scm code I have uses the make- prefix.
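To make the action opcodes concrete, here is a guessed EoPL-style spec.
The spec layout, sllgen:make-string-parser, and the production names
(lit-exp, var-exp) are from memory and should be checked against the
sllgen.scm version in hand:

```scheme
;; Hypothetical sllgen lexer spec using the make- opcodes listed above.
(define lex-spec
  '((white-sp (whitespace) skip)                      ; skip opcode
    (number   (digit (arbno digit)) make-number)
    (ident    (letter (arbno (or letter digit))) make-symbol)))

;; Grammar: each production names the datatype variant that becomes a
;; node in the parse tree (lit-exp and var-exp are illustrative names).
(define grammar
  '((expression (number) lit-exp)
    (expression (ident)  var-exp)))

(define parse-string
  (sllgen:make-string-parser lex-spec grammar))

;; (parse-string "42") should yield a lit-exp datatype value, to be
;; consumed by your own EVAL-EXP.
```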
The main point of SLLGEN seems to be to generate a syntax tree which can
be evaluated by an interpreter you write.  The "actions" in the grammar
are not evaluated; instead they are "datatypes" [1] which are joined
together to form a parse tree.  You pass this tree to an EVAL-EXP
function of your own making, along the lines of the eval in SICP 4.1.
(See
http://groups-beta.google.com/group/comp.lang.scheme/msg/6bb3b81e7d572391?dmode=print&hl=en
for an example of eval-exp.)

There are a couple of bad things about this:
- I imagine a significant amount of memory will be used, because the
  entire parse tree is kept in memory.  (Perhaps we could avoid emitting
  a parse tree.)
- SLLGEN seems to be a teaching tool rather than production code.

It does appear you can parse streams with sllgen:make-stream-parser
(though this is intended for REPL loops).  See the code at the end of
http://www.math.grin.edu/~stone/courses/languages/Lambda-using-SLLGEN.ss

Overall, I find many, many references to using SLLGEN to implement an
interpreter (transforming input into a parse tree) and no references to
executing code on the fly [even if that is possible].  I conclude SLLGEN
should not be investigated further at this time, as it will not be
practical for non-interpreter use -- though the question may be reopened
if I need to convert arbitrary input into s-expr format.

[1] The exact purpose and use of the "datatype" records is still
unknown.

* Bigloo

Doc at
http://www-sop.inria.fr/mimosa/fp/Bigloo/doc/bigloo-11.html#Lalr(1)-parsing

lalr-grammar, regular-grammar, read/lalrp

Lexing and parsing are integrated into the language, but seem very tied
to Bigloo.  The lalr part might be portable, but the regular-grammar
stuff is very tightly integrated with Bigloo regular expressions.

- regular-grammar code appears to be in runtime/Rgc/*
- lalr-grammar code appears to be in runtime/Lalr/*
- runtime/Ieee/input.scm contains read/lalrp, which is a wrapper.
- runtime/Eval/expanders.scm contains
  (install-expander 'lalr-grammar expand-lalr-grammar), which -may- be
  like installing a macro.

Finally, I noticed the use of multiple-value-bind, and there are
probably other non-standard functions in use.

* PLT

Referenced in Lexer and Parser Generators in Scheme.pdf along with GT
(cannot find source for the latter).

Location: plt/collects/parser-tools (lex.ss and yacc.ss)
http://download.plt-scheme.org/scheme/plt/collects/parser-tools/

Difficulty of separating: contains PLT-specific stuff -- not only the
module system but also syntax stuff I don't recognize, such as
define-for-syntax.

** Examples

A simple BIND (zone) file example is in
file:~/scheme/doc/plt-lexer-example.scm, pulled off some guy's blog.
[Note: another BIND parsing example is in the Lex and YACC HOWTO.pdf,
section 3.2.]

** Scott's parser tools

There are a couple of references to "Scott's parser tools"; these refer
to the lexer/parser in PLT, whose author is Scott Owens.

* Parsing of input @ http://okmij.org/ftp/Scheme/parsing.html

Some lower-level parsing functions.  Probably useful for theoretical
background.  "The following simple functions surprisingly often suffice
when parsing an input stream.  They either skip, or build and return
tokens according to the inclusion or delimiter semantics."

* CPS Recursive Ascent Parsing

http://mysite.verizon.net/vze4rvw7/RecurAscentPaper.pdf
[saved locally as CPS Recursive Ascent Parsing.pdf]

Gives some examples of recursive descent and top-down parsers and works
up to a hand-written recursive ASCENT parser.  Interesting (I basically
know how to write a recursive descent parser now), but the CPS style
requires as much work as writing a recursive descent parser by hand, and
is more confusing.

* Essence

http://www.informatik.uni-freiburg.de/proglang/software/essence/doc/html/essence.html

"Essence is a generator for LR(k) and SLR(k) parsers in Scheme."
Essence appears to take tokenized input and thus requires a lexer.
It should be Class III (s-exprs, integrated into Scheme).  No study has
been made of Essence.

* Taurus

Recursive descent parser from 1992.  No research yet.

Taurus, Zebu, etc. are discussed at
http://www-2.cs.cmu.edu/afs/cs/project/ai-repository/ai/lang/scheme/code/parsing/0.html.
All the parsers on this page are very old.

* Parse::RecDescent (Perl5)

http://search.cpan.org/~dconway/Parse-RecDescent-v1.95.1/lib/Parse/RecDescent.pm

The apache config example at
http://groups-beta.google.com/group/comp.lang.perl.misc/msg/52d131d22f07f35b?dmode=print&hl=en
uses two interesting tricks: it autovivifies the %return hash with the
field name (key) and value, and apparently 'field(s)' gathers the output
of its subrules into a list automatically (whereas I manually use CONS
for the devices in the ioscan parser).

-> This is explicitly documented in the manual: "Since a repeated
subrule may match many instances of the subrule itself, the value
associated with it is not a simple scalar, but rather a reference to a
list of scalars, each of which is the value associated with one of the
individual subrule matches."

* Packrat

Packrat parsing is a memoizing, backtracking recursive-descent parsing
technique that runs in time and space linear in the size of the input
text.  [documentation] [theoretical background papers]

- packrat egg
- json egg [uses packrat to implement a parser]

* Uncategorized, merged in from parsing.txt

Perl6::Rules module -- grammar-based parsing

http://www-106.ibm.com/developerworks/linux/library/l-cpregex.html
-- Perl 6 grammars and regexes and lexing and parsing

The Parse::RecDescent module employs Perl 5's regular expressions to do
lexing seamlessly (whenever a P::RD rule does not use other rules to do
matching, it's a safe bet it's a lexer rule) and builds parsing
facilities on top.  Thus, P::RD on top of Perl 5 is a powerful parser
and lexer combination.

lalr.egg: Interesting, but the doc is useless and there are no examples.
Whether it supports regexes, or even how to get started, is beyond me,
and there is no extra documentation (at all) in the code.
-- The lexical analysis function (needed by the parser) can be provided
   by silex.
-- After reading a bunch of stuff about parsing, I understand this
   better.

http://www.codecomments.com/archive282-2005-2-406367.html
-- silex/lalr getting-started example
http://www.codecomments.com/archive282-2005-2-404933.html
-- YAML, "syck", silex, and lalr
[The codecomments.com URLs are probably from c.l.s.]

http://shivers.com/~shivers/scheme04/tmp/scheme04/article/05-lexparse.pdf
-- Lexer and Parser Generators in Scheme -- mostly about *writing*
   lexers and parsers, not using them

A bunch of bookmarks are under safari:prog/scheme/regexps.

Read lex/yacc stuff: lexyacc.pdf (from http://ds9a.nl/lex-yacc).

* parscheme

http://mumble.net/~campbell/darcs/parscheme/

A "slow" parser combinator library that does not "leak space", according
to Riastradh.  No documentation at this time.  No further information.
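Since parscheme's API is undocumented, here is the bare idea of parser
combinators in portable Scheme (this is my own minimal sketch, not
parscheme's interface): a parser is a procedure from a list of
characters to (value . rest) on success, or #f on failure.

```scheme
;; Match one literal character.
(define (p-char c)
  (lambda (in)
    (and (pair? in) (char=? (car in) c)
         (cons c (cdr in)))))

;; Run p, then q on whatever input p left over; pair up their values.
(define (p-seq p q)
  (lambda (in)
    (let ((r (p in)))
      (and r
           (let ((s (q (cdr r))))
             (and s (cons (list (car r) (car s)) (cdr s))))))))

;; Try p; if it fails, try q on the same input (backtracking choice).
(define (p-or p q)
  (lambda (in)
    (or (p in) (q in))))

;; ((p-seq (p-char #\a) (p-or (p-char #\b) (p-char #\c)))
;;  (string->list "ac"))
;; succeeds with value (#\a #\c) and an empty rest-of-input.
```

Real combinator libraries add repetition, mapping over results, and
(per Riastradh's "leak space" remark) care about not retaining the
input stream; this sketch ignores all that.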