* Silex / LALR                                                 -*- org -*-

Silex is a direct translation of the lex interface.  It is barely
integrated into Scheme: you must lex a file.l into a file.scm and source
that file, then run lexer-init (which accepts a string or port) to
initialize the input.  It returns one token every time you call the
lexer.

** Annoying things about Silex

- The lex grammar must reside in a separate file, making recompilation
  annoying.
- The lex grammar follows the Lex specs exactly; Scheme may not be used
  to build up regular expressions, for example.
- It is flexible, so actions can return anything.  Because lalr expects
  a token pair, this leads to lots of redundant actions, e.g.
  (cons '+ '+), (cons '- '-), ...
- It is impossible to relex a file.l after you call lexer-init; this may
  be a bug.
- No predefined macros (regexps) such as whitespace or digit.
- Only understands the metachar '\n', not \t; you must use \9 for tab.
  In the same vein, you get limited "silex" regular expressions rather
  than whatever the system provides.

** Annoying things about lalr

- Productions return #f by default, not $1.  Simple syntax like
  a: b | c ;  =>  (a (b) (c))
  is therefore not possible; you must return $1 explicitly:
  =>  (a (b) : $1 (c) : $1)
- Scope is not obeyed, as it is in PLT (and Bigloo?).  Because I have
  used neither and do not completely follow the explanation in Lexer and
  Parser Generators in Scheme.pdf, I don't know how limiting this is.
- Does not allow string literals (or regexps, of course) in the grammar.
  Any parser derived from YACC will do the same, as YACC is purely
  token-based.  Bison allows string literals.
- Does not allow char literals in the grammar.  I placed an
  experimental, unapplied patch in the lalr egg repo to allow them; it
  disables the check for symbols only.  lalr-driver.scm uses eq? (memq)
  to compare tokens, and eq? is legal for char comparison on Chicken.
  For string matching, this would have to be expanded to equal?.

** Examples

I wrote a simple wiki parser for the quack-pretty-lambda.txt wiki-format
file.  It's in .
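For orientation, here is a rough sketch from memory of how the silex/lalr
pieces fit together.  The .l fragment, the yytext/yycontinue/lexer-init
names, and the lalr-parser shape are all assumptions to verify against
the real SILex and lalr docs, not working code:

```scheme
;; file.l (SILex source; note \9 for tab, as noted above):
;;
;;   %%
;;   [ \9]+   (yycontinue)                          ; skip blanks
;;   \+       (cons '+ '+)                          ; redundant pair actions
;;   [0-9]+   (cons 'NUM (string->number yytext))
;;   <<EOF>>  '*eoi*

;; After silex turns file.l into file.scm:
(load "file.scm")
(lexer-init 'string "1 + 2")     ; presumably also 'port <port>

;; lalr grammar with the explicit ": $1"-style returns described above:
(define parser
  (lalr-parser
   (NUM +)                       ; terminal tokens
   (e (e + e) : (+ $1 $3)        ; every production returns explicitly,
      (NUM)   : $1)))            ; since the default is #f, not $1
```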
The wiki parser is pretty hard to understand, but once you have
understood it, it's not too hard to add things.  However, it is very
difficult to debug: almost no error reporting or tracing is done, and
you cannot really restart after an error without exiting the Scheme
session.

There is a port of the bison.info mfcalc example in .

`chicken-doc lalr` contains a calculator example, which is nearly
identical to the bison port above.

* SLLGen

Parser used in EoPL; also in numerous university courses.  Ample
examples are available (somewhere other than EoPL, though that is
authoritative).

Scans (lexes) and parses an LL(1) grammar.  Uses s-exprs, although it
does not provide a syntactic form (such as macros) for integrating them
into Scheme.  Allows literals in the grammar itself.

Studying SLLGen examples, as I find them, should explain how to properly
use a lexer in Scheme and what to look for / look down upon.  Search
Google for "sllgen".  For example:

- http://www.mathcs.carleton.edu/faculty/rkirchne/cs217/interp3-1.scm
- http://courses.cs.byu.edu/cs330/cs330/lectures/ppt/Lecture16/sld001.htm
- http://www.cs.rit.edu/~afb/20013/plc/hmwk7/
  [sllgen source -- from 1999; its define-datatype has an error: extra
  ellipsis in syntax form]
- http://www.cs.xu.edu/csci300/00f/Programs/interpreter-1.scm.txt
- http://tao.uab.es/cgi-bin/archzoom.cgi/jao@gnu.org--2004/libs--eopl--1.0--patch-1/define-datatype.scm?download
  [newer version of define-datatype - 2000]

Some lecture notes are at ~/scheme/doc/SLLGEN Lecture Notes.ps
[originally from
http://www.cs.ucla.edu/~palsberg/course/purdue/cs565/S97/lectures/Print-systemnotes1.ps].

Tested sllgen.scm with test-sllgen2.scm (included as text in
sllgen.scm); it works fine.  But I didn't have define-datatype loaded,
so I am not exercising the parser...

"Actions" for the lex grammar are:
;;; action-opcode ::= skip | make-symbol | make-number | make-string
Some examples use "symbol", "number" and "string" without the make-
prefix, but the sllgen.scm code I have uses the make- prefix.
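To make the action opcodes concrete, here is a guessed EoPL-style spec.
The spec layout, sllgen:make-string-parser, and the production names
(lit-exp, var-exp) are from memory and should be checked against the
sllgen.scm version in hand:

```scheme
;; Hypothetical sllgen lexer spec using the make- opcodes listed above.
(define lex-spec
  '((white-sp (whitespace) skip)                      ; skip opcode
    (number   (digit (arbno digit)) make-number)
    (ident    (letter (arbno (or letter digit))) make-symbol)))

;; Grammar: each production names the datatype variant that becomes a
;; node in the parse tree (lit-exp and var-exp are illustrative names).
(define grammar
  '((expression (number) lit-exp)
    (expression (ident)  var-exp)))

(define parse-string
  (sllgen:make-string-parser lex-spec grammar))

;; (parse-string "42") should yield a lit-exp datatype value, to be
;; consumed by your own EVAL-EXP.
```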
The main point of SLLGEN seems to be to generate a syntax tree which can
be evaluated by an interpreter you write.  The "actions" in the grammar
are not evaluated; instead they are "datatypes" [1] which are joined
together to form a parse tree.  You pass this tree to an EVAL-EXP
function of your own making, along the lines of the eval in SICP 4.1.
(See
http://groups-beta.google.com/group/comp.lang.scheme/msg/6bb3b81e7d572391?dmode=print&hl=en
for an example of eval-exp.)

There are a couple of bad things about this:
- I imagine a significant amount of memory will be used, because the
  entire parse tree is kept in memory.  (Perhaps we could avoid emitting
  a parse tree.)
- SLLGEN seems to be a teaching tool rather than production code.

It does appear you can parse streams with sllgen:make-stream-parser
(though this is intended for REPL loops).  See the code at the end of
http://www.math.grin.edu/~stone/courses/languages/Lambda-using-SLLGEN.ss

Overall, I find many, many references to using SLLGEN to implement an
interpreter (transforming input into a parse tree) and no references to
executing code on the fly [even if that is possible].  I conclude SLLGEN
should not be investigated further at this time, as it will not be
practical for non-interpreter use -- though the question may be reopened
if I need to convert arbitrary input into s-expr format.

[1] The exact purpose and use of the "datatype" records is still
unknown.

* Bigloo

Doc at
http://www-sop.inria.fr/mimosa/fp/Bigloo/doc/bigloo-11.html#Lalr(1)-parsing

lalr-grammar, regular-grammar, read/lalrp

Lexing and parsing are integrated into the language, but seem very tied
to Bigloo.  The lalr part might be portable, but the regular-grammar
stuff is very tightly integrated with Bigloo regular expressions.

- regular-grammar code appears to be in runtime/Rgc/*
- lalr-grammar code appears to be in runtime/Lalr/*
- runtime/Ieee/input.scm contains read/lalrp, which is a wrapper.
- runtime/Eval/expanders.scm contains
  (install-expander 'lalr-grammar expand-lalr-grammar), which -may- be
  like installing a macro.

Finally, I noticed the use of multiple-value-bind, and there are
probably other non-standard functions in use.

* PLT

Referenced in Lexer and Parser Generators in Scheme.pdf along with GT
(cannot find source for the latter).

Location: plt/collects/parser-tools (lex.ss and yacc.ss)
http://download.plt-scheme.org/scheme/plt/collects/parser-tools/

Difficulty of separating: contains PLT-specific stuff -- not only the
module system but also syntax stuff I don't recognize, such as
define-for-syntax.

** Examples

A simple BIND (zone) file example is in
file:~/scheme/doc/plt-lexer-example.scm, pulled off some guy's blog.
[Note: another BIND parsing example is in the Lex and YACC HOWTO.pdf,
section 3.2.]

** Scott's parser tools

There are a couple of references to "Scott's parser tools"; these refer
to the lexer/parser in PLT, whose author is Scott Owens.

* Parsing of input @ http://okmij.org/ftp/Scheme/parsing.html

Some lower-level parsing functions.  Probably useful for theoretical
background.  "The following simple functions surprisingly often suffice
when parsing an input stream.  They either skip, or build and return
tokens according to the inclusion or delimiter semantics."

* CPS Recursive Ascent Parsing

http://mysite.verizon.net/vze4rvw7/RecurAscentPaper.pdf
[saved locally as CPS Recursive Ascent Parsing.pdf]

Gives some examples of recursive descent and top-down parsers and works
up to a hand-written recursive ASCENT parser.  Interesting (I basically
know how to write a recursive descent parser now), but the CPS style
requires as much work as writing a recursive descent parser by hand, and
is more confusing.

* Essence

http://www.informatik.uni-freiburg.de/proglang/software/essence/doc/html/essence.html

"Essence is a generator for LR(k) and SLR(k) parsers in Scheme."
Essence appears to take tokenized input and thus requires a lexer.
It should be Class III (s-exprs, integrated into Scheme).  No study has
been made of Essence.

* Taurus

Recursive descent parser from 1992.  No research yet.

Taurus, Zebu, etc. are discussed at
http://www-2.cs.cmu.edu/afs/cs/project/ai-repository/ai/lang/scheme/code/parsing/0.html.
All the parsers on this page are very old.

* Parse::RecDescent (Perl5)

http://search.cpan.org/~dconway/Parse-RecDescent-v1.95.1/lib/Parse/RecDescent.pm

The apache config example at
http://groups-beta.google.com/group/comp.lang.perl.misc/msg/52d131d22f07f35b?dmode=print&hl=en
uses two interesting tricks: it autovivifies the %return hash with the
field name (key) and value, and apparently 'field(s)' gathers the output
of its subrules into a list automatically (whereas I manually use CONS
for the devices in the ioscan parser).

-> This is explicitly documented in the manual: "Since a repeated
subrule may match many instances of the subrule itself, the value
associated with it is not a simple scalar, but rather a reference to a
list of scalars, each of which is the value associated with one of the
individual subrule matches."

* Packrat

Packrat parsing is a memoizing, backtracking recursive-descent parsing
technique that runs in time and space linear in the size of the input
text.  [documentation] [theoretical background papers]

- packrat egg
- json egg [uses packrat to implement a parser]

* Uncategorized, merged in from parsing.txt

Perl6::Rules module -- grammar-based parsing

http://www-106.ibm.com/developerworks/linux/library/l-cpregex.html
-- Perl 6 grammars and regexes and lexing and parsing

The Parse::RecDescent module employs Perl 5's regular expressions to do
lexing seamlessly (whenever a P::RD rule does not use other rules to do
matching, it's a safe bet it's a lexer rule) and builds parsing
facilities on top.  Thus, P::RD on top of Perl 5 is a powerful parser
and lexer combination.

lalr.egg: Interesting, but the doc is useless and there are no examples.
Whether it supports regexes, or even how to get started, is beyond me,
and there is no extra documentation (at all) in the code.
-- The lexical analysis function (needed by the parser) can be provided
   by silex.
-- After reading a bunch of stuff about parsing, I understand this
   better.

http://www.codecomments.com/archive282-2005-2-406367.html
-- silex/lalr getting-started example
http://www.codecomments.com/archive282-2005-2-404933.html
-- YAML, "syck", silex, and lalr
[The codecomments.com URLs are probably from c.l.s.]

http://shivers.com/~shivers/scheme04/tmp/scheme04/article/05-lexparse.pdf
-- Lexer and Parser Generators in Scheme -- mostly about *writing*
   lexers and parsers, not using them

A bunch of bookmarks are under safari:prog/scheme/regexps.

Read lex/yacc stuff: lexyacc.pdf (from http://ds9a.nl/lex-yacc).

* parscheme

http://mumble.net/~campbell/darcs/parscheme/

A "slow" parser combinator library that does not "leak space", according
to Riastradh.  No documentation at this time.  No further information.
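Since parscheme's API is undocumented, here is the bare idea of parser
combinators in portable Scheme (this is my own minimal sketch, not
parscheme's interface): a parser is a procedure from a list of
characters to (value . rest) on success, or #f on failure.

```scheme
;; Match one literal character.
(define (p-char c)
  (lambda (in)
    (and (pair? in) (char=? (car in) c)
         (cons c (cdr in)))))

;; Run p, then q on whatever input p left over; pair up their values.
(define (p-seq p q)
  (lambda (in)
    (let ((r (p in)))
      (and r
           (let ((s (q (cdr r))))
             (and s (cons (list (car r) (car s)) (cdr s))))))))

;; Try p; if it fails, try q on the same input (backtracking choice).
(define (p-or p q)
  (lambda (in)
    (or (p in) (q in))))

;; ((p-seq (p-char #\a) (p-or (p-char #\b) (p-char #\c)))
;;  (string->list "ac"))
;; succeeds with value (#\a #\c) and an empty rest-of-input.
```

Real combinator libraries add repetition, mapping over results, and
(per Riastradh's "leak space" remark) care about not retaining the
input stream; this sketch ignores all that.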