edward.parse

This is a stripped-down and minimally modified version of chibi parse, a parser combinator library with optional memoization. This version of the library is specifically designed to faciliate building read-eval-print loops with parser combinators.

Apart from parse streams, the central abstraction of this libary is therefore a REPL type which continously reads data from standard input and parses this data with parsers constructed using provided parser combinators.

An exemplary parser may be constructed as follows:

(import (srfi 14) (edward parse))

(define hexadecimal
  (parse-map
    (parse-seq
      (parse-string "0x")
      (parse-map
        (parse-token char-set:hex-digit)
        (lambda (str)
          (string->number str 16))))
    cadr))

This parser recognizes hexadecimal integers using the character set definition provided by SRFI 14 and transform these characters into a Scheme number type using parse-map. Refer to the documentation below for more information on available combinators.

Index

Table of contents

Read–Eval–Print Loop

REPL abstraction which provides a read-eval-print loop that continously reads data from standard input and parses this input data using provided parser combinators. The REPL operates on a parse stream internally.

procedure make-repl

Create an new REPL instance with the given input prompt string.

(make-repl prompt)

procedure repl?

Predicate which returns true if the given object was created using make-repl.

(repl? obj)

procedure repl-prompt?

Predicate which returns true if the prompt should be shown.

(repl-prompt? read-eval-print-loop)

procedure repl-set-prompt!

Change prompt visibility, a truth value means the prompt is shown.

(repl-set-prompt! read-eval-print-loop new-value)

procedure repl-run

Start the REPL given by repl, and continuously parse input using the provided parser f. Successfully parsed input is passed to the success continuation sk, which receives the line number and parser result as procedure arguments. If the parser failed for the current input, the failure continuation fk is invoked. This continuation receives the line number and failure reason as procedure arguments. Lastly, an interrupt continuation must also be provided which is invoked on SIGINT. This continuation is not passed any arguments.

(repl-run repl f sk fk ik)

procedure repl-interactive

Run a parser interactively within the REPL. That is, deviate from the standard REPL parser and instead parse the next input line with the given parser f. On success, returns the result of f otherwise, invokes the provided failure continuation fk.

(repl-interactive repl f fk)

Parse Streams

Parse streams are an abstraction to treat ports as proper streams so that we can backtrack from previous states. A single Parse-Stream record represents a single buffered chunk of text.

procedure make-parse-stream

Create a parse stream open on the given filename, with a possibly already opened port.

(make-parse-stream filename . o)

procedure file->parse-stream

Open filename and create a parse stream on it.

(file->parse-stream filename)

procedure string->parse-stream

Create a parse stream on a string str.

(string->parse-stream str)

procedure parse-stream-start?

Returns true iff i is the first character position in the parse stream source.

(parse-stream-start? source i)

procedure parse-stream-end?

Returns true iff i is the last character position in the parse stream source.

(parse-stream-end? source i)

procedure parse-stream-ref

Returns the character in parse stream source indexed by i.

(parse-stream-ref source i)

procedure parse-stream-substring

Returns a string composed of the characters starting at parse stream s0 index i0 (inclusive), and ending at s1 index i1 (exclusive).

(parse-stream-substring s0 i0 s1 i1)

Parser Interface

Procedures for operating on a created parse stream.

procedure parse-failure

Combinator to indicate failure.

(parse-failure s i reason)

procedure call-with-parse

Call the parser combinator f on the parse stream source, starting at index index, passing the result to the given success continuation sk, which should be a procedure of the form (result source index fail). The optional failure continuation should be a procedure of the form (source index reason), and defaults to just returning #f.

(call-with-parse f source index sk . o)

procedure parse

Call the parser combinator f on the parse stream source, at index index, and return the result, or #f if parsing fails.

(parse f source . o)

procedure parse-fully

Call the parser combinator f on the parse stream source, at index index. If the entire source is not parsed, raises an error, otherwise returns the result.

(parse-fully f source . o)

procedure parse-fold

The fundamental parse iterator. Repeatedly applies the parser combinator f to source, starting at index, as long as a valid parse is found. On each successful parse applies the procedure kons to the parse result and the previous kons result, beginning with knil. If no parses succeed returns knil.

(parse-fold f kons knil source . o)

procedure parse->list

Parse as many of the parser combinator f from the parse stream source, starting at index, as possible, and return the result as a list.

(parse->list f source . o)

procedure parse-fully->list

As parse->list but requires the entire source be parsed with no left over characters, signalling an error otherwise.

(parse-fully->list f source . o)

procedure parse-with-failure-reason

Return a new parser combinator with the same behavior as f, but on failure replaces the reason with reason. This can be useful to provide more descriptive parse failure reasons when chaining combinators. For example, parse-string just expects to parse a single fixed string. If it were defined in terms of parse-char, failure would indicate some char failed to match, but it's more useful to describe the whole string we were expecting to see.

(parse-with-failure-reason f reason)

Basic Parsing Combinators

Combinators to construct new parsers.

constant parse-epsilon

Parse nothing successfully.

parse-epsilon

constant parse-anything

Parse any single character successfully. Fails at end of input.

parse-anything

constant parse-nothing

Always fail to parse.

parse-nothing

procedure parse-or

The disjunction combinator. Returns the first combinator that succeeds parsing from the same source and index.

(parse-or f . o)

procedure parse-and

The conjunction combinator. If both f and g parse successfully starting at the same source and index, returns the result of g. Otherwise fails.

(parse-and f g)

procedure parse-not

The negation combinator. If f succeeds, fails, otherwise succeeds with #t.

(parse-not f)

procedure parse-seq

The sequence combinator. Each combinator is applied in turn just past the position of the previous. If all succeed, returns a list of the results in order, skipping any ignored values.

(parse-seq . o)

procedure list->parse-seq

Convert the list of parser combinators ls to a parse-seq sequence.

(list->parse-seq ls)

procedure parse-optional

The optional combinator. Parse the combinator f (in sequence with any additional combinator args o), and return the result, or parse nothing successully on failure.

(parse-optional f . o)

procedure parse-repeat

The repetition combinator. Parse f repeatedly and return a list of the results. lo is the minimum number of parses (deafult 0) to be considered a successful parse, and hi is the maximum number (default infinite) before stopping.

(parse-repeat f . o)

procedure parse-repeat+

Parse f one or more times.

(parse-repeat+ f)

procedure parse-map

Parse f and apply the procedure proc to the result on success.

(parse-map f proc)

procedure parse-map-substring

Parse f and apply the procedure proc to the substring of the parsed data. proc defaults to the identity.

(parse-map-substring f . o)

procedure parse-ignore

Parses the same streams as f but ignores the result on success. Inside a parse-seq the result will not be included in the list of results. Useful for discarding boiler-plate without the need for post-processing results.

(parse-ignore f)

procedure parse-assert

Parse with f and further require check? to return true when applied to the result.

(parse-assert f check?)

procedure parse-atomic

Parse with f once and keep the first result, not allowing further backtracking within f.

(parse-atomic f)

procedure parse-commit

Parse with f once, keep the first result, and commit to the current parse path, discarding any prior backtracking options. Can optionally be passed a failure reason with which all resulting failure messages will be prefixed.

(parse-commit f . o)

Boundary Checks

Procedures for performing boundary checks within a parser combinator.

constant parse-beginning

Returns true iff index is the first index of the first parse stream source.

parse-beginning

constant parse-end

Returns true iff index is the last index of the last parse stream source.

parse-end

constant parse-beginning-of-line

Returns true iff source, index indicate the beginning of a line (or the entire stream).

parse-beginning-of-line

constant parse-end-of-line

Returns true iff source, index indicate the end of a line (or the entire stream).

parse-end-of-line

Constant Parsers

Underlying combinators which parse a constant input and, contrary to the parsers documented above, cannot be passed parser combinators as procedure arguments.

procedure parse-char

Parse a single char which matches x, which can be a character, character set, or arbitrary procedure.

(parse-char x)

procedure parse-not-char

Parse a single char which does not match x, which can be a character, character set, or arbitrary procedure.

(parse-not-char x)

procedure parse-string

Parse the exact string str.

(parse-string str)

procedure parse-token

Parse a sequence of characters matching x as with parse-char, and return the resulting substring.

(parse-token x)

Laziness and Memoization

Lazy evaluation of parser combinators and memoization.

syntax parse-lazy

A delayed combinator. This is equivalent to the parser combinator f, but is delayed so it can be more efficient if never used and f is expensive to compute. Moreover, it can allow self-referentiality as in:

(letrec* ((f (parse-lazy (parse-or (parse-seq g f) h))))
  ...)

procedure parse-memoize

Parse the same strings as f, but memoize the result at each source and index to avoid exponential backtracking. name is provided for debugging only.

(parse-memoize name f)

Utility Parsing Combinators

Additional high-level combinators implemented using the aforementioned basic combinators. The provided high-level combinators, return more useful parser error messages then the basic ones.

procedure parse-fail

Parser which always fails with the given error message.

(parse-fail msg)

procedure parse-bind

Bind a constant value to a given parser. That is, always return this value if the parser succeeds.

(parse-bind value parser)

procedure parse-as-string

Run a parser f, which must return a list, and convert its return value to a string using the list->string procedure.

(parse-as-string parser)

constant parse-digits

Parse one or more digits (i.e. 0-9), interpret them as a decimal number, and return this number.

parse-digits

constant parse-lowercase

Parse an ASCII lowercase character (i.e. a-z).

parse-lowercase

procedure parse-default

Attempt parsing using the given parser f, if it fails return a default value def.

(parse-default f def)

constant parse-newline

Parse a newline character.

parse-newline

constant parse-blank

Parse a single blank character (i.e. horizontal whitespace).

parse-blank

constant parse-blanks+

Parse one or more blank characters.

parse-blanks+

constant parse-blanks

Parse zero or more blank characters.

parse-blanks

procedure parse-between

Invokes parser f between the parsers lhs and rhs.

(parse-between lhs f rhs)

procedure parse-esc

Returns the result of parser f but allows preceding its input with a backslash character to escape it in the parsed input format.

(parse-esc f)

procedure parse-alist

Parse an alist mapping chars to values which should be returned for each char.

(parse-alist alist)

procedure parse-regex-lit

Parse a regex literal which is enclosed by the character ch.

(parse-regex-lit ch)

procedure parse-regex-lit*

Parse a regex literal which starts with character ch and is terminated by the same character or the end of line.

(parse-regex-lit* ch)

procedure parse-strip-blanks

Invoke given parser and strip trailing blanks (if any).

(parse-strip-blanks parser)

procedure parse-blanks-seq

Like parse-seq but skip blanks before each parser.

(parse-blanks-seq . o)

constant parse-line

Parse a single line (excluding the terminating newline character).

parse-line

procedure parse-with-context

Feed the result of the parser ctx to a single argument procedure f. The procedure must then return a new parser which is executed afterwards on the same index as ctx.

(parse-with-context ctx f)