chibi parse

This is the documentation for the R7RS Scheme library chibi parse as generated using scmdoc, a new documentation tool for the Scheme programming language. The library itself provides parser combinators with optional memoization and convenient syntax.

As an example, consider the following parser for hexademical integers:

(define hexdecimal
  (parse-map
    (parse-seq
      (parse-string "0x")
      (parse-map
        (parse-token char-set:hex-digit)
        (lambda (str)
          (string->number str 16))))
    cadr))

Index

Table of contents

Parse Streams

make-parse-stream

file->parse-stream

string->parse-stream

parse-stream-start?

parse-stream-end?

parse-stream-ref

parse-stream-substring

Parser Interface

parse-failure

call-with-parse

parse

parse-fully

parse-fold

parse->list

parse-fully->list

parse-with-failure-reason

Basic Parsing Combinators

parse-epsilon

parse-anything

parse-nothing

parse-or

parse-and

parse-not

parse-seq

list->parse-seq

parse-optional

parse-repeat

parse-repeat+

parse-map

parse-map-substring

parse-ignore

parse-assert

parse-atomic

parse-commit

Boundary Checks

parse-beginning

parse-end

parse-beginning-of-line

parse-end-of-line

parse-beginning-of-word

parse-end-of-word

parse-word

parse-word+

Constant Parsers

parse-char

parse-not-char

parse-string

parse-token

parse-sre

Memoization

parse-memoize

Parse Streams

Parse streams are an abstraction to treat ports as proper streams so that we can backtrack from previous states. A single Parse-Stream record represents a single buffered chunk of text.

procedure make-parse-stream

Create a parse stream on the given filename, with a possibly already opened port.

(make-parse-stream filename . o)

procedure file->parse-stream

Open filename and create a parse stream on it.

(file->parse-stream filename)

procedure string->parse-stream

Create a parse stream on a string str.

(string->parse-stream str)

procedure parse-stream-start?

Returns true iff i is the first character position in the parse stream source.

(parse-stream-start? source i)

procedure parse-stream-end?

Returns true iff i is the last character position in the parse stream source.

(parse-stream-end? source i)

procedure parse-stream-ref

Returns the character in parse stream source indexed by i.

(parse-stream-ref source i)

procedure parse-stream-substring

Returns a string composed of the characters starting at parse stream s0 index i0 (inclusive), and ending at s1 index i1 (exclusive).

(parse-stream-substring s0 i0 s1 i1)

Parser Interface

Procedures for invoking a constructed parser on a created parse stream.

procedure parse-failure

Combinator to indicate failure.

(parse-failure s i reason)

procedure call-with-parse

Call the parser combinator f on the parse stream source, starting at index index, passing the result to the given success continuation sk, which should be a procedure of the form (result source index fail). The optional failure continuation should be a procedure of the form (source index reason), and defaults to just returning #f.

(call-with-parse f source index sk . o)

procedure parse

Call the parser combinator f on the parse stream source, at index index, and return the result, or #f if parsing fails.

(parse f source . o)

procedure parse-fully

Call the parser combinator f on the parse stream source, at index index. If the entire source is not parsed, raises an error, otherwise returns the result.

(parse-fully f source . o)

procedure parse-fold

The fundamental parse iterator. Repeatedly applies the parser combinator f to source, starting at index, as long as a valid parse is found. On each successful parse applies the procedure kons to the parse result and the previous kons result, beginning with knil. If no parses succeed returns knil.

(parse-fold f kons knil source . o)

procedure parse->list

Parse as many of the parser combinator f from the parse stream source, starting at index, as possible, and return the result as a list.

(parse->list f source . o)

procedure parse-fully->list

As parse->list but requires the entire source be parsed with no left over characters, signalling an error otherwise.

(parse-fully->list f source . o)

procedure parse-with-failure-reason

Return a new parser combinator with the same behavior as f, but on failure replaces the reason with reason. This can be useful to provide more descriptive parse failure reasons when chaining combinators. For example, parse-string just expects to parse a single fixed string. If it were defined in terms of parse-char, failure would indicate some char failed to match, but it's more useful to describe the whole string we were expecting to see.

(parse-with-failure-reason f reason)

Basic Parsing Combinators

Simple parser combinators which can be combined to create more complex ones.

constant parse-epsilon

Parse nothing successfully.

parse-epsilon

constant parse-anything

Parse any single character successfully. Fails at end of input.

parse-anything

constant parse-nothing

Always fail to parse.

parse-nothing

procedure parse-or

The disjunction combinator. Returns the first combinator that succeeds parsing from the same source and index.

(parse-or f . o)

procedure parse-and

The conjunction combinator. If both f and g parse successfully starting at the same source and index, returns the result of g. Otherwise fails.

(parse-and f g)

procedure parse-not

The negation combinator. If f succeeds, fails, otherwise succeeds with #t.

(parse-not f)

procedure parse-seq

The sequence combinator. Each combinator is applied in turn just past the position of the previous. If all succeed, returns a list of the results in order, skipping any ignored values.

(parse-seq . o)

procedure list->parse-seq

Convert the list of parser combinators ls to a parse-seq sequence.

(list->parse-seq ls)

procedure parse-optional

The optional combinator. Parse the combinator f (in sequence with any additional combinator args o), and return the result, or parse nothing successully on failure.

(parse-optional f . o)

procedure parse-repeat

The repetition combinator. Parse f repeatedly and return a list of the results. lo is the minimum number of parses (deafult 0) to be considered a successful parse, and hi is the maximum number (default infinite) before stopping.

(parse-repeat f . o)

procedure parse-repeat+

Parse f one or more times.

(parse-repeat+ f)

procedure parse-map

Parse f and apply the procedure proc to the result on success.

(parse-map f proc)

procedure parse-map-substring

Parse f and apply the procedure proc to the substring of the parsed data. proc defaults to the identity.

(parse-map-substring f . o)

procedure parse-ignore

Parses the same streams as f but ignores the result on success. Inside a parse-seq the result will not be included in the list of results. Useful for discarding boiler-plate without the need for post-processing results.

(parse-ignore f)

procedure parse-assert

Parse with f and further require check? to return true when applied to the result.

(parse-assert f check?)

procedure parse-atomic

Parse with f once and keep the first result, not allowing further backtracking within f.

(parse-atomic f)

procedure parse-commit

Parse with f once, keep the first result, and commit to the current parse path, discarding any prior backtracking options. Since prior backtracking options are discarded, prior failure continuations are also not used. By default, #f is returned on failure, a custom failure continuation can be passed as the second argument.

(parse-commit f . o)

Boundary Checks

Procedures for performing boundary checks within a parser combinator.

constant parse-beginning

Returns true iff index is the first index of the first parse stream source.

parse-beginning

constant parse-end

Returns true iff index is the last index of the last parse stream source.

parse-end

constant parse-beginning-of-line

Returns true iff source, index indicate the beginning of a line (or the entire stream).

parse-beginning-of-line

constant parse-end-of-line

Returns true iff source, index indicate the end of a line (or the entire stream).

parse-end-of-line

constant parse-beginning-of-word

Returns true iff source, index indicate the beginning of a word (or the entire stream).

parse-beginning-of-word

constant parse-end-of-word

Returns true iff source, index indicate the end of a word (or the entire stream).

parse-end-of-word

procedure parse-word

Parse the combinator word (default a parse-token of char-alphabetic? or underscores), ensuring it begins and ends on a word boundary.

(parse-word . o)

procedure parse-word+

As parse-word, but instead of an arbitrary word combinator takes a character predicate pred (conjoined with char-alphabetic? or underscore), and parses a sequence of those characters with parse-token. Returns the parsed substring.

(parse-word+ . o)

Constant Parsers

Parser for parsing input in constant time.

procedure parse-char

Parse a single char which matches x, which can be a character, character set, or arbitrary procedure.

(parse-char x)

procedure parse-not-char

Parse a single char which does not match x, which can be a character, character set, or arbitrary procedure.

(parse-not-char x)

procedure parse-string

Parse the exact string str.

(parse-string str)

procedure parse-token

Parse a sequence of characters matching x as with parse-char, and return the resulting substring.

(parse-token x)

procedure parse-sre

We provide an extended subset of SRE syntax (see SRFI 115). taking advantage of more general parsing features. These are just translated directly into parser combinators, with characters and strings implicitly matching themselves. For example, '(or "foo" "bar") ) matches either of the strings "foo" or "bar". Existing parser combinators may be embedded directly. This is of course more powerful than SREs since it is not restricted to regular languages (or in fact any languages), though it does not provide the same performance guarantees.

(parse-sre x)

Memoization

Procedures to avoid expontential backtracking through memoization.

procedure parse-memoize

Parse the same strings as f, but memoize the result at each source and index to avoid exponential backtracking. name is provided for debugging only.

(parse-memoize name f)