Interactive parser

This page provides an interactive parser that lets you enter a grammar and use it to parse some input. It can be a helpful tool for developing a grammar and testing it against example input. The parser is a JavaScript reimplementation of (a subset of) IParse, an interpreting parser. It responds to keystrokes, so changes to the grammar and the input give immediate feedback.

Grammar description

To parse input that consists of an integer, an identifier and a string, the following grammar can be used:

root : int ident string .

This grammar consists of one production rule. The production rule starts with the root non-terminal, which is the non-terminal used to parse the whole input. After the colon, the elements of the production rule are given. Elements can be terminals and non-terminals; in this case the rule consists of three terminals. A period ends the production rule. There are three predefined terminals:

int: parses a positive whole number
ident: parses an identifier, starting with an alphabetic character or an underscore, followed by zero or more alphabetic characters, underscores, or numerical characters.
string: parses a string surrounded by double quotes. Escape characters are not supported.

An example input that should parse correctly is:

123 abc "test ?*"

The abstract syntax tree of the parsing is printed as:

(int:123,ident:abc,string:"test ?*")
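To make the three predefined terminals concrete, here is a minimal Python sketch that matches them with regular expressions and prints the result in the format shown above. It is not the IParse implementation. The names TERMINALS, parse_sequence, and show are hypothetical, and the patterns assume ASCII letters and digits only.

```python
import re

# Hypothetical patterns for the three predefined terminals (ASCII assumed).
TERMINALS = {
    "int": re.compile(r"[0-9]+"),
    "ident": re.compile(r"[A-Za-z_][A-Za-z0-9_]*"),
    "string": re.compile(r'"[^"]*"'),  # no escape characters
}

def parse_sequence(names, text):
    """Match the named terminals in order, skipping blanks between them.

    Returns a list of (name, lexeme) pairs, or None if the input does not match.
    """
    pos = 0
    result = []
    for name in names:
        # Skip spaces, tabs, and newlines before each terminal.
        while pos < len(text) and text[pos] in " \t\n":
            pos += 1
        match = TERMINALS[name].match(text, pos)
        if match is None:
            return None
        result.append((name, match.group()))
        pos = match.end()
    # Only trailing whitespace may remain after the last terminal.
    if text[pos:].strip(" \t\n"):
        return None
    return result

def show(tree):
    """Format the matched terminals like the tree printed above."""
    return "(" + ",".join(f"{name}:{value}" for name, value in tree) + ")"

tree = parse_sequence(["int", "ident", "string"], '123 abc "test ?*"')
print(show(tree))  # prints (int:123,ident:abc,string:"test ?*")
```

The sketch handles only a fixed sequence of terminals, which is all that the rule root : int ident string . requires.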

The following grammar parses the same input as the one above, but it uses two extra non-terminals and two additional production rules for those non-terminals:

root : first rest .
first : int .
rest : ident string .

To parse input that consists of the keyword if followed by an identifier between round brackets, the following grammar can be used:

root : "if" "(" ident ")" .

In this grammar rule, literal strings between double quotes are used as additional terminals. Literal strings that start with an alphabetic character are treated as keywords, meaning that they must be terminated by a non-identifier character. It also means that they are excluded as valid identifiers. An example of input that should parse correctly is:

if ( a )

The following input does not parse correctly because, although it begins with the characters if, its first identifier is iff rather than if:

iff ( a )

The following input does not parse correctly, because if is no longer a valid identifier:

if ( if )
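The two keyword rules can be sketched in Python as follows. This is an illustration under assumptions, not the actual parser code: match_keyword and match_ident are hypothetical helpers, and identifier characters are assumed to be ASCII letters, digits, and underscores.

```python
import re

IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
IDENT_CHAR = re.compile(r"[A-Za-z0-9_]")

def match_keyword(keyword, text, pos=0):
    """Return the end position if keyword occurs at pos and is terminated
    by a non-identifier character (or the end of the input), else None."""
    end = pos + len(keyword)
    if text[pos:end] != keyword:
        return None
    if end < len(text) and IDENT_CHAR.match(text[end]):
        return None  # e.g. "iff": the keyword runs on into an identifier
    return end

def match_ident(text, keywords, pos=0):
    """Return the end position of an identifier at pos, or None.
    Words that are keywords of the grammar are rejected."""
    match = IDENT.match(text, pos)
    if match is None or match.group() in keywords:
        return None
    return match.end()

print(match_keyword("if", "if ( a )"))   # prints 2
print(match_keyword("if", "iff ( a )"))  # prints None
print(match_ident("if )", {"if"}))       # prints None
```

The first check explains why iff ( a ) is rejected, and the second why if ( if ) is rejected.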

To parse input that consists of an optional integer, one or more identifiers, and zero or more strings, the following grammar can be used:

root : int OPT 
       ident SEQ
       string SEQ OPT .

The OPT placed after an element indicates that the element is optional. The SEQ placed after an element indicates that the element can occur one or more times. The combination of SEQ and OPT indicates that the element can occur zero or more times.
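One way to picture OPT and SEQ is as small functions that wrap another parser. The Python sketch below is a simplified illustration and not how IParse is implemented. It works on a list of (kind, text) tokens that is assumed to be already split, matches greedily without backtracking, and all names in it (term, opt, seq, rule, accepts) are hypothetical.

```python
def term(kind):
    """Parser that accepts one token of the given kind."""
    def parse(tokens, pos):
        if pos < len(tokens) and tokens[pos][0] == kind:
            return [tokens[pos]], pos + 1
        return None
    return parse

def opt(parser):
    """OPT: accept the element, or accept nothing."""
    def parse(tokens, pos):
        result = parser(tokens, pos)
        return result if result is not None else ([], pos)
    return parse

def seq(parser):
    """SEQ: accept the element one or more times."""
    def parse(tokens, pos):
        result = parser(tokens, pos)
        if result is None:
            return None
        values, pos = result
        while True:
            result = parser(tokens, pos)
            # Stop when the element fails or consumes nothing.
            if result is None or result[1] == pos:
                return values, pos
            values = values + result[0]
            pos = result[1]
    return parse

def rule(*parts):
    """Accept all parts one after the other."""
    def parse(tokens, pos):
        values = []
        for part in parts:
            result = part(tokens, pos)
            if result is None:
                return None
            values = values + result[0]
            pos = result[1]
        return values, pos
    return parse

# root : int OPT ident SEQ string SEQ OPT .
root = rule(opt(term("int")), seq(term("ident")), opt(seq(term("string"))))

def accepts(tokens):
    result = root(tokens, 0)
    return result is not None and result[1] == len(tokens)

print(accepts([("ident", "a"), ("ident", "b")]))  # prints True
print(accepts([("int", "1")]))                    # prints False
```

Note how string SEQ OPT becomes opt(seq(...)): the one-or-more sequence is made optional, which gives zero or more.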

To parse input that consists of a sum of integers, the following grammar can be used:

root : int CHAIN "+" .

The CHAIN followed by a literal indicates a chain sequence, where the literal is used as a separator between the elements. An example input is:

1 + 6 + 5

LIST is shorthand for a chain sequence with a comma as the separator.
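A chain sequence amounts to: one element, followed by zero or more pairs of separator and element. The Python sketch below illustrates this under simplifying assumptions. The tokenizer recognizes only integers, identifiers, plus signs, and commas, and the names tokenize, parse_chain, and parse_list are hypothetical.

```python
import re

# Assumed token set for this sketch: integers, identifiers, "+" and ",".
TOKEN = re.compile(r"\s*([0-9]+|[A-Za-z_][A-Za-z0-9_]*|[+,])")

def tokenize(text):
    """Split the input into tokens, dropping whitespace."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse_chain(tokens, is_element, separator):
    """Accept: element (separator element)*.
    Returns the list of elements, or None if the tokens do not match."""
    if not tokens or not is_element(tokens[0]):
        return None
    elements = [tokens[0]]
    pos = 1
    while pos < len(tokens):
        has_next = pos + 1 < len(tokens) and is_element(tokens[pos + 1])
        if tokens[pos] != separator or not has_next:
            return None
        elements.append(tokens[pos + 1])
        pos += 2
    return elements

def parse_list(tokens, is_element):
    """LIST: a chain sequence with a comma as separator."""
    return parse_chain(tokens, is_element, ",")

def is_int(token):
    return token.isdigit()

print(parse_chain(tokenize("1 + 6 + 5"), is_int, "+"))  # prints ['1', '6', '5']
print(parse_list(tokenize("1, 6, 5"), is_int))          # prints ['1', '6', '5']
```

A trailing or doubled separator, as in 1 + or 1 + + 5, makes parse_chain return None, because every separator must be followed by another element.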

To parse input that consists of an intermixed list of integers and identifiers (separated by commas), the following grammar can be used:

root : ( int | ident ) LIST .

This will parse an input like:

ad, 4, 5, gh, jk

This grammar rule shows the use of brackets to group elements and the vertical bar character to separate alternatives. The vertical bar can also be used to combine several production rules for one non-terminal together.

When the input parses correctly, an abstract syntax tree is displayed in the text area to the right of the input. The nodes of this tree can be labeled by annotating the production rules with identifiers between square brackets. The grammar below, for simple expressions that add, multiply, and divide numbers, gives an example:

root : E .
E : F CHAIN "+" [sum].
F : T | F "*" T [times] | F "/" T [div].
T : int | "(" E ")".

Whitespace is accepted between all terminals. The space, tab, and newline characters count as whitespace, as do the two types of comments allowed in the C language: any text between /* and */, and the remainder of the line following a // sequence.

For educational purposes, the rules with respect to the specification of terminals are limited. For more options, see the C++ implementation of IParse and the C implementation of RawParser.

Interactive parser

Below you can enter a grammar and some input to be parsed according to that grammar. The input text areas are on the left. At every keystroke (or click of the execute button) the grammar is parsed and, if it contains no errors, the input below it is parsed according to the grammar. Errors and results are shown in the text areas on the right.