Meta-Meta development log
by Frans
Below the development log of the Meta-Meta
project, as I started it on
August 24, 2001. It starts with the further development
of IParse, a program that I already developed some years ago,
in order to build a parser with type-checking abilities for
the specification language.
- Combined all the sources into a single file.
Sources: IParse.c.
- Decided to replace the string by a version
that was based on a hash tree where each hash had
a size of 16. Introduced the type hex_hash_tree_t
for this purpose.
Sources: IParse.c.
- Continued working on the hash tree implementation.
Renamed the type to hexa_hash_tree_t.
- Decided to turn solutions into an array the
size of the input file, instead of a linked list for
improved performance, against some increased memory
usage. Also added free_solutions to free
the memory.
- Wrote free_tree and assign_tree to
make a start with freeing the memory used by the trees as well.
Sources: IParse.c.
- Continued working on freeing the memory used for the
trees. The code now uses the functions tree_claim
and tree_releas for this.
Sources: IParse.c.
- Finished the modification for freeing the memory used
for the trees. For this the functions tree_assign,
tree_mode, and tree_release are used.
As you can see, I have been alternating the way of working
and returned to my original functions.
Sources: IParse.c,
- Created SParse.c from IParse.c.
- Created SParse.gr.
- Ran "IParse SParse.gr -o SParse_grammar.c"
to create an initialization routine and copied
this into SParse.c. Discovered some
bugs in IParse.c and fixed these.
- Started with expanding struct rule_T
with ident_class, and to adapt the
make_rule function, only to discover
some errors in the SParse.gr grammar
definition.
- Fixed SParse.gr
- Then I wanted to fix something in print_tree_to_c
of IParse, because it generates KEYWORD
for literals, which are not stricktly speaking keywords.
Suspected a memory allocation problem related to the
strings passed to make_atom_str.
- The suspected memory allocation problem was not there
as the function accept_string did do the
proper allocations. Added two calls to free
when this string was used in a call to string
in combination with a call to make_str_atom.
- Changed KEYWORD in print_tree_to_c
in the neutral LITERAL and removed STR
that was not used.
- Ran "IParse SParse.gr -o SParse_grammar.c"
to create an initialization routine and copied output
into SParse.c.
- Applied changes mentioned in point 3 to SParse.c.
- Remove function accept_symbol_single_character
from IParse.c and SParse.c
because it is not used.
- Adapted make_rule and parse_rule.
- Tried to run "SParse c1.gr", but this failed
because ident is now a keyword. Modified grammar
to include "ident" [ident] as alternative for
prim_elem.
- Temporary replaced call of make_initial_grammar
with "#include "SParse_grammar.c"", and renamed
call to the function to init_SParse_grammar.
Now c1.gr parses without problems.
- Parsing some file with the new grammar constructs in it,
failed. Some checking revealed why. In case you have
an alternative like "a" and "a b" then
the first should occur later, because of the "greedy"
nature of the parsing algorithm. Made the modification
to the grammar, which worked.
Sources:
IParse.c,
SParse.c, and
SParse.gr.
- I decided to continue to develop SParse.c under
the name IParse.c as a continuation of the original
development and to keep the program downwards compatible.
For this it is needed that SParse.c can parse
SParse.gr without problems. When I tried it this
afternoon, it returned an error, which I did not understand.
I had to switch on parsing and scanning debugging (using
the switches "+ds +dp") to find the cause. I
discovered that ident has become a keyword in
Sparse.gr where it used to be an ident. Thus
the fragment ""ident" [ident]" returns an error
on the second occurence of "ident" saying that an ident
is expected. Confused? So, was I!
But what is more important is that the error reporting
contains some errors and should be improved.
- Renamed SParse.c to IParse.c and made
it version 1.1a. I improved the error reporting.
- Did some refractoring with respect to cur_line
and cur_column. These are now administrated inside
_next and _advance.
Sources:
IParse.c and
SParse.gr.
- I renamed SParse.gr into IParse.gr
and fixed the error in it.
- I ran IParse IParse.gr -o IParse_grammar.c and
modified IParse.gr to include IParse_grammar.c
and call the right function. And then I tried to close the
loop by compiling IParse.c and running it.
But, I got a run-time error in the init_IParse_grammar
routine, due to an error in print_tree_to_c.
This means, I first have to fix IParse_grammar.c
manually, before I can fix it. After fixing in running IParse -p
worked again.
- But then I came along another run-time error, when I tried
to run IParse IParse.gr. It was caused by the
fact that in the call try = parse_nt(rule->info.non_terminal, &t)
the first argument renderd to a NULL pointer. This means that
the field non_terminal is not initialized properly
in make_rule, I guess. I decided to replace
IParse_grammar.c by an earlier version, manually
apply the changes in the grammar and then first fix some
other things.
- I added code for processing identalone that I
introduced in the rule | "ident" [identalone] .
Then I ran IParse IParse.gr -o IParse_grammar.c again,
and tried compiling IParse. Then running the
command again worked. The loop is now closed again.
Running IParse c.gr
scan_ -p returned the
correct result.
Sources:
IParse.c and
IParse.gr.
In the past days, I have been considering the idea of integrating
IParse with an editor, and also use the parsing for colour coding
and automatic reformating. The idea is that after each edit operation
the file is reparsed and colour coded according to how far the file
was parsed. If the whole file is parsed, the code can be reformated,
according to the parse tree. For this to work, it is probably needed
to implement some kind of reparsing, which make use of what has
already been parsed. Idea is to use the partial solutions stored
in solutions. Knowing what has been changed (some characters
inserted, deleted, or replaced) we can walk through the the partial
solutions an decided which overlap with the changed range. These should
then be marked as invalid. The off-set of the solutions after the
changed ranged, need to be corrected. If parsing is started over
again, most of the partial solutions will be used, except for those
in the region that was affected.
If the change consists of a single character removed, added, or changed,
it might be a good idea to see if the scanning routine called just
before the changed position still scans the same point as before,
instead of invalidating the whole parse tree and parse it again.
I have been looking for an appropriate editor to be used.
First, I thought about the
Scintilla editor component, which is an improved implementation
of the RichEditCtrl. The problem that I felt with this, was that
it has more functionality then I needed. But a real obstacle is
that I cannot easy decorate the internal struction with the
partial parse trees. Then I looked at the
Edit controls page
of the Code Project where I found Crystal Edit - a syntax colouring
text editor. After I downloaded it, I discovered that it exactly
matched my requirements. It is simple, but has everything you need
for source editing. The sources look very clean and well designed.
The cources consists of a collection of editor classes, and a
sample editor with a MDI interface (which I am using right now
for editing this text). And because it doesn't make use of any
controls, it gives you full control over the internals.
Although, I am excited about the idea of integrating IParse with
Crystal Edit, I do not think it is the right moment. Also it would
mean a great diviation from my original development plan. And there
is also the practical problem that I need to install Visual Studio
first.
- I extended the IParse grammer with some additional elementairy
elements for specifying context boundaries.
Finally, I found some time to continue working on IParse.
- I wrote a short type specification grammar, called
tspec.gr with some simple identifier binding rules.
I also wrote a accomplishing specification called upspec.txt.
I made some corrections to IParse.gr, and got tspec.gr
parsed correctly. Running "IParse tspec.gr upspec.txt" resulted
in a core dump. Which is not very surprising as the code for dealing
with the new grammar elements is totally missing.
- I added code for dealing with the open-context and close-context.
To simplify the development, I decided to change the IParse grammar
such that there are no identifier classes with an open-context.
(I also cannot think of a programming concept where this is relevant,
anyway.)
- When I ran "IParse tspec.gr upspec.txt", it looked like
the second file was parsed with the IParse grammar, not with the
tspec.gr grammar. How is this possible? The strange thing
is that "IParse c.gr scan_" just worked fine. After some
puzzling, I found that the variable all_nt was only initialized
once, and not everytime when make_all_nt was called. An old
error, as it seems. I fixed it.
- It seems IParse does not report an error when a non-terminal symbol
is not defined. Lets see, if I can do something about it.
Sources:
IParse.c,
IParse.gr,
tspec.gr, and
upspec.txt.
- Because, I have little time today, I tried to add code for storing
line numbers and columns into the parsed trees, which is needed
to report semantical errors. For this I added to members to
tree_t. I also added some code to set these members
in parse_rule and parse_seq. In print_tree
I added some lines to print-out the line and column numbers between
curly brackets.
- For building the symbol trees, I first defined some
additional structures, which will be attached to
the open-context tree elements.
- Next I wrote the routine for building up the
symbol trees. This are the functions
build_symbol_trees_tree and
build_symbol_trees_list.
- Finally, I extended the printing
routine with the function print_symbol_tree
to print out some information of the identifiers
being defined in each symbol tree. When I tested
it, it produced the correct output, but the
program did not terminate.
- Some debugging revealed that the routine did
not leave the final tree_release call.
I am getting the feeling that it is trying to
release some symbol trees as if they were trees.
That was indeed the case.
- Then I tried to change the IParse.gr
such that the non-terminals will be stored
into a symbol tree. But as soon, as I
changed the first line into
"root : { nt_def SEQ } eof.", it would
only print "{1:1}{". It took me some
time to figure out what was going wrong. I fixed
this by treating the "eof" terminal
as all other literals, and not doing any fixing
for this after having parsed the grammar.
Changing the IParse.gr can bring me
in another bootstrapping loop that I cannot get
out, so I have to be very careful, and think
how I should proceed.
Sources:
IParse.c,
IParse.gr, and
tspec.gr.
Friday, October 12, 2001
- After some consideration, I did not consider the above
idea a very good idea. The benefits would not weight
against the profits.
- I made some changes to the type definitions, after I
discovered that you can write "typedef struct tree_t tree_t;",
thus use "tree_t" both as a type name and a name
of a structure. These identifier belong to different
identifier classes. (It is long ago that I learned
something new about the C programming language.) But
now some serious work.
- I wrote the code for verifying if all identifiers
are defined. But from the upspec.txt example,
I discovered that it is not so easy as I had thought.
The problems lies with checking the member variables
of classes/structures, like in the C construct:
a->b. To analyse these kind of constructs,
you have to know that a is a variable with
a certain type and that this type is a pointer to
a structure that contains a member with the name b.
The conclusion seems that you really need to include a kind
of typing system to check the correct definition of
identifiers.
- After thinking a bit about it, I figured out that I can
make it work without much modifications. Now the definition
of a identifier is set to its parent node. However, in
case this node consists of two members, one being the
identifier, than there is no objection against making
it equal to the other member. Implementing this, proofed
to be relatively easy.
- The other modification is, that when we search for a
"local context", we also search the definitions of
identifiers. (Of course, cyclic definitions should
be blocked.) This means we might have to search the
definition of an identifier in its own context. To
solve this problem, there are tree solutions:
- Store with each tree its parent tree, such that
we can always find the definition of an identifier
given its tree, because we can determine the innermost
surrounding context, or local context from this.
- Store with each usage of an identifier its
definition, and use a multi-pass algorithm to
resolve forward usages of identifier that are
defined later.
- Store with each usage of an identifier its
context (from which the definition can be
found), and use a multi-pass algorithm as well.
Each of the solutions requires additional information
to be stored with the trees. To store the definition
with an identifier, seems to be most natural to me,
because in a sense, an identifier is nothing else than
a symbolic pointer. I haven't implemented this modification
yet, because it seems to me that the rules for finding
a "local context" might be language dependent. And if
that is the case, they should be specified.
Sources:
IParse.c
November 2001
During this month, I tried to implement a parser for
Transact-SQL. For this some extra scanning routines were needed.
I decided it was time to implement an easier method for adding
scanning functions. The following modifications were made:
- Added various SQL scanning routines.
- Introduced types terminal_t and ident_t
for registering scanners with terminal symbols; constants
and identifier respectively.
- Defined some "add terminal" functions with the macro
DEF_ADD_TERMINAL_FN, which are called in initialize
to register the scanners for the terminals.
- Defined the function find_terminal for looking up the
registered scanners. It is called from the make_rule function.
- Added the function parse_term for parsing a terminal. In
this function the actual parsing methods are called throught the
various accept_ members of terminal_t based on
the result type of the constant terminal.
- Added the parse_ident function for parsing a ident based
on the registered ident parsing function and the required identifier
occurence type.
- Replaced the terminal parsing parts in parse_rule and
parse_seq by calles to parse_term and parse_ident.
- Introduced boolean switches cpp_comments and
nested_comments, which control different forms
of comments.
Thursday, October 10, 2002
I tried to use IParse for implementing a JavaScript
parser. I am getting more and more pages, which are being generated
with JavaScript, but which are not checked by my HTML checker, so I was thinking about implementing a JavaScript interpreter
for this purpose. I did check the Internet for an available interpreter, but
did not find one fitting my purpose.
But when I wrote grammar for JavaScript, I encountered
a crash in IParse, caused by tree member of context_entry_p
being NULL. I also noticed that in many cases the identifiers were reported
to be undefined while they were should have been defined in the context.
Because it has been some time that I looked at the code, I was not able to
locate the source of these errors.
Friday, October 11, 2002
- Worked on build_contexts_for_tree. Took out some code and
moved it to find_entry_in_context. Also integrated the
code from build_contexts_for_list. The function seems to
work correctly now on the grammar.
Sunday, October 13, 2002
- A spend some time, not much, studying the code further.
I discovered that in the function search_for_context_in_tree
the value of cur_context was changed, which is
incorrect. A common cut & paste error. But I can
hardly believe that that was the cause of the problems
I was encountering. (I did not have time to verify this.)
- I spend the whole evening commenting the code. While
doing this, I also reordered some parts, and found some
points of improvement.
- The error which I fixed last Sunday, seemed to be the
fix for the problem. I felt quite happy about the result
and decided to release it as version 1.2 on the
main page.
Sources:
IParse.c,
js.gr.
Sunday, November 3, 2002
- Did some refactoring on accept_keyword and
accept_symbol. Now they are combined into
accept_literal. While doing this also solved
a little bug in the memorization of the scanning
routines. Now "CHAIN" also works with keywords.
(Before it only worked with symbols.)
- Did some final clean-up and testing. Decided to
release it as version 1.2.1.
Sources:
IParse.c,
Meta-Meta project