Previous Up Next

Small language and its compiler

I once designed a small language (together with a roommate) for which we wrote a compiler to 6502 code in Atari 800 XL Basic. It was a rather primitive language as we wanted to make compilation easy, but powerful enough to write a compiler for itself.

I never wrote down the grammar for the language, and I have also lost the Basic program. I did start writing a compiler for yet another (more elegant and powerful) language named Ecom using this language. The syntax of Ecom resembles that of the small language. This code did survive. What follows is based on analysing this code.

Types and variables

The only basic type the language has is unsigned byte, which can also contain a single character. Both (unsigned) integer and character constants were included. Furthermore, variables with an array of bytes with fixed length can be defined. Variables can either be defined global and local. Except for the procedure parameter, all variables are defined static. Variables consist of at most six alphabetic characters. It is also possible to fix the address of a variable at a given location, if needed.

Expressions

Expression working on unsigned byte values can be made with the following operators: + adding, +c adding with carry, - subtracting, -c subtracting with carry, & bitwise and, | bitwise inclusive or, and : bitwise exclusive or. All these operators have the same priority, and are executed from left to right. Brackets can be used, if needed.

The following compare operators can be used to create conditional expressions: <>, <=, <, >, >=, and =. Each conditonal expression should contain exactly one compare operator Logical expressions are not available.

Statements

The language has a assignment statement. Sequences of statement are formed by putting elementary statement after each other and separating them with semi-colons.

Control flow

The language has a rather minimal (but powerful) set of control flow statement, namely an if-statement and a repeat-statement, which is a modification of the if-statement. The if-statement uses the keywords if, then, elif, else, and fi. The repeat-statement starts with the rep keyword, which can be followed by any number of statements, and ends with a modified form of the if-statement, which starts with the keyword rif. In the clauses of this if-statement the keyword per can be used to indicate that execution should continue at the start of the repeat-statement.

Procedures

Procedures can only have atmost one parameter, which is a byte value (except for some of the build in function described below). Inside the procedure local variables can be declared, and local procedures can be defined. A return statement can be used to exit the procedure with a byte value.

Build in procedures

There are a number of build in procedures. With the procedures write, and writel, a string can be printed, where the writel will add a line break at the end. Both these procedure can either have a constant string, or an array variable as their argument. The procedure readln, which should be called with an array variable, reads a single line from the current channel. The procedure use, which takes an integer constant, can be used to select the current input/output channel. The procedure aux, which takes an integer constant, can be used to select the input/output mode. The procedure open, which takes an string argument, can be used to open a file, and the procedure close, which takes an integer constant, can be used to close a channel. The procedure dos, which has no arguments, can be used to terminate the program and return to the Atari DOS prompt.

Grammar

Below the grammar for the language is given.
root : program eof .
program : ("var" (var_name ( "[" int "]" | "@" int )OPT )LIST ";" )OPT
          ("proc" proc_name ( "(" var_name ")" )OPT ";" program )CHAIN ";" OPT
          "begin"
          statements
          "end".

statements : (statement CHAIN ";")OPT.

statement : var_name "[" expr "]" "=" expr
          | var_name "=" expr
          | "@" var_name "=" expr
          | "if" cond_expr "then" statements "per" OPT
            ("elif" cond_expr "then" statements "per" OPT)SEQ OPT
            ("else" statements "per" OPT)OPT
            "fi"
          | "rep" statements
            "rif" cond_expr "then" statements "per" OPT
            ("elif" cond_expr "then" statements "per" OPT)SEQ OPT
            ("else" statements "per" OPT)OPT
            "fi"
          | "return" expr
          | proc_name ("(" expr ")")OPT
          | "use" "(" int ")"
          | "aux" "(" int ")"
          | "open" "(" (var_name | string_const) ")"
          | "close" "(" int ")"
          | "write" "(" (var_name | string_const) ")"
          | "writel" "(" (var_name | string_const) ")"
          | "readln" "(" var_name ")"
          | "dos"
          .

cond_expr : exprr ("<>" | "<=" | "<" | ">=" | ">" | "=") expr
          | expr .

expr : int
     | char_const
     | "(" expr ")"
     | "@" var_name
     | proc_name "(" expr ")"
     | var_name ("[" expr "]")OPT
     | expr ("+" | "+c" | "-" | "-c" | "&" | "|" | ":") expr
     .

The terminals are defined as follows:
int : ("0" .. "9")SEQ.
char_const : "'" any_char "'".
string_cons : "'" non_single_quote_char "'".
var_name : ("a" .. "z")SEQ.
proc_name : ("a" .. "z")SEQ.

The Ecom language

Ecom is a small language, that was designed in such a way that it can be compiled easily into efficient 6502 machine code.

The sources for Ecom

The code which follows is an almost exact copy of the code as I wrote it down. I only corrected some grammar mistakes (mainly missing semi-colons). I also translated some Dutch variable names into their English equivalents.

variables are defined

var
a@218, as[80]
These are used to store a line from the input. The variable a is placed in the zero-page.
top, depth, first[11],
Used to manage the nested identifier scopes.
id[10],
Contains the last identifier read from the input.
vn[100],ve[100],vt[100],
vd[100],vv[100],vy[100],
leng[100],type[100],
adrl[100],adrh[100],
Used to store information of up to hundered identifiers, with their storage length, type (e.g. procedure or variable name), and address.
pnr,padl[40],padh[40],
sym,
char[1],
len,type,vnr,
cl,ch,vl,vh,
code[2],pcl,pch,
(Unknown)
stack[100],sp,
Contains the stack used during parsing.
ast,st,
labk[3],
labell[100],labelh[100],lnr,
rlab,
op,
ofsetl,ofseth,
startl,starth,
file[16],flen;

Error printing procedure

proc err;
begin
use(0);writel('error');
char[0]=as[a];
write('at ');writel(char);
readln(as);
close(1);close(2);
dos
end;

proc typerr;
begin
use(0);writel('wrong type');err
end;

Scanning procedures

proc space;
begin
rep
rif as[a]=155
then use(1);
     readln(as);
     use(0);
     writel(as);
     use(2);
     a=0 per
elif as[a]=' '
then a=a+1 per
fi
end;

proc symbol;
begin
space sym=scan(as)
end;

proc expect(s);
begin
if sym<>s
then use(0);char[0]=s;write(char);
     writel(' expected');err
fi; symbol
end;

proc expch(ch);
begin
if as[a]<>ch
then use(0);char[0]=ch;write(char);
     writel(' expected');err
fi; a=a+1; symbol
end;

proc testch(ch);
begin
if as[a]=ch then a=a+1; symbol; return 255
            else return 0
fi
end;

Identifier management procedures

proc ident;
var i,ch;
 proc add;
 begin
 if i<6
 then id[i]=ch; i=i+1
 fi; a=a+1
 end;
begin
space;
i=0;
rep ch=as[a]
rif ch>='a'
then if ch<='z' then add per fi
elif ch>='0'
then if ch<='9'
     then if i<>0 then add per fi
     fi
fi;
if i=0
then return 0
else rep rif i<6
         then id[i]=' ';i=i+1 per
         fi
fi
return 255
end;

proc lookup(eind);
var i;
begin
i=top;
rep i=i+1
rif i>=eind
then if id[0]=vn[i] then
     if id[1]=ve[i] then
     if id[2]=vt[i] then
     if id[3]=vd[i] then
     if id[4]=vv[i] then
     if id[5]=vy[i]
     then typ=type[i];
          len=leng[i];
          vnr=i;return i
     fi fi fi fi fi fi
     per
fi;
typ=0;
return 0;
end;

proc addvar;
begin
vn[top]=id[0];
ve[top]=id[1];
vt[top]=id[2];
vd[top]=id[3];
vv[top]=id[4];
vy[top]=id[5];
leng[top]=len;
type[top]=typ;
top=top+1;
if top>99
then use(0);
     writel('too many vars');err
fi
end;

proc uniek
begin
if lookup(first[depth])<>0
then use(0);
     writel('double def');
     err
fi
end;

Procedure for reading constants

proc const;
var i;
  proc readc(base);
  var d;
    proc add(v);
    var lo,hi,i;
    begin
    lo=cl;hi=ch;i=1;
    rep i=i+1;lo=lo+ cl;hi=hi+c ch
    rip i<base then per
    fi;
    cl=lo+v;ch=hi+c0
    end;
    proc digit;
    var ch;
    begin
    ch=as[a];
    if ch<='9'
    then if ch>='0'
         then return ch-'0'
         fi
    elif ch<='f'
    then if ch>='a'
         then return ch-'a'+10
         fi
    fi;return 255
    end;
  begin
  rep d=digit
  rif d<base
  then add(d);a=a+1;i=i+1 per
  fi
  end;
begin
i=0;ch=0;cl=0;
if testch(''')<>0
then i=1;cl=as[a];a=a+1; expch(''')
elif testch('#')<>0 then readc(16)
elif testch('b')<>0 then readc(2)
else readch(10)
fi; symbol; return i;
end

Code generation procedure

proc c(cod)
begin
@code=cod;
code[0]=code[0]+1;code[1]=code[1]+c0;
pcl=pcl+1;pch=pch+c0
end;

proc push(i)
begin
stack[sp]=i;
sp=sp+1;
if sp=100
then use(0);writel('stckover');err
fi
end;

proc pop;
begin
if sp=0
then use(0);writel('stckunder');err
fi
sp=sp-1;
return stack[sp]
end;

proc adrvnr;
begin
vl=adrl(vnr);
vh=adrh(vnr)
end;

proc pla;
begin
if ast=1 then c(104) fi
end;

proc pha;
begin
if ast=1 then c(72) fi
end;

proc pusha;
begin
push(ast);pha;ast=0
end;

proc incvnr;
begin
vl=vl+1;vh=vh+c0
end;

proc acode;
begin
c(vl);c(vh)
end;

proc icode(i);
begin
if i=0 then c(208)
elif i=1 then c(144);c(5);c(240)
elif i=2 then c(144);
elif i=3 then c(176);
elif i=4 then c(144);c(2);c(268)
elif i=5 then c(240)
fi; c(3)
end;

proc lcode;
begin
if st>12
then c(8);c(169);c(255);c(40);
     icode(st-25);c(169);c(0);c(234);
     st=1
fi
end;

proc mcode(base);
begin
lcode;
if st<=3
then if   st=1 then c(133)
     elif st=2 then c(135)
               else c(132)
     fi; c(212);
     if st=1 then c(104) fi;
     c(base-4);c(212)
elif st=4 then c(base);c(cl)
elif st=5 then c(base+4);acode
elif st=6 then c(base+20);acode
elif st=7 then c(base+16);acode
elif if vh<>0
     then pha;c(173);c(vl);c(vh);
          c(133);c(213);incvnr;
          c(173);c(vl);ch(vh);
          c(133);c(214);vl=213;pla
     fi;
     if st=8 then c(base-8)
             else c(base+8)
     fi; c(vl)
fi; st=1
end;

proc sta;
begin
if   st=1 then
elif st=2 then c(170)
elif st=3 then c(168)
elif st>=5
then if st<=9
     then mcode(137)
     else typerr
     fi
else typerr
fi
end;

prov lda;
begin
lcode;
if st<>1
then pha; ast=1;
     if   st=2 then c(138)
     elif st=3 then c(152)
     else mcode(169)
     fi; st=1
fi
end;

proc ocode;
begin
if   op=17 then sta
elif op=18 then mcode(105)
elif op=19 then c(24);mcode(105)
elif op=20 then mcode(233)
elif op=21 then c(56);mcode(233)
elif op=22 then mcode(41)
elif op=23 then mcode(9)
elif op=24 then mcode(73)
else mcode(201);st=op
fi
end;

proc jmpto(lab);
begin
c(242);c(76);c(lab)
end;

proc makelab(lab);
begin
labell[lab]=pcl;labelh[lab]=pch
end;

proc newlab;
begin
if lnr=99
then use(0);writel('tm lab');err
fi;
lnr=lnr+1
end;

proc jmptol(l);
begin
if labk[l]=0
then labk[l]=newlab
fi;
jmpto(labk(l))
end;

proc makecl(l);
begin
if labk[l]<>0
then makelabel(labk[l])
fi
end;

proc monad(s);
begin
if   st=2
then if   s=1 then c(232)
     elif s=2 then c(202)
     else typerr
     fi
elif st=3
then if   s=1 then c(200)
     elif s=2 then c(184)
     else typerr
     fi
else if   s=1 then s=238
     elif s=2 then s=206
     elif s=3 then s=14
     elif s=4 then s=46
     elif s=5 then s=78
     elif s=6 then s=110
     else typerr;
     if   st=5 then c(s);acode
     elif st=6 then c(s+16);acode
     else typerr
     fi
fi
end;

List parsing procedure

The grammar rule:
list : expr CHAIN ";".
is parsed by:
proc list;
begin
if sym>=8
then if sym<=14 then st=1;return 0
     fi
fi;
rep ast=0;st=0;expr
rif testch(';')<>0 then per
fi;
if ast=0 then lda fi
end

Do-expression parsing procedure

The grammar rule:
do_expr : "do" list "od".
is parsed by:
proc doexpr;
begin
rlab=rlab+1;labell[rlab]=pcl;labelh[rlab]=pch;
push(labk[0]);labk[0]=0;
pusha;list;ast=pop;st=1;
if sym=12
then symbol;c(76);c(labell[rlab]);
     c(labelh[rlab])
fi; expect(14);
makecl(0);labk[0]=pop;rlab=rlab-1
end

If-expression parsing procedure

The grammar rule:
if_expr : "if" expr "then" list
        ("elif" expr "then" list)SEQ OPT
        ("else" list)OPT
        "fi"
is parsed by:
proc ifexpr;
var nlab;
  proc endthen;
  begin
  if sym=12
  then if rlab=0 then err fi;
       symbol;c(76);c(labell[rlab]);
       c(labelh[rlab])
  elif sym=13
  then if rlab=0 then err fi;
       symbol;jmptol(0)
  elif sym<>11
  then jmptol(1)
  fi
  end;
begin
push(labk[1]);labk(1)=0;
push(labk[1]);push(nlab);pusha;
rep
 nlab=newlab;labk[2]=0;
 rep symbol;ast=0;st=0;expr;
     if st<12 then lda;st=25 fi
 rif sym=16
 then icode(30-st);jmptol(2) per
 else icode(st-25);jmpto(nlab);
      if sym=15 then per
      fi; expect(6);makecl(2);
      list;endthen;makelab(nlab)
 fi;
rif  sym=9 then per
elif sym=10 then symbol;list;endthen
fi;
expect(11);
makel(1);ast=pop;st=1;
nlab=pop;labk[2]=pop;labk[1]=pop
end;

Array parsing procedure

The grammar rule:
array ("." const)OPT
      ("[" expr "]")OPT.
is parsed by:
proc array;
bein
if   typ=2
then adrvnr;st=5
elif if typ<>3 then typerr fi;
     if testch('.')<>0
     then const else cl=0
     fi;
     if testch('[')<>0
     then push(cl);
          push(ast);push(vnr);
          expr;lcode;vnr=pop;
          if   st=1 then c(168)
          elif st<=3 then st=st+4
          elif st=4 then c( );c(cl);st=7;
          elif st=5 then c( );c(vl);c(vh);st=7
          else pha;mcode(169);c(168);
               st=1
          fi;
          ast=pop;
          if st=1 then st=7;pla fi;
          cl=pop;expch(']')
     else st=5
     fi;
     vl=adrl(vnr)+ cl;vh=adrh(vnr)+c0
fi
end;

Procedure for checking an identifier

proc expid;
begin
if   ident=0
then use(0);writel('no id');err
elif lookup(1)=0
then use(0);writel('undef');err
fi; symbol
end;

Factor parsing procedure

The grammar rule:
factor : "(" expr ")"
       | "a"
       | "x"
       | "y"
       | do_expr
       | if_expr
       | "begin" list "end"
       | "?" ident array
       | const
       | ident ( array | "(" var ")" )OPT
       .
is parsed by:
proc factor;
begin
if   testch('(')<>0
then expr;expch(')')
elif testch('a')<>0 then st=1
elif testch('x')<>0 then st=2
elif testch('y')<>0 then st=3
elif sym=7 then symbol;doexpr
elif sym=5 then ifexpr
elif sym=4 then symbol;pusha;list;ast=pop;expect(8)
elif testch('?')<>0
then expid;array;
     if   st=5 then c(160);c(0);st=9
     elif st=6 then st=8
     elif st=7 then st=9
     fi
elif sym<>0
then use(0);writel('ilsym');err
elif const<>0 then st=4
else expid;
     if typ>=2
     then array
     else pusha;
          if len>0
          then expch('(');push(vnr);
               expr;lda;var;pop;
               expch(')')
          fi;
          if type[vnr]=0
          then c(242);c(32);c(adrh[vnr])
          else c(32);c(adrl[vnr]);c(adrh[vnr])
          fi;
          st=1;ast=pop
     fi
fi
end;

Operator parsing procedure

The grammar rule:
oper : "->"
     | "+c" | "+" | "-" | "-c"
     | "&" | "|" | ":"
     | "<>" | "<=" | "<"
     | ">=" | ">" | "="
     | "inc" | "dec"
     | "ror" | "rol" | "lsr" | "asl"
     .
is parsed by:
proc oper;
begin
if sym>=17
then if sym<36
     then op=sym;symbol;return 1
     fi
fi; return 0
end;

Expression parsing procedure

The grammar rule:
expr : factor CHAIN oper.
is parsed by:
proc expr;
begin
factor;
if oper<>0
then lda;
     rep
       if st>12 then lcode fi;
       if op>=33
       then const;
            if   op=33 then op=
            elif op=34 then op=
            elif op=35 then op=
                       else op=
            fi;
            cl=cl&7;
            rep c(op);cl=cl-1
            rif cl>0 then per
            fi
       else push(op);factor;op=pop;
            opcode
       fi
    rif oper<>0 then per
    fi
fi
end;

Variable declaration parsing procedure

The grammar rule:
vars : "var"
       ( ident ( "[" const "]"
               | "?" | "$" )OPT
         ( "@" const )OPT
       ) CHAIN "," ";" .
is parsed by:
proc vars;
begin
symbol;
rep
ident; uniek; space;
if testch('[')<gt;0
then const; len=cl;typ=3;
     space;expch(']')
else testch('?')
then len=2;typ=3
elif testch('$')
then len=3;typ=3
else len=1;typ=1
fi;
if   testch('@')<gt;0
then const;space
else cl=ofsetl;ch=ofseth;
     ofsetl=cl+len;ofseth+c0
fi;
adrl[top]=cl;adrh[top]=ch;
addvar;
rif testch(',')<gt;0
then per
fi; expch(';')
end;

Procedure parsing procedure

The grammar rule:
procs : ident ( "(" ident ")" )OPT ":"
       vars OPT
       procs SEQ OPT
       factor ";"
       .
is parsed by:
proc procs;
var par
begin
symbol;ident;
if lookup(first[depth])=0
then typ=1;var=top;
     if as[a]='(' then len=1 else len=0 fi;
     addvar
else if typ<>0 then err fi;
     len=leng[vnr]
fi; push(vnr);
depth=depth+1;
if depth>10
then use(0);writel('too deep');err
fi; first[depth]=top;
if   len=1
then expch('(');ident;push(vnr)
     adrl[top]=ofsetl;adrh[top]=ofseth;
     ofsetl=ofsetl+1;ofseth=ofseth+c0;addvar;expch(')')
else push(0)
fi; expch(':');
if sym=1 then vars fi;
rep rif sym=3 then procs per fi;
par=pop;vnr=pop;typ=type[vnr];
if typ=0
then symbol;pnr=pnr+1;
     adrl[vnr]=pnr;typ=0
fi;
if typ=1
then adrl[vnr]=pcl;adrl[vnr]=pch;
     if par<>0
     then c(141);c(adrl[par]);c(adrh[par])
     fi; factor; c(96)
fi;expch(';');
top=first[depth];
depth=depth-1
end;

Main procedures

The grammar rule:
program : vars OPT
          procs SEQ OPT
          "begin" list "end".
is parsed the first time by:
proc passone;
begin
first[0]=1;depth=0;top=1;
sp=0;pnr=0;
lnr=0;rlab=0;
file[flen]=84;file[flen+1]=88;
file[flen+2]=84;
use(0);write('reading: ');writel(file);
use(1);aux(4);open(file);
a=0;as[0]=155;
symbol;
if sym=1 then vars fi;
rep rif sym=3 then procs per fi;
startl=pcl;starth=pch;
expect(4);list;c(96);
if sym<gt;8 then err fi;
use(0);
writel('program o.k.');
close(1)
end;
The second pass, patches the code, such that the right addresses are filled in:
proc passtwo;
var endl,endh;
  proc w(op);
  begin
  char[0]=0;write(char);
  end;
  proc n;
  var res;
  begin
  res=@code;
  code[0]=code[0]+1;code[1]=code[1]+c0;
  return res
  end;
  proc trans(op);
  begin
  if   op&8=0
  then if   op=32 then w(op); w(n); w(n)
       elif op&159=0 then w(op);
       elif op&15<>2 then w(op);w(n)
       elif op=242
       then op=n;w(op);
            if op=32
            then op=n;w(padl[op]);w(padh[op])
            else op=n;w(labell[op]);w(padh[op])
            fi
       fi
  elif op&13=8 then w(op)
  elif op&31=9 then w(op);w(n)
               else w(op);w(n);w(n)
  fi
  end;
begin
endl=code[0];endh=code[1];code[0]=...;code[1]=...
rep
trans(n)
rif  code[1]<endh then per
elif code[0]<endl then per
fi; w..
end;
The main body of the program is missing.


hacker page