This page describes a way of presenting values (with all their
structure) in a character-based, readable manner, much like XML.
However, the value representation given here avoids many of the
drawbacks of XML. (Read XML sucks.
See also XML alternatives.)
It overcomes the following drawbacks of XML:
It should be noted that some ad-hoc decisions have been made
with respect to the representation of string values. Also, the
representation is based on the ASCII character set.
This page is part of the specification
language presented on my "The Art of
Programming" pages. The values described here match the
type definitions given in the description of the
static aspects of the language.
Another page deals with the dynamic
aspects of the language.
For the representation of the elementary types we use the following conventions:
- For the void type, the void value is represented by the string "void".
- For the boolean type, the true and false values are represented by the
strings "true" and "false".
- The values of the various numeric types (like Num, NatNum,
and PosNum) are represented by strings consisting of the digits
"0" to "9", prefixed with the minus ("-") character for negative numbers.
- Rational values (of the RatNum type) are represented by two numbers
separated by the slash ("/") character, where the second number cannot have
a prefixing minus character.
As an alternative, they can also be represented
by the usual base-10 fraction representation with exponent, which consists
of an optional minus character, a sequence of digits, a period (".") character,
a sequence of digits, and an optional exponent expression, which starts
with the "e" character, followed by an optional minus character and
a sequence of digits.
- The real numbers cannot be represented with a finite representation,
otherwise they would not be real numbers. That means that they cannot
be represented explicitly.
- Values for the enumerate types are simply represented by an identifier
consisting of the usual alphabetic and numerical characters and the
underscore ("_") character, but may not start with a numerical character.
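The conventions above can be sketched as a small recognizer. This is only an illustration in Python: the function name parse_elementary, the (kind, value) result pairs, and the use of Fraction for rational values are assumptions of the sketch, not part of the representation itself.

```python
import re
from fractions import Fraction

# Patterns follow the conventions listed above; everything else in this
# block (names, result format) is an illustrative assumption.
INTEGER = re.compile(r"-?[0-9]+")
RATIONAL = re.compile(r"(-?[0-9]+)/([0-9]+)")
DECIMAL = re.compile(r"(-?[0-9]+\.[0-9]+)(?:e(-?[0-9]+))?")
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def parse_elementary(text):
    """Return a (kind, value) pair for one elementary value."""
    if text == "void":
        return ("void", None)
    if text in ("true", "false"):
        return ("boolean", text == "true")
    if INTEGER.fullmatch(text):
        return ("number", int(text))
    match = RATIONAL.fullmatch(text)
    if match:
        # A zero denominator makes Fraction raise ZeroDivisionError.
        return ("rational", Fraction(int(match.group(1)), int(match.group(2))))
    match = DECIMAL.fullmatch(text)
    if match:
        mantissa = Fraction(match.group(1))
        exponent = int(match.group(2) or 0)
        return ("rational", mantissa * Fraction(10) ** exponent)
    if IDENTIFIER.fullmatch(text):
        return ("enumerate", text)
    raise ValueError("not an elementary value: " + repr(text))
```

For instance, parse_elementary("3/4") yields a rational value, while a string such as "1e5" is rejected because the fraction form requires a period.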
String values represent arbitrary-length sequences of ASCII characters.
We use the string representation of the C language, with a simple
extension. In the C programming language, strings consist of one or
more parts, where each part consists of ASCII characters with escape
sequences surrounded by double quotes. An arbitrary amount of white-space
(spaces, tab characters, newline characters) may separate the parts.
The extension we suggest consists of an alternative part form, in
which each ASCII character is represented by two hexadecimal characters,
surrounded by hash ('#') characters. The two forms can be arbitrarily
mixed.
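A decoder for such multi-part strings might look roughly as follows. The sketch assumes that a hexadecimal part is written as one pair of hash characters around a run of two-digit codes (for example #6C6F#), which this page does not spell out, and it handles only a handful of the C escape sequences. The name decode_string is hypothetical.

```python
import re

# Assumption: a hex part looks like #6C6F#, one pair of hash characters
# around a run of two-digit hexadecimal codes.
PART = re.compile(r'\s*(?:"((?:[^"\\]|\\.)*)"|#((?:[0-9A-Fa-f]{2})*)#)')

# Only a few C escape sequences are handled; others raise KeyError.
ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", '"': '"', "0": "\0"}


def decode_string(text):
    """Decode a multi-part string representation into a Python str."""
    result = []
    position = 0
    while position < len(text):
        match = PART.match(text, position)
        if match is None:
            if text[position:].strip() == "":
                break  # only trailing white-space is left
            raise ValueError("bad string part at offset %d" % position)
        quoted, hexadecimal = match.group(1), match.group(2)
        if quoted is not None:
            # Replace each backslash escape by the character it denotes.
            result.append(
                re.sub(r"\\(.)", lambda m: ESCAPES[m.group(1)], quoted))
        else:
            # Two hexadecimal characters per ASCII character.
            result.append(bytes.fromhex(hexadecimal).decode("ascii"))
        position = match.end()
    return "".join(result)
```

Under these assumptions, the three parts "Hel", #6C6F#, and ", world\n" written after each other decode to a single string.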
The various set and list values are represented by a sequence of
comma-separated values surrounded by brackets and preceded by
one of the keywords "set", "orderedset", and "list":
set(3, 4, 5)
list(set(3), set(5), set(5))
orderedset(list(2), list(3, 4))
A record value is represented by a sequence of comma-separated
identifier-value pairs surrounded by brackets and
preceded by the keyword "record". Each identifier-value
pair consists of an identifier and a value, separated by a colon:
record(a:4, b:3, c:record(v:1, w:3))
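To show how regular this syntax is, here is a minimal recursive-descent parser sketch for set, list, orderedset, and record values with integer leaves. The tagged-tuple result format and all function names are assumptions made for the illustration; strings, maps, and references are left out, and an empty bracket pair is accepted although the page does not say whether it is allowed.

```python
import re

# Tokens: keywords/identifiers, integers, and single punctuation marks.
TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|-?[0-9]+|[(),:])")


def tokenize(text):
    tokens, position = [], 0
    text = text.rstrip()
    while position < len(text):
        match = TOKEN.match(text, position)
        if match is None:
            raise ValueError("unexpected character at offset %d" % position)
        tokens.append(match.group(1))
        position = match.end()
    return tokens


def expect(tokens, wanted):
    if not tokens or tokens.pop(0) != wanted:
        raise ValueError("expected %r" % wanted)


def parse_sequence(tokens, parse_item):
    """Parse '(' item ',' item ... ')' and return the items as a list."""
    expect(tokens, "(")
    items = []
    if tokens and tokens[0] == ")":
        tokens.pop(0)
        return items
    while True:
        items.append(parse_item(tokens))
        separator = tokens.pop(0) if tokens else None
        if separator == ")":
            return items
        if separator != ",":
            raise ValueError("expected ',' or ')'")


def parse_field(tokens):
    """Parse one 'identifier : value' pair of a record."""
    name = tokens.pop(0)
    expect(tokens, ":")
    return (name, parse_value(tokens))


def parse_value(tokens):
    """Consume one value from the front of the token list."""
    token = tokens.pop(0)
    if token in ("set", "orderedset", "list"):
        return (token, parse_sequence(tokens, parse_value))
    if token == "record":
        return ("record", parse_sequence(tokens, parse_field))
    return int(token)  # this sketch only knows integer leaves


def parse(text):
    tokens = tokenize(text)
    value = parse_value(tokens)
    if tokens:
        raise ValueError("trailing input")
    return value
```

The parser keeps the elements in written order and does not remove duplicates, so it reflects the representation rather than the value: set(5, 5) and set(5) parse to different trees even though they represent the same set.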
One cannot really make a distinction between function and map
types without knowing the type of the domain. For this reason
function values are represented as map values. A map value is
represented by a sequence of comma separated individual mappings
surrounded by brackets and preceded by the keyword "map".
Each individual mapping consists of a pair of values from the
domain and codomain separated by "->" and surrounded by brackets.
The same short-hand (with respect to record types) that was
introduced for the function and map types, will also apply to
the map values.
map((name:"Tom" -> age:34),
(name:"John" -> age:27))
map((4 -> 5),(6 -> 7))
map((list(3,4) -> a:set(5,6)),
(list(3,5) -> a:set(7,8)))
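Going in the other direction, a value held in a program can be written out in this notation. The sketch below maps Python values onto the representation; the choice to render a Python dict as a map (rather than a record), the sorting of set elements to get a stable output, and the name format_value are assumptions. The record short-hand is not produced.

```python
def format_value(value):
    """Render a Python value in the notation described on this page."""
    if value is None:
        return "void"
    if isinstance(value, bool):  # must be tested before int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + escaped + '"'
    if isinstance(value, (set, frozenset)):
        # Sets have no order; sorting the rendered elements (as text)
        # just makes the output deterministic.
        return "set(" + ", ".join(sorted(format_value(v) for v in value)) + ")"
    if isinstance(value, (list, tuple)):
        return "list(" + ", ".join(format_value(v) for v in value) + ")"
    if isinstance(value, dict):
        mappings = ("(%s -> %s)" % (format_value(k), format_value(v))
                    for k, v in value.items())
        return "map(" + ", ".join(mappings) + ")"
    raise TypeError("no representation for " + type(value).__name__)
```

With this, the dictionary {4: 5, 6: 7} is written as map((4 -> 5), (6 -> 7)).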
Relation values can simply be seen as a set of records. The
keyword "relation" will be used.
An example is:
Named values are represented by a pair of an identifier and
a value, separated by a comma, surrounded by brackets, and
preceded by the keyword "named". For named records
the keyword "namedrecord" may be used, where the first
element within the brackets represents the name.
There are two ways in which references can be defined in
a value. One is by means of reference labels. The other is
by means of path expressions.
Referencing through labels
Referencing through labels is simply done by labeling
a certain value with a unique label, and using this
label at the place where the value is referenced.
Any value can be labeled by post-fixing it with
the "@" symbol and a unique identifier or number.
A labeled value is referenced by the "#" symbol followed
by a unique identifier. In case a referenced value
is itself a reference, that reference should be followed.
It should be noted that the labels are not part of the
value itself, they are only used for the referencing.
One can change any label into another unique identifier
(and all references to it), without changing the value
being represented by the value representation.
A disadvantage of the use of unique identifiers as labels
is that it introduces a semantic correctness criterion
on the representation of values. (If the unique identifier does
not exist in the value representation or if it is not
unique, the reference is rendered as void.) Also, the
case of self-referencing (such as, for example, in #a@a)
is considered as equal to void.
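The rules for following labels can be sketched as below. The sketch assumes the labeled values have already been collected into a flat list of (label, value) pairs, uses None as a stand-in for void, and extends the self-reference rule to longer reference cycles; the last point is an assumption, as are the names Ref and resolve.

```python
from collections import Counter

VOID = None  # stand-in for the void value


class Ref:
    """A reference such as #a, holding the label name it points at."""

    def __init__(self, label):
        self.label = label


def resolve(ref, labeled_values):
    """Follow a reference through a list of (label, value) pairs."""
    counts = Counter(label for label, _ in labeled_values)
    table = dict(labeled_values)
    seen = set()
    value = ref
    # A referenced value that is itself a reference is followed in turn.
    while isinstance(value, Ref):
        label = value.label
        if counts[label] != 1 or label in seen:
            return VOID  # missing, duplicated, or cyclic label
        seen.add(label)
        value = table[label]
    return value
```

Renaming a label consistently in the list and in every Ref leaves the result of resolve unchanged, which matches the remark that labels are not part of the value.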
Referencing through path expressions
Path expressions avoid the need for unique identifying labels.
Each path expression starts with a number of "^" symbols
that indicate the number of "levels" to go up, possibly followed
by an expression selecting a part of the value at that level.
The levels are simply defined by the surrounding brackets, meaning
that each "^" symbol stands for one open bracket.
Depending on the type of the value, different mechanisms for
selecting are used. If the value is a record, field selection
can be used, which consists of a period followed by the field name.
If the value is of a list type, or some other ordered type, an index-based
selection can be used, which consists of a positive number,
surrounded by square brackets. If the value is a map type, selection
based on a domain value between round brackets can be used. Again,
the following short-hand notation can be used, in case the domain is
a record type:
(id1:v1, ..., idn:vn)
In this "idi" stands for an identifier,
and "vi" stands for a value expression.
The value selected in this manner, is the codomain part of the selected
element in the map value, except when the reference expression occurs
directly inside a map value expression.
An example is:
record(
X : list(map((c:3 -> d:7),
(c:4 -> d:5)),
map((c:4 -> d:2))),
Y : set(^^.X(c:3).d),
Z : map(^^.X(c:4)))
In this example "^^.X(c:3).d" stands for the value 7.
In this case, it would have been shorter to just have written 7,
which would have represented exactly the same value.
In this example "^^.X(c:4)" stands for the value
"(c:4 -> d:2)".
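The way a path expression is evaluated can be sketched on ordinary Python data. The sketch models records and maps as dicts and lists as Python lists, takes list indices to start at 1, and requires at least one "^"; all of these are assumptions, and the function names are hypothetical. It does not reproduce the example above literally, since map keys are simplified to plain numbers.

```python
def select(value, steps):
    """Apply selection steps to a value built from dicts and lists."""
    for kind, argument in steps:
        if kind == "field":      # .name on a record
            value = value[argument]
        elif kind == "index":    # [n] on a list, counting from 1
            value = value[argument - 1]
        elif kind == "key":      # (domain value) on a map
            value = value[argument]
        else:
            raise ValueError("unknown step kind: " + kind)
    return value


def resolve_path(enclosing, levels_up, steps):
    """Resolve a path written inside nested values.

    enclosing lists the values whose brackets surround the path,
    outermost first; each "^" moves one entry towards the front.
    """
    if not 1 <= levels_up <= len(enclosing):
        raise ValueError("path leaves the outermost value")
    return select(enclosing[-levels_up], steps)


# A simplified stand-in for a record with a map field X and a field Y.
whole = {"X": {3: {"d": 7}, 4: {"d": 5}}, "Y": []}
steps = [("field", "X"), ("key", 3), ("field", "d")]
found = resolve_path([whole, whole["Y"]], 2, steps)
```

Here the path is imagined to sit inside the value of Y, so two "^" symbols lead back to the surrounding record, from which the field X, the mapping for 3, and the field d are selected in turn.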
Each value expression can be followed by a type definition
as described in the static aspects
of the specification language. This is done by following
the value with a colon and the type specification.
An example is:
set(3,4,9) : Set(PosNum)
Although it is obvious that the above set contains only numbers,
the explicit typing states that it may only contain positive numbers.
The problem with complete typing is that it introduces a
semantic correctness criterion on the represented value.
This requires an interpretation of what the value represents
in case it does not match the given type expression. There
is no simple way out. Consider the following examples:
set(-1,4,5) : Set(PosNum)
-1 : PosNum
For the first example, one could introduce the rule that
states that the represented value is equal to set(4,5)
or that it is equal to void. However, such a rule
cannot be devised for the second example, because void
is not in the type PosNum.
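The difficulty can be made concrete with a small conformance check. In this sketch, types are modelled as the strings "Num", "NatNum", and "PosNum" or as a pair ("Set", element type), and the numeric ranges (NatNum from 0, PosNum from 1) are guessed from the type names; neither is taken from the description of the static aspects. The function drop_mismatches implements just one of the candidate rules mentioned above.

```python
def conforms(value, type_expr):
    """Check a value against a tiny subset of type expressions."""
    if isinstance(type_expr, tuple) and type_expr[0] == "Set":
        return isinstance(value, frozenset) and all(
            conforms(item, type_expr[1]) for item in value)
    if not isinstance(value, int) or isinstance(value, bool):
        return False
    # Assumed lower bounds; None means the type is unbounded below.
    minimum = {"Num": None, "NatNum": 0, "PosNum": 1}[type_expr]
    return minimum is None or value >= minimum


def drop_mismatches(value, element_type):
    """One candidate rule for sets: keep only the conforming elements."""
    return frozenset(item for item in value if conforms(item, element_type))
```

For the set example the rule has something to return, namely the set of 4 and 5. For -1 : PosNum there is no comparable fallback, because any substitute value would itself have to lie in PosNum.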