MExpr tutorial

One design objective of MExpr is to make the concrete syntax very close to the abstract syntax of the language. That is, no syntactic sugar is introduced to the concrete MCore syntax. The MExpr language is not intended to be a general-purpose programming language. Instead, the aim of MCore is to be a core language that other languages can be translated into.

Nevertheless, to understand the Miking system, it is a good idea to learn to write basic programs directly as MCore expressions.

In the end, an MCore file .mc is always translated into an MCore expression. If an MCore file contains mexpr 5, it means that the final expression of the program is value 5. That is, mexpr states the start of the program and is followed by the actual MExpr of the program. If the keyword mexpr is left out of the file, a default mexpr unit value () is the resulting value.

Unit Test Expressions

When writing MCore programs, it is typically done by writing explicit unit tests as part of the code. For instance

mexpr
utest addi 1 2 with 3 in
()

checks that the addition of 1 and 2 is in fact 3. To run the tests in an .mc file, run the mi command with argument test (assuming that you have now installed mi in your path):

mi run program.mc --test

Typically when you develop MCore programs, you do not use the print function. Instead, you write unit tests directly and then leave the units tests as is directly after your function. By doing so, you test your code, write regression tests, and document the informal semantics of your program directly. We strongly encourage you to develop your MCore programs this way.

In the rest of this document, we omit the mexpr keyword for brevity, and just write the MExpr itself. Remember to add it as appropriate when trying the various examples.

The equality between the left-hand and right-hand sides in a utest is automatically deduced for many expressions. However, if you want to compare more complex language structures, or want to define equality in a different way, you can supply a custom equality function as, e.g.,

utest addi 1 2 with 0 using neqi in ()

where neqi : Int -> Int -> bool returns true if two integers are not equal. Moreover, when a test fails the left-hand and right-hand sides are pretty printed to standard out. For more complex expressions you may have to or want to modify this string representation. There is therefore an option to supply a custom toString : a -> b -> String function to utests with the following syntax (a and b are the types of the left-hand and right-hand sides, respectively)

utest addi 1 2 with 0 using neqi else lam l. lam r. "1+2 should not be 0" in ()

The function lam l. lam r. "1+2 should not be 0" is only applied to the left-hand and right-hand sides of the of utest if the test fails.

Intrinsics

MCore contains a number of built-in values (intrinsics) and predefined functions and constants (part of the standard library). For instance,

print "Hello\n"

uses the built-in function print which has the type String -> (), i.e., it prints a string and returns the unit type.

The current documentation of intrinsics is implicit via code containing utest expressions. Please see the following files:

Let Expressions

Expressions can be given names using let expressions. For instance

let x = addi 1 2 in
x

introduces a new name x. The built-in function addi performs an addition between two integers. Note that MCore uses a call-by-value evaluation order, which means that expressions are evaluated into a value before they are applied to a function or substituted using a let expression. Hence, the expression addi 1 2 is evaluated before it is substituted for x in the rest of the expression.

Functions

Functions are always defined anonymously as lambda functions. If you would like to give a function a name, a let expression can be used. For instance, the following program defines a function double that doubles the value of its argument.

let double = lam x. muli x 2 in
utest double 5 with 10 in
()

MExpr is statically typed, using a polymorphic type system based on FreezeML. If type annotations are omitted, types will be automatically inferred. However, types can also be explicitly expressed in MCore programs. For instance, the double function can be written as

let double = lam x:Int. muli x 2 in

This means that double has type Int -> Int, which can also be expressed as part of the let expression.

let double : Int -> Int = lam x. muli x 2 in

A function with several parameters are expressed using currying, using nested lambda expressions. For instance, expression

let foo = lam x. lam y. addi x y in
utest foo 2 3 with 5 in
()

creates a function foo that takes two arguments.

A lambda can also be defined without a variable, e.g., lam. e, where e is some expression representing the body of the function. Such notation is useful if the actual variable is not used inside e. Note that lam. e is syntactic sugar for a normal lambda lam #var"". e, where the identifier of the variable name is the empty identifier.

Sequencing

Sometimes an expression has a side effect and you are not interested in the returned value. If that is the case, you can use the sequence operator ;. For instance, suppose you would like to print a value in a function before it returns:

let foo = lam x.
  print x;
  x

The sequence operator ; is not a construct of pure MExpr, but syntactic sugar for a let construct. For instance, the pure version (without syntactic sugar) of the program above is as follows:

let foo = lam x.
  let #var"" = print x in
  x

Note that a variable with an empty identifier is used in the let expression. Moreover, note that a let expression

let _ = foo x in ...

is syntactically not valid since let expressions always bind a value to a variable. Underscore _ is a pattern and patterns are only allowed in match expressions.

`if` Expressions

Conditional expressions can be expressed using if expressions. The program

let x = 5 in
let answer = if (lti x 10) then "yes" else "no" in
utest answer with "yes" in
()

checks if x is less than 10 (using the lti function with signature Int -> Int -> Bool). If it is true, the string "yes" is returned, else string "no" is returned.

Note that an if expression is not a construct in pure MExpr. It is syntactic sugar for a match expression. That is, the expression

if x then e1 else e2

is translated into

match x with true then e1 else e2

Recursion

A normal let expression cannot be used to define recursive functions. Instead, recursion can be defined using recursive lets, starting with the recursive keyword:

recursive
let fact = lam n.
  if eqi n 0
    then 1
    else muli n (fact (subi n 1))
in

utest fact 0 with 1 in
utest fact 4 with 24 in
()

Recursive lets can also be used to define mutually recursive functions. For instance:

recursive
let odd = lam n.
    if eqi n 1 then true
    else if lti n 1 then false
    else even (subi n 1)
let even = lam n.
    if eqi n 0 then true
    else if lti n 0 then false
    else odd (subi n 1)
in

utest odd 4 with false in
utest even 4 with true in

Tuples

Product types can be expressed using tuples. An n-tuple is defined using syntax (e_1, ..., e_n) where e_1 to e_n are MCore expressions. Extracting a value from a tuple (projection) is performed using an expression e.n where e is the expression that is evaluated into a tuple, and n is an integer number representing the index of an element in the tuple. The first index in a tuple is 0.

For instance, in the MCore expression

let t = (addi 1 2, "hi", 80) in
utest t.0 with 3 in
utest t.1 with "hi" in
utest t.2 with 80 in
()

we create a 3-tuple t and project out its values. Note that the different elements of a tuple can have different types. In this case, tuple t has type (Int, String, Int).

Singleton tuples are also supported: (x,) represents a tuple with x as its only element.

Records

Another more general form of product types are records. A record has named fields that can have different types. For instance,

let r1 = {age = 42, name = "foobar"} in

defines a record of type {age : int, name : string}. The order of the fields does not matter:

utest r1 with {age = 42, name = "foobar"} in
utest r1 with {name = "foobar", age = 42} in

To project out a value, a dot notation may be used.

utest r1.age with 42 in
utest r1.name with "foobar" in
()

It is possible to replace the value of a field using the with keyword.

let r = {x = 1, y = 1} in
utest {x = 1, y = 2} with {r with y = 2} in ()

A record type is not just a general product type in MCore, it is the only product type. That is, a tuple is just syntactic sugar for a record. This means that the compiler encodes a tuple as a record, where the names of the fields are numbers 0, 1, etc. Labels can internally be any kind of string. For strings that cannot be defined as a normal identifier, the label form #label"x" can be used, where x is the string of the label.

The following example shows how a tuple is actually encoded as a record.

utest ("foo",5) with {#label"0" = "foo", #label"1" = 5} in ()

Data Types and `match` expressions

Sum types or variant types are constructed using con expressions (constructor expressions). In contrast to most other functional languages, the introduction of a new data type and the introduction of constructors are separated. For instance, the expression

type Tree in
con Node : (Tree,Tree) -> Tree in
con Leaf : (Int) -> Tree in

introduces a new data type Tree and defines two new constructors Node and Leaf. Constructor Leaf takes just one argument (an integer value for the leaf) and returns a tree, whereas the Node constructor takes a tuple with two trees as input and constructs a new tree node.

For instance, the expression

let tree = Node(Node(Leaf 4, Leaf 2), Leaf 3) in

is a small tree named tree.

Assume now that we want to count the sum of the values of all leafs in a tree. We can then write a recursive function that performs the counting.

recursive
  let count = lam tree.
    match tree with Node t then
      let left = t.0 in
      let right = t.1 in
      addi (count left) (count right)
    else match tree with Leaf v then v
    else error "Unknown node"
in

The match expression inside the count function deconstructs data values by matching against a given constructor. For instance, the match expression

match tree with Node t then expr1 else expr2

matches the value after evaluating expression tree and checks if its outer most constructor is a Node constructor. If that is the case, the identifier t in expression expr1 is bound to the tuple consisting of the node's two sub trees (recall the definition of the constructor Node). Finally, if we execute the test

utest count tree with 9 in ()

we can check that the function computes the result as intended.

Pattern matching

In the previous match example, the match construct matched against the constructor, but not against the actual data content. MExpr is designed to be simple with few language constructs, at the right level of abstraction. If the abstraction level is too low, it is hard to perform useful static analysis and code generation. As a consequence, MExpr support patterns in match expressions. The count function can be rewritten as

recursive
  let count = lam tree.
    match tree with Node(left,right) then
      addi (count left) (count right)
    else match tree with Leaf v then v
    else error "Unknown node"
in

where the match construct matches against pattern Node(left,right), where left and right are pattern variables.

Remember, however, that tuples are just syntactic sugar for records. Hence, match line

    match tree with Node(left,right) then

is equivalent to the following

    match tree with Node {#label"0"=left,#label"1"=right} then

where the pattern is a record pattern.

Pattern matching is the general form of deconstructing data in MExpr. Patterns can also be nested:

utest
  match {foo=7,bar={more="hello"}} with {foo=_,bar={more=str}} then str else ""
with "hello" in

Note also the use of wildcard patterns _ (used in the foo field), which matches any value.

Finally, MExpr also supports more advanced patterns, including AND patterns (using infix notation &)

utest match (1, 2) with (a, _) & b then (a, b) else (0, (0, 0)) with (1, (1, 2)) in

OR patterns (using infix notation |)

utest match K1 1 with K1 a | K2 a | K3 a then a else 0 with 1 in

and NOT patterns (using the prefix notation !)

utest match Some true with a & !(None ()) then a else Some false with Some true in
utest match None () with a & !(None ()) then a else Some false with Some false in

These are present to make it possible to translate order-dependent patterns to order-independent patterns. For example, in OCaml,

match (opt1, opt2) with
| (Some a, _) -> a
| (_, Some a) -> a
| (_, _) -> 1

is order-dependent; any change in pattern order changes which match-arm is executed. To express this in an order-independent manner we & every pattern with the inverse (!) of the union (|) of the previous patterns. If we pretend for a moment that OCaml supports & and ! in patterns they could then be written as:

match (opt1, opt2) with
| (Some a, _) -> a
| (_, Some a) & !(Some a, _) -> a
| (_, _) & !((Some a, _) | (_, Some a))-> 1

The order can now be changed freely without affecting the semantics. In practice & and ! will probably rarely be used in manually written code, while | is rather more useful.

Sequences

An MCore sequence is constructed using syntax [e_1, ..., e_n]. All elements in a sequence must have the same type. For instance, an expression

[1,3,6,7,22,3]

has type [Int] whereas and expression

["this", "is", "a", "test"]

has type [String].

A string itself is actually a sequence of characters. Hence,

utest "foo" with ['f','o','o'] in ()

is correct. This means that the type String is just an abbreviation for the sequence type [Char].

There are several operations defined for sequences, for instance, the concat function concatenates two sequences

utest concat [1,3,5] [7,9] with [1,3,5,7,9] in ()

or the get function picks out the nth element of a sequence

utest get [3,5,8,9] 2 with 8 in ()

The functions cons can be used to prepend an element to a sequence.

utest cons 1 [2,3] with [1, 2, 3] in ()

The head and tail of a sequence can be obtaining using the functions head and tail.

utest head [1, 2, 3] with 1 in
utest tail [1, 2, 3] with [2, 3] in ()

It is also possible to pattern match on sequences, to either extract the tail of a sequence, if the first part matches

utest match "foobar" with "fo" ++ rest then rest else ""
with "obar" in

or the head of a sequence if the last part matches:

utest match "foobar" with first ++ "bar" then first else ""
with "foo" in

It is even possible to extract the middle of a sequence, if the head and the tail matches:

utest match "foobar" with "fo" ++ mid ++ "ar" then mid else ""
with "ob" in

Again, matching can be combined and nested:

utest match (1,[["a","b"],["c"]],76) with (1,b++[["c"]],76) then b else []
with [["a","b"]] in

References

A mutable reference to an MExpr value can be created with the ref operator. For instance

let r = ref 3 in

allocates a reference to a cell in memory with an initial value 3, and binds the reference to the variable r.

The deref operator is used for dereferencing, that is, to read the value that a reference points to:

let r = ref 3 in
utest deref r with 3 in ()

The value that a reference points to can be modified using the modref operator:

let r = ref 3 in
modref r 4;
utest deref r with 4 in ()

Note that the return value of modref is an MExpr unit value.

It is possible have aliases for the same memory cell by binding several variables to the same reference. As an example, in the program

let r1 = ref "A" in
let r2 = r1 in
modref r2 "B";
utest deref r1 with "B" in ()

the change made to the referenced value via the variable r2 is visible when dereferencing the reference via the variable r1.

Unit Test Expressions​

Intrinsics​

Let Expressions​

Functions​

Sequencing​

if Expressions​

Recursion​

Tuples​

Records​

Data Types and match expressions​

Pattern matching​

Sequences​

References​