4. Cogent Surface Syntax

Note

For new users of this language, this section is not intended to be a tutorial for the language. It merely introduces its surface syntax and should be used as a reference while using the language.

Note

As described in #324, we are in the progress of revising the surface sytnax. So it’s subject to change in near future.

4.1. Types

All type names must begin with an uppercase letter. Tuple, Function, Record and Variant types have special syntax.

4.1.1. Record types

Now indicated by a series of typing declarations inside braces, e.g

{x : A, y : B}

If a field f is removed by take, the postfix type operator take f may be used:

{x : A, y : B} take x

This works, even if the type on the LHS is a type synonym (c.f. Type synonyms).

Sugar: Taking multiple fields:

{x : A, y : B} take (x,y) -- parens mandatory
{x : A, y : B} take (..)  -- parens mandatory

This (..) sugar means to take all currently untaken fields (if any).

Similarly, we can write put instead of take with the same syntax. put is dual to take. One common usage would be

({x : A, y : B, z : C, u : D, w : E} take (..)) put x

Note that the parentheses around the record and first take is mandatory. Arbitrary many levels of nestings is possible.

4.1.2. Unboxed records

Unboxed records are pretty much like regular records, except that their wrappers (i.e. the unboxed container) are lightweight that no allocation is required.

The syntax is simple, just a prefixing # on its counterpart record type. take, put operations (and punning) and member extraction are of exactly the same syntax as regular records. As a consequence of its lightweightness, we can construct and deconstruct (by means of pattern matching) unboxed records, just as what we do on tuples (see Product Types for tuples).

E.g.,

#{x : A, y : B}    -- type
#{x = e1, y = e2}  -- expression or pattern
a.x                -- member
a {x = f}          -- take and put, same as regular records

About its permission rules (see Permissions): the wrapper per se is non-linear (i.e. shareable and discardable). If any linear fields are present (untaken), then the unboxed record becomes linear. Member operation can only be used if there’re no linear fields inside, or the unboxed record is banged (!).

4.1.3. Unboxed abstract types

The # sigil is not used in the declaration of an unboxed abstract type. Instead, we attach the # sigil when the type is used. E.g.:

type A
type T = {a : #A, b : A}

In the above case, field a is unboxed, whereas b is not. When generating C code, boxed abstract types will be pointers, unboxed are not. It’s the users’ responsibility to keep the C code consistent, as these types are abstract.

4.1.4. Bang (!)

Record types and abstract types have sigils, but outwardly, a Write sigil is just a plain record type / abstract type, and an Observe (or Readonly) sigil would be indicated with a postfix bang, for example: {x : A, y : B}!, C!.

The postfix bang operator can be applied to any type, which converts all write sigils to readonly sigils internally (and any type variables to readonly type variables).

To bang a parametric type, the type must be surrounded by parentheses, otherwise ! will be applied to the type parameter right before the ! symbol.

4.1.5. Product Types

Nameless, unboxed tuples may be used everywhere you like. Their syntax is identical to Haskell. A unit type (which is freely discardable and not linear) also exists, () with a value (). Tuples are not associative, namely (a, b, c, d) /= (a, (b, (c, d))) and (a, b, c, d) /= (((a, b), c), d), just as in Haskell.

4.1.6. Variant Types

Variant types look like this.

< Ok (Obj, Heap) | Fail Heap >

They can be constructed using Ok (obj,h), or Fail e.

We can determine from context if Identifier (Ok and Fail in the example above) is a type or a data constructor, much like Haskell. We will have to do a little bit of inference to determine which variant type the thing actually belongs to.

They can have as many alternatives as you like, and there is no restriction on what goes inside a variant type. Each alternative can take any number of arguments (0 or more). They will be desugared to tuples of all the arguments. E.g.:

<Ok Obj U8 | Fail >  -- equiv to <Ok (Obj, U8) | Fail ()>

4.1.7. Polymorphic types

Types can contain variables. Functions may be declared as having polymorphic type.

size : all (k, v). Map k v -> U32

Monomorphic functions are first class, but to get a monomorphic function from a polymorphic function requires instantiation, e.g length[Int]. Since Cogent-2.0.9, explicit type applications are not mandatory, although in some cases they must be supplied to guide the type inference engine. Type applications can be partial, or with type holes. For example, foo [_, A, B]; foo [S, T, _] is equivalent to foo [S,T].

A type variable under observation (i.e let!-ed) is annotated with a prefix bang (e.g !a)

4.1.7.1. Permissions

Permissions (they used to be called “kinds”) are provided for polymorphic signatures as follows:

length : all (a :< k). Array a -> U32

Permissions are internally a set of three booleans: whether or not the type can be:

  • D for Discard (i.e by weakening)
  • S for Share (i.e by contraction)
  • E for Escape (i.e returned from let!)

The permission signature on a type variable is more like a constraint. They are some combination of those three letters. If no permission is provided, it is assumed that none of those permissions are required, and the value will be linear and cannot escape a let!.

4.2. Type synonyms

Type synonyms may be provided using the type keyword as follows:

type X a b = { foo : a, bar : b, baz : Int }

The type synonym X must always be fully saturated with its two arguments wherever it is used, however.

Abstract types (defined in C) may also be declared, and they also may take parameters. This corresponds to a family of types in C.

type Buffer
type Array a

4.3. Constants and top-level definitions

Constants are mono-typed.

abc : U8
abc = 3

But the right hand side can be much more expressive now, with let bindings and whatnot. We must be able to prevent users from doing side-effects like allocation in the top-level—see Effects.

To make the syntax easier to parse, a function or constant’s body must be indented by at least one space. This means that any non-indented bareword is the start of a new definition or signature.

4.4. Effects

Most effects are currently (successfully) modelled via linear types. For allocation, Cogent does not know anything about it. Memory management involves the heap. I propose modelling the heap as an explicit linear value, just as with any other state.

Allocation functions must take and return a linear heap, as they modify it:

allocateObj : Heap -> < Ok Obj Heap | Fail Heap >

Special syntax for allocation functions and automating heap-threading are nice to have, so I welcome proposals.

4.5. Expression language

4.5.1. Matching and Error Handling

Matching may be accomplished by the following syntax:

f : Heap -> < Ok Obj Heap | Fail Heap >
f h = allocateobj h
    | Ok obj h => allocateobj h
        | Ok obj' h => Ok (mergeObj (obj, obj')) h
        | Fail h -> let () = free obj in Fail h
    | Fail h -> Fail h

This is an alignment-based syntax, grouping determined based on the alignment of the bars.

The rightward arrows for each case can either be => or ->. => indicates that that branch is likely, to enable compiler optimisations. ~> can also be used to indicate an unlikely branch.

A pattern may be _ but only if the kind of the value allows it to be discarded.

4.5.2. Biased pattern matching

The syntax above poses a problem if many levels of nestings occur—you will end up with cascading matches which indent a lot. To solve this problem, we allow a syntax for early exit, which is inspired by Idris. The syntax looks like:

f : Heap -> <Ok Obj Heap | Fail Heap>
f h = let Ok obj  h <= allocateobj h |> Fail h -> Fail h
      and Ok obj' h <= allocateobj h |> Fail h -> let _ = free obj in Fail h
       in Ok (mergeObj (obj, obj')) h

This piece of code is semantically identical to the one above. <= matches the major case, and |> bails out with the minor case.

4.5.3. Patterns

Patterns may be refutable (could fail, e.g Ok a or 43) or irrefutable (always match, e.g (a,b) or _). Refutable patterns can be used in a matching block only, but they can only nest irrefutable patterns. So, unlike in Haskell, you can’t go:

f x = foo x
  | Ok (Alt1 3) -> bar
  | _ -> baz

As this nests a refutable pattern (3) inside another refutable pattern (Alt1 3) inside another refutable pattern (Ok (Alt1 3)).

This is forbidden to make compilation much more straightforward in the presence of linear types.

4.5.4. Let-binding

Let expressions take the form of ML. They are not ever recursive. Multiple let-bindings can be introduced by separating them with and:

f () = let x = 3
       and y = 4
        in foo (x,y)

Is equivalent to:

f () = let x = 3
        in let y = 4
            in foo (x,y)

Irrefutable single patterns may occur on the left hand side of let-binding, but refutable patterns must use regular pattern matching.

To force inference to go the way you want, a type signature can be provided for a let binding:

f () = let x : U8 = 3
        in let y : U16 = 4
            in foo (x,y)

4.5.5. Observation and let!

Variables may be observed (i.e. accessed in a readonly manner) using !. In the following example, we observe x and y on the first line, so we annotate the let-binding with !x and !y:

f (x, y) = let (a,b) = foo (x, y) !x !y
            in bar (a, b)

Prefix ! annotations can be used inline with pattern matching also:

f (x,y) = foo(x,y) !x !y
          | Blah x  -> bar x
          | Blorp z -> baz z

4.5.6. If

Conditionals can be expressed in the form of if-expressions. They are in the form of if c !v1 !v2 ... then e1 else e2. The !vs are similar to the ! syntax introduced above, allowing for temporary access to linear objects in the condition.

Apart from the normal if-then-else syntax, Cogent offers a multi-way if syntax, inspired by GHC/Haskell. For example,

if | cond_1 -> expr_1
   | cond_2 -> expr_2
   | ...
   | else   -> expr_n

In the code snippet above, the conditions are Boolean expressions, instead of patterns as one might think. The final else is part of the syntax. The pipes have to be aligned. The arrows, as usual, can be any of =>, -> or ~>, which have the same semantics as used in alternatives. Prefix !s can be added after each condition (but not after the else keyword), like | cond_1 !v1 !v2 => e.

4.5.7. Sequencing

Occasionally, it is useful to free a bunch of things, and using let for this purpose can be somewhat annoying:

f : (Obj, Obj) -> Int
f (a, b) = let _ = free a
           and _ = free b
            in 42

So, a little sugar is added for a series of discarding let bindings:

f : (Obj, Obj) -> Int
f (a, b) = free a; free b; 42

These two expressions are equivalent.

4.5.8. Records

Unboxed records are initialized as follows:

::
#{x = e1, y = e2}

Boxed records are allocated on the heap. They are typically initialized by first calling the external C malloc function and then updating the record using the put operation.

4.5.9. Take/put

There is pattern syntax for take, and a similar expression syntax for put:

f : {a : Foo, b : Bar} -> {a : Foo, b : Bar}
f (r {a = ra, b = rb}) = r {a = ra, b = rb}

Note

take is always in patterns (i.e. LHS of =), whereas put is always in expressions (i.e. RHS of =).

Punning is also allowed:

f : {a : Foo, b : Bar} -> {a : Foo, b : Bar}
f (r {a, b}) = r {a, b}

(where just a is equivalent to a = a)

4.5.10. Arithmetic and comparison operators

Currently Cogent will use the smallest type possible for integer literals and generate upcasts (but not downcasts) automatically when used in a context where they are required. For non-literals, an explicit upcast primitive may be needed.

4.5.11. Type annotations

To guide the type inference engine, the user can give type annotations to any expressions. The syntax is e : t.

4.6. Experimental features

Warning

Experimental features, as they are called, are indeed experimental, which means they are not stable, and may fail, terribly. Don’t rely on them until they become stable.

4.6.1. Static arrays

Note

This feature is not implemented on the master branch. It’s implemented on the array branch and needs to be enabled by the --builtin-arrays flag (see Optional features for more information on how to do that).

Array literals can be introduced using square brackets, like [a, b, c]. This syntax can also be used as patterns. Array types can be defined like U8[3], similar to C. Indexing can be made with the @ operator. E.g.: arr @ 3. Arrays can also be taken and put. The type-level take and put operators are spelt as @take and @put respectively. On the term level, it’s written as arr @{ @idx = val }— just prefix the open bracket and the indices with @ symbols, and the rest are the same as those for records.

4.6.2. Lambda expressions

We only allow for a very limited form of lambda expressions. A lambda expression has the syntax \irref => exp, where irref is an irrefutable pattern, and exp is an expression which does not refer to any variables outside the lambda binding (no free variables). The bound variables have to be non-linear.