4. Cogent Surface Syntax¶
Note
For new users of this language, this section is not intended to be a tutorial for the language. It merely introduces its surface syntax and should be used as a reference while using the language.
Note
As described in #324, we are in the progress of revising the surface sytnax. So it’s subject to change in near future.
4.1. Types¶
All type names must begin with an uppercase letter. Tuple, Function, Record and Variant types have special syntax.
4.1.1. Record types¶
Now indicated by a series of typing declarations inside braces, e.g
{x : A, y : B}
If a field f
is removed by take
, the postfix type operator
take f
may be used:
{x : A, y : B} take x
This works, even if the type on the LHS is a type synonym (c.f. Type synonyms).
Sugar: Taking multiple fields:
{x : A, y : B} take (x,y) -- parens mandatory
{x : A, y : B} take (..) -- parens mandatory
This (..)
sugar means to take all currently untaken fields (if any).
Similarly, we can write put
instead of take
with the same
syntax. put
is dual to take
. One common usage would be
({x : A, y : B, z : C, u : D, w : E} take (..)) put x
Note that the parentheses around the record and first take
is mandatory.
Arbitrary many levels of nestings is possible.
4.1.2. Unboxed records¶
Unboxed records are pretty much like regular records, except that their wrappers (i.e. the unboxed container) are lightweight that no allocation is required.
The syntax is simple, just a prefixing #
on its counterpart record
type. take
, put
operations (and punning) and member extraction
are of exactly the same syntax as regular records. As a consequence of
its lightweightness, we can construct and deconstruct (by means of
pattern matching) unboxed records, just as what we do on tuples (see
Product Types for tuples).
E.g.,
#{x : A, y : B} -- type
#{x = e1, y = e2} -- expression or pattern
a.x -- member
a {x = f} -- take and put, same as regular records
About its permission rules (see Permissions):
the wrapper per se is non-linear (i.e. shareable
and discardable). If
any linear fields are present (untaken), then the unboxed record becomes
linear. Member operation can only be used if there’re no linear
fields inside, or the unboxed record is banged (!
).
4.1.3. Unboxed abstract types¶
The #
sigil is not used in the declaration of an unboxed abstract
type. Instead, we attach the #
sigil when the type is used. E.g.:
type A
type T = {a : #A, b : A}
In the above case, field a
is unboxed, whereas b
is not. When
generating C code, boxed abstract types will be pointers, unboxed are
not. It’s the users’ responsibility to keep the C code consistent, as
these types are abstract.
4.1.4. Bang (!
)¶
Record types and abstract types have sigils, but outwardly, a Write
sigil is just a plain record type / abstract type, and an Observe (or Readonly) sigil
would be indicated with a postfix bang, for example: {x : A, y : B}!
, C!
.
The postfix bang operator can be applied to any type, which converts all write sigils to readonly sigils internally (and any type variables to readonly type variables).
To bang a parametric type, the type must be surrounded by parentheses,
otherwise !
will be applied to the type parameter right before the
!
symbol.
4.1.5. Product Types¶
Nameless, unboxed tuples may be used everywhere you like. Their syntax
is identical to Haskell. A unit type (which is freely discardable and
not linear) also exists, ()
with a value ()
. Tuples are not
associative, namely (a, b, c, d) /= (a, (b, (c, d)))
and
(a, b, c, d) /= (((a, b), c), d)
, just as in Haskell.
4.1.6. Variant Types¶
Variant types look like this.
< Ok (Obj, Heap) | Fail Heap >
They can be constructed using Ok (obj,h)
, or Fail e
.
We can determine from context if Identifier
(Ok
and Fail
in
the example above) is a type or a data constructor, much like Haskell.
We will have to do a little bit of inference to determine which variant
type the thing actually belongs to.
They can have as many alternatives as you like, and there is no restriction on what goes inside a variant type. Each alternative can take any number of arguments (0 or more). They will be desugared to tuples of all the arguments. E.g.:
<Ok Obj U8 | Fail > -- equiv to <Ok (Obj, U8) | Fail ()>
4.1.7. Polymorphic types¶
Types can contain variables. Functions may be declared as having polymorphic type.
size : all (k, v). Map k v -> U32
Monomorphic functions are first class, but to get a monomorphic function
from a polymorphic function requires instantiation, e.g length[Int]
.
Since Cogent-2.0.9, explicit type applications are not mandatory,
although in some cases they must be supplied to guide the type inference
engine. Type applications can be partial, or with type holes. For
example, foo [_, A, B]
; foo [S, T, _]
is equivalent to foo [S,T]
.
A type variable under observation (i.e let!
-ed) is annotated with a
prefix bang (e.g !a
)
4.1.7.1. Permissions¶
Permissions (they used to be called “kinds”) are provided for polymorphic signatures as follows:
length : all (a :< k). Array a -> U32
Permissions are internally a set of three booleans: whether or not the type can be:
D
for Discard (i.e by weakening)S
for Share (i.e by contraction)E
for Escape (i.e returned fromlet!
)
The permission signature on a type variable is more like a constraint.
They are some combination of those three letters. If no permission
is provided, it is assumed that none of those permissions are required,
and the value will be linear and cannot escape a let!
.
4.2. Type synonyms¶
Type synonyms may be provided using the type
keyword as follows:
type X a b = { foo : a, bar : b, baz : Int }
The type synonym X
must always be fully saturated with its two
arguments wherever it is used, however.
Abstract types (defined in C) may also be declared, and they also may take parameters. This corresponds to a family of types in C.
type Buffer
type Array a
4.3. Constants and top-level definitions¶
Constants are mono-typed.
abc : U8
abc = 3
But the right hand side can be much more expressive now, with let bindings and whatnot. We must be able to prevent users from doing side-effects like allocation in the top-level—see Effects.
To make the syntax easier to parse, a function or constant’s body must be indented by at least one space. This means that any non-indented bareword is the start of a new definition or signature.
4.4. Effects¶
Most effects are currently (successfully) modelled via linear types. For allocation, Cogent does not know anything about it. Memory management involves the heap. I propose modelling the heap as an explicit linear value, just as with any other state.
Allocation functions must take and return a linear heap, as they modify it:
allocateObj : Heap -> < Ok Obj Heap | Fail Heap >
Special syntax for allocation functions and automating heap-threading are nice to have, so I welcome proposals.
4.5. Expression language¶
4.5.1. Matching and Error Handling¶
Matching may be accomplished by the following syntax:
f : Heap -> < Ok Obj Heap | Fail Heap >
f h = allocateobj h
| Ok obj h => allocateobj h
| Ok obj' h => Ok (mergeObj (obj, obj')) h
| Fail h -> let () = free obj in Fail h
| Fail h -> Fail h
This is an alignment-based syntax, grouping determined based on the alignment of the bars.
The rightward arrows for each case can either be =>
or ->
.
=>
indicates that that branch is likely, to enable compiler
optimisations. ~>
can also be used to indicate an unlikely branch.
A pattern may be _
but only if the kind of the value allows it to be
discarded.
4.5.2. Biased pattern matching¶
The syntax above poses a problem if many levels of nestings occur—you will end up with cascading matches which indent a lot. To solve this problem, we allow a syntax for early exit, which is inspired by Idris. The syntax looks like:
f : Heap -> <Ok Obj Heap | Fail Heap>
f h = let Ok obj h <= allocateobj h |> Fail h -> Fail h
and Ok obj' h <= allocateobj h |> Fail h -> let _ = free obj in Fail h
in Ok (mergeObj (obj, obj')) h
This piece of code is semantically identical to the one above. <=
matches the major case, and |>
bails out with the minor case.
4.5.3. Patterns¶
Patterns may be refutable (could fail, e.g Ok a
or 43
) or
irrefutable (always match, e.g (a,b)
or _
). Refutable patterns
can be used in a matching block only, but they can only nest irrefutable
patterns. So, unlike in Haskell, you can’t go:
f x = foo x
| Ok (Alt1 3) -> bar
| _ -> baz
As this nests a refutable pattern (3
) inside another refutable
pattern (Alt1 3
) inside another refutable pattern (Ok (Alt1 3)
).
This is forbidden to make compilation much more straightforward in the presence of linear types.
4.5.4. Let-binding¶
Let expressions take the form of ML. They are not ever recursive.
Multiple let-bindings can be introduced by separating them with and
:
f () = let x = 3
and y = 4
in foo (x,y)
Is equivalent to:
f () = let x = 3
in let y = 4
in foo (x,y)
Irrefutable single patterns may occur on the left hand side of let-binding, but refutable patterns must use regular pattern matching.
To force inference to go the way you want, a type signature can be provided for a let binding:
f () = let x : U8 = 3
in let y : U16 = 4
in foo (x,y)
4.5.5. Observation and let!
¶
Variables may be observed (i.e. accessed in a readonly manner) using !
.
In the following example, we observe x
and y
on the first line,
so we annotate the let
-binding with !x
and !y
:
f (x, y) = let (a,b) = foo (x, y) !x !y
in bar (a, b)
Prefix !
annotations can be used inline with pattern matching also:
f (x,y) = foo(x,y) !x !y
| Blah x -> bar x
| Blorp z -> baz z
4.5.6. If¶
Conditionals can be expressed in the form of if-expressions. They are in
the form of if c !v1 !v2 ... then e1 else e2
. The !v
s are
similar to the !
syntax introduced above, allowing for temporary
access to linear objects in the condition.
Apart from the normal if-then-else syntax, Cogent offers a multi-way if syntax, inspired by GHC/Haskell. For example,
if | cond_1 -> expr_1
| cond_2 -> expr_2
| ...
| else -> expr_n
In the code snippet above, the conditions are Boolean expressions,
instead of patterns as one might think. The final else
is part of
the syntax. The pipes have to be aligned. The arrows, as usual, can be
any of =>
, ->
or ~>
, which have the same semantics as used
in alternatives. Prefix !
s can be added after each condition (but
not after the else
keyword), like | cond_1 !v1 !v2 => e
.
4.5.7. Sequencing¶
Occasionally, it is useful to free a bunch of things, and using let
for
this purpose can be somewhat annoying:
f : (Obj, Obj) -> Int
f (a, b) = let _ = free a
and _ = free b
in 42
So, a little sugar is added for a series of discarding let bindings:
f : (Obj, Obj) -> Int
f (a, b) = free a; free b; 42
These two expressions are equivalent.
4.5.8. Records¶
Unboxed records are initialized as follows:
- ::
- #{x = e1, y = e2}
Boxed records are allocated on the heap. They are typically initialized by first calling the external C malloc function and then updating the record using the put operation.
4.5.9. Take/put¶
There is pattern syntax for take
, and a similar expression syntax
for put
:
f : {a : Foo, b : Bar} -> {a : Foo, b : Bar}
f (r {a = ra, b = rb}) = r {a = ra, b = rb}
Note
take
is always in patterns (i.e. LHS of =
), whereas
put
is always in expressions (i.e. RHS of =
).
Punning is also allowed:
f : {a : Foo, b : Bar} -> {a : Foo, b : Bar}
f (r {a, b}) = r {a, b}
(where just a
is equivalent to a = a
)
4.5.10. Arithmetic and comparison operators¶
Currently Cogent will use the smallest type possible for integer
literals and generate upcasts (but not downcasts) automatically when
used in a context where they are required. For non-literals, an explicit
upcast
primitive may be needed.
4.5.11. Type annotations¶
To guide the type inference engine, the user can give type annotations
to any expressions. The syntax is e : t
.
4.6. Experimental features¶
Warning
Experimental features, as they are called, are indeed experimental, which means they are not stable, and may fail, terribly. Don’t rely on them until they become stable.
4.6.1. Static arrays¶
Note
This feature is not implemented on the master
branch. It’s implemented
on the array
branch and needs to be enabled by the --builtin-arrays
flag
(see Optional features for more information on how to do that).
Array literals can be introduced using square brackets, like
[a, b, c]
. This syntax can also be used as patterns. Array types can
be defined like U8[3]
, similar to C. Indexing can be made with the
@
operator. E.g.: arr @ 3
. Arrays can also be take
n and put
.
The type-level take
and put
operators are spelt as @take
and @put
respectively. On the term level, it’s written as arr @{ @idx = val }
—
just prefix the open bracket and the indices with @
symbols, and the rest
are the same as those for records.
4.6.2. Lambda expressions¶
We only allow for a very limited form of lambda expressions. A lambda
expression has the syntax \irref => exp
, where irref
is an
irrefutable pattern, and exp
is an expression which does not refer
to any variables outside the lambda binding (no free variables). The
bound variables have to be non-linear.