2. Cogent Quick-Start Guide

2.1. Environment

The development is primarily done on recent versions of Debian GNU/Linux. Similar Linux system should also work, but may require minor tweaks. MacOS generally works for the Cogent compiler and its generated Isabelle/HOL proofs, but normally doesn’t work natively with the systems software that we developed. If you are on Linux, then just follow this guide; if you are on MacOS, you’ll additionally need gcc and some other GNU-compliant packages. Keep an eye on some MacOS hints in Testing on macOS as you follow this guide.

2.2. Installation

See Cogent Installation Guide for instructions.

2.3. Your first program

The program we are going to walk through is a simple program which adds two natural numbers up, and prints out the result, or reports an overflow error.

Cogent is a restricted, pure, functional language. For example, Cogent is unable to express I/O, memory management, loops or recursions. For this reason, we only write the core functionality in Cogent, which adds up two natural numbers, and check if overflow has happened. To carry the result of the check, we need a result type:

type R a b = < Success a | Error b >

It has similar meaning to a sum type in Haskell (or in any functional language):

data R a b = Success a | Error b

It says that, the result can be either a Success or Error, and in each case you also return the result associated with these tags, namely a and b respectively. Note that R a b is polymorphic on a and b. They can be instantiated to any concrete types.

The difference to the Haskell version is that, the type keyword only introduces a type synonym, which is to say, wherever an R a b is needed, it can be alternatively spelt as < Success a | Error b >, which of course is more verbose in many cases.

Next we define a function add, which has the following type signature:

add : (U32, U32) -> R U32 ()

U32 is an unsigned 32-bit integer; we use it to model natural numbers here. Cogent also has built-in U8, U16 and U64 for other sizes of unsigned integers. A Cogent function only takes one argument. When we want to pass in two U32s, we make them into a pair (or a tuple, more generally).

The full definition of the add function is given below (also includes the previously given type signature):

add : (U32, U32) -> R U32 ()
add (x, y) = let sum = x + y
              in sum < x || sum < y
                 | True -> Error ()
                 | False -> Success sum

Cogent is a layout sensitive language, like Python or Haskell. In this function, it binds the value of x + y to binder sum. In the body of the let-in expression, a pattern match is done on the condition sum < x || sum < y (if sum is less than x or it’s less than y). If this condition is True, then we return an Error (). () is “unit”, the single value of type () (also reads “unit”). If the condition is False, which means that overflow didn’t happen, then we return Success sum, which carries the summation.

That’s pretty much all that we can do in Cogent. We can save the above program in a file called Adder.cogent. For more information about the Cogent language, its syntax and semantics, you can read Cogent Surface Syntax, the Cogent language manual, and a more technical Cogent language reference.

Todo

Part of the language reference will become a section in this doc later.

Next we will have to write the C code, which does the things that Cogent cannot do. Cogent code will be compiled to C code, and is always invoked as a subroutine, by a C program. The C programs we write for Cogent are not plain C; they are called antiquoted C (c.f. Antiquoted C).

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
$esc:(#include <stdio.h>)
$esc:(#include <stdlib.h>)
#include "generated.c"

int main(void){
  $ty:(U32) first_num = 19;
  $ty:(U32) second_num = 2;

  $ty:((U32, U32)) args;
  args.p1 = first_num;
  args.p2 = second_num;

  $ty:(R U32 ()) ret = $exp:add(args);
  if(ret.tag == TAG_ENUM_Success){
    $ty:(U32) sum = ret.Success;
    printf("Sum is %u\n", sum);
    return 0;
  } else {
    printf("Error: Overflow detected.\n");
    return 1;
  }
}

An antiquoted C file is very similar to a regular C file. The only difference is that you can write antiquotes in the C code. An antiquote is comprised of an antiquote name (e.g. $ty, $exp, $esc, $spec), a colon, and a Cogent snippet enclosed by a pair of parentheses. The purpose of having antiquotes is that you can refer to Cogent types, expressions, etc. without knowing what they get compiled to. In particular, with the current implementation of the Cogent compiler, it’s very difficult to know what C names will be generated. See ticket #322 on GitHub.

Let’s first look at the main function. In line 6, the antiquote $ty:(U32) means that we want to use a U32 (a primitive type in Cogent) equivalent in C. On line 9, it’s similar that we want a pair of two U32s. Note the two pairs of parentheses—the inner one is for the tuple, and the outer one is the antiquotation syntax. Both of them are necessary. The $exp:add antiquote on line 13 is for Cogent expressions, in this case a function name. Strictly speaking, this antiquote is not necessary, as we know that the C name of the Cogent add function is add. However for polymorphic functions, the names of the generated C functions will be slightly different than the Cogent function name, in which case the antiquote is necessary. Another minor syntactic flexibility that can be seen is that, if the antiquoted string is a single identifier starting with a lowercase character, the enclosing parentheses can be omitted.

For more details about antiquoted C in Cogent, see Antiquoted C.

Finally on line 1 of the antiquoted C program, the $esc tells the Cogent compiler not to preprocess the #include. To understand the reason behind it, we need to briefly talk about how antiquoted C is compiled by the Cogent compiler: The compiler tries to parse the antiquoted C files; however, because the syntax of C (or antiquoted C) is context-sensitive, it needs to know what types have already been declared in the program. This requires the antiquoted C files to be preprocessed by cpp, inlining the included files. The C parser that the Cogent compiler uses does not support full GNU extensions, which means if in your included files, unsupported syntax is used (which is very likely to be the case if you include Linux kernel headers, or glibc for example), then the parser will fail. To work around this limitation, the files that contains unsupported features need to be included, but enclosed by a $esc antiquote, so that they won’t be expanded before parsing. A file that includes all the type names declared in these excluded files will be passed to the compiler via a flag --ext-types. We will go through the compiler flags shortly.

On the contrary, Cogent-generated C code can be parsed and should be included by cpp. That’s the code on line 3. The name generated.c is specified by another command-line argument to the compiler, which will be covered later. The Cogent compiler compiles Cogent source code to C; it will generate a .h header file and a .c file. Note that it should be the .c file that’s included, instead of the header file as normal.

We name this antiquoted C file main.ac (ac for “antiquoted C”).

At this point we have all the source code that we need. As you should already know, Cogent is a code and proof co-generating compiler. As verification is more involved, we first only focus on the C code generation part.

cogent -g Adder.cogent -o generated \
  --infer-c-funcs="main.ac" \
  --cpp-args="\$CPPIN -o \$CPPOUT -P $CFLAGS" \
  --ext-types=types.cfg \
  --entry-funcs=entrypoints.cfg

The Cogent compiler comes with hundreds of flags, here we only mention the most important ones. To see the help message, you can run cogent -h<LEVEL>. <LEVEL> ranges from 0 to 4. <LEVEL> is optional, default to 1. The higher the help level, the more options and flags the help message is displayed. In general, the flags that only appear in higher help levels are less important, less stable, or changing the compiler behaviours less significantly.

The compiler has to be called with at least one command. A command indicates what the compiler does, e.g. pretty-prints the core syntax tree, generates C code, generates the Isabelle/HOL embedding of the desugered core language, etc. The compiler can do many things at once. In the command shown above, the -g is the command—it generates C code. What follows is the Cogent source file, Adder.cogent in this example.

All the rest are Cogent flags. A flag controls or fine-tunes how the compiler behaves. Arbitrary number of flags can be given.

  • -o generated designates the output file name (only the base name is needed), and that’s why we #included "generated.c" earlier in the main.ac file.

  • --infer-c-funcs passes all the .ac files. More than one .ac files can be given, separated by spaces.

  • The --cpp-args line is the command-line arguments passed to the C preprocessor, by default (GNU) cpp. In the argument line passed to the preprocessor, \$CPPIN and \$CPPOUT are placeholders that will be replaced by the Cogent compiler with the actual names of the files, as specified by Cogent compiler flags such as -o. Note that the \$ is escaped in the Shell command as the dollar sign is part of the placeholders’ names. -P inhibits generation of linemarkers by the preprocessor, which should always be used as the next stage of the compilation doesn’t support linemarkers. $CFLAGS is defined as:

    CFLAGS=-I. -I$COGENT_STDLIB -std=gnu99
    

    It just contains other standard flags that gcc and cpp demands. Normally -I for search paths, and -std for specific C standards. We use GNU C99. $COGENT_STDLIB points to the directory containing the standard Cogent libraries. The source of the standard library is located in https://github.com/NICTA/cogent/tree/master/cogent/lib, but it will be installed (i.e. copied) to a build directory depending on how you installed your Cogent compiler. See more information in Cogent Installation Guide. In this example, even no types or functions from the standard library is used, the generated program still needs the definition for the primitive types, which are defined in cogent-defns.h in the $COGENT_STDLIB folder.

  • --ext-types passes in a file named types.cfg containing a list of externally declared C types. We have explained earlier why a list of types are needed in order to parse C file correctly. In this case there’s no type that are unknown to main.ac so the file is empty. Alternatively we can omit this flag and the empty file all together. The file name and its extension is arbitrarily chosen here.

  • --entry-funcs informs the Cogent compiler which Cogent functions are needed by the .ac files. The Cogent compiler only generates functions designated in the entrypoints.cfg file and their dependencies. Again the name of the file is not of any significance and can be anything. In this example, we have add in the file. The file should be formatted to have one function name per line.

Running this command, you should get a C file called main_pp_inferred.c. The Cogent compiler will first run the C preprocessor and write to a file called main_pp.ac. It then starts from there, compiling the antiquotes substituting them with appropriate C code snippets, and writing to the final main_pp_inferred.c. To debug antiquotes, it might be worth looking at the main_pp.ac file as that’s the one that the Cogent compiler sees and on which it reports line numbers.

At this point, you have a C file (main_pp_inferred.c) which should be compiled by gcc. Although the C code should generally work with other compilers as well (e.g. Clang or CompCert), we only officially support recent versions of GCC.

You can find the complete code for this example in our repository.

2.4. A more complicated example

In this example, we will focus on using abstract functions from the standard Cogent library, called libgum. The program generates the n-th Fibonacci number using the generic iterator from libgum.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
include  <gum/common/iterator.cogent>

@ fibonacci returns the n-th Fibonacci number.
fibonacci : U32 -> U32
fibonacci n =
   let ((_, fibn, _), _) = iterate #{
       gen = fib_gen,
       cons = fib_consume,
       acc = (0, 1, 1),
       obsv = n
       }
   in fibn

@ fib_gen --- calculate the next Fibonacci number, unless we're finished.
@ Accumulator contains (n-1)th and nth Fibonacci numbers, and n in third place.
@ The accumulator returned by GeneratorResult has the same pattern; no value is returned for Stop / Yield etc.
fib_gen : #{acc : (U32, U32, U32), obsv : U32} -> GeneratorResult () () () (U32, U32, U32)
fib_gen #{acc = (n1, n2, n), obsv} =
  if | n == obsv -> ((n1, n2, n), Stop ())
     | else      -> ((n2, n1+n2, n+1), Yield ())

@ fib_consume is a verbose no-op.
fib_consume : #{obj : (), acc : (U32, U32, U32), obsv : U32} -> ConsumerResult () () (U32, U32, U32)
fib_consume #{obj, acc, obsv} = (acc, Next)

On line 1, the include command imports the iterator.cogent file. There are two forms of include command in Cogent, either include "something.cogent" or include <somelib.cogent>. They work in the same way as their #include counterparts in C.

The comment after the @ symbol on line 3 (and the other two functions) is for documentation generation, especially for documenting libraries and APIs. They can be generated by Docgent, which can be run by cogent --docgent <COGENT_SRC>, if your Cogent compiler is built with docgent flag enabled. See Optional features on how to enable the flags.

On line 6, a function iterate is invoked. This is a very general iterator that Cogent’s standard library provides. Let’s have a look at its type signature and some relevant type synonyms:

iterate : all (y, r, s, acc, obsv).
  #{ gen  : Generator y r s acc obsv!
   , cons : Consumer  y r s acc obsv!
   , acc  : acc
   , obsv : obsv!
   } -> IterationResult acc r s

type GeneratorResult y r s acc = (acc, <Return r | Yield y | Stop s>)
type Generator y r s acc obsv = #{acc : acc, obsv : obsv!} -> GeneratorResult y r s acc

type ConsumerResult r s acc = (acc, <Return r | Stop s | Next >)
type Consumer y r s acc obsv = #{obj : y, acc : acc, obsv : obsv!} -> ConsumerResult r s acc

-- Return if the body (enumerator) returned a value, or Stop if generator had no more
type IterationResult acc r s = (acc, <Return r | Stop s>)

The iterate function is polymorphic over type variables y, r, s, acc and obsv. Because in this example, they will be instantiated to (), (), (), (U32, U32, U32) and U32 respectively, all of which are simple, the type inference engine is capable of knowing what they are. In this case, type application to iterate is not necessary. You can nevertheless write them out as iterate [(), (), (), (U32, U32, U32), U32] if you think its more informative or clearer.

The function’s argument is an unboxed record of four fields. In Cogent, each function can only take exactly one argument. If more arguments are required, they can always be packed in a (usually unboxed) record or a tuple. In the argument record, gen and cons are the generator function and the consumer function respectively, which we will come back to shortly. acc is the accumulator, which is a read/write object that gets threaded through all the iterations. obsv is the observable object, which is a readonly (indicated by the ! operator on its type) object that the generator and/or the consumer can observe. As Cogent doesn’t have closures, the gen and cons functions cannot directly access variables in iterate’s scope; they have to be passed in explicitly as arguments. E.g. (in Haskell’s syntax), instead of

$> let v = 3
    in map (\a -> v + a) as

we have to write

g :: Int -> Int -> Int
g a b = a + b

$> let v = 3
    in map (g v) as

In each iteration, the generator is first called. The generator takes the accumulator (initial value) and the observable, and generates a result of either Return, Yield or Stop, updating the accumulator. If Return r or Stop s is returned, then the iteration will terminate immediately. The difference between them is that Return indicates that an early exit has happened, whereas Stop means the iterator has exhausted itself, terminating normally. If Yield y is returned, the result y will be further processed (or consumed) by the consumer. The consumer cons, takes the result y of the generator, the accumulator and the observable as usual, returns a pair of the updated accumulator, and either Return, Stop or Next. Return and Stop have the same meaning as mentioned above; Next means it will enter the next iteration. The overall iterate function will return the final accumulator, paired with the payload of either Return or Stop, of different types. As we can see, this iterator is very general, and there are more specific looping or recursion functions defined in other files in the libgum. The Cogent FFI of these types and functions can be found in cogent/lib/gum/common/iterator.cogent and the underlying C definitions in cogent/lib/gum/anti/iterator.ac.

In the code snippet above, all the work is done in the generator function; the consumer function just returns the accumulator unchanged, together with a Next tag to keep looping. As you can see, iteration is verbose.

The accumulator is a triple. Its first two terms are the n-1-th and n-th Fibonnaci numbers. Its third term is n. Each time fib_gen is invoked, it adds the first two terms together, increments n and creates a new accumulator:

Step Accumulator
1 (0,1,1)
2 (1,1,2)
3 (1,2,3)
4 (2,3,4)
5 (3,5,5)
6 (5,8,6)

When the third term reaches the observer (here just a U32), the generator returns Stop to end the loop; the pattern in the main function picks out the second term in the triple as the return value for the Fibonacci function.

In the antiquoted C file, the main function invokes the fibonacci function and prints the tenth such value:

$esc:(#include <stdio.h>)
#include "fib.c"
#include <gum/anti/iterator.ac>

int main(void)
{
   u32 n;
   n = $exp:fibonacci(10);
   printf("10th Fibonacci is %u\n", n);
   return 0;
}

The building process is very similar to the previous example (c.f. Your first program). The complete code and Makefile for this example can be found here.

2.5. Example: abstract types and polymorphic functions

In the previous example, we have shown some of the libgum functions—they are abstract functions, in the sense that we only declare them in Cogent, and defer their definitions to C. Cogent also offers abstract types. An abstract type is a type that only gets declared in Cogent, but is defined in C.

If we want to declare two abstract types A and B, we write in Cogent:

type A
type B

Cogent assumes nothing but that they are boxed types, allocated on the heap and is access by a pointer. Boxed abstract types are by definition linear in Cogent’s type system. Whenever you use a value of type A in Cogent, it will always be a pointer to type A in the generated C code.

In C, we can give concrete definitions for these types, for example:

typedef char A;
typedef struct { int b; } B;

Note

If in your Cogent source file, there’re only type definitions and no function definitions, then Cogent will not generate any types in the C file. And Cogent will only generate types that get used by at least one function.

Now we need to add some Cogent functions to work on these types. For example, we define a very simple Cogent function:

swapDrop : all (a, b, c :< DS). (a, b, c) -> (b, a)
swapDrop (a, b, c) = (b, a)

swapDrop, as its name suggests, it swaps the first two components in the argument tuple, and drops the last element. all (a, b, c :< DS) universally quantifies type variables a, b and c. Additionally, it says that c is constrained to have DS permissions (see more details in Permissions). D and S mean that the type c has to be Discardable and Shareable respectively, i.e. non-linear, and that’s why the third component in the tuple can be dropped (or, discarded). Strictly speaking, S is not needed here, as we don’t share it. But it’s more conventional to use DS together, as D and S together denote linearity.

The main.ac file has some trickiness:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
$esc:(#include <stdlib.h>)
$esc:(#include <stdio.h>)

typedef char A;
typedef struct { int b; } B;

#include "swap-drop.c"

int main() {
  A *a = (A*)malloc(2 * sizeof(char));
  B *b = (B*)malloc(sizeof(B));
  a[0] = '!';
  a[1] = '\0';
  b->b = 42;

  $ty:((A,B,U32)) arg = { .p1 = a, .p2 = b, .p3 = 12 };
  $ty:((B,A)) ret = $exp:(swapDrop[A,B,U32])(arg);

  printf("fst = %u\n", ret.p1->b);
  printf("snd = %s\n", ret.p2);
  return 0;
}

On line 4 and 5, we give definitions for types A and B, as we have discussed above. It’s worthy noting that on line 7, we include the generated C file. It has to come after the definitions of A and B, as the generated C code rely on the definition of them. Finally on line 17, we use an antiquote $exp to refer to the Cogent function swapDrop. The type arguments of this function have to be fully applied, as in this main.ac file, the Cogent compiler doesn’t know what instantiation it has, thus unable to infer.

As before, we need an entrypoints.cfg file to pass to the --entry-funcs argument. In this file, the only function needs to be included is swapDrop[A,B,U32]. Again, for the same reason, the type arguments have to be fully applied. As the programmer, you are responsible for ensuring that the ones passed to --entry-funcs are consistent with what get used in the antiquoted C files. The Cogent compiler doesn’t perform any sanity checks.

2.6. Example: polymorphic abstract types

Now let’s explore some more advanced features of Cogent. Cogent allows types to be parametric, including abstract types. Typical examples include containers: arrays, lists, trees, etc. Functions operating on these parametric abstract types are polymorphic, and share the same interface. These functions are normally parametrically polymorphic, meaning that they are generic over types.

Note

Cogent allows for ad hoc definitions of some instances of a polymorphic function, but we won’t go into it in this example. We only consider parametric polymorphism here.

include <gum/common/wordarray.cogent>

map : WordArray U32 -> WordArray U32
map arr = let view = wordarray_view (arr, 3, 6, 1)
          and view' = wordarray_map_view (view, triple)
           in wordarray_unview view'

triple : U32 -> U32
triple x = 3 * x

In this example, we write a small Cogent function map which maps a slice of a wordarray. A wordarray is a dynamically allocated array in C, with unsigned integers (of the same type) as its elements. WordArray a is an abstract type defined in cogent/lib/gum/common/wordarray.cogent, where a is the element type of that array. wordarray_view (arr, fr, to, st) is a polymorphic function over the element type a, creating a writable view into a slice of an array arr, starting from the fr-th element (inclusive), with step st, and ending at the to-th element (exclusive). wordarray_map_view maps over every element in the view, and returns the updated slice. The updates are performed in-place, resulting in more performant C code. Finally wordarray_unview converts a view back to a regular array. This piece of Cogent program is relatively simple.

In the companion main.ac file, the main function is straightforward: we call the Cogent map function as map (arr). Here we don’t even need to use the $exp antiquote, as we can already know that the generated C function name of map is identical to its Cogent name, given that this function is monomorphic.

The antiquoted C file giving the definitions of the abstract functions for wordarray can be found in cogent/lib/gum/anti/wordarray.ac and is standard. What’s not so obvious is how to define the abstract type of wordarray.

Unlike the previous example that we could define the (monomorphic) abstract types in the main.ac file, here we need to create another type of antiquoted file—a .ah file—antiquoted header file. The antiquoted header files are passed to the --infer-c-types argument, contrary to the --infer-c-funcs argument. The reason why .ah files are different from .ac files is that, we know what types a polymorphic function should be instantiated to according to the explicit type applications in the .ac file, as in $exp:(swapDrop[A,B,U32]) in the previous example. For types, however, we work out the instantiations depending on what instances are used in your Cogent functions.

Note

It’s only used if it’s a dependency of at least one function specified in --entry-funcs.

The definition of WordArray a is given below (also in the repository in cogent/lib/gum/anti/abstract/WordArray.ah):

struct $id:(WordArray a) {
      int len;
      $ty:a* values;
};

typedef struct $id:(WordArray a) $id:(WordArray a);

In the Cogent standard library, a wordarray is defined to be a struct, consisting of two fields: len stores the length of the wordarray, and values is a C array holding the contents.

Let’s come back to the main.ac file. The first few lines look like:

$esc:(#include <stdio.h>)
$esc:(#include <stdlib.h>)
$esc:(#include <string.h>)

#include "mapper.c"
#include <wordarray.ac>

We only need to include the .ac files, as the .ah files will be automatically included in the generated mapper.h file. After all, the function declarations and definitions there rely on the definitions of the abstract types.

We can have a brief look at how they are included:

#include <abstract/WordArray_u32.h>
#include <abstract/View_WordArray_u32.h>
struct t2 {
    View_WordArray_u32 p1;
    t1 p2;
} ;

Once the parametric abstract type is needed, the Cogent compiler will generate lines to include the monomorphised definitions of the parametric types.

The build command (in a Makefile) is:

cogent $(SRC) -g -o$(OUTPUT) \
        --abs-type-dir="$(ABSDIR)" \
        --infer-c-types="$(AHFILES)" \
        --infer-c-funcs="$(ACFILES)" \
        --cpp-args="\$$CPPIN -o \$$CPPOUT -E -P $(CFLAGS)" \
        --entry-funcs=entrypoints.cfg

$(ABSDIR) is the directory containing the generated definitions of parametric types. All the generated header files will be placed in $(ABSDIR)/abstract, which must already exist before this command is run. $(AHFILES) needs to include all the needed .ah files, and $(ACFILES) here is only the main.ac, since the other .ac files are already included in main.ac.

The code for this example can be found in the repository.

2.7. Example: building Isabelle proofs

Todo

An example doing verification. Maybe focus on how to write the Makefile for verification and code generation, rather on the Isabelle proofs themselves.