Separate compilation in higher-level synthesis

The Geometry of Synthesis [pdf] research programme aims to develop higher-level synthesis tools and techniques that bring full support for functions. On the language side this means things like higher-order functions [pdf] and recursion [pdf]. On the compiler side this means support for things like runtime services, libraries and separate or heterogeneous (i.e. “foreign function”) compilation. The language-side features are very cool to have, but the compiler-side features are essential if you want to handle any realistic projects. In this post I will explain how the GoS compiler (gosc) supports these features via an example.

The source code

Let's implement the Fibonacci numbers using recursion and memoization.

new r := (new mem(128) in
new i := 1 in
while !i > 0 do {mem(!i) := 0; i := !i + 1};
mem(0) := 0;
# Fibonacci-by-value.
(fix \fib.\a.\n.
new n1 in
new n2 in
new n3 in
new n4 in
n1 := n;
if !n1 < 2 then 1$8
else if !a(!n1) > 0 then !a(!n1)
else (
n2 := fib(a)(!n1 - 2);
n3 := fib(a)(!n1 - 1);
n4 := !n2 + !n3;
a(!n1) := !n4;
!n4))(mem)(10))
in { print(!r) }

First, a note about the programming language: it’s basically ALGOL, slightly modernized with type inference and variable data widths (1$8 means the constant 1 on 8 bits). I will discuss in a different post why this is a great language for hardware synthesis. The important point here is that ALGOL is a conventional programming language; in the 1960s it was the conventional programming language. Even if you are not familiar with the details of the syntax, it should be obvious where the local variables and arrays are declared, how assignments and if statements work, and that dereferencing (!) is explicit. There is nothing hardware-y about it: no explicit channels or signals, no timing, etc.
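For readers who want to check the arithmetic, here is a rough Python transliteration of the program above. It is only an analogue, not the compiler's view of the program: Python integers are unbounded, whereas the ALGOL version works with fixed-width values such as 1$8. It keeps the same base case (1 for both n = 0 and n = 1) and the same convention that a zero entry in the memo array means "not computed yet".

```python
def fib_memo(n: int, size: int = 128) -> int:
    """Memoized Fibonacci mirroring the ALGOL example (fib(0) = fib(1) = 1)."""
    mem = [0] * size  # like `new mem(128)` after the initialisation loop: all zeros

    def fib(a: list[int], k: int) -> int:
        if k < 2:
            return 1
        if a[k] > 0:          # a non-zero entry is a cached result
            return a[k]
        n2 = fib(a, k - 2)
        n3 = fib(a, k - 1)
        a[k] = n2 + n3        # store the result before returning it
        return a[k]

    return fib(mem, n)


print(fib_memo(10))  # 89
```

As in the ALGOL code, the memo array is passed explicitly to the recursive function rather than captured as a global.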

We have an ALGOL interpreter and we can run our program on the desktop:

Step 1: Generate System HDL files

I want to run this program on my FPGA-powered board, a Terasic DE3.

The easy way is to compile it as-is and peek and poke at the data using SignalTap [pdf]. The harder way, which I take here, is to use the seven-segment LED displays to report the output value.

The DE3 board comes with a tool, the System Builder, for automatically generating the IP and pin assignments for the various devices on the board, such as LEDs, push-buttons, etc. It looks like this:

Note that most of the options are unselected because I won’t use them: I use the LEDs to display the output and a push-button for reset. The System Builder will then generate the right files, as well as a Quartus project file with the right settings.

What we need to do now is build two levels of wrappers around these IP cores, so that the Fibonacci program can use them as if they were programming-language functions.

Programming-level system calls

We are going to replace the call

{print(!r)}

which displays the result on the desktop, with platform-specific calls better suited to the board. In this case I will do this:

 _de3_set_seg7led0_state(0);
 _de3_set_seg7led1_state(0);
 new j := (_de3_wait_button:exp$4) in
 _de3_set_seg7led0_state((!r)$$4);
 _de3_set_seg7led1_state(((!r)>>4)$$4)

These functions do not exist yet; we define them below, and they could equally well be set up differently. The _de3_ prefix is just a convention showing that I am using DE3-specific functionality, and the rest of each name should make its role obvious. For example, _de3_set_seg7led0_state sets the state of one of the 7-segment LED displays on the DE3 board. Note the (…:exp$4) type annotation, which indicates that the result of that call is a 4-bit integer, and the >>4 and $$4 operators, which respectively shift an integer right by 4 bits and truncate it to 4 bits, treating it as a bit vector.
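To see what the two display calls receive, the Python sketch below mimics the >>4 and $$4 operators on the result. Modelling $$4 as masking with 0xF (keeping the low four bits) is our reading of "truncation" here, not a statement about the compiler's implementation.

```python
def split_nibbles(r: int) -> tuple[int, int]:
    """Return (low, high) 4-bit digits, as passed to seg7led0 and seg7led1."""
    low = r & 0xF          # (!r)$$4: keep the low 4 bits
    high = (r >> 4) & 0xF  # ((!r)>>4)$$4: shift right by 4, then keep 4 bits
    return low, high


low, high = split_nibbles(89)
print(low, high)  # 9 5
```

So for r = 89 (0x59), _de3_set_seg7led0_state receives 9 and _de3_set_seg7led1_state receives 5.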

Software-level wrappers

The functions above are a bit too “high level” to be implemented easily and directly in hardware. We could do it, but it is more interesting to exhibit a mixed approach, where the interface between “software” and hardware is achieved using both software and hardware wrappers.

Let us call this file stdlib.ia, because it is a very simple standard library for the DE3 board functions we choose to use. The full file is given below, piece by piece.

# This file contains drivers for the Terasic DE3 board

The first function converts hexadecimal digits into the bit patterns needed to turn on or off the seven LED segments that form each display. Note that the bit pattern is a 7-bit value, as there are 7 segments.

# Bits of the number represent segments; 1 = off, 0 = on
let _de3_seg7_convert = \n:exp$4. (
                           # c tl bl bot br tr top
 if n == 0 then 64$7       # 1 0 0 0 0 0 0
 else if n == 1 then 121$7 # 1 1 1 1 0 0 1
 else if n == 2 then 36$7  # 0 1 0 0 1 0 0
 else if n == 3 then 48$7  # 0 1 1 0 0 0 0
 else if n == 4 then 25$7  # 0 0 1 1 0 0 1
 else if n == 5 then 18$7  # 0 0 1 0 0 1 0
 else if n == 6 then 2$7   # 0 0 0 0 0 1 0
 else if n == 7 then 120$7 # 1 1 1 1 0 0 0
 else if n == 8 then 0$7   # 0 0 0 0 0 0 0
 else if n == 9 then 16$7  # 0 0 1 0 0 0 0
 else if n == 10 then 8$7  # 0 0 0 1 0 0 0
 else if n == 11 then 3$7  # 0 0 0 0 0 1 1
 else if n == 12 then 70$7 # 1 0 0 0 1 1 0
 else if n == 13 then 33$7 # 0 1 0 0 0 0 1
 else if n == 14 then 6$7  # 0 0 0 0 1 1 0
 else 14$7                 # 0 0 0 1 1 1 0
 ) in                      # 64 32 16 8 4 2 1
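The table can be cross-checked outside the compiler. The Python sketch below stores the same 16 constants and decodes each one back into the list of lit segments, using the bit order from the comment header (bit 6, weight 64, is the centre segment; bit 0, weight 1, is the top) and the active-low convention (0 = on). The segment names are just labels chosen for this sketch.

```python
# Segment names from bit 6 (weight 64) down to bit 0 (weight 1), as in the comment header.
SEGMENTS = ["centre", "top-left", "bottom-left", "bottom", "bottom-right", "top-right", "top"]

# The same constants as _de3_seg7_convert, indexed by hex digit 0..15.
SEG7 = [64, 121, 36, 48, 25, 18, 2, 120, 0, 16, 8, 3, 70, 33, 6, 14]


def lit_segments(pattern: int) -> list[str]:
    """Return the segments that are ON for a 7-bit active-low pattern."""
    lit = []
    for i, name in enumerate(SEGMENTS):
        bit = 6 - i  # SEGMENTS[0] is bit 6, SEGMENTS[6] is bit 0
        if (pattern >> bit) & 1 == 0:  # 0 means the segment is lit
            lit.append(name)
    return lit


for digit in (1, 8):
    print(digit, lit_segments(SEG7[digit]))
```

For digit 1 only the two right-hand segments light up; for 8 all seven do, which matches the all-zero pattern 0$7.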

The next two functions implement the programming-level API by calling the conversion helper above and then a function that handles the lower-level communication with the hardware, setting the register in which the state of the LED segments is stored. The _ffi_ prefix is meant to emphasise that this call is truly a foreign function call, which goes directly to hardware rather than to a function defined in the program.

let _de3_set_seg7led0_state = \n. {
 _ffi_HEX0_SET(_de3_seg7_convert n)}
and _de3_set_seg7led1_state = \n. {
 _ffi_HEX1_SET(_de3_seg7_convert n)} in
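The layering can be mimicked in Python: a programming-level call converts the digit and hands the resulting pattern to a lower-level setter. In this sketch the _ffi_ functions are stand-ins that merely record the value in a dictionary named `registers`, a hypothetical model of the board's display registers; on the real board these calls go to hardware.

```python
SEG7 = [64, 121, 36, 48, 25, 18, 2, 120, 0, 16, 8, 3, 70, 33, 6, 14]

registers = {"HEX0": None, "HEX1": None}  # hypothetical model of the two display registers


def seg7_convert(n: int) -> int:
    return SEG7[n & 0xF]  # digit -> 7-bit active-low pattern


def ffi_hex0_set(pattern: int) -> None:  # stand-in for _ffi_HEX0_SET
    registers["HEX0"] = pattern


def ffi_hex1_set(pattern: int) -> None:  # stand-in for _ffi_HEX1_SET
    registers["HEX1"] = pattern


def set_seg7led0_state(n: int) -> None:  # mirrors _de3_set_seg7led0_state
    ffi_hex0_set(seg7_convert(n))


def set_seg7led1_state(n: int) -> None:  # mirrors _de3_set_seg7led1_state
    ffi_hex1_set(seg7_convert(n))


set_seg7led0_state(9)
set_seg7led1_state(5)
print(registers)
```

The point of the split is that only the tiny ffi_* layer has to know about hardware; the conversion logic stays ordinary code.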

The next function is a convenience function that sets both LED displays with a single call; it is not actually used in this example.

let _de3_set_seg7led_state = \n.\m. (
 _de3_set_seg7led0_state n;
 _de3_set_seg7led1_state m)
and _de3_set_seg7led0_state = _de3_set_seg7led0_state
and _de3_set_seg7led1_state = _de3_set_seg7led1_state in

Below is the implementation of the function that waits for a button to be pressed, using a while loop. Again, note the _ffi_ function that interacts with the hardware.

let _de3_wait_button = (
 new bv := 0$4 in
 {while ( !bv == 0 ) do
 bv := _ffi_Button_get};
 !bv) in
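The polling loop is easy to model in Python. Here `read_buttons` is a hypothetical callable standing in for _ffi_Button_get; the loop keeps reading until it sees a non-zero 4-bit value and then returns it, just as the ALGOL version returns !bv.

```python
from typing import Callable, Iterator


def wait_button(read_buttons: Callable[[], int]) -> int:
    """Busy-wait until the 4-bit button value is non-zero, then return it."""
    bv = 0
    while bv == 0:
        bv = read_buttons() & 0xF  # keep 4 bits, like exp$4
    return bv


# Simulated readings: nothing pressed for three polls, then a non-zero value.
readings: Iterator[int] = iter([0, 0, 0, 4])
print(wait_button(lambda: next(readings)))
```

A busy-wait like this is cheap in hardware terms: it compiles to a loop around a single port read, with no interrupt machinery needed.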

The final thing in the file is a compiler directive indicating which functions are to be exposed, through the linker, to the programmer.

export (_de3_set_seg7led0_state, _de3_set_seg7led1_state,
        _de3_set_seg7led_state,  _de3_wait_button)

Hardware-level wrappers

The HDL files generated by the DE3 System Builder are only stubs and need to be edited to:

  1. implement the desired behaviour
  2. create the right hooks for linking with the HDL generated by the compiler
  3. include metadata needed by the linker to make the several HDL files work together.

In our case the desired behaviour is quite simple: one of the push-buttons resets the LED segments before we proceed.

always @(posedge OSC1_50)
if (!Button[3]) begin
   BUF_LED0 <= 7'b0000000;
   BUF_LED1 <= 7'b0000000;
end else begin
   if (wire2) begin BUF_LED1 <= wire4; end
   if (wire5) begin BUF_LED0 <= wire7; end
end
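A cycle-by-cycle Python model of this always block may help when reading it. Each call to `clock_edge` stands for one rising edge of OSC1_50. `button3` models Button[3], which this code treats as active low (the buffers are cleared when it reads 0). Judging from the instance below, wire2/wire4 and wire5/wire7 appear to be the control/data pairs for HEX1 and HEX0; that reading is an assumption of the sketch. Python assignments are immediate while Verilog's `<=` is non-blocking, but the difference does not matter here because each register is updated only from inputs.

```python
from dataclasses import dataclass


@dataclass
class LedBuffers:
    led0: int = 0  # BUF_LED0, 7 bits
    led1: int = 0  # BUF_LED1, 7 bits

    def clock_edge(self, button3: int, wire2: int, wire4: int, wire5: int, wire7: int) -> None:
        """One rising edge of OSC1_50, following the always block."""
        if not button3:          # Button[3] pressed (active low): clear both buffers
            self.led0 = 0
            self.led1 = 0
        else:
            if wire2:            # control for HEX1: latch new data from wire4
                self.led1 = wire4 & 0x7F
            if wire5:            # control for HEX0: latch new data from wire7
                self.led0 = wire7 & 0x7F


bufs = LedBuffers()
bufs.clock_edge(button3=1, wire2=1, wire4=18, wire5=0, wire7=0)
bufs.clock_edge(button3=1, wire2=0, wire4=0, wire5=1, wire7=16)
print(bufs)
```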

There is also some code to connect the LED segments to registers.

The right hooks for connecting to the rest of the compiled program are constructed by defining a top-level instance with the right ports:

main main_Instance (
 .clock(OSC1_50),
 .reset(Button[3]),
 .v_0_0ffi0_0hex10_0set_0731_1_1(wire3),
 .v_0_0ffi0_0hex10_0set_0731_1_2c(wire2),
 .v_0_0ffi0_0hex10_0set_0731_1_2d(wire4),
 .v_0_0ffi0_0hex10_0set_0731_1_3(wire3),
 .v_0_0ffi0_0hex10_0set_0731_1_4(wire2c),
 .v_0_0ffi0_0hex00_0set_0731_1_1(wire6),
 .v_0_0ffi0_0hex00_0set_0731_1_2c(wire5),
 .v_0_0ffi0_0hex00_0set_0731_1_2d(wire7),
 .v_0_0ffi0_0hex00_0set_0731_1_3(wire6),
 .v_0_0ffi0_0hex00_0set_0731_1_4(wire5c),
 .v_0_0ffi0_0button0_0get_0400_1_1(wire1),
 .v_0_0ffi0_0button0_0get_0400_1_2c(wire1c),
 .v_0_0ffi0_0button0_0get_0400_1_2d(notButton)
);

Note the names of the _ffi_ functions in the interface. They are mangled because Verilog’s naming conventions are stricter than Algol’s. The ports themselves are determined by the type signatures of the library functions; the details are beyond the scope of this post. The library implementer needs to be aware of how a programming signature is mapped into hardware interfaces, i.e. ports [pdf].
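The mangling rule is not spelled out here, but the name part of the three port names above is consistent with a simple guess: lower-case the name and encode each underscore as 0_0. The Python sketch below implements only that guess, purely as an illustration. It is inferred from the examples, is not taken from gosc, and ignores both the v_ port prefix and the trailing _0731/_0400 suffix. Lower-casing alone would make names that differ only in case collide, so the real scheme may well differ.

```python
def mangle_guess(name: str) -> str:
    """Hypothetical reconstruction of the name part of the mangling (inferred, not official)."""
    return name.lower().replace("_", "0_0")


for n in ("_ffi_HEX1_SET", "_ffi_HEX0_SET", "_ffi_Button_get"):
    print(n, "->", mangle_guess(n))
```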

The file also needs to provide metadata to the linker, for connecting this module to the modules in the synthesised files:

///// BEGIN METADATA
// EXPORTPOS 0_0ffi0_0button0_0get_0400 0
// EXPORTTYPE 0_0ffi0_0button0_0get_0400 exp$4
// EXPORTGROUP 0_0ffi0_0button0_0get_0400 1
// EXPORTPOS 0_0ffi0_0hex00_0set_0731 0
// EXPORTTYPE 0_0ffi0_0hex00_0set_0731 (exp$7 -> com)
// EXPORTGROUP 0_0ffi0_0hex00_0set_0731 2
// EXPORTPOS 0_0ffi0_0hex10_0set_0731 0
// EXPORTTYPE 0_0ffi0_0hex10_0set_0731 (exp$7 -> com)
// EXPORTGROUP 0_0ffi0_0hex10_0set_0731 3
// LIBRARY _ true
// TOTALPORTS _ 2
// OUTERMOST _ true
///// END METADATA
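Because each metadata line has the shape `// KEY name value`, the block is easy to read mechanically. The Python sketch below collects it into a per-name dictionary. It assumes the value is everything after the second field (so types such as (exp$7 -> com) survive intact) and makes no claim about what the keys mean; it is not the linker's own parser.

```python
from collections import defaultdict

METADATA = """\
///// BEGIN METADATA
// EXPORTPOS 0_0ffi0_0button0_0get_0400 0
// EXPORTTYPE 0_0ffi0_0button0_0get_0400 exp$4
// EXPORTGROUP 0_0ffi0_0button0_0get_0400 1
// EXPORTPOS 0_0ffi0_0hex00_0set_0731 0
// EXPORTTYPE 0_0ffi0_0hex00_0set_0731 (exp$7 -> com)
// EXPORTGROUP 0_0ffi0_0hex00_0set_0731 2
// EXPORTPOS 0_0ffi0_0hex10_0set_0731 0
// EXPORTTYPE 0_0ffi0_0hex10_0set_0731 (exp$7 -> com)
// EXPORTGROUP 0_0ffi0_0hex10_0set_0731 3
// LIBRARY _ true
// TOTALPORTS _ 2
// OUTERMOST _ true
///// END METADATA
"""


def parse_metadata(text: str) -> dict[str, dict[str, str]]:
    """Map each name to its {KEY: value} entries, reading only between the BEGIN/END markers."""
    entries: dict[str, dict[str, str]] = defaultdict(dict)
    inside = False
    for line in text.splitlines():
        if line.startswith("///// BEGIN METADATA"):
            inside = True
        elif line.startswith("///// END METADATA"):
            inside = False
        elif inside and line.startswith("// "):
            key, name, value = line[3:].split(" ", 2)  # value may contain spaces
            entries[name][key] = value
    return dict(entries)


meta = parse_metadata(METADATA)
print(meta["0_0ffi0_0hex00_0set_0731"]["EXPORTTYPE"])
print(meta["_"])
```

Entries whose name is `_` describe the module as a whole rather than a particular export.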

The details of the linker naming conventions are also beyond the scope of this post.

Compiling, linking, running

We compile each Algol file individually, and the compiler synthesises a corresponding VHDL file for each. We then link the HDL files:

linker fibonacci-memoized.vhdl stdlib.vhdl DE3.v > main.vhdl

The order of the files is important: the “programming files” need to come first, then the software-level wrappers, then the hardware IP. The resulting main file uses the available metadata to connect the various modules together correctly and establishes a top-level process that starts the overall execution; you may think of it as an extremely simple boot loader.

We can now open the Quartus project file created by the System Builder and add all the HDL files, plus compiler_lib.vhdl, which includes some additional definitions used by the compiler. Then follow the normal design flow (analysis, place & route, assembly). If we don’t set the clock constraints for TimeQuest we will receive warnings, which we can safely ignore (for now).

Programming the FPGA is next.

And it works! You can see fibonacci(10), which is 89:

89 is 0x59 in hexadecimal, so the two displays show the digits 5 and 9. You can also see a video of the FPGA in action.

Credit for most of the implementation work on the compiler and the example goes to my student Alex Smith.

About Dan Ghica

Reader in Semantics of Programming Languages // University of Birmingham // https://twitter.com/danghica // https://www.facebook.com/dan.ghica
This entry was posted in Geometry of Synthesis.
