Multi-Threaded Reactive Programming—The Kiel Esterel Processor

Xin Li and Reinhard von Hanxleden, Member, IEEE

Abstract—The Kiel Esterel Processor (KEP) is a multi-threaded reactive processor designed for the execution of programs written in the synchronous language Esterel. Design goals were timing predictability, minimal resource usage, and compliance with full Esterel V5. The KEP directly supports Esterel's reactive control flow operators, notably concurrency and various types of preemption, through dedicated control units. Esterel allows arbitrary combinations and nestings of these operators, which poses particular implementation challenges that are addressed here. Other notable features of the KEP are a refined instruction set architecture, which allows trading off generality against resource usage, and a Tick Manager that minimizes reaction time jitter and can detect timing overruns.

Index Terms—Reactive systems, concurrency, multi-threading, synchronous languages, Esterel, low-power design, predictability

1 INTRODUCTION

Many embedded systems belong to the class of reactive systems, which continuously react to inputs from the environment by generating corresponding outputs. The programming of reactive systems typically requires the use of non-standard control flow constructs, such as concurrency and exception handling. Most programming languages, including languages such as C and Java that are commonly used in embedded systems, either do not support these constructs at all, or their use induces non-deterministic program behavior, regarding both functionality and timing.

To address this difficulty, the synchronous language Esterel [2] has been developed to express reactive control flow in a concise, deterministic manner. This is valuable for the designer, but also poses implementation challenges. As Esterel is a domain-independent programming (specification) language, there are a number of implementation alternatives, each with its advantages and drawbacks; see also Table 1. An Esterel program is typically validated via a simulation-based tool set, and then synthesized to an intermediate language, e.g., C or VHDL [1]. To build the real system, one typically uses a commercial off-the-shelf (COTS) processor for a software implementation, or generates a circuit for a hardware implementation. HW/SW co-design strategies have also been investigated, for example in POLIS [5].

Reactive programs are often characterized by very frequent context switches; as it turns out, a context switch after every three or four instructions is not uncommon [8]. This adds significant overhead to traditional compilation approaches, as the restriction to a single program counter requires the program to keep track of thread control counters manually, using state variables. Traditional OS context switching mechanisms would be even more expensive. Furthermore, the handling of preemption requires a rather clumsy sequential checking of conditionals whenever control flow may be affected by a preemption.

To address these difficulties, the recently emerging reactive processing approach strives for a direct implementation of Esterel's control flow and signal handling constructs. This provides hardware support for handling reactive control flow, obviating the need for a compiler that sequentializes the code or for an OS that emulates concurrency. In this paper, we present the Kiel Esterel Processor (KEP) reactive architecture. The development of the KEP was driven by the desire to achieve predictable, competitive execution speeds at minimal resource usage, considering processor size and power usage as well as instruction and data memory. A key to achieving this goal is the instruction set architecture (ISA) of the KEP, which allows the mapping of Esterel programs into compact machine code while keeping the processor compact. To keep the KEP simple and lightweight, it currently does not employ classical acceleration mechanisms such as pipelining, other forms of instruction-level parallelism, or caching. Such mechanisms can be combined with reactive processing [9], but typically there is a trade-off between average-case performance and predictability. Still, the worst-case reaction time of the KEP is typically improved by a factor of four compared to the MicroBlaze, a COTS RISC processor core, and energy consumption is also typically reduced to a quarter; see also Sec. 6.4.

This paper presents a comprehensive overview of the KEP architecture and how it meets the challenge to accurately and efficiently implement the rich, strictly

---

R. von Hanxleden is with the Department of Computer Science, Christian-Albrechts-Universität zu Kiel, 24098 Kiel, Germany. E-Mail: rvh@informatik.uni-kiel.de.
X. Li is with the Department of Electrical and Computer Engineering, University of Minnesota, Twin Cities, Minneapolis, Minnesota, USA. E-Mail: lixxx914@umn.edu.

Manuscript received June 8, 2007; revised October 30, 2009.
For information on obtaining reprints of this article, please send e-mail to: tc@computer.org, and reference IEEECS Log Number TCSI-2007-06-0223. Digital Object Identifier no. XXX.
TABLE 1
Comparison of Esterel implementation alternatives. ++ represents best; – – means worst.

<table>
<thead>
<tr>
<th>Architecture</th>
<th>Hardware</th>
<th>Software</th>
<th>Co-design</th>
<th>Patched Processor</th>
<th>Custom Processor</th>
</tr>
</thead>
<tbody>
<tr>
<td>Environment</td>
<td>Custom Hardware</td>
<td>COTS Assembler</td>
<td>COTS µC</td>
<td>Extended Assembler</td>
<td>Esterel µC</td>
</tr>
</tbody>
</table>

| Selected References | Berry [1], Berry et al. [2], Balarin et al. [3], Roop et al. [4], Li et al. [5] |
|---------------------|---------------------------------------------------------------------------------
| Flexibility         | – – – – + + – – – + + + + + + |
| Esterel Compliance  | ++ ++ ++ – – + – – ++ + + |
| Logic Area          | ++ – – + + – – ++ |
| Memory              | ++ – – – + + + + |
| Power Usage         | ++ – – – – + + |
| Appl. Design Cycle  | – – – – – – – ++ ++ |

2.1 Reactive control flow

Esterel's parallel operator || groups statements in concurrently executed threads. The parallel terminates when all its branches have terminated.

Esterel offers two types of preemption constructs, abortion and suspension. An abortion kills its body when a delay elapses. We distinguish strong abortion, which kills its body immediately (at the beginning of a tick), and weak abortion, which lets its body receive control for a last time (abortion at the end of the tick). A suspension freezes the state of a body in the tick when the trigger event occurs.
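
To make the difference between the three preemption forms concrete, here is a minimal Python sketch of a single tick in which a preemption construct sees its trigger. The function name `preempted_tick`, the string tags for the construct kinds, and the integer body state are illustrative assumptions, not part of Esterel or the KEP. The sketch also ignores the distinction between immediate and non-immediate triggers.

```python
from typing import Callable, Tuple

def preempted_tick(kind: str, trigger: bool, body: Callable[[int], int],
                   state: int) -> Tuple[int, bool]:
    """Hypothetical model of one tick of a body under a preemption construct.

    kind is "strong_abort", "weak_abort", or "suspend"; trigger tells whether
    the trigger signal is present in this tick. Returns the body state after
    the tick and whether the body has been killed.
    """
    if trigger and kind == "strong_abort":
        return state, True          # killed before the body runs
    if trigger and kind == "suspend":
        return state, False         # frozen: body skipped, state kept
    new_state = body(state)         # body receives control in this tick
    if trigger and kind == "weak_abort":
        return new_state, True      # killed after its last reaction
    return new_state, False
```

Take a body that simply counts its reactions. Under weak abortion the count still advances once before the body dies. Under strong abortion or suspension it stays unchanged, and only suspension keeps the body alive.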

Esterel also offers an exception handling mechanism via the trap/exit statements. An exception is declared with a trap scope and is thrown with an exit statement. An exit T statement causes control flow to move to the end of the scope of the corresponding trap T declaration. This is similar to a goto statement; however, specific rules apply when traps are nested or when the trap scope includes concurrent threads. If one thread raises an exception and the corresponding trap scope includes concurrent threads, then the concurrent threads are weakly aborted. If concurrent threads execute multiple exit instructions in the same tick, the outermost trap takes priority.
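
The priority rule for simultaneous exits can be sketched as follows. The sketch assumes that traps are identified by name and listed from outermost to innermost. `resolve_exits` is a hypothetical helper, not part of any Esterel tool.

```python
from typing import Iterable, Optional, Sequence

def resolve_exits(nesting: Sequence[str], exits: Iterable[str]) -> Optional[str]:
    """Hypothetical helper: pick the trap that takes effect in this tick.

    nesting lists the enclosing trap names from outermost to innermost;
    exits holds the traps raised by concurrent threads in the current tick.
    The outermost raised trap wins; None means no exception was raised.
    """
    raised = list(exits)
    if not raised:
        return None
    # A smaller index in the nesting list means an outer trap.
    return min(raised, key=nesting.index)
```

For the EXAMPLE in Fig. 1, T1 encloses T2. An exit T1 and an exit T2 in the same tick therefore resolve to T1, so the emit O2 placed between the two end trap statements is skipped.
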

```
module EXAMPLE:
input S, I, H;
output O1, O2;
signal A, R in
  every S do
    trap T1 in
      trap T2 in
        [ % ------- Thread 1
          await I;
          weak abort
            suspend
              sustain R
            when H
          when immediate A;
          emit O1;
          exit T1
        || % ------- Thread 2
          await 2 tick;
          present R then
            emit A
          end present;
          exit T2
        ]
      end trap;
      emit O2
    end trap
  end every
end signal
end module
```

```
INPUT S, I, H
OUTPUT O1, O2
SIGNAL A, R
% ------- Thread 0
EMIT_TICKLEN, #20
AWAIT S
A0: ABORT S, A1
T1S: T2S: PAR 3, P1
PAR 2, P2
PARE P3
% ------- Thread 1
P1: AWAIT I
WABORTI A, A2
SUSPEND H, A4
A3: EMIT R
PRIO 1
PRIO 3
PAUSE
GOTO A3
A4: A2: EMIT O1
EXIT T1E, T1S
% ------- Thread 2
P2: LOAD_COUNT, #2
AWAIT TICK
PRESENT R, A5
EMIT A
A5: EXIT T2E, T2S
% ------- Thread 0
P3: JOIN 0
T2E: EMIT O2
T1E: HALT
A1: GOTO A0
```

S  S  I  S  I  S  I  H
O2  R, A, O1  O2

(a) Esterel program (b) KEP assembler

(c) Sample execution trace, with inputs above and outputs below logical tick time line

Fig. 1. EXAMPLE: an Esterel module illustrating Esterel parallel, preemption, and exception statements.

2.2 An Example

Let us consider the EXAMPLE module in Fig. 1a. After the input/output/local signal declarations, an every block restarts its body whenever the input signal S is present. The exception is the initial tick, where S is ignored because the every is not "immediate." Inside the every block are two nested trap scopes: an outer trap triggered by T1 (via an exit T1 exception) and an inner T2 trap. The body of the inner trap contains two parallel threads. Thread 1 initially waits for the input signal I. Once I has become present (non-immediately), the sustain R¹ statement continuously emits the local signal R. However, that sustain is weakly aborted by A (immediately) and suspended by H (non-immediately). Once A has triggered the abortion, O1 is emitted and the exception T1 is thrown. Thread 2 initially idles for two ticks, then emits A if R is present, and exits the T2 trap.

1. To aid readability, we here use the convention of subscripting instructions with the line number where they occur.

A possible execution trace is shown in Fig. 1c. All signals are absent at the initial tick. At the second tick, the presence of input signal S triggers the start of the every body, which spawns Threads 1 and 2. Thread 1 stays at the await I, since this await is non-immediate, and Thread 2 stays at the await 2 tick. At the third tick, Thread 1 again cannot proceed, since I is absent, and Thread 2 spends its second tick at the await 2 tick. At the fourth tick, Thread 1 again does not get an input I. Thread 2 can proceed, detects R as absent, throws exception T2, and terminates. This exception aborts Thread 1 and transfers control to the end of the trap T2 scope, so O2 is emitted and control moves back to the every S loop. The remainder of the time line shows the other possible behaviors that follow the next occurrences of S. For a detailed understanding, the reader might also consult the Esterel primer [14].

3 RELATED WORK

There is by now an extensive body of research on the efficient compilation of Esterel in the various domains; a full discussion is beyond the scope of this paper. See, for example, Potop-Butucaru/Edwards [15] for a good overview of software synthesis approaches. We here focus on existing work in the field of reactive processing, which is a rather recent development. An earlier overview was presented in [16].

3.1 Sequential reactive processing

The first reactive processor, called REFLIX [6], was presented by Salcic, Roop et al. in 2002. The REFLIX is a patched processor. It combines a traditional soft microcontroller core (FLIX) with a custom hardware block that extends the instruction set of the FLIX by certain new, Esterel-like instructions; see also Table 1. Although the Esterel-style statements (instructions) supported by the REFLIX were very limited, it performed better than its competitors, i.e., the FLIX and other microcontrollers. The REPLIC [17] replaced the FLIX by the PIC processor, which is popular in the industrial control domain. Both of these patched processors are limited by the control path of the traditional processor. This prevents, for example, the proper handling of nested traps, as the control path there depends on the address ranges and parallel relations of the traps.

In 2004, the authors presented the first prototype of the KEP [18], now referred to as the "KEP1." To our knowledge, it was the first custom reactive processor fully designed from scratch. The KEP1 was also the first to correctly handle weak and strong abortion. It included Watcher units that handle such abortions concurrently with the regular control flow, i.e., without executing extra instructions to check whether an abortion has been triggered (see also Sec. 5.2). However, the KEP1 did not provide full concurrency, and logic and arithmetic expressions were also not supported.

In 2005, Z. Salcic et al. presented the REMIC, another custom processor [19]. In the same year, the KEP2 improved on the KEP1 in that it includes an interface block that supports the PRE operator and can handle further Esterel constructs such as variables and local signals [10]. Furthermore, it contains an ALU and supports some classical logic and arithmetic expressions. The KEP2 also includes a Tick Manager, which can provide a constant logical tick length and detects timing overruns; see also Sec. 5.5.

3.2 Concurrent reactive processing

Perhaps the most distinguishing feature of reactive processors is whether and how they handle concurrency. The first generations of reactive processors did not support Esterel's concurrency operator directly. Executing concurrent Esterel programs therefore required transforming them into an equivalent program with a flattened state space, i.e., sequentializing them by constructing a "product automaton."
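
The following Python sketch gives a rough illustration of this flattening. It builds the reachable states of the synchronous product of two deterministic automata that read the same input in every tick. The representation is an assumption of this sketch: dictionaries map (state, input) pairs to successor states, and missing entries mean "stay". Real Esterel compilers work on richer intermediate forms.

```python
def reachable_product(trans_a, trans_b, start_a, start_b):
    """Hypothetical sketch: reachable states of a synchronous product.

    trans_a and trans_b map (state, input) -> next state. Both automata
    consume the same input in each tick, like two concurrent threads.
    """
    inputs = {i for (_, i) in trans_a} | {i for (_, i) in trans_b}
    start = (start_a, start_b)
    seen, frontier = {start}, [start]
    while frontier:
        qa, qb = frontier.pop()
        for i in inputs:
            # A missing transition leaves that component in its current state.
            nxt = (trans_a.get((qa, i), qa), trans_b.get((qb, i), qb))
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen
```

Two cyclic threads with 2 and 3 states already yield 6 product states. The flattened state space is bounded by, and can reach, the product of the component state counts.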

The EMPEROR [20], [21] introduced the multi-processing approach for handling concurrency directly. Here, every Esterel thread is mapped onto an independent processor, and a thread control unit handles the synchronization and communication between processors; see also Fig. 2a. The EMPEROR uses a cyclic executive to implement concurrency and allows the arbitrary mapping of threads onto processing nodes. This approach has the potential to speed up execution relative to single-processor implementations. However, this execution model may require replicating parts of the control logic at each processor and is thus relatively hardware-intensive. Furthermore, it is difficult to support the arbitrary nesting of concurrency and preemption. Multi-threading appears to be a more efficient approach to implementing concurrency in reactive processing. It was first employed by the KEP3, presented in 2006 [7]; see also Fig. 2b and further explanations in Sec. 5.1. As illustrated in Sec. 5, this approach scales well to high degrees of concurrency with minimal resource overhead. By now the multi-threading approach has also been adopted by the STARPro [9], which further improves performance by pipelining.

Later in 2006, the KEP3a and its compiler were presented [6]. The KEP3a improves on the KEP3 in that it supports exception handling and provides context-dependent preemption handling instructions. This paper presents version 4 of the KEP (KEP4). Compared to earlier generations, the KEP4 has an enriched control path for handling advanced mixed Esterel control structures and supports numerous options for generating processor configurations. The KEP4 and the strl2kasm compiler fully support Esterel V5. The subsequently evolved Esterel V7 has numerous extensions, mainly to support hardware design, but has the same underlying reactive control flow operations. Hence, extending the KEP approach to Esterel V7 appears to be mostly a compiler issue and should not require fundamental changes to the architecture described here.

3.3 Compilation, WCRT analysis, co-design

Since multi-processing and multi-threaded reactive processors employ different strategies to handle Esterel concurrency, their compilers, which synthesize an Esterel program to the target reactive processor codes, also use different approaches to implement the communication between threads.

For multi-processing, the EMPEROR Esterel Compiler 2 (EEC2) [21] is based on a variant of the graph code (GRC) format [15] and appears to be competitive even for sequential execution on a traditional processor. However, its synchronization mechanism is based on a three-valued signal logic. It does not take compile-time scheduling knowledge into account, but instead cycles repeatedly through all threads until all signal values have been determined. Hence the compiler needs to generate sync instructions to ensure that signals are not tested before they are emitted [20]. In comparison, the multi-threaded approach implements interleaving by inserting priority-setting instructions at the context switch points.

The compiler for the multi-threaded KEP employs a priority assignment approach that makes use of a novel concurrent control flow graph, the Concurrent KEP Assembler Graph (CKAG). The worst-case size of the CKAG is quadratic in the size of the corresponding Esterel program; in practice, namely for a bounded abort nesting depth, it is linear [12]. Unlike earlier Esterel compilation schemes, this approach avoids unnecessary context switches by considering each thread's actual execution state at run time. Furthermore, it avoids the code replication present in other approaches. The compilation for the KEP is further summarized by Li et al. [8] and presented in detail by Boldt [12].

Since one key characteristic of the KEP is its timing predictability, it is feasible to perform a conservative, yet fairly accurate analysis of its WCRT. A first WCRT
and subsequently performs the synthesis. An intermediate logic minimization, at the source code level, facilitates the synthesis of compact logic blocks.

3.4 Further related work

The KEP series of processors focuses on reactive control flow. There are also other architectures, such as the Kiel Lustre Processor (KLP) [23], that focus on data-flow and its efficient execution via parallelization.

Timing predictability is also the focus of the Precision Timed Architecture [24] developed at UC Berkeley and Columbia University. This architecture is also multi-threaded, with true parallelism, and employs physical time for thread synchronization. The PRET-C proposal [25] extends a traditional core with an external thread scheduling unit, and uses a synchronous programming model.

Related to the KEP ISA, there have also been recent proposals for expressing synchronous concurrency by a small set of operators embedded in a host language, such as LuSteraL [26] or SyncCharts in C [27]; see also the Conclusions.

4 THE KEP INSTRUCTION SET ARCHITECTURE

At the Esterel level, one distinguishes kernel statements and derived statements; the derived statements are basically syntactic sugar, built up from the kernel statements. Any set of Esterel statements from which the remaining statements can be constructed could be considered a valid set of kernel statements. The accepted set of Esterel kernel statements has indeed evolved over time; for example, the halt statement used to be considered a kernel statement, but is now considered to be derived from loop and pause. We here adopt the definition of kernel statements from the v5 standard [14]. The process of expanding derived statements into equivalent, more primitive statements—which may or may not be kernel statements—is also called dismantling. When designing an instruction set architecture to implement Esterel-like programs, it would in principle suffice to just implement the kernel statements—plus some additions that go beyond “pure” Esterel, such as valued signals, local registers, and support for complex signal and data expressions. However, we decided against that, in favor of an approach that includes some redundancy among the instructions to allow more compact and efficient object code.
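
Dismantling can be pictured as a set of rewrite rules over the abstract syntax. The Python sketch below encodes three standard Esterel v5 expansions over a hypothetical tuple-based syntax tree:

- halt into loop pause end
- sustain S into loop emit S; pause end
- await S into abort halt when S

It only illustrates the idea and is not how strl2kasm is implemented. Note that the KEP ISA deliberately keeps AWAIT, SUSTAIN, and HALT as native instructions (Table 2) instead of dismantling them.

```python
def dismantle(stmt):
    """Hypothetical dismantler for a few derived Esterel statements.

    Statements are nested tuples such as ("await", "S"); anything not
    covered by a rule is returned unchanged.
    """
    kind = stmt[0]
    if kind == "halt":                      # halt = loop pause end
        return ("loop", ("pause",))
    if kind == "sustain":                   # sustain S = loop emit S; pause end
        return ("loop", ("seq", ("emit", stmt[1]), ("pause",)))
    if kind == "await":                     # await S = abort halt when S
        return ("abort", dismantle(("halt",)), stmt[1])
    return stmt
```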

The resulting KEP ISA is summarized in Table 2, which also illustrates the relationship between Esterel statements and the KEP instructions. The KEP uses a 36-bit wide instruction word and a 32-bit data bus. The corresponding instruction encoding is described elsewhere [11]. The KEP ISA has the following characteristics:

- All the kernel Esterel statements, and some frequently used derived statements, can be mapped to KEP instructions directly. For the remaining Esterel (V5) statements there exist dismantling rules that allow the compiler to generate KEP code, including general signal expressions (see [12]).
- The control statements are fully orthogonal: their behavior matches the Esterel semantics in all execution contexts.
- Common Esterel expressions, in particular all of the delay expressions (i.e., standard, immediate, and count delays), can be represented directly. Valued signals and other signal expressions, e.g., the previous value of a signal and the previous status of a signal, are also directly supported.
- All instructions fit into one instruction word and can be executed in a single instruction cycle, except for instructions that contain count delay expressions, which need an extra instruction word and take another instruction cycle to execute.
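
The size and timing rules in the last bullet translate into simple arithmetic. The following Python sketch counts instruction words, memory bits, and instruction cycles for executing each listed instruction once, using the 36-bit instruction word stated above. The instruction list format and the name `code_footprint` are hypothetical.

```python
WORD_BITS = 36  # KEP instruction word width, as stated in the text

def code_footprint(instrs):
    """Hypothetical helper returning (words, bits, cycles).

    instrs is a list of (mnemonic, has_count_delay) pairs. Each instruction
    takes one word and one cycle, plus one extra word and one extra cycle
    if it carries a count delay expression.
    """
    words = sum(2 if has_count else 1 for _, has_count in instrs)
    cycles = words  # one instruction cycle per word in this simple model
    return words, words * WORD_BITS, cycles
```

For example, an EMIT, an AWAIT with a count delay, and a PAUSE occupy four words (144 bits) and take four cycles.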

The KEP also handles schizophrenic programs correctly—if an Esterel statement must be executed multiple times within a tick, the KEP simply does so [8].

4.1 General Code Generation

An Esterel program is compiled into a KEP assembler program (kasm) by the KEP Esterel compiler (strl2kasm) [12], which uses the front-end of the CEC for parsing and module expansion. In a second step, the KEP assembler compiler (kasm2ko) [28] compiles the assembler program into binary machine code. This includes, for example, the mapping of input/output/local signals to signal registers and the selection of appropriate Watchers (see Sec. 5.2). The compiler also detects when the targeted KEP version does not provide enough resources, e.g., not enough Watchers or signals.
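
The resource check can be pictured as a comparison between what a program needs and what a configuration provides. In this Python sketch, the `KepConfig` record and its three fields are hypothetical stand-ins for the real configuration parameters, which the text does not enumerate in full.

```python
from dataclasses import dataclass

@dataclass
class KepConfig:
    """Hypothetical resource counts of a program or of a KEP configuration."""
    threads: int
    watchers: int
    signals: int

def missing_resources(needed: KepConfig, available: KepConfig) -> list:
    """Hypothetical check: list the resources the target cannot supply."""
    return [name for name in ("threads", "watchers", "signals")
            if getattr(needed, name) > getattr(available, name)]
```

A program needing four Watchers on a configuration that offers two would be reported as lacking watchers.
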
The KEP assembler code for the EXAMPLE is shown in Fig. 1b. As in the Esterel module, a KEP program always starts with the input/output declarations. Lines 1 to 3 declare the input signals S, I, and H, the output signals O1 and O2, and the local signals A and R. The following EMIT_TICKLEN, #20 instruction sets the tick length of this program to a fixed 20 instruction cycles, as determined by the WCRT analysis [13]; see also Sec. 5.5.
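
The effect of a fixed tick length can be sketched as follows. A reaction that finishes early waits out the rest of the tick, and one that needs more cycles than allowed is flagged as an overrun. This is a simplified Python model under our own assumptions: the name `pace_ticks` is hypothetical, and the model does not capture how the hardware Tick Manager actually reacts to an overrun.

```python
def pace_ticks(cycles_used, ticklen):
    """Hypothetical model of fixed-length ticks.

    cycles_used lists the instruction cycles each reaction needed; ticklen is
    the value set by EMIT_TICKLEN. Returns one (idle_cycles, overrun) pair
    per tick.
    """
    result = []
    for used in cycles_used:
        if used <= ticklen:
            result.append((ticklen - used, False))  # wait out the remainder
        else:
            result.append((0, True))                # timing overrun detected
    return result
```

With EMIT_TICKLEN, #20, reactions of 12, 20, and 23 cycles give 8 idle cycles, 0 idle cycles, and an overrun, respectively.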

For the program body, the generation of the KEP assembler code

TABLE 2

Overview of the KEP Esterel-type instruction set architecture. Esterel kernel statements are shown in **bold**. A signal expression Sexp can be one of the following: 1. S: signal status (present/absent); 2. PRE(S): previous status of signal; 3. TICK: always present. A numeral n can be one of the following: 1. #data: immediate data; 2. reg: register contents; 3. ?S: value of a signal; 4. PRE(?S): previous value of a signal.


<table>
<thead>
<tr>
<th>Mnemonic, Operands</th>
<th>Esterel Syntax</th>
<th>Notes</th>
</tr>
</thead>
<tbody>
<tr>
<td>INPUT[V]</td>
<td>S</td>
<td>Input declaration.</td>
</tr>
<tr>
<td>OUTPUT[V]</td>
<td>S</td>
<td>Output declaration.</td>
</tr>
<tr>
<td>SETV S, #data[reg]</td>
<td></td>
<td>Set the initial value of S; similar to EMIT, but does not affect presence.</td>
</tr>
<tr>
<td>PAR Prio, startAddr [, ID]</td>
<td></td>
<td>Fork and join, specifying the address range and the priority for each forked thread; see also Sec. 5. Optionally, one can specify the ID of the created thread.</td>
</tr>
<tr>
<td>PARE endAddr</td>
<td></td>
<td></td>
</tr>
<tr>
<td>JOIN Prio</td>
<td></td>
<td></td>
</tr>
<tr>
<td>PRIO Prio</td>
<td></td>
<td>Set the priority of the current thread.</td>
</tr>
<tr>
<td>[L][T][W]ABORT[n] Sexp, endAddr</td>
<td></td>
<td>If Sexp is present, strongly/weakly abort the block ranging up to endAddr. The optional prefix L or T selects the type of watcher to use; see also Sec. 5.2. L: Local Watcher, T: Thread Watcher, none: general Watcher.</td>
</tr>
<tr>
<td>[L][T][W]ABORTI Sexp, endAddr</td>
<td></td>
<td></td>
</tr>
<tr>
<td>SUSPEND[I] Sexp, endAddr</td>
<td></td>
<td>If Sexp is present, suspend the block ranging up to endAddr.</td>
</tr>
<tr>
<td>EXIT endAddr, startAddr</td>
<td>trap T in ... exit T ... end trap</td>
<td>Exit from a trap of specified scope. Unlike GOTO, check for concurrent EXITs and terminate enclosing [].</td>
</tr>
<tr>
<td>PAUSE</td>
<td>pause</td>
<td>Wait for a signal. AWAIT TICK is equivalent to PAUSE.</td>
</tr>
<tr>
<td>AWAIT[n] Sexp</td>
<td>await [n] Sexp</td>
<td></td>
</tr>
<tr>
<td>AWAIT[I] Sexp</td>
<td>await [immediate] Sexp</td>
<td></td>
</tr>
<tr>
<td>CAWAITS</td>
<td></td>
<td></td>
</tr>
<tr>
<td>CAWAIT[I] S, addr</td>
<td>await case [immediate] Sexp do end</td>
<td></td>
</tr>
<tr>
<td>CAWAITE</td>
<td></td>
<td>Wait for several signals in parallel. A compound statement, bracketed by CAWAITS and CAWAITE, with one CAWAIT[I] instruction per signal.</td>
</tr>
<tr>
<td>SIGNAL S</td>
<td>signal S in ... end</td>
<td>Initialize a local signal S.</td>
</tr>
<tr>
<td>EMIT S [], #data[reg]</td>
<td>emit S [data]</td>
<td>Emit (valued) signal S.</td>
</tr>
<tr>
<td>SUSTAIN S [], #data[reg]</td>
<td>sustain S [data]</td>
<td>Sustain (valued) signal S.</td>
</tr>
<tr>
<td>PRESENT S, elseAddr</td>
<td>present S then ... end</td>
<td>Jump to elseAddr if S is absent.</td>
</tr>
<tr>
<td>NOTHING</td>
<td>nothing</td>
<td>Do nothing.</td>
</tr>
<tr>
<td>HALT</td>
<td>halt</td>
<td>Halt the program.</td>
</tr>
<tr>
<td>GOTO addr</td>
<td>loop ... end loop</td>
<td>Jump to addr.</td>
</tr>
<tr>
<td>EMIT_TICKLEN, #data[reg]</td>
<td></td>
<td>Set the tick length.</td>
</tr>
<tr>
<td>LOAD_COUNT, n</td>
<td></td>
<td>Load data for count delays.</td>
</tr>
<tr>
<td>LOAD_32REG, #data32</td>
<td></td>
<td>Load 32-bit immediate data into an intermediate register.</td>
</tr>
<tr>
<td>CLR/SETC</td>
<td></td>
<td>Clear/set carry bit.</td>
</tr>
<tr>
<td>LOAD reg, n</td>
<td>reg := n</td>
<td>Load/store register.</td>
</tr>
<tr>
<td>(SR</td>
<td>SL</td>
<td>C</td>
</tr>
<tr>
<td>(ADD|SUB|MUL) reg, n</td>
<td></td>
<td></td>
</tr>
<tr>
<td>(AND|ORR|XOR) reg, n</td>
<td></td>
<td></td>
</tr>
<tr>
<td>CMP[S] reg, n</td>
<td></td>
<td>Compare (with sign), branch conditionally.</td>
</tr>
<tr>
<td>JN cond, addr</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
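
The operand forms listed in the caption of Table 2 can be read as a small expression language. The Python sketch below evaluates signal expressions (S, PRE(S), TICK) and numerals (#data, reg, ?S, PRE(?S)). It takes as input the present signals and the signal values of the current and previous tick. The function names, the string encoding, and register names such as r1 are hypothetical.

```python
def eval_sexp(sexp, present_now, present_prev):
    """Hypothetical evaluator for a signal expression: S, PRE(S), or TICK."""
    if sexp == "TICK":
        return True                          # TICK is always present
    if sexp.startswith("PRE(") and sexp.endswith(")"):
        return sexp[4:-1] in present_prev    # status in the previous tick
    return sexp in present_now               # status in the current tick

def eval_numeral(n, regs, values_now, values_prev):
    """Hypothetical evaluator for a numeral: #data, reg, ?S, or PRE(?S)."""
    if n.startswith("#"):
        return int(n[1:])                    # immediate data
    if n.startswith("PRE(?") and n.endswith(")"):
        return values_prev[n[5:-1]]          # previous value of a signal
    if n.startswith("?"):
        return values_now[n[1:]]             # current value of a signal
    return regs[n]                           # register contents
```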

4.2 Concurrency

A concurrent Esterel statement with n concurrent threads joined by the ||-operator is translated into KEP assembler as follows. First, threads are forked by a sequence of n PAR instructions and one PARE instruction. Each PAR instruction creates one thread and assigns it a non-negative priority Prio and a start address startAddr. The end address of a thread is given implicitly by the start address specified in the subsequent PAR instruction. If no further thread is to be created, the end address is instead specified as endAddr in a PARE instruction. A thread's address range thus extends from its start address to its end address. The code block for the last thread is followed by a JOIN instruction, which waits for the termination of all forked threads and concludes the concurrent statement. The example in Fig. 1b illustrates this: lines 12–21 constitute Thread 1, Thread 2 spans lines 23–27, and the remaining instructions belong to the main thread, Thread 0.
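
The way PAR and PARE operands delimit threads can be sketched in a few lines of Python. The helper `thread_ranges` is hypothetical, and addresses are treated as integers with an exclusive end. The usage below takes the Fig. 1b line numbers as stand-ins for instruction addresses.

```python
def thread_ranges(forks, pare_end):
    """Hypothetical helper: derive (prio, start, end) for each forked thread.

    forks lists (prio, start_addr) for each PAR in program order; pare_end is
    the endAddr operand of PARE. A thread ends where the next forked thread
    starts, and the last one ends at pare_end (end exclusive in this sketch).
    """
    starts = [start for _, start in forks] + [pare_end]
    return [(prio, starts[k], starts[k + 1])
            for k, (prio, _) in enumerate(forks)]
```

In Fig. 1b, P1, P2, and P3 sit at lines 12, 23, and 29. This yields the ranges [12, 23) for Thread 1 at priority 3 and [23, 29) for Thread 2 at priority 2. Once the comment lines are discounted, these match the line spans given above.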

The main thread always has priority 1, which is the lowest possible priority. The priority of the other threads is assigned when the thread is created (with the aforementioned PAR instruction), and can be changed subsequently by executing a priority setting instruction (PRIO). A thread keeps its priority across delay instructions; that is, at the start of a tick it resumes execution with the priority it had at the end of the previous tick.

When a concurrent statement terminates, through regular termination of all concurrent threads or via an exception/abort, the priorities associated with the terminated threads also disappear, and the priority of the main thread is restored to the priority upon entering the concurrent statement.
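
The bookkeeping described in the last two paragraphs amounts to a stack of saved priorities. The `PriorityStack` class below is a hypothetical Python sketch of that behavior for the thread that executes the PAR instructions. It does not model how the KEP hardware stores these values.

```python
class PriorityStack:
    """Hypothetical priority bookkeeping around (nested) concurrent statements."""

    def __init__(self, main_prio=1):
        self.main_prio = main_prio   # the main thread starts at priority 1
        self._saved = []

    def enter_par(self):
        self._saved.append(self.main_prio)   # remember priority at the fork

    def set_prio(self, prio):
        self.main_prio = prio                # e.g., the effect of a PRIO

    def leave_par(self):
        self.main_prio = self._saved.pop()   # restore on termination or abort
```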

The priority assigned during the creation of a thread and by a particular PRIO instruction is fixed. However, due to the non-linear control flow, a given statement may still be executed with varying priorities. The priority assignment performed by the compiler yields a statically determined schedule, which requires that there are no cyclic signal dependencies. This is a common restriction, imposed for example by the Columbia Esterel Compiler (CEC) [4]. See Li et al. [9] for further discussion of causality/constructiveness and static vs. dynamic scheduling. See Lukoschus et al. [29] for an approach to expanding cyclic, yet constructive programs into equivalent acyclic programs.

4.3 Handling Signal Dependencies

The signal-based communication employed by Esterel demands that signals have a unique presence/absence status throughout a tick, which is also why this strictly synchronous semantics is sometimes referred to as fixed-point semantics. This implies that a signal status should only be tested/read (e.g., with PRESENT) once all potential emitters/writers (e.g., EMIT) have executed. Assuring this also allows a proper reaction to signal absence. We also say that there is a dependency between the statements that emit some signal (the dependency sources) and the statements where that signal is tested (the dependency sinks) [30].
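
For a single tick, the rule that tests must follow all emissions can be checked on an execution log. The Python sketch below uses a hypothetical log format of (opcode, signal) pairs. Note that a log only shows the emitters that actually ran, whereas the compiler must also account for potential emitters.

```python
def dependency_violations(log):
    """Hypothetical checker: signals emitted after being tested in one tick.

    log is the ordered list of (op, signal) pairs executed in a tick, e.g.
    ("EMIT", "R") or ("PRESENT", "R").
    """
    tested = set()
    violations = []
    for op, sig in log:
        if op == "PRESENT":
            tested.add(sig)
        elif op == "EMIT" and sig in tested:
            violations.append(sig)  # the earlier test saw an outdated status
    return violations
```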

In a concurrent setting, this implies that threads must be executed in an order that obeys these dependencies. Hence, a non-trivial aspect of code generation is the assignment of thread ids and priorities, as these govern the thread scheduling (see also Sec. 5.1). To understand how these priorities are assigned, we consider the thread scheduling constraints that must be obeyed to run the EXAMPLE module faithful to Esterel’s semantics. The two threads enclosed in the every block can communicate back and forth via the R and A signals, within a logical tick, which makes thread scheduling non-trivial.

First, let us consider the dependency involving R. It is clear which instruction is the dependency source: the EMIT R_{15} instruction. It is also obvious that the PRESENT R_{25} instruction is the dependency sink. This allows us to formulate the first dependency present in the EXAMPLE module: whenever the EMIT R_{15} and the PRESENT R_{25} instructions are executed within the same logical tick, the execution of the EMIT must precede the execution of the PRESENT.

As for the dependency involving A, its dependency source is the EMIT A_{26} in Thread 2. However, it is less obvious which statement is the dependency sink, which we have defined above as "the statements where that signal is tested." Thread 1 reacts to A once it has entered the weak abort block; in that case, A triggers the abort. Hence, whenever we execute a statement in that block, ranging from WABORTI_{13} to the label A2_{20}, we should also watch for the presence of A. However, closer inspection shows that, as this is a weak abort, it suffices to check at the end of each logical tick whether the block is aborted, that is, whenever we execute a delay instruction. In this case, the only delay instruction in the abort block is the PAUSE_{18}, which therefore constitutes the dependency sink for A.
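
The two dependencies can be written as an instruction-level precedence graph and ordered with Python's standard `graphlib` module (Python 3.9 or later). The node numbers are the Fig. 1b line numbers used above, and only the instructions relevant to this tick are included. This illustrates the constraints and is not the strl2kasm algorithm.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical precedence graph for the EXAMPLE: each node (a Fig. 1b line
# number) maps to the set of instructions that must execute before it.
deps = {
    16: {15},        # Thread 1 program order: EMIT R -> PRIO 1
    17: {16},        # PRIO 1 -> PRIO 3
    18: {17, 26},    # PAUSE follows PRIO 3 and the dependency source EMIT A
    25: {15},        # PRESENT R must follow EMIT R
    26: {25},        # Thread 2 program order: PRESENT R -> EMIT A
}

def schedule(graph):
    """Return one admissible execution order, or None if the graph is cyclic."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None
```

At the granularity of whole threads, the same two dependencies form a cycle. Thread 1 must precede Thread 2 because of R, and Thread 2 must precede Thread 1 because of A. As a result, `schedule({"Thread 1": {"Thread 2"}, "Thread 2": {"Thread 1"}})` returns None. This is why a single fixed priority per thread does not suffice here.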

In the EXAMPLE, the first dependency is met by starting Thread 1 with a higher priority (3) than Thread 2 (priority 2). The second dependency is met by the PRIO 1_{16} and PRIO 3_{17} instructions, which hand control from Thread 1 to Thread 2 and back.
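
The following Python sketch replays the relevant tick under a simple priority scheduler. The ready thread with the highest priority executes one instruction at a time, and PRIO changes take effect immediately. The following are assumptions of the sketch:

- the `Thread` record and the opcode encoding
- the rule that ties go to the earlier list entry
- the rule that an absent signal at PRESENT skips exactly one instruction

The sketch also stops at PAUSE and EXIT without modeling the weak abortion by A or the resulting exit T1.

```python
from dataclasses import dataclass

@dataclass
class Thread:
    tid: int
    prio: int
    code: list               # hypothetical (opcode, operand) pairs for one tick
    pc: int = 0
    stopped: bool = False    # set once the thread pauses or exits this tick

def run_tick(threads):
    """Interleave threads by priority for one tick; return the executed trace."""
    emitted, trace = set(), []
    while True:
        ready = [t for t in threads if not t.stopped]
        if not ready:
            return trace
        # max() returns the first of equal maxima, so ties go to the earlier
        # list entry (an assumption of this sketch, not taken from the KEP).
        t = max(ready, key=lambda th: th.prio)
        op, arg = t.code[t.pc]
        t.pc += 1
        trace.append((t.tid, op, arg))
        if op == "EMIT":
            emitted.add(arg)
        elif op == "PRIO":
            t.prio = arg                 # takes effect before the next pick
        elif op == "PRESENT" and arg not in emitted:
            t.pc += 1                    # assumed one-instruction then-branch
        elif op in ("PAUSE", "EXIT"):
            t.stopped = True             # abort/exit semantics not modeled
```

Run this with Thread 1 at priority 3 (EMIT R, PRIO 1, PRIO 3, PAUSE) and Thread 2 at priority 2 (PRESENT R, EMIT A, EXIT). The resulting order is EMIT R, PRIO 1, PRESENT R, EMIT A, EXIT, PRIO 3, PAUSE, which satisfies both dependencies.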

Dependency analysis at the Esterel level is decidable (unlike at the KASM level), but a proper analysis that covers all aspects of the Esterel language is rather intricate. A detailed treatment of this is found in the documentation of the strl2kasm compiler [12].
5 The KEP Processor Architecture

Naturally, each application has specific requirements regarding computational resources. In the KEP, these resources include, for example, thread management and preemption capabilities. The KEP design is freely configurable and scalable to arbitrary degrees of concurrency, preemption nesting, signal counts, etc. A KEP can thus be configured specifically for a given application. However, just as one may use a classical processor with some fixed, conservatively large memory resources for a range of applications, one may also use a “typical” KEP configuration without detailed prior knowledge about the application. For example, all benchmarks reported here were run on a single fixed configuration, except when assessing hardware scalability.

The architecture of the KEP, shown in Fig. 3, is inspired by the three layers that constitute a reactive program \(^2\), i.e., the interface layer, the reactive kernel, and the data handling layer. An Interface Block handles input reception and output production. The classical computations are performed by the Data Handling Block, consisting of the Register File, ALU and related components. The implementation of Esterel’s reactive control statements relies on the cooperation of the KEP’s Decoder & Controller, Reactive Block and Thread Block, which together form the Reactive Core (RC). The RC contains dedicated hardware units to handle concurrency, preemption, exceptions, and delays. In the following, we will briefly discuss each of these in turn.

5.1 Handling Concurrency

To implement concurrency, the KEP employs a multi-threaded architecture. Threads are scheduled in an interleaved fashion according to their statuses and dynamically changing priorities. The context of each thread is very lightweight: it mainly consists of a program counter (PC), its priority, and two status flags (see below). All data are shared, consistent with Esterel’s broadcast semantics. The scheduler is also lightweight. Scheduling and context switching do not cost extra instruction cycles; only changing a thread’s priority costs an instruction. On the one hand, the priority-based execution scheme enforces an ordering among threads that obeys the constraints given by Esterel’s semantics (see Sec. 4.3); on the other hand, it avoids unnecessary context switches. If a thread lowers its priority during execution but still has the highest priority, it simply keeps executing.

The Thread Block is responsible for managing threads, as illustrated in Fig. 4a, which uses the SyncCharts formalism \(^3\). SyncCharts consist of hierarchical finite state machines; bold state borders denote the initial state at each hierarchy level. Upon program start, the main thread is enabled (forked), and the program is considered running. Subsequently, in each instruction cycle (\(\text{InstrClock}\)), the Thread Block decides which thread ought to be scheduled for execution. It schedules the thread with the highest priority among all active threads; if multiple threads share the highest priority, the thread with the highest \(\text{id}\) is scheduled. If there are still enabled threads but none is active anymore (see below), the next tick is started. If no thread is enabled anymore, the whole program terminates.
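The per-cycle scheduling decision can be modeled in a few lines. This is a software sketch of the selection rule only, assuming a simple record per thread; the `Thread` fields and the `schedule` function are hypothetical names, and the real Thread Block makes this decision in hardware without costing an instruction cycle.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Thread:
    tid: int                # thread id, also used as tie-breaker
    priority: int           # changed at run time by PRIO instructions
    enabled: bool = False   # forked and not yet joined/terminated
    active: bool = False    # still to be scheduled in the current tick


def schedule(threads: list[Thread]) -> tuple[str, Optional[int]]:
    """One scheduling decision per instruction cycle (hypothetical model).

    Returns ("run", tid), ("next_tick", None) if threads are still enabled
    but none is active, or ("terminate", None) if no thread is enabled.
    """
    candidates = [t for t in threads if t.enabled and t.active]
    if candidates:
        # Highest priority wins; ties go to the highest id.
        best = max(candidates, key=lambda t: (t.priority, t.tid))
        return ("run", best.tid)
    if any(t.enabled for t in threads):
        return ("next_tick", None)
    return ("terminate", None)


# Hypothetical EXAMPLE setup: Thread 1 starts at priority 3, Thread 2 at 2
ts = [Thread(0, 1, enabled=True), Thread(1, 3, True, True), Thread(2, 2, True, True)]
print(schedule(ts))   # ('run', 1)
ts[1].priority = 1    # effect of a PRIO 1 in Thread 1
print(schedule(ts))   # ('run', 2)
```

Because a thread that lowers its priority but remains the maximum is simply selected again, no context switch occurs in that case, matching the behavior described above.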

The execution status of a single thread is illustrated in Fig. 4b. Two flags are needed to describe the status of a thread. One flag indicates whether the thread is disabled or enabled. Initially, only the main thread (Thread 0) is enabled. Other threads become enabled whenever they are forked, and they become disabled again when they are joined after finishing all statements in their body, or when the preemption control tries to set their program counter to a value outside the thread’s address range. The other flag indicates whether the thread should still be scheduled within the current logical tick (the thread is active) or not (inactive). Active threads are initially in the preempted state and become executing when they are scheduled.

Control of a thread can never leave its address range; if a thread nevertheless attempts to do so, it is terminated immediately. This mechanism allows a simple solution for handling arbitrary nestings of preemption and concurrency. For example, if a nest of threads is aborted, each thread will try to jump to the end of the abortion block, which lies beyond its address range, and hence the thread will be terminated.
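The address-range rule can be sketched as follows. This minimal Python model assumes that each thread's code occupies one contiguous address interval and tracks only the two status flags; the preempted/executing substates, JOIN handling, and all names (`ThreadState`, `fork`, `jump`) are hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class ThreadState:
    start: int              # first address of the thread's code (inclusive)
    end: int                # last address of the thread's code (inclusive)
    pc: int = 0
    enabled: bool = False   # disabled/enabled flag
    active: bool = False    # active/inactive flag for the current tick

    def fork(self) -> None:
        self.enabled, self.active, self.pc = True, True, self.start

    def jump(self, target: int) -> None:
        """Set the PC, e.g., on a preemption. A target outside the thread's
        address range terminates (disables) the thread immediately."""
        if self.start <= target <= self.end:
            self.pc = target
        else:
            self.enabled = False
            self.active = False


# Two hypothetical threads nested in an abort block that ends at address 40
t1, t2 = ThreadState(20, 29), ThreadState(30, 38)
t1.fork()
t2.fork()
for t in (t1, t2):
    t.jump(40)  # jump to the end of the abortion block: outside both ranges
print(t1.enabled, t2.enabled)  # False False
```

No thread needs to know how deeply it is nested: the single range comparison suffices to terminate every thread inside an aborted nest.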

5.2 Handling Preemption

According to the Esterel semantics, a preemption (abortion or suspension) is enabled when control is in its body and disabled when control is outside of its body. When a preemption is enabled (active), its trigger signal is watched, and the module can react to the presence of that signal. Otherwise, the signal does not cause preemption. We call this scheme Inside/Outside Preemption Range Watching (IOPRW).

The RC provides a configurable number of Watcher units, which detect whether a signal triggering a preemption is present and whether the program counter (PC) is in the corresponding preemption body [7]. If, during program execution, the PC is within the watched range and the trigger signal is present, the Watcher triggers the corresponding changes in the control flow. Note that Esterel allows arbitrary nesting of preemption blocks of different types; for example, a strong abortion may be nested within a weak abortion, which in turn may be nested in a suspension. The Reactive Block is responsible for coordinating the Watcher blocks in a way that reflects the Esterel semantics. Each Watcher in the Reactive Block is assigned an index number, which also defines its priority. A Watcher can be overridden by another Watcher with higher priority. Given the nesting structure, a higher-priority (outer) preemption has a wider address range that covers all lower-priority (inner) ones. Therefore, the earlier preemption instruction in a preemption nest is assigned to the higher-priority Watcher. Note that with this approach, it is not necessary to continuously execute special Check/Abort instructions [17] that check the status of each Watcher, meaning that the program does not slow down when entering a (nested) abortion block. The Watcher modules operate autonomously, thus also offering a certain type of concurrency (with true parallelism) beyond the concurrency operator (implemented by interleaved multi-threading).
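The coordination rule can be illustrated with a small software model. This is a sketch under assumptions: we take a higher index to mean higher priority (the text only says the index defines the priority), give each Watcher a single resume address, and ignore the differences between weak abortion, strong abortion, and suspension. `Watcher` and `resolve` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Watcher:
    index: int    # priority; assumed: higher index = higher priority
    start: int    # first address of the preemption body
    end: int      # last address of the preemption body
    signal: str   # trigger signal
    target: int   # address where control continues on preemption

    def fires(self, pc: int, present: set[str]) -> bool:
        # IOPRW: only watched while the PC is inside the body
        return self.start <= pc <= self.end and self.signal in present


def resolve(watchers: list[Watcher], pc: int, present: set[str]) -> Optional[int]:
    """New PC demanded by the highest-priority firing Watcher, else None."""
    firing = [w for w in watchers if w.fires(pc, present)]
    if not firing:
        return None
    return max(firing, key=lambda w: w.index).target


# Hypothetical nest: the outer abort (on S) got the higher-priority Watcher
outer = Watcher(index=1, start=10, end=50, signal="S", target=51)
inner = Watcher(index=0, start=20, end=30, signal="T", target=31)
ws = [outer, inner]
print(resolve(ws, 25, {"S", "T"}))  # 51: the outer preemption overrides
print(resolve(ws, 25, {"T"}))       # 31
print(resolve(ws, 40, {"T"}))       # None: PC outside the inner body
```

In hardware all Watchers evaluate their condition in parallel, so no Check/Abort-style polling instructions are needed; the loop here merely stands in for that parallel comparison.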

The KEP Watchers are designed to permit arbitrary nesting of preemptions, and also the combination with the concurrency operator. However, in practice this often turns out to be more general than necessary, and hence wasteful of hardware resources. Therefore, the KEP includes several versions of the Watcher, with a correspondingly refined ISA. The least powerful, but also cheapest variant is the Thread Watcher, which belongs to a thread directly, and can neither include concurrent threads nor other preemptions. An intermediate variant is the Local Watcher, which may include concurrent threads and also preemptions handled by a Thread Watcher, but cannot include another Local Watcher. The default Watcher is the most general, which can handle all execution contexts.
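The choice among the three Watcher variants follows directly from their stated restrictions. The helper below is a hypothetical compiler-side sketch, not code from strl2kasm; its inputs (whether the body forks threads, and which variants were chosen for preemptions nested inside it) are assumptions about what such a compiler would know.

```python
def cheapest_watcher(has_threads: bool, nested: list[str]) -> str:
    """Pick the cheapest Watcher variant for one preemption body.

    has_threads: the body contains concurrent threads.
    nested: variants already chosen for preemptions nested in the body,
            each one of "Thread", "Local", "General".
    """
    if not has_threads and not nested:
        # Thread Watcher: neither concurrent threads nor other preemptions.
        return "Thread"
    if all(kind == "Thread" for kind in nested):
        # Local Watcher: may contain threads and Thread-Watcher preemptions,
        # but no Local (or general) Watcher.
        return "Local"
    # Default Watcher: handles all execution contexts.
    return "General"


print(cheapest_watcher(False, []))                 # Thread
print(cheapest_watcher(True, ["Thread"]))          # Local
print(cheapest_watcher(False, ["Local"]))          # General
```

Applied bottom-up over a preemption nest, this yields the cheapest legal assignment under the stated restrictions.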

5.3 Handling Exceptions

The KEP does not need an explicit equivalent to the trap statement, but it provides an EXIT statement that specifies a trap scope. If a thread executes an EXIT instruction, it tries to jump to the end of the trap scope. If that address is beyond the range of the current thread, control is not transferred directly to the end of the trap scope, but instead to the JOIN instruction at the end of the current thread. If other threads that merge at this JOIN are still active, they are still allowed to execute within the current logical tick. This is dictated by the Esterel semantics, which defines traps as a variant of weak abortion.

Some concurrent threads may already have been executed in the current tick before the EXIT is executed, and may have become inactive. Such threads cannot respond directly to the exception, because they will not be woken up again in that tick. This situation is detected at the join point: if there is an active exception, all branch threads waiting at this join point are disabled.

For nested traps, the question is which one ought to take priority. The Esterel semantics specifies that outer traps take priority over inner ones. Hence, a simple idea is to let the outer trap, which has the larger address range, override the inner one. Unfortunately, this strategy is too simplistic to cover all cases. Compare the Trap1 and Trap2 examples in Fig. 5. As illustrated there, one must consider not only trap nesting structures, but also the concurrency relationships among the threads involved.

To handle this issue, a thread also keeps track of its parent thread. If the PC resides inside the scope of a trap, the corresponding exception will be active. When several exceptions are active, the KEP compares their parent threads to determine whether they belong to a group of concurrent threads. If they have the same parent thread, the outer trap cancels the inner one, as in the Trap1 example. Otherwise, the Esterel semantics requires responding to the inner one first, as in \texttt{Trap2}. Furthermore, an inheritance strategy is used when a trap crosses several threads. At the join point, if a thread finds an active exception that was emitted by its child thread, it inherits this exception by adopting the parent thread of this exception as its parent thread, \textit{i.e.}, it propagates this exception as if it had been emitted by itself. Once all joining threads have completed within the current tick, control is transferred to the end of the trap scope, unless there is another intermediate \texttt{JOIN} instruction. This process continues until control has reached the thread that has declared the trap.

![Fig. 5. Interaction of concurrency and exception handling.](image-url)
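The pairwise decision between two simultaneously active exceptions can be sketched as below. This is a deliberately simplified Python model: it represents each exception by its trap scope (an address interval) and the parent thread id it carries, and it omits the inheritance step at JOIN instructions. `ActiveExit` and `handle_first` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActiveExit:
    scope_start: int  # first address of the trap scope
    scope_end: int    # last address of the trap scope
    parent: int       # parent thread id recorded for this exception


def contains(outer: ActiveExit, inner: ActiveExit) -> bool:
    return outer.scope_start <= inner.scope_start and inner.scope_end <= outer.scope_end


def handle_first(a: ActiveExit, b: ActiveExit) -> ActiveExit:
    """Same parent thread: the outer trap cancels the inner one (Trap1).
    Different parent threads: respond to the inner trap first (Trap2)."""
    if contains(a, b):
        outer, inner = a, b
    elif contains(b, a):
        outer, inner = b, a
    else:
        raise ValueError("trap scopes are not nested")
    return outer if a.parent == b.parent else inner


outer = ActiveExit(0, 100, parent=0)
print(handle_first(outer, ActiveExit(10, 50, parent=0)))  # the outer trap
print(handle_first(outer, ActiveExit(10, 50, parent=1)))  # the inner trap
```

The sketch shows why address ranges alone are insufficient: the same pair of scopes yields different outcomes depending on the parent-thread comparison.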

5.4 Handling Delays

Delay expressions are used in temporal statements, \textit{e.g.}, \texttt{await} or \texttt{abort}. There are three forms of delay expressions: standard delays, immediate delays, and count delays. A delay may elapse in some later tick, or in the same tick in the case of immediate delays. In the KEP, the await statement is implemented via the \texttt{AWAIT} component. Every thread has its own \texttt{AWAIT}-component to store the parameters of the await-type statement, \textit{e.g.}, the value of count delays. For the preemption statements, every \texttt{Watcher} (including its trimmed-down derivatives) also has an independent counter to handle the delays.
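The three delay forms can be captured by a small per-thread counter. The following Python sketch is an illustration under assumptions, not the KEP's AWAIT component: it is called once per logical tick starting with the tick in which the await is entered, a standard or count delay ignores signal presence in that first tick, and it does not combine `immediate` with a count.

```python
class AwaitCounter:
    """Sketch of 'await S', 'await immediate S', and 'await n S'."""

    def __init__(self, count: int = 1, immediate: bool = False) -> None:
        if count < 1:
            raise ValueError("count must be positive")
        if immediate and count != 1:
            raise ValueError("this sketch does not combine immediate with a count")
        self.remaining = count
        self.immediate = immediate
        self.first_tick = True

    def tick(self, present: bool) -> bool:
        """Advance one tick; returns True once the delay has elapsed."""
        # Presence in the entering tick only counts for immediate delays.
        counts = present and (self.immediate or not self.first_tick)
        self.first_tick = False
        if counts:
            self.remaining -= 1
        return self.remaining <= 0


c = AwaitCounter(count=3)  # await 3 S
print([c.tick(p) for p in (True, True, False, True, True)])
# [False, False, False, False, True]
```

An immediate delay (`AwaitCounter(immediate=True)`) returns True already in the entering tick if the signal is present, which is the "same tick" case mentioned above.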

5.5 The Tick Manager and Energy Saving

One of the distinguishing features of the KEP is the Tick Manager. Given a fixed clock frequency, it autonomously ensures that logical ticks, which for a deployed KEP correspond to one read-inputs/compute-outputs reaction cycle, are computed at a fixed rate. Furthermore, the Tick Manager internally monitors timing violations. The \texttt{strl2kasm} compiler performs a conservative WCRT analysis to determine the tick frequency, so timing violations should never occur [13]. Hence, when using this compiler, timing-violation monitoring can be considered a redundant, self-checking run-time mechanism that enhances robustness.

The Tick Manager is activated by setting the pre-defined valued signal \_TICKLEN to a certain value. This is typically done at the beginning of the program, but may also be done later at run time. The activation of the Tick Manager is optional; if \_TICKLEN is not set to any value, the KEP is in “free running” mode and starts computing the next tick as soon as the current one is finished. This is typically faster on average, but it results in reaction-time jitter, which is often undesirable from a control perspective.

As KEP instruction cycles require a fixed number of clock cycles, providing a value for \_TICKLEN can also remove the need for the environment to provide a timer that starts the ticks at regular intervals. An external timer is only needed if the clock rate is not stable enough for the application, \textit{e.g.}, due to energy-saving frequency scaling.

If a tick is finished in fewer than \_TICKLEN instruction cycles, the KEP idles for the remaining cycles before starting the next tick. If, on the other hand, a tick is not finished within \_TICKLEN cycles, this is considered a \textit{tick length timing violation}. Such timing violations are signaled to the environment via a special signal, TickWarn, with a dedicated output pin; this signal remains present until the next reset of the processor. Furthermore, the self-monitoring makes it easy for the environment to detect any timing violations. The WCRT analysis aims to determine a conservative, yet tight value for \_TICKLEN. How a given value for \_TICKLEN translates to concrete bounds on physical reaction times also depends on the interface with the environment, as described elsewhere [10].

Fig. 6 illustrates the KEP timing behavior for a small example. In \texttt{OVERRUN}, the first \texttt{EMIT} statement sets \_TICKLEN to three; in other words, the module claims that at most three KEP instructions suffice to compute one logical tick. For the input scenario in which signal D is always absent, the KEP produces the timing shown in Fig. 6a. In this example, the program runs on a KEP implemented on a Memec V2MB1000 Development Board with a clock period of $T_{sc} = 41.67\,\text{ns}$; the waveform was recorded by an Agilent 1683A Logic Analyzer. The first logical tick lasts three instruction cycles. In the second tick, the controller has to execute five instructions until the \texttt{AWAIT} statement is executed. Hence, the Tick Manager sets the TickWarn processor pin high in the fourth instruction cycle of that tick to indicate the tick length timing violation.
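The Tick Manager's behavior can be summarized in an instruction-cycle-level model. This Python sketch is a simplification under stated assumptions: the per-tick instruction counts are given as hypothetical input, an overrunning tick runs to completion and the next tick starts immediately afterwards, and TickWarn is reported as the 1-based global instruction cycle in which it first goes high (it then stays high until reset). `run_ticks` is a hypothetical name.

```python
from typing import Optional


def run_ticks(instr_per_tick: list[int], ticklen: Optional[int]):
    """ticklen=None models free-running mode (no _TICKLEN set).

    Returns (start_cycles, idle_cycles, tickwarn_cycle): the number of
    instruction cycles elapsed before each tick starts, the idle cycles
    appended to each tick, and the cycle in which TickWarn first rises.
    """
    cycle = 0
    starts: list[int] = []
    idles: list[int] = []
    tickwarn: Optional[int] = None
    for n in instr_per_tick:
        starts.append(cycle)
        if ticklen is not None and n > ticklen and tickwarn is None:
            # Overrun: TickWarn rises in cycle ticklen+1 of this tick.
            tickwarn = cycle + ticklen + 1
        # Idle (clock-gated) for the rest of the tick if finished early.
        idle = 0 if ticklen is None else max(0, ticklen - n)
        idles.append(idle)
        cycle += n + idle
    return starts, idles, tickwarn


# Fig. 6 scenario: _TICKLEN = 3, ticks needing 3 and 5 instruction cycles
print(run_ticks([3, 5], ticklen=3))  # ([0, 3], [0, 0], 7)
```

Global cycle 7 is the fourth instruction cycle of the second tick, as in Fig. 6a. With `ticklen=None`, ticks start back to back, which is the source of the reaction-time jitter mentioned above.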

In controller programming, the primary purpose of Esterel, control signals tend to be absent more often than present [14]. The condition of all signals being absent is called a \textit{blank event}. Even though Esterel does allow reactions to signal absence to be specified, typically very few instruction cycles are required for executing a blank event (see also Sec. 6.5). The KEP exploits this: when fewer than \_TICKLEN instructions have been executed and no further instructions are needed for the current tick, \textit{i.e.}, all threads are inactive, an \texttt{IDLE} signal is broadcast to gate the clock of other elements for power reduction [32].

6 Validation and Experimental Results

To validate the correctness of the KEP and its compiler and to evaluate its performance, we employ an evaluation platform whose structure is shown in Fig. 7. The user interacts via a host workstation with an FPGA board, which contains the KEP as well as some testing infrastructure. First, an Esterel program is compiled into a KEP object file (.ko), which is uploaded to the FPGA board. Then, the host provides input events to the KEP and reads out the generated output events. This also yields the number of instructions per tick, from which we can deduce the WCRT for the given trace. This measures performance and allows us to validate strl2kasm’s WCRT analysis with respect to its safety (it never underestimates) and accuracy (it overestimates as little as possible). The input events can be either provided by the user interactively, or they can be supplied via a .esi file. The host can also compare the output results to an execution trace (.esd). We use EsterelStudio V5.0 to compute input/output trace files with state and transition coverage, except for the eight but benchmark, for which generating the transition coverage trace took unacceptably long, so we restricted ourselves to state coverage. This comparison to a reference implementation proved a very valuable aid in validating the correctness of both the KEP and its compiler. The regression test suite currently includes well over 400 Esterel programs.
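The two checks performed on each trace, output equivalence and WCRT safety/accuracy, can be sketched as follows. The representation (one set of emitted output signals per tick, plus the measured instruction count per tick) is a hypothetical stand-in for the .esi/.esd files, and `validate` is not part of the actual tool chain. Note that the maximum observed on one trace is only a lower bound on the true WCRT, so "safe" here means "not contradicted by this trace".

```python
from collections.abc import Sequence


def validate(outputs: Sequence[frozenset[str]],
             reference: Sequence[frozenset[str]],
             instr_counts: Sequence[int],
             wcrt_bound: int) -> dict:
    """Compare KEP outputs with a reference trace and check a WCRT bound."""
    if not instr_counts or not (len(outputs) == len(reference) == len(instr_counts)):
        raise ValueError("traces must be non-empty and of equal length")
    mismatches = [i for i, (o, r) in enumerate(zip(outputs, reference)) if o != r]
    observed = max(instr_counts)
    return {
        "mismatches": mismatches,               # ticks with differing outputs
        "observed_max": observed,               # instructions in the longest tick
        "safe": wcrt_bound >= observed,         # bound not underestimated
        "overestimate": wcrt_bound - observed,  # tightness on this trace
    }


r = validate([frozenset({"O"}), frozenset()],
             [frozenset({"O"}), frozenset()], [5, 2], wcrt_bound=6)
print(r)
# {'mismatches': [], 'observed_max': 5, 'safe': True, 'overestimate': 1}
```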

![Fig. 7. The KEP evaluation platform.](image)

6.1 Concurrency Analysis

To evaluate the performance of the KEP, we selected eleven standard test cases from the Estbench suite and other sources [5, 33]. These benchmarks are typical Esterel applications, which not only contain reactive statements, but also include arithmetic and logical data handling. However, we leave out programs that make use of the pre operator, since the CEC currently does not support it [4].

To characterize each benchmark with respect to its use of concurrency constructs, Table 3 lists their counts and maximum nesting depths. For the KEP, the table shows the number of dependencies found, the number of priority levels used (the KEP provides up to 255), and the number of PRIO instructions used. In most cases, the maximum priority used is three or less, indicating relatively few priority changes per tick. For example, eight_buttons has 168 dependencies, but the maximum priority used is 3. On the other hand, greycounter, with 53 dependencies, requires a maximum priority of 6.

![Table 3](image)

<table>
<thead>
<tr>
<th>Module Name</th>
<th>LOC</th>
<th>Threads</th>
<th>Max depth conc.</th>
<th>Code size (words)</th>
<th>Code count priority instr’s</th>
</tr>
</thead>
<tbody>
<tr>
<td>abcd</td>
<td>160</td>
<td>4</td>
<td>2</td>
<td>164</td>
<td>3</td>
</tr>
<tr>
<td>abcddef</td>
<td>236</td>
<td>6</td>
<td>2</td>
<td>244</td>
<td>90</td>
</tr>
<tr>
<td>eight but</td>
<td>312</td>
<td>8</td>
<td>2</td>
<td>324</td>
<td>168</td>
</tr>
<tr>
<td>chan_prot</td>
<td>42</td>
<td>5</td>
<td>3</td>
<td>62</td>
<td>4</td>
</tr>
<tr>
<td>reactorCtrl</td>
<td>27</td>
<td>3</td>
<td>2</td>
<td>34</td>
<td>5</td>
</tr>
<tr>
<td>eight but</td>
<td>31</td>
<td>2</td>
<td>2</td>
<td>22</td>
<td>0</td>
</tr>
<tr>
<td>example</td>
<td>76</td>
<td>13</td>
<td>3</td>
<td>95</td>
<td>0</td>
</tr>
<tr>
<td>ww_button</td>
<td>143</td>
<td>17</td>
<td>3</td>
<td>343</td>
<td>53</td>
</tr>
<tr>
<td>greycounter</td>
<td>355</td>
<td>39</td>
<td>5</td>
<td>379</td>
<td>65</td>
</tr>
<tr>
<td>tcint</td>
<td>870</td>
<td>59</td>
<td>5</td>
<td>8650</td>
<td>129</td>
</tr>
</tbody>
</table>

6.2 Preemption Analysis and Watcher Comparison

Preemption constructs tend to be sequential or concurrent rather than nested. For example, the mca200 employs 64 preemption statements; however, the maximum depth of the preemption nest is just 4. As it turns out, most of the preemptions can be handled by the cheapest Watcher type, the Thread Watcher.

To assess the savings of watcher refinement, we synthesized different variants of the Reactive Core for each benchmark, with and without watcher refinement, see Fig. 8. In most cases, having refined watcher types available reduces resource usage and increases the achievable clock frequency, and these benefits grow with the size of the modules. For the industrial-size mca200 benchmark, refined watchers reduce hardware usage by 36% and raise the maximum frequency, also by 36%. Another benefit of the refined preemption handling architecture is that it keeps performance stable. Without refined watcher types, there is a 40% gap between the highest (112MHz) and the lowest (66MHz) frequency. With refined watchers available, the frequency degrades by only about 19% (from 112MHz to 90MHz).

---

3. [http://www1.cs.columbia.edu/~sedwards/software/estbench-1.0.tar.gz](http://www1.cs.columbia.edu/~sedwards/software/estbench-1.0.tar.gz)

6.3 Memory Usage

Table 4 compares executable code size and RAM usage between the KEP and the MicroBlaze implementations. For the MicroBlaze, we used three different Esterel compilers (V5, V7, and CEC) and compared against the best of these. To relate the size of the KEP code to the Esterel source, we compare the code size in words with the Esterel Lines of Code (LOC, before dismantling, without comments). We notice that the KEP code is very compact, with a word count close to the Esterel LOC. For the comparison with the MicroBlaze, we compare the size of Code + Data, in bytes, and notice that the KEP code is typically an order of magnitude smaller than the MicroBlaze code. Furthermore, the more compact state encoding reduces the data memory requirements. On average, the KEP implementation reduces memory usage (code and RAM size) by 83% compared with the best MicroBlaze implementation. For the mca200, the memory reduction is less dramatic than for the other benchmarks, because the mca200 contains a large amount of data handling, which is not a particular strength of the KEP.

6.4 Execution Times

The improvement in execution time of the KEP implementation is shown in Fig. 9. Compared with the best MicroBlaze implementation, the KEP typically achieves a speedup of more than 4x for the WCRT and of more than 5x for the Average Case Reaction Time (ACRT). For a fair comparison, time is measured based on the system clock. If the comparison were based on instruction cycles, the KEP would achieve a 12x speedup for the WCRT and more than 15x for the ACRT.

The MicroBlaze uses several levels of memory. Here we employed an FPGA chip with a large on-chip memory to implement the MicroBlaze system. Hence, all MicroBlaze programs could be loaded into on-chip memory, which keeps memory access time minimal. This setup favors the MicroBlaze: on an FPGA with a smaller on-chip memory, the MicroBlaze programs might no longer fit on chip, whereas the much smaller KEP programs would still be likely to fit.

6.5 Power Usage

To compare energy consumption, we chose the Xilinx 3S200-4ft256 as the FPGA platform. The chip itself requires 37mW of quiescent power. The MicroBlaze is assumed to run at 50MHz, and its peak power is calculated from the frequency and the hardware resources of the MicroBlaze system via Xilinx WebPower Version 8.1.01. Based on the findings presented in Fig. 9, we calculate for each benchmark the minimal KEP clock frequency that achieves the same WCRT as the corresponding MicroBlaze implementation, and then calculate the peak power of the KEP implementation.
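The frequency-matching step can be written as a short derivation. This is a sketch under two assumptions of ours rather than the stated procedure: $N_{\mathrm{KEP}}$ and $N_{\mathrm{MB}}$ (our notation) denote the clock-cycle counts of the worst-case reaction on each platform, and these counts do not change with the clock frequency, which seems plausible here because all code runs from on-chip memory.

$$
\frac{N_{\mathrm{KEP}}}{f_{\mathrm{KEP}}} \le \frac{N_{\mathrm{MB}}}{f_{\mathrm{MB}}}
\quad\Longleftrightarrow\quad
f_{\mathrm{KEP}} \ge f_{\mathrm{KEP}}^{\min} = f_{\mathrm{MB}} \cdot \frac{N_{\mathrm{KEP}}}{N_{\mathrm{MB}}},
\qquad f_{\mathrm{MB}} = 50\,\mathrm{MHz}.
$$

For a hypothetical benchmark whose worst-case reaction needs a quarter of the MicroBlaze's clock cycles, this gives $f_{\mathrm{KEP}}^{\min} = 12.5\,\mathrm{MHz}$, and the KEP's peak power is then evaluated at that frequency.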

For most blank events, the action of an Esterel module is very simple: it tests the presence of awaited signals and then finishes the tick, because those await statements do not terminate in this tick. For the KEP, since the elapsed instruction cycle count for these actions is far below the assigned tick length, the system switches to the idle state to save power. Although the MicroBlaze has no low-power operating mode that can be used to conserve processor energy (e.g., like the wait state of the PowerPC405), we still assume that it can use some additional circuit to manage its power usage by blocking its clock, in order to provide the fixed tick length feature. Note that the actual tick length for a blank event depends on the program state reached in the previous tick. The average power usage of blank events is estimated using an extended .esi file, which inserts a blank event between every two original ticks. Fig. 10 shows that the KEP reduces energy usage by 75% on average. The reduction becomes even more significant for blank events. The energy consumption of the MicroBlaze system is similar for different events. However, thanks to its reactive architecture, the KEP’s power usage for blank events is 52% lower than its peak power. Hence, in this case, the KEP achieves 86% power savings.
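The effect of idling can be illustrated with a first-order average-power model. This Python sketch is our own simplification, not the WebPower-based methodology behind Fig. 10: it assumes a constant power `p_active` while instructions execute and a constant lower power `p_idle` while the clock is gated, with all numbers hypothetical.

```python
def average_power(active_cycles: list[int], ticklen: int,
                  p_active: float, p_idle: float) -> float:
    """Average power over ticks of fixed length ticklen (instruction cycles).

    active_cycles: cycles each tick actually executes (hypothetical input);
    the remaining ticklen - n cycles of each tick are spent clock-gated.
    """
    if any(n > ticklen for n in active_cycles):
        raise ValueError("tick overrun: this model assumes n <= ticklen")
    total = sum(n * p_active + (ticklen - n) * p_idle for n in active_cycles)
    return total / (ticklen * len(active_cycles))


# Hypothetical: one full tick followed by a blank event needing 2 of 10 cycles
print(average_power([10, 2], ticklen=10, p_active=100.0, p_idle=40.0))  # 76.0
```

Inserting blank events, as the extended .esi file does, pulls the average toward `p_idle` for a processor that can gate its clock, whereas one that cannot stays close to its peak power regardless of the event.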

7 Conclusions & Outlook

We have presented the KEP, a multi-threaded processor, which allows the efficient, predictable execution of concurrent Esterel programs. The multi-threaded approach poses specific compilation challenges, in particular in terms of scheduling, and we have presented an analysis of the task at hand as well as an implemented solution. As the experimental comparison with a 32-bit commercial RISC processor indicates, the approach presented here has advantages in terms of memory use, execution speed, and energy consumption. An Esterel description of the KEP is available as open source.

Accurately capturing the Esterel semantics in a reactive processing setting is not trivial, as earlier (failed) attempts have shown. So far, we rely on extensive experimental validation to ensure correctness, as described in Sec. 6; a formal treatment of the reactive processing approach is underway.

It would be interesting to implement a virtual machine with an instruction set similar to that of the KEP; see also the recent proposal by Plummer et al. Furthermore, the underlying model of computation, with threads keeping their individual program counters and priority-based scheduling, can also be emulated on classical processors. In fact, the recent SyncCharts in C (SC) proposal defines a set of operators for reactive control flow that follow the KEP’s execution model quite closely. The main differences are that SC does not support traps (a simplification of SyncCharts compared to Esterel) and that the thread id/priority mechanism is reduced to ids only, which also serve as priorities. Interestingly, the current reference implementation for SC can take advantage of machine instructions for arithmetic operations that are typically not available at the program level.

Specifically, on the x86, the scheduler uses a bit vector to represent active threads and the bsr (Bit Scan Reverse) assembler instruction to determine the active thread with the highest id.
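The bit-vector idea can be mimicked in Python, where `int.bit_length()` plays the role of a bit scan reverse. This is an illustrative sketch only, not the SC reference implementation; the helper names are hypothetical.

```python
def activate(mask: int, tid: int) -> int:
    """Mark thread tid as active (set bit tid)."""
    return mask | (1 << tid)


def deactivate(mask: int, tid: int) -> int:
    """Mark thread tid as inactive (clear bit tid)."""
    return mask & ~(1 << tid)


def highest_active(mask: int) -> int:
    """Index of the most significant set bit, like x86 bsr; -1 if none."""
    return mask.bit_length() - 1


mask = activate(activate(0, 2), 5)      # threads 2 and 5 active
print(highest_active(mask))             # 5
print(highest_active(deactivate(mask, 5)))  # 2
```

Since ids double as priorities in SC, finding the highest set bit directly yields the next thread to run, which is why a single bsr instruction suffices for the scheduling decision.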

The SC operators are directly embedded in regular C; no prior compilation or OS support is necessary. Hence, in a way, SC uses the ISA of a traditional processor in a reactive processing manner. At least in terms of code compactness, this approach could be competitive with the custom reactive processing approach; to what extent it can match reactive processing in terms of performance, power consumption, and predictability is still open for investigation.

Acknowledgments

This work has benefited from discussions with many people, in particular Michael Mendler and Stephen Edwards. We would also like to thank Marian Boldt for developing the strl2kasm compiler, Özgün Bayramoglu and Hauke Fuhrmann for the KIELER visualization, and the reviewers for providing very valuable feedback on this manuscript.

References


**Xin Li** obtained his B.S. in Measurement Technology and Instrument Design in 1996 and his M.Sc. in Mechanical and Electronic Engineering in 2000, both from Wuhan University of Technology, P.R. China. Until 2003, he was a lecturer in the School of Mechanical and Electronic Engineering at Wuhan. He completed his Ph.D. at the Dept. of Computer Science of Christian-Albrechts-Universität Kiel in 2007. He is now with the Laboratory for Advanced Research in Computing Technology and Compilers in the Dept. of Electrical and Computer Engineering at the University of Minnesota. His current research interests focus on embedded system design and reactive processing. He holds two patents.

**Reinhard von Hanxleden** studied Computer Science and Physics at the Christian-Albrechts-Universität (CAU) Kiel from 1985 to 1988 and completed his M.Sc. in CS at The Pennsylvania State University in 1989. He performed his doctoral studies at the CS Dept./Center for Research on Parallel Computation at Rice University in the field of data-parallel compilation until 1994, with the Ph.D. conferred in 1995. He subsequently joined Daimler Chrysler research, until 2000 with the Responsive Systems in Berlin, afterwards with Airbus in Hamburg and Toulouse. He joined the CAU CS faculty in 2001 as head of the real-time/embedded systems group. His interests include model-based design, concurrency, and synchronous programming. He is currently involved in the IEEE standardization of the Esterel language and is a member of ACM, IEEE, and GI.