ARTEMIS

Partners:

Delft University of Technology, Computer Engineering (Prof. dr. S. Vassiliadis)
University of Amsterdam, Computer Systems Architecture (Dr. A.D. Pimentel)
Leiden University, Leiden Institute of Advanced Computer Science (Prof. dr. E.F. Deprettere)
Philips Research Laboratories Eindhoven (Dr. P. van der Wolf and Dr. J. van Eijndhoven)

The Artemis workbench

The Artemis workbench consists of a set of methods and tools, conceived and integrated in a framework, that allow designers to model applications and SoC-based (multiprocessor) architectures at a high level of abstraction, to map the former onto the latter, and to estimate performance numbers through co-simulation of application and architecture models. Figure 1 depicts the flow of operation for the Artemis workbench, where the grey parts refer to the various tool-sets that together embody the workbench. The point of departure is an application domain (multimedia applications, in the case of Artemis), an experimental domain-specific platform architecture, and a domain-specific application specified as an executable sequential program. The platform architecture is instantiated in the architecture model layer of the workbench, while the application specification is converted to a functionally equivalent concurrent specification using a translator called Compaan. More specifically, Compaan transforms the sequential application specification into a Kahn Process Network (KPN). Between the application and architecture layers sits a mapping layer. This mapping layer provides the means to perform quantitative performance analysis at multiple levels of abstraction and to refine application specification components between levels of abstraction. Such refinement is required to match application specifications to the level of detail of the underlying architecture models. Effectively, the mapping layer bridges the gap between the application and architecture models, a gap sometimes referred to as the implementation gap.

Figure 1

Because Artemis operates at a high level of abstraction, performance numbers from low-level component implementations are used to calibrate its system-level architecture models. To this end, individual processes (i.e., code segments) of a KPN application specification are taken apart and implemented as individual low-level components (components that appear in the current high-level instance of the platform architecture). This yields performance numbers, as well as estimates of cost and power consumption, for these low-level components, which the system-level modeling framework needs in order to provide accurate performance estimates for the multiprocessor system architecture as a whole. For this calibration process, the Artemis workbench uses the Laura tool-set and the Molen calibration platform architecture.
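The calibration idea can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a system-level processor model charges a fixed number of cycles per coarse-grained `execute` operation and that low-level implementations report measured cycle counts per invocation. The operation names, numbers, and the averaging rule below are hypothetical and are not taken from Artemis.

```python
from statistics import mean

# Hypothetical initial (uncalibrated) latency estimates, in cycles, that a
# high-level processor model charges per coarse-grained operation.
initial_latencies = {"DCT": 1000, "Quant": 300, "VLE": 500}

# Hypothetical cycle counts measured on low-level implementations of
# individual KPN processes (one entry per measured invocation).
measurements = {
    "DCT": [412, 408, 410],
    "Quant": [95, 97],
}


def calibrate(latencies, measured):
    """Return a new latency table: operations with measurements use the mean
    measured value, all other operations keep their original estimate."""
    calibrated = dict(latencies)
    for op, samples in measured.items():
        if samples:
            calibrated[op] = mean(samples)
    return calibrated


def estimate(latencies, trace):
    """Estimate the total cycle count of a trace of execute(op) events."""
    return sum(latencies[op] for op in trace)


trace = ["DCT", "Quant", "VLE", "DCT", "Quant", "VLE"]
before = estimate(initial_latencies, trace)
after = estimate(calibrate(initial_latencies, measurements), trace)
print(before, after)  # prints 3600 2012
```

Operations without low-level measurements (here `VLE`) simply keep their initial estimate, so calibration can proceed one component at a time.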

The Compaan and Laura tool-sets

Today, traditional imperative languages like C, C++ or Matlab remain dominant for implementing applications for SoC-based (platform) architectures. It is, however, very difficult to map these imperative implementations, which typically have a sequential model of computation, onto multiprocessor SoC architectures that allow for exploiting task-level parallelism in applications. In contrast, models of computation that inherently express task-level parallelism in applications and make communications explicit, such as CSP and Process Networks, allow for easier mapping onto multiprocessor SoC architectures. However, specifying applications using these models of computation usually requires more implementation effort than sequential imperative solutions.

In Artemis, we start from a sequential imperative application specification (more specifically, an application written in a subset of Matlab), which is then automatically converted into a Kahn Process Network (KPN) using the Compaan tool-set. This conversion is fast and correct by construction. In the KPN model of computation, parallel processes communicate with each other via unbounded FIFO channels. Reading from channels is blocking, while writing to channels is non-blocking. We chose KPNs for application specifications because they fit the targeted media-processing application domain well and because they are deterministic. The latter implies that the same application input always results in the same application output, irrespective of the scheduling of the KPN processes. As discussed later on, this gives us considerable scheduling freedom when mapping KPN processes onto SoC architecture models for quantitative performance analysis.
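The KPN semantics described above can be sketched in Python, with threads standing in for Kahn processes and `queue.Queue` (unbounded by default) standing in for FIFO channels. The three-process network and its function names are hypothetical. Random delays perturb the scheduling, yet because every read blocks and no process can test a channel for data, each run produces the same output stream.

```python
import queue
import random
import threading
import time


class Channel:
    """Unbounded FIFO channel: writes never block, reads block until data arrives."""

    def __init__(self):
        self._q = queue.Queue()  # maxsize=0 (the default) means unbounded

    def write(self, token):
        self._q.put(token)  # never blocks on an unbounded queue

    def read(self):
        return self._q.get()  # blocks while the channel is empty


def jitter():
    """Random delay that perturbs how the threads are scheduled."""
    time.sleep(random.random() * 0.001)


def producer(outputs, n):
    # Emits 0..n-1 and copies every token to each output channel.
    for i in range(n):
        jitter()
        for ch in outputs:
            ch.write(i)


def scale(inp, out, n):
    # Multiplies each incoming token by 3.
    for _ in range(n):
        x = inp.read()
        jitter()
        out.write(3 * x)


def adder(a, b, out, n):
    # Always reads a, then b: a Kahn process may not poll a channel for data.
    for _ in range(n):
        out.write(a.read() + b.read())


def run_network(n=20):
    src_a, src_b, scaled, result = (Channel() for _ in range(4))
    threads = [
        threading.Thread(target=producer, args=([src_a, src_b], n)),
        threading.Thread(target=scale, args=(src_a, scaled, n)),
        threading.Thread(target=adder, args=(scaled, src_b, result, n)),
    ]
    for t in threads:
        t.start()
    output = [result.read() for _ in range(n)]
    for t in threads:
        t.join()
    return output


# Different interleavings, identical output streams.
runs = [run_network() for _ in range(5)]
print(runs[0][:5])  # prints [0, 4, 8, 12, 16]
```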

The infrastructure of the Compaan tool-set is illustrated on the left-hand side of Figure 2. The grey parts refer to the separate tools that are part of Compaan, while the white parts refer to the (intermediate) formats of the application specification. The starting point is an application specification in Matlab, which must be written as a parameterized static nested loop program. Recently, Compaan's scope has been extended to also include weakly dynamic nested loop programs, which allow data-dependent behavior to be specified. Various source-level transformations can be applied to these Matlab application specifications, for example to increase or decrease the amount of parallelism in the final KPN. Next, the Matlab code is transformed into single assignment code (SAC), which resembles the dependence graph (DG) of the original nested loop program. The SAC is then converted to a Polyhedral Reduced Dependency Graph (PRDG) data structure, a compact mathematical representation of a DG in terms of polyhedra. Finally, the PRDG is converted into a KPN by associating a KPN process with each node in the PRDG. The parallel KPN processes communicate with each other according to the data dependencies given in the DG.
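To make the SAC and dependence-graph steps concrete, the toy sketch below uses Python in place of Matlab. A sequential loop that overwrites `s` in every iteration is rewritten so that each value gets its own indexed name and is written exactly once. The iteration-level flow dependences are then enumerated and grouped per statement pair. The statement labels `S1`/`S2` are hypothetical, and where Compaan would represent these iteration sets compactly as polyhedra, this sketch simply lists them.

```python
from collections import defaultdict

N = 4


def sequential(a):
    """Original sequential program: s is overwritten in every iteration."""
    s = 0
    for i in range(N):
        x = a[i] * a[i]  # statement S1
        s = s + x        # statement S2
    return s


def single_assignment(a):
    """Single assignment form: every value has an indexed name written once."""
    x, s = {}, {-1: 0}  # s[-1] holds the initial value
    for i in range(N):
        x[i] = a[i] * a[i]      # S1(i)
        s[i] = s[i - 1] + x[i]  # S2(i)
    return s[N - 1]


def dependence_graph():
    """Enumerate the iteration-level flow dependences of the SAC program."""
    edges = []
    for i in range(N):
        edges.append((("S1", i), ("S2", i)))          # S2(i) reads x[i]
        if i > 0:
            edges.append((("S2", i - 1), ("S2", i)))  # S2(i) reads s[i-1]
    return edges


def reduce_graph(edges):
    """Group dependences per statement pair: each statement becomes one node
    (a candidate KPN process), and each node pair records which iterations
    it connects."""
    reduced = defaultdict(list)
    for (src, i), (dst, j) in edges:
        reduced[(src, dst)].append((i, j))
    return dict(reduced)


graph = reduce_graph(dependence_graph())
print(graph)
```

In the resulting graph, the `S1 -> S2` edge would become a channel between two processes, while `S2 -> S2` is a self-dependence that stays local to one process.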

Figure 2

The Laura tool-set, depicted on the right-hand side of Figure 2, takes a KPN as input and produces synthesizable VHDL code that implements the application specified by the KPN for a specific FPGA platform. To this end, the KPN specification is first converted into a functionally equivalent network of conceptual processors, called the hardware model. This hardware model, which is platform-independent because no information on the target FPGA platform is incorporated, defines the key components of the architecture and their attributes. It also defines the semantic model, i.e., how the various components interact with each other. Subsequently, platform-specific information is added to the hardware model. This includes adding IP cores that implement certain functions of the original application, as well as setting component attributes such as bit-widths and buffer sizes. In the final step, the hardware model is converted into VHDL. To do so, Laura supplies, for each component in the hardware model, a piece of VHDL code that expresses how that component is represented in the target architecture. Using commercial tools, the VHDL code can then be synthesized and mapped onto an FPGA. As can be seen in Figure 2, the results from this automated implementation trajectory can be fed back to Compaan to explore different transformations that will, in the end, lead to different implementations.
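The three Laura steps (platform-independent hardware model, platform-specific annotation, per-component VHDL generation) can be sketched as follows. Everything here is hypothetical: the class names, the IP-core library, and the VHDL template are illustrations, not Laura's actual data structures or output. For brevity, only the FIFO channels are emitted as VHDL; processors would need their own templates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HwProcessor:
    """Conceptual processor executing one KPN process."""
    name: str
    function: str
    ip_core: Optional[str] = None  # filled in by the platform-specific step


@dataclass
class HwFifo:
    """Channel between two conceptual processors."""
    src: str
    dst: str
    width: Optional[int] = None  # bit-width (platform specific)
    depth: Optional[int] = None  # buffer size (platform specific)


def kpn_to_hw_model(processes, channels):
    """Platform-independent step: one processor per process, one FIFO per channel."""
    return ([HwProcessor(name, fn) for name, fn in processes],
            [HwFifo(src, dst) for src, dst in channels])


def add_platform_info(procs, fifos, ip_library, width, depth):
    """Platform-specific step: bind IP cores, set bit-widths and buffer sizes."""
    for p in procs:
        p.ip_core = ip_library.get(p.function)  # None if no IP core exists
    for f in fifos:
        f.width, f.depth = width, depth


# Hypothetical VHDL template for a FIFO component (entity declaration only).
FIFO_TEMPLATE = """library ieee;
use ieee.std_logic_1164.all;
entity fifo_{src}_{dst} is
  generic (WIDTH : natural := {width}; DEPTH : natural := {depth});
  port (clk, rst, wr, rd : in std_logic;
        din  : in  std_logic_vector(WIDTH-1 downto 0);
        dout : out std_logic_vector(WIDTH-1 downto 0);
        full, empty : out std_logic);
end entity fifo_{src}_{dst};"""


def emit_vhdl(fifos):
    """Final step: instantiate the template for every FIFO component."""
    return [FIFO_TEMPLATE.format(src=f.src, dst=f.dst, width=f.width, depth=f.depth)
            for f in fifos]


procs, fifos = kpn_to_hw_model([("P1", "dct"), ("P2", "quant")], [("P1", "P2")])
add_platform_info(procs, fifos, {"dct": "dct_core"}, width=16, depth=8)
vhdl = emit_vhdl(fifos)
print(vhdl[0])
```

Keeping the platform-specific attributes optional until the second step mirrors the separation described above: the same hardware model can be annotated differently for different FPGA platforms.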

The Molen calibration platform

Figure 3

Figure 3 depicts the platform architecture that is used for component calibration in Artemis. This platform architecture, called Molen, connects a programmable processor with a reconfigurable unit and uses microcode to incorporate architectural support for the reconfigurable unit. Instructions are fetched from memory, after which the arbiter partially decodes them to determine where they should be issued. Instructions that are implemented in fixed hardware are issued to the core processing (CP) unit, which in the Molen prototype implementation is one of the PowerPCs of a Xilinx Virtex II Pro platform, while instructions for custom execution are redirected to the reconfigurable unit. The instructions entering the CP unit are further decoded and then issued to their corresponding functional units.
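The arbiter's routing decision can be sketched in a few lines of Python. The textual opcodes and the instruction tuples are hypothetical stand-ins for Molen's binary encoding; the point is only that a partial decode (here, looking at the opcode alone) suffices to choose between the core processor and the reconfigurable unit, with full decoding left to the chosen unit.

```python
# Hypothetical opcodes of the instructions handled by the reconfigurable unit.
RU_OPCODES = {"p-set", "c-set", "execute"}


def arbiter(program):
    """Partially decode each fetched instruction (opcode only) and yield
    (unit, instruction) pairs in program order: "RU" for the reconfigurable
    unit, "CP" for the core processor."""
    for instr in program:
        opcode = instr[0]  # partial decode: only the opcode is inspected
        unit = "RU" if opcode in RU_OPCODES else "CP"
        yield unit, instr


program = [
    ("add", "r1", "r2", "r3"),
    ("p-set", 0x1000),    # operand: address of reconfiguration microcode
    ("c-set", 0x2000),
    ("execute", 0x3000),  # operand: address of execution microcode
    ("lw", "r4", "0(r1)"),
]
for unit, instr in arbiter(program):
    print(unit, instr[0])
```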

The reconfigurable unit consists of a custom configured unit (CCU), currently implemented by the Xilinx Virtex II Pro FPGA, and a reconfigurable micro-code unit. The reconfigurable unit performs operations that can be as simple as an instruction or as complex as a piece of code describing a certain function. Molen divides an operation into two distinct phases: set and execute. The set phase is responsible for reconfiguring the CCU hardware, enabling the execution of an operation. This phase may be subdivided into two sub-phases, namely partial-set (p-set) and complete-set (c-set). The p-set sub-phase covers the functions common to an application or a set of applications. Subsequently, the c-set sub-phase reconfigures only those blocks in the CCU that are not covered in the p-set sub-phase, in order to complete the functionality of the CCU.

To perform the actual reconfiguration of the CCU, reconfiguration microcode is loaded into the reconfigurable micro-code unit and then executed (using p-set and c-set instructions). Afterwards, the execute phase is responsible for executing the operation on the CCU, which is done by executing the execution microcode. Important in this respect is that neither the set nor the execute phase explicitly specifies the operation to be performed. Instead, the p-set, c-set and execute instructions point to the memory location where the reconfiguration or execution microcode is stored.
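The set/execute protocol can be modeled with a toy Python class. In this sketch, which rests on several assumptions, "microcode" is represented as data stored at hypothetical memory addresses: reconfiguration microcode is a list of hardware block names, and execution microcode is a pair of required blocks plus a Python callable standing in for the configured hardware. As in the text, the instructions carry only an address, and c-set configures only the blocks that p-set did not already provide.

```python
class CCU:
    """Toy custom configured unit holding a set of configured hardware blocks."""

    def __init__(self, memory):
        self.memory = memory    # address -> microcode
        self.blocks = set()     # currently configured blocks
        self.reconfigured = []  # log of blocks that were actually configured

    def _configure(self, wanted):
        for block in wanted:
            if block not in self.blocks:  # skip blocks that are already present
                self.blocks.add(block)
                self.reconfigured.append(block)

    def p_set(self, addr):
        # Partial set: configure blocks common to an application (or set of them).
        self._configure(self.memory[addr])

    def c_set(self, addr):
        # Complete set: configure only what the p-set did not already cover.
        self._configure(self.memory[addr])

    def execute(self, addr, *args):
        # The instruction names no operation, only the microcode's address.
        needed, op = self.memory[addr]
        missing = set(needed) - self.blocks
        if missing:
            raise RuntimeError(f"CCU not configured for blocks: {sorted(missing)}")
        return op(*args)


memory = {
    0x100: ["adder_tree", "line_buffer"],     # p-set microcode (common blocks)
    0x200: ["line_buffer", "dct_butterfly"],  # c-set microcode
    0x300: (["adder_tree", "dct_butterfly"], lambda xs: sum(xs)),  # execution
}
ccu = CCU(memory)
ccu.p_set(0x100)
ccu.c_set(0x200)  # line_buffer is already configured and is not reloaded
result = ccu.execute(0x300, [1, 2, 3])
print(result, ccu.reconfigured)
```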

The Compaan and Laura tool-sets, in combination with the Molen platform architecture, are well suited to the calibration of system-level architecture models discussed earlier. For this purpose, Laura maps a specific component of an application specification to a hardware implementation by converting the Compaan-generated KPN associated with that application component into a VHDL implementation. This VHDL code is subsequently used as reconfiguration microcode for Molen's CCU, while the remainder of the application specification (i.e., the code that has not been synthesized to a hardware implementation) is executed on Molen's core processor. As a result, the application component mapped onto the CCU provides low-level implementation numbers that can be used to calibrate the corresponding component in the system-level architecture model.

The Sesame modeling and simulation environment

Artemis' system-level modeling and simulation environment, called Sesame, facilitates performance analysis of embedded systems architectures according to the increasingly popular Y-chart design approach. This means that a system simulation contains separate application and architecture models. An application model describes the functional behavior of an application, including both computation and communication behavior. An architecture model defines architecture resources and captures their performance constraints. After an application model has been explicitly mapped onto an architecture model, the two are co-simulated via trace-driven simulation. This allows for evaluation of the system performance of a particular application, mapping, and underlying architecture. Essential in this modeling methodology is that an application model is independent of architectural specifics, assumptions on hardware/software partitioning, and timing characteristics. As a result, a single application model can be used to exercise different hardware/software partitionings and can be mapped onto a range of architecture models, possibly representing different system architectures or simply modeling the same system architecture at various levels of abstraction. The layered infrastructure of Sesame is shown in Figure 4.

Figure 4

For application modeling, Sesame uses KPN application specifications that are either generated by the Compaan tool-set or derived by hand. The computational behavior of an application is captured by instrumenting the code of each Kahn process with annotations that describe the application's computational actions. Reading from and writing to Kahn channels represents the communication behavior of a process within the application model. When the Kahn model is executed, each process records its actions to generate its own trace of application events, which is necessary for driving an architecture model. These application events are typically coarse-grained, such as "execute(DCT)" or "read(pixel-block,channel_id)".
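A minimal Python sketch of such instrumentation is shown below. The `KahnProcess` base class, the event tuples, and the two process functions are hypothetical and are not Sesame's actual API. Each read, write, or `execute` annotation appends an untimed event to the process's own trace. For simplicity the producer runs to completion before the consumer starts, which is one valid KPN schedule and, by determinism, yields the same data as any other.

```python
from collections import deque


class TracedChannel:
    """FIFO channel identified by name in the generated traces."""

    def __init__(self, name):
        self.name = name
        self.fifo = deque()


class KahnProcess:
    """Base class that records an application event for every action."""

    def __init__(self, name):
        self.name = name
        self.trace = []

    def read(self, ch):
        self.trace.append(("read", ch.name))
        return ch.fifo.popleft()

    def write(self, ch, token):
        self.trace.append(("write", ch.name))
        ch.fifo.append(token)

    def execute(self, op):
        # Annotation marking a computational action; no timing is attached.
        self.trace.append(("execute", op))


def dct_process(p, out, blocks):
    for b in blocks:
        coeffs = [2 * v for v in b]  # stand-in for a real DCT
        p.execute("DCT")
        p.write(out, coeffs)


def quant_process(p, inp, n):
    result = []
    for _ in range(n):
        coeffs = p.read(inp)
        p.execute("Quant")
        result.append([c // 4 for c in coeffs])  # stand-in quantizer
    return result


blocks = [[1, 2], [3, 4]]
ch = TracedChannel("ch0")
dct, quant = KahnProcess("DCT"), KahnProcess("Quant")
dct_process(dct, ch, blocks)  # producer first: a valid KPN schedule
output = quant_process(quant, ch, len(blocks))
print(dct.trace)
print(quant.trace)
```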

Architecture models in Sesame, which typically operate at the so-called transaction level, simulate the performance consequences of the computation and communication events generated by an application model. These architecture models solely account for architectural performance constraints and do not need to model functional behavior. This is possible because the functional behavior is already captured in the application models, which subsequently drive the architecture simulation. An architecture model is constructed from generic building blocks provided by a library, which contains template performance models for processing cores, communication media (like busses) and various types of memory.
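As a rough illustration of a trace-driven, transaction-level architecture model, the sketch below defines two hypothetical "library" templates: a processor model that charges a latency per `execute` event, and a shared bus model that charges a fixed number of cycles per read or write transaction. No data is moved; only simulated time advances. The latency values are invented, and the sketch drives a single processor, so it does not attempt to model contention between several processors on the bus.

```python
class BusModel:
    """Template performance model of a shared bus (no data is transferred)."""

    def __init__(self, cycles_per_transfer):
        self.cycles_per_transfer = cycles_per_transfer
        self.free_at = 0  # time at which the bus becomes idle

    def transfer(self, now):
        start = max(now, self.free_at)  # wait if the bus is still busy
        self.free_at = start + self.cycles_per_transfer
        return self.free_at


class ProcessorModel:
    """Template performance model of a processing core."""

    def __init__(self, name, exec_latency, bus):
        self.name = name
        self.exec_latency = exec_latency  # cycles per execute(op) event
        self.bus = bus
        self.now = 0

    def consume(self, trace):
        """Advance simulated time according to a trace of application events."""
        for kind, arg in trace:
            if kind == "execute":
                self.now += self.exec_latency[arg]
            else:  # read or write: one bus transaction
                self.now = self.bus.transfer(self.now)
        return self.now


bus = BusModel(cycles_per_transfer=10)
cpu = ProcessorModel("cpu0", {"DCT": 100, "Quant": 20}, bus)
trace = [("execute", "DCT"), ("write", "ch0"), ("execute", "DCT"), ("write", "ch0")]
total = cpu.consume(trace)
print(total)  # prints 220
```

Because the model only interprets event kinds and names, the same trace could be replayed on a differently parameterized processor or bus without rerunning the application model.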

To map Kahn processes (i.e., their event traces) from an application model onto architecture model components and to support the scheduling of application events from different event traces when multiple Kahn processes are mapped onto a single architecture component, Sesame provides an intermediate mapping layer. This layer also facilitates the gradual refinement of the system-level architecture (performance) models. To this end, the mapping layer bridges the abstraction gap between application and architecture models by applying dataflow actors that transform coarse-grained application events into finer grained architecture events driving the architecture model components. This event refinement technique allows for architectural exploration at different levels of abstraction while maintaining high-level and architecture independent application models.
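The two roles of the mapping layer can be sketched together in Python. First, traces of several Kahn processes mapped onto one component are interleaved. Here a round-robin policy is assumed that dispatches a read only after a matching write, emulating blocking reads. Second, each coarse-grained event is refined into finer-grained events. The refinement table is illustrative (a check/transfer/signal handshake), the scheduling policy is an assumption, and plain Python functions stand in for the dataflow actors that Sesame actually uses.

```python
from collections import defaultdict

# Illustrative refinement of coarse-grained events into finer-grained ones.
REFINEMENT = {
    "read": ["check_data", "load", "signal_room"],
    "write": ["check_room", "store", "signal_data"],
    "execute": ["execute"],
}


def schedule(traces):
    """Round-robin interleaving of traces mapped onto one component; a read is
    dispatched only when its channel holds a token written earlier."""
    pos = {name: 0 for name in traces}
    tokens = defaultdict(int)
    order = []
    while any(pos[n] < len(t) for n, t in traces.items()):
        progressed = False
        for name, trace in traces.items():
            if pos[name] == len(trace):
                continue  # this trace is finished
            kind, arg = trace[pos[name]]
            if kind == "read" and tokens[arg] == 0:
                continue  # blocked: wait for a matching write
            if kind == "read":
                tokens[arg] -= 1
            elif kind == "write":
                tokens[arg] += 1
            order.append((name, kind, arg))
            pos[name] += 1
            progressed = True
        if not progressed:
            raise RuntimeError("deadlock: all remaining traces are blocked")
    return order


def refine(order):
    """Replace every coarse-grained event by its finer-grained events."""
    return [(name, (fine, arg))
            for name, kind, arg in order
            for fine in REFINEMENT[kind]]


traces = {
    "P1": [("execute", "DCT"), ("write", "c")],
    "P2": [("read", "c"), ("execute", "Quant")],
}
order = schedule(traces)
refined = refine(order)
for event in refined:
    print(event)
```

In this example, P2's read is held back in the first round until P1 has written to channel `c`, so the refined event stream never loads data before it has been stored.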

More information

For more information on Molen, see http://ce.et.tudelft.nl/MOLEN/
For more information on Compaan/Laura, see http://www.liacs.nl/~cserc/compaan/ and http://www.liacs.nl/~kienhuis/
For more information on Sesame, see http://sesamesim.sourceforge.net/ and http://www.science.uva.nl/~andy/publications.html

Contact information:

Contact Person: A.D. Pimentel
Phone: (+31) 20 5257578

Contact Person: Prof. dr. S. Vassiliadis
Phone: (+31) 15 2787146
Faculty of Electrical Engineering,
Mathematics, and Computer Science

Delft University of Technology
Computer Engineering Laboratory
Mekelweg 4 (15th floor)
2628 CD Delft
P.O. Box 5031
2600 GA Delft
The Netherlands
Phone: (+31) 15 2786196
Fax: (+31) 15 2784898


Last updated: June 10, 2005