Partners:
Delft University of Technology, Computer Engineering (Prof. dr. S. Vassiliadis)
University of Amsterdam, Computer Systems Architecture (Dr. A.D. Pimentel)
Leiden University, Leiden Institute of Advanced Computer Science (Prof. dr. E.F. Deprettere)
Philips Research Laboratories Eindhoven (Dr. P. van der Wolf and Dr. J. van Eijndhoven)
The Artemis workbench
The Artemis workbench consists of a set of methods and tools, conceived and integrated in a single framework, that allow designers to model applications and SoC-based (multiprocessor) architectures at a high level of abstraction, to map the former onto the latter, and to estimate performance numbers through co-simulation of application and architecture models. Figure 1 depicts the flow of operation for the Artemis workbench, where the grey parts refer to the various tool-sets that together embody the workbench. The point of departure is an application domain (multimedia applications, in the case of Artemis), an experimental domain-specific platform architecture, and a domain-specific application specified as an executable sequential program. The platform architecture is instantiated in the architecture model layer of the workbench, while the application specification is converted to a functionally equivalent concurrent specification using a translator called Compaan. More specifically, Compaan transforms the sequential application specification into a Kahn Process Network (KPN). Between the application and architecture layers sits a mapping layer. This mapping layer provides the means to perform quantitative performance analysis at different levels of abstraction, and to refine application specification components between levels of abstraction. Such refinement is required to match application specifications to the level of detail of the underlying architecture models. Effectively, the mapping layer bridges the gap between the application and architecture (models), sometimes referred to as the implementation gap.
Figure 1
Because Artemis operates at a high level of abstraction, low-level component performance numbers can be used to calibrate the system-level architecture models. To this end, individual processes (i.e., code segments) of a KPN application specification are taken apart and implemented as individual low-level components (that appear in the current high-level instance of the platform architecture). This results in performance numbers, as well as estimates of cost and power consumption, for the low-level components. The system-level modeling framework needs these numbers to provide accurate performance estimates for the multi-processor system architecture as a whole. For this calibration process, the Artemis workbench uses the Laura tool-set and the Molen calibration platform architecture.
The Compaan and Laura tool-sets
Today, traditional
imperative languages like C, C++ or Matlab are still dominant
with respect to implementing applications for SoC-based (platform)
architectures. It is, however, very difficult to map these
imperative implementations, with typically a sequential model
of computation, onto multi-processor SoC architectures that
allow for exploiting task-level parallelism in applications. In
contrast, models of computation that inherently express task-level
parallelism in applications and make communications explicit,
such as CSP and Process Networks, allow for easier mapping onto multi-processor
SoC architectures. However, specifying applications using these
models of computation usually requires more implementation effort
in comparison to sequential imperative solutions.
In Artemis, we start from a sequential imperative application
specification (more specifically, an application written in
a subset of Matlab), which is then automatically converted
into a Kahn Process Network (KPN) using the Compaan tool-set.
This conversion is fast and correct by construction. In the KPN
model of computation, parallel processes communicate with each
other via unbounded FIFO channels. Reading from channels is done
in a blocking manner, while writing to channels is non-blocking.
We decided to use KPNs for application specifications because
they nicely fit with the targeted media-processing application domain
and they are deterministic. The latter implies that the same application
input always results in the same application output, irrespective
of the scheduling of the KPN processes. This provides us with a
lot of scheduling freedom when, as will be discussed later on,
mapping KPN processes onto SoC architecture models for quantitative
performance analysis.
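The KPN semantics described above can be illustrated with a minimal sketch, assuming Python threads as processes and unbounded `queue.Queue` objects as FIFO channels. The process names, the `None` end-of-stream marker, and the `run_network` helper are hypothetical choices made for this illustration; they are not part of Compaan or Artemis.

```python
import queue
import threading


def producer(out_ch, n):
    """Kahn process that emits the tokens 0..n-1."""
    for i in range(n):
        out_ch.put(i)      # non-blocking write: the channel is unbounded
    out_ch.put(None)       # end-of-stream marker (assumption of this sketch)


def square(in_ch, out_ch):
    """Kahn process that squares every token it reads."""
    while True:
        token = in_ch.get()    # blocking read: waits until a token arrives
        if token is None:
            out_ch.put(None)
            return
        out_ch.put(token * token)


def run_network(n):
    """Run producer -> square and collect the output stream."""
    a, b = queue.Queue(), queue.Queue()   # maxsize=0 means unbounded
    threads = [
        threading.Thread(target=producer, args=(a, n)),
        threading.Thread(target=square, args=(a, b)),
    ]
    for t in threads:
        t.start()
    results = []
    while True:
        token = b.get()
        if token is None:
            break
        results.append(token)
    for t in threads:
        t.join()
    return results
```

Because each process only communicates through blocking reads on FIFO channels, `run_network(5)` yields the same output stream however the operating system interleaves the two threads.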
The infrastructure of the Compaan
tool-set is illustrated on the left-hand side of Figure
2. The grey parts refer to the separate tools that are part
of Compaan, while the white parts refer to the (intermediate) formats
of the application specification. The starting point is an application
specification in Matlab, which must be written as a parameterized
static nested loop program. Recently, Compaan's scope has been
extended to also include weakly-dynamic nested loop programs that
allow for specifying data-dependent behavior. On these Matlab
application specifications, various source-level transformations
can be applied in order to, for example, increase or decrease the
amount of parallelism in the final KPN. In the next step, the Matlab
code is transformed into single assignment code (SAC), which resembles
the dependence graph (DG) of the original nested loop program.
The SAC is then converted to a Polyhedral Reduced Dependency
Graph (PRDG) data structure, a compact mathematical representation
of a DG in terms of polyhedra. Finally, the PRDG is converted
into a KPN by associating a KPN process with each node in the PRDG.
The parallel KPN processes communicate with each other according
to the data dependencies given in the DG.
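To give a feel for why a compact description of a DG is useful, the following toy sketch expands a small description (loop bounds plus constant dependence vectors) into the full list of DG edges. It assumes a rectangular iteration domain and uniform dependences, which is far simpler than the polyhedra handled by a PRDG; the function name and data layout are hypothetical and do not reflect Compaan's internal data structures.

```python
from itertools import product


def expand_dependence_graph(bounds, dep_vectors):
    """Enumerate DG nodes and edges from a compact description.

    bounds      -- one inclusive (low, high) pair per loop level
    dep_vectors -- constant distance vectors, e.g. (1, 0) means that
                   iteration (i, j) depends on iteration (i - 1, j)
    """
    ranges = [range(lo, hi + 1) for lo, hi in bounds]
    domain = set(product(*ranges))       # all iteration points
    edges = []
    for point in sorted(domain):
        for vec in dep_vectors:
            source = tuple(p - v for p, v in zip(point, vec))
            if source in domain:         # dependences from outside are inputs
                edges.append((source, point))
    return domain, edges


# Loop nest: for i in 1..2, for j in 1..2: a[i][j] = f(a[i-1][j], a[i][j-1])
domain, edges = expand_dependence_graph([(1, 2), (1, 2)], [(1, 0), (0, 1)])
```

The compact description stays the same size when the loop bounds grow, whereas the expanded graph grows with the number of iterations.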
Figure 2
The Laura tool-set, depicted on the right-hand side of Figure 2, takes a KPN as input and produces synthesizable VHDL code that implements the application specified by the KPN for a specific FPGA platform. To this end, the KPN specification is first converted into a functionally equivalent network of conceptual processors, called the hardware model. This hardware model, which is platform-independent because it incorporates no information on the target FPGA platform, defines the key components of the architecture and their attributes. It also defines the semantic model, i.e., how the various components interact with each other. Subsequently, platform-specific information is added to the hardware model. This includes the addition of IP cores that implement certain functions in the original application, as well as setting attributes of components such as bit-width and buffer sizes. In the final step, the hardware model is converted into VHDL. To do so, Laura supplies a piece of VHDL code for each component in the hardware model that expresses how to represent that component in the target architecture. Using commercial tools, the VHDL code can then be synthesized and mapped onto an FPGA. As can be seen in Figure 2, the results from this automated implementation trajectory can be fed back to Compaan to explore different transformations that will, in the end, lead to different implementations.
The Molen calibration platform
Figure 3
Figure 3 depicts
the platform architecture that is used for component calibration
in Artemis. This platform architecture, called Molen, connects
a programmable processor with a reconfigurable unit and uses
microcode to incorporate architectural support for the reconfigurable
unit. Instructions are fetched from the memory, after which the
arbiter performs a partial decoding on the instructions to determine
where they should be issued. Those instructions that have been implemented
in fixed hardware are issued to the core processing (CP) unit,
which is one of the PowerPCs from a Xilinx Virtex II Pro platform
in the Molen prototype implementation, while instructions for
custom execution are redirected to the reconfigurable unit. The
instructions entering the CP unit are further decoded and then
issued to their corresponding functional units.
The reconfigurable unit consists of
a custom configured unit (CCU), currently implemented by
the Xilinx Virtex II Pro FPGA, and a reconfigurable micro-code
unit. The reconfigurable unit performs operations that can
be as simple as an instruction or as complex as a piece of code describing
a certain function. Molen divides an operation into two distinct
phases: set and execute. The set phase is responsible for reconfiguring
the CCU hardware, enabling the execution of an operation. Such
a phase may be subdivided into two sub-phases, namely partial-set
(p-set) and complete-set (c-set). The p-set phase covers common
functions of an application or set of applications. Subsequently,
the c-set sub-phase only reconfigures those blocks in the CCU
that are not covered by the p-set sub-phase, in order to complete
the functionality of the CCU.
To perform the actual reconfiguration
of the CCU, reconfiguration microcode is loaded into the
reconfigurable micro-code unit and then executed (using p-set
and c-set instructions). Subsequently, the execute phase is responsible
for the operation execution on the CCU, performed by executing
the execution microcode. Important in this respect is the fact
that neither the set nor the execute phase explicitly specifies a
certain operation to be performed. Instead, the p-set, c-set and
execute instructions point to the memory location where the reconfiguration
or execution microcode is stored.
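The instruction flow described above can be mimicked with a small behavioral sketch. It is only an illustration under stated assumptions: the `(opcode, operand)` instruction format, the opcode names for the core processor, the addresses, and the contents of the microcode memory are all hypothetical and do not describe the actual Molen instruction encoding.

```python
# Opcodes that the arbiter redirects to the reconfigurable unit (RU);
# every other opcode is issued to the core processor (CP).
RU_OPCODES = {"p-set", "c-set", "execute"}


def arbiter(opcode):
    """Partial decoding: decide which unit receives the instruction."""
    return "RU" if opcode in RU_OPCODES else "CP"


def run(program, microcode_memory):
    """Issue a list of (opcode, operand) pairs and return an issue log."""
    configured = set()   # CCU blocks configured so far
    log = []
    for opcode, operand in program:
        if arbiter(opcode) == "CP":
            log.append(("CP", opcode))
        elif opcode in ("p-set", "c-set"):
            # The operand is only an address; the reconfiguration
            # microcode stored there decides what gets configured.
            configured.update(microcode_memory[operand])
            log.append(("RU", opcode, sorted(configured)))
        else:
            # execute: the address points at the execution microcode.
            log.append(("RU", "execute", microcode_memory[operand]))
    return log


# Hypothetical microcode memory and program, for illustration only.
MICROCODE = {
    0x100: ["fifo", "butterfly"],   # p-set: blocks shared by several operations
    0x140: ["dct-control"],         # c-set: blocks that complete one operation
    0x180: "run-dct",               # execution microcode
}
PROGRAM = [
    ("add", None),
    ("p-set", 0x100),
    ("c-set", 0x140),
    ("execute", 0x180),
    ("sub", None),
]
log = run(PROGRAM, MICROCODE)
```

Note that changing the operation performed by the reconfigurable unit only requires different microcode at the referenced addresses; the three instructions themselves stay the same.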
The Compaan and Laura tool-sets, in combination
with the Molen platform architecture, support
the previously discussed calibration of system-level architecture
models. For this purpose, Laura maps a specific component from
an application specification to a hardware implementation by
converting the Compaan-generated KPN associated with the application
component to a VHDL implementation. This VHDL code is subsequently
used as reconfiguration microcode for Molen's CCU, while the remainder
of the application specification (i.e., the code that has not been
synthesized to a hardware implementation) is executed on Molen's core
processor. As a result, the application component mapped onto the CCU
provides low-level implementation numbers that can be used to calibrate
the corresponding component in the system-level architecture model.
The Sesame modeling and simulation environment
Artemis' system-level modeling and simulation environment, called Sesame, facilitates performance analysis of embedded systems architectures according to the increasingly popular Y-chart design approach. This means that we recognize separate application and architecture models within a system simulation. An application model describes the functional behavior of an application, including both computation and communication behavior. An architecture model defines architecture resources and captures their performance constraints. After explicitly mapping an application model onto an architecture model, they are co-simulated via trace-driven simulation. This allows for evaluation of the system performance of a particular application, mapping, and underlying architecture. Essential in this modeling methodology is that an application model is independent from architectural specifics, assumptions on hardware/software partitioning, and timing characteristics. As a result, a single application model can be used to exercise different hardware/software partitionings and can be mapped onto a range of architecture models, possibly representing different system architectures or simply modeling the same system architecture at various levels of abstraction. The layered infrastructure of Sesame is shown in Figure 4.
Figure 4
For application
modeling, Sesame uses KPN application specifications that are
generated by the Compaan tool-set or have been derived by hand.
The computational behavior of an application is captured by instrumenting
the code of each Kahn process with annotations that describe the
application's computational actions. The reading from or writing
to Kahn channels represents the communication behavior of a process
within the application model. When the Kahn model is executed, each
process records its actions to generate its own trace of application
events, which is necessary for driving an architecture model. These
application events are typically coarse-grained, such as "execute(DCT)"
or "read(pixel-block,channel_id)".
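Trace generation of this kind can be sketched as follows. This is a simplified, single-threaded illustration, assuming the input channel is filled before the process runs so that reads never need to block; the `TracedProcess` class, the channel names, and the stand-in computation are hypothetical and are not Sesame's actual API.

```python
from collections import deque


class TracedProcess:
    """Kahn process wrapper that records application events."""

    def __init__(self, name):
        self.name = name
        self.trace = []

    def read(self, channel_id, channel):
        token = channel.popleft()
        self.trace.append(f"read(pixel-block,{channel_id})")
        return token

    def write(self, channel_id, channel, token):
        channel.append(token)
        self.trace.append(f"write(pixel-block,{channel_id})")

    def execute(self, operation):
        # Annotation only: it records that a computation took place.
        self.trace.append(f"execute({operation})")


def dct_process(proc, in_ch, out_ch, n_blocks):
    """Functional code of one process, instrumented with annotations."""
    for _ in range(n_blocks):
        block = proc.read("in", in_ch)
        transformed = [2 * v for v in block]   # stand-in for a real DCT
        proc.execute("DCT")
        proc.write("out", out_ch, transformed)


in_ch, out_ch = deque([[1, 2], [3, 4]]), deque()
proc = TracedProcess("dct")
dct_process(proc, in_ch, out_ch, n_blocks=2)
```

After the run, `proc.trace` holds the event sequence for this process, while `out_ch` holds the functional result. The trace carries no timing information; timing is the responsibility of the architecture model.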
Architecture models in Sesame, which typically
operate at the so-called transaction level, simulate the performance
consequences of the computation and communication events generated
by an application model. These architecture models solely account
for architectural performance constraints and do not need to model
functional behavior. This is possible because the functional behavior
is already captured in the application models, which subsequently
drive the architecture simulation. An architecture model is constructed
from generic building blocks provided by a library, which contains
template performance models for processing cores, communication
media (like buses) and various types of memory.
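A minimal sketch of such a performance-only component model is shown below. It assumes events are `(kind, detail)` tuples and that each event simply advances a local clock by a fixed latency; the latency values and the table layout are invented for illustration and are not taken from Sesame's component library.

```python
# Hypothetical latency table in cycles: specific events first,
# then a per-kind fallback keyed by a one-element tuple.
LATENCIES = {
    ("execute", "DCT"): 120,
    ("read",): 8,
    ("write",): 8,
}


def cost_of(event, latencies):
    """Look up the latency of an event, falling back to its kind."""
    if event in latencies:
        return latencies[event]
    return latencies[(event[0],)]


def simulate_component(trace, latencies):
    """Consume an event trace; return (finish_time, timeline)."""
    clock = 0
    timeline = []
    for event in trace:
        timeline.append((clock, event))   # time at which the event starts
        clock += cost_of(event, latencies)
    return clock, timeline


trace = [("read", "in"), ("execute", "DCT"), ("write", "out")]
finish, timeline = simulate_component(trace, LATENCIES)
```

The model never computes a DCT; it only accounts for how long one would take, which is why the same trace can be replayed against different latency tables to compare candidate architectures.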
To map Kahn processes (i.e., their event traces)
from an application model onto architecture model components
and to support the scheduling of application events from different
event traces when multiple Kahn processes are mapped onto a single
architecture component, Sesame provides an intermediate mapping
layer. This layer also facilitates the gradual refinement of the
system-level architecture (performance) models. To this end, the mapping
layer bridges the abstraction gap between application and architecture
models by applying dataflow actors that transform coarse-grained
application events into finer-grained architecture events driving the
architecture model components. This event refinement technique allows
for architectural exploration at different levels of abstraction while
maintaining high-level and architecture-independent application models.
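Event refinement can be pictured as a stream transformation. The sketch below assumes that each coarse-grained communication event expands into a fixed sequence of finer-grained events while computation events pass through unchanged; the refinement table and event names are hypothetical examples chosen for this illustration.

```python
# Hypothetical refinement rules: coarse event kind -> finer-grained kinds.
REFINEMENT = {
    "read": ["check-data", "load", "signal-room"],
    "write": ["check-room", "store", "signal-data"],
}


def refine(trace):
    """Actor-like generator: consume coarse events, emit refined ones."""
    for kind, detail in trace:
        if kind in REFINEMENT:
            for fine_kind in REFINEMENT[kind]:
                yield (fine_kind, detail)
        else:
            yield (kind, detail)   # no rule: pass the event through


coarse = [("read", "ch0"), ("execute", "DCT"), ("write", "ch1")]
fine = list(refine(coarse))
```

The application model that produced `coarse` is untouched; only the refinement table changes when the architecture model is made more detailed.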
More information
For more information on Molen, follow this link: http://ce.et.tudelft.nl/MOLEN/
For more information on Compaan/Laura, see these links: http://www.liacs.nl/~cserc/compaan/ and http://www.liacs.nl/~kienhuis/
For more information on Sesame, see these links: http://sesamesim.sourceforge.net/ and http://www.science.uva.nl/~andy/publications.html
Contact Person: A.D. Pimentel
Phone: (+31) 20 5257578
Contact Information: Prof.dr. S. Vassiliadis
Phone: (+31) 15 2787146
Faculty of Electrical Engineering,
Mathematics, and Computer Science
Delft University of Technology
Computer Engineering Laboratory
Mekelweg 4 (15th floor)
2628 CD Delft
P.O. Box 5031
2600 GA Delft
The Netherlands
Phone: (+31) 15 2786196
Fax : (+31) 15 2784898