MOLEN
Reconfigurable Processors

... The Bit Crunching Mill
The MOLEN proposition is that reconfigurable processors, i.e., processors that adapt their microarchitecture (dynamically or statically) to fit application "design requirements", are the answer to the hardware design challenges of processors, embedded or not. To prove the viability of the proposition, we are working on multiple design aspects of single- and multi-processor chips built on reconfigurable fabric. More specifically, our team is currently working on the MOLEN reconfigurable processor, embedded reconfigurable systems, co-design, custom computing programming paradigms, reconfigurable systems on chip (SoCs), multi-processor SoCs (MPSoCs), and networks on a chip (NoCs). A very short description of some of the topics we are working on follows:
The MOLEN reconfigurable microcoded processor.
The MOLEN reconfigurable processor utilizes microcode and custom configured hardware to improve performance.
We consider all kinds of processor requirements, from embedded systems to supercomputers.
The MOLEN reconfigurable processor is organized as follows: The reconfigurable hardware execution
of code (ranging from a single instruction to a piece of application code) is
divided into two logical phases. In the first phase, the reconfigurable hardware is
configured. In the second phase, the code is executed.
Microcode drives both phases: it performs the reconfiguration
and the execution of the code. Frequently used microcode resides permanently
within the fixed part of an on-chip storage facility, and infrequently used
microcode is paged into the pageable part of the same or another storage facility.
The approach is generic, so different applications can use the proposed
processing capabilities. Our experimentation thus far has involved multimedia
operations and supercomputing applications. In the multimedia experimentation, we investigate processing elements
that are capable of performing operations and algorithms found in generating,
coding, and displaying multimedia formats, i.e., pictures, video, audio, and
graphics. At the current stage, the multimedia processor architecture has targeted
multimedia standards including JPEG, MPEG-1/2, MPEG-4, and H.261. Currently, we
consider graphics operations and power consumption. We have implemented the Molen
processor on available FPGA families, e.g., a prototype on the Virtex-II Pro FPGA family from Xilinx Corp. The Virtex-II Pro
devices incorporate up to four PowerPC 405 GPP cores, FPGA reconfigurable fabric
hardware, dedicated RAM blocks, and dedicated high-speed I/O blocks. All prototypes are available for use (free of charge).
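The two-phase scheme and the split microcode store described above can be illustrated with a small software model. This is a minimal sketch, not the Molen implementation: the class and function names, the tuple keys, and the least-recently-used eviction policy for the pageable part are all assumptions made for illustration.

```python
from collections import OrderedDict


class MicrocodeStore:
    """Toy microcode store with a fixed part and a pageable part (hypothetical)."""

    def __init__(self, fixed, pageable_slots):
        self.fixed = dict(fixed)          # permanently resident microcode
        self.pageable = OrderedDict()     # paged-in microcode, oldest use first
        self.pageable_slots = pageable_slots
        self.page_ins = 0                 # loads from backing memory so far

    def fetch(self, name, backing_memory):
        if name in self.fixed:
            return self.fixed[name]
        if name in self.pageable:
            self.pageable.move_to_end(name)   # mark as most recently used
            return self.pageable[name]
        # Miss: page the microcode in. Evicting the least recently used
        # entry is an assumed policy, not one stated in the text.
        if len(self.pageable) >= self.pageable_slots:
            self.pageable.popitem(last=False)
        self.pageable[name] = backing_memory[name]
        self.page_ins += 1
        return self.pageable[name]


def run_on_fabric(op, store, backing_memory, state):
    """Phase 1 configures the fabric for `op`; phase 2 executes it."""
    if state.get("configured") != op:
        store.fetch(("set", op), backing_memory)    # reconfiguration microcode
        state["configured"] = op
    store.fetch(("execute", op), backing_memory)    # execution microcode
    return state
```

In this model, an operation whose execution microcode sits in the fixed part costs one page-in on first use (for its reconfiguration microcode) and none on repeated calls, which is the behavior the fixed/pageable split is meant to favor.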
Reconfigurable arithmetic and logic processor units.
The first basic goal is to speed up scientific (mostly vector-based) code.
Arithmetic operations whose units are complex to design in hardware are normally not present
in general-purpose processor instruction sets.
Such operations include matrix multiplication, sparse matrix operations (such as transpose), etc.
They can be implemented in reconfigurable hardware, speeding up the execution of
scientific programs. A second goal is to design router and network-related
reconfigurable hardware. Reconfigurable processor units can be added to
general-purpose processors for domains (such as switches, networks,
packet processing, and protocols) that were not envisioned for the
general-purpose processor paradigm, providing substantial speed-ups.
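To make one of the operations named above concrete, the following is a minimal software sketch of a sparse matrix transpose. The compressed sparse row (CSR) storage format and the function name are assumptions chosen for illustration; the text does not say which sparse format the hardware units use.

```python
def csr_transpose(n_rows, n_cols, row_ptr, col_idx, values):
    """Transpose a CSR matrix; returns (row_ptr, col_idx, values) of the result."""
    nnz = len(values)
    # Count the entries in each input column (= each row of the transpose).
    t_row_ptr = [0] * (n_cols + 1)
    for c in col_idx:
        t_row_ptr[c + 1] += 1
    # Prefix sum turns the counts into row start offsets.
    for c in range(n_cols):
        t_row_ptr[c + 1] += t_row_ptr[c]
    # Scatter every entry into the next free slot of its destination row.
    next_slot = t_row_ptr[:-1].copy()
    t_col_idx = [0] * nnz
    t_values = [0] * nnz
    for r in range(n_rows):
        for k in range(row_ptr[r], row_ptr[r + 1]):
            c = col_idx[k]
            dst = next_slot[c]
            t_col_idx[dst] = r
            t_values[dst] = values[k]
            next_slot[c] += 1
    return t_row_ptr, t_col_idx, t_values
```

The scatter loop performs indirect, data-dependent memory accesses, which is the kind of access pattern that general-purpose instruction sets handle poorly.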
Embedded IP execution units.
We analyze embedded system computational requirements in order to determine
the feasibility of hardwired accelerator units and propose implementations for such
units. We have considered JPEG, MPEG-1/2, MPEG-4, H.261, and lossless compression
algorithms and we have proposed numerous specialized units including DCT/IDCT, sum of
absolute differences (SAD), variable length decoding (VLD), Paeth encoding for
portable network graphics (PNG), filters, entropy decoders, repetitive padding units,
saturated arithmetic units, accepted quality function (AQF), and color space converters.
For our experimentation, we have used various FPGA technologies and applied the
Molen processor framework to the Philips TriMedia, IBM PowerPC (processors
integrated on the Xilinx Virtex-II Pro devices), and ARM (processors integrated on
the Altera Excalibur devices). We will keep exploring embedded applications for potential hardwired IP units.
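Two of the kernels listed above are small enough to state in software. The sketch below gives reference versions of the sum of absolute differences and of the Paeth filter as defined in the PNG specification; the function names are illustrative, and the code says nothing about how the proposed hardware units are organized.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized pixel blocks."""
    return sum(
        abs(x - y)
        for row_a, row_b in zip(block_a, block_b)
        for x, y in zip(row_a, row_b)
    )


def paeth_predictor(left, above, upper_left):
    """Paeth predictor from the PNG specification."""
    p = left + above - upper_left
    pa = abs(p - left)
    pb = abs(p - above)
    pc = abs(p - upper_left)
    # Pick the neighbor closest to p; ties resolve in the order left, above,
    # upper-left, as the specification requires.
    if pa <= pb and pa <= pc:
        return left
    if pb <= pc:
        return above
    return upper_left


def paeth_encode_byte(value, left, above, upper_left):
    """Filtered byte: the difference to the prediction, modulo 256."""
    return (value - paeth_predictor(left, above, upper_left)) & 0xFF
```

Both kernels consist of short chains of subtractions, absolute values, and comparisons applied to many pixels, which is why they are natural candidates for dedicated units.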
Memory architecture and implementations.
Multimedia and embedded processing have specific requirements for memory accesses. For high
performance, memory must be accessed in a rectangular
manner, so efficient operation needs mechanisms that access memory in a
two-dimensional manner. We propose mechanisms for media reconfigurable processors that use specially
addressed memory organizations, and an implementation of two-dimensional memory cores
that substantially improve the memory performance of the Molen FPGA-implemented
processor. In addition, because memory may consume a significant amount of power, we
propose a cache organization that reduces the number of off-chip accesses, thus
decreasing the main memory power consumption of the reconfigurable processor architecture.
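The need for two-dimensional access can be seen in a few lines of code. The sketch below first lists the linear addresses of a rectangular block in a row-major frame, which are not contiguous, and then shows one simple way to spread pixels over memory modules so that any block of a given size touches each module exactly once. The module assignment function is a textbook-style illustration with assumed names; it is not claimed to be the addressing scheme proposed for Molen.

```python
def block_addresses(row, col, height, width, frame_width):
    """Linear addresses of a height x width block in a row-major frame."""
    return [
        (row + i) * frame_width + (col + j)
        for i in range(height)
        for j in range(width)
    ]


def module_of(r, c, height, width):
    """Assign pixel (r, c) to one of height * width memory modules."""
    return (r % height) * width + (c % width)


def is_conflict_free(row, col, height, width):
    """True if the block at (row, col) touches every module exactly once."""
    modules = {
        module_of(row + i, col + j, height, width)
        for i in range(height)
        for j in range(width)
    }
    return len(modules) == height * width
```

Because any `height` consecutive rows cover every residue modulo `height`, and likewise for columns, a block at any position maps to distinct modules, so all of its pixels could in principle be fetched in parallel.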
Compiler and design space exploration tools.
We have defined a programming paradigm that targets the Molen reconfigurable microcoded processor engine, and we are
developing a backend compiler and a design space exploration toolset. The programming paradigm is based on sequential
consistency. It provides mechanisms for parallel and concurrent hardware execution
and is currently intended for single-program execution. In order to conform to
the Molen programming paradigm, an existing compiler has been extended to support the
required instruction set and register set extensions. Moreover, a specific mechanism
has been developed for passing parameters/results in the case of parallel executions.
The compiler and the design space exploration tools are developed in the project
The Delft Workbench.
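As a rough illustration of what the compiler extensions have to arrange, the sketch below models a single call to a function mapped onto reconfigurable hardware: configure, pass parameters through dedicated registers, execute, and read back the result. Everything here is hypothetical, including the `ToyFabric` class, the exchange register file, its size, and the convention that the result lands in register 0; the actual instruction and register set extensions are not specified in this text.

```python
class ToyFabric:
    """Toy stand-in for the reconfigurable unit; all names are hypothetical."""

    def __init__(self, n_exchange_regs=8):
        self.exchange_regs = [0] * n_exchange_regs  # parameter/result registers
        self.kernels = {}        # operation name -> Python function emulating it
        self.configured = None   # operation the fabric is currently set up for

    def set(self, op):
        """Configure the fabric for `op`."""
        self.configured = op

    def execute(self, op, n_args):
        """Run `op` on the first n_args registers; result goes to register 0."""
        if self.configured != op:
            raise RuntimeError(f"fabric is not configured for {op!r}")
        args = self.exchange_regs[:n_args]
        self.exchange_regs[0] = self.kernels[op](*args)


def call_on_fabric(fabric, op, *args):
    """What a compiler might emit for one call to a hardware-mapped function."""
    fabric.set(op)                         # configure
    for i, value in enumerate(args):       # pass parameters
        fabric.exchange_regs[i] = value
    fabric.execute(op, len(args))          # run
    return fabric.exchange_regs[0]         # fetch result
```

The call returns the same value as the plain software function would, in program order, which is the observable behavior that a sequentially consistent paradigm has to preserve even when several hardware operations run in parallel.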
Low-Power High-Performance Graphics Architectures.
We are designing a low-power 2D/3D graphics hardware accelerator for mobile terminals
equipped with an ARM processor core. The purpose of using a graphics accelerator is to move
some of the graphics-related computations, in particular the rasterization, from the
CPU to this dedicated hardware device in order to improve the rendering speed for
graphics applications. One important concern for a graphics accelerator meant for
mobile terminals is low power consumption, since most current
graphics accelerators are notorious for their high power consumption. Therefore,
algorithmic- and circuit-level techniques for low-power graphics need to be studied
and evaluated.
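For readers unfamiliar with the rasterization step that is moved off the CPU, the following is a minimal software sketch using the common edge-function (half-space) test. It assumes integer vertex coordinates and samples at integer pixel positions; it is a generic illustration, not the algorithm of the accelerator under design.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed area test: which side of the edge a->b the point p lies on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def rasterize_triangle(v0, v1, v2, width, height):
    """Return the (x, y) pixels covered by a triangle on a width x height grid."""
    area = edge(*v0, *v1, *v2)
    if area == 0:
        return []  # degenerate triangle covers nothing
    # Only scan the bounding box, clamped to the screen.
    min_x = max(min(v0[0], v1[0], v2[0]), 0)
    max_x = min(max(v0[0], v1[0], v2[0]), width - 1)
    min_y = max(min(v0[1], v1[1], v2[1]), 0)
    max_y = min(max(v0[1], v1[1], v2[1]), height - 1)
    covered = []
    for y in range(min_y, max_y + 1):
        for x in range(min_x, max_x + 1):
            w0 = edge(*v1, *v2, x, y)
            w1 = edge(*v2, *v0, x, y)
            w2 = edge(*v0, *v1, x, y)
            # Inside when all three tests agree with the triangle's winding.
            if area > 0:
                inside = w0 >= 0 and w1 >= 0 and w2 >= 0
            else:
                inside = w0 <= 0 and w1 <= 0 and w2 <= 0
            if inside:
                covered.append((x, y))
    return covered
```

The inner loop evaluates three multiply-subtract expressions per pixel, so its cost grows with the covered area, which is why both rendering speed and power consumption depend heavily on how this step is implemented.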
GraalBench Low-Power Graphics benchmark.
The GraalBench is a 3D graphics benchmark suite suitable for 3D graphics on low-power mobile systems, in particular mobile phones.
These benchmarks were collected to facilitate our studies on low-power 3D graphics accelerators
in the Graal (GRAphics AcceLerator) project. The suite includes traces of several games as well as
virtual reality applications such as 3D museum guides. Applications were selected
on the basis of several criteria such as resolution, polygon count, pixel rate, and relevance
to mobile devices. For example, 3D FPS games or 3D virtual guides were considered relevant
while CAD/CAM applications, such as those contained in the Viewperf package, were excluded
because it is unlikely that they will be offered on mobile devices (they often have high polygon counts
and require high resolution). More information and downloads can be found in
The GraalBench Benchmark Suite.
System on Chip (SoC).
The underlying assumption of this research is that entire systems are migrating onto single chips and that a
single chip incorporates numerous heterogeneous IPs. We further assume that the entire
system comprises a network that provides communication among chiplet IPs (processors). In
essence, we consider embedded multiprocessor systems and networks on a chip (NoCs). We believe that
for such systems the interconnection networks have to be regular and expandable, and that
they have to provide fast interconnections for some but not all communications between processors.
Additionally, networks have to be reliable and avoid livelock and deadlock. In our
research, we have proposed a network topology, denoted the clustered
torus, that has the properties discussed above, and we examine the properties,
performance, design issues, and feasibility of such a network.
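The regularity of a torus is easy to see in code. The sketch below computes the neighbors of a node and the minimal hop count between two nodes in a plain two-dimensional torus. It deliberately does not model the clustering of the proposed topology, whose details are not given here; the function names and the coordinate convention are assumptions.

```python
def torus_neighbors(x, y, cols, rows):
    """The four neighbors of node (x, y) in a cols x rows torus."""
    return [
        ((x + 1) % cols, y),
        ((x - 1) % cols, y),
        (x, (y + 1) % rows),
        (x, (y - 1) % rows),
    ]


def torus_hops(src, dst, cols, rows):
    """Minimal number of hops between two nodes, using wraparound links."""
    dx = abs(src[0] - dst[0])
    dy = abs(src[1] - dst[1])
    return min(dx, cols - dx) + min(dy, rows - dy)
```

Every node has the same number of links and the same neighbor rule, and the wraparound links roughly halve the worst-case distance compared with a mesh of the same size.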
The Delft Java processor.
We have developed a processor which is a parallel multi-threaded engine optimized for the Java
language. Java bytecodes are interpreted, and on this processor the interpretation of the program
achieves a performance level that meets or exceeds that of natively compiled code. The
multi-threaded processor architecture is currently being utilized by
Sandbridge Technologies to develop
cost-effective low-power broadband wireless processor technology. The processor, a
3G digital wireless multi-threaded processor architecture, is intended to meet demands for
services such as web browsing, MP3 audio, MPEG-4 video, and video telephony for
handheld devices.
MOLEN Research Project
Telephone: +31 15 2787364
Fax: +31 15 2784898
Computer Engineering Laboratory
Faculty of Electrical Engineering, Mathematics and Computer Science (EEMCS)
TU Delft, Delft, the Netherlands