MOLEN

Reconfigurable Processors

molen
... The Bit Crunching Mill

The MOLEN proposition is that reconfigurable processors, i.e. processors that adapt (dynamically or statically) their microarchitecture to fit application "design requirements", are the answer to the hardware design challenges of processors, embedded or not. To prove the viability of the proposition, we are working on multiple design aspects of single and multiple processors on a chip using reconfigurable fabric. More specifically, our team is currently working on the MOLEN reconfigurable processor, embedded reconfigurable systems, co-design, custom computing programming paradigms, reconfigurable systems on chip (SoCs), multi-processor SoCs (MPSoCs), and networks on a chip (NoCs). A very short description of some of the topics we are working on follows:

The MOLEN reconfigurable microcoded processor.

The MOLEN reconfigurable processor utilizes microcode and custom configured hardware to improve performance. We consider all kinds of processor requirements, from embedded systems to supercomputers. The MOLEN reconfigurable processor is organized as follows: the reconfigurable hardware execution of code (ranging from a single instruction to a piece of application code) is divided into two logical phases. In the first phase, the reconfigurable hardware is configured. In the second phase, the code is executed. Microcode is utilized in both phases, to perform the reconfiguration process as well as the execution of the code. Frequently utilized microcode resides permanently within the fixed part of an on-chip storage facility, and infrequently utilized microcode is paged into the pageable part of the same or another storage facility. The approach is generic; therefore, different applications can utilize the proposed processing capabilities.

Our experimentation thus far has involved multimedia operations and supercomputing applications. In the multimedia experimentation, we investigate processing elements that are capable of performing operations and algorithms found in generating, coding, and displaying multimedia formats, i.e., pictures, video, audio, and graphics. At the current stage, the multimedia processor architecture has targeted multimedia standards including JPEG, MPEG-1/2, MPEG-4, and H.261. Currently, we consider graphic operations and power consumption.

We have implemented the Molen processor in some available FPGA families, e.g. as a prototype on the Virtex-II Pro FPGA family from Xilinx Corp. The Virtex-II Pro devices incorporate up to four PowerPC 405 GPP cores, FPGA reconfigurable fabric hardware, dedicated RAM blocks, and dedicated high-speed I/O blocks. All prototypes are available for use (free of charge).
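
The two-phase organization and the fixed/pageable microcode storage described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the Molen implementation: the class names (MicrocodeStore, ReconfigurableUnit), the method names (configure, execute), the "cfg_"/"exec_" key prefixes, and the least-recently-used replacement policy for the pageable part are all hypothetical choices made for the example.

```python
from collections import OrderedDict


class MicrocodeStore:
    """Toy model of microcode storage with a fixed and a pageable part."""

    def __init__(self, fixed, backing, pageable_slots):
        self.fixed = dict(fixed)        # frequently used microcode, always resident
        self.backing = dict(backing)    # infrequently used microcode, paged in on demand
        self.pageable = OrderedDict()   # resident pageable entries, oldest first
        self.pageable_slots = pageable_slots
        self.page_ins = 0               # number of page-in events so far

    def fetch(self, name):
        if name in self.fixed:
            return self.fixed[name]
        if name in self.pageable:
            self.pageable.move_to_end(name)      # mark as most recently used
            return self.pageable[name]
        if len(self.pageable) >= self.pageable_slots:
            self.pageable.popitem(last=False)    # evict least recently used (assumed policy)
        self.pageable[name] = self.backing[name]
        self.page_ins += 1
        return self.pageable[name]


class ReconfigurableUnit:
    """Toy model of the two logical phases: configure first, then execute."""

    def __init__(self, store):
        self.store = store
        self.configured = None

    def configure(self, op):
        # Phase 1: fetch the (opaque) configuration microcode for `op`.
        self.store.fetch("cfg_" + op)
        self.configured = op

    def execute(self, op, *args):
        # Phase 2: fetch and run the execution microcode for `op`.
        if self.configured != op:
            raise RuntimeError("hardware is not configured for " + op)
        return self.store.fetch("exec_" + op)(*args)


store = MicrocodeStore(
    fixed={"cfg_add": "bitstream-add", "exec_add": lambda a, b: a + b},
    backing={
        "cfg_sad": "bitstream-sad",
        "exec_sad": lambda xs, ys: sum(abs(x - y) for x, y in zip(xs, ys)),
    },
    pageable_slots=2,
)
unit = ReconfigurableUnit(store)

unit.configure("add")
result_add = unit.execute("add", 2, 3)

unit.configure("sad")
result_sad = unit.execute("sad", [1, 5, 9], [2, 3, 9])
```

In this model the "add" microcode never triggers a page-in because it lives in the fixed part, while the first use of "sad" pages in both its configuration and its execution microcode.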

Reconfigurable arithmetic and logic processor units.

The first basic goal is to speed up scientific (mostly vector-based) code. Arithmetic units that are complex to design in hardware are normally not supported by general purpose processor instruction sets. The corresponding operations include matrix multiplication, sparse matrix operations (such as transpose), etc. They can be implemented in reconfigurable hardware, speeding up the execution of scientific programs. A second goal is to design router and network-related reconfigurable hardware. Reconfigurable processor units can be added to general purpose processors for domains (such as switches, networks, packet processing, and protocols) that were not envisioned in the general purpose processor paradigm, providing substantial speed-ups.
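
As an illustration of the kind of operation mentioned above, the following is a minimal software sketch of a sparse matrix transpose. It assumes the compressed sparse row (CSR) storage format, which is a choice made for this example only; the function name csr_transpose is hypothetical, and the code shows the operation itself, not a hardware design.

```python
def csr_transpose(n_rows, n_cols, row_ptr, col_idx, values):
    """Transpose a CSR matrix and return the CSR arrays of the result."""
    nnz = len(values)

    # Count the non-zeros in each column; they become the rows of the transpose.
    t_row_ptr = [0] * (n_cols + 1)
    for c in col_idx:
        t_row_ptr[c + 1] += 1
    # Prefix sum turns the counts into row start offsets.
    for c in range(n_cols):
        t_row_ptr[c + 1] += t_row_ptr[c]

    next_slot = t_row_ptr[:-1]          # next free position per transposed row
    t_col_idx = [0] * nnz
    t_values = [0] * nnz
    for r in range(n_rows):
        for k in range(row_ptr[r], row_ptr[r + 1]):
            c = col_idx[k]
            dst = next_slot[c]
            t_col_idx[dst] = r          # original row index becomes column index
            t_values[dst] = values[k]
            next_slot[c] += 1
    return t_row_ptr, t_col_idx, t_values


# 2 x 3 matrix [[1, 0, 2], [0, 3, 0]] in CSR form.
t_row_ptr, t_col_idx, t_values = csr_transpose(
    2, 3, row_ptr=[0, 2, 3], col_idx=[0, 2, 1], values=[1, 2, 3]
)
```

The scattered, data-dependent writes in the inner loop are the part that maps poorly onto a conventional instruction set.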

Embedded IP execution units.

We analyze embedded system computational requirements in order to determine the feasibility of hardwired accelerator units and propose implementations for such units. We have considered JPEG, MPEG-1/2, MPEG-4, H.261, and lossless compression algorithms, and we have proposed numerous specialized units including DCT/IDCT, sum of absolute differences (SAD), variable length decoding (VLD), Paeth encoding for portable network graphics (PNG), filters, entropy decoders, repetitive padding units, saturated arithmetic units, accepted quality function (AQF), and color space converters. For our experimentation, we have utilized various FPGA technologies and applied the Molen processor framework to the Philips Trimedia, the IBM PowerPC (processors integrated on the Xilinx Virtex-II Pro devices), and the ARM (processors integrated on the Altera Excalibur devices). We will keep on exploring embedded applications for potential hardwired IP units.
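
Two of the kernels named above are small enough to show in software. The sketch below gives a reference version of the sum of absolute differences over two equally sized blocks and of the Paeth predictor and row filter as defined in the PNG specification. The function names and the bytes_per_pixel parameter are assumptions for this example, and the code describes only what such units compute, not how the proposed hardware units are built.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2D blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def paeth_predictor(left, above, upper_left):
    """Paeth predictor from the PNG specification."""
    p = left + above - upper_left
    pa = abs(p - left)
    pb = abs(p - above)
    pc = abs(p - upper_left)
    if pa <= pb and pa <= pc:
        return left
    if pb <= pc:
        return above
    return upper_left


def paeth_filter_row(row, prev_row, bytes_per_pixel=1):
    """Apply the PNG Paeth filter to one row of bytes (modulo-256 arithmetic)."""
    out = []
    for i, value in enumerate(row):
        # Bytes to the left of the first pixel are treated as zero.
        left = row[i - bytes_per_pixel] if i >= bytes_per_pixel else 0
        upper_left = prev_row[i - bytes_per_pixel] if i >= bytes_per_pixel else 0
        above = prev_row[i]
        out.append((value - paeth_predictor(left, above, upper_left)) & 0xFF)
    return out


sad_value = sad([[1, 2], [3, 4]], [[1, 0], [5, 4]])
filtered = paeth_filter_row([10, 20], [0, 0])
```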

Memory architecture and implementations.

Multimedia and embedded processing have specific requirements for memory accesses. High-performance processing requires that memory be accessed in a rectangular manner, so efficient operation needs mechanisms that access memory two-dimensionally. We propose mechanisms for media reconfigurable processors utilizing specially addressed memory organizations and an implementation of two-dimensional memory cores that substantially improve the memory performance of the Molen FPGA implemented processor. In addition, because memory may require a significant amount of power, we propose a cache organization that reduces the number of off-chip accesses, thus decreasing the main memory power consumption of the reconfigurable processor architecture.
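
To see why rectangular access is awkward for a conventional linearly addressed memory, consider the small sketch below. It computes the linear addresses of a rectangular block in a row-major image and counts how many fixed-size memory lines the block touches. It illustrates the problem only, not the proposed memory organization; the function names and the parameters (row_pitch, line_size) are assumptions made for the example.

```python
def block_addresses(base, row_pitch, x, y, width, height):
    """Linear addresses of a width x height block at (x, y) in a row-major image."""
    return [
        [base + (y + j) * row_pitch + (x + i) for i in range(width)]
        for j in range(height)
    ]


def lines_touched(addresses, line_size):
    """Number of distinct memory lines of `line_size` elements that are accessed."""
    return len({addr // line_size for row in addresses for addr in row})


# A 4 x 2 block at (6, 2) in an image whose rows are 16 elements apart.
addresses = block_addresses(base=0, row_pitch=16, x=6, y=2, width=4, height=2)
touched = lines_touched(addresses, line_size=8)
```

Here the eight elements of the block would fit into a single eight-element line, yet the linear layout spreads them over four lines.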

Compiler and design space exploration tools.

We have defined a programming paradigm that targets the Molen reconfigurable microcoded processor engine, and we are developing a backend compiler and a design space exploration toolset. The programming paradigm is based on sequential consistency. It provides mechanisms for parallel and concurrent hardware execution and is (currently) intended for single program execution. In order to conform to the Molen programming paradigm, an existing compiler has been extended to support the required instruction set and register set extensions. Moreover, a specific mechanism has been developed for passing parameters and results in the case of parallel executions. The compiler and the design space exploration tools are developed in the Delft Workbench project.

Low-Power High-Performance Graphics Architectures.

We are designing a low-power 2D/3D graphics hardware accelerator for mobile terminals equipped with an ARM processor core. The purpose of using a graphics accelerator is to move some of the graphics-related computations, in particular the rasterization, from the CPU to this dedicated hardware device in order to improve the rendering speed of graphics applications. One important concern for a graphics accelerator meant to be employed in mobile terminals is low power consumption, since most current graphics accelerators are notorious for their high power consumption. Therefore, algorithmic- and circuit-level techniques for low-power graphics need to be studied and evaluated.

GraalBench Low-Power Graphics benchmark.

GraalBench is a 3D graphics benchmark suite suitable for 3D graphics on low-power mobile systems, in particular mobile phones. These benchmarks were collected to facilitate our studies on low-power 3D graphics accelerators in the Graal (GRAphics AcceLerator) project. The suite includes traces of several games as well as virtual reality applications such as 3D museum guides. Applications were selected on the basis of several criteria such as resolution, polygon count, pixel rate, and relevance to mobile devices. For example, 3D FPS games and 3D virtual guides were considered relevant, while CAD/CAM applications, such as those contained in the Viewperf package, were excluded because it is unlikely that they will be offered on mobile devices (they often have a high polygon count and require high resolution). More information and downloads can be found in The GraalBench Benchmark Suite.

System on Chip (SoC).


Clustered Torus
The underlying assumption of this research is that entire systems are migrating onto single chips and that a single chip incorporates numerous heterogeneous IPs. We further assume that the entire system comprises a network that provides communications among chiplet IPs (processors). In essence, we consider embedded multiprocessor systems and networks on a chip (NoCs). We believe that for such systems the interconnection networks have to be regular and expandable, and that they have to provide fast interconnections for some but not all communications between processors. Additionally, networks have to be reliable and avoid livelocks and deadlocks. In our research, we have proposed a network topology, denoted as the clustered torus, that has the properties discussed above, and we examine the properties, performance, design issues, and feasibility of such a network.
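
The text above does not define the clustered torus, so the sketch below models only an ordinary two-dimensional torus, the regular base topology that the name suggests. It is meant to illustrate regularity (every node has the same number of links) and wrap-around distances, and should not be read as the proposed topology. The function names are hypothetical.

```python
def torus_neighbors(x, y, width, height):
    """The four neighbors of node (x, y) in a width x height 2D torus."""
    return [
        ((x + 1) % width, y),
        ((x - 1) % width, y),
        (x, (y + 1) % height),
        (x, (y - 1) % height),
    ]


def torus_distance(a, b, width, height):
    """Minimum hop count between nodes a and b, using wrap-around links."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    return min(dx, width - dx) + min(dy, height - dy)


corner_neighbors = torus_neighbors(0, 0, 4, 4)
corner_to_corner = torus_distance((0, 0), (3, 3), 4, 4)
```

Because of the wrap-around links, opposite corners of the 4 x 4 torus are only two hops apart, and a corner node has the same four links as any interior node.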

The Delft Java processor.

We have developed a processor that is a parallel multi-threaded engine optimized for the Java language. Java bytecodes are interpreted, and on this processor the interpretation of a program achieves a performance level that meets or exceeds that of natively compiled code. The multi-threaded processor architecture is currently being utilized by Sandbridge Technologies to develop cost-effective low-power broadband wireless processor technology. The processor, a 3G digital wireless multi-threaded processor architecture, is intended to meet demands for services such as web browsing, MP3 audio, MPEG-4 video, and video telephony for handheld devices.


MOLEN Research Project
Telephone: +31 15 2787364
Fax: +31 15 2784898

Computer Engineering Laboratory
Faculty of Electrical Engineering, Mathematics and Computer Science (EEMCS)
TU Delft, Delft, the Netherlands