|
The Δ-ILIAD research is concerned with new computer architecture paradigms.
The range of processor architectures considered includes general-purpose,
domain-specific (e.g. media), vector processor extensions, polymorphic processing,
and non-conventional architectures. Furthermore, we perform reality
checks on existing processor implementations. More specifically, the
Δ-ILIAD team is currently working on the following research topics:
Vector Facility
Traditionally, vector processors are limited by memory accesses,
sectioning, and simple computations. The Δ-ILIAD vector
architecture eliminates sectioning and alleviates storage access overhead
by overlapping accesses with computations
and merging both into a single instruction.
In addition to traditional operations, the Δ-ILIAD architecture
includes new instructions that perform complex, multicycle-latency
operations.
The Δ-ILIAD mechanisms eliminate a substantial amount of code.
The specific dense and sparse architectural
mechanisms of Δ-ILIAD include the Complex Streamed Instruction
Set (CSI) and the Δ-ILIAD sparse vector processing, which are described
below.
CSI Media Architecture.
The Complex Streamed Instruction Set Architecture (CSI) is a
memory-to-memory vector architecture targeted at multimedia
applications. A single CSI instruction can process data
streams of arbitrary length. In addition to traditional
arithmetic and logical operations, it
performs data accesses, conversion
between storage and computation formats (packing and unpacking), and
complex hardwired arithmetic computations.
The main new features of CSI are the elimination of vector sectioning
instructions, the elimination of packing/unpacking instructions, and
the introduction of new complex media-related arithmetic instructions.
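The difference between sectioned and streamed processing can be illustrated with a small software model. This is only a sketch under stated assumptions, not the CSI instruction set itself: the section length `MAX_VECTOR_LENGTH`, the function names, and the choice of a saturating 8-bit addition as the media operation are all hypothetical and chosen for illustration.

```python
MAX_VECTOR_LENGTH = 64  # assumed vector register length of a conventional machine


def saturating_add_sectioned(a, b, max_vl=MAX_VECTOR_LENGTH):
    """Conventional style: software cuts the stream into register-sized sections."""
    out = []
    sections = 0
    for start in range(0, len(a), max_vl):
        # Each iteration models one "set length, load, add, store" instruction group.
        vl = min(max_vl, len(a) - start)
        out.extend(min(255, a[start + i] + b[start + i]) for i in range(vl))
        sections += 1
    return out, sections


def saturating_add_streamed(a, b):
    """Stream style: one operation covers the whole stream, whatever its length."""
    return [min(255, x + y) for x, y in zip(a, b)]


if __name__ == "__main__":
    a = list(range(200))
    b = [100] * 200
    sectioned, sections = saturating_add_sectioned(a, b)
    print(sections, sectioned == saturating_add_streamed(a, b))
```

In this model a 200-element stream needs four sections of at most 64 elements, each with its own loop overhead, while the streamed version expresses the same computation as one operation.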
Sparse Matrix Architectures.
Vector processors are known for performing well on large amounts of regular data.
However, when operating on
sparse matrices such as the one depicted here, the irregular structure
degrades performance. The main reasons are the need
for expensive indexed memory accesses and the high vector startup overhead
caused by short vectors. Moreover, the need for positional information when storing
sparse matrices implies extra storage overhead.
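These costs can be seen in a conventional sparse format. The sketch below uses compressed sparse row (CSR) storage, a widely used format that is chosen here only as an illustration and is not named in the text above; the function names are hypothetical.

```python
def csr_from_dense(dense):
    """Build CSR arrays (values, column indices, row pointers) from a dense matrix."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)  # positional information: one index per non-zero
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A * x for a matrix A stored in CSR form."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0
        # Each row is a separate, often very short, vector operation.
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]  # indexed (gather) access into x
        y.append(acc)
    return y


if __name__ == "__main__":
    dense = [[4, 0, 0, 0],
             [0, 0, 3, 0],
             [1, 0, 0, 2],
             [0, 0, 0, 0]]
    print(csr_matvec(*csr_from_dense(dense), [1, 2, 3, 4]))
```

The `col_idx` array is the storage overhead for positional information, the `x[col_idx[k]]` lookup is the indexed memory access, and the inner loop length per row is the short vector length that makes startup overhead significant.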
The aim of this project is to alleviate most of
the aforementioned problems and increase the efficiency of vector processors
on sparse operations. This is achieved by introducing a new block-based
sparse matrix format. In conjunction with a hardware vector ISA
extension and specialized hardware for sparse matrix computations,
this format alleviates the need for indexed memory accesses. Speedups of 4 to 5 times
have been obtained for matrix-vector multiplication, an important kernel in
sparse matrix processing.
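The general idea behind block-based storage can be sketched in software. The layout below is a generic illustration, not the project's actual format, whose details are not given here: the block size `s`, the dictionary layout, and the function names are all assumptions.

```python
def to_blocks(dense, s):
    """Split a matrix into s x s blocks and keep only the non-empty ones.

    Each non-zero is stored with its position relative to its block, so the
    per-element positional fields only need to address s * s locations.
    """
    blocks = {}
    for i, row in enumerate(dense):
        for j, v in enumerate(row):
            if v != 0:
                blocks.setdefault((i // s, j // s), []).append((i % s, j % s, v))
    return blocks


def block_matvec(blocks, s, n_rows, x):
    """Compute y = A * x, processing one block at a time."""
    y = [0] * n_rows
    for (bi, bj), entries in blocks.items():
        x_seg = x[bj * s:(bj + 1) * s]  # one contiguous load of x per block
        for li, lj, v in entries:
            y[bi * s + li] += v * x_seg[lj]  # positions are local to the block
    return y


if __name__ == "__main__":
    dense = [[4, 0, 0, 0],
             [0, 0, 3, 0],
             [1, 0, 0, 2],
             [0, 0, 0, 0]]
    print(block_matvec(to_blocks(dense, 2), 2, 4, [1, 2, 3, 4]))
```

In such a scheme the segment of `x` needed by a block is fetched with a single contiguous access, and the remaining position lookups stay within a small, fixed-size block, which is the kind of work that dedicated hardware could plausibly handle without general indexed memory accesses.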
Delft Sparse Architectures Benchmark.
The Delft Sparse Architecture Benchmark (D-SAB) has been developed
in the Computer Engineering Laboratory as part of the Δ-ILIAD
project. Its purpose is the evaluation of novel architectures and
techniques for processing sparse matrices. The benchmark comprises
two parts: the benchmark operations and the benchmark matrices.
Polymorphic processors
Current processor architectures force a
complete separation of tasks between implementations
(hardware architectures), which interpret an architecture, and
the programmer targeting this architecture.
Polymorphic processors eliminate the gap between
the (hardware) implementation and the programmer of the hardware.
This is achieved using a new programming paradigm and
emulation on reconfigurable hardware.
Delft Linpack
The TOP500 Supercomputer Sites webpage (http://www.top500.org/) lists the world's
fastest high-performance computers. The LINPACK Benchmark is used as the yardstick for performance.
Companies like IBM, HP, NEC and Intel (ASCI Red) are represented there with their top supercomputers.
The interesting question is: can a university student team with inexpensive off-the-shelf
hardware components beat the industry supercomputers on LINPACK? The DelftLinpack-1 processor
uses the power of reconfigurable hardware to attempt to answer this question.
A state-of-the-art Xilinx FPGA (XC2VP50) incorporating reconfigurable logic and four PowerPC
general-purpose cores will be used to implement the DelftLinpack-1 machine.
|