Below we list the MSc thesis topics currently available
within the Arachne project. If you are interested in doing a PhD with us, please refer to the Admissions page.
If you have another MSc thesis topic that relates
to Arachne and is not listed below, feel free to contact J.S.S.M.Wong@ewi.tudelft.nl.
- Minimizing resource utilization of the memcpy hardware
- Cost-Performance Trade-Offs and Implementation of a High-Performance Partially Buffered Crossbar Switch
- Design and Implementation of a High-Performance Buffered Crossbar Switch Fabric Using Network on Chip
- Image Spam
- Network Processing Investigation
- Fault-Tolerant Network Processing
- Designing TCP/IP Functions in FPGA
- Incorporating Video-conferencing functionality in the TCP/IP stack
- Building a Secure Video-based Internet Telephony Solution in FPGA
- The PINE project (Packet Inspection Engine)
- Minimizing resource utilization of the memcpy hardware
Context: Since the 1980s, several authors have identified memory copies as a time-consuming part of the TCP/IP stack and of the operating system. We therefore presented a memcpy hardware unit that speeds up memory copies, in particular those performed by the memcpy function. The hardware was described in VHDL, its behavioral simulation was performed with ModelSim, and it was synthesized with Xilinx XST. However, although the hardware can reach speedups of up to 7x, the amount of hardware resources it uses in that work is not negligible.
Problem statement: We believe that a cleaner and more optimized implementation of this hardware will reduce the hardware resource utilization reported in the papers, while keeping the same performance.
Expected effort: The following tasks should be performed:
- Understand the memcpy hardware concept;
- Understand the VHDL code, the prototyping platform and its limitations, and the performance requirements of the memcpy hardware;
- Optimize the existing VHDL code to minimize the resource utilization while keeping the required performance;
- Perform an exhaustive test of the memcpy hardware.
If time permits, the student should also port the Linux operating system to the prototyping platform with the memcpy hardware.
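The exhaustive test can be organized around a software golden model. The Python sketch below is only an illustration under assumptions: it models a hypothetical copy engine with an assumed 4-byte bus (`WORD`) that copies single bytes until the destination is aligned, then whole words, then the remaining bytes, and it checks every source/destination alignment and every length up to a bound against a plain byte copy. The names `word_copy` and `exhaustive_test` are hypothetical and do not describe the existing VHDL design.

```python
# Hypothetical behavioral model of a word-based copy engine.
WORD = 4  # assumed bytes per bus transfer


def word_copy(mem: bytearray, dst: int, src: int, n: int) -> int:
    """Copy n bytes from src to dst inside mem and return the number of bus transfers.

    Like C memcpy, the source and destination regions must not overlap.
    """
    transfers = 0
    # Head: single bytes until the destination is word-aligned.
    while n > 0 and dst % WORD != 0:
        mem[dst] = mem[src]
        dst, src, n, transfers = dst + 1, src + 1, n - 1, transfers + 1
    # Body: whole words (real hardware would also realign an unaligned source here).
    while n >= WORD:
        mem[dst:dst + WORD] = mem[src:src + WORD]
        dst, src, n, transfers = dst + WORD, src + WORD, n - WORD, transfers + 1
    # Tail: the remaining bytes.
    while n > 0:
        mem[dst] = mem[src]
        dst, src, n, transfers = dst + 1, src + 1, n - 1, transfers + 1
    return transfers


def exhaustive_test(size: int = 64, max_len: int = 20) -> int:
    """Compare word_copy with a slice copy for all alignments and lengths; return the case count."""
    cases = 0
    for src in range(2 * WORD):                               # source offsets spanning two words
        for dst in range(size // 2, size // 2 + 2 * WORD):    # destination offsets in the upper half
            for n in range(max_len + 1):
                mem = bytearray(range(size))
                expected = bytearray(mem)
                expected[dst:dst + n] = mem[src:src + n]
                word_copy(mem, dst, src, n)
                assert mem == expected, (src, dst, n)
                cases += 1
    return cases


print(exhaustive_test())  # 1344
```

The same stimulus loops could also drive a ModelSim testbench, with the model supplying the expected memory contents.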
Contact person: Stephan Wong and Filipa Duarte
phones: +31 15 2781099 and +31 15 2785021
emails: J.S.S.M.Wong@ewi.tudelft.nl and F.Duarte@ce.et.tudelft.nl
- Cost-Performance Trade-Offs and Implementation of a High-Performance Partially Buffered Crossbar Switch
Context: Numerous architectures for high-performance switches (high-speed IP routers and ATM switches) have been proposed, investigated and implemented in both academia and industry. These architectures can be classified based on various attributes, such as queuing schemes, scheduling algorithms, and/or switch fabric topology. The crossbar-based architecture is the dominant architecture for today's high-performance packet switches (IP routers, ATM switches, Ethernet switches) for at least three reasons. First, crossbars are more scalable than their direct competitors, shared-bus and shared-memory architectures, which are limited by the bus transfer bandwidth and the memory access bandwidth, respectively. Second, they provide simple point-to-point connections, which means they can operate at very high speeds (up to 10 Gb/s). Third, they can support multiple I/O transactions simultaneously, which can increase the aggregate bandwidth of the system to hundreds of Gbps. There are two main variants of the crossbar fabric: unbuffered and internally buffered. Unbuffered crossbar switches have the advantage of using no internal buffers; however, they require a complex scheduler to resolve contention at the input and output ports. Internally buffered crossbar switches, on the other hand, overcome this scheduling complexity by means of distributed schedulers, but they require expensive internal buffers, one per crosspoint. Partially Buffered Crossbars (PBCs), in which there is a small number of internal buffers (B) per output, have recently been proposed as a promising alternative. PBCs have the potential to achieve the performance of fully buffered crossbars at a cost comparable to that of unbuffered crossbars.
Problem statement: For a given switching system with N ports, find the cost-performance trade-off. Performance is defined in terms of packet delay and throughput. Cost is defined as the physical cost of the switch, i.e., the length and number of physical wires on the chip (ASIC or FPGA) and the size and number of internal buffers. This cost definition has not yet been used in the literature. We wish to compare the cost-performance trade-off curves of unbuffered and buffered switches.
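The buffer part of this cost definition can be made concrete by simple counting: a fully buffered N x N crossbar has one buffer per crosspoint (N^2 buffers), while a PBC with B buffers per output has N*B. The sketch below counts only buffers and buffer memory. Wire length and count, which are also part of the cost definition, are not modeled, and the buffer depth and cell size are assumed values chosen for illustration.

```python
def internal_buffers(n_ports: int, arch: str, b: int = 0) -> int:
    """Number of internal buffers in an N x N crossbar (wires are not modeled)."""
    if arch == "unbuffered":
        return 0
    if arch == "fully_buffered":
        return n_ports * n_ports          # one buffer per crosspoint
    if arch == "partially_buffered":
        if not 0 < b <= n_ports:
            raise ValueError("B must be between 1 and N")
        return n_ports * b                # B buffers per output
    raise ValueError(f"unknown architecture: {arch}")


def buffer_bytes(n_ports: int, arch: str, b: int = 0,
                 depth_cells: int = 1, cell_bytes: int = 64) -> int:
    """Total internal buffer memory, assuming each buffer holds depth_cells fixed-size cells."""
    return internal_buffers(n_ports, arch, b) * depth_cells * cell_bytes


for arch, b in [("unbuffered", 0), ("fully_buffered", 0), ("partially_buffered", 4)]:
    print(arch, internal_buffers(32, arch, b), buffer_bytes(32, arch, b))
# unbuffered 0 0
# fully_buffered 1024 65536
# partially_buffered 128 8192
```

The quadratic versus linear growth in N is what makes PBCs attractive on the cost axis; the performance axis (delay, throughput) has to come from simulation.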
Expected effort: The student is expected to carry out a literature study on switch design and implementation. Then, she/he is expected to perform a performance analysis of the Partially Buffered Crossbar switching architecture, both at the abstract level and at the hardware level (ASIC and/or FPGA implementation). Finally, the student is expected to present a comprehensive study of her/his findings with respect to the trade-offs of the PBC architecture. This is a new research direction, and we expect one of the results of this internship to be a research paper published at a conference.
Location: This internship is at the University of Delft. However, we expect interactions with NXP Semiconductors Research (formerly Philips Research) for NOC simulation and input on ASIC cost-performance measures.
Contact person: Lotfi Mhamdi, Prof. Kees Goossens
phone: +31 15 2789656, e-mail: lotfi[at]ce.et.tudelft.nl; k.goossens[at]ewi.tudelft.nl.
- Design and Implementation of a High-Performance Buffered Crossbar Switch Fabric Using Network on Chip
Context: The crossbar fabric is widely used as the interconnect for high-performance packet switches (high-speed IP routers, ATM switches, Ethernet switches) due to its low cost and scalability. Crossbar fabric-based packet switches belong to two main categories, depending on their fabric core: bufferless and fully buffered. These two categories are the two extremes of a wide range of architectures that differ in the number and layout of buffers inside the crossbar fabric. Recently, Partially Buffered Crossbars (PBCs) have been proposed as a good alternative in packet switch design. Irrespective of whether the crossbar core is unbuffered, partially buffered or fully buffered, a scheduling algorithm is required to configure the crossbar switch matrix, i.e., to decide which input port sends to which output port by closing the corresponding (input/output) crosspoint. A packet switch contains input line cards with large buffers as well as output buffers (queues) for packet reassembly. Because these input/output buffers are large, they usually consist of DRAM memories and hence have long access times. When the fabric core is unbuffered, the limited access rate of the input/output memories implies that the scheduler can select at most one packet from each input port (one memory access) and send at most one packet to each output. This process is known as matching (or scheduling) and is often complex to implement in hardware that must run at high rates. Using a partially or fully buffered crossbar core, however, can relax this limitation and reduce the scheduling complexity by allowing multiple packets destined to the same output port to be temporarily stored in the internal buffers of the fabric. Each internal buffer is usually dedicated to packets belonging to the same input/output pair. While this keeps the design simple, it has two drawbacks. First, the internal buffers are overdimensioned with respect to the dynamics of the switch. Second, it often results in unbalanced utilization of the internal buffers.
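The matching constraint (at most one packet per input and at most one per output) can be illustrated with a small scheduler model. The sketch below is a sequential greedy maximal matching with a round-robin starting pointer; it is a simplification for illustration, not iSLIP or any other specific published algorithm, and the function name is hypothetical. Hardware schedulers have to evaluate these requests in parallel within one time slot, which is what makes them hard to run at high rates.

```python
from typing import Dict, List, Set


def round_robin_match(requests: List[Set[int]], start: int = 0) -> Dict[int, int]:
    """Greedy maximal matching for an N x N unbuffered crossbar.

    requests[i] holds the outputs for which input i has a queued packet.
    Inputs and outputs are scanned from the round-robin pointer `start`,
    so each input and each output is used at most once in the slot.
    """
    n = len(requests)
    taken_outputs: Set[int] = set()
    match: Dict[int, int] = {}
    for k in range(n):
        i = (start + k) % n
        for d in range(n):
            o = (start + d) % n
            if o in requests[i] and o not in taken_outputs:
                match[i] = o
                taken_outputs.add(o)
                break
    return match


# Inputs 0 and 1 both want output 0; the pointer decides who wins this slot.
reqs = [{0, 1}, {0}, {1}]
print(round_robin_match(reqs, start=0))  # {0: 0, 2: 1}
print(round_robin_match(reqs, start=1))  # {1: 0, 2: 1}
```

Advancing `start` every time slot is one simple way to share a contested output among inputs over time.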
Problem statement: Instead of using dedicated internal buffers for packets belonging to the same input/output (source/destination) port pair, can we use alternative designs? In particular, we envision applying the Network on Chip (NOC) paradigm to this design. We wish to study the design of buffered crossbars using a NOC, by considering every crosspoint as a router in the on-chip network of the crossbar fabric.
Expected effort: The candidate is expected to perform a literature study on switch design and implementation. Then, she/he is expected to apply the NOC paradigm in designing a buffered crossbar fabric switch. Finally, the student is expected to present a comprehensive study of her/his findings with respect to the cost-performance trade-offs of the new switch design. One of the expected outcomes of this internship is a research paper published at a conference.
Number of positions: Two positions are available. One will focus on the design and the other on the simulation environment.
Location: This internship is at the University of Delft. However, we expect interactions with NXP Semiconductors Research (formerly Philips Research) for NOC simulation and input on ASIC cost-performance measures.
Contact person: Lotfi Mhamdi, Prof. Kees Goossens
phone: +31 15 2789656, e-mail: lotfi[at]ce.et.tudelft.nl; k.goossens[at]ewi.tudelft.nl.
- Image Spam
Context: Image spam is unwanted e-mail in which text is embedded in an image to
fool spam filters. Traditionally, spam filters catch spam by scanning
messages for keywords and by using other text-based techniques.
However, according to several vendors, approximately 25% of all unwanted
e-mail today is image-based spam. Clearly, more advanced spam filtering
techniques are needed. One solution is to use
optical character recognition (OCR) and fingerprint analysis to catch
image-based spam, but such schemes introduce a high overhead when
implemented in software.
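As one possible instance of fingerprint analysis, the sketch below computes an average hash: the grayscale image is reduced to an 8x8 grid of block means, and each bit records whether a block is brighter than the overall mean. Slightly different versions of the same spam image (for example, re-encoded or uniformly brightened copies) tend to give hashes with a small Hamming distance. This technique is our choice for illustration and is not necessarily what commercial filters use; the input is assumed to be a list of rows of 0-255 pixel values.

```python
from typing import List


def average_hash(img: List[List[int]], size: int = 8) -> int:
    """Average hash of a grayscale image given as rows of 0-255 pixel values."""
    h, w = len(img), len(img[0])
    if h < size or w < size:
        raise ValueError("image is smaller than the hash grid")
    cells = []
    for y in range(size):
        r0, r1 = y * h // size, (y + 1) * h // size
        for x in range(size):
            c0, c1 = x * w // size, (x + 1) * w // size
            block = [img[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            cells.append(sum(block) / len(block))   # mean brightness of this block
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:                              # one bit per block, row-major order
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


img = [[0] * 8 + [255] * 8 for _ in range(16)]          # dark left half, bright right half
rescaled = [[v // 2 + 10 for v in row] for row in img]  # same picture, different brightness
print(average_hash(img) == 0x0F0F0F0F0F0F0F0F)          # True
print(hamming(average_hash(img), average_hash(rescaled)))  # 0
```

The block-averaging loop is regular, data-parallel work; whether it is the actual bottleneck would have to be established by profiling before mapping it to the FPGA.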
Problem statement: This thesis explores the use of FPGAs to accelerate
the image processing required by an image-based spam filter. The
system must be profiled to find its bottleneck, a software/hardware
partitioning must be performed, and an efficient hardware acceleration
unit must be developed.
Expected effort: The student is expected, first, to study existing
(software) OCR and other image scanning techniques. For instance, it is a
good idea to examine the OCR techniques implemented in commercially
available page scanners. Then, (s)he should choose one or more software
algorithms to implement on the FPGA board. Here, hardware/software
partitioning may take place, and the student is expected to find an
optimal solution and justify his/her choices. Lastly, the overall design
must be implemented, verified and tested.
Contact person: Christos Strydis, Christoforos Kachris, Stephan Wong
phone: +31 15 2783591, contacts: http://ce.et.tudelft.nl/person.php?id=555; http://ce.et.tudelft.nl/person.php?id=581.
- Network Processing Investigation
Context: In telecommunication systems, transmission speeds are limited either
by the capacity of the physical links connecting network devices or by the performance of
the processing elements in network devices. Nowadays, with the advances in optical technology,
systems with speeds in the order of OC-48 (2.5 Gbps) and OC-192 (10 Gbps) are deployed in
the field, and OC-768 (40 Gbps) systems are a reality in trials. The capacity of physical
links has practically become a non-issue, at least for a few more years. Rather, the
bottleneck has shifted from the physical links to the processing elements in network devices,
which must now keep pace with ever-increasing line rates. This
project investigates the possibilities of speeding up network processing functions by implementing
them in specialized hardware (ASIC or FPGA) next to a general-purpose processor.
Problem statement: Which network processing functions, in particular in TCP/IP
stack processing, can be sped up, and how much speedup can be gained?
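How much overall speedup hardware acceleration can give is bounded by Amdahl's law: if a fraction f of the run time is spent in the accelerated function and that function becomes s times faster, the end-to-end speedup is 1 / ((1 - f) + f / s). The sketch below evaluates this formula; the optional `overhead` term for hardware/software communication and the example profile (40% of the time in one function, accelerated 10x) are assumptions, not measured results.

```python
def overall_speedup(fraction: float, local_speedup: float, overhead: float = 0.0) -> float:
    """Amdahl's law estimate of end-to-end speedup.

    fraction       share of the original run time spent in the accelerated function (0..1)
    local_speedup  how much faster that function runs in specialized hardware
    overhead       assumed extra time for hardware/software communication,
                   expressed as a fraction of the original run time
    """
    if not 0.0 <= fraction <= 1.0 or local_speedup <= 0 or overhead < 0:
        raise ValueError("invalid parameters")
    new_time = (1.0 - fraction) + fraction / local_speedup + overhead
    return 1.0 / new_time


# Hypothetical profile: 40% of the cycles in one function, accelerated 10x.
print(round(overall_speedup(0.4, 10), 2))                 # 1.56
print(round(overall_speedup(0.4, 10, overhead=0.05), 2))  # 1.45
```

Even an infinitely fast unit would give at most 1 / (1 - 0.4), about 1.67, for this profile, which is why the simulations should first establish how much time the candidate functions actually take.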
Expected effort: The student is expected to perform a literature
study to become familiar with the subject, and then to perform architectural
simulations to investigate the performance potential of speeding up several
functions in TCP/IP stack processing. This is done as follows. First, take
an existing TCP/IP stack processing benchmark (in ANSI C) and compile it for
the sim-outorder cycle-accurate simulator of the SimpleScalar Toolset. Then, modify
both the benchmark and the sim-outorder simulator to reflect the inclusion of specialized
hardware. Finally, collect and analyze the performance results.
Contact person: Stephan Wong
phone: +31 15 2781099, email: J.S.S.M.Wong@ewi.tudelft.nl.
- Fault-Tolerant Network Processing
Context: In telecommunication systems, transmission speeds are limited either
by the capacity of the physical links connecting network devices or by the performance
of the processing elements in the network devices. Considering that current networks operate at
more than 10 Gbit/s, the capacity of physical links has become a non-issue. On the other hand, the
transmission speeds are often hampered by the limited protocol processing performance of
the network devices.
Problem statement: The purpose of this MSc project is to increase the
performance of such network processing. Before describing the actual project, we
briefly describe a typical protocol stack, namely the TCP/IP stack, consisting
of the following layers: application layer, presentation layer, session layer, TCP layer,
IP layer, MAC layer, physical layer. We are particularly interested in the processing in
the TCP, IP, MAC, and physical layers. Furthermore, we are interested in
improving the performance of the TCP/IP layers in both the transmitting and receiving
directions. The MSc project must place special focus on the fault handling and exception
handling in these layers. The protocol processing should preferably be executed in
parallel with other processes, e.g., those needed in higher network layers.
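One way to approach fault and exception handling in the IP layer is to let a hardware fast path process well-formed packets and divert anything exceptional to a slower path (for example, the general-purpose processor). The sketch below models only the receive-side classification of an IPv4 header; the status names are hypothetical, the TTL check assumes a forwarding device, and header checksum verification is omitted for brevity.

```python
from enum import Enum


class RxStatus(Enum):
    OK = 0
    TRUNCATED = 1     # shorter than a minimal 20-byte IPv4 header
    BAD_VERSION = 2
    BAD_IHL = 3       # header length field out of range
    BAD_LENGTH = 4    # total-length field inconsistent with the received bytes
    TTL_EXPIRED = 5   # cannot be forwarded any further


def classify_ipv4(pkt: bytes) -> RxStatus:
    """Fast-path sanity checks on a received IPv4 packet (checksum check omitted)."""
    if len(pkt) < 20:
        return RxStatus.TRUNCATED
    version, ihl = pkt[0] >> 4, pkt[0] & 0x0F
    if version != 4:
        return RxStatus.BAD_VERSION
    if ihl < 5 or ihl * 4 > len(pkt):
        return RxStatus.BAD_IHL
    total_length = int.from_bytes(pkt[2:4], "big")
    if total_length < ihl * 4 or total_length > len(pkt):
        return RxStatus.BAD_LENGTH
    if pkt[8] <= 1:   # TTL would reach 0 when decremented by a forwarding device
        return RxStatus.TTL_EXPIRED
    return RxStatus.OK


# Minimal 20-byte header: version 4, IHL 5, total length 20, TTL 64, protocol TCP.
hdr = bytes([0x45, 0, 0, 20, 0, 0, 0, 0, 64, 6, 0, 0, 10, 0, 0, 1, 10, 0, 0, 2])
print(classify_ipv4(hdr).name)  # OK
```

Returning a status code instead of raising keeps the common case a straight-line check, which maps naturally onto a pipelined hardware stage with an exception output.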
Expected effort: The student is required to perform the following tasks. First, the student must perform
a literature study to become familiar with the TCP/IP stack and existing
network processors. Such a study will determine the requirements, such as
transfer delays and throughput, that are posed by modern applications, e.g., image
processing and transfer applications. Furthermore, the operations/functions that need
to be sped up must be identified. Second, a hardware design must be implemented
incorporating the identified operations/functions (or at least the most essential
ones). An additional requirement is that the hardware design
must be (micro)programmable in order to facilitate future modifications to the
design. Third, a performance evaluation (simulation or real-life measurements) must be
performed in order to determine whether the implemented operations/functions meet the
requirements identified in the first study.
Contact person: Stephan Wong
phone: +31 15 2781099, email: J.S.S.M.Wong@ewi.tudelft.nl.
- Designing TCP/IP Functions in FPGA
Context: In telecommunication systems, transmission speeds are limited either
by the capacity of the physical links connecting network devices or by the performance
of the processing elements in network devices. Nowadays, with the advances in optical
technology, systems with speeds in the order of OC-48 (2.5 Gbps) and OC-192 (10 Gbps)
are deployed in the field, and OC-768 (40 Gbps) systems are a reality in trials.
The capacity of physical links has practically become a non-issue, at least for a few more
years. Rather, the bottleneck has shifted from the physical links to the processing elements
in network devices, which must now keep pace with ever-increasing line rates. This
project investigates the possibilities of speeding up functions
found in network processing by implementing them in specialized hardware (ASIC or FPGA)
next to a general-purpose processor.
Problem statement: Design implementations of identified network
processing functions, e.g., in TCP/IP stack processing, and determine the best FPGA
implementation platform in terms of performance and area.
Expected effort: The student is expected to perform a literature
study to become familiar with the subject. Then, the network processing
function(s) to be designed and implemented in FPGA must be identified (by the student or in
cooperation with the mentor). The work entails designing the identified function(s)
in VHDL and mapping them onto specific FPGA platforms. Using synthesis software,
performance and area metrics can be determined. If the opportunity presents itself,
the design may even be implemented in actual hardware.
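The Internet checksum (RFC 1071), used in the IPv4 header and in TCP and UDP, is a typical example of a simple, frequently executed function that lends itself to an FPGA implementation. A bit-accurate software reference model such as the sketch below could serve as a golden model when verifying a VHDL design; it is an illustration, not a description of an existing implementation.

```python
def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit big-endian words, complemented (RFC 1071)."""
    if len(data) % 2:
        data = data + b"\x00"     # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:            # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# IPv4 header with the checksum field (bytes 10-11) set to zero.
hdr = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
print(hex(internet_checksum(hdr)))  # 0xb861
```

Because one's-complement addition is associative and commutative, a hardware version can add the 16-bit words in a parallel adder tree and fold the carries only at the end.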
Contact person: Stephan Wong
phone: +31 15 2781099, email: J.S.S.M.Wong@ewi.tudelft.nl.
- Incorporating Video-conferencing functionality in the TCP/IP stack
Context: In telecommunication systems, transmission speeds are limited either
by the capacity of the physical links connecting network devices or by the performance
of the processing elements in network devices. Nowadays, with the advances in optical
technology, systems with speeds in the order of OC-48 (2.5 Gbps) and OC-192 (10 Gbps)
are deployed in the field, and OC-768 (40 Gbps) systems are a reality in trials.
The capacity of physical links has practically become a non-issue, at least for a few more
years. Rather, the bottleneck has shifted from the physical links to the processing elements
in network devices, which must now keep pace with ever-increasing line rates. This
project investigates the possibilities of speeding up functions
found in network processing by implementing them in specialized hardware (ASIC or FPGA)
next to a general-purpose processor.
Problem statement: How can the existing TCP/IP stack be modified to support
video-conferencing?
Expected effort: The student is expected to perform a literature
study to become familiar with the subject. Then, the student has to determine
what needs to be changed in the TCP/IP stack in order to support video-conferencing.
Subsequently, the identified modifications must be implemented in an existing TCP/IP
stack processing benchmark (in ANSI C).
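Interactive video is commonly carried with RTP (RFC 3550) over UDP rather than over TCP, since retransmitting a late video frame is rarely useful. Adding RTP framing is therefore one candidate modification. The sketch below only shows the 12-byte fixed RTP header layout; it is written in Python for brevity although the benchmark itself is in ANSI C, and the chosen values (dynamic payload type 96, the timestamp, the SSRC) are arbitrary examples.

```python
import struct


def rtp_header(seq: int, timestamp: int, ssrc: int,
               payload_type: int, marker: bool = False) -> bytes:
    """Pack the 12-byte fixed RTP header (version 2, no padding, no extension, no CSRCs)."""
    if not 0 <= payload_type <= 127:
        raise ValueError("payload type is a 7-bit field")
    byte0 = 2 << 6                               # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | payload_type    # M bit followed by the payload type
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


hdr = rtp_header(seq=1, timestamp=90000, ssrc=0x12345678, payload_type=96, marker=True)
print(len(hdr), hdr[:2].hex())  # 12 80e0
```

The sequence number and timestamp let the receiver detect loss and schedule playout, which is the kind of functionality plain TCP byte streams do not expose to a real-time application.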
Contact person: Stephan Wong
phone: +31 15 2781099, email: J.S.S.M.Wong@ewi.tudelft.nl.
- Building a Secure Video-based Internet Telephony Solution in FPGA
Context: Currently, we are working on a complete Internet-based telephony
solution that incorporates elements from Voice over IP, videoconferencing, video encoding,
audio encoding, etc. This solution will be completely mapped onto an FPGA platform, allowing
greater flexibility to change the functionality of the envisioned solution and increased
performance due to the use of specialized hardware. Furthermore, the platform incorporates
a general-purpose processor to handle less performance-critical operations and other control
operations. This project intends to add new (software and hardware) elements to this
solution to extend its capabilities (increased bandwidth, higher-resolution video, better
sound quality) and to further improve the security of the transmitted data. Security
itself is becoming increasingly important as data are transported over the inherently
insecure Internet.
Problem statement: What elements can be identified to further increase the
usability of the envisioned telephony solution and how can they be implemented in the
existing FPGA platform that combines both software and hardware elements?
Expected effort: The student is expected to perform a literature
study to become familiar with the subject. Then, the student has to clearly
specify which functionalities need to be added. Subsequently, these new elements must be
implemented in both software and hardware to be executed on the existing platform.
Contact person: Stephan Wong
phone: +31 15 2781099, email: J.S.S.M.Wong@ewi.tudelft.nl.
- The PINE project (Packet Inspection Engine)
Context: The proliferation of Internet and networking applications,
coupled with the widespread availability of system hacks and viruses, has increased
the need for network security. The most efficient way to provide sufficient protection
against attacks is called "deep packet inspection" and is performed by intrusion
detection/prevention systems (IDS/IPS). Such systems check the packet header, rely on
pattern matching techniques to analyze the packet payload, and make decisions on the
significance of the packet body. Based on a rule database, Intrusion Detection Systems
monitor network traffic and detect intrusion events. An example of an IDS rule is:
alert tcp any any -> 192.168.1.0/24 111 (content: "idc|3a3a|"; msg: "mountd access";)
A rule contains fields that can specify a suspicious packet's protocol, IP address,
port, content (static patterns or regular expressions) and others. The envisioned
FPGA-based Intrusion Detection System (IDS) consists of three parts:
1. Prefiltering: includes header matching and partial pattern matching. Its role is to
select, for each incoming packet, the small subset of IDS rules that can possibly
match it.
2. Specialized Processors: Based on the prefiltering results, a set of identical specialized
processors (specifically designed for matching IDS rules) is used to match the selected set
of rules. Each processor runs a small routine written for a specific rule. These processors
have an interface to a set of coprocessors that match regular expressions and static patterns.
3. Coprocessors: these are of two kinds, regular expression comparators and static pattern
comparators. The outputs of these coprocessors provide input to the specialized processors.
The IDS ruleset is updated frequently (roughly monthly); therefore, the FPGA-based IDS requires
mechanisms for fast (automatic) updates (regeneration of the design and reconfiguration).
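In the example rule, the content field mixes literal text with hexadecimal bytes enclosed in pipes: "idc|3a3a|" denotes the characters idc followed by two 0x3a bytes (the ASCII colon). The sketch below decodes such a string into the byte pattern a static-pattern comparator would search for, and uses a plain software substring search as a stand-in for the comparator. It ignores escaped characters and content modifiers such as nocase, offset and depth, and the function names are hypothetical.

```python
def decode_content(pattern: str) -> bytes:
    """Decode a Snort-style content string: text outside |...| is literal ASCII,
    text between pipes is hexadecimal bytes (spaces allowed)."""
    parts = pattern.split("|")
    if len(parts) % 2 == 0:
        raise ValueError("unbalanced '|' in content string")
    out = bytearray()
    for idx, part in enumerate(parts):
        # Even-indexed parts are literal text, odd-indexed parts are hex bytes.
        out += bytes.fromhex(part) if idx % 2 else part.encode("ascii")
    return bytes(out)


def static_match(payload: bytes, pattern: str) -> bool:
    """Software stand-in for one static-pattern comparator."""
    return decode_content(pattern) in payload


print(decode_content("idc|3a3a|"))                # b'idc::'
print(static_match(b"...idc::...", "idc|3a3a|"))  # True
```

In the FPGA, the decoded byte strings are what an automatic generator would turn into comparator logic whenever the ruleset changes.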
Problem statement (Student 1): How can an efficient header matching
module be designed? Investigate which parts of the rules should be included in the prefiltering.
Expected effort (Student 1): The student is expected to perform a literature
study on header matching/packet classification, then analyze the Snort IDS
ruleset and determine the prefiltering strategy (apart from the header, a small part of the packet
payload is expected to be scanned). Subsequently, the student should design and implement (in VHDL, including
synthesis, placement and routing using Xilinx tools) the header matching module and integrate it
with the payload scanner. The prefiltering design must then be tested using benchmarks
suitable for IDS, and it must be possible to update the prefiltering design automatically
after any ruleset modification. Finally, the student should obtain performance and cost results for the design.
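The header part of prefiltering can be illustrated by matching a packet's protocol, destination address and destination port against the rule headers, keeping only the rules that can still match. The sketch below is a software model under simplifying assumptions: it ignores the source address/port fields (which are "any any" in the example rule), port ranges, negation and variables, and the `RuleHeader` type is hypothetical. A hardware version would evaluate all rule headers in parallel rather than in a loop.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class RuleHeader:
    proto: str                  # "tcp", "udp", "icmp", ...
    dst_net: str                # CIDR prefix or "any"
    dst_port: Union[int, str]   # port number or "any"


def header_matches(rule: RuleHeader, proto: str, dst_ip: str, dst_port: int) -> bool:
    """True if the packet's protocol, destination address and port fit the rule header."""
    if rule.proto != proto:
        return False
    if rule.dst_net != "any" and (
            ipaddress.ip_address(dst_ip) not in ipaddress.ip_network(rule.dst_net)):
        return False
    return rule.dst_port == "any" or rule.dst_port == dst_port


def prefilter(rules: List[RuleHeader], proto: str, dst_ip: str, dst_port: int) -> List[int]:
    """Indices of the rules whose header can match this packet."""
    return [i for i, r in enumerate(rules) if header_matches(r, proto, dst_ip, dst_port)]


rules = [RuleHeader("tcp", "192.168.1.0/24", 111),   # header of the mountd example rule
         RuleHeader("udp", "any", "any"),
         RuleHeader("tcp", "any", 80)]
print(prefilter(rules, "tcp", "192.168.1.7", 111))   # [0]
print(prefilter(rules, "udp", "10.0.0.1", 53))       # [1]
```

Only the rules returned by the prefilter would then be handed to the specialized processors for payload matching.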
Problem statement (Student 2): Design a low-area, high-speed specialized
processor with an application-specific instruction set and an efficient interface to the coprocessors.
Expected effort (Student 2): The student is expected to design (including
the ISA), implement (in VHDL, including synthesis and P&R using Xilinx tools) and test
a specialized processor for IDS rule matching. The processor should be able to quickly load
the routine for a specific rule from a centralized memory. Furthermore, it will have
specialized instructions suitable for IDS rule matching and an interface to a set of
coprocessors. The two major goals in designing this processor are (i) to minimize its area
cost and (ii) to maximize its performance. An instruction sequence (a small program of a few tens
of instructions) should be generated (automatically) for each rule. Finally, the performance of
the processor should be evaluated.
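The idea of one small routine per rule can be illustrated with a toy interpreter. The instruction set below (PAT, RE, BF, ALERT, END) is entirely hypothetical and only shows how a rule that requires all of its patterns and regular expressions to match might be compiled into a short program; designing the real ISA is the subject of this thesis. The coprocessor results are passed in as a dictionary, standing in for the coprocessor interface.

```python
from typing import Dict, List, Tuple

Instr = Tuple[str, int]


def compile_and_rule(pattern_ids: List[int], regex_ids: List[int]) -> List[Instr]:
    """Emit a routine that raises an alert only if every listed coprocessor reports a match."""
    checks = [("PAT", k) for k in pattern_ids] + [("RE", k) for k in regex_ids]
    end = 2 * len(checks) + 1             # index of the final END instruction
    program: List[Instr] = []
    for check in checks:
        program += [check, ("BF", end)]   # on a failed check, branch to END
    program += [("ALERT", 0), ("END", 0)]
    return program


def run_rule(program: List[Instr], results: Dict[Tuple[str, int], bool]) -> bool:
    """Interpret a routine; `results` stands in for the coprocessor outputs."""
    pc, flag = 0, False
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        if op in ("PAT", "RE"):
            flag = results[(op, arg)]     # read one coprocessor result
        elif op == "BF":
            if not flag:
                pc = arg                  # branch if the last check failed
        elif op == "ALERT":
            return True
        elif op == "END":
            return False
        else:
            raise ValueError(f"unknown opcode {op}")
    return False


prog = compile_and_rule(pattern_ids=[0, 1], regex_ids=[0])
print(len(prog))  # 8
print(run_rule(prog, {("PAT", 0): True, ("PAT", 1): True, ("RE", 0): True}))   # True
print(run_rule(prog, {("PAT", 0): True, ("PAT", 1): False, ("RE", 0): True}))  # False
```

A failed check branches straight to END, so the remaining coprocessor results are never read; in hardware this early exit frees the processor for the next rule.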
Problem statement (Student 3): Efficient design of a hardwired unit for
matching regular expressions, and design of an optimal interface between the coprocessors and
the specialized processors.
Expected effort (Student 3): The student is expected to perform a
literature study on regular expressions and finite automata theory, especially their
implementation in hardware, and then to find an efficient, high-speed method to design
and implement (in VHDL, including synthesis and P&R using Xilinx tools) regular expression
comparators in hardware. Automatic VHDL generation for a specific set of regular expressions
is also part of this thesis. Furthermore, the student is expected to integrate the regular
expression and static pattern matchers behind a common interface to
the processors, and to find an efficient way to pass the results of the
coprocessors to the processors. Finally, the coprocessors alone and
their interface to the processors should be evaluated in terms of cost (area) and
performance.
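Regular expressions are often matched in hardware by implementing a finite automaton directly in logic. As a small illustration, the sketch below runs a hand-built DFA that searches a stream for the regular expression ab+c and reports the position where each match ends. In the thesis, such automata (or their VHDL equivalents) would be generated automatically from the ruleset; here the table is written by hand, and the example regular expression is our own.

```python
from typing import Dict, List

# DFA for an unanchored search of the regular expression ab+c, built by hand.
# States: 0 = nothing useful seen, 1 = "a", 2 = "ab+", 3 = "ab+c" (accepting).
TRANSITIONS: Dict[int, Dict[str, int]] = {
    0: {"a": 1},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 2, "c": 3},
    3: {"a": 1},
}
ACCEPTING = {3}


def scan(data: str) -> List[int]:
    """Return the index of the last character of every match of ab+c in data."""
    state = 0
    hits = []
    for i, ch in enumerate(data):
        state = TRANSITIONS[state].get(ch, 0)   # any other symbol returns to the start state
        if state in ACCEPTING:
            hits.append(i)
    return hits


print(scan("xxabbbc-abc-ac"))  # [6, 10]
```

Because the automaton consumes one character per step with a fixed table lookup, it maps directly onto a state register plus next-state logic that processes one input byte per clock cycle.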
Contact persons: Yiannis Sourdis / Georgi Gaydadjiev
phone: +31 15 2789656/+31 15 2786168,
email: sourdis AT ce.et.tudelft.nl /
g.n.gaydadjiev AT ewi.tudelft.nl