
Experimental Results
We studied the performance of superscalar processors enhanced with CSI on
the following media benchmarks, which represent different application domains:
JPEG and MPEG-2 encoders/decoders (image and video coding/decoding subdomain),
image-processing kernels from Sun's VIS developer kit (2-D image processing
subdomain), and SPEC's viewperf (3-D graphics subdomain). The performance of
the CSI-enhanced superscalar processors was compared to that of the same
processors extended with Sun's VIS or Intel's SSE media extensions.
These studies were presented in the following papers:
Below, we briefly review some of these results.
The main goal of CSI is to reduce the number of executed instructions.
For several media kernels, Figure 8 depicts the ratio of the dynamic instruction
count of a 4-way issue superscalar CPU enhanced with VIS to the dynamic
instruction count of the same processor enhanced with the CSI execution hardware.
Figure 8: Instruction count reductions (times), CSI w.r.t. VIS
As expected, CSI provides significant reductions in the instruction counts
of the kernels, ranging from 6.34 times to 16.07 times, with an average
reduction of 12.14 times.
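To make the "times" figures concrete, the ratio plotted in Figure 8 can be written as follows. The symbols N_VIS and N_CSI (dynamic instruction counts of the VIS-enhanced and CSI-enhanced processors on the same kernel) and R are notation introduced here for illustration; they do not appear in the original text.

```latex
R = \frac{N_{\mathrm{VIS}}}{N_{\mathrm{CSI}}},
\qquad
\text{fraction of instructions eliminated} = 1 - \frac{1}{R}
```

Under this reading, the average reduction of R = 12.14 corresponds to 1 - 1/12.14 ≈ 0.918, that is, roughly 92% fewer executed instructions than with VIS.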
For complete applications, CSI reduces the number of executed instructions
by a factor of up to 2.05 (djpeg, the JPEG decoder). These reductions in
instruction count translate into significant speedups. For example, a 4-way
issue superscalar processor with a 64-entry instruction window, enhanced with
a CSI execution unit capable of processing 32 bytes in parallel, outperforms
the same processor enhanced with VIS execution units of the same parallel
processing capability by a factor of up to 7.8 at the kernel level (the add8
kernel, which adds two images) and by a factor of up to 1.5 at the application
level (djpeg, the JPEG decoder). Additionally, we find that the performance of
CSI-enhanced processors scales much better than that of VIS-enhanced processors
when the amount of parallel processing hardware is increased.
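For readers unfamiliar with the add8 kernel mentioned above, the following is a minimal scalar sketch in C of what "adding two images" with 8-bit pixels could look like. It is not the VIS or CSI implementation used in the experiments: the function name add8_scalar, the row-stride parameter, and the choice of saturating (rather than wrapping) addition are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference for an add8-style kernel:
 * dst[y][x] = src1[y][x] + src2[y][x] for 8-bit pixels.
 * Assumption: results are clamped to 255 (saturating addition);
 * the original text does not specify the overflow behaviour.
 * 'stride' is the distance in bytes between consecutive rows. */
void add8_scalar(const uint8_t *src1, const uint8_t *src2, uint8_t *dst,
                 size_t width, size_t height, size_t stride)
{
    for (size_t y = 0; y < height; y++) {
        const uint8_t *a = src1 + y * stride;
        const uint8_t *b = src2 + y * stride;
        uint8_t *d = dst + y * stride;

        for (size_t x = 0; x < width; x++) {
            /* Widen before adding so the sum cannot wrap around. */
            unsigned sum = (unsigned)a[x] + (unsigned)b[x];
            d[x] = (uint8_t)(sum > 255u ? 255u : sum);
        }
    }
}
```

In a loop like this, each pixel costs two loads, an add, a clamp, a store, plus index updates and branches. Instruction-count comparisons such as the one above measure how much of that per-element work a media extension folds into fewer instructions.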