Literature

173 references, last updated Mon Oct 22 17:02:15 Europe/Amsterdam 2007

[von Neumann, 1945]
John von Neumann. First draft of a report on the EDVAC. Technical Report W-670-ORD-4926, Moore School of Electrical Engineering, University of Pennsylvania, June 1945. Contracted with United States Army Ordnance Department. (PDF) (DOI)
The first draft of a report on the EDVAC written by John von Neumann is presented. This first draft contains a wealth of information, and it had a pervasive influence when it was first written. Most prominently, Alan Turing cites it in his proposal for the Pilot automatic computing engine (ACE) as the definitive source for understanding the nature and design of a general-purpose digital computer
[McCabe, 1976]
Thomas J. McCabe. A complexity measure. In ICSE '76: Proceedings of the 2nd international conference on Software engineering, page 407, Los Alamitos, CA, USA, 1976. IEEE Computer Society Press. (PDF)
This paper describes a graph-theoretic complexity measure and illustrates how it can be used to manage and control program complexity. The paper first explains how the graph-theory concepts apply and gives an intuitive explanation of the graph concepts in programming terms. The control graphs of several actual Fortran programs are then presented to illustrate the correlation between intuitive complexity and the graph-theoretic complexity
[Halstead, 1977]
Maurice H. Halstead. Elements of Software Science (Operating and programming systems series). Elsevier Science Inc., New York, NY, USA, 1977.
[Backus, 1978]
John Backus. Can programming be liberated from the von Neumann style?: a functional style and its algebra of programs. Commun. ACM, 21(8):613–641, 1978. (DOI)
[Oviedo, 1980]
E. I. Oviedo. Control flow, data flow, and program complexity. In COMPSAC'80: Proceedings of the Fourth International Computer Software and Applications Conference, pages 146–152, November 1980.
[Boehm, 1981]
Barry W. Boehm. Software Engineering Economics. Prentice Hall PTR, Upper Saddle River, NJ, USA, 1981.
[Harrison and Magel, 1981]
Warren Harrison and Kenneth Magel. A topological analysis of the complexity of computer programs with less than three binary branches. SIGPLAN Not., 16(4):51–63, 1981. (DOI)
[Piwowarski, 1982]
Paul Piwowarski. A nesting level complexity measure. SIGPLAN Not., 17(9):44–50, 1982. (DOI)
[Basili and Hutchens, 1983]
Victor R. Basili and David H. Hutchens. An empirical study of a syntactic complexity family. IEEE Transactions on Software Engineering, 9(6):664–672, 1983.
[Kirkpatrick et al., 1983]
S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. Science, 220(4598):671–680, 13 May 1983.
[Elshoff, 1984]
James L. Elshoff. Characteristic program complexity measures. In ICSE '84: Proceedings of the 7th international conference on Software engineering, pages 288–293, Piscataway, NJ, USA, 1984. IEEE Press.
[Prather, 1984]
R. E. Prather. An Axiomatic Theory of Software Complexity Measure. The Computer Journal, 27(4):340–347, 1984. (DOI)
In software engineering, various 'metrics' have been introduced in an attempt to measure the complexity of programs. We show how the whole idea of a 'software complexity measure' can be axiomatized in such a way as to include the more familiar concrete examples and to allow for new measures that might offer advantages not captured by those previously introduced. In particular, a new testing measure is introduced, based on the 'multiple-condition' test strategy. Comparisons are made between this new measure and the more traditional metrics. In addition, a more general theoretical study is initiated, showing the effect of the axiomatic development in relation to the treatment of program structuredness.
[Tai, 1984]
Kuo-Chung Tai. A program complexity metric based on data flow information in control graphs. In ICSE '84: Proceedings of the 7th international conference on Software engineering, pages 239–248, Piscataway, NJ, USA, 1984. IEEE Press.
[Ejiogu, 1985]
Lem O. Ejiogu. A simple measure of software complexity. SIGMETRICS Perform. Eval. Rev., 13(1):33–47, 1985. (DOI)
[Gong and Schmidt, 1985]
Huisheng Gong and Monika Schmidt. A complexity measure based on selection and nesting. SIGMETRICS Perform. Eval. Rev., 13(1):14–19, 1985. (DOI)
[Kafura and Canning, 1985]
Dennis Kafura and James Canning. A validation of software metrics using many metrics and two resources. In ICSE '85: Proceedings of the 8th international conference on Software engineering, pages 378–385, Los Alamitos, CA, USA, 1985. IEEE Computer Society Press.
[Chen and Kwan, 1986]
T. Y. Chen and S. C. Kwan. An analysis of length equation using a dynamic approach. SIGPLAN Not., 21(4):42–47, 1986. (DOI)
[Höskuldsson, 1988]
Agnar Höskuldsson. PLS regression methods. Journal of Chemometrics, 2(3):211–228, 1988. (DOI)
[Nejmeh, 1988]
Brian A. Nejmeh. NPATH: a measure of execution path complexity and its applications. Commun. ACM, 31(2):188–200, 1988. (DOI)
[Weyuker, 1988]
E. J. Weyuker. Evaluating software complexity measures. IEEE Trans. Softw. Eng., 14(9):1357–1365, 1988. (DOI)
A set of properties of syntactic software complexity measures is proposed to serve as a basis for the evaluation of such measures. Four known complexity measures are evaluated and compared using these criteria. This formalized evaluation clarifies the strengths and weaknesses of the examined complexity measures, which include the statement count, cyclomatic number, effort measure, and data flow complexity measures. None of these measures possesses all nine properties, and several are found to fail to possess particularly fundamental properties; this failure calls into question their usefulness in measuring syntactic complexity
[Zalateu and Felician, 1989]
L. Zalateu and G. Felician. Validating Halstead's theory for Pascal programs. IEEE Transactions on Software Engineering, 15(12):1630–1632, 1989. (DOI)
M.H. Halstead's theory (1977) has been validated for different languages, but Pascal programs seem to fit only partially with the theory. D.B. Johnston and A.M. Lister (1981) first recognized the lack of operators due to the structure of this language and proposed a modification of Halstead's formula. The article confirms their results but suggests a correction to their formula, which is particularly necessary for large programs. Experimental results, obtained by examining about 550 Pascal programs, represent the widest test to date of Halstead theory with regard to Pascal programs
[Huang et al., 1990]
Chu-Yi Huang, Yen-Shen Chen, Youn-Long Lin, and Yu-Chin Hsu. Data path allocation based on bipartite weighted matching. In DAC '90: Proceedings of the 27th ACM/IEEE conference on Design automation, pages 499–504, New York, NY, USA, 1990. ACM Press. (DOI)
[Athanas and Silverman, 1991]
Peter M. Athanas and Harvey F. Silverman. An adaptive hardware machine architecture and compiler for dynamic processor reconfiguration. In ICCD '91: Proceedings of the 1991 IEEE International Conference on Computer Design on VLSI in Computer & Processors, pages 397–400, Washington, DC, USA, 1991. IEEE Computer Society.
[Fenton, 1991]
Norman E. Fenton. Software Metrics: A Rigorous Approach. Chapman & Hall, Ltd., London, UK, 1991.
[Jayaprakash et al., 1991]
K. B. Jayaprakash, S. Sinha, and P. K. Lakshmanan. Properties of control-flow complexity measures. IEEE Transactions on Software Engineering, 17(12):1289–1295, 1991. (DOI)
The authors attempt to formalize some properties which any reasonable control-flow complexity measure must satisfy. Since large programs are often built by sequencing and nesting of simpler constructs, the authors explore how control-flow complexity measures behave under such compositions. They analyze five existing control-flow complexity measures: cyclomatic number, total adjusted complexity, scope ratio, MEBOW, and NPATH. The analysis reveals the strengths and weaknesses of these control-flow complexity measures
[Smith and Cherniavsky, 1991]
J. C. Smith and C. H. Cherniavsky. On Weyuker's axioms for software complexity measures. IEEE Transactions on Software Engineering, 17(6):636–638, 1991. (DOI)
Properties for software complexity measures are discussed. It is shown that a collection of nine properties suggested by E.J. Weyuker (see ibid., vol. 14, pp. 1357-65, 1988) is inadequate for determining the quality of a software complexity measure. A complexity measure which satisfies all nine of the properties, but which has absolutely no practical utility in measuring the complexity of a program, is presented. It is concluded that satisfying all of the nine properties is a necessary, but not sufficient, condition for a good complexity measure
[Harrison, 1992]
Warren Harrison. An entropy-based measure of software complexity. IEEE Trans. Softw. Eng., 18(11):1025–1029, 1992. (DOI)
It is proposed that the complexity of a program is inversely proportional to the average information content of its operators. An empirical probability distribution of the operators occurring in a program is constructed, and the classical entropy calculation is applied. The performance of the resulting metric is assessed in the analysis of two commercial applications totaling well over 130000 lines of code. The results indicate that the new metric does a good job of associating modules with their error spans (averaging number of tokens between error occurrences)
[Micheli and Gupta, 1992]
R. K. De Micheli and G. Gupta. System-level synthesis using re-programmable components. In Design Automation, 1992. Proceedings. [3rd], pages 2–7, Brussels, 1992. (DOI)
The authors formulate the synthesis problem of complex behavioral descriptions with performance constraints as a hardware-software co-design problem. The target system architecture consists of a software component as a program running on a re-programmable processor assisted by application-specific hardware components. System synthesis is performed by first partitioning the input system description into hardware and software portions and then by implementing each of them separately. The synthesis of dedicated hardware is then achieved by means of hardware synthesis tools (D.D. Mitchell, D.C. Ku, F. Mailhot, and T. Truong, 'The Olympus Synthesis System for digital design', IEEE Design and Test Magazine, pp. 37-53, Oct. 1990), while the software component is generated using software compiling techniques. The authors consider the problem of identifying potential hardware and software components of a system described in a high-level modeling language and they present a partitioning procedure. They then describe the results of partitioning a network coprocessor
[Zuse, 1992]
Horst Zuse. Properties of software measures. Software Quality Journal, 1(4):225–260, December 1992. (DOI)
[Adams et al., 1993]
D. E. Adams, J. K. Schmit, and H. Thomas. A model and methodology for hardware-software codesign. IEEE Design & Test of Computers, 10(3):6–15, 1993. (DOI)
A behavioral model of a class of mixed hardware-software systems is presented. A codesign methodology for such systems is defined. The methodology includes hardware-software partitioning, behavioral synthesis, software compilation, and demonstration on a testbed consisting of a commercial central processing unit (CPU), field-programmable gate arrays, and programmable interconnections. Design examples that illustrate how certain characteristics of system behavior and constraints suggest hardware or software implementation are presented
[Casselman, 1993]
S. Casselman. Virtual computing and the virtual computer. In FPGA '93: Proceedings of the IEEE Workshop on FPGAs for Custom Computing Machines, pages 43–48, Reseda, CA, USA, April 1993. IEEE Computer Society. (DOI)
Virtual computing is an entirely new form of supercomputing that allows an algorithm to be implemented in hardware. Based on the Xilinx FPGA and ICube's FPID the Virtual Computer is completely reconfigurable in every respect. Computing machines based on reconfigurable logic are hyper-scalable meaning they scale up better than 1-1
[Henkel et al., 1993]
R. Henkel, J. Benner, and T. Ernst. Hardware-software cosynthesis for microcontrollers. IEEE Design & Test of Computers, 10(4):64–75, 1993. (DOI)
The authors present a software-oriented approach to hardware-software partitioning which avoids restrictions on the software semantics as well as an iterative partitioning process based on hardware extraction controlled by a cost function. This process is used in Cosyma, an experimental cosynthesis system for embedded controllers. As an example, the extraction of coprocessors for loops is demonstrated. Results are presented for several benchmark designs
[Lee and Kalavade, 1993]
A. Lee and E. A. Kalavade. A hardware-software codesign methodology for DSP applications. IEEE Design & Test of Computers, 10(3):16–28, 1993. (DOI)
The authors describe a systematic, heterogeneous design methodology using the Ptolemy framework for simulation, prototyping, and software synthesis of systems containing a mixture of hardware and software components. They focus on signal-processing systems in which the hardware typically consists of custom data paths, finite-state machines (FSMs), glue logic and programmable processors. The software is one or more embedded programs running on the programmable components
[Lok et al., 1993]
W. Lok, V. Page, and I. Luk. Hardware acceleration of divide-and-conquer paradigms: a case study. In FPGAs for Custom Computing Machines, 1993., pages 192–201, Napa, CA, 1993. (DOI)
The authors describe a method for speeding up divide-and-conquer algorithms with a hardware coprocessor, using sorting as an example. The method employs a conventional processor for the 'divide' and 'merge' phases, while the 'conquer' phase is handled by a purpose-built coprocessor. It is shown how transformation techniques from the Ruby language can be adopted in developing a family of systolic sorters, and how one of the resulting designs is prototyped in eight FPGAs on a PC coprocessor board known as CHS2×4 from Algotronix. The execution of the hardware unit is embedded in a sorting program, with the PC host merging the sorted sequences from the hardware sorter. The performance of this implementation is compared against various sorting algorithms on a number of PC systems
[Micheli and Gupta, 1993]
R. K. De Micheli and G. Gupta. Hardware-software cosynthesis for digital systems. IEEE Design & Test of Computers, 10(3):29–41, 1993. (DOI)
As system design grows increasingly complex, the use of predesigned components, such as general-purpose microprocessors, can simplify synthesized hardware. While the problems in designing systems that contain processors and application-specific integrated circuit chips are not new, computer-aided synthesis of such heterogeneous or mixed systems poses unique problems. The authors demonstrate the feasibility of synthesizing heterogeneous systems by using timing constraints to delegate tasks between hardware and software so that performance requirements can be met. System functionality is captured using the HardwareC hardware description language. The synthesis of an Ethernet-based network coprocessor is discussed as an example
[O'Neal, 1993]
Michael B. O'Neal. An empirical study of three common software complexity measures. In SAC '93: Proceedings of the 1993 ACM/SIGAPP symposium on Applied computing, pages 203–207, New York, NY, USA, 1993. ACM Press. (DOI)
[Sharma and Jain, 1993]
Alok Sharma and Rajiv Jain. Estimating architectural resources and performance for high-level synthesis applications. In DAC '93: Proceedings of the 30th international conference on Design automation, pages 355–360, New York, NY, USA, 1993. ACM Press. (DOI)
[Silverman and Athanas, 1993]
P. M. Silverman and H. F. Athanas. Processor reconfiguration through instruction-set metamorphosis. Computer, 26(3):11–18, 1993. (DOI)
The processor reconfiguration through instruction-set metamorphosis (PRISM) general-purpose architecture, which speeds up computationally intensive tasks by augmenting the core processor's functionality with new operations, is described. The PRISM approach adapts the configuration and fundamental operations of a core processing system to the computationally intensive portions of a targeted application. PRISM-1, an initial prototype system, is described, and experimental results that demonstrate the benefits of the PRISM concept are presented
[van Ierssel et al., 1993]
D. M. van Ierssel, M. H. Wong, and D. H. Lewis. A field programmable accelerator for compiled-code applications. In FPGAs for Custom Computing Machines, 1993., pages 60–67, Napa, CA, 1993. (DOI)
The paper describes a special purpose application accelerator using field programmable gate arrays to accelerate a range of applications. The accelerator is designed to support applications by allowing the user to implement a processor with an instruction set designed for the specific application being accelerated, using specialized instructions to implement critical fragments of the application. A compiled-code software organization is used to reduce overhead operations. A prototype has been built, and the first application to be ported to it, logic simulation, is underway
[DeHon, 1994]
André DeHon. DPGA-coupled microprocessors: Commodity ICs for the early 21st century. In Duncan A. Buell and Kenneth L. Pocek, editors, IEEE Workshop on FPGAs for Custom Computing Machines, pages 31–39, Los Alamitos, CA, 1994. IEEE Computer Society Press.
[Eldredge and Hutchings, 1994]
J. G. Eldredge and B. L. Hutchings. RRANN: The run-time reconfiguration artificial neural network. In Custom Integrated Circuits Conference, pages 77–80, San Diego, CA, 1994.
[Ellervee et al., 1994]
P. Ellervee, A. Jantsch, J. Öberg, A. Hemani, and H. Tenhunen. Exploring ASIC design space at system level with a neural network estimator. In ASIC '94: Proceedings of the Seventh Annual IEEE International ASIC Conference and Exhibit, pages 67–70, Campus IT University, Kista, Sweden, September 1994. (DOI)
Estimators are critical tools in carrying out architectural level exploration of the design space. We present a novel approach to estimation based on the multilayer perceptron which builds the estimation function during the learning process and thus allows the description of arbitrary complex functions. We also describe how the control data flow graph is encoded for the neural network input and present results of the first experiments made with realistic design examples
[Jantsch et al., 1994]
Axel Jantsch, Peeter Ellervee, Johny Öberg, and Ahmed Hemani. A case study on hardware/software partitioning. In FCCM'94, Proceedings of the Workshop on FPGAs for Custom Computing Machines, pages 111–118, Napa Valley, CA, 1994. IEEE Computer Society Press. (PDF) (DOI)
We present an analysis of a fully automatic method to accelerate standard software in C or C++ by use of field programmable gate arrays. Traditional compiler techniques are applied to the hardware/software partitioning problem and a compiler is linked to state-of-the-art hardware synthesis tools. Time-critical regions are identified by means of profiling and are automatically implemented in user-programmable logic with high-level and logic synthesis design tools. The underlying architecture is an add-on board with user-programmable logic connected to a SPARC-based workstation via the system bus. We present an analysis and case study of this method. Eight programs are used as test cases and the data collected by applying this method to programs is used to discuss potentials and limitations of this and similar methods. We discuss architectural parameters, programming language properties, and analysis techniques
[Kalavade and Lee, 1994]
Asawaree Kalavade and Edward A. Lee. A global criticality/local phase driven algorithm for the constrained hardware/software partitioning problem. In CODES '94: Proceedings of the 3rd international workshop on Hardware/software co-design, pages 42–48, Los Alamitos, CA, USA, 1994. IEEE Computer Society Press. (DOI)
[Peng and Kuchcinski, 1994]
Zebu Peng and Krzysztof Kuchcinski. An algorithm for partitioning of application specific systems. Technical Report R-94-01, Department of Computer and Information Science, Linköping University, Linköping, Sweden, 1994. Published in Proceedings of the European Conference on Design Automation EDAC'93, Paris, France, February 22-25, 1993. (PS)
[Vahid et al., 1994]
Frank Vahid, Daniel D. Gajski, and Jie Gong. A binary-constraint search algorithm for minimizing hardware during hardware/software partitioning. In EURO-DAC '94: Proceedings of the conference on European design automation, pages 214–219, Los Alamitos, CA, USA, 1994. IEEE Computer Society Press.
[Kalavade and Lee, 1995]
A. Kalavade and E. A. Lee. The extended partitioning problem: hardware/software mapping and implementation-bin selection. In RSP '95: Proceedings of the Sixth IEEE International Workshop on Rapid System Prototyping (RSP'95), page 12, Washington, DC, USA, 1995. IEEE Computer Society.
[Kifli et al., 1995]
A. Kifli, G. Goosens, and H. De Man. A unified scheduling model for high-level synthesis and code generation. In EDTC '95: Proceedings of the 1995 European conference on Design and Test, page 234, Washington, DC, USA, 1995. IEEE Computer Society.
[Lemoine and Merceron, 1995]
E. Lemoine and D. Merceron. Run time reconfiguration of FPGA for scanning genomic databases. In FCCM '95: Proceedings of the IEEE Symposium on FPGA's for Custom Computing Machines, page 90, Washington, DC, USA, 1995. IEEE Computer Society.
[Prather, 1995]
Ronald E. Prather. Design and analysis of hierarchical software metrics. ACM Comput. Surv., 27(4):497–518, 1995. (DOI)
[Triantafyllos et al., 1995]
George Triantafyllos, Stamatis Vassiliadis, and Walid Kobrosly. On the prediction of computer implementation faults via static error prediction models. J. Syst. Softw., 28(2):129–142, 1995. (DOI)
[Vahid, 1995]
Frank Vahid. Procedure exlining: a transformation for improved system and behavioral synthesis. In ISSS '95: Proceedings of the 8th international symposium on System synthesis, pages 84–89, New York, NY, USA, 1995. ACM Press. (DOI)
[Balboni et al., 1996]
A. Balboni, W. Fornaciari, and D. Sciuto. Partitioning and exploration strategies in the TOSCA co-design flow. In CODES '96: Proceedings of the 4th International Workshop on Hardware/Software Co-Design, page 62, Washington, DC, USA, 1996. IEEE Computer Society.
[Ball and Larus, 1996]
Thomas Ball and James R. Larus. Efficient path profiling. In MICRO 29: Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture, pages 46–57, Washington, DC, USA, 1996. IEEE Computer Society.
[dm Le Vahid, 1996]
F. Thuy dm Le Vahid. Towards a model for hardware and software functional partitioning. In Hardware/Software Co-Design, 1996. (Codes/CASHE '96), Proceedings of the Fourth International Workshop on, pages 116–123, Pittsburgh, PA, 1996. (DOI)
We describe a model that supports the functional partitioning of a system-level functional specification among hardware and software components. The model includes only the information needed by partitioning, and thus can be communicated freely and generated automatically. Based on characteristics of several real examples, we describe a technique for automatically generating generic model instances, on which partitioning heuristics can be applied and fairly compared. Such comparisons will become increasingly important as research begins to focus on fast yet effective functional partitioning techniques. We describe a set of tools for converting a specification to the model, for generating generic model instances, and for applying and comparing partitioning heuristics, available via ftp. Use of these tools may greatly reduce duplicated efforts among researchers wishing to investigate hardware/software partitioning heuristics
[Landman, 1996]
Paul Landman. High-level power estimation. In ISLPED '96: Proceedings of the 1996 international symposium on Low power electronics and design, pages 29–35, Piscataway, NJ, USA, 1996. IEEE Press.
[Suzuki and Sangiovanni-Vincentelli, 1996]
Kei Suzuki and Alberto Sangiovanni-Vincentelli. Efficient software performance estimation methods for hardware/software codesign. In DAC '96: Proceedings of the 33rd annual conference on Design automation, pages 605–610, New York, NY, USA, 1996. ACM Press. (DOI)
[Triantafyllos et al., 1996]
George Triantafyllos, Stamatis Vassiliadis, and José G. Delgado-Frias. Software metrics and microcode: a case study. Journal of Software Maintenance, 8(3):199–224, 1996. (DOI)
[Ernst and Ye, 1997]
R. Ernst and W. Ye. Embedded program timing analysis based on path clustering and architecture classification. In ICCAD '97: Proceedings of the 1997 IEEE/ACM international conference on Computer-aided design, pages 598–604, Washington, DC, USA, 1997. IEEE Computer Society. (DOI)
[Gubian et al., 1997]
W. Gubian, P. Sciuto, D. Silvano, and C. Fornaciari. System-level power evaluation metrics. In Innovative Systems in Silicon, 1997., pages 323–330, Austin, TX, 1997. (DOI)
High-level power estimation is a key issue for IC designers and system engineers. The goal is to widely explore the architectural design space and to compare alternative solutions, while maintaining an acceptable accuracy and a competitive design time. In this paper, an approach is proposed for evaluating the system-level power consumption of embedded systems implemented by using VLSI circuits. Accurate and efficient early power evaluation metrics have been defined to guide the system-level partitioning phase of a more general HW/SW co-design approach for control-dominated embedded systems. The hardware and software contributions to the power consumption at the system level have been considered as well as the contribution of the HW/SW communication
[Hammel et al., 1997]
T. Hammel, U. Schwefel, and H. P. Back. Evolutionary computation: comments on the history and current state. Evolutionary Computation, IEEE Transactions on, 1(1):3–17, 1997. (DOI)
Evolutionary computation has started to receive significant attention during the last decade, although the origins can be traced back to the late 1950's. This article surveys the history as well as the current state of this rapidly growing field. We describe the purpose, the general structure, and the working principles of different approaches, including genetic algorithms (GA) (with links to genetic programming (GP) and classifier systems (CS)), evolution strategies (ES), and evolutionary programming (EP) by analysis and comparison of their most important constituents (i.e. representations, variation operators, reproduction, and selection mechanism). Finally, we give a brief overview on the manifold of application domains, although this necessarily must remain incomplete
[Hartenstein et al., 1997a]
C. Hartenstein, R. Mencer, O. Morris, J. Palem, and Ebeling. Seeking solutions in configurable computing. Computer, 30(12):38–43, 1997. (DOI)
Configurable computing offers the potential of producing powerful new computing systems. Will current research overcome the dearth of commercial applicability to make such systems a reality? Unfortunately, no system to date has yet proven attractive or competitive enough to establish a commercial presence. We believe that ample opportunity exists for work in a broad range of areas. In particular, the configurable computing community should focus on refining the emerging architectures, producing more effective software/hardware APIs, better tools for application development that incorporate the models of hardware reconfiguration, and effective benchmarking strategies
[Hartenstein et al., 1997b]
R. Hartenstein, J. Becker, M. Herz, and U. Nageldinger. Data scheduling in hardware/software co-design for field-programmable accelerators. In FPL '97: Proceedings of 7th International Workshop on Field Programmable Logic, pages 294–303, University of Kaiserslautern, Kaiserslautern, Germany, September 1997. Springer.
[Long, 1997]
J. Scott Long. Regression Models for Categorical and Limited Dependent Variables (Advanced Quantitative Techniques in the Social Sciences). Sage Publications, March 1997.
In Regression Models for Categorical and Limited Dependent Variables, J. Scott Long provides a well-written, comprehensive introduction to statistical models for binary, ordinal, nominal, and limited dependent variables. The book would serve equally well as an addition to a social scientist's quantitative library or as a text for a graduate level statistics course. The major strength of the book is its emphasis on interpretation.
[Nemani and Najm, 1997]
Mahadevamurty Nemani and Farid N. Najm. High-level area and power estimation for VLSI circuits. In ICCAD '97: Proceedings of the 1997 IEEE/ACM international conference on Computer-aided design, pages 114–119, Washington, DC, USA, 1997. IEEE Computer Society.
[Saha et al., 1997]
D. Saha, A. Basu, and R. S. Mitra. Hardware software partitioning using genetic algorithm. In VLSID '97: Proceedings of the Tenth International Conference on VLSI Design: VLSI in Multimedia Applications, page 155, Washington, DC, USA, 1997. IEEE Computer Society.
[Salchak and Chawla, 1997]
P. W. Salchak and P. Chawla. Supporting hardware trade analysis and cost estimation using design complexity. In VIUF '97: Proceedings of the VHDL International Users' Forum, pages 126–133, Beavercreek, OH, USA, October 1997. (DOI)
Defines and illustrates a hardware design complexity measure (HDCM) and describes its potential applications to trade-off analysis and cost estimation. Specifically, we define a VHDL complexity measure. We have derived the HDCM from an avionics software design complexity measure (ASDCM) that we have shown to be effective in estimation and optimization of overall software costs. Similar to the ASDCM, we believe that the proposed HDCM could enable more optimal hardware design, implementation and maintenance
[Wawrzynek and Hauser, 1997]
J. R. Wawrzynek and J. Hauser. Garp: a MIPS processor with a reconfigurable coprocessor. In FPGAs for Custom Computing Machines, 1997., pages 12–21, Napa Valley, CA, 1997. (DOI)
Typical reconfigurable machines exhibit shortcomings that make them less than ideal for general-purpose computing. The Garp Architecture combines reconfigurable hardware with a standard MIPS processor on the same die to retain the better features of both. Novel aspects of the architecture are presented, as well as a prototype software environment and preliminary performance results. Compared to an UltraSPARC, a Garp of similar technology could achieve speedups ranging from a factor of 2 to as high as a factor of 24 for some useful applications
[Brandolese, 1998]
Carlo Brandolese. System-level performance estimation strategy for SW and HW. In ICCD '98: Proceedings of the International Conference on Computer Design, page 48, Washington, DC, USA, 1998. IEEE Computer Society.
[Henkel and Ernst, 1998]
Jörg Henkel and Rolf L. Ernst. High-level estimation techniques for usage in hardware/software co-design. In ASPDAC '98: Proceedings of the ASP-DAC'98. Asia and South Pacific Design Automation Conference, pages 353–360, Princeton, NJ, USA, February 1998. (DOI)
High-level estimation techniques are of paramount importance for design decisions like hardware/software partitioning or design space explorations. In both cases an appropriate compromise between accuracy and computation time determines the feasibility of those estimation techniques. In this paper we present high-level estimation techniques for hardware effort and hardware/software communication time. Our techniques deliver fast results at sufficient accuracy. Furthermore, it is shown how these techniques are applied in order to cope with contradictory design goals like performance constraints and hardware effort constraints. As a solution, we present a cost function for the purpose of hardware/software partitioning that offers a dynamic weighting of its components. The conducted experiments show that the usage of our estimation techniques in conjunction with their efficient combination leads to reasonable hardware/software implementations as opposed to approaches that consider single constraints only
[Henkel and Li, 1998]
Jörg Henkel and Yanbing Li. Energy-conscious hw/sw-partitioning of embedded systems: a case study on an mpeg-2 encoder. In CODES/CASHE '98: Proceedings of the 6th international workshop on Hardware/software codesign, pages 23–27, Washington, DC, USA, 1998. IEEE Computer Society. (PDF) (DOI)
Energy dissipation is a hot topic in the design of (especially mobile) embedded systems. This is because applications like digital video cameras, cellular phones etc. draw their current from batteries that spend a limited amount of energy only. In this paper we show that energy-conscious HW/SW-partitioning can lead to drastic reductions of energy dissipation of a whole embedded system. Subject of investigation is an MPEG-2 encoder. Therefore, we introduce our framework for estimating and optimizing system energy as well as all conducted design steps. The obtained results show energy savings of up to 59% while the performance remains approximately the same or becomes even slightly higher. As a main result, energy-conscious HW/SW-partitioning is a promising method to be deployed in addition to classical energy and/or power reduction methods
[Khouri et al., 1998]
Kamal S. Khouri, Ganesh Lakshminarayana, and Niraj K. Jha. Fast high-level power estimation for control-flow intensive design. In ISLPED '98: Proceedings of the 1998 international symposium on Low power electronics and design, pages 299–304, New York, NY, USA, 1998. ACM Press. (DOI)
[Kokol and Brest, 1998]
Peter Kokol and Janez Brest. Fractal structure of random programs. SIGPLAN Notices, 33(6):33–38, 1998.
[Maestro et al., 1998]
J. A. Maestro, D. Mozos, and H. Mecha. A macroscopic time and cost estimation model allowing task parallelism and hardware sharing for the codesign partitioning process. In DATE '98: Proceedings of the Design, Automation and Test in Europe Conference, pages 218–225, Madrid, Spain, February 1998. (DOI)
This paper describes a method to estimate the implementation cost of the hardware part in a mixed hardware/software system, as well as the related performance. These estimations try to avoid the use of many implementation details in order to keep the complexity order of the process under control. The concepts of hardware sharing and parallelism are exploited to make a picture of the whole hardware cost associated with a given partition
[Séméria and Micheli, 1998]
Luc Séméria and Giovanni De Micheli. Spc: synthesis of pointers in c: application of pointer analysis to the behavioral synthesis from c. In ICCAD '98: Proceedings of the 1998 IEEE/ACM international conference on Computer-aided design, pages 340–346, New York, NY, USA, 1998. ACM Press. (DOI)
[Stone and Gokhale, 1998]
M. B. Gokhale and J. M. Stone. NAPA c: compiling for a hybrid RISC/fpga architecture. In FPGAs for Custom Computing Machines, 1998., pages 126–135, Napa Valley, CA, 1998. (DOI)
Hybrid architectures combining conventional processors with configurable logic resources enable efficient coordination of control with datapath computation. With integration of the two components on a single device, loop control and data-dependent branching can be handled by the conventional processor, while regular datapath computation occurs on the configurable hardware. This paper describes a novel pragma-based approach to programming such hybrid devices. The NAPA C language provides pragma directives so that the programmer (or an automatic partitioner) can specify where data is to reside and where computation is to occur with statement-level granularity. The NAPA C compiler, targeting National Semiconductor's NAPA1000 chip, performs semantic analysis of the pragma-annotated program and co-synthesizes a conventional program executable combined with a configuration bit stream for the adaptive logic. Compiler optimizations include synthesis of hardware pipelines from pipelineable loops
[Arnout, 1999]
Guido Arnout. C for system level design. In DATE '99: Proceedings of the conference on Design, automation and test in Europe, page 81, New York, NY, USA, 1999. ACM Press. (DOI)
[Dave, 1999]
Bharat P. Dave. Crusade: hardware/software co-synthesis of dynamically reconfigurable heterogeneous real-time distributed embedded systems. In DATE '99: Proceedings of the conference on Design, automation and test in Europe, page 22, New York, NY, USA, 1999. ACM Press. (DOI)
[Dave et al., 1999]
Bharat P. Dave, Ganesh Lakshminarayana, and Niraj K. Jha. Cosyn: hardware-software co-synthesis of heterogeneous distributed embedded systems. IEEE Trans. Very Large Scale Integr. Syst., 7(1):92–104, 1999. (DOI)
Hardware-software co-synthesis starts with an embedded-system specification and results in an architecture consisting of hardware and software modules to meet performance, power, and cost goals. Embedded systems are generally specified in terms of a set of acyclic task graphs. In this paper, we present a co-synthesis algorithm COSYN, which starts with periodic task graphs with real-time constraints and produces a low-cost heterogeneous distributed embedded-system architecture meeting these constraints. It supports both concurrent and sequential modes of communication and computation. It employs a combination of preemptive and nonpreemptive static scheduling. It allows task graphs in which different tasks have different deadlines. It introduces the concept of an association array to tackle the problem of multirate systems. It uses a new task-clustering technique, which takes the changing nature of the critical path in the task graph into account. It supports pipelining of task graphs and a mix of various technologies to meet embedded-system constraints and minimize power dissipation. In general, embedded-system tasks are reused across multiple functions. COSYN uses the concept of architectural hints and reuse to exploit this fact. Finally, if desired, it also optimizes the architecture for power consumption. COSYN produces optimal results for the examples from the literature while providing several orders of magnitude advantage in central processing unit time over an existing optimal algorithm. The efficacy of COSYN and its low-power extension COSYN-LP is also established through their application to very large task graphs (with over 1000 tasks)
[Hammes et al., 1999]
J. Hammes, R. Rinker, W. Böhm, and W. Najjar. Compiling a high-level language to reconfigurable systems. In Compiler and Architecture Support for Embedded Systems Conference (CASES '99), October 1999.
[Haruyama and Cummings, 1999]
M. Cummings and S. Haruyama. Fpga in the software radio. Communications Magazine, IEEE, 37(2):108–112, 1999. (DOI)
As new radio standards are deployed without substantially supplanting existing ones, the need for multimode multiband handsets and infrastructure increases. This article describes how emerging FPGA technology's unique combination of size and power efficiency plus field programmability offers a transition of FPGAs from ASIC prototyping to embedded products. Software-defined receiver examples suggest an enlarged role for FPGAs in pragmatic paths toward the productization of software radio technology
[Micheli, 1999]
Giovanni De Micheli. Hardware synthesis from c/c++ models. In DATE '99: Proceedings of the conference on Design, automation and test in Europe, page 80, New York, NY, USA, 1999. ACM Press. (DOI)
[Shenoy et al., 1999]
U. Nagaraj Shenoy, Alok Choudhary, and Prithviraj Banerjee. Symphany: A tool for automatic synthesis of parallel heterogeneous adaptive systems. Technical Report CPDC-TR-9903-002, Center for Parallel and Distributed Computing, Northwestern University, Evanston, IL, USA, March 1999.
[Shin, 1999]
Tae-Woo Kim and Hyunchul Shin. Hardware cost estimation techniques for c-level description. In VLSI and CAD, 1999. ICVC '99. 6th International, pages 85–88, Seoul, 1999. (DOI)
Recent trends in the hardware/software codesign and architectural exploration bring us the need to develop sophisticated high-level estimation tools. This paper describes hardware cost estimation techniques for descriptions written in C language. This approach estimates the area and performance of the system described in standard ANSI C language to be implemented in hardware. Experimental results show that this approach has some errors but gives the designer useful information concerning the hardware for architectural exploration and hardware/software partitioning in high-level codesign
[Walker and Blythe, 1999]
S. A. Blythe and R. A. Walker. Efficiently searching the optimal design space. In VLSI, 1999. Proceedings. Ninth Great Lakes, pages 192–195, Ypsilanti, MI, 1999. (DOI)
One of the primary advantages of a high-level synthesis system is its ability to explore the design space. This paper presents several methodologies for design space exploration that compute all optimal tradeoff points for the combined problem of scheduling, clock length determination, and module selection. We discuss how each methodology takes advantage of both the structure within the design space itself as well as the structure of, and interaction between, each of the three subproblems
[Aigner et al., 2000]
Gerald Aigner, Amer Diwan, David L. Heine, Monica S. Lam, David L. Moore, Brian R. Murphy, and Constantine Sapuntzakis. An Overview of the SUIF2 Compiler Infrastructure. Stanford University, Stanford, CA, USA, 2.2.0-4 edition, 2000. (PS)
[Arnold et al., 2000]
Matthew Arnold, Stephen Fink, Vivek Sarkar, and Peter F. Sweeney. A comparative study of static and profile-based heuristics for inlining. In DYNAMO '00: Proceedings of the ACM SIGPLAN workshop on Dynamic and adaptive compilation and optimization, pages 52–64, New York, NY, USA, 2000. ACM Press. (DOI)
[Bammi et al., 2000]
Jwahar R. Bammi, Wido Kruijtzer, Luciano Lavagno, Edwin Harcourt, and Mihai T. Lazarescu. Software performance estimation strategies in a system-level design tool. In CODES '00: Proceedings of the eighth international workshop on Hardware/software codesign, pages 82–86, New York, NY, USA, 2000. ACM Press. (DOI)
[Banerjee et al., 2000]
P. Banerjee, N. Shenoy, A. Choudhary, S. Hauck, C. Bachmann, M. Haldar, P. Joisha, A. Jones, A. Kanhare, A. Nayak, S. Periyacheri, M. Walkden, and D. Zaretsky. A matlab compiler for distributed, heterogeneous, reconfigurable computing systems. In FCCM '00: Proceedings of the 2000 IEEE Symposium on Field-Programmable Custom Computing Machines, page 39, Washington, DC, USA, 2000. IEEE Computer Society.
[Bilavarn et al., 2000]
S. Bilavarn, G. Gogniat, and J. Philippe. Area time power estimation for fpga based designs at a behavioral level. In ICECS'2K, Kaslik, Lebanon, December 2000.
[Brandolese et al., 2000]
C. Brandolese, W. Fornaciari, L. Pomante, F. Salice, and D. Sciuto. A multi-level strategy for software power estimation. In ISSS '00: Proceedings of the 13th international symposium on System synthesis, pages 187–192, Washington, DC, USA, 2000. IEEE Computer Society. (DOI)
[Buracchini, 2000]
Enrico Buracchini. The software radio concept. IEEE Communications Magazine, 38(9):138–143, September 2000.
[Diguet et al., 2000]
J.-P. Diguet, G. Gogniat, P. Danielo, J.-L. Philippe, and M. Auguin. System specification with the spf model. In FDL '00: Proceedings of the Forum on Design Languages, Lester, UBS University, Lorient, France, September 2000. ECSI Association. Session 3.4.
[Hosemann et al., 2000]
S. Srikanteswara, M. Hosemann, J. H. Reed, and P. Athanas. Design and implementation of a completely reconfigurable soft radio. In Radio and Wireless Conference, 2000. RAWCON, pages 7–11, Denver, CO, 2000. (DOI)
The advances in reconfigurable computing have now made it possible to implement the concept of hardware paging, which has the potential to greatly advance the design of soft radios. While many soft/software radio architectures have been suggested and implemented there remains a lack of a formal design methodology that can be used to design and implement these radios on reconfigurable platforms that exploit the latest inventions. This paper presents a unified architecture, called the layered radio architecture, for design of soft radios on a reconfigurable platform. Using the assumptions of the availability of run-time reconfigurable hardware and the use of stream-based computing, the layered radio architecture defines a soft radio architecture that is scalable in hardware and software, flexible, and capable of supporting multi-mode radios along with over-the-air updates and software validation
[Khoshgoftaar et al., 2000]
Taghi M. Khoshgoftaar, Edward B. Allen, Wendell D. Jones, and John P. Hudepohl. Accuracy of software quality models over multiple releases. Ann. Softw. Eng., 9(1-4):103–116, 2000.
[Lay, 2000]
David C. Lay. Linear Algebra and its Applications. Addison Wesley Longman, Inc., Boston, MA, USA, second edition, 2000.
[Li et al., 2000]
Yanbing Li, Tim Callahan, Ervan Darnell, Randolph Harr, Uday Kurkure, and Jon Stockwood. Hardware-software co-design of embedded reconfigurable architectures. In DAC '00: Proceedings of the 37th conference on Design automation, pages 507–512, New York, NY, USA, 2000. ACM Press. (DOI)
[Séméria et al., 2000]
Luc Séméria, K. Sato, and Giovanni De Micheli. Resolution of dynamic memory allocation and pointers for the behavioral synthesis from c. In DATE '00: Proceedings of the Design, Automation and Test in Europe Conference and Exhibition, pages 312–319, Stanford University, CA, USA, March 2000. (DOI)
One of the greatest challenges in C/C++-based design methodology is to efficiently map C/C++ models into hardware. Many of the networking and multimedia applications implemented in hardware or mixed hardware/software systems are making use of complex data structures stored in one or multiple memories. As a result, many of the C/C++ features which were originally designed for software applications are now making their way into hardware. Such features include dynamic memory allocation and pointers used to manage data. We present a solution for efficiently mapping arbitrary C code with pointers and malloc/free into hardware. Our solution fits current memory management methodologies. It consists of instantiating a hardware allocator tailored to an application and a memory architecture. Our work also supports the resolution of pointers without restriction on the data structures. An implementation using the SUIF framework is presented, followed by some case studies such as the realization of a video filter
[Ye et al., 2000]
Zhi Alex Ye, Nagaraj Shenoy, and Prithviraj Banerjee. A c compiler for a processor with a reconfigurable functional unit. In FPGA '00: Proceedings of the 2000 ACM/SIGDA eighth international symposium on Field programmable gate arrays, pages 95–100, New York, NY, USA, 2000. ACM Press. (DOI)
[Bohm et al., 2001]
A. P. Bohm, B. Draper, W. Najjar, J. Hammes, R. Rinker, M. Chawathe, and C. Ross. One-step compilation of image processing applications to fpgas. In FCCM '01: Proceedings of the 9th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, pages 209–218, Colorado State University, CO, USA, May 2001. IEEE Computer Society. (DOI)
[Brandolese et al., 2001]
C. Brandolese, W. Fornaciari, F. Salice, and D. Sciuto. Source-level execution time estimation of c programs. In CODES '01: Proceedings of the ninth international symposium on Hardware/software codesign, pages 98–103, New York, NY, USA, 2001. ACM Press. (DOI)
[Chilimbi, 2001]
Trishul M. Chilimbi. Efficient representations and abstractions for quantifying and exploiting data reference locality. In PLDI '01: Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation, pages 191–202, New York, NY, USA, 2001. ACM Press. (DOI)
[Haldar et al., 2001]
Malay Haldar, Anshuman Nayak, Alok Choudhary, Prith Banerjee, and Nagraj Shenoy. Fpga hardware synthesis from matlab. In VLSID '01: Proceedings of the 14th International Conference on VLSI Design (VLSID '01), page 299, Washington, DC, USA, 2001. IEEE Computer Society.
[Hirzel and Chilimbi, 2001]
M. Hirzel and T. Chilimbi. Bursty tracing: A framework for low-overhead temporal profiling. In 4th ACM Workshop on Feedback-Directed and Dynamic Optimization (FDDO-4), December 2001.
[Power et al., 2001]
J. Power, J. Waldron, and J. Horgan. Measurement and analysis of runtime profiling data for java programs. In Source Code Analysis and Manipulation, 2001., pages 122–130, Florence, 2001. (DOI)
The authors examine a procedure for the analysis of data produced by the dynamic profiling of Java programs. In particular, we describe the issues involved in dynamic analysis, propose a metric for discrimination between the resulting data sets, and examine its application over different test suites and compilers
[Srinivasan et al., 2001]
V. Srinivasan, S. Govindarajan, and R. Vemuri. Fine-grained and coarse-grained behavioral partitioning with effective utilization of memory and design space exploration for multi-fpga architectures. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 9(1):140–158, February 2001. (DOI)
Reconfigurable computers (RCs) host multiple field programmable gate arrays (FPGAs) and one or more physical memories that communicate through an interconnection fabric. State-of-the-art RCs provide abundant hardware and storage resources, but have tight constraints on FPGA pin-out and inter-FPGA interconnection resources. These stringent constraints are the primary impediment for multi-FPGA partitioning tools to generate high-quality designs. In this paper, we present two integrated partitioning and synthesis approaches for RCs. The first approach involves fine-grained partitioning of a scheduled data-flow graph (DFG, or an operation graph), and the second involves a coarse-grained partitioning of an unscheduled control data flow graph (CDFG, or a block graph). A hardware design space exploration engine is integrated with the block graph partitioner that dynamically contemplates multiple schedules during partitioning. The novel feature in the partitioning approaches is that the physical memory in the RC is effectively used to alleviate the FPGA pin-out and inter-FPGA interconnection bottleneck. Several experiments have been conducted, targeting commercial multi-FPGA boards, to compare the two partitioning approaches, and detailed summaries are presented
[Villarreal et al., 2001]
Jason R. Villarreal, R. Lysecky, S. Cotterell, and F. Vahid. A study on the loop behavior of embedded programs. Technical Report UCR–CSE–01–03, University of California, Riverside, Riverside, CA, USA, 2001. (PDF)
[Bjuréus et al., 2002]
Per Bjuréus, Mikael Millberg, and Axel Jantsch. Fpga resource and timing estimation from matlab execution traces. In CODES '02: Proceedings of the tenth international symposium on Hardware/software codesign, pages 31–36, New York, NY, USA, 2002. ACM Press. (DOI)
[Bodik et al., 2002]
B. Fields, R. Bodik, and M. D. Hill. Slack: maximizing performance under technological constraints. In Computer Architecture, 2002. Proceedings. 29th, pages 47–58, Anchorage, AK, 2002. (DOI)
Many emerging processor microarchitectures seek to manage technological constraints (e.g., wire delay, power, and circuit complexity) by resorting to non-uniform designs that provide resources at multiple quality levels (e.g., fast/slow bypass paths, multi-speed functional units, and grid architectures). In such designs, the constraint problem becomes a control problem, and the challenge becomes designing a control policy that mitigates the performance penalty of the non-uniformity. Given the increasing importance of non-uniform control policies, we believe it is appropriate to examine them in their own right. To this end, we develop slack for use in creating control policies that match program execution behavior to machine design. Intuitively, the slack of a dynamic instruction i is the number of cycles i can be delayed with no effect on execution time. This property makes slack a natural candidate for hiding non-uniform latencies. We make three contributions in our exploration of slack. First, we formally define slack, distinguish three variants (local, global and apportioned), and perform a limit study to show that slack is prevalent in our SPEC2000 workload. Second, we show how to predict slack in hardware. Third, we illustrate how to create a control policy based on slack for steering instructions among fast (high power) and slow (lower power) pipelines
[Cotterell and Hughes, 2002]
Mike Cotterell and Bob Hughes. Software project management. McGraw-Hill Publishing Company, Berkshire, England, UK, third edition, 2002.
[Cousot and Cousot, 2002]
Patrick Cousot and Radhia Cousot. Modular static program analysis. In CC '02: Proceedings of the 11th International Conference on Compiler Construction, pages 159–178, London, UK, 2002. Springer-Verlag.
[Dasu and Panchanathan, 2002]
Aravind Dasu and Sethuraman Panchanathan. Reconfigurable media processing. Parallel Comput., 28(7-8):1111–1139, 2002. (DOI)
[Haggard and L., 2002]
Jie Chen and R. L. Haggard. Extraction of parallel hardware during c to vhdl translation. In System Theory, 2002. Proceedings of the, pages 334–338, 2002. (DOI)
Translating C/C++ language into VHDL is an important step in synthesizing hardware from C/C++. However, there is no explicit facility in the general C/C++ language to declare concurrent parallel execution which is a critical characteristic of hardware systems. This paper presents the outline of a set of transformation algorithms. These algorithms are helpful in the process of extracting parallel hardware during C to VHDL translation. An example of extracting parallel hardware from an array addition routine written in C is also presented in this paper.
[Jones et al., 2002]
Alex Jones, Debabrata Bagchi, Sartajit Pal, Prith Banerjee, and Alok Choudhary. PACT HDL: a compiler targeting ASICS and FPGAS with power and performance optimizations, pages 169–190. Series in Computer Science. Kluwer Academic Publishers, Norwell, MA, USA, 2002.
[Kulkarni et al., 2002]
Dhananjay Kulkarni, Walid A. Najjar, Robert Rinker, and Fadi J. Kurdahi. Fast area estimation to support compiler optimizations in fpga-based reconfigurable systems. In FCCM '02: Proceedings of the 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, page 239, Washington, DC, USA, 2002. IEEE Computer Society.
[Maas et al., 2002]
Elmar Maas, Dirk Herrmann, Rolf Ernst, Peter Rüffer, Sieghard Hasenzahl, and Martin Seitz. A processor-coprocessor architecture for high end video applications, pages 688–691. The Morgan Kaufmann Systems On Silicon Series. Kluwer Academic Publishers, Norwell, MA, USA, 2002.
[McPeak, 2002]
Scott McPeak. Elkhound: A fast, practical glr parser generator. Technical Report UCB/CSD-2-1214, University of California, Berkeley, Berkeley, CA, USA, December 2002.
[Munson, 2002]
John C. Munson. Software Engineering Measurement. CRC Press, Inc., Boca Raton, FL, USA, 2002.
[Nayak et al., 2002]
A. Nayak, M. Haldar, A. Choudhary, and P. Banerjee. Accurate area and delay estimators for fpgas. In DATE '02: Proceedings of the conference on Design, automation and test in Europe, page 862, Washington, DC, USA, 2002. IEEE Computer Society.
[Rubin et al., 2002]
Shai Rubin, Rastislav Bodík, and Trishul Chilimbi. An efficient profile-analysis framework for data-layout optimizations. In POPL '02: Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 140–153, New York, NY, USA, 2002. ACM Press. (DOI)
[Shang and Jha, 2002]
Li Shang and Niraj K. Jha. Hardware-software co-synthesis of low power real-time distributed embedded systems with dynamically reconfigurable fpgas. In ASP-DAC '02: Proceedings of the 2002 conference on Asia South Pacific design automation/VLSI Design, page 345, Washington, DC, USA, 2002. IEEE Computer Society.
[Sima et al., 2002]
Mihai Sima, Stamatis Vassiliadis, Sorin Cotofana, Jos T. J. van Eijndhoven, and Kees A. Vissers. Field-programmable custom computing machines - a taxonomy -. In FPL '02: Proceedings of the Reconfigurable Computing Is Going Mainstream, 12th International Conference on Field-Programmable Logic and Applications, pages 79–88, London, UK, 2002. Springer-Verlag.
[Vahid and Gajski, 2002]
Frank Vahid and Daniel D. Gajski. Incremental hardware estimation during hardware/software functional partitioning. In Readings in hardware/software co-design, pages 516–521. Kluwer Academic Publishers, Norwell, MA, USA, 2002.
[Villarreal et al., 2002]
Jason R. Villarreal, D. Suresh, G. Stitt, F. Vahid, and W. Najjar. Improving software performance with configurable logic. Journal on Design Automation of Embedded Systems, 7(4):325–339, 2002. (DOI)
[Zhu, 2002]
Jianwen Zhu. Symbolic pointer analysis. In ICCAD '02: Proceedings of the 2002 IEEE/ACM international conference on Computer-aided design, pages 150–157, New York, NY, USA, 2002. ACM Press. (DOI)
[Becker and Hartenstein, 2003]
Jürgen Becker and Reiner Hartenstein. Configware and morphware going mainstream. J. Syst. Archit., 49(4-6):127–142, 2003. (DOI)
[Bhasyam and Bazargan, 2003]
Karthikeyan Bhasyam and Kia Bazargan. Hw/sw codesign incorporating edge delays using dynamic programming. In DSD '03: Proceedings of the Euromicro Symposium on Digital Systems Design, page 264, Washington, DC, USA, 2003. IEEE Computer Society.
[Catthoor et al., 2003]
P. G. Kjeldsberg, F. Catthoor, and E. J. Aas. Data dependency size estimation for use in memory optimization. Computer-Aided Design of Integrated Circuits and, 22(7):908–921, 2003. (DOI)
A novel storage requirement estimation methodology is presented for use in the early system design phases when the data transfer ordering is only partly fixed. At that stage, none of the existing estimation tools are adequate, as they either assume a fully specified execution order or ignore it completely. This paper presents an algorithm for automated estimation of strict upper and lower bounds on the individual data dependency sizes in high-level application code given a partially fixed execution ordering. In the overall estimation technique, this is followed by a detection of the maximally combined size of simultaneously alive dependencies, resulting in the overall storage requirement of the application. Using representative application demonstrators, we show how our techniques can effectively guide the designer to achieve a transformed specification with low storage requirement.
[Fornaciari et al., 2003]
W. Fornaciari, F. Salice, and D. P. Scarpazza. Early estimation of the size of vhdl projects. In Proceedings of the first IEEE/ACM/IFIP Intl. Conf. on Hardware/Software Codesign and System Synthesis, pages 207–212, Milano, Italy, 2003. Politecnico di Milano. (PDF) (DOI)
[Guo et al., 2003]
Z. Guo, D. C. Suresh, and W. A. Najjar. Programmability and efficiency in reconfigurable computer systems. In Workshop on Software Support for Reconfigurable Systems, held in conjunction with the 9th Int. Conf. Of High-Performance Computer Architecture, Anaheim, CA, February 2003. (PDF)
[Gupta et al., 2003]
Sumit Gupta, Nikil Dutt, Rajesh Gupta, and Alex Nicolau. Spark : A high-level synthesis framework for applying parallelizing compiler transformations. vlsid, 00:461, 2003. (DOI)
This paper presents a modular and extensible high-level synthesis research system, called SPARK, that takes a behavioral description in ANSI-C as input and produces synthesizable register-transfer level VHDL. SPARK uses parallelizing compiler technology, developed previously, to enhance instruction-level parallelism and re-instruments it for high-level synthesis by incorporating ideas of mutual exclusivity of operations, resource sharing and hardware cost models. In this paper, we present the design flow through the SPARK system, a set of transformations that include speculative code motions and dynamic transformations and show how these transformations and other optimizing synthesis and compiler techniques are employed by a scheduling heuristic. Experiments are performed on two moderately complex industrial applications, namely MPEG-1 and the GIMP image processing tool. The results show that the various code transformations lead to up to 70% improvements in performance without any increase in the overall area and critical path of the final synthesized design.
[Jha and Vallerio, 2003]
K. S. Vallerio and N. K. Jha. Task graph extraction for embedded system synthesis. In VLSI Design, 2003. Proceedings. 16th, pages 480–486, 2003. (DOI)
Consumer demand and improvements in hardware have caused distributed real-time embedded systems to rapidly increase in complexity. As a result, designers faced with time-to-market constraints are forced to rely on intelligent design tools to enable them to keep up with demand. These tools are continually being used earlier in the design process when the design is at higher levels of abstraction. At the highest level of abstraction are hardware/software co-synthesis tools which take a system specification as input. Although many embedded systems are described in C, the system specifications for many of these tools are often in the form of one or more task graphs. These tools are very effective at solving the co-synthesis problem using task graphs but require that designers manually transform the specification from C code to task graphs, a tedious and error-prone job. The task graph extraction tool described in this paper reduces the potential for error and the time required to design an embedded system by automating the task graph extraction process. Such a tool can drastically improve designer productivity. As far as we know, this is the first tool of its kind. It has been made available on the web.
[Kaplan et al., 2003]
Adam Kaplan, Philip Brisk, and Ryan Kastner. Data communication estimation and reduction for reconfigurable systems. In DAC '03: Proceedings of the 40th conference on Design automation, pages 616–621, New York, NY, USA, 2003. ACM Press. (DOI)
[Levine and Schmit, 2003]
Benjamin A. Levine and Herman H. Schmit. Efficient application representation for haste: Hybrid architectures with a single, transformable executable. In FCCM '03: Proceedings of the 11th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, page 101, Washington, DC, USA, 2003. IEEE Computer Society.
[Merrill, 2003]
Jason Merrill. Generic and gimple: A new tree representation for entire functions. In Proceedings of the 2003 GCC Summit. Red Hat, Inc., 2003. (PS)
[Neto and Cardoso, 2003]
J. M. P. Cardoso and H. C. Neto. Compilation for fpga-based reconfigurable hardware. IEEE Design & Test of Computers, 20(2):65–75, 2003. (DOI)
This paper provides techniques for compiling software programs into reconfigurable hardware which offer faster and more efficient performance than the complex resource-sharing approaches typical of high-level synthesis systems. The Java-based compiler presented in this paper uses intermediate graph representations to embody parallelism at various levels.
[Panainte et al., 2003]
Elena Moscu Panainte, Koen Bertels, and Stamatis Vassiliadis. Compiling for the molen programming paradigm. In Proceedings of the 13th International Conference on Field Programmable Logic and Applications (FPL'03), pages 900–910, Delft, Netherlands, September 2003.
[Soininen and P., 2003]
Yang Qu and J. P. Soininen. Estimating the utilization of embedded fpga co-processor. In Digital System Design, 2003. Proceedings., pages 214–221, 2003. (DOI)
Embedded FPGA co-processors will bring new alternatives for SoC system designers. Comparison of software implementations and reconfigurable hardware implementations will need fast and easy-to-use estimation techniques. In this paper, we present an estimation approach for the resource utilization of the embedded FPGA co-processor. Our approach is based on the principles of high-level synthesis, such as force-directed scheduling, resource allocation, operation assignment and interconnection binding. The method has been applied to simple test cases and a C-language model of MPEG-2 decoder. The average hardware estimation error of MPEG-2 functions was 25%.
[Srikanteswara et al., 2003]
S. Srikanteswara, R. C. Palat, J. H. Reed, and P. Athanas. An overview of configurable computing machines for software radio handsets. IEEE Communications Magazine, 41(7):134–141, 2003. (DOI)
The advent of software radios has brought a paradigm shift to radio design. A multimode handset with dynamic reconfigurability has the promise of integrated services and global roaming capabilities. However, most of the work to date has been focused on software radio base stations, which do not have as tight constraints on area and power as handsets. Base station software radio technology progressed dramatically with advances in system design, adaptive modulation and coding techniques, reconfigurable hardware, A/D converters, RF design, and rapid prototyping systems, and has helped bring software radio handsets a step closer to reality. However, supporting multimode radios on a small handset still remains a design challenge. A configurable computing machine, which is an optimized FPGA with application-specific capabilities, shows promise for software radio handsets in optimizing hardware implementations for heterogeneous systems. In this article, contemporary CCM architectures that allow dynamic hardware reconfiguration with maximum flexibility are reviewed and assessed. This is followed by design recommendations for CCM architectures for use in software radio handsets.
[Suresh et al., 2003]
Dinesh C. Suresh, Walid A. Najjar, Frank Vahid, Jason R. Villarreal, and Greg Stitt. Profiling tools for hardware/software partitioning of embedded applications. SIGPLAN Not., 38(7):189–198, 2003. (PDF) (DOI)
[Swahn and Hassoun, 2003]
Brian Swahn and Soha Hassoun. Hardware scheduling for dynamic adaptability using external profiling and hardware threading. In ICCAD '03: Proceedings of the 2003 IEEE/ACM international conference on Computer-aided design, page 58, Washington, DC, USA, 2003. IEEE Computer Society. (DOI)
[Vassiliadis et al., 2003]
Stamatis Vassiliadis, Georgi N. Gaydadjiev, Koen Bertels, and Elena Moscu Panainte. The molen programming paradigm. In Proceedings of the Third International Workshop on Systems, Architectures, Modeling, and Simulation, pages 1–10, Delft, Netherlands, July 2003.
[Banerjee and Dutt, 2004]
Sudarshan Banerjee and Nikil Dutt. Very fast simulated annealing for hw-sw partitioning. Technical Report UCI–CECS–04–18, University of California, Irvine, Irvine, CA, USA, June 2004.
[Cardoso and Diniz, 2004]
João M. P. Cardoso and Pedro C. Diniz. Modeling Loop Unrolling: Approaches and Open Issues, volume 3133/2004 of Lecture Notes in Computer Science, page 224. Springer, July 2004. (DOI)
[Cherkaskyy, 2004]
M. Cherkaskyy. Theoretical fundamentals software/hardware algorithms. In TCSET '04: Proceedings of the International Conference on Modern Problems of Radio Engineering, Telecommunications and Computer Science, pages 9–13, Lviv, Ukraine, February 2004. (DOI)
[Faruque et al., 2004]
M.A. Al Faruque, K. Karuri, S. Kowalewski, and R. Leupers. Fine grained application profiling for guiding application specific instruction set processors (asips) design. Master's thesis, Rheinisch-Westfälische Technische Hochschule, Aachen, Germany, 2004. (PDF)
Current Application Specific Instruction set Processor (ASIP) design methodologies are mostly based on iterative architecture exploration that uses Architecture Description Languages (ADLs) and retargetable software development tools. However, for improved design efficiency, additional pre-architecture exploration tools are required to help narrow down the huge design space and make coarse-grained Instruction Set Architecture (ISA) decisions before detailed ADL modeling. Extensive application code profiling is the key in such early design stages. Based on a novel code instrumentation technology, we present a microprofiling approach that fills the current gap between source-level and instruction-level profilers and combines their advantages w.r.t. speed and accuracy. We show how the microprofiler is embedded into an advanced ASIP design flow and justify its use in a case study to design an MP3 decoder ASIP.
[Mukherjee et al., 2004]
Rajarshi Mukherjee, Alex Jones, and Prith Banerjee. Handling data streams while compiling c programs onto hardware. In ISVLSI '04: Proceedings of the IEEE Computer Society Annual Symposium on VLSI Emerging Trends in VLSI Systems Design, pages 271–272. IEEE Computer Society, 2004. (DOI)
[Niehaus et al., 2004]
D. Andrews, D. Niehaus, and P. Ashenden. Programming models for hybrid CPU/fpga chips. Computer, 37(1):118–120, 2004. (DOI)
Designers of embedded and real-time systems are continually challenged to meet tighter system requirements at better price-performance ratios. Best-practice methods have long promoted the use of commercial-off-the-shelf components to reduce design costs and time to market, but creating COTS components that are reusable in a wide range of applications remains difficult. In part, the challenge lies in satisfying the contradictory design forces of generalization and specialization. Systems designers are all too familiar with the tension these opposing forces cause in trying to balance cost versus performance. Adopting COTS components reduces costs and time to market but often fails to meet the most demanding performance requirements; custom-designed components can achieve significantly higher performance but at greater development costs and longer times to market. Emerging hybrid chips containing both CPU and field-programmable gate array (FPGA) components are an exciting new development. They promise COTS economies of scale while also supporting significant hardware customization. Components that combine a CPU and reconfigurable logic gates need a programming model that abstracts the computational hardware.
[Panainte et al., 2004]
Elena Moscu Panainte, Koen Bertels, and Stamatis Vassiliadis. Multimedia reconfigurable hardware design space exploration. In Proceedings of the 16th IASTED International Conference on Parallel and Distributed Computing and Systems (PDCS 2004), pages 398–403, Delft, Netherlands, November 2004. ACTA Press. (PDF)
In this paper we consider a set of multimedia applications and investigate the potential performance impact a reconfigurable microcoded processor can provide when added to a general purpose core processor. In a design space exploration, considering MPEG2 and JPEG benchmarks, we investigate performance boundaries, memory bottlenecks and the influence the core and reconfigurable processor communication has on performance. Under some realistic scenarios and serial FPGA execution, it is shown that a 53 cycle reduction is expected when comparing a design having a core processor and a design when the core processor is augmented with a reconfigurable microcoded engine. In addition, we have found that transferring parameters between the core processor and the reconfigurable processor may not severely influence the overall performance. Finally we investigated the memory bandwidth for operations mapped automatically on FPGA. The case study indicates that small latency DCT hardware design performs well when interfaced with 512 bytes/cycle. Our studies also indicate that about 64 bytes/cycle will support high speed execution for SAD and IDCT.
[Sima, 2004]
Mihai Sima. The ρ–TriMedia Processor. PhD thesis, Delft University of Technology, Delft, Netherlands, March 2004.
[Strelzoff, 2004]
Al Strelzoff. Functional programming for reconfigurable computing. In IPDPS '04: Proceedings of the 18th International Parallel and Distributed Processing Symposium, San Jose, CA, USA, April 2004. IEEE Computer Society. (DOI)
Summary form only given. Reconfigurable computing requires organizing computation with mixtures of processors and discrete logic thus presenting a difficult problem of hardware/software integration. An execution model and adaptation of functional programming is proposed which removes the distinction between hardware and software while offering the possibility of "correct by construction" design. The resulting language is called "V" because one way of creating it is to begin with the verifiable, synthesizable subset of Verilog, and then add functional programming features. V generates the net-list of elementary functions which are supported by an array. The compiler has stages of compilation and instantiation so that recursion can be supported in the early definition of a design. The execution model is cycle based synchronous dataflow. V syntax looks much like Verilog or C without pointers in order to facilitate adoption.
[van Albada et al., 2004]
P. F. Spinnato, G. D. van Albada, and P. M. A. Sloot. Performance modeling of distributed hybrid architectures. IEEE Transactions on Parallel and Distributed Systems, 15(1):81–92, 2004. (DOI)
Hybrid architectures are systems where a high performance general purpose computer is coupled to one or more special purpose devices (SPDs). Such a system can be the optimal choice for several fields of computational science. Configuring the system and finding the optimal mapping of the application tasks onto the hybrid machine often is not straightforward. Performance modeling is a tool to tackle and solve these problems. We have developed a performance model to simulate the behavior of a hybrid architecture consisting of a parallel multiprocessor where some nodes are the host of a GRAPE board. GRAPE is a very high performance SPD used in computational astrophysics. We validate our model on the architecture at our disposal, and show examples of predictions that our model can produce.
[Vuletic et al., 2004]
Miljan Vuletic, Laura Pozzi, and Paolo Ienne. Programming transparency and portable hardware interfacing: Towards general-purpose reconfigurable computing. In ASAP '04: Proceedings of the 15th IEEE International Conference on Application-Specific Systems, Architectures, and Processors, pages 339–351. IEEE Computer Society, September 2004. (DOI)
[Wong et al., 2004]
S. Vassiliadis, S. Wong, G. Gaydadjiev, K. Bertels, G. Kuzmanov, and E. Moscu Panainte. The molen polymorphic processor. IEEE Transactions on Computers, 53(11):1363–1375, 2004. (DOI)
In this paper, we present a polymorphic processor paradigm incorporating both general-purpose and custom computing processing. The proposal incorporates an arbitrary number of programmable units, exposes the hardware to the programmers/designers, and allows them to modify and extend the processor functionality at will. To achieve the previously stated attributes, we present a new programming paradigm, a new instruction set architecture, a microcode-based microarchitecture, and a compiler methodology. The programming paradigm, in contrast with the conventional programming paradigms, allows general-purpose conventional code and hardware descriptions to coexist in a program. In our proposal, for a given instruction set architecture, a one-time instruction set extension of eight instructions is sufficient to implement the reconfigurable functionality of the processor. We propose a microarchitecture based on reconfigurable hardware emulation to allow high-speed reconfiguration and execution. To prove the viability of the proposal, we experimented with the MPEG-2 encoder and decoder and a Xilinx Virtex II Pro FPGA. We have implemented three operations, SAD, DCT, and IDCT. The overall attainable application speedup for the MPEG-2 encoder and decoder is between 2.64-3.18 and between 1.56-1.94, respectively, representing between 93 percent and 98 percent of the theoretically obtainable speedups.
[Bhansali, 2005]
P. V. Bhansali. Complexity measurement of data and control flow. SIGSOFT Software Engineering Notes, 30(1):1, 2005. (DOI)
[Buyukkurt et al., 2005]
Z. Guo, B. Buyukkurt, W. Najjar, and K. Vissers. Optimized generation of data-path from c codes for fpgas. Design, Automation and Test in Europe, 2005., pages 112–117, 2005. (DOI)
FPGAs, as computing devices, offer significant speedup over microprocessors. Furthermore, their configurability offers an advantage over traditional ASICs. However, they do not yet enjoy high-level language programmability, as microprocessors do. This has become the main obstacle for their wider acceptance by application designers. ROCCC is a compiler designed to generate circuits from C source code to execute on FPGAs, more specifically on CSoCs. It generates RTL level HDLs from frequently executing kernels in an application. In this paper, we describe the ROCCC's system overview and focus on its data path generation. We compare the performance of ROCCC-generated VHDL code with that of Xilinx IPs. The synthesis result shows that the ROCCC-generated circuit takes around 2×~3× the area and runs at a comparable clock rate.
[Calman and S., 2005]
Jianwen Zhu and S. Calman. Context sensitive symbolic pointer analysis. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 24(4):516–531, 2005. (DOI)
One of the bottlenecks in the recent movement of hardware synthesis from behavioral C programs is the difficulty in reasoning about runtime pointer values at compile time. The pointer analysis problem has been investigated in the compiler community for two decades and has yielded efficient, polynomial time algorithms for context-insensitive (CI) analysis. However, at the accuracy level for which hardware synthesis is desired, namely context and flow sensitive analysis, the time and space complexity of the best algorithms reported grow exponentially with program size. In this paper, we propose a new analysis technology to combat the inefficiency encountered in traditional algorithms. The key idea is to implicitly encode the pointer-to relation in the Boolean domain by Bryant's binary decision diagram, thereby capturing the procedure transfer function completely, compactly and canonically. With symbolic transfer functions, we can establish a common framework to perform both CI and context-sensitive (CS) pointer analysis efficiently. In addition, we propose a symbolic representation of the invocation graph, which can otherwise be exponentially large. In contrast to the classical frameworks, where CS point-to information of a procedure has to be obtained by the application of its transfer function exponentially many times, our method can obtain point-to information of all contexts in a single application. Our experimental evaluation on a wide range of C benchmarks indicates that our CS pointer analysis can be made almost as fast as its CI counterpart.
[Cardoso, 2005a]
J. Cardoso. Evaluating the process control-flow complexity measure. In Web Services, 2005. ICWS 2005. Proceedings. 2005, 2005. (DOI)
Process measurement is the task of empirically and objectively assigning numbers to the attributes of processes in such a way as to describe them. We define process complexity as the degree to which a process is difficult to analyze, understand or explain. One way to analyze a process' complexity is to use a process control-flow complexity measure. This measure analyzes the control-flow of processes and can be applied to both Web processes and workflows. In this paper, we discuss how to evaluate the control-flow complexity measure to ensure that it can be qualified as a good and comprehensive one.
[Cardoso, 2005b]
Jorge Cardoso. Control-flow complexity measurement of processes and weyuker's properties. Enformatika, 8:213–218, October 2005. (DOI)
[Edwards, 2005]
Stephen A. Edwards. The challenges of hardware synthesis from c-like languages. In DATE '05: Proceedings of the Design, Automation and Test in Europe Conference and Exposition, pages 66–67, Columbia University, NY, USA, 2005. IEEE Computer Society. (DOI)
Many techniques for synthesizing digital hardware from C-like languages have been proposed, but none have emerged as successful as Verilog or VHDL for register-transfer-level design. Familiarity is the main reason C-like languages have been proposed for hardware synthesis. Synthesize hardware from C, proponents claim, and a C programmer can be turned into a hardware designer. Another common motivation is hardware/software codesign: today's systems usually contain a mix of hardware and software, and it is often unclear initially which portions to implement in hardware. Here, using a single language should simplify the migration task. The paper surveys several C-like hardware synthesis languages and looks at two of the fundamental challenges, concurrency and timing control.
[Holzer and Rupp, 2005]
M. Holzer and M. Rupp. Static estimation of execution times for hardware accelerators in system-on-chips. In System-on-Chip, 2005. Proceedings. 2005 International Symposium on, pages 62–65, 2005. (PDF)
[Venkataramani et al., 2005]
Girish Venkataramani, Tiberiu Chelcea, and Seth Copen Goldstein. Hls support for unconstrained memory accesses. In IWLS '05: IEEE 14th International Workshop on Logic Synthesis, Carnegie Mellon University, Pittsburgh, PA, USA, June 2005.
[Wolf, 2005]
Wayne Wolf. Building the software radio. Computer, 38(3):87–89, 2005. (DOI)
People have been working on software radio for a few years. Software radio is just what it sounds like - a radio that uses software to perform many of the signal processing tasks that analog circuits traditionally handle. Software radio could turn out to be a paradigm shift for communication systems. The US Defense Advanced Research Projects Agency (DARPA) kicked off research into software radios to solve military problems, but software radios can help solve some important problems in commercial communication systems as well. Software radio offers the advantage of putting many traditionally hard functions in modules whose characteristics can be changed while the radio is running.
[Meeuws, 2007]
R. J. Meeuws. A quantitative model for hardware/software partitioning. Master's thesis, Delft University of Technology, Delft, Netherlands, May 2007.
System Development needs Hardware/Software Partitioning performed early on in the development process. In order to do this, early predictions of hardware resource usage and delay are necessary. In this thesis a Quantitative Model is presented that can make early predictions to support the partitioning process. The model is based on Software Complexity Metrics, which capture important aspects of functions like control intensity, data intensity, code size, etc. In order to remedy the interdependence of the software metrics a Principal Component Analysis was performed. The hardware characteristics were determined by automatically generating VHDL from C using the DWARV C-to-VHDL compiler. Using the results from the principal component analysis, the quantitative model was generated using linear regression. The error of the model differs per hardware characteristic. We show that for flip-flops the mean error for the predictions is 69%. In conclusion, our quantitative model can make fast and sufficiently accurate area predictions to support Hardware/Software Partitioning. In the future, the model can be extended by introducing extra software metrics, using more advanced modeling techniques, and using a larger collection of functions and algorithms.
[Meeuws et al., 2007]
R. J. Meeuws, Y. D. Yankova, K.L.M. Bertels, G. N. Gaydadjiev, and S. Vassiliadis. A quantitative prediction model for hardware/software partitioning. In Proceedings of 17th International Conference on Field Programmable Logic and Applications (FPL'07), page 5, August 2007.
An important step in Heterogeneous System Development is Hardware/Software Partitioning. This process involves exploring a huge design space. By using profiling to select hot-spots and estimate area and delay we can prune the design space considerably. We present a Quantitative Model that makes early predictions to prune the design space and support the partitioning process. The model is based on Software Complexity Metrics, which capture important aspects of functions such as control intensity, data intensity, and code size. To remedy interdependence among software metrics, we performed a Principal Component Analysis. The hardware characteristics were determined by automatically generating VHDL from C using the DWARV C-to-VHDL compiler. Linear regression on these data generated our model. The model error differs per hardware characteristic. We show that for flip-flops the mean error is 69%. In conclusion, our quantitative model makes fast and sufficiently accurate area predictions in support of early Hardware/Software Partitioning.
[Virginia, 2007]
Arcilio Jaime-Raul Virginia. Comparative study of vhdl generators. Master's thesis, Delft University of Technology, Delft, The Netherlands, May 2007.
[Yankova et al., 2007]
Y. D. Yankova, K.L.M. Bertels, S. Vassiliadis, R. J. Meeuws, and A.J.R. Virginia. Automated hdl generation: Comparative evaluation. In Proceedings of International Symposium on Circuits and Systems (ISCAS2007), May 2007.
[Larsen]
Pia Veldt Larsen. St111 - regression and analysis of variance.
[weba]
Delft workbench.
A semi-automatic tool platform for integrated hardware-software co-design targeting heterogeneous computing systems containing reconfigurable components. Delft Workbench addresses the entire design cycle rather than isolated parts. It involves the development of compilers for reconfigurable platforms, programming models, hardware software co-design, CAD and design space exploration software, optimization algorithms and integration software development. The Delft Workbench targets the MOLEN machine organisation.
[webb]
Elkhound and elsa.
Elkhound is a parser generator, similar to Bison. The parsers it generates use the Generalized LR (GLR) parsing algorithm. GLR works with any context-free grammar, whereas LR parsers (such as Bison) require grammars to be LALR(1).
[webc]
The r project for statistical computing.
R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R. R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.
[webd]
Roccc website.
ROCCC is a C to hardware compilation project whose objective is the FPGA-based acceleration of frequently executed code segments (loop nests). Its focus is on extensive compile-time transformations and optimizations with the aim of 1) Maximizing parallelism by exploiting functional, loop and operation parallelism, 2) Maximizing the throughput of a computation, 3) Minimizing the number of off-FPGA memory accesses, 4) Minimizing the area occupied by the circuit.
[webe]
Spark: High-level synthesis using parallelizing compiler techniques.
SPARK is a C-to-VHDL high-level synthesis framework that employs a set of innovative compiler, parallelizing compiler, and synthesis transformations to improve the quality of high-level synthesis results. The compiler transformations have been re-instrumented for synthesis by incorporating ideas of mutual exclusivity of operations, resource sharing and hardware cost models. The SPARK parallelizing high-level synthesis methodology is particularly targeted to multimedia and image processing applications along with control-intensive microprocessor functional blocks.