Abstract

Recent advances in computer performance have been hindered by the physical limitations of current state-of-the-art semiconductor manufacturing technology. Steady performance growth through ever-higher operating frequencies is no longer possible.

On the one hand, we are "Hitting the Memory Wall"[1]: we need to increase the cache size to reduce the probability of cache misses. On the other hand, the larger cache and the resulting transistor count increase static and dynamic current leakage[2], which results in an exponential growth of power consumption.

To keep up with the steady demand for increased performance, the major microprocessor manufacturers have made a paradigm shift towards multicore and many-core computer architecture designs.

This trend goes as far as integrating a very large number of simple processors onto a single die. This type of architecture is excellent for high-performance acceleration of domain-specific tasks. To achieve the best possible results, these accelerator platforms should be coupled with general-purpose microprocessors, which can take over the burden of running the operating system. Recent advancements in GPGPU technology, along with steadily growing FPGA performance, offer other pathways to alternative acceleration platforms.

The IBM Cyclops64 chip is part of a petaflop-class supercomputer architecture. This chip is a multicore architecture with a very large number of execution cores, memory banks and other components integrated on a single die. These chip components are interconnected via the C64 Crossbar Switch, an efficient interconnection network. Simulation of such an interconnection network is a very important task throughout the design and implementation process.

This thesis describes the design and implementation of, and experimentation with, an environment that may be used for acceleration, verification and validation of this interconnection network. In addition, a latency-accurate Cyclops64 architectural simulator environment has been extended and accelerated.

Under the iterative emulation technology first proposed at CAPSL, named "DIMES"[3], a portion of the FPGA resources is time-shared among several identical modules of the target design and used iteratively to emulate them in multiple steps. The representation of the identical modules in the FPGA consists of (1) a single module copy and (2) a storage block holding the states of all the modules during iterative emulation. With the help of this technology, the Cyclops32[4, 5] chip and the Cyclops64 Crossbar Switch[6] were previously implemented on the AlphaData[7] platform. Additionally, the Cyclops64 chip has recently been fully implemented on the IBM MrsClops[8] Emulation Engine.
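The time-sharing idea can be illustrated with a small software sketch. This is not the DIMES implementation, only a minimal model under stated assumptions: the module behavior (`module_step`, an 8-bit accumulator), the ring wiring between modules, and all names are hypothetical and chosen for illustration. It shows one shared copy of the module logic being applied in turn to each entry of a state store, with the previous cycle's values snapshotted so that the sequential iteration reproduces what parallel hardware would compute.

```python
from typing import List


def module_step(state: int, data_in: int) -> int:
    """Single shared module copy (hypothetical logic).

    Models one clock cycle of one module: an 8-bit accumulator
    that adds its input to its current state.
    """
    return (state + data_in) % 256


def emulate(initial_states: List[int], cycles: int) -> List[int]:
    """Emulate N identical modules with one module copy and a state store.

    Assumption: module i reads the value that module i-1 held at the
    start of the cycle (a ring), standing in for the real interconnect.
    """
    states = list(initial_states)  # storage block: one saved state per module
    n = len(states)
    for _ in range(cycles):
        # Snapshot the values visible at the start of the emulated cycle,
        # so updating module i does not disturb what module i+1 reads.
        visible = list(states)
        for i in range(n):  # iterate the single copy over all modules
            states[i] = module_step(states[i], visible[(i - 1) % n])
    return states


if __name__ == "__main__":
    print(emulate([1, 2, 3, 4], 1))  # prints [5, 3, 5, 7]
```

One emulated clock cycle therefore costs N passes through the shared logic, which is the trade-off such a scheme makes: less logic area in exchange for more steps per cycle.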

Major contributions of this document are:

(i) We have ported the Cyclops64 interconnection network logic onto several state-of-the-art FPGA coprocessing accelerator platforms. Increasing emulation speed and supporting new logic designs of the Cyclops64 architecture were the main driving forces for this work. Platforms such as the XtremeData[9] XD1000 and the DRC Computer[10] DS1000 were used. Working on those novel platforms was a particularly interesting and challenging experience. We had to work on a range of different FPGA devices, and we faced and solved problems associated with bugs in vendor-provided user interface logic, documentation and hardware device implementation. Throughout the process, we provided valuable feedback to the platform designers. Future generations of these platforms will benefit from the resulting upgrades.

(ii) With the use of those FPGA accelerator platforms, and based on the work of Fei Chen on the "LAST"¹[11] simulator, we were able to create a new type of computer architecture simulation. By combining software simulation with hardware emulation, an approach called the "SEmulator,"² we were able to improve the "LAST" simulator. Using the accelerated "DIMES" emulation of the Cyclops64 interconnection network, we have dramatically increased the performance of this Cyclops64 architecture simulator.

(iii) The newly developed verification utility for the Cyclops64 interconnection network has proven itself an excellent tool for design verification and evaluation in various stages of development, such as verifying the initial KSM³ design for AsapSim⁴[12] simulation. It can provide VHDL test benches for design debugging during the creation of FPGA-based emulation. In addition, it can be used for the evaluation of these designs on the FPGA accelerator platforms. The underlying framework also ensures portability of the "SEmulator" across various emulation platforms. With this tool we have also demonstrated that the various interconnection network designs work as expected.

¹ Latency Accurate Software Testbench - Cyclops64 Architecture Simulator. Created by Fei Chen at CAPSL.
² First named "SEmulator" by Fei Chen.
³ HDL language created at IBM. The Cyclops64 logic design is created in this language.
⁴ Software simulator of the MrsClops Emulation Engine, created by Fei Chen at CAPSL.

Details

Title: A study of simulation and verification of a many-core architecture on two modern reconfigurable platforms
Author: Krepis, Dimitrij
Year: 2007
Publisher: ProQuest Dissertations Publishing
ISBN: 978-0-549-18384-6
Source type: Dissertation or Thesis
Language of publication: English
ProQuest document ID: 304861012
Copyright: Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.