Simulation plays an important role in analyzing the performance of multiprocessor memory systems, but detailed simulation can consume enormous amounts of CPU time. This dissertation investigates three techniques that can reduce the time required to simulate a multiprocessor memory system: trace-driven simulation, simplifying approximations in memory-system simulators, and parallel simulation.
Trace-driven simulation allows multiple target systems to be simulated from a single interpretation of a workload, which is recorded in a trace. It is often assumed that the trace is system-independent, but this assumption is violated by parallel workloads that have race conditions or use dynamic scheduling. This dissertation measures and analyzes the impact of system-dependent traces on the accuracy of trace-driven simulations of multiprocessors.
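As a minimal illustration of the idea, and not the simulator used in this dissertation, the following Python sketch replays one recorded trace against several hypothetical cache configurations. The trace contents, the `DirectMappedCache` class, and the `replay` function are assumptions made for this example; the model gives each processor a private direct-mapped cache and ignores coherence entirely. The sketch also relies on the system-independence assumption discussed above: the same reference order is reused for every target configuration.

```python
from dataclasses import dataclass, field


@dataclass
class DirectMappedCache:
    """Toy direct-mapped cache that only counts misses (hypothetical model)."""
    num_lines: int
    line_size: int
    tags: dict = field(default_factory=dict)
    misses: int = 0

    def access(self, addr: int) -> None:
        block = addr // self.line_size   # memory block containing addr
        index = block % self.num_lines   # cache line the block maps to
        if self.tags.get(index) != block:
            self.tags[index] = block     # fill the line on a miss
            self.misses += 1


def replay(trace, num_lines, line_size):
    """Replay a (cpu, addr) trace on per-CPU caches; return total misses."""
    caches = {}
    for cpu, addr in trace:
        if cpu not in caches:
            caches[cpu] = DirectMappedCache(num_lines, line_size)
        caches[cpu].access(addr)
    return sum(c.misses for c in caches.values())


# Hypothetical trace, recorded once: (cpu, byte address) in reference order.
TRACE = [(0, 0), (1, 0), (0, 4), (0, 16), (1, 16),
         (0, 0), (0, 32), (1, 32), (0, 4), (0, 16)]

# The same trace drives every target configuration (num_lines, line_size).
CONFIGS = [(1, 16), (4, 16), (4, 64)]
results = {cfg: replay(TRACE, *cfg) for cfg in CONFIGS}

for cfg, misses in results.items():
    print(cfg, misses)
```

The workload is interpreted once to produce `TRACE`; only the inexpensive replay step is repeated per configuration. If the real reference order would change with the cache configuration, as it can with races or dynamic scheduling, the replayed results may no longer match what the target system would do.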
Two such simplifying approximations for simulations of multiprocessor memory systems were investigated. Precise simulation requires that events originating from different parts of the system be simulated in the correct order. The investigation revealed that relaxing this constraint can double or triple simulator performance with little loss of accuracy. Precise simulation also requires simulating references to private data, which have little effect on performance. Neglecting private references accelerated simulation by up to 80 percent with little loss of accuracy.
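The first approximation, relaxed event ordering, can be sketched as follows. This is one possible way to relax ordering, chosen for illustration, and is not necessarily the scheme evaluated in the dissertation. The event format `(time, cpu, addr)`, the function names, and the `slack` parameter are all assumptions. Precise ordering merges the per-processor event streams by timestamp; the relaxed variant groups events into windows of `slack` cycles and drains one processor at a time within each window, so no event is reordered across a window boundary.

```python
import heapq


def precise_order(streams):
    """Merge time-sorted per-CPU streams into one global timestamp order."""
    return list(heapq.merge(*streams))


def relaxed_order(streams, slack):
    """Order events by window of `slack` cycles, then by CPU, then by time.

    Order across windows is preserved; inside a window each CPU's events
    are processed together, so cross-CPU order may differ from the
    precise order by less than `slack` cycles.
    """
    events = [e for stream in streams for e in stream]
    return sorted(events, key=lambda e: (e[0] // slack, e[1], e[0]))


# Hypothetical per-CPU event streams: (time, cpu, addr), sorted by time.
STREAMS = [
    [(1, 0, 0x100), (5, 0, 0x104), (12, 0, 0x100)],
    [(2, 1, 0x100), (6, 1, 0x200), (11, 1, 0x104)],
]

precise = precise_order(STREAMS)
relaxed = relaxed_order(STREAMS, slack=10)

print([e[0] for e in precise])
print([e[0] for e in relaxed])
```

With `slack=1` the relaxed order coincides with the precise order for this input. Larger values of `slack` let a simulator process longer runs of events from one source without consulting a global queue, at the cost of small ordering errors.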
To further increase speed, an existing simulator was parallelized to run on the DASH multiprocessor. The challenge was to preserve accuracy without serializing the simulation. Two approaches were evaluated: (1) ordering synchronization operations only and (2) relaxing the order of all events by a small amount. Both approaches yielded parallel speedups, but performance was limited by load imbalance. Overall, performance was comparable to that of hardware-assisted parallel simulators such as the Wisconsin Wind Tunnel.