In this paper we document our experiences with two methods of parallelizing Sparc Sulima, a simulator of UltraSPARC III Cu-based multiprocessor systems. In the first approach, a simple interconnect model within the simulator is parallelized non-deterministically using careful locking. In the second, a detailed interconnect model is parallelized while preserving determinism, using parallel discrete event simulation (PDES) techniques. Both approaches demonstrate a threefold speedup using 4 threads on workloads from the NAS Parallel Benchmarks, but speedup proved to be constrained by load balancing between simulated processors. A theoretical model is developed to help explain why the observed speedup is less than ideal.
An analysis of the related speed-accuracy tradeoff in the first approach, with respect to the simulation time quantum, is also given. The results show that, for both serial and parallel simulation, a quantum on the order of a few hundred cycles represents a sweet spot, but parallel simulation is significantly more accurate for a given quantum size. As with the speedup analysis, these effects are workload-dependent.