Simulation is an integral tool in performance analysis, but without some
knowledge of a simulator's underlying accuracy and limitations, the results
may prove wrong or misleading. Timing validation is one aspect of simulator
development that is easy to overlook, typically because no comparison
target is available at the time the simulator is written. This paper
discusses the design and validation of an accurate timing model for an
\ultrasparciiicu{}--based system. An existing functional simulator was
augmented with a cycle-accurate model of the memory hierarchy of a reference
system.
Key features of the model include the use of a `bridge' for the
processor/memory-system interface, the use of event windows between the
simulated backplane and processors, the implementation of pipelined
transactions, and an extension of the processor run loop to support them. The modelling
of the store buffer and prefetch mechanisms proved both challenging and
important for the model's accuracy.
Using a combination of documentation, microbenchmarks, and comparisons of
NAS Parallel Benchmarks runs on the simulator and on a real machine, it was
possible to uncover several undocumented architectural artifacts and to
validate the simulator to a reasonable degree. Hardware performance
counters and timing information were used to identify the source of
discrepancies. Surprisingly, introducing the model slowed simulation by no
more than a factor of two relative to the original functional simulator.