Matrix Multiply on the UltraSparc I - large matrix performance

This report shows how to get sustained performance of 250-270 MFLOPS for large matrix multiply on a 170 MHz Ultra 1 (the Sun Performance Library 1.2 DGEMM() sustains 220-230 MFLOPS). This slide explains things more succinctly.
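
For readers who want to reproduce figures of this kind, the following is a minimal sketch of how an MFLOPS rate for a matrix multiply is conventionally computed, assuming the usual count of one multiply and one add per inner-loop step (2*m*n*k operations in total). The function names matmul_flops and mflops are hypothetical and are not part of the UBLAS; the report does not state exactly how its own timings were taken.

```c
#include <stddef.h>

/* Floating-point operations in C = A*B, where A is m x k and B is k x n.
   Assumes the conventional count: one multiply and one add per
   inner-loop step, i.e. 2*m*n*k operations in total. */
double matmul_flops(size_t m, size_t n, size_t k)
{
    return 2.0 * (double)m * (double)n * (double)k;
}

/* Sustained rate in MFLOPS (millions of floating-point operations per
   second), given the elapsed time of the multiply in seconds. */
double mflops(size_t m, size_t n, size_t k, double seconds)
{
    return matmul_flops(m, n, k) / (seconds * 1.0e6);
}
```

As an illustration only, a hypothetical 1000 x 1000 multiply that takes 8 seconds works out to mflops(1000, 1000, 1000, 8.0) = 250.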

UltraSparc BLAS (UBLAS)

Current implementation (8/9/98) has:
Performance results (on UltraSPARC I 170 MHz):
To use these codes:
  1. first obtain a version of the BLAS (e.g. from netlib), link that into your application and THOROUGHLY test your application.
  2. then re-link your application, putting the UBlas.a archive *before* the other BLAS archive.
  3. re-run your application!
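
One way to approach the testing in steps 1 and 3 is to compare the output of whichever DGEMM is linked in against a simple reference multiply. The sketch below is an assumption about how such a check might look, not the author's test procedure. The names ref_matmul and max_abs_diff are hypothetical. Matrices are stored column-major with a leading dimension, as the Fortran BLAS expect.

```c
#include <stddef.h>

/* Reference C = A*B using a plain triple loop.
   All matrices are column-major (Fortran layout):
   element (i,j) of A is a[i + j*lda]. A is m x k, B is k x n, C is m x n. */
void ref_matmul(size_t m, size_t n, size_t k,
                const double *a, size_t lda,
                const double *b, size_t ldb,
                double *c, size_t ldc)
{
    size_t i, j, p;
    for (j = 0; j < n; j++) {
        for (i = 0; i < m; i++) {
            double sum = 0.0;
            for (p = 0; p < k; p++)
                sum += a[i + p * lda] * b[p + j * ldb];
            c[i + j * ldc] = sum;
        }
    }
}

/* Largest absolute element-wise difference between two m x n
   column-major matrices, e.g. a library result and the reference. */
double max_abs_diff(size_t m, size_t n,
                    const double *x, size_t ldx,
                    const double *y, size_t ldy)
{
    size_t i, j;
    double worst = 0.0;
    for (j = 0; j < n; j++) {
        for (i = 0; i < m; i++) {
            double d = x[i + j * ldx] - y[i + j * ldy];
            if (d < 0.0)
                d = -d;
            if (d > worst)
                worst = d;
        }
    }
    return worst;
}
```

Running the same comparison once with only the original BLAS linked and again after re-linking with UBlas.a first should give differences of similar, rounding-level size. A small nonzero difference is expected, since an optimized routine may sum the products in a different order.
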
Disclaimer: Anything free comes with no guarantee!
  1. As this is the alpha release of the UBLAS, anyone using these codes does so entirely at their own risk, and should *not* assume that the code is free of bugs, major or minor.
  2. The author accepts *no* responsibility for any loss or misfortune of others as a result of using these codes.
Having read the above Disclaimer, click here to obtain the UBLAS 1.0 alpha archive file (12K gzipped archive; compiled under Solaris 5.6).