Research

My research interests are centered on the challenge of making software run faster and more power-efficiently on modern hardware.  My primary interests include microarchitectural support for managed languages, fast and efficient garbage collection, and the design and implementation of virtual machines.  As a backdrop to this, I have a longstanding interest in the role of sound methodology and infrastructure in successful research innovation. Read more here.

News

Ting Cao graduated in November 2015 after completing her PhD, which includes her landmark work on the way we think about energy, power, and performance. Ting holds a distinguished fellowship at the Chinese Academy of Sciences.

Rifat Shahriyar graduated in July 2015 after completing a PhD that changed the way we think about reference counting.  Rifat is now a professor at BUET.

Select Recent Publications

  • I. Jibaja, T. Cao, S. M. Blackburn, and K. S. McKinley, "Portable Performance on Asymmetric Multicore Processors," in Proceedings of the 14th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2016.
    FOR subject classification codes: 100605, 080308, 100606
    @InProceedings{JCBM:16,
      author = {Jibaja, Ivan and Cao, Ting and Blackburn, Stephen M. and McKinley, Kathryn S.},
      title = {Portable Performance on Asymmetric Multicore Processors},
      booktitle = {Proceedings of the 14th Annual IEEE/ACM International Symposium on Code Generation and Optimization},
      year = 2016, month = feb, location = {Barcelona, Spain},
      publisher = {IEEE},
      }
  • X. Yang, S. M. Blackburn, and K. S. McKinley, "Computer Performance Microscopy with Shim," in ISCA '15: The 42nd International Symposium on Computer Architecture, 2015.
    FOR subject classification codes: 100605, 080308, 100606
    Developers and architects spend a lot of time trying to understand and eliminate performance problems. Unfortunately, the root causes of many problems occur at a fine granularity that existing continuous profiling and direct measurement approaches cannot observe. This paper presents the design and implementation of Shim, a continuous profiler that samples at resolutions as fine as 15 cycles; three to five orders of magnitude finer than current continuous profilers. Shim’s fine-grain measurements reveal new behaviors, such as variations in instructions per cycle (IPC) within the execution of a single function. A Shim observer thread executes and samples autonomously on unutilized hardware. To sample, it reads hardware performance counters and memory locations that store software state. Shim improves its accuracy by automatically detecting and discarding samples affected by measurement skew. We measure Shim’s observer effects and show how to analyze them. When on a separate core, Shim can continuously observe one software signal with a 2% overhead at a ~1200 cycle resolution. At an overhead of 61%, Shim samples one software signal on the same core with SMT at a ~15 cycle resolution. Modest hardware changes could significantly reduce overheads and add greater analytical capability to Shim. We vary prefetching and DVFS policies in case studies that show the diagnostic power of fine-grain IPC and memory bandwidth results. By repurposing existing hardware, we deliver a practical tool for fine-grain performance microscopy for developers and architects.
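    (See the sketch after this publication list for a minimal illustration of Shim’s observer-thread sampling idea.)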
    @Inproceedings{XBM:15,
      author = {Yang, Xi and Blackburn, Stephen M. and McKinley, Kathryn S.},
      title = {Computer Performance Microscopy with {Shim}},
      booktitle = {ISCA '15: The 42nd International Symposium on Computer Architecture},
      year = {2015},
      location = {Portland, OR},
      publisher = {IEEE},
      }
  • Y. Lin, K. Wang, S. M. Blackburn, M. Norrish, and A. L. Hosking, "Stop and Go: Understanding Yieldpoint Behavior," in Proceedings of the Fourteenth ACM SIGPLAN International Symposium on Memory Management, ISMM '15, Portland, OR, June 14, 2015, 2015.
    FOR subject classification codes: 080308, 080501
    Yieldpoints are critical to the implementation of high performance garbage collected languages, yet the design space is not well understood. Yieldpoints allow a running program to be interrupted at well-defined points in its execution, facilitating exact garbage collection, biased locking, on-stack replacement, profiling, and other important virtual machine behaviors. In this paper we identify and evaluate yieldpoint design choices, including previously undocumented designs and optimizations. One of the designs we identify opens new opportunities for very low overhead profiling. We measure the frequency with which yieldpoints are executed and establish a methodology for evaluating the common case execution time overhead. We also measure the median and worst case time-to-yield. We find that Java benchmarks execute about 100 M yieldpoints per second, of which about 1/20000 are taken. The average execution time overhead for untaken yieldpoints on the VM we use ranges from 2.5% to close to zero on modern hardware, depending on the design, and we find that the designs trade off total overhead with worst case time-to-yield. This analysis gives new insight into a critical but overlooked aspect of garbage collector implementation, and identifies a new optimization and new opportunities for very low overhead profiling.
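    (See the sketch after this publication list for a minimal illustration of a conditional yieldpoint.)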
    @InProceedings{LWB+:15,
      author = {Yi Lin and Kunshan Wang and Stephen M Blackburn and Michael Norrish and Antony L Hosking},
      title = {Stop and Go: Understanding Yieldpoint Behavior},
      booktitle = {Proceedings of the Fourteenth ACM SIGPLAN International Symposium on Memory Management, ISMM '15, Portland, OR, June 14, 2015},
      year = {2015},
      doi = {http://dx.doi.org/10.1145/2754169.2754187},
      }
  • K. Wang, Y. Lin, S. M. Blackburn, M. Norrish, and A. L. Hosking, "Draining the Swamp: Micro Virtual Machines as Solid Foundation for Language Development," in 1st Summit on Advances in Programming Languages (SNAPL 2015), 2015.
    FOR subject classification codes: 080308, 080501
    Many of today’s programming languages are broken. Poor performance, lack of features and hard-to-reason-about semantics can cost dearly in software maintenance and inefficient execution. The problem is only getting worse with programming languages proliferating and hardware becoming more complicated.

    An important reason for this brokenness is that much of language design is implementation-driven. The difficulties in implementation and insufficient understanding of concepts bake bad designs into the language itself. Concurrency, architectural details and garbage collection are three fundamental concerns that contribute much to the complexities of implementing managed languages.

    We propose the micro virtual machine, a thin abstraction designed specifically to relieve implementers of managed languages of the most fundamental implementation challenges that currently impede good design. The micro virtual machine targets abstractions over memory (garbage collection), architecture (compiler backend), and concurrency. We motivate the micro virtual machine and give an account of the design and initial experience of a concrete instance, which we call Mu, built over a two year period. Our goal is to remove an important barrier to performant and semantically sound managed language design and implementation.

    @InProceedings{WLB+:15,
      author = {Kunshan Wang and Yi Lin and Stephen M Blackburn and Michael Norrish and Antony L Hosking},
      title = {Draining the Swamp: Micro Virtual Machines as Solid Foundation for Language Development},
      booktitle = {1st Summit on Advances in Programming Languages (SNAPL 2015)},
      year = {2015},
      doi = {http://dx.doi.org/10.4230/LIPIcs.SNAPL.2015.321},
      }
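
To make the observer-thread idea behind Shim concrete, here is a minimal Java sketch: a dedicated thread busy-polls a shared "software signal" and records timestamped samples, rather than waiting for coarse interrupt-driven samples. This is an invented illustration, not Shim's implementation; the class and field names are made up, and System.nanoTime() stands in for the hardware timestamps and performance-counter reads the real Shim performs.

    import java.util.concurrent.atomic.AtomicInteger;

    // Hypothetical sketch of an observer thread sampling a software signal at fine grain.
    public final class ObserverSketch {
        static final int MAX_SAMPLES = 1_000_000;
        static final long[] times = new long[MAX_SAMPLES];   // when each sample was taken
        static final long[] values = new long[MAX_SAMPLES];  // what the signal held at that moment
        static final AtomicInteger taken = new AtomicInteger();

        // The "software signal" being observed: a value the application updates as it runs.
        static volatile long signal = 0;
        static volatile boolean running = true;

        public static void main(String[] args) throws InterruptedException {
            Thread observer = new Thread(() -> {
                int i = 0;
                while (running && i < MAX_SAMPLES) {
                    times[i] = System.nanoTime();  // timestamp the observation
                    values[i] = signal;            // read the signal
                    i++;
                }
                taken.set(i);
            });
            observer.start();

            // The "application": update the signal in a hot loop for a short while.
            for (long i = 0; i < 100_000_000L; i++) {
                signal = i;
            }
            running = false;
            observer.join();

            int n = taken.get();
            long spacing = n > 1 ? (times[n - 1] - times[0]) / (n - 1) : 0;
            System.out.println(n + " samples taken, roughly " + spacing + " ns apart");
        }
    }

As the abstract notes, the real observer executes autonomously on unutilized hardware (a separate core or SMT lane), so the tight sampling loop does not perturb the application it is measuring.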
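
In the same spirit, a conditional (check-based) yieldpoint of the kind discussed in the yieldpoint paper can be sketched in plain Java. In a real VM the compiler inlines the check at method prologues and loop back-edges and the slow path performs a VM-level handshake; the YieldpointSketch class below is an invented, simplified illustration of the untaken fast path (a load and a branch) and a taken slow path that parks at a safepoint.

    // Hypothetical sketch of a conditional yieldpoint and a coordinator that takes it.
    public final class YieldpointSketch {
        // Set by a coordinating "VM" thread when application threads must stop,
        // e.g. to reach a garbage collection safepoint.
        private static volatile boolean yieldRequested = false;
        private static final Object handshake = new Object();

        // Untaken fast path: a single load and branch.
        // Taken slow path: park until the coordinator releases the thread.
        static void yieldpoint() {
            if (yieldRequested) {
                synchronized (handshake) {
                    while (yieldRequested) {
                        try { handshake.wait(); } catch (InterruptedException ignored) { }
                    }
                }
            }
        }

        public static void main(String[] args) throws InterruptedException {
            Thread worker = new Thread(() -> {
                long sum = 0;
                for (long i = 0; i < 500_000_000L; i++) {
                    sum += i;
                    yieldpoint();              // a loop back-edge yieldpoint
                }
                System.out.println("worker done: " + sum);
            });
            worker.start();

            Thread.sleep(50);
            yieldRequested = true;             // ask the worker to stop at its next yieldpoint
            Thread.sleep(50);                  // ...the "VM" would do its work here...
            synchronized (handshake) {
                yieldRequested = false;        // release the worker
                handshake.notifyAll();
            }
            worker.join();
        }
    }

For a conditional design like the one sketched, the untaken fast path costs only a load of the flag and an almost always correctly predicted branch, which is consistent with the low untaken-yieldpoint overheads the abstract reports.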

A full list of my publications appears here.

Prospective Students

I’m always looking for bright students.  If you’re interested in doing research work with me, please read this before you contact me.