Research

My research interests are centered on the challenge of making software run faster and more power-efficiently on modern hardware. My primary interests include: microarchitectural support for managed languages, fast and efficient garbage collection, and the design and implementation of virtual machines. As a backdrop to this, I have a longstanding interest in the role of sound methodology and infrastructure in successful research innovation. Read more here.

News

James Bornholt was runner-up in the undergraduate division of the ACM Student Research Competition, following his win in the undergraduate division of the ACM PLDI SRC. Both were for his work on Uncertain<T>, which was published as a full paper at ASPLOS 2014.
Kathryn, Perry and I received the ACM SIGMETRICS 2014 Test of Time Award for our paper Myths and Realities: The Performance Impact of Garbage Collection. Doing that work with Perry and Kathryn was a complete blast; ambitious and fun. Definitely a career highlight.
Vivek’s 2012 OOPSLA paper Work Stealing Without the Baggage was selected as a SIGPLAN Research Highlights Paper in May 2013. Vivek is now at Rice, working with Vivek Sarkar.

Select Recent Publications

  • X. Yang, S. M. Blackburn, and K. S. McKinley, "Computer Performance Microscopy with Shim," in ISCA ’15: The 42nd International Symposium on Computer Architecture, 2015.
    Developers and architects spend a lot of time trying to understand and eliminate performance problems. Unfortunately, the root causes of many problems occur at a fine granularity that existing continuous profiling and direct measurement approaches cannot observe. This paper presents the design and implementation of Shim, a continuous profiler that samples at resolutions as fine as 15 cycles; three to five orders of magnitude finer than current continuous profilers. Shim’s fine-grain measurements reveal new behaviors, such as variations in instructions per cycle (IPC) within the execution of a single function. A Shim observer thread executes and samples autonomously on unutilized hardware. To sample, it reads hardware performance counters and memory locations that store software state. Shim improves its accuracy by automatically detecting and discarding samples affected by measurement skew. We measure Shim’s observer effects and show how to analyze them. When on a separate core, Shim can continuously observe one software signal with a 2% overhead at a ~1200 cycle resolution. At an overhead of 61%, Shim samples one software signal on the same core with SMT at a ~15 cycle resolution. Modest hardware changes could significantly reduce overheads and add greater analytical capability to Shim. We vary prefetching and DVFS policies in case studies that show the diagnostic power of fine-grain IPC and memory bandwidth results. By repurposing existing hardware, we deliver a practical tool for fine-grain performance microscopy for developers and architects.
    @InProceedings{XBM:15,
      author = {Yang, Xi and Blackburn, Stephen M. and McKinley, Kathryn S.},
      title = {Computer Performance Microscopy with Shim},
      booktitle = {ISCA '15: The 42nd International Symposium on Computer Architecture},
      year = {2015},
      location = {Portland, OR},
      publisher = {IEEE},
      }
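    As a rough illustration of the observer-thread idea the abstract describes (a thread reading a timestamp and a memory location holding software state in a tight loop, and discarding skewed samples), here is a minimal C sketch. It is not Shim itself: the buffer size, skew threshold, and observed signal are placeholders chosen for the example.

    /* Minimal observer-thread sketch (illustrative only, not Shim).
     * Compile with: cc -O2 -pthread observer.c   (x86 only, for __rdtsc) */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <x86intrin.h>              /* __rdtsc() */

    #define MAX_SAMPLES (1 << 20)
    #define SKEW_CYCLES 200             /* placeholder skew threshold */

    /* The "software signal": the application stores its state here. */
    static _Atomic uint64_t observed_signal;
    static _Atomic int done;

    struct sample { uint64_t tsc; uint64_t value; };
    static struct sample samples[MAX_SAMPLES];
    static size_t nsamples;

    /* Runs on otherwise-idle hardware, sampling as fast as it can. */
    static void *observer(void *arg) {
        (void)arg;
        while (!atomic_load_explicit(&done, memory_order_relaxed)
               && nsamples < MAX_SAMPLES) {
            uint64_t t0 = __rdtsc();
            uint64_t v  = atomic_load_explicit(&observed_signal,
                                               memory_order_relaxed);
            uint64_t t1 = __rdtsc();
            /* Skew detection: if the sample itself took suspiciously long
             * (e.g. the observer was interrupted), discard it. */
            if (t1 - t0 < SKEW_CYCLES)
                samples[nsamples++] = (struct sample){ t0, v };
        }
        return NULL;
    }

    int main(void) {
        pthread_t tid;
        pthread_create(&tid, NULL, observer, NULL);

        /* Stand-in for the profiled application: keep updating the signal. */
        for (uint64_t i = 0; i < 10000000ULL; i++)
            atomic_store_explicit(&observed_signal, i, memory_order_relaxed);

        atomic_store_explicit(&done, 1, memory_order_relaxed);
        pthread_join(tid, NULL);
        printf("collected %zu samples\n", nsamples);
        return 0;
    }

    A real observer would also read hardware performance counters and be pinned to a spare core or SMT sibling; those details are omitted here.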
  • Y. Lin, K. Wang, S. M. Blackburn, M. Norrish, and A. L. Hosking, "Stop and Go: Understanding Yieldpoint Behavior," in Proceedings of the Fourteenth ACM SIGPLAN International Symposium on Memory Management, ISMM ’15, Portland, OR, June 14, 2015.
    Yieldpoints are critical to the implementation of high performance garbage collected languages, yet the design space is not well understood. Yieldpoints allow a running program to be interrupted at well-defined points in its execution, facilitating exact garbage collection, biased locking, on-stack replacement, profiling, and other important virtual machine behaviors. In this paper we identify and evaluate yieldpoint design choices, including previously undocumented designs and optimizations. One of the designs we identify opens new opportunities for very low overhead profiling. We measure the frequency with which yieldpoints are executed and establish a methodology for evaluating the common case execution time overhead. We also measure the median and worst case time-to-yield. We find that Java benchmarks execute about 100 M yieldpoints per second, of which about 1/20000 are taken. The average execution time overhead for untaken yieldpoints on the VM we use ranges from 2.5% to close to zero on modern hardware, depending on the design, and we find that the designs trade off total overhead with worst case time-to-yield. This analysis gives new insight into a critical but overlooked aspect of garbage collector implementation, and identifies a new optimization and new opportunities for very low overhead profiling.
    @InProceedings{LWB+:15,
      author = {Yi Lin and Kunshan Wang and Stephen M Blackburn and Michael Norrish and Antony L Hosking},
      title = {Stop and Go: Understanding Yieldpoint Behavior},
      booktitle = {Proceedings of the Fourteenth ACM SIGPLAN International Symposium on Memory Management, ISMM '15, Portland, OR, June 14, 2015},
      year = {2015},
      doi = {http://dx.doi.org/10.1145/2754169.2754187},
      }
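    For readers unfamiliar with the mechanism, the following is a minimal C sketch of the conditional ("polling") yieldpoint design, one of the designs the paper evaluates. The identifiers and structure are illustrative only, not taken from the paper or from any particular VM; the paper also studies trap-based designs, in which the poll is an unconditional load from a page the VM protects in order to force a fault.

    /* Conditional yieldpoint sketch (illustrative only). A compiler would
     * inline a poll like this at method prologues and loop back-edges. */
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    typedef struct thread_context {
        _Atomic bool yield_requested;   /* set asynchronously by the VM */
        /* ... stack maps, allocation buffers, etc. would live here ... */
    } thread_context;

    /* Slow path: reached only when a yield has been requested (rare). */
    static void yieldpoint_taken(thread_context *ctx) {
        /* A real VM would block here for GC, on-stack replacement,
         * profiling, biased-lock revocation, and so on. */
        atomic_store(&ctx->yield_requested, false);
    }

    /* The poll itself: the "untaken" cost is one load and one branch. */
    static inline void yieldpoint(thread_context *ctx) {
        if (atomic_load_explicit(&ctx->yield_requested, memory_order_relaxed))
            yieldpoint_taken(ctx);
    }

    int main(void) {
        thread_context ctx = { .yield_requested = false };
        long sum = 0;
        for (long i = 0; i < 1000000; i++) {
            sum += i;
            if (i == 500000)            /* simulate the VM requesting a yield */
                atomic_store(&ctx.yield_requested, true);
            yieldpoint(&ctx);           /* loop back-edge poll */
        }
        printf("%ld\n", sum);
        return 0;
    }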
  • K. Wang, Y. Lin, S. M. Blackburn, M. Norrish, and A. L. Hosking, "Draining the Swamp: Micro Virtual Machines as Solid Foundation for Language Development," in 1st Summit on Advances in Programming Languages (SNAPL 2015), 2015.
    Many of today’s programming languages are broken. Poor performance, lack of features and hard-to-reason-about semantics can cost dearly in software maintenance and inefficient execution. The problem is only getting worse with programming languages proliferating and hardware becoming more complicated.

    An important reason for this brokenness is that much of language design is implementation-driven. The difficulties in implementation and insufficient understanding of concepts bake bad designs into the language itself. Concurrency, architectural details and garbage collection are three fundamental concerns that contribute much to the complexities of implementing managed languages.

    We propose the micro virtual machine, a thin abstraction designed specifically to relieve implementers of managed languages of the most fundamental implementation challenges that currently impede good design. The micro virtual machine targets abstractions over memory (garbage collection), architecture (compiler backend), and concurrency. We motivate the micro virtual machine and give an account of the design and initial experience of a concrete instance, which we call Mu, built over a two year period. Our goal is to remove an important barrier to performant and semantically sound managed language design and implementation.

    @InProceedings{WLB+:15,
      author = {Kunshan Wang and Yi Lin and Stephen M Blackburn and Michael Norrish and Antony L Hosking},
      title = {Draining the Swamp: Micro Virtual Machines as Solid Foundation for Language Development},
      booktitle = {1st Summit on Advances in Programming Languages (SNAPL 2015)},
      year = {2015},
      doi = {http://dx.doi.org/10.4230/LIPIcs.SNAPL.2015.321},
      }

A full list of my publications appears here.

Prospective Students

I’m always looking for bright students.  If you’re interested in doing research work with me, please read this before you contact me.