Research

My research interests are centered on the challenge of making software run faster and more power-efficiently on modern hardware. My primary interests include microarchitectural support for managed languages, fast and efficient garbage collection, and the design and implementation of virtual machines. As a backdrop to this, I have a longstanding interest in the role of sound methodology and infrastructure in successful research innovation. Read more here.

News

James Bornholt was runner up in the undergraduate division of the ACM Student Research Competition. This comes on the back of winning the undergraduate division of the ACM PLDI SRC. This was for his work on Uncertain<T>, which was published as a full paper at ASPLOS 2014.
Kathryn, Perry and I received the ACM SIGMETRICS 2014 Test of Time Award for our paper Myths and Realities: The Performance Impact of Garbage Collection. Doing that work with Perry and Kathryn was a complete blast; ambitious and fun. Definitely a career highlight.
Vivek’s 2012 OOPSLA paper Work Stealing Without the Baggage was selected as a SIGPLAN Research Highlights Paper in May 2013. Vivek is now at Rice, working with Vivek Sarkar.

Select Recent Publications

  • J. Sartor, W. Heirman, S. M. Blackburn, L. Eeckhout, and K. S. McKinley, "Cooperative Cache Scrubbing," in PACT '14: Proceedings of the 23rd International Conference on Parallel Architectures and Compilation Techniques, 2014.
    FOR subject classification codes: 080308, 100604, 100605
    Managing the limited resources of power and memory bandwidth while improving performance on multicore hardware is challenging. In particular, more cores demand more memory bandwidth, and multi-threaded applications increasingly stress memory systems, leading to more energy consumption. However, we demonstrate that not all memory traffic is necessary. For modern Java programs, 10 to 60% of DRAM writes are useless, because the data on these lines are dead – the program is guaranteed to never read them again. Furthermore, reading memory only to immediately zero initialize it wastes bandwidth. We propose a software/hardware cooperative solution: the memory manager communicates dead and zero lines with cache scrubbing instructions. We show how scrubbing instructions satisfy MESI cache coherence protocol invariants and demonstrate them in a Java Virtual Machine and multicore simulator. Scrubbing reduces average DRAM traffic by 59%, total DRAM energy by 14%, and dynamic DRAM energy by 57% on a range of configurations. Cooperative software/hardware cache scrubbing reduces memory bandwidth and improves energy efficiency, two critical problems in modern systems.
    @InProceedings{SHB+:14,
      author = {Jennifer Sartor and Wim Heirman and Stephen M. Blackburn and Lieven Eeckhout and Kathryn S. McKinley},
      title = {Cooperative Cache Scrubbing},
      booktitle = {PACT '14 Proceedings of the 23rd International Conference on Parallel Architectures and Compilation Techniques},
      year = {2014},
      month = {August},
      publisher = {IEEE},
      location = {Edmonton, Canada},
    }
  • V. Kumar, S. M. Blackburn, and D. Grove, "Friendly Barriers: Efficient Work-Stealing With Return Barriers," in VEE’14: Proceedings of the 2014 ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, 2014, pp. 165-176.
    FOR subject classification codes: 080308, 080501
    This paper addresses the problem of efficiently supporting parallelism within a managed runtime. A popular approach for exploiting software parallelism on parallel hardware is task parallelism, where the programmer explicitly identifies potential parallelism and the runtime then schedules the work. Work-stealing is a promising scheduling strategy that a runtime may use to keep otherwise idle hardware busy while relieving overloaded hardware of its burden. However, work-stealing comes with substantial overheads. Recent work identified sequential overheads of work-stealing, those that occur even when no stealing takes place, as a significant source of overhead. That work was able to reduce sequential overheads to just 15%.

    In this work, we turn to dynamic overheads, those that occur each time a steal takes place. We show that the dynamic overhead is dominated by introspection of the victim’s stack when a steal takes place. We exploit the idea of a low overhead return barrier to reduce the dynamic overhead by approximately half, resulting in total performance improvements of as much as 20%. Because, unlike prior work, we attack the overheads directly due to stealing and therefore attack the overheads that grow as parallelism grows, we improve the scalability of work-stealing applications. This result is complementary to recent work addressing the sequential overheads of work-stealing. This work therefore substantially relieves work-stealing of the increasing pressure due to increasing intra-node hardware parallelism.

    @InProceedings{KBG:14,
      author = {Vivek Kumar and Stephen M. Blackburn and David Grove},
      title = {Friendly Barriers: Efficient Work-Stealing With Return Barriers},
      booktitle = {VEE'14: Proceedings of the 2014 ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments},
      year = {2014},
      volume = {47},
      doi = {10.1145/2576195.2576207},
      pages = {165--176},
    }
  • R. Shahriyar, S. M. Blackburn, X. Yang, and K. S. McKinley, "Taking Off the Gloves with Reference Counting Immix," in OOPSLA '13: Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications, 2013.
    FOR subject classification codes: 080308, 100604
    Despite some clear advantages and recent advances, reference counting remains a poor cousin to high-performance tracing garbage collectors. The advantages of reference counting include a) immediacy of reclamation, b) incrementality, and c) local scope of its operations. After decades of languishing with hopelessly bad performance, recent work narrowed the gap between reference counting and the fastest tracing collectors to within 10%. Though a major advance, this gap remains a substantial barrier to adoption in performance-conscious application domains.

    Our work identifies heap organization as the principal source of the remaining performance gap. We present the design, implementation, and analysis of a new collector, RC Immix, that replaces reference counting’s traditional free-list heap organization with the line and block heap structure introduced by the Immix collector. The key innovations of RC Immix are 1) to combine traditional reference counts with per-line live object counts to identify reusable memory and 2) to eliminate fragmentation by integrating copying with reference counting of new objects and with backup tracing cycle collection. In RC Immix, reference counting offers efficient collection and the line and block heap organization delivers excellent mutator locality and efficient allocation. With these advances, RC Immix closes the 10% performance gap, matching the performance of a highly tuned production generational collector. By removing the performance barrier, this work transforms reference counting into a serious alternative for meeting high performance objectives for garbage collected languages.

    @InProceedings{SBYM:13,
      author = {Shahriyar, Rifat and Blackburn, Stephen M. and Yang, Xi and McKinley, Kathryn S.},
      title = {Taking Off the Gloves with Reference Counting Immix},
      booktitle = {OOPSLA '13: Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications},
      location = {Indianapolis, IN, USA},
      year = {2013},
      month = {oct},
      doi = {10.1145/2509136.2509527},
    }
  • T. Gao, K. Strauss, S. M. Blackburn, K. S. McKinley, D. Burger, and J. Larus, "Using Managed Runtime Systems to Tolerate Holes in Wearable Memories," in PLDI '13: Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, Seattle, WA, USA, 2013.
    FOR subject classification codes: 080308, 100604, 100605
    New memory technologies, such as phase-change memory (PCM), promise denser and cheaper main memory, and are expected to displace DRAM. However, many of them experience permanent failures far more quickly than DRAM. DRAM mechanisms that handle permanent failures rely on very low failure rates and, if directly applied to PCM, are extremely inefficient: Discarding a page when the first line fails wastes 98% of the memory.

    This paper proposes low complexity cooperative software and hardware that handle failure rates as high as 50%. Our approach makes error handling transparent to the application by using the memory abstraction offered by managed languages. Once hardware error correction for a memory line is exhausted, rather than discarding the entire page, the hardware communicates the failed line to a failure-aware OS and runtime. The runtime ensures memory allocations never use failed lines and moves data when lines fail during program execution. This paper describes minimal extensions to an Immix mark-region garbage collector, which correctly utilizes pages with failed physical lines by skipping over failures. This paper also proposes hardware support that clusters failed lines at one end of a memory region to reduce fragmentation and improve performance under failures. Contrary to accepted hardware wisdom that advocates for wear-leveling, we show that with software support non-uniform failures delay the impact of memory failure. Together, these mechanisms incur no performance overhead when there are no failures and at failure levels of 10% to 50% suffer only an average overhead of 4% and 12%, respectively. These results indicate that hardware and software cooperation can greatly extend the life of wearable memories.

    @InProceedings{GSB+:13,
      author = {Gao, Tiejun and Strauss, Karin and Blackburn, Stephen M. and McKinley, Kathryn S. and Burger, Doug and Larus, James},
      title = {Using Managed Runtime Systems to Tolerate Holes in Wearable Memories},
      booktitle = {PLDI '13: Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation},
      location = {Seattle, WA, USA},
      year = {2013},
      month = {June},
      series = {ACM SIGPLAN Notices},
      volume = {48},
      number = {6},
      publisher = {ACM Press},
    }
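The RC Immix abstract above hinges on pairing per-object reference counts with per-line live-object counts so that whole lines can be identified as reusable. As a toy illustration of that bookkeeping only (all names here are hypothetical; this is a sketch of the idea, not the collector's implementation, which operates on raw memory inside a JVM):

```python
# Toy sketch of RC Immix's core bookkeeping: objects carry reference
# counts, and each Immix line counts how many live objects it holds.
# When a line's count drops to zero, the whole line is reusable.

class Line:
    """One line of an Immix block; tracks how many live objects it holds."""
    def __init__(self):
        self.live_objects = 0

class Obj:
    """A heap object with a reference count, allocated onto a line."""
    def __init__(self, line):
        self.rc = 0
        self.line = line
        line.live_objects += 1  # allocation makes the line's count grow

def inc_ref(obj):
    obj.rc += 1

def dec_ref(obj):
    obj.rc -= 1
    if obj.rc == 0:
        # The object is dead: its line now holds one fewer live object.
        obj.line.live_objects -= 1

def reusable_lines(lines):
    """Indices of lines with no live objects, ready for reallocation."""
    return [i for i, line in enumerate(lines) if line.live_objects == 0]

lines = [Line() for _ in range(4)]
a, b = Obj(lines[0]), Obj(lines[0])  # two objects share line 0
c = Obj(lines[1])                    # one object on line 1
for o in (a, b, c):
    inc_ref(o)
dec_ref(a)  # line 0 still holds b, so it stays in use
dec_ref(c)  # line 1 empties out
print(reusable_lines(lines))  # -> [1, 2, 3]
```

The point of the line granularity is that the allocator never needs a per-object free list: reclamation and reuse happen at the coarser line and block level, which is what gives RC Immix its bump-allocation locality.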

A full list of my publications appears here.

Prospective Students

I’m always looking for bright students. If you’re interested in doing research work with me, please read this before you contact me.