My research currently includes the following major interests:

Memory Management

I have had a strong interest in memory management since my post-doc from 1999-2001 at the University of Massachusetts (with Eliot Moss and Kathryn McKinley). Memory management is important to most programming language implementations, and particularly to languages that manage memory automatically, which includes most of the widely used languages today, with just a few notable exceptions. Broadly, my interests include the design of new high-throughput garbage collection algorithms, performance analysis and methodology for memory management, and the engineering of memory management systems (the widely used Memory Management Toolkit, MMTk, is a result of this work).

Recent publications include:

  • J. Sartor, S. M. Blackburn, D. Frampton, M. Hirzel, and K. S. McKinley, "Z-Rays: Divide Arrays and Conquer Speed and Flexibility," in ACM SIGPLAN Conference on Programming Language Design and Implementation, 2010.
    FOR subject classification codes: 080308, 100604
    Arrays are the ubiquitous organization for indexed data. Throughout programming language evolution, implementations have laid out arrays contiguously in memory. This layout is problematic in space and time. It causes heap fragmentation, garbage collection pauses in proportion to array size, and wasted memory for sparse and over-provisioned arrays. Because of array virtualization in managed languages, an array layout that consists of indirection pointers to fixed-size discontiguous memory blocks can mitigate these problems transparently. This design, however, incurs significant overhead; it is justified when real-time deadlines and space constraints trump performance.

    This paper proposes z-rays, a discontiguous array design with flexibility and efficiency. A z-ray has a spine with indirection pointers to fixed-size memory blocks called arraylets, and uses five optimizations: (1) inlining the first N array bytes into the spine, (2) lazy allocation, (3) zero compression, (4) fast array copy, and (5) arraylet copy-on-write. Whereas discontiguous arrays in prior work improve responsiveness and space efficiency, z-rays combine time efficiency and flexibility. On average, the best z-ray configuration performs within 12.7% of an unmodified Java Virtual Machine on 19 benchmarks, whereas previous designs have two to three times higher overheads. Furthermore, language implementers can configure z-ray optimizations for various design goals. This combination of performance and flexibility creates a better building block for past and future array optimization.

    @InProceedings{SBF+:10,
      author = {Jennifer Sartor and Stephen M. Blackburn and Daniel Frampton and Martin Hirzel and Kathryn S. McKinley},
      title = {Z-Rays: Divide Arrays and Conquer Speed and Flexibility},
      booktitle = {ACM SIGPLAN Conference on Programming Language Design and Implementation},
      year = {2010},
      month = {June},
      doi = {http://doi.acm.org/10.1145/1809028.1806649},
      publisher = {ACM},
      location = {Toronto, Canada},
      patch = {arraylets-jikesrvm-3.0.1.patch},
    }
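
To make the z-ray layout concrete, below is a minimal sketch of my own (the paper implements this transparently inside the JVM, on raw bytes rather than an int[] wrapper; the inlining and arraylet sizes here are illustrative, and the fast array copy and copy-on-write optimizations are omitted). It shows the spine with an inlined prefix and indirection pointers, plus lazy allocation and zero compression:

    /**
     * Illustrative sketch of a z-ray-style discontiguous array. The spine
     * inlines the first INLINE elements and holds indirection pointers to
     * fixed-size arraylets. Lazy allocation plus zero compression: an
     * arraylet is materialized only on the first write of a non-zero value.
     */
    public class ZRayIntArray {
        private static final int INLINE = 8;          // first-N inlining (assumed size)
        private static final int ARRAYLET_SIZE = 256; // fixed arraylet size (assumed)

        private final int length;
        private final int[] inline = new int[INLINE]; // inlined prefix in the spine
        private final int[][] arraylets;              // indirection pointers

        public ZRayIntArray(int length) {
            this.length = length;
            int remainder = Math.max(0, length - INLINE);
            this.arraylets = new int[(remainder + ARRAYLET_SIZE - 1) / ARRAYLET_SIZE][];
        }

        public int get(int i) {
            checkBounds(i);
            if (i < INLINE) return inline[i];
            int[] block = arraylets[(i - INLINE) / ARRAYLET_SIZE];
            return block == null ? 0 : block[(i - INLINE) % ARRAYLET_SIZE]; // zero compression
        }

        public void set(int i, int value) {
            checkBounds(i);
            if (i < INLINE) { inline[i] = value; return; }
            int idx = (i - INLINE) / ARRAYLET_SIZE;
            if (arraylets[idx] == null) {
                if (value == 0) return;                  // stay compressed: writing zero is a no-op
                arraylets[idx] = new int[ARRAYLET_SIZE]; // lazy allocation on first non-zero write
            }
            arraylets[idx][(i - INLINE) % ARRAYLET_SIZE] = value;
        }

        private void checkBounds(int i) {
            if (i < 0 || i >= length) throw new ArrayIndexOutOfBoundsException(i);
        }
    }
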
  • S. M. Blackburn and K. S. McKinley, "Immix: A Mark-Region Garbage Collector with Space Efficiency, Fast Collection, and Mutator Performance," in ACM SIGPLAN Conference on Programming Language Design and Implementation, 2008.
    FOR subject classification codes: 080308
    Programmers are increasingly choosing managed languages for modern applications, which tend to allocate many short-to-medium lived small objects. The garbage collector therefore directly determines program performance by making a classic space-time tradeoff that seeks to provide space efficiency, fast reclamation, and mutator performance. The three canonical tracing garbage collectors (semi-space, mark-sweep, and mark-compact) each sacrifice one objective. This paper describes a collector family, called mark-region, and introduces opportunistic defragmentation, which mixes copying and marking in a single pass. Combining both, we implement immix, a novel high performance garbage collector that achieves all three performance objectives. The key insight is to allocate and reclaim memory in contiguous regions, at a coarse block grain when possible and otherwise in groups of finer grain lines. We show that immix outperforms existing canonical algorithms, improving total application performance by 7 to 25% on average across 20 benchmarks. As the mature space in a generational collector, immix matches or beats a highly tuned generational collector, e.g., it improves SPECjbb by 5%. These innovations and the identification of a new family of collectors open new opportunities for garbage collector design.
    @InProceedings{BM:08,
      author = {Stephen M. Blackburn and Kathryn S. McKinley},
      title = {Immix: A Mark-Region Garbage Collector with Space Efficiency, Fast Collection, and Mutator Performance},
      booktitle = {ACM SIGPLAN Conference on Programming Language Design and Implementation},
      year = {2008},
      month = {June},
      publisher = {ACM},
      doi = {http://doi.acm.org/10.1145/1375581.1375586},
      patch = {immix-jikesrvm-r13767.patch.gz},
      results = {immix-pldi-2008.csv.tgz},
      location = {Tucson, AZ, USA},
      }
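
The key allocation mechanism is easy to sketch. The toy below is my own, not the production collector (which manages raw memory inside the VM and adds opportunistic defragmentation); it shows bump allocation through runs of free lines within a block, using the paper's defaults of 256-byte lines and 32KB (128-line) blocks:

    /**
     * Toy sketch of mark-region allocation in the spirit of immix. A block
     * is divided into lines; allocation bumps a cursor through contiguous
     * runs of free lines, skipping lines marked live by the previous
     * collection. (Real immix can also reuse run remainders via overflow
     * allocation, omitted here.)
     */
    public class MarkRegionBlock {
        static final int LINES = 128;      // lines per block: 32KB blocks
        static final int LINE_BYTES = 256; // the paper's 256-byte lines

        final boolean[] lineMarked = new boolean[LINES]; // set by the mark pass
        private int cursor, limit;                       // current bump region, in bytes

        /** Bump-allocate size bytes, or return -1 if this block is exhausted. */
        public int allocate(int size) {
            while (cursor + size > limit) {
                if (!nextFreeRun()) return -1; // no free line run left: try another block
            }
            int result = cursor;
            cursor += size;
            return result;
        }

        /** Advance the bump region to the next contiguous run of unmarked lines. */
        private boolean nextFreeRun() {
            int line = limit / LINE_BYTES;
            while (line < LINES && lineMarked[line]) line++; // skip live lines
            if (line == LINES) return false;
            int end = line;
            while (end < LINES && !lineMarked[end]) end++;   // extend over free lines
            cursor = line * LINE_BYTES;
            limit = end * LINE_BYTES;
            return true;
        }
    }
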

Software and Microarchitecture

Both software and hardware are currently undergoing tumultuous change. I’m interested in the interrelationship of those changes. I’m particularly interested in how power-efficiency plays out in different languages, how well different microarchitectures are adapted to new languages, and, conversely, how well new language implementations exploit new and emerging architectures. As part of this work, Intel will be providing my lab with an experimental Single-chip Cloud Computer (SCC) processor, which I’ll be using together with Peter Strazdins (ANU) and Kathryn McKinley (U. Texas).

Recent publications include:

  • H. Esmaeilzadeh, S. M. Blackburn, X. Yang, and K. S. McKinley, "Power and Performance of Native and Java Benchmarks on 130nm to 32nm Process Technologies," in Sixth Annual Workshop on Modeling, Benchmarking and Simulation, MoBS 2010, Saint-Malo, France, 2010.
    FOR subject classification codes: 100605, 080308, 100606
    Over the past decade, chip fabrication technology shrank from 130nm to 32nm. This reduction was generally considered to provide performance improvements together with chip power reductions. This paper examines how well process technology and microarchitecture delivered on this assumption. This paper evaluates power and performance of native and Java workloads across a selection of IA32 processors from five technology generations (130nm, 90nm, 65nm, 45nm, and 32nm). We use a Hall effect sensor to accurately measure chip power. This paper reports a range of findings in three areas. 1) Methodology: TDP is unsurprisingly a poor predictor of application power consumption for a particular processor, but worse, TDP is a poor predictor of relative power consumption between processors. 2) Power-performance trends: Processors appear to have already hit the power wall at 45nm. 3) Native versus Java workloads and their relationship to processor technology: Even single-threaded Java workloads exploit multiple cores (the JVM's JIT and GC threads run concurrently with the application). These results indicate that Java workloads offer different opportunities and challenges compared to native workloads. Our findings challenge prevalent methodologies and offer new insight into how microarchitectures have traded power and performance as process technology shrank.
    @InProceedings{EBYM:10,
      author = {Hadi Esmaeilzadeh and Stephen M. Blackburn and Xi Yang and Kathryn S. McKinley},
      title = {Power and Performance of Native and Java Benchmarks on 130nm to 32nm Process Technologies},
      booktitle = {Sixth Annual Workshop on Modeling, Benchmarking and Simulation, MoBS 2010, Saint-Malo, France},
      year = {2010},
      month = {June},
      location = {Saint-Malo, France},
    }
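
For readers curious about the measurement itself: the Hall effect sensor reports the current flowing on the processor's supply line, so chip power is simply supply voltage times measured current, and energy is power integrated over the run. A minimal sketch of my own, with assumed voltage and sampling rate:

    /**
     * Sketch (assumptions throughout) of turning Hall-effect current samples
     * into chip power and energy: P = V * I, and E is P integrated over time.
     */
    public class PowerMeter {
        static final double SUPPLY_VOLTS = 12.0;     // assumed supply rail
        static final double SAMPLE_PERIOD_S = 0.001; // assumed 1 kHz sampling

        /** Returns {mean power (W), total energy (J)} from current samples (A). */
        public static double[] powerAndEnergy(double[] currentSamples) {
            double energy = 0.0;
            for (double amps : currentSamples) {
                energy += SUPPLY_VOLTS * amps * SAMPLE_PERIOD_S; // P * dt
            }
            double meanPower = energy / (currentSamples.length * SAMPLE_PERIOD_S);
            return new double[] { meanPower, energy };
        }
    }
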
  • J. Ha, M. Arnold, S. M. Blackburn, and K. S. McKinley, "A concurrent dynamic analysis framework for multicore hardware," in OOPSLA '09: Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications, New York, NY, USA, 2009, pp. 155-174.
    FOR subject classification codes: 080304, 100604, 080308
    Software has spent the bounty of Moore’s law by solving harder problems and exploiting abstractions, such as high-level languages, virtual machine technology, binary rewriting, and dynamic analysis. Abstractions make programmers more productive and programs more portable, but usually slow them down. Since Moore’s law is now delivering multiple cores instead of faster processors, future systems must either bear a relatively higher cost for abstractions or use some cores to help tolerate abstraction costs.

    This paper presents the design, implementation, and evaluation of a novel concurrent, configurable dynamic analysis framework that efficiently utilizes multicore cache architectures. It introduces Cache-friendly Asymmetric Buffering (CAB), a lock-free ring-buffer that implements efficient communication between application and analysis threads. We guide the design and implementation of our framework with a model of dynamic analysis overheads. The framework implements exhaustive and sampling event processing and is analysis-neutral. We evaluate the framework with five popular and diverse analyses, and show performance improvements even for lightweight, low-overhead analyses.

    Efficient inter-core communication is central to high performance parallel systems and we believe the CAB design gives insight into the subtleties and difficulties of attaining it for dynamic analysis and other parallel software.

    @Inproceedings{HAB+:09,
      author = {Ha, Jungwoo and Arnold, Matthew and Blackburn, Stephen M. and McKinley, Kathryn S.},
      title = {A concurrent dynamic analysis framework for multicore hardware},
      booktitle = {OOPSLA '09: Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications},
      year = {2009},
      isbn = {978-1-60558-766-0},
      pages = {155--174},
      location = {Orlando, Florida, USA},
      doi = {http://doi.acm.org/10.1145/1640089.1640101},
      publisher = {ACM},
      address = {New York, NY, USA},
      }
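
The heart of CAB is a single-producer, single-consumer ring buffer that avoids locks and atomic read-modify-write operations. Below is a basic SPSC ring buffer of my own to illustrate the starting point; the paper's contribution is the cache-friendly asymmetric organization layered on top, which keeps the application and analysis threads from bouncing each other's cache lines:

    import java.util.concurrent.atomic.AtomicLong;

    /**
     * Minimal lock-free single-producer/single-consumer ring buffer. The
     * producer is the application thread emitting events; the consumer is
     * the analysis thread. (CAB adds batched cursor checks and padding,
     * omitted here.)
     */
    public class SpscRingBuffer<T> {
        private final Object[] slots;
        private final int mask;
        private final AtomicLong head = new AtomicLong(); // next slot to consume
        private final AtomicLong tail = new AtomicLong(); // next slot to fill

        public SpscRingBuffer(int capacityPowerOfTwo) {
            slots = new Object[capacityPowerOfTwo];
            mask = capacityPowerOfTwo - 1;
        }

        /** Producer: returns false (caller drops or retries) when full. */
        public boolean offer(T event) {
            long t = tail.get();
            if (t - head.get() == slots.length) return false; // full
            slots[(int) (t & mask)] = event;
            tail.lazySet(t + 1); // publish; single producer, so no CAS needed
            return true;
        }

        /** Consumer: returns null when the buffer is empty. */
        @SuppressWarnings("unchecked")
        public T poll() {
            long h = head.get();
            if (h == tail.get()) return null; // empty
            T event = (T) slots[(int) (h & mask)];
            slots[(int) (h & mask)] = null;   // let the collector reclaim the event
            head.lazySet(h + 1);
            return event;
        }
    }
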

Performance Analysis

I am very interested in the challenge of meaningful performance analysis of computer systems. As systems become more complex, meaningful analysis becomes more difficult. Over the past ten years we’ve done extensive work on improving experimental methodology and on developing meaningful benchmark suites, including the widely used DaCapo benchmark suite.

Recent publications include:

  • S. M. Blackburn, K. S. McKinley, R. Garner, C. Hoffmann, A. M. Khan, R. Bentzur, A. Diwan, D. Feinberg, D. Frampton, S. Z. Guyer, M. Hirzel, A. Hosking, M. Jump, H. Lee, J. E. B. Moss, A. Phansalkar, D. Stefanovic, T. VanDrunen, D. von Dincklage, and B. Wiedermann, "Wake Up and Smell the Coffee: Evaluation Methodology for the 21st Century," Communications of the ACM, 2008.
    FOR subject classification codes: 080308, 080309, 100605
    Evaluation methodology underpins all innovation in experimental computer science. It requires relevant workloads, appropriate experimental design, and rigorous analysis. Unfortunately, methodology is not keeping pace with the changes in our field. The rise of managed languages such as Java, C#, and Ruby in the past decade and the imminent rise of commodity multicore architectures for the next decade pose new methodological challenges that are not yet widely understood. This paper explores the consequences of our collective inattention to methodology on innovation, makes recommendations for addressing this problem in one domain, and provides guidelines for other domains. We describe benchmark suite design, experimental design, and analysis for evaluating Java applications. For example, we introduce new criteria for measuring and selecting diverse applications for a benchmark suite. We show that the complexity and nondeterminism of the Java runtime system make experimental design a first-order consideration, and we recommend mechanisms for addressing complexity and nondeterminism. Drawing on these results, we suggest how to adapt methodology more broadly. To continue to deliver innovations, our field needs to significantly increase participation in and funding for developing sound methodological foundations.
    @Article{BMG+:08,
      author = {Stephen M. Blackburn and Kathryn S. McKinley and Robin Garner and Chris Hoffmann and Asjad M. Khan and Rotem Bentzur and Amer Diwan and Daniel Feinberg and Daniel Frampton and Samuel Z. Guyer and Martin Hirzel and Antony Hosking and Maria Jump and Han Lee and J. Eliot B. Moss and Aashish Phansalkar and Darko Stefanovic and Thomas VanDrunen and Daniel von Dincklage and Ben Wiedermann},
      title = {{Wake Up and Smell the Coffee: Evaluation Methodology for the 21st Century}},
      journal = {Communications of the ACM},
      year = {2008},
      month = {August},
      publisher = {ACM Press},
      address = {New York, NY, USA},
      doi = {http://doi.acm.org/10.1145/1378704.1378723},
      note = {Invited paper. CACM Research Highlights.},
      }
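
A concrete illustration of the experimental discipline the paper argues for: never report a single run. The hypothetical harness below (my own, not the DaCapo harness) separates warm-up from measured iterations and reports a mean with a confidence interval, so that JIT and GC nondeterminism shows up in the result rather than being silently averaged away:

    /** Sketch of a measurement harness: warm-up, then mean with a 95% CI. */
    public class Harness {
        public static void main(String[] args) {
            int warmup = 5, measured = 10; // assumed iteration counts
            double[] times = new double[measured];
            Runnable benchmark = Harness::workload;

            for (int i = 0; i < warmup; i++) benchmark.run(); // let the JIT settle
            for (int i = 0; i < measured; i++) {
                long start = System.nanoTime();
                benchmark.run();
                times[i] = (System.nanoTime() - start) / 1e6; // ms
            }

            double mean = 0;
            for (double t : times) mean += t;
            mean /= measured;
            double var = 0;
            for (double t : times) var += (t - mean) * (t - mean);
            double sem = Math.sqrt(var / (measured - 1)) / Math.sqrt(measured);
            // 1.96 assumes a normal approximation; Student's t is safer for small n.
            System.out.printf("%.1f ms +/- %.1f ms (95%% CI)%n", mean, 1.96 * sem);
        }

        private static void workload() {
            // Placeholder; in practice, one iteration of the benchmark.
            long sum = 0;
            for (int i = 0; i < 10_000_000; i++) sum += i;
            if (sum == 42) System.out.println(); // defeat dead-code elimination
        }
    }
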

Virtual Machine Design and Implementation

The design and implementation of virtual machines has been a constant interest of mine over the past ten years. I’ve been an active participant in the Jikes RVM research project and have co-led the design and implementation of MMTk, Jikes RVM’s memory management framework. I also led the Moxie project at Intel. I am now involved in collaborative research with IBM Research on the implementation of a runtime for the X10 high-performance computing language.

Recent publications include:

  • D. Frampton, S. M. Blackburn, P. Cheng, R. J. Garner, D. Grove, J. E. B. Moss, and S. I. Salishev, "Demystifying magic: high-level low-level programming," in VEE '09: Proceedings of the 2009 ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, New York, NY, USA, 2009, pp. 81-90.
    FOR subject classification codes: 080308, 080309
    The power of high-level languages lies in their abstraction over hardware and software complexity, leading to greater security, better reliability, and lower development costs. However, opaque abstractions are often show-stoppers for systems programmers, forcing them to either break the abstraction, or more often, simply give up and use a different language. This paper addresses the challenge of opening up a high-level language to allow practical low-level programming without forsaking integrity or performance.

    The contribution of this paper is three-fold: 1) we draw together common threads in a diverse literature, 2) we identify a framework for extending high-level languages for low-level programming, and 3) we show the power of this approach through concrete case studies. Our framework leverages just three core ideas: extending semantics via intrinsic methods, extending types via unboxing and architectural-width primitives, and controlling semantics via scoped semantic regimes. We develop these ideas through the context of a rich literature and substantial practical experience. We show that they provide the power necessary to implement substantial artifacts such as a high-performance virtual machine, while preserving the software engineering benefits of the host language.

    The time has come for high-level low-level programming to be taken more seriously: 1) more projects now use high-level languages for systems programming, 2) increasing architectural heterogeneity and parallelism heighten the need for abstraction, and 3) a new generation of high-level languages are under development and ripe to be influenced.

    @Inproceedings{FBC+:09,
      author = {Frampton, Daniel and Blackburn, Stephen M. and Cheng, Perry and Garner, Robin J. and Grove, David and Moss, J. Eliot B. and Salishev, Sergey I.},
      title = {Demystifying magic: high-level low-level programming},
      booktitle = {VEE '09: Proceedings of the 2009 ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments},
      year = {2009},
      isbn = {978-1-60558-375-4},
      pages = {81--90},
      location = {Washington, DC, USA},
      doi = {http://doi.acm.org/10.1145/1508293.1508305},
      publisher = {ACM},
      address = {New York, NY, USA},
      }
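
To give a flavor of the framework, here is a self-contained sketch using hypothetical names (the real system is the org.vmmagic package used by Jikes RVM): an unboxed, architectural-width address type whose methods would be intrinsic, i.e., replaced by the compiler with single machine instructions. A ByteBuffer stands in for raw memory so the sketch runs as plain Java:

    import java.nio.ByteBuffer;

    /**
     * Hypothetical unboxed address type. In a magic-aware compiler, an
     * Address would *be* a raw pointer (not a wrapper object), and each
     * method would compile to a single machine instruction.
     */
    final class Address {
        static final ByteBuffer MEMORY = ByteBuffer.allocateDirect(1 << 20); // stand-in heap
        private final int offset;

        Address(int offset) { this.offset = offset; }

        // Intrinsic in a real system: a load instruction.
        int loadInt() { return MEMORY.getInt(offset); }

        // Intrinsic in a real system: a store instruction.
        void storeInt(int value) { MEMORY.putInt(offset, value); }

        // Intrinsic in a real system: raw pointer arithmetic.
        Address plus(int bytes) { return new Address(offset + bytes); }
    }

    /** Low-level code in high-level syntax: a bump allocator over raw memory. */
    class BumpAllocator {
        private Address cursor = new Address(0);

        Address alloc(int bytes) {
            Address result = cursor;
            cursor = cursor.plus(bytes);
            return result;
        }
    }
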
  • S. M. Blackburn, S. I. Salishev, M. Danilov, O. A. Mokhovikov, A. A. Nashatyrev, P. A. Novodvorsky, V. I. Bogdanov, X. F. Li, and D. Ushakov, "The Moxie JVM Experience," Australian National University, Department of Computer Science, TR-CS-08-01, 2008.
    By January 1998, only two years after the launch of the first Java virtual machine, almost all JVMs in use today had been architected. In the nine years since, technology has advanced enormously, with respect to the underlying hardware, language implementation, and the application domain. Although JVM technology has moved forward in leaps and bounds, basic design decisions made in the 1990s have anchored JVM implementation.

    The Moxie project set out to explore the question: “How would we design a JVM from scratch, knowing what we know today?” Amid the mass of design questions we faced, the tension between performance and flexibility was pervasive, persistent, and problematic. In this experience paper we describe the Moxie project and its lessons, a process that began with consulting experts from industry and academia and ended with a fully working prototype.

    @TechReport{BSD+:08,
      author = {Stephen M. Blackburn and Sergey I. Salishev and Mikhail Danilov and Oleg A. Mokhovikov and Anton A. Nashatyrev and Peter A. Novodvorsky and Vadim I. Bogdanov and Xiao Feng Li and Dennis Ushakov},
      title = {The {M}oxie {JVM} Experience},
      institution = {Australian National University, Department of Computer Science},
      year = {2008},
      number = {TR-CS-08-01},
      month = {May},
      }
  • B. Alpern, S. Augart, S. M. Blackburn, M. Butrico, A. Cocchi, P. Cheng, J. Dolby, S. Fink, D. Grove, M. Hind, K. S. McKinley, M. Mergen, J. E. B. Moss, T. Ngo, V. Sarkar, and M. Trapp, "The Jikes Research Virtual Machine project: Building an open source research community," IBM Systems Journal, vol. 44, iss. 2, 2005.
    FOR subject classification codes: 080308
    This paper describes the evolution of the Jikes Research Virtual Machine project from an IBM internal research project, called Jalapeño, into an open-source project. After summarizing the original goals of the project, we discuss the motivation for releasing it as an open-source project and the activities performed to ensure its success. Throughout, we highlight the unique challenges of developing and maintaining an open-source project designed specifically to support a research community.
    @Article{AABB+:05,
      author = {B. Alpern and S. Augart and S. M. Blackburn and M. Butrico and A. Cocchi and P. Cheng and J. Dolby and S. Fink and D. Grove and M. Hind and K. S. McKinley and M. Mergen and J. E. B. Moss and T. Ngo and V. Sarkar and M. Trapp},
      title = {The {Jikes Research Virtual Machine} project: {B}uilding an open source research community},
      journal = {IBM Systems Journal},
      year = {2005},
      month = {May},
      volume = {44},
      number = {2},
      doi = {http://dx.doi.org/10.1147/sj.442.0399},
      }