Research
I have broad interests in computer and robotic vision, machine
learning, probabilistic graphical models, and optimization. My main
research is in the application of machine learning techniques
(specifically, conditional Markov random fields) to geometric and
semantic scene understanding.
Getting involved: I am always looking for motivated students
who are interested in doing research with me. You can read some of my
papers (below) to get a feel for the type of work that I do. I
encourage students to contact me, but please read the following before
doing so:
- ANU undergraduates: Email me about setting up an
independent study contract or getting involved in a summer research
program.
- PhD students: You should have a background in machine
learning and/or computer vision. When contacting me, include a copy of
your academic transcript and a proposed research topic. The following
information is provided by the department regarding the PhD admissions
process:
- ANU PhD students: Email me to set up a meeting. Include
information about your current project.
All applications for PhD or Masters should come through the ANU
applications system. Please check the above links for scholarship
deadlines.
Publications

Multiclass Pixel Labeling with Non-Local Matching Constraints
Stephen Gould.
To appear in
IEEE Conference on Computer Vision and Pattern Recognition (CVPR),
2012.
[
pdf |
bib ]
Simultaneous Multi-class Pixel Labeling over Coherent Image Sets
Paul Rivera and Stephen Gould.
In
Digital Image Computing: Techniques and Applications (DICTA),
2011.
[
pdf |
bib ]
Max-margin Learning for Lower Linear Envelope Potentials in Binary Markov Random Fields
Stephen Gould.
In
Proceedings of the International Conference on Machine Learning (ICML),
2011.
[
pdf |
code |
slides (.pdf) |
bib ]
Discriminative Learning with Latent Variables for Cluttered Indoor Scene Understanding
Huayan Wang, Stephen Gould and Daphne Koller.
In
Proceedings of the European Conference on Computer Vision (ECCV),
2010.
[
pdf |
bib ]
A Unified Contour-Pixel Model for Segmentation
Ben Packer, Stephen Gould and Daphne Koller.
In
Proceedings of the European Conference on Computer Vision (ECCV),
2010.
[
pdf |
bib ]
Probabilistic Models for Region-based Scene Understanding
Stephen Gould.
Ph.D. Thesis, Stanford University,
June 2010.
[
pdf |
archive |
bib ]
Accelerated Dual Decomposition for MAP Inference
Vladimir Jojic, Stephen Gould and Daphne Koller.
In
Proceedings of the International Conference on Machine Learning (ICML),
2010.
[
pdf |
bib ]
Single Image Depth Estimation from Predicted Semantic Labels
Beyang Liu, Stephen Gould and Daphne Koller.
In
IEEE Conference on Computer Vision and Pattern Recognition (CVPR),
2010.
[
pdf |
bib |
data (.tar.gz) ]
Region-based Segmentation and Object Detection
Stephen Gould, Tianshi Gao and Daphne Koller.
In
Advances in Neural Information Processing Systems (NIPS),
2009.
[
pdf |
bib ]
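Decomposing a Scene into Geometric and Semantically Consistent Regions
Stephen Gould, Richard Fulton and Daphne Koller.
In
IEEE International Conference on Computer Vision (ICCV),
2009.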
Alphabet SOUP: A Framework for Approximate Energy Minimization
Stephen Gould, Fernando Amat and Daphne Koller.
In
IEEE Conference on Computer Vision and Pattern Recognition (CVPR),
2009.
[
pdf |
poster |
bib ]
High-Accuracy 3D Sensing for Mobile Manipulation: Improving Object Detection and Door Opening
Morgan Quigley, Siddharth Batra, Stephen Gould, Ellen Klingbeil, Quoc V. Le, Ashley Wellman and Andrew Y. Ng.
In
IEEE International Conference on Robotics and Automation (ICRA),
2009.
[
pdf |
videos |
bib ]
Cascaded Classification Models: Combining Models for Holistic Scene Understanding
Geremy Heitz, Stephen Gould, Ashutosh Saxena and Daphne Koller.
In
Advances in Neural Information Processing Systems (NIPS),
2008.
[
pdf |
bib ]
Learning Bounded Treewidth Bayesian Networks
Gal Elidan and Stephen Gould.
In
Advances in Neural Information Processing Systems (NIPS),
2008.
A longer version of this paper also appears
in
Journal of Machine Learning Research (JMLR),
2008.
[
pdf (nips) |
pdf (jmlr) |
bib ]
Integrating Visual and Range Data for Robotic Object Detection
Stephen Gould, Paul Baumstarck, Morgan Quigley, Andrew Y. Ng and Daphne Koller.
In
ECCV workshop on Multi-camera and Multi-modal Sensor Fusion Algorithms and Applications (M2SFA2),
2008.
[
pdf |
bib ]
Projected Subgradient Methods for Learning Sparse Gaussians
John Duchi, Stephen Gould and Daphne Koller.
In
Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI),
2008.
[
pdf |
bib ]
Multi-Class Segmentation with Relative Location Prior
Stephen Gould, Jim Rodgers, David Cohen, Gal Elidan and Daphne Koller.
In
International Journal of Computer Vision (IJCV),
2008.
[
pdf |
bib ]
STAIR: The STanford Artificial Intelligence Robot Project
Andrew Y. Ng, Stephen Gould, Morgan Quigley, Ashutosh Saxena and Eric Berger.
In
Learning Workshop, Snowbird,
2008.
[
project ]
Peripheral-Foveal Vision for Real-time Object Recognition and Tracking in Video
Stephen Gould, Joakim Arfvidsson, Adrian Kaehler, Benjamin Sapp,
Marius Meissner, Gary Bradski, Paul Baumstarck, Sukwon Chung and Andrew Y. Ng.
In
Proceedings of the Twentieth International Joint Conference on Artificial Intelligence (IJCAI),
2007.
[
pdf |
bib ]
Software
Listed below are the larger software libraries that I have
developed. For reference implementations of the algorithms described
in my work, see the links next to the relevant paper in
my publications list.
Darwin
A C++ framework for machine learning and computer vision research. The
framework includes a wide range of standard machine learning and
graphical models algorithms as well as reference implementations for
many of the algorithms described in the publications above. The code
is released under the BSD license.
If you are interested in contributing to this codebase then please
email me.
[
version 1.0.2 |
documentation |
mloss |
browse svn ]
The STAIR Vision Library
A platform-independent C++ toolkit for computer vision research
(building on top of OpenCV). The library also includes many machine
learning and probabilistic graphical models algorithms. We have
released the code under the BSD license. Developed while I was at
Stanford University, this library is no longer supported. Much of its
functionality, however, is available in the Darwin framework described
above.
[
wiki |
doc |
sourceforge ]
Professional Activities
Conferences and Journals
I have regularly served as a program committee member or reviewer for
the following conferences and journals: CVPR, ECCV, ICCV, ICML, IEEE
PAMI, IEEE TIP, IJCV, JMLR, NIPS, RSS, UAI and others.
I co-organized the
Workshop on Inference in Graphical Models with Structured Potentials
at CVPR 2011,
with Julian McAuley, Tiberio Caetano, Pushmeet Kohli and Pawan Kumar.
Invited Talks and Tutorials
- Technical workshop for computer science PhD students at ANU, 3-4 May, 2012.
- ANU Workshop on Developing and Debugging Machine Learning
Algorithms, September 2011.
[ Sessions 1-3 (2.9MB)
| Session 4 (11.6MB) ]
- Tutorial titled “Markov Random Fields for Computer Vision”
given at the Machine
Learning Summer School (MLSS 2011), 13-17 June 2011, Singapore.
[ slides (part 1) |
slides (part 2) |
slides (part 3) ]
- Invited talk titled “Probabilistic Models in Holistic Scene Understanding” given at Stanford Computer Science Faculty Luncheon, 2010.
- Invited talk titled “A Region-based Approach to Scene Understanding” given at The First IEEE Workshop on Visual Place Categorization (VPC), 2009.
- Invited talk titled “Multi-modal Robotic Vision: Detecting Objects and People” given at Honda Research Institute, April 2008.
Teaching
- COMP3130: Computer Science Group Project (Semester 1, 2012)
- COMP3130: Computer Science Group Project (Semester 1, 2011)
Issued Patents
US 7,725,312.
Transcoding method and system between CELP-based speech codes with externally provided status.
US 7,411,418.
Efficient representation of state transition tables.
US 7,301,792.
Apparatus and method of ordering state transition rules for memory efficient, programmable, pattern matching finite state machine hardware.
US 7,219,319.
Apparatus and method for generating state transition rules for memory efficient programmable pattern matching finite state machine hardware.
US 7,184,953.
Transcoding method and system between CELP-based speech codes with externally provided status.
US 7,180,328.
Apparatus and method for large hardware finite state machine with embedded equivalence classes.
US 7,082,044.
Apparatus and method for memory efficient, programmable, pattern matching finite state machine hardware.
US 6,829,579.
Transcoding method and system between CELP-based speech codes.
AU 2004222859.
A method for developing algorithms.
Copyright © 2010-2012, Stephen Gould.