Julian McAuley
PhD Student (since 2008)
RSISE, Australian National University
e-mail: moc.liamg@yeluacm.nailuj
phone: (+61) 2 6267 6332
Introduction
Learning with structured output spaces is an increasingly popular topic in computer vision, motivated by the simple fact that vision problems such as segmentation, detection, and restoration tend to deal with structured data. While structured learning has already proved an effective technique in a wide variety of applications, applying it to computer vision requires addressing a number of central issues: How can rich, high-order features be used to improve the quality of computer vision algorithms? How can such features be embedded in the framework of structured learning? How can inference be done efficiently (so as to run on large-scale datasets) and accurately (so that the learning algorithm produces useful results)? These are the questions I aim to answer in my research.
Current Interests
Structured Learning in Computer Vision
I am interested in applying structured learning to problems in computer vision. My research involves the use of structure to encode high-order features such as scene geometry and hierarchical constraints in detection, classification, and segmentation problems.
Much of my research is concerned with representing images as attributed graphs, allowing a variety of computer vision problems to be reduced to problems of graph matching. Attributed graphs offer a natural means of encoding scene geometry and other high-order features, which can then be parametrized in the framework of structured learning. My work therefore focuses on investigating high-order image features, and on improving the accuracy and efficiency of algorithms for graph matching and vision problems.
Graph Matching and Higher-order Models for Vision
By using attributed graphs as abstract models for natural images, one can build upon a wide body of literature and algorithms developed specifically for graphs. However, the general solutions to these problems fail to exploit the unique types of attributes observed when dealing with images. Therefore, one of my research goals has been to reduce the complexity of such algorithms in cases where the graph attributes encode scene geometry.
Developing efficient and exact solutions to these problems is a critical step towards embedding them into a structured learning framework. I have found that many otherwise intractable graph algorithms become low-order polynomial when dealing with these specific types of features. In my research I have used structured learning to produce rich models including features such as scene topology, rigidity, and scale.
In the past I have also developed higher-order models for image denoising, inpainting, and segmentation.
Exact Inference in Graphical Models
The application of structured learning to the vision problems mentioned above often amounts to repeatedly performing MAP inference in a graphical model. Therefore I am very interested in improving the complexity of exact inference schemes. One of my results has been to show that the expected-case running time of MAP inference in a graphical model need not be exponential in the model's treewidth: substantial improvements can be gained whenever cliques containing observations have fewer latent variables than purely latent cliques.
Significantly, this means that MAP inference in many chain-structured models is sub-quadratic in the number of states per node. Similar models are ubiquitous in computer vision applications, in which high-treewidth graphical models contain only pairwise factors: examples include graph matching, SLAM, interactive segmentation (snakes), and OCR.
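For context, the baseline that a sub-quadratic result improves upon is the standard dynamic program for MAP inference on a chain, which costs time quadratic in the number of states per node. The sketch below shows only that textbook baseline in min-sum (cost) form; it is not the faster algorithm described above. The function name `chain_map_min_sum`, and the assumption that one pairwise cost table is shared by every edge, are illustrative choices.

```python
from typing import List, Sequence, Tuple


def chain_map_min_sum(
    unary: Sequence[Sequence[float]],
    pairwise: Sequence[Sequence[float]],
) -> Tuple[float, List[int]]:
    """Minimum-cost labelling of a chain-structured model.

    unary[t][s]    : cost of assigning state s to node t
    pairwise[a][b] : cost of moving from state a to state b
                     (assumed shared by every edge, for simplicity)
    """
    n_states = len(pairwise)
    cost = list(unary[0])          # best cost of each state at the first node
    back: List[List[int]] = []     # back-pointers for recovering the labelling

    for t in range(1, len(unary)):
        new_cost, ptr = [], []
        for b in range(n_states):
            # Inner minimisation over the previous state: this is the
            # step that makes the baseline quadratic in n_states.
            best_a = min(range(n_states), key=lambda a: cost[a] + pairwise[a][b])
            new_cost.append(cost[best_a] + pairwise[best_a][b] + unary[t][b])
            ptr.append(best_a)
        cost = new_cost
        back.append(ptr)

    # Trace the optimal labelling backwards from the best final state.
    last = min(range(n_states), key=lambda s: cost[s])
    best_cost = cost[last]
    labels = [last]
    for ptr in reversed(back):
        labels.append(ptr[labels[-1]])
    labels.reverse()
    return best_cost, labels
```

For a chain of T nodes with N states each, this performs on the order of T·N² additions and comparisons; the inner minimisation over the previous state is the part a faster scheme would need to speed up.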
Future Work
I am interested in further improving the running time of exact inference in graphical models. So far my work on min-sum matrix product has led to faster algorithms for graph matching, though it appears that similar ideas can be used in a variety of applications in Natural Language Processing, such as Parsing with Context-Free Grammars and Named-Entity Recognition, as well as in a large number of Computer Vision applications where high-treewidth models include only pairwise constraints, such as SLAM, OCR, and Pose Estimation.
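For readers unfamiliar with the operation, the min-sum matrix product replaces the multiplication and addition of an ordinary matrix product with addition and minimisation, so that entry (i, j) of the result is the minimum over k of A[i][k] + B[k][j]. The sketch below is only the naive definition, written to make the operation concrete; the function name `min_sum_product` is illustrative, and the faster algorithms referred to above are not reproduced here.

```python
from typing import List, Sequence


def min_sum_product(
    a: Sequence[Sequence[float]],
    b: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Naive min-sum product: c[i][j] = min over k of a[i][k] + b[k][j]."""
    inner = len(b)        # shared dimension; must equal len(a[0])
    cols = len(b[0])
    return [
        [min(row[k] + b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]
```

For two N×N matrices this naive form takes on the order of N³ operations, which is the cost that a faster min-sum product would reduce.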