Stephen Gould
I'm a Professor of Computer Science at the Australian National University and an Australian Research Council (ARC) Future Fellow.

  Email:

Phone:
+61-(0)2-6125-8642 (office)
+61-(0)408-879-963 (AUS mobile)

Office Hours:
Tuesdays 11am-12pm (drop-in)

Address/Office:
Anderson Building (115), Room B165,
The Australian National University,
Corner North and Daley Roads,
Acton, ACT, 2601
AUSTRALIA

News and Events


Research

I have broad interests in computer and robotic vision, machine learning, models for structured prediction, and optimization. My main research focus is on the application of machine learning techniques (specifically, conditional Markov random fields and, more recently, deep learning and deep declarative networks) to geometric, semantic and dynamic scene understanding. I am also interested in seeing research translate into practical outcomes. To this end, I collaborate with industry and have previously been involved in founding start-up companies.

Getting involved: I am always looking for motivated and hard-working students who are interested in doing research with me. You can read some of my selected papers (below) to get a feel for the type of work that I do. I encourage students to contact me, but please read the following before doing so:

All applications for PhD or Masters should come through the ANU applications system. Please check the above links for scholarship deadlines.
Students, Post-docs and Visitors

Current PhD Students (primary supervision), Post-docs and Visitors

Weijian Deng

Itzik ben Shabat

Ming Xu

K (Nutthadech) Banditakkarakul

Chamin Hewa Koneputugodage

Evan Markou

Francis Snelgar

Chunyi Sun

Oliver (Kai) Xi

Jiahao Zhang

Qinyu Zhao

Former PhD Students and Post-docs (first appointment following supervision noted)
All Publications
(Google Scholar Profile)

3D-GPT: Procedural 3D Modeling with Large Language Models
Chunyi Sun, Junlin Han, Weijian Deng, Xinlong Wang, Zishan Qin and Stephen Gould.
To appear in International Conference on 3D Vision (3DV), 2025.
[ paper coming soon | preprint | project | bib ]
Temporally Grounding Instructional Diagrams in Unconstrained Videos
Jiahao Zhang, Frederic Zhang, Cristian Rodriguez, Yizhak Ben-Shabat, Anoop Cherian and Stephen Gould.
To appear in Winter Conference on Applications of Computer Vision (WACV), 2025.
[ paper coming soon | bib ]
Guiding Neural Collapse: Optimising Towards the Nearest Simplex Equiangular Tight Frame
Evan Markou, Thalaiyasingam Ajanthan and Stephen Gould.
To appear in Advances in Neural Information Processing Systems (NeurIPS), 2024.
[ paper coming soon | preprint | bib ]
Neural Experts: Mixture of Experts for Implicit Neural Representations
Yizhak Ben-Shabat, Chamin Hewa Koneputugodage, Sameera Ramasinghe and Stephen Gould.
To appear in Advances in Neural Information Processing Systems (NeurIPS), 2024.
[ paper coming soon | bib ]
Unsupervised Dense Prediction using Differentiable Normalized Cuts
Yanbin Liu and Stephen Gould.
In Proceedings of the European Conference on Computer Vision (ECCV), 2024.
[ paper | video | bib ]
The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models?
Qinyu Zhao, Ming Xu, Kartik Gupta, Akshay Asthana, Liang Zheng and Stephen Gould.
In Proceedings of the European Conference on Computer Vision (ECCV), 2024.
[ paper coming soon | preprint | project | bib ]
An Empirical Study into What Matters for Calibrating Vision-Language Models
Weijie Tu, Weijian Deng, Dylan Campbell, Stephen Gould and Tom Gedeon.
In the International Conference on Machine Learning (ICML), 2024.
[ paper | bib ]
Temporally Consistent Unbalanced Optimal Transport for Unsupervised Action Segmentation
Ming Xu and Stephen Gould.
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
[ paper | preprint | code | bib ]
Small Steps and Level Sets: Fitting Neural Surface Models with Point Guidance
Chamin Hewa Koneputugodage, Yizhak Ben-Shabat, Dylan Campbell and Stephen Gould.
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
[ paper | talk | code | bib ]
Differentiable Neural Surface Refinement for Transparent Objects
Weijian Deng, Dylan Campbell, Chunyi Sun, Shubham Kanitkar, Matthew E. Shaffer and Stephen Gould.
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
[ paper | talk | project | bib ]
Selective View Pipelining: An Efficient Approach for Multi-view Understanding
Yunzhong Hou, Stephen Gould and Liang Zheng.
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
[ paper | bib ]
3DInAction: Understanding Human Actions in 3D Point Clouds
Yizhak Ben-Shabat, Oren Shrout and Stephen Gould.
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
[ paper | preprint | talk | code | bib ]
Neuro-symbolic Learning of Lifted Action Models from Visual Traces
Kai (Oliver) Xi, Stephen Gould and Sylvie Thiebaux.
In International Conference on Automated Planning and Scheduling (ICAPS), 2024.
[ paper | bib ]
Candidate Set Re-ranking for Composed Image Retrieval with Dual Multi-modal Encoder
Zheyuan Liu, Weixuan Sun, Damien Teney and Stephen Gould.
In Transactions on Machine Learning Research (TMLR), 2024.
[ paper | preprint | project | bib ]
Towards Optimal Feature-Shaping Methods for Out-of-Distribution Detection
Qinyu Zhao, Ming Xu, Kartik Gupta, Akshay Asthana, Liang Zheng and Stephen Gould.
In International Conference on Learning Representations (ICLR), 2024.
[ paper | bib ]
View-coherent Correlation Consistency for Semi-supervised Semantic Segmentation
Yunzhong Hou, Stephen Gould and Liang Zheng.
In Pattern Recognition, 2024.
[ paper | bib ]
Ray Deformation Networks for Novel View Synthesis of Refractive Objects
Weijian Deng, Dylan Campbell, Chunyi Sun, Shubham Kanitkar, Matthew Shaffer and Stephen Gould.
In IEEE Winter Conference on Applications of Computer Vision (WACV), 2024.
[ paper | project | bib ]
IKEA Ego 3D dataset: Understanding furniture assembly actions from ego view 3D Point Clouds
Yizhak Ben-Shabat, Oren Shrout, Jonathan Paul, Eviatar Segev and Stephen Gould.
In IEEE Winter Conference on Applications of Computer Vision (WACV), 2024.
[ paper | bib ]
LipAT: Beyond Style Transfer for Controllable Neural Simulation of Lipstick using Cosmetic Attributes
Amila Silva, Olga Moskvyak, Alexander Long, Ravi Garg, Stephen Gould, Gil Avraham and Anton van den Hengel.
In IEEE Winter Conference on Applications of Computer Vision (WACV), 2024.
[ paper | bib ]
NeRFEditor: Differentiable Style Decomposition for 3D Scene Editing
Chunyi Sun, Yanbin Liu, Junlin Han and Stephen Gould.
In IEEE Winter Conference on Applications of Computer Vision (WACV), 2024.
[ paper | bib ]
Bi-directional Training for Composed Image Retrieval via Text Prompt Learning
Zheyuan Liu, Weixuan Sun, Yicong Hong, Damien Teney and Stephen Gould.
In IEEE Winter Conference on Applications of Computer Vision (WACV), 2024.
[ paper | bib ]
Revisiting Implicit Differentiation for Learning Problems in Optimal Control
Ming Xu, Timothy Molloy and Stephen Gould.
In Advances in Neural Information Processing Systems (NeurIPS), 2023.
[ paper | preprint | code | bib ]
Exploring Predictive Visual Context for Detecting Human–Object Interactions
Frederic Zhang, Yuhui Yuan, Dylan Campbell, Zhuoyao Zhong and Stephen Gould.
In IEEE International Conference on Computer Vision (ICCV), 2023.
[ paper | preprint | code | bib ]
Semi-Supervised Semantic Segmentation under Label Noise via Diverse Learning Groups
Peixia Li, Pulak Purkait, Thalaiyasingam Ajanthan, Majid Abdolshah, Ravi Garg, Hisham Husain, Chenchen Xu, Stephen Gould, Wanli Ouyang and Anton van den Hengel.
In IEEE International Conference on Computer Vision (ICCV), 2023.
[ paper | bib ]
Scaling Data Generation in Vision-and-Language Navigation
Zun Wang, Jialu Li, Yicong Hong, Yi Wang, Qi Wu, Mohit Bansal, Stephen Gould, Hao Tan and Yu Qiao.
In IEEE International Conference on Computer Vision (ICCV), 2023.
[ paper | code | bib ]
Learning Navigational Visual Representations with Semantic Map Supervision
Yicong Hong, Yang Zhou, Ruiyi Zhang, Franck Dernoncourt, Trung Bui, Stephen Gould and Hao Tan.
In IEEE International Conference on Computer Vision (ICCV), 2023.
[ paper | bib ]
Towards Understanding Gradient Approximation in Equality Constrained Deep Declarative Networks
Stephen Gould, Ming Xu, Zhiwei Xu and Yanbin Liu.
In the ICML Workshop on Differentiable Almost Everything, 2023.
[ preprint | code | bib ]
PMaF: Deep Declarative Layers for Principal Matrix Features
Zhiwei Xu, Hao Wang, Yanbin Liu and Stephen Gould.
In the ICML Workshop on Differentiable Almost Everything, 2023.
[ preprint | code | bib ]
Confidence and Dispersity Speak: Characterizing Prediction Matrix for Unsupervised Accuracy Estimation
Weijian Deng, Yumin Suh, Stephen Gould and Liang Zheng.
In the International Conference on Machine Learning (ICML), 2023.
[ paper coming soon | code coming soon | bib ]
Octree Guided Unoriented Surface Reconstruction
Chamin Hewa Koneputugodage, Yizhak Ben-Shabat and Stephen Gould.
In the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
[ paper | project | code | bib ]
Aligning Step-by-Step Instructional Diagrams to Video Demonstrations
Jiahao Zhang, Anoop Cherian, Yanbin Liu, Yizhak Ben-Shabat, Cristian Rodriguez and Stephen Gould.
In the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
[ paper | preprint | code | dataset | project | bib ]
High-Fidelity Guided Image Synthesis with Latent Diffusion Models
Jaskirat Singh, Liang Zheng and Stephen Gould.
In the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
[ paper | preprint | project | bib ]
Deep Declarative Dynamic Time Warping for End-to-end Learning of Alignment Paths
Ming Xu, Sourav Garg, Michael Milford and Stephen Gould.
In International Conference on Learning Representations (ICLR), 2023.
[ paper | preprint | code | bib ]
On the Strong Correlation Between Model Invariance and Generalization
Weijian Deng, Stephen Gould and Liang Zheng.
In Advances in Neural Information Processing Systems (NeurIPS), 2022.
[ paper | bib ]
GoferBot: A Visual Guided Human-Robot Collaborative Assembly System
Zheyu Zhuang, Yizhak Ben-Shabat, Jiahao Zhang, Stephen Gould and Robert Mahony.
In International Conference on Intelligent Robots and Systems (IROS), 2022.
[ paper | video | bib ]
Fine-grained Classification via Categorical Memory Networks
Weijian Deng, Joshua Marsh, Stephen Gould and Liang Zheng.
In the IEEE Transactions on Image Processing (TIP), 2022.
[ paper | preprint | bib ]
Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation
Yicong Hong, Zun Wang, Qi Wu and Stephen Gould.
In the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
[ paper | preprint | bib ]
Efficient Two-Stage Detection of Human-Object Interactions with a Novel Unary-Pairwise Transformer
Zhen Zhang, Dylan Campbell and Stephen Gould.
In the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
[ paper | preprint | code | bib ]
DiGS: Divergence Guided Shape Implicit Neural Representation for Unoriented Point Clouds
Yizhak Ben-Shabat, Chamin Hewa Koneputugodage and Stephen Gould.
In the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
[ paper | preprint | project | code | bib ]
Exploiting Problem Structure in Deep Declarative Networks: Two Case Studies
Stephen Gould, Dylan Campbell, Yizhak Ben-Shabat, Chamin Hewa Koneputugodage and Zhiwei Xu.
In the First AAAI Workshop on Optimal Transport and Structured Data Modeling (OT-SDM), 2022.
[ paper | slides | preprint | code (rvp) | code (ot) | video | bib ]
Rethinking Conditional GAN Training: An Approach Using Geometrically Structured Latent Manifolds
Sameera Ramasinghe, Moshiur R. Farazi, Salman Khan, Nick Barnes and Stephen Gould.
In Advances in Neural Information Processing Systems (NeurIPS), 2021.
[ paper | preprint | code | bib ]
A Regularized Wasserstein Framework for Graph Kernels
Asiri Wijesinghe, Qing Wang and Stephen Gould.
In IEEE International Conference on Data Mining (ICDM), 2021.
[ paper | preprint | bib ]
Spatially Conditioned Graphs for Detecting Human-Object Interactions
Frederic Z. Zhang, Dylan Campbell and Stephen Gould.
In IEEE International Conference on Computer Vision (ICCV), 2021.
[ paper | preprint | video | code | bib ]
Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models
Zheyuan Liu, Cristian Rodriguez, Damien Teney and Stephen Gould.
In IEEE International Conference on Computer Vision (ICCV), 2021.
[ paper | data | bib ]
Contextually Plausible and Diverse 3D Human Motion Prediction
Sadegh Aliakbarian, Fatemeh Saleh, Lars Petersson, Stephen Gould and Mathieu Salzmann.
In IEEE International Conference on Computer Vision (ICCV), 2021.
[ paper | bib ]
What Does Rotation Prediction Tell Us about Classifier Accuracy under Varying Testing Environments?
Weijian Deng, Stephen Gould and Liang Zheng.
In International Conference on Machine Learning (ICML), 2021.
[ paper coming soon | preprint | code | bib ]
A Recurrent Vision-and-Language BERT for Navigation
Yicong Hong, Qi Wu, Yuankai Qi, Cristian Rodriguez and Stephen Gould.
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
[ paper | preprint | code | bib ]
Probabilistic Tracklet Scoring and Inpainting for Multiple Object Tracking
Fatemeh Saleh, Sadegh Aliakbarian, Hamid Rezatofighi, Mathieu Salzmann and Stephen Gould.
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
[ paper | preprint | code | bib ]
Deep Declarative Networks
Stephen Gould, Richard Hartley and Dylan Campbell.
In IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2021.
[ paper | preprint | code & resources | bib ]
Conditional Generative Modeling via Learning the Latent Space
Sameera Ramasinghe, Kanchana Nisal Ranasinghe, Salman Khan, Nick Barnes and Stephen Gould.
In International Conference on Learning Representations (ICLR), 2021.
[ paper | bib ]
DORi: Discovering Objects Relationship for Temporal Moment Localization of a Natural-Language Query in Video
Cristian Rodriguez, Edison Marrese-Taylor, Basura Fernando, Hongdong Li and Stephen Gould.
In IEEE Winter Conference on Applications of Computer Vision (WACV), 2021.
[ paper | bib ]
The IKEA ASM Dataset: Understanding People Assembling Furniture through Actions, Objects and Pose
Yizhak Ben-Shabat, Xin Yu, Fatemeh Saleh, Dylan Campbell, Cristian Rodriguez, Hongdong Li and Stephen Gould.
In IEEE Winter Conference on Applications of Computer Vision (WACV), 2021.
[ paper | website | code | bib ]
Semantics for Robotic Mapping, Perception and Interaction: A Survey
Sourav Garg, Niko Sunderhauf, Feras Dayoub, Douglas Morrison, Akansel Cosgun, Gustavo Carneiro, Qi Wu, Tat-Jun Chin, Ian Reid, Stephen Gould, Peter Corke and Michael Milford.
In Foundations and Trends in Robotics, 2020.
[ paper | preprint | bib ]
Language and Visual Entity Relationship Graph for Agent Navigation
Yicong Hong, Cristian Rodriguez, Yuankai Qi, Qi Wu and Stephen Gould.
In Advances in Neural Information Processing Systems (NeurIPS), 2020.
[ paper | code | bib ]
Joint Unsupervised Learning of Optical Flow and Egomotion with Bi-Level Optimization
Shihao Jiang, Dylan Campbell, Miaomiao Liu, Stephen Gould and Richard Hartley.
In International Conference on 3D Vision (3DV), 2020.
[ paper | preprint | bib ]
Sub-Instruction Aware Vision-and-Language Navigation
Yicong Hong, Cristian Rodriguez, Qi Wu and Stephen Gould.
In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.
[ paper | preprint | code | bib ]
DeepFit: 3D Surface Fitting by Neural Network Weighted Least Squares
Yizhak Ben-Shabat and Stephen Gould.
In Proceedings of the European Conference on Computer Vision (ECCV), 2020.
[ paper | preprint | talk | code | bib ]
Multiview Pedestrian Detection with Feature Perspective Transformation
Yunzhong Hou, Liang Zheng and Stephen Gould.
In Proceedings of the European Conference on Computer Vision (ECCV), 2020.
[ paper | preprint | bib ]
Solving the Blind Perspective-n-Point Problem End-To-End with Robust Differentiable Geometric Optimization
Dylan Campbell, Liu Liu and Stephen Gould.
In Proceedings of the European Conference on Computer Vision (ECCV), 2020.
[ paper | talk | preprint | code | bib ]
Spectral-GANs for High-Resolution 3D Point-cloud Generation
Sameera Ramasinghe, Salman Khan, Nick Barnes and Stephen Gould.
In International Conference on Intelligent Robots and Systems (IROS), 2020.
[ paper | preprint | bib ]
Inferring Temporal Compositions of Actions Using Probabilistic Automata
Rodrigo Santa Cruz, Dylan Campbell, Anoop Cherian, Basura Fernando and Stephen Gould.
In Workshop on Compositionality in Computer Vision at CVPR, 2020.
[ paper | preprint | bib ]
Enhanced Light-Matter Interactions in Dielectric Nanostructures via Machine Learning Approach
Lei Xu, Mohsen Rahmani, Yixuan Ma, Daria A. Smirnova, Khosro Zangeneh Kamali, Fu Deng, Yan Kei Chiang, Lujun Huang, Haoyang Zhang, Stephen Gould, Dragomir N. Neshev and Andrey E. Miroshnichenko.
In Advanced Photonics, 2020.
[ paper | preprint | bib ]
A Stochastic Conditioning Scheme for Diverse Human Motion Prediction
Sadegh Aliakbarian, Fatemeh Saleh, Mathieu Salzmann, Lars Petersson, Stephen Gould and Amir Habibian.
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[ paper | code | bib ]
Learning to Structure an Image with Few Colors
Yunzhong Hou, Liang Zheng and Stephen Gould.
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[ paper | preprint | bib ]
A Signal Propagation Perspective for Pruning Neural Networks at Initialization
Namhoon Lee, Thalaiyasingam Ajanthan, Stephen Gould and Philip H. S. Torr.
In International Conference on Learning Representations (ICLR), 2020.
[ paper and talk | preprint | bib ]
Representation Learning on Unit Ball with 3D Roto-translational Equivariance
Sameera Ramasinghe, Salman Khan, Stephen Gould and Nick Barnes.
In International Journal of Computer Vision (IJCV), 2020.
[ paper | preprint | bib ]
Proposal-free Temporal Moment Localization of a Natural-language Query in Video using Guided Attention
Cristian Rodriguez Opazo, Edison Marrese-Taylor, Fatemeh Sadat Saleh, Hongdong Li and Stephen Gould.
In IEEE Winter Conference on Applications of Computer Vision (WACV), 2020.
[ paper | arXiv | bib ]
Blended Convolution and Synthesis for Efficient Discrimination of 3D Shapes
Sameera Ramasinghe, Salman Khan, Nick Barnes and Stephen Gould.
In IEEE Winter Conference on Applications of Computer Vision (WACV), 2020.
[ paper | bib ]
Learning to Find Common Objects Across Image Collections
Amirreza Shaban, Amir Rahimi, Shray Bansal, Stephen Gould, Byron Boots and Richard Hartley.
In IEEE International Conference on Computer Vision (ICCV), 2019.
[ paper | code | bib ]
The Alignment of the Spheres: Globally-Optimal Spherical Mixture Alignment for Camera Pose Estimation
Dylan Campbell, Lars Petersson, Laurent Kneip, Hongdong Li and Stephen Gould.
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[ paper | arXiv | bib ]
Visual Permutation Learning
Rodrigo Santa Cruz, Basura Fernando, Anoop Cherian and Stephen Gould.
In IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2018.
[ paper | preprint | bib ]
Partially-Supervised Image Captioning
Peter Anderson, Stephen Gould and Mark Johnson.
In Advances in Neural Information Processing Systems (NeurIPS), 2018.
[ pdf | arXiv | bib ]
Second-order Temporal Pooling for Action Recognition
Anoop Cherian and Stephen Gould.
In International Journal of Computer Vision (IJCV), 2018.
[ paper | preprint | bib ]
Non-Linear Temporal Subspace Representations for Activity Recognition
Anoop Cherian, Suvrit Sra, Stephen Gould and Richard Hartley.
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[ pdf | bib ]
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould and Lei Zhang.
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[ pdf | arXiv | bib ]
Vision-and-Language Navigation: Interpreting Visually-grounded Navigation Instructions in Real Environments
Peter Anderson, Qi Wu, Damien Teney, Jake Bruce, Mark Johnson, Niko Sünderhauf, Ian Reid, Stephen Gould and Anton van den Hengel.
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[ pdf | arXiv | project | bib ]
Video Representation Learning Using Discriminative Pooling
Jue Wang, Anoop Cherian, Fatih Porikli and Stephen Gould.
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[ pdf | arXiv | bib ]
Neural Algebra of Classifiers
Rodrigo Santa Cruz, Basura Fernando, Anoop Cherian and Stephen Gould.
In IEEE Winter Conference on Applications of Computer Vision (WACV), 2018.
[ pdf | code | bib ]
Human Pose Forecasting via Deep Markov Models
Sam Toyer, Anoop Cherian, Tengda (Mike) Han and Stephen Gould.
In International Conference on Digital Image Computing: Techniques and Applications (DICTA), 2017.
[ arXiv | Ikea dataset | bib ]
Guided Open Vocabulary Image Captioning with Constrained Beam Search
Peter Anderson, Basura Fernando, Mark Johnson and Stephen Gould.
In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2017.
[ pdf | bib ]
Incorporating Network Built-in Priors in Weakly-supervised Semantic Segmentation
Fatemeh Sadat Saleh, Mohammad Sadegh Aliakbarian, Mathieu Salzmann, Lars Petersson, Jose M. Alvarez and Stephen Gould.
In IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2017.
[ paper | preprint | bib ]
Discriminatively Learned Hierarchical Rank Pooling Networks
Basura Fernando and Stephen Gould.
In International Journal of Computer Vision (IJCV), 2017.
[ paper | bib ]
Generalized Rank Pooling for Action Recognition
Anoop Cherian, Basura Fernando, Mehrtash Harandi and Stephen Gould.
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[ pdf | preprint | bib ]
DeepPermNet: Visual Permutation Learning
Rodrigo Santa Cruz, Basura Fernando, Anoop Cherian and Stephen Gould.
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[ pdf | preprint | code | bib ]
Self-Supervised Video Representation Learning With Odd-One-Out Networks
Basura Fernando, Hakan Bilen, Efstratios Gavves and Stephen Gould.
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[ pdf | preprint | bib ]
Higher-order Pooling of CNN Features via Kernel Linearization for Action Recognition
Anoop Cherian, Piotr Koniusz and Stephen Gould.
In IEEE Winter Conference on Applications of Computer Vision (WACV), 2017.
[ preprint | code | bib ]
Depth Dropout: Efficient Training of Residual Convolutional Neural Networks
Jian Guo and Stephen Gould.
In Digital Image Computing: Techniques and Applications (DICTA), 2016.
[ pdf | bib ]
SPICE: Semantic Propositional Image Caption Evaluation
Peter Anderson, Basura Fernando, Mark Johnson and Stephen Gould.
In Proceedings of the European Conference on Computer Vision (ECCV), 2016.
[ pdf | project | code | bib ]
Built-in Foreground/Background Prior for Weakly-Supervised Semantic Segmentation
Fatemehsadat Saleh, Mohammad Sadegh Ali Akbarian, Mathieu Salzmann, Lars Petersson, Stephen Gould and Jose M. Alvarez.
In Proceedings of the European Conference on Computer Vision (ECCV), 2016.
[ pdf | arXiv | bib ]
On Differentiating Parameterized Argmin and Argmax Problems with Application to Bi-level Optimization
Stephen Gould, Basura Fernando, Anoop Cherian, Peter Anderson, Rodrigo Santa Cruz and Edison Guo.
Technical Report (available online arXiv:1607.05447), 2016.
[ pdf | arXiv | code | bib ]
Deep Convolutional Neural Networks for Human Embryonic Cell Counting
Aisha Khan, Stephen Gould and Mathieu Salzmann.
In Workshop on Bioimage Computing (BIC) at ECCV, 2016.
[ pdf | bib ]
Learning End-to-end Video Classification with Rank-Pooling
Basura Fernando and Stephen Gould.
In Proceedings of the International Conference on Machine Learning (ICML), 2016.
[ pdf | bib ]
Discriminative Hierarchical Rank Pooling for Activity Recognition
Basura Fernando, Peter Anderson, Marcus Hutter and Stephen Gould.
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[ pdf | code | bib ]
Dynamic Image Networks for Action Recognition
Hakan Bilen, Basura Fernando, Stratis Gavves, Andrea Vedaldi and Stephen Gould.
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[ pdf | code | bib ]
Segmentation of Developing Human Embryo in Time-lapse Microscopy
Aisha Khan, Stephen Gould and Mathieu Salzmann.
In IEEE International Symposium on Biomedical Imaging (ISBI), 2016.
[ pdf | bib ]
Hierarchical Higher-order Regression Forest Fields: An Application to 3D Indoor Scene Labelling
Trung T. Pham, Ian Reid, Yasir Latif and Stephen Gould.
In IEEE International Conference on Computer Vision (ICCV), 2015.
[ pdf | bib ]
Detecting Abnormal Cell Division Patterns in Early Stage Human Embryo Development
Aisha Khan, Stephen Gould and Mathieu Salzmann.
In 6th International Workshop on Machine Learning in Medical Imaging (MLMI) at MICCAI, 2015.
[ pdf | bib ]
Automated Monitoring of Human Embryonic Cells up to the 5-cell Stage in Time-lapse Microscopy Images
Aisha Khan, Stephen Gould and Mathieu Salzmann.
In IEEE International Symposium on Biomedical Imaging (ISBI), 2015.
[ pdf | bib ]
Multi-target Tracking with Time-varying Clutter Rate and Detection Profile: Application to Time-lapse Cell Microscopy Sequences
Seyed Hamid Rezatofighi, Stephen Gould, Ba-Tuong Vo, Ba-Ngu Vo, Katarina Mele and Richard Hartley.
In IEEE Transactions on Medical Imaging (TMI), 2015.
[ pdf | bib ]
Multi-class Semantic Video Segmentation with Exemplar-based Object Reasoning
Buyu Liu, Stephen Gould and Xuming He.
In IEEE Winter Conference on Applications of Computer Vision (WACV), 2015.
[ pdf | bib ]
A Linear Chain Markov Model for Detection and Localization of Cells in Early Stage Embryo Development
Aisha Khan, Stephen Gould and Mathieu Salzmann.
In IEEE Winter Conference on Applications of Computer Vision (WACV), 2015.
[ pdf | bib ]
Learning Weighted Lower Linear Envelope Potentials in Binary Markov Random Fields
Stephen Gould.
In IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2015.
[ pdf | code | bib ]
Scene Understanding by Labeling Pixels
Stephen Gould and Xuming He.
In Communications of the ACM (CACM), 2014.
[ pdf | link | video | bib ]
Determining Interacting Objects in Human-Centric Activities via Qualitative Spatio-temporal Reasoning
Hajar Sadeghi Sokeh, Stephen Gould and Jochen Renz.
In Proceedings of the Asian Conference on Computer Vision (ACCV), 2014.
[ pdf | bib ]
Superpixel Graph Label Transfer with Learned Distance Metric
Stephen Gould, Jiecheng Zhao, Xuming He and Yuhang Zhang.
In Proceedings of the European Conference on Computer Vision (ECCV), 2014.
[ pdf | code | bib ]
An Exemplar-based CRF for Multi-instance Object Segmentation
Xuming He and Stephen Gould.
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
[ pdf | bib ]
Joint Semantic and Geometric Segmentation of Videos with a Stage Model
Buyu Liu, Xuming He and Stephen Gould.
In IEEE Winter Conference on Applications of Computer Vision (WACV), 2014.
[ pdf | bib ]
A Unified Graphical Models Framework for Automated Mitosis Detection in Human Embryos
Farshid Moussavi, Wang Yu, Peter Lorenzen, Jonathan Oakley, Daniel Russakoff and Stephen Gould.
In IEEE Transactions on Medical Imaging (TMI), 2014.
A shorter version of this paper appeared in IEEE International Symposium on Biomedical Imaging (ISBI), 2014.
[ pdf | bib ]
Efficient Extraction and Representation of Spatial Information from Video Data
Hajar Sadeghi Sokeh, Stephen Gould and Jochen Renz.
In Proceedings of the Twenty Third International Joint Conference on Artificial Intelligence (IJCAI), 2013.
[ pdf | bib ]
Discriminative Learning with Latent Variables for Cluttered Indoor Scene Understanding
Huayan Wang, Stephen Gould and Daphne Koller.
In Communications of the ACM, Research Highlights, 2013.
An earlier version of this work appeared in Proceedings of the European Conference on Computer Vision (ECCV), 2010.
[ pdf (cacm) | pdf (eccv) | link | bib ]
A Multiple Model Probability Hypothesis Density Tracker for Time-lapse Cell Microscopy Sequences
Seyed Hamid Rezatofighi, Stephen Gould, Ba-Ngu Vo, Katarina Mele, William E. Hughes and Richard Hartley.
In International Conference on Information Processing in Medical Imaging (IPMI), 2013.
[ pdf | bib ]
A Framework for Generating Realistic Synthetic Sequences of Total Internal Reflection Fluorescence Microscopy Images
Seyed Hamid Rezatofighi, William T. E. Pitkeathly, Stephen Gould, Richard Hartley, Katarina Mele, William E. Hughes and James G. Burchfield.
In Proceedings of the International Symposium on Biomedical Imaging (ISBI), 2013.
[ pdf | code | bib ]
DARWIN: A Framework for Machine Learning and Computer Vision Research and Development
Stephen Gould.
In Journal of Machine Learning Research (JMLR), 2012.
[ pdf | code | mloss | bib ]
Towards Unsupervised Semantic Segmentation of Street Scenes From Motion Cues
Hajar Sadeghi Sokeh and Stephen Gould.
In Proceedings of the International Conference on Image and Vision Computing New Zealand (IVCNZ), 2012.
[ pdf | bib ]
A Noise Tolerant Watershed Transformation with Viscous Force for Seeded Image Segmentation
Di Yang, Stephen Gould and Marcus Hutter.
In Proceedings of the Asian Conference on Computer Vision (ACCV), 2012.
[ pdf | code | bib ]
PatchMatchGraph: Building a Graph of Dense Patch Correspondences for Label Transfer
Stephen Gould and Yuhang Zhang.
In Proceedings of the European Conference on Computer Vision (ECCV), 2012.
[ pdf | code | polo dataset (32MB) | stanford dataset (15MB) | bib ]
On Learning Higher-Order Consistency Potentials for Multi-class Pixel Labeling
Kyoungup Park and Stephen Gould.
In Proceedings of the European Conference on Computer Vision (ECCV), 2012.
[ pdf | bib ]
Application of the IMM-JPDA Filter to Multiple Target Tracking in Total Internal Reflection Fluorescence Microscopy Images
Seyed Hamid Rezatofighi, Stephen Gould, Richard Hartley, Katarina Mele and William E. Hughes.
In Proceedings of the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), 2012.
[ pdf | bib ]
Multiclass Pixel Labeling with Non-Local Matching Constraints
Stephen Gould.
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
[ pdf | bib ]
Simultaneous Multi-class Pixel Labeling over Coherent Image Sets
Paul Rivera and Stephen Gould.
In Digital Image Computing: Techniques and Applications (DICTA), 2011.
[ pdf | bib ]
Max-margin Learning for Lower Linear Envelope Potentials in Binary Markov Random Fields
Stephen Gould.
In Proceedings of the International Conference on Machine Learning (ICML), 2011.
[ pdf | code | slides (.pdf) | bib ]
A Unified Contour-Pixel Model for Segmentation
Ben Packer, Stephen Gould and Daphne Koller.
In Proceedings of the European Conference on Computer Vision (ECCV), 2010.
[ pdf | bib ]
Probabilistic Models for Region-based Scene Understanding
Stephen Gould.
Ph.D. Thesis, Stanford University, June 2010.
[ pdf | archive | bib ]
Accelerated Dual Decomposition for MAP Inference
Vladimir Jojic, Stephen Gould and Daphne Koller.
In Proceedings of the International Conference on Machine Learning (ICML), 2010.
[ pdf | bib ]
Single Image Depth Estimation from Predicted Semantic Labels
Beyang Liu, Stephen Gould and Daphne Koller.
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010.
[ pdf | data (.tar.gz) | bib ]
Region-based Segmentation and Object Detection
Stephen Gould, Tianshi Gao and Daphne Koller.
In Advances in Neural Information Processing Systems (NeurIPS), 2009.
[ pdf | bib ]
Decomposing a Scene into Geometric and Semantically Consistent Regions
Stephen Gould, Rick Fulton and Daphne Koller.
In IEEE International Conference on Computer Vision (ICCV), 2009.
[ pdf | slides (.pdf) | inference (.wmv) | data (.tar.gz) | bib ]
Alphabet SOUP: A Framework for Approximate Energy Minimization
Stephen Gould, Fernando Amat and Daphne Koller.
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
[ pdf | erratum | poster | bib ]
High-Accuracy 3D Sensing for Mobile Manipulation: Improving Object Detection and Door Opening
Morgan Quigley, Siddharth Batra, Stephen Gould, Ellen Klingbeil, Quoc V. Le, Ashley Wellman and Andrew Y. Ng.
In IEEE International Conference on Robotics and Automation (ICRA), 2009.
[ pdf | videos | bib ]
Cascaded Classification Models: Combining Models for Holistic Scene Understanding
Geremy Heitz, Stephen Gould, Ashutosh Saxena and Daphne Koller.
In Advances in Neural Information Processing Systems (NeurIPS), 2008.
[ pdf | bib ]
Learning Bounded Treewidth Bayesian Networks
Gal Elidan and Stephen Gould.
In Advances in Neural Information Processing Systems (NeurIPS), 2008.
A longer version of this paper also appears in the Journal of Machine Learning Research (JMLR), 2008.
[ pdf (nips) | pdf (jmlr) | bib ]
Integrating Visual and Range Data for Robotic Object Detection
Stephen Gould, Paul Baumstarck, Morgan Quigley, Andrew Y. Ng and Daphne Koller.
In ECCV workshop on Multi-camera and Multi-modal Sensor Fusion Algorithms and Applications (M2SFA2), 2008.
[ pdf | bib ]
Projected Subgradient Methods for Learning Sparse Gaussians
John Duchi, Stephen Gould and Daphne Koller.
In Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI), 2008.
[ pdf | bib ]
Multi-Class Segmentation with Relative Location Prior
Stephen Gould, Jim Rodgers, David Cohen, Gal Elidan and Daphne Koller.
In International Journal of Computer Vision (IJCV), 2008.
[ pdf | bib ]
STAIR: The STanford Artificial Intelligence Robot Project
Andrew Y. Ng, Stephen Gould, Morgan Quigley, Ashutosh Saxena and Eric Berger.
In Learning Workshop, Snowbird, 2008.
[ project ]
Peripheral-Foveal Vision for Real-time Object Recognition and Tracking in Video
Stephen Gould, Joakim Arfvidsson, Adrian Kaehler, Benjamin Sapp, Marius Meissner, Gary Bradski, Paul Baumstarck, Sukwon Chung and Andrew Y. Ng.
In Proceedings of the Twentieth International Joint Conference on Artificial Intelligence (IJCAI), 2007.
[ pdf | bib ]
Software

The following are some of the larger software libraries that I have (co-)developed and maintain. For reference implementations of the algorithms described in my work, see the links next to the relevant paper in my publications list.

ANU CVML Video Annotation Tool
An in-browser video annotation tool for labelling bounding boxes, polygon regions and skeletons on video frames. Temporal segments can also be tagged and annotated with activities. The tool is simple to use and runs entirely within the browser, so no client-side installation is necessary. The code is released under the MIT license.

DDN: Deep Declarative Networks
Python code and tutorials for implementing deep declarative networks, a class of deep learning models that allow optimization problems to be embedded within an end-to-end learnable network. Includes PyTorch layers for projection, robust pooling and more. The code is released under the MIT license.
[ webpage | tutorials | github ]
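
As a rough illustration of the idea (this is not code from the DDN repository), the sketch below implements a robust pooling node in plain Python rather than PyTorch. The forward pass solves a small optimization problem, y = argmin_u sum_i phi(u - x_i), and the backward pass obtains dy/dx_i by differentiating the optimality condition instead of unrolling the solver. The pseudo-Huber penalty, the scale ALPHA, the bisection solver and the function names robust_pool and robust_pool_grad are all assumptions made for this example.

```python
import math

ALPHA = 1.0  # pseudo-Huber scale (an illustrative choice, not a library default)


def dphi(z, alpha=ALPHA):
    """First derivative of the pseudo-Huber penalty."""
    return z / math.sqrt(1.0 + (z / alpha) ** 2)


def d2phi(z, alpha=ALPHA):
    """Second derivative of the pseudo-Huber penalty (always positive)."""
    return (1.0 + (z / alpha) ** 2) ** -1.5


def robust_pool(xs, alpha=ALPHA, iters=200):
    """Forward pass: y = argmin_u sum_i phi(u - x_i), found by bisection.

    The objective is strictly convex, so its derivative in u is increasing
    and changes sign on [min(xs), max(xs)].
    """
    lo, hi = min(xs), max(xs)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(dphi(mid - x, alpha) for x in xs) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def robust_pool_grad(xs, y, alpha=ALPHA):
    """Backward pass: dy/dx_i from the implicit function theorem.

    Differentiating the optimality condition sum_i phi'(y - x_i) = 0 gives
    dy/dx_i = phi''(y - x_i) / sum_j phi''(y - x_j).
    """
    w = [d2phi(y - x, alpha) for x in xs]
    total = sum(w)
    return [wi / total for wi in w]


xs = [0.9, 1.0, 1.1, 8.0]  # the last value is an outlier
y = robust_pool(xs)
grad = robust_pool_grad(xs, y)
```

In this sketch the gradient depends only on the solution y, not on how the solver reached it, so the bisection could be replaced by any other method that finds the same minimizer. The outlier receives a much smaller gradient than the inliers, which is the behaviour one would want from a robust pooling layer.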

Darwin
A C++ framework for machine learning and computer vision research. The framework includes a wide range of standard machine learning and graphical model algorithms, as well as reference implementations for many of the algorithms described in the publications above. The code is released under the BSD license. If you are interested in contributing to this codebase, please email me.
[ releases | documentation | video help | mloss | github ]

The STAIR Vision Library
A platform-independent C++ toolkit for computer vision research (building on top of OpenCV). The library also includes many machine learning and probabilistic graphical model algorithms. We have released the code under the BSD license. This library was developed while I was at Stanford University and is no longer supported. Much of its functionality, however, is available in the Darwin framework described above.
[ wiki | doc | sourceforge ]
Professional Activities

Workshops, Conferences, and Journals
I have regularly served as a program committee member or reviewer for the following conferences and journals: CVPR, ECCV, ICCV, ICML, IEEE PAMI, IEEE TIP, IJCV, JMLR, NeurIPS, RSS, UAI and others.

Selected Invited Talks and Tutorials

Teaching

Selected Lecture Notes
Issued Patents

US 11,048,919. Person tracking across video instances.
US 10,949,353. Data iterator with automatic caching.
US 10,885,628. Single image completion from retrieved image collections.
US 10,534,965. Analysis of video content.
US 10,460,175. Deep learning processing of video.
US 9,710,696. Apparatus, method, and system for image-based human embryo cell classification.
US 9,542,591. Apparatus, method, and system for automated, non-invasive cell activity tracking.
US 9,177,192. Apparatus, method, and system for image-based human embryo cell classification.
US 7,725,312. Transcoding method and system between CELP-based speech codes with externally provided status.
US 7,411,418. Efficient representation of state transition tables.
US 7,301,792. Apparatus and method of ordering state transition rules for memory efficient, programmable, pattern matching finite state machine hardware.
US 7,219,319. Apparatus and method for generating state transition rules for memory efficient programmable pattern matching finite state machine hardware.
US 7,184,953. Transcoding method and system between CELP-based speech codes with externally provided status.
US 7,180,328. Apparatus and method for large hardware finite state machine with embedded equivalence classes.
US 7,082,044. Apparatus and method for memory efficient, programmable, pattern matching finite state machine hardware.
US 6,829,579. Transcoding method and system between CELP-based speech codes.
AU 2004222859. A method for developing algorithms.
Useful Links