Alexander Mathews

I am a PhD student at the Australian National University (ANU), working with Dr Lexing Xie and Dr Xuming He. My research interests lie in computer vision and machine learning, with a primary focus on combining image and text modalities to produce natural language captions for images and videos.

SentiCap: Generating Image Descriptions with Sentiments
Alexander Mathews, Lexing Xie, Xuming He

We design a system to describe an image with emotions, and present a model that automatically generates captions with positive or negative sentiments. We propose a novel switching recurrent neural network with word-level regularization, which is able to produce emotional image captions using only 2000+ training sentences containing sentiments. We evaluate the captions with several automatic and crowd-sourced metrics. Our model compares favourably on common quality metrics for image captioning. In 84.6% of cases, the generated positive captions were judged to be at least as descriptive as the factual captions. Of these positive captions, 88% were confirmed by crowd-sourced workers as having the appropriate sentiment.
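The abstract names a "switching" recurrent network but does not spell out how the switch works. One plausible reading, assumed here purely for illustration and not taken from the paper, is that at each step a factual stream and a sentiment stream each propose a next-word distribution, and a switch probability blends the two. The sketch below shows only that mixing step; the function `mix_next_word`, the dictionary representation of distributions, and the example words are all hypothetical.

```python
def mix_next_word(p_factual, p_sentiment, gamma):
    """Blend two next-word distributions with a switch probability.

    Hypothetical illustration, not the SentiCap implementation.
    p_factual, p_sentiment: dicts mapping word -> probability (each sums to 1).
    gamma: assumed probability in [0, 1] of drawing from the sentiment stream.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must be in [0, 1]")
    vocab = set(p_factual) | set(p_sentiment)
    # A convex combination of two distributions is itself a distribution.
    return {
        w: (1.0 - gamma) * p_factual.get(w, 0.0) + gamma * p_sentiment.get(w, 0.0)
        for w in vocab
    }


if __name__ == "__main__":
    factual = {"a": 0.5, "dog": 0.5}
    sentiment = {"a": 0.5, "cute": 0.5}
    print(mix_next_word(factual, sentiment, 0.5))
```

With `gamma = 0.5` the shared word "a" keeps probability 0.5, while "dog" and "cute" each receive 0.25. A switch near 0 reproduces the factual stream, and a switch near 1 favours sentiment-bearing words.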

Choosing Basic-Level Concept Names using Visual and Language Context
Alexander Mathews, Lexing Xie, Xuming He

We study basic-level categories for describing visual concepts, and empirically observe context-dependent basic-level names across thousands of concepts. We propose methods for predicting basic-level names using a series of classification and ranking tasks, producing the first large-scale catalogue of basic-level names for hundreds of thousands of images depicting thousands of visual concepts. We also demonstrate the usefulness of our method with a picture-to-word task, showing a strong improvement (0.17 precision at slightly higher recall) over recent work by Ordonez et al., and observe significant effects of incorporating both visual and language context for classification. Moreover, our study suggests that a model for naming visual concepts is an important part of any automatic image/video captioning and visual storytelling system.
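The picture-to-word result is reported in terms of precision and recall. As a reminder of how these quantities are commonly computed when a system predicts a set of names for an image, here is a per-image, set-based sketch. The function name `precision_recall`, the set-matching protocol, and the example words are assumptions for illustration; the paper's exact evaluation protocol (for example, how scores are averaged across images) is not described here.

```python
def precision_recall(predicted, reference):
    """Set-based precision and recall for one image (hypothetical protocol).

    predicted: names output by a naming model.
    reference: names treated as correct for the image.
    """
    predicted, reference = set(predicted), set(reference)
    hits = len(predicted & reference)  # names that are both predicted and correct
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    p, r = precision_recall(["dog", "grass", "ball"], ["dog", "ball", "park", "frisbee"])
    print(p, r)
```

Here two of the three predicted names are correct (precision 2/3) and two of the four reference names are recovered (recall 0.5). Gaining precision without losing recall means the extra correct names are not bought by predicting fewer words.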