Proposed PhD Projects in Computer Vision

Synthesis of Stereoscopic Movie from Conventional Monocular Video Clips.

In order to provide material for 3-dimensional television displays, methods are required for producing 3-dimensional video material from existing 2-dimensional video, such as old films.  This project seeks to develop automatic and interactive methods for 3-dimensionalizing video for this purpose.  What is required is to create a synthetic stereo pair from a single video sequence.  This involves synthesis of a matching image, which in conjunction with the original image will give an illusion of 3-dimensions.  Methods of geometric computer vision and “structure from motion” will be used to do this.  By analysis of the motion of objects in the video sequence, and determination of the motion of the camera, the structure of the scene can be determined.  Once the geometry of the scene is understood, a stereo pair of images can be produced.
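As a toy illustration of the final step, the sketch below (not the project's actual pipeline; the focal length, baseline and depth values are all hypothetical) synthesizes a matching row of a right-eye image from one row of the original image plus a per-pixel depth estimate, using the standard disparity relation d = fB/Z:

```python
# Hedged sketch: given one image row and estimated per-pixel depths
# (which a real system would recover via structure from motion), shift
# each pixel by its disparity d = f * B / Z to form the right-eye row.
# The focal and baseline values below are illustrative, not calibrated.

def synthesize_right_row(row, depths, focal=100.0, baseline=0.06):
    """Forward-warp a row of pixels into a synthetic right-eye view."""
    width = len(row)
    right = [None] * width
    for x, (value, z) in enumerate(zip(row, depths)):
        d = int(round(focal * baseline / z))    # disparity in pixels
        if 0 <= x - d < width:
            right[x - d] = value                # nearer pixels shift further
    for x in range(width):                      # fill disocclusion gaps
        if right[x] is None:
            right[x] = right[x - 1] if x > 0 else row[x]
    return right

row = [10, 20, 30, 40, 50, 60]
depths = [10.0, 10.0, 3.0, 3.0, 10.0, 10.0]     # the middle object is closer
print(synthesize_right_row(row, depths))        # → [30, 40, 40, 50, 60, 60]
```

A full system would apply this warp to every row of every frame and resolve overlaps by depth ordering; the gap-filling here stands in for proper disocclusion handling.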

An APAI scholarship is offered in association with this position.

Supervisor: Richard Hartley

Computer Vision and Machine Learning.

We seek to apply methods of machine learning, particularly kernel-based learning techniques, to the solution of problems in computer vision.  Specific problems addressed are object recognition and localization (position determination).  This project involves collaboration with a European project named LAVA, which aims to apply such methods in mobile computing.  Images taken with a hand-held camera or digital-assistant device can then serve as input for providing immediate assistance.  For example, recognizing the location at which an image was taken can be used to bring up local information about the environment: an image taken inside a shopping mall can bring up a map of the mall, with directions from the recognized location to a desired shop.
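As a minimal, hypothetical illustration of kernel-based recognition (not the LAVA system itself), the sketch below classifies a toy feature vector by its mean RBF-kernel similarity to labelled example sets; the class names and feature values are invented:

```python
import math

# Hypothetical sketch of kernel-based recognition: score a feature vector
# against labelled example sets using a radial basis function (RBF) kernel.
# All feature values and class names are toy inventions for illustration.

def rbf(a, b, gamma=0.5):
    """RBF kernel: similarity decays with squared distance."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(sample, classes):
    """Pick the class whose examples have the highest mean similarity."""
    return max(classes, key=lambda c:
               sum(rbf(sample, ex) for ex in classes[c]) / len(classes[c]))

classes = {
    "mall":   [[1.0, 0.1], [0.9, 0.2]],
    "street": [[0.1, 1.0], [0.2, 0.8]],
}
print(classify([0.95, 0.15], classes))   # → mall
```

A real recognizer would of course train a discriminative kernel machine on high-dimensional image descriptors; this only shows the shape of the kernel-similarity idea.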

Supervisor: Richard Hartley

Video Synopsis.

A system is proposed that will take input from a video camera and produce a summary video.  The envisaged scenario is one in which surveillance cameras are used to process large amounts of video, most of which is of little or no interest.  Storing all the video provided by the sensor would require vast amounts of storage, and would also be far more than a human operator would want to examine.  The envisaged system would retain and store only footage that contains interesting material.

We imagine an unsupervised video sensor (camera) placed in some location, gathering video information continuously.  Such a sensor may be placed in a public area where not much activity occurs, in a house, or at some remote site; we envisage both indoor and outdoor deployments.  Typically very little activity will occur, and there will be no need to retain or transmit most of the frames.  The system will need to distinguish unusual events from normal, uninteresting activity, such as the waving of trees or the motion of clouds.  Such decisions can be made on the basis of learning the normal variance of the scene.  At a more sophisticated level, particularly for indoor surveillance, the system can learn to recognize human subjects and their normal behaviour, flagging only unusual actions, such as people climbing through windows or lying on the floor.
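One simple way to realise "learning the normal variance of the scene" is per-pixel background statistics; a minimal sketch with toy pixel values (a real system would operate on full frames with a far richer model):

```python
# Sketch: learn per-pixel mean and variance from background frames, then
# keep only frames that deviate strongly from the learned statistics.
# Frames here are short lists of toy pixel values.

def learn_background(frames):
    """Per-pixel mean and variance over a training sequence."""
    n, d = len(frames), len(frames[0])
    means = [sum(f[i] for f in frames) / n for i in range(d)]
    varis = [sum((f[i] - means[i]) ** 2 for f in frames) / n for i in range(d)]
    return means, varis

def is_interesting(frame, means, varis, k=3.0, min_pixels=1):
    """Flag a frame if enough pixels stray beyond k standard deviations."""
    changed = sum(1 for p, m, v in zip(frame, means, varis)
                  if abs(p - m) > k * (v ** 0.5 + 1.0))  # +1 guards tiny variance
    return changed >= min_pixels

background = [[10, 10, 12], [11, 9, 12], [10, 10, 13]]
means, varis = learn_background(background)
print(is_interesting([10, 10, 12], means, varis))   # typical frame → False
print(is_interesting([10, 90, 12], means, varis))   # sudden change → True
```

Only the frames flagged as interesting would be retained or transmitted; the learned statistics would be updated continuously to track slow scene changes.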

Supervisor: Richard Hartley

Medical Image Analysis

Computer vision and image understanding are an important part of modern medicine.  Software systems are used to help clinicians diagnose diseases, screen for abnormal conditions, and visualize body anatomy.  Our particular interest is in using computer vision techniques to help in two areas: ophthalmology and cancer detection.  Diseases of the eye, such as diabetic retinopathy or glaucoma, can lead to blindness if untreated.  Screening of at-risk patients can detect these diseases at an early stage.  For instance, in glaucoma, pressure in the eyeball causes damage to the retina and optic nerve.  Stereoscopic imagery can be used to determine the degree of deformation of the eyeball due to this pressure, and hence the severity of the condition.  The required computer tools include stereoscopic analysis of image pairs, detection of abnormal features in the retina, and retinal image alignment and matching.

Screening for colon cancer is invasive, costly and uncomfortable.  A developing technique, virtual colonoscopy, seeks to replace it with a method based on the analysis of Computed Tomography (CT) images.  Challenges include the detection and visualization of the colon wall as an aid to interactive screening.

Supervisor: Richard Hartley

Application of the Hyperspectral Camera to Vision Research for Near-Range Applications.

A hyperspectral camera produces images in which a complete visible-range spectrum is captured at each pixel (image position).  Instead of the usual red-green-blue bands captured by a normal digital camera, as many as 256 bands from the visible range are captured.  This extra information makes it possible to derive much more information from the image about the material properties of each object in the scene.  As an example, it is possible to distinguish chlorophyll-A from chlorophyll-B in plants solely from the spectrum of reflected light.  This project seeks to develop applications for analyzing images taken with a hyperspectral camera.  Important problems include segmentation of the image into regions of similar spectral characteristics, efficient storage methods for hyperspectral images, material determination and detection of objects based on their spectra, geometric correction of images suffering from motion distortion, and alignment of multispectral images with images of other modalities.
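One widely used spectral-similarity measure that could underpin the segmentation and material-detection tasks above is the spectral angle; a minimal sketch with short toy spectra (real data would have up to 256 bands):

```python
import math

# Sketch of the spectral angle between two spectra: a small angle suggests
# the same material even under different illumination brightness.
# The four-band spectra below are toy values, not real measurements.

def spectral_angle(a, b):
    """Angle (radians) between two spectra viewed as vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return math.acos(max(-1.0, min(1.0, dot / (na * nb))))

leaf = [0.1, 0.4, 0.8, 0.3]
leaf_bright = [0.2, 0.8, 1.6, 0.6]   # same shape as leaf, scaled by 2
soil = [0.5, 0.5, 0.4, 0.6]

print(spectral_angle(leaf, leaf_bright))   # ≈ 0: same material
print(spectral_angle(leaf, soil))          # larger: different material
```

Because the measure depends only on the direction of the spectral vector, it is insensitive to overall brightness, which is why the doubled "bright leaf" spectrum still matches.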

Supervisor: Richard Hartley

Application of Mathematical Theories to Geometric Computer Vision Problems.

The mathematical analysis of image sequences involves techniques of projective geometry.  The usual camera model is the pinhole camera, which may be modeled simply as a linear projection from projective 3-space to projective 2-space.  In the last decade, increasingly sophisticated use of projective geometry has led to methods of scene reconstruction from a moving camera observing a stationary scene.  Algorithms for automatic calibration of the camera and reconstruction of the scene have been developed.  Analysis of failure modes and critical configurations is only partially complete.  Avenues for further work include the investigation of moving scenes observed with moving cameras (dynamic scenes), and exploitation of specific configurations in the scene, such as scene planes, curved edges and curved objects.
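The linear projection mentioned above can be written x = PX, with P a 3×4 camera matrix acting on homogeneous coordinates; a minimal sketch with an illustrative camera matrix and point:

```python
# Sketch of the pinhole model as a linear map from projective 3-space to
# projective 2-space: x = P X, with P a 3x4 matrix.  The focal length and
# the 3D point below are illustrative values.

def project(P, X):
    """Apply a 3x4 camera matrix to a homogeneous point (X, Y, Z, 1)."""
    x = [sum(P[r][c] * X[c] for c in range(4)) for r in range(3)]
    return (x[0] / x[2], x[1] / x[2])        # dehomogenize

f = 100.0                                    # focal length in pixels
P = [[f, 0, 0, 0],                           # a camera at the origin looking
     [0, f, 0, 0],                           # down the Z axis: K [I | 0]
     [0, 0, 1, 0]]

print(project(P, [1.0, 2.0, 10.0, 1.0]))     # → (10.0, 20.0)
```

Note the perspective effect: the image coordinates shrink as the depth Z grows, which is exactly the division by the third homogeneous coordinate.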

To undertake research in this area, a certain mathematical sophistication is necessary.  A firm grounding in linear algebra and the ability to become familiar with projective geometry are essential.  Familiarity with numerical methods would also be a plus.  This project is suitable for a student with a taste for mathematics.

Supervisor: Richard Hartley

Interactive model-based scene understanding and modeling from multiple images or video.

Many of the geometric problems of determining the structure of a set of points from multiple images have been solved.  However, the problem of correctly modeling more complex scenes to produce a complete 3-dimensional model remains.  To accomplish this task, we need to recognize specific geometric primitives and generic objects in the scene.  Thus, recognition of regions such as the ground plane or the sky allows them to be correctly separated from the rest of the scene and modeled accordingly.  Correct detection and modeling of curved surfaces will help to create shape-specific models.  Finally, recognition of objects such as trees, buildings and other common outdoor objects will allow more faithful graphical models to be produced.  The outcome would be a capability to generate better graphical models of natural scenes, leading to accurate generation of novel views.

Supervisor: Richard Hartley

Probabilistic visual learning based 3D object recognition, localization and tracking

Learning is an essential ability of any intelligent system, and machine learning methods should play a greater role in computer vision research.  In this project, probabilistic visual learning methods will be investigated for the purpose of constructing robust visual appearance models from real video sequences.  The vision tasks to be studied are 3D object segmentation, localization, and tracking in real time.  We broadly follow the idea of Pentland's "Probabilistic Visual Learning for Object Representation" (T-PAMI, 1996), but augment it by adopting Active Shape Model techniques.  Not only the 3D geometry, but also the position, pose and kinematic properties of moving objects are described in a probabilistic framework.  Many helpful constraints and items of prior knowledge can therefore be incorporated easily during the learning process, greatly increasing the robustness of the resulting vision system.
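As a deliberately simplified, hypothetical sketch of the appearance-model idea (far simpler than the eigenspace densities of the cited work), one can fit an axis-aligned Gaussian to appearance vectors and score new observations by log-likelihood; the training values are invented:

```python
import math

# Toy sketch of a probabilistic appearance model: fit a diagonal Gaussian
# to training feature vectors and score observations by log-likelihood.
# The feature values are invented for illustration.

def fit_gaussian(samples):
    """Per-dimension mean and variance (variance floored for stability)."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(d)]
    var = [max(sum((s[i] - mean[i]) ** 2 for s in samples) / n, 1e-6)
           for i in range(d)]
    return mean, var

def log_likelihood(x, mean, var):
    """Log density of x under the fitted axis-aligned Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

train = [[1.0, 2.0], [1.2, 2.1], [0.9, 1.9]]
mean, var = fit_gaussian(train)
print(log_likelihood([1.0, 2.0], mean, var) > log_likelihood([5.0, 5.0], mean, var))  # True
```

In a tracker, such a likelihood would be evaluated over candidate object positions and poses each frame, with the learned constraints biasing the search.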

Supervisor: Hongdong Li

Constraint-based scene understanding and modelling from video

Unlike Marr's philosophy, which holds that recovery of 3D information is the first step of visual perception, prior to all other processing, this research suggests that a partial segmentation and/or recognition stage should come before reconstruction.  In other words, we believe that partial segmentation can drastically reduce the complexity of constructing a useful machine vision system, because such partial knowledge of the scene being observed narrows the domain knowledge and search space, thus improving system efficiency.

Another motivation for this research is that, for some computer vision applications, the purpose is to build a machine that augments our eyes rather than replaces them.  In such cases a human-computer interactive approach seems more practical than a fully automatic one.  In the envisaged system, a human's knowledge of the geometric constraints contained in scenes will play an important role in 3D reconstruction.

Examples of such constraints include geometric relations that are often present in man-made scenes, such as coplanarity, alignment and orthogonality.  In essence, we propose to define each object by sets of linear constraints, and to exploit those constraints to restrict the spatial locations of points, thereby substantially improving the precision of 3D reconstruction and scene modelling.
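As a tiny sketch of how a coplanarity constraint can tighten a reconstruction (with illustrative plane and point values), a noisily reconstructed point can be projected orthogonally onto its known plane:

```python
# Sketch: enforce coplanarity by projecting a noisily reconstructed point
# onto a plane n . p + d = 0.  The plane and point values are illustrative.

def project_to_plane(p, n, d):
    """Move p along the normal n until it satisfies n . p + d = 0."""
    nn = sum(c * c for c in n)
    t = (sum(nc * pc for nc, pc in zip(n, p)) + d) / nn
    return [pc - t * nc for pc, nc in zip(p, n)]

plane_n, plane_d = [0.0, 0.0, 1.0], -2.0   # the plane z = 2
noisy = [1.0, 3.0, 2.3]                    # reconstruction drifted off-plane

print(project_to_plane(noisy, plane_n, plane_d))   # z snaps back to 2.0
```

In practice such constraints would enter a joint least-squares adjustment over all points rather than a one-shot projection, but the correcting effect is the same.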

Supervisor: Hongdong Li

Pattern recognition and augmented vision in a pen-based (tablet PC) computing environment.

The Tablet PC is a new kind of computer, representing an evolutionary step in the development of the laptop computer used today in mobile computing.  It delivers new and easy ways for humans and computers to interact, vastly extending the ways in which people will work with and enjoy their PCs.  On a Tablet PC, users can write or draw directly on the screen and save electronic notes in their own handwriting, or they can draw edge maps overlaying screen images and, through a highly accurate recognition engine, use or store the images in more compact forms.  The pen indeed provides a flexible and friendly means of human-computer interaction.

This research aims to extend pen-based pattern-recognition engines to non-text applications, such as hand-drawn forms, sketches, graphs, mathematical formulae, and interactive 3D reconstruction from stereo images.  Structural recognition methods within a probabilistic learning framework are the main approach to be investigated.

Supervisor: Hongdong Li