This project implements conditional Markov random field (CRF) models for multi-class image segmentation (also known as pixel labeling). The documentation below describes the pipeline for learning and evaluating a multi-class image segmentation model. The instructions are general, but the examples use the 21-class MSRC image segmentation dataset.
Multi-class image segmentation requires labeled training data. Each training instance consists of an image and an integer matrix the same size as the image. The matrix entries indicate the class label for each pixel. The void class marks regions to be ignored during training and test. These matrices can be stored as space-delimited text files (in row-major order) or 16-bit PNG files. The convertPixelLabels application can be used to generate the right file format from colour-annotated images (see below).
Images and label files should have the same basename (e.g., img001.jpg and img001.txt) and may be stored in the same or different directories. Training and evaluation lists are described in terms of these basenames, with one basename per line.
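For the 21-class MSRC dataset such a list might look like the following (these basenames are only a sketch of the MSRC naming scheme, not copied from the dataset):

    1_1_s
    1_2_s
    1_3_s
    ...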
The shell script prepareMSRCDemo.sh or the Python script prepareMSRCDemo.py in the project directory will download and prepare the data for experimenting with the 21-class MSRC dataset. The shell script assumes that you have the utilities wget, unzip, and convert installed on your system.

A standard method of annotating images for multi-class pixel labeling is to paint each pixel with a colour corresponding to a particular class label. For example, the 21-class MSRC dataset uses red to indicate building and blue to indicate cow. Using the XML configuration file (see Configuration), the convertPixelLabels application can convert these colour images into a format recognized by the other applications in this project, specifically, space-delimited text files. The application expects the colour images to be in the labels directory and will write to the same directory.
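For example, a minimal sketch of the invocation, assuming the compiled binaries live in $BIN_DIR, that $CONFIG names the XML configuration file described under Configuration, and that the application accepts a standard -config option (the exact flags are not shown on this page):

    # convert colour-annotated label images into space-delimited text files
    $BIN_DIR/convertPixelLabels -config $CONFIG $ALL_LIST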
where $ALL_LIST should be replaced with the filename of a file containing the basenames of both training and evaluation images as described above.
The multi-class image segmentation pipeline requires a number of configuration settings to be defined so that it can find the training images, labels, etc. It also needs to know the number of class labels and how to visualize them. The following shows an example XML configuration for the MSRC dataset.
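A minimal sketch of such a configuration, using the option names referenced in the text below (baseDir, modelsDir, outputDir, cacheDir, useCache, imgExt); the enclosing element names and the region colour entries are assumptions and should be checked against the sample configuration shipped with the project:

    <drwn>
      <drwnMultiSegConfig>
        <!-- directories (interpreted relative to baseDir) -->
        <option name="baseDir" value="./" />
        <option name="imgDir" value="data/images/" />
        <option name="lblDir" value="data/labels/" />
        <option name="cacheDir" value="cached/" />
        <option name="modelsDir" value="models/" />
        <option name="outputDir" value="output/" />

        <!-- image and label file extensions -->
        <option name="imgExt" value=".jpg" />
        <option name="lblExt" value=".txt" />

        <!-- cache pixel features to disk -->
        <option name="useCache" value="true" />

        <!-- class id, name and visualization colour (illustrative MSRC entries) -->
        <regionDefinitions>
          <region id="-1" name="void" color="0 0 0" />
          <region id="0" name="building" color="128 0 0" />
          <region id="3" name="cow" color="0 0 128" />
          <!-- ... remaining MSRC classes ... -->
        </regionDefinitions>
      </drwnMultiSegConfig>
    </drwn>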
The directories specified in the configuration file are relative to baseDir. Output directories modelsDir and outputDir must exist before running the learning and inference code. If feature caching is being used (useCache set to true) then cacheDir must also exist. Different image formats are supported. For example, if you have kept your MSRC images in bmp format then you can change the imgExt option from ".jpg" to ".bmp".
The unary potentials encode a single pixel's preference for each label and are the heart of the model. The unary potentials are learned in two stages. The first stage learns a one-versus-all boosted decision tree classifier for each of the labels. The key features used for this stage are derived from a bank of 17 filters which are run over the image. In addition, we include the RGB colour of the pixel, dense HOG features, LBP-like features, and averages over image rows and columns. These features can all be controlled within the drwnSegImagePixelFeatures section of the configuration file. Custom features can even be included via auxiliary feature settings (see drwnSegImageStdPixelFeatures for details).
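As an illustrative sketch of that section of the configuration file, the option names below are assumptions chosen to match the features listed above and are not confirmed by this page:

    <drwnSegImagePixelFeatures>
      <!-- responses of the 17-filter bank -->
      <option name="filterBandwidth" value="1" />
      <!-- additional per-pixel features -->
      <option name="includeRGB" value="true" />
      <option name="includeHOG" value="true" />
      <option name="includeLBP" value="true" />
      <option name="includeRowCol" value="true" />
      <!-- directory of auxiliary (custom) feature files, if any -->
      <option name="auxFeatureDir" value="" />
    </drwnSegImagePixelFeatures>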
The second stage calibrates the output of the boosted decision trees via a multi-class logistic regression classifier. These steps are performed by the following commands. One of the most important command-line arguments is -subSample, which determines how many pixels are used, and hence the amount of memory required, during training. Specifically, "-subSample n^2" randomly samples one pixel out of every n-by-n pixel grid. With the settings below, the unary potentials can be trained using under 4GB of memory.
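A sketch of the two training stages, assuming a learnPixelSegModel application with a -component switch for selecting the stage (only -subSample is mentioned on this page; the application name, component names, and sampling values are assumptions):

    # stage 1: one-versus-all boosted decision tree classifiers
    $BIN_DIR/learnPixelSegModel -config $CONFIG -component BOOSTED \
        -subSample 250 $TRAIN_LIST

    # stage 2: calibrate the boosted outputs with multi-class logistic regression
    $BIN_DIR/learnPixelSegModel -config $CONFIG -component UNARY \
        -subSample 25 $TRAIN_LIST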
We can evaluate the learned model on some test images using the following commands.
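As a sketch, assuming an inferPixelLabels application produces the labelings and scorePixelLabels computes the accuracy (only the latter is named on this page), with $VAL_LIST holding the evaluation basenames:

    # label the evaluation images using the learned unary potentials
    $BIN_DIR/inferPixelLabels -config $CONFIG $VAL_LIST

    # report average per-pixel accuracy
    $BIN_DIR/scorePixelLabels -config $CONFIG $VAL_LIST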
Images visualizing the results are written to the output directory specified in the configuration file.
The pairwise term encodes a contrast-dependent smoothness prior on the image labeling. The weight of the term is learned by direct search, i.e., a number of parameter values are tried and the one that gives the best results on a subset of training images is kept. The following command line will perform this step.
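A sketch of this step, under the same assumptions as the training commands above (the CONTRAST component name is not confirmed by this page):

    # cross-validate the contrast-dependent smoothness weight
    $BIN_DIR/learnPixelSegModel -config $CONFIG -component CONTRAST $VAL_LIST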
Note that $VAL_LIST and $TRAIN_LIST can be the same list; however, the code will only use up to 100 images from the list to learn the contrast weight.
In addition to the contrast-dependent smoothness pairwise terms, which are defined on a local neighbourhood around each pixel, we can add long-range pairwise terms to encourage consistent labeling across the image. The long-range edges are determined by finding similar pairs of patches within the image. A similar approach is taken in Gould, CVPR 2012. Like the contrast-dependent pairwise terms, the strength of the long-range edge constraints is determined by cross-validation on a subset of the training images.
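A sketch of this step, again assuming learnPixelSegModel and an illustrative LONGRANGE component name:

    # cross-validate the strength of the long-range pairwise terms
    $BIN_DIR/learnPixelSegModel -config $CONFIG -component LONGRANGE $VAL_LIST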
The final step in the pipeline evaluates the model on some test images and reports average performance.
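A sketch of the final evaluation; only scorePixelLabels and its -confusion option are named on this page, and $TEST_LIST (the list of test image basenames) and the rest follow the assumptions above:

    # label the test images with the full model
    $BIN_DIR/inferPixelLabels -config $CONFIG $TEST_LIST

    # report average accuracy and the full confusion matrix
    $BIN_DIR/scorePixelLabels -config $CONFIG -confusion $TEST_LIST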
Unlike the unary-only evaluation, in this example we also generate a full confusion matrix by giving the -confusion option to the scorePixelLabels application.