This project demonstrates the MAP inference algorithms available in the Darwin framework by testing them on the Rosetta Protein Design problems (Yanover et al., JMLR 2006). A number of scripts in the projects/rosetta directory guide you through the steps necessary to run inference on these problems. An outline of the steps follows.

Downloading the Data

The dataset can be downloaded and files extracted using the getRosettaData.py Python script. Alternatively, you can fetch it manually from

http://jmlr.csail.mit.edu/papers/volume7/yanover06a/Rosetta_Design_Dataset.tgz

Warning: The dataset is 2.5GB.

Converting Formats

The data is in Matlab format. The script rosetta2drwn.m converts it to an XML format that Darwin can recognize. The Python script convertRosettaData.py will automatically convert all the design files and then compress them to save space. You must have Matlab or Octave installed to perform this step.

You may need to install Chen Yanover's sparse cell class to read the data. It can be downloaded from:

http://cyanover.fhcrc.org/sparse_cell_2.tgz

Running Inference

The runRosettaExperiments.py Python script runs different inference algorithms. Calling the script with no arguments will run inference on each of the design files (and creates a separate log for each one). Alternatively, you can call the script with the basename of the design file that you want to run inference on, e.g.,

./runRosettaExperiments.py 1bx7

or

python runRosettaExperiments.py 1bx7

Warning: Some inference routines take a very long time to complete.

See Also: MAP Inference (Energy Minimization); Applications and Project Descriptions