Code for Learning from Corrupted Binary Labels via Class-Probability Estimation, ICML 2015
The aim of this MATLAB code is to replicate the tables of results and figures from the paper Learning from Corrupted Binary Labels via Class-Probability Estimation, appearing in ICML 2015. The code comprises a main driver script, ccn_uci_script.m, and several additional files organised into the following subfolders:
- data_processing/: scripts to generate and load data.
- datasets/: MAT files for UCI datasets used in the paper.
- evaluation/: evaluation of a candidate predictor's AUC, BER, et cetera.
- helper/: miscellaneous helper scripts, for flipping labels, converting labels to { 0, 1 }, et cetera.
- learning/: scripts to perform cross-validation, estimate noise rates, et cetera.
- libraries/: some third party libraries (see below).
- printing/: printing LaTeX tables of results.
- setup/: setting up various enums.
- visualisation/: producing violin plots of results.
Performing cross-validation and learning
To perform learning on each of the datasets, simply run
>> ccn_uci_script;
The MATLAB command window will then fill with the results of cross-validating and training each of the methods on each of the datasets, using previously saved optimal parameters where these exist. The output also gives a rough estimate (ETA) of when the set of trials for a given noise setting will finish. Sample output:
*** overwriting saved results ***
housing
1-AUC (%) BER (%) ERR_{max} (%) ERR_{oracle} (%)
[1] optimal lambda = 10^-8, sigma = 10^Inf, [ETA = 18-May-2015 12:07:54]
...
[100] optimal lambda = 10^-8, sigma = 10^Inf, [ETA = 18-May-2015 11:56:44]
& ($\rhoPlus, \rhoMinus$) = ($0.0, 0.0$) & 14.48 $\pm$ 0.00 & 10.94 $\pm$ 0.00 & 19.80 $\pm$ 0.00 & 4.95 $\pm$ 0.00 \\
[1] optimal lambda = 10^-2, sigma = 10^Inf, [ETA = 18-May-2015 12:08:06]
...
[100] optimal lambda = 10^-2, sigma = 10^Inf, [ETA = 18-May-2015 11:59:27]
& ($\rhoPlus, \rhoMinus$) = ($0.0, 0.1$) & 26.23 $\pm$ 0.71 & 29.99 $\pm$ 0.79 & 19.25 $\pm$ 0.96 & 5.11 $\pm$ 0.06 \\
...
In the above, the final results output for each noise setting correspond to those in Table 6 of the Supplementary Material.
Be warned that this script is likely to take a long time. You may wish to reduce the number of noise trials by changing settings.NOISE_TRIALS on Line 79 from 100 to some smaller number.
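For instance, the change might look something like the following (the exact surrounding code may differ; the value 10 is just an illustration):

settings.NOISE_TRIALS = 10; % reduced from the default of 100 for a quicker run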
As the script runs, it saves, for each trial, the results of cross-validation as well as the final predictions. These can subsequently be used either to skip cross-validation and just perform learning, or to skip both and just produce formatted tables of results.
Performing learning only
Once cross-validation has been completed for a particular dataset and noise rate, you may wish to reuse the previously stored cross-validation parameters on subsequent runs. To do so, simply re-run the script as
>> ccn_uci_script('run_with_saved_params');
Generating tables of results
Once you have saved the predictions from each method, to replicate the mini-table of results from the body of the paper, simply run:
>> ccn_uci_script('print_mini_table');
To generate the full tables of results from the Supplementary Material, simply run:
>> ccn_uci_script('print_full_table');
Producing violin plots
Once learning has completed and the results have been saved, to generate the violin plots, simply run:
>> ccn_uci_script('plot_only');
Detailed description
The basic operation of the script is as follows. ccn_uci_main_driver() loops over all combinations of datasets and noise rates. For each noise rate, ccn_uci_main_backseat() runs a number of trials wherein the training labels are randomly flipped at the appropriate rate. In ccn_uci_learner_body(), we run an appropriate learner (by default a neural network) on the resulting data, and use the range of the output probabilities to estimate the noise rates. These are then used for prediction on the test set, with the results evaluated and saved in ccn_uci_perf_saver().
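As a rough illustration of this pipeline (not the actual driver code), the following self-contained sketch flips labels on synthetic data, fits a class-probability estimator, and estimates the noise rates from the range of the corrupted probabilities. The synthetic data, the variable names, and the use of glmfit/glmval (Statistics Toolbox) as a stand-in for the default neural network learner are all illustrative assumptions; here rho_plus is the flip probability for positives and rho_minus that for negatives.

rng(1);
n = 5000;
X = randn(n, 2);
y = double(X(:,1) + 0.25*randn(n,1) > 0);          % clean labels in { 0, 1 }

rho_plus = 0.1; rho_minus = 0.2;                    % ground-truth flip rates
is_flipped = (y == 1 & rand(n,1) < rho_plus) | ...  % flip positives w.p. rho_plus,
             (y == 0 & rand(n,1) < rho_minus);      % negatives w.p. rho_minus
y_corr = y; y_corr(is_flipped) = 1 - y_corr(is_flipped);

b        = glmfit(X, y_corr, 'binomial');           % class-probability estimation on corrupted labels
eta_corr = glmval(b, X, 'logit');                   % corrupted class-probability estimates

% Under class-conditional noise, the corrupted class-probability satisfies
% eta_corr(x) = rho_minus + (1 - rho_plus - rho_minus) * eta(x), so when the clean
% eta spans (0, 1) we have min(eta_corr) ~ rho_minus and max(eta_corr) ~ 1 - rho_plus.
% With a misspecified or poorly calibrated estimator these estimates may be rough.
rho_minus_hat = min(eta_corr);
rho_plus_hat  = 1 - max(eta_corr);
fprintf('estimated rho_+ = %.2f, rho_- = %.2f\n', rho_plus_hat, rho_minus_hat);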
Third-party libraries
The code relies on certain third-party MATLAB code for various operations. For convenience, this code is included in the ZIP file. The libraries are:
- cprintf: print colour text
- export_fig: export figures to PDF
- minFunc: LBFGS optimisation
- violin: violin plots
- maximizeFig: maximise a new figure
- sampleError: compute the AUC of a scorer