Code for "Linking losses for density ratio and class-probability estimation", ICML 2016
The aim of this MATLAB code is to replicate the tables of results and figures from the paper "Linking losses for density ratio and class-probability estimation", which appears in ICML 2016. Unzipping the code should reveal four subfolders:
- weight_function/: weight function experiments (Sec 8.1).
- covariate_shift/: covariate shift experiments (Sec 8.2).
- rtb/: ranking the best experiments (Sec 8.3).
- helper/: miscellaneous helper files (see below).
Weight function analysis
For the weight function analysis, in the weight_function folder, simply run:
>> loss_regret_script;
You should see an output such as:
>> loss_regret_script;
reg = 0.3614 [lambda = 10^-8, gamma = 0; 1.8 secs]
max regret = 0.3614 [gamma = 0, lambda = 10^-8]
reg = 0.3821 [lambda = 10^-8, gamma = 0; 1.5 secs]
max regret = 0.3821 [gamma = 0, lambda = 10^-8]
reg = 0.5460 [lambda = 10^-8, gamma = 0; 1.1 secs]
max regret = 0.5460 [gamma = 0, lambda = 10^-8]
A plot mimicking Figure 1 of the paper should also be displayed.
Covariate shift adaptation
For the covariate shift experiments on the poly dataset, in the covariate_shift folder, simply run:
>> poly_script;
The script goes through each of the losses considered in Sec 8.2 and, for each, trains a kernel model to estimate the density ratio. The normalised mean squared error (NMSE) on the test sample is reported. You should see output resembling Table 2(a), such as:
Uniform & 1.2723 $\pm$ 0.0302 \\
KLIEP & 0.6916 $\pm$ 0.0136 \\
LSIF & 0.7742 $\pm$ 0.0217 \\
uLSIF & 0.7038 $\pm$ 0.0102 \\
...
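For readers unfamiliar with the kernel density ratio models being compared, the following is a minimal Python sketch of uLSIF (unconstrained least-squares importance fitting, Kanamori et al., 2009), one of the methods in the output above. It models the ratio r(x) = p_test(x) / p_train(x) as a combination of Gaussian kernels centred on the test points. It solves a ridge-regularised linear system for the coefficients and then clips negative coefficients to zero. The bandwidth `sigma` and regulariser `lam` are fixed by hand here. The MATLAB code in covariate_shift/ may instead select such parameters by cross-validation, and may differ in its kernel centres and solver. All function names are illustrative and are not those of the repository.

```python
import math


def gaussian_kernel(x, c, sigma):
    """Gaussian kernel exp(-||x - c||^2 / (2 sigma^2)); x and c are lists of floats."""
    sq = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return math.exp(-sq / (2.0 * sigma ** 2))


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ulsif(x_nu, x_de, sigma=1.0, lam=1e-3):
    """Fit r(x) ~ p_nu(x) / p_de(x) by uLSIF and return the estimate as a function.

    x_nu: samples from the numerator (test) distribution, used as kernel centres.
    x_de: samples from the denominator (training) distribution.
    """
    centres = x_nu
    m = len(centres)
    phi_de = [[gaussian_kernel(x, c, sigma) for c in centres] for x in x_de]
    phi_nu = [[gaussian_kernel(x, c, sigma) for c in centres] for x in x_nu]
    # H = mean of phi(x) phi(x)^T over training samples, plus lam * I
    H = [[sum(p[i] * p[j] for p in phi_de) / len(x_de) + (lam if i == j else 0.0)
          for j in range(m)] for i in range(m)]
    # h = mean of phi(x) over test samples
    h = [sum(p[i] for p in phi_nu) / len(x_nu) for i in range(m)]
    # closed-form ridge solution, then clip negative coefficients to zero
    alpha = [max(0.0, a) for a in solve(H, h)]

    def ratio(x):
        return sum(a * gaussian_kernel(x, c, sigma) for a, c in zip(alpha, centres))

    return ratio
```

In covariate shift adaptation, the fitted ratio evaluated at the training points is then typically used to weight the training losses of the downstream model.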
For the experiments on the amazon dataset, in the covariate_shift folder, simply run:
>> amazon_script;
The script goes through each of the losses considered in Sec 8.2 and, for each, trains a kernel model to estimate the density ratio. The pairwise disagreement on the test sample is reported. After the feature mappings are generated (TF-IDF followed by SVD projection), you should see output resembling Table 2(b), such as:
generating data trial #doing svd...done
...
Uniform & 0.1582 $\pm$ 0.0018
KLIEP & 0.1500 $\pm$ 0.0018
LSIF & 0.1500 $\pm$ 0.0019
Note that the file amazon.mat contains the processed Amazon data as provided here.
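The pairwise disagreement reported for the amazon experiments can be illustrated with the sketch below. It assumes a common definition: the fraction of test pairs with different labels whose predicted scores are ordered the wrong way, with tied scores counting as half an error. The helper `latex_row` formats per-trial values as `name & mean $\pm$ spread \\`, in the style of the Table 2(a) sample output, assuming the spread is the standard error over trials. Neither function is taken from the repository, and the MATLAB code may define both the metric and the reported spread differently.

```python
import math
import statistics
from itertools import combinations


def pairwise_disagreement(scores, labels):
    """Fraction of pairs with distinct labels that the scores order incorrectly.

    Assumptions: pairs with equal labels are ignored, and tied scores
    count as half a disagreement.
    """
    bad, total = 0.0, 0
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            continue
        total += 1
        # +1 if item i should be ranked above item j, else -1
        sign = 1 if labels[i] > labels[j] else -1
        diff = sign * (scores[i] - scores[j])
        if diff < 0:
            bad += 1.0
        elif diff == 0:
            bad += 0.5
    return bad / total if total else 0.0


def latex_row(name, values):
    """Format one LaTeX table row: name & mean $\\pm$ standard error (assumed)."""
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values)) if len(values) > 1 else 0.0
    return f"{name} & {mean:.4f} $\\pm$ {se:.4f} \\\\"
```

Calling `latex_row` once per method, on the list of per-trial disagreements, yields rows like those shown above.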
Ranking the best
For the ranking the best experiments, in the rtb folder, simply run:
>> rtb_script;
The display window will then fill with the results of cross-validation and training for each method on each dataset. The script takes each dataset in turn and, within it, each method in turn. For each train-test split, it outputs the performance of the method under all the performance criteria listed in Appendix H. Sample output:
= Dataset german [n = 1000, d = 24] =
unknown proper_logistic
unknown proper_p-classification
unknown proper_lsif
fold 1 2 3 4 5
Proper_Logistic 0.7845 0.0346 0.1827 0.5188 0.0000 0.6000 (0.0 secs; lambda 1.953125e-03, pPush 4, lPush 4)
fold 1 2 3 4 5
Proper_Logistic 0.7936 0.0342 0.1815 0.5876 0.0100 0.6000 (0.0 secs; lambda 2.441406e-04, pPush 4, lPush 4)
fold 1 2 3 4 5
Proper_Logistic 0.8011 0.0436 0.1911 0.6632 0.0490 0.8000 (0.0 secs; lambda 1.220703e-04, pPush 4, lPush 4)
...
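The helper library sampleError is described below as computing the AUC of a scorer. As a reference for what such a computation involves, here is a minimal Python sketch of the AUC as the probability that a randomly chosen positive outscores a randomly chosen negative, with ties counting one half. It is not a port of sampleError. Whether AUC corresponds to any particular column in the sample output is not stated here; the full list of criteria is in Appendix H.

```python
def auc(scores, labels):
    """AUC of a scorer: P(score of random positive > score of random negative).

    Any label > 0 is treated as positive, otherwise negative.
    Tied scores contribute 1/2.
    """
    pos = [s for s, y in zip(scores, labels) if y > 0]
    neg = [s for s, y in zip(scores, labels) if y <= 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

This double loop is quadratic in the sample size. For large test sets, a sort-based computation of the same quantity is the usual choice.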
Once the script completes, it outputs the LaTeX source for Table 5 in the appendix. Be warned that this script is likely to take a long time to run.
While this script runs, it saves, for each trial, the results of cross-validation as well as the final predictions. These can subsequently be used either to skip cross-validation and just perform learning, or to skip both and just produce formatted tables of results. To just print out the results of a previous run, set
PRINT_ALL = 1;
in Line 42 of rtb_script.m.
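The caching behaviour described above can be sketched as follows. This is a hypothetical Python illustration, not the logic of rtb_script.m. The file layout (`<key>_cv.json`, `<key>_pred.json`), the function `run_trial` and its flags `reuse_cv` and `print_only` are all invented for the example, with `print_only=True` playing a role analogous to `PRINT_ALL = 1`.

```python
import json
import os


def run_trial(cache_dir, key, cross_validate, train_and_predict,
              reuse_cv=True, print_only=False):
    """Hypothetical per-trial caching of CV results and final predictions.

    cross_validate() -> dict of chosen hyperparameters (JSON-serialisable).
    train_and_predict(params) -> list of predictions (JSON-serialisable).
    """
    cv_file = os.path.join(cache_dir, key + "_cv.json")
    pred_file = os.path.join(cache_dir, key + "_pred.json")
    if print_only:
        # skip both cross-validation and learning: load saved predictions
        with open(pred_file) as f:
            return json.load(f)
    if reuse_cv and os.path.exists(cv_file):
        # skip cross-validation: reuse the previously chosen hyperparameters
        with open(cv_file) as f:
            params = json.load(f)
    else:
        params = cross_validate()
        with open(cv_file, "w") as f:
            json.dump(params, f)
    preds = train_and_predict(params)
    with open(pred_file, "w") as f:
        json.dump(preds, f)
    return preds
```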
Third-party libraries
The code relies on certain third-party MATLAB code for various operations. For convenience, this code is included in the ZIP file as part of the helper folder. The libraries are:
- cprintf: print colour text
- minFunc: LBFGS optimisation
- liblinear: LibLinear (note: you may need to compile binaries for your architecture)
- liblinear-weights: LibLinear with weights (note: you may need to compile binaries for your architecture)
- sampleError: computes the AUC of a scorer
- tfidf2: computes a TF-IDF matrix
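The Amazon features are built by TF-IDF weighting followed by an SVD projection, with tfidf2 handling the first step. The following minimal Python sketch shows one common TF-IDF variant: raw term counts multiplied by log(N / document frequency). tfidf2 may use a different weighting or normalisation, and the subsequent SVD projection is not shown.

```python
import math
from collections import Counter


def tfidf(docs):
    """TF-IDF weights for tokenised documents (a list of lists of terms).

    Assumed variant: tf = raw count in the document, idf = log(N / df),
    where df is the number of documents containing the term.
    Returns one {term: weight} dict per document.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    rows = []
    for doc in docs:
        tf = Counter(doc)
        rows.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return rows
```

Under this variant, a term that occurs in every document gets weight zero, so it contributes nothing to the subsequent projection.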