This archive accompanies the following paper:

C. Walder, "Modelling Symbolic Music: Beyond the Piano Roll" (available on arXiv).

It contains preprocessed musical data derived from MIDI files. Four datasets are included:

pmd = piano-midi.de
not = Nottingham
mus = Musedata
jsb = JSB Chorales

For each, there is a Python pickle file and three csv files.

csv files:

For each of the train, test and valid sets, there is a csv file with one row per musical note. The columns are:

file_index: an integer indicating the MIDI file (i.e. piece of music)
t0: the start time
t1: the end time
midi: the MIDI note number
part: an integer indicating the part or voice (based on the MIDI track)

t0 and t1 are in units specified by resolution.txt. Divide t0 and t1 by the number in that file to get the time in quarter notes.

Python pickle:

The same data in Python format. Each pickle contains a dict with keys 'resolution' (the resolution as above) and 'data'. The 'data' value is another dict with keys 'train', 'test' and 'valid'. For each of these three keys, the value is a list of lists. The outer list has one entry per piece, and each inner list has one entry per note in that piece. Each element of an inner list is a tuple (t0, t1, midi, part) as above.

e.g.

    # Python 3 version of the original Python 2 (cPickle) example.
    import pickle

    # encoding='latin1' is the usual setting for reading Python 2 pickles.
    with open('jsb.pkl', 'rb') as f:
        x = pickle.load(f, encoding='latin1')

    print(x['resolution'])
    print(x['data']['train'][0])

Please see the following copyright information and original data sources:

http://www-etud.iro.umontreal.ca/~boulanni/icml2012
http://piano-midi.de/copy.htm
http://www.musedata.org/legal/lcr.html
http://abc.sourceforge.net/NMD/
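
The csv layout described above can be read with the standard library alone. The following is a minimal sketch, not part of the original archive. It assumes that each csv file starts with a header row naming the columns exactly as listed (file_index, t0, t1, midi, part) and that resolution.txt holds a single number. The function names are hypothetical, and the paths are whatever the files are called in your copy of the archive.

```python
import csv


def read_resolution(path):
    """Read the single number assumed to be stored in resolution.txt."""
    with open(path) as f:
        return float(f.read().strip())


def load_notes_in_quarter_notes(csv_path, resolution):
    """Group notes by file_index, with t0 and t1 converted to quarter notes.

    Assumes a header row with columns file_index, t0, t1, midi, part.
    Returns a dict mapping file_index to a list of (t0, t1, midi, part).
    """
    pieces = {}
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            note = (
                float(row["t0"]) / resolution,  # start time in quarter notes
                float(row["t1"]) / resolution,  # end time in quarter notes
                int(float(row["midi"])),        # MIDI note number
                int(float(row["part"])),        # part or voice index
            )
            pieces.setdefault(int(float(row["file_index"])), []).append(note)
    return pieces
```

Each value in the returned dict has the same (t0, t1, midi, part) tuple layout as the inner lists in the pickle, except that the times are already divided by the resolution. If your csv files have no header row, pass the column names to csv.DictReader via its fieldnames argument instead.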