The derived dataset using the default settings is available here.
-
Download the Lakh MIDI Dataset (LMD) with the following script.
./scripts/download_lmd.sh
(Or, download it manually here.)
-
Set the variables LMD_ROOT and LPD_ROOT in run.sh and the variables in config.py to proper values.
-
Derive all subsets and versions of LPD, matched_ids.txt and cleansed_ids.txt with the following script.
./scripts/derive_lpd.sh
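Before running the derivation, it may be worth confirming that the two configured roots are usable. The following is a minimal sketch and not part of this repository: `check_roots` is a hypothetical helper, and the rule it applies (LMD_ROOT must already exist, LPD_ROOT may still be created) is an assumption about how the scripts use these paths.

```python
from pathlib import Path


def check_roots(lmd_root, lpd_root):
    """Return a list of problems found with the two configured roots.

    Assumption (not taken from the repository): LMD_ROOT must be an
    existing directory, while LPD_ROOT is an output location that may
    not exist yet.
    """
    problems = []
    lmd = Path(lmd_root)
    lpd = Path(lpd_root)
    if not lmd.is_dir():
        problems.append(f"LMD_ROOT is not a directory: {lmd}")
    if lpd.exists() and not lpd.is_dir():
        problems.append(f"LPD_ROOT exists but is not a directory: {lpd}")
    return problems
```

An empty list means nothing obviously wrong was found; the paths to pass in are whatever values were set in run.sh.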
The derived labels can be found at data/labels.tar.gz.
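To see what the archive holds before unpacking it, Python's standard `tarfile` module is enough. This is a minimal sketch: `list_label_files` is a hypothetical helper name, and because the layout inside the archive is not described here, the function only reports file names.

```python
import tarfile


def list_label_files(archive_path="data/labels.tar.gz"):
    """Return the sorted names of the regular files in a gzipped tar archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return sorted(m.name for m in archive.getmembers() if m.isfile())
```

Calling `list_label_files()` from the repository root reads data/labels.tar.gz without extracting anything.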
-
Download the labels with the following script.
./scripts/download_labels.sh
-
Derive the labels with the following script.
./scripts/derive_labels.sh
-
Install GNU Parallel to run the synthesizer in parallel mode.
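A quick way to check whether a `parallel` executable is already on the PATH is sketched below. `find_parallel` is a hypothetical helper, not part of this repository.

```python
import shutil


def find_parallel(executable="parallel"):
    """Return the full path of the executable on PATH, or None if absent."""
    return shutil.which(executable)
```

A non-`None` result only shows that some program named `parallel` exists. The moreutils package ships a different tool under the same name, so running `parallel --version` is a reasonable follow-up to confirm that it is GNU Parallel.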
-
Synthesize audio files from multitrack pianorolls with the following script.
./scripts/batch_synthesize.sh ./data/lpd/lpd/lpd_cleansed/ \
    ./data/synthesized/lpd_cleansed 20
(The above command will synthesize all the multitrack pianorolls in the LPD-cleansed subset with 20 parallel jobs.)
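The batch script itself is not reproduced here, but the general shape of such a run (one output file per input file, with the directory layout mirrored under the output root) can be sketched as follows. Everything in this sketch is an assumption for illustration: `plan_jobs` is a hypothetical helper, whether batch_synthesize.sh mirrors directories this way is not stated, and the `.npz` and `.wav` suffixes in the usage note are guesses about the input and output formats.

```python
from pathlib import Path


def plan_jobs(src_root, dst_root, in_suffix, out_suffix):
    """Pair every input file under src_root with a mirrored output path.

    Hypothetical helper: both suffixes must start with a dot and are
    supplied by the caller, since the real formats are not stated here.
    """
    src_root = Path(src_root)
    dst_root = Path(dst_root)
    jobs = []
    for src in sorted(src_root.rglob(f"*{in_suffix}")):
        if not src.is_file():
            continue
        # Keep the path relative to the input root so the output tree mirrors it.
        relative = src.relative_to(src_root)
        jobs.append((src, (dst_root / relative).with_suffix(out_suffix)))
    return jobs
```

For example, `plan_jobs("./data/lpd/lpd/lpd_cleansed", "./data/synthesized/lpd_cleansed", ".npz", ".wav")` would return the (input, output) pairs that a pool of 20 workers could then work through.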