The Yale-Classical Archives Corpus


  • Christopher William White
  • Ian Quinn, Yale University



Keywords: corpus analysis, machine learning, common practice, tonality, style


The Yale-Classical Archives Corpus (YCAC) contains harmonic and rhythmic information for a dataset of Western European Classical art music. The corpus is based on data from Classical Archives, a repository of thousands of user-generated MIDI representations of pieces from several periods of Western European music history. The YCAC makes available metadata for each MIDI file, as well as a list of the pitch simultaneities ("salami slices") in that file. The metadata include the piece's composer, the composer's country of origin, date of composition, genre (e.g., symphony, piano sonata, nocturne), instrumentation, meter, and key. The processing step groups the file's pitches into vertical slices, starting a new slice each time a pitch is added to or removed from the texture. For each slice it records the offset (measured in the number of quarter notes separating the event from the file's beginning), highest pitch, lowest pitch, prime form, scale degrees in relation to the global key (as determined by experts), and local key information (as determined by a windowed key-profile analysis). The corpus contains 13,769 MIDI files by 571 composers, yielding 14,051,144 vertical slices. This paper outlines several properties of the corpus and presents a representative study using the dataset.
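
The slicing step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Note` and `Slice` types, their field names, and the `salami_slice` function are hypothetical, and the input is assumed to be a list of notes already parsed from a MIDI file, with start and end times in quarter notes. Only the offset, highest pitch, and lowest pitch are computed here; prime form, scale degrees, and key information are left out.

```python
from typing import List, NamedTuple, Tuple


class Note(NamedTuple):
    """Hypothetical note record, assumed to be parsed from a MIDI file."""
    start: float  # quarter notes from the beginning of the file
    end: float    # quarter notes from the beginning of the file
    pitch: int    # MIDI note number


class Slice(NamedTuple):
    """Hypothetical record for one vertical slice."""
    offset: float              # quarter notes from the beginning of the file
    pitches: Tuple[int, ...]   # sounding pitches, low to high
    highest: int
    lowest: int


def salami_slice(notes: List[Note]) -> List[Slice]:
    """Emit a slice whenever the set of sounding pitches changes."""
    # Every note start or end is a point where the texture may change.
    boundaries = sorted({t for n in notes for t in (n.start, n.end)})
    slices: List[Slice] = []
    previous: Tuple[int, ...] = ()
    for t in boundaries:
        sounding = tuple(sorted({n.pitch for n in notes if n.start <= t < n.end}))
        if sounding == previous:
            continue  # nothing was added or removed at this boundary
        previous = sounding
        if sounding:  # silence is tracked but not recorded as a slice
            slices.append(Slice(t, sounding, sounding[-1], sounding[0]))
    return slices


# Example input: C4 held for two quarter notes, E4 for the first, G4 for the second.
example = [Note(0.0, 2.0, 60), Note(0.0, 1.0, 64), Note(1.0, 2.0, 67)]
result = salami_slice(example)
for s in result:
    print(s.offset, s.pitches, s.highest, s.lowest)
```

In this sketch a new slice begins at quarter note 1.0, where E4 is released and G4 enters while C4 continues to sound. How the YCAC itself treats rests and repeated attacks of the same pitch is not specified in this abstract, so the choices made here are assumptions.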




How to Cite

White, C. W., & Quinn, I. (2016). The Yale-Classical Archives Corpus. Empirical Musicology Review, 11(1), 50–58.