Big Data, Big Questions: A Closer Look at the Yale–Classical Archives Corpus (c. 2015)


Trevor de Clercq, Middle Tennessee State University



Keywords: corpus analysis, key-finding, modulation, tonality, machine learning


This paper responds to the article by Christopher White and Ian Quinn, in which the authors introduce the Yale–Classical Archives Corpus (YCAC). I begin with some general observations about the corpus, especially regarding the ramifications of the keyboard-performance origins of many pieces in the original MIDI collection. I then assess the accuracy of the scale-degree and local-key fields in the database, which were generated by the Bellman-Budge key-finding algorithm. I point out that some of the inaccuracies in the key-finding algorithm's output may influence the results obtained from statistical studies of this corpus. I also offer an alternative explanation for the authors' finding that the ratio of V7 to V chords increases over time in common-practice music. Specifically, I conjecture that this finding may be the result of (or related to) increasing instrumental resources over time. I close with some recommendations for future versions of the corpus, such as enabling end users to help repair transcription errors and to supply ground truths for harmonic analyses and key-area information.