Big Data, Big Questions: A Closer Look at the Yale–Classical Archives Corpus (c. 2015)

Trevor de Clercq


This paper responds to the article by Christopher White and Ian Quinn, in which these authors introduce the Yale–Classical Archives Corpus (YCAC). I begin by making some general observations about the corpus, especially with regard to ramifications of the keyboard-performance origins of many pieces in the original MIDI collection. I then assess the accuracy of the scale-degree and local-key fields in the database, which were generated by the Bellman-Budge key-finding algorithm. I point out that some of the inaccuracies in the key-finding algorithm's output may influence the results we obtain from statistical studies of this corpus. I also offer an alternative explanation for the authors' finding that the ratio of V7 to V chords increases over time in common-practice music. Specifically, I conjecture that this finding may be the result of (or related to) increasing instrumental resources over time. I close with some recommendations for future versions of the corpus, such as enabling end users to help repair transcription errors as well as offer ground truths for harmonic analyses and key area information.


Keywords: corpus analysis; key-finding; modulation; tonality; machine learning


Copyright (c) 2016 Trevor de Clercq

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.


Beginning with Volume 7, No 3-4 (2012), Empirical Musicology Review is published under a Creative Commons Attribution-NonCommercial license.

Empirical Musicology Review is published by The Ohio State University Libraries.


ISSN: 1559-5749