An Annotated Corpus of Tonal Piano Music from the Long 19th Century


  • Johannes Hentschel, École Polytechnique Fédérale de Lausanne
  • Yannis Rammos, École Polytechnique Fédérale de Lausanne
  • Fabian C. Moss, Julius-Maximilians-Universität Würzburg
  • Markus Neuwirth, Anton Bruckner University, Linz
  • Martin Rohrmeier, École Polytechnique Fédérale de Lausanne



Keywords: corpora, harmony, phrase, cadence, piano, 19th century


We present a dataset of 264 annotated piano pieces by nine composers, composed in the long 19th century. Annotations adhere to the DCML harmony annotation standard and include Roman numerals, phrase boundaries, and cadence types. The scores are encoded in the XML-based MuseScore 3 format, with the annotations embedded in the MuseScore files. In addition, all harmony information, alongside key features of the encoded measure and note objects, is provided as plaintext TSV-formatted tables for increased interoperability with other datasets and analysis tools. Annotations were collaboratively created and reviewed by a pool of trained music theorists. Collaboration took place asynchronously online via a semi-automated GitHub-based workflow designed for quality assurance, allowing cycles of revision and review until consensus was reached. The full revision history is retained, providing data for further empirical research on inter-annotator agreement and related topics. We also present descriptive statistics about the nine corpora and the dataset as a whole, including comparisons of pitch-class contents, phrase lengths, modulations, and cadence types. We conclude with a discussion of our musicological principles for corpus building and considerations of representability.




How to Cite

Hentschel, J., Rammos, Y., Moss, F. C., Neuwirth, M., & Rohrmeier, M. (2024). An Annotated Corpus of Tonal Piano Music from the Long 19th Century. Empirical Musicology Review, 18(1), 84–95.



Data Reports