MCMA: A Symbolic Multitrack Contrapuntal Music Archive




counterpoint, polyphony, computational musicology, neural machine translation, symbolic music


We present the Multitrack Contrapuntal Music Archive (MCMA, available at …), a symbolic dataset curated so that every polyphonic work is encoded as a set of independent voices. MCMA currently consists only of pieces from the Baroque repertoire, but we aim to extend it to other contrapuntal music. MCMA is FAIR-compliant and geared towards musicological tasks such as (computational) analysis and education, since representing each voice explicitly and independently brings contrapuntal interactions to the fore. This representation also makes it easier to apply recent advances in natural language processing (e.g., neural machine translation) to music; for example, MCMA can be particularly useful for language-based machine learning models for music generation. Despite its currently modest size, we believe MCMA to be an important addition to online contrapuntal music databases, and we open it to contributions from the wider community in the hope that it can continue to grow beyond our own efforts. In this article, we provide the rationale for the corpus, suggest possible use cases, give an overview of the compilation process (data sourcing and processing), and present a brief statistical analysis of the corpus at the time of writing. Finally, we discuss the future work we plan to undertake.