In line with the present-day digital turn in the humanities, musicology is showing an increasing tendency towards the study of large corpora. 2 This has materialized in the emergence of scholarly repositories offering ground truth and automatic annotations of musical parameters from both scores and recordings. In the same way as scholars in the symbolic domain have laid especial emphasis on melody and harmony, 3 the audio domain largely concentrates on timing and, sometimes, intonation. Examples include Saarland Music Data, 4 CorpusCOFLA, 5 and the MAESTRO, 6 URMP, 7 Bach10, 8 and Choral Singing datasets. 9 Such audio corpora provide data extracted not from historical commercial recordings but rather from files recorded for the purpose of being analyzed.

In this context, the Recorded Brahms Corpus (RBC) aims to benefit computational musicology and performance studies by offering manually extracted data on the timing and dynamic decisions of a number of performer duos in commercial studio or live recordings (1934–2019) of select movements of Brahms's Cello Sonatas, Opp. 38 and 99. In this manner, it offers new insights into chamber music playing, which are necessary in order to construct narratives of performance techniques, decisions, and style.


The RBC presents data extracted from 21 recordings of select movements of Brahms's Cello Sonatas. For easier navigation in the corpus, the movements are labeled by opus number followed by a lowercase roman numeral indicating their position within each composition. Recordings are identified by abbreviations (see Table 1). 10

Table 1. Recordings included in the RBC dataset (alphabetical order)
Abbreviation | Cellist | Pianist | Year | Label, catalogue no. | Album title
C/H(36) | Pau Casals | Mieczysław Horszowski | 1936 | HMV, DB 3059-62 | Brahms: Sonata in F major (F Dur)
C/H(58) | Pau Casals | Mieczysław Horszowski | 1958 | |
DP/B | Jacqueline du Pré | Daniel Barenboim | 1968 | Angel Records, S-36544 | Brahms: The Two Sonatas for Cello and Piano
F/B | Pierre Fournier | Wilhelm Backhaus | 1955 | Decca, LXT 5077 | Sonata No 1 in E MINOR for 'cello and piano – Sonata No 2 in F MAJOR for 'cello and piano – BRAHMS
F/vdP | Emmanuel Feuermann | Theo van der Pas | 1934 | Columbia Masterworks, Set No. 236 | Brahms: Sonata for Cello and Piano No. 1 In E minor, Op. 38
G/G | Karine Georgian | Pavel Gililov | 1994 | Biddulph Recordings, 744718101429 | Brahms: Cello Sonatas (Op. 38 & 99)
G/Gr | Sol Gabetta | Hélène Grimaud | 2012 | Deutsche Grammophon, 0289 479 0965 1 | Duo
G/Gu | Anne Gastinel | François Frédéric Guy | 1999 | Naïve (Valois), V4817 | Johannes Brahms: Sonates pour Violoncelle & Piano
I/H | Steven Isserlis | Stephen Hough | 2005 | Hyperion, BE114A0B | Johannes Brahms: Cello Sonatas
M/A | Yo-Yo Ma | Emanuel Ax | 1991 | Sony Classical SK 48191 | Brahms: Sonatas for Cello & Piano Opp. 38, 99 and 108
M/G | Mischa Maisky | Pavel Gililov | 1998 | Deutsche Grammophon, 0289 459 6772 1 GH | Brahms: Die Cellosonaten – Lieder ohne Worte
M/L | Truls Mørk | Juhani Lagerspetz | 1988 | Virgin Classics, VC 5 45052 2 | Brahms: Cello Sonatas – Song Transcriptions
M/P | António Meneses | Maria João Pires | 2013 | Deutsche Grammophon, 0289 479 0965 1 | The Wigmore Hall Recital
P/N | Asier Polo | Eldar Nebolsin | 2019 | Ibs Classical IBS82019 | Brahms. Cello Sonatas
P/R(36) | Gregor Piatigorsky | Arthur Rubinstein | 1936 | HMV, DB 2952-54 | Brahms: Sonata No. 2 in E minor, Op. 38
P/R(66) | Gregor Piatigorsky | Arthur Rubinstein | 1966 | RCA Red Seal, ARL1-2085 | Brahms. Sonatas for Cello and Piano – E Minor Op. 38 / F Major Op. 99
P/S | Gregor Piatigorsky | Reginald Stewart | 1947 | Music & Arts, CD 644 | compiled in The Art of Gregor Piatigorsky (1903-1976)
P/V | Boris Pergamenschikow | Lars Vogt | 2002 | EMI Classics, 557526 | Brahms – Schumann – Works for Cello and Piano
R/S | Mstislav Rostropovich | Rudolf Serkin | 1983 | Deutsche Grammophon, 410 510-2 | Johannes Brahms: Die Cellosonaten – The Cello Sonatas
S/B | János Starker | Abba Bogin | 1951 | Period Records, SPL 593 | Brahms: Sonatas for Cello and Piano
S/S | János Starker | György Sebők | 1964 | Speaker Corners/Mercury Records, SR90392 | Brahms: Sonatas for Cello and Piano


The RBC dataset is available for consultation and download. At this stage, the repository contains data from recordings of the first and second movements of Brahms's two cello sonatas, Opp. 38 and 99, as well as of the third movement of Op. 38.

The repository contains scores, 11 scape plots, 12 and performance data. 13 Scores are provided in PDF, MIDI, MusicXML, MUSX, and MSCX formats. To facilitate interpretation of the data, MSCX scores contain note numberings for each part (piano and cello) separately. Scape plots are given as PNG files.

Data for each recording are organized by measurement unit:

  • Op. 38i: note 14 / quarter note beat / half note beat / bar 15
  • Op. 38ii: note / eighth note beat / quarter note beat / bar
  • Op. 38iii: note / quarter note beat / half note beat / bar
  • Op. 99i: note / quarter note beat / bar
  • Op. 99ii: note / sixteenth note beat / eighth note beat / quarter note beat / bar

The corresponding Sonic Visualiser (SV) file is provided in each of the data folders, along with TXT files for the five parameters that the corpus evaluates (see Table 2).

Table 2. Parameters evaluated in the RBC dataset
Parameter | Abbreviation in files | Unit
Beat and note onsets | onsets | s
Beat and note durations | duration | s
Tempo fluctuations | tempo | bpm
Continuous dynamic variations | smoothed_power 16 | dB
Dynamic values at specific instants | dB_per_beat | dB

All TXT files present the same internal structure:

  • column 1: onset (in seconds) at which the measurement was taken
  • column 2: value for each given parameter
  • column 3: onset label as measure.beat or as note number

Exceptionally, the smoothed_power files lack the third column, as measurements are made on a continuous basis and not at specific beat instants; dB_per_beat files lack column 2 as its position value is contained in column 1.
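As a minimal sketch of how these files might be read, the following Python function parses the column layout described above. It assumes, without confirmation from the repository, that fields are separated by tabs or commas and that the files have no header row; the function name `parse_rbc_txt` and the sample values are hypothetical. It covers the three-column files and the two-column smoothed_power files only, not the dB_per_beat layout.

```python
import re

def parse_rbc_txt(text):
    """Parse rows of onset / value / label.

    Rows with only two fields (as assumed for smoothed_power files)
    are returned with label=None.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Assumption: tab- or comma-separated fields.
        fields = [f.strip() for f in re.split(r"[\t,]+", line)]
        onset = float(fields[0])                      # seconds
        value = float(fields[1]) if len(fields) > 1 else None
        # Labels stay as text so that "1.10" and "1.1" remain distinct.
        label = fields[2] if len(fields) > 2 else None
        rows.append((onset, value, label))
    return rows

# Hypothetical sample: two beats of a duration file.
sample = "0.52\t0.61\t1.1\n1.13\t0.58\t1.2\n"
rows = parse_rbc_txt(sample)
```

Keeping the measure.beat label as a string rather than a number avoids conflating, for example, beat 1 and beat 10 of the same bar.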

Merged data files are provided in CSV format too. To facilitate download, they are compressed into a single ZIP file found in the main data folder of the repository. Merged CSV files for beat measurements contain the following five columns: onset (in seconds), label (as measure.beat), duration (in seconds), tempo (in bpm), and intensity (in dB). Merged CSV files for note data are restricted to onset (in seconds), label (as note number), duration (in seconds), and bar in which the particular note appears.
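The merged beat files could be loaded along the following lines. This is a sketch only: it assumes comma-separated values in the column order listed above and tolerates an optional header row; the class `BeatRow`, the function `read_merged_beats`, and the sample values are hypothetical and do not come from the repository.

```python
import csv
import io
from typing import NamedTuple

class BeatRow(NamedTuple):
    onset: float      # seconds
    label: str        # measure.beat, kept as text
    duration: float   # seconds
    tempo: float      # bpm
    intensity: float  # dB

def _is_number(s):
    try:
        float(s)
        return True
    except ValueError:
        return False

def read_merged_beats(handle):
    """Read onset, label, duration, tempo, intensity rows from a CSV handle."""
    rows = []
    for rec in csv.reader(handle):
        # Skip blank lines, a header row, and rows with missing numeric cells.
        if len(rec) < 5 or not all(_is_number(rec[i]) for i in (0, 2, 3, 4)):
            continue
        rows.append(BeatRow(float(rec[0]), rec[1].strip(),
                            float(rec[2]), float(rec[3]), float(rec[4])))
    return rows

# Hypothetical sample with an assumed header row.
sample = io.StringIO(
    "onset,label,duration,tempo,intensity\n"
    "0.50,1.1,0.60,100.0,-28.4\n"
    "1.10,1.2,0.50,120.0,-26.1\n"
)
beats = read_merged_beats(sample)
mean_tempo = sum(b.tempo for b in beats) / len(beats)
```

Rows with empty cells, such as a final beat for which no duration can be computed, are silently skipped in this sketch; a stricter reader might report them instead.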

In total, the RBC dataset contains 3225 files (see Table 3).

Table 3. Size of the RBC dataset
Movement | Scores | SV files | TXT files | CSV files | PNG files
Op. 38i | 5 | 4 x 20 = 80 | 19 x 20 = 380 | 5 x 20 = 100 | 9 x 20 = 180
Op. 38ii | 5 | 4 x 20 = 80 | 19 x 20 = 380 | 5 x 20 = 100 | 9 x 20 = 180
Op. 38iii | 5 | 4 x 20 = 80 | 19 x 20 = 380 | 5 x 20 = 100 | 9 x 20 = 180
Op. 99i | 5 | 3 x 15 = 45 | 14 x 15 = 210 | 4 x 20 = 80 | 9 x 15 = 135
Op. 99ii | 5 | 5 x 15 = 45 | 24 x 15 = 210 | 6 x 20 = 120 | 9 x 15 = 135

All files are shared under a Creative Commons Attribution Non-Commercial Share-Alike 4.0 (CC BY-NC-SA 4.0) license. 17


After engraving the scores in Finale26 18 and exporting them in various formats (see above), ground-truth performance annotations were obtained by:

  • numbering the notes for the cello and the piano parts separately. Numberings are included in the MSCX files 19 as chord symbols (for clearer visualization, numberings for the first notes in each bar are given).
  • importing the desired track in Sonic Visualiser. 20 The corresponding waveform appears on the screen (layer 2 in SV files).
  • generating a spectrographic image (layer 3).
  • tapping the recording while it is being played at normal – or most convenient – speed to determine beat/note onsets. Onsets appear in the form of vertical lines that cover the whole vertical space of the waveform. Measurement units at this stage are the smallest beat possible (layer 4, in white) and notes (in the relevant files, layers 5 and 6; in bright purple for the piano and bright orange for the cello).
  • renumbering the onsets to show bar and beat numbers within bars or note numbers.
  • adjusting provisional onsets to the start of each tapped note as visualized in the spectrogram (Cook, 2009).
  • aurally checking the accuracy of the onsets by playing the track at a significantly slower speed and relocating the lines when necessary. For beat measurements, when asynchronies between the cello and the piano occur, onsets were adjusted according to the rhythmically most active part, normally the piano. For accurate note onsets, each instrumental part was annotated individually.
  • exporting onset data as TXT.

Subsequently, additional timing data were automatically obtained by creating, for each measurement unit, two Time Values layers 21 to yield information regarding:

  • beat duration (in seconds) (layer 5 for beat measurements; layer 7 for note onsets);
  • tempo (bpm) from the previous item (layer 6, in orange in beat-based files; layer 8 in note-measured ones). Data were exported as TXT.
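The arithmetic behind these two derived layers can be sketched as follows. This is an illustration of the relation between onsets, durations, and tempo, not Sonic Visualiser's internal implementation; the function name and the onset values are hypothetical, and tempo is expressed relative to whichever beat unit was tapped.

```python
def durations_and_tempi(onsets):
    """Return (durations in s, tempi in bpm) for consecutive beat onsets."""
    durations = [b - a for a, b in zip(onsets, onsets[1:])]
    if any(d <= 0 for d in durations):
        raise ValueError("onsets must be strictly increasing")
    # One beat lasting d seconds corresponds to 60 / d beats per minute.
    tempi = [60.0 / d for d in durations]
    return durations, tempi

# Hypothetical quarter-note beat onsets, in seconds.
onsets = [0.0, 0.5, 1.1, 1.85]
durations, tempi = durations_and_tempi(onsets)
```

With these sample onsets the three beats last roughly 0.5, 0.6, and 0.75 s, that is, about 120, 100, and 80 bpm. Averaging the tempo values gives 100 bpm, whereas converting the average duration (about 0.617 s) gives about 97.3 bpm, so summary statistics for the two parameters are not interchangeable.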

In the case of beat measurements only, further data were automatically extracted by:

  • creating a smoothed power layer (no. 7, in purple) and exporting data as a TXT file.
  • aligning smoothed power data and onsets. For that, the open-source Dyn-a-matic tool 22 was employed; data were saved as TXT.
  • obtaining onsets for larger beat units by deleting undesired tapping lines and renumbering time instants. Corresponding SV and TXT files for beat onsets, durations, tempo, and dynamics data were created.
  • generating separate scape plots for tempo and dynamics (dB-per-beat data) using an open-source Scape Plot Generator tool. 23 Images were stored as PNG. They were subsequently merged into diamond-shaped figures in Microsoft® Paint, the upper triangle showing tempo fluctuations and the lower one referring to dynamics. Smallest beat units were always used, and scape plots were generated without flip color at three degrees of smoothing (0.1, 0.5, and 0.9).
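The alignment step in the list above, in which a continuous smoothed-power curve is reduced to one dynamic value per beat, can be illustrated with a simple stand-in. The sketch below is not the Dyn-a-matic algorithm, whose details are not described here; it merely interpolates the curve linearly at each onset, and the function name and all values are hypothetical.

```python
from bisect import bisect_left

def sample_at_onsets(times, values, onsets):
    """Linearly interpolate the curve (times, values) at each onset.

    `times` must be sorted in ascending order. Onsets falling before the
    first or after the last sample take the nearest end value.
    """
    sampled = []
    for t in onsets:
        i = bisect_left(times, t)
        if i == 0:
            sampled.append(values[0])
        elif i == len(times):
            sampled.append(values[-1])
        else:
            t0, t1 = times[i - 1], times[i]
            weight = (t - t0) / (t1 - t0)
            sampled.append(values[i - 1] + weight * (values[i] - values[i - 1]))
    return sampled

# Hypothetical smoothed-power samples (s, dB) and beat onsets (s).
times = [0.0, 0.1, 0.2, 0.3]
power_db = [-40.0, -30.0, -20.0, -30.0]
db_per_beat = sample_at_onsets(times, power_db, [0.05, 0.2, 0.5])
```

Interpolating directly on decibel values is a simplification; other choices, such as taking the nearest sample or averaging over a short window after each onset, would be equally defensible for illustration.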


The data presented in this repository can be used in the study of specific performance decisions in chamber music playing. The most direct application would be the investigation of changes and constants in performance styles over the last one hundred years. Specifically, the analysis of the data can offer insights into the similarity between individual duos and/or individual performers' playing styles.

For instance, correlation coefficients between several recordings can be obtained for the duration of notes in select parts and passages. Quarter-note durations in the opening measures of the first movement of Brahms's Op. 38, for example, can be compared and correlated (see Table 4).

Specifically, van der Pas (F&vdP) and Hough (I&H) play in the most dissimilar manner (ρ = -.10), whereas the durational strategies of Barenboim (DP&B) and Lagerspetz (M&L) are highly correlated (ρ = .90). Interestingly, Gililov plays those eight measures in fairly distinct ways in the two recordings in which he participates (G&G and M&G) (ρ = .40).
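A pairwise comparison of this kind can be sketched in a few lines. The code below computes the Pearson coefficient between duration series of equal length and collects the results for every pair of recordings; the function names, recording keys, and duration values are hypothetical placeholders rather than data from the corpus.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)

def correlation_matrix(series):
    """Return {(name_a, name_b): r} for every unordered pair of recordings."""
    names = sorted(series)
    return {(a, b): pearson(series[a], series[b])
            for i, a in enumerate(names) for b in names[i + 1:]}

# Hypothetical beat durations (s) for the same passage in three recordings.
series = {
    "duo_a": [0.62, 0.58, 0.60, 0.71, 0.66, 0.59, 0.64, 0.80],
    "duo_b": [0.55, 0.54, 0.57, 0.63, 0.60, 0.56, 0.58, 0.70],
    "duo_c": [0.70, 0.72, 0.65, 0.64, 0.69, 0.75, 0.66, 0.68],
}
matrix = correlation_matrix(series)
```

The comparison presupposes that all series cover exactly the same beats in the same order, so each recording contributes the same number of values.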

Similarly, durational data can facilitate the analysis and representation of the varying proportions established between the various formal sections in a given piece. In the second movement of Brahms's Op. 38, the expansion of the last phrase in the central trio can result in a greatly enlarged central section, minimizing the structural weight of the scherzo sections, as in M&G (see Figure 1b). Other performers, such as F&B, opt for the opposite strategy, i.e., enlarging the opening statement of the scherzo theme with a shortened central trio (see Figure 1a). Intermediate, more balanced solutions are also to be found (see P&N in Figure 1c).
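Such proportions follow directly from the onset times at which each section begins. The following sketch, with a hypothetical function name and invented timings, divides each section's length by the total duration of the performance.

```python
def section_proportions(starts, end_time):
    """Return [(section name, share of total duration)].

    `starts` lists (name, start time in s) in performance order;
    `end_time` is the time at which the last section ends.
    """
    names = [name for name, _ in starts]
    times = [t for _, t in starts]
    ends = times[1:] + [end_time]
    total = end_time - times[0]
    if total <= 0:
        raise ValueError("end_time must lie after the first section start")
    return [(n, (e - s) / total) for n, s, e in zip(names, times, ends)]

# Hypothetical timings (s) for a scherzo, trio, and scherzo da capo.
starts = [("Scherzo", 0.0), ("Trio", 90.0), ("Scherzo da capo", 210.0)]
proportions = section_proportions(starts, end_time=300.0)
```

In this invented case the trio occupies 40% of the performance and each scherzo statement 30%. The start times would in practice be looked up from the beat onset files by their measure.beat labels.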

Fig. 1 Proportional durations of the various formal sections in Brahms's Op. 38ii: a) F&B; b) M&G; c) P&N.
Scherzo (mm. 0.3–76.2): a (mm. 0.3–28.2), b (mm. 28.3–58.2), c (mm. 58.3–76.2);
Trio (mm. 76.3–115.2): c1 (mm. 76.3–89.2), c2 (mm. 76.3–89.2), d1 (mm. 89.3–108.2), d2 (mm. 108.3–115.2);
Scherzo (mm. 0.3–76.2): a (mm. 0.3–28.2), b (mm. 28.3–58.2), c (mm. 58.3–76.2).

Table 4. Correlation matrix (Pearson coefficient) for performances of the piano part in Brahms's Op. 38i, mm. 1-8
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1 C&H(58)a - .68 .40 .56 .59 .58 .66 .47 .52 .39 .71 .57 .61 .72 .77 .82 .57 .31 .36 .57
2 DP&B - .65 .59 .67 .85 .49 .66 .71 .69 .90 .69 .63 .74 .68 .67 .71 .41 .45 .69
3 F&B - .21 .66 .61 .17 .37 .23 .24 .45 .30 .27 .78 .65 .58 .62 .32 .42 .63
4 F&vdP - .22 .37 .41 -.10 .28 .3. .47 .24 .36 .48 .73 .53 .70 -.07 .36 .68
5 G&G - .67 .43 .60 .39 .40 .61 .41 .39 .80 .64 .58 .56 .42 .68 .54
6 G&Gr - .54 .61 .62 .54 .87 .73 .72 .63 .61 .67 .71 .31 .47 .71
7 G&Gu - .51 .29 .17 .58 .57 .66 .57 .61 .68 .47 .43 .47 .48
8 I&H - .35 .50 .74 .49 .40 .52 .25 .41 .21 .72 .36 .20
9 M&A - .66 .67 .55 .66 .35 .44 .57 .47 .17 .38 .46
10 M&G - .69 .44 .39 .41 .39 .38 .28 .40 .48 .30
11 M&L - .70 .64 .63 .59 .62 .67 .35 .60 .67
12 M&P - .79 .41 .48 .52 .46 .14 .32 .44
13 P&N - .40 .54 .66 .52 .27 .51 .52
14 P&R(36) - .84 .81 .61 .53 .42 .60
15 P&R(66) - .88 .81 .30 .47 .81
16 P&S - .62 .51 .51 .63
17 P&V - .02 .50 .99
18 R&S - .23 .02
19 S&B - .59
20 S&S -

Note: a For specification of abbreviations see Table 1.

Note onset data for the two instrumental parts can also serve the study of ensemble microtiming practices, including asynchrony, as in Llorens (2017) and Demos et al. (2016). Finally, studies such as Rink et al. (2011) and Llorens (2018) have employed similar data to analyze the influence of performance parameters in the inference of structural relations in music, thus contributing to general music theory.
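For the study of asynchrony, the two sets of note onsets can be compared wherever the score notates the parts as sounding together. The sketch below assumes that the user supplies the pairing of cello and piano note numbers, since the two parts are numbered separately; the function name, the pairing, and the onset values are all hypothetical.

```python
def asynchronies(cello_onsets, piano_onsets, pairs):
    """Return [(cello note, piano note, asynchrony in ms)].

    `cello_onsets` and `piano_onsets` map note numbers to onsets (s).
    `pairs` lists (cello note, piano note) notated as simultaneous.
    Positive values mean that the cello enters after the piano.
    """
    return [(c, p, (cello_onsets[c] - piano_onsets[p]) * 1000.0)
            for c, p in pairs]

# Hypothetical onsets (s), keyed by note number in each part.
cello = {1: 0.512, 2: 1.104, 3: 1.730}
piano = {1: 0.500, 4: 1.120, 7: 1.730}
pairs = [(1, 1), (2, 4), (3, 7)]  # assumed to coincide in the score

result = asynchronies(cello, piano, pairs)
mean_abs_ms = sum(abs(ms) for _, _, ms in result) / len(result)
```

Signed values preserve which instrument leads, whereas the mean absolute value summarizes the overall tightness of the ensemble.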

It is planned to expand the dataset by including recordings of the third and fourth movements of Brahms's Op. 99. Similarly, data from further recordings will be added in the future. In the hope of also creating a collaborative repository, the RBC affords expansion to host data extracted from recordings of other compositions by Brahms. The Violin Sonatas Opp. 78, 100, and 108 are being contemplated as the first step in that direction.


This article was copyedited by Annaliese Micallef Grimaud and layout edited by Diana Kayser.


  1. Correspondence can be addressed to Dr. Ana Llorens, Instituto Complutense de Ciencias Musicales, Universidad Complutense de Madrid, Avda. Prof. Aranguren 5, 28040 Madrid.
  2. For an overview, see, for instance, the two issues devoted to corpus methods in Music Perception in 2013 and 2014 (Temperley & VanHandel, 2013). Other examples include, for popular music, deClercq (2015).
  3. For instance, the Yale-Classical Archives Corpus (see White & Quinn, 2016), the Annotated Beethoven Corpus (ABC) (Neuwirth et al., 2018), and the When in Rome repository (Micchi et al., 2020).
  4. For further details, see Müller et al. (2011).
  5. See Kroher et al. (2016).
  6. See Curtis et al. (2019).
  7. See Li et al. (2018).
  9. See Cuesta et al. (2018).
  10. A publicly available Spotify playlist includes most of the recordings.
  11. The scores were prepared based on the first editions of the sonatas: Sonate für Pianoforte und Violoncell […] von Johannes Brahms Op. 38; Bonn: Simrock, [1866], plate 6476; Sonate für Pianoforte und Violoncell […] von Johannes Brahms Op. 99; Bonn: Simrock, 1887, plate 8750.
  12. The plots represent arch-like tempo and dynamic profiles at various structural levels simultaneously. Whereas shorter flags refer to more detailed features, higher flags manifest parametric variations that shape the performance at broader formal planes. Reddish colors reflect an increase in the parameter's variation, whereas blue colors indicate a decrease. For a detailed explanation on how to interpret these visualizations, see Sapp (2011).
  13. Scape plots and tempo and dynamic data are offered for beat measurements only, as note measurements are most useful for evaluating microtiming, including asynchrony.
  14. Note onsets are measured separately for the cello and the piano parts; see Methods.
  15. In the dataset, British terms are employed: minim instead of half note, crotchet instead of quarter note, quaver instead of eighth note, and semiquaver instead of sixteenth note.
  16. Smoothed dynamic values are obtained by disregarding noise; see Repp (1992) and Desain and Honing (1993). Recording and reproduction techniques greatly affect the resulting dynamic range of recordings and, thus, data should be interpreted in context. In general, modern recordings tend to have a richer frequency spectrum; see Trezise (2009).
  19. Downloadable (free).
  20. Version 3.3. Downloadable (free).
  21. Data for both duration and tempo need to be extracted separately as they are not inversely proportional; see Bowen (1996, p. 124).
  22. Developed for the Mazurka project.


  • Bowen, J. A. (1996). Tempo, duration, and flexibility: Techniques in the analysis of musical performance. Journal of Musicological Research, 16(2), 111–156.
  • Cook, N. (2009). Methods for analysing recordings. In N. Cook et al. (Eds.), The Cambridge Companion to Recorded Music (pp. 221–245). Cambridge: Cambridge University Press.
  • Cuesta, H., Gómez, E., Martorell, A., & Loáiciga, F. (2018). Analysis of intonation in unison choir singing. In Proceedings of the 15th International Conference on Music Perception and Cognition.
  • Curtis, H., Stasyuk, A., Simon, I., Huang, Ch.-Z. A., Dieleman, S., Elsen, E., Engel, J., & Eck, D. (2019). Enabling factorized piano music modeling and generation with the MAESTRO dataset. In International Conference on Learning Representations.
  • deClercq, T. (2015). Corpus studies of harmony in popular music: A response to Gauvin. Empirical Musicology Review, 10(3), 239–244.
  • Demos, A. P., Lisboa, T., & Chaffin, R. (2016). Flexibility of expressive timing in repeated musical performances. Frontiers in Psychology, 7.
  • Desain, P., & Honing, H. (1993). Tempo curves considered harmful. Contemporary Music Review, 7, 123–138.
  • Kroher, N., Díaz-Ibáñez, J. M., Mora, J., & Gómez, E. (2016). Corpus COFLA: A research corpus for the computational study of flamenco music. Journal on Computing and Cultural Heritage, 9(2).
  • Li, B., Liu, X., Dinesh, K., Duan, Z., & Sharma, G. (2018). Creating a multi-track classical music performance dataset for multi-modal music analysis: Challenges, insights, and applications. In IEEE Transactions on Multimedia.
  • Llorens, A. (2017). Recorded asynchronies, structural dialogues: Brahms's Adagio affettuoso, Op. 99ii, in the hands of Casals and Horszowski. Music Performance Research, 8, 1–31. Retrieved from[2017]/MPR%200109%20Llorens%201-31.pdf
  • Llorens, A. (2018). Creating musical structure through performance: A re-interpretation of Brahms's cello sonatas. Doctoral dissertation, University of Cambridge.
  • Micchi, G., Gotham, M., & Giraud, M. (2020). Not all roads lead to Rome: Pitch representation and model architecture for automatic harmonic analysis. Transactions of the International Society for Music Information Retrieval, 3(1), 42–54.
  • Müller, M., Konz, V., Bogler, W., & Arifi-Müller, V. (2011). Saarland music data (SMD). In International Conference on Music Information Retrieval (ISMIR).
  • Neuwirth, M., Harasim, D., Moss, F.C., & Rohrmeier, M. (2018). The annotated Beethoven corpus (ABC): A dataset of harmonic analyses of all Beethoven string quartets. Frontiers in Digital Humanities, 5.
  • Repp, B. (1992). Diversity and commonality in music performance: An analysis of timing microstructure in Schumann's "Träumerei". Journal of the Acoustical Society of America, 92(5), 2546–2568.
  • Rink, J., Spiro, N., & Gold, N. (2011). Motive, gesture and the analysis of performance. In A. Gritten & E. King (Eds.), New Perspectives on Music and Gesture (pp. 267–292). London: Routledge.
  • Sapp, C. S. (2011). Computational methods for the analysis of musical structure. Doctoral dissertation, Stanford University, USA.
  • Temperley, D., & VanHandel, L. (2013). Introduction to the special issues on corpus methods. Music Perception, 31(1), 1–3.
  • Trezise, S. (2009). The recorded document: Interpretation and discography. In N. Cook et al. (Eds.), The Cambridge Companion to Recorded Music (pp. 186–209). Cambridge: Cambridge University Press.
  • White, C. W., & Quinn, I. (2016). The Yale-Classical Archives corpus. Empirical Musicology Review, 11(1), 50–58.