In Schotanus (2020a), I reported on two studies, a classroom study and a laboratory experiment involving EEG measures, in which participants listened to different versions of the same four songs and answered questions about them (see also Schotanus, Eekhof, and Willems, 2018). The studies were designed to test several aspects of the Musical Foregrounding Hypothesis (Schotanus, 2015, 2020b), including the assumption that presenting a text sung instead of spoken would support text processing, even after just one exposure, in particular in a classroom setting. In his commentary, Lee (2020) focuses on the latter hypothesis as if it were the main one, which may have coloured his interpretation of the study. At least, he does not address the results concerning voiceless intervals, verbatim repetition of words, purity ratings, and perceived processing fluency, and he only indirectly discusses the results concerning emotion ratings. Having said that, I agree with his critique as such.

As I pointed out in Schotanus (2020a), the classroom study met with several methodological problems: there were issues with randomization, and the data could not be analysed in a single factor analysis because most of the questions could not be asked in all conditions. Furthermore, several questions in the questionnaire seem to have been misunderstood, among other problems. A comparison between the results of the classroom study and those of the better randomized laboratory study seemed to provide a solution for at least the randomization issue. However, two problems arose. First, the laboratory study involved fewer song versions (accompaniment-only and accompanied-speech versions were not used). Second, the results had already been analysed in Schotanus, Eekhof, and Willems (2018) in a factor analysis including variables that were analysed separately in the classroom study. Assuming that it is not appropriate to analyse the same data twice, I could only analyse data that had been left unanalysed (i.e., three groups of variables and two individual variables) and, in addition, reflect on the pre-existing analysis as well as possible. According to Lee (2020), this was not the right choice, so he asks for a reanalysis restricted to the variables and conditions that appear in both studies. He also asks for additional research, as he notes that the design of the research does not disaggregate the individual contributions of the factors of interest. Below, I report on such a reanalysis.


Participants, Materials, and Procedure

In the classroom study, 271 high-school students distributed over 12 pre-existing groups (i.e., their Dutch language and literature classes) listened to five out of 24 versions of the same four Dutch cabaret songs, written and performed by the author. All heard one song spoken, one song vocalized (i.e., sung a cappella with lalala as the only lyrics), one a cappella (sung a cappella with lyrics), and one complete (sung with accompaniment); in addition, they heard either an accompaniment-only version of the song they heard spoken or an accompanied-speech version of the song they heard vocalized. After each track they filled out a questionnaire, mainly consisting of Likert-scale items, seven of which were asked for music and lyrics separately. For more details concerning participants, materials, and procedure, see Schotanus (2020a); for songs (lead sheets and recordings) and questionnaires, see Schotanus (2017).

In the laboratory study, 24 adults listened individually to four of the same song versions: one song spoken, one vocalized, one a cappella, and one complete. They listened to the tracks in a sound-proof booth while both EEG and SCR measures were taken. For more details concerning participants and procedure, see Schotanus, Eekhof, and Willems (2018); for the results of the EEG measures, see Schotanus, Eekhof, and Willems (2018) and Schotanus (2020b, pp. 166-188).


The data were reanalysed as follows. First, a factor analysis similar to the one in Schotanus, Eekhof, and Willems (2018) was conducted on both datasets, except that the items concerning poetic quality were excluded, as these differed across the two studies. Second, the emotion ratings were reanalysed for both datasets because several conditions and one variable (nagging quality) had to be excluded from the classroom data, and one variable (calming quality) had to be excluded from the laboratory data. Third, two small sets of variables from the classroom study (i.e., those concerning voiceless intervals and those concerning repetition) were reanalysed excluding the data concerning accompanied-speech and accompaniment-only versions.

As was the case in Schotanus (2020a), Principal Axis factor analyses were conducted with oblique rotation (direct oblimin), and factor scores were saved following the Anderson-Rubin method. Random intercepts were estimated for group, song and participant*group in mixed-model regressions concerning classroom data, and for participant and song in regressions concerning laboratory data.
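As a rough sketch only, the random-intercept structure described above can be written out as follows. The symbols are illustrative and do not come from the original analysis. Two assumptions are made here: that condition is the only fixed effect (the full fixed-effects structure is not listed in this text), and that "participant*group" denotes an intercept for each participant within a group.

```latex
% Sketch under stated assumptions; notation is hypothetical.
% y_i: factor score for observation i; c[i], g[i], s[i], p[i]: its condition, group, song, participant.
\begin{align}
\text{Classroom data:}\quad y_i &= \beta_0 + \beta_{c[i]} + u_{g[i]} + v_{s[i]} + w_{p[i],\,g[i]} + \varepsilon_i \\
\text{Laboratory data:}\quad y_i &= \beta_0 + \beta_{c[i]} + v_{s[i]} + w_{p[i]} + \varepsilon_i \\
u_g \sim \mathcal{N}(0, \sigma_u^2),\quad
v_s &\sim \mathcal{N}(0, \sigma_v^2),\quad
w \sim \mathcal{N}(0, \sigma_w^2),\quad
\varepsilon_i \sim \mathcal{N}(0, \sigma_\varepsilon^2)
\end{align}
```

Because songs are crossed with participants and groups rather than nested within them, the random intercepts enter the model additively.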


Factor analyses

The factor analyses concerning emotions, voiceless intervals, and repetitions show similar results across studies (see Table 1). However, those concerning the 15 text-related items show differences (see Table 2).

As Table 2 shows, both models consist of four factors, one of which seems to be clustered around listening comfort (a combination of not tiring and not boring), one around heaviness, one around voice quality, and one around clearness (a combination of intelligibility and comprehensibility). However, most of these factors are coloured quite differently across the two studies by factor loadings from other variables. For example, the first factor is coloured towards interesting content in the classroom study, and towards a positive mood in the laboratory study. Conversely, the heaviness in the second factor is coloured towards a negative mood in the classroom study, and towards sad content in the laboratory study. Furthermore, Clearness is associated with several other positive qualities, particularly in the laboratory study. As a result, there is no factor that can clearly be associated with attention or appreciation for the lyrics.

Regarding the mean factor scores per condition (see Table 3), the factors Feeling upbeat, Emotional load, Distraction by voiceless intervals, and Meaningfulness of repetitions show similar, largely significant patterns, as in Schotanus (2020a). These results strengthen various conclusions in Schotanus (2020a) that Lee (2020) has cast doubt upon: a track scores higher on Feeling upbeat if there is relatively 'more' music in it, which is in line with earlier research; Emotional load seems to be a content-related variable that is more likely to be associated with lyrics than with instrumental music, although music can convey Emotional load; this factor partly represents a positively valued aspect of sadness that can be distinguished from the traditional negatively valued low-energy aspect, which is partly represented by the inverse of Feeling upbeat; even in a cappella versions, listeners rate the music significantly differently from the lyrics; silences are experienced as more distracting than instrumental interludes; verbatim repetitions of words are perceived as less superfluous and more meaningful when they are sung instead of spoken, particularly when accompaniment is present; and both the emotion ratings and the ratings of the emotional meaning of verbatim verbal repetitions show that the emotional meaning of the sung lyrics within this experiment is interpreted more in line with the lyricist's intentions than that of the spoken ones (see Schotanus, 2020a, for references and discussion).

Table 1. Factor loadings for the factors concerning emotions, voiceless intervals (VIs), and repetitions.
| | Emotions | | | | Distraction by VIs | | Meaning of repetitions | |
| | Feeling upbeat | Emotional load | Feeling upbeat | Emotional load | | | | |
| MSA | For all items > .5 | | For all items > .5 | | All > .5 | All > .5 | All > .5 | All > .5 |
| VIs are nice | | | | | -.76 | -.73 | | |
| VIs are distracting | | | | | .76 | .73 | | |
| Rep. superfluous | | | | | | | -.56 | -.90 |
| Rep. feeling | | | | | | | .99 | .77 |
| Rep. meaningful | | | | | | | .27 | .37 |
| Initial eigenvalue | 2.70 | 1.14 | 2.72 | 1.45 | 1.16 | 1.54 | 1.68 | 1.90 |
| % of variance | 44.92 | 19.01 | 45.32 | 24.10 | 58.11 | 76.74 | 55.89 | 63.39 |
| Sums of squared loadings | 2.
Table 2. Factor loadings from the factor analyses of 15 text-related items. Factor names focus on similarities across studies, although there are substantial differences between two factors with the same name.
| MSA | For all items > .5 | | | | For all items > .5 | | | |
| Lyrics funny | .42 | -.
| Lyrics happy | .37 | -.
| Lyrics heavy | -.03 | .70 | -.04 | -.03 | -.
| Lyrics sad | -.15 | .75 | -.09 | -.12 | -.26 | .66 | -.18 | -.06 |
| Initial eigenvalue | 4.19 | 2.31 | 1.33 | 1.10 | 4.35 | 2.64 | 1.83 | 1.06 |
| % of variance | 27.93 | 15.42 | 8.87 | 7.34 | 29.02 | 17.57 | 12.20 | 7.07 |
| Rotation sums of squared loadings | 2.94 | 2.12 | 1.30 | 2.57 | 3.24 | 2.28 | 2.48 | 2.38 |

Note: Comfort = "Listening comfort", Heavy = "Heaviness", Clear = "Clearness", V-Quality = "Voice quality".
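As a reading aid, the "% of variance" row of Table 2 can be checked against the "Initial eigenvalue" row. The sketch below assumes that the initial eigenvalues are those of the correlation matrix of the 15 standardised items, so that each factor's share of variance is its eigenvalue divided by the number of items. The function name and the tolerance are illustrative choices, not part of the original analysis.

```python
# Values copied from the "Initial eigenvalue" and "% of variance" rows of Table 2.
N_ITEMS = 15  # the text states that 15 text-related items were analysed

eigenvalues = [4.19, 2.31, 1.33, 1.10, 4.35, 2.64, 1.83, 1.06]
reported_pct = [27.93, 15.42, 8.87, 7.34, 29.02, 17.57, 12.20, 7.07]


def percent_of_variance(eigenvalue, n_items=N_ITEMS):
    """Share of total variance for one factor, assuming standardised items."""
    return 100.0 * eigenvalue / n_items


for ev, reported in zip(eigenvalues, reported_pct):
    computed = percent_of_variance(ev)
    # Small gaps are expected because the eigenvalues are rounded to two decimals.
    print(f"eigenvalue {ev:.2f}: computed {computed:.2f}%, reported {reported:.2f}%")
```

Under this assumption, the recomputed percentages agree with the reported ones to within the rounding of the eigenvalues.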

Table 3. Mean factor scores (M) and standard deviations (SD) per condition per study, and F for the effect of condition in post hoc univariate tests in cross-classified mixed-model regressions. Factor names between quotation marks indicate that those factors represent only roughly the same phenomena in both studies.
| | Feeling upbeat | | Emotional load | | Distraction by VIs | | Meaningfulness of rep. | |
| A cappella L | -0.11 0.90 | -0.47 0.71 | 0.
| Complete L | 0.31 1.00 | 0.4 0.91 | 0.1 0.91 | 0.31 0.94 | -0.41 0.77 | -0.6 0.73 | 0.27 0.88 | 0.29 0.97 |
| A cappella M | -0.09 0.93 | -0.45 0.79 | 0.00 0.98 | -0.17 0.99 |
| Complete M | 0.48 1.02 | 0.92 0.91 | -0.02 0.91 | -0.21 0.83 |
| F condition | 81.92*** | 28.64*** | 8.47*** | 3.82** | 22.47*** | 23.90*** |
Table 3 (continued)
| | 'Listening comfort' | | 'Heaviness' | | 'Clearness' | | 'Voice quality' | |
| A cappella L | -0.03 1.00 | -0.34 1.00 | -0.12 1.00 | -
| Complete L | 0.28 0.94 | 0.74 0.76 | -0.53 1.03 | -0.31 1.00 | -0.04 0.86 | 0.09 1.18 | 0.37 0.95 | 0.33 1.08 |
| F condition | 14.47*** | 15.90*** | 32.74*** | 6.05** | 0.22 | 0.85 | 29.61*** | 2.02 |

Note: L = Lyrics, M = Music

The results for the other factors are less easy to interpret. Several variables were clearly not rated similarly across studies. Additionally, the exclusion of items evaluating the quality of the lyrics has weakened the model, as fewer factors now show significant effects. Yet the results support the assumption that the differences between the results of the two studies, observed in Schotanus (2020a), are due to different ratings.


A partial reanalysis of data from Schotanus (2020a) has supported various conclusions and observations from the original paper, indicating that singing a text instead of speaking it can improve processing in several ways, even after a single exposure. Admittedly, the design of the study does not allow for conclusions about the causal processes underlying these results, but the results are still in line with hypotheses concerning these processes. Clearly, further research into these sometimes bold hypotheses is required.

Other limitations are that some (but not all) of the benefits are only perceived benefits (see also the recall scores in Schotanus, Eekhof, and Willems, 2018), and that the lack of stimulus variation hampers generalization. Schotanus (2020b) provides additional research addressing some of these issues, for example, an analysis of the EEG measures taken within the laboratory study, showing the effect of out-of-key notes and voiceless intervals (see also Schotanus, Eekhof, and Willems, 2018). Furthermore, a second classroom experiment was conducted, in which accompanied songs performed by various artists turned out to support recall of the content of the songs (Schotanus, 2020b, pp. 305-324).


This article was copyedited by Tanushree Agrawal and layout edited by Diana Kayser.


  1. Correspondence address: Yke Schotanus,


  • Lee, C. S. (2020). Commentary on Schotanus: "Singing and accompaniment support the processing of song lyrics and change the lyrics' meaning." Empirical Musicology Review, 15(1-2), 56-60.
  • Schotanus, Y. P. (2015). The musical foregrounding hypothesis: How music influences the perception of sung language. In J. Ginsborg, A. Lamont, M. Philips, & S. Bramley (Eds.), Proceedings of the Ninth Triennial Conference of the European Society for the Cognitive Sciences of Music, 17-22 August 2015, Manchester, UK.
  • Schotanus, Y. P. (2017). Supplemental materials for publications concerning three experiments with four songs, hdl:10411/BZOEEA, DataverseNL Dataverse, V1.
  • Schotanus, Y. P. (2020a). Singing and accompaniment support the processing of song lyrics and change the lyrics' meaning. Empirical Musicology Review, 15(1-2), 18-55.
  • Schotanus, Y. P. (2020b). Singing as a figure of speech, music as punctuation: A study into music as a means to support the processing of song lyrics. Doctoral dissertation. Utrecht.
  • Schotanus, Y. P., Eekhof, L. S., & Willems, R. M. (2018). Behavioral and neurophysiological effects of singing and accompaniment on the perception and cognition of song. In R. Parncutt & S. Sattmann (Eds.), Proceedings of ICMPC15/ESCOM10 (pp. 389-394). Graz, Austria: Centre for Systematic Musicology, University of Graz.