Why are major and minor scales the way they are (with a specific ordering of tones and semitones) and not completely different? Why are major keys associated with positive emotional valence (happiness, contentment, serenity, grace, tenderness, elation, joy, victory, majesty…), and minor with negative emotional valence (sadness, anger, fear, tension, solemnity, lament, tragedy…)? For Meyer (1956), "the minor mode is not only associated with intense feeling in general but with the delineation of sadness, suffering and anguish in particular" (p. 227). Why? One would think that music psychologists would have answered these apparently simple questions by now. Evidently we have not, but things are moving in a promising direction. The contribution by Huron and Davis (2012) is a significant step towards a new explanation, and it also has interesting broader implications. In this extended commentary, I will present a new approach that builds upon their work.

Consider first the origin of the ordering of tones and semitones in major and minor scales. I addressed that question in Parncutt (2011a). My basic assumption was that any passage of music in a major or minor key may be considered a Schenkerian prolongation of its tonic triad. I will examine this idea in detail below. For the moment, allow me to quickly consider the relationship between tones of the tonic triad and scale degrees in major-minor tonality (MmT).

In Parncutt (2011a), I proposed that scale degrees in major and minor keys may be divided into three categories: the tones of the tonic triad, missing fundamentals of the tonic triad, and leading tones. In general, the missing fundamentals of a chord (and their salience) depend on voicing, but the main candidates for an octave-generalized chord type such as "major triad" can be derived by a simple octave-generalized calculation (Parncutt, 1988). The main missing fundamentals of a C-major triad are A, F, and D; for example, there is a missing fundamental at A because E corresponds to the 3rd harmonic of A, and G to the 7th. Similarly, the main missing fundamentals of a C-minor triad are F and Ab. Thus, the C-major scale comprises the tonic triad (C, E, G), its missing fundamentals (A, F, D), and the leading tone (B). The C-harmonic-minor scale comprises the tonic triad (C, Eb, G), its missing fundamentals (F, Ab), and the leading tone (B). According to this logic, the gap between C and Eb could be bridged by either D or Db. D is preferred for one or more of the following reasons: the analogy to the familiar major scale, the perfect 5th above the dominant G, and the avoidance of adjacent half-steps (Pressing, 1977). Incidentally, the 5th above the dominant plays an important role in Schenkerian theory: scale step 2 in the fundamental line (Urlinie) implies dominant harmony.
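The octave-generalized calculation can be illustrated with a minimal sketch. The text above does not state the details of the algorithm, so the root-support intervals and weights used here (unison 10, perfect fifth 5, major third 3, minor seventh 2, major second 1) are assumptions chosen for illustration, and the names `ROOT_SUPPORT`, `NAMES` and `salience_profile` are hypothetical. Each pitch class is treated as a candidate fundamental and receives the summed weights of the chord tones that lie at a root-support interval above it.

```python
# Illustrative octave-generalized salience calculation for a chord.
# ASSUMPTION: intervals (in semitones above a candidate fundamental) and
# their weights are illustrative values, not taken from the text above.
ROOT_SUPPORT = {0: 10, 7: 5, 4: 3, 10: 2, 2: 1}

NAMES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]


def salience_profile(chord):
    """Return 12 salience values, one per candidate pitch class."""
    profile = []
    for candidate in range(12):
        # Sum the weights of chord tones lying at a root-support
        # interval above this candidate fundamental.
        total = sum(ROOT_SUPPORT.get((tone - candidate) % 12, 0) for tone in chord)
        profile.append(total)
    return profile


c_major = salience_profile({0, 4, 7})  # C, E, G
c_minor = salience_profile({0, 3, 7})  # C, Eb, G

for label, profile in (("C major", c_major), ("C minor", c_minor)):
    print(label, {NAMES[pc]: s for pc, s in enumerate(profile) if s > 0})
```

Under these assumed weights, A and F emerge as the strongest non-chord candidates of the C-major triad (E lies a perfect fifth and G a minor seventh above A), and F and Ab as the strongest for C minor. D receives a smaller value that, in this simplified version, is shared with some chromatic pitch classes, so the sketch alone does not single it out.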

Thus, six out of seven scale tones of the major scale, and five out of seven tones of the harmonic minor scale, are octave-equivalent with virtual pitches evoked by the tonic triad. Leading tones may be treated as a special case, consistent with their instability and "sensitivity" (French: la note sensible), the existence of special terms to describe them, and their apparently unique historical and psychological origin. To understand the special status of leading tones, consider first the simplest and most common form of the diatonic scale: the scale represented by the white keys on the modern piano, which is based on a continuous section of the cycle of fifths (F, C, G, D, A, E, B). Among the intervals created by this collection of 7 pitch classes, there are 2 semitones: B-C and E-F. In Medieval music theory, the hexachord ut-re-mi-fa-sol-la can be mapped onto this diatonic scale in two different places: C-D-E-F-G-A or G-A-B-C-D-E. From this perspective, the two semitones in the diatonic scale both correspond to the interval mi-fa. Statistical analysis of representative Medieval (Gregorian) chants shows that the tone fa (C or F) consistently occurs more often than mi (B or E). The reason may be that the harmonic series above fa better fits the prevailing diatonic scale, so fa has more pitch commonality with its immediate context and sounds more consonant (Parncutt & Prem, 2008). Modern research on the frequency of occurrence of scale steps and their tonal stability in different tonal styles (Järvinen, 1995; Krumhansl, 1990; Krumhansl et al., 1999; Oram & Cuddy, 1995; Smith & Schmuckler, 2004), when applied retrospectively to Medieval music, suggests that fa was perceived as more stable than mi. Thus, early listeners may have learned to associate the lower tone of any semitone interval with instability and the higher tone with stability.
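The relationship between the cycle of fifths, the two diatonic semitones, and the two hexachord positions can be checked with a short sketch. Pitch classes are numbered 0 to 11 with C = 0; all variable and function names (`fifths`, `scale`, `semitones`, `hexachord`, `mi_fa`) are hypothetical and serve only this illustration.

```python
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Seven pitch classes from a continuous run of the cycle of fifths, starting on F.
fifths = [(5 + 7 * k) % 12 for k in range(7)]
scale = sorted(fifths)  # the white keys of the piano

# Semitone steps between cyclically adjacent scale tones.
neighbours = zip(scale, scale[1:] + scale[:1])
semitones = [(a, b) for a, b in neighbours if (b - a) % 12 == 1]

# Hexachord ut-re-mi-fa-sol-la: whole, whole, half, whole, whole step.
HEXACHORD_STEPS = [2, 2, 1, 2, 2]


def hexachord(start):
    """Return the six pitch classes of a hexachord built on `start`."""
    tones = [start]
    for step in HEXACHORD_STEPS:
        tones.append((tones[-1] + step) % 12)
    return tones


# mi-fa is the third and fourth tone of each hexachord (on C and on G).
mi_fa = [tuple(hexachord(start)[2:4]) for start in (0, 7)]

print([NAMES[pc] for pc in fifths])
print([(NAMES[a], NAMES[b]) for a, b in semitones])
print([(NAMES[a], NAMES[b]) for a, b in mi_fa])
```

Both hexachords stay inside the white-key collection, and their mi-fa steps coincide with the only two semitones of the scale.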

Admittedly, this is not the simplest conceivable explanation for the origin of major and minor scales. But each aspect of the explanation is supported by independent psychological, historical and music-theoretic evidence or arguments. I know of no other explanation that is plausible from all three viewpoints. In the following, I will combine this approach with Huron's findings to construct a new explanation for the emotional connotations of major and minor keys that brings together ideas and approaches from both humanities and sciences.


Huron and Davis (2012) observed that the pitch range of speech is larger in happy than in sad speech, and asked whether the same is true for music. A comparison of major and minor melodies shows that the answer is yes, but the difference is rather small. This leads to two further questions. Do composers or improvisers choose smaller intervals for music in the minor mode because (i) that music is normally sad and smaller intervals better convey sadness, or (ii) smaller intervals are a consequence of the structure of the minor scale? Huron and Davis answered "yes" to both, but focused on (ii).

To address (ii) systematically, Huron and Davis first made the question more specific by looking at what happens to the average successive interval size in major melodies when selected scale degrees are shifted up or down by a semitone. If the answer to (ii) is yes, the average successive interval size should fall if we take a melody in a major key and lower scale steps 3 and 6 to create a minor-key melody.

Huron and Davis began their argument with a simple observation. If every scale degree and all transitions between scale degrees happened equally often, shifting selected scale degrees would have no effect on mean interval size. In fact, there are large variations in the prevalence (frequency of occurrence) of both scale degrees and scale-degree transitions. For that reason, any shift in scale degrees will lead to a change in average successive interval size. Regarding the prevalence of scale steps, Krumhansl and Kessler (1982) explained their key profiles (hereafter "K-K profiles") on that basis, which led to a surge of research interest in prevalence profiles of scale degrees in MmT.

The prevalence of scale-step transitions in MmT-melodies depends on at least three separate factors:

  1. Smaller intervals happen more often than larger intervals, consistent with the gestalt principle of proximity. Vos and Troost (1989) and Huron (2006) considered two exceptions. First, the minor second generally happens less often than the major second; possible explanations involve categorical perception and intonational limitations. Second, consonant intervals such as fifths tend to happen more often than adjacent dissonant intervals such as tritones.
  2. Melodic leaps more often rise than fall, and steps more often fall than rise. Or put another way: rising intervals are more often leaps and falling intervals are more often steps. This was demonstrated empirically by Vos and Troost (1989), and confirmed by Huron (2006). Meyer (1973) regarded a melodic leap as a structural gap; it implies stepwise motion in the opposite direction to fill the gap. Schenker explained the stepwise filling-in of intervals between harmonic tones with the term Zug (linear progression), which is a form of prolongation (Forte & Gilbert, 1982).
  3. Most interesting for the present study, transition probabilities depend on scale degrees relative to the tonic. As Huron and Davis (2012) showed in their Figure 2, which is the same as Figure 9.7 in Huron (2006), some scale-degree transitions are much more common than others. The most common transitions in major-key melodies are the falling steps from 5 to 4, 4 to 3, 3 to 2 and 2 to 1—consistent with points 1 and 2 above. The most likely scale degrees to be followed by a rest (which usually indicates the end of a phrase, and hence closure) are 1, 3 and 5—the tones of the tonic triad. Surprisingly uncommon is the transition from 6 to 7 in either direction. Scale degree 6 most often moves down a step to 5, and 7 most often moves up a step to 1. For melodies whose range is smaller than an octave, these aspects taken together imply that the lowest tone of a melody is often 1 or 7, while the highest is often 6 or 5. In the following, I will refer to melodies that are largely confined to this range as "Huron's stereotype."
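Transition prevalence of the kind described in point 3 is, at bottom, a count of adjacent pairs. The following minimal sketch tallies scale-degree transitions for a single tune; it is not the procedure of Huron (2006), whose corpus and encoding are not described here. The melody is a hand encoding of the opening of "Frère Jacques" in major-key scale degrees, and the names `REST`, `transitions`, `probabilities_from` and `before_rest` are hypothetical.

```python
from collections import Counter

REST = "rest"

# Opening of "Frère Jacques" as scale degrees (illustrative hand encoding).
melody = [1, 2, 3, 1, 1, 2, 3, 1, 3, 4, 5, REST, 3, 4, 5, REST]

# Count every pair of successive events.
transitions = Counter(zip(melody, melody[1:]))


def probabilities_from(degree):
    """Relative frequency of each continuation after `degree`."""
    following = {b: n for (a, b), n in transitions.items() if a == degree}
    total = sum(following.values())
    return {b: n / total for b, n in following.items()}


# Scale degrees that are followed by a rest (phrase endings).
before_rest = {a for (a, b) in transitions if b == REST}

print(transitions.most_common(3))
print(probabilities_from(1))
print(before_rest)
```

A single short tune says nothing about general prevalence, but the same counting applied to a large corpus yields a transition table of the kind discussed above.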


Before continuing, allow me to drive this point home by considering some well-known examples of Huron's stereotype: "Oh Susanna," "Twinkle twinkle little star," "Frère Jacques," "Mary had a little lamb," and the national anthems of Germany ("Einigkeit und Recht und Freiheit…," based on the 2nd movement of Haydn's Kaiserquartett) and Great Britain ("God Save the Queen"). (Readers are asked to imagine/audiate these melodies and confirm for themselves that their opening themes correspond to the stereotype before proceeding.) But the stereotype is not limited to so-called trivial music, nor is it limited to national anthems. As we shall see, it happens in most or perhaps all styles of MmT.

Consider Western art music. If we wanted to choose a representative composer, that might be Mozart, and if we wanted to choose a representative repertoire we might choose his 18 piano sonatas. If we look at the melodic line in the first two measures of each sonata, we find that in 10 of 18 cases (KV 279, 280, 281, 283, 284, 310, 331, 332, 333, 545) the melody conforms to Huron's stereotype: the range is less than an octave, the lowest tone is scale degree 7 or 1 and the highest is 5 or 6. In the other 8 cases, the range is expanded to include a lower 5 (KV 282, 330, 533) or an upper 1 (KV 311), or the theme is a triadic flourish—an arpeggiation of the tonic triad (KV 309, 457, 570, 576).
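The criterion applied to the sonata openings can be written as a simple test. In this sketch a melody is a list of semitone offsets from the tonic, so scale degree 7 just below the tonic is -1, degree 5 is 7, and degree 6 is 9 in major or 8 in minor. The function name `conforms_to_stereotype` and the example encodings are hypothetical, and non-diatonic tones are ignored.

```python
def conforms_to_stereotype(semitones_from_tonic):
    """Check range < octave, lowest tone 7 or 1, highest tone 5 or 6."""
    lowest = min(semitones_from_tonic)
    highest = max(semitones_from_tonic)
    narrow = highest - lowest < 12
    low_ok = lowest in (-1, 0)        # degree 7 below the tonic, or degree 1
    high_ok = highest in (7, 8, 9)    # degree 5, minor 6, or major 6
    return narrow and low_ok and high_ok


# "Twinkle twinkle little star": 1 1 5 5 6 6 5 4 4 3 3 2 2 1
twinkle = [0, 0, 7, 7, 9, 9, 7, 5, 5, 4, 4, 2, 2, 0]
# Hypothetical opening that dips to the lower scale degree 5
with_lower_fifth = [-5, 0, 4, 7]
# Hypothetical triadic flourish that reaches the upper tonic
flourish = [0, 4, 7, 12]

print(conforms_to_stereotype(twinkle))
print(conforms_to_stereotype(with_lower_fifth))
print(conforms_to_stereotype(flourish))
```

The two counterexamples correspond to the typical deviations noted above: an extension down to the lower 5, and an arpeggiation up to the upper 1.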

Just in case that is a coincidence, let's consider another representative Western composer, J. S. Bach, and one of his best-known works, the Well-Tempered Clavier. There are 24 fugues in Book 1, and each has a clearly defined theme. Of these, 10 conform to the stereotype (excluding the modulating parts): those in C, D, d, Eb, d#, F, f#, g, Ab and g# (upper case for major keys, lower case for minor). Like the Mozart example, other themes deviate from the stereotype, but in ways that are themselves stereotypical.

I also did a quick, non-systematic search for Huron's stereotype in a list of pop, rock and traditional songs that I happen to know (and presumably most English-speaking middle-aged people know). In the following alphabetical list, the best-known (or first few) melodic phrases of the song (regardless of whether verse or chorus) conform to Huron's stereotype. Some of the melodies also have non-diatonic tones, but they also lie within the stereotypical range:

All you need is love, Blowin' in the wind, Blue moon, Bye bye blackbird, Bye bye love, Climb every mountain, Crocodile rock, Danny boy, Don't worry be happy, Fernando, Fire and rain, Great pretender, I don't know how to love him, If I had a hammer, Michael row the boat ashore, Mister Bo Jangles, Mrs. Robinson, Sound of music (The hills are alive…), Stand by me, Sweet baby James, The lion sleeps tonight, The rose, The times they are a-changin', With a little help from my friends.

When making this list, I left out many more songs than I included. But many of those left-out songs had a melody that went down further to scale step 5 or further up to 1—just as in the Mozart piano sonata examples. I did not consider jazz standards, because their melodic range tends to exceed one octave. Note also that I have left out melodies in minor keys, for the same reason that Huron and Davis left them out of their study: they often modulate quickly. All in all, the results of this very preliminary and subjective investigation are indicative of the psychological and musical reality of Huron's stereotype.


Huron and Davis (2012) asked: If a given scale degree in a major melody is altered by shifting it up or down a semitone, how will that affect the average size of intervals between successive tones in the melody? The question is important because, as Huron previously demonstrated, sad melodies, like sad speech, tend to have smaller intervals between successive tones (or phonemes).

To answer this question, we first need to consider which scale degrees can be shifted. If we limit our investigation to shifts of one semitone, we must avoid scale steps that are a semitone apart: new scale degrees are not created if scale degree 7 is shifted up, 1 is shifted down, 3 is shifted up or 4 is shifted down. Huron and Davis also decided not to explore the effect of shifting the tonic (1) up a semitone, which I think would have been interesting—although it would not have affected their general conclusions.
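The set of admissible one-semitone shifts follows mechanically from the major scale. A minimal sketch, with hypothetical names `MAJOR`, `admissible` and `excluded`: a shift is excluded when the shifted tone lands on a pitch class that already belongs to the scale.

```python
# Major-scale degrees as semitones above the tonic.
MAJOR = {1: 0, 2: 2, 3: 4, 4: 5, 5: 7, 6: 9, 7: 11}
scale_pcs = set(MAJOR.values())

admissible, excluded = [], []
for degree, pc in MAJOR.items():
    for shift in (-1, +1):
        # A shift that lands on an existing scale tone creates no new degree.
        if (pc + shift) % 12 in scale_pcs:
            excluded.append((degree, shift))
        else:
            admissible.append((degree, shift))

print("excluded:", excluded)
print("admissible:", admissible)
```

The excluded shifts are exactly the four named above (7 up, 1 down, 3 up, 4 down). The list of admissible shifts still contains the raised tonic, which Huron and Davis chose not to explore.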

Huron and Davis systematically explored all remaining possibilities, applying all remaining scale-degree shifts to a large database of melodies in major keys. The result: the best way to reduce the average size of successive intervals is to lower scale degrees 3 and/or 6 by a semitone. In other words, the harmonic minor scale minimizes the average successive interval size in typical melodies. Is that the reason why the harmonic minor is so popular?

Huron and Davis examined a large number of real melodies, but it is also possible to predict their result without doing any analysis. Consider again the typical range of a major-key melody: scale degree 1 (or 7 just below it) is often the lowest tone and scale degree 6 (or 5) is the highest. In this case, it is clear that lowering 7 will increase the average successive interval size, because 7 is the lowest tone. Lowering 2 will increase the size of intervals between 2 and 3, so that is unlikely to lead to an overall reduction in interval size. Raising 4 increases the size of intervals between 4 and each of 1, 2 and 3, and since these are rather common transitions, that change is also likely to increase the average. The only two possibilities that clearly reduce the mean successive interval size are lowering 6 and lowering 3. Lowering 3 has this effect due to the relatively high rate of transitions among scale degrees 1, 2 and 3 by comparison to 3, 4 and 5, but the effect is rather small.
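The effect of a shift on mean interval size can be tried out on a single tune. This is a toy illustration only, not the corpus analysis of Huron and Davis: the melody is a hand encoding of the opening of "Twinkle twinkle little star" in scale degrees, all tones are assumed to lie within one octave above the tonic, and the names `MAJOR` and `mean_interval` are hypothetical.

```python
# Major-scale degrees as semitones above the tonic.
MAJOR = {1: 0, 2: 2, 3: 4, 4: 5, 5: 7, 6: 9, 7: 11}

# "Twinkle twinkle little star", first two phrases, as scale degrees.
melody = [1, 1, 5, 5, 6, 6, 5, 4, 4, 3, 3, 2, 2, 1]


def mean_interval(degrees, shifts=None):
    """Mean absolute successive interval in semitones.

    `shifts` maps a scale degree to a semitone alteration, e.g. {6: -1}.
    """
    shifts = shifts or {}
    pitches = [MAJOR[d] + shifts.get(d, 0) for d in degrees]
    steps = [abs(b - a) for a, b in zip(pitches, pitches[1:])]
    return sum(steps) / len(steps)


baseline = mean_interval(melody)
lower_6 = mean_interval(melody, {6: -1})
lower_3 = mean_interval(melody, {3: -1})
lower_3_and_6 = mean_interval(melody, {3: -1, 6: -1})
raise_4 = mean_interval(melody, {4: +1})
lower_2 = mean_interval(melody, {2: -1})

for label, value in [("major", baseline), ("lower 6", lower_6),
                     ("lower 3", lower_3), ("lower 3 and 6", lower_3_and_6),
                     ("raise 4", raise_4), ("lower 2", lower_2)]:
    print(f"{label:14s} {value:.3f}")
```

In this tune, lowering 6 reduces the mean because 6 is the melodic peak, approached and left from below. The other shifts leave the mean unchanged, because each altered tone is approached from one side and left on the other, so one interval grows as another shrinks. A single tune cannot stand in for a corpus, where the balance depends on the transition frequencies discussed above.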

The results of Huron and Davis are consistent with the following scenario. Historically, melodies in major and minor keys developed in parallel. Both were essentially diatonic (like transpositions of the white keys on the piano) but with different tonal centers. For the moment, we have no explanation for the specific choice of these two tonal centers. The relative prevalence of major and minor keys varies across styles and periods (e.g., there is relatively more minor in music by Bach, and relatively more major in Mozart), but a systematic analysis of large corpora in different styles and periods would presumably reveal a consistent dominance of major over minor—consistent with the general observation that people tend to prefer happy over sad sounding music (Thompson, Schellenberg, & Husain, 2001; Hunter & Schellenberg, 2010). Because major was more common, minor was perceived as a variant of it, rather than the reverse: minor became "the Other" of the major-minor system. In minor melodies, two or three scale degrees were typically lower than the equivalent scale degrees in major, which made the melodies sound sad—just as the average fundamental frequency of sad speech is lower than expected (Huron, 2008). The idea of the minor as "the Other" is not new: in the words of Meyer (1956), "States of calm contentment and gentle joy are taken to be the normal human emotional states and are hence associated with the more normative musical progressions, i.e., the diatonic melodies of the major mode and the regular progressions of major harmony. Anguish, misery, and other extreme states of affectivity are deviants and become associated with the more forceful departures of chromaticism and its modal representative, i.e., the minor mode" (p. 227).

According to this account, minor is sad compared to major simply because selected scale steps are lower than expected. This idea is direct and parsimonious, and therefore particularly convincing. But it is not the whole story, because the major-happy/minor-sad association appears to be confined to MmT; the association may be absent in historical or non-Western traditions that are not based on triadic harmony (see the historical section below).

To solve this problem, I offer an alternative hypothesis: The ultimate origin and foundation of the sad feeling of music in minor keys is the prolonged minor triad in the background. The prolonged minor triad sounds sad by comparison to the prolonged major triad in the background of music in major keys because one of its tones (the third) is lower than the corresponding tone in the major triad. That, in turn, is because major is perceived as a standard from which minor deviates, which is because music in major keys is more common. That, again in turn, is because the major triad is more harmonic (i.e., more similar to the harmonic series) and in this sense more consonant than the minor triad.

There is convergent evidence for each point in this argument, and each point can be generalized to other situations. Regarding prolongation, not only triads can act as tonic (or referential) sonorities when they are prolonged, but also open-fifth sonorities (e.g., in Medieval polyphony) and major-minor seventh chords (in blues and bebop jazz; cf. Salzer, 1952/1962). Huron has argued that the principle of communicating sadness by lowering an expected pitch has considerable generality both in music and speech. That more common percepts are perceived as standards from which less common percepts deviate is the basis of the theory of stereotype-based category perception (Taylor, Fiske, Etcoff, & Ruderman, 1978) and is related to the availability heuristic: more common things tend to come more easily to one's mind (Tversky & Kahneman, 1973). Finally, harmonicity is an important general foundation of Western consonance (McDermott, Lehr, & Oxenham, 2010; Parncutt & Hair, 2011; Terhardt, 1974).


Regarding his Figure 9.7, Huron (2006) mentioned that

One of the most striking features is the sequence of descending arrows from 5 to 4 to 3 to 2 to 1. For Schenkerian theorists, this is strikingly reminiscent of the five-line Urlinie—although it should be emphasized that these transitions are note-to-note, rather than the transitions between structural tones (p. 160).

That is an interesting and promising observation, and unless I have missed something (for which I apologize), Huron (2006) and Huron and Davis (2012) did not follow it up. Instead, they considered the consequences of this pattern for the use of chromatically altered scale degrees.

Huron's stereotype is consistent with the idea that any passage of music in a major or minor key is a prolongation or embellishment of its tonic triad—or can be perceived as such. In making this claim, I am indebted to Schenker for the idea of prolonging the tonic triad; but I have also adapted his idea for my purpose, which is more music-psychological than music-theoretical. In his analyses, Schenker considered different kinds of prolongation on different hierarchical levels (e.g., any triad on any scale degree can be prolonged, either contrapuntally or harmonically; Salzer, 1952/1962), and focused on the analysis of German masterworks. By comparison to Schenker, I apply less theory (tonic prolongation is just one of many Schenkerian ideas) to more music (I claim that my idea applies to all of MmT).

Forte and Gilbert (1982) give several examples of chord progressions in which one chord (e.g., the tonic triad) is more tonally stable than another (e.g., a subdominant). The less stable chord may be considered a prolongation of the more stable. These examples illustrate a central feature of prolongation, both melodic and harmonic: It generally involves stepwise motion from more stable to less stable and back.

If you add passing and neighbor tones to the tones of a tonic triad, you essentially get Huron's stereotype. The rarity of the 6-7 transition is easily explained by the idea that 6 is a neighbor of 5, so progresses naturally to 5, while 7 is a neighbor of 1, so it progresses naturally to 1. If that is the case, Huron's stereotype is evidence that the tonic triad exists somehow in the background throughout a tonal melody — just as the K-K profiles are understood to exist in the psychological background of any passage of MmT, which would explain the high correlation between the K-K profiles and prevalence profiles of scale steps (Krumhansl, 1990). Different terms can be used to describe this background: a physicist might call it a frame of reference, and a psychologist might call it a schema, gestalt, or cognitive representation.

Schenker introduced the idea of Ursatz (fundamental structure) to describe this psychological background structure. His idea was later taken up in different ways by (mainly American) music theorists and provoked a quantum leap forward in music-theoretic thinking. But the idea is also famously problematic:

  1. For most listeners, the Ursatz has no psychological reality across long time-spans such as an entire movement of a classical or romantic symphony that lasts for several minutes (Cook, 1987). It may be possible for listeners with good absolute pitch or music theorists with a good understanding of the score to conceptualize the tonic triad or Ursatz throughout a piece and hear everything relative to it (which would be an example of structural hearing or Fernhören; Salzer, 1952/1962), but that is an unusual form of music perception. Even if we ignore this psychological problem, there is a music-analytical problem associated with longer timespans: the Ursatz may work well for achieving theoretical understanding of shorter passages (e.g., the Brahms/Haydn Variations theme) but the situation is less clear (more ambiguous) for longer or more complex pieces, because the details of prolongation may be implemented differently at larger levels than at smaller ones (Graham Hair, personal communication).
  2. The structural details of the Ursatz are too specific. It is not the only possible background structure for a piece in MmT. A stepwise soprano descent (the fundamental line or Urlinie) and rising and falling fifth intervals in the bass (bass arpeggiation or Bassbrechung) are not the only forms of prolongation of a tonic triad. Even if we focus on German masterworks, as Schenker did, forcing pieces into the Ursatz mold is not necessarily the best way to achieve music-theoretical insights—let alone explain perception.

To solve this problem, we need a representation of the musical background that is more general and fuzzier. The representation should have the following basic properties. First, it should encapsulate the principle of moving from consonance through dissonance and back to consonance, which I am taking to be axiomatic for most Western music. Second, it should be consistent with the fact that almost all MmT-music starts and ends with a major or minor triad (either real or implied), and in most cases the two triads are the same. Third, it should accommodate the observation that much music in major or minor keys can be regarded as an elaboration (prolongation) of key-defining progressions (Caplin, 1998: cadences) such as I-II-V-I, I-III-V-I, I-IV-V-I, and I-VI-V-I (Salzer, 1952/1962).

A solution to this problem was offered by Salzer (1952/1962) in his example 481 (p. 263 in the Dover edition). This diagram reduces the structure of a whole piece to its tonic triad, which Salzer labels "chord (tonality-indicating)," and beyond that to the root of that triad, which Salzer labels "tone." Starting from the "chord," he separates "primordial harmonic prolongation" from "primordial contrapuntal prolongation"; both lead to "structure = tonality-determining harmonic and melodic framework."

It is interesting that here and elsewhere Salzer avoids any reference to Schenker's Ursatz. Forte (1959, p. 9) similarly claimed that "Schenker's major concept is not that of the Ursatz, as it is sometimes maintained, but that of structural levels, a far more inclusive idea" (cited in Wikipedia "Schenkerian analysis"). Boenke (2005) explained that when theorists such as Salzer applied Schenker's theory to earlier and later music, central aspects such as the Ursatz had to be weakened or abandoned:

Proposals to extend the scope of Schenker's theory by modifying it remained a defining, though controversially debated, theme in the engagement with Schenker. To the extent that parts of his theory (for example, the concept of hierarchically related levels, or the idea of the "composing-out" [Auskomponierung] of sonorities) were applied to works outside the period that Schenker considered, other central aspects, in particular the theory of the Ursatz, had to be weakened or abandoned altogether. The wider the time window was opened, the more strongly individual ideas of Schenker's could demonstrate general validity across epochs. The downside, however, was that the theory was hollowed out at its foundations. (translated from the German)

Returning to Salzer's diagram: in his accompanying text, Salzer explains that it

represents an attempt to demonstrate graphically the "distance" and at the same time the inner connection between the most remote, quasi-abstract, musical factors (such as a tone and its resulting chord) and the finished product of composition … just as a prolongation of lower order, to be understood, must always be referred to the one of next higher order (which is its structure), so also can the structural framework be referred further back to the tonality-indicating fundamental chord of which it logically is a harmonic or contrapuntal prolongation (p. 231).

In this passage, Salzer reinforces and reformulates Schenker's idea that a passage of music can be reduced to its tonic triad, and hence regarded as a prolongation of its tonic triad—consistent with the idea that, from a harmonic viewpoint, the Ursatz is a prolongation of the tonic triad (whereas from a melodic viewpoint it is like the gradual fall in pitch at the end of a phrase of speech or music; Huron & Davis, 2012). Similarly, I argued in Parncutt (2011a) that the K-K profiles are merely a cognitive representation of the prolonged tonic triad. The evidence for this statement is both qualitative and quantitative. The qualitative evidence can be found in Schenkerian theory, which explains the process of prolongation and the hierarchy of structural relationships that exist between a tonic triad and the details of the musical surface. The quantitative evidence is the correlation between the K-K profiles and the pitch-salience profiles of major and minor triads according to Parncutt (1988)—a simple algorithm that was inspired by two other algorithms, Terhardt (1982) and Terhardt, Stoll, & Seewann (1982), and whose degree of complexity lies midway between them. The correlation is equally strong when a more complex algorithm is used such as defined in Terhardt et al. (1982), Parncutt (1989) or Parncutt (1993).

The quantitative evidence is based on the idea of a chord's pitch-salience profile—an experiential representation of the chord, in which pitch corresponds not to pitch as notated in a musical score or to frequency as physically measured, but to the experience of a tone with a given pitch. The strength or salience of this experience depends in general on the number of audible partials corresponding to harmonics of that pitch, and their perceptual salience. When this simple idea, which is consistent with many psychoacoustic studies of pitch perception, is applied systematically to a C-major triad, we can predict firstly that the pitch class C is on average more salient than E or G, because E and G, and their harmonic overtones, are more often harmonics of C than vice-versa. We may then predict that a C-major triad has missing fundamentals at pitch classes D, F and A; their salience is lower than that of the tones of the tonic triad, and on average A is the most salient and D is the least salient of the three. Similar arguments apply for the minor triad, which can be treated in exactly the same way (and not differently, as in the 19th-century theory of harmonic dualism; see Ortmann, 1924).
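The comparison between a triad's pitch-salience profile and the corresponding key profile can be sketched as follows. Two things are assumptions, since the text above states neither: the root-support weights (10, 5, 3, 2, 1 for unison, perfect fifth, major third, minor seventh and major second), and the key-profile ratings, which are entered here as they are commonly reproduced from Krumhansl and Kessler (1982). The names `ROOT_SUPPORT`, `KK_MAJOR`, `KK_MINOR`, `salience_profile` and `pearson` are hypothetical.

```python
from math import sqrt

# ASSUMPTION: illustrative root-support weights (semitones -> weight).
ROOT_SUPPORT = {0: 10, 7: 5, 4: 3, 10: 2, 2: 1}

# ASSUMPTION: key-profile ratings as commonly reproduced in the literature,
# indexed by semitones above the tonic (0 = tonic).
KK_MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
KK_MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]


def salience_profile(chord):
    """Summed root-support weight for each of the 12 candidate pitch classes."""
    return [sum(ROOT_SUPPORT.get((tone - candidate) % 12, 0) for tone in chord)
            for candidate in range(12)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


r_major = pearson(salience_profile({0, 4, 7}), KK_MAJOR)
r_minor = pearson(salience_profile({0, 3, 7}), KK_MINOR)

print(f"major triad vs. major key profile: r = {r_major:.2f}")
print(f"minor triad vs. minor key profile: r = {r_minor:.2f}")
```

Under these assumptions both correlations come out above 0.9. The major and minor triads pass through the same function with the same weights, in keeping with the point that the minor triad needs no separate treatment.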

This link between the Ursatz, the key profiles, and pitch-salience profiles of tonic triads, if valid, has the potential to become the foundation for a new general understanding of MmT. Consistent with Adler's (1885) concept of systematic musicology (Parncutt, 2007), I am thinking of a new paradigm that brings together and synergizes ideas from the humanities and sciences—ideas that originally emerged independently in strikingly different intellectual traditions and contexts. Salzer (1952/1962) anticipated this development when he wrote

I firmly believe that there is a need for a theory of music and composition which never loses contact in all its branches and disciplines with what seems to me to be its principal goal and justification: leading the ear and mind to understand all details as organic offshoots of the whole, which means the perception of total musical organization. (p. 283; italics RP)

A music psychologist may disagree with the extent to which "total musical organization" can be perceived, or can exist in a listener's imagination—but at the same time welcome the opportunity to collaborate with music theorists to achieve deeper insights into these issues.


The key words in Schenker's approach are prolongation and (compositional) unfolding (Ausfaltung, Auswicklung, Auskomponierung). Forte and Gilbert (1982) explained:

Prolongation refers to the ways in which a musical component—a note (melodic prolongation) or a chord (harmonic prolongation)—remains in effect without being literally represented at every moment. Of the two main categories of prolongation, melodic and harmonic, the latter is easier to grasp. Essentially, a given harmony is prolonged so long as we feel it to be in control over a particular passage. (p. 142)

What exactly does it mean for a note or chord to "control" a passage, or to "remain in effect"? In an empirical study, Deutsch (1972) showed that short-term memory for the pitch of a tone is affected most by a following distractor tone whose frequency is about ⅔ of a whole tone higher or lower; the effect disappears at an interval of about one whole tone. This is evidence that a pitch, and stepwise departures from that pitch, can "remain in effect" for at least several seconds. The limit of about one whole tone is broadly consistent with the gestalt principle of (pitch) proximity, van Noorden's (1975) theory of temporal coherence, and Bregman's (1990) theory of auditory scene analysis. Psychological experiments have also provided evidence for the existence of a prolonged tonic triad in the background of MmT-music. Several studies reported by Krumhansl (1990) are consistent with that assumption. In a priming paradigm, for example, Bigand, Tillmann, Poulin-Charronnat, and Manderlier (2005) found that response times in consonant/dissonant judgments are shorter for tonic than for nontonic targets.

Larson (1997) introduced psychological ways of thinking into the music-theoretic discourse surrounding this issue:

To auralize means to hear internally sounds that are not physically present. A trace is the internal representation of a note that is still melodically active (p. 104).
In a melodic step, the second note tends to displace the trace of the first, leaving one trace in musical memory; in a melodic leap, the second note tends to support the trace of the first, leaving two traces in musical memory (p. 105).

"Traces" can exist, and "displacement" can happen, in either the background or the foreground (using these terms relatively—not in the strict sense of Schenker and his followers). Melodic tones that are adjacent to tonic triad tones (neighboring or passing tones) can prolong the triad by displacing them in the foreground but allowing them to continue as psychological references in the background. When Larson says that "a second note tends to displace the trace of the first," he is referring to the musical surface or foreground (Deutsch's experiment was also confined to the musical foreground). If the first of a pair of melodic tones is part of a prolonged sonority, that sonority may continue to exist in the background, but disappear (be "displaced") in the foreground.
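Larson's rule for the musical foreground can be illustrated with a minimal sketch. It assumes that pitches are given as MIDI note numbers and that a "step" is an interval of one or two semitones; the function name `traces_after` and the `step_limit` parameter are hypothetical, and the sketch deliberately ignores the background level discussed above.

```python
def traces_after(melody, step_limit=2):
    """Return the foreground traces left after hearing `melody`.

    Assumptions (not from Larson): pitches are MIDI note numbers, and a
    "step" is any interval of 1..step_limit semitones; larger intervals
    count as leaps.
    """
    traces = []
    previous = None
    for note in melody:
        if previous is not None and 0 < abs(note - previous) <= step_limit:
            # Step: the new note displaces the trace of the previous note.
            traces.remove(previous)
        # Leap (or first note): the earlier traces are kept.
        if note not in traces:
            traces.append(note)
        previous = note
    return traces


# Arpeggiated C-major triad (C4, E4, G4): leaps only, so three traces remain.
arpeggio = traces_after([60, 64, 67])
# Upper-neighbor figure G4, A4, G4: each step displaces the previous trace.
neighbor = traces_after([67, 69, 67])
```

In this toy model an arpeggio accumulates the tones of the triad, whereas stepwise motion leaves only the most recent tone in the foreground.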

From a psychological viewpoint, the musical foreground is enabled by a shorter-term memory with a duration or half-life of perhaps one second (cf. Huron & Parncutt, 1993); the background, by a longer-term memory with a duration of roughly one minute (cf. Cook, 1987). The shorter-term memory may be considered either passive echoic memory or a working memory buffer in which active, dynamic cognitive processing occurs—similar to the visuo-spatial sketch pad and phonological loop of Baddeley and Hitch (1974), but specialized for (musical) pitch (cf. Deutsch, 1970, 1975). In such cognitive theorizing, there is a danger of reifying memory as purpose-built storage, but it can also be considered a byproduct of information processing, and its effective duration as a byproduct of the kind of stimulus or processing: "a proceduralist orientation deals with this variety as reflecting the variety of kinds of auditory information processing, with memory as a side effect of this processing in all cases" (Crowder, 1993, p. 140). The memory functions associated with the Schenkerian foreground and background may correspond in general ways to other kinds of auditory memory (e.g., for speech) but also have their own unique properties.

Larson (1997) continued:

Thus, prolongation—and only prolongation—always determines which notes are heard as stable in a given context. … To hear a note as unstable also means to hear it as embellishing a more stable pitch—that is, to hear it as embellishing a pitch at a more remote level of pitch structure (p. 112).
I have argued that prolongation is embellishment; embellishment (and only embellishment) determines the relationships between tones that make some tones of lesser and greater structural weight than others (p. 130).

The word "prolongation" emphasizes the temporal aspect (the way a sonority can continue in the background as a psychological reference although it is not physically sounding), whereas the word "embellishment" emphasizes movement (usually stepwise) in pitch-time space. Since embellishment generally has the effect of prolonging a sound or pattern, the terms are closely related.

The melodic embellishment of a tonic triad can be divided into three processes:

  1. Arpeggiation: When the tones of a triad are presented successively, we still recognize the triad. Simultaneous and successive versions have the same tonal meaning.
  2. Passing notes: Within a 1-3-5 triad, we can pass 2 on the way from 1 to 3, or from 3 to 1. Similarly, we can pass 4 on the way from 3 to 5 or from 5 to 3.
  3. Neighbor notes: 2 can also be considered a neighbor of 1 or 3, and 4 as a neighbor of 3 and 5. But the 7 just below 1 is also a neighbor, as is the 6 just above 5.

Following this logic, any melody based on the scale steps 7, 1, 2, 3, 4, 5, and 6 can be considered a prolongation of the triad 1 3 5, provided the listener is somehow imagining this triad as a background or goal of melodic motion, and typical embellishment figures connect the background with the foreground. The concept of triadic prolongation can explain diverse common melodic progressions and hence the basic structure of MmT.
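The three processes can be summarized in a short sketch that labels each scale degree by its relation to the tonic triad. The function name `classify_degree` and its labels are hypothetical, and scale degrees are treated as octave-generalized, so that 7 counts as the step below 1.

```python
TRIAD = {1, 3, 5}  # scale degrees of the tonic triad


def classify_degree(degree):
    """Label a scale degree (1-7) relative to the tonic triad 1-3-5.

    Assumption: degrees are octave-generalized, so the step below 1 is 7
    and the step above 7 is 1.
    """
    if degree in TRIAD:
        return "triad tone"
    below = (degree - 2) % 7 + 1  # diatonic step below, wrapping 1 -> 7
    above = degree % 7 + 1        # diatonic step above, wrapping 7 -> 1
    if below in TRIAD and above in TRIAD:
        # Lies between two triad tones: can pass between them or
        # embellish either one.
        return "passing or neighbor note"
    if below in TRIAD or above in TRIAD:
        return "neighbor note"
    return "unrelated"


labels = {d: classify_degree(d) for d in range(1, 8)}
```

Under these assumptions, degrees 2 and 4 come out as passing or neighbor notes, degrees 6 and 7 as neighbor notes, and no degree is left unrelated to the triad.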

Schenker first hinted at these ideas in his harmony text (1906) as explained in the introduction to the English translation by Jonas (1954, p. ix):

According to the theory of prolongation, free composition, too, is subject to the laws of strict composition, albeit in "prolonged form." The theory of Auskomponierung shows voice-leading as the means by which the chord, as a harmonic concept, is made to unfold and extend in time. This, indeed, is the essence of music. Auskomponierung thus insures the unity and continuity of the musical work of art.

Schenker later expressed his idea more clearly (1922, p. 4; cited in Wikipedia under "Fundamental structure"):

The fundamental line presents the unfolding (Auswicklung) of a basic sonority, expressing tonality in the horizontal plane. The tonal system too, joins in expression of tonality. Its task is to bring a purposeful organization into the world of chords by selecting the scale degrees from among them. The liaison between the horizontal version of tonality through the fundamental line and the vertical through the scale degrees is voice leading.

The idea of MmT as a prolongation of the tonic triad can also be found in Schenker (1935).

Schenker used these ideas to analyze the "great music" of Bach, Beethoven and Brahms. But the last quote, taken out of context, says nothing about "great music," nor does it mention the expert subjectivity that is usually considered to be the foundation of Schenkerian analysis. It is tempting to consider the more general and potentially objective, psychological significance of this quote. According to Salzer (1952/1962), "tonality is the expression of tonal unity and coherence based on the principle of structure and prolongation" (pp. 226-227), and "Tonality may thus be defined as prolonged motion within the framework of a single key-determining progression, constituting the ultimate structural framework of the whole piece" (p. 227). This quote, and the work of later theorists such as Larson (1997) and Väisälä (2002), suggests that prolongation is not confined to the "high art" of "common-practice" composers such as Bach, Beethoven and Brahms. The idea of "unfolding of a basic sonority, expressing tonality in the horizontal plane" may also be applied to related ("pretonal," "posttonal") music. Salzer (1952/1962) offered many examples of prolongation in Medieval and Renaissance music (e.g., Leonin, Perotin, Machaut, Dunstable, Dufay, Josquin) as well as the prolongation of polychords by Copland and Stravinsky.

I would like to consider an even more radical generalization of Schenkerian thought. Triadic prolongation may represent the foundation of all MmT, from the most trivial to the most profound (whatever such value judgments mean, exactly). By MmT I mean harmonic tonality in the sense of Dahlhaus (1967) and not the more general case of music in the Ionian or Aeolian mode, which may or may not imply a background triad, depending on its historic or cultural origin or who is listening to it. Given this definition, any passage of music in a major or minor key can be regarded as a prolongation of its tonic triad—even if there is no harmonic accompaniment whatsoever. As we have seen, Huron's data on transitions between scale steps in major-key melodies is consistent with that idea. We can always perceive major or minor music in this way (structural hearing), and at some level it seems that we usually do, regardless of our musical expertise.

At this point, I would like to invite the reader to test these ideas by returning to the above lists of pieces and songs. It is one thing to imagine the melody and confirm that it is confined to the range from a low scale degree 7 to a high 6, but it is another thing to imagine the melody as a prolongation of the tonic triad. For me and (presumably) any Schenkerian theorist, the feeling of tonic prolongation is obvious. But for others it may not be so obvious, so they may be right to question this kind of explanation.


According to Straus (1987, cited by Larson, 1997),

Prolongation is an idea of extraordinary power. It has afforded remarkable insights into common-practice music, enabling us to hear through the musical surface to the remoter structural levels and ultimately to the tonic triad itself (p. 1).

From the point of view of music theory, it is remarkable that cognitive music psychologists have been so reluctant to accept and build on this idea. Narmour (1977) criticized Schenkerian dogma and offered his implication-realization model as an alternative. Huron (2006) expressed some widely shared reservations: "The rewrite rules used for reductions are not fully systematic and so there is considerable latitude for interpretation. No controlled studies have been carried out to determine whether analyses are unduly influenced by confirmation bias" (p. 97). Huron goes on to suggest that the Urlinie may be based on a more fundamental phenomenon, namely the gradual fall in pitch at the end of a phrase of speech or music.

For scientific purposes, Schenker's concept of Ursatz is too exact and detailed—and hence arbitrary. Empirical psychologists know that we cannot empirically determine background pitch structures with this degree of precision. Schenker laid himself open to criticism by presenting such a sharply defined structure as the ultimate background of a range of tonal musical styles and works. If background pitch structures exist in the awareness or imagination of listeners, those structures must be more fuzzy than Schenker's Ursatz.

This applies even to the most sophisticated listeners, and even to the scenario of "great" composers listening to their own music. Of course it is possible to train oneself to hear in ways specified by Schenker, and in that way to achieve new insights into a piece, but that is a different question. But the Ursatz is only one possible way to prolong the tonic triad. Most or all of the melodies listed above may also be considered prolongations of the tonic triad. In a radical interpretation, the middleground reduction of any melody (plus bass line) may be considered an alternative to the Ursatz.

Given these arguments, it may be appropriate for music psychologists to accept the psychological reality of prolongation, but to reject the specificity of the Ursatz (perhaps pending a more realistically approximate formulation). A compromise solution of this kind may help music psychology and music theory to grow closer together, and achieve productive synergetic interactions.

The goal of Schenker's theory was appropriate in its historical and cultural context. He wanted to understand why great pieces of Western music were so great, and in that way to understand the genius of great composers. Today, we understand greatness and genius in more relative terms (Cook, 1998). We take a broader view of what is "good music" and indeed what is "music." This applies in particular to music psychology. We are aiming for a general understanding of the psychological foundations of music. We want to understand MmT in a way that may be applied to any Western style or repertoire. Great 19th-century symphonies are not intrinsically more or less important than other common-practice, traditional, popular, sacred or secular music. MmT includes such diverse styles, genres and other classifications as bel canto opera, disco, jazz standards, bebop, blues, country, sacred harp, gospel, folk, lullabies, Christmas carols, easy listening, muzak, new-age relaxation music, chill-out, electronic dance, calypso, hip-hop, funk, metal, techno, gothic rock, indie, post-grunge, Afrobeat, Brazilian funk, salsa, reggae, flamenco, acid rock, Arabic pop, and Celtic punk.

The point of this long list, which could be extended almost indefinitely, is that musical styles that appear to be based on the principle of prolonging a major or minor triad (or perhaps another relatively consonant sonority such as an open fifth or, in bebop, a major-minor seventh chord) are continuing to dominate in today's globalized and technologized musical world—even if many theorists would not consider them to be examples of MmT. Even if a musical style is not based on major or minor scales, it may still be based on the idea of harmonic prolongation; examples include Flamenco based on harmonic progressions in the Phrygian mode, Arabic pop based on harmonic progressions in recognizably Arabic scales including quarter tones, and Indian classical music in which the melody may be considered a prolongation of a background tonic-fifth drone. This raises the interesting question of whether chords or harmonic progressions are a necessary ingredient of MmT. The theory I developed in Parncutt (2011a) suggests that they are, but a thorough study of non-Western tonal systems may yield a different conclusion.

Many Western musical styles, both today and in the past, cannot be conceived of as a prolongation of a major or minor triad. But they represent a small minority. Western music has been dominated by triadic prolongation since about the 14th century (Salzer, 1952/1962), and the situation is unlikely to change in the foreseeable future. Conversely, a lot of non-Western music involves prolongation of sonorities such as perfect fifths and major or minor triads (but may not involve or imply chord progressions), suggesting that the arguments in this paper would apply to it (Sarha Moore, personal communication). Due to my limited expertise in ethnomusicology, I prefer to avoid this question and focus on Western music as Schenker and Salzer did.


I have gone to considerable lengths to explain how prolongation can help us understand the structure of MmT. The basic idea is that the tonic triad exists constantly as a psychological reference in the background, and may in that sense be regarded as the ultimate foundation of MmT. I will now argue that prolongation can also help us to understand the emotional connotations of MmT—in particular, the link between major keys and positive emotions, and between minor keys and negative emotions.

Huron and Davis mentioned that "the association of the minor third and the minor triad with sadness was already described in the sixteenth century by Zarlino (1558)." The major-happy/minor-sad association was increasingly accepted in the 17th century. For example, Lippius (1612) agreed that the Ionian, Lydian, and Mixolydian modes were essentially happy, while Dorian, Phrygian, and Aeolian were weak, sad, and serious; similar ideas were expressed by Cruger (1630). Werkmeister (1687, pp. 124-125; cited in Lester, 1977) considered the major triad to be "more joyful and perfect than anything else."

But there was also considerable argument and disagreement about the emotional connotations of major and minor keys (and their modal relatives) in the 16th and 17th centuries. Gumpelzhaimer (1591) presented old-fashioned ideas about the character of church modes: he considered Dorian to be cheerful, Hypodorian sad, Phrygian severe, Hypophrygian enticing, Lydian harsh, Hypolydian gentle, Mixolydian impatient, Hypomixolydian placable, Aeolian pleasant, Hypoaeolian sorrowful, Ionian delightful, and Hypoionian tearful (Landner, 1997). Inspecting this list, we can find no clear link between pre-major (Lydian, Mixolydian, Ionian) and positive emotions, or pre-minor (Dorian, Phrygian, Aeolian) and negative emotions. In Wikipedia under "Mode: Western Church," I found a table that compares interpretations of the "character" of church modes by three historic theorists: Guido of Arezzo (995-1050), Adam of Fulda (1445-1505), and Juan de Espinosa Medrano (1632-1688)—again with little agreement. Along similar lines, Judd (2002) presented a list of affects associated with the eight modes according to Vanneus (1533).

The uncertainty continued into the 18th century. Mattheson (1713, p. 232, cited in Lester, 1977, footnote 66) wrote:

Those who are of the opinion that the entire secret resides in the minor or major third and would prove that all minor keys, speaking generically, are necessarily sad, and on the contrary, that all major keys commonly foster a lusty character—it is not so much that they are wrong, but they have not yet gone far enough in their investigations. Those who are of the opinion that if a piece has a signature with flats it must necessarily sound soft and tender; if, however, it is set with one or more sharps, then its nature must be hard, fresh and gay — they have even less going for them.

Incidentally, the idea that sharp and flat keys have different character had a big influence on the history of music and music theory, but it does not withstand modern psychological scrutiny (Powell & Dibben, 2005).

The emotional connotations of church modes in the Renaissance were not simply about positive and negative emotions. According to Meier (2009), "the authentic modes were regarded as 'joyful to moderate', and the plagal modes as 'moderate to mournful'" (p. 182). Given that the lowest tone in an authentic mode is near the final and the lowest tone in a plagal mode is about a fourth below the final, a possible explanation is that music in authentic modes has a higher average pitch, or a higher average pitch compared to the final. If so, Huron's (2008) idea of the relationship between emotion and pitch relative to average or expected pitch could explain the effect. Meier (2009) also reminded us that any such theory should be taken with a grain of salt: Tinctoris (1475) thought that a competent composer could render any of the modes joyful or mournful, and this relativist approach was shared by the 16th-century Swiss music theorist Glarean and others. Moreover, "the major-minor duality of the Ionian and Aeolian modes, or of any other modes, plays no part in any of Glarean's thinking—not in the generation of the modes, their ordering, differentiation, relationships, or affects" (Lester, 1977, p. 212).

The variation and uncertainty of historical interpretations of the emotional qualities of modes before the 17th century suggest that it was easier to override the conventional emotional qualities of modes than it was later to contradict the stereotypical emotional qualities of major and minor keys.

… it is not always the affection of the text that determines the choice of mode; … the choice of mode does not always depend on the composer's choice; and … the affective character, peculiar to a mode 'by its nature,' may be altered by various compositional procedures (Meier, 2009, p. 184)
… each of the modes may be rendered 'joyful' (or alternatively 'hard') if the composer introduces movimenti veloci and uses many major thirds, sixths or tenths over the bass; conversely, in each mode the music will become 'mournful' or 'languid' … if the composer makes use of slow rhythms and introduces many minor thirds, sixths or tenths over the bass (Meier, 2009, p. 186)

This kind of comparison between pre-MmT and MmT suggests that there is more to the major-minor emotional distinction than mere arbitrary associations. Of course it is possible to make music in a major key seem sad by choosing a slow tempo, or music in a minor key seem happy by choosing a fast tempo; the emotional effect can also be changed by other parameters such as pitch range, articulation and rhythmic pattern (Tagg & Clarida, 2003, pp. 310-317). But since the 17th century there has been remarkable agreement about the idea that major keys are associated with positive emotional valence and minor with negative. If we model emotion as a combination of different contributions that include major versus minor tonality and other features investigated by Huron and colleagues (tempo, average pitch, dynamic level, timbre, articulation; cf. Gabrielsson & Lindström, 2010), few music psychologists today would question the psychological reality of the major-happy versus minor-sad association, given the strength and diversity of the empirical evidence (Costa, Fine, & Ricci Bitti, 2004; Gabrielsson, 2009; Gabrielsson & Juslin, 2003; Gagnon & Peretz, 2003; Juslin & Laukka, 2004). That is true in spite of some contradictions in the literature. For example, Kastner and Crowder (1990), whose experimental participants were aged 3-12 years, observed that "all children, even the youngest, showed a reliable positive-major/negative-minor connotation, thus confirming the conventional stereotype" (abstract); but Gabrielsson and Lindström (2010) found that children do not recognize the emotional connotations of major and minor until 6-8 years of age.

The above historical survey suggests that the psychological association between major/minor modes and emotion emerged in the Renaissance. But that is also the period during which the system of major and minor keys emerged—depending on how you define it (Dahlhaus, 1967). Salzer (1952/1962) offered a broader definition of MmT and a correspondingly different date of origin:

since chord prolongation, contrapuntal or harmonic, is the force which creates tonal coherence, the history of tonality begins not with the detection and establishment of harmonic relationships and harmonic chord progressions, but with the first use of contrapuntal chord prolongations in the twelfth century (p. 26).

These differences in definition mean that the margin of error within which "tonality" began is enormous: three or even five centuries. It follows that we cannot separate the question of the origin of MmT from the question of the origin of its emotional connotations. Sometime during those centuries, the structural and syntactic aspects of MmT gradually became consolidated, both in musical practice and in the minds of musical practitioners and listeners. In the absence of clear evidence to the contrary, we may assume that the consolidation of the emotional connotations of major and minor happened gradually and in parallel.

In Parncutt (2011a), I attempted to explain the evolution of musical structure in a way that combined psychological and historical arguments. The structures that later became known as major and minor triads started to appear sporadically in Western polyphony almost from its beginnings in Notre Dame in the 12th century, e.g., in the works of Perotin (Flotzinger, 2007), in both prepared and unprepared forms (Parncutt, Kaiser, & Sapp, 2011). They were already surprisingly prevalent in the 14th century in the polyphony of Machaut. In the 15th and 16th centuries, major and minor triads became even more common by comparison to other possible pitch-class sets of cardinality three; in typical scores by Palestrina and Lassus, almost every sonority is a 5/3, or in modern terminology a major or minor triad in root position; 6/3 chords are remarkably unusual. This is all the more surprising when we consider that Renaissance composers had no clear concept of triad, root, or inversion: they apparently did not consider the first inversion of a triad to be related to its root position, nor did they use this terminology. Instead, they conceived of sonorities in terms of intervals above the bass (Fuller, 1986). They may simply have regarded the 5th above the bass as more consonant than the 6th, which privileged 5/3 chords over 6/3s (Väisälä, personal communication). In the history of music theory, clear concepts of root and inversion first emerged in the early 17th century, for example in Lippius (1612)—over a century before Rameau (1722).

Stereotypical structures of MmT such as subdominant-dominant-tonic cadences gradually emerged during the 14th to 16th centuries (cf. Eberlein, 1994). That is the same period during which major and minor triads increasingly dominated harmonic progressions by comparison to other possible sonorities. In Parncutt (2011a), I assumed that during this period music was increasingly perceived relative to tonic triads. In Schenkerian terms, we might say that Renaissance polyphony increasingly included passages that can be considered as prolongations of (tonic and other) sonorities (cf. Leech-Wilkinson, 1984; Stern, 1981). It became increasingly feasible to regard entire pieces as prolongations of triads.

The historical literature suggests that the major-happy/minor-sad association emerged over a long period, beginning in about the 14th century and ending in the 17th. The more pieces of music were structured around their opening and closing triads, the more these triads determined the music's emotional connotations. The question of why major and minor keys are perceived to be happy or sad thus reduces to the question of why major or minor triads may be perceived to be happy or sad.


Isolated major and minor triads tend to be perceived as happy and sad respectively relative to each other, but the data are noisy and affected by pitch height, loudness and timbre: higher, louder and/or brighter chords sound happier (Crowder, 1984; Heinlein, 1928; both cited in Gabrielsson & Lindström, 2010). Moreover, a major triad in the context of a minor key (e.g., a flat submediant bVI) may sound sad because the whole minor context sounds sad, and a minor triad in the context of a major key (e.g., a mediant III) may sound happy due to the major key context, suggesting that the effect of tonal context may be emotionally stronger than the effect of the harmony in isolation (I am not aware of an empirical test of this intuition). Why does a piece of music whose background triad is major tend to sound happier than a piece whose background triad is minor? I offer the following explanations.

First, the major triad is on average more consonant than the minor triad. This claim is consistent with both history and psychological theory. In retrospect, we may regard the 13th and 14th centuries as periods of experimentation with vertical pitch-class sets of cardinality three, from which major and minor triads emerged as more consonant than other possible combinations of three pitch classes (Parncutt et al., 2011). This result can be explained if we assume that consonance/dissonance (C/D) has three main components: roughness, harmonicity, and familiarity (McLachlan, Marco, Light, & Wilson, 2013; Parncutt & Hair, 2011). People are not immediately sensitive to differences in roughness and harmonicity, which can only be clearly perceived after many repetitions over a long period (the familiarity effect). On the basis of their structure, major and minor triads are clearly more consonant than any other triads due to their low roughness (no seconds) and high harmonicity (perfect fifth) (Parncutt, 1988; cf. Johnson-Laird, Kang, & Leong, 2012). That explains why they have been the most common sonorities in Western music since the 16th century; even in the 14th and 15th centuries, they were the most common sonorities of three pitch classes (Parncutt, Kaiser, & Sapp, 2011). The same theory explains why major is more consonant than minor: major triads have higher harmonicity (i.e., they are more similar to the harmonic series as it exists among the audible partials of everyday harmonic complex tones such as voiced speech sounds). There is probably no consistent or significant difference between major and minor triads in terms of roughness (in both cases roughness varies considerably across different voicings of the same chord) or familiarity (both are extremely familiar), although one could argue that major triads are somehow more expected, since they are more common.
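The difference in harmonicity can be made concrete with a rough sketch. It assumes just-intonation frequency ratios for close-position triads in root position (5:4 and 3:2 for major, 6:5 and 3:2 for minor) and finds the lowest harmonics of a common fundamental that reproduce each chord. This is only a crude proxy and not the octave-generalized model of Parncutt (1988); the function name `harmonic_numbers` is hypothetical.

```python
from fractions import Fraction
from math import lcm  # available from Python 3.9

# Assumed just-intonation ratios above the root (root, third, fifth).
JUST_MAJOR = [Fraction(1), Fraction(5, 4), Fraction(3, 2)]
JUST_MINOR = [Fraction(1), Fraction(6, 5), Fraction(3, 2)]


def harmonic_numbers(ratios):
    """Return the lowest harmonic numbers whose ratios match `ratios`."""
    # Scaling by the least common multiple of the denominators turns the
    # ratios into the smallest possible set of whole-number harmonics.
    denom = lcm(*(r.denominator for r in ratios))
    return [int(r * denom) for r in ratios]


major_harmonics = harmonic_numbers(JUST_MAJOR)  # [4, 5, 6]
minor_harmonics = harmonic_numbers(JUST_MINOR)  # [10, 12, 15]
```

Under these assumptions the major triad corresponds to adjacent low harmonics (4, 5, 6), whereas the minor triad first appears much higher in the series (10, 12, 15), which is one simple way of seeing why the major triad resembles the harmonic series more closely.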

Second, there may be a general, global tendency for music to be associated with positive emotions. Music accompanies rituals of all kinds and marks important social events including birth, initiation, marriage, death, yearly festivals, preparation for battle, victory, communication with gods and spirits, and (increasingly in Western society) entertainment and everyday social interaction. Most such events and functions carry positive connotations, so it is natural to associate music generally with positive emotions. It is "normal" for music to be happy. Of course most (presumably all) cultures also use music for sad occasions, but this happens less often. While many cultures are especially proud of their sad traditional music (e.g., Russia, Finland, Portugal, Turkey), my guess is that, even in such cases, if one were to document the music that most people listen to in everyday life, one would find that happy music predominates.

It follows that since major triads are more common than minor in common-practice Western music (Eberlein, 1994; Parncutt et al., 2011), and music in major keys is on average more common than music in minor keys, music in major keys may be perceived as "normal" while music in minor keys is somehow exceptional. Variations on this idea can often be found in music theory treatises; for example, Werkmeister (1687, pp. 124-125; cited in Lester, 1977) considered the major mode to be "natural" or "perfect" and the minor to be "less natural" or "less perfect." But Werkmeister's reasoning was based on number ratios rather than frequency of occurrence.

This last point explains why major is happy, but we need a further argument to explain why minor is sad. The most parsimonious explanation is that in minor passages, scale degrees 3 and 6 sound lower than expected, relative to equivalent major passages—consistent with emotional cues in speech (Huron, 2008). That does not mean that people necessarily have a cognitive representation of the equivalent major passage every time they hear a minor melody; but I am assuming that this is sometimes the case, or has sometimes been the case in the past few hundred years.
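The comparison between a minor passage and its major equivalent can be spelled out degree by degree. The sketch below assumes equal-tempered scales written as semitones above the tonic and uses the harmonic minor scale, as in the discussion of scale structure above; the function name `shifted_degrees` is hypothetical.

```python
# Semitones above the tonic for scale degrees 1-7 (equal temperament).
MAJOR = [0, 2, 4, 5, 7, 9, 11]
HARMONIC_MINOR = [0, 2, 3, 5, 7, 8, 11]


def shifted_degrees(reference, other):
    """Map each scale degree that differs to its shift in semitones."""
    return {
        degree: o - r
        for degree, (r, o) in enumerate(zip(reference, other), start=1)
        if o != r
    }


# Degrees of harmonic minor that deviate from the parallel major scale.
shifts = shifted_degrees(MAJOR, HARMONIC_MINOR)  # {3: -1, 6: -1}
```

Only scale degrees 3 and 6 differ, and each is one semitone lower than in the parallel major, which is the sense in which they may sound "lower than expected" if major is taken as the norm.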


Huron and Davis presented findings based on mean melodic interval size that can explain why music in minor keys is perceived as sad relative to equivalent music in major keys, without reference to any other theory. I have presented an alternative theory, for which the same can be said (although there is considerable overlap between the two). Which theory is correct?

More generally: How do we evaluate different theories that predict the same result? Is it possible that two or more theories are simultaneously correct? In a purely quantitative approach, one might solve this problem by first establishing a "ground truth" of representative datapoints and then checking which theory best accounts for the entire dataset. But things are seldom that simple. Interesting theories can generally be applied to a range of problems or datasets, so one has to evaluate and compare the performance of each theory across these situations. That inevitably introduces subjective and qualitative elements into the evaluation. A possible approach is to consider the best theory to be the one that makes the best predictions (precision), in the widest variety of relevant situations (generalization), and based on the fewest assumptions or the least complex model (parsimony).

Consider for example theories of the origin of music. Several competing theories seem capable of independently explaining music's origin. I recently attempted to solve this problem by first defining music as a long list of attributes and comparing how each theory predicts each attribute (Parncutt, 2011b). In the present case, that approach is hardly possible, because there is no equivalent problem of definition surrounding terms like major, minor, happy and sad.

The analyses of Huron and Davis (2012) can contribute to an explanation of why the harmonic minor scale is good at expressing sadness, but that is only part of the story. We must also separate effects due to individual scale steps from effects due to scale-step transitions. The reduction in average interval size that is achieved by using the harmonic minor scale is quite small (the effect size is small), suggesting that the emotional difference between major and minor has more to do with individual scale steps than interval size. I have argued above that the emotional effect of individual scale steps is anchored to the tonic triad, whose function can be understood in two equivalent ways: Schenker's background and Krumhansl's key profiles.

Meyer (1956) explained the affective power of the minor mode by noting that it is "both more ambiguous and less stable than the major mode" (p. 226): Scale degrees 6 and 7 come in two chromatic variants. Similarly, the root of the minor triad is more ambiguous (Parncutt, 1988; Terhardt, 1982). Independent of music, sadness is associated with uncertainty. Different negative moods have different functions, suggesting different evolutionary origins (Keller & Nesse, 2005): grief (exhibited as sadness and crying) has a social function of strengthening social ties to replace lost ones, while the sadness that is associated with fatigue or pessimism may have the function of conserving energy—for example, during the winter, or more generally while waiting for a new opportunity for effective action. In any case, sadness slows people down, which gives them time to think about and evaluate options in uncertain situations. In the present theory, minor is less common because it is less harmonic (it does not match the aurally familiar harmonic series so well) and hence more ambiguous and less certain. Along similar lines, Schenker (1906) regarded the major triad, due to its similarity with the harmonic series, as "natural" and hence the best basis for composition; for him, the minor triad was "artificial," an artistic product.

Is there a causal psychological link between uncertain life situations and uncertain situations between musical tones? Intuitively, this seems likely, but such hypotheses are difficult to demonstrate empirically. It is clearer that successful predictions are rewarded by the brain (Huron, 2006; Volz, Schubotz, & Cramon, 2003). Uncertain situations are situations in which successful predictions are unlikely, so the positive emotions associated with neural "rewards" are likely to be absent. Similarly, it is difficult to confirm empirically the idea that musical structure is a mirror of social structure (Feld, 1984).


I have proposed that the ultimate origin and foundation of the negative emotional valence of music in a minor key is the prolonged minor triad in the background. A prolonged minor triad has negative emotional valence by comparison to a prolonged major triad for the simple reason that the third of the triad is lower in pitch, and hence lower than expected if major is perceived as the norm. Similarly, speech can sound sad if its pitch is lower than expected. Of course there are, or may be, other reasons, but this parsimonious observation appears sufficient to explain most of the effect.

If this finding holds, it is part of an emerging solution to an old puzzle. Why is Western music dominated by MmT? Why does MmT have just two modes, major and minor? Why has the emotional difference between major and minor been so stable across periods, styles, and (Western) cultures? Why is the tonal system itself so stable in spite of the long and quite intense history of attempts to usurp it, originally inspired by composers such as Wagner and Schoenberg, and continuing to the present day?

Current evidence points toward the following tentative explanations. First, assuming that the C/D of Western sonorities is based on a combination of roughness, harmonicity and familiarity, the major and minor triads are clearly the most consonant possible chord types or "Tn types" (Rahn, 1980) of cardinality greater than two, because only they have a perfect fifth (harmonicity) and no seconds (roughness). Second, three is the greatest number of pitch classes (Parncutt, 1993) or melodies (Huron, 1989) that can be simultaneously perceived (i.e., independently noticed), based on empirical comparisons between perceived and performed numbers of simultaneous tones and melodies respectively. These two constraints may have eliminated any and all other candidates for tonic sonorities composed from tones in the chromatic scale.
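
The interval-based part of the first claim can be checked by brute force. The following is a minimal sketch, not a model of roughness or harmonicity: it assumes that "perfect fifth" means interval class 5 (fifths and fourths) and that "seconds" means interval classes 1 and 2 (seconds and their inversions, the sevenths). The function names are hypothetical, and the labeling rule (most compact rotation, transposed to start on 0) is a simple stand-in for Rahn's normal form rather than his exact algorithm.

```python
from itertools import combinations

def interval_classes(pcs):
    """Return the set of interval classes (1..6) between all pairs of pitch classes."""
    ics = set()
    for a, b in combinations(pcs, 2):
        d = (b - a) % 12
        ics.add(min(d, 12 - d))
    return ics

def tn_type(pcs):
    """Label a pitch-class set up to transposition: take the rotation with
    the smallest span (ties broken lexicographically), starting on 0."""
    rotations = [tuple(sorted((p - root) % 12 for p in pcs)) for root in pcs]
    return min(rotations, key=lambda r: (r[-1], r))

def consonant_tn_types(min_size=3):
    """Tn-types of cardinality >= min_size that contain interval class 5
    (assumed proxy for harmonicity) and neither interval class 1 nor 2
    (assumed proxy for roughness)."""
    found = set()
    for size in range(min_size, 13):
        for pcs in combinations(range(12), size):
            ics = interval_classes(pcs)
            if 5 in ics and not ({1, 2} & ics):
                found.add(tn_type(pcs))
    return sorted(found)

print(consonant_tn_types())  # [(0, 3, 7), (0, 4, 7)]
```

Under these assumptions only the minor triad (0, 3, 7) and the major triad (0, 4, 7) survive. Sets of four or more pitch classes that avoid seconds reduce to the diminished seventh chord, which contains no perfect fifth.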

Familiarity based on exposure (frequency of occurrence) may have exaggerated the difference in perceived C/D between major/minor triads and competing candidates for tonic sonorities such as suspended, diminished and augmented triads—just as constant exposure to music of "great composers" (or any music, for that matter) tends to increase the preference gap between their music and the music of lesser-known composers or in lesser-known styles. This is an example of a more general psychological effect:

"The vast literature on the mere-repeated-exposure effect shows it to be a robust phenomenon that cannot be explained by an appeal to recognition memory or perceptual fluency. The effect has been demonstrated across cultures, species, and diverse stimulus domains. It has been obtained even when the stimuli exposed are not accessible to the participants' awareness, and even prenatally" (Zajonc, 2001, abstract).

The mere exposure effect (Bornstein, 1989; Zajonc, 1968) has been repeatedly confirmed for music: the more often people are exposed to a particular piece or style of music, the more they like it (e.g., Peretz, Gaudreau, & Bonnel, 1998; Szpunar, Schellenberg, & Pliner, 2004). The popular idea that "familiarity breeds contempt" has not been confirmed in mainstream music psychological studies; an inverted-U function reminiscent of Wundt's (1874) relationship between stimulus intensity/activity/arousal and pleasantness/affect/liking appears to exist for the relationship between liking and subjective complexity, but not for the relationship between liking and exposure or familiarity (North & Hargreaves, 1995).

Since C/D is fundamental to Western music and only consonant sonorities can be prolonged, the major and minor triads are the best available candidates (pc-sets, or more precisely Tn-types) for prolongation. Since they also imply almost-complete diatonic scales by means of missing fundamentals (only leading tones are not implied in this way), they are the best candidates for tonic sonorities of music that is based on diatonic scales.

Acknowledgments. I am grateful to Graham Hair, Sarha Moore, Renée Timmers, Olli Väisälä, and Karim Weth for valuable comments on previous drafts. This paper is dedicated to Steve Larson, whose untimely death on June 7, 2011 was a great loss to both the music theory and the music psychology communities. Steve was uniquely qualified to evaluate this paper, being equally at home as a Schenkerian music analyst, a cognitive psychologist, and a performing musician. He would have emailed me all kinds of interesting and useful reactions. Even without his direct feedback, this paper may never have happened without several publications in which Steve combined Schenkerian and cognitive-psychological theory, and convinced researchers in both areas of the virtues of the combination. On a personal note, I should say that no theoretical discussion of the nature and origins of musical sadness will make Steve's premature departure easier for me to understand or accept.


  • Adler, G. (1885). Umfang, Methode und Ziel der Musikwissenschaft. Vierteljahresschrift für Musikwissenschaft, Vol. 1, pp. 5–20.
  • Baddeley, A.D., & Hitch, G.J. (1974). Working memory. In G.H. Bower (Ed.), The psychology of learning and motivation: Advances in research and theory (Vol. 8, pp. 47–89). New York: Academic Press.
  • Bigand, E., Tillmann, B., Poulin-Charronnat, B., & Manderlier, D. (2005). Repetition priming: Is music special? Quarterly Journal of Experimental Psychology, Section A: Human Experimental Psychology, Vol. 58, pp. 1347–1375.
  • Boenke, P. (2005). Zur amerikanischen Rezeption der Schichtenlehre Heinrich Schenkers. Zeitschrift der Gesellschaft für Musiktheorie, Vol. 2, pp. 181–188.
  • Bornstein, R.F. (1989). Exposure and affect: overview and meta-analysis of research 1968–1987. Psychological Bulletin, Vol. 106, pp. 265–289.
  • Bregman, A.S. (1990). Auditory Scene Analysis: The Perceptual Organization of Sound. Cambridge, MA: MIT Press.
  • Caplin, W.E. (1998). Classical Form. New York: Oxford University Press.
  • Cook, N. (1987). The perception of large-scale tonal closure. Music Perception, Vol. 5, pp. 197–206.
  • Cook, N. (1998). Music: A Very Short Introduction. New York: Oxford University Press.
  • Costa, M., Fine, P., & Ricci Bitti, P.E. (2004). Interval distributions, mode, and tonal strength of melodies as predictors of perceived emotion. Music Perception, Vol. 22, pp. 1–14.
  • Crowder, R.G. (1984). Perception of the major-minor distinction: I. Historical and theoretical foundations. Psychomusicology, Vol. 4, pp. 3–12.
  • Crowder, R.G. (1993). Auditory memory. In S. McAdams & E. Bigand (Eds.), Thinking in Sound: The Cognitive Psychology of Human Audition. Oxford: Oxford University Press, pp. 113–145.
  • Cruger, J. (1630). Synopsis musicae. Berlin: Kall.
  • Dahlhaus, C. (1967). Untersuchungen über die Entstehung der harmonischen Tonalität. Kassel: Bärenreiter.
  • Deutsch, D. (1970). Tones and numbers: Specificity of interference in immediate memory. Science, Vol. 168, pp. 1604–1605.
  • Deutsch, D. (1972). Mapping of interactions in the pitch memory store. Science, Vol. 175, pp. 1020–1022.
  • Deutsch, D. (1975). The organization of short-term memory for a single acoustic attribute. In D. Deutsch & J. A. Deutsch (Eds.), Short-Term Memory. New York: Academic Press, pp. 107–151.
  • Eberlein, R. (1994). Die Entstehung der tonalen Klangsyntax. Frankfurt/Main: Lang.
  • Feld, S. (1984). Sound structure as social structure. Ethnomusicology, Vol. 28, pp. 383–409.
  • Flotzinger, R. (2007). Von Leonin zu Perotin: Der musikalische Paradigmenwechsel in Paris um 1210. Bern: Lang.
  • Forte, A. (1959). Schenker's conception of musical structure. Journal of Music Theory, Vol. 3, pp. 1–30.
  • Forte, A., & Gilbert, S.E. (1982). Introduction to Schenkerian Analysis. New York: Norton.
  • Fuller, S. (1986). On sonority in Fourteenth-Century polyphony: Some preliminary reflections. Journal of Music Theory, Vol. 30, pp. 35–70.
  • Gabrielsson, A. (2009). The relationship between musical structure and perceived expression. In S. Hallam, I. Cross, & M. Thaut (Eds.), Oxford Handbook of Music Psychology. Oxford: Oxford University Press, pp. 141–150.
  • Gabrielsson, A., & Juslin, P.N. (2003). Emotional expression in music. In R.J. Davidson, K.R. Scherer, & H.H. Goldsmith (Eds.), Handbook of Affective Sciences. Oxford: Oxford University Press, pp. 503–534.
  • Gabrielsson, A., & Lindström, E. (2010). The role of structure in the musical expression of emotions. In P.N. Juslin & J.A. Sloboda (Eds.), Handbook of Music and Emotion: Theory, Research, Applications. Oxford: Oxford University Press, pp. 367–400.
  • Gagnon, L., & Peretz, I. (2003). Mode and tempo relative contributions to "happy-sad" judgments in equitone melodies. Cognition & Emotion, Vol. 17, pp. 25–40.
  • Gumpelzhaimer, A. (1591). Compendium musicae latino germanicum. Augsburg: Schoenigius.
  • Heinlein, C.P. (1928). The affective character of the major and minor modes in music. Journal of Comparative Psychology, Vol. 8, pp. 101–142.
  • Hunter, P.G., & Schellenberg, E.G. (2010). Music and emotion. In M.R. Jones, R.R. Fay, & A.N. Popper (Eds.), Music Perception. New York: Springer, pp. 129–164.
  • Huron, D. (1989). Voice denumerability in polyphonic music of homogeneous timbres. Music Perception, Vol. 6, pp. 361–382.
  • Huron, D. (2006). Sweet Anticipation: Music and the Psychology of Expectation. Cambridge, MA: MIT Press.
  • Huron, D. (2008). A comparison of average pitch height and interval size in major- and minor-key themes: Evidence consistent with affect-related pitch prosody. Empirical Musicology Review, Vol. 3, pp. 59–63.
  • Huron, D., & Davis, M.J. (2012). The harmonic minor scale provides an optimum way of reducing average melodic interval size consistent with sad affect cues. Empirical Musicology Review.
  • Huron, D., & Parncutt, R. (1993). An improved model of tonality perception incorporating pitch salience and echoic memory. Psychomusicology, Vol. 12, pp. 152–169.
  • Järvinen, T. (1995). Tonal hierarchies in jazz improvisation. Music Perception, Vol. 12, pp. 415–437.
  • Johnson-Laird, P.N., Kang, O.E., & Leong, Y.C. (2012). On musical dissonance. Music Perception, Vol. 30, pp. 19–35.
  • Jonas, O. (1954). Introduction. In O. Jonas (Ed.), Heinrich Schenker: Harmony (transl. E.M. Borgese). Chicago: University of Chicago Press.
  • Judd, C.C. (2002). Renaissance modal theory. In T. Christensen (Ed.), Cambridge History of Music Theory. Cambridge, UK: Cambridge University Press, pp. 364–406.
  • Juslin, P.N., & Laukka, P. (2004). Expression, perception, and induction of musical emotions: A review and a questionnaire study of everyday listening. Journal of New Music Research, Vol. 33, pp. 217–238.
  • Kastner, M.P., & Crowder, R.G. (1990). Perception of the major/minor distinction: IV. Emotional connotations in young children. Music Perception, Vol. 8, pp. 189–201.
  • Keller, M.C., & Nesse, R.M. (2005). Is low mood an adaptation? Evidence for subtypes with symptoms that match precipitants. Journal of Affective Disorders, Vol. 86, pp. 27–35.
  • Krumhansl, C.L. (1990). Cognitive foundations of musical pitch. New York: Oxford University Press.
  • Krumhansl, C.L., & Kessler, E.J. (1982). Tracing the dynamic changes in perceived tonal organization in a spatial representation of musical keys. Psychological Review, Vol. 89, pp. 334–368.
  • Krumhansl, C.L., Louhivuori, J., Toiviainen, P., Järvinen, T., & Eerola, T. (1999). Melodic expectation in Finnish spiritual folk hymns: Convergence of statistical, behavioral, and computational approaches. Music Perception, Vol. 17, pp. 151–196.
  • Landner, N.S. (1997). Ricercares a quattro voci by Vincenzo Galilei, 1584. recorderhomepage.net/galilei.html
  • Larson, S. (1997). The problem of prolongation in tonal music: Terminology, perception, and expressive meaning. Journal of Music Theory, Vol. 41, pp. 101–136.
  • Leech-Wilkinson, D. (1984). Machaut's 'Rose, Lis' and the problem of early music analysis. Music Analysis, Vol. 3, pp. 9–28.
  • Lester, J. (1977). Major-minor concept and modal theory in Germany, 1592–1680. Journal of the American Musicological Society, Vol. 30, pp. 208–253.
  • Lippius, J. (1612). Synopsis musicae novae. Strasbourg: Kieffer.
  • Mattheson, J. (1713). Das neu-eröffnete Orchestre. Hamburg: Schiller.
  • McDermott, J.H., Lehre, A.J., & Oxenham, A.J. (2010). Individual differences reveal the basis of consonance. Current Biology, Vol. 20, pp. 1035–1041.
  • McLachlan, N., Marco, D., Light, M., & Wilson, S. (2013). Consonance and pitch. Journal of Experimental Psychology: General. DOI: 10.1037/a0030830.
  • Meier, B. (2009). Rhetorical aspects of the Renaissance modes (transl. G. Chew). Journal of the Royal Musical Association, Vol. 115, No. 2, pp. 182–190.
  • Meyer, L.B. (1956). Emotion and Meaning in Music. Chicago: University of Chicago Press.
  • Meyer, L.B. (1973). Explaining Music: Essays and Explorations. Berkeley, CA: University of California Press.
  • Narmour, E. (1977). Beyond Schenkerism: The Need for Alternatives in Music Analysis. Chicago: University of Chicago Press.
  • Noorden, L. van (1975). Temporal Coherence in the Perception of Tone Sequences. Dissertation, Technical University Eindhoven.
  • North, A.C., & Hargreaves, D.J. (1995). Subjective complexity, familiarity, and liking for popular music. Psychomusicology, Vol. 14, pp. 77–93.
  • Oram, N., & Cuddy, L.L. (1995). Responsiveness of western adults to pitch distributional information in melodic sequences. Psychological Research, Vol. 57, pp. 103–118.
  • Ortmann, O. (1924). The fallacy of harmonic dualism. Musical Quarterly, Vol. 10, No. 3, pp. 369–383.
  • Parncutt, R. (1988). Revision of Terhardt's psychoacoustical model of the root(s) of a musical chord. Music Perception, Vol. 6, pp. 65–94.
  • Parncutt, R. (1989). Harmony: A psychoacoustical approach. Berlin: Springer-Verlag.
  • Parncutt, R. (1993). Pitch properties of chords of octave-spaced tones. Contemporary Music Review, Vol. 9, pp. 35–50.
  • Parncutt, R. (2007). Systematic musicology and the history and future of western musical scholarship. Journal of Interdisciplinary Music Studies, Vol. 1, pp. 1–32.
  • Parncutt, R. (2011a). The tonic as triad: Key profiles as pitch salience profiles of tonic triads. Music Perception, Vol. 28, pp. 333–365.
  • Parncutt, R. (2011b). Defining music as a step toward explaining its origin (spoken paper). Society for Music Perception and Cognition (SMPC, Rochester, NY, 11–14 August).
  • Parncutt, R., & Hair, G. (2011). Consonance and dissonance in theory and psychology: Disentangling dissonant dichotomies. Journal of Interdisciplinary Music Studies, Vol. 5, pp. 119–166.
  • Parncutt, R., & Prem, D. (2008). The relative prevalence of Medieval modes and the origin of the leading tone (poster). International Conference on Music Perception and Cognition (ICMPC10, Sapporo, Japan, 25–29 August).
  • Parncutt, R., Kaiser, F., & Sapp, C. (2011). Historical development of tonal syntax: Counting pitch-class sets in 13th-16th century polyphonic vocal music. In C. Agon et al. (Eds.), Mathematics and Computation in Music. Berlin: Springer-Verlag, pp. 366–369.
  • Parncutt, R., Kaiser, F., & Sapp, C. (2012). Estimating historical changes in consonance by counting prepared and unprepared dissonances in musical scores (spoken presentation). International Conference on Music Perception and Cognition (Thessaloniki, Greece, July).
  • Peretz, I., Gaudreau, D., & Bonnel, A.M. (1998). Exposure effects on music preferences and recognition. Memory and Cognition, Vol. 15, pp. 379–388.
  • Powell, J., & Dibben, N. (2005). Key-mood association: A self-perpetuating myth. Musicae Scientiae, Vol. 9, No. 2, pp. 289–312.
  • Pressing, J. (1977). Towards an understanding of scales in jazz. Jazzforschung, Vol. 9, pp. 25–35.
  • Rahn, J. (1980). Basic Atonal Theory. New York: Schirmer.
  • Rameau, J.-P. (1722). Traité de l'harmonie reduite à ses principes naturels. Paris: J.B.C. Ballard.
  • Salzer, F. (1952/1962). Structural Hearing: Tonal Coherence in Music. New York: Dover.
  • Schenker, H. (1906). Harmonielehre (Neue musikalische Theorien und Phantasien, Vol. 1). Stuttgart: J.G. Cotta'sche Buchhandlung Nachfolger.
  • Schenker, H. (1922). Der Tonwille, No. 2. Vienna: Tonwille-Flugblätterverlag (Universal). English translation: Der Tonwille: Pamphlets in Witness of the Immutable Laws of Music, Ed. William Drabkin, (transl. Ian Bent). New York: Oxford University Press, 2004–2005.
  • Schenker, H. (1935). Der freie Satz. Neue Musikalische Theorien und Phantasien (part 3). Vienna: Universal.
  • Smith, N.A., & Schmuckler, M.A. (2004). The perception of tonal structure through the differentiation and organization of pitches. Journal of Experimental Psychology: Human Perception and Performance, Vol. 30, pp. 268–286.
  • Stern, D. (1981). Tonal organization in modal polyphony. Theory and Practice, Vol. 6, No. 2, pp. 5–39.
  • Szpunar, K.K., Schellenberg, E.G., & Pliner, P. (2004). Liking and memory for musical stimuli as a function of exposure. Journal of Experimental Psychology: Learning, Memory, and Cognition, Vol. 30, pp. 370–381.
  • Tagg, P., & Clarida, B. (2003). Ten Little Title Tunes. New York: Mass Media Music Scholars' Press.
  • Taylor, S.E., Fiske, S.T., Etcoff, N.L., & Ruderman, A.J. (1978). Categorical and contextual bases of person memory and stereotyping. Journal of Personality and Social Psychology, Vol. 36, pp. 778–793.
  • Terhardt, E. (1974). Pitch, consonance, and harmony. Journal of the Acoustical Society of America, Vol. 55, pp. 1061–1069.
  • Terhardt, E. (1982). Die psychoakustischen Grundlagen der musikalischen Akkordgrundtöne und deren algorithmische Bestimmung. In C. Dahlhaus & M. Krause (Eds.), Tiefenstruktur der Musik. Berlin: Technical University of Berlin, pp. 23–50.
  • Terhardt, E., Stoll, G., & Seewann, M. (1982). Algorithm for extraction of pitch and pitch salience from complex tonal signals. Journal of the Acoustical Society of America, Vol. 71, pp. 679–688.
  • Thompson, W.F., Schellenberg, E.G., & Husain, G. (2001). Arousal, mood, and the Mozart effect. Psychological Science, Vol. 12, pp. 248–251.
  • Tinctoris, J. (1475). Terminorum musicae diffinitorium. Paris: Richard-Masse.
  • Tversky, A., & Kahneman, D. (1973). Availability: A heuristic for judging frequency and probability. Cognitive Psychology, Vol. 5, pp. 207–232.
  • Väisälä, O. (2002). Prolongation of harmonies related to the harmonic series in early post-tonal music. Journal of Music Theory, Vol. 46, pp. 207–283.
  • Vanneus, S. (1533). Recanetum de musica aurea. Rome: Valerio Dorico.
  • Volz, K.G., Schubotz, R.I., & Cramon, D.Y. von (2003). Predicting events of varying probability: Uncertainty investigated by fMRI. NeuroImage, Vol. 19, pp. 271–280.
  • Vos, P.G., & Troost, J.M. (1989). Ascending and descending melodic intervals: Statistical findings and their perceptual relevance. Music Perception, Vol. 6, pp. 383–396.
  • Werkmeister, A. (1687). Musicae mathematicae hodegus curiosus oder Richtiger Musicalischer Weg-Weiser. Frankfurt: Calvisius.
  • Wundt, W.M. (1874). Grundzüge der Physiologischen Psychologie. Leipzig: Engelmann.
  • Zajonc, R.B. (1968). Attitudinal effects of mere exposure. Journal of Personality and Social Psychology, Vol. 9, No. 2, Pt. 2, pp. 1–27.
  • Zajonc, R.B. (2001). Mere exposure: A gateway to the subliminal. Current Directions in Psychological Science, Vol. 10, pp. 224–228.