There can be no doubt that concepts of shape are ubiquitous in musical discourse and music cognition: we use innumerable shape-related metaphors for most (if not all) features of music, such as dynamics, timbre, harmony, pitch, contour, rhythm, texture, tempo, timing, expressivity and affective qualities. We also encounter shapes in various music-related images, such as graphical scores, composers' sketches and music analysis illustrations, in more directly signal-based images such as waveforms and spectrograms, and, last but not least, in shape images of music-related body motion. We could thus speak of a widespread and deep-rooted shape cognition in music, as well as in human reasoning in general, as suggested by some strands in the cognitive sciences, especially so-called morphodynamical theory and cognitive linguistics (Godøy, 1997; Petitot, 1985; Thom, 1983).

As shape cognition seems to apply to most musical features, as well as to other domains studied by the cognitive sciences, it may be tempting to regard it as amodal, that is, as having a high level of generality or even abstraction that makes it possible to apply similar shape categories to qualitatively rather different domains. We could, for instance, use the shape expression "flat" to characterize a melody (having a small ambitus, circling around just a few tones), dynamics (having no increase or decrease in intensity), timbre (being stationary for the duration of the sound), a spectrum (all partials being of equal amplitude), a musical performance (it came across as rather bland), or even the overall emotive effect of an entire musical work (it did not move us at all). The advantage of applying shape cognition to various music-related features is obvious: shape cognition is inherently holistic, quite simply because a shape is perceived and conceived "all-at-once" as a geometric unit, and not piecemeal as a series of individual points or numerical values.

The holistic nature of shape cognition also fits well with music-related body motion, in that most human motion is continuous and clearly exhibits shapes, both of motion trajectories and of quasi-stationary postures. The main idea of what I have previously called a motor-mimetic approach to music perception and cognition (Godøy, 2003), which in turn is based on so-called motor theory (Galantucci, Fowler, & Turvey, 2006), is that perception and cognition are active processes in which we trace the shapes of whatever it is that we are perceiving and thinking about.


However, this link between shape cognition and sensations of motion also raises questions about timescales. By engaging in so-called musical imagery (Godøy & Jørgensen, 2001), I can "replay" a tune in my mind at the same tempo as when I heard it performed "for real", I can quickly scan through the entire melody in a couple of seconds, or I can even imagine the entire melody "in an instant", as a shape. Music cognition thus involves timescales ranging from the very long to the very short, and we need to consider the different timescales at work here if we wish to relate shape cognition to perceived, non-abstract features of music.

The timescales of music extend from the very short durations of audible vibrations (from the upper threshold of hearing at approximately 20000 vibrations per second, i.e. periods of about 0.05 milliseconds) to several minutes or even hours. We may view timescales as a continuum between these extremes, but there are significant qualitative differences in human perception and action at various ranges: what lies within the audio range of 20 to 20000 Hz is neurophysiologically very different from what lies below it. Likewise, what fits within what is commonly accepted as the span of short-term memory (very approximately 0.5 to 5 seconds) is qualitatively quite different from what is recalled from long-term memory, such as memories of more extended sections, tunes or whole works. So while in musical imagery it is possible to scan through any excerpt or whole work of music at any speed, in actual unfolding or "real-time" perception we encounter qualitatively distinct features at different timescales. This is also the case for the perception of tonality.

Western music theory has often tended to sidestep these issues, at least in part because of notation. A prime example is the idea of reducing large-scale works to more compressed, skeletal overview images, as in Schenkerian analysis, which reduces, say, an extended symphony movement to an Urlinie of a few tones. With such a notation-based paradigm for musical analysis, critical perceptual distinctions between the local and the global may be lost. As far as I can see, it remains to be convincingly demonstrated that large-scale formal relationships, including tonal ones, are perceptually as decisive as some music theory would have us believe (see Tillmann & Bigand, 2004, for some very interesting critical remarks on this). The few experimental studies we have of the efficacy of large-scale forms (e.g., Eitan & Granot, 2008) suggest that we should be suspicious of such claims until further notice. To my knowledge, most recent perception-based studies of tonality focus on short timescales, typically on local cadential contexts, rather than on large-scale (e.g., sonata form) tonal relations. Additionally, the ethnomusicological studies I know of suggest that the concept of tonality is largely shaped by instrumental constraints, effectively making local pitch patterns repeated at short timescales the basis for sensations of tonality. We should thus always consider timescales when we think about tonality.


Given the vast literature directly or indirectly concerned with notions of tonality in Western musical thought, it is difficult to extract any single, clear definition of what tonality in music actually is. For one thing, there are terminological issues concerning what is meant by terms such as "tonal", "modal", "free tonal", "atonal", "serial", etc. It seems to me that in the Anglo-Saxon world the term "tonal" is often treated as synonymous with what may be called "functional harmonic music", amenable to Hugo Riemann-type definitions of tonality in terms of chord functions. In other contexts, however, "tonal" could be taken to include modal and/or "free tonal" music, as in various strands of neoclassical and other Western 20th-century music, for example in the music of Olivier Messiaen or even John Coltrane.

It could be interesting here to recall the more "extended" view of tonality advocated by Paul Hindemith, partly based on acoustic and perceptual principles (Hindemith, 1941). Although this work is often considered speculative and unsystematic, one interesting idea of Hindemith's is that any single interval or constellation of intervals (a chord) will have a more or less salient root tone, so that any progression of intervals or chords would result in some tonal sensation. Hindemith's contention was that music may be anti-tonal yet still, locally and in passing, have some (weaker or stronger) tonal centre, as he tried to demonstrate with his analysis of an excerpt from Schönberg's Piano Piece op. 33a. Hindemith was modern in his view of tonality not just in dissociating it from past European major-minor-chromatic practices, but also in regarding tonality as a graded, "more or less" and contextually emergent phenomenon. Interestingly, recent decades have seen more systematic bottom-up, signal-based approaches to tonality as an emergent phenomenon in listening, as in Leman (1995). Furthermore, such bottom-up approaches suggest that the perception of pitch and intervals also depends on timbral features (Sethares, 2005).

If we attempt a more universal or "world music" view of tonality, it would make sense to adopt a combined acoustical-perceptual approach that includes what I call instrumental constraints, i.e. the production of the sound, the sound's perceived timbral features, and various practices such as the tunings and interval sizes used, as well as the overall statistical distribution of the various pitches during the unfolding of the music. The last point is essential, as it might capture the phenomenon of tonality sensations arising from the sheer recurrence of certain pitches, independent of tunings, interval sizes or modalities (scales). It could, for example, account for how Messiaen-style recurring central tones create tonal sensations even in passages that are modally quite diverse. Similarly, the use of drones in instrumental music, as in Norwegian Hardanger fiddling and several other world musics, may be seen as a prominent case of tonality, what I would call landmark-type tonality, based on the constraint that the strings (or pipes, tubes, bells, etc.) stay more or less tuned to the same pitch throughout the performance.
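The idea of tonality as a statistical effect of recurrence can be illustrated with a toy computation. The Python sketch below is a hypothetical illustration, not a method proposed in this commentary: it assumes notes given as MIDI note numbers with durations in beats, and it assumes 12-tone equal temperament so that pitches can be folded into 12 pitch classes. It therefore ignores the timbre, tuning and interval-size constraints discussed above, and simply reports the pitch class with the largest duration-weighted share. The function names are invented for this example.

```python
from collections import Counter

# Hypothetical note representation: (midi_pitch, duration_in_beats).
# Assumes 12-tone equal temperament so pitches fold into 12 pitch classes;
# other tunings and interval sizes would need a finer model.

def pitch_class_profile(notes):
    """Return the duration-weighted share of each pitch class (0 = C ... 11 = B)."""
    weights = Counter()
    for midi_pitch, duration in notes:
        weights[midi_pitch % 12] += duration
    total = sum(weights.values())
    return {pc: w / total for pc, w in weights.items()}

def most_prevalent_pitch_class(notes):
    """Pick the pitch class with the largest share: a crude 'landmark' candidate."""
    profile = pitch_class_profile(notes)
    return max(profile, key=profile.get)

# Toy example: a wandering melody, alone and over a sustained D drone (MIDI 50).
melody = [(64, 1), (65, 1), (67, 1), (69, 1), (70, 1), (72, 1), (74, 1), (76, 1)]
drone = [(50, 8)]
print(most_prevalent_pitch_class(melody))          # → 4 (E)
print(most_prevalent_pitch_class(melody + drone))  # → 2 (D)
```

In this toy case the melody alone favours E only weakly, whereas adding the drone makes D account for 9 of 16 beats, which is one crude way of picturing a drone acting as a landmark in pitch space.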

To summarize, there are some constraints at work here that I think we ought to keep in mind when considering tonality in its various guises, namely that tonality should

  • be seen in relation to timescales
  • be seen in relation to instrumental features, in particular the use of drones as landmarks for tonality
  • be seen in relation to timbre, tuning, and interval size
  • allow for innumerable interval constellations (all kinds of modal scales)
  • be seen as a statistical and contextually emergent "more-or-less" phenomenon.


By way of conclusion, and as an overall comment on Mine Doğantan-Dack's article, it seems to me that known musical practices suggest that tonality emerges on the basis of constraints: basic physical constraints of instruments, such as the use of drones, as well as of timbre and tunings, and various neurocognitive constraints at different timescales. In a world music perspective, the use of drones is significant, as drones effectively become landmarks in pitch space to which a great variety of pitches may be related. The frequency of occurrence of a given pitch could also be seen as a basis for tonality, meaning that tonality is an emergent statistical phenomenon.

Related to the landmark function of drones and/or the sheer frequency of occurrence of some tone(s) in instrumental music, it could be tempting to speculate that effector position and effector shape play a role in sensations of tonality, for example as finger positions on strings or tubes, and, later in the development of musical technologies, as the position and shape of the hands on the keys of an instrument. We could also speculate that, at an even more general cognitive level, there could be a coupling of tonality with spatial position, enabling the gestural representation of pitches (as in sol-fa practice). Research on spontaneous body motion to music seems to suggest that pitch is the feature of musical sound most readily rendered and most agreed upon: people, regardless of training, tend to render "high" pitches with hands up and "low" pitches with hands down (Godøy, 2010). However, to what extent recurrent tonal centres would be rendered by more precise hand positions remains to be explored.

Lastly, it is essential to consider timescales when thinking about tonality, as there seem to be quite significant differences between local and more large-scale formal relations of tonality. If we concentrate on the local level, the combined embodied and instrument-based constraints mentioned above seem to allow for universal frameworks for generating some sensation of tonality; yet the diversity in how tonality is manifested in musical practices world-wide is quite astonishing.


  • Eitan, Z., & Granot, R.Y. (2008). Growing oranges on Mozart's apple tree: "inner form" and aesthetic judgment. Music Perception, Vol. 25, No. 5, pp. 397-417.
  • Galantucci, B., Fowler, C.A., & Turvey, M.T. (2006). The motor theory of speech perception reviewed. Psychonomic Bulletin and Review, Vol. 13, No. 3, pp. 361-377.
  • Godøy, R.I. (1997). Formalization and Epistemology. Oslo: Scandinavian University Press.
  • Godøy, R.I. (2003). Motor-mimetic music cognition. Leonardo, Vol. 36, No. 4, pp. 317-319.
  • Godøy, R.I. (2010). Gestural affordances of musical sound. In: R.I. Godøy & M. Leman (Eds.), Musical Gestures: Sound, Movement, and Meaning. New York: Routledge, pp. 103-125.
  • Godøy, R.I., & Jørgensen, H. (Eds.) (2001). Musical Imagery. Lisse (Holland): Swets & Zeitlinger.
  • Hindemith, P. (1941). The Craft of Musical Composition. London: Schott.
  • Leman, M. (1995). Music and Schema Theory: Cognitive Foundations of Systematic Musicology. Berlin: Springer.
  • Petitot, J. (1985). Morphogenèse du Sens I. Paris: Presses Universitaires de France.
  • Sethares, W.A. (2005). Tuning, Timbre, Spectrum, Scale. London: Springer.
  • Thom, R. (1983). Paraboles et catastrophes. Paris: Flammarion.
  • Tillmann, B., & Bigand, E. (2004). The relative importance of local and global structures in music perception. The Journal of Aesthetics and Art Criticism, Vol. 62, No. 2, pp. 211-222.

Copyright (c) 2013 Rolf Inge Godøy

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.


Beginning with Volume 7, No. 3-4 (2012), Empirical Musicology Review is published under a Creative Commons Attribution-NonCommercial license.

Empirical Musicology Review is published by The Ohio State University Libraries.


ISSN: 1559-5749