The idea that there exist correspondences between the dynamic principles of heavenly bodies, of our subjective experiences and of music is a very old one, and not confined to the West. In the earliest accounts of human music and musicality within ancient mythologies of China, Babylonia and India (Lippman, 1992, pp. 3-9), the constitution of the cosmos, of humans and of music are integrally related; and the Pythagorean tradition originating in ancient Greece attributes the powerful effects of music to the harmonic principles it shares with cosmic order. The similarities humans observe between musical motion and physical motion, on the one hand, and between musical motion and "inner motion" or emotion, on the other hand, constitute one of the recurrent themes in the history of Western musical aesthetics, and many non-Western cultures draw parallels between musical movement and the progression, regeneration and return perceived in the natural world as well as in psychological phenomena. Musical thought across different times and places appears to consistently attribute human musicality to something that goes deeper than the accidents of culture or nurture. Arguably the most significant development in contemporary music psychology has been the recognition precisely of this fact, that musical phenomena are indeed rooted in the hard-wired biological, i.e. perceptual, affective and motor, capacities of our species and not merely in the accidental features of cultural and historical circumstances.

A landmark in this connection has been the publication of A Generative Theory of Tonal Music (1983) by Fred Lerdahl and Ray Jackendoff, which identified music theory as an integral branch of cognitive science. Setting out to provide an account of human musicality by identifying the psychological principles underlying music perception, Lerdahl and Jackendoff (1983, p. 332) have argued that music theory, the traditional constructs of which rely upon such principles, "can provide central evidence toward a more organic theory of mind." Following the appearance of this influential work, with which music psychology is thought to have come of age (Sloboda, 2005, pp. 102-103), the discipline of psychology, which had been dominated by studies of visual perception and language throughout the twentieth century, placed musical behaviour on a par with other cognitive domains in exploring the mental capacities and principles of our species. Concurrently with this development, there has been increasing interest in studying the possible neurobiological and neuropsychological determinants of music.

The popularity of this research area has been further enhanced by a renewed interest in musical universals in ethnomusicological studies, which for the greater part of the twentieth century regarded diversity and non-universality of music as the basis of its disciplinary methodology (Nettl, 2005, pp. 42-49). In this connection, the recent emergence of evolutionary musicology as a field of enquiry in its own right (Wallin, Merker, & Brown, 2000) represents the culmination of the idea that behind the rich variety of musics around the world are the hard-wired similarities and features of the human mind and body that are part of our evolutionary heritage. As scientific investigations in relation to the evolution of our species make musical phenomena the focus of research with increasing frequency, empirical findings make it hard, if not impossible, to argue against "the universality of sensory, perceptual, and cognitive processes [underlying different music systems], independently of the social and musical culture" (Carterette & Kendall, 1999, p. 727). In the words of Steven Mithen, "rather than looking at sociological or historical factors, we can only explain the human propensity to make and listen to music by recognizing that it has been encoded into the human genome during the evolutionary history of our species" (Mithen, 2005, p. 1).

While systematically associated with language and other cognitive domains, musical behaviour — in accordance with one of the most remarkable findings of recent neuroscience — also displays a neural architecture specific to itself and to our species. Isabelle Peretz has provided compelling evidence that the neural networks that process language and music are dissociable, indicating domains that are at least partly independent, and that the capacity for music is not represented as a single entity in the brain but has different components, such that some of these can remain intact even when others are impaired. Peretz (2003, p. 192) has written that "neurological observations have consistently and recurrently suggested that music might well be distinct from other cognitive functions, in being subserved by specialized neural networks." The existence of such specialized brain structures signifies biological foundations for music upon which cultural variations can arise. Hence, even if "the larger scientific community is largely sceptical about links between music and biology" (Trehub, 2003, p. 3) — an attitude represented by Steven Pinker's conspicuous dismissal of music as "auditory cheesecake" (Pinker, 1997, p. 534) — recent work in evolutionary musicology, which brings together the expertise of comparative musicologists, neuroscientists, developmental psychologists, anthropologists, archaeologists and even zoologists, provides compelling evidence concerning the evolutionary significance of music, supporting the view that in the evolution and survival of humankind the role played by music has been no less crucial than the role played by language.

To date, most of the research on the evolutionary origins and significance of music has focused on those features that music shares with other cognitive domains, particularly with dance and language. Indeed, some of the prominent evolutionary accounts of music have proposed common origins for speech, music and dance 1 — as well as poetry and pretend play (Molino, 2000). Consequently, among the various features that make up the phenomenon of music, the most significant evolutionary roles have been given to meter and rhythm, which are attributes of the human capacity for both language and music as well as for organized bodily movement. 2 Accordingly, the ability to keep time and entrain physical movements to an external beat is assumed to have played a major role in the emergence of coordinated body movements, social bonding and group cohesion. Another feature of music that has received considerable attention in evolutionary accounts has been pitch contour, which is part of both musical and linguistic cognition. Processing of pitch contour constitutes the presumed evolutionary origin of vocal phrase formation and of richly communicative emotional expression (Brown, 2000).

Within this research profile in evolutionary musicology, certain features of music still remain neglected. In this article, I explore one of these features, a specifically musical capacity that appears to have a brain mechanism specially reserved for it: namely, tonality. I ask whether we can claim an evolutionary basis for the phenomenon of tonality, and whether it represents a biologically adaptive function in addition to being a cultural artifice in its particular manifestations. Taken in its broadest sense, "tonality" refers to the hierarchical organization of the pitch material around a single, central pitch, which is often used to evoke stability and closure. One of the basic functions of tonality is to shape musical movement by means of the functional hierarchy among the tones. In Western tonal music, the musical movement as it unfolds is often strongly directed especially because of a highly developed and structured harmonic system. While such a harmonic system is absent in non-Western musics, organizing the pitch material around a central pitch is a musical universal. 3 Here, I use the term "universal" in the sense of a "statistical universal" (Nettl, 2005, p. 48): as Bruno Nettl has argued, a practical way of exploring and discussing musical universals is by asking "whether there are features shared not by all but by a healthy majority of musics. We look for what is extremely common, substituting the concept of 'statistical' universals for what may be described as a 'true' universal" (Nettl, 2005, p. 48). While tonality is not necessary for music to exist — think of serial or electro-acoustic music — humanity evidently has chosen to make it a universal feature of musical practice.

In the great majority of musical systems, humans use specific scale formations as the basis of their music. Most significantly, they also assign a functional hierarchy to the members of their scales such that one of them behaves as a tonal centre. One of the other scale members often has a privileged relationship to the central tone, and prepares its arrival. Björn Merker has argued in this connection that "the cross-cultural ubiquity of tonal music, and the ability of listeners to perceive tonality in music employing unfamiliar scale systems, hints that it may have a deeper significance in the world of human music" (Merker, 2006, p. 33). The hierarchical ordering of pitches as an abstract construct is a distinctly musical phenomenon and has no counterpart in other cognitive domains: cognition of language and the processing of environmental sounds do not rely on the perception or understanding of a pitch hierarchy. Even though there are similarities between the uses of pitch in musical melodies and speech intonation, no known language involves a comparable tonal hierarchy. Knowledge of tonal hierarchy enables listeners to develop expectations for the occurrence of certain pitches, especially towards the end of a piece of music. Lerdahl and Jackendoff (2006, p. 53) have written that "psychological explanations alone do not explain why music is organized in terms of a set of fixed pitches organized [hierarchically] in a tonal space… We conclude that the mind/brain must contain something more specialized than psychoacoustic principles that accounts for the existence and organization of tonality." Indeed, research indicates that tonal cognition — or the tonal encoding of pitch (Peretz & Morais, 1989) — is neurologically dissociable from pitch discrimination, recognition of melodic contour, identification of timbre, and cognition of rhythm (Ayotte, Peretz, & Hyde, 2002). 
Brain damage can, for example, selectively impair tonal cognition such that some patients, while having no difficulty in processing pitch variations, are no longer able to judge melodic closure properly (Peretz & Coltheart, 2003). The existence of a specialized neural architecture behind tonality, which does not at the same time serve language, dance, or any other human capacity as far as we can tell, is intriguing and requires an explanation.

In working towards an explanation and some hypotheses, I revisit the ancient idea that postulates correspondences between the dynamic order of natural phenomena, of emotions and of music, and focus on the kind of movement generated by the functional hierarchy among the pitches of a musical system, i.e. tonal movement, since the starting point towards an evolutionary account of tonality is almost certainly related to its fascinating capacity to create an experience of movement — more specifically, of return and arrival — and the attendant experience of a spatial-temporal shape. In this connection, I will consider certain patterns of movement that are observed in the context of dynamic natural phenomena — patterns that function to stabilize dynamic systems — and will argue that the representation of such patterns of movement as pitch-based shapes may have had evolutionary significance for our species. Towards this end, a brief tour within the history of Western music theory will prove useful.


In Western musical thought, theoretical, critical and analytical explorations of tonality have been so thoroughly dependent on terminology and concepts related to movement that it is hard to imagine how one can even begin to talk about the phenomenon of tonality in this tradition without any reference to movement. Exclude all motion words from music theory, and it would be extremely difficult, if not impossible, to communicate verbally the easily appreciated meaning of such statements as "harmonic motion from tonic to dominant is functionally distinct from motion from dominant to tonic" (Morgan, 1998, p. 2), or "movement to the dominant domain creates a sense of tonal tension that is subsequently resolved by the descent back to the tonic" (Gauldin, 1997, p. 256). Western music theory has long been interested in the nature and source of the movement experience that the tonal features of music generate in listeners, and employing affect-based conceptualizations and terminology has been a frequent strategy in accounting for this phenomenon. Such a strategy is certainly not unique to Western musical thought: as Ian Cross has written, "The evocation of affect and the experience of movement appear intimately bound to music in many cultures" (Cross, 1999, p. 29), and any evolutionary account of tonality must reckon with this fact. In Western theory, some of the oldest models attribute the source of tonal movement to musical structures, such as dissonances, and refer to affective experiences by way of explanation. A fourteenth-century author of contrapuntal theory, for example, speaks of imperfect intervals "striving to attain" a more perfect interval (Cohen, 2001, p. 16). A theorist from the fifteenth century writes that a dissonant interval, which is imperfect in comparison to a consonant one, "ardently burns to attain that perfection" (Cohen, 2001, p. 16). One of the most influential theories proposed during the twentieth century, i.e. Schenker's organicist model of tonal music, is indeed a contemporary version of this anthropomorphic tradition in music theory. Schenker believed that musical tones have a "life" of their own and behave in accordance with their "will" such that each tone desires to become the root of a consonant triad. While this tradition, which gives a central role to dissonant structures in the generation of tonal motion, has occupied an eminent place in Western musical thought for centuries, it has also been significantly manifest in the music theories of other cultures. Even though the kinds of pitch combinations that are regarded as dissonances, and the manner of "resolving" them vary widely from culture to culture, dissonance as a determinant of expectation of movement to greater stability, as well as the fact of resolution, i.e. the existence of movement patterns from "restless" pitch structures towards "restful" ones, are universal phenomena (Carterette & Kendall, 1999).

Another frequent strategy to account for tonal motion in Western theory has concerned establishing conceptual and terminological connections between physical and musical movement. This strategy turns to notions of inertia, gravity and gravitational fields and to forces of attraction in explaining the generation of movement in physical and tonal spaces. It is in particular the concept of attractions — one of the most powerful theoretical tools of modern physics since the sixteenth century — that has found its way into music theory as tonal attractions in the writings of Jean-Philippe Rameau, François-Joseph Fétis, Jérôme-Joseph de Momigny, Ernst Kurth, Victor Zuckerkandl, and more recently Fred Lerdahl and Steve Larson. The common consensus in recent theory is that we experience and understand tonal movement by metaphorically transferring our embodied experience of physical forces such as gravity into the domain of music. The mechanism we employ in making this transfer is a basic cognitive capacity, namely cross-domain conceptual mapping, which allows us to conceptualize one kind of experience in terms of another, "preserving in the target domain the inferential structure of the source domain" (Lakoff & Johnson, 1999, p. 91). In other words, tonal motion is accounted for in terms of the source domain of physical motion; yet, it is not entirely clear why and how the source domain for the concept of attraction should be the physical world as most music theorists assume. I return to this issue below, but here it is worth noting that historically other writers who have used the concept of tonal attractions made different kinds of assumptions about it: Momigny, for example, believed the term "attractions" as used in music theory is not merely a metaphor but refers to a genuine structural similarity between the planetary and tonal systems. 
He wrote that "like the attraction recognized in physics in relation to the inertia of bodies, this attraction acts in inverse relation to distance: a tone that is only half a step away from the one that has to follow it is much more powerfully attracted by it, than were it [separated] by a whole step. Here is a new analogy I have discovered in nature, and that proves the marvellous harmony that reigns among the things least resembling one another in appearance" (Momigny, 1803-1806, p. 52). 4 Fétis, on the other hand, attributed the experience of such attractive forces between the tones to inherent organizational principles of the mind. He argued that the human mind cannot but perceive tonal relationships as based on attractions, similarly to necessarily perceiving the physical world through the Kantian experiential categories of space and time. In this context, Fétis explicitly denied acoustical or mathematical properties as determining tonal motion, and instead talked about a "mysterious law" originating in human cognition and governing the attraction and motion of sounds.


While contemporary accounts of tonal movement in terms of tonal attractions appear to explain why musical pitches are organized within a functional hierarchy, and around a stable pitch, they fail to explain one of the most important aspects of tonal organization, i.e. the existence of recurring patterns that lead to the stable pitch. These recurring patterns constitute what are technically known as cadences. In all world musics, tonal movement preceding the return of the central pitch is structured and not arbitrary. The stable pitch does not simply reappear, but "returns" following a process of returning, which involves recurring patterns of movement from instability towards stability. This feature is so fundamental that one can re-conceptualize tonality as the system for cadencing, made possible — to be sure — by the existence of a tonal centre. In thinking about the evolutionary origins of tonality, one has to be able to explain why in all musical cultures, cadences, rather than musical initiations, have been systematized and formalized. Humans apparently choose to create strongly memorable patterns for moving from relative tonal instability to stability, rather than for moving from relative tonal stability to instability. Hence cadences, but not tonal initiations, display universal features. For instance, Nettl writes that "in the vast majority of cultures most musical utterances tend to descend at the end, but they are not similarly uniform at their beginnings" (Nettl, 2005, p. 46). Any account of tonality — evolutionary or not — needs to be able to explain the significance, and the origins, of such memorable patterns of tonal movement that precede the return of the stable pitch.

In this connection, a revised model of attraction as we find in contemporary physics proves a useful theoretical tool (Milnor, 1985). According to this model, attractive forces — conceived as generators of movement patterns — explain the temporal behaviour of dynamical systems. When the equilibrium of a nonlinear system 5 is disturbed, the system reorganizes itself to reach a stable state. In such cases, the theory of self-organization posits so-called attractors. Dynamical systems display this kind of temporal evolution by being attracted to certain dynamical configurations — typically a steady oscillation that either repels or attracts neighbouring states of the system. The equilibrium state towards which all other states converge is regarded as an attractor: it draws all possible states of the system to itself and all possible trajectories come together at that particular configuration. Most significantly, the attractor in this case is not a place or an object but a dynamic shape or temporal pattern of movement, referred to as a "limit cycle". Such attractors are at the heart of most periodic processes observed in nature. Examples of natural systems that display stable and sustained attractive behaviour of this kind would be the beating of the heart, the neural discharge in the brain, the circadian rhythms of the 24-hour period in humans and animals, etc. Limit cycles are fundamental to periodicities, and they are everywhere in nature.
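As a purely illustrative aside, the convergence of trajectories onto a limit cycle can be sketched numerically. The minimal Python example below assumes a standard textbook system (the Hopf normal form in polar coordinates, with radial dynamics dr/dt = r(1 - r^2) and a constant angular velocity). This system, the function name `simulate_limit_cycle`, and all parameter values are assumptions chosen for illustration; they are not drawn from Milnor (1985) and do not constitute a model of tonality.

```python
import math


def simulate_limit_cycle(r0, theta0=0.0, omega=2.0 * math.pi, dt=0.01, steps=5000):
    """Integrate dr/dt = r * (1 - r**2), dtheta/dt = omega with forward Euler.

    Returns the list of (x, y) points visited and the final radius.
    The system is an illustrative assumption, not a model of tonality.
    """
    r, theta = r0, theta0
    points = []
    for _ in range(steps):
        # Radial dynamics: r grows when r < 1 and shrinks when r > 1,
        # so the unit circle acts as the attracting limit cycle.
        r += dt * r * (1.0 - r * r)
        # Angular dynamics: steady rotation, i.e. a sustained oscillation.
        theta += dt * omega
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points, r


if __name__ == "__main__":
    # Two hypothetical starting states, one inside and one outside the cycle.
    for start in (0.1, 2.0):
        _, final_radius = simulate_limit_cycle(start)
        print(f"start radius {start:.1f} -> final radius {final_radius:.4f}")
```

In this sketch, a trajectory that starts inside the cycle and one that starts outside it should both settle onto the same circular orbit. The point of the example is that the attractor is the recurring pattern of movement itself rather than a resting place.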

I propose to conceive of the phenomenon of tonality as a dynamical system that displays attractive behaviour as described above, in particular as a system for cadencing in terms of the (quasi-periodic) recurrence of certain pitch-based shapes that draw melodic trajectories to themselves (and harmonic trajectories, in the case of Western tonal music). It should be noted that what matters here is the very existence of a structural similarity between tonal practices and other kinds of phenomena rather than any terminological similarity: not all cultures would use the same kind of terminology and discourse to account for such structural similarities, but what is intriguing is the existence of a system of creative practice as music that displays a certain structural similarity to certain dynamic, temporal shapes that we observe in natural phenomena. In this connection, one would first need to understand the basis on which humans perceive such structural similarities between natural and musical phenomena: is it the case that our experience and understanding of movement across different domains in terms of stability and attractions originate in our observation of these attributes in the physical world of objects?

In this connection, I hypothesize that our recognition and identification of certain movement patterns in the physical world as a process of return to stability (as in limit cycles) is based on our capacity to generate and experience such patterns subjectively and intersubjectively in an embodied-affective manner; this capacity also forms the origins of tonality as I argue below. Movement patterns that periodically return to stability constitute the structure of an affective schema acquired very early in life. 6 Infants develop richly communicative psychological experiences and expressive behaviour before they walk or talk (Bloom, 1993; Stern, 1981, 1985). The earliest schemas humans develop, which are affective in nature, concern survival-enhancing interactions with parents and caregivers, and teach us about orientation, temporal progression, cause and effect, force, goal, and agency. The increasing dependence of human newborns on caregivers for survival during the course of evolution made these early interactions take on features that ensured sustained positive affect. It is believed that the sequences of vocal, facial, and kinesic movements that structure interactions between infants and caregivers played a central role in the affective, as well as cognitive, evolution of our species. Ellen Dissanayake has argued that such daily multimodal interactions between infants and caregivers become "ritualized" as they are repeated over and over again (Dissanayake, 2001, p. 389). These repeated patterns, involving changing intensities, tempos and shapes of multimodal movements accompanied by positive affective states, form the structure of arguably the earliest affective schema humans acquire in life, representing an affective process that is employed to make sense of the world at a very early stage. 
For the sake of my argument, I shall call this the attraction schema: one can think of such movement patterns as an attractor, or a psychological limit cycle towards which various affective exchanges are directed. The purpose of the multimodal movement patterns forming the basis of the exchanges between the infant and the caregiver is to provide a stable affective referential state, so that all negative psychological states can be steered back to it by enacting, and re-enacting, the various vocal, kinesic, haptic and facial components of the schema as and when required. We can speculate that during the course of evolution, caregivers began to create memorable pitch patterns as part of the vocal component of the attraction schema, employing them to return to the same pitch, and thereby marking the beginnings of tonality. It is crucial to note here that the temporal modulation of the kinesic, haptic, visual, and vocal components of the attraction schema is controlled by the underlying affective dynamics and its changing shapes: in other words, the different perceptual and expressive modalities always modulate congruously because they are all supported by the same affective dynamics. The practical impossibility of modulating vocal expression from sadness to joy while keeping a sad facial expression shows that the perceptual-expressive modalities are indeed controlled by an amodal affective system. As far as the origins of tonality are concerned, as the kinesic and visual components of the attraction schema moved towards an expression and experience of affective stability, the vocal component of the schema would have had to follow the dynamics of the affective shaping in the same direction, i.e. towards stability. In order to test this hypothesis, empirical research can address in detail the question of modulation of affect through the involvement of different modalities. 
For instance, by presenting infants with a pattern of tonal movement that modulates incongruously with the visual and kinesic components of the caregiver's ongoing affective communication, e.g. by playing a melody that moves towards instability while the visual and kinesic patterns of the caregiver move towards stability and vice versa, the causal relationships between these multimodal patterns presented in different combinations and the modulations that take place in the infant's affective state can be established.

What is important as far as tonality is concerned is that in its origins it is not the tones that are attracted to stable pitches, but it is the affective system that is attracted to stable states through all its different perceptual and expressive modalities. In other words, tonality is an integral component of the attraction schema acquired early in life, and its origins are to be located not in tones nor in tonal perception as such but in certain affective states. I would argue that the emergence of tonal encoding of pitch can be construed as a pre-linguistic stage in the evolution of modern humans, intimately related to the evolution of our affective capacities.

As the roots of the attraction schema, and thereby of tonality, are to be found in our pre-linguistic affective experiences and implicit memories of dynamic patterns that emerge as we interact with members of our species, I hypothesize that our capacity for recognizing movement patterns that converge towards stable states in diverse phenomena does not originate in our cognitive understanding of the physical world of objects and events as recent music theory claims. In other words, the source domain for the attraction schema is not the behaviour of physical objects or even merely our own physical movements in the physical world. There is evidence that affective understanding is rooted in embodied first-person feelings, and not in the mere observation of the actions and gestures of other agents or of the motions of natural phenomena (Damasio, 1999, p. 343). Accordingly, unless humans can experience affective states subjectively, their ability to recognize them in other agents is impaired. Patients with such impairment can still describe the movements they observe accurately in terms of shape, intensity and rhythm, but cannot attribute any affective content to them. In other words, the movement patterns do not constitute for them an affectively meaningful unit with a sense of purpose. To interpret the motions and dynamic shapes in the world as affectively meaningful, the first-person experience of one's own embodied feelings appears to be essential. We are able to comprehend and describe the movements of dynamical systems, such as those of the solar or tonal systems, in terms of affective concepts, such as attraction, precisely because our affective schemas support such descriptions. If we did not have access to affective schemas, it is not clear in what sense we would construe tonal, and even planetary, movement in terms of attractions. 
The dynamic order of the heavenly bodies and of music appears to our understanding as constituted through attractive forces generating stable states only because our affective experiences appear to our consciousness as dynamically regulated and directed towards stable states periodically.

The attraction schema that I put forward as the evolutionary basis of tonality involves several important features that need to be emphasized. Firstly, it is relational through and through in that the multimodal temporal shapes constituting the schema reflect non-linguistic, intersubjective exchanges or turn-taking. In the earliest stages of life, when the schema is first acquired, affect modulation is controlled heavily by the caregiver: it is believed that the earliest signs of self-regulation of affect appear around the age of six months, when infants appear to internalize some of the particulars of the affective schema "practised" by their caregivers (Thompson, 1994). In this connection, empirical research is needed to reveal how much tonal singing contributes to the emergence of self-regulation of affect at this early stage. For example, tests could be designed in order to establish the relationship between the amount of tonal singing a caregiver presents an infant with from birth onwards and the onset of self-regulation of affect in the infant; and to compare the effects of differently "weighted" affective schemas practised by caregivers (e.g. those that put more emphasis on visual affective exchange in comparison to tonal singing and vice versa) on the length of time infants take to start self-regulation of affect. Furthermore, existing empirical research on lullabies (e.g., Unyk, Trehub, Trainor, & Schellenberg, 1992; Trehub, Unyk, & Trainor, 1993; Trehub & Trainor, 1998) can be effectively extended to test the role of tonality in infants' lullaby preferences. Lullabies exist in every known culture and are universally employed to calm infants and induce sleep. By presenting infants with re-composed lullabies that do not return to the same tonality, empirical tests can explore whether infants prefer the original to the re-composed lullabies. 
Significantly, the role of the shape of the return to the original tonality in lullabies can be studied by presenting infants with lullabies that return to the original tonality abruptly, i.e. without the temporal pattern of cadencing, as well as with those that employ the process of returning, and observing which alternative they prefer. In addition, the experiment by Trehub, Unyk, and Trainor (1993), which provided evidence that adult listeners are able to identify whether the songs from a foreign culture represent lullabies, can be modified to use re-composed lullabies that do not return to the same tonality in order to further explore the role and universality of tonality in infant-directed song. The second point about the attraction schema is that it concerns an affective episode that extends in time; as such it is experienced as having a trajectory and a shape. The relationship between the perception and making of spatial shapes and the abstract temporal shapes that lived phenomena (such as narratives, emotions and music) represent is a complex and intriguing issue: the mechanisms put forward to explain this relationship include cross-modal mapping and cross-domain mapping.

There is extensive research indicating that both static and dynamic stimulations in one modality can influence the perception of information in another modality. Historically, one of the earliest theories in this connection was proposed by the Austrian philosopher and psychologist Christian von Ehrenfels, who is best known today for his article titled "Über Gestaltqualitäten" ("On Gestalt Qualities"), published in 1890. The central idea of this work, i.e. that our perceptions contain "form qualities" or Gestalten, which are not contained in isolated sensations, is often quoted. What is not so well known about Ehrenfels' famous article is that it also involves the development of an idea first proposed by the Austrian physicist and philosopher Ernst Mach on the perception of spatial and temporal shapes (Mach, 1865). Ehrenfels argued that each experience we have of a Gestalt or form in any sensory modality is cognized as structurally analogous to the experience of a spatial shape. In other words, spatial Gestalten serve in his view as references for our comprehension of forms or shapes in other modalities. An immediate implication of this idea is that concepts related to the perception and experience of spatial shapes can be applied to shapes extended in time. Indeed, the idea that there are similarities of form between different fields of experience is one of the most important conclusions of Ehrenfels' article. During the twentieth century, various authors including Heinz Werner (1948), Susanne Langer (1942), Lawrence Marks (1978) and Daniel Stern (1985) argued along similar lines for the existence in our minds of abstract "amodal" forms that we utilize in making sense of the world through different modalities of perception.
The attraction schema that I propose, which manifests itself as felt temporal shapes, is thus also experienced and understood in terms of spatial shapes and trajectories: I would even argue that the representation of the attraction schema as spatial trajectories provides the very first step towards experiencing space as existing beyond, and more abstractly than, the locatedness we sense through the immediate kinaesthetic configurations of our bodies. In other words, it could be that the attraction schema provides the essential basis for developing a concept of space that goes beyond the immediate percept of space.


I have argued so far that the originary function of tonality is to structure the vocal component of the attraction schema by creating pitch processes experienced as movement towards stability. The capacity to structure pitch materials in this way has far-reaching consequences as far as the development of other mental capacities is concerned. These consequences implicate two inter-related areas, both of which can be regarded as having played significant roles during the evolution of humankind:

  1. Emergence of the capacity to experience and structure psychological spaces;
  2. Emergence of the capacity to create narrative structures.

Little work has been done on how psychological spaces arise and what their nature is. One of the first to theorize about these kinds of spaces was the Gestalt psychologist Kurt Lewin, who attempted to explain human behaviour by reference to psychological spaces governed by driving and hindering forces (Lewin, 1935, 1936). Such psychological spaces would be organised in terms of trajectories and dynamic shapes. The human ability to move in psychological spaces, to project oneself into different hypothetical situations and hypothetical times, and the cognitive flexibility this brings are unmatched in the animal world, and must have had crucial evolutionary significance. It is possible that tonal movement structured around a functional pitch hierarchy brought with it the earliest kind of psychological space with a clear orientation, indicating unambiguously the place of rest and stability. The secure knowledge that there is a fixed place of stability in this space may have provided humans with the capacity to imagine mechanisms for steering affect back to it from many diverging states, using many different kinds of routes. In terms of evolution, the psychological space established by tonality may have played a crucial role in affective development, by making it possible to regulate affective states and to steer affect towards stability. Originating in the dyadic relationship of a newborn, the psychological spaces humans move in become more and more differentiated and complex in adult life. Consistent and stable referential states are crucial in the development of psychological spaces, and the cadencing that tonality enables functions as a structuring principle in this regard: cadences provide "islands of consistency" (Stern, 1985, p. 45) around which a psychological landscape can be structured.

The emergence and experience of psychological spaces are closely related to one of the most significant features of tonality and of the attraction schema, namely its capacity to structure processes of return and arrival and thereby end-states. There is evidence in research that the representation of end-states as stable states plays a crucial role in various mental and bodily functions (Aarts & Elliot, 2012; Schmidt & Lee, 2005). Trajectories and dynamic shapes that lead to end-states appear to have motor, cognitive and affective significance. In daily locomotion, for example, the majority of our physical interactions with the world are organised as motor processes oriented towards physical targets, i.e. they are goal or end-state oriented. Such locomotion involves sophisticated abilities such as object representation, trajectory prediction, etc., and humans constantly negotiate trajectories or temporal shapes when they move in physical space. There is evidence indicating that the mental representation of the goal-state drives the motor trajectory, and that the trajectory in turn determines the movement dynamics; furthermore, there is evidence that goal-oriented locomotion is not planned as a succession of steps, but as a complete locomotor spatial trajectory driven by the representation of the end-state (Hicheur, Pham, Arechavaleta, Laumond, & Berthoz, 2007). Cognitive-affective daily tasks also appear to rely on the representation of end-states. For example, when people make intensity judgements of their affective experiences, end-intensities (in addition to peak intensities) have the strongest effect on overall intensity judgement (Kahneman, 1999). There appears to be something fundamental about the representation of end-states in our physical interaction with the world, as well as in the structuring of our affective experiences. Perhaps there is, after all, a neurological basis to Shakespeare's "All's well that ends well" motto.

I hypothesize that the representation of a structured return to certain end-states as stable states is also essential for making sense of temporal experiences during the pre-linguistic stage of development, and must have been significant also during human evolution. Humans can connect together and give meaning to an otherwise disconnected set of events only by surveying the past from a psychologically stable point. Imagine a world where we never experience arrival at and return to psychological stability, where events follow one another in a chaotic fashion; it then becomes difficult, if not impossible, to conceive how we would assign relational significance and meaning to events. Without stable points from which to look back on what happened, human memory would not be what it is. It is only because we can periodically "return" to stable psychological states that we can interpret past events in the manner of a narrative, with more or less clearly marked departures and arrivals. Time becomes human time when we can assign narratives to our temporal experiences. The beginning of all narrative structures can be found in cadencing, i.e. the essence of tonality. In evolutionary terms, the ability to construct linguistic narratives is built upon this pre-linguistic affective-tonal capacity, which made it possible for humans to unify the events within a temporal span through the structuring power of the stabilizing cadence. The evolution of the tonal encoding of pitch, therefore, must have had an adaptive role in that it created the possibility of giving meaning to temporal experiences of longer and longer stretches, which underlies the capacity of modern humans to form autobiographical memories and narratives.

In conclusion, I hope in this article not only to have opened up debate about the possible biological bases of the "resilience of tonality" (to borrow a term from Adorno), but also to have drawn attention to the continuity between diverse natural phenomena and our motor-cognitive-affective experiences, all displaying intriguingly similar temporal, dynamic shapes. Much work, of course, remains to be done to better understand the correspondences between spatial shapes and the temporal dynamic course of long-range, lived phenomena such as narratives, emotions and music.


  1. See Chapters 16, 17, 18, and 22 in The Origins of Music, N. L. Wallin, B. Merker, & S. Brown (Eds.), 2001, Cambridge, MA: MIT Press.
  2. Although metric time-keeping is widely considered to have evolved in the context of musicking and dancing in groups (see, for example, "An Introduction to Evolutionary Musicology" by Brown, Merker and Wallin in The Origins of Music: 12), the kind of sophisticated metric hierarchy that is observed in musics all around the world is not a feature of either language or dance. Lerdahl and Jackendoff (2006) have noted that metrical structure is not widely shared by other cognitive systems, and in that sense represents a striking contrast with grouping structures, which are observed very commonly in many different domains.
  3. See for example Chapters 1, 2, 3, 5, 9 and 10 in Analytical Studies in World Music edited by Michael Tenzer (New York: Oxford University Press, 2006), which includes discussions about tonality in this broad sense, specifically in Xorasani maqam from Iran, Bulgarian Horo, Flamenco from Andalusia, music of the Aka from Central Africa, South Indian ragas, and Western classical music respectively. See also Blacking (1970).
  4. Translation from the French original by the author.
  5. A system whose output is not directly proportional to its input; to put it simply, a nonlinear system is one whose behavior is not the sum of its parts or their multiple; e.g. fluid flow.
  6. In contemporary psychology, the term "schema" is used to refer to structured information about the similarities between different experiences. Schemas are "organized sets of memories about sequences of events or physical scenes and their temporal and spatial characteristics, which are built up as we notice regularities in the environment" (Snyder, 2000, p. 95). The commonalities shared by different situations occurring at different times are abstracted so as to become a "memory framework" (Snyder, 2000, p. 95), and to function as a schema. Schemas, which operate unconsciously, shape our expectations about various phenomena, and guide the processing of new information as well as the retrieval of information stored in memory. Affective schemas carry pre-conceptual information about our responses to the environment and to other people.


  • Aarts, H., & Elliot, A.J. (2012). Goal-Directed Behavior. New York: Psychology Press.
  • Ayotte, J., Peretz, I., & Hyde, K. (2002). Congenital amusia: A group study of adults afflicted with a music-specific disorder. Brain, Vol. 125, No. 2, pp. 238-251.
  • Blacking, J. (1970). Tonal Organization in the Music of Two Venda Initiation Schools. Ethnomusicology, Vol. 14, No. 1, pp. 1-54.
  • Bloom, L. (1993). The transition from infancy to language: Acquiring the power of expression. New York: Cambridge University Press.
  • Brown, S. (2001). The 'musilanguage' model of music evolution. In: N.L. Wallin, B. Merker, & S. Brown (Eds.), The Origins of Music. Cambridge, MA: MIT Press, pp. 271-300.
  • Carterette, E.C., & Kendall, R.A. (1999). Comparative music perception and cognition. In: D. Deutsch (Ed.), The Psychology of Music. London: Academic Press, pp. 725-791.
  • Cohen, D.E. (2001). The imperfect seeks its perfection: Harmonic progression, directed motion, and Aristotelian physics. Music Theory Spectrum, Vol. 23, No. 2, pp. 139-169.
  • Cross, I. (1999). Is music the most important thing we ever did? Music, development and evolution. In: S. Won Yi (Ed.), Music, Mind and Science. Seoul National University Press, pp. 1-39.
  • Damasio, A. (1999). The feeling of what happens: Body and emotion in the making of consciousness. New York: Harcourt Brace.
  • Dissanayake, E. (2001). Antecedents of the temporal arts in early mother-infant interaction. In: N.L. Wallin, B. Merker, & S. Brown (Eds.), The Origins of Music. Cambridge, MA: MIT Press, pp. 389-410.
  • Gauldin, R. (2006). Harmonic Practice in Tonal Music. New York: W.W. Norton.
  • Hicheur, H., Pham, Q.-C., Arechavaleta, G., Laumond, J.-P., & Berthoz, A. (2007). The formation of trajectories during goal-oriented locomotion in humans. The European Journal of Neuroscience, Vol. 26, No. 8, pp. 2376-2390.
  • Lerdahl, F., & Jackendoff, R. (1983). A Generative Theory of Tonal Music. Cambridge, MA: MIT Press.
  • Lerdahl, F., & Jackendoff, R. (2006). The capacity for music: What's special about it? Cognition, Vol. 100, No. 1, pp. 33-72.
  • Kahneman, D. (1999). Objective happiness. In: D. Kahneman, E. Diener, & N. Schwarz (Eds.), Well-being: The Foundations of Hedonic Psychology. New York: Russell Sage, pp. 4-25.
  • Lakoff, G., & Johnson, M. (1999). Philosophy in the flesh: The embodied mind and its challenge to western thought. New York: Basic Books.
  • Langer, S. (1942). Philosophy in a New Key. Cambridge, MA: Harvard University Press.
  • Lewin, K. (1935). A Dynamic Theory of Personality. New York: McGraw-Hill.
  • Lewin, K. (1936). Principles of Topological Psychology. New York: McGraw-Hill.
  • Lippman, E. (1992). A History of Musical Aesthetics. Lincoln: University of Nebraska Press.
  • Mach, E. (1865). Bemerkungen zur Lehre vom räumlichen Sehen. Zeitschrift für Philosophie und philosophische Kritik, Vol. 46, pp. 1-5.
  • Marks, L.E. (1978). The Unity of the Senses: Interrelations Among the Modalities. New York: Academic Press.
  • Merker, B. (2006). Layered constraints on the multiple creativities of music. In: I. Deliège & G.A. Wiggins (Eds.), Musical Creativity: Multidisciplinary Research in Theory and Practice. Hove: Psychology Press, pp. 25-41.
  • Milnor, J. (1985). On the concept of attractor. Communications in Mathematical Physics, Vol. 99, No. 2, pp. 177-195.
  • Morgan, R.P. (1998). Symmetrical form and common-practice tonality. Music Theory Spectrum, Vol. 20, No. 1, pp. 1-47.
  • Mithen, S. (2005). The Singing Neanderthals: The Origins of Music, Language, Mind and Body. London: Weidenfeld & Nicolson.
  • Molino, J. (2001). Toward an evolutionary theory of music and language. In: N.L. Wallin, B. Merker, & S. Brown (Eds.), The Origins of Music. Cambridge, MA: MIT Press, pp. 165-176.
  • Momigny, J.-J. de. (1803-1806). Cours complet d'harmonie et de composition. Paris: chez l'auteur.
  • Nettl, B. (2005). The study of ethnomusicology: Thirty-one issues and concepts. Urbana: University of Illinois Press.
  • Peretz, I. (2003). Brain specialization for music: New evidence from congenital amusia. In: I. Peretz & R. Zatorre (Eds.), The Cognitive Neuroscience of Music. New York: Oxford University Press, pp. 192-203.
  • Peretz, I., & Coltheart, M. (2003). Modularity of music processing. Nature Neuroscience, Vol. 6, No. 7, pp. 688-691.
  • Peretz, I., & Morais, J. (1989). Music and modularity. Contemporary Music Review, Vol. 4, pp. 277-291.
  • Pinker, S. (1997). How the Mind Works. London: Allen Lane.
  • Schmidt, R., & Lee, T. (2005). Motor Control and Learning: A Behavioral Emphasis. Champaign: Human Kinetics Publishers.
  • Snyder, B. (2000). Music and Memory: An Introduction. Cambridge, MA: MIT Press.
  • Sloboda, J. (2005). Exploring the Musical Mind: Cognition, Emotion, Ability, Function. New York: Oxford University Press.
  • Stern, D. (1981). The development of biologically determined signals of readiness to communicate, which are language 'resistant'. In: R. E. Stark (Ed.), Language Behaviour in Infancy and Early Childhood. New York: Elsevier/North Holland, pp. 45-62.
  • Stern, D. (1985). The Interpersonal World of the Infant. London: Academic Press.
  • Tenzer, M. (2006). Analytical Studies in World Music. New York: Oxford University Press.
  • Thompson, R.A. (1994). Emotion regulation: A theme in search of definition. Monographs of the Society for Research in Child Development, Vol. 59, pp. 25-52.
  • Trehub, S.E. (2003). Musical predispositions in infancy: An update. In: I. Peretz & R. Zatorre (Eds.), The Cognitive Neuroscience of Music. New York: Oxford University Press, pp. 3-20.
  • Trehub, S.E., & Trainor, L.J. (1998). Singing to infants: Lullabies and play songs. Advances in Infancy Research, Vol. 12, pp. 43-77.
  • Trehub, S.E., Unyk, A.M., & Trainor, L.J. (1993). Maternal singing in cross-cultural perspective. Infant Behavior and Development, Vol. 16, No. 3, pp. 285-295.
  • Unyk, A.M., Trehub, S.E., Trainor, L.J., & Schellenberg, E.G. (1992). Lullabies and simplicity: A cross-cultural perspective. Psychology of Music, Vol. 20, No. 1, pp. 15-28.
  • von Ehrenfels, C. (1890). Über Gestaltqualitäten. Vierteljahrsschrift für wissenschaftliche Philosophie, Vol. 14, pp. 242-292.
  • Wallin, N.L., Merker, B., & Brown, S. (2001). The Origins of Music. Cambridge, MA: MIT Press.
  • Werner, H. (1948). Comparative Psychology of Mental Development. Chicago: Follett Publishing.