The enigma of the voice is rich and profound because of all the things to which it seems to be responding.

(Derrida, 1967, p. 13)

Scholars associated with the "vocal turn" in music studies have tended to speak about the voice as something in between. The voice's force is said to reside in its contingency and abstraction; its slippery relation to representation; and its role in the circulation of interstitial things like affect, identity, and difference: "voice is nothing if not relational, always situated at boundaries" (Feldman, 2015, p. 658). 1 Theorists of evolution have long shared with music scholars this conception of voice as a mediate entity, something that muddies the usual distinctions between nature and culture, human and animal, language and music. From Charles Darwin to Steven Pinker, the puzzle of communication's origins has led to and emerged from speculation about the vocalic dimensions of music and language. 2 One of the most thorough evolutionary theories of voice appears in Gary Tomlinson's (2015) narrative history of music's origins, A Million Years of Music: The Emergence of Human Modernity. Tomlinson identifies the voice as an important means by which our ancient hominin ancestors negotiated their environments and others as well as an important philosophical problem in and of itself. A Million Years of Music presumes to establish the central role of vocality in the emergence of modern music and language. Further, it offers a framework for rethinking the role of the voice in subject formation and communication, its power to enact and disrupt meaning, and its connections to emotion and agency.

While A Million Years of Music represents a significant contribution to theories of evolution, Tomlinson offers nothing new in the way of archaeological or paleontological discoveries (nor does he purport to). His book is rather a reinterpretation of available research and a humanistic one at that. Tomlinson is influenced by philosophers and critical theorists like Bernard Stiegler, Jacques Derrida, Manuel de Landa, and Kim Sterelny. He works to identify and critique concepts of agency, aesthetics, and culture as they appear within archaeological research. For instance, Tomlinson draws substantially on the thinking of French archaeologist and paleoanthropologist André Leroi-Gourhan, whose mid-20th-century theory of human socio-cultural development anticipated the notions of bottom-up performativity, embodied cognition, and emergent complexity that are now in vogue. 3 In sum, Tomlinson's materials are archaeological, but his methods are humanistic.

In the remainder of this review essay, I extract an evolutionary theory of voice from Tomlinson's A Million Years of Music and bring it into musicological conversations about subjectivity, embodiment, and emotion. Given that a number of reviews of this book have previously been published, I avoid rehashing observations that have already been made. 4 I first offer a summary of the book's main themes and intellectual infrastructure in order to assess its distinct approach to evolutionary theories of communication. I then outline its theory of voice and describe the utility of that theory to music scholars. The value of Tomlinson's vocal theory lies in his rethinking of the meanings of voice before the emergence of recognizable human agency. This enables music scholars to reconsider some of our foundational ideas about voice, including its role in human subjectivity and social exchanges. Discovering that Tomlinson's theory of voice suggests an unmediated link between voice and emotion, I consider the risks of viewing the voice as inherently emotive. Thus, I argue that a deeper theorization of the nature and function of emotion in ancient vocality is needed.

A New Evolutionary Theory of Voice

Some music scholars are likely to be suspicious of evolutionary theories of voice due to their presumed biologism. Tomlinson's evolutionary theory of voice shows awareness of the risks of theorizing the origins of music from within musicology. Consider, for instance, his response to a central question within evolutionary musicology: does music play an active role in human evolution, or is it a cultural invention without adaptive function? To say that music does play an active role in human evolution is to align oneself with the adaptationist tradition, which seeks to explain how music helps humans to survive and reproduce, or is otherwise biologically essential to our understandings of "the modern human." Nonadaptationism, on the other hand, sees music as superfluous, unnecessary for survival, even decadent; in Steven Pinker's (1997) words, music is "auditory cheesecake" (p. 534). 5 Tomlinson, however, rejects both positions. In place of the adaptationist/nonadaptationist distinction, he develops a biocultural approach that treats musical behaviors as emergent properties of embodied interactions with an ever-changing socio-material environment. Drawing on recent evolutionary biology, he demonstrates that the question "is music essential to humans?" is reductive, both in its conception of music and its conception of human evolution.

For Tomlinson, the story of music's origins cannot simply be the story of music. It is, rather, the story of the acquisition of various cognitive competencies within the hominin line that have come to define both human modernity and modern musicality: mimesis, joint attention, entrainment, and recursive mindreading. 6 Although Tomlinson structures his book as a historical narrative, he notes that these cognitive competencies did not develop teleologically or even linearly. Relying on dynamic systems theory, Tomlinson tracks the gradual, incremental, and nonlinear materialization of hominin cognition and its aggregation with patterns of sociality and communication, out of which modern music and language "fell out, as belated emergences" (p. 12). 7

Tomlinson sets the scene long before the known origins of music, with the invention of Acheulean bifaces, prehistoric stone implements flaked on both sides. The process of creating and using such tools was bound up with rudimentary vocal and gestural communication, or "gesture-calls." As Tomlinson explains, gesture-calls are the spontaneous vocalizations or physical gestures produced alongside "emotion" and "intention" (see esp. pp. 106–112). Gesture-calls developed prior to recognizable agency, which means they rely on co-present, face-to-face interactions. Tomlinson's conception of the nonagential invention of Acheulean bifaces, and of the role of nonagential vocalic and gestural communication in that process of invention, implies a "technosociality." Technosociality is the crucial binding of the technological and the social, the idea that technology is shaped by a matrix of social interactions, which are in turn shaped by technology. The philosopher Bernard Stiegler (1998), an important influence on Tomlinson, encapsulates the concept of technosociality in two questions: Who or what does the inventing? And, who or what is invented? 8 For Tomlinson this launches a Derridean line of inquiry into ideas about the material transmission of information and the emergence of the inscribed sign (the grammè). I will say more about Tomlinson's Derridean detour in a moment.

Tomlinson holds that the origins of music are twofold: "Musicking was always technological" and "[m]usicking was always social" (p. 48). He begins his account a million years ago with flaked stone implements (again, Acheulean bifaces) discovered in the French town of Saint-Acheul, describing the social and mental actions that facilitated their manufacture. Technosociality serves as the horizon of ancient hominin existence. Following pioneering archaeologists like Clive Gamble, Iain Davidson, and William Noble, Tomlinson argues that at the time when these tools were invented, a recognizable human agency was absent. This implies that early hominins created sophisticated tools "without planning to do so" (p. 51), using "gestural sequences," absent of reference, implication, or logical form:

These gestural sequences should not be thought of as a semantics of toolmaking. This would be to distinguish in anachronistic fashion an action from its conceptualized implications. The operational sequences carried no implications or abstractable concepts and were nothing more than patterns of movement and registers of difference for the hominins that witnessed and performed them. They eventuated in stone tools, but they did not signal or forecast them; they generated cognition more than being generated by it, we might almost say. Their transmission was founded on a kind of agency that emerged not from ideation but from the play of intersecting contexts of actions—even coupled contexts of action, as we shall see—that together formed the taskscape. (p.71, my emphasis)

In other words, stone tools were not the products of action plans or mental templates. Rather, they emerged from the imbrication of available materials and patterns of sociality, and they simultaneously influenced the hands and minds of the beings that facilitated their creation. In Acheulean toolmaking we discover our ancient hominin ancestors shaping and being shaped by the "rhythms of their techne" (p. 87).

In describing the inherent connection between matter and sociality, Tomlinson makes use of Leroi-Gourhan's (1993) notion of the chaîne opératoire (operational chain). The chaîne opératoire is a "succession of gestures" where the social and material formed an aggregate; "from their meeting a stone tool emerged" (p. 63). Take special note of the choice of language: "a stone tool emerged." The passive voice is a feature of A Million Years of Music, and the means by which Tomlinson stylizes the non-teleological play of our ancient hominin ancestors. With Leroi-Gourhan, Tomlinson imagines a means of manufacture "without planning, foresight, or mental image of the product to come" (p. 86). Tomlinson links this idea to Derrida, who argued that by going beyond "intentional consciousness," the grammè appears according to a new structure of non-presence (p. 64). This Derridean response to Leroi-Gourhan discovers the emergence of the sign absent of recognizable agency. In its place is what Tomlinson terms an "earliest poiesis" that relies on "no mode of abstraction, no cognitive distance, no knowing craftsman; it was poiesis from the bottom up" (p. 87).

Tomlinson follows Stiegler (1998) in drawing on Leroi-Gourhan's (1993) "universal technical tendency" to theorize the coevolution of technological development (technogenesis) and biological and social developments (anthropogenesis). Tomlinson criticizes Stiegler, however, for using his discussion of the integration of technological being and biosocial being to "turn from Derridean possibilities beyond 'intentional consciousness'… back to a Heideggerian model in which technology as poiesis is founded in an anticipation or foresight arising with the temporality of Dasein" (p. 86). Stiegler's figuration of the early hominin strikes Tomlinson as too agential and therefore too modern. In opposition to Stiegler, he seeks to describe the "nearly imponderable" idea of an early hominin technological tradition "that knows little self-possession and no gathering-together-in-advance, that results in products but does not thereby realize a future" (p. 87). While Stiegler aims to demonstrate how early hominins represented the unconcealing of a Heideggerian Dasein, Tomlinson centralizes a Derridean notion of non-presence. Non-presence enables Tomlinson to imagine a technological era prior to the two imagined by Heidegger, an era defined by "pre-sapient, primordial, nonhuman Dasein" (p. 88). Non-presence is what yields "Acheulean possibilities" (poiesis, entrained operational sequences, mimetic traditions, etc.), which emerge in the absence of recognizable human consciousness, representation, or rational planning.

Tomlinson's model of Acheulean life incorporates a rudimentary mode of communication. "Gesture-calls" are basic, physical, and vocalic means of communicating emotion and intent. Like tool-making, voice-making was a technosocial phenomenon: "[voice] was a construction" (p. 89). Closely related to gesture-calls is "protodiscourse": "the negotiation of intersubjectivity through vocalization and gesture but without language" (p. 17). Protodiscourse is a fraught area of research; Tomlinson's thinking is unique for its de-emphasis on lexical language. He explicitly critiques the scientific tradition that positions language's presumed logic and clarity as the key to, if not the telos of, human modernity. 9 Indeed, a key motivation behind A Million Years of Music is the need for a broader approach to early-hominin vocalization than the "post-Chomskyan generative-grammar linguocentrism," or "syntactocentrism" (p. 106) that dominates thought about the origins of human communication. Tomlinson's account tracks a parallel trajectory for music and language, but unseats language from its place of conceptual privilege, with the audacious goal of securing music's role in the emergence of human modernity. But even as he skillfully navigates away from a linguocentric model of protodiscourse, he does not offer a music-centric model in its place. He warns that fantasies of a "protomusic"—like those of a "protolanguage"—lead to "fruitless teleology" of the sort presumed by Vico, Rousseau, or Darwin.

Voice is an important element of Tomlinson's approach to the music-language relationship, and his descriptions of it add up to a distinct approach to subjectivity. He simultaneously rejects linguocentrism and foregrounds the role of a Derridean notion of non-presence within ancient vocal communication. By doing both, A Million Years of Music provides an implicit response to Derrida's ideas about voice and presence, as articulated in his critique of Husserl's theory of the subject. For Husserl, the subject is animated by the act of silently speaking to oneself. Husserl proposes a compulsory connection between voice and logos to justify a sense of self-presence that Derrida would in turn deconstruct. Musicologists have noted the challenge of turning to the voice without also returning to the metaphysics of presence—Brian Kane (2015) dubs this the vocal turn's "Derridean impasse" (p. 672). Husserl's vision of a high-fidelity voice-to-ear circuit yields an autonomous and integral subject, known fully to itself, by itself. Derrida critiques Husserl's idealization of the voice and his assertion of (in Tomlinson's words) "the unity of thought and voice in the logos," arguing that the "privilege of being cannot resist the deconstruction of the word" (p. 64). In other words, when the subject speaks, it still hears itself as if it were other. There is a self-other division contained within the subject's voicing and auditing of itself. And within that self-other division, deconstruction begins. Tomlinson's evolutionary theory of voice avoids the Derridean impasse, in which the voice becomes the guarantor of subjectivity, in two ways:

  1. It refuses to reduce voice to logos.
  2. It does not fasten the voice to presence. Furthermore, it does not rely on familiar strategies for avoiding the metaphysics of presence by, say, figuring the voice as an index of material uniqueness, as thinkers like Adriana Cavarero do, or figuring voice as a failure to guarantee presence, as in the Lacanian tradition. 10

Though Tomlinson does not explicitly describe his project as a response to Derrida, he employs Derridean resources to point beyond the capacity of intentional consciousness for ancient technological activity and to imagine an ancient poiesis with "no mode of abstraction, no cognitive distance, no knowing craftsman" (p. 88). This "accidental poiesis" stands outside Heidegger's figuration of two technological eras, and apart from metaphysics as such. Tomlinson moves toward "the original and non-empirical space of nonfoundation" described by Derrida (1967) as the undercurrent beneath presence, that is, toward "the irreducible emptiness from which the security of presence in the metaphysical form of ideality is decided and from which this security removes itself" (p. 6).

To summarize: Tomlinson's strategy is similar to the Lacanian strategy, in that it assumes the voice was never a site of presence, but it differs in that it does not figure the voice as a kind of gap or lack. Rather, Tomlinson's evolutionary voice functions as a tool for testing environmental and social affordances in the absence of recognizable agency. In this sense, the voice is an expression of the "universal technical tendency," an asubjective techno-logic that invents and is invented by the vocalizing body's contact with its material surroundings.

Musicologists have already begun to put Tomlinson's thinking to work. Carolyn Abbate (2016), for instance, makes use of Tomlinson's figuration of technosociality as a means to conceive of musical instruments as prostheses. In Abbate, musical instruments are not mere things to be "put to use." They are also agents that actively shape the cognition and corporeality of their "users." In other words, musical instruments are "users" (Abbate, 2016, p. 804). But Tomlinson actually went further than Abbate:

[T]his early hominin voice was not merely an innate one, elicited by external stimuli in preprogrammed ways and involving little voluntary control and social complexity. Instead, it had already begun to shift along the biosocial spectrum toward modest voluntary control and social complexity. It was a construction molded […] by encounters with others amid the materials, dangers, and rewards of the environment (p. 89).

In other words, the voice itself might have emerged as a kind of prosthesis. 11


For Tomlinson, early communicative modes (again, gesture-calls) conveyed emotion and intention:

Gesture-calls are a communicative mode of copresence and immediacy, eloquent and emotive in the kind of face-to-face expressions and responses that dominated interactions on Lower Paleolithic taskscapes […] As the situations in which the calls were deployed grew more intricate, fostering a more richly contextualized deictic deployment of them, their emotional and intentional messages must have gained precision and carried new informational payloads. (pp. 111–112)

While Tomlinson's thinking is evocative for figuring emotion and embodiment as the foundations of communicative praxis, he does not explain in detail why things like gesture-calls and protodiscourse are automatically emotive—they simply are taken to be. In terms of the literature on emotion he cites, he does not dive to the same depths that he does when theorizing, for instance, mimesis, discrete pitch, or emergent symbolism. Given that he sees emotion as central to gesture-calls and protodiscourse (and to technosociality more broadly), this strikes me as a significant omission and a rich area for further research.

I see a significant challenge arising from any research program that presumes a direct connection between voice and emotion. For Tomlinson, ancient vocalic praxis arises without recognizable subjectivity. And yet, emotion and subjectivity have long been viewed as intimately entwined; apparently, as Fredric Jameson (1991) put it, there must be a "self present to do the feeling" (p. 15). Voice, too, "promises a subject; it excites or haunts a listener to recognize in the voice a 'someone,'" in the words of Brandon LaBelle (2014, p. 6). How can we think the emotiveness of voice without also presuming a subject who feels that emotion? In Feeling in Theory, Rei Terada (2001) thinks through the problem of emotion after "the death of the subject" (p. 1). Terada deconstructs the presumed link between emotion and subjectivity and exposes the ideological power given to emotion as "proof of the human subject"; "we would have no emotion if we were subjects," she provocatively claims (p. 4). But conceptualizing emotion after the death of the subject can only get us partway toward emotion before the birth of the subject. Terada, for instance, insists "emotion requires the death of the subject" (p. 4, emphasis added). Obviously, this will not hold for a pre-subjective theory of emotion. Nevertheless, it can raise questions for such a theory to grapple with: How does voice emote without agency? To whom is voice recognizably emotive? What effects do pre-subjective emotions have? What types of emotions are (mis)conveyed through voice?

Tomlinson's implied link between voice and emotion also mirrors arguments proffered by nineteenth-century theorists of music's origins. For instance, the philosopher Herbert Spencer (1857), who introduced an evolutionary theory of the origins of music two years prior to Darwin's (1859) On the Origin of Species, argued that music is grounded in an axiom of physiology known as "reflex action," that is, the direct connection between emotion and movement:

All music is originally vocal. All vocal sounds are produced by the agency of certain muscles. These muscles, in common with those of the body at large, are excited to contraction by pleasurable and painful feelings. And therefore it is that feelings demonstrate themselves in sounds as well as in movements. (Spencer, 1857, p. 397)

Charles Darwin (1872), an admirer of Spencer's ideas about music, traced the origins of music to the sonic behaviors of animals, for whom strong feelings are accompanied by "involuntary" and "purposeless" muscular contractions, which result in sound emissions (pp. 83–84). Darwin explains that a body in pain may scream involuntarily but discovering that the scream provides relief may lead to more screaming—a learned screaming—"and thus the use of the voice will have become associated with suffering of any kind" (p. 85). Perhaps for Darwin this bidirectional flow between instinct and enculturated ability represents how voice begins to "shift along the biosocial spectrum toward modest voluntary control and social complexity" (to borrow language from Tomlinson). It also hints at how voice functions as both an expression of emotion and a stimulus to emotion. Tomlinson, however, does not get caught up in this timeworn music-philosophical debate about whether emotion is coming from the subject or the musical object. Rather, he imagines how music might have emerged without subjectivity altogether. This implies an unprecedented rethinking of the adaptationist/nonadaptationist question: the question is no longer "Is music essential to the human?" but rather, "Is the human essential to music?"

Scholars of music's evolutionary origins are well aware that the archaeological record carries little material trace of ancient music. There is little proof of music's evolution in a strong sense: as Honing, ten Cate, Peretz, and Trehub (2015) put it, "[f]or the moment, at least, definitive conclusions about the prehistory and origins of music cannot be formulated" (p. 2). But for Tomlinson, speculation about the origins of music is about more than music; it is about understanding our socio-biological circumstances, of which music is an emergent property. Further, according to Derrida's (2009) conception, it is about the birth of what we call the human. For Stiegler (1998), following Derrida, we are driven to interrogate the birth of the human because we have "unceasingly […] questioned its end" (p. 135). The matter of ends leaks into the margins of Tomlinson's thought as well:

The cosmos may be destined to wind down to a point of maximum entropy and to an unarticulated dispersion of matter/energy. It seems probable, however, that the systems and histories that are maintaining it far from that point across billions of years cannot all be explained by linear thermodynamics alone. The systems that present the brightest evidence of this are living ones, the histories evolutionary and—at a late, incandescent moment—cultural ones. (p. 298)

Tomlinson thus imagines the inevitable end of the cosmos delayed by systems of culture. At this moment—Tomlinson at his most apocalyptic—he raises the banner of humanism once again. Indeed, he hopes readers of his book will find "humanism redeemed" (p. 347). Even as he dismantles the place of the human at the center of musicking, Tomlinson evinces a wistful commitment to the same cultural traditions whose subjectivity he has just evacuated. There remains, then, only the call and response of interlocking systems, a "voice" that emanates not from someone or something but from everywhere.

Miriam Piilonen
Northwestern University

Submitted 2020 March 3; accepted 2020 July 11.
Published 2020 October 22.


This review was copyedited by Niels Christian Hansen and layout edited by Kelly Jakubowski.


  1. In the quoted passage, Martha Feldman is describing what is shared, conceptually, between the five contributors to the 2015 JAMS Colloquy Why Voice Now? It is contributor Brian Kane (p. 671) who refers specifically to a "vocal turn" in the humanities, on the way toward a new philosophical-psychoanalytic method for analyzing vocal meaning. Together, the essays within this colloquy articulate one set of histories and frameworks for voice studies in music: Emily Wilbourne, for instance, explicates the voice's power "in and as performance" as one entry point for analysis, as well as for reflecting upon humanists' recent fascination with voice (2015, p. 660). An alternate summary of the field is given by Emily Dolan in Opera Quarterly (2017), emphasizing the interdisciplinarity and rapid growth of voice studies: "A complete accounting of voice-centered work from even the past five years is impossible here; this issue… adds to a seemingly indefatigable chorus" (p. 203). See especially Nina Sun Eidsheim's Sensing Sound: Singing and Listening as Vibrational Practice, in which Eidsheim radically re-envisions the concept of "music" by thematizing material vibration, and Matthew D. Morrison's "The Sound(s) of Subjection: Constructing American Popular Music and Racial Identity through Blacksound" (2017) and "Race, Blacksound, and the (Re)Making of Musicological Discourse" (2019), where Morrison figures the co-optation of the Black voice as blacksound, a sonic form of blackface.
  2. See, for example, Darwin (1871), Patel (2010; 2018), Pinker (1997), Huron (2001), and Spencer (1857).
  3. Leroi-Gourhan's influence on radical strains of critical theory should be duly noted; indeed, his thinking influenced Deleuze and Guattari's Capitalism and Schizophrenia and Jacques Derrida's Of Grammatology, including Derrida's most famous neologism, différance.
  4. See, for instance, Bennett (2017), Killin (2016), Lawergren (2016), and van der Schyff and Schiavio (2017).
  5. It is worth noting that Pinker appears to have changed his position from that of a nonadaptationist to that of an adaptationist (see Mehr et al., 2019).
  6. Cf. Tomlinson (2015, esp. pp. 15–22). Tomlinson uses the term "entrainment" in two different senses. First, he means the musical concept of entrainment, or synchronization with an external pulse. Second, he means entrainment as it is conceived by complex systems theory to describe assemblages of systems of matter (see p. 309).
  7. Tomlinson's (2015) account is not teleological; it is inspired in part by Darwin's model of evolutionary contingency: the sense that biocultural life moves toward nothing, has no end game, no cosmic ambition, and no finale.
  8. These questions are from Stiegler's (1998, esp. pp. 134–79) chapter "Who? What? The Invention of the Human," a source of inspiration for Tomlinson.
  9. For Tomlinson (2015), "questions of the nature of protolanguage have too often been limited by a unilateral teleology. They have tended to look back from the single vantage of modern language … they have reflected the emphases of post-Chomskyan linguistics and the disciplinary allegiances of those who have stepped into the protolanguage arena, mostly linguistics and language cognitivists" (p. 90).
  10. See Cavarero (2005) and Dolar (2006).
  11. Jonathan De Souza (2014) makes a similar argument.


