"CROSS-CULTURAL representations of musical shape" compellingly reminds us that some fundamental Western notions of music are neither self-evident nor universal — including the notion of the musical object itself. A deeply ingrained Western tradition (though one forcefully challenged in recent decades) views musical events and compositions as stand-alone sound objects, which may be detached from any specific social context and function and thus contemplated and represented abstractedly. Such notions, however, may be completely alien to some non-Western cultures, including that of the BenaBena participants observed by Athanasopoulos and Moran. For them, "music" necessarily imparts in specific social contexts, meanings and functions (Hays, 1986; Feld, 1990); hence, the concept of an independent sound object, detached from any social or natural production contexts, would make little sense.

Importantly, where the notion of an abstract, context-less musical object makes little sense, so does that of independent musical "structure." Thus, structural variables that, for Western musicians, may define musical patterns and events (such as pitch contour, used by the authors to distinguish stimuli from each other) may be of little significance in a different culture, where such variables have no role in delineating socially meaningful sound patterns.

The responses of BenaBena participants to the drawing task in the present experiment clearly illustrate such cultural differences. "Participants attempted to indicate that a flute was playing through iconic representation of the sound, without attempting to indicate specific variations between the sound events … When questioned if all the sound events were the same (regardless of contour variation), the participant responded, 'Yes, it is still a flute playing'" (p. 191). It seems, then, that the very notion of an abstract sound object was indeed alien to BenaBena participants. Instead, they associated the stimuli they heard with concrete, culturally significant modes of sound production (flute playing, for instance, is central to traditional "sing-sing" ceremonies of the BenaBena and other traditional Papua New Guinea cultures, associated with the evocation of ancestral spirits or the assertion of male dominance over women; Hays, 1986; Langness, 1974). Consequently, the contour variations that, from the experimenters' perspective, clearly distinguished one stimulus from another were not heeded in the BenaBena graphic representations. Apparently, these structural variations (though perceptible) were viewed as irrelevant, since they did not represent any culturally significant distinctions. Thus, stimuli were all "the same" because they did not differ in the one way participants found to be culturally relevant: their perceived mode of production (flute playing).

The above interpretation may also suggest why most BenaBena participants did not use spatial mappings of the musical timeline in their drawings. While space-time mappings differ among cultures in important ways (Gentner, Imai, & Boroditsky, 2002; Fuhrman & Boroditsky, 2010; Núñez & Cooperrider, 2013), the tendency to represent temporal relationships, such as temporal order or duration, by some spatial relationships seems universal. Furthermore, such mappings are expressed in the BenaBena cultural milieu, using both language metaphors and bodily gestures. Núñez, Cooperrider, Doan, and Wassmann (2012), for instance, observed consistent use of downhill-uphill metaphors and gestures to refer to past and future events, respectively; and the well-known Kaluli waterfall metaphors provide another intriguing example (Feld, 1990). Yet, while Western participants in the experiment mostly used "Cartesian" time-to-space mappings, emulating the left-right spatial mapping of the musical timeline in Western musical notation, no spatial mapping of the order or duration of musical events was observed in most BenaBena drawings. Apparently, BenaBena participants did not use time-to-space mappings in their drawings, though those were culturally available, since they did not view the specific temporal structures of the individual stimuli presented to them as meaningful: devoid of any cultural context, these timelines were "all the same" to them, since "it is still a flute playing" (p. 191).


Athanasopoulos and Moran's experiment explicitly and directly required participants to draw upon their shared cultural resources, and "… represent the sound so that if another member of their community saw them they should be able to connect them with the sound" (p. 186). When such a task is presented to members of one culture with a long-established tradition of graphic sound representation, as well as to members of another culture that does not use any graphic representation of sound whatsoever (and in addition may have no clear notion of an independent sound object to be represented), cultural differences in sound representation are inevitable.

Yet, such differences do not necessarily imply that cross-cultural, possibly universal tendencies to correlate sound and visual shape in specific ways are non-existent. In recent decades, diverse studies have indicated that some cross-modal correspondences involving sound arise independently of cultural practice (see Eitan, 2013; Spence, 2011, for research overviews). Thus, infants tend to associate pitch direction ("up" or "down") with spatial rise and fall (Dolscheid, Hunnius, Casasanto, & Majid, 2012; Jeschonek, Pauen, & Babocsai, 2013; Wagner, Winner, Cicchetti, & Gardner, 1981; Walker et al., 2010), pitch height with visual shape (higher pitches are spiky, lower pitches rounded; Walker et al., 2010), pitch height with thickness (higher pitch is thinner; Dolscheid et al., 2012), and loudness with brightness (Lewkowicz & Turkewitz, 1980); human adults, pre-verbal children, and even non-human primates (chimpanzees) all associate pitch height with visual brightness (Ludwig, Adachi, & Matsuzawa, 2011; Mondloch & Maurer, 2004). Though most such studies did not directly involve cross-cultural comparisons, they do suggest that some cross-modal correspondences do not depend on enculturation, as they emerge at a very early age and may even be shared by humans and other species.

How can such findings be reconciled with the results of Athanasopoulos and Moran's experiment, suggesting that "the iconic representation of music for a communicative function must follow society's norms" (p. 191)? At least in part, the answer lies in the differences in purpose (and consequently, in research questions and experimental procedures) between the present experiment and most cross-modal studies. The present experiment examined how shared cultural norms governing the association of sound and shape are expressed by members of the respective cultures. The task it utilizes has an explicit intra-cultural communicative purpose ("represent the sound on paper in such a way that if another member of their community saw their marks they should be able to connect them with the sound"; p. 186), and it allows for, even demands, conscious reflection. In contrast, much cross-modal research elicits automatic, subconscious responses, often at a pre-attentive level of processing. Furthermore, the dependent measures in such experiments may be implicit or indirect: measures not directly defined in the experimental task, which participants are often unaware of or unable to control. Even cross-modal studies utilizing direct, explicit measures rarely demand that participants intentionally apply cultural norms or codes, or define the experimental task as a communication undertaking. Thus, while the present experiment suggests how participants utilize shared cultural resources (if such resources exist) to associate music and shape, it does not examine whether and how cross-modal correspondences other than those codified by the participants' culture (such as those revealed by the infant studies mentioned above) may affect perception and behavior. Indeed, task demands might have attenuated any effects of cross-modal correspondences other than those generated by culturally-ingrained knowledge and habits.

The main challenge for the comparative study of cross-modal correspondences, in music and elsewhere, is examining the interactions between correspondences explicitly codified by language and cultural practice and those whose sources seem independent of culture. Would, for instance, correspondences such as those between pitch and spatial height or loudness and visual brightness (already effective in infants a few months old) endure in cultures whose language and other cultural practices do not express such correspondences? Would cross-modal correspondences fashioned by cultural practice shape aspects of perception, cognition, and action not susceptible to conscious awareness and control? Examining such questions would require extensive and innovative cross-cultural research, applying converging methodologies, both implicit and explicit, while investigating diverse perceptual and cognitive processes and related cultural practices and artifacts.

Currently, very few studies (to which Athanasopoulos and Moran's work is a welcome and important addition) have attempted such a challenging undertaking; yet even those few suggest an intriguing, complex picture. A recent study (Dolscheid, Shayan, Majid, & Casasanto, 2013), examining pitch mappings across cultures, provides an interesting example. Dolscheid and her colleagues observed that speakers of different languages use different metaphors to denote the auditory dimension we call "pitch height" (Eitan & Timmers, 2010). Thus, while Dutch speakers describe pitch as high or low, Farsi speakers describe it as thin or thick. The researchers presented Dutch and Farsi participants with different pitches, concurrently with visual stimuli varying in height or in thickness, and asked them to reproduce each pitch by singing. The irrelevant visual height information affected the performance of Dutch speakers (who refer to pitch as "high" or "low"), but not that of Farsi speakers (who refer to pitch as "thin" or "thick"). In contrast, the irrelevant thickness information affected the performance of Farsi speakers, but not that of Dutch speakers. In further experiments, Dutch speakers were trained to use thickness metaphors for pitch in two contrasting ways: similarly to Farsi speakers (higher pitches are "thinner") and in a "reversed-Farsi" way (higher pitches are "thicker"). Both groups then performed the pitch reproduction task. While the group trained with Farsi-like pitch metaphors (high pitch is thinner) was affected by cross-dimensional thickness interference similarly to native Farsi speakers, the group trained with "reversed-Farsi" metaphors (high pitch is thicker) demonstrated no such effects.

Together, these findings present an intriguing picture of the way cultural practices (here, native language metaphors) may affect preexisting, possibly universal cross-modal correspondences. Both pitch/height and pitch/thickness correspondences were revealed in infant studies, suggesting that the source of both correspondences is not language or other cultural practices (Dolscheid et al., 2012; Jeschonek et al., 2013; Wagner et al., 1981; Walker et al., 2010). Yet language (and possibly, other cultural practices and artifacts) may strengthen or attenuate such earlier correspondences, affecting behavior and perception. Cultural practices, then, do not create cross-modal correspondences, but rather modify the strength of preexisting correspondences (inborn or acquired through implicit statistical learning) and position them in culturally specific contexts. Yet, "natural" cross-modal correspondences not adopted by a culture do not necessarily disappear. They remain latent, and may be induced even through brief training (Eitan & Timmers, 2010, using a different experimental paradigm, reach comparable conclusions).

This model is, of course, tentative and cursory, and needs to be tested through extensive future research investigating diverse cultural settings and psychological mechanisms. Extending the intriguing findings of "Cross-cultural representations of musical shape" through the use of complementary research methods, enabling a closer look at the interaction of "nature" and "nurture" across cultures, would surely be a valuable step forward.


  • Dolscheid, S., Hunnius, S., Casasanto, D., & Majid, A. (2012). The sound of thickness: Prelinguistic infants' associations of space and pitch. In: N. Miyake, D. Peebles, & R.P. Cooper (Eds.), Proceedings of the 34th Annual Meeting of the Cognitive Science Society. Austin, TX: Cognitive Science Society, pp. 306-311.
  • Dolscheid, S., Shayan, S., Majid, A., & Casasanto, D. (2013). The thickness of musical pitch: Psychophysical evidence for linguistic relativity. Psychological Science, Vol. 24, No. 5, pp. 613-621.
  • Eitan, Z. (2013). How pitch and loudness shape musical space and motion: New finding and persisting questions. In: S.L. Tan, A. Cohen, S. Lipscomb, & R. Kendall (Eds.), The Psychology of Music in Multimedia. Oxford: Oxford University Press, pp. 161-187.
  • Eitan, Z., & Timmers, R. (2010). Beethoven's last piano sonata and those who follow crocodiles: Cross-domain mappings of auditory pitch in a musical context. Cognition, Vol. 114, No. 3, pp. 405-422.
  • Feld, S. (1990). Sound and Sentiment: Birds, Weeping, Poetics and Song in Kaluli Expression. Philadelphia, PA: University of Pennsylvania Press.
  • Fuhrman, O., & Boroditsky, L. (2010). Cross-cultural differences in mental representations of time: Evidence from an implicit non-linguistic task. Cognitive Science, Vol. 34, No. 8, pp. 1430-1451.
  • Gentner, D., Imai, M., & Boroditsky, L. (2002). As time goes by: Evidence for two systems in processing space-time metaphors. Language and Cognitive Processes, Vol. 17, No. 5, pp. 537-565.
  • Hays, T.E. (1986). Sacred flutes, fertility, and growth in the Papua New Guinea Highlands. Anthropos, Vol. 81, pp. 435-453.
  • Jeschonek, S., Pauen, S., & Babocsai, L. (2013). Cross-modal mapping of visual and acoustic displays in infants: The effect of dynamic and static components. European Journal of Developmental Psychology, Vol. 10, No. 3, pp. 337-358.
  • Langness, L.L. (1974). Ritual Power and Male Domination in the New Guinea Highlands. Ethos, Vol. 2, No. 3, pp. 189-212.
  • Lewkowicz, D., & Turkewitz, G. (1980). Cross-modal equivalence in early infancy: Auditory-visual intensity matching. Developmental Psychology, Vol. 16, pp. 597-607.
  • Ludwig, V.U., Adachi, I., & Matsuzawa, T. (2011). Visuoauditory mappings between high luminance and high pitch are shared by chimpanzees (Pan troglodytes) and humans. Proceedings of the National Academy of Sciences of the United States of America, Vol. 108, No. 51, pp. 20661-20665.
  • Mondloch, C.J., & Maurer, D. (2004). Do small white balls squeak? Pitch-object correspondences in young children. Cognitive, Affective and Behavioral Neuroscience, Vol. 4, No. 2, pp. 133-136.
  • Núñez, R., & Cooperrider, K. (2013). The tangle of space and time in human cognition. Trends in Cognitive Sciences, Vol. 17, No. 5, pp. 220-229.
  • Núñez, R., Cooperrider, K., Doan, D., & Wassmann, J. (2012). Contours of time: Topographic construals of past, present, and future in the Yupno valley of Papua New Guinea. Cognition, Vol. 124, No. 1, pp. 25-35.
  • Spence, C. (2011). Crossmodal correspondences: A tutorial review. Attention, Perception & Psychophysics, Vol. 73, No. 4, pp. 971-995.
  • Wagner, S., Winner, E., Cicchetti, D., & Gardner, H. (1981). "Metaphorical" mapping in human infants. Child Development, Vol. 52, No. 2, pp. 728-731.
  • Walker, P., Bremner, J.G., Mason, U., Spring, J., Mattock, K., Slater, A., & Johnson, S.P. (2010). Preverbal infants' sensitivity to synaesthetic cross-modality correspondences. Psychological Science, Vol. 21, No. 1, pp. 21-25.