My purpose in this non-zero-sum commentary is not to endorse a particular side on the disparate findings of Friedman et al. (2024) and Clemente et al. (2021), both of which are well-done and provocative studies. Rather, I wish to situate them within a broader research context, touching on two sets of issues that are vital to a rich understanding of the nature of human artmaking yet remain under-represented in contemporary research. One concerns the murky evolutionary origin of our human artistic capacity, including cross-modal processing and its role in aesthetic cognition. The second involves the first-person deployment of this capacity in creative decision-making and problem-solving, rather than in a merely receptive mode. Both speak to the importance of inherent structure and constraints in human aesthetics and creativity.
Before addressing these issues, I first wish to foreground my pragmatic conclusion, which is that neither study – indeed, no such study in this area – should be taken as uniquely definitive. The operationalization and assessment of constructs like complexity and aesthetic sensitivity can and should reflect the inherent diversity in manifestations of human aesthetic creativity. The characterization and measurement of complexity, even in a stripped-down context like sequences of digits, has long been a significant conceptual challenge (Pagels, 1988) – not to mention the added psychological layers and dimensions when considering what complexity means in the context of great works of art, music, or literature. It is unrealistic to think that any single operationalization of aesthetic complexity will be universally valid or yield a conclusive all-purpose finding. Indeed, even completely 'objective' measures of complexity – as in Martindale's (1990) revolutionary research using computer text analysis to understand the stylistic progression of, say, the history of British poetry – can involve a grab-bag of metrics, such as proportion of unique words or mean word length or number of word associates, to assess aesthetically attention-grabbing features like complexity, surprise, incongruity, and ambiguity – 'collative properties' of artworks, in Berlyne's (1971) terms. Even in such seminal research, a high-water mark in the history of empirical aesthetics and creativity, the relative importance of and inter-relations among various measures for producing an overall psychological sense of complexity have remained largely unexplored. The upshot is that aesthetic complexity is itself complex and highly nuanced, and that even objectively defined principles of complexity may be modulated by other factors and, to boot, have different effects among different audiences – for instance, experts versus novices (Kozbelt, 2022).
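Purely to make concrete what such 'objective' text metrics involve, the following minimal Python sketch computes two of the measures named above for a line of verse: the proportion of unique words (a type-token ratio) and mean word length. It is an illustration under simplifying assumptions, not a reconstruction of Martindale's (1990) program, which drew on a larger and different battery of measures; the function name `text_metrics` and the crude tokenizer (lowercase letters and apostrophes only) are hypothetical choices made for this sketch.

```python
import re


def text_metrics(text: str) -> dict:
    """Illustrative surface metrics for a text (hypothetical helper, not Martindale's method).

    type_token_ratio: proportion of unique word types among all word tokens.
    mean_word_length: average number of characters per word token.
    """
    # Simplifying assumption: a "word" is a run of lowercase letters or apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        # Avoid division by zero on empty input.
        return {"type_token_ratio": 0.0, "mean_word_length": 0.0}
    return {
        "type_token_ratio": len(set(tokens)) / len(tokens),
        "mean_word_length": sum(len(t) for t in tokens) / len(tokens),
    }


if __name__ == "__main__":
    lines = [
        "the cat sat on the mat",
        "incandescent reverie dissolves beneath quiet luminous arches",
    ]
    for line in lines:
        print(line, text_metrics(line))
```

Here the second line scores higher on both measures, but nothing in the code says how the two numbers should be weighted or combined into a single psychological sense of complexity. That is precisely the open question about the inter-relations among measures raised above.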
In that sense, the thoughtful attempts to provide isomorphic measurements of the complexity of patterns in different modalities by Clemente et al. (2021) and Friedman et al. (2024) are admirable, but they are necessarily incomplete. Other plausible cross-modal isomorphisms are imaginable, and the empirical implementation of these may yield different conclusions. Moreover, as Friedman et al. point out, terms like 'balance' and 'symmetry' may simply be qualitatively different when applied to different modalities or aesthetic domains. Indeed, it is hard to know how well a single verbal label can capture the 'repleteness' (Goodman, 1968) of any instance within an aesthetic modality, and whether any such abstracted representation would serve as an adequate proxy for a modality-general mental representation acted upon by reward processes, a possibility articulated and entertained by Clemente et al. (see also Palmer & Griscom, 2013; Song et al., 2022).
In any case, the enterprise of assessing and relating aesthetic sensitivity to factors like complexity, to explore the extent to which aesthetic and evaluative judgments operate in a modality-specific or modality-general way, is innovative and provocative. Moreover, any evidence bearing on the question of cross-modality has implications for how we understand the broader nature of human aesthetic processing – in terms of its origins, limitations, applications, and ultimate prospects.
The evolutionary emergence of uniquely human abilities – including symbolic intelligence, language, creativity, humor, artistry, and musicality – remains murky and contentious. The phylogenetic origin of the arts has spawned a significant scholarly literature, often highlighting the question of what evolutionary mechanism(s) gave rise to our aesthetic capacity: possibilities include direct Darwinian natural selection, sexual selection, or the arts arising as a byproduct of other selected-for adaptations; alternatively, the arts may have cultural rather than biological roots (Dissanayake, 2007; Kozbelt, 2020, 2021; Pinker, 2002). Questions about evolution and the arts can be organized into two main levels of inquiry: why we have the arts in the first place rather than no arts at all, and why the arts are structured in the ways they are rather than in some other ways, a question to which I return below.
Speculative accounts of the evolutionary origins of human musicality and artistry can take several narrative forms, with different implications for modal and cross-modal processing. One involves the phylogenetic emergence of the human mind in very general cognitive terms, overall reflecting the rising intellectual capacities evident in our hominid ancestors, in saltational sequence across ancestral species in the evolutionary lineage, potentially driven by the demands of social life, and culminating in the so-called "Creative Explosion" some 40,000 years ago (e.g., Amati & Shallice, 2007; Dunbar, 1996). Such a centralized account is parsimonious, but it sidesteps the questions of how different modalities themselves became more sophisticated and how cross-modal connections developed and facilitated the emergence of some human faculty for aesthetics – 'sensory cognition' in Baumgarten's (1750/1970) apt original characterization.
Another common approach is to focus on the evidence for the origin of a particular aesthetic domain, like music (e.g., Justus & Hutsler, 2005; Purves, 2017; Wallin, Merker, & Brown, 2000) or visual art (e.g., Kozbelt, 2020; Morriss-Kay, 2010). While offering a richer description of a particular modality, such accounts are frequently siloed and do not necessarily explore relations between modalities. If we accept the premise that human cognition today is characterized by some degree of cross-modal processing, such approaches might imply an origin story rooted in the more-or-less independent emergence of various mental modules, which were later joined together (see also Mithen, 1996).
Yet another model posits the opposite: rather than distinct underlying sensory modalities initially giving rise to different domains of aesthetic activity – like visual art or music or storytelling or dance – Brown (2022) has suggested that the arts had a multi-modal and multi-media origin, integrated and embodied and performative, with domain-specific art forms emerging only later. (Thus, the rejoining of these strands – in contemporary multimedia installations, film, or other 'total works of art' (Brown & Dissanayake, 2018) – marks a return to a human default aesthetic.) Brown further argued that artistic expression in domains like theater or opera or dance frequently operates in a cross-modal fashion, noting commonalities between audition and vision in parameters like tempo, articulation, amplitude (loudness or brightness), and movement shape (in melodies or physical movement) – just the kinds of abstracted, modality-general labels like 'balance' or 'symmetry' considered by Clemente et al. (2021) and Friedman et al. (2024). Brown provided additional structure to these ideas by contrasting narrative modalities (language, gesture, and image/object) with expressive modalities (voice, face, and body), and identifying two major routes for emotional expression in the arts, one vocal-auditory (as in theater, film, storytelling, and singing) and one body-visual (including traditional visual arts as well as dance and theater and film).
While such varied accounts make different suppositions about the relative timing of modality-general versus modality-specific components of aesthetic cognition, they all share an evolutionary foundation. To the extent that some direct or indirect survival or reproductive advantage inheres in the integration of different sensory or cognitive modules in particular ways that are relevant to aesthetic response, cross-cultural commonalities in phenomena like aesthetic artifacts and preferences should be observable. Indeed, regularities in aesthetic products have been amply documented, spanning domains and modalities (Martindale, 2007; Palmer & Griscom, 2013). Music around the world typically includes common pitch intervals, tonal hierarchies, principles of grouping and meter, and aspects of melodic contour (Huron, 2006; Savage et al., 2015; Trehub, 2000). Visual artworks typically incorporate many aspects of color and shape preference and compositional regularities (Firstov et al., 2007; Graham & Redies, 2010; Kozbelt, 2020). Language-based arts feature regularities in poetic meter, rhyme, and narrative elements (Campbell, 1949; Keyser, 2020). Moreover, cross-cultural concordances in aesthetic judgment are often evident, especially among experts (Chen et al., 2002; Dutton, 2009; Kozbelt, 2022). An evolutionary foundation for common human aesthetics suggests that modality-specific structures can also be mapped onto one another, engendering further cross-modal structure, the exploration of which is exactly the point of the studies by Friedman et al. (2024) and Clemente et al. (2021).
I hasten to add that widespread aesthetic regularities do not conclusively prove an evolutionary basis. In principle, shared cultural exposure and convention could have produced merely illusory and coincidental surface commonalities. In the present context of a discussion of cross-modality, this raises the provocative possibility, akin to an aesthetic Sapir-Whorf hypothesis, that different abstract labels for aesthetic principles or qualities may mediate or lead to differential empirical findings across cultures, in studies conducted along the lines of Clemente et al. (2021) and Friedman et al. (2024).
In any case, other classic associative regularities relevant to aesthetic processing – like the 'bouba/kiki' effect (Ćwiek et al., 2022), the pervasiveness of non-random metaphors (Lakoff & Johnson, 1981), and various forms of synesthesia (Cytowic, 2002) – suggest additional important cross-modal default constraints on aesthetic cognition. Further research on all these topics will be needed to resolve longstanding questions of the evolutionary versus cultural origins of human aesthetics, as well as the ultimate prospects for the arts (Kozbelt, 2022; Martindale, 2009). Also relevant is the extent to which basic regularities suffice to provoke intense aesthetic experience, versus serving merely as a facile scaffolding that stimulates a mild, positive affective reward when experiencing art. That is, how much aesthetic-experiential mileage do generic patterns of structured associations within and across modalities provide by themselves? Or is the real action elsewhere – say, in the highly elaborated domain-specific knowledge of the expert or connoisseur, in an embedded socio-cultural context that yields added real-world meaning, or in innovative violations of targeted default expectations? This nexus takes us to our second issue: how aesthetic constructs relate to creativity.
There is an unfortunate dissociation between empirical aesthetics and creativity, both in scholarly literatures and communities of researchers. The application of aesthetic principles to creativity or vice versa is made only rarely (but see, e.g., Tinio, 2013), unnecessarily limiting both fields. Nevertheless, it is worth considering how the structure of cross-modal processing and the notion of aesthetic regularities affect human creativity. To the extent that humans broadly share some aesthetic sensibility, at least within a cultural tradition, the creation of aesthetic artifacts within that tradition should partake of and exploit those parameters.
However, such a view is overly simplistic, because merely deploying established aesthetic regularities along familiar lines seems far more likely to lead to formulaic rather than creative outcomes – though in many cultures, where innovation is not particularly prized, this may be just fine (Kozbelt, 2016). By contrast, many high art traditions are characterized by constant pressure for innovation, whereby creators must find new ways to garner critical attention, generally by making their artworks more striking and original (Martindale, 1990). This can be a laborious endeavor. Indeed, many high-level creators – here Beethoven is the prototype – lavished astonishing care on the minutiae of their productions, generating numerous competing ideas, repeatedly reworking problematic passages, and considering broader issues of aesthetic principles and external constraints until they were satisfied. In this originality-seeking mode, as creators break more and more 'rules' – that is, traditional ways of doing things – they may eventually go against the grain of default evolutionary aesthetic biases, limiting their ability to communicate with audiences (Keyser, 2020; Kozbelt, 2017, 2022; Martindale, 2009). Thus, ambitious creators must constantly negotiate a balance between adhering to conventional modality-specific patterns and cross-modal associations and violating those associations in the service of novelty or expression. Over-adherence leads to predictable outcomes; excessive violations of predictability lead to a failure to communicate with audiences. Successful artistic creators strike a balance between these competing factors that is appropriate for the socio-cultural or historical circumstances in which they work, organizing their ideas so that they manage to communicate something vital in a way that is new and noteworthy.
Most artistic 'innovations' are minor and transient, suggesting that they are mere 'inventions' that do not tap into some deeper source or wellspring of human aesthetics. However, as creators forge new styles and modes of art, they occasionally hit upon novel principles, some of which represent genuine aesthetic 'discoveries,' rather than inventions. Historical examples of such discoveries may include poetic recursion and painted fractals (Keyser, 2020), a center compositional balancing point in painting (Firstov et al., 2007), linear perspective in visual art (Kubovy, 1988), certain patterns of visual decoration (Gombrich, 1979), musical systems of scales and harmony based on simple mathematical intervals (Purves, 2017), and featural commonalities across different orthographic systems (Treiman & Kessler, 2011). In such cases, the research challenge is to understand why these conventions – and not some alternatives – take hold and persist. Is their emergence a historical accident, or is there some deeper and non-arbitrary reason for their authentic staying power, potentially implying a psychobiological or evolutionary basis?
I close this discussion of creativity by noting that the contemporary arts pose new research challenges along these lines. Cross-modal effects are especially relevant here, particularly with emerging media or integrative art forms, which require new conventions, as in the development of a common 'language of film' (Edgar, Marland, & Rawle, 2015) for organizing cinematography and narrative flow. The emergence of overtly conceptual art likewise raises issues of representational modality (Kranjec, 2015). Moreover, some typological models of creativity (e.g., Galenson, 2001) suggest a dichotomy between artists who pursue a more radical conceptual approach, which perhaps lends itself more to modality-general representations driving aesthetic impact, and artists who pursue a more experimentalist approach, which perhaps lends itself more to modality-specific representations driving aesthetic impact. As far as I know, this latter proposition has not been tested, but it is an example of how ideas from Clemente et al. (2021) and Friedman et al. (2024) can be repurposed to expand our knowledge of creative aesthetics as well as receptive aesthetics.
In conclusion, the empirical work of Friedman et al. (2024) and Clemente et al. (2021) speaks to vital issues within the study of human aesthetics and evaluative response, including the evolutionary origins of the arts and how artistic creators deploy aesthetic principles in their creative problem-solving. The question of the nature of the representations upon which mental processes operate – arguably the central enterprise of cognitive science – is difficult to definitively resolve in a complex, ecologically valid, and culturally situated nexus like the human experience of the arts. An unambiguous answer, that the underlying representations subject to evaluative or aesthetic processing are either modality-specific or modality-general, seems unlikely, and I do not sense that either camp would endorse such a facile conclusion anyway. But the prospect of lingering ambiguity does not make the effort worthless. Investigators collectively probing the nature and limits of supposed general aesthetic principles – like complexity, balance, or harmony – may produce results that yield a structured sense of the conditions under which modality-specific versus modality-general representations predominate in aesthetic or evaluative or creative cognition, and of how these patterns might vary as a function of personality, culture, expertise, or other factors.
Arguably, the scientific search for new ways to operationalize and measure aesthetic constructs echoes the history of the arts themselves, whereby creators devise new forms of aesthetic communication. Sometimes they hit upon novel principles that can potentially be encapsulated by a domain-general verbal label and which can help organize an understanding of subsequent artistic or musical expression. Just as one may think it impossible to 'exhaust' artistic expression in any meaningful way, perhaps it is the same with empirical operationalizations: there will always be some alternative method for assessing complexity or aesthetic sensitivity, or for drawing ever more refined cross-modal comparisons. A research agenda that embraces this pluralism and searches for structure within the diverse realm of aesthetic, evaluative, and creative cognition across modalities may come to parallel the painstakingly built-up picture of 'discovered' – or, if you prefer, 'invented' – aesthetic principles by which the entire history of the arts might be understood.
This article was copyedited by Eve Merlini and layout edited by Jonathan Tang.