A few years ago, Jesse Berezovsky (2019) produced a highly original study that used the tools of statistical physics to study the evolution of tuning systems and scales. In this issue of Empirical Musicology Review, he and his collaborators Ryan Buechele and Alex Cooke have produced a study where the focus has shifted from statistical physics to music theory and the historical progression of tuning systems in Western Europe (Buechele et al., 2024).
I am one of a rather limited set of people who have researched both statistical physics and evolution of scales (McBride & Avendaño, 2017, 2018; McBride & Tlusty, 2020; McBride et al., 2023; Brown et al., 2024). As such, my goal with this commentary is to first summarize the work of Berezovsky in a way that may be easier to digest for non-physicists. That's not to say that the original work is opaque, but given the diversity of the prospective audience, we need to approach communication from many angles. Through discussions with music scientists (I use 'music scientist' to denote anyone who studies music using the scientific method), it is clear to me that many people struggled to grasp the details of the original paper (Berezovsky, 2019). Thus, the new paper, almost entirely stripped of equations, is a welcome addition that I believe is approachable to a wider audience (Buechele et al., 2024). My summary aims for the middle ground: the music scientists who want mathematical detail. I plan to explain the models that Berezovsky uses, and to highlight how they complement the historical evolution of Western tonality.
I have also spent the past five or so years conducting a comparative study on the evolution of scales, which has given me a good vantage point to put the work in a broader context. The evolution of tonality is quite a complex problem [2]. Scales can take on many forms, ranging from the ephemeral (a person singing to themselves; difficult to measure [3]) to the exceedingly robust (mathematical descriptions of scales, which have lasted millennia). As such, a theory of the evolution of vocal scales sung in prehistoric societies may not bear much relevance for the more mathematical scales of Western Europe in the common-practice period. It is with this contrast in mind that I will try to highlight where the work of Berezovsky fits.
In this article, I summarize and critique the paper, and follow this up with a commentary on the art of interdisciplinary science and communication. The inscrutability of the original article to those in the field of music science leads to the question, who was the intended audience? Physicists, or music scientists? Is it possible to write for both at the same time? I have struggled with this in my work, as I find that people regularly misunderstand my research in fundamental ways. In essence, I (we) have a communication problem. After discussing with several other scientists who have crossed the bridge from the physical sciences to study music, I have found that the experiences I have encountered are quite common. I will put forth some salient differences that I've noticed between the physical science and music science communities, in the hope that each side can better understand the other. Finally, I will offer some advice to both sides, so that common pitfalls may be avoided. Primarily, I think the route to better communication is more communication. The inclusion of physical scientists in music science can lead to serendipitous synergy, and hopefully this commentary can pave a way forward.
To conclude this introduction I look to Hermann von Helmholtz, a physicist and psychologist, and a pioneer of music science. From the introduction (translated by Alexander Ellis) of his groundbreaking work, Sensations of Tone (1885), I provide a quote that is still as relevant today as it was then:
"In the present work an attempt will be made to connect the boundaries of two sciences, which, although drawn towards each other by many natural affinities, have hitherto remained practically distinct – I mean the boundaries of physical and physiological acoustics on one side, and of musical science and esthetics on the other. The class of readers addressed will, consequently, have had very different cultivation, and will be affected by very different interests. … I hope to have consulted the interests of both classes of readers." (p.1)
The model of Berezovsky (2019) answers the question: "What is the optimal distribution of pitches within an octave that minimizes dissonance and maximizes compositional variety?" The model makes a key assumption that all pitches are equally likely to interact (form harmonies) with all other pitches. Dissonance, D, is quantified using the sensory dissonance model of Sethares (1998), while compositional variety is represented as the entropy, S, of the pitch distribution (lower entropy means fewer options for a composer). These two factors act against each other, as excluding dissonant pitch combinations leads to fewer options for the composer. The balance between them is controlled by a parameter, referred to as 'temperature', T, by analogy with statistical physics. The optimum pitch distribution is the one that minimizes F = U − TS (analogous to the 'free energy' in statistical physics), where U is the total dissonance of the tuning system (see Eq. (5) for the mathematical definition).
As mentioned, the model chosen for quantifying dissonance is the sensory dissonance model of Sethares (1998). This model builds on the finding that people can reliably discern 'beats' (McDermott et al., 2016; Harrison & Pearce, 2020; Milne et al., 2023). 'Beats' occur when the interference pattern produced between tones with similar frequencies – e.g., tones that are one semitone apart and occur simultaneously – is perceptible (Sethares, 1998). This phenomenon is also observed for tones differing by a major seventh, which leads to the conclusion that the overall sensory dissonance between two complex tones – tones that consist of multiple, overlapping pure tones – can be computed by calculating interference between all pairs of overtones.
Dissonance, according to the model, depends not only on the frequency ratio of two complex tones, but also on the absolute frequency (via a term called the critical bandwidth) and the timbre (the relative weight of different overtones). Berezovsky (2019) uses a fixed critical bandwidth and a sawtooth timbre. The dissonance, d, between two pure tones is a function of their log frequency difference measured in units of octaves (this can be multiplied by 1200 to get units of cents). The dissonance D between two complex tones is obtained by summing d between pairs of overtones over an infinite series of overtones, where n and m are terms in an integer series; each term is weighted by a factor that approximates the perceived 'loudness' of the pair of nth and mth overtones, as defined by the sawtooth timbre. In practice, one does not need an infinite series as higher overtones have diminishing contributions; summing over the first 10 overtones of each complex tone is likely to be sufficient.
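The pairwise summation over overtones can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the log-normal shape of `pure_tone_dissonance`, the value `w_c = 0.03` octaves, and the use of the product of sawtooth amplitudes (1/n)(1/m) as a loudness proxy are placeholders chosen for this sketch, and are not claimed to be the exact expressions of Sethares (1998) or Berezovsky (2019).

```python
import math


def pure_tone_dissonance(dx, w_c=0.03):
    """Placeholder roughness of two pure tones separated by dx octaves.

    Assumed shape (not taken from the source): zero at dx = 0,
    maximal at |dx| = w_c, and decaying for larger separations.
    """
    dx = abs(dx)
    if dx == 0.0:
        return 0.0
    return math.exp(-math.log(dx / w_c) ** 2)


def complex_tone_dissonance(interval, n_overtones=10, w_c=0.03):
    """Sum pure-tone dissonance over all pairs of overtones of two tones.

    `interval` is the distance between the two fundamentals in octaves.
    """
    total = 0.0
    for n in range(1, n_overtones + 1):      # overtones of the lower tone
        for m in range(1, n_overtones + 1):  # overtones of the upper tone
            # Overtone n lies log2(n) octaves above its own fundamental.
            dx = interval + math.log2(m) - math.log2(n)
            # Assumed loudness proxy: product of sawtooth amplitudes 1/n, 1/m.
            weight = (1.0 / n) * (1.0 / m)
            total += weight * pure_tone_dissonance(dx, w_c)
    return total


if __name__ == "__main__":
    intervals = [("unison", 0), ("minor second", 1),
                 ("perfect fifth", 7), ("major seventh", 11)]
    for name, semitones in intervals:
        print(f"{name:14s} {complex_tone_dissonance(semitones / 12):.3f}")
```

Even with these placeholder choices, the minor second and the major seventh come out more dissonant than the perfect fifth, because in both cases low overtones of the two tones fall within roughly a semitone of each other.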
To test the generality of these results, one could try to re-do this study with a different set of model parameters (critical bandwidth or timbre), or indeed a different model (Harrison & Pearce, 2020). One could even replace sensory dissonance with a model based on harmonicity, or a composite model of both sensory dissonance and harmonicity (Marjieh et al., 2024). However, there is a good chance that the results will be similar, since the outputs of different models for sensory dissonance are strongly correlated for harmonic complex tones (McBride & Tlusty, 2020). The authors are aware of these possibilities and note that the reason for their choice was simplicity, following "the tradition of the physics community" (Buechele et al., 2024, p. 122).
In Buechele et al. (2024) the authors use the term compositional variety to describe the extent of possible choices for pitches that can be used in composition. Pitch is, of course, only one way among many of achieving variety in composition, but it is the relevant factor when considering scales. The compositional variety is modeled as the entropy of the pitch class distribution,

$$S = -\int_0^1 P(x) \ln P(x)\, dx, \tag{4}$$

where the integral is taken over 0 ≤ x ≤ 1 octaves. This is a fairly simple concept: entropy is maximum when all pitches are equally likely; entropy is zero when there is only one choice for pitch.

At very high temperatures (T > 21; when high compositional variety is favored) we find a uniform distribution over all pitches, which does not describe many (if any) types of music. As the temperature is lowered (16.6 < T < 21), discrete pitch classes emerge, starting with 12-tone equal temperament (12-TET). At this stage, there is no clear tonal hierarchy, so the distribution describes music that might be created using twelve-tone serialism. At still lower temperatures (T < 16.6) we see pitch distributions with tonal hierarchies that are more characteristic of Western classical music (Harasim et al., 2021). Eventually at T = 5, the lack of dissonance associated with unison or octave harmony leads to a single pitch class.
The model describes pitch using a probability distribution where the pitch class is collapsed onto a single octave range, 0 < x < 1 octave. The overall dissonance is the probability of two pitch classes, x and y, sounding at the same time times the dissonance caused by the interval |x − y|. We can convert x and y, measured in octaves, into frequency using $f_x = f_0 2^{x}$ and $f_y = f_0 2^{y}$, by taking any real positive value for $f_0$; the choice of $f_0$ does not matter since absolute pitch is taken into account separately through the choice of the critical bandwidth. This allows us to calculate the overall dissonance by integrating over all x and y,

$$U = \int_0^1 \int_0^1 P(x, y)\, D(|x - y|)\, dx\, dy. \tag{5}$$

The problem is simplified by assuming that two simultaneous pitch classes are independent of each other (in other words, all harmonies are equally likely), so that P(x, y) = P(x)P(y), and

$$U = \int_0^1 \int_0^1 P(x)\, P(y)\, D(|x - y|)\, dx\, dy. \tag{6}$$

The resulting optimal pitch class distribution P(x) is the one that minimizes

$$F = U - TS, \tag{7}$$

and after substituting Eq. 4 and Eq. 6 into Eq. 7,

$$F = \int_0^1 \int_0^1 P(x)\, P(y)\, D(|x - y|)\, dx\, dy + T \int_0^1 P(x) \ln P(x)\, dx. \tag{8}$$
Solving Eq. 8 to get the optimal P(x) is not trivial; hence, the important innovation of Berezovsky is to notice that this is equivalent to a mathematical equation that was already solved in the field of statistical physics (Wagner, 1951).

The assumption that simultaneous pitches are independent (i.e., all harmonies are equally likely) is of course incorrect, and it was made to simplify the problem. In reality, certain harmonies are more likely than others within a musical tradition (Jordania, 2011; Huang et al., 2017), and this is acknowledged by the authors. For example, if one studies the statistical distribution of pitch in Bach's Well-Tempered Clavier, one can find close to an even distribution of pitches overall, but minor second harmonies will occur less frequently than perfect fifths. Although not directly addressed in Buechele et al. (2024), Berezovsky offers the tone lattice model as an alternative formulation, where pitches only interact with a subset of neighbors (only a few harmonies are likely) in an abstract 2-dimensional (or in later work, 3-dimensional) space (Berezovsky, 2019; Din & Berezovsky, 2023). The 2-dimensional tone lattice models lead, interestingly, to equidistant 5- and 7-note scales, which are surprisingly common across cultures (McBride et al., 2023); these scales are also the only equidistant scales with fewer than 12 notes which can accommodate a maximum number of 'imperfect fifths' (fifths, but allowing for some deviation from a frequency ratio of 3:2) (McBride & Tlusty, 2020). The 3-dimensional tone lattice models lead to tonal hierarchies that approximate the major/minor tonal hierarchies of Western classical music. While the original assumption of independent pitches was an oversimplification, these additional models show how one can gradually add complexity to a model to arrive at more detailed predictions that accord better with empirical data.
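To make the trade-off in the independent-pitch model concrete, the free energy (total dissonance minus temperature times entropy) can be evaluated for a candidate pitch-class distribution on a discretized octave. The sketch below is illustrative only: `toy_dissonance` is a hypothetical placeholder kernel, the entropy is the discrete entropy of bin probabilities (which differs from the continuous entropy by a constant), and the temperature values are in arbitrary units that are not comparable to the values of T quoted in the text.

```python
import math


def toy_dissonance(t, w_c=0.03):
    """Hypothetical dissonance kernel for an interval of t octaves."""
    t = abs(t)
    return 0.0 if t == 0.0 else math.exp(-math.log(t / w_c) ** 2)


def entropy(p):
    """Discrete entropy of bin probabilities, taking 0 ln 0 as 0."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def total_dissonance(p):
    """Discrete stand-in for the double integral of P(x) P(y) D(|x - y|)."""
    k = len(p)
    return sum(
        p[i] * p[j] * toy_dissonance((i - j) / k)
        for i in range(k)
        for j in range(k)
    )


def free_energy(p, temperature):
    """Total dissonance minus temperature times entropy."""
    return total_dissonance(p) - temperature * entropy(p)


K = 48                              # bins per octave (arbitrary choice)
uniform = [1.0 / K] * K             # every pitch class equally likely
single = [1.0] + [0.0] * (K - 1)    # only one pitch class is used

if __name__ == "__main__":
    for temperature in (0.0, 1.0):
        print(temperature,
              free_energy(single, temperature),
              free_energy(uniform, temperature))
```

At zero temperature the single pitch class has the lower free energy because it avoids all dissonance, while at a sufficiently high temperature the entropy term dominates and the uniform distribution wins. Finding the distribution that minimizes the free energy at intermediate temperatures is the hard part that this sketch does not attempt.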
It is well known that one can arrive at a version (Pythagorean) of the Western tuning system via the circle of fifths. Starting from any note, progressively increment up and/or down in intervals of a fifth (Barbour, 1951) and you end up with something resembling 12-TET (or just intonation, mean-tone temperament, etc.), such that differences between scale degrees are typically smaller than a Pythagorean comma (about 23.5 cents). The approach of Berezovsky shows that you can achieve the same result, but from a different starting point: as a trade-off between minimizing sensory dissonance and maximizing compositional variety.
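The circle-of-fifths construction can be checked numerically. The following sketch stacks pure 3:2 fifths, folds the results into one octave, and compares each pitch class with the nearest 12-TET step. The choice of five fifths down and six up is an assumption made here to obtain twelve pitch classes; other choices place the comma elsewhere.

```python
import math


def cents(ratio):
    """Convert a frequency ratio to cents."""
    return 1200.0 * math.log2(ratio)


FIFTH = cents(3 / 2)  # a pure fifth, slightly wider than 700 cents


def pythagorean_scale(down=5, up=6):
    """Pitch classes (in cents) from stacking pure fifths from a start note."""
    return sorted((k * FIFTH) % 1200.0 for k in range(-down, up + 1))


def deviations_from_12tet(scale):
    """Signed distance in cents from each pitch to the nearest 12-TET step."""
    return [p - 100.0 * round(p / 100.0) for p in scale]


# Twelve pure fifths overshoot seven octaves by the Pythagorean comma.
PYTHAGOREAN_COMMA = 12 * FIFTH - 7 * 1200.0

if __name__ == "__main__":
    scale = pythagorean_scale()
    print(f"comma: {PYTHAGOREAN_COMMA:.2f} cents")
    for pitch, dev in zip(scale, deviations_from_12tet(scale)):
        print(f"{pitch:8.2f}  {dev:+6.2f}")
```

Each stacked fifth adds a little under 2 cents of deviation from 12-TET, so every pitch class in this twelve-note construction stays well within one comma of its equal-tempered counterpart.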
The new component of Buechele et al. (2024) compared to Berezovsky (2019) is the comparison of the results of the model with the evolution of tonality in Western classical music. The authors first compare various tuning systems (equal temperament, mean-tone temperament, just intonation) with optimal tuning systems obtained at different values of T. This suggests that the historical progression of Western tuning systems can be explained as increasing compositional variety while simultaneously increasing sensory dissonance (increasing T). They next describe the evolution of Western tonality by depicting pitch distributions onto a radial plot based on the circle of fifths, showing results consistent with other investigations (Huang et al., 2017; Moss et al., 2023). This describes the gradual evolution of complexity of Western classical music, and they map this onto their model, showing that this is equivalent to music having a higher temperature (there was a shift towards greater compositional variety). They point out that in the 20th century, Western art music became increasingly dissonant, and some composers even eschewed scales entirely to compose using continuous pitch. This account offers a quantitative view of the evolution of Western classical music as a shift in the balance between minimizing sensory dissonance and maximizing compositional variety. One could say that this historical progression has certainly been stated before by musicologists, if you accept a comparison between compositional variety and the ability to easily play in many keys on one instrument (Barbour, 1951; Christensen, 2002). Nonetheless, it is useful to also have a more precise mathematical description of this progression, to supplement the qualitative accounts of musicologists and historians.
Having first summarized and interpreted the work, I would like to examine where it fits in with the bigger picture of scale evolution. In the following I discuss: the factors that may have contributed to scale evolution; how compositional variety and dissonance fit into this picture; and the implications of pitch variability on the predictions of differences between theoretical Western tuning systems.
The evolution of scales over time is a cultural-evolutionary process (Youngblood et al., 2023), and probably the most salient property of scales is that they are relatively fixed within a society, across performers and across performances (Surjodiningrat et al., 1972; Church, 2015; D'Amario et al., 2020; McBride et al., 2023) – although sometimes more than others (Arom et al., 2007; Weisser & Demolin, 2013; McBride et al., 2023). The question of evolution then becomes a question of how the scales change over time, and how some scales end up becoming used more frequently than others. Instead of modeling the cultural-evolutionary dynamics (Aucouturier, 2008), a more common approach is to study the properties of historical and extant scales (Huron, 1994; Gill & Purves, 2009; McBride & Tlusty, 2020; Phillips & Brown, 2022a, 2022b).
If common scales share properties (e.g., common harmonic intervals), then a plausible theory is one that links those properties to some evolutionary advantage (in the sense that the scales are more likely to persist or be selected). Other properties besides sensory dissonance and compositional variety have been investigated. Harmonicity has been considered instead of sensory dissonance; however, the two properties are highly correlated, so this will not produce very different results (Gill & Purves, 2009). The interval spacing theory proposes that intervals need to be large enough that they are easily communicated without errors (Pfordresher & Brown, 2017; Phillips & Brown, 2022a, 2022b). Some have proposed that scales should be 'compressible' (few unique interval sizes), which leads to less complex melodies (McBride & Tlusty, 2020); others have proposed the opposite, namely that scales should have the properties 'uniqueness' and 'hierarchizability', where scales have more unique interval sizes (Balzano, 1980; Verosky, 2017). One important difficulty in studying the evolution of scales is that scales likely evolved differently in different cultures; e.g., societies with only solo monophonic music may not care about harmony (McDermott et al., 2016). To really understand how scales have evolved, one needs to study many properties, and many cultures.
I want to highlight how alternative hypotheses can lead to similar predictions as those found in the work of Berezovsky. The model of Berezovsky leads to separations between neighboring tones of at least about 1 semitone, since sensory dissonance is maximum when two tones that are very close in frequency are heard simultaneously. This is indeed found to be true in most societies, as intervals smaller than a semitone are exceedingly rare (Phillips & Brown, 2022b; McBride et al., 2023; Brown et al., 2024). The interval spacing hypothesis is a basis for a plausible alternative explanation of this phenomenon.
The scope of compositional variety in Buechele et al. (2024) is limited to tonality. It is true that increasing the range of possible tones increases the number of possible compositions. It is worth pointing out, however, that one does not need a lot of tones in order to create varied music. Many societies use tonal systems with as few as two or three notes (Brandel, 1961; Kunst, 1967). Due to combinatorial explosion, even two-note scales are sufficient to generate huge variety: the number of unique melodies can be calculated as $N^L$, where N is the number of possible tones, and L is the length of a melody. For N = 2 and length L = 40, there are roughly a trillion ($2^{40} \approx 10^{12}$) possible melodies. Moreover, pitch is only one component of music. Compositional variety can be widened through increasing complexity in rhythm, harmony, timbre, tempo and volume. Taking these into account, it is clear that there is no dearth of possible combinations to compose music with.
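The melody count is easy to verify. The short sketch below (the function name `count_melodies` is introduced here for illustration) treats a melody as an ordered sequence of L pitches drawn from N tones, ignoring rhythm and every other musical dimension.

```python
def count_melodies(n_tones, length):
    """Number of distinct pitch sequences of a given length: N to the power L."""
    return n_tones ** length


if __name__ == "__main__":
    for n_tones in (2, 3, 5, 7):
        print(n_tones, count_melodies(n_tones, 40))
```

For two tones and forty notes this gives 1,099,511,627,776, which is indeed about a trillion.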
Nonetheless, there is indeed evidence in modern art music of composers becoming unsatisfied with the sets of tonalities available to them; the authors note examples from Debussy and Schoenberg, to Pousseur and Xenakis (Buechele et al., 2024). In summary, there is evidence of composers breaking the bounds of conventional Western tonality to increase compositional variety, but it is not clear how general this is for theories of the evolution of tonality, as it is possible to achieve compositional variety without resorting to increasingly dissonant tones. Furthermore, it remains to be established whether a drive towards increased compositional variety is found outside of Western (or other) 'classical' musics, compared to stabilizing forces in music (Youngblood, 2019).
The model in Berezovsky (2019) and Buechele et al. (2024) rests on three assumptions. First, it assumes octave equivalence and that scales span an octave. There is limited evidence to show that octave equivalence may not be universal (Jacoby et al., 2019), but there is much evidence of melodies that do not always span an octave (Brandel, 1961; Kunst, 1967; McBride et al., 2023; Brown et al., 2024). Second, the model assumes that dissonance is described by the dissonance curves of Sethares (1998), which compute the overall dissonance of complex tones as the sum of the dissonance between pairs of partials. Sensory dissonance of two pure tones is often found to lead to aversion (McDermott et al., 2016; Harrison & Pearce, 2020). However, the universality of dislike of sensory dissonance arising from interactions between overtones is hotly debated [4]. Third, the model assumes that dissonance is something to be minimized (as far as allowed by limits on compositional variety). There are differences between societies in terms of how dissonance is used in composition, which is independent of how much compositional variety there is. For example, both Western religious choruses and Lithuanian sutartines (Ambrazevičius, 2017) tend to use scales with seven notes, yet sensory dissonance is avoided in the former and an integral part of the latter. Also, Javanese and Balinese gamelan music are highly similar, as expected given their shared history, yet only in Bali do they tune pairs of instruments with slight differences to create beats (Spiller et al., 2022). In summary, while there is evidence of preferences for harmony that avoids sensory dissonance in many cultures, it is not clear that dissonance is always something to be minimized.
Considerable attention is paid by the authors to differences between tuning systems (meantone temperament, just intonation, 12-TET) that were employed in the history of Western art music. Here I ask whether these differences are reliably produced and heard, and by whom?
Human pitch perception has clear limitations: Just-noticeable differences (JND), the smallest difference in pitch that can be reliably perceived, can be as low as 2-5 cents in laboratory conditions (Moore, 1973), which would enable distinctions to be made between tuning systems. However, JNDs are much higher for amateur musicians and non-musicians, and are much higher for interval differences compared to pitch differences (McDermott et al., 2010; Zarate et al., 2012). Interval discrimination, which is more relevant for tuning perception, is also likely to be diminished in realistic settings in comparison to controlled experiments in laboratories. Pitch production is also limited in precision, especially when it comes to singing (Ternström & Sundberg, 1988; Devaney et al., 2011; Pfordresher, 2022). Juxtapose the limits of pitch perception with the limitations of human vocal precision and it seems that the differences in Western tuning systems (typically about 10 cents) are unlikely to be reliably differentiated in practice, as has been found empirically (Barbour, 1951; Hagerman & Sundberg, 1980; Kopiez, 2003; D'Amario et al., 2020).
Despite these arguments against the ability to produce and perceive differences between similar tuning systems, we must take into account the historical progression of Western tuning systems, as noted in Buechele et al. (2024). The scholarly record indicates that tuning systems evolved from Pythagorean tuning, to just intonation, mean-tone temperament, and settled at equal temperament. This indicates that someone cares. However, one could ask, who cares? I speculate that it is primarily the people who professionally create and tune fixed-pitch instruments. These are the people who create devices such as monochords and tuning forks, which improve the precision and stability of the intonation of fixed-pitch instruments over time. What we know of the historical development of Western tuning is obtained through the treatises of theoreticians, but due to the evanescence of musical sound we know little of how well theory corresponded to practice (Christensen, 2002). In modern times there is certainly evidence of a disconnect between theorists, who prefer simplicity and mathematics, and practitioners of Thai classical music (Garzoli, 2020). A better understanding of this is required to truly know how music evolution is differentially driven by performers, tuners, and theorists.
The work of Buechele et al. (2024) shows how the optimal (according to their theory) tuning systems change along a continuum from the least dissonant, to tuning systems that enable maximal compositional variety. They show that the evolution of tuning systems in Western classical music is commensurate with movement along this optimum, in the direction of increasing compositional variety. Furthermore, this is mirrored in the evolution of tonality in the work of Western classical composers. A conservative estimate of where this model is applicable is in describing the evolution of Western classical music, at least in the precise tuning of instruments and the dissonance of compositions.
There are three areas that the work does not touch on, which I think would be extremely valuable next steps: (i) Are the small differences between theoretical tuning systems (e.g., just intonation vs. 12-TET) evidenced in empirical measurements of tunings or performances? One can measure intonation of vocal performances or how instruments are tuned; the differences between tuning systems may be negligible compared to the empirical variance in intonation. Existing studies so far lean towards this result, but there are only a few such measurements (Hagerman & Sundberg, 1980; Kopiez, 2003; D'Amario et al., 2020). (ii) How does the proposed theory compare with other theories of the evolution of music? (iii) How well do the results describe tuning systems from other cultures? This last question can be evaluated with recently-published datasets of scales (McBride et al., 2023; Brown et al., 2024). I anticipate future work that leads to clarification of the relevance and power of this theory in explaining the evolution of scales more generally.
Although this Commentary was invited in response to Buechele et al. (2024), it largely deals with the theoretical model of Berezovsky (2019). This particular mode of scientific communication – whereby much of the same information is presented in two separate, stylistically-distinct papers to two very different audiences – is uncommon (though not unheard of). What would Hermann von Helmholtz have chosen to do in this case, had he lived today? It is with this in mind that I write on the more general issue of interdisciplinary communication.
I would like to start the following discussion with a disclaimer. I will here be indulging myself in the writing of opinion, based on anecdotal evidence and conjecture. Some of my opinions are likely wrong, probably due to under-sampling of experiences, and possibly due to faulty inference. Hence, I feel the need to begin by introducing myself. I received education in chemical engineering and statistical physics, and after my PhD I shifted to biophysics. As luck would have it, I ended up with sufficient academic freedom to also study music. And so, I went down the path of studying various research questions, attending conferences, and I developed a network of music scientists to discuss and collaborate with.
When I was first invited to write a commentary on Buechele et al. (2024), I knew I wanted to speak about the general problem of interdisciplinary communication. To supplement my own experience, I sought out stories from others who have also straddled the boundary between physical and music sciences. I was surprised by the extent of overlap between different researchers' experiences, which gives me hope that this discussion is a worthy endeavor after all [4]. In the following, I discuss the importance of interdisciplinary work in music research, epistemological differences between disciplines, and advice for both physical scientists and those more fully embedded in the music science community.
Music science is about as interdisciplinary as it gets. Beyond the core disciplines, such as music theory, ethnomusicology, historical musicology and comparative (systematic) musicology, there are contributions from psychology, neuroscience, computer science, mathematics, anthropology, archaeology, and cultural evolution. Given this diversity, it is not surprising that there are physical scientists who also want to join in the fun. What do they bring to the party? Buechele, Cooke, and Berezovsky (2024) show how one can repurpose existing solutions from physics if two models share the same mathematical forms. Others have used theory and tools from biology to study the evolution of music: evolutionary theory (Nakamura & Kaneko, 2019; Lambert et al., 2020), sequence alignments (Savage et al., 2022), and morphometrics (Chitwood, 2014; Aguirre-Fernández et al., 2020, 2021). I should also point out that such transfer of ideas is not unidirectional, as tools developed for studying music have been used in the physical sciences (Leroi et al., 2020; Zali et al., 2023).
To reap the benefits of interdisciplinary research, we need effective communication and collaboration, followed by constructive commentary and criticism. Given the heavy burden on academics (research, teaching and administrative jobs; the constant drive to write papers and grants), stepping outside of one's main field is not necessarily supported at the faculty level. This means that people who cross the boundary from physical sciences to music do so sporadically, and do not have much time to commit to research and networking outside their principal domain. For these reasons, it is especially important that we make the most of what few opportunities arise. With this in mind, I would like to offer some advice to both sides on how to better communicate and understand each other.
I tentatively start this section with a conjecture, that differences in philosophy between disciplines can arise due to differences in the degree of certainty one can have in scientific results of that discipline. The lack of certainty in results is of course related to the degree of predictability and manipulability (Weaver, 1948; Sanbonmatsu et al., 2021) in the subject (e.g., psychology and culture are typically less predictable than inert matter), so this is in no way meant to be a criticism of the representatives of a particular discipline. This conjecture has been stated elsewhere more eloquently, such as in discussions on the hypothesized "hierarchy of the sciences" (Comte, 1855; Fanelli & Glänzel, 2013; Simonton, 2015). On one end of the extreme is mathematics, where once certain axioms have been assumed, there is a correct answer that can be unambiguously arrived at. On the other end of the extreme is a discipline like economics, where even after assumptions (which are not as clear as mathematical axioms) have been laid down, experimental results can be questioned based on sampling biases, construct validity, insufficient statistics, and the fidelity or utility of models used (Eronen & Bringmann, 2021). Disagreements are rare in the former discipline (Lamers et al., 2021), while in the latter disagreements have historically been embedded in the system, in the form of different schools of thought. I have noticed in my experience (and the experience of others) a series of distinctions (distinctions of degree, not categorical distinctions) that I would like to summarize: (i) the weight that is given to argument and logic over empirical results; (ii) tendency for debates to become polarized; and (iii) how much evidence is needed to claim 'proof'.
Logic is an essential tool in all sciences. It seems to me, however, that logic and argument are given greater weight in music science than in the physical sciences. I think that this arises naturally when there is greater uncertainty in observations; for example, the only significant scientific arguments I have observed in the physical sciences arose when a challenging technique had not undergone sufficient testing. The fact is that many observations in music science are statistical, such that there are few infallible 'laws'. There are few irrefutable starting points (e.g., the physics of sound, Weber-Fechner laws) from which to build. With this amount of uncertainty, observations, inferences, and deductions can all be questioned at any level of a research project. I think it is this uncertainty that leads to increased reliance on argument and logic (Deroover et al., 2023).
As an illustration, I ask the reader to consider a recent pair of target papers in the journal Behavioral and Brain Sciences, each of which proposed a theory for the origins of music (Mehr et al., 2021; Savage et al., 2021). These articles provide two examples of things that would have been unlikely to occur in the physical sciences. First, in one paper the authors (Mehr et al., 2021) make an extensive seven-point qualitative argument against the theory of music as an exaptation. Qualitative arguments are not unheard of in the physical sciences, but what stands out here is the reliance on the number of arguments rather than their quality (Harrison & Seale, 2021). Second, the papers are accompanied by 60 different commentaries that each discuss an aspect raised (or not raised) in the target papers. A potentially comparable case in the physical sciences would be the origin of life, but there scientists are focused almost exclusively on exploratory work, generating empirical results, rather than debates over theories; perhaps there is a general acceptance of the fact that debate cannot currently solve the question. In drawing attention to these examples, I am not making any judgments on what is 'correct'. I merely want to use these examples to illustrate differences between scientific communities. For me, awareness of the differences in approach helps to understand researchers from different disciplines.
It is my impression that there is a greater degree of polarization within music science, compared to physical sciences. In discussions at music conferences, I have been asked what side I take on some dichotomy 5, while I never experienced this at physical science conferences. An irregular experience of mine (yet which is new to me since getting involved in music science) is that music scientists erroneously assume or deduce my thoughts on topics based on what I have said previously. The only explanation I can think of for this is that my previous statements labeled me as part of a particular 'tribe' (Becher & Trowler, 2001). Such community-wide disagreements within a discipline appear to me to be uncommon in modern physics 6. They are more common in biology 7, but these disagreements typically get resolved after some decades due to methodological and technological advances. In contrast, one can point to the centuries-old nature-nurture debate over why music sounds pleasant 8. This debate is today extremely active, and shows no sign of imminent resolution (Collins, 1994). I do not mean to imply that polarization is unique or exceptional in music science communities, but it appears to me to be more prominent than in the physical sciences.
Different disciplines vary in standards about how much evidence constitutes proof (i.e., general acceptance that something is probably true). High-energy particle physics is an extreme example, where the gold standard is six sigma, which means that a result ought to occur by chance less than one out of half a billion times (p ≈ .000000002) to be accepted. In contrast, the typical probability needed to reject a null hypothesis in psychology is one in twenty (p = .05). In general, results in physical sciences are not accepted widely until there are multiple repeat measurements, using multiple experimental techniques. While similar approaches are found in music science, I can think of one example that illustrates a lower threshold for acceptance: In 1980 it was proposed that musical scales which have a property termed uniqueness - where each tone has unique relations with other tones, in terms of interval sizes - have certain benefits (Balzano, 1980). Scales with this property are asymmetric, and it has been speculated that this may facilitate key finding, increase the potential of melodic variation, and allow tones to assume different functions (Trehub, 1999). Despite going untested for almost twenty years, Balzano's (1980) paper racked up 110 citations. After the publication of a single experimental investigation (Trehub et al., 1999) (I have nothing bad to say about the paper itself), the 1980 paper picked up another 300 citations in the next 20 or so years. Eventually a more thorough investigation was undertaken (Pelofi & Farbood, 2021), but even before this I had spoken to many people who seemed to take this as 'established knowledge' in the field, yet it consisted of a hypothesis with a single test (participant sample size < 40). It is unlikely that such a paper would have been as influential in physics, where hypotheses only become important when they explain or predict robust observations 9.
Once again, I need to clarify that my intention is not to criticize, but to highlight differences between disciplines. Furthermore, one should be aware that these differences may exist, in part, due to the relative "cost" of psychological vs physical experiments. Physics projects can burn through a great deal of money, but they are less constrained by legal and ethical concerns (Sanbonmatsu et al., 2021), which severely limit the types of psychological experiments that are feasible, and how many people can be recruited. Regardless, I do worry that the inferred lower burden of proof in music science may lead to overly stringent gatekeeping, as a single peer-reviewed publication appears to carry much more weight than it does in physical sciences. I would like to illustrate this with a personal story: In a discussion about peer-review with a music scientist, my colleague opined on the importance of rejecting papers lest they lead the community astray. I explained that the papers and analyses could speak for themselves, and that people would figure it out. I was laughed at (in good humor) for my 'innocence'. I don't know if it was due to my (rather youthful, still) age, personality, or the standards of proof that I learned through the culture of physical sciences. Nonetheless, I wonder how different standards of proof may lead to different behaviors in academia.
In this section I have endeavored to open a discussion on the differences between disciplines, based largely on my personal experience. While readers may agree or disagree with my various conjectures, inferences, and deductions, I hope that they realize that they are made with good intent. In the following I aim to elaborate on more practical advice for both sides of the continuum, with the hope that the advice facilitates productive interactions between scientists differentiated by background, but united in their curiosity for the wonders of music.
It may seem obvious to the reader that more discussion and collaboration are beneficial. Here I would like to delve into the nature of the benefits, and how to practically achieve more discussion and collaboration. For me, the main contribution of a music scientist is the breadth and depth of knowledge in their subject. They know answers to many important questions that are necessary to guide reductive approaches, so that our models do not have glaring mistakes or rely on baffling assumptions. They know what has been tried before, and what the common pitfalls are within a topic. Expertise (especially the sort that is not directly evident in published papers) is invaluable in developing psychological experiments and performing appropriately-controlled big-data analyses. Some of the most enlightening discussions I have had have been with those whose expertise overlaps the least with my own. It is through these discussions that I have also learned something about the broad community of music science, which has led me to become a more effective communicator in papers and presentations. Finally, I speculate that papers may be more widely read if they are written by a diverse set of co-authors (Galam, 2022), given the social nature of knowledge diffusion (Wallace et al., 2012).
The more useful question to answer is: "How to find people to discuss and collaborate with?" I started out mainly with cold emails to strangers; this strategy has a variable hit-rate, but costs very little. Eventually I became aware of the many conferences and meetings (both online and offline), where I have found people to be overwhelmingly welcoming and certainly interested in speaking to a physical scientist. In my experience, the physical sciences have not yet adapted to the new world of online meetings: platforms are basic and buggy, and interactions have been limited. In contrast, the online music conferences I have attended have made attempts to be inclusive of different time zones, have made use of innovative platforms like Gathertown, and have supported asynchronous interactions through platforms like Slack; overall, people seem more engaged. To help make these conferences and meetings more visible, I have compiled a list hosted on GitHub (https://github.com/jomimc/MusicScienceMeetings), which I hope can be maintained over time.
The last issue I would like to address is the question of how to initiate collaborations. This is difficult by default, since for a physical scientist with an established network in their main field, it requires going outside one's network. This is particularly daunting since the usual development of an academic network starts with academic colleagues and advisors, and then organically grows throughout postgraduate study. Starting a new network from scratch requires a different approach, and some courage. It can also be difficult to start a collaboration when you have no track record of publishing in a different field. First, I advise starting with attending meetings and emailing people for discussions; in my experience people are more than happy to talk about their work if they have time. Second, I advise producing some preliminary results, and then asking a music scientist for help in guiding the direction of the research at this early stage. One should be aware that such a preliminary study may suffer from serious design flaws and need to be entirely revised, but this shows a level of intent and commitment that I believe will help people take you seriously.
A succinct summary for this section is: "Write longer papers!" I was taught that good writing is about concision and clarity, but the level of detail required to achieve clarity depends on your audience.
The following advice is informed by discussions with several other physical scientists who have dabbled in, or migrated to, music science. From these conversations, I have attempted to distill the perceived shortcomings of critiques that we have received, to arrive at a concise set of concrete suggestions.
In this commentary I have attempted to bridge the communities of physical and music sciences. In the first section I summarized two papers (Berezovsky, 2019; Buechele et al., 2024) which are like two sides of the same coin. One was written for physicists, and the other was written for musicologists. My summary is directed towards the middle ground. In the second section I tried to distill some advice and understanding from experience, with the aim of improving dialogue and comprehension between the groups. I spent much time highlighting differences, while emphasizing that these are differences of degree. On the other hand, I neglected to mention the substantial similarities, chief among which is categorical: music scientists and physical scientists are scientists foremost, who share an intense curiosity about the world and a drive to understand it using rigorous methods and reductive theories. As music is a joy for all mankind, I am sure that physical scientists will continue to be drawn to it. With humility and understanding, physical scientists and music scientists surely have much to share.
Thanks are owed to the following for discussing their experiences as physical scientists working on music: Daniel Chitwood, Gabriel Aguirre-Fernandez, Lucas Lacasa, Eita Nakamura, Jesse Berezovsky. Thanks to Jesse Berezovsky for discussing the finer details of his work. Thanks to Niels Chr. Hansen, Juan Diego Diaz, Somya Mani, Daniel Chitwood, and Patrick Savage for comments on the manuscript. This article was copyedited by Gabriele Cecchetti and layout edited by Jonathan Tang.