Discussions on the perception of a formal structure in collective free improvisations (CFI) are not new, especially when addressing the perception and the ontology of improvisations. Recent efforts to provide empirical evidence on this subject can be seen in works such as Canonne and Garnier (2015) and Canonne (2018). Attributing a concept of form to freely improvised performances can be seen as dubious, given that the practice is commonly described as a musical creation without any kind of predetermination – such as a harmonic pattern, melodic theme, or any kind of predetermined structure whatsoever. It is also common to describe free improvisation as a practice in which a referent does not exist. Jeff Pressing (1984, p. 346) defines referent as an "[…] underlying formal scheme or guiding image specific to a given piece, used by the improviser to facilitate the generation and editing of improvised behavior on an intermediate time scale". By this definition, in free improvisation performances there are no referents. However, an interesting hypothesis is raised by Costa and Schaub (2013), in which the authors argue that it is possible to expand the concept of referent to free improvisation. Since the referent is shared amongst the participants (as in an execution of a jazz standard, where all the musicians share the same structure – theme, form and harmony), "the past of the current performance (involving all collective, short- and long-term memory) could be thought as the only referent for that specific performance" (Costa & Schaub, 2013, p. 5).

Thus, following this hypothesis, we can consider that the referent is constructed in real time during a free improvisation performance. This means that the musicians base their musical gestures on immediately preceding events. We follow here the vision of Canonne and Garnier (2011, 2015), in which these musical gestures can generate collective sequences, meaning that the improvisers "succeed in converging on a given shared musical idea or framework, which is then played with until it is finally exhausted and so discarded, or negated by the concurrent introduction of a new musical idea from a member of the group" (Canonne & Garnier, 2015, p. 146). The establishment of a collective referent can create a collective sequence, a possible stable sonic environment. Consequently, following the authors' definition, a performance of free improvised music can be understood as a set of collective sequences. These sequences are inferred to be a conjunction of individual sequences (and it is interesting that the concept of emergence is discussed as a central part in collective free improvisation, as mentioned by Sawyer [2003], Borgo [2005, 2006] and others), and not all of the moments are considered satisfactory or interesting for the group. Thus, this can lead to different kinds of sequences, such as erratic, uninteresting, or dubious ones. Wilson & MacDonald (2016), Canonne & Garnier (2015) and Goupil et al. (2020) conducted experiments with free improvisers and analyzed the musicians' impressions of the quality of the perceived sequences. However, independently of these qualities, it is important to notice that the divisions of sequences by multiple improvisers had similarities, thus revealing a possible a posteriori structure of the improvisation. It is this subject that we focus on in this paper.

As such, we aim to investigate the following research question: are there differences in the perception of a possible structure when the given context is different? This question is raised due to recent experiments, such as the aforementioned study by Canonne and Garnier (2015), where the sequential form was put to the test, and also from Canonne's (2018) experiment on the qualitative differences in listening to a musical piece defined as an improvisation or as a composition. Results from the latter show that participants tended to give a negative evaluation (for example, in terms of musical coherence and the overall structure of the piece) of pieces that were in reality free improvisations but were contextualized as compositions. The reason is that participants perceived a certain lack of formal structure, while in composed pieces such structure would be a defining characteristic. However, given that we believe that the formal structure is an emergent element in collective free improvisations, we raise the hypothesis that listeners will segment (i.e., attribute a formal structure to) improvised pieces similarly even if the context of the creative process is said to be different. Our experimental setup also allowed us to explore how the perception of structure in improvised pieces is influenced by performance context, including factors like instrumentation and the number of instruments used in the piece.


As previously mentioned, the debate on concepts of a formal structure in collective free improvisation is an active one, as shown by Canonne and Garnier (2015). The authors address visions such as those that do not consider it possible to achieve a concept of form in improvisation, as in Boulez (1975, p. 150). According to the composer, musical improvisation "focus on the sound phenomenon itself: but form is almost always left out". When addressing collective free improvisation, Boulez argues that the practice can only achieve a "[…] very predictable pattern 'excitement/rest/excitement/rest'" (Canonne & Garnier, 2015, p. 146).

Traditionally, musical form is a concept based on a predetermined structure such as a composition. In canonical works of compositional theory, such as Schoenberg (1970), or in theory of form, such as Bas (1947), musical coherence is a necessary aspect to understand form. This coherence is related, primarily, to a recognizable musical material – such as repetitions or thematic variation – that can be helpful in two major ways: 1) helpful to the organization of the composition in itself, by addressing different "intentions" to different parts of the piece; 2) helpful to the listener, as our perception is context-dependent and we tend to look for different cues to comprehend a work of art (Anglada-Tort, 2018; Lehmann & Kopiez, 2010). This musical coherence is also described in an organic way – given that the form would organize the piece, so it would "function" as a living organism (Schoenberg, 1970). In summary, traditional views of formal structure are related primarily to compositions, whose musical material is predetermined. Thus, it is possible to analyze similarities between different parts of the piece and establish a form based on the thematic material – addressing repetitions and variations.

It is important to note that these views on composition do not aim to generalize the idea of an opposition between composition and improvisation. Elements of real-time creation, open form, lack of repetition and other elements were adopted in compositions especially when dealing with 20th and 21st century concert music. We will not address this debate here. However, Canonne (2018) demonstrated how these conceptions of structure are much more implied when a piece is described as a composition than when it is addressed as an improvisation, even when dealing with new music. It is shown that listeners tend to search for cues – structurally – when listening to a composition, given that a previous work by the composer is assumed. When listening to improvisations, listeners tend to follow the interaction between musicians, leaving aside the relations between parts of the music.

Authors criticize the simplistic view of form in improvisations (the pattern excitement/rest) and argue that it is possible to perceive clear formal structures in improvisations, even if the music is being created in real time and only by the interaction between the musicians (Borgo, 2005; Dean, 1992). As Jost (1994) mentions, freedom in choosing the material to play in the moment does not imply total absence of musical organization. Decisions are made based on an active knowledge base – the repertoire of the musician – and also by the active referent. Those decisions, thus, concern not only the sound itself but the whole musical structure being created in real time, including form. It is clear that we cannot assume that the concept of form is similar when we're dealing with real-time creation without any predetermined material, such as the case of collective free improvisation. Also, we don't believe that the idea of a simplistic pattern such as "excitement/rest" is a viable one to explain what happens during a performance of collective free improvisation. One of the arguments that goes against this idea is that, although current empirical research on this question is based on the subdivision into multiple parts that are perceptually similar – the collective sequences – it cannot be reduced to only those specific sequences. In the research that introduced the concept of a sequential form to improvisation, Canonne and Garnier (2015) show, in a posterior analysis, that musicians don't always converge on the change of sequences. This is also shown when analyzing improvisations with larger groups, such as in Goupil et al. (2020).

What is at issue is that there is no specific pattern when dealing with real-time musical creation with no referent (or with one also created in real time). In the study by Goupil et al. (2020), musicians analyzed their gestures by parameters such as maintenance of the sound environment and change of the sound environment, thus relating to what they called directional intentions of the musicians in a given improvisation. Interestingly, there are moments where a musician analyzed his/her action as a "change" in the sonic environment. However, other musicians described their gestures as "maintenance". At multiple times there are actions with an intention of change that are not addressed by other musicians. This means that for a real change in a collective sequence to happen, it is necessary that multiple improvisers understand the intentionality of their own gestures and, especially, the intentionality of gestures of other improvisers. We argue that there is no pattern of "excitement/rest" but a continuum of changes that, in some moments, are conceived together and are stable. Thus, we believe that the notion of collective sequences is more complex than a simple "conjunction" of similar musical gestures and/or intentions. A change sometimes is not perceived as a change by another improviser. In that way, there is no pattern "excitement/rest" but a complex game involving different intentions and different perceptions. The notion of a formal structure, in improvisation, rests on this complexity and the capability of the improvisers to maintain a stable sonic environment during a period of time. In this way, the formal structure can be seen as an emergent characteristic of improvisation.

In the aforementioned research, Goupil et al. (2020, p. 10) conclude that musicians in a 16-piece ensemble engaged in free improvisation could achieve coordination without an explicit shared plan or external conductor. They found that musicians' individual choices to play or stop playing were interdependent and influenced each other. Additionally, musicians' directional intentions to either change or support the music were also found to be interdependent and to influence each other, leading to a local alignment. Given that the directional intentions were considered interdependent only locally, we can assume that the perception of a possible form is individual: it depends on which actions were brought to the attention of the improviser during the performance. Thus, it is important also to discuss another problem that is fundamental to our experiment: are there any differences in listening to improvisations when compared to other generative processes in music?

When arguing about musical form in improvisation, it seems clear that the perception of a musician, when improvising, is different. It is not only their perception that is different, but the whole cognitive structure (cf. Clarke, 2001). However, in this study we aim to address the interaction between the improvised work and the listener. This question is clearly put by Canonne (2013, p. 331, our translation): "Is there anything specific to the aesthetic appreciation of freely improvised music? Or, […] is the experimental relationship that is created between the listener and the object of its perception varied in function of the mode which the music perceived was produced […]?". Falleiros (2012) conceives this relation between the musical object and its perceiver as a pact. As the author mentions:

"The fictional narrative throws the event into the space of speculation. For that, it is necessary that the 'reader' of this narrative detaches from the rational reference that requires external and universal evidence and immerses themselves in the particularity of the fictional world, thereby establishing a pact with the work. A narrative that wishes to describe an event in its present moment, by its own time, should be formulated by the constant struggle of redoing and reorganizing, following the encounter between rational and fictional narrative. To build a narrative that intends to present the present event, such as improvisation, an account of the understanding of the relationship between poetics and observation, and therefore the idiosyncrasies of each artistic act is first required" (Falleiros, 2012, p. 74, our translation).

The argument for the establishment of a pact between the audience and a work of art is corroborated by the notion that our perception is context-dependent. Recent studies have demonstrated how the subjective evaluation of a piece of music depends on the given contextual information. For instance, Kroger and Margulis (2016) demonstrated that listeners generally give higher evaluations to pieces presented as performed by professional musicians, compared to those purportedly played by students, regardless of whether the information about the performers was correct or incorrect. A study by Aydogan et al. (2018) reproduced these results – with the piece of music presented as performed by a professional being evaluated more positively when compared to a piece presented as performed by a student. Other studies, such as those by Duerksen (1972), Kirk et al. (2009) and Margulis et al. (2017), presented different empirical evidence of the context dependency of musical perception.

When addressing musical improvisation and context dependency (or, as Falleiros [2012] mentions, the pact), Lehmann and Kopiez's (2010) experiment is important to demonstrate that the perception of an improvisation is not "innate". That is, it is not possible to argue that we can fully distinguish between different generative processes only by listening to a piece. In their experiment, the authors demonstrated that we do not tend to address pieces by their generative process – fully written pieces ("compositions") or real-time creations ("improvisations") – but by "style" or "genre".

This is attested by Canonne (2018) in an experiment where the pair composition/improvisation was put to the test. In the experiment, participants were divided into two groups to analyze a piece that was a collective free improvisation. Each group received different contextual information. One group was told that the piece was a composition – fully notated, where the musician respected all the directions of the sheet music. The other group was told that the piece was a product of the original creative process: a collective free improvisation. The piece presented was chosen in a pilot experiment, where participants had to guess if a number of pieces were compositions or improvisations. However, all the pieces were free improvisations. The one that had the most "compositional aspect" was the object for the main experiment in which participants were told to qualitatively describe the piece – whether they thought it was a good composition/improvisation. Results were logical: those who had the context of the piece as being a composition gave mostly bad reviews – mentioning, especially, how the lack of structure made the piece incoherent. The participants who were told that it was an improvisation gave mostly positive reviews, focusing on the interaction of the musicians.

This experiment clearly shows how contextual information is fundamental to our perception and to our subjective evaluation of an artwork. However, even if we do not have this contextual information, we tend to search for cues to support some notions that are previously constructed in our knowledge base. In the aforementioned experiment, when the piece was addressed as a composition, the listeners tended to direct their attention to structure; when addressed as an improvisation, the listeners directed their attention to interaction. Lehmann and Kopiez (2010) consider that these tacit cues can guide the listener to the generative process, but most prominently, to the style or genre of the music. In another experiment about comprovisations, we argued that there is an ambiguity in how music is perceived, given that even with context information about a piece, the participants were unable to describe parts where the music was fully-notated or where it was improvised (Faraco, 2020).

In that manner, we'll assume the position of Canonne (2013) in his partition of listening postures. As the author explains, an "intentionalist listening" characterizes the aesthetic apprehension of an improvisation. This type of listening regards a "search of a musical thought through the process of creation – that consists in relating the heard sounds to a series of musical intentions to project intentional states on the perceived sounds" (Canonne, 2013, p. 352). That is, the listening posture when establishing a pact with an improvised piece of music is directed mainly to the interactive game that happens between musicians. This type of listening is distinguished from two other listening postures, namely the "acousmatic listening" and the "instrumental listening". The first is related to a hearing directed to the sound itself, with no regard to its source of production, while the latter consists in "hearing the sounds as the product of an instrumental gesture, by relating the sounds to the complex instrument/body of the interpreter" (Canonne, 2013, p. 353). These three listening postures would then relate to three different generative processes in music: electroacoustic/electronic music (acousmatic listening); music destined to be interpreted (instrumentalist listening – with a mediated reception by the notation and the interpreter); and musical improvisation (intentionalist listening – a non-mediated listening).

Following this theoretical background, in our experiment we aim to investigate whether emergent formal structures of free improvisations are perceived similarly if the context given is different (such as in Canonne [2018] – by giving contextual information stating that the originally improvised piece is actually a composition). We refer to the result of each participant as an emergent formal structure.



Twenty adult participants, all Brazilian, with an average age of 31.6 years old (SD = 6.63), took part in this study. In total, there were 9 female and 11 male participants. All participants had an academic background in music (ranging from undergraduates to PhDs) and had experience in collective free improvisation and/or classical and contemporary composition. Only one participant reported that they had never played in a CFI group, although they had contact with the practice. Recruitment was done by sending invitations to known CFI groups established in universities in Brazil. No compensation was given to the participants.

We chose to recruit musicians with experience in collective free improvisation in order to address the idea of a formal structure in free improvisation/contemporary compositions. At this time, non-musicians were not targeted, as we believe that the concept of form in this music requires, initially, a notion of the task of freely improvising. Perhaps, in the future, we can expand this experiment and compare the divisions made by musicians and non-musicians, or even by musicians without any experience in free improvisation.

Task and Procedures

Participants were asked to listen to a given recording of a piece and to divide it into subdivisions representing possible collective sequences or other parts, such as erratic or unstable sequences; that is, to mark each change in the structure of the piece. Given that at the time of the experiment there were health recommendations for social distancing due to the Covid-19 pandemic, participants made the analysis in their homes. They were asked to use headphones to hear the pieces, and they were allowed to listen to the piece multiple times (although they were limited by the application we used for the analysis, as we'll discuss). For each participant, we assigned a Google Drive folder containing a file of the piece (named only with a number) and a text file with the instructions and the link to the application. In these instructions, we detailed the context of the piece. The first group was told that the pieces were in fact collective free improvisations, that is, the true generative process of the piece. For the second group, we described the pieces as compositions made by 20th and 21st century composers. We also explained that the pieces had "openness" to real-time creation, although they were, for the most part, written. We were asked before the start of the experiment if the pieces had different kinds of notations. We did not respond to that question. We believe that if the listeners had known the types of notation (for example, if we had described the pieces as using graphic notation), it would have influenced the way they conceived the structure of the piece.

In the instructions, there was a link that redirected the participants to an application designed to analyze the quality of sequences in CFI, which was remodeled for this experiment. In this application, the participant would write their name and their piece number, and do the analysis by operating a slider bar with two extremes. When the participant felt that there was a change in the structure, they would click on the screen and the slider bar would move to the other extreme, a visual way to denote changes. This way we obtained discrete data with values 0 or 1 that marked changes in the formal structure (the slider bar only had these two values; thus it was not possible to leave the bar in the middle after the beginning, for example). This data was sent to a MongoDB database, in a JSON format with the timestamps (in seconds) and the values (0 or 1) that showed the changes. We gave the participants the following instructions: "When you feel that a change in the structure occurred, you should move the bar to the other side". For clarification, Figure 1 shows the main page of the application, with the slider bar for analysis:
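As an illustration of how such a record could be turned into a list of perceived change points, the following is a minimal sketch. The record layout (the field names `participant`, `piece`, `events`, `t`, and `value`) is an assumption made for this example only; the text does not specify the actual schema stored in the database, and all values are invented.

```python
import json

# Hypothetical record layout: the field names below are assumptions,
# not the schema actually used by the application.
record_json = """
{"participant": "P01", "piece": 3,
 "events": [{"t": 0.0, "value": 0}, {"t": 41.2, "value": 1},
            {"t": 95.7, "value": 0}, {"t": 160.3, "value": 1}]}
"""


def change_times(record):
    """Return the timestamps (in seconds) at which the slider value flipped."""
    events = sorted(record["events"], key=lambda e: e["t"])
    times = []
    for prev, cur in zip(events, events[1:]):
        # A perceived structural change is a flip between 0 and 1.
        if cur["value"] != prev["value"]:
            times.append(cur["t"])
    return times


record = json.loads(record_json)
boundaries = change_times(record)
print(boundaries)  # [41.2, 95.7, 160.3]
```

Under these assumptions, each flip of the slider yields one timestamp, and the resulting list is the segmentation of one participant for one piece.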


Figure 1. The application used for the analysis.

The application has three buttons at the top: "Play", "Pause", and "Reset". The button underneath the slider bar means "Send". Although we did not specify the number of times participants should listen to the piece, the participants were obliged to do the entire analysis before they could do it again. Also, they could reset if they felt the analysis was wrong in some way. Finally, we instructed the participants to write in the text file if they had any additional comments on the piece.


We chose ten different recordings from established collective free improvisation artists. The pieces can be addressed as "classical free improvisation" (Costa, 2017), a concept that seeks to define free improvisation groups that, as the author says, "realize unprepared performances (no explicit previous planning, on a time-delayed basis), or partially prepared (by a verbal script, conventional or graphic sheet music, words, restrictions, etc.)" (Costa, 2017, p. 8). Following this definition, the selected pieces come from established artists such as George Lewis, Evan Parker, Joëlle Léandre, Derek Bailey, and others. All of the recordings were free improvisations without any kind of predetermined or established referent. There were two duos, three trios, two quartets, one quintet, and two sextets. Although all pieces were free improvisations, we paid attention to possible cues that would suggest composition-like elements in those improvisations, such as repetition of a motive created in the performance, direct interaction, and imitation by the improvisers. We've gathered the relevant information about the pieces in Table 1.

Table 1: Information about the pieces.

| Piece | Title | Recording Artists | Album | Duration | Instrumentation |
|---|---|---|---|---|---|
| 1 | Speed of | Joëlle Léandre, Robert Dick, Miya | Solar Wind (Not Two Records, […]) | 3'44" | Double Bass, Flute, Koto |
| 2 | | Nu Duo | Live Performance | 3'42" | Bass Clarinet, Vibraphone |
| 3 | Improvisation | Isabel Crespo Pardo, Zoh Amba, Afarin Nazarijou, Talia Rubenstein, Anna Abandolo | Live Performance | 4'01" | Voice, Flute, Qanun, Guitar, […] |
| 4 | Tenor, Bass Percussion T2 | Dave Holland, Evan Parker, Craig Taiborn, Chess | Territories (Dare2 Records, 2018) | 4'10" | Tenor saxophone, Double Bass, Percussion |
| 5 | 14, Rue Paul Fort 7 | Joëlle Léandre, Benoît Delbecq, François Houle | 14, Rue Paul Fort Paris (Leo Records, 2015) | 3'24" | Piano, Double Bass, Clarinet |
| 6 | Free Improv 3 | Momentary Quartet | Live Performance | 3'49" | Piano, Horn, Trombone, […] |
| 7 | Trahütten | Evan Parker | Toward the Margins (ECM Records, 1997) | 6'17" | Soprano Saxophone, Violin, Double Bass, Stroh Viola, Piccolo Clarinet, Live […] |
| 8 | Erosão | Orquestra Errante | Live Performance | 7'29" | Double Bass, Alto Saxophone, Clarinet, Piano, Flute, Electric Guitar |
| 9 | Tuck | Derek Bailey, Evan Parker, Hugh Davies, Jamie Muir | The music Company (ECM Records, 1970) | 3'04" | Soprano Saxophone, Electric Guitar, Drums, Electronics |
| 10 | Duo | George Lewis, Yi-Yi Wang | Live Performance | 4'13" | Trombone, Erhu |

Data Analysis Methods

Our variable of interest in this experiment was the markings of the subdivisions – the segments. We believe that these markings, which were placed when the listener believed that a change in the structure had happened, represent the listener's perception of what Pressing (2001) calls an IG – an interrupt generator. This idea of IG was used in the aforementioned experiment by Canonne and Garnier (2015). Pressing (2001) described a model in which the improvisation is constituted as a set of musical sequences (a concatenation of event clusters). As Canonne and Garnier (2015, p. 146) mention, "each new sequence begins with an interrupt generation, i.e. the interruption of a purely associative chain of musical ideas, which often translates to the previous sequence". We believe that the segmentations made by the participants, especially those in Group A (who were given the improvisation context), are markings of IGs, which can represent a change in the musical structure. In Group B, the group that had the composition context, we believe that the participants did not perceive IGs exactly as the concept defines them. The interruptions perceived by the participants in Group B would be more related to previous notions of formal structure in compositions. Thus, we can pose the following question: are these segments that delimit the sequences and the structure of the analyzed pieces correlated in any way? If so, it would suggest that the perception of formal structure is not related to the generative process of the music.

For this analysis, we structured the data as Boolean variables (0s and 1s) that represent the segments of the listener in the time frame of the piece. As our experiment has multiple similarities to that of Canonne and Garnier (2015), we have followed their method to correlate IGs. Traditional methods of correlation are not very useful in this case, considering that we only have a "signal" when the segmentation of the listener appears. Thus, we reproduced the authors' method, in which we define a time interval (Δt) within which we can consider the segments as done in conjunction. This Δt is defined given that it is practically impossible to have segments done at the exact same moment. Considering that most of the pieces we used for the experiment do not have defined change points such as silence, repetition of thematic material, or even bar measurements, we believe that the Δt is necessary to accommodate small differences in interpretation of the beginning of new parts. Following their method: "For a given time resolution Δt, we search for each musician and for each IG located at time t if at least another IG from another musician occurred within the time frame [t – Δt; t + Δt]" (Canonne & Garnier, 2015, p. 149). Afterwards, we can calculate the proportion P as the number of conjunct segments divided by the total number of segments. The results are described in the next section.
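The matching rule quoted above can be sketched as follows. This is a minimal illustration, not the implementation used in the study or in any existing library: the function name `conjunct_proportion` and the example timestamps are invented, and the boundary case (a difference exactly equal to Δt counts as a match) is an assumption.

```python
def conjunct_proportion(markings, dt):
    """Proportion P of segment marks that have at least one mark from
    another listener within [t - dt, t + dt].

    markings: list of lists of timestamps in seconds, one list per listener.
    """
    total = 0
    conjunct = 0
    for i, own in enumerate(markings):
        # All marks made by every listener other than listener i.
        others = [t for j, m in enumerate(markings) if j != i for t in m]
        for t in own:
            total += 1
            if any(abs(t - u) <= dt for u in others):
                conjunct += 1
    return conjunct / total if total else 0.0


# Invented example: one listener per context for the same piece.
group_a = [41.2, 95.7, 160.3]
group_b = [38.0, 99.9, 130.0, 162.0]

p_wide = conjunct_proportion([group_a, group_b], dt=5.0)    # 6 of 7 marks match
p_narrow = conjunct_proportion([group_a, group_b], dt=2.0)  # 2 of 7 marks match
print(p_wide, p_narrow)
```

With a tolerance of 5 seconds, every mark except the one at 130.0 s has a counterpart in the other listener's segmentation; with 2 seconds, only the pair near 160 s still matches, which illustrates why P grows with Δt.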


Table 2 gives a summary of the data, presenting the following measurements and statistics: number of perceived sequences/segments (NS), mean duration of the sequences by piece (MDS), standard deviation (SD), minimum value (Min.), the median value and maximum value (Max.). With the exception of NS, all the other measurements were made in seconds. For each of the ten different recordings, measurements of the two samples – A (sample with the improvisation context) and B (sample with the composition context) – are given. In total, there were 118 sequences perceived with a mean of 5.9 sequences per piece. The total mean of all durations as displayed in the following table is 43.01 seconds, with a standard deviation of 21.14 seconds.
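For readers who wish to reproduce this kind of summary from a list of change points, the sketch below shows one possible computation. It rests on assumptions not stated in the text: that sequences are the intervals between consecutive change points (with the start and end of the piece as outer edges), and that SD is the sample standard deviation. The function names and the numbers are illustrative only.

```python
import statistics


def sequence_durations(boundaries, piece_length):
    """Durations (s) of the sequences delimited by the change points.

    Assumes the first sequence starts at 0 and the last one ends at
    piece_length; this convention is an assumption of the sketch.
    """
    edges = [0.0] + sorted(boundaries) + [piece_length]
    return [b - a for a, b in zip(edges, edges[1:])]


def summarize(durations):
    """Compute the statistics named in the summary table for one analysis."""
    return {
        "NS": len(durations),
        "MDS": statistics.mean(durations),
        # Sample standard deviation (n - 1); assumed, not stated in the text.
        "SD": statistics.stdev(durations) if len(durations) > 1 else 0.0,
        "Min": min(durations),
        "Median": statistics.median(durations),
        "Max": max(durations),
    }


# Invented example: three change points in a 240-second piece.
durations = sequence_durations([40.0, 100.0, 160.0], 240.0)
summary = summarize(durations)
print(summary)
```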

Table 2: Results from segmentations made by participants: number of segments, mean duration of each sequence, standard deviations and minimum and maximum durations.

In this results section, we'll first discuss the agreement in the formal structures when compared between groups (composition x improvisation contexts), in order to see if any similarities emerge under these different contextualizations. We also conducted statistical tests on key variables obtained from this experiment, including the lengths of sequences, the total number of perceived sequences, and the relationship between sequence length and the number of instruments in each piece.

Agreement in Segmentation

First, we'll address the agreement in the segments made in each piece, by pairing the analyses made by the two groups under different conditions. We should remark that, even if we raised the hypothesis that a similar segmentation would indicate that the perception of structure does not depend on the contextualization of the generative process of the pieces, we should look at the data carefully, as there are multiple variables that we did not map. Those are subjective values, related to the individual aesthetic experience of each participant. In case we find multiple agreements in the segments, we can attest two different points: 1) cues afforded by the music are not dependent on its generative process, at least in a specific context: that of contemporary music creation; 2) although aesthetic experience can be subjective, there are possible musical aspects – such as the formal structure – that are perceived similarly, independent of the generative process of the piece. That is, formal structure is indeed an emergent aspect of music, even in different creative processes. Although we aimed to select pieces that had cues that could imply the pieces were compositions, it is important to remark that this experiment should be extended to encompass more and different pieces. That is why we consider this paper an exploratory analysis.

As mentioned, to assess whether there are similarities in the segments, for each piece we took the analysis of each group and compared them, looking for segments that coincide within a given Δt. First, we iterated over multiple values of Δt to see the distribution. Logically, the larger the Δt, the more matching segments we will have. For better visualization, we computed the proportion P, which is the ratio between the number of segments considered simultaneous and the total number of segments made. We plotted the distribution of P as a function of Δt (in seconds), with a linear fit (dotted line), which can be seen in Figure 2.

Figure 2. Distribution of P as a function of different Δt (in seconds).
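
The computation of P can be illustrated with a minimal sketch. It assumes that each group's segmentation of a piece is a list of timestamps in seconds, that marks are paired one-to-one in time order, and that each pair counts as two simultaneous segments; the text does not specify the exact pairing rule, so these choices, the function name and the timestamps are hypothetical.

```python
def matching_proportion(marks_a, marks_b, dt):
    """Proportion of segmentation marks that have a partner in the other group.

    Assumption: marks are paired one-to-one, in time order, when they lie
    within dt seconds of each other; each pair counts as two matched marks.
    """
    a, b = sorted(marks_a), sorted(marks_b)
    i = j = pairs = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= dt:
            pairs += 1          # the two marks are considered simultaneous
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1              # a[i] is too early to match any later mark in b
        else:
            j += 1              # b[j] is too early to match any later mark in a
    total = len(a) + len(b)
    return 2 * pairs / total if total else 0.0


# Hypothetical timestamps (in seconds), not data from the experiment.
group_a = [12.0, 47.5, 90.0, 133.0]
group_b = [14.0, 52.0, 88.5, 140.0, 170.0]

for dt in (1, 2, 5, 8, 10, 15, 20, 25):
    p = matching_proportion(group_a, group_b, dt)
    print(f"dt = {dt:>2}s  P = {p:.2f}")
```

Because a larger Δt can only add pairs, P computed this way never decreases as Δt grows, which is the behavior described for Figure 2.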

We can see that within the 5-second mark, the proportion increases rapidly and then tends to stabilize. A linear fit over the first 5 seconds gives a slope of 0.12. Thus, at small time resolutions the simultaneity of segments grows linearly, giving us a typical timescale (τ) of 1/0.12 = 8.3 seconds, which represents the Δt within which we consider two segments to have been made simultaneously. Following the methodology in Canonne and Garnier (2015), and to corroborate this typical timescale, we reproduced our tests using the module impro, a Python library with specific functions to analyze IGs and their relations (Faraco and Garnier, 2023). We obtained similar results both for the percentage of simultaneous segments and for the computation of the Δt. The latter was calculated by observing that the proportion of simultaneous segmentations follows an exponential relaxation as Δt increases, as in Canonne and Garnier (2015). With the typical timescale as a constant, we then take the "linear fit of log(1 – P) as a function of Δt" (Canonne and Garnier, 2015, p. 149) in order to obtain the typical timescale, which is also 8.3 seconds. Results are reported in Figure 3.

Figure 3. Analysis of the simultaneous segments and typical timescale τ = 8.3s.
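
Both estimates of τ can be sketched in a few lines. The example assumes the exponential relaxation P(Δt) = 1 − exp(−Δt/τ), so that log(1 − P) is a straight line in Δt with slope −1/τ. The proportions below are synthetic values generated from that model, not the experimental data, and the function name is hypothetical.

```python
import math


def timescale_from_log_fit(dts, proportions):
    """Least-squares line through log(1 - P) versus dt; returns tau = -1 / slope."""
    xs, ys = [], []
    for dt, p in zip(dts, proportions):
        if p < 1.0:                      # log(1 - P) is undefined at P = 1
            xs.append(dt)
            ys.append(math.log(1.0 - p))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -1.0 / slope


# First estimate: inverse of the slope of the linear fit at small dt (0.12 in the text).
tau_linear = 1 / 0.12

# Second estimate: synthetic proportions generated with tau = 8.3 s (assumption),
# then recovered from the linear fit of log(1 - P).
dts = [1, 2, 5, 8, 10, 15]
synthetic_p = [1 - math.exp(-dt / 8.3) for dt in dts]
tau_log = timescale_from_log_fit(dts, synthetic_p)

print(f"tau from linear slope: {tau_linear:.1f} s")
print(f"tau from log fit:      {tau_log:.1f} s")
```

With real data the points will scatter around the line, and pieces that reach P = 1 at small Δt are excluded from the log fit because log(0) is undefined.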

The figure above shows the average proportion (the dotted black line in the left graph), as well as the proportions for each of the pieces. The graph on the right in Figure 3 represents Canonne and Garnier's (2015) method of choosing a realistic/common Δt by means of the linear fit. In their experiment, they found a timescale τ of 10 seconds when comparing signals from improvisers, while listeners who made the segments had a τ of 4 seconds. In our experiment, assuming this exponential relaxation, we found τ = 8.3 seconds. For better visualization of the data, we have organized Pi, with i being the respective piece, and P as a function of multiple Δts in percentages, as in Table 3.

Table 3: P as a function of different Δts, in percentages.
Pi / Δt | 1s | 2s | 5s | 8s | 10s | 15s | 20s | 25s

We decided to analyze both Δt = 5s and Δt = 10s, although we believe that the results within the 5s time frame are more meaningful. In summary, within a 5-second time frame, 64% of the total segments (or IGs) were made together and correlated. In a 10-second time frame, 71% were correlated. When analyzing larger time frames, as seen in Table 3, we reached proportions higher than 90%, even when the Δts are smaller than the mean length of the sequences. For Δt = 8s, which approximates the typical timescale τ derived from the exponential relaxation, the result (P = 69%) is slightly higher than for Δt = 5s and slightly lower than for Δt = 10s.

Some cases stand out: in piece 6, there is an abrupt change from 20% of matching at Δt = 8s to 90% at Δt = 15s. In piece 9, there is an abrupt change from 33% at Δt = 5s to 100% at Δt = 15s. Lastly, in piece 10 we obtained 100% matching from Δt = 2s onward. We examined the data on piece 10 and found that when Δt = 1s, the percentage of matching is 33%. This is remarkable, given that the listeners not only matched their segments, but placed them at almost exactly the same timestamps.

Differences in the normalized number of perceived sequences/segments

Regarding the number of perceived sequences (or segments made), we can see in Table 2 that, when analyzed in pairs, most of the recordings had a similar or approximate number of divided sequences. Pieces 2, 6, 9, and 10 had similar divisions in their pairs. Pieces 3, 4, and 7 differed by only one sequence. The others differed by two or more sequences. The data on the number of sequences is very limited, given that we had a small sample. We normalized the data by dividing the number of sequences perceived by the total duration of the respective improvisation. This could minimize the effect of the length of the pieces, given that a longer piece would be expected to have more sequences than a shorter one. This is actually the case in our experiment: pieces 7 and 8 are longer than 350s and have the most subdivisions (between 8 and 10 segments, while the average is 5). We found that, after normalization, the distribution is approximately symmetrical, as shown in the histogram in Figure 4. Even though the distribution is slightly skewed, the Shapiro-Wilk test does not reject normality (W(20) = .920, p = .102).

Figure 4. Distribution of the normalized data – number of sequences (NS)

Despite our limited dataset, we conducted statistical tests to examine potential significant differences in the normalized number of perceived sequences, as illustrated in Figure 1. A paired t-test revealed a significant difference between Group A (Mean = 0.02, SD = 0.005) and Group B (Mean = 0.03, SD = 0.006), with t(8) = 2.7 and p = .013. While the small sample size limits the conclusiveness of this result, it intriguingly suggests a potential variance in sequence perception when pieces are contextualized differently. Group A perceived a total of 54 sequences, compared to 64 in Group B. This leads to an intriguing hypothesis: segmentation might generally be consistent, yet there appears to be a variance in the number of segments identified. This observation could lead to further investigation into whether listeners tend to segment more in improvisations or compositions. Such findings could indicate specific cues that influence the generative process, despite the majority of segments being agreed upon, as noted earlier.
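
The normalization and the paired comparison can be sketched as follows. The counts and durations are hypothetical, not the values in Table 2, and only the t statistic and its degrees of freedom are computed by hand; obtaining the p-value requires the Student t distribution, for which a function such as `scipy.stats.ttest_rel` would be the practical choice.

```python
import math
from statistics import mean, stdev


def normalized_counts(counts, durations_s):
    """Number of perceived sequences per second of recording."""
    return [n / d for n, d in zip(counts, durations_s)]


def paired_t(x, y):
    """Paired t statistic for y - x and its degrees of freedom (n - 1)."""
    diffs = [b - a for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical values: four pieces, each segmented by both groups.
durations = [200.0, 250.0, 300.0, 400.0]      # total duration of each piece (s)
counts_a = [4, 5, 6, 8]                       # sequences perceived by Group A
counts_b = [5, 6, 6, 10]                      # sequences perceived by Group B

norm_a = normalized_counts(counts_a, durations)
norm_b = normalized_counts(counts_b, durations)
t_stat, df = paired_t(norm_a, norm_b)
print(f"t({df}) = {t_stat:.2f}")
```

Pairing by piece matters here: each difference compares the two groups on the same recording, so differences in length or instrumentation between pieces cancel out of the test.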

Differences in the sequences' lengths

We also performed tests to see whether there are any significant differences in the perceived duration of the sequences when they are contextualized differently. To address this question, we followed the same process as above, normalizing the data by dividing the duration of each sequence by the total duration of its respective recording. This is important given that, for these tests, we aimed to identify a "common length" among the perceived sequences. With this balancing, we can observe the distribution of the data, as shown in Figure 5.

Figure 5. Distribution of the lengths of the perceived sequences.

The histogram clearly shows the asymmetry of the data. Even after balancing the duration of the sequences, the distribution maintains a positive skewness. We also tried transforming the data with logarithms and square roots. However, none of those methods brought our data close to a normal distribution. Thus, for our objectives here, we decided to use the balanced data and non-parametric tests of inference.

To see whether there were significant differences in the lengths of the sequences, we performed a Wilcoxon signed-rank test. The results of this test indicated that there are no statistically significant differences between the values of the groups (Z = 0.97, p = .33). We also performed a paired t-test, which rendered similar results. It indicated a non-significant difference between Group A (M = 0.2, SD = 0.1) and Group B (M = 0.2, SD = 0.08), with t(53) = 1.3 and p = .193. Thus, there is no significant evidence to suggest that different contextualizations lead to variations in the perceived length of a collective sequence. This can lead to further research questions, especially in the study of CFI: is there a "common length" of sequences? Given that improvisation requires specific cognitive processes and is based on cognitive economy, one could suppose that sequences have a common length because a variable of "boreness" (Canonne & Garnier, 2011), or tiredness, exists. However, as we will see in the next section, this hypothesis can be questioned, given that we found a certain dependency between the length of the sequences and the instrumentation of the piece.
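
The signed-rank statistic behind this test can be sketched by hand. The example uses hypothetical normalized durations (each sequence duration divided by the total duration of its recording) and the normal approximation for Z without tie or continuity corrections, which is an assumption; a library routine such as `scipy.stats.wilcoxon` handles these details and returns the p-value.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they occupy."""
    positions = {}
    for pos, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def wilcoxon_signed_rank(x, y):
    """Return (W+, W-, Z) for paired samples, dropping zero differences."""
    diffs = [b - a for a, b in zip(x, y) if b != a]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    n = len(diffs)
    mu = n * (n + 1) / 4                                  # expected W+ under H0
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)     # no tie correction
    return w_plus, w_minus, (w_plus - mu) / sigma


# Hypothetical normalized durations for five paired sequences.
lengths_a = [0.10, 0.25, 0.15, 0.30, 0.20]
lengths_b = [0.12, 0.20, 0.19, 0.31, 0.26]

w_plus, w_minus, z = wilcoxon_signed_rank(lengths_a, lengths_b)
print(f"W+ = {w_plus}, W- = {w_minus}, Z = {z:.2f}")
```

Because the test uses only the ranks of the differences, it does not require the normality that the skewed duration data failed to show.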

Dependency in the length of the sequences and number of instruments

If we analyze the duration of sequences by the number of instruments in the given recording, we notice a certain relationship. Using the normalized data, we performed a Kruskal-Wallis test (a one-way ANOVA on ranks) to compare the perceived durations as a function of the number of instruments in the piece: duos, trios, quartets, quintets, and sextets. The test shows that there is a statistically significant difference between the mean ranks of the groups (H = 30.10, p < .005). Thus, it is possible to assume that the duration of a sequence perceived by the listener is influenced by the size of the group performing that piece. Figure 6 presents a visualization of the distribution of the balanced durations by the number of instruments (see note 6).

Figure 6. Distribution of balanced durations of sequences as a function of the number of instruments in the piece.
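
The H statistic of the Kruskal-Wallis test can be sketched as follows. The grouped durations are hypothetical, only three ensemble sizes are shown for brevity, and the formula omits the tie correction (an assumption that is harmless here because the example has no ties); `scipy.stats.kruskal` applies the correction and also returns the p-value.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they occupy."""
    positions = {}
    for pos, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def kruskal_wallis_h(groups):
    """H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1), without tie correction."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])   # R_i for this group
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


# Hypothetical normalized sequence durations grouped by ensemble size.
durations_by_size = {
    "duo": [0.10, 0.12, 0.15],
    "trio": [0.18, 0.20, 0.22],
    "sextet": [0.30, 0.35, 0.40],
}

h = kruskal_wallis_h(list(durations_by_size.values()))
dof = len(durations_by_size) - 1
print(f"H = {h:.2f} with {dof} degrees of freedom")
```

H is compared against a chi-squared distribution with (number of groups − 1) degrees of freedom, which is four for the five ensemble sizes used in the experiment.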

This dependency appears when we compare all the available results. However, we also sought to examine the distribution within groups, that is, whether there is any difference in the results when the context was different. This is shown in Figure 7. We ran multiple one-way ANOVA tests and Kruskal-Wallis tests (when normality could not be assumed) to see whether there were any significant differences between the two groups when analyzing the durations of the sequences as a function of the number of instruments. The only statistically significant difference was for trios (F = 7.27, p < 0.05; see note 7). This suggests that although the duration of the sequences depends on the number of instruments, it does not depend on the contextualization of the generative process. As we will mention in the Discussion section, our hypothesis is that this happens because groups with a larger number of instruments in CFI tend to generate sound environments of greater complexity.

Figure 7. Multiple boxplots with the distributions of the length of sequences as a function of the number of instruments (duos, trios, quartets, quintets, and sextets).


Discussion

First, we address the results by comparing the simultaneity of segments. As seen, within an 8-second time frame, 69% of the total segments were considered correlated. When the pairs of pieces are analyzed individually, we can see drastic differences from the overall percentage of correlation (such as in P6, with 20%, and P10, with 100%). In a 10-second time frame, 71% of the total segments were considered simultaneous.

As previously mentioned, we can consider these segments in two different ways for each group, due to their different contextualization: for Group A, where the true generative process was described, we can consider the segments as collective sequences determined by individual IGs, given that they would represent interruptions in the improvised sequences. In the second group, where the pieces were described as compositions, we believe that a thought process similar to that in Canonne (2018) took place: the listeners searched for structural cues that would suggest a change in the overall form of the piece, which is common in compositions. In Canonne's (2018) experiment, participants showed significant differences in their judgment of the quality of the pieces when in different contexts. However, as we aimed to demonstrate here, despite the difference in judgment, the segmentations of the pieces coincide almost 70% of the time. This leads us to assume that, although listening to improvisations and compositions differs qualitatively (as in judgments of quality), structurally we can presume that they are more similar than they appear to be. This further attests the emergent quality of formal structures in music, independent of the creative process.

Of course, we cannot generalize our results. Further experimentation would be necessary to conclude that the perception of formal structure in improvisations and compositions is similar. In Canonne and Garnier's (2015, p. 152) experiment, they asked listeners to segment the improvisations, and 53% of the segments made by the listeners matched those made by the improvisers. Also, between the listeners, when compared in pairs there was a high proportion of correlation (73.4%). However, when multiple listeners are compared, this proportion goes down (with three listeners: 56.4%; with four listeners: 40.4%; with five listeners: 9.2%, with Δt = 4s), meaning that unanimity is hard to reach. However, they did not compare this segmentation between listeners who had different contextualizations, as we did here. We believe that one of the points they raised relates to the experiment made here: "In many ways, being often atonal, timbre-oriented and/or multilayered, the question of CFI's segmentation can be approached in much the same way as the segmentation of non-tonal western contemporary music or electroacoustic music. As Bailes and Dean (2007) put it, segmentation of these types of music apparently relates to its surface features: changes in the texture, in the instrumental timbres, in the frequency range or in the overall loudness will likely serve as cues for the detection of musical boundaries" (Canonne & Garnier, 2015, p. 153).

We agree with Bailes and Dean's (2007) argument that segmentation is based especially on changes rather than on continuity or repetition. However, it was clear in Canonne (2018) that the participants still judge the music based on a traditional concept of form, even when faced with contemporary non-tonal music. We can relate these different quality judgments by assuming that listeners adopt different listening postures according to the context they have of the generative process of the piece. As already mentioned, when informed that the piece to be heard is a composition, the listener assumes an instrumentalist listening, in which they aim to relate the sound product to an interpretation; when the context given is that the piece is an improvisation, the listener assumes an intentionalist listening, directed to the instrumental gestures and generative processes rather than to an interpretation. From this understanding of listening postures, we can see that quality judgments can clearly differ when dealing with improvisations and compositions. However, as we saw in our experiment, the overall notion of structure can be quite related and possibly similar.

Secondly, it was shown that when analyzing the total duration of the sequences, there were no significant differences. However, the total number of perceived sequences was significantly different. This leads us to think that, even though there are similarities in the segmentation points, listeners tend to subdivide the pieces more when they are contextualized as compositions than when they are contextualized as improvisations. We believe that this is due to the listening posture: in intentionalist listening, listeners tend to analyze the fluidity of the improvisation; perhaps, when there is a perceptible change in the sound environment, the listeners do not qualify it as a change in the structure, but as a continuation of the current idea. In instrumentalist listening, such a change in the sound environment can represent a change in the overall structure. However, as we mentioned, we cannot attest this, given that the data was scarce and no qualitative assessment was made of how the segments were chosen.

Lastly, another interesting result is that the duration of the sequences seems related to the number of instruments in the improvisations. This means that, when analyzing pieces with a greater number of instruments, listeners tend to expand the time frame of the sequences. As we mentioned before, we believe that this is related to the complexity that tends to increase when more instruments are involved in free improvisation. This was attested by Goupil et al. (2020), for example. As the complexity of the music increases, the sequence tends to be more complex: there are more nuances and more subtle changes that need to be addressed. That is, the listener has to decide which changes they consider important enough to qualify as a point of segmentation. Thus, it can lead to greater lengths in the sequences. However, there do not seem to be differences between the two contexts, improvisation and composition. As seen, only in trios were there significant differences. This can reinforce our hypothesis that, although the judgments on quality can change, the perception of structure in improvisations and compositions can be closely related. In the study by Goupil et al. (2020), the authors observed a tendency in large groups to form smaller units (duos, trios, etc.) of interaction. This could be an influential factor in the listeners' perception, given that cues of structural elements would be dispersed among the multiple subgroups formed in a large-group CFI.

One recent study by Saint-Germier et al. (2021) investigated the differences in improvisers' experiences in small and large groups of CFI. Although here we studied parameters of the perception of improvisations, we believe that it is interesting to report the authors' results. First, they mention that "[…] we found that group size strongly altered the phenomenology of improvisers who otherwise shared many characteristics (high expertise, similar aesthetic preferences, etc.), and that such experience varied the way improvisers dynamically related to one another throughout the performance" (Saint-Germier et al., 2021, p. 18). The authors raise five aspects of the phenomenology of agency to study these differences: a) agency for a joint outcome, b) agential identity, c) integration, d) dependence and e) reflexivity. We will highlight two results: the authors reported that in the large group, a 16-piece improvisation ensemble, they found a higher sense of We-agency, that is, "whether the agent experiences herself as an individual agent or as a part of a collective agency" (Saint-Germier et al., 2021, p. 5). Also, the ensemble participants felt more integrated than their quartet counterparts. This means that, although we tend to think that in large-group improvisations coordination is difficult to reach, and that in large groups there would be more "misalignments", as said by the authors, it seems that in smaller groups a "flow" or a "momentum" is more difficult to reach. Also, musicians within large groups tend to restrict their gestures, given that they know the complexity of the environment, thus giving a greater sense of integration.

By analyzing the results in the study by Saint-Germier et al. (2021), we can attest that addressing complexity and its relation to the size of the group (number of instruments) depends on multiple factors, such as the experience of the group in playing together, the interactional links, the notion of a joint outcome, and others. Thus, although in our results we have seen a correlation between the number of instruments and the length of the sequences, we should expand our experiment and approach this issue systematically. We could raise the hypothesis that the length of the sequences showed this dependence because the musical material in the pieces used had a continuous construction, as in Saint-Germier et al. (2021). Thus, there would be no cues or clear interruptions that would make the listener think that a change in the structure was being made. If it is true that in large improvisation ensembles the senses of integration and of agency are greater, we could infer that the creation of musical material in real time is more "cautioned", thus leading to a more continuous texture, without clear moments of change that would imply a structure. Also, it would be interesting to see how the phenomenologies raised in Saint-Germier et al.'s study are reflected in the perception of improvisations, rather than in the experience of the improvisers. However, to know these details, we would need a new experiment addressing the quality of the sequences and the judgments by the listeners. Perhaps, in the future, this would make way for new hypotheses on the perception of improvisations.


The method utilized in this study has some limitations. We did not address the quality of sequences, meaning that we could not determine whether there were similarities between them. Also, our sample size was limited; in the future, it would be interesting to recruit more participants and also employ a more robust set of recordings. In addition, this experiment had no criteria for the subdivision. Perhaps, also for the future, it would be interesting to design an experiment that has individual criteria for subdividing the structure. For that, it would be fruitful to use methods such as those that consider verbal reports as data, as a means to discover the cognitive processes that take place when the listener subdivides a recording of collective free improvisation.

In conclusion, this experiment, although simple, is a first step toward analyzing the properties of the segmentational form and structure of collective free improvisation when compared to the segmentation of non-tonal contemporary music. This experiment showed that there is a significant percentage of similarities between the subdivisions made by the group for which the pieces were contextualized as improvisations and those made by the group for which the pieces were contextualized as compositions. Also, listeners tend to evaluate the lengths of the sequences in a similar manner, although their duration seems to depend on the number of instruments in the piece. Further research will tell us about the qualities of these sequences and how they interact, both between similar pieces and different ones. However, knowing that these similarities and dependencies exist can lead us to better comprehend not only the perception of structure in collective free improvisation but also the creation of a real-time structure in the practice.


This article was copyedited by Annaliese Micallef Grimaud and layout edited by Jonathan Tang.


  1. Correspondence can be addressed to: Arthur Faraco, University of São Paulo, Av. Prof. Lúcio Martins Rodrigues, 443 – São Paulo, São Paulo, Brazil. E-mail: arthurfaraco@usp.br
  2. In the original: "Y a-t-il quelque chose de spécifique à l'appréciation esthétique des musiques librement improvisées ? Ou, […], la relation expérientielle qui se crée entre l'auditeur et l'objet de son écoute varie-t-elle en fonction de la manière dont la musique perçue est produite, […] ?" (Canonne, 2013, p. 331).
  3. In the original: "A narrativa ficcional lança o acontecimento para o espaço da especulação. Para isso, é necessário que o 'leitor' desta narrativa se desprenda da referência racional que exige provas externas e universais e imerja na particularidade própria do mundo ficcional, estabelecendo assim um pacto com a obra. Uma narrativa que pretenda descrever um acontecimento no seu instante presente, por sua vez, deveria ser formulada pelo embate constante do refazer e reorganizar que se sucederia do encontro entre a narrativa racional e a ficcional. Construir uma narrativa que pretenda apresentar o acontecimento presente, como a improvisação, necessita que primeiramente seja feito um relato do entendimento da relação da poética com a observação, e por conseguinte das idiossincrasias de cada fazer artístico" (Falleiros, 2012, p. 74).
  4. The application is available at the following link: https://improv-analysis.vercel.app/. This application was developed for our ongoing PhD research. Currently, it consists of a slider bar with multiple values, used to analyze the quality of a musical gesture within a free improvisation in the ranges of "maintenance" and "change", as in Goupil et al. The application was adapted for the experiment described in this paper.
  5. The data was analyzed using Python 3, with the statsmodels, scipy, numpy, matplotlib and seaborn packages. We also utilized R and the Rcommander package for some of the inferences.
  6. It is interesting to see also how the distribution changes when we do not balance the data, as in the following graph in Figure 8:

    Figure 8. Distribution of non-normalized durations of sequences (in seconds) as a function of the instrumentation of the piece.

    We performed a Kruskal-Wallis test, using the chi-squared distribution, to see the difference from the balanced data: when the data is not balanced, there is no significant difference between the mean ranks of any pair (χ2 (4) = 2.75, p = .601). However, as mentioned above, the differences in the lengths of the pieces can affect the data, given that the complexity of an improvisation tends to increase with the number of instruments (as seen in Goupil et al., 2020). Thus, we could assume that there would be more discrepancy in the durations of sequences when the group is larger. Also, within larger groups it is also normal to assume that the lengths of the pieces are greater, given that there is the possibility of multiple ideas emerging at any moment.
  7. For duos, p = 0.97; for quartets, p = 1; for quintets, in the Kruskal-Wallis test, p = 1; for sextets, also in the Kruskal-Wallis test, p = 0.14.


  • Aydogan, G., Flaig, N., Ravi, S., Large, E., McClure, S., & Margulis, E. (2018). Overcoming bias: Cognitive control reduces susceptibility to framing effects in evaluating musical performance. Scientific Reports, 8(1), 1-9. https://doi.org/10.1038/s41598-018-24528-3
  • Anglada-Tort, M. (2018). Commentary on Canonne (2018): Listening to Improvisation. Empirical Musicology Review, 13(1-2), 16-20. https://doi.org/10.18061/emr.v13i1-2.6387
  • Bas, J. (1947). Tratado de La Forma Musical. Buenos Aires, BA: Ricordi.
  • Borgo, D. (2006). Sync or Swarm: Musical Improvisation and the Complex Dynamics of Group Creativity. In Futatsugi, K., Jouannaud, J., Meseguer, J. (Eds.), Algebra, Meaning, and Computation, (pp. 1-24). Springer Berlin Heidelberg. https://doi.org/10.1007/11780274_1
  • Borgo, D. (2005). Sync or Swarm: Improvising music in a complex age. Oxford: Oxford University Press.
  • Canonne, C. (2018). Listening to Improvisation. Empirical Musicology Review, 13(1-2), 2-15. https://doi.org/10.18061/emr.v13i1-2.6118
  • Canonne, C. (2013). L'appréciation esthétique de l'improvisation. Aisthesis, 6(3), 331-356. https://doi.org/10.13128/Aisthesis-14109
  • Canonne, C., & Garnier, N. (2015). Individual Decisions and Perceived Form in Collective Free Improvisation. Journal of New Music Research, 44(2), 145-167. https://doi.org/10.1080/09298215.2015.1061564
  • Clarke, E. (2001). Generative principles in music performance. In J. Sloboda (Ed.), Generative Processes in Music: The Psychology of Performance, Improvisation and Composition (pp. 1-26). Oxford University Press. https://doi.org/10.1093/acprof:oso/9780198508465.003.0001
  • Costa, R., & Schaub, S. (2013). Expanding the concepts of knowledge base and referent in the context of collective free improvisation. Anais do XXIII Congresso da ANPPOM, 23(1), 1-8.
  • Costa, R. (2017). Processos de consistência e contextos na improvisação livre: aproximações preliminares. Orfeu, 1(2), 6-20. https://doi.org/10.5965/2525530401022016006
  • Dean, R. (1992). New structures in jazz and improvised music since 1960. Philadelphia, PA: Open University Press.
  • Duerksen, G. (1972). Some effects of expectation on evaluation of recorded musical performance. Journal of Research in Music Education, 20(2), 268-272. https://doi.org/10.2307/3344093
  • Falleiros, M. (2012). Palavras sem Discurso: Estratégias Criativas na Livre Improvisação. [PhD Thesis, Universidade de São Paulo]. Biblioteca Digital USP. https://www.teses.usp.br/teses/disponiveis/27/27158/tde-08032013-140658/pt-br.php
  • Goupil, L., Saint-Germier, P., Rouvier, G., Schwarz, D., & Canonne, C. (2020). Musical coordination in a large group without plans nor leaders. Scientific Reports, 10, 20377. https://doi.org/10.1038/s41598-020-77263-z
  • Jost, E. (1994). Free Jazz. Viena: Da Capo Press.
  • Kirk, U., Skov, M., Hulme, O., Christensen, M., & Zeki, S. (2009). Modulation of aesthetic value by semantic context: an fMRI study. Neuroimage, 44(3), 1125-1132. https://doi.org/10.1016/j.neuroimage.2008.10.009
  • Kroger, C. & Margulis, E. (2016). "But they told me it was professional": Extrinsic factors in the evaluation of musical performance. Psychology of Music, 45(1), 49-64. https://doi.org/10.1177/0305735616642543
  • Lehmann, A. & Kopiez, R. (2010). The difficulty of discerning between composed and improvised music. Musicae Scientiae, 14(2), 113-129. https://doi.org/10.1177/10298649100140S208
  • Pressing, J. (1984). Cognitive processes in improvisation. In W. Crozier & A. Chapman (Eds.), Cognitive processes in the perception of art (pp. 345-363). Elsevier. https://doi.org/10.1016/S0166-4115(08)62358-4
  • Pressing, J. (2001). Improvisation: Methods and Models. In J. Sloboda (Ed.), Generative processes in music (pp. 129-178). Clarendon. https://doi.org/10.1093/acprof:oso/9780198508465.003.0007
  • Sawyer, K. (2003). Group Creativity: Music, theater, collaboration. New York: Routledge.
  • Schoenberg, A. (1970). Fundamentals of Musical Composition. London: Faber and Faber.
  • Wilson, G., & MacDonald, R. (2016). Musical choices during group free improvisation: A qualitative psychological investigation. Psychology of Music, 44(5), 1029-1043. https://doi.org/10.1177/0305735615606527