Overall, this manuscript presents an interesting experiment and provides datasets that can inform future research on how specific performance techniques might be linked to the composer's intention regarding the emotional content of string music. The consistent finding of null results is quite surprising, and it is refreshing to see that the authors approached these results with such honesty. I do not see any flaw in the empirical methods employed, and I consider these null results interesting and potentially stimulating for future investigations. Ideally, we as a field would publish more studies like this, to counter the all-too-common "file-drawer effect".
Because the sample is small in terms of the number of pieces, potential confounders could play a strong role purely by chance. It would therefore be good to control for, and further examine, the influence of instrument, composer, and piece. These results could be depicted in lattice graphs stratified by piece or matched pair, composer, and instrument (and future studies might control for these factors through a stratified chi-square test, such as the Cochran-Mantel-Haenszel test).
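For the stratified analysis suggested above, a minimal sketch of the Cochran-Mantel-Haenszel test follows. It assumes each stratum (e.g., one instrument or one composer) can be reduced to a 2x2 table of intended emotion by technique used versus not used; that coding, the function name `cmh_test`, and the counts in the usage line are hypothetical and are not taken from the manuscript. The `statsmodels` package also provides this test through its `StratifiedTable` class.

```python
import math
from typing import Sequence, Tuple

# One stratum as ((a, b), (c, d)); hypothetical coding:
# rows = intended emotion (e.g. sad vs. happy), columns = technique used vs. not used.
Table2x2 = Tuple[Tuple[int, int], Tuple[int, int]]


def cmh_test(strata: Sequence[Table2x2], continuity: bool = True) -> Tuple[float, float]:
    """Cochran-Mantel-Haenszel test of conditional independence across strata.

    Returns (chi-square statistic with 1 df, p-value).
    """
    diff_sum = 0.0  # sum over strata of (observed a - expected a)
    var_sum = 0.0   # sum over strata of the hypergeometric variance of a
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum with fewer than two observations carries no information
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        diff_sum += a - row1 * col1 / n
        var_sum += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    if var_sum == 0:
        raise ValueError("all strata are degenerate; the test is undefined")
    numerator = abs(diff_sum)
    if continuity:
        numerator = max(0.0, numerator - 0.5)  # Yates-style continuity correction
    stat = numerator ** 2 / var_sum
    # Survival function of chi-square with 1 df: P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts, one table per instrument (not data from the manuscript).
violin = ((10, 5), (5, 10))
cello = ((8, 4), (4, 8))
print(cmh_test([violin, cello], continuity=False))
```

Pooling the per-stratum deviations before squaring is what lets the test detect an association that is consistent across instruments or composers while preventing differences between strata from masquerading as an effect.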
Even though the observed results run counter to the hypothesis, it would be good to gauge the size of the empirical effect by reporting an effect size, together with a measure of the strength of evidence such as a likelihood ratio or a Bayes factor. This is particularly important because all observed data go against the hypothesis, and the reader may wonder how strong the empirical evidence is for the opposite hypothesis, or whether the observed differences all fall within the margin of chance. This should be done for all subsequent empirical results (where significance testing was omitted). In addition, I wonder whether one should also control for pitch range or pitch centroid, in line with the hypothesis that sad music would sound in a lower register.
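As one way to quantify the evidence, a minimal sketch under stated assumptions: the results are summarised as k out of n matched pairs in which the difference goes in the hypothesised direction, with H0: p = 0.5. The uniform prior on p, the order-restricted "opposite direction" hypothesis (p < 0.5), and the function name `binomial_evidence` are illustrative choices, not the authors' analysis; other priors would give different Bayes factors.

```python
import math


def _log_kernel(k: int, n: int, p: float) -> float:
    """log of p**k * (1 - p)**(n - k), using the convention 0 * log(0) = 0."""
    out = 0.0
    if k > 0:
        out += k * math.log(p)
    if n - k > 0:
        out += (n - k) * math.log(1 - p)
    return out


def binomial_evidence(k: int, n: int) -> dict:
    """Likelihood ratio and Bayes factors for k 'hypothesis-consistent' pairs out of n."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    # H0: p = 0.5. The binomial coefficient cancels in every ratio, so it is omitted.
    log_m0 = n * math.log(0.5)
    # H1: p ~ Uniform(0, 1); marginal likelihood is the beta function B(k + 1, n - k + 1).
    log_m1 = math.lgamma(k + 1) + math.lgamma(n - k + 1) - math.lgamma(n + 2)
    bf10 = math.exp(log_m1 - log_m0)
    # Posterior mass below 0.5 under H1 is I_0.5(k + 1, n - k + 1); for integer
    # arguments this equals P(Binomial(n + 1, 0.5) >= k + 1).
    post_below = sum(math.comb(n + 1, j) for j in range(k + 1, n + 2)) / 2 ** (n + 1)
    # H-: p ~ Uniform(0, 0.5), i.e. the effect runs opposite to the hypothesis.
    # Its Bayes factor against H0 rescales BF10 by posterior / prior mass below 0.5.
    bf_opposite_vs_null = bf10 * post_below / 0.5
    # Maximum-likelihood ratio: best-fitting p = k / n against p = 0.5.
    p_hat = k / n
    lr = math.exp(_log_kernel(k, n, p_hat) - log_m0)
    return {"p_hat": p_hat, "bf10": bf10,
            "bf_opposite_vs_null": bf_opposite_vs_null, "lr": lr}


# Hypothetical example: only 2 of 10 pairs differ in the hypothesised direction.
print(binomial_evidence(2, 10))
```

Reporting both numbers is deliberate: the likelihood ratio is the most favourable case for any alternative, whereas the Bayes factor averages over the prior and therefore indicates whether the data actually favour the opposite direction or remain within the range expected by chance.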