July 2016
1. Introduction
By analysing the data from multiple studies, it should be possible to reach more definitive conclusions about the perception of high resolution audio. For instance, several tests used similar methodologies, so it may be possible to pool their data. In other cases, data is provided on a per-subject level, which could allow re-analysis. Here, we provide a meta-analysis of those studies. Note that this is far more than a literature review, since it compiles data from multiple studies, performs statistical analyses on the aggregate data, and draws new conclusions from the results of this analysis. Meta-analysis is a popular technique in medical research and has been applied to the evaluation of music information retrieval techniques [1-3]. The term has also been applied to primary analysis of the performance of audio feature extraction techniques within a general framework [4]. However, to the best of our knowledge, this is the first time it has been applied to audio engineering research.
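To make the idea of pooling concrete, the sketch below assumes the simplest case: several two-alternative forced-choice studies with a 50% chance rate, whose correct/total counts are summed and then tested with a one-sided exact binomial test. The counts are hypothetical, and the function names (`binomial_p_value`, `pool_studies`) are illustrative, not taken from any study discussed here.

```python
from math import comb


def binomial_p_value(correct: int, trials: int, p_chance: float = 0.5) -> float:
    """One-sided exact binomial test: probability of at least `correct`
    successes in `trials` trials if listeners are only guessing."""
    return sum(
        comb(trials, k) * p_chance**k * (1 - p_chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


def pool_studies(studies: list[tuple[int, int]]) -> tuple[int, int]:
    """Sum correct responses and trials across studies with the same chance rate."""
    correct = sum(c for c, _ in studies)
    trials = sum(n for _, n in studies)
    return correct, trials


# Hypothetical (correct, total) counts from three two-alternative forced-choice studies.
studies = [(58, 100), (27, 50), (66, 120)]

for c, n in studies:
    print(f"study: {c}/{n}, p = {binomial_p_value(c, n):.3f}")

correct, trials = pool_studies(studies)
print(f"pooled: {correct}/{trials}, p = {binomial_p_value(correct, trials):.3f}")
```

In this hypothetical example no single study reaches p < 0.05 on its own, while the pooled count does. Simple pooling of this kind assumes the studies share a design and chance rate and ignores between-study differences; random-effects meta-analysis models are one common way of accounting for such heterogeneity.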
1.1 Reviews
[13] provided a systematic review of studies concerning the health effects of exposure to ultrasound. The reviewed studies showed that such exposure may be associated with hearing loss, dizziness, loss of productivity and other harmful effects. However, some of the reviewed studies defined ultrasound as content above 10 kHz, thus including frequencies known to be audible. Moreover, all studies discussed in [13] focused on prolonged exposure, especially in the work environment.
1.2 Identification and selection of high resolution audio studies
The review papers mentioned in the previous section may be considered the starting point for this work. We searched through all references they cited, and all papers that have cited any of them, in order to identify relevant experiments. For each identified paper concerning the perception of high resolution audio, we then repeated the procedure, searching all of its references and all papers citing it. This procedure was repeated until no new potentially relevant references could be found. Further potentially relevant experiments were found through discussions with experts, keyword searches in databases and search engines, and the author's prior knowledge. The same iterative search over the references and citations of these additional papers was then applied. In total, 80 relevant references were found, of which 51 papers described perceptual studies of high resolution audio. No experiments published before 1980 were considered. An examination of potentially relevant earlier references showed that they mainly assumed that content beyond 20 kHz would be unnecessary, and their authors may not have had equipment of sufficiently high quality to reproduce high resolution audio anyway [14-21]. Several potentially relevant references could not be obtained. These were all non-English language publications, often presentations at meetings, and so may never have been formally published. In all cases, however, the authors also had English language publications, and these appeared to describe the same experiments. There may also be relevant experiments that were overlooked because they used an unusual methodology, were described in an unusual way, or were presented to a very different audience. This is most likely the case for works published in physics or neuroscience journals.
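The iterative search described above is essentially a traversal of the citation graph that stops when it reaches a fixed point. A minimal sketch follows, assuming the reference and citation lists are available as dictionaries and that relevance is decided by a supplied predicate; the graph, the paper identifiers and the `snowball_search` name are all hypothetical.

```python
from collections import deque


def snowball_search(seeds, references, cited_by, is_relevant):
    """Follow references and citations until no new relevant paper appears.

    seeds       -- starting papers (e.g. review articles), always expanded
    references  -- dict: paper -> papers it cites
    cited_by    -- dict: paper -> papers that cite it
    is_relevant -- predicate deciding whether a paper is a relevant study
    """
    seed_set = set(seeds)
    relevant = set()
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        paper = queue.popleft()
        if is_relevant(paper):
            relevant.add(paper)
        elif paper not in seed_set:
            continue  # irrelevant, non-seed papers are not expanded further
        neighbours = set(references.get(paper, ())) | set(cited_by.get(paper, ()))
        for other in sorted(neighbours - seen):  # sorted for deterministic order
            seen.add(other)
            queue.append(other)
    return relevant


# Hypothetical citation graph; "review" plays the role of a review paper.
references = {"review": ["A", "B"], "A": ["C"], "B": ["E"], "C": ["D"]}
cited_by = {"review": ["F"], "D": ["G"]}
relevant_ids = {"A", "C", "D", "F", "G"}

found = snowball_search(["review"], references, cited_by, relevant_ids.__contains__)
print(sorted(found))
```

Because only relevant papers are expanded, a relevant paper reachable solely through irrelevant ones (here, anything reachable only via `B`) would be missed, which is one reason the citation search was supplemented with expert discussions and keyword searches.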
However, although the relevant experiments we did find in such venues dealt with aspects of high resolution audio, none focused directly on the most fundamental question with which we are concerned, that is, the discrimination between standard quality and beyond-standard quality audio with real-world content. Many publications treated results for different conditions, such as different stimuli or different filters for sample rate conversion, as different experiments. Since these experiments generally have the same participants, investigators and methodology, they were grouped as a single study. Where the experiments involved fundamentally different tasks, as in [22-24], they were treated as different studies. Studies focused on auditory perception resolution were not considered; such studies may suggest the underlying causes of high resolution audio discrimination, if any, but they do not directly address discrimination tasks. Similarly, experiments involving indirect discrimination of high resolution audio were excluded, because an indirect effect may or may not be observed regardless of whether high resolution audio can be directly discriminated. In particular, brain response to high resolution content may not even relate to perception.
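The grouping and exclusion rules above can be expressed as a keyed grouping. The sketch below is a simplification under stated assumptions: it uses the publication identifier as a stand-in for "same participants, same investigators, same methodology", and treats the listening task as the feature that separates studies. All records, identifiers and the `Experiment` type are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Experiment:
    source: str     # hypothetical publication identifier
    task: str       # listening task, e.g. "ABX" or "preference"
    condition: str  # stimulus or filter variant; deliberately not part of the key
    kind: str       # "direct", "resolution" or "indirect"


def group_into_studies(experiments):
    """Merge conditions sharing a publication and task into one study,
    dropping perceptual-resolution and indirect-discrimination experiments."""
    studies = defaultdict(list)
    for exp in experiments:
        if exp.kind != "direct":
            continue  # only direct discrimination experiments are kept
        studies[(exp.source, exp.task)].append(exp)
    return dict(studies)


experiments = [
    Experiment("paper_X", "ABX", "filter_1", "direct"),
    Experiment("paper_X", "ABX", "filter_2", "direct"),
    Experiment("paper_X", "preference", "music_1", "direct"),
    Experiment("paper_Y", "ABX", "music_2", "direct"),
    Experiment("paper_Z", "tone_detection", "tone_25kHz", "resolution"),
]

for key, members in group_into_studies(experiments).items():
    print(key, [e.condition for e in members])
```

Here the two filter conditions of `paper_X` collapse into one study, its preference task forms a second study, and the resolution experiment is excluded.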
Within the studies focused on perceptual discrimination, we identified at least 21 distinct, direct discrimination studies. Three of these [25-27] were excluded because their reporting was insufficient or too unusual to allow their use in the meta-analysis. Figure 1 presents a study flow diagram showing how the studies were selected for meta-analysis.
1.3 Classification of high resolution audio studies
1.3.1 Auditory perception resolution studies
However, the majority of perceptual resolution studies have been concerned with time and frequency resolution. A major concern is the extent to which we hear frequencies above 20 kHz. Though many argue that this would not be the primary cause of high resolution content perception, it is nevertheless an important question. [36, 37, 39, 40] have investigated this extensively, with positive results, although these results could be subjected to further statistical analysis. Temporal fine structure [73] plays an important role in a variety of auditory processes, and temporal resolution studies have suggested that listeners can discriminate monaural timing differences as small as 5 microseconds [31-33]. Such fine temporal resolution also suggests that low-pass or anti-alias filtering may cause significant, perceptible degradation of audio when it is digitized or downsampled [54], an effect often referred to as time smearing [74]. This time smear, which arises from convolution of the signal with the filter's impulse response, has been described variously in terms of the total length of the impulse response including pre-ring and post-ring, the percentage of energy in the sidelobes relative to the main lobe, the degree of pre-ring alone, and the sharpness of the main lobe. [41, 42] both claim that human perception can outperform the uncertainty relation for time and frequency resolution. This was disputed in [75], which showed that the conclusions drawn from those experiments were far too strong.
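To illustrate how some of the time-smear descriptions above can be quantified, the sketch below builds linear-phase low-pass FIR filters and measures their total length, their pre-ring, and the fraction of impulse-response energy outside the main lobe. It is a sketch under assumptions, not a standard measurement procedure: the Kaiser-windowed sinc design, the -60 dB pre-ring threshold, and the definition of the main lobe as the run of same-sign samples around the peak are all choices made here for illustration, and the function names are hypothetical. It uses NumPy.

```python
import numpy as np


def lowpass_fir(cutoff_hz, fs_hz, num_taps, beta=8.0):
    """Linear-phase windowed-sinc low-pass FIR (Kaiser window), unity DC gain."""
    n = np.arange(num_taps) - (num_taps - 1) / 2  # integer offsets when num_taps is odd
    x = 2 * cutoff_hz / fs_hz                      # cutoff as a fraction of Nyquist
    h = x * np.sinc(x * n) * np.kaiser(num_taps, beta)
    return h / h.sum()


def smear_metrics(h, fs_hz, threshold_db=-60.0):
    """Simplified time-smear measures for an impulse response h (assumed definitions)."""
    peak = int(np.argmax(np.abs(h)))
    sign = np.sign(h[peak])
    # Main lobe (assumption): the run of same-sign samples around the peak.
    left, right = peak, peak
    while left > 0 and np.sign(h[left - 1]) == sign:
        left -= 1
    while right < len(h) - 1 and np.sign(h[right + 1]) == sign:
        right += 1
    total = np.sum(h**2)
    sidelobe_fraction = 1.0 - np.sum(h[left:right + 1] ** 2) / total
    # Pre-ring: time from the earliest pre-peak sample above the threshold to the peak.
    floor = np.abs(h[peak]) * 10 ** (threshold_db / 20)
    early = np.nonzero(np.abs(h[:peak]) > floor)[0]
    pre_ring_s = (peak - early[0]) / fs_hz if early.size else 0.0
    return {
        "length_s": len(h) / fs_hz,
        "pre_ring_s": float(pre_ring_s),
        "sidelobe_fraction": float(sidelobe_fraction),
    }


fs = 44_100.0
for taps in (63, 511):
    h = lowpass_fir(20_000.0, fs, taps)
    m = smear_metrics(h, fs)
    print(f"{taps} taps: length {m['length_s'] * 1e6:.0f} us, "
          f"pre-ring {m['pre_ring_s'] * 1e6:.0f} us, "
          f"sidelobe energy {100 * m['sidelobe_fraction']:.1f} %")
```

With the same 44.1 kHz sample rate and 20 kHz cutoff, the longer filter has a sharper transition band but also a longer pre-ring, which is the trade-off the time-smearing discussion is concerned with. Minimum-phase designs, which move the ringing after the peak, are not covered by this sketch.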