How To Do A Proper Listening Test Part 1 Article By Ethan Winer

Home \| High-End Audio Reviews \| Audiophile Shows \| Partner Mags \| Hi-Fi / Music News
	High-Performance Audio Reviews Music News, Show Reports, And More! 30 Years Of Service To Music Lovers

April 2019

How To Do A Proper Listening Test
Part 1
Article By Ethan Winer

There are two ways to assess the quality of audio devices: measuring and listening. Measuring is usually the better choice because the results are absolute, and repeatable because they avoid the vagaries of human hearing perception. But when measuring isn't practical or possible, a listening test using a music source is perfectly fine. For example, listening is needed to compare CD quality at a 44.1 kHz sample rate to "high definition" audio at 96 kHz. Both will measure the same if the frequency response is limited to the audible range, but some people believe they sound different. Another example is when comparing MP3 bit-rates, especially higher values such as 256 versus 320 kbps. It's pretty much impossible to "measure" the effect of lossy compression using traditional means because the frequency response changes from moment to moment.

Listening tests are also useful for comparing loudspeakers because there are so many variables such as off-axis response, dB per octave low frequency roll-off slope, distortion that varies continuously with volume level, and separate distortion amounts for the woofer and tweeter. These can be measured perfectly well in a million dollar anechoic chamber, but not so well at home. And it's probably impossible to measure the subjective effect of devices that add distortion or other color intentionally. Does an Aphex Aural Exciter sound better than a BBE Sonic Maximizer? Do plug-in versions of vintage compressors sound the same (or at least as good) as the original hardware? Does a tape-sim plug-in really sound like tape? Only you can decide what sounds "better" to you, though you might be fooled, or at least biased, by various factors such as knowing whether you're hearing a real guitar amp or a plug-in simulation. So verifying your own perception is another use for a listening test.

Performing a proper listening test is a lot more complicated than many people realize, and several conditions must be satisfied for the results to be valid. When the perceived differences are subtle — and even when they're not so subtle — you can't just play one thing, then another, and proclaim a winner. The differences between modern high fidelity audio devices are usually very subtle, at least when operated at normal levels to avoid distortion or noise. Listening tests have shown repeatedly that, more often than not, people are unable to tell one competent audio device from another no matter how much they differ in price.

Testing Rules
First, and perhaps most important, a listening test must be blind. If the listener can see which device or source is playing, that will influence their opinion. Nobody is immune from sighted bias, and no reasonable person should object to being tested blind. If you're so certain that you can tell HD audio from CD quality, then you should be able to do that when you can't see which is playing. Just as important, the audio sources being compared must be the same musical passage or sound sequence. You can't compare devices by playing different parts of a song because the source sound itself changes! That makes it impossible to separate the source changes from any A/B device differences.

So it's not valid to start a piece of music playing, then switch between A and B as the music continues. You have to play a section of music through one device, then switch devices and play the same section again. Further, you can't just do this once because the listener has a 50-50 chance of being correct just by guessing. Therefore, at least five or six tests are needed, or even more, to be certain the listener really can hear a difference reliably. I'll address the number of tests needed below.

You can't compare different performances either. A common mistake is comparing microphones or preamps by recording someone singing or playing a guitar with one device, then switching to the other device and performing again. The same subtle details we listen for when comparing gear also differ between performances — for example, a bell-like attack of a guitar note or a certain sparkling sheen on a brushed cymbal. Nobody can play or sing exactly the same way twice, or remain perfectly stationary. So that's not a valid way to test microphones, preamps, or anything else. Even if you could sing or play the same, a change in microphone position of even half an inch is enough to make a real difference in the frequency spectrum captured by the microphone.

The A and B volume levels must also be matched to within 0.1 dB, or as close to that as possible. Very small level differences often don't sound louder or softer, but just slightly different. Larger volume differences have a substantial affect on tonality due to Fletcher-Munson, where both low and high frequencies become more prominent at louder volumes. You can match volume levels through electronic devices, such as preamps and equalizers, using a 1 kHz sine wave. You also need a decent voltmeter, though a recorder with a large analog VU meter is a reasonable second choice. There is a 1 kHz standard for audio testing because it's in the middle of the audio range to avoid the response errors many devices exhibit at the low and high frequency extremes.

However, when comparing acoustic sounds in the air, a better source for calibrating loudspeaker or microphone levels is pink noise that's been band-limited to contain only midrange frequencies. Acoustic waves in a room create numerous peak and null locations only inches apart. The advantage of noise is that it contains multiple frequencies, so if your microphone is in a null at 1,000 Hz it's probably not in a null at 900 Hz or 1,100 Hz. And as with sine waves, low and high frequencies are best avoided where speakers and microphones are farther from flat. A link to a band-limited pink noise Wave file is at the end of this article. I created this file from normal pink noise, then filtered out frequencies below 500 Hz and above 2 kHz at 18 dB per octave. You can set this file to loop in your favorite audio program and it will continue to play seamlessly.

Blind Testing
There are two types of blind tests: single-blind and double-blind. With a single-blind test, one person (the tester) handles playing the music and switching the media or devices without letting the listener (the subject) know which is which. So the tester knows but not the subject. The problem with single-blind tests is it's possible for the tester to give a clue to the subject without meaning to or even realizing it. In a double-blind test even the tester doesn't know which is which. So with wine tasting tests or drug trials, for example, the wine or medicine is labeled only "A" or "B" and the tester merely keeps track of each subject's choices. Those choices are then given to someone else to tabulate, and only that person knows the identity of A and B.

A double-blind test is difficult to set up and implement, and most people won't go to that much effort. I believe a single blind test can be sufficient, especially for informal tests between friends that won't be published in a peer-reviewed journal! But there are still important ground rules: The person testing must not look at the test subject, and it's better if they can't even see each other. When I test people blind in my home studio, I'm at my computer while they sit or stand behind me. So they can't see my face, and my body blocks the screen and my hands on the keyboard.

Most audio tests compare only two things at a time, such as a Wave file versus an MP3 copy, or D/A converters with music playing through one then the other. So the tester plays example A and asks the subject which version she thinks is playing. Then you can switch to example B and ask again. It's okay, and even suggested, to occasionally play the same source twice or even three times in a row. Most subjects won't expect that, so that further confirms whether they're really hearing a difference or just guessing. Then the same test is repeated enough times to be sure the subject didn't succeed once or twice by chance. Of course, the tester must keep clear notes of which version is played every time. It's probably simpler to establish a sequence of A/B switches in advance, then you won't have to keep stopping to note them all during the test. A typical sequence might be A, B, B, A, B, A, A, A, B, etc. Then during the test you can write the subject's choices underneath each letter.

Note that being consistently wrong is as significant as being right. If you listen blind to compare media players costing $30 versus $1,000 and you pick the cheaper player every time, then you really did hear a difference. So being correct only one time in ten is the same as being correct nine times out of ten. You just thought the cheap player sounded better. And maybe it really was better.

Finally, some people believe that blind tests put the listener on the spot, making it more difficult to hear differences that really do exist. Blind testing is the gold standard for all branches of science, especially when evaluating perception that is subjective. Blind tests are used to assess the effectiveness of pain medications, as well as for food and whiskey tasting comparisons. So there's no reason it can't work just as well with audio. Again, if someone is so certain of their hearing, then they should be able to do it blind. It's true that some types of degradation become more obvious with experience. Both MP3 type lossy compression, and dynamic noise reduction, add a slight hollow sound that becomes more obvious with experience and repeated listening. But learning to notice subtle artifacts is not the same as being "stressed" by having to listen blind. It's reasonable to assume that anyone serious enough about audio fidelity to take a listening test already knows what to listen for.

Correct By Chance
Nobody will miss a 10 dB midrange boost on a typical music track, but some people claim to be able to hear very subtle differences that others might miss. In that case you need enough trials to be sure they didn't guess correctly by chance. If you play A then B only once and the subject is correct, there's a 50 percent chance they were right just by guessing. If they're correct two times out of two, it's still 25% likely they just guessed correctly. This simple formula shows how many trial comparisons must all be identified correctly in order to know with X Percent certainty that someone didn't just guess correctly by chance:

Percent chance you guessed correctly = (0.5 ^ ^{Number of
Trials}) x 100

Therefore:

Number of trials	Chance you merely guessed correctly.
1	50%
2	25%
3	12.5%
4	6.3%
5	3.1%
6	1.6%
7	0.8%
8	0.4%

If someone insists they can hear some soft artifact or other anomaly, it's not unfair to expect them to be correct every single time. If a difference is so obvious that "even my maid can hear it," as one well known mixing engineer once claimed to me, then they ought to be able to choose correctly every time, no matter how many trials there are.

However, if you test a large enough group of people, it's likely some of them will be correct by chance alone. For example, if you give a test having five trials (3% reliability) to one hundred people, three of them could be correct even if everyone just guessed at random! The solution is to ask those same people to do five trials a second time. If they're correct all five times again, then the odds of having guessed correctly by chance are only 3 percent of 3 percent = 0.09 percent. And in that case they likely do really hear a difference.

Note that the number of trials needed to ensure someone is not correct merely by chance should not be confused with confidence interval in statistics. That's not the correct term because confidence is related to testing the general population. If you test 1,000 people to see how many can hear the difference between two types of dither, that gives you a good sense of whether most people can hear a difference. But for audio tests, it doesn't matter what "most people" can or cannot hear. All that matters is what you or the test subject can identify reliably.

These are the band-limited pink noise and 1 kHz sine wave files on the author's web site:

EthanWiner.com/bl-noise.wav

EthanWiner.com/1k-sine.wav

Below link is the Sonalksis Free G freeware volume control plug-in:

www.Sonalksis.com/freeg.html

About The Author
Ethan Winer has been an audio engineer and professional musician for more than 45 years. His Cello Rondo music video has received nearly two million views on YouTube and other web sites, and his book The Audio Expert published by Focal Press, now in its second edition, is available at Amazon.com and Ethan's own web site EthanWiner.com. Ethan is also a principal at RealTraps, a USA manufacturer of high quality acoustic treatment.

See Part 2 Of This Article

Continue reading Part 2 of this article at this link.

Premium Audio Review Magazine
High-End Audiophile Equipment Reviews

Equipment Review Archives
Turntables, Cartridges, Etc
Digital Source
Do It Yourself (DIY)
Preamplifiers
Amplifiers
Cables, Wires, Etc
Loudspeakers/ Monitors
Headphones, IEMs, Tweaks, Etc
Superior Audio Gear Reviews

Show Reports
HIGH END Munich 2025
Lone Star Audio Fest 2025
AXPONA 2025 Show Report
Montreal Audiofest 2025 Show
Southwest Audio Fest 2025
Florida Intl. Audio Expo 2025
Capital Audiofest 2024
Toronto Audiofest 2024
UK Audio Show 2024
Pacific Audio Fest 2024
...More Show Reports

Videos
Our Featured Videos

Industry & Music News
High-End Audio & Music News

Partner Print Magazines
audioXpress
hi-fi+ Magazine
Sound Practices
VALVE Magazine

For The Press & Industry
About Us
Press Releases
Official Site Graphics

Home | High-End Audio Reviews | Audiophile Show Reports | Hi-Fi / Music News | About Us | Contact Us

All contents copyright^©1995 - 2025 Enjoy the Music.com^®
May not be copied or reproduced without permission. All rights reserved.