Last week, I posted a challenge to listen to two pairs of audio samples, and answer two simple questions about them. If you haven’t tried it yet, you might like to check it out before you read the answers.
Six months ago, I did a similar experiment to see if people could tell the difference between compressed and uncompressed audio. Although the results didn’t suggest that anybody could tell the difference, it had one key flaw: it’s impossible to prove that nobody can do something. Without testing all the people all the time, you might miss somebody, and that somebody might be working for the Philadelphia Inquirer.
This time, I approached the problem from the opposite direction: is it possible for people to think they’re hearing differences that don’t exist, and are these false perceptions vulnerable to suggestion?
The Experiment
The test is fairly simple. In case you hadn’t guessed, all four audio samples are exactly the same.
After hearing two identical samples of music, people are asked if one sounds better than the other, or if they sound the same. This is question one.
Respondents are then told that the next two should sound different. The wording of the question suggests that there’s a correct answer. The third option is changed from “They sound the same” to “I can’t tell”. This is question two.
If people’s perceptions of what they heard were reliable, you’d expect everybody to pick “They sound the same” in the first question and “I can’t tell” in the second. If just one person doesn’t, then it means it’s possible for us to be mistaken about what we hear.
If our hearing is suggestible, you’d expect more people to perceive a difference between the second pair of samples than the first pair.
The Results
I asked 100 people. 56% claimed to hear a difference between the first two identical samples.
When the question was loaded to suggest the samples weren’t the same, that rose to 69%.
On this graph, a “right” answer is that the samples sound the same, and a “wrong” answer is that they’re different.
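For a rough sense of how much sampling noise sits in those two percentages, here is a minimal sketch. It assumes the same 100 respondents answered both questions, so 56% and 69% correspond to 56 and 69 people. The Wilson score interval and the function name `wilson_interval` are choices made for illustration, not part of the original analysis.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

# Assumed counts: 56 and 69 out of 100 (derived from the percentages above).
for label, heard_difference in [("question one", 56), ("question two", 69)]:
    low, high = wilson_interval(heard_difference, 100)
    print(f"{label}: {heard_difference}% (roughly {low:.0%} to {high:.0%})")
```

With 100 respondents, each interval is close to ten percentage points wide on either side and the two overlap, so the exact size of the jump between the questions should be read loosely.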
Conclusions
In both questions, most people thought they heard a difference where there was none. Even more people perceived an imaginary difference when the question was loaded.
That means that you can’t trust what people think they’re hearing, especially when they have preconceptions about what they’re going to hear.
This is important, because if we can’t count on our own ears then we certainly can’t believe anybody’s assertion that they heard a difference between two recordings unless they consistently perceived this difference under blind conditions, where they didn’t know which recording was which.
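How consistent is "consistently"? One way to put a number on it is to ask how likely a listener would be to score that well by guessing alone. The sketch below assumes a simple blind test with independent trials and a fixed chance of guessing right (one in two for an A/B choice); the function name `chance_of_guessing` and the 9-out-of-10 listener are hypothetical.

```python
from math import comb

def chance_of_guessing(correct, trials, p_guess=0.5):
    """Probability of getting at least `correct` answers right by guessing alone."""
    return sum(
        comb(trials, k) * p_guess**k * (1 - p_guess) ** (trials - k)
        for k in range(correct, trials + 1)
    )

# Hypothetical listener: 9 correct out of 10 blind A/B trials.
print(f"{chance_of_guessing(9, 10):.4f}")  # prints 0.0107
```

A single right answer tells us very little, but a listener who keeps getting it right over many blind trials becomes hard to explain away as luck.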
Commentary
Under these circumstances, reviews like this one tell us nothing useful about the quality of recorded sound – there’s a good chance we’re just reading about the preconceptions of the listener. We might as well ask the critic if he is an above-average driver.
It’s worth remembering this next time somebody tries to sell you magic speaker cable. The salesman will encourage you to trust your ears, because he knows they’ll do what they’re told.
It seems that this effect* isn’t limited to recording quality, but also extends to aspects of performance. Guardian columnist Ben Goldacre wrote about a similar study a few days ago. Women were filmed playing the violin while wearing a variety of different outfits. The videos were all dubbed with the same recorded sound, and then shown to an audience of trained musicians who rated them for technical skill and musicality. The viewers thought that girls in slutty outfits played badly, even when the music was exactly the same.
It seems even trained musicians aren’t much good at consistently assessing what they hear. It’s difficult to know if the same applies to critics. Perhaps they’re a breed apart. I guess the only way to find out is with an experiment. If I can think of a way to do it that’s not professionally embarrassing or physically invasive, perhaps that’s what we’ll do next. How the heck I’m going to get volunteers, I don’t know. Free drinks, maybe. That would work on me.
If you’ve investigated the limits of your own aural perception, I’d be fascinated to hear from you.
As always, the comments are here for you. I know it’s the Internet, but let’s at least try to keep it civil.
* which probably already has a name, but which I’m going to call “David Patrick Stearns Disease”. Its opposite – the Alex Ross Effect – is when, despite repeated listening, you don’t notice how genuinely awesome something is until it is pointed out to you by a trusted public figure.
I can think of one key difference between your speaker-cable comparison and this test, and it’s that you already know you are making that comparison with the same CD. You don’t suspect that the CD is being switched out and maybe it’s the same music or maybe it isn’t, or maybe the volume is louder or maybe it’s softer. There are some controls in place that you have, ahem, control over.
In general, though, I have to agree with your point that most people can’t tell the difference between the same thing if they think it will be different.
It’s true that, in the case of the speaker cable, you know what you’re listening for. It seems, though, like that would make you more susceptible to confirmation bias.
You conclude: “Even more people perceived an imaginary difference when the question was loaded.”
Well of course!
This whole thing is really more about test psychology than anything else.
First, no one “claimed to hear a difference.” That implies that people felt strongly about their choice. Because “I strongly believe A sounds better,” “I’m not 100% certain but I think A sounds better,” etc., were not options, we don’t know how confident respondents felt about their decisions. I’m sure most people thought the clips sounded the same, but since they were challenged to detect a difference, they felt pressured to choose one clip over the other rather than concede they could not hear a difference. You effectively baited people to choose one or the other, setting expectations by calling this “The Pepsi Challenge.”
If you had worded the test as: “I’m doing some equalization work on a new recording, would you please tell me which clip sounds better,” I’m willing to bet more people would have said they sound the same because the “challenge” element had been removed – there would have been less pressure to make a choice, less personal pride at stake.
This absolutely is about psychology, but do you think that people taking part in this little experiment are under more or less pressure than, for example, a critic with a reputation to uphold who knows which answer he is expected to give?
I think David Patrick Stearns simply wrote a poor review. As you noted, most of his comments about the sound quality were vague and inarticulate. He’s typically a good writer, so I’m surprised he built his argument on broad generalizations instead of presenting specific places where the sound quality was problematic.
We also shouldn’t discount his main thesis: the Orchestra would be better served releasing SACDs like SFO, LSO, CSO, BSO and ACO. Ok, he never says SACD, but he pretty much hints at it. The Ondine SACDs generally sounded great; great sounding SACDs of Dutoit conducting Philadelphia could have done a lot to repair the Orchestra’s image.
I was directed to this test by someone else. Before looking at it I did the old test. I’m a composer, a sound engineer and an audio DSP programmer with fairly extensive experience of frequency domain processing (which compression techniques have at their core). Although I wouldn’t say the first test was easy, I got it right on a reasonable pair of headphones straight out of my laptop. I had a clear idea of what kind of thing to look for and the artefacts were audible. I didn’t vote on this more recent test, but I did fall for the trick. I didn’t believe there was a difference in the first two and I wasn’t able to decide my final thoughts on the second two, but did think I could hear differences. However, I didn’t identify artefacts that I would readily associate with compression. Make of that what you will…
However, I don’t really think this test tells you anything particularly exciting. Moreover, on this blog as a whole you’ve made several posts that suggest that you don’t believe anyone can tell the difference between lossy compressed audio (at least at higher quality), and uncompressed audio. I don’t believe that to be true. Of course, testing more people (your suggestion in the final conclusion from the previous results) actually wouldn’t help determine this at all, because that test is designed around how many people can correctly identify uncompressed or compressed audio. The minimum sample size needed to prove that *someone* can is one – if they can reliably give the correct answer then you know someone can.
I don’t wish to convince you that lots of people out there can tell the difference (I think they probably can’t). Nor do I disagree with the basic premise here that you can be easily tricked into thinking you can hear differences when there are none (I’m sure there’s ample research out there to prove this). Nor do I believe the quality of compressed audio to be unacceptable for many purposes. However, there are multiple problems with the way you’ve pursued your argument on this blog and some of the testing that you might at least be aware of…
In the first test you gave two alternatives to the uncompressed audio. You in no way accounted for which wrong answer people chose or whether one answer was *better* than the other. If we assumed one of them was utterly transparent (and no-one in the world could tell it from the original), clearly that would be a better answer. It is not possible the way the test was done to determine who chose each of the wrong ones because they couldn’t tell it apart from the original, as opposed to differentiating it from the other compressed version. The AAC at 256kbps was for me harder to differentiate than the mp3 – and really that shouldn’t be surprising, because the technology is superior (even, I believe, in this instance at a reduced bit rate). Viewed this way the original results *might* be seen as showing that a majority of the people taking the test can tell that the mp3 is inferior (assuming one can agree on that). Of course, they might not mean that at all – you’d really need a new test for that…
This first test is also about whether people can *identify* uncompressed audio, not about whether they can perceive a difference. A perceived difference might be incorrectly attributed.
There’s also your assertion in the Stearns post “THEY ARE EXACTLY THE SAME”, which is probably not true (I’m not commenting here on whether what he says has any merit). Saying they are perceptually the same would probably be a lot fairer. A bit by bit comparison would be needed to prove they were exactly the same, which you didn’t apparently do. “You can measure the difference between a lossless file and a compressed one,” you write – yes, you can, but it doesn’t seem that you did that. Instead there are dubious graphics (with no scale or other useful information). FWIW I assumed the upper “silent” subtraction picture is at full amplitude, and calculate its height to be no more than 147 pixels, which means that (assuming it’s linear) you are displaying around 40dB of range – which is nowhere near enough to tell you if the difference is perceivable or if there is actually anything close to silence there. Tiny differences they may be, but in the (roughly) logarithmic world of audio perception, tiny differences may be more important than they look.
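The arithmetic behind that estimate can be sketched as follows. It assumes, as the comment does, a linear amplitude scale where the smallest visible step is one pixel; the function name `display_range_db` is made up for illustration, and the 16-bit figure is included only for comparison.

```python
import math

def display_range_db(levels):
    """Dynamic range, in dB, of a linear amplitude scale with `levels` distinguishable steps."""
    return 20 * math.log10(levels)

# 147 pixels is the plot height quoted in the comment above.
print(f"147-pixel plot: {display_range_db(147):.1f} dB")
# 2**16 steps is the nominal resolution of 16-bit audio.
print(f"16-bit audio:   {display_range_db(2 ** 16):.1f} dB")
```

On those assumptions the plot covers a little over 40 dB, while 16-bit audio spans more than twice that on the decibel scale, so a residual that looks flat in the picture could still sit well above the format's noise floor.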
Likewise, there is the laughable idea in an earlier post that multitrack versions can be considered of higher quality than stereo masters (there are a lot of problems with that idea – this is long enough already) which doubtless improves the look of your graphic. Then, in reference to conversion you state: “no attempt is made to compress the files in a way that conserves the audio quality in a smaller size”. I’m not really sure that statement can be considered accurate, considering that the process of dithering is all about trying to improve the perceived quality of the 16 bit format. Sample rate conversion is another matter.
Whilst some of these issues might be readily apparent to audio professionals and engineers, I suspect many readers here will not fall into that category, and may be easily misled by such arguments. If you are going to berate other people for the sloppiness of their approach and grasp of technology, I think you could do a little better yourself to be accurate, honest and upfront in your approach. There is indeed already enough misinformation, hype and bullshit in the world of audio…
Thanks for taking the time to write all of this. You make some really valuable points, and highlight some things that could be misleading.
I don’t really think this test tells you anything particularly exciting.
I’ll concede that it might not break any new ground, but an experiment should be repeatable. I could just write about existing research, but that would be boring.
On this blog as a whole you’ve made several posts that suggest that you don’t believe anyone can tell the difference between lossy compressed audio (at least at higher quality), and uncompressed audio.
This isn’t what I think. I just think that the impact of compression artifacts is frequently exaggerated.
The minimum sample size needed to prove that *someone* can is one – if they can reliably give the correct answer then you know someone can.
That’s true, if you know which person to test, and if the test were more thorough. It wouldn’t tell you anything very useful about the commercial application of high-quality audio, though. You’d just know there was definitely one customer. Using a larger sample with a less thorough test, you get an idea of how the market might respond to a product, which, at the end of the day, is what I’m interested in.
In the first test you gave two alternatives to the uncompressed audio. You in no way accounted for which wrong answer people chose or whether one answer was *better* than the other. If we assumed one of them was utterly transparent (and no-one in the world could tell it from the original), clearly that would be a better answer. It is not possible the way the test was done to determine who chose each of the wrong ones because they couldn’t tell it apart from the original, as opposed to differentiating it from the other compressed version.
People might mistake the bad one for the good one because it sounds different to the other two? Possibly. In the end, though, an even spread of results suggests that there isn’t a uniform perception that one of these formats is of superior quality.
This first test is also about whether people can *identify* uncompressed audio, not about whether they can perceive a difference. A perceived difference might be incorrectly attributed.
Ok, but if people think AAC sounds best, I don’t see why we shouldn’t sell them AAC.
There’s also your assertion in the Stearns post “THEY ARE EXACTLY THE SAME”, which is probably not true (I’m not commenting here on whether what he says has any merit). Saying they are perceptually the same would probably be a lot fairer.
I also used a recording of a different piece made by different artists. It’s an illustration that this is measurable, and there’s a scale at which the two are identical. It’s really up to Stearns to show us the scale at which they’re not. Instead, he uses “fog” and “mud”.
There is the laughable idea in an earlier post that multitrack versions can be considered of higher quality than stereo masters
I can see how this could be misleading. I thought carefully before including the multitrack version, but data is data and session files are mighty big. It might not be useful to you to be able to solo the percussion section or add more brass, but there are folks that would like that feature. I know I would. I’d value that data more than extra resolution that I can’t hear.
Then, in reference to conversion you state: “no attempt is made to compress the files in a way that conserves the audio quality in a smaller size”. I’m not really sure that statement can be considered accurate, considering that the process of dithering is all about trying to improve the perceived quality of the 16 bit format.
That’s fair. You could consider dithering as an attempt to conserve audio quality in a file of smaller size.
Thanks for your reply…
“This isn’t what I think. I just think that the impact of compression artifacts is frequently exaggerated.”
Good to have that clarified – no disagreement there – a 320kbps file is in no way comparable to the audio on a youtube clip…
I’m not involved in marketing. My interest is more in *extra*ordinary listeners. I’m happy to own compressed music, listen to it, and even sell my own music that way. I think you are right about most listeners, although I’m not sure your first test was that clear in what it could tell you:
“People might mistake the bad one for the good one because it sounds different to the other two? Possibly. In the end, though, an even spread of results suggests that there isn’t a uniform perception that one of these formats is of superior quality.”
That wasn’t what I meant, although it’s another reading. What I meant is that you seem to be assuming that everyone has a strong or clear preference. I am arguing that if two files are indistinguishable to the listener (but the third lower quality one sounds worse to them), that they are then likely to guess or “imagine” which of the better two sounds “best”. This would demonstrate some ability to hear compression, but only when the difference is perceptually clearer. However, because of the way you did the test you don’t see if they had that ability to distinguish this or not, because it goes three ways…
I’d be really interested to know if one on one tests (WAV vs mp3, WAV vs AAC or AAC vs mp3) produced different results. Maybe they wouldn’t, but because the original results showed fewer people going for mp3 (which I thought was perceptually lowest quality), I wonder if this reflects some higher level of critical listening ability amongst your readers, or is simply random.
I totally agree “fog” and “mud” are not helpful terms, and imply a very obvious difference in quality which I doubt is there. I guess I’m just saying that you could have given some values (peak or rms of the residual) to counter his assertions. Graphics without scale are somewhat meaningless – it looks like you’ve ‘measured’ something when really you haven’t.
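As a rough illustration of the kind of figures meant here, the sketch below computes the peak and RMS level of a residual (one signal subtracted from another, sample by sample). It is not a measurement of any real recording: the two signals are synthetic stand-ins, samples are assumed to be floats with full scale at 1.0, and the function name `residual_levels_db` is hypothetical.

```python
import math

def residual_levels_db(original, decoded):
    """Return (peak, RMS) of the sample-by-sample difference, in dB relative to full scale (1.0)."""
    residual = [a - b for a, b in zip(original, decoded)]
    peak = max(abs(r) for r in residual)
    rms = math.sqrt(sum(r * r for r in residual) / len(residual))

    def to_db(level):
        # A residual of exactly zero has no finite level in dB.
        return 20 * math.log10(level) if level > 0 else float("-inf")

    return to_db(peak), to_db(rms)

# Synthetic stand-in data: a 440 Hz tone and a copy with a small added error.
rate = 44100
tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
altered = [s + 0.001 * math.sin(2 * math.pi * 3000 * n / rate) for n, s in enumerate(tone)]

peak_db, rms_db = residual_levels_db(tone, altered)
print(f"residual peak {peak_db:.1f} dBFS, RMS {rms_db:.1f} dBFS")
```

Two numbers like these, quoted alongside a graphic, would give readers a scale to judge it against.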
In terms of multitracks I’d argue that most people wouldn’t know what to do with them (actually I’d *hate* to have to mix my own music collection – that would severely impact on my enjoyment – mixing can be incredibly frustrating and difficult, especially in multi-miked scenarios), but the argument you seemed to put forward in the post was about raw quality, rather than anything else. Yes, some people would like that, and there’s a fair argument for it. I agree that could be of more musical value than resolution one can’t hear, but that’s somewhat different from what you implied…
“The viewers thought that girls in slutty outfits played badly, even when the music was exactly the same.”
In a single sentence, why we have to have blind auditions for orchestra jobs – so that a player’s appearance/sex/race/skin tone/makeup does not affect judgment about how the person plays.
Who cares? As long as they’re slutty.
Looks like I got it right both times. So here’s what has me puzzled. Most people are satisfied with mp3, but some people either can tell the difference, or at least think they can. Why not offer them what they want to buy, at a higher price, along with the cheaper and more popular format? Wouldn’t it be more useful to sell them what they want, instead of trying to convince them that they shouldn’t want it?
Sure. I’m not suggesting anybody should be forced to purchase an inferior product. There are plenty of places folks can get lossless audio if they want it. The unimpressive commercial performance of these enterprises does suggest that the market is small, though, and that’s probably why larger digital music retailers don’t offer lossless audio – it just doesn’t make economic sense.
Here’s why I don’t think this is pointless:
1) There are many people who have heard that compressed audio sounds bad. When they listen, expecting it to sound bad, that’s what they hear. Given the chance to take a blind test, many people discover they’re not missing anything when they take advantage of the added convenience of digital downloads. That means more people enjoying more great music, which I think is a good thing.
2) It is a common belief among marketing folks at classical labels that compression is the reason for their poor digital sales, when in truth they simply aren’t marketing effectively to their audience online. It’s a feeble excuse, but it’s one that allows people to get away with neglecting their business, not letting people know about the records they want. In the long run, that’s a bad thing.
On the other hand, the fact that people THINK that digital downloads are so audibly crappier IS a marketing problem. I am certain that there are many, many classical customers scared away from downloading by the poor reputation mp3s have gotten from commentators like David Patrick Stearns or Anthony Tommasini.
I can think of two ways that a company selling digital downloads might attempt to deal with this perception: one is by launching a marketing campaign that tells classical music snobs that their ears aren’t nearly as discriminating as they imagine—good luck with that—and the other is to launch, yknow, ITUNES SELECT™, now offering CD-Quality Downloads for the Élite Listener at only a slightly higher cost.
I agree that “Higher-Quality Downloads” is not in and of itself a business model for an emerging iTunes competitor, but as a marketing strategy for one of the existing download giants, it might pick up a lot of listeners who would otherwise buy a CD or nothing.
At least some of the results may be partly explainable by what listeners expect, not of the recorded samples, but of their own hearing. I got both answers “right,” but I’m 68, and my last test showed moderate hearing loss (particularly, as is typical, at higher frequencies), so I’ve reconciled myself to the fact that subtle variations in file format (or speakers or cable) will be lost on me. It’s OK with me–I was always more interested in the music than the quality of the reproduction. So when I couldn’t tell A from B or C from D I just put it down to aging. (For what it’s worth, I found the performance way too mannered, but then I can’t really stand “The Four Seasons” anyway.)