Here’s a simple scientific experiment. Please listen to this audio file* and answer the question below. The file (all 56 MB of it) contains three versions of the same 1’15” excerpt of Mozart’s Piano Concerto K.491. One is the original uncompressed CD-quality audio; the other two are compressed, like you’d download from iTunes or Amazon. Can you tell which is which?
EDIT: Did you get it right? Did anybody else? The results are now in.
* The sample is from this album, which I heartily recommend. I picked it because it contains all the things that typically encode badly: piano and strings, loud and soft. It’s a big file. It has to be, because the whole thing’s gotta be at CD quality for the science to work. I’m sorry if it takes a while to download. This is what music would be like on the Internet without compression, because CD quality audio is quite a bit bigger than the stuff you’re used to downloading. There are formats that make CD-quality audio a bit smaller, but they don’t work on many of the popular players.
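For the curious, here’s a back-of-envelope sketch of why uncompressed downloads get so big, assuming standard 16-bit Red Book figures (a comment further down notes the actual file was 24-bit, which is larger still, so these numbers are illustrative rather than exact):

```python
# Rough arithmetic for uncompressed CD-quality audio:
# 44,100 samples/sec x 16 bits x 2 channels.

SAMPLE_RATE = 44_100      # Hz (Red Book CD standard)
BIT_DEPTH = 16            # bits per sample
CHANNELS = 2              # stereo
SECONDS = 75              # the 1'15" excerpt
VERSIONS = 3              # three copies of the excerpt in one file

bytes_per_second = SAMPLE_RATE * CHANNELS * BIT_DEPTH // 8
total_bytes = bytes_per_second * SECONDS * VERSIONS

print(f"CD data rate: {bytes_per_second * 8 / 1000:.1f} kbit/s")
print(f"Three 75-second excerpts: {total_bytes / 1_000_000:.1f} MB")
```

That works out to roughly 1,411 kbit/s, versus the 128–256 kbit/s of a typical download, which is the whole reason lossy compression caught on.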
Is there a glaring hole in my methodology? Use the comments. I’ll close the poll and post the answers (and results) when we’ve got enough votes.
Great post. I hope you’re trying to prove a point that I (and probably you) have been making for a while. The average human ear can’t hear the highest and lowest frequencies that get lopped off during a good compression.
I feel like i could hear the inner voices better on the [EDIT: SHHHHHHH. This is a blind trial] version above. But, I had to use headphones and still am not sure which is which.
Having my classical library (mostly) physical vs. completely digital is less about sound quality and more about not losing my content. (I can’t tell you how much I’ve lost to bad hard drives. *sigh*)
I hope you don’t mind, but I slightly edited your comment to preserve the “blind” nature of this trial until there are enough results to draw useful conclusions.
Thank you! I hope this post shuts everybody up about how lossy compression is raping our nation’s youth or whatever.
Now, to be fair to the FLACsters: it’s super-easy to convert FLAC or whatever into losslessly (is that a word) compressed iTunes files—I use XLD, a free download for Mac. Although maybe I’m wasting my time, since I had to listen to this twice before I made any kind of guess, and I’m still not sure if I picked the “right” one.
Just you wait for the Pepsi Challenge Part II – Rape or compression: can you tell the difference?
This is a great example of ‘wait – which version was that?’ It’s amazing that the human ear can’t hear most of the compression. I’ve already cast my vote, and I’m almost positive that I’m right. Wish I could post which one! Anyone want to start a betting pool??
Just wait for Pepsi Challenge Part III: How sure are you? You wanna bet on it?
Almost 100%! Haha. The real question is – what’s the wager?
I should ask for something big while you’re still sure you got it right.
I think your response depends on what you’re listening on. Computer speakers level out everything, so compressed audio is fine. Good stereo speakers show the deficiencies much more. A CD sounds great on speakers, but the distortion of compressed audio is immediately apparent when I listen to mp3s on the stereo. But if you do listen mostly on your computer, go for the compressed!
Just wait for Pepsi Challenge Part IV: Should you even be allowed to listen to proper music on that piece of crap?
ah come on! it’s the [LALALA I CAN’T HEAR YOU] one! 😀 i can’t wait to find out…
It’s like religion: everybody’s sure they’re right, but you don’t find out for sure until it’s all over.
Do one with some percussion in there and I think you’ll get a lot more of an obvious difference in sound. Especially if there are snare or cymbal hits. Pretty sure the uncompressed one is [self edit], though.
It’s a struggle to find a single real-world example that’s a fair test. On balance, I’m fairly happy with this one. I do, though, think that an audio poll might become something of a regular feature here, so maybe next time.
Ben’s right that something with a lot of hi hat is easier to hear the compression on. A poorly-compressed mp3 will tend to get a weird “underwater” sound from cymbals, choral sopranos, and other hi-frequency generators. The sound of the violin E-strings was what helped me make up my mind before I made my final, incorrect choice.
I can only think that Berlioz was thinking of this moment when he left hi hat out of his treatise on orchestration.
No, he was thinking of Corigliano’s Third Symphony.
Damn, that’s a little too much Lynyrd Skynyrd for one sitting, compressed, uncompressed, whatever. “Free Bird” should be unencumbered by bits and bytes anyway.
I always suspected that some composers must simply hear something different.
The DAC or the Speakers could be the bottleneck in the sound, and thus it would not be so obvious a difference on your computer. The best would be to take the file to a good system and play it. That would be the most clear difference in my mind.
“Is there a glaring hole in my methodology?”
————————————–
Yes, but perhaps not a glaring one.
As Marc Geelhoed correctly pointed out above, it depends on the audio system you’re using to audition the clips. If you’re using the ordinary MP3 audio system like, say, the audio system on your computer or an iPod, you probably won’t notice any difference between the clips. If, however, you’re using a high-priced, high-quality audio system (the two aren’t necessarily the same thing), you’ll notice the difference immediately. Given the ubiquitousness of the iPod and other crap audio systems, you’d be pretty stupid to pay out more bucks and give up more storage space for a CD-quality MP3.
ACD
If people can’t hear a difference, I’m expecting them to say that they can’t hear a difference.
I can’t believe that you have used an iPod to play back any of the lossless formats it supports through good-quality headphones and still think it is a “crap audio system”. I worry that if your idea of “mediocre” starts somewhere beyond this, then you’re hardly typical of even classical music purchasers – indeed it may well be that your idea of “good” surpasses the quality of the electronics used to record most classical albums in the first place.
Let me guess. You think Bose is a high-end, top quality audio system, right? I’d also guess you’re under 30.
ACD
It isn’t really important what I think. The purpose of this whole exercise is to find out what a sample of computer-literate classical music consumers think.
I’ll post the results next week.
This is a fun little idea 🙂 The tutti chords around 50 seconds in seem to reveal some clumpy bit-dropping, but that seems equally present in all 3 versions. This might be from the QuickTime playback on my Mac, but was it in the original CD too?
I’ll still cast my vote, but I can only give my gut reaction…
I think that what you’re hearing there is a function of the room in which it was recorded.
Hm, really can’t tell the difference. I guess I am using “inferior equipment” and should be excluded from the experiment…
The oboes are bloody flat at the end of the big tutti, which pissed me off in all three versions. Couldn’t they have taken 10 minutes to tune the winds instead of spending that time setting the microphones and creating “audiophile” recording quality? 😉
Can’t wait to hear the results!
Sorry Tim. You’re obviously not qualified to comment – my stereo cost $20,000, and the oboes sound fine to me*.
* I do wish there was some sort of punctuation mark to indicate sarcasm.
Generally when these tests are done, previously they’ve found that some people will nearly always get it right, and some will get it randomly right. What this means is often attributed to things like “golden ears” but I think that a listener trained in recognising encoding artifacts will more often get it right.
BTW on methodology: The file is 24bit so if you lifted it from a CD, then you upsampled from 16bit. This adds another file conversion process.
Generally when these tests are done, previously they’ve found that some people will nearly always get it right, and some will get it randomly right. What this means is often attributed to things like “golden ears” but I think that a listener trained in recognising encoding artifacts will more often get it right.
I think that’s borne out by the results. It’s also hardly a real-world example, because consumers wouldn’t normally listen to the two side-by-side. Perhaps a fairer test would be to use a bunch of different pieces of music, some compressed, some not. It would be really difficult to control, though, because some pieces are more vulnerable to compression artifacts than others.
BTW on methodology: The file is 24bit so if you lifted it from a CD, then you upsampled from 16bit. This adds another file conversion process.
Oops. That was just sloppiness on my part. It’s a pity, but the up-conversion was applied to all three samples so it shouldn’t have a significant effect on the outcome.
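For anyone wondering what that extra conversion step actually does: a minimal sketch of naive 16-to-24-bit up-conversion (this is the simplest possible approach, zero-padding the low byte; real conversion tools may also apply dither, and this is not the tool used for the post):

```python
# Converting a 16-bit PCM sample to 24-bit just pads the low byte with
# zeros. No information is added or removed, which is why applying the
# same step to all three versions shouldn't affect the comparison.

def upconvert_16_to_24(sample_16: int) -> int:
    """Shift a signed 16-bit sample into the top 16 bits of a 24-bit word."""
    return sample_16 << 8  # low 8 bits are zero; nothing audible changes

samples_16bit = [0, 1, -1, 32767, -32768]   # hypothetical PCM samples
samples_24bit = [upconvert_16_to_24(s) for s in samples_16bit]
print(samples_24bit)  # [0, 256, -256, 8388352, -8388608]
```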
The link to your test was sent to me by a friend (audiobuddy?). The following was actually my reply to him:
Yes, interesting – but not, alas, surprising. I’ve taken a similar test where there were about 8 sets of headphones which were each (apparently) playing the same piece of music using different formats and codecs. We were asked to rank them in order. Frankly, although I could detect some subtle differences, I did not feel able to put them in a useful ranking. Certainly none of them gave me that AAAAAHHHH!! feeling you get when you know you’re in the presence of excellent reproduction of well-recorded music. Also, my audio memory just isn’t good enough for me to judge whether No. 6 is actually better than No. 3. If I’d had enough time, I believe I could have ranked them reasonably well by judging them all in pairs: if, say, Nos. 2, 3, 5 and 8 sounded better than their partners, and so on. But despite all that, there’s no substitute for spending time listening to a good speaker-based system. One certainly wonders what the people in the abovementioned sample used to play back the samples. Finally, I find that I’m generally quite happy using 320 kbps MP3 files on my media player and earbuds. I imagine I would be less so if I was using a headphone amp and high-quality headphones. On my media player I can hear a MUCH bigger difference between 160 and 320 kbps MP3 than I can between 320 kbps and uncompressed WAV.
Dear Proper Discord,
I read with interest your blog entry on the “Pepsi challenge” of comparing lossless files with compressed files. First, I’d like to state that I have a biased interest in the outcome- I run a classical download site that sells lossless files (FLAC) named Passionato(www.passionato.com). I’m being upfront about this – perhaps you should expose your vested interest(s) as well.
Our research among high-end, classical audiophile consumers, and our business results so far in the US, indicate that lossless files are preferred to compressed files, even though customers have to pay on average 20% more for lossless, for three reasons. First, they maintain that they can tell the difference in a high-quality audio playback environment. (Virtually no one says they can tell the difference with a $400 home theatre, let alone computer speakers or the earbuds that portable devices like the iPod offer.)
Second, they don’t want their collections, which they see as long-term personal assets, encoded in a corporate-controlled format. They worry that sometime in the future Apple (or Microsoft or ‘fill in the blank’) will abandon their proprietary format, and that they will be left stranded.
I believe this fits in with the whole open source movement which has dominated web development for the past few years. FLAC is similar to Linux, mySQL, and the other open source technology that is prevalent on the web.
Third, they like the fact that they can convert their lossless files back into uncompressed CDs. Once you have compressed a file lossily, you can never get back the original CD-quality sound. What’s more, only a lossless file can be converted back into a bit-identical Red Book CD, which is playable on any CD player.
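The lossless-vs-lossy distinction above can be shown with a toy round-trip, using zlib as a stand-in for FLAC and coarse quantization as a stand-in for MP3 (neither is the real codec; this just illustrates the principle):

```python
# Lossless compression round-trips exactly; lossy compression cannot.
import zlib

samples = bytes(range(256)) * 4          # stand-in "audio" data

# Lossless: compress, decompress, and get back exactly what you started with.
restored = zlib.decompress(zlib.compress(samples))
assert restored == samples

# Lossy: throw away the low bits; no decoder can restore them.
quantized = bytes(b & 0b11110000 for b in samples)
assert quantized != samples              # that information is gone for good
```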
Our research indicates that this group is indeed small (probably less than 10% of classical consumers, who in turn account for only about 2% of US music revenue). That makes them less than 0.2% of the public (if you extrapolate a bit). However, we are quite sure that this small group of buyers buys over half of the classical music sold in the US (the old 80/20 rule seems to run maybe 90/10 in classical).
As an aside, I passed the research results on to a market research buddy of mine, and she waxed rhapsodic about how the research was poorly designed and the base sizes were too small. I told her that it was not for the decision-making of a large multinational conglomerate but just for fun in a blog… that’s true, isn’t it?
May I propose we settle this by asking consumers (in a controlled, market research-worthy environment) the four questions below? They would of course be screened first to make sure they are unbiased, and are in our target audience (classical audiophiles):
1. How important do you think it is that your digitized music collection maintains the same sound quality as a CD?
a. Very important
b. Important
c. Don’t care
d. Not important
2. Can you tell whether File A (an unidentified 256 kbps MP3 file) or File B (an unidentified lossless FLAC file) is identical in sound quality to File C (a CD)? (Given some samples of all three to listen to, with the CD identified as the standard.) Comparison testing should only be done in a high-quality audiophile listening setup (such as a high-end home system or studio headphones with a preamp and/or converter attached).
3. Would you prefer to have your music collection
a. in a format controlled, patented, and created by a multi-national corporation, or
b. in a format that no single entity controls, but which has been created and maintained for free by computer experts around the world.
4. Would you like to store your music
a. In a format that can be copied onto a standard CD which can be played in any player, including your old home theatre, boombox, car, etc.
b. In a format which is playable only on players that are equipped with MP3 playback capability.
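For what it’s worth, the blind comparison in Question 2 is straightforward to administer: shuffle which file gets labelled “A” and “B” for each listener so that nobody (including whoever reads the results) knows which is which until the key is consulted. A minimal sketch, with hypothetical file names:

```python
# Per-listener blind label assignment for a two-file comparison test.
import random

FILES = {"mp3_256k": "excerpt_256k.mp3", "flac": "excerpt.flac"}

def blind_assignment(seed: int) -> dict:
    """Return one listener's mapping from blind labels to real files."""
    rng = random.Random(seed)   # seed per listener, so the key is reproducible
    names = list(FILES)
    rng.shuffle(names)
    return {"A": FILES[names[0]], "B": FILES[names[1]]}

key = blind_assignment(seed=42)
print(key)   # the answer key, kept sealed until responses are in
```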
Of course, real research requires money—if you wanted to participate, you’d have to pony up some to join us. Or maybe you’re able to come up with a way to do it on the cheap (we like that, we’re entrepreneurs). Good luck with that.
James Glicker
http://www.passionato.com
Twitter: jglicker
There are lots of flaws in your methodology. Proper experimental design isn’t trivial.
The issues do NOT arise from the technology you used. The technology is fine. Units like the iPod or standard consumer-grade CD players produce audio signals with such low noise and such low THD that they significantly exceed the requirements of human hearing. The problems with your methodology involve experimental design and analysis, not the audio technology involved.
Let’s go over the obvious issues:
[1] The sample universe you got is clearly self-selected. This biases the sample universe. The first thing we want to know is: is your sample a normal distribution, i.e., a fair selection from a presumably Gaussian distribution of listeners? No, it’s not, and we know this because all the people who took your test are not typical of ordinary music listeners since they were all savvy enough about online contemporary music blogs to stumble onto your test.
The problem with selecting test subjects this way is the same problem as doing a political poll by calling people at random. People you call up in the phone book are not a normal distribution. It’s obvious why — nowadays, most people under 30 have no land line, and use a cell phone, so if you rely on polling people with land lines only, your sample set will skew toward older people. You’ll also miss people who pay extra not to have their phone number published, which means your sample set will skew toward middle income and include no truly high-income people. And so on.
We can deduce that your sample set skews young, skews toward people with a musical education, skews toward the geek and technophile segments of the population, and skews toward people who are intensely engaged with contemporary music. All of these are highly unrepresentative of the general audience of music listeners.
What we want to know is: what’s the shape of the distribution of the listeners who responded to your poll? It’s sure not going to be a standard Gaussian bell curve, that’s for sure. This has big repercussions for deciding important questions like, How many standard deviations do we allow per confidence interval? You can’t make meaningful statements about the accuracy of peoples’ responses to a poll without knowing that info, and you can only run those statistics if you know the difference between the distribution of your sample universe and a standard normal distribution.
[2] We want to know whether the inter-group variance is larger or smaller than the between-group variance of your sample set as compared to a truly representative Gaussian sample set. We can get that by running a Student’s t-test, but we have to have a truly normal control group to do that. We might try a bootstrap, but your sample size isn’t big enough to give good results from a bootstrap.
[3] We would like to know how many respondents got the answers 100% wrong. This is as statistically significant as getting the answers 100% correct. Respondents who got 100% wrong are hearing a difference but mislabeling it. Did you account for this? Almost certainly not.
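The point in [3] is easy to make concrete with a two-sided binomial test: under pure guessing (one pick out of three versions, so p = 1/3), scoring far below chance is just as unlikely as scoring far above it. A pure-stdlib sketch, with made-up respondent counts rather than the poll’s actual results (in practice you’d use something like `scipy.stats.binomtest`):

```python
# Two-sided exact binomial test: how surprising is a score of k correct
# out of n respondents if everyone is guessing with probability p?
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p)**(n - k)

def two_sided_p(k: int, n: int, p: float) -> float:
    """Sum the probability of every outcome at least as unlikely as k."""
    pk = binomial_pmf(k, n, p)
    return sum(binomial_pmf(i, n, p) for i in range(n + 1)
               if binomial_pmf(i, n, p) <= pk + 1e-12)

n, p = 90, 1 / 3              # hypothetical: 90 respondents, chance = 1/3
print(two_sided_p(60, n, p))  # well above chance: tiny p-value
print(two_sided_p(10, n, p))  # well below chance: also a tiny p-value
print(two_sided_p(30, n, p))  # right at chance: p near 1
```

Both tails come out equally significant, which is exactly the mislabeling argument: a group that reliably picks the wrong file is still hearing a difference.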
[4] We would like to know the probability that if all respondents took this same test again, they’d get more answers right. To do that, we need to run an Analysis of Variance. ANOVA requires more careful experimental design than you have put into this test. In particular, we could apply Ramsey theory to get approximate guesstimates if the sample universe you chose from had a normal distribution, but it doesn’t, so Ramsey theory is out.
These are the obvious superficial problems. More subtle problems inhere, and could possibly be dealt with by deconvolving the transfer function twixt your distribution and a normal distribution and then massaging the ANOVA with that information. But since you haven’t bothered to characterize the shape of the distribution of your respondents, we can’t leverage basic tools like the chi-squared test.
Thanks for taking the time to write such a considered response.
There are some good points here. The only thing I can really say is that I think you might be over-thinking it a bit:
Nobody in their right mind would base a major business decision on the outcome of this test because, as you rightly point out, it only tells you about the people who participated in it. They might, though, question groundless assumptions that they’d made and commission some proper research of their own.
To avoid the temptation to close the test when I agreed with the results, I decided in advance to end it after 100 responses or two weeks, whichever came first. It’s not perfect, but since everybody I talked to seemed completely confident that they’d get it right every time, I was mostly interested to see if that was true. The sample’s certainly good enough to cast serious doubt on that idea.
While we’re talking about the sample, I’ll share this additional piece of data: towards the end of the week in which the experiment took place, somebody posted a link to the experiment on the Linn Digital forum. That’s a place people hang out to discuss a high-end piece of hardware that plays music from your computer on your stereo. There’s a sort of a tidemark in the comments round about this time, where people start telling me I’m an idiot because I didn’t control for hardware. Almost all of respondents 70-150 arrived by clicking this link. During this time, the proportion of people getting it right (or saying they didn’t know) went down, and the proportion of people getting it wrong went up. That is to say, the audiophile demographic seemed to be worse at it than my normal readers.