Last week, a client received an email from a fan who expressed surprise that my client’s new record was only available as a compressed download from iTunes.
The prevailing level of misunderstanding over the sound quality possible from a store like iTunes is perhaps best encapsulated by this excerpt from a review of Cameron Carpenter’s latest album by Mark Swed of the LA Times:
“…the real deal requires the real deal. The touring organ is a digital instrument, and on it Carpenter does his wowing best in the best digital sound, which isn’t bad on the CD (and is bad on restricted mp3 downloads on Amazon and iTunes or streaming sites). On studio master download from sites that handle high definition, though, the touring organ becomes a conveyor of psychedelic electronic music in a class of its own.”
Amazon does sell MP3 files, but iTunes uses the AAC codec instead. For a consumer, the exact distinction between the two isn’t terribly important, but if you’re the music critic for the LA Times and you’ve taken it upon yourself to weigh in on audio quality, it’s something you really ought to understand. Streaming sites also employ a variety of types and degrees of compression: Beats uses MP3 and AAC, and Spotify uses Ogg Vorbis at a variety of bitrates.
Lumping all these formats together (and dismissing them) just because they’re “compressed” makes about as much sense as equating the sound of 78s and LPs just because they’re both round. It’s really exactly that stupid, and yet here’s a respected music critic doing just that (and not for the first time).
I’m all for good audio quality, but the obsession with “lossless” is a distraction which has almost completely obfuscated any sensible discussion of useful improvements to the way normal people hear music.
It’s a common misconception to measure expected audio quality in terms of bitrate. Intuitively, it seems as if more data will mean higher quality, but this isn’t always the case. The trouble with lossless codecs is that they’re very inefficient – even a compressed lossless format like FLAC or ALAC is generally encoding things that humans simply cannot hear.
It helps to consider the bitrate not as a measure of the merits of an encoding system, but as a measure of its cost. We’re commonly encouraged to treat bitrate as a proxy for quality, but really this is like measuring the performance of a car by looking at its fuel consumption. True, fast cars use a lot of petrol, but so do bad ones.
We might consider the amount of data required to transmit a page of text. As a text file, it might take up a few kilobytes. If we take a high resolution photograph of the page, it might yield a thousand times as much data, but when it is read aloud, it will sound exactly the same. We could use a microscope to photograph every fibre on the surface of the page, but if what we want to do is read the text, there’s a lot of data there we simply don’t need.
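To put rough numbers on that analogy (all of them assumed for illustration, not taken from any real page or scanner):

```python
# Rough, illustrative numbers only - these figures are assumptions
# chosen to make the comparison concrete.
chars_per_page = 3_000                    # ~500 words x ~6 characters
text_bytes = chars_per_page               # 1 byte per character in ASCII

# A 6" x 9" page scanned at 600 dpi, 24-bit colour, uncompressed:
scan_bytes = (6 * 600) * (9 * 600) * 3    # width px * height px * bytes/px

print(text_bytes)                # 3000 - a few kilobytes
print(scan_bytes)                # 58320000 - roughly 58 MB
print(scan_bytes // text_bytes)  # ~19,000x the data, same words
```

Read aloud, both versions of the page sound identical; the extra data in the scan tells you about paper, not prose.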
People don’t seem to have a problem with this when it comes to pictures. Nobody says “I won’t look at a website unless all the images are TIFF files”, because that’s plainly ridiculous. We’ve all seen badly compressed images on the Internet, and we’ve all seen beautiful ones too. We understand that “what it looks like” is the reliable measure of, well, what it looks like.
Eyes work differently to ears, though. Eyes are much harder to bamboozle with plausible-sounding pseudoscience. This is why there is no market for super-high-end TVs which reproduce infra-red and ultraviolet light. We all just accept that these are parts of the electromagnetic spectrum that we cannot see, and we leave it at that.
One of the (many) things my company does is to help broadcasters to encode audio for delivery to consumers. When they look into it, they almost always settle on AAC – and not because they’re too cheap to store something bigger.
The fact is that AAC is efficient. Bit for bit, it achieves higher audio quality than just about any other method of storing digital audio. AAC works at a variety of bitrates. It would theoretically be possible to use something like AAC at 1411kbps. If you did that, you’d achieve far higher quality than a CD can store.
Why isn’t this done? Well, AAC is a perceptual codec, which means that its success at reproducing a sound is measured by examining what listeners actually hear. When figuring out which bits of the sound to keep, the focus is on the bits that are audible. In rigorous double-blind tests published in peer-reviewed journals, nobody could find any point in encoding AAC at a higher bitrate than 256kbps. Without preconceptions to guide them, under test conditions, people simply couldn’t tell the difference between 256kbps VBR AAC and the highest quality studio masters. Consistently.
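The core idea – keep what’s audible, throw away what isn’t – can be caricatured in a few lines. This is a toy sketch, not how AAC actually works (real psychoacoustic models use masking curves, filter banks and much more); the threshold here is an arbitrary stand-in for “below audibility”:

```python
import cmath, math

def dft(samples):
    """Naive discrete Fourier transform (fine for a 64-sample toy)."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n)) / n
            for k in range(n)]

fs, n = 16_000, 64
# A loud 1kHz tone plus a 5kHz tone 60dB quieter than it.
signal = [math.sin(2 * math.pi * 1_000 * t / fs)
          + 0.001 * math.sin(2 * math.pi * 5_000 * t / fs)
          for t in range(n)]

spectrum = dft(signal)
threshold = 0.01  # arbitrary stand-in for a psychoacoustic threshold
kept = {k for k, c in enumerate(spectrum) if abs(c) > threshold}

print(sorted(kept))  # only the loud tone's bins survive: [4, 60]
```

The quiet tone’s bins fall below the threshold and are simply never spent bits on; the loud tone – the part you’d actually hear – is kept intact.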
Of course, this doesn’t mean that all AAC files sound great. To make a 16-bit CD from a 24-bit studio master, you normally add a small amount of noise, in a process called dithering, which turns quantisation distortion into benign noise and improves the effective dynamic range.
This noise is unnecessary in AAC encoding. By skipping this step, and by avoiding the very loudest signals that can cause distortion on decoding, it is possible to create an AAC from a studio master which more accurately reflects the audible portions of the original than is possible with a CD or uncompressed PCM WAV file.
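Here’s a sketch of what that dithering step looks like – simplified and with assumed numbers (real mastering dither is often noise-shaped, and the signal level here is chosen to exaggerate the effect):

```python
import math, random

random.seed(0)  # deterministic for the example

def quantize16(x, dither=False):
    """Quantize a sample in [-1, 1] to a 16-bit integer level.
    TPDF dither - the sum of two uniform +/-0.5 LSB noise values,
    added before rounding - turns quantisation distortion into
    benign noise that is uncorrelated with the signal."""
    scaled = x * 32767
    if dither:
        scaled += random.uniform(-0.5, 0.5) + random.uniform(-0.5, 0.5)
    return max(-32768, min(32767, round(scaled)))

# A very quiet tone (~ -84 dBFS), where truncation distortion is ugliest.
signal = [0.00006 * math.sin(2 * math.pi * 997 * t / 44_100)
          for t in range(4_410)]

plain = [quantize16(s) for s in signal]
dithered = [quantize16(s, dither=True) for s in signal]
# 'plain' is a crude staircase locked to the waveform (audible as
# harmonic distortion); 'dithered' carries ~1 LSB more noise, but its
# error is no longer correlated with the music.
```

An AAC made directly from the 24-bit master never has to take this quantisation hit at all, which is part of why skipping the CD step can help.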
This is how Mastered for iTunes works. Although not marketed very effectively, it’s really rather clever. Instead of saddling the user’s storage and bandwidth with inefficiently stored data and sounds they cannot hear, Apple has pushed the work back onto the producers. We do some extra work to make a more efficient master, and the consumer gets better sound with less than a fifth of the data.
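That “less than a fifth” follows directly from the formats’ numbers (the four-minute track length is an assumed example):

```python
# CD audio: 44,100 samples/s, 16 bits/sample, 2 channels
cd_bps = 44_100 * 16 * 2
print(cd_bps)            # 1411200 bits/s, i.e. the familiar ~1411kbps

aac_bps = 256_000        # 256kbps AAC
print(cd_bps / aac_bps)  # ~5.5x - hence "less than a fifth of the data"

# For an assumed four-minute track:
seconds = 4 * 60
cd_mb = cd_bps * seconds / 8 / 1_000_000
aac_mb = aac_bps * seconds / 8 / 1_000_000
print(round(cd_mb, 1), round(aac_mb, 1))  # ~42.3 MB vs ~7.7 MB
```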
There are circumstances where it makes sense to record audio at a higher degree of fidelity than is perceptible to the human ear, but once a record is finished, there’s no harm in throwing out the parts nobody can hear. When you buy a Mastered for iTunes AAC, you’re getting less data, but you’re still getting all of the music.*
*Unless you’re a dog. If you’re a dog, SACD or 96kHz downloads will sound noticeably better than CDs**. Don’t buy anything over 192kHz, though. People who sell 384kHz downloads to dogs are ripping them off. That stuff is for bats. They are the most discerning customers.
** Perhaps Mark Swed is getting his dog to write his reviews for him. It certainly is an alternative explanation. If I had a literate dog, “music critic” would not be the way I exploited it for financial gain.
There is a fundamental divide in reactions to audio. Hearing is very adaptive, so if you are acclimated to bad sound, the distortion becomes imperceptible. All the data suggesting that MP3, AAC and CDs are good enough is based on artificially stunted hearing.

I have a large amount of inferential data that the reviewer is accurately reporting the observations of his cohort: that digital compression is unlistenable, and that uncompressed 16/44 is acceptable but inferior to higher sample rates (particularly DSD), for people who hear live acoustic music regularly – for example, MORE hours per week than they listen to speakers and headphones. I have known for over 40 years that acoustic musicians hear differently, and I have recently proved precisely what that difference is and reduced it to practice.

The latest data from studying comparative development and brain imaging confirms that people who play acoustic music for an hour or more a day from childhood have much greater neural mass in the loci of musical intelligence and spatial intelligence, and also greater inter-connectivity in the brain in the form of white matter. This increased perception of spatial information, beyond the stereo and surround paradigms, is where digital compression erases information that was a normal part of hearing in the pre-industrial world. What the audiologists have been measuring since 1933 is people who listen to speakers rather than music.

OTOH, because people still hear acoustic speech every day from childhood, the phase information of speech signals is apparent. You can prove this by simple experiment: plug one ear and put on a blindfold, then have someone walk around the room talking. You will have a crude sense of where they are, and of the room boundaries, with only one ear. This disproves the standard model of hearing, which requires two ears to hear any spatial content. Hearing games the Fourier transform by responding to PHASE. 80% of the neural endings in your ears respond to time instead of frequency.
This enables sensing of relative time to two or three microseconds by comparing phase, extending our hearing to an effective bandwidth of 500 kHz or so for timing information, even though the physical mechanism rolls off starting around 4 kHz. Most math and engineering geeks applying the Fourier transform to hearing and audio ignore phase. This is a vicious cycle, because the phase mangling of reproduction systems causes hearing to develop so as to ignore the physically sensed phase information when listening to music (but not speech, which is processed separately), which in turn causes audiologists to report that humans can’t hear phase, and so the geeks have data to justify their over-simplification of audio.