Fundamentals of Data Representation: Sound compression
As you can see, we have some serious issues with the size of sound files. Take a look at the size of a 3-minute pop song recorded at a sample rate of 44 kHz and a sample resolution of 16 bits, using a single (mono) channel:
44,000 × 16 × 180 = 126,720,000 bits = 15,840,000 bytes (roughly 15.8 MB)
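The same arithmetic can be written as a small Python function. This is only a sketch: the function name and the `channels` parameter are our own additions. `channels=1` matches the calculation above, which has no channel factor in it; a stereo recording would double the size.

```python
def uncompressed_size_bits(sample_rate_hz: int, resolution_bits: int,
                           duration_s: float, channels: int = 1) -> int:
    """Size of raw (uncompressed) audio in bits.

    size = sample rate * sample resolution * duration in seconds * channels
    """
    return int(sample_rate_hz * resolution_bits * duration_s * channels)


bits = uncompressed_size_bits(44_000, 16, 180)
print(bits)                   # 126720000
print(bits / 8 / 1_000_000)   # 15.84 (decimal megabytes)
```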
As you are probably aware, an MP3 of the same length would be roughly 3 MB, about a fifth of the size. So what gives? The raw file sizes for sounds are clearly too big to store and transmit easily; what is needed is a way to compress them.
WAV files usually involve no compression at all and will be the size of the files that you have calculated already. There are lossless compressed file formats, such as FLAC, which compress a WAV file into data generally around half of its original size. One technique that can be used for this is run-length encoding, which looks for runs of repeated values in the sound file and, instead of recording each value separately, stores the value once along with how many times it occurs in a row. Let us take a hypothetical set of sample points that contains a long stretch of silence.
This silent area takes up a large part of the file. Instead of recording each silent sample individually, we can store a single piece of data stating how many silent samples there are in a row, massively reducing the file size.
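The run-length idea can be sketched in Python. The sample values below are invented for illustration, and `rle_encode`/`rle_decode` are hypothetical helper names; this is a generic run-length encoder, not FLAC's actual file format.

```python
from typing import List, Tuple


def rle_encode(samples: List[int]) -> List[Tuple[int, int]]:
    """Store each run of identical samples as a (value, count) pair."""
    runs: List[Tuple[int, int]] = []
    for s in samples:
        if runs and runs[-1][0] == s:
            value, count = runs[-1]
            runs[-1] = (value, count + 1)   # extend the current run
        else:
            runs.append((s, 1))             # start a new run
    return runs


def rle_decode(runs: List[Tuple[int, int]]) -> List[int]:
    """Expand (value, count) pairs back into the original samples."""
    out: List[int] = []
    for value, count in runs:
        out.extend([value] * count)
    return out


# Hypothetical samples: a little sound, ten silent (zero) samples, more sound.
samples = [3, 7, 5] + [0] * 10 + [4, 6]
encoded = rle_encode(samples)
print(encoded)  # [(3, 1), (7, 1), (5, 1), (0, 10), (4, 1), (6, 1)]
print(rle_decode(encoded) == samples)  # True: nothing was lost
```

Notice that a sample which is not repeated becomes a (value, count) pair, which takes more space than the sample alone, so run-length encoding only pays off where there are long runs, such as silence.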
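Another technique used by FLAC files is linear prediction. Each sample is predicted from the samples just before it, and only the difference between the prediction and the actual sample (the residual) is stored. For smoothly changing sound these differences are small numbers, which need fewer bits than the original samples.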
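Here is a minimal Python sketch of linear prediction, assuming the simple fixed second-order predictor `2 * previous - one before` (one of FLAC's fixed predictors; FLAC encoders can also compute their own predictor coefficients). The function names and sample values are hypothetical, and a real encoder would go on to pack the small residuals into fewer bits (FLAC uses Rice coding for this), which the sketch does not do.

```python
from typing import List


def lp_encode(samples: List[int]) -> List[int]:
    """Keep the first two samples as-is, then store only the residual
    (actual - predicted) for every later sample."""
    residuals = samples[:2]  # warm-up samples stored verbatim (slice = new list)
    for n in range(2, len(samples)):
        predicted = 2 * samples[n - 1] - samples[n - 2]
        residuals.append(samples[n] - predicted)
    return residuals


def lp_decode(residuals: List[int]) -> List[int]:
    """Rebuild the exact samples by adding each residual to the same prediction."""
    samples = residuals[:2]
    for n in range(2, len(residuals)):
        predicted = 2 * samples[n - 1] - samples[n - 2]
        samples.append(residuals[n] + predicted)
    return samples


# Hypothetical, smoothly changing samples (part of a wave).
wave = [0, 10, 19, 27, 33, 37, 39, 38]
res = lp_encode(wave)
print(res)                     # [0, 10, -1, -1, -2, -2, -2, -3]
print(lp_decode(res) == wave)  # True: linear prediction is lossless
```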
FLAC files are still very large. What is needed is a format that produces much smaller files that can be easily stored on your computer or portable music device and quickly transmitted across the internet.
As we have already seen, we can make audio files smaller by decreasing the sampling rate and the sampling resolution, but we have also seen the dreadful effect this can have on the final sound. There are other, cleverer methods of compressing sound. These methods won't give us back exactly the audio we started with, but the result will be close. This is lossy compression.
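To see why reducing the sample resolution loses information, here is a sketch that cuts 16-bit samples down to 8 bits by discarding the lowest 8 bits of each one, then scales them back up. The helper names and sample values are hypothetical; the point is that the restored values differ from the originals and the discarded detail cannot be recovered.

```python
from typing import List


def reduce_resolution(samples: List[int], from_bits: int, to_bits: int) -> List[int]:
    """Requantise signed samples by dropping the lowest (from_bits - to_bits) bits."""
    shift = from_bits - to_bits
    return [s >> shift for s in samples]  # arithmetic shift keeps the sign


def restore_resolution(samples: List[int], from_bits: int, to_bits: int) -> List[int]:
    """Scale samples back up to the original range; the lost bits come back as zeros."""
    shift = from_bits - to_bits
    return [s << shift for s in samples]


original = [1000, -1234, 32767, -32768, 5]   # hypothetical 16-bit samples
reduced = reduce_resolution(original, 16, 8)
restored = restore_resolution(reduced, 16, 8)
print(reduced)   # [3, -5, 127, -128, 0]
print(restored)  # [768, -1280, 32512, -32768, 0], not the original values
```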
There are many lossy compressed audio formats, including MP3, AAC and Ogg Vorbis (an open, royalty-free format). The compression works by reducing the accuracy of parts of the sound that are considered to be beyond the auditory resolution of most people. This method is commonly referred to as perceptual coding: it uses psychoacoustic models to discard, or reduce the precision of, components that are less audible to human hearing, and then records the remaining information efficiently. Because the accuracy of certain frequencies is lost, you can sometimes tell the difference between the original and the lossy version, particularly at low bit rates, for example by hearing the loss of very high and very low pitched tones.
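Real perceptual coders rely on detailed psychoacoustic models, which are far too involved to show here. As a crude stand-in, the sketch below converts a block of samples into frequency components with a plain discrete Fourier transform (DFT), throws away every component whose magnitude is below a fixed threshold, and converts back. The threshold, the test signal and the function names are all assumptions for illustration; a fixed magnitude threshold is not a psychoacoustic model, it only shows the idea of discarding the least noticeable parts of a sound.

```python
import cmath
import math
from typing import List, Tuple


def dft(x: List[float]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform: samples -> frequency components."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum: List[complex]) -> List[float]:
    """Inverse DFT: frequency components -> samples (real part only)."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def drop_quiet_components(x: List[float], threshold: float) -> Tuple[List[complex], List[float]]:
    """Zero every frequency component quieter than `threshold` (a crude stand-in
    for 'less audible'), then rebuild the samples from what is left."""
    kept = [c if abs(c) >= threshold else 0j for c in dft(x)]
    return kept, idft(kept)


# Hypothetical 16-sample block: a loud tone plus a very quiet one.
N = 16
loud = [math.sin(2 * math.pi * 2 * t / N) for t in range(N)]
quiet = [0.01 * math.sin(2 * math.pi * 5 * t / N) for t in range(N)]
block = [a + b for a, b in zip(loud, quiet)]

kept, rebuilt = drop_quiet_components(block, threshold=1.0)
print(sum(1 for c in kept if c != 0), "of", N, "components kept")  # 2 of 16 components kept
print(max(abs(r - b) for r, b in zip(rebuilt, block)))  # small, but not zero: lossy
```

Most of the kept list is now zero, which is the kind of data that can be stored compactly (for example with the run-length idea described earlier), while the rebuilt block still sounds almost the same as the loud tone.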
So that they take up less space and can be sent quickly across the internet or stored on portable music players.
Lossy (MP3, AAC, Ogg Vorbis) and lossless (FLAC).
Perceptual coding reduces the precision of the frequencies stored in a sound file that are beyond the auditory resolution of most people.