Use of Fourier Transforms in MP3 Audio Compression
MP3, or more precisely MPEG-1 Audio Layer 3, is part of an audio-visual standard called MPEG.
Why MPEG-1 Layer-3?
- Open standard: the specification is available to anyone interested in implementing the standard.
- Availability of encoders, decoders, and other supporting technologies.
Basic Concept behind MP3 Compression
With MP3, the sound samples are transformed using Fourier-related transforms (a polyphase filter bank followed by the modified discrete cosine transform, MDCT). A frequency analysis of the sound is the basis for this transformation. Based on this analysis, the sound is split into frequency bands, each band corresponding to a particular frequency range; MP3 uses 32 such bands. The encoder then uses what is called a psychoacoustic model to compute the significance of each band for human perception of the sound. The idea is that not everything in the signal is audible: the human ear only hears roughly 20 Hz to 20 kHz, quiet components below the threshold of hearing go unnoticed, and a loud sound masks quieter sounds at nearby frequencies. Components the model judges inaudible can be quantized coarsely or discarded, which makes the file smaller.
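To make the band split concrete: in the MPEG-1 polyphase filter bank the 32 bands have equal width, each covering one 32nd of the range from 0 Hz to half the sampling rate. The sketch below maps a frequency to its band index and, as one typical psychoacoustic ingredient, evaluates Terhardt's widely used approximation of the absolute threshold of hearing. Both functions are hypothetical helpers for illustration; real psychoacoustic models (such as those in the ISO reference encoder) are considerably more elaborate and also model masking between components.

```python
import math

def subband_index(freq_hz, sample_rate_hz, num_bands=32):
    """0-based index of the equal-width subband that contains freq_hz."""
    nyquist = sample_rate_hz / 2
    if not 0 <= freq_hz < nyquist:
        raise ValueError("frequency must lie in [0, sample_rate_hz / 2)")
    band_width = nyquist / num_bands  # 689.0625 Hz at 44.1 kHz
    return int(freq_hz // band_width)

def absolute_threshold_db(freq_hz):
    """Terhardt's approximation of the threshold of hearing, in dB SPL (freq_hz > 0)."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    f = freq_hz / 1000.0  # the formula is written in kHz
    return 3.64 * f ** -0.8 - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2) + 1e-3 * f ** 4
```

At 44.1 kHz each band is about 689 Hz wide, so a 1 kHz tone falls in band 1. The threshold curve is lowest around 3 to 4 kHz, where the ear is most sensitive, and rises steeply toward both ends of the audible range.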
The information remaining after the frequency analysis and psychoacoustic modeling is coded efficiently with (a variant of) Huffman coding. MPEG-1 Layer 3 supports bit rates from 32 to 320 kbit/s and sampling rates of 32, 44.1, and 48 kHz. The format also supports variable bit rate (VBR) encoding, in which the bit rate varies between different parts of the file. MP3 files can also carry metadata about the sound, such as the title of the piece, the album and artist name, and other relevant data; this is usually stored in ID3 tags rather than in the audio frames themselves.
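MP3 does not build a new Huffman tree for each file: the standard defines a set of fixed Huffman tables, and the encoder selects among them for different regions of the spectrum. The sketch below shows only the general principle that those tables rely on, building a prefix-free code from symbol counts so that frequent values get short codewords. It is not MP3's table-selection procedure, and the name `huffman_code` and the example counts are hypothetical.

```python
import heapq
from itertools import count

def huffman_code(freqs):
    """Build a prefix-free binary code: frequent symbols get short codewords."""
    if not freqs:
        return {}
    tiebreak = count()  # unique counter so the heap never compares the dicts
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A lone symbol still needs a one-bit codeword.
        return {sym: "0" for sym in freqs}
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing "0" and "1" respectively.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]
```

For the counts `{0: 50, 1: 20, 2: 15, 3: 10, 4: 5}`, the codeword lengths come out as 1, 2, 3, 4, and 4 bits, 195 bits in total versus 300 bits for a fixed 3-bit code.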
Given below is a block diagram of the MPEG-1 Layer-3 encoder.
The overall algorithm is broken up into 4 main parts.
● Part 1 divides the audio signal into smaller pieces called frames (1152 samples each in MPEG-1 Layer 3). A polyphase filter bank splits each frame into 32 subbands, and an MDCT is then applied to the filter bank output.
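● Part 2 passes the samples through a 1024-point FFT, and the psychoacoustic model is applied to the resulting spectrum. The model's output determines whether the MDCT uses long or short blocks and how much quantization noise each band can tolerate.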
● Part 3 quantizes and encodes the MDCT coefficients. This is also known as noise allocation: the quantization adjusts itself iteratively until the result meets both the bit-rate target and the masking requirements from the psychoacoustic model.
● Part 4 formats the bitstream into audio frames. An audio frame is made up of four parts: the header, the (optional) error check, the audio data, and the ancillary data.
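Parts 1 and 4 both revolve around frames. In MPEG-1 Layer 3 each frame carries 1152 samples per channel, so its length in bytes follows from the bit rate and sampling rate: 1152 / 8 = 144, giving floor(144 × bit rate / sampling rate), plus one optional padding byte that encoders insert in some frames so the average bit rate comes out exact (for example at 44.1 kHz). The helper names below are hypothetical; the arithmetic is the point.

```python
SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer 3: samples per channel in one frame

def frame_length_bytes(bitrate_bps, sample_rate_hz, padding=False):
    """Bytes in one MPEG-1 Layer 3 frame, 4-byte header included."""
    # (1152 samples / 8 bits per byte) * bits per second / samples per second
    return (SAMPLES_PER_FRAME // 8) * bitrate_bps // sample_rate_hz + (1 if padding else 0)

def frame_duration_ms(sample_rate_hz):
    """Playback time covered by one frame, in milliseconds."""
    return 1000.0 * SAMPLES_PER_FRAME / sample_rate_hz

for rate in (32_000, 44_100, 48_000):
    print(rate, frame_length_bytes(128_000, rate), round(frame_duration_ms(rate), 2))
# Prints:
# 32000 576 36.0
# 44100 417 26.12
# 48000 384 24.0
```

At 128 kbit/s and 44.1 kHz the exact value is about 417.96 bytes, which is why frames alternate between 417 and 418 bytes (with padding).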
PART 1: MDCT (Modified Discrete Cosine Transform) FILTER
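The MDCT is a Fourier-related transform based on the type-IV DCT. It has the additional property of being “lapped”: consecutive blocks overlap by half their length, and the time-domain aliasing each block introduces is cancelled when the overlapping blocks are added back together. This linear function transforms 2N real numbers $x[0], \ldots, x[2N-1]$ into N real numbers $X[0], \ldots, X[N-1]$ according to the equation:
$ X[k] = \sum_{n=0}^{2N-1} x[n] \cos\left[\frac{\pi}{N}\left(n + \frac{1}{2} + \frac{N}{2}\right)\left(k + \frac{1}{2}\right)\right], \quad k = 0, \ldots, N-1 $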
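The sketch below evaluates the MDCT directly from its definition (O(N²), fine for illustration; practical encoders use fast FFT-based algorithms) and checks the “lapped” property. Assumptions: the inverse transform uses a 2/N scaling, and each 2N-sample block is multiplied by a sine window before the MDCT and again after the inverse. Because this window satisfies w[n]² + w[n+N]² = 1 (the Princen-Bradley condition), overlap-adding blocks spaced N samples apart cancels the aliasing. N = 18 corresponds to MP3's long blocks (36 inputs, 18 coefficients), which use this same sine window shape. The function names are hypothetical, not from any library.

```python
import math

def mdct(x):
    """Direct MDCT: 2N real inputs -> N real coefficients (O(N^2), illustrative)."""
    N = len(x) // 2
    n0 = 0.5 + N / 2  # phase offset (n + 1/2 + N/2) from the definition
    return [
        sum(x[n] * math.cos(math.pi / N * (n + n0) * (k + 0.5)) for n in range(2 * N))
        for k in range(N)
    ]

def imdct(X):
    """Inverse MDCT with 2/N scaling: N coefficients -> 2N time-aliased samples."""
    N = len(X)
    n0 = 0.5 + N / 2
    return [
        2.0 / N * sum(X[k] * math.cos(math.pi / N * (n + n0) * (k + 0.5)) for k in range(N))
        for n in range(2 * N)
    ]

def sine_window(N):
    """Length-2N sine window; satisfies w[n]**2 + w[n + N]**2 == 1."""
    return [math.sin(math.pi / (2 * N) * (n + 0.5)) for n in range(2 * N)]

def mdct_roundtrip(signal, N):
    """Window, MDCT, IMDCT, window again, and overlap-add with hop size N."""
    if len(signal) % N:
        raise ValueError("this sketch assumes len(signal) is a multiple of N")
    w = sine_window(N)
    padded = [0.0] * N + list(signal) + [0.0] * N  # so every sample is covered by two blocks
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - N, N):
        block = [padded[start + i] * w[i] for i in range(2 * N)]
        y = imdct(mdct(block))
        for i in range(2 * N):
            out[start + i] += y[i] * w[i]
    return out[N:N + len(signal)]
```

Each block on its own returns a time-aliased signal; only the sum of two overlapping blocks reproduces the input, up to floating-point rounding. That cancellation is what “lapped” buys: 50% overlap without increasing the number of coefficients per sample.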
PART 2: 1024-point FFT