Homework3ECE438JPEG - Rhea

JPEG Compression

Overview

JPEG is a lossy compression format for images. Here is a simple explanation of how it works.

Step 1: The image is divided into non-overlapping 8 by 8 blocks (8x8 pixels). If the image width or height is not a multiple of 8, the image may be cropped or padded with extra pixels so that both dimensions are multiples of 8. Each 8 by 8 block contains 64 values. For an 8-bit grayscale image, each pixel value ranges from 0 to 255, where 0 is pure black and 255 is pure white.
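The block splitting in Step 1 can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names are hypothetical, the image is assumed to be a list of rows of grayscale values, and padding by repeating the last row and column is one assumed choice (cropping is equally valid).

```python
def pad_to_multiple(image, block=8):
    """Pad a grayscale image (list of rows) by repeating its edge pixels.

    Edge replication is an assumed padding strategy, chosen for illustration.
    """
    height, width = len(image), len(image[0])
    padded_h = -(-height // block) * block  # round up to a multiple of block
    padded_w = -(-width // block) * block
    rows = [list(row) + [row[-1]] * (padded_w - width) for row in image]
    rows += [list(rows[-1]) for _ in range(padded_h - height)]
    return rows


def split_into_blocks(image, block=8):
    """Return the non-overlapping block x block tiles in row-major order."""
    padded = pad_to_multiple(image, block)
    blocks = []
    for top in range(0, len(padded), block):
        for left in range(0, len(padded[0]), block):
            blocks.append([row[left:left + block]
                           for row in padded[top:top + block]])
    return blocks


# A 12-row by 10-column test image is padded to 16 x 16, giving 4 blocks.
image = [[(x + y) % 256 for x in range(10)] for y in range(12)]
blocks = split_into_blocks(image)
print(len(blocks), len(blocks[0]), len(blocks[0][0]))  # prints: 4 8 8
```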

Step 2: The 2-D Discrete Cosine Transform (DCT) is applied to each 8 by 8 block. The DCT is chosen over transforms like the FFT because its frequency coefficients are real numbers. The FFT generally produces complex coefficients, which would require two 8 by 8 matrices: one to store the real parts and another to store the imaginary parts. The result of the DCT is also an 8 by 8 block. The top left corner holds the DC coefficient, which is proportional to the average pixel value of that specific 8 by 8 block. The bottom left corner is the AC coefficient that represents the fastest vertical change, and the top right corner is the AC coefficient that represents the fastest horizontal change. Each block is thus represented as a weighted sum of 64 basis patterns, with the DCT coefficients as the weights.
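To make Step 2 concrete, here is a direct (slow) sketch of the 8 by 8 2-D DCT. It assumes the orthonormal DCT-II scaling commonly used for JPEG, under which the DC coefficient equals 8 times the block average. The function name `dct_2d` is hypothetical, and the level shift (subtracting 128 from each pixel) that JPEG encoders apply before the transform is omitted for simplicity.

```python
import math

N = 8  # block size


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block.

    coeffs[v][u] holds vertical frequency v and horizontal frequency u,
    so coeffs[0][0] is the DC term in the top left corner.
    """
    def c(k):
        return math.sqrt(0.5) if k == 0 else 1.0

    coeffs = [[0.0] * N for _ in range(N)]
    for v in range(N):
        for u in range(N):
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N)))
            coeffs[v][u] = (2.0 / N) * c(u) * c(v) * total
    return coeffs


# A flat block has only a DC coefficient: about 800, i.e. 8 times the value 100.
flat = [[100] * N for _ in range(N)]
flat_coeffs = dct_2d(flat)
print(round(flat_coeffs[0][0], 6))

# A block that changes only from left to right has energy only in the top row.
ramp = [[10 * x for x in range(N)] for _ in range(N)]
ramp_coeffs = dct_2d(ramp)
```

Production encoders use fast factored DCT algorithms rather than this quadruple loop, which is kept here only because it mirrors the definition.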

Step 3: Once the DCT of the 8 by 8 block is obtained, each block is quantized using a quantization matrix. This is where the lossiness of the JPEG format is introduced. Quantization is essentially a divide-and-round operation: each DCT coefficient is divided by the corresponding entry of the quantization matrix, and the result is rounded to an integer. The quality factor of a JPEG is directly related to the quantization matrix.
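The quantization in Step 3 can be sketched as below. This sketch rounds each quotient to the nearest integer, which is the usual choice in JPEG encoders. The function names and the sample numbers are made up for illustration, and only a 2 by 2 corner is shown to keep the example short (the same code works on a full 8 by 8 block).

```python
import math


def quantize(coeffs, q_matrix):
    """Divide each coefficient by its q-matrix entry and round to nearest."""
    return [[math.floor(c / q + 0.5) for c, q in zip(c_row, q_row)]
            for c_row, q_row in zip(coeffs, q_matrix)]


def dequantize(levels, q_matrix):
    """Decoder side: multiply back. The rounding error is not recoverable."""
    return [[level * q for level, q in zip(l_row, q_row)]
            for l_row, q_row in zip(levels, q_matrix)]


# Illustrative values only: a 2 x 2 corner of DCT coefficients and q-matrix.
coeffs = [[-415.4, -30.2],
          [4.6, -21.9]]
q_matrix = [[16, 11],
            [12, 12]]

levels = quantize(coeffs, q_matrix)
restored = dequantize(levels, q_matrix)
print(levels)    # [[-26, -3], [0, -2]]
print(restored)  # [[-416, -33], [0, -24]]
```

The restored values differ from the originals (for example 4.6 becomes 0), which is exactly the information loss described above.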

The quantization matrix (q-matrix) is an 8 by 8 matrix whose values are chosen based on the Q-factor. For an image with the highest quality, the q-matrix may be set to all 1's, which means the coefficients are only rounded to integers and essentially no quantization occurs. The values in the q-matrix are chosen based on common trends in images. Typically, images contain a lot of continuous tones, which makes the DC component and the low-frequency AC components particularly dominant in the representation of the image. In a typical image, the majority of the signal energy is contained in the top left corner of the 2-D DCT. To preserve these portions of the image, the values in the top left corner of the q-matrix are smaller. The values in the lower right of the q-matrix are particularly high, because the average image is already characterized well by the top left portion of the 2-D DCT alone, and we don't want to include a high frequency coefficient unless it is particularly dominant and essential to an accurate representation. You can think of this as attenuating the high frequency portions of the image unless they have a high signal energy (that is, they're significant in the reconstruction and representation of the image).
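As one concrete illustration of how a Q-factor can map to a q-matrix, the sketch below scales the example luminance table from Annex K of the JPEG standard using the convention of the IJG libjpeg library. The text above does not specify a formula, so this mapping is an assumption; other encoders use different tables and scalings. The names `BASE_LUMA` and `scaled_q_matrix` are hypothetical.

```python
# Example luminance quantization table from the JPEG standard (Annex K).
# Note the small values at the top left and large values at the bottom right.
BASE_LUMA = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def scaled_q_matrix(quality):
    """Scale BASE_LUMA for a quality in 1..100 (assumed IJG-style mapping)."""
    quality = max(1, min(100, quality))
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    # Entries are clamped to 1..255 so they fit in 8 bits and never hit zero.
    return [[max(1, min(255, (q * scale + 50) // 100)) for q in row]
            for row in BASE_LUMA]


for quality in (10, 50, 100):
    q = scaled_q_matrix(quality)
    print(quality, q[0][0], q[7][7])
# prints:
# 10 80 255
# 50 16 99
# 100 1 1
```

Under this convention a quality of 100 yields the all-1's matrix mentioned above, while low quality settings inflate every entry and discard most high frequency coefficients.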
