# Still Image and Video Compression with MATLAB


Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose.


This book first describes the methodologies behind still image and video compression in a manner that is easy to comprehend and then describes the most popular standards such as Joint Photographic Experts Group (JPEG), MPEG, and advanced video coding. One of the goals of this book is to present the basic principles behind image and video compression in a clear and concise manner and develop the necessary mathematical equations for a better understanding of the ideas.

### 1.1 WHAT IS SOURCE CODING?

As the proverb "a picture is worth a thousand words" goes, the transmission of these visual media in digital form alone will require far more bandwidth than what is available for the Internet, TV, or wireless networks. When compressed audio/video data is actually transmitted through a transmission channel, extra bits are added to it to counter the effect of noise in the channel so that errors in the received data, if present, could be detected and/or corrected.

### 1.2 WHY IS COMPRESSION NECESSARY?

Each number in the array represents an intensity value at a particular location in the image and is called a picture element, or pixel for short. It is very clear that efficient data compression schemes are required to bring down the huge raw video data rates to manageable values so that practical communications channels may be employed to carry the data to the desired destinations in real time.

**1.3.1 Still Image Compression**

Let us first see the difference between data compression and bandwidth compression.

On the other hand, bandwidth compression refers to the process of reducing the bandwidth of the analog source. If we want to compress bandwidth, we can simply filter the analog signal by a suitable filter with a specified cutoff frequency to limit the bandwidth occupied by the analog signal.


In transform coding, a block of image pixels is linearly transformed into another block of transform coefficients of the same size as the pixel block, with the hope that only a few of the transform coefficients will be significant and the rest may be discarded. It must be pointed out that the amount of quantization applied in Figure 1.7a is not the same as that used for the DCT example and that the two examples are given only to show the differences in the artifacts introduced by the two schemes.
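To make the idea concrete, here is a small illustrative sketch of transform coding on a single 1D block (our own Python example with made-up data, not the book's MATLAB listing): an orthonormal DCT is applied, all but a few coefficients are discarded, and the block is reconstructed with only a small error.

```python
import math

def dct(x):
    # Orthonormal DCT-II of a 1D sequence.
    N = len(x)
    out = []
    for k in range(N):
        s = sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N) for n in range(N))
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        out.append(scale * s)
    return out

def idct(y):
    # Inverse of the orthonormal DCT-II above (DCT-III with matching scaling).
    N = len(y)
    out = []
    for n in range(N):
        s = y[0] * math.sqrt(1.0 / N)
        s += sum(y[k] * math.sqrt(2.0 / N) * math.cos(math.pi * (n + 0.5) * k / N)
                 for k in range(1, N))
        out.append(s)
    return out

# A smooth 8-sample block: most of its energy lands in a few low-order coefficients.
block = [10, 11, 12, 14, 15, 15, 16, 16]
coeffs = dct(block)
kept = [c if i < 3 else 0.0 for i, c in enumerate(coeffs)]  # discard all but 3
approx = idct(kept)
max_err = max(abs(a - b) for a, b in zip(block, approx))
roundtrip_err = max(abs(a - b) for a, b in zip(block, idct(dct(block))))
```

For this smooth block, keeping only 3 of 8 coefficients still reconstructs every sample to well within one gray level, which is exactly why discarding insignificant coefficients compresses well.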

**1.3.2 Video Compression**

It is more advantageous to determine or estimate the relative motions of objects between successive frames, compensate for the motion, and then do the differencing to achieve a much higher compression. The first frame in a scene is referred to as the key frame, and it is compressed by any of the above-mentioned schemes such as the DCT or DWT.
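The benefit of motion compensation over plain frame differencing can be seen in a toy 1D block-matching sketch (our own Python illustration with invented sample values, not the book's code): searching for the shift that minimizes the residual energy drives the residual far below the plain frame difference.

```python
# Toy 1D analogue of block-matching motion estimation: an "object" moves
# right by 2 samples between the previous and current frames.
prev_frame = [0, 0, 3, 7, 9, 7, 3, 0, 0, 0, 0, 0]
curr_frame = [0, 0, 0, 0, 3, 7, 9, 7, 3, 0, 0, 0]

def best_offset(prev, curr, start, size, search):
    # Search offsets in [-search, search] for the minimum sum of squared errors.
    best, best_sse = 0, float("inf")
    for off in range(-search, search + 1):
        if start + off < 0 or start + off + size > len(prev):
            continue
        sse = sum((curr[start + i] - prev[start + off + i]) ** 2
                  for i in range(size))
        if sse < best_sse:
            best, best_sse = off, sse
    return best, best_sse

# Block covering the moving object in the current frame.
offset, residual = best_offset(prev_frame, curr_frame, start=4, size=5, search=3)
# Plain differencing of the same block without motion compensation:
plain_diff = sum((c - p) ** 2 for c, p in zip(curr_frame[4:9], prev_frame[4:9]))
```

The motion-compensated residual is zero here (the object simply translated), while the plain frame difference carries all the object energy; in real coders the residual is small rather than zero, but the same gap is what makes motion compensation pay off.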

**1.3.3 Lossless Compression**

Observe that the most likely symbol a2 has the shortest code length and the least probable symbols a1 and a3 have the longest code lengths. Depending on the number of codewords, the amount of side information about the codebook to be transmitted may be very large and the coding efficiency may be reduced.
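The inverse relation between probability and code length can be demonstrated with a minimal Huffman construction (a Python sketch of our own; the symbol names and probabilities are made up and are not the book's example):

```python
import heapq

def huffman_lengths(probs):
    # probs: dict symbol -> probability; returns dict symbol -> code length.
    # Each heap entry is (probability, tie-breaker, symbols in the subtree).
    heap = [(p, i, [s]) for i, (s, p) in enumerate(sorted(probs.items()))]
    heapq.heapify(heap)
    lengths = {s: 0 for s in probs}
    counter = len(heap)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)   # two least probable subtrees
        p2, _, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:
            lengths[s] += 1                  # each merge adds one bit to these symbols
        heapq.heappush(heap, (p1 + p2, counter, syms1 + syms2))
        counter += 1
    return lengths

lengths = huffman_lengths({"a1": 0.1, "a2": 0.6, "a3": 0.1, "a4": 0.2})
```

With these probabilities the most likely symbol a2 ends up with a 1-bit codeword while the least likely symbols a1 and a3 get 3-bit codewords, matching the general behavior described above.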

### 1.4 VIDEO COMPRESSION STANDARDS

If, for instance, images and video are compressed using a proprietary algorithm, then decompression at the user end is not feasible unless the same proprietary algorithm is used, thereby encouraging monopolization. This, therefore, calls for a standardization of the compression algorithms as well as data transportation mechanisms and protocols so as to guarantee not only interoperability but also competitiveness.

### 1.5 ORGANIZATION OF THE BOOK

also be able to determine the distortions in the reproduced or displayed continuous image as a result of the spatial sampling and quantization used. In the following section, we will develop the necessary mathematical equations to describe the processes of sampling a continuous image and reconstructing it from its samples.

### 2.2 SAMPLING A CONTINUOUS IMAGE

Because the sampled image is the product of the continuous image and the sampling function as in equation (2.4), the 2D Fourier transform of the sampled image is the convolution of the corresponding Fourier transforms. Since the Fourier transform of the sampled image is continuous, it is possible to recover the original continuous image from the sampled image by filtering the sampled image by a suitable linear filter.

**2.2.1 Aliasing Distortion**

It is seen from Figure 2.3b that there is no overlap of the replicas and that the Fourier transform of the sampled image is identical to that of the continuous image in the region $(-f_{xC}, f_{xC}) \times (-f_{yC}, f_{yC})$; therefore, the continuous image can be recovered by filtering the sampled image by an ideal lowpass filter with cutoff frequencies equal to half the sampling frequencies. When the sampling rates are less than the Nyquist rates, the replicas overlap and the portion of the Fourier transform in the region $(-f_{xC}, f_{xC}) \times (-f_{yC}, f_{yC})$ no longer corresponds to that of the continuous image, and therefore, exact recovery is not possible; see Figure 2.3c.
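A 1D numerical illustration of the overlap (aliasing) effect may help (our own Python sketch with arbitrary frequencies, not taken from the book): a sinusoid above half the sampling rate produces exactly the same samples as a lower-frequency alias.

```python
import math

def sample(freq_hz, fs_hz, n_samples):
    # Ideal point sampling of sin(2*pi*f*t) at rate fs.
    return [math.sin(2 * math.pi * freq_hz * n / fs_hz) for n in range(n_samples)]

fs = 10.0                      # sampling rate: 10 samples/s (Nyquist limit 5 Hz)
good = sample(3.0, fs, 20)     # 3 Hz < 5 Hz: representable without aliasing
bad = sample(13.0, fs, 20)     # 13 Hz > 5 Hz: aliases to |13 - 10| = 3 Hz
alias_err = max(abs(g - b) for g, b in zip(good, bad))
```

The two sample sequences agree to machine precision, so after sampling there is no way to tell the 13 Hz signal from its 3 Hz alias; this is the 1D counterpart of the overlapping spectral replicas in Figure 2.3c.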

**2.2.2 Nonideal Sampling**

Since the smearing effect is equivalent to lowpass filtering, the Fourier transform of the sampled image is a bit narrower than that of the input image. In contrast to the sampling with a pulse of finite width, we see that the Fourier transform of the ideally sampled image is the same as that of the input image.

**2.2.3 Nonrectangular Sampling Grids**

Figure 2.5 Effect of nonideal sampling: (a) a 64 × 64 BW image, (b) rectangular sampling image of size 32 × 32 pixels, (c) sampled image, (d) 2D Fourier transform of (a), (e) 2D Fourier transform of (b), (f) 2D Fourier transform of the sampled image in (b), (g) 2D Fourier transform of an ideal impulse, (h) 2D Fourier transform of (a) sampled by impulse, and (i) image in (c) obtained by filtering by an ideal lowpass filter.

For the set of points $(t_1, t_2)$ to be on a hexagonal sampling grid, the transformation takes the form given in equation (2.23).

Example 2.3 Convert a 256 × 256 array of pixels of alternating black and white (BW) dots to a hexagonal sampling grid and display both arrays as images.

### 2.3 IMAGE QUANTIZATION

Because the number of bits (binary digits) for representing each image sample is limited (typically 8 or 10 bits), the analog samples must first be quantized to a finite number of levels and then the particular level must be coded as a binary number. The output values are predetermined corresponding to the range of the input and the number of bits allowed in the quantizer.

The L output levels are ordered as $r_1 < r_2 < \cdots < r_L$ (2.25). The number of bits required to address any one of the output levels is

$$B = \lceil \log_2 L \rceil \ \text{(bits)} \qquad (2.26)$$

where $\lceil x \rceil$ is the nearest integer equal to or larger than x. A design of a quantizer amounts to specifying the decision intervals, the corresponding output levels, and a mapping rule.

**2.3.1 Lloyd–Max Quantizer**

The optimal MSE or Lloyd–Max quantizer design amounts to determining the input decision intervals $\{d_k\}$ and corresponding reconstruction levels $\{r_k\}$ for given L levels such that the MSE is minimized:

$$\mathrm{MSE} = E\left[(x-\hat{x})^2\right] = \int_{d_1}^{d_{L+1}} (x-\hat{x})^2\, p_x(x)\, dx \qquad (2.27)$$

In equation (2.27), E denotes the expectation operator. Therefore, equation (2.27) can be written as

$$\mathrm{MSE} = \sum_{j=1}^{L} \int_{d_j}^{d_{j+1}} (x - r_j)^2\, p_x(x)\, dx \qquad (2.28)$$

Minimization of equation (2.28) is obtained by differentiating the MSE with respect to $d_j$ and $r_j$ and setting the partial derivatives to zero.
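In practice the two optimality conditions (decision points at midpoints of adjacent levels; each level at the centroid of its cell) are often iterated on data. Here is a sketch of that Lloyd iteration on an empirical sample set (our own Python illustration with invented samples, replacing the expectation integrals of equations (2.27)-(2.28) by sample averages):

```python
def lloyd(samples, levels, iters=50):
    lo, hi = min(samples), max(samples)
    # Start from uniformly spaced reconstruction levels.
    r = [lo + (hi - lo) * (j + 0.5) / levels for j in range(levels)]
    for _ in range(iters):
        # Nearest-neighbor condition: decision points are midpoints of levels.
        d = [(r[j] + r[j + 1]) / 2 for j in range(levels - 1)]
        # Centroid condition: each level moves to the mean of its cell.
        cells = [[] for _ in range(levels)]
        for x in samples:
            j = sum(1 for t in d if x > t)   # index of the cell containing x
            cells[j].append(x)
        r = [sum(c) / len(c) if c else r[j] for j, c in enumerate(cells)]
    return r, d

# Three well-separated clusters of samples; a 3-level quantizer should lock on.
samples = [0.1, 0.2, 0.25, 0.3, 2.0, 2.1, 2.2, 5.0, 5.1, 5.3]
r, d = lloyd(samples, levels=3)
```

At convergence each reconstruction level sits at the mean of its cluster and each decision point sits halfway between neighboring levels, which is exactly what setting the partial derivatives of equation (2.28) to zero demands.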

**2.3.2 Uniform Quantizer**

When the pdf of the analog sample is uniform, the decision intervals and output levels of the Lloyd–Max quantizer can be computed analytically as shown below. Using equation (2.34), we can also express $d_j$ in terms of $d_{j+1}$ and $d_{j-1}$ as

$$d_j = \frac{d_{j+1} + d_{j-1}}{2} \qquad (2.35)$$

The quantization step size of the uniform quantizer is related to the number of levels L or the number of bits B, as given in equation (2.36).

3. Quantize a sample x according to $Q(x) = r_j$ if $d_j \le x < d_{j+1}$.

**2.3.3 Quantizer Performance**

Therefore, the MSE $\sigma_q^2$ for the uniform quantizer can be found from

$$\sigma_q^2 = \frac{1}{\Delta}\int_{-\Delta/2}^{\Delta/2} e^2\, de = \frac{\Delta^2}{12} \qquad (2.38)$$

We can express the performance of a uniform quantizer in terms of the achievable signal-to-noise ratio (SNR). Finally, a plot of the input decision intervals versus the reconstruction levels for the uniform quantizer is shown in Figure 2.8g, which is of the staircase type, as expected.
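The $\Delta^2/12$ result is easy to verify numerically. Below is a minimal midrise uniform quantizer (a Python sketch of our own, not the book's MATLAB code) whose mean squared error over a dense, uniformly spaced set of inputs matches the theoretical value of equation (2.38):

```python
def uniform_quantize(x, lo, hi, bits):
    L = 2 ** bits                  # number of levels, L = 2^B
    step = (hi - lo) / L           # quantization step size, Delta
    j = min(int((x - lo) / step), L - 1)   # index of the decision interval
    return lo + (j + 0.5) * step   # reconstruction level: midpoint of the cell

lo, hi, bits = 0.0, 1.0, 4
step = (hi - lo) / 2 ** bits
# Average the squared quantization error over a dense uniform grid of inputs,
# which emulates a uniformly distributed input.
N = 100000
mse = sum((x - uniform_quantize(x, lo, hi, bits)) ** 2
          for x in (i / N for i in range(N))) / N
theory = step ** 2 / 12
```

For 4 bits over [0, 1], $\Delta = 1/16$ and the empirical MSE agrees with $\Delta^2/12 \approx 3.255 \times 10^{-4}$ to well under one percent.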

**2.3.4 PDF Optimized Quantizer**

In the following example, we will take up the design of a pdf-optimized quantizer using a single image and compare the results with those of the uniform quantizer of Example 2.5. The number of pixels belonging to the different output levels is plotted against the output levels and is shown in Figure 2.9d (top figure), while the bottom of Figure 2.9d shows the histogram of the input image for the same number of bins (32 in the example).

**2.3.5 Dithering**

One way to mitigate contouring is to add a small amount of dither or pseudorandom noise to the input image before quantization and then quantize the image [6, 7]. As an example, the Barbara image is requantized to 4 bpp using a uniform dithered quantizer and is shown in Figure 2.11a.

### 2.4 COLOR IMAGE REPRESENTATION

However, an image compression system works with images captured directly from real-world scenes, and for the decompressed images viewed on a CRT by a human observer to be nearly the same as the captured images, the captured images must be corrected using the inverse of γ. The luma component, which is the BW equivalent, is a weighted sum of the R′, G′, and B′ components, and the chroma corresponds to the color differences with luma being absent.

**2.4.3 Video Format**

In order to accommodate the return of the electron beam to the left of the screen as well as to the top of the screen, horizontal and vertical synchronizing pulses are introduced into the time-domain video signal. Another interesting aspect of using a frequency domain representation of an image stems from the fact that it is much more efficient to decorrelate an image in the frequency domain than in the spatial domain.

### 3.2 UNITARY TRANSFORMS

Example 3.1 Let A be given by

$$A = \frac{1}{\sqrt{1+a^2}}\begin{bmatrix} 1 & ja \\ ja & 1 \end{bmatrix}, \quad 0 < a < 1$$

Then

$$A A^{*T} = A^{*T} A = \frac{1}{1+a^2}\begin{bmatrix} a^2+1 & 0 \\ 0 & a^2+1 \end{bmatrix} = I$$

so A is unitary, whereas

$$A A^{T} = \frac{1}{1+a^2}\begin{bmatrix} 1-a^2 & 2ja \\ 2ja & 1-a^2 \end{bmatrix} \neq I$$

Equation (3.2) can also be written as

$$x = \sum_{k=0}^{N-1} y(k)\, a_k^{*} \qquad (3.3)$$

The vectors $a_k^{*} = [a^{*}(k,n),\ 0 \le n \le N-1]^{T}$, which are the columns of $A^{*T}$, are called the basis vectors. Therefore, using the fact that $W_N^{k+N/2} = -W_N^{k}$, the N-point DFT is given in terms of two N/2-point DFTs as

$$X(k) = X_1(k) + W_N^{k}\, X_2(k) \qquad (3.13a)$$

$$X\!\left(k + \frac{N}{2}\right) = X_1(k) - W_N^{k}\, X_2(k) \qquad (3.13b)$$

where we have used the fact that

$$X_1(k) = \sum_{n=0}^{(N/2)-1} x[2n]\, W_{N/2}^{nk} \qquad (3.14a)$$

$$X_2(k) = \sum_{n=0}^{(N/2)-1} x[2n+1]\, W_{N/2}^{nk} \qquad (3.14b)$$

Next, the two N/2-point sequences can each be divided into even- and odd-indexed sequences and the corresponding DFTs can be computed in terms of N/4-point sequences.
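The even/odd split of equations (3.13)-(3.14), applied recursively, is the radix-2 decimation-in-time FFT. A compact Python sketch (our own illustration; the book's listings are in MATLAB) with a direct DFT as a cross-check:

```python
import cmath

def fft(x):
    # Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two.
    N = len(x)
    if N == 1:
        return list(x)
    X1 = fft(x[0::2])             # DFT of the even-indexed samples, eq. (3.14a)
    X2 = fft(x[1::2])             # DFT of the odd-indexed samples, eq. (3.14b)
    X = [0j] * N
    for k in range(N // 2):
        w = cmath.exp(-2j * cmath.pi * k / N)   # twiddle factor W_N^k
        X[k] = X1[k] + w * X2[k]                # eq. (3.13a)
        X[k + N // 2] = X1[k] - w * X2[k]       # eq. (3.13b)
    return X

def dft(x):
    # Direct O(N^2) DFT, used here only to verify the FFT.
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

x = [1.0, 2.0, 0.0, -1.0, 3.0, 1.0, 0.0, 2.0]
err = max(abs(a - b) for a, b in zip(fft(x), dft(x)))
```

Each recursion level does N/2 complex multiplies, giving the familiar $O(N \log_2 N)$ cost instead of the $O(N^2)$ of the direct DFT.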


To prove this statement, substitute equation (3.37) in (3.35) and write the transformed image Y as

$$Y(k,l) = \sum_{m=0}^{N-1}\sum_{n=0}^{N-1} x[m,n]\, A_1(k,m)\, A_2(l,n) = \sum_{m=0}^{N-1} A_1(k,m)\left[\sum_{n=0}^{N-1} x[m,n]\, A_2(l,n)\right] \qquad (3.38)$$

In equation (3.38), the quantity within the bracket is the transform of the image along the columns; that is, the transform of each row is computed. The transform coefficients are found from $Y = A X A^{T}$. Next, we compute the four basis images as the outer products of the basis vectors:

$$a_{00} = \frac{1}{2}\begin{bmatrix}1&1\\1&1\end{bmatrix},\quad a_{01} = \frac{1}{2}\begin{bmatrix}1&-1\\1&-1\end{bmatrix},\quad a_{10} = \frac{1}{2}\begin{bmatrix}1&1\\-1&-1\end{bmatrix},\quad a_{11} = \frac{1}{2}\begin{bmatrix}1&-1\\-1&1\end{bmatrix}$$

Next, we calculate the weighted sum of the basis images to obtain the input image.

### 3.3 KARHUNEN–LOÈVE TRANSFORM

Our objective in compressing an image is to be able to achieve as high a compression as possible with the lowest distortion. Denoting the autocorrelation matrix of y by $R_{YY}$, we can write

$$R_{YY} = E\left[y y^{T}\right] = E\left[A x x^{T} A^{T}\right] = A\, E\left[x x^{T}\right] A^{T} = A R_{XX} A^{T} \qquad (3.45)$$

In equation (3.45), the expectation operation is taken with respect to the random vector x and $R_{XX}$ is the autocorrelation matrix of x.

The matrix that diagonalizes $R_{YY}$ is shown to be the matrix whose column vectors are the eigenvectors of $R_{XX}$.

For an N × N block of pixels, the KLT can be obtained as

$$Y = L_R^{T}\, X\, L_C$$

The basis images can now be obtained from $L_R$ and $L_C$ by multiplying a column of $L_R$ by the transpose of a column of $L_C$, that is,

$$b_{k,l} = R_k\, C_l^{T}, \quad 0 \le k, l \le N-1$$

Following the MATLAB code listed below for the example, the 8 × 8 basis images of the KLT for the Barbara image are shown in Figure 3.6a. To reiterate our objective, the idea of using an orthogonal transform is to decorrelate the image in the transform domain so that we can individually quantize the transform coefficients to different levels to achieve as high a compression as possible.
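The decorrelating property can be checked on a tiny 2 × 2 case (a Python sketch with our own toy numbers, not the book's Barbara example): the transform built from the eigenvectors of the autocorrelation matrix makes $R_{YY} = A R_{XX} A^T$ diagonal.

```python
import math

# Autocorrelation of two strongly correlated neighboring pixels (toy numbers).
Rxx = [[1.0, 0.9],
       [0.9, 1.0]]

# Eigenvectors of a symmetric 2x2 matrix [[a, b], [b, c]] via a rotation angle.
a, b, c = Rxx[0][0], Rxx[0][1], Rxx[1][1]
theta = 0.5 * math.atan2(2 * b, a - c)
A = [[math.cos(theta), math.sin(theta)],      # rows of A are the eigenvectors
     [-math.sin(theta), math.cos(theta)]]

def mat_mul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

At = [[A[j][i] for j in range(2)] for i in range(2)]
Ryy = mat_mul(mat_mul(A, Rxx), At)            # R_YY = A R_XX A^T, eq. (3.45)
```

Here $R_{YY}$ comes out as diag(1.9, 0.1): the off-diagonal correlation vanishes and almost all the variance is packed into the first transform coefficient, which is precisely what makes the KLT the optimal decorrelating transform.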

**3.4.1 Energy Is Conserved in the Transform Domain**

Then,

$$E_y = \|y\|^2 = y^{*T} y = x^{*T} A^{*T} A x = x^{*T} I x = x^{*T} x = \|x\|^2 = E_x$$

That is, the unitary transform preserves the energy of the signal in the transform domain. The only effect the unitary transform has on x is to rotate the axes to a new set of orthogonal axes in which the signal is mapped with reduced correlation.
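The rotation picture can be verified directly with a real orthogonal (hence unitary) 2 × 2 rotation matrix (our own minimal Python check, with an arbitrary angle and input vector):

```python
import math

# A rotation matrix is orthogonal: A^T A = I, so ||A x||^2 = ||x||^2.
theta = 0.7                     # arbitrary rotation angle
A = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]
x = [3.0, -4.0]
y = [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]
energy_in = sum(v * v for v in x)    # ||x||^2 = 3^2 + 4^2 = 25
energy_out = sum(v * v for v in y)   # ||y||^2, equal up to rounding
```

The energies match to machine precision for any angle, illustrating that a unitary transform only rotates the coordinate axes without amplifying or attenuating the signal.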

**3.4.2 Energy in the Transform Domain Is Compacted**

$$\sum_{k=1}^{N} \sigma_y^2(k) = \sum_{n=1}^{N} \sigma_x^2(n) \qquad (3.51)$$

Thus, we find that the sum of the variances of the transform coefficients equals the sum of the variances of the input sequence. For the KLT, we compute the eigenvectors, reorder them in descending order of their eigenvalues, and then obtain the autocorrelation matrix of the transformed image from $y\, R_{XX}\, y^{T}$, where $R_{XX}$ is the covariance matrix of the input image and $y$ is the KLT matrix.

### 4.3 WAVELET SERIES

More specifically, if the scale parameter is allowed to take on values that are integer powers of two (called binary scaling) and the shift parameter is allowed to take on integer values (known as dyadic translation), then the wavelet series of f(t) is expanded in terms of (1) an infinite sum of scaling functions and (2) an infinite sum of wavelets. The functions $\phi_{j,k}(t)$ are called scaling functions and are obtained by binary scaling and shifting of a prototype function as given by equation (4.8):

$$\phi_{j,k}(t) = 2^{j/2}\,\phi\!\left(2^{j} t - k\right), \quad -\infty < j, k < \infty;\ j, k \in \mathbb{Z} \qquad (4.8)$$

Also, the first summation in equation (4.7) represents an approximation of f(t) similar to the average value given in the Fourier series expansion of a periodic (or periodically extended) signal.
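Equation (4.8) is easy to make concrete with the Haar prototype, $\phi(t) = 1$ on $[0, 1)$ and 0 elsewhere (a Python sketch of our own; the Haar choice and the numerical inner product are illustrative, not the book's example). At a fixed scale j, the translates tile the interval and are orthonormal:

```python
def phi(t):
    # Haar prototype scaling function: 1 on [0, 1), 0 elsewhere.
    return 1.0 if 0.0 <= t < 1.0 else 0.0

def phi_jk(t, j, k):
    # Binary scaling and dyadic translation, as in equation (4.8).
    return (2.0 ** (j / 2.0)) * phi((2.0 ** j) * t - k)

def inner(f, g, a=0.0, b=1.0, n=4096):
    # Midpoint-rule approximation of the inner product integral on [a, b].
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) * g(a + (i + 0.5) * h) for i in range(n)) * h

# At scale j = 2 the translates phi_{2,k}, k = 0..3, each occupy a quarter of
# [0, 1); the 2^(j/2) factor makes every one of them unit norm.
norm = inner(lambda t: phi_jk(t, 2, 1), lambda t: phi_jk(t, 2, 1))
cross = inner(lambda t: phi_jk(t, 2, 1), lambda t: phi_jk(t, 2, 2))
```

The self inner product evaluates to 1 and the cross inner product to 0, showing why the $2^{j/2}$ factor appears in equation (4.8): it normalizes the scaled and shifted copies so they form an orthonormal family at each scale.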