PowerPoint 簡報

Report
Basic Image Compression
Concepts
Presenter:
Guan-Chen Pan
Research Advisor:
Jian-Jiun Ding , Ph. D.
Assistant professor
Digital Image and Signal Processing Lab
Graduate Institute of Communication Engineering
National Taiwan University
1
Outlines
Introductions
 Basic concept of image compression
 Proposed method for arbitrary-shape
image segment compression
 Improvement of the boundary region by
morphology
 JPEG2000
 Triangular and trapezoid regions and
modified JPEG image compression

2
Introduction

G
R
Lossless or lossy(widely used)
B
YCbCr Color
Transform
Image
Quantizer
Quantization
Table
Chrominance
Downsampling
(4:2:2 or 4:2:0)
Zigzag &
Run Length
Coding
Huffman
Encoding
Differential
Coding
Huffman
Encoding
8×8
FDCT
Bit-stream
3
YCbCr
Y   0.299 0.587 0.114   R  0 
Cb    0.169 0.334 0.500  G   128
  
   
Cr   0.500 0.419 0.081  B  128
Y:the luminance of the image which
represents the brightness
 Cb:the chrominance of the image which
represents the difference between
the gray and blue
 Cr:the chrominance of the image which
represents the difference between
the gray and red

4
Chrominance Subsampling

The name of the format is not always
related to the subsampling ratio.
(a) 4 : 4 : 4
W
H
Y
(b) 4 : 2 : 2
W
H
Y
(c) 4 : 2 : 0
W
H
W/2
W
W/2
H/2
H
Cb
H
W
Cr
W/2
W/2
H
Cb
Cb
H/2
H
Y
Cr
(d) 4 : 1 : 1
W
H
Y
W/4
H C
b
W/4
Cr
H Cr
5

Compression ratio (CR):
CR =
n1
,
n2
where n1 is the data quantity of original image, and n2 is the
compressed data quantity

Root mean square error (RMSE):
R =
−1
=0
−1
=0 [
,  − ′(, )]2

,
where H and W are the height and the width of the images
respectively
6
 Peak-to-signal
ratio (PSNR):
255
P = 20 log10


We must keep in mind that the two
measurements are not related to the real
quality of the image.
7
Reduce the Correlation between
Pixels
 Transform
1.
2.
3.
4.
coding
Coordinate rotation
Karhunen-Loeve transform
Discrete cosine transform
Discrete wavelet transform
 Predictive
coding
8
Coordinate rotation

Draw a line that has the mean square
error with all data for  = 
Weight
Original sequence X=(x0,x1)
Transform
200
180
Height
Weight
65
170
75
188
60
150
70
170
56
130
80
80
203
60
68
160
40
50
110
40
80
160
140
x1
120
100
  68
20
0
0
10
20
30
40
50
x0
60
70
80
90
Height
9
New Height y0 New Width y1

181.971
3.416
203.406
0.887
161.554
0.560
183.844
-1.220
141.512
-3.223
206.133
-2.999
173.823
-3.111
120.721
-5.152
89.159
-7.112
do the inverse transform to get the data
and reduce the correlation
Discard the value of weight
Height
Inverse Transform
Weight
Height
Weight
181.97068384838812
0
68.16741797800859
168.72028006870275
202.40605916474945
0
75.82264431044632
187.66763012404562
161.55397378997284
0
60.51918377426525
149.79023613916877
183.8437168154677
0
68.86906847716196
170.45692599485025
141.51187032497342
0
53.01127967035257
131.20752139486424
206.13345984096256
0
77.21895318005868
191.12361585053173
173.822665082968
0
65.11511642520563
161.165568622698
120.72055367314222
0
45.222715366778566
111.93014828010075
89.15897210197947
0
33.399538811586865
82.66675942272599
XA Y
-1
10
Karhunen-Loeve transform(KLT)
The KLT is the optimal transform coding.
 We represent the autocorrelation matrix
of the output vector = as 
 = E[  − E   − E  T ]

we can substitute  with , and assume E  =0
 = E  − E   − E 
⇒  = E[   T ]
⇒  =  T
T
11
0 ⋯
0
⋱
⋮
  = ⋮
0 ⋯ −1
 We can achieve optimal decorrelation
through diagonalization, which means that
 is composed of the eigenvectors of the
autocorrelation matrix.
 But it takes large amount of computation
to find eigenvectors and all the
eigenvectors are need to be stored.
12
Discrete cosine transform
The DCT is an approximation of the KLT
and more widely used in image and video
compression.
 The DCT can concentrate more energy
in the low frequency bands than the DFT.

13
Discrete wavelet transform
Wavelet transform is very similar to the
conventional Fourier transform, but it is
based on small waves, called wavelet,
which is composed of time varying and
limited duration waves.
 We use 2-D discrete wavelet transform in
image compression.

14
Rows
Columns
h (n)
↓2
WD ( j , m, n)
h ( m)
↓2
WV ( j , m, n)
h ( m)
↓2
WH ( j , m, n)
h ( m)
↓2
W ( j , m, n)
↓2
W ( j  1, m, n)
h ( n)
h ( m)
↓2
LL2
HL2
H

W ( j , m, n) W ( j , m, n)
HL1
LH2
HH2
W ( j  1, m, n)
WV ( j , m, n) WD ( j , m, n)

LH1
HH1
We can see that  (, ,  is very
similar to the original image , so we can
use it to achieve image compression.
15
Predictive Coding
Predictive coding means that we transmit
only the difference between the current
pixel and the previous pixel.
 The difference may be close to zero.
 However, the predictive coding algorithm
is more widely used in video.
 EX. Delta modulation (DM), Adaptive DM.
DPCM ,Adaptive DPCM (ADPCM)

16
Quantization

Each DCT coefficient F(u,v) is divided by
the corresponding quantization matrix
Q(u,v) and rounded to the nearest integer.
(, )
 ,  = (
)
(, )
17

Luminance quantization matrix
16 11 10 16 24 40 51 61
12 12
14 13
14 17
14
16
22
19
24
29
26
40
51
58
57
87
60 55
69 56
80 62
18 22
24 35
49 64
37
55
78
56 68 109 103 77
64 81 104 113 92
87 103 121 120 101
72 92 95 98 112 100 103 99

Chrominance quantization matrix

Removes the high frequencies
18
Entropy Coding Algorithms
Huffman Coding
1.
◦
◦
2.
3.
Difference Coding (DC)
Zero Run Length Coding (AC)
Arithmetic Coding
Golomb Coding
19
Huffman Coding

Huffman coding is the most popular
technique for removing coding
redundancy.
◦ Unique prefix property
◦ Instantaneous decoding property
◦ Optimality

JPEG(fixed, not optimal)
20
Symbol
a2
a6
a1
a4
a3
a5
Probability
0.4
0.3
0.1
0.1
0.06
0.04
1
0.4
0.3
0.1
0.1
0.1
2
0.4
0.3
0.2
0.1
3
0.4
0.3
0.3
4
0.6
0.4
Code
1
00
011
0100
01010
01011
21
Difference Coding
For DC coefficients
 The DC coefficients is very close to its
neighbors and usually have much larger
value than AC coefficients.

Diffi = DCi -DCi−1
22
Zero Run Length Coding
Encode each value which is not 0, than
add the number of consecutive zeroes in
front of it
 EOB (End of Block) = (0,0)
 Only 4-bit value
 [57,45,0,0,0,0,23,0,-30,-16,0,……,0]
⇒[(0,57)(0,45)(4,23)(1,-30)(0,16)EOB]
 “Eighteen zeroes, 3” ⇒(15,0) ; (2,3)

where (15,0) is 16 consecutive zeroes
23
Category
Values
Bits for the value
1
-1,1
0,1
2
-3,-2,2,3
00,01,10,11
3
-7,-6,-5,-4,4,5,6,7
000,001,010,011,100,101,110,111
4
-15,...,-8,8,...,15
0000,...,0111,1000,...,1111
5
-31,...,-16,16,...31
00000,...,01111,10000,...,11111
6
-63,...,-32,32,...63
000000,...,011111,100000,...,111111
7
-127,...,-64,64,...,127
0000000,...,0111111,1000000,...,1111111
8
-255,..,-128,128,..,255
...
9
-511,..,-256,256,..,511
...
10
-1023,..,-512,512,..,1023
...
11
-2047,...,-1024,1024,...,2047
...
run/category
code length
code word
0/0 (EOB)
4
1010
15/0 (ZRL)
11
11111111001
0/1
2
00
7
1111000
0/10
16
1111111110000011
1/1
4
1100
1/2
5
11011
1/10
16
1111111110001000
2/1
5
11100
16
1111111110011000
16
1111111111111110
...
0/6
...
...
...
4/5
...
15/10
24
Arithmetic Coding
Arithmetic coding is another coding
method widely used in image and video
compression, and its performance is
better than Huffman coding.
 We treat the whole input data as a single
symbol and find the corresponding
codeword for it.
 Huffman, probability very close to 1.0,
1
log 2


25
Arithmetic Coding Algorithm
Input symbol is S
Previouslow is the lower bound for the old interval
Previoushigh is the upper bound for the old interval
Range = Previoushigh - Previouslow
Let
Previouslow= 0, Previoushigh = 1
Range = Previoushigh – Previouslow =1
WHILE (input symbol != EOF)
get input symbol S
Range = Previoushigh - Previouslow
New Previouslow = Previouslow + Range × intervallow of S
New Previoushigh = Previouslow + Range × intervalhigh of S
END
26

1.
2.
3.
4.
5.
First l
 = 0
 = 1
=1
 = 0 + 1 ×
0.05 = 0.05
 = 0 + 1 ×
0.25 = 0.25
k
0.05
[0.00,0.05)
l
0.2
[0.05,0.25)
u
0.1
[0.25,0.35)
w
e
0.05
0.3
[0.35,0.40)
[0.40,0.70)
r
0.2
[0.70,0.90)
?
0.1
[0.90,1.00)
0.0713348389
=2-4+2-7+2-10+2-15+2-16
l
l
1
0.25
?
u
u
r
e
?
0.10
0.074
0.0714
0.07136
0.071336
0.0713360
?
?
?
?
?
?
?
r
r
r
r
r
r
r
r
e
e
e
e
e
e
e
e
w
w
w
w
w
w
w
w
u
u
u
u
u
u
u
u
l
l
l
l
l
l
l
l
k
k
k
k
k
k
k
k
0
0.05
0.06
0.070
0.0710
0.07128
0.07132
0.0713336
1.00
0.90
0.70
2.
 = 0.25
0.25
3.
 = 0.2
 = 0.05 +
0.2 × 0.05 = 0.06
 = 0.05 +
0.2 × 0.25 = 0.1
5.
Sub-interval
Codeword : 0001001001000011
0.40
4.
Probability
Input String : l l u u r e ?
Second l
1.
 = 0.05

Symbol
0.35
0.05
0
27

Symbol
Probability
Sub-interval
k
0.05
[0.00,0.05)
l
0.2
[0.05,0.25)
u
0.1
[0.20,0.35)
w
0.05
[0.35,0.40)
e
0.3
[0.40,0.70)
r
0.2
[0.70,0.90)
?
0.1
[0.90,1.00)

0.071334 ⇒ L
For interval 0.05~0.25
Symbol
Probability
Sub-interval
k
0.05
[0.05,0.06)
l
0.2
[0.06,0.1)
u
0.1
[0.1,0.12)
w
0.05
[0.12,0.13)
e
0.3
[0.13,0.19)
r
0.2
[0.19,0.23)
?
0.1
[0.23,0.25)

0.071334 ⇒ L
28
Golomb Coding
Golomb coding is a special case of the
Huffman coding.
 Optimal for the data with a geometric
distribution.
 Prob y = a = (1 − p)pa
 No table

29
1.
First, determine m from p ,  =
log(+1)
−
log()
a = q×m + r
3. Convert q into the prefix. The prefix is
composed of q “1” bits followed by a “0” bit.
4. Convert r into the suffix using the binary code.
Threshold parameter (m) = 2^ log 2   m.
2.
◦ If r < (m), the length of the suffix is log 2  bits.
◦ If r ≥ (m), we update r into r +(m) and encode it
into a log 2  -length suffix.
30

Example
◦ p = 0.93,
m = 10,
a = 19
 q = 1, r = 9
 Prefix = “10”
 Threshold parameter (m) = 2^ log 2   m = 6
 r >threshold
⇒ r = r + threshold = 9 + 6 = 15
⇒ encode 15 into a log 2  -length suffix,
log 2  =4
⇒ Suffix = “1111”
 Code = “101111”
31

Decode “101111”
Encoding of quotient part
q
output bits
0
0
1
10
2
110
3
1110
4
11110
5
111110
6
1111110
:
:
N
<N repetitions of 1>0

Encoding of remainder part
r
offset
binary
0
0
0000
1
1
0001
2
2
0010
3
3
0011
4
4
0100
5
5
0101
6
12
1100
7
13
1101
8
14
1110
9
15
1111
output bits
000
001
010
011
100
101
1100
1101
1110
1111
q = 1, r = 9
⇒ a = 10*1+9 = 19
32
However, Golomb coding can just achieve
optimal coding efficiency when the data is
geometrically distributed.
 To solve this problem, there is an Adaptive
Golomb Code.
 Prob y = a = (1 − p)pa
⇒ Prob y = a = (1 − p(x))p(x)a

Huffman
Without
codeword
table
NO
Flexibility
and
adaptation
GOOD
Golomb
YES
MIDDLE
Adaptive
Golomb
YES
GOOD
33
Proposed Method for ArbitraryShape Image Segment Compression

An arbitrary-shape image segment f and
its shape matrix.
75
96
0
0
0
0
1
1
0
0
105 98 99 101 73
85 66
60
1
1
1
1
1
1
1
1
100 97
89 94
87 64
55
0
1
1
1
1
1
1
1
84
94 90
81 71
66
0
0
1
1
1
1
1
1
93
86 94
81 70
0
0
1
1
1
1
1
0
86 86
81 72
0
0
0
1
1
1
1
0
98 97
78
0
0
0
1
1
1
0
0
0
0
0
1
1
0
0
0
105 104
34

Standard 8x8 DCT bases with the shape
of f
0
1
2
3
4
5
6
7
0
1
2
3
4
5
6
7
35

The 37 arbitrary-shape orthonormal
DCT bases by Gram-Schmidt process
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
W 1 H 1
F (k )   f ( x, y) 'x, y (k ) for k  1,2,..., M
x 0 y 0
36
Quantization
 Q(k )  Qa k  Qc ,
for k  1,2,..., M
 F (k ) 
 Fq (k )  Round 
 , where k  1,2,..., M
 Q(k ) 
45
40
35
30
25
20
15
10
5
0
100
200
300
400
500
600
37
Improvement of the Boundary
Region by Morphology
38
JPEG2000
JPEG 2000 is a new standard and it can
achieve better performance in image
compression.
 Advantages

◦ Efficient lossy and lossless compression
◦ Superior image quality
◦ Additional features such as spatial scalability
and region of interest.

Complexity
39
JPEG 2000 encoder

Rate
Control
Original
Image
Forward
Component
Transform
Forward
2D DWT
Tier-1
Encoder
Quantization


JPEG 2000 decoder
Coded
Image
Tier-2
Decoder
Tier-1
Decoder
Tier-2
Encoder
Coded
Image
Embedded Block Coding with Optimized
Truncation(EBCOT) : Tier-1+Tier-2
Dequantization
Inverse
2D DWT
Inverse
Component
Transform
Reconstructed
Image
40
41
Irreversible component transform
(ICT)

Irreversible and real-to-real
0.587
0.114  U 0  x, y  
V0  x, y    0.299

 
 U x, y 
V
x
,
y

0.5

0.41869

0.08131



 1
 
 1
V2  x, y    0.16875 0.33126
0.5  U 2  x, y  
0 , 1 and 2 are just like ,  and
 ,respectively.
 0 , 1 and 2 are just like ,  and
 ,respectively. Y   0.299 0.587 0.114   R 

0 
Cb    0.169 0.334 0.500  G   128
  
   
Cr   0.500 0.419 0.081  B  128
42
Reversible component transform
(RCT)

Reversible and integer-to-integer
U 0  x, y   2U1  x, y   U 2  x, y  
V0  x, y   

4


V1  x, y   U2  x, y  U1  x, y 
V2  x, y   U0  x, y  U1  x, y 
43
Tiling
DWT in each tile
Image
Component
44
Rows
Columns
h (n)

↓2
WD ( j , m, n)
h ( m)
↓2
WV ( j , m, n)
h ( m)
↓2
WH ( j , m, n)
h ( m)
↓2
W ( j , m, n)
↓2
W ( j  1, m, n)
h ( n)
h ( m)
↓2
Irreversible , Daubechies 9/7 filter
n
0
±
1
±
2
±
3
±
4
Analysis Filter Coefficients
Lowpass Filter
Highpass Filter
0.602949018236
1.115087052456
Synthesis Filter Coefficients
Lowpass Filter
Highpass Filter
1.115087052456 0.6029490182363
0.266864118442
-0.059127176311
0.591271763114
-0.2668641184428
-0.078223266528
-0.057543526228
-0.057543526228
-0.0782232665289
-0.0168641184428
0.091271763114
-0.0912717631142
0.0168641184428
0.026748757410
0.0267487574108
45
46
Tier-1 Encoder
Quantized
DWT
coefficients
Bit-plane
Conversion
Fractional
Bit-plane
Coding
Context
Decision
Arithmetic
Coding
Tier-1 Encoder

Each Fractional Bit-plane coding will
generate the Context (CX) and the Decision
(D), which are used for arithmetic coding.
◦
◦
◦
◦
zero coding
sign coding
magnitude refinement coding
run length coding
47
Bit-plane Conversion
Converts the quantized wavelet
coefficients into several bit-planes
 First bit-plane is the sign plane
 The other planes are the magnitude plane,
from MSB to LSB

N
M
0
Sign
MSB
MSB
coding
order
Insignificant
0
Significant
First 1 appear
1
The bit to be coded
0
LSB
0
1
1
LSB
48
17
22
33
48
64
80
96
112
22
28
38
52
67
81
96
112
33
38
48
62
75
86
100
116
48
52
62
70
83
96
110
125
64
67
75
83
96
108
118
132
80
81
86
96
108
117
128
142
96
96
100
110
118
128
140
150
112
112
116
125
132
142
150
160
17 = 000100012
 160 = 101000002

49
Stripe and Scan Order
50
Zero Coding
d v d
h D h
d
v
d
D : current encode data, binary : 0 or 1
 h :0~2 v :0~2
d :0~4

51
Sign Coding

v
h D h
v
h, v: neighborhood sign
status
◦ -1: one or both negative
◦ 0: both insignificant or
both significant but
opposite sign
◦ -1: one or both positive

D :  ⊗ ,
⊗ = XOR
52
Magnitude Refinement Coding

σ′[x,y] is initialized to 0, and it will
become 1 after the first time of the
magnitude refinement coding is met at
[x,y]
53
Run-Length Coding
For four zeros : (CX,D) is (0,0)
 Else is (0,1), and use 2 uniform(CX=18)
to record the 1’s position

◦ (0110)
◦ The first nonzero position is (01)2
⇒(0,1), (18,0), (18,1)
54
D
(0,1)
CX
Arithmetic
encoder
Compressed
data
(total 19)
55
Why Called Fractional?
56
Tier-2 Encoder

Rate/Distortion optimized truncation
57
Triangular and Trapezoid Regions and
Modified JPEG Image Compression

Divide an image into 3 parts:
1. Lower frequency regions
2. Traditional image blocks and
3. The arbitrarily-shaped image blocks
Trapezoid and triangular
regions
8X8 SADCT image blocks
Traditional 8X8 image
blocks
58
1
1
1
1
1
1
1
1
1
0

0
1
1
1
1
1
1
1
1
1

0
1
1
1
1
1
1
1
1
0

0
1
1
1
1
1
1
1
0
0

0
0
1
1
0
1
1
1
1
0

0
0
1
0
0
1
1
1
0
0

0
0
0
0
0
0
1
1
0
0

0
0
0
0
0
0
1
1
0
0

Zone 1
Zone 2-1
1 sections
1 sections
1 sections
1 sections
2 sections
2 sections
1 sections
1 sections

Zone 1

Zone 2

Zone 3
1 zone
Zone 2-2
Zone 3
2 zones
1 zone
59
α-distance

α-distance < threshold
60

Corner too close

Trapezoid inside the zone
61
(M-1)th row
(M-2)th row
.
.
.
.
.
.
N = 10
1st row
0th row

N = K(m) + K(M-1-m)
62
1.
(a)
m = M-1
m = M-2
.
.
.
.
.
.
m=1
m=0
n=0 1 2
(b)
Region A
Rotation by 180 ∘
Construct the
rectangular region and
obtain the
orthonormal DCT
basis Cp,q [m, n]
Select the DCT basis
Cp,q [m, n] that
satisfies p+q=even
3. Bp,q [m, n]=
2Cp,q [m, n]
2.
Region B
Region A
Region B
Rectangular Region
63
Reference:
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
J.D Huang "Image Compression by Segmentation and Boundary Description, " 2008.
G. Roberts, "Machine Perception of Three-Dimensional Solids," in Optical and ElectroOptical Information Processing, J. T. T. e. al., Ed. Cambridge, MA: MIT Press, 1965, pp. 159197.
J. Canny, "A Computational Approach to Edge Detection," IEEE Trans. Pattern Analysis and
Machine Intelligence, vol. 8, pp. 679-698, Nov. 1986.
D. Comaniciu and P. Meer, "Mean Shift: A Robust Approach toward Feature Space
Analysis, " IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, pp. 603-619, 2002.
J.J Ding, P.Y Lin, S.C Pei, and Y.H Wang,"The Two-Dimensional Orthogonal DCT
Expansion in Triangular and Trapezoid Regions and Modified JPEG Image Compression,
",VCIP2010
J.J Ding, S.C Pei, W.Y Wei, H.H Chen, and T.H Lee, "Adaptive Golomb Code for Joint
Geometrically Distributed Data and Its Application in Image Coding", APSIPA 2010
W.Y Wei, "Image Compression", available in http://disp.ee.ntu.edu.tw/tutorial.php
K. R. Rao and P.Yip, Discrete Cosine Transform, Algorithms, Advantage, Applications, New York:
Academic, 1990.
S.S. Agaian, Hadamard Matrices and Their Applications, New York, Springer-Verlag, 1985.
H. F. Harmuth, Transmission of information by orthogonal functions, Springer, New York,
1970.
64
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
21.
R. Koenen, Editor, “Overview of the MPEG-4 Standard,” ISO/IEC JTC/SC29/WG21,
MPEG-99-N2925, March 1999, Seoul, South Korea.
T. Sikora, “MPEG-4 very low bit rate video,” IEEE International Symposium on Circuits and
Systems, ISCAS ’97, vol. 2, pp. 1440-1443, 1997.
T. Sikora and B. Makai, “Shape-adaptive DCT for generic coding of video,” IEEE Trans.
Circuits Syst.Video Technol., vol. 5, pp. 59-62, Feb. 1995.
W.K. Ng and Z. Lin, “A New Shape-Adaptive DCT for Coding of Arbitrarily Shaped
Image Segments,” ICASSP, vol. 4, pp. 2115-2118, 2000.
S. C. Pei, J. J. Ding, P.Y. Lin and T. H. H. Lee, “Two-dimensional orthogonal DCT expansion
in triangular and trapezoid regions,” Computer Vision, Graphics, and Image Processing, Sitou,
Taiwan, Aug. 2009.
D. A. Huffman, "A method for the construction of minimum-redundancy codes,"
Proceedings of the IRE, vol. 40, no. 9, pp. 1098-1101, 1952.
S. W. Golomb, "Run length encodings," IEEE Trans. Inf.Theory, vol. 12, pp. 399-401, 1966.
R. Gallager and D.V.Voorhis, "Optimal source codes for geometrically distributed
integer alphabets," IEEE Trans. Information Theory, vol. 21, pp. 228–230, March 1975.
R. F. Rice, "Some practical universal noiseless coding techniques–part I," Tech. Rep. JPL79-22, Jet Propulsion Laboratory, Pasadena, CA, March 1979.
G. Seroussi and M. J. Weinberger, "On adaptive strategies for an extended family of
Golomb-type codes," Proc. DCC’97, pp. 131-140, 1997.
C. J. Lian “JPEG2000 “, DSP/IC design lab, GIEE, ntu
65

similar documents