
Image Representation
Global vectorial representations
Image features are represented by a feature vector that collects the numerical values of the
feature descriptor. Features can therefore be regarded as points in a multidimensional
feature space.
Representations of images based on global features in vectorial form have a dimension
determined by the number of feature properties used to describe the image patterns.
Feature histogram representations
Histograms are a common feature vector representation that measures the frequency with
which a feature appears in an image or in an image window. With histograms, feature values
are quantized into a finite number of bins. Histogram-based holistic representations are widely
used with color, edge, and line features.
Being orderless, the histogram representation is invariant to viewing conditions and is also
tolerant, to some extent, to partial occlusions.
The main drawbacks of histogram representations are:
– Global representation of the image (window) content:
may fail to account for local properties and the spatial configuration of regions.
– Dimensionality curse:
histograms need a high number of bins (>64) for a meaningful representation, which
requires high-dimensional indexes for similarity search.
– Histogram distances:
several histogram distances can be defined, but their choice is critical for a meaningful
and sound comparison of feature distributions.
– Histogram binning:
hard assignment of feature values to bins may result in boundary effects.
Color histogram
The color histogram of an image describes the image color distribution. Color histograms
tessellate the color space and hold the count of pixels for each color zone. For gray-level
images the gray-level histogram is built similarly.
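As a sketch, a quantized color histogram can be computed as follows (the function name and the bin count per channel are illustrative, not from the slides):

```python
import numpy as np

def color_histogram(image, bins_per_channel=4):
    """Quantized RGB color histogram (illustrative helper).

    image: H x W x 3 uint8 array.
    Returns a normalized histogram with bins_per_channel**3 bins.
    """
    # Map each 0-255 channel value to a bin index 0..bins_per_channel-1
    q = (image.astype(np.int64) * bins_per_channel) // 256
    # Combine the three channel indices into a single bin index
    idx = (q[..., 0] * bins_per_channel + q[..., 1]) * bins_per_channel + q[..., 2]
    hist = np.bincount(idx.ravel(), minlength=bins_per_channel ** 3)
    return hist / hist.sum()  # frequencies, invariant to image size
```

Normalizing by the pixel count makes the representation comparable across images of different sizes, which matters for the similarity searches discussed below.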
Main issues:
– Uniform tessellation can be inappropriate for a correct representation of color
– Color histogram size is typically 256 bins or more for meaningful representations
– Quadratic distances are often more appropriate for color distribution matching
– Accounting for spatial information is often fundamental for a meaningful color representation
Color Correlogram
Correlogram is a variant of histogram that accounts for the local spatial correlation of colors.
Correlogram is based on the estimation of the probability of finding a pixel of color j at a
distance k from a pixel of color i in an image
Quantized colors:
c1, c2, …, cm
Distance between two pixels:
|p1 − p2| = max(|x1 − x2|, |y1 − y2|)
Pixel set with color c:
Ic = { p | I(p) = c }
- Correlogram:
γ(k)ci,cj(I) = Pr[ p2 ∈ Icj | p1 ∈ Ici, |p1 − p2| = k ]
Dimensionality m·m·d if the number of different distances k is d
- Auto-correlogram:
αc(k)(I) = γ(k)c,c(I)
Dimensionality m·d
• Practical consideration: use the auto-correlogram with m ≤ 64, d = 1, k = 1
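Under the definitions above, a brute-force auto-correlogram for a color-quantized image can be sketched as follows (illustrative only, not an efficient implementation):

```python
import numpy as np

def auto_correlogram(img, m, k=1):
    """Auto-correlogram sketch for a quantized image (values 0..m-1).

    alpha_c(k) = Pr[ I(p2) = c | I(p1) = c, |p1 - p2| = k ]
    with |.| the Chebyshev (max) distance, as in the slides.
    """
    h, w = img.shape
    counts = np.zeros(m)   # same-color pixel pairs at distance k
    totals = np.zeros(m)   # all pixel pairs at distance k, per color
    # Offsets at Chebyshev distance exactly k (for k=1: the 8 neighbors)
    offs = [(dy, dx) for dy in range(-k, k + 1) for dx in range(-k, k + 1)
            if max(abs(dy), abs(dx)) == k]
    for y in range(h):
        for x in range(w):
            c = img[y, x]
            for dy, dx in offs:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    totals[c] += 1
                    if img[ny, nx] == c:
                        counts[c] += 1
    return np.divide(counts, totals, out=np.zeros(m), where=totals > 0)
```

With m ≤ 64, d = 1, k = 1 as suggested above, the descriptor has at most 64 values.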
Edge histograms
Edge histograms represent the distribution of edge properties in an image.
Edge intensity histograms provide good discrimination between scenes.
Histograms of edge orientations can also be obtained to describe the edge
directionality. Each histogram bin accounts for the number of edges at a specific orientation.
– 8 orientations are typically sufficient.
– Interpolation of bin assignment can be necessary for meaningful representations
Line histograms
Histograms of line length and orientation provide a useful characterization of image content.
– Binning can be critical for discriminative representations
(Figure: example image lines with histograms of line length and line orientation.)
Human localization capability
Localization is the process of identifying landmarks of a scene.
As humans we can distinguish:
– Indoors: strong assumptions of flat walls, narrow hallways…
– Outdoors: a less conforming set of surfaces
As humans we can use for localization:
– Objects
– Regions
– The scene as a whole
Human vision architecture
In the human visual system there is evidence of place-recognizing cells in the parahippocampal
place area. Context information is obtained within an eye saccade, in approximately 150 ms.
Two basic models: Gist and Saliency
Visual Cortex: low-level filters, center-surround, and normalization
Saliency model: attends to pertinent regions
Gist model: computes the general characteristics of the image
High Level Vision:
– Object recognition
– Layout recognition
– Scene understanding
Gist versus Saliency
Gist is the term used to signify the essence, the holistic characteristics of an image.
“It is an abstract representation of the scene that spontaneously activates memory
representations of scene categories (a city, a mountain, etc.)” [A. Oliva and A. Torralba 2001]
Gist utilizes the same visual cortex raw features as the Saliency model. Gist is theoretically
non-redundant with Saliency.
Gist versus Saliency:
– instead of looking at the most conspicuous locations in the image, Gist looks at the scene as a whole
– detects regularities (not irregularities)
– exploits cooperation (accumulation) instead of competition (winner-takes-all) among feature channels
– there is more spatial emphasis in Saliency
GIST global representation
With images, an approximate global representation of human gist can be obtained by partitioning
the image into a 4x4 grid (3x3 for small images like 32x32) and taking orientations at different scales
and center-surround differences of color and intensity at each grid cell.
This is equivalent to computing gradient magnitude and orientation for each grid cell, plus color and
intensity gradients. It does not imply any segmentation.
Gist model implementation
V1 raw image feature-maps
‒ Orientation Channel
Gabor filters at 4 angles (0,45,90,135)
on 4 scales = 16 sub-channels
‒ Color
red-green and blue-yellow center-surround
with 6 scale combinations = 12 sub-channels
‒ Intensity
dark-bright center-surround
with 6 scale combinations = 6 sub-channels
Total of 34 sub-channels
Gist feature extraction:
the gist vector collects the average values of each sub-channel on the predetermined grid.
Dimension Reduction
– Original:
34 sub-channels x 16 features
= 544 features
– PCA/ICA reduction:
80 features keep >95% of variance
Place Classification
– Three-layer neural network
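The grid-averaging step described above (34 sub-channel maps averaged over a 4x4 grid, giving 544 features) can be sketched as follows (the function name is illustrative):

```python
import numpy as np

def gist_vector(feature_maps, grid=4):
    """Gist feature extraction sketch: average each sub-channel feature
    map over a grid x grid partition.
    34 sub-channels x 16 grid cells = 544 features.
    """
    feats = []
    for fmap in feature_maps:
        h, w = fmap.shape
        gh, gw = h // grid, w // grid
        for i in range(grid):
            for j in range(grid):
                # Mean response of this sub-channel inside the grid cell
                cell = fmap[i * gh:(i + 1) * gh, j * gw:(j + 1) * gw]
                feats.append(cell.mean())
    return np.array(feats)
```

The 544-dimensional vector would then be reduced to about 80 features with PCA/ICA, as stated above, before place classification.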
The MPEG-7 standard
MPEG-7, formally named Multimedia Content Description Interface, is a standard for describing
multimedia content that supports some degree of interpretation of the information
meaning, which can be passed onto, or accessed by, a device or computer code. MPEG-7 is
composed of:
– MPEG-7 Visual – the Description Tools dealing with Visual descriptions.
– MPEG-7 Audio – the Description Tools dealing with Audio descriptions
The goal of the MPEG-7 standard is to allow interoperable searching, indexing, filtering and
access of audio-visual content by enabling interoperability among devices and applications.
Ideally, MPEG-7 facilitates exchange and reuse of multimedia content across different
application domains
MPEG-7 description elements
MPEG-7 provides four types of normative description elements:
– Descriptors,
– Description Schemes (DSs)
– Description Definition Language (DDL)
– System Tools (coding schemes)
A description consists of a Description Scheme and the set of Descriptor values:
– Descriptor: A representation of a feature. A Descriptor defines the syntax and the
semantics of the feature representation.
– Description Scheme: The structure and semantics of the relationships between its
components, which may be both Descriptors and Description Schemes.
MPEG-7 Descriptors
MPEG-7 Descriptors support a range of abstraction levels, from low-level signal
characteristics to high-level semantic information. The abstraction level relates to the way
we extract the features: we can automatically extract most low-level features, whereas
high-level features usually need human supervision and annotation.
Only the description format, not the extraction methodologies, is fixed: the description
format is the normative part of the MPEG-7 standard.
MPEG-7 Description Scheme
A Description Scheme deals with the structure of the description and describes both the
structure and semantics of the audio-visual content. In addition MPEG-7 Description Scheme
also supports the description of other types of information about the multimedia data such as
the coding scheme used, the data size, place and time of recording, classification, and links to
other relevant material.
MPEG-7 Segment Description Scheme tree
Among different regions we could use Segment Relationship description tools
Spatial segmentation can be performed at different levels; descriptions can annotate the
whole image as well as individual segments.
MPEG-7 Segment Relationship Description Scheme graph
Video Segment Relationship description tools can be used to model video shot segments
and relationships between regions within video shots
MPEG-7 Description Definition Language and System Tools
Basic tools of MPEG-7 are:
– Description Definition Language: “A language that allows the creation of new Description
Schemes and, possibly, Descriptors. It also allows the extension and modification of existing
Description Schemes.”
– Systems Tools: Tools to support multiplexing of descriptions, synchronization of descriptions with
content, delivery mechanisms, and coded representations for efficient storage and transmission
and the management and protection of intellectual property in MPEG-7 Descriptions.
MPEG-7 descriptions take two possible forms: a textual XML form, suitable for editing, searching, and
filtering, and the BiM binary form, suitable for storage, transmission, and streaming delivery.
To reduce the space occupied by stored MPEG-7 descriptors, due to the
verbosity of the XML format, it is possible to use BiM (Binary Format for MPEG-7).
BiM enables compression of any generic XML document, reaching an average 85%
compression ratio on MPEG-7 data, and allows parsing of BiM-encoded files
without requiring their decompression.
Application Areas of MPEG-7
Broadcast media selection (e.g., radio channel, TV channel)
Cultural services (history museums, art galleries, etc.).
Digital libraries (e.g., image catalogue, musical dictionary, film, video and radio archives).
E-Commerce (e.g., personalised advertising, on-line catalogues).
Education (e.g., repositories of multimedia courses, multimedia search for material).
Multimedia directory services (e.g., yellow pages, tourist information, geographical
information systems).
Remote sensing (e.g., cartography, natural resources management).
Surveillance and investigation services (e.g., human recognition, forensics, traffic control,
surface transportation).
MPEG-7 will also make the web as searchable for multimedia content as it is searchable for
text today. This would apply especially to large content archives, which are being made
accessible to the public, as well as to multimedia catalogues enabling people to identify
content for purchase
MPEG-7 Visual : Visual Descriptors
Color Descriptors
Texture Descriptors
Shape Descriptors
Motion Descriptors for Video
Color Descriptors
– Scalable Color (HSV space)
– Dominant Color
– Group of Frames / Group of Pictures histogram
– Color Structure (HMMD space)
– Color Layout (YCbCr space)
• Constrained color spaces:
- Scalable Color Descriptor uses HSV
- Color Structure Descriptor uses HMMD
- Color Layout Descriptor uses YCbCr
Scalable Color Descriptor
The Scalable Color Descriptor (SCD) is a color histogram in the HSV color space encoded
using a Haar transform. H is quantized to 16 bins and S and V are quantized to 4 bins each.
The binary representation is scalable in the number of bins used and in the number of bits per bin.
After all the pixels are processed, the histogram is calculated with the probability for each bin,
truncated to an 11-bit value. These values are then non-uniformly quantized into 4-bit values,
according to the table provided in the ISO specification, for more efficient encoding, giving
higher significance to small values.
(HSV hue circle: Red 0°, Yellow 60°, Cyan 180°, Blue 240°, Magenta 300°)
Haar Wavelet Transform*
In numerical analysis and functional analysis, the Discrete Wavelet Transform refers to wavelet
transforms for which the wavelets are discretely sampled.
The first Discrete Wavelet Transform was invented by the mathematician Alfréd Haar:
– for an input represented by a list of 2^n numbers, the Haar wavelet transform pairs up
input values, storing the difference and passing the sum.
– This process is repeated recursively, pairing up the sums to provide the next scale, finally
resulting in 2^n − 1 differences and 1 final sum.
The Haar wavelet can be described as a step function ψ(x):
ψ(x) = 1 for 0 ≤ x < ½
ψ(x) = −1 for ½ ≤ x < 1
In the discrete domain the transform is defined by the 2x2 matrix
H = [ 1  1 ]
    [ 1 −1 ]
(possibly normalized by 1/√2).
Given a sequence (a0, a1, a2, a3, …, a2n+1) of even length, this can be transformed into a sequence
of two-component vectors (a0, a1), …, (a2n, a2n+1).
Multiplying each vector by the matrix H gives the result (s0, d0), …, (sn, dn) of one
stage of the Haar wavelet transform (sum, difference).
The two sequences s and d are separated and the process is repeated on the sequence
s = (s0, s1, s2, s3, …, sn).
The discrete wavelet transform has nice properties:
– It can be performed in O(n) operations
– It captures not only some notion of the frequency content of the input, by examining it
at different scales, but also captures the temporal content, i.e. the times at which these
frequencies occur
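The pair-up/recursion described above can be sketched as follows (using the orthonormal 1/√2 normalization, which is one common convention):

```python
import math

def haar_step(seq):
    """One stage of the Haar wavelet transform: pair up values and
    store normalized sums and differences (H = 1/sqrt(2) [[1, 1], [1, -1]])."""
    sums = [(a + b) / math.sqrt(2) for a, b in zip(seq[0::2], seq[1::2])]
    diffs = [(a - b) / math.sqrt(2) for a, b in zip(seq[0::2], seq[1::2])]
    return sums, diffs

def haar_transform(seq):
    """Full O(n) recursive Haar transform of a length-2^n sequence:
    returns [final sum] followed by the 2^n - 1 difference coefficients."""
    coeffs = []
    while len(seq) > 1:
        seq, diffs = haar_step(seq)   # 'seq' now holds the coarser sums
        coeffs = diffs + coeffs       # finer-scale differences go last
    return seq + coeffs
```

A constant input produces a single non-zero sum coefficient and all-zero differences, which is exactly the redundancy the SCD bin scaling exploits.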
SCD computation
With SCD summing pairs of adjacent histogram lines is equivalent to the calculation of a
histogram with half number of bins. If this is performed iteratively starting with the H axis, S,
V, and hence H….
Usage of subsets of the coefficients in the Haar representation is equivalent to histograms of
128, 64, 32 bins, calculated from the source histogram
(Figure: successive halvings yield 256-, 128-, 64-, 32-, and 16-bin histograms. The 256-bin
source histogram is organized as 16 groups of 16 H bins, one group per (S, V) combination,
with S and V quantized to 4 bins each: the groups for S = 0–3 with V = 0, then S = 0–3 with
V = 1, and so on.)
Bin scaling
The result of applying Haar Transform is a set of 16 low pass coefficients and up to 240
high-pass coefficients. The high-pass (difference) coefficients of the Haar transform express
the information contained in finer-resolution levels of the histogram.
Natural image signals usually exhibit high redundancy between adjacent histogram lines. This
can be explained by the slight variation of colors caused by variable illumination and
shadowing effects.
Hence, it can be expected that the high-pass coefficients expressing differences between
adjacent histogram bins usually have only small values. Exploiting this property, it is possible
to truncate the high-pass coefficients to integer representation with a low number of bits
SCD representations can be stored in different resolutions, ranging from 256 down to 16
coefficients per histogram.
The table shows the relationship between the number of Haar coefficients, as specified in the
SCD, and the partition of the components of the corresponding HSV histogram that could be
reconstructed from the coefficients (following the halving order H, S, V, H described above):

No. coeff | # bins: H | # bins: S | # bins: V
256       | 16        | 4         | 4
128       | 8         | 4         | 4
64        | 8         | 2         | 4
32        | 8         | 2         | 2
16        | 4         | 2         | 2
Bit scaling
The high-pass (difference) coefficients in the Haar transform can take either positive or
negative values. The sign part is always retained whereas the magnitude part can be scaled
by skipping the least significant bits.
Using the sign bit only (1 bit/coefficient) leads to an extremely compact representation, while
good retrieval efficiency is retained.
At the highest accuracy level, 1–8 bits are defined for integer representations of the
magnitude part, depending on the relevance of the respective coefficients. In between these
extremes, it is possible to scale to different resolution levels.
Matching with SCD
With SCD, the reconstruction of the color histogram from the Haar coefficients allows matching with
the highest retrieval efficiency. Matching in the histogram domain is only useful to achieve the
highest quality, i.e. when all coefficients are available.
It is recommended to perform the matching directly in the Haar coefficient domain, which
induces only marginal loss in the precision of the similarity matching with considerable savings
in computational cost.
For matching in the Haar coefficient domain it is recommended to use the L1 norm.
The L1 norm is also recommended for matching in the histogram domain.
GoF/GoP Color Descriptor
• The GoF/GoP Color Descriptor extends the Scalable Color Descriptor to a video segment or a group of
pictures: the joint color histogram is then processed as in SCD (Haar transform encoding).
• Two additional bits define how the joint histogram is calculated before
applying the Haar transform. The standard allows average, median, or intersection
histogram aggregation:
– Average: sensitive to outliers (lighting changes, occlusions, text overlays)
– Median: increased computational complexity for sorting
– Intersection: a “least common” color trait viewpoint
Example application: browsing a large collection of images to find similar images
– Use histogram intersection as a color similarity measure for clustering a collection of images
– Represent each cluster by its GoP descriptor
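Both uses of intersection mentioned above can be sketched in a few lines (illustrative helpers, assuming normalized histograms):

```python
import numpy as np

def histogram_intersection(h1, h2):
    """Histogram intersection similarity: sum of per-bin minima.
    For normalized histograms the result lies in [0, 1]."""
    return np.minimum(h1, h2).sum()

def intersection_histogram(histograms):
    """'Least common' joint histogram of a group of frames/pictures:
    per-bin minimum over all frame histograms (one of the three
    aggregation modes the standard allows, besides average and median)."""
    return np.minimum.reduce(list(histograms))
```

The per-bin minimum keeps only colors present in every frame, which is why it is robust to transient content such as text overlays.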
Dominant Color Descriptor
Dominant Color Descriptor (DCD) assumes that a given image is described in terms of a set of
region labels and the associated color descriptors:
– Each pixel has a unique region label
– Each region is characterized by a color histogram
Colors in a given region are clustered into a small number of representative colors.
For each representative color the descriptor consists of:
– ci : representative color identifier
– pi : its percentage in the region
– vi : its color variance in the region
– s : the overall spatial coherency of the dominant colors in the region
F = { {ci, pi, vi}, s },  i = 1, 2, …, N
DCD computation
The DCD variance is computed, for each dominant color, as the variance of the pixel colors
assigned to that color.
Spatial coherency for each dominant color captures how coherent the pixels corresponding to
the dominant color are, i.e. whether they appear as a solid color in the given image region.
Spatial coherency per dominant color is computed as the normalized average connectivity
(8-connectedness): neighboring pixels are counted if they are connected to the corresponding
dominant-color pixel.
DCD spatial coherency gives an idea of the spatial homogeneity of the dominant colors of a region.
It is computed as a single value by the weighted sum of per-dominant color spatial coherencies.
The weight is proportional to the number of pixels corresponding to each dominant color.
Matching with DCD
DCD is suitable for local (object or region) features, when a small number of colors is enough
to characterize the color information. Before feature extraction, images must be segmented
into regions:
− maximum of 8 dominant colors can be used to represent the region (3 bits)
− percentage values are quantized to 5 bits each
− variance: 3 bits /dominant color
− spatial coherence: 5 bits
The color quantization depends on the color space specification defined for the entire
database and need not be specified with each descriptor. The Luv uniform color space is
commonly used.
The dominant color representation is sufficiently accurate and compact compared to the
traditional color histogram:
- color bins are quantized from each image region instead of being fixed
- 3 bins on average instead of 256 or more
• It supports efficient database indexing and search. Typically, when using DCD, image similarity
is evaluated by simply comparing the corresponding dominant color percentages and dominant
color similarities (color distances):
D²(F1, F2) = Σi=1..N1 p1i² + Σj=1..N2 p2j² − Σi Σj 2·a1i,2j·p1i·p2j

ak,l : similarity coefficient between two colors ck and cl:
ak,l = 1 − dk,l / dmax   if dk,l ≤ Td
ak,l = 0                 if dk,l > Td
dk,l = ||ck − cl|| : Euclidean distance between the two colors
Td : maximum distance for two colors to be considered similar
dmax = α·Td, with α values 1.0–1.5 and Td values 10–20 in the Luv color space
Color Structure Descriptor
Similar to a histogram, the Color Structure Descriptor (CSD) represents an image by both its color
distribution and its local structure. Two images with the same global color distribution but different
local structure cannot be distinguished by the Scalable Color Descriptor, but can be distinguished
by the Color Structure Descriptor.
The CSD is obtained by scanning the image with an 8x8 structuring element in a sliding-window
approach: with each shift of the structuring element, the number of times a particular color is
contained in the structuring element is counted, and a color histogram is constructed.
The HMMD color space is used.
HMMD Color space*
The HMMD color space regards the colors adjacent to a given color in the color space as the
neighboring colors. It is closely related to HSV:
‒ Hue is the same as in the HSV space (0–360°)
‒ Max and Min are the maximum and minimum among the R, G, and B values, i.e. how much
white and how much black are present, respectively
‒ the Diff component is the difference between Max and Min, i.e. how close a color is to a
pure color
‒ Sum = (Max + Min) / 2 can also be defined, i.e. the brightness
Only three of the four components are sufficient to describe the HMMD space: (H, Max, Min) or
(H, Diff, Sum). The HMMD color space can be depicted using a double-cone structure.
HMMD can accomplish a color quantization close to the change of color sensed by the
eye, thereby enhancing the performance of content-based image searching.
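The conversion just described can be sketched as follows (hue computed as in HSV; the function name is illustrative):

```python
def rgb_to_hmmd(r, g, b):
    """RGB (0-255) to HMMD sketch: Hue as in HSV, plus Max, Min, Diff, Sum."""
    mx, mn = max(r, g, b), min(r, g, b)
    diff = mx - mn                      # closeness to a pure color
    if diff == 0:
        hue = 0.0                       # achromatic: hue is undefined, use 0
    elif mx == r:
        hue = (60.0 * (g - b) / diff) % 360.0
    elif mx == g:
        hue = 60.0 * (b - r) / diff + 120.0
    else:
        hue = 60.0 * (r - g) / diff + 240.0
    s = (mx + mn) / 2.0                 # Sum: brightness
    return hue, mx, mn, diff, s
```

Any three of the four non-hue components determine the fourth, which is why (H, Max, Min) or (H, Diff, Sum) suffice.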
HMMD subspace quantization
Four non-uniform quantizations are defined that partition the space into 256, 128, 64, and 32 cells.
Each quantization is defined via five subspaces (subspace 0 to subspace 4). The Diff axis is
divided into 5 subintervals: [0,6), [6,20), [20,60), [60,110), [110,255]. In each subspace, Sum
and Hue are allowed to take all values in their ranges; they are partitioned into uniform
intervals according to a table.
Example: the 128 bins (cells) of the HMMD color space.
CSD computation
The color structure histogram has m quantized colors cm, where m ∈ {256, 128, 64, 32}.
The bin value h(m) is the number of structuring-element positions containing one or more pixels with color cm:
– consider the set of quantized color indices of the image and the set of quantized color indices
existing inside the sub-image region covered by the structuring element
– as the structuring element scans the image, the color histogram bins are accumulated
– the final value of h(m) is determined by the number of positions at which the structuring
element contains color cm
(8x8 structuring element)
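A brute-force sketch of the sliding 8x8 structuring-element scan described above (function name and looping strategy are illustrative, not an optimized implementation):

```python
import numpy as np

def csd_histogram(img, m):
    """Color Structure histogram sketch.

    img: 2-D array of quantized color indices 0..m-1.
    For each position of the 8x8 structuring element, every color present
    in the window increments its bin once, regardless of pixel count.
    """
    h, w = img.shape
    hist = np.zeros(m, dtype=np.int64)
    for y in range(h - 7):
        for x in range(w - 7):
            window = img[y:y + 8, x:x + 8]
            hist[np.unique(window)] += 1   # one count per color present
    return hist
```

Counting presence rather than pixel frequency is what lets the CSD separate scattered colors from solid color patches.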
Matching with CSD
Given two images with CSD representations, matching is performed by computing the L1 distance
between the CSDs:
dist(A, B) = Σi |hA(i) − hB(i)|
Color Layout Descriptor
The Color Layout Descriptor (CLD) is a very compact descriptor (63 bits per image) based on:
– grid-based dominant color in the YCbCr color space (the dominant color may also be
the average color)
– DCT (Discrete Cosine Transform) of the 2D array of dominant colors
– final quantization to 63 bits
F ={CoefPattern, Y-DC_coef, Cb-DC_coef, Cr-DC_coef, Y-AC_coef, Cb-AC_coef, Cr-AC_coef}
Y = 0.299*R + 0.587*G + 0.114*B
Cb = -0.169*R - 0.331*G + 0.500*B
Cr = 0.500*R - 0.419*G - 0.081*B
DCT (Discrete Cosine Transformation)*
DCT applies to 8x8 image blocks.
For each block, the DCT shifts from the spatial domain to the frequency domain:
f(i,j) is the value at position (i,j) of the 8x8 block of the original image;
F(u,v) is the DCT coefficient at position (u,v) of the 8x8 matrix that encodes
the transformed coefficients.
The 64 (8 x 8) DCT basis functions:
CLD computation
The image is partitioned into 64 (8x8) blocks.
A single representative color is selected from each block (the average of the pixel colors in
a block is suggested as the representative color). The selection results in an 8x8 image.
The derived average colors are transformed into a series of coefficients by performing a DCT.
A few low-frequency coefficients are selected using zigzag scanning and quantized to form
the CLD (large quantization step for the AC coefficients, small quantization step for the
DC coefficients).
If the spatial-domain data is smooth (with little variation), then in the frequency
domain the low-frequency coefficients will be large and the high-frequency
coefficients small.
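The DCT-plus-zigzag step for one channel of the 8x8 grid of representative colors can be sketched as follows (quantization is omitted; function names are illustrative):

```python
import numpy as np

def dct2(block):
    """2-D DCT-II of an n x n block via the orthonormal DCT matrix."""
    n = block.shape[0]
    k = np.arange(n)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)           # DC row
    return C @ block @ C.T

def zigzag_order(n=8):
    """Zigzag scan order, low frequencies first (as in JPEG)."""
    return sorted(((i, j) for i in range(n) for j in range(n)),
                  key=lambda ij: (ij[0] + ij[1],
                                  ij[0] if (ij[0] + ij[1]) % 2 else ij[1]))

def cld_channel_coefficients(grid8x8, n_coeff=6):
    """DCT the 8x8 grid of representative colors for one channel (e.g. Y)
    and keep the first n_coeff coefficients in zigzag order."""
    F = dct2(np.asarray(grid8x8, dtype=float))
    return [F[i, j] for i, j in zigzag_order(8)[:n_coeff]]
```

For a smooth grid, almost all the energy lands in the first few zigzag positions, which is exactly why so few coefficients suffice.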
Matching with CLD
CLD is efficient for:
– Sketch-based image retrieval
– Content Filtering using image indexing
The distance of two Color Layout Descriptors CLD and CLD′ with 12 coefficients (6 Y, 3 Cb, 3 Cr),
CLD = {Y0, ..., Y5, Cr0, Cr1, Cr2, Cb0, Cb1, Cb2}, is defined as follows:
D = √( Σi wyi (Yi − Y′i)² ) + √( Σi wbi (Cbi − Cb′i)² ) + √( Σi wri (Cri − Cr′i)² )
where the weights w decrease with frequency.
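The distance can be sketched directly from the formula (weights default to 1 here, which is an assumption; the standard's weight tables would replace them):

```python
import math

def cld_distance(cld1, cld2, wy=None, wb=None, wr=None):
    """CLD matching: per-channel weighted Euclidean distances, summed.

    cld1, cld2: tuples (Y, Cb, Cr) of coefficient lists, e.g. 6 Y + 3 Cb + 3 Cr.
    wy, wb, wr: optional per-coefficient weights (default: all 1.0).
    """
    def channel(a, b, w):
        w = w or [1.0] * len(a)
        return math.sqrt(sum(wi * (x - y) ** 2 for wi, x, y in zip(w, a, b)))
    return sum(channel(a, b, w) for a, b, w in zip(cld1, cld2, (wy, wb, wr)))
```

Taking a square root per channel (rather than one global root) keeps the luminance and chrominance contributions separately comparable.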
What applications
Scalable Color descriptor is useful for image-to-image matching and retrieval based on color
feature. Retrieval accuracy increases with the number of bits used in the representation.
Dominant Color(s) descriptor is most suitable for representing local (object or image region)
features where a small number of colors are enough to characterize the color information. A
spatial coherency on the entire descriptor is also defined, and used in similarity retrieval.
Color Structure descriptor is suited to image-to-image matching and its intended use is for still
natural-image retrieval, where an image may consist of either a single rectangular frame or
arbitrarily shaped, possibly disconnected, regions.
Color Layout descriptor allows image-to-image matching at very small computational cost and
ultra-high-speed sequence-to-sequence matching, also at different resolutions. It is feasible for
mobile terminal applications, where the available resources are strictly limited. Users can
easily introduce the perceptual sensitivity of the human visual system into the similarity calculation.
Texture Descriptors
Homogeneous Texture Descriptor
Non-Homogeneous Texture Descriptor (Edge Histogram)
Homogeneous Texture Descriptor
The Homogeneous Texture Descriptor (HTD) is composed of 62 numbers:
– #1,2: the mean and the standard deviation of the image, respectively
– #3–62: the energy (e) and the energy deviation (d) of the 30 Gabor-filtered channel
responses, in the subdivision layout of the frequency domain (6 orientations and 5 scales)
This design is based on the fact that the response of the visual cortex is band-limited and the
brain decomposes the spectra into bands in spatial frequency (from 4 to 8 frequency bands and
approximately as many orientations)
F = {fDC, fSD, e1,…, e30, d1,…, d30}
Gabor filter*
In the Gabor transform, the signal is first multiplied by a Gaussian function (acting as a window) and
the resulting function is then subjected to a Fourier transform to derive the time-frequency analysis
(fixed Gaussian and variable frequency of the modulating wave).
The characteristic of optimal joint resolution in both space and frequency suggests that these
filters are appropriate operators for tasks requiring simultaneous measurement in these domains,
such as texture discrimination.
In 1D the window function means that the signal near the time being analyzed has higher
weight. The Gabor transform of a signal x(t) is defined by:
Gx(τ, ω) = ∫ x(t) · e^(−π (t − τ)²) · e^(−jωt) dt
2D-Gabor filter for HTD
• Extension to 2D is satisfied by a family of functions which
can be realized as spatial filters consisting of sinusoidal
plane waves within two-dimensional elliptical Gaussian
• The corresponding Fourier transforms contain elliptical
Gaussians displaced from the origin in the direction of
orientation with major and minor axes inversely
proportional to those of the spatial Gaussian envelopes.
• Each channel filters a specific type of texture. The energy in the i-th channel is defined as:
ei = log10(1 + pi),  with  pi = Σω Σθ [ G_{Ps,r}(ω, θ) · P(ω, θ) ]²
where P(ω, θ) is the Fourier transform of the image represented
in the polar frequency domain and G is a Gaussian function:
G_{Ps,r}(ω, θ) = exp( −(ω − ωs)² / (2σω²) ) · exp( −(θ − θr)² / (2σθ²) )
The center frequencies of the channels in the
angular and radial directions are such that:
θr = 30° × r, with 0 ≤ r ≤ 5
ωs = ω0 · 2^−s, with 0 ≤ s ≤ 4, ω0 = 3/4.
Matching with HTD
With HTD one can perform:
– Rotation invariance matching
– Intensity invariance matching (fDC removed from the feature vector)
– Scale-Invariant matching F = {fDC, fSD, e1,…, e30, d1,…, d30}
Texture Browsing Descriptor
The Texture Browsing Descriptor (TBD) requires the same spatial filtering as the HTD and
captures the regularity (or the lack of it) in the texture pattern. Its computation is based on the
following observations:
– Structured textures usually consist of dominant periodic patterns.
– A periodic or repetitive pattern, if it exists, can be captured by the filtered images.
– The dominant scale and orientation information can be captured by analyzing projections
of the filtered images.
The texture browsing descriptor can be used to find a set of candidates with similar perceptual
properties; the HTD can then be used to get a precise similarity match list among the candidates.
TBD computation
The TBD descriptor is defined as follows:
TBD = [ v1, v2, v3, v4, v5 ]
– Regularity (v1): v1 represents the degree of regularity or structuredness of the texture.
A larger value of v1 indicates a more regular pattern.
– Scale (coarseness) (v3, v5): These represent the two dominant scales of the texture.
Similar to directionality, the more structured the texture, the more robust the
computation of these two components.
– Direction (v2, v4 ): these values represent the two dominant orientations of the texture.
The accuracy of computing these two components often depends on the level of
regularity of the texture pattern. The orientation space is divided into 30 intervals.
E.g., look for textures that are very regular and oriented at 30°.
(Figure: regularity (periodic to random), coarseness (fine grain to coarse), scale- and
orientation-selective band-pass filters, directionality in multiples of 30°.)
Non-Homogeneous Texture Descriptor
Edge Histogram Descriptor
The Edge Histogram Descriptor (EHD) represents the spatial distribution of five types of edges:
vertical, horizontal, 45°, 135°, and non-directional, by
– dividing the image into 16 (4x4) blocks
– generating a 5-bin histogram for each block
EHD cannot be used for object-based image retrieval. If the edge threshold Th_edge is set
to 0, EHD applies to binary edge images (sketch-based retrieval).
EHD is scale invariant. The Extended EHD achieves better results than HTD but does not
exhibit the rotation-invariance property.
image represents the relative frequency of occurrence of the
5 types of edges in the corresponding sub-image. As a
EHD computation
result, as shown in figure 5, each local histogram contains 5
bins. Each bin corresponds to one of 5 edge types. Since
there are 16 sub-images in the image, a total of 5x16=80
histogram bins is required (figure 6). Note that each of the
80- histogram
bins 4x4
own semantics
in terms of
– Divide
the image into
location histogram
and edgeof edge
For example,
bin forusing
the 2x2 filter masks to
– Generate
for eachthe
in the sub-image
at (0, 0) innon-directional
The semantics
of the 1-D h
edges into
figure 3 carries the information of the relative population of part of the MPEG-7 s
the horizontal edges in the top-left local region of the image.
starting from the sub-imag
sub-images are visited
corresponding local h
accordingly. Within each
arranged in the followin
degree diagonal, 135-degr
Table 1 summarizes the
with 80 histogram bins. O
should be normalized and
number of edge occurrenc
total number of image-blo
The image-block is a ba
information. That is, for
whether there is at leas
Global Journal of Computer Science and Technolo
Edge map is obtained by using Canny edge operator.
ally represents the distribution of 5 types of
ocal area called a sub-image. As shown in
ub-image is defined by dividing the image
non-overlapping blocks. Thus, the image
s yields 16 equal-sized sub-images regardless
he original image. To characterize the subgenerate a histogram of edge distribution for
e. Edges in the sub-images are categorized
• The
basic EHD45-degree
uses 5 binsdiagonal,
for each sub-image.
In total we have 80 bins. The histogram bin values are
135by the total
of the 4)
al andnormalized
997]. Thus, the histogram for each sub• relative
The bin values
are then
to keep the size of the histogram as small as possible.
s the
of occurrence
bits/bin, 240 bits
are neededAs
in total
ges in With
the 3corresponding
a per sub-image.
n in figure 5, each local histogram contains 5
corresponds to one of 5 edge types. Since
b-images in the image, a total of 5x16=80
is required (figure 6). Note that each of the
bins has its own semantics in terms of
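The block-classification procedure can be sketched in Python with NumPy. This is a minimal sketch, not the normative MPEG-7 extraction: the 2x2 filter coefficients, the threshold value and the helper name `ehd` are assumptions.

```python
import numpy as np

# Five 2x2 edge filter masks (vertical, horizontal, 45-degree,
# 135-degree, non-directional); the exact coefficients are assumptions.
MASKS = [
    np.array([[1.0, -1.0], [1.0, -1.0]]),               # vertical
    np.array([[1.0, 1.0], [-1.0, -1.0]]),               # horizontal
    np.array([[np.sqrt(2), 0.0], [0.0, -np.sqrt(2)]]),  # 45-degree diagonal
    np.array([[0.0, np.sqrt(2)], [-np.sqrt(2), 0.0]]),  # 135-degree diagonal
    np.array([[2.0, -2.0], [-2.0, 2.0]]),               # non-directional
]

def ehd(image, block_size=4, threshold=10.0):
    """80-bin Edge Histogram Descriptor sketch for a grayscale image
    whose sides are divisible by 4 * block_size."""
    h, w = image.shape
    sub_h, sub_w, half = h // 4, w // 4, block_size // 2
    hist = np.zeros((4, 4, 5))
    for si in range(4):                      # visit the 16 sub-images
        for sj in range(4):
            sub = image[si * sub_h:(si + 1) * sub_h,
                        sj * sub_w:(sj + 1) * sub_w]
            n_blocks = 0
            for bi in range(0, sub_h, block_size):
                for bj in range(0, sub_w, block_size):
                    block = sub[bi:bi + block_size, bj:bj + block_size]
                    # average the block down to 2x2 sub-block intensities
                    b2 = block.reshape(2, half, 2, half).mean(axis=(1, 3))
                    strengths = [abs((b2 * m).sum()) for m in MASKS]
                    n_blocks += 1
                    k = int(np.argmax(strengths))
                    if strengths[k] > threshold:
                        hist[si, sj, k] += 1  # count block for dominant type
            hist[si, sj] /= n_blocks          # relative frequency per sub-image
    return hist.reshape(80)
```

Two descriptors are then typically compared with an L1 distance between their 80-bin vectors.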
Extended EHD computation
For good performance, we need the global edge distribution for the whole image as well as
semi-global (horizontal and vertical) edge distributions.
The Extended EHD is obtained by accumulating EHD bins at the basic, semi-global and global
levels. The global histogram accumulates 5 EHD bins over all the sub-images. For the
semi-global histograms, groups of four connected sub-images are clustered. In total, we
have 150 bins (80 basic + 65 semi-global + 5 global).
Basic (80 bins)
Extended (150 bins)
13 clusters for semi-global
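The accumulation into 150 bins could look like the following sketch. The exact 13-cluster layout (4 rows, 4 columns and five 2x2 groups) and the use of averaging rather than summation are assumptions here, as is the helper name `extended_ehd`.

```python
import numpy as np

def extended_ehd(basic):
    """Extend an 80-bin EHD (4x4 sub-images x 5 edge types) with 65
    semi-global bins (13 clusters x 5 types) and 5 global bins."""
    h = np.asarray(basic, dtype=float).reshape(4, 4, 5)
    semi = []
    for i in range(4):                      # 4 row clusters
        semi.append(h[i, :, :].mean(axis=0))
    for j in range(4):                      # 4 column clusters
        semi.append(h[:, j, :].mean(axis=0))
    # five 2x2 groups: four quadrants plus the center (an assumption)
    for (i, j) in [(0, 0), (0, 2), (2, 0), (2, 2), (1, 1)]:
        semi.append(h[i:i + 2, j:j + 2, :].mean(axis=(0, 1)))
    global_bins = h.mean(axis=(0, 1))       # 5 global bins
    return np.concatenate([h.reshape(80)] + semi + [global_bins])
```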
What applications
Homogeneous Texture descriptor is for searching and browsing through large collections of
similar-looking patterns. An image can be considered as a mosaic of homogeneous textures, so that the
texture features associated with the regions can be used to index the image data.
Texture Browsing descriptor is useful for representing homogeneous texture for browsing type
applications. It provides a perceptual characterization of texture, similar to a human
characterization, in terms of regularity, coarseness and directionality.
Edge Histogram descriptor, since edges play an important role in image perception, can retrieve
images with similar semantic meaning. It targets image-to-image matching (by example or by
sketch), especially for natural images with non-uniform edge distribution. The image retrieval
performance can be significantly improved if the edge histogram descriptor is combined with other
descriptors such as the color histogram descriptor.
Shape Descriptors
Region-based Descriptor
Contour-based Shape Descriptor
2D/3D Shape Descriptor
3D Shape Descriptor
A shape is the outline or characteristic surface configuration of a thing: a contour; a form.
A shape cannot be described through text.
Shape representation and matching is one of the major and oldest research topics of pattern
Recognition and Computer Vision.
The property of invariance of the representation - such that shape representations are left
unaltered under a set of transformations - plays a very important role in recognizing the
same object even in its translated, rotated, scaled or shrunken views.
Region Based Descriptor
Region Based Descriptor (RBD) expresses pixel distribution within a 2D object region.
Employs 2D-Angular Radial Transformation (ART) defined on a unit disk in polar coordinates.
ART Algorithm
Perform edge detection
Calculate ARTmn for m=0..M, n=0..N according to:
F_nm = ⟨ V_nm(ρ,θ), f(ρ,θ) ⟩ = ∫₀^2π ∫₀^1 V*_nm(ρ,θ) f(ρ,θ) ρ dρ dθ
Scale coefficients by |ART00| to normalize
Perform matching on the features ARTmn.
f(ρ,θ) is the image function in polar coordinates and V_nm(ρ,θ) is the ART basis function.
The ART basis functions are separable along the angular and radial directions:
V_nm(ρ,θ) = A_m(θ) R_n(ρ)
The angular and radial basis functions are defined as follows:
A_m(θ) = (1/2π) exp(jmθ)
R_n(ρ) = 1 if n = 0; 2 cos(πnρ) if n ≠ 0
The descriptor stores the magnitude of ART_nm for m = 0, ..., 12 and n = 0, ..., 2.
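A direct pixel-grid discretization of the ART inner product can be sketched as follows. This is an illustrative sketch, not the MPEG-7 reference implementation: the helper name `art_coefficients` and the sampling scheme are assumptions.

```python
import numpy as np

def art_coefficients(image, n_radial=3, n_angular=12):
    """|F_nm| of a square grayscale image sampled on the unit disk,
    normalized by |F_00|.  Basis: V_nm = A_m(theta) R_n(rho), with
    A_m = exp(j*m*theta)/(2*pi), R_0 = 1, R_n = 2*cos(pi*n*rho)."""
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # map the pixel grid onto the unit disk centered on the image
    x = (xs - (w - 1) / 2) / ((w - 1) / 2)
    y = (ys - (h - 1) / 2) / ((h - 1) / 2)
    rho = np.hypot(x, y)
    theta = np.arctan2(y, x)
    inside = rho <= 1.0
    coeffs = np.zeros((n_radial, n_angular))
    for n in range(n_radial):
        R = np.ones_like(rho) if n == 0 else 2 * np.cos(np.pi * n * rho)
        for m in range(n_angular):
            A = np.exp(1j * m * theta) / (2 * np.pi)
            # discrete approximation of the inner product over the disk
            F = np.sum(np.conj(A * R)[inside] * image[inside])
            coeffs[n, m] = np.abs(F)
    return coeffs / coeffs[0, 0]   # scale by |ART00| to normalize
```

A rotation of the shape only changes the phase of F_nm, which is why the magnitudes are (near) rotation invariant.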
Matching with RBD
Applicable to figures (a) – (e)
Distinguishes (i) from (g) and (h); (j)
Find similarities in (k), and (l)
– Describes complex shapes with disconnected regions
– Robust to segmentation noise
– Fast extraction and matching
Contour Based Descriptor
Contour-Based Descriptor (CBD) captures perceptually meaningful features of the shape contour.
It is based on Curvature Scale Space representation.
Curvature Scale-Space
– Finds curvature zero crossing points of the shape’s contour (keypoints)
– Reduces the number of keypoints step by step, by applying Gaussian smoothing (the contour
is gradually smoothed by repetitive application of a low-pass filter with a Gaussian kernel
to the X and Y coordinates of the selected N contour points).
– The position of key points are expressed relative to the length of the contour curve
Scale space diagram
The number of the curvature zeroes is a decreasing function of σ.
(Figure: Gaussian filtered signal at increasing scales, with its first derivative peaks.)
Scale space diagram
The diagram of the zero positions as σ varies is known as the scale space diagram.
• Properties of scale space diagram:
− edge position may shift with increasing scale σ
− two edges may merge with increasing scale σ
− an edge may not split into two with increasing scale σ
• Comparison between two scale space diagrams can be made by considering only the points
of maxima of the two diagrams.
• To obtain shape rotation invariance, invariance of the scale space diagram to horizontal
shifting must be assured. Peaks are aligned to the zero of the diagram and the others are
shifted accordingly.
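One smoothing step of the scale-space construction can be sketched as follows. Assumptions in this sketch: circular Gaussian smoothing via FFT, sigma measured in contour samples, and the unnormalized curvature numerator (its sign is all that matters for zero crossings); the helper name is hypothetical.

```python
import numpy as np

def curvature_zero_crossings(x, y, sigma):
    """Positions (as fractions of contour length) where the curvature of a
    closed contour changes sign after Gaussian smoothing at scale sigma --
    one row of a curvature scale space diagram."""
    n = len(x)
    t = np.arange(n) - n // 2
    g = np.exp(-t ** 2 / (2.0 * sigma ** 2))
    g /= g.sum()
    g = np.fft.ifftshift(g)              # center the kernel at index 0
    # circular convolution smooths the closed contour without end effects
    xs = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(g)))
    ys = np.real(np.fft.ifft(np.fft.fft(y) * np.fft.fft(g)))
    # periodic central differences for first and second derivatives
    d = lambda u: (np.roll(u, -1) - np.roll(u, 1)) / 2.0
    dx, dy = d(xs), d(ys)
    ddx, ddy = d(dx), d(dy)
    kappa = dx * ddy - dy * ddx          # curvature numerator; sign matters
    flips = np.sign(kappa) * np.sign(np.roll(kappa, -1)) < 0
    return np.nonzero(flips)[0] / n
```

Repeating this for increasing sigma and stacking the crossing positions yields the scale space diagram; its maxima are the points compared during matching.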
Matching with CBD
Applicable to (a)
Distinguishes differences in (b)
Find similarities in (c) - (e)
‒ Captures the shape very well
‒ Robust to the noise, scale, and orientation
‒ It is fast and compact
RBD versus CBD
Blue: Similar shapes by Region-Based
Yellow: Similar shapes by Contour-Based
Global Curvature Vector
Global Curvature Vector (GCV) specifies global parameters of the contour, namely the
Eccentricity and Circularity:
circularity = (perimeter)² / area
For a circle, circularity is C_circle = (2πr)² / (πr²) = 4π.
Eccentricity is defined through the second-order central moments of the contour points
i20 = Σ (x − x_c)², i02 = Σ (y − y_c)², i11 = Σ (x − x_c)(y − y_c):
eccentricity = ( i20 + i02 + √(i20² + i02² − 2·i20·i02 + 4·i11²) ) /
               ( i20 + i02 − √(i20² + i02² − 2·i20·i02 + 4·i11²) )
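Both GCV parameters can be computed directly from the contour points. A sketch follows; the helper name `gcv` and the shoelace-formula area are assumptions of this sketch.

```python
import numpy as np

def gcv(x, y):
    """Circularity and eccentricity of a closed contour given as NumPy
    point arrays (x, y) -- the Global Curvature Vector parameters."""
    # perimeter: sum of segment lengths around the closed polygon
    per = np.sum(np.hypot(np.roll(x, -1) - x, np.roll(y, -1) - y))
    # area via the shoelace formula
    area = 0.5 * abs(np.sum(x * np.roll(y, -1) - np.roll(x, -1) * y))
    circularity = per ** 2 / area
    # second-order central moments of the contour points
    dx, dy = x - x.mean(), y - y.mean()
    i20, i02, i11 = np.sum(dx * dx), np.sum(dy * dy), np.sum(dx * dy)
    root = np.sqrt((i20 - i02) ** 2 + 4 * i11 ** 2)
    eccentricity = (i20 + i02 + root) / (i20 + i02 - root)
    return circularity, eccentricity
```

For a circle the result approaches (4π, 1); elongating the contour increases both values.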
What applications
Region Shape descriptor makes use of all pixels constituting the shape within a frame and can
describe any shapes.
– It is characterized by small size, fast extraction time and matching. The data size for this
representation is fixed to 17.5 bytes.
– The feature extraction and matching processes have low computational complexity and are
suitable for tracking shapes in video data processing.
Contour Shape descriptor captures perceptually meaningful features of the shape enabling
similarity-based retrieval.
– It is robust to non-rigid motion.
– It is robust to partial occlusion of the shape.
– It is robust to perspective transformations, which result from the changes of the camera
parameters and are common in images and video