Report

MPEG-7 STANDARD: TOWARDS INTELLIGENT AUDIO-VISUAL INFORMATION HANDLING
MPEG-7, formally named “Multimedia Content
Description Interface”, is a standard for
describing multimedia content data that
supports some degree of interpretation of the
information’s meaning, which can be passed on to,
or accessed by, a device or computer code.
MPEG-7 is not aimed at any one application in
particular; rather, the elements that MPEG-7
standardizes support as broad a range of
applications as possible.
Applications of MPEG-7
The elements that MPEG-7 standardizes provide
support to a broad range of applications (for
example, multimedia digital libraries, broadcast
media selection, multimedia editing, home
entertainment devices, etc.). MPEG-7 will also
make the web as searchable for multimedia
content as it is searchable for text today. This
would apply especially to large content archives,
which are being made accessible to the public, as
well as to multimedia catalogues enabling people
to identify content for purchase.
OBJECTIVES OF MPEG-7 STANDARD
The MPEG-7 standard aims at providing
standardized core technologies allowing
description of audiovisual data content in
multimedia environments. Audiovisual data
content that has MPEG-7 data associated with
it, may include: still pictures, graphics, 3D
models, audio, speech, video, and composition
information about how these elements are
combined in a multimedia presentation
(scenarios). Special cases of these general data
types may include facial expressions and
personal characteristics.
APPLICATION AREAS OF MPEG-7
Broadcast media selection (e.g., radio channel, TV channel).
Cultural services (history museums, art galleries, etc.).
Digital libraries (e.g., image catalogue, musical dictionary, film, video and radio archives).
E-Commerce (e.g., personalised advertising, on-line catalogues, directories of e-shops).
Education (e.g., repositories of multimedia courses, multimedia search for support material).
Home Entertainment (e.g., systems for the management of personal multimedia collections,
including manipulation of content, e.g. home video editing, searching a game, karaoke).
Investigation services (e.g., human characteristics recognition, forensics).
Journalism (e.g. searching speeches of a certain politician using his name, his voice or his
face).
Multimedia directory services (e.g. yellow pages, Tourist information, Geographical
information systems).
Multimedia editing (e.g., personalised electronic news service, media authoring).
Remote sensing (e.g., cartography, ecology, natural resources management).
Shopping (e.g., searching for clothes that you like).
Social (e.g. dating services).
Surveillance (e.g., traffic control, surface transportation, non-destructive testing in hostile
environments).






MPEG-7 Description Tools make it possible to create descriptions of
content that may include:
Information describing the creation and production processes of the
content (director, title, short feature movie)
Information related to the usage of the content (copyright pointers,
usage history, broadcast schedule)
Information of the storage features of the content (storage format,
encoding)
Structural information on spatial, temporal or spatio-temporal
components of the content (scene cuts, segmentation in regions, region
motion tracking)
Information about low level features in the content (colors, textures,
sound timbres, melody description)
Conceptual information of the reality captured by the content
(objects and events, interactions among objects)
All these descriptions are coded in an efficient way for searching,
filtering, etc.
APPLICATION MODEL
Parts of the standard
MPEG-7 Visual – the Description Tools dealing with Visual
descriptions.
MPEG-7 Audio – the Description Tools dealing with Audio
descriptions.
MPEG-7 Multimedia Description Schemes - the Description Tools
dealing with generic features and multimedia descriptions.
MPEG-7 Description Definition Language - the language
defining the syntax of the MPEG-7 Description Tools and
for defining new Description Schemes.
Structure of the descriptions
The main elements of the MPEG-7
standard are:
• Descriptors (D): representations of
Features that define the syntax and the
semantics of each feature representation;
• Description Schemes (DS): specify the
structure and semantics of the
relationships between their components.
These components may be both
Descriptors and Description Schemes;
• Description Definition Language (DDL):
allows the creation of new Description
Schemes and, possibly, Descriptors, and
allows the extension and modification of
existing Description Schemes;
• System tools: support multiplexing of
descriptions, synchronization issues,
transmission mechanisms, coded
representations (both textual and binary
formats) for efficient storage and
transmission, management and protection
of intellectual property in MPEG-7
descriptions, etc.
MPEG-7 Description Definition Language (DDL)
The DDL defines the syntactic rules used to
express and combine Description Schemes
and Descriptors. The DDL is a schema
language for representing the results of
modeling audiovisual data, i.e. DSs and Ds.
It was decided to adopt the XML Schema
Language from W3C as the MPEG-7 DDL. The DDL
requires some MPEG-7-specific extensions to
XML Schema.
WHAT IS XML ?
• Extensible Markup Language (XML) is a
subset of SGML (an ISO standard). Its goal
is to enable generic SGML to be
processed on the Web in the same way
that is now possible with HTML. XML
has been designed for ease of
implementation compared to SGML.
XML defines document structure and
embeds it directly within the document
through the use of markup. Markup is
composed of two kinds of tags which
encapsulate data: open tags and close
tags. XML is similar to HTML, but the
tags can be defined by the user. The
definition of valid document structure is
expressed in a language called DTD
(Document Type Definition).
SIMPLE EXAMPLE of an XML DOCUMENT:

<letter>
  <header>
    <name>Mr John Smith</name>
    <address>
      <street>15 rue Lacepede</street>
      <city>Paris</city>
    </address>
  </header>
  <text>Dear Mr Doe, .....</text>
</letter>
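Because XML encodes structure explicitly, a document like the letter above can be navigated programmatically. A minimal sketch using Python's standard-library XML parser (this is an illustration, not part of the MPEG-7 standard):

```python
# Parse the example letter and pull out individual fields by path.
import xml.etree.ElementTree as ET

LETTER = """
<letter>
  <header>
    <name>Mr John Smith</name>
    <address>
      <street>15 rue Lacepede</street>
      <city>Paris</city>
    </address>
  </header>
  <text>Dear Mr Doe, .....</text>
</letter>
"""

root = ET.fromstring(LETTER)
name = root.find("header/name").text          # text of the <name> element
city = root.find("header/address/city").text  # text of the nested <city>
print(name, "-", city)
```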
XML Schema: Structures
XML Schema consists of a set of structural
schema components which can be divided
into three groups. The primary components
are:
The Schema – the wrapper around the definitions
and declarations;
• Simple type definitions;
• Complex type definitions;
• Attribute declarations;
• Element declarations.
The secondary components are:
• Attribute group definitions;
• Identity-constraint definitions;
• Model group definitions;
• Notation declarations.
The third group are the "helper" components
which contribute to the other components and
cannot stand alone:
• Annotations;
• Model groups;
• Particles;
• Wildcards.
Simple example of a DTD file:

<!DOCTYPE letter [
<!ELEMENT letter (header, text)>
<!ELEMENT header (name, address)>
<!ELEMENT address (street, city)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT text (#PCDATA)>
]>
What is an XML schema ?
The purpose of an XML schema is almost
the same as that of a DTD, except that it goes
beyond the current functionality of a
DTD and allows more precise datatype
definitions and easier reuse of structure
definitions. A schema can be seen as an
extended DTD. Even more important is
that an XML schema is itself an XML
document.
XML Schema Overview
The DDL can be broken down into the
following logical normative components:
• XML Schema Structural components;
• XML Schema Datatype components;
• MPEG-7 Extensions to XML Schema.
The MPEG-7 DDL is basically the XML
Schema Language, but with some MPEG-7-specific
extensions such as array and matrix datatypes.
The DDL allows the definition of complexTypes
and simpleTypes. The complexTypes
specify structural constraints, while
simpleTypes express datatype constraints.
MPEG-7 Extensions to XML Schema
• Parameterized array sizes;
• Typed references ;
• Built-in array and matrix datatypes;
• Enumerated datatypes for MimeType,
CountryCode, RegionCode,
CurrencyCode and CharacterSetCode
XML Schema Language parsers available:
XSV - Open Source Edinburgh Schema
Validator (written in Python)
XML Spy - Validating XML Editor
Xerces - Open source XML Parsers in Java
and C++
EXAMPLE
<simpleType name=”6bitInteger”
            base=”nonNegativeInteger”>
  <minInclusive value=”0”/>
  <maxInclusive value=”63”/>
</simpleType>
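Operationally, the minInclusive/maxInclusive facets above just constrain the value range of the base datatype. A hypothetical sketch of what a validator would check for this simpleType (the function name is illustrative, not from the standard):

```python
def valid_6bit_integer(value: int) -> bool:
    """Check a value against the facets of the '6bitInteger' simpleType:
    a nonNegativeInteger restricted to the range [0, 63]."""
    return 0 <= value <= 63

print(valid_6bit_integer(63), valid_6bit_integer(64))  # True False
```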
A complex type definition is a set of attribute
declarations and a content type, applicable to the
attributes and children of an element declared to
be of this complex type.
<complexType name="Organization">
  <element name="OrganizationName" type="string"/>
  <element name="ContactPerson" type="Individual"
           minOccurs="0" maxOccurs="unbounded"/>
  <element name="Address" type="Place"
           minOccurs="0"/>
  <attribute name="id" type="ID" use="required"/>
</complexType>
XML Built-in Primitive Datatypes (Schema: Datatypes):
• string;
• boolean;
• float;
• double;
• decimal;
• timeDuration [ISO 8601];
• recurringDuration;
• binary;
• uriReference;
• ID;
• IDREF;
• ENTITY;
• NOTATION;
• QName.
MPEG-7 Structural Extensions
Defining Arrays and Matrices
<simpleType name="IntegerMatrix3x4"
            base="integer" derivedBy="list">
  <mpeg7:dimension value="3 4"/>
</simpleType>

<element name='IntegerMatrix3x4'
         type='IntegerMatrix3x4'/>

<IntegerMatrix3x4>
5 8 9 4
6 7 8 2
7 1 3 5
</IntegerMatrix3x4>
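A list-derived type like this carries its values as whitespace-separated text, and the mpeg7:dimension facet tells a consumer how to reshape them. An illustrative (non-normative) sketch of how such an instance could be read:

```python
# Parse the IntegerMatrix3x4 instance above and reshape its flat value
# list into 3 rows of 4 columns, as dictated by mpeg7:dimension "3 4".
import xml.etree.ElementTree as ET

doc = "<IntegerMatrix3x4>5 8 9 4 6 7 8 2 7 1 3 5</IntegerMatrix3x4>"
values = [int(v) for v in ET.fromstring(doc).text.split()]

rows, cols = 3, 4  # from <mpeg7:dimension value="3 4"/>
matrix = [values[r * cols:(r + 1) * cols] for r in range(rows)]
print(matrix)  # [[5, 8, 9, 4], [6, 7, 8, 2], [7, 1, 3, 5]]
```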
VECTORS
<!-- Definition of "Vector of integers" -->
<simpleType name="listOfInteger" base="integer"
            derivedBy="list"/>
<complexType name="VectorI" base="listOfInteger"
             derivedBy="extension">
  <attribute ref="mpeg7:dim"/>
</complexType>

<!-- Definition of "Vector of reals" -->
<simpleType name="listOfFloat" base="float"
            derivedBy="list"/>
<complexType name="VectorR" base="listOfFloat"
             derivedBy="extension">
  <attribute ref="mpeg7:dim"/>
</complexType>
MPEG-7 Visual
MPEG-7 Visual Description Tools included in
the standard consist of basic structures and
Descriptors that cover the following basic
visual features: Color, Texture, Shape, Motion,
Localization, and Face recognition. Each
category consists of elementary and
sophisticated Descriptors.
Visual Descriptors
Basic Descriptors
There are five Visual-related Basic structures: the
Grid layout, Time series, Multiple view,
Spatial 2D coordinates, and Temporal interpolation.
Color Descriptors
There are seven Color Descriptors: Color space,
Color Quantization, Dominant Colors, Scalable Color,
Color Layout, Color-Structure, and GoF/GoP Color.
VISUAL DESCRIPTORS AND
DESCRIPTION SCHEMES
Descriptors: representations of features
that define the syntax and the semantics of
each feature representation.
Description Schemes: specify the structure
and semantics of the relationships between
their components, which may be both
Descriptors and Description Schemes.
BASIC STRUCTURES
GRID LAYOUT
The grid layout is a splitting of the image into a set of rectangular
regions, so that each region can be described separately. Each region of
the grid can be described in terms of other descriptors such as color or
texture.
DDL representation syntax
<element name=”GridLayout”>
  <complexType content=”empty”>
    <attribute name=”PartNumberH”
               datatype=”positiveInteger”/>
    <attribute name=”PartNumberV”
               datatype=”positiveInteger”/>
  </complexType>
</element>
PartNumberH (16 bit)
This field contains the number of horizontal
partitions in the grid over the image.
PartNumberV (16 bit)
This field contains the number of vertical partitions
in the grid over the image.
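The grid-layout idea can be sketched in a few lines: split the image into PartNumberH × PartNumberV rectangular cells and describe each cell separately (here only by its mean pixel value; function and variable names are illustrative, not the normative tool):

```python
def grid_regions(image, parts_h, parts_v):
    """Yield (row, col, mean value) for each grid cell of a 2-D image
    given as a list of equally long rows of pixel values."""
    height, width = len(image), len(image[0])
    cell_h, cell_w = height // parts_v, width // parts_h
    for r in range(parts_v):
        for c in range(parts_h):
            cell = [image[y][x]
                    for y in range(r * cell_h, (r + 1) * cell_h)
                    for x in range(c * cell_w, (c + 1) * cell_w)]
            yield r, c, sum(cell) / len(cell)

# A 4x4 toy image with four visually distinct quadrants:
image = [[0, 0, 9, 9],
         [0, 0, 9, 9],
         [5, 5, 1, 1],
         [5, 5, 1, 1]]
print(list(grid_regions(image, 2, 2)))
```

Each cell could then be fed to any other descriptor (color, texture, ...), which is exactly the point of the grid layout.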
COLOR
Color space: several color spaces are supported
- RGB
- YCbCr
- HSV
- HMMD
- Linear transformation matrix with reference to RGB
- Monochrome
DDL representation syntax
<element name=”ColorSpace”>
  <complexType>
    <choice>
      <element name=”RGB” type=”emptyType”/>
      <element name=”YCbCr” type=”emptyType”/>
      <element name=”HSV” type=”emptyType”/>
      <element name=”HMMD” type=”emptyType”/>
      <element name=”LinearMatrix”>
        <complexType base=”IntegerMatrix”
                     derivedBy=”restriction”>
          <!-- matrix element as 16-bit unsigned integer -->
          <minInclusive value=”0”/>
          <maxInclusive value=”65535”/>
          <attribute name=”sizes”
                     use=”fixed” value=”3 3”/>
        </complexType>
      </element>
      <element name=”Monochrome” type=”emptyType”/>
    </choice>
  </complexType>
</element>
[Figure: HMMD space representation; components: Hue, Max, Min, Diff, and Sum (ranging from White Color to Black Color)]
Color quantization
This descriptor defines the quantization of a
color space. The following quantization types
are supported: uniform, subspace_uniform,
subspace_nonuniform and lookup_table.
Dominant color
This descriptor specifies a set of dominant
colors in an arbitrarily-shaped region. It
targets content-based retrieval for color, either
for the whole image or for an arbitrary region
(rectangular or irregular).
DDL representation syntax
<element name=”DominantColor”>
  <complexType>
    <element ref=”ColorSpace”/>
    <element ref=”ColorQuantization”/>
    <element name=”DomColorValues”
             minOccursPar=”DomColorsNumber”>
      <complexType>
        <element name=”Percentage”
                 type=”unsigned5”/>
        <element name=”ColorValueIndex”>
          <simpleType base=”unsigned12”
                      derivedBy=”list”>
            <length valuePar=”ColorSpaceDim”/>
          </simpleType>
        </element>
        <element name=”ColorVariance”
                 minOccurs=”0” maxOccurs=”1”>
          <simpleType base=”unsigned1”
                      derivedBy=”list”>
            <length valuePar=”ColorSpaceDim”/>
          </simpleType>
        </element>
      </complexType>
    </element>
  </complexType>
</element>
Descriptor semantics
DomColorsNumber
This element specifies the number of dominant colors in the region.
The maximum allowed number of dominant colors is 8, the
minimum number of dominant colors is 1.
VariancePresent
This is a flag used only in binary representation that signals the
presence of the color variances in the descriptor.
SpatialCoherency
The image spatial variance (coherency) per dominant color
captures whether or not a given dominant color is coherent and
appears to be a solid color in the given image region.
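A crude, non-normative sketch of the dominant-color idea: quantize pixel colors and report the most frequent bins with their percentages. The real descriptor uses clustering in a chosen color space plus the semantics above; everything here (quantization by uniform bins, the parameter names) is an illustrative simplification:

```python
from collections import Counter

def dominant_colors(pixels, levels=4, max_colors=8):
    """pixels: iterable of (r, g, b) with components in 0..255.
    Returns up to max_colors (quantized color, percentage) pairs,
    most frequent first; max_colors=8 mirrors the descriptor's cap."""
    step = 256 // levels
    quantized = [(r // step, g // step, b // step) for r, g, b in pixels]
    counts = Counter(quantized).most_common(max_colors)
    total = len(quantized)
    return [(color, count / total) for color, count in counts]

# A toy region that is 60% red-ish, 30% green-ish, 10% blue-ish:
pixels = [(250, 10, 10)] * 6 + [(10, 250, 10)] * 3 + [(10, 10, 250)]
print(dominant_colors(pixels))
```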
[Figure: non-coherent and coherent regions]
DESCRIPTORS ALREADY DEFINED
FOR THE FOLLOWING ATTRIBUTES:
COLOR (COLOR SPACE, QUANTIZATION,
DOMINANT COLOR, SCALABLE COLOR,
COLOR LAYOUT, COLOR STRUCTURE)
COLOR HISTOGRAM FOR GROUP OF FRAMES
TEXTURE (HOMOGENEOUS TEXTURE, TEXTURE
BROWSING, EDGE HISTOGRAM)
SHAPE (REGION SHAPE, CONTOUR SHAPE)
MOTION (CAMERA MOTION, MOTION
TRAJECTORY, PARAMETRIC MOTION,
MOTION ACTIVITY)
LOCALIZATION (REGION LOCATOR,
SPATIO-TEMPORAL LOCATOR (INCLUDES
FigureTrajectory, ParameterTrajectory))
TEXTURE
Homogeneous texture
This descriptor provides similarity based image-to-image matching for texture
image databases. In order to describe the image texture, energy and energy
deviation feature values are extracted from a frequency layout and are used to
constitute a texture feature vector for similarity-based retrieval.
[Figure: 30 angular channels Ci (channel number i) of the frequency layout, over radial frequency ω and angle θ]
THE ENERGY FUNCTION IS DEFINED AS FOLLOWS:

p_i = Σ_{ω=0}^{1} Σ_{θ=0}^{360} [ G_{P_{s,r}}(ω, θ) · P(ω, θ) ]²

P(ω, θ) is the Fourier transform of the image represented in the
polar frequency domain, and G is the Gaussian weighting function of
channel (s, r):

G_{P_{s,r}}(ω, θ) = exp( −(ω − ω_s)² / (2 σ²_{ω_s}) ) · exp( −(θ − θ_r)² / (2 σ²_{θ_r}) )

The energy e_i in channel i is then the log-compressed sum:

e_i = log10(1 + p_i)
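The per-channel energy computation can be sketched directly from the formulas above: weight a polar spectrum P(ω, θ) by the Gaussian centred on the channel, sum the squared products, and log-compress. The toy spectrum and the σ values below are illustrative choices; the normative descriptor fixes the channel centres and deviations:

```python
import math

def channel_energy(P, w_s, t_r, sigma_w, sigma_t):
    """e_i = log10(1 + p_i) for one channel centred at (w_s, t_r).
    P: dict mapping (omega, theta_degrees) -> spectrum magnitude."""
    p_i = 0.0
    for (w, t), mag in P.items():
        g = (math.exp(-((w - w_s) ** 2) / (2 * sigma_w ** 2)) *
             math.exp(-((t - t_r) ** 2) / (2 * sigma_t ** 2)))
        p_i += (g * mag) ** 2
    return math.log10(1 + p_i)

# Toy spectrum with almost all its energy near (omega=0.5, theta=30 deg):
P = {(0.5, 30): 10.0, (0.9, 200): 0.1}
on_target = channel_energy(P, 0.5, 30, 0.1, 15)
off_target = channel_energy(P, 0.9, 200, 0.1, 15)
print(on_target > off_target)  # the matching channel captures more energy
```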
DDL representation syntax

<element name=”HomogeneousTexture”>
  <complexType>
    <attribute name=”FeatureType” type=”boolean”/>
    <element name=”AverageFeatureValue” type=”unsigned8”/>
    <element name=”StandardDeviationFeatureValue”
             type=”unsigned8”/>
    <element name=”EnergyComponents”>
      <simpleType base=”unsigned8” derivedBy=”list”>
        <length value=”30”/>
      </simpleType>
    </element>
    <element name=”EnergyDeviationComponents”
             minOccurs=”0” maxOccurs=”1”>
      <simpleType base=”unsigned8” derivedBy=”list”>
        <length value=”30”/>
      </simpleType>
    </element>
  </complexType>
</element>
Texture browsing
This descriptor relates to a perceptual characterisation
of texture, similar to a human characterisation, in terms of
regularity, coarseness and directionality. This
representation is useful for browsing applications and
coarse classification of textures. We refer to this as the
Perceptual Browsing Component (PBC).
[Figure: example textures with regularity codes 11, 10, 01, 00]
RegularityComponent
This element represents the texture’s regularity. A texture is said to be regular if
it is a periodic pattern with clear directionalities and a uniform scale.
DirectionComponent
This element represents the dominant direction characterising the
texture directionality.
ScaleComponent
This element represents the coarseness of the texture associated with
the corresponding dominant orientation specified in the
DirectionComponent.
Edge histogram
The edge histogram descriptor represents the spatial distribution of
five types of edges, namely four directional edges and one non-directional edge:
[Figure: a) vertical edge, b) horizontal edge, c) 45-degree edge, d) 135-degree edge, e) non-directional edge]
DDL representation syntax

<element name=”EdgeHistogram”>
  <complexType>
    <element name=”BinCounts”>
      <simpleType base=”unsigned8” derivedBy=”list”>
        <length value=”80”/>
      </simpleType>
    </element>
  </complexType>
</element>
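The edge-type decision behind the histogram can be sketched as follows: each 2×2 image block is filtered with five directional kernels and assigned to the strongest-responding edge type, if any response exceeds a threshold. The kernel values follow the usual edge-histogram filters; the threshold here is an arbitrary illustrative choice:

```python
import math

# 2x2 directional filters, flattened as (a0, a1, a2, a3):
FILTERS = {
    "vertical":        (1, -1, 1, -1),
    "horizontal":      (1, 1, -1, -1),
    "45-degree":       (math.sqrt(2), 0, 0, -math.sqrt(2)),
    "135-degree":      (0, math.sqrt(2), -math.sqrt(2), 0),
    "non-directional": (2, -2, -2, 2),
}

def edge_type(block, threshold=10):
    """block: a 2x2 block flattened as (a0, a1, a2, a3); returns the
    edge-type name, or None for a monotone (edge-free) block."""
    strengths = {name: abs(sum(f * a for f, a in zip(coeffs, block)))
                 for name, coeffs in FILTERS.items()}
    best = max(strengths, key=strengths.get)
    return best if strengths[best] >= threshold else None

print(edge_type((200, 10, 200, 10)))   # strong left/right contrast
print(edge_type((50, 50, 50, 50)))     # monotone block -> None
```

Counting the winning edge type per block over 16 sub-images is what fills the 80 BinCounts above (16 sub-images × 5 edge types).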
SHAPE
Region shape
The shape of an object may consist of either a single
connected region or a set of disjoint regions, as well as some
holes in the object.
SHAPES
The region-based shape descriptor utilizes a set of ART (Angular
Radial Transform) coefficients. ART is a 2-D complex transform
defined on a unit disk in polar coordinates,
F_nm = < V_nm(ρ, θ), f(ρ, θ) > = ∫₀^{2π} ∫₀^{1} V_nm*(ρ, θ) f(ρ, θ) ρ dρ dθ

f(ρ, θ) is an image function in polar coordinates, and V_nm(ρ, θ)
is the ART basis function. The ART basis functions are separable
along the angular and radial directions, i.e.,

V_nm(ρ, θ) = A_m(θ) R_n(ρ)

The angular and radial basis functions are defined
as follows:

A_m(θ) = (1 / 2π) exp(jmθ)

R_n(ρ) = 1              for n = 0
R_n(ρ) = 2 cos(πnρ)     for n ≠ 0
[Figure: the ART basis functions]
CONTOUR SHAPE
The object contour shape descriptor describes a closed
contour of a 2D object or region in an image or video
sequence.
The object contour-based shape descriptor is based on the Curvature Scale Space
(CSS) representation of the contour.
HOW IS THE CONTOUR CALCULATED?
N equidistant points are selected on the contour, starting from an
arbitrary point on the contour and following the contour
clockwise. The x-coordinates of the selected N points are grouped
together into one series X, and the y-coordinates into another
series Y. The contour is then gradually smoothed by repetitive
application of a low-pass filter with the kernel (0.25, 0.5, 0.25) to the X
and Y coordinates of the selected N contour points.
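One pass of that smoothing step is easy to sketch: apply the (0.25, 0.5, 0.25) kernel to a coordinate series with circular wrap-around, since the contour is closed (illustrative code, not the normative extraction procedure):

```python
def smooth_closed(coords):
    """One low-pass pass with kernel (0.25, 0.5, 0.25) over a closed
    coordinate series; indices wrap around because the contour closes."""
    n = len(coords)
    return [0.25 * coords[(i - 1) % n] + 0.5 * coords[i]
            + 0.25 * coords[(i + 1) % n] for i in range(n)]

X = [0.0, 10.0, 0.0, 10.0]   # a deliberately jagged coordinate series
print(smooth_closed(X))      # [5.0, 5.0, 5.0, 5.0]
```

Repeating this pass is what gradually flattens the contour in the CSS representation.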
GlobalCurvatureVector
This element specifies global parameters of the contour, namely the
Eccentricity and Circularity.
circularity = perimeter² / area

FOR A CIRCLE, CIRCULARITY IS

C_circle = (2πr)² / (πr²) = 4π

eccentricity = [ i20 + i02 + √( i20² + i02² − 2·i20·i02 + 4·i11² ) ]
             / [ i20 + i02 − √( i20² + i02² − 2·i20·i02 + 4·i11² ) ]

where the central moments are

i02 = Σ (y − y_c)²
i11 = Σ (x − x_c)(y − y_c)
i20 = Σ (x − x_c)²
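The two global parameters are straightforward to compute from a set of region points. A sketch (illustrative only; how perimeter and area are measured in practice depends on the contour representation):

```python
import math

def circularity(perimeter, area):
    """perimeter^2 / area; equals 4*pi for a perfect circle."""
    return perimeter ** 2 / area

def eccentricity(points):
    """Moment-based eccentricity of a point set (x, y pairs)."""
    xc = sum(x for x, _ in points) / len(points)
    yc = sum(y for _, y in points) / len(points)
    i20 = sum((x - xc) ** 2 for x, _ in points)
    i02 = sum((y - yc) ** 2 for _, y in points)
    i11 = sum((x - xc) * (y - yc) for x, y in points)
    root = math.sqrt(i20 ** 2 + i02 ** 2 - 2 * i20 * i02 + 4 * i11 ** 2)
    return (i20 + i02 + root) / (i20 + i02 - root)

# A circle sampled at 360 points is maximally compact and symmetric:
r = 1.0
circle = [(r * math.cos(t * math.pi / 180), r * math.sin(t * math.pi / 180))
          for t in range(360)]
print(round(eccentricity(circle), 3))  # close to 1.0 for a circle
```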
MOTION
Camera motion
This descriptor characterizes 3-D camera motion parameters. It is based on 3-D
camera motion parameter information, which can be automatically extracted or
generated by capture devices
[Figure: camera motion types: pan left/right, tilt up/down, boom up/down, dolly forward/backward, track left/right, roll]
Motion trajectory
Motion Trajectory is a high-level feature associated with a moving region, defined
as a spatio-temporal localization of one of its representative points (such as
the centroid).
Parametric motion
This descriptor addresses the motion of objects in video sequences, as well as
global motion.
Motion activity
The activity descriptor captures the intuitive notion of “intensity of action” or “pace of
action” in a video segment. Examples of high activity include scenes such as “goal
scoring in a soccer match”, “scoring in a baseball game” or “a high-speed car chase”.
On the other hand, scenes such as “news reader shot”, “an interview scene” or
“a still shot” are perceived as low-action shots.
Localization
Region locator
This descriptor enables localization of regions within images or frames by
specifying them with a brief and scalable representation of a Box or a Polygon.
Spatio-temporal locator
The SpatioTemporalLocator describes spatio-temporal regions in a video sequence
and provides localization functionality especially for hypermedia applications.
It consists of FigureTrajectory and ParameterTrajectory.
[Figure: reference regions and their motion over time]
FigureTrajectory
FigureTrajectory describes a spatio-temporal region by trajectories of the
representative points of a reference region. Reference regions are
represented by three kinds of figures: rectangles, ellipses and polygons.
[Figure: representative-point trajectories, each described with a TemporalInterpolation D]
ParameterTrajectory
[Figure: motion parameters a1, a2, a3, a4 over time, each described with a TemporalInterpolation D]
ParameterTrajectory describes a spatio-temporal region by a reference region and
trajectories of motion parameters. Reference regions are described using the
RegionLocator descriptor. The motion parameters and the parametric motion model
specify a mapping from the reference region to a region of an arbitrary frame.
AUDIO DESCRIPTORS
Audio Framework. The main hook into a description for all
audio description schemes and descriptors.
Spoken Content DS. A DS representing the output of
Automatic Speech Recognition (ASR).
Timbre Description. A collection of descriptors describing the
perceptual features of instrument sounds.
Audio Independent Components. A DS containing an
Independent Component Analysis (ICA) of audio.
EXAMPLES
AudioPowerType
describes the temporally-smoothed instantaneous power.

<!-- definition of "AudioPowerType" -->
<complexType name="AudioPowerType"
             base="mpeg7:AudioSampledType"
             derivedBy="extension">
  <element name="Value"
           type="mpeg7:SeriesOfScalarType"
           maxOccurs="unbounded"/>
</complexType>
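The quantity the type describes can be sketched simply: instantaneous power is the squared sample value, smoothed by averaging over a short window. The window length here is an illustrative choice, not the normative one:

```python
def smoothed_power(samples, window=4):
    """Average the instantaneous power x[n]^2 over consecutive
    non-overlapping windows of the given length."""
    power = [s * s for s in samples]
    return [sum(power[i:i + window]) / window
            for i in range(0, len(power) - window + 1, window)]

# A quiet stretch followed by a louder one:
samples = [0.0, 1.0, 0.0, -1.0, 2.0, -2.0, 2.0, -2.0]
print(smoothed_power(samples))  # [0.5, 4.0]
```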
AudioSpectrumCentroidType
describes the center of gravity of the log-frequency
power spectrum.

<!-- Center of gravity of log-frequency power spectrum -->
<complexType name="AudioSpectrumCentroidType"
             base="mpeg7:AudioSampledType"
             derivedBy="extension">
  <element name="Value" type="mpeg7:SeriesOfScalarType"
           maxOccurs="unbounded"/>
</complexType>
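The centroid idea can also be sketched in a few lines: a power-weighted mean of log2 frequency, i.e. in octaves relative to a reference frequency. The 1 kHz reference and the simple pair-list spectrum below are illustrative assumptions, not the normative computation:

```python
import math

def spectrum_centroid(spectrum, ref_hz=1000.0):
    """Power-weighted mean of log2(f / ref_hz).
    spectrum: list of (frequency_hz, power) pairs."""
    total = sum(p for _, p in spectrum)
    return sum(p * math.log2(f / ref_hz) for f, p in spectrum) / total

# All the power at 2 kHz lies exactly one octave above the reference:
print(spectrum_centroid([(2000.0, 1.0)]))  # 1.0
```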
THERE ARE QUITE A FEW AUDIO Ds:
• AudioDescriptorType
• AudioSampledType
• AudioWaveformEnvelopeType
• AudioSpectrumEnvelopeType
• AudioPowerType
• AudioSpectrumCentroidType
• AudioSpectrumSpreadType
• AudioFundamentalFrequencyType
• AudioHarmonicityType
SPOKEN CONTENT DESCRIPTORS
Spoken Content DS consists of combined word and phone lattices for
each speaker in an audio stream
The DS can be used for two broad classes of retrieval scenario: indexing
into and retrieval of an audio stream, and indexing of multimedia objects
annotated with speech.
EXAMPLE APPLICATIONS
Recall of audio/video data by memorable spoken events. An example
would be a film or video recording where a character or person spoke a
particular word or sequence of words. The source media would be
known, and the query would return a position in the media.
Spoken Document Retrieval. In this case, there is a database consisting
of separate spoken documents. The result of the query is the relevant
documents, and optionally the position in those documents of the
matched speech.
[Figure: lattice structure for a hypothetical (combined phone and
word) decoding of the expression “Taj Mahal drawing …”. It
is assumed that the name ‘Taj Mahal’ is out of the
vocabulary of the ASR system.]
<!-- Definition of the SpokenContentHeader -->
<!-- The header consists of the following components: -->
<!-- 1. The speakers which comprise the audio. There must be at least one speaker. -->
<!-- 2. The phone lexicons used to represent the speech. -->
<!-- 3. The word lexicons used to represent the speech. -->
<!-- Note: -->
<!-- a) A word or phone lexicon may be used by more than one speaker. -->
<!-- b) Although there must be at least one word or phone lexicon. -->
TIMBRE DESCRIPTOR
Timbre Descriptors aim at describing the perceptual features of instrument
sounds. Timbre is currently defined in the literature as the perceptual
features that make two sounds with the same pitch and loudness
sound different. The aim of the Timbre DS is to describe these perceptual
features with a reduced set of descriptors. The descriptors relate to
notions such as “attack”, “brightness” or “richness” of a sound.
<DSType name='TimbreDS'>
  <SubDSof='AudioDS'/>
  <DtypeRef='LogAttackTimeD'
            minoccurs='0' maxoccurs='1'/>
  <DtypeRef='HarmonicSpectralCentroidD'
            minoccurs='0' maxoccurs='1'/>
  <DtypeRef='HarmonicSpectralDeviationD'
            minoccurs='0' maxoccurs='1'/>
  <DtypeRef='HarmonicSpectralStdD'
            minoccurs='0' maxoccurs='1'/>
  <DtypeRef='HarmonicSpectralVariationD'
            minoccurs='0' maxoccurs='1'/>
  <DtypeRef='SpectralCentroidD'
            minoccurs='0' maxoccurs='1'/>
  <DtypeRef='TemporalCentroidD'
            minoccurs='0' maxoccurs='1'/>
</DSType>
DEFINITIONS ARE GIVEN. EXAMPLE: LOG-ATTACK-TIME

lat = log10(T1 − T0)

where
T0 is the time the signal starts;
T1 is the time the signal reaches its sustained part.

[Figure: signal envelope(t), with T0 and T1 marked]
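The log-attack-time formula above is simple once T0 and T1 are estimated from an envelope. In this sketch, T0 is where the envelope first exceeds a small fraction of its peak and T1 where it first reaches a large fraction of it; both threshold fractions are illustrative choices, not the normative definition:

```python
import math

def log_attack_time(envelope, dt, start_frac=0.02, sustain_frac=0.9):
    """lat = log10(T1 - T0), with T0 and T1 estimated by thresholding
    an envelope sampled every dt seconds."""
    peak = max(envelope)
    t0 = next(i for i, e in enumerate(envelope)
              if e >= start_frac * peak) * dt
    t1 = next(i for i, e in enumerate(envelope)
              if e >= sustain_frac * peak) * dt
    return math.log10(t1 - t0)

# A toy envelope rising over a few hundredths of a second:
envelope = [0.0, 0.1, 0.4, 0.8, 1.0, 1.0, 0.9]
print(log_attack_time(envelope, dt=0.01))
```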
ESTIMATION OF SOUND TIMBRE DESCRIPTORS