HUCAA 2014
GPU-ACCELERATED HMM
FOR SPEECH RECOGNITION
Leiming Yu, Yash Ukidave and David Kaeli
ECE, Northeastern University
Outline
• Background & Motivation
• HMM
• GPGPU
• Results
• Future Work
Background
• Translate Speech to Text
• Speaker Dependent / Speaker Independent
• Applications
* Natural Language Processing
* Home Automation
* In-car Voice Control
* Speaker Verification
* Automated Banking
* Personal Intelligent Assistants (Apple Siri, Samsung S Voice)
* etc.
[http://www.kecl.ntt.co.jp]
DTW
Dynamic Time Warping
A template-based approach to measure similarity between two
temporal sequences which may vary in time or speed.
[opticalengineering.spiedigitallibrary.org]
DTW
Dynamic Time Warping
DTW[0, 0] := 0
DTW[i, 0] := infinity for i >= 1; DTW[0, j] := infinity for j >= 1
for i := 1 to n
    for j := 1 to m
        cost := D(s[i], t[j])
        DTW[i, j] := cost + minimum(DTW[i-1, j  ],
                                    DTW[i  , j-1],
                                    DTW[i-1, j-1])
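The recurrence above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `dtw_distance` and the default absolute-difference local cost are assumptions chosen for the example, and the boundary initialization (zero at the origin, infinity elsewhere) is the standard one rather than something shown on the slide.

```python
import math


def dtw_distance(s, t, dist=lambda a, b: abs(a - b)):
    """Return the DTW alignment cost between sequences s and t.

    `dist` is the local cost D(s[i], t[j]); absolute difference is an
    assumption made for this sketch.
    """
    n, m = len(s), len(t)
    # dtw[i][j] holds the best cost of aligning s[:i] with t[:j].
    dtw = [[math.inf] * (m + 1) for _ in range(n + 1)]
    dtw[0][0] = 0.0  # two empty prefixes align at zero cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(s[i - 1], t[j - 1])
            dtw[i][j] = cost + min(dtw[i - 1][j],      # advance in s only
                                   dtw[i][j - 1],      # advance in t only
                                   dtw[i - 1][j - 1])  # advance in both
    return dtw[n][m]


# The second sequence repeats one sample, which DTW absorbs at no cost.
stretched = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([0, 0], [1, 1])
```

Here `stretched` is 0 because the repeated sample is matched twice, while `shifted` is 2 because every aligned pair differs by 1.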
DTW Pros:
1) Handles timing variation
2) Recognizes speech at reasonable cost
DTW Cons:
1) Template selection
2) Endpoint detection (VAD, acoustic noise)
3) Words with weak fricatives, which are close to the acoustic background
Neural Networks
Algorithms that mimic the brain.
Simplified Interpretation:
* takes a set of input features
* goes through a set of hidden layers
* produces the posterior probabilities as the output
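The three steps above can be illustrated with a small feed-forward pass. This is only a sketch under stated assumptions: sigmoid hidden units, a softmax output layer, and the toy weights in `toy_layers` are all chosen for illustration and do not come from the slides.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def softmax(z):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def dense(x, weights, bias):
    """Affine map: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def forward(x, layers):
    """layers is a list of (weights, bias) pairs.

    Hidden layers use a sigmoid; the last layer uses softmax so the
    outputs can be read as posterior probabilities over classes.
    """
    a = x
    for weights, bias in layers[:-1]:
        a = [sigmoid(v) for v in dense(a, weights, bias)]
    weights, bias = layers[-1]
    return softmax(dense(a, weights, bias))


# Hypothetical toy network: 2 input features -> 3 hidden units -> 2 classes.
toy_layers = [
    ([[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]], [0.0, 0.1, -0.1]),
    ([[0.2, -0.5, 0.3], [0.7, 0.1, -0.4]], [0.0, 0.0]),
]
posteriors = forward([1.0, 2.0], toy_layers)
```

`posteriors` has one entry per class and sums to 1, which is what lets it be interpreted as a posterior distribution.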
Neural Networks
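[Figure: neural network for multi-class classification (Bike, Pedestrian, Car, Parking Meter); example case: "If Pedestrian"]
a_i^(j): "activation" of unit i in layer j
Θ^(j): matrix of weights controlling the function mapping from layer j to layer j+1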
[Machine Learning, Coursera]
Neural Networks
Equation Example
Neural Networks Example
Hint:
* effective at recognizing individual phones and isolated words as short-time units
* not ideal for continuous recognition tasks, largely due to their poor ability to model temporal dependencies
Hidden Markov Model
In a Hidden Markov Model,
* the states are hidden
* outputs that depend on the states are visible
x — states
y — possible observations
a — state transition probabilities
b — output probabilities
[wikipedia]
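To make the roles of x, y, a and b concrete, the sketch below samples a hidden state path and the visible observations it emits. All names and numbers here (`STATES`, `OBSERVATIONS`, `INITIAL`, `A`, `B`) are hypothetical, and the initial-state distribution is an extra assumption needed to start the chain.

```python
import random

STATES = ("x1", "x2")                # hidden states
OBSERVATIONS = ("y1", "y2", "y3")    # possible observations

# Illustrative probabilities only; each row sums to 1.
INITIAL = {"x1": 0.6, "x2": 0.4}
A = {"x1": {"x1": 0.7, "x2": 0.3},   # a: state transition probabilities
     "x2": {"x1": 0.4, "x2": 0.6}}
B = {"x1": {"y1": 0.5, "y2": 0.4, "y3": 0.1},   # b: output probabilities
     "x2": {"y1": 0.1, "y2": 0.3, "y3": 0.6}}


def draw(dist, rng):
    """Draw one key from a {key: probability} mapping."""
    keys = list(dist)
    return rng.choices(keys, weights=[dist[k] for k in keys])[0]


def sample(length, rng):
    """Return (hidden_path, visible_outputs), each of the given length."""
    hidden, visible = [], []
    state = draw(INITIAL, rng)
    for _ in range(length):
        hidden.append(state)
        visible.append(draw(B[state], rng))  # output depends on current state
        state = draw(A[state], rng)          # next state depends on current state
    return hidden, visible


hidden_path, visible_outputs = sample(5, random.Random(0))
```

A recognizer only sees `visible_outputs`; recovering information about `hidden_path` from them is the inference problem an HMM addresses.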
Hidden Markov Model
The temporal transition of the hidden states fits well with the nature of phoneme transition.
Hint:
* Handles the temporal variability of speech well
* Gaussian mixture models (GMMs), controlled by the hidden variables, determine how well an HMM can represent the acoustic input
* Can be combined with NNs in a hybrid system to leverage the strengths of each modeling technique
Motivation
• Parallel Architecture
multi-core CPU to many-core GPU (graphics + general purpose)
• Massive Parallelism in Speech Recognition Systems
Neural networks, HMMs, etc. are both computation- and memory-intensive
• GPGPU Evolution
* Dynamic Parallelism
* Concurrent Kernel Execution
* Hyper-Q
* Device Partitioning
* Virtual Memory Addressing
* GPU-GPU Data Transfer, etc.
• Previous work
• Our goal is to use modern GPU features to accelerate speech recognition
Hidden Markov Model
Markov chains and processes are named after Andrey Andreyevich Markov (1856-1922),
a Russian mathematician whose doctoral advisor was Pafnuty Chebyshev.
In 1966, Leonard Baum described the underlying mathematical theory.
In 1989, Lawrence Rabiner wrote a paper giving the most comprehensive description of it.
Hidden Markov Model
HMM Stages
* causal transition probabilities between states
* each observation depends on the current state only, not on its predecessors
Hidden Markov Model
• Forward
• Backward
• Expectation-Maximization
HMM-Forward
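The slide's own equations did not survive extraction. As a hedged stand-in, the following is a minimal sketch of the textbook forward recursion for a discrete-observation HMM. The function name, the list-based layout of `pi`, `A`, `B`, and the toy numbers are assumptions for illustration, not the authors' GPU implementation. No scaling is applied, so long sequences would underflow in practice.

```python
def forward(pi, A, B, obs):
    """Return (alpha, likelihood) for an observation index sequence.

    pi[i]   : initial probability of state i
    A[i][j] : probability of moving from state i to state j
    B[j][o] : probability of emitting symbol o in state j
    alpha[t][j] is the probability of seeing obs[0..t] and being in j at t.
    """
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([B[j][o] * sum(prev[i] * A[i][j] for i in range(n))
                      for j in range(n)])
    return alpha, sum(alpha[-1])


# Hypothetical two-state, three-symbol model.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
alpha, likelihood = forward(pi, A, B, [0, 1])
```

Each time step is a matrix-vector product followed by an element-wise multiply, which is the kind of regular, data-parallel work that maps naturally onto a GPU.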
HMM Backward
[Figure: trellis of the backward step between state i and state j, with time axis t-1, t, t+1, t+2]
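Since the figure's symbols were lost, here is a minimal sketch of the textbook backward recursion, in which the value for state i at time t is accumulated from every state j at time t+1. The function name, data layout, and toy parameters are assumptions for illustration, and no scaling is applied.

```python
def backward(A, B, obs):
    """Return beta, where beta[t][i] is the probability of observing
    obs[t+1:] given that the model is in state i at time t.

    A[i][j] : probability of moving from state i to state j
    B[j][o] : probability of emitting symbol o in state j
    """
    n, T = len(A), len(obs)
    beta = [[1.0] * n for _ in range(T)]  # beta at the final step is 1
    for t in range(T - 2, -1, -1):        # sweep from the end to the start
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n))
    return beta


# Hypothetical two-state, three-symbol model.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
obs = [0, 1]
beta = backward(A, B, obs)

# Combining beta at t = 0 with the initial and output probabilities
# gives the likelihood of the whole observation sequence.
likelihood = sum(pi[i] * B[i][obs[0]] * beta[0][i] for i in range(len(pi)))
```

The likelihood obtained this way should match the one produced by a forward pass over the same model, which is a convenient consistency check.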
HMM-EM
Variable Definitions:
* Initial Probability
* Transition Prob.
* Observation Prob.
* Forward Variable
* Backward Variable
Other Variables During Estimation:
* the estimated state transition probability matrix, epsilon
* the estimated probability of being in a particular state at time t, gamma
* Multivariate Normal Probability Density Function
Update Obs. Prob. from Gaussian Mixture Models
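The estimation variables listed above can be sketched for a discrete-observation HMM as follows. This is a textbook-style illustration, not the authors' code: all function names and toy values are assumptions, the pairwise transition variable (called epsilon on the slide) is named `xi` here as is common in the literature, and discrete output probabilities stand in for the Gaussian mixtures.

```python
def forward(pi, A, B, obs):
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([B[j][o] * sum(prev[i] * A[i][j] for i in range(n))
                      for j in range(n)])
    return alpha


def backward(A, B, obs):
    n, T = len(A), len(obs)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n))
    return beta


def e_step(pi, A, B, obs):
    """Return (gamma, xi) computed from the forward and backward variables."""
    n, T = len(pi), len(obs)
    alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
    likelihood = sum(alpha[-1])
    # gamma[t][i]: probability of being in state i at time t
    gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(n)]
             for t in range(T)]
    # xi[t][i][j]: probability of being in i at t and in j at t + 1
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
            / likelihood for j in range(n)] for i in range(n)]
          for t in range(T - 1)]
    return gamma, xi


def reestimate_transitions(gamma, xi):
    """M-step for A: expected i->j transitions over expected visits to i."""
    n = len(gamma[0])
    new_A = []
    for i in range(n):
        visits = sum(g[i] for g in gamma[:-1])
        new_A.append([sum(x[i][j] for x in xi) / visits for j in range(n)])
    return new_A


# Hypothetical two-state, three-symbol model and observation sequence.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
gamma, xi = e_step(pi, A, B, [0, 1, 2, 1])
new_A = reestimate_transitions(gamma, xi)
```

Two properties are worth checking when porting this to another platform: each `gamma[t]` sums to 1, and summing `xi[t][i][j]` over j recovers `gamma[t][i]`.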
HMM-EM
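The content of this slide was lost in extraction. As a hedged illustration of how an observation probability can be obtained from a Gaussian mixture, the sketch below evaluates a multivariate normal density and a weighted mixture of such densities. Diagonal covariance is an assumption made to keep the example free of matrix inversion, and the function names are hypothetical.

```python
import math


def diag_gaussian_pdf(x, mean, var):
    """Density of a multivariate normal with diagonal covariance.

    x, mean, var are equal-length sequences; var holds the per-dimension
    variances (the diagonal of the covariance matrix).
    """
    d = len(x)
    log_det = sum(math.log(v) for v in var)
    mahalanobis = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    return math.exp(-0.5 * (d * math.log(2.0 * math.pi) + log_det + mahalanobis))


def gmm_pdf(x, weights, means, variances):
    """Observation probability as a weighted sum of Gaussian components."""
    return sum(w * diag_gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))


# Standard normal in one and two dimensions, evaluated at the origin.
one_d = diag_gaussian_pdf([0.0], [0.0], [1.0])
two_d = diag_gaussian_pdf([0.0, 0.0], [0.0, 0.0], [1.0, 1.0])

# A two-component mixture with hypothetical parameters.
mixture = gmm_pdf([0.5, -0.5],
                  weights=[0.3, 0.7],
                  means=[[0.0, 0.0], [1.0, -1.0]],
                  variances=[[1.0, 1.0], [0.5, 2.0]])
```

In a GMM-based HMM, a value like `mixture` would be computed per state and per frame, so the work grows with states, frames, components and feature dimensions.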
GPGPU
Programming Model
GPGPU
GPU Hierarchical Memory System
• Visibility
• Performance Penalty
[http://www.biomedcentral.com]
GPGPU
• Visibility
• Performance Penalty
[www.math-cs.gordon.edu]
GPGPU
GPU-powered Ecosystem
1) Programming Models
* CUDA
* OpenCL
* OpenACC, etc.
2) High Performance Libraries
* cuBLAS
* Thrust
* MAGMA (CUDA/OpenCL/Intel Xeon Phi)
* Armadillo (C++ Linear Algebra Library), drop-in libraries, etc.
3) Tuning/Profiling Tools
* Nvidia: nvprof / nvvp
* AMD: CodeXL
4) Consortium Standards
Heterogeneous System Architecture (HSA) Foundation
Results
Platform Specs
Results
Mitigate Data Transfer Latency
Pinned Memory Size
current process limit:  ulimit -l   (in KB)
hardware limit:         ulimit -H -l
increase the limit:     ulimit -S -l 16384
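The same locked-memory limits can also be read from inside a process. The sketch below uses Python's standard `resource` module (Unix only); the helper name `memlock_limits_kib` is hypothetical. It converts the byte values returned by the system into KiB so they can be compared with the `ulimit -l` output.

```python
import resource


def memlock_limits_kib():
    """Return (soft, hard) locked-memory limits in KiB.

    A value of None means the limit is unlimited.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)

    def to_kib(value):
        if value == resource.RLIM_INFINITY:
            return None
        return value // 1024  # getrlimit reports bytes

    return to_kib(soft), to_kib(hard)


soft_kib, hard_kib = memlock_limits_kib()
print("soft limit (KiB):", soft_kib)
print("hard limit (KiB):", hard_kib)
```

Checking the soft limit before allocating pinned host buffers helps explain allocation failures that come from the process limit rather than from the GPU.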
Results
A Practice to Efficiently Utilize Memory System
Results
Hyper-Q Feature
Results
Running Multiple Word Recognition Tasks
Future Work
• Integrate with Parallel Feature Extraction
• Power-Efficient Implementation and Analysis
• Embedded System Development, Jetson TK1 etc.
• Improve generality, LMs
• Improve robustness, front-end noise cancellation
• Go with the trend!
QUESTIONS?