HUCAA 2014
GPU-ACCELERATED HMM FOR SPEECH RECOGNITION
Leiming Yu, Yash Ukidave and David Kaeli
ECE, Northeastern University

Outline
Background & Motivation
HMM
GPGPU
Results
Future Work

Background
• Translate Speech to Text
• Speaker Dependent vs. Speaker Independent
• Applications
  * Natural Language Processing
  * Home Automation
  * In-car Voice Control
  * Speaker Verification
  * Automated Banking
  * Personal Intelligent Assistants (Apple Siri, Samsung S Voice)
  * etc.
[http://www.kecl.ntt.co.jp]

DTW: Dynamic Time Warping
A template-based approach that measures the similarity between two temporal sequences which may vary in time or speed.
[opticalengineering.spiedigitallibrary.org]

DTW: Dynamic Time Warping
For i := 1 to n
    For j := 1 to m
        cost := D(s[i], t[j])
        DTW[i, j] := cost + minimum(DTW[i-1, j  ],
                                    DTW[i  , j-1],
                                    DTW[i-1, j-1])

DTW Pros:
1) Handles timing variation
2) Recognizes speech at reasonable cost

DTW Cons:
1) Template selection
2) Endpoint detection (VAD, acoustic noise)
3) Words with weak fricatives are hard to separate from the acoustic background

Neural Networks
Algorithms that mimic the brain.
Simplified interpretation:
* take a set of input features
* pass them through a set of hidden layers
* produce the posterior probabilities as the output

Neural Networks
[classification example: Bike, Pedestrian, Car, Parking Meter]
a_i^(j) — "activation" of unit i in layer j
Theta^(j) — matrix of weights controlling the function mapping from layer j to layer j+1
[Machine Learning, Coursera]

Neural Networks: Equation Example

Neural Networks: Example
Hint:
* effective in recognizing individual phones and isolated words as short-time units
* not ideal for continuous recognition tasks, largely due to a poor ability to model temporal dependencies
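The DTW pseudocode above can be written as a short runnable sketch in Python. This is a minimal illustration, not the deck's implementation: the sequences are 1-D toy data and the local distance D is taken to be absolute difference.

```python
def dtw(s, t):
    """Minimal dynamic time warping: min cumulative alignment cost of s vs. t."""
    n, m = len(s), len(t)
    INF = float("inf")
    # DTW[i][j]: min cost of aligning the first i samples of s with the first j of t
    DTW = [[INF] * (m + 1) for _ in range(n + 1)]
    DTW[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])            # local distance D(s[i], t[j])
            DTW[i][j] = cost + min(DTW[i - 1][j],      # insertion
                                   DTW[i][j - 1],      # deletion
                                   DTW[i - 1][j - 1])  # match
    return DTW[n][m]

# Warping absorbs a timing shift: these sequences align at zero cost.
print(dtw([0, 0, 1], [0, 1, 1]))  # → 0.0
```

The zero-cost alignment of the time-shifted pair is exactly the "handles timing variation" advantage listed above.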
Hidden Markov Model
In a Hidden Markov Model:
* the states are hidden
* the outputs that depend on the states are visible
x — states
y — possible observations
a — state transition probabilities
b — output probabilities
[wikipedia]

Hidden Markov Model
The temporal transition of the hidden states fits well with the nature of phoneme transitions.
Hint:
* handles the temporal variability of speech well
* Gaussian mixture models (GMMs), controlled by the hidden variables, determine how well an HMM can represent the acoustic input
* hybridize with NNs to leverage the strengths of each modeling technique

Motivation
• Parallel Architecture
  multi-core CPU to many-core GPU (graphics + general purpose)
• Massive Parallelism in Speech Recognition Systems
  Neural Networks, HMMs, etc., are both computation and memory intensive
• GPGPU Evolution
  * Dynamic Parallelism
  * Concurrent Kernel Execution
  * Hyper-Q
  * Device Partitioning
  * Virtual Memory Addressing
  * GPU-GPU Data Transfer, etc.
• Previous works
• Our goal is to use modern GPU features to accelerate Speech Recognition

Outline
Background & Motivation
HMM
GPGPU
Results
Future Work

Hidden Markov Model
Markov chains and processes are named after Andrey Andreyevich Markov (1856-1922), a Russian mathematician whose doctoral advisor was Pafnuty Chebyshev.
In 1966, Leonard Baum described the underlying mathematical theory.
In 1989, Lawrence Rabiner wrote the paper with the most comprehensive description of it.

Hidden Markov Model
HMM assumptions:
* causal transition probabilities between states
* the observation depends on the current state, not on its predecessors

Hidden Markov Model
Forward / Backward / Expectation-Maximization

HMM-Forward

HMM-Backward
beta_t(i) = sum_j a_ij * b_j(o_{t+1}) * beta_{t+1}(j)
[trellis diagram over time steps t-1, t, t+1, t+2]

HMM-EM
Variable definitions:
* initial probability
* transition probability
* observation probability
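The forward pass named above can be sketched in plain Python. This is a minimal illustration only; the two-state model (pi, A, B) below is made up, and the recursion alpha_t(j) = [sum_i alpha_{t-1}(i) * a_ij] * b_j(o_t) follows the standard textbook formulation.

```python
def forward(pi, A, B, obs):
    """P(O | model) via the forward algorithm.
    pi[j]: initial probability; A[i][j]: transition prob; B[j][o]: observation prob."""
    N = len(pi)
    alpha = [pi[j] * B[j][obs[0]] for j in range(N)]           # initialization
    for t in range(1, len(obs)):                               # induction over time
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][obs[t]]
                 for j in range(N)]
    return sum(alpha)                                          # terminate: sum over final states

# Hypothetical 2-state model with binary observations (illustrative values only)
pi = [0.6, 0.4]
A  = [[0.7, 0.3],
      [0.4, 0.6]]
B  = [[0.5, 0.5],
      [0.1, 0.9]]
print(forward(pi, A, B, [0, 1, 0]))  # ≈ 0.0696
```

Each time step is a dense reduction over all states, which is the per-frame parallelism a GPU kernel exploits.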
* forward variable, alpha
* backward variable, beta

Other variables during estimation:
* the estimated state transition probability, epsilon
* the estimated probability of being in a particular state at time t, gamma
* multivariate normal probability density function

HMM-EM
Update the observation probabilities from the Gaussian Mixture Models.

Outline
Background & Motivation
HMM
GPGPU
Results
Future Work

GPGPU
Programming Model

GPGPU
GPU Hierarchical Memory System
• Visibility
• Performance Penalty
[http://www.biomedcentral.com]
[www.math-cs.gordon.edu]

GPGPU
GPU-powered Ecosystem
1) Programming Models
   * CUDA
   * OpenCL
   * OpenACC, etc.
2) High-Performance Libraries
   * cuBLAS
   * Thrust
   * MAGMA (CUDA/OpenCL/Intel Xeon Phi)
   * Armadillo (C++ linear algebra library), drop-in libraries, etc.
3) Tuning/Profiling Tools
   * NVIDIA: nvprof / nvvp
   * AMD: CodeXL
4) Consortium Standards
   * Heterogeneous System Architecture (HSA) Foundation

Outline
Background & Motivation
HMM
GPGPU
Results
Future Work

Results
Platform Specs

Results
Mitigate Data Transfer Latency: Pinned Memory Size
current process limit: ulimit -l  (in KB)
hardware limit:        ulimit -H -l
increase the limit:    ulimit -S -l 16384

Results
A Practice to Efficiently Utilize the Memory System

Results
Hyper-Q Feature

Results
Running Multiple Word Recognition Tasks

Outline
Background & Motivation
HMM
GPGPU
Results
Future Work

Future Work
• Integrate with parallel feature extraction
• Power-efficiency implementation and analysis
• Embedded system development, Jetson TK1, etc.
• Improve generality: language models (LMs)
• Improve robustness: front-end noise cancellation
• Go with the trend!

QUESTIONS?
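The estimation variables defined above (gamma and epsilon) can be sketched in plain Python from the forward/backward variables. This is a minimal illustration only: alpha and beta are assumed precomputed per time step, and the 2-state model values (A, B) are hypothetical.

```python
def estep(alpha, beta, A, B, obs):
    """E-step quantities, per the definitions above:
    gamma_t(i)     ∝ alpha_t(i) * beta_t(i)                       (state occupancy)
    epsilon_t(i,j) ∝ alpha_t(i) * A[i][j] * B[j][obs[t+1]] * beta_{t+1}(j)
    Each is normalized to sum to 1 at every time step."""
    T, N = len(alpha), len(alpha[0])
    gamma = []
    for t in range(T):
        g = [alpha[t][i] * beta[t][i] for i in range(N)]
        z = sum(g)
        gamma.append([x / z for x in g])
    eps = []
    for t in range(T - 1):
        e = [[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
              for j in range(N)] for i in range(N)]
        z = sum(sum(row) for row in e)
        eps.append([[x / z for x in row] for row in e])
    return gamma, eps

# Hypothetical inputs (illustrative values only; beta at the final step is all ones)
A     = [[0.7, 0.3], [0.4, 0.6]]
B     = [[0.5, 0.5], [0.1, 0.9]]
alpha = [[0.30, 0.04], [0.113, 0.1026]]
beta  = [[0.5, 0.5], [1.0, 1.0]]
gamma, eps = estep(alpha, beta, A, B, [0, 1])
```

These per-time-step normalized products are independent across t, which is what makes the E-step map cleanly onto parallel GPU kernels.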