Report

- A comparison with Neural Networks and Hidden Markov Models - César R. de Souza, Ednaldo B. Pizzolato and Mauro dos Santos Anjo Universidade Federal de São Carlos (Federal University of São Carlos) IBERAMIA 2012 Cartagena de Índias, Colombia 2012 Context, Motivation, Objectives and the Organization of this Presentation 3 Multidisciplinary Computing and Linguistics Ethnologue lists about 130 sign languages existent in the world (LEWIS, 2009) Motivation Objectives Agenda 4 Two fronts Social Aim to improve quality of life for the deaf and increase the social inclusion Scientific Investigation of the distinct interaction methods, computational models and their respective challenges Context Motivation Objectives Agenda 5 This paper Investigate the behavior and applicability of SVMs and HCRFs in the recognition of specific signs from the Brazilian Sign Language Long term Walk towards the creation of a full-fledged recognition system for LIBRAS This work represents a small but important step in achieving this goal Context Motivation Objectives Agenda 6 Introduction Libras - Brazilian Sign Language Literature Review Methods and Tools Support Vector Machines Conditional Random Fields Experiments Results Conclusion Context Motivation Objectives Agenda Structures and the manual alphabet 7 8 Natural language Not mimics Not universal LIBRAS Difficulties Grammar It is not only “a problem of the deaf or a language pathology” (QUADROS & KARNOPP, 2004) Introduction Libras Review Methods Experiments Results Conclusion 9 Highly context-sensitive Same sign may have distinct meanings Interpretation is hard even for humans Introduction Libras Review Methods LIBRAS Difficulties Grammar Experiments Results Conclusion 10 Fingerspelling is only part of the Grammar Needed when explicitly spelling the name of a person or a location Subset of the full-language recognition problem Introduction Libras Review Methods LIBRAS Difficulties Grammar Experiments Results Conclusion 11 Introduction Libras Review Methods Experiments Results Conclusion 12 Literature Layer architectures are common Static gestures x Dynamic gestures One of the best works on LIBRAS handles only the movement aspect of the language (Dias et al.) Introduction Libras Review Methods Experiments Results Conclusion 13 Few studies explore SVMs But many use Neural Networks No studies on HCRFs and LIBRAS Introduction Libras Review Methods Experiments Results Conclusion Example Recognition of a fingerspelled word using a two-layered architecture Pato HMM Sequence classifier P P P A A A T T T O O Static gesture classifier ANN Introduction Libras Review Methods Experiments Results Conclusion 15 YANG, SCLAROFF e LEE, 2009 Multiple layers, SVMs Elmezain, 2011 HCRF, in-air drawing recognition Introduction Libras Review Methods Experiments Results Conclusion Overview of the chosen techniques and reasons for their choice 16 Neural Networks and Support Vector Machines for the detection of static signs 17 18 Find () such that… c () b a Introduction Libras Review Methods Experiments Results Conclusion 19 Neural Networks Biologically inspired McCulloch & Pitts, Rosenblatt, Rumelhart Support Vector Machines Maximum Margin Multiple Classes Introduction Libras Review Methods Experiments Results Conclusion Perceptron Hyperplane decision Linearly separable problems Learning is a ill-posed problem Multiple local minima, ill-conditioning Layer architecture Universal approximator Introduction Libras Review Methods Experiments Results Conclusion 21 Neural Networks Strong theoretical basis Statistical Learning Theory Structural Risk Minimization (SRM) Support Vector Machines Maximum Margin Multiple Classes Introduction Libras Review Methods Experiments Results Conclusion 22 Large-margin classifiers Risk minimization through margin maximization Capacity control through margin control Sparse solutions considering only a few support vectors Neural Networks Support Vector Machines Maximum Margin Multiple Classes Introduction Libras Review Methods Experiments Results Conclusion 23 Problem: binary-only classifier How to generalize to multiple classes? Neural Networks Support Vector Machines Maximum Margin Multiple Classes Introduction Libras Review Methods Experiments Results Conclusion 24 Problem: binary-only classifier How to generalize to multiple classes? Drawbacks Classical approaches Only works when equiprobable Evaluation of c(c-1)/2 machines Non-guaranteed optimum results One-against-all One-against-one Discriminant functions With 27 static gestures, this would result in 351 SVM evaluations each time a new classification is required! Introduction Libras Review Methods Experiments Results Conclusion 25 Problem: binary-only classifier How to generalize to multiple classes? Directed Acyclic Graphs Neural Networks Generalization of Decision Trees, allowing for non-directed cycles Support Vector Machines Maximum Margin Multiple Classes Require at maximum c-1 evaluations So, for 27 static gestures, only 26 SVM evaluations are required. Only 7.4% of the original effort Introduction Libras Review Methods Experiments Results Conclusion 26 Elimination proccess A B C D One class eliminated at a time Candidates A lost D lost A B C B C D B lost C lost D lost B lost Libras C lost D lost D Introduction A B B C C D C lost A lost A lost C Review Methods B Experiments B lost A Results Conclusion 27 However, no matter the model We’ll have (extreme) noise due pose transitions How can we cope with that? Introduction Libras Review Methods Experiments Results Conclusion Hidden Markov Models, Conditional Random Field and Hidden Conditional Random Fields for dynamic gesture recognition. 28 29 Find () such that given extremely noisy sequences of labels, estimate the word being signed. blyrei () hil hello hi bye hmeylrlwo Introduction Libras Review Methods Experiments Results Conclusion 30 Hidden Markov Models (HMMs) Conditional Random Fields (CRFs) Hidden Conditional Random Fields (HCRFs) Introduction Libras Review Methods Experiments Results Conclusion 31 Hidden Markov Models Joint probability model of a observation sequence and its relationship with time Hidden Markov Models (HMMs) , = Conditional Random Fields (CRFs) −1 ( | ) =1 A Introduction Libras Review B Hidden Conditional Random Fields (HCRFs) Methods Experiments Results Conclusion 32 Hidden Markov Models Marginalizing over y, we achieve the observation sequence likelihood Hidden Markov Models (HMMs) = −1 ( | ) Conditional Random Fields (CRFs) =1 Hidden Conditional Random Fields (HCRFs) Which can be used for classification using either the ML or MAP criteria | = Introduction Libras | | Review Methods Experiments Results Conclusion 33 Word 1 (|ω ) Word 2 (|ω ) ω = max ( ) ∈ ω ... (|ω ) Word n One model for each word Introduction Libras Review Methods Experiments Results Conclusion 34 Hidden Markov Models have found great applications in speech recognition Conditional Random Fields (CRFs) However, a fundamental paradigm shift recently occurred in this field Introduction Libras Review Methods Hidden Markov Models (HMMs) Hidden Conditional Random Fields (HCRFs) Experiments Results Conclusion 35 Probability distributions governing speech signals could not be modeled accurately, turning “Bayes decision theory inapplicable under those circumstances” (Juang & Rabiner, 2005) Introduction Libras Review Methods Experiments Hidden Markov Models (HMMs) Conditional Random Fields (CRFs) Hidden Conditional Random Fields (HCRFs) Results Conclusion 36 Hidden Markov Models (HMMs) Conditional Random Fields (CRFs) Hidden Conditional Random Fields (HCRFs) Introduction Libras Review Methods Experiments Results Conclusion 37 Conditional Random Fields Generalization of the Markov models Hidden Markov Models (HMMs) Conditional Random Fields (CRFs) Discriminative Models Model | without incorporating Hidden Conditional Random Fields (HCRFs) Designates a family of MRFs Each new observation originates a new MRF Introduction Libras Review Methods Experiments Results Conclusion HMM Directional models Discriminative Naïve Bayes Graphs Sequence Generative 38 Logistic Regression Linear-chain CRF CRF Infograph based on the tutorial by Sutton, C., McCallum, A., 2007 Introduction Libras Review Methods Experiments Results Conclusion 1 = () ∈ Ψ ∈ Potential Functions Conditional Potential CliquesRandom Fields Generalization of the Markov models 1 () = 39 () ( , ) =1 Parameter vector which can be optimized using gradient Hidden Markov methods Models (HMMs) Ψ ( , ; ) Conditional Random Fields (CRFs) ∈ Ψ ∈ Hidden Conditional Random Fields (HCRFs) () Ψ , ; = ( , ) =1 = Characteristic function vector Ψ ( , ; ) ∈ Ψ ∈ Partition function Introduction Libras Review Methods Experiments Results Conclusion 1 = () 40 () ∈ Ψ ∈ ( , ) =1 Conditional Random Fields , = of the Markov models −1 Generalization =1 ( | ) How do we initialize those models? Reaching HCRFs from aHMM , −1 ( , ) =1 Introduction Libras Review Hidden Markov Models (HMMs) Conditional Random Fields (CRFs) Hidden Conditional Random Fields (HCRFs) Methods Experiments Results Conclusion 1 = () , = ∈ Ψ ∈ =1 , −1 (+ ,( ) , ) , −1 ln =1 =1 =1 = =1 , −1 + =1 =1 Libras ( , ) Introduction 41 () Review ( , ) =1 Methods Experiments =1 Results Conclusion 1 = () ∈ Ψ ∈ , = ( , ) =1 , −1 + =1 =1 42 () ( , ) =1 =1 ln ln a11 a12 a13 b11 b12 a21 a22 a23 b21 b22 a31 a32 a33 b31 b32 : = × + × Introduction Libras Review Methods Experiments Results Conclusion 1 = () ( , ) ∈ Ψ ∈ =1 , −1 + ( , ) =1 =1 =1 =1 , − , ; , = {=} {− =} : , = : 43 () , − , ; , = {=} { =} i=1 i=1 i=1 i=2 i=2 i=2 i=3 i=3 i=3 i=1 i=1 i=2 i=2 i=3 i=3 j=1 j=2 j=3 j=1 j=2 j=3 j=1 j=2 j=1 o=1 o=2 o=1 o=2 o=1 o=2 a11 a12 a13 a21 a22 a23 a31 a32 a33 b11 b12 b21 b22 b31 b32 = × + × Introduction Libras Review Methods Experiments Results Conclusion 1 = () 44 () ∈ Ψ ∈ ( , ) =1 , = 1{=} 1{−1 =} + =1 =1 =1 1{=} 1{ =} =1 =1 =1 , − , = ; , = {=}{ ,( , , ; ,, ) = {=} { =} − − , ) + ( , =} − − =1 =1 = , =1 Introduction Libras Review Methods Experiments Results Conclusion 1 = () 45 () ∈ Ψ ∈ ( , ) =1 1, , = = =1 () Introduction Libras Review Methods Experiments Results Conclusion 46 Hidden Markov Models (HMMs) Drawback Assumes both and are known Conditional Random Fields (CRFs) Hidden Conditional Random Fields (HCRFs) Introduction Libras Review Methods Experiments Results Conclusion 47 Hidden Markov Models (HMMs) Conditional Random Fields (CRFs) Hidden Conditional Random Fields (HCRFs) Introduction Libras Review Methods Experiments Results Conclusion 48 Hidden Markov Models (HMMs) Conditional Random Fields (CRFs) Hidden Conditional Random Fields (HCRFs) Introduction Libras Review Methods Experiments Results Conclusion 49 Hidden Conditional Random Fields Generalization of the hidden Markov classifiers Hidden Markov Models (HMMs) Conditional Random Fields (CRFs) Hidden Conditional Random Fields (HCRFs) Parameter space Introduction Libras Review Methods Experiments Results Conclusion 50 Hidden Conditional Random Fields Generalization of the hidden Markov classifiers Hidden Markov Models (HMMs) Sequence classification Conditional Random Fields (CRFs) Model | without explicitly modeling Hidden Conditional Random Fields (HCRFs) Do not require to be known The sequence of states is now hidden Introduction Libras Review Methods Experiments Results Conclusion 51 Hidden Conditional Random Fields Generalization of the hidden Markov classifiers ,( |) = = 1 () () =1 Libras ∈ Ψ ∈ Conditional Random Fields (CRFs) () , ;, Ψ ,, ; = Introduction Ψ ( , ,; ) Review Hidden Markov Models (HMMs) =1 Methods Experiments Hidden Conditional Random Fields (HCRFs) Results Conclusion 52 Single model for all words ω yt-2 yt-1 yt xt-2 xt-1 xt ω = max ( |) ∈ ω Introduction Libras Review Methods Experiments Results Conclusion Fingerspelling recognition with SVMs and HCRFs against ANNs and HMMs 53 Pato HCRF ω HMM yt-2 yt-1 yt xt-2 xt-1 xt P P Sequence classifier P A A A T T T O O Static gesture classifier SVM ANN Introduction Libras Review Methods Experiments Results Conclusion 55 Static gesture recognition Static gestures (hand postures) Database of 8100 grayscale images Input instances with 1024 features 27 classes (manual alphabet signs) Introduction Libras Review Methods Dynamic gestures (spelled words) Experiments Results Conclusion 56 Neural Networks Evaluate initialization heuristics (Nguyen-Widrow) Support Vector Machines Evaluate the heuristic value for 2 based on the interquartile range of the norm statistics for the input dataset Introduction Libras Review Methods Experiments Static gestures (hand postures) Dynamic gestures (spelled words) Results Conclusion 57 Static sign classification Kappa σ2 Kappa Hidden Neurons Kappa 0,1 0.000 50 0.851 1 0.106 100 0.887 10 0.569 300 0.921 100 0.950 500 0.922 (heuristic) 392 0.917 1000 0.924 1000 0.863 1 800.00 0.8 600.00 0.6 400.00 0.4 200.00 0.2 0 0.1 1 Busca em grade Introduction Libras Review 0.00 1000 10 100 Heurística Vetores de Suporte Methods Experiments Support Vectors (Average) Resilient Backpropagation ANN Gaussian kernel SVM Results Conclusion 58 Static sign classification Kappa σ2 Kappa Hidden Neurons Kappa 0.1 0.000 50 0.851 1 0.106 100 0.887 10 0.569 300 0.921 100 0.959 500 0.922 (heuristic) 392 0.917 1000 0.925 1000 0.863 1 800.00 0.8 600.00 0.6 400.00 0.4 200.00 0.2 0 0.1 1 Busca em grade Introduction Libras Review 0.00 1000 10 100 Heurística Vetores de Suporte Methods Experiments Support Vectors (Average) Resilient Backpropagation ANN Gaussian kernel SVM Results Conclusion 59 Hyperparameter surface for Gaussian SVMs Kappa 1.00 0.95 0.90 0.85 1000000 10000 C 100 0.85-0.90 50… 15… 10… 20… 0.95-1.00 400 200 Libras Review Methods 20… 15… 400-600 50… 200-400 10… Sigma (σ²) 500 H 0 0-200 Introduction 0.90-0.95 1 600 0.1 1 10 50 75 100 200 300 Average number of SVs 0.80-0.85 500 Sigma (σ²) H 0.1 1 10 50 75 100 200 300 0.80 Experiments 1000000 10000 C 100 1 Results Conclusion 60 Static gesture classification Statistically significant results (p < 0.01) Static gestures (hand postures) Dynamic gestures (spelled words) Points of interest Polynomial machines have increased sparcity but smaller kappa Neural networks were faster to evaluate, but not to learn – unless using linear SVMs Sigma plays a much more important role than C in Gaussian machines Heuristics for choosing sigma and C resulted in great performance values Introduction Libras Review Methods Experiments Results Conclusion 61 Static gestures (hand postures) Dynamic gestures (spelled words) Introduction Libras Review Methods Experiments Results Conclusion 62 Static gestures (hand postures) Dynamic gestures (spelled words) Introduction Libras Review Methods Experiments Results Conclusion Pato HCRF ω yt-2 yt-1 yt xt-2 xt-1 xt Sequence classifier P P P A A A T T Static gesture classifier SVM Introduction Libras Review Methods Experiments Results Conclusion T 64 Dynamic gesture classification Database containing 540 signed words Containing a total of 63,703 static signs The previous layer labels the entire dataset Then we tested all possible model combinations Estimated kappa () sampled from 10-fold CV Introduction Libras Review Methods Experiments Results Conclusion 65 Dynamic gesture classification Labeller Classification Algorithm SVM HMM Baum-Welch SVM HCRF RProp ANN HMM Baum-Welch ANN HCRF RProp Introduction Libras Review Methods Training Validation Kappa Kappa Experiments Results Conclusion 66 Dynamic gesture classification Training Validation Kappa Kappa Labeller Classification Algorithm SVM HMM Baum-Welch 0.95 0.82 SVM HCRF RProp 0.98 0.83 ANN HMM Baum-Welch 0.95 0.80 ANN HCRF RProp 0.99 0.82 Introduction Libras Review Methods Experiments Results Conclusion 67 Dynamic gesture classification Training Validation Kappa Kappa Labeller Classification Algorithm SVM HMM Baum-Welch 0.95 0.82 SVM HCRF RProp 0.98 0.83 ANN HMM Baum-Welch 0.95 0.80 ANN HCRF RProp 0.99 0.82 SVM+HCRF have shown the best validation result (10-fold CV) Combinations using HCRF have shown best results in general Training results are statistically different We have not enough evidence to say validation results are not equivalent Introduction Libras Review Methods Experiments Results Conclusion 68 Dynamic gesture classification Training Validation Kappa Kappa Labeller Classification Algorithm SVM HMM Baum-Welch 0,95 0,82 SVM HCRF RProp 0,98 0,83 ANN HMM Baum-Welch 0,95 0,80 ANN HCRF RProp 0,99 0,82 Hidden Conditional Random Fields – in this specific problem – had higher ability to retain knowledge while keeping the same generality In other words, achieved less overfitting. Introduction Libras Review Methods Experiments Results Conclusion and future works 69 Fingerspelling experiments SVMs & HCRFs vs ANNs & HMMs Static Gesture Recognition Statistically significant results favoring SVMs Linear SVMs on DDAGs: Best compromise between speed, accuracy and ease of use SVMs have shown easier training, reduced training times Heuristic initializations work rather well, less parameter tuning Dynamic Gesture Recognition Choice of gesture classifier had much more impact Linear-chain HCRFs: Increased knowledge absorption without overfitting Introduction Libras Review Methods Experiments Results Conclusion 71 Future works Detect standard words rather than fingerspelling (already complete) Use Structural Support Vector Machines, which are equivalent to HCRFs but are trained using a hinge loss function Use a mixed language model to categorize full phrases Introduction Libras Review Methods Experiments Results Conclusion 73 References BOWDEN, R. et al. A Linguistic Feature Vector for the Visual Interpretation of Sign Language. European Conference on Computer Vision. [S.l.]: Springer-Verlag. 2004. p. 391-401. BRADSKI, G. R. Computer Vision Face Tracking For Use in a Perceptual User Interface. Intel Technology Journal, n. Q2, 1998. Disponivel em: <http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.14.7673>. DIAS, D. B. et al. Hand movement recognition for brazilian sign language: a study using distance-based neural networks. Proceedings of the 2009 international joint conference on Neural Networks. Atlanta, Georgia, USA: IEEE Press. 2009. p. 2355-2362. FERREIRA-BRITO, L. Por uma gramática de Línguas de Sinais. 2nd. ed. Rio de Janeiro: Tempo Brasileiro, 2010. 273 p. ISBN 85-282-0069-8. FERREIRA-BRITO, L.; LANGEVIN, R. The Sublexical Structure of a Sign Language. Mathématiques, Informatique et Sciences Humaines, v. 125, p. 17-40, 1994. Introduction Libras Review Methods Experiments Results Conclusion 74 References FEUERSTACK, S.; COLNAGO, J. H.; SOUZA, C. R. D. Designing and Executing Multimodal Interfaces for the Web based on State Chart XML. Proceedings of 3a. Conferência Web W3C Brasil 2011. Rio de Janeiro: [s.n.]. 2011. PIZZOLATO, E. B.; ANJO, M. D. S.; PEDROSO, G. C. Automatic recognition of finger spelling for LIBRAS based on a two-layer architecture. Proceedings of the 2010 ACM Symposium on Applied Computing. Sierre, Switzerland: ACM. 2010. p. 969-973. VIOLA, P.; JONES, M. Robust Real-time Object Detection. International Journal of Computer Vision. [S.l.]: [s.n.]. 2001. VAPNIK, V. N. The nature of statistical learning theory. New York, NY, USA: Springer-Verlag New York, Inc., 1995. ISBN 0-387-94559-8. VAPNIK, V. N. Statistical learning theory. [S.l.]: Wiley, 1998. ISBN 0471030031. YANG, R.; SARKAR, S. Detecting Coarticulation in Sign Language using Conditional Random Fields. Pattern Recognition, 2006. ICPR 2006. 18th International Conference on. [S.l.]: [s.n.]. 2006. p. 108-112. Introduction Libras Review Methods Experiments Results Conclusion Guilherme Cartacho Appendix A 77 Accord.NET Framework Machine Learning and Artificial Intelligence 78 Accord.NET Framework Machine Learning and Artificial Intelligence Computer Vision / Audition 79 Accord.NET Framework Machine Learning and Artificial Intelligence Computer Vision / Audition Mathematics and Statistics 80 Builds upon well established foundations 83 It has been used to Recognize gestures using Wii 84 It has been used to Study and evaluate performance in 3D gesture recognition 85 It has been used to Predict attacks in computer networks 86 It has been used to Compare touch and in-air gestures using Kinect 87 It has been used to Provide sensor information in multi-model interfaces It has been used in an increasing number of publications Guido Soetens, Estimating the limitations of single-handed multi-touch input. Master Thesis, Utrecht University. September, 2012. K. N. Pushpalatha, A. K. Gautham, D. R. Shashikumar, K. B. ShivaKumar. Iris Recognition System with Frequency Domain Features optimized with PCA and SVM Classifier, IJCSI International Journal of Computer Science Issues, Vol. 9, Issue 5, No 1, September 2012. Arnaud Ogier, Thierry Dorval. HCS-Analyzer: Open source software for High-Content Screening data correction and analysis. Bioinformatics. First published online May 13, 2012. It has been used in an increasing number of publications Ludovico Buffon, Evelina Lamma, Fabrizio Riguzzi, and Davide Forment. Un sistema di vision inspection basato su reti neurali. In Popularize Artificial Intelligence. Proceedings of the AI*IA Workshop and Prize for Celebrating 100th Anniversary of Alan Turing's Birth (PAI 2012), Rome, Italy, June 15, 2012, number 860 in CEUR Workshop Proceedings, pages 1-6, Aachen, Germany, 2012. Liam Williams, Spotting The Wisdom In The Crowds. Master Thesis on Joint Mathematics and Computer Science. Imperial College London, Department of Computing. June, 2012. Alosefer, Y.; Rana, O.F.; "Predicting client-side attacks via behaviour analysis using honeypot data," Next Generation Web Services Practices (NWeSP), 2011 7th International Conference on , vol., no., pp.31-36, 1921 Oct. 2011 It has been used in an increasing number of publications Brummitt, L. Scrabble Referee: Word Recognition Component, 2011. Final project report. University of Sheffield, Sheffield, England. Cani, V., 2011. Image Stitching for UAV remote sensing application. Master Degree Thesis. Computer Engineering, School of Castelldefels of Universitat Politècnica de Catalunya. Barcelona, Spain. Hassani, A. Z.; "Touch versus in-air Hand Gestures: Evaluating the acceptance by seniors of Human-Robot Interaction using Microsoft Kinect," Master Thesis, University of Twente, Enschede, Netherlands, 2011. Kaplan, K., 2011. ADES: Automatic Driver Evaluation System. PhD Thesis, Boğaziçi University, Istanbul, Turkey. It has been used in an increasing number of publications Wright, M., Lin, C.-J., O'Neill, E., Cosker, D. and Johnson, P., 2011. 3D Gesture recognition: An evaluation of user and system performance. In: Pervasive Computing - 9th International Conference, Pervasive 2011, Proceedings. Heidelberg: Springer Verlag, pp. 294-313. Lourenço, J., 2010. Wii3D: Extending the Nintendo Wii Remote into 3D. Final course project report, Rhodes University, Grahamstown. 110p. Mendelssohn, T.; 2010. Gestureboard - Entwicklung eines Wiimotebasierten, gestengesteuerten, Whiteboard-Systems für den Bildungsbereich. Final project report. Hochschule Furtwangen University, Furtwangen im Schwarzwald, Germany. http://accord.googlecode.com 92