Report

Design Compact Recognizers of Handwritten Chinese Characters Using Precision Constrained Gaussian Models, Minimum Classification Error Training and Parameter Compression Yongqiang Wang1,2 , Qiang Huo1 1Microsoft Research Asia, Beijing, China 2The University of Hong Kong, Hong Kong, China ([email protected]) ICDAR-2009, Barcelona, Spain, July 26-29, 2009 Outline • Prior art • Our approach • Experiments and results • Conclusions Prior art to build a compact online CJK handwritten character recognizer • Feature vector: – 8-directional features (Bai & Huo, 2005) • Z.-L. Bai and Q. Huo, “A study on the use of 8-directional features for online handwritten Chinese character recognition,” Proc. ICDAR-2005, pp.262-266. – LDA for dimension reduction • Classifier: – Modified Quadratic Discriminant Function (MQDF, Kimura et al., 1987) • F. Kimura, K. Takashina, S. Tsuruoka, and Y. Miyake, “Modified quadratic discriminant functions and the application to Chinese character recognition,” IEEE Trans. on PAMI, vol. 9, pp.149-153, 1987. – Minimum Classification Error (MCE) training for MQDF-based classifier (Liu et al., 2004) • C.-L. Liu, H. Sako, and H. Fujisawa, “Discriminative learning quadratic discriminant function for handwriting recognition,” IEEE Trans. on Neural Networks, vol. 15, no. 2, pp.430-444, 2004. – Parameter compression using Split-VQ (Long & Jin, 2008) • T. Long and L.-W. Jin, “Building compact MQDF classifier for large character set recognition by subspace distribution sharing,” Pattern Recognition, vol. 41, no. 9, pp.2916-2925, 2008. What’s wrong: bad accuracy-footprint tradeoff Our New Approach • Precision Constrained Gaussian Model (PCGM) for handwritten CJK character recognition: – Borrowed from ASR area – Demonstrated previously that ML-trained PCGM can achieve much better accuracy-footprint tradeoff than the conventional ML-trained MQDF (Wang & Huo, 2009) • Y.-Q. Wang and Q. Huo, “Modeling inverse covariance matrices by expansion of tied basis matrices for online handwritten Chinese character recognition, ” Pattern Recognition, 2009. • What’s new in this study? – – – – MCE training of PCGM-based classifier using Quickprop Compression of PCGM parameters using scalar quantization and split VQ Embed parameter compression into MCE training Comparative study with their MQDF counterparts Precision Constrained Gaussian Model (PCGM) • For a pattern classification problem with M “character classes”: – Each pattern class Cj is modeled by a Gaussian PDF, where the inverse covariance matrix (a.k.a. precision matrix) is constrained to lie in a subspace: • PCGM-based discriminant function: • Classifier parameters to be stored: – Character-dependent (transformed) mean vectors – Character-independent basis matrices – Character-dependent basis coefficients MCE Training of PCGM-based Classifier • Discriminant function of PCGM-based classifier • For the i-th training feature vector, xji , belonging to j-th class, a misclassification measure is defined as • Per-sample loss function: • Batch-mode Quickprop algorithm is used to minimize the empirical loss function defined on the training data set: – Take advantage of “cluster computing” – In practice, only mean vectors are updated Parameter Compression (1) • Compress mean vectors using split VQ – Split D-dimensional (transformed) mean vectors into Q streams – Sub-vectors in different streams are quantized by VQ with different codebooks • Basis coefficients – Compressed using split VQ as well Parameter Compression (2) • Compress basis matrices – Basis matrices are symmetric, therefore only need to store the diagonal and upperdiagonal elements – Diagonal elements are not compressed – Upper-diagonal elements are compressed using 8-bit scalar quantization Combining MCE Training and Parameter Compression • Start from ML-trained PCGM models – Compress basis coefficients – Compress basis matrices • Do MCE training to fine-tune the mean vectors compressed parameters using the above • Compress the (transformed) mean vectors A similar strategy is also applied to MCE training and model compression for MQDF-based classifier! Experimental Setup (1) • Task: – Recognition of 2965 “JIS level-1” handwritten Kanji characters • Data sets: – Training set: 704,650 samples from Nakayosi database – Development set: 229,398 samples from Nakayosi/Kuchibue databases – Testing set: 506,848 samples from Kuchibue database • Feature vector: – 512 8-directional raw features extracted from each handwriting sample – Use LDA to reduce dimension to 128 Experimental Setup (2) • ML training – MQDF • 20 eigenvectors are retained for each character class, i.e. MQDF(20) – PCGM • # of basis matrices : 32, 64, 128, i.e., PCGM(32/64/128) • MCE training – Run 20 epochs in Quickprop algorithm to update mean vectors – Both MQDF and PCGM • Parameter compression – Compress MQDF/PCGM-based recognizers to less than 2MB footprint Experimental Results 98.6 Classifiers Accuracy (%) Footprint (MB) Latency (ms) PCGM(32) 98.35 0.88 2.45 PCGM(64) 98.50 1.27 --- 98.2 PCGM(128) 98.59 2.07 --- 98.1 MQDF(20) 98.20 2.33 1.50 PCGM(32) 98.5 PCGM(64) 98.4 PCGM(128) MQDF(20) 98.3 98 PCGM(32) PCGM(64) PCGM(128) MQDF(20) • Given the same footprint, PCGM achieves higher recognition accuracy than MQDF; • PCGM-based recognizer can be made very compact, yet achieves high recognition accuracy; • PCGM-based recognizers are slower than MQDF-based recognizers: – Evaluated on a PC with 3 GHz Pentium-4 CPU – Re-scoring on a short-list of 50 candidates Conclusion • PCGM-based approach provides a good solution to designing a compact CJK handwritten character recognizer • Ongoing and future works: – Better discriminative training for both MQDF/PCGM to boost the recognition accuracy further – Try it out for Japanese and Korean Backup slides Experimental Results • Comparison of ML-trained PCGM and MQDF based classifiers without parameter compression 98.6 Classifiers Accuracy (%) Footprint (MB) PCGM(32) 98.00 2.83 PCGM(64) 98.32 4.20 PCGM(128) 98.53 6.94 MQDF(20) 98.52 30.64 98.5 98.4 PCGM(32) 98.3 PCGM(64) 98.2 PCGM(128) 98.1 MQDF(20) 98 97.9 97.8 97.7 PCGM(32) PCGM(64) PCGM(128) MQDF(20) Experimental Results Before Precision Matrices Compression Classifiers Accuracy (%) Footprint (MB) PCGM(32) 98.00 2.83 PCGM(64) 98.32 4.20 PCGM(128) 98.53 6.94 MQDF(20) 98.52 30.64 Classifiers Accuracy (%) Footprint (MB) PCGM(32) 97.99 1.84 PCGM(64) 98.29 2.23 PCGM(128) 98.49 3.00 MQDF(20) 98.04 3.47 98.6 98.5 98.4 PCGM(32) 98.3 PCGM(64) 98.2 PCGM(128) 98.1 MQDF(20) 98 97.9 97.8 97.7 PCGM(32) PCGM(64) PCGM(128) MQDF(20) After Precision Matrices Compression 98.5 98.4 98.3 98.2 98.1 98 PCGM(32) PCGM(64) PCGM(128) MQDF(20) 97.9 97.8 97.7 Experimental Results Before MCE training Classifiers Accuracy (%) Footprint (MB) PCGM(32) 97.99 1.84 PCGM(64) 98.29 2.23 PCGM(128) 98.49 3.00 MQDF(20) 98.04 3.47 Classifiers Accuracy (%) Footprint (MB) PCGM(32) 98.35 1.84 PCGM(64) 98.52 2.23 PCGM(128) 98.61 3.00 MQDF(20) 98.25 3.47 98.5 98.4 98.3 PCGM(32) 98.2 PCGM(64) 98.1 PCGM(128) 98 MQDF(20) 97.9 97.8 97.7 PCGM(32) PCGM(64) PCGM(128) MQDF(20) After MCE training 98.7 98.6 98.5 PCGM(32) 98.4 PCGM(64) 98.3 PCGM(128) 98.2 98.1 98 MQDF(20) Experimental Results Before Mean Compression Classifiers Accuracy (%) Footprint (MB) PCGM(32) 98.35 1.84 PCGM(64) 98.52 2.23 PCGM(128) 98.61 3.00 MQDF(20) 98.25 3.47 Classifiers Accuracy (%) Footprint (MB) PCGM(32) 98.35 0.88 PCGM(64) 98.50 1.27 PCGM(128) 98.59 2.07 MQDF(20) 98.20 2.33 98.7 98.6 98.5 PCGM(32) 98.4 PCGM(64) 98.3 PCGM(128) MQDF(20) 98.2 98.1 98 PCGM(32) PCGM(64) PCGM(128) MQDF(20) After Mean Compression 98.6 98.5 PCGM(32) 98.4 PCGM(64) 98.3 PCGM(128) MQDF(20) 98.2 98.1 98 PCGM(32) PCGM(64) PCGM(128) MQDF(20)