ONLINE HANDWRITTEN GURMUKHI SCRIPT RECOGNITION AND ITS CHALLENGES
R. K. SHARMA
THAPAR UNIVERSITY, PATIALA

Handwriting Recognition System
Handwriting recognition (HWR) is the technique by which a computer system recognizes characters and other symbols written by hand in natural handwriting.

Types of HWR systems
• Off-line HWR: the handwritten document is scanned and then recognized by the machine.
• On-line HWR: the handwriting is recognized while it is being written.

Handwriting Recognition System
[Figure: system types in order of increasing complexity: writer dependent, writer independent; closed-vocabulary, open-vocabulary]

A general recognition procedure for On-line HWR
• Data Collection & Preprocessing
• Feature Extraction & Segmentation
• Recognition Methods & Post-processing

Data Collection
• Input: pen writing
• Pen movements are stored
• A text/other file is created
• The text/other file is converted to a suitable format

Need of an application for selected hardware device
• Pre-developed applications do not support the features that users require, e.g., storing all pixel information for the written text, deleting and adding strokes as required, and scaling the written text.
• A custom GUI that meets the user requirements therefore needs to be developed.

Preprocessing
• Size normalization
• Centering of text
• Interpolating missing points
• Smoothing of text
• Slant correction
• Resampling of points

Feature Extraction
• A feature extractor designed by Govindaraju converts the chain-code image into feature vectors, which are then used in the recognition phase.
• Hu et al. worked with point-oriented features, such as stroke tangents, for handwriting recognition.
• Hu et al. also proposed a method in which high-level features were extracted and then combined with local features at each sample point. These features were capable of covering a large input pattern and had invariance properties.
• Rocha designed a feature extractor that reduced the dimension of the problem and provided a structural description of a character shape, consisting of a specification of its features and their spatial inter-relations.
• The feature extractor designed by S.W. Lee extracted four directional feature vectors with Kirsch masks and one global feature vector linearly compressed from the normalized input image.
• Kirsch masks were also used by Chaos in the recognition of handwritten numerals.
• Blumenstein introduced a feature extraction technique for the recognition of segmented handwritten characters.
• PiFuei proposed a hybrid feature extraction method capable of providing an effective feature set of full dimension for multiclass cases.

Feature Categories
• Low-level or local features: directions, positions, slope, area, slant, etc.
• High-level or global features: loops, crossings, headline, straight line, dots, etc.

Device-based features
• The time taken by the pen device to capture a stroke is one feature, since each stroke has its own complexity. If suitable information is collected about the time span of each stroke, it may help in the recognition process.
• The density of points in a stroke is device dependent.
• The directions of pen movement in a stroke might be helpful in recognition.
• The area covered by a stroke.
• The pressure of the pen movements.

Features' Properties
• The features that give better results may vary from one script to another. A method that gives good results for one script may not do so for other scripts.
• There is no standard method for computing the features of a language.
• Features should vary to a reasonable extent.
• Features must be available from the handwriting of different users.
• Features should be measurable through algorithms.
• Features are selected so that they represent the handwriting well and emphasize inter-class differences and intra-class similarities.
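
Several of the device-based features listed earlier (stroke time span, density of points, directions of pen movement, stroke area) can be computed directly from the stored pen movements. The following is only an illustrative sketch, not the method used in the presented system. It assumes that each stroke is stored as a list of (x, y, t) samples with y increasing upward; the function name stroke_features, the 8-direction quantization, and the use of the bounding box as the stroke area are assumptions made for this example. Pen pressure is left out because it depends on what the device reports.

```python
import math
from typing import Dict, List, Tuple

# Assumed sample format (hypothetical): x, y in device units, t in seconds.
Point = Tuple[float, float, float]


def stroke_features(stroke: List[Point]) -> Dict[str, object]:
    """Compute a few simple low-level features of a single pen stroke."""
    if len(stroke) < 2:
        raise ValueError("a stroke needs at least two sampled points")

    xs = [p[0] for p in stroke]
    ys = [p[1] for p in stroke]

    # Time span of the stroke: last timestamp minus first timestamp.
    duration = stroke[-1][2] - stroke[0][2]

    # Stroke area, approximated here by the area of the bounding box.
    area = (max(xs) - min(xs)) * (max(ys) - min(ys))

    path_length = 0.0
    direction_hist = [0] * 8  # counts for 8 directions, 45 degrees apart
    for (x0, y0, _), (x1, y1, _) in zip(stroke, stroke[1:]):
        dx, dy = x1 - x0, y1 - y0
        step = math.hypot(dx, dy)
        if step == 0.0:
            continue  # repeated sample: no movement, so no direction
        path_length += step
        angle = math.atan2(dy, dx) % (2 * math.pi)
        # Quantize the angle to the nearest of the 8 directions (0 = +x).
        code = int(round(angle / (math.pi / 4))) % 8
        direction_hist[code] += 1

    # Density of points: samples per unit of path length (device dependent).
    density = len(stroke) / path_length if path_length > 0 else float("inf")

    return {
        "duration": duration,
        "area": area,
        "path_length": path_length,
        "density": density,
        "direction_hist": direction_hist,
    }


if __name__ == "__main__":
    # An L-shaped stroke: two steps to the right, then two steps up.
    sample = [(0, 0, 0.0), (1, 0, 0.1), (2, 0, 0.2), (2, 1, 0.3), (2, 2, 0.4)]
    print(stroke_features(sample))
```

Because the density of points is device dependent, such values are typically only comparable after the resampling step listed under Preprocessing.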

Recognition methods
Category | Method | Researchers
Statistical | Hidden Markov Model, Support Vector Machine | Amlan Kundu and Parambir Bahl (1988); Beigi (1994); Bellegarda (1994); Beim (2001); Connell and Jain (2002); Rigoll (1996); Subrahmonia (1996)
Neural Network | TDNN | Guyon (1992); Schomaker (1993); Morasso (1995); Yeager (1998)
Syntactical and Structural | Decision Tree | Kerrick and Bovik (1988); Chan and Yeung (1999); Jung and Kim (2000)
Elastic Matching | Dynamic Programming | Pavlidis (1997); Wakahara and Odaka (1997); Webster and Nakagawa (1998)

Advantages and disadvantages of Recognition methods
Category | Advantages | Disadvantages
Statistical | Models temporal relationships well. | Requires a very large amount of training data.
Neural Network | Classification time is fast. | Does not model temporal relationships well.
Syntactical and Structural | Needs less training data and is robust for writer-independent (WI) systems. | Feature choice is manual and highly script dependent.
Elastic Matching | Powerful high-level features. | Not suitable for systems where large variations exist in handwriting.

Post Processing (other important aspect)
• Language rules: Ravinder Kumar and R. K. Sharma, "An Efficient Post Processing Algorithm for Online Handwritten Gurmukhi Character Recognition using Set Theory", International Journal of Pattern Recognition and Artificial Intelligence, 27(4), 1353002 (1-17), 2013.
• Language models

Challenges
• Reverse handwriting
• Zone-wise stroke predictions
• Confusing strokes
• Prediction of half Akshras, for example: Pairi ‘ਹ’, Pairi ‘ਵ’
• New classes in handwritten words
• New features, and selection from existing features
• New classifiers / hybrid classifiers

THANK YOU ALL!