A Survey of Text Detection and Recognition in Natural Scenes
张树业
2014.6.6
ICDAR 2003 Robust Reading Competition
• The competition is divided into three sub-problems: Text Locating, Word Recognition, and Character Recognition.
• Text Locating:
Figure 1 (a) Original image; (b) location result.
• Word Recognition:
Figure 2 Image containing a word.
• Character Recognition:
Figure 3 Image containing a character.
[1] http://www.iapr-tc11.org/mediawiki/index.php/ICDAR_2003_Robust_Reading_Competitions
• Dataset
Table 1 ICDAR 2003 Text Locating Competition Dataset (number of images)
| Sample | Trial Train | Trial Test | Competition Set |
|---|---|---|---|
| 20 | 258 | 251 | 501 |
• Basic information for each task
Table 2 Measures of interest in each problem.
| Problem | Downloads | Entries |
|---|---|---|
| Text Locating | 394 | 5 |
| Word Recognition | 228 | 0 |
| Character Recognition | 218 | 0 |
• A quick look at the data
Figure 4 Sample image from Task 1.
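Evaluation scheme:
Let E be the set of estimated rectangles and T the set of target (ground-truth) rectangles. The match m_p(r, r') between two rectangles is the area of their intersection divided by the area of the smallest rectangle containing both, so it is 1 for identical rectangles and 0 for disjoint ones. The best match of a rectangle r against a set of rectangles R is

$$ m(r, R) = \max \{\, m_p(r, r') \mid r' \in R \,\} \tag{1} $$

Precision p' and recall r' are then defined as

$$ p' = \frac{\sum_{r_e \in E} m(r_e, T)}{|E|} \tag{2} $$

$$ r' = \frac{\sum_{r_t \in T} m(r_t, E)}{|T|} \tag{3} $$

and combined into the f measure

$$ f = \frac{1}{\alpha / p' + (1 - \alpha) / r'} \tag{4} $$

where α weights precision against recall (α = 0.5 in the competition, giving them equal weight).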
Figure 5 Different positional relationships between two rectangles.
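The formulas for m_p, m(r, R), p', r' and f can be checked with a small script. The sketch below is a minimal, hypothetical implementation for a single image, assuming axis-aligned rectangles given as (x0, y0, x1, y1) with x0 < x1 and y0 < y1, and taking m_p as the intersection area divided by the area of the smallest rectangle enclosing both (the definition in Lucas et al. [1]). It is not the competition's official evaluation code, and the function names are illustrative.

```python
from typing import Sequence, Tuple

# Axis-aligned rectangle (x0, y0, x1, y1), assuming x0 < x1 and y0 < y1.
Rect = Tuple[float, float, float, float]


def area(r: Rect) -> float:
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def match_pair(a: Rect, b: Rect) -> float:
    """m_p(a, b): intersection area / area of the smallest rectangle enclosing both."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    hull = (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))
    return ix * iy / area(hull)


def best_match(r: Rect, rects: Sequence[Rect]) -> float:
    """m(r, R): best m_p of r against any rectangle in R (0 if R is empty)."""
    return max((match_pair(r, other) for other in rects), default=0.0)


def evaluate(estimates: Sequence[Rect], targets: Sequence[Rect], alpha: float = 0.5):
    """Return (p', r', f) for one image from estimated and target rectangles."""
    p = sum(best_match(e, targets) for e in estimates) / len(estimates) if estimates else 0.0
    r = sum(best_match(t, estimates) for t in targets) / len(targets) if targets else 0.0
    # Guard against division by zero when nothing matches at all.
    f = 0.0 if p == 0.0 or r == 0.0 else 1.0 / (alpha / p + (1.0 - alpha) / r)
    return p, r, f


if __name__ == "__main__":
    E = [(0, 0, 10, 10)]
    T = [(0, 0, 10, 10), (20, 0, 30, 10)]
    print(evaluate(E, T))  # p' = 1.0, r' = 0.5, f = 2/3
```

In the usage example the single estimate matches one target exactly, so precision is perfect while the missed second target halves recall.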
Competition Result:
Table 3 ICDAR 2003 Text Locating Competition Results
| System | Precision | Recall | F |
|---|---|---|---|
| Ashida | 0.55 | 0.46 | 0.50 |
| HWDavid | 0.44 | 0.46 | 0.45 |
| Wolf | 0.30 | 0.44 | 0.35 |
| Todoran | 0.19 | 0.18 | 0.18 |
[1] S. M. Lucas, A. Panaretos, L. Sosa, A. Tang, S. Wong, and R. Young, “ICDAR 2003 robust reading competitions,” in Proceedings of the Seventh International Conference on Document Analysis and Recognition, vol. 2, 2003, pp. 682–687.
ICDAR 2005 Robust Reading Competition
• The tasks, dataset, and evaluation scheme are the same as in ICDAR 2003.
• Competition Results:
Table 4 Text locating results for the 2005 (top) and the 2003 (bottom) entries.
| System | Precision | Recall | F | Time (s) |
|---|---|---|---|---|
| Hinnerk Becker | 0.62 | 0.67 | 0.62 | 14.4 |
| Alex Chen | 0.60 | 0.60 | 0.58 | 0.35 |
| Qiang Zhu | 0.33 | 0.40 | 0.33 | 1.6 |
| Jisoo Kim | 0.22 | 0.28 | 0.22 | 2.2 |
| Nobuo Ezaki | 0.18 | 0.36 | 0.22 | 2.8 |
| Ashida | 0.55 | 0.46 | 0.50 | 8.7 |
| HWDavid | 0.44 | 0.46 | 0.45 | 0.3 |
| Wolf | 0.30 | 0.44 | 0.35 | 17.0 |
| Todoran | 0.19 | 0.18 | 0.18 | 0.3 |
ICDAR 2011 Robust Reading Competition
• Tasks: Text Localization and Word Recognition
• Dataset
Table 5 Drawbacks of the previous dataset and the corresponding improvements
| Drawback | Improvement |
|---|---|
| Ground-truth information was missing for some files and for some text elements within images. | The ground truth for these files was regenerated. In addition, 100 images were added to the original dataset, giving 485 images in total. |
| Punctuation and special characters were inconsistently treated as part of words. | The space character is consistently used as the word separator; all punctuation marks and special characters are considered part of the word. |
| Bounding boxes around words were not tight. | Bounding boxes are tight, touching most of the boundary pixels of a word. |
• Evaluation Scheme: Wolf’s Scheme [2] and DetEval
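Wolf's scheme [2] matches ground-truth and detected rectangles through two area thresholds and distinguishes one-to-one, one-to-many, and many-to-one matches. The sketch below covers only the one-to-one case as a simplified illustration: a pair qualifies when the overlap covers at least t_r of the ground-truth box and at least t_p of the detection, and it counts only when each side has exactly one such partner. The values t_r = 0.8 and t_p = 0.4 are assumptions here (they are the defaults commonly quoted for DetEval), the function name is hypothetical, and split/merge matches, which the full scheme credits partially, are simply ignored, so this does not replace the DetEval tool.

```python
from typing import List, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1), assuming x0 < x1 and y0 < y1


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def intersection(a: Rect, b: Rect) -> float:
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    return ix * iy


def one_to_one_scores(gts: Sequence[Rect], dets: Sequence[Rect],
                      t_r: float = 0.8, t_p: float = 0.4) -> Tuple[float, float]:
    """Simplified Wolf-style (recall, precision) counting one-to-one matches only."""
    # ok[i][j]: detection j covers enough of ground truth i, and i covers enough of j.
    ok: List[List[bool]] = [
        [intersection(g, d) / area(g) >= t_r and intersection(g, d) / area(d) >= t_p
         for d in dets]
        for g in gts
    ]
    n_matches = 0
    for i in range(len(gts)):
        partners = [j for j in range(len(dets)) if ok[i][j]]
        # Count only if ground truth i has exactly one partner detection and
        # that detection qualifies for no other ground-truth box.
        if len(partners) == 1 and sum(ok[k][partners[0]] for k in range(len(gts))) == 1:
            n_matches += 1
    recall = n_matches / len(gts) if gts else 0.0
    precision = n_matches / len(dets) if dets else 0.0
    return recall, precision
```

For example, one ground-truth word with one tight detection and one spurious detection far away yields recall 1.0 and precision 0.5, while a single detection covering two ground-truth words scores 0 here because it is a many-to-one case.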
[1] A. Shahab, F. Shafait, and A. Dengel, “ICDAR 2011 robust reading competition challenge 2: Reading text in scene images,” in Proceedings of the 2011 International Conference on Document Analysis and Recognition (ICDAR), IEEE, 2011.
[2] C. Wolf and J. Jolion, “Object count/area graphs for the evaluation of object detection and segmentation algorithms,” International Journal on Document Analysis and Recognition, vol. 8, no. 4, pp. 280–296, Sep. 2006.
Table 6 Text Localization Results (%)
| Entry | Recall | Precision | Harmonic Mean |
|---|---|---|---|
| Kim's | 62.47 | 82.98 | 71.28 |
| Yi's | 58.09 | 67.22 | 62.32 |
| TH-TextLoc System | 57.68 | 66.97 | 61.98 |
| Neumann's Method | 52.54 | 68.93 | 59.63 |
| TDM IACS | 53.52 | 63.52 | 58.09 |
| LIP6-Retin | 50.07 | 62.97 | 55.78 |
| KAIST AIPR System | 44.57 | 59.67 | 51.03 |
| ECNU-CCG Method | 38.32 | 35.01 | 36.59 |
| Text Hunter | 25.96 | 50.05 | 34.19 |
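The Harmonic Mean column is 2PR / (P + R) for recall R and precision P, so any row of Table 6 can be sanity-checked by recomputing it. The helper below is an illustrative sketch; the function name is hypothetical and the numbers are copied from the table.

```python
def harmonic_mean(recall: float, precision: float) -> float:
    """2PR / (P + R); returns 0 when both inputs are 0."""
    total = recall + precision
    return 0.0 if total == 0 else 2 * recall * precision / total


# (recall, precision, reported harmonic mean) copied from Table 6, in %.
rows = {
    "Kim's": (62.47, 82.98, 71.28),
    "Yi's": (58.09, 67.22, 62.32),
    "TH-TextLoc System": (57.68, 66.97, 61.98),
    "Text Hunter": (25.96, 50.05, 34.19),
}

for name, (r, p, reported) in rows.items():
    print(f"{name}: computed {harmonic_mean(r, p):.2f}, reported {reported:.2f}")
```

Each computed value agrees with the reported one to two decimal places, which also shows how much a low precision (Text Hunter) or low recall pulls the combined score down.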
Brief introduction to Kim's method:
First, candidate blobs are extracted from the image with MSER (Maximally Stable Extremal Regions), and neighboring blobs are merged. Next, false positives are reduced using gradient features. Then, AdaBoost learning is used to determine the location and size of the text rectangles in the oriented-gradient image. Finally, a cascade classifier discriminates text from non-text.
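The step of merging neighboring blobs can be illustrated with a union-find grouping of bounding boxes. This is a hypothetical sketch, not Kim's actual rule: the boxes stand in for MSER outputs, and the neighbor test (heights within a factor of 2, some vertical overlap, and a horizontal gap of at most half the taller box's height) and its thresholds are assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), e.g. the bounding box of one MSER blob


def are_neighbors(a: Box, b: Box, max_gap_ratio: float = 0.5,
                  max_height_ratio: float = 2.0) -> bool:
    """Hypothetical neighbor test for two character-like blobs."""
    ha, hb = a[3] - a[1], b[3] - b[1]
    if max(ha, hb) > max_height_ratio * min(ha, hb):
        return False  # heights too different to be characters of one word
    if min(a[3], b[3]) <= max(a[1], b[1]):
        return False  # no vertical overlap, so not on the same text line
    gap = max(a[0], b[0]) - min(a[2], b[2])  # negative when boxes overlap horizontally
    return gap <= max_gap_ratio * max(ha, hb)


def merge_blobs(boxes: List[Box]) -> List[Box]:
    """Group neighboring boxes with union-find and return one enclosing box per group."""
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if are_neighbors(boxes[i], boxes[j]):
                parent[find(i)] = find(j)

    groups: Dict[int, List[Box]] = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)

    return sorted(
        (min(b[0] for b in g), min(b[1] for b in g), max(b[2] for b in g), max(b[3] for b in g))
        for g in groups.values()
    )
```

For example, merge_blobs([(0, 0, 10, 20), (12, 0, 22, 20), (100, 0, 110, 20)]) joins the first two boxes into (0, 0, 22, 20) and leaves the distant third box alone. Union-find makes the grouping transitive, so a chain of adjacent characters ends up in one word-level box.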
[1] D. Karatzas, S. R. Mestre, J. Mas, F. Nourbakhsh, and P. P. Roy, “ICDAR 2011 robust reading competition-challenge 1: reading text in born-digital images (web and email),” in Proceedings of the 2011 International Conference on Document Analysis and Recognition (ICDAR), IEEE, 2011, pp. 1485–1490.
ICDAR 2013 Robust Reading Competition
Challenge 2: “Reading Text in Scene Images”
Task 1-Text Localization: the objective is to obtain a rough
estimation of the text areas in the image, in terms of
bounding boxes that correspond to parts of text (words or
text lines).
Task 2-Text Segmentation: the objective is the pixel level
separation of text from the background.
Task 3-Word Recognition: the locations (bounding boxes)
of words in the image are assumed to be known and the
corresponding text transcriptions are sought. Note that
there are many short words and even single letters in this
dataset.
[1] http://dag.cvc.uab.es/icdar2013competition/?ch=2&com=tasks
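For Task 3, a recognized word is compared with its ground-truth transcription. One common measure for this comparison is the Levenshtein edit distance (the minimum number of character insertions, deletions, and substitutions). The sketch below computes it with the standard dynamic program; it is an illustration only and is not presented as the competition's official scoring code.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b, using one rolling row of the DP table."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


print(edit_distance("HOTEL", "H0TEL"))  # prints 1 (letter O read as digit 0)
```

Whether to compare case-sensitively is a separate choice; lower-casing both strings first gives a case-insensitive variant, which matters for a dataset with many short words and single letters.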
Table 7. Results (%) for the ICDAR 2013 Robust Reading Competition (Challenge 2: Text Localization in Real Scenes).
| Method | Recall | Precision | f | Remarks |
|---|---|---|---|---|
| USTB_TexStar | 66.45 | 88.47 | 75.89 | |
| TextSpotter | 64.84 | 87.51 | 74.49 | Prof. Jiri Matas's group, Czech Technical University |
| CASIA_NLPR | 68.24 | 78.89 | 73.18 | Prof. Cheng-Lin Liu's group, Institute of Automation, Chinese Academy of Sciences |
| Text_detector_CASIA | 62.85 | 84.70 | 72.16 | Prof. Chunheng Wang's group, Institute of Automation, Chinese Academy of Sciences |
| I2R_NUS_FAR | 69.00 | 75.08 | 71.91 | Prof. Tan Chew Lim's group, National University of Singapore |
| I2R_NUS | 66.17 | 72.54 | 69.21 | Prof. Tan Chew Lim's group, National University of Singapore |
| TH-TextLoc | 65.19 | 69.96 | 67.49 | Prof. Xiaoqing Ding's group, Tsinghua University |
| Text Detection | 53.42 | 74.15 | 62.10 | Prof. Séverine Dubuisson's group, UPMC |
Table 8. Performance (%) comparison of text localization algorithms for the multilingual dataset.
| Method | Recall | Precision | f | Time per image (s), image size 796 × 878 |
|---|---|---|---|---|
| Yin et al.'s method | 68.45 | 82.63 | 74.58 | 0.22 |
| Pan et al.'s method | 65.9 | 64.5 | 65.2 | 3.11 |
Table 9. Performance (%) comparison of text localization algorithms for the ICDAR 2011 Robust Reading Competition dataset.
| Method | Recall | Precision | f | Remarks |
|---|---|---|---|---|
| Yin et al.'s method | 68.26 | 86.29 | 76.22 | |
| Shi et al.'s method | 63.1 | 83.3 | 71.8 | Pattern Recognition Letters, 34(2): 107–116, 2013. |
| Neumann and Matas's method | 64.7 | 73.1 | 68.7 | CVPR 2012: 3538–3545. |
| Kim's method | 62.47 | 82.98 | 71.28 | 1st in the ICDAR 2011 Competition. |
| Yi's method | 58.09 | 67.22 | 62.32 | 2nd in the ICDAR 2011 Competition. |
| TH-TextLoc System | 57.68 | 66.97 | 61.98 | 3rd in the ICDAR 2011 Competition. |
[1] http://prir.ustb.edu.cn/TexStar/scene-text-detection/
[2] X.-C. Yin, X. Yin, K. Huang, and H.-W. Hao, “Robust text detection in natural scene images,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), to appear, 2013.
Demo
L. Neumann and J. Matas, “A Real-Time Scene Text to Speech System,” demo at ECCV 2012.
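Innovation Opportunities in Scene Text Detection
• Applications
(1) Assistive tools for people with disabilities;
(2) Image understanding and retrieval;
(3) Real-time scene translation systems;
(4) Autonomous driving;
(5) Map annotation…
• Algorithms
(1) Build detection and recognition into a single end-to-end system;
(2) Use a language model for post-processing to further improve system performance…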
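Algorithm direction (2) can be illustrated with a toy example in which a character bigram language model, trained on a word list, rescores a recognizer's candidate transcriptions. Everything here is an assumption for illustration: the training words, the add-one smoothing, the ^/$ word-boundary markers, the weight of 0.5, and the candidate scores; the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Iterable, List, Tuple


def train_char_bigram(words: Iterable[str], alpha: float = 1.0) -> Callable[[str], float]:
    """Return a function giving the add-alpha smoothed log-probability of a word."""
    bigrams: Counter = Counter()
    contexts: Counter = Counter()
    vocab = set()
    for w in words:
        s = "^" + w.lower() + "$"  # ^ and $ mark word start and end
        vocab.update(s)
        for a, b in zip(s, s[1:]):
            bigrams[a, b] += 1
            contexts[a] += 1
    v = len(vocab)

    def log_prob(word: str) -> float:
        s = "^" + word.lower() + "$"
        # Smoothing gives unseen characters and bigrams a small nonzero probability.
        return sum(math.log((bigrams[a, b] + alpha) / (contexts[a] + alpha * v))
                   for a, b in zip(s, s[1:]))

    return log_prob


def rerank(candidates: List[Tuple[str, float]], log_prob: Callable[[str], float],
           weight: float = 0.5) -> str:
    """Pick the candidate maximizing recognizer log-score + weight * LM log-probability."""
    return max(candidates, key=lambda c: c[1] + weight * log_prob(c[0]))[0]


lm = train_char_bigram(["open", "exit", "stop", "street", "hotel"])
# Hypothetical recognizer output: "0pen" (digit zero) scores slightly higher than "open".
print(rerank([("0pen", -1.0), ("open", -1.2)], lm))  # prints open
```

Without the language model (weight = 0) the recognizer's top guess "0pen" would win; the bigram statistics favor the letter sequence "^o", "op" seen in the training words, which outweighs the small difference in recognizer scores.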
Thank you!
