A review of algorithms for text detection in images and videos
Yu. A. Bolotova, V.G. Spitsyn, P.M. Osina
   
  Tomsk Polytechnic University, Tomsk, Russia
Full text of article: Russian language.
Abstract:
This article reviews the history and state of the art of optical character recognition systems such as ABBYY FineReader, Tesseract, and CuneiForm, with particular attention to their internal algorithms, including page layout analysis, page segmentation, and document skew angle estimation. The overview describes and compares methods proposed over the last 30 years in terms of speed and versatility. It also gives a critical analysis of the state of the field and discusses open problems.
Keywords:
OCR, page layout analysis, text segmentation, skew detection.
Citation:
Bolotova YuA, Spitsyn VG, Osina PM. A review of algorithms for text detection in images and videos. Computer Optics 2017; 41(3): 441-452. DOI: 10.18287/2412-6179-2017-41-3-441-452.
  
  © 2009, IPSI RAS
  Institution of Russian Academy of Sciences, Image Processing Systems Institute of RAS, Russia, 443001, Samara, Molodogvardeyskaya Street 151; E-mail: journal@computeroptics.ru; Phones: +7 (846) 332-56-22, Fax: +7 (846) 332-56-20