U-Net-bin: hacking the document image binarization contest

P.V. Bezmaternykh1,2, D.A. Ilin1, D.P. Nikolaev1,3

1Smart Engines Service LLC, 117312, Moscow, Russia,  
2Federal Research Center "Computer Science and Control" of RAS, 117312, Moscow, Russia,
3Institute for Information Transmission Problems of RAS, 127051, Moscow, Russia

DOI:10.18287/2412-6179-2019-43-5-825-832

Pages: 825-832.

Full text of article: English language.

Abstract:
Image binarization remains a challenging task in a variety of applications. In particular, the Document Image Binarization Contest (DIBCO) is organized regularly to track state-of-the-art techniques for historical document binarization. In this work we present a binarization method that was ranked first in the DIBCO'17 contest. It is a convolutional neural network (CNN) based method that uses the U-Net architecture, originally designed for biomedical image segmentation. We describe our approach to training data preparation and to examining the contest ground truth, and we provide multiple insights into its construction (so-called hacking). This led to a more accurate statement of the historical document binarization problem with respect to the challenges one may face in open-access datasets. A Docker container with the final network, along with all the supplementary data used in the training process, has been published on GitHub.

Keywords:
historical document processing, binarization, DIBCO, deep learning, U-Net architecture, training dataset augmentation, document analysis.

Citation:
Bezmaternykh PV, Ilin DA, Nikolaev DP. U-Net-bin: hacking the document image binarization contest. Computer Optics 2019; 43(5): 825-832. DOI: 10.18287/2412-6179-2019-43-5-825-832.

Acknowledgements:
The work was partially funded by Russian Foundation for Basic Research (projects 17-29-07092 and 17-29-07093).

References:

  1. Kruchinin AYu. Industrial DataMatrix barcode recognition for an arbitrary camera angle and rotation [In Russian]. Computer Optics 2014; 38(4): 865-870.
  2. Fedorenko VA, Sidak EV, Giverts PV. Binarization of images of striated toolmarks for estimation of the number of matching striations traces [In Russian]. Journal of Information Technologies and Computational Systems 2016; 3: 82-88.
  3. Gudkov V, Klyuev D. Skeletonization of binary images and finding of singular points for fingerprint recognition. Bulletin of the South Ural State University. Ser Computer Technologies, Automatic Control & Radioelectronics 2015; 15(3): 11-17. DOI: 10.14529/ctcr150302.
  4. Nikolaev DP. Segmentation-based binarization method for color document images. Proceedings of the 6th German-Russian Workshop “Pattern Recognition and Image Understanding” (OGRW-6) 2003: 190-193.
  5. Nagy G. Disruptive developments in document recognition. Patt Recogn Lett 2016; 79: 106-112. DOI: 10.1016/j.patrec.2015.11.024.
  6. Gatos B, Ntirogiannis K, Pratikakis I. ICDAR 2009 document image binarization contest (DIBCO 2009). 2009 10th International Conference on Document Analysis and Recognition 2009: 1375-1382. DOI: 10.1109/icdar.2009.246.
  7. Pratikakis I, Zagoris K, Barlas G, Gatos B. ICDAR2017 Competition on document image binarization (DIBCO 2017). 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) 2017; 1: 1395-1403. DOI: 10.1109/icdar.2017.228.
  8. Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation. 2015. Source: <https://arxiv.org/abs/1505.04597>.
  9. Otsu N. A threshold selection method from gray-level histograms. IEEE Trans Syst Man Cybern 1979; 9(1): 62-66. DOI: 10.1109/tsmc.1979.4310076.
  10. Sauvola J, Pietikäinen M. Adaptive document image binarization. Pattern Recognition 2000; 33(2): 225-236. DOI: 10.1016/s0031-3203(99)00055-2.
  11. Cheriet M, Said JN, Suen CY. A recursive thresholding technique for image segmentation. IEEE Trans Image Process 1998; 7(6): 918-921. DOI: 10.1109/83.679444.
  12. Jianzhuang L, Wenqing L, Yupeng T. Automatic thresholding of gray-level pictures using two-dimension Otsu method. International Conference on Circuits and Systems 1991. DOI: 10.1109/ciccas.1991.184351.
  13. Ershov EI, Postnikov VV, Terekhin AP, Nikolaev DP. Exact fast algorithm for optimal linear separation of 2D distribution. European Conference on Modelling and Simulation 2015: 469-474.
  14. Shi Z, Setlur S, Govindaraju V. Digital image enhancement using normalization techniques and their application to palm leaf manuscripts. 2005. Source: <https://cedar.buffalo.edu/~zshi/Papers/kbcs04_261.pdf>.
  15. Gatos B, Pratikakis I, Perantonis SJ. Adaptive degraded document image binarization. Pattern Recognition 2006; 39(3): 317-327. DOI: 10.1016/j.patcog.2005.09.010.
  16. Lu S, Su B, Tan CL. Document image binarization using background estimation and stroke edges. Int J Doc Anal Recognit 2010; 13(4): 303-314. DOI: 10.1007/s10032-010-0130-8.
  17. Niblack W. An introduction to digital image processing. Upper Saddle River, NJ: Prentice-Hall Inc; 1990.
  18. Trier OD, Taxt T. Evaluation of binarization methods for document images. IEEE Trans Pattern Anal Mach Intell 1995; 17(3): 312-315. DOI: 10.1109/34.368197.
  19. Khurshid K, Siddiqi I, Faure C, Vincent N. Comparison of Niblack inspired binarization methods for ancient documents. Document Recognition and Retrieval XVI 2009. DOI: 10.1117/12.805827.
  20. Lazzara G, Géraud T. Efficient multiscale Sauvola’s binarization. Int J Doc Anal Recognit 2014; 17(2): 105-123. DOI: 10.1007/s10032-013-0209-0.
  21. Kim I-J. Multi-window binarization of camera image for document recognition. Ninth International Workshop on Frontiers in Handwriting Recognition 2004: 323-327. DOI: 10.1109/IWFHR.2004.70.
  22. Howe NR. Document binarization with automatic parameter tuning. Int J Doc Anal Recognit 2012; 16(3): 247-258. DOI: 10.1007/s10032-012-0192-x.
  23. Wen J, Li S, Sun J. A new binarization method for non-uniform illuminated document images. Pattern Recognition 2013; 46(6): 1670-1690. DOI: 10.1016/j.patcog.2012.11.027.
  24. Chen Y, Leedham G. Decompose algorithm for thresholding degraded historical document images. IEE Proc – Vision, Image, Signal Process 2005; 152(6): 702. DOI: 10.1049/ip-vis:20045054.
  25. , Lin W-H, Chang F. A binarization method with learning-built rules for document images produced by cameras. Pattern Recognition 2010; 43(4): 1518-1530. DOI: 10.1016/j.patcog.2009.10.016.
  26. Gatos B, Pratikakis I, Perantonis SJ. Improved document image binarization by using a combination of multiple binarization techniques and adapted edge information. 19th International Conference on Pattern Recognition 2008. DOI: 10.1109/icpr.2008.4761534.
  27. Badekas E, Papamarkos N. Optimal combination of document binarization techniques using a self-organizing map neural network. Eng Appl Artif Intell 2007; 20(1): 11-24. DOI: 10.1016/j.engappai.2006.04.003.
  28. Wu Y, Natarajan P, Rawls S, AbdAlmageed W. Learning document image binarization from data. IEEE International Conference on Image Processing (ICIP) 2016. DOI: 10.1109/icip.2016.7533063.
  29. Westphal F, Lavesson N, Grahn H. Document image binarization using recurrent neural networks. 13th IAPR International Workshop on Document Analysis Systems (DAS) 2018. DOI: 10.1109/das.2018.71.
  30. Tensmeyer C, Martinez T. Document image binarization with fully convolutional neural networks. ICDAR 2017.
  31. Xiong W, Xu J, Xiong Z, Wang J, Liu M. Degraded historical document image binarization using local features and support vector machine (SVM). Optik 2018; 164: 218-223. DOI: 10.1016/j.ijleo.2018.02.072.
  32. Nikolaev DP, Saraev AA. Quality criteria for the problem of automated adjustment of binarization algorithms [In Russian]. Proceeding of the Institute for Systems Analysis of the Russian Academy of Science 2013; 63(3): 85-94.
  33. Krokhina D, Shkanaev AY, Polevoy DV, Panchenko AV, Nailevish SR, Sholomov DL. Analysis of straw row in the image to control the trajectory of the agricultural combine harvester (Erratum). Tenth International Conference on Machine Vision (ICMV 2017) 2018: 90. DOI: 10.1117/12.2310143.
  34. Chollet F, et al. Keras: The Python deep learning library. 2015. Source: <https://keras.io>.
  35. Kingma DP, Ba J. Adam: A method for stochastic optimization. 2014. Source: <https://arxiv.org/abs/1412.6980>.
  36. Pratikakis I, Zagoris K, Kaddas P, Gatos B. ICFHR 2018 Competition on Handwritten Document Image Binarization (H-DIBCO 2018). 16th International Conference on Frontiers in Handwriting Recognition (ICFHR) 2018. DOI: 10.1109/icfhr-2018.2018.00091.
  37. Oliveira SA, Seguin B, Kaplan F. dhSegment: A generic deep-learning approach for document segmentation. 2018 16th Int Conf Front Handwrit Recognit 2018: 7-12.
  38. Calvo-Zaragoza J, Gallego A-J. A selectional auto-encoder approach for document image binarization. Pattern Recognition 2019; 86: 37-47. DOI: 10.1016/j.patcog.2018.08.011.
  39. Arlazarov VV, Bulatov K, Chernov TS, Arlazarov VL. MIDV-500: A dataset for identity documents analysis and recognition on mobile devices in video stream. 2018. Source: <https://arxiv.org/abs/1807.05786>.
