
Handwritten text generation and strikethrough characters augmentation
A.V. Shonenkov 1, D.K. Karachev 2, M.Y. Novopoltsev 1, M.S. Potanin 1,3, D.V. Dimitrov 1,4, A.V. Chertok 1,5

1 SBER AI, 117312, Moscow, Russia, ul. Vavilova, 19;
2 OCRV, 107078, Moscow, Russia, Kalanchevskaia, 13;
3 MIPT, 141701, Moscow Region, Russia, Dolgoprudny, Institutskiy per., 9;
4 Lomonosov MSU, 119991, Moscow, Russia, GSP-1, Leninskie Gory;
5 AIRI, Moscow, Russia, Nizhny Susalny lane, 5, p. 19


DOI: 10.18287/2412-6179-CO-1049

Pages: 455-464.

Article language: English.

We introduce two data augmentation techniques that, used with a ResNet-BiLSTM-CTC network, significantly reduce Word Error Rate and Character Error Rate beyond the best reported results on handwritten text recognition tasks. The first is a novel augmentation that simulates strikethrough text (HandWritten Blots); the second is a method that generates handwritten text from printed text (StackMix). Both proved very effective in handwritten text recognition tasks. StackMix uses a weakly supervised framework to obtain character boundaries. Because these augmentation techniques are independent of the network used, they could also be applied to improve the performance of other networks and approaches to handwritten text recognition. Extensive experiments on ten handwritten text datasets show that HandWritten Blots and StackMix significantly improve the quality of handwritten text recognition models.
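The two metrics named in the abstract can be illustrated with a short sketch. It assumes the common definition of each metric as the Levenshtein (edit) distance between the recognized text and the reference, divided by the reference length in characters (CER) or whitespace-separated words (WER). The exact evaluation protocol of the paper (text normalization, tokenization) is not given on this page, so the function names and conventions below are illustrative assumptions rather than the authors' code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Insertions, deletions and substitutions each cost 1.
    Works on strings (characters) and on lists (e.g. words).
    """
    # prev[j] holds the distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from the reference
                curr[j - 1] + 1,     # insert h into the reference
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate: character edits / reference length (assumed definition)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word Error Rate: word edits / reference word count (assumed definition)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    ref = "handwritten text"
    hyp = "handwriten text"   # one missing character in the first word
    print(cer(ref, hyp))      # 1 edit over 16 reference characters
    print(wer(ref, hyp))      # 1 wrong word out of 2
```

With this definition, a single dropped character costs little in CER but counts as a whole wrong word in WER, which is why the two metrics are usually reported together.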

Keywords:
data augmentation, handwritten text recognition, strikethrough text, computer vision, StackMix, handwritten blots.

Shonenkov AV, Karachev DK, Novopoltsev MY, Potanin MS, Dimitrov DV, Chertok AV. Handwritten text generation and strikethrough characters augmentation. Computer Optics 2022; 46(3): 455-464. DOI: 10.18287/2412-6179-CO-1049.



© 2009, IPSI RAS