
Detection of objects in the images: from likelihood relationships towards scalable and efficient neural networks
N.A. Andriyanov 1, V.E. Dementiev 2, A.G. Tashlinskiy 2

1 Financial University under the Government of the Russian Federation,
125993, Moscow, Russia, Leningradskiy pr-t 49;
2 Ulyanovsk State Technical University,
432027, Ulyanovsk, Russia, Severny Venets 32


DOI: 10.18287/2412-6179-CO-922

Pages: 139-159.

Full text of article: Russian language.

The tasks of detecting and recognizing objects in images and image sequences have only grown in relevance over the years. Over the past few decades, a great many approaches and methods have been proposed for detecting both anomalies, that is, image areas whose characteristics differ from the predicted ones, and objects of interest, for which a priori information about their properties is available, up to and including a library of reference samples. This work attempts to systematically analyze trends in the development of detection approaches and methods, the reasons behind these developments, and the metrics designed to assess the quality and reliability of object detection. Detection techniques based on mathematical models of images are considered, with special attention paid to approaches based on random field models and likelihood ratios. The development of convolutional neural networks intended for solving recognition problems is analyzed, including a number of pre-trained architectures that solve these problems with high efficiency. Rather than relying on mathematical models, such architectures are trained on libraries of real images. The detection quality characteristics considered include the probabilities of errors of the first and second kind (Type I and Type II errors), detection precision and recall, intersection over union (IoU), and interpolated average precision. The paper also presents typical benchmark tests used to compare various neural network algorithms.
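As an illustration of two of the quality characteristics named in the abstract, the following minimal Python sketch computes intersection over union for a pair of bounding boxes, and precision and recall from detection counts. It is not taken from the paper: the box convention `(x_min, y_min, x_max, y_max)` and the function names `iou` and `precision_recall` are assumptions made for this example.

```python
from typing import Tuple

# Assumed box convention for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned bounding boxes."""
    # Corners of the overlap rectangle (may be empty).
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision and recall from true positive, false positive, false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

In a typical evaluation, a detection is counted as a true positive when its IoU with a ground-truth box exceeds a chosen threshold (0.5 is a common choice); the resulting counts then feed `precision_recall`.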

Keywords: pattern recognition, object detection, computer vision, image processing, random fields, CNN, IoU, mAP, probability of correct detection.

Andriyanov NA, Dementiev VE, Tashlinskiy AG. Detection of objects in the images: from likelihood relationships towards scalable and efficient neural networks. Computer Optics 2022; 46(1): 139-159. DOI: 10.18287/2412-6179-CO-922.

This study was partly funded by the Russian Foundation for Basic Research under projects Nos. 20-17-50020 and 19-29-09048.


