
Object detection in images using deep neural networks and synthetic data in scenarios of partial object occlusion
G.A. Algashev¹, P.A. Kremushchenko¹, I.A. Lezin¹

¹ Samara National Research University, 443086, Russia, Samara, Moskovskoye Shosse 34


DOI: 10.18287/COJ1879

Article ID: 1879

Abstract:
This research addresses automatic object detection in images under limited-visibility conditions, where objects are partially occluded, the background is complex, and lighting and viewpoints vary widely. The proposed approach combines pretraining on a programmatically generated synthetic dataset of 18,000 images, produced with the Visualization Toolkit (VTK) library, with fine-tuning on a compact real-image dataset of 2,000 annotated photographs (500 per class). Six deep neural network architectures (Faster R-CNN ResNet-50 FPN, SSD MobileNet V3, YOLOv11n, EfficientDet-D7, DETR-DC5, and CenterNet) were evaluated under three training regimes: synthetic-only, real-only, and combined (90% synthetic / 10% real). Hybrid training yielded the largest improvements: YOLOv11n achieved mAP@0.5 = 0.91 and mAP@0.75 = 0.86 (Precision = 0.89, Recall = 0.90, F1 = 0.89, 82 FPS), compared with mAP@0.5 = 0.79 for synthetic-only and 0.78 for real-only training, a gain of up to +15 percentage points in mAP@0.5. EfficientDet-D7 reached mAP@0.5 = 0.87 and mAP@0.75 = 0.81, while CenterNet achieved mAP@0.5 = 0.88 at 35 FPS. Robustness analysis under simulated occlusion showed that hybrid-trained models maintain reliable detection even under severe conditions: YOLOv11n retained mAP@0.5 = 0.78 at 50% occlusion and mAP@0.5 = 0.65 at 25% object visibility, while the degradation in mAP under 75% occlusion did not exceed 20% of the baseline level. The results confirm the viability of synthetic data as a standalone pretraining resource and validate the proposed hybrid pipeline for applications in autonomous driving, video surveillance, and industrial inspection.
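The figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' evaluation code: the helper names (`f1_score`, `share`, `gain_pp`) are hypothetical, the 90% / 10% split is assumed to follow from the stated dataset sizes, and the gains are computed for YOLOv11n alone from the mAP@0.5 values given above.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def share(part: int, total: int) -> float:
    """Fraction of the combined training set taken by one source."""
    return part / total


def gain_pp(hybrid: float, baseline: float) -> float:
    """Absolute gain in percentage points between two mAP values."""
    return round((hybrid - baseline) * 100, 1)


# Dataset sizes as stated in the abstract.
synthetic_images = 18_000
real_images = 2_000
total_images = synthetic_images + real_images

synthetic_share = share(synthetic_images, total_images)
real_share = share(real_images, total_images)

# YOLOv11n, hybrid regime: Precision = 0.89, Recall = 0.90.
yolo_f1 = f1_score(0.89, 0.90)

# YOLOv11n mAP@0.5: hybrid vs. the two single-source regimes.
gain_vs_synthetic = gain_pp(0.91, 0.79)
gain_vs_real = gain_pp(0.91, 0.78)

print(f"synthetic share: {synthetic_share:.0%}, real share: {real_share:.0%}")
print(f"F1 from P and R: {yolo_f1:.3f}")
print(f"gain vs synthetic-only: {gain_vs_synthetic} pp")
print(f"gain vs real-only: {gain_vs_real} pp")
```

With these inputs, the dataset sizes reproduce the 90% / 10% split, and the precision and recall values give an F1 of roughly 0.895, in line with the reported 0.89.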

Keywords:
computer vision, synthetic data, neural networks, object detection, 3D modeling, dataset generation, deep learning.

Citation:
Algashev GA, Kremushchenko PA, Lezin IA. Object detection in images using deep neural networks and synthetic data in scenarios of partial object occlusion. Computer Optics 2026; 50(2): 1879. DOI: 10.18287/COJ1879.

