
Near real-time animal action recognition and classification
A.D. Egorov ¹, M.S. Reznik ¹

¹ National Research Nuclear University MEPhI,
Kashirskoe Shosse 31, Moscow, 115409, Russia


DOI: 10.18287/2412-6179-CO-1138

Pages: 278-286.

Full text of article: Russian language.

Abstract:
In computer vision, recognizing the actions of an object is a complex and relevant task. Solving it requires information on the positions of the object's key points, and training models that localize key points demands a large amount of annotated data containing these positions. Because such training data are scarce, the paper proposes a method for generating additional training data, as well as an algorithm that recognizes animal actions with high accuracy from a small dataset. The accuracy of key point localization on a test sample reaches 92%. The key point positions define the action of the object; several approaches to classifying actions from key points are compared, and the accuracy of identifying the object's action in an image reaches 72.9%.
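For illustration only, a minimal sketch of such a two-stage pipeline is given below: a Keypoint R-CNN model predicts key point coordinates, and a separate classifier maps the flattened coordinates to an action label. The COCO-pretrained ResNet-50 FPN detector, the random-forest classifier, the function names and the score threshold are assumptions made for brevity and do not reproduce the authors' implementation (which trains an animal key point model, including a MobileNet backbone variant, and compares several classifiers).

import torch
import torchvision
from sklearn.ensemble import RandomForestClassifier

# Stage 1: keypoint detector (COCO-pretrained stand-in; requires torchvision >= 0.13
# for the weights= argument).
detector = torchvision.models.detection.keypointrcnn_resnet50_fpn(weights="DEFAULT")
detector.eval()

def extract_keypoints(image_tensor, score_threshold=0.8):
    # image_tensor: float tensor of shape (3, H, W) scaled to [0, 1].
    # Returns a flat vector of (x, y) key point coordinates for the
    # highest-scoring detection, or None if no confident detection is found.
    with torch.no_grad():
        output = detector([image_tensor])[0]
    if len(output["scores"]) == 0 or output["scores"].max() < score_threshold:
        return None
    best = output["scores"].argmax()
    keypoints = output["keypoints"][best][:, :2]   # drop the visibility flag
    return keypoints.flatten().numpy()

# Stage 2: action classifier over key point coordinates (one of several
# possible choices for this step).
action_classifier = RandomForestClassifier(n_estimators=200, random_state=0)
# action_classifier.fit(X_train, y_train)               # hypothetical training data
# action = action_classifier.predict([extract_keypoints(img)])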

Keywords:
computer vision, machine learning, animal recognition, action recognition, data augmentation, Keypoint R-CNN, MobileNet.

Citation:
Egorov AD, Reznik MS. Near real-time animal action recognition and classification. Computer Optics 2023; 47(2): 278-286. DOI: 10.18287/2412-6179-CO-1138.

