Human Action Recognition Based on The Skeletal Pairwise Dissimilarity
 E.E. Surkov 1, O.S. Seredin 1, A.V. Kopylov 1
 1 Tula State University,
     Lenin Ave. 92, Tula, 300012, Russia
 
 PDF, 6771 kB
DOI: 10.18287/2412-6179-CO-1522
Pages: 493-503.
Article language: English.
 
Abstract:
The main idea of the paper is to apply the principles of featureless pattern recognition to the human activity recognition problem. The article presents an approach to representing the human figure based on a pairwise dissimilarity function between skeletal models and a set of reference objects, also known as a basic assembly. The paper analyzes the basic assembly and proposes a method for selecting the least-correlated basic objects. The video sequence to be analyzed for human activity within its frames is represented as an activity map. The activity map is the result of computing the pairwise dissimilarity function between the skeletal models from the video sequence and the basic assembly of skeletons. The paper provides a frame-by-frame annotation of activities in the TST Fall Detection v2 database, such as standing, sitting, lying, walking, falling, post-fall lying, grasp, and ungrasp. A convolutional neural network based on ResNetV2 with an SE-block is proposed to solve the activity recognition problem. The SE-block makes it possible to detect inter-channel dependencies and select the most important features. Additionally, we prepare the data for training and determine the optimal hyperparameters of the neural network model. Experimental results of human activity recognition on the TST Fall Detection v2 database using the leave-one-person-out procedure are provided. Furthermore, the paper presents a frame-by-frame assessment of the quality of human activity recognition, achieving an accuracy exceeding 83%.
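As an illustration of the activity-map construction described in the abstract, below is a minimal Python sketch. It assumes each skeletal model is an array of 3D joint coordinates (e.g., 25 Kinect v2 joints) and uses a simplified dissimilarity, the mean Euclidean distance between corresponding joints after centering each skeleton at its centroid; the dissimilarity function, normalization, and basic-assembly selection used in the paper may differ. A second sketch shows the general squeeze-and-excitation recalibration referred to in the abstract; its layer sizes and weights are placeholders, not the paper's configuration.

import numpy as np

def skeleton_dissimilarity(s1, s2):
    # Pairwise dissimilarity between two skeletal models given as
    # (n_joints, 3) arrays: mean Euclidean distance between corresponding
    # joints after centering each skeleton at its centroid (assumed measure).
    s1 = s1 - s1.mean(axis=0)
    s2 = s2 - s2.mean(axis=0)
    return np.linalg.norm(s1 - s2, axis=1).mean()

def activity_map(frames, basic_assembly):
    # Activity map for a video sequence: entry (t, k) is the dissimilarity
    # between the skeleton in frame t and the k-th basic (reference) skeleton.
    return np.array([[skeleton_dissimilarity(f, b) for b in basic_assembly]
                     for f in frames])

# Usage sketch: 120 frames, 25 joints per skeleton, 10 basic objects.
rng = np.random.default_rng(0)
frames = [rng.normal(size=(25, 3)) for _ in range(120)]
basic = [rng.normal(size=(25, 3)) for _ in range(10)]
amap = activity_map(frames, basic)   # shape (120, 10), fed to the CNN as a 2D input

def se_block(feature_map, w1, b1, w2, b2):
    # Squeeze-and-excitation recalibration of an (H, W, C) feature map:
    # global average pooling over spatial dims, a bottleneck FC + ReLU,
    # an expanding FC + sigmoid, then channel-wise rescaling of the input.
    z = feature_map.mean(axis=(0, 1))                 # squeeze: (C,)
    s = np.maximum(z @ w1 + b1, 0.0)                  # excitation, reduced dim
    s = 1.0 / (1.0 + np.exp(-(s @ w2 + b2)))          # per-channel weights in (0, 1)
    return feature_map * s                            # recalibrated feature map

# Placeholder weights for a 64-channel feature map with reduction ratio 16.
C, r = 64, 16
w1, b1 = rng.normal(size=(C, C // r)), np.zeros(C // r)
w2, b2 = rng.normal(size=(C // r, C)), np.zeros(C)
recalibrated = se_block(rng.normal(size=(8, 8, C)), w1, b1, w2, b2)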
Keywords:
basic assembly, pairwise dissimilarity measure, activity map, human action recognition, CNN, inner-channel attention.
Acknowledgments
This research is funded by the Ministry of Science and Higher Education of the Russian Federation within the framework of the state task FSFS-2024-0012.
Citation:
Surkov EE, Seredin OS, Kopylov AV. Human Action Recognition Based on The Skeletal Pairwise Dissimilarity. Computer Optics 2025; 49(3): 493-503. DOI: 10.18287/2412-6179-CO-1522.
  