Clustering of media content from social networks using bigdata technology
Rycarev I.A., Kirsh D.V., Kupriyanov A.V.

IPSI RAS – Branch of the FSRC “Crystallography and Photonics” RAS, Molodogvardeyskaya 151, 443001, Samara, Russia;
Samara National Research University, Moskovskoye shosse, 34, 443086, Samara, Russia

Abstract:
The article addresses one of the key problems of social network analysis: classifying accounts based on the media content uploaded by users. The main difficulties are the heterogeneity of the content (in both format and subject) and the large volume of data, which make processing computationally expensive and often render traditional analysis methods ineffective. We discuss an approach to clustering media content from social networks based on textual annotation and BigData technology, a modern and efficient tool for processing large volumes of data. For the computational experiments, a large sample of heterogeneous images (photographs, paintings, postcards, etc.) was collected from real Twitter accounts. The results confirmed the high quality of the media content clustering: the average error was around 5%.
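
The idea of clustering accounts by the textual annotations of their images can be illustrated with a minimal sketch. It is not the authors' implementation: the account names and labels are invented for illustration, the annotation step (performed by a neural network such as GoogLeNet, per the keywords) is replaced by hard-coded labels, the farthest-point seeding is an assumed simplification, and everything runs in memory rather than on a BigData platform.

```python
import math
from collections import Counter

# Hypothetical annotations: the labels an image classifier might assign
# to the images of each account (illustrative data, not from the article).
accounts = {
    "account_a": ["cat", "dog", "cat", "kitten"],
    "account_b": ["dog", "puppy", "cat"],
    "account_c": ["car", "truck", "road"],
    "account_d": ["truck", "car", "car", "wheel"],
}

def to_vector(labels, vocabulary):
    """Turn a list of labels into an L2-normalised bag-of-words vector."""
    counts = Counter(labels)
    vec = [float(counts[word]) for word in vocabulary]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first(vectors, k):
    """Deterministic seeding (an assumption of this sketch): start from the
    first vector, then repeatedly add the vector farthest from all seeds."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        nxt = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids

def kmeans(vectors, k, max_iter=50):
    """Plain k-means: alternate assignment and centroid update until stable."""
    centroids = farthest_first(vectors, k)
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors
        ]
        if new_assignment == assignment:
            break  # no account changed cluster, so the result has converged
        assignment = new_assignment
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # keep the old centroid if a cluster became empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

def cluster_accounts(annotations, k):
    """Map each account name to a cluster index based on its labels."""
    vocabulary = sorted({w for labels in annotations.values() for w in labels})
    names = list(annotations)
    vectors = [to_vector(annotations[name], vocabulary) for name in names]
    return dict(zip(names, kmeans(vectors, k)))

clusters = cluster_accounts(accounts, k=2)
print(clusters)
```

In this toy example the two animal-themed accounts should land in one cluster and the two vehicle-themed accounts in the other. At the scale described in the abstract, the same vectorise-then-cluster steps would have to be distributed across a cluster of machines.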

Keywords:
cluster analysis, BigData technology, text annotation, social networks, media content analysis, k-means clustering, GoogLeNet.

Citation:
Rycarev IA, Kirsh DV, Kupriyanov AV. Clustering of media content from social networks using bigdata technology. Computer Optics 2018; 42(5): 921-927. DOI: 10.18287/2412-6179-2018-42-5-921-927.

© 2009, IPSI RAS
151, Molodogvardeiskaya str., Samara, 443001, Russia; E-mail: journal@computeroptics.ru ; Tel: +7 (846) 242-41-24 (Executive secretary), +7 (846) 332-56-22 (Issuing editor), Fax: +7 (846) 332-56-20