Multivariate mixed kernel density estimators and their application in machine learning for classification of biological objects based on spectral measurements

A.A. Sirota1, A.O. Donskikh1, A.V. Akimov1, D.A. Minakov1

Voronezh State University, Voronezh, Russia

DOI: 10.18287/2412-6179-2019-43-4-677-691

Pages: 677-691.

Full text of article: Russian language.

Abstract:
The problem of non-parametric multivariate density estimation for machine learning and data augmentation is considered. A new mixed density estimation method is proposed. It convolves independently obtained kernel density estimates of the unknown distributions of informative features with a known (or independently estimated) density of the non-informative interference that occurs during measurements. The properties of the mixed density estimates obtained with this method are analyzed. The method is compared with the conventional Parzen-Rosenblatt window method applied directly to the training data. The equivalence of the mixed kernel density estimator and a data augmentation procedure based on the known (or estimated) statistical model of the interference is proven theoretically and confirmed experimentally. The applicability of the mixed density estimators to training machine learning algorithms for the classification of biological objects (elements of grain mixtures) based on spectral measurements in the visible and near-infrared regions is evaluated.
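The idea in the abstract can be illustrated with a minimal one-dimensional sketch. It is not the authors' implementation: it assumes a Gaussian kernel and zero-mean Gaussian interference, and the function names (`parzen_kde`, `mixed_kde_numeric`, `mixed_kde_closed_form`, `augment`) and all parameter values are hypothetical. Under these assumptions the convolution of the kernel estimate with the interference density is again a Gaussian kernel estimate with bandwidth sqrt(h^2 + sigma^2), and drawing augmented samples (a resampled training point plus kernel noise plus interference noise) produces data distributed according to that mixed estimate.

```python
import numpy as np


def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated at x (broadcasts over arrays)."""
    z = (x - mean) / std
    return np.exp(-0.5 * z * z) / (std * np.sqrt(2.0 * np.pi))


def parzen_kde(x, samples, h):
    """Parzen-Rosenblatt estimate with a Gaussian kernel of bandwidth h."""
    x = np.asarray(x, dtype=float)
    samples = np.asarray(samples, dtype=float)
    return gaussian_pdf(x[:, None], samples[None, :], h).mean(axis=1)


def mixed_kde_numeric(grid, samples, h, noise_std):
    """Convolve the KDE with the interference density on a uniform grid.

    The grid needs an odd number of points so that the interference
    density is centred on the sample that mode="same" treats as zero lag.
    """
    n = len(grid)
    if n % 2 == 0:
        raise ValueError("grid must have an odd number of points")
    dx = grid[1] - grid[0]
    offsets = (np.arange(n) - n // 2) * dx
    noise_pdf = gaussian_pdf(offsets, 0.0, noise_std)
    # Riemann-sum approximation of the convolution integral.
    return np.convolve(parzen_kde(grid, samples, h), noise_pdf, mode="same") * dx


def mixed_kde_closed_form(x, samples, h, noise_std):
    """Gaussian kernel convolved with Gaussian noise: bandwidths add in quadrature."""
    return parzen_kde(x, samples, np.hypot(h, noise_std))


def augment(samples, h, noise_std, n_new, rng):
    """Draw n_new synthetic points: resampled data + kernel noise + interference."""
    samples = np.asarray(samples, dtype=float)
    base = rng.choice(samples, size=n_new, replace=True)
    return base + rng.normal(0.0, h, n_new) + rng.normal(0.0, noise_std, n_new)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = rng.uniform(-2.0, 2.0, size=30)      # toy "informative feature"
    grid = np.linspace(-10.0, 10.0, 2001)        # odd number of grid points
    numeric = mixed_kde_numeric(grid, train, h=0.3, noise_std=0.5)
    closed = mixed_kde_closed_form(grid, train, h=0.3, noise_std=0.5)
    print("max |numeric - closed form|:", np.max(np.abs(numeric - closed)))
```

In the multivariate setting described in the abstract, the scalar bandwidth and noise standard deviation would be replaced by covariance matrices, and a non-Gaussian interference model would require the numerical convolution or the sampling route rather than the closed form.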

Keywords:
machine learning, pattern classification, data augmentation, kernel density estimation, spectral measurements

Citation:
Sirota AA, Donskikh AO, Akimov AV, Minakov DA. Multivariate mixed kernel density estimators and their application in machine learning for classification of biological objects based on spectral measurements. Computer Optics 2019; 43(4): 677-691. DOI: 10.18287/2412-6179-2019-43-4-677-691.



