
Towards energy-efficient neural network calculations
E.S. Noskova, I.E. Zakharov, Y.N. Shkandybin, S.G. Rykovanov

Skolkovo Institute of Science and Technology,
Bolshoi Boulevard 30, bld. 1, Moscow, 121205, Russia


DOI: 10.18287/2412-6179-CO-914

Pages: 160-166.

Full text of the article is in Russian.

Abstract:
Creating high-performance, energy-efficient hardware for artificial intelligence workloads is currently a pressing problem. The most common solution is to run neural networks on deep learning accelerators such as GPUs and tensor processing units (TPUs). NVIDIA has recently announced the NVDLA project, which makes it possible to design neural network accelerators from open-source code. This work describes the full cycle of building a prototype NVDLA accelerator and validating it by running the ResNet-50 neural network on it. Finally, the performance and power efficiency of the NVDLA prototype are compared with those of a GPU and a CPU; the results show that NVDLA is superior in many respects.
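
The comparison described in the abstract ultimately reduces to throughput per watt. Below is a minimal sketch of that metric; the function and all numbers are hypothetical placeholders for illustration, not measurements from the article.

    # Illustrative sketch of the evaluation metric: throughput per watt.
    # All device names and numbers below are hypothetical placeholders,
    # not measurements reported in the article.

    def images_per_joule(throughput_img_per_s: float, power_w: float) -> float:
        """Energy efficiency of inference: images per joule = (img/s) / W."""
        return throughput_img_per_s / power_w

    # Hypothetical ResNet-50 inference figures (img/s, W):
    devices = [
        ("NVDLA prototype (FPGA)", 100.0, 10.0),
        ("GPU", 1000.0, 250.0),
        ("CPU", 200.0, 150.0),
    ]
    for name, img_s, watts in devices:
        print(f"{name}: {images_per_joule(img_s, watts):.2f} img/J")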

Keywords:
NVDLA, FPGA, inference, deep learning accelerators.

Citation:
Noskova ES, Zakharov IE, Shkandybin YN, Rykovanov SG. Towards energy-efficient neural network calculations. Computer Optics 2022; 46(1): 160-166. DOI: 10.18287/2412-6179-CO-914.

References:

  1. Goodfellow I, Bengio Y, Courville A. Deep learning. Cambridge: The MIT Press; 2016.
  2. Zacharov I, Arslanov R, Gunin M, Stefonishin D, Pavlov S, Panarin O, Maliutin A, Rykovanov SG, Fedorov M. “Zhores” – Petaflops supercomputer for data-driven modeling, machine learning and artificial intelligence installed in Skolkovo Institute of Science and Technology. Open Eng 2019; 9(1): 512-520.
  3. Shaw DE, Deneroff MM, Dror RO, et al. Anton, a special-purpose machine for molecular dynamics simulation. Commun ACM 2008; 51(7): 91-97.
  4. Singer G. Deep Learning is coming of age. 2018. Source: <https://www.nextplatform.com/2018/10/18/deep-learning-is-coming-of-age/>.
  5. Merenda M, Porcaro C, Iero D. Machine learning for AI-enabled IoT devices: a review. Sensors 2020; 20(9): 2533.
  6. Park J, Naumov M, Basu P, et al. Deep learning inference in facebook data centers: Characterization, performance optimizations and hardware implications. arXiv preprint arXiv:1811.09886. 2018. Source: <https://arxiv.org/abs/1811.09886>.
  7. Mishra A, Nurvitadhi E, Cook J, Marr D. WRPN: Wide reduced-precision networks. ICLR (Poster) 2018.
  8. Chen Y, Xie Y, Song L, Chen F, Tang T. A survey of accelerator architectures for deep neural networks. Engineering 2020; 6(3): 264-274.
  9. Jouppi NP, Young C, Patil N, et al. In-datacenter performance analysis of a tensor processing unit. Proc 44th Annual Int Symposium on Computer Architecture 2017: 1-12.
  10. Guo K, Zeng S, Yu J, Wang Y, Yang H. A survey of FPGA-based neural network accelerator. arXiv preprint arXiv:1712.08934. 2017. Source: <https://arxiv.org/abs/1712.08934>.
  11. NVDLA. Source: <http://nvdla.org/>.
  12. Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick RB, Guadarrama S, Darrell T. Caffe: Convolutional architecture for fast feature embedding. Proc 22nd ACM Int Conf on Multimedia 2014: 675-678.
  13. Tan Z, Waterman A, Cook H, Bird S, Asanovic K, Patterson D. A case for FAME: FPGA architecture model execution. ACM SIGARCH Computer Architecture News 2010; 38(3): 290-301.
  14. BeagleV Forum. Source: <https://beagleboard.org/beaglev>.
  15. The economics of ASICs: At what point does a custom SoC become viable? Source: <https://www.electronicdesign.com/technologies/embedded-revolution/article/21808278/the-economics-of-asics-at-what-point-does-a-custom-soc-become-viable>.
  16. Xilinx Zynq UltraScale+ MPSoC ZCU104 evaluation kit. Source: <https://www.xilinx.com/products/boards-and-kits/zcu104.html>.
  17. Delbergue G, Burton M, Konrad F, Le Gal B, Jego C. QBox: An industrial solution for virtual platform simulation using QEMU and SystemC TLM-2.0. 8th European Congress on Embedded Real Time Software and Systems (ERTS 2016) 2016: hal-01292317.
  18. Xilinx Vivado Design Suite. Source: <https://www.xilinx.com/products/design-tools/vivado.html>.
  19. Farshchi F, Huang Q, Yun H. Integrating NVIDIA deep learning accelerator (NVDLA) with RISC-V SoC on FireSim. 2019 2nd Workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications (EMC2) 2019: 21-25.
  20. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. 2016 IEEE Conf on Computer Vision and Pattern Recognition (CVPR) 2016: 770-778.
  21. UltraScale+ FPGA product tables and product selection guide. Source: <https://www.xilinx.com/support/documentation/selection-guides/ultrascale-plus-fpga-product-selection-guide.pdf>.
  22. GeForce GTX 1080 Ti. Source: <https://www.nvidia.com/en-sg/geforce/products/10series/geforce-gtx-1080-ti/>.
  23. GeForce RTX 2080 Ti. Source: <https://www.nvidia.com/ru-ru/geforce/graphics-cards/rtx-2080-ti/>.
  24. Second Generation Intel Xeon Scalable processors datasheet. Source: <https://www.intel.ru/content/www/ru/ru/products/docs/processors/xeon/2nd-gen-xeon-scalable-datasheet-vol-1.html>.
  25. Likwid perfctr. Source: <https://github.com/RRZE-HPC/likwid/wiki/likwid-perfctr>.
  26. TechPowerUp. NVIDIA GeForce RTX 2080 Ti. Source: <https://www.techpowerup.com/gpu-specs/geforce-rtx-2080-ti.c3305>.
  27. TechPowerUp. NVIDIA GeForce GTX 1080 Ti. Source: <https://www.techpowerup.com/gpu-specs/geforce-gtx-1080-ti.c2877>.
  28. Zakharov IE, Panarin OA, Rykovanov SG, Zagidullin RR, Malyutin AK, Shkandybin YuN, Ermekova AE. Monitoring applications on the ZHORES cluster at Skoltech. Program systems: Theory and Applications 2021; 12(2:49): 73-103.
  29. Panarin OA, Zacharov IE. Monitoring mobile information processing systems. Russian Digital Libraries Journal 2020; 23(4): 835-847.
