  
Towards energy-efficient neural network calculations
  E.S. Noskova 1, I.E. Zakharov 1, Y.N. Shkandybin 1, S.G. Rykovanov 1
1 Skolkovo Institute of Science and Technology,
  121205, Moscow, Russia, Bolshoi Boulevard 30, building 1
DOI: 10.18287/2412-6179-CO-914
Pages: 160-166.
Full text of the article is in Russian.
 
Abstract:
Nowadays, the problem of creating high-performance and energy-efficient hardware for Artificial Intelligence tasks is very acute. The most popular solution to this problem is the use of Deep Learning Accelerators, such as GPUs and Tensor Processing Units (TPUs), to run neural networks. Recently, NVIDIA announced the NVDLA project, which makes it possible to design neural network accelerators based on open-source code. This work describes the full cycle of creating a prototype NVDLA accelerator, as well as testing the resulting solution by running the ResNet-50 neural network on it. Finally, the performance and power efficiency of the prototype NVDLA accelerator are assessed against a GPU and a CPU; the results show that NVDLA is superior in many respects.
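As a purely illustrative sketch (not taken from the paper), the CPU/GPU baseline in such a comparison could be timed with the pycaffe API of the Caffe framework cited below; the model file names, the 'data' input blob name and the batch size are assumptions, and the NVDLA path would instead compile the same Caffe model into a loadable for the accelerator's runtime.

    # Hypothetical baseline benchmark: time ResNet-50 inference with pycaffe.
    # File names and the 'data' blob name are assumptions, not artifacts of the paper.
    import time
    import numpy as np
    import caffe

    caffe.set_mode_cpu()                      # or caffe.set_mode_gpu() for the GPU baseline
    net = caffe.Net('resnet50_deploy.prototxt', 'resnet50.caffemodel', caffe.TEST)

    # Feed a dummy 224x224 RGB image; a real benchmark would use preprocessed ImageNet data.
    net.blobs['data'].reshape(1, 3, 224, 224)
    net.blobs['data'].data[...] = np.random.rand(1, 3, 224, 224).astype(np.float32)

    net.forward()                             # warm-up pass
    runs = 100
    start = time.time()
    for _ in range(runs):
        net.forward()
    elapsed = time.time() - start
    print('Average inference time: %.2f ms' % (1000.0 * elapsed / runs))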
Keywords:
NVDLA, FPGA, inference, deep learning accelerators.
Citation:
  Noskova ES, Zakharov IE, Shkandybin YN, Rykovanov SG. Towards energy-efficient neural network calculations. Computer Optics 2022; 46(1): 160-166. DOI: 10.18287/2412-6179-CO-914.
References:
- Goodfellow I, Bengio Y, Courville A. Deep learning. Cambridge: The MIT Press; 2016.
- Zacharov I, Arslanov R, Gunin M, Stefonishin D, Pavlov S, Panarin O, Maliutin A, Rykovanov SG, Fedorov M. “Zhores” – Petaflops supercomputer for data-driven modeling, machine learning and artificial intelligence installed in Skolkovo Institute of Science and Technology. Open Eng 2019; 9(1): 512-520.
- Shaw DE, Deneroff MM, Dror RO, et al. Anton, a special-purpose machine for molecular dynamics simulation. Commun ACM 2008; 51(7): 91-97.
- Singer G. Deep learning is coming of age. 2018. Source: <https://www.nextplatform.com/2018/10/18/deep-learning-is-coming-of-age/>.
- Merenda M, Porcaro C, Iero D. Machine learning for AI-enabled IoT devices: a review. Sensors 2020; 20(9): 2533.
- Park J, Naumov M, Basu P, et al. Deep learning inference in Facebook data centers: Characterization, performance optimizations and hardware implications. arXiv preprint arXiv:1811.09886. 2018. Source: <https://arxiv.org/abs/1811.09886>.
- Mishra A, Nurvitadhi E, Cook J, Marr D. WRPN: Wide reduced-precision networks. ICLR (Poster) 2018.
- Chen Y, Xie Y, Song L, Chen F, Tang T. A survey of accelerator architectures for deep neural networks. Engineering 2020; 6(3): 264-274.
- Jouppi NP, Young C, Patil N, et al. In-datacenter performance analysis of a tensor processing unit. Proc 44th Annual Int Symposium on Computer Architecture 2017: 1-12.
- Guo K, Zeng S, Yu J, Wang Y, Yang H. A survey of FPGA-based neural network accelerator. arXiv preprint arXiv:1712.08934. 2017. Source: <https://arxiv.org/abs/1712.08934>.
- NVDLA. Source: <http://nvdla.org/>.
- Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick RB, Guadarrama S, Darrell T. Caffe: Convolutional architecture for fast feature embedding. Proc 22nd ACM Int Conf on Multimedia 2014: 675-678.
- Tan Z, Waterman A, Cook H, Bird S, Asanovic K, Patterson D. A case for FAME: FPGA architecture model execution. ACM SIGARCH Computer Architecture News 2010; 38(3): 290-301.
- BeagleV Forum. Source: <https://beagleboard.org/beaglev>.
- The economics of ASICs: At what point does a custom SoC become viable? Source: <https://www.electronicdesign.com/technologies/embedded-revolution/article/21808278/the-economics-of-asics-at-what-point-does-a-custom-soc-become-viable>.
- Xilinx Zynq UltraScale+ MPSoC ZCU104 evaluation kit. Source: <https://www.xilinx.com/products/boards-and-kits/zcu104.html>.
- Delbergue G, Burton M, Konrad F, Le Gal B, Jego C. QBox: An industrial solution for virtual platform simulation using QEMU and SystemC TLM-2.0. 8th European Congress on Embedded Real Time Software and Systems (ERTS 2016) 2016: hal-01292317.
- The Xilinx Vivado. Source: <https://www.xilinx.com/products/design-tools/vivado.html>.
- Farshchi F, Huang Q, Yun H. Integrating NVIDIA deep learning accelerator (NVDLA) with RISC-V SoC on FireSim. 2019 2nd Workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications (EMC2) 2019: 21-25.
- He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. 2016 IEEE Conf on Computer Vision and Pattern Recognition (CVPR) 2016: 770-778.
- UltraScale+ FPGA product tables and product selection guide. Source: <https://www.xilinx.com/support/documentation/selection-guides/ultrascale-plus-fpga-product-selection-guide.pdf>.
- GeForce GTX 1080 Ti. Source: <https://www.nvidia.com/en-sg/geforce/products/10series/geforce-gtx-1080-ti/>.
- GeForce RTX 2080 Ti. Source: <https://www.nvidia.com/ru-ru/geforce/graphics-cards/rtx-2080-ti/>.
- Second Generation Intel Xeon Scalable processors datasheet. Source: <https://www.intel.ru/content/www/ru/ru/products/docs/processors/xeon/2nd-gen-xeon-scalable-datasheet-vol-1.html>.
- Likwid perfctr. Source: <https://github.com/RRZE-HPC/likwid/wiki/likwid-perfctr>.
- TechPowerUp. NVIDIA GeForce RTX 2080 Ti. Source: <https://www.techpowerup.com/gpu-specs/geforce-rtx-2080-ti.c3305>.
- TechPowerUp. NVIDIA GeForce GTX 1080 Ti. Source: <https://www.techpowerup.com/gpu-specs/geforce-gtx-1080-ti.c2877>.
- Zakharov IE, Panarin OA, Rykovanov SG, Zagidullin RR, Malyutin AK, Shkandybin YuN, Ermekova AE. Monitoring applications on the ZHORES cluster at Skoltech. Program Systems: Theory and Applications 2021; 12(2:49): 73-103.
- Panarin OA, Zacharov IE. Monitoring mobile information processing systems. Russian Digital Libraries Journal 2020; 23(4): 835-847.
  