FaceDetectNet: face detection via fully-convolutional network
Gorbatsevich V.S., Moiseenko A.S., Vizilter Y.V.


State Research Institute of Aviation Systems (GosNIIAS), Moscow, Russia;

Moscow Institute of Physics and Technology (MIPT), Moscow, Russia


Face detection is one of the most popular computer vision tasks. Many face detection approaches have been proposed, including various CNN-based techniques, but finding the optimal balance between detection quality and computational speed remains an open problem. In this paper we propose a new CNN-based solution for face detection called FaceDetectNet. Our CNN architecture builds on the ideas of YOLO/DetectNet and the GoogleNet architecture, supplemented with new tools and implementation details developed specifically for our face detection application. We propose an original iterative proposal clustering (IPC) algorithm for aggregating the output face proposals formed by the CNN, and a 2-level “weak pyramid” that provides better detection quality on test sets containing both small and huge images. Our face detection approach is close to the previously proposed SSD-based face detection; the principal difference is that we use the deep features of the top hidden CNN layer to form face proposals of any size. We thus exploit global semantic and context information to improve the detection quality for small faces. FaceDetectNet is trained and tested on the challenging WIDER FACE detection benchmark. Our algorithm achieves an average precision (AP) of 0.69 on the WIDER FACE hard level and thus outperforms all competing detectors on that level except the state-of-the-art HR solution. Note that the HR solution is based on a substantially deeper and slower CNN, while FaceDetectNet can run in real time on an NVIDIA GeForce 1080 GPU. By contrast, an SSD-based face detector with comparable CNN parameters achieves an AP of only 0.625 on the WIDER FACE hard level. Our approach therefore provides the best quality at a reasonable computational speed.

CNN, face detection, DetectNet, YOLO.

Gorbatsevich VS, Moiseenko AS, Vizilter YV. FaceDetectNet: Face detection via fully-convolutional network. Computer Optics 2019; 43(1): 63-71. DOI: 10.18287/2412-6179-2019-43-1-63-71.


  1. Redmon J, Divvala S, Girshick R, Farhadi A. You only look once: Unified, real-time object detection. Proc IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2016: 779-788. DOI:10.1109/CVPR.2016.91.
  2. Tao A, Barker J, Sarathy S. DetectNet: Deep neural network for object detection in DIGITS. Source: < https://devblogs.nvidia.com/detectnet-deep-neural-network-object-detection-digits/ >.
  3. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C, Berg A. SSD: Single shot multibox detector. Source: < https://arxiv.org/abs/1512.02325 >. DOI: 10.1007/978-3-319-46448-0_2.
  4. Hu P, Ramanan D. Finding tiny faces. Proc IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017. Source: < https://arxiv.org/abs/1612.04402 >.
  5. Zhu C, Zheng Y, Luu K, Savvides M. CMS-RCNN: contextual multi-scale region-based CNN for unconstrained face detection. Source: < https://arxiv.org/abs/1606.05413 > 2016.
  6. Viola P, Jones M. Rapid object detection using a boosted cascade of simple features. Proc IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2001; 1. DOI: 10.1109/CVPR.2001.990517.
  7. Bourdev L, Brandt J. Robust object detection via soft cascade. Proc IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2005; 2: 236-243. DOI: 10.1109/CVPR.2005.310.
  8. Chen D, Ren S, Wei Y, Cao X, Sun J. Joint cascade face detection and alignment. In Book: Fleet D, Pajdla T, Schiele B, Tuytelaars T, eds. Computer Vision – ECCV 2014. Cham: Springer; 2014: 109-122. DOI: 10.1007/978-3-319-10599-4_8.
  9. Li J, Wang T, Zhang Y. Face detection using SURF Cascade. Proc IEEE International Conference on Computer Vision Workshops (ICCV Workshops) 2011: 2183-2190. DOI: 10.1109/ICCVW.2011.6130518.
  10. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. NIPS'12 Proc 25th International Conference on Neural Information Processing Systems 2012; 1: 1097-1105.
  11. Girshick R, Donahue J, Darrell T, Malik J. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR '14 Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2014: 580-587. DOI: 10.1109/CVPR.2014.81.
  12. Ren S, He K, Girshick R, Sun J. Faster R-CNN: Towards real-time object detection with region proposal networks. arXiv preprint 2015. Source: < https://arxiv.org/abs/1506.01497 >.
  13. Yang S, Luo P, Loy CC, Tang X. From facial parts responses to face detection: A deep learning approach. Proc IEEE International Conference on Computer Vision 2015: 3676-3684. DOI: 10.1109/ICCV.2015.419.
  14. Jiang H, Learned-Miller E. Face detection with the faster R-CNN. 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017) 2017: 650-657. DOI: 10.1109/FG.2017.82.
  15. Li H, Lin Z, Shen X, Brandt J, Hua G. A convolutional neural network cascade for face detection. Proc IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2015: 5325-5334. DOI: 10.1109/CVPR.2015.7299170.
  16. Qin H, Yan J, Li X, Hu X. Joint training of cascaded CNN for face detection. Proc 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2016: 3456-3465. DOI: 10.1109/CVPR.2016.376.
  17. Zhang K, Zhang Z, Li Z, Qiao Y. Joint face detection and alignment using multi-task cascaded convolutional networks. IEEE Signal Processing Letters 2016; 23(10): 1499-1503. DOI: 10.1109/LSP.2016.2603342.
  18. Zhang C, Zhang Z. Improving multiview face detection with multi-task deep convolutional neural networks. Proc IEEE Winter Conference on Applications of Computer Vision 2014: 1036-1041. DOI: 10.1109/WACV.2014.6835990.
  19. Redmon J, Farhadi A. YOLO9000: Better, faster, stronger. arXiv preprint. Source: < https://arxiv.org/abs/1612.08242 >.
  20. Yang S, Xiong Y, Loy CC, Tang X. Face detection through scale-friendly deep convolutional networks. arXiv preprint. Source: < https://arxiv.org/abs/1706.02863 >.
  21. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A. Going deeper with convolutions. Proc 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2015: 1-9. DOI: 10.1109/CVPR.2015.7298594.
  22. Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T. Caffe: Convolutional architecture for fast feature embedding. Proc 22nd ACM international conference on Multimedia 2014: 675-678. DOI: 10.1145/2647868.2654889.
  23. Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L. ImageNet: A large-scale hierarchical image database. Proc IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2009: 248-255. DOI: 10.1109/CVPR.2009.5206848.
  24. Yang S, Luo P, Loy CC, Tang X. WIDER FACE: A face detection benchmark. Proc 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2016: 5525-5533. DOI: 10.1109/CVPR.2016.596.
