A Survey of Domestic and International Research on Object Detection Algorithms
Object detection is a major research direction in computer vision, and it has advanced markedly in recent years under the impetus of deep learning. This article surveys the current state of object detection research, from traditional methods to deep-learning-based anchor-based and anchor-free algorithms, describing the principles, characteristics, and performance of each family of methods.
Traditional Object Detection Algorithms
Traditional object detection algorithms rely on hand-crafted features, with a basic pipeline of candidate region selection, feature extraction, and classification. The Viola-Jones detector achieves efficient detection through optimizations such as the integral image, AdaBoost classifiers, and a cascade structure; the HOG detector extracts histograms of oriented gradients from local pixel blocks and is fairly robust to illumination changes and deformation; and the DPM detector builds on HOG with techniques such as bounding-box regression, winning the VOC object detection challenge. Although these traditional algorithms performed well in their day, they fall far short of deep-learning-based methods in accuracy, computational cost, and detection speed.
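To make the integral-image trick concrete, the sketch below (a minimal NumPy illustration, not code from the original paper) shows how a summed-area table reduces any rectangular pixel sum, the building block of Viola-Jones's Haar-like features, to four table lookups:

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero border: ii[y, x] = sum(img[:y, :x])."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img.astype(np.int64), axis=0), axis=1)
    return ii

def rect_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] in O(1): four lookups into the table."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]
```

Because the table is computed once per image, every Haar-like feature evaluation costs the same regardless of window size, which is what makes the sliding-window cascade fast enough for real-time detection.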
Deep-Learning-Based Anchor-Based Two-Stage Object Detection Algorithms
Deep-learning-based object detection algorithms fall into two broad families, anchor-based and anchor-free, and anchor-based methods further divide into one-stage and two-stage detectors. A two-stage detector first selects candidate regions from the input image and then classifies those regions and regresses object bounding boxes within them. The earliest CNN-based two-stage detector was R-CNN, which uses selective search to propose regions likely to contain objects, feeds each region through a CNN to extract features, and finally passes the features to SVM classifiers for the decision. SPPNet adds a spatial pyramid pooling layer that avoids recomputing features per region; Fast R-CNN reached 70.0% mAP on the VOC 2007 dataset; Faster R-CNN introduced the Region Proposal Network (RPN) to generate proposals, greatly improving detection speed; FPN adds a top-down pathway with lateral connections, further improving accuracy; Cascade R-CNN stacks several detection stages trained with increasing IoU thresholds; and Grid R-CNN replaces box regression with keypoint-based localization, achieving state-of-the-art results at the time.
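The IoU thresholds at the heart of Cascade R-CNN measure overlap between a predicted and a ground-truth box; each successive stage is trained with a higher threshold (0.5, 0.6, 0.7 in the paper) so that later stages refine progressively better proposals. A minimal sketch of the metric, assuming the common (x1, y1, x2, y2) corner format:

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # zero if disjoint
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```

The same metric underlies the mAP figure quoted above: a detection counts as correct only if its IoU with a ground-truth box exceeds the evaluation threshold (0.5 for VOC).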
Figure 1.1 Roadmap of object detection algorithms over the past two decades
Deep-Learning-Based Anchor-Based One-Stage Object Detection Algorithms
One-stage detectors produce detections directly, predicting class probabilities and bounding-box coordinates in a single forward pass. The earliest one-stage detector, YOLO v1, introduced the idea of dividing the image into a grid of cells, each of which predicts bounding boxes and class probabilities. YOLO v2 and YOLO v3 improved the backbone network and added multi-scale detection branches, raising both accuracy and speed. SSD adopts multi-reference and multi-resolution techniques, while RetinaNet introduces Focal Loss to address the class-imbalance problem. YOLO v4 consolidated many of the tricks circulating in the detection community at the time, and YOLO v5 surpassed it in both accuracy and speed, becoming the mainstream one-stage detector. The more recent YOLO v6 draws on the structural re-parameterization ideas of RepVGG to redesign its backbone and neck, further improving the accuracy-speed trade-off.
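RetinaNet's Focal Loss reshapes binary cross-entropy so that easy, well-classified examples are down-weighted and training concentrates on hard ones, which is how a dense one-stage detector copes with extreme foreground-background imbalance. A minimal NumPy sketch with the paper's defaults α = 0.25 and γ = 2:

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).

    p: predicted foreground probabilities in (0, 1); y: 0/1 labels.
    """
    p_t = np.where(y == 1, p, 1.0 - p)            # prob. of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t + 1e-12)
```

Setting γ = 0 recovers ordinary α-weighted cross-entropy; increasing γ shrinks the loss of confidently correct predictions toward zero.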
Deep-Learning-Based Anchor-Free Object Detection Algorithms
Anchor-based detectors are sensitive to the number, sizes, and aspect ratios of their anchors, and anchor-free detectors emerged in response. CornerNet recasts bounding-box prediction as paired-keypoint prediction; CenterNet detects object center points directly; FSAF adds a feature-selective anchor-free module that lets each instance choose its feature-pyramid level during training; FCOS detects objects with a per-pixel prediction strategy; SAPD introduces soft-weighted anchor points and soft pyramid-level selection; and YOLOX makes the YOLO series anchor-free while adopting a decoupled detection head. All of these algorithms achieve strong performance on the COCO dataset.
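To illustrate the keypoint-style decoding these detectors share, here is a hypothetical sketch that extracts candidate object centers from a single-class score heatmap by keeping 3×3 local maxima, loosely mirroring CenterNet's max-pooling NMS (the width/height and offset regression heads are omitted):

```python
import numpy as np

def decode_centers(heatmap, k=10):
    """Return the top-k (y, x, score) local maxima of a 2-D heatmap."""
    heatmap = np.asarray(heatmap, dtype=float)
    h, w = heatmap.shape
    padded = np.pad(heatmap, 1, constant_values=-np.inf)
    # Nine shifted views of the padded map; their pointwise max is each
    # pixel's 3x3 neighborhood maximum (the pixel itself included).
    shifted = np.stack([padded[dy:dy + h, dx:dx + w]
                        for dy in range(3) for dx in range(3)])
    peaks = np.where(heatmap >= shifted.max(axis=0), heatmap, 0.0)
    order = np.argsort(peaks.ravel())[::-1][:k]
    ys, xs = np.unravel_index(order, (h, w))
    return list(zip(ys.tolist(), xs.tolist(), peaks.ravel()[order].tolist()))
```

A full detector would additionally threshold the peak scores and read box sizes and sub-pixel offsets from the regression heads at each retained center.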
Figure 1.3 Schematic of anchor-free object detection network structures
References
- Viola P, Jones M. Rapid object detection using a boosted cascade of simple features[C]//Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001). IEEE, 2001, 1: I-I.
- Dalal N, Triggs B. Histograms of oriented gradients for human detection[C]//2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05). IEEE, 2005, 1: 886-893.
- Felzenszwalb P F, Girshick R B, McAllester D, et al. Object detection with discriminatively trained part-based models[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010, 32(9): 1627-1645.
- Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014: 580-587.
- He K, Zhang X, Ren S, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1904-1916.
- Girshick R. Fast R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision. 2015: 1440-1448.
- Ren S, He K, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. Advances in Neural Information Processing Systems, 2015, 28.
- Lin T Y, Maire M, Belongie S, et al. Microsoft COCO: Common objects in context[C]//European Conference on Computer Vision. Springer, Cham, 2014: 740-755.
- Lin T Y, Dollár P, Girshick R, et al. Feature pyramid networks for object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 2117-2125.
- Cai Z, Vasconcelos N. Cascade R-CNN: Delving into high quality object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 6154-6162.
- Lu X, Li B, Yue Y, et al. Grid R-CNN[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 7363-7372.
- Redmon J, Divvala S, Girshick R, et al. You only look once: Unified, real-time object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 779-788.
- Redmon J, Farhadi A. YOLO9000: Better, faster, stronger[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 7263-7271.
- Redmon J, Farhadi A. YOLOv3: An incremental improvement[J]. arXiv preprint arXiv:1804.02767, 2018.
- Liu W, Anguelov D, Erhan D, et al. SSD: Single shot multibox detector[C]//European Conference on Computer Vision. Springer, Cham, 2016: 21-37.
- Lin T Y, Goyal P, Girshick R, et al. Focal loss for dense object detection[C]//Proceedings of the IEEE International Conference on Computer Vision. 2017: 2980-2988.
- Jocher G, Stoken A, Chaurasia A, et al. Ultralytics/yolov5: v6.0 - YOLOv5n 'Nano' models, Roboflow integration, TensorFlow export, OpenCV DNN support[Z]. Zenodo, 2021-10-12.
- Li C, Li L, Jiang H, et al. YOLOv6: A single-stage object detection framework for industrial applications[J]. arXiv preprint arXiv:2209.02976, 2022.
- Ding X, Zhang X, Ma N, et al. RepVGG: Making VGG-style ConvNets great again[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 13733-13742.
- Law H, Deng J. CornerNet: Detecting objects as paired keypoints[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 734-750.
- Duan K, Bai S, Xie L, et al. CenterNet: Keypoint triplets for object detection[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019: 6569-6578.
- Zhu C, He Y, Savvides M. Feature selective anchor-free module for single-shot object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 840-849.
- Tian Z, Shen C, Chen H, et al. FCOS: Fully convolutional one-stage object detection[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019: 9627-9636.
- Zhu C, Chen F, Shen Z, et al. Soft anchor-point object detection[C]//European Conference on Computer Vision. Springer, Cham, 2020: 91-107.
- Ge Z, Liu S, Wang F, et al. YOLOX: Exceeding YOLO series in 2021[J]. arXiv preprint arXiv:2107.08430, 2021.
This article was originally published on CSDN by Joejwu.