A Survey of Domestic and International Research on Object Detection Algorithms
Object detection is a major research direction in computer vision, and it has advanced markedly in recent years under the impetus of deep learning. This article surveys the current state of object detection research, from traditional methods to deep-learning-based anchor-based and anchor-free algorithms, describing the principles, characteristics, and performance of each family of methods.
Traditional Object Detection Algorithms
Traditional object detection algorithms rely on hand-crafted features, with a basic pipeline of candidate region selection, feature extraction, and classification. The Viola-Jones detector achieves efficient detection through optimizations such as the integral image, AdaBoost classifiers, and a cascade structure; the HOG detector extracts histograms of oriented gradients from local pixel blocks and is fairly robust to illumination changes and deformation; and the DPM detector builds on HOG with techniques such as bounding-box regression, winning the VOC object detection challenge. Although these traditional algorithms performed well in their day, they fall far short of deep-learning-based methods in accuracy, computational cost, and detection speed.
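To make the integral-image trick concrete, the sketch below (a minimal NumPy illustration, not code from the original paper) shows how a summed-area table reduces any rectangular pixel sum, the building block of Viola-Jones's Haar-like features, to four table lookups:

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero border: ii[y, x] = sum(img[:y, :x])."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img.astype(np.int64), axis=0), axis=1)
    return ii

def rect_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] in O(1): four lookups into the table."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]
```

Because the table is computed once per image, every Haar-like feature evaluation costs the same regardless of window size, which is what makes the sliding-window cascade fast enough for real-time detection.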
Deep-Learning-Based Anchor-Based Two-Stage Object Detection Algorithms
Deep-learning-based object detection algorithms fall into two broad families, anchor-based and anchor-free, and anchor-based methods further divide into one-stage and two-stage detectors. A two-stage detector first selects candidate regions from the input image and then classifies those regions and regresses object bounding boxes within them. The earliest CNN-based two-stage detector was R-CNN, which uses selective search to propose regions likely to contain objects, feeds each region through a CNN to extract features, and finally passes the features to SVM classifiers for the decision. SPPNet adds a spatial pyramid pooling layer that avoids recomputing features per region; Fast R-CNN reached 70.0% mAP on the VOC 2007 dataset; Faster R-CNN introduced the Region Proposal Network (RPN) to generate proposals, greatly improving detection speed; FPN adds a top-down pathway with lateral connections, further improving accuracy; Cascade R-CNN stacks several detection stages trained with increasing IoU thresholds; and Grid R-CNN replaces box regression with keypoint-based localization, achieving state-of-the-art results at the time.
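The IoU thresholds at the heart of Cascade R-CNN measure overlap between a predicted and a ground-truth box; each successive stage is trained with a higher threshold (0.5, 0.6, 0.7 in the paper) so that later stages refine progressively better proposals. A minimal sketch of the metric, assuming the common (x1, y1, x2, y2) corner format:

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # zero if disjoint
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```

The same metric underlies the mAP figure quoted above: a detection counts as correct only if its IoU with a ground-truth box exceeds the evaluation threshold (0.5 for VOC).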
Figure 1.1 Roadmap of object detection algorithms over the past two decades
Deep-Learning-Based Anchor-Based One-Stage Object Detection Algorithms
One-stage detectors produce detections directly, predicting class probabilities and bounding-box coordinates in a single forward pass. The earliest one-stage detector, YOLO v1, introduced the idea of dividing the image into a grid of cells, each of which predicts bounding boxes and class probabilities. YOLO v2 and YOLO v3 improved the backbone network and added multi-scale detection branches, raising both accuracy and speed. SSD adopts multi-reference and multi-resolution techniques, while RetinaNet introduces Focal Loss to address the class-imbalance problem. YOLO v4 consolidated many of the tricks circulating in the detection community at the time, and YOLO v5 surpassed it in both accuracy and speed, becoming the mainstream one-stage detector. The more recent YOLO v6 draws on the structural re-parameterization ideas of RepVGG to redesign its backbone and neck, further improving the accuracy-speed trade-off.
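RetinaNet's Focal Loss reshapes binary cross-entropy so that easy, well-classified examples are down-weighted and training concentrates on hard ones, which is how a dense one-stage detector copes with extreme foreground-background imbalance. A minimal NumPy sketch with the paper's defaults α = 0.25 and γ = 2:

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).

    p: predicted foreground probabilities in (0, 1); y: 0/1 labels.
    """
    p_t = np.where(y == 1, p, 1.0 - p)            # prob. of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t + 1e-12)
```

Setting γ = 0 recovers ordinary α-weighted cross-entropy; increasing γ shrinks the loss of confidently correct predictions toward zero.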
Deep-Learning-Based Anchor-Free Object Detection Algorithms
Anchor-based detectors are sensitive to the number, sizes, and aspect ratios of their anchors, and anchor-free detectors emerged in response. CornerNet recasts bounding-box prediction as paired-keypoint prediction; CenterNet detects object center points directly; FSAF adds a feature-selective anchor-free module that lets each instance choose its feature-pyramid level during training; FCOS detects objects with a per-pixel prediction strategy; SAPD introduces soft-weighted anchor points and soft pyramid-level selection; and YOLOX makes the YOLO series anchor-free while adopting a decoupled detection head. All of these algorithms achieve strong performance on the COCO dataset.
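To illustrate the keypoint-style decoding these detectors share, here is a hypothetical sketch that extracts candidate object centers from a single-class score heatmap by keeping 3×3 local maxima, loosely mirroring CenterNet's max-pooling NMS (the width/height and offset regression heads are omitted):

```python
import numpy as np

def decode_centers(heatmap, k=10):
    """Return the top-k (y, x, score) local maxima of a 2-D heatmap."""
    heatmap = np.asarray(heatmap, dtype=float)
    h, w = heatmap.shape
    padded = np.pad(heatmap, 1, constant_values=-np.inf)
    # Nine shifted views of the padded map; their pointwise max is each
    # pixel's 3x3 neighborhood maximum (the pixel itself included).
    shifted = np.stack([padded[dy:dy + h, dx:dx + w]
                        for dy in range(3) for dx in range(3)])
    peaks = np.where(heatmap >= shifted.max(axis=0), heatmap, 0.0)
    order = np.argsort(peaks.ravel())[::-1][:k]
    ys, xs = np.unravel_index(order, (h, w))
    return list(zip(ys.tolist(), xs.tolist(), peaks.ravel()[order].tolist()))
```

A full detector would additionally threshold the peak scores and read box sizes and sub-pixel offsets from the regression heads at each retained center.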
Figure 1.3 Schematic of anchor-free object detection network structures
References
- Viola P, Jones M. Rapid object detection using a boosted cascade of simple features[C]//Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001). IEEE, 2001, 1: I-I.
- Dalal N, Triggs B. Histograms of oriented gradients for human detection[C]//2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05). IEEE, 2005, 1: 886-893.
- Felzenszwalb P F, Girshick R B, McAllester D, et al. Object detection with discriminatively trained part-based models[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010, 32(9): 1627-1645.
- Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014: 580-587.
- He K, Zhang X, Ren S, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1904-1916.
- Girshick R. Fast R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision. 2015: 1440-1448.
- Ren S, He K, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. Advances in Neural Information Processing Systems, 2015, 28.
- Lin T Y, Maire M, Belongie S, et al. Microsoft COCO: Common objects in context[C]//European Conference on Computer Vision. Springer, Cham, 2014: 740-755.
- Lin T Y, Dollár P, Girshick R, et al. Feature pyramid networks for object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 2117-2125.
- Cai Z, Vasconcelos N. Cascade R-CNN: Delving into high quality object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 6154-6162.
- Lu X, Li B, Yue Y, et al. Grid R-CNN[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 7363-7372.
- Redmon J, Divvala S, Girshick R, et al. You only look once: Unified, real-time object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 779-788.
- Redmon J, Farhadi A. YOLO9000: Better, faster, stronger[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 7263-7271.
- Redmon J, Farhadi A. YOLOv3: An incremental improvement[J]. arXiv preprint arXiv:1804.02767, 2018.
- Liu W, Anguelov D, Erhan D, et al. SSD: Single shot multibox detector[C]//European Conference on Computer Vision. Springer, Cham, 2016: 21-37.
- Lin T Y, Goyal P, Girshick R, et al. Focal loss for dense object detection[C]//Proceedings of the IEEE International Conference on Computer Vision. 2017: 2980-2988.
- Jocher G, Stoken A, Chaurasia A, et al. Ultralytics/yolov5: v6.0 - YOLOv5n 'Nano' models, Roboflow integration, TensorFlow export, OpenCV DNN support[Z]. Zenodo, 2021-10-12.
- Li C, Li L, Jiang H, et al. YOLOv6: A single-stage object detection framework for industrial applications[J]. arXiv preprint arXiv:2209.02976, 2022.
- Ding X, Zhang X, Ma N, et al. RepVGG: Making VGG-style ConvNets great again[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 13733-13742.
- Law H, Deng J. CornerNet: Detecting objects as paired keypoints[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 734-750.
- Duan K, Bai S, Xie L, et al. CenterNet: Keypoint triplets for object detection[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019: 6569-6578.
- Zhu C, He Y, Savvides M. Feature selective anchor-free module for single-shot object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 840-849.
- Tian Z, Shen C, Chen H, et al. FCOS: Fully convolutional one-stage object detection[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019: 9627-9636.
- Zhu C, Chen F, Shen Z, et al. Soft anchor-point object detection[C]//European Conference on Computer Vision. Springer, Cham, 2020: 91-107.
- Ge Z, Liu S, Wang F, et al. YOLOX: Exceeding YOLO series in 2021[J]. arXiv preprint arXiv:2107.08430, 2021.
This article was originally published on CSDN by Joejwu.