| Field | Value |
|---|---|
| Author | 鍾宜修 (Chung, Yi-Hsiu) |
| Title | An Improved YOLOv7 Algorithm for Object Detection on UAV Images (針對空拍影像物件偵測之改良型YOLOv7演算法研究) |
| Advisor | 蘇崇彥 (Su, Chung-Yen) |
| Oral defense committee | 蘇崇彥 (Su, Chung-Yen), 彭昭暐 (Perng, Jau-Woei), 賴以威 (Lai, I-Wei) |
| Defense date | 2024/06/12 |
| Degree | Master |
| Department | Department of Electrical Engineering (電機工程學系) |
| Year of publication | 2024 |
| Academic year | 112 (ROC calendar) |
| Language | Chinese |
| Pages | 48 |
| Keywords | Deep learning, Object detection, YOLOv7, UAV, Small object |
| DOI | http://doi.org/10.6345/NTNU202400899 |
| Document type | Academic thesis |
In recent years, unmanned aerial vehicle (UAV) technology has advanced rapidly: flight range keeps increasing, airframes keep shrinking, and many platforms can even fly autonomously, so UAVs are now applied in an ever wider range of tasks such as traffic monitoring and industrial or environmental inspection. With the rise of artificial intelligence, UAVs also incorporate AI algorithms to help interpret the images they capture. Because objects in UAV imagery are typically small and onboard computing resources are limited, it is crucial to improve small-object detection while reducing the resources the model needs for inference.
This thesis takes YOLOv7 as the base model and improves it to strengthen small-object detection while reducing the number of parameters and the amount of computation; the improvements are validated on the VisDrone-DET2019 dataset. Five modifications are made. First, ELAN (Efficient Layer Aggregation Network) is replaced with M-ELAN (Modified Efficient Layer Aggregation Network). Second, M-FLAM (Modified Feature Layer Attention Module) is added to the high-level feature layers. Third, the feature-fusion structure is changed from PANet (Path Aggregation Network) to ResFF (Residual Feature Fusion). Fourth, the downsampling modules in the model are replaced with I-MP (Improved MaxPool) modules. Finally, SPPCSPC (Spatial Pyramid Pooling Cross Stage Partial Networks) is replaced with GSPP (Group Spatial Pyramid Pooling). Combined, these methods raise mAP (mean Average Precision) by 1% while reducing the parameter count by 24.5% and the computation, measured in GFLOPs (Giga Floating Point Operations), by 13.7%.
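The record above only names the five modules; their internal structure is not given here. As a purely illustrative sketch (not the thesis's GSPP design), the PyTorch snippet below shows the general pattern such a pooling block follows: parallel stride-1 max pools at several kernel sizes, channel concatenation, and a grouped 1x1 fusion convolution, which is one common way to trade a dense fusion layer for fewer parameters. The class name `GroupedSPPSketch`, the pool sizes (5, 9, 13), and the group count are assumptions made for the example.

```python
import torch
import torch.nn as nn


class GroupedSPPSketch(nn.Module):
    """Illustrative pooling block: parallel stride-1 max pools fused by a grouped 1x1 conv.
    This is NOT the thesis's GSPP module; it only sketches the general idea."""

    def __init__(self, in_ch: int, out_ch: int, pool_sizes=(5, 9, 13), groups: int = 4):
        super().__init__()
        # Stride-1 max pools with k//2 padding keep the feature map size unchanged,
        # as in the SPP-style blocks used throughout the YOLO family.
        self.pools = nn.ModuleList(
            nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) for k in pool_sizes
        )
        fused_ch = in_ch * (len(pool_sizes) + 1)  # pooled branches + identity branch
        # A grouped 1x1 convolution holds (fused_ch * out_ch) / groups weights,
        # i.e. `groups` times fewer than a dense 1x1 convolution in the same position.
        self.fuse = nn.Sequential(
            nn.Conv2d(fused_ch, out_ch, kernel_size=1, groups=groups, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.SiLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        branches = [x] + [pool(x) for pool in self.pools]
        return self.fuse(torch.cat(branches, dim=1))


if __name__ == "__main__":
    block = GroupedSPPSketch(in_ch=512, out_ch=512)
    y = block(torch.randn(1, 512, 20, 20))
    print(y.shape)                                     # torch.Size([1, 512, 20, 20])
    print(sum(p.numel() for p in block.parameters()))  # parameter count of this block
```

With `groups=4`, the fusion layer in this sketch carries a quarter of the weights a dense 1x1 convolution would need, which illustrates in miniature how module substitutions of this kind can lower the overall parameter count and GFLOPs figures reported in the abstract.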
C. Y. Wang, A. Bochkovskiy, H. Y. Liao, “YOLOv7: Trainable Bag-of-freebies Sets New State-of-the-art for Real-time Object Detectors.” arXiv:2207.02696, 2022.
D. W. Du, P. F. Zhu, L. Y. Wen, “VisDrone-DET2019: The Vision Meets Drone Object Detection in Image Challenge Results.” 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), pp. 213-226, 2019.
R. Girshick, J. Donahue, T. Darrell, “Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580-587, 2014.
R. Girshick, “Fast R-CNN.” Proceedings of the IEEE International Conference on Computer Vision, pp. 1440-1448, 2015.
S. Ren, K. He, R. Girshick, J. Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks.” IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1137-1149, 2017.
J. Redmon, S. Divvala, R. Girshick, A. Farhadi, “You Only Look Once: Unified, Real-time Object Detection.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 779-788, 2016.
J. Redmon, A. Farhadi, “YOLO9000: Better, Faster, Stronger.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7263-7271, 2017.
J. Redmon, A. Farhadi, “YOLOv3: An Incremental Improvement.” arXiv:1804.02767, 2018.
A. Bochkovskiy, C. Y. Wang, H. Y. Liao, “YOLOv4: Optimal Speed and Accuracy of Object Detection.” arXiv:2004.10934, 2020.
G. Jocher, A. Chaurasia, A. Stoken, J. Borovec, et al. “YOLOv5 SOTA Realtime Instance Segmentation.” https://doi.org/10.5281/zenodo.7347926, 2022.
T. Y. Lin, P. Goyal, R. Girshick, “Focal Loss for Dense Object Detection.” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 42, no. 2, pp. 318-327, 2020.
W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Y. Fu, A. C. Berg, “SSD: Single Shot MultiBox Detector.” arXiv: 1512.02325, 2016.
J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, A. W. M. Smeulders, “Selective Search for Object Recognition.” International Journal of Computer Vision, pp. 154-171, 2013.
C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, “Going Deeper with Convolutions.” arXiv:1409.4842, 2014.
G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, R. R. Salakhutdinov. “Improving Neural Networks by Preventing Co-adaptation of Feature Detectors.” arXiv:1207.0580, 2012.
M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, A. Zisserman, “The PASCAL Visual Object Classes Challenge 2007 (VOC2007).” http://www.pascal-network.org/challenges/VOC/voc2007/workshop/index.html, 2007.
S. Ioffe, C. Szegedy, “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.” arXiv:1502.03167, 2015.
K. He, X. Zhang, S. Ren, J. Sun, “Deep Residual Learning for Image Recognition.” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770-778, 2016.
T. Y. Lin, P. Dollar, R. Girshick, K. He, B. Hariharan, S. Belongie, “Feature Pyramid Networks for Object Detection.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117-2125, 2017.
D. Misra, “Mish: A Self Regularized Non-Monotonic Activation Function.” arXiv:1908.08681, 2019.
G. Ghiasi, T. Y. Lin, Q. V. Le, “DropBlock: A Regularization Method for Convolutional Networks.” In Advances in Neural Information Processing Systems (NIPS), pp. 10727-10737, 2018.
C. Y. Wang, H. Y. Liao, Y. H. Wu, P. Y. Chen, J. W. Hsieh, I. H. Yeh, “CSPNet: A New Backbone that can Enhance Learning Capability of CNN.” 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1571-1580, 2020.
S. Liu, L. Qi, H. Qin, J. Shi, J. Jia, “Path Aggregation Network for Instance Segmentation.” 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8759-8768, 2018.
K. He, X. Zhang, S. Ren, J. Sun, “Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition.” In Proceedings of the European Conference on Computer Vision (ECCV), 2014.
Z. Zheng, P. Wang, W. Liu, J. Li, R. Ye, D. Ren, “Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression.” In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2020.
C. Y. Wang, H. Y. Liao, I. H. Yeh, “Designing Network Design Strategies Through Gradient Path Analysis.” arXiv:2211.04800, 2022.
D. Hendrycks, K. Gimpel, “Gaussian Error Linear Units (GELUs).” arXiv: 1606.08415, 2016.
X. Ding, X. Zhang, N. Ma, J. Han, G. Ding, J. Sun, “RepVGG: Making VGG-style ConvNets Great Again.” 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 13728-13737, 2021.
Z. Ge, S. Liu, Z. Li, O. Yoshie, J. Sun, “OTA: Optimal Transport Assignment for Object Detection.” In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 303-312, 2021.
P. Ramachandran, B. Zoph, Q. V. Le, “Swish: A Self-Gated Activation Function.” arXiv: 1710.05941, 2017.
S. Elfwing, E. Uchibe, K. Doya, “Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning.” arXiv: 1702.03118, 2017.
G. Huang, Z. Liu, L. van der Maaten, K. Q. Weinberger, “Densely Connected Convolutional Networks.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4700-4708, 2017.
Z. Ge, S. Liu, F. Wang, Z. Li, J. Sun, “YOLOX: Exceeding YOLO Series in 2021.” arXiv: 2107.08430, 2021.
Y. H. Chung, C. Y. Su, “Object Detection Algorithm Based on Improved YOLOv7 for UAV Images.” 2023 IEEE 5th Eurasia Conference on IOT, Communication and Engineering (ECICE), pp. 18-21, 2023.
J. Hu, L. Shen, G. Sun, “Squeeze-and-Excitation Networks.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132-7141, 2018.
S. Woo, J. Park, J. Y. Lee, I. S. Kweon, “CBAM: Convolutional Block Attention Module.” Proceedings of the European Conference on Computer Vision (ECCV), 2018.
J. Li, Y. Gong, Z. Ma, M. Xie, “Enhancing Feature Fusion Using Attention for Small Object Detection.” 2022 IEEE 8th International Conference on Computer and Communications (ICCC), pp. 1859-1863, 2022.
M. Tan, R. Pang, Q. V. Le, “EfficientDet: Scalable and Efficient Object Detection.” 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10778-10787, 2020.
H. Lou, X. Duan, J. Guo, H. Liu, J. Gu, L. Bi, H. Chen, “DC-YOLOv8: Small-Size Object Detection Algorithm Based on Camera Sensor.” Electronics, 2023.
A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam, “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications.” arXiv: 1704.04861, 2017.
X. Luo, Y. Wu, F. Wang, “Target Detection Method of UAV Aerial Imagery Based on Improved YOLOv5.” Remote Sensing, 2022.
C. Li, L. Li, H. Jiang, K. Weng, et al. “YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications.” arXiv: 2209.02976, 2022.
Z. Gevorgyan, “SIoU Loss: More Powerful Learning for Bounding Box Regression.” arXiv: 2205.12740, 2022.
Wikipedia, “Evaluation Measures (Information Retrieval).” https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)
R. Ding, L. Dai, G. Li, H. Liu, “TDD-Net: A Tiny Defect Detection Network for Printed Circuit Boards.” CAAI Transactions on Intelligence Technology, vol. 4, 2019. doi:10.1049/trit.2019.0019.