Author: | Li, Cheng-Lin (李政霖) |
---|---|
Thesis title: | The Impact of Multi-Connection Blocks Based on Deep Learning for Object Detection (基於深度學習之多連接模塊對於物件偵測的影響) |
Advisor: | Su, Chung-Yen (蘇崇彥) |
Oral defense committee: | Lai, Ying-Hui (賴穎暉); Wu, Shuen-De (吳順德); Su, Chung-Yen (蘇崇彥) |
Oral defense date: | 2022/07/12 |
Degree: | Master |
Department: | Department of Electrical Engineering |
Publication year: | 2022 |
Academic year of graduation: | 110 (ROC calendar; 2021-2022) |
Language: | Chinese |
Number of pages: | 60 |
Keywords (Chinese): | 深度學習, 物件偵測, YOLOv5, 多連接模塊, 殘差模塊 |
Keywords (English): | Deep learning, Object detection, YOLOv5, Multi-Connection block, Residual block |
Research method: | Comparative study |
DOI URL: | http://doi.org/10.6345/NTNU202200855 |
Thesis type: | Academic thesis |
In this thesis, we propose a way of deepening the network model that differs from the one used in YOLOv5, and we design three types of Multi-Connection (MC) blocks, each suited to a specific dataset. The main purpose of the MC block is to reuse features and to retain the input features so that they are passed on to later layers. We improve the residual block in YOLOv5. We validate the method on eight public datasets and one customized dataset. The experimental results show that, compared with YOLOv5s6, YOLOv5s6 with MC type I improves the mean average precision (mAP) by 1.6% on the Global Wheat Head Dataset 2020, YOLOv5s6 with MC type II improves mAP by 2.9% on the PlantDoc dataset, and YOLOv5s6 with MC type III improves mAP by 2.9% on the PASCAL Visual Object Classes (VOC) dataset. We also compare against the conventional way of deepening the model (the double residual block). In general, deepening the network improves its learning ability, but we argue that adopting a different strategy for each dataset can yield higher accuracy.
In addition, we design MC block type IV for traffic sign detection. MC type IV-1 stacks residual blocks to increase the network depth and thereby strengthen the network's learning ability, adds Squeeze-and-Excitation (SE) blocks to enhance the feature-map information, and encourages feature reuse through an additional skip connection. MC type IV-2 halves the channels of type IV-1 to reduce the model's computation and parameter count. MC type IV-3 adds a 3-by-3 convolution to type IV-2 to improve the model's learning ability. We train the models on the TT100K traffic sign dataset, and we also collect Taiwanese traffic signs as a customized dataset to validate the method. The goal is an efficient, high-performance block, which led to the design of type IV-3. On TT100K, type IV-3 performs best: compared with YOLOv5s6, computation increases by only 11% while mAP improves by 3.2%, a small cost in computation for a noticeable gain in accuracy. We also validate the method on other public datasets, where type IV-3 is likewise cost-effective.
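The abstract names the ingredients of MC type IV-1 (stacked residual blocks, an SE block, and an additional skip connection) but does not give their exact wiring. As a minimal sketch under that caveat, the pure-Python toy below shows one plausible arrangement on a tiny feature map. The function names, the order of operations, the weight shapes, and the `transform` stand-in for the convolution layers are all assumptions made for illustration, not the thesis's implementation.

```python
import math


def relu(v):
    return [max(0.0, a) for a in v]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def matvec(w, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def add(a, b):
    """Element-wise sum of two feature maps (lists of channels)."""
    return [[x + y for x, y in zip(ca, cb)] for ca, cb in zip(a, b)]


def se_block(fmap, w1, w2):
    """Squeeze-and-Excitation: rescale each channel by a gate in (0, 1)."""
    # Squeeze: global average pooling gives one scalar per channel.
    z = [sum(ch) / len(ch) for ch in fmap]
    # Excite: FC -> ReLU -> FC -> sigmoid gives one gate per channel.
    gates = sigmoid(matvec(w2, relu(matvec(w1, z))))
    return [[g * a for a in ch] for g, ch in zip(gates, fmap)]


def residual_unit(fmap, transform):
    """Residual connection: output = input + transform(input)."""
    return add(fmap, transform(fmap))


def mc_block_sketch(fmap, transform, w1, w2, depth=2):
    """Hypothetical wiring: stacked residual units, then SE, then an
    extra skip connection from the block input (assumed, not from the thesis)."""
    out = fmap
    for _ in range(depth):
        out = residual_unit(out, transform)
    out = se_block(out, w1, w2)
    return add(out, fmap)  # extra skip: the block input is reused at the output


if __name__ == "__main__":
    # Two channels with two spatial positions each (made-up values).
    x = [[1.0, 3.0], [2.0, 2.0]]
    halve = lambda f: [[0.5 * a for a in ch] for ch in f]  # stand-in for conv layers
    w1 = [[0.5, 0.5]]      # 2 channels -> 1 hidden unit (channel reduction)
    w2 = [[1.0], [-1.0]]   # 1 hidden unit -> 2 channel gates
    print(mc_block_sketch(x, halve, w1, w2))
```

The extra skip connection means the block input always reaches the output unchanged, whatever the SE gates do, which is one way to read the abstract's stated aim of retaining input features for later layers.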