Graduate student: 曾永權 (Tseng, Yung-Chuan)
Thesis title: 以深度學習為基礎之視覺式行人危及行車安全程度評估系統 (A Vision-Based Pedestrian-Endanger-Driving-Safety Evaluation System Using Deep Learning Techniques)
Advisor: 方瓊瑤 (Fang, Chiung-Yao)
Oral defense committee: 方瓊瑤, 陳世旺, 許之凡, 黃仲誼, 羅安鈞
Oral defense date: 2021/07/23
Degree: Master
Department: 資訊工程學系 (Department of Computer Science and Information Engineering)
Year of publication: 2021
Academic year of graduation: 109 (AY 2020–2021)
Language: Chinese
Pages: 62
Keywords (Chinese): 行人偵測, 神經網路, 深度學習, 駕駛輔助系統
Keywords (English): Pedestrian detection, Neural network, Deep learning, Advanced Driver Assistance System
DOI: http://doi.org/10.6345/NTNU202101015
Document type: Academic thesis
Advances in transportation have made travel ever more convenient, but they have also brought many traffic accidents. According to the statistical yearbook of Taiwan's Ministry of the Interior [4], traffic accidents injure or kill hundreds of thousands of people in Taiwan every year, and vulnerable pedestrians on the road suffer the greatest harm. This study therefore proposes a vision-based pedestrian-endanger-driving-safety evaluation system based on deep learning. Taking dashboard-camera video as input, the system serves as an active driver-assistance aid: it assesses the degree to which pedestrians endanger driving safety and warns the driver in advance to watch for them, with the aim of reducing traffic accidents.
This study first analyzes and defines the degree to which a pedestrian endangers driving safety. Fourteen situations are distinguished according to five conditions: the pedestrian's distance from the camera, the pedestrian's position in the image, the direction the pedestrian faces, whether the pedestrian is moving, and whether the pedestrian is backlit. These fourteen situations are then mapped onto four categories: safe, low danger, medium danger, and high danger. The system uses the YOLOv4 neural-network model as its backbone, tests several YOLOv4 configurations, and improves the results through pre- and post-processing; a sketch of the rule-based mapping follows.
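The full fourteen-rule table appears in the thesis body rather than in this abstract, so the following Python sketch is illustrative only: the condition names, thresholds, and rule order are assumptions, not the thesis's actual rules.

```python
from dataclasses import dataclass

@dataclass
class PedestrianConditions:
    """The five conditions named in the abstract (field names are assumed)."""
    distance_m: float        # distance between pedestrian and dashboard camera
    in_driving_path: bool    # pedestrian's position in the image
    facing_vehicle: bool     # direction the pedestrian faces
    moving: bool             # whether the pedestrian is moving
    backlit: bool            # whether the pedestrian is backlit

def danger_category(c: PedestrianConditions) -> str:
    """Map the five conditions to one of the four categories.
    Thresholds and priorities here are hypothetical, for illustration only."""
    if c.in_driving_path and c.moving and c.distance_m < 5.0:
        return "high"
    if c.in_driving_path and (c.backlit or not c.facing_vehicle):
        return "medium"
    if c.moving and c.distance_m < 15.0:
        return "low"
    return "safe"
```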
The study ultimately proposes three pipelines: Single YOLOv4, Two-stage Training YOLOv4, and Parallel YOLOv4. Single YOLOv4 is trained directly on the danger categories, and a post-processing step deletes excessively overlapping prediction boxes at inference time. Two-stage Training YOLOv4 first trains on the "person" class in the images and then uses those weights to learn the danger categories; inference uses the second-stage weights and likewise deletes excessively overlapping boxes. Parallel YOLOv4 trains and tests two YOLOv4 networks, one on "person" and one on the danger categories; at inference time, each network's excessively overlapping boxes are deleted and the two sets of predictions are merged. A sketch of the overlap-removal step follows.
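The abstract does not specify the overlap-removal rule, so this is a minimal sketch in the spirit of greedy non-maximum suppression, assuming confidence-ordered boxes and a hypothetical IoU threshold:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def remove_overlapping(detections, iou_threshold=0.7):
    """Greedily keep the higher-confidence box whenever two boxes overlap
    too much. detections: list of (box, score, label); the threshold value
    is an assumption, not taken from the thesis."""
    detections = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    for box, score, label in detections:
        if all(iou(box, k[0]) < iou_threshold for k in kept):
            kept.append((box, score, label))
    return kept
```

Under this reading, Parallel YOLOv4 would run `remove_overlapping` on each network's detections separately before concatenating the two surviving lists.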
All videos in the test dataset were filmed by the author in the Zhonghe and Yonghe districts of New Taipei City, and the dataset is named the Pedestrian-Endanger-Driving-Safety Evaluation Dataset. The developed system outputs a video in which each pedestrian carries a prediction box with the predicted danger level displayed above it; each box is drawn in one of four colors, one per danger category, so that the levels are easy to distinguish. With the F1-measure as the accuracy metric, the system achieves 71.2%.
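The F1-measure is the harmonic mean of precision and recall; a minimal computation, with hypothetical counts chosen only to land near the reported score, looks like this:

```python
def f1_measure(tp: int, fp: int, fn: int) -> float:
    """F1 = 2 * precision * recall / (precision + recall)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts, not the thesis's actual confusion matrix:
print(f1_measure(tp=712, fp=300, fn=276))  # ~0.712
```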
[Boc20] A. Bochkovskiy, C. Wang, and H. Liao, “YOLOv4: Optimal Speed and Accuracy of Object Detection,” arXiv:2004.10934 [cs.CV], 2020.
[Red17] J. Redmon and A. Farhadi, “YOLO9000: Better, Faster, Stronger,” Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, pp. 6517-6525, 2017.
[Red18] J. Redmon and A. Farhadi, “YOLOv3: An Incremental Improvement,” arXiv:1804.02767 [cs.CV], 2018.
[Red16] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You Only Look Once: Unified, Real-Time Object Detection,” Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, pp. 779-788, 2016.
[Wan20] C. Wang, H. Liao, Y. Wu, P. Chen, J. Hsieh, and I. Yeh, “CSPNet: A New Backbone that can Enhance Learning Capability of CNN,” Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, pp. 1571-1580, 2020.
[Ghi18] G. Ghiasi, T. Lin, and Q. Le, “DropBlock: A Regularization Method for Convolutional Networks,” Proceedings of Advances in Neural Information Processing Systems (NIPS), pp. 10727-10737, 2018.
[He15] K. He, X. Zhang, S. Ren, and J. Sun, “Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 9, pp. 1904-1916, 2015.
[Tia15] Y. Tian, P. Luo, X. Wang, and X. Tang, “Deep Learning Strong Parts for Pedestrian Detection,” Proceedings of 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, pp. 1904-1912, 2015.
[He16] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, pp. 770-778, 2016.
[Rez19] H. Rezatofighi, N. Tsoi, J. Gwak, A. Sadeghian, I. Reid, and S. Savarese, “Generalized Intersection Over Union: A Metric and a Loss for Bounding Box Regression,” Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, pp. 658-666, 2019.
[Zhe19] Z. Zheng, P. Wang, W. Liu, J. Li, R. Ye, and D. Ren, “Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression,” arXiv:1911.08287 [cs.CV], 2019.
[Han20] B. Han, Y. Wang, Z. Yang, and X. Gao, “Small-Scale Pedestrian Detection Based on Deep Neural Network,” IEEE Transactions on Intelligent Transportation Systems, vol. 21, no. 7, pp. 3046-3055, 2020.
[Gir14] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation,” Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, pp. 580-587, 2014.
[Ren17] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137-1149, 2017.
[Gir15] R. Girshick, “Fast R-CNN,” Proceedings of 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, pp. 1440-1448, 2015.
[Kon16] T. Kong, A. Yao, Y. Chen, and F. Sun, “HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection,” Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, pp. 845-853, 2016.
[Bra17] G. Brazil, X. Yin, and X. Liu, “Illuminating Pedestrians via Simultaneous Detection and Segmentation,” Proceedings of 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, pp. 4960-4969, 2017.
[Liu16] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Fu, and A. Berg, “SSD: Single Shot MultiBox Detector,” Proceedings of 2016 European Conference on Computer Vision (ECCV), Amsterdam, Netherlands, pp. 21-37, 2016.
[Zha18] S. Zhang, L. Wen, X. Bian, Z. Lei, and S. Li, “Single-Shot Refinement Neural Network for Object Detection,” Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, pp. 4203-4212, 2018.
[Iof15] S. Ioffe and C. Szegedy, “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift,” Proceedings of The 32nd International Conference on Machine Learning (PMLR), pp. 448-456, 2015.
[Liu18] S. Liu, L. Qi, H. Qin, J. Shi, and J. Jia, “Path Aggregation Network for Instance Segmentation,” Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, pp. 8759-8768, 2018.
[1] Ministry of Transportation and Communications statistics database, number of registered motor vehicles: https://stat.motc.gov.tw/mocdb/stmain.jsp?sys=100&funid=b3301, 2021.
[2] Directorate-General of Budget, Accounting and Statistics, Executive Yuan, national statistics bulletin for 2019: https://www.stat.gov.tw/public/Data/0317144756L3EO3FL6.pdf, 2020.
[3] Directorate-General of Budget, Accounting and Statistics, Executive Yuan, national statistics bulletin for 2020: https://www.dgbas.gov.tw/public/Data/010261811314J9NIE4Q.pdf, 2020.
[4] Ministry of the Interior statistical yearbook: https://www.moi.gov.tw/files/site_stuff/321/2/year/year.html, 2020.
[5] YOLOv3 reference architecture diagram: https://pic2.zhimg.com/v2-af7f12ef17655870f1c65b17878525f1_r.jpg
[6] YOLOv4 reference architecture diagram: https://zhuanlan.zhihu.com/p/143747206
[7] Caltech Pedestrian Detection Benchmark: http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/
[8] The KITTI Vision Benchmark Suite: http://www.cvlibs.net/datasets/kitti/