
Graduate student: 黃世龍 (Huang, Shih-Long)
Thesis title: Modified Faster R-CNN with Applications to Cat and Dog Image Detection
Advisor: 樂美亨 (Yueh, Mei-Heng)
Oral defense committee: 樂美亨 (Yueh, Mei-Heng), 郭岳承 (Kuo, Yueh-Cheng), 黃聰明 (Huang, Tsung-Ming)
Oral defense date: 2024/07/22
Degree: Master
Department: Department of Mathematics
Year of publication: 2024
Academic year of graduation: 112 (ROC calendar)
Language: English
Number of pages: 58
Keywords (Chinese): 深度學習, 物件辨識, Faster R-CNN
Keywords (English): Deep learning, Object detection, Faster R-CNN
Research methods: Experimental design, comparative research, observational research
DOI URL: http://doi.org/10.6345/NTNU202401217
Thesis type: Academic thesis
  • With the rapid development of deep learning, neural networks have steadily improved in both the scope and the performance of object detection applications. This thesis builds on the Faster R-CNN framework, modifies its parameters and convolutional neural network, and applies it to detecting cats and dogs in images from a Kaggle dataset. By observing changes in performance and using statistical resampling to assess the dataset's effect on model precision and recall, the thesis shows how resampling and parameter adjustments affect precision and recall. After tuning to the best parameters found, the thesis demonstrates the effectiveness of the ResNet-based Faster R-CNN model in object feature extraction and bounding box regression, and compares the accuracy of one-stage and two-stage object detection. Experimental results indicate that ResNet, used as the feature extraction convolutional neural network in the Faster R-CNN model, performs very well on this dataset, and that the two-stage object detection model achieves better accuracy on this dataset.
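
    As an illustration of the precision and recall criteria mentioned in the abstract, the following is a minimal sketch of how these two quantities are commonly computed for object detection. It is not taken from the thesis: the function names, the (x1, y1, x2, y2) box format, the greedy one-to-one matching, and the IoU threshold of 0.5 are all assumptions made for this example.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predictions, ground_truths, iou_threshold=0.5):
    """Precision and recall for lists of (label, box) pairs.

    Assumption: each prediction is matched greedily to the unmatched
    ground-truth box of the same label with the highest IoU, and counts
    as a true positive only if that IoU reaches iou_threshold.
    """
    matched = set()  # indices of ground-truth boxes already claimed
    tp = 0
    for label, box in predictions:
        best_j, best_iou = None, iou_threshold
        for j, (gt_label, gt_box) in enumerate(ground_truths):
            if j in matched or gt_label != label:
                continue
            overlap = iou(box, gt_box)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    fp = len(predictions) - tp    # predictions with no matching object
    fn = len(ground_truths) - tp  # objects that were never detected
    precision = tp / (tp + fp) if predictions else 0.0
    recall = tp / (tp + fn) if ground_truths else 0.0
    return precision, recall
```

    Under these assumptions, precision is the fraction of predicted boxes that match a real object, and recall is the fraction of real objects that were found. Evaluations in practice often also sort predictions by confidence score before matching, which this sketch omits.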

    Acknowledgements
    Abstract (Chinese)
    Abstract
    Table of Contents
    List of Tables
    List of Figures
    Section 1 Introduction
    Section 2 Literature
      2.1 Object Detection
      2.2 Object Proposal
      2.3 Research Model
      2.4 Convolutional Neural Networks (CNN)
        2.4.1 Convolution Layer
        2.4.2 Pooling Layer
        2.4.3 Visual Geometry Group (VGG)
        2.4.4 Residual Network (ResNet)
      2.5 Region Proposal Network (RPN)
        2.5.1 Anchor
        2.5.2 Non-Maximum Suppression (NMS)
        2.5.3 Loss Function
      2.6 Region of Interest Pooling (ROI Pooling)
    Section 3 Modified Method
      3.1 Data Processing
      3.2 Neural Network of Faster R-CNN
        3.2.1 Operations in Convolutional Blocks
        3.2.2 Tensor Transformation
      3.3 Input Handling
    Section 4 Numerical Experiment
      4.1 Experimental Design
        4.1.1 Development Tools and Experimental Environment
      4.2 Dataset Introduction
        4.2.1 Experimental Evaluation Criteria
      4.3 Experimental Results
        4.3.1 Resampling Method
        4.3.2 Convergence of Model
        4.3.3 Increase the Batch Size
        4.3.4 Change the Basenet
        4.3.5 Compare One-Stage with Two-Stage Object Detection
    Section 5 Conclusion
      5.1 Conclusion
      5.2 Future Research
    References
    Appendixes


    Full text not authorized for public release