Graduate student: | 羅郁鈞 Lo, Yu-Chun |
---|---|
Thesis title: | 基於非對稱U-Net實現微小且快速移動之物體檢測網路 TinySeeker: A Network for Seeking Tiny and Fast Moving Object Based on Asymmetric U-Net |
Advisor: | 林政宏 Ling, Cheng-Hung |
Oral defense committee: | 林政宏 Ling, Cheng-Hung; 賴穎暉 Lai, Ying-Hui; 劉一宇 Liu, Yi-Yu |
Oral defense date: | 2024/07/22 |
Degree: | Master |
Department: | Department of Electrical Engineering |
Year of publication: | 2024 |
Academic year of graduation: | 112 |
Language: | Chinese |
Number of pages: | 46 |
Chinese keywords: | 物件偵測, U-Net, 高效率架構, 熱力圖預測 |
English keywords: | Object Detection, U-Net, High Efficient Structure, Heatmap prediction |
Research method: | Experimental design |
DOI URL: | http://doi.org/10.6345/NTNU202401422 |
Thesis type: | Academic thesis |
This thesis investigates object detection for objects that are tiny, fast-moving, and visually indistinct. To refine tactics and improve their skills, both professional athletes and amateur players routinely record practice sessions and matches with mobile phones or cameras. As this practice has spread, a growing number of researchers have begun combining deep learning models with sports analysis to offer more comprehensive insights. Object detection is a pivotal task in this field, because knowing where an object is provides valuable information for purposes such as strategic analysis. However, only a limited number of studies have focused on tracking fast-moving, blurred objects such as a badminton shuttlecock. The preceding method, TrackNetv2, builds on VGG-16 and U-Net and locates the shuttlecock through a predicted heatmap, but its architecture demands substantial computational resources, which makes it hard to run efficiently in practical applications. To address this, we propose TinySeeker, an asymmetric architecture inspired by U-Net. The model detects the shuttlecock's location precisely while improving computational efficiency, balancing detection accuracy against computational cost so that it is practical for real-world use. Experimental results show that TinySeeker reduces computation by up to 26% while maintaining precision. The architecture marks a significant advance in detecting tiny, fast-moving objects and sets a new benchmark for similar future studies.
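The abstract describes heatmap-based detection but does not say how a predicted heatmap is turned into a shuttlecock position. The following is a minimal sketch of one common way to do this, not the thesis's actual procedure: take the strongest response and report no detection when it falls below a threshold. The function name `locate_peak`, the nested-list heatmap representation, and the 0.5 threshold are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def locate_peak(
    heatmap: List[List[float]], threshold: float = 0.5
) -> Optional[Tuple[int, int]]:
    """Return the (row, col) of the strongest heatmap response.

    Returns None when the heatmap is empty or its maximum is below
    `threshold` (an assumed value), i.e. no object is considered visible.
    """
    best_value = float("-inf")
    best_pos: Optional[Tuple[int, int]] = None
    for r, row in enumerate(heatmap):
        for c, value in enumerate(row):
            if value > best_value:
                best_value, best_pos = value, (r, c)
    if best_pos is None or best_value < threshold:
        return None
    return best_pos


# Hypothetical 3x3 heatmap with a clear peak in the centre.
frame_with_object = [
    [0.0, 0.1, 0.0],
    [0.2, 0.9, 0.3],
    [0.0, 0.1, 0.0],
]
# Hypothetical heatmap whose responses all stay below the threshold.
frame_without_object = [
    [0.0, 0.1],
    [0.2, 0.3],
]

print(locate_peak(frame_with_object))     # (1, 1)
print(locate_peak(frame_without_object))  # None
```

Returning `None` for weak responses matters for a shuttlecock, which is often occluded or out of frame; a real pipeline would likely tune the threshold on validation data.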