| Field | Value |
|---|---|
| Author | 林聖傑 Lin, Sheng-Jie |
| Thesis Title | 基於深度學習之羽球動作分析系統 A Badminton Pose Analysis System Based on Deep Learning |
| Advisor | 方瓊瑤 Fang, Chiung-Yao |
| Oral Defense Committee | 方瓊瑤 Fang, Chiung-Yao; 陳世旺 Chen, Sei-Wang; 黃仲誼 Huang, Zhong-Yi; 羅安鈞 Luo, An-Chun; 吳孟倫 Wu, Meng-Luen |
| Oral Defense Date | 2024/07/12 |
| Degree | Master |
| Department | 資訊工程學系 Department of Computer Science and Information Engineering |
| Publication Year | 2024 |
| Graduation Academic Year | 112 |
| Language | Chinese |
| Pages | 48 |
| Keywords | Badminton, Badminton Motion Recognition, Badminton Motion Analysis, 3D Human Model Analysis, Data Augmentation, Computer Vision |
| Research Methods | Experimental design, comparative research, observational research, phenomenon analysis |
| DOI URL | http://doi.org/10.6345/NTNU202401359 |
| Thesis Type | Academic thesis |
At the 2020 Tokyo Olympics, Taiwan achieved excellent results in badminton, winning a gold and a silver medal. Following these victories, the number of badminton players in Taiwan has continued to rise. This study therefore proposes a deep-learning-based badminton motion analysis system: a user inputs a video of a badminton motion, and the system analyzes whether the motion is performed correctly, helping players avoid injury while saving on expensive coaching and court fees.
The badminton motion analysis system consists of three main parts: data preprocessing, a badminton motion recognition subsystem, and a 3D human model construction and analysis subsystem. Badminton is the fastest racket sport in the world, so recorded footage often suffers from motion blur; this study handles blurry images in the data preprocessing stage. The recognition subsystem uses the Frame Flexible Network architecture to learn feature maps from inputs sampled at different frequencies, and then applies the Temporal Shift Module, which shifts the feature maps of a subset of channels along the time axis to achieve temporal fusion. Finally, a recent 3D human body model is constructed, and Procrustes analysis over its 24 human keypoints is used to output the joints prone to injury.
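As a concrete illustration of the channel-shift step, here is a minimal PyTorch sketch of a TSM-style temporal shift. This is a sketch under assumptions, not the thesis's implementation: the function name, the (batch, time, channels, height, width) layout, and the 1/8 shift fraction are illustrative choices.

```python
import torch

def temporal_shift(x: torch.Tensor, shift_div: int = 8) -> torch.Tensor:
    """TSM-style shift: move a fraction of channels one step along time.

    x has shape (batch, time, channels, height, width). The first 1/shift_div
    of the channels shift toward earlier frames, the next 1/shift_div toward
    later frames, and the rest stay in place, so temporal information mixes
    across frames with zero extra parameters.
    """
    n, t, c, h, w = x.size()
    fold = c // shift_div
    out = torch.zeros_like(x)
    out[:, :-1, :fold] = x[:, 1:, :fold]                   # shift backward in time
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]   # shift forward in time
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]              # untouched channels
    return out

# Example: 2 clips, 8 frames, 64-channel feature maps of size 14x14.
feats = torch.randn(2, 8, 64, 14, 14)
print(temporal_shift(feats).shape)  # torch.Size([2, 8, 64, 14, 14])
```

The Procrustes step can likewise be sketched. The alignment below is standard similarity Procrustes (translation, uniform scale, rotation via SVD); treating the per-joint deviation from a reference pose as the injury-risk signal is an assumption made for illustration, since the abstract does not specify how the 24 keypoints are compared.

```python
import numpy as np

def procrustes_align(ref: np.ndarray, test: np.ndarray) -> np.ndarray:
    """Align `test` onto `ref`; both are (24, 3) arrays of 3D joints."""
    mu_r, mu_t = ref.mean(axis=0), test.mean(axis=0)
    r0, t0 = ref - mu_r, test - mu_t
    norm_r, norm_t = np.linalg.norm(r0), np.linalg.norm(t0)
    r0, t0 = r0 / norm_r, t0 / norm_t
    u, s, vt = np.linalg.svd(t0.T @ r0)   # SVD of the cross-covariance matrix
    if np.linalg.det(u @ vt) < 0:         # forbid reflections
        u[:, -1] *= -1
        s[-1] *= -1
    rot, scale = u @ vt, s.sum()          # optimal rotation and uniform scale
    return (t0 @ rot) * scale * norm_r + mu_r

# Hypothetical usage: flag the joints that deviate most from a reference pose.
reference = np.random.rand(24, 3)  # e.g. a correctly executed stroke
player = np.random.rand(24, 3)     # pose estimated from the user's video
error = np.linalg.norm(procrustes_align(reference, player) - reference, axis=1)
print("most deviating joints:", error.argsort()[-3:])
```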
This study also builds a badminton motion dataset named the CVIU Badminton Datasets, covering seven common badminton actions: backhand stroke, forehand stroke, right lift, left lift, low serve, high serve, and defensive motion. Experimental results show that the system reaches 91.87% Top-1 accuracy and 85.71% class accuracy on the CVIU Badminton Datasets, and further experiments confirm that each improvement proposed in this study contributes to these gains.
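For readers comparing the two reported figures, the sketch below shows one plausible way to compute them. It assumes "class accuracy" means the per-class accuracy averaged over the seven action classes (a macro average that weighs rare classes equally), which the abstract does not spell out.

```python
import numpy as np

def top1_and_class_accuracy(y_true, y_pred, n_classes=7):
    """Top-1 accuracy: fraction of clips whose predicted label is correct.
    Class accuracy: mean of the per-class accuracies (macro average)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    top1 = float((y_true == y_pred).mean())
    per_class = [float((y_pred[y_true == c] == c).mean())
                 for c in range(n_classes) if np.any(y_true == c)]
    return top1, float(np.mean(per_class))

# Toy labels for the seven badminton actions (0..6).
y_true = [0, 1, 2, 3, 4, 5, 6, 0, 1, 2]
y_pred = [0, 1, 2, 3, 4, 5, 5, 0, 2, 2]
print(top1_and_class_accuracy(y_true, y_pred))  # (0.8, ~0.786)
```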
Z. Liu, J. Ning, Y. Cao, Y. Wei, Z. Zhang, S. Lin, and H. Hu, “Video Swin Transformer,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3202-3211, 2022.
M. Kocabas, N. Athanasiou, and M. J. Black, “VIBE: Video Inference for Human Body Pose and Shape Estimation,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5252-5262, 2020.
Z. Liu, L. Wang, W. Wu, C. Qian, and T. Lu, “TAM: Temporal Adaptive Module for Video Recognition,” Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 13708-13718, 2021.
Y. Zhang, Y. Bai, C. Liu, H. Wang, S. Li, and Y. Fu, “Frame Flexible Network,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10504-10513, 2023.
A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale,” Proceedings of the International Conference on Learning Representations (ICLR), pp. 1-22, 2021.
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-Based Learning Applied to Document Recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
G. Pavlakos, V. Choutas, N. Ghorbani, T. Bolkart, A. A. A. Osman, D. Tzionas, and M. J. Black, “Expressive Body Capture: 3D Hands, Face, and Body from a Single Image,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10975-10985, 2019.
M. Loper, N. Mahmood, J. Romero, G. Pons-Moll, and M. J. Black, “SMPL: A Skinned Multi-Person Linear Model,” ACM Transactions on Graphics, vol. 34, no. 6, pp. 248:1-248:16, 2015.
S. Ji, W. Xu, M. Yang, and K. Yu, “3D Convolutional Neural Networks for Human Action Recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 221-231, 2013.
A. Wang, H. Chen, Z. Lin, J. Han, and G. Ding, “RepViT: Revisiting Mobile CNN From ViT Perspective,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 15909-15920, 2024.
X. Sun, P. Chen, L. Chen, C. Li, T. H. Li, M. Tan, and C. Gan, “Masked Motion Encoding for Self-Supervised Video Representation Learning,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2235-2245, 2023.
Y. Li, B. Ji, X. Shi, J. Zhang, B. Kang, and L. Wang, “TEA: Temporal Excitation and Aggregation for Action Recognition,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 909-918, 2020.
R. Wang, D. Chen, Z. Wu, Y. Chen, X. Dai, M. Liu, Y.-G. Jiang, L. Zhou, and L. Yuan, “BEVT: BERT Pretraining of Video Transformers,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14733-14743, 2022.
Z. Liu, R. Feng, H. Chen, S. Wu, B. Yang, S. Ji, and X. Wang, “Deep Dual Consecutive Network for Human Pose Estimation,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 525-534, 2021.
X. Ma, J. Su, C. Wang, W. Zhu, and Y. Wang, “3D Human Mesh Estimation from Virtual Markers,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 534-543, 2023.
Z. Liu, R. Feng, H. Chen, S. Wu, Y. Gao, Y. Gao, and X. Wang, “Temporal Feature Alignment and Mutual Information Maximization for Video-Based Human Pose Estimation,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11006-11016, 2022.
Z. Cao, T. Simon, S.-E. Wei, and Y. Sheikh, “Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7291-7299, 2017.
K. Li, Y. Wang, Y. He, Y. Li, Y. Wang, Y. Liu, Z. Wang, J. Xu, G. Chen, P. Luo, L. Wang, and Y. Qiao, “MVBench: A Comprehensive Multi-modal Video Understanding Benchmark,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 22195-22206, 2024.
Y. Zhao, W. Lv, S. Xu, J. Wei, G. Wang, Q. Dang, Y. Liu, and J. Chen, “DETRs Beat YOLOs on Real-time Object Detection,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 16965-16974, 2024.
T. Cheng, L. Song, Y. Ge, W. Liu, X. Wang, and Y. Shan, “YOLO-World: Real-Time Open-Vocabulary Object Detection,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 16901-16911, 2024.
W. Li, M. Liu, H. Liu, P. Wang, J. Cai, and N. Sebe, “Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 604-613, 2024.