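Graduate student: | Tsai, Jen-Kai (蔡仁凱) |
---|---|
Thesis title: | Deep Learning Based Real-Time Multiple-Person Action Recognition System (以深度學習為基礎之多人即時動作辨識系統) |
Advisors: | Hsu, Chen-Chien (許陳鑑); Wang, Wei-Yen (王偉彥) |
Degree: | Master |
Department: | Department of Electrical Engineering (電機工程學系) |
Year of publication: | 2020 |
Academic year of graduation: | 108 (ROC calendar; 2019-2020) |
Language: | Chinese |
Number of pages: | 82 |
Keywords (Chinese): | 動作辨識 (action recognition), 深度學習 (deep learning), 人物追蹤 (human tracking), 智慧型監控 (smart surveillance), 三維卷積 (3D convolution), 人臉辨識 (face recognition) |
Keywords (English): | action recognition, deep learning, face recognition, human tracking, smart surveillance, 3D convolution |
DOI URL: | http://doi.org/10.6345/NTNU202001187 |
Thesis type: | Academic thesis |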
[1] L. Xia, C. Chen, and J. Aggarwal, “View invariant human action recognition using histograms of 3D joints,” IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Providence, RI, Jun. 2012, pp. 20-27.
[2] J. Liu, A. Shahroudy, D. Xu, and G. Wang, “Spatio-temporal lstm with trust gates for 3d human action recognition,” in Proc. European Conference on Computer Vision, Springer, Cham, Sep. 2016, pp. 816-833.
[3] Z. Cao, T. Simon, S.-E. Wei, and Y. Sheikh, “Realtime multi-person 2d pose estimation using part affinity fields,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, Jul. 2017, pp. 7291-7299.
[4] H.-S. Fang, S. Xie, Y.-W. Tai, and C. Lu, “RMPE: Regional multi-person pose estimation,” in Proc. IEEE International Conference on Computer Vision (ICCV), Venice, Italy, Oct. 2017, pp. 2334-2343.
[5] J. Donahue, L. Anne Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell, “Long-term recurrent convolutional networks for visual recognition and description,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, Jun. 2015, pp. 2625-2634.
[6] S. Ji, W. Xu, M. Yang, and K. Yu, “3D convolutional neural networks for human action recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 221-231, Jan. 2013.
[7] P.-J. Hwang, W.-Y. Wang, and C.-C. Hsu, “Development of a mimic robot-learning from demonstration incorporating object detection and multiaction recognition,” IEEE Consumer Electronics Magazine, vol. 9, no. 3, pp. 79-87, May 2020.
[8] K. Simonyan and A. Zisserman, “Two-stream convolutional networks for action recognition in videos,” in Proc. Conference and Workshop on Neural Information Processing Systems, 2014, pp. 568-576.
[9] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, “Learning Spatiotemporal Features with 3d Convolutional Networks,” in Proc. of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, Dec. 2015, pp. 4489-4497.
[10] J. Carreira and A. Zisserman, “Quo vadis, action recognition? A new model and the Kinetics dataset,” in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, Jul. 2017, pp. 6299-6308.
[11] C. Feichtenhofer, A. Pinz, and A. Zisserman, “Convolutional two-stream network fusion for video action recognition,” in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, Nevada, Jun. 2016, pp. 1933-1941.
[12] T. Rose, J. Fiscus, P. Over, J. Garofolo, and M. Michel, “The trecvid 2008 event detection evaluation,” Workshop on Application of Computer Vision (WACV), Snowbird, Utah, Dec. 2009, pp. 1-8.
[13] C. Schuldt, I. Laptev, and B. Caputo, “Recognizing human actions: a local SVM approach,” in Proc. of the International Conference on Pattern Recognition (ICPR), Cambridge, UK, Aug. 2004, pp. 32-36.
[14] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, Jun. 2015, pp. 1-9.
[15] K. Soomro, A. R. Zamir, and M. Shah, “UCF101: A dataset of 101 human actions classes from videos in the wild,” arXiv preprint arXiv:1212.0402, 2012.
[16] J. Carreira, E. Noland, C. Hillier, and A. Zisserman, “A short note on the kinetics-700 human action dataset,” arXiv preprint arXiv:1907.06987, 2019.
[17] Z. Cao, T. Simon, S.-E. Wei, and Y. Sheikh, “Realtime multi-person 2d pose estimation using part affinity fields,” in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, Jul. 2017, pp. 7291-7299.
[18] Z. Cao, G. Hidalgo, T. Simon, S.-E. Wei, and Y. Sheikh, “OpenPose: realtime multi-person 2D pose estimation using Part Affinity Fields,” arXiv preprint arXiv:1812.08008, 2018.
[19] H. Joo, T. Simon, X. Li, H. Liu, L. Tan, L. Gui, S. Banerjee, T. S. Godisart, B. Nabbe, I. Matthews, T. Kanade, S. Nobuhara, and Y. Sheikh, “Panoptic studio: A massively multiview system for social interaction capture,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 1, pp. 190-204, Jan. 2019.
[20] H.-S. Fang, S. Xie, Y.-W. Tai, and C. Lu, “RMPE: Regional Multi-Person Pose Estimation,” in Proc. of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, Oct. 2017, pp. 2334-2343.
[21] J. Li, C. Wang, H. Zhu, Y. Mao, H.-S. Fang, and C. Lu, “CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark,” in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, Jun. 2019, pp. 10863-10872.
[22] Y. Xiu, J. Li, H. Wang, Y. Fang, and C. Lu, “Pose Flow: Efficient Online Pose Tracking,” British Machine Vision Conference (BMVC), Newcastle, UK, Sep. 2018.
[23] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735-1780, Nov. 1997.
[24] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, Ohio, Jun. 2014, pp. 580-587.
[25] R. Girshick, “Fast R-CNN,” in Proc. of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, Dec. 2015, pp. 1440-1448.
[26] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: towards real-time object detection with region proposal networks,” Advances in Neural Information Processing Systems (NIPS), Montreal, Quebec, Dec. 2015, pp. 91-99.
[27] J. Redmon and A. Farhadi, “Yolov3: An incremental improvement,” arXiv preprint arXiv:1804.02767, 2018.
[28] N. Wojke, A. Bewley, and D. Paulus, “Simple online and realtime tracking with a deep association metric,” in IEEE International Conference on Image Processing (ICIP), Beijing, China, Sep. 2017, pp. 3645-3649.
[29] F. Schroff, D. Kalenichenko, and J. Philbin, “FaceNet: A unified embedding for face recognition and clustering,” in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, Massachusetts, Jun. 2015, pp. 815-823.
[30] Y.-T. Wu, Y.-H. Chien, W.-Y. Wang, and C.-C. Hsu, “A YOLO-based method on the segmentation and recognition of Chinese words,” in Proc. of the International Conference on System Science and Engineering (ICSSE), New Taipei City, Taiwan, Jun. 2018.
[31] A. Bewley, Z. Ge, L. Ott, F. Ramos, and B. Upcroft, “Simple online and realtime tracking,” in IEEE International Conference on Image Processing (ICIP), Phoenix, Arizona, Sep. 2016, pp. 3464-3468.
[32] Z. Shou, D. Wang, and S.-F. Chang, “Temporal action localization in untrimmed videos via multi-stage CNNs,” in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, Nevada, Jun. 2016, pp. 1049-1058.
[33] P.-J. Hwang, C.-C. Hsu, W.-Y. Wang, and H.-H. Chiang, “Robot learning from demonstration based on action and object recognition,” in IEEE International Conference on Consumer Electronics (ICCE), Taoyuan, Taiwan, Jan. 2020.
[34] Z. Shou, D. Wang, and S.-F. Chang, “Temporal action localization in untrimmed videos via multi-stage CNNs,” in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, Nevada, Jun. 2016, pp. 1049-1058.
[35] A. Shahroudy, J. Liu, T.-T. Ng, and G. Wang, “NTU RGB+D: A large scale dataset for 3d human activity analysis,” in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, Nevada, Jun. 2016, pp. 1010-1019.
[36] A. Graves, A. Mohamed, and G. Hinton, “Speech recognition with deep recurrent neural networks,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vancouver, Canada, May 2013, pp. 6645-6649.
[37] Z. Wang, L. Zheng, Y. Liu, and S. Wang, “Towards real-time multi-object tracking,” arXiv preprint arXiv:1909.12605, Sep. 2019.