| 研究生 (Graduate student) | 張銘仁 Chang, Ming-Jen |
| --- | --- |
| 論文名稱 (Thesis title) | 適用於室內移動式機器人之人體動作辨識系統 A Human Action Recognition System for Indoor Mobile Robots |
| 指導教授 (Advisor) | 方瓊瑤 Fang, Chiung-Yao |
| 學位類別 (Degree) | 碩士 Master |
| 系所名稱 (Department) | 資訊工程學系 Department of Computer Science and Information Engineering |
| 論文出版年 (Year of publication) | 2019 |
| 畢業學年度 (Academic year) | 107 (ROC calendar; 2018–2019) |
| 語文別 (Language) | 中文 Chinese |
| 論文頁數 (Pages) | 73 |
| 中文關鍵詞 (Keywords, Chinese) | 彩色影像序列、光流序列、深度影像序列、3D卷積神經網路、VGG網路、深度學習、人體動作辨識、移動式攝影機、室內移動式機器人 |
| 英文關鍵詞 (Keywords, English) | Color information, Optical flow, Depth information, 3D convolutional neural network, VGG nets, Deep learning, Human action recognition, Moving camera, Indoor mobile robots |
| DOI URL | http://doi.org/10.6345/NTNU201900439 |
| 論文種類 (Thesis type) | 學術論文 Academic thesis |
本研究提出一種基於視覺、採用深度學習技術的人體動作辨識系統。當攝影機從各個方向朝人物前進時,本系統仍能成功辨識人體動作,因此本研究所提出的方法有助於建構陪伴型機器人的視覺系統。
本系統使用三種資訊進行人體動作辨識,包含彩色影像序列、光流序列及深度影像序列。首先,使用 Kinect 2.0 的深度感測器及彩色攝影機,同時捕捉彩色影像序列及深度影像序列。接著從彩色影像序列中擷取方向梯度直方圖(HOG)特徵,再使用支援向量機(SVM)分類器檢測人物區域。
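For illustration, the person-detection step (HOG features classified by an SVM) can be sketched with OpenCV as follows. This is a minimal sketch that substitutes OpenCV's pretrained HOG+SVM pedestrian detector for the detector trained in this work; the file name and detection parameters are assumptions.

```python
# Minimal sketch of the HOG + SVM person-detection step.
# Uses OpenCV's pretrained pedestrian detector as a stand-in for the
# SVM classifier trained in the thesis; "color_frame.png" and the
# detection parameters below are placeholders.
import cv2
import numpy as np

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

frame = cv2.imread("color_frame.png")          # one frame of the color video
rects, weights = hog.detectMultiScale(
    frame, winStride=(8, 8), padding=(8, 8), scale=1.05)

if len(rects) > 0:
    best = int(np.argmax(weights))             # keep the highest-scoring box
    x, y, w, h = rects[best]
    human_region = frame[y:y + h, x:x + w]     # region later used for cropping
```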
根據檢測到的人物區域對彩色影像序列進行裁剪,並使用 Farnebäck 所提出的方法,由裁剪後的彩色影像序列計算出對應的光流序列。接著透過 frame sampling 技術,將各序列取樣為相同長度,再分別輸入至三個改進後的 3D convolutional neural network(3D CNN)。3D CNN 可擷取人體動作在空間與時間上的特徵,並分別進行人體動作辨識。最後整合三種辨識結果,輸出最終的人體動作辨識結果。
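A rough sketch of the two preprocessing steps just described, dense optical flow with Farnebäck's method and frame sampling to a fixed clip length, is given below using OpenCV and NumPy. The clip length of 16 frames and the Farnebäck parameters are illustrative assumptions, not the values used in the thesis.

```python
# Sketch of the optical-flow and frame-sampling steps (assumed parameters).
import cv2
import numpy as np

def farneback_flow(frames):
    """Dense optical flow between consecutive frames of a cropped color clip."""
    flows = []
    prev = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    for f in frames[1:]:
        curr = cv2.cvtColor(f, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(
            prev, curr, None,
            pyr_scale=0.5, levels=3, winsize=15,
            iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
        flows.append(flow)        # flow has shape H x W x 2 (dx, dy per pixel)
        prev = curr
    return flows

def sample_frames(seq, target_len=16):
    """Uniformly sample indices so that every sequence has target_len frames."""
    idx = np.linspace(0, len(seq) - 1, target_len).round().astype(int)
    return [seq[i] for i in idx]
```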
本研究所提出的系統可辨識 13 種人體動作,分別為坐著喝水、站著喝水、坐著吃東西、站著吃東西、使用手機、讀書、坐下、起立、使用電腦、走路(水平)、走路(垂直)、走離對方及走向對方。本系統在攝影機移動下的人體動作辨識率達 96.4%,顯示本研究所提出之系統穩定且有效。
This study presents a vision-based human action recognition system that uses deep learning techniques. The system can recognize human actions successfully while the robot's camera moves toward the person being served from various directions. The proposed method is therefore useful for the vision systems of indoor mobile robots.
The system uses three kinds of information to recognize human actions: color videos, optical flow videos, and depth videos. First, a Kinect 2.0 captures color and depth videos simultaneously using its RGB camera and depth sensor. Second, histogram of oriented gradients (HOG) features are extracted from the color videos, and a support vector machine (SVM) classifier is used to detect the human region. Based on the detected human region, the frames of the color video are cropped, and the corresponding frames of the optical flow video are computed with Farnebäck's method. The number of frames in these videos is then unified by a frame sampling technique. After frame sampling, the three kinds of videos are fed into three modified 3D convolutional neural networks (3D CNNs), one per modality. Each modified 3D CNN extracts the spatial and temporal features of human actions and produces its own recognition result. Finally, the three recognition results are integrated to output the final recognition result.
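As a rough illustration of the three-stream recognition and integration stage, the sketch below builds one small VGG-style 3D CNN per modality with Keras and averages their per-class probabilities. The layer widths, clip length (16 frames), input resolution (112 x 112), and averaging fusion are assumptions made for the example; they are not the modified 3D CNN architecture or the integration scheme used in the thesis.

```python
# Sketch of three 3D-CNN streams (color, optical flow, depth) and
# score-level fusion; the architecture and the averaging fusion are
# illustrative assumptions, and the streams would be trained separately
# on their respective clips before fusion.
import numpy as np
from tensorflow.keras import layers, models

def build_stream(channels, num_classes=13):
    """One small VGG-style 3D-CNN stream: stacked 3x3x3 convolutions + pooling."""
    return models.Sequential([
        layers.Input(shape=(16, 112, 112, channels)),  # (frames, H, W, channels)
        layers.Conv3D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling3D(pool_size=(1, 2, 2)),
        layers.Conv3D(128, 3, padding="same", activation="relu"),
        layers.MaxPooling3D(pool_size=(2, 2, 2)),
        layers.Conv3D(256, 3, padding="same", activation="relu"),
        layers.GlobalAveragePooling3D(),
        layers.Dense(num_classes, activation="softmax"),
    ])

rgb_net   = build_stream(channels=3)   # color clip
flow_net  = build_stream(channels=2)   # optical-flow clip (dx, dy)
depth_net = build_stream(channels=1)   # depth clip

def fuse(rgb_clip, flow_clip, depth_clip):
    """Average the per-class probabilities of the three streams.

    Each clip is a NumPy array of shape (1, 16, 112, 112, C).
    """
    scores = (rgb_net.predict(rgb_clip) +
              flow_net.predict(flow_clip) +
              depth_net.predict(depth_clip)) / 3.0
    return int(np.argmax(scores, axis=1)[0])   # index of the predicted action
```

In the thesis, the three streams operate on the color, optical-flow, and depth clips respectively and their outputs are integrated into the final decision; simple probability averaging is only one possible integration scheme.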
The proposed system can recognize 13 kinds of human actions, namely drink (sit), drink (stand), eat (sit), eat (stand), read, sit down, stand up, use computer, walk (horizontal), walk (vertical), play with phone/tablet, walk apart from each other, and walk towards each other. The average recognition rate over 369 test videos of human actions was 96.4%, indicating that the proposed system is robust and effective.
[18] “Review: VGGNet — 1st Runner-Up (Image Classification), Winner (Localization) in ILSVRC 2014,” Available at: https://medium.com/coinmonks/paper-review-of-vggnet-1st-runner-up-of-ilsvlc-2014-image-classification-d02355543a11, Accessed 2018.