Author: 璩瑄 (Shuan Chu)
Title: 適用於陪伴型機器人之視覺式人體動作辨識系統 (A Vision-based Human Action Recognition System for Companion Robots)
Advisor: 方瓊瑤 (Fang, Chiung-Yao)
Degree: Master
Department: Department of Computer Science and Information Engineering (資訊工程學系)
Year of publication: 2016
Graduation academic year: 104 (2015–2016)
Language: Chinese
Pages: 105
Keywords: human action recognition, Kinect 2.0 for Xbox One, depth image, human contour, Extreme Learning Machines
DOI URL: https://doi.org/10.6345/NTNU202203645
Document type: Academic thesis
With advances in medical technology and the prevalence of dual-income families, the working-age adults who support their households are often too busy to care for and accompany the elderly and children at home. If companion robots can assist with this care and companionship, they can both relieve the burden on working adults and improve the sense of security and quality of life of children and the elderly. A companion robot mainly assists children and the elderly in daily life: it cares for and accompanies them, understands their behavior and state, and responds appropriately, thereby providing interaction, companionship, and care. This study therefore develops a vision-based human action recognition system for companion robots that automatically recognizes the actions of the person being accompanied, supporting companionship, care, and observation.
The human action features extracted by the system fall into two parts: depth-information features and human-contour features. After reading a sequence of depth images of human actions, the system first verifies the person's position. It then constructs a depth (range) histogram from the person's depth image and accumulates multiple range histograms to form the depth-information feature. In parallel, it detects the person's contour, computes the distance from each contour point to the topmost contour point to obtain a relative-distance feature, and accumulates the differences of these distances over time as the contour-based action descriptor. Finally, two Extreme Learning Machines perform hierarchical action classification: the first stage classifies the depth-information features, and if the first stage does not yield a result, the second stage classifies the human-contour features.
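The depth-information feature can be pictured with the following minimal Python/NumPy sketch of one plausible reading of the description above; the bin count, maximum depth range, and normalization are illustrative assumptions, not values taken from the thesis.

```python
import numpy as np

def range_histogram(depth_roi, num_bins=32, max_depth_mm=4500):
    """Histogram of depth values inside the person's region of a depth frame.
    num_bins and max_depth_mm are assumed values for illustration only."""
    valid = depth_roi[depth_roi > 0]              # Kinect reports 0 for missing depth
    hist, _ = np.histogram(valid, bins=num_bins, range=(0, max_depth_mm))
    return hist / max(valid.size, 1)              # normalize by valid pixel count

def accumulated_range_histogram(depth_rois):
    """Accumulate per-frame range histograms over an action clip
    to form the depth-information feature vector."""
    return np.sum([range_histogram(roi) for roi in depth_rois], axis=0)
```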
Eight actions are considered in this study: walking (approaching from far to near), bowing toward the camera, shaking hands (left or right hand), bending over, reaching for an object, waving the left hand, waving the right hand, and squatting. The experimental data consist of 760 video clips, each containing a single action class, for a total of about 15,156 frames, recorded from adults aged 23 to 28. Of these, 560 clips form the training set, in which seven subjects each performed the eight actions ten times; the remaining 200 clips form the test set, in which five subjects each performed the eight actions five times. Experimental results show that the system achieves a human action recognition rate of about 85.0%, indicating that its recognition results are reasonably reliable.
Because of advances in medical technology and the prevalence of double-income families, young adults are busy with work and have little time to take care of the elderly and children. A companion robot can help young adults care for the elderly and children, reduce the pressure on young adults, and increase the family's sense of security and quality of life. The main capability of a companion robot is to assist the elderly and children in daily life: it cares for and accompanies them, understands their behavior, and makes the corresponding responses in order to provide interaction, companionship, care, and observation. Therefore, this study proposes a vision-based human action recognition system for companion robots.
The input videos of the proposed system are obtained from a Kinect 2.0 for Xbox One. Human feature extraction is divided into two parts: depth-image information feature extraction and human contour feature extraction. When the system starts, it verifies the human's position in the input images and then extracts the human features. For the depth-image information feature, the system constructs a range histogram for each frame and accumulates the range histograms; the accumulated range histogram serves as the feature. For the human contour feature, the system detects the human contour, computes the distance between each contour point and the topmost contour point, and accumulates the frame-to-frame differences of these distances.
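A rough sketch of the contour feature described above, extracting the person's contour from a binary foreground mask with OpenCV; the fixed number of sampled contour points and the simple absolute-difference accumulation are assumptions for illustration, and the OpenCV 4.x return signature of findContours is assumed.

```python
import cv2
import numpy as np

def contour_distance_signature(person_mask, num_points=64):
    """Distances from uniformly sampled contour points to the topmost contour point.
    num_points is an assumed resampling size, not a thesis value."""
    contours, _ = cv2.findContours(person_mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    contour = max(contours, key=cv2.contourArea).reshape(-1, 2)   # largest blob = person
    top = contour[np.argmin(contour[:, 1])]                       # topmost point (min y)
    idx = np.linspace(0, len(contour) - 1, num_points).astype(int)
    return np.linalg.norm(contour[idx] - top, axis=1)

def accumulated_contour_feature(person_masks):
    """Accumulate absolute frame-to-frame differences of the distance signatures."""
    sigs = [contour_distance_signature(m) for m in person_masks]
    return np.sum(np.abs(np.diff(sigs, axis=0)), axis=0)
```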
In the classification part, a two-stage hierarchy of Extreme Learning Machines is used. The first stage classifies the depth-image information features, and the second stage classifies the human contour features. If the first stage does not produce an action classification, the second-stage classification is performed.
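The two-stage decision logic might look like the following sketch, built on a from-scratch single-hidden-layer ELM (random hidden weights, least-squares output weights); the score-margin rejection rule for "no result from the first stage" is an assumption, since the criterion is not specified here.

```python
import numpy as np

class ELM:
    """Single-hidden-layer Extreme Learning Machine: random hidden-layer
    weights, output weights solved by a Moore-Penrose pseudoinverse."""
    def __init__(self, n_hidden=200, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y, n_classes):
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        T = np.eye(n_classes)[y]                       # one-hot targets
        self.beta = np.linalg.pinv(self._hidden(X)) @ T
        return self

    def scores(self, x):
        return (self._hidden(x[None, :]) @ self.beta).ravel()

def classify(depth_feat, contour_feat, elm_depth, elm_contour, reject_margin=0.2):
    """Accept the first-stage (depth) result only when its top-two score margin
    is large enough; otherwise fall back to the second-stage (contour) ELM.
    reject_margin is an assumed threshold for illustration."""
    s = elm_depth.scores(depth_feat)
    top2 = np.sort(s)[-2:]
    if top2[1] - top2[0] >= reject_margin:
        return int(np.argmax(s))
    return int(np.argmax(elm_contour.scores(contour_feat)))
```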
Eight actions are considered in this study: walking, bowing, shaking hands, bending over, reaching for an object, waving the right hand, waving the left hand, and squatting. There are 760 experimental sequences with a total of 15,156 frames, and each sequence contains only one action. The average human action recognition rate is 85.0%, showing that the proposed system is robust and efficient.