| Field | Value |
|---|---|
| Graduate student | 鄧宇珊 (Deng, Yu-Shan) |
| Thesis title | 演講者姿勢的偵測、辨識與追蹤 (Speaker Pose Detection, Recognition, and Tracking) |
| Advisors | 陳世旺 (Chen, Sei-Wang); 方瓊瑤 (Fang, Chiung-Yao) |
| Degree | Master |
| Department | Department of Computer Science and Information Engineering (資訊工程學系) |
| Year of publication | 2016 |
| Academic year | 104 |
| Language | Chinese |
| Pages | 56 |
| Keywords (Chinese) | 姿勢辨識, 隨機森林, 混合高斯模型 |
| Keywords (English) | pose recognition, random forests, Gaussian mixture models (GMM) |
| DOI | https://doi.org/10.6345/NTNU202203909 |
| Document type | Academic thesis |

This thesis presents a recognition system for speaker poses. Its original purpose is to supply speaker pose information to the automatic lecture recording system currently under development in our laboratory; combining this pose information with other sources of information enables the speaker-cameraman subsystem of the recording system to frame and steer the camera automatically. The input data for this study are speaker depth images provided by the KINECT sensor of the speaker-cameraman subsystem. Such images have the advantage of being unaffected by lighting conditions, which is important in lecture environments; they are also insensitive to the color and texture of the subject's clothing, and they avoid interference caused by differences in body size.

The recognition system is based on per-pixel depth comparison image features and uses the random forest technique to classify the pixels in the human region of the input image into body parts. Once the body parts have been assigned, a minimum bounding box technique frames each part and yields its center point; a pose is then represented by the coordinates of these body-part centers. The detected pose is matched against a set of prebuilt Gaussian mixture models (GMMs), one per pose class; the class whose model matches best is taken as the recognized pose. For subsequently arriving depth images, the pose is tracked continuously using a particle filter.

Two poses are recognized in this study: a raised hand and a bent arm. The random forest is trained on 400 human depth images and their corresponding body-part label images. Depth videos of speakers were recorded in an actual lecture hall to evaluate the system; experimental analysis shows a pose recognition rate of about 90%.
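The per-pixel depth comparison feature underlying the classification step can be sketched as follows. This is an illustrative sketch, not the thesis's actual implementation: the offset pair, the out-of-bounds constant, and the synthetic depth map are all assumptions. The key idea is that the pixel offsets are divided by the depth at the probed pixel, making the feature approximately invariant to the speaker's distance from the sensor.

```python
import numpy as np

def depth_comparison_feature(depth, x, y, u, v, large=1e6):
    """Depth comparison feature for pixel (x, y).

    Offsets u and v (in pixel units at 1 m) are normalized by the
    depth at (x, y) so the feature is roughly depth-invariant.
    Probes that fall outside the image return a large constant.
    Illustrative sketch; details are assumptions, not the thesis's code.
    """
    d = depth[y, x]

    def probe(off):
        px = x + int(round(off[0] / d))
        py = y + int(round(off[1] / d))
        if 0 <= py < depth.shape[0] and 0 <= px < depth.shape[1]:
            return depth[py, px]
        return large

    return probe(u) - probe(v)

# Synthetic scene: a 2 m-deep background with a nearer "arm" region.
depth = np.full((120, 160), 2.0)
depth[40:80, 60:100] = 1.2

# Probe 25 px to the right (still on the arm) vs 25 px to the left
# (background): the feature is negative because the arm is nearer.
f = depth_comparison_feature(depth, 70, 60, u=(30.0, 0.0), v=(-30.0, 0.0))
```

Each internal node of a tree in the forest thresholds one such feature; the leaves store body-part label distributions.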
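Training such a per-pixel body-part classifier could look roughly like the following sketch, which uses scikit-learn's `RandomForestClassifier` on synthetic feature vectors instead of the thesis's 400 labeled depth images; the feature dimensionality, labels, and forest size are placeholder assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-ins for per-pixel depth comparison features:
# each row is one pixel's feature vector, each label a body-part id.
n_pixels, n_features = 2000, 20
X = rng.normal(size=(n_pixels, n_features))
# Fake two "body parts" separable by the first feature.
y = (X[:, 0] > 0).astype(int)

forest = RandomForestClassifier(n_estimators=10, max_depth=8, random_state=0)
forest.fit(X, y)
acc = forest.score(X, y)  # training accuracy on the synthetic data
```

In the real system, every pixel of the segmented speaker region would be pushed through the forest, and connected pixels sharing a label would form one body part.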
In this thesis, a technique for identifying the poses of a lecturer is presented. The identified poses, together with other sources of information, will automatically direct a PTZ camera to capture appropriate videos of the lecturer. The videos are to be used in an automatic lecture recording system currently under development in our laboratory. The input data to our system are depth images provided by a KINECT sensor. For each input image, the pixels of the lecturer are first segmented into body parts. This is achieved using a random forest based on the depth comparison image features of the pixels. The centers of the body parts are next determined using a minimum bounding box technique, and a pose of the lecturer is described in terms of these centers. The detected pose is recognized by matching it against a set of prebuilt Gaussian mixture model (GMM) pose models. Once a pose is recognized, it is tracked over the subsequent video sequence using a hybrid approach of motion tracking and particle filtering. A large number of real depth image sequences were examined, and the experimental results demonstrate the feasibility and reliability of the proposed pose recognition system.
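The GMM matching step can be sketched as: fit one Gaussian mixture per pose class on vectors of body-part center coordinates, then assign a detected pose to the class whose mixture gives the highest log-likelihood. The data below are synthetic, and the pose vectors, component counts, and class names are assumptions (the two names merely echo the thesis's raised-hand and bent-arm poses).

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)

# Synthetic pose vectors: concatenated (x, y) centers of two body parts,
# e.g. (torso_x, torso_y, hand_x, hand_y). Hypothetical layout.
raised = rng.normal(loc=[0.0, 1.0, 0.5, 1.8], scale=0.05, size=(100, 4))
bent   = rng.normal(loc=[0.0, 1.0, 0.4, 1.2], scale=0.05, size=(100, 4))

models = {
    "raised": GaussianMixture(n_components=2, random_state=0).fit(raised),
    "bent":   GaussianMixture(n_components=2, random_state=0).fit(bent),
}

def recognize(pose_vec):
    # score() returns the per-sample log-likelihood; the best model wins.
    return max(models, key=lambda k: models[k].score(pose_vec[None, :]))

label = recognize(np.array([0.0, 1.0, 0.5, 1.8]))
```

The recognized class then seeds the tracker, which propagates pose hypotheses through the following frames and reweights them against each new depth image.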