
Author: Chin-ping Chuang (莊謹萍)
Title: A Vision-based Real-time Conductor Gesture Tracking System (以視覺為基礎之即時指揮手勢追蹤系統)
Advisor: Lee, Chung-Mou (李忠謀)
Degree: Master
Department: Department of Computer Science and Information Engineering
Year of Publication: 2008
Academic Year of Graduation: 96
Language: English
Number of Pages: 55
Chinese Keywords: 手勢追蹤 (gesture tracking)
English Keywords: Gesture Tracking, CAMSHIFT
Document Type: Academic thesis
    With the spread of online video, webcams have become steadily better in quality and relatively inexpensive. This research proposes a conductor gesture tracking system that replaces the keyboard and mouse as the input unit: using only a webcam and a personal computer, the user performs basic conducting motions, and the system tracks the trajectory and direction changes of the gestures in real time and detects the points in time at which the musical beats occur.
    The research consists of two main stages. The first stage is target tracking, implemented with the CAMSHIFT algorithm. CAMSHIFT is a refinement of the mean shift algorithm: using the color probability distribution of the region of interest selected by the user, it iterates mean shift steps to find the peak of the probability distribution; this peak is the most likely image region, and following it yields the motion path of the object. The second stage detects beats with two methods: the k-curvature rule and vertical-component low-point detection. The k-curvature rule computes the curvature at each point of the motion path to find changes of direction, while vertical-component low-point detection finds the vertical low points of the object's motion and defines these low points as the musical beats.
    The system developed in this research lets the user select the target to be tracked (such as a baton), accurately detects its motion trajectory, and converts the changes of direction in the user's conducting motions into beat events of a music file, with an average accuracy above 86.46%.

    In recent years, interaction between humans and computers has become increasingly important. A “virtual orchestra” is a Human-Computer Interaction (HCI) application that attempts to authentically reproduce a live orchestra using synthesized and sampled instrument sounds. Compared with traditional HCIs, vision-based gestures provide a touch-free interface that is less constraining than mechanical input devices. In this research, we design a vision-based system that tracks the hand motions of a conductor through a webcam and extracts musical beats from those motions.
    The tracking algorithm is based on a robust nonparametric technique for climbing density gradients to find the mode of a probability distribution: for each frame, the mean shift procedure converges to the mode of the distribution, and the CAMSHIFT algorithm builds on it to track moving objects in a video scene. Acquiring the target's center point frame by frame yields the trajectory of the moving target (such as a baton or the conductor's hand). For each trajectory point, we compute an approximation of the k-curvature, the angle between the two motion vectors from that point to its k-th predecessor and k-th successor, and use it to locate the points where the direction of motion changes.
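    As a concrete illustration, here is a minimal sketch of such a tracking loop built on OpenCV's CamShift; the thesis does not publish its code, so the ROI selection step, the hue-only histogram, and the threshold values below are assumptions.

    import cv2
    import numpy as np

    cap = cv2.VideoCapture(0)                  # default webcam
    ok, frame = cap.read()
    # Assumed initialization: the user draws a box around the target
    # (e.g., the baton tip) in the first frame.
    x, y, w, h = cv2.selectROI("select target", frame, False)
    track_window = (x, y, w, h)

    # The hue histogram of the ROI is the color probability model; the
    # saturation/value floor below is an assumed threshold that rejects
    # dim, unsaturated pixels whose hue is unreliable.
    hsv_roi = cv2.cvtColor(frame[y:y+h, x:x+w], cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv_roi, np.array((0., 60., 32.)), np.array((180., 255., 255.)))
    roi_hist = cv2.calcHist([hsv_roi], [0], mask, [180], [0, 180])
    cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
    trajectory = []                            # one (x, y) center per frame

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        # Back-projection: per-pixel probability of belonging to the target color.
        back_proj = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
        # CamShift climbs the density gradient and adapts the search window.
        rot_rect, track_window = cv2.CamShift(back_proj, track_window, criteria)
        trajectory.append(rot_rect[0])         # center of the fitted rotated rectangle

    Each appended center feeds the beat-detection stage described next.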
    In this thesis, a system was developed for interpreting a conductor's gestures and translating these gestures into musical beats, which carry the essential timing of the music. The system requires no active sensing, no special baton, and no other constraints on the physical motion of the conductor.
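    The two beat detectors can likewise be sketched on the recorded trajectory. The window size k, the 90-degree angle threshold, the function names, and the image-coordinate convention (y grows downward, so a physical low point is a local maximum of y) are illustrative assumptions, not the thesis's tuned parameters.

    import math

    def k_curvature_beats(traj, k=5, angle_thresh=90.0):
        # k and angle_thresh are assumed values. A point is flagged when the
        # angle between the vectors to its k-th predecessor and k-th successor
        # is small, i.e., the path doubles back in a sharp direction change.
        beats = []
        for i in range(k, len(traj) - k):
            ax, ay = traj[i - k][0] - traj[i][0], traj[i - k][1] - traj[i][1]
            bx, by = traj[i + k][0] - traj[i][0], traj[i + k][1] - traj[i][1]
            na, nb = math.hypot(ax, ay), math.hypot(bx, by)
            if na == 0 or nb == 0:
                continue
            cos_angle = max(-1.0, min(1.0, (ax * bx + ay * by) / (na * nb)))
            if math.degrees(math.acos(cos_angle)) < angle_thresh:
                beats.append(i)
        return beats

    def vertical_low_beats(traj):
        # Local maxima of the y component: image y increases downward, so
        # these are the vertical low points of the conducting motion.
        ys = [p[1] for p in traj]
        return [i for i in range(1, len(ys) - 1) if ys[i - 1] < ys[i] >= ys[i + 1]]

    On an idealized down-up conducting stroke, both detectors fire at the same turning point; the 86.46% average accuracy quoted above refers to the full system, not to this sketch.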

    LIST OF TABLES
    LIST OF FIGURES
    Chapter 1 Introduction
    1.1 Overview of the Problem
    1.2 Challenges
    1.3 Objectives
    1.4 Thesis Organization
    Chapter 2 Literature Review
    2.1 Key Terms in Music Conducting
    2.1.1 Beat
    2.1.2 Tempo
    2.1.3 Dynamics (Volume)
    2.1.4 Other Musical Elements
    2.2 Reviews of Conductor Gesture Tracking Systems
    2.2.1 Instrumented Batons
    2.2.2 Vision-based Conductor Gesture Tracking Systems
    2.2.3 Summary of Conductor Gesture Tracking Systems
    2.3 Backgrounds of Object Tracking
    2.3.1 Motion Detection
    2.3.2 Motion Tracking
    Chapter 3 Vision-based Conductor Gesture Tracking
    3.1 Overview
    3.2 User-defined Target Tracking Using CAMSHIFT
    3.2.1 The CAMSHIFT Algorithm
    3.2.2 Color Space Used for ROI Probability Model
    3.2.3 Histogram Back-projection
    3.2.4 Center of Mass Calculation
    3.3 Beat Detection and Analysis
    3.3.1 K-curvature Algorithm
    3.3.2 Local-Minimum Algorithm
    Chapter 4 Experimental Results and Discussions
    4.1 Overview
    4.2 Experimental Results
    4.3 Analysis of the Experimental Results
    Chapter 5 Conclusion and Future Work
    5.1 Conclusion
    5.2 Future Work
    References

