
Student: Ru-Song Yang (楊儒松)
Title: Content-Based Lecture Video Analysis and Classification Based on Audio and Visual Cues (基於視覺和聽覺的教學影片內容分析與分類)
Advisor: Chung-Mou Lee (李忠謀)
Degree: Master
Department: Department of Computer Science and Information Engineering
Year of Publication: 2012
Graduation Academic Year: 100 (2011–2012)
Language: Chinese
Pages: 39
Keywords: lecture video analysis, speech emotion recognition, gesture recognition
Thesis Type: Academic thesis
Most classrooms today still use blackboards, and lecture videos recorded in front of a blackboard are correspondingly common. Such videos, however, are challenging for multimedia semantic analysis and have rarely been studied. This thesis proposes a method based on visual and audio cues for blackboard lecture videos: it analyzes the lecturer's body gestures and speech content in order to indicate how much attention students should devote to each period of a video. In the visual analysis, the lecturer's postures during teaching are examined and the meaning of each posture is identified. In the audio analysis, a speech emotion recognition model classifies the lecturer's speech into five emotion categories (happy, angry, bored, sad, and normal), and changes in the lecturer's speech emotion are used to infer the teaching state.

Combining the visual and audio results, we estimate the importance of each period of the lecture, which also reflects its semantic intensity. Learners can then devote an appropriate amount of attention to each period according to its estimated importance, and thus learn from lecture videos more efficiently.
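As a concrete illustration of the fusion step, the sketch below (Python; a minimal sketch under our own assumptions, not the thesis' implementation) shows how per-segment posture and speech-emotion labels could be combined into a segment importance score. The emotion-to-score and posture-to-score mappings, the posture label set, and the equal fusion weights are all illustrative assumptions; only the five emotion categories come from the thesis.

import numpy as np

# Illustrative mappings; the thesis does not publish these values.
# The five emotion labels match the categories used in the thesis.
EMOTION_SCORE = {"happy": 0.8, "angry": 0.9, "bored": 0.2, "sad": 0.4, "normal": 0.5}

# Hypothetical posture labels: writing on the board is assumed to signal
# important content, idling less so.
POSTURE_SCORE = {"writing": 0.9, "pointing": 0.8, "facing_class": 0.6, "idle": 0.2}

def segment_importance(emotions, postures, w_audio=0.5, w_visual=0.5):
    """Fuse per-segment audio (emotion) and visual (posture) labels
    into one importance score in [0, 1] per video segment."""
    audio = np.array([EMOTION_SCORE[e] for e in emotions])
    visual = np.array([POSTURE_SCORE[p] for p in postures])
    return w_audio * audio + w_visual * visual

# One label per fixed-length segment (e.g., 30 s of video).
emotions = ["normal", "happy", "bored", "angry", "normal"]
postures = ["idle", "writing", "idle", "pointing", "facing_class"]
for i, score in enumerate(segment_importance(emotions, postures)):
    print(f"segment {i}: importance {score:.2f}")

A viewer application could then, for example, highlight the periods whose score exceeds a chosen threshold so that learners know where to concentrate.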

Contents I
List of Figures IV
List of Tables V
Chapter 1 Introduction 1
  1.1 Motivation 1
  1.2 Objectives 2
  1.3 Scope 2
    1.3.1 Video content 3
    1.3.2 Audio content 4
  1.4 Organization of the thesis 4
Chapter 2 Related Work 6
  2.1 Research on speech emotion 6
  2.2 Analysis of blackboard lecture videos 7
Chapter 3 Methodology 8
  3.1 Research goals 8
    3.1.1 Speech emotion 8
    3.1.2 Lecturer teaching behavior 9
  3.2 System architecture 9
  3.3 Lecturer behavior analysis 11
    3.3.1 Lecturer body extraction 11
    3.3.3 Centroid computation 13
    3.3.4 Lecturer posture classification 13
  3.4 Lecturer speech emotion analysis 15
    3.4.1 Speech preprocessing 16
    3.4.2 Basic acoustic feature extraction 19
    3.4.3 Lecturer teaching-state recognition 23
  3.5 Combined framework 24
Chapter 4 Experimental Results 26
  4.1 Lecturer posture recognition performance 26
    4.1.1 Experiment design 26
    4.1.2 Results and analysis 26
  4.2 Speech emotion recognition 27
    4.2.1 Experiment design 27
    4.2.2 Experimental results 28
  4.3 Lecturer speech-state analysis 29
    4.3.1 Experiment design 29
    4.3.2 Experimental results 29
  4.4 User (teacher and student) evaluation of system results 31
    4.4.1 Experiment design 31
    4.4.2 Performance evaluation 32
Chapter 5 Conclusion 34
  5.1 Conclusion 34
  5.2 Contributions 34
  5.3 Future work 35
References 37

[1] Y. Li, S. Narayanan, and C.-C. J. Kuo, "Content-based movie analysis and indexing based on audio-visual cues," IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, no. 8, Aug. 2004.

[2] C. K. Mohan and B. Yegnanarayana, "Classification of sport videos using edge-based features and autoassociative neural network models," Signal, Image and Video Processing, vol. 4, no. 1, pp. 61–73.

[3] W. B. Cannon, "Again the James-Lange theory of emotion: a critical examination and an alternative theory," American Journal of Psychology, vol. 39, pp. 106–124, 1931.

[4] R. R. Cornelius, "Theoretical approaches to emotion," in Proc. ISCA Workshop on Speech and Emotion, 2000.

[5] R. W. Picard, "Toward machine emotional intelligence: analysis of affective physiological state," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 10, Oct. 2001.

[6] B. Schuller, G. Rigoll, and M. Lang, "Hidden Markov model-based speech emotion recognition," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Hong Kong, China, 2003, vol. 2, pp. 1–4.

[7] D. Ververidis, C. Kotropoulos, and I. Pitas, "Automatic emotional speech classification," in Proc. IEEE ICASSP, Montreal, Quebec, Canada, 2004, vol. 1, pp. 593–596.

[8] B. Schuller, G. Rigoll, and M. Lang, "Speech emotion recognition combining acoustic features and linguistic information in a hybrid support vector machine – belief network architecture," in Proc. IEEE ICASSP, Montreal, Quebec, Canada, 2004, vol. 1, pp. 577–580.

[9] X. H. Le, G. Quénot, and E. Castelli, "Recognizing emotions for audio-visual document indexing," in Proc. 9th IEEE Symposium on Computers and Communications, Alexandria, Egypt, 2004, vol. 2, pp. 580–584.

[10] O.-W. Kwon, K. Chan, J. Hao, and T.-W. Lee, "Emotion recognition by speech signals," in Proc. Eurospeech, Geneva, Switzerland, 2003.

[11] D. Ververidis and C. Kotropoulos, "Emotional speech recognition: resources, features, and methods," Speech Communication, vol. 48, no. 9, pp. 1162–1181, 2006.

[12] Y. Chen and W. J. Heng, "Automatic synchronization of speech transcript and slides in presentation," in Proc. IEEE Int. Symp. Circuits and Systems, 2003, vol. 2, pp. 568–571.

[13] F. Wang, C. W. Ngo, and T. C. Pong, "Synchronization of lecture videos and electronic slides by video text analysis," in Proc. ACM Multimedia, 2003, pp. 315–318.

[14] T. Liu, R. Hjelsvold, and J. R. Kender, "Analysis and enhancement of videos of electronic slide presentations," in Proc. IEEE International Conference on Multimedia and Expo, 2002, vol. 1, pp. 77–80.

[15] C. W. Ngo, F. Wang, and T. C. Pong, "Structuring lecture videos for distance learning applications," in Proc. IEEE Int. Symp. Multimedia Software Engineering, 2003, pp. 215–222.

[16] L. He, Z. Liu, and Z. Zhang, "Why take notes? Use the whiteboard capture system," in Proc. IEEE ICASSP, 2003, pp. 776–779.

[17] L. He and Z. Zhang, "Real-time whiteboard capture and processing using a video camera for teleconferencing," in Proc. ICASSP, 2005, pp. 1113–1116.

[18] M. Wienecke, G. A. Fink, and G. Sagerer, "Toward automatic video-based whiteboard reading," International Journal on Document Analysis and Recognition, vol. 7, no. 2–3, pp. 188–200, 2005.

[19] Z. Zhang and L. He, "Notetaking with a camera: whiteboard scanning and image enhancement," in Proc. ICASSP, 2004, vol. 3, pp. 533–536.

[20] C.-C. Chang and C.-J. Lin, LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

[21] S. Ammouri and G.-A. Bilodeau, "Face and hands detection and tracking applied to the monitoring of medication intake," in Proc. Canadian Conference on Computer and Robot Vision, Canada, May 2008, pp. 147–154.

[22] 王小川, Speech Signal Processing (語音訊號處理), Feb. 2009 (in Chinese).

[23] S. Fukuda and V. Kostov, "Extracting emotion from voice," in Proc. IEEE International Conference on Systems, Man, and Cybernetics, 1999.

[24] T. Giannakopoulos, A. Pikrakis, and S. Theodoridis, "A dimensional approach to emotion recognition of speech from movies," in Proc. IEEE ICASSP, 2009.
