Graduate student: |
朱惠銘 Huei-Ming Chu |
---|---|
Thesis title: |
研究使用詞彙與語意資訊於 Investigating the Use of Lexical and Semantic Information for Automatic Spoken Document Segmentation and Organization |
Advisor: |
陳柏琳 Chen, Berlin |
Degree: |
Master |
Department: |
資訊工程學系 Department of Computer Science and Information Engineering |
Year of publication: | 2005 |
Academic year of graduation: | 93 (ROC calendar) |
Language: | Chinese |
Number of pages: | 101 |
Chinese keywords: | 語音文件切割、語音文件組織、自我組織圖、主題混合模型圖示 |
English keywords: | Spoken Document Segmentation, Spoken Document Organization, Self-Organization Map, Topic Mixture Model Map |
Thesis type: | Academic thesis |
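Spoken document segmentation refers to automatically marking the boundaries between different topics in a long audio signal, so that a spoken document can be divided into topically cohesive segments. Spoken document organization, in turn, refers to analyzing the topic to which each segmented passage belongs, clustering the passages into topic clusters, labeling the clusters, and presenting them in a hierarchical visualization that is easy for users to browse. Both tasks have received increasing attention in recent years.
This thesis first investigates how the Hidden Markov Model (HMM), a model already widely used in speech recognition and information retrieval, can be extended to spoken document segmentation. We use not only the lexical information in the spoken documents, such as statistical features and language model probabilities, but also acoustic information, such as pause distributions and recognition confidence, to identify segment boundaries. We also integrate the semantic information in the spoken documents into the HMM segmenter to model the state observation distributions more accurately. In addition, we study two unsupervised, data-driven organization methods for the analysis of spoken news documents: the Self-Organizing Map (SOM) and the Probabilistic Latent Semantic Analysis Map (ProbMap). We propose the Topic Mixture Model Map (TMMmap), which views the latent topics from a different perspective, to improve on ProbMap. A series of experiments on the Topic Detection and Tracking (TDT) Chinese spoken document collections analyzes the performance of these methods and the differences among them. Finally, we further integrate topic distribution information, namely the topological distribution information obtained from spoken document organization, into the HMM segmenter. Preliminary results show a very good effect and room for further improvement.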
Spoken document segmentation automatically sets the boundaries between the different topics mentioned in long streams of audio signals, dividing the spoken documents into cohesive paragraphs of sentences that share a common central topic. Spoken document organization, in turn, aims at automatically analyzing the subject topics of the segmented short paragraphs, clustering them into groups with topic labels, and organizing them into a hierarchical visual presentation that is easier for users to browse. Both have gained growing attention in the past few years.
In this thesis, we explored the use of the Hidden Markov Model (HMM) approach, which has proven effective for speech recognition and information retrieval, in the context of spoken document segmentation. We exploited not only the lexical information inherent in the spoken document, such as statistical features and language model probabilities, but also acoustic information, such as the pause distribution and the confidence measure, to identify segment boundaries. Moreover, the semantic information conveyed in the spoken document was integrated into the HMM segmenter to model the state observation distributions more accurately. We also investigated two unsupervised, data-driven organization approaches for spoken document analysis: the Self-Organizing Map (SOM) and the Probabilistic Latent Semantic Analysis Map (ProbMap). As an alternative to ProbMap, we studied a topical mixture model approach (TMMmap), which views the latent topics from a different perspective. A series of experiments was conducted on the Topic Detection and Tracking (TDT) spoken document collections to analyze the performance of these approaches and compare the differences between them. Finally, we attempted to incorporate the topic distributions and the topological constraints obtained from spoken document organization into the HMM segmenter. Initial results were very promising.
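To make the idea of an HMM segmenter concrete, the following is a minimal sketch and not the thesis's actual system. It assumes that each HMM state is a topic with an add-alpha smoothed unigram language model, that out-of-vocabulary words are skipped, and that topic changes are discouraged by a single tunable log-domain penalty rather than a trained transition matrix. The acoustic and semantic features described above are omitted, and the names `train_unigram`, `segment`, and `switch_penalty` are hypothetical.

```python
import math
from collections import Counter


def train_unigram(sentences, vocab, alpha=1.0):
    """Add-alpha smoothed unigram log-probabilities for one topic (assumed model)."""
    counts = Counter(w for s in sentences for w in s)
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: math.log((counts[w] + alpha) / total) for w in vocab}


def segment(sentences, topic_models, switch_penalty=math.log(0.1)):
    """Viterbi decoding over topic states.

    Returns one topic label per sentence and the indices where the label
    changes (the hypothesized segment boundaries). `switch_penalty` is an
    unnormalized log-domain cost added whenever the topic changes.
    """
    topics = list(topic_models)

    def emit(topic, sent):
        # Log-likelihood of the sentence under the topic's unigram model;
        # words outside the vocabulary are skipped (an assumption).
        model = topic_models[topic]
        return sum(model[w] for w in sent if w in model)

    def trans(prev, cur):
        return 0.0 if prev == cur else switch_penalty

    score = {t: emit(t, sentences[0]) for t in topics}
    backpointers = []
    for sent in sentences[1:]:
        new_score, pointers = {}, {}
        for t in topics:
            best_prev = max(topics, key=lambda p: score[p] + trans(p, t))
            new_score[t] = score[best_prev] + trans(best_prev, t) + emit(t, sent)
            pointers[t] = best_prev
        score = new_score
        backpointers.append(pointers)

    # Trace back the best state sequence from the final sentence.
    path = [max(topics, key=lambda t: score[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()

    boundaries = [i for i in range(1, len(path)) if path[i] != path[i - 1]]
    return path, boundaries
```

Calling `segment` on a list of tokenized sentences with a dictionary of per-topic models returns the decoded topic sequence and the boundary positions. A more negative `switch_penalty` yields fewer, longer segments.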