Graduate student: Chen, Yi-Ting (陳怡婷)
Thesis title: Chinese Speech Information Summarization-Improved Models and Features (中文語音資訊摘要-模型與特徵之改進)
Advisor: Chen, Berlin (陳柏琳)
Degree: Master
Department: Department of Computer Science and Information Engineering
Year of publication: 2007
Academic year of graduation: 95 (ROC calendar)
Language: Chinese
Number of pages: 158
Chinese keywords: 語音文件、摘錄式摘要、隱藏式馬可夫模型、關聯性模型、詞層次主題混合模型
English keywords: spoken documents, extractive summarization, hidden Markov model, relevance model, word topical mixture model
Thesis type: Academic thesis
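  • Multimedia content containing audio and video keeps growing in volume and pervades the Internet and our daily lives, so processing and organizing it systematically and automatically has become an important problem. Speech is one of the most semantically rich components of multimedia content and can usually be used to represent the topics and concepts of a multimedia file. In recent years, many researchers have worked on the organization and understanding of multimedia content and have reported fruitful results, for example in spoken document transcription, retrieval, and summarization.
    Document summarization can be divided into extractive and non-extractive (abstractive) summarization. Extractive summarization selects important sentences, paragraphs, or sections from the original document according to a given summarization ratio to form the summary. Non-extractive summarization generates the summary content directly from the topics or concepts of the document. Because non-extractive summarization remains quite difficult, most current research on automatic spoken document summarization focuses on extractive summarization. This thesis investigates extractive summarization methods for Chinese broadcast news spoken documents. We propose a probabilistic generative framework that tightly couples the sentence generative model with the sentence prior probability for selecting important sentences in extractive summarization. Each sentence in the document to be summarized is treated as a probabilistic generative model that is used to predict the probability of generating the document. We propose two probabilistic generative models: a combination of the hidden Markov model (HMM) and the relevance model (RM), and the word topical mixture model (wTMM). We also make preliminary use of recognition confidence scores and several prosodic features to estimate the sentence prior probability. Experiments and analysis on a Chinese broadcast news corpus show, in preliminary results, that the proposed methods achieve better summarization results than other common methods.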

    Multimedia content, including audio and video, is growing continuously and filling networks and our daily lives. Speech is one of the most important information sources in multimedia content and usually represents its concepts and topics. In recent years, many attempts have therefore been made to understand and organize multimedia content using speech, and substantial efforts and very encouraging results have been reported on spoken document transcription, retrieval, and summarization.
    Spoken document summarization can be either extractive or abstractive. Extractive summarization selects indicative sentences, passages, or paragraphs from an original document according to a target summarization ratio and sequences them to form a summary. Abstractive summarization, on the other hand, produces a concise abstract of a certain length that reflects the key concepts of the document. The latter is more difficult to achieve, so recent research has focused on the former. In this thesis, we consider extractive summarization of Chinese broadcast news speech. We propose a unified probabilistic generative framework that combines the sentence generative probability and the sentence prior probability for sentence ranking. Each sentence of the spoken document to be summarized is treated as a probabilistic generative model for predicting the document. To this end, two alternative approaches are investigated: the hidden Markov model (HMM) integrated with the relevance model (RM), and the word topical mixture model (wTMM). In addition, the recognition confidence measure and a set of prosodic features are used to model the sentence prior probability. The summarization capabilities of the proposed approaches were verified by comparison with other conventional summarization methods. The experiments were performed on Chinese broadcast news collected in Taiwan, and the initial results were promising.
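The ranking idea in the abstract, scoring each sentence S by how well it generates the document D, weighted by a sentence prior, can be illustrated with a minimal sketch. The abstract gives no equations, so everything concrete here is an assumption: the sentence model is a plain unigram model interpolated with a background model, and the names `rank_sentences`, `background`, `prior`, and `lam` are hypothetical. It is not the thesis's HMM/RM or wTMM formulation, only an instance of ranking by log P(D|S) + log P(S).

```python
import math
from collections import Counter


def rank_sentences(sentences, background, prior=None, lam=0.5):
    """Return sentence indices ranked by log P(D|S) + log P(S), best first.

    sentences:  list of token lists; together they form the document D.
    background: Counter of word counts from a general corpus (assumed input).
    prior:      optional list of positive sentence priors P(S); uniform if omitted.
    lam:        interpolation weight between the sentence model and the
                background model (an assumed smoothing choice).
    """
    doc_counts = Counter(w for s in sentences for w in s)
    vocab = set(background) | set(doc_counts)
    bg_total = sum(background.values())
    if prior is None:
        prior = [1.0 / len(sentences)] * len(sentences)

    scored = []
    for idx, sent in enumerate(sentences):
        sent_counts = Counter(sent)
        log_score = math.log(prior[idx])
        for word, count in doc_counts.items():
            # Maximum-likelihood unigram probability of the word in the sentence.
            p_sent = sent_counts[word] / len(sent) if sent else 0.0
            # Add-one smoothing keeps the background probability non-zero.
            p_bg = (background[word] + 1) / (bg_total + len(vocab))
            log_score += count * math.log(lam * p_sent + (1 - lam) * p_bg)
        scored.append((log_score, idx))
    return [idx for _, idx in sorted(scored, reverse=True)]
```

A summary at a given ratio would then take the top-ranked indices and restore them to document order. In this sketch, features such as confidence scores or prosodic measurements would enter only through `prior`, which mirrors the split between generative probability and prior described above.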

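    1. Introduction
    1.1 Research Motivation and Objectives
    1.2 Research Content and Results
    1.3 Thesis Organization
    2. Related Work
    2.1 Historical Background of Automatic Document Summarization
    2.2 Automatic Document Summarization Methods
    2.2.1 Document Structure-based Approach
    2.2.2 Statistic-based Approach
    2.2.3 Probabilistic Generative Model-based Approach
    2.2.4 Sentence Compaction
    2.2.5 Section Summary
    2.3 Evaluation Methods for Automatic Summarization
    2.3.1 Subjective Evaluation
    2.3.2 Objective Evaluation
    2.4 Summary Presentation
    3. Experimental Setup
    3.1 Experimental Corpora
    3.1.1 Summarization Test Corpus
    3.1.2 Training Corpus
    3.1.3 NTNU Large Vocabulary Continuous Speech Recognition System
    3.2 Baseline Experiments
    3.2.1 Baseline Experiments on DataSet1
    3.2.2 Baseline Experiments on DataSet2
    4. Extractive Spoken Document Summarization Using Probabilistic Generative Summarization Models
    4.1 Probabilistic Generative Models
    4.2 Relevance Model and Hidden Markov Model
    4.2.1 Background: Relevance Model
    4.2.2 Improving Sentence Model Estimation with the Relevance Model
    4.2.3 Experimental Results of Sentence Model Adaptation with the Relevance Model
    4.2.4 Experimental Results of Training Hidden Markov Model Parameters with Relevant Documents
    4.3 Word Topical Mixture Model (wTMM)
    4.3.1 Experimental Results of the Word Topical Mixture Model
    4.4 Preliminary Application of Summarization Features to Probabilistic Generative Summarization Models
    4.4.1 Sentence Location
    4.4.2 Sentence Length
    4.4.3 Linguistic Feature: Bigram Language Model Score
    4.4.4 Confidence Score
    4.4.5 Acoustic Feature: Pitch
    4.4.6 Acoustic Feature: Energy
    4.4.7 Conclusions
    5. Conclusions and Future Work
    References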

