Graduate student: | Shih, Kai-Wun (施凱文) |
---|---|
Thesis title: | A Study on Representation Learning Techniques for Extractive Spoken Document Summarization (表示法學習技術於節錄式語音文件摘要之研究) |
Advisor: | Chen, Berlin (陳柏琳) |
Degree: | Master |
Department: | Department of Computer Science and Information Engineering |
Year of publication: | 2015 |
Academic year of graduation: | 103 (ROC calendar) |
Language: | Chinese |
Number of pages: | 84 |
Chinese keywords: | 語音文件, 節錄式摘要, 詞表示法, 語句表示法, 韻律特徵 |
English keywords: | spoken documents, extractive summarization, word representation, sentence representation, prosodic features |
Thesis type: | Academic thesis |
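In daily life today, the rapid growth of multimedia content has made automatic spoken document summarization an important research topic. The most widely studied form is extractive spoken document summarization, which selects a set of important sentences from a spoken document according to a predefined summarization ratio so that they represent the gist or topic of the original document. Meanwhile, representation learning has recently become a very active research topic, and many studies have shown that it performs well on a range of natural language processing (NLP) tasks. In view of this, this thesis investigates the use of word representations and sentence representations for extractive spoken document summarization. Building on these representations, it proposes three novel and effective ranking models. Beyond the textual information in a document, the thesis further incorporates acoustic features of the spoken document, such as prosodic features, in the hope of obtaining better summarization performance. The summarization experiments use the Public Television Service broadcast news corpus (MATBN); the results show that, compared with other existing summarization methods, the proposed methods provide significant performance improvements.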
The rapidly growing volume of multimedia-associated spoken documents on the Internet has made automatic spoken document summarization an important research subject. Thus far, most existing work has focused on extractive spoken document summarization, which selects salient sentences from an original spoken document according to a target summarization ratio and concatenates them into a concise summary that conveys the most important theme of the document. Meanwhile, there has been a surge of interest in developing representation learning techniques for a wide variety of natural language processing (NLP) tasks. To our knowledge, however, these techniques remain largely unexplored in the context of extractive spoken document summarization. Against this background, this thesis explores the use of both word and sentence representation techniques for extractive spoken document summarization and proposes three variants of sentence ranking models built on top of such representations. Furthermore, in addition to lexical features, extra information cues such as prosodic features extracted from the spoken documents are employed to boost summarization performance. A series of experiments conducted on the MATBN broadcast news corpus demonstrates the performance merits of the proposed summarization methods relative to several state-of-the-art baselines.
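The general idea of ranking sentences with learned representations and keeping a fixed fraction of them can be illustrated with a minimal sketch. It is not one of the three ranking models proposed in the thesis: it assumes a sentence is represented by the average of its word vectors and scored by cosine similarity to the whole document, and the function names, the `embeddings` dictionary, and the choice to measure the summarization ratio in sentences are all hypothetical.

```python
import math


def sentence_vector(tokens, embeddings, dim):
    """Average the word vectors of the tokens that have an embedding."""
    total = [0.0] * dim
    count = 0
    for tok in tokens:
        vec = embeddings.get(tok)
        if vec is not None:
            total = [t + v for t, v in zip(total, vec)]
            count += 1
    return [t / count for t in total] if count else total


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def extractive_summary(sentences, embeddings, dim, ratio):
    """Return the indices of the selected sentences in document order.

    sentences: list of token lists (e.g. from a speech transcript)
    ratio: fraction of sentences to keep (an assumed definition of the
           summarization ratio, chosen here for simplicity)
    """
    doc_tokens = [tok for sent in sentences for tok in sent]
    doc_vec = sentence_vector(doc_tokens, embeddings, dim)
    scores = [cosine(sentence_vector(s, embeddings, dim), doc_vec)
              for s in sentences]
    k = max(1, math.ceil(ratio * len(sentences)))
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])
```

In practice the word vectors would come from a trained model and the sentences from speech recognition output; prosodic cues, which the thesis also uses, are left out of this sketch.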