| Field | Value |
|---|---|
| Graduate student | 林筱芸 Lin, Hsiao-Yun |
| Thesis title | 探究基於異質圖和上下文語言模型之自動文件摘要技術 (A Study on Heterogeneous Graph Neural Networks and Contextualized Language Models for Text Summarization) |
| Advisor | 陳柏琳 Chen, Berlin |
| Oral examination committee | 洪志偉 Hung, Jeih-Weih; 陳冠宇 Chen, Kuan-Yu; 曾厚強 Tseng, Hou-Chiang; 陳柏琳 Chen, Berlin |
| Oral defense date | 2022/07/21 |
| Degree | Master (碩士) |
| Department | 資訊工程學系 Department of Computer Science and Information Engineering |
| Year of publication | 2022 |
| Academic year of graduation | 110 |
| Language | English |
| Number of pages | 35 |
| Keywords (Chinese) | 節錄式摘要、圖神經網路、異質圖神經網路、語言模型 (extractive summarization, graph neural networks, heterogeneous graph neural networks, language models) |
| Keywords (English) | Extractive Summarization, Graph Neural Networks, Heterogeneous Graph Neural Networks, Contextualized Language Model |
| DOI URL | http://doi.org/10.6345/NTNU202201355 |
| Thesis type | Academic thesis (學術論文) |
With the rapid growth of the Internet, tens of thousands of text messages are produced every day, and not everyone has time to read them all. We therefore need techniques that help us quickly grasp the important content of each article. Automatic text summarization arose to meet this need: it helps us extract key information quickly and accurately from a single document or from several documents. Automatic summarization methods fall into two types: extractive summarization and abstractive summarization. The former extracts the important sentences of an article to form the summary, while the latter first comprehends the article and then rewrites it into a concise summary that retains the main points. The goal of this thesis is to build an extractive summarization model that captures semantics and whose selected summary sentences contain little redundancy. We use graph-based neural networks (GNNs) to learn sentence embeddings rich in contextual semantics.
In real-world applications, graphs usually contain nodes of several different types, so we adopt a heterogeneous graph neural network (HGNN) as our base model and improve it from three directions. First, in the encoding stage, we work to incorporate additional information: for example, a language model based on bidirectional encoder representations from transformers (BERT) supplies the summarization model with richer contextual information, and sentence properties such as the relationships between sentences and the internal structure of each sentence are also taken into account at this stage. Second, in the sentence rescoring stage, we propose several rescoring methods; for instance, the position of each sentence in the document can be taken into account, after normalization to reduce bias. Finally, in the sentence selection stage, we improve the sentence selector to reduce redundancy. Experimental results show that the proposed methods all achieve good results on publicly available summarization datasets.
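One common instantiation of such a heterogeneous graph for extractive summarization connects word nodes and sentence nodes with TF-IDF-weighted edges. The sketch below builds a small graph of that kind; it is only an assumed, illustrative construction (the function `build_word_sentence_graph` is hypothetical), and the node features, edge weights, and GNN layers used in the thesis may differ.

```python
import math
from collections import Counter

def build_word_sentence_graph(sentences):
    """Build a tiny heterogeneous graph with word nodes and sentence nodes.

    Each word is connected to every sentence that contains it, and the edge
    weight is the word's TF-IDF score within that sentence. Illustrative only.
    """
    tokenized = [s.lower().split() for s in sentences]
    vocab = sorted({w for sent in tokenized for w in sent})
    n_sents = len(tokenized)

    # Document frequency of each word, counted over sentences.
    df = Counter(w for sent in tokenized for w in set(sent))

    edges = {}  # (word, sentence_index) -> TF-IDF edge weight
    for i, sent in enumerate(tokenized):
        tf = Counter(sent)
        for w, count in tf.items():
            idf = math.log(n_sents / (1 + df[w])) + 1.0
            edges[(w, i)] = (count / len(sent)) * idf
    return vocab, edges

sentences = [
    "Graph neural networks learn node representations.",
    "Heterogeneous graphs contain word and sentence nodes.",
    "Sentence nodes aggregate information from word nodes.",
]
vocab, edges = build_word_sentence_graph(sentences)
print(len(vocab), "word nodes,", len(sentences), "sentence nodes,", len(edges), "edges")
```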
The explosive growth of big data requires methods that capture key information effectively. Automatic summarization helps us capture key information quickly and accurately from single or multiple documents. In general, automatic summarization can be classified into two types: extractive summarization, which extracts existing sentences to form the summary, and abstractive summarization, which reconstructs a meaningful summary after comprehending the document. The goal of this thesis is to produce semantically informed extractive summaries with less redundancy. To achieve this goal, the thesis uses graph-based neural networks to learn contextual sentence embeddings.
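As a minimal sketch of obtaining contextual sentence embeddings, the snippet below encodes each sentence with a publicly available BERT checkpoint and takes the [CLS] vector as the sentence representation. It assumes the Hugging Face `transformers` and `torch` packages and the `bert-base-uncased` model; the actual encoder, pooling strategy, and graph-based refinement used in the thesis may differ.

```python
# Sketch: contextual sentence embeddings from a BERT encoder.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
encoder.eval()

sentences = [
    "Automatic summarization extracts key information from documents.",
    "Extractive methods select existing sentences to form the summary.",
]

with torch.no_grad():
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    outputs = encoder(**batch)
    # Use the [CLS] position of the last hidden layer as each sentence's vector.
    sentence_embeddings = outputs.last_hidden_state[:, 0, :]

print(sentence_embeddings.shape)  # (num_sentences, hidden_size), e.g. (2, 768)
```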
Because real-world graph applications usually involve multiple node types, we implement a heterogeneous graph neural network as our baseline model and explore three aspects to improve its performance. First, efforts are made to incorporate more contextual information in the encoding stage: language models based on bidirectional encoder representations from transformers provide richer contextual representations, and additional sentence properties, such as inter- and intra-sentential relationships, are also considered. Second, this thesis provides several methods for sentence rescoring; the position of each sentence within the document is taken into account and normalized to mitigate bias. Finally, the sentence selector is improved to reduce redundancy. The experimental results show that all three methods significantly improve performance.
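To make the last two stages concrete, the sketch below first blends model scores with a normalized sentence-position prior (a simple form of position-aware rescoring) and then performs greedy, MMR-style selection that penalizes similarity to already selected sentences. The weighting scheme and the parameters `position_weight` and `lambda_redundancy` are hypothetical; the thesis's actual rescoring formulas and selector are not reproduced here.

```python
import numpy as np

def rescore(scores, position_weight=0.1):
    """Blend model scores with a normalized sentence-position prior."""
    scores = np.asarray(scores, dtype=float)
    n = len(scores)
    # Earlier sentences receive a slightly higher prior, normalized to [0, 1].
    position_prior = 1.0 - np.arange(n) / max(n - 1, 1)
    return (1 - position_weight) * scores + position_weight * position_prior

def select(scores, embeddings, k=3, lambda_redundancy=0.6):
    """Greedy selection that balances relevance against redundancy."""
    emb = np.asarray(embeddings, dtype=float)
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = emb @ emb.T  # cosine similarity between sentences
    selected, candidates = [], list(range(len(scores)))
    while candidates and len(selected) < k:
        def mmr(i):
            redundancy = max(sim[i, j] for j in selected) if selected else 0.0
            return lambda_redundancy * scores[i] - (1 - lambda_redundancy) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return sorted(selected)

scores = rescore([0.2, 0.9, 0.85, 0.4])
chosen = select(scores, np.random.rand(4, 768), k=2)
print(chosen)
```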