| Graduate student | 吳沁穎 Wu, Chin-Ying |
|---|---|
| Thesis title | 資訊擷取與知識注入技術於機器閱讀理解之研究 (A Study on Information Extraction and Knowledge Injection for Machine Reading Comprehension) |
| Advisor | 陳柏琳 Chen, Berlin |
| Oral examination committee | 洪志偉 Hung, Jeih-Weih; 陳冠宇 Chen, Kuan-Yu; 曾厚強 Tseng, Hou-Chiang; 陳柏琳 Chen, Berlin |
| Oral defense date | 2022/07/21 |
| Degree | 碩士 (Master) |
| Department | 資訊工程學系 Department of Computer Science and Information Engineering |
| Year of publication | 2022 |
| Academic year | 110 |
| Language | Chinese |
| Pages | 61 |
| Chinese keywords | 機器閱讀理解、自然語言處理、知識圖譜、深度學習 |
| English keywords | Machine Reading Comprehension, Natural Language Processing, Knowledge Graph, Deep Learning |
| Research method | Experimental design |
| DOI URL | http://doi.org/10.6345/NTNU202201233 |
| Thesis type | Academic thesis |
In recent years, pre-trained contextualized language modeling (PCLM) approaches have made inroads into diverse machine reading comprehension (MRC) and conversational MRC (CMRC) tasks with good promise. Despite their success, relatively few efforts have been devoted to integrating either open-domain or in-domain knowledge into MRC and CMRC. In view of this, this thesis proposes an effective modeling method for MRC and CMRC with two distinctive characteristics. First, an information extraction (IE) preprocessing step clusters each paragraph of interest into a pseudo-class, which provides augmented information for the PCLM and in turn enhances downstream MRC and CMRC performance. Second, we explore a novel knowledge injection (KI) method that infuses both open-domain and in-domain knowledge into the PCLM so as to better capture the interrelationship between a posed question and a paragraph of interest. An extensive set of empirical experiments carried out on several MRC and CMRC benchmark datasets demonstrates the effectiveness and practical feasibility of the proposed approach in comparison with several top-of-the-line methods.
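To illustrate the first component, the sketch below groups paragraphs into pseudo-classes by embedding similarity. It is a minimal toy sketch, not the thesis implementation: `bow_vectors` is a stand-in bag-of-words encoder (the thesis derives representations from a PCLM), and the greedy `threshold` assignment in `pseudo_classes` stands in for a proper clustering algorithm such as DBSCAN; all names here are illustrative assumptions.

```python
import math
from collections import Counter

def bow_vectors(paragraphs):
    """Toy bag-of-words embeddings (a stand-in for a real PCLM encoder)."""
    vocab = sorted({w for p in paragraphs for w in p.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vecs = []
    for p in paragraphs:
        v = [0.0] * len(vocab)
        for w, c in Counter(p.lower().split()).items():
            v[index[w]] = float(c)
        vecs.append(v)
    return vecs

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

def pseudo_classes(paragraphs, threshold=0.5):
    """Greedily assign each paragraph to the first pseudo-class whose
    representative is similar enough; otherwise start a new class."""
    vecs = bow_vectors(paragraphs)
    reps, labels = [], []
    for v in vecs:
        for cid, rep in enumerate(reps):
            if cosine(v, rep) >= threshold:
                labels.append(cid)
                break
        else:
            reps.append(v)
            labels.append(len(reps) - 1)
    return labels

# Two lexically similar paragraphs share a pseudo-class; the third does not.
print(pseudo_classes([
    "the cat sat on the mat",
    "the cat lay on the mat",
    "stock prices rose sharply",
]))  # → [0, 0, 1]
```

The resulting pseudo-class label is the kind of auxiliary signal that can be fed to the PCLM alongside the paragraph as augmented information.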
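The knowledge-injection idea can likewise be caricatured as input augmentation: retrieve facts for entities mentioned in the text and splice them into the model input. This is a hypothetical sketch under simplifying assumptions; `inject_knowledge`, the `[KG]` separator, and the dictionary-style knowledge graph are all illustrative, and K-BERT-style methods additionally constrain the attention mask so injected triples do not distort the original sentence.

```python
def inject_knowledge(text, kg, max_triples=2):
    """Append knowledge-graph triples for entities that appear verbatim
    in the text (a toy form of knowledge injection)."""
    extras = []
    for entity, facts in kg.items():
        if entity in text:
            extras.extend(f"{entity} {rel} {obj}" for rel, obj in facts[:max_triples])
    if not extras:
        return text  # nothing matched: leave the input unchanged
    # Splice the retrieved facts into the model input after a separator.
    return text + " [KG] " + " ; ".join(extras)

# A tiny hand-built knowledge graph used only for illustration.
kg = {"Paris": [("capital_of", "France")]}
print(inject_knowledge("Where is Paris located?", kg))
# → Where is Paris located? [KG] Paris capital_of France
```

In practice the augmented string (or its structured equivalent) would be tokenized and passed to the PCLM together with the question and paragraph.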