Graduate student: | 蔡育霖 Yu-Lin Tsai |
---|---|
Thesis title: | 以機率模型為基礎之生醫文件指代消解方法 Anaphora Resolution in Biomedical Literature by Probabilistic Models |
Advisor: | 侯文娟 Hou, Wen-Juan |
Degree: | Master |
Department: | Department of Computer Science and Information Engineering |
Year of publication: | 2013 |
Academic year of graduation: | 101 |
Language: | Chinese |
Number of pages: | 48 |
Chinese keywords: | 指代消解, 自然語言處理, 貝式理論, 機率模型 |
English keywords: | anaphora resolution, natural language processing, Bayes' theorem, probabilistic model |
Thesis type: | Academic thesis |
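Anaphora is a pervasive phenomenon in natural language. As technology advances, biomedical documents also require anaphora resolution so that correct information can be extracted. Resolving noun phrases that stand in anaphoric relations would greatly help biomedical researchers obtain accurate descriptions from the literature, and we further hope that this research can accelerate the development of biomedicine.
This study performs non-pronominal anaphora resolution on four biomedical documents about Alzheimer's disease provided by QA4MRE (Question Answering for Machine Reading Evaluation). Meaningful information is extracted in the following steps: (1) sentence splitting, to obtain sentence boundaries; (2) part-of-speech tagging of the documents with GDep (GENIA Dependency parser), to obtain syntactic information; (3) extraction of the head noun and premodifiers in each sentence, to gather better feature information; and (4) rule-based filtering of candidate anaphors, to obtain more accurate anaphors. Finally, feature information is extracted by means of rule sets and feature sets. The thesis uses a probabilistic model based on Bayes' theorem for anaphora resolution and applies seven features in the experiments. The results show a precision of 73.83%, a recall of 67.36%, and an F-measure of 70.36%, a good result for anaphora resolution in biomedical documents.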
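Step (3) above, head-noun and premodifier extraction, can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's actual procedure: it assumes the input is a base noun phrase already tagged with Penn Treebank part-of-speech tags, takes the last noun token as the head, and treats preceding adjectives and nouns as premodifiers. The function name `split_head_and_premodifiers` is hypothetical.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, Penn Treebank POS tag)


def split_head_and_premodifiers(np_tokens: List[Token]) -> Tuple[str, List[str]]:
    """Return (head noun, premodifiers) for a simple base noun phrase.

    Assumption: the head is the last token whose tag starts with "NN"
    (NN, NNS, NNP, NNPS); premodifiers are the adjectives (JJ*) and nouns
    that precede it. Determiners and other tags are ignored.
    """
    noun_positions = [i for i, (_, tag) in enumerate(np_tokens) if tag.startswith("NN")]
    if not noun_positions:
        return "", []  # no noun found, so there is no head to report
    head_index = noun_positions[-1]
    head = np_tokens[head_index][0].lower()
    premodifiers = [
        word.lower()
        for word, tag in np_tokens[:head_index]
        if tag.startswith(("NN", "JJ"))
    ]
    return head, premodifiers


example = [("the", "DT"), ("amyloid", "JJ"), ("precursor", "NN"), ("protein", "NN")]
head, premods = split_head_and_premodifiers(example)
```

For the example phrase, the sketch picks "protein" as the head and "amyloid" and "precursor" as premodifiers. Head and premodifier matches between an anaphor and a candidate antecedent are the kind of signal that could then feed a feature set.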
Anaphora is a common phenomenon in natural language. As technology advances, anaphora resolution must be addressed in order to extract correct information from biomedical texts. Resolving anaphoric noun phrases helps biomedical researchers obtain accurate descriptions when they consult the literature, and we hope that this study can help accelerate progress in the biomedical domain.
In this study, we apply a statistical model to resolve non-pronominal anaphora in biomedical texts. The following steps extract the relevant information: (1) sentence splitting for boundary detection; (2) part-of-speech tagging to obtain syntactic information; (3) identification of head nouns and premodifiers to gather better feature information; and (4) rule-based filtering to obtain correct anaphor candidates. Finally, rule sets and feature sets are used to extract feature information. This thesis takes a statistical approach to non-pronominal anaphora resolution and uses seven features in the experiments. The experiments achieve a precision of 73.83%, which indicates good performance for anaphora resolution in biomedical texts.
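The general idea of scoring candidate antecedents with Bayes' theorem can be illustrated with a small sketch. The abstract does not list the seven features or the exact form of the model, so everything here is an assumption: binary features, a naive conditional-independence assumption between features, Laplace smoothing, and the hypothetical class name `NaiveBayesResolver`.

```python
import math
from typing import Dict, Iterable, Sequence, Tuple


class NaiveBayesResolver:
    """Toy Bernoulli naive Bayes over (anaphor, candidate) feature vectors."""

    def __init__(self, n_features: int, alpha: float = 1.0) -> None:
        self.n_features = n_features
        self.alpha = alpha  # Laplace smoothing constant (an assumption)
        self.class_count = {True: 0, False: 0}
        self.feature_count = {True: [0] * n_features, False: [0] * n_features}

    def fit(self, pairs: Iterable[Tuple[Sequence[int], bool]]) -> None:
        # Each pair is (binary feature vector, is_coreferent label).
        for features, label in pairs:
            self.class_count[label] += 1
            for i, value in enumerate(features):
                if value:
                    self.feature_count[label][i] += 1

    def _log_joint(self, features: Sequence[int], label: bool) -> float:
        # log P(label) + sum_i log P(feature_i | label), with smoothing.
        total = sum(self.class_count.values())
        log_p = math.log((self.class_count[label] + self.alpha) / (total + 2 * self.alpha))
        for i, value in enumerate(features):
            p_on = (self.feature_count[label][i] + self.alpha) / (
                self.class_count[label] + 2 * self.alpha
            )
            log_p += math.log(p_on if value else 1.0 - p_on)
        return log_p

    def posterior(self, features: Sequence[int]) -> float:
        # P(coreferent | features) via Bayes' theorem, normalized over both classes.
        pos = self._log_joint(features, True)
        neg = self._log_joint(features, False)
        top = max(pos, neg)
        return math.exp(pos - top) / (math.exp(pos - top) + math.exp(neg - top))

    def resolve(self, candidates: Dict[str, Sequence[int]]) -> str:
        # Choose the candidate antecedent with the highest posterior.
        return max(candidates, key=lambda name: self.posterior(candidates[name]))


# Hypothetical training pairs: feature 0 = head nouns match, feature 1 = same sentence.
training = [((1, 1), True), ((1, 0), True), ((0, 0), False), ((0, 1), False)]
resolver = NaiveBayesResolver(n_features=2)
resolver.fit(training)
best = resolver.resolve({"the protein": (1, 0), "the patient": (0, 1)})
```

With this toy training data, a head-noun match raises the posterior of coreference, so the sketch selects "the protein" as the antecedent. A real system would estimate the probabilities from an annotated corpus and use the thesis's own feature set.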