Graduate Student: Chen, Hung-Chi (陳弘奇)
Thesis Title: Automatic Extraction of Specified Relations from Biomedical Literatures (生醫文獻中特定關係組合之自動化擷取)
Advisor: Hou, Wen-Juan (侯文娟)
Degree: Master
Department: Department of Computer Science and Information Engineering
Year of Publication: 2018
Academic Year of Graduation: 106 (ROC calendar)
Language: Chinese
Number of Pages: 65
Keywords: Disease-Drug Association, Drug-Drug Interaction, Machine Learning, Biomedical Literature
DOI URL: http://doi.org/10.6345/THE.NTNU.DCSIE.016.2018.B02
Thesis Type: Academic thesis
  • This study aims to determine the relationships between specified nouns in natural-language sentences and applies the approach to biomedical literature so that useful relationships in the literature can be found quickly. Although the study is built on biomedical literature, researchers in other fields can use the same method to screen the literature and data they need more quickly and accurately when surveying work in their own field.
    Two data sets were used, and they were handled independently in the experiments. The first is based on the disease-drug pairings from completed studies listed on the official US ClinicalTrials.gov website (https://clinicaltrials.gov); abstracts of biomedical articles for the target disease-drug pairs were then retrieved from the PubMed database (https://www.ncbi.nlm.nih.gov/pubmed). These data fall into two classes. Sentences from PubMed abstracts stating that a disease listed on ClinicalTrials.gov can be treated by the drug are regarded as positive; sentences in which the same disease cannot be treated by the drug, or in which the disease and the drug are unrelated, are regarded as negative.
    The second data set is the corpus provided by SemEval 2013 Task 9, which consists of MEDLINE abstracts and texts from the DrugBank database. SemEval 2013 Task 9 is a shared task on extracting drug-drug interactions from biomedical literature (SemEval 2013 Task 9: Extraction of Drug-Drug Interactions from Biomedical Texts). It divides the relations between drugs into five categories: Advice, Effect, Mechanism, Int (interaction), and False (no interaction).
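    As an illustration only (this is not code from the thesis), the five categories can be written as a small Python enumeration. The names `DDILabel`, `has_interaction`, and `count_by_label` are made up for this sketch. Collapsing the four interaction types into one gives a binary "interaction or no interaction" view of the same labels.

```python
from enum import Enum


class DDILabel(Enum):
    """The five relation categories of the task (identifiers are illustrative)."""
    ADVICE = "Advice"
    EFFECT = "Effect"
    MECHANISM = "Mechanism"
    INT = "Int"
    FALSE = "False"


def has_interaction(label: DDILabel) -> bool:
    """Collapse the five categories into interaction / no interaction."""
    return label is not DDILabel.FALSE


def count_by_label(labels):
    """Count how many drug pairs fall into each category."""
    counts = {label: 0 for label in DDILabel}
    for label in labels:
        counts[label] += 1
    return counts


# Hypothetical annotations for four drug pairs.
example = [DDILabel("Effect"), DDILabel("False"), DDILabel("Advice"), DDILabel("False")]
print(sum(has_interaction(label) for label in example))
```

    Keeping False as an explicit category, rather than leaving unrelated pairs unlabeled, lets the same label set serve both a detection view and a type-classification view.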
    The study uses a multi-level machine learning method, with basic word transformation and natural-language sentence analysis serving as feature extraction. In the drug-disease relation identification experiment, the best results were 75.7% Accuracy, 76.3% Precision, 74.6% Recall, and 75.5% F-score. In the drug-drug relation identification experiment, the best results were 47.8% Precision, 72.4% Recall, and 57.6% F-score.
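    The reported measures can be related to one another with a short sketch. It assumes the F-score is the balanced F1 (the harmonic mean of Precision and Recall), which this record does not state explicitly; the function names are illustrative. Recomputing from the rounded percentages gives 57.6 for the drug-drug experiment and 75.4 for the drug-disease experiment; the 0.1 gap from the reported 75.5 is within what rounding of the inputs can produce.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def metrics_from_counts(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f_score": f_score(precision, recall),
    }


# Recompute F1 from the rounded percentages quoted in the abstract.
print(round(f_score(47.8, 72.4), 1))  # drug-drug experiment: 57.6
print(round(f_score(76.3, 74.6), 1))  # drug-disease experiment: 75.4
```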

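    Abstract (Chinese)
    Abstract (English)
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
        Section 1 Research Background
        Section 2 Research Objectives
        Section 3 Thesis Organization
    Chapter 2 Literature Review
        Section 1 Sources of the Original Documents for the Drug-Disease Relation Extraction Corpus
        Section 2 Recent Methods and Results in Drug-Disease Relation Extraction
        Section 3 Recent Methods and Results in Drug-Drug Interaction Extraction
        Section 4 Reference Tools and Methods for the Experiments
    Chapter 3 Methods and Procedures
        Section 1 Research Method and Framework
        Section 2 Sources of Experimental Data
        Section 3 Feature Extraction
        Section 4 Machine Learning Methods
    Chapter 4 Data Processing and Evaluation
        Section 1 Drug-Disease Pairs
        Section 2 Drug-Drug Pairs
        Section 3 Data Organization
        Section 4 Synthetic Minority Over-sampling Technique
        Section 5 Evaluation Methods
    Chapter 5 Experimental Results and Discussion
        Section 1 Results and Discussion of Drug-Disease Relation Identification
        Section 2 Results and Discussion of Drug-Drug Relation Identification
        Section 3 General Discussion of the Experimental Methods
    Chapter 6 Conclusion and Future Work
    References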

    Aronson, A. R. (2001). Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. In Proceedings of the AMIA Symposium (p. 17). American Medical Informatics Association.
    Björne, J., Heimonen, J., Ginter, F., Airola, A., Pahikkala, T., & Salakoski, T. (2011). Extracting contextualized complex biological events with rich graph-based feature sets. Computational Intelligence, 27(4), 541-557.
    Björne, J., Kaewphan, S., & Salakoski, T. (2013, June). UTurku: drug named entity recognition and drug-drug interaction extraction using SVM classification and domain knowledge. In Second Joint Conference on Lexical and Computational Semantics (*SEM) (Vol. 2, pp. 651-659).
    Bobic, T., Fluck, J., & Hofmann-Apitius, M. (2013). SCAI: Extracting drug-drug interactions using a rich feature vector. In Second Joint Conference on Lexical and Computational Semantics (*SEM) (Vol. 2, pp. 676-683).
    Bokharaeian, B., & Díaz, A. (2013, June). NIL UCM: Extracting Drug-Drug interactions from text through combination of sequence and tree kernels. In Second Joint Conference on Lexical and Computational Semantics. Atlanta, Georgia, USA (pp. 644-650).
    Chang, C. C., & Lin, C. J. (2011). LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3), 27.
    Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: synthetic minority over-sampling technique. Journal of artificial intelligence research, 16, 321-357.
    Chowdhury, M. F. M., & Lavelli, A. (2013). FBK-irst: A multi-phase kernel based approach for drug-drug interaction detection and classification that exploits linguistic information. Atlanta, Georgia, USA, 351, 53.
    Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine learning, 20(3), 273-297.
    Hailu, N. D., Hunter, L. E., & Cohen, K. B. (2013). UColorado SOM: extraction of drug-drug interactions from biomedical text using knowledge-rich and knowledge-poor features. Proceedings of SemEval, 684-8.
    Neves, M. L., Carazo, J. M., & Pascual-Montano, A. (2009, June). Extraction of biomedical events using case-based reasoning. In Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing: Shared Task (pp. 68-76). Association for Computational Linguistics.
    Rastegar-Mojarad, M., Boyce, R. D., & Prasad, R. (2013, June). UWM-TRIADS: classifying drug-drug interactions with two-stage SVM and post-processing. In Proceedings of the 7th International Workshop on Semantic Evaluation (pp. 667-674).
    Sánchez Cisneros, D. (2013). UC3M: A kernel-based approach to identify and classify DDIs in biomedical texts. Association for Computational Linguistics.
    Segura-Bedmar, I., Martínez, P., & Zazo, M. H. (2013). SemEval-2013 Task 9: Extraction of drug-drug interactions from biomedical texts (DDIExtraction 2013). In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Vol. 2, pp. 341-350).
    Thomas, P., Neves, M., Rocktäschel, T., & Leser, U. (2013, June). WBI-DDI: drug-drug interaction extraction using majority voting. In Second Joint Conference on Lexical and Computational Semantics (*SEM) (Vol. 2, pp. 628-635).
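    石琢暐 (2011). Introduction to support vector machines. Available from http://eeil.imis.ncku.edu.tw/knowledgebase/zhi-yuan-xiang-liang-ji-support-vector-machine
    李伯勳 (2017). Automatic extraction of patterns of disease-drug relations from biomedical literature (unpublished master's thesis). Department of Computer Science and Information Engineering, National Taiwan Normal University.
    陳佩瑄 (2017). A study on extracting drug-drug interactions from biomedical literature with a hybrid approach (unpublished master's thesis). Department of Computer Science and Information Engineering, National Taiwan Normal University.
    張毓珊 (2009). Developing data mining models for the class imbalance problem (unpublished master's thesis). Department of Information Management, Chaoyang University of Technology.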
