
Author: 卓晉緯 (Chin-Wei Cho)
Thesis title: 專有詞彙之定義式問題答案句自動擷取系統 (Definitional Sentences Retrieval System for Domain-Specific Terms)
Advisor: 柯佳伶 (Koh, Jia-Ling)
Degree: Master
Department: Department of Computer Science and Information Engineering
Year of publication: 2010
Academic year of graduation: 98 (ROC calendar)
Language: Chinese
Number of pages: 61
Keywords: Data Mining, Information Retrieval, Question Answering, Automatic Summarization, Sentence Retrieval, Sentence Clustering, Information Extraction
Thesis type: Academic thesis
  • This thesis presents a prototype sentence retrieval system that answers definitional questions about domain-specific terms, using eBooks as the answer source. The system applies information retrieval techniques to select candidate answer sentences from eBook content. We propose a term weighting model that uses external knowledge sources (e.g., Wikipedia) to measure how strongly each term in a sentence is related to the queried domain-specific term, and we rank candidate answer sentences by the sum of their term weights. Retrieved answers are therefore not limited to specific definitional sentence patterns: any sentence that helps define or explain the domain-specific term can be retrieved. Finally, the system computes the similarity between answer sentences from the semantic relatedness of their terms and groups the sentences with several clustering algorithms, so that answers covering similar concepts are presented together. Experimental results show that the answer sentences retrieved by the system, and their ranking, are consistent with the expert-voted ground-truth answers in most cases.
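    The ranking step described in the abstract (score each candidate sentence by the sum of its term weights, then sort) can be sketched as follows. This is a minimal illustration and not the thesis implementation: the function names, the tokenized input, and the numeric weights are all hypothetical, and the weights are supplied by hand here rather than derived from Wikipedia as in the thesis.

```python
from typing import Dict, List, Tuple


def score_sentence(tokens: List[str], term_weights: Dict[str, float]) -> float:
    """Sum the weights of all tokens in a sentence.

    Tokens missing from term_weights contribute 0.0. Summing over every
    token (rather than distinct terms only) is an assumption of this sketch.
    """
    return sum(term_weights.get(token, 0.0) for token in tokens)


def rank_sentences(
    sentences: List[List[str]], term_weights: Dict[str, float]
) -> List[Tuple[float, List[str]]]:
    """Return (score, sentence) pairs sorted by descending score."""
    scored = [(score_sentence(s, term_weights), s) for s in sentences]
    # sorted() is stable, so sentences with equal scores keep their input order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical relatedness weights of terms toward the query term "clustering".
weights = {"clustering": 1.0, "groups": 0.5, "similar": 0.25}

# Hypothetical candidate sentences, already tokenized.
candidates = [
    ["the", "weather", "is", "nice"],
    ["clustering", "groups", "similar", "objects"],
    ["similar", "objects"],
]

ranked = rank_sentences(candidates, weights)
for score, sentence in ranked:
    print(score, " ".join(sentence))
```

    In this sketch a sentence needs no fixed definitional pattern such as "X is a Y" to rank highly; it only needs to contain terms that are strongly related to the query term, which matches the motivation stated in the abstract.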

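    List of Tables
    List of Figures
    Chapter 1 Introduction
      1-1 Motivation
      1-2 Objectives
      1-3 Scope and Limitations
      1-4 Approach
      1-5 Thesis Organization
    Chapter 2 Literature Review
      2-1 Question Answering Systems
      2-2 Term Semantic Relatedness Analysis
      2-3 Document Summarization and Clustering
    Chapter 3 System Architecture and Data Preprocessing
      3-1 System Architecture and Workflow
      3-2 Data Preprocessing
      3-3 Building the Document Content Index
    Chapter 4 Answer Sentence Retrieval Method
      4-1 Candidate Answer Sentence Selection
      4-2 Candidate Answer Sentence Ranking
    Chapter 5 Answer Sentence Clustering
      5-1 Semantic Relatedness
      5-2 Clustering Algorithms
    Chapter 6 System Performance Evaluation
      6-1 Evaluation of Answer Sentence Retrieval
      6-2 Evaluation of Answer Sentence Clustering
    Chapter 7 Conclusions and Future Work
    References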

