
Author: 鍾昇宏 (Sheng-Hong Chung)
Thesis title: 兩個專有詞彙關聯句自動擷取之研究 (Associated Sentences Retrieval for Two Domain-Specific Terms)
Advisor: 柯佳伶
Degree: Master
Department: Department of Computer Science and Information Engineering
Year of publication: 2012
Academic year of graduation: 100 (ROC calendar)
Language: Chinese
Number of pages: 70
Chinese keywords: 專有詞彙, 問題分類, 句型樣式, 語意關聯度, 關聯句, 關聯句組
English keywords: domain-specific term, query classification, lexical pattern, relatedness degree, associated sentence, associated sentence pair
Thesis type: Academic thesis
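  • This thesis works with a trusted text data source. Given two domain-specific terms entered by the user, it automatically finds associated sentences or associated sentence pairs in that source, depending on the relationship between the terms, to help the user compare the two concepts. We divide term relationships into two broad categories: contained and not-contained. The system uses a web search engine to search for each query term separately, collects the top-K web page snippets for each term, and uses the probability that each query term appears in the other term's snippets as features, which a term relationship classification model then uses to classify the pair automatically. If the two query terms are classified as "contained", the system takes the sentences that contain both terms as the candidate set, matches them against the associated sentence pattern rule model, computes their semantic relatedness to the query terms, and selects the sentence with the highest association score as the associated sentence. If the query terms are classified as "not-contained", the system takes the sentences that contain either term as the candidate set, finds common concept terms that are highly related to both query terms, groups the sentences by common concept term, evaluates the semantic relatedness between each sentence and the common concept term and between the two sentences of each candidate pair, and selects the two highest-scoring sentences as the associated sentence pair. The experimental results show that the proposed method classifies the relationship of a query term pair effectively; that the associated sentences found by considering lexical pattern and semantic relatedness scores help users understand how the query terms are related; and that most of the associated sentence pairs selected by pair score help users clarify how the two query terms are alike and different with respect to particular concepts.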
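As a rough illustration of the classification features described above, the following Python sketch computes how often each query term appears in the other term's top-K snippets. It is only a sketch under stated assumptions: the function names are hypothetical, the snippets are assumed to be already fetched from a search engine, and case-insensitive substring matching stands in for whatever matching rule the thesis actually uses. The classifier that consumes these features is not shown.

```python
from typing import Sequence, Tuple


def snippet_occurrence_rate(term: str, snippets: Sequence[str]) -> float:
    """Fraction of snippets that mention `term`.

    Assumption: a case-insensitive substring match counts as a mention.
    """
    if not snippets:
        return 0.0
    needle = term.lower()
    hits = sum(1 for snippet in snippets if needle in snippet.lower())
    return hits / len(snippets)


def cooccurrence_features(
    term_a: str,
    term_b: str,
    snippets_a: Sequence[str],
    snippets_b: Sequence[str],
) -> Tuple[float, float]:
    """Return (rate of term_b in term_a's snippets, rate of term_a in term_b's snippets)."""
    return (
        snippet_occurrence_rate(term_b, snippets_a),
        snippet_occurrence_rate(term_a, snippets_b),
    )


# Hypothetical snippets for the query pair ("sorting", "quicksort").
snippets_sorting = [
    "Sorting arranges items in a defined order.",
    "Merge sort is a stable sorting method.",
    "Quicksort is a fast sorting algorithm.",
    "Bubble sort is a simple sorting algorithm.",
]
snippets_quicksort = [
    "Quicksort is a divide-and-conquer sorting algorithm.",
    "Quicksort is an in-place sorting method.",
]

features = cooccurrence_features(
    "sorting", "quicksort", snippets_sorting, snippets_quicksort
)
print(features)  # (0.25, 1.0)
```

An asymmetric result like this one, where the narrower term's snippets almost always mention the broader term but not the reverse, is the kind of signal such features could give a classifier for the contained relationship.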

    This thesis studies strategies for automatically extracting associated sentences or associated sentence pairs for two domain-specific query terms from a reliable text data source, according to the relationship between the terms. The goal is to help users compare the two query terms through the retrieved results. Two categories of relationship between query terms are defined: contained and not-contained. The system uses a web search engine to search for each of the two query terms and collects the top-k snippets for each term. The probability that a query term appears in the top-k snippets of the other query term is used as a feature to train a classifier of query pair relationships. If the two query terms have the contained relationship, the sentences containing both terms are retrieved as candidate sentences. For each candidate sentence, an association score is evaluated by matching its lexical pattern against the associated sentence rule model and computing its semantic relatedness with the query terms. The sentence with the highest association score is selected as the associated sentence. If the relationship is not-contained, common concept terms, which have high semantic relatedness with both query terms, are extracted from the sentences containing either of the two query terms. The sentences are then grouped by common concept term. Within each group, the representation score of each candidate sentence pair is evaluated by computing the pair's semantic relatedness with the concept term and the semantic relatedness between the two sentences. The sentence pair with the highest representation score is selected as the associated sentence pair. The experimental results show that the proposed method classifies the relationships of query terms effectively. Moreover, the retrieved associated sentences help users understand the semantic relationship between the two query terms, and the discovered associated sentence pairs help users clarify how the two query terms are similar and dissimilar with respect to a given concept.
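The sentence pair selection for the not-contained relationship can be sketched in Python as follows. This is a minimal sketch, not the thesis's implementation: all function names are hypothetical, Jaccard token overlap is swapped in as a simple stand-in for the semantic relatedness measure (which the abstract does not specify), the three scores are simply added, and each pair is assumed to take one sentence per query term.

```python
import re
from itertools import product
from typing import Optional, Sequence, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Token-overlap score in [0, 1]; a stand-in for semantic relatedness."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def mentions(sentence: str, term: str) -> bool:
    """Assumption: a case-insensitive substring match counts as a mention."""
    return term.lower() in sentence.lower()


def best_sentence_pair(
    concept: str,
    sentences_a: Sequence[str],
    sentences_b: Sequence[str],
) -> Optional[Tuple[float, str, str]]:
    """Pick the highest-scoring (score, sentence_a, sentence_b) for one concept term.

    Only sentences mentioning the concept term are considered (the "group").
    Returns None when either side of the group is empty.
    """
    group_a = [s for s in sentences_a if mentions(s, concept)]
    group_b = [s for s in sentences_b if mentions(s, concept)]
    best = None
    for sa, sb in product(group_a, group_b):
        # Sentence-to-concept relatedness plus sentence-to-sentence relatedness.
        score = jaccard(sa, concept) + jaccard(sb, concept) + jaccard(sa, sb)
        if best is None or score > best[0]:
            best = (score, sa, sb)
    return best


# Hypothetical candidate sentences for the query terms "stack" and "queue".
stack_sentences = [
    "A stack removes items in last-in first-out order.",
    "Push order matters for a stack.",
]
queue_sentences = [
    "A queue removes items in first-in first-out order.",
    "A queue is used in breadth-first search.",
]

result = best_sentence_pair("order", stack_sentences, queue_sentences)
if result is not None:
    score, sentence_a, sentence_b = result
    print(sentence_a)
    print(sentence_b)
```

With these sample sentences, the selected pair is the two "removes items in ... order" sentences, since they share the concept term and most of their wording. A real system would replace `jaccard` with a proper semantic relatedness measure.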

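    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
    1.1 Research Motivation and Objectives
    1.2 Research Scope and Limitations
    1.3 Method
    1.4 Thesis Organization
    Chapter 2 Literature Review
    2.1 Automatic Question Answering Systems
    2.2 Classification of Relationships Between Terms
    2.3 Mining Semantic Relationships Between Terms
    2.3.1 Mining Semantic Relationships Between Terms
    2.3.2 Mining Terms with High Semantic Relatedness
    2.4 Automatic Document Summarization Techniques
    Chapter 3 System Workflow
    Chapter 4 E-book Data Preprocessing and Index Construction
    4.1 Sentence Extraction
    4.2 Building the Sentence Index
    4.3 Part-of-Speech Tagging and Concept Term Set Generation
    4.3.1 Part-of-Speech Tagging
    4.3.2 Generating the Concept Term Set
    Chapter 5 Query Term Relationship Evaluation
    5.1 Definition of Term Relationship Categories
    5.2 Classification Feature Extraction
    5.3 Building the Term Relationship Classification Model
    5.3.1 Generating Training Term Pairs
    5.3.2 Building the Term Relationship Classification Model
    5.3.3 Online Automatic Classification of Term Relationships
    Chapter 6 Associated Sentence Selection for the Contained Relationship
    6.1 Extracting Lexical Patterns
    6.2 Building the Associated Sentence Matching Rule Model
    6.2.1 Collecting Training Sentences
    6.2.2 Building the Associated Sentence Matching Rule Model
    6.3 Selecting Associated Sentences
    Chapter 7 Building Concept-Associated Sentence Pairs for the Not-Contained Relationship
    7.1 Finding Common Concept Terms
    7.1.1 Generating the Candidate Common Concept Term Set
    7.1.2 Extracting Highly Related Common Concept Terms
    7.2 Building Associated Sentence Pairs
    Chapter 8 Experimental Results and Discussion
    8.1 Experimental Data
    8.2 Accuracy of Automatic Term Relationship Classification
    8.2.1 Effect of Different Features on Classifier Performance
    8.2.2 Effect of Feature Combinations Chosen by Greedy Selection on Classifier Performance
    8.3 Evaluating Associated Sentence Quality
    8.4 Evaluating Associated Sentence Pairs
    8.5 Testing on Cross-Domain Query Term Pairs
    Chapter 9 Conclusions and Future Work
    9.1 Conclusions
    9.2 Future Research Directions
    References
    Appendix A
    Training Query Term Pairs for the Contained Relationship
    Training Query Term Pairs for the Not-Contained Relationship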

