簡易檢索 / 詳目顯示

研究生: 施柏先
論文名稱: 對社交標籤系統提出有效率的 Top-k 近似查詢處理方法之研究
The Efficient Top-k Similar Processing Method for Similar Search on Social Tagging System
指導教授: 柯佳伶
Koh, Jia-Ling
學位類別: 碩士
Master
系所名稱: 資訊工程學系
Department of Computer Science and Information Engineering
論文出版年: 2012
畢業學年度: 101
語文別: 中文
論文頁數: 70
中文關鍵詞: 社交標籤系統近似距離公式評估Top-k 近似查詢處理方法
英文關鍵詞: Social tagging system, Distance measure evaluation, Top-k similar search algorithm
論文種類: 學術論文
相關次數: 點閱:96下載:1
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 本論文研究目的為針對社交標籤網站提供一個有效率的標籤集近似查詢處理系統,我們使用一個多階層的標籤集索引結構來對系統中資料物件的標籤集建立索引,此索引結構內記錄的資訊能夠估算一群資料物件與使用者所給的查詢標籤集合距離的上限及下限值,能夠篩除掉多個和查詢標籤集距離較大的標籤集,使查詢更有效率。本論文根據此索引結構設計了傑卡德距離與重疊距離兩個標籤集距離的計算方法,並提出考量語義的修正計算方法及對應的距離上下限值估算方法,實驗評估結果可顯示這些距離計算方法對近似查詢結果的有效性。本論文亦針對此索引結構提出一個有效率的 Top-k 近似查詢演算法,使用者僅需設定資料結果筆數 ( k ),可找出標籤集和給定查詢標籤集合最相似的前 k 筆資料物件。實驗結果顯示本研究提出的 Top-k 搜尋處理方法和基本方法比較,在多數情況下可有效提昇處理效率。

    This thesis mainly aims to provide an efficient similar search system for tag sets on the social tagging system. We use a multi-level index structure to construct an index on the tag sets of the objects. We can estimate the upper/lower bound of the distance value between a given query tag set and the tag sets of a group of objects according to the information maintained in the index structure, which can be used to improve the efficiency of searching by applying the pruning strategies. According to the property of this index structure, we design the Jaccard distance and Overlap distance measure function for evaluating the similarity distance betwen two tag sets. Then the modified distance measure functions and the upper/lower bound estimating methods corresponding to the specific distance measure functions are proposed. The experimental results show that the effectiveness on similarity searches of the proposed distance measure functions. We also proposes an efficient top-k similar search algorithm based on the index structure. Users only need to give the number of required search results, k, the system could find the top-k most similar objects with the query tag set. The experimental results show that, in most test cases, the proposed algorithm performs better in efficiency than the baseline method.

    附表目錄 i 附圖目錄 ii 第一章 緒論 1 1.1 研究動機 1 1.2 研究目的與範圍限制 2 1.3 論文方法 4 1.4 論文架構 5 第二章 相關文獻探討. 6 2.1 社交標籤系統概念 6 2.2 標籤資料物件查詢方式 7 2.2.1 包含查詢 (Contain Search) 7 2.2.2 近似查詢 (Similarity Search) 8 2.2.3 範例查詢 (Query-by-Example Search) 8 2.3 標籤系統資料結構 10 2.3.1 反向索引 (inverted index) 10 2.3.2 以分類架構建立索引結構 11 2.3.3 多階層群集之索引結構 11 第三章 系統架構與資料前處理 13 3.1 系統架構與流程 13 3.2 資料前處理 15 3.3 標籤物件集索引結構 16 3.3.1 索引結構介紹 16 3.3.2 建立索引結構群集門檻值之定義 18 第四章 索引結構下之距離評估方法 20 4.1 多階層索引結構之查詢方法 20 4.1.1 雙層邊界近似查詢方法 20 4.1.2 修正漢明距離式 (Hamming Distance Measure) 23 4.1.3 查詢範例 26 4.1.4 標籤相關程度值計算方式 28 4.2 傑卡德距離 (Jaccard Distance Measure) 29 4.2.1 修正傑卡德距離定義 29 4.2.2 採用傑卡德距離在兩層機制搜尋邊界估算方法推導 30 4.3 重疊距離 (Overlap Distance) 33 4.3.1 修正重疊距離計算方式之定義 33 4.3.2 採用修正重疊距離在兩層機制搜尋邊界算法推導 34 第五章 Top-k 近似查詢處理方法 37 5.1 基本處理概念 37 5.2 流程步驟 44 第六章 實驗評估. 51 6.1 實驗資料來源及環境設定 51 6.2 評估不同距離估算方式對查詢結果的效能 52 6.2.1 測試資料說明 52 6.2.2 實驗評估方法 52 6.2.3 實驗結果 56 6.3 評估 Top-k 近似查詢處理方法之效率 59 6.3.1 測試資料 60 6.3.2 實驗評估方法 61 6.3.3 實驗結果 61 第七章 結論與未來展望 67 7.1 結論 67 7.2 未來展望 68 參考文獻 69

    [1] J.-C. Chuang, C.-W. Cho, and A. L.-P. Chen. Similarity search in transaction
    databases with a two-level bounding mechanism. In M. Lee, K.-L. Tan, and V. Wu-
    wongse, editors, Database Systems for Advanced Applications, volume 3882 of
    Lecture Notes in Computer Science, pages 572--586. Springer Berlin Heidelberg,
    2006.
    [2] B. Ding, H. Wang, R. Jin, J. Han, and Z. Wang. Optimizing index for taxonomy
    keyword search. In Proceedings of the 2012 ACM SIGMOD International Con-
    ference on Management of Data, SIGMOD '12, pages 493--504, New York, NY,
    USA, 2012. ACM.
    [3] S. A. Golder and B. A. Huberman. Usage patterns of collaborative tagging systems.
    J. Inf. Sci., 32(2):198--208, April 2006.
    [4] M. Gupta, R. Li, Z. Yin, and J. Han. Survey on social tagging techniques. SIGKDD
    Explor. Newsl., 12(1):58--72, November 2010.
    [5] C.-C. Hsieh and J. Cho. Finding similar items by leveraging social tag clouds. In
    Proceedings of the 27th Annual ACM Symposium on Applied Computing, SAC '12,
    pages 644--651, New York, NY, USA, 2012. ACM.
    [6] J.-L. Koh, G.-T.Chiang, and I.-C. Chiu. The strategies for supporting query special-
    ization and query generalization in social tagging system. Proceedings of the 4th
    International Workshop on Social Networks and Social Web Mining(SNSM), 2013.
    [7] J.-L. Koh, N. Shongwe, and C.-W. Cho. A multi-level hierarchical index structure
    for supporting efficient similarity search on tag sets. 2012 Sixth International Con-
    ference on Research Challenges in Information Science (RCIS), pages 1--12, May
    2012.
    [8] K.-P. Lee, H.-G. Kim, and H.-J. Kim. A social inverted index for social-tagging-
    based information retrieval. J. Inf. Sci., 38(4):313--332, August 2012.
    [9] J. I. Lopez-Veyna, V. J. Sosa-Sosa, and I. Lopez-Arevalo. Kesosd: keyword search
    over structured data. In Proceedings of the Third International Workshop on Key-
    69
    word Search on Structured Data, KEYS '12, pages 23--31, New York, NY, USA,
    2012. ACM.
    [10] G. S. Manku, A. Jain, and A. Das Sarma. Detecting near-duplicates for web crawl-
    ing. In Proceedings of the 16th international conference on World Wide Web,
    WWW '07, pages 141--150, New York, NY, USA, 2007. ACM.
    [11] B. Markines, C. Cattuto, F. Menczer, D. Benz, A. Hotho, and G. Stumme. Evaluat-
    ing similarity measures for emergent semantics of social tagging. In Proceedings of
    the 18th international conference on World wide web, WWW '09, pages 641--650,
    New York, NY, USA, 2009. ACM.
    [12] A. Mathes. Folksonomies - cooperative classification and communication through
    shared metadata. http://www.adammathes.com/academic/computer-mediated-
    communication/folksonomies.html, 2004.
    [13] C. Ordonez, E. Omiecinski, and N. Ezquerra. A fast algorithm to cluster high di-
    mensional basket data. Proceedings 2001 IEEE International Conference on Data
    Mining, pages 633--636, 2001.
    [14] R. Schenkel, T. Crecelius, M. Kacimi, S. Michel, T. Neumann, J. X. Parreira, and
    G. Weikum. Efficient top-k querying over social-tagging networks. In Proceedings
    of the 31st annual international ACM SIGIR conference on Research and develop-
    ment in information retrieval, SIGIR '08, pages 523--530, New York, NY, USA,
    2008. ACM.
    [15] B. Spell. Java api for wordnet searching. http://lyle.smu.edu/ tspell/jaws/, 2008.
    [16] V. Zanardi and L. Capra. Social ranking: uncovering relevant content using tag-
    based recommender systems. In Proceedings of the 2008 ACM conference on Rec-
    ommender systems, RecSys '08, pages 51--58, New York, NY, USA, 2008. ACM.
    [17] J. Zobel and A. Moffat. Inverted files for text search engines. ACM Comput. Surv.,
    38(2), July 2006

    下載圖示
    QR CODE