
Graduate student: 陳恒毅
Heng-Yi Chen
Thesis title: 社群資料對圖書搜尋系統效能之研究
A Study of Social Data on the Effectiveness of a Book Retrieval System
Advisor: 柯皓仁
Ke, Hao-Ren
Degree: Master
Department: 圖書資訊學研究所
Graduate Institute of Library and Information Studies
Year of publication: 2014
Academic year of graduation: 102
Language: Chinese
Number of pages: 55
Keywords (Chinese): 圖書搜尋、社會標記、社群資料、搜尋引擎
Keywords (English): Book Search, Social Tag, Social Data, Search Engine
Thesis type: Academic thesis
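  • With the rise of Web 2.0, social data has been widely adopted by all kinds of websites, and bibliographic community sites such as online bookstores and online bookshelves have rapidly accumulated large amounts of user-generated social data. Since 2011, INEX (INitiative for the Evaluation of XML retrieval) has collected and curated bibliographic records containing social data from Amazon and LibraryThing, and has used them as the test collection for its book search task.
    This study takes an experimental approach, running book search experiments on the test collection of the INEX 2013 Social Book Search Track, and examines how different fields affect the search results and how re-ranking with social data affects them. In the experiments, indexes were built from traditional bibliographic data, from social data, and from the two combined, and the search results were then re-ranked using social data. The main findings are as follows:
    1. In book search under a probabilistic model, using social data yields better retrieval effectiveness than the traditional bibliographic data currently used by libraries.
    2. Under the probabilistic model, social review data (Review) yields the best results.
    3. Under the probabilistic model, social tag (Tag) data does not differ noticeably from traditional bibliographic data; however, once tags are weighted by the number of times they were applied, retrieval effectiveness improves by 270%, clearly above the unweighted result and second only to the review-based index.
    4. Re-ranking the book search results with social reviews yields the best retrieval results in this study, raising the nDCG score by 3.1%.
    5. Re-ranking the book search results with social tags performs worse than re-ranking with social reviews, but it can raise the nDCG score by up to 25%.
    These findings can be further applied to the design of information systems, including book search and recommender systems, with the aim of giving readers a better user experience.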
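
The nDCG figures in this abstract are relative improvements in normalized discounted cumulative gain. As a minimal sketch, assuming graded relevance labels and the common log2 rank discount (the abstract does not state which nDCG variant or cutoff the thesis uses), the metric and a relative improvement could be computed as follows. All function names here are hypothetical.

```python
import math
from typing import Sequence


def dcg(gains: Sequence[float], k: int) -> float:
    """Discounted cumulative gain of the top-k results (log2 discount)."""
    # rank is 0-based, so the first result is divided by log2(2) = 1.
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg(ranked_gains: Sequence[float], judged_gains: Sequence[float], k: int) -> float:
    """DCG of the ranking divided by the DCG of the ideal ordering.

    ranked_gains: relevance grades of the returned books, in rank order.
    judged_gains: relevance grades of all judged books for the topic
                  (assumption: used only to build the ideal ordering).
    """
    ideal = dcg(sorted(judged_gains, reverse=True), k)
    return dcg(ranked_gains, k) / ideal if ideal > 0 else 0.0


def relative_improvement(baseline: float, new: float) -> float:
    """Change of a score expressed as a percentage of the baseline."""
    return (new - baseline) / baseline * 100.0
```

Under this reading, a run whose nDCG rises from 0.100 to 0.125 would be reported as a 25% improvement, even though the absolute gain is only 0.025.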

    With the proliferation of Web 2.0, social data is widely used in various applications. Online bookstores (such as Amazon) and online bibliographic community websites (such as LibraryThing) have quickly accumulated a large amount of user-generated information. INEX (INitiative for the Evaluation of XML retrieval) has used the Amazon/LibraryThing corpus for its Social Book Search Track since 2011. The purpose of the track is to develop novel algorithms that leverage professional metadata and user-generated metadata to retrieve books effectively. This thesis uses the INEX 2013 Social Book Search Track test data set to conduct book search experiments and evaluate the retrieval results. Separate indices are built from professional metadata, from user-generated metadata, and from both combined.
    The results of this study are summarized as follows:
    1. Using social data in the probabilistic retrieval model for book search outperforms using traditional bibliographic data.
    2. Using all book data, including reviews, in the probabilistic retrieval model for book search gives the best retrieval performance.
    3. Using social tag information in the probabilistic retrieval model for book search shows no significant difference from traditional bibliographic data, but weighting each tag by the number of times it was applied improves retrieval performance.
    4. Re-ranking with review data achieves the best search results in this study, improving the nDCG score by 3.1%.
    5. Re-ranking with tag data can improve the nDCG score by up to 25%.
    In practice, the results of this thesis can inform the design of book search systems and book recommendation systems.
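
The two tag-based techniques summarized above, weighting tags by how often they were applied and re-ranking an initial result list with that evidence, can be illustrated with a minimal sketch. The log-damped count, the linear interpolation, the weight `alpha`, and every name below are assumptions made for illustration; the abstract does not specify the weighting or re-ranking formula the thesis actually uses.

```python
import math
from typing import Dict, List, Tuple


def tag_score(query_terms: List[str], tag_counts: Dict[str, int]) -> float:
    """Sum log-damped counts of the tags that match a query term.

    tag_counts maps a lowercase tag to the number of users who applied it.
    """
    return sum(math.log1p(tag_counts.get(term.lower(), 0)) for term in query_terms)


def rerank(
    results: List[Tuple[str, float]],
    tags: Dict[str, Dict[str, int]],
    query_terms: List[str],
    alpha: float = 0.7,
) -> List[Tuple[str, float]]:
    """Interpolate each retrieval score with its tag score, then sort descending.

    results: (book_id, retrieval_score) pairs from the initial search.
    tags:    book_id -> tag_counts; books without tags get a tag score of 0.
    alpha:   hypothetical weight on the original retrieval score.
    """
    rescored = [
        (book_id, alpha * score + (1 - alpha) * tag_score(query_terms, tags.get(book_id, {})))
        for book_id, score in results
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Illustrative data only: book_b is tagged "fantasy" by 40 users.
initial = [("book_a", 2.0), ("book_b", 1.9)]
book_tags = {"book_b": {"fantasy": 40}}
reranked = rerank(initial, book_tags, ["fantasy"])
```

With this made-up data, `book_b` moves ahead of `book_a` because its frequently applied tag matches the query. A real system would normally put the retrieval and tag scores on a comparable scale before interpolating them.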

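    Abstract
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1: Introduction
        Section 1: Research Background
        Section 2: Research Objectives
        Section 3: Research Scope and Limitations
        Section 4: Definitions of Terms
        Section 5: Thesis Organization
    Chapter 2: Literature Review
        Section 1: Social Tagging
        Section 2: Information Retrieval and Query Expansion
        Section 3: Book Search
        Section 4: Social Tagging and Retrieval Systems
    Chapter 3: Research Methods and Design
        Section 1: Data Set
        Section 2: System Architecture
        Section 3: Experimental Design
        Section 4: Evaluation Methods
    Chapter 4: Analysis of Results
        Section 1: Analysis of Book Data Search Results
        Section 2: Analysis of Re-ranking Results Using Social Data
        Section 3: Comparison with INEX Results
    Chapter 5: Conclusions and Suggestions
        Section 1: Conclusions
        Section 2: Suggestions for Future Work
    References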

