
Author: 梁哲瑋
Che-Wei Liang
Thesis title: 運用維基百科進行個人微網誌內容主題分析
Mining Topic Interests of Users from Micro-blogs based on Wikipedia
Advisor: 柯佳伶
Koh, Jia-Ling
Degree: Master
Department: Department of Computer Science and Information Engineering
Year of publication: 2010
Academic year of graduation: 98 (ROC calendar)
Language: Chinese
Number of pages: 65
Chinese keywords: 微網誌, 維基百科, 文字探勘
English keywords: micro-blogging, Wikipedia, text mining
Thesis type: Academic thesis
  • Micro-blogging has become increasingly popular in recent years. Users post short articles to share their interests, feelings, and information with friends, and the topics these articles cover usually reflect what the user is interested in. This study therefore mines the topics of a micro-blog user's articles to discover that user's interests. The proposed method first extracts the important terms from a user's articles and then looks up each term in the Wikipedia category network to find the category concepts it covers, from which the user's likely interest categories are derived. For terms that cannot be found in Wikipedia directly, the online Wikipedia is queried to find the categories of their redirect terms. For non-Wikipedia terms, a clustering analysis of related terms is used: the other terms in the same cluster as the non-Wikipedia term supply its likely categories. An evaluation method is also proposed to measure how concentrated the topics of a micro-blog user's articles are. Experimental results show that this method judges the topic concentration of micro-blog users with high precision, and that the interest categories the system assigns automatically agree to a considerable degree with the categories chosen by the test subjects.
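The lookup order described in the abstract (direct Wikipedia lookup, then redirect terms, then terms from the same cluster) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy dictionaries stand in for the Wikipedia category network, the function names are hypothetical, and the concentration score (one minus normalized entropy) is a generic stand-in, not the evaluation measure defined in the thesis.

```python
from collections import Counter
import math

# Hypothetical toy data standing in for the Wikipedia category network.
CATEGORIES = {
    "basketball": ["Sports"],
    "guitar": ["Music"],
    "piano": ["Music"],
}
REDIRECTS = {"hoops": "basketball"}       # redirect term -> canonical title
CLUSTERS = [{"gtr", "guitar", "piano"}]   # clusters of related terms


def lookup_categories(term):
    """Try a direct lookup, then a redirect, then the term's cluster."""
    if term in CATEGORIES:
        return list(CATEGORIES[term])
    target = REDIRECTS.get(term)
    if target in CATEGORIES:
        return list(CATEGORIES[target])
    # Non-Wikipedia term: borrow categories from other terms in its cluster.
    found = []
    for cluster in CLUSTERS:
        if term in cluster:
            for other in sorted(cluster - {term}):
                found.extend(CATEGORIES.get(other, []))
    return found


def interest_profile(terms):
    """Count how often each category is reached from a user's terms."""
    counts = Counter()
    for term in terms:
        counts.update(lookup_categories(term))
    return counts


def concentration(counts):
    """Assumed score: 1.0 = one category only, 0.0 = evenly spread."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    if len(counts) == 1:
        return 1.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return 1.0 - entropy / math.log(len(counts))
```

Calling `interest_profile(["basketball", "hoops", "gtr"])` on the toy data counts two hits each for "Sports" and "Music", so the assumed concentration score is close to zero, whereas a user whose terms all map to one category scores 1.0.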

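    List of Tables
    List of Figures
    Chapter 1 Introduction
      1-1 Research Motivation
      1-2 Related Work
      1-3 Proposed Method
      1-4 Thesis Organization
    Chapter 2 Problem Description and Definitions
      2-1 Problem Description
      2-2 Introduction to the Tools Used
    Chapter 3 Method
      3-1 System Overview
      3-2 Data Collection and Preprocessing
      3-3 Mining Interest Category Concepts of Micro-blog Users with Wikipedia
      3-4 Handling Wikipedia Redirect Terms and Non-Wikipedia Terms
      3-5 Mining the Topic Concentration Degree of a Micro-blog User's Articles
    Chapter 4 Experiments
      4-1 Experimental Evaluation
      4-2 Analysis and Discussion
    Chapter 5 Conclusion and Future Work
    References
    Appendix A Upper-Level Category Concepts Selected from the Wikipedia Category Index
    Appendix B Examples of Word Segmentation Results for Micro-blog Articles and System-Selected Categories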

