Graduate student: | 徐毓雯 Hsu, Yu-wen |
---|---|
Thesis title: | 產品評論特徵自動擷取之研究 Automatic Feature Terms Extraction for Product Opinions |
Advisor: | 柯佳伶 Koh, Jia-Ling |
Degree: | Master |
Department: | Department of Computer Science and Information Engineering |
Publication year: | 2011 |
Academic year of graduation: | 99 |
Language: | Chinese |
Number of pages: | 60 |
Chinese keywords: | 產品評論特徵、自動擷取、字詞重要性評估函式、意見探勘 |
English keywords: | feature terms of products, automatic extraction, importance measure function of terms, opinion mining |
Thesis type: | Academic thesis |
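In most current opinion mining research, product feature terms are either assigned manually or chosen by term frequency, and they must be assigned again for each new kind of product. We therefore aim to extract product feature terms automatically, reducing the manual effort spent on feature selection. This thesis applies several term importance measures to investigate how to extract product feature terms from forum articles automatically and effectively. Taking nouns as candidate feature terms, we process a forum corpus and a camera introduction corpus separately. For each term, we count its frequency in the discussion articles of each brand, which reflects common features; we use the difference in a term's occurrence probability across brands to reflect brand-specific features; and we use the correlation between a brand and a term to reflect brand-related features. We also consider the difference in a term's occurrence probability across the two corpora, which reflects the product feature terms commonly used in forum and camera articles, and then filter out general colloquial words with a list of common words. We propose a product feature term importance measure function that combines the importance scores from these analyses as the basis for extraction. Experimental results show that selecting terms with the proposed function extracts product feature terms automatically and effectively.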
In recent research on opinion mining, the feature terms of products are usually assigned manually or determined by term frequency. Consequently, the feature terms must be chosen again, at considerable cost, whenever a different kind of product is considered. The goal of this thesis is therefore to study how to extract product feature terms from forum documents automatically and effectively. We use forum posts and expert commentaries as the corpora. Within a corpus, the nouns appearing in the documents are selected as candidate feature terms. For each candidate term, the term frequency is counted over the documents discussing a given brand, which indicates the popularity of the feature term. The divergence in a candidate term's probability between brands is calculated to identify feature terms particular to a brand. The correlation between a term and a brand is also calculated to identify terms related to that brand. Furthermore, the divergence in a candidate term's probability between the two corpora is calculated to identify terms specific to each corpus. Finally, we propose an importance measure function that combines the scores of the above evaluation methods. The experimental results show that the ranked list of terms produced by the importance measure function extracts product feature terms automatically and effectively.
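The abstract names four signals (per-brand term frequency, probability divergence between brands, brand-term correlation, and probability divergence between corpora) but does not give their formulas here. The following is only a minimal sketch of how such signals might be combined, not the thesis's actual function. Everything in it is an assumption made for illustration: the names `term_probs` and `rank_feature_terms`, add-alpha smoothing, a KL-style sum for brand divergence, pointwise mutual information under a uniform brand prior for correlation, an absolute probability difference for the cross-corpus signal, and an unnormalized weighted sum with placeholder weights.

```python
import math
from collections import Counter

def term_probs(docs, vocab, alpha=1.0):
    """Add-alpha smoothed probability of each vocab term over tokenized docs."""
    counts = Counter(tok for doc in docs for tok in doc if tok in vocab)
    total = sum(counts.values()) + alpha * len(vocab)
    return {t: (counts[t] + alpha) / total for t in vocab}

def rank_feature_terms(forum_by_brand, intro_docs, common_words,
                       weights=(1.0, 1.0, 1.0, 1.0)):
    """Rank candidate nouns by a weighted sum of four illustrative signals.

    forum_by_brand: dict mapping brand -> list of docs (each a list of nouns).
    intro_docs: list of docs from the second corpus.
    common_words: colloquial words to filter out of the candidates.
    """
    forum_docs = [doc for docs in forum_by_brand.values() for doc in docs]
    vocab = {tok for doc in forum_docs + intro_docs for tok in doc}
    vocab -= set(common_words)  # drop general colloquial words

    p_forum = term_probs(forum_docs, vocab)
    p_intro = term_probs(intro_docs, vocab)
    p_brand = [term_probs(docs, vocab) for docs in forum_by_brand.values()]

    w_pop, w_div, w_cor, w_cross = weights
    scores = {}
    for t in vocab:
        # Assumed uniform brand prior: P(t) is the mean of P(t | brand).
        mean_p = sum(p[t] for p in p_brand) / len(p_brand)
        popularity = mean_p
        # KL-style divergence of the per-brand probabilities from their mean.
        divergence = sum(p[t] * math.log(p[t] / mean_p) for p in p_brand)
        # Strongest pointwise mutual information between the term and a brand.
        correlation = max(math.log(p[t] / mean_p) for p in p_brand)
        # Absolute probability gap between the two corpora.
        cross = abs(p_forum[t] - p_intro[t])
        scores[t] = (w_pop * popularity + w_div * divergence
                     + w_cor * correlation + w_cross * cross)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    # Toy, made-up data purely to show the call shape.
    forum = {
        "A": [["lens", "battery", "thing"], ["lens", "screen"]],
        "B": [["battery", "shutter", "thing"], ["shutter", "screen"]],
    }
    intro = [["lens", "shutter", "sensor"], ["battery", "screen"]]
    for term, score in rank_feature_terms(forum, intro, {"thing"}):
        print(f"{term}\t{score:.4f}")
```

Because the four signals live on different scales (probabilities versus log ratios), the raw weighted sum lets the correlation term dominate. A real implementation would likely normalize each signal or tune the weights on labeled data.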