Graduate student: | 張莊平 Chang, Chuang-ping |
---|---|
Thesis title: | 中文文法剖析應用於電影評論之意見情感分類 Opinion Classification in Chinese Movie Reviews Using Parsing-based Method |
Advisor: | 侯文娟 |
Degree: | Master |
Department: | Department of Computer Science and Information Engineering |
Year of publication: | 2012 |
Academic year of graduation: | 100 |
Language: | Chinese |
Number of pages: | 48 |
Chinese keywords: | 中文文法剖析, 意見探勘, 情感分類, 電影評論 |
English keywords: | Chinese parser, opinion mining, sentiment classification, movie review |
Document type: | Academic thesis |
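In today's highly connected society, reviews in every domain are within easy reach, and people have grown used to collecting online product reviews as a reference before making a purchase. This holds especially for movies: apart from the fragments shown in trailers released by the distributor, a movie cannot be sampled beforehand, and the ticket cannot be refunded afterwards. People therefore place even greater weight on online reviews before buying a ticket at the theater.
In this thesis, we collect review articles written by moviegoers on movie review websites and apply natural language analysis techniques to summarize, for each movie, an overall recommendation score and the high-frequency opinion words for several movie elements (such as plot, actors, and special effects), helping users choose movies that suit them.
Methodologically, we focus on Chinese-language movie reviews. We introduce the Chinese parser of Academia Sinica into the conventional opinion classification steps for movie reviews and develop a procedure that pairs opinion words with feature words according to the grammatical relation graph, so that long review articles can be analyzed and scored more accurately. The final results are presented on a five-grade scale.
The experimental results show that the system's scores reach an accuracy of 70.7% when an error of one grade is tolerated, with an overall MRR of 0.61. When the five grades are collapsed into recommend and not-recommend conclusions, the F-scores are 74.3% and 51.4%, respectively. This indicates that the system achieves the expected effect of helping users judge how strongly a movie is recommended by collecting a large number of online reviews.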
In modern society, with a highly developed internet, reviews in many domains are easy to reach, and people are used to collecting them as references before making a purchase. This is especially true for movies: beforehand we can only preview brief, fragmented content through trailers, and we cannot get a refund after watching, so people value online movie reviews even more.
In this study, we collected movie reviews from websites and analyzed them with natural language processing approaches, producing an overall recommendation grade and several frequent opinion keywords for movie elements such as plot, actors and actresses, and special effects. With these results, people can choose the movies that suit them.
Focusing on movie reviews in Chinese, the study introduces the CKIP Chinese Parser into the traditional opinion mining approach and proposes a new procedure that extracts pairs of opinion keywords and feature keywords according to dependency grammar graphs. This parsing-based approach is better suited to long articles. The grading results are presented on a 5-grade scale.
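The pairing step described above can be illustrated with a minimal Python sketch. It is not the thesis's procedure and does not use the CKIP parser's real output format: the edge list is hand-written, the lexicons `FEATURE_WORDS` and `OPINION_WORDS` are hypothetical, and the rule shown (pair an opinion word with a feature word when a single dependency edge links them) is an assumed simplification.

```python
# Hypothetical lexicons; the word lists used in the thesis are not given here.
FEATURE_WORDS = {"劇情": "plot", "演員": "actors", "特效": "special effects"}
OPINION_WORDS = {"精彩": +1, "出色": +1, "普通": -1}  # assumed polarity values


def extract_pairs(edges):
    """Return (feature, opinion) pairs for words joined by one dependency edge.

    `edges` is a list of (head, dependent) tuples. This flat format is an
    assumption made for illustration, not the parser's actual output.
    """
    pairs = []
    for head, dependent in edges:
        if head in OPINION_WORDS and dependent in FEATURE_WORDS:
            pairs.append((dependent, head))
        elif head in FEATURE_WORDS and dependent in OPINION_WORDS:
            pairs.append((head, dependent))
    return pairs


# Hand-written edges for the sentence "劇情很精彩,但是特效普通"
# ("The plot is wonderful, but the special effects are ordinary").
edges = [
    ("精彩", "劇情"),
    ("精彩", "很"),
    ("精彩", "普通"),
    ("普通", "特效"),
    ("普通", "但是"),
]

pairs = extract_pairs(edges)
for feature, opinion in pairs:
    print(FEATURE_WORDS[feature], opinion, OPINION_WORDS[opinion])
```

Using graph edges rather than word proximity means that in a long sentence an opinion word is attached to the feature it grammatically modifies, not merely to the nearest feature word.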
The experimental results show that the accuracy of our system, allowing a grade deviation of up to 1, is 70.7%, and the MRR value is 0.61. In addition, when we converted the 5-grade scale into recommend and not-recommend choices, we obtained F-scores of 74.3% and 51.4%, respectively. The results indicate that our system meets expectations for movie recommendation.
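For readers unfamiliar with the three measures reported above, the following Python sketch shows one common way to compute them. The data are toy values, not the thesis's results. Several details are assumptions, since the abstract does not specify them: the tolerance is read as "within one grade", MRR is computed from a ranked list of candidate grades per movie, and grades of 4 or higher are mapped to "recommend".

```python
def accuracy_within(predicted, actual, tolerance=1):
    """Fraction of predictions whose grade differs from the truth by at most `tolerance`."""
    hits = sum(abs(p - a) <= tolerance for p, a in zip(predicted, actual))
    return hits / len(actual)


def mean_reciprocal_rank(ranked_candidates, actual):
    """Average of 1/rank of the true grade in each ranked candidate list (0 if absent)."""
    total = 0.0
    for candidates, truth in zip(ranked_candidates, actual):
        if truth in candidates:
            total += 1.0 / (candidates.index(truth) + 1)
    return total / len(actual)


def f_score(predicted, actual, positive):
    """F1 for the class `positive`, computed from precision and recall."""
    tp = sum(p == positive and a == positive for p, a in zip(predicted, actual))
    fp = sum(p == positive and a != positive for p, a in zip(predicted, actual))
    fn = sum(p != positive and a == positive for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def to_binary(grades, threshold=4):
    """Map 5-grade scores to recommend / not-recommend (threshold is an assumption)."""
    return ["recommend" if g >= threshold else "not-recommend" for g in grades]


# Toy data: four movies with system grades, true grades, and ranked candidate grades.
predicted = [5, 3, 2, 4]
actual = [4, 3, 4, 4]
ranked = [[5, 4, 3], [3, 2, 4], [2, 3, 4], [4, 5, 3]]

acc = accuracy_within(predicted, actual)
mrr = mean_reciprocal_rank(ranked, actual)
f_rec = f_score(to_binary(predicted), to_binary(actual), "recommend")
f_not = f_score(to_binary(predicted), to_binary(actual), "not-recommend")
print(acc, mrr, f_rec, f_not)
```

Reporting a separate F-score for each class, as the abstract does, shows whether the system is weaker on one of the two classes.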