Graduate student: | 江宜勳 Chiang, I-Hsun |
---|---|
Thesis title: | 利用剖析樹結構探討論壇評論之特徵與意見詞配對關係 Using Parse Tree Structures for Mining Matching Relationships between Features and Opinion Words from Forum Reviews |
Advisor: | 侯文娟 Hou, Wen-Juan |
Degree: | Master |
Department: | Department of Computer Science and Information Engineering |
Year of publication: | 2017 |
Academic year of graduation: | 105 |
Language: | Chinese |
Number of pages: | 99 |
Chinese keywords: | 意見探勘、剖析樹結構、論壇評論、PVC人形模型 |
English keywords: | opinion mining, parse tree structure, forum reviews, PVC figure model |
DOI URL: | https://doi.org/10.6345/NTNU202202823 |
Thesis type: | Academic thesis |
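With the rapid growth of the Internet, consumers increasingly shop online. Without seeing the physical item, however, buyers are often misled by the manufacturer's flattering product photos and descriptions, because official material is written with a sales motive and does not spell out a product's real strengths and weaknesses. Reviews written by other users therefore carry considerable reference value, which is the main reason this study analyzes reviews as a basis for product recommendation.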
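This study collects reviews of the products from the 巴哈姆特 (Bahamūt) forum and parses each one with the Academia Sinica CKIP parser. Words tagged Head Na (the Na series) are extracted as feature words, and words tagged VH or A are extracted as opinion words. Because online reviews are mostly written in informal Chinese, a sentence is admitted to the corpus as long as it contains at least one feature word or one opinion word. The feature-word database is built by voting. The opinion-word database is built by matching against the National Taiwan University Sentiment Dictionary (NTUSD) and is supplemented by a "like attracts like" grouping method (物以類聚法), the Ministry of Education's revised dictionary, and manual labeling. The resulting databases support clustering and score assignment. Following the core idea of Aspect Based Sentiment Analysis (ABSA), feature words and opinion words are then paired using the parse tree. For each product, the system outputs the features and opinion words found in each review, the sentiment score of each opinion word, the feature-opinion pairs, and an overall product score, so that users receive the key information in the reviews.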
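The extraction and pairing steps described above can be sketched in Python. This is a minimal illustration, not the thesis implementation: it assumes a simplified bracketed parse whose leaves look like `role:POS:word` (real CKIP output may differ in detail), and it pairs each opinion word with the nearest feature word by leaf position as a stand-in for the thesis's parse-tree pairing rules. The function names `extract_words`, `keep_sentence`, and `pair_nearest` are hypothetical.

```python
import re

# Assumed leaf format "role:POS:word" inside a bracketed parse string.
# This is a simplification; real CKIP parser output may differ in detail.
LEAF = re.compile(r"([A-Za-z]+):([A-Za-z0-9]+):([^|()]+)")


def extract_words(parse):
    """Return (features, opinions), each a list of (leaf_index, word)."""
    features, opinions = [], []
    for index, (role, pos, word) in enumerate(LEAF.findall(parse)):
        if role == "Head" and pos.startswith("Na"):
            features.append((index, word))   # feature word: Head + Na series
        elif pos.startswith("VH") or pos == "A":
            opinions.append((index, word))   # opinion word: VH series or A
    return features, opinions


def keep_sentence(parse):
    """Admit a sentence if it has at least one feature or opinion word."""
    features, opinions = extract_words(parse)
    return bool(features or opinions)


def pair_nearest(parse):
    """Pair each opinion word with the closest feature word by leaf index.

    Leaf distance is an assumed proxy for the thesis's tree-based pairing.
    """
    features, opinions = extract_words(parse)
    pairs = []
    for o_index, opinion in opinions:
        if not features:
            break
        _, feature = min(features, key=lambda f: abs(f[0] - o_index))
        pairs.append((feature, opinion))
    return pairs


# Hypothetical parse of "塗裝很漂亮" ("the paint job is beautiful").
example = "S(theme:NP(Head:Nab:塗裝)|degree:Dfa:很|Head:VH11:漂亮)"
print(pair_nearest(example))
```

A pairing rule that walks the tree (for example, preferring a feature word under the same clause node) would track the thesis's description more closely; leaf distance is used here only to keep the sketch short.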
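In the experiments, feature-word clustering reaches 81.8% accuracy, opinion-word clustering reaches 87.71% accuracy, and feature-opinion pairing reaches 87.13% accuracy. Compared with Amazon Japan on whether a product is recommended, the system agrees at 90% by star rating and at 70% by IDF value.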
With the development of the Internet, people's consumption habits have shifted toward online shopping. However, without seeing the real item, shoppers are often deceived by the "beautiful" pictures and words that a manufacturer provides. Because the manufacturer's marketing purpose makes it hard to tell which claimed advantages are true, we analyze the comments that netizens write in forums. This is why the study explores forum comments.
In this thesis, we retrieve comments from the "Bahamūt Forum" and parse them with the CKIP (Chinese Knowledge Information Processing) parser. We extract words tagged "Head Na" as feature words and words tagged "VH" or "A" as opinion words. Forum comments are usually informal, so sentences may be incomplete. The system therefore extracts any sentence that contains at least one feature word or opinion word. The Feature_Words_Database is built with a majority-vote strategy. The Opinion_Words_Database is built from NTUSD, the distance from Positive_Words to Negative_Words, and the dictionary revised by the Ministry of Education. These databases are used for classification and scoring tasks. Based on the concept of ABSA (Aspect Based Sentiment Analysis), each feature word is paired with an opinion word. The output includes the feature words, the opinion words, the product score, and the feature-opinion pairs, which are offered to users for reference.
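As a rough illustration of how a majority-vote database and lexicon-based scoring might fit together, the sketch below is written under stated assumptions rather than taken from the thesis. The category labels (`paint`, `sculpt`), the numeric opinion scores, and the rule that a label must win more than half of the votes are all hypothetical; the abstract does not specify who votes, on what, or how scores are combined.

```python
from collections import Counter, defaultdict


def majority_vote(votes):
    """Return the label chosen by more than half of the voters, else None."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count * 2 > len(votes) else None


def build_feature_db(ballots):
    """Map each candidate word to its majority category; drop undecided words."""
    db = {}
    for word, votes in ballots.items():
        winner = majority_vote(votes)
        if winner is not None:
            db[word] = winner
    return db


def score_product(pairs, feature_db, opinion_scores):
    """Average opinion scores per feature category and over all pairs."""
    by_category = defaultdict(list)
    for feature, opinion in pairs:
        if feature in feature_db and opinion in opinion_scores:
            by_category[feature_db[feature]].append(opinion_scores[opinion])
    per_category = {c: sum(s) / len(s) for c, s in by_category.items()}
    all_scores = [s for scores in by_category.values() for s in scores]
    overall = sum(all_scores) / len(all_scores) if all_scores else 0.0
    return per_category, overall


# Hypothetical ballots: three annotators label each candidate feature word.
ballots = {
    "塗裝": ["paint", "paint", "sculpt"],
    "臉": ["sculpt", "sculpt", "sculpt"],
    "盒子": ["package", "paint", "sculpt"],  # no majority, so it is dropped
}
feature_db = build_feature_db(ballots)

# Hypothetical opinion scores (+1 positive, -1 negative).
opinion_scores = {"漂亮": 1.0, "粗糙": -1.0}
pairs = [("塗裝", "漂亮"), ("臉", "漂亮"), ("臉", "粗糙")]
per_category, overall = score_product(pairs, feature_db, opinion_scores)
print(feature_db, per_category, overall)
```

Requiring a strict majority keeps ambiguous words out of the database instead of assigning them arbitrarily; a plurality rule would be an equally plausible reading of "majority vote".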
The experiments show a precision of 81.8% for feature-word classification and 87.71% for opinion-word classification. The precision of pair matching is 87.13%. Finally, the similarity between the system and Amazon Japan is 90% in star ratings and 70% in IDF values.