Author: | 張瀞云 Chang, Ching-Yun |
---|---|
Thesis title: | Twitter使用者之立場偵測:基於目標集子集的分而治技術應用於深度學習方法 Detecting Stance in Tweets: Deep Learning with a Divide-and-Conquer Scheme based on Subsets of a Target Set |
Advisor: | 侯文娟 Hou, Wen-Juan |
Degree: | Master |
Department: | 資訊工程學系 Department of Computer Science and Information Engineering |
Year of publication: | 2018 |
Academic year of graduation: | 106 (ROC calendar, 2017-2018) |
Language: | Chinese |
Number of pages: | 84 |
Keywords (Chinese): | Twitter分析、立場偵測、類神經網路、深度學習 |
Keywords (English): | Twitter Analysis, Detecting Stance, Neural Network, Deep Learning |
DOI URL: | http://doi.org/10.6345/THE.NTNU.DCSIE.020.2018.B02 |
Thesis type: | Academic thesis |
The concept of “stance” is vague. A text may contain positive or negative emotion words, or an affirmative or negative tone, but none of these features is directly tied to stance. People can oppose a specific target by supporting an object (simile), and they can also oppose a specific target by speaking ironically about an object (metaphor). In this study, tweets written by Twitter users and labeled with stance tags serve as the training data, and a deep neural network is trained on them in a supervised manner.
This thesis proposes a new training scheme. The training data is divided by topic (target) into five subsets, which are treated as the elements of a topic set (target set); the subsets of this target set are then used as training data for the models. In other words, models are trained on combinations of different targets, which we call “Combination Learning”. After Combination Learning has been completed for all the subsets, the best-performing model is selected for each target, and the results are then integrated. This procedure is called “Divide-and-Conquer”.
On Subtask A of SemEval 2016 Task 6, the study uses this supervised framework to detect the stance of Twitter users. The experimental F1-score is 70.24%, higher than that of every team that participated in the task.
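The Combination Learning and Divide-and-Conquer steps described in the abstract can be sketched roughly as follows. This is an illustrative outline only, not the thesis's implementation: the target labels `T1` to `T5`, the function names, and the toy `toy_train` and `toy_score` stand-ins are all hypothetical, and the sketch assumes that the empty subset is skipped and that every trained model is scored on every target with some held-out score.

```python
from itertools import combinations

# Hypothetical placeholder labels for the five targets (not from the thesis).
TARGETS = ["T1", "T2", "T3", "T4", "T5"]


def nonempty_subsets(targets):
    """Yield every non-empty subset of the target set as a tuple."""
    for size in range(1, len(targets) + 1):
        for subset in combinations(targets, size):
            yield subset


def combination_learning(targets, train_fn, score_fn):
    """Train one model per non-empty subset and score it on every target.

    train_fn(subset) -> model and score_fn(model, target) -> float are
    supplied by the caller; here they are assumptions standing in for a
    real neural network and a real validation metric.
    """
    results = {}
    for subset in nonempty_subsets(targets):
        model = train_fn(subset)
        results[subset] = {t: score_fn(model, t) for t in targets}
    return results


def divide_and_conquer(results, targets):
    """For each target, pick the training subset whose model scored best."""
    return {t: max(results, key=lambda s: results[s][t]) for t in targets}


def toy_train(subset):
    # Stand-in "model": it only remembers which targets it was trained on.
    return frozenset(subset)


def toy_score(model, target):
    # Arbitrary toy score: zero if the target was unseen, with a small
    # bonus when the model was trained on exactly two targets.
    if target not in model:
        return 0.0
    return 0.5 + (0.1 if len(model) == 2 else 0.0)


if __name__ == "__main__":
    results = combination_learning(TARGETS, toy_train, toy_score)
    best = divide_and_conquer(results, TARGETS)
    print(len(results), "models trained")
    for target, subset in best.items():
        print(target, "<- model trained on", subset)
```

With five targets there are 2^5 - 1 = 31 non-empty subsets, so a scheme of this shape trains 31 models before selecting one per target. In a real setting the final predictions for each target would come from its selected model, and the per-target results would then be combined for evaluation.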
Bontcheva, K., Derczynski, L., Funk, A., Greenwood, M., Maynard, D., & Aswani, N. (2013). TwitIE: An open-source information extraction pipeline for microblog text. In Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013 (pp. 83-90).
Cunningham, H., Tablan, V., Roberts, A., & Bontcheva, K. (2013). Getting more out of biomedical documents with GATE's full lifecycle open source text analytics. PLoS computational biology, 9(2), e1002854.
Godin, F., Vandersmissen, B., De Neve, W., & Van de Walle, R. (2015). Multimedia Lab @ ACL WNUT NER Shared Task: Named Entity Recognition for Twitter Microposts using Distributed Word Representations. In Proceedings of the Workshop on Noisy User-generated Text (pp. 146-153).
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8), 1735-1780.
Igarashi, Y., Komatsu, H., Kobayashi, S., Okazaki, N., & Inui, K. (2016). Tohoku at SemEval-2016 task 6: feature-based model versus convolutional neural network for stance detection. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016) (pp. 401-407).
Johnson, R., & Zhang, T. (2014). Effective use of word order for text categorization with convolutional neural networks. arXiv preprint arXiv:1412.1058.
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
Mohammad, S., Kiritchenko, S., Sobhani, P., Zhu, X., & Cherry, C. (2016). SemEval-2016 task 6: Detecting stance in tweets. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016) (pp. 31-41).
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), 1929-1958.
Vijayaraghavan, P., Sysoev, I., Vosoughi, S., & Roy, D. (2016). Deepstance at SemEval-2016 task 6: detecting stance in tweets using character and word-level CNNs. arXiv preprint arXiv:1606.05694.
Wei, W., Zhang, X., Liu, X., Chen, W., & Wang, T. (2016). pkudblab at SemEval-2016 task 6: A specific convolutional neural network system for effective stance detection. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016) (pp. 384-388).
Yang, J., & Leskovec, J. (2011, February). Patterns of temporal variation in online media. In Proceedings of the fourth ACM international conference on Web search and data mining (pp. 177-186). ACM.
Zarrella, G., & Marsh, A. (2016). MITRE at SemEval-2016 task 6: Transfer learning for stance detection. arXiv preprint arXiv:1606.03784.