
Author: Chang, Ching-Yun (張瀞云)
Title: Detecting Stance in Tweets: Deep Learning with a Divide-and-Conquer Scheme based on Subsets of a Target Set (original title in Chinese: Twitter使用者之立場偵測:基於目標集子集的分而治技術應用於深度學習方法)
Advisor: Hou, Wen-Juan (侯文娟)
Degree: Master
Department: Department of Computer Science and Information Engineering
Year of Publication: 2018
Graduation Academic Year: 106 (2017–2018)
Language: Chinese
Number of Pages: 84
Keywords: Twitter Analysis, Stance Detection, Neural Network, Deep Learning
DOI URL: http://doi.org/10.6345/THE.NTNU.DCSIE.020.2018.B02
Document Type: Academic thesis

Abstract:

The concept of “stance” is vague. The words people use in their writing may carry positive or negative emotion and affirmative or negative tones, but none of these features is directly tied to stance. People can oppose a specific target by supporting another object (simile), and they can also oppose a specific target by speaking ironically about an object (metaphor). In this study, a deep neural network is trained in a supervised framework on tweets written by Twitter users and labeled with stance tags.

This thesis proposes a new training scheme. The training data are divided into five subsets by topic (target), and these five subsets serve as the elements of a target set; every subset of that target set is then used as training data for a model. In other words, models are trained on combinations of different targets, which we call “Combination Learning”. Once Combination Learning has been run over all subsets, the model that performs best for each target is selected and the per-target results are integrated; this scheme is called “Divide-and-Conquer”. A minimal sketch of the procedure follows.
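
The sketch below makes the scheme concrete. It enumerates the 31 non-empty subsets of the five targets and keeps the best model per target; data_by_target, train_model, and f1_on_target are hypothetical placeholders standing in for the thesis's stance-labeled data, LSTM training, and validation scoring, and restricting each target's candidates to subsets that contain it is an assumption made for illustration.

    from itertools import combinations

    TARGETS = [
        "Atheism",
        "Climate Change is a Real Concern",
        "Feminist Movement",
        "Hillary Clinton",
        "Legalization of Abortion",
    ]

    def combination_learning(data_by_target, train_model, f1_on_target):
        """Train one model per non-empty subset of the target set, then keep,
        for each target, the model whose validation F1 on that target is best."""
        models = {}
        for k in range(1, len(TARGETS) + 1):
            for subset in combinations(TARGETS, k):
                # Merge the training tweets of every target in this subset.
                train_data = [ex for t in subset for ex in data_by_target[t]]
                models[subset] = train_model(train_data)
        best = {}
        for target in TARGETS:
            # Divide-and-conquer: among the subsets containing this target,
            # pick the model that scores best on the target's validation data.
            score, subset = max((f1_on_target(m, target), s)
                                for s, m in models.items() if target in s)
            best[target] = models[subset]
        return best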

For subtask A of SemEval 2016 Task 6, the supervised framework developed in this study was used to detect the stance of Twitter users. The resulting F1-score was 70.24%, higher than that of every team that participated in the task.
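
For context, the official score for SemEval-2016 Task 6 (Mohammad et al., 2016) is the macro-average of the F1 for the FAVOR and AGAINST classes; tweets labeled NONE enter only through false positives and false negatives. A minimal sketch of that metric:

    def semeval_stance_f1(gold, pred):
        """Macro-average of F1(FAVOR) and F1(AGAINST), the official
        SemEval-2016 Task 6 metric."""
        def f1(label):
            tp = sum(g == label and p == label for g, p in zip(gold, pred))
            fp = sum(g != label and p == label for g, p in zip(gold, pred))
            fn = sum(g == label and p != label for g, p in zip(gold, pred))
            prec = tp / (tp + fp) if tp + fp else 0.0
            rec = tp / (tp + fn) if tp + fn else 0.0
            return 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        return (f1("FAVOR") + f1("AGAINST")) / 2

    gold = ["FAVOR", "AGAINST", "NONE", "AGAINST"]
    pred = ["FAVOR", "AGAINST", "AGAINST", "NONE"]
    print(semeval_stance_f1(gold, pred))  # -> 0.75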

Table of Contents:

List of Tables
List of Figures
Chapter 1 Introduction
    Section 1 Research Background
    Section 2 Research Motivation and Objectives
    Section 3 Thesis Organization
Chapter 2 Literature Review
    Section 1 SemEval 2016 Task 6
    Section 2 Recent Methods and Results in Twitter Stance Detection
    Section 3 Neural Networks and LSTM
    Section 4 Word2Vec
Chapter 3 Research Methods and Procedures
    Section 1 Research Architecture
    Section 2 Raw Data Preprocessing (a sketch of these steps follows the outline)
        (1) Tokenize
        (2) Split Hashtag
        (3) Remove User ID
        (4) Remove URL
    Section 3 Training the Word2Vec Models
        (1) Uni-gram model
        (2) Bi-gram and tri-gram models
    Section 4 Stance Detection Model Architecture
    Section 5 n-gram Word Embedding Learning
    Section 6 Combination Learning
    Section 7 Model Validation
        (1) k-fold Cross-Validation
        (2) Validating with Training Data
        (3) Overfitting & Loss Function
Chapter 4 Data Sources and Evaluation
    Section 1 Data Sources
    Section 2 Evaluation Method
Chapter 5 Experimental Results and Discussion
    Section 1 Evaluation of n-gram Word Embedding Learning
    Section 2 Model Architecture and Parameters for Combination Learning
        (1) Simultaneous learning with 1 to 3 targets
        (2) Simultaneous learning with 4 to 5 targets
    Section 3 Model Performance Evaluation
        (1) Atheism
        (2) Climate Change is a Real Concern
        (3) Feminist Movement
        (4) Hillary Clinton
        (5) Legalization of Abortion
    Section 4 Integrated Evaluation
Chapter 6 Conclusion and Future Work
References
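
Section 2 of Chapter 3, as outlined above, names four preprocessing steps. The sketch below shows what such a pipeline might look like using plain regular expressions; the CamelCase hashtag-splitting heuristic is an assumption made for illustration, not necessarily the thesis's exact rule.

    import re

    def preprocess(tweet):
        """Illustrative tweet cleanup: drop URLs and @user mentions,
        split #CamelCase hashtags into words, then tokenize on whitespace."""
        tweet = re.sub(r"https?://\S+", "", tweet)   # Remove URL
        tweet = re.sub(r"@\w+", "", tweet)           # Remove User ID
        # Split Hashtag: break the tag body on CamelCase/digit boundaries.
        def split_hashtag(m):
            parts = re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+",
                               m.group(1))
            return " ".join(parts)
        tweet = re.sub(r"#(\w+)", split_hashtag, tweet)
        return tweet.lower().split()                 # Tokenize

    print(preprocess("#HillaryClinton will win! @user http://t.co/xyz"))
    # -> ['hillary', 'clinton', 'will', 'win!']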

References:

    Bontcheva, K., Derczynski, L., Funk, A., Greenwood, M., Maynard, D., & Aswani, N. (2013). TwitIE: An open-source information extraction pipeline for microblog text. In Proceedings of the International Conference Recent Advances in Natural Language Processing (RANLP 2013) (pp. 83-90).
    Cunningham, H., Tablan, V., Roberts, A., & Bontcheva, K. (2013). Getting more out of biomedical documents with GATE's full lifecycle open source text analytics. PLoS computational biology, 9(2), e1002854.
    Godin, F., Vandersmissen, B., De Neve, W., & Van de Walle, R. (2015). Multimedia Lab @ ACL WNUT NER Shared Task: Named Entity Recognition for Twitter Microposts using Distributed Word Representations. In Proceedings of the Workshop on Noisy User-generated Text (pp. 146-153).
    Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8), 1735-1780.
    Igarashi, Y., Komatsu, H., Kobayashi, S., Okazaki, N., & Inui, K. (2016). Tohoku at SemEval-2016 Task 6: Feature-based model versus convolutional neural network for stance detection. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016) (pp. 401-407).
    Johnson, R., & Zhang, T. (2014). Effective use of word order for text categorization with convolutional neural networks. arXiv preprint arXiv:1412.1058.
    Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
    Mohammad, S., Kiritchenko, S., Sobhani, P., Zhu, X., & Cherry, C. (2016). SemEval-2016 Task 6: Detecting stance in tweets. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016) (pp. 31-41).
    Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), 1929-1958.
    Vijayaraghavan, P., Sysoev, I., Vosoughi, S., & Roy, D. (2016). DeepStance at SemEval-2016 Task 6: Detecting stance in tweets using character and word-level CNNs. arXiv preprint arXiv:1606.05694.
    Wei, W., Zhang, X., Liu, X., Chen, W., & Wang, T. (2016). pkudblab at SemEval-2016 Task 6: A specific convolutional neural network system for effective stance detection. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016) (pp. 384-388).
    Yang, J., & Leskovec, J. (2011, February). Patterns of temporal variation in online media. In Proceedings of the Fourth ACM International Conference on Web Search and Data Mining (pp. 177-186). ACM.
    Zarrella, G., & Marsh, A. (2016). MITRE at SemEval-2016 Task 6: Transfer learning for stance detection. arXiv preprint arXiv:1606.03784.
