
Author: Shih, Chin-Hong (石敬弘)
Title: Neural Relevance-aware Embedding For Text Categorization (基於類神經之關聯詞向量表示於文本分類任務之研究)
Advisor: Chen, Berlin (陳柏琳)
Degree: Master
Department: Department of Computer Science and Information Engineering
Year of Publication: 2017
Graduation Academic Year: 105 (ROC calendar, i.e., AY 2016–2017)
Language: Chinese
Pages: 75
Keywords (Chinese): 文本分類、表示學習、深度學習、連體網路、生成式對抗網路
Keywords (English): Text Categorization, Representation Learning, Deep Learning, Siamese Networks, Generative Adversarial Networks
DOI URL: https://doi.org/10.6345/NTNU202202684
Document Type: Academic thesis
Usage: Views: 120; Downloads: 47
With the rapid growth of the Internet, the demand for accessing text data online increases daily, and text categorization has accordingly become a popular, application-driven research topic in natural language processing. At present, the core problem in text categorization is the choice of feature representation. Most studies adopt the bag-of-words (BoW) model as the document representation, but the BoW model cannot effectively express the relationships between words and therefore loses much of the semantics of the text.

In this thesis, we employ two novel neural network architectures, Siamese networks (Siamese Nets) and generative adversarial networks (Generative Adversarial Nets), so that during training the model learns more robust and semantically richer feature representations. Our experiments use two well-known classification benchmarks, the IMDB movie-review dataset and the 20 Newsgroups dataset; a series of sentiment-analysis and topic-classification results shows that the feature representations learned by these neural networks can effectively improve text categorization performance.
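The pair-based Siamese training described above can be made concrete with a short sketch. The following is a minimal illustration in PyTorch, assuming a mean-pooled bag-of-embeddings encoder and a Hadsell-style contrastive loss; the encoder design, layer sizes, and names (DocEncoder, contrastive_loss) are illustrative assumptions, not the thesis's actual configuration.

    # Minimal sketch (PyTorch) of pair-based Siamese training with a
    # contrastive loss. All names, sizes, and the pairing scheme are
    # illustrative assumptions, not the thesis's actual configuration.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DocEncoder(nn.Module):
        """Maps a bag of word indices to a dense document vector."""
        def __init__(self, vocab_size, embed_dim=128, doc_dim=64):
            super().__init__()
            self.embed = nn.EmbeddingBag(vocab_size, embed_dim, mode="mean")
            self.proj = nn.Linear(embed_dim, doc_dim)

        def forward(self, word_ids, offsets):
            return self.proj(self.embed(word_ids, offsets))

    def contrastive_loss(v1, v2, same_label, margin=1.0):
        # Pull same-class pairs together; push different-class pairs
        # at least `margin` apart (Hadsell-style contrastive loss).
        d = F.pairwise_distance(v1, v2)
        return torch.mean(same_label * d.pow(2)
                          + (1 - same_label) * F.relu(margin - d).pow(2))

    # "Siamese" means weight sharing: the SAME encoder embeds both documents.
    encoder = DocEncoder(vocab_size=10000)
    ids_a, off_a = torch.tensor([1, 5, 20, 3]), torch.tensor([0])
    ids_b, off_b = torch.tensor([7, 5, 9]), torch.tensor([0])
    same = torch.tensor([1.0])  # 1.0 if the two documents share a class label
    loss = contrastive_loss(encoder(ids_a, off_a), encoder(ids_b, off_b), same)
    loss.backward()

Because the loss depends only on whether a pair shares a label, a large number of training pairs can be formed from a modest labeled set, which is one appeal of this pairwise formulation.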

With rapid global access to tremendous amounts of text data on the Internet, text categorization (or classification) has emerged as an important and active research topic in the natural language processing (NLP) community, with many applications. Currently, the foremost problem in text categorization is feature representation, which is commonly based on the bag-of-words (BoW) model, in which word unigrams, bigrams (n-grams), or specifically designed patterns are typically extracted as component features. It has been noted that the loss of word order caused by BoW representations is particularly problematic for document categorization.

To leverage word order and proximity information in text categorization, we explore a novel use of Siamese nets and generative adversarial nets for document representation and text categorization. In experiments conducted on two benchmark text categorization tasks, viz. IMDB and 20 Newsgroups, we take advantage of these novel architectures to learn distributed vector representations of documents that reflect semantic relatedness and effectively improve categorization performance.
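The adversarial component can be sketched in a similarly minimal way. Below is one generator/discriminator update over document vectors, assuming the standard minimax objective of Goodfellow et al.; the architectures, dimensions, and learning rates are assumptions for illustration, not the configuration used in the thesis.

    # Minimal sketch (PyTorch) of one adversarial training step over
    # document vectors. Architectures, dimensions, and learning rates are
    # assumptions for illustration, not the thesis's configuration.
    import torch
    import torch.nn as nn

    doc_dim, noise_dim, batch = 64, 16, 32
    G = nn.Sequential(nn.Linear(noise_dim, 128), nn.ReLU(), nn.Linear(128, doc_dim))
    D = nn.Sequential(nn.Linear(doc_dim, 128), nn.ReLU(), nn.Linear(128, 1))
    bce = nn.BCEWithLogitsLoss()
    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

    real_docs = torch.randn(batch, doc_dim)  # stand-in for real document vectors

    # Discriminator step: score real vectors as 1, generated vectors as 0.
    fake = G(torch.randn(batch, noise_dim)).detach()
    d_loss = (bce(D(real_docs), torch.ones(batch, 1))
              + bce(D(fake), torch.zeros(batch, 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: try to fool the discriminator into predicting 1.
    g_loss = bce(D(G(torch.randn(batch, noise_dim))), torch.ones(batch, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

These two losses implement the standard GAN minimax game; how the adversarial signal is tied into document representation learning is the subject of Section 6.2 of the thesis.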

Table of Contents

List of Figures
List of Tables
Chapter 1  Introduction
  1.1 Research Background
  1.2 Research Content and Objectives
  1.3 Research Contributions
  1.4 Organization of the Thesis
Chapter 2  Related Work
  2.1 Overview of Text Categorization
    2.1.1 Single-label and Multi-label Text Categorization
    2.1.2 Document-pivoted and Category-pivoted Text Categorization
    2.1.3 Hard and Soft Text Categorization
  2.2 Overview of Feature Selection
    2.2.1 Information Gain
    2.2.2 Mutual Information
    2.2.3 Chi-square Test
    2.2.4 Document Frequency
    2.2.5 Gini Index
    2.2.6 Expected Cross Entropy
  2.3 Overview of Text Categorization Methods
    2.3.1 Rocchio Algorithm
    2.3.2 Naive Bayes
    2.3.3 k-Nearest Neighbors
    2.3.4 Support Vector Machines
    2.3.5 Decision Trees
  2.4 Overview of Evaluation Metrics
    2.4.1 Basic Metrics
    2.4.2 Composite Metrics
Chapter 3  Baseline Experiments and Setup
  3.1 Experimental Corpora
  3.2 Baseline Results
Chapter 4  Text Representation Learning
  4.1 Overview of Representation Learning
  4.2 Statistical Language Models
    4.2.1 N-gram Language Models
    4.2.2 Probabilistic Neural Network Language Models
  4.3 Word Embeddings
    4.3.1 Continuous Bag-of-Words Model
    4.3.2 Skip-gram Model
  4.4 Hierarchical-Softmax-based Models
  4.5 Negative-Sampling-based Models
Chapter 5  Neural-Network-based Text Categorization
  5.1 Convolutional Neural Networks
  5.2 Recurrent Neural Networks
  5.3 Long Short-Term Memory Networks
Chapter 6  Novel Neural Networks for Learning Relevance-aware Embeddings
  6.1 Siamese Networks
    6.1.1 Siamese Network Architecture
    6.1.2 Construction of Paired Input Documents
    6.1.3 Contrastive Loss Function
    6.1.4 Siamese Network Experimental Results
  6.2 Generative Adversarial Networks
    6.2.1 Generative Adversarial Network Architecture
    6.2.2 Generative Adversarial Network Experimental Results
Chapter 7  Conclusion and Future Work
Bibliography

