
Graduate student: 陳冠穎
Chen, Guan-Ying
Thesis title: 深度視覺語義嵌入模型於生成式多標籤零樣本學習
Deep Visual-Semantic Embedding Model for Generative Multi-Label Zero-Shot Learning
Advisor: 葉梅珍
Yeh, Mei-Chen
Oral defense committee: 葉梅珍
Yeh, Mei-Chen
陳祝嵩
Chen, Chu-Song
彭彥璁
Oral defense date: 2021/07/30
Degree: Master
Department: Department of Computer Science and Information Engineering
Year of publication: 2021
Academic year of graduation: 109
Language: Chinese
Number of pages: 37
Chinese keywords: 多標籤零樣本學習, 視覺語義嵌入模型, 生成對抗網路
English keywords: Multi-Label, Zero-Shot Learning, visual semantic embedding model, GAN, generative adversarial network
Research method: Experimental design
DOI URL: http://doi.org/10.6345/NTNU202101371
Thesis type: Academic thesis
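  • Zero-shot learning refers to a classifier that can recognize not only objects seen during training but also objects it has never seen. In multi-label zero-shot learning, each instance may contain more than one object, which makes the recognition task more difficult.
      Previous methods typically project the attribute embeddings of the labels and the visual features extracted from images into a common space and search for the labels closest to the image features, or they use knowledge graphs and knowledge bases to model the relationships among labels and rely on those relationships to help recognize labels. However, when a dataset lacks attribute embeddings, the semantic embeddings (word embeddings) commonly used as a substitute are less discriminative than attribute embeddings, and relation-building methods tend to trust the knowledge base too much, imposing its relationships while ignoring the information contained in the image itself. With the recent rise of generative adversarial networks (GANs), handling unseen classes by first learning the representation of image features and their corresponding attributes from seen classes, and then generating image features from attribute labels, has become more efficient and more accurate. Based on this observation, we propose a deep learning model that combines a GAN with semantic embeddings: it generates image features from semantic embeddings, and it converts image features into classifiers that are mapped into the semantic embedding space to find the labels belonging to the image. Mapping image features and semantic embeddings onto each other improves the prediction of unseen classes, and the relationship between image features and classifiers is used to convert the multi-label task into a single-label task.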
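
    The common-space idea described in the abstract (project a visual feature into the label embedding space, then pick the nearest labels) can be illustrated with a minimal sketch. This is not the thesis's model: the function name `rank_labels`, the projection matrix, and the toy label vectors are all hypothetical values chosen for illustration, and cosine similarity is assumed as the closeness measure. A label such as "dog" can stand in for an unseen class, since only its embedding is needed at prediction time.

```python
import numpy as np

def rank_labels(visual_feature, projection, label_embeddings, label_names, top_k=2):
    """Project a visual feature into the semantic space and return the
    top_k labels ranked by cosine similarity (illustrative sketch only)."""
    # Map the visual feature into the label (semantic) embedding space.
    projected = projection @ visual_feature
    projected = projected / np.linalg.norm(projected)
    # L2-normalize each label embedding so the dot product is a cosine score.
    normed = label_embeddings / np.linalg.norm(label_embeddings, axis=1, keepdims=True)
    scores = normed @ projected
    # Sort labels from most to least similar and keep the first top_k.
    order = np.argsort(-scores)[:top_k]
    return [(label_names[i], float(scores[i])) for i in order]

# Hypothetical toy data: three labels with 3-d embeddings, one 4-d visual feature.
label_names = ["cat", "dog", "car"]
label_embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.8, 0.6, 0.0],
    [0.0, 0.0, 1.0],
])
# Assumed projection: keeps the first three visual dimensions (a learned
# matrix would be used in practice).
projection = np.eye(3, 4)
visual_feature = np.array([0.9, 0.1, 0.0, 0.5])

result = rank_labels(visual_feature, projection, label_embeddings, label_names)
print(result)  # "cat" ranks first, followed by "dog"
```

    Returning the top-k labels rather than a single best match is one simple way to produce several labels per image. The abstract describes a different strategy, converting the multi-label task into a single-label task, which this sketch does not attempt to reproduce.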

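    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Research Background
      1.2 Research Motivation
      1.3 Research Objectives
      1.4 Thesis Organization
    Chapter 2 Related Work
      2.1 Zero-Shot Learning
        2.1.1 Semantic-embedding-based method: DeViSE
        2.1.2 Semantic-autoencoder-based method: SAE
      2.2 Multi-Label Tasks
      2.3 Semantic Embedding
      2.4 AutoEncoder
        2.4.1 VAE
      2.5 Generative Adversarial Networks
        2.5.1 CGAN
        2.5.2 WGAN
        2.5.3 VAEGAN
      2.6 Relation to previous methods
    Chapter 3 Method and Procedure
      3.1 Problem Definition
      3.2 Model Architecture
        3.2.1 Discriminator
        3.2.2 Generator
        3.2.3 Multi-Label Classifier
    Chapter 4 Experimental Results
      4.1 Datasets
      4.2 Evaluation Method
      4.3 Ablation study
      4.4 Experiment 1: VOC2007 ZSL
      4.5 Experiment 2: VOC2007 GZSL
      4.6 Experiment 3: NUS-WIDE ZSL
      4.7 Experiment 4: NUS-WIDE GZSL
      4.8 Experimental Analysis
    Chapter 5 Conclusion
    References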


    Electronic full text: public release delayed until 2026/09/07