| Field | Content |
|---|---|
| Graduate Student | 陳冠穎 Chen, Guan-Ying |
| Thesis Title | 深度視覺語義嵌入模型於生成式多標籤零樣本學習 (Deep Visual-Semantic Embedding Model for Generative Multi-Label Zero-Shot Learning) |
| Advisor | 葉梅珍 Yeh, Mei-Chen |
| Committee Members | 葉梅珍 Yeh, Mei-Chen; 陳祝嵩 Chen, Chu-Song; 彭彥璁 |
| Oral Defense Date | 2021/07/30 |
| Degree | Master |
| Department | Department of Computer Science and Information Engineering |
| Publication Year | 2021 |
| Academic Year of Graduation | 109 |
| Language | Chinese |
| Pages | 37 |
| Keywords (Chinese) | 多標籤、零樣本學習、視覺語義嵌入模型、生成對抗網路 |
| Keywords (English) | Multi-Label, Zero-Shot Learning, visual-semantic embedding model, GAN, generative adversarial network |
| Research Method | Experimental design |
| DOI URL | http://doi.org/10.6345/NTNU202101371 |
| Thesis Type | Academic thesis |
Zero-shot learning refers to the setting in which a classifier can recognize not only objects seen during the training stage but also objects it has never seen before. In multi-label zero-shot learning, each instance may contain more than one object, which makes the recognition task even more difficult.
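Stated slightly more formally (the notation here is introduced only for illustration and is not taken from the thesis): let $\mathcal{Y}_s$ denote the seen labels available during training and $\mathcal{Y}_u$ the unseen labels, with

$$\mathcal{Y}_s \cap \mathcal{Y}_u = \emptyset .$$

A zero-shot classifier must assign to a test image $x$ a predicted label set $\hat{Y}(x) \subseteq \mathcal{Y}_s \cup \mathcal{Y}_u$; in the multi-label case $|\hat{Y}(x)|$ may be greater than one, so the model has to score every candidate label rather than choose a single best class.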
Previous methods typically project the attribute embeddings of labels and the visual features extracted from images into a common space, and then search for the labels closest to the image features; alternatively, they construct relations among labels from a knowledge graph or knowledge base and use these relations to assist label recognition. However, when a dataset lacks attribute embeddings, the word embeddings commonly used as a substitute are not as discriminative as attribute embeddings, and relation-based methods tend to place too much trust in the knowledge base, imposing relations on the labels while ignoring the information contained in the images themselves. In recent years, with the rise of generative adversarial networks (GANs), it has become more efficient and more accurate to handle unseen classes by first learning, from the seen classes, how image features are expressed and how they correspond to attributes, and then generating image features from the attribute labels.

Based on this observation, we propose a deep learning model that combines a generative adversarial network with semantic (word) embeddings: it generates image features from word embeddings, and it transforms image features into classifiers that are mapped into the word embedding space to find the labels belonging to the image. By mapping image features and word embeddings to each other, the model predicts unseen classes more accurately, and by exploiting the relation between image features and classifiers, it converts the multi-label task into a single-label task.