Graduate student: | 李方 Fang Li |
---|---|
Thesis title: | 基於深度視覺語義嵌入之零樣本學習 Deep Visual-Semantic Embedding Model for Zero-Shot Learning |
Advisor: | 葉梅珍 Yeh, Mei-Chen |
Degree: | 碩士 Master |
Department: | 資訊工程學系 Department of Computer Science and Information Engineering |
Year of publication: | 2019 |
Academic year of graduation: | 107 |
Language: | Chinese |
Number of pages: | 26 |
Keywords (Chinese): | 零樣本學習、視覺語義嵌入模型 |
Keywords (English): | zero shot learning, visual semantic embedding model |
DOI URL: | http://doi.org/10.6345/NTNU201900468 |
Thesis type: | Academic thesis |
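Zero-shot learning is the ability to recognize objects that were never seen during training. The key to acquiring this ability is to relate the seen classes to the unseen classes through the class descriptions provided by the dataset. Existing approaches directly model the relationship between the visual features of images and semantic embeddings: they learn a linear or nonlinear compatibility function and predict the class with the highest compatibility score as the label of the sample. However, these methods are often limited by the generalization ability of the semantic model and of the mapping model themselves.
The crux of the problem lies in how the classifier is constructed. First, although semantic embeddings are essential for transferring experience from seen classes to unseen classes, this does not mean that a classifier built from semantic embeddings is sufficiently discriminative. In the semantic embedding space every class prototype is fixed, whereas the visual features of samples are diverse and complex, so a classifier built from semantic embeddings may struggle to handle varied visual features. In addition, mapping visual features into the semantic space runs into the projection domain shift problem.
Based on this observation, we propose a deep learning model that converts an image into a classifier: the output of the model is a transformation function over labels that, by finding a linear or nonlinear combination of semantic embeddings, distinguishes the labels in the semantic embedding space that belong to the image. Experimental results on several public benchmark datasets validate the effectiveness of the proposed method.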
Zero-shot learning refers to the problem of recognizing objects from classes that are unseen at the training stage. The key to the problem is to associate the seen and unseen classes through their semantic information. Existing approaches learn mapping functions from visual, semantic, or joint visual-semantic features, and then recognize the class label of an instance by ranking the similarity scores of all classes in the embedding space. However, these methods are often limited by the semantic gap between the visual features and the corresponding semantic features.
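The ranking step used by existing approaches can be illustrated with a minimal sketch. It assumes a bilinear compatibility function F(x, y) = xᵀ W s_y, which is one common choice and not necessarily the form used in any particular method cited here. The names `compatibility` and `predict`, the matrix `W`, and the toy class embeddings are hypothetical and chosen only for illustration.

```python
from typing import Dict, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def compatibility(x: Vector, W: Matrix, s: Vector) -> float:
    """Bilinear compatibility score F(x, y) = x^T W s_y (assumed form)."""
    # Project the visual feature into the semantic space: p = W^T x.
    projected = [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(s))]
    # Inner product between the projected feature and the class embedding.
    return sum(p * sj for p, sj in zip(projected, s))


def predict(x: Vector, W: Matrix, class_embeddings: Dict[str, Vector]) -> str:
    """Return the class whose semantic embedding scores highest for x."""
    return max(class_embeddings, key=lambda c: compatibility(x, W, class_embeddings[c]))


# Toy data (hypothetical): a 3-d visual feature and 2-d class embeddings.
W = [[1.0, 0.0],
     [0.0, 1.0],
     [0.5, 0.5]]
unseen_classes = {"zebra": [1.0, 0.0], "whale": [0.0, 1.0]}
x = [0.9, 0.1, 0.2]
label = predict(x, W, unseen_classes)
```

In this sketch `W` would be learned on seen classes only; at test time the same `W` is reused with the embeddings of unseen classes, which is where the generalization limits discussed in the abstract come into play.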
The compatibility learning framework used in existing methods has the following drawbacks. First, although semantic embedding is critical for transferring experience from seen classes to unseen classes, this does not mean that a classifier constructed from semantic embeddings is discriminative. Each class prototype is fixed in the semantic space, but the visual features of a class are diverse and complex, so a classifier constructed from semantic embeddings may struggle to cope with varying visual features. Second, mapping from the visual space to the semantic space faces the problem of projection domain shift.
To solve these problems, we propose a deep model that transforms an image into a classifier: the output of our model is a transformation function over labels. The transformation function seeks a linear or nonlinear combination of semantic embeddings and is able to differentiate the labels in the semantic space. Experimental results on several benchmark datasets validate the effectiveness of our method.
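The idea of turning an image into a classifier over labels can be pictured with a rough sketch. The abstract does not specify the architecture, so everything below is an assumption: a single hypothetical layer `w = tanh(A x + b)` stands in for the deep model, and the generated weights `w` act as a linear classifier applied to each label's semantic embedding. The function names and toy values are illustrative only and do not reproduce the thesis's actual model.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def image_to_classifier(x: Vector, A: Matrix, b: Vector) -> List[float]:
    """Map a visual feature x to classifier weights w = tanh(A x + b).

    Hypothetical stand-in for a deep model; the real network is not
    described in the abstract.
    """
    return [
        math.tanh(sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i)
        for row, b_i in zip(A, b)
    ]


def score_labels(w: Vector, label_embeddings: Dict[str, Vector]) -> Dict[str, float]:
    """Apply the image-specific linear classifier to every label embedding."""
    return {
        name: sum(w_i * s_i for w_i, s_i in zip(w, s))
        for name, s in label_embeddings.items()
    }


# Toy data (hypothetical): a 3-d visual feature and 2-d label embeddings.
A = [[2.0, 0.0, 0.0],
     [0.0, 2.0, 0.0]]
b = [0.0, 0.0]
x = [1.0, -1.0, 0.3]
labels = {"striped": [1.0, 0.0], "aquatic": [0.0, 1.0]}

w = image_to_classifier(x, A, b)   # classifier generated for this image
scores = score_labels(w, labels)   # one score per label
best = max(scores, key=scores.get)
```

The point of the sketch is that the classifier weights differ from image to image, rather than each class prototype serving as a fixed classifier in the semantic space.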