| Graduate Student | 李奕男 Lee, Yi-Nan |
|---|---|
| Thesis Title | 從多標籤圖像學習之深層視覺語意轉換模型 (Deep Visual Semantic Transform Model Learning from Multi-Label Images) |
| Advisor | 葉梅珍 Yeh, Mei-Chen |
| Degree | Master |
| Department | Department of Computer Science and Information Engineering |
| Publication Year | 2017 |
| Graduation Academic Year | 105 |
| Language | Chinese |
| Pages | 42 |
| Keywords | Convolutional neural network, visual semantic embedding model, multi-label image, multi-label classification problem |
| DOI | https://doi.org/10.6345/NTNU202202567 |
| Thesis Type | Academic thesis |
In machine learning and computer vision, learning the relation between images and textual semantics has long been an important topic. This thesis studies the problem of image–text relevance. First, words carry semantic relations to one another: for example, "sky" and "cloud" are semantically close, while "sky" and "car" are almost unrelated. But can the semantic relation between words change depending on the image? For example, in an image containing both sky and a car, the words "sky" and "car," originally almost unrelated in meaning, become related because of that image. We therefore argue that the degree of semantic relation between words changes with different images. We propose a Convolutional Neural Network model that links an image to the semantics of its multiple text labels. The model takes an image as input; its key difference from existing visual semantic embedding models is that its output is a linear transformation function. Each input image is mapped to a function that measures the relevance of each word to that image, and thereby predicts likely labels for the image.
Learning the relation between images and text semantics has been an important problem in the fields of machine learning and computer vision. This paper addresses this problem. We observe that there are semantic relations between words: for example, "sky" and "cloud" have a close semantic relation, while "sky" and "car" have a weak one. We suppose the semantic relation between words can differ depending on the image. For example, consider an image containing both sky and a car. The words "sky" and "car" are initially semantically irrelevant, but may become connected because the image contains both concepts. Therefore, we propose a Convolutional Neural Network based model to link the semantic relation between an image and its text labels. The main difference between our work and existing visual semantic embedding models is that the output of our model is a linear transformation function. In other words, each input image is treated as a function that determines the relation between each word and the image, and thereby predicts possible labels for the image. Finally, the model is validated on the NUS-WIDE dataset, and the experimental results show that it performs well at predicting labels for images.
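The core idea of the abstract — each image is mapped not to an embedding point but to a linear scoring function over word embeddings — can be sketched as follows. This is a minimal illustration, not the thesis's actual architecture: the dimensions, the single-matrix parameterization `A`, and the random stand-in features and word vectors are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

D_IMG, D_WORD = 4096, 300  # assumed: CNN feature dim, word-embedding dim

# Hypothetical learned parameter: maps a CNN image feature to an
# image-specific linear scoring function over the word-embedding space.
A = rng.standard_normal((D_IMG, D_WORD)) * 0.01

def image_to_transform(img_feat):
    """Map an image feature to a linear function scoring word relevance."""
    w = img_feat @ A  # (D_WORD,): the image-specific transform
    return lambda word_vec: float(w @ word_vec)  # relevance score

# Usage: rank candidate labels for one image by their scores.
img_feat = rng.standard_normal(D_IMG)  # stand-in for a real CNN feature
vocab = {
    "sky": rng.standard_normal(D_WORD),    # stand-ins for word2vec vectors
    "cloud": rng.standard_normal(D_WORD),
    "car": rng.standard_normal(D_WORD),
}
score = image_to_transform(img_feat)
ranked = sorted(vocab, key=lambda t: score(vocab[t]), reverse=True)
print(ranked)  # candidate labels ordered by predicted relevance
```

Because the scoring function is produced per image, the same pair of words (e.g. "sky" and "car") can receive very different joint relevance for different images, which is the behavior the abstract motivates.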