Graduate Student: 童彧彣 Tung, Yu-Wen
Thesis Title: 利用AI生成圖像進行少樣本分類之研究 (A Study on Using AI-Generated Images for Few-Shot Classification)
Advisor: 葉梅珍 Yeh, Mei-Chen
Oral Defense Committee: 葉梅珍 Yeh, Mei-Chen; 方瓊瑤 Fang, Chiung-Yao; 吳志強 Wu, Jhih-Ciang
Oral Defense Date: 2024/07/30
Degree: Master
Department: 資訊工程學系 Department of Computer Science and Information Engineering
Year of Publication: 2024
Academic Year of Graduation: 112
Language: Chinese
Pages: 38
Chinese Keywords: 少樣本分類、圖像生成、特徵轉換
English Keywords: Few-Shot Classification, Image Generation, Feature Mapping
DOI URL: http://doi.org/10.6345/NTNU202401492
Thesis Type: Academic thesis
This study investigates the use of AI-generated images for few-shot classification, where the goal is to improve the model's classification performance by increasing the diversity of samples in the dataset. Existing data augmentation methods, such as rotating and rescaling images or synthesizing new samples with Generative Adversarial Networks (GANs), derive their outputs from the few available samples, so the augmented data can remain insufficiently diverse. This work instead uses a generative AI model (DALL-E) to produce varied images, effectively increasing the diversity of the dataset.
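The thesis does not specify here how DALL-E was queried, so the following is only a minimal sketch of per-class image generation through the OpenAI Python SDK's images endpoint. The model name, prompt template, image size, and output paths are illustrative assumptions, not the settings used in this work.

```python
# Sketch: expanding a few-shot class with DALL-E-generated images.
# Assumes `pip install openai requests` and an OPENAI_API_KEY in the
# environment; the prompt wording is a placeholder, not the thesis's
# actual template.
import pathlib
import requests
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_class_images(class_name: str, out_dir: str, n: int = 4) -> None:
    """Generate n synthetic images for one class and save them to disk."""
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    resp = client.images.generate(
        model="dall-e-2",                  # a DALL-E family model (assumed)
        prompt=f"a photo of a {class_name}",
        n=n,
        size="256x256",
    )
    for i, item in enumerate(resp.data):
        # Each returned item carries a URL to the generated image.
        img_bytes = requests.get(item.url, timeout=30).content
        (out / f"{class_name}_{i}.png").write_bytes(img_bytes)

generate_class_images("sunflower", "generated/sunflower")
```

Because each prompt describes the class rather than transforming an existing sample, the generated images are not tied to the appearance of the few available training examples, which is the source of the added diversity.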
However, we observed that directly adding generated images to the training set of real images reduces the model's accuracy, because a gap exists between the feature spaces of generated and real images. To shrink the distance between the two feature spaces, we propose a feature transformer that maps the features of generated images into the feature space of real images. Experiments show that mapping generated images into the real-image feature space broadens the sample distribution and thereby improves the model's classification performance.
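The exact architecture and training objective of the feature transformer are given in the thesis body; the sketch below only illustrates the idea, assuming a two-layer MLP trained to pull generated-image features (extracted by a frozen backbone) toward per-class prototypes computed from the few real images. The dimensions, the MSE objective, and the placeholder tensors are all assumptions for illustration.

```python
# Sketch: mapping generated-image features into the real-image feature
# space. Architecture and loss are assumed, not the thesis's exact design.
import torch
import torch.nn as nn

class FeatureTransformer(nn.Module):
    """A small MLP that re-embeds generated-image features."""
    def __init__(self, dim: int = 512, hidden: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Placeholders: gen_feats are features of generated images from a frozen
# backbone; proto holds, for each row, the mean feature of the few real
# images of the same class (its prototype).
gen_feats = torch.randn(128, 512)
proto = torch.randn(128, 512)

model = FeatureTransformer()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    mapped = model(gen_feats)
    # Pull mapped generated features toward the real-class prototypes,
    # shrinking the gap between the generated and real feature spaces.
    loss = nn.functional.mse_loss(mapped, proto)
    loss.backward()
    opt.step()
```

Once trained, the mapped features can be pooled with the real-image features so that each class's support set is effectively enlarged before the classifier is fit.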