
Graduate Student: 童彧彣
Tung, Yu-Wen
Thesis Title: 利用AI生成圖像進行少樣本分類之研究
A Study on Using AI-Generated Images for Few-Shot Classification
Advisor: 葉梅珍
Yeh, Mei-Chen
Oral Examination Committee: 葉梅珍
Yeh, Mei-Chen
方瓊瑤
Fang, Chiung-Yao
吳志強
Wu, Jhih-Ciang
Defense Date: 2024/07/30
Degree: Master
Department: Department of Computer Science and Information Engineering
Publication Year: 2024
Graduation Academic Year: 112
Language: Chinese
Number of Pages: 38
Chinese Keywords: 少樣本分類、圖像生成、特徵轉換
English Keywords: Few-Shot Classification, Image Generation, Feature Mapping
DOI URL: http://doi.org/10.6345/NTNU202401492
Thesis Type: Academic thesis
    This study investigates the use of AI-generated images for few-shot classification, where the task is to increase the diversity of samples in the dataset so as to improve the model's classification ability. Existing data-augmentation methods, such as image rotation, scaling, and generating new samples with generative adversarial networks, produce images from the few existing samples, so the augmented data remain insufficiently diverse. This study therefore uses a generative AI model (DALL-E) to produce varied images, which effectively increases the diversity of the dataset.
    However, we find that directly adding generated images to the training set of real images lowers the model's accuracy, because a gap exists between the feature spaces of generated and real images. We therefore propose a feature mapper that maps generated-image features into the real-image feature space, shortening the distance between the two. Experimental results show that mapping generated images into the real-image feature space enriches the sample distribution and thereby improves the model's classification ability.
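    As an illustration of the generation step described above, the following sketch shows how additional class images might be requested from a text-to-image service. The thesis does not state which DALL-E implementation, prompt template, or class names were used, so the OpenAI Images API call, the prompt wording, and the example class below are assumptions for illustration only, not the author's pipeline.

# Hypothetical sketch: request extra training images per class from a hosted
# text-to-image model. The model name, prompt template, and class name are
# illustrative assumptions; they are not taken from the thesis.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_class_images(class_name: str, n: int = 4) -> list[str]:
    """Return URLs of n generated images for one class."""
    response = client.images.generate(
        model="dall-e-2",
        prompt=f"a photo of a {class_name}",  # assumed CLIP-style prompt template
        n=n,
        size="256x256",
    )
    return [image.url for image in response.data]

# Example: augment one few-shot class (the class name is only an example).
urls = generate_class_images("Abyssinian cat", n=4)
print(urls)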

    This study applies AI-generated images to few-shot classification, aiming to improve the model's classification performance by increasing the diversity of samples in the dataset. Existing data-augmentation methods, such as rotating and resizing images or generating new samples with Generative Adversarial Networks (GANs), derive images from the few known samples, so the augmented data may still lack variety. This work instead generates a wide range of images with a generative AI model (DALL-E), effectively increasing the diversity of the dataset.
    However, we observe that adding generated images directly to the training set of real images reduces the model's accuracy, because a gap exists between the feature spaces of generated and real images. To narrow this gap, we propose a feature encoder that maps the features of generated images into the feature space of real images. Experiments show that mapping generated images into the real-image feature space enriches the sample distribution and thereby improves the model's classification performance.
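    To make the feature-mapping idea concrete, here is a minimal, hypothetical PyTorch sketch (not the thesis's actual code): a small MLP maps CLIP-style features of generated images toward class prototypes computed from the real few-shot samples, using a simple cosine-alignment objective. The thesis trains its encoder with Circle loss instead (see Section 3.5.1 in the outline below), so the loss here is a stand-in chosen only to keep the sketch short.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureMapper(nn.Module):
    """Hypothetical MLP that maps generated-image features into the
    real-image feature space (a sketch, not the thesis's architecture)."""
    def __init__(self, dim: int = 512, hidden: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # L2-normalize so outputs live on the same unit sphere as CLIP features.
        return F.normalize(self.net(x), dim=-1)

def alignment_loss(mapped_gen: torch.Tensor, real_proto: torch.Tensor) -> torch.Tensor:
    """Pull each mapped generated feature toward the prototype (mean feature)
    of the real few-shot samples of its class."""
    return (1.0 - F.cosine_similarity(mapped_gen, real_proto, dim=-1)).mean()

# Toy usage with random vectors standing in for 512-dimensional CLIP embeddings.
mapper = FeatureMapper(dim=512)
optimizer = torch.optim.Adam(mapper.parameters(), lr=1e-3)
gen_feats = F.normalize(torch.randn(32, 512), dim=-1)    # generated-image features
real_protos = F.normalize(torch.randn(32, 512), dim=-1)  # matching class prototypes
for _ in range(100):
    optimizer.zero_grad()
    loss = alignment_loss(mapper(gen_feats), real_protos)
    loss.backward()
    optimizer.step()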

    Chapter 1  Introduction
      1.1 Research Background
      1.2 Research Motivation
      1.3 Thesis Organization
    Chapter 2  Related Work
      2.1 Meta-learning
      2.2 Few-shot Learning
      2.3 Transfer Learning
      2.4 Pre-trained Backbone Models
      2.5 Image Generation Models
    Chapter 3  Method
      3.1 Model Architecture
      3.2 DALL-E Generation
      3.3 Enhanced CLIP Classifier
      3.4 Cache Model
      3.5 Loss Functions
        3.5.1 Encoder Training: Circle Loss
        3.5.2 Cache Model Fine-tuning: Cross-Entropy
      3.6 Inference Stage
    Chapter 4  Experiments
      4.1 Setup
        4.1.1 Model Parameter Settings
        4.1.2 Dataset Settings
      4.2 Experimental Results
        4.2.1 Comparison with Tip-Adapter
      4.3 Ablation Studies
        4.3.1 Gap between Generated and Real Data
        4.3.2 Effect of Filtering Generated Images
        4.3.3 Accuracy Comparison of Module Combinations
        4.3.4 Effect of the Encoder
        4.3.5 Advantages of Circle Loss
        4.3.6 Comparison of Pre-trained DALL-E Models
        4.3.7 Classification Performance: Case Study
    Chapter 5  Conclusion
    References
    Appendix
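    Section 3.5.1 in the outline above trains the encoder with Circle loss [11]. For reference, the unified form of Circle loss as published by Sun et al. is reproduced below; the scale and margin values used in the thesis are not restated here.

\[
\mathcal{L}_{\text{circle}}
  = \log\!\Bigl[\,1
    + \sum_{j=1}^{L} \exp\bigl(\gamma\,\alpha_n^{j}\,(s_n^{j}-\Delta_n)\bigr)
      \sum_{i=1}^{K} \exp\bigl(-\gamma\,\alpha_p^{i}\,(s_p^{i}-\Delta_p)\bigr)\Bigr],
\]
where \(s_p^{i}\) (\(i=1,\dots,K\)) are within-class similarity scores, \(s_n^{j}\) (\(j=1,\dots,L\)) are between-class similarity scores, \(\gamma\) is a scale factor, and, with margin \(m\),
\[
\alpha_p^{i} = \bigl[\,1+m-s_p^{i}\,\bigr]_{+}, \qquad
\alpha_n^{j} = \bigl[\,s_n^{j}+m\,\bigr]_{+}, \qquad
\Delta_p = 1-m, \qquad \Delta_n = m .
\]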

    [1] Ramesh, A., Pavlov, M., Goh, G., Gray, S., Voss, C., Radford, A., ... & Sutskever, I. (2021, July). Zero-shot text-to-image generation. In International conference on machine learning (pp. 8821-8831). PMLR.
    [2] Fei-Fei, L., Fergus, R., & Perona, P. (2004). Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories. In 2004 conference on computer vision and pattern recognition workshop (pp. 178-178). IEEE.
    [3] Zhang, R., et al. (2022). Tip-Adapter: Training-free adaption of CLIP for few-shot classification. In European conference on computer vision. Cham: Springer Nature Switzerland.
    [4] Parkhi, O. M., Vedaldi, A., Zisserman, A., & Jawahar, C. V. (2012). Cats and dogs. In 2012 IEEE conference on computer vision and pattern recognition (pp. 3498-3505). IEEE.
    [5] Nilsback, M. E., & Zisserman, A. (2008). Automated flower classification over a large number of classes. In 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing (pp. 722-729). IEEE.
    [6] Bossard, L., Guillaumin, M., & Van Gool, L. (2014). Food-101 – Mining discriminative components with random forests. In European conference on computer vision (pp. 446-461). Springer.
    [7] Wang, Y., Yao, Q., Kwok, J. T., & Ni, L. M. (2020). Generalizing from a few examples: A survey on few-shot learning. ACM computing surveys (csur), 53(3), 1-34.
    [8] Hariharan, B., & Girshick, R. (2017). Low-shot visual recognition by shrinking and hallucinating features. In Proceedings of the IEEE international conference on computer vision (pp. 3018-3027).
    [9] Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., ... & Sutskever, I. (2021, July). Learning transferable visual models from natural language supervision. In International conference on machine learning (pp. 8748-8763). PMLR.
    [10] Zhou, Y., Li, C., Chen, C., Gao, J., & Xu, J. (2022). Lafite2: Few-shot text-to-image generation. arXiv preprint arXiv:2210.14124.
    [11] Sun, Y., Cheng, C., Zhang, Y., Zhang, C., Zheng, L., Wang, Z., & Wei, Y. (2020). Circle loss: A unified perspective of pair similarity optimization. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 6398-6407).
    [12] Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009, June). ImageNet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition (pp. 248-255). IEEE.
    [13] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., ... & Houlsby, N. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.
    [14] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... & Bengio, Y. (2014). Generative adversarial nets. Advances in neural information processing systems, 27.
    [15] Zhuang, F., Qi, Z., Duan, K., Xi, D., Zhu, Y., Zhu, H., ... & He, Q. (2020). A comprehensive survey on transfer learning. Proceedings of the IEEE, 109(1), 43-76.
    [16] Benaim, S., & Wolf, L. (2018). One-shot unsupervised cross domain translation. Advances in neural information processing systems, 31.
    [17] Shyam, P., Gupta, S., & Dukkipati, A. (2017, July). Attentive recurrent comparators. In International conference on machine learning (pp. 3173-3181). PMLR.
    [18] Lake, B. M., Salakhutdinov, R., & Tenenbaum, J. B. (2015). Human-level concept learning through probabilistic program induction. Science, 350(6266), 1332-1338.
    [19] Kozerawski, J., & Turk, M. (2018). Clear: Cumulative learning for one-shot one-class image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 3446-3455).
    [20] Vinyals, O., Blundell, C., Lillicrap, T., & Wierstra, D. (2016). Matching networks for one shot learning. Advances in neural information processing systems, 29.
    [21] Motiian, S., Jones, Q., Iranmanesh, S., & Doretto, G. (2017). Few-shot adversarial domain adaptation. Advances in neural information processing systems, 30.
    [22] Yan, L., Zheng, Y., & Cao, J. (2018). Few-shot learning for short text classification. Multimedia Tools and Applications, 77, 29799-29810.
    [23] Koch, G., Zemel, R., & Salakhutdinov, R. (2015, July). Siamese neural networks for one-shot image recognition. In ICML deep learning workshop (Vol. 2, No. 1).
    [24] Keshari, R., Vatsa, M., Singh, R., & Noore, A. (2018). Learning structure and strength of CNN filters for small sample size training. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 9349-9358).
    [25] Hoffman, J., Tzeng, E., Donahue, J., Jia, Y., Saenko, K., & Darrell, T. (2013). One-shot adaptation of supervised deep convolutional models. arXiv preprint arXiv:1312.6204.
    [26] Finn, C., Abbeel, P., & Levine, S. (2017, July). Model-agnostic meta-learning for fast adaptation of deep networks. In International conference on machine learning (pp. 1126-1135). PMLR.
    [27] Dhillon, G. S., Chaudhari, P., Ravichandran, A., & Soatto, S. (2019). A baseline for few-shot image classification. arXiv preprint arXiv:1909.02729.
    [28] Chen, W. Y., Liu, Y. C., Kira, Z., Wang, Y. C. F., & Huang, J. B. (2019). A closer look at few-shot classification. arXiv preprint arXiv:1904.04232.
    [29] Zhu, C., Chen, F., Ahmed, U., Shen, Z., & Savvides, M. (2021). Semantic relation reasoning for shot-stable few-shot object detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 8782-8791).
    [30] Xian, Y., Akata, Z., Sharma, G., Nguyen, Q., Hein, M., & Schiele, B. (2016). Latent embeddings for zero-shot classification. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 69-77).
    [31] Kingma, D. P., & Welling, M. (2019). An introduction to variational autoencoders. Foundations and Trends® in Machine Learning, 12(4), 307-392.
    [32] Zhang, C., Bengio, S., Hardt, M., Recht, B., & Vinyals, O. (2021). Understanding deep learning (still) requires rethinking generalization. Communications of the ACM, 64(3), 107-115.
    [33] Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research, 15(1), 1929-1958.
    [34] Wah, C., Branson, S., Welinder, P., Perona, P., & Belongie, S. (2011). The Caltech-UCSD Birds-200-2011 dataset.
    [35] Vinyals, O., Blundell, C., Lillicrap, T., & Wierstra, D. (2016). Matching networks for one shot learning. Advances in neural information processing systems, 29.

    Electronic full text embargoed until 2026/08/09.