
National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

Detailed Record

Author: 童彧彣
Author (English): Tung, Yu-Wen
Title (Chinese): 利用AI生成圖像進行少樣本分類之研究
Title (English): A Study on Using AI-Generated Images for Few-Shot Classification
Advisor: 葉梅珍
Advisor (English): Yeh, Mei-Chen
Committee members: 葉梅珍、方瓊瑤、吳志強
Committee members (English): Yeh, Mei-Chen; Fang, Chiung-Yao; Wu, Jhih-Ciang
Oral defense date: 2024-07-30
Degree: Master's
Institution: National Taiwan Normal University (國立臺灣師範大學)
Department: Computer Science and Information Engineering (資訊工程學系)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2024
Graduation academic year: 112 (2023/24)
Language: Chinese
Pages: 38
Keywords (Chinese): 少樣本分類、圖像生成、特徵轉換
Keywords (English): Few-Shot Classification; Image Generation; Feature Mapping
Statistics:
  • Cited: 0
  • Views: 47
  • Rating:
  • Downloads: 0
  • Bookmarks: 0
Abstract (Chinese, translated):
This study investigates the use of AI-generated images for few-shot classification, where the goal is to increase the diversity of samples in the dataset and thereby improve the model's classification ability. Existing data augmentation methods, such as image rotation, scaling, and generating new samples with generative adversarial networks, create images from the few available samples, so the resulting data remains insufficiently diverse. This study therefore uses a generative AI model (DALL-E) to produce varied images, which effectively increases the diversity of the dataset.
However, we found that directly adding generated images to the real-image training set lowers the model's accuracy, because there is a gap between the feature spaces of generated and real images. We therefore propose a feature mapper that projects generated-image features into the real-image feature space to reduce the distance between the two. Experimental results show that mapping generated images into the real-image feature space enriches the sample distribution and thereby improves the model's classification ability.
Abstract (English):
The goal of this study is to improve the model's few-shot classification performance by increasing the diversity of samples in the dataset with AI-generated images. Existing data augmentation methods, such as rotating and resizing images or generating new samples with Generative Adversarial Networks (GANs), create images from the few available samples and may therefore produce insufficiently varied data. This work instead generates a variety of images with a generative AI model (DALL-E) to effectively increase the diversity of the dataset.
However, because there is a gap between the feature spaces of generated and real images, adding generated images directly to the real-image training set reduces the model's accuracy. To reduce the distance between the two feature spaces, we propose a feature encoder that maps the features of generated images into the feature space of real images. Experiments show that mapping generated images into the real-image feature space enriches the sample distribution and thereby improves the model's classification performance.
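Below is a minimal sketch of the idea described in the abstracts, assuming PyTorch: a small learnable encoder maps features of AI-generated images toward the feature space of real images so that the generated samples can augment a few-shot training set. This is an illustration only, not the thesis implementation; the class name FeatureMapper, the 512-dimensional features standing in for CLIP embeddings, and the simplified pair-similarity objective standing in for Circle loss are all assumptions made for the example.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureMapper(nn.Module):
    """Hypothetical encoder mapping generated-image features toward the real-image feature space."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim),
            nn.ReLU(),
            nn.Linear(dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual mapping keeps the output close to the input feature,
        # then re-normalizes so cosine similarity remains meaningful.
        return F.normalize(x + self.net(x), dim=-1)

def alignment_loss(gen_feat, real_feat, gen_labels, real_labels, margin=0.25):
    # Pull mapped generated features toward real features of the same class and
    # push them below a similarity margin for other classes. This is a simplified
    # stand-in for the Circle loss used in the thesis, not the actual objective.
    sim = gen_feat @ real_feat.t()                        # cosine similarity matrix
    same = gen_labels[:, None] == real_labels[None, :]    # positive-pair mask
    pos = (1.0 - sim)[same]                               # positives should be close to 1
    neg = F.relu(sim - margin)[~same]                     # negatives should stay below margin
    return pos.mean() + neg.mean()

if __name__ == "__main__":
    # Toy run with random CLIP-like features: 4 classes, 5 shots each.
    torch.manual_seed(0)
    dim, n_cls, k = 512, 4, 5
    real = F.normalize(torch.randn(n_cls * k, dim), dim=-1)   # real support features
    gen = F.normalize(torch.randn(n_cls * k, dim), dim=-1)    # generated-image features
    labels = torch.arange(n_cls).repeat_interleave(k)

    mapper = FeatureMapper(dim)
    opt = torch.optim.Adam(mapper.parameters(), lr=1e-3)
    for _ in range(200):
        opt.zero_grad()
        loss = alignment_loss(mapper(gen), real, labels, labels)
        loss.backward()
        opt.step()
    print(f"final alignment loss: {loss.item():.4f}")

In the thesis pipeline, such an encoder would be trained on CLIP features of real support images and DALL-E-generated images, and the mapped features would then be added alongside the real ones in the cache model used for classification (Chapter 3 of the table of contents below).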
Chapter 1  Introduction 1
1.1 Research Background 1
1.2 Research Motivation 2
1.3 Thesis Organization 4
Chapter 2  Related Work 5
2.1 Meta-learning 5
2.2 Few-shot Learning 5
2.3 Transfer Learning 6
2.4 Pre-trained Backbone Models 7
2.5 Image Generation Models 8
Chapter 3  Method 9
3.1 Model Architecture 9
3.2 DALL-E Generation 10
3.3 Enhanced CLIP Classifier 10
3.4 Cache Model 11
3.5 Loss Functions 12
3.5.1 Encoder Training: Circle Loss 12
3.5.2 Cache Model Fine-tuning: Cross Entropy 13
3.6 Inference Stage 14
Chapter 4  Experiments 16
4.1 Setup 16
4.1.1 Model Parameter Settings 16
4.1.2 Dataset Settings 16
4.2 Experimental Results 18
4.2.1 Comparison with Tip-Adapter 18
4.3 Ablation Studies 19
4.3.1 Gap Between Generated and Real Data 19
4.3.2 Effect of Filtering Generated Images 21
4.3.3 Accuracy Comparison of Module Combinations 22
4.3.4 Effect of the Encoder 23
4.3.5 Advantages of Circle Loss 24
4.3.6 Comparison of Pre-trained DALL-E Models 26
4.3.7 Classification Performance: Case Studies 28
Chapter 5  Conclusion 31
References 32
Appendix 36
[1]Ramesh, A., Pavlov, M., Goh, G., Gray, S., Voss, C., Radford, A., ... & Sutskever, I. (2021, July). Zero-shot text-to-image generation. In International conference on machine learning (pp. 8821-8831). PMLR.
[2]Fei-Fei, L., Fergus, R., & Perona, P. (2004). Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories. In 2004 conference on computer vision and pattern recognition workshop (pp. 178-178). IEEE.
[3]Zhang, R., et al. (2022). Tip-adapter: Training-free adaption of CLIP for few-shot classification. In European conference on computer vision. Cham: Springer Nature Switzerland.
[4]Parkhi, O. M., Vedaldi, A., Zisserman, A., & Jawahar, C. V. (2012). Cats and dogs. In 2012 IEEE conference on computer vision and pattern recognition (pp. 3498-3505). IEEE.
[5]Nilsback, M. E., & Zisserman, A. (2008). Automated flower classification over a large number of classes. In 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing (pp. 722-729). IEEE.
[6]Bossard, L., Guillaumin, M., & Van Gool, L. (2014). Food-101 – Mining discriminative components with random forests. In European conference on computer vision (pp. 446-461). Springer.
[7]Wang, Y., Yao, Q., Kwok, J. T., & Ni, L. M. (2020). Generalizing from a few examples: A survey on few-shot learning. ACM computing surveys (csur), 53(3), 1-34.
[8]Hariharan, B., & Girshick, R. (2017). Low-shot visual recognition by shrinking and hallucinating features. In Proceedings of the IEEE international conference on computer vision (pp. 3018-3027).
[9]Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., ... & Sutskever, I. (2021, July). Learning transferable visual models from natural language supervision. In International conference on machine learning (pp. 8748-8763). PMLR.
[10]Zhou, Y., Li, C., Chen, C., Gao, J., & Xu, J. (2022). Lafite2: Few-shot text-to-image generation. arXiv preprint arXiv:2210.14124.
[11]Sun, Y., Cheng, C., Zhang, Y., Zhang, C., Zheng, L., Wang, Z., & Wei, Y. (2020). Circle loss: A unified perspective of pair similarity optimization. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 6398-6407).
[12]Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009, June). Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition (pp. 248-255). IEEE.
[13]Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., ... & Houlsby, N. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.
[14]Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... & Bengio, Y. (2014). Generative adversarial nets. Advances in neural information processing systems, 27.
[15]Zhuang, F., Qi, Z., Duan, K., Xi, D., Zhu, Y., Zhu, H., ... & He, Q. (2020). A comprehensive survey on transfer learning. Proceedings of the IEEE, 109(1), 43-76.
[16]Benaim, S., & Wolf, L. (2018). One-shot unsupervised cross domain translation. Advances in neural information processing systems, 31.
[17]Shyam, P., Gupta, S., & Dukkipati, A. (2017, July). Attentive recurrent comparators. In International conference on machine learning (pp. 3173-3181). PMLR.
[18]Lake, B. M., Salakhutdinov, R., & Tenenbaum, J. B. (2015). Human-level concept learning through probabilistic program induction. Science, 350(6266), 1332-1338.
[19]Kozerawski, J., & Turk, M. (2018). Clear: Cumulative learning for one-shot one-class image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 3446-3455).
[20]Vinyals, O., Blundell, C., Lillicrap, T., & Wierstra, D. (2016). Matching networks for one shot learning. Advances in neural information processing systems, 29.
[21]Motiian, S., Jones, Q., Iranmanesh, S., & Doretto, G. (2017). Few-shot adversarial domain adaptation. Advances in neural information processing systems, 30.
[22]Yan, L., Zheng, Y., & Cao, J. (2018). Few-shot learning for short text classification. Multimedia Tools and Applications, 77, 29799-29810.
[23]Koch, G., Zemel, R., & Salakhutdinov, R. (2015, July). Siamese neural networks for one-shot image recognition. In ICML deep learning workshop (Vol. 2, No. 1).
[24]Keshari, R., Vatsa, M., Singh, R., & Noore, A. (2018). Learning structure and strength of CNN filters for small sample size training. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 9349-9358).
[25]Hoffman, J., Tzeng, E., Donahue, J., Jia, Y., Saenko, K., & Darrell, T. (2013). One-shot adaptation of supervised deep convolutional models. arXiv preprint arXiv:1312.6204.
[26]Finn, C., Abbeel, P., & Levine, S. (2017, July). Model-agnostic meta-learning for fast adaptation of deep networks. In International conference on machine learning (pp. 1126-1135). PMLR.
[27]Dhillon, G. S., Chaudhari, P., Ravichandran, A., & Soatto, S. (2019). A baseline for few-shot image classification. arXiv preprint arXiv:1909.02729.
[28]Chen, W. Y., Liu, Y. C., Kira, Z., Wang, Y. C. F., & Huang, J. B. (2019). A closer look at few-shot classification. arXiv preprint arXiv:1904.04232.
[29]Zhu, C., Chen, F., Ahmed, U., Shen, Z., & Savvides, M. (2021). Semantic relation reasoning for shot-stable few-shot object detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 8782-8791).
[30]Xian, Y., Akata, Z., Sharma, G., Nguyen, Q., Hein, M., & Schiele, B. (2016). Latent embeddings for zero-shot classification. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 69-77).
[31]Kingma, D. P., & Welling, M. (2019). An introduction to variational autoencoders. Foundations and Trends® in Machine Learning, 12(4), 307-392.
[32]Zhang, C., Bengio, S., Hardt, M., Recht, B., & Vinyals, O. (2021). Understanding deep learning (still) requires rethinking generalization. Communications of the ACM, 64(3), 107-115.
[33]Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research, 15(1), 1929-1958.
[34]Wah, C., Branson, S., Welinder, P., Perona, P., & Belongie, S. (2011). The caltech-ucsd birds-200-2011 dataset.
[35]Vinyals, O., Blundell, C., Lillicrap, T., & Wierstra, D. (2016). Matching networks for one shot learning. Advances in neural information processing systems, 29.
Electronic full text (available online from 2026-08-09)