National Digital Library of Theses and Dissertations in Taiwan

Author: 包一彭
Author (English): BAO YIPENG
Title: 影像處理基於膠囊條件生成對抗網路
Title (English): Image Processing Based on Capsule Conditional Generative Adversarial Networks
Advisor: 張介仁
Advisor (English): CHANG, JIEH-REN
Committee: 張介仁、陳佑祥、吳德豐、黃鴻穎
Committee (English): CHANG, JIEH-REN; CHEN, YOU-SHYANG; WU, TER-FENG; HUANG, HUNG-YING
Oral defense date: 2020-06-29
Degree: Master's
Institution: National Ilan University (國立宜蘭大學)
Department: Master's Program, Department of Electronic Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Publication year: 2020
Graduation academic year: 108
Language: Chinese
Pages: 55
Keywords (Chinese): 膠囊網路、生成對抗網路、類神經網路、深度學習
Keywords (English): Capsule Networks; GAN; Neural Network; Deep Learning
DOI: 10.6820/niu202000063
Statistics:
  • Cited by: 0
  • Views: 242
  • Downloads: 0
  • Bookmarked: 0
Convolutional Neural Networks (CNNs) have reached, and on tasks such as image classification and object recognition surpassed, human-level performance in computer vision. CNNs nevertheless have drawbacks: they need large amounts of training images; the pooling layers that follow the convolution layers discard information; and CNNs cannot capture the relationships between different parts of an image. A recent approach to these problems is the Capsule Network (CapsNet): a capsule network is built from capsules, each a small group of neurons that learns the features of a specific object in an image (for example, eyes, nose, or mouth).
Generative Adversarial Networks (GANs) are currently among the most active research areas in machine learning. A GAN uses two competing networks, a generator and a discriminator, to learn the distribution of real samples; because real-sample distributions are complex, the generator suffers from unstable training and uneven sample quality. To address this, Radford et al. introduced CNNs into the GAN framework and proposed the Deep Convolutional GAN (DCGAN), bringing the strengths of CNNs into GANs. Compared with the original GAN, the quality of its results improved markedly, yet DCGAN still sometimes fails to produce realistic images, because the generator suffers from mode collapse: training is hard to converge and frequently oscillates, or the model generates images of only a single style.
Given these shortcomings of DCGAN, this thesis introduces the capsule network architecture into the conditional GAN and proposes the CapsuleCGAN (Capsule Conditional Generative Adversarial Networks) architecture. It differs from DCGAN as follows: in the discriminator, a capsule network replaces the original convolutional layers, bringing in the advantages of capsules and improving the diversity of the generated images; in the generator, upsampling followed by convolution replaces the original deconvolution (transposed-convolution) layers, reducing checkerboard artifacts in the generated images. For the condition labels, an embedding layer replaces the original one-hot encoding, and one-sided label smoothing is applied during training to accelerate convergence and damp large oscillations.
Multiple sets of experiments on the MNIST and CIFAR-10 datasets, compared against CGAN and DCGAN, show that the proposed CapsuleCGAN architecture can be trained successfully. The experimental results show that it not only generates high-quality images but also performs better on image diversity, effectively suppressing mode collapse in the GAN. Finally, we offer suggestions for improvement and reflections on questions worth further study.

Convolutional Neural Networks (CNNs) have achieved beyond-human performance on computer-vision tasks such as image classification and object recognition. But CNNs also have shortcomings: they require large numbers of images for training; the pooling layers that follow the convolution layers lose information; and CNNs cannot model the relationships between different parts of an image. The latest answer to these problems is the Capsule Network (CapsNet): a capsule network is built from capsules, small groups of neurons that learn the features of a specific object in an image (such as eyes, nose, or mouth).
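The capsule nonlinearity from Sabour et al. [9], which the paragraph above alludes to, can be sketched in NumPy. The `squash` function and the toy vectors below are illustrative, not taken from the thesis:

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Capsule squashing nonlinearity (Sabour et al., 2017).

    Shrinks short vectors toward zero and long vectors toward unit
    length while preserving orientation:
        v = ||s||^2 / (1 + ||s||^2) * s / ||s||
    so a capsule's length can be read as the probability that the
    entity it represents is present.
    """
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

long_vec = squash(np.array([10.0, 0.0]))   # length just below 1
short_vec = squash(np.array([0.1, 0.0]))   # length shrunk well below 0.1
```

Direction is unchanged in both cases; only the length is rescaled, which is what lets dynamic routing compare capsule agreement by dot products.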
Generative Adversarial Networks (GANs) are currently among the most popular research areas in machine learning. A GAN uses two competing networks, a generator and a discriminator, to learn the distribution of real samples; because that distribution is complex, the generator suffers from unstable training and uneven quality of the generated data. To solve this problem, Radford et al. introduced CNNs into the GAN structure and proposed the Deep Convolutional GAN (DCGAN), bringing the advantages of CNNs into GANs. Compared with the original GAN, the quality of its results improved significantly, but DCGAN still sometimes fails to produce realistic images. This is because the GAN generator exhibits mode collapse: for example, training is difficult to converge and frequently oscillates, or only images of a single style are generated.
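The two-player objective described above reduces to a pair of binary cross-entropy losses over the discriminator's probabilities. This is a generic NumPy sketch under assumed toy probabilities (the `bce` helper and the numbers are ours, not the thesis's training code):

```python
import numpy as np

def bce(pred, target, eps=1e-8):
    """Binary cross-entropy between predicted probabilities and targets."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

# The discriminator D wants D(x_real) -> 1 and D(G(z)) -> 0;
# the generator G wants D(G(z)) -> 1 (the non-saturating form).
d_real = np.array([0.9, 0.8])   # assumed D outputs on real samples
d_fake = np.array([0.2, 0.1])   # assumed D outputs on generated samples

d_loss = bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))
g_loss = bce(d_fake, np.ones_like(d_fake))
```

The generator's loss falls only as it pushes `d_fake` toward 1; when it finds one narrow region that reliably fools D and stops exploring, the mode collapse described above appears.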
Because of these deficiencies of DCGAN, this paper introduces the capsule network structure into the conditional GAN and proposes the CapsuleCGAN (Capsule Conditional Generative Adversarial Networks) architecture. Compared with DCGAN, the differences are as follows: in the discriminator, a capsule network is used instead of the original convolutional layers, bringing in the advantages of the capsule network and making the generated images more diverse; in the generator, upsampling and convolution layers replace the original deconvolution layers, reducing checkerboard artifacts in the generated images. For the condition labels, an embedding layer is used in place of the original one-hot encoding, and one-sided label smoothing is applied during training to accelerate the convergence of the network and reduce large oscillations.
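The generator change described above rests on the fact that nearest-neighbour upsampling enlarges a feature map uniformly, without the uneven kernel overlap of a strided transposed convolution that produces checkerboard artifacts (Odena et al. [33]). A minimal NumPy sketch of the upsampling step only, with the convolution that would follow omitted:

```python
import numpy as np

def upsample_nn(x, factor=2):
    """Nearest-neighbour upsampling of an (H, W, C) feature map.

    Each input pixel is copied into a uniform factor x factor block,
    so every output position receives exactly one input value; a
    normal convolution applied afterwards then mixes neighbours
    evenly, avoiding the transposed convolution's uneven overlap.
    """
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

x = np.arange(4.0).reshape(2, 2, 1)   # [[0, 1], [2, 3]]
y = upsample_nn(x)                     # shape (4, 4, 1)
```

Every 2x2 block of `y` is constant, which is exactly the uniform coverage property the thesis relies on.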
Through multiple sets of experiments on the MNIST and CIFAR-10 datasets, compared against CGAN and DCGAN, the results show that the proposed CapsuleCGAN architecture can be trained successfully. The experimental results show that it not only generates high-quality images but also performs better on image diversity, effectively suppressing mode collapse in the GAN. Finally, we put forward suggestions for improvement and reflections on issues worth further study.
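Two of the training details described above can be illustrated together: replacing one-hot condition labels with a learned embedding table (a dense, trainable lookup instead of a sparse vector), and one-sided label smoothing of the discriminator's real-sample targets. The sizes (10 classes, a 50-dimensional embedding) and the random table are illustrative assumptions, not the thesis's values:

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, emb_dim = 10, 50   # assumed sizes for illustration

# A learned embedding is just a trainable lookup table: each class id
# selects one dense row, instead of a sparse length-n_classes one-hot
# vector. During training, gradients update the selected rows.
embedding = rng.normal(size=(n_classes, emb_dim))
labels = np.array([3, 7, 7])
cond = embedding[labels]      # shape (3, emb_dim), fed to G and D

# One-sided label smoothing: only the real-sample target is softened
# (1.0 -> 0.9); fake targets stay at 0, so the discriminator is not
# pushed toward overconfident predictions on real data.
real_target = np.full(len(labels), 0.9)
fake_target = np.zeros(len(labels))
```

Identical class ids map to identical condition vectors, which is what lets the generator and discriminator share a consistent notion of each class.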

Table of Contents
Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Motivation and Objectives
1.2 Thesis Organization
Chapter 2 Related Work
2.1 Deep Learning
2.1.1 Supervised Learning
2.1.2 Unsupervised Learning
2.1.3 Semi-supervised Learning
2.1.4 Reinforcement Learning
2.2 Generative Adversarial Networks
2.2.1 Generative Adversarial Networks
2.2.2 Conditional Generative Adversarial Networks
2.2.3 Deep Convolutional Generative Adversarial Networks
2.3 Capsule Networks
2.3.1 Problems with Convolutional Networks
2.3.2 Capsule Network Architecture
2.3.3 Dynamic Routing in Capsule Networks
Chapter 3 System Architecture and Design
3.1 Overall Architecture
3.2 Generator Architecture
3.2.1 BatchNormalization Layer
3.2.2 Upsampling and Convolution Layers
3.3 Discriminator Architecture
3.3.1 PrimaryCaps and DigitCaps Layers
3.3.2 One-sided Label Smoothing
Chapter 4 Experiments and Analysis
4.1 Experiments on the MNIST Dataset
4.2 Experiments on the CIFAR-10 Dataset
4.3 Experimental Results
4.3.1 Effect of the Capsule Structure on Generated Images
4.3.2 Effect of the Upsampling-and-Convolution Structure on Generated Images
4.3.3 Effect of the Embedding Layer and Label Smoothing on the Network
4.3.4 CapsuleCGAN Compared with CGAN and CDCGAN
Chapter 5 Conclusion and Future Work
5.1 Summary of This Work
5.2 Future Work
References


[1]GOODFELLOW, Ian, et al. Generative adversarial nets. In: Advances in neural information processing systems. 2014. p. 2672-2680.
[2]MIRZA, Mehdi; OSINDERO, Simon. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014
[3]GAUTHIER, Jon. Conditional generative adversarial nets for convolutional face generation. Class Project for Stanford CS231N: Convolutional Neural Networks for Visual Recognition, Winter semester, 2014, 2014.5: 2.
[4]BAU, David, et al. Seeing what a gan cannot generate. In: Proceedings of the IEEE International Conference on Computer Vision. 2019. p. 4502-4511.
[5]LAWRENCE, Steve, et al. Face recognition: A convolutional neural-network approach. IEEE transactions on neural networks, 1997, 8.1: 98-113.
[6]RADFORD, Alec; METZ, Luke; CHINTALA, Soumith. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
[7]FANG, Wei, et al. A method for improving CNN-based image recognition using DCGAN. CMC: Comput. Mater. Continua, 2018, 57.1: 167-178.
[8]SUÁREZ, Patricia L.; SAPPA, Angel D.; VINTIMILLA, Boris X. Infrared image colorization based on a triplet dcgan architecture. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2017. p. 18-23.
[9]SABOUR, Sara; FROSST, Nicholas; HINTON, Geoffrey E. Dynamic routing between capsules. In: Advances in neural information processing systems. 2017. p. 3856-3866.
[10]GOODFELLOW, Ian. NIPS 2016 tutorial: Generative adversarial networks. arXiv preprint arXiv:1701.00160, 2016.
[11]SRIVASTAVA, Akash, et al. Veegan: Reducing mode collapse in gans using implicit variational learning. In: Advances in Neural Information Processing Systems. 2017. p. 3308-3318.
[12]MUKHOMETZIANOV, Rinat; CARRILLO, Juan. CapsNet comparative performance evaluation for image classification. arXiv preprint arXiv:1805.11195, 2018.
[13]RUSH, Alexander M.; CHOPRA, Sumit; WESTON, Jason. A neural attention model for abstractive sentence summarization. arXiv preprint arXiv:1509.00685, 2015.
[14]ZHU, Xiaojin; GOLDBERG, Andrew B. Introduction to semi-supervised learning (synthesis lectures on artificial intelligence and machine learning). Morgan and Claypool Publishers, 2009, 14.
[15]GHAHRAMANI, Zoubin. Unsupervised learning. In: Summer School on Machine Learning. Springer, Berlin, Heidelberg, 2003. p. 72-112.
[16]ZHU, Xiaojin; GOLDBERG, Andrew B. Introduction to semi-supervised learning. Synthesis lectures on artificial intelligence and machine learning, 2009, 3.1: 1-130.
[17]SUTTON, Richard S.; BARTO, Andrew G. Reinforcement learning: An introduction. MIT press, 2018.
[18]CRIMINISI, Antonio; SHOTTON, Jamie; KONUKOGLU, Ender. Decision forests for classification, regression, density estimation, manifold learning and semi-supervised learning. Microsoft Research Cambridge, Tech. Rep. MSRTR-2011-114, 2011, 5.6: 12.
[19]MASKIN, Eric. Nash equilibrium and welfare optimality. The Review of Economic Studies, 1999, 66.1: 23-38.
[20]DOUZAS, Georgios; BACAO, Fernando. Effective data generation for imbalanced learning using conditional generative adversarial networks. Expert Systems with applications, 2018, 91: 464-471.
[21]ZHANG, He; SINDAGI, Vishwanath; PATEL, Vishal M. Image de-raining using a conditional generative adversarial network. IEEE transactions on circuits and systems for video technology, 2019.
[22]KNAPP, Steven K. Accelerate FPGA macros with one-hot approach. Electronic Design, 1990, 38.17: 71-78.
[23]LEVY, Omer; GOLDBERG, Yoav. Neural word Embedding as implicit matrix factorization. In: Advances in neural information processing systems. 2014. p. 2177-2185.
[24]HUBEL, David H.; WIESEL, Torsten N. Receptive fields and functional architecture of monkey striate cortex. The Journal of physiology, 1968, 195.1: 215-243.
[25]https://www.mathworks.com/solutions/deep-learning/convolutional-neural-network.html
[26]https://jhui.github.io/2017/11/14/Matrix-Capsules-with-EM-routing-Capsule-Network/
[27]https://analyticsindiamag.com/why-do-capsule-networks-work-better-than-convolutional-neural-networks/
[28]MUKHOMETZIANOV, Rinat; CARRILLO, Juan. CapsNet comparative performance evaluation for image classification. arXiv preprint arXiv:1805.11195, 2018.
[29]https://www.jiqizhixin.com/articles/2017-11-05
[30]ZHOU, Zhiming; ZHANG, Weinan; WANG, Jun. Inception score, label smoothing, gradient vanishing and -log(D(x)) alternative. arXiv preprint arXiv:1708.01729, 2017.
[31]IOFFE, Sergey; SZEGEDY, Christian. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
[32]MESSING, Dean; SEZAN, M. Ibrahim; DALY, Scott J. Image upsampling technique. U.S. Patent No 8,260,087, 2012.
[33]ODENA, Augustus; DUMOULIN, Vincent; OLAH, Chris. Deconvolution and checkerboard artifacts. Distill, 2016, 1.10: e3.
[34]SHI, Wenzhe, et al. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2016. p. 1874-1883.
[35]DENG, Li. The MNIST database of handwritten digit images for machine learning research [best of the web]. IEEE Signal Processing Magazine, 2012, 29.6: 141-142.
[36]KRIZHEVSKY, Alex; NAIR, Vinod; HINTON, Geoffrey. The CIFAR-10 dataset. Online: http://www.cs.toronto.edu/kriz/cifar.html, 2014, 55.
[37]https://blog.csdn.net/ZouCharming/article/details/82319721
[38]https://lt911.github.io/ Experiment on class-conditioned DCGAN
[39]https://github.com/4thgen/DCGAN-CIFAR10
[40]https://github.com/shaohua0116/DCGAN-Tensorflow
Electronic full text (publicly available online from 2025-07-07)