Author: Ya-Sheng Liu (劉亞昇)
Thesis Title: Automatic Nature Scene Image Generation with Time and Place Descriptions (符合時間與場景描述之自動影像生成模型)
Advisor: Hsu-Yung Cheng (鄭旭詠)
Degree: Master's
Institution: National Central University
Department: Department of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Document Type: Academic thesis
Publication Year: 2019
Academic Year of Graduation: 107
Language: Chinese
Number of Pages: 61
Keywords (Chinese): Generative Adversarial Network; Image Generation; Attention Mechanism; Imagination Mechanism
Keywords (English): GAN; Image Generation; Attention; Imagination
With the flourishing development of artificial intelligence, machine learning has achieved excellent results in image recognition, semantic recognition, image generation, and more. As the name implies, "artificial intelligence" is intelligence created by humans: by letting computers learn, we give machines a degree of logical judgment. This is what we have achieved so far, but examined closely, the development of artificial intelligence has not yet reached true intelligence.
This thesis aims to simulate the imaginative capability of the human brain in order to increase the diversity of generative models. There has already been research in the text-to-image field, such as StackGAN, StackGAN++, and AttnGAN in recent years, but their original targets were the bird (CUB-200) and flower (102Flowers) datasets for training and optimization. When people imagine something, they usually attach a description to it; the ultimate goal of this work is to use such a description to produce a narrative image. At the current stage, we collect scene data so that the neural network can generate a scene image that matches the description, with enhanced diversity.
To make the generated images more diverse rather than limited to a few specific images, this work uses the hidden-layer information of the image to initialize the RNN's memory cell and so produce richer images. Experimental results show that, compared with directly applying the network architecture of previous research, adding this method indeed helps to increase the diversity of the generated images.
With the rapid development of artificial intelligence, machine learning has achieved excellent results in image recognition, semantic recognition, image generation, and so on. As the term implies, "artificial intelligence" is intelligence created by humans: by letting computers learn, we give machines a certain ability of logical judgment. This is what we have achieved at present, but if we examine the development of artificial intelligence closely, we still have not reached true intelligence.
This paper mainly aims to simulate the imagination of the human brain. In the field of text-to-image synthesis there has been prior research, such as StackGAN, StackGAN++, and AttnGAN in recent years, but their initial goal was to train and optimize on the bird (CUB-200) and flower (102Flowers) datasets. When people imagine a thing, they usually give it a description. The ultimate goal of this paper is to produce a narrative image from such a description. At the present stage, we give the neural network the ability to generate scene images that correspond to the description, and we enhance their diversity with our dataset.
In order to make the generated images more diverse rather than limited to a few specific images, this paper uses the hidden-layer information of the image to initialize the RNN memory cell. The experimental results show that, compared to the original AttnGAN architecture, our proposed method indeed helps to increase the diversity of the generated images.
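The imagination mechanism summarized above — seeding the RNN's memory cell with an image's hidden-layer features instead of zeros — can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the thesis's implementation; all dimensions, weight shapes, and names (`init_memory_cell`, `lstm_step`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

feat_dim, hidden_dim = 2048, 256          # assumed: CNN feature size -> LSTM state size
W_init = rng.standard_normal((hidden_dim, feat_dim)) * 0.01  # learned projection (random here)

def init_memory_cell(image_feature):
    """Project an image's hidden-layer feature into the LSTM cell-state space,
    so each image seeds a different 'imagination' state."""
    return np.tanh(W_init @ image_feature)  # bounded initial cell state

def lstm_step(x, h, c, params):
    """One standard LSTM step; c may come from an image rather than zeros."""
    Wf, Wi, Wo, Wg = params                # forget / input / output / candidate weights
    z = np.concatenate([x, h])
    f = 1.0 / (1.0 + np.exp(-(Wf @ z)))    # forget gate
    i = 1.0 / (1.0 + np.exp(-(Wi @ z)))    # input gate
    o = 1.0 / (1.0 + np.exp(-(Wo @ z)))    # output gate
    g = np.tanh(Wg @ z)                    # candidate values
    c_new = f * c + i * g                  # image-seeded memory is carried forward
    h_new = o * np.tanh(c_new)
    return h_new, c_new

x_dim = 128                                # assumed word-embedding size
params = [rng.standard_normal((hidden_dim, x_dim + hidden_dim)) * 0.01
          for _ in range(4)]

image_feature = rng.standard_normal(feat_dim)
c0 = init_memory_cell(image_feature)       # image-conditioned memory cell
h0 = np.zeros(hidden_dim)
h1, c1 = lstm_step(rng.standard_normal(x_dim), h0, c0, params)
print(h1.shape, c1.shape)                  # → (256,) (256,)
```

Because the cell state now varies with the conditioning image, repeated generations from the same text description start from different internal states, which is the mechanism the thesis credits for the increased diversity.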
Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Motivation
1.2 Related Work
1.3 Thesis Organization
Chapter 2 Data Collection
2.1 Data Sources
2.1.1 OpenImagesV4
2.1.2 Places
2.2 Data Collection
2.3 SceneDataset Description
Chapter 3 Methodology and System Architecture
3.1 Attentional Generative and Imaginative Networks
3.1.1 Attention Mechanism
3.1.2 Imagination Mechanism
3.1.3 Loss Functions
3.2 Deep Attentional Multimodal Similarity Model
3.2.1 Image Encoder
3.2.2 Text Encoder
3.2.3 Attention-driven Image-Text Matching Score
3.2.4 Loss Functions
Chapter 4 Experimental Results
4.1 Experimental Environment and Equipment
4.2 Evaluation Methods
4.2.1 Inception Score
4.2.2 Fréchet Inception Distance (FID)
4.3 Method Comparison and Experimental Results
4.4 Style Transfer
Chapter 5 Conclusions and Future Work
References
[1] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville and Yoshua Bengio, "Generative Adversarial Networks," NIPS, 2014.
[2] Diederik P Kingma and Max Welling, “Auto-encoding variational bayes,” ICLR, 2014.
[3] Mehdi Mirza and Simon Osindero, “Conditional Generative Adversarial Nets,” arXiv:1411.1784, 2014.
[4] Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende and Daan Wierstra, “DRAW: A Recurrent Neural Network For Image Generation,” ICML, 2015.
[5] Aaron van den Oord, Nal Kalchbrenner, Oriol Vinyals, Lasse Espeholt, Alex Graves and Koray Kavukcuoglu, “Conditional Image Generation with PixelCNN Decoders,” NIPS, 2016.
[6] Alec Radford, Luke Metz and Soumith Chintala, “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks,” ICLR, 2016.
[7] Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele and Honglak Lee, “Generative Adversarial Text to Image Synthesis,” ICML, 2016.
[8] Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang and Dimitris Metaxas, “StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks,” ICCV, 2017.
[9] Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang and Dimitris Metaxas, “StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks,” arXiv:1710.10916, 2017.
[10] Tao Xu, Pengchuan Zhang, Qiuyuan Huang, Han Zhang, Zhe Gan, Xiaolei Huang and Xiaodong He, “AttnGAN: Fine-Grained Text to Image Generation With Attentional Generative Adversarial Networks,” CVPR, 2018.
[11] Andrew Brock, Jeff Donahue and Karen Simonyan, “Large Scale GAN Training for High Fidelity Natural Image Synthesis,” arXiv:1809.11096, 2018.
[12] Tero Karras, Timo Aila, Samuli Laine and Jaakko Lehtinen, “Progressive Growing of GANs for Improved Quality, Stability, and Variation,” arXiv:1710.10196, 2017.
[13] Scott Reed, Zeynep Akata, Bernt Schiele and Honglak Lee, “Learning Deep Representations of Fine-grained Visual Descriptions,” CVPR, 2016.
[14] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser and Illia Polosukhin, “Attention Is All You Need,” NIPS, 2017.
[15] Scott Reed, Zeynep Akata, Santosh Mohan, Samuel Tenka, Bernt Schiele and Honglak Lee, “Learning What and Where to Draw,” NIPS, 2016.
[16] Volodymyr Mnih, Nicolas Heess, Alex Graves and Koray Kavukcuoglu, “Recurrent Models of Visual Attention,” NIPS, 2014.
[17] Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel and Yoshua Bengio, “Show, Attend and Tell: Neural Image Caption Generation with Visual Attention,” ICML, 2015.
[18] Tu Dinh Nguyen, Trung Le, Hung Vu and Dinh Phung, “Dual Discriminator Generative Adversarial Nets,” arXiv:1709.03831, 2017.
[19] Lantao Yu, Weinan Zhang, Jun Wang and Yong Yu, “SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient,” AAAI, 2017.
[20] Jingjing Xu, Xuancheng Ren, Junyang Lin and Xu Sun, “DP-GAN: Diversity-Promoting Generative Adversarial Network for Generating Informative and Diversified Text,” EMNLP, 2018.
[21] Alina Kuznetsova, Hassan Rom, Neil Alldrin, Jasper Uijlings, Ivan Krasin, Jordi Pont-Tuset, Shahab Kamali, Stefan Popov, Matteo Malloci, Tom Duerig and Vittorio Ferrari, “The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale,” arXiv:1811.00982, 2018.
[22] Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva and Antonio Torralba, “Places: A 10 Million Image Database for Scene Recognition,” IEEE TPAMI, 2017.
[23] Martin Arjovsky, Soumith Chintala and Léon Bottou, “Wasserstein GAN,” arXiv:1701.07875, 2017.
[24] Sepp Hochreiter and Jürgen Schmidhuber, “Long Short-Term Memory,” Neural Computation, 1997.
[25] Qiantong Xu, Gao Huang, Yang Yuan, Chuan Guo, Yu Sun, Felix Wu and Kilian Weinberger, “An empirical study on evaluation metrics of generative adversarial networks,” arXiv:1806.07755, 2018.
[26] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler and Sepp Hochreiter, “GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium,” NIPS, 2017.
[27] Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” arXiv:1810.04805v2, 2018.
Electronic full text (publicly available online from 2024/07/23)