Researcher: Huang, Shao-Pin (黃紹賓)
Thesis title: CSSNet: Image-Based Clothing Styles Switch (基於影像之服裝更換系統)
Advisor: Shih, Zen-Chung (施仁忠)
Committee members: Shih, Zen-Chung (施仁忠); Way, Der-Lor (魏德樂); Chang, Chin-Chen (張勤振)
Oral defense date: 2019-06-10
Degree: Master's
Institution: National Chiao Tung University (國立交通大學)
Department: Institute of Multimedia Engineering (多媒體工程研究所)
Discipline: Computer Science
Field: Software Development
Document type: Academic thesis
Year of publication: 2019
Academic year of graduation: 107 (2018-2019)
Language: English
Pages: 37
Chinese keywords: 換衣 (clothes changing), 衣服 (clothes), 服飾 (apparel), 服飾交換 (clothing swapping)
English keywords: garment, clothing, clothing swapping
Usage statistics:
  • Cited by: 0
  • Views: 85
  • Downloads: 10
  • Bookmarked: 0
Abstract (Chinese): We propose the CSSNet framework to swap the upper garments between people with different poses, body shapes, and outfits. The proposed method comprises three stages: (1) disentangling the features of the source and target persons, including clothing, pose, and semantic segmentation; (2) synthesizing a realistic, high-resolution image of the target person in the new outfit; and (3) transferring complex logos from the source garment onto the target outfit. The proposed end-to-end neural network architecture can generate an image of a specific person wearing a specified garment. We further propose a post-processing method to restore complex logos that are missing or blurred in the network's output images. Our results are more realistic and of higher quality than those of previous methods, and our method preserves both the shape and the texture of the garment.
Abstract (English): We propose a framework, CSSNet, to exchange upper clothes across people with different poses, body shapes, and clothing. Our approach consists of three stages: (1) disentangling features such as clothing, body pose, and semantic segmentation from the source and target persons; (2) synthesizing realistic, high-resolution images of the target dressing style; and (3) transferring complex logos from the source clothing to the target wearing. The proposed end-to-end neural network architecture can generate an image of a specific person wearing the target clothing. In addition, we propose a post-processing step to recover complex logos that are missing or blurred in the network outputs. Our results are more realistic and of higher quality than those of previous methods, while preserving cloth shape and texture simultaneously.
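The pipeline pairs a person representation (pose plus parsing) with the source garment and decodes a new image of the person in that garment. As a rough illustration only, here is a minimal PyTorch sketch of a dual-path encoder-decoder generator in that spirit; the module names, channel counts (e.g. a 21-channel person representation), and resolutions are assumptions made for the sketch, not the thesis's actual implementation.

```python
# Hypothetical sketch of a dual-path encoder-decoder generator, loosely
# following the pipeline described in the abstract: pose and parsing
# features in one path, the source garment in the other, one shared
# decoder. Channel sizes and the input layout are illustrative assumptions.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Downsampling block: stride-2 conv, normalization, nonlinearity.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.InstanceNorm2d(out_ch),
        nn.LeakyReLU(0.2, inplace=True),
    )

def deconv_block(in_ch, out_ch):
    # Upsampling block mirroring conv_block (exactly doubles resolution).
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.InstanceNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class DualPathGenerator(nn.Module):
    def __init__(self, person_ch=21, garment_ch=3):
        super().__init__()
        # Path 1: target person representation, assumed here to be pose
        # heat maps stacked with a one-hot semantic segmentation.
        self.person_enc = nn.Sequential(
            conv_block(person_ch, 64), conv_block(64, 128), conv_block(128, 256))
        # Path 2: source garment image (RGB).
        self.garment_enc = nn.Sequential(
            conv_block(garment_ch, 64), conv_block(64, 128), conv_block(128, 256))
        # Shared decoder consumes the concatenated bottleneck features.
        self.decoder = nn.Sequential(
            deconv_block(512, 256), deconv_block(256, 128), deconv_block(128, 64),
            nn.Conv2d(64, 3, kernel_size=3, padding=1), nn.Tanh())

    def forward(self, person_rep, garment):
        z = torch.cat([self.person_enc(person_rep), self.garment_enc(garment)], dim=1)
        return self.decoder(z)

# Smoke test with random inputs at 256x192 (a common try-on resolution).
g = DualPathGenerator()
fake = g(torch.randn(1, 21, 256, 192), torch.randn(1, 3, 256, 192))
print(fake.shape)  # torch.Size([1, 3, 256, 192])
```

A real system of this kind would likely add skip connections and adversarial training against a discriminator such as the local discriminator listed in the contents; the smoke test above only verifies that tensor shapes line up.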
Table of contents:
摘要 (Chinese Abstract)
Abstract
Acknowledgements
Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Related Work
  2.1 Human Representation
  2.2 High Resolution Image Synthesis and Conditional GANs
  2.3 Image-based Garment Synthesis
  2.4 Texture and Style Transfer
Chapter 3 Method
  3.1 Overview
  3.2 Person Representation
    3.2.1 Pose Estimation
    3.2.2 Human Parsing
    3.2.3 Clothing Structure Extraction
  3.3 Switch Module
    3.3.1 Dual Path Encoder-Decoder Generator
    3.3.2 Local Discriminator
  3.4 Texture Transfer
    3.4.1 Region of Interest Detection
    3.4.2 Transformation and Blending
Chapter 4 Results
  4.1 Implementation Details
  4.2 Datasets
  4.3 Qualitative Evaluation
  4.4 Quantitative Evaluation
  4.5 Limitations
Chapter 5 Conclusion and Discussion
References
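The contents list a texture-transfer stage (region-of-interest detection, then transformation and blending) matching the logo-recovery post-process described in the abstract. As one hedged, generic way to realize such a step, the sketch below warps a logo patch with a perspective transform and composites it using OpenCV's Poisson blending; the function name transfer_logo, the corner coordinates, and the synthetic images are hypothetical placeholders, not the thesis's actual procedure.

```python
# Illustrative-only sketch of a "transformation and blending" step for logo
# transfer: warp a logo patch from the source garment into the detected
# region on the synthesized image, then Poisson-blend it in.
import cv2
import numpy as np

def transfer_logo(source_img, target_img, src_corners, dst_corners):
    """Warp the quadrilateral src_corners of source_img onto dst_corners
    of target_img and blend seamlessly. Corners are 4x2 float32 arrays."""
    h, w = target_img.shape[:2]
    # Perspective transform mapping the source logo quad to the target quad.
    M = cv2.getPerspectiveTransform(src_corners, dst_corners)
    warped = cv2.warpPerspective(source_img, M, (w, h))
    # Binary mask covering the warped logo region.
    mask = np.zeros((h, w), dtype=np.uint8)
    cv2.fillConvexPoly(mask, dst_corners.astype(np.int32), 255)
    # Poisson (seamless) cloning keeps the target shading consistent.
    center = (int(np.mean(dst_corners[:, 0])), int(np.mean(dst_corners[:, 1])))
    return cv2.seamlessClone(warped, target_img, mask, center, cv2.NORMAL_CLONE)

# Example usage with synthetic images so the sketch runs stand-alone.
source = np.full((256, 192, 3), 200, dtype=np.uint8)
cv2.putText(source, "LOGO", (40, 130), cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 0, 255), 3)
target = np.full((256, 192, 3), 120, dtype=np.uint8)
src_q = np.float32([[30, 90], [160, 90], [160, 150], [30, 150]])
dst_q = np.float32([[40, 100], [150, 95], [155, 160], [45, 165]])
result = transfer_logo(source, target, src_q, dst_q)
print(result.shape)  # (256, 192, 3)
```

Poisson blending keeps the pasted logo consistent with the target garment's shading, which plain alpha compositing would not.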