National Digital Library of Theses and Dissertations in Taiwan


Detailed Record

Author: 鄭亦翔 (Cheng Yi-Xiang)
Title: A Study on Multi-View Image Retrieval Based on Deep Learning (基於深度學習的多視角影像檢索之研究)
Advisors: 沈岱範 (Shen Day-Fann), 林國祥 (Lin Guo-Shiang)
Committee Members: 江振國 (Chiang Chen-Kuo), 賴文能 (Lai Wen-Neng)
Oral Defense Date: 2020-07-22
Degree: Master's
Institution: National Yunlin University of Science and Technology (國立雲林科技大學)
Department: Department of Electrical Engineering
Discipline: Engineering
Subfield: Electrical and Computer Engineering
Document Type: Academic thesis
Publication Year: 2020
Graduation Academic Year: 108 (ROC calendar, 2019-2020)
Language: Chinese
Pages: 170
Keywords (Chinese): 多視角影像檢索 (multi-view image retrieval); 深度卷積生成對抗網路 (DCGAN); Triplet Network (三胞胎神經網路)
Keywords (English): DCGAN; Triplet Network; multi-view image retrieval
Cited by: 1 | Views: 330 | Downloads: 44 | Bookmarked: 0
This thesis develops a multi-view image retrieval system based on deep learning. The system consists of several parts: a deep convolutional generative adversarial network (DCGAN), feature description based on a triplet network, and multi-view image retrieval. In the DCGAN part, this thesis exploits the adversarial interplay between a generative model, which learns features and synthesizes images, and a discriminative model, which distinguishes real images from forged ones. Starting from a pretrained model, the system is trained on human-pose features from various angles and can then synthesize images from different viewpoints at test time. For feature description based on the triplet network, the system uses an embedding layer to encode the multi-view images of each class as vectors in an embedding space; training with the triplet loss reshapes the feature description so that objects of the same class move closer together while objects of different classes move farther apart. Because each object's position in the high-dimensional space reflects these relationships, the similarities between objects can be visualized with dimensionality-reduction techniques. In the final test stage, the system computes the Euclidean (L2 norm) distance between the embedding vectors extracted for each class and returns the class with the shortest distance as the most similar one.
To evaluate the system's performance, this thesis conducts experiments on the MVC dataset (MVC: A Dataset for View-Invariant Clothing Retrieval and Attribute Prediction), which contains 161,260 images of 264 clothing items, each collected as a complete set of views from four different angles.

Keywords: Deep Convolutional Generative Adversarial Network (DCGAN), Triplet Network, multi-view image retrieval
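The triplet-loss training described in the abstract can be illustrated with a minimal sketch. This is not the thesis's actual network code; it only shows, under simplified assumptions (plain Python lists standing in for the learned embedding vectors), how the loss pulls same-class embeddings together and pushes different-class embeddings apart by at least a margin.

```python
def squared_l2(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: zero once the negative is at least
    `margin` (in squared distance) farther from the anchor than the
    positive; otherwise positive, pushing the embedding to separate
    the classes further."""
    return max(squared_l2(anchor, positive) - squared_l2(anchor, negative) + margin, 0.0)

# Toy triplet: the negative is already far from the anchor, so the loss is zero.
anchor, positive, negative = [0.0, 0.0], [0.1, 0.0], [3.0, 0.0]
print(triplet_loss(anchor, positive, negative))  # 0.0
```

In practice this scalar is averaged over mined triplets and backpropagated through the shared embedding network, which is what reshapes the feature space during training.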

In this thesis, we developed a multi-view image retrieval scheme based on deep learning. The system consists of several parts: a deep convolutional generative adversarial network (DCGAN), a triplet network, and multi-view image searching. In the first part, a DCGAN with a 3D model is used to generate multi-view clothing images. After multi-view image generation, a CNN is used to extract representative features. In the second part, a triplet network is used to modify the representative features of clothing images, raising the retrieval capability of the proposed scheme. The final part measures similarity to find the most similar clothing images.
To evaluate the performance of the proposed system, the MVC dataset is used to conduct experiments. This image database contains 161,260 images of 264 clothing items, and each clothing item is captured as a complete set of views from four different angles. The experimental results show that the proposed scheme functions well.

Keywords: Deep Convolutional Generative Adversarial Network (DCGAN), Triplet Network, multi-view image retrieval 
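The final retrieval step, matching a query to the class with the smallest L2 distance to a per-class standard sample, can be sketched as follows. The 2-D vectors and class names here are hypothetical placeholders for the embedding codes the thesis extracts with its triplet network.

```python
import math

def l2_distance(u, v):
    """Euclidean (L2 norm) distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def retrieve(query, standard_samples):
    """Return the class whose standard-sample embedding lies nearest
    to the query embedding."""
    return min(standard_samples, key=lambda c: l2_distance(query, standard_samples[c]))

# Hypothetical per-class standard embeddings (2-D for illustration only).
standards = {"shirt": [0.9, 0.1], "dress": [0.1, 0.9]}
print(retrieve([0.8, 0.2], standards))  # shirt
```

Choosing one representative embedding per class keeps retrieval linear in the number of classes rather than the number of database images, which is the point of the "standard sample" design in Chapter 6.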

Table of Contents
Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Tables
List of Figures
Chapter 1  Introduction
1.1 Research Background
1.2 Research Motivation and Objectives
1.3 Chapter Overview
Chapter 2  Background and Literature Review
2.1 CNN Fundamentals
2.1.1 Convolution Layer
2.1.2 Rectified Linear Unit (ReLU)
2.1.3 Pooling Layer
2.1.4 Fully Connected Layer
2.2 Notable CNN Architectures
2.2.1 History of CNNs
2.2.2 AlexNet [32]
2.2.3 VGGNet [36]
2.3 GAN Fundamentals
2.3.1 Generative Adversarial Networks (GAN) [15]
2.4 Multi-View Object Recognition
2.4.1 Multi-View Dataset Representations
2.4.2 Handcrafted Features
2.4.3 Feature Extraction
2.5 Loss Functions
2.6 Image Retrieval
Chapter 3  Overall System Architecture
Chapter 4  Multi-View Image Generation
4.1 DCGAN: Deep Convolutional Generative Adversarial Network [34]
4.2 DCGAN Network Architecture
4.3 DCGAN Loss Function
4.4 DCGAN Training Image Set
4.5 Multi-View Image Generation
4.6 Chapter Summary
Chapter 5  Feature Description Based on the Triplet Network
5.1 Triplet Network [8]
5.2 Triplet Network Architecture
5.3 Triplet Network Loss Function
5.4 Feature Description Database
5.5 Chapter Summary
Chapter 6  Multi-View Image Retrieval
6.1 Standard Samples
6.2 Similarity Distance Computation
6.3 Multi-View Image Retrieval
6.4 Chapter Summary
Chapter 7  Experimental Results and Performance Evaluation
7.1 Experimental Platform Specifications
7.2 Experimental Datasets
7.3 Retrieval Performance Metrics
7.3.1 Recall Rate and Precision Rate
7.3.2 AP and mAP (Mean Average Precision)
7.4 System Analysis
7.5 Experimental Data
7.5.1 System Performance Analysis
7.5.2 Image Category Retrieval
7.5.3 Clothing Image Retrieval
7.5.4 Multi-View Image Retrieval System
7.6 Chapter Summary
Chapter 8  Conclusions and Future Work
8.1 Conclusions
8.2 Future Research Directions
References
Appendices
Appendix 1. Triplet Loss Image Retrieval Performance
Appendix 2. Multi-View Image Retrieval Performance
Appendix 3. Internal/External Test Image Retrieval Performance
Appendix 4. Small-Noise/Background-Noise Image Retrieval Performance
Appendix 5. Optimized Network (Additional Triplet Loss Iterations) Retrieval Performance
Appendix 6. Optimized Network (Minimum-Average-Distance Standard Samples) Retrieval Performance


References
[1] S.-H. Chao, "Clothing recommendation system based on style recognition" (in Chinese), Master's thesis, National Taiwan University, pp. 1-62, 2016.
[2] 李庭安, "Japanese-style vs. Korean-style fashion face-off" (in Chinese), 2019.
[3] 林正中, "A study on recognizing clothing styles with artificial neural networks" (in Chinese), Journal of Textile Research, vol. 17, no. 1, pp. 47-58, 2007.
[4] 薛百芬, "Clothing style recognition system" (in Chinese), 2014.
[5] X. Chen, H. Ma, J. Wan, B. Li, and T. Xia, "Multi-view 3D object detection network for autonomous driving," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1907-1915.
[6] S. Chopra, R. Hadsell, and Y. LeCun, "Learning a similarity metric discriminatively, with application to face verification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2005, pp. 539-546.
[7] Z. Gao, D. Wang, X. He, and H. Zhang, "Group-pair convolutional neural networks for multi-view based 3D object retrieval," in Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[8] E. Hoffer and N. Ailon, "Deep metric learning using triplet network," in International Workshop on Similarity-Based Pattern Recognition, Springer, 2015, pp. 84-92.
[9] F. Kinli, B. Ozcan, and F. Kirac, "Fashion image retrieval with capsule networks," in Proceedings of the IEEE International Conference on Computer Vision Workshops, 2019.
[10] C. R. Qi, H. Su, M. Nießner, A. Dai, M. Yan, and L. J. Guibas, "Volumetric and multi-view CNNs for object classification on 3D data," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 5648-5656.
[11] H. Su, S. Maji, E. Kalogerakis, and E. Learned-Miller, "Multi-view convolutional neural networks for 3D shape recognition," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 945-953.
[12] Z. Wu et al., "3D ShapeNets: A deep representation for volumetric shapes," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1912-1920.
[13] K. Yamaguchi, M. H. Kiapour, L. E. Ortiz, and T. L. Berg, "Parsing clothing in fashion photographs," in 2012 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2012, pp. 3570-3577.
[14] T. Yu, J. Meng, and J. Yuan, "Multi-view harmonized bilinear network for 3D object recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 186-194.
[15] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. C. Courville, and Y. Bengio, "Generative adversarial nets," in Advances in Neural Information Processing Systems (NIPS), 2014.
[16] H. Liu, R. Wang, S. Shan, and X. Chen, "Deep supervised hashing for fast image retrieval," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[17] A. Hermans, L. Beyer, and B. Leibe, "In defense of the triplet loss for person re-identification," arXiv preprint arXiv:1703.07737, 2017.
[18] F. Schroff, D. Kalenichenko, and J. Philbin, "FaceNet: A unified embedding for face recognition and clustering," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 815-823.
[19] D. Wang, W. Cao, J. Li, and J. Ye, "DeepSD: Supply-demand prediction for online car-hailing services using deep neural networks," in 2017 IEEE 33rd International Conference on Data Engineering (ICDE), IEEE, 2017, pp. 243-254.
[20] Y. Zhai and L. Zeng, "A SIFT matching algorithm based on adaptive contrast threshold," in 2011 International Conference on Consumer Electronics, Communications and Networks (CECNet), IEEE, 2011.
[21] 李瑞文, 郭忠民, and 楊乃中, "Building effective macro and micro feature descriptions with SIFT for image retrieval and classification" (in Chinese), Master's thesis, Department of Information Engineering, I-Shou University, 2013.
[22] W.-T. Huang, "Affinity propagation based image clustering with SIFT and color features," Master's thesis, Department of Computer Science, National Tsing Hua University, 2009, pp. 1-47.
[23] Wikipedia: https://zh.wikipedia.org/wiki/%E5%B0%BA%E5%BA%A6%E4%B8%8D%E8%AE%8A%E7%89%B9%E5%BE%B5%E8%BD%89%E6%8F%9B
[24] 360doc personal library: http://www.360doc.com/content/11/1129/15/3054335_168366315.shtml
[25] Baidu Wenku: http://wenku.baidu.com/view/d156036f7e21af45b307a81d.html?re=view
[26] 阿洲的程式教學 (programming tutorial): http://monkeycoding.com/?p=803
[27] 360doc personal library: http://www.360doc.com/content/11/1129/15/3054335_168366315.shtml
[28] CondenseNet algorithm notes: https://blog.csdn.net/u014380165/article/details/78747711
[29] F. Massa, B. C. Russell, and M. Aubry, "Deep exemplar 2D-3D detection by adapting from real to rendered views," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 6024-6033.
[30] SIFT algorithm explained: https://blog.csdn.net/zddblog/article/details/7521424
[31] Wikipedia: https://en.wikipedia.org/wiki/Affine_transformation
[32] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proceedings of the 25th International Conference on Neural Information Processing Systems (NIPS), vol. 1, 2012, pp. 1097-1105.
[33] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in International Conference on Learning Representations (ICLR), 2015.
[34] A. Radford, L. Metz, and S. Chintala, "Unsupervised representation learning with deep convolutional generative adversarial networks," 2016.
[35] K.-H. Liu, T.-Y. Chen, and C.-S. Chen, "MVC: A dataset for view-invariant clothing retrieval and attribute prediction," in ACM International Conference on Multimedia Retrieval (ICMR), 2016.
[36] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 779-788.
