National Digital Library of Theses and Dissertations in Taiwan

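Graduate student: 施雅方
Graduate student (English name): Ya-Fang Shih
Thesis title: 基於深度共現特徵的影像辨識
Thesis title (English): Deep Co-occurrence Feature Learning for Visual Object Recognition
Advisors: 莊永裕, 林彥宇
Oral defense date: 2017-07-05
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Publication year: 2017
Academic year of graduation: 105 (ROC calendar)
Language: English
Pages: 30
Keywords (Chinese): 影像辨識 (image recognition), 細粒度影像辨識 (fine-grained image recognition), 共現特徵 (co-occurrence feature)
Keywords (English): object recognition, fine-grained recognition, co-occurrence feature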
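Abstract (translated from the Chinese):
This thesis addresses three problems with earlier approaches that combine part-based object representations with convolutional neural networks for image recognition. First, most part-based models require the number and types of parts to be defined manually in advance, yet the parts best suited to recognition often change with the data to be distinguished. Second, most methods need training data annotated with part locations to train the convolutional neural network, and such annotation is costly to produce by hand. Third, to express the spatial relationships between parts, earlier methods often require cumbersome computation or several large neural networks.
We propose a new co-occurrence layer to solve these three problems. The co-occurrence layer extends the idea of a convolutional layer: it uses the network's neurons, learned automatically, in place of predefined object parts, and records the co-occurrence relationships between parts. Within the layer, any two feature maps produced by a convolutional layer serve as a filter and an image, and correlation filtering is performed on the image with the filter. A network with a co-occurrence layer attached can still be trained end to end, and the co-occurrence features the layer produces are robust to rotation, translation, and object deformation.
Adding the co-occurrence layer to VGG-16 and ResNet-152 raises the recognition accuracy on the Caltech-UCSD bird image dataset to 83.6% and 85.8%, respectively. The source code for this thesis is released at https://github.com/yafangshih/Deep-COOC.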
Abstract (English):
This thesis addresses three issues in integrating part-based representations into convolutional neural networks (CNNs) for object recognition. First, most part-based models rely on a few pre-specified object parts. However, the optimal object parts for recognition often vary from category to category. Second, acquiring training data with part-level annotation is labor-intensive. Third, modeling spatial relationships between parts in CNNs often involves an exhaustive search of part templates over multiple network streams. We tackle the three issues by introducing a new network layer, called the co-occurrence layer. It extends a convolutional layer to encode the co-occurrence between the visual parts detected by the numerous neurons, instead of a few pre-specified parts. To this end, the feature maps serve as both filters and images, and mutual correlation filtering is conducted between them. The co-occurrence layer is end-to-end trainable. The resultant co-occurrence features are rotation- and translation-invariant, and are robust to object deformation. By applying this new layer to VGG-16 and ResNet-152, we achieve recognition rates of 83.6% and 85.8% on the Caltech-UCSD bird benchmark, respectively. The source code is available at https://github.com/yafangshih/Deep-COOC.
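The filtering step described in the abstract, where feature maps act as both filters and images, can be illustrated with a small NumPy sketch. This is not the thesis's implementation (the authors' code is at the URL given in the abstract). It assumes that the co-occurrence score of two feature maps is the largest response of a zero-padded cross-correlation between them, and the function names `max_cross_correlation` and `cooccurrence_features` are hypothetical.

```python
import numpy as np


def max_cross_correlation(a, b):
    """Largest response obtained by sliding map `a` over map `b`.

    `b` is zero-padded so that every offset with at least one
    overlapping cell is evaluated (a "full" cross-correlation).
    Assumption: the maximum over offsets is used as the score.
    """
    h, w = a.shape
    padded = np.pad(b, ((h - 1, h - 1), (w - 1, w - 1)))
    best = -np.inf
    for dy in range(padded.shape[0] - h + 1):
        for dx in range(padded.shape[1] - w + 1):
            window = padded[dy:dy + h, dx:dx + w]
            best = max(best, float(np.sum(a * window)))
    return best


def cooccurrence_features(feature_maps):
    """Pairwise scores for an (N, H, W) stack of feature maps.

    Entry (i, j) treats map i as the filter and map j as the image.
    """
    n = feature_maps.shape[0]
    cooc = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            cooc[i, j] = max_cross_correlation(feature_maps[i], feature_maps[j])
    return cooc


if __name__ == "__main__":
    # Two toy 5x5 "feature maps", each with a single active location.
    maps = np.zeros((2, 5, 5))
    maps[0, 1, 1] = 1.0
    maps[1, 2, 3] = 2.0
    print(cooccurrence_features(maps))
```

Because the score is a maximum over all relative offsets, shifting both maps by the same amount (while the activations stay inside the frame) leaves the matrix unchanged, which is one way to read the translation-invariance claim. The brute-force loops are for clarity only; a practical layer would compute the correlations with a convolution routine and would also need the backward pass listed in the table of contents.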
Oral Examination Committee Certification
Acknowledgements
Abstract (Chinese)
Abstract
1 Introduction
2 Related work
2.1 Part-based methods for object recognition
2.2 CNNs with part-based representations
2.3 Fine-grained recognition
3 The proposed approach
3.1 Co-occurrence layer: Forward pass
3.2 Co-occurrence layer: Backward Propagation
3.3 Generalization
4 Implementation details
4.1 Feature maps reduction
4.2 Noise suppression
5 Experimental results
5.1 The CUB200-2011 dataset
5.2 Experimental setup
5.3 Comparison to the baseline
5.4 Comparison to previous work
5.5 Visualization
6 Conclusions
Bibliography