Graduate student (English name): Ya-Fang Shih
Thesis title (English): Deep Co-occurrence Feature Learning for Visual Object Recognition
Keywords (English): object recognition, fine-grained recognition, co-occurrence feature
This thesis addresses three issues in integrating part-based representations into convolutional neural networks (CNNs) for object recognition. First, most part-based models rely on a few pre-specified object parts, whereas the optimal parts for recognition often vary from category to category. Second, acquiring training data with part-level annotation is labor-intensive. Third, modeling spatial relationships between parts in CNNs often involves an exhaustive search of part templates over multiple network streams. We tackle these three issues by introducing a new network layer, called the co-occurrence layer. It extends a convolutional layer to encode the co-occurrence between the visual parts detected by its numerous neurons, instead of a few pre-specified parts. To this end, the feature maps serve as both filters and images, and mutual correlation filtering is conducted between them. The co-occurrence layer is end-to-end trainable. The resultant co-occurrence features are rotation- and translation-invariant, and are robust to object deformation. By applying this new layer to VGG-16 and ResNet-152, we achieve recognition rates of 83.6% and 85.8%, respectively, on the Caltech-UCSD bird benchmark. The source code is available at
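The correlation step described in the abstract, where feature maps act as both filters and images, can be sketched as follows. This is a minimal NumPy illustration, not the thesis's released implementation. It assumes that the co-occurrence between two feature maps is summarized by the maximum cross-correlation response over all spatial offsets, which the abstract does not state explicitly. The function name `cooccurrence_features` and the toy input are hypothetical.

```python
import numpy as np


def cooccurrence_features(feature_maps):
    """Pairwise co-occurrence of N feature maps (illustrative sketch).

    feature_maps: array of shape (N, H, W), e.g. the activations of one
    convolutional layer for one image.
    Returns an (N, N) matrix whose (i, j) entry is the largest
    cross-correlation response between map i and map j over all offsets.
    ASSUMPTION: taking the maximum over offsets is our reading of
    "mutual correlation filtering"; it is not spelled out in the abstract.
    """
    n, h, w = feature_maps.shape
    # Zero-pad so that every relative offset between two maps is covered.
    padded = np.pad(feature_maps, ((0, 0), (h - 1, h - 1), (w - 1, w - 1)))
    cooc = np.full((n, n), -np.inf)
    for i in range(n):
        for dy in range(2 * h - 1):
            for dx in range(2 * w - 1):
                # All N maps, viewed at one spatial offset relative to map i.
                window = padded[:, dy:dy + h, dx:dx + w]
                # Correlate map i (used as the filter) with every shifted map.
                responses = np.tensordot(
                    window, feature_maps[i], axes=([1, 2], [0, 1])
                )
                cooc[i] = np.maximum(cooc[i], responses)
    return cooc


# Toy input: two 3x3 maps, each with a single active location.
maps = np.zeros((2, 3, 3))
maps[0, 0, 0] = 1.0
maps[1, 2, 2] = 2.0
features = cooccurrence_features(maps).ravel()  # N*N-dimensional vector
```

Under this assumption, keeping only the maximum response discards where the best alignment occurred, so jointly shifting the active locations within the maps leaves the output unchanged. The explicit loops favor readability over speed; a practical version would rely on a framework's convolution routines.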
1 Introduction
2 Related work
2.1 Part-based methods for object recognition
2.2 CNNs with part-based representations
2.3 Fine-grained recognition
3 The proposed approach
3.1 Co-occurrence layer: Forward pass
3.2 Co-occurrence layer: Backward Propagation
3.3 Generalization
4 Implementation details
4.1 Feature maps reduction
4.2 Noise suppression
5 Experimental results
5.1 The CUB200-2011 dataset
5.2 Experimental setup
5.3 Comparison to the baseline
5.4 Comparison to previous work
5.5 Visualization
6 Conclusions
