Taiwan National Digital Library of Theses and Dissertations (臺灣博碩士論文加值系統)


Detailed Record

Author: Yin-Hsi Kuo (郭盈希)
Title: Exploiting Contextual Information for Visual Search (利用情境資訊的影像搜尋)
Advisor: 徐宏民 (Winston H. Hsu)
Oral Defense Date: 2017-05-31
Degree: Doctoral
Institution: National Taiwan University
Department: Graduate Institute of Networking and Multimedia
Discipline: Computer Science
Field: Networking
Thesis Type: Academic thesis
Publication Year: 2017
Graduation Academic Year: 105
Language: English
Pages: 96
Keywords (Chinese): visual-text integration, local features, deep features, contextual information, image search
Keywords (English): Bag-of-Words (BoW), Vector of Locally Aggregated Descriptors (VLAD), Deep features, Contextual information, Visual search
Statistics:
  • Cited by: 1
  • Views: 279
  • Downloads: 0
  • Bookmarked: 0
Abstract:
With the prevalence of capture devices, people have become accustomed to sharing their images and videos on social media (e.g., Flickr and Facebook). To provide relevant information (e.g., reviews, landmark names, products) for these uploaded media, effective and efficient visual search (e.g., image retrieval, mobile visual search, product search) is increasingly needed. It enables many applications such as recommendation, annotation, and advertisement. State-of-the-art approaches based on visual features usually suffer from low recall, because small changes in lighting, viewpoint, or occlusion can degrade performance significantly. We observe that large media collections come with rich contextual cues such as tags, geo-locations, descriptions, and timestamps. Hence, we propose to exploit different kinds of contextual information together with state-of-the-art visual features to address these challenges, improving retrieval accuracy and providing diverse search results.
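A minimal sketch of the core idea in the abstract: fusing a visual similarity score with a contextual one so that context can break near-ties that visual features alone rank poorly. This is illustrative only, not the thesis's actual formulation; the cosine/Jaccard choice, the fusion weight `alpha`, and the toy data are all assumptions.

```python
import math

def cosine_sim(a, b):
    """Visual similarity: cosine between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def tag_jaccard(tags_a, tags_b):
    """Contextual similarity: Jaccard overlap of two tag sets."""
    ta, tb = set(tags_a), set(tags_b)
    return len(ta & tb) / len(ta | tb) if (ta | tb) else 0.0

def fused_score(feat_q, feat_d, tags_q, tags_d, alpha=0.7):
    """Late fusion: weighted sum of visual and contextual scores."""
    return alpha * cosine_sim(feat_q, feat_d) + (1 - alpha) * tag_jaccard(tags_q, tags_d)

# Toy query and two database images: both are visually close to the query,
# but only the first shares a tag, so context decides the ranking.
query_feat, query_tags = [1.0, 0.0, 1.0], {"tower", "night"}
db = [
    ([1.0, 0.1, 0.9], {"tower", "paris"}),
    ([0.9, 0.2, 1.0], {"food"}),
]
scores = [fused_score(query_feat, f, query_tags, t) for f, t in db]
ranked = sorted(range(len(db)), key=lambda i: -scores[i])
```

A real system would use BoW, VLAD, or deep features and more elaborate fusion, as the thesis's Chapters 3-5 develop; the sketch only conveys why contextual cues can recover matches that visual features alone miss.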
Table of Contents

Thesis Committee Certification iii

Chinese Abstract (摘要) v

Abstract vii

1 Introduction 1

2 Related Works 5
2.1 Image Retrieval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Mobile Visual Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.3 Product Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

3 BoW for Image Retrieval 11
3.1 Key Observations—Requiring Semantic Feature for Image Retrieval . . . 15
3.1.1 Sparseness of the Visual Words . . . . . . . . . . . . . . . . . . 15
3.1.2 Lacking Semantics Related Features . . . . . . . . . . . . . . . . 16
3.2 Semantic Feature Discovery Framework . . . . . . . . . . . . . . . . . . 17
3.2.1 Graph Construction and Image Clustering . . . . . . . . . . . . . 18
3.2.2 Auxiliary Visual Word Propagation . . . . . . . . . . . . . . . . 19
3.2.3 Common Visual Word Selection . . . . . . . . . . . . . . . . . . 20
3.2.4 Iteration of Propagation and Selection . . . . . . . . . . . . . . . 22
3.3 Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.3.1 Convexity of the Formulations . . . . . . . . . . . . . . . . . . . 22
3.3.2 Gradient Descent Solver (GD) . . . . . . . . . . . . . . . . . . . 23
3.3.3 Analytic Solver (AS) . . . . . . . . . . . . . . . . . . . . . . . . 24
3.4 Tag Refinement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.4.1 Tag Propagation . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.4.2 Tag Selection and Sparsity of Tags . . . . . . . . . . . . . . . . . 27

4 VLAD for Mobile Visual Search 29
4.1 Context-Aware BoW Reconstruction . . . . . . . . . . . . . . . . . . . . 31
4.1.1 BoW Reconstruction from VLAD with Sparse Constraint . . . . . 32
4.1.2 Context-Aware Dictionary Selection (CADS) . . . . . . . . . . . 34
4.1.3 BoW Reconstruction with Prior Knowledge (BRPK) . . . . . . . 34
4.2 Reversible Binary Code Generation . . . . . . . . . . . . . . . . . . . . 36
4.2.1 Principal Component Analysis Hashing (Joint PCAH) . . . . . . 36
4.2.2 Memory-Efficient Binary Hashing (Ind. and Shared PCAH) . . . 37
4.2.3 VLAD Approximation from Binary Codes . . . . . . . . . . . . 38

5 Deep Features for Product Search 39
5.1 Observations on E-Commerce Data . . . . . . . . . . . . . . . . . . . . 41
5.1.1 Higher Similarity Scores on Target Products . . . . . . . . . . . . 43
5.1.2 Higher Retrieval Accuracy with Objects . . . . . . . . . . . . . . 44
5.2 Rank-Based Feature Learning . . . . . . . . . . . . . . . . . . . . . . . 44
5.2.1 Global Feature Extraction from ConvNets . . . . . . . . . . . . . 44
5.2.2 Attention-Based Local Feature Extraction . . . . . . . . . . . . . 46
5.2.3 Learning by Rank-Based Candidate Selection . . . . . . . . . . . 46

6 Experiment Settings 49
6.1 Settings for Image Retrieval . . . . . . . . . . . . . . . . . . . . . . . . 49
6.1.1 Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
6.1.2 Performance Metrics . . . . . . . . . . . . . . . . . . . . . . . . 49
6.1.3 Evaluation Protocols . . . . . . . . . . . . . . . . . . . . . . . . 50
6.2 Settings for Mobile Visual Search . . . . . . . . . . . . . . . . . . . . . 51
6.3 Settings for Product Search . . . . . . . . . . . . . . . . . . . . . . . . . 52
6.3.1 Parameter Setting and Evaluation . . . . . . . . . . . . . . . . . 52
6.3.2 Initial Setting for the Attention Layer . . . . . . . . . . . . . . . 52
6.3.3 DeepFashion and Alibaba Datasets . . . . . . . . . . . . . . . . 53

7 Experiment Results and Discussions 55
7.1 Results on Image Retrieval . . . . . . . . . . . . . . . . . . . . . . . . . 55
7.1.1 The Performance of Auxiliary Visual Words . . . . . . . . . . . . 55
7.1.2 The Performance of Tag Refinement . . . . . . . . . . . . . . . . 58
7.1.3 Parameter Sensitivity . . . . . . . . . . . . . . . . . . . . . . . . 62
7.1.4 Brief Summary for Image Retrieval . . . . . . . . . . . . . . . . 63
7.2 Results on Mobile Visual Search . . . . . . . . . . . . . . . . . . . . . . 63
7.2.1 Experiments on the Choice of Lambda (λ) . . . . . . . . . . . 63
7.2.2 BoW Reconstruction on SMVS Dataset . . . . . . . . . . . . . . 65
7.2.3 VLAD Approximation from Reversible Binary Codes . . . . . . 66
7.2.4 BoW Reconstruction on Oxford Buildings Dataset . . . . . . . . 68
7.2.5 Brief Summary for Mobile Visual Search . . . . . . . . . . . . . 72
7.3 Results on Product Search . . . . . . . . . . . . . . . . . . . . . . . . . 72
7.3.1 Accuracy on Global and Local Features . . . . . . . . . . . . . . 72
7.3.2 Combined Features on DeepFashion Testing . . . . . . . . . . . 73
7.3.3 Experiments on AlibabaS . . . . . . . . . . . . . . . . . . . . . 75
7.3.4 Learning with Different Hard Negatives . . . . . . . . . . . . . . 75
7.3.5 Learning with Ambiguous (Hard) Positives . . . . . . . . . . . . 76
7.3.6 Experiments on AlibabaM (testing) . . . . . . . . . . . . . . . . 77
7.3.7 Visualization on the Attention Layer . . . . . . . . . . . . . . . . 79
7.3.8 Brief Summary for Product Search . . . . . . . . . . . . . . . . . 79

8 Conclusions 81

Bibliography 83