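Graduate student: 林冠宏 (Kuan-Hung Lin)
Thesis title: 以擬似物件提昇物件搜尋效能
Thesis title (English): Boosting Object Retrieval by Estimating Pseudo-Objects
Advisor: 徐宏民 (Winston H. Hsu)
Degree: Master's
Institution: National Taiwan University (國立臺灣大學)
Department: Graduate Institute of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2009
Academic year of graduation: 97 (ROC calendar; 2008-2009)
Language: English
Number of pages: 37
Keywords: image retrieval, object retrieval, pseudo-object, visual word, local feature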
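Current object retrieval systems mostly build on visual words: they construct for each image a feature vector that encodes its local features, and they search by the distance or similarity between feature vectors in the vector space. Although such a feature vector accounts for local features, it represents the whole image with a single vector. When an image contains several objects plus background, the information from the individual objects is blended together. The resulting vector is mixed and diluted, represents no single object in the image, and leads to poor object retrieval results. We propose the concept of pseudo-objects to approximate the actual objects in an image and thereby resolve this problem. A pseudo-object is a subset of an image's feature points that are spatially close to one another. Each subset has its own feature vector representing the local region it covers, so each image has multiple feature-point subsets, that is, multiple pseudo-objects standing for the objects the image may actually contain. We propose three effective methods for estimating pseudo-objects: the grid method (Grid), the G-means method, and the Gaussian mixture model with Bayesian information criterion method (GMM-BIC). Object retrieval experiments on two consumer photo datasets show that all three proposed methods far outperform current object retrieval systems.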
State-of-the-art object retrieval systems are mostly based on the bag-of-visual-words representation, which encodes the local appearance information of an image in a feature vector. A search is performed by comparing the query object's feature vector with those of the database images. However, a database image's vector generally carries mixed information from the entire image, which may contain multiple objects and background. Search quality is degraded by such noisy (or diluted) feature vectors. We address this issue by introducing the concept of pseudo-objects to approximate candidate objects in database images. A pseudo-object is a subset of proximate feature points in an image, with its own feature vector to represent a local area. We investigate effective methods (e.g., grid, G-means, and GMM-BIC) to estimate pseudo-objects. In experiments on two consumer photo benchmarks, we demonstrate that the proposed methods significantly outperform other state-of-the-art object retrieval algorithms.
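The dilution problem and the grid variant of pseudo-objects can be illustrated with a small sketch. This is not the thesis implementation: the 2x2 grid, the cosine similarity, the rule of scoring an image by its best-matching pseudo-object, and all function names (`grid_pseudo_objects`, `cosine`, `score_image`) are assumptions chosen for illustration, since the abstract does not specify them. Feature points are given as hypothetical `(x, y, visual_word_id)` triples.

```python
import math
from collections import Counter


def grid_pseudo_objects(points, width, height, rows=2, cols=2):
    """Group feature points by grid cell (assumed grid size) and
    return one visual-word histogram per non-empty cell."""
    cells = {}
    for x, y, word in points:
        r = min(int(y * rows / height), rows - 1)
        c = min(int(x * cols / width), cols - 1)
        cells.setdefault((r, c), Counter())[word] += 1
    return list(cells.values())


def cosine(a, b):
    """Cosine similarity between two sparse word histograms."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score_image(query_hist, pseudo_objects):
    """Assumed scoring rule: the best match over all pseudo-objects."""
    return max((cosine(query_hist, p) for p in pseudo_objects), default=0.0)


# Toy 100x100 image: the queried object (words 1 and 2) sits in the
# top-left corner; the remaining points are background clutter.
points = [(10, 10, 1), (20, 15, 2), (80, 20, 7),
          (85, 80, 8), (15, 90, 9), (70, 70, 7)]
query = Counter({1: 1, 2: 1})

whole_image = Counter(word for _, _, word in points)
pseudo_objects = grid_pseudo_objects(points, width=100, height=100)

whole_score = cosine(query, whole_image)            # diluted by clutter
pseudo_score = score_image(query, pseudo_objects)   # matches one cell
print(whole_score, pseudo_score)
```

In this toy case the single whole-image vector scores lower than the best pseudo-object, because the clutter words inflate the image vector's norm without contributing to the match. G-means or GMM-BIC would replace the fixed grid with clusters of feature-point coordinates whose number is estimated from the data.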
Abstract (in Chinese)
Abstract
Chapter 1 Introduction
1.1 Vector Space Model for Image Retrieval
1.2 Object Retrieval
1.3 Bag-of-Words Representation
1.4 Spatial Pyramid Matching
1.5 Limitations of Prior Works
Chapter 2 Pseudo-Objects
2.1 The Grid Method
2.2 The G-means Method
2.3 The Gaussian Mixture Model with Bayesian Information Criterion Method
2.4 Image Scoring Based on Pseudo-Objects
Chapter 3 Evaluation
3.1 Benchmarks
3.2 Implementation
3.3 Results and Discussion
Chapter 4 Conclusions and Future Work
Bibliography