臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)

Detailed Record

Student: 陳則佑 (Chen, Ze-Yiou)
Title: Learning 2D-to-3D Correspondence with Graph Neural Networks (Chinese: 利用GNN網路學習估測2D-3D特徵對應關係)
Advisor: 陳冠文 (Chen, Kuan-Wen)
Committee: 石勝文 (Shih, Sheng-Wen); 謝君偉 (Hsieh, Jun-Wei); 林志瑋 (Lin, Chih-Wei); 陳冠文 (Chen, Kuan-Wen)
Oral defense date: 2021-10-12
Degree: Master's
Institution: National Yang Ming Chiao Tung University
Department: Institute of Computer Science and Engineering
Discipline: Engineering
Academic field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2021
Graduation academic year: 110 (ROC calendar)
Language: English
Number of pages: 35
Keywords (Chinese): 特徵比對, 視覺定位, 深度學習, 2D-3D對應, 電腦視覺
Keywords (English): Feature Matching, Deep Learning, Visual Localization, 2D-3D Correspondence, Computer Vision
Statistics:
  • Cited: 0
  • Views: 299
  • Downloads: 15
  • Bookmarked: 0
Abstract (translated from Chinese): Visual localization is an indispensable component of fields such as autonomous driving, augmented reality, and autonomous drone flight. In recent years, model-based localization methods that use 2D-to-3D correspondences have far outperformed other approaches on challenging scenes and are widely deployed in a variety of applications. When associating a 2D query image with a 3D point-cloud model, most existing methods already use deep learning to strengthen the feature detector and descriptor, but descriptor matching still relies on the traditional ratio test. Although many learning-based 2D-to-2D matchers have appeared in the past couple of years, to the best of our knowledge no method has successfully applied deep learning to matching 2D image features against a 3D point cloud. In this thesis we introduce a deep network named the Hierarchical 2D-to-3D Matching Network (H3M-Net), which uses graph neural networks to predict 2D-to-3D feature matches. Unlike the 2D-to-2D case, where a single image typically yields no more than two thousand keypoints, a point cloud often contains more than twice as many points, up to tens of thousands, so existing 2D-to-2D matchers cannot be applied directly even after modification. We therefore propose a coarse-to-fine hierarchical method that applies two GNN-based modules, named 3D SuperGlue, trained with different loss functions, to 2D-to-3D matching. This method avoids the GPU-memory problems caused by overly large 3D inputs while achieving more accurate results.
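The abstract notes that descriptor matching in existing pipelines still relies on the traditional ratio test (Lowe's ratio test): a match is accepted only when the nearest descriptor is clearly closer than the second-nearest. A minimal pure-Python sketch of that baseline (descriptor values and the 0.8 threshold are illustrative; real pipelines use SIFT-style descriptors and approximate nearest-neighbor search):

```python
def l2(a, b):
    """Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def ratio_test_match(query, candidates, ratio=0.8):
    """Return the index of the best-matching candidate descriptor,
    or None when the best match is not distinctive enough
    (i.e., the two nearest candidates are nearly equidistant)."""
    order = sorted(range(len(candidates)), key=lambda i: l2(query, candidates[i]))
    best, second = order[0], order[1]
    if l2(query, candidates[best]) < ratio * l2(query, candidates[second]):
        return best
    return None
```

An unambiguous query returns its nearest neighbor; a query that sits between two similar candidates is rejected rather than matched unreliably.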
Abstract (English): Visual localization is essential in numerous applications, such as mobile robotics, augmented reality, and self-flying drones. Currently, model-based methods, which estimate 2D-3D correspondences to recover the camera pose, outperform other approaches in challenging environments and are widely used in real-world applications. Existing approaches mainly focus on improving matching performance with learning-based detectors or descriptors, but the correspondence is still estimated with the traditional ratio test. Although recent research has started to develop 2D-to-2D learning-based matchers, to the best of our knowledge none exploits neural networks to regress the mapping between 2D features and a 3D point cloud. In this paper, we introduce the Hierarchical 2D-to-3D Matching Network (H3M-Net), a novel hierarchical approach based on graph neural networks, which predicts feature correspondences between 2D images and a 3D point cloud. Whereas a 2D-to-2D learning-based matcher has only thousands of input points, a point cloud usually has many more (i.e., millions), so existing learning-based matchers cannot be extended directly to find 2D-to-3D correspondences. To this end, we propose a hierarchical coarse-to-fine approach comprising two GNN-based modules, named 3D SuperGlue, trained with different loss functions to achieve high-precision results without requiring large GPU memory.
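The hierarchical idea in the abstract can be sketched in two stages: a coarse stage restricts the huge point cloud to one small region, and a fine stage matches only within it, so the full cloud never has to fit in memory at once. This is only an illustration of that control flow under assumed inputs: H3M-Net uses GNN-based modules (3D SuperGlue) for both stages, whereas here plain nearest-centroid and nearest-descriptor selection stand in, and `clusters` is a hypothetical pre-grouping of the 3D points:

```python
def l2(a, b):
    """Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def centroid(points):
    """Mean descriptor of a cluster of 3D points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]

def coarse_to_fine_match(feat, clusters):
    """feat: descriptor of one 2D image feature.
    clusters: list of clusters, each a list of 3D point descriptors.
    Returns (cluster_index, point_index) of the matched 3D point."""
    # Coarse stage: pick the cluster whose centroid descriptor is closest,
    # discarding the rest of the point cloud.
    cid = min(range(len(clusters)),
              key=lambda i: l2(feat, centroid(clusters[i])))
    # Fine stage: match only against the points of the selected cluster.
    pid = min(range(len(clusters[cid])),
              key=lambda j: l2(feat, clusters[cid][j]))
    return cid, pid
```

The memory saving comes from the fine stage only ever seeing one cluster; in the thesis the same split is what keeps the GNN's input small enough for the GPU.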
Table of Contents:
Abstract (Chinese) ... i
Abstract (English) ... ii
Table of Contents ... iii
List of Figures ... iv
List of Tables ... v
1. Introduction ... 1
2. Related Works ... 5
2.1 Camera Localization ... 5
2.2 Local Feature Matching ... 7
2.3 2D-to-2D Feature Matcher and Filter ... 9
3. Methods ... 12
3.1 Hierarchical 2D-to-3D Matching Network (H3M-Net) ... 12
3.2 3D SuperGlue ... 14
3.3 Coarse Level ... 16
3.4 Fine Level ... 18
3.5 Training Procedure ... 19
4. Implementation Details ... 20
5. Experiments ... 23
5.1 Dataset ... 23
5.2 Evaluation and Comparison ... 26
5.3 Ablation Study ... 29
6. Conclusions ... 31
References ... 33