Thesis title (English): Multiple Feature Information Reorganization and Fusion YOLO based 6D Objects Pose Estimation
Keywords (English): Deep Learning, YOLO, 6D Pose, Perspective-n-Point
In computer vision and robotics, recognizing specific objects in an image and determining their position and orientation relative to a coordinate system has long been an important task. With the steady progress of deep learning research in recent years, object recognition has been applied in a growing number of fields, including 6D object pose estimation. This study improves the Real-Time Seamless Single Shot 6D Object Pose Prediction (Singleshot6D) method proposed by Tekin et al. [19]. A YOLO network first detects the 2D projections of the keypoints of the object's 3D bounding box and predicts the object's class; the Perspective-n-Point (PnP) algorithm then computes the object's 6D pose from these keypoints. Building on the original YOLO architecture, this study repeatedly reorganizes feature information from shallow layers and fuses it into deeper layers. The reorganization resizes the feature maps without losing any information, which improves the model's ability to detect small objects. Multiple 1×1 convolution layers reduce the number of model parameters and improve generalization, and the activation function is changed to Randomized Leaky ReLU (RReLU), which partially retains the negative-input information that ReLU discards entirely. To evaluate the improved Singleshot6D, its results on the LINEMOD and OCCLUSION datasets are compared with the original Singleshot6D and other CNN-based methods. The proposed architecture shows a clear advantage in both the 2D reprojection and the 3D distance evaluations, and achieves substantially higher accuracy than the other architectures at a comparable running time.
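The pose-recovery step described above, turning the predicted 2D keypoints into a 6D pose with PnP, can be illustrated with the simplest linear solver, the Direct Linear Transform (DLT). This is a minimal sketch under stated assumptions, not the thesis's code: the keypoints are assumed to be the 8 box corners plus the centroid (the layout used by Singleshot6D), and the camera intrinsics `K`, box size, and function names (`rotation_from_axis_angle`, `project`, `pnp_dlt`) are hypothetical.

```python
import numpy as np


def rotation_from_axis_angle(axis, angle):
    """Rodrigues' formula: rotation of `angle` radians about `axis`."""
    k = np.asarray(axis, dtype=float)
    k = k / np.linalg.norm(k)
    skew = np.array([[0.0, -k[2], k[1]],
                     [k[2], 0.0, -k[0]],
                     [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle) * skew + (1.0 - np.cos(angle)) * skew @ skew


def project(points_3d, R, t, K):
    """Pinhole projection of (N, 3) model points to (N, 2) pixel coordinates."""
    cam = points_3d @ R.T + t      # model frame -> camera frame
    uvw = cam @ K.T                # camera frame -> homogeneous pixels
    return uvw[:, :2] / uvw[:, 2:3]


def pnp_dlt(points_3d, points_2d, K):
    """Estimate (R, t) from >= 6 non-coplanar 2D-3D correspondences (linear DLT)."""
    n = len(points_3d)
    # Remove the intrinsics so the remaining model is x ~ [R | t] X.
    rays = np.hstack([points_2d, np.ones((n, 1))]) @ np.linalg.inv(K).T
    x = rays[:, :2] / rays[:, 2:3]
    X = np.hstack([points_3d, np.ones((n, 1))])
    # Each correspondence gives two linear equations in the 12 entries of P.
    A = np.zeros((2 * n, 12))
    A[0::2, 0:4] = X
    A[0::2, 8:12] = -x[:, 0:1] * X
    A[1::2, 4:8] = X
    A[1::2, 8:12] = -x[:, 1:2] * X
    # P (up to scale) is the right singular vector of the smallest singular value.
    P = np.linalg.svd(A)[2][-1].reshape(3, 4)
    if np.linalg.det(P[:, :3]) < 0:    # pick the sign that yields a proper rotation
        P = -P
    U, S, Vt = np.linalg.svd(P[:, :3])
    R = U @ Vt                         # nearest rotation matrix
    t = P[:, 3] / S.mean()             # undo the unknown scale factor
    return R, t


# Hypothetical setup: illustrative intrinsics and a 10 x 8 x 6 cm box.
K = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])
half = np.array([0.05, 0.04, 0.03])
corners = np.array([[sx, sy, sz] for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]) * half
keypoints_3d = np.vstack([np.zeros((1, 3)), corners])   # centroid + 8 corners
R_true = rotation_from_axis_angle([1.0, 2.0, 3.0], 0.7)
t_true = np.array([0.05, -0.02, 0.6])
keypoints_2d = project(keypoints_3d, R_true, t_true, K)
R_est, t_est = pnp_dlt(keypoints_3d, keypoints_2d, K)
print(np.abs(R_est - R_true).max(), np.abs(t_est - t_true).max())
```

With noise-free keypoints the linear solution is exact; with noisy network predictions, pipelines usually refine the pose with an iterative solver such as OpenCV's `cv2.solvePnP`, so this DLT sketch is meant for transparency rather than accuracy.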
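The claim that reorganization resizes a feature map "without losing any information" follows from the fact that it only moves values: each stride × stride spatial block is stacked into the channel axis. The sketch below is a generic space-to-depth operation written under that assumption; the channel ordering may differ from Darknet's `reorg` layer, and the function names and the 26×26 / 13×13 shapes are illustrative, not taken from the thesis.

```python
import numpy as np


def reorg(x: np.ndarray, stride: int = 2) -> np.ndarray:
    """Space-to-depth: (C, H, W) -> (C * stride**2, H // stride, W // stride)."""
    c, h, w = x.shape
    if h % stride or w % stride:
        raise ValueError("H and W must be divisible by stride")
    # Split each spatial axis into (block index, offset inside the block).
    x = x.reshape(c, h // stride, stride, w // stride, stride)
    # Move both in-block offsets next to the channel axis, then merge them into it.
    return x.transpose(0, 2, 4, 1, 3).reshape(c * stride * stride, h // stride, w // stride)


def inverse_reorg(y: np.ndarray, stride: int = 2) -> np.ndarray:
    """Depth-to-space: exact inverse of reorg()."""
    cs, hb, wb = y.shape
    c = cs // (stride * stride)
    y = y.reshape(c, stride, stride, hb, wb)
    return y.transpose(0, 3, 1, 4, 2).reshape(c, hb * stride, wb * stride)


# Illustrative shapes: a 26x26 shallow map fused into a 13x13 deep map.
shallow = np.random.default_rng(0).standard_normal((64, 26, 26))
deep = np.zeros((1024, 13, 13))
fused = np.concatenate([reorg(shallow), deep], axis=0)
print(fused.shape)  # (1280, 13, 13)
```

Because `inverse_reorg(reorg(x))` reproduces `x` exactly, the shallow layer's fine spatial detail reaches the deep layer intact, unlike pooling or strided convolution, which discard or mix values.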
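The parameter saving from 1×1 convolutions can be checked with simple arithmetic, and the layer itself is just a per-pixel linear map across channels. In this sketch the channel counts (1024 and 512) are hypothetical, not the thesis's actual configuration, biases are omitted as is common when batch normalization follows, and `conv_weights` / `conv1x1` are illustrative names.

```python
import numpy as np


def conv_weights(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a k x k convolution (bias omitted, as when batch norm follows)."""
    return c_in * c_out * k * k


def conv1x1(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """1x1 convolution: a per-pixel linear map from C_in to C_out channels.

    x has shape (C_in, H, W); weight has shape (C_out, C_in).
    """
    return np.einsum("oc,chw->ohw", weight, x)


# Hypothetical layer sizes: a direct 3x3 conv vs. a 1x1 bottleneck followed by a 3x3 conv.
direct = conv_weights(1024, 1024, 3)
bottleneck = conv_weights(1024, 512, 1) + conv_weights(512, 1024, 3)
print(direct, bottleneck)  # 9437184 5242880

x = np.random.default_rng(0).standard_normal((1024, 13, 13))
w = np.random.default_rng(1).standard_normal((512, 1024))
print(conv1x1(x, w).shape)  # (512, 13, 13)
```

Here the bottleneck needs about 56% of the weights of the direct 3×3 layer while keeping the same input and output widths; any effect on generalization is the thesis's empirical claim and is not something this arithmetic shows.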
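RReLU differs from ReLU only for negative inputs: instead of setting them to zero, it scales them by a small slope drawn at random during training and fixed at the slope range's mean during inference. The sketch assumes the slope range `lower=1/8`, `upper=1/3`, which are the defaults of PyTorch's `torch.nn.RReLU`; the range actually used in the thesis is not stated here, and `rrelu` is an illustrative NumPy function, not a library call.

```python
import numpy as np


def rrelu(x, lower=1/8, upper=1/3, training=True, rng=None):
    """Randomized leaky ReLU.

    Training: each negative input is scaled by its own slope drawn from U(lower, upper).
    Inference: the fixed mean slope (lower + upper) / 2 is used instead.
    """
    x = np.asarray(x, dtype=float)
    if training:
        rng = np.random.default_rng() if rng is None else rng
        slope = rng.uniform(lower, upper, size=x.shape)
    else:
        slope = (lower + upper) / 2
    return np.where(x >= 0, x, slope * x)


x = np.array([-2.0, -0.5, 0.0, 1.5])
print(np.maximum(x, 0))                         # ReLU: negatives become 0
print(rrelu(x, training=False))                 # inference: fixed mean slope
print(rrelu(x, rng=np.random.default_rng(0)))   # training: random slopes
```

The random slope acts as a mild regularizer during training, while the deterministic mean slope keeps inference reproducible.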
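The 2D reprojection and 3D distance evaluations mentioned above compare an estimated pose with the ground truth by transforming the object model under both. The sketch assumes the acceptance thresholds commonly used in the 6D pose literature (mean reprojection error below 5 pixels; mean 3D model-point distance, often called ADD, below 10% of the model diameter); the thesis's exact settings are not given here. The box "model", intrinsics, and function names are hypothetical, and real evaluations use the object's mesh vertices.

```python
import numpy as np


def transform(points, R, t):
    """Apply a rigid pose (R, t) to (N, 3) model points."""
    return points @ R.T + t


def project(points, R, t, K):
    """Pinhole projection of (N, 3) model points to (N, 2) pixels."""
    uvw = transform(points, R, t) @ K.T
    return uvw[:, :2] / uvw[:, 2:3]


def reprojection_error(points, pose_est, pose_gt, K):
    """Mean pixel distance between the model projected with the two poses."""
    diff = project(points, *pose_est, K) - project(points, *pose_gt, K)
    return float(np.linalg.norm(diff, axis=1).mean())


def add_error(points, pose_est, pose_gt):
    """Mean distance (model units) between the model transformed by the two poses."""
    diff = transform(points, *pose_est) - transform(points, *pose_gt)
    return float(np.linalg.norm(diff, axis=1).mean())


def diameter(points):
    """Largest distance between any two model points."""
    d = points[:, None, :] - points[None, :, :]
    return float(np.linalg.norm(d, axis=-1).max())


# Hypothetical model: the 8 corners of a 10 x 8 x 6 cm box.
half = np.array([0.05, 0.04, 0.03])
model = np.array([[sx, sy, sz] for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]) * half
K = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])
pose_gt = (np.eye(3), np.array([0.0, 0.0, 0.5]))
pose_est = (np.eye(3), np.array([0.002, 0.0, 0.5]))   # 2 mm error along x

err_2d = reprojection_error(model, pose_est, pose_gt, K)
err_3d = add_error(model, pose_est, pose_gt)
print(err_2d < 5.0, err_3d < 0.1 * diameter(model))  # True True
```

A pure 2 mm translation error moves every model point by exactly 2 mm in 3D, but its pixel effect shrinks with depth, which is why the two metrics can rank the same estimates differently.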
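Abstract
Acknowledgments
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation and Objectives
1.3 Thesis Outline
Chapter 2 Literature Review
2.1 Deep Learning Methods
2.1.1 Two-stage Architectures
2.1.2 One-stage Architectures
2.2 Perspective-n-Point
2.2.1 Relationship Between 3D Points and 2D Points
2.2.2 PnP
2.3 CNN Methods for 6D Pose Estimation
Chapter 3 Methodology
3.1 Reorganization
3.2 1×1 Convolution Layers
3.3 RReLU
3.4 Model Architecture
3.5 Training Procedure
3.6 Testing Procedure
3.7 Data Augmentation
Chapter 4 Experimental Results
4.1 Datasets
4.1.1 The LINEMOD Dataset
4.1.2 The OCCLUSION Dataset
4.2 Experimental Environment and Parameter Settings
4.3 2D Reprojection Results and Comparison
4.4 Average 3D Distance of Model Points: Results and Comparison
4.5 Comparison of Results on the OCCLUSION Dataset
4.6 Comparison of Results Using Different Shallow Feature Maps
4.7 Figures of Experimental Results
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References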
[1] R. Girshick, J. Donahue, T. Darrell, J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014.
[2] R. Girshick. Fast R-CNN. In ICCV, 2015, pp. 1440-1448.
[3] S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In NIPS, 2015.
[4] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg. SSD: Single Shot MultiBox Detector. In ECCV, 2016.
[5] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi. You Only Look Once: Unified, Real-Time Object Detection. In CVPR, 2016.
[6] J. Redmon and A. Farhadi. YOLO9000: Better, Faster, Stronger. In CVPR, 2017.
[7] C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20(3):273-297, 1995.
[8] K. He, G. Gkioxari, P. Dollár, R. Girshick. Mask R-CNN. In ICCV, 2017.
[9] K. He, X. Zhang, S. Ren, J. Sun. Deep Residual Learning for Image Recognition. In CVPR, 2016.
[10] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, S. Belongie. Feature Pyramid Networks for Object Detection. In CVPR, 2017.
[11] M. Fischler, R. Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, June 1981.
[12] S. Tulsiani and J. Malik. Viewpoints and Keypoints. In CVPR, 2015.
[13] A. Kendall, M. Grimes, R. Cipolla. PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization. In ICCV, 2015.
[14] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich. Going deeper with convolutions. In CVPR, 2015.
[15] Y. Xiang, T. Schmidt, V. Narayanan, and D. Fox. PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes. arXiv preprint arXiv:1711.00199, 2017.
[16] K. Simonyan, A. Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv preprint arXiv:1409.1556, 2015.
[17] P. Poirson, P. Ammirato, C.-Y. Fu, W. Liu, J. Kosecka, and A. C. Berg. Fast Single Shot Detection and Pose Estimation. In 3DV, 2016.
[18] C. Szegedy, S. Ioffe, V. Vanhoucke, A. Alemi. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. In AAAI, 2017.
[19] B. Tekin, S. Sinha, P. Fua. Real-Time Seamless Single Shot 6D Object Pose Prediction. In CVPR, 2018.
[20] X. Glorot, A. Bordes, Y. Bengio. Deep Sparse Rectifier Neural Networks. In AISTATS, 2011.
[21] B. Xu, N. Wang, T. Chen, M. Li. Empirical evaluation of rectified activations in convolutional network. arXiv preprint arXiv:1505.00853, 2015.
[22] A. Maas, A. Hannun, A. Ng. Rectified Nonlinearities Improve Neural Network Acoustic Models. In ICML, 2013.
[23] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.
[24] S. Hinterstoisser, V. Lepetit, S. Ilic, S. Holzer, G. Bradski, K. Konolige, and N. Navab. Model Based Training, Detection and Pose Estimation of Texture-less 3D Objects in Heavily Cluttered Scenes. In ACCV, 2012.
[25] A. Krizhevsky, I. Sutskever, G. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. In NIPS, 2012.
[26] F. Michel, A. Kirillov, E. Brachmann, A. Krull, S. Gumhold, B. Savchynskyy, and C. Rother. Global Hypothesis Generation for 6D Object Pose Estimation. In CVPR, 2017.
[27] M. Rad and V. Lepetit. BB8: A Scalable, Accurate, Robust to Partial Occlusion Method for Predicting the 3D Poses of Challenging Objects without Using Depth. In ICCV, 2017.
[28] P. Besl, N. McKay. A Method for Registration of 3-D Shapes. In PAMI, 1992.
[29] P. Poirson, P. Ammirato, C.-Y. Fu, W. Liu, J. Kosecka, and A. C. Berg. Fast Single Shot Detection and Pose Estimation. In 3DV, 2016.
[30] G. Huang, Z. Liu, L. van der Maaten, K. Weinberger. Densely Connected Convolutional Networks. In CVPR, 2017.
Electronic full text (publicly available online from 2025-07-21)