Thesis title (English): Multiple Feature Information Reorganization and Fusion YOLO based 6D Objects Pose Estimation
Keywords (English): Deep Learning; YOLO; 6D Pose; Perspective-n-Point
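In computer vision and robotics, recognizing specific objects in an image and determining their position and orientation relative to a coordinate system has long been an important task. Thanks to the steady progress of deep learning research in recent years, object recognition has been applied in a growing number of fields, including 6D object pose estimation. This study improves on Real-Time Seamless Single Shot 6D Object Pose Prediction (Singleshot6D), proposed by Bugra Tekin et al., which first uses a YOLO network to detect the keypoints of the object's 3D bounding box as projected onto the 2D image and to classify the object, and then computes the object's 6D pose with the Perspective-n-Point (PnP) algorithm. Building on the original YOLO architecture, this study proposes repeatedly reorganizing shallow-layer feature information and fusing it into the deep layers, so that feature maps are resized without losing information and the architecture becomes better at detecting small objects. Multiple 1×1 convolution layers then reduce the number of model parameters and increase the model's generalization ability, and the activation function is changed to Randomized Leaky ReLU (RReLU), which partially retains the information that ReLU would discard entirely. To test the improved Singleshot6D, its results on the LINEMOD and OCCLUSION datasets are compared with those of the original Singleshot6D and other CNN-based methods. The proposed architecture shows a clear advantage in both the 2D reprojection and the 3D distance experiments, and within the same class of computation time its accuracy is far ahead of the other architectures.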
In computer vision and robotics, identifying specific targets in an image and determining their position and pose relative to a coordinate system has always been a significant task. Thanks to the continued development of deep learning research in recent years, target recognition technology has been applied in more and more domains, including 6D object pose estimation. This study enhances the Real-Time Seamless Single Shot 6D Object Pose Prediction (Singleshot6D) method proposed by Bugra Tekin et al. A YOLO network is first employed to detect the keypoints of the object's 3D bounding box projected onto the 2D image and to identify the object's category; the object's 6D pose is then calculated with the Perspective-n-Point (PnP) algorithm. Based on the original YOLO architecture, this study proposes repeatedly reorganizing shallow-layer feature information and fusing it into the deep layers of the network, resizing the feature maps without losing any of their information, to improve the architecture's ability to detect small targets. Multiple 1×1 convolution layers are applied to reduce the number of model parameters and increase generalization ability. The activation function is replaced with Randomized Leaky ReLU (RReLU) to partially retain the information that was entirely discarded by the original ReLU. To test the performance of the enhanced Singleshot6D, its results on the LINEMOD and OCCLUSION datasets were compared with those of the original Singleshot6D and other CNN-based methods. The proposed architecture shows a clear advantage in both the 2D reprojection and the 3D distance experiments, and achieves higher accuracy than the other architectures at a comparable level of computation time.
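To make two of the components named in the abstract concrete, the following is a minimal sketch, not the thesis implementation. It assumes that "reorganization" is a space-to-depth rearrangement in the spirit of the YOLOv2 reorg (passthrough) layer, and that RReLU follows the common convention of sampling a negative slope uniformly during training and using the mean slope at test time. The function names `reorg` and `rrelu`, the output channel order, and the default bounds 1/8 and 1/3 are assumptions made for illustration.

```python
import random


def reorg(feature_map, stride=2):
    """Space-to-depth on a nested list shaped [C][H][W].

    Returns a list shaped [C * stride**2][H // stride][W // stride].
    Every input value is moved, none is averaged or dropped, so the
    spatial resize loses no information. The output channel order
    (offset-major) is an assumption of this sketch.
    """
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    if height % stride or width % stride:
        raise ValueError("height and width must be divisible by stride")
    out = []
    for dy in range(stride):          # row offset inside each stride x stride cell
        for dx in range(stride):      # column offset inside each cell
            for c in range(channels):
                out.append([
                    [feature_map[c][y][x] for x in range(dx, width, stride)]
                    for y in range(dy, height, stride)
                ])
    return out


def rrelu(x, lower=1 / 8, upper=1 / 3, training=False, rng=random):
    """Randomized Leaky ReLU for one scalar (assumed convention).

    Non-negative inputs pass through unchanged. Negative inputs are scaled
    by a slope drawn from U(lower, upper) in training, or by the mean
    slope (lower + upper) / 2 at test time, instead of being zeroed.
    """
    if x >= 0:
        return x
    slope = rng.uniform(lower, upper) if training else (lower + upper) / 2
    return slope * x


if __name__ == "__main__":
    # One 4x4 channel holding the values 0..15 in row-major order.
    fmap = [[[4 * y + x for x in range(4)] for y in range(4)]]
    for channel in reorg(fmap):
        print(channel)
    print(rrelu(-8.0), rrelu(3.0))
```

For the 4×4 example, `reorg` yields four 2×2 channels, the first being `[[0, 2], [8, 10]]`, and together they contain all sixteen original values. This is the sense in which a shallow, high-resolution feature map can be brought down to the spatial size of a deeper one before concatenation without discarding information.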
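Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation and Objectives
1.3 Thesis Outline
Chapter 2 Literature Review
2.1 Deep Learning Methods
2.1.1 Two-stage Architectures
2.1.2 One-stage Architectures
2.2 Perspective-n-Point
2.2.1 Relationship Between 3D Points and 2D Points
2.2.2 PnP
2.3 CNN-based Methods for 6D Pose Estimation
Chapter 3 Methodology
3.1 Reorganization
3.2 1×1 Convolution Layers
3.3 RReLU
3.4 Model Architecture
3.5 Training Procedure
3.6 Testing Procedure
3.7 Data Augmentation
Chapter 4 Experimental Results
4.1 Datasets
4.1.1 The LINEMOD Dataset
4.1.2 The OCCLUSION Dataset
4.2 Experimental Environment and Parameter Settings
4.3 2D Reprojection Results and Comparison
4.4 Average 3D Distance of Model Points: Results and Comparison
4.5 Comparison of Results on the OCCLUSION Dataset
4.6 Comparison of Results with Different Shallow Feature Maps
4.7 Figures of Experimental Results
Chapter 5 Conclusion and Future Work
5.1 Conclusion
5.2 Future Work
References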
