National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

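Graduate student: 張博詔
Graduate student (English name): Chang, Po-Chao
Thesis title: 排除難/無匹配點之非監督式單眼深度估計 (Unsupervised monocular depth estimation excluding hard-to-match and unmatchable points)
Thesis title (English): Excluding non-matched patches to do unsupervised monocular depth estimation
Advisor: 殷堂凱
Advisor (English name): Yin, Tang-Kai
Oral defense committee: 殷堂凱, 黃文楨, 彭昭暐
Oral defense committee (English names): Yin, Tang-Kai; Huang, Wen-Chen; Perng, Jau-Woei
Oral defense date: 2019-07-19
Degree: Master's
Institution: National University of Kaohsiung (國立高雄大學)
Department: Master's Program, Department of Computer Science and Information Engineering
Discipline: Engineering
Subdiscipline: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2019
Academic year of graduation: 107 (ROC calendar)
Language: Chinese
Number of pages: 70
Keywords (translated from Chinese): deep learning; convolutional neural network; depth estimation; stereo vision; visual SLAM
Keywords (English): Deep Learning; Convolution Neural Network; Depth Estimation; stereopsis; Visual SLAM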
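In stereo vision, depth estimation is arguably the most important component. Once the distance from the image to the real scene is known, that information can be used to recover the three-dimensional structure of the scene. Depth information can also be used for obstacle avoidance in self-driving cars, semantic segmentation, estimating the dynamic pose of objects, AR, and many other applications. Although many methods can acquire depth directly, each has its own shortcomings: LiDAR is expensive, Kinect cannot be used outdoors, matching algorithms based on binocular vision are computationally costly, and algorithms based on monocular vision need motion in the scene before they can compute anything.
To address these problems, this thesis proposes estimating depth and pose from moving monocular images in an unsupervised manner, and trains a model that needs only a single image to estimate depth. First, two neural networks estimate the depth of a single image and the pose between two images, and novel view synthesis serves as the supervisory signal for training. Next, during training we exclude regions outside the two images' common field of view, certain moving objects, and repeated textures, in order to reduce erroneous estimates by the networks. We also introduce stereo (binocular) data into training. Finally, compared with other unsupervised methods, our method is about 1 to 2% more accurate.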

Depth estimation is one of the most important problems in stereo vision. Depth can be used to reconstruct 3D information from images, and also for obstacle avoidance in self-driving cars, semantic segmentation, estimating the dynamic pose of objects, AR, and more. There are already many ways to obtain depth, but each has its own shortcomings. For example, LiDAR is expensive, Kinect cannot be used outdoors, stereo matching algorithms are computationally expensive, and monocular algorithms require motion in the scene.
To address these problems, this thesis proposes an unsupervised method that estimates depth and pose from a sequence of moving monocular images, and trains a model that can estimate depth from a single image. First, we use two neural networks to estimate depth and pose, and use novel view synthesis as the supervisory signal for training them. Then we exclude some non-matched patches during training to reduce erroneous estimates by the networks. We also train the model with stereo images. Finally, the accuracy of our method is about 1 to 2% better than that of other unsupervised methods.
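
The supervisory signal described in the abstract can be illustrated with a minimal NumPy sketch. It is not the thesis implementation: the function names, the nearest-neighbour sampling, and the plain L1 photometric error are assumptions made for illustration, and only the exclusion of pixels outside the common field of view is shown (moving objects and repeated textures are not handled). Each target pixel is back-projected with its estimated depth, moved into the source camera with the estimated pose, and re-projected; pixels that land outside the source image are masked out of the loss.

```python
import numpy as np

def project_to_source(depth, K, T):
    """Map every target pixel to source-image coordinates.

    depth: (H, W) estimated depth of the target frame.
    K:     (3, 3) camera intrinsics (assumed shared by both frames).
    T:     (4, 4) rigid transform from target camera to source camera.
    Returns source coordinates (us, vs) and projected depth z, each (H, W).
    """
    H, W = depth.shape
    v, u = np.mgrid[0:H, 0:W].astype(np.float64)
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1)
    cam = (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)   # back-project to 3D
    cam_h = np.vstack([cam, np.ones((1, cam.shape[1]))])    # homogeneous points
    src = (T @ cam_h)[:3]                                   # into the source camera
    z = src[2]
    proj = K @ src
    with np.errstate(divide="ignore", invalid="ignore"):
        us, vs = proj[0] / z, proj[1] / z
    return us.reshape(H, W), vs.reshape(H, W), z.reshape(H, W)

def masked_photometric_loss(target, source, depth, K, T):
    """Mean L1 error between the target and the source warped into its view,
    computed only over pixels that fall inside the source image."""
    H, W = depth.shape
    us, vs, z = project_to_source(depth, K, T)
    # Common-field-of-view mask: in front of the camera and inside the image.
    mask = (z > 0) & (us >= 0) & (us <= W - 1) & (vs >= 0) & (vs <= H - 1)
    # Nearest-neighbour sampling (a simplification chosen for brevity).
    ui = np.clip(np.rint(np.nan_to_num(us)), 0, W - 1).astype(int)
    vi = np.clip(np.rint(np.nan_to_num(vs)), 0, H - 1).astype(int)
    warped = source[vi, ui]
    err = np.abs(target - warped)
    if not mask.any():
        return 0.0, mask
    return float(err[mask].mean()), mask
```

Without the mask, target pixels whose projections leave the source image would be compared against clamped border values and contribute spurious error, which is the kind of erroneous supervision the abstract describes excluding. A trainable version would need differentiable sampling rather than the rounding used here.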

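Thesis approval form
Acknowledgments
Abstract (Chinese)
Abstract (English)
Table of contents
List of figures
List of tables
Chapter 1 Introduction
1.1 Preface
1.2 Literature review
1.3 Research motivation
1.4 Overview of the research method
1.5 Thesis organization
Chapter 2 Background principles
2.1 Deep learning
2.1.1 Artificial neural networks
2.1.2 Convolutional neural networks
2.1.3 Convolutional layers
2.1.4 Activation functions
2.1.5 Deconvolution
2.1.6 U-Net
2.1.7 Structural similarity
2.1.8 Backpropagation
2.1.9 Gradient descent
2.1.10 Adam optimization algorithm
2.2 Stereo vision
2.2.1 Camera model
2.2.2 Coordinate transformations
2.2.3 Projection between images
2.2.4 View synthesis
2.2.5 Causes of erroneous estimates
Chapter 3 Research method and design
3.1 Experimental procedure
3.2 Data processing
3.3 Network architecture
3.3.1 Disparity network
3.3.2 Pose estimation network
3.4 View synthesis loss
3.5 Edge-aware smoothness loss
3.6 Excluding erroneous estimates
3.6.1 Excluding regions outside the common field of view
3.6.2 Excluding objects moving at the same speed as the camera and repeated textures
3.6.3 Mask for excluding erroneous estimates
3.7 Stereo data
3.8 Accuracy computation
Chapter 4 Experimental results
4.1 Experimental environment and parameters
4.2 Depth map estimation
4.3 Accuracy comparison
4.4 Where the estimates fail
Chapter 5 Conclusion and future work
References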


