Thesis title (English): Excluding non-matched patches to do unsupervised monocular depth estimation
Oral defense committee (English names): Yin, Tang-Kai; Huang, Wen-Chen; Perng, Jau-Woei
Keywords (English): Deep Learning; Convolutional Neural Network; Depth Estimation; Stereopsis; Visual SLAM

Depth estimation is one of the most important research topics in stereo vision. Depth allows 3D information to be reconstructed from images. It can also be used for obstacle avoidance in self-driving vehicles, semantic segmentation, dynamic object pose estimation, AR, and more. Many ways of obtaining depth already exist, but each has its own shortcomings: LiDAR is expensive, Kinect cannot be used outdoors, stereo vision algorithms are computationally expensive, and monocular vision algorithms require camera motion.
To address these problems, this thesis proposes an unsupervised method that estimates depth and pose from monocular image sequences. The method trains a model that can estimate depth from a single image. First, one neural network estimates the depth of a single image and another estimates the relative pose between two images. Novel view synthesis provides the supervisory signal for training both networks. During training, we exclude non-matched patches to reduce erroneous estimates by the networks. These patches are regions outside the field of view shared by the two images, some moving objects, and repeated textures. We also train the model with stereo image pairs. Compared with other unsupervised methods, our method improves accuracy by about 1 to 2%.
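The abstract does not give the training details. To illustrate the general idea of supervising depth through view synthesis while ignoring non-matched pixels, here is a minimal NumPy sketch that assumes a pinhole camera with known intrinsics `K` and a target-to-source pose `(R, t)`. It is not the thesis's implementation, and several parts are assumptions or simplifications:

- The function names `warp_coords` and `masked_photometric_loss` are hypothetical.
- It uses nearest-neighbour sampling instead of the differentiable bilinear sampling a real training pipeline would need.
- It automatically masks only pixels that project outside the source view. The `exclude` argument is a hypothetical placeholder for the moving-object and repeated-texture criteria, which are not specified here.

```python
import numpy as np


def warp_coords(depth, K, R, t):
    """Project each target pixel into the source view (pinhole model).

    depth: (H, W) target depth; K: (3, 3) intrinsics;
    R, t: hypothetical target-to-source rotation (3, 3) and translation (3,).
    Returns source pixel coordinates (H, W, 2) and projected depth (H, W).
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)]).reshape(3, -1).astype(float)
    cam = (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)  # back-project to 3D
    proj = K @ (R @ cam + t.reshape(3, 1))                  # move to source frame, project
    z = proj[2]
    with np.errstate(divide="ignore", invalid="ignore"):
        uv = (proj[:2] / z).T.reshape(H, W, 2)
    return uv, z.reshape(H, W)


def masked_photometric_loss(target, source, depth, K, R, t, exclude=None):
    """Mean L1 error between the target and the source warped into the target view.

    Pixels that project outside the source image (or behind the camera) are
    ignored; `exclude` is an optional (H, W) boolean mask of further pixels to
    ignore (a hypothetical stand-in for other exclusion criteria).
    """
    target = np.asarray(target, dtype=float)
    source = np.asarray(source, dtype=float)
    H, W = depth.shape
    uv, z = warp_coords(depth, K, R, t)
    us, vs = np.rint(uv[..., 0]), np.rint(uv[..., 1])
    with np.errstate(invalid="ignore"):
        # Keep only pixels that land inside the source image, in front of the camera.
        valid = (z > 0) & (us >= 0) & (us <= W - 1) & (vs >= 0) & (vs <= H - 1)
    if exclude is not None:
        valid &= ~exclude
    ui = np.where(valid, us, 0).astype(int)
    vi = np.where(valid, vs, 0).astype(int)
    warped = source[vi, ui]  # nearest-neighbour sampling (not differentiable)
    err = np.abs(target - warped)
    if err.ndim == 3:  # average over colour channels
        err = err.mean(axis=-1)
    loss = float(err[valid].mean()) if valid.any() else 0.0
    return loss, valid
```

The same function could in principle cover the stereo case. You would set `R` to the identity and `t` to the known horizontal baseline, with the sign depending on which image of the pair is the target. This is an assumption about how stereo pairs might be used, not a description of the thesis's setup.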

