National Digital Library of Theses and Dissertations in Taiwan

Detailed Record

Author: 蔡侑儒
Author (English): Yu-Ju Tsai
Title: 使用深度學習預測光場圖像深度分布
Title (English): Estimate Disparity of Light Field Images by Deep Neural Network
Advisor: 歐陽明
Committee members: 莊永裕, 葉正聖
Oral defense date: 2019-06-11
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Publication year: 2019
Graduating academic year: 107 (2018–2019)
Language: English
Number of pages: 39
Keywords (Chinese): 光場 (light field), 深度學習 (deep learning), 深度估計 (depth estimation)
DOI: 10.6342/NTU201900934
Cited by: 0 / Views: 123 / Downloads: 0
Abstract (Chinese, translated):
In this thesis, we use deep learning to estimate the depth of light field images. A light field camera captures the properties of light in a scene, both spatial and angular, in a single shot, and from this information the scene depth can be estimated. However, the narrow baseline between sub-aperture images, a consequence of the light field camera's structure, makes depth estimation difficult. Many methods attempt to work around this hardware limitation, but they must still strike a balance between running speed and estimation accuracy.
This thesis therefore exploits the structural regularity of light field data and the redundancy among its images, and builds these properties into the design of our deep neural network. We then propose attention-based sub-aperture view selection, which lets the network learn by itself which images contribute more to depth estimation. Finally, we compare our method against other state-of-the-art methods on a benchmark to demonstrate our improvement on this problem.
Abstract:
In this paper, we introduce a light field depth estimation method based on a convolutional neural network. A light field camera can capture the spatial and angular properties of light in a scene. Using these properties, we can compute depth information from light field images. However, the narrow baseline of light field cameras makes depth estimation from light fields difficult. Many approaches try to overcome this limitation, but they trade off speed against accuracy.
We consider the repetitive structure of the light field and the redundancy among sub-aperture views in light field images. First, to exploit the repetitive structure of the light field, we integrate this property into our network design. Second, by applying attention-based sub-aperture view selection, we let the network learn by itself which views are more useful. Finally, we compare our experimental results with other state-of-the-art methods to show our improvement in light field depth estimation.
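The record contains no code, but the two ideas named in the abstract, attention-based sub-aperture view selection and disparity regression over a cost volume (the differentiable soft argmin of Kendall et al. [15], which the table of contents pairs with "3D CNN and Disparity Regression"), can be sketched concretely. The PyTorch sketch below is a hypothetical illustration only: the module name AttentionViewSelection, the function soft_argmin, the tensor layout (B, V, C, H, W), and all layer widths are assumptions made for exposition, not the thesis's actual architecture.

```python
# Hypothetical sketch only: shapes, widths, and module names are illustrative
# assumptions, not the architecture described in the thesis.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionViewSelection(nn.Module):
    """Learn one attention weight per sub-aperture view and rescale its
    features, so more informative views contribute more to the cost volume."""
    def __init__(self, feat_ch: int):
        super().__init__()
        # Global average pooling followed by a small MLP yields one scalar
        # score per view.
        self.score = nn.Sequential(
            nn.Linear(feat_ch, feat_ch // 2),
            nn.ReLU(inplace=True),
            nn.Linear(feat_ch // 2, 1),
        )

    def forward(self, view_feats: torch.Tensor):
        # view_feats: (B, V, C, H, W) feature maps of V sub-aperture views.
        b, v, c, h, w = view_feats.shape
        pooled = view_feats.mean(dim=(3, 4))       # (B, V, C)
        logits = self.score(pooled).squeeze(-1)    # (B, V)
        weights = torch.softmax(logits, dim=1)     # normalized over views
        return view_feats * weights.view(b, v, 1, 1, 1), weights

def soft_argmin(cost_volume: torch.Tensor, max_disp: int) -> torch.Tensor:
    """Differentiable disparity regression over a cost volume, as in [15]."""
    # cost_volume: (B, D, H, W) matching cost per disparity hypothesis.
    prob = F.softmax(-cost_volume, dim=1)          # low cost -> high probability
    disps = torch.arange(max_disp, dtype=prob.dtype,
                         device=prob.device).view(1, -1, 1, 1)
    return (prob * disps).sum(dim=1)               # (B, H, W) expected disparity
```

Weighting the per-view features before cost aggregation leaves the rest of a stereo-style pipeline unchanged, which fits the abstract's claim that useful views are selected by the network itself rather than chosen by hand.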
Table of Contents
Acknowledgements
Abstract (in Chinese)
Abstract
1 Introduction
2 Related Work
2.1 Traditional Methods
2.2 Deep Learning Methods
3 Method
3.1 Network Design
3.2 Feature Extraction and SPP Module
3.3 Cost Volume Construction
3.4 Attention-based View Selection
3.5 3D CNN and Disparity Regression
4 Experiment
4.1 Training Dataset
4.1.1 4D Light Field Dataset
4.2 Implementation Detail
4.3 Quantitative Evaluation
4.4 Comparison to State-of-the-Art Methods
5 Conclusion
Bibliography
Bibliography
[1] E. H. Adelson and J. Y. A. Wang. Single lens stereo with a plenoptic camera. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(2):99–106, Feb 1992.
[2] A. Alperovich, O. Johannsen, M. Strecke, and B. Goldluecke. Light field intrinsics with a deep encoder-decoder network. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9145–9154, June 2018.
[3] R. C. Bolles, H. H. Baker, and D. H. Marimont. Epipolar-plane image analysis: An approach to determining structure from motion. International Journal of Computer Vision, 1(1):7–55, 1987.
[4] J.-R. Chang and Y.-S. Chen. Pyramid stereo matching network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5410–5418, 2018.
[5] C. Chen, H. Lin, Z. Yu, S. B. Kang, and J. Yu. Light field stereo matching using bilateral statistics of surface cameras. In 2014 IEEE Conference on Computer Vision and Pattern Recognition, pages 1518–1525, June 2014.
[6] S. J. Gortler, R. Grzeszczuk, R. Szeliski, and M. F. Cohen. The lumigraph. In Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH ’96, pages 43–54, New York, NY, USA, 1996. ACM.
[7] K. He, X. Zhang, S. Ren, and J. Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9):1904–1916, Sep. 2015.
[8] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, June 2016.
[9] S. Heber and T. Pock. Convolutional networks for shape from light field. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3746–3754, June 2016.
[10] S. Heber, W. Yu, and T. Pock. Neural EPI-volume networks for shape from light field. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 2271–2279, Oct 2017.
[11] K. Honauer, O. Johannsen, D. Kondermann, and B. Goldluecke. A dataset and evaluation methodology for depth estimation on 4D light fields. In S.-H. Lai, V. Lepetit, K. Nishino, and Y. Sato, editors, Computer Vision – ACCV 2016, pages 19–34, Cham, 2017. Springer International Publishing.
[12] J. Yu, L. McMillan, and S. Gortler. Scam light field rendering. In 10th Pacific Conference on Computer Graphics and Applications, 2002. Proceedings., pages 137–144, Oct 2002.
[13] O. Johannsen, A. Sulc, and B. Goldluecke. What sparse light field coding reveals about scene structure. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3262–3270, June 2016.
[14] N. K. Kalantari, T.-C. Wang, and R. Ramamoorthi. Learning-based view synthesis for light field cameras. ACM Trans. Graph., 35(6):193:1–193:10, Nov. 2016.
[15] A. Kendall, H. Martirosyan, S. Dasgupta, and P. Henry. End-to-end learning of geometry and context for deep stereo regression. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 66–75, Oct 2017.
[16] J. Y. Lee and R. H. Park. Depth estimation from light field by accumulating binary maps based on foreground-background separation. IEEE Journal of Selected Topics in Signal Processing, 11(7):955–964, Oct 2017.
[17] M. Levoy and P. Hanrahan. Light field rendering. In Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH ’96, pages 31–42, New York, NY, USA, 1996. ACM.
[18] C. Shin, H. Jeon, Y. Yoon, I. S. Kweon, and S. J. Kim. EPINET: A fully-convolutional neural network using epipolar geometry for depth from light field images. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4748–4757, June 2018.
[19] P. P. Srinivasan, T. Wang, A. Sreelal, R. Ramamoorthi, and R. Ng. Learning to synthesize a 4D RGBD light field from a single image. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 2262–2270, Oct 2017.
[20] M. W. Tao, S. Hadap, J. Malik, and R. Ramamoorthi. Depth from combining defocus and correspondence using light-field cameras. In 2013 IEEE International Conference on Computer Vision, pages 673–680, Dec 2013.
[21] T.-C. Wang, J.-Y. Zhu, E. Hiroaki, M. Chandraker, A. A. Efros, and R. Ramamoorthi. A 4D light-field dataset and CNN architectures for material recognition. In B. Leibe, J. Matas, N. Sebe, and M. Welling, editors, Computer Vision – ECCV 2016, pages 121–138, Cham, 2016. Springer International Publishing.
[22] S. Wanner and B. Goldluecke. Globally consistent depth labeling of 4D light fields. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 41–48, June 2012.
[23] S. Wanner and B. Goldluecke. Variational light field analysis for disparity estimation and super-resolution. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(3):606–619, March 2014.
[24] Wikipedia contributors. Gabriel Lippmann — Wikipedia, the free encyclopedia. https://en.wikipedia.org/w/index.php?title=Gabriel_Lippmann&oldid=879639602, 2019. [Online; accessed 15-May-2019].
[25] Y. Yoon, H. Jeon, D. Yoo, J. Lee, and I. S. Kweon. Learning a deep convolutional network for light-field image super-resolution. In 2015 IEEE International Conference on Computer Vision Workshop (ICCVW), pages 57–65, Dec 2015.
[26] Y. Yoon, H. Jeon, D. Yoo, J. Lee, and I. S. Kweon. Light-field image super-resolution using convolutional neural network. IEEE Signal Processing Letters, 24(6):848–852, June 2017.
[27] Z. Yu, X. Guo, H. Ling, A. Lumsdaine, and J. Yu. Line assisted light field triangulation and stereo matching. In 2013 IEEE International Conference on Computer Vision, pages 2792–2799, Dec 2013.
[28] J. Zbontar and Y. LeCun. Stereo matching by training a convolutional neural network to compare image patches. Journal of Machine Learning Research, 17:1–32, 2016.
[29] S. Zhang, H. Sheng, C. Li, J. Zhang, and Z. Xiong. Robust depth estimation for light field via spinning parallelogram operator. Comput. Vis. Image Underst., 145(C):148–159, Apr. 2016.
[30] Y. Zhang, H. Lv, Y. Liu, H. Wang, X. Wang, Q. Huang, X. Xiang, and Q. Dai. Light-field depth estimation via epipolar plane image analysis and locally linear embedding. IEEE Transactions on Circuits and Systems for Video Technology, 27(4):739–747, April 2017.
[31] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia. Pyramid scene parsing network. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 6230–6239, July 2017.
[32] T. Zhong, X. Jin, L. Li, and Q. Dai. Light field image compression using depth-based CNN in intra prediction. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 8564–8567, May 2019.