臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)

Detailed Record
Author: 陳璟旻
Author (English): Chen, Jing-Min
Title: 結合空拍影像及深度學習之3D目標物建模、視角分類及檢索技術開發
Title (English): Fusion of Drone Images and Deep Learning for 3D Object Modeling, View Estimation and Retrieval
Advisor: 鄭錫齊
Advisor (English): Cheng, Shyi-Chyi
Committee members: 楊健貴、鄭錫齊、江政欽、張欽圳
Committee members (English): Yang, Chen-Kuei; Cheng, Shyi-Chyi; Chiang, Cheng-Chin; Chang, Chin-Chun
Oral defense date: 2018-01-30
Degree: Master's
Institution: 國立臺灣海洋大學 (National Taiwan Ocean University)
Department: 資訊工程學系 (Department of Computer Science and Engineering)
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2018
Academic year of graduation: 106 (2017–2018)
Language: Chinese
Pages: 64
Keywords (Chinese): 深度學習、空拍機影像、3D建模、樣板比對、3D視角偵測、深度估測、物件偵測
Keywords (English): deep learning; drone image; 3D modeling; template matching; 3D pose detection; depth estimation; object detection
Cited by: 2
Views: 698
Downloads: 100
Bookmarks: 1
Camera-equipped drones are now ubiquitous thanks to their high mobility and their ability to capture and monitor an environment in real time. By combining the imagery a drone continuously captures with big-data predictive analytics and technologies such as object detection, environment matching, and object behavior recognition, one can build a real-time surveillance system that meets disaster-prevention needs.
A drone normally captures 2D images, so we navigate it to different angles and positions around a 3D object to shoot a set of multi-view images that together cover the object from different viewpoints. For these views we propose a view detection algorithm based on depth information, which analyzes how the multi-view images overlap in order to reconstruct the 3D target or scene. The first step of our depth estimation algorithm uses deep learning to detect the target object in each aerial image; combined with Structure from Motion (SfM), neighboring views are then used to estimate the depth of each view. The proposed view estimation algorithm uses the RGB-D (color and depth) features of each view to estimate that view's camera parameters; these parameters predict the overlap between views, and a panoramic image stitching algorithm then reconstructs the 3D model. Previous methods suffer from inaccurate object segmentation during preprocessing and from slow speed; building on deep learning, our approach detects and segments objects quickly and reliably, enabling object-oriented 3D modeling.
Given pre-captured views of a 3D object, a 3D reconstruction algorithm stitches the training views into a 3D object model. Multiple principal plane analysis then approximates the model with a set of planes; extending rays from the model center through each plane center yields the shooting angle and position of each view, from which a reference frame per view is selected out of the training set. In the testing stage, the drone monitors the target object: the view of a monitored frame is decided by matching it against the model's reference frames, and the camera parameters are then error-corrected, completing drone-based video surveillance and retrieval. The existing 3D model can also be rebuilt from new images, so the system keeps itself up to date. Experiments show that the proposed view detection and 3D retrieval perform well in both speed and accuracy and can effectively match an input image against a 3D model.
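The SfM step above recovers per-view depth by triangulating points matched between neighboring views. As a rough, self-contained illustration only (not the thesis's actual pipeline; the camera matrices, point coordinates, and the `triangulate` helper are invented for this example), linear DLT triangulation of one point from two views can be sketched as:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point seen in two views.
    P1, P2: 3x4 camera projection matrices; x1, x2: 2D pixel coordinates."""
    # Each view contributes two linear constraints on the homogeneous point X.
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                     # null-space direction of A
    return X[:3] / X[3]            # back to inhomogeneous 3D coordinates

# Toy setup: identity intrinsics, second camera shifted 1 unit along x.
K = np.eye(3)
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.5, 0.2, 4.0])
x1 = P1 @ np.append(X_true, 1.0); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1.0); x2 = x2[:2] / x2[2]

X_est = triangulate(P1, P2, x1, x2)
depth = X_est[2]                   # depth in camera-1 coordinates
```

With noise-free projections the estimate matches the true point exactly; with real matches, many such points per view pair give the dense depth map the abstract refers to.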
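The multiple principal plane analysis above approximates the reconstructed model by planes and derives a shooting direction from the model center through each plane center. A minimal sketch of that idea, assuming the model is available as pre-clustered point groups (the cluster data, `fit_plane`, and `nearest_view` helpers are invented for illustration, not the thesis's implementation):

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through a point cluster: (centroid, unit normal)."""
    c = points.mean(axis=0)
    # The normal is the direction of least variance (last right-singular vector).
    _, _, Vt = np.linalg.svd(points - c)
    return c, Vt[-1]

def view_directions(clusters, model_center):
    """One candidate shooting direction per principal plane: a unit ray
    from the model center through the plane's center."""
    dirs = []
    for pts in clusters:
        c, _ = fit_plane(pts)
        d = c - model_center
        dirs.append(d / np.linalg.norm(d))
    return np.array(dirs)

def nearest_view(dirs, query_dir):
    """Index of the reference view whose direction best matches the query."""
    q = query_dir / np.linalg.norm(query_dir)
    return int(np.argmax(dirs @ q))

# Toy model: two faces of a unit cube as point clusters.
g = np.linspace(-1.0, 1.0, 5)
face_x = np.array([[1.0, y, z] for y in g for z in g])   # +x face
face_y = np.array([[x, 1.0, z] for x in g for z in g])   # +y face
dirs = view_directions([face_x, face_y], np.zeros(3))
```

Each direction then indexes a reference frame from the training views, which is what the testing stage matches against.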
Camera-equipped drones have become widespread in recent years thanks to their high mobility and their ability to scan and monitor an environment promptly. By continuously capturing the target object and integrating technologies such as big-data predictive analysis, object detection, environment comparison, and object behavior identification, we can build a real-time surveillance system for disaster prevention.
Drone photos are normally 2D images, so we navigate the drone to different angles and positions to capture a set of multi-view images, which simultaneously provide information about a 3D object from different viewpoints. Accordingly, we propose a pose detection algorithm based on depth information that analyzes how the multi-view images overlap, allowing the 3D object or target scene to be reconstructed. The first step of our depth estimation algorithm uses deep learning to detect the target object in each UAV image; with the help of Structure from Motion (SfM), the depth of each image is then estimated from neighboring views. The proposed pose detection algorithm uses the RGB-D (color and depth) features of each view to estimate the camera parameters, which in turn predict the overlap between views; a panoramic image stitching algorithm then reconstructs the 3D model. Previous methods are slow and suffer from imprecise object segmentation during preprocessing. Based on deep learning, our method detects and segments objects efficiently and quickly, leading to object-oriented 3D modeling.
Using a 3D reconstruction algorithm, we stitch the pre-captured views of the object into a 3D model. With multiple principal plane analysis, we approximate the 3D model by multiple planes; extending rays from the model center through each plane center yields the shooting view and position of each plane, so a reference image per view can be chosen from the set of training views. In the testing stage, a drone monitors the target object: the view of a monitored frame is determined from the model's reference images, and the camera parameters are then error-corrected. This completes drone-based video surveillance and retrieval; the 3D model can also be rebuilt from new images, so the system is updated constantly. We verify experimentally that the proposed pose detection and 3D retrieval perform well in both speed and accuracy and can effectively match input images to a 3D model.
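In the testing stage the thesis decides a monitored frame's view with a deep-learning classifier trained on the per-view reference images. As a much simpler stand-in that illustrates the same retrieval idea (nearest-template matching on feature vectors; the feature values and `classify_view` helper are invented for the example, not the thesis's CNN classifier):

```python
import numpy as np

def classify_view(query_feat, template_feats):
    """Return (view index, similarity) of the reference view whose feature
    vector is most cosine-similar to the query frame's feature vector."""
    q = query_feat / np.linalg.norm(query_feat)
    T = template_feats / np.linalg.norm(template_feats, axis=1, keepdims=True)
    sims = T @ q                      # cosine similarity to every template
    best = int(np.argmax(sims))
    return best, float(sims[best])

# Toy 3-view template bank with 4-D "features" (numbers invented).
templates = np.array([
    [1.0, 0.0, 0.0, 0.0],   # view 0
    [0.0, 1.0, 0.0, 0.0],   # view 1
    [0.0, 0.0, 1.0, 1.0],   # view 2
])
view, sim = classify_view(np.array([0.1, 0.0, 0.9, 1.1]), templates)
```

Once the view is identified, its stored pose parameters seed the camera-parameter error correction described above.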
Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
List of Tables
1. Introduction
1.1 Motivation
1.2 Background
1.2.1 3D Modeling Techniques
1.2.2 3D Retrieval Techniques
1.2.3 Deep Learning
1.3 Approach and Overview
1.4 Thesis Organization
2. Related Work
2.1 Object Detection and Segmentation
2.2 3D Modeling Techniques
2.3 Image Depth Estimation
2.4 3D Pose Detection
2.5 3D Model Retrieval
3. Neural Networks and Deep Learning
3.1 Artificial Neural Networks
3.2 Convolutional Neural Networks
3.3 TensorFlow
4. 3D Object Model Reconstruction from Aerial Images
4.1 Deep-Learning-Based Object Detection and Segmentation
4.2 Depth Estimation from 2D Aerial Images
4.3 Building the 3D Model via Image Stitching
4.4 Tools for 3D Modeling from 2D Images
5. 3D Object Retrieval from Aerial Images: Training
5.1 Multiple Principal Plane Analysis Module
5.2 View Localization and Template Image Collection
5.3 View-Class Image Collection
5.4 Building a Deep-Learning-Based RGB View Classifier
6. 3D Object Retrieval from Aerial Images: Testing
6.1 Deep-Learning-Based Object Detection
6.2 View Classification and Template Pose Parameter Extraction
6.3 Local Feature Extraction, Object Segmentation, and Depth Estimation
6.4 Camera Parameter Estimation and Depth Correction
6.5 3D View Error Matching
6.6 3D Model Reconstruction and Updating
6.7 3D Model Retrieval
7. Experimental Results
7.1 Test Dataset
7.2 Module Test Results
7.3 View Parameter Estimation Results
7.4 3D Model Retrieval Results
7.5 Aerial Image Test Results
8. Conclusion and Future Work
References