Researcher: 蘇炯綸
Researcher (English): SU CHIUNG LUN
Title: 植基於深度學習的半自動化基底資料產生器
Title (English): A semi-automatic ground truth generator for deep learning application
Advisor: 施國琛
Advisor (English): Timothy K. Shih
Degree: Master's
Institution: National Central University
Department: Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Publication year: 2018
Academic year of graduation: 106
Language: English
Pages: 75
Keywords (Chinese): 影像語意切割、基底資料
Keywords (English): Semantic segmentation, Ground truth
Statistics:
  • Cited by: 0
  • Views: 84
  • Downloads: 1
  • Bookmarked: 0
Machine learning, deep learning, and artificial intelligence are among the greatest breakthroughs in information technology in recent decades. Products built on AI already permeate our daily lives, such as smart homes, recognition software on smartphones, and self-driving cars. Deep learning is the foundation of AI technology: it uses pre-trained models to help a system automatically recognize and classify user input. Among these techniques, semantic segmentation is widely used in technology related to self-driving cars.
  Training an accurate deep learning model for semantic segmentation requires a large amount of ground truth data as the learning target. Although many kinds of datasets are available on the Internet, developers' needs are so diverse that it is difficult in practice to find one that fits. The first problem a developer faces when building a semantic segmentation model is therefore how to create a training dataset of their own.
  This system was developed to solve the problem above. Its purpose is to help developers easily generate their own training datasets. The system provides a complete workflow for producing a training dataset, in particular the ground truth. Developers only need to prepare video and image data that match their target in advance; they can then follow the system's workflow step by step to create their own training dataset. The final output is semantic segmentation images in a format defined by the developer. The system is divided into three parts: a video splitter, a semantic segmentation tool, and an image editing program.
A semantic segmentation deep learning program needs one important resource to run: a pre-trained model. A pre-trained model is a weight file produced by training the deep learning model on a training dataset; the weights stored in it let the model recognize the categories it was trained on.
How to train a good pre-trained model is the first major problem that deep learning developers face. A good pre-trained model requires a large training dataset. A training dataset for semantic segmentation contains two kinds of data: raw images and ground truth. The ground truth is the semantic segmentation result image corresponding to each raw image. Ground truth is usually produced by human annotation in order to ensure robustness and correctness, so developers must spend a great deal of time creating it. Because of this, many researchers publish their training datasets on the Internet for other developers to use; ImageNet and the CamVid dataset, for example, are well known for their variety of categories. To solve their own problems, however, developers often need specialized training datasets, and although many diverse datasets are available online, it is very difficult to find ones that meet a particular developer's needs.
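To make the raw-image/ground-truth pairing concrete, here is a minimal sketch, assuming color-coded ground-truth masks like CamVid's; the directory layout, file naming, and the three-class palette are hypothetical illustrations, not taken from the thesis.

```python
# Minimal sketch (not the thesis code): pairing raw images with their
# ground-truth masks for a semantic segmentation training set.
import os

import numpy as np
from PIL import Image

# Hypothetical palette: RGB color in the ground-truth image -> class id.
PALETTE = {
    (128, 64, 128): 0,   # road
    (128, 128, 128): 1,  # building
    (0, 0, 0): 2,        # void / unlabeled
}

def load_pair(raw_dir, gt_dir, name):
    """Load one (raw image, ground truth) training pair by file name."""
    raw = np.asarray(Image.open(os.path.join(raw_dir, name)).convert("RGB"))
    gt_rgb = np.asarray(Image.open(os.path.join(gt_dir, name)).convert("RGB"))
    # Map each ground-truth pixel color to its integer class label.
    labels = np.full(gt_rgb.shape[:2], 255, dtype=np.uint8)  # 255 = unknown
    for color, class_id in PALETTE.items():
        labels[np.all(gt_rgb == color, axis=-1)] = class_id
    return raw, labels
```

A training loader would simply iterate load_pair over every file name shared by the two directories.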
Our system solves the problem described above. Its purpose is to help developers easily generate their own training datasets. The system provides a complete workflow for producing a training dataset, in particular the ground truth. Developers only need to prepare video and image data that match their target in advance; they can then follow the system's workflow to create their own training dataset. The output is segmentation images in a developer-defined format. The system is divided into three parts: a video splitter, a semantic segmentation tool, and an image editing program.
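As an illustration of how such a three-stage workflow could be wired together, here is a minimal sketch, assuming OpenCV for video and image I/O; segment is a placeholder for inference with any pre-trained semantic segmentation model, and the function names and frame-sampling step are assumptions rather than the thesis's actual implementation.

```python
# Minimal sketch (assumptions, not the thesis code): the three-stage flow the
# abstract describes -- video splitter, semantic segmentation, then export of
# draft masks for manual correction in an image editor.
import os

import cv2  # OpenCV for video decoding and image I/O

def split_video(video_path, frame_dir, step=30):
    """Stage 1, video splitter: save every `step`-th frame as an image."""
    os.makedirs(frame_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    index = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            cv2.imwrite(os.path.join(frame_dir, f"{saved:05d}.png"), frame)
            saved += 1
        index += 1
    cap.release()
    return saved

def generate_ground_truth(frame_dir, out_dir, segment):
    """Stage 2: run a segmentation model over each frame; the resulting masks
    go to `out_dir` for the developer to correct in the editor (stage 3)."""
    os.makedirs(out_dir, exist_ok=True)
    for name in sorted(os.listdir(frame_dir)):
        frame = cv2.imread(os.path.join(frame_dir, name))
        mask = segment(frame)  # hypothetical: returns a color-coded label image
        cv2.imwrite(os.path.join(out_dir, name), mask)
```

Sampling only every step-th frame is a deliberate choice: consecutive video frames are nearly identical and add little value to a training set.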
Abstract (Chinese) I
Abstract (English) II
List of Figures III
Table of Contents IV
List of Tables VI
1 Introduction 1
2 Related work 5
2.1 CamVid dataset 5
2.2 Fully convolutional network for semantic segmentation 6
2.3 U-Net 7
2.4 Tiramisu Net 9
2.5 SegNet 10
2.6 Gated pooling 13
3 System Architecture 15
3.1 System Introduction 15
3.1.1 Testing and generating mode 15
3.1.2 Training and modifying mode 16
3.2 Video splitter 18
3.3 Semantic segmentation algorithm of PSPNet 19
3.3.1 Theory 20
3.3.1.1 ResNet-based fully connected network 22
3.3.1.2 Pyramid pooling module 23
3.3.2 Implementation 25
3.3.2.1 Tensorflow 25
3.3.2.2 Model Setting 26
3.3.2.3 Testing Model 27
3.4 Semantic segmentation algorithm of improved PPEDNet 30
3.4.1 Basic theory 30
3.4.2 Problem 37
3.4.3 Improvement method 38
3.4.4 Implementation 43
3.5 Comparison of PSPNet and PPEDNet 46
3.6 Method of ground truth and image editing 47
4 Future work 52
5 References 61
[1] G. J. Brostow, J. Fauqueur, and R. Cipolla, “Semantic Object Classes In Video: A High-Definition Ground Truth Database,” Pattern Recognition Letters, vol. 30, no. 2, pp. 88-97, 2009.
[2] E. Shelhamer, J. Long, and T. Darrell, “Fully Convolutional Networks For Semantic Segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 4, pp. 640-651, 2017.
[3] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional Networks For Biomedical Image Segmentation,” Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Cham, pp. 234-241, 2015.
[4] Q. Meng, D. Catchpoole, D. Skillicorn, and P. J. Kennedy, “Relational Autoencoder For Feature Extraction,” 2017 International Joint Conference on Neural Networks (IJCNN), pp. 364-371, 2017.
[5] S. Jégou, M. Drozdzal, D. Vazquez, A. Romero, and Y. Bengio, “The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation,” Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference, pp. 1175-1183, 2017.
[6] G. Huang, Z. Liu, K. Q. Weinberger, and L. van der Maaten, “Densely Connected Convolutional Networks,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700-4708, 2017.
[7] V. Badrinarayanan, A. Kendall, and R. Cipolla, “SegNet: A Deep Convolutional Encoder-Decoder Architecture For Image Segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 12, pp. 2481-2495, 2017.
[8] L. Bottou, “Large-Scale Machine Learning With Stochastic Gradient Descent,” Proceedings of COMPSTAT’2010, pp. 177-186, Springer, 2010.
[9] K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks For Large-Scale Image Recognition,” arXiv preprint arXiv:1409.1556., 2014.
[10] M. Ranzato, F. J. Huang, Y. Boureau, and Y. LeCun, “Unsupervised learning of invariant feature hierarchies with applications to object recognition,” CVPR, 2007.
[11] H. Noh, S. Hong, and B. Han, “Learning Deconvolution Network For Semantic Segmentation,” IEEE International Conference on Computer Vision, pp. 1520-1528, 2015.
[12] S. Zheng, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Su, D. Du, C. Huang, and P. H. Torr, “Conditional random fields as recurrent neural networks,” Proceedings of the IEEE International Conference on Computer Vision, pp. 1529–1537, 2015.
[13] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “DeepLab: Semantic Image Segmentation With Deep Convolutional Nets, Atrous Convolution, And Fully Connected CRFs,” arXiv preprint arXiv:1606.00915, 2016.
[14] M. Everingham, S. A. Eslami, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, “The PASCAL Visual Object Classes Challenge: A Retrospective,” International Journal of Computer Vision, vol. 111, no. 1, pp. 98-136, 2015.
[15] C.-Y. Lee, P. W. Gallagher, and Z. Tu, “Generalizing Pooling Functions In Convolutional Neural Networks: Mixed, Gated, and Tree,” Artificial Intelligence and Statistics, pp. 464-472, 2016.
[16] H. Zhao, J. Shi, X. Qi, X. Wang and J. Jia, "Pyramid Scene Parsing Network," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, 2017, pp. 6230-6239.
[17] S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He, “Aggregated Residual Transformations For Deep Neural Networks,” Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference, pp. 5987-5995, 2017.
[18] F. Yu and V. Koltun, “Multi-Scale Context Aggregation By Dilated Convolutions,” arXiv preprint arXiv:1511.07122, 2015.
[19] W. Liu, A. Rabinovich, and A. C. Berg, “Parsenet: Looking wider to see better.” arXiv preprint arXiv:1506.04579, 2015.
[20] L. Shen, Z. Lin, and Q. Huang. “Relay backpropagation for effective learning of deep convolutional neural networks.” ECCV, 2016.
[21] C. Lee, S. Xie, P. W. Gallagher, Z. Zhang, and Z. Tu. “Deeply-supervised nets.” AISTATS, 2015.
[22] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, “Object Detectors Emerge In Deep Scene CNNs,” arXiv preprint arXiv:1412.6856, 2014.
[23] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. E. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. “Going deeper with convolutions.” CVPR, 2015.
[24] B. Zhou, H. Zhao, X. Puig, S. Fidler, A. Barriuso, and A. Torralba. “Semantic understanding of scenes through the ADE20K dataset.” arXiv preprint arXiv:1608.05442, 2016.
[25] S. Lazebnik, C. Schmid, and J. Ponce. “Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories.” CVPR, 2006.
[26] K. He, X. Zhang, S. Ren, and J. Sun. “Spatial pyramid pooling in deep convolutional networks for visual recognition.” ECCV, 2014.
[27] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “DeepLab: Semantic Image Segmentation With Deep Convolutional Nets, Atrous Convolution, And Fully Connected CRFs,” IEEE transactions on pattern analysis and machine intelligence, vol. 40, no. 4, pp. 834-848, 2018.
[28] S. Ioffe and C. Szegedy. “Batch normalization: Accelerating deep network training by reducing internal covariate shift.” ICML, 2015.
[29] Z. Tan, B. Liu, and N. Yu, “PPEDNet: Pyramid Pooling Encoder-Decoder Network For Real-Time Semantic Segmentation,” Image and Graphics, Cham, pp. 328-339, 2017.
[30] A. Berg, J. Deng, and L. Fei-Fei, “Large Scale Visual Recognition Challenge (ILSVRC),” 2014, http://www.image-net.org/challenges/LSVRC
[31] M. Mostajabi, P. Yadollahpour, G. Shakhnarovich, “Feedforward semantic segmentation with zoom-out features.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3376–3385 (2015)
[32] A. Paszke, A. Chaurasia, S. Kim, and E. Culurciello, “ENet: A Deep Neural Network Architecture For Real-Time Semantic Segmentation,” in arXiv preprint arXiv:1606.02147, 2016.
[33] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional Networks For Biomedical Image Segmentation,” Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Cham, pp. 234-241, 2015.
[34] G. J. Brostow, J. Fauqueur, and R. Cipolla, “Semantic Object Classes In Video: A High-Definition Ground Truth Database,” Pattern Recognition Letters, vol. 30, no. 2, pp. 88-97, 2009.
[35] A. Breheret, “Pixel Annotation Tool” (Online): https://github.com/abreheret/PixelAnnotationTool
[36] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative Adversarial Nets,” Advances in Neural Information Processing Systems, pp. 2672-2680, 2014.