Taiwan National Digital Library of Theses and Dissertations (臺灣博碩士論文加值系統)
Detailed Record

我願授權國圖
: 
twitterline
Author: 林俊佑
Author (English): Chun-Yo Lin
Title (Chinese): 基於雲端與移動運算平台實現卷積神經網路物件辨識系統應用於安全導向程序
Title (English): Realtime CNN-based Object Recognition Framework for Safety-Critical Application using Federated Mobile Cloud Computing Platforms
Advisor: 施吉昇
Oral Defense Date: 2017-07-07
Degree: Master's
Institution: 國立臺灣大學 (National Taiwan University)
Department: 資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering)
Academic Field: Engineering
Academic Discipline: Electrical and Computer Engineering
Document Type: Academic thesis
Publication Year: 2017
Graduation Academic Year: 105 (2016-2017)
Language: English
Pages: 35
Keywords (Chinese): 物件辨識 (object detection); 在線學習 (online learning); 增進式學習 (incremental learning); 先進輔助駕駛系統 (ADAS); 雲端運算 (cloud computing)
Keywords (English): Object Detection; Online Training; Incremental Training; ADAS; Cloud Computing
Usage statistics:
  • Cited by: 1
  • Views: 287
  • Downloads: 0
  • Bookmarked: 0
Abstract (Chinese):
In recent years, as novel deep learning models keep appearing, vision-based object recognition has become increasingly powerful. However, even state-of-the-art deep learning object detection techniques still lack the accuracy and speed needed for real-life applications. Taking advanced driver assistance systems (ADAS) as an example, the object detection system must quickly recognize every object on the road; insufficient accuracy or speed can lead to a collision. To improve the overall performance of the object detection system, we first investigate why the detection model's accuracy is poor. We found a very large discrepancy between the training set and the testing set, and this discrepancy prevents the model from recognizing objects in the testing set. We apply PCA to both sets to visualize the discrepancy. In addition, based on the ADAS regulations, we derive the minimum frame rate the overall object detection system must achieve: 3.75 fps. Therefore, to raise the overall accuracy and meet the regulatory requirement, we propose a system that combines cloud and mobile computing. We use incremental training on the cloud to improve, at run time, the accuracy of the object detection model deployed on the mobile device. Furthermore, we interleave object detection with object tracking to raise the overall detection speed. With the proposed system, the model's accuracy increases by 56% and the detection speed improves by at least three times.
Abstract (English):
With novel deep learning methods, vision-based object detection has become more and more powerful. However, the performance of even state-of-the-art object detection methods is still unacceptable in real-life scenarios. Take ADAS as an example: the object detection system must recognize objects and run at a high frame rate to avoid potential collisions. To improve the performance of object detection, we first analyze the root cause of the poor accuracy. We found a large discrepancy between the training set and the testing set, and this discrepancy prevents the model from recognizing objects in the testing set. We visualize the discrepancy by applying PCA to both sets.
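The thesis does not spell out here which features are fed to PCA; a minimal sketch of this kind of visualization, assuming flattened pixel intensities as features (all images the same size) and scikit-learn/matplotlib for the projection and plot, might look like this:

```python
# Minimal sketch: project both sets onto the top-2 principal components
# of the combined data to visualize the training/testing domain gap.
# Raw flattened pixels as features are an assumption, not the thesis's
# stated feature choice.
import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

def to_features(images):
    """Flatten each HxWxC image into a 1-D float vector."""
    return np.stack([img.astype(np.float32).ravel() for img in images])

def plot_domain_gap(train_imgs, test_imgs):
    X_train = to_features(train_imgs)
    X_test = to_features(test_imgs)
    # Fit PCA on both sets together so they share one projection.
    pca = PCA(n_components=2).fit(np.vstack([X_train, X_test]))
    Z_train = pca.transform(X_train)
    Z_test = pca.transform(X_test)
    plt.scatter(Z_train[:, 0], Z_train[:, 1], s=5, label="training set")
    plt.scatter(Z_test[:, 0], Z_test[:, 1], s=5, label="testing set")
    plt.xlabel("PC 1")
    plt.ylabel("PC 2")
    plt.legend()
    plt.title("Training vs. testing set after PCA")
    plt.show()
```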
We also calculate the minimum frame rate the object detection system needs to achieve: based on the ADAS/AEBS regulation tests, it is 3.75 fps.
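The abstract does not reproduce the derivation (it appears in Chapter 3). As a purely hypothetical back-of-envelope reconstruction, if the regulation test scenario leaves a time budget of 4/15 s (about 0.267 s) per detection opportunity, the quoted rate falls out directly:

```python
# Hypothetical reconstruction: the 4/15 s budget is an assumption chosen
# to match the 3.75 fps figure quoted above; the thesis's actual
# derivation from the ADAS/AEBS regulation tests is in Chapter 3.
time_budget_s = 4.0 / 15.0          # seconds available per detection
min_fps = 1.0 / time_budget_s       # frames the detector must process
print(f"minimum detection rate: {min_fps:.2f} fps")   # -> 3.75 fps
```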
Thus, we propose a framework in which mobile and cloud devices collaborate to enhance both the accuracy and the speed of object detection. We perform online incremental training on the cloud to enhance, at run time, the accuracy of the object detection model deployed on the mobile device.
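A minimal sketch of what the cloud-side update loop could look like, assuming a PyTorch-style detector that returns its training loss in train mode; `detector`, the `(frame, target)` pairs, and the returned weights are hypothetical placeholders, not the thesis's actual implementation (Section 4.2 describes the real method, including training-set selection to avoid overfitting):

```python
# Sketch of cloud-side online incremental training (assumptions above).
import torch

def incremental_update(detector, new_samples, lr=1e-4, steps=20):
    """Briefly fine-tune on frames uploaded from the vehicle, then
    return the updated weights to push back to the mobile device."""
    optimizer = torch.optim.SGD(detector.parameters(), lr=lr)
    detector.train()
    for _ in range(steps):
        for frame, target in new_samples:   # targets from self-training
            loss = detector(frame, target)  # assumed to return a scalar loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return detector.state_dict()            # weights shipped to the device
```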
In addition, we interleave object detection with object tracking to speed up the overall detection frame rate.
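A minimal sketch of the interleaving idea: run the expensive CNN detector only on every N-th frame and let lightweight visual trackers carry the boxes in between. The detector stub, the interleaving ratio, and the choice of KCF trackers (from the opencv-contrib-python package) are assumptions; the thesis's own pipeline, including ROI cropping, is described in Section 4.3:

```python
# Sketch: interleave slow CNN detection with fast visual tracking.
import cv2

DETECT_EVERY = 4   # assumed ratio: 1 detected frame, 3 tracked frames

def detect_with_tracking(frames, run_detector):
    """`run_detector(frame)` is a stand-in for the CNN; it should
    return a list of (x, y, w, h) boxes."""
    trackers, results = [], []
    for i, frame in enumerate(frames):
        if i % DETECT_EVERY == 0:           # slow, accurate path
            boxes = run_detector(frame)
            trackers = []
            for box in boxes:
                t = cv2.TrackerKCF_create()
                t.init(frame, tuple(int(v) for v in box))
                trackers.append(t)
        else:                               # fast, approximate path
            boxes = []
            for t in trackers:
                ok, box = t.update(frame)
                if ok:
                    boxes.append(box)
        results.append(boxes)
    return results
```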
After applying our framework, the model's recall increases by about 56% and the detection frame rate improves by at least three times.
Table of Contents

Acknowledgments ii
摘要 (Chinese Abstract) iii
Abstract iv
1 Introduction 1
  1.1 Motivation 1
  1.2 Contribution 3
  1.3 Thesis Organization 4
2 Background and Related Work 5
  2.1 Background 5
    2.1.1 Convolutional Neural Network 5
    2.1.2 CNN-Based Object Detection Method 6
    2.1.3 Visual Tracking 6
    2.1.4 Semi-Supervised Self-Training 7
    2.1.5 ADAS/AEBS 8
    2.1.6 Nvidia Tegra X1 (TX1) 8
  2.2 Related Work 9
3 ADAS/AEBS Architecture and Performance Requirement 10
  3.1 ADAS/AEBS Architecture 10
  3.2 Laws and Regulations of AEBS 12
  3.3 Performance Measurement on TX1 13
    3.3.1 Model Description 13
    3.3.2 Runtime Estimation on TX1 14
4 Targeted Problem and Proposed Method 15
  4.1 Targeted Problem 16
    4.1.1 Cause of Low Accuracy 16
    4.1.2 Minimum Framerate 18
  4.2 Online Incremental Training Framework 19
    4.2.1 Framework Architecture 19
    4.2.2 Online Incremental Training Method 20
    4.2.3 Overfitting and Training Set Selection 22
    4.2.4 Model State Labeling 23
  4.3 Detection with Tracking 24
    4.3.1 ROI Cropping 24
    4.3.2 Detection with Tracking 24
5 Performance Evaluation 26
  5.1 Experiment Environment and Performance Metrics 26
  5.2 Evaluation Results 27
    5.2.1 Online Incremental Training Results 27
    5.2.2 Influence of Training Set Selection 28
    5.2.3 Influence of Different Scene on Model Accuracy 29
    5.2.4 Detection with Tracking 30
6 Conclusion 32
Bibliography 33