National Digital Library of Theses and Dissertations in Taiwan

Detailed Record

Author: LE, TRUNG-HIEU
Title (Chinese): 不同環境狀況下的強健型物件偵測方法
Title (English): Robust Object Detection in Different Environmental Conditions
Advisor: HUANG, SHIH-CHIA
Committee members: HUANG, SHIH-CHIA; KUO, SY-YEN; YEN, HSU-CHUN; CHANG, YAO-WEN; LIN, CHIA-WEN; CHENG, WEN-HUANG; HUANG, CHING-CHUN; HANG, HSUEH-MING; TSAI, WEI-HO
Oral defense date: 2020-05-12
Degree: Doctoral
Institution: National Taipei University of Technology
Department: International Graduate Program in Electrical Engineering and Computer Science
Discipline: Engineering
Field of study: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2020
Academic year of graduation: 108 (2019–2020)
Language: English
Number of pages: 57
Keywords (English): Object detection; Dual-subnet network; Joint learning; Multitask learning; Fast detection; Cross-resolution feature fusion; Convolutional neural network
Usage statistics:
  • Cited by: 0
  • Views: 320
  • Downloads: 30
Over the past half decade, object detection approaches based on convolutional neural networks have been widely studied and successfully applied in many computer vision applications. However, detecting objects in adverse conditions, including inclement weather and low-light environments, remains a major challenge because of poor visibility. In this thesis, the outdoor object detection problem in the presence of fog is first addressed, with the aim of improving the quality of intelligent autonomous vehicle (IAV) systems, by introducing a novel dual-subnet network (DSNet) that can be trained end-to-end and jointly learns three tasks: visibility enhancement, object classification, and object localization. DSNet attains its full performance improvement by combining two subnetworks: a detection subnet and a restoration subnet. RetinaNet is employed as the backbone network, also called the detection subnet, which is responsible for learning to classify and locate objects. The restoration subnet is designed by sharing feature extraction layers with the detection subnet and adopting a feature recovery (FR) module for visibility enhancement.

Second, to detect small hands quickly and accurately in indoor environments for the development of intelligent homecare systems, a single shot multibox detector (SSD) is used as the base architecture, and a novel cross-resolution feature fusion (CFF) approach is proposed to add contextual and semantic information to the shallower layers, which are responsible for detecting small objects. The CFF approach improves performance significantly owing to two key modules: a narrow atrous spatial pyramid pooling (N-ASPP) module and a richer semantic information generation (RSIG) module. The N-ASPP module employs atrous convolutions with different atrous rates to capture multiscale context information. The RSIG module uses a resolution-matching submodule to enlarge an input feature map and a ResNeXt block to exploit richer semantic information. In verification experiments, the proposed DSNet achieved 50.84% mean average precision (mAP) on a synthetic foggy dataset and 41.91% mAP on the public natural Foggy Driving dataset, while the proposed CFF-SSD model achieved 66.41% average precision (AP) on the Oxford hand test set at 55 frames per second on a GTX 1080 Ti graphics processing unit. Both proposed methods outperform their baseline detection models and many state-of-the-art object detectors while maintaining high speed.
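The following is a minimal PyTorch-style sketch of the dual-subnet idea described in the abstract: a shared feature extractor feeds both a detection head and a restoration (feature recovery) head, and their losses are optimized jointly so that visibility enhancement and detection shape the same features. The layer sizes, the simple recovery head, the placeholder detection loss, and the loss weighting are illustrative assumptions, not the thesis implementation.

import torch
import torch.nn as nn

class SharedBackbone(nn.Module):
    # Early convolutional layers shared by both subnets (hypothetical sizes).
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.features(x)

class RestorationHead(nn.Module):
    # Feature-recovery-style head: maps shared features back to a clean image.
    def __init__(self):
        super().__init__()
        self.recover = nn.Conv2d(64, 3, 3, padding=1)

    def forward(self, feats):
        return self.recover(feats)

class DetectionHead(nn.Module):
    # Stand-in for a RetinaNet-style head: per-location class scores and box offsets.
    def __init__(self, num_classes=8, num_anchors=9):
        super().__init__()
        self.cls = nn.Conv2d(64, num_anchors * num_classes, 3, padding=1)
        self.box = nn.Conv2d(64, num_anchors * 4, 3, padding=1)

    def forward(self, feats):
        return self.cls(feats), self.box(feats)

class DualSubnet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = SharedBackbone()
        self.restore = RestorationHead()
        self.detect = DetectionHead()

    def forward(self, hazy_img):
        feats = self.backbone(hazy_img)          # feature extraction layers shared by both subnets
        return self.detect(feats), self.restore(feats)

# Joint training step: the restoration loss (against the clear image) and a
# detection loss are summed, so both tasks are learned end-to-end together.
model = DualSubnet()
hazy = torch.randn(2, 3, 128, 128)
clear = torch.randn(2, 3, 128, 128)
(cls_out, box_out), restored = model(hazy)
restoration_loss = nn.functional.mse_loss(restored, clear)
detection_loss = cls_out.abs().mean() + box_out.abs().mean()  # placeholder for focal + box regression losses
total_loss = detection_loss + 1.0 * restoration_loss          # loss weighting is an assumption
total_loss.backward()

Similarly, a small sketch of an atrous pyramid in the spirit of the N-ASPP module mentioned above: parallel 3x3 atrous convolutions with different dilation rates gather multiscale context around each location before the branches are fused. The channel counts and rates are assumptions chosen only for illustration.

class AtrousPyramid(nn.Module):
    def __init__(self, in_ch=256, branch_ch=64, rates=(1, 2, 4, 8)):
        super().__init__()
        # One 3x3 atrous convolution per dilation rate; padding=rate preserves spatial size.
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, branch_ch, 3, padding=r, dilation=r) for r in rates]
        )
        self.fuse = nn.Conv2d(branch_ch * len(rates), in_ch, 1)  # merge multiscale context

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

feat = torch.randn(1, 256, 38, 38)   # e.g. a shallow SSD feature map
out = AtrousPyramid()(feat)          # output shape: (1, 256, 38, 38)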
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . iii
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1 Object Detection . . . . . . . . . . . . . . . . . . 6
2.2 Image Dehazing . . . . . . . . . . . . . . . . . . 8
2.3 Multi-task Learning . . . . . . . . . . . . . . . . 10
3 Deep Convolutional Neural Network-based Approach for Object Detection . . . . . . 12
3.1 Proposed Dual-subnet Network . . . . . . . . . . . . . . . 12
3.1.1 Detection Subnetwork . . . . . . . . . . . . . . . 12
3.1.2 Restoration Subnetwork . . . . . . . . . . . . . . 15
3.2 Proposed Cross-Resolution Feature Fusion Approach . . . 20
3.2.1 Narrow Atrous Spatial Pyramid Pooling Module . 22
3.2.2 Richer Semantic Information Generation Module . 23
4 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . 26
4.1 Verification and Analysis of The Proposed DSNet Approach 26
4.1.1 Datasets . . . . . . . . . . . . . . . . . . . . . . . 27
4.1.2 Implementation Details . . . . . . . . . . . . . . 29
4.1.3 Results on The FOD Dataset and The Foggy Driving Dataset . . . . . . . 31
4.1.4 Inference Time . . . . . . . . . . . . . . . . . . . 40
4.2 Verification and Analysis of the Proposed Cross-Resolution Feature Fusion Approach . . . . . . 41
4.2.1 Implementation Details . . . . . . . . . . . . . . . 42
4.2.2 Results on Oxford Hand Dataset . . . . . . . . . . 43
4.2.3 Inference Time . . . . . . . . . . . . . . . . . . . 49
5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
