National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

Detailed Record

Author: 陳品孝
Author (English): CHEN, PIN HSIAO
Title: 一個基於生成式對抗網路的機車聲偵測系統
Title (English): A Detection System of Motorcycle Sounds Based on Generative Adversarial Networks
Advisor: 竇其仁
Advisor (English): DOW, CHYI REN
Committee Members: 陳烈武, 黃秀芬
Oral Defense Date: 2021-01-05
Degree: Master's
Institution: 逢甲大學 (Feng Chia University)
Department: 資訊工程學系 (Information Engineering)
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Document Type: Academic thesis
Year of Publication: 2021
Graduation Academic Year: 109
Language: English
Pages: 76
Keywords (Chinese): 聲音辨識, 生成式對抗網路, 安全步行, 頻譜圖
Keywords (English): Sound Recognition, Generative Adversarial Networks, Safe Walking, Spectrogram
Metrics: Cited 0, Views 152, Ratings (none), Downloads 0, Bookmarked 0
摘要 (translated):
The pedestrian traffic accident rate is rising year by year. For pedestrians, the danger lies not in vehicles approaching from the front, but in vehicles approaching from behind, which are hard to perceive visually. A vehicle behind is usually noticed first by hearing, and only by turning to look can the pedestrian confirm it; a pedestrian with impaired hearing therefore has difficulty detecting vehicles coming from behind. In Taiwan, where motorcycles are among the most common vehicle types, motorcycle-related traffic accidents are frequent. How to effectively detect the sound of motorcycles approaching from behind and warn pedestrians on the road is thus a pedestrian-safety issue worth investigating. This thesis designs and implements a motorcycle-sound detection system based on Generative Adversarial Networks (GAN). First, we collect a dataset of motorcycle engine and horn sounds from the Internet and from on-site field recordings, and apply cropping, noise reduction, time shifting, and time stretching to the collected audio to achieve data augmentation and preprocessing. Next, a GAN further augments the data obtained through these traditional augmentation steps; it is expected to generate, from random numbers, many samples that differ only slightly from the original data, which benefits the subsequent training of a Convolutional Neural Network (CNN). We then adopt the Residual Network (ResNet) architecture within the CNN family as the sound-recognition network, and use the resulting model to implement a prototype motorcycle-sound recognition and early-warning system. Finally, we conduct an experimental analysis of the GAN-generated sounds, comparing how the generated sounds and the real dataset affect the performance of the trained model.
Abstract:
The pedestrian traffic accident rate is increasing year by year. For pedestrians, the danger comes from motorcycles approaching from behind, which are difficult to detect visually. Pedestrians usually first notice a motorcycle behind them by hearing it and then confirm it by looking backward; pedestrians with poor hearing therefore have difficulty detecting motorcycles coming from behind. In Taiwan, motorcycles are ubiquitous, and the traffic accidents they cause are frequent. Effectively detecting the sound of motorcycles approaching from behind and alerting pedestrians is thus a safe-walking issue worthy of discussion. This thesis designs and implements a detection system for motorcycle sounds based on generative adversarial networks. First, we collect engine and horn sounds of motorcycles from public databases on the Internet and from our own field recordings, then process the data with cropping, noise reduction, time shifting, and time stretching to achieve data augmentation and preprocessing. We then use a Generative Adversarial Network (GAN) to further augment the data produced by these traditional augmentation methods, generating many samples from random numbers that differ only slightly from the original data, to the benefit of Convolutional Neural Network (CNN) training. Next, we adopt the Residual Network (ResNet) architecture within the CNN family as the network for sound recognition, and use the output model to implement a prototype motorcycle-sound recognition and warning system. Finally, we conduct an experimental analysis of the GAN-generated sounds, evaluating how the generated sounds and the collected ground-truth data affect the performance of the trained model.
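The preprocessing steps named in the abstract (time shifting, time stretching, noise handling) and the spectrogram representation fed to the CNN can be sketched roughly as below. This is a minimal illustrative sketch in plain NumPy, not the thesis's actual implementation; every function name and parameter here is hypothetical, and in practice an audio library such as librosa provides equivalent, more robust operations.

```python
import numpy as np

def time_shift(y, shift):
    """Circularly shift a waveform by `shift` samples."""
    return np.roll(y, shift)

def time_stretch(y, rate):
    """Stretch (rate < 1) or compress (rate > 1) a waveform by
    linear interpolation over a resampled time axis."""
    n_out = int(len(y) / rate)
    new_idx = np.linspace(0, len(y) - 1, n_out)
    return np.interp(new_idx, np.arange(len(y)), y)

def add_noise(y, snr_db, rng=None):
    """Add white Gaussian noise at a target signal-to-noise ratio (dB)."""
    rng = rng or np.random.default_rng(0)
    noise_power = np.mean(y ** 2) / (10 ** (snr_db / 10))
    return y + rng.normal(0.0, np.sqrt(noise_power), size=len(y))

def spectrogram(y, n_fft=512, hop=256):
    """Magnitude spectrogram: Hann-windowed short-time Fourier transform."""
    window = np.hanning(n_fft)
    frames = [y[i:i + n_fft] * window
              for i in range(0, len(y) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(frames, axis=1)).T  # (freq_bins, time_frames)

# Example: augment a 1 s, 440 Hz tone sampled at 16 kHz.
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
y = np.sin(2 * np.pi * 440 * t)
shifted = time_shift(y, sr // 10)   # shift by 0.1 s
stretched = time_stretch(y, 0.8)    # slow down: output is 1.25x longer
noisy = add_noise(y, snr_db=20)
spec = spectrogram(noisy)           # 2-D input "image" for the classifier
```

Converting the augmented waveforms into spectrogram images is what allows an image-oriented CNN such as ResNet to be applied to sound classification.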
Acknowledgements i
Chinese Abstract ii
Abstract iii
Table of Contents iv
List of Figures vi
List of Tables viii
Chapter 1 Introduction 1
1.1 Background 1
1.2 Motivation 4
1.3 Overview of Research 8
1.4 Thesis Organization 9
Chapter 2 Related Work 10
2.1 Safe Walking 10
2.2 Machine Learning on Sound 12
2.3 Generative Adversarial Networks 17
Chapter 3 System Architecture 19
3.1 System Overview 19
3.2 Data Collection Module 21
3.3 Data Preprocessing Module 22
3.4 GAN Data Augmentation Module 23
3.5 Sound Classification Module 25
Chapter 4 System Implementation and Prototype 27
4.1 Data Collection of Horn Sounds and Engine Sounds 27
4.2 Data Preprocessing Module 28
4.3 Generative Adversarial Networks 31
Chapter 5 Experimental Results 36
5.1 Collect Data Results 36
5.2 Use CycleGAN Augmentation Data to Train Model 41
5.3 Double Data to Train Model 43
5.4 Compare GAN Generate Motor Engine to Improve Model 45
5.5 Compare Double Test Dataset 48
5.6 Use Ensemble to Improve Model 51
5.7 Experimental Conclusions 53
Chapter 6 Conclusions 61
References 62

