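Graduate student: 王俊智
Graduate student (English name): Jun-Zhi Wang
Thesis title: 兩岸四地與日韓地址語音辨識系統之設計研究
Thesis title (English): A Design of Speech Recognition System for Address in Cross-Strait Four Regions, Japan and Korea
Advisor: 陳志堅
Advisor (English name): Chih-Chien Chen
Degree: Master's
Institution: National Sun Yat-sen University (國立中山大學)
Department: Department of Electrical Engineering (graduate institute)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2015
Academic year of graduation: 104 (ROC calendar)
Language: Chinese
Number of pages: 106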
Chinese keywords: 隱藏式馬可夫模型; 線性預估倒頻譜係數; 梅爾倒頻譜係數; 單詞標籤相關性; 單音次分類
English keywords: Hidden Markov model; Linear predicted cepstrum coefficients; Phrase tagging-correlation; Mel-frequency cepstrum coefficients; Monotone sub-classification
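An address is the most direct and clear way to describe a location. In the past, people looking for a place usually consulted paper maps; in recent years, with advances in technology and the spread of the Internet, online maps such as Google Map and Apple Map have appeared. Today, for the sake of convenience, addresses can be entered by voice using speech recognition technology. How to recognize addresses more accurately and efficiently is therefore an important topic both in academia and in the information industry.
This thesis uses the classification properties of single syllables (monotones), combined with word-tag matching, to build a strengthened mechanism for conventional spoken-address recognition systems. First, we recorded one round of 2,699 commonly used two-character words as the training corpus and divided them into 404 subclasses according to the pronunciation rules of Mandarin syllables. Next, using the mean and standard deviation of the zero-crossing rate of the initial and final signals in each syllable, together with the relative duration of the initial, we further divided the syllable characteristics into six major classes to reduce confusion among Mandarin voiced initials. Finally, Mel-frequency cepstrum coefficients and linear predicted cepstrum coefficients were passed through a hidden Markov process to produce a two-feature parameter model for each single syllable.
For the recognition strategy, we build word tags: every character sound in the database is given a set of tags, and the tags are organized by single syllable into 404 matching codebooks. By comparing tag information, the spoken-address recognition system is not constrained by the number of characters in the spoken address words, which reduces recognition errors when part of an address is omitted or extra words are spoken.
For the implementation, we collected about 640,000 place-name and road-name records from six regions: Taiwan, Mainland China, Hong Kong, Macao, Japan, and Korea. Combined with the Google API interface, the system searches spoken addresses, using local street addresses with house numbers as examples. Running on Linux Ubuntu 12.04, the system achieves a recognition rate of about 94.21% when complete addresses are spoken.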
An address is the most straightforward description of a location. In the past, people searched for places using paper maps. Nowadays, thanks to advances in speech science and Internet technology, online maps such as Google Map and Apple Map accept spoken queries, and voice search has become popular for address inquiries. Improving the accuracy of speech recognition for addresses is therefore not only an academic challenge but also a commercially valuable task for the information industry.
This thesis applies two strategies, monotone sub-classification and phrase tagging-correlation, to improve the accuracy of a conventional recognition system for Mandarin addresses. First, 2,699 two-syllable words are chosen and recorded as training material. Second, all the monotones (single syllables) are grouped into 404 categories using Mandarin pronunciation rules and further sub-classified into six classes according to the mean and standard deviation of their zero-crossing rate and the ratio of consonant length to vowel length, which alleviates the confusion among Mandarin voiced consonants. Finally, the Mel-frequency cepstrum coefficients (MFCC) and linear predicted cepstrum coefficients (LPCC) are calculated, and a bi-parametric hidden Markov model is estimated for each syllable.
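The zero-crossing-rate features named above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the frame length, hop size, function names, and the assumption that the consonant/vowel boundary index is already known are hypothetical choices made for illustration. The thresholds that map these features to the six classes are not given in the abstract, so the sketch stops at feature extraction.

```python
from statistics import mean, pstdev


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def split_frames(samples, frame_len, hop):
    """Cut the signal into full-length frames; a short tail is dropped."""
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, hop)]


def syllable_features(samples, boundary, frame_len=200, hop=100):
    """Return ZCR mean, ZCR standard deviation, and consonant/vowel length ratio.

    `boundary` is the (assumed known) sample index where the consonant
    ends and the vowel begins; frame_len and hop are illustrative values.
    """
    vowel_len = len(samples) - boundary
    if boundary < 0 or vowel_len <= 0:
        raise ValueError("boundary must lie inside the syllable")
    rates = [zero_crossing_rate(f) for f in split_frames(samples, frame_len, hop)]
    if not rates:
        raise ValueError("syllable is shorter than one frame")
    return {
        "zcr_mean": mean(rates),
        "zcr_std": pstdev(rates),
        "length_ratio": boundary / vowel_len,
    }


# Toy syllable: a rapidly alternating "consonant" followed by a flat "vowel".
toy = [1, -1] * 200 + [1] * 400
features = syllable_features(toy, boundary=400, frame_len=200, hop=200)
```

In this toy signal the noise-like part crosses zero at every sample and the flat part never does, so the mean and spread of the per-frame rates separate the two regions, which is the kind of contrast the sub-classification relies on.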
Furthermore, an address recognition strategy based on phrase tagging-correlation is designed by creating tag codebooks for the 404 monotones in the address database. Because the system calculates the tagging-correlation between the spoken phrase and the designated phrase, the number of words spoken in the address phrase does not need to be exactly correct. Errors caused by missing or inserted words can therefore be remedied.
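The idea of matching by tags rather than by exact word count can be sketched as follows. The abstract does not give the tagging-correlation formula, so this example swaps in a simple stand-in: each syllable is tagged with the database entries and positions where it occurs, and candidates are ranked by a normalized longest-common-subsequence score. The function names, the romanized syllables, and the three sample addresses are all hypothetical.

```python
from collections import defaultdict


def build_codebook(entries):
    """Map each syllable to tags (entry index, position) where it occurs."""
    codebook = defaultdict(list)
    for idx, syllables in enumerate(entries):
        for pos, syl in enumerate(syllables):
            codebook[syl].append((idx, pos))
    return codebook


def lcs_length(a, b):
    """Length of the longest common subsequence of two syllable lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def best_match(spoken, entries, codebook):
    """Return the index of the best-scoring entry, or None if no tag matches."""
    # Only entries sharing at least one syllable tag are scored.
    candidates = {idx for syl in spoken for idx, _ in codebook.get(syl, [])}
    if not candidates:
        return None

    def score(idx):
        common = lcs_length(spoken, entries[idx])
        return 2 * common / (len(spoken) + len(entries[idx]))

    return max(sorted(candidates), key=score)


# Hypothetical database of romanized address phrases.
entries = [
    ["gao", "xiong", "shi", "lian", "hai", "lu"],
    ["tai", "bei", "shi", "zhong", "shan", "lu"],
    ["gao", "xiong", "shi", "zhong", "shan", "lu"],
]
codebook = build_codebook(entries)

# One syllable omitted, and one extra syllable inserted.
omitted = best_match(["gao", "xiong", "lian", "hai", "lu"], entries, codebook)
inserted = best_match(["tai", "bei", "shi", "da", "zhong", "shan", "lu"], entries, codebook)
```

Because the score depends on shared syllables in order rather than on equal length, an utterance with a dropped or extra syllable still ranks the intended entry first in this toy database.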
A Mandarin speech recognition system for addresses in Taiwan, Mainland China, Hong Kong, Macao, Japan, and South Korea is implemented using the Google API interface on a PC running Linux Ubuntu 12.04. About 640,000 place names and road names are collected in this study, and the recognition rate of the system is approximately 94.21%.
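Thesis Approval Certificate
Acknowledgements
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Research Method
1.3 Background of the Research Topic
1.4 Thesis Outline
Chapter 2 Speech Preprocessing and Related Techniques
2.1 Pre-emphasis
2.2 Hamming Window
2.3 Speech Segmentation Techniques
2.3.1 Speech/Non-speech Endpoint Detection
2.3.2 Continuous Speech Boundary Detection
2.3.3 Linear Prediction Error Energy (LPCEE)
Chapter 3 Analysis and Screening of Speech Characteristics
3.1 Analysis of Speech Characteristics
3.1.1 Analysis of Initials
3.1.2 Analysis of Finals
3.2 Classification Using Energy Waveforms
3.2.1 Uniformity and Non-uniformity of Consonants
3.2.2 Aspirated and Unaspirated Stops
3.2.3 Fricatives and Affricates
3.3 Monotone Classification Mechanism and Procedure
Chapter 4 Feature Extraction and Training
4.1 Mel-Frequency Cepstrum Coefficients (MFCC)
4.2 Linear Predicted Cepstrum Coefficients (LPCC)
4.3 Hidden Markov Model
4.3.1 Computing the Observation Probability
4.3.2 Finding the Best State Transition Path
4.3.3 Parameter Re-estimation
Chapter 5 Speech Encoding and Database Construction
5.1 Monotone Encoding
5.2 Database Construction and Matching
5.2.1 Database Construction
5.2.2 Multidimensional Information Index Matching
Chapter 6 Design, Training, and Performance Evaluation of the Recognition System
6.1 Flow and Architecture of the Recognition System
6.2 Training Strategy of the Recognition System
6.3 Chinese Address System Input
6.4 Implementation Performance and Evaluation of the Recognition System
6.4.1 System Parameter Settings
6.4.2 Construction of System Simulation Data
6.4.3 Mandarin Monotone Classification Experiment
6.4.4 Recognition Results of the Chinese Address System on Numbers
6.4.5 Recognition Results and Comparison of the Chinese Address System
6.4.6 Simulated Recognition Results of the Chinese Address System for Six Countries
Chapter 7 Conclusions, Future Prospects, and Suggestions
References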