National Digital Library of Theses and Dissertations in Taiwan

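Author: Yi Lee (李易)
Title: A Preliminary Study on Automatic Detection of Filled Pause in Spontaneous Mandarin Speech
Title (Chinese): 自發性國語語音中自動偵測填充式停頓之初步研究
Advisor: Lin-Shan Lee (李琳山)
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Communication Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2008
Graduation academic year: 96 (ROC calendar)
Language: Chinese
Pages: 62
Keywords: spontaneous speech; disfluent speech; filled pause; multilayer perceptron
Keywords (Chinese): 自發性語音; 不流暢語音; 填充式停頓; 多層感知器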
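Recognition of read speech now achieves quite good results, but recognition of spontaneous speech still faces many difficulties. One of the most important is the large number of disfluencies in spontaneous speech, such as filled pauses, repetitions, restarts, prolongations, and repairs. These disfluencies give rise to many problems that degrade the performance of automatic speech recognition systems. The most frequent disfluency is the filled pause, for example 「嗯」, 「啊」, and 「呃」 in Mandarin.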
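Filled pauses differ from fluent speech in several ways. Acoustically, they are usually centrally articulated vowels. Prosodically, they are lengthened, and their waveform and spectrogram change slowly and smoothly. Linguistically, they are more likely to be immediately adjacent to silent pauses.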
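This thesis uses these characteristics to design a filled pause detection technique that is independent of the speech recognizer. First, candidate segment boundaries are extracted according to how sharply the Mel-frequency cepstral coefficient (MFCC) vectors change. Next, feature values are extracted from each segment according to the characteristics of filled pauses and combined into a feature vector. Finally, each segment is classified using a multilayer perceptron as the main classifier, combined with three different strategies.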
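The boundary extraction step can be illustrated with a minimal sketch. This is not the thesis's actual algorithm: the Euclidean distance measure, the local-peak rule, the `threshold` parameter, and the function names `frame_change`, `candidate_boundaries`, and `split_segments` are all assumptions made for illustration. The input is assumed to be a list of MFCC vectors, one per frame.

```python
import math

def frame_change(mfcc):
    """Euclidean distance between each pair of consecutive MFCC frames."""
    return [math.dist(a, b) for a, b in zip(mfcc[:-1], mfcc[1:])]

def candidate_boundaries(mfcc, threshold):
    """Return frame indices where the MFCC trajectory changes sharply.

    threshold is a hypothetical tuning parameter, not a value from the thesis.
    """
    change = frame_change(mfcc)
    boundaries = []
    for i, value in enumerate(change):
        left = change[i - 1] if i > 0 else float("-inf")
        right = change[i + 1] if i + 1 < len(change) else float("-inf")
        # Keep local peaks of the change curve that reach the threshold.
        if value >= threshold and value > left and value >= right:
            boundaries.append(i + 1)  # the boundary falls before frame i + 1
    return boundaries

def split_segments(num_frames, boundaries):
    """Turn boundary indices into (start, end) frame ranges, end exclusive."""
    edges = [0] + list(boundaries) + [num_frames]
    return [(s, e) for s, e in zip(edges[:-1], edges[1:]) if e > s]

# Toy input: three steady regions of five frames each, two coefficients per frame.
toy = [[0.0, 0.0]] * 5 + [[1.0, 1.0]] * 5 + [[0.0, 0.0]] * 5
cuts = candidate_boundaries(toy, threshold=1.0)
segments = split_segments(len(toy), cuts)
```

Each resulting segment would then be described by a feature vector and passed to the classifier. A slowly changing region such as a filled pause produces few internal boundaries under this kind of rule, which is the intuition the sketch is meant to convey.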
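On the CALLHOME corpus, the system simultaneously achieves recall and precision of roughly 70% or higher on data with a balanced class distribution, but only about 20% for both on data with the real distribution. Whether filled pause detection at this level of performance can be integrated into a speech recognizer to improve word recognition accuracy remains to be tested.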
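One general reason precision falls when moving from balanced to real data is that the same classifier produces proportionally more false alarms when the target class is rare. The sketch below works through that arithmetic. The false positive rate of 0.3 and the 5% prior are hypothetical values chosen for illustration, not figures from the thesis, and the function name `precision_at_prior` is likewise an assumption.

```python
def precision_at_prior(recall, false_positive_rate, prior):
    """Precision implied by a fixed recall and false positive rate.

    prior is the fraction of segments that truly are filled pauses.
    """
    true_positives = recall * prior
    false_positives = false_positive_rate * (1.0 - prior)
    total = true_positives + false_positives
    if total == 0.0:
        raise ValueError("no positive predictions, precision is undefined")
    return true_positives / total

# Hypothetical operating point: recall 0.7, false positive rate 0.3.
balanced = precision_at_prior(0.7, 0.3, prior=0.5)   # equal class sizes
skewed = precision_at_prior(0.7, 0.3, prior=0.05)    # filled pauses are rare
print(f"balanced: {balanced:.3f}, skewed: {skewed:.3f}")
```

With these illustrative numbers, precision is 0.700 on balanced data and about 0.109 when only 5% of segments are filled pauses, even though the classifier itself is unchanged.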
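Chapter 1 Introduction
1.1 Motivation
1.2 Current State of Research
1.3 Main Results
1.4 Outline of Chapters
Chapter 2 Background
2.1 Filled Pauses in Mandarin
2.1.1 Definition and Functions of Mandarin Fillers and Filled Pauses
2.1.2 Characteristics of Mandarin Filled Pauses
2.1.2.1 Acoustic Characteristics of Mandarin Filled Pauses
2.1.2.2 Linguistic Characteristics of Mandarin Filled Pauses
2.2 Corpus Used
2.3 Speech Features, Probabilistic Models, and Evaluation Methods Used for Filled Pause Detection
2.3.1 Gaussian Mixture Model
2.3.2 Multilayer Perceptron
2.3.3 System Performance Evaluation
2.4 Chapter Conclusion
Chapter 3 Segmentation and Feature Extraction
3.1 Overview of the System Architecture
3.2 Segmentation
3.3 Segment Feature Extraction
3.3.1 Segment Duration (1 feature)
3.3.2 Relative Segment Duration Ratio (1 feature)
3.3.3 Spectral Stability (1 feature)
3.3.4 Stable Interval Duration (8 features)
3.3.5 Presence of a Silent Pause Before and After the Segment (2 features)
3.3.6 Spectral Centroid (1 feature)
3.3.7 Relative Spectral Centroid Ratio (1 feature)
3.3.8 Variance of First-Order Delta MFCCs (1 feature)
3.4 Chapter Summary
Chapter 4 Segment Classification and Performance Evaluation
4.1 Segment Classification
4.2 Equal Partition Method
4.3 Gaussian Mixture Model Preliminary Screening Method
4.4 Chapter Conclusion
Chapter 5 Conclusions and Future Work
References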