National Digital Library of Theses and Dissertations in Taiwan

Author: Berlin Chen (陳柏琳)
Title: Speech Information Retrieval for Mandarin Chinese - Syllable-Based Indexing Features, Statistical Retrieval Models and Improved Approaches
Original title: 中文語音資訊檢索─以音節為基礎之索引特徵、統計式檢索模型及進一步技術
Advisor: Lin-shan Lee (李琳山)
Degree: Doctoral
Institution: National Taiwan University
Department: Graduate Institute of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2001
Academic year of graduation: 89 (ROC calendar)
Language: English
Pages: 104
Keywords: Speech Information Retrieval (語音資訊檢索); Speech Recognition (語音辨識); Hidden Markov Model (隱藏式馬可夫模型); Syllable (音節); Keyword Spotting (關鍵詞擷取); A* Search (A* 搜尋演算法); Information Fusion (資訊融合); Non-interpolated Average Precision (平均準確率)
Record statistics:
  • Cited: 10
  • Views: 694
  • Downloads: 86
  • Saved to reading lists: 3
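Abstract (translated from the Chinese): Speech information retrieval studies how speech recognition technology can be used to automatically build full-text indexing and retrieval mechanisms for the speech content of large multimedia collections such as broadcast news and digital museum archives. This dissertation presents a complete and detailed study of the problems that arise when natural-language speech or text queries are used to retrieve Mandarin Chinese speech information. Its scope covers the use and comparison of indexing features at different levels, methods for adjusting index term weights, query expansion techniques, and the application of statistical retrieval models. Two broadcast news corpora are used in the experiments: one collected in Taiwan, and the Voice of America Mandarin broadcast news corpus provided by the Linguistic Data Consortium (LDC) in the United States. First, using the broadcast news collected in Taiwan and building on the structural characteristics of the Chinese language, we propose a family of indexing feature combinations based on Chinese syllable information and compare them with conventional character- or word-based indexing. The comparison confirms that syllable-based indexing features have distinctive discriminating power and advantages for Mandarin speech information retrieval, and we also propose a number of methods for producing robust syllable-level indexing features. Second, using the Voice of America Mandarin broadcast news corpus, we explore from several perspectives the possibility of adding extra information to the retrieval process. From the speech recognition perspective, we use prosodic (stress and intonation) information and word confusion information from recognition in building the index. From the language processing perspective, we use part-of-speech information to adjust index term weights. From the information retrieval perspective, we apply techniques such as relevance feedback and term (or index) associations during retrieval. Combining these techniques yields a significant improvement in retrieval precision. Third, we apply a statistical retrieval model that combines hidden Markov models with N-gram language models. We experiment with various combinations of N-gram language models and indexing features and show that this statistical retrieval model outperforms the conventional vector space model. In addition, we adopt two Markov model training algorithms commonly used in speech recognition to improve the discriminating power of the retrieval model. Finally, we build a prototype demonstration system that lets users retrieve the broadcast news collected in Taiwan by voice input.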
Abstract (original English): Automatically indexing and retrieving large collections of speech information, known as Spoken Document Retrieval (SDR), has become increasingly important in recent years because of its potential use in navigating large quantities of multimedia information such as digital libraries. This dissertation presents a thorough investigation of Mandarin spoken document retrieval using natural language speech or text queries, exploring various indexing approaches, term weighting schemes, and retrieval models. Mandarin broadcast news speech from two sources is studied: broadcast news speech collected in Taiwan and the Topic Detection and Tracking (TDT-2 and TDT-3) broadcast news speech corpora released by the Linguistic Data Consortium (LDC). First, based on the broadcast news speech collected in Taiwan, a whole class of indexing features using syllable-level statistical characteristics is extensively investigated, taking into account the monosyllabic structure of the Chinese language. We compare the discriminating capabilities of the syllable-based approach with those of the word- and character-based approaches, and investigate whether these approaches can be integrated to provide additive discriminating capabilities. Techniques for providing robust syllable-level indexing features and enhancing retrieval performance are also examined. Second, based on the Topic Detection and Tracking corpora, we explore the use of extra information for spoken document retrieval from different perspectives. From the speech recognition perspective, we incorporate acoustic stress and word confusion information into the audio indexing. From the linguistic perspective, we apply part-of-speech information in both the audio indexing and the query representation. From the information retrieval perspective, we integrate techniques such as query expansion by term associations and blind relevance feedback into the retrieval process. Third, an HMM/N-gram-based linguistic processing approach for Mandarin spoken document retrieval is presented, also based on the Topic Detection and Tracking corpora. The underlying characteristics and different structures of this approach are extensively investigated. Its retrieval capabilities are verified by tests with word- and syllable-level indexing features and by comparison with the conventional vector space model approach. To further improve the discrimination capabilities of the document HMMs, both the expectation-maximization (EM) and minimum classification error (MCE) training algorithms are introduced in training. Finally, a client-server spoken document retrieval system that accepts both speech and text queries is implemented on a PC running Microsoft Windows.
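The HMM/N-gram-based retrieval idea named in the abstract can be illustrated with a minimal Python sketch. It assumes the simplest unigram form of such a model, in which each query term is generated either by the document or by the whole collection, and it uses overlapping syllable pairs as indexing terms. The mixture weight, the toy pinyin syllable sequences, and all function names are hypothetical; the dissertation's actual model structures, indexing features, and EM/MCE training procedures are not reproduced here.

```python
import math
from collections import Counter


def syllable_bigrams(syllables):
    """Return overlapping syllable pairs, used here as indexing terms."""
    return [tuple(syllables[i:i + 2]) for i in range(len(syllables) - 1)]


def log_query_likelihood(query_terms, doc_terms, corpus_counts, corpus_total,
                         weight=0.5):
    """Log-likelihood of the query under a two-component mixture:
    weight * P(term | document) + (1 - weight) * P(term | collection).
    The fixed weight is an assumption; it is not trained in this sketch."""
    doc_counts = Counter(doc_terms)
    doc_total = len(doc_terms)
    log_prob = 0.0
    for term in query_terms:
        p_doc = doc_counts[term] / doc_total if doc_total else 0.0
        p_corpus = corpus_counts[term] / corpus_total if corpus_total else 0.0
        mixed = weight * p_doc + (1.0 - weight) * p_corpus
        if mixed == 0.0:
            # Term unseen in the whole collection: the query has zero likelihood.
            return float("-inf")
        log_prob += math.log(mixed)
    return log_prob


def rank(query_syllables, documents, weight=0.5):
    """Rank documents (name -> syllable list) by query log-likelihood."""
    doc_terms = {name: syllable_bigrams(s) for name, s in documents.items()}
    corpus_counts = Counter(t for terms in doc_terms.values() for t in terms)
    corpus_total = sum(corpus_counts.values())
    query_terms = syllable_bigrams(query_syllables)
    scored = [
        (name, log_query_likelihood(query_terms, terms, corpus_counts,
                                    corpus_total, weight))
        for name, terms in doc_terms.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy collection: hypothetical recognized syllable sequences in pinyin.
DOCUMENTS = {
    "news_a": ["tai", "bei", "shi", "zheng", "fu"],
    "news_b": ["guo", "ji", "xin", "wen"],
}

if __name__ == "__main__":
    for name, score in rank(["tai", "bei", "shi"], DOCUMENTS):
        print(name, round(score, 3))
```

In this toy run the document that actually contains the query's syllable pairs ranks first, while the collection-level component keeps the other document's score finite. A vector space baseline would instead compare weighted term vectors, which is the comparison the abstract refers to.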
Abstract
List of Figures
List of Tables
Chapter 1: Introduction
1.1 Motivation
1.2 Related Works
1.3 Research Issues
1.3.1 Automatic Speech Recognition for Spoken Documents and Voice Queries
1.3.2 Appropriate Indexing Features for Speech Information Retrieval
1.3.3 Extra Acoustic and Linguistic Cues for Speech Information Retrieval
1.3.4 Retrieval Models for Speech Information Retrieval
1.4 Outline of This Dissertation
Chapter 2: Mandarin Broadcast News Speech Corpora and Retrieval Performance Measures
2.1 The Broadcast News Speech Collected in Taipei, Taiwan
2.2 The Topic Detection and Tracking Corpora
2.3 Evaluation Measures for Information Retrieval
Chapter 3: The Discriminating Capabilities of The Syllable-Based Indexing Features for Mandarin Spoken Document Retrieval
3.1 Introduction: Some Structural Features of Mandarin Chinese
3.2 Mandarin Speech Recognition
3.2.1 Acoustic Processing and Language Modeling
3.2.2 Speech Recognition
3.3 Retrieval Approaches Using Syllable-Level Statistical Characteristics
3.3.1 Syllable-Level Indexing Terms
3.3.2 Information Retrieval Model
3.4 Initial Experimental Results Using Syllable-Level Feature Alone
3.5 Comparing the Discriminating Capabilities of Syllable-Level Features with Character- and Word-Level Information
3.6 Fusion of Syllable-, Character- and Word-Level Information
3.7 Improved Syllable-Level Indexing Features from Syllable Lattices
3.7.1 Syllable-Level Utterance Verification (SUV)
3.7.2 Deletion of Low Frequency Indexing Terms (DLF)
3.7.3 Stop Terms (ST)
3.8 Further Retrieval Techniques Applied on Syllable-Based Features
3.8.1 Blind Relevance Feedback (BREF)
3.8.2 Term Associations (TA)
3.8.3 Experimental Results
3.9 Further Comparison and Fusion with Character- and Word-Level Information
3.10 Summary
Chapter 4: Improved Spoken Document Retrieval by Exploring Extra Acoustic and Linguistic Cues
4.1 Introduction
4.2 Experiment Setup
4.2.1 Information Retrieval Model
4.2.2 Baseline Experimental Results
4.3 Improvements from The Speech Recognition Perspective
4.3.1 The Acoustic Stress Information (AS)
4.3.2 The Word-Level Confusion Information (WC)
4.3.3 Experimental Results
4.4 Improvements From The Linguistic Perspective
4.4.1 Part-of-Speech (POS) Information
4.4.2 Experimental Results
4.5 Improvements from The Information Retrieval Perspective
4.5.1 Term Associations (TA)
4.5.2 Blind Relevance Feedback (BREF)
4.5.3 Experimental Results
4.6 Summary
Chapter 5: An HMM/N-Gram-Based Linguistic Processing Approach For Mandarin Spoken Document Retrieval
5.1 Introduction
5.2 Retrieval Models
5.2.1 HMM/N-Gram-Based Model
5.2.2 Vector Space Model
5.3 The Expectation-Maximization (EM) Training Procedure for the HMM/N-Gram-Based Retrieval Approach
5.4 Initial Experimental Results
5.4.1 Experiment Setup
5.4.2 Word-level vs. Syllable-level Indexing Features
5.4.3 Comparisons with Vector Space Model
5.5 Online Estimating the Weights
5.6 Minimum Classification Error (MCE) Training
5.7 Information Fusion for the HMM/N-Gram-Based Retrieval Approach
5.8 Summary
Chapter 6: The Voice-Activated Web-Based Mandarin Chinese Spoken Document Retrieval System
6.1 A*-Admissible Key-Phrase Spotting with Sub-Syllable Level Utterance Verification
6.1.1 Overview
6.1.2 Key-Phrase Spotting
6.1.3 Sub-Syllable Level Utterance Verification
6.1.4 Minimum Classification Error Training
6.1.5 Experiments
6.1.6 Summary
6.2 The Prototype Voice-Active Spoken Document Retrieval System
6.3 System Performance
Chapter 7: Concluding Remarks and Future Works
7.1 Concluding Remarks
7.2 Future Works
Bibliography