
National Digital Library of Theses and Dissertations in Taiwan

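Graduate student: 陳昭宏
Graduate student (romanized): Chen, Jau-Hung
Thesis title (Chinese): 中文文句翻語音系統中合成單元選取及韻律訊息產生之研究
Thesis title (English): A Study on Synthesis Unit Selection and Prosodic Information Generation in a Chinese Text-to-Speech System
Advisor: 吳宗憲
Advisor (romanized): Wu Chung-Hsien
Degree: Doctoral
Institution: National Cheng Kung University
Department: Department of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 1998
Academic year of graduation: 86 (ROC calendar)
Language: Chinese
Number of pages: 135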
Chinese keywords: 文句翻語音; 語音資料庫; 合成單元; 音韻訊息; 語言特徵; 音韻樣板
English keywords: Text-to-Speech; Speech Database; Synthesis Unit; Prosodic Information; Linguistic Features; Prosodic Template
Usage statistics: cited 5 times; viewed 221 times; downloaded 0 times.
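This dissertation proposes methods for synthesis unit selection and
prosodic information generation in a Chinese text-to-speech system.
We adopt the monosyllable as the basic synthesis unit and use two
evaluation functions to select a set of synthesis units from a large
speech database. The same speech database is also used to build a
word-prosody-based template tree, constructed from the following
linguistic features: the tone combination of the word, word length,
part of speech, and the position of the word in the sentence. For each
combination of linguistic features, the template tree stores the
corresponding word prosody features, including pitch contour, average
energy, and duration. For prosody generation, we propose a sentence
intonation module and a template retrieval module to produce the target
prosodic templates. Finally, we implement an email "listening" system
called SATER, which combines Chinese text-to-speech, speaker
verification, telephony, and computer networking: a user only needs to
connect to the system by telephone (wired or wireless) to listen to his
email. In the experiments, inside-test results show that the prosodic
features synthesized by the system closely resemble the originals, and
the synthesis units selected by the computer also give satisfactory
synthesis quality.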
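As a rough illustration of the template tree described above, the following Python sketch stores word prosody templates in a nested dictionary with one level per linguistic feature. The abstract does not give the actual data layout, so the class names, field names, level ordering, feature labels such as "N" and "initial", and all numeric values are assumptions made only for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class ProsodyTemplate:
    """Prosodic features of one word (field names are hypothetical)."""
    pitch_contour: List[List[float]]  # one pitch vector per syllable
    average_energy: List[float]       # one value per syllable
    duration_ms: List[float]          # one value per syllable


@dataclass
class TemplateTree:
    """Nested-dict tree: word length -> tone combination -> POS -> position."""
    root: Dict = field(default_factory=dict)

    def insert(self, tones: Sequence[int], pos: str, position: str,
               template: ProsodyTemplate) -> None:
        # Walk (and create) the inner levels, then append at the leaf list.
        node = self.root
        for key in (len(tones), tuple(tones), pos):
            node = node.setdefault(key, {})
        node.setdefault(position, []).append(template)

    def lookup(self, tones: Sequence[int], pos: str,
               position: str) -> List[ProsodyTemplate]:
        # Return every template stored for this exact feature combination,
        # or an empty list when the combination was never observed.
        node = self.root
        for key in (len(tones), tuple(tones), pos, position):
            if key not in node:
                return []
            node = node[key]
        return node


# Made-up example: a two-syllable noun with tones 3 and 4 at sentence start.
tree = TemplateTree()
tree.insert([3, 4], "N", "initial",
            ProsodyTemplate([[7.1, 6.8], [5.0, 5.6]], [60.0, 55.0], [210.0, 180.0]))
print(len(tree.lookup([3, 4], "N", "initial")))
```

A leaf holds a list because several database words may share the same feature combination; how the system chooses among them, or what it does when a combination is missing, is not stated in the abstract and is left out of this sketch.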
In this dissertation, approaches to synthesis unit selection and
prosodic information generation are proposed for Chinese
text-to-speech (TTS) conversion. Monosyllables are adopted as the
basic synthesis units. A set of synthesis units is selected from a
large continuous speech database using two cost functions that
minimize the inter- and intra-syllable distortion. The speech
database is also used to build a word-prosody-based template tree
according to four linguistic features: tone combination, word
length, part of speech (POS) of the word, and word position in the
sentence. For each possible combination of linguistic features, the
template tree stores the prosodic features of a word, including
pitch contour, average energy, and syllable duration. Two modules,
one for sentence intonation and one for template selection, are
proposed to generate the target prosodic templates. In addition, a
Bayesian network is used to model the relationship between
linguistic features and prosodic information. Finally, a Speech
Activated Telephony Email Reader (SATER) is proposed. SATER is an
integrated system combining speaker verification, networking, and
text-to-speech conversion. A registered user can activate the system
and listen to his email through a wired or wireless telephone. In
the speaker verification subsystem, a time-varying verification
phrase is adopted: the speaker's password is used to generate the
verification phrases for that speaker, and a hidden Markov model
with a variable number of states is used to model each verification
phrase. Experimental results for the TTS conversion system showed
that, in the inside test, the synthesized prosodic features closely
resembled their original counterparts for most syllables. Subjective
evaluation also confirmed the satisfactory performance of these
approaches.
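The abstract says that synthesis units are chosen with two cost functions that minimize inter- and intra-syllable distortion, but it does not define them. The Python sketch below shows only the general shape of such a selection: for each syllable, every recorded candidate is scored and the cheapest one is kept. Both cost definitions, the equal weighting, and all function names are placeholder assumptions, not the dissertation's method.

```python
from typing import Dict, List, Sequence

Frame = Sequence[float]   # one spectral feature vector
Unit = List[Frame]        # frames of one recorded syllable instance


def euclidean(a: Frame, b: Frame) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def mean_frame(unit: Unit) -> List[float]:
    return [sum(col) / len(unit) for col in zip(*unit)]


def intra_cost(unit: Unit) -> float:
    """Assumed intra-syllable distortion: mean change between adjacent frames."""
    if len(unit) < 2:
        return 0.0
    steps = [euclidean(a, b) for a, b in zip(unit, unit[1:])]
    return sum(steps) / len(steps)


def inter_cost(unit: Unit, others: List[Unit]) -> float:
    """Assumed inter-syllable distortion: mean distance between this
    candidate's average frame and those of the other candidates."""
    if not others:
        return 0.0
    centre = mean_frame(unit)
    return sum(euclidean(centre, mean_frame(o)) for o in others) / len(others)


def select_units(database: Dict[str, List[Unit]],
                 weight: float = 0.5) -> Dict[str, int]:
    """Return, per syllable, the index of the lowest-cost candidate.
    `weight` (hypothetical) balances the two assumed costs."""
    selected = {}
    for syllable, candidates in database.items():
        def total(i: int) -> float:
            others = candidates[:i] + candidates[i + 1:]
            return (weight * intra_cost(candidates[i])
                    + (1 - weight) * inter_cost(candidates[i], others))
        selected[syllable] = min(range(len(candidates)), key=total)
    return selected


# Made-up one-dimensional features for three recordings of one syllable.
toy_database = {"ma": [[[0.0], [0.0]], [[1.0], [1.0]], [[5.0], [5.0]]]}
print(select_units(toy_database))
```

With these placeholder costs the candidate closest to the others, and with the least internal variation, is selected; a real system would substitute the cost functions and spectral features actually defined in Chapter 3 of the dissertation.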
Cover
Contents
Chapter 1 Introduction
1.1 Motivation
1.2 Overview on TTS Conversion
1.3 Overview of the Dissertation
1.4 System Diagram Description
1.5 Organization of the Dissertation
Chapter 2 Text Analysis
2.1 ASCII to Big5
2.2 Digit Conversion
2.2.1 Reading Digit by Digit
2.2.2 Reading Digit by Number Expansion
2.3 Word Segmentation
2.4 Phonemic Transcription
2.5 Homographic Characters
2.6 Tone Sandhi
2.7 Long Sentence Segmentation
Chapter 3 Synthesis Unit Selection
3.1 Pitch Period Detection and Smoothing
3.2 Speech Unit Filtering
3.3 Spectral Feature Extraction
3.4 Unit Selection
3.5 Manual Examination
3.6 Experiments and Results
3.6.1 Syllable Occurrence
3.6.2 Unit Selection
3.7 Summary of Unit Selection
3.8 A Novel Two-Level Method for the Computation of the LSP Frequencies Using Decimation-in-Degree Algorithm
3.8.1 Introduction
3.8.2 The Line Spectrum Pair (LSP) Frequencies
3.8.3 The Two-Level Method
3.8.4 Experimental Environment
3.8.5 Performance of the Two-Level Method
3.9 Summary of the Two-Level Method
Chapter 4 Bayesian-Network-Based Generation of Prosodic Information
4.1 Introduction
4.2 System Description
4.3 Bayesian Network
4.4 Prosodic Information Generator
4.5 Prosody Modification Module
4.6 Performance Evaluation
4.7 Summary
Chapter 5 Template-Based Prosody Generation
5.1 Structure of the Word Prosody Template Tree
5.2 Generation of Word Prosody Templates
5.2.1 Sentence Intonation Module
5.2.2 Template Selection Module
5.3 Experiments and Results
5.3.1 Average Pitch Periods of the Five Tones
5.3.2 Pitch Vectors of the Five Tones
5.3.3 Relation Between Prosodic Features
5.3.4 Inside Test
5.4 Summary
Chapter 6 Speech Activated Telephony Email Reader (SATER) Based on Speaker Verification and TTS Conversion
6.1 Introduction
6.2 System Overview
6.3 Speaker Verification Subsystem Using HMM
6.4 Experiments and Results
6.5 Summary
Chapter 7 Conclusions
References
Author's Biography
Publication List