National Digital Library of Theses and Dissertations in Taiwan


Detailed Record

Author: 黃典煌
Author (English): Tien-Huang Huang
Title (Chinese): 隨讀隨聽電子書手持裝置於ARM920T嵌入式開發平台之設計與實現
Title (English): Design and Implementation of the LR-Book Handheld Device Based on ARM920T Embedded Platform
Advisor: 王駿發
Advisor (English): Jhing-Fa Wang
Degree: Master's
Institution: National Cheng Kung University (國立成功大學)
Department: Department of Electrical Engineering (graduate program)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Publication year: 2009
Graduation academic year: 97 (2008-09)
Language: English
Pages: 59
Keywords (Chinese): 平均意見得分 (mean opinion score), 文字辨識技術 (optical character recognition), 語義不可預測句子 (semantically unpredictable sentence), 語音合成系統 (speech synthesis system)
Keywords (English): mean opinion score (MOS), optical character recognition (OCR), text-to-speech (TTS), semantic unpredictable sentence (SUS)
Record statistics:
  • Cited: 0
  • Views: 226
  • Downloads: 13
  • Bookmarked: 2
Abstract (Chinese, translated):

In recent years, hand-held devices have become increasingly widespread, trending toward small size, low price, high computing power, and rich software functionality. Thanks to these technological advances, many applications that could not be realized on earlier hand-held devices are now feasible.

This thesis implements an "LR-Book" listen-while-you-read e-book device on a Samsung ARM920T processor (S3C2440A) running a Linux-based operating system. A user interface suited to elderly users is designed. Book content is first synthesized by a natural-sounding text-to-speech (TTS) system; the user can then transfer the synthesized multimedia speech data over a USB interface and store it on the LR-Book's storage media. Finally, the LR-Book's optical character recognition (OCR) identifies the region of the physical book currently being read and, together with the digitized content in memory, outputs the passage as speech, so that reading is accomplished by listening.

The speech synthesis system developed in this thesis adopts an HMM-based speech synthesizer (HTS, HMM-based Speech Synthesis System). In the semantically unpredictable sentence (SUS) dictation test, subjects achieved an average correct rate of 96.4%; in tests on short passages of various genres, the subjective naturalness mean opinion score (MOS) reached 3.6. The system can therefore synthesize fluent and intelligible speech. Moreover, the synthesis models occupy very little memory, which gives the system a clear advantage in portability and adaptability.
Abstract (English):

In recent years, hand-held devices have become more and more popular in our daily life. In addition to the trend toward low price and small size, these devices usually possess rich software functionality and high computing power. Owing to these technological advances, many applications that were infeasible on older hand-held devices can now be realized.

The purpose of this thesis is to propose a "Listenable and Readable Book (LR-Book) Device". The device is built around the Samsung S3C2440A, whose core is an ARM920T, and Linux is adopted as the operating system. First, the text content of a physical book is converted into digital speech by a user-friendly text-to-speech (TTS) system. The speech content can then be easily downloaded into the memory of the LR-Book through a USB interface. Using an optical character recognition (OCR) process, the LR-Book identifies the page number of the physical book currently being read and retrieves the corresponding digital speech content from memory. Finally, the LR-Book plays back that speech.
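The OCR-to-playback step described above amounts to a lookup from a recognized page number to a pre-synthesized speech file. A minimal sketch of that mapping is shown below; the storage path and the `page_NNN.wav` naming scheme are illustrative assumptions, not the thesis's actual file layout.

```python
def speech_file_for_page(page: int, root: str = "/mnt/storage/speech") -> str:
    """Map an OCR-recognized page number to the path of its pre-synthesized
    speech file on the device's storage.  The directory and naming scheme
    here are assumptions made for illustration only."""
    if page < 1:
        raise ValueError("page numbers are 1-based")
    # Zero-pad so files sort in page order on the storage medium.
    return f"{root}/page_{page:03d}.wav"

print(speech_file_for_page(42))  # -> /mnt/storage/speech/page_042.wav
```

The player process would then open the returned file and stream it to the audio codec.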

The proposed speech synthesis system is based on hidden Markov models and synthesizes smooth, intelligible speech. In the semantically unpredictable sentence (SUS) dictation test, the mean correct rate is 96.4%; in the naturalness test, the mean opinion score (MOS) is 3.6. The synthesis models occupy very little memory, so the system can serve many applications thanks to its flexibility and portability.
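The two evaluation figures above are simple averages over listener data. The sketch below shows how they are computed; the example ratings and item counts are illustrative, not the thesis's actual test data.

```python
def mean_opinion_score(ratings):
    """MOS: the average of listener ratings on the 1-5 naturalness scale."""
    if not ratings or any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("each rating must be on the 1-5 scale")
    return sum(ratings) / len(ratings)

def sus_correct_rate(num_correct, num_total):
    """Fraction of SUS dictation items transcribed correctly."""
    return num_correct / num_total

# Illustrative data: five listener ratings and a 250-item dictation test.
print(mean_opinion_score([4, 3, 4, 3, 4]))  # -> 3.6
print(sus_correct_rate(241, 250))           # -> 0.964
```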
Contents

Abstract (Chinese)
Abstract (English)
Acknowledgments
Contents
Figure Captions
Table Captions
Chapter 1  Introduction
1.1  Background and Motivation
1.2  Related Work
1.2.1  Overview of Speech Synthesis
1.2.2  General TTS Architecture
1.2.3  Speech Synthesis Methods
1.2.3.1  Concatenative Synthesis
1.2.3.2  LPC-Based Synthesis
1.2.3.3  HMM-Based Synthesis
1.3  Thesis Organization
Chapter 2  HMM-Based Mandarin Speech Synthesizer
2.1  HMM-Based Speech Synthesis System
2.2  Training Part of the System
2.2.1  Context-Dependent Modeling Techniques
2.2.2  Model Reduction
2.2.3  Pre-processor of Text Analysis
2.3  Synthesis Part of the System
2.3.1  Speech Parameter Generation
2.3.2  Mel Log Spectrum Approximation Filter
Chapter 3  Embedded System Design for the Text-to-Speech Synthesizer Based on ARM920T-S3C2440A
3.1  System Overview of S3C2440A
3.2  Hardware Architecture of the Proposed System
3.2.1  NAND Flash and SDRAM Controller
3.2.2  Audio Circuit
3.2.3  Camera Circuit
3.2.4  Regulation Circuit
3.2.5  UART Circuit
3.3  Software Architecture of the Proposed System
3.3.1  LR-Book System Processes
3.3.2  Related Work of Optical Character Recognition
3.3.3  Proposed OCR System
Chapter 4  Experiments and Implementation
4.1  Training Phase
4.2  Synthesis Phase
4.3  Establishment of the Corpus
4.4  Experimental Design
4.4.1  Testing Sentences
4.4.2  Testing Criterion
4.4.3  Testing Results
4.5  System Board Implementation
4.5.1  Design Process
4.5.2  Layout Demonstration
Chapter 5  Conclusions and Future Work
References
Appendix 1
Appendix 2
Appendix 3