National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

Detailed Record

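Author: Po-feng Wu (吳柏鋒)
Title: Voice Command for Google Map (聲控Google地圖)
Advisor: Chia-Ping Chen (陳嘉平)
Degree: Master's
Institution: National Sun Yat-sen University (國立中山大學)
Department: Department of Computer Science and Engineering (資訊工程學系研究所)
Field: Engineering
Discipline: Electrical Engineering and Computer Science
Document type: Academic thesis
Year of publication: 2012
Academic year of graduation: 100 (ROC calendar)
Language: Chinese
Number of pages: 53
Keywords (Chinese): 聲控, 解碼器, Google地圖
Keywords (English): voice command, decoder, Google Map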
Usage statistics:
  • Cited by: 3
  • Views: 658
  • Downloads: 51
  • Saved to reading lists: 0
In this thesis, we integrate voice command technology into Google Maps, so that some map operations normally performed with a mouse or keyboard can be carried out by voice instead. The main difference between our system and state-of-the-art real-time speech processing systems is that all speech processing runs on the client side. For the corpus, we recorded the names of 100 popular scenic spots in Taiwan, together with a set of map control commands, as training data. In our experiments, we trained the acoustic models with several different training procedures, designed the dictionary and language models, and evaluated the performance of the system. In actual use, the map center is moved step by step to the target location through three kinds of voice command: location, control, and coordinate. We measured the overall search time for several specific locations with different users; a search took 20.8 seconds on average, most of which was spent in the recording stage.
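The step-by-step movement of the map center described in the abstract can be illustrated with a small sketch. This is not the thesis implementation: the function `apply_command`, the `MapState` type, the command vocabulary, the pan step size, and the single location entry (with approximate coordinates) are all hypothetical, and the "coordinate" command is assumed here to carry an explicit latitude/longitude pair.

```python
from dataclasses import dataclass

# Hypothetical lookup table. The thesis corpus covers 100 scenic spots;
# this single entry and its approximate coordinates are illustrative only.
LOCATIONS = {"taipei 101": (25.0340, 121.5645)}

# Hypothetical pan commands as (d_lat, d_lng) multiples of the step size.
PAN = {"up": (1, 0), "down": (-1, 0), "left": (0, -1), "right": (0, 1)}


@dataclass(frozen=True)
class MapState:
    lat: float
    lng: float
    zoom: int = 12


def apply_command(state, kind, value, step=0.01):
    """Return the new map state after one recognized voice command."""
    if kind == "location":
        # Jump straight to a named place from the lookup table.
        lat, lng = LOCATIONS[value]
        return MapState(lat, lng, state.zoom)
    if kind == "control":
        # Zoom or pan relative to the current view.
        if value == "zoom in":
            return MapState(state.lat, state.lng, state.zoom + 1)
        if value == "zoom out":
            return MapState(state.lat, state.lng, state.zoom - 1)
        d_lat, d_lng = PAN[value]
        return MapState(state.lat + d_lat * step,
                        state.lng + d_lng * step,
                        state.zoom)
    if kind == "coordinate":
        # Assumed form: an explicit (latitude, longitude) pair.
        lat, lng = value
        return MapState(lat, lng, state.zoom)
    raise ValueError(f"unknown command kind: {kind!r}")


if __name__ == "__main__":
    state = MapState(22.6273, 120.3014)  # arbitrary starting view
    for kind, value in [("location", "taipei 101"),
                        ("control", "zoom in"),
                        ("control", "up")]:
        state = apply_command(state, kind, value)
        print(kind, value, "->", state)
```

Each command returns a new state instead of mutating the old one, which makes a sequence of recognized commands easy to replay or log when timing a search session.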
Acknowledgments
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Voice Command Systems
1.4 Google Maps Applications
1.5 The Relationship Between Server and Client in Web-Based Speech Recognition Systems
1.6 Thesis Organization
Chapter 2 Automatic Speech Recognition System Architecture
2.1 Hidden Markov Models and the HTK Toolkit
2.2 Model Unit Set
2.3 Speech Feature Extraction
2.4 Acoustic Models
2.5 Using HTK Commands for Recognition
Chapter 3 A Mandarin Voice Command System for Google Maps
3.1 Introduction to Chinese and the Zhuyin Phonetic Symbol System
3.2 Decoder
3.3 System Architecture
3.4 Analysis of Recording Time Information
3.5 Silverlight Out-of-Browser Support
Chapter 4 Corpus and Experiments
4.1 Speech Corpus Collection
4.2 Model Training and Grammar Configuration
4.3 Experimental Evaluation
4.4 Map Search Scenarios
Chapter 5 Conclusion and Future Work
5.1 Conclusion
5.2 Future Work
Appendix A