National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

Detailed Record
Author: Chang Fang-Chen (昌芳騁)
Title: Lipreading System (唇語辨識系統)
Advisor: Liou Cheng-Yuan (劉長遠)
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Document type: Academic thesis
Year of publication: 1999
Academic year of graduation: 87 (1998–99)
Language: Chinese
Pages: 32
Keywords (Chinese): 唇語辨識 (lipreading)
Keywords (English): Lipreading; Lip Extraction; Speech Recognition; Point Distribution Model; Principal Component Analysis; Time-Delay Neural Networks
This thesis constructs two lip-related models. The first is a lip contour model, built with a Point Distribution Model (PDM). The second is a lip color model, built with a method similar to the PDM: principal component analysis (PCA) is applied to the colors around the lips. We use an energy function to measure how well a candidate lip region in an image matches the lips described by the lip color model. Searching for the lips in an image can therefore be cast as minimizing this energy function, and the Simplex Method is used to perform the minimization. For lipreading, Time-Delay Neural Networks are used to perform the recognition.
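The pipeline described in the abstract can be sketched in miniature: PCA over landmark vectors (the Point Distribution Model idea) followed by simplex (Nelder-Mead) minimization of an energy over the model's mode weights. This is an illustration only, not the thesis implementation: the training shapes are synthetic, the energy is a stand-in for the image-based lip-color energy, and all names (`reconstruct`, `energy`, etc.) are invented for the example.

```python
# Sketch: build a PCA shape model from aligned lip landmarks, then fit it
# by minimizing an energy with the simplex method. Data is synthetic.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Training set: 50 aligned lip shapes, each 20 landmark points -> 40-D vectors.
n_points = 20
t = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
base = np.stack([np.cos(t), 0.5 * np.sin(t)], axis=1).ravel()  # ellipse-like lip
shapes = base + 0.05 * rng.standard_normal((50, 2 * n_points))

# PCA on the shape vectors (the Point Distribution Model idea).
mean_shape = shapes.mean(axis=0)
centered = shapes - mean_shape
cov = centered.T @ centered / (len(shapes) - 1)
eigval, eigvec = np.linalg.eigh(cov)          # ascending order
order = np.argsort(eigval)[::-1]              # reorder descending
P = eigvec[:, order][:, :4]                   # keep 4 principal modes

def reconstruct(b):
    """Shape generated by mode-weight vector b."""
    return mean_shape + P @ b

# Toy energy: squared distance between the model shape and an "observed"
# shape (a stand-in for the image-based lip-color energy in the thesis).
observed = shapes[0]
def energy(b):
    return float(np.sum((reconstruct(b) - observed) ** 2))

# Minimize the energy with the simplex (Nelder-Mead) method.
res = minimize(energy, x0=np.zeros(4), method="Nelder-Mead")
print(res.fun <= energy(np.zeros(4)))  # fitting should reduce the energy
```

In the thesis the energy is evaluated against image colors rather than a known target shape, but the structure of the search is the same: the simplex method explores the low-dimensional space of mode weights instead of raw pixel space.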

Abstract (Chinese)
Abstract (English)
1. Introduction
1.1 The McGurk Effect
1.2 Related Work on Lipreading
1.3 System Overview
1.4 Thesis Outline
2. Background
2.1 Feature Extraction Methods
2.2 Types of Classifiers
3. Constructing the Lip Models
3.1 Introduction
3.2 Principal Component Analysis
3.3 Point Distribution Models
3.4 Lip Shape Model
3.4.1 Manually Labeling Lip-Shape Training Examples
3.4.2 Aligning the Lip Shapes in the Training Set
3.4.3 Analyzing the Training Examples with PCA
3.4.4 The Lip Shape Model
3.5 Lip Color Model
3.5.1 Sampling Lip Colors
3.5.2 Analyzing Lip Colors with PCA
3.5.3 The Lip Color Model
4. Lip Localization and Lip-Contour Tracking
4.1 Introduction
4.2 Energy Functions
4.2.1 Energy Function One
4.2.2 Energy Function Two
4.3 Tracking Lip Contours in Images
5. Lipreading
5.1 Time-Delay Neural Network Architecture
5.2 Obtaining the Network Inputs
6. Conclusion and Discussion
7. References
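Chapter 5 performs recognition with Time-Delay Neural Networks. A minimal sketch of the underlying time-delay idea (each hidden unit sees a short window of consecutive feature frames, with the same weights applied at every time step, cf. [Waibel89]) follows; the sizes and names are illustrative, not the thesis network, and the weights are random rather than trained.

```python
# Sketch of a single time-delay layer: a 1-D convolution over time,
# applying shared weights to every window of `delay` consecutive frames.
import numpy as np

rng = np.random.default_rng(1)

n_frames, n_feats = 30, 8   # e.g. 30 lip-shape feature vectors per utterance
delay = 3                   # each hidden unit sees 3 consecutive frames
n_hidden = 5

x = rng.standard_normal((n_frames, n_feats))          # input feature sequence
W = rng.standard_normal((n_hidden, delay * n_feats))  # shared time-delay weights
b = np.zeros(n_hidden)

def tdnn_layer(x, W, b, delay):
    """Apply the same weights to every window of `delay` frames."""
    windows = np.stack([x[i:i + delay].ravel()
                        for i in range(len(x) - delay + 1)])
    return np.tanh(windows @ W.T + b)   # shape: (n_frames - delay + 1, n_hidden)

h = tdnn_layer(x, W, b, delay)
print(h.shape)  # (28, 5): one hidden vector per time window
```

Stacking such layers and summing the final activations over time yields a classifier that is tolerant of small temporal shifts in the lip-feature sequence, which is the property that motivates TDNNs for lipreading.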

1. [Basu98] S. Basu, N. Oliver, A. Pentland, "3D modeling and tracking of human lip motions", Sixth International Conference on Computer Vision, 1998, pp. 337-343.
2. [Bregler93] Christoph Bregler, Hermann Hild, Stefan Manke, and Alex Waibel, “Improving connected letter recognition by Lipreading”, Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, Minneapolis, 1993, Vol. 1, pp.557-560.
3. [Bruce92] Vicki Bruce, “What the human face tells the human mind: Some challenges for the robot-human interface”, IEEE International Workshop on Robot and Human Communication, 1992, pp. 44-51.
4. [Carraro89] A. Carraro, E. Chilton, H. McGurk, “A Telephonic Lipreading Device for the Hearing Impaired”, IEE Colloquium on Biomedical Applications of Digital Signal Processing, 1989, pp. 1-8
5. [Chen95a] Tsuhan Chen, Yao Wang, H. P. Graf, C. Swain, "A new frame interpolation scheme for talking head sequences", Proc. International Conference on Image Processing, 1995, vol. 2, pp. 591-594.
6. [Chen95b] Tsuhan Chen, H. P. Graf, B. Haskell, E. Petajan, Yao Wang, H. Chen, Wu Chou, "Speech-assisted lip synchronization in audio-visual communications", Proc. International Conference on Image Processing, 1995, vol. 2, pp. 579-582.
7. [Chen97] Tsuhan Chen, "Recent development in multimedia signal processing: a review on audio-visual interaction", 13th International Conference on Digital Signal Processing Proceedings (DSP 97), vol. 1, pp. 175-178.
8. [Chen98] Tsuhan Chen, R. R. Rao, "Audio-visual integration in multimodal communication", Proceedings of the IEEE, vol. 86, no. 5, May 1998, pp. 837-852.
9. [Chiou97] Greg I. Chiou and Jenq-Neng Hwang, “Lipreading from Color Video”, IEEE Transactions on Image Processing, Vol. 6, No.8, August 1997, pp.1192-1195
10. [Cootes95] T.F. Cootes, C.J. Taylor, D.H. Cooper, J. Graham, “Active shape models — Their training and application”, Computer Vision and Image Understanding 61, 1995, pp. 38-59.
11. [Edward96] T. Edward, Jr. Auer, E. Bernstein Lynne, “Lipreading Supplemented by Voice Fundamental Frequency: To What Extent Does The Addition of Voicing Increase Lexical Uniqueness for the Lipreader?”, Proceedings of Fourth International Conference on Spoken Language, vol.1, 1996, pp.86-89.
12. [Essa94] Irfan A. Essa, Trevor Darrell and Alex Pentland, “Tracking Facial Motion”, Proceedings of the IEEE Workshop on Nonrigid and Articulate Motion, Austin, Texas, November 1994
13. [Goldschen95] Alan J. Goldschen, Oscar N. Garcia, Eric Petajan, “Continuous Optical Automatic Speech Recognition by Lipreading”, IEEE Signals, Systems and Computers, vol. 1, 1994, pp.572-577.
14. [Grant91] P. M. Grant, “Speech recognition techniques”, Electronics & Communication Engineering Journal, Feb. 1991, pp. 37-48.
15. [Green96] K. P. Green, "Studies of the McGurk effect: implications for theories of speech perception", Proc. Fourth International Conference on Spoken Language, 1996, vol. 3, pp. 1652-1655.
16. [Hampshire90] John B. Hampshire, II, and Alexander H. Waibel, “A Novel Objective Function for Improved Phoneme Recognition Using Time-Delay Neural Networks”, IEEE Transactions on Neural Networks, Vol. 1, No. 2, June 1990.
17. [Huang98] Chung-Lin Huang, Wen-Yi Huang, “Sign language recognition using model-based tracking and a 3D Hopfield neural network”, Machine Vision and Applications (1998), 10, pp. 292-307
18. [Jolliffe86] I. T. Jolliffe, “Principal Component Analysis”, Springer-Verlag, 1986.
19. [Juang91] B. H. Juang, L. R. Rabiner, “Hidden Markov Models for Speech Recognition”, American Statistical Association and the American Society for Quality Control, TECHNOMETRICS, August 1991, Vol.33, No. 3, pp.251-272.
20. [Lavagetto97] Fabio Lavagetto, “Time-Delay Neural Networks for Estimating Lip Movements From Speech Analysis: A Useful Tool in Audio-Video Synchronization”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 7, No. 5, Oct. 1997, pp. 786-800.
21. [Luettin] Juergen Luettin, Neil A. Thacker and Steve W. Beet, “Visual Speech Recognition Using Active Shape Models and Hidden Markov Models”,
22. [Luettin96] Juergen Luettin, Neil A. Thacker, and Steve W. Beet, “Locating and tracking facial speech features”, Proceedings of the International Conference on Pattern Recognition (ICPR'96), 1996
23. [Luettin97] Juergen Luettin, “Towards Speaker Independent Continuous Speechreading”, Proceedings of the European Conference on Speech Communication and Technology (EUROSPEECH'97), 1997
24. [Luettin98] Juergen Luettin, Neil A. Thacker, “Speechreading using Probabilistic Models”, Computer Vision and Image Understanding, Vol 65, No. 2, February, pp.163-178, 1998
25. [Mak94] M. W. Mak, W. G. Allen, “A lip-tracking system based on morphological processing and block matching techniques”, Signal Processing: Image Communication 6 (1994), pp. 335-348.
26. [Mase91] K. Mase and A. Pentland, “Automatic lipreading by optical-flow analysis,” Syst. Comput. Jpn., vol 22, pp.67-76, 1991
27. [McGurk76] H. McGurk, J. MacDonald, “Hearing lips and seeing voices”, Nature, 264, 1976, pp. 746-748.
28. [Moghaddam95] Baback Moghaddam, Alex Pentland, “Probabilistic Visual Learning for Object Detection”, IEEE Proceedings, International Conference on Computer Vision, 1995, pp. 786-793.
29. [Murase96] Hiroshi Murase, Rie Sakai, “Moving object recognition in eigenspace representation: gait analysis and lip reading”, Pattern Recognition Letters 17 (1996), pp.155-162.
30. [Petajan96] E. Petajan, H. P. Graf, "Robust face feature analysis for automatic speechreading and character animation", Proceedings of the Second International Conference on Automatic Face and Gesture Recognition, 1996, pp. 357-362.
31. [Rabi97] Gihad Rabi, Si Wei Lu, “Energy Minimization for Extracting Mouth Curves in a Facial Image”, IEEE Proceedings on Intelligent Information Systems 1997 (IIS '97), pp. 381-385.
32. [Rabiner89] Lawrence R. Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition”, Proceedings of the IEEE, vol. 77, no. 2, Feb. 1989, pp. 257-286.
33. [Rao94] Ram R. Rao, Russel M. Mersereau, “Lip Modeling for Visual Speech Recognition”, IEEE Signals, Systems and Computers, 1994. 1994 Conference Record of the Twenty-Eighth Asilomar Conference, vol. 1, 1994, pp. 587-590.
34. [Silsbee96] Peter L. Silsbee, Alan C. Bovik, “Computer Lipreading for Improved Accuracy in Automatic Speech Recognition”, IEEE Transactions on Speech and Audio Processing, Vol. 4, No. 5, Sep. 1996.
35. [Sozou95] P.D. Sozou, T.F. Cootes, C.J. Taylor, E.C. Di Mauro, “A non-linear generalisation of PDMs using polynomial regression”, Image and Vision Computing 13 (5), 1995, pp. 451-457.
36. [Terzopoulos93] Demetri Terzopoulos, Keith Waters, “Analysis and Synthesis of Facial Image Sequences Using Physical and Anatomical Models”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 15, No. 6, June 1993, pp. 569-579
37. [Waibel89] Alexander Waibel, Toshiyuki Hanazawa, Geoffrey Hinton, Kiyohiro Shikano, and Kevin J. Lang, “Phoneme Recognition Using Time-Delay Neural Networks”, IEEE Transactions on Acoustics, Speech, and Signal Processing. Vol. 37, No. 3, March 1989.
38. [Yang96] J. Yang and A. Waibel, ``A real-time face tracker," Proceedings of WACV'96 , pp. 142-147 (Sarasota, Florida, USA)
39. [Yoshikawa96] H. Yoshikawa, J. Yokosato, S. Tanaka, "Synthesizing human motion in a CG authoring environment for nonprofessionals", Proceedings of the Third IEEE International Conference on Multimedia Computing and Systems, 1996, pp. 40-43.
40. [Yu97] Keren Yu, Xiaoyi Jiang, Horst Bunke, “Lipreading: A classifier combination approach”, Pattern Recognition Letters 18 (1997), pp. 1421-1426.
