
National Digital Library of Theses and Dissertations in Taiwan


Detailed Record

Author: 林郁政
Thesis title (Chinese): 利用二維影像分析及聲音同步技術作演講者臉部動畫之建構
Thesis title (English): A Study on Virtual Talking Head Animation by 2D Image Analysis and Voice Synchronization Techniques
Advisor: 蔡文祥 (Wen-Hsiang Tsai)
Degree: Master's
Institution: National Chiao Tung University
Department: Computer and Information Science
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Publication year: 2002
Graduation academic year: 90 (ROC calendar, 2001-2002)
Language: English
Pages: 66
Keywords (Chinese): 語音素、音節
Keywords (English): phoneme, syllable
Usage statistics:
  • Cited by: 2
  • Views: 232
  • Rating:
  • Downloads: 0
  • Bookmarked: 0

In recent years, animated talking heads have been playing an increasingly important role in many applications of computer interfacing. In this study, an approach to virtual talking head animation by 2D image analysis and voice synchronization techniques is proposed. Instead of adopting the conventional complicated 3D models to construct a virtual talking head, we use 2D image sequences to simplify the animation process. The proposed method includes two phases: a learning phase and an animation phase. In the learning phase, a motion capture system is used to capture speaking face images with facial expressions, together with the accompanying sound. Each speaking face image is then segmented into a base face and a set of facial parts, and the alpha-blending technique is employed to smooth the seams between the base face and the facial parts when they are integrated to form new face images with expressions for use in animation. To reduce the size of the viseme database, a method is proposed for classifying the 411 base syllables in Mandarin into 120 categories. To add facial expressions to the animation, the gamma distribution and the uniform distribution are used to model the timing behaviors of eye blinks and eyebrow movements, respectively. Finally, a speech analyzer is used to obtain the duration of each spoken syllable, and a method is proposed to synchronize the speech with the talking head animation according to this timing information. Experimental results show the feasibility and practicability of the proposed methods.
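
The alpha-blending step in the abstract can be illustrated with a short sketch. The following is a minimal Python/NumPy illustration, not the thesis's actual implementation; in particular, the linear feathered mask used to soften the seam is an assumed construction.

# Minimal sketch of alpha-blending a segmented facial part (e.g. a mouth
# image) onto a base face, as described in the abstract. The feathered
# mask is an illustrative assumption; the thesis's mask may differ.
import numpy as np

def feathered_mask(h: int, w: int, border: int = 8) -> np.ndarray:
    """Alpha mask that ramps linearly from 0 at the part's border to 1
    in its interior, so the seam against the base face is smoothed."""
    y = np.minimum(np.arange(h), np.arange(h)[::-1])
    x = np.minimum(np.arange(w), np.arange(w)[::-1])
    d = np.minimum.outer(y, x)          # distance to the nearest edge
    return np.clip(d / border, 0.0, 1.0)

def alpha_blend(base: np.ndarray, part: np.ndarray, alpha: np.ndarray,
                top: int, left: int) -> np.ndarray:
    """Composite `part` (h x w x 3) onto `base` (H x W x 3) at (top, left)
    with per-pixel alpha in [0, 1]; 1 means fully use the part pixel."""
    out = base.astype(np.float64)       # float copy of the base face
    h, w = part.shape[:2]
    region = out[top:top + h, left:left + w]
    a = alpha[..., None]                # broadcast over RGB channels
    region[:] = a * part.astype(np.float64) + (1.0 - a) * region
    return out.clip(0, 255).astype(np.uint8)

# Example: paste a mouth image onto a base face with a feathered seam.
base = np.zeros((480, 640, 3), np.uint8)
mouth = np.full((60, 90, 3), 200, np.uint8)
frame = alpha_blend(base, mouth, feathered_mask(60, 90), top=300, left=275)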
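
The abstract models the time between eye blinks with a gamma distribution and the time between eyebrow movements with a uniform distribution. A sketch of how such onset schedules might be generated follows; the shape, scale, and interval bounds are placeholder assumptions, not the values fitted from the TV-news data in the thesis.

# Sketch of drawing eye-blink onsets from a gamma distribution and
# eyebrow-movement onsets from a uniform distribution. All parameter
# values here are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(0)

def blink_times(duration_s: float, shape: float = 2.0,
                scale: float = 2.0) -> list[float]:
    """Blink onsets: successive inter-blink gaps ~ Gamma(shape, scale)."""
    t, times = 0.0, []
    while True:
        t += rng.gamma(shape, scale)
        if t >= duration_s:
            return times
        times.append(t)

def eyebrow_times(duration_s: float, lo: float = 5.0,
                  hi: float = 15.0) -> list[float]:
    """Eyebrow-raise onsets: gaps drawn uniformly from [lo, hi] seconds."""
    t, times = 0.0, []
    while True:
        t += rng.uniform(lo, hi)
        if t >= duration_s:
            return times
        times.append(t)

# Example: expression schedules for a 30-second animation clip.
print(blink_times(30.0))
print(eyebrow_times(30.0))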
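
Finally, the synchronization step uses the per-syllable durations reported by the speech analyzer to decide which viseme to display on each animation frame. The sketch below assumes a fixed frame rate and a lookup table from Mandarin syllables to the 120 viseme categories; the table fragment, frame rate, and data layout are all hypothetical.

# Sketch of speech/animation synchronization: hold each syllable's
# viseme on screen until the syllable's end time, so mouth shapes stay
# aligned with the audio. Table fragment and frame rate are assumed.
from dataclasses import dataclass

FPS = 30  # assumed animation frame rate

@dataclass
class Segment:
    syllable: str        # romanized Mandarin syllable
    duration_s: float    # duration reported by the speech analyzer

# Hypothetical fragment of the 411-syllable -> 120-viseme mapping.
SYLLABLE_TO_VISEME = {"ni": 17, "hao": 42, "ma": 5}

def schedule(segments: list[Segment]) -> list[int]:
    """Viseme id to display on each frame, keeping cumulative frame
    counts aligned with cumulative syllable end times."""
    frames: list[int] = []
    t = 0.0
    for seg in segments:
        t += seg.duration_s
        n = round(t * FPS) - len(frames)  # frames until this syllable ends
        frames.extend([SYLLABLE_TO_VISEME[seg.syllable]] * n)
    return frames

# Example: three timed syllables produce a 24-frame viseme track.
print(schedule([Segment("ni", 0.2), Segment("hao", 0.35), Segment("ma", 0.25)]))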

Chapter 1 Introduction
1.1 Motivation
1.2 Survey of Related Studies
1.3 Overview of Proposed Approach
1.3.1 Definitions
1.3.2 Brief Description of Proposed System
1.3.3 System Configuration
1.4 Contributions
1.5 Thesis Organization
Chapter 2 Construction of Virtual Faces
2.1 Introduction
2.2 Facial Part Segmentation
2.2.1 Head Location by Eye Corners
2.2.2 Mouth and Eyes Location by Relative Positions
2.3 Image Illumination and Color Calibration
2.3.1 Review of YUV Color Model
2.3.2 Proposed Calibration Method
2.4 Integration of Base Face and Facial Parts
2.4.1 Composition of Virtual Face
2.4.2 Smoothing of Seams
2.5 Experimental Results
Chapter 3 Viseme Classification and Database Construction
3.1 Introduction
3.2 Clustering of Visemes of Mandarin Initials by Articulation
3.3 Clustering of Visemes of Mandarin Finals by Mouth Shapes
3.4 Mandarin Viseme Construction
3.4.1 Syllable Extraction
3.4.2 Viseme Extraction
3.5 Construction of Viseme Database
3.6 Discussions
Chapter 4 Animation of Facial Expressions
4.1 Introduction
4.2 Analysis of Facial Expression Data from TV News
4.3 Simulation of Eye Blinks by Gamma Distribution
4.3.1 Review of Gamma Distribution
4.3.2 Simulation of Eye Blinks
4.4 Simulation of Eyebrow Movements by Uniform Distribution
4.5 Procedure of Applying Gamma Distribution
4.6 Discussions
Chapter 5 Animation and Speech Synchronization
5.1 Introduction
5.2 Recognition of Characters in Mandarin Speech
5.3 Articulation Smoothing of Speech Animation
5.4 Speech Synchronization with Time Separation
5.5 Experimental Results
Chapter 6 Experimental Results and Discussions
6.1 Experimental Results
6.2 Discussions
Chapter 7 Conclusions and Suggestions for Future Work
7.1 Conclusions
7.2 Suggestions for Future Work
References

