
National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

Detailed Record

Author: 張耀仁 (Yao-Jen Chang)
Title: 應用於虛擬視訊會議之臉部表情分析與合成
Title (English): Analysis and Synthesis of Facial Expressions for Virtual Conferencing Systems
Advisor: 陳永昌 (Yung-Chang Chen)
Degree: Ph.D.
Institution: National Tsing Hua University (國立清華大學)
Department: Department of Electrical Engineering (電機工程學系)
Discipline: Engineering
Academic field: Electrical and computer engineering
Document type: Academic thesis
Year of publication: 2002
Graduation academic year: 90 (ROC calendar, i.e., 2001-2002)
Language: English
Pages: 139
Keywords (Chinese): 臉部表情分析、臉部表情合成、虛擬視訊會議、音訊視訊整合、模式編碼、合成與自然影像混合編碼
Keywords (English): Facial Expression Analysis, Facial Expression Synthesis, Virtual Conferencing System, Audio-Visual Integration, Model-based Coding, SNHC
Usage statistics:
  • Cited by: 0
  • Views: 208
  • Ratings: (none)
  • Downloads: 0
  • Saved to reading lists: 1
The rich presentation made possible by multimedia systems has driven the widespread adoption of today's personal computer hardware and software. With the rapid growth of the Internet, multimedia information now reaches everywhere through the World Wide Web. For users, however, viewing multimedia content on the Internet is only part of the appeal; they are even more interested in using the network for point-to-point and even multi-user communication. One way to achieve this is to make users feel as if they are "present" together in a shared virtual cyberspace.
In response to the broad range of multimedia applications, international standardization bodies have been actively developing related standards. The ISO MPEG-4 committee finalized the MPEG-4 standard at the end of 1999. Besides more efficient algorithms for audio and video coding and transmission, one of its key features is the hybrid coding of natural and synthetic content, covering the combination of 2D images with 3D objects, including human body models. A synthetic human placed in a network-based virtual world is called a virtual human, or avatar; it can serve as the user's representative and opens up multimedia communication applications such as virtual conferencing and distance learning over the network.
In this thesis, we develop algorithms for facial expression analysis and synthesis, with the goal of building a virtual conferencing system that provides a more realistic presentation at a lower bandwidth requirement than conventional video conferencing. When multiple users meet over the network, they should experience an immersive, face-to-face feeling rather than the split-window, video-wall style presentation of today's video conferencing systems.
In previous work on facial expression analysis and synthesis for virtual conferencing, the overall system is typically divided into subsystems, such as face model construction, head pose estimation, facial expression analysis, facial expression synthesis, audio-visual synchronization, and audio-visual coding, each studied separately and then assembled into a virtual conferencing system. When a subsystem is studied in isolation, however, the other subsystems are usually assumed to be flawless, or the problem is simplified by ignoring them altogether, which makes the eventual integration considerably difficult. In this thesis, we take overall system integration into account from the outset and study several subsystems jointly, combining cross-disciplinary research on computer vision and computer graphics, 2D video coding and 3D model-based coding, and facial expression analysis and speech analysis, in order to provide facial expression analysis and synthesis algorithms better suited to multimedia virtual conferencing systems.
First, by combining computer vision and computer graphics techniques, we develop an analysis-by-synthesis approach to head model construction, head pose estimation, and facial expression analysis. The user's head model can be built from head motion images captured by a single uncalibrated camera, with accuracy comparable to that obtained from multiple calibrated cameras; the proposed head pose estimation algorithm remains accurate under uneven ambient lighting; and facial expression analysis can be performed under arbitrary head poses. Second, by combining 2D video coding with 3D model-based coding, the proposed facial expression synthesis framework not only delivers better visual quality than conventional 2D video coding but also preserves the low bandwidth requirement and arbitrary-viewpoint rendering of 3D model-based coding. Third, by combining speech analysis with expression analysis, we propose the concept of speech-assisted facial expression analysis and synthesis, which reduces the complexity of the computer vision analysis and provides an even lower-bandwidth algorithm for facial expression synthesis. Finally, we also implement system prototypes, including a point-to-point virtual videophone and a multi-user virtual conferencing system, to evaluate the practicality of the developed algorithms.
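
As a concrete illustration of the analysis-by-synthesis strategy described above, the sketch below renders a textured 3D head model at a candidate pose, measures the photometric error against the captured frame, and searches the six pose parameters to reduce that error. This is a minimal, hypothetical sketch rather than the thesis implementation: render_head stands in for whatever graphics pipeline produces the synthetic view and its foreground mask, and the greedy coordinate search is only one of many possible optimizers.

# Minimal analysis-by-synthesis pose search (illustrative sketch, not the thesis code).
# Assumption: render_head(model, pose) is a hypothetical renderer returning the
# synthetic grayscale head image and a binary foreground mask (same shape as the
# frame) for a 6-DOF pose (3 rotation angles, 3 translations).

import numpy as np

def photometric_error(frame, rendered, mask):
    """Sum of squared pixel differences over the rendered head region."""
    diff = (frame.astype(np.float64) - rendered.astype(np.float64)) * mask
    return float(np.sum(diff ** 2))

def estimate_pose(frame, model, render_head, pose0, step=0.02, iters=20):
    """Greedy coordinate search over the 6 pose parameters."""
    pose = np.array(pose0, dtype=np.float64)
    rendered, mask = render_head(model, pose)
    best = photometric_error(frame, rendered, mask)
    for _ in range(iters):
        improved = False
        for i in range(pose.size):
            for delta in (step, -step):
                cand = pose.copy()
                cand[i] += delta
                rendered, mask = render_head(model, cand)
                err = photometric_error(frame, rendered, mask)
                if err < best:
                    best, pose, improved = err, cand, True
        if not improved:
            step *= 0.5  # no parameter improved: refine the search step
    return pose, best

In practice such a search would be seeded with the pose estimated for the previous frame, and the same synthesize-and-compare loop can also drive model adaptation and expression analysis.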
Analysis and synthesis of facial expressions are two core technologies in the construction of a virtual conferencing system for face-to-face communication. In this thesis, we explore the possible integration of research from different fields, namely computer vision and computer graphics, 2D video coding and 3D model-based coding, and expression analysis and speech analysis, to provide better solutions for facial expression analysis and synthesis in virtual conferencing systems. With the integration of computer vision and computer graphics, head modeling, head pose estimation, and facial expression analysis can be performed in an analysis-by-synthesis manner. Thus, head modeling from monocular image sequences can be achieved with a more complete feature set composed of salient facial feature points and head profiles; head pose estimation can be performed with high accuracy under various lighting conditions; and expression analysis can operate under different head poses with the assistance of a user-customized facial model. With the integration of 2D video coding and 3D model-based coding, expression synthesis can achieve better visual quality while the merits of 3D model-based coding, namely coding efficiency and 3D rendering from arbitrary viewpoints, are well preserved for use in a multiuser virtual conferencing system. Finally, the concept of integrating expression analysis and speech analysis is presented, which leads to a lower-complexity solution for expression analysis and a much lower-bandwidth solution for expressive expression synthesis. Prototypes of virtual conferencing applications for point-to-point and multipoint audio-visual communications are also realized to verify the usefulness of the proposed methods.
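
The speech-assisted analysis and synthesis mentioned at the end of the abstract (developed with Gaussian mixture models in Chapter 5) hinges on converting audio features into facial animation parameters. The sketch below is a hedged illustration rather than the thesis code: it fits a GMM on joint audio-visual feature vectors and then predicts the visual part from the audio part using the mixture's conditional expectation. The feature choices (e.g., cepstral coefficients paired with MPEG-4 FAP values) are assumptions made for the example.

# Illustrative GMM-based audio-to-visual conversion (a sketch under assumed features,
# not the thesis implementation). Training uses joint vectors [audio | visual];
# prediction computes E[visual | audio] from the fitted mixture.

import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def train_joint_gmm(audio_feats, visual_feats, n_components=8):
    """Fit a full-covariance GMM on concatenated [audio | visual] training vectors."""
    joint = np.hstack([audio_feats, visual_feats])
    return GaussianMixture(n_components=n_components, covariance_type="full").fit(joint)

def audio_to_visual(gmm, a, dim_a):
    """Estimate visual parameters for one audio feature vector a (GMM regression)."""
    post, cond = [], []
    for w, mu, cov in zip(gmm.weights_, gmm.means_, gmm.covariances_):
        mu_a, mu_v = mu[:dim_a], mu[dim_a:]
        S_aa, S_va = cov[:dim_a, :dim_a], cov[dim_a:, :dim_a]
        # responsibility of this component given the observed audio vector
        post.append(w * multivariate_normal.pdf(a, mean=mu_a, cov=S_aa))
        # conditional mean of the visual part for this component
        cond.append(mu_v + S_va @ np.linalg.solve(S_aa, a - mu_a))
    post = np.asarray(post) / np.sum(post)
    return np.sum(post[:, None] * np.asarray(cond), axis=0)

Running the conversion once per audio frame yields a trajectory of facial animation parameters that can drive the synthesized face; transmitting only those parameters, rather than pixels, is what keeps the bandwidth requirement low.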
Chapter 1 Introduction
1.1 From Video Conferencing to Virtual Presence
1.2 Motivation
1.3 Framework
1.4 Contributions
1.5 Thesis Organization
Chapter 2 Background and Related Works
2.1 Head Modeling
2.2 Head Pose Estimation
2.3 Facial Expression Analysis and Synthesis
2.4 Speech-driven Facial Animation
2.5 Summary
Chapter 3 Head Modeling and Head Pose Estimation
3.1 Introduction
3.2 Initialization
3.2.1 Feature Point Extraction
3.2.2 Initial Model Estimation
3.2.3 Texture Mapping
3.3 Tracking
3.3.1 Head Pose Tracking
3.3.2 Feature Point Tracking
3.3.3 Profile Extraction
3.4 Adaptation
3.4.1 Model Adaptation
3.4.2 Depth Displacement Calculation
3.4.3 Depth Displacement Interpolation
3.4.4 Texture Update
3.5 Experiments
3.5.1 Synthetic Sequences Generation
3.5.2 Error Measurement
3.5.3 Experimental Results on Synthetic Sequences
3.5.4 Experimental Results on Real Sequences
3.6 Summary
Chapter 4 Facial Expression Analysis and Synthesis for Creating Video Realistic Avatars
4.1 Introduction
4.2 Coding Architecture
4.3 Model-Assisted Facial Expression Analysis
4.3.1 Feature Point Tracking Using Stabilized View
4.3.2 Direct 3D Facial Feature Tracking
4.3.3 Facial Expression Interpretation
4.4 Facial Expression Synthesis and Coding
4.5 Experiments
4.5.1 Test Sequences Generation
4.5.2 Error Measurement
4.5.3 Experimental Results on Facial Feature Tracking
4.5.4 Experimental Results on Coding and Visual Synthesis
4.6 Summary
Chapter 5 Speech-Assisted Facial Expression Analysis and Synthesis
5.1 Introduction
5.2 The Gaussian Mixture Model
5.2.1 The training of GMM
5.2.2 The estimation with GMM
5.2.3 The adaptation of GMM
5.3 Speech-assisted Facial Expression Analysis
5.4 Speech-assisted Facial Expression Synthesis
5.5 Experiments
5.5.1 The audio-to-visual conversion
5.5.2 The FAP-to-texture conversion
5.6 Summary
Chapter 6 Virtual Conferencing Systems and Related Applications
6.1 Virtual Talk
6.2 Virtual Meeting
6.3 More Potential Applications
Chapter 7 Conclusions and Future Research
7.1 Summary and Conclusions
7.2 Suggestions for Future Research
References