[1] G.A. Abrantes and F. Pereira, "MPEG-4 facial animation technology: Survey, implementation, and results," IEEE Trans. Circuits and Systems for Video Technology, Vol. 9, No. 2, pp. 290-305, March 1999.
[2] S.M. Ahadi and P.C. Woodland, "Combined Bayesian and predictive techniques for rapid speaker adaptation of continuous density hidden Markov models," Computer Speech and Language, Vol. 11, pp. 187-206, 1997.
[3] K. Aizawa, H. Harashima, and T. Saito, "Model-based analysis synthesis image coding (MBASIC) system for a person's face," Signal Processing: Image Communication, Vol. 1, No. 2, pp. 587-590, October 1989.
[4] A.C. Andrés del Valle and J. Ostermann, "3D talking head customization by adapting a generic model to one uncalibrated picture," Proc. 2001 IEEE Internat. Symp. on Circuits and Systems, Vol. 2, pp. 325-328, Sydney, Australia, May 6-9, 2001.
[5] Annanova Ltd., web page: http://www.annanova.com.
[6] The Flock of Birds, Ascension Technology Corp., P.O. Box 527, Burlington, VT 05402, USA, web page: http://www.ascension-tech.com.
[7] G. Bozdaği, A.M. Tekalp, and L. Onural, "3-D motion estimation and wireframe adaptation including photometric effects for model-based coding of facial image sequences," IEEE Trans. Circuits and Systems for Video Technology, Vol. 4, No. 3, pp. 246-256, June 1994.
[8] R.A. Brooks, "A robust layered control system for a mobile robot," IEEE Journal of Robotics and Automation, Vol. 2, No. 1, pp. 14-23, March 1986.
[9] T.K. Capin, H. Noser, D. Thalmann, I.S. Pandzic, and N.M. Thalmann, "Virtual human representation and communication in VLNet," IEEE Computer Graphics and Applications, Vol. 17, No. 2, pp. 42-53, March-April 1997.
[10] T.K. Capin, E. Petajan, and J. Ostermann, "Efficient modeling of virtual humans in MPEG-4," Proc. 2000 IEEE Internat. Conf. on Multimedia and Expo, Vol. 2, pp. 1103-1106, New York City, NY, USA, July 30-Aug. 2, 2000.
[11] M. La Cascia, S. Sclaroff, and V. Athitsos, "Fast, reliable head tracking under varying illumination: An approach based on registration of texture-mapped 3D models," IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 22, No. 4, pp. 322-336, April 2000.
[12] Y.J. Chang, C.C. Chen, J.C. Chou, and Y.C. Chen, "Virtual talk: A model-based virtual phone using a layered audio-visual integration," Proc. 2000 IEEE Internat. Conf. on Multimedia and Expo, Vol. 1, pp. 415-418, New York City, NY, USA, July 30-Aug. 2, 2000.
[13] Y.J. Chang, C.C. Chen, J.C. Chou, and Y.C. Chen, "Development of a multi-user virtual conference system using a layered audio-visual integration," Proc. 1st IEEE Pacific-Rim Conf. on Multimedia, pp. 50-53, Sydney, Australia, Dec. 13-15, 2000.
[14] L.S. Chen, H. Tao, T.S. Huang, T. Miyasato, and R. Nakatsu, "Emotion recognition from audiovisual information," Proc. 1998 IEEE Workshop on Multimedia Signal Processing, pp. 83-88, Redondo Beach, CA, USA, Dec. 7-9, 1998.
[15] C.C. Chien, Y.J. Chang, and Y.C. Chen, "Facial expression analysis under various head poses," to appear in Proc. 3rd IEEE Pacific-Rim Conf. on Multimedia, Hsinchu, Taiwan, Dec. 16-18, 2002.
[16] C.S. Choi, K. Aizawa, H. Harashima, and T. Takebe, "Analysis and synthesis of facial image sequences in model-based image coding," IEEE Trans. Circuits and Systems for Video Technology, Vol. 4, No. 3, pp. 257-275, June 1994.
[17] K.H. Choi, Y. Luo, and J.N. Hwang, "Hidden Markov model inversion for audio-to-visual conversion in an MPEG-4 facial animation system," Journal of VLSI Signal Processing, Vol. 29, No. 1-2, pp. 51-61, 2001.
[18] J.C. Chou, Y.J. Chang, and Y.C. Chen, "Facial feature point tracking and expression analysis for virtual conferencing systems," Proc. 2001 IEEE Internat. Conf. on Multimedia and Expo, pp. 415-418, Tokyo, Japan, Aug. 22-25, 2001.
[19] Cyberlink TalkingShow, web page: http://www.gocyberlink.com/english/products/talkingshow/talkingshow.asp.
[20] J.R. Deller, J.G. Proakis, and J.H.L. Hansen, Discrete-Time Processing of Speech Signals, Macmillan Publishing Company, 1993.
[21] A.P. Dempster, N.M. Laird, and D.B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," J. Royal Statistical Society B, Vol. 39, pp. 1-38, 1977.
[22] G. Donato, M.S. Bartlett, J.C. Hager, P. Ekman, and T.J. Sejnowski, "Classifying facial actions," IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 21, No. 10, pp. 974-989, Oct. 1999.
[23] S. Dupont and J. Luettin, "Audio-visual speech modeling for continuous speech recognition," IEEE Trans. Multimedia, Vol. 2, No. 3, pp. 131-151, Sept. 2000.
[24] P. Eisert, T. Wiegand, and B. Girod, "Model-aided coding: A new approach to incorporate facial animation into motion-compensated video coding," IEEE Trans. Circuits and Systems for Video Technology, Vol. 10, No. 3, pp. 344-358, April 2000.
[25] I.A. Essa and A.P. Pentland, "Coding, analysis, interpretation, and recognition of facial expressions," IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 19, No. 7, pp. 757-763, July 1997.
[26] T. Ezzat, G. Geiger, and T. Poggio, "Trainable videorealistic speech animation," to appear in Proc. ACM SIGGRAPH 2002, San Antonio, TX, USA, July 2002.
[27] B. Fasel and J. Luettin, "Facial expression analysis and recognition: A survey," IDIAP Research Report IDIAP-RR 99-19, Dec. 1999.
[28] D. Fidaleo, J.Y. Noh, T. Kim, R. Enciso, and U. Neumann, "Classification and volume morphing for performance-driven facial animation," Proc. Digital and Computational Video, 1999.
[29] P. Fua, "Regularized bundle-adjustment to model heads from image sequences without calibration data," International Journal of Computer Vision, Vol. 38, No. 2, pp. 153-171, July 2000.
[30] M.J.F. Gales, "Maximum likelihood linear transformations for HMM-based speech recognition," Computer Speech and Language, Vol. 12, pp. 75-98, 1998.
[31] M.J.F. Gales, "Multiple-cluster training of hidden Markov models," IEEE Trans. Speech and Audio Processing, Vol. 8, No. 4, pp. 417-428, July 2000.
[32] T. Goto, S. Kshirsagar, and N.M. Thalmann, "Automatic face cloning and animation," IEEE Signal Processing Magazine, Vol. 18, No. 3, pp. 17-25, May 2001.
[33] P. Hong, Z. Wen, and T.S. Huang, "Real-time speech-driven face animation with expressions using neural networks," IEEE Trans. Neural Networks, Vol. 13, No. 1, Jan. 2002.
[34] T. Horprasert, Y. Yacoob, and L.S. Davis, "Computing 3-D head orientation from monocular image sequence," Proc. Internat. Conf. on Automatic Face and Gesture Recognition, pp. 242-247, Killington, VT, USA, Oct. 13-16, 1996.
[35] ITU-T Recommendation G.723.1, "Dual rate speech coder for multimedia communications transmitting at 5.3 and 6.3 kbit/s," International Telecommunication Union, 1996.
[36] ITU-T Recommendation H.323, "Packet-based multimedia communications systems," International Telecommunication Union, 1998.
[37] T. Jebara, A. Azarbayejani, and A. Pentland, "3D structure from 2D motion," IEEE Signal Processing Magazine, Vol. 16, No. 3, pp. 66-84, May 1999.
[38] T.S. Jebara and A. Pentland, "Parameterized structure from motion for 3D adaptive feedback tracking of faces," Proc. 1997 IEEE Conf. on Computer Vision and Pattern Recognition, pp. 144-150, San Juan, Puerto Rico, June 17-19, 1997.
[39] M. Kampmann, "Estimation of the chin and cheek contours for precise face model adaptation," Proc. 1997 Internat. Conf. on Image Processing, Vol. 3, pp. 300-303, Washington, DC, USA, Oct. 26-29, 1997.
[40] M. Kampmann and R. Farhoud, "Precise face model adaptation for semantic coding of video sequences," Proc. Picture Coding Symposium '97, Berlin, Germany, September 1997.
[41] M. Kampmann and J. Ostermann, "Automatic adaptation of a face model in a layered coder with an object-based analysis-synthesis layer and a knowledge-based layer," Signal Processing: Image Communication, Vol. 9, pp. 201-220, March 1997.
[42] M. Kass, A. Witkin, and D. Terzopoulos, "Snakes: Active contour models," International Journal of Computer Vision, Vol. 1, No. 4, pp. 321-331, Jan. 1988.
[43] A. Kiruluta, M. Eizenman, and S. Pasupathy, "Predictive head movement tracking using a Kalman filter," IEEE Trans. Systems, Man, and Cybernetics, Part B: Cybernetics, Vol. 27, No. 2, April 1997.
[44] C.J. Kuo and T.G. Lin, "Facial model generation through mono image sequence," Proc. 2000 IEEE Internat. Conf. on Multimedia and Expo, Vol. 1, pp. 407-410, New York City, NY, USA, July 30-Aug. 2, 2000.
[45] F. Lavagetto and R. Pockaj, "The facial animation engine: Toward a high-level interface for the design of MPEG-4 compliant animated faces," IEEE Trans. Circuits and Systems for Video Technology, Vol. 9, No. 2, pp. 277-289, March 1999.
[46] F. Lavagetto and R. Pockaj, "An efficient use of MPEG-4 FAP interpolation for facial animation at 70 bits/frame," IEEE Trans. Circuits and Systems for Video Technology, Vol. 11, No. 10, pp. 1085-1097, Oct. 2001.
[47] Y. Lee, D. Terzopoulos, and K. Waters, "Realistic modeling for facial animation," Proc. ACM SIGGRAPH '95, pp. 55-62, Los Angeles, CA, USA, Aug. 1995.
[48] W.S. Lee and N.M. Thalmann, "Fast head modeling for animation," Image and Vision Computing, Vol. 18, No. 4, pp. 355-364, March 2000.
[49] C.J. Leggetter and P.C. Woodland, "Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models," Computer Speech and Language, Vol. 9, pp. 171-185, 1995.
[50] W.H. Leung and T. Chen, "Creating a multiuser 3-D virtual environment," IEEE Signal Processing Magazine, Vol. 18, No. 3, pp. 9-16, May 2001.
[51] W.H. Leung, B.L. Tseng, Z.Y. Shae, F. Hendriks, and T. Chen, "Realistic video avatar," Proc. 2000 IEEE Internat. Conf. on Multimedia and Expo, Vol. 2, pp. 631-634, New York City, NY, USA, July 30-Aug. 2, 2000.
[52] B. Li and M.I. Sezan, "Adaptive video background replacement," Proc. 2001 IEEE Internat. Conf. on Multimedia and Expo, pp. 385-388, Tokyo, Japan, Aug. 22-25, 2001.
[53] W. Li, "Overview of fine granularity scalability in MPEG-4 video standard," IEEE Trans. Circuits and Systems for Video Technology, Vol. 11, No. 3, pp. 301-317, March 2001.
[54] J.J.J. Lien, T. Kanade, J.F. Cohn, and C.C. Li, "Detection, tracking, and classification of action units in facial expression," Robotics and Autonomous Systems, Vol. 31, No. 3, pp. 131-146, May 2000.
[55] C.W. Lin, "Video transcoding techniques for multipoint video conferencing," Ph.D. thesis, National Tsing Hua University, Taiwan, Republic of China, 2000.
[56] C.W. Lin, Y.J. Chang, and Y.C. Chen, "Low-complexity face-assisted video coding," Proc. 2000 IEEE Internat. Conf. on Image Processing, Vol. 2, pp. 207-210, Vancouver, British Columbia, Canada, Sept. 10-13, 2000.
[57] Z. Liu, Y. Shan, and Z. Zhang, "Expressive expression mapping with ratio images," Proc. ACM SIGGRAPH 2001, pp. 271-276, Los Angeles, CA, USA, Aug. 12-17, 2001.
[58] Z. Liu, Z. Zhang, C. Jacobs, and M. Cohen, "Rapid modeling of animated faces from video," Proc. 3rd Internat. Conf. on Visual Computing, pp. 58-67, Mexico City, Mexico, Sept. 2000.
[59] The M2VTS database, web page: http://www.tele.ucl.ac.be/PROJECTS/M2VTS/m2fdb.html.
[60] H. McGurk and J. MacDonald, "Hearing lips and seeing voices," Nature, Vol. 264, pp. 746-748, Dec. 1976.
[61] T. Meier and K.N. Ngan, "Video segmentation for content-based coding," IEEE Trans. Circuits and Systems for Video Technology, Vol. 9, No. 8, pp. 1190-1203, Dec. 1999.
[62] J.M. Mendel, Lessons in Estimation Theory for Signal Processing, Communications, and Control, Englewood Cliffs, NJ: Prentice-Hall, 1995.
[63] Microsoft NetMeeting 3, online Internet conferencing software.
[64] S. Morishima, "Face analysis and synthesis: For duplication expression and impression," IEEE Signal Processing Magazine, Vol. 18, No. 3, pp. 26-34, May 2001.
[65] MPEG-4 Video Group, "Generic coding of audio-visual objects: Part 2 - Visual," ISO/IEC JTC1/SC29/WG11 N2502, FDIS of ISO/IEC 14496-2, Nov. 1998.
[66] MPEG-4 Audio Group, "Generic coding of audio-visual objects: Part 3 - Audio," ISO/IEC JTC1/SC29/WG11 N2503, FDIS of ISO/IEC 14496-3, Nov. 1998.
[67] "Text of ISO/IEC FDIS 14496-3 Section 2: Speech coding - HVXC," ISO/IEC JTC1/SC29/WG11 N2503-sec2, Nov. 1998.
[68] "Text of ISO/IEC FDIS 14496-3 Section 3: Speech coding - CELP," ISO/IEC JTC1/SC29/WG11 N2503-sec3, Dec. 1998.
[69] J.Y. Noh and U. Neumann, "Expression cloning," Proc. ACM SIGGRAPH 2001, pp. 277-288, Los Angeles, CA, USA, Aug. 12-17, 2001.
[70] J. Ohya, Y. Kitamura, F. Kishino, N. Terashima, H. Takemura, and H. Ishii, "Virtual space teleconferencing: Real-time reproduction of 3D human images," Journal of Visual Communication and Image Representation, Vol. 6, No. 1, pp. 1-25, March 1995.
[71] N. Oliver, A. Pentland, and F. Berard, "LAFTER: A real-time lips and face tracker with facial expression recognition," Pattern Recognition, Vol. 33, No. 8, pp. 1369-1382, Aug. 2000.
[72] J. Ostermann, "Object-based analysis-synthesis coding based on the source model of moving rigid 3D objects," Signal Processing: Image Communication, Vol. 6, No. 2, pp. 143-161, May 1994.
[73] J. Ostermann, L.S. Chen, and T.S. Huang, "Animated talking head with personalized 3D head model," Journal of VLSI Signal Processing, Vol. 20, No. 1/2, pp. 97-105, Oct. 1998.
[74] A.W. Paeth, "A fast algorithm for general raster rotation," in Graphics Gems, Academic Press, pp. 176-195, 1990.
[75] F. Pedersini, A. Sarti, and S. Tubaro, "Multi-camera systems: Calibration and applications," IEEE Signal Processing Magazine, Vol. 16, No. 3, pp. 55-65, May 1999.
[76] F. Pighin, J. Hecker, D. Lischinski, R. Szeliski, and D.H. Salesin, "Synthesizing realistic facial expressions from photographs," Proc. ACM SIGGRAPH '98, pp. 75-84, Orlando, FL, USA, July 1998.
[77] F. Pighin, R. Szeliski, and D.H. Salesin, "Resynthesizing facial animation through 3D model-based tracking," Proc. Internat. Conf. on Computer Vision, Vol. 1, pp. 143-150, Kerkyra, Greece, Sept. 20-27, 1999.
[78] C.J. Poelman and T. Kanade, "A paraperspective factorization method for shape and motion recovery," IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 19, No. 3, pp. 206-218, March 1997.
[79] C.E. Priebe, "Adaptive mixtures," Journal of the American Statistical Association, Vol. 89, No. 427, pp. 796-806, 1994.
[80] L.R. Rabiner and B.H. Juang, Fundamentals of Speech Recognition, Englewood Cliffs, NJ: Prentice-Hall, 1993.
[81] R.R. Rao, T. Chen, and R.M. Mersereau, "Audio-to-visual conversion for multimedia communication," IEEE Trans. Industrial Electronics, Vol. 45, No. 1, pp. 15-22, Feb. 1998.
[82] N. Sarris and M.G. Strintzis, "Constructing a videophone for the hearing impaired using MPEG-4 tools," IEEE Multimedia, Vol. 8, No. 3, pp. 56-67, July-Sept. 2001.
[83] A. Schödl, A. Haro, and I. Essa, "Head tracking using a textured polygonal model," Proc. Workshop on Perceptual User Interfaces, pp. 43-48, San Francisco, CA, USA, Nov. 5-6, 1998.
[84] L.S. Shapiro, A. Zisserman, and M. Brady, "3D motion recovery via affine epipolar geometry," International Journal of Computer Vision, Vol. 16, No. 2, pp. 147-182, Oct. 1995.
[85] SNHC, C. Horne (ed.), "SNHC verification model 3.0," ISO/IEC JTC1/SC29/WG11 N1545, Feb. 1997.
[86] D.G. Stork and M.E. Hennecke, "Speechreading: An overview of image processing, feature extraction, sensory integration and pattern recognition techniques," Proc. 2nd Internat. Conf. on Automatic Face and Gesture Recognition, 1996.
[87] N.M. Thalmann, P. Kalra, and M. Escher, "Face to virtual face," Proceedings of the IEEE, Vol. 86, No. 5, pp. 870-883, May 1998.
[88] Y.L. Tian, T. Kanade, and J.F. Cohn, "Recognizing action units for facial expression analysis," IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 23, No. 2, pp. 97-115, Feb. 2001.
[89] S. Valente and J.L. Dugelay, "A visual analysis/synthesis feedback loop for accurate face tracking," Signal Processing: Image Communication, Vol. 16, No. 6, pp. 585-608, Feb. 2001.
[90] S. Valente, A.C. Andrés Del Valle, and J.L. Dugelay, "Analysis and reproduction of facial expressions for realistic communicating clones," Journal of VLSI Signal Processing, Vol. 29, No. 1/2, pp. 41-49, Aug./Sept. 2001.
[91] VocalTec Internet Phone 5, online Internet telephony software.
[92] H. Wang and S.F. Chang, "A highly efficient system for automatic face region detection in MPEG video," IEEE Trans. Circuits and Systems for Video Technology, Vol. 7, No. 4, pp. 615-628, Aug. 1997.
[93] Y. Yacoob and L.S. Davis, "Recognizing human facial expressions from long image sequences using optical flow," IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 18, No. 6, pp. 636-642, June 1996.
[94] J. Yang and A. Waibel, "A real-time face tracker," Proc. 3rd IEEE Workshop on Applications of Computer Vision, pp. 142-147, Sarasota, FL, USA, Dec. 2-3, 1996.
[95] L. Yin and A. Basu, "Generating realistic facial expressions with wrinkles for model-based coding," Computer Vision and Image Understanding, Vol. 84, No. 2, pp. 201-240, Nov. 2001.
[96] A. Yuille, P. Hallinan, and D. Cohen, "Feature extraction from faces using deformable templates," International Journal of Computer Vision, Vol. 8, No. 2, pp. 99-111, 1992.
[97] L. Zhang, "Automatic adaptation of a face model using action units for semantic coding of videophone sequences," IEEE Trans. Circuits and Systems for Video Technology, Vol. 8, No. 6, pp. 781-795, Oct. 1998.
[98] J.Y. Zheng, "Acquiring 3-D models from sequences of contours," IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 16, No. 2, pp. 163-178, Feb. 1994.