National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

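Author: 鄭宏文
Author (English): Hong-Wun Jheng
Title: 利用語音進行照片中人物影像的自動化標註及檢索
Title (English): Automatic Facial Image Annotation and Retrieval by Integrating Voice Label and Visual Appearance
Advisor: 徐宏民
Committee members: 陳祝嵩, 葉梅珍, 余能豪
Defense date: 2015-07-22
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Publication year: 2015
Graduation academic year: 103 (ROC calendar)
Language: English
Number of pages: 19
Keywords (Chinese): 照片標註, 語音檢索
Keywords (English): Photo Annotation, Speech Retrieval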
Annotation is important for managing and retrieving large numbers of photos, but it is generally labor-intensive and time-consuming. In contrast, speaking while taking photos is straightforward and effortless, and annotating by voice is faster than typing. To minimize the manual cost of annotating photos, we propose a novel framework that uses the sparse spoken annotations recorded at capture time as voice labels and automatically labels every facial image in the photo collection. To accomplish this, we employ a probabilistic graphical model that integrates voice labels and visual appearance for inference. Combined with group prior estimation and gender attribute association, the framework achieves strong performance on the proposed synthesized group photo collections.
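
This record does not give the details of the thesis's model, so the following is only a minimal sketch of the general idea of spreading a few voice labels to all faces by visual similarity. It uses plain label propagation over an appearance-similarity graph as a stand-in; it is not the probabilistic graphical model, group prior estimation, or gender attribute association described in the abstract. The function name `propagate_labels`, the `sigma` parameter, and the toy feature vectors are all hypothetical.

```python
import math


def propagate_labels(features, seeds, names, sigma=1.0, iters=50):
    """Spread a few seed labels to every face over a similarity graph.

    features: list of feature vectors, one per face (hypothetical descriptors)
    seeds:    dict mapping face index -> name, for faces with a voice label
    names:    list of all candidate names
    Returns the most probable name for each face.

    Assumption: this is generic label propagation, not the thesis's model.
    """
    n = len(features)

    def similarity(a, b):
        # Gaussian kernel on squared Euclidean distance between descriptors.
        d2 = sum((x - y) ** 2 for x, y in zip(a, b))
        return math.exp(-d2 / (2 * sigma ** 2))

    # Pairwise edge weights; no self-loops.
    weights = [
        [similarity(features[i], features[j]) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]

    def initial_belief(i):
        # Voice-labelled faces start one-hot; the rest start uniform.
        if i in seeds:
            return [1.0 if name == seeds[i] else 0.0 for name in names]
        return [1.0 / len(names)] * len(names)

    beliefs = [initial_belief(i) for i in range(n)]

    for _ in range(iters):
        updated = []
        for i in range(n):
            total = sum(weights[i])
            if i in seeds or total == 0.0:
                # Seed beliefs stay clamped; isolated faces keep their belief.
                updated.append(beliefs[i])
                continue
            # Each unlabelled face takes the weighted average of its neighbours.
            updated.append([
                sum(weights[i][j] * beliefs[j][k] for j in range(n)) / total
                for k in range(len(names))
            ])
        beliefs = updated

    return [
        names[max(range(len(names)), key=lambda k: belief[k])]
        for belief in beliefs
    ]


# Toy usage: four faces in two tight appearance clusters, two voice labels.
toy_features = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
toy_seeds = {0: "Alice", 2: "Bob"}
toy_result = propagate_labels(toy_features, toy_seeds, ["Alice", "Bob"])
```

In this toy setup each unlabelled face sits next to exactly one labelled face, so it inherits that neighbour's name. A real collection would need actual face descriptors and a tuned `sigma`.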

Acknowledgements
Abstract (Chinese)
Abstract
Contents
List of Figures
List of Tables
1 Introduction
2 Related Work
2.1 Speech for Photo Annotation and Retrieval
2.2 Multiple Instances Identification
3 Proposed Method
3.1 System Overview
3.2 Latent Identity Discovery
3.3 Probabilistic Graphical Model Construction
3.3.1 Prior Estimation With KDE
3.3.2 Gender Attribute Association
3.4 Retrieval
4 Experiment Results
4.1 Datasets and Implementation
4.2 Performance on Different Datasets
4.3 Performance with Partial Annotations
4.4 Retrieve Face Images
5 Conclusion
Bibliography

