National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

Detailed Record

Author: 林哲正 (Che-Cheng Lin)
Title: 利用跨媒體關聯性於旅遊媒體分析
Title (English): Using Cross-Media Correlation for Travel Media Analysis
Advisor: 朱威達 (Wei-Ta Chu)
Degree: Master's
Institution: 國立中正大學 (National Chung Cheng University)
Department: 資訊工程所 (Computer Science and Information Engineering)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Publication year: 2009
Graduation academic year: 97 (2008-2009)
Language: Chinese
Pages: 54
Keywords (Chinese): 影片摘要, 場景偵測, 跨媒體關聯性, 照片摘要
Keywords (English): video scene detection, video summarization, cross-media correlation, photo summarization
Record statistics:
  • Cited by: 0
  • Views: 374
  • Downloads: 67
  • Bookmarked: 0
With the steady advance of electronic devices, people take large numbers of photos and videos on their journeys for later viewing, and traditionally this multimedia data must be selected and grouped by hand. To automate that work, we propose the concept of cross-media correlation between travel videos and photos to facilitate travel video analysis.
Because travel videos and photos often capture similar content, we exploit this characteristic to construct cross-media correlation between the two media. Depending on how the correlation is constructed, we distinguish global cross-media correlation from local cross-media correlation. Global cross-media correlation links the parts of the two media that share similar visual concepts: we detect photo scenes from the time information embedded in the photos, and combine the resulting scene boundaries with the global correlation to accomplish video scene detection. Local cross-media correlation links not merely conceptually similar pictures, but the parts of the two media that contain the same object. We assume that when an object appears in both the videos and the photos, the video segments and photos containing it are relatively important, and we therefore use the local correlation to select the most important video segments and photos for video and photo summarization. In the experiments, we report the performance of cross-media correlation on scene detection, video summarization, and photo summarization, confirming that cross-media correlation indeed aids travel video analysis.
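The photo scene detection step described above groups photos by the time information they carry: a long gap between consecutive capture times suggests a scene boundary. A minimal sketch of that idea follows; the 30-minute threshold and the function name are assumptions for illustration, not values taken from the thesis.

```python
from datetime import datetime, timedelta

def cluster_photos_by_time(timestamps, gap=timedelta(minutes=30)):
    """Group photo capture times into scenes: a new scene starts whenever
    the gap to the previous photo exceeds the threshold (assumed 30 min)."""
    scenes = []
    for t in sorted(timestamps):
        if scenes and t - scenes[-1][-1] <= gap:
            scenes[-1].append(t)   # same scene: continue the current group
        else:
            scenes.append([t])     # gap too large: start a new scene
    return scenes

# Example: three photos taken close together, then one 80 minutes later
ts = [datetime(2009, 5, 1, 10, 0), datetime(2009, 5, 1, 10, 5),
      datetime(2009, 5, 1, 10, 20), datetime(2009, 5, 1, 11, 40)]
print([len(s) for s in cluster_photos_by_time(ts)])  # → [3, 1]
```

The photo scene boundaries found this way are then propagated to the video through the global correlation, which is what allows video scene detection without analyzing the video in isolation.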
Chapter 1 Introduction 1
1.1 Background and Motivation 1
1.2 Travel Video Analysis 1
1.3 Global Cross-Media Correlation 2
1.4 Scene Detection 3
1.5 Local Cross-Media Correlation 4
1.6 Video and Photo Summarization 5
1.7 Thesis Organization 5
Chapter 2 Related Work 6
2.1 Home Video Analysis 6
2.2 Cross-Media Correlation 6
2.3 Scene Detection 8
2.4 Video and Photo Summarization 9
Chapter 3 Construction and Application of Global Cross-Media Correlation 12
3.1 Construction of Global Cross-Media Correlation 12
3.1.1 Photo Scene Detection 13
3.1.2 Keyframe Selection 14
3.1.3 Blurred Keyframe Filtering 16
3.1.4 Visual Word Construction 18
3.1.5 Approximate String Matching 20
3.2 Video Scene Detection 23
3.3 Summary 24
Chapter 4 Construction and Application of Local Cross-Media Correlation 26
4.1 Construction of Local Cross-Media Correlation 26
4.1.1 Matching Photos with Keyframes 27
4.1.2 Temporal Constraints on Matching 28
4.2 Travel Video Summarization 29
4.2.1 Keyframe Importance Evaluation 29
4.2.2 Summary Video Segment Selection 31
4.3 Travel Photo Summarization 33
4.3.1 Photo Importance Evaluation 33
4.3.2 Summary Photo Selection 35
4.4 Summary 36
Chapter 5 Experimental Results 37
5.1 Experimental Data 37
5.2 Scene Detection 39
5.3 Video and Photo Summarization 43
5.4 Summary 45
Chapter 6 Conclusion 47
References 49
Appendix 54
[1]Gatica-Perez, D., Loui, A., and Sun, M.-T. 2003. Finding structure in home videos by probabilistic hierarchical clustering. IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 6, 539-548.
[2]Pan, Z. and Ngo, C.-W. 2004. Structuring home video by snippet detection and pattern parsing. In Proc. of ACM International Workshop on Multimedia Information Retrieval, 69-76.
[3]Hua, X.-S., Lu, L., and Zhang, H.-J. 2004. Optimization-based automated home video editing system. IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, no. 5, 572-583.
[4]Lee, S.-H., Wang, S.-Z., and Kuo, C.C.J. 2005. Tempo-based MTV-style home video authoring. In Proc. of IEEE International Workshop on Multimedia Signal Processing.
[5]Shipman, F., Girgensohn, A., and Wilcox, L. 2008. Authoring, viewing, and generating hypervideo: an overview of Hyper-Hitchcock. ACM Transactions on Multimedia Computing, Communications, and Applications, vol. 5, no. 2, Article no. 15.
[6]Achanta, R.S.V., Yan, W.-Q., and Kankanhalli, M.S. 2006. Modeling intent for home video repurposing. IEEE Multimedia, vol. 13, no. 1, 46-55.
[7]Mei, T. and Hua, X.-S. 2005. Intention-based home video browsing. In Proc. of ACM Multimedia, 221-222.
[8]Chu, W.-T., and Chen, H.-Y. 2002. Cross-media correlation: a case study of navigated hypermedia document. In Proc. of ACM Multimedia, 57-66.
[9]Chu, W.-T., Hsu, K.-T., and Chen, H.-Y. 2001. Design of an alignment system for synchronized speech-text presentation. In Proc. of Distributed Multimedia Systems, 86-93.
[10]Owen, C. B. 1998. Multiple media correlation: theory and applications. Technical Report PCS-TR98-335, Dartmouth College, USA.
[11]Jeon, J., Lavrenko, V., and Manmatha, R. 2003. Automatic image annotation and retrieval using cross-media relevance models. In Proc. of ACM SIGIR, 119-126.
[12]Zhang, H., Zhuang, Y., and Wu, F. 2007. Cross-modal correlation learning for clustering on image-audio dataset. In Proc. of ACM Multimedia, 273-276.
[13]Pan, J.-Y., Yang, H., Faloutsos, C., and Duygulu, P. 2007. Cross-modal correlation mining using graph algorithms. In Knowledge Discovery and Data Mining: Challenges and Realities with Real World Data, Idea Group Reference.
[14]Yeung, M. and Yeo, B.-L. 1998. Segmentation of video by clustering and graph analysis. Computer Vision and Image Understanding, vol. 71, no. 1, 94-109.
[15]Hanjalic, A., Lagendijk, R. L., and Biemond, J. 1999. Automated high-level movie segmentation for advanced video-retrieval systems. IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, no. 4, 580-588.
[16]Rasheed, Z., and Shah, M. 2005. Detection and representation of scenes in videos. IEEE Transactions on Multimedia, vol. 7, no. 6, 1097-1105.
[17]Chasanis, V. T., Likas, A. C., Galatsanos, N. P. 2007. Scene detection in videos using shot clustering and symbolic sequence segmentation. IEEE Workshop on Multimedia Signal Processing, 187-190.
[18]Chasanis, V. T., Likas, A. C., Galatsanos, N. P. 2009. Scene detection in videos using shot clustering and sequence alignment. IEEE Transactions on Multimedia, vol. 11, no. 1, 89-100.
[19]Zhuang, Y., Rui, Y., Huang, T.S., and Mehrotra, S. 1998. Adaptive key frame extraction using unsupervised clustering. In Proc. of IEEE International Conference on Image Processing, 866-870.
[20]Uchihashi, S., Foote, J., Girgensohn, A., and Boreczky, J. 1999. Video manga: generating semantically meaningful video summaries. In Proc. of ACM Multimedia, 383-392.
[21]Luo, J., Papin, C., and Costello, K. 2009. Towards extracting semantically meaningful key frames from personal video clips: from humans to computers. IEEE Transactions on Circuits and Systems for Video Technology, vol. 19, no. 2, 289-301.
[22]Gong, Y. H., Liu, X. 2001. Video summarization with minimal visual content redundancies. In Proc. of IEEE International Conference on Image Processing, vol. 3, 362-365.
[23]Li, Z., Schuster, G. M., and Katsaggelos, A. K. 2005. Rate-distortion optimal video summary generation. IEEE Transactions on Image Processing, vol. 14, no. 10, 1550-1560.
[24]Otsuka, I., Nakane, K., Divakaran, A., et al. 2005. A highlight scene detection and video summarization system using audio feature for a personal video recorder. IEEE Transactions on Consumer Electronics, vol. 51, 112-116.
[25]Ma, Y. F., Lu, L., Zhang, H. J., Li, M. J. 2006. Video summarization using personal photo libraries. In Proc. of ACM Multimedia, 213-221.
[26]Peng, W. T., Chiang, Y. H., Chu, W. T., Huang, W. J., Chang, W. L., Huang, P. C., and Hung, Y. P. 2008. Aesthetics-based automatic home video skimming system. In LNCS 4903, 186-197.
[27]Naaman, M., Song, Y. J., Paepcke, A., and Garcia-Molina, H. 2004. Automatic organization for digital photographs with geographic coordinates. In Proc. of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 53-62.
[28]Pigeau, A. and Gelgon, M. 2004. Organizing a personal image collection with statistical model-based ICL clustering on spatio-temporal camera phone meta-data. Journal of Visual Communication and Image Representation, vol. 15, 425-445.
[29]Davis, M., King, S., Good, N., and Sarvas, R. 2004. From context to content: leveraging context to infer media metadata. In Proc. of the 12th ACM International Conference on Multimedia, 188-195.
[30]O’Hare, N., Gurrin, C., Jones, G. J. F., and Smeaton, A. F. 2005. Combination of content analysis and context features for digital photograph retrieval. In 2nd IEE European Workshop on the Integration of Knowledge, Semantic and Digital Media Technologies.
[31]Jaffe, A., Naaman, M., Tassa, T., and Davis, M. 2006. Generating summaries and visualization for large collections of geo-referenced photographs. In Proc. of ACM International Workshop on Multimedia Information Retrieval, 89-98.
[32]Graham, A., Garcia-Molina, H., Paepcke, A., and Winograd, T. 2002. Time as essence for photo browsing through personal digital libraries. In Proc. of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 326-335.
[33]Cooper, M., Foote, J., Girgensohn, A., and Wilcox, L. 2003. Temporal event clustering for digital photo collections. In Proc. of ACM Multimedia, 364-373.
[34]Rasheed, Z. and Shah, M. 2003. Scene detection in Hollywood movies and TV shows. In Proc. of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, 343-348.
[35]Vendrig, J. and Worring, M. 2002. Systematic evaluation of logical story unit segmentation. IEEE Transactions on Multimedia, vol. 4, no. 4, 492-499.
[36]Platt, J.C., Czerwinski, M., and Field, B.A. 2003. PhotoTOC: automating clustering for browsing personal photographs. In Proc. of IEEE Pacific Rim Conference on Multimedia, 6-10.
[37]Likas, A., Vlassis, N., and Verbeek, J.J. 2003. The global k-means clustering algorithm. Pattern Recognition, vol. 36, 451-461.
[38]Lowe, D. 2004. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60, 2, 91-110.
[39]Wang, F., Jiang, Y.-G., and Ngo, C.-W. 2008. Video event detection using motion relativity and visual relatedness. In Proc. of ACM Multimedia, 239-248.
[40]Zhou, X., Zhuang, X., Yan, S., Chang, S.-F., Hasegawa-Johnson, M., and Huang, T.S. 2008. SIFT-bag kernel for video event analysis. In Proc. of ACM Multimedia, 229-238.
[41]Vinciarelli, A. and Favre, S. 2007. Broadcast news story segmentation using social network analysis and hidden Markov models. In Proc. of ACM Multimedia, 261-264.
[42]Hsu, W.H., Kennedy, L., and Chang, S.-F. 2007. Reranking methods for visual search. IEEE Multimedia, vol. 14, no. 3, 14-22.
[43]Tong, H., Li, M., Zhang, H.-J., and Zhang, C. 2004. Blur detection for digital images using wavelet transform. In Proc. of IEEE International Conference on Multimedia & Expo, 17-20.
[44]Chu, W.-T. and Chen, H.-Y. 2005. Towards better retrieval and presentation by exploring cross-media correlations. Multimedia Systems, vol. 10, no. 3, 183-198.
[45]Philbin, J., Chum, O., Isard, M., Sivic, J., and Zisserman, A. 2008. Lost in quantization: improving particular object retrieval in large scale image databases. In Proc. of IEEE Conference on Computer Vision and Pattern Recognition.
[46]Wu, F., Zhang, H., and Zhuang, Y. 2006. Learning semantic correlation for cross-media retrieval. In Proc. of IEEE International Conference on Image Processing, 1465-1468.