臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)

Detailed Record

Student: 林淑娟
Student (English): Shu-Jiuan Lin
Title (Chinese): 利用運動強度分析,影片片段辨識及畫面字幕偵測建構視訊影片內容結構之研究
Title (English): Motion Activity Based Shot Identification and Closed Caption Localization for Video Structuring
Advisor: 李素瑛
Advisor (English): Suh-Yin Lee
Degree: Master's
Institution: National Chiao Tung University
Department: Department of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Publication year: 2002
Graduating academic year: 90 (2001–2002)
Language: English
Pages: 79
Keywords (Chinese): 運動強度, 字幕, 影片結構化
Keywords (English): Motion Activity, Closed Caption, Video Structuring
Record statistics:
  • Cited by: 1
  • Views: 351
  • Downloads: 0
  • Bookmarked: 0
Abstract (Chinese): In this thesis, we propose a new method for generating a table of content for sports videos compressed in MPEG-2, using the motion activity of video shots and the textual information of on-screen captions. To speed up scene change detection, the stream is first examined GOP by GOP; once a GOP in which a scene change may have occurred is found, the actual scene boundary is then located at the frame level. The segmented shots are described by an object motion activity descriptor, which is obtained by computing a 2-D histogram of objects that takes into account the long-term consistency of the spatial-temporal relationships of the moving objects within a shot. Using these motion activity features, we propose a shot identification algorithm that recognizes the type of each shot (serve, full-court, or close-up). Specific shots (namely serve shots) are then selected, and a caption detection algorithm is used to detect the captions in these shots. In addition, a filter designed with the self-organizing map algorithm distinguishes captions from complex background regions. Finally, we construct a sports video system and provide a table of video content, a hierarchical structure composed of story units (consecutive serve, full-court, and close-up shots) and captions. Furthermore, we provide a tree structure of the video content that can be adjusted dynamically according to the user's needs. Experimental results demonstrate the effectiveness of the system and the feasibility of the hierarchical structuring of video content.
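As a rough, self-contained illustration of the two-stage scene change detection summarized above (a coarse GOP-level screen followed by frame-level refinement inside the flagged GOP), the Python sketch below uses plain luminance histograms as the frame feature. The feature, the thresholds, and the list-of-GOPs data layout are assumptions made for this example only; they are not the descriptors or parameters used in the thesis.

import numpy as np

def luminance_histogram(frame, bins=64):
    # Normalized grey-level histogram of a single frame (2-D array of 0..255 values).
    hist, _ = np.histogram(frame, bins=bins, range=(0, 255))
    return hist / max(hist.sum(), 1)

def histogram_distance(h1, h2):
    # L1 distance between two normalized histograms.
    return float(np.abs(h1 - h2).sum())

def detect_scene_changes(gops, gop_threshold=0.4, frame_threshold=0.4):
    # Stage 1: screen the stream GOP by GOP using only the first (I-) frame of each GOP.
    # Stage 2: re-examine the flagged GOP frame by frame to locate the actual cut.
    boundaries = []
    prev_sig = None
    for g, frames in enumerate(gops):
        sig = luminance_histogram(frames[0])
        if prev_sig is not None and histogram_distance(prev_sig, sig) > gop_threshold:
            prev_frame = gops[g - 1][-1]          # last frame of the previous GOP
            for f, frame in enumerate(frames):
                if histogram_distance(luminance_histogram(prev_frame),
                                      luminance_histogram(frame)) > frame_threshold:
                    boundaries.append((g, f))     # cut at frame f of GOP g
                    break
                prev_frame = frame
        prev_sig = sig
    return boundaries

# Toy example: two synthetic GOPs with an abrupt brightness change between them.
dark = [np.zeros((8, 8)) for _ in range(3)]
bright = [np.full((8, 8), 200.0) for _ in range(3)]
print(detect_scene_changes([dark, bright]))       # expected: [(1, 0)]

The point of the structure is that the expensive frame-level comparison only runs inside GOPs that the cheap GOP-level check has already flagged.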

Abstract (English): In this paper, we propose a novel approach to generate the table of video content based on the shot description of motion activity and the textual information of closed captions in MPEG-2 sports videos. To speed up scene change detection, instead of examining scene cuts frame by frame, the GOP-based approach first checks the video stream GOP by GOP and then locates the actual scene boundaries at the frame level. Segmented shots are described by the proposed object-based motion activity descriptor. The descriptor is computed from the object 2D histogram, in which the long-term consistency of the spatial-temporal relationships of moving objects within a shot is considered. Utilizing the characterized motion activity features of video shots, shots are recognized by the proposed shot identification algorithm. Subsequently, the specific shots of interest are selected and the proposed closed caption localization mechanism is applied to detect captions in these shots. Moreover, an SOM (Self-Organizing Map) based algorithm is designed as a filter to distinguish superimposed closed captions from highly textured background regions. Finally, we construct a sports video content visualization system and provide the table of video content, composed of a hierarchical structure of story units, consecutive shots, and closed captions. Furthermore, we supply users with a dynamic tree structure of the video content. The experimental results show the effectiveness of the proposed system and reveal the feasibility of the hierarchical structuring of video content.
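The hierarchical table of video content described above can be pictured as story units that group consecutive identified shots, with the localized captions attached to the selected (serve) shots. The Python sketch below models that hierarchy; the class names, fields, and the rule that a serve shot opens a new story unit are illustrative assumptions rather than the thesis's exact definitions.

from dataclasses import dataclass, field
from enum import Enum
from typing import List

class ShotType(Enum):
    SERVE = "serve"
    FULL_COURT = "full-court"
    CLOSE_UP = "close-up"

@dataclass
class Shot:
    start_frame: int
    end_frame: int
    shot_type: ShotType
    captions: List[str] = field(default_factory=list)  # filled only for serve shots

@dataclass
class StoryUnit:
    # One story unit: a serve shot plus the shots that follow it up to the next serve.
    shots: List[Shot]

def build_story_units(shots: List[Shot]) -> List[StoryUnit]:
    # Group consecutive shots into story units, starting a new unit at each serve shot.
    units: List[StoryUnit] = []
    for shot in shots:
        if shot.shot_type is ShotType.SERVE or not units:
            units.append(StoryUnit(shots=[shot]))
        else:
            units[-1].shots.append(shot)
    return units

# Example table of content built from an identified shot sequence (illustrative data).
shots = [
    Shot(0, 120, ShotType.SERVE, captions=["15-30"]),
    Shot(121, 400, ShotType.FULL_COURT),
    Shot(401, 520, ShotType.CLOSE_UP),
    Shot(521, 650, ShotType.SERVE, captions=["30-30"]),
    Shot(651, 900, ShotType.FULL_COURT),
]
for i, unit in enumerate(build_story_units(shots)):
    print(f"story unit {i}: {[s.shot_type.value for s in unit.shots]}")

A user-facing tree view of the video content could then be rendered directly from these nested objects.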

Abstract in Chinese
Abstract
Acknowledgement
Table of Contents
List of Figures
List of Tables
List of Algorithms
Chapter 1 Introduction
1.1 Motivation
1.2 Organization
Chapter 2 Background
2.1 Overview of the MPEG-2 standard
2.1.1 Structure of coded video data
2.1.2 Picture layer of coded video data
2.1.3 Picture encoding units: macroblock and block
2.2 Scene change detection methods
2.2.1 Pixel-domain scene change detection
2.2.2 Histogram comparison
2.2.3 DCT-coefficient-based approach
2.2.4 Motion-vector-based approach
2.2.5 Feature-based approach
2.2.6 DC-image-based approach
2.3 The basis of caption localization methods
2.4 Video structuring methods
2.5 Self-organizing map approach
2.6 Affine motion model
2.7 Overview of the MPEG-7 standard
2.8 XML standard
Chapter 3 Motion Activity Based Shot Identification and Closed Caption Localization for Video Structuring
3.1 Overview of the proposed scheme
3.2 Scene change detection
3.2.1 Inter-GOP scene change detection
3.2.2 Intra-GOP scene change detection
3.3 Shot identification
3.3.1 Moving object detection
3.3.2 Motion activity descriptor: 2-D histogram
3.3.3 Shot identification algorithm
3.4 Closed caption localization
3.4.1 Closed caption detection
3.4.2 SOM-based noise filtering
3.5 Extracting highlights
3.6 Constructing the table of video content
Chapter 4 System architecture and experiments
4.1 Overview of the sports video content visualization system
4.2 Audio-video splitter module
4.3 Sports video content visualization module
4.3.1 Component of video segmentation
4.3.2 Component of motion activity based shot identification
4.3.3 Component of closed caption localization
4.4 Video editor module
4.5 Experiments and analysis
Chapter 5 Conclusion and future work
Bibliography

