Student: 陳韻竹
Student (English): Chen, Yun-Jhu
Title: 基於RGB-D視訊之空間時間對稱樣式非監督式學習及其人類行為分析之應用
Title (English): Unsupervised Learning of Space-time Symmetric Patterns in RGB-D Videos for Human Activity Detection
Advisor: 鄭錫齊
Advisor (English): Cheng, Shyi-Chy
Committee Members: 江政欽, 楊健貴, 張欽圳
Committee Members (English): Chiang, Cheng-Chin; Yang, Chen-Kuei; Chang, Chin-Chun
Oral Defense Date: 2018-07-09
Degree: Master's
Institution: National Taiwan Ocean University (國立臺灣海洋大學)
Department: Computer Science and Engineering (資訊工程學系)
Discipline: Engineering
Field: Electrical and Computer Engineering
Document Type: Academic thesis
Publication Year: 2018
Graduation Academic Year: 106 (ROC calendar)
Language: Chinese
Pages: 37
Keywords (Chinese): 視訊行為分析, 分類
Keywords (English): video activity analysis, classification
Usage statistics:
  • Cited by: 0
  • Views: 110
  • Downloads: 34
  • Bookmarks: 0
Abstract (translated from the Chinese):
This thesis proposes a method for obtaining space-time activity vector maps from video shots using a 3D moment technique. Conventional video classification first segments a video into shots and then classifies the key frame of each shot, without considering the temporal correlation between frames. Yet the object motion between consecutive key frames is an important cue for activity analysis. This thesis therefore preprocesses the video sequence to extract the implicit motion vector features shared by overlapping shots, combines them with the raw image data, and feeds the result into a convolutional-neural-network-based recurrent neural network (RNN) for video activity analysis.
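The record does not reproduce the network architecture itself; the following is only a minimal TensorFlow/Keras sketch of what a CNN-based RNN for shot-level activity classification could look like (TensorFlow is the framework named in Section 2.2 of the table of contents). The layer sizes, sequence length, and the 6-channel input (RGB plus a 3-channel motion map) are all assumptions, not the thesis's published settings.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn_rnn(num_classes, seq_len=16, h=112, w=112, c=6):
    """Hypothetical CNN-based RNN: a small per-frame CNN whose features
    are combined over time by an LSTM for shot-level classification."""
    frame_cnn = models.Sequential([
        layers.Conv2D(32, 3, activation="relu", input_shape=(h, w, c)),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.GlobalAveragePooling2D(),
    ])
    model = models.Sequential([
        # Apply the same CNN to every frame of the shot.
        layers.TimeDistributed(frame_cnn, input_shape=(seq_len, h, w, c)),
        layers.LSTM(128),                      # recurrence over time
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```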
The thesis has two main parts: data preprocessing that extracts the motion vectors of video shots, and video activity analysis that combines these features with the raw image data as input to a CNN-based recurrent neural network. The preprocessing first cuts the frames containing a given activity class into multiple overlapping shots and defines the middle frame of each shot as its key frame. Centered at the edge points of the key frame, each shot is divided into local space-time cubes, and the proposed 3D moment method extracts the space-time features of the objects in each cube: the spatial part describes an object's local shape, while the temporal part describes its motion within the cube. By collecting the space-time features of all key-frame pixels, we construct a space-time feature image for the key frame; the preprocessing thus converts each video frame into the space-time feature image of its corresponding key frame. These feature-image sequences are fed into the RNN, which further combines them over time and reinforces the space-time features needed for accurate activity analysis. Because this representation supplies temporal information beyond the raw image sequence, it yields more accurate activity analysis results in our experiments.
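The exact 3D moment formulas are not given in this record, so the sketch below assumes the plainest possible reading: the "3D moment" of a local space-time cube is a raw geometric moment m_pqr, and the per-pixel feature is the first-order moment normalized by total mass. The cube size, shot length, and feature choice are illustrative assumptions only.

```python
import numpy as np

def cube_moment(cube, p, q, r):
    """Raw 3D moment m_pqr of a space-time cube (axes ordered t, y, x)."""
    t, y, x = np.mgrid[0:cube.shape[0], 0:cube.shape[1], 0:cube.shape[2]]
    return float(np.sum((x ** p) * (y ** q) * (t ** r) * cube))

def spacetime_feature(cube):
    """Mass-normalized first-order moments: the x/y components describe
    local shape, the t component reflects motion within the cube."""
    m000 = cube_moment(cube, 0, 0, 0) + 1e-8   # avoid division by zero
    return np.array([cube_moment(cube, 1, 0, 0) / m000,
                     cube_moment(cube, 0, 1, 0) / m000,
                     cube_moment(cube, 0, 0, 1) / m000])

# Hypothetical usage: a 16-frame shot; one 9x9 cube around an edge pixel of
# the middle (key) frame.  Collecting these 3-vectors over all edge pixels
# would yield the key frame's space-time feature image.
shot = np.random.rand(16, 120, 160)            # grayscale frames (t, y, x)
cube = shot[:, 55:64, 75:84]
print(spacetime_feature(cube))
```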
Abstract (English):
This paper presents an approach to obtain the space-time motion vectors in a video shot using the proposed 3D moment method. Recognizing human activity in a video can be formulated as a video classification problem: segment the input video into shots, estimate the activity scores of individual shots with a classifier, and accumulate the scores of consecutive shots to determine the activity boundary. However, classification accuracy degrades dramatically when time-related features across frames, i.e., motion vectors or optical flows, are not considered during training. This paper proposes an approach that embeds the implicit motion vector map into the individual frames of a video, enlarging the number of channels per frame from 3 (RGB) or 4 (RGB-D) to 6 or 7: the original RGB(-D) channels are followed by three channels carrying the three-dimensional space-time motion vector.
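A minimal sketch of the channel embedding just described, stacking the motion-vector map onto the raw frame along the channel axis; the array sizes are hypothetical:

```python
import numpy as np

def embed_motion(frame, motion_map):
    """Stack an H x W x 3 RGB frame (or H x W x 4 RGB-D frame) with its
    H x W x 3 space-time motion-vector map, yielding 6 (or 7) channels."""
    assert frame.shape[:2] == motion_map.shape[:2]
    return np.concatenate([frame, motion_map], axis=-1)

rgbd = np.random.rand(120, 160, 4)         # RGB-D frame (hypothetical size)
motion = np.random.rand(120, 160, 3)       # per-pixel space-time motion vector
print(embed_motion(rgbd, motion).shape)    # (120, 160, 7)
```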
The effectiveness of the space-time motion features is first verified with a k-NNR-based video classifier that uses kernel PCA to construct an activity voting dictionary for activity detection. The motion feature maps are also embedded into video shots and used to train a recurrent neural network (RNN) for further analysis of the behavior in a video. Built on the 3D moment method, the proposed video classifier includes a pixel-wise space-time motion vector that captures both the appearance changes and the temporal motion of objects across frames, improving the accuracy of human activity recognition and detection. Experimental results demonstrate the effectiveness of the proposed method in terms of recognition accuracy and execution speed.
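The record does not detail how the voting dictionary is built, so the following is only a sketch in the spirit of the description above: kernel PCA projects pooled shot features into a low-dimensional space, and a k-nearest-neighbor rule votes on the activity class. It uses scikit-learn and random stand-in data; every dimension and parameter is hypothetical.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_train = rng.random((200, 64))     # one pooled space-time feature per shot
y_train = rng.integers(0, 10, 200)  # activity labels
X_test = rng.random((20, 64))

# Kernel PCA builds the low-dimensional "activity voting dictionary".
kpca = KernelPCA(n_components=16, kernel="rbf")
Z_train = kpca.fit_transform(X_train)

# The k-NN rule votes on the activity class of each unseen shot.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(Z_train, y_train)
pred = knn.predict(kpca.transform(X_test))
```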
1. Introduction 1
1.1 Motivation 1
1.2 Background 1
1.3 Objectives 2
1.4 Overview of the Proposed Method 2
1.5 Thesis Organization 3
2. Related Work 4
2.1 RGB-D Video Processing 4
2.2 Introduction to Deep Learning with TensorFlow 4
2.3 Human Activity Analysis in Video 5
2.3.1 Traditional Methods 5
2.3.2 Deep Learning Methods 6
3. Design of a Video Activity Analysis Method Combining Space-Time Features 7
3.1 Activity Maps of Video Shots 7
3.1.1 Space-Time Symmetric Pattern Detection 8
3.1.2 Generating the Activity Map of a Video Shot 10
3.2 Dominant Activity Maps 11
3.3 4D Human Activity Recognition 13
4. Video Activity Analysis Using a Recurrent Neural Network (RNN) 15
5. Experimental Results 18
5.1 Dataset Preparation and Experimental Environment 18
5.2 Extracting Motion Vectors 18
5.4 RNN Parameter Analysis 20
5.5 Classification with the RNN 22
5.6 Comparison of Experimental Results 24
5.7 Discussion 24
6. Conclusions and Future Work 26
References 27