臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)


Detailed Record

Author: 林伯儒 (LIN, PO-JU)
Title: Detecting Real World Anomalies in Video using Spatio-Temporal Autoencoder (利用時空自動編碼器檢測視頻中的真實異常)
Advisor: 熊博安 (HSIUNG, PAO-ANN)
Committee Members: 李宗演 (LEE, TRONG-YEN), 嚴茂旭 (YEN, MAO-HSU), 黃敬群 (HUANG, CHING-CHUN)
Oral Defense Date: 2019-07-17
Degree: Master's
Institution: National Chung Cheng University (國立中正大學)
Department: Graduate Institute of Computer Science and Information Engineering (資訊工程研究所)
Discipline: Engineering
Field: Electrical and Computer Engineering
Document Type: Academic thesis
Publication Year: 2019
Graduation Academic Year: 107
Language: English
Pages: 43
Keywords: Anomaly detection; Autoencoder; Unsupervised learning; Spatio-temporal relation; Image reconstruction; 3D convolutional network; Convolutional long short-term memory (ConvLSTM)
Statistics:
  • Cited: 0
  • Views: 184
  • Rating:
  • Downloads: 0
  • Saved to personal bibliography lists: 1
Abstract: As the demand for security increases, video-based surveillance systems are increasingly used in public places. Surveillance systems widely deployed in airports, shopping malls, train stations, and roads collect large amounts of video data on traffic accidents, crime, vandalism, terrorism, and other suspicious activities such as crowd panic, stampedes, and accidents involving large numbers of individuals. Many studies have proposed methods for detecting anomalies in video, but detection accuracy drops rapidly in crowded scenes because real-world anomalies are complex and diverse. Manually labeling abnormal events in massive video data is difficult, and most models can only detect the kinds of abnormal events present in their dataset; a new kind of abnormal event goes undetected. An abnormal behavior may last for several seconds while a target object changes position, so anomalous behavior exhibits both spatial and temporal relationships. For example, robbery and checkout involve similar actions, such as a clerk handing money to a customer, so the pixel changes between adjacent frames alone cannot easily distinguish robbery from checkout; observing the two events over a period of time, however, reveals a large difference.

In this thesis, we propose an unsupervised neural network model that uses spatio-temporal features to learn normal events, with the goal of detecting new anomalies when they occur. Spatial features are learned with a 3D CNN and temporal features with a ConvLSTM. The learned features are used to reconstruct the input frames, and the Euclidean loss between the input and its reconstruction estimates the error for anomaly detection; if the error is too large, the input is labeled as an anomaly. Experimental results show that the proposed method effectively detects anomalies in the fighting, robbery, and assault datasets. On the fighting dataset our accuracy is 73%, which is 8% higher than CNNAE; on the robbery dataset it is 63%, 7% higher than CNNAE; and on the assault dataset it is 71%, 9% higher than CNNAE. We also compared the AUC of the three models on all datasets. On the fighting dataset, our model's AUC is 0.794, versus 0.75 for MIL and 0.685 for CNNAE, so our model is higher by 0.044 and 0.109 respectively. On the robbery dataset, our AUC is 0.620, which is 0.130 lower than MIL (0.75) but 0.058 higher than CNNAE (0.562). On the assault dataset, our AUC is 0.746, which is 0.004 lower than MIL (0.75) but 0.077 higher than CNNAE (0.669).
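The detection step described above (a Euclidean reconstruction loss, thresholded to flag anomalous frames) can be sketched in a few lines. This is a minimal illustration, not the thesis's implementation: the 3D-CNN/ConvLSTM autoencoder is replaced here by a toy "reconstruction" array, and the `threshold` value is a hypothetical placeholder that in practice would be tuned on reconstruction errors of normal training data.

```python
import numpy as np

def reconstruction_error(frames, reconstructed):
    """Per-frame Euclidean (L2) distance between input and reconstruction."""
    diff = frames.astype(np.float64) - reconstructed.astype(np.float64)
    # Flatten each frame and take the L2 norm of its pixel-wise difference.
    return np.linalg.norm(diff.reshape(len(frames), -1), axis=1)

def label_anomalies(errors, threshold):
    """Frames whose reconstruction error exceeds the threshold are anomalous."""
    return errors > threshold

# Toy stand-in for autoencoder output: 4 "frames" of 8x8 grayscale pixels.
rng = np.random.default_rng(0)
frames = rng.random((4, 8, 8))
recon = frames.copy()
recon[2] += 1.0  # pretend the model failed to reconstruct frame 2
errors = reconstruction_error(frames, recon)
flags = label_anomalies(errors, threshold=1.0)
print(flags.tolist())  # frame 2 (error = sqrt(64) = 8.0) is flagged
```

Well-reconstructed frames score near zero, while events the model never learned produce large errors, which is what makes this scheme usable for previously unseen anomaly types.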
Table of Contents

1 Introduction
  1.1 Background
  1.2 Motivation
  1.3 Goal
  1.4 Thesis Organization
2 Related Work
  2.1 Tracking Methods
  2.2 Motion-Based Methods
  2.3 Combined Motion and Appearance Methods
  2.4 Neural Network Methods
3 Preliminaries
  3.1 Problem Definition
  3.2 Assumptions
    3.2.1 Architecture Assumptions
    3.2.2 Anomaly Detection Assumptions
  3.3 Terminology
  3.4 Parameter Settings
    3.4.1 Parameters for the Anomaly Detection Model
4 Spatio-Temporal Anomaly Detection System
  4.1 Spatio-Temporal Anomaly Detection System Architecture
  4.2 Image Processor Module
  4.3 Reconstruction Module
  4.4 Detection Module
5 Experiments
  5.1 Experimental Setup
  5.2 Experimental Data
  5.3 Experimental Results
    5.3.1 Experimental Results of Adjusting Model Parameters
    5.3.2 Experimental Results of Different Neural Network Modules on Test Data
6 Conclusions and Future Work
Bibliography