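Author: 劉又誠 (Yu-Cheng Liu)
Thesis title: Action Recognition Using Convolutional Neural Network (利用捲積神經網路進行動作辨識)
Advisor: 丁建均 (Jian-Jiun Ding)
Oral defense committee: 王家慶 (Jia-Ching Wang), 王鵬華 (Peng-Hua Wang), 許文良 (Wen-Liang Hsue)
Oral defense date: 2016-07-26
Degree: Master's
Institution: National Taiwan University (國立臺灣大學)
Department: Graduate Institute of Communication Engineering (電信工程學研究所)
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Publication year: 2016
Graduation academic year: 104
Language: English
Number of pages: 77
Keywords: action recognition, deep learning, convolutional neural network, long short term memory, 3-D convolutional kernel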
Abstract:
Multimedia plays an important role in daily life. Hundreds of thousands of videos are uploaded to the Internet, and popular topics such as basketball and baseball games attract very high click-through rates, so information retrieval techniques are becoming increasingly important.
Human action recognition can be further applied to abnormal event detection and to the analysis of human activity. The dataset used in our experiments contains human body actions and human-object interactions, such as jumping, clapping, and drinking.
In this thesis, we first train a model with a convolutional neural network (CNN) and then extract the features of the training and testing videos from that model. Using the temporal relationships among the features within the same video clip, we then train a three-layer long short-term memory (LSTM) model. Finally, we take the output of the last LSTM layer at the last time step, which summarizes the whole feature sequence, as the feature of the test video and use it for classification. The results show that the accuracy of our model is higher than that of some methods proposed in recent years.
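The last stage described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the per-clip CNN features are replaced by random vectors, the LSTM weights are untrained, and every name and size here (`LSTMLayer`, `video_descriptor`, `classify`, the feature, hidden, and class dimensions) is a hypothetical choice made only for illustration. It shows how a sequence of features is passed through three stacked LSTM layers and how the output of the last layer at the last time step is used as the video-level feature for classification.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class LSTMLayer:
    """One LSTM layer with random, untrained weights (illustration only)."""

    def __init__(self, in_dim, hid_dim, rng):
        self.hid_dim = hid_dim
        # One weight matrix per gate, applied to the concatenation [x_t, h_{t-1}].
        self.W = {gate: [[rng.uniform(-0.1, 0.1) for _ in range(in_dim + hid_dim)]
                         for _ in range(hid_dim)] for gate in "ifog"}
        self.b = {gate: [0.0] * hid_dim for gate in "ifog"}

    def forward(self, xs):
        """Return the hidden state at every time step of the sequence xs."""
        h = [0.0] * self.hid_dim
        c = [0.0] * self.hid_dim
        outputs = []
        for x in xs:
            z = x + h  # concatenate input and previous hidden state
            pre = {gate: [a + b for a, b in zip(matvec(self.W[gate], z), self.b[gate])]
                   for gate in "ifog"}
            i = [sigmoid(v) for v in pre["i"]]       # input gate
            f = [sigmoid(v) for v in pre["f"]]       # forget gate
            o = [sigmoid(v) for v in pre["o"]]       # output gate
            cand = [math.tanh(v) for v in pre["g"]]  # candidate cell state
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, cand)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            outputs.append(h)
        return outputs


def video_descriptor(clip_features, layers):
    """Run the stacked LSTM and keep the last layer's last time step."""
    seq = clip_features
    for layer in layers:
        seq = layer.forward(seq)
    return seq[-1]


def classify(descriptor, W_out):
    """Linear layer followed by softmax over the action classes."""
    scores = matvec(W_out, descriptor)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
feat_dim, hid_dim, n_classes, n_clips = 8, 4, 3, 5  # assumed toy sizes
# Stand-in for the CNN features of consecutive clips from one video.
clip_features = [[rng.gauss(0.0, 1.0) for _ in range(feat_dim)] for _ in range(n_clips)]
layers = [LSTMLayer(feat_dim, hid_dim, rng),
          LSTMLayer(hid_dim, hid_dim, rng),
          LSTMLayer(hid_dim, hid_dim, rng)]
W_out = [[rng.uniform(-0.1, 0.1) for _ in range(hid_dim)] for _ in range(n_classes)]

descriptor = video_descriptor(clip_features, layers)
probs = classify(descriptor, W_out)
print(len(descriptor), len(probs))
```

In a real system the random vectors would be replaced by features extracted from the trained CNN, and the LSTM and output weights would be learned from the training videos.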

Table of contents:
Oral Defense Committee Certification
Chinese Abstract
ABSTRACT
CONTENTS
LIST OF FIGURES
LIST OF TABLES
Chapter 1 Introduction
1.1 Background
1.2 Organization
Chapter 2 Conventional Feature Based Methods
2.1 Spatio-Temporal Interest Points (STIPs)
2.1.1 Histogram of Gradient (HoG)
2.1.2 Optical Flow (OF)
2.1.3 Histogram of Flow (HoF)
2.1.4 Motion Boundary Histogram (MBH)
2.2 Dense Trajectories
2.2.1 Dense Sampling
2.2.2 Trajectories
2.3 Clustering
2.3.1 K-means
2.4 Feature Encoding
2.4.1 Vector Quantization
2.4.2 Sparse Coding
2.4.3 Fisher Vector
2.5 Principal Component Analysis (PCA)
2.6 Normalization and Pooling
2.6.1 Normalization
2.6.2 Pooling
2.7 Support Vector Machine (SVM)
2.8 Overall structure
2.9 Feature Fusion
Chapter 3 Neural Network (NN)
3.1 Deep Neural Network (DNN)
3.1.1 Basic Structure of Neural Network
3.1.2 Back Propagation
3.1.3 Advanced Structures for Neural Network
3.1.4 Recurrent Neural Network (RNN)
3.1.5 Long Short Term Memory (LSTM)
3.2 Convolutional Neural Network (CNN)
3.2.1 Convolutional Kernel
3.2.2 Pooling
3.2.3 Local Response Normalization
3.2.4 Fully-Connected Layer
3.2.5 Other Functions
3.2.6 Data Augmentation
3.2.7 CNN Structure
3.2.8 Visualization of CNN Kernels
Chapter 4 Existing CNN Methods
4.1 3-D Kernel Based Models
4.2 LSTM Based Model
4.3 Optical Flow Based Model
Chapter 5 Proposed 3-D Convolutional Model Based Feature training using LSTM
5.1 Structure of Proposed method
5.2 Simulation Results and Discussion
5.2.1 Dataset
5.2.2 Experiment on whole dataset
5.2.3 Experiment on 15 classes
Chapter 6 Conclusion and Future Work
6.1 Conclusion
6.2 Future Work
REFERENCE




