National Digital Library of Theses and Dissertations in Taiwan


Author: Yu-Cheng Liu (劉又誠)
Title: Action Recognition Using Convolutional Neural Network (利用捲積神經網路進行動作辨識)
Advisor: Jian-Jiun Ding (丁建均)
Committee members: Jia-Ching Wang (王家慶), Peng-Hua Wang (王鵬華), Wen-Liang Hsue (許文良)
Oral defense date: 2016-07-26
Degree: Master's
Institution: National Taiwan University (國立臺灣大學)
Department: Graduate Institute of Communication Engineering (電信工程學研究所)
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Publication year: 2016
Graduation academic year: 104 (ROC calendar, i.e. 2015-2016)
Language: English
Number of pages: not listed
Keywords: action recognition, deep learning, convolutional neural network, long short-term memory, 3-D convolutional kernel

Usage statistics: cited 0 times, viewed 822 times, downloaded 0 times, bookmarked 0 times

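Abstract (translated from Chinese):
Multimedia plays an important role in human life. Tens of thousands of videos are uploaded to the Internet. Some popular topics, such as basketball and baseball, attract very high view counts, so data retrieval techniques are becoming increasingly important.
Human action recognition can be further applied to abnormal event detection and to the analysis of human activity. The dataset used in our experiments contains human body motions and human-object interactions, such as jumping, clapping, and eating and drinking.
In this thesis, we first train a model using a convolutional neural network, then extract features from the training and testing videos. With these features, we use the temporal relationships among the features of the same video to train a three-layer long short-term memory model. Finally, we take the feature at the last time step of the last layer of the long short-term memory model as the feature of the whole test video and use it for classification. In testing, the accuracy of our model is higher than that of several methods from recent years.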

Abstract (English):
Multimedia plays an important role in daily life. Hundreds of thousands of videos are uploaded to the Internet. Some popular topics, such as basketball and baseball games, have high click-through rates, so information retrieval techniques are becoming important.
Human action detection can be further applied to detecting abnormal events and analyzing activity. The dataset used in our experiments contains human body actions and interactions with objects, such as jumping, clapping, and drinking.
In this thesis, we first use a convolutional neural network (CNN) to train a model, then extract the features of the training and testing data from that model. After obtaining the features, we use the temporal information between features in the same video clip to train a three-layer long short-term memory (LSTM) model. Finally, we take the feature vector of the last LSTM layer, which summarizes the features of the whole testing video, as the decision scores. The results show that the accuracy of our structure is higher than that of some works proposed in recent years.
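The pipeline described in the abstract (per-clip CNN features, a three-layer LSTM, and a video-level descriptor taken from the last layer at the last time step) can be illustrated with a small sketch. This is not the thesis implementation: the weights are random and untrained, the CNN features are replaced by random vectors, and the dimensions, the `LSTMLayer` class, the `video_descriptor` and `classify` helpers, and the linear argmax classifier are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class LSTMLayer:
    """One LSTM layer with randomly initialised (untrained) weights."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # One weight matrix and bias per gate: input (i), forget (f),
        # candidate (g), output (o). Each acts on [x_t, h_{t-1}].
        cols = input_size + hidden_size
        self.W = {gate: [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                         for _ in range(hidden_size)] for gate in "ifgo"}
        self.b = {gate: [0.0] * hidden_size for gate in "ifgo"}

    def forward(self, sequence):
        """Return the hidden state at every time step."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        outputs = []
        for x in sequence:
            z = list(x) + h
            pre = {gate: [a + b for a, b in
                          zip(matvec(self.W[gate], z), self.b[gate])]
                   for gate in "ifgo"}
            i = [sigmoid(v) for v in pre["i"]]
            f = [sigmoid(v) for v in pre["f"]]
            cand = [math.tanh(v) for v in pre["g"]]
            o = [sigmoid(v) for v in pre["o"]]
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, cand)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
            outputs.append(h)
        return outputs


def video_descriptor(frame_features, layers):
    """Run the stacked layers; keep the last layer's last time step."""
    sequence = frame_features
    for layer in layers:
        sequence = layer.forward(sequence)
    return sequence[-1]


def classify(descriptor, W_out):
    """Linear class scores followed by argmax (an assumed classifier)."""
    scores = matvec(W_out, descriptor)
    return max(range(len(scores)), key=scores.__getitem__)


rng = random.Random(0)
feature_dim, hidden_dim, num_classes, num_steps = 8, 6, 5, 10  # assumed sizes

# Stand-in for the per-clip CNN features (random, not real CNN outputs).
frame_features = [[rng.gauss(0.0, 1.0) for _ in range(feature_dim)]
                  for _ in range(num_steps)]

layers = [LSTMLayer(feature_dim, hidden_dim, rng),
          LSTMLayer(hidden_dim, hidden_dim, rng),
          LSTMLayer(hidden_dim, hidden_dim, rng)]
W_out = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
         for _ in range(num_classes)]

descriptor = video_descriptor(frame_features, layers)
predicted_class = classify(descriptor, W_out)
print(len(descriptor), predicted_class)
```

Keeping only the final time step relies on the recurrent state having accumulated information from all earlier steps of the clip, which matches the abstract's description of a single vector standing for the whole video. In practice the weights would be learned by backpropagation through time rather than drawn at random.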

Table of contents:
Oral Examination Committee Certification
Chinese Abstract
ABSTRACT
CONTENTS
LIST OF FIGURES
LIST OF TABLES
Chapter 1 Introduction
1.1 Background
1.2 Organization
Chapter 2 Conventional Feature Based Methods
2.1 Spatio-Temporal Interest Points (STIPs)
2.1.1 Histogram of Oriented Gradients (HoG)
2.1.2 Optical Flow (OF)
2.1.3 Histogram of Flow (HoF)
2.1.4 Motion Boundary Histogram (MBH)
2.2 Dense Trajectories
2.2.1 Dense Sampling
2.2.2 Trajectories
2.3 Clustering
2.3.1 K-means
2.4 Feature Encoding
2.4.1 Vector Quantization
2.4.2 Sparse Coding
2.4.3 Fisher Vector
2.5 Principal Component Analysis (PCA)
2.6 Normalization and Pooling
2.6.1 Normalization
2.6.2 Pooling
2.7 Support Vector Machine (SVM)
2.8 Overall Structure
2.9 Feature Fusion
Chapter 3 Neural Network (NN)
3.1 Deep Neural Network (DNN)
3.1.1 Basic Structure of Neural Network
3.1.2 Back Propagation
3.1.3 Advanced Structures for Neural Network
3.1.4 Recurrent Neural Network (RNN)
3.1.5 Long Short Term Memory (LSTM)
3.2 Convolutional Neural Network (CNN)
3.2.1 Convolutional Kernel
3.2.2 Pooling
3.2.3 Local Response Normalization
3.2.4 Fully-Connected Layer
3.2.5 Other Functions
3.2.6 Data Augmentation
3.2.7 CNN Structure
3.2.8 Visualization of CNN Kernels
Chapter 4 Existing CNN Methods
4.1 3-D Kernel Based Models
4.2 LSTM Based Model
4.3 Optical Flow Based Model
Chapter 5 Proposed 3-D Convolutional Model Based Feature Training Using LSTM
5.1 Structure of the Proposed Method
5.2 Simulation Results and Discussion
5.2.1 Dataset
5.2.2 Experiment on the Whole Dataset
5.2.3 Experiment on 15 Classes
Chapter 6 Conclusion and Future Work
6.1 Conclusion
6.2 Future Work
REFERENCES


