National Digital Library of Theses and Dissertations in Taiwan
Researcher: 何哲廷
Researcher (English): Che-Ting Ho
Thesis Title: 基於情緒環與深層學習架構之使用者生成影片情緒辨識系統
Thesis Title (English): Emotion Prediction from User-Generated Videos by Emotion Wheel Guided Deep Learning
Advisor: 吳家麟
Committee Members: 胡敏君, 鄭文皇, 陳文進
Oral Defense Date: 2015-07-09
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis Type: Academic thesis
Year of Publication: 2015
Academic Year of Graduation: 103
Language: English
Number of Pages: 24
Chinese Keywords: 深層卷積類神經網路, 情緒辨識, 使用者生成影片
English Keywords: deep convolutional neural network, emotion prediction, user-generated video
Statistics:
  • Cited: 0
  • Views: 243
  • Rating:
  • Downloads: 0
  • Bookmarked: 0
Automatically detecting the emotions people express in a video provides useful information for many applications. Recently, with advances in the Internet, the rise of social media, and the availability of all kinds of video-capable devices, people can easily share the videos they shoot online. Unlike earlier emotion recognition work that focused on facial analysis, such user-generated videos vary greatly in both content and quality, which makes robust recognition considerably harder. To address this problem, our system incorporates deep convolutional neural networks, which have recently achieved remarkable success in many visual recognition competitions, and uses them as feature extractors. We further introduce the emotion wheel to refine the feature extraction pipeline and improve the effectiveness of the CNN features. We evaluate the proposed system on a video dataset collected from YouTube and Flickr, raising recognition accuracy from the previous 46.1% to 54.2%.

Predicting emotions in videos is important for many applications that depend on user reactions. Recently, the growth of web services on the Internet has allowed users to upload and share videos very conveniently. Building a robust system for predicting emotions in such user-generated videos is a challenging problem, due to the diversity of their content and the high-level abstraction of human emotions. Motivated by the success of Convolutional Neural Networks (CNNs) in several visual competitions, they are a promising tool for bridging this affective gap. In this paper, we propose a multimodal framework for predicting emotions in user-generated videos based on CNN-extracted features. The psychological emotion wheel is incorporated to learn better representations than a simple transfer-learning counterpart. We also show through experiments that traditional encoding methods for local features help improve prediction performance. Experiments conducted on a real-world dataset from YouTube and Flickr demonstrate that our proposed framework outperforms previous related work, improving prediction accuracy from 46.1% to 54.2%.
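The multimodal idea in the abstract can be sketched minimally: each modality (visual, audio, motion) yields a per-class score vector, frame-level CNN features are pooled into a video-level descriptor, and modality scores are combined by weighted late fusion. The pooling choice, the fusion weights, and the example scores below are hypothetical illustrations, not values from the thesis:

```python
import numpy as np

def video_descriptor(frame_feats):
    """Average-pool frame-level CNN features into one video-level vector."""
    return np.asarray(frame_feats, dtype=float).mean(axis=0)

def late_fusion(scores_by_modality, weights):
    """Weighted sum of per-modality class-score vectors."""
    return sum(w * np.asarray(s, dtype=float)
               for s, w in zip(scores_by_modality, weights))

# Example: three modalities, eight emotion classes (scores sum to 1 each).
visual = np.array([0.1, 0.6, 0.1, 0.05, 0.05, 0.05, 0.025, 0.025])
audio  = np.array([0.2, 0.3, 0.2, 0.10, 0.10, 0.05, 0.025, 0.025])
motion = np.array([0.1, 0.5, 0.2, 0.05, 0.05, 0.05, 0.025, 0.025])

fused = late_fusion([visual, audio, motion], weights=(0.5, 0.3, 0.2))
predicted_class = int(np.argmax(fused))  # index of the fused top-scoring emotion
```

Because the weights sum to 1 and each modality's scores sum to 1, the fused vector also sums to 1; the final label is simply the arg-max over fused scores.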

致謝 i
中文摘要 ii
Abstract iii
Contents iv
List of Figures vi
List of Tables vii
1 Introduction 1
2 Related Work 3
3 Proposed Method 5
3.0.1 Convolutional neural network models . . . . . . . . . . . . . . . 5
3.0.2 Transfer Learning . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.0.3 Visual CNN features . . . . . . . . . . . . . . . . . . . . . . . . 7
3.0.4 Audio CNN feature . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.0.5 Motion feature . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
4 Experiments 15
4.1 Experiments & Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
4.1.1 Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
4.1.2 Baseline framework . . . . . . . . . . . . . . . . . . . . . . . . 16
4.1.3 CNN features vs. baseline . . . . . . . . . . . . . . . . . . . . . 18
4.1.4 CNN with emotion wheel guided architecture . . . . . . . . . . . 18
4.1.5 Comparison of encoding methods . . . . . . . . . . . . . . . . . 19
4.1.6 Fusion performance . . . . . . . . . . . . . . . . . . . . . . . . 19
5 Conclusion 21
Bibliography 22

