臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)

Detailed Record
Author: 陳昱廷
Author (English): Yu-Ting Chen
Title (Chinese): 在互動顯示上使用捲積網路的視線偵測
Title (English): Gaze Detection Using Convolutional Neural Network for Interactive Displays
Advisor: 洪一平
Advisor (English): Yi-Ping Hung
Oral Examination Committee: 謝俊科, 石勝文, 吳健榕, 陳湘鳳
Oral Defense Date: 2015-07-20
Degree: Master's
Institution: 國立臺灣大學 (National Taiwan University)
Department: 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia)
Discipline: Computer Science
Field: Networking
Document Type: Academic thesis
Publication Year: 2015
Graduation Academic Year: 103
Language: Chinese
Number of Pages: 36
Chinese Keywords: 視線偵測, 捲積神經網路, 互動顯示裝置, 電腦視覺, 人機互動
English Keywords: Gaze Detection, Convolutional Neural Network, Interactive Displays, Computer Vision, Human-Computer Interaction
Usage Statistics:
  • Cited: 0
  • Views: 192
  • Rating:
  • Downloads: 0
  • Bookmarked: 0
Many interactive display devices have appeared in recent years, such as Google Glass, Oculus, and Samsung TV. For large interactive display systems, gaze-based interaction is an efficient and convenient approach. However, most gaze detection systems require intrusive light sources, head-mounted devices, or a fixed head position.
In this thesis, we present a gaze detection method that requires only an RGB-D camera and a high-resolution camera. The core of the method is a recent machine learning technique, the convolutional neural network. We compare the accuracy of three approaches on two well-known network models.
To collect experimental data, we designed an interactive-wall experiment. The final results show that our method achieves an 80% success rate on 36-direction gaze detection. The RGB-D data, however, did not contribute to the accuracy. Even so, our method still achieves good accuracy.
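The abstract frames gaze detection as a 36-way classification problem handled by a convolutional neural network. As a minimal sketch only, assuming PyTorch, 64x64 grayscale eye-region crops, and LeNet-style layer sizes (none of which are specified in this record), such a classifier could be set up as follows:

# Hypothetical LeNet-style CNN for 36-class gaze-direction classification.
# Framework, input size, and layer widths are illustrative assumptions.
import torch
import torch.nn as nn

class GazeLeNet(nn.Module):
    def __init__(self, num_classes: int = 36):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),   # 64 -> 60 -> 30
            nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),  # 30 -> 26 -> 13
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 13 * 13, 120), nn.ReLU(),
            nn.Linear(120, 84), nn.ReLU(),
            nn.Linear(84, num_classes),  # one logit per gaze direction
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

model = GazeLeNet()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# One illustrative training step on a dummy batch of eye-region crops.
images = torch.randn(8, 1, 64, 64)           # batch of 8 grayscale crops
labels = torch.randint(0, 36, (8,))           # gaze-direction labels 0..35
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()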

Many new interactive display devices have appeared recently, such as Google Glass, Oculus, and Samsung TV. For a large interactive display, such as a wall, gaze-based interaction can be more effective and convenient. However, many gaze detection systems require intrusive light sources, wearable devices, or a fixed head pose.
In this thesis, our goal is to study whether head pose information is useful for gaze detection. We propose a method that uses an RGB-D camera for head pose detection and a high-resolution camera for gaze detection. The main idea is to apply a convolutional neural network (CNN) in the training process. We compare the gaze detection accuracy of three approaches on two well-known CNN models for an interactive display.
We conducted an experiment on an interactive wall to collect data for our approach. The results show that our system achieves more than 80% accuracy on 36-label gaze detection. The head pose information provided no significant improvement. Even so, our approach still achieves good accuracy.
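As a rough illustration of the head-pose comparison described above, the following sketch concatenates a CNN feature vector with RGB-D head-pose angles and trains a linear SVM with and without the pose features. scikit-learn, the feature dimensions, and the yaw/pitch/roll representation are illustrative assumptions, not the author's actual pipeline.

# Hypothetical comparison: CNN features alone vs. CNN features fused with head pose.
import numpy as np
from sklearn.svm import LinearSVC

# Dummy data standing in for real recordings (shapes are assumptions).
n_samples = 200
cnn_features = np.random.randn(n_samples, 84)       # penultimate-layer CNN features
head_pose = np.random.randn(n_samples, 3)           # yaw, pitch, roll from the RGB-D camera
labels = np.random.randint(0, 36, size=n_samples)   # 36 gaze-direction labels

# With head pose: concatenate the two feature sets per sample.
fused = np.hstack([cnn_features, head_pose])
clf_fused = LinearSVC(max_iter=10000).fit(fused, labels)

# Without head pose: CNN features alone, for comparison.
clf_cnn = LinearSVC(max_iter=10000).fit(cnn_features, labels)

print("fused accuracy:   ", clf_fused.score(fused, labels))
print("cnn-only accuracy:", clf_cnn.score(cnn_features, labels))

Comparing the two classifiers on held-out data is one way to test whether head pose contributes, mirroring the finding above that it did not.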

Oral Examination Committee Approval i
Chinese Abstract ii
Abstract iii
Contents iv
List of Figures vi
List of Tables viii
1 Introduction 1
2 Related Work 3
2.1 Gaze Detection 3
2.2 Convolutional Neural Network (CNN) 4
3 Convolutional Neural Network (CNN) 5
3.1 LeNet5 5
3.1.1 Architecture 6
3.2 AlexNet 6
3.2.1 ReLU Nonlinearity 7
3.2.2 Multiple GPUs 7
3.2.3 Local Response Normalization 7
3.2.4 Overlapping Pooling 8
3.2.5 Architecture 8
3.3 GoogLeNet 9
3.3.1 Inception 9
3.3.2 Architecture 9
4 Method 12
4.1 Pre-processing 12
4.2 CNN Training 14
4.3 Combining CNN Training with Head Pose Training 14
4.4 Evaluation 15
5 Experiments 16
5.1 Setup 16
5.1.1 Interactive Wall Display 16
5.1.2 Computer Used for Training 18
5.2 Experiment Procedure 18
5.2.1 Laser Calibration 18
5.2.2 Subject Measurement 20
5.2.3 Data Recording 21
5.3 Data Collection 23
5.4 Learning Result 25
5.5 Subject Feedback 28
6 Conclusions and Future Works 29
A Displacement of Head Pose 30
A.1 Calibration State 30
A.2 Horizontal Move State 32
A.3 Vertical Move State 33
Bibliography 34

