
National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)


Detailed Record

Author: 周佳勲
Author (English): CHOU, CHIA-HSUN
Thesis Title: 利用YOLOX架構推定視線落點 (Estimating Gaze Points Using the YOLOX Architecture)
Thesis Title (English): Appearance-based Gaze Estimation Using YOLOX
Advisor: 蘇志文
Advisor (English): SU, CHIH-WEN
Committee Members: 蘇志文, 朱守禮, 林學億
Committee Members (English): SU, CHIH-WEN; CHU, SLO-LI; LIN, HSUEH-YI
Oral Defense Date: 2022-07-21
Degree: Master's
Institution: 中原大學 (Chung Yuan Christian University)
Department: 資訊工程學系 (Information and Computer Engineering)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis Type: Academic thesis
Publication Year: 2022
Graduation Academic Year: 110
Language: Chinese
Number of Pages: 50
Keywords (Chinese): 深度學習, 卷積神經網路, 視線推定
Keywords (English): Deep learning, Convolutional neural network, Gaze estimation
DOI: 10.6840/cycu202201409
Statistics:
  • Cited by: 0
  • Views: 228
  • Rating: (none)
  • Downloads: 32
  • Bookmarked: 0
Abstract (Chinese, translated): Gaze estimation research falls mainly into geometry-based methods and appearance-based methods. Geometry-based methods estimate gaze from key eye features such as the pupil, iris, and eye corners, but they require stable lighting and high-resolution, low-noise images, which makes them hard to apply in everyday settings whose environments are far more complex than a controlled laboratory. With the rise of deep learning in recent years, appearance-based methods have attracted increasing attention; they map images directly to a model that predicts gaze, and because they are trained on large numbers of images, they are more robust under complex lighting, low resolution, heavy noise, and varying head poses. However, most current appearance-based methods must first detect a single-eye, both-eye, or full-face region and then estimate gaze from that region image. This thesis uses the anchor-free, decoupled-head YOLOX architecture and its single-stage detection property to estimate the gaze point position directly, and experiments on several recent public datasets yield very good results.
Abstract (English): In the field of gaze estimation, previous research can be divided into geometry-based methods and appearance-based methods. Geometry-based methods use the geometric positions of eye components, such as the pupil, iris, and eye corners, to estimate gaze. Their disadvantage is that they require a stable light source and high-resolution, low-noise images, which makes them difficult to apply in daily life because real environments are far more complex than controlled ones. In recent years, with the rise of deep learning, appearance-based methods have taken real images as input and trained on large numbers of images to predict gaze robustly, providing better stability under complex light sources, low resolution, high noise, and different head poses. Nevertheless, most current methods require detecting a single-eye, both-eye, or full-face region before gaze estimation. This thesis uses the decoupled-head YOLOX to estimate the gaze position directly within its single-stage architecture. Experiments were conducted on several public datasets, and excellent results were obtained.
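As a concrete illustration of the approach summarized above, the following is a minimal PyTorch sketch of one way a YOLOX-style decoupled detection head could be extended with a gaze-point regression output, so that the gaze point is predicted directly alongside the face bounding box in a single stage. This sketch is an assumption for illustration only, not the thesis author's implementation; all module names, channel sizes, and the (x, y, w, h) box parameterization are hypothetical.

```python
# Illustrative sketch (not the thesis code): a YOLOX-style decoupled head with an
# extra 2-channel output that regresses a gaze point per grid cell, in addition to
# the usual class, box, and objectness outputs. All sizes and names are assumptions.
import torch
import torch.nn as nn

class DecoupledGazeHead(nn.Module):
    def __init__(self, in_channels: int = 256, num_classes: int = 1):
        super().__init__()
        # Shared stem, then separate classification and regression branches
        # (the "decoupled head" pattern used by YOLOX).
        self.stem = nn.Sequential(nn.Conv2d(in_channels, in_channels, 1), nn.SiLU())
        self.cls_branch = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1), nn.SiLU(),
            nn.Conv2d(in_channels, num_classes, 1),          # class scores
        )
        self.reg_branch = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1), nn.SiLU(),
        )
        self.box_pred = nn.Conv2d(in_channels, 4, 1)   # face box (x, y, w, h)
        self.obj_pred = nn.Conv2d(in_channels, 1, 1)   # objectness score
        self.gaze_pred = nn.Conv2d(in_channels, 2, 1)  # added branch: 2D gaze point

    def forward(self, feat: torch.Tensor):
        x = self.stem(feat)
        cls_out = self.cls_branch(x)
        reg_feat = self.reg_branch(x)
        return (cls_out, self.box_pred(reg_feat),
                self.obj_pred(reg_feat), self.gaze_pred(reg_feat))

# Example: one FPN feature map with spatial size 20x20.
head = DecoupledGazeHead()
cls_out, box_out, obj_out, gaze_out = head(torch.randn(1, 256, 20, 20))
print(gaze_out.shape)  # torch.Size([1, 2, 20, 20]): one 2D gaze-point prediction per cell
```

In such a design the gaze point is learned as two extra regression targets next to the face box, which matches the idea (Figure 3-12) of augmenting the original bounding box with a gaze point.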
Table of Contents
Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1: Introduction
Chapter 2: Related Work
2-1 Geometry-based Methods
2-1-1 Eye Model Recovery-based Methods
2-1-2 Eye Feature Regression-based Methods
2-2 Appearance-based Methods
2-2-1 Single-eye Gaze Estimation
2-2-2 Two-eye Gaze Estimation
2-2-3 Full-face Gaze Estimation
Chapter 3: Methodology
3-1 The YOLO Family of Architectures
3-2 Gaze Estimation with the Decoupled-head YOLOX Architecture
3-3 Gaze Point / Gaze Vector Conversion
Chapter 4: Experimental Results and Analysis
4-1 Experimental Environment
4-2 Experimental Data
4-3 Experimental Results
Chapter 5: Conclusions and Future Directions
References

List of Figures
Figure 2-1: Illustration of 2D gaze point estimation [2]
Figure 2-2: Illustration of 3D gaze vector estimation [3]
Figure 2-3: Mapping facial features from the camera coordinate system to a 3D coordinate system [4]
Figure 2-4: Eyeball cross-section used by Chen et al. for gaze estimation [4]
Figure 2-5: Bright-pupil method (left) and dark-pupil method (right) [5]
Figure 2-6: Illustration of Purkinje images [6]
Figure 2-7: Experimental setup of Wang et al. [7]
Figure 2-8: Gaze estimation pipeline proposed by Wang et al. [7]
Figure 2-9: Mapping between the screen calibration-point coordinate system and pupil coordinates [9]
Figure 2-10: Images from the MPIIGaze dataset by Zhang et al. [11]
Figure 2-11: Collection environment of the MPIIGaze dataset by Zhang et al. [11]
Figure 2-12: Gaze estimation pipeline proposed by Zhang et al. [11]
Figure 2-13: Mapping 2D facial landmark coordinates to a 3D coordinate system
Figure 2-14: Obtaining the transformation matrix between the camera and head coordinate systems and normalizing the image
Figure 2-15: LeNet-like [15] network architecture used by Zhang et al.
Figure 2-16: Network architecture with both eyes as input proposed by Cheng et al. [16]
Figure 2-17: GazeCapture dataset proposed by Krafka et al. [2]
Figure 2-18: iTracker network architecture proposed by Krafka et al. [2]
Figure 2-19: AlexNet-based network architecture proposed by Zhang et al. [18]
Figure 2-20: AFF-Net architecture proposed by Bao et al. [19]
Figure 3-1: Training and detection pipeline of the proposed method
Figure 3-2: R-CNN model proposed by Girshick et al. in 2014 [20]
Figure 3-3: Single-stage object detection method proposed by Redmon et al. [21]
Figure 3-4: Darknet-19 architecture proposed by Redmon et al. [21]
Figure 3-5: Anchor-box concept adopted by Redmon et al. from Faster R-CNN [21]
Figure 3-6: Darknet-53 network architecture proposed by Redmon et al.
Figure 3-7: Illustration of the Feature Pyramid Network [24]
Figure 3-8: Network components within an object detection model [26]
Figure 3-9: Coupled head of the YOLO series versus the decoupled head of YOLOX [1]
Figure 3-10: Experimental comparison between the YOLO and YOLOX architectures [1]
Figure 3-11: YOLOX network architecture
Figure 3-12: Original bounding box (left) and bounding box with an added gaze point (right)
Figure 3-13: Two ways of modifying the head network
Figure 3-14: Architecture of the proposed anchor-free single-stage detection model for gaze estimation
Figure 3-15: Converting a 2D gaze point into a 3D gaze vector
Figure 3-16: Obtaining the face center position via facial landmark detection
Figure 4-1: Obtaining eye-region bounding boxes from the eye-corner landmarks of both eyes
Figure 4-2: Collection environment of the GazeCapture dataset [2]
Figure 4-3: Mapping gaze points from screen coordinates to a camera-centered coordinate system, as done by Krafka et al.
Figure 4-4: Mean Euclidean distance error for each participant on the MPIIFaceGaze dataset
Figure 4-5: Participant results on MPIIFaceGaze converted to mean angular error
Figure 4-6: Results for participants 01–05 under different head poses, scales, and lighting
Figure 4-7: Results for participants 06–10 under different head poses, scales, and lighting
Figure 4-8: Results for participants 11–15 under different head poses, scales, and lighting
Figure 4-9: Results for participants 01–05 with occluded eye features
Figure 4-10: Results for participants 06–10 with occluded eye features
Figure 4-11: Results for participants 11–15 with occluded eye features


List of Tables

Table 4-1: Gaze estimation datasets from recent years
Table 4-2: Collection environments and data volume for each MPIIFaceGaze participant
Table 4-3: Mean gaze estimation error of recent methods on the MPIIFaceGaze dataset
Table 4-4: Results of this thesis on GazeCapture
Table 4-5: Results of different CNN methods on GazeCapture


References
[1] Z. Ge, S. Liu, F. Wang, Z. Li, and J. Sun, "YOLOX: Exceeding YOLO series in 2021," arXiv preprint arXiv:2107.08430, 2021.
[2] K. Krafka et al., "Eye tracking for everyone," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2176–2184.
[3] X. Zhang, S. Park, T. Beeler, D. Bradley, S. Tang, and O. Hilliges, "ETH-XGaze: A large scale dataset for gaze estimation under extreme head pose and gaze variation," in Proc. European Conf. on Computer Vision (ECCV), 2020.
[4] J. Chen and Q. Ji, "3D gaze estimation with a single camera without IR illumination," in Proc. Int. Conf. on Pattern Recognition (ICPR), 2008, pp. 1–4.
[5] C. Morimoto, D. Koons, A. Amir, and M. Flickner, "Pupil detection and tracking using multiple light sources," Image and Vision Computing, 2000.
[6] C. K. Sheehy, A. Beaudry-Richard, E. Bensinger, J. Theis, and A. J. Green, "Methods to assess ocular motor dysfunction in multiple sclerosis," Journal of Neuro-Ophthalmology, 2018.
[7] J. Wang, G. Zhang, and J. Shi, "2D gaze estimation based on pupil-glint vector using an artificial neural network," Applied Sciences, 2016.
[8] N. Otsu, "A threshold selection method from gray-level histograms," IEEE Transactions on Systems, Man, and Cybernetics, vol. 9, no. 1, pp. 62–66, 1979.
[9] J. W. Lee, H. Heo, and K. R. Park, "A novel gaze tracking method based on the generation of virtual calibration points," Sensors, vol. 13, no. 8, pp. 10802–10822, 2013.
[10] K.-H. Tan, D. Kriegman, and N. Ahuja, "Appearance-based eye gaze estimation," in Proc. IEEE Workshop on Applications of Computer Vision, 2002, pp. 191–195.
[11] X. Zhang, Y. Sugano, M. Fritz, and A. Bulling, "Appearance-based gaze estimation in the wild," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 4511–4520.
[12] J. Li and Y. Zhang, "Learning SURF cascade for fast and accurate object detection," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2013, pp. 3468–3475.
[13] T. Baltrušaitis, P. Robinson, and L.-P. Morency, "Continuous conditional neural fields for structured regression," in Proc. European Conf. on Computer Vision (ECCV), 2014, pp. 593–608.
[14] V. Lepetit, F. Moreno-Noguer, and P. Fua, "EPnP: An accurate O(n) solution to the PnP problem," International Journal of Computer Vision, vol. 81, 2009.
[15] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, 1998, pp. 2278–2324.
[16] Y. Cheng, F. Lu, and X. Zhang, "Appearance-based gaze estimation via evaluation-guided asymmetric regression," in Proc. European Conf. on Computer Vision (ECCV), 2018, pp. 100–115.
[17] A. Krizhevsky, I. Sutskever, and G. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems (NIPS), 2012.
[18] X. Zhang, Y. Sugano, M. Fritz, and A. Bulling, "It's written all over your face: Full-face appearance-based gaze estimation," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition Workshops (CVPRW), 2017, pp. 2299–2308.
[19] Y. Bao, Y. Cheng, Y. Liu, and F. Lu, "Adaptive feature fusion network for gaze tracking in mobile tablets," in Proc. Int. Conf. on Pattern Recognition (ICPR), 2020.
[20] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 580–587.
[21] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 779–788.
[22] J. Redmon and A. Farhadi, "YOLO9000: Better, faster, stronger," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 6517–6525.
[23] J. Redmon and A. Farhadi, "YOLOv3: An incremental improvement," arXiv preprint arXiv:1804.02767, 2018.
[24] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature pyramid networks for object detection," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 936–944.
[25] S. Liu, L. Qi, H. Qin, J. Shi, and J. Jia, "Path aggregation network for instance segmentation," in Proc. IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 8759–8768.
[26] C.-Y. Wang, A. Bochkovskiy, and H.-Y. M. Liao, "Scaled-YOLOv4: Scaling cross stage partial network," arXiv preprint arXiv:2011.08036, 2020.
[27] B. A. Smith, Q. Yin, S. K. Feiner, and S. K. Nayar, "Gaze locking: Passive eye contact detection for human-object interaction," in Proc. 26th Annual ACM Symposium on User Interface Software and Technology (UIST), 2013, pp. 271–280.
[28] Y. Sugano, Y. Matsushita, and Y. Sato, "Learning-by-synthesis for appearance-based 3D gaze estimation," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2014.
[29] K. A. Funes Mora, F. Monay, and J.-M. Odobez, "EYEDIAP: A database for the development and evaluation of gaze estimation algorithms from RGB and RGB-D cameras," in Proc. ACM Symposium on Eye Tracking Research & Applications (ETRA), 2014.
[30] Q. Huang, A. Veeraraghavan, and A. Sabharwal, "TabletGaze: Dataset and analysis for unconstrained appearance-based gaze estimation in mobile tablets," Machine Vision and Applications, vol. 28, no. 5–6, pp. 445–461, 2017.
[31] T. Fischer, H. J. Chang, and Y. Demiris, "RT-GENE: Real-time eye gaze estimation in natural environments," in Proc. European Conf. on Computer Vision (ECCV), 2018.
[32] J. He, K. Pham, N. Valliappan, P. Xu, C. Roberts, D. Lagun, and V. Navalpakkam, "On-device few-shot personalization for real-time gaze estimation," in Proc. IEEE Int. Conf. on Computer Vision Workshops (ICCVW), 2019.
[33] T. Guo, Y. Liu, H. Zhang, X. Liu, Y. Kwak, B. In Yoo, J.-J. Han, and C. Choi, "A generalized and robust method towards practical gaze estimation on smart phone," in Proc. IEEE Int. Conf. on Computer Vision Workshops (ICCVW), 2019.
