National Digital Library of Theses and Dissertations in Taiwan


Detailed Record

Author: CHEN, LI-TENG (陳立騰)
Title: 基於深度學習與視覺關注度之影像標題生成
Title (English): Image Caption Generation Based on Deep Learning and Visual Attention Model
Advisors: SHEN, DAY-FANN (沈岱範); LIN, GUO-SHIANG (林國祥)
Committee members: LIN, CHUEN-HORNG (林春宏); LIE, WEN-NUNG (賴文能); SHEN, DAY-FANN (沈岱範); LIN, GUO-SHIANG (林國祥)
Oral defense date: 2018-07-18
Degree: Master's
Institution: National Yunlin University of Science and Technology
Department: Department of Electrical Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Document type: Academic thesis
Year of publication: 2018
Graduation academic year: 106 (2017-2018)
Language: Chinese
Number of pages: 95
Keywords (Chinese): 物件檢測; 顯著度檢測; 長短期記憶網路; 影像標題
Keywords (English): object detection; visual attention; long short-term memory; image caption
Usage statistics:
  • Cited by: 1
  • Views: 395
  • Downloads: 9
  • Bookmarked: 1
This thesis develops an image caption generation technique based on deep learning and a visual attention model. The technique consists of several parts: object detection, visual saliency computation, and semantic processing. In the object detection part, the deep learning technique Faster R-CNN is used to detect and recognize objects in the image; with a pre-trained model, the system can detect and recognize 80 object categories. In the visual saliency part, a pre-trained saliency model [9] computes a saliency map of the input image. By combining the detection results with the saliency map, the system locates regions of interest (ROIs) in the image. For each ROI, a network that couples an attention mechanism with a long short-term memory (LSTM) network generates the corresponding caption sentence.
To evaluate system performance, experiments were conducted on the COCO 2014 dataset, which as used here contains 30,000 images in 80 categories. For caption generation, the system achieves higher BLEU scores than the method in [11]. The experimental results show that the captions generated by the proposed system are more detailed.
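The fusion step described above, in which detected object boxes are scored against the saliency map to pick regions of interest, can be sketched as follows. The (x, y, w, h) box format, the toy saliency grid, and the mean-saliency score are illustrative assumptions, not the thesis's exact formulation:

```python
# Rank detected object regions by their mean saliency.
# Sketch only: boxes are (x, y, w, h) and the saliency map is a
# row-major 2D grid of values in [0, 1].

def mean_saliency(saliency, box):
    """Average saliency value inside a bounding box (x, y, w, h)."""
    x, y, w, h = box
    vals = [saliency[r][c] for r in range(y, y + h) for c in range(x, x + w)]
    return sum(vals) / len(vals)

def rank_rois(saliency, boxes):
    """Return boxes sorted from most to least salient."""
    return sorted(boxes, key=lambda b: mean_saliency(saliency, b), reverse=True)

# Toy 4x4 saliency map with a bright 2x2 patch at the top-left.
smap = [
    [0.9, 0.9, 0.1, 0.1],
    [0.9, 0.9, 0.1, 0.1],
    [0.1, 0.1, 0.2, 0.2],
    [0.1, 0.1, 0.2, 0.2],
]
boxes = [(0, 0, 2, 2), (2, 2, 2, 2)]
ranked = rank_rois(smap, boxes)
print(ranked[0])  # the top-left box ranks first: (0, 0, 2, 2)
```

The most salient regions would then be passed, one per ROI, to the attention-LSTM decoder to produce the caption sentence.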

In this thesis, we develop an image caption generation system based on deep learning and a visual attention model. The system is composed of several parts: object detection, saliency computation, and image caption generation. In the object detection part, a deep learning technique, Faster R-CNN, is used to detect and classify objects in images; the pre-trained model covers 80 object categories. In the saliency computation part, the pre-trained model proposed in [8] is used to compute the saliency value of each ROI. Based on the category information and saliency values, the proposed system generates the corresponding image caption.
To evaluate the performance of the proposed system, the COCO 2014 image set, which contains 30,000 images, is used. For image captioning, the BLEU score of the proposed system is higher than that of [11]. Experimental results show that the proposed system is superior to the existing method [11].
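BLEU, the metric used in the comparison above, scores a candidate caption by clipped n-gram precision against reference captions. A minimal unigram (BLEU-1) sketch, omitting the brevity penalty and using made-up captions rather than any from the thesis:

```python
# Clipped unigram precision (BLEU-1 without the brevity penalty):
# each candidate word counts only up to the number of times it
# appears in the reference.
from collections import Counter

def bleu1(candidate, reference):
    """Clipped unigram precision of candidate against one reference."""
    cand = candidate.split()
    ref_counts = Counter(reference.split())
    clipped = sum(min(n, ref_counts[w]) for w, n in Counter(cand).items())
    return clipped / len(cand)

ref = "a man riding a horse on the beach"
print(bleu1("a man riding a horse", ref))  # 1.0: every word is matched
print(bleu1("a dog riding a horse", ref))  # 0.8: "dog" is unmatched
```

The full metric multiplies clipped precisions for n-grams up to length 4 (geometric mean) and applies a brevity penalty so that very short candidates cannot score artificially high.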

Abstract (Chinese) i
Abstract (English) ii
Acknowledgements iii
Table of Contents iv
List of Tables vi
List of Figures vii
Chapter 1 Introduction 1
1.1 Background 1
1.2 Motivation and Objectives 2
1.3 Thesis Organization 3
Chapter 2 Related Work 4
2.1 Convolutional Neural Networks 4
2.2 Object Detection 13
2.3 Visual Saliency Networks 20
2.4 Image Captioning 28
Chapter 3 System Architecture 31
3.1 Visual Understanding 32
3.2 Semantic Processing 33
Chapter 4 Visual Understanding 34
4.1 Object Detection 34
4.2 Visual Saliency Network 37
4.3 Feature Fusion 38
4.4 ROI Ranking 39
Chapter 5 Image Description 40
5.1 Encoder-Decoder Architecture 40
5.2 Encoder 40
5.3 Decoder 41
Chapter 6 Experimental Results and Performance Evaluation 45
6.1 Experimental Environment and Datasets 45
6.2 Evaluation Metrics 48
6.3 Object and Saliency Detection Results 54
6.4 Caption Generation Experiments and Evaluation 58
6.5 Image Content Verification 76
Chapter 7 Conclusion and Future Work 78
7.1 Conclusion 78
7.2 Future Work 79
References 80

[1] http://big5.gov.cn/gate/big5/www.gov.cn/jrzg/2013-05/14/content_2402255.htm
[2] https://udn.com/news/story/7240/2435821
[3] https://www.inside.com.tw/2017/10/26/umbo-computer-vision
[4] Yann LeCun, Leon Bottou, Yoshua Bengio, and Patrick Haffner, "Gradient-Based Learning Applied to Document Recognition," Proceedings of the IEEE, vol. 86, no. 11, pages 2278-2324, Nov. 1998.
[5] Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," Advances in Neural Information Processing Systems (NIPS), pages 1097-1105, 2012.
[6] Karen Simonyan and Andrew Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," International Conference on Learning Representations (ICLR), 2015.
[7] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik, "Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2014.
[8] Ross Girshick, "Fast R-CNN," IEEE International Conference on Computer Vision (ICCV), pages 1440-1448, 2015.
[9] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," Advances in Neural Information Processing Systems (NIPS), 2015.
[10] Qibin Hou, Ming-Ming Cheng, Xiaowei Hu, Ali Borji, Zhuowen Tu, and Philip Torr, "Deeply Supervised Salient Object Detection with Short Connections," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5300-5309, 2017.
[11] Saining Xie and Zhuowen Tu, "Holistically-Nested Edge Detection," IEEE International Conference on Computer Vision (ICCV), pages 1395-1403, 2015.
[12] Kelvin Xu, Jimmy Ba, Ryan Kiros, et al., "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention," International Conference on Machine Learning (ICML), 2015.
[13] Blaine Rister and Dieterich Lawson, "Image Captioning with Attention."
[14] Andrej Karpathy and Li Fei-Fei, "Deep Visual-Semantic Alignments for Generating Image Descriptions," IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol. 39, pages 664-676, 2016.
[15] Zhongliang Yang, Yu-Jin Zhang, Sadaqat ur Rehman, and Yongfeng Huang, "Image Captioning with Object Detection and Localization," International Conference on Image and Graphics, pages 109-118, 2017.
[16] Stas Goferman and Lihi Zelnik-Manor, "Context-Aware Saliency Detection," IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), pages 1915-1926, 2011.
[17] Guo-Shiang Lin and Xian-Wei Ji, "Video Quality Enhancement Based on Visual Attention Model and Multi-level Exposure Correction," Multimedia Tools and Applications, vol. 75, no. 16, pages 9903-9925, 2016.
[18] Chih-Wei Tang, Ching-Ho Chen, Ya-Hui Yu, and Chun-Jen Tsai, "Visual Sensitivity Guided Bit Allocation for Video Coding," IEEE Transactions on Multimedia, vol. 8, no. 1, pages 11-18, Feb. 2006.
[19] Y.-F. Ma and H.-J. Zhang, "A Model of Motion Attention for Video Skimming," Proc. of IEEE International Conference on Image Processing, vol. 1, pages 129-132, Sept. 2002.
[20] http://big5.gov.cn/gate/big5/www.gov.cn/jrzg/2013-05/14/content_2402255.htm
[21] Linghui Li, Sheng Tang, Lixi Deng, Yongdong Zhang, and Qi Tian, "Image Caption with Global-Local Attention," Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, Feb. 2017.
[22] Christopher Elamri and Teun de Planque, "Automated Neural Image Caption Generator for Visually Impaired People."
[23] Bo Wu, et al., "Sequential Prediction of Social Media Popularity with Deep Temporal Context Networks," International Joint Conference on Artificial Intelligence (IJCAI), 2017.
[24] Joseph Redmon, et al., "You Only Look Once: Unified, Real-Time Object Detection," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 779-788, 2016.
[25] Leon A. Gatys, et al., "A Neural Algorithm of Artistic Style," arXiv preprint arXiv:1508.06576, 2015.
[26] Jonathan Long, Evan Shelhamer, and Trevor Darrell, "Fully Convolutional Networks for Semantic Segmentation," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3431-3440, 2015.
[27] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, "Deep Residual Learning for Image Recognition," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770-778, 2016.
[28] Sepp Hochreiter and Jürgen Schmidhuber, "Long Short-Term Memory," Neural Computation, vol. 9, no. 8, pages 1735-1780, 1997.
[29] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick, "Mask R-CNN," IEEE International Conference on Computer Vision (ICCV), pages 2980-2988, 2017.
[30] Ali Choumane, Z. A. A. I., "Friend Recommendation based on Hashtags Analysis," pages 337-350, 2017.
[31] Jia Li, Hua Xu, Xingwei He, Junhui Deng, and Xiaomin Sun, "Tweet Modeling with LSTM Recurrent Neural Networks for Hashtag Recommendation," International Joint Conference on Neural Networks (IJCNN), pages 1570-1577, 2016.
[32] Wenguan Wang and Jianbing Shen, "Deep Visual Attention Prediction," arXiv:1705.02544, 2017.
[33] Xiaodong He and Li Deng, "Deep Learning for Image-to-Text Generation," IEEE Signal Processing Magazine, pages 109-116, 2017.
[34] Girish Kulkarni, Visruth Premraj, Sagnik Dhar, Siming Li, Yejin Choi, Alexander C. Berg, and Tamara L. Berg, "Baby Talk: Understanding and Generating Image Descriptions," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1143-1151, 2011.
