National Digital Library of Theses and Dissertations in Taiwan


Detailed Record

Author: CHEN, LI-TENG (陳立騰)
Title: 基於深度學習與視覺關注度之影像標題生成
Title (English): Image Caption Generation Based on Deep Learning and Visual Attention Model
Advisors: SHEN, DAY-FANN (沈岱範); LIN, GUO-SHIANG (林國祥)
Committee members: LIN, CHUEN-HORNG (林春宏); LIE, WEN-NUNG (賴文能); SHEN, DAY-FANN (沈岱範); LIN, GUO-SHIANG (林國祥)
Oral defense date: 2018-07-18
Degree: Master's
Institution: National Yunlin University of Science and Technology
Department: Department of Electrical Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Document type: Academic thesis
Year of publication: 2018
Graduation academic year: 106 (2017-2018)
Language: Chinese
Number of pages: 95
Keywords (Chinese): 物件檢測; 顯著度檢測; 長短期記憶網路; 影像標題
Keywords (English): object detection; visual attention; long short-term memory; image caption
Usage statistics:
  • Cited by: 1
  • Views: 395
  • Downloads: 9
  • Bookmarked: 1
This thesis develops an image caption generation technique based on deep learning and a visual attention model. The technique consists of several parts: object detection, visual saliency computation, and semantic processing. In the object detection part, the deep learning technique Faster R-CNN is used to detect and recognize objects in the image; with a pre-trained model, the system can detect and recognize 80 object categories. In the visual saliency part, a pre-trained saliency model [9] computes a saliency map of the input image. By combining the detection results with the saliency map, the system locates regions of interest (ROIs) in the image. For each ROI, a network that couples an attention mechanism with a long short-term memory (LSTM) network generates the corresponding caption sentence.
To evaluate system performance, experiments were conducted on the COCO 2014 dataset, which as used here contains 30,000 images in 80 categories. For caption generation, the system achieves higher BLEU scores than the method in [11]. The experimental results show that the captions generated by the proposed system are more detailed.
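The fusion step described above, in which detected object boxes are scored against the saliency map to pick regions of interest, can be sketched as follows. The (x, y, w, h) box format, the toy saliency grid, and the mean-saliency score are illustrative assumptions, not the thesis's exact formulation:

```python
# Rank detected object regions by their mean saliency.
# Sketch only: boxes are (x, y, w, h) and the saliency map is a
# row-major 2D grid of values in [0, 1].

def mean_saliency(saliency, box):
    """Average saliency value inside a bounding box (x, y, w, h)."""
    x, y, w, h = box
    vals = [saliency[r][c] for r in range(y, y + h) for c in range(x, x + w)]
    return sum(vals) / len(vals)

def rank_rois(saliency, boxes):
    """Return boxes sorted from most to least salient."""
    return sorted(boxes, key=lambda b: mean_saliency(saliency, b), reverse=True)

# Toy 4x4 saliency map with a bright 2x2 patch at the top-left.
smap = [
    [0.9, 0.9, 0.1, 0.1],
    [0.9, 0.9, 0.1, 0.1],
    [0.1, 0.1, 0.2, 0.2],
    [0.1, 0.1, 0.2, 0.2],
]
boxes = [(0, 0, 2, 2), (2, 2, 2, 2)]
ranked = rank_rois(smap, boxes)
print(ranked[0])  # the top-left box ranks first: (0, 0, 2, 2)
```

The most salient regions would then be passed, one per ROI, to the attention-LSTM decoder to produce the caption sentence.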

In this thesis, we develop an image caption generation system based on deep learning and a visual attention model. The system is composed of several parts: object detection, saliency computation, and image caption generation. In the object detection part, a deep learning technique, Faster R-CNN, is used to detect and classify objects in images; the pre-trained model covers 80 object categories. In the saliency computation part, the pre-trained model proposed in [8] is used to compute the saliency value of each ROI. Based on the category information and saliency values, the proposed system generates the corresponding image caption.
To evaluate the performance of the proposed system, the COCO 2014 image set, which contains 30,000 images, is used. For image captioning, the BLEU score of the proposed system is higher than that of [11]. Experimental results show that the proposed system is superior to the existing method [11].
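BLEU, the metric used in the comparison above, scores a candidate caption by clipped n-gram precision against reference captions. A minimal unigram (BLEU-1) sketch, omitting the brevity penalty and using made-up captions rather than any from the thesis:

```python
# Clipped unigram precision (BLEU-1 without the brevity penalty):
# each candidate word counts only up to the number of times it
# appears in the reference.
from collections import Counter

def bleu1(candidate, reference):
    """Clipped unigram precision of candidate against one reference."""
    cand = candidate.split()
    ref_counts = Counter(reference.split())
    clipped = sum(min(n, ref_counts[w]) for w, n in Counter(cand).items())
    return clipped / len(cand)

ref = "a man riding a horse on the beach"
print(bleu1("a man riding a horse", ref))  # 1.0: every word is matched
print(bleu1("a dog riding a horse", ref))  # 0.8: "dog" is unmatched
```

The full metric multiplies clipped precisions for n-grams up to length 4 (geometric mean) and applies a brevity penalty so that very short candidates cannot score artificially high.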

Abstract (Chinese) i
Abstract (English) ii
Acknowledgements iii
Table of Contents iv
List of Tables vi
List of Figures vii
Chapter 1 Introduction 1
1.1 Background 1
1.2 Motivation and Objectives 2
1.3 Thesis Organization 3
Chapter 2 Related Work 4
2.1 Convolutional Neural Networks 4
2.2 Object Detection 13
2.3 Visual Saliency Networks 20
2.4 Image Captioning 28
Chapter 3 System Architecture 31
3.1 Visual Understanding 32
3.2 Semantic Processing 33
Chapter 4 Visual Understanding 34
4.1 Object Detection 34
4.2 Visual Saliency Network 37
4.3 Feature Fusion 38
4.4 ROI Ranking 39
Chapter 5 Image Description 40
5.1 Encoder-Decoder Architecture 40
5.2 Encoder 40
5.3 Decoder 41
Chapter 6 Experimental Results and Performance Evaluation 45
6.1 Experimental Environment and Datasets 45
6.2 Evaluation Metrics 48
6.3 Object and Saliency Detection Results 54
6.4 Caption Generation Experiments and Evaluation 58
6.5 Image Content Verification 76
Chapter 7 Conclusion and Future Work 78
7.1 Conclusion 78
7.2 Future Work 79
References 80

[1] http://big5.gov.cn/gate/big5/www.gov.cn/jrzg/2013-05/14/content_2402255.htm
[2] https://udn.com/news/story/7240/2435821
[3] https://www.inside.com.tw/2017/10/26/umbo-computer-vision
[4] Yann LeCun, Leon Bottou, Yoshua Bengio, and Patrick Haffner, "Gradient-Based Learning Applied to Document Recognition," Proceedings of the IEEE, vol. 86, no. 11, pages 2278-2324, Nov. 1998.
[5] Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," Advances in Neural Information Processing Systems (NIPS), pages 1097-1105, 2012.
[6] Karen Simonyan and Andrew Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," International Conference on Learning Representations (ICLR), 2015.
[7] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik, "Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2014.
[8] Ross Girshick, "Fast R-CNN," IEEE International Conference on Computer Vision (ICCV), pages 1440-1448, 2015.
[9] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," Advances in Neural Information Processing Systems (NIPS), 2015.
[10] Qibin Hou, Ming-Ming Cheng, Xiaowei Hu, Ali Borji, Zhuowen Tu, and Philip Torr, "Deeply Supervised Salient Object Detection with Short Connections," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5300-5309, 2017.
[11] Saining Xie and Zhuowen Tu, "Holistically-Nested Edge Detection," IEEE International Conference on Computer Vision (ICCV), pages 1395-1403, 2015.
[12] Kelvin Xu, Jimmy Ba, Ryan Kiros, et al., "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention," International Conference on Machine Learning (ICML), 2015.
[13] Blaine Rister and Dieterich Lawson, "Image Captioning with Attention."
[14] Andrej Karpathy and Li Fei-Fei, "Deep Visual-Semantic Alignments for Generating Image Descriptions," IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol. 39, pages 664-676, 2016.
[15] Zhongliang Yang, Yu-Jin Zhang, Sadaqat ur Rehman, and Yongfeng Huang, "Image Captioning with Object Detection and Localization," International Conference on Image and Graphics, pages 109-118, 2017.
[16] Stas Goferman and Lihi Zelnik-Manor, "Context-Aware Saliency Detection," IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), pages 1915-1926, 2011.
[17] Guo-Shiang Lin and Xian-Wei Ji, "Video Quality Enhancement Based on Visual Attention Model and Multi-level Exposure Correction," Multimedia Tools and Applications, vol. 75, no. 16, pages 9903-9925, 2016.
[18] Chih-Wei Tang, Ching-Ho Chen, Ya-Hui Yu, and Chun-Jen Tsai, "Visual Sensitivity Guided Bit Allocation for Video Coding," IEEE Transactions on Multimedia, vol. 8, no. 1, pages 11-18, Feb. 2006.
[19] Y.-F. Ma and H.-J. Zhang, "A Model of Motion Attention for Video Skimming," Proc. of IEEE International Conference on Image Processing, vol. 1, pages 129-132, Sept. 2002.
[20] http://big5.gov.cn/gate/big5/www.gov.cn/jrzg/2013-05/14/content_2402255.htm
[21] Linghui Li, Sheng Tang, Lixi Deng, Yongdong Zhang, and Qi Tian, "Image Caption with Global-Local Attention," Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, Feb. 2017.
[22] Christopher Elamri and Teun de Planque, "Automated Neural Image Caption Generator for Visually Impaired People."
[23] Bo Wu, et al., "Sequential Prediction of Social Media Popularity with Deep Temporal Context Networks," International Joint Conference on Artificial Intelligence (IJCAI), 2017.
[24] Joseph Redmon, et al., "You Only Look Once: Unified, Real-Time Object Detection," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 779-788, 2016.
[25] Leon A. Gatys, et al., "A Neural Algorithm of Artistic Style," arXiv preprint arXiv:1508.06576, 2015.
[26] Jonathan Long, Evan Shelhamer, and Trevor Darrell, "Fully Convolutional Networks for Semantic Segmentation," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3431-3440, 2015.
[27] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, "Deep Residual Learning for Image Recognition," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770-778, 2016.
[28] Sepp Hochreiter and Jürgen Schmidhuber, "Long Short-Term Memory," Neural Computation, vol. 9, no. 8, pages 1735-1780, 1997.
[29] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick, "Mask R-CNN," IEEE International Conference on Computer Vision (ICCV), pages 2980-2988, 2017.
[30] Ali Choumane, Z. A. A. I., "Friend Recommendation based on Hashtags Analysis," pages 337-350, 2017.
[31] Jia Li, Hua Xu, Xingwei He, Junhui Deng, and Xiaomin Sun, "Tweet Modeling with LSTM Recurrent Neural Networks for Hashtag Recommendation," International Joint Conference on Neural Networks (IJCNN), pages 1570-1577, 2016.
[32] Wenguan Wang and Jianbing Shen, "Deep Visual Attention Prediction," arXiv:1705.02544, 2017.
[33] Xiaodong He and Li Deng, "Deep Learning for Image-to-Text Generation," IEEE Signal Processing Magazine, pages 109-116, 2017.
[34] Girish Kulkarni, Visruth Premraj, Sagnik Dhar, Siming Li, Yejin Choi, Alexander C. Berg, and Tamara L. Berg, "Baby Talk: Understanding and Generating Image Descriptions," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1143-1151, 2011.
