
National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)


Detailed Record

Author: 洪楷鈞
Author (English): Kai-Chun Hung
Title (Chinese): 改良YOLOv8模型之遙測影像目標偵測
Title (English): Improving YOLOv8 Model for Object Detection of Remote Sensing Images
Advisors: 陳永隆、黃馨逸
Advisors (English): Young-Long Chen, Hsin-I Huang
Committee Member: 翁萬德
Oral Defense Date: 2024-07-31
Degree: Master's
Institution: National Taichung University of Science and Technology
Department: Master's Program, Department of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis Type: Academic thesis
Publication Year: 2024
Graduation Academic Year: 112
Language: Chinese
Pages: 63
Keywords: Deep Learning, Convolutional Neural Networks, Object Detection, YOLOv8, Remote Sensing
Statistics:
  • Cited by: 0
  • Views: 30
  • Downloads: 16
  • Bookmarked: 0
Abstract:
In recent years, many researchers in computer vision have proposed efficient object detectors. However, remote sensing images are more complex than ordinary images and exhibit higher similarity between objects, so detectors applied to them directly often perform worse than expected. Moreover, although deep-learning-based detection algorithms can identify the presence and category of objects, most current algorithms exploit only the information inside region proposals and ignore global context. Many recent state-of-the-art detectors also pursue ever-higher accuracy while neglecting the balance between detection accuracy and model size, which limits their use in resource-constrained environments.

To address these issues and better detect targets in remote sensing images, we propose three models based on the fast YOLOv8 detector. The first, YOLOv8n with Bi-directional Feature Pyramid Network (YOLOv8n-Bi), replaces the feature pyramid with a weighted Bi-directional Feature Pyramid Network (BiFPN) that learns the importance of each input. The second builds on YOLOv8n-Bi by adding Transformer blocks, enabling the model to capture long-range dependencies and global contextual information more effectively; through self-attention, it increases its focus on every position in the image, especially under occlusion or complex backgrounds. We call this model YOLOv8n with Transformer and Bi-directional Feature Pyramid Network (YOLOv8n-TFBi). The third builds on YOLOv8n-TFBi by adding a Coordinate Attention (CA) block that strengthens the model's attention to specific positions and thereby improves detection accuracy; we call it YOLOv8n with Transformer and Bi-directional Feature Pyramid Network and Coordinate Attention (YOLOv8n-TFBiCA).

This thesis compares the proposed models with other state-of-the-art object detectors on the public RSOD dataset using three metrics: Mean Average Precision (mAP), number of parameters, and inference time. The experiments show that, with comparable parameter counts and inference speed, our models achieve higher accuracy: relative to YOLOv8n, the proposed YOLOv8n-Bi, YOLOv8n-TFBi, and YOLOv8n-TFBiCA raise the mAP on RSOD from 90.2% to 91.7%, 92.5%, and 94.5%, respectively. They also outperform models such as YOLOv5, YOLOv6, and CA-YOLO in mAP while maintaining competitive parameter counts and inference speeds.
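The record itself contains no code, so the sketches below are illustrative only. First, the weighted fusion the abstract attributes to BiFPN: each input feature map receives a learnable non-negative weight, so the network learns how important each input is. A minimal PyTorch sketch following the "fast normalized fusion" of the EfficientDet paper, not the thesis's actual implementation (the class name, epsilon, and defaults are assumptions):

```python
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    """Fast normalized fusion, as in BiFPN (Tan et al., EfficientDet).

    Hypothetical sketch, not the thesis's code: each input feature map
    gets a learnable non-negative weight, letting the network learn the
    importance of different inputs as the abstract describes.
    """

    def __init__(self, num_inputs: int, eps: float = 1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, inputs: list[torch.Tensor]) -> torch.Tensor:
        # All inputs must already share the same shape (a real BiFPN
        # resizes and projects them first, then convolves the result).
        w = torch.relu(self.weights)      # keep weights non-negative
        w = w / (w.sum() + self.eps)      # normalize so they sum to ~1
        return sum(wi * x for wi, x in zip(w, inputs))
```

In a full BiFPN, each node of the top-down and bottom-up paths fuses two or three inputs this way and follows the fusion with a convolution.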
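Second, the Transformer block added in YOLOv8n-TFBi. The abstract says it captures long-range dependencies and global context via self-attention; a generic way to splice self-attention into a CNN detector is to flatten the spatial grid into a sequence. This is a sketch under assumed defaults, not the thesis's exact block:

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Self-attention over the spatial positions of a CNN feature map.

    Generic sketch: flatten HxW into a sequence, apply multi-head
    self-attention and an MLP with residual connections, then reshape
    back to a feature map. `channels` must be divisible by `num_heads`.
    """

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(channels)
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels * 4),
            nn.GELU(),
            nn.Linear(channels * 4, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)     # (B, H*W, C)
        q = self.norm1(seq)
        attn_out, _ = self.attn(q, q, q)       # every position attends to all others
        seq = seq + attn_out                   # residual over attention
        seq = seq + self.mlp(self.norm2(seq))  # residual over MLP
        return seq.transpose(1, 2).reshape(b, c, h, w)
```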
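Third, the Coordinate Attention block used in YOLOv8n-TFBiCA. Following the CA paper (Hou et al., CVPR 2021), the feature map is pooled along each spatial axis separately, so the resulting attention weights retain precise positional information; the reduction ratio and activation below are assumptions:

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Coordinate Attention (Hou et al., 2021), sketched, not the thesis's code.

    Direction-aware pooling yields one descriptor per row and one per
    column, which are turned into per-row and per-column attention gates.
    """

    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        pool_h = x.mean(dim=3, keepdim=True)                   # (B, C, H, 1)
        pool_w = x.mean(dim=2, keepdim=True).transpose(2, 3)   # (B, C, W, 1)
        # Share one bottleneck over both directions, then split.
        y = self.act(self.bn(self.conv1(torch.cat([pool_h, pool_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                  # (B, C, H, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.transpose(2, 3)))  # (B, C, 1, W)
        return x * a_h * a_w                                   # gate rows and columns
```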
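Finally, the mAP metric used in the evaluation. mAP averages, over classes, the area under the precision-recall curve of confidence-sorted detections. A common all-point-interpolation computation of per-class AP looks like this (a generic sketch, not necessarily the exact protocol used in the thesis):

```python
import numpy as np

def average_precision(recall: np.ndarray, precision: np.ndarray) -> float:
    """All-point-interpolation AP from cumulative precision/recall values.

    `recall` and `precision` come from detections sorted by confidence;
    mAP is this quantity averaged over all classes.
    """
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    # Make precision monotonically non-increasing from right to left.
    p = np.maximum.accumulate(p[::-1])[::-1]
    # Integrate precision over the points where recall changes.
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))
```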
Table of Contents
Abstract (Chinese) i
Abstract iii
Acknowledgments v
Table of Contents vi
List of Tables viii
List of Figures ix
Chapter 1 Introduction 1
Section 1 Research Background and Motivation 1
Section 2 Research Objectives 2
Chapter 2 Related Work 4
Section 1 Convolutional Neural Network (CNN) 4
Section 2 Two-Stage Object Detection 6
Section 3 One-Stage Object Detection 7
Chapter 3 Proposed Methods 9
Section 1 YOLOv8n with Bi-directional Feature Pyramid Network (YOLOv8n-Bi) Model 9
3-1-1 Backbone 10
3-1-2 Neck 12
3-1-3 Head 14
3-1-4 Model Experiments 17
3-1-4-1 RSOD Dataset 17
3-1-4-2 Evaluation Metrics 18
3-1-4-3 Experimental Environment 20
3-1-4-4 Training Parameters 20
3-1-4-5 Experimental Results 20
Section 2 YOLOv8n with Transformer and Bi-directional Feature Pyramid Network (YOLOv8n-TFBi) Model 23
3-2-1 Backbone 24
3-2-1-1 Transformer 27
3-2-2 Neck 29
3-2-3 Head 31
3-2-4 Model Experiments 33
3-2-4-1 RSOD Dataset 33
3-2-4-2 Evaluation Metrics 33
3-2-4-3 Experimental Environment 33
3-2-4-4 Training Parameters 34
3-2-4-5 Experimental Results 34
Section 3 YOLOv8n with Transformer and Bi-directional Feature Pyramid Network and Coordinate Attention (YOLOv8n-TFBiCA) Model 37
3-3-1 Backbone 39
3-3-1-1 Transformer 41
3-3-2 Neck 43
3-3-2-1 Coordinate Attention 45
3-3-3 Head 47
3-3-4 Model Experiments 49
3-3-4-1 RSOD Dataset 49
3-3-4-2 Evaluation Metrics 49
3-3-4-3 Experimental Environment 49
3-3-4-4 Training Parameters 50
3-3-4-5 Experimental Results 50
Chapter 4 Ablation Study 54
Chapter 5 Model Comparison 56
Chapter 6 Conclusion 58
References 59
-------------------------------------------------
List of Tables
Table 1. Details of the RSOD dataset 17
Table 2. Error types 18
Table 3. Experimental environment for the YOLOv8n-Bi model 20
Table 4. Training parameters of the YOLOv8n-Bi model 20
Table 5. Experimental results of the YOLOv8n-Bi model on the RSOD dataset 21
Table 6. Experimental environment for the YOLOv8n-TFBi model 34
Table 7. Training parameters of the YOLOv8n-TFBi model 34
Table 8. Experimental results of the YOLOv8n-TFBi model on the RSOD dataset 35
Table 9. Experimental environment for the YOLOv8n-TFBiCA model 50
Table 10. Training parameters of the YOLOv8n-TFBiCA model 50
Table 11. Experimental results of the YOLOv8n-TFBiCA model on the RSOD dataset 51
Table 12. Model combinations 54
Table 13. Ablation study results 55
Table 14. Comparison with other state-of-the-art object detection models on the RSOD dataset 57
-------------------------------------------------
List of Figures
Figure 1. Illustration of convolution 4
Figure 2. Illustration of a max-pooling layer 5
Figure 3. Illustration of a fully connected layer 5
Figure 4. VGG16 architecture 6
Figure 5. Illustration of two-stage object detection 6
Figure 6. One-stage object detection pipeline 7
Figure 7. YOLOv8n-Bi architecture 10
Figure 8. CBS block used in YOLOv8n-Bi 11
Figure 9. C2f block used in YOLOv8n-Bi 11
Figure 10. SPPF block used in YOLOv8n-Bi 12
Figure 11. Differences between PANet and BiFPN 13
Figure 12. Prediction head of the YOLOv8n-Bi model 14
Figure 13. Illustration of CIoU 16
Figure 14. Examples from the RSOD dataset 17
Figure 15. Bounding-box loss curve of the YOLOv8n-Bi model 21
Figure 16. Classification loss curve of the YOLOv8n-Bi model 22
Figure 17. mAP curve of the YOLOv8n-Bi model 22
Figure 18. Confusion matrix of the YOLOv8n-Bi model 23
Figure 19. YOLOv8n-TFBi architecture 24
Figure 20. CBS block used in the YOLOv8n-TFBi model 25
Figure 21. C2f block used in YOLOv8n-TFBi 26
Figure 22. SPPF block used in YOLOv8n-TFBi 27
Figure 23. Transformer block used in YOLOv8n-TFBi 28
Figure 24. Multi-Head Attention architecture used in YOLOv8n-TFBi 29
Figure 25. BiFPN connection scheme used in the YOLOv8n-TFBi model 30
Figure 26. Prediction head of the YOLOv8n-TFBi model 31
Figure 27. Bounding-box loss curve of the YOLOv8n-TFBi model 35
Figure 28. Classification loss curve of the YOLOv8n-TFBi model 36
Figure 29. mAP curve of the YOLOv8n-TFBi model 36
Figure 30. Confusion matrix of the YOLOv8n-TFBi model 37
Figure 31. YOLOv8n-TFBiCA architecture 38
Figure 32. CBS block used in YOLOv8n-TFBiCA 39
Figure 33. C2f block used in YOLOv8n-TFBiCA 40
Figure 34. SPPF block used in YOLOv8n-TFBiCA 41
Figure 35. Transformer block used in YOLOv8n-TFBiCA 42
Figure 36. Multi-Head Attention architecture used in YOLOv8n-TFBiCA 43
Figure 37. BiFPN connection scheme used in YOLOv8n-TFBiCA 44
Figure 38. CA block of the YOLOv8n-TFBiCA model 46
Figure 39. Prediction head of the YOLOv8n-TFBiCA model 48
Figure 40. Bounding-box loss curve of the YOLOv8n-TFBiCA model 51
Figure 41. Classification loss curve of the YOLOv8n-TFBiCA model 52
Figure 42. mAP curve of the YOLOv8n-TFBiCA model 52
Figure 43. Confusion matrix of the YOLOv8n-TFBiCA model 53