
National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)


Detailed Record

Author: 林修民
Author (English): LIN, HSIU-MING
Title: 應用於醫學影像之自監督式輕量化視覺轉換器的語義分割系統
Title (English): A Study on Light-weight Semantic Segmentation Based on Vision Transformer and Self-Supervised Learning for Medical Images
Advisors: 張傳旺、林國祥
Advisors (English): CHANG, CHUAN-WANG; LIN, GUO-SHIANG
Committee Members: 楊勝智、江振國、黃敬群
Committee Members (English): YANG, SHENG-CHIH; CHIANG, CHEN-KUO; HUANG, CHING-CHUN
Oral Defense Date: 2023-07-28
Degree: Master's
Institution: 國立勤益科技大學 (National Chin-Yi University of Technology)
Department: 資訊工程系 (Department of Computer Science and Information Engineering)
Discipline: Engineering
Field: Electrical and Computer Engineering
Document Type: Academic thesis
Publication Year: 2023
Graduating Academic Year: 112
Language: Chinese
Pages: 88
Keywords (Chinese): 視覺轉換器、輕量化模型、自監督式學習、醫學影像分割
Keywords (English): Vision Transformer; lightweight models; self-supervised learning; medical image segmentation
Usage statistics:
  • Citations: 0
  • Views: 48
  • Downloads: 0
  • Bookmarks: 0
This study proposes a self-supervised learning framework based on multi-task learning, together with a lightweight semantic segmentation model based on the vision transformer. The new self-supervised framework consists of two parts: the upper branch can incorporate existing self-supervised contrastive methods, while the lower branch performs a self-supervised reconstruction task built on an auto-encoder; the combination achieves good results in our experiments. To reduce the parameter count and computational cost of the vision transformer while preserving the same accuracy, this study also develops a lightweight version of the vision transformer.
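The record contains no source code, so purely as an illustration of the two-branch multi-task objective described above, the sketch below pairs a SimCLR-style contrastive loss [27] with an auto-encoder reconstruction loss on a shared encoder. It is a minimal, hypothetical example: the class and function names are invented, and the toy CNN stands in for the thesis's lightweight vision transformer encoder.

```python
# Hypothetical sketch (not the thesis code) of a multi-task self-supervised
# objective: a contrastive branch plus an auto-encoder reconstruction branch
# that share one encoder, as described in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchSSL(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        # Stand-in CNN encoder; the thesis uses its lightweight ViT here.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Contrastive projection head (upper branch).
        self.proj = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(64, dim))
        # Auto-encoder reconstruction head (lower branch).
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):
        h = self.encoder(x)
        return self.proj(h), self.decoder(h)

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent contrastive loss over two augmented views (as in SimCLR [27])."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2]), dim=1)
    sim = z @ z.t() / tau                       # pairwise cosine similarities
    sim = sim.masked_fill(torch.eye(2 * n, dtype=torch.bool), float('-inf'))
    # The positive for sample i is the same image's other view.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)])
    return F.cross_entropy(sim, targets)

model = TwoBranchSSL()
x1, x2 = torch.rand(8, 1, 64, 64), torch.rand(8, 1, 64, 64)  # two views
(z1, r1), (z2, _) = model(x1), model(x2)
loss = nt_xent(z1, z2) + F.mse_loss(r1, x1)  # contrastive + reconstruction
loss.backward()
```

Summing the two losses is the simplest multi-task combination; the thesis may weight the two branches differently.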
To evaluate the performance of the proposed method, a brain hemorrhage dataset provided by a collaborating hospital was used. Dice coefficient, pixel accuracy, and mIoU were adopted as the evaluation metrics for the segmentation results. Compared with the existing methods BT-U-Net and Coash Mask, the proposed method delivers good performance on every evaluation metric. The experimental results show that the proposed method handles medical images well.
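The three reported metrics are standard functions of the per-pixel confusion matrix. As a reference point only (the helper names below are hypothetical, not taken from the thesis), a NumPy sketch:

```python
# Hypothetical NumPy sketch of the evaluation metrics named above
# (pixel accuracy, Dice, mIoU), all derived from one confusion matrix.
import numpy as np

def confusion_matrix(pred, gt, num_classes):
    """num_classes x num_classes counts; rows = ground truth, cols = prediction."""
    idx = gt.astype(int) * num_classes + pred.astype(int)
    return np.bincount(idx.ravel(), minlength=num_classes ** 2).reshape(num_classes, num_classes)

def pixel_accuracy(cm):
    return np.diag(cm).sum() / cm.sum()

def per_class_iou(cm):
    tp = np.diag(cm)
    return tp / (cm.sum(0) + cm.sum(1) - tp)   # TP / (TP + FP + FN)

def mean_iou(cm):
    return np.nanmean(per_class_iou(cm))        # NaN guards absent classes

def dice(cm):
    tp = np.diag(cm)
    return 2 * tp / (cm.sum(0) + cm.sum(1))    # 2TP / (2TP + FP + FN)

pred = np.random.randint(0, 2, (4, 64, 64))    # toy binary masks
gt = np.random.randint(0, 2, (4, 64, 64))
cm = confusion_matrix(pred, gt, num_classes=2)
print(pixel_accuracy(cm), mean_iou(cm), dice(cm))
```

For binary masks, Dice on the foreground class equals the F1-score, which is why the table of contents lists them together.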

Abstract (Chinese) i
Abstract (English) ii
Acknowledgements iii
Table of Contents iv
List of Figures vi
List of Tables iv
Chapter 1 Introduction 1
1.1 Research Background 1
1.2 Research Motivation 3
1.3 Thesis Organization 5
Chapter 2 Literature Review 6
2.1 Light-weight Convolutional Neural Networks (Light-weight CNN) 6
2.1.1 MobileNet 8
2.1.2 Cross Stage Partial Network (CSPNet) 9
2.1.3 EfficientNet 11
2.2 CNN-based Semantic Segmentation Networks 12
2.2.1 U-Net 13
2.2.2 DeepLab 14
2.3 Vision Transformers (ViTs) 16
2.3.1 Self-attention Mechanism 17
2.3.2 Light-weight ViT 20
2.4 Self-Supervised Learning (SSL) 21
2.4.1 Contrastive SSL 24
2.5 Multi-task Learning (MTL) [37] 33
Chapter 3 System Design and Methods 35
3.1 Encoder 37
3.1.1 Inception Mixer Block 43
3.2 Decoder 46
3.3 Segmentation Head 50
Chapter 4 Self-Supervised Learning Framework 52
4.1 Pretext Task 53
4.2 Downstream Task 57
Chapter 5 Experimental Design and Results 58
5.1 Experimental Platform and Dataset 58
5.1.1 Experimental Platform 58
5.1.2 Dataset 59
5.1.3 Experimental Parameters 60
5.2 Evaluation Metrics 63
5.2.1 Confusion Matrix 63
5.2.2 Specificity (True Negative Rate) 65
5.2.3 Recall / Sensitivity (True Positive Rate) 65
5.2.4 Precision 65
5.2.5 Pixel Accuracy 65
5.2.6 Dice Coefficient / F1-score 66
5.2.7 Intersection over Union (IoU) 66
5.2.8 Mean Intersection over Union (mIoU) 66
5.3 Ablation Study 67
5.4 Visual Quality Evaluation 69
5.5 Objective Evaluation 74
Chapter 6 Conclusion and Discussion 82
References 84
[1] A. Vaswani et al., "Attention is all you need," in Advances in Neural Information Processing Systems, vol. 30, 2017.
[2] A. Dosovitskiy et al., "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale," arXiv:2010.11929, 2020.
[3] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, "End-to-End Object Detection with Transformers," arXiv:2005.12872, 2020.
[4] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," in Advances in Neural Information Processing Systems, vol. 28, 2015.
[5] E. Xie, W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez, and P. Luo, "SegFormer: Simple and efficient design for semantic segmentation with transformers," Advances in Neural Information Processing Systems, vol. 34, pp. 12077-12090, 2021.
[6] A. Howard et al., "Searching for MobileNetV3," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 1314-1324.
[7] X. Zhang, X. Zhou, M. Lin, and J. Sun, "ShuffleNet: An extremely efficient convolutional neural network for mobile devices," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 6848-6856.
[8] F. Iandola, M. Moskewicz, S. Karayev, R. Girshick, T. Darrell, and K. Keutzer, "DenseNet: Implementing efficient ConvNet descriptor pyramids," arXiv:1404.1869, 2014.
[9] C.-Y. Wang, H.-Y. M. Liao, Y.-H. Wu, P.-Y. Chen, J.-W. Hsieh, and I.-H. Yeh, "CSPNet: A new backbone that can enhance learning capability of CNN," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020, pp. 390-391.
[10] T. Elsken, J. H. Metzen, and F. Hutter, "Neural architecture search: A survey," Journal of Machine Learning Research, vol. 20, no. 1, pp. 1997-2017, 2019.
[11] M. Tan et al., "MnasNet: Platform-aware neural architecture search for mobile," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 2820-2828.
[12] M. Tan, R. Pang, and Q. V. Le, "EfficientDet: Scalable and efficient object detection," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 10781-10790.
[13] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770-778.
[14] S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He, "Aggregated residual transformations for deep neural networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1492-1500.
[15] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015), Part III, Springer, 2015, pp. 234-241.
[16] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431-3440.
[17] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, "Encoder-decoder with atrous separable convolution for semantic image segmentation," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 801-818.
[18] Z. Liu et al., "Swin Transformer: Hierarchical vision transformer using shifted windows," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 10012-10022.
[19] J. Fang, L. Xie, X. Wang, X. Zhang, W. Liu, and Q. Tian, "MSG-Transformer: Exchanging local spatial information by manipulating messenger tokens," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 12063-12072.
[20] H. Wang, Y. Zhu, B. Green, H. Adam, A. Yuille, and L.-C. Chen, "Axial-DeepLab: Stand-alone axial-attention for panoptic segmentation," in European Conference on Computer Vision, Springer, 2020, pp. 108-126.
[21] X. Dong et al., "CSWin Transformer: A general vision transformer backbone with cross-shaped windows," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 12124-12134.
[22] S. Wu, T. Wu, H. Tan, and G. Guo, "Pale Transformer: A general vision transformer backbone with pale-shaped attention," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 3, 2022, pp. 2731-2739.
[23] F. Lin, Y. Ma, S. Wu, L. Yu, and S. Tian, "AxWin Transformer: A Context-Aware Vision Transformer Backbone with Axial Windows," arXiv:2305.01280, 2023.
[24] N. Alam, S. Kolawole, S. Sethi, N. Bansali, and K. Nguyen, "Vision Transformers for Mobile Applications: A Short Survey," arXiv:2305.19365, 2023.
[25] X. Liu, H. Peng, N. Zheng, Y. Yang, H. Hu, and Y. Yuan, "EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 14420-14430.
[26] X. Liu et al., "Self-supervised learning: Generative or contrastive," IEEE Transactions on Knowledge and Data Engineering, vol. 35, no. 1, pp. 857-876, 2021.
[27] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, "A simple framework for contrastive learning of visual representations," in International Conference on Machine Learning, PMLR, 2020, pp. 1597-1607.
[28] K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick, "Momentum contrast for unsupervised visual representation learning," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 9729-9738.
[29] J.-B. Grill et al., "Bootstrap your own latent: A new approach to self-supervised learning," Advances in Neural Information Processing Systems, vol. 33, pp. 21271-21284, 2020.
[30] X. Chen and K. He, "Exploring simple Siamese representation learning," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 15750-15758.
[31] G. Li, R. Togo, T. Ogawa, and M. Haseyama, "TriBYOL: Triplet BYOL for self-supervised representation learning," in ICASSP 2022 - IEEE International Conference on Acoustics, Speech and Signal Processing, 2022, pp. 3458-3462.
[32] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504-507, 2006.
[33] Z. Xie et al., "SimMIM: A simple framework for masked image modeling," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 9653-9663.
[34] I. Goodfellow et al., "Generative adversarial networks," Communications of the ACM, vol. 63, no. 11, pp. 139-144, 2020.
[35] H. Poormohammadi, C. Eslahchi, and R. Tusserkani, "TripNet: A Heuristic Algorithm for Constructing Rooted Phylogenetic Networks from Triplets," arXiv:1201.3722, 2012.
[36] N. S. Punn and S. Agarwal, "BT-U-Net: A self-supervised learning framework for biomedical image segmentation using Barlow Twins with U-Net models," Machine Learning, pp. 1-16, 2022.
[37] X. Xu, H. Zhao, V. Vineet, S.-N. Lim, and A. Torralba, "MTFormer: Multi-task Learning via Transformer and Cross-Task Reasoning," in Computer Vision - ECCV 2022, Part XXVII, Springer, 2022, pp. 304-321.
[38] Y. Chen et al., "Drop an octave: Reducing spatial redundancy in convolutional neural networks with octave convolution," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 3435-3444.
[39] C. Si, W. Yu, P. Zhou, Y. Zhou, X. Wang, and S. Yan, "Inception Transformer," arXiv:2205.12956, 2022.
[40] D. Hendrycks and K. Gimpel, "Gaussian Error Linear Units (GELUs)," arXiv:1606.08415, 2016.
[41] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature pyramid networks for object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2117-2125.
[42] S. Liu, L. Qi, H. Qin, J. Shi, and J. Jia, "Path aggregation network for instance segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 8759-8768.
[43] I. Loshchilov and F. Hutter, "Decoupled weight decay regularization," arXiv:1711.05101, 2017.
[44] I. Loshchilov and F. Hutter, "SGDR: Stochastic gradient descent with warm restarts," arXiv:1608.03983, 2016.
[45] S. Singh et al., "Self-Supervised Feature Learning for Semantic Segmentation of Overhead Imagery," in BMVC, vol. 1, no. 2, 2018, p. 4.
Electronic full text (publicly available online from 2029-01-16)