National Digital Library of Theses and Dissertations in Taiwan
Detailed Record

我願授權國圖
: 
twitterline
Author: CHENG, CHING-TA (鄭敬達)
Title: 基於Transformer架構在少數標註醫學影像疾病辨識之應用：以青光眼為例
Title (English): Application of Transformer Architecture in Disease Identification from Medical Images with Few Labeled Data: Using Glaucoma as an Example
Advisor: WANG, YUAN-KAI (王元凱)
Committee Members: WANG, YUAN-KAI; CHANG, I-CHENG; LIN, CHENG-CHUNG
Oral Defense Date: 2023-07-14
Degree: Master's
Institution: Fu Jen Catholic University
Department: Master's Program, Department of Electrical Engineering
Discipline: Engineering
Academic Field: Electrical and Computer Engineering
Document Type: Academic thesis
Year of Publication: 2023
Graduation Academic Year: 111 (2022-2023)
Language: Chinese
Number of Pages: 64
Keywords: Medical Image; Image Classification; Few Data; Optical Coherence Tomography; Glaucoma; Vision Transformer; Heatmap
Statistics:
  • Cited by: 0
  • Views: 162
  • Downloads: 0
  • Bookmarks: 0
Abstract: In recent years, deep learning networks have attracted considerable attention in computer vision and image recognition owing to their powerful learning ability. The Transformer architecture in particular has achieved great success in natural language processing (NLP), and the Vision Transformer successfully applied it to images; many studies have since brought the Transformer architecture into image recognition, hoping to contribute to medical image analysis. However, labeled medical images are difficult, time-consuming, and costly to obtain, so deep learning models cannot learn effectively from the small amount of medical training data available. We first focus on optical coherence tomography (OCT) images and use them for glaucoma identification; among existing methods, there is no effective Transformer-based approach to disease identification from few labeled medical images. Under this few-labeled-data condition, this thesis proposes the ResViT network architecture, which combines the convolutional layers of ResNet with the Vision Transformer so that the model can quickly learn the feature information in the images, addressing the problem that training is ineffective when data are insufficient. This thesis also presents stage-wise and overall architecture algorithms and a final learning algorithm to explain the model precisely. In the experiments, with few medical training images, our method achieves the best disease identification performance compared with existing methods CvT [29], CeiT [30], and LeViT [31]. Moreover, we find that for identifying glaucoma from OCT images, the width of the neural network matters more than its depth. Finally, we apply six different heatmap methods to analyze the OCT images and find that the attended regions differ across glaucoma types: normal tension glaucoma (NTG) first attends to the optic disc and the Bruch's membrane opening (BMO) region, primary open-angle glaucoma (POAG) first attends to the retinal nerve fiber layer and the lamina cribrosa region, and for primary angle-closure glaucoma (PACG) the attended region is not fixed.
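The abstract's central component, ResViT, pairs a ResNet-style convolutional stem with a Vision Transformer encoder. As a rough illustration of that hybrid idea only: the sizes, the single attention head, and the random weights below are arbitrary placeholders, not the thesis's actual ResViT configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def patch_embed_conv(img, w):
    """A non-overlapping k x k convolution with d filters, i.e. the
    ViT patch embedding. img: (H, W); w: (k*k, d) -> (n_patches, d)."""
    k = int(np.sqrt(w.shape[0]))
    H, W = img.shape
    patches = img.reshape(H // k, k, W // k, k).transpose(0, 2, 1, 3)
    patches = patches.reshape(-1, k * k)   # one row per k x k patch
    return patches @ w

def attention_block(x, wq, wk, wv):
    """Single-head self-attention with a residual connection."""
    q, k, v = x @ wq, x @ wk, x @ wv
    a = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return x + a @ v

# Toy forward pass: a 32x32 "OCT slice", 8x8 patches, embed dim 16.
d = 16
img = rng.standard_normal((32, 32))
tokens = patch_embed_conv(img, rng.standard_normal((64, d)) * 0.1)
cls = np.zeros((1, d))                        # class token
x = np.vstack([cls, tokens])                  # (1 + 16 patches, d)
x = x + rng.standard_normal(x.shape) * 0.01   # position embedding
ws = [rng.standard_normal((d, d)) * 0.1 for _ in range(3)]
x = attention_block(x, *ws)
logits = x[0] @ rng.standard_normal((d, 2))   # class token -> 2 classes
print(logits.shape)  # (2,)
```

The point of the hybrid is visible in `patch_embed_conv`: a convolutional stem produces the token grid that the Transformer block then mixes globally.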
Abstract (English): In recent years, deep learning networks have garnered significant attention for their powerful learning capabilities in computer vision and image recognition. In particular, the Transformer architecture has achieved tremendous success in natural language processing (NLP), and many researchers have since explored it for image recognition, aiming to contribute to medical image analysis. However, acquiring labeled medical imaging data is challenging, time-consuming, and costly, making it difficult for deep learning models to learn effectively from limited training data. Our focus is on optical coherence tomography (OCT) images for the detection of glaucoma. In this thesis, we propose the ResViT network architecture, which combines the convolutional layers of ResNet with the Vision Transformer architecture. This combination allows the model to quickly learn the informative features present in the images, addressing the challenge of training with limited data. Our experiments demonstrate that ResViT outperforms existing methods such as CvT [29], CeiT [30], and LeViT [31] in glaucoma recognition from OCT images. On this task, we also discovered that the network's width is more crucial than its depth. We further introduce stage-wise and overall architectural algorithms, along with a final learning algorithm, to provide a comprehensive explanation of the model. Finally, we applied six heatmap methods to analyze the OCT images, revealing that the areas of interest shift with the specific type of glaucoma: normal tension glaucoma (NTG) mainly highlights the optic disc and Bruch's membrane opening (BMO), primary open-angle glaucoma (POAG) focuses on the retinal nerve fiber layer and the lamina cribrosa, while the focal points in primary angle-closure glaucoma (PACG) vary across cases.
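Of the six heatmap methods analyzed in the thesis, Grad-CAM is the simplest to state: each channel of a chosen layer's activations is weighted by the spatial mean of its gradient, the weighted maps are summed, and a ReLU keeps only the positively contributing regions. The arrays below are random placeholders standing in for a real model's forward and backward pass, so only the combination step itself is shown.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap from one layer's activations and gradients.
    activations, gradients: (C, H, W). Returns an (H, W) map in [0, 1]."""
    weights = gradients.mean(axis=(1, 2))             # (C,) per-channel weights
    cam = np.tensordot(weights, activations, axes=1)  # (H, W) weighted sum
    cam = np.maximum(cam, 0)                          # ReLU: keep positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()                         # normalize for display
    return cam

# Placeholder tensors in lieu of a real OCT classifier's backward pass.
rng = np.random.default_rng(1)
acts = rng.random((8, 7, 7))            # 8 feature maps on a 7x7 grid
grads = rng.standard_normal((8, 7, 7))  # gradients of the class score
heat = grad_cam(acts, grads)
print(heat.shape)  # (7, 7)
```

In the thesis's setting, such a map would be upsampled and overlaid on the OCT slice to see which structures (optic disc, BMO, nerve fiber layer) drive the prediction for each glaucoma type.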
Abstract ......... i
English Abstract ......... ii
Table of Contents ......... iv
List of Tables ......... vi
List of Figures ......... vii
Chapter 1 Introduction ......... 1
1.1 Research Background ......... 2
1.2 Research Motivation and Objectives ......... 4
1.3 Overview of the Method ......... 5
1.4 Organization of the Thesis ......... 7
Chapter 2 Literature Review ......... 8
2.1 Traditional Convolutional Neural Network (CNN) Image Recognition ......... 8
2.1.1 VGG Network ......... 8
2.1.2 GoogleNet Network ......... 9
2.1.3 ResNet Network ......... 9
2.1.4 DenseNet Network ......... 10
2.2 From Transformer to Vision Transformer Image Recognition ......... 10
2.2.1 Transformer ......... 10
2.2.2 Vision Transformer ......... 11
2.2.3 Vision Transformer with Convolution ......... 11
2.3 Development of Vision Transformer Techniques in Medical Imaging ......... 12
2.4 OCT Images and Glaucoma ......... 13
Chapter 3 ViT Network Combined with Convolution ......... 17
3.1 ResViT Architecture ......... 18
3.1.1 ResNet Convolution ......... 18
3.1.2 Class, Patch, and Position Embedding ......... 19
3.1.3 Transformer Block ......... 20
3.2 Learning Algorithm ......... 24
Chapter 4 Heatmap Analysis ......... 25
4.1 Grad-CAM ......... 26
4.2 Grad-CAM++ ......... 27
4.3 Eigen-CAM ......... 28
4.4 Eigen Grad-CAM ......... 28
4.5 Score-CAM ......... 29
4.6 LayerCAM ......... 31
Chapter 5 Experiments ......... 34
5.1 Experimental Dataset ......... 34
5.1.1 FJUOCT for Glaucoma ......... 34
5.1.2 Training Data Preprocessing ......... 37
5.2 Dataset Class Selection Analysis ......... 39
5.3 Network Architecture and Parameter Analysis ......... 42
5.4 State-of-the-Art Comparison ......... 45
5.5 Heatmap Analysis ......... 50
Chapter 6 Conclusion ......... 58
References ......... 59
[1] M. S. Sung, M. Y. Heo, and S. W. Park, "Bruch's membrane opening enlargement and its implication on the myopic optic nerve head," Scientific Reports, vol. 9, article number 19564, 2019.
[2] I. I. Bussel, G. Wollstein, and J. S. Schuman, "OCT for glaucoma diagnosis, screening and detection of glaucoma progression," British Journal of Ophthalmology, vol. 98, pp. 15213, 2014.
[3] Y. LeCun, B. Boser, J. S. Denker et al., "Backpropagation applied to handwritten zip code recognition," Neural Computation, vol. 1, pp. 541-551, 1989.
[4] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Communications of the ACM, vol. 60, no. 6, pp. 84-90, 2017.
[5] J. Deng, W. Dong, R. Socher et al., "ImageNet: a large-scale hierarchical image database," in Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 248-255, 2009.
[6] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[7] C. Szegedy, W. Liu, Y. Jia et al., "Going deeper with convolutions," in Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-9, 2015.
[8] M. Lin, Q. Chen, and S. Yan, "Network in network," arXiv preprint arXiv:1312.4400, 2013.
[9] S. Hochreiter, "The vanishing gradient problem during learning recurrent neural nets and problem solutions," International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 6, no. 2, pp. 107-116, 1998.
[10] K. He and J. Sun, "Convolutional neural networks at constrained time cost," in Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5353-5360, 2015.
[11] R. K. Srivastava, K. Greff, and J. Schmidhuber, "Highway networks," arXiv preprint arXiv:1505.00387, 2015.
[12] K. He, X. Zhang, S. Ren et al., "Deep residual learning for image recognition," in Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778, 2016.
[13] G. Huang, Z. Liu et al., "Densely connected convolutional networks," in Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700-4708, 2017.
[14] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," arXiv preprint arXiv:1409.0473, 2014.
[15] A. P. Parikh, D. Das, J. Uszkoreit et al., "A decomposable attention model for natural language inference," arXiv preprint arXiv:1606.01933, 2016.
[16] A. Vaswani, N. Shazeer, N. Parmar et al., "Attention is all you need," in Proc. Neural Information Processing Systems, 2017.
[17] J. Devlin, M. W. Chang, K. Lee et al., "BERT: Pre-training of deep bidirectional transformers for language understanding," arXiv preprint arXiv:1810.04805, 2018.
[18] T. Brown, B. Mann, N. Ryder et al., "Language models are few-shot learners," in Proc. Neural Information Processing Systems, pp. 1877-1901, 2020.
[19] N. Parmar, A. Vaswani, J. Uszkoreit et al., "Image transformer," in Proc. International Conference on Machine Learning, pp. 4055-4064, 2018.
[20] H. Hu, Z. Zhang, Z. Xie et al., "Local relation networks for image recognition," in Proc. IEEE/CVF International Conference on Computer Vision, pp. 3464-3473, 2019.
[21] P. Ramachandran, N. Parmar, A. Vaswani et al., "Stand-alone self-attention in vision models," in Proc. Neural Information Processing Systems, pp. 68-80, 2019.
[22] H. Zhao, J. Jia, and V. Koltun, "Exploring self-attention for image recognition," in Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10076-10085, 2020.
[23] R. Child, S. Gray, A. Radford et al., "Generating long sequences with sparse transformers," arXiv preprint arXiv:1904.10509, 2019.
[24] D. Weissenborn, O. Täckström, and J. Uszkoreit, "Scaling autoregressive video models," arXiv preprint arXiv:1906.02634, 2019.
[25] J.-B. Cordonnier, A. Loukas, and M. Jaggi, "On the relationship between self-attention and convolutional layers," arXiv preprint arXiv:1911.03584, 2019.
[26] M. Chen, A. Radford, R. Child et al., "Generative pretraining from pixels," in Proc. International Conference on Machine Learning, vol. 119, pp. 1691-1703, 2020.
[27] A. Dosovitskiy, L. Beyer, A. Kolesnikov et al., "An image is worth 16x16 words: Transformers for image recognition at scale," arXiv preprint arXiv:2010.11929v2, 2021.
[28] X. Chu, Z. Tian, B. Zhang et al., "Conditional positional encodings for vision transformers," arXiv preprint arXiv:2102.10882, 2021.
[29] H. Wu, B. Xiao, N. Codella et al., "CvT: Introducing convolutions to vision transformers," in Proc. IEEE/CVF International Conference on Computer Vision, pp. 22-31, 2021.
[30] K. Yuan, S. Guo, Z. Liu et al., "Incorporating convolution designs into visual transformers," in Proc. IEEE/CVF International Conference on Computer Vision, pp. 579-588, 2021.
[31] B. Graham, A. El-Nouby, H. Touvron et al., "LeViT: a vision transformer in ConvNet's clothing for faster inference," in Proc. IEEE/CVF International Conference on Computer Vision, pp. 12259-12269, 2021.
[32] C. Matsoukas, J. F. Haslum, M. Söderberg et al., "Is it time to replace CNNs with transformers for medical images?" arXiv preprint arXiv:2108.09038, 2021.
[33] S. Perera, S. Adhikari, and A. Yilmaz, "POCFormer: A lightweight transformer architecture for detection of COVID-19 using point of care ultrasound," in Proc. IEEE International Conference on Image Processing, pp. 195-199, 2021.
[34] S. Park, G. Kim, J. Kim et al., "Federated split vision transformer for COVID-19 CXR diagnosis using task-agnostic training," arXiv preprint arXiv:2111.01338, 2021.
[35] M. A. Ferrag, O. Friha, L. Maglaras et al., "Federated deep learning for cyber security in the internet of things: Concepts, applications, and experimental analysis," IEEE Access, vol. 9, pp. 138509-138542, 2021.
[36] P. Vepakomma, O. Gupta, T. Swedish et al., "Split learning for health: Distributed deep learning without sharing raw patient data," arXiv preprint arXiv:1812.00564, 2018.
[37] M. Lu, Y. Pan, D. Nie et al., "SMILE: Sparse-attention based multiple instance contrastive learning for glioma sub-type classification using pathological images," MICCAI Workshop on Computational Pathology, vol. 156, pp. 159-169, 2021.
[38] B. Gheflati and H. Rivaz, "Vision transformers for classification of breast ultrasound images," in Proc. Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 480-483, 2022.
[39] W. Al-Dhabyani, M. Gomaa, H. Khaled et al., "Dataset of breast ultrasound images," Data in Brief, vol. 28, pp. 104863, 2020.
[40] M. H. Yap, G. Pons, J. Marti et al., "Automated breast ultrasound lesions detection using convolutional neural networks," IEEE Journal of Biomedical and Health Informatics, vol. 22, pp. 1218-1226, 2017.
[41] S. Yu, K. Ma, Q. Bi et al., "MIL-VT: Multiple instance learning enhanced vision transformer for fundus image classification," MICCAI, vol. 12908, pp. 45-54, 2021.
[42] R. Sun, Y. Li, T. Zhang et al., "Lesion-aware transformers for diabetic retinopathy grading," in Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10938-10947, 2021.
[43] J. Wu, R. Hu, Z. Xiao et al., "Vision Transformer-based recognition of diabetic retinopathy grade," Medical Physics, vol. 48, pp. 7850-7863, 2021.
[44] N. AlDahoul, H. A. Karim, M. J. T. Tan et al., "Encoding retina image to words using ensemble of vision transformers for diabetic retinopathy grading," F1000Research, vol. 10, pp. 948, 2021.
[45] J. S. Schuman, M. R. Hee, A. V. Arya et al., "Optical coherence tomography: a new tool for glaucoma diagnosis," Current Opinion in Ophthalmology, vol. 6, pp. 89-95, 1995.
[46] C. Schweitzer, M. Le Goff, J. F. Korobelnik et al., "Screening of glaucoma using spectral-domain optical coherence tomography (SD-OCT) in an elderly population: the ALIENOR study," Investigative Ophthalmology and Visual Science, vol. 56, pp. 1025, 2015.
[47] B. E. K. Klein, C. A. Johnson, S. M. Meuer et al., "Nerve fiber layer thickness and characteristics associated with glaucoma in community living older adults: prelude to a screening trial," Ophthalmic Epidemiology, vol. 24, pp. 104-110, 2017.
[48] C. K. S. Leung, C. Y. L. Cheung, R. N. Weinreb et al., "Evaluation of retinal nerve fiber layer progression in glaucoma: a comparison between the fast and the regular retinal nerve fiber layer scans," Ophthalmology, vol. 118, pp. 763-767, 2011.
[49] J. H. Na, K. R. Sung, S. Baek et al., "Detection of glaucoma progression by assessment of segmented macular thickness data obtained using spectral domain optical coherence tomography," Investigative Ophthalmology and Visual Science, vol. 53, pp. 3817-3826, 2012.
[50] J. H. Na, K. R. Sung, J. R. Lee et al., "Detection of glaucomatous progression by spectral-domain optical coherence tomography," Ophthalmology, vol. 120, pp. 1388-1395, 2013.
[51] A. R. Ran, C. C. Tham, P. P. Chan et al., "Deep learning in glaucoma with optical coherence tomography: a review," Eye, vol. 35, pp. 188-201, 2021.
[52] Y. C. Tham, X. Li, T. Y. Wong et al., "Global prevalence of glaucoma and projections of glaucoma burden through 2040: a systematic review and meta-analysis," Ophthalmology, vol. 121, pp. 2081-2090, 2014.
[53] H. A. Quigley, S. K. West, J. Rodriguez et al., "The prevalence of glaucoma in a population-based study of Hispanic subjects: Proyecto VER," Archives of Ophthalmology, vol. 119, pp. 1819-1826, 2001.
[54] A. P. Rotchford, J. F. Kirwan, M. A. Muller et al., "Temba glaucoma study: a population-based cross-sectional survey in urban South Africa," Ophthalmology, vol. 110, pp. 376-382, 2003.
[55] F. Topouzis, A. L. Coleman, A. Harris et al., "Factors associated with undiagnosed open-angle glaucoma: the Thessaloniki eye study," American Journal of Ophthalmology, vol. 145, pp. 327-335, 2008.
[56] Y. Shaikh, F. Yu, and A. L. Coleman, "Burden of undetected and untreated glaucoma in the United States," American Journal of Ophthalmology, vol. 158, pp. 1121-1129, 2014.
[57] J. Chua, M. Baskaran, P. G. Ong et al., "Prevalence, risk factors, and visual features of undiagnosed glaucoma: the Singapore epidemiology of eye diseases study," JAMA Ophthalmology, vol. 133, pp. 938-946, 2015.
[58] R. Salowe, J. Salinas, N. H. Farbman et al., "Primary open-angle glaucoma in individuals of African descent: a review of risk factors," Journal of Clinical and Experimental Ophthalmology, vol. 6, pp. 450, 2015.
[59] Z. Li, Y. He, S. Keel et al., "Efficacy of a deep learning system for detecting glaucomatous optic neuropathy based on color fundus photographs," Ophthalmology, vol. 125, pp. 1199-1206, 2018.
[60] H. Liu, L. Li, I. M. Wormstone et al., "Development and validation of a deep learning system to detect glaucomatous optic neuropathy using fundus photographs," JAMA Ophthalmology, vol. 137, pp. 1353-1360, 2019.
[61] J. D. Rossetto, L. A. S. Melo Jr, M. S. Campos et al., "Agreement on the evaluation of glaucomatous optic nerve head findings by ophthalmology residents and a glaucoma specialist," Clinical Ophthalmology, vol. 11, pp. 1281-1284, 2017.
[62] J. F. De Boer, B. Cense, B. H. Park et al., "Improved signal-to-noise ratio in spectral-domain compared with time-domain optical coherence tomography," Optics Letters, vol. 28, pp. 2067-2069, 2003.
[63] R. T. Chang, J. K. O'rese, W. J. Feuer et al., "Sensitivity and specificity of time-domain versus spectral-domain optical coherence tomography in diagnosing early to moderate glaucoma," Ophthalmology, vol. 116, pp. 2294-2299, 2009.
[64] D. E. Johnson, S. R. El-Defrawy, D. R. P. Almeida et al., "Comparison of retinal nerve fibre layer measurements from time domain and spectral domain optical coherence tomography systems," Canadian Journal of Ophthalmology, vol. 44, pp. 562-566, 2009.
[65] S. Maetschke, B. Antony, H. Ishikawa et al., "A feature agnostic approach for glaucoma detection in OCT volumes," PLoS One, vol. 14, 2019.
[66] M. S. Sung, M. Y. Heo, H. Heo et al., "Bruch's membrane opening enlargement and its implication on the myopic optic nerve head," Scientific Reports, vol. 9, article number 19564, 2019.
[67] R. Li, X. Wang, Y. Wei et al., "Diagnostic capability of different morphological parameters for primary open-angle glaucoma in the Chinese population," BMC Ophthalmology, vol. 21, pp. 1-9, 2021.
[68] K. R. Sung, J. H. Na, and Y. Lee, "Glaucoma diagnostic capabilities of optic nerve head parameters as determined by Cirrus HD optical coherence tomography," Journal of Glaucoma, vol. 21, pp. 498-504, 2012.
[69] B. C. Chauhan and C. F. Burgoyne, "From clinical examination of the optic disc to clinical assessment of the optic nerve head: a paradigm change," American Journal of Ophthalmology, vol. 156, pp. 218-227, 2013.
[70] B. C. Chauhan, N. O'Leary, F. A. AlMobarak et al., "Enhanced detection of open-angle glaucoma with an anatomically accurate optical coherence tomography-derived neuroretinal rim parameter," Ophthalmology, vol. 120, pp. 535-543, 2013.
[71] D. Park, S. P. Park, and K. I. Na, "Comparison of retinal nerve fiber layer thickness and Bruch's membrane opening minimum rim width thinning rate in open-angle glaucoma," Scientific Reports, vol. 12, article number 16069, 2022.
Electronic full text (publicly available online from 2028-09-26).