Thesis title (English): Towards Enhanced Analysis of Lung Cancer Lesions in EBUS-TBNA -- A Semi-Supervised Video Object Detection Method
Oral defense committee (English): Ching-Chun Huang, Vincent S. Tseng
Keywords (English): Deep learning; Lung cancer; Vision Transformer; Endobronchial ultrasound-guided transbronchial needle aspiration; Mediastinal lesions; Video object detection
This study aims to build a computer-aided diagnosis system for lung lesions in endobronchial ultrasound (EBUS) that helps physicians locate lesion areas. During EBUS-guided transbronchial needle aspiration (EBUS-TBNA), physicians rely on grayscale ultrasound images to determine where a lesion is. These images contain substantial noise and are affected by surrounding tissue and blood vessels, so most of them are difficult to interpret. Previous research has not applied object detection models to EBUS-TBNA, and no good solution has been proposed for the scarcity of annotated data in EBUS-TBNA datasets. Object detection models developed for other ultrasound tasks can capture the target regions in their respective settings, but they are trained and evaluated on two-dimensional images and therefore cannot use temporal features to improve their predictions. This study proposes an object detection model based on three-dimensional images, so that information along the time dimension can be used. The model first uses a generative diffusion model to produce a better set of queries, then captures temporal correlations through an attention mechanism. A filtering mechanism selects suitable information from previous frames and passes it to the current frame. A teacher-student training scheme then uses unlabeled data to optimize the model further; by applying different data augmentations and feature alignment, the model becomes more robust to interference. By capturing spatiotemporal information and using semi-supervised learning, the model achieves an Average Precision (AP) of 48.7 on the test dataset, outperforming the other models compared. It also achieves an Average Recall (AR) of 79.2, well ahead of existing models, which substantially reduces the rate of missed lesions.
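The teacher-student step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the update rule or the pseudo-label criterion, so this example assumes two choices that are common in semi-supervised detection: the teacher's weights follow the student's as an exponential moving average (EMA), and teacher detections on unlabeled frames are kept as pseudo-labels only above a confidence threshold. The function names, the `decay` and `score_threshold` values, and the toy weights are all hypothetical and are not taken from the thesis.

```python
import copy


def ema_update(teacher, student, decay=0.99):
    """Move each teacher weight toward the matching student weight (assumed EMA rule)."""
    for name, student_values in student.items():
        teacher[name] = [
            decay * t + (1.0 - decay) * s
            for t, s in zip(teacher[name], student_values)
        ]
    return teacher


def select_pseudo_labels(detections, score_threshold=0.7):
    """Keep only teacher detections confident enough to act as pseudo-labels."""
    return [d for d in detections if d["score"] >= score_threshold]


# Toy weights standing in for real model parameters (hypothetical values).
student = {"head": [1.0, 2.0]}
teacher = copy.deepcopy(student)

# Pretend the student took one gradient step on labeled plus pseudo-labeled data.
student["head"] = [2.0, 4.0]
ema_update(teacher, student, decay=0.9)

# Hypothetical teacher predictions on one unlabeled ultrasound frame.
detections = [
    {"box": (10, 20, 50, 60), "score": 0.91},
    {"box": (5, 5, 15, 15), "score": 0.32},
]
pseudo_labels = select_pseudo_labels(detections)
print(teacher["head"], len(pseudo_labels))
```

In this kind of scheme the teacher typically sees weakly augmented frames and the student sees strongly augmented ones, so that the student learns to reproduce the teacher's predictions under perturbation; the exact augmentations and feature alignment used in the thesis are not specified in the abstract.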
2.1 DEtection TRansformer (DETR)
2.3 Diffusion model
3.1.1 Experiment overview
3.1.2 Experimental procedure
3.1.3 EBUS-TBNA data source
3.3.1 SVDETR (Short time video detection transformer)
3.3.2 Generative diffusion model
3.3.3 Semi-supervised learning (Teacher-student model)
