Graduate student: Cheng-Nan Wu
Thesis title: A Two-Layer Image Spam Filtering System with Self-Adaptable Mechanism
Advisor: Tzong-Jye Liu
Keywords: spam filtering, self-adaptable, machine learning, image spam, image classification
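This thesis proposes a two-layer image spam filtering system with a self-adaptive mechanism. The first layer is a fast classifier: it converts an image's colors to grayscale and uses the proportion of pixels at each gray level as the feature for matching spam images, so it can filter out large numbers of similar spam images. The second layer is a more precise classifier: it uses common visual features of images together with an SVM (Support Vector Machine) to classify the images that the first layer cannot identify. Finally, the self-adaptive mechanism lets the second layer immediately feed information back to the first layer, which prevents large numbers of similar spam images from entering the system again. The decisions made by both layers, together with each image's visual features, are also stored in a database and used to retrain and update the system's SVM model. To simulate a changing environment, we used several different datasets as training and testing data; the experiments show that the system improves both accuracy and processing time. When the datasets are mixed together for training and testing, accuracy also reaches 93.2%.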
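The first-layer feature described above (gray-level proportions matched against known spam) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: pixels are assumed to be (R, G, B) tuples, grayscale uses the ITU-R BT.601 luma weights, and the bin count (16), the histogram-intersection similarity, the 0.9 threshold, and the names `gray_proportions` and `fast_layer` are all hypothetical choices, since the abstract does not specify them.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0..255 (assumed input format)


def gray_proportions(pixels: Sequence[Pixel], bins: int = 16) -> List[float]:
    """Fraction of pixels that fall into each grayscale bin."""
    if not pixels:
        raise ValueError("image has no pixels")
    counts = [0] * bins
    for r, g, b in pixels:
        gray = 0.299 * r + 0.587 * g + 0.114 * b  # ITU-R BT.601 luma, 0..255
        counts[min(int(gray * bins / 256), bins - 1)] += 1
    return [c / len(pixels) for c in counts]


def histogram_intersection(p: Sequence[float], q: Sequence[float]) -> float:
    """1.0 for identical proportion histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(p, q))


def fast_layer(pixels: Sequence[Pixel],
               spam_signatures: List[List[float]],
               threshold: float = 0.9) -> bool:
    """Layer 1 (hypothetical): flag the image as spam if it closely matches a stored signature."""
    h = gray_proportions(pixels)
    return any(histogram_intersection(h, s) >= threshold for s in spam_signatures)
```

Because the feature is only a short vector of proportions, comparing an image against many stored spam signatures is cheap, which is what makes this layer suitable as a fast pre-filter for near-duplicate spam images.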
Image spam embeds its message in images to evade text-based spam filters. Previous research has not considered that spammers' behavior changes over time.
In this thesis, we propose a framework that adapts to new kinds of image spam as they appear over time. The framework is a two-layer image spam filtering system with a self-adaptable mechanism. The first layer is a fast classification module that uses the color proportions of an input image to match it against known spam images, so it can filter large numbers of similar spam images quickly. The second layer is a precise classification module; in this thesis it is a visual-feature classification module that applies the SVM algorithm to the images the first layer cannot readily classify. Finally, through the self-adaptable mechanism, the second layer immediately feeds spam image information back to the first layer, so the first layer can catch new kinds of image spam using the updated information.
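The cascade and its feedback loop can be sketched as follows. This is not the thesis's implementation: the second layer here is a minimal linear SVM trained with the Pegasos subgradient method (a swapped-in, dependency-free stand-in for whatever SVM implementation the thesis used), color histograms and visual feature vectors are assumed to be computed elsewhere, and the names `LinearSVM`, `TwoLayerFilter`, `threshold`, `history`, and `retrain` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(w: Vector, x: Vector) -> float:
    return sum(a * b for a, b in zip(w, x))


def histogram_intersection(p: Vector, q: Vector) -> float:
    """Similarity of two proportion histograms; 1.0 means identical."""
    return sum(min(a, b) for a, b in zip(p, q))


class LinearSVM:
    """Linear SVM trained with Pegasos (hinge loss + L2 regularization).

    Labels are +1 (spam) / -1 (ham). A bias is modeled by appending a
    constant 1.0 to every feature vector before calling fit/predict.
    """

    def __init__(self, lam: float = 0.01, epochs: int = 200, seed: int = 0):
        self.lam, self.epochs, self.seed = lam, epochs, seed
        self.w: List[float] = []

    def fit(self, X: List[Vector], y: List[int]) -> None:
        rng = random.Random(self.seed)
        w = [0.0] * len(X[0])
        t = 0
        for _ in range(self.epochs):
            for i in rng.sample(range(len(X)), len(X)):
                t += 1
                eta = 1.0 / (self.lam * t)
                violated = y[i] * dot(w, X[i]) < 1  # margin on the current w
                w = [(1 - eta * self.lam) * wj for wj in w]  # regularizer shrink
                if violated:  # hinge-loss subgradient step
                    w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
        self.w = w

    def predict(self, x: Vector) -> int:
        return 1 if dot(self.w, x) >= 0 else -1


class TwoLayerFilter:
    """Fast color-signature layer in front of a precise SVM layer.

    When layer 2 flags spam, the image's color histogram is fed back to
    layer 1 as a new signature (the self-adaptive step). Every decision is
    logged with the image's visual features so the SVM can be retrained.
    """

    def __init__(self, svm: LinearSVM, threshold: float = 0.9):
        self.svm = svm
        self.threshold = threshold
        self.signatures: List[List[float]] = []
        self.history: List[Tuple[List[float], int]] = []

    def classify(self, color_hist: Vector, visual: Vector) -> Tuple[str, int]:
        """Return (label, layer that made the decision)."""
        if any(histogram_intersection(color_hist, s) >= self.threshold
               for s in self.signatures):
            label, layer = 1, 1
        else:
            label, layer = self.svm.predict(visual), 2
            if label == 1:
                self.signatures.append(list(color_hist))  # feedback to layer 1
        self.history.append((list(visual), label))
        return ("spam" if label == 1 else "ham"), layer

    def retrain(self) -> None:
        """Refit the SVM on all logged decisions."""
        self.svm.fit([x for x, _ in self.history],
                     [lbl for _, lbl in self.history])
```

Storing layer 1 as a plain list of histograms keeps the feedback step a single append; in practice that list would presumably need to be bounded or indexed as new signatures accumulate.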
The experimental results show that the proposed framework improves both accuracy and processing time. Using limited training data, it reaches an accuracy of approximately 93.2%.
