
National Digital Library of Theses and Dissertations in Taiwan

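Student: 鄔正男 (Cheng-Nan Wu)
Thesis title (Chinese): 具自我調整能力之兩階層圖像垃圾郵件過濾系統
Thesis title (English): A Two-Layer Image Spam Filtering System with Self-Adaptable Mechanism
Advisor: 劉宗杰 (Tzong-Jye Liu)
Degree: Master's
Institution: Feng Chia University
Department: Graduate Institute of Information Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2011
Academic year of graduation: 99 (ROC calendar, i.e., 2010-2011)
Language: English
Pages: 36
Keywords: spam filtering; self-adaptable; machine learning; image spam; image classification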
Usage statistics:
  • Cited by: 1
  • Views: 326
  • Downloads: 26
  • Bookmarked: 0
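Image spam embeds its message in images to evade spam filters that traditionally rely on the text content of e-mail. Previous studies did not consider that the characteristics of spam images change over time. We therefore propose a system that adapts to new forms of image spam: it adjusts itself and efficiently filters large volumes of similar image spam.
This thesis proposes a two-layer image spam filtering system with a self-adaptable mechanism. The first layer is a fast classifier that uses the proportion of pixels at each color level after grayscale conversion as the matching feature for spam images, so it can filter large numbers of similar spam images. The second layer is a more precise classifier that uses common visual features of images together with an SVM (Support Vector Machine) to classify the images the first layer cannot identify. Finally, through the self-adaptable mechanism, the second layer immediately feeds information back to the first layer so that large numbers of similar spam images cannot enter the system again. The classification results of both layers, together with each image's visual features, are also stored in a database and used to retrain and update the system's SVM model. We used several different data sets as training and test data to simulate changes in the environment, and the experiments show that the system improves both accuracy and processing time. In addition, when the different data sets were mixed and used as training and test data, the accuracy reached 93.2%.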
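The first layer's feature, as described above, is the proportion of pixels at each gray level after grayscale conversion. A minimal Python sketch of such a feature and of a fast near-duplicate check follows. The BT.601 luma weights, the 16-bin quantization, the L1 distance and the 0.1 threshold are all assumptions chosen for illustration, not details taken from the thesis, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def to_gray(pixel: Pixel) -> int:
    """Convert an RGB pixel to a gray level using BT.601 luma weights (assumed)."""
    r, g, b = pixel
    return min(255, int(round(0.299 * r + 0.587 * g + 0.114 * b)))


def gray_proportions(pixels: Sequence[Pixel], bins: int = 16) -> List[float]:
    """Fraction of pixels whose gray level falls into each of `bins` equal-width ranges."""
    if not pixels:
        raise ValueError("image has no pixels")
    counts = [0] * bins
    for pixel in pixels:
        # Map gray level 0..255 onto bin index 0..bins-1.
        counts[to_gray(pixel) * bins // 256] += 1
    return [count / len(pixels) for count in counts]


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences between two proportion vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def matches_known_spam(
    props: Sequence[float],
    spam_signatures: Sequence[Sequence[float]],
    threshold: float = 0.1,  # hypothetical similarity threshold
) -> bool:
    """Layer-1 check: is this image close to any stored spam signature?"""
    return any(l1_distance(props, sig) <= threshold for sig in spam_signatures)
```

Two images that differ only slightly in pixel values usually fall into the same coarse gray-level bins, so their proportion vectors stay close; this is what lets a cheap distance check stand in for full classification on near-identical spam.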
Image spam embeds its message in images to circumvent text-based spam filters. Previous research has not considered that spammers change their behavior over time.
In this thesis, we propose a framework that adapts to new kinds of image spam over time: a two-layer image spam filtering system with a self-adaptable mechanism. The first layer is a fast classification module that uses the color proportions of an input image to filter similar spam images, so it can quickly remove large numbers of near-identical spam images. The second layer is a precise classification module; in this thesis it is a visual-feature classification module that applies an SVM to classify the input images the first layer cannot readily classify. Finally, through the proposed self-adaptable mechanism, the second layer immediately feeds spam image information back to the first layer, so the first layer can handle new kinds of image spam using the updated information.
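The control flow of the two layers and the feedback loop can be sketched as below. This is an illustrative sketch only: `TwoLayerFilter`, its fields, and the L1 threshold are hypothetical, a plain callable stands in for the trained SVM, and each image is assumed to arrive with its color-proportion vector and visual-feature vector already computed.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


@dataclass
class TwoLayerFilter:
    # Stand-in for the second-layer SVM: returns True if the visual features look like spam.
    precise_classifier: Callable[[Vector], bool]
    threshold: float = 0.1  # hypothetical layer-1 L1-distance threshold
    # Layer-1 store of color-proportion signatures of known spam images.
    spam_signatures: List[Vector] = field(default_factory=list)
    # (visual features, label) pairs kept for later retraining of the second layer.
    training_log: List[Tuple[Vector, bool]] = field(default_factory=list)

    def _fast_match(self, props: Vector) -> bool:
        return any(
            sum(abs(a - b) for a, b in zip(props, sig)) <= self.threshold
            for sig in self.spam_signatures
        )

    def classify(self, props: Vector, visual: Vector) -> Tuple[bool, str]:
        """Return (is_spam, deciding_layer)."""
        if self._fast_match(props):
            # Layer 1 decides: the image resembles a known spam image.
            self.training_log.append((list(visual), True))
            return True, "layer1"
        # Layer 2 handles everything layer 1 could not match.
        is_spam = self.precise_classifier(visual)
        if is_spam:
            # Self-adaptable feedback: store the signature so similar
            # images are caught by the fast layer next time.
            self.spam_signatures.append(list(props))
        self.training_log.append((list(visual), is_spam))
        return is_spam, "layer2"
```

For example, with `f = TwoLayerFilter(precise_classifier=lambda v: v[0] > 0.5)`, a first spam image is decided by layer 2 and its signature is fed back, so a near-identical second image is then decided by layer 1. Periodically retraining the second-layer model from `training_log` would correspond to the database-driven SVM update described in the thesis abstract.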
The experimental results show that the proposed framework improves both the accuracy rate and the processing performance. With limited training data, the system reaches an accuracy of approximately 93.2%.
Acknowledgements
Chinese Abstract
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Related Work
2.1 Support Vector Machine
2.2 Related Work
Chapter 3 The System Architecture
3.1 The System Framework
3.2 An Example
3.2.1 The First Layer – the Color Proportion Classifier
3.2.2 The Second Layer – the Visual Feature Classifier
3.2.3 The Self-Adaptable Mechanism
Chapter 4 Experimental Results
4.1 Data Sets
4.2 The Parameters of the First Layer
4.3 The Analysis of the Self-Adaptable Mechanism
4.3.1 Experiment for the Self-Adaptable Mechanism
4.3.2 Experiment for the Number of User Feedback Images
4.4 System Performance
4.4.1 The Accuracy Rate Influenced by the First Layer
4.4.2 The Processing Time
4.4.3 The First Layer Filtering Rate
4.5 The Measurements of the Proposed System
4.6 Conclusion on Experiment
Chapter 5 Discussion
Chapter 6 Conclusion and Future Work
References