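Graduate student: Yu-fen Chiu (邱郁芬)
Thesis title: Anti-Spam Study: an Alliance-based Approach
Thesis title (Chinese): 以區域聯防為基礎之垃圾郵件防治研究
Advisors: Chia-Mei Chen (陳嘉玫), Jeng-Bing Chiang (鄭炳強)
Degree: Master's
Institution: National Sun Yat-sen University
Department: Department of Information Management
Discipline: Computer science
Sub-discipline: General computer science
Thesis type: Academic thesis
Year of publication: 2006
Academic year of graduation: 94 (ROC calendar)
Language: English
Pages: 91
Keywords: Rough set theory; Reinforcement learning; XCS classifier system; Spam; Text classification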
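Abstract (translated from the Chinese): The threat posed by spam grows more serious by the day, which underscores the value of spam filtering technology. Most current filtering techniques combine machine learning with data mining. They emphasize very high accuracy, yet their false positive rate is not necessarily low, and in practice the losses caused by false positives are usually hard to recover. Many anti-spam solutions only improve on a particular existing technique, and studies that combine multiple algorithms are rare. This study therefore proposes an alliance-based architecture that combines rough set theory, a genetic algorithm, and the XCS classifier system, with the aim of spreading real-time information about spam widely so that mail servers can jointly block the flood of spam.
Rough set theory excels at handling imprecise and incomplete data, and it is a rule-based method whose rules are easy to exchange and share. Because computing the optimal set of reducts in rough set theory is an NP-hard problem, a genetic algorithm, which can quickly search, compare, and evolve good solutions over large amounts of data, is used to generate the spam filtering rules. The reinforcement learning in XCS helps each mail server learn the mail classification criteria best suited to itself. The results of alliance-based spam filtering were evaluated with several statistical methods and shown to perform well, leading to two conclusions:
(1) Filtering rules exchanged from other mail servers do contribute to blocking more spam.
(2) An anti-spam solution that combines multiple algorithms can improve accuracy and the false positive rate at the same time.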
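To make the notion of a reduct concrete, the sketch below searches a tiny decision table for the smallest attribute subsets that still separate spam from legitimate mail. It is an illustration only: the attribute names and rows are invented, the table is assumed to be consistent, and the search is exhaustive rather than the genetic algorithm the thesis uses. The exhaustive search is exactly what becomes infeasible as the number of attributes grows.

```python
from itertools import combinations

# Hypothetical toy decision table: (condition attribute values, decision).
# Attribute names and rows are invented for illustration only.
ATTRIBUTES = ("has_link", "all_caps_subject", "known_sender")
TABLE = [
    ((1, 1, 0), "spam"),
    ((1, 0, 0), "spam"),
    ((0, 0, 1), "ham"),
    ((1, 0, 1), "ham"),
    ((0, 1, 1), "ham"),
]


def is_consistent(attr_idx):
    """True if rows that agree on the chosen attributes always share a decision."""
    seen = {}
    for values, decision in TABLE:
        key = tuple(values[i] for i in attr_idx)
        if seen.setdefault(key, decision) != decision:
            return False
    return True


def minimal_reducts():
    """Exhaustively find the smallest attribute subsets that stay consistent."""
    for size in range(1, len(ATTRIBUTES) + 1):
        found = [
            tuple(ATTRIBUTES[i] for i in subset)
            for subset in combinations(range(len(ATTRIBUTES)), size)
            if is_consistent(subset)
        ]
        if found:
            return found
    return []


reducts = minimal_reducts()
print(reducts)
```

In this toy table a single attribute is enough to reproduce every decision, so a rule built on it can be shared with another server in compact form. The number of candidate subsets doubles with each added attribute, which is why a heuristic search such as a genetic algorithm is attractive.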
Abstract (English): The growing problem of spam has generated a need for reliable anti-spam filters. Many filtering techniques, often combined with machine learning and data mining, are used to reduce the amount of spam. Such algorithms can achieve very high accuracy, but at the cost of some false positives, and false positives are generally prohibitively expensive in the real world. Much work has been done to improve specific algorithms for detecting spam, but less has been reported on leveraging multiple algorithms in email analysis. This study presents an alliance-based approach to classifying, discovering, and exchanging useful information on spam. The spam filter in this study is built on a combination of rough set theory (RST), a genetic algorithm (GA), and the XCS classifier system.
RST can process imprecise and incomplete data such as spam. The GA speeds up the search for the optimal solution (i.e., the rules used to block spam). The reinforcement learning in XCS is a good mechanism for suggesting the appropriate classification for an email. The results of spam filtering with the alliance-based approach are evaluated by several statistical methods and show good performance. Two main conclusions can be drawn from this study: (1) the rules exchanged from other mail servers do help the filter block more spam than before; (2) a combination of algorithms both improves accuracy and reduces false positives in spam detection.
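The tension between accuracy and false positives can be shown with a small calculation. The sketch below uses the standard confusion-matrix definitions, which may differ from the performance criteria defined in the thesis, and the counts are hypothetical rather than results from the study.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all messages classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def false_positive_rate(tn, fp):
    """Fraction of legitimate messages wrongly flagged as spam."""
    return fp / (fp + tn)


# Hypothetical counts for a mailbox dominated by spam (not from the thesis):
# tp = spam caught, fn = spam missed, fp = legitimate mail blocked, tn = legitimate mail delivered.
tp, fn, fp, tn = 900, 20, 30, 50

acc = accuracy(tp, tn, fp, fn)
fpr = false_positive_rate(tn, fp)
print(f"accuracy={acc:.3f}, false positive rate={fpr:.3f}")
```

With these counts the filter is right on 95% of all messages while still blocking 30 of the 80 legitimate ones, so accuracy alone can hide a costly false positive rate when spam dominates the traffic.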
Chapter 1 Introduction
1.1 Recent Reports on Spam
1.2 Problem Definition and Motivation
1.3 Reader's Guide
Chapter 2 Related Works
2.1 Spam Filtering Techniques Review
2.2 Rough Sets Theory
2.3 Genetic Algorithm
2.4 XCS Classifier System
Chapter 3 Alliance-based Approach
3.1 Single-server System
3.2 System Architecture
3.3 Performance Criteria
Chapter 4 Evaluation and Validation
4.1 Design of Experiments
4.2 Steps of Experiments
4.3 The Respective Performances
4.4 The Overall Performance
Chapter 5 Conclusions and Future Work
Appendix A: The Configuration of .procmailrc
Appendix B: Miscellaneous Notation and System Parameters
Bibliography