National Digital Library of Theses and Dissertations in Taiwan

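Author: 徐得恩
Author (English): Te-en Hsu
Thesis title: 合作式垃圾郵件偵測之研究
Thesis title (English): A Study of Collaborative Spam Mail Detection
Advisor: 施東河
Advisor (English): Dong-Her Shih
Degree: Master's
Institution: National Yunlin University of Science and Technology (國立雲林科技大學)
Department: Master's Program, Department of Information Management
Discipline: Computer Science
Subdiscipline: General Computer Science
Thesis type: Academic thesis
Publication year: 2004
Graduation academic year: 92 (ROC calendar, i.e. 2003-2004)
Language: Chinese
Pages: 96
Keywords (Chinese): 多重代理人 (multiagent), 合作偵測 (collaborative detection), 垃圾郵件 (spam), 增強式學習 (reinforcement learning), 電子郵件 (email)
Keywords (English): spam, multiagent, reinforcement learning, email, collaborative detection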
Email has become one of the most widely used means of communication because of its speed, low cost, and ease of use. The medium has, however, also brought the problem of unsolicited bulk email, or "spam": email is just as useful to companies that want to advertise as it is to legitimate users. This has in turn created a need to filter unwanted messages automatically and keep only the ones relevant to the user.
This study proposes a collaborative detection method, based on a multiagent system, for filtering spam out of users' mail streams.
The method consists of three parts: a naïve Bayesian classifier for independent (single-agent) detection; a voting scheme that combines the spam probabilities reported by the agents in the detection committee into a final spam probability; and a modified reinforcement learning algorithm that updates the reward table.
The method was then validated by simulation. The results show that collaborative detection achieves a higher detection rate and a lower false positive rate than independent detection.
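The three parts named in the abstract can be illustrated with a minimal Python sketch. This record does not give the thesis's actual formulas, so everything below is an assumption made for illustration: the reward-weighted average used as the voting rule, the way rewards are raised or lowered, the function names, and the meaning given to the parameters. The symbols TS, TL, TC, R+, R- and RI do appear in the thesis's figure captions, but the interpretations used here (spam threshold, legitimate-mail threshold, collaborative threshold, reward increment, reward decrement, initial reward) are inferred from the chapter titles, and the value of T_C is hypothetical.

```python
import math

# Assumed parameter meanings; names mirror symbols in the figure captions.
T_S, T_L = 0.999, 0.001      # spam / legitimate-mail thresholds
T_C = 0.5                    # collaborative-detection threshold (hypothetical value)
R_PLUS, R_MINUS, R_INIT = 5.0, 80.0, 800.0   # reward increment, decrement, initial value


def naive_bayes_spam_probability(words, p_word_spam, p_word_ham, prior_spam=0.5):
    """Independent detection: P(spam | words) under the naive Bayes assumption."""
    log_spam = math.log(prior_spam)
    log_ham = math.log(1.0 - prior_spam)
    for word in words:
        # Words missing from either table are skipped (a simplification).
        if word in p_word_spam and word in p_word_ham:
            log_spam += math.log(p_word_spam[word])
            log_ham += math.log(p_word_ham[word])
    # Normalize in log space to avoid underflow on long messages.
    top = max(log_spam, log_ham)
    spam, ham = math.exp(log_spam - top), math.exp(log_ham - top)
    return spam / (spam + ham)


def vote(probabilities, rewards):
    """Voting scheme (assumed form): reward-weighted mean of committee probabilities."""
    total = sum(rewards[name] for name in probabilities)
    if total <= 0:
        return sum(probabilities.values()) / len(probabilities)
    return sum(rewards[name] * p for name, p in probabilities.items()) / total


def update_rewards(probabilities, rewards, is_spam):
    """Reward-table update (assumed form): raise agents that were right, lower the rest."""
    for name, p in probabilities.items():
        if (p >= 0.5) == is_spam:
            rewards[name] += R_PLUS
        else:
            rewards[name] = max(0.0, rewards[name] - R_MINUS)


def classify(own_probability, committee_probabilities, rewards):
    """Decide alone when confident; otherwise defer to the committee vote."""
    if own_probability >= T_S:
        return True
    if own_probability <= T_L:
        return False
    return vote(committee_probabilities, rewards) >= T_C


if __name__ == "__main__":
    rewards = {"a": R_INIT, "b": R_INIT, "c": R_INIT}
    committee = {"a": 0.9, "b": 0.8, "c": 0.2}
    print(classify(0.6, committee, rewards))   # uncertain agent asks the committee
    update_rewards(committee, rewards, is_spam=True)
    print(rewards)
```

In this sketch an agent consults the committee only when its own probability falls between T_L and T_S, which is one plausible reading of why the thesis studies how the thresholds affect the number of collaborative detections.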
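Table of Contents
1. Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives and Expected Contributions
1.4 Research Limitations
1.5 Research Process and Thesis Structure
2. Literature Review
2.1 Spam Mail
2.1.1 Introduction to Spam
2.1.2 Spam Detection
2.1.3 Mail Server Blacklists
2.1.4 Signature-Based Filtering
2.1.5 Rule-Based Filtering
2.1.6 Distributed Agents
2.1.7 Challenge-Response Filtering
2.1.8 Machine Learning (Text Classification)
2.1.9 Immune-System-Based Filtering
2.1.10 FFBs
2.2 Collaborative Learning
2.2.1 Agents
2.2.2 Multiagent Systems
2.2.3 Learning in Cooperative Multiagent Systems
2.2.4 Reinforcement Learning
2.2.5 Summary
3. System Design
3.1 Problem Description
3.2 Data Collection and Analysis
3.2 Naïve Bayes
3.3 Variable Extraction
3.4 Collaborative Learning
3.4.1 Voting Scheme
3.4.2 Reinforcement Learning
3.5 Collaborative Detection Architecture Design
3.5.1 Individual Agent Detection Architecture
3.5.2 Collaborative Detection Algorithm
4. Experimental Design and Results
4.1 Experimental Design
4.2 Parameter Description
4.3 Evaluation Criteria
4.4 Simulation System
4.2 Effect of the Number of Agents on Detection Capability
4.3 Effect of the Spam and Legitimate-Mail Thresholds on Detection Capability
4.4 Effect of the Increment and Decrement Values on Collaborative Detection Capability
4.5 Effect of the Initial Value on Collaborative Detection Capability
4.6 Effect of the Collaborative Detection Threshold on Detection Capability
4.7 Comparison of Independent and Collaborative Detection Results
5. Management Issues
5.1 User Strategies for Countering Spam
5.2 Enterprise-Level Spam Prevention Strategies
6. Conclusions and Future Recommendations
6.1 Conclusions
6.2 Future Research
References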


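List of Tables
Table 2-1 Related Research on Spam Detection
Table 2-2 Summary of Advantages and Disadvantages of Spam Detection Methods
Table 3-1 Contents of the Validation Data Sources
Table 3-2 Description of Training and Test Data
Table 3-3 Differences between Reinforcement Learning and This Study
Table 3-4 List of Symbols
Table 5-1 Anti-Spam Techniques, Principle 1
Table 5-2 Anti-Spam Techniques, Principle 2
Table 5-3 Anti-Spam Techniques, Principle 3
Table 5-4 Anti-Spam Techniques, Principle 4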


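List of Figures
Figure 1-1 Bar Chart of the Proportion of Spam in U.S. Email
Figure 1-2 Research Flowchart
Figure 2-1 Example of Spam Probability Ranking
Figure 2-2 Combined Hash Code and SVM Filtering Architecture
Figure 2-3 Life Cycle of Spam-Detecting Lymphocytes
Figure 2-4 Intelligent Agent
Figure 2-5 Nwana's Classification of Agents
Figure 2-6 Architecture of a Typical Multiagent System
Figure 2-7 Modes of Cooperation among Agents
Figure 3-1 Flowchart for Selecting Candidate Attributes
Figure 3-2 CBR Mechanism
Figure 3-3 Independent Detection Architecture of an Individual Agent
Figure 3-4 Schematic of Collaborative Detection Agent Operation
Figure 3-5 Collaborative Detection Operation Flowchart
Figure 4-1 Experimental Procedure
Figure 4-2 Simulation System: Initial System Menu
Figure 4-3 Experimental Simulation Procedure
Figure 4-4 Simulation System: Creating Word Variables
Figure 4-5 Simulation System: Creating Training Vectors
Figure 4-6 Simulation System: Selecting the Agent Folder Location
Figure 4-7 Simulation System: Simulation Variable Settings
Figure 4-8 Simulation System: Command Menu and Agent Information
Figure 4-9 Simulation System: Simulation Results
Figure 4-10 Reward Table Trend
Figure 4-11 Trend in Detected Mail Categories during Detection
Figure 4-12 Effect of the Number of User Agents on Detection Capability (Mean)
Figure 4-13 Effect of the Number of User Agents on Detection Capability (Standard Deviation)
Figure 4-14 Effect of the Number of Collaborative Detections on Detection Results
Figure 4-15 Effect of Different Thresholds on the Number of Collaborative Detections
Figure 4-16 Effect of Varying R+ and R- on Detection Results
Figure 4-17 Effect of Two R+/R- Settings on Detection Rate and False Positive Rate
Figure 4-18 Effect of R-=80, R+=5 on Reward Table Changes
Figure 4-19 Effect of R+=80, R-=5 on Reward Table Changes
Figure 4-20 Effect of RI on Detection Rate and False Positive Rate
Figure 4-21 Effect of RI=0 on the Reward Table Trend
Figure 4-22 Effect of RI=800 on the Reward Table Trend
Figure 4-23 Effect of Different TC Values on Detection Results
Figure 4-24 ROC Curve for Collaborative Detection
Figure 4-25 Detection Rate Comparison: Independent Detection [TS=0.999] vs. Collaborative Detection [TS=0.999, TL=0.001]
Figure 4-26 Detection Rate Comparison: Independent Detection [TS=0.5] vs. Collaborative Detection [TS=0.999, TL=0.001]
Figure 4-27 Detection Rate Comparison: Independent Detection [TS=0.5] vs. Collaborative Detection [TS=0.999999, TL=0.000001]
Figure 4-28 False Positive Rate Comparison: Independent Detection [TS=0.999] vs. Collaborative Detection [TS=0.999, TL=0.001]
Figure 4-29 False Positive Rate Comparison: Independent Detection [TS=0.5] vs. Collaborative Detection [TS=0.999, TL=0.001]
Figure 4-30 False Positive Rate Comparison: Independent Detection [TS=0.999] vs. Collaborative Detection [TS=0.999999, TL=0.000001]
Figure 4-31 Accuracy Comparison of the Four Settings
Figure 4-32 ROC Curve Comparison of Collaborative and Independent Detection
Figure 5-1 Three Stages of Spam Countermeasures
Figure 5-2 Client-Side Spam Prevention Strategy
Figure 5-3 Server-Side Spam Prevention Strategy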