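Graduate student: 林嘉翔 (Chia-shyang Lin)
Thesis title: An Enhanced Naïve Bayesian Classifier on Spam Filtering
Thesis title (Chinese): 以貝氏定理為基礎於垃圾郵件過濾之研究
Advisors: 唐順明 (S.M. Tung), 施東河 (Don-her Shieh)
Degree: Master's
Institution: National Yunlin University of Science and Technology
Department: Master's Program, Department of Information Management
Discipline: Computer Science
Field: General Computer Science
Thesis type: Academic thesis
Year of publication: 2005
Academic year of graduation: 93 (ROC calendar; 2004-2005)
Language: Chinese
Pages: 71
Keywords: spam, email, Bayes' Theorem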
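Abstract (translated from the Chinese): Given its impact on people's work and daily life, e-mail can fairly be called the "killer application" of the Internet, and it has become one of the most convenient communication channels between businesses and individuals. At the same time, most users suffer from a bombardment of spam. Among existing solutions to this problem, content-based filtering is the most suitable for use on the individual client side, and algorithms based on Bayes' theorem are the dominant approach within it. After examining two filtering methods based on Bayes' theorem, Naïve Bayes and Robinson (2003), this study proposes three improved algorithms. Experiments show that the methods using multiple attribute dimensions and feedback learning achieve lower error rates than Naïve Bayes and Robinson (2003), and that the feedback-learning algorithm also achieves an overall improvement across the evaluation metrics.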
Abstract (English): Spam is regarded as a serious threat to the Internet: it floods users' inboxes and costs businesses billions of dollars in wasted bandwidth, storage, and staff time. As the problem has worsened, a range of countermeasures has been studied, from technical to regulatory. The Naïve Bayes classifier is widely used in text categorization tasks and is also very popular among anti-spam researchers. In this study, we analyze the Naïve Bayes classifier and the modification proposed by Robinson (2003), and then propose three enhancements. The experimental results show that two of the proposed methods perform better in most cases than the traditional Naïve Bayes model, while maintaining a good detection rate and eliminating the false positive problem.
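For readers unfamiliar with the baseline, the following is a minimal sketch of a word-based Naïve Bayes spam filter of the general kind the abstract refers to. It is not the thesis's implementation: the class name `NaiveBayesSpamFilter`, the whitespace tokenizer, the `alpha` add-one (Laplace) smoothing parameter, and the toy training messages are all assumptions made for illustration, and none of the thesis's three enhancements is reproduced here.

```python
import math
from collections import Counter


def tokenize(text):
    """Split a message into lowercase whitespace-delimited tokens (an assumed, very simple tokenizer)."""
    return text.lower().split()


class NaiveBayesSpamFilter:
    """Illustrative multinomial Naive Bayes filter with additive smoothing."""

    LABELS = ("spam", "ham")

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (assumption: add-one by default)
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = {label: 0 for label in self.LABELS}

    def train(self, text, label):
        """Record one labeled message."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def log_score(self, text, label):
        """Unnormalized log posterior: log P(label) + sum of log P(word | label).

        Assumes at least one training message exists for each label.
        """
        vocab_size = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        total_docs = sum(self.doc_counts.values())
        total_words = sum(self.word_counts[label].values())
        score = math.log(self.doc_counts[label] / total_docs)
        for word in tokenize(text):
            count = self.word_counts[label][word]
            score += math.log((count + self.alpha) / (total_words + self.alpha * vocab_size))
        return score

    def classify(self, text):
        """Return the label with the higher unnormalized log posterior."""
        return max(self.LABELS, key=lambda label: self.log_score(text, label))


# Toy usage with made-up messages.
spam_filter = NaiveBayesSpamFilter()
spam_filter.train("win free money now", "spam")
spam_filter.train("free prize claim now", "spam")
spam_filter.train("meeting agenda for monday", "ham")
spam_filter.train("lunch at noon on monday", "ham")
print(spam_filter.classify("free money"))
print(spam_filter.classify("monday meeting"))
```

Working in log space avoids numeric underflow when many word probabilities are multiplied, and the smoothing term keeps a word unseen in one class from driving that class's score to negative infinity.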
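Chinese Abstract
Abstract
Acknowledgments
Table of Contents
List of Tables
List of Figures
1. Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives
1.4 Research Scope
1.5 Research Process
1.6 Thesis Organization
2. Literature Review
2.1 Basic Concepts of E-mail
2.1.1 Components of an E-mail System
2.1.2 Main E-mail Communication Protocols
2.1.3 Open Relay
2.2 Problems Caused by Spam
2.3 Challenges in Dealing with the Spam Problem
2.4 Spam Detection and Filtering
2.4.1 Social Solutions
2.4.2 Technical Solutions
2.5 Summary
3. System Design
3.1 Filtering Methods Based on Bayes' Theorem
3.1.1 Naïve Bayes Filtering
3.1.2 The Bayesian Method Proposed by Robinson (2003)
3.2 Improvements to the Methods
3.2.1 Adjusting the Posterior Probabilities of Classification Attributes
3.2.2 Increasing the Attribute Dimension
3.2.3 Feedback Learning
4. Experimental Design
4.1 Data Sets
4.2 Evaluation Metrics
4.3 Experimental Results
4.3.1 Experiment 1 ( )
4.3.2 Experiment 2 ( )
4.3.3 Discussion of the Experimental Results
5. Conclusions and Future Suggestions
5.1 Research Conclusions
5.2 Research Limitations
5.3 Future Research Directions
References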
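Chinese-language sources
1. “Enterprise Purchasing: Fighting Spam”, (2004, Sep), iThome企業採購情報誌, No. 149.
2. 徐綺憶, 邱瑞滿, 楊原杰, (2004), “Management Issues in Mail Services: A Study of Spam Identification Combining Bayes' Theorem and a Case-Based Reasoning Mechanism”, 1st Conference on Service Industry Management and Innovation.
3. 翁千婷, (2005), “Don't Let Spam Close Off Your Heart”, 科學人雜誌 (Scientific American, Chinese edition), May issue.
4. 張清溪, 許嘉棟, 劉鶯釧, 吳聰敏, (2000), Economics: Theory and Practice (Volume 1), 4th edition, Taipei, Taiwan.
5. 張漢宜, (2004), “Inside Spam / Is Viral Marketing Annoying?”, e天下雜誌, February 2004.
6. 張鎮遠, (2001), “A Study of the Regulation and Management of Spam”, Master's thesis, Graduate Institute of Business Administration, National Taiwan University.
7. 蔡承家, (2004), “Spam Terminator: Bayesian Filtering”, iThome企業採購情報誌, No. 149.
8. 蔡瓊輝, 吳志宏, (2004), “Using a Back-Propagation Neural Network to Learn Types of Spam Behavior”, 9th Conference on Artificial Intelligence and Applications.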
English-language sources
1. Allman, E., (2003), “The FTC and SPAM”, Queue, September 2003, pp. 62-69, ACM Press.
2. Androutsopoulos, I., Koutsias, J., Chandrinos, K. V., Paliouras, G., Spyropoulos, C. D., (2000), “An Evaluation of Naive Bayesian Anti-Spam Filtering”, In Workshop on Machine Learning in the New Information Age.
3. Androutsopoulos, I., Koutsias, J., Chandrinos, K. V., Spyropoulos, C. D., (2000), “An Experimental Comparison of Naïve Bayesian and Keyword-Based Anti-Spam Filtering with Personal E-mail Messages”, In Proc. of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 160-167, Athens, Greece.
4. Belkin, N. J. and Croft, W. B., (1992), “Information filtering and information retrieval: two sides of the same coin?”, Communications of the ACM, Vol. 35, Issue 12, pp. 29-38.
5. Baayen, H., Van Halteren, H., Neijt, A., and Tweedie, F., (2002), “An experiment in authorship attribution”, Proceedings of JADT 2002, St. Malo, pp. 29-37.
6. Carreras, X., Màrquez, L., (2001), “Boosting Trees for Anti-Spam Email Filtering”, Proceedings of RANLP-01, 4th International Conference on Recent Advances in Natural Language Processing.
7. Cerf, V. G., (2005, Apr), “Spam, Spim, and Spit”, Communications of the ACM, Vol. 48, Issue 4, pp. 39-43.
8. Cheng, J. and Greiner, R., (1999, Aug), “Comparing Bayesian Network Classifiers”, Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI-99), Sweden.
9. Schwartz, A., (2004, Jul), SpamAssassin, first edition, ISBN: 0-596-00707-8, Publisher: O’Reilly.
10. Cournane, A. and Hunt, R., (2004, Mar), “An Analysis of the Tools used for the Generation and Prevention of Spam”, Computers and Security, Elsevier, UK, Vol. 23, No. 2, pp. 154-166.
11. Cranor, L. F., LaMacchia, B. H., (1998, Aug), “Spam!”, Communications of the ACM, Vol. 41, No. 8, pp. 74-83.
12. Cunningham, P., Nowlan, N., Delany, S. J., Haahr, M., (2003), “A Case-Based Approach to Spam Filtering that can Track Concept Drift”, Technical Report TCD-CS-2003-16, Trinity College Dublin, Ireland.
13. Deepak, P., Parameswaran, S., (2005, Jan), “Spam filtering using spam mail communities”, Proceedings of the 2005 Symposium on Applications and the Internet, pp. 377-383.
14. Dent, K. D., (2003, Dec), Postfix: The Definitive Guide, first edition, ISBN: 0-596-00212-2, Publisher: O’Reilly.
15. Domingos, P. and Pazzani, M., (1996), “Beyond Independence: Conditions for the Optimality of the Simple Bayesian Classifier”, In Proc. of the 13th International Conference on Machine Learning, pp. 105-112, Bari, Italy.
16. Drucker, H., Wu, D., and Vapnik, V., (1999), “Support vector machines for spam categorization”, IEEE Transactions on Neural Networks, 10(5), pp. 1048-1054.
17. Dunham, M. H., (2003), Data Mining: Introductory and Advanced Topics, Pearson Education Inc.
18. Fawcett, T., (2003), “‘In vivo’ spam filtering: a challenge problem for KDD”, Contributed articles, ACM SIGKDD Explorations Newsletter, Vol. 5, Issue 2, pp. 140-148.
19. Gburzynski, P., Maitan, J., (2004, Feb), “Fighting the spam wars: A remailer approach with restrictive aliasing”, ACM Transactions on Internet Technology (TOIT), Vol. 4, Issue 1, pp. 1-30.
20. GFI Whitepaper, (2005), Why Bayesian Filtering is the most effective anti-spam technology, GFI Software Ltd.
21. Giraud-Carrier, C., (2000, Jun), “Unifying Learning with Evolution Through Baldwinian Evolution and Lamarckism: A Case Study”, In Proceedings of the Symposium on Computational Intelligence and Learning (CoIL-2000), pp. 36-41, MIT GmbH.
22. Hidalgo, J. M. G., (2002), “Evaluating Cost-sensitive Unsolicited Bulk Email Categorization”, In Proceedings of SAC-02, 17th ACM Symposium on Applied Computing, pp. 615-620, Madrid, ES.
23. Graham, P., (2004), “Better Bayesian Filtering”, In Proceedings of the Spam Conference 2004, Massachusetts Institute of Technology.
24. Graham-Cumming, J., (2003), “The spammers’ compendium”, In Proceedings of the Spam Conference 2003, Massachusetts Institute of Technology.
25. Grimes, G. A., (2004, May), “Issues with spam”, Computer Fraud & Security, Vol. 2004, Issue 5, pp. 12-16.
26. Han, J. and Kamber, M., (2001), Data Mining: Concepts and Techniques, Morgan Kaufmann, pp. 284-287.
27. Hulten, G., Penta, A., Seshadrinathan, G., Mishra, M., (2004), “Trends in Spam Products and Methods”, In First Conference on Email and Anti-Spam (CEAS) 2004 Proceedings.
28. Ivey, K. C., (1998, Apr), “Spam: the plague of junk E-mail”, IEEE Computer Applications in Power, Vol. 11, Issue 2, pp. 15-16.
29. Jin, R. and Si, L., (2004, Jul), “A study of methods for normalizing user ratings in collaborative filtering”, Proceedings of the 27th Annual International Conference on Research and Development in Information Retrieval, pp. 568-569.
30. Jung, J., Sit, E., (2004), “Traffic characterization and SPAM: An empirical study of spam traffic and the use of DNS black lists”, Proceedings of the 4th ACM SIGCOMM Conference on Internet Measurement, pp. 370-375.
31. Kohavi, R., (1995), “A study of cross-validation and bootstrap for accuracy estimation and model selection”, In Proc. of the 14th Int. Joint Conf. on AI, Vol. 2, Canada.
32. Lai, C. C., Tsai, M. C., (2004, Dec), “An Empirical Performance Comparison of Machine Learning Methods for Spam E-Mail Categorization”, Proceedings of the Fourth International Conference on Hybrid Intelligent Systems, pp. 44-48.
33. Lee, Y., (2005, Jun), “The CAN-SPAM Act: a silver bullet solution?”, Communications of the ACM, Vol. 48, Issue 6, pp. 131-132.
34. McWilliams, B. S., (2004, Oct), Spam Kings, first edition, ISBN: 0-596-00732-9, Publisher: O’Reilly.
35. Langley, P., Iba, W. and Thompson, K., (1992), “An Analysis of Bayesian Classifiers”, In Proc. of the 10th National Conference on Artificial Intelligence, pp. 223-228, San Jose, California.
36. Lashkari, Y., (1995), “Feature Guided Automated Collaborative Filtering”, Master's thesis, MIT Media Laboratory.
37. Loder, T., Alstyne, M. V., Wash, R., (2004), “An Economic Answer to Unsolicited Communication”, Proceedings of the 5th ACM Conference on Electronic Commerce, Session 2, pp. 40-50.
38. Ludlow, M., (2002), “Just 150 ‘spammers’ blamed for e-mail woe”, The Sunday Times, 1 December, p. 3.
39. Metzger, J., Schillo, M. and Fischer, K., (2003, Jun), “A multiagent-based peer-to-peer network in Java for distributed spam filtering”, In Proc. of the CEEMAS, Czech Republic.
40. O’Brien, C. and Vogel, C., (2003), “Spam filters: Bayes vs. chi-squared; letters vs. words”, In Proceedings of the International Symposium on Information and Communication Technologies.
41. O’Brien, C. and Vogel, C., (2004), “Comparing SpamAssassin with CBDF Email Filtering”, In Proceedings of the 7th Annual CLUK Research Colloquium.
42. Paulson, D. L., (2003, Jul), “Group Considers Drastic Measures to Stop Spam”, News Briefs, Computer, IEEE Computer Society, Vol. 36, Issue 7, pp. 21-22.
43. Pfleeger, S. L., Bloom, G., (2005, Mar), “Canning Spam: Proposed Solutions to Unwanted Email”, IEEE Security & Privacy, IEEE Computer Society.
44. Robinson, G., (2003, Mar), “A Statistical Approach to the Spam Problem”, Linux Journal, Vol. 2003, Issue 107.
45. Sahami, M., Dumais, S., Heckerman, D. and Horvitz, E., (1998), “A Bayesian Approach to Filtering Junk E-mail”, In Learning for Text Categorization: Papers from the AAAI Workshop, pp. 55-62, Madison, Wisconsin, AAAI Technical Report WS-98-05.
46. Sakkis, G., Androutsopoulos, I., Paliouras, G., Karkaletsis, V., Spyropoulos, C. D., Stamatopoulos, P., (2003), “A Memory-Based Approach to Anti-Spam Filtering for Mailing Lists”, Information Retrieval, 6(1), pp. 49-73.
47. Weber, R., (2004, Sep), “The Grim Reaper: The Curse of E-Mail”, Editor’s Comments, MIS Quarterly, Vol. 28, No. 3, pp. 3-13.
48. Whitworth, B., Whitworth, E., (2004, Oct), “Spam and the Social-Technical Gap”, Computer, IEEE Computer Society, Vol. 37, Issue 10, pp. 38-45.
49. Wu, Y. F., (1999), Learning with Bayesian Networks, Publications of Mississippi State University, Institute for Signal and Information Processing.
50. Zhang, L., Zhu, J., Yao, T., (2004, Dec), “An Evaluation of Statistical Spam Filtering Techniques”, ACM Transactions on Asian Language Information Processing, Vol. 3, No. 4, pp. 243-269.
51. Zhou, F., Zhuang, L., Zhao, B. Y., Huang, L., Joseph, A. D., Kubiatowicz, J., (2003), “Approximate Object Location and Spam Filtering on Peer-to-peer Systems”, In Proc. of Middleware (Rio de Janeiro, Brazil, June 2003), ACM, pp. 1-20.
Websites
1. Anti Spam Research Group, http://asrg.sp.am/
2. Bogofilter project, http://bogofilter.sourceforge.net/
3. CAUCE, About the problem, Coalition Against Unsolicited Commercial Email, available from http://www.cauce.org/about/problem.shtml
4. Ferris Research, http://www.ferris.com/
5. Graham, P., (2002, Aug), A Plan for Spam, retrieved December 23, 2004, from http://www.paulgraham.com/spam.html
6. Graham, P., (2003, Aug), Filters that Fight Back, retrieved December 23, 2004, from http://www.paulgraham.com/ffb.html
7. Jupiter Research, http://www.jupiterresearch.com/bin/item.pl/home
8. Mason, J., (2004), The SpamAssassin Homepage, available from http://spamassassin.org/index.html
9. Nielsen//NetRatings, http://www.netratings.com/
10. Prakash, V. V., Vipul’s Razor, http://razor.sourceforge.net/
11. SpamBayes Project, http://spambayes.sourceforge.net/
12. Spammers Grab MSN Hotmail addresses, http://www.spamhaus.org/news.lasso?article=6
13. RFC Editor Homepage, http://www.rfc-editor.org/
14. Sullivan, B., (2003, Aug), Spam Wars: How Unwanted Email is Burying the Internet, MSNBC, at http://www.msnbc.com/news/941040.asp
15. The Internet Engineering Task Force, http://www.ietf.org/
16. UCI Machine Learning Repository, http://www.ics.uci.edu/~mlearn/MLRepository.html
17. 李欣茹, (2004), Spam Is Severe and Enterprises Have a Headache, retrieved December 23, 2004, from http://taiwan.cnet.com/enterprise/features/0,2000062876,20085772-3,00.htm
18. 李欣茹, 郭和杰, (2004), Survey Report on Office “Mail Harassment”, retrieved December 23, 2004, from http://taiwan.cnet.com/enterprise/features/0,2000062876,20085772-2,00.htm
19. Market Intelligence Center, Institute for Information Industry (資策會市場情報中心), retrieved May 5, 2005, from http://mic.iii.org.tw/intelligence/