National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

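Author: Yin-Kai Chen (陳胤凱)
Title: 一個以決策樹為基礎的三階段監督式學習垃圾郵件過濾架構
Title (English): A three-phase supervised learning spam filtering architecture base on decision tree
Advisor: Wei-Pang Yang (楊維邦)
Degree: Master's
Institution: National Dong Hwa University (國立東華大學)
Department: Master's Program in Digital Knowledge Management (數位知識管理碩士學位學程)
Field: Business and Management
Subfield: Other Business and Management
Thesis type: Academic thesis
Year of publication: 2009
Graduation academic year: 97 (ROC calendar)
Language: Chinese
Pages: 50
Keywords (Chinese): 機器學習; 垃圾郵件過濾; 決策樹
Keywords (English): Machine Learning; Decision Tree; Spam Filtering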
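  With the rapid growth of the Internet, e-mail has become a convenient channel for distributing advertisements and viruses and for launching attacks. Because the problems caused by the rapid growth of spam are increasingly serious, researchers have proposed a variety of countermeasures, among which machine learning algorithms are widely used. However, most current machine learning algorithms are split into two phases, training and classification, and their rules are no longer modified once training is complete, so they still fall short against spammers' constantly changing distribution methods.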
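  This study therefore proposes a three-phase supervised learning spam filtering architecture that can learn continuously. Unlike typical machine learning methods, which have only a training phase and a classification phase, it adds a re-learning phase. The resulting training, classification, and re-learning phases can be cycled to keep learning, which addresses the drawback that typical machine learning algorithms stop changing their rules once training ends.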
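  The train, classify, re-learn cycle described above can be sketched as follows. This is only an illustrative sketch and not the thesis's implementation: the class `ThreePhaseFilter`, the helper `learn_rules`, and the token-set message format are all hypothetical, and a deliberately simple keyword rule learner stands in for the decision tree the thesis uses.

```python
from collections import Counter


def learn_rules(samples):
    """Return the tokens that occur only in spam among the labelled samples.

    Stand-in rule learner (assumption): the thesis derives its rules from a
    decision tree, which is not reproduced here.
    """
    spam_counts, ham_counts = Counter(), Counter()
    for tokens, label in samples:
        (spam_counts if label == "spam" else ham_counts).update(tokens)
    return {t for t in spam_counts if ham_counts[t] == 0}


class ThreePhaseFilter:
    """Hypothetical filter that cycles through training, classification, re-learning."""

    def __init__(self, labelled_samples):
        # Phase 1 (training): build the initial rules from labelled mail.
        self.samples = [(frozenset(t), lbl) for t, lbl in labelled_samples]
        self.rules = learn_rules(self.samples)

    def classify(self, tokens):
        # Phase 2 (classification): spam if any spam-only token is present.
        return "spam" if self.rules & set(tokens) else "ham"

    def relearn(self, tokens, true_label):
        # Phase 3 (re-learning): fold corrected feedback back into the rules,
        # so the rules keep changing after the initial training.
        self.samples.append((frozenset(tokens), true_label))
        self.rules = learn_rules(self.samples)


spam_filter = ThreePhaseFilter([
    ({"viagra", "offer"}, "spam"),
    ({"meeting", "report"}, "ham"),
])
before = spam_filter.classify({"lottery", "winner"})   # unseen kind of spam
spam_filter.relearn({"lottery", "winner"}, "spam")      # user corrects the verdict
after = spam_filter.classify({"lottery", "prize"})
```

  In this toy run the unseen message is first passed as ham and, after one round of feedback, a similar message is caught, which is the behaviour the re-learning phase is meant to provide.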
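  We also propose two algorithms, RMT_Initial and RMT_Modify, which build and maintain a reversal mechanism table for each rule to realize the concept of graylisting [Harris, 2003]. Combined with the three-phase architecture, the table lets the filter judge a message's legitimacy from the current state, which mitigates the black-or-white limitation of typical filtering methods.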
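  One way to picture a per-rule table that yields a gray verdict is sketched below. The abstract does not specify how RMT_Initial and RMT_Modify work, so everything here is an assumption made for illustration: the names `RuleState`, `init_table`, `update_table`, and `current_verdict`, the error counter, and the two thresholds are hypothetical and are not the thesis's algorithms.

```python
from dataclasses import dataclass


@dataclass
class RuleState:
    verdict: str      # verdict the rule was trained with: "spam" or "ham"
    errors: int = 0   # net count of feedback that contradicted the verdict


def init_table(rules):
    """Create one table entry per rule (rules: mapping of rule id to verdict)."""
    return {rule_id: RuleState(verdict) for rule_id, verdict in rules.items()}


def update_table(table, rule_id, true_label):
    """Record feedback for a message that matched the given rule."""
    state = table[rule_id]
    if true_label != state.verdict:
        state.errors += 1          # the rule was wrong on this message
    elif state.errors > 0:
        state.errors -= 1          # the rule was right, so trust recovers


def current_verdict(table, rule_id, gray_after=1, reverse_after=3):
    """Judge from the rule's current state rather than a fixed yes/no answer.

    The thresholds are assumed values chosen only for this sketch.
    """
    state = table[rule_id]
    if state.errors >= reverse_after:
        return "ham" if state.verdict == "spam" else "spam"   # reversed
    if state.errors >= gray_after:
        return "gray"                                         # undecided
    return state.verdict


table = init_table({"r1": "spam"})
history = [current_verdict(table, "r1")]
for label in ["ham", "ham", "ham", "spam"]:
    update_table(table, "r1", label)
    history.append(current_verdict(table, "r1"))
```

  The point of the sketch is the third state: a rule whose verdict has recently been contradicted is neither trusted nor discarded outright, and only repeated contradiction reverses it.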
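  Experiments show that continuous learning can block new types of spam and achieve an accuracy of at least 96.75%. The experimental results also show that, with better training data and sufficient learning, accuracy can exceed 99.3%.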
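Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives and Methods
1.4 Thesis Organization
Chapter 2 Literature Review
2.1 Spam Filtering Technologies
2.2 Machine Learning Methods
2.3 Decision Tree Algorithms
Chapter 3 Three-Phase System Architecture
3.1 Preprocessing and Data Structure Definitions
3.2 Training Phase
3.2.1 Rule Generation Stage
3.2.2 Reversal Mechanism Stage
3.3 Classification Phase
3.4 Re-learning Phase
Chapter 4 Experimental Results and Analysis
4.1 Parameter Optimization
4.2 Detailed Experimental Data
4.3 Comparison of Experimental Data
Chapter 5 Contributions and Future Work
Chapter 6 References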
[Alpaydin, 2004] E. Alpaydin, Introduction to Machine Learning, 1st ed. MIT Press, 2004.
[Androutsopoulos, 2000] I. Androutsopoulos, J. Koutsias, K. V. Chandrinos, G. Paliouras, and C. D. Spyropoulos, “An Evaluation of Naïve Bayesian Anti-spam Filtering,” Proc. of the Workshop on Machine Learning in the New Information Age, 11th European Conference on Machine Learning (ECML 2000), pp. 9-17, 2000.
[Berry, 1997] M. J. A. Berry, and G. S. Linoff, Data Mining Techniques: for Marketing, Sales, and Customer Support, Wiley Computer, New York, 1997.
[Breiman, 1984] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees, The Wadsworth Statistics/Probability Series, Wadsworth International Group, Belmont, CA, 358 pp., 1984.
[Breiman, 2001] L. Breiman, “Random forests,” Mach. Learn., vol. 45, pp. 5–32, 2001.
[Carreras, 2001] X. Carreras, and L. Marquez, “Boosting Tree for Anti-Spam Email Filtering,” Proc. of RANLP-01, 4th International Conference on Recent Advances in Natural Language Processing, Tzigov Chark, BG, 2001.
[Cohen, 1996] W. W. Cohen, “Learning Rules that Classify E-Mail,” Proc. of the 1996 AAAI Spring Symposium on Machine Learning in Information Access, 1996.
[Drucker, 1999] H. Drucker, D. Wu, and V. N. Vapnik, “Support Vector Machines for Spam,” IEEE Transactions on Neural Networks, 10(5), pp. 1048-1054, 1999.
[Dudley, 2008] J. Dudley, L. Barone, and L. While, “Multi-objective Spam Filtering Using an Evolutionary Algorithm,” IEEE Congress on Evolutionary Computation (CEC 2008, IEEE World Congress on Computational Intelligence), pp. 123-130, 1-6 Jun. 2008.
[Dwork, 2003] C. Dwork, A. Goldberg, and M. Naor, “On memory-bound functions for fighting spam,” Proc. of the 23rd Annual International Cryptology Conference (CRYPTO 2003), Aug. 2003.
[Garretson, 2007] C. Garretson, “Summer of Spam: Record Growth, Record Irritation,” Aug. 16, 2007. Retrieved May 24, 2009, from http://www.pcworld.com/article/id,136061-c,spam/article.html
[Golbeck, 2004] J. Golbeck, and J. Hendler, “Reputation network analysis for email filtering,” Proc. of the First Conference on Email and Anti-Spam (CEAS), 2004.
[Gómez, 2000] J. M. Gómez, M. Maña-López, and E. Puertas, “Combining Text and Heuristics for Cost-Sensitive spam Filtering,” Proc. of the Fourth Computational Natural Language Learning Workshop, CoNLL-2000, Association for Computational Linguistics, 2000.
[Goodman, 2005] J. Goodman, D. Heckerman and R. Rounthwaite, “Stopping Spam-What can be done to stanch the flood of junk e-mail messages?,” Scientific American Magazine, Apr. 2005.
[Hall, 1998] R. J. Hall, “How to avoid unwanted email,” Communications of the ACM, Mar. 1998.
[Harris, 2003] E. Harris, “The Next Step in the Spam Control War: Graylisting,” Retrieved Dec. 21, 2008, from http://projects.puremagic.com/greylisting
[Hastie, 2001] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, New York: Springer-Verlag, 2001.
[Jayaraj, 2008] A. Jayaraj, T. Venkatesh, and C. S. R. Murthy, “Loss Classification in Optical Burst Switching Networks using Machine Learning Techniques: Improving the Performance of TCP,” IEEE Journal on Selected Areas in Communications, Volume 26, Issue 6, Part Supplement, pp. 45-54, Aug. 2008.
[Jingrui, 2007] J. He and B. Thiesson, “Asymmetric Gradient Boosting with Application to Spam Filtering,” 2007.
[Kirkos, 2007] E. Kirkos, C. Spathis, and Y. Manolopoulos, “Data Mining techniques for the detection of fraudulent financial statements,” Expert Systems With Applications, Vol. 32, Issue 4, pp. 995-1003, 2007.
[Konstantin, 2004] T. Konstantin, “Machine Learning Techniques in Spam Filtering,” Data Mining Problem-oriented Seminar, MTAT.03.177, pp. 60-79, 2004.
[Koprinska, 2007] I. Koprinska, J. Poon, J. Clark, and J. Chan, “Learning to classify e-mail,” Information Sciences, vol. 177, pp. 2167-2187, 2007.
[Lieven, 2006] P. Lieven, “Pre-MX Spam Filtering with Adaptive Greylisting Based on Retry Patterns,” Heinrich-Heine-Universität Düsseldorf, pp. 5-8, 2006.
[Manco, 2002] G. Manco, E. Masciari, M. Ruffolo and A. Tagarelli, “Towards an Adaptive Mail Classifier,” Proc. Conf. of the Italian Association of Artificial Intelligence, 2002.
[Metsis, 2006] V. Metsis, I. Androutsopoulos and G. Paliouras, “Spam Filtering with Naive Bayes - Which Naive Bayes?,” Proc. of the 3rd Conference on Email and Anti-Spam (CEAS 2006), Mountain View, CA, USA, 2006.
[Ohmann, 1996] C. Ohmann, V. Moustakis, Q. Yang and K. Lang, “Evaluation of automatic knowledge acquisition techniques in the diagnosis of acute abdominal pain,” Artificial Intelligence in Medicine, vol. 8, no. 1, pp. 23-36, 1996.
[Pölzlbauer, 2008] G. Pölzlbauer, T. Lidy, and A. Rauber, “Decision Manifolds - A Supervised Learning Algorithm Based on Self-Organization,” IEEE Transactions on Neural Networks, Volume 19, Issue 9, pp. 1518-1530, Sep. 2008.
[Quinlan, 1979] J. R. Quinlan, “Discovering rules from large collections of examples: a case study,” in D. Michie (Ed.), Expert Systems in the Microelectronic Age, Edinburgh, Scotland: Edinburgh University Press, 1979.
[Quinlan, 1986] J. R. Quinlan, “Induction of Decision Trees,” Machine Learning, Volume 1, Number 1, pp 81-106, 1986.
[Quinlan, 1993] J. R. Quinlan, C4.5: Programs For Machine Learning, Morgan Kaufmann, Los Altos, 1993.
[Sahami, 1998] M. Sahami, S. Dumais, D. Heckerman, and E. Horvitz, “A Bayesian Approach to Filtering Junk E-Mail,” in Learning for Text Categorization: Papers from the AAAI Workshop, pp. 55-62, 1998.
[Schryen, 2007] G. Schryen, Anti-spam Measures: Analysis and Design, Springer, 2007.
[Schwartz, 2004] A. Schwartz, SpamAssassin, O’Reilly, 2004.
[Score-Networks, 2006] Score-Networks, Global Internet Population Report. Retrieved Dec. 21, 2008, from http://www.score-network.org
[Sebastiani, 2002] F. Sebastiani, “Machine learning in automated text categorization,” ACM Computing Surveys, 34(1):1-47, 2002.
[Sheu, 2009] J. J. Sheu, “An Efficient Two-phase Spam Filtering Method Based on E-mails Categorization,” International Journal of Network Security, Vol. 9, No. 1, pp. 34-43, 2009.
[Sheu et al., 2009] J. J. Sheu and K. T. Chu, “An efficient spam filtering method by analyzing e-mail’s header session only,” accepted by International Journal of Innovative Computing, Information and Control, 2009.
[Shin, 2006] D. Shin, J. Ahn, and C. Shim, “Progressive Multi Gray-Leveling: A Voice Spam Protection Algorithm,” IEEE Network, Volume 20, Issue 5, pp. 18-24, Sep.-Oct. 2006.
[Symantec Messaging and web Security, 2007] Symantec Messaging and web Security, “The State of Spam: A monthly Report-June 2007,” Jun. 2007.
[Tompkins, 2003] T. Tompkins, and D. Handley, “Giving e-mail back to the users: Using digital signatures to solve the spam problem,” First Monday, 8(9), Sep. 2003.
[Vapnik, 1995] V. Vapnik, The Nature of Statistical Learning Theory, New York: Springer-Verlag, 1995.
[Wang, 2003] J. H. Wang, and L. F. Chien, “Toward Automated E-mail Filtering-An Investigation of Commercial and Academic Approaches,” Proc. of TANET, pp. 687-692, 2003.
[Witten, 1999] I. H. Witten, and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann, 1999.