Author (English): Hsiang-Chun Kuo
Thesis title (English): Acquiring Domain Knowledge of SPAM Classification Using Fuzzy Repertory Grids
Advisors (English): Chih-Hung Wu, Shing-Hwang Doong
Keywords (English): Spam Filtering; Repertory Grid; Fuzzy Repertory Table; Personal Construct Theory; Knowledge Acquisition
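Spam is causing increasingly serious problems. It has evolved from malicious attacks and annoying promotions into a vehicle for crime such as phishing, which steals users' important personal data, financial account numbers, and passwords. People judge spam with high accuracy by drawing on tacit background knowledge, so knowledge acquisition methods can be used to capture the distinctive experience of experts. This study adopts the Repertory Grid technique, a knowledge acquisition method from artificial intelligence, which follows an individual's own cognitive model and represents elements and attribute constructs in a structured matrix grid using numeric scales. The strength of the relation between an element and an attribute construct was originally rated on a scale of 1 to 5: for example, 1 means a strong negative relation, 3 means no relation, and 5 means a strong positive relation. We found that many descriptions of these relation strengths, such as "a large amount" or "midnight", cannot be distinguished by a plain 1 to 5 rating. Membership functions from fuzzy theory, however, can be used to build a Fuzzy Repertory Table that turns such linguistic descriptions into numbers. This study identifies effective conditions and uses machine learning to derive rules over those conditions, so that spam is identified with high accuracy.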
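As a minimal sketch of how a linguistic description such as "midnight" or "a large amount" could be turned into a number, the Python below uses a trapezoidal membership function (the table of contents lists trapezoidal fuzzy set functions in Chapter 3). The function names `trapezoid`, `late_night`, and `large_volume`, and every breakpoint value, are illustrative assumptions and are not taken from the thesis.

```python
import math


def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 1 on [b, c], 0 outside (a, d), linear in between."""
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)  # rising edge between a and b
    return (d - x) / (d - c)      # falling edge between c and d


def late_night(hour):
    """Degree to which a sending hour (0-24) counts as 'midnight'.

    Hours are shifted by 12 so that the wrap-around at 24:00 falls in the
    middle of the axis. The breakpoints (21:00, 23:00, 04:00, 06:00) are
    assumed values for illustration only.
    """
    shifted = (hour - 12) % 24
    return trapezoid(shifted, 9, 11, 16, 18)


def large_volume(recipients):
    """Degree to which a recipient count counts as 'a large amount'.

    Assumed breakpoints: not large at 20 or fewer, fully large from 100 up.
    """
    return trapezoid(recipients, 20, 100, math.inf, math.inf)


if __name__ == "__main__":
    for hour in (0, 5, 12, 22):
        print(f"late_night({hour}) = {late_night(hour)}")
    for n in (10, 60, 500):
        print(f"large_volume({n}) = {large_volume(n)}")
```

A cell of a fuzzy repertory table could then hold a membership function like these instead of a single rating from 1 to 5, which is the limitation the abstract describes.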
Most spam filters today compare the contents of emails against specific keywords, an approach that is not robust because spammers frequently change the terms they use. Human beings, by contrast, can successfully judge whether an incoming email is spam or ham from many features associated with the message, of which keywords are only one. In this study, we apply the fuzzy repertory grid technique to acquire the knowledge needed to select behavioral features of spamming. Human concepts are elicited and classified with the repertory grid technique, and concepts that involve uncertainty are described as fuzzy sets. These features are then converted into filtering rules using a deductive algorithm and integrated into the spam filtering mechanism. The results indicate that such features are more robust than keyword-based ones, so spam filtering built on them can perform better.
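The abstract does not spell out how a filtering rule over fuzzy behavioral features is evaluated, so the following Python is only a sketch under stated assumptions: a rule is a conjunction of fuzzy conditions, the conjunction is taken as the minimum of the membership degrees (a common choice in fuzzy rule systems), and a message is flagged when that degree reaches a threshold. The feature names, breakpoints, and threshold are hypothetical.

```python
def ramp_up(x, lo, hi):
    """Membership rising linearly from 0 at lo to 1 at hi."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


# Hypothetical rule: IF recipient count is "large" AND link count is "many"
# THEN the message is spam. Feature names and breakpoints are assumptions.
RULE = {
    "recipients": lambda n: ramp_up(n, 20, 100),
    "links": lambda n: ramp_up(n, 2, 10),
}


def firing_strength(rule, message):
    """Degree to which every condition of the rule holds (min as fuzzy AND)."""
    return min(membership(message[name]) for name, membership in rule.items())


def is_spam(rule, message, threshold=0.5):
    """Flag the message when the rule fires at least as strongly as threshold."""
    return firing_strength(rule, message) >= threshold


if __name__ == "__main__":
    bulk = {"recipients": 60, "links": 10}
    personal = {"recipients": 1, "links": 10}
    print(firing_strength(RULE, bulk), is_spam(RULE, bulk))
    print(firing_strength(RULE, personal), is_spam(RULE, personal))
```

Because the conditions describe sending behavior rather than wording, a rule of this shape does not depend on any particular keyword, which matches the robustness argument made in the abstract.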
Table of Contents

Abstract
Acknowledgments
List of Tables
List of Figures
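Chapter 1 Introduction
Section 1 Research Background
Section 2 Research Motivation
Section 3 Research Objectives
Section 4 Main Contributions and Thesis Organization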
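Chapter 2 Related Work and Literature Review
Section 1 Basic Architecture, Components, and Delivery Process of Email
Section 2 Traditional Anti-Spam Methods
Section 3 Spam Filtering Knowledge and Sending Behavior
Section 4 Knowledge Acquisition
1. Personal Construct Theory
2. The Repertory Grid Technique
3. Applied Research on Repertory Grids and Fuzzy Grids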
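Chapter 3 Knowledge Acquisition Techniques
Section 1 The Repertory Grid Technique
Section 2 The Fuzzy Grid Technique
1. Trapezoidal Fuzzy Set Functions
2. Attribute Feature Similarity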
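Chapter 4 Research Framework and Methods
Section 1 Research Concept and Objectives
Section 2 Research Method Design
1. Defining the Classification Elements for Spam
2. Eliciting the Attribute Constructs of the Classification Elements
3. Rating with the Fuzzy Repertory Table
4. Generating Email Behavior Rules
5. Validating the Rules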
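Chapter 5 Experimental Results and Analysis
Section 1 Experiment 1: Collecting Terms That Describe Spam
1. Purpose of the Experiment
2. Experimental Design and Procedure
3. Tabulated Results
4. Explanation and Discussion of Results
Section 2 Experiment 2: Carrying Out the Fuzzy Repertory Table Procedure
1. Purpose of the Experiment
2. Experimental Design and Procedure
Section 3 Experiment 3: Rule Accuracy Experiment I
1. Purpose of the Experiment
2. Experimental Design and Procedure
3. Tabulated Results
4. Explanation and Discussion of Results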
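Chapter 6 Conclusions and Discussion
Section 1 Conclusions
Section 2 Discussion
Section 3 Future Work and Recommendations
References
Appendices
Appendix 1: Experiment Statistics Tables
Appendix 2: Crisp and Fuzzified Descriptive Terms and Membership Functions from Experiment 1
Appendix 3: Completed Fuzzy Repertory Table from Experiment 2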
[1] A. Gray and M. Haahr, 2004, “Personalised, Collaborative Spam Filtering”, First Conference on Email and Anti-Spam (CEAS).
[2] B. Mobasher, H. Dai, T. Luo, Y. Sun and J. Zhu, 2000, “Integrating web usage and content mining for more effective personalization”, in Proceedings of the International Conference on E-Commerce and Web Technologies, pp. 165-76.
[3] Bryan Costales and Eric Allman, 2002, sendmail, 3rd Edition, O'Reilly & Associates, Inc.
[4] Castro, J.L., Castro-Schez, J.J., Zurita, J.M., 1999, “Learning maximal structure rules in fuzzy logic for knowledge acquisition in expert systems”, Fuzzy Sets and Systems 101, 345–353.
[5] Castro, J.L., Castro-Schez, J.J., Zurita, J.M., 2001, “Use of a fuzzy machine learning technique in the knowledge acquisition process”, Fuzzy Sets and Systems 123 (3), 307–320.
[6] Jose Jesus Castro-Schez, Juan Luis Castro, Jose Manuel Zurita, 2004, “Fuzzy repertory table: a method for acquiring knowledge about input variables to machine learning algorithm”, IEEE T. Fuzzy Systems 12(1): 123-139.
[7] Elisabeth Crawford, Judy Kay, and Eric McCreath, 2001, “Automatic Induction of Rules for e-mail Classification”, in Proceedings of the Sixth Australasian Document Computing Symposium, Coffs Harbour, Australia, December 7.
[8] Evans, J., 1988, “The knowledge elicitation problem: a psychological perspective”, Behavior and Information Technology 1 (2), 111–130.
[9] Gammack, J., Young, R., 1985, “Psychological techniques for eliciting expert knowledge”, in: Bramer, M. (Ed.), Research and Development in Expert Systems. Cambridge University Press, London, pp. 105–112.
[10] Graesser, A.C., Gordon, S.E., 1991, “Question answering and the organization of world knowledge”, in: Kessen, W., Ortony, A., Craik, F. (Eds.), Memories, Thoughts, and Emotions: Essays in Honor of George Mandler. Erlbaum, Hillsdale, NJ.
[11] G. Boone, 1998, “Concept features in Re:Agent, an intelligent email agent”, in K. P. Sycara and M. Wooldridge, editors, Proceedings of the 2nd International Conference on Autonomous Agents, pages 141–148, New York, May 9–13.
[12] H. Drucker, Donghui Wu and V. N. Vapnik, 1999, “Support vector machines for spam categorization”, IEEE Transactions on Neural Networks, vol. 10, pp. 1048-1054.
[13] Hooman Katirai, 1999, Filtering junk E-mail: A performance comparison between genetic programming and naive Bayes. Available from: http://citeseer.ist.psu.edu/katirai99filtering.html.
[14] Ion Androutsopoulos, John Koutsias, Konstantinos V. Chandrinos and Constantine D. Spyropoulos, 2000, “An Experimental Comparison of Naive Bayesian and Keyword-Based Anti-Spam Filtering with Personal E-mail Messages”, in Proc. of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2000), Jul.
[15] J.J. Rocchio, 1971, “Relevance feedback in information retrieval”, in: G. Salton (ed.), The Smart retrieval system: experiments in automatic document processing, Prentice Hall, pp. 313-323.
[16] Jefferson Provost, 1999, “Naive-Bayes vs. Rule-Learning in Classification of Email”, Technical Report AI-TR-99-284, The University of Texas at Austin, Artificial Intelligence Lab.
[17] Jose Jesus Castro-Schez, Nicholas R. Jennings, Xudong Luo, Nigel R. Shadbolt, 2004, “Acquiring domain knowledge for negotiating agents: a case of study”, Int. J. Hum.-Comput. Stud. 61(1): 3-31.
[18] Kelly, G.A., 1955, The Psychology of Personal Constructs. Norton, New York.
[19] McDonald, J., Dearholt, D., Paap, K., Schvaneveldt, R., 1986, “A formal interface design methodology based on user knowledge”, CHI’86 Proceedings, pp. 285–290.
[20] Mehran Sahami, Susan Dumais, David Heckerman, and Eric Horvitz, 1998, “A Bayesian approach to filtering junk E-mail”, in Learning for Text Categorization: Papers from the 1998 Workshop, Madison, Wisconsin.
[21] O. de Vel, A. Anderson, M. Corney, and G. Mohay, 2001, “Mining e-mail content for author identification forensics”, ACM SIGMOD Record, vol. 30, pp. 55-64.
[22] Richard B. Segal and Jeffrey O. Kephart, 1999, “Mailcat: An Intelligent Assistant for Organizing E-Mail”, in Proceedings of the Third Annual Conference on Autonomous Agents, Seattle, Washington, United States.
[23] Robert J. Hall, 1996, “Channels: Avoiding Unwanted Electronic Mail”, in Proceedings of the 1996 DIMACS Symposium on Network Threats, volume 38, Piscataway, NJ, December 4-6.
[24] Schreiber, G., Akkermans, H., Anjewierden, A., Hoog, R., Shadbolt, N., Van de Velde, W., Wielinga, B., 2000, Knowledge Engineering and Management: The CommonKADS Methodology. Massachusetts Institute of Technology Press, Cambridge, MA.
[25] Shadbolt, N., Burton, A.M., 1989, “The empirical study of knowledge elicitation techniques”, SIGART Newsletter 108, 15–18.
[26] Spamhaus, 2004, The Spamhaus Project, http://www.spamhaus.org/.
[27] Spamhaus, 2006, “The Definition of Spam”, http://www.spamhaus.org/definition.html.
[28] Shortliffe, E.H. and B.G. Buchanan, 1975, “A Model of Inexact Reasoning in Medicine”, Mathematical Biosciences, Vol. 23, pp. 351-379.
[29] UCEB, 2004, SPAM blocking blackhole list, http://www.uceb.org/.
[30] W. W. Cohen, 1995, “Fast effective rule induction”, in Proceedings of the Twelfth International Conference on Machine Learning (ICML-95), pages 115–123, San Francisco, CA.
[31] William W. Cohen, 1996, “Learning Rules that Classify E-Mail”, Proc. AAAI-1996 Spring Symposium on Machine Learning in Information Access, pp. 124-143, March.
[32] William W. Cohen, 1995, “Fast Effective Rule Induction”, in Armand Prieditis and Stuart Russell, editors, Proceedings of the 12th International Conference on Machine Learning, pages 115–123, Tahoe City, CA.
[33] Y. Diao, H. Lu, and D. Wu, 2000, “A Comparative Study of Classification Based Personal E-mail Filtering”, in Proceedings of the 4th Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 408-419, April.
[34] Zadeh, L.A., 1975a, “The calculus of fuzzy restrictions”, in: Zadeh, L.A., et al. (Ed.), Fuzzy Sets and Applications to Cognitive and Decision Making Processes. Academic Press, New York, pp. 1–39.
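[35] National Communications Commission (國家通訊傳播委員會), 2005, Draft Act Governing Unsolicited Commercial Email (濫發商業電子郵件管理條例草案), http://www.dgt.gov.tw/chinese/ncc/mail-requlation/ncc-mail-requlation.shtml
[36] CNET, 2005, Key Report on Spam (垃圾郵件關鍵報告), http://www.cnet.com/.
[53] 廖書瑩, 2003, Using Repertory Grid to Acquire Classification Knowledge for Detecting Spam (利用Repertory Grid擷取出偵測垃圾郵件的分類知識), Master's thesis, Graduate Institute of Information Management, Shu-Te University.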