Graduate student (English name): Hsiang-Chun Kuo
Thesis title (English): Acquiring Domain Knowledge of SPAM Classification Using Fuzzy Repertory Grids
Advisors (English names): Chih-Hung Wu, Shing-Hwang Doong
Keywords (English): Spam Filtering, Repertory Grid, Fuzzy Repertory Table, Personal Construct Theory, Knowledge Acquisition
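Spam is causing increasingly serious problems. Once a vehicle for malicious attacks and annoying promotions, it has become a medium for crime such as phishing, which steals users' important personal data, financial account numbers, and passwords. Humans judge spam with high accuracy by drawing on tacit background knowledge, so knowledge acquisition methods can be used to capture experts' distinctive experience. This study adopts the Repertory Grid technique, a knowledge acquisition method from artificial intelligence, which builds on an individual's own cognitive model and uses a numerical scale to represent elements and attribute constructs in a structured matrix grid. The strength of the relationship between an element and an attribute construct was originally rated on a scale of 1 to 5: for example, level 1 indicates a strong negative relationship, level 3 indicates no relationship, and level 5 indicates a strong positive relationship. We found that many element-construct relationships, such as "large volume" or "midnight", cannot be distinguished by a simple 1 to 5 rating. However, membership functions from fuzzy theory can be used to build a Fuzzy Repertory Table that quantifies such linguistic descriptions. This study identifies the effective conditions and uses machine learning to derive rules over them, so that spam can be identified with high accuracy.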
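As a rough illustration of the two representations described above, the sketch below builds a small 1 to 5 repertory grid and then a fuzzy repertory table whose cells are membership degrees. The element names, ratings, membership breakpoints, and the helper names `trapezoid`, `midnight`, and `large_volume` are all hypothetical; the thesis does not specify them.

```python
# Classic repertory grid: rows are elements (emails), columns are constructs.
# Ratings: 1 = strong negative relation, 3 = no relation, 5 = strong positive.
# All names and ratings here are made up for illustration.
constructs = ["sent in bulk", "sent at midnight"]
grid = {
    "newsletter": [4, 2],
    "phishing mail": [5, 4],
    "colleague reply": [1, 3],
}

def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 outside (a, d), 1 on [b, c], linear between."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

def midnight(hour):
    """Degree to which an hour (0-23) counts as 'midnight' (assumed breakpoints)."""
    # Map late-evening hours to negatives so midnight sits mid-range: 22 -> -2.
    shifted = hour - 24 if hour >= 12 else hour
    return trapezoid(shifted, -3, -1, 2, 4)

def large_volume(recipients):
    """Degree to which a recipient count is 'large' (assumed ramp from 10 to 100)."""
    return min(1.0, max(0.0, (recipients - 10) / 90))

# Fuzzy repertory table: cells hold membership degrees computed from raw data.
raw = {
    "phishing mail": {"recipients": 500, "hour": 2},
    "colleague reply": {"recipients": 1, "hour": 14},
}
fuzzy_table = {
    name: {
        "large volume": large_volume(m["recipients"]),
        "midnight": midnight(m["hour"]),
    }
    for name, m in raw.items()
}
```

The point of the fuzzy table is that a term like "midnight" receives a graded degree rather than being forced into one of five fixed levels.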
Most spam filters today compare email contents against specific keywords, an approach that is not robust because spammers frequently change the terms they use. Human beings, by contrast, can successfully judge whether an incoming email is spam or ham from many features associated with the email, keywords among them. In this paper, we apply the fuzzy repertory grid technique to acquire knowledge for selecting behavioral features of spamming. Human concepts are extracted and classified with repertory grid techniques, and concepts involving uncertainty are described as fuzzy sets. These features are then converted into filtering rules using a deductive algorithm and integrated with the spam filtering mechanism. Because such features are more robust than keyword-based ones, spam filtering built on them can achieve better performance.
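To make the idea of filtering rules over fuzzy behavioral features concrete, here is a minimal sketch. It assumes min for AND within a rule, max across rules, and a 0.5 decision threshold. These choices, the feature names, and the function `spam_score` are illustrative assumptions, not the rule format or algorithm used in the thesis.

```python
def spam_score(memberships, rules):
    """Strongest rule activation: AND within a rule is min, OR across rules is max."""
    return max(min(memberships[f] for f in rule) for rule in rules)

# Hypothetical rules: each tuple lists features that must hold together.
rules = [
    ("large volume", "midnight"),  # IF large volume AND midnight THEN spam
    ("forged sender",),            # IF forged sender THEN spam
]

# Hypothetical membership degrees for one incoming email.
email = {"large volume": 0.8, "midnight": 0.6, "forged sender": 0.1}

score = spam_score(email, rules)
is_spam = score >= 0.5  # assumed decision threshold
```

Rules of this kind depend on how a message was sent rather than on its wording, so rewording the message alone does not change the features they test.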
