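Graduate student: 楊原杰
Graduate student (English): Yuan-Jie Yang
Thesis title: 使用關聯規則分析及決策樹分析進行垃圾郵件之特徵關聯規則探勘與比較
Thesis title (English): Mining and Comparison of Feature Association Rules for Spam by Using Association rules Analysis and Decision Tree Analysis
Advisor: 邱瑞滿
Advisor (English): Jui-Man Chiu
Degree: Master's
Institution: 開南管理學院 (Kainan Management College)
Department: Department of Information Management, Master's Program
Discipline: Computer Science
Field: General Computer Science
Thesis type: Academic thesis
Year of publication: 2005
Graduation academic year: 93 (ROC calendar, 2004-2005)
Language: Chinese
Number of pages: 66
Keywords (Chinese): 垃圾郵件, 廣告郵件, 超文件標示語言, 關聯規則分析, 決策樹分析
Keywords (English): Spam, Advertising mail, Hypertext markup language, Association Rule Analysis, Decision tree analysis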
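The flood of spam in today's society has caused serious losses of time and resources for individuals and enterprises. This thesis therefore addresses the current spam problem. After reviewing the filtering methods and feature extraction techniques of many researchers, we propose not using the mail content or the mail header as the data source for learning and filtering, and instead extracting features of the advertising techniques that advertising mail employs. Here, "advertising technique" refers to the Hypertext Markup Language (HTML) that is used heavily in spam. We use Java to connect to a mail server and extract the HTML in spam directly as the data source. We then mine rules with the Apriori algorithm from association rule analysis and the C4.5 algorithm from decision tree analysis, compare the rules produced by the two algorithms to find the associations among feature attributes, and use the conclusions to infer what advertising mail looks like.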
Spam floods people's mailboxes and causes severe losses of time and resources for individuals and enterprises. In the past, many researchers used the content or the header of emails as the data source for learning or filtering when deciding whether an email should be classified as spam. In this thesis, we propose a novel method to mine the feature association rules hidden in spam. Having observed that spam generally relies on many advertising techniques, we extract features from the Hypertext Markup Language (HTML) of the emails. First, we use Java to connect to a mail server and extract the HTML as our data source. Next, we use the Apriori algorithm for association rule analysis and the C4.5 algorithm for decision tree analysis to mine the association rules among features hidden in the HTML of spam. Finally, we compare the rules obtained to find the associations between features, and use the results to infer the typical form of spam.
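The pipeline described in the abstract (HTML features in, association rules out) can be illustrated with a minimal sketch. This is not the thesis's implementation: the thesis used Java, and its actual feature set is not given in this record. The sketch below uses Python's standard `html.parser`, a hypothetical tag list (`table`, `img`, `a`, `font`), and hypothetical thresholds, and it covers only the first two levels of Apriori (single items and pairs). The C4.5 decision tree step is not shown.

```python
from collections import Counter
from html.parser import HTMLParser
from itertools import combinations


class TagCounter(HTMLParser):
    """Counts opening tags seen in an HTML body."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this hook.
        self.counts[tag] += 1


def html_features(html, tags=("table", "img", "a", "font")):
    """Map one HTML body to a set of binary items such as 'has_img'.

    The tag list is a hypothetical choice made for illustration.
    """
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return frozenset(f"has_{t}" for t in tags if parser.counts[t] > 0)


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def pair_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Return rules (lhs, rhs, support, confidence) between single items.

    Follows the Apriori idea of pruning: only items that are frequent
    on their own are combined into candidate pairs.
    """
    if not transactions:
        return []
    items = sorted({i for t in transactions for i in t})
    frequent = [i for i in items if support({i}, transactions) >= min_support]
    rules = []
    for a, b in combinations(frequent, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support({lhs}, transactions)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules
```

As a usage example, feeding each message body through `html_features` and passing the resulting list to `pair_rules` yields rules of the form "messages that contain an image also contain a link". A full Apriori run would continue to larger itemsets in the same way, extending only the itemsets that met the minimum support at the previous level.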
Table of Contents
Acknowledgments
Abstract (Chinese)
Abstract
Chapter 1 Introduction
1.1 Research Motivation
1.2 Thesis Outline
Chapter 2 Literature Review
2.1 Definition of Spam
2.2 Spam Filtering Process
2.3 Spam Preprocessing and Feature Extraction
2.4 Filtering Methods
2.4.1 Clustering Analysis
2.4.2 Naive Bayesian
2.4.3 Memory-Based Reasoning
2.4.4 Survey of the Current State of Spam
2.5 Performance Evaluation
2.5.1 Recall and Precision
2.5.2 Total Cost Ratio
Chapter 3 Methodology
3.1 System Architecture
3.1.1 System Platform and Tools
3.1.2 Data Flow
3.1.3 Mail Features
3.2 Data Mining Methods
3.2.1 Data Preprocessing
3.2.2 Association Rule Analysis
3.2.3 Decision Tree Analysis
3.3 Experimental Results
3.3.1 Association Rule Analysis Results
3.3.2 Decision Tree Analysis Results
Chapter 4 Conclusion
References
Appendix 1: Association Rule Mining Results
1.1 Apriori for TRN
1.2 Apriori for TN
1.3 Apriori for TDN
1.4 Apriori for MS
1.5 Apriori for LN
1.6 Apriori for IN
1.7 Apriori for FN
1.8 Apriori for BS
1.9 Apriori for BN

Appendix 2: Decision Tree Mining Results
2.1 Decision Tree for TRN
2.2 Decision Tree for TN
2.3 Decision Tree for TDN
2.4 Decision Tree for MS
2.5 Decision Tree for LN
2.6 Decision Tree for IN
2.7 Decision Tree for FN
2.8 Decision Tree for BS
2.9 Decision Tree for BN