National Digital Library of Theses and Dissertations in Taiwan

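Graduate student: 黎俊達 (Li, Chungta)
Thesis title: A method of spam detection based on structural similarity
Advisor: 林柏青 (Lin, Poching)
Oral defense committee: 葉春超 (Yeh, Chunchao), 江為國 (Chiang, Weikuo), 黃啟富 (Huang, Chifu)
Oral defense date: 2012-07-10
Degree: Master's
Institution: National Chung Cheng University
Department: Graduate Institute of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical and Information Engineering
Thesis type: Academic thesis
Year of publication: 2012
Academic year of graduation: 100
Language: English
Number of pages: 41
Keywords: Spam (垃圾郵件), Clustering (分群), Document similarity (文件相似度)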
Spammers usually deliver a large number of spam instances generated
from a set of templates. To identify spam messages in the same campaigns or
to detect new spam instances that are likely to belong to known campaigns,
we propose a method to group spam messages based on their HTML structural
features. We observe that spam mails tend to have similar structures of
the mail bodies, even though the words in the bodies can be significantly
different to evade spam detection. Rather than infer the templates and represent
them in regular expressions, we extract the HTML tags from the mail bodies
as the structural features, and build a fingerprint for each structure. With
the fingerprints, we can efficiently identify the clusters of similar structures
using the simhash algorithm and the Jaccard similarity. The identification
is useful to find new spam instances belonging to known structures with a
high recall up to around 95%, while the false-positive rates for normal mails
can be less than 5%.
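
The pipeline described in the abstract (extract HTML tags, fingerprint the tag structure, compare with simhash and Jaccard similarity) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the 3-tag shingle size, the 64-bit fingerprint width, the use of MD5 as the feature hash, and all function names are assumptions chosen only for illustration.

```python
import hashlib
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records tag names in document order and ignores all text content."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_endtag(self, tag):
        self.tags.append("/" + tag)


def extract_tags(html):
    """Return the sequence of HTML tags found in a mail body."""
    parser = TagCollector()
    parser.feed(html)
    parser.close()
    return parser.tags


def shingles(tags, k=3):
    """Set of k consecutive tags; k=3 is an assumed, illustrative value."""
    if len(tags) < k:
        return {tuple(tags)}
    return {tuple(tags[i:i + k]) for i in range(len(tags) - k + 1)}


def simhash(features, bits=64):
    """Simhash fingerprint: each feature votes +1 or -1 on every bit."""
    counts = [0] * bits
    for feature in features:
        digest = hashlib.md5(" ".join(feature).encode("utf-8")).digest()
        value = int.from_bytes(digest[:bits // 8], "big")
        for i in range(bits):
            counts[i] += 1 if (value >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if counts[i] > 0)


def hamming_distance(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Two mails with the same structure but different wording, and one normal mail.
spam_a = "<html><body><table><tr><td><a href='x'>Buy pills</a></td></tr></table></body></html>"
spam_b = "<html><body><table><tr><td><a href='y'>Cheap watches now</a></td></tr></table></body></html>"
ham = "<html><body><p>Hi</p><p>See you <b>tomorrow</b></p></body></html>"

features_a = shingles(extract_tags(spam_a))
features_b = shingles(extract_tags(spam_b))
features_ham = shingles(extract_tags(ham))

print(hamming_distance(simhash(features_a), simhash(features_b)))
print(jaccard(features_a, features_b), jaccard(features_a, features_ham))
```

Because only tags are kept, the two sample spam mails yield identical feature sets despite their different text, while the normal mail shares no 3-tag shingle with them. In practice a mail would be assigned to a known structure when the Hamming distance falls below, or the Jaccard similarity rises above, a threshold; the threshold values are tuning parameters not given here.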

1 Introduction
2 Background and Related Work
2.1 Anti-Spam Approaches
2.2 Document similarity methods
2.3 Clustering Method
3 Methodology
3.1 Observation
3.1.1 Case Studies
3.1.2 Data set analysis
3.2 System Design
3.2.1 Tag Retrieval
3.2.2 Cluster Merge
3.3 Detection Algorithm
3.4 Evasion Method
4 Experimental Results
4.1 Experiment parameter setting
4.2 Clustering and validation based on simhash
4.3 Clustering and validation with Jaccard similarity
4.4 Case Studies
5 Conclusion