跳到主要內容

臺灣博碩士論文加值系統

(3.231.230.177) 您好!臺灣時間:2021/07/28 19:53
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::

詳目顯示

: 
twitterline
研究生:謝宗霖
研究生(外文):Tsung-LinHsieh
論文名稱:基於Boyer-Moore演算法的多重字串比對演算法
論文名稱(外文):A Boyer-Moore Based Multiple Pattern Matching Algorithm
指導教授:張燕光
指導教授(外文):Yeim-Kuan Chang
學位類別:碩士
校院名稱:國立成功大學
系所名稱:資訊工程學系碩博士班
學門:工程學門
學類:電資工程學類
論文種類:學術論文
論文出版年:2012
畢業學年度:100
語文別:英文
論文頁數:63
中文關鍵詞:網路入侵偵測系統樣式比對位移-基礎樣式比對演算法
外文關鍵詞:Network Intrusion Detection SystemsPattern MatchingShift-based Pattern Matching algorithm
相關次數:
  • 被引用被引用:0
  • 點閱點閱:215
  • 評分評分:
  • 下載下載:12
  • 收藏至我的研究室書目清單書目收藏:0
近年來,隨著網路速度越來越快,大量的病毒及惡意攻擊不斷的在網際網路間傳播。網路入侵偵測系統是一套設計來專門監測網路上是否含有惡意攻擊封包及病毒。一個有效率的樣式比對演算法在網路入侵偵測系統中扮演著重要的角色。藉由事先定義好病毒碼及惡意攻擊的內容,網路入侵偵測系統比對封包的內容看其是否包含已定義過的病毒碼。位移-基礎(shift-based)樣式比對演算法,像是KMP及Boyer-Moore演算法,為其中一大類應用在網路入侵偵測系統的方法。其優點在於能節省很多不需要的字元比對來提高樣式比對的效率。但是當事先定義好的病毒碼數量過多時,將造成病毒碼間關聯性較高而降低可以位移的距離,進而掩蓋位移-基礎(shift-based)樣式比對演算法的優點。
在這篇論文中,我們將以Boyer-Moore單一樣式比對演算法為基礎,實作出Boyer-Moore演算法的多重樣式版本。藉由增大最大可位移的距離,避免更多不需要的字元比對。在大部分的情況下,我們提出的方法在同一輸入字元只會有一次字元比對。而在原始Boyer-Moore演算法,同一輸入字元則可能被比對多次。使用的病毒規則表是取自於ClamAV此開放式源碼的軟體。根據我們模擬實作的結果,與Aho-Corasick演算法比較之下,在較小的病毒規則表,我們提出的方法能節省了超過75%不必要的字元比對。而在較大的病毒規則表,搭配分組的方法,在不同的輸入檔案中依然節省了33%到63%不必要的字元比對。藉由壓縮資料結構,我們提出的方法僅使用Aho-Corasick演算法0.91倍的記憶體使用量。
In recent years, with the grown of network speeds, a lot of viruses and malicious attacks have spread continuously over the Internet. Network intrusion detection system (NIDS) is designed especially for monitoring the viruses or malicious attacks by inspecting the contents of packets travelling over the Internet. An efficient pattern matching algorithm plays an important role in the NIDS. Based on the pre-defined virus signatures and malicious attack contents (called patterns), NIDS performs the pattern matching operations to find whether any pre-defined pattern exists in the payload of any packet. Shift-based schemes such as KMP and Boyer-Moore are the main pattern matching algorithms adopted in NID systems. The advantage of shift-based schemes is that they avoid many character comparisons to gain the performance of pattern matching. But when the set of the pre-defined patterns become larger, the relation between patterns would become closer so that the shift distance may be too small to fully take the advantage of the shift-based schemes.
In this thesis, we will use Boyer-Moore single pattern matching algorithm as our foundation and implement a multiple pattern matching version from it. By increasing the biggest shift distance, we can avoid more unnecessary character comparisons. In most cases, the same input character is only matched once in the proposed scheme while some input characters may have to be compared many times in the original Boyer-Moore scheme. The pattern set that we use is from ClamAV, the open source software. According to our simulation results, comparing with Aho-Corasick algorithm, our method can save more than 75% unnecessary character comparisons in small pattern set. In the set with more patterns, our scheme enhanced by a grouping method still saves 33% to 63% unnecessary character comparisons in different input files. By compressing the data structure, it only costs 0.91 times of memory usage comparing to Aho-Corasick algorithm.
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Organization of The Thesis 2
Chapter 2 Background 3
2.1 Pattern Matching Problem 3
2.2 Real-Life Pattern Sets 3
2.2.1 ClamAV 3
2.2.2 Snort 4
2.2.3 Bro 5
Chapter 3 Related Work 7
3.1 Automata-Based Algorithms 7
3.1.1 Knuth-Morris-Pratt Algorithm 7
3.1.2 Aho-Corasick Algorithm 9
3.2 Shift-Based Algorithms 11
3.2.1 Boyer-Moore Algorithm 11
3.2.2 Wu-Manber Algorithm 14
3.3 Algorithms of Other Mechanisms 16
3.3.1 Combined Algorithm 16
3.3.2 Bloom Filter Based Algorithms 19
3.3.3 TCAM-Based Algorithms 20
Chapter 4 Proposed Scheme 23
4.1 Preliminaries 23
4.2 The First Data Structure 24
4.3 The Second Data Structure 26
4.4 The Searching Process 31
4.5 Enhancement on Searching Speed 34
4.5.1 Handling the Repetitively Scanning Case 34
4.5.2 Problem When Use Exception Table 38
4.6 Divide Pattern Set into Groups 42
4.7 Table Compression 46
Chapter 5 Experimental Result 48
5.1 Simulation Environment 48
5.2 Pattern Set and Input Trace 48
5.3 Dividing Pattern Set into Sub-Pattern Sets 49
5.3.1 Deciding Number of Groups 49
5.3.2 The Method of Classifying Patterns 52
5.4 Simulation Results 53
5.4.1 Results of Proposed Scheme 53
5.4.2 Results with Exception Table 59
Chapter 6 Conclusions and Future Work 61
References 62

[1]V. Aho and M. J. Corasick, “Efficient string matching: An aid to bibliography search, Communications of the ACM, vol. 18, no.6, pp.333-340, 1975.
[2]B. H. Bloom, “Space/Time Trade-offs in Hash Coding with Allowable Errors. Communications of the ACM, vol.13, no.7, pp.422-426, July 1970.
[3]R. S. Boyer and J. S. Moore, “A fast string searching algorithm, Communications of the ACM, vol. 20, no 10, pp.762-772, Oct. 1977.
[4]Bro: The Bro Network Security Monitor, www.bro-ids.org
[5]Clam Anti-Virus signature database, www.clamav.net.
[6]B. Commentz-Walter “A string matching algorithm fast on the average, Proc. vol. 6 International Colloquium on Automata, Languages, and Programming, pp.118-132, 1979.
[7]L. Fan, P. Cao, J. Almeida, and A. Z. Broder, “Summary Cache: A scalable Wide-Area Web Cache Sharing Protocol, IEEE/ACM Transactions on Networking, vol.8, no.3, pp.281-293, June 2000.
[8]R. Nigel Horspool, “Practical fast searching in strings. Software–Practice & Experience 10–6 (1980) pp.501-506.
[9]Andrew Hume & Daniel Sunday, “Fast string searching. Software–Practice &Experience 21-11 (1991) pp.1221-1248.
[10]E. Knuth, J. H. M. Jr., and V. R. Pratt, “Fast pattern matching in strings, SIAM Journal on Computing, vol. 6, no. 2, pp. 323–350, June 1977.
[11]P. C. Lin, Y. D. Lin, and Y. C. Lai, “A Hybrid Algorithm of Backward Hashing and Automaton Tracking for Virus Scanning, IEEE Transactions on Computers, vol. 60, no. 4, April 2011.
[12]C. R. Meiners, J. Patel, E. Norige, E. Torng and A. X. Liu, “Fast Regular Expression Matching Using Small TCAMs For Network Intrusion Detection and Prevention Systems, in Proceedings of the 19th USENIX conference on Security, 2010.
[13]Snort: The Open Source Network Intrusion Detection System, www.snort.org
[14]The Shmoo Group: the Capture the Flag Data. http://cctf.shmoo.com/
[15]Sun Wu, Udi Manber, “A fast algorithm For Multi-Pattern Searching, Technical Report TR 94-17, University of Arizona at Tuscon, May 1994
[16]D. H. Yang, X. Ke and C. Yong, “An Improved Wu-Manber Multiple Patterns Matching Algorithm IEEE International Performance Computing and Communications Conference, pp. 675-680, 2006.
[17]F. Yu, R. H. Katz, and T. V. Lakshman, “Gigabit Rate Packet Pattern-Matching Using TCAM, in Proceeding of 12th IEEE International Conference on Network Protocols (ICNP’04), pp. 174-183, 2004.
[18]F. Zane, G. Narlikar and A. Basu, “CoolCAMs: Power-Efficient TCAMs for Forwarding Engines. IEEE International Conference on Computer Communications, vol.1, pp.42-45, March 2003.

連結至畢業學校之論文網頁點我開啟連結
註: 此連結為研究生畢業學校所提供,不一定有電子全文可供下載,若連結有誤,請點選上方之〝勘誤回報〞功能,我們會盡快修正,謝謝!
QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top