National Digital Library of Theses and Dissertations in Taiwan

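Author: 卓函穎
Author (English): CHO-HAN YIN
Title (Chinese): 應用熵與主成份分析法於網路流量異常分析之研究
Title (English): A Study of Applying Entropy and Principal Component Analysis for Networking Traffic Anomaly Analysis
Advisor: 賴裕昆
Advisor (English): Yu-Kuen Lai
Degree: Master's
Institution: Chung Yuan Christian University
Department: Graduate Institute of Electrical Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Publication year: 2012
Academic year of graduation: 100 (ROC calendar)
Language: Chinese
Pages: 69
Keywords (Chinese): Network traffic analysis; principal component analysis
Keywords (English): Network traffic analysis; PCA; Entropy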
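Network anomalies have many possible causes, and network administrators can identify the likely source of anomalous traffic by analyzing network flow information. Such causes include flash crowds, congestion, equipment failure, link outages, worms, and attacks such as distributed denial of service (DDoS). Analyzing packet flow information makes it possible to detect anomalies in real time and to quickly resolve the problems they cause on the network. However, network links now run at 10 Gbps and 40 Gbps, and can even reach 100 Gbps. Analyzing large volumes of high-dimensional data on such high-speed networks is a challenging task.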

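This thesis examines streaming algorithms that reduce high-dimensional traffic data in real time, and uses entropy and principal component analysis (PCA) to analyze traffic quickly. Entropy is computed over the packet flows classified by specific header fields. Changes in the entropy value reflect how dispersed the flow occurrence counts are, which makes it possible to find traffic variations. Applying PCA to the entropy values shows how the principal component scores and factor loadings change when anomalous traffic occurs, which in turn makes it possible to distinguish the type of anomaly. The thesis examines three streaming entropy estimation algorithms. These algorithms process the continuous packet stream in a single pass, without storing packet information and re-reading it for analysis, so the analysis is faster and uses less storage, meeting the goal of real-time analysis. The thesis combines the entropy estimation algorithms with PCA to realize a real-time network traffic analysis method. We run simulations and analysis on real network traffic traces that contain anomalies, detecting anomalies and distinguishing their types in real time while also saving space.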
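The entropy-then-PCA pipeline described above can be sketched in a few lines of Python. This is a minimal illustration, not the thesis's implementation: the field names `src_ip` and `dst_ip`, the synthetic windows, and all helper names are assumptions, the entropy is computed exactly rather than with the streaming estimators the thesis studies, and the first principal component is found by power iteration only to keep the example free of dependencies. The returned weights are the unit eigenvector of the covariance matrix, not the correlation-based factor loadings the thesis reports.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy in bits of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def window_entropies(packets, fields):
    """Entropy of each header field over one time window of packets."""
    return [shannon_entropy([p[f] for p in packets]) for f in fields]


def first_principal_component(rows, iterations=200):
    """Return (weights, scores) of the first principal component.

    rows: one entropy vector per time window (at least two windows).
    Columns are mean-centered, then the dominant eigenvector of the
    covariance matrix is approximated by power iteration.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Arbitrary non-uniform start vector (an assumption of this sketch);
    # power iteration fails if the start is orthogonal to the answer.
    vec = [float(j + 1) for j in range(d)]
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:  # no variance at all: keep the start vector
            break
        vec = [x / norm for x in nxt]
    scores = [sum(c[j] * vec[j] for j in range(d)) for c in centered]
    return vec, scores


FIELDS = ("src_ip", "dst_ip")  # hypothetical choice of header fields


def normal_window(k):
    """Synthetic window: 8 sources and 8 destinations, evenly used."""
    return [{"src_ip": f"10.0.0.{i % 8}",
             "dst_ip": f"192.168.0.{(3 * i + k) % 8}"} for i in range(64)]


def scan_window():
    """Synthetic window: one source probing 64 distinct destinations."""
    return [{"src_ip": "10.0.0.66", "dst_ip": f"172.16.0.{i}"}
            for i in range(64)]


windows = [normal_window(k) for k in range(5)] + [scan_window()]
entropy_rows = [window_entropies(w, FIELDS) for w in windows]
weights, scores = first_principal_component(entropy_rows)
anomalous = max(range(len(scores)), key=lambda i: abs(scores[i]))
print("entropy per window:", entropy_rows)
print("window with the largest |score|:", anomalous)
```

In the scan window the source entropy collapses while the destination entropy rises, so the two weights take opposite signs and that window's score stands apart from the others. This is the kind of pattern the abstract refers to when it says the loadings help distinguish the anomaly type.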



It is a challenging task to analyze network traffic and identify anomalies in real time at wire speed. In this thesis, we propose a framework that uses limited memory to perform network traffic analysis in core networks. The design is based on a sketch algorithm implemented in hardware to digest high-dimensional traffic information at wire speed. The resulting compact data structure is then fed back to the system, which analyzes it with entropy and principal component analysis (PCA) in a streaming fashion. Simulations on several real-world traffic traces demonstrate the effectiveness of the framework. We also discuss in depth the system design tradeoffs among accuracy, time, and space.
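To make the idea of a compact sketch data structure concrete, here is a small software Count-Min sketch in Python. The thesis outline lists the Count-Min sketch among the algorithms it studies, but everything below is an assumption made for illustration: the class and method names, the width and depth, the flow-key strings, and the use of salted BLAKE2b as the per-row hash. It does not reproduce the hardware design or the entropy estimation step.

```python
import hashlib


class CountMinSketch:
    """Fixed-size counter table; point queries never underestimate."""

    def __init__(self, width, depth):
        self.width = width   # counters per row
        self.depth = depth   # number of independent hash rows
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # One hash function per row, derived by salting BLAKE2b with the
        # row number (an illustrative choice, not the thesis's hash).
        digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8,
                                 salt=row.to_bytes(8, "little")).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key, count=1):
        """Record `count` occurrences of `key` in a single pass."""
        for row in range(self.depth):
            self.table[row][self._index(row, key)] += count

    def estimate(self, key):
        """Minimum over rows: true count plus any collision noise."""
        return min(self.table[row][self._index(row, key)]
                   for row in range(self.depth))


sketch = CountMinSketch(width=256, depth=4)
heavy = "10.0.0.66->192.168.0.9"          # hypothetical flow key
true_counts = {heavy: 500}
for i in range(50):
    true_counts[f"10.0.1.{i}->192.168.0.{i}"] = 1
for flow, count in true_counts.items():
    sketch.add(flow, count)

print("heavy flow estimate:", sketch.estimate(heavy))
```

Memory use depends only on `width * depth`, not on the number of distinct flows, which is what makes this family of structures attractive when the stream cannot be stored. The price is that colliding flows inflate each other's counts, so estimates can be too high but never too low.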



Table of Contents

Chinese Abstract
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables

1 Introduction
1.1 Research background and motivation
1.2 Research objectives
1.3 Thesis organization

2 Related Work
2.1 Traffic data processing methods
2.1.1 Sampling
2.1.2 Sketch algorithms
2.1.3 Wavelet transform
2.1.4 Principal component analysis (PCA)
2.1.5 Entropy
2.1.6 Kullback-Leibler divergence (K-L distance)
2.2 Traffic observation methods
2.2.1 Sliding window

3 Literature Review
3.1 Studies on principal component analysis
3.2 Studies on entropy
3.3 Studies combining entropy with principal component analysis

4 Traffic Anomaly Analysis Framework
4.1 Entropy processing
4.2 Principal component analysis processing
4.2.1 Principal component indicators
4.3 Analysis framework

5 Entropy Estimation Algorithms
5.1 Sieving algorithm
5.2 Hybrid algorithm
5.3 Count-Min sketch algorithm
5.4 Discussion and comparison of entropy estimation algorithms
5.4.1 Algorithm characteristics
5.4.2 Space usage and error rate
5.4.3 Adding entropy estimation to the analysis framework

6 Implementation and Discussion of the Analysis Framework
6.1 Traffic traces
6.2 Entropy of the traffic traces
6.3 Principal component analysis results
6.4 Discussion of experiments
6.5 Anomaly source lookup

7 Conclusions and Future Work
7.1 Conclusions
7.2 Future work

References


List of Figures
3.1.1 Origin-destination flows and per-link traffic time series [1]
3.1.2 Share of the original data's information explained by PCA of the traffic [1]
3.1.3 Principal component scores of traffic on a normal principal component [1]
3.1.4 Principal component scores of traffic on an anomalous principal component [1]
3.2.1 Changes in normal traffic when a worm attack occurs [2]
3.2.2 Characteristic changes of a port scan in the entropy time series [3]
3.3.1 Eigenvalues and cumulative eigenvalue distribution after PCA [4]
3.3.2 Factor loadings [4]

4.1.1 Entropy computation
4.1.2 Data distributions represented by entropy values
4.2.1 Illustration of the data matrix X
4.2.2 Illustration of the covariance computation
4.2.3 Eigenvalues and eigenvectors of the covariance matrix
4.2.4 Deriving factor loadings from correlation coefficients
4.2.5 Computing principal component scores
4.3.1 System architecture flowchart

5.1.1 Sample and hold
5.1.2 Sieving for elephant
5.2.1 Hybrid counting (example)
5.2.2 Deletion at the end of an interval in the Hybrid algorithm
5.2.3 Final deletion in the Hybrid algorithm
5.3.1 Count-Min Sketch counting
5.3.2 Count-Min Sketch query
5.4.1 Traffic analysis framework with entropy estimation added

6.2.1 Entropy of the MAWI trace
6.2.2 Entropy of the MAWI trace merged with the Witty worm trace
6.2.3 Entropy of the MAWI trace merged with the Slammer worm trace
6.2.4 Entropy of the LLS_DDOS trace
6.3.1 First principal component scores for the Witty worm
6.3.2 First principal component scores for the Slammer worm
6.3.3 First principal component scores for LLS_DDOS
6.4.1 Illustration of the number of anomalous intervals included
6.4.2 Number of anomalous intervals versus change in principal component scores
6.4.3 Illustration of the intervals used for PCA
6.4.4 Distribution of Slammer worm totals
6.4.5 Slammer entropy distribution (5-minute intervals)
6.4.6 Relationship between interval length and entropy variability
6.5.1 MAWI trace: occurrences in past intervals of flows looked up from the latest interval
6.5.2 Witty trace: occurrences in past intervals of flows looked up from the latest interval
6.5.3 Slammer trace: occurrences in past intervals of flows looked up from the latest interval
6.5.4 10G trace: occurrences in past intervals of flows looked up from the latest interval

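List of Tables
3.1 Feature distributions affected by common traffic anomalies [3]
3.2 The 10 variables analyzed by Fukuda, K. et al.

5.1 Relationship between error rate and space usage of the entropy estimation algorithms

6.1 MAWI trace information
6.2 Count_Min sketch parameter settings
6.3 Mean difference from the exact entropy for the four variables of each trace
6.4 Principal component analysis settings
6.5 Mean difference between consecutive intervals
6.6 Mean difference between one variable and the other variables
6.7 Eigenvalues and cumulative eigenvalue distribution after Witty is detected
6.8 Eigenvalues and cumulative eigenvalue distribution after Slammer is detected
6.9 Eigenvalues and cumulative eigenvalue distribution after LLS_DDOS is detected
6.10 Factor loading analysis for the Witty worm
6.11 Factor loading analysis for the Slammer worm
6.12 Factor loading analysis for LLS_DDOS
6.13 Factor loadings when PCA uses 丵 intervals
6.14 Number of anomalous intervals versus change in first principal component factor loadings
6.15 Information on the traces used to look up the number of intervals in which flows appear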

References
[1] A. Lakhina, M. Crovella, and C. Diot. Diagnosing network-wide traffic anomalies. In ACM SIGCOMM Computer Communication Review, volume 34, page 219–230, 2004.
[2] A. Wagner and B. Plattner. Entropy based worm and anomaly detection in fast IP networks. 2005.
[3] A. Lakhina, M. Crovella, and C. Diot. Mining anomalies using traffic feature distributions. In Proceedings of the 2005 conference on Applications, technologies, architectures, and protocols for computer communications, page 217–228, 2005.
[4] K. Fukuda, T. Hirotsu, O. Akashi, and T. Sugawara. A pca analysis of daily unwanted traffic. In 2010 24th IEEE International Conference on Advanced Information Networking and Applications, page 377–384, 2010.
[5] G. Nychis, V. Sekar, D.G. Andersen, H. Kim, and H. Zhang. An empirical evaluation of entropy-based traffic anomaly detection. In Proceedings of the 8th ACM SIGCOMM conference on Internet measurement, page 151–156, 2008.
[6] Q. Quan, C. Hong-Yi, and Z. Rui. Entropy based method for network anomaly detection. In 2009 15th IEEE Pacific Rim International Symposium on Dependable Computing, page 189–191, 2009.
[7] D. Brauckhoff, K. Salamatian, and M. May. Applying PCA for traffic anomaly detection: Problems and solutions. In INFOCOM 2009, IEEE, page 2866–2870, 2009.
[8] C. Callegari, L. Gazzarrini, S. Giordano, M. Pagano, and T. Pepe. A novel multi timescales pca-based anomaly detection system. In Performance Evaluation of Computer and Telecommunication Systems (SPECTS), 2010 International Symposium on, page 156–162, 2010.
[9] C. Issariyapat and K. Fukuda. Anomaly detection in IP networks with principal component analysis. In Communications and Information Technology, 2009. ISCIT 2009. 9th International Symposium on, page 1229–1234, 2009.
[10] P. Barford, J. Kline, D. Plonka, and A. Ron. A signal analysis of network traffic anomalies. In Proceedings of the 2nd ACM SIGCOMM Workshop on Internet measurment, page 71–82, 2002.
[11] X. Li, F. Bian, M. Crovella, C. Diot, R. Govindan, G. Iannaccone, and A. Lakhina. Detection and identification of network anomalies using sketch subspaces. In Proceedings of the 6th ACM SIGCOMM conference on Internet measurement, page 147–152, 2006.
[12] Y. Liu, L. Zhang, and Y. Guan. Sketch-based streaming PCA algorithm for network-wide traffic anomaly detection. In 2010 International Conference on Distributed Computing Systems, page 807–816, 2010.
[13] J. Mai, C.N. Chuah, A. Sridharan, T. Ye, and H. Zang. Is sampled data sufficient for anomaly detection? In Proceedings of the 6th ACM SIGCOMM conference on Internet measurement, page 165–176, 2006.
[14] J. Shlens. A tutorial on principal component analysis. Systems Neurobiology Laboratory, University of California at San Diego, 2005.
[15] L. Zheng, P. Zou, Y. Jia, and W. Han. Traffic anomaly detection and containment using filter-ary-sketch. Procedia Engineering, 29:4297–4306, 2012.
[16] H. Ringberg, A. Soule, J. Rexford, and C. Diot. Sensitivity of PCA for traffic anomaly detection. ACM SIGMETRICS Performance Evaluation Review, 35(1):109–120, 2007.
[17] I. T. Jolliffe. Discarding variables in a principal component analysis. I: Artificial data. Applied statistics, page 160–173, 1972.
[18] A. Lall, V. Sekar, M. Ogihara, J. Xu, and H. Zhang. Data streaming algorithms for estimating entropy of network traffic. ACM SIGMETRICS Performance Evaluation Review, 34(1):145–156, 2006.
[19] C. Estan and G. Varghese. New directions in traffic measurement and accounting. In ACM SIGCOMM Computer Communication Review, volume 32, page 323–336, 2002.
[20] G. Shen, J. Zhu, and Z. Qin. A hybrid algorithm for accurate stream entropy estimation. In Wireless Communications, Networking and Mobile Computing, 2007. WiCom 2007. International Conference on, page 2302–2305, 2007.
[21] G. Cormode and M. Hadjieleftheriou. Finding frequent items in data streams. Proceedings of the VLDB Endowment, 1(2):1530–1541, 2008.
[22] G. Cormode and S. Muthukrishnan. An improved data stream summary: the count-min sketch and its applications. Journal of Algorithms, 55(1):58–75, 2005.
[23] MAWI working group traffic archive. http://mawi.wide.ad.jp/mawi/.
[24] CAIDA witty worm. http://www.caida.org/research/security/witty/.
[25] MS-SQL slammer traffic analysis. http://rbeverly.net/research/slammer/.
[26] MIT Lincoln Laboratory: Communications & information technology. http://www.ll.mit.edu/mission/communications/ist/index.html.
[27] MAWI trace "200302270030.dump". http://mawi.wide.ad.jp/mawi/samplepointB/20030227/200302270030.html.
[28] MAWI trace "200302270200.dump". http://mawi.wide.ad.jp/mawi/samplepointB/20030227/200302270200.html.
[29] Passive monitor: equinix-sanjose. http://www.caida.org/data/monitors/passiveequinix-sanjose.xml.
[30] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker, and J. Turner. OpenFlow: enabling innovation in campus networks. ACM SIGCOMM Computer Communication Review, 38(2):69–74, 2008.
Electronic full text (access to the full text of this thesis is limited to the campus systems and IP range of the author's university)