National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

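Graduate student: 廖又葳
Graduate student (English name): Yu-Wei Liao
Thesis title (Chinese): 使用權重動態視窗之密度導向的局部離群值偵測演算法
Thesis title (English): A New Density-Based Local Outlier Detection Algorithm Using Weighted Dynamic Window
Advisor: 吳俊霖
Oral defense committee: 李怡靜, 范志鵬
Oral defense date: 2017-07-18
Degree: Master's
Institution: National Chung Hsing University (國立中興大學)
Department: Department of Computer Science and Engineering (資訊工程學系所)
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Publication year: 2017
Graduation academic year: 105 (ROC calendar)
Language: English
Pages: 55
Keywords (translated from Chinese): local outlier detection; unsupervised outlier detection; anomaly detection; density-based outlier detection; dynamic window; weighting assignment; weighted dynamic window
Keywords (English): Local outlier; Unsupervised Outlier Detection; Anomaly Detection; Density-Based Outlier Factor; Dynamic window; Weighting assignment; Weighted dynamic window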
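An outlier is a data sample that is inconsistent with the rest of the dataset. Outlier detection is an important research topic in data analysis and pattern recognition, and it is widely applied in fields such as industry, multimedia, business, and engineering. This study focuses on detecting local outliers, that is, data samples that are very dissimilar to their surrounding data (rather than to the global dataset). The existing density-based outlier detection algorithm, the local outlier factor (LOF), has the following problems: (1) it cannot find outliers effectively when the density distribution of the dataset is uneven (containing both dense and sparse clusters) or when clusters overlap slightly; (2) when searching for the k nearest neighbors, it is sensitive to the choice of the parameter k, which can easily affect the accuracy of outlier detection.
This study therefore proposes a density-based local outlier detection algorithm using a weighted dynamic window. In the unsupervised setting, it addresses the problems above by expanding a dynamic window and assigning weights, with the main aim of giving more relevant data greater influence. In our experiments, we tested the proposed algorithm on synthetic and real-world data, and the results confirm that the proposed method is more robust and detects outliers more effectively.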
An outlier is an observation that is distant from the other observations. Outlier detection is an important research topic in data analysis and pattern recognition, and it has been widely used in various knowledge domains. This study focuses on local outlier detection, i.e., detecting a sample that is dissimilar to its surrounding data (rather than to the global dataset). The existing density-based outlier detection algorithm, the local outlier factor (LOF), has the following problems: (1) it does not perform well when the dataset is imbalanced or when the density distributions overlap; (2) it is sensitive to the choice of the parameter k used to find nearest neighbors. This study aims to develop a better-performing density-based outlier detection method that addresses these problems. By using a dynamic window and weighting assignment, the proposed method detects outliers effectively and robustly. Experiments on synthetic and real-world datasets demonstrate that the proposed method yields robust and excellent performance.
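For readers unfamiliar with the LOF baseline that the abstract criticizes, the following is a minimal sketch of the standard local outlier factor as commonly defined (k-distance, reachability distance, local reachability density). It is not the weighted dynamic window method proposed in the thesis, whose details are not given in this record. The function name `lof_scores`, the toy grid data, and the chosen values of k are assumptions made for illustration only.

```python
import numpy as np

def lof_scores(points, k):
    """Standard LOF via brute-force distances (illustrative, not optimized)."""
    X = np.asarray(points, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances; a point is never its own neighbor.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    # Indices of the k nearest neighbors (distance ties are cut at k for simplicity).
    knn = np.argsort(dist, axis=1)[:, :k]
    # k-distance of each point: distance to its k-th nearest neighbor.
    k_dist = dist[np.arange(n), knn[:, -1]]
    # reach_dist(p, o) = max(k_dist(o), d(p, o)) for each neighbor o of p.
    reach = np.maximum(k_dist[knn], dist[np.arange(n)[:, None], knn])
    # Local reachability density: inverse of the mean reachability distance.
    lrd = 1.0 / reach.mean(axis=1)
    # LOF: mean density of the neighbors relative to the point's own density.
    return lrd[knn].mean(axis=1) / lrd

# Hypothetical toy data: a 5x5 unit grid plus one distant point.
grid = [(x, y) for x in range(5) for y in range(5)]
data = np.array(grid + [(20, 20)])

# The score of the distant point depends on k, which is the sensitivity
# to the parameter k that the abstract describes.
for k in (3, 10):
    scores = lof_scores(data, k)
    print(k, int(np.argmax(scores)), float(scores[-1]))
```

Scores close to 1 indicate a point whose density matches that of its neighbors; scores well above 1 indicate a local outlier. Because every quantity is derived from the k nearest neighbors, changing k changes the scores.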
Abstract (Chinese)
Abstract
Contents
Figure Contents
Table Contents
CHAPTER 1 Introduction
1.1 Background and Motivation
1.2 Organization of the Dissertation
CHAPTER 2 Related Works
2.1 Statistical Outlier Detection Methods
2.2 Distance-based Outlier Detection Methods
2.3 Density-based Outlier Detection Methods
2.3.1 Local Outlier Factor (LOF)
2.3.2 Dynamic Window Outlier Factor (DWOF)
CHAPTER 3 Proposed Method
3.1 Weighted Local Density
3.2 Ranking Outlier Score with Weighted Dynamic Window
CHAPTER 4 Experimental Results
4.1 Background
4.2 Synthetic Dataset
4.2.1 Varying Density Dataset
4.2.2 Arbitrary Shaped Dataset
4.3 Real-World Datasets
4.4 Discussions
CHAPTER 5 Conclusions
References