National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

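Graduate student: 賴冠宇 (Lai, Guan-Yu)
Thesis title: Anomaly Detection via Manifold Learning (通過流形學習進行異常檢測)
Advisor: 林得勝 (Lin, Te-Sheng)
Oral defense committee: 林祐霆 (Lin, Yu-Ting), 朱家杰 (Chu, Chia-Chieh)
Oral defense date: 2023-06-27
Degree: Master's
University: National Yang Ming Chiao Tung University
Department: Department of Applied Mathematics
Discipline: Mathematics and Statistics
Field: Mathematics
Thesis type: Academic thesis
Publication year: 2023
Graduation academic year: 111 (ROC calendar, i.e. the 2022–2023 academic year)
Language: English
Number of pages: 41
Keywords: Anomaly detection; Manifold learning; Diffusion map; Locally linear embedding; Spectral clustering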
Abstract:
The main goal of this thesis is to transform real-world data into well-behaved training data for supervised learning, a process also known as feature selection. Two key steps in this process, anomaly detection and dimensionality reduction, are the focus of this study, and I approach both with manifold learning algorithms. The thesis begins by introducing principal component analysis, multidimensional scaling, split and combine multidimensional scaling, the diffusion map, the diffusion map with local scaling, locally linear embedding, Hessian locally linear embedding, and tangential locally linear embedding, and gives some preliminary proofs.
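
As a rough illustration of two of the methods listed above, the diffusion map and its local-scaling variant, the Python/NumPy sketch below builds a Gaussian kernel whose bandwidth is set per point from the distance to its k-th nearest neighbour (the self-tuning rule of Zelnik-Manor and Perona [5]), turns it into a Markov matrix, and embeds the data with the leading nontrivial eigenvectors scaled by λ^t (Coifman and Lafon [4]). The function name `diffusion_map` and the parameters `k_scale` and `t` are illustrative assumptions; the thesis's own normalization (for example an extra α-normalization step) and parameter choices may differ.

```python
import numpy as np


def diffusion_map(X, n_components=2, t=1, k_scale=7):
    """Diffusion-map embedding with a locally scaled (self-tuning) Gaussian kernel.

    X       : (n, D) array, one sample per row.
    t       : diffusion time; coordinates are scaled by lambda**t (use an integer).
    k_scale : sigma_i is the distance from x_i to its k_scale-th nearest neighbour
              (a hypothetical default, not taken from the thesis).
    """
    # Squared pairwise Euclidean distances, clipped at 0 against round-off.
    sq = np.sum(X ** 2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)

    # Local scale: column 0 of each sorted row is the point itself.
    sigma = np.sqrt(np.sort(D2, axis=1)[:, k_scale])

    # Locally scaled kernel K_ij = exp(-|x_i - x_j|^2 / (sigma_i * sigma_j)).
    K = np.exp(-D2 / np.outer(sigma, sigma))

    # Degrees; the Markov matrix is P = diag(d)^{-1} K.
    d = K.sum(axis=1)

    # P is similar to the symmetric S = diag(d)^{-1/2} K diag(d)^{-1/2},
    # so diagonalise S with eigh and map the eigenvectors back.
    S = K / np.sqrt(np.outer(d, d))
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]  # sort eigenvalues in descending order
    evals, evecs = evals[order], evecs[:, order]

    # Right eigenvectors of P: psi = diag(d)^{-1/2} v.
    psi = evecs / np.sqrt(d)[:, None]

    # Skip the trivial pair (lambda = 1, constant psi) and scale by lambda**t.
    idx = slice(1, n_components + 1)
    return psi[:, idx] * evals[idx] ** t
```

For points sampled uniformly on the unit circle, the two returned coordinates again lie on a circle (up to a rotation), because the leading nontrivial eigenvectors of a uniformly sampled circle are its first Fourier modes.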

Next, I use two smooth manifolds to show how these methods differ and how their parameters affect the results. I then examine the influence of anomalies, including both outliers and noise, on the output of each algorithm. Finally, I apply spectral clustering and K-means clustering to separate the desired data from the anomalies.
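
The final step, separating the desired data from anomalies with spectral clustering followed by K-means, could look roughly like the sketch below. It assumes the Ng–Jordan–Weiss normalization (top eigenvectors of D^{-1/2} W D^{-1/2}, rows rescaled to unit length), the same self-tuning affinity as in [5], and a deterministic farthest-point initialization for K-means. All function names and defaults are hypothetical, and none of these choices is confirmed by the abstract.

```python
import numpy as np


def self_tuning_affinity(X, k_scale=7):
    # Squared pairwise distances, clipped at 0 against round-off.
    sq = np.sum(X ** 2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    # Local scale sigma_i = distance to the k_scale-th nearest neighbour.
    sigma = np.sqrt(np.sort(D2, axis=1)[:, k_scale])
    W = np.exp(-D2 / np.outer(sigma, sigma))
    np.fill_diagonal(W, 0.0)  # no self-loops in the affinity graph
    return W


def kmeans(Y, n_clusters, n_iter=100):
    # Deterministic farthest-point initialisation: start from Y[0], then
    # repeatedly add the row farthest from all centres chosen so far.
    centers = [Y[0]]
    for _ in range(1, n_clusters):
        dist = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)
    # Lloyd iterations: assign to nearest centre, then recompute centres.
    for _ in range(n_iter):
        sqdist = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sqdist, axis=1)
        new_centers = np.array([
            Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_clustering(X, n_clusters=2, k_scale=7):
    W = self_tuning_affinity(X, k_scale)
    d = W.sum(axis=1)
    # Normalised affinity D^{-1/2} W D^{-1/2}.
    L = W / np.sqrt(np.outer(d, d))
    _, evecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    V = evecs[:, -n_clusters:]     # eigenvectors of the largest eigenvalues
    # Rescale each row to unit length before running K-means on the rows.
    Y = V / np.linalg.norm(V, axis=1, keepdims=True)
    return kmeans(Y, n_clusters)
```

On a toy set made of a densely sampled unit circle plus a small detached group of points, `spectral_clustering(X, n_clusters=2)` assigns the circle and the detached group to different labels. How well such a scheme handles isolated single outliers or diffuse noise depends on the kernel and is not addressed by this sketch.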
摘要 (Chinese abstract)
Abstract
Table of Contents
List of Figures

1 Introduction

2 Methodology
2.1 Notation
2.2 Principal Component Analysis
2.3 Multidimensional Scaling
2.3.1 Classical Multidimensional Scaling
2.3.2 Split and Combine Multidimensional Scaling
2.4 Diffusion map
2.4.1 Connect to MDS
2.4.2 Local Scale
2.5 Locally Linear Embedding
2.5.1 Original method
2.5.2 Hessian method
2.5.3 Tangential Locally Linear Embedding

3 Numerical Experiment
3.1 Data
3.2 PCA and SCMDS
3.2.1 Special result of SCMDS
3.2.2 Result of PCA
3.3 Diffusion Map
3.3.1 Diffusion Map and Local Scale
3.4 LLE
3.4.1 Cost and Neighbors

4 Anomaly Detection
4.1 Data with Noise
4.1.1 Outlier
4.1.2 Noise
4.2 Spectral Clustering
4.2.1 K-means Clustering
4.2.2 PCA and LLE
4.2.3 Diffusion Map

5 Conclusion and future work

References
Appendix A
[1] I. Jolliffe, Principal Component Analysis. Wiley Online Library, 2005.

[2] J. Kruskal and M. Wish, Multidimensional Scaling. Beverly Hills, CA, 1978.

[3] J. Tzeng, H. H.-S. Lu, and W.-H. Li, “Multidimensional scaling for large genomic data sets,” BMC Bioinform., vol. 9, p. 179, 2008.

[4] R. R. Coifman and S. Lafon, “Diffusion maps,” Appl. Comput. Harmon. Anal., vol. 21, pp. 5–30, 2006.

[5] L. Zelnik-Manor and P. Perona, “Self-tuning spectral clustering,” Proc. Adv. Neural Inf. Process. Syst., vol. 17, pp. 1601–1608, 2004.

[6] L. Lin, “Avoiding unwanted results in locally linear embedding: A new understanding of regularization,” arXiv:2108.12680, 2021.

[7] D. L. Donoho and C. Grimes, “Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data,” Proc. Natl. Acad. Sci. U.S.A., vol. 100, pp. 5591–5596, 2003.

[8] L. Lin and C.-W. Chen, “A new locally linear embedding scheme in light of Hessian eigenmap,” arXiv:2112.09086, 2021.

[9] J. von Neumann, “Some matrix-inequalities and metrization of matrix-space,” Tomsk Univ. Rev., vol. 1, pp. 286–300, 1937.

[10] A. Ruhe, “Perturbation bounds for means of eigenvalues and invariant subspaces,” BIT Numer. Math., vol. 10, pp. 343–354, 1970.