Thesis title (English): The Impact of Linear Transformation on the Effectiveness and Security of the Privacy Preserving Data Mining Process
Advisor (English): Tzu-Tsung Wong
Keywords (English): classification; data perturbation; encryption; linear transformation; privacy preserving
Because data mining techniques can extract useful knowledge from data, more and more people are devoting themselves to this field. The data used for mining generally contain personal records, so people are paying more attention to preventing their private data from being disclosed. This study attempts to establish a procedure that ensures both the effectiveness and the security of the original data. Data are transformed by piecewise linear functions before being sent to data analysts, who apply classification methods to the transformed data. The transmitted data are also protected by perturbation and encryption. The classification models produced by the data analysts can be sent back to the data providers, who restore the models for the data analysts. This restoring process is designed to ensure that data analysts have models for classifying new instances. According to the experimental results on ten data sets, the more pieces a linear function has, the higher the security that can be achieved for the decision tree and rule-based classification algorithms. Data analyzed by logistic regression or support vector machines should not be transformed by multi-piece linear functions, because the accuracies these two algorithms obtain on the original and the transformed data will differ.
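The idea of classifying transformed data and then restoring the model can be illustrated with a minimal sketch. It assumes a continuous, strictly increasing piecewise linear function on a single numeric attribute, which may differ from the functions actually used in the thesis. The names `make_piecewise_linear` and `best_threshold` are hypothetical, and the toy values are invented for illustration, not taken from the ten data sets. A one-split decision stump stands in for a decision tree.

```python
from bisect import bisect_right


def make_piecewise_linear(breakpoints, slopes, offset=0.0):
    """Return (forward, inverse) for a continuous, increasing piecewise linear map.

    Assumption: at least one breakpoint, sorted ascending, and one positive
    slope per piece (len(breakpoints) + 1 slopes in total).
    """
    if not breakpoints or len(slopes) != len(breakpoints) + 1:
        raise ValueError("need k >= 1 breakpoints and k + 1 slopes")
    if any(s <= 0 for s in slopes):
        raise ValueError("slopes must be positive to keep the map invertible")

    # Image of every breakpoint, chosen so that adjacent pieces join up.
    images = [offset]
    for i in range(1, len(breakpoints)):
        images.append(images[-1] + slopes[i] * (breakpoints[i] - breakpoints[i - 1]))

    def forward(x):
        k = bisect_right(breakpoints, x)  # index of the piece containing x
        if k == 0:
            return images[0] + slopes[0] * (x - breakpoints[0])
        return images[k - 1] + slopes[k] * (x - breakpoints[k - 1])

    def inverse(y):
        k = bisect_right(images, y)  # same piece index, found in the image space
        if k == 0:
            return breakpoints[0] + (y - images[0]) / slopes[0]
        return breakpoints[k - 1] + (y - images[k - 1]) / slopes[k]

    return forward, inverse


def best_threshold(values, labels):
    """Decision stump: threshold t minimizing errors of 'value <= t means class 0'."""
    pairs = sorted(zip(values, labels))
    best_t, best_err = None, len(pairs) + 1
    for i in range(len(pairs) - 1):
        lo, hi = pairs[i][0], pairs[i + 1][0]
        if lo == hi:
            continue  # no split point between equal values
        t = (lo + hi) / 2
        err = sum((v <= t) != (y == 0) for v, y in pairs)
        if err < best_err:
            best_t, best_err = t, err
    return best_t


# Data provider: transform the attribute before handing it to the analyst.
ages = [23, 35, 47, 18, 52, 29, 61, 40]
labels = [0, 0, 1, 0, 1, 0, 1, 1]
forward, inverse = make_piecewise_linear([30, 45], [2.0, 0.5, 3.0], offset=7.0)
masked = [forward(a) for a in ages]

# Data analyst: learn a split on the transformed values only.
t_masked = best_threshold(masked, labels)

# Data provider: restore the split so it applies to original values.
t_restored = inverse(t_masked)
agree = all((a <= t_restored) == (m <= t_masked) for a, m in zip(ages, masked))
print(t_masked, t_restored, agree)
```

Because the map preserves the ordering of values, a threshold split partitions the transformed records exactly as the restored threshold partitions the original ones. Models that depend on distances or linear combinations of attribute values do not share this property, which is consistent with the caution the abstract gives for logistic regression and support vector machines.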
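Abstract
Acknowledgments
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Research Process
1.4 Research Limitations
Chapter 2 Literature Review
2.1 Privacy-Preserving Data Mining
2.1.1 Techniques for Privacy-Preserving Data Mining
2.1.2 Practical Applications of Privacy-Preserving Data Mining
2.1.3 Challenges of Privacy-Preserving Data Mining
2.2 Data Perturbation Techniques
2.3 Encryption Techniques
2.3.1 Homomorphic Encryption
2.3.2 RSA Encryption
2.4 PPDM Evaluation Methods
2.4.1 Privacy Level
2.4.2 Data Quality
2.5 Summary
Chapter 3 Research Method
3.1 Data Transformation
3.1.1 One-Piece Linear Transformation
3.1.2 Multi-Piece Linear Transformation
3.2 Effectiveness Testing
3.2.1 Decision Trees and Rule-Based Classification
3.2.2 Logistic/Linear Regression and Support Vector Machines
3.3 Prediction for New Data
3.3.1 Model Restoration
3.3.2 Prediction Program
3.4 Noise Perturbation and Encryption
3.5 Evaluation Metrics
Chapter 4 Empirical Study
4.1 Data Set Description
4.2 Empirical Evaluation of Effectiveness
4.2.1 One-Piece Linear Analysis
4.2.2 Multi-Piece Linear Analysis
4.3 Empirical Evaluation of Security
4.4 Summary
Chapter 5 Conclusions and Suggestions
5.1 Conclusions
5.2 Future Research and Development
References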
