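Graduate student: 方荷雅
Graduate student (English name): He-Ya Fang
Thesis title: 探討線性轉換方法對隱私保護資料探勘流程效用及安全性影響之研究
Thesis title (English): The Impact of Linear Transformation on the Effectiveness and Security of the Privacy Preserving Data Mining Process
Advisor: 翁慈宗
Advisor (English name): Tzu-Tsung Wong
Degree: Master's
Institution: National Cheng Kung University (國立成功大學)
Department: Institute of Information Management (資訊管理研究所)
Discipline: Computing
Field: General Computing
Thesis type: Academic thesis
Year of publication: 2020
Academic year of graduation: 108 (ROC calendar)
Language: Chinese
Number of pages: 54
Chinese keywords: 隱私保護資料探勘 (privacy-preserving data mining); 資料擾動 (data perturbation); 加密 (encryption)
English keywords: classification; data perturbation; encryption; linear transformation; privacy preserving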
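Data mining techniques can extract useful knowledge from data, helping organizations understand and serve customers better and thereby gain a competitive advantage, so a growing number of people are working in and studying this field. With the development of information technology, personal data such as shopping habits, credit records, and medical histories can now be collected and processed. This information is undoubtedly useful in many fields. However, public concern about personal privacy keeps growing, and many people do not want their private data disclosed. How to keep data secure while still obtaining useful information through data mining is therefore an issue that deserves attention today. This study attempts to establish a procedure that protects both the privacy and the utility of the original data, in the hope of contributing to the field of privacy-preserving data mining.
Apart from encryption and anonymization, most current work in privacy-preserving data mining protects data privacy by perturbing the data, but perturbation distorts the data and reduces their utility. This study therefore uses one-piece and multi-piece linear functions to transform the original continuous data into other values, so that the original values cannot be learned directly while the data retain their original utility. Through model restoration and a prediction program, the data recipient can then use their own data to make predictions for new data. In addition to transforming the data set, noise perturbation and encryption are applied to the transformed data set to strengthen security during transmission, so that a third party (neither the data provider nor the recipient) cannot use the transformed data set. The experimental results show that, for decision trees and rule-based classification, multi-piece linear transformation preserves the utility of the data and provides higher security than one-piece transformation, and the more pieces there are, the higher the security. For logistic regression and support vector machines, one-piece transformation suffices, because it preserves the utility of the data and, under the procedure and evaluation method of this study, does not affect the security of these two classification methods.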
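A one-piece or multi-piece linear transformation of a continuous attribute can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not say how the pieces are constructed, so the sketch assumes a continuous, strictly increasing function, and the names `make_transform`, `breakpoints`, `slopes`, and `offset` are hypothetical.

```python
from bisect import bisect_right


def make_transform(breakpoints, slopes, offset=0.0):
    """Return (forward, inverse) for an increasing piecewise linear map.

    breakpoints: sorted cut points on the original scale (empty for one piece).
    slopes: one positive slope per piece, so len(breakpoints) + 1 values.
    offset: transformed value at the first breakpoint (at x = 0 for one piece).
    All three are assumptions of this sketch, not parameters from the thesis.
    """
    if len(slopes) != len(breakpoints) + 1 or any(s <= 0 for s in slopes):
        raise ValueError("need len(breakpoints) + 1 positive slopes")

    # Anchor points where neighbouring pieces meet, on both scales.
    xs = list(breakpoints) or [0.0]
    ys = [offset]
    for i in range(1, len(xs)):
        ys.append(ys[-1] + slopes[i] * (xs[i] - xs[i - 1]))
    y_cuts = ys if breakpoints else []

    def forward(x):
        i = bisect_right(breakpoints, x)   # index of the piece containing x
        a = max(i - 1, 0)                  # anchor used by that piece
        return ys[a] + slopes[i] * (x - xs[a])

    def inverse(y):
        i = bisect_right(y_cuts, y)        # same piece lookup on the new scale
        a = max(i - 1, 0)
        return xs[a] + (y - ys[a]) / slopes[i]

    return forward, inverse


# One-piece transformation: y = 2x + 5.
one_fwd, one_inv = make_transform([], [2.0], offset=5.0)

# Three-piece transformation with cut points at 10 and 20.
multi_fwd, multi_inv = make_transform([10.0, 20.0], [1.0, 3.0, 0.5], offset=100.0)
```

Because every slope in this sketch is positive, the ordering of values is preserved, which is one plausible reason threshold-based classifiers could keep their accuracy on transformed data. Whether the thesis restricts itself to increasing functions is not stated in the abstract.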
Because data mining techniques can extract useful knowledge from data, more and more people are devoting themselves to this field. The data to be mined generally contain personal records, so people are paying more attention to preventing their private data from being disclosed. This study attempts to establish a procedure that ensures both the effectiveness and the security of the original data. Data are transformed by piecewise linear functions before being sent to data analysts, who apply classification methods to the transformed data. Data in transit are also protected by perturbation and encryption. The classification models produced by the data analysts can be sent back to the data providers, who restore the models for the data analysts. This restoration process is designed to ensure that data analysts have models for classifying new instances. According to the experimental results on ten data sets, the more pieces a piecewise linear function has, the higher the security that can be achieved for the decision tree and rule-based classification algorithms. Data analyzed by logistic regression or support vector machines should not be transformed by multi-piece linear functions, because the accuracies that these two algorithms achieve on the original and the transformed data will differ.
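The model restoration step can be illustrated with a toy example. This is a hedged sketch, not the procedure from the thesis: it assumes a one-piece transformation y = A*x + B with A > 0, uses a single-threshold decision stump in place of a full decision tree, and the coefficients, the helper names `fit_stump`, `transform`, and `restore_threshold`, and the sample values are all made up for illustration.

```python
def fit_stump(values, labels):
    """Pick the threshold t with the fewest errors for 'predict 1 if v > t'."""
    points = sorted(set(values))
    candidates = [(lo + hi) / 2 for lo, hi in zip(points, points[1:])]

    def errors(t):
        return sum((v > t) != bool(y) for v, y in zip(values, labels))

    return min(candidates, key=errors)


# Hypothetical secret coefficients known only to the data provider.
A, B = 2.5, -40.0


def transform(x):
    """Data provider: mask an original value before sharing it."""
    return A * x + B


def restore_threshold(t):
    """Data provider: map a threshold learned on masked data back."""
    return (t - B) / A


# Made-up original attribute values and class labels.
ages = [22, 25, 31, 38, 45, 52, 60]
labels = [0, 0, 0, 1, 1, 1, 1]

# The analyst only ever sees the masked values.
masked = [transform(x) for x in ages]
t_masked = fit_stump(masked, labels)

# The provider restores the model so it applies to untransformed data.
t_original = restore_threshold(t_masked)
```

In this toy case the restored threshold equals the one the stump would have learned on the original values, so the analyst ends up with a model that can classify new, untransformed instances.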
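Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Research Process
1.4 Research Limitations
Chapter 2 Literature Review
2.1 Privacy-Preserving Data Mining
2.1.1 Techniques for Privacy-Preserving Data Mining
2.1.2 Practical Applications of Privacy-Preserving Data Mining
2.1.3 Challenges of Privacy-Preserving Data Mining
2.2 Data Perturbation Techniques
2.3 Encryption Techniques
2.3.1 Homomorphic Encryption
2.3.2 RSA Encryption
2.4 PPDM Evaluation Methods
2.4.1 Privacy Level
2.4.2 Data Quality
2.5 Summary
Chapter 3 Research Method
3.1 Data Transformation
3.1.1 One-Piece Linear Transformation
3.1.2 Multi-Piece Linear Transformation
3.2 Effectiveness Testing
3.2.1 Decision Tree and Rule-Based Classification
3.2.2 Logistic/Linear Regression and Support Vector Machine
3.3 Prediction on New Data
3.3.1 Model Restoration
3.3.2 Data Prediction Program
3.4 Noise Perturbation and Encryption
3.5 Evaluation Metrics
Chapter 4 Empirical Study
4.1 Data Sets
4.2 Utility Evaluation
4.2.1 One-Piece Linear Analysis
4.2.2 Multi-Piece Linear Analysis
4.3 Security Evaluation
4.4 Summary
Chapter 5 Conclusions and Suggestions
5.1 Conclusions
5.2 Future Research and Development
References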