National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

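Graduate student: Ming-Hsun Wen (溫明勳)
Thesis title: Privacy Preserving Pattern Mining Based on Blocking Inference Channels (以阻斷推論通道方式之隱私保護樣式探勘)
Advisor: Jieh-Shan Yeh (葉介山)
Degree: Master's
Institution: Providence University (靜宜大學)
Department: Graduate Institute, Department of Information Management (資訊管理學系研究所)
Discipline: Computer science (電算機學門)
Field: General computer science (電算機一般學類)
Thesis type: Academic thesis
Year of publication: 2009
Academic year of graduation: 98 (ROC calendar)
Language: English
Number of pages: 48
Keywords (Chinese): inference channel (推論通道); privacy-preserving mining (隱私保護探勘); sensitive pattern (敏感樣式)
Keywords (English): Privacy preserving data mining; Inference channel; Sensitive frequent pattern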
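Abstract (translated from the Chinese): Privacy protection in data mining has been a popular topic in recent years. Many studies modify the database to protect privacy: the original database is transformed into a modified database from which the sensitive patterns cannot be mined. A drawback of this approach is that the supports of non-sensitive patterns are changed. In addition, some database modification methods produce fictitious patterns, that is, patterns that are not frequent in the original database.
The result of pattern mining is a collection of patterns that meet the support threshold. Sharing mining results indiscriminately is dangerous, because it may compromise data privacy and security, so sensitive patterns should not be disclosed. Moreover, because inference channels exist, anyone intent on obtaining private data may still derive sensitive patterns from the non-sensitive ones. Simply deleting the sensitive patterns from the mining result is therefore not enough to prevent the disclosure of private data. This study proposes five algorithms for pattern mining with blocked inference channels, BASE, MINNSP, MAXSP, MINNSP+, and MAXSP+, which effectively block inference channels to protect privacy. Their advantage is that they neither produce fictitious patterns nor alter the original database. Experimental results show that BASE, MINNSP, and MINNSP+ block inference channels more effectively than MAXSP and MAXSP+. In addition, BASE, MINNSP, and MAXSP run much faster than MINNSP+ and MAXSP+.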
Privacy-preserving data mining has been a popular research topic in recent years. Many studies use data sanitization techniques to protect privacy. The principal process transforms the original database into a sanitized database from which no sensitive patterns can be mined. One disadvantage of this technique is that the supports of non-sensitive frequent patterns change during sanitization. Another known side effect is that some data sanitization approaches produce fictitious frequent patterns, that is, patterns that are not frequent in the original database.
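The side effect on non-sensitive patterns can be illustrated with a minimal Python sketch. It is not taken from the thesis: the toy database, the threshold `MIN_SUPPORT`, and the helper names `support`, `frequent_patterns`, and `sanitize_by_deletion` are all assumptions made for illustration, and the sanitization shown is a deliberately naive item-deletion scheme rather than any published method. Because it only deletes items, it can lower supports but cannot create fictitious patterns; that second side effect arises in approaches that also add items.

```python
from itertools import combinations

def support(db, itemset):
    """Count the transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in db if items <= t)

def frequent_patterns(db, min_support):
    """Brute-force enumeration of all itemsets with support >= min_support."""
    universe = sorted(set().union(*db))
    result = {}
    for k in range(1, len(universe) + 1):
        for cand in combinations(universe, k):
            s = support(db, cand)
            if s >= min_support:
                result[frozenset(cand)] = s
    return result

def sanitize_by_deletion(db, sensitive, victim_item, min_support):
    """Naive sanitization (hypothetical): delete `victim_item` from supporting
    transactions until `sensitive` is no longer frequent. Returns a copy."""
    out = [set(t) for t in db]
    for t in out:
        if support(out, sensitive) < min_support:
            break
        if set(sensitive) <= t:
            t.discard(victim_item)
    return out

# Toy data (assumed): five transactions over items a, b, c.
db = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
MIN_SUPPORT = 2

original = frequent_patterns(db, MIN_SUPPORT)
sanitized_db = sanitize_by_deletion(db, {"a", "b"}, "b", MIN_SUPPORT)
sanitized = frequent_patterns(sanitized_db, MIN_SUPPORT)

# Compare the support of each originally frequent pattern before and after.
for pattern, supp in sorted(original.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(pattern), supp, "->", sanitized.get(pattern, "not frequent"))
```

In this toy run the sensitive pattern {a, b} is hidden, but the non-sensitive pattern {b, c} also drops below the threshold and the support of {b} changes, which is the kind of distortion the abstract describes.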
The knowledge discovered by frequent pattern mining is represented as a collection of frequent patterns with their supports. Sharing the frequent patterns indiscriminately may threaten privacy and security, because some of the frequent patterns may be sensitive and should not be disclosed. Furthermore, because inference channels exist, an attacker could derive sensitive patterns from a set of non-sensitive patterns. Therefore, simply eliminating sensitive patterns from the mining result is not enough to prevent their disclosure. This research focuses on blocking inference channels in frequent pattern mining and presents five algorithms that hide sensitive itemsets so that adversaries cannot recover them via inference channels.
The five algorithms, BASE, MINNSP, MAXSP, MINNSP+, and MAXSP+, block inference channels by pattern sanitization. Their main advantage is that they neither produce fictitious frequent patterns nor distort the original database. The experimental results show that BASE, MINNSP, and MINNSP+ are more effective than MAXSP and MAXSP+, meaning that they disclose more non-sensitive information. Furthermore, BASE, MINNSP, and MAXSP are more efficient than MINNSP+ and MAXSP+ in running time.
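The inference channel mentioned in the abstract can be made concrete with a short Python sketch. It relies only on the standard anti-monotonicity of support (every subset of an itemset is contained in at least as many transactions as the itemset itself). The pattern supports, the helper names `lower_bound` and `block_supersets`, and the blocking rule are assumptions for illustration; this record does not describe how BASE, MINNSP, MAXSP, MINNSP+, or MAXSP+ work, so the sketch should not be read as any of them.

```python
def lower_bound(released, hidden):
    """Best lower bound on the support of `hidden` that an attacker can read
    off the released patterns: any released superset supports it too."""
    hidden = frozenset(hidden)
    return max((s for p, s in released.items() if hidden <= p), default=0)

def block_supersets(patterns, sensitive):
    """Naive blocking (hypothetical): drop each sensitive pattern together
    with every released superset of it."""
    sens = [frozenset(s) for s in sensitive]
    return {p: s for p, s in patterns.items()
            if not any(x <= p for x in sens)}

MIN_SUPPORT = 2

# Mining result with supports (assumed toy values).
patterns = {
    frozenset("a"): 4, frozenset("b"): 4, frozenset("c"): 4,
    frozenset("ab"): 3, frozenset("ac"): 3, frozenset("bc"): 3,
    frozenset("abc"): 2,
}
sensitive = [frozenset("ab")]

# Deleting only the sensitive pattern leaves its superset {a, b, c} visible.
naive_release = {p: s for p, s in patterns.items() if p not in sensitive}
leak = lower_bound(naive_release, "ab")
print("naive release proves support(ab) >=", leak,
      "(frequent)" if leak >= MIN_SUPPORT else "(unknown)")

# Removing the supersets as well closes this particular channel.
blocked = block_supersets(patterns, sensitive)
print("after blocking, bound on support(ab):", lower_bound(blocked, "ab"))
print("still released:", sorted("".join(sorted(p)) for p in blocked))
```

The sketch covers only the superset channel. Bounds derived from the supports of released subsets can also leak information when the database size is known, and the sketch makes no attempt to maximize the number of non-sensitive patterns that remain shareable.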
Abstract
Acknowledgement
Content
List of Tables
List of Figures
Chapter 1 Introduction
Chapter 2 Related Work
2.1 Association rule mining
2.2 Privacy preserving data mining
2.2.1 The data-sharing approach
2.2.2 The pattern-sharing approach
2.3 Inference channels in pattern sharing-based algorithms
Chapter 3 Proposed Algorithms
3.1 Notations and problem definition
3.2 BASE algorithm
3.3 MINNSP algorithm
3.4 MINNSP+ algorithm
3.5 MAXSP algorithm
3.6 MAXSP+ algorithm
Chapter 4 Experimental Results
4.1 IBM synthetic data generator
4.2 Effectiveness measurement
4.3 Comparison of side effect
Chapter 5 Conclusions
References