National Digital Library of Theses and Dissertations in Taiwan

Detailed Record

Author: Hao Zhen (振昊)
Title: DPARM: Differential Privacy Association Rules Mining (差分隱私之關聯規則挖掘機制)
Advisor: Sy-Yen Kuo (郭斯彥)
Committee Members: Hsu-Chun Yen (顏嗣鈞), Chin-Laung Lei (雷欽隆), Yao-Tung Tsou (鄒耀東)
Oral Defense Date: 2018-12-13
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Electrical Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis Type: Academic thesis
Publication Year: 2018
Graduation Academic Year: 107
Language: English
Pages: 47
Keywords: privacy-preserving data analysis; differential privacy; association analysis; association rules mining; frequent itemset mining
DOI: 10.6342/NTU201804365
Abstract:
In contemporary society, the rapid expansion of data volume has driven the development of data analysis techniques, making decision automation possible. Association analysis is an important task in data analysis; its goal is to find all co-occurrence relationships in a transactional dataset, i.e., frequent itemsets or confident association rules. An association rule consists of two parts, an antecedent and a consequent, and means that if the antecedent occurs, the consequent is likely to occur as well. Confident association rules are those with high confidence; they help people discover patterns and develop corresponding strategies.
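As a concrete illustration of these definitions (this toy example is mine, not from the thesis): the support of an itemset is the fraction of transactions containing it, and a rule's confidence is the support of the combined itemset divided by the support of the antecedent.

```python
# Toy transactional dataset (invented for illustration); each
# transaction is the set of items in one record.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent):
    """How often the consequent co-occurs with the antecedent."""
    return support(antecedent | consequent) / support(antecedent)

# The rule {diapers} -> {beer}: if diapers are bought, beer likely is too.
print(support({"diapers", "beer"}))       # 0.4 (frequent if threshold <= 0.4)
print(confidence({"diapers"}, {"beer"}))  # ~0.67
```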
The process of data analysis can be abstracted as a set of queries, where each query is a real-valued function of the dataset. However, answering queries with unrestricted, unprotected access to the dataset may disclose individual privacy. Techniques for privacy-preserving data analysis have therefore received increasing attention: people are eager to find a definition of privacy that is strong, mathematically rigorous, and consistent with society's understanding of privacy. Differential privacy is such a definition; it manages and quantifies the privacy risk an individual faces in data analysis through a parameter called the privacy level. In general, differential privacy can be achieved by adding carefully calibrated noise to query results.
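A hedged sketch of the noise-addition idea, using the standard Laplace mechanism (the abstract does not say which mechanism DPARM uses, and the dataset and predicate below are invented): a counting query has sensitivity 1, so Laplace noise with scale 1/epsilon yields an epsilon-differentially-private answer.

```python
import numpy as np

def private_count(dataset, predicate, epsilon):
    """Answer a counting query with epsilon-differential privacy.
    Adding or removing one record changes a count by at most 1, so the
    query's sensitivity is 1 and Laplace noise of scale 1/epsilon suffices.
    """
    true_count = sum(1 for record in dataset if predicate(record))
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

# Smaller epsilon (the privacy level) = stronger privacy, noisier answer.
data = [{"milk", "bread"}, {"beer"}, {"milk"}]
print(private_count(data, lambda t: "milk" in t, epsilon=0.5))
```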
In this thesis, we focus on differentially private association rules mining with multiple support thresholds and address the challenges in state-of-the-art works. We propose and implement the DPARM algorithm, which uses multiple support thresholds to reduce the number of candidate itemsets while reflecting the real nature of the items, and uses random truncation and uniform partition to lower the dimensionality of the dataset. Both help reduce the sensitivity of the queries, thereby shrinking the scale of the required noise and improving the utility of the mining results. We also stabilize the noise scale by adaptively allocating the privacy levels, and bound the overall privacy loss. In addition, we prove that the DPARM algorithm satisfies ex post differential privacy, and we verify its utility through a series of experiments.
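To illustrate why truncation lowers sensitivity, here is a simplified sketch under my own assumptions (the names `randomly_truncate` and `max_len` are hypothetical; DPARM's actual procedure is described in Chapter 4 of the thesis): capping every transaction at a fixed number of items bounds how many candidate itemsets any single record can support, and hence bounds the sensitivity of the batch of support queries.

```python
import random

def randomly_truncate(transactions, max_len, seed=None):
    """Keep a uniformly random subset of at most `max_len` items per
    transaction. A truncated record can then contribute to at most
    C(max_len, k) support counts of size-k itemsets, bounding the
    sensitivity of the support queries and thus the noise they need."""
    rng = random.Random(seed)
    truncated = []
    for t in transactions:
        items = sorted(t)  # fix iteration order so sampling is reproducible
        if len(items) > max_len:
            items = rng.sample(items, max_len)
        truncated.append(set(items))
    return truncated

# Example: a long transaction is capped at 3 items before mining.
print(randomly_truncate([{"a", "b", "c", "d", "e"}, {"x", "y"}], 3, seed=0))
```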
Table of Contents:
Acknowledgements iii
Abstract (Chinese) v
Abstract vii
1 Introduction 1
1.1 Data Automates Decisions 1
1.2 Privacy Is at Stake 3
1.3 Motivations and Contributions 5
2 Preliminaries 7
2.1 Association Rules Mining 7
2.2 Differential Privacy 9
3 Computational Model and Key Challenges 13
3.1 Computational Model and Problem Statement 13
3.2 Challenges and Existing Solutions 14
4 DPARM Algorithm 19
4.1 Initialization 20
4.2 Support Counting and MIS Assignment 22
4.3 Main Algorithm 23
4.4 Miscellaneous 25
5 Experiments 29
5.1 Environment and Datasets 29
5.2 Utility Metrics 30
5.3 Non-Private Parameters Versus Utility 31
5.4 Overall Privacy Loss Versus Utility 34
6 Related Works and Future Works 37
6.1 Related Works 37
6.2 Future Works 38
7 Conclusion 41
Bibliography 43