National Digital Library of Theses and Dissertations in Taiwan

Detailed Record

Author: Hao Zhen (振昊)
Title: DPARM: Differential Privacy Association Rules Mining (差分隱私之關聯規則挖掘機制)
Advisor: Sy-Yen Kuo (郭斯彥)
Committee Members: Hsu-Chun Yen (顏嗣鈞), Chin-Laung Lei (雷欽隆), Yao-Tung Tsou (鄒耀東)
Oral Defense Date: 2018-12-13
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Electrical Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis Type: Academic thesis
Publication Year: 2018
Graduation Academic Year: 107
Language: English
Pages: 47
Keywords: privacy-preserving data analysis; differential privacy; association analysis; association rules mining; frequent itemset mining
DOI: 10.6342/NTU201804365
Abstract:
In contemporary society, the rapid expansion of data volume has driven the development of data analysis techniques, making decision automation possible. Association analysis is an important task in data analysis; its goal is to find all co-occurrence relationships in a transactional dataset, i.e., frequent itemsets or confident association rules. An association rule consists of two parts, an antecedent and a consequent, and means that if the antecedent occurs, the consequent is likely to occur as well. Confident association rules are those with high confidence; they help people discover patterns and develop corresponding strategies.
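As a concrete illustration of these definitions (this toy example is mine, not from the thesis): the support of an itemset is the fraction of transactions containing it, and a rule's confidence is the support of the combined itemset divided by the support of the antecedent.

```python
# Toy transactional dataset (invented for illustration); each
# transaction is the set of items in one record.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent):
    """How often the consequent co-occurs with the antecedent."""
    return support(antecedent | consequent) / support(antecedent)

# The rule {diapers} -> {beer}: if diapers are bought, beer likely is too.
print(support({"diapers", "beer"}))       # 0.4 (frequent if threshold <= 0.4)
print(confidence({"diapers"}, {"beer"}))  # ~0.67
```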
The process of data analysis can be abstracted as a set of queries, where each query is a real-valued function of the dataset. However, answering queries with unrestricted, unprotected access to the dataset may disclose individual privacy. Techniques for privacy-preserving data analysis have therefore received increasing attention: people are eager to find a definition of privacy that is strong, mathematically rigorous, and consistent with society's understanding of privacy. Differential privacy is such a definition; it manages and quantifies the privacy risk an individual faces in data analysis through a parameter called the privacy level. In general, differential privacy can be achieved by adding carefully calibrated noise to query results.
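A hedged sketch of the noise-addition idea, using the standard Laplace mechanism (the abstract does not say which mechanism DPARM uses, and the dataset and predicate below are invented): a counting query has sensitivity 1, so Laplace noise with scale 1/epsilon yields an epsilon-differentially-private answer.

```python
import numpy as np

def private_count(dataset, predicate, epsilon):
    """Answer a counting query with epsilon-differential privacy.
    Adding or removing one record changes a count by at most 1, so the
    query's sensitivity is 1 and Laplace noise of scale 1/epsilon suffices.
    """
    true_count = sum(1 for record in dataset if predicate(record))
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

# Smaller epsilon (the privacy level) = stronger privacy, noisier answer.
data = [{"milk", "bread"}, {"beer"}, {"milk"}]
print(private_count(data, lambda t: "milk" in t, epsilon=0.5))
```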
In this thesis, we focus on differentially private association rules mining with multiple support thresholds and address the challenges in state-of-the-art works. We propose and implement the DPARM algorithm, which uses multiple support thresholds to reduce the number of candidate itemsets while reflecting the real nature of the items, and uses random truncation and uniform partition to lower the dimensionality of the dataset. Both help reduce the sensitivity of the queries, thereby shrinking the scale of the required noise and improving the utility of the mining results. We also stabilize the noise scale by adaptively allocating the privacy levels, and bound the overall privacy loss. In addition, we prove that the DPARM algorithm satisfies ex post differential privacy, and we verify its utility through a series of experiments.
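To illustrate why truncation lowers sensitivity, here is a simplified sketch under my own assumptions (the names `randomly_truncate` and `max_len` are hypothetical; DPARM's actual procedure is described in Chapter 4 of the thesis): capping every transaction at a fixed number of items bounds how many candidate itemsets any single record can support, and hence bounds the sensitivity of the batch of support queries.

```python
import random

def randomly_truncate(transactions, max_len, seed=None):
    """Keep a uniformly random subset of at most `max_len` items per
    transaction. A truncated record can then contribute to at most
    C(max_len, k) support counts of size-k itemsets, bounding the
    sensitivity of the support queries and thus the noise they need."""
    rng = random.Random(seed)
    truncated = []
    for t in transactions:
        items = sorted(t)  # fix iteration order so sampling is reproducible
        if len(items) > max_len:
            items = rng.sample(items, max_len)
        truncated.append(set(items))
    return truncated

# Example: a long transaction is capped at 3 items before mining.
print(randomly_truncate([{"a", "b", "c", "d", "e"}, {"x", "y"}], 3, seed=0))
```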
Table of Contents:
Acknowledgements iii
Abstract (Chinese) v
Abstract vii
1 Introduction 1
1.1 Data Automates Decisions 1
1.2 Privacy Is at Stake 3
1.3 Motivations and Contributions 5
2 Preliminaries 7
2.1 Association Rules Mining 7
2.2 Differential Privacy 9
3 Computational Model and Key Challenges 13
3.1 Computational Model and Problem Statement 13
3.2 Challenges and Existing Solutions 14
4 DPARM Algorithm 19
4.1 Initialization 20
4.2 Support Counting and MIS Assignment 22
4.3 Main Algorithm 23
4.4 Miscellaneous 25
5 Experiments 29
5.1 Environment and Datasets 29
5.2 Utility Metrics 30
5.3 Non-Private Parameters Versus Utility 31
5.4 Overall Privacy Loss Versus Utility 34
6 Related Works and Future Works 37
6.1 Related Works 37
6.2 Future Works 38
7 Conclusion 41
Bibliography 43