National Digital Library of Theses and Dissertations in Taiwan

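Author: 潘勁安
Author (romanized): PAN, CHIN-AN
Thesis title (Chinese): 使用合成資料解決以顧客為中心的隱私與模型預測問題
Thesis title (English): Using synthetic data to address customer-centric privacy and model prediction issues
Advisor: 尹秦清
Advisor (romanized): YIN, CHIN-CHING
Oral defense committee: 曾柏興, 李育奇, 尹秦清
Oral defense committee (romanized): TSENG, PO-HSING; LEE, YU-CHI; YIN, CHIN-CHING
Oral defense date: 2024-06-03
Degree: Master's
Institution: National Taipei University of Technology
Department: Department of Industrial Engineering and Management
Discipline: Engineering
Field: Industrial Engineering
Thesis type: Academic thesis
Publication year: 2024
Graduation academic year: 112 (ROC calendar)
Language: English
Pages: 52
Keywords (translated from Chinese): Machine Learning; Privacy; Data Generation; Tabular Generative Adversarial Network; Ensemble Learning
Keywords (English): Machine Learning; Data Generation; Privacy; Conditional Tabular Generative Adversarial Network; Ensemble Learning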
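Technology has advanced continuously throughout history, and people now live in an era of information explosion in which technology brings convenience and any information is within easy reach. Under these circumstances, however, the question is whether information privacy and security can be taken into account and protected in time. In the past there was no need to consider legal constraints on these matters, but times are changing: some countries have enacted laws that restrict privacy infringement in artificial intelligence and automation, and the challenge is to strike a balance that harms neither side. This study examines the real-world case in which several states accused Meta of using product features to attract children to Instagram and Facebook, and reviews the restrictions set out in the General Data Protection Regulation (GDPR), the California Consumer Privacy Act (CCPA), and the Health Insurance Portability and Accountability Act (HIPAA). It uses a Conditional Tabular Generative Adversarial Network (CTGAN) to generate data, thereby protecting individual privacy, and feeds the generated data into predictive models. The real data then serve as a control for analyzing and interpreting the gap between the two.
Finally, the study attempts to answer whether the proposed approach can fully replace real data, and whether it can help companies that build predictive models avoid both large-scale data collection and violations of privacy provisions.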

In recent years, technology has advanced rapidly, and in this era of information explosion it has brought people many conveniences: almost any information is easily accessible. In this situation, however, the privacy and security of information need to be considered and protected in a timely manner. In the past there was no need to consider the relevant legal constraints, but times are changing, and some countries have enacted laws that restrict the infringement of privacy by artificial intelligence.
This study therefore discusses the case in which Meta was sued by several US states for using features to entice children into using Instagram and Facebook, together with the provisions of the General Data Protection Regulation (GDPR), the California Consumer Privacy Act (CCPA), and the Health Insurance Portability and Accountability Act (HIPAA). It uses a Conditional Tabular Generative Adversarial Network (CTGAN) to generate data in compliance with these legal provisions, protecting the privacy of personal data, and feeds the generated data into predictive models. Using the real data as a control, the study then analyzes and interprets the differences between the generated data and the real data.
In its conclusion, the study attempts to answer whether the proposed solution can effectively replace real data, and whether it can help businesses streamline data collection and avoid privacy-law issues when they develop predictive models.
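
The thesis outline lists hard and soft voting as the ensemble methods used for the predictive models. The sketch below is only an illustration of how the two voting rules differ; it is not the author's implementation. The function names `hard_vote` and `soft_vote` and the probability values are hypothetical, and the three "classifiers" are represented only by their predicted class probabilities for a single sample.

```python
from collections import Counter


def hard_vote(prob_rows):
    """Hard voting: each model votes for its most probable class; majority wins."""
    votes = [max(range(len(p)), key=p.__getitem__) for p in prob_rows]
    # most_common orders equal counts by first appearance, so ties go to the earliest vote
    return Counter(votes).most_common(1)[0][0]


def soft_vote(prob_rows):
    """Soft voting: average the class probabilities, then pick the largest average."""
    n_classes = len(prob_rows[0])
    avg = [sum(p[c] for p in prob_rows) / len(prob_rows) for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__)


# Hypothetical probabilities for classes [0, 1] from three classifiers (assumed values)
probs = [[0.55, 0.45], [0.60, 0.40], [0.05, 0.95]]

hard = hard_vote(probs)  # two of the three models prefer class 0
soft = soft_vote(probs)  # averaged probabilities are 0.40 for class 0 and 0.60 for class 1
print("hard vote:", hard, "soft vote:", soft)
```

In this made-up case the two rules disagree: two weakly confident models outvote one highly confident model under hard voting, while soft voting lets the confident model dominate. Comparing both rules on real and on generated data is one way to see whether a synthetic dataset supports the same predictions.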

Abstract (Chinese)
Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
1. Introduction
1.1 Research Background and Motivation
1.2 Research Purpose
1.3 Research Framework
2. Literature Review
2.1 The Definition and Applications of Machine Learning and Deep Learning
2.2 Privacy Regulations
2.2.1 General Data Protection Regulation (GDPR)
2.2.2 California Consumer Privacy Act (CCPA)
2.2.3 US Health Insurance Portability and Accountability Act (HIPAA)
2.3 Synthetic Data
3. Methodology
3.1.1 Conditional Tabular Generative Adversarial Network (CTGAN)
3.1.2 Data
3.1.3 Label Encoding
3.1.4 Standard Scaler
3.1.5 Ensemble Learning Model
4. Empirical Study
4.1 Privacy concerns in Europe and USA
4.1.1 European General Data Protection Regulation (GDPR)
4.1.2 California Consumer Privacy Act (CCPA)
4.1.3 US Health Insurance Portability and Accountability Act (HIPAA)
4.2 Can synthetic tabular data replace real data?
4.2.1 The statistical characteristics of data
4.2.2 Encoding Features
4.2.3 StandardScaler on Features
4.2.4 Ensemble Learning Hard & Soft voting
5. Conclusion and Discussion
6. Future Work and Limitations
References
