National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

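Graduate student: 林苡晴
Graduate student (English name): Lin, Yi-Ching
Thesis title (Chinese): 以DeepSHAP生成決策邏輯的對抗樣本偵測研究
Thesis title (English): DeepSHAP Summary for Adversarial Example Detection
Advisor: 郁方
Advisor (English name): Yu, Fang
Oral defense committee: 洪智鐸; 江介宏
Oral defense committee (English names): Hong, Chih-Duo; Jiang, Jie-Hong
Oral defense date: 2023-07-24
Degree: Master's
Institution: National Chengchi University (國立政治大學)
Department: Department of Management Information Systems (資訊管理學系)
Discipline: Computer Science
Field: General Computer Science
Thesis type: Academic thesis
Publication year: 2023
Graduation academic year: 111 (ROC calendar)
Language: English
Pages: 65
Keywords (Chinese): 對抗樣本; 可解釋人工智慧; DeepSHAP; 決策邏輯
Keywords (English): Adversarial example; Explainable AI; DeepSHAP; Decision logic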
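Deep learning is now widely applied across many scenarios, and explainable AI helps explain model predictions, strengthening model reliability and trustworthiness. This study proposes three adversarial example detection methods that extend DeepSHAP Summary. We find that normal and adversarial examples differ in their explanations and follow different decision logic, which can be used to tell them apart. The study first uses DeepSHAP, an explainable AI tool, to compute the layer-by-layer contribution of each neuron in a classification model, selects the critical neurons, and generates a critical neuron bitmap that represents the decision logic. On this basis it proposes a new method that detects adversarial examples from decision logic rather than from SHAP values. A decision graph that integrates the critical neurons across all decision logic provides a consensus on each neuron's influence on the classification result. The study also detects adversarial examples using layer-wise SHAP values, and recommends a decision-graph-based strategy for selecting the best layer, so that a better-justified single layer is used for detection. In addition, the study proposes an activation status method, which extracts the model's activation values for the neurons in the decision graph as data in order to reduce computational cost. Experimental results on three datasets show that: 1) supplying SHAP values from more layers yields better detection results; 2) the decision logic method, which focuses on critical neurons, reaches accuracy comparable to using the SHAP values of all layers while requiring fewer resources; and 3) the best-layer SHAP values and the activation status method provide more lightweight detection that still has sufficient detection ability. All proposed methods show effective transferability to adversarial examples not seen in training.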
Deep learning has broad applications, and explainable AI (XAI) enhances the interpretability and reliability of its models. Leveraging XAI, we propose three adversarial example detection approaches based on DeepSHAP Summary. Specifically, we use DeepSHAP to calculate neuron contributions, identify critical neurons by their SHAP values, and generate a critical neuron bitmap that serves as the decision logic. We reveal distinct interpretations and different decision logic between normal and adversarial examples. Our first approach uses the decision logic instead of the SHAP signature for detection. We then employ layer-wise SHAP explanations and recommend a strategy for selecting the best layer through a decision graph that summarizes critical neurons, which improves single-layer detection. The activation status approach reduces computation by using the activation values of the neurons in the decision graph. Results across three datasets show that detection accuracy improves as SHAP information from more layers is provided. Focusing on critical neurons yields competitive accuracy with fewer resources. The best-layer SHAP signature and activation status approaches offer lightweight yet effective detection. This efficacy extends to detecting attacks not seen during training.
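The pipeline named in the abstract (critical neurons, a critical neuron bitmap, a decision graph, and activation-based features) can be illustrated with a minimal sketch. It is not the thesis's implementation. This record does not state the selection or consensus criteria, so the top-k rule, the `min_share` threshold, the function names, and the hand-written SHAP values below are all assumptions made for illustration. In practice the per-layer SHAP values would come from a DeepSHAP explainer rather than being typed in.

```python
# Illustrative sketch only: the top-k rule, min_share threshold, and all names
# are assumptions, not the criteria used in the thesis.

def critical_neuron_bitmap(layer_shap, top_k):
    """Mark, per layer, the top_k neurons with the largest |SHAP| value."""
    bitmap = []
    for values in layer_shap:
        k = min(top_k, len(values))
        # Rank neuron indices by absolute contribution, largest first.
        ranked = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
        chosen = set(ranked[:k])
        bitmap.append([1 if i in chosen else 0 for i in range(len(values))])
    return bitmap


def decision_graph(bitmaps, min_share=0.5):
    """Keep neurons that are critical in at least min_share of the samples."""
    graph = []
    for layer_masks in zip(*bitmaps):  # same layer across all samples
        n = len(layer_masks)
        graph.append([1 if sum(col) / n >= min_share else 0
                      for col in zip(*layer_masks)])
    return graph


def flatten(bitmap):
    """Concatenate per-layer bits into one feature vector for a detector."""
    return [bit for layer in bitmap for bit in layer]


def activation_features(layer_activations, graph):
    """Keep only the activations of neurons marked in the decision graph."""
    return [a for acts, mask in zip(layer_activations, graph)
            for a, m in zip(acts, mask) if m]


# Hypothetical SHAP values for two samples of one class
# (two layers, with 4 and 3 neurons).
sample_a = [[0.9, -0.1, 0.05, 0.4], [0.2, -0.7, 0.1]]
sample_b = [[0.8, 0.02, -0.3, 0.5], [-0.6, 0.05, 0.3]]

bitmaps = [critical_neuron_bitmap(s, top_k=2) for s in (sample_a, sample_b)]
graph = decision_graph(bitmaps, min_share=1.0)  # strict consensus
features = activation_features([[0.5, 0.0, 1.2, 0.3], [0.7, 0.0, 0.9]], graph)

print(flatten(bitmaps[0]))  # decision logic of sample_a as a bit vector
print(flatten(graph))       # consensus critical neurons
print(features)             # activations restricted to the decision graph
```

Under these assumptions, a binary detector would be trained on the flattened bitmaps (decision logic approach) or on the restricted activations (activation status approach), using both normal and adversarial examples. The second option needs no SHAP computation at inference time once the decision graph is fixed, which matches the cost reduction the abstract describes.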
Contents
List of Figures
List of Tables
1 Introduction
2 Related Work
2.1 Adversarial Example Detection Method
2.1.1 Kernel Density and Bayesian Uncertainty
2.1.2 MagNet
2.1.3 Feature Squeezing
2.1.4 Natural Scene Statistics
2.2 DeepSHAP Application
2.2.1 In Engineering
2.2.2 In Aerospace
2.2.3 In Medical
2.3 Coverage Criteria on Deep Neural Network
3 DeepSHAP Summary
3.1 Phase 1: Data Collection
3.2 Phase 2: DeepSHAP Computation
3.3 Phase 3: Critical Neuron Identification
3.4 Phase 4: Critical Neuron Consensus
3.5 Phase 5: Decision Graph
4 Adversarial Example Detection
4.1 Decision Logic Approach
4.2 Best Layer SHAP Signature Approach
4.3 Activation Status Approach
5 Experiments
5.1 Experiment Setup
5.1.1 Adversarial Example Collection
5.1.2 Detection Model
5.2 Decision Logic Approach
5.2.1 Distribution Analysis
5.2.2 Decision Logic Approach Evaluation
5.3 Best Layer SHAP Signature Approach
5.3.1 Distribution Analysis
5.3.2 Best Layer Selection
5.3.3 Best Layer SHAP Signature Approach Evaluation
5.4 Activation Status Approach
5.4.1 Distribution Analysis
5.4.2 Activation Status Approach Evaluation
5.5 Performance
5.5.1 Comparison on Other Detection Methods
5.5.2 Transferability Evaluation
5.5.3 Performance Summary
6 Conclusions
Reference
Electronic full text (public release date on the Internet: 2028-08-20)