National Digital Library of Theses and Dissertations in Taiwan

Detailed Record

Author: 郭家諭 (Kuo, Chia-Yu)
Title: 應用個人化的特徵歸因與反事實解釋建構可解釋的機器學習風險預測系統於家內兒童暴力事件
Title (English): Explainable Risk Prediction System for Child Abuse Event by Individual Feature Attribution and Counterfactual Explanation
Advisor: 盧鴻興 (Lu, Henry Horng-Shing)
Degree: Master's
Institution: 國立交通大學 (National Chiao Tung University)
Department: Institute of Statistics
Discipline: Mathematics and Statistics
Field: Statistics
Document Type: Academic thesis
Year of Publication: 2019
Academic Year of Graduation: 107 (2018–2019)
Language: English
Pages: 40
Keywords: Explainable Machine Learning; Individual Feature Attribution; Shapley Value; Kernel SHAP; Counterfactual Explanation; Risk Prediction; Violence Prevention
Usage Statistics:
  • Cited: 0
  • Views: 320
  • Downloads: 0
  • Bookmarked: 0
When building machine learning models, we often have to choose between model performance and model interpretability. Complex models such as ensemble learning and deep neural networks achieve high predictive accuracy, but their predictions are usually hard to interpret. To users unfamiliar with machine learning, these methods become black-box models: highly accurate, yet revealing nothing beyond the predicted value itself. Understanding why a model makes its predictions helps us trust a black-box model and helps users make the corresponding decisions. This thesis applies methods from explainable machine learning to build a risk prediction system that is both highly accurate and readily interpretable.

The data used in this thesis were provided by the Taipei City Center for Prevention of Domestic Violence and Sexual Assault (臺北市家庭暴力暨性侵害防治中心). We build a risk prediction system that predicts the risk that a domestic child abuse case suffers repeated abuse before the case is resolved. The system also provides individual feature attribution and counterfactual explanations: individual feature attribution explains each individual prediction through the influence of each feature on the predicted result, while counterfactual explanations offer social workers intervention suggestions, aiming to reduce a case's risk in the most efficient way.
There is always a trade-off between performance and interpretability. Complex models such as ensemble learning can achieve outstanding prediction accuracy, but they are not easy to interpret. Understanding why a model made a prediction helps us trust the black-box model and helps users make decisions. This work applies techniques of explainable machine learning to develop a model for empirical data with high predictive accuracy and good interpretability.
In this study, we use data provided by the Taipei City Center for Prevention of Domestic Violence and Sexual Assault (臺北市家庭暴力暨性侵害防治中心) to develop a risk prediction model that predicts the probability of a recurring violence incident in the same case before the case is resolved. The model also provides individual feature attributions and counterfactual explanations to help social workers conduct interventions for violence prevention.
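The individual feature attribution described above corresponds to Kernel SHAP (Section 3.4), which estimates each feature's Shapley value, i.e., its contribution to one case's predicted risk relative to the average prediction. The following is a minimal sketch of that idea, not the thesis code: the case data are confidential, so the features and labels here are synthetic stand-ins, and only the XGBoost + shap combination is taken from the thesis.

import numpy as np
import shap
import xgboost as xgb

# Synthetic stand-in for the confidential case records.
rng = np.random.default_rng(0)
X = rng.random((500, 4))                              # 4 hypothetical case features
y = (X[:, 0] + 0.5 * X[:, 2] > 0.9).astype(int)       # hypothetical recurrence label

model = xgb.XGBClassifier(n_estimators=100, max_depth=3)
model.fit(X, y)

# Kernel SHAP is model-agnostic: it needs only the prediction function
# and a background sample to estimate per-feature Shapley values.
background = shap.sample(X, 50)
explainer = shap.KernelExplainer(lambda d: model.predict_proba(d)[:, 1], background)
phi = explainer.shap_values(X[:1])                    # attribution for one case

print(explainer.expected_value, phi)

Here phi[j] is feature j's contribution to this case's predicted risk; by construction the contributions plus the base rate explainer.expected_value sum to the model's prediction for that case.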
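The counterfactual explanation answers a complementary question: which small set of changes would bring a case's predicted risk below a given level? Greedy best-first search is one of the two approaches listed in the thesis (Section 4.1); the sketch below is an illustrative assumption rather than the thesis algorithm, and the set of actionable changes available to a social worker is hypothetical. It repeatedly applies whichever single candidate action lowers the predicted risk most.

import numpy as np

def greedy_counterfactual(predict_risk, x, actions, target=0.5, max_steps=5):
    """Greedy best-first search for a counterfactual (illustrative sketch).
    predict_risk: maps a 1-D feature vector to a risk score in [0, 1].
    actions: list of (feature_index, new_value) changes a social worker
             could plausibly make; hypothetical here."""
    x = np.array(x, dtype=float)
    applied = []
    for _ in range(max_steps):
        risk = predict_risk(x)
        if risk < target:
            break                         # target risk level reached
        best = None
        for j, v in actions:
            if x[j] == v:
                continue                  # action already in effect
            trial = x.copy()
            trial[j] = v
            r = predict_risk(trial)
            if best is None or r < best[0]:
                best = (r, j, v)
        if best is None or best[0] >= risk:
            break                         # no remaining action reduces risk
        _, j, v = best
        x[j] = v
        applied.append((j, v))
    return x, applied

With the model from the previous sketch, predict_risk could be lambda v: model.predict_proba(v.reshape(1, -1))[0, 1]; the returned applied list is then the suggested sequence of interventions, read off in order of effectiveness.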
Table of Contents:
Abstract (Chinese) i
Abstract (English) ii
Acknowledgments iii
Outline iv
List of Figures v
List of Tables vi
List of Algorithms vi
Notations vii
1 Introduction 1
1.1 Background 1
1.2 Interpretable Machine Learning 1
1.3 Proposed System 3
2 Machine Learning Model: XGBoost 5
3 Individual Feature Attribution 9
3.1 Model-Specific Method: An Example of XGBoost 10
3.2 Local Surrogate Model: LIME 11
3.3 Shapley Value 14
3.4 Kernel SHAP 18
3.5 Comparison 21
4 Counterfactual Explanation 24
4.1 Greedy Best First Search 24
4.2 Optimization Approach 25
5 Results 28
5.1 Data Description 28
5.2 Machine Learning Model 28
5.3 Individual Feature Attribution 30
5.4 Counterfactual Explanation 33
6 Conclusion and Discussion 37
References 38
Appendix 39