National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

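Graduate student: 黃孟凡
Graduate student (English name): Meng-Fan Huang
Thesis title: 發展羅吉斯迴歸模型平均法評估稀有事件之風險
Thesis title (English): Logistic Regression Model Averaging for Rare Events Data
Advisor: 陳春樹
Advisor (English name): Chun-Shu Chen
Degree: Master's
Institution: National Changhua University of Education (國立彰化師範大學)
Department: Graduate Institute of Statistics and Information Science (統計資訊研究所)
Discipline: Mathematics and Statistics
Field: Statistics
Thesis type: Academic thesis
Year of publication: 2013
Graduation academic year: 101 (ROC calendar, i.e., 2012–2013)
Language: English
Number of pages: 40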
Keywords: Imbalanced data; Kullback-Leibler loss; Maximum likelihood estimate; Risk assessment; Uncertainty
Usage statistics:
  • Cited: 1
  • Views: 826
  • Downloads: 120
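Chinese abstract (English translation): In many statistical applications, the logistic regression model is commonly used to analyze binary data with explanatory variables. When the two classes in the data are extremely imbalanced, the estimates of the model parameters have been shown to be severely biased, so the corresponding risk-assessment inferences will be inaccurate. In this thesis, we apply logistic regression models to assess the risk variation of rare events. Rather than using one particular variable selection criterion to pick a single best model, we propose a local model averaging procedure by applying a data perturbation technique to several variable selection criteria, thereby obtaining different risk estimates, and then choose among them using an approximately unbiased estimator of the Kullback-Leibler loss. The proposed local model averaging procedure accounts for the uncertainty in parameter estimation and model selection that is usually ignored in ordinary model fitting, so the proposed method performs better. We demonstrate the effectiveness of the proposed method through comprehensive simulation experiments and illustrate its feasibility by analyzing real data on necrotizing enterocolitis.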
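The single-model baseline that the abstract argues against, choosing one model by a variable selection criterion, can be sketched as an exhaustive search over covariate subsets scored by AIC or BIC. This is an illustrative sketch only: the function names `fit_logistic` and `select_model`, the Newton-Raphson fitter, the all-subsets search, and the simulated data are assumptions, not details taken from the thesis. AIC and BIC differ only in the per-coefficient penalty (2 versus log n).

```python
import itertools

import numpy as np


def fit_logistic(D, y, n_iter=50, tol=1e-10):
    """Logistic-regression MLE by Newton-Raphson; D already contains an intercept column.

    Returns (beta, log_likelihood). Assumes the data show no (quasi-)separation.
    """
    beta = np.zeros(D.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(D @ beta)))
        hess = D.T @ (D * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, D.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    eta = D @ beta
    # Bernoulli log-likelihood: sum of y * eta - log(1 + exp(eta)).
    return beta, float(np.sum(y * eta - np.logaddexp(0.0, eta)))


def select_model(X, y, penalty):
    """Hypothetical helper: best covariate subset (intercept always kept).

    Minimises -2 * loglik + penalty * (number of coefficients);
    penalty = 2 gives AIC, penalty = log(n) gives BIC.
    """
    n, p = X.shape
    D = np.column_stack([np.ones(n), X])
    best_crit, best_vars = np.inf, ()
    for k in range(p + 1):
        for subset in itertools.combinations(range(p), k):
            cols = [0] + [j + 1 for j in subset]
            _, loglik = fit_logistic(D[:, cols], y)
            crit = -2.0 * loglik + penalty * len(cols)
            if crit < best_crit:
                best_crit, best_vars = crit, subset
    return best_vars, best_crit


rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 3))
# Hypothetical rare-event data: only covariate 0 affects the risk.
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-3.0 + X[:, 0]))))
aic_vars, _ = select_model(X, y, penalty=2.0)
bic_vars, _ = select_model(X, y, penalty=np.log(n))
print("AIC selects", aic_vars, "| BIC selects", bic_vars)
```

Because BIC charges more per coefficient whenever log n > 2, the model it selects never has more coefficients than the AIC choice under the same search, although the two criteria can still disagree on which covariates enter.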
Abstract: In statistical applications, logistic regression is a popular method for analyzing binary data accompanied by explanatory variables. When the two classes are extremely imbalanced, however, the estimates of the model parameters have been shown to be severely biased, and hence inferences about relative risks based on a selected model can be inaccurate. In this thesis, we focus on assessing the risk variations of rare events based on logistic regression models. Instead of selecting a single best model according to a particular variable selection criterion, we propose a local model averaging procedure that applies a data perturbation technique to different information criteria to obtain new risk estimates. An approximately unbiased estimator of the Kullback-Leibler loss is then used to choose the best among them. The proposed local model averaging procedure accounts for both estimation uncertainty and selection uncertainty, which are generally not considered by other modeling procedures, and the proposed method therefore shows superior performance in various situations. We present comprehensive simulations to assess the robustness of our approach and illustrate it with a real data example on necrotizing enterocolitis (NEC).
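The general shape of the procedure in the abstract (perturb the data, re-apply an information criterion, average the resulting risk estimates) can be sketched as follows. This is a minimal illustration under explicit assumptions, not the thesis's algorithm: the perturbation rule (each response redrawn with probability `tau` from the risk fitted by the criterion-selected base model), equal weighting of the `T` rounds, refitting each selected model on the original responses, and all function names are hypothetical. The thesis's final step, choosing among the criterion-based averages with an approximately unbiased estimator of the KL loss, is not reproduced.

```python
import itertools

import numpy as np


def expit(eta):
    return 1.0 / (1.0 + np.exp(-eta))


def fit_logistic(D, y, n_iter=50, tol=1e-10):
    """Newton-Raphson MLE; D includes an intercept column. Returns (beta, loglik)."""
    beta = np.zeros(D.shape[1])
    for _ in range(n_iter):
        p = expit(D @ beta)
        hess = D.T @ (D * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, D.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    eta = D @ beta
    return beta, float(np.sum(y * eta - np.logaddexp(0.0, eta)))


def select_columns(D, y, penalty):
    """Best subset of D's covariate columns under -2*loglik + penalty*(#coefficients)."""
    best_crit, best_cols = np.inf, (0,)
    for k in range(D.shape[1]):
        for subset in itertools.combinations(range(1, D.shape[1]), k):
            cols = (0,) + subset  # column 0 is the intercept
            _, loglik = fit_logistic(D[:, list(cols)], y)
            crit = -2.0 * loglik + penalty * len(cols)
            if crit < best_crit:
                best_crit, best_cols = crit, cols
    return best_cols


def local_model_average(X, y, penalty, T=100, tau=0.5, seed=0):
    """Hypothetical sketch: equal-weight average of risks over T perturbed selections."""
    rng = np.random.default_rng(seed)
    D = np.column_stack([np.ones(len(y)), X])
    base = list(select_columns(D, y, penalty))
    p_hat = expit(D[:, base] @ fit_logistic(D[:, base], y)[0])
    risk_by_model = {}
    avg_risk = np.zeros(len(y))
    for _ in range(T):
        # Assumed perturbation: with probability tau, redraw y_i ~ Bernoulli(p_hat_i).
        redraw = rng.random(len(y)) < tau
        y_star = np.where(redraw, rng.binomial(1, p_hat), y)
        cols = select_columns(D, y_star, penalty)
        if cols not in risk_by_model:
            # Refit the model chosen on perturbed data using the ORIGINAL responses.
            beta, _ = fit_logistic(D[:, list(cols)], y)
            risk_by_model[cols] = expit(D[:, list(cols)] @ beta)
        avg_risk += risk_by_model[cols] / T
    return avg_risk, risk_by_model


rng = np.random.default_rng(1)
n = 500
X = rng.normal(size=(n, 3))
# Hypothetical rare-event data with two active covariates.
y = rng.binomial(1, expit(-2.5 + 0.8 * X[:, 0] + 0.4 * X[:, 1]))
risk_aic, models_aic = local_model_average(X, y, penalty=2.0, T=50)
risk_bic, models_bic = local_model_average(X, y, penalty=np.log(n), T=50)
print("models visited by BIC-based averaging:", sorted(models_bic))
```

Each model's weight in the average equals the fraction of perturbed rounds in which it was selected. Because every refitted model contains an intercept, each refitted risk vector has the same mean as `y`, and so does their average.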
Contents
1 Background and Motivation
2 Logistic Regression Model
3 Model Selection and Model Averaging
3.1 Model Selection
3.2 Model Averaging
4 The Proposed Methodology
4.1 Local Model Average Estimators
4.2 Assessment of LMA Estimators
5 Simulation Study
5.1 Simulation Scenarios
5.2 Simulation Results and Sensitivity Study
6 Applications to the Necrotizing Enterocolitis Data
7 Conclusion

List of Tables
1 True values and MLEs of model parameters for each case of Experiment I, where the values in parentheses are the standard deviations of the parameter estimates based on 400 simulation replicates
2 True values and MLEs of model parameters for each case of Experiment II, where the values in parentheses are the standard deviations of the parameter estimates based on 400 simulation replicates
3 Average percentage of rare events (Y = 1) for Experiment I and Experiment II based on 400 simulation replicates, where the values in parentheses are the corresponding standard deviations
4 KL loss performance of various methods for Experiment I based on 400 simulation replicates, where the values in parentheses are the corresponding standard errors
5 KL loss performance of various methods for Experiment II based on 400 simulation replicates, where the values in parentheses are the corresponding standard errors
6 Distributions of the selected variables for the AIC and BIC criteria under various cases of Experiment I and Experiment II based on 400 simulation replicates, where the symbol “∗” indicates that the underlying true model is selected exactly
7 Number of times our proposed adaptive-LMA method averages around the AIC or BIC criterion for various cases of Experiment I and Experiment II based on 400 simulation replicates
8 KL loss performance of our proposed adaptive-LMA method with respect to various perturbation sizes τ and T = 400 for Case 3 of Experiment I and Experiment II based on 400 simulation replicates, where the values in parentheses are the corresponding standard errors
9 KL loss performance of our proposed adaptive-LMA method with respect to various MC sample sizes T and τ = 0.5 for Case 3 of Experiment I and Experiment II based on 400 simulation replicates, where the values in parentheses are the corresponding standard errors
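The KL loss reported in Tables 4, 5, 8 and 9 compares estimated risks with the true risks known in a simulation, summarized over 400 replicates with standard errors. The sketch below assumes the loss is the Bernoulli Kullback-Leibler divergence from the true to the estimated risk, summed over individuals; the direction of the divergence, summing rather than averaging, and the helper names `bernoulli_kl_loss` and `mean_and_se` are assumptions. It is not the approximately unbiased KL-loss estimator proposed in the thesis, which must work without access to the true risks.

```python
import numpy as np


def bernoulli_kl_loss(p_true, p_hat, eps=1e-12):
    """Sum over individuals of KL(Bernoulli(p_true_i) || Bernoulli(p_hat_i)).

    Probabilities are clipped to [eps, 1 - eps] to avoid log(0).
    """
    p = np.clip(np.asarray(p_true, dtype=float), eps, 1.0 - eps)
    q = np.clip(np.asarray(p_hat, dtype=float), eps, 1.0 - eps)
    return float(np.sum(p * np.log(p / q) + (1.0 - p) * np.log((1.0 - p) / (1.0 - q))))


def mean_and_se(losses):
    """Average loss over simulation replicates and its standard error."""
    losses = np.asarray(losses, dtype=float)
    return losses.mean(), losses.std(ddof=1) / np.sqrt(losses.size)


p_true = np.array([0.02, 0.05, 0.10])
print(bernoulli_kl_loss(p_true, p_true))  # identical risks give zero loss: 0.0
print(bernoulli_kl_loss([0.1], [0.2]))    # about 0.0367
```

For a single individual with true risk 0.1 estimated as 0.2, the loss is 0.1 ln(0.5) + 0.9 ln(1.125), roughly 0.0367; the divergence is asymmetric, so overestimating and underestimating a rare-event risk by the same amount are penalized differently.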

List of Figures
1 Plots of averaged KL loss values for the five cases of Experiment I based on 400 simulation replicates
2 Plots of averaged KL loss values for the five cases of Experiment II based on 400 simulation replicates
3 Risk estimates of severe NEC for 188 individuals based on various methods
4 Frequency of the selected models (variables) for the BIC-LMA method based on 400 perturbed data sets