National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

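Author: 黃孟凡
Author (English): Meng-Fan Huang
Title: 發展羅吉斯迴歸模型平均法評估稀有事件之風險
Title (English): Logistic Regression Model Averaging for Rare Events Data
Advisor: 陳春樹
Advisor (English): Chun-Shu Chen
Degree: Master's
Institution: National Changhua University of Education (國立彰化師範大學)
Department: Institute of Statistics and Information Science (統計資訊研究所)
Discipline: Mathematics and Statistics
Field: Statistics
Thesis type: Academic thesis
Year of publication: 2013
Academic year of graduation: 101 (ROC calendar)
Language: English
Keywords: Imbalanced data, Kullback-Leibler loss, Maximum likelihood estimate, Risk assessment, Uncertainty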

Abstract
Logistic regression is widely used to analyze binary data accompanied by explanatory variables. When the two classes are extremely imbalanced, however, the estimates of the model parameters have been shown to be severely biased, so inferences about relative risks based on a single selected model can be inaccurate. This thesis focuses on assessing the risk variations of rare events with logistic regression models. Instead of selecting one best model with a particular variable selection criterion, we propose a local model averaging (LMA) procedure that applies a data perturbation technique to different information criteria to obtain new risk estimates. An approximately unbiased estimator of the Kullback-Leibler (KL) loss is then used to choose the best among them. The proposed procedure accounts for both estimation uncertainty and selection uncertainty, which conventional model-fitting procedures generally ignore, and is therefore expected to perform better in a variety of situations. A comprehensive simulation study assesses the effectiveness and robustness of the approach, and an analysis of real necrotizing enterocolitis (NEC) data illustrates its practical use.
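To make the role of the Kullback-Leibler loss in the abstract more concrete, the following is a minimal sketch in Python. It is not the thesis's procedure: this record does not give the exact loss definition, the perturbation scheme, or the averaging weights, so the Bernoulli KL loss, the equal weights, and all numbers and names (`kl_loss`, `average_risks`, `model_a`, `model_b`) are assumptions chosen for illustration. The sketch only shows how risk estimates from several candidate models can be averaged pointwise and compared by KL loss against known true risks, as one could do in a simulation.

```python
import math


def kl_loss(p_true, p_hat, eps=1e-12):
    """Total Bernoulli KL loss between true risks and estimated risks.

    Assumed form: sum_i [p_i log(p_i/q_i) + (1-p_i) log((1-p_i)/(1-q_i))].
    """
    total = 0.0
    for p, q in zip(p_true, p_hat):
        q = min(max(q, eps), 1.0 - eps)  # clip so the logarithms stay finite
        if p > 0.0:
            total += p * math.log(p / q)
        if p < 1.0:
            total += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return total


def average_risks(risk_vectors, weights=None):
    """Pointwise weighted average of risk estimates from several models.

    Equal weights are used by default (an assumption, not the thesis's rule).
    """
    k = len(risk_vectors)
    if weights is None:
        weights = [1.0 / k] * k
    n = len(risk_vectors[0])
    return [sum(w * r[i] for w, r in zip(weights, risk_vectors))
            for i in range(n)]


# Hypothetical true risks of a rare event for five individuals.
p_true = [0.01, 0.02, 0.05, 0.03, 0.10]
# Hypothetical risk estimates from two candidate models.
model_a = [0.005, 0.01, 0.08, 0.02, 0.15]
model_b = [0.03, 0.04, 0.03, 0.05, 0.06]

averaged = average_risks([model_a, model_b])
loss_a = kl_loss(p_true, model_a)
loss_b = kl_loss(p_true, model_b)
loss_avg = kl_loss(p_true, averaged)
print(loss_a, loss_b, loss_avg)
```

Because the KL loss is convex in the estimated probabilities, the loss of the averaged estimate can never exceed the average of the individual losses. This is one general reason averaging can guard against committing to a single poorly chosen model. In practice the true risks are unknown, which is why the abstract relies on an approximately unbiased estimator of the KL loss instead.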

Contents
1 Background and Motivation
2 Logistic Regression Model
3 Model Selection and Model Averaging
3.1 Model Selection
3.2 Model Averaging
4 The Proposed Methodology
4.1 Local Model Average Estimators
4.2 Assessment of LMA Estimators
5 Simulation Study
5.1 Simulation Scenarios
5.2 Simulation Results and Sensitivity Study
6 Applications to the Necrotizing Enterocolitis Data
7 Conclusion

List of Tables
1 True values and MLEs of the model parameters for each case of Experiment I, based on 400 simulation replicates; values in parentheses are the standard deviations of the parameter estimates
2 True values and MLEs of the model parameters for each case of Experiment II, based on 400 simulation replicates; values in parentheses are the standard deviations of the parameter estimates
3 Average percentage of rare events (Y = 1) for Experiment I and Experiment II, based on 400 simulation replicates; values in parentheses are the corresponding standard deviations
4 KL loss performance of various methods for Experiment I, based on 400 simulation replicates; values in parentheses are the corresponding standard errors
5 KL loss performance of various methods for Experiment II, based on 400 simulation replicates; values in parentheses are the corresponding standard errors
6 Distributions of the variables selected by the AIC and BIC criteria under various cases of Experiment I and Experiment II, based on 400 simulation replicates; the symbol "∗" indicates that the underlying true model is selected exactly
7 Number of times the proposed adaptive-LMA method averages around the AIC or the BIC criterion for various cases of Experiment I and Experiment II, based on 400 simulation replicates
8 KL loss performance of the proposed adaptive-LMA method for various perturbation sizes τ with T = 400, for Case 3 of Experiment I and Experiment II, based on 400 simulation replicates; values in parentheses are the corresponding standard errors
9 KL loss performance of the proposed adaptive-LMA method for various MC sample sizes T with τ = 0.5, for Case 3 of Experiment I and Experiment II, based on 400 simulation replicates; values in parentheses are the corresponding standard errors

List of Figures
1 Plots of averaged KL loss values with respect to five cases for Experiment I based on 400 simulation replicates
2 Plots of averaged KL loss values with respect to five cases for Experiment II based on 400 simulation replicates
3 Risk estimates of severe NEC for 188 individuals based on various methods
4 Frequency of the selected models (variables) for the BIC-LMA method based on 400 perturbed data sets

References
Agresti, A. (2007). An Introduction to Categorical Data Analysis (second edition). John Wiley: New York.
Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In International Symposium on Information Theory (B. N. Petrov and F. Csáki eds.), Akadémiai Kiadó, Budapest, 267-281.
Bolton, R. J. and Hand, D. J. (2002). Statistical fraud detection: A review. Statistical Science, 17(3), 235-255.
Breiman, L. (1996). Heuristics of instability and stabilization in model selection. Annals of Statistics, 24, 2350-2383.
Burnham, K. P. and Anderson, D. R. (2002). Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach (second edition). Springer-Verlag: New York.
Chen, C. S. and Chang, Y. M. (2011). Model selection for two-sample problems with right-censored data: an application of Cox model. Journal of Statistical Planning and Inference, 141, 2120-2127.
Chen, C. S. and Huang, H. C. (2012). Geostatistical model averaging based on conditional information criteria. Environmental and Ecological Statistics, 19, 23-35.
Claeskens, G. and Hjort, N. L. (2008). Model Selection and Model Averaging. Cambridge University Press: Cambridge.
Cox, D. R. (1970). The Analysis of Binary Data. Methuen: London.
Gregory, K. E. (2008). Clinical predictors of necrotizing enterocolitis in premature infants. Nursing Research, 57, 260-270.
Hoeting, J. A., Madigan, D., Raftery, A. E., and Volinsky, C. T. (1999). Bayesian model averaging: A tutorial (with Discussion). Statistical Science, 14, 382-401.
King, G. and Zeng, L. (2001). Logistic regression in rare events data. Political Analysis, 9(2), 137-163.
Lin, H. C., Hsu, C. H., Chen, H. L., Chung, M. Y., Hsu, J. F., Lien, R. I., Tsao, L. Y., Chen, C. H., and Su, B. H. (2008). Oral probiotics prevent Necrotizing Enterocolitis in very low birth weight preterm infants: a multicenter, randomized, controlled trial. Pediatrics, 122(4), 693-700.
Lin, H. C., Su, B. H., Chen, A. C., Lin, T. W., Tsai, C. H., Teh, T. F., and Oh, W. (2005). Oral probiotics reduce the incidence and severity of Necrotizing Enterocolitis in very low birth weight infants. Pediatrics, 115(1), 1-4.
Nishii, R. (1984). Asymptotic properties of criteria for selection of variables in multiple regression. Annals of Statistics, 12, 758-765.
Owen, A. B. (2007). Infinitely imbalanced logistic regression. Journal of Machine Learning Research, 8, 761-773.
Schaefer, R. L. (1983). Bias correction in maximum likelihood logistic regression. Statistics in Medicine, 2, 71-78.
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461-464.
Shen, X., Huang, H. C., and Ye, J. (2004). Adaptive model selection and assessment for exponential family models. Technometrics, 46, 306-317.
Shen, X., and Ye, J. (2002). Adaptive model selection. Journal of the American Statistical Association, 97, 210-221.
Ye, J. (1998). On measuring and correcting the effects of data mining and model selection. Journal of the American Statistical Association, 93, 120-131.
Zhu, M., Su, W., and Chipman, H. A. (2006). LAGO: A computationally efficient approach for statistical detection. Technometrics, 48, 193-205.
