臺灣博碩士論文加值系統 (Taiwan National Digital Library of Theses and Dissertations)


Detailed Record

Researcher: 王俞媜
Researcher (English): WANG, YU-CHEN
Title: 羅吉斯迴歸模型下影響點的診斷分析
Title (English): Diagnostic Analysis of Influential Points in Logistic Regression Model
Advisor: 蔡嘉仁
Advisor (English): TSAI, JIA-REN
Committee Members: 陳瓊梅、李美賢
Committee Members (English): CHEN, CHYONG-MEI; LEE, MEI-HSIEN
Oral Defense Date: 2024-06-21
Degree: Master's
Institution: 輔仁大學 (Fu Jen Catholic University)
Department: 統計資訊學系應用統計碩士班 (Master's Program in Applied Statistics, Department of Statistics and Information Science)
Discipline: Mathematics and Statistics
Field: Statistics
Thesis Type: Academic thesis
Publication Year: 2024
Graduation Academic Year: 112 (ROC calendar)
Language: Chinese
Pages: 46
Keywords (Chinese): 羅吉斯迴歸 (logistic regression); 影響點診斷 (influential point diagnosis)
Keywords (English): Logistic Regression Model; Influential Points Diagnosis
Metrics:
  • Cited: 0
  • Views: 7
  • Rating:
  • Downloads: 0
  • Saved to personal bibliography: 0
Abstract (translated from Chinese): The logistic regression model is a statistical model widely used to characterize the relationship between a binary response variable and explanatory variables. Influential-point diagnosis is an important step in the analysis, because influential points in the data degrade the model fit. Many methods for detecting influential points have been developed in the diagnostic literature. This study proposes the "standard pseudo R^2 procedure (std. pseudo R^2)", which can be applied to identify multiple influential points in a dataset. It also examines, under the logistic regression model, how the std. pseudo R^2 built from different pseudo R^2 indices (McFadden R^2, Cox and Snell R^2, Nagelkerke R^2) differs in its ability to detect influential points. Empirical data analysis verifies that the std. pseudo R^2 procedure, under each of the pseudo R^2 calculation methods, as well as other multiple-influential-point methods, can effectively detect the influential points in the dataset. The simulation study considers different sample sizes, numbers of explanatory variables, and magnitudes and numbers of influential points (i.e., contamination rates). The results show that when the std. pseudo R^2 procedure is computed from Cox and Snell R^2, its swamping rate (SR) performs best for small samples (n = 30, 50), and when the sample size exceeds 20, the SR of the std. pseudo R^2 procedure is lower than that of the mCD* method. Moreover, under the same simulation conditions, comparing the diagnostic performance of the different pseudo R^2 indices shows that computing std. pseudo R^2 from McFadden R^2 yields a higher correct identification rate (CIR), whereas for sample sizes above 20, Cox and Snell R^2 yields a lower swamping rate.
Abstract (English): Logistic regression is a widely used statistical method for modeling the relationship between a binary response variable and its predictors. The presence of influential points, however, can adversely affect the model fit, so diagnosing influential points is a crucial step before further analysis. Previous studies have developed various methods for identifying single or multiple influential points in a dataset. This study proposes a procedure for detecting multiple influential points, referred to as the standard pseudo R^2 (std. pseudo R^2) procedure, and explores how its detection effectiveness differs across pseudo R^2 indices, namely McFadden R^2, Cox and Snell R^2, and Nagelkerke R^2. Empirical data analysis confirms that the std. pseudo R^2 procedure, under each of the pseudo R^2 calculation methods, as well as other multiple-influential-point methods, can effectively identify the influential points in the dataset. The simulation study considers various settings of sample size, number of independent variables, and the magnitude and number of influential points (i.e., the contamination rate; CR). The results indicate that the std. pseudo R^2 procedure computed via Cox and Snell R^2 exhibits the lowest swamping rate (SR) for smaller sample sizes (n = 30, 50), and when the sample size exceeds 20, the SR of the std. pseudo R^2 procedure remains lower than that of the mCD* method. Furthermore, comparing diagnostic effectiveness across the pseudo R^2 indices, computing std. pseudo R^2 from McFadden R^2 yields a higher correct identification rate (CIR), while for sample sizes above 20, Cox and Snell R^2 yields a lower swamping rate.
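The record names the three pseudo R^2 indices but does not reproduce their formulas or the std. pseudo R^2 procedure itself. For reference, the sketch below computes the three published indices from the fitted and null log-likelihoods, assuming statsmodels for the logistic fits; the helper names (pseudo_r2, loo_pseudo_r2) are hypothetical, and the leave-one-out loop only illustrates how deletion-based influence diagnostics of this kind are typically structured, not the thesis's actual algorithm.

```python
# Minimal sketch, assuming statsmodels. The pseudo R^2 formulas are the
# standard published ones (McFadden 1974; Cox & Snell 1989; Nagelkerke 1991).
# The leave-one-out helper is an illustrative guess at a deletion diagnostic,
# NOT the thesis's std. pseudo R^2 procedure.
import numpy as np
import statsmodels.api as sm

def pseudo_r2(y, X):
    """Return (McFadden, Cox-Snell, Nagelkerke) pseudo R^2 for a logistic fit."""
    fit = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
    ll_full, ll_null, n = fit.llf, fit.llnull, len(y)
    mcfadden = 1.0 - ll_full / ll_null                       # 1 - ln L1 / ln L0
    cox_snell = 1.0 - np.exp(2.0 * (ll_null - ll_full) / n)  # 1 - (L0/L1)^(2/n)
    nagelkerke = cox_snell / (1.0 - np.exp(2.0 * ll_null / n))
    return mcfadden, cox_snell, nagelkerke

def loo_pseudo_r2(y, X):
    """Hypothetical: recompute each index with observation i deleted.
    Large shifts from the full-data values flag candidate influential points."""
    return np.array([pseudo_r2(np.delete(y, i), np.delete(X, i, axis=0))
                     for i in range(len(y))])
```

One plausible reading of a "standard" pseudo R^2 statistic is to standardize the n leave-one-out changes and flag observations whose standardized change exceeds a cutoff; the thesis itself (Chapter 2, Section 3) defines the actual procedure.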
Chapter 1: Introduction 1
  Section 1: Research Background 1
  Section 2: Research Motivation and Objectives 2
  Section 3: Research Process 4
Chapter 2: Literature Review 5
  Section 1: Overview of Influential-Point Diagnostic Methods 5
  Section 2: Pseudo R^2 in Logistic Regression 10
  Section 3: The Standard Pseudo R^2 Diagnostic Method 11
Chapter 3: Empirical Data Analysis 15
  Section 1: Data Description 15
  Section 2: Analysis Results 17
Chapter 4: Simulation Study 22
  Section 1: Simulation Design 22
  Section 2: Simulation Results 23
Chapter 5: Conclusions and Suggestions 34
  Section 1: Research Findings 34
  Section 2: Research Limitations and Future Development 35
References 36
Appendix 41

Agresti, A. (2002). Categorical data analysis (2nd ed.). New Jersey: John Wiley & Sons, Inc.
Ahmad, S., Ramli, N. M., & Midi, H. (2012, December 3-4). Outlier detection in logistic regression and its application in medical data analysis [Conference presentation]. 2012 IEEE Colloquium on Humanities, Science and Engineering (CHUSER) (pp. 503-507). IEEE, Kota Kinabalu, Malaysia.
Atkinson, A. C. (1981). Two graphical displays for outlying and influential observations in regression. Biometrika, 68(1), 13-20.
Atkinson, A. C. (1986). Masking unmasked. Biometrika, 73(3), 533-541.
Barnett, V., & Lewis, T. (1994). Outliers in statistical data (3rd ed.). Chichester: John Wiley & Sons, Inc.
Belsley, D. A., Kuh, E., & Welsch, R. E. (1980). Regression diagnostics: Identifying influential data and sources of collinearity. New Jersey: John Wiley & Sons, Inc.
Billor, N., Hadi, A. S., & Velleman, P. F. (2000). BACON: Blocked adaptive computationally efficient outlier nominators. Computational Statistics & Data Analysis, 34(3), 279-298.
Brown, B. W. (1980). Prediction analysis for binary data. In R. G. Miller, B. Efron, B. W. Brown, & L. E. Moses (Eds.), Biostatistics casebook (pp. 3-18). New York: John Wiley & Sons.
Chatterjee, S., & Hadi, A. S. (1986). Influential observations, high leverage points, and outliers in linear regression. Statistical Science, 1(3), 379-393.
Christensen, R. (1997). Log-linear models and logistic regression (2nd ed.). New York: Springer.
Cook, R. D. (1977). Detection of influential observation in linear regression. Technometrics, 19(1), 15-18.
Cook, R. D., & Weisberg, S. (1982). Residuals and influence in regression. New York: Chapman and Hall.
Cook, R. D. (1986). Assessment of local influence. Journal of the Royal Statistical Society Series B: Statistical Methodology, 48(2), 133-155.
Coskun, B., & Alpu, O. (2019). Diagnostics of multiple group influential observations for logistic regression models. Journal of Statistical Computation and Simulation, 89(16), 3118-3136.
Cox, D. R., & Snell, E. J. (1989). Analysis of binary data (2nd ed.). London: Chapman and Hall/CRC.
Cragg, J. G., & Uhler, R. S. (1970). The demand for automobiles. The Canadian Journal of Economics / Revue canadienne d'Economique, 3(3), 386-406.
Davies, P., Imon, A. H. M. R., & Ali, M. M. (2004). A conditional expectation method for improved residual estimation and outlier identification in linear regression. International Journal of Statistical Sciences, Special issue, 191-208.
Elihimas Júnior, U. F., Couto, J. P., Pereira, W., Barros de Oliveira Sá, M. P., Tenório de França, E. E., Aguiar, F. C., et al. (2020). Logistic regression model in a machine learning application to predict elderly kidney transplant recipients with worse renal function one year after kidney transplant: elderly KTbot. Journal of Aging Research, 2020.
Ghosh, S. (2022). Deletion diagnostics in logistic regression. Journal of Applied Statistics, 1(1), 1-13.
Gordian, M. E., Haneuse, S., & Wakefield, J. (2006). An investigation of the association between traffic exposure and the diagnosis of asthma in children. Journal of Exposure Science & Environmental Epidemiology, 16(1), 49-55.
Hadi, A. S. (1992). A new measure of overall potential influence in linear regression. Computational Statistics & Data Analysis, 14(1), 1-27.
Hilbe, J. M. (2009). Logistic regression models. New York: Chapman and Hall/CRC.
Hodge, V., & Austin, J. (2004). A survey of outlier detection methodologies. Artificial Intelligence Review, 22, 85-126.
Imon, A. R., & Hadi, A. S. (2008). Identification of multiple outliers in logistic regression. Communications in Statistics-Theory and Methods, 37(11), 1697-1709.
Imon, A. R., & Hadi, A. S. (2013). Identification of multiple high leverage points in logistic regression. Journal of Applied Statistics, 40(12), 2601-2616.
Jennings, D. E. (1986). Outliers and residual distributions in logistic regression. Journal of the American Statistical Association, 81(396), 987-990.
Joshi, M. V. (2002). Learning classifier models for predicting rare phenomena. Unpublished doctoral dissertation, University of Minnesota, Twin Cities, Minnesota.
Maddala, G. S. (1983). Limited-dependent and qualitative variables in econometrics. New York: Cambridge University Press.
McFadden, D. (1974). Conditional logit analysis of qualitative choice behavior. In P. Zarembka (Ed.), Frontiers in econometrics (pp. 105-142). New York: Academic Press.
Menard, S. (2002). Applied logistic regression analysis (2nd ed.). Thousand Oaks, CA: Sage Publications.
Midi, H., & Ariffin, S. B. (2013). Modified standardized Pearson residual for the identification of outliers in logistic regression model. Journal of Applied Sciences, 13(6), 828-836.
Nagelkerke, N. J. (1991). A note on a general definition of the coefficient of determination. Biometrika, 78(3), 691-692.
Nurunnabi, A. A. M. (2008). Robust diagnostic deletion techniques in linear and logistic regression. Unpublished doctoral dissertation, University of Rajshahi, Motihar, Rajshahi.
Nurunnabi, A. A. M., & Nasser, M. (2011). Outlier diagnostics in logistic regression: A supervised learning technique. In 2009 International Conference on Machine Learning and Computing IPCSIT (Vol. 3, pp. 90-95).
Nurunnabi, A. A. M., Nasser, M., & Imon, A. H. M. R. (2016). Identification and classification of multiple outliers, high leverage points and influential observations in linear regression. Journal of Applied Statistics, 43(3), 509-525.
Nurunnabi, A. A. M., Rahmatullah Imon, A. H. M., & Nasser, M. (2010). Identification of multiple influential observations in logistic regression. Journal of Applied Statistics, 37(10), 1605-1624.
Pregibon, D. (1981). Logistic regression diagnostics. The Annals of Statistics, 9(4), 705-724.
Quenouille, M. H. (1956). Notes on bias in estimation. Biometrika, 43(3/4), 353-360.
Rahmatullah Imon, A. H. M. (2005). Identifying multiple influential observations in linear regression. Journal of Applied Statistics, 32(9), 929-946.
Rousseeuw, P. J. (1984). Least median of squares regression. Journal of the American Statistical Association, 79(388), 871-880.
Rousseeuw, P. J., & Leroy, A. M. (1987). Robust regression and outlier detection. New York: John Wiley & Sons, Inc.
Ryan, T. P. (1997). Modern regression methods (1st ed.). New York: John Wiley & Sons, Inc.
Sarkar, S. K., Midi, H., & Rana, S. (2011). Detection of outliers and influential observations in binary logistic regression: An empirical study. Journal of Applied Sciences, 11(1), 26-35.
Smith, T. J., & McKenna, C. M. (2013). A comparison of logistic regression pseudo R2 indices. Multiple Linear Regression Viewpoints, 39(2), 17-26. Retrieved from http://www.glmj.org/archives/articles/Smith_v39n2.pdf
Tsai, J. R., & Hsiao, S. W. (2022, submitted). Detection of influential observations based on a linear regression model with measurement errors.
Weiss, G. M., & Provost, F. (2003). Learning when training data are costly: The effect of class distribution on tree induction. Journal of Artificial Intelligence Research, 19, 315-354.
Zhang, Z. (2016). Residuals and regression diagnostics: focusing on logistic regression. Annals of Translational Medicine, 4(10).

Electronic full text (internet release date: 2029-07-31)