Graduate student: WANG, YU-CHEN
Thesis title: Diagnostic Analysis of Influential Points in Logistic Regression Model
Advisor: TSAI, JIA-REN
Keywords: Logistic Regression Model; Influential Points Diagnosis
Logistic regression is a widely used statistical method for modeling the relationship between a binary response variable and its predictors. However, influential points in the data can adversely affect the model fit, so diagnosing them is a crucial step before further analysis. Previous studies have developed various methods for identifying single or multiple influential points in a dataset. This study proposes a procedure for detecting multiple influential points, called the standard pseudo R^2 (std. pseudo R^2) procedure, and examines how its detection performance differs when it is built on different pseudo R^2 indicators: McFadden R^2, Cox and Snell R^2, and Nagelkerke R^2. An empirical data analysis confirms that the std. pseudo R^2 procedure, under each of these pseudo R^2 definitions, as well as other methods for multiple influential points, can effectively identify the influential points in the dataset. The simulation study varies the sample size, the number of independent variables, and the magnitude and number (i.e., the contamination rate; CR) of the influential points. The results indicate that the std. pseudo R^2 procedure computed with Cox and Snell R^2 attains the lowest swamping rate (SR) for small samples (n = 30, 50), and that when the sample size exceeds 20, the SR of the std. pseudo R^2 procedure remains lower than that of the mCD* method. Furthermore, when diagnostic performance is compared across pseudo R^2 indicators under the same simulation conditions, computing std. pseudo R^2 from McFadden R^2 yields a higher correct identification rate (CIR), whereas for sample sizes above 20, Cox and Snell R^2 yields a lower swamping rate.
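The three pseudo R^2 indicators named in the abstract have standard closed forms in terms of the null (intercept-only) log-likelihood ln L_0, the fitted-model log-likelihood ln L_M, and the sample size n: McFadden R^2 = 1 - ln L_M / ln L_0; Cox and Snell R^2 = 1 - (L_0 / L_M)^(2/n); Nagelkerke R^2 = Cox and Snell R^2 / (1 - L_0^(2/n)). Below is a minimal Python sketch of these formulas. It assumes the fitted probabilities p_hat are already available from some logistic regression fit. The function names `log_likelihood` and `pseudo_r2` are hypothetical, and this is not the thesis's std. pseudo R^2 procedure itself.

```python
import math
from typing import Dict, Sequence


def log_likelihood(y: Sequence[int], p: Sequence[float]) -> float:
    """Bernoulli log-likelihood: sum of y*ln(p) + (1 - y)*ln(1 - p)."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, p))


def pseudo_r2(y: Sequence[int], p_hat: Sequence[float]) -> Dict[str, float]:
    """McFadden, Cox-Snell and Nagelkerke R^2 for a fitted logistic model.

    y     : observed 0/1 responses
    p_hat : fitted probabilities from the full model, strictly inside (0, 1)
    """
    n = len(y)
    if n == 0 or n != len(p_hat):
        raise ValueError("y and p_hat must be non-empty and of equal length")
    p_bar = sum(y) / n  # MLE of the success probability in the null model
    if not 0 < p_bar < 1:
        raise ValueError("y must contain both 0s and 1s")

    ll_0 = log_likelihood(y, [p_bar] * n)  # ln L_0, intercept-only model
    ll_m = log_likelihood(y, p_hat)        # ln L_M, fitted model

    mcfadden = 1 - ll_m / ll_0
    # (L_0 / L_M)^(2/n) written on the log scale for numerical stability
    cox_snell = 1 - math.exp(2 * (ll_0 - ll_m) / n)
    # Nagelkerke divides Cox-Snell by its maximum attainable value 1 - L_0^(2/n)
    max_cox_snell = 1 - math.exp(2 * ll_0 / n)
    nagelkerke = cox_snell / max_cox_snell
    return {"mcfadden": mcfadden, "cox_snell": cox_snell,
            "nagelkerke": nagelkerke}


if __name__ == "__main__":
    # Illustrative toy values only, not data from the thesis
    y = [0, 0, 1, 1]
    p_hat = [0.2, 0.3, 0.7, 0.8]
    print(pseudo_r2(y, p_hat))
```

Because Nagelkerke R^2 rescales Cox and Snell R^2 by its upper bound, it is always at least as large as Cox and Snell R^2 for the same fit. A deletion-based diagnostic would recompute such values on reduced datasets. How the thesis standardizes them is not described in this abstract.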
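Chapter 1 Introduction
Section 1 Research Background
Section 2 Research Motivation and Objectives
Section 3 Research Procedure
Chapter 2 Literature Review
Section 1 Methods for Diagnosing Influential Points
Section 2 Pseudo R^2 Measures for Logistic Regression
Section 3 The Standard Pseudo R^2 Diagnostic Method
Chapter 3 Empirical Data Analysis
Section 1 Data Description
Section 2 Analysis Results
Chapter 4 Simulation Study
Section 1 Simulation Design
Section 2 Simulation Results
Chapter 5 Conclusions and Recommendations
Section 1 Research Findings
Section 2 Limitations and Future Work
References
Appendix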

