National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

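Graduate student: 李詩芸 (Shih-Yun Lee)
Thesis title: 共線性及遺漏值對迴歸模式變項選擇影響
Thesis title (English): Model selection in regression analysis on the data with multicollinearity and missing covariates
Advisor: 林逸芬 (I-Feng Lin)
Degree: Master's
Institution: 國立陽明大學 (National Yang-Ming University)
Department: Institute of Public Health
Discipline: Medicine and Health
Field: Public Health
Thesis type: Academic thesis
Year of publication: 2004
Academic year of graduation: 92 (ROC calendar)
Language: Chinese
Keywords: collinearity, multicollinearity, missing values, variable selection, simulation study, linear regression, logistic regression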
In public health research, multicollinearity and missing values are common problems. When the covariates are correlated with one another, multicollinearity is said to exist. In regression analysis, severe multicollinearity often makes the parameter estimates unstable and inflates their standard errors, which can in turn interfere with or mislead model selection. When missing values occur in important variables, some information is lost and the results of the analysis may be biased. Few previous studies have examined the effects of multicollinearity and missing values on variable selection in regression models at the same time.
This study compares the effect on statistical model selection of (1) the presence or absence of missing values in the covariates, (2) the presence or absence of multicollinearity among the covariates, and (3) the simultaneous presence of both. Records of patients with obstructive sleep apnea syndrome (OSAS) from a medical center serve as the population for a series of simulation studies. These examine how variable selection is affected by sample size, severity of multicollinearity, proportion of missing values, explanatory power of the model (R²), the criteria for variables to enter or leave the model, and the selection method. Given the nature of the data, the study focuses on linear regression and treats logistic regression only in part. The literature review in the following chapters covers three topics: multicollinearity, missing values, and variable selection.
The linear regression results show that missing values have little effect on the proportion of times the correct model is selected, with some effect only in small samples. Multicollinearity has a large effect: collinearity in either the candidate variables or the true model lowers that proportion. When neither the true model nor the candidate variables are collinear, the level of R², the entry/removal criteria, and the selection method have very little effect on the proportion. When the true model includes a quadratic term or the candidate variables include interaction terms, stricter entry/removal criteria give worse results in small samples and better results in large samples. A larger R² gives a higher proportion regardless of sample size. When the true model is itself collinear and the candidate variables also include interaction terms, that is, when severe multicollinearity is present, the selection method matters greatly: backward elimination selects the correct model far more often than forward stepwise selection.
In terms of the distribution of P values, when the true model is collinear, covariates in the estimated model do not lose significance because of collinearity as long as the sample is large enough. When the true model is not collinear, covariates in the estimated model that do not appear in the true model are not significant, whether or not collinearity is present.
The logistic regression results show little difference between forward stepwise selection and backward elimination. An entry/removal criterion of 0.05 performs worse than 0.1 in small samples and better than 0.1 in large samples. Missing values slightly lower the proportion of times the correct model is selected. Overall, sample size has a very large effect on the results.
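The point made in the abstract, that severe multicollinearity inflates standard errors, can be illustrated with the variance inflation factor (VIF). The sketch below is not the thesis's simulation code. It assumes only two standard-normal covariates, for which the VIF reduces to 1/(1 - r²), and it assumes values are missing completely at random and handled by dropping incomplete rows. The function names, the seed, the sample size of 500, and the 20% missing rate are all illustrative choices.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_covariates(x1, x2):
    """VIF of either covariate in a two-covariate linear model: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


def simulate_pair(n, rho, rng):
    """Draw n pairs of standard-normal covariates with population correlation rho."""
    x1, x2 = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x1.append(z1)
        x2.append(rho * z1 + math.sqrt(1 - rho * rho) * z2)
    return x1, x2


def drop_mcar(x1, x2, p_missing, rng):
    """Make x2 missing completely at random with probability p_missing,
    then keep only the complete cases (listwise deletion)."""
    kept = [(a, b) for a, b in zip(x1, x2) if rng.random() >= p_missing]
    return [a for a, _ in kept], [b for _, b in kept]


# Illustrative run: the seed, n = 500 and the 20% missing rate are assumptions.
rng = random.Random(2004)
for rho in (0.0, 0.5, 0.9):
    x1, x2 = simulate_pair(500, rho, rng)
    c1, c2 = drop_mcar(x1, x2, 0.2, rng)
    vif = vif_two_covariates(c1, c2)
    print(f"rho={rho:.1f}  complete cases={len(c1)}  "
          f"VIF={vif:.2f}  SE inflation={math.sqrt(vif):.2f}")
```

The square root of the VIF is the factor by which a coefficient's standard error grows relative to uncorrelated covariates. Listwise deletion leaves the VIF roughly unchanged but shrinks the sample, which is one way the two problems can compound in small samples.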