Author: 謝明潭 (Ming-tan Hsieh)
Title: 利用強迫通過觀察值的最小平方迴歸線診斷異常值 (Detecting Outliers Using the Least Square Regression Line That Is Forced Through an Observation)
Advisor: 吳榮彬 (Jung-Pin Wu)
Degree: Master's
Institution: Feng Chia University (逢甲大學)
Department: Graduate Institute of Statistics and Actuarial Science (統計與精算所)
Discipline: Mathematics and Statistics
Field: Statistics
Document type: Academic thesis
Publication year: 2007
Academic year of graduation: 95 (2006-07)
Language: Chinese
Pages: 60
Keywords (Chinese): 遮蔽效應 (masking effect), 最小平方法 (least squares method), 拔靴法 (bootstrap method), 角度 (angle), 異常值 (outlier), Cook's Distance
Keywords (English): Least Square Method, masking effect, Cook's Distance, outlier, angle, bootstrap method
Abstract: Cook's Distance is the diagnostic most widely used to detect outliers. Although it performs well when there is only a single outlier, with two or more outliers it is prone to misjudgment because of the masking effect. This thesis proposes a new method for diagnosing outliers. Using the least squares method together with the Lagrange multiplier method, we derive the least squares regression line forced through a given observation, and also the least squares regression line fitted with that observation deleted; we then compute the angle between the two lines, and this angle is used to judge whether the observation is an outlier. Because the sampling distribution of the angle is difficult to derive, we simulate it with the bootstrap method and thereby estimate the p-value of the observed angle. If the p-value is smaller than the preset type I error probability α, the observation is declared an outlier. The proposed method is compared with several traditional diagnostics (Cook's Distance, H, DFFITS, DFBETAS, and COVRATIO) through Monte Carlo simulation. Performance is measured by Positive and False Positive rates, where Positive is the proportion of true outliers that are not detected, and False Positive is the proportion of normal observations mistakenly identified as outliers.
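The two fitted lines and the angle statistic described in the abstract can be sketched in a few lines of code. This is a minimal illustration for simple linear regression with one predictor; the function names are ours, not the thesis's, and the closed-form slope of the constrained line follows from substituting the constraint y_k = a + b·x_k into the least squares objective (equivalent to the Lagrange multiplier derivation).

```python
import numpy as np

def forced_line(x, y, k):
    """LS line constrained to pass through (x[k], y[k]).
    With a = y_k - b*x_k substituted in, minimizing the SSE reduces to
    ordinary LS on the data shifted to the origin at observation k."""
    dx, dy = x - x[k], y - y[k]
    b = np.sum(dx * dy) / np.sum(dx * dx)
    a = y[k] - b * x[k]
    return a, b

def deleted_line(x, y, k):
    """Ordinary LS line fitted with observation k removed."""
    b, a = np.polyfit(np.delete(x, k), np.delete(y, k), 1)
    return a, b

def angle_stat(x, y, k):
    """Angle (radians) between the forced-through and deleted LS lines,
    via tan(theta) = |b1 - b2| / (1 + b1*b2)."""
    _, b1 = forced_line(x, y, k)
    _, b2 = deleted_line(x, y, k)
    return np.arctan(abs((b1 - b2) / (1.0 + b1 * b2)))
```

For an outlying observation, forcing the line through it pulls the fit away from the deleted-point line, so its angle is larger than that of a typical observation.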
Abstract (English): Cook's Distance is commonly used to detect outliers. Although its diagnostic performance is good when there is only one outlier, when there is more than one outlier it is prone to misjudgment because of the masking effect. This article proposes a new method for detecting outliers. We use the least squares method and the Lagrange multiplier method to derive the least squares regression line that is forced through an observation, and the least squares regression line fitted with that observation deleted. We then calculate the angle between the two lines; this angle is used to judge whether the observation is an outlier. Since the sampling distribution of the angle is not easy to derive, we use the bootstrap to simulate it and thereby estimate the p-value of the angle. If the p-value is smaller than the type I error probability α, the observation is identified as an outlier. The proposed technique is compared with several traditional diagnostic methods (Cook's Distance, H, DFFITS, DFBETAS, and COVRATIO) through Monte Carlo simulation. Positive and False Positive rates are used to evaluate the diagnostic methods: Positive is the proportion of true outliers that are not found, and False Positive is the proportion of good observations mistaken for outliers.
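The bootstrap step of the abstract can be illustrated as follows. The abstract does not specify the exact resampling design, so the scheme below (fit the line without point k, resample its residuals to generate outlier-free pseudo-samples, and recompute the angle at the same design point) is an illustrative assumption, not necessarily the thesis's algorithm.

```python
import numpy as np

def angle_stat(x, y, k):
    """Angle between the LS line forced through (x[k], y[k]) and the
    LS line fitted with observation k deleted."""
    dx, dy = x - x[k], y - y[k]
    b1 = np.sum(dx * dy) / np.sum(dx * dx)                 # forced-through slope
    b2 = np.polyfit(np.delete(x, k), np.delete(y, k), 1)[0]
    return np.arctan(abs((b1 - b2) / (1.0 + b1 * b2)))

def bootstrap_pvalue(x, y, k, B=999, rng=None):
    """Bootstrap p-value of the observed angle for observation k."""
    rng = rng or np.random.default_rng(0)
    obs = angle_stat(x, y, k)
    xs, ys = np.delete(x, k), np.delete(y, k)
    b, a = np.polyfit(xs, ys, 1)                 # null model fitted without point k
    resid = ys - (a + b * xs)
    exceed = 0
    for _ in range(B):
        # pseudo-sample under the null: fitted line plus resampled residuals
        yb = a + b * x + rng.choice(resid, size=len(x), replace=True)
        if angle_stat(x, yb, k) >= obs:
            exceed += 1
    return (exceed + 1) / (B + 1)                # +1 terms avoid a p-value of zero
```

An observation is then flagged as an outlier when the returned p-value falls below the chosen type I error probability α, mirroring the decision rule in the abstract.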
1 Introduction
2 Literature Review
2.1 Residual Analysis
2.2 The Hat Matrix (H)
2.3 Cook's D
2.4 DFFITS and DFBETAS
2.5 COVRATIO
3 Diagnosing Outliers Using the Angle
3.1 The Least Squares Regression Line Forced Through an Observation, L1
3.2 Solving for L1 with an Excel Template
3.3 The Least Squares Regression Line with an Observation Deleted, L2
3.4 The Estimator A of the Angle Between L1 and L2
3.5 Procedure for Diagnosing Outliers with the Statistic A
4 Simulation Study
4.1 Scenario Design
4.2 Program Design Overview
4.3 Results and Analysis
5 Empirical Examples
5.1 Pilot-Plant Data Set
5.2 Hawkins-Bradu-Kass Data
6 Conclusion
References
A Simulated Data
Aczel, A. D., and Sounderpandian, J. (2006), Complete Business Statistics, 6th ed., McGraw-Hill.
Belsley, D. A., Kuh, E., and Welsch, R. E. (1980), Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, Wiley, New York.
Chatterjee, S., and Hadi, A. S. (1986), Influential Observations, High Leverage Points, and Outliers in Linear Regression, Statistical Science, Vol. 1, No. 3, 379-393.
Cook, R. D. (1977), Detection of Influential Observation in Linear Regression, Technometrics, 19, 15-18.
Daniel, C., and Wood, F. S. (1971), Fitting Equations to Data: Computer Analysis of Multifactor Data, Wiley, New York.
Efron, B. (1979), Bootstrap Methods: Another Look at the Jackknife, Annals of Statistics, 7, 1-26.
Hadi, A. S., and Simonoff, J. S. (1993), Procedures for the Identification of Multiple Outliers in Linear Models, Journal of the American Statistical Association, 88, 1264-1272.
Hawkins, D. M., Bradu, D., and Kass, G. V. (1984), Location of Several Outliers in Multiple Regression Data Using Elemental Sets, Technometrics, 26, 197-208.
Kianifard, F., and Swallow, W. H. (1990), A Monte Carlo Comparison of Five Procedures for Identifying Outliers in Linear Regression, Communications in Statistics, Part A - Theory and Methods, 19, 1913-1938.
Martinez, W. L., and Martinez, A. R. (2002), Computational Statistics Handbook with MATLAB, Chapman & Hall/CRC.
Montgomery, D. C., Peck, E. A., and Vining, G. G. (2001), Introduction to Linear Regression Analysis, 3rd ed., Wiley, New York.
Neter, J., Kutner, M. H., Nachtsheim, C. J., and Wasserman, W. (1996), Applied Linear Regression Models, Irwin, Chicago, IL.
Rousseeuw, P. J., and Leroy, A. M. (1987), Robust Regression and Outlier Detection, Wiley, New York.
Rousseeuw, P. J., Daniels, B., and Leroy, A. (1984), Applying Robust Regression to Insurance, Insurance: Mathematics and Economics, 3, 67-72.
溫志平 (2000), Cook's Distance 的影響值診斷績效 [The Diagnostic Performance of Cook's Distance for Influential Observations], Master's thesis, Institute of Statistics, National Cheng Kung University.