Student: Chih-Hsuan Wu
Thesis title: Robust regression by self-updating process
Advisor: Ting-Li Chen
Keywords: robust regression; iterative process; mean-shift
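Robust regression as proposed by Huber (1973), together with the work that followed it, is now a common approach to regression analysis. Its iterative algorithm uses the regression line from the previous iteration to update the weights of the next weighted least-squares fit. The robust regression method proposed in this thesis is inspired by the self-updating process (SUP) clustering algorithm of Chen and Shiu (2007) and the mean-shift algorithm of Cheng (1995), and it reduces the influence of outliers on the regression. The method is also iterative. Distances between data points determine the weights of a local weighted least-squares fit, and the updated data then serve as the input data for the next step, which completes the self-updating process. Simulation studies show that the method performs well on three types of data: data with uniform noise, data with heavy-tailed errors, and data generated from multiple linear models. Finally, the method is applied to a real data set of Major League Baseball players' salaries.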
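The iteratively reweighted least-squares scheme described above for Huber's M-estimator can be sketched as follows. This is a minimal sketch under our own assumptions. It uses the Huber weight w(u) = min(1, c/|u|), the tuning constant c = 1.345, and a scale estimate based on the median absolute deviation (MAD). These are common textbook choices and are not taken from the thesis.

```python
import numpy as np

def huber_irls(X, y, c=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of regression coefficients via IRLS (a sketch).

    X: (n, d) design matrix; include a column of ones for an intercept.
    c: Huber tuning constant (1.345 is a common default, assumed here).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Start from the ordinary least-squares fit.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    for _ in range(max_iter):
        r = y - X @ beta
        # Robust scale: MAD of the residuals, rescaled for the normal model.
        s = np.median(np.abs(r - np.median(r))) / 0.6745
        if s == 0:
            break
        u = np.abs(r / s)
        # Huber weights: 1 for small residuals, c/|u| for large ones.
        w = np.where(u <= c, 1.0, c / np.maximum(u, 1e-12))
        sw = np.sqrt(w)
        # Weighted least squares with the updated weights.
        beta_new, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta
```

Each pass refits the line and then recomputes the weights from that fit. This is the weight-update loop described in the abstract. Points with large residuals receive weights below 1, so their pull on the next fit is bounded.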
Robust regression based on M-estimators has been studied since Huber (1973). A common algorithm performs weighted least squares in which the weights are iteratively updated according to the newly fitted line. In this thesis, we present an iterative process that reduces the effect of outliers. It extends the SUP (self-updating process) clustering algorithm (Chen and Shiu 2007) and mean-shift clustering (Cheng 1995). In each iteration, the process updates the weights and moves the data points onto a locally fitted line. We also provide several estimation protocols for use after the process has converged. Simulation studies show that the proposed method outperforms the traditional approach on several types of data: (i) data with uniform noise, (ii) data with heavy-tailed noise, and (iii) data from multiple linear models. We also discuss the convergence of the process and illustrate it with some simple examples. Finally, we analyze a real data set of MLB players’ salaries to demonstrate the method.
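The abstracts describe the proposed process only at a high level. Distances between points set the weights, a local weighted least-squares line is fitted, and the points are moved onto that line, with the moved points becoming the next iteration's data. The following is a hypothetical sketch of one such loop, not the thesis's algorithm. Several choices here are our own assumptions:

- a Gaussian kernel on distances in the joint (x, y) plane,
- the parameter names `bandwidth` and `n_iter`,
- keeping each x fixed and moving only its y value.

`bandwidth` and `n_iter` are not claimed to match the thesis's parameters r, T, and p.

```python
import numpy as np

def self_updating_regression(x, y, bandwidth=1.0, n_iter=10):
    """Hypothetical self-updating local regression in the plane (a sketch).

    Each iteration: weight every current point by a Gaussian kernel of its
    distance to point i (an assumed kernel), fit a weighted least-squares
    line, and move y_i onto that line at x_i. x stays fixed; the moved
    points replace the data for the next iteration.
    """
    x = np.asarray(x, dtype=float)
    z = np.asarray(y, dtype=float).copy()
    X = np.column_stack([np.ones_like(x), x])
    for _ in range(n_iter):
        # Pairwise Euclidean distances between current points (x_j, z_j).
        d = np.hypot(x[:, None] - x[None, :], z[:, None] - z[None, :])
        W = np.exp(-(d / bandwidth) ** 2)
        z_new = np.empty_like(z)
        for i in range(len(x)):
            sw = np.sqrt(W[i])
            # Local weighted least-squares line around point i.
            coef, *_ = np.linalg.lstsq(X * sw[:, None], z * sw, rcond=None)
            z_new[i] = coef[0] + coef[1] * x[i]
        # Self-updating step: the moved points become the new data.
        z = z_new
    return z
```

The distances mix the x and y scales, so standardizing both variables before running a sketch like this is a natural precaution. If the bandwidth is small relative to the spread of the outliers, an isolated outlier gets almost no weight from the inliers and stays close to where it started. The inliers, meanwhile, tend to gather onto a common line.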
Abstract
1 Introduction
1.1 Robust regression
1.2 Clustering by self-updating process
1.3 Mean-shift clustering
1.4 The concept of our method
2 Regression by SUP
2.1 Algorithm
2.2 Estimation
2.3 The effect of parameters
2.3.1 The parameter r
2.3.2 The parameter T
2.3.3 The parameter p
3 Strengths of the algorithm
3.1 Data with uniform noise
3.2 Heavy-tailed noise
3.3 Multiple linear models
4 Discussion about convergence
4.1 Blurring
4.2 Non-blurring
5 Real data
6 Discussion and future work
References
