Thesis title (English): Improving Performance of Prediction Model with Outlier Detection and Feature Selection
Keywords (English): Outlier Detection; Feature Selection; Multiple Linear Regression; Learning Performance Prediction
Identifying at-risk students early and accurately, so that teachers can intervene in time to improve students' learning performance, is the focus of much related research.
A blended course combines online and offline learning: unlike a traditional offline course, students can also learn through an online learning platform. In doing so, they leave many records of the learning process, such as homework grades, video viewing behavior, online activity frequency, and online test grades. Building on data mining and machine learning techniques, this paper collects students' learning activity data from a blended calculus course and uses multiple linear regression to predict students' final grades.
Related research points out that the accuracy of a prediction model is easily affected by outliers. This paper therefore uses the RANSAC algorithm as its outlier detection method to remove outliers from the data. To further improve the accuracy of the prediction model after the outliers are removed, this paper uses the t-test as its feature selection method, retaining only the key features that have a significant impact on the final grade.
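The two-stage process described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the data are synthetic, the helper names `ols_with_t` and `ransac_inliers` are hypothetical, the residual threshold and trial count are arbitrary, and the t-test is assumed to be the usual test on each regression coefficient, with |t| > 2.0 standing in for a 5% significance level.

```python
import math
import random

def ols_with_t(X, y):
    """Fit y ~ intercept + X by least squares; return (coefficients, t-statistics)."""
    A = [[1.0] + list(row) for row in X]
    n, k = len(A), len(A[0])
    # Augmented system [A^T A | I | A^T y]; Gauss-Jordan elimination
    # turns it into [I | (A^T A)^-1 | beta].
    M = [[sum(a[i] * a[j] for a in A) for j in range(k)]
         + [float(i == j) for j in range(k)]
         + [sum(a[i] * t for a, t in zip(A, y))]
         for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]  # a zero pivot means a singular (degenerate) design
        M[c] = [v / pivot for v in M[c]]
        for r in range(k):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    beta = [M[i][2 * k] for i in range(k)]
    if n <= k:
        return beta, None  # no residual degrees of freedom, so no t-test
    rss = sum((t - sum(b * v for b, v in zip(beta, a))) ** 2 for a, t in zip(A, y))
    s2 = rss / (n - k)  # residual variance estimate
    t_stats = [b / math.sqrt(s2 * M[i][k + i]) for i, b in enumerate(beta)]
    return beta, t_stats

def ransac_inliers(X, y, threshold, n_trials=500, seed=0):
    """Indices of the largest consensus set found by a basic RANSAC loop."""
    rng = random.Random(seed)
    n, k = len(X), len(X[0]) + 1  # k points determine one candidate model
    best = []
    for _ in range(n_trials):
        idx = rng.sample(range(n), k)
        try:
            beta, _ = ols_with_t([X[i] for i in idx], [y[i] for i in idx])
        except ZeroDivisionError:
            continue  # degenerate sample, try another
        inliers = [i for i in range(n)
                   if abs(y[i] - beta[0]
                          - sum(b * v for b, v in zip(beta[1:], X[i]))) <= threshold]
        if len(inliers) > len(best):
            best = inliers
    return best

# Synthetic stand-in for the course data (NOT the paper's dataset).
rng = random.Random(42)
names = ["homework", "video", "unrelated"]
X = [[rng.uniform(0, 100) for _ in names] for _ in range(66)]
y = [20 + 0.5 * r[0] + 0.3 * r[1] + rng.gauss(0, 1) for r in X]
for i in range(60, 66):
    y[i] -= 50  # six students far below the trend act as outliers

# Stage 1: keep only the RANSAC consensus set.
inliers = ransac_inliers(X, y, threshold=5.0)
X_in, y_in = [X[i] for i in inliers], [y[i] for i in inliers]

# Stage 2: keep features whose coefficient has |t| > 2.0 (assumed cutoff).
beta, t_stats = ols_with_t(X_in, y_in)
selected = [j for j in range(len(names)) if abs(t_stats[j + 1]) > 2.0]

# Final model: multiple linear regression on the retained rows and features.
beta_sel, _ = ols_with_t([[r[j] for j in selected] for r in X_in], y_in)
print(len(inliers), [names[j] for j in selected], beta_sel)
```

In practice a library routine such as scikit-learn's `RANSACRegressor` would normally replace the hand-written loop; the pure-Python version is used here only to keep the sketch dependency-free.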
According to the results, the outlier detection and feature selection process proposed in this paper reduces the prediction error from 15.516 to 4.571 points, an improvement of about 70 percent.
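As a quick check of the reported figure, the relative reduction in prediction error can be worked out directly from the two error values. The symbols $e_{\text{before}}$ and $e_{\text{after}}$ are labels introduced here for illustration, not the paper's notation.

```latex
\[
\frac{e_{\text{before}} - e_{\text{after}}}{e_{\text{before}}}
  = \frac{15.516 - 4.571}{15.516}
  = \frac{10.945}{15.516}
  \approx 0.705
\]
```

That is a reduction of roughly 70.5 percent, consistent with the "about 70 percent" stated in the abstract.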
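Abstract
List of Figures
List of Tables
1. Introduction
2. Literature Review
2.1 Random Sample Consensus (RANSAC)
2.2 Feature Selection
2.3 Learning Risk Prediction
2.4 Summary
3. Blended Calculus Course
4. Method
4.1 Data Collection
4.2 Data Preprocessing
4.2.1 Missing Value Imputation
4.2.2 Data Integration
4.2.3 Data Aggregation
4.3 Outlier Detection
4.4 Feature Selection
4.5 Residual Analysis
4.6 Regression Analysis
4.7 Cross-Validation
4.8 Data Standardization
5. Results and Discussion
5.1 Research Question 1
5.1.1 Process Without Outlier Removal: Results
5.1.2 Process With Outlier Removal: Results
5.1.3 Summary of Results
5.2 Research Question 2
5.2.1 Process With Feature Selection Added: Results
5.2.2 Summary of Results
5.3 Research Question 3
5.3.1 "week 1 ~ week 3" Dataset Process: Results
5.3.2 Summary of Results
6. Conclusion
References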
