Graduate student (English name): Cuo-Yen Lin
Thesis title (English): Comparison of Automatic Classification Methods
Advisor (English name): Sun Wu
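In this thesis, we chose two automatic classification methods, KNN and SVM. We first collect a large number of news web pages for analysis and, with the Vector Space Model as the main framework, parse the different pages to produce input files. These input files are then passed to KNN and SVM for testing, and finally precision is used to discuss the strengths, weaknesses, and differences of the two methods.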
With the rapid development of the Internet and the spread of online access, we can find whatever information we need on the Internet. So far, however, online information lacks an automatic system to categorize it, so it has to be categorized manually. In this thesis, I analyze the main automatic classification methods and test them on a categorization task, in order to show how effective they are and to examine their results.

In this thesis, we compare two automatic classification methods: KNN and SVM. First, I collect a large amount of data from news websites for analysis, with the Vector Space Model as the main framework. Then I parse the different pages to produce the input for testing KNN and SVM. Finally, precision is used to compare and contrast the two methods and to discuss their differences, advantages, and disadvantages.
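The pipeline outlined above (documents as vectors, nearest-neighbor voting, precision as the measure) can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the toy training set, the use of raw term frequencies as weights, and the choice of cosine similarity are all assumptions made for illustration, and the SVM side is left out.

```python
import math
from collections import Counter


def tf_vector(text):
    """Represent a document as a sparse term-frequency vector (assumed weighting)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train, text, k=3):
    """Label a document by majority vote among its k most similar training documents.

    train is a list of (document_text, label) pairs.
    """
    query = tf_vector(text)
    scored = sorted(
        ((cosine(query, tf_vector(doc)), label) for doc, label in train),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


def precision(predicted, actual, category):
    """Fraction of documents assigned to category that truly belong to it."""
    assigned = [a for p, a in zip(predicted, actual) if p == category]
    if not assigned:
        return 0.0
    return sum(1 for a in assigned if a == category) / len(assigned)


# Hypothetical toy training set standing in for parsed news pages.
TRAIN = [
    ("stock market shares fall", "finance"),
    ("bank raises interest rates", "finance"),
    ("market investors buy shares", "finance"),
    ("team wins championship game", "sports"),
    ("player scores in final game", "sports"),
    ("coach praises team player", "sports"),
]

if __name__ == "__main__":
    print(knn_predict(TRAIN, "investors sell shares in market"))
    print(knn_predict(TRAIN, "team player wins game"))
```

A weighted variant would let each neighbor vote with its similarity score instead of a count of one; an SVM would be trained and evaluated on the same vectors so that both methods can be compared with the same precision measure.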
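Abstract
Table of Contents
List of Figures and Tables
1 Introduction
1.1 Overview
1.2 Research Motivation
1.3 Thesis Organization
2 Literature Review
2.1 Automatic Document Classification
2.2 Feature Selection
2.3 Classification Rules
2.4 Choice of Classification Methods
3 System Overview
3.1 System Architecture
3.2 Document Pre-Processing
4 Classification Methods
4.1 Feature Selection
4.2 K-Nearest Neighbor
4.3 Weight K-Nearest Neighbors
4.4 Support Vector Machine
5 Experimental Results and Analysis
5.1 Experimental Data
5.2 Experimental Results for KNN and WKNN
5.3 Experimental Results for SVM
5.4 Analysis of the Experiments
6 Conclusion
6.1 Future Work
6.1.1 Other Languages
6.1.2 Repeated Classification
6.1.3 Hierarchical Classification
6.1.4 Over-Fitting Problem
6.2 Conclusion
7 References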
