Thesis Title (English): A Hybrid Classification Method for Conference Information
Advisor (English): Hei-Chia Wang
Keywords (English): text mining, feature selection, SVM, hybrid classification
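To help researchers quickly find suitable conference information, this study uses text mining techniques to filter and classify conference information, with the aim of collecting complete information and, through classification, letting users easily find conferences that match their interests. Traditional classification algorithms in the literature, such as Support Vector Machine, Decision Tree, and Naïve Bayes Classifier, were not designed specifically for academic conference information, so classifying with conventional text-based methods alone may cause misclassification. The purpose of this study is therefore to design a classification algorithm for academic conference information.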

Many researchers want to follow the latest research topics and exchange information with their peers. They search the Internet for academic conference information and choose some conferences to attend. Some websites provide part of this information, but most cannot help users find what they actually want to explore; moreover, manually filtering the retrieved conference information is laborious. Helping researchers find suitable conferences to attend within a large dataset is therefore an important problem.
To find suitable conference information efficiently for researchers, this study classifies conferences using text mining. Traditional classification algorithms in the literature, such as Decision Tree, Naïve Bayes Classifier, and Support Vector Machine, were not designed to classify conference information documents, so applying them to these academic documents may produce some incorrect results. The goal of this study is therefore to design a classification algorithm for conference information.
Conference information contains many technical terms and phrases that consist of two words, so this must be taken into account when analyzing the importance of terms. Moreover, the existing classification algorithms each have strengths and weaknesses, so a hybrid classification approach is adopted to integrate the traditional algorithms. We expect the new method, designed for conference information, to help researchers find suitable conferences efficiently and accurately.
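The two ideas above, counting two-word phrases as terms and combining several classifiers, can be illustrated with a minimal Python sketch. It is not the method developed in the thesis: the abstract does not state how term importance is computed or how the classifiers are combined, so simple majority voting is used here as a stand-in, and the function names `extract_terms` and `majority_vote` are hypothetical.

```python
import re
from collections import Counter


def extract_terms(text):
    """Return single words plus adjacent two-word phrases (bigrams).

    Assumption: lowercase alphabetic tokens only; the thesis's actual
    preprocessing (POS tagging, stemming) is not reproduced here.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


def majority_vote(labels):
    """Combine the labels predicted by several classifiers.

    Assumption: plain majority voting; ties go to the label that
    appears first in the input list.
    """
    counts = Counter(labels)
    best = max(counts.values())
    for label in labels:
        if counts[label] == best:
            return label


# Term frequencies for a short call-for-papers fragment.
term_counts = Counter(extract_terms("Text mining for data mining"))

# Hypothetical predictions from three classifiers for one document.
combined = majority_vote(["Data Mining", "Databases", "Data Mining"])
```

Here `term_counts` keeps "data mining" as a single term alongside the individual words, and `combined` is the label that at least two of the three classifiers agree on.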

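Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation and Objectives
1.3 Research Scope and Limitations
1.4 Research Process
1.5 Thesis Outline
Chapter 2 Literature Review
2.1 Information Retrieval
2.2 Natural Language Processing
2.2.1 Part-of-Speech Tagging
2.2.2 Stemming
2.3 Feature Selection
2.3.1 Document Frequency
2.4 Document Classification Methods
2.4.1 Naïve Bayes Classifier
2.4.2 Support Vector Machine (SVM)
2.4.3 Decision Tree
2.5 Conference Information Websites
2.5.1 All Conference
2.5.2 Conference Alert
2.5.3 DBWorld
2.6 Hybrid Classification
2.7 Summary
Chapter 3 Research Method
3.1 Research Framework
3.2 Data Collection and Preprocessing
3.3 Feature Selection for Training Data
3.4 Hybrid Classification Module
3.4.1 Test Data Filtering
3.4.2 Hybrid Classification
3.5 Summary
Chapter 4 System Implementation and Validation
4.1 System Implementation Environment
4.2 Experimental Design
4.2.1 Experimental Data Sources
4.2.2 Preprocessing Stage
4.2.3 Feature Selection
4.2.4 Classifier
4.2.5 Evaluation Metrics
4.3 Analysis of Experimental Results
4.3.1 Experiment 1: Comparison of Feature Selection Methods
4.3.2 Experiment 2: Comparison of the Hybrid Classification Method with Traditional Classification Methods
Chapter 5 Conclusions and Future Research Directions
5.1 Conclusions
5.2 Future Research Directions
References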

