Author (English): Yen-Hsun Chen
Thesis title (English): OWA Based Information Fusion Method with PCA Preprocessing for Dataset Classification
Advisor (English): Ching-Hsue Cheng
Keywords (English): Classification; FCM method; PCA method; OWA operator
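Information plays an extremely important role for enterprises: it gives managers effective support both in decision making and in formulating business strategy. However, in an era of information explosion, the volume of information has grown considerably compared with the past, and how to handle high-dimensional, highly complex information effectively is the focus of this study. The proposed method first uses the Principal Components Analysis (PCA) method to integrate the data. Next, it applies the Ordered Weighted Averaging (OWA) operator to fuse the multi-attribute data into a single integrated value. It then uses the Fuzzy C-Means (FCM) algorithm to cluster the integrated values and obtains the best situation parameter from the classification accuracy on the training data. Finally, the testing data are classified. Five datasets are used for verification: (1) Iris, (2) Lung cancer, (3) Wisconsin breast cancer, (4) SPA50A, and (5) SPA50B. The experimental results show that the proposed method effectively reduces the dimensionality and complexity of the data, and that the OWA operator assigns corresponding weights to the important principal components, so that the classification accuracy improves significantly over the listed studies.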
Information plays an important role in enterprises: it gives executive managers effective support in both decision making and business strategy formulation. However, the volume of information keeps growing, and how to handle high-dimensional, highly complex data is the key issue of this research.
Multi-attribute data usually has high dimensionality and high complexity. To address these problems, this research proposes a new information fusion method, briefly described as follows: (1) reduce the data dimensions with the PCA method; (2) calculate integrated values with the OWA operator; (3) cluster the data instances into specific groups with FCM and obtain the classification accuracy on the training data; (4) validate the classification accuracy on the testing data. Five datasets are adopted to verify the performance of the proposed method: Iris, Lung cancer, WBC, SPA50A, and SPA50B. The experimental results show that the classification accuracy rates of the proposed method clearly surpass those of the listed methods, and that the OWA operator can effectively assign corresponding weights to the important principal components, which helps enhance the performance of the proposed method.
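The four steps in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and several details are assumptions: the OWA weights are generated from a power quantifier with a hypothetical parameter `alpha` (standing in for the "situation parameter"), the OWA step sorts each instance's principal-component scores in descending order before weighting, and FCM is a plain textbook version run on the one-dimensional fused values. The function names `pca_scores`, `owa_weights`, `owa`, and `fcm_1d`, and the synthetic data, are illustrative only.

```python
import numpy as np

def pca_scores(X, k):
    """Project mean-centered data onto its first k principal components."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:k].T

def owa_weights(n, alpha):
    """Assumed weighting scheme: w_i = (i/n)^alpha - ((i-1)/n)^alpha."""
    i = np.arange(1, n + 1)
    return (i / n) ** alpha - ((i - 1) / n) ** alpha

def owa(values, weights):
    """OWA aggregation: sort each row in descending order, then weight."""
    ordered = -np.sort(-values, axis=1)
    return ordered @ weights

def fcm_1d(x, c, m=2.0, iters=100, seed=0):
    """Textbook fuzzy c-means on 1-D data; returns centers and memberships."""
    rng = np.random.default_rng(seed)
    u = rng.random((len(x), c))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(iters):
        um = u ** m
        centers = um.T @ x / um.sum(axis=0)
        # Small constant avoids division by zero when a point sits on a center.
        d = np.abs(x[:, None] - centers[None, :]) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)
    return centers, u

# Synthetic two-group data, used only to exercise the pipeline.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (20, 4)), rng.normal(6, 1, (20, 4))])
fused = owa(pca_scores(X, 2), owa_weights(2, alpha=0.5))  # steps (1) and (2)
centers, u = fcm_1d(fused, c=2)                           # step (3)
labels = u.argmax(axis=1)                                 # hard cluster labels
```

In a full pipeline, one would presumably repeat steps (2) and (3) over a grid of `alpha` values, keep the value giving the highest training accuracy, and then apply it to the testing data as in step (4).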
Abstract (Chinese)
Acknowledgements
Contents
List of Figures
List of Tables
List of Appendices
1. Introduction
1.1 Background and Motivation
1.2 Objective
1.3 Research Limitation
1.4 Thesis Organization
2. Background Knowledge
2.1 PCA method
2.2 OWA operator
2.2.1 Yager’s OWA
2.2.2 Fuller and Majlender’s OWA
2.3 Clustering
2.3.1 FCM method
2.3.2 FCM Algorithm
3. The Methodology
3.1 Proposed Method
3.2 Research Framework
3.3 The Algorithm of Proposed Method
4. Verifications and Comparisons
4.1 WBC dataset
4.2 Lung Cancer dataset
4.3 SPA50A and SPA50B datasets
5. Conclusion
References
Appendix A
