National Digital Library of Theses and Dissertations in Taiwan

Detailed Record

Author: Po-Cheng Wang (王博正)
Title: Automatic Attribute Clustering and Feature Selection Based on Genetic Algorithms (基於基因演算法之自動屬性分群與特徵選取)
Advisor: Tzung-Pei Hong (洪宗貝)
Degree: Master's
Institution: National Sun Yat-sen University
Department: Department of Computer Science and Engineering
Discipline: Engineering
Academic Field: Electrical and Computer Engineering
Document Type: Academic thesis
Publication Year: 2009
Graduation Academic Year: 97 (2008-2009)
Language: English
Pages: 83
Keywords (Chinese): 折減子集, k-means, 基因演算法, 特徵分群, 特徵選取
Keywords (English): k-means, reduct, genetic algorithms, feature clustering, feature selection
Abstract (Chinese, translated): Feature selection plays an important role in the pre-processing stage of data mining and machine learning. A good feature set not only yields high classification accuracy but also reduces the time needed for mining. When a training data set contains a large number of features, the required computation time is usually considerable. In this thesis, we propose three attribute-clustering methods based on genetic algorithms to serve as feature-selection techniques. The first method encodes each chromosome with positive integers, where each gene value indicates the cluster to which the corresponding feature belongs. A chromosome is evaluated by the classification accuracy of the feature sets composed from different clusters and by the balance of the number of features across clusters. The second method adopts a new fitness function that greatly reduces the time needed to scan the database. The third method changes the original chromosome encoding so that different genes store the centers of different clusters; this new encoding lets the genetic algorithm converge more quickly. Finally, experiments compare the proposed methods with the clustering results obtained by k-means. The three proposed algorithms achieve a trade-off between accuracy and computation time. Moreover, with conventional feature-selection techniques, if the values of the selected features cannot be obtained at inference time, the previously derived classification rules cannot be applied. Our algorithms handle such missing feature values effectively: a missing feature can be replaced by another attribute from the same cluster. The proposed feature-clustering methods are therefore more flexible than previous feature-selection techniques.
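The integer-encoding scheme described above can be sketched as follows. The attribute count, cluster count, and the balance formula are hypothetical illustrations chosen for the sketch; the thesis's actual fitness also involves classification accuracy, which is omitted here.

```python
import random

NUM_ATTRIBUTES = 9  # hypothetical number of features in the training data
NUM_CLUSTERS = 3    # hypothetical number of attribute clusters

def random_chromosome():
    """Integer encoding: gene i holds the cluster that attribute i belongs to."""
    return [random.randrange(NUM_CLUSTERS) for _ in range(NUM_ATTRIBUTES)]

def cluster_balance(chromosome):
    """A simple balance score: 1.0 when every cluster holds the same number
    of attributes, smaller as the partition grows lopsided. This is one
    plausible form of a balance term, not the thesis's exact formula."""
    sizes = [chromosome.count(k) for k in range(NUM_CLUSTERS)]
    ideal = NUM_ATTRIBUTES / NUM_CLUSTERS
    # normalized total deviation of cluster sizes from the ideal size
    deviation = sum(abs(s - ideal) for s in sizes) / (2 * NUM_ATTRIBUTES)
    return 1.0 - deviation

chrom = random_chromosome()
print(chrom, round(cluster_balance(chrom), 3))
```

A full GA would then select, cross over, and mutate such chromosomes, scoring each by accuracy plus a balance term of this kind.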
Abstract (English): Feature selection is an important pre-processing step in data mining and machine learning. A good set of features can not only improve the accuracy of classification but also reduce the time needed to derive rules, which matters especially when the number of attributes in a given training data set is very large. This thesis thus proposes three GA-based clustering methods for attribute clustering and feature selection. In the first method, each feasible clustering result is encoded into a chromosome of positive integers, with one gene per attribute; the value of a gene represents the cluster to which the attribute belongs. The fitness of each individual is evaluated using both the average accuracy of attribute substitutions within clusters and the cluster balance. The second method extends the first to improve time performance: a new fitness function based on both accuracy and attribute dependency is proposed, which reduces the time spent scanning the database. The third approach uses another encoding method for chromosomes and achieves faster convergence and better results than the second. Finally, experimental comparisons with the k-means clustering approach and with all combinations of attributes show that the proposed approaches achieve a good trade-off between accuracy and time complexity. Moreover, after conventional feature selection, rules derived from only the selected features may be hard to apply if some values of those features cannot be obtained in the current environment. This problem is easily solved in the proposed approaches: an attribute with missing values can be replaced by another attribute in the same cluster. The proposed approaches thus provide flexible alternatives for feature selection.
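The flexibility argued at the end of the abstract — substituting a same-cluster attribute when a selected feature's value is missing at inference time — can be illustrated with a small sketch. The cluster contents, feature names, and record below are invented for the example.

```python
# Two hypothetical attribute clusters and one selected representative per cluster.
clusters = {"c1": ["age", "years_employed"], "c2": ["income", "loan_amount"]}
selected = {"c1": "age", "c2": "income"}

def usable_feature(cluster_id, record):
    """Return the selected feature for the cluster, falling back to any
    same-cluster attribute whose value is present in the record."""
    preferred = selected[cluster_id]
    if record.get(preferred) is not None:
        return preferred
    for alt in clusters[cluster_id]:
        if record.get(alt) is not None:
            return alt
    return None  # no attribute in this cluster is available

record = {"age": None, "years_employed": 12, "income": 50000}
print(usable_feature("c1", record))  # falls back to "years_employed"
```

With a conventional feature-selection result, a missing "age" value would make the derived rules unusable; here the cluster structure supplies a substitute.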
Chinese Abstract .................................................................................................... I
English Abstract .................................................................................................. III
CHAPTER 1 Introduction ................................................................................. 1
1.1 Background and Motivation ........................................................................................ 1
1.2 Contributions................................................................................................................ 4
1.3 Thesis Organization ..................................................................................................... 5
CHAPTER 2 Review of Related Works ............................................................ 6
2.1 The Concept of Clustering ........................................................................................... 6
2.2 Reduct .......................................................................................................................... 7
2.3 Genetic Algorithms ...................................................................................................... 9
CHAPTER 3 GA-Based Clustering with Accuracy Measurement .............. 12
3.1 Notation...................................................................................................................... 12
3.2 Chromosome Representation ..................................................................................... 13
3.3 Initial Population ........................................................................................................ 14
3.4 Fitness and Selection.................................................................................................. 14
3.5 Genetic Operators ...................................................................................................... 18
3.6 The Proposed Algorithm ............................................................................................ 21
3.7 An Example ................................................................................................................ 23
CHAPTER 4 GA-Based Clustering with Similarity ..................................... 29
4.1 Notation...................................................................................................................... 29
4.2 Fitness Function ......................................................................................................... 30
4.3 The Proposed Algorithm ............................................................................................ 34
4.4 An Example ................................................................................................................ 36
CHAPTER 5 GA-Based Clustering with a Novel Encoding Scheme ............ 42
5.1 Notation...................................................................................................................... 42
5.2 Chromosome Representation ..................................................................................... 43
5.3 Initial Population ........................................................................................................ 44
5.4 Fitness and Selection.................................................................................................. 45
5.5 Genetic Operators ...................................................................................................... 47
5.6 The Proposed Algorithm ............................................................................................ 47
5.7 An Example ................................................................................................................ 49
CHAPTER 6 Experimental Results ................................................................ 55
6.1 Experimental Results of Method (I) .......................................................................... 55
6.2 Experimental Results of Method (II) ......................................................................... 63
6.3 Experimental Results of Method (III) ........................................................................ 65
CHAPTER 7 Conclusions and Future Works ................................................ 68
References ........................................................................................................ 69
[1] Q. A. Al-Radaideh, M. N. Sulaiman, M. H. Selamat and H. Ibrahim, “Approximate reduct computation by rough sets based attribute weighting,” IEEE International Conference on Granular Computing, Vol. 2, pp. 383-386, 2005.
[2] A. L. Blum and R. L. Rivest, “Training a 3-node neural network is NP-complete,” Neural Networks, Vol. 5, pp. 117-127, 1992.
[3] K. J. Cios, L. A. Kurgan, R. Tadeusiewicz, M. Ogiela and L. S. Goodenday, “Knowledge discovery approach to automated cardiac SPECT diagnosis,” Artificial Intelligence in Medicine, Vol. 23, No. 2, pp. 149-169, 2001.
[4] D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, 1989.
[5] J. J. Grefenstette, “Optimization of control parameters for genetic algorithms,” IEEE Transactions on Systems, Man, and Cybernetics, Vol. 16, No. 1, pp. 122-128, 1986.
[6] I. Graham and P. L. Jones, Expert Systems: Knowledge, Uncertainty and Decision, Chapman and Hall, pp. 117-158, 1988.
[7] K. Gao, M. Liu, K. Chen, N. Zhou and J. Chen, “Sampling-based tasks scheduling in dynamic grid environment,” The Fifth WSEAS International Conference on Simulation, Modeling and Optimization, pp.25-30, 2005.
[8] J. H. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, 1975.
[9] A. Homaifar, S. Guan, and G. E. Liepins, “A new approach on the traveling salesman problem by genetic algorithms,” The Fifth International Conference on Genetic Algorithms, 1993.
[10] J. Han, X. Hu and T. Y. Lin, “Feature selection based on rough set and information entropy,” The IEEE International Conference on Granular Computing, Vol. 1, pp. 153-158, 2005.
[11] T. P. Hong and Y. L. Liou, “Attribute clustering in high dimensional feature spaces,” The International Conference on Machine Learning and Cybernetics, pp. 19-22, 2007.
[12] Y. Li, S. C. K. Shiu and S. K. Pal, “Combining feature reduction and case selection in building CBR classifiers,” IEEE Transactions on Knowledge and Data Engineering, Vol. 18, No. 3, pp. 415- 429, 2006.
[13] M. Mitchell, An Introduction to Genetic Algorithms, MIT press, 1996.
[14] Z. Michalewicz, “Evolutionary computation: practical issues,” International Conference on Evolutionary Computation, pp. 30-39, 1996.
[15] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, Springer-Verlag, 1994.
[16] J. MacQueen, “Some methods for classification and analysis of multivariate observations,” The Fifth Berkeley Symposium on Mathematical Statistics and Probability, pp. 281-297, 1967.
[17] Z. Pawlak, “Rough sets,” International Journal of Computer and Information Sciences, Vol. 11, No. 1, pp. 341-356, 1982.
[18] Z. Pawlak, “Why rough sets?” The Fifth IEEE International Conference on Fuzzy Systems, Vol. 2, pp. 738-743, 1996.
[19] E. Sanchez et al., Genetic Algorithms and Fuzzy Logic Systems: Soft Computing Perspectives (Advances in Fuzzy Systems - Applications and Theory, Vol. 7), World Scientific, 1997.
[20] S. Z. Selim and M. A. Ismail, “K-means type algorithms: A generalized convergence theorem and characterization of local optimality,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 6, pp. 81-87, 1984.
[21] A. Skowron and C. Rauszer, “The discernibility matrices and functions in information systems,” Fundamenta Informaticae, Vol. 15, pp. 331-362, 1991.
[22] H. Q. Sun and Z. Xiong, “Finding minimal reducts from incomplete information systems,” The Second International Conference on Machine Learning and Cybernetics, Vol. 1, pp. 350-354, 2003.
[23] J. Wroblewski, “Finding minimal reducts using genetic algorithms,” The Second Annual Joint Conference on Information Sciences, pp. 186-189, 1995.