National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

Author: Yu-Rong Cheng (鄭俞榮)
Thesis title: Metaheuristic-Based Possibilistic Fuzzy k-modes Algorithms for Categorical Data Clustering (萬用演算法為基礎之可能性模糊k-眾數演算法於類別型資料分群之研究)
Advisor: Ren-Jieh Kuo (郭人介)
Oral defense committee: Chao Ou-Yang (歐陽超), Kung-Jeng Wang (王孔政)
Oral defense date: 2019-05-27
Degree: Master's
Institution: National Taiwan University of Science and Technology
Department: Department of Industrial Management
Discipline: Business and Management
Field: Other Business and Management
Thesis type: Academic thesis
Publication year: 2019
Graduation academic year: 107 (ROC calendar)
Language: English
Pages: 87
Keywords: Clustering analysis; Categorical data; Metaheuristic; Genetic algorithm; Particle swarm optimization algorithm; Sine cosine algorithm; Fuzzy clustering; Fuzzy k-modes algorithm
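Abstract (translated from the Chinese):
In an era in which smart devices and technology applications are widespread, large volumes of data are recorded and collected conveniently and quickly. How to process and analyze these data and extract valuable information from them has therefore become a highly regarded issue. In data mining, clustering analysis is a very important technique; however, a suitable method should be chosen according to the type of data at hand.
This study addresses the clustering of categorical data. It combines the possibilistic concept with the fuzzy k-modes (FKM) algorithm to propose a possibilistic fuzzy k-modes (PFKM) algorithm that reduces the interference of outliers in a dataset and thereby improves the clustering result. Three metaheuristics, namely the genetic algorithm, particle swarm optimization, and the sine cosine algorithm, are further applied to optimize clustering performance, yielding three clustering algorithms: the genetic-algorithm-based PFKM (GA-PFKM), the particle-swarm-optimization-based PFKM (PSO-PFKM), and the sine-cosine-algorithm-based PFKM (SCA-PFKM).
The experiments use eight categorical datasets. Each proposed algorithm is used for clustering and is compared with the FKM algorithm on two evaluation measures, sum-of-squared error (SSE) and accuracy. The experimental results confirm that PSO-PFKM and SCA-PFKM outperform the other algorithms on most of the datasets. In addition, a case study further analyzes the clustering result for the Breast cancer dataset; it shows that the higher the category values of the three features normal nucleoli, bare nuclei, and clump thickness, the higher the risk of breast cancer.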
Abstract (English):
Smart devices and technology applications are now used widely in many fields, and an enormous amount of information is recorded and collected rapidly. Analyzing these data and extracting valuable information from them has therefore become a crucial issue. Clustering analysis plays an important role in addressing it. However, because data come in different types, an approach appropriate to the data type should be chosen.
This study focuses on categorical data. A possibilistic fuzzy k-modes (PFKM) algorithm is proposed by combining the possibilistic concept with the fuzzy k-modes (FKM) algorithm in order to reduce the effect of outliers and improve the clustering result. In addition, three metaheuristics, namely the genetic algorithm (GA), particle swarm optimization (PSO), and the sine cosine algorithm (SCA), are applied to enhance clustering performance. Three clustering algorithms are therefore proposed: GA-PFKM, PSO-PFKM, and SCA-PFKM.
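To make the idea of combining fuzzy memberships with possibilistic typicalities concrete, here is a minimal Python sketch. It is not taken from the thesis: it assumes the simple matching dissimilarity and the membership formula of the fuzzy k-modes algorithm (Huang and Ng, 1999) and borrows the typicality formula of possibilistic fuzzy c-means (Pal et al., 2005), with the squared Euclidean distance swapped for the matching dissimilarity. The function names, the parameters `alpha`, `eta`, `b`, and `gammas`, the equal split of membership among identical modes, and the toy records are all illustrative assumptions.

```python
from typing import List, Sequence


def matching_dissimilarity(x: Sequence[str], mode: Sequence[str]) -> int:
    """Count the attributes on which a record and a cluster mode disagree."""
    return sum(1 for a, b in zip(x, mode) if a != b)


def fuzzy_memberships(x: Sequence[str], modes: Sequence[Sequence[str]],
                      alpha: float = 2.0) -> List[float]:
    """Fuzzy k-modes style memberships of one record; they sum to 1.

    alpha > 1 is the fuzziness exponent (assumed default of 2).
    """
    d = [matching_dissimilarity(x, z) for z in modes]
    zeros = [i for i, v in enumerate(d) if v == 0]
    if zeros:
        # The record coincides with at least one mode: split full membership
        # among those modes (a choice made for this sketch) and give 0 elsewhere.
        return [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(d))]
    p = 1.0 / (alpha - 1.0)
    return [1.0 / sum((di / dj) ** p for dj in d) for di in d]


def typicalities(x: Sequence[str], modes: Sequence[Sequence[str]],
                 gammas: Sequence[float], b: float = 1.0,
                 eta: float = 2.0) -> List[float]:
    """Possibilistic typicalities of one record; they need not sum to 1.

    Assumed form: t_i = 1 / (1 + (b * d_i / gamma_i) ** (1 / (eta - 1))).
    A record far from every mode gets low typicality everywhere, which is
    what limits the pull of outliers when the modes are updated.
    """
    p = 1.0 / (eta - 1.0)
    return [1.0 / (1.0 + (b * matching_dissimilarity(x, z) / g) ** p)
            for z, g in zip(modes, gammas)]


if __name__ == "__main__":
    record = ("red", "small", "round")
    modes = [("red", "small", "square"), ("blue", "large", "square")]
    u = fuzzy_memberships(record, modes)
    t = typicalities(record, modes, gammas=[1.0, 1.0])
    print([round(v, 3) for v in u])  # [0.75, 0.25]
    print([round(v, 3) for v in t])  # [0.5, 0.25]
```

In this toy case the record differs from the first mode on one attribute and from the second on three, so its membership is split 0.75 to 0.25, while the typicalities are computed per cluster without the sum-to-one constraint.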
The proposed algorithms are applied to cluster eight categorical datasets. Their performance is compared with that of the classical FKM algorithm using two indexes, namely sum-of-squared error (SSE) and accuracy. The experimental results indicate that the PSO-PFKM and SCA-PFKM algorithms achieve better performance than the other algorithms on most of the datasets. Furthermore, this study analyzes the clustering result for the breast cancer dataset in more detail. The analysis reveals that cases with higher values of normal nucleoli, bare nuclei, and clump thickness have a higher risk of breast cancer.
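The two evaluation indexes can be illustrated with a short Python sketch. The abstract does not give their exact definitions, so the ones below are assumptions: accuracy is computed by labeling each cluster with its majority class and counting the matching records, and SSE is computed as the sum of squared simple-matching dissimilarities between each record and the mode of its assigned cluster. The function names and the toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def clustering_accuracy(labels_true: Sequence[Hashable],
                        labels_pred: Sequence[Hashable]) -> float:
    """Fraction of records whose class is the majority class of their cluster."""
    if not labels_true or len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must be non-empty and equally long")
    by_cluster: Dict[Hashable, Counter] = {}
    for truth, cluster in zip(labels_true, labels_pred):
        by_cluster.setdefault(cluster, Counter())[truth] += 1
    correct = sum(max(counts.values()) for counts in by_cluster.values())
    return correct / len(labels_true)


def categorical_sse(records: Sequence[Sequence[str]],
                    modes: Sequence[Sequence[str]],
                    assignment: Sequence[int]) -> int:
    """Sum of squared mismatch counts between records and their cluster modes."""
    total = 0
    for record, cluster in zip(records, assignment):
        mismatches = sum(1 for a, b in zip(record, modes[cluster]) if a != b)
        total += mismatches ** 2
    return total


if __name__ == "__main__":
    truth = ["benign", "benign", "malignant", "malignant", "benign"]
    pred = [0, 0, 1, 1, 1]
    print(clustering_accuracy(truth, pred))  # 0.8

    records = [("a", "x"), ("a", "y"), ("b", "y")]
    modes = [("a", "x"), ("b", "y")]
    print(categorical_sse(records, modes, [0, 0, 1]))  # 1
```

Lower SSE indicates tighter clusters and higher accuracy indicates better agreement with the known classes, which is why the two indexes are read in opposite directions when algorithms are compared.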
Abstract (Chinese)
ABSTRACT
Acknowledgments
CONTENTS
LIST OF TABLES
LIST OF FIGURES
CHAPTER 1 INTRODUCTION
1.1 Research Background
1.2 Research Objectives
1.3 Research Scope and Constraints
1.4 Research Organization
CHAPTER 2 LITERATURE REVIEW
2.1 Categorical Clustering Analysis
2.2 Fuzzy k-modes Algorithm
2.3 Possibilistic Fuzzy c-means Algorithm
2.4 Metaheuristic
2.4.1 Genetic Algorithm
2.4.2 Particle Swarm Optimization Algorithm
2.4.3 Sine Cosine Algorithm
CHAPTER 3 RESEARCH METHODOLOGY
3.1 Methodology Framework
3.2 Data Preprocessing
3.3 Objective Function
3.4 Possibilistic Fuzzy k-modes Algorithm (PFKM)
3.5 Metaheuristic-based PFKM Algorithm
3.5.1 GA-PFKM Algorithm
3.5.2 PSO-PFKM Algorithm
3.5.3 SCA-PFKM Algorithm
CHAPTER 4 EXPERIMENTAL RESULTS
4.1 Datasets Collection
4.2 Parameters Setting
4.3 Performance Measurement
4.4 Experimental Result and Statistical Hypothesis Testing
4.4.1 Sum of Square Error (SSE)
4.4.2 Accuracy (AC)
4.4.3 Computational Time and Convergence History
CHAPTER 5 CASE STUDY
5.1 Background
5.2 Research Framework
5.3 Data Collection
5.4 Clustering and Analysis
5.5 Analysis Results
CHAPTER 6 CONCLUSIONS AND FUTURE RESEARCH
6.1 Conclusions
6.2 Contributions
6.3 Future Research
REFERENCES
APPENDIX