National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

Detailed record

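Author: 戴宇宏
Author (English): Yu-Hong Dai
Title (Chinese): 應用基因演算法結合SVM與FCM之漸進式分群
Title (English): Incremental clustering with GA, SVM, and FCM methods
Advisor: 邱登裕
Advisor (English): Deng-Yiv Chiu
Degree: Master's
Institution: 中華大學 (Chung Hua University)
Department: Department of Information Management (資訊管理學系)
Field: Computing
Subfield: General computing
Thesis type: Academic thesis
Year of publication: 2008
Graduation academic year: 96 (ROC calendar)
Language: Chinese
Number of pages: 71
Keywords (Chinese): 漸進式分群方法; 基因演算法; 模糊分群演算法; 支持向量機
Keywords (English): Incremental clustering methods; genetic algorithms; fuzzy clustering algorithms; SVM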
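In an age of information explosion, the large and rapidly growing volume of documents has become difficult to manage and analyze, which leads to information overload. Finding useful information correctly and efficiently in large collections is therefore important, and clustering is one of the main techniques used to discover the features of data and the relationships among them.
This study proposes a method that combines a genetic-algorithm-based support vector machine (SVM) classifier with fuzzy clustering. The GA-based SVM classification model assigns incoming documents to existing classes, and the GA-based fuzzy clustering module clusters the documents that cannot be assigned to any existing class. First, the CKIP Chinese word segmentation system from Academia Sinica segments the Chinese documents, and the required feature terms are selected. Next, a genetic algorithm (GA) picks a suitable combination of feature terms to train an SVM model on documents from the existing classes, and the test documents that belong to the existing classes are classified. The documents not assigned to an existing class are then clustered: a genetic algorithm optimizes the number of clusters and selects the best cluster centers for fuzzy c-means (FCM). Finally, precision, recall, and F-measure are used to evaluate the efficiency of the approach, and macro-average and micro-average are used to evaluate classification accuracy.
The experimental results show that the GA-SVM method effectively improves classification performance, and that clustering with the GA-FCM module yields a significantly better cluster structure.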
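The evaluation measures named above can be illustrated with a short sketch. It is not the thesis's own code: the function names and the toy labels are assumptions made for illustration, and F-measure is taken to be the balanced F1 score. Macro-averaging computes precision, recall, and F-measure per class and then averages them, while micro-averaging pools the per-class counts first.

```python
from collections import Counter

def per_class_counts(y_true, y_pred):
    """Return {label: (tp, fp, fn)} for every label seen in either list."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the document was not in p
            fn[t] += 1  # true class t was missed
    labels = sorted(set(y_true) | set(y_pred))
    return {c: (tp[c], fp[c], fn[c]) for c in labels}

def prf(tp, fp, fn):
    """Precision, recall, and F1 from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

def macro_average(counts):
    """Average the per-class scores, giving every class equal weight."""
    scores = [prf(*c) for c in counts.values()]
    return tuple(sum(s[i] for s in scores) / len(scores) for i in range(3))

def micro_average(counts):
    """Pool the counts over all classes, then compute the scores once."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return prf(tp, fp, fn)

# Toy labels (hypothetical, not data from the thesis).
y_true = ["a", "a", "a", "a", "b", "b"]
y_pred = ["a", "a", "a", "b", "b", "a"]
counts = per_class_counts(y_true, y_pred)
macro = macro_average(counts)
micro = micro_average(counts)
print("macro P/R/F:", macro)
print("micro P/R/F:", micro)
```

Macro-averaging treats small and large classes alike, whereas micro-averaging is dominated by the classes with the most documents, so reporting both gives a fuller picture when class sizes differ.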
With the explosion of information, documents have become very difficult to manage, and finding useful information efficiently in large collections is very important. Clustering is a technique that finds the characteristics of information and the relationships within it, which helps in managing documents.
This study proposes a method that combines SVM classification and fuzzy clustering, both based on a genetic algorithm. The GA-based SVM classifier checks whether each incoming document belongs to one of the existing classes, and the GA-based fuzzy clustering method clusters the documents that remain unclassified. First, the CKIP system segments the Chinese documents so that keywords can be extracted. A genetic algorithm then selects the appropriate terms to train an SVM model of the existing classes, and the model classifies each incoming document to determine whether it belongs to one of them. A genetic algorithm is used again to select the best number of clusters and the best cluster centroids. Finally, precision, recall, and F-measure are used to measure efficiency, and macro-average and micro-average are used to measure accuracy.
The empirical results show that the proposed method improves classification effectiveness and that GA-FCM significantly outperforms other clustering methods.
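The fuzzy c-means step that GA-FCM builds on can be sketched as follows. This is a minimal illustration of the standard FCM update rules only, under several assumptions: the number of clusters `c` is fixed by hand, memberships start from random values, and the fuzzifier is `m = 2`. The genetic search over the number of clusters and the cluster centers that the abstract describes is not shown, and the function name, parameters, and toy points are hypothetical.

```python
import math
import random

def euclid(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means; returns (centers, membership matrix u)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Start from random memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append([
                sum(w[i] * points[i][d] for i in range(n)) / total
                for d in range(dim)
            ])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        power = 2.0 / (m - 1.0)
        new_u = []
        for x in points:
            dists = [max(euclid(x, ctr), 1e-12) for ctr in centers]
            new_u.append([
                1.0 / sum((dj / dk) ** power for dk in dists)
                for dj in dists
            ])
        change = max(
            abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(c)
        )
        u = new_u
        if change < tol:  # memberships have stopped moving
            break
    return centers, u

# Toy 2-D data with two well-separated groups (hypothetical).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, u = fuzzy_c_means(points, c=2)
# Harden the fuzzy memberships by taking the strongest cluster per point.
labels = [max(range(2), key=row.__getitem__) for row in u]
print(labels)
```

Because every point keeps a degree of membership in every cluster, a document that sits between two topics is not forced into one of them until the memberships are hardened in the last step.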
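Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
Section 1 Research Motivation
Section 2 Research Objectives
Section 3 Thesis Structure
Chapter 2 Literature Review
Section 1 Knowledge Mining
Section 2 Classification and Clustering Techniques
Section 3 Genetic Algorithms
Chapter 3 Research Method
Section 1 Document Preprocessing
Section 2 GA-SVM Model
Section 3 GA-FCM Module
Section 4 Performance Evaluation
Chapter 4 Experiments
Section 1 Sources of Experimental Data
Section 2 Experimental Design
Section 3 Evaluation and Discussion
Chapter 5 Conclusions and Future Work
Section 1 Conclusions
Section 2 Future Research Directions
References
English References
Chinese References
Appendix 1: Part-of-Speech Reference Table for Feature Terms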