Author (English): Wei-chen Hsu
Thesis title (English): Applications of Clustering Technologies on Fuzzy Data Mining
Advisors (English): Chienwen Wu; R. J. Kuo
Keywords (English): data mining; fuzzy function; neural network; fuzzy inference; self-organizing maps network
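Data mining is the technique of filtering and exploring databases that store large amounts of data in order to discover newly meaningful associations, characteristics, and trends. With the recent spread of the Internet, collecting data has become ever more convenient, which makes data mining all the more important. Because the volume of data keeps growing with the arrival of the electronic age, extracting useful information from databases has become more difficult, and data mining is the best option in this situation. However, the data mining process requires the participation of experts from different domains, and expert judgments are often accompanied by subjective opinions that easily lead to misjudgments. Such misjudgments can cause major losses in an enterprise's decision-making process.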
It is a modern trend for enterprises to use computers in every business process. As a result, a huge amount of enterprise data is collected by computers. These data must be analyzed effectively so that useful enterprise knowledge can be retrieved and utilized, but earlier technologies cannot serve this purpose. Data mining is a new technology that aims to transform raw data into valuable information. In the process, different domain experts are needed to provide different information. In most fuzzy data mining research, the fuzzy membership functions must be provided by domain experts.
In this thesis, three approaches are proposed to assist in deriving the fuzzy membership functions. We use a two-dimensional SOM (self-organizing map) neural network, a one-dimensional SOM network, and a combination of the SOM network and the K-means method to determine the appropriate number of groups for each data attribute. Once the appropriate number of groups and their centers have been determined, the centers are used to construct triangular fuzzy membership functions. Next, a fuzzy association rule algorithm is used to retrieve fuzzy customer behavior knowledge. In the process, support and confidence values are used to filter out noise and unimportant attributes. Experiments are performed to evaluate all three approaches: raw data from a library are examined and fuzzy customer behavior knowledge is retrieved.
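The step from group centers to triangular membership functions, and the support and confidence filtering that follows, can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes that each triangle peaks at its own center and falls to zero at the neighbouring centers, that the first and last functions are flat shoulders, and that the minimum operator combines memberships within a record. The function names and the sample values are hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def triangular_memberships(centers: Sequence[float]) -> List[Callable[[float], float]]:
    """Build one membership function per distinct center (sorted ascending).

    Assumption: a triangle peaks at its center and reaches 0 at the
    neighbouring centers; the outermost functions are flat shoulders.
    """
    cs = sorted(set(centers))
    if len(cs) < 2:
        raise ValueError("need at least two distinct centers")

    def make(i: int) -> Callable[[float], float]:
        left = cs[i - 1] if i > 0 else None
        peak = cs[i]
        right = cs[i + 1] if i < len(cs) - 1 else None

        def mu(x: float) -> float:
            if x == peak:
                return 1.0
            if x < peak:
                if left is None:  # left shoulder of the first group
                    return 1.0
                return max(0.0, (x - left) / (peak - left))
            if right is None:  # right shoulder of the last group
                return 1.0
            return max(0.0, (right - x) / (right - peak))

        return mu

    return [make(i) for i in range(len(cs))]


def fuzzy_support(records: Sequence[Dict[str, float]], itemset: Sequence[str]) -> float:
    """Average, over all records, of the minimum membership among the items."""
    total = sum(min(rec[item] for item in itemset) for rec in records)
    return total / len(records)


def fuzzy_confidence(records: Sequence[Dict[str, float]],
                     antecedent: Sequence[str],
                     consequent: Sequence[str]) -> float:
    """Support of antecedent and consequent together, divided by antecedent support."""
    s_a = fuzzy_support(records, antecedent)
    if s_a == 0.0:
        return 0.0
    return fuzzy_support(records, list(antecedent) + list(consequent)) / s_a


# Hypothetical centers for one attribute, e.g. as produced by a clustering step.
low, mid, high = triangular_memberships([2.0, 10.0, 20.0])

# Hypothetical fuzzified records: fuzzy item label -> membership degree.
records = [
    {"A.low": 1.0, "B.high": 0.5},
    {"A.low": 0.5, "B.high": 0.0},
]
support_a = fuzzy_support(records, ["A.low"])
confidence_ab = fuzzy_confidence(records, ["A.low"], ["B.high"])
```

Rules whose support or confidence falls below user-chosen thresholds would then be discarded. The thresholds themselves are not specified here.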
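Abstract
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Research Limitations
1.4 Research Steps and Workflow
1.5 Overview of the Chapters
Chapter 2 Literature Review
2.1 Basic Concepts of Data Mining
2.1.1 Applications of Neural Networks in Data Mining
2.1.2 Applications of Statistics in Data Mining
2.1.3 Itemset-Based Data Mining Applications
2.2 Fuzzy Theory
2.3 Clustering Methods
2.3.1 K-means Clustering
2.3.2 Self-Organizing Map Network Clustering
2.3.3 Two-Stage Clustering
2.3.4 Other Clustering Methods
Chapter 3 Research Method
3.1 Data Collection
3.2 Automated Clustering Methods
3.2.1 SOM Clustering Algorithm
3.2.2 SOM+K-means Clustering Algorithm
3.3 Construction of the Fuzzy Membership Functions
3.4 Fuzzy Data Mining Method
3.4.1 Definition of Symbols
3.4.2 Fuzzy Data Mining Algorithm
Chapter 4 Empirical Results
4.1 Empirical Results of the Multi-Dimensional SOM Combined with the Fuzzy Data Mining Algorithm
4.2 Empirical Results of the One-Dimensional SOM Combined with the Fuzzy Data Mining Algorithm
4.3 Empirical Results of the One-Dimensional SOM+K-means Combined with the Fuzzy Data Mining Algorithm
4.4 Comparison of Results
Chapter 5 Conclusions and Suggestions
5.1 Conclusions
5.2 Research Contributions
5.3 Suggestions for Future Research
A Data Mining Software Screens
B Computation Process of the Apriori Algorithm
C Contents of the Raw and Analyzed Databases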
