Student (English name): Chang, Kuo Chen
Thesis title (English): Mining Closed Patterns from Long Biological Datasets by Multithread in Multicore CPU
Advisor (English name): Huang, Jen-peng
Keywords (English): Microarray; Data Mining; Frequent Closed Itemset; FP-Growth; Multithread Processing
As bioinformatics technologies have advanced, the entire human genome has been sequenced, and microarray applications have become increasingly popular. Data mining techniques can help us discover co-regulated genes or specific gene groups. However, microarray datasets differ greatly from the transactional datasets commonly used in most data mining fields. A microarray dataset has far fewer samples than a transactional dataset has records, typically between 100 and 1,000 rows, but its number of gene features is very large, usually thousands or more. As a result, traditional frequent closed itemset mining algorithms, such as those based on FP-Growth, perform poorly on these datasets, and their run time increases dramatically when the minimum support is small. This thesis presents an approach that invokes two or more threads to mine simultaneously, which increases the efficiency of mining such datasets and alleviates the effect of a small minimum support. Moreover, the approach runs in the multithreaded environment of the dual-core and multicore CPU computers that have recently become common, so the threads can work independently to achieve higher performance.
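To make the idea of splitting one mining job across threads concrete, here is a minimal sketch. It is not BD-Close or PMClose; it is a simpler, hypothetical scheme in the spirit of row enumeration, the strategy CARPENTER uses for data with few rows and many columns. Every frequent closed itemset equals the set of genes shared by some group of at least `min_sup` rows, so the search can be partitioned by the lowest-numbered row in the group, and each partition can be handed to its own thread. The function names `mine_from_row` and `mine_closed_parallel`, the absolute (row-count) support threshold, and the toy data (assumed already discretized to "gene expressed or not") are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, FrozenSet, List, Set

Pattern = FrozenSet[str]


def mine_from_row(rows: List[Pattern], first: int, min_sup: int) -> Set[Pattern]:
    """Collect the gene sets shared by row groups whose lowest row index is `first`.

    Hypothetical helper: each call touches only its own `found` set and only
    reads `rows`, so calls for different `first` values need no locking.
    """
    found: Set[Pattern] = set()
    n = len(rows)

    def extend(next_row: int, size: int, common: Pattern) -> None:
        if not common:
            return  # no gene is shared by this row group or any larger one
        if size >= min_sup:
            found.add(common)  # intersection of >= min_sup rows: a frequent closed pattern
        for j in range(next_row, n):
            if size + (n - j) < min_sup:
                break  # even adding every remaining row cannot reach min_sup
            extend(j + 1, size + 1, common & rows[j])

    extend(first + 1, 1, rows[first])
    return found


def mine_closed_parallel(rows: List[Pattern], min_sup: int, workers: int = 4) -> Dict[Pattern, int]:
    """Hypothetical driver: map each frequent closed pattern to its support (row count)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One independent task per starting row; the tasks share no mutable state.
        partial = pool.map(lambda i: mine_from_row(rows, i, min_sup), range(len(rows)))
        candidates: Set[Pattern] = set().union(*partial)
    # Different row groups can yield the same pattern; the set union removes
    # duplicates, and support is the number of rows containing the pattern.
    return {p: sum(1 for r in rows if p <= r) for p in candidates}


# Toy data (made up): each row is one sample, each item a gene marked as expressed.
example_rows = [
    frozenset({"g1", "g2", "g3"}),
    frozenset({"g1", "g2"}),
    frozenset({"g1", "g3"}),
]

result = mine_closed_parallel(example_rows, min_sup=2)
for pattern, support in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(pattern), support)
# ['g1'] 3
# ['g1', 'g2'] 2
# ['g1', 'g3'] 2
```

Note that `{g2}` is not reported: it occurs in the same rows as `{g1, g2}`, so it is not closed. Partitioning by the lowest row keeps the tasks independent, but the load is uneven, since the task for row 0 explores the most row groups. In CPython, threads running pure-Python code do not execute in parallel because of the global interpreter lock, so this sketch shows the task decomposition rather than a real multicore speedup; a `ProcessPoolExecutor` with a picklable worker, or a compiled language, would be needed for that. The recursion depth grows with the number of rows, which may require raising Python's recursion limit for datasets near 1,000 samples.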
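Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1. Research Background
1.2. Research Motivation
1.3. Research Objectives
1.4. Thesis Organization
Chapter 2 Definitions and Related Work
2.1. An Overview of Microarrays
2.2. Data Mining
2.3. Parallel Data Mining
2.4. Definitions
2.5. Related Work
2.5.1. The Apriori Algorithm
2.5.2. FP-growth
2.5.3. CARPENTER
2.5.4. TD-Close
Chapter 3 Research Methods
3.1. The BD-Close Algorithm
3.1.1. The MinePattern Algorithm
3.1.2. The BDClose Main Procedure
3.2. An Illustrative Example
3.3. The PMClose Algorithm
3.3.1. The PMMine Mining Algorithm
3.3.2. The PMClose Main Procedure
Chapter 4 Performance Evaluation
4.1. Experimental Environment
4.2. Experimental Design
4.3. Performance Tests and Result Analysis
4.3.1. Synthetic Datasets
4.3.2. Real Datasets
Chapter 5 Conclusions and Future Work
5.1. Conclusions
5.2. Future Work
References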
