National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

Graduate student: Chang, Kuo Chen (張國鎮)
Thesis title: Mining Closed Patterns from Long Biological Datasets by Multithread in Multicore CPU (在多核心CPU中以多緒執行探勘生物資訊學之封閉項目集)
Advisor: Huang, Jen-peng (黃仁鵬)
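Degree: Master's
Institution: 南台科技大學 (Southern Taiwan University of Technology)
Department: Department of Information Management
Discipline: Computer Science
Subfield: General Computer Science
Thesis type: Academic thesis
Year of publication: 2007
Academic year of graduation: 95 (ROC calendar)
Language: Chinese
Pages: 64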
Keywords: Microarray (微陣列基因體); Data Mining (資料探勘); Frequent Closed Itemset (頻繁封閉集合); FP-Growth; Multithread Processing (多緒執行)
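Abstract: In recent years, as biotechnology has matured, all human genes have been decoded, and sampling and analysis with microarrays has become increasingly widespread. Data mining techniques can help extract useful information such as gene groups or expression characteristics. However, the characteristics of this kind of data differ significantly from those of the datasets commonly seen in data mining. In general, a microarray dataset has few samples, from a few dozen up to roughly a thousand, while the number of gene feature segments often reaches several thousand to more than ten thousand. Consequently, traditional frequent (closed) itemset mining algorithms such as Apriori and FP-Growth perform poorly on such datasets. In particular, the required time rises sharply when the minimum support is small, mainly because these algorithms build pattern combinations from sets of data items. This study proposes the BD-Close and PMClose algorithms, which transpose data items and data rows to reduce the number of pattern combinations. The mining procedure can also be split into two or more threads that mine simultaneously without waiting for one another's results. This greatly improves the efficiency of mining frequent closed itemsets from such datasets and reduces the sharp performance degradation caused by a decreasing minimum support. Moreover, because the threads of both algorithms can run independently of one another, overall mining efficiency improves further on the increasingly common multicore CPUs and multithreaded operating systems.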
Abstract (original English): With the growth of bioinformatics technologies, all human genes have been decoded, and microarray applications have become increasingly popular. Data mining techniques can help discover co-regulated genes or specific gene groups. However, microarray datasets differ markedly from the transactional datasets commonly used in most data mining fields. The number of samples in a microarray dataset is much smaller than the number of records in a transactional dataset, generally between 100 and 1,000 rows, whereas the number of gene features is very large, usually reaching thousands or more. As a result, traditional frequent closed itemset mining algorithms, such as FP-Growth, perform poorly on these datasets, and with a small minimum support the run time increases dramatically. This thesis presents an approach that invokes two or more threads to mine simultaneously, which increases the efficiency of mining such datasets and alleviates the effect of a small minimum support. In addition, because the threads can work independently, the approach can achieve higher performance in multithreaded environments on the dual-core and multicore CPUs that have recently become common.
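The idea in the abstract, enumerating combinations of rows instead of combinations of items and handing independent parts of the search to separate threads, can be illustrated with a small Python sketch. This is not BD-Close or PMClose, whose details are not given in this record. It is a naive row-enumeration miner under simplifying assumptions, and every name in it (`transpose`, `mine_partition`, `mine_closed`, the sample gene labels) is hypothetical. It relies on the fact that the items shared by any set of rows always form a closed itemset, so with few rows the search space is bounded by the number of row combinations rather than item combinations.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def transpose(rows):
    """Build the transposed table: item -> set of row ids containing it."""
    table = {}
    for rid, items in enumerate(rows):
        for item in items:
            table.setdefault(item, set()).add(rid)
    return table


def mine_partition(rows, table, first, min_support):
    """Mine row sets whose smallest row id is `first` (one independent task)."""
    found = {}
    later = range(first + 1, len(rows))
    # A row set needs at least `min_support` rows, counting `first` itself.
    for extra in range(max(min_support - 1, 0), len(later) + 1):
        for rest in combinations(later, extra):
            # Items shared by every chosen row form a closed itemset.
            common = frozenset(rows[first]).intersection(*(rows[r] for r in rest))
            if common:
                # Support = number of rows that contain all shared items.
                support_rows = set.intersection(*(table[i] for i in common))
                found[common] = len(support_rows)
    return found


def mine_closed(rows, min_support, workers=2):
    """Run one partition per starting row on a thread pool and merge results."""
    table = transpose(rows)
    merged = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        jobs = [
            pool.submit(mine_partition, rows, table, first, min_support)
            for first in range(len(rows))
        ]
        for job in jobs:
            merged.update(job.result())
    return merged


# Toy data: three samples (rows), each a set of expressed gene features.
samples = [{"g1", "g2", "g3"}, {"g1", "g2"}, {"g1", "g3"}]
patterns = mine_closed(samples, min_support=2)
```

Each partition reads only shared, read-only inputs, so no task waits on another, which mirrors the independence the abstract describes. In standard CPython the global interpreter lock limits the speedup threads can give to CPU-bound work, so this sketch shows the partitioning rather than the performance gain. It also omits the pruning a practical algorithm would need.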
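Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1. Research Background
1.2. Research Motivation
1.3. Research Objectives
1.4. Thesis Organization
Chapter 2 Definitions and Related Work
2.1. A Brief Introduction to Microarray
2.2. Data Mining
2.3. Parallel Data Mining
2.4. Definitions
2.5. Related Work
2.5.1. The Apriori Algorithm
2.5.2. FP-growth
2.5.3. CARPENTER
2.5.4. TD-Close
Chapter 3 Methodology
3.1. The BD-Close Algorithm
3.1.1. The MinePattern Algorithm
3.1.2. The BDClose Main Procedure
3.2. Worked Example
3.3. The PMClose Algorithm
3.3.1. The PMMine Mining Algorithm
3.3.2. The PMClose Main Procedure
Chapter 4 Performance Evaluation
4.1. Experimental Environment
4.2. Experimental Design
4.3. Performance Tests and Result Analysis
4.3.1. Synthetic Datasets
4.3.2. Real Data
Chapter 5 Conclusions and Future Work
5.1. Conclusions
5.2. Future Work
References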
1. Agrawal, R., and Srikant, R. "Fast algorithms for mining association rules." in Proc. 1994 Int. Conf. Very Large Data Bases (VLDB'94), Santiago, Chile, 1994, pp. 487-499.
2. Cong, G., Tung, A.K.H., Xin, X., Pan, F., and Yang, J. "FARMER: Finding interesting rule groups in microarray datasets." in Proc. 23rd ACM Int. Conf. Management of Data, 2004, pp. 143-154.
3. Creighton, C., and Hanash, S. "Mining gene expression databases for association rules," Bioinformatics Vol. 19, No. 1, 2003, pp. 79-86.
4. Flynn, M.J. "Very high-speed computing systems," in: Readings in computer architecture, Morgan Kaufmann Publishers Inc., 2000, pp. 519-527.
5. Grahne, G., and Zhu, J. "Efficiently using prefix-trees in mining frequent itemsets." in FIMI'03 Workshop on Frequent Itemset Mining Implementations, Melbourne, Florida, 2003.
6. Han, J., and Kamber, M. Data Mining: Concepts and Techniques, (2 ed.), Morgan Kaufmann, 2006.
7. Han, J., and Pei, J. "Mining frequent patterns by pattern-growth: methodology and implications." in SIGKDD Explor. Newsl., 2000, pp. 14-20.
8. Han, J., Pei, J., Yin, Y., and Mao, R. "Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach," Data Mining and Knowledge Discovery Vol. 8, No. 1, 2004, pp. 53-87.
9. http://fimi.cs.helsinki.fi/data/
10. http://research.i2r.a-star.edu.sg/rp/Leukemia/ALLAML.html
11. http://research.i2r.a-star.edu.sg/rp/NervousSystem/NervousSystem.html
12. Jin, R., and Agrawal, G. "Communication and Memory Efficient Parallel Decision Tree Construction." in Proceedings of Third SIAM Conference on Data Mining, 2003.
13. Jin, R., Yang, G., and Agrawal, G. "Shared Memory Parallelization of Data Mining Algorithms: Techniques, Programming Interface, and Performance," IEEE Transactions on Knowledge and Data Engineering Vol. 17, No. 1, 2005, pp. 71-89.
14. Li, J., and Wong, L. "Identifying good diagnostic genes or genes groups from gene expression data by using the concept of emerging patterns," Bioinformatics Vol. 18, No. 5, 2002, pp. 725-734.
15. Lin, D.-I., and Kedem, Z.M. "Pincer-Search: A New Algorithm for Discovering the Maximum Frequent Set," Lecture Notes in Computer Science Vol. 1377, 1998, pp. 105-121.
16. Liu, H., Han, J., Xin, D., and Shao, Z. "Mining Interesting Patterns from Very High Dimensional Data: A Top-Down Row Enumeration Approach." in Proc 2006 SIAM Conf. On Data Mining, 2006, pp. 280-291.
17. Niehrs, C., and Pollet, N. "Synexpression groups in eukaryotes," Nature Vol. 402, 1999, pp. 483-487.
18. Pan, F., Cong, G., Tung, A.K.H., Yang, J., and Zaki, M.J. "Carpenter: finding closed patterns in long biological datasets," in: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, ACM Press, Washington, D.C., 2003, pp. 637-642.
19. Pan, F., Cong, G., Xin, X., and Tung, A.K.H. "COBBLER: Combining Column and Row Enumeration for Closed Pattern Discovery." in Proc 2004 Int. Conf. on Scientific and Statistical Database Management (SSDBM'04), Santorini Island, Greece, 2004, pp. 21-30.
20. Pasquier, N., Bastide, Y., Taouil, R., and Lakhal, L. "Discovering frequent closed itemsets for association rules." in Proc. 7th Int. Conf. Database Theory, 1999, pp. 398-416.
21. Pei, J., Han, J., and Mao, R. "CLOSET: An efficient algorithm for mining frequent closed itemsets." in Proc. 2000 ACM-SIGMOD Int. Workshop Data Mining and Knowledge Discovery, Dallas, TX, 2000, pp. 11-20.
22. Pramudiono, I., and Kitsuregawa, M. "FP-tax: tree structure based generalized association rule mining." in Proceedings of the 9th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery, ACM Press, Paris, France, 2004, pp. 60-63.
23. Sieb, C., Fatta, G.D., and Berthold, M.R. "A Hierarchical Distributed Approach for Mining Molecular Fragments." in Proceedings of the International Workshop on Parallel Data Mining, Berlin, Germany, 2006, pp. 25-37.
24. Steinhaeuser, K., Chawla, N.V., and Kogge, P. "Exploiting Thread-Level Parallelism to Build Decision Trees." in Proceedings of the Workshop on Parallel and Distributed Data Mining, Berlin, Germany, 2006, pp. 13-24.
25. Zaki, M.J., and Ho, C.-T. Large-Scale Parallel Data Mining, Springer, 2000.
26. Zaki, M.J., and Hsiao, C.-J. "CHARM: An Efficient Algorithm for Closed Itemset Mining." in Proceedings of the Second SIAM International Conference on Data Mining, SIAM, 2002, pp. 12-28.