跳到主要內容

臺灣博碩士論文加值系統

(18.97.14.80) 您好!臺灣時間:2024/12/08 02:35
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::

詳目顯示

我願授權國圖
: 
twitterline
研究生:劉韋權
研究生(外文):Wei-Chuan Liu
論文名稱:應用於資料探勘上之高效能硬體架構設計
論文名稱(外文):High Performance Hardware Enhanced Frameworks on Data Mining
指導教授:陳銘憲陳銘憲引用關係
指導教授(外文):Ming-Syan Chen
學位類別:碩士
校院名稱:國立臺灣大學
系所名稱:電機工程學研究所
學門:工程學門
學類:電資工程學類
論文種類:學術論文
論文出版年:2005
畢業學年度:93
語文別:英文
論文頁數:52
中文關鍵詞:硬體設計資料探勘頻繁時間樣式分群演算法
外文關鍵詞:hardware enhanceddata miningfrequent temporal patternk-meansclusteringFPGA
相關次數:
  • 被引用被引用:0
  • 點閱點閱:514
  • 評分評分:
  • 下載下載:0
  • 收藏至我的研究室書目清單書目收藏:0
使用硬體去加速資料探勘演算法是一個新興的議題。在本論文中,我們針對頻繁時間樣式探勘與資料分群演算法分別提出相對應的硬體架構來提高效能,藉由硬體的平行性去加速資料探勘演算法中最耗時的程序,以提昇整個演算法的資料處理速度。針對頻繁時間樣式探勘,我們提出了一個Apriori-like演算法的電路去處理會隨著資料項目增加而呈指數成長的頻繁雙項目組(frequent 2-itemsets)。透過該電路,資料只要經過一次掃描,頻繁單一與雙項目組都能在固定的時間水準內完成。另外針對資料分群演算法硬體改進方案,我們整合硬體的質心(centroid)更新機制進入資料分群演算法的執行流程,大量減少質心更新的時間以提高效能。從各種實驗數據看來,相對於傳統完全採用軟體去執行資料探勘的演算法,使用硬體加速在效能上可以得到可觀的改進。
Hardware enhanced mining is an emerging issue. In this thesis, we propose two frameworks to enhance the speed of mining problems: temporal pattern mining in data streams and K-means clustering algorithm. By exploiting the parallelism in hardware, many data mining primitive subtasks can be executed with high throughput, thus increasing the performance of the overall data mining tasks. Specifically, in temporal pattern mining we realize Apriori-like algorithm within our proposed hardware enhanced mining framework. Even with the quadratic increase of the size of 2-itemsets, the counts of frequent 1-itemsets and 2-itemsets are obtained after one pass of the datasets through our hardware implementation, thus the throughput is maintained at constant level. Moreover, we propose a KACU (standing for K-means with hArdware Centroid updating) framework which integrates a hardware centroid updating mechanism into the procedure of continuous K-means algorithm. The proposed hardware frameworks are implemented in commercial Field Programmable Gate Array (FPGA) devices in order to measure their performance. The experimental results show that the hardware enhancements achieve considerably higher performance than traditional mining algorithm architectures with pure software implementation.
Chapter 1. Introduction 1
Chapter 2. Preliminaries 5
Chapter 3. Mining Frequent Temporal Patterns over Data Streams 9
3.1 Introduction 9
3.1.1 Background 9
3.1.2 Motivation 13
3.2 Hardware Design 14
3.2.1 Environment 15
3.2.2 Architecture of Hardware Stream Processor 16
3.2.3 The 2-itemset Generator 18
3.2.4 Frequent Decision 20
3.3 Performance Analysis 20
3.4 Results and Discussion 22
Chapter 4. KACU: K-means with hArdware Centroid-Updating 29
4.1 Introduction 29
4.1.1 Standard K-means Algorithm 29
4.1.2 Motivation 32
4.1.3 Systolic Process Array 33
4.1.4 Continuous K-means Algorithm 34
4.2 Hardware Design 35
4.3 Performance analysis 38
4.4 Results and Discussion 40
Chapter 5. Conclusions 45
Reference: 47
[1]Charu C. Aggarwal, Jiawei Han, Jianyong Wang, Philip S. Yu, “On demand classification of data streams.” Proceedings of the 2004 ACM SIGKDD international conference on Knowledge discovery and data mining, August 22-25, 2004
[2]R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. I. Verkamo. ”Fast discovery of association rules.” In Advances in Knowledge Discovery and Data Mining, pages 307—328. AAAI Press, 1996.
[3]R. Agrawal and John C. Shafer. “Parallel mining of association rules.” IEEE Transactions on Knowledge and Data Engineering, Vol. 8, NO. 6,pages 962-969 , December 1996.
[4]R. Agrawal and R. Srikant. “Fast algorithms for mining association rules.”, Proceeding of 20th Int. Conf. Very Large Data Bases, VLDB, pages 487—499. Morgan Kaufmann, 12—15 1994.
[5]Altera Corporation. http://www.altera.com.
[6]A. Bagchi, A. Chaudhary, D. Eppstein, and M. T. Goodrich. “Deterministic sampling and range counting in geometric data streams.” Proceeding of the 20th ACM Symposium on Computational Geometry, pages 144—151
[7]Z. K. Baker and V. K. Prasanna, “Efficient Hardware Data Mining with the Apriori Algorithm on FPGAs,” Proceedings of the Thirteenth Annual IEEE Symposium on Field Programmable Custom Computing Machines (FCCM ''05), 2005
[8]Ordonez C., “Clustering Binary Data Streams with K-means.” Proceeding of ACM DMKD, 2003.
[9]Joong Hyuk Chang and Won Suk Lee. “Finding recent frequent itemsets adaptively over online data streams.” Proceeding of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2003.
[10]Ming-Syan Chen, Jiawei Han, and Philip S. Yu. “Data mining: an overview from a database perspective.” IEEE Trans. On Knowledge And Data Engineering, 8:866—883, 1996.
[11]C. W. Chen, J. Luo, and K. J. Parker, “Image Segmentation via Adaptive K-Mean Clustering and Knowledge-Based Morphological Operations with Biomedical Applications”, IEEE Transactions on Image Processing, vol. 7, no. 12, pp. 1673-1683, 1998.
[12]Custom Instructions. http://www.altera.com/products/ip/processors/nios2/features/ni2-cust_instructions.html
[13]Bi-Ru Dai, Jen-Wei Huang, Mi-Yen Yeh, Ming-Syan Chen: “Clustering on Demand for Multiple Data Streams.” Proceeding of the IEEE 4th Intern''l Conf. on Data Mining (ICDM-2004), November 1-4, 2004.
[14]M. Datar, A. Gionis, P. Indyk, and R. Motwani. “Maintaining Stream Statistics over Sliding Windows.” Proceedings of the 2002 Annual ACM-SIAM Symposium on Discrete Algorithms, January 2002.
[15]A. Dobra, M. N. Garofalakis, J. Gehrke, and R. Rastogi. “Processing Complex Aggregate Queries over Data Streams.” Proceedings of the 2002 ACM SIGMOD, pages 61—72, June 2002.
[16]P. Domingos and G. Hulten. “Mining High-Speed Data Streams.” Proceedings of the 6th ACM SIGKDD, pages 71—80, August 2000.
[17]BA Draper et al., "Accelerated image processing on FPGAs", IEEE Transactions on Image Processing, Volume 12, Issue 12, Dec. 2003 Page(s):1543 – 1551
[18]M. Estlick , M. Leeser , J. Theiler , J. J. Szymanski, “Algorithmic transformations in the implementation of K- means clustering on reconfigurable hardware,” Proceedings of the 2001 ACM/SIGDA ninth international symposium on Field programmable gate arrays, p.103-110
[19]V. Faber, “Clustering and the continuous k-means algorithm,” Los Alamos Science 22, pp. 138-144, 1994.
[20]F. Ferrandi, P. Luca and D. Sciuto, “Mining Interesting Patterns from Hardware-Software Codesign Data with the Learning Classifier System XCS”. Proceeding of IEEE CEC 2003 – Congress on Evolutionary Computation, Canberra, Australia, 9-12 December 2003, pp. 1486–1492
[21]M. Franzmeier, C. Pohl, M. Porrmann,and U. Rückert, "Hardware Accelerated Data Analysis." Proceedings of the 4th International Conference on Parallel Computing in Electrical Engineering (PARELEC 2004). Dresden, Germany, 7 - 10 September 2004, S. 309-314
[22]M. Gokhale, J. Frigo, K. McCabe, J. Theiler, C. Wolinski, D. Lavenier: “Experience with a Hybrid Processor: K-means Clustering.” The Journal of Supercomputing 26(2): 131-148 (2003)
[23]Eui-Hong Han, George Karypis, and Vipin Kumar. ”Scalable parallel data mining for association rules.” Proceeding of ACM SIGMOD Conference on Management of Data, pages 277—288, 1997.
[24]Jiawei Han and Micheline Kamber. “Data Mining: Concepts and Techniques.” Morgan Kaufmann, 2000.
[25]W. Daniel Hillis and Jr. Guy L. Steele. “Data parallel algorithms.” Communications of the ACM, 29(12):1170—1183, 1986.
[26]Jack S.N. Jean, Guozhu Dong, Hwa Zhang, Xinzhong Guo, and Baifeng Zhang. “Query processing with an fpga coprocessor board.” Proceedings of The First International Conference on ENGINEERING OF RECONFIGURABLE SYSTEMS AND ALGORITHMS (ERSA), 2001.
[27]Ruoming Jin and Gagan Agrawal. “An algorithm for in-core frequent itemset mining on streaming data.” In Research Report. http://wis.cs.ucla.edu/ hxwang/stream/jinsubmitted.pdf, 2005.
[28]Richard M. Karp and Scott Shenker. “A simple algorithm for finding frequent elements in streams and bags”. ACM Transactions on Database Systems, 2003.
[29]S.Y. Kung. “VLSI Array Processors.” Prentice Hall, 1998.
[30]R. Lysecky and F. Vahid, “A Study of the Speedups and Competitiveness of FPGA Soft Processor Cores using Dynamic Hardware/Software Partitioning”, Proceeding Of the Design Automation and Test in Europe (DATE), March 2005.
[31]M. Leeser, J. Theiler, M. Estlick, and J. J. Szymanski, “Design Tradeoffs in a Hardware Implementation of the K-means Clustering Algorithm”, http://nis-www.lanl.gov/~jt/Papers/kmeans-sam.ps
[32]J. B. MacQueen (1967): "Some Methods for classification and Analysis of Multivariate Observations,” Proceedings of 5-th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, University of California Press, 1:281-297
[33]B. Maliatski, O. Yadid-Pecht, ”Hardware-driven adaptive K-means clustering for real-time video imaging.” IEEE Trans. Circuits Syst. Video Techn. 15(1): 164-166 (2005)
[34]Gurmeet Singh Manku and Rajeev Motawani. “Approximate frequency counts over data streams”. Proceeding of VLDB, 2002.
[35]H. Mannila G. Renganathan G. Das, K.-I. Lin and P. Smyth. “Rule discovery from time series.” Proceeding of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1998.
[36]Andreas Mueller. “Fast sequential and parallel algorithms for association rule mining: A comparison.” Technical Report CS-TR-3515, Departure of Computer Science, University of Maryland, College Park, MD, 1995.
[37]Lenart, T. Owall, V. Gustafsson, M. Sebesta, M. Egelberg, P. “Accelerating signal processing algorithms in digital holography using an FPGA platform“ Proceedings of IEEE International Conference on Field-Programmable Technology (FPT), 15-17 Dec. 2003 Page(s):387 – 390
[38]Keshab K. Parhi. “VLSI Digital Signal Processing Systems : Design and Implementation.” Wiley-Interscience, 1998.
[39]Jong Soo Park, Ming-Syan Chen, and Philip S. Yu. “An effective hash based algorithm for mining association rules.” Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data, pages 175—186, San Jose, California, 22—25 1995.
[40]Jong Soo Park, Ming-Syan Chen, and Philip S. Yu. “Efficient parallel data mining for association rules.” Proceedings of the fourth international conference on Information and knowledge management, pages 31—36. ACM Press, 1995.
[41]S. Rajamani and P. Viswanath. “A quantitative analysis of processor - programmable logic interface.” IEEE Symposium on FPGAs for Custom Computing Machines, pages 226—234, Los Alamitos, CA, 1996. IEEE Computer Society Press.
[42]Erik Riedel, Christos Faloutsos, Garth A. Gibson, and David Nagle. “Active disks for large-scale data processing.” IEEE Computer, 34:68—74, 2001.
[43]Guha, S., Meyerson, A., Mishra, N., Motwani, R., and O''Callaghan, L. “Clustering data streams: Theory and practice.” IEEE Trans. Knowl. Data Eng 15, 3 (2003), 515—528
[44]H. Spath, “Clustering Dissection and Analysis: Theory, FORTRAN programs, Examples,” Ellis Horwood Limited, Chichester, UK, 1985
[45]Yufei Tao, Dimitris Papadias, Jian Zhai, Qing Li: Venn “Sampling: A Novel Prediction Technique for Moving Objects.” Proceedings of 21st IEEE International Conference on Data Engineering (ICDE), pp. 680-691, Tokyo, Japan, April 5-8, 2005
[46]Wei-Guang Teng, Ming-Syan Chen, and Philips S. Yu. “A regression-based temporal pattern mining scheme for data streams.” Proceeding of VLDB, 2003.
[47]J. Theiler, J. Frigo, M. Gokhale, and J. J. Szymanski. “Co-design of Software and Hardware to Implement Remote Sensing Algorithms.” Proceeding of SPIE 4480 (2001) 86-99.
[48]Lars Wanhammar. “DSP Integrated Circuits.” Academic Press, 1999.
[49]B. West, Roger D. Chamberlain, Ronald S. Indeck, and Qiong Zhang, "An FPGA-based Search Engine for Unstructured Database." Proceeding of 2nd Workshop on Application Specific Processors, December 2003, pp. 25-32.
[50]Xilinx, Inc. http://www.xilinx.com.
[51]Mohammed J. Zaki. “Parallel and distributed association mining: A survey.” IEEE Concurrency, 7(4):14—25, 1999.
[52]Q. Zhang, Roger D. Chamberlain, Ronald S. Indeck, Benjamin West, and Jason White, "Massively Parallel Data Mining Using Reconfigurable Hardware: Approximate String Matching." Proceeding of Workshop on Massively Parallel Processing, April 2004.
[53]C. Zhou, D. Frankowski, P. J. Ludford, S. Shekhar, L. G. Terveen, “Discovering personal gazetteers: an interactive clustering approach.” Proceedings of ACM GIS 2004, Washington, DC. 2004.
QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top
無相關期刊