跳到主要內容

臺灣博碩士論文加值系統

(216.73.216.13) 您好!臺灣時間:2025/11/24 06:20
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::

詳目顯示

我願授權國圖
: 
twitterline
研究生:溫英翔
研究生(外文):Ying-Hsiang Wen
論文名稱:應用於關聯性規則探勘之平行化硬體架構
論文名稱(外文):Parallel Hardware Architecture for Mining Association Rules
指導教授:陳銘憲陳銘憲引用關係
指導教授(外文):Ming-Syan Chen
學位類別:碩士
校院名稱:國立臺灣大學
系所名稱:電機工程學研究所
學門:工程學門
學類:電資工程學類
論文種類:學術論文
論文出版年:2006
畢業學年度:94
語文別:英文
論文頁數:50
中文關鍵詞:資料探勘關聯性規則硬體加速心縮陣列管線化可程式化邏輯閘陣列
外文關鍵詞:data miningassociation ruleshardware-enhancedsystolic arraypipelineFPGA
相關次數:
  • 被引用被引用:0
  • 點閱點閱:204
  • 評分評分:
  • 下載下載:0
  • 收藏至我的研究室書目清單書目收藏:0
自從早期關聯性規則探勘演算法Apriori被提出後,相關演算法已被廣泛研究與討論。然而隨著資料大量的成長,利用軟體演算法處理大量的資料,在效能上會遇到瓶頸。

有鑑於此,在本論文中,我們提出了一個管線化(Pipeline)排程的硬體架構,可以同時執行以下多個運算:利用心縮陣列(Systolic Array)做平行化的項目組(Itemsets)比對,並且可以同時運用雜湊表(Hash Table)方法來刪去掉不必要的候選人項目組(Candidate Itemsets),此外透過紀錄每筆交易資料中,項目出現在候選人項目組的次數,使用修飾過濾器(Trimming filter),我們可以過濾掉低頻(Infrequent)的項目,進而加速整個資料探勘的速度。

從各種實驗數據看來,相對於傳統完全採用軟體去執行資料探勘的演算法,使用硬體加速在效能上可以得到可觀的改進。
Generally speaking, to implement Apriori-based association rule mining in hardware, one has to load candidate itemsets and a database into the hardware. Since the capacity of the hardware architecture is fixed, if the number of candidate itemsets or the number of items in the database is larger than the hardware capacity, the items are loaded into the hardware separately. The time complexity is in proportion to the number of candidate
itemsets multiplied by the number of items in the database. Too many candidate itemsets and a large database would create a performance bottleneck. In this thesis, we propose a HAsh-based and PiPelIned architecture (abbreviated as HAPPI) for hardware-enhanced association rule mining. We apply the pipeline methodology in the HAPPI architecture to compare itemsets with the database and collect useful information for reducing the number
of candidate itemsets and items in the database simultaneously. When the database is fed into the hardware, candidate itemsets are compared with the items in the database to find frequent itemsets. At the same time, trimming information is collected from each
transaction. In addition, itemsets are generated from transactions and hashed into a hash table. The useful trimming information and the hash table enable us to reduce the number of items in the database and the number of candidate itemsets. Therefore, we can effectively reduce the frequency of loading the database into the hardware. As such, HAPPI solves the bottleneck problem in Apriori-based hardware schemes. We also derive some properties to investigate the performance of this hardware implementation. As shown by the experiment results, HAPPI significantly outperforms the previous hardware approach in terms of execution cycles.
1 Introduction 5
2 Related Works 11
3 Preliminaries 14
4 HAPPI Architecture 18
4.1 System Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.2 PipelineDesign . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.3 Transaction Trimming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.4 Hash Table Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.5 Performance Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
5 Experiment Results 36
5.1 Generation of SyntheticData . . . . . . . . . . . . . . . . . . . . . . . . . 37
5.2 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
5.3 Sensitivity Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
5.4 Scale-up Experiment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
6 Conclusion 44
[1] R. Agarwal, C. Aggarwal, and V. Prasad. A tree projection algorithm for generation
of frequent itemsets. Journal of Parallel and Distributed Computing, 2000.
[2] R. Agrawal and J. C. Shafer. Parallel mining of association rules. IEEE Transactions
on Knowledge and Data Engineering, 8(6):962—969, December 1996.
[3] R. Agrawal and R. Srikant. Fast algorithms for mining association rules. Proceedings
of the 20th International Conference on Very Large Databases, 1994.
[4] Z. Baker and V. K. Prasanna. Efficient hardware data mining with the apriori algorithm
on fpgas. Proceedings of the Thirteenth Annual IEEE Symposium on Field
Programmable Custom Computing Machines, 2005.
[5] C. Besemann and A. Denton. Integration of profile hidden markov model output
into association rule mining. Proceedings of the 11th ACM SIGKDD International
Conference on Knowledge Discovery in Data Mining, pages 538—543, 2005.
[6] C. W. Chen, J. Luo, and K. J. Parker. Image segmentation via adaptive k-mean clustering
and knowledge-based morphological operations with biomedical applications.
IEEE Transactions on Image Processing, 7(12):1673—1683, 1998.
[7] S. M. Chung and C. Luo. Parallel mining of maximal frequent itemsets from databases.
Proceedings of the 15th IEEE International Conference on Tools with Artificial
Intelligence, 2003.
[8] S. Cong, J. Han, J. hoeflinger, and D. Padua. A sampling -based framework for parallel
data mining. Proceedings of the tenth ACM SIGPLAN symposium on Principles
and practice of parallel programming, June 2005.
[9] A. Corporation. http://www.altera.com.
[10] M. Estlick, M. Leeser, J. Szymanski, and J. Theiler. Algorithmic transformations in
the implementation of k-means clustering on reconfigurable hardware. Proceedings
of the Ninth Annual IEEE Symposium on Field Programmable Custom Computing
Machines (FCCM), 2001.
[11] F. Ferrandi, P. Luca, and D. Sciuto. Mining interesting patterns from hardwaresoftware
codesign data with the learning classifier system xcs. Proceedings of IEEE
CEC, December 2003.
[12] M. Franzmeier, C. Pohl, M. Porrmann, and U. Ruckert. Hardware accelerated data
analysis. Proceedings of the 4th International Conference on Parallel Computing in
Electrical Engineering (PARELEC), pages 309—314, September 2004.
[13] M. Gokhale, J. Frigo, K. McCabe, J. Theiler, C. Wolinski, and D. Lavenier. Experience
with a hybrid processor: K-means clustering. The Journal of Supercomputing,
pages 131—148, 2003.
[14] E.-H. Han, G. Karypis, and V. Kumar. Scalable parallel data mining for association
rules. Proceedings of ACM SIGMOD Conference on Management of Data, pages
277—288, 1997.
[15] J. Han andM. Kamber. Data Mining: Concepts and Techinques. Morgan Kaufmanm,
2001.
[16] J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation.
Proceedings of ACM SIGMOD, pages 1—12, May 2000.
[17] H. Kung and C. Leiserson. Systolic arrays for vlsi. Proceedings of Sparse Matrix,
1976.
[18] S. Kung. VLSI Array Processors. Prentice Hall, 1999.
[19] M. Leeser, J. Theiler, M. Estlick, and J. J. Szymanski. Design tradeoffs in a hardware
implementation of the k-means clustering algorithm. Proceedings of SPIE, pages 99—
106, 2000.
[20] N. Ling and M. Bayoumi. Specification and Verification of Systolic Arrays. World
Scientific Publishing, 1999.
[21] W.-C. Liu, K.-H. Liu, andM.-S. Chen. High performance data stream processing on a
novel hardware enhanced framework. Proceedings of the 10th Pacific-Asia Conference
on Knowledge Discovery and Data Mining, April 2006.
[22] R. Lysecky and F. Vahid. A study of the speedups and competitiveness of fpga soft
processor cores using dynamic hardware/software partitioning. Proceedings Of the
Design Automation and Test in Europe (DATE), March 2005.
[23] K. K. Parhi. VLSI Digital Signal Processing Systems : Design and Implementation.
Wiley-Interscience, 1998.
[24] J. Park, M.-S. Chen, and P. Yu. An effective hash based algorithm for mining
association rules. Proceedings of ACM SIGMOD, pages 175—186, May 1995.
[25] J. Park, M.-S. Chen, and P. Yu. Efficient parallel data mining for association rules.
Proceedings of the fourth international conference on Information and knowledge
management, pages 31—36, 1995.
[26] A. Savasere, E. Omiecinski, and S. Navathe. An efficient algorithm for mining association
rules in large databases. Proceedings of the 21th International Conference on
Very Large Data Bases, pages 432—444, September 1995.
[27] J. Theiler, J. Frigo, M. Gokhale, and J. J. Szymanski. Co-design of software and
hardware to implement remote sensing algorithms. Proceedings of SPIE, pages 86—
99, 2001.
[28] H. Toivonen. Sampling large databases for association rules. Proceedings of the 22th
International Conference on Very Large Data Bases table of contents, pages 134 —
145, 1996.
[29] C.Wolinski, M. Gokhale, and K. McCabe. A reconfigurable computing fabric. Proceedings
of the Engineering of Reconfigurable Systems and Algorithms (ERSA), 2004.
[30] M. J. Zaki. Parallel and distributed association mining: A survey. IEEE Concurrency,
1999.
QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top