National Digital Library of Theses and Dissertations in Taiwan


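Graduate student: 溫英翔
Graduate student (English): Ying-Hsiang Wen
Thesis title: 應用於關聯性規則探勘之平行化硬體架構
Thesis title (English): Parallel Hardware Architecture for Mining Association Rules
Advisor: 陳銘憲
Advisor (English): Ming-Syan Chen
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Electrical Engineering
Discipline: Engineering
Field: Electrical and Information Engineering
Thesis type: Academic thesis
Year of publication: 2006
Graduation academic year:
Language: English
Number of pages:
Keywords (Chinese): data mining, association rules, hardware acceleration, systolic array, pipelining, field-programmable gate array
Keywords (English): data mining, association rules, hardware-enhanced, systolic array, pipeline, FPGA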


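Since the early association rule mining algorithm Apriori was proposed, related algorithms have been widely studied and discussed. However, as data volumes grow rapidly, processing large amounts of data with software algorithms runs into a performance bottleneck.

In view of this, in this thesis we propose a pipeline-scheduled hardware architecture that performs several operations simultaneously. It uses a systolic array to compare itemsets in parallel and, at the same time, applies a hash table method to prune unnecessary candidate itemsets. In addition, by recording, for each transaction, the number of times each item appears in the candidate itemsets, a trimming filter removes infrequent items, which speeds up the overall mining process.

The experimental results show that, compared with traditional algorithms that carry out data mining entirely in software, hardware acceleration yields a considerable performance improvement.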
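The trimming step described above can be illustrated in software. The following is a minimal sketch, not the thesis's hardware design: it assumes a DHP-style rule in which an item is kept only if it occurs in at least `k` of the candidate `k`-itemsets contained in the transaction. The threshold `k` and the function name `trim_transaction` are assumptions made for illustration; the abstract does not state the exact rule.

```python
from collections import Counter


def trim_transaction(transaction, candidates, k):
    """Drop items that cannot take part in any (k+1)-itemset.

    Assumed rule (hypothetical, DHP-style): an item survives only if it
    occurs in at least k candidate k-itemsets contained in the transaction.
    """
    items = set(transaction)
    counts = Counter()
    for cand in candidates:
        # Count only candidates of size k that the transaction contains.
        if len(cand) == k and set(cand) <= items:
            counts.update(cand)
    # Preserve the original item order of the transaction.
    return [item for item in transaction if counts[item] >= k]


transaction = ["A", "B", "C", "D"]
candidates = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
trimmed = trim_transaction(transaction, candidates, k=2)
print(trimmed)
```

Here item `D` occurs in only one contained candidate, so it is removed, and later passes scan a shorter transaction.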

Generally speaking, to implement Apriori-based association rule mining in hardware, one has to load candidate itemsets and a database into the hardware. Since the capacity of the hardware architecture is fixed, if the number of candidate itemsets or the number of items in the database is larger than the hardware capacity, the items are loaded into the hardware separately. The time complexity is in proportion to the number of candidate itemsets multiplied by the number of items in the database. Too many candidate itemsets and a large database would create a performance bottleneck. In this thesis, we propose a HAsh-based and PiPelIned architecture (abbreviated as HAPPI) for hardware-enhanced association rule mining. We apply the pipeline methodology in the HAPPI architecture to compare itemsets with the database and collect useful information for reducing the number of candidate itemsets and items in the database simultaneously. When the database is fed into the hardware, candidate itemsets are compared with the items in the database to find frequent itemsets. At the same time, trimming information is collected from each transaction. In addition, itemsets are generated from transactions and hashed into a hash table. The useful trimming information and the hash table enable us to reduce the number of items in the database and the number of candidate itemsets. Therefore, we can effectively reduce the frequency of loading the database into the hardware. As such, HAPPI solves the bottleneck problem in Apriori-based hardware schemes. We also derive some properties to investigate the performance of this hardware implementation. As shown by the experiment results, HAPPI significantly outperforms the previous hardware approach in terms of execution cycles.
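The hash table filtering mentioned in the abstract can be sketched in software as follows. This is an illustrative sketch under stated assumptions, not the HAPPI hardware: items are assumed to be integers, and the hash function `bucket_of`, the bucket count, and the minimum support value are all hypothetical choices made for the example.

```python
from itertools import combinations


def bucket_of(itemset, num_buckets):
    """Hypothetical hash function: fold integer items into a bucket index."""
    h = 0
    for item in itemset:
        h = (h * 31 + item) % num_buckets
    return h


def build_hash_table(transactions, k, num_buckets):
    """Hash every k-itemset of every transaction and count hits per bucket."""
    buckets = [0] * num_buckets
    for t in transactions:
        for itemset in combinations(sorted(set(t)), k):
            buckets[bucket_of(itemset, num_buckets)] += 1
    return buckets


def filter_candidates(candidates, buckets, min_support):
    """Keep a candidate only if its bucket count reaches min_support."""
    n = len(buckets)
    return [c for c in candidates
            if buckets[bucket_of(tuple(sorted(c)), n)] >= min_support]


transactions = [[1, 2, 3], [1, 2, 4], [1, 2], [3, 4]]
buckets = build_hash_table(transactions, k=2, num_buckets=7)
candidates = [(1, 2), (1, 3), (2, 3), (3, 4)]
kept = filter_candidates(candidates, buckets, min_support=2)
print(kept)
```

In this example `(2, 3)` is pruned because its bucket count is below the threshold. `(1, 3)` and `(3, 4)` share a bucket and both survive even though each occurs only once. A bucket count is an upper bound on the support of any itemset in that bucket, so the filter never discards a frequent itemset but may retain some infrequent ones.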

1 Introduction
2 Related Works
3 Preliminaries
4 HAPPI Architecture
4.1 System Architecture
4.2 Pipeline Design
4.3 Transaction Trimming
4.4 Hash Table Filtering
4.5 Performance Analysis
5 Experiment Results
5.1 Generation of Synthetic Data
5.2 Performance Evaluation
5.3 Sensitivity Analysis
5.4 Scale-up Experiment
6 Conclusion

