(54.236.58.220) 您好!臺灣時間:2021/02/27 12:28
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果

詳目顯示:::

我願授權國圖
: 
twitterline
研究生:何達明
研究生(外文):Dar-Ming(Jonathan) Hao
論文名稱:排序對關聯演算法速度的影響
論文名稱(外文):Partial order improve the performance of the association rule computing
指導教授:劉士豪劉士豪引用關係
指導教授(外文):Vandy Liu
學位類別:碩士
校院名稱:中原大學
系所名稱:資訊管理研究所
學門:電算機學門
學類:電算機一般學類
論文種類:學術論文
論文出版年:2002
畢業學年度:90
語文別:中文
論文頁數:100
中文關鍵詞:關聯式規則交易物項頻物項集合非高頻物項集合資料挖掘資料挖掘關聯式規則交易物項高頻物項集合非高頻物項集合
外文關鍵詞:sortcompressfp-treeaprioriassociation rulesdatamining
相關次數:
  • 被引用被引用:4
  • 點閱點閱:114
  • 評分評分:系統版面圖檔系統版面圖檔系統版面圖檔系統版面圖檔系統版面圖檔
  • 下載下載:0
  • 收藏至我的研究室書目清單書目收藏:1
摘要
近年各企業全面電腦化的普及程度幾乎到了沒有企業不使用電腦的地步,而全方位的電腦化,使得企業累積了大量資料,也加快了資料累積的速度,而這些資料是企業經營的第一手資料,內含大量豐富的潛在資訊,可是以往都將這些資料做為財務報表的資料來源而己,鮮少於在其中發現有價值的隱含資訊,更不要說其中蘊含的知識.
但企業競爭的加劇,全球化市場的需求,商業由原來的守勢轉變為攻勢,掌握機先則成為生存的要務,在需求孔急眾緣齊備的狀況,資料探勘成為當紅炸子雞,
我們希望尋找資料挖掘中關聯規則(Association Rules) 的新架構。因古典的Apriori 演算法,利用猜測瀘除的方法,找出交易中品項間的關係,使用Apriori 演算法耗費時間,多次的產生龐大的備選集,再以掃描資料庫過瀘支持度低的備選集,需要大量的記憶體並且執行速度緩慢。本研究利用壓縮陣列將交易資料的關聯頻次紀錄在記憶體中,而再由完整的關聯頻次紀錄找出關聯規則,由於全部作業皆在記憶體中運作,直接歸納,不做猜測瀘除的步驟,故效率提昇顯著,而因為壓縮率較FP-Tree為高故在FP-Tree不適運作的狀況亦能穩定作業,且FP-Tree在實作時常使用指標連結,故較難serialize,也較難儲存結果,而SortCompress使用壓縮陣列,緊密且小,便於儲存結果,且直接循序讀出即可和新的壓縮陣列合併,故在實用上提供了更多的應用方式,適合做為基底的關聯規則演算法,提供其它變化型的關聯應用.
Abstract
In recent years many business use computer to keep their transaction data and control their business process. Until now all Corp. has their own computers keep a lot’s of their process digital data. Faster and faster data generates. Those data conclude very important information and some useful knowledge , but we had no idea to use and discover it , but now the information science has a big improves, the faster computer, the larger size disk, the more popular database application, those let’s can find the knowledge hides in database.
The way to find knowledge in data is called “Data mining”. One data mining method is called “Association Rules”. A classical method “Apriori” has use on it for many years, It’s a way to look for Association Rules but it’s really not a good way, Apriori use the Psychology “Guess and filter” so spend many efforts on something no need to care about, so It’s need many main memory and compute time. Someone discovered a smart way to do it. Use a compressed tree to record the pattern with counts , so can in two pass database read to find whole all association rules. The compressed tree we call it “FP-Tree”. It’s a pretty nice method to find association rules. It’s brought a reduced complexity order and lower efforts. But it’s not made by the god. It’s need much memory to construct the FP-Tree. So it can hardly use on very large database, if it need to swap the tree between memory and disk. It’s may be a nightmare. The master idea of FP-Tree is compress transaction database into a limited main memory.
Main memory faster than disk for ten thousand times. So use it can reduce lot’s of efforts on swap and scan.
Now we find an new way with large compress ratio on transfer transaction data into to main memory can stored and can easy to use it to extracts association rules. We named this logarithm “SortCompress”. We use a compress array to keep the transaction association rules and use sort to help the compress process faster. It slower than FP-Tree a little, but it just need half size memory to stored it, so we can process more complexity transaction data use just an normal computer, and do it well. Now we will introduce that how we do it.
Keywords: datamining, association rules, apriori, fp-tree, sortcompress,
目 錄
第一章緒 論………………………………………………………....1
第一節研究背景…………………………….……………………………..1
第二節研究動機…………………………………………………………...3
第三節研究目的…………………………………………………………...5
第二章文獻探討……………………………………………………..6
第一節關聯式法則(Association Rule)探討………………………….…..6
一、關聯式法則的意義及重要性….………………………………….6
二、關聯式法則的定義及相關名詞.………………………………….7
三、關聯式法則的相關應用……….………………………………... .8
第二節Apriori演算法探討….……………………………………………10
一、Apriori演算法….………………………………………………...10
二、Apriori演算法的改良.…………………………………………...11
第三節FP-TREE(Frequent Pattern Tree)..……………………………14
一、FP-TREE演算法...……………………………………………...14
二、FP-TREE的建構方式.……………………..…………………...15
第四節仿真資料(Synthetic Data)說明...…………………………………16
第三章研究方法……………………………………………………19
第一節研究問題………………………………………………………….19
第二節SortCompress方法.…………………….………………………20
第三節仿真資料設計…………………………….………………………23
第四章實驗及結果分析……………………………………………27
第一節測試環境………………………………….………………………27
第二節實驗設計………………………………….………………………27
第三節實驗結果及分析………………………….………………………28
第五章結論與未來研究……………………………………………32
第一節結論……………………………………….………………………32
第二節研究限制與未來研方向………………….………………………36
參考文獻……………………………………………………………..37
附錄一、仿真資料產生程式列表……….…………………………………38
附錄二、Apriori演算法程式……….………………………………………58
附錄三、FP-Tree……….……………………………………………………69
附錄四、SortCompress.………………………………………
參考文獻[1]A. Savasere, E. Omiecinski, and S. Navathe, “An Efficient Algorithm for Mining Association Rules in Large Databases”, In Proceedings of 21st VLDB, pp.432-444,1995[2]D. Lin and Z. M. Kedem, “Pincer-Search: A New Algorithm for Discovering the Maximum Frequent Set”, In Proceedings of VI Intl. Conference on Extending Database Technology, 1998.[3]Eui-Hong Han, George Karypis and Vipin Kumar, “Scalable Parallel Data Mining for Association Rules”, IEEE Transactions on Knowledge and Data Engineering, Vol 12,No.3,May/JUNE 2000[4]H.Toivonen, “Sampling Large Databases for Association Rules”, VLDB, pp.134-145,1996[5]IBM Quest Data Mining Project, “Quest Synthetic Data Generation Code”, http://www.almaden.ibm.com/cs/quest/syndata.html,1996[6]J.S.Park,M.S.Chen,and P.S.Ui,”An Effective Hash Based Algorithm for Mining Association Rules”, Proceedings of the ACM SIGMOD, pp.1750186,1995[7]M.Houtsma and A. Swami, “Set-Oriented Mining of Association Rules in Relational Databases,” 11th int’l Conference on Data Engineer, 1995[8]M.J.Zaki, S. Parthasarathy, M. Ogihara, and W.Li, “New Algorithms for Fast Discovery of Association Rules”, 3rd Int’l Conference on Knowledge Discovery&Data Mining(KDD), Newport,CA, August 1997.[9]M.S. Chen, J.S. Park, and P.S. Yu, “Efficient Data Mining for Path Traversal Patterns”,IEEE Transactions on Knowledge and Data Engineering, Vol. 10, No.2, 1998, pp. 209-220
QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top
無相關期刊
 
系統版面圖檔 系統版面圖檔