跳到主要內容

臺灣博碩士論文加值系統

(216.73.216.13) 您好!臺灣時間:2025/11/24 06:20
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::

詳目顯示

我願授權國圖
: 
twitterline
研究生:董建弘
研究生(外文):Chien-Hung Tung
論文名稱:以XQuery為基礎之漸進式XML關聯規則探勘方法
論文名稱(外文):Incremental XML Association Rules Mining MiningUsing XQuery
指導教授:曾守正曾守正引用關係
指導教授(外文):Frank S.C. Tseng
學位類別:碩士
校院名稱:國立高雄第一科技大學
系所名稱:資訊管理所
學門:電算機學門
學類:電算機一般學類
論文種類:學術論文
論文出版年:2006
畢業學年度:94
語文別:中文
論文頁數:105
中文關鍵詞:資料探勘關聯規則XMLXQuery漸進式方法
外文關鍵詞:Incremental Data MiningXMLXQuery.Association RulesData Mining
相關次數:
  • 被引用被引用:0
  • 點閱點閱:169
  • 評分評分:
  • 下載下載:0
  • 收藏至我的研究室書目清單書目收藏:0
摘 要
XML 已逐漸成為組織間在網際網路上資料交換的標準,在未來將勢必
有大量的資料會以XML 格式儲存著,由此看來,未來資訊的傳遞與交換
將逐漸根植於XML。雖然在關聯式資料庫系統中已經有很成熟的資料探勘
技術,但對於XML 資料的探勘還像是處女地一般。本論文嘗試提出一個
方法,不僅能探勘出XML 文件中的關聯規則,並且可以在XML 文件有新
增刪除修改等異動時,只需針對異動的部份做計算,便可以與前次探勘的
結果整合成新的且可用的項目集資料,供使用者查詢關聯規則,我們稱這
個演算法為-IXARM (Incremental XML Association Rules Mining)。並已經
由實驗證明,本演算法的正確性以及在處理資料異動時效能上的優異表現。
ABSTRACT
XML has already been recognized as a standard for electronic data interchange
over the Internet. We believe a large amount of data will be represented
and stored in XML format in the near future. Therefore, we think it
will be indispensable to develop tools for mining information directly from
XML data. Although many well-known data minig methods have been developed,
they are almost based on relational database formats. In this paper, we
will propose an algorithm, called IXARM (Incremental XML Association
Rules Mining), which does not only extract association rules from XML
documents, but also offer flexible incremental mining tasks. That is, even
when there are many INSERT, DELETE or UPDATE events performed on
source XML documents, our algorithm re-computes the modified part only
and then combines the previous result to build new association rules in an efficient
way. We also have conducted experiment to show the accuracy and
performances are all satisfactory for most of the assocation rule mining applications
on XML documents.
目 錄
摘 要.......................................................................................... i
ABSTRACT ................................................................................. ii
誌 謝........................................................................................ iii
目 錄........................................................................................ iv
表目錄....................................................................................... vii
圖目錄...................................................................................... viii
公式目錄.......................................................................................x
1. 緒論.......................................................................................1
1.1 研究背景.....................................................................1
1.2 研究動機.....................................................................1
1.3 研究目的.....................................................................3
1.4 研究架構.....................................................................3
2. 文獻探討................................................................................4
2.1 關聯規則之探勘..........................................................4
2.1.1 關聯規則的探討..........................................................5
2.1.2 Apriori 演算法.............................................................6
2.1.3 Boolean 演算法..........................................................8
2.1.4 FP-Tree 演算法.........................................................11
2.2 XML 的儲存方式......................................................14
2.2.1 XML 檔案.................................................................14
2.2.2 XML 與關聯式資料庫...............................................15
2.2.3 原生型XML 資料庫..................................................16
2.3 XML 查詢語言..........................................................18
2.3.1 XPath.........................................................................18
2.3.2 Quilt ..........................................................................19
2.3.3 XQuery ......................................................................20
2.3.4 Microsoft ® SQL Server® 2005 XML Enable.............25
2.4 XML 資料的關聯規則探勘.......................................33
v
3. 漸進式探勘方法...................................................................35
3.1 基本概念...................................................................35
3.2 關聯規則整合方法....................................................36
3.2.1 整合多個XML 關聯規則之支持度的計算................36
3.2.2 漸增XML Data 的關聯規則之支持度的計算...........38
3.2.3 刪除XML Data 的關聯規則之支持度的計算...........38
3.2.4 IXARM 演算法.........................................................38
3.3 實作IXARM 演算法.................................................39
3.3.1 定義XML 文件.........................................................39
3.3.2 建立XML 檔案與項目集儲存表格...........................41
3.3.3 設計預存程序............................................................42
3.3.4 SQL Server 2005 XQuery 陳述式的撰寫...................51
4. 演算法的驗證與評估...........................................................55
4.1 演算法驗證...............................................................55
4.1.1 驗證資料...................................................................55
4.1.2 驗證高頻項目集........................................................56
4.1.3 驗證新增XML 文件..................................................57
4.1.4 驗證刪除XML 文件..................................................58
4.2 實驗評估...................................................................58
4.2.1 實驗設計...................................................................59
4.2.2 和非漸進式探勘演算之比較.....................................63
4.2.3 實驗參數對IXARM 演算法之效率評估...................66
4.3 實驗結果總結分析....................................................68
5. 結論與建議..........................................................................72
5.1 研究貢獻...................................................................72
5.2 未來研究...................................................................72
5.2.1 演算法效率的改善....................................................72
5.2.2 平行處理的應用........................................................73
參考文獻.....................................................................................74
附錄一 W3C XML 相關規格文件..........................................77
附錄二 W3C XPath 1.0 相關規格文件...................................78
vi
附錄三 W3C XQuery 1.0 相關規格文件................................79
附錄四 Table .........................................................................84
附錄五 StoredProcedure .........................................................85
vii
表目錄
表1-1 2005 DATA STORAGE FORMATS 調查 (KDNUGGETS)................ 2
表1-2 2006 DATA STORAGE FOR DATA MINING 調查 (KDNUGGETS) 2
表2-1 DATA MINING 技術採用調查 (KDNUGGETS) ............................. 4
表2-2 使用FP-GROWTH 演算法探勘出的高頻項目集........................... 14
表2-3 FILE SYSTEM DB TYPE 的NXD.................................................. 16
表2-4 OBJECT-ORIENTED DB TYPE 的NXD ........................................ 16
表2-5 MODEL-BASED DB TYPE 的NXD ............................................... 17
表2-6 RELATIONAL DB TYPE 的NXD .................................................. 17
表2-7 其他DB TYPE 的NXD................................................................. 17
表2-8 SQL SERVER 2005 的XQUERY 數值函數.................................... 26
表2-9 SQL SERVER 2005 的XQUERY 字串值相關函數........................ 27
表2-10 SQL SERVER 2005 的XQUERY 布林值相關函數....................... 28
表2-11 SQL SERVER 2005 的XQUERY 節點相關函數........................... 28
表2-12 SQL SERVER 2005 的 XQUERY 內容函數................................. 28
表2-13 SQL SERVER 2005 的XQUERY 序列的相關函數....................... 29
表2-14 SQL SERVER 2005 的XQUERY 彙總函數.................................. 30
表2-15 SQL SERVER 2005 的XQUERY 建構函式函數........................... 31
表2-16 SQL SERVER 2005 的XQUERY DATA ACCESSOR 函數........... 31
表2-17 SQL SERVER 2005 的XQUERY 布林建構函式........................... 32
表2-18 SQL SERVER 2005 的XQUERY 擴充函數.................................. 32
表4-1 GENCBA 參數說明........................................................................ 59
表4-2 實驗資料集參數說明.................................................................... 60
表4-3 實驗4 結果................................................................................... 67
viii
圖目錄
圖1-1 W3C 公佈XML 規格文件的歷程..................................................... 1
圖2-1 APRIORI 演算法............................................................................. 6
圖2-2 交易資料庫範例.............................................................................. 7
圖2-3 APRIORI 運算過程.......................................................................... 8
圖2-4 將交易資料庫D 建成TITTC 表.................................................... 9
圖2-5 用L1 組合產生C2 的TITTC............................................................ 9
圖2-6 C2 去除不滿足最小支持度後的L2 表格......................................... 10
圖2-7 L2 組合產生L3 高頻項目集產生過程............................................. 10
圖2-8 以C1 重整交易資料...................................................................... 11
圖2-9 建構FP-TREE............................................................................... 12
圖2-10 找出以A 點為基礎的高頻項目集................................................ 13
圖2-11 找出以B 點為基礎的高頻項目集................................................ 13
圖2-12 找出以D 點為基礎的高頻項目集............................................... 13
圖2-13 找出以C 點為基礎的高頻項目集............................................... 14
圖2-14 XML 與查詢語言的發展 (本研究整理) ....................................... 18
圖2-15 XPATH 元素關係屬性................................................................. 19
圖2-16 W3C XQUERY 相關規格文件版本與公佈日期............................ 23
圖2-17 FLWOR 表示法之資料結構順序.................................................. 24
圖2-18 計算C1 (WAN & DOBBIE) ......................................................... 33
圖2-19 計算高頻項目集 (WAN & DOBBIE)........................................... 34
圖2-20 計算關聯項目 (WAN & DOBBIE) .............................................. 34
圖3-1 交易資料XML 文件架構.............................................................. 40
圖3-2 高頻項目集與關聯規則的XML 文件架構.................................... 41
圖3-3 CREATE TABLE T ........................................................................ 42
圖3-4 SP_BUILDITEMSET 動作流程...................................................... 42
圖3-5 關聯項目集搜尋方法.................................................................... 43
圖3-6 SP_INSERTXMLDOC 動作流程.................................................... 46
圖3-7 SP_DELETEXMLDOC 動作流程................................................... 48
圖3-8 資料UPDATE 的動作流程............................................................ 51
圖3-9 計算C1 (WAN & DOBBIE)........................................................... 52
圖3-10 關聯式資料庫之交易資料........................................................... 53
圖3-11 產生C1 的T-SQL (使用子查詢) ................................................. 53
圖3-12 產生C1 的T-SQL (不使用子查詢) ............................................. 54
圖3-13 計算C1 (IXARM) ....................................................................... 54
圖4-1 邏輯驗證用資料............................................................................ 55
圖4-2 XML 型態的邏輯驗證用資料........................................................ 56
ix
圖4-3 XML DOC 1 的高頻項目集............................................................ 57
圖4-4 新增XML 文件後的高頻項目集................................................... 58
圖4-5 刪除XML 文件後的高頻項目集................................................... 58
圖4-6 資料產生器................................................................................... 60
圖4-7 計算執行時間............................................................................... 61
圖4-8 實驗數據計算與繪圖.................................................................... 62
圖4-9 實驗1-1 結果................................................................................ 63
圖4-10 實驗1-2 結果.............................................................................. 64
圖4-11 實驗2-1 結果.............................................................................. 65
圖4-12 實驗2-2 結果.............................................................................. 65
圖4-13 實驗2-3 結果.............................................................................. 66
圖4-14 實驗3 結果之一.......................................................................... 67
圖4-15 實驗3 結果之二.......................................................................... 67
圖4-16 IXARM 流程修正......................................................................... 69
圖4-17 IXARM 不使用支持度................................................................. 70
圖4-18 IXARM 設定支持度..................................................................... 71
圖4-19 統計經驗法則............................................................................. 71
圖5-1 平行處理的應用............................................................................ 73
x
公式目錄
( )
D
X Y
S X Y

∩ = (2-1)........................................................................... 5
( ) ( )
S(X )
C Y X S X Y ∩
| = (2-2) ......................................................................... 5
( )
S(X )S(Y )
corr S X Y X Y

= , (2-3) .......................................................................... 5
( )
S(Y )
corr C Y X X Y
|
, = (2-4) ............................................................................ 6
Σ=
= ×
n
k
k
k count I
s s count I
1
1 1 ( )
( )
( 3-1) .................................................................... 37
( ) ( )
( ) ( )
1
1 1 1
1
+
+ +
+
× + ×
=
old n
old old n n
new count I count I
s count I s count I
s (3-2) ......................................... 38
( ) ( )
( ) ( ) 1 1
1
old m
old old n m
new count I count I
s count I s count I
s

× − ×
= + (3-3) .......................................... 38
參考文獻
[1] R. Agrawal, and R.Srikant, “Mining Sequential Patterns,” In Procedings of the
11th International Conference on Data Engineering (ICDE), pp. 3-14, 1995.
[2] R. Agrawal, and R. Srikant, “Fast Algorithms for Mining Association Rules,” In
Procedings of the 20th VERY LARGE DATA BASE (VLDB) Conference Santiago,
pp.487-499, 1994.
[3] R. Agrawal, T. Imielinski, and A. Swami, “Mining Association Rules Between
Sets of Items in Large Databases,” In Procedings of the ACM SIGMOD Conference
on Management of Data, pp. 207-216, 1993.
[4] K. Alsabti, S. Ranka, and V. Singh, “An Efficient K-Means Clustering Algorithm,”
PPS/SPDP Workshop on High Performance Data Mining, pp.34-39, 1997.
[5] R. J. Bayardo, “Efficiently Mining Long Patterns from Databases,” In Procedings
of ACM SIGOD Conferences On Management of Data, pp. 85-93, 1998.
[6] C. Borgelt, "Recursion Pruning for the Apriori Algorithm," In Procedings of the
2nd Workshop of Frequent Item Set Mining Implementations (FIMI), Brighton, UK,
2004.
[7] C. Borgelt, "Efficient Implementations of Apriori and Eclat," In Procedings of the
1st Workshop of Frequent Item Set Mining Implementations (FIMI), Melbourne,
FL, USA, 2003.
[8] C. Borgelt and R. Kruse, "Induction of Association Rules: Apriori Implementation,"
In Procedings of the 15th Conference on Computational Statistics, Physica
Verlag, Heidelberg, Germany, 2002.
[9] R. Bourret, C. Bornhovd, A. Buchmann, “A generic load/extract utility for data
transfer between XML documents and relational databases,” Advanced Issues of
E-Commerce and Web-Based Information Systems, Second International Workshop,
pp. 134-143, 2000.
[10] L. Breiman, J. Friedman, R. Olshen, and C. Stone, “Classification of Regression
Trees,” Wadsworth, 1984.
[11] A. B. Chaudhri, A. Rashid, and R. Zicari, “XML Data Management: Native XML
and XML-Enabled Database Systems,” Addison-Wesley, Boston, MA, 2003.
[12] E. Bertino, and B. Catania, “Integrating XML and Database,” IEEE Internet Computing,
Volume: 5 Issue: 4, July-Aug, pp. 84-88, 2001.
[13] D. Chamberlin, J. Robie, and D. Florescu, "Quilt: An XML Query Language for
Heterogeneous Data Sources,” In Proceedings of WebDB 2000 Conference, in
Lecture Notes in Computer Science, Springer-Verlag, 2000.
[14] M. S. Chen, J. Han, and P. S. Yu, “Data Mining: An Overview from a Database
Perspective,” IEEE Transactions on Knowledge and Data Engineering, vol. 8, no.
6, pp. 866-883, 1996.
[15] J. Han, J. Pei, and Y. Yin, “Mining Frequent Patterns without Candidate Generation,”
In Procedings of the ACM SIGMOD Int. Conferences on Management of
Data, pp. 1-12, Dallas, Texas, USA, 2000.
[16] J. Hipp, A. Myka, R. Wirth, and U. Guntzer, “A New Algorithm for Faster Mining
- 75 -
of Generalized Association Rules,” In Proceedings of the 2nd European Symposium
on Principles of Data Mining and Knowledge Discovery (PKDD ’98), pp.
74-82, Nantes, France, September 23-26, 1998.
[17] D. Chamberlin, D. Draper, M. Fernandez, H. Katz, M. Kay, J. Robie, M. Rys, J.
Simeon, J. Tivy and P. Wadler, “XQuery from the Experts: A Guide to the W3C
XML Query Language,” Addison-Wesley, Boston, MA, 2003.
[18] L. Kaufman, and P. J. Rousseeuw, “Finding Groups in Data : An Introduction to
Cluster Analysis,” John Wiley & Sons, 1990.
[19] H. Mannila, and P. Ronkainen, “Similarity of Event Sequences,” Procedings of the
Fourth International Workshop on Temporal Representation and Reasoning
(TIME’97) , pp. 136-139, 1997.
[20] S. Natu and J. Mendonca, “Digital asset management using a native XML database
implementation,” In Procedings of the 4th Conferences on Information Technology
Curriculum, Indiana, USA, pp. 237-241, Oct. 2003.
[21] R. Ng, and J. Han,”Efficient and Effective Clustering Method for Spatial Data
Mining,” In Procedings of International Conferences Very Large Data Bases, pp.
144-155, 1994.
[22] J. R. Quilan, “Induction of decision trees,” Machine Learning, pp. 81-106, 1986.
[23] J. R. Quilan, “C4.5 : Programs for Machine Learning,” Morgan Kaufmann, 1993.
[24] Savasere, E.Omiecinski, and S.Navathe, “An Efficient Algorithm for Mining Association
Rules in Large Databases,” In Procedings of the 21st VERY LARGE
DATA BASE (VLDB), pp. 432-444, 1995.
[25] J. Shanmugasundaram, K. Tufte, G. He, C. Zhang, D. De-Witt, and J. Naughton,
“Relational databases for querying XML documents: limitations and portunities,”
In Proceedings of VERY LARGE DATA BASE (VLDB), Edinburgh, UK, pp.
302-314, September 1999 .
[26] R. Srikant, and R. Agrawal, “Mining Generalized Association Rules,” In Procedings
of the 21st International Conference on VERY LARGE DATA BASE (VLDB),
pp. 407-419, 1995.
[27] Toivonen, “Sampling Large Databases for Association Rules,” In Procedings of
the 23st VERY LARGE DATA BASE (VLDB), pp. 134-145, 1996.
[28] J. W.W. Wan, and G. Dobbie, “Mining Association Rules XML Data using
XQuery,” In Procedings of Australasian Workshop on Data Mining and Web Intelligence
(DMWI2004), Dunedin, New Zealand. CRPIT, Vol. 32. Purvis, M., Ed.
ACS. pp. 169-174, 2004.
[29] J. W.W. Wan, and G. Dobbie, “Extracting association rules from XML documents
using XQuery,” In Proceedings of the Fifth International Workshop on Web Information
and Data Management, New Orleans, LA, USA, pp. 94-97, 2002.
[30] K. Wang, L. Tang, J. Han, and J. Liu, “Top down FP-Growth for Association Rule
Mining,” In Procedings of the 6th Pacific Area Conference on Knowledge Discovery
and Data Mining (PAKDD2002), pp. 334-340, May 6-8, Taipei, Taiwan, 2002.
[31] S. Y. Wur and Y. Leu,"An effective Boolean Algorithm for Mining Association
Rules in Large Databases,” In Proceedings of the Sixth International Conference
on Database Systems for Advanced Applications (DASFAA 99), pp. 179-186, 1998.
- 76 -
[32] M. J. Zaki, “Scalable Algorithms for Association Mining,” IEEE Transactions On
Knowledge and Data Engineering, pp. 372-390, 2000.
[33] M. J. Zaki, S. Parthasarathy, M.Ogihara, and W.Li, “New Algorithms for Fast
Discovery of Association Rules,” In Proceedings of the 3rd International Conference
On Knowledge Discovery & Data Mining (KDD), pp. 283-296, 1997.
[34] “Standard Generalized Markup Language (SGML) ,” ISO 8879:1986.
[35] Bourret, XML and Databases,
http://www.rpbourret.com/XML/XMLAndDatabases.htm, 2006.
[36] KDnuggets, Polls : Data Mining Methods,
http://www.kdnuggets.com/polls/2006/data_mining_methods.htm, Apr 2006.
[37] KDnuggets, Polls : Data Mining Techniques,
http://www.kdnuggets.com/polls/2005/data_mining_techniques.htm, Feb 2005.
[38] KDnuggets, Polls : Data Storage for Data Mining,
http://www.kdnuggets.com/polls/2006/data_storage_for_data_mining.htm, June
2006.
[39] KDnuggets, Polls : Data Storage Formats,
http://www.kdnuggets.com/polls/2005/data_storage_formats.htm, June 2005.
[40] World Wide Web Consortium, Extensible Markup Language (XML) ,
http://www.w3.org/XML/.
[41] World Wide Web Consortium, XML Query (XQuery) ,
http://www.w3.org/XML/Query/.
[42] World Wide Web Consortium, XQuery 1.0: An XML Query Language,
http://www.w3.org/TR/xquery/.
[43] X-Hive/DB. http://www.x-hive.com.
[44] XML 台北資訊網 http://www.xml.org.tw/Default.asp
QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top