臺灣博碩士論文加值系統

English |FB 專頁 |Mobile

免費會員登入| 註冊

功能切換導覽列

(216.73.216.13) 您好！臺灣時間：2025/11/24 06:20

字體大小：

:::

詳目顯示

第 1 筆 / 共 1 筆

/1頁

論文基本資料
摘要
外文摘要
目次
參考文獻
紙本論文
QR Code

本論文永久網址:

研究生:

董建弘

研究生(外文):

Chien-Hung Tung

論文名稱:

以XQuery為基礎之漸進式XML關聯規則探勘方法

論文名稱(外文):

Incremental XML Association Rules Mining MiningUsing XQuery

指導教授:

曾守正

指導教授(外文):

Frank S.C. Tseng

學位類別:

碩士

校院名稱:

國立高雄第一科技大學

系所名稱:

資訊管理所

學門:

電算機學門

學類:

電算機一般學類

論文種類:

學術論文

論文出版年:

2006

畢業學年度:

語文別:

中文

論文頁數:

105

中文關鍵詞:

資料探勘、關聯規則、XML、XQuery、漸進式方法

外文關鍵詞:

Incremental Data Mining、XML、XQuery.、Association Rules、Data Mining

相關次數:

被引用:0
點閱:169
評分:
下載:0
書目收藏:0

摘要
XML 已逐漸成為組織間在網際網路上資料交換的標準，在未來將勢必
有大量的資料會以XML 格式儲存著，由此看來，未來資訊的傳遞與交換
將逐漸根植於XML。雖然在關聯式資料庫系統中已經有很成熟的資料探勘
技術，但對於XML 資料的探勘還像是處女地一般。本論文嘗試提出一個
方法，不僅能探勘出XML 文件中的關聯規則，並且可以在XML 文件有新
增刪除修改等異動時，只需針對異動的部份做計算，便可以與前次探勘的
結果整合成新的且可用的項目集資料，供使用者查詢關聯規則，我們稱這
個演算法為－IXARM (Incremental XML Association Rules Mining)。並已經
由實驗證明，本演算法的正確性以及在處理資料異動時效能上的優異表現。

ABSTRACT
XML has already been recognized as a standard for electronic data interchange
over the Internet. We believe a large amount of data will be represented
and stored in XML format in the near future. Therefore, we think it
will be indispensable to develop tools for mining information directly from
XML data. Although many well-known data minig methods have been developed,
they are almost based on relational database formats. In this paper, we
will propose an algorithm, called IXARM (Incremental XML Association
Rules Mining), which does not only extract association rules from XML
documents, but also offer flexible incremental mining tasks. That is, even
when there are many INSERT, DELETE or UPDATE events performed on
source XML documents, our algorithm re-computes the modified part only
and then combines the previous result to build new association rules in an efficient
way. We also have conducted experiment to show the accuracy and
performances are all satisfactory for most of the assocation rule mining applications
on XML documents.

目錄
摘要.......................................................................................... i
ABSTRACT ................................................................................. ii
誌謝........................................................................................ iii
目錄........................................................................................ iv
表目錄....................................................................................... vii
圖目錄...................................................................................... viii
公式目錄.......................................................................................x
1. 緒論.......................................................................................1
1.1 研究背景.....................................................................1
1.2 研究動機.....................................................................1
1.3 研究目的.....................................................................3
1.4 研究架構.....................................................................3
2. 文獻探討................................................................................4
2.1 關聯規則之探勘..........................................................4
2.1.1 關聯規則的探討..........................................................5
2.1.2 Apriori 演算法.............................................................6
2.1.3 Boolean 演算法..........................................................8
2.1.4 FP-Tree 演算法.........................................................11
2.2 XML 的儲存方式......................................................14
2.2.1 XML 檔案.................................................................14
2.2.2 XML 與關聯式資料庫...............................................15
2.2.3 原生型XML 資料庫..................................................16
2.3 XML 查詢語言..........................................................18
2.3.1 XPath.........................................................................18
2.3.2 Quilt ..........................................................................19
2.3.3 XQuery ......................................................................20
2.3.4 Microsoft ® SQL Server® 2005 XML Enable.............25
2.4 XML 資料的關聯規則探勘.......................................33
v
3. 漸進式探勘方法...................................................................35
3.1 基本概念...................................................................35
3.2 關聯規則整合方法....................................................36
3.2.1 整合多個XML 關聯規則之支持度的計算................36
3.2.2 漸增XML Data 的關聯規則之支持度的計算...........38
3.2.3 刪除XML Data 的關聯規則之支持度的計算...........38
3.2.4 IXARM 演算法.........................................................38
3.3 實作IXARM 演算法.................................................39
3.3.1 定義XML 文件.........................................................39
3.3.2 建立XML 檔案與項目集儲存表格...........................41
3.3.3 設計預存程序............................................................42
3.3.4 SQL Server 2005 XQuery 陳述式的撰寫...................51
4. 演算法的驗證與評估...........................................................55
4.1 演算法驗證...............................................................55
4.1.1 驗證資料...................................................................55
4.1.2 驗證高頻項目集........................................................56
4.1.3 驗證新增XML 文件..................................................57
4.1.4 驗證刪除XML 文件..................................................58
4.2 實驗評估...................................................................58
4.2.1 實驗設計...................................................................59
4.2.2 和非漸進式探勘演算之比較.....................................63
4.2.3 實驗參數對IXARM 演算法之效率評估...................66
4.3 實驗結果總結分析....................................................68
5. 結論與建議..........................................................................72
5.1 研究貢獻...................................................................72
5.2 未來研究...................................................................72
5.2.1 演算法效率的改善....................................................72
5.2.2 平行處理的應用........................................................73
參考文獻.....................................................................................74
附錄一 W3C XML 相關規格文件..........................................77
附錄二 W3C XPath 1.0 相關規格文件...................................78
vi
附錄三 W3C XQuery 1.0 相關規格文件................................79
附錄四 Table .........................................................................84
附錄五 StoredProcedure .........................................................85
vii
表目錄
表1-1 2005 DATA STORAGE FORMATS 調查 (KDNUGGETS)................ 2
表1-2 2006 DATA STORAGE FOR DATA MINING 調查 (KDNUGGETS) 2
表2-1 DATA MINING 技術採用調查 (KDNUGGETS) ............................. 4
表2-2 使用FP-GROWTH 演算法探勘出的高頻項目集........................... 14
表2-3 FILE SYSTEM DB TYPE 的NXD.................................................. 16
表2-4 OBJECT-ORIENTED DB TYPE 的NXD ........................................ 16
表2-5 MODEL-BASED DB TYPE 的NXD ............................................... 17
表2-6 RELATIONAL DB TYPE 的NXD .................................................. 17
表2-7 其他DB TYPE 的NXD................................................................. 17
表2-8 SQL SERVER 2005 的XQUERY 數值函數.................................... 26
表2-9 SQL SERVER 2005 的XQUERY 字串值相關函數........................ 27
表2-10 SQL SERVER 2005 的XQUERY 布林值相關函數....................... 28
表2-11 SQL SERVER 2005 的XQUERY 節點相關函數........................... 28
表2-12 SQL SERVER 2005 的 XQUERY 內容函數................................. 28
表2-13 SQL SERVER 2005 的XQUERY 序列的相關函數....................... 29
表2-14 SQL SERVER 2005 的XQUERY 彙總函數.................................. 30
表2-15 SQL SERVER 2005 的XQUERY 建構函式函數........................... 31
表2-16 SQL SERVER 2005 的XQUERY DATA ACCESSOR 函數........... 31
表2-17 SQL SERVER 2005 的XQUERY 布林建構函式........................... 32
表2-18 SQL SERVER 2005 的XQUERY 擴充函數.................................. 32
表4-1 GENCBA 參數說明........................................................................ 59
表4-2 實驗資料集參數說明.................................................................... 60
表4-3 實驗4 結果................................................................................... 67
viii
圖目錄
圖1-1 W3C 公佈XML 規格文件的歷程..................................................... 1
圖2-1 APRIORI 演算法............................................................................. 6
圖2-2 交易資料庫範例.............................................................................. 7
圖2-3 APRIORI 運算過程.......................................................................... 8
圖2-4 將交易資料庫D 建成TITTC 表.................................................... 9
圖2-5 用L1 組合產生C2 的TITTC............................................................ 9
圖2-6 C2 去除不滿足最小支持度後的L2 表格......................................... 10
圖2-7 L2 組合產生L3 高頻項目集產生過程............................................. 10
圖2-8 以C1 重整交易資料...................................................................... 11
圖2-9 建構FP-TREE............................................................................... 12
圖2-10 找出以A 點為基礎的高頻項目集................................................ 13
圖2-11 找出以B 點為基礎的高頻項目集................................................ 13
圖2-12 找出以D 點為基礎的高頻項目集............................................... 13
圖2-13 找出以C 點為基礎的高頻項目集............................................... 14
圖2-14 XML 與查詢語言的發展 (本研究整理) ....................................... 18
圖2-15 XPATH 元素關係屬性................................................................. 19
圖2-16 W3C XQUERY 相關規格文件版本與公佈日期............................ 23
圖2-17 FLWOR 表示法之資料結構順序.................................................. 24
圖2-18 計算C1 (WAN & DOBBIE) ......................................................... 33
圖2-19 計算高頻項目集 (WAN & DOBBIE)........................................... 34
圖2-20 計算關聯項目 (WAN & DOBBIE) .............................................. 34
圖3-1 交易資料XML 文件架構.............................................................. 40
圖3-2 高頻項目集與關聯規則的XML 文件架構.................................... 41
圖3-3 CREATE TABLE T ........................................................................ 42
圖3-4 SP_BUILDITEMSET 動作流程...................................................... 42
圖3-5 關聯項目集搜尋方法.................................................................... 43
圖3-6 SP_INSERTXMLDOC 動作流程.................................................... 46
圖3-7 SP_DELETEXMLDOC 動作流程................................................... 48
圖3-8 資料UPDATE 的動作流程............................................................ 51
圖3-9 計算C1 (WAN & DOBBIE)........................................................... 52
圖3-10 關聯式資料庫之交易資料........................................................... 53
圖3-11 產生C1 的T-SQL (使用子查詢) ................................................. 53
圖3-12 產生C1 的T-SQL (不使用子查詢) ............................................. 54
圖3-13 計算C1 (IXARM) ....................................................................... 54
圖4-1 邏輯驗證用資料............................................................................ 55
圖4-2 XML 型態的邏輯驗證用資料........................................................ 56
ix
圖4-3 XML DOC 1 的高頻項目集............................................................ 57
圖4-4 新增XML 文件後的高頻項目集................................................... 58
圖4-5 刪除XML 文件後的高頻項目集................................................... 58
圖4-6 資料產生器................................................................................... 60
圖4-7 計算執行時間............................................................................... 61
圖4-8 實驗數據計算與繪圖.................................................................... 62
圖4-9 實驗1-1 結果................................................................................ 63
圖4-10 實驗1-2 結果.............................................................................. 64
圖4-11 實驗2-1 結果.............................................................................. 65
圖4-12 實驗2-2 結果.............................................................................. 65
圖4-13 實驗2-3 結果.............................................................................. 66
圖4-14 實驗3 結果之一.......................................................................... 67
圖4-15 實驗3 結果之二.......................................................................... 67
圖4-16 IXARM 流程修正......................................................................... 69
圖4-17 IXARM 不使用支持度................................................................. 70
圖4-18 IXARM 設定支持度..................................................................... 71
圖4-19 統計經驗法則............................................................................. 71
圖5-1 平行處理的應用............................................................................ 73
x
公式目錄
( )
D
X Y
S X Y
∩
∩ = (2-1)........................................................................... 5
( ) ( )
S(X )
C Y X S X Y ∩
| = (2-2) ......................................................................... 5
( )
S(X )S(Y )
corr S X Y X Y
∩
= , (2-3) .......................................................................... 5
( )
S(Y )
corr C Y X X Y
|
, = (2-4) ............................................................................ 6
Σ=
= ×
n
k
k
k count I
s s count I
1
1 1 ( )
( )
( 3-1) .................................................................... 37
( ) ( )
( ) ( )
1
1 1 1
1
+
+ +
+
× + ×
=
old n
old old n n
new count I count I
s count I s count I
s (3-2) ......................................... 38
( ) ( )
( ) ( ) 1 1
1
old m
old old n m
new count I count I
s count I s count I
s
−
× − ×
= + (3-3) .......................................... 38

參考文獻
[1] R. Agrawal, and R.Srikant, “Mining Sequential Patterns,” In Procedings of the
11th International Conference on Data Engineering (ICDE), pp. 3-14, 1995.
[2] R. Agrawal, and R. Srikant, “Fast Algorithms for Mining Association Rules,” In
Procedings of the 20th VERY LARGE DATA BASE (VLDB) Conference Santiago,
pp.487-499, 1994.
[3] R. Agrawal, T. Imielinski, and A. Swami, “Mining Association Rules Between
Sets of Items in Large Databases,” In Procedings of the ACM SIGMOD Conference
on Management of Data, pp. 207-216, 1993.
[4] K. Alsabti, S. Ranka, and V. Singh, “An Efficient K-Means Clustering Algorithm,”
PPS/SPDP Workshop on High Performance Data Mining, pp.34-39, 1997.
[5] R. J. Bayardo, “Efficiently Mining Long Patterns from Databases,” In Procedings
of ACM SIGOD Conferences On Management of Data, pp. 85-93, 1998.
[6] C. Borgelt, "Recursion Pruning for the Apriori Algorithm," In Procedings of the
2nd Workshop of Frequent Item Set Mining Implementations (FIMI), Brighton, UK,
2004.
[7] C. Borgelt, "Efficient Implementations of Apriori and Eclat," In Procedings of the
1st Workshop of Frequent Item Set Mining Implementations (FIMI), Melbourne,
FL, USA, 2003.
[8] C. Borgelt and R. Kruse, "Induction of Association Rules: Apriori Implementation,"
In Procedings of the 15th Conference on Computational Statistics, Physica
Verlag, Heidelberg, Germany, 2002.
[9] R. Bourret, C. Bornhovd, A. Buchmann, “A generic load/extract utility for data
transfer between XML documents and relational databases,” Advanced Issues of
E-Commerce and Web-Based Information Systems, Second International Workshop,
pp. 134-143, 2000.
[10] L. Breiman, J. Friedman, R. Olshen, and C. Stone, “Classification of Regression
Trees,” Wadsworth, 1984.
[11] A. B. Chaudhri, A. Rashid, and R. Zicari, “XML Data Management: Native XML
and XML-Enabled Database Systems,” Addison-Wesley, Boston, MA, 2003.
[12] E. Bertino, and B. Catania, “Integrating XML and Database,” IEEE Internet Computing,
Volume: 5 Issue: 4, July-Aug, pp. 84-88, 2001.
[13] D. Chamberlin, J. Robie, and D. Florescu, "Quilt: An XML Query Language for
Heterogeneous Data Sources,” In Proceedings of WebDB 2000 Conference, in
Lecture Notes in Computer Science, Springer-Verlag, 2000.
[14] M. S. Chen, J. Han, and P. S. Yu, “Data Mining: An Overview from a Database
Perspective,” IEEE Transactions on Knowledge and Data Engineering, vol. 8, no.
6, pp. 866-883, 1996.
[15] J. Han, J. Pei, and Y. Yin, “Mining Frequent Patterns without Candidate Generation,”
In Procedings of the ACM SIGMOD Int. Conferences on Management of
Data, pp. 1-12, Dallas, Texas, USA, 2000.
[16] J. Hipp, A. Myka, R. Wirth, and U. Guntzer, “A New Algorithm for Faster Mining
- 75 -
of Generalized Association Rules,” In Proceedings of the 2nd European Symposium
on Principles of Data Mining and Knowledge Discovery (PKDD ’98), pp.
74-82, Nantes, France, September 23-26, 1998.
[17] D. Chamberlin, D. Draper, M. Fernandez, H. Katz, M. Kay, J. Robie, M. Rys, J.
Simeon, J. Tivy and P. Wadler, “XQuery from the Experts: A Guide to the W3C
XML Query Language,” Addison-Wesley, Boston, MA, 2003.
[18] L. Kaufman, and P. J. Rousseeuw, “Finding Groups in Data : An Introduction to
Cluster Analysis,” John Wiley & Sons, 1990.
[19] H. Mannila, and P. Ronkainen, “Similarity of Event Sequences,” Procedings of the
Fourth International Workshop on Temporal Representation and Reasoning
(TIME’97) , pp. 136-139, 1997.
[20] S. Natu and J. Mendonca, “Digital asset management using a native XML database
implementation,” In Procedings of the 4th Conferences on Information Technology
Curriculum, Indiana, USA, pp. 237-241, Oct. 2003.
[21] R. Ng, and J. Han,”Efficient and Effective Clustering Method for Spatial Data
Mining,” In Procedings of International Conferences Very Large Data Bases, pp.
144-155, 1994.
[22] J. R. Quilan, “Induction of decision trees,” Machine Learning, pp. 81-106, 1986.
[23] J. R. Quilan, “C4.5 : Programs for Machine Learning,” Morgan Kaufmann, 1993.
[24] Savasere, E.Omiecinski, and S.Navathe, “An Efficient Algorithm for Mining Association
Rules in Large Databases,” In Procedings of the 21st VERY LARGE
DATA BASE (VLDB), pp. 432-444, 1995.
[25] J. Shanmugasundaram, K. Tufte, G. He, C. Zhang, D. De-Witt, and J. Naughton,
“Relational databases for querying XML documents: limitations and portunities,”
In Proceedings of VERY LARGE DATA BASE (VLDB), Edinburgh, UK, pp.
302-314, September 1999 .
[26] R. Srikant, and R. Agrawal, “Mining Generalized Association Rules,” In Procedings
of the 21st International Conference on VERY LARGE DATA BASE (VLDB),
pp. 407-419, 1995.
[27] Toivonen, “Sampling Large Databases for Association Rules,” In Procedings of
the 23st VERY LARGE DATA BASE (VLDB), pp. 134-145, 1996.
[28] J. W.W. Wan, and G. Dobbie, “Mining Association Rules XML Data using
XQuery,” In Procedings of Australasian Workshop on Data Mining and Web Intelligence
(DMWI2004), Dunedin, New Zealand. CRPIT, Vol. 32. Purvis, M., Ed.
ACS. pp. 169-174, 2004.
[29] J. W.W. Wan, and G. Dobbie, “Extracting association rules from XML documents
using XQuery,” In Proceedings of the Fifth International Workshop on Web Information
and Data Management, New Orleans, LA, USA, pp. 94-97, 2002.
[30] K. Wang, L. Tang, J. Han, and J. Liu, “Top down FP-Growth for Association Rule
Mining,” In Procedings of the 6th Pacific Area Conference on Knowledge Discovery
and Data Mining (PAKDD2002), pp. 334-340, May 6-8, Taipei, Taiwan, 2002.
[31] S. Y. Wur and Y. Leu,"An effective Boolean Algorithm for Mining Association
Rules in Large Databases,” In Proceedings of the Sixth International Conference
on Database Systems for Advanced Applications (DASFAA 99), pp. 179-186, 1998.
- 76 -
[32] M. J. Zaki, “Scalable Algorithms for Association Mining,” IEEE Transactions On
Knowledge and Data Engineering, pp. 372-390, 2000.
[33] M. J. Zaki, S. Parthasarathy, M.Ogihara, and W.Li, “New Algorithms for Fast
Discovery of Association Rules,” In Proceedings of the 3rd International Conference
On Knowledge Discovery & Data Mining (KDD), pp. 283-296, 1997.
[34] “Standard Generalized Markup Language (SGML) ,” ISO 8879:1986.
[35] Bourret, XML and Databases,
http://www.rpbourret.com/XML/XMLAndDatabases.htm, 2006.
[36] KDnuggets, Polls : Data Mining Methods,
http://www.kdnuggets.com/polls/2006/data_mining_methods.htm, Apr 2006.
[37] KDnuggets, Polls : Data Mining Techniques,
http://www.kdnuggets.com/polls/2005/data_mining_techniques.htm, Feb 2005.
[38] KDnuggets, Polls : Data Storage for Data Mining,
http://www.kdnuggets.com/polls/2006/data_storage_for_data_mining.htm, June
2006.
[39] KDnuggets, Polls : Data Storage Formats,
http://www.kdnuggets.com/polls/2005/data_storage_formats.htm, June 2005.
[40] World Wide Web Consortium, Extensible Markup Language (XML) ,
http://www.w3.org/XML/.
[41] World Wide Web Consortium, XML Query (XQuery) ,
http://www.w3.org/XML/Query/.
[42] World Wide Web Consortium, XQuery 1.0: An XML Query Language,
http://www.w3.org/TR/xquery/.
[43] X-Hive/DB. http://www.x-hive.com.
[44] XML 台北資訊網 http://www.xml.org.tw/Default.asp

國圖紙本論文

推文
網路書籤
推薦
評分
引用網址
轉寄

top

相關論文
相關期刊
熱門點閱論文

1.	運用類神經網路與資料探勘技術於網路教學課程推薦之研究
2.	圖書館個人化館藏推薦系統
3.	資料採礦技術在病例與藥品關連性之研究
4.	智慧型代理人於電子商務之整合與應用
5.	資料採礦之應用研究─台灣地區漁市場行情資料庫之關聯法則分析
6.	醫院門診資料探勘─以虎尾若瑟醫院為例
7.	中醫院揀藥儲位規劃之研究
8.	利用資料挖掘技術提供網際網路使用者個人化服務
9.	模糊邏輯與資料探勘技術為基礎在顧客關係管理上之研究與應用
10.	應用資料挖掘於目標行銷之研究
11.	應用人工智慧技術於人類慢性疾病管理
12.	應用資料探勘技術於數位圖書館之個人化服務及管理
13.	建立一個Web-based資料挖掘系統提供藥物交互作用資訊查詢
14.	應用資料探勘於顧客的行為分析-以半導體業為例
15.	提昇資料探勘效率於企業行銷之應用

1.	【1】徐辛欽、周淑美，臺灣臭氧量與紫外線(UVB)之分析研究，氣象學報，VOL.41 (2), （1995）
2.	【3】馮欽賜、楊明仁，紫外線指數的分析與預報，氣象學報，VOL.43(1), （1999）
3.	【2】柳中明、劉銘龍、李涵慧，我國紫外線指數監測工作之推動，環境工程會刊，VOL.9(2), （1998）

1.	一般化多重調整直方圖之無失真資料嵌入法
2.	以知識為基礎的維護管理專家系統建構之研究-以供電系統為例
3.	以動機歷程整合模式探討虛擬社群知識分享行為之研究
4.	應用本體論建構短期授信專家系統
5.	建構一教育行政法規支援系統
6.	任務特性、知識管理流程與資訊科技應用配適對知識管理績效之影響
7.	視覺化與語音互動式數位學習語言發音輔助評分系統
8.	台灣國際觀光旅館業旅館資產管理系統成功模式之研究
9.	適地性服務在智慧型行動電話介面設計之研究
10.	人機互動設計於行動學習系統建構之研究
11.	以公平及價值觀點探討供應鏈管理系統滿意度
12.	以智慧型代理人支援顧客知識管理之研究
13.	企業實施協同商務之相關要素與特性
14.	以嵌入式Linux/Qtopia實現行動裝置上之多媒體串流技術
15.	IT資源類型對SCM系統使用及其供應鏈組織績效之影響-以資源基礎理論為基礎

簡易查詢 | 進階查詢 | 熱門排行 | 我的研究室