跳到主要內容

臺灣博碩士論文加值系統

(44.200.168.16) 您好!臺灣時間:2023/04/02 02:30
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::

詳目顯示

: 
twitterline
研究生:黃人傑
研究生(外文):Jen-Chieh Huang
論文名稱:一個為串流資料查詢系統設計之XML路徑選擇性估計方法
論文名稱(外文):An Efficient Method for Estimating XML Path Selectivity in Stream Query Systems
指導教授:陳良弼陳良弼引用關係
指導教授(外文):Arbee L. P. Chen
學位類別:碩士
校院名稱:國立清華大學
系所名稱:資訊工程學系
學門:工程學門
學類:電資工程學類
論文種類:學術論文
論文出版年:2004
畢業學年度:92
語文別:英文
論文頁數:54
中文關鍵詞:XML選擇性XPath
外文關鍵詞:XMLpath selectivityXPath
相關次數:
  • 被引用被引用:0
  • 點閱點閱:138
  • 評分評分:
  • 下載下載:5
  • 收藏至我的研究室書目清單書目收藏:0
隨著串流資料的日益重要,管理此類之龐大而不可預測之資料成為一急須之問題。許多串流資料管理系統 (DSMS) 被提出來解決不同的環境與資料型態之問題。其中一纇即為針對 XML (eXtensible Markup Language) 串流資料。XML 被廣泛用在資訊交換,與描述性之語言。因此,加速 XML 串流的查詢處理為一重要之課題。
在各種查詢處理的最佳化技巧中,選擇性 (selectivity) 的估計為一個重要的傳統方法。在新的串流環境下,決定一個條件敘述的選擇性也是一個重要的最佳化指標,尤其是在有結合 (join) 運算子的情形下。在 XML 串流處理中,對於路徑敘述 (path expressions) 的選擇性預估問題是更加複雜的。因此,我們提出了一個基於資料摘要 (summarization) 的新方法來估計路徑敘述的選擇性。我們所提出的方法是由路徑樹(path tree)結構與馬可夫模型(Markov model)所組成。我們也提出了一個基於選擇性估計的放寬查詢的技術以避免沒有答案被找出。
同時我們也在實驗部分對於不同型態的資料與查詢驗證我們方法的能力,並且考慮不同大小的記憶體所造成的影響。也分析了不同的刪除節點之方式所造成的影響。
Abstract
As the increasing need for streaming applications, management of these unbounded
and unpredicted data becomes an urgent issue. Many Data Stream Management
Systems (DSMSs) are proposed to accommodate to different streaming environments
and data types. One kind of these DSMSs focuses on the XML data type. XML is
widely used in many kinds of applications such as information exchange and web
services. Therefore, it is an important problem to efficiently process XML queries
over data streams.
Among various optimization techniques of query processing, selectivity estimation is
important in the traditional database researches. In the streaming environment, to
determine the selectivity of some condition clauses is also important for query
optimization, especially the case of join operation. In the XML streams, the estimation
of path selectivity is more complex and difficult. Therefore, a novel selectivity
estimation approach based on data summarization is proposed for XML path
expressions over streaming data. This approach is combined with both path tree
approach and Markov-based approach. The newly arrived data are processed and
recorded in the summary tree, and the unimportant data are archived in a preference
matrix. Based on the summary structures, the selectivity of the query paths can be
efficiently estimated. We also propose a prototype of query relaxation system for
XML path queries, which is based on the proposed selectivity estimation method. This
system is used to automatically relax the user query such that the number of query
results will reach the user requirement.
In the experiment results, we show that the selectivity of various path queries can be
efficiently estimated from different kinds of data. The performance under different
memory limits and the node deletion policies are also discussed.
Keywords: XML, XPath, Stream data, selectivity estimation, summarization
Table of Contents
Abstract III
Table of Contents V
List of Figures VI
1. Introduction 1
2. The Overview 6
2.1 The Environment and the Query Processor 6
2.2 Query Optimization Using Path Selectivity Estimation 8
3. Selectivity Estimation Using Summarization 11
3.1 Construction of the Summary Tree 11
3.2 Deletion of Nodes and Expiration of Data 18
3.3 Estimate the Selectivity Using the Summary Structure 27
4. Query Relaxation Using Selectivity Estimation 30
4.1 Motivation 30
4.2 the Relaxation Model 36
4.3 System: an Overview 39
4.4 The Proposed Relaxation Method 40
5. Performance Evaluation 43
5.1 Data Generation 43
5.2 Query Generation 44
5.3 Experiment Environment 45
5.3.1 The Experiment Setup 45
5.3.2 The Measures 45
5.4 Results and Discussion 46
5.4.1 Deletion Policies: D-Val vs. Leaf 46
5.4.2 Relevance of Data vs. Nodes in the Summary Tree 49
6. Conclusion 51
7. References 52

List of Figures
Figure 1 a Sample SQL Query 2
Figure 2 a Sample Query with Window Parameters 3
Figure 3 a Sample XQuery with Window Parameters 4
Figure 4 the Example Stream Proxy 6
Figure 5 the Sample Query Processor 7
Figure 6 the Initial Label Table 12
Figure 7 the Sample Path Expressions 13
Figure 8 an Incoming XML Data Tree 14
Figure 9 the Summarization Process at Step One with Parsing Stack 14
Figure 10 the Summarization Process at Step Two with Parsing Stack 15
Figure 11 the Partial Summary Tree with Parsing Stack 15
Figure 12 Another Partial Summary Tree 17
Figure 13 Yet Another Summary Tree 17
Figure 14 the Completed Summary Tree 18
Figure 15 Example of the Order-1 Markov Property 19
Figure 16 the Example of Order-2 Markov Property 20
Figure 17 A Sample Summary Tree to Be Trimmed 21
Figure 18 A Partial Preference Matrix 21
Figure 19 Another Partial Preference Matrix 23
Figure 20 the Partially Deleted Summary Tree 23
Figure 21 the Complete Preference Matrix 24
Figure 22 the Summary Tree Trimmed 24
Figure 23 the D-Value Formula 25
Figure 24 Estimation Function of Order-1 27
Figure 25 the Computation of Dead 27
Figure 26 the Algorithm for Simple Path Estimation 29
Figure 27 a Sample XML Schema 32
Figure 28 a Sample XML Document 33
Figure 29 a VLDB Example 34
Figure 30 a QAC-framework Query 37
Figure 31 a Sample Query to Be Relaxed 38
Figure 32 the Query Relaxation Framework 39
Figure 33 the Relaxation Result 41
Figure 34 the Complete Summary Tree 41
Figure 35 the Relaxed Query 42
Figure 36 the Error Function 45
Figure 37 the EC-Ratio Function 46
Figure 38 Window Size vs. Error with Deletion Ratio 0.2 46
Figure 39 Window Size vs. Error with Deletion Ratio 0.5 47
Figure 40 EC-Ratio vs. Window Size with Node Deletion Ratio 0.2 47
Figure 41 EC-Ratio vs. Window Size with Node Deletion Ratio 0.5 48
Figure 42 Error vs. Node Deletion Ratio of Path Queries with Different Data 49
Figure 43 Error vs. Node Deletion Ratio of Path Queries with Different Data 50
[1] Mehmet Altinel and Michael J. Franklin, “Efficient Filtering of XML Documents for Selective Dissemination of Information,” Proceedings of the 26th Very Large Data Bases Conference (VLDB), 2000.
[2] Yanlei Diao, Peter Fischer, Michael J. Franklin and Raymond To, “YFilter: Efficient and Scalable Filtering of XML Documents,” Proceedings of the 18th International Conference on Data Engineering (ICDE), 2002.
[3] Chee-Yong Chan, Pascal Felber, Minos Garofalakis and Rajeev Rastoji, “Efficient Filtering of XML Documents with XPath Expressions,” Proceedings of the 18th International Conference on Data Engineering (ICDE), 2002.
[4] Chee-Yong Chan, Minos Garofalakis and Rajeev Rastoji, “”RE-Tree: An Efficient Index Structure for Regular Expressions,” Proceedings of the 28th Very Large Data Bases Conference (VLDB), 2002.
[5] Madoka Yuriyama and Hiroaki Nakamura, “Filtering Contents by Efficient Evaluation of XPath Expressions,” Proceedings of the 2003 Symposium on Applications and the Internet (SAINT ‘03).
[6] Feng Peng and Sudarshan S. Chawathe, "XPath Queries on Streaming Data," Proceedings of Special Interests Groups Management of Data Conference (SIGMOD) 2003.
[7] Ashish Kumar Gupta and Dan Suciu, "Stream processing of XPath queries with predicates," Proceedings of Special Interest Group on Management of Data Conference (SIGMOD) 2003.
[8] Ashraf Aboulnaga, Jeffrey F. Naughton, Chun Zhang, "Generating Synthetic Complex-structured XML Data," Proceedings of International Workshop on the Web and Databases (WebDB) 2001
[9] Nicholas J. Belkin and W. Bruce Croft, "Information Filtering and Information Retrieval: Two Sides of the Same Coin?" Communications of the ACM (CACM) 35(12):29-38 1992.
[10] Georg Gottolob, Christoph Koch, and Reinhard Pichler, "XPath Processing in a Nutshell," SIGMOD Record 32(1):12-19 2003
[11] Jianjun Chen, David J. DeWitt, Feng Tien, and Yuan Wang, "NiagaraCQ, A Scalable Continuous Query System for Internet Databases," Proceedings of the Special Interest Group on Management of Data Conference (SIGMOD) 2000
[12] Zachary G. Ives, Alon Y. Halevy, and Daniel S. Weld, "An XML Query Engine for Network-Bound Data," Very Large Data Bases Journal 11(4):380-402, 2002.
[13] Dongwon Lee, "Query Relaxation for XML Model," PhD. Thesis
[14] Ashraf Aboulnaga, Alaa R. Alameldeen, and Jeffrey F. Naughton, "Estimating the Selectivity of XML Path Expressions for Internet Scale Applications," Proceedings of 27th Very Large Data Bases Conference (VLDB) 2001.
[15] Neoklis Polyzotis and Minos Garofalakis, "Statistical Synopses for General Graph-Structured XML Documents," Proceedings of Special Interest Group on Management f Data Conference (SIGMOD), 2002.
[16] Ashraf Aboulnaga, Jeffrey F. Naughton, "Building XML Statistics for the Hidden Web," Proceedings of Conference on Information and Knowledge Management (CIKM) 2002
[17] J. Clark, XML path language (XPath), 1999, http://www.w3.org/TR/xpath
[18] J. P. T. Bray and C. M. Sperberg-McQueen. eXtensible Markup Language(xml) 1.0, http://www.w3.org/TR/Rec-xml, 1998
[19] The STREAM Group. "STREAM: The Stanford Stream Data Manager," IEEE Data Engineering Bulletin, Vol. 26 No. 1, March 2003
[20] B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom, "Models and Issues in Data Stream Systems," Proceedings of Principles of Database Systems Conference (PODS) 2002.
[21] Jaewoo Kang, Jeffrey F. Naughton, and Stratis D. Viglac, “Evaluating Window Joins over Unbounded Streams”, Proceedings of the 28th Very Large Data Bases Conference (VLDB) 2002
[22] S. J. Kaplan. “Cooperative Aspects of Database Interactions,” Artificial Intelligence, 19(2):165–187, Oct. 1982.
[23] Lukasz Golab and M. Tamer �頊su, “Processing Sliding Window Multi-Joins in Continuous Queries over Data Streams,” Proceedings of the 29th Very Large Data Bases Conference (VLDB) 2003.
[24] Megginson Technologies, “SAX 1.0: a free API for event-based XML parsing”, http://www.megginson.com/SAX/index.html, May, 1998
[25] TinyXML project, http://tinyxpath.sourceforge.net/, 2004
[26] XQuery, http://www.w3c.org/XML/Query, 2004
QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top