跳到主要內容

臺灣博碩士論文加值系統

(216.73.216.23) 您好!臺灣時間:2025/10/26 22:23
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::

詳目顯示

: 
twitterline
研究生:許雯絞
研究生(外文):Wen-Chiao Hsu
論文名稱:支援有效查詢與更新機制的XML文件索引架構
論文名稱(外文):Compacted Indexing Schemes for Supporting Efficient Query and Update Processing of XML Documents
指導教授:廖宜恩廖宜恩引用關係
口試委員:何建明陳孟彰高成炎高勝助
口試日期:2012-06-19
學位類別:博士
校院名稱:國立中興大學
系所名稱:資訊科學與工程學系所
學門:工程學門
學類:電資工程學類
論文種類:學術論文
論文出版年:2012
畢業學年度:100
語文別:英文
論文頁數:65
中文關鍵詞:XML索引結構合併索引XML查詢XML更新分支映射
外文關鍵詞:XML IndexStructural Summary IndexXML QueryXML UpdateBranch Map
相關次數:
  • 被引用被引用:1
  • 點閱點閱:275
  • 評分評分:
  • 下載下載:3
  • 收藏至我的研究室書目清單書目收藏:0
為加快XML文檔的查詢,許多索引和查詢處理方法已陸績被提出。其中Structural summary索引理論減少了查詢處理過程中,XML必須檢視的部分。然而,大多數基於這個理論方法不能有效率地支持複雜的查詢,有些則需要費時的索引建構時間和巨大的索引空間。Structural join索引理論則提出了藉由Decomposition- matching-merging的方式支援分支結構的查詢。不過這種方法會產生大量的中間結果,造成不必要的處理成本。因此一些後續的研究著重在如何以減少中間結果,以及如何避免高成本的合併階段,例如各種資料結構的提出,和有效率地查詢處理演自法。最重要的問題是,大部分索引方法不能支持動態更新,如插入、刪除、和修改。
本論文首先提出了一種新的索引方法,稱為CIS-X (A Compacted Indexing Scheme for XML documents)。CIS-X的繼承了DataGuide的特性,但只使用一個整數去標記節點在XML文檔中的位置,盡可能減少所需的索引空間。它也可以有效地處理各類查詢,不需要走訪原始的XML文檔。為了支援更新機制,本論文又提出一個文件編碼方法,稱為分支映射編碼法 (branch map),合併 XML中的每個索引節點並只記錄了他們與父節點之間對應的關係。如果一個節點有一個以上的子節點有相同的標籤名稱,對應的位置將被分割成多個位置,以符合原文件的結構關係。CIS-X經修改後,稱為UCIS-X (An Updatable Compacted Indexing Scheme for XML documents)。透過分支映射編碼法,UCIS-X可以支援大部分由W3C(World Wide Web Consortium)所定義的更新作業。實驗結果也顯示,分支映射編碼法在資料儲存、查詢效能和更新處理上,都優於由Microsoft SQL ServerTM 2005所採用的ORDPATH編碼法。


For accelerating query processing in XML documents, some indexing and query evaluation methods have been proposed. The structural summary approach reduces the portion of the XML to be scanned during query processing. However, most of the methods based on this approach cannot support complex queries efficiently and/or encounter a long index construction time and a huge index size. The structural join approach is proposed to support twig pattern matching by decomposition- matching-merging. This approach produces huge amount of intermediate results that cause unnecessarily high processing cost. Several follow-up techniques, such as various data structures and efficient query evaluation algorithms, have been proposed to reduce intermediate results and avoid the expensive merging phase. The most important issue is that most of the indexing methods can''t support efficient and dynamic updates, such as insertion, deletion, and modification.
This dissertation first proposes a novel indexing method, called the CIS-X (A Compacted Indexing Scheme for XML documents). The CIS-X inherits the features of the DataGuide and makes the required index space as small as possible by using only one integer to label each node in an XML document. It can also process various types of queries efficiently without accessing the original XML document. In order to support the update mechanism, a local nested labeling scheme, called branch map, is also introduced in this dissertation. “Local” means each index node in the summarized XML tree only records the bit-map of instances corresponding to their parents. “Nested” means if a node has more than one child with same tag name, the corresponding bit will be split up into many bits. The new version of indexing scheme is called UCIS-X (An Updatable Compacted Indexing Scheme for XML documents). Benefit from the branch map, the UCIS-X can support most of the update operations defined by W3C (World Wide Web Consortium). The experiments show the branch map outperforms ORDPATH, which is the labeling method implemented in Microsoft SQL ServerTM 2005, in terms of data storage, query processing, and update processing.


Abstract i
Contents iii
List of Tables v
List of Figures vi
Chapter 1 Introduction 1
1.1. Background 1
1.2. Research Motivation 3
1.3. Contributions 4
1.4. Organization 5
Chapter 2 Related Work 6
2.1. The Structural Summary Methods 6
2.2. Structural Join and the Follow-up Methods 8
2.3. Labeling Schemes 10
Chapter 3 Preliminaries and Problem Statement 14
3.1. Data Model and Query Patterns 14
3.2. The Summarized Index Tree 16
3.3. Problem Statement 18
Chapter 4 A Compacted Indexing Scheme for Efficient
Query Evaluation 20
4.1. Summarization of the XML Document 20
4.2. Construction of the CIS-X 21
4.3. Query Evaluation Using CIS-X 25
Chapter 5 An Updatable Compacted Indexing Scheme
for XML Documents 32
5.1. Construction of the UCIS-X 33
5.2. Query Evaluation Using UCIS-X 35
5.3. Update Operations of UCIS-X 37
5.3.1 Insertion 37
5.3.2 Deletion 39
5.3.3 The Supported Update Operations 41
Chapter 6 Experimental Design and Results 47
6.1. Index Construction Costs 48
6.2. Performance on Query Evaluation 51
6.3. Performance on Update Operations 54
Chapter 7 Conclusions and Future Work 59
7.1. Conclusions 59
7.2. Future Work 60
References 62

[1]Q. Li, B. Moon, Indexing and querying XML data for regular expressions, In: Proceedings of the 27th International Conference on Very Large Data Bases, 2001, pp. 367–370.
[2]J. Clark and S. DeRose, XML path language (XPath) version 1.0. W3C Recommendation, 1999, http://www.w3.org/TR/xpath, Accessed 11 April 2012.
[3]S. Boag, D. Chamberlin, M.Fernandez, D. Florescu, J. Robie, J. Simeon, and M. Stefanescu, XQuery 1.0: An XML query language, Working Draft, 2001, http://www.w3.org/TR/2001/WD-xquery-20011220, Accessed 11 April 2012.
[4]Q. Chen, A. Lim, and K.W. Ong, D(K)-Index: an adaptive structural summary for graph-structured data, In: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data , 2003, pp. 134-144.
[5]C.-W. Chung, J.-K. Min, and K. Shim, APEX: an adaptive path index for XML data, In: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, 2002, pp.121-132.
[6]R. Goldman and J. Widom, DataGuides: enabling query formulation and optimization in semistructured databases, In: Proceedings of the 23rd International Conference on Very Large Data Bases, 1997, pp. 436-445.
[7]T. Milo and D. Suciu, Index structures for path expressions, Lecture Notes in Computer Science 1540, 1999, pp. 277-295.
[8]Z. Chen, J. Gehrke, F. Korn, N. Koudas, J. Shanmugasundaram, and D. Srivastava, Index structures for matching XML twigs using relational query processors, Data and Knowledge Engineering, 60, 2007, pp. 293-302.
[9]S.-C. Haw and C.-S. Lee, Extending path summary and region encoding for efficient structural query processing in native XML databases, Journal of Systems and Software, 82, 2009, pp.1025-1035.
[10]W.-C. Hsu, I-E. Liao, S.-Y. Wu, and K.-F. Kao, An efficient XML indexing method based on path clustering, In: Proceedings of the 20th IASTED International Conference on Modelling and Simulation, 2009, pp.339-344.
[11]S. K. Izadi, T. Harder, and M. S.Haghjoo, S3: Evaluation of tree-pattern XML queries supported by structural summaries, Data and Knowledge Engineering, 68, 2009, pp.126-145.
[12]X. Wu and G. Liu, XML twig pattern matching using version tree, Data and Knowledge Engineering, 64, 2008, pp.580-599.
[13]S. Al-Khalifa, H.V. Jagadish, N. Koudas, J.M. Patel, D. Srivastava, and Y. Wu, Structural joins: a primitive for efficient XML query pattern matching, In: Proceedings of the 18th International Conference on Data Engineering, 2002, pp. 141-152.
[14]N. Bruno, N. Koudas, and D. Srivastava, Holistic twig joins: optimal XML pattern matching, In: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, 2002, pp. 310-321.
[15]S. Chen, H.G. Li, J. Tatemura, W.P. Hsiung, D. Agrawal, and K.S. Candan, Twig2Stack: bottom-up processing of generalized tree-pattern queries over XML documents, In: Proceedings of the 32nd International Conference on Very Large Data Bases, 2006, pp. 283–294.
[16]J. Kim, Advanced structural joins using element distribution, Information Sciences, 176, 2006, pp.3300-3331.
[17]Q. Li and B. Moon, Indexing and querying XML data for regular expressions, In: Proceedings of the 27th International Conference on Very Large Data Bases, 2001, pp. 367-370.
[18]L. Qin, J. Xu Yu, and B. Ding, TwigList: make twig pattern matching fast, In: Proceedings of the 12th International Conference on Database Systems for Advanced Applications, 2007, pp.850-862.
[19]B. Cooper, N. Sample, M. J. Franklin, G. R. Hjaltason, and M. Shadmon, A fast index for semistructured data, In: Proceedings of the 27th International Conference on Very Large Data Bases, 2001, pp.341-350.
[20]B. Zhang, Z. Geng, and A. Zhou, SIMP: efficient XML structural index for multiple query processing. In: 9th International Conference on Web-Age information Management (WAIM), IEEE Computer Society, Washington, DC, 2008, pp.113-118.
[21]Q. Chen, A. Lim, and K. W. Ong, Enabling structural summaries for efficient update and workload adaptation, Data and Knowledge Engineering, 64(3), 2008, pp.558-579.
[22]R. Kaushik, D. Shenoy, P. Bohannon, and E. Gudes, Exploiting local similarity for indexing paths in graph-structured data, In: Proceedings of the 18th International Conference on Data Engineering, 2002, pp.129-140.
[23]J.-K. Min, J. Lee, and C.-W. Chung, An efficient XML encoding and labeling method for query processing and updating on dynamic XML data, Journal of Systems and Software, 82, 2009, pp.503-515.
[24]D. Gerdemann, Parsing As Tree Traversal. In: Proceedings of the 15th Conference on Computational Linguistics, Vol. 1, 1994, pp. 396-400(1994).
[25]I. Tatarinov, S. D. Viglas, K. Beyer, J. Shanmugasundaram, E. Shekita, and C. Zhang, Storing and querying ordered XML using a relational database system. In: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, 2002, pp. 204-215.
[26]K. Kobayashi, W. Liang, D. Kobayashi, A. Watanabe, and H. Yokota, VLEI code: An efficient labeling method for handling XML documents in an RDB, In: Proceeding of the 21st International Conference on Data Engineering, 2005, pp. 386-387.
[27]P. E. O''Neil, E. J. O''Neil, S. Pal, I. Cseri, G. Schaller, and N, Westbury, ORDPATHs: insert-friendly XML node labels, In: Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, 2004, pp. 903-908.
[28]S.-C. Haw and C.-S. Lee, Evolution of structural path indexing techniques in XML databases: A survey and open discussion, Advanced Communication Technology, 3, 2008, pp.2054-2059.
[29]J. Lu, T. W. Ling, Z. Bao and C. Wang, Extended XML tree pattern matching: theories and algorithms, IEEE Transactions on Knowledge and Data Engineering, 23(3), 2011, pp.402-416.
[30]I. Tatarinov, Z.G. Ives, A.Y. Halevy, and D.S. Weld, Updating XML, In: Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, 2001, pp. 413-424.
[31]W3C Recommendation 17 March 2011, XQuery Update Facility 1.0, http://www.w3.org/TR/xquery-update-10/, Accessed 11 April 2012.
[32]XMLData Repository, http://www.cs.washington.edu/research/xmldatasets/, Accessed 11 April 2012.
[33]XMark- An XML benchmark project, http://www.xml-benchmark.org/, Accessed 11 April 2012.
[34]Y. Diao, P. Fischer, M. Franklin, and R. To., YFilter: efficient and scalable filtering of xml documents, In: Proceedings of the 18th International Conference on Data Engineering, 2002, pp. 341-342.
[35]W3C Recommendation 06 May 2010, XML Linking Language (XLink) Version 1.1, http://www.w3.org/TR/xlink11/, Accessed 11 June 2012.
[36]W. May, E. Behrends, and O. Fritzen, Integrating and querying distributed XML data via XLink, Information Systems, 33, 2008, pp. 508-566.
[37]W.-C. Hsu, I-E. Liao, H.-C. Shih, A cloud computing implementation of XML indexing method using Hadoop, In: Proceeding of the 4th Asian Conference on Intelligent Information and Database Systems, Vol. 7198, 2012, pp. 256-265.


QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top