跳到主要內容

臺灣博碩士論文加值系統

(44.192.254.173) 您好!臺灣時間:2023/10/02 05:28
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::

詳目顯示

我願授權國圖
: 
twitterline
研究生:郭育佐
研究生(外文):Yu-Tso Kuo
論文名稱:適合串流XML文件查詢的編碼法
論文名稱(外文):An Encoding Scheme Facilitating Streaming XML Query Processing
指導教授:賈坤芳
學位類別:碩士
校院名稱:國立中興大學
系所名稱:資訊科學系所
學門:工程學門
學類:電資工程學類
論文種類:學術論文
畢業學年度:95
語文別:英文
論文頁數:62
中文關鍵詞:XML串流查詢處理區間編碼法
外文關鍵詞:Streaming XML Query ProcessingInterval-based LabelingXML schema
相關次數:
  • 被引用被引用:0
  • 點閱點閱:100
  • 評分評分:
  • 下載下載:0
  • 收藏至我的研究室書目清單書目收藏:0
近年來,XML文件串流查詢處理引起許多的注意,並成為一個在XML社群裡主要的挑戰。這主要是因為XML事實上已成為網際網路上資料交換格式的標準而且也是因為人們對於移動式與無線計算應用的需求孔急所致。
區間編碼法是一個可用於XML串流查詢處理的方法。但是區間編碼法只標示了節點間父子和祖孫關係,對於一個節點來說資訊非常的有限。由於XML有著不規則或不完整的半結構特性,我們便運用這一特性提出kodex編碼法,來提升XML文件串流查詢處理的效率,而作法是透過編入額外的後裔節點類型的結構資訊。
然而,由於kodex方法的編碼長度是與節點類型的多寡成正比的,所以節點的類型若過多可能會是一個問題。因此,我們也建議使用XML schema來歸納出編碼所需的最少節點類型數量,以期縮減編碼長度。而針對一群XML schema來分析之後,發現編碼長度可有30%至60%不等的縮減率。
最後,我們進行全面的查詢處理實驗,並用目前最有效率的EXPedite平台來比較我們的方法。實驗結果說明我們的方法在處理一套查詢處理度量上要比EXPedite平台快上近50%。
The query processing over XML-document streams draws lots of attentions and becomes a major challenge in the XML society in recent years. This is mainly because XML is the de facto standard of information exchange format in the Internet applications and the demand of mobile, wireless computing applications is booming.
Interval-based labeling is one answer to streaming XML query processing. However the information encoded in the interval-based labeling addresses merely about the parent-child and ancestor-descendant relationships, which are very limited for a single element. Since XML has the irregular or incomplete nature of semi-structured data, we apply this characteristic to our KODEX approach for optimizing query processing over XML-document streams by adding extra structural information into the labeling via encoding the types of descendants for each element.
However, the encoding length of our KODEX approach might be an issue while there are too many types of elements, since the encoding length is in proportional to the variety of elements. Therefore, we also propose to use XML schema (XSD) to figure out the minimum set of element types needed for encoding in order to cut down the size of our numbering scheme. After performing a research on a collection of XML schema, we found the reduction on encoding length ranges from 30% to 60%.
Finally, we perform comprehensive query processing experiments to compare our approach against the EXPedite platform, which is one of the most efficient platforms we have ever found so far. Experimental results show that our method takes 50% less time than EXPedite to process a suite of XPath query benchmark.
1. Introduction 1
1.1. Motivation 1
1.2. Contributions 3
1.3. Organization of this thesis 3
2. Related work 5
2.1. XML labeling scheme 5
2.2. Streaming XML query processing 6
2.3. Query optimization techniques 7
3. Methodology 9
3.1. The encoding scheme 9
3.2. Concept of proxy 12
3.3. Query processing on encoded XML streams 16
4. Experimental evaluation 23
4.1. Proxy encoding evaluation 23
4.2. Query processing evaluation 24
4.3. XML file size comparision 27
5. Conclusions and future works 29
References 31
Appendix 36
[1]S. Acharya, R. Alonso, M. Franklin, and S. Zdonik, “Broadcast Disk Management for Asymmetric Communication Environment,” Proceedings of ACM SIGMOD Conference, 1995, pages 199 - 210.
[2] M. Altinel and M. J. Franklin, “Efficient Filtering of XML Documents for Selective Dissemination of Information,” Proceedings of the 26th International Conference on Very Large Data Bases, 2000, pages 53 - 64.
[3] S. Amer-Yahia, S. Cho, L. V. Lakshmanan, and D. Srivas-tava, “Minimization of Tree Pattern Queries,” Proceedings of ACM SIGMOD Conference, 2001, pages 497–508.
[4] A. Berglund et al., XML Path Language (XPath) 2.0. W3C Recommendation 23 January 2007, Available at http://www.w3.org/TR/2007/REC-xpath20-20070123/
[5] S. Boag et al., XQuery 1.0: An XML Query Language. W3C Recommendation 23 January 2007, Available at http://www.w3.org/TR/2007/REC-xquery-20070123/
[6] T. Bray et al., Extensible Markup Language (XML) 1.1. W3C Recommendation 16 August 2006, Available at http://www.w3.org/TR/2006/REC-xml11-20060816/
[7]T. Brown et al., “The Datacycle Architecture,” Communications of the ACM, Volume 35, Number 12, 1992, pages 71 - 81.
[8] Y. Chen, S. B. Davidson, and Y. Zheng, “ViteX: A Streaming XPath Processing System,” Proceedings of 21st International Conference on Data Engineering (ICDE), 2005, pages 1118 - 1119.
[9] Y. Chen, G. A. Mihaila, S. B. Davidson, and S. Padmanabhan, “EXPedite: a System for Encoded XML Processing,” Proceedings of the 13th ACM International Conference on Information and Knowledge Management, 2004, pages 108 - 117.
[10] Y. Chen, G. A. Mihaila, S. B. Davidson, and S. Padmanabhan, “Efficient Path Query Processing on Encoded XML,” Proceedings of International Workshop on High Performance XML Processing, 2004.
[11] Y. Chen, Y. Shi, and Y. Chen, “Tree Inclusion Algorithm, Signatures and Evaluation of Path-oriented Queries,” Proceedings of the 2006 ACM Symposium on Applied Computing, 2006, pages 1020 - 1025.
[12] Z. Chen et al., “From Tree Patterns to Generalized Tree Patterns on Efficient Evaluation of XQuery,” Proceedings of International Conference on Very Large Data Bases, 2003, pages 237-248.
[13] C.-W. Chung, J.-K. Min, and K. Shim, “APEX: An Adaptive Path Index for XML Data,” Proceedings of ACM SIGMOD Conference, 2002, pages121-132.
[14] M. P. Consens and T. Milo, “Optimizing Queries on Files,” Proceedings of ACM SIGMOD Conference, 1994, pages 301 - 312.
[15] B. Cooper, N. Sample, M. J. Franklin, G. R. Hjaltason, and M. Shadmon, “A Fast Index for Semistructured Data,” Proceedings of International Conference on Very Large Data Bases, 2001, pages 341–350.
[16] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, 2nd Edition, MIT Press and McGraw-Hill, 2001.
[17] J. W. R. Goldman, “DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases,” Proceedings of International Conference on Very Large Data Bases, 1997, pages 436 - 445.
[18] T. J. Green, G. Miklau, M. Onizuka, and D. Suciu, “Processing XML Streams with Deterministic Automata,” Proceedings of the 9th International Conference on Database Theory, 2003, pages 173 - 189.
[19] M. P. Haustein et al., “DeweyIDs - The Key to Fine-grained Management of XML Documents,” Proceedings of the 20th Brazilian Symposium on Databases (SBBD), 2005, pages 85 - 99.
[20] T. Imielinksi and B. Badrinath, “Mobile Wireless Computing: Challenges in Data Management,” Communication of the ACM, Volume 37, Number 10, October 1994, pages 18 - 28.
[21]T. Imielinski, S. Viswanathan, and B. Badrinath, “Energy Efficient Indexing on Air,” Proceedings of ACM SIGMOD Conference, 1994, pages 25 - 36.
[22] R. Jain and J. Werth, Airdisks and AirRAID: Modeling and Scheduling Periodic Wireless Data Broadcast, DIMACS Technical Report 95-11, Rutgers University, U.S.A., May 1995.
[23]H. Koch, L. Krombholz, and O. Theel, A Brief Introduction into the World of Mobile Computing, Technical Report THD-BS-1993-03, Department of Computer Science, University of Darmstadt, U.S.A., May 1993.
[24] J. Lu et al., “From Region Encoding to Extended Dewey: On Efficient Processing of XML Twig Pattern Matching,” Proceedings of International Conference on Very Large Data Bases, 2005, pages 193 - 204.
[25] S. Park, Y. Choi, and H.-J. Kim, “XML Query Processing Using Signature and DTD,” Proceedings of the Third International Conference on E-Commerce and Web Technologies, 2002, pages 162 - 171.
[26] S. Park and H.-J. Kim, “A New Query Processing Technique for XML Based on Signature,” Proceedings of the 7th International Conference on Database Systems for Advanced Applications, 2001, page 22.
[27] F. Peng and S. S. Chawathe, “XPath Queries on Streaming Data,” Proceedings of ACM SIGMOD Conference, 2003, pages 431 - 442.
[28] I. Stanoi, C. A. Lang, and S. Padmanabhan, “Hint and Run: Accelerating XPath Queries,” Proceedings of the 9th International Database Engineering & Application Symposium, 2005, pages 253 - 262.
[29] H. Su, E. A. Rundensteiner, and M. Mani, “Semantic Query Optimization for XQuery over XML Streams,” Proceedings of International Conference on Very Large Data Bases, 2005, pages 277-288.
[30] I. Tatarinov, S. Viglas, K. S. Beyer, J. Shanmugasundaram, E. J. Shekita, and C. Zhang. “Storing and Querying Ordered XML Using a Relational Database System,” Proceedings of ACM SIGMOD Conference, 2002, pages 204 - 215.
[31] H. S. Thompson et al., XML Schema Part 1: Structures. W3C Recommendation, 28 October 2004, Available at http://www.w3.org/TR/xmlschema-1/
[32] Z. Vagena and V.J. Tsotras, “Path-expression Queries over Multiversion XML Documents,” Proceedings of WebDB, 2003, pages 49 - 54.
[33] J. Widom, “Data Management for XML: Research Direction,” IEEE Data Engineering Bulletin, Volume 22, Number 3, 1999, pages 44 - 52.
[34] X. Wu, M. L. Lee, and W. Hsu, “A Prime Number Labeling Scheme for Dynamic Ordered XML Trees,” Proceedings of 20th International Conference on Data Engineering, 2004, pages 66 - 78.
[35] G. Xing and B. Tseng, “Extendible Range-Based Numbering Scheme for XML Document,” Proceedings of the International Conference on Information Technology, 2004, pages 140 - 141.
[36] C. Zhang, J. F. Naughton, D. J. DeWitt, Q. Luo, and G. M. Lohman, “On Supporting Containment Queries in Relational Database Management Systems,” Proceedings of ACM SIGMOD Conference, 2001, pages 425 - 436.
[37] GSFC/NASA XML Project. Nasa XML data, 2001. http://xml.gsfc.nasa.gov/archive/index.html
[38] Georgetown Protein Information Resource. Protein Sequence Database, 2001. http://pir.georgetown.edu/
[39] The XML Data Repository. University of Washington, Computer Science & Engineering Department. http://www.cs.washington.edu/research/xmldatasets/.
[40] Altova XMLSpy 2007. http://www.altova.com/products/xmlspy/xml_editor.html
[41] Stylus Studio 2007. http://www.stylusstudio.com/xml_product_index.html
[42] NIAGARA Experimental Data. http://www.cs.wisc.edu/niagara/data.html
QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top