跳到主要內容

臺灣博碩士論文加值系統

(18.97.14.84) 您好!臺灣時間:2024/12/04 11:18
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::

詳目顯示

: 
twitterline
研究生:周健平
研究生(外文):Chien-Ping Chou
論文名稱:語法剖析式的XML串流連續型查詢方法
論文名稱(外文):Syntactic Matching of Continuous-Query on XML Streams
指導教授:賈坤芳
口試委員:洪國寶楊東麟黃胤傅林明言
口試日期:2013-06-20
學位類別:博士
校院名稱:國立中興大學
系所名稱:資訊工程學系所
學門:工程學門
學類:電資工程學類
論文種類:學術論文
論文出版年:2013
畢業學年度:101
語文別:英文
論文頁數:71
中文關鍵詞:XMLXML streamsContinuous queryTwig query matchingSyntactic pattern recognition
外文關鍵詞:XMLXML streamsContinuous queryTwig query matchingSyntactic pattern recognition
相關次數:
  • 被引用被引用:0
  • 點閱點閱:186
  • 評分評分:
  • 下載下載:0
  • 收藏至我的研究室書目清單書目收藏:0
當大量XML串流資料連續不斷地產生的時候,如何能快速地查詢XML串流成為一項具有挑戰性的工作。本論文著重於探討在XML串流中,非限定分支順序型之分支查詢處理。其目標是在僅能讀取串流資料一遍的限制下,從XML串流裡即刻取出所有符合分支查詢的結果。本論文提出兩種基於語法剖析的XML查詢方法,稱為STQM與GCQ。STQM方法針對每個使用者的查詢去產生一個上下文無關文法(context-free grammar);然後使用常見的編譯器工具(例如YACC)產生一個剖析器(parser);利用這個剖析器,STQM方法就能完成XML串流資料查詢的工作。另一方面,GCQ方法將多個查詢轉換成單一上下文無關文法;此轉換演算法支援新增查詢,並保證新增後,該文法仍為上下文無關文法。因此,GCQ能用單一剖析器,在XML串流上進行多查詢處理。我們證明此語法剖析式查詢方法具有正確性與明確性(unambiguity),並且兩種方法之文法產生與查詢比對演算法皆屬於多項式時間複雜度(polynomial time complexity)。我們經由多個角度進行各項模擬實驗,以呈現所提出的方法之效率與延展性。實驗結果顯示STQM與GCQ方法皆優於所比較的對象。因此,本論文所提出的語法剖析式查詢方法將有助於建立高效率之XML串流資料查詢系統。
Ensuring querying efficiency in query matching on Extensible Markup Language (XML) streams is challenging work when the amount of queried stream data is huge and the data can be streamed in continuously. This dissertation addresses the unordered twig-query matching problem on the XML stream. The goal is to immediately and efficiently extract all XML data that match twig queries from the XML stream under the constraint of scanning the stream only once. In this dissertation, two syntactic query matching methods, namely Syntactic Twig-Query Matching (STQM) and Grammar-Based Continuous Query Matching (GCQ), are proposed. The STQM method generates a context-free grammar according to the query being processed, and then produces a parser capable of parsing the grammar by using a compiler tool, such as YACC. With the parser, STQM matches a query from the XML stream. In contrast, the GCQ method is developed to convert multiple twig queries into one context-free grammar. The converting process guarantees that the grammar remains context-free while new queries are incorporated. GCQ can match multiple queries concurrently from the XML stream by using only one parser. We have proven the unambiguity of the grammars, as well as the correctness of the syntactic query matching method. Both of the algorithms for grammar generation and query matching have polynomial time complexity. Several experiments were conducted in various aspects to show the efficiency and scalability of the proposed methods, and the experimental results show that both STQM and GCQ outperform their counterparts. Consequently, the proposed methods will prove beneficial for building efficient query systems on stream information.
Chapter 1. Introduction 1
1.1. XML Query 2
1.2. Querying on XML Streams 3
1.3. Contributions 3
1.4. Organization of the Dissertation 4
Chapter 2. Related Work 5
2.1. Stack-Based Approach 5
2.2. Automata-Based Approach 5
2.3. Sequence-Based Approach 6
2.4. Grammar-Based Approach 7
2.5. Continuous Query Matching 7
Chapter 3. Problem Description 9
3.1. Preliminaries 9
3.2. Problem Statement 10
Chapter 4. Syntactic Query Matching on XML Data 11
4.1. Overview of SQMX Technique 11
4.2. Lexical Analysis 12
4.3. Nested-Structure Recognition 14
4.4. Properties of SQMX 18
Chapter 5. Syntactic Twig-Query Matching 21
5.1. Overview of STQM 21
5.2. Syntactic Query Processing 22
5.2.1. Functions in Semantic Actions 23
5.2.2. Grammar Generation of STQM 25
5.2.3. Example of Syntactic Twig Query Matching 27
5.3. Complexity Analysis of STQM 29
5.4. Comparison of STQM, YFilter and TransformX 30
5.4.1. Architecture 31
5.4.2. Implementation 32
5.4.3. Query Type 32
5.4.4. Formal View 32
5.4.5. Performance 33
Chapter 6. Grammar-Based Matching of Multiple Continuous Queries 34
6.1. Overview of GCQ 34
6.2. Query Analyzer 35
6.3. Production-Split for Recognizing an SPP 38
6.4. SPP Concatenation by Semantic Actions 40
6.5. Commonality among SPPs 42
6.6. Grammar Generation of GCQ 45
6.7. Analysis of GCQ 48
6.7.1. Complexity Analysis of GCQ 48
6.7.2. Correctness of GCQ 50
Chapter 7. Experiments 54
7.1. Performance Evaluation of STQM 55
7.1.1. Experiment 1: Varying Queries 56
7.1.2. Experiment 2: Varying Document Sizes 57
7.1.3. Experiment 3: Varying Branch Degrees in Twig Queries 59
7.2. Performance Evaluation of GCQ 60
7.2.1. Experiment 4: Varying Numbers of Queries 61
7.2.2. Experiment 5: Varying Document Sizes 63
7.2.3. Experiment 6: Comparison of GCQ and STQM 64
Chapter 8. Conclusions and Future Work 66
8.1. Conclusions 66
8.2. Future Work 67
References 68
[1] A.V. Aho, M.S. Lam, R. Sethi, J.D. Ullman, Compilers - Principles, Techniques, & Tools, 2nd ed., Addition-Wesley, Boston, Massachusetts, 2007.
[2] M. Altinel, M.J. Franklin, Efficient filtering of XML documents for selective dissemination of information, in: Proceedings of the 26th International Conference on Very Large Data Bases, September, 2000, pp. 53–64.
[3] P. Antonellis, C. Makris, XFIS: an XML filtering system based on string representation and matching, International Journal of Web Engineering and Technology 4 (2008) 70–94.
[4] B. Babcock, S. Babu, M. Datar, R. Motwani, J. Widom, Models and Issues in Data Stream Systems, in: Proceedings of ACM Symposium on Principles of Database Systems Conference, June, 2002, pp. 1–16.
[5] H. Balakrishnan, M. Balazinska, D. Carney, U. Cetintemel, M. Cherniack, C. Convey, E. Galvez, J. Salz, M. Stonebraker, N. Tatbul, R. Tibbetts, S. Zdonik, Retrospective on Aurora, VLDB Journal 13 (2004) 370–383.
[6] R.S. Boyer, J.S. Moore, A fast string searching algorithm, Communications of the ACM 20 (1977) 762–772.
[7] C. Brabrand, R. Giegerich, A. Moller, Analyzing ambiguity of context-free grammars, Science of Computer Programming 75 (2010) 176–191.
[8] N. Bruno, N., Koudas, D. Srivastava, Holistic twig joins: optimal XML pattern matching, in: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, June, 2002, pp. 310–321.
[9] F. Bry, F. Coskun, S. Durmaz, T. Furche, D. Olteanu, and M. Spannagel, The XML Stream Query Processor SPEX, in: Proceedings of the 21st International Conference on Data Engineering, April, 2005, pp. 1120–1121.
[10] K.S. Candan, W.P. Hsiung, S. Chen, J. Tatemura, D. Agrawal, AFilter: adaptable XML filtering with prefix-caching suffix-clustering, in: Proceedings of the 32nd International Conference on Very Large Data Bases, September, 2006, pp. 559–570.
[11] C.Y. Chan, P. Felber, M. Garofalakis, R. Rastogi, Efficient filtering of XML documents with XPath expressions, VLDB Journal 11 (2002) 354–379.
[12] S. Chandrasekaran, O. Cooper, A. Deshpande, M. J. Franklin, J. M. Hellerstein, W. Hong, S. Krishnamurthy, S. Madden, V. Raman, F. Reiss, M. Shah, TelegraphCQ: Continuous Dataflow Processing for an Uncertain World, in: Proceedings of the Conference on Innovative Data Systems Research, January, 2003, pp. 269–280.
[13] J. Chen, D. DeWitt, F. Tian, and Y. Wang, NiagaraCQ: A Scalable Continuous Query System for Internet Databases, in: Proceedings of ACM International Conference on Management of Data, May, 2000, pp. 379–390.
[14] Y. Chen, G.A. Mihaila, S.B. Davidson, S. Padmanabhan, EXPedite: a system for encoded XML processing, in: Proceedings of the 13th ACM International Conference on Information and Knowledge Management, November, 2004, pp. 108–117.
[15] C.P. Chou, K.F. Jea, H.H. Liao, A syntactic approach to twig-query matching on XML streams, The Journal of Systems and Software 84 (2011) 993–1007.
[16] C.P. Chou, K.F. Jea, Grammar-based matching of multiple continuous queries on XML streams, submitted to the journal: Information systems (2012).
[17] Y. Diao, M. Altinel, M.J. Franklin, H. Zhang, P. Fischer, Path sharing and predicate evaluation for high-performance XML filtering, ACM Transactions on Database Systems 28 (2003) 467–516.
[18] Y. Diao, M. J. Franklin, High-performance XML filtering: an overview of YFilter, IEEE Data Engineering Bulletin 26 (2006) 41–48.
[19] E. Fredkin, Trie memory, Communications of the ACM 3 (1960) 490–499.
[20] K.S. Fu, M.A. Aizerman, Syntactic methods in pattern recognition, IEEE Transactions on Systems, Man and Cybernetics 6 (1976) 590–591.
[21] J. Kim, Y. Kim, S. Park, An efficient bottom-up filtering of XML messages by exploiting the postfix commonality of XPath queries, IEICE Transactions on Information and Systems, E91-D (2008) 2124–2133.
[22] C. Koch, S. Scherzinger, Attribute grammars for scalable query processing on XML streams, VLDB Journal 16 (2007) 317–342.
[23] J. Kwon, P. Rao, B. Moon, S. Lee, FiST: scalable XML document filtering by sequencing twig patterns, in: Proceedings of the 31st International Conference on Very Large Data Bases, October, 2005, pp. 217–228.
[24] J. Kwon, P. Rao, B. Moon, S. Lee, Value-based predicate filtering of XML documents, Data & Knowledge Engineering 67 (2008) 51–73.
[25] J. Kwon, P. Rao, B. Moon, S. Lee, Fast XML document filtering by sequencing twig patterns, ACM Transactions on Internet Technology 9 (2009) 13:1–51.
[26] J.R. Levine, T. Mason, D. Brown, Lex & Yacc, 2nd ed., O'Reilly & Associates, Cambridge, Massachusetts, 1992.
[27] L. Liu, C. Pu, W. Tang, Continual Queries for Internet-Scale Event-Driven Information Delivery, IEEE Transactions on Knowledge and Data Engineering 12 (1999) 610–628.
[28] B. Ludascher, P. Mukhopadhyay, Y. Papakonstantinou, A Transducer-based XML Query Processor, in: Proceedings of the 28th International Conference on Very Large Data Bases, August, 2002, pp. 227–238.
[29] S. Mohammad, I. Burcea, H.A. Jacobsen, GPX-matcher: a generic boolean predicate-based XPath expression matcher, in: Proceedings of the 14th International Conference on Extending Database Technology, March, 2011, pp. 45–56.
[30] M. M. Moro, P. Bakalov, V.J. Tsotras, Early profile pruning on XML-aware publish-subscribe systems, in: Proceedings of the 33rd International Conference on Very Large Data Bases, September, 2007, pp. 866–877.
[31] A. Raj, P.S. Kumar, Branch sequencing based XML message broker architecture, in: Proceedings of the IEEE 23rd International Conference on Data Engineering, April, 2007, pp. 656–665.
[32] V. Raman, A. Deshpande, J. Hellerstein, Using State Modules for Adaptive Query Processing, in: Proceedings of IEEE Conference on Data Engineering, March, 2003, pp. 353–364.
[33] P. Rao, B. Moon, PRIX: indexing and querying XML using prufer sequences, in: Proceedings of the 20th International Conference on Data Engineering, April, 2004, pp. 288–299.
[34] M. Salloum, V.J. Tsotras, Efficient and scalable sequence-based XML filtering, in: Proceedings of the 12th International Workshop on the Web and Databases, June, 2009.
[35] A. Schmidt, F. Waas, M. Kersten, M.J. Carey, I. Manolescu, R. Busse, XMark: a benchmark for XML data management, in: Proceedings of the 28th International Conference on Very Large Data Bases, August, 2002 , pp. 974–985.
[36] S. Scherzinger, A. Kemper, Syntax-directed transformations of XML streams, in: Proceedings of the Workshop on Programming Language Technologies for XML (PLAN-X), 2005, pp. 79–90.
[37] M. Sullivan, A. Heybey, Tribeca: A System for Managing Large Databases of Network Traffic, in: Proceedings of USENIX Annual Technical Conference, June, 1998.
[38] D. Terry, D. Goldberg, D. Nichols, B. Oki, Continuous queries over append-only databases, in: Proceedings of the 1992 ACM SIGMOD International Conference on Management of Data, June, 1992, Volume 21, Number 2, pp. 321–330.
[39] W. H. Tsai, K. S. Fu, 1980. Attributed grammar–a tool for combining syntactic and statistical approaches to pattern recognition, IEEE Transactions on Systems, Man, and Cybernetics SMC-10 (1980) 873–885.
[40] G. Wang, B. Ning, G. Yu, Holistically stream-based processing Xtwig queries, World Wide Web 11 (4) (2008) 407–425.
[41] Dublin Core Metadata Initiative. <http://dublincore.org/documents/dces/>.
[42] Facebook. <http://www.facebook.com/>.
[43] Lawrence Berkeley Laboratory, Flex: The Fast Lexical Analyzer. <http://flex.sourceforge.net/>.
[44] Myspace. <http://www.myspace.com/>.
[45] SAX. <http://www.saxproject.org/>.
[46] The GNU Project, Bison - the GNU parser generator. <http://www.gnu.org/software/bison/>.
[47] University of Washington, XML Data Repository. <http://www.cs.washington.edu/research/xmldatasets/>.
[48] World-Wide Web Consortium (W3C), Extensible Markup Language (XML). <http://www.w3.org/TR/xml/>.
[49] World-Wide Web Consortium (W3C), RDF/XML Syntax Specification (Revised), <http://www.w3.org/TR/rdf-syntax-grammar/>.
[50] World-Wide Web Consortium (W3C), XML Path Language (XPath). <http://www.w3.org/TR/xpath20/>.
[51] World-Wide Web Consortium (W3C), XQuery 1.0: An XML Query Language. <http://www.w3.org/TR/xquery/>.
連結至畢業學校之論文網頁點我開啟連結
註: 此連結為研究生畢業學校所提供,不一定有電子全文可供下載,若連結有誤,請點選上方之〝勘誤回報〞功能,我們會盡快修正,謝謝!
QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top