
National Digital Library of Theses and Dissertations in Taiwan


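Author: Mao-Tong Lin (林茂桐)
Thesis title: Design and Implementation of Indexing Strategies for XML Documents (XML文件之索引方法之設計與製作)
Advisor: Ye-In Chang (張玉盈)
Degree: Master's
University: National Sun Yat-sen University
Department: Department of Computer Science and Engineering (graduate program)
Academic field: Engineering
Discipline: Electrical, Electronic, and Information Engineering
Thesis type: Academic thesis
Year of publication: 2002
Academic year of graduation: 90 (ROC calendar, i.e., 2001-2002)
Language: English
Pages: 89
Keywords: XML, Index, Data exchange, Relational database, WWW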
Usage statistics: cited by 2; viewed 141 times; downloaded 0 times; bookmarked 2 times.



In recent years, many people have used the World Wide Web and the Internet to find the information they want. HTML is a document markup language for publishing hypertext on the WWW, and it has been the target format for content developers around the world. HTML tags primarily describe how to display a data item. Because HTML documents mix content with display tags, it is difficult to find useful information in them. XML, on the other hand, is a data format for exchanging data between inter-enterprise applications on the Internet. To facilitate data exchange, industry groups define public Document Type Definitions (DTDs) that specify the format of the XML documents exchanged between their applications. Moreover, WWW/EDI and electronic commerce are very popular, and a great deal of business data is exchanged on the World Wide Web as XML. XML tags describe the data itself, and the content (meaning) of an XML document is separated from its display format, so it is easy to find meaningful information in XML documents and to analyze it.

When a large volume of business data (XML documents) exists, one way to manage the XML documents is to use relational databases. This approach requires transforming the XML documents into relational databases. In this thesis, we design and implement indexing strategies to access XML documents efficiently. XML documents are fundamentally different from relational data: XML is hierarchical and nested, and it is very similar to the semistructured data model. Semistructured data may not have a fixed schema, and it may be irregular or incomplete. Although the semistructured data model is flexible for data modeling, it requires a large search space in query processing, since no schema is fixed in advance.
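The abstract states that XML documents must be transformed into relational databases before they can be managed there. This record does not describe the thesis's own mapping. The Python sketch below therefore assumes a generic edge table with one row per element or attribute, and it uses the standard-library `sqlite3` module in place of SQL Server 2000. The sample document, the `edge` table and the `shred` and `path_to_sql` functions are hypothetical. The sketch shows the transformation and one way a simple absolute path query could be rewritten as SQL self-joins.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample data; NOT the thesis's Movie.xml.
SAMPLE = """<movies>
  <movie year="1999"><title>The Matrix</title><director>Wachowski</director></movie>
  <movie year="2001"><title>Amelie</title><director>Jeunet</director></movie>
</movies>"""


def shred(xml_text, conn):
    """Store every element and attribute as one row of a generic edge table:
    (id, parent id, label, text value). Attribute labels get an '@' prefix."""
    conn.execute(
        "CREATE TABLE edge (id INTEGER PRIMARY KEY, parent INTEGER,"
        " label TEXT, value TEXT)"
    )
    next_id = 0

    def visit(elem, parent_id):
        nonlocal next_id
        next_id += 1
        node_id = next_id
        text = (elem.text or "").strip() or None  # whitespace-only text -> NULL
        conn.execute("INSERT INTO edge VALUES (?, ?, ?, ?)",
                     (node_id, parent_id, elem.tag, text))
        for name, val in elem.attrib.items():
            next_id += 1
            conn.execute("INSERT INTO edge VALUES (?, ?, ?, ?)",
                         (next_id, node_id, "@" + name, val))
        for child in elem:
            visit(child, node_id)

    visit(ET.fromstring(xml_text), None)  # the root element has no parent


def path_to_sql(path):
    """Translate an absolute path such as /movies/movie/title into one
    self-join per step over the edge table. Returns (sql, parameters)."""
    steps = path.strip("/").split("/")
    tables = [f"edge e{i}" for i in range(len(steps))]
    conds, params = ["e0.parent IS NULL"], []
    for i, step in enumerate(steps):
        conds.append(f"e{i}.label = ?")
        params.append(step)
        if i > 0:
            conds.append(f"e{i}.parent = e{i - 1}.id")  # parent-child link
    last = len(steps) - 1
    sql = (f"SELECT e{last}.value FROM {', '.join(tables)} "
           f"WHERE {' AND '.join(conds)} ORDER BY e{last}.id")
    return sql, params


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    shred(SAMPLE, conn)
    sql, params = path_to_sql("/movies/movie/title")
    print([row[0] for row in conn.execute(sql, params)])  # ['The Matrix', 'Amelie']
```

Each path step adds another self-join. This is one reason why path queries over shredded semistructured data can become expensive without dedicated indexes.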
Indexing is an effective way to improve query performance. Given the special properties of semistructured data, we identify five types of queries: (1) complete single path, (2) specified leaf only, (3) specified intrapath, (4) specified attribute/element (value), and (5) multiple paths at the same level. In this thesis, we classify all possible queries into these five query types and then create different indexes for the different query types. Moreover, we design and implement the transformation of XML query statements into SQL statements, and we provide a user-friendly interface through which users can enter XML query statements. The whole system is implemented in Java, with SQL Server 2000 as the back-end database. Our experiments show that our indexing strategies effectively improve XML query processing performance.
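This record names the index structures only in the chapter titles (value, text, link and path indexes). As a hedged illustration, the Python sketch below builds an in-memory path index and value index. It then answers three of the five query types under one possible reading of their names. The function names, the sample data and that reading of the query types are assumptions, not the thesis's definitions.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample data; NOT the thesis's Movie.xml.
SAMPLE = """<movies>
  <movie year="1999"><title>The Matrix</title><director>Wachowski</director></movie>
  <movie year="2001"><title>Amelie</title><director>Jeunet</director></movie>
</movies>"""


def build_indexes(root):
    """Number nodes in document order and build two in-memory indexes:
    path_index:  root-to-node label path -> node ids
    value_index: (label, text or attribute value) -> node ids"""
    path_index = defaultdict(list)
    value_index = defaultdict(list)
    counter = 0

    def visit(elem, path):
        nonlocal counter
        counter += 1
        node_id = counter
        path = path + (elem.tag,)
        path_index[path].append(node_id)
        text = (elem.text or "").strip()
        if text:
            value_index[(elem.tag, text)].append(node_id)
        for name, val in elem.attrib.items():
            counter += 1  # attributes are numbered as separate nodes
            path_index[path + ("@" + name,)].append(counter)
            value_index[("@" + name, val)].append(counter)
        for child in elem:
            visit(child, path)

    visit(root, ())
    return path_index, value_index


def complete_path(path_index, path):
    """Type (1), as read here: the full path from the root is given."""
    return list(path_index.get(tuple(path.strip("/").split("/")), []))


def leaf_only(path_index, label):
    """Type (2), as read here: only the last label is given (like //title).
    Only the distinct label paths are scanned, not the document nodes."""
    return sorted(n for p, ids in path_index.items() if p[-1] == label for n in ids)


def by_value(value_index, label, value):
    """Type (4), as read here: an element or attribute with a given value."""
    return list(value_index.get((label, value), []))


if __name__ == "__main__":
    paths, values = build_indexes(ET.fromstring(SAMPLE))
    print(complete_path(paths, "/movies/movie/title"))  # [4, 8]
    print(leaf_only(paths, "title"))                    # [4, 8]
    print(by_value(values, "@year", "2001"))            # [7]
```

The leaf-only lookup scans only the distinct label paths, which are usually far fewer than the document nodes. This is similar in spirit to the DataGuides surveyed in Chapter 2.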


ABSTRACT
LIST OF FIGURES
LIST OF TABLES
1. Introduction
1.1 XML
1.2 Query Languages
1.3 XML-QL
1.4 XSL Patterns
1.5 XQuery
1.6 Motivations
1.7 Organization of the Thesis
2. A Survey
2.1 The Object Exchange Model
2.2 Extracting Indexing Information from XML DTDs
2.2.1 Key Ideas
2.2.2 Classification of DTD Elements
2.2.3 DTD Automata
2.3 DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases
2.3.1 Existence of Multiple DataGuides
2.4 Four Different Types of Index Structures
2.4.1 Value Index (Vindex)
2.4.2 Text Index (Tindex)
2.4.3 Link Index (Lindex)
2.4.4 Path Index (Pindex)
3. Query Types
4. Query Transformation
4.1 System Architecture
4.2 System Flowchart
4.3 DTD of Movie.xml
4.4 Constructing an XML Query
4.5 Constructing a SQL Query
4.6 Examples
5. The Implementation of Index Strategies
5.1 Index Constructing
5.1.1 Constructing the Value Index
5.1.2 Constructing the Text Index
5.1.3 Constructing the Link Index
5.1.4 Constructing the Path Index
5.2 Query Processing by Indexes
5.3 Performance Study
5.3.1 The Performance Model
5.3.2 Simulation Results
6. Conclusion
6.1 Summary
6.2 Further Research Work
BIBLIOGRAPHY


