(18.206.238.77) 您好!臺灣時間:2021/05/12 01:56
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果

詳目顯示:::

: 
twitterline
研究生:徐任鵬
研究生(外文):Jen-Peng Hsu
論文名稱:多重關鍵字搜尋在動態XML文件之研究
論文名稱(外文):A Study of Multiple Keywords Searching on Dynamic XML Document
指導教授:陳靖國陳靖國引用關係
指導教授(外文):Jeang-Kuo Chen
學位類別:碩士
校院名稱:朝陽科技大學
系所名稱:資訊管理系碩士班
學門:電算機學門
學類:電算機一般學類
論文種類:學術論文
論文出版年:2008
畢業學年度:96
語文別:中文
論文頁數:92
中文關鍵詞:文件型態定義關鍵字搜尋標記索引資料大小單元可擴展標示語言
外文關鍵詞:Keyword searchLabelingIndexingGranularityDTDXML
相關次數:
  • 被引用被引用:0
  • 點閱點閱:329
  • 評分評分:系統版面圖檔系統版面圖檔系統版面圖檔系統版面圖檔系統版面圖檔
  • 下載下載:38
  • 收藏至我的研究室書目清單書目收藏:0
關鍵字搜尋對大部份的使用者而言,是一個常用的資料搜尋方法,因為使用者不需要學習複雜的查詢語言或了解資料基本的結構。近年來,XML不但是網際網路上流行的資訊交換媒介,也是儲存資料的主流樣式。透過關鍵字來搜尋XML文件中的資料是目前重要的研究項目之一。對大量XML資料做查詢,要加快查詢速度,最有效的方法之一就是建立良好的索引結構並輔以相關的搜尋技術,例如路徑索引、標記或編號方法等等。然而,當XML資料有變動時,用這些技術所建立的索引必須加以重建以及時反應更新後的資料,這樣一來不但消耗不少時間而且又會降低系統績效。雖然有些學者提出所謂的動態XML資料標記法(LSDX)可以動態更新標記,不用重建索引和標記,但是有兩個問題仍然沒有完全解決,因而影響它的實用性。第一個問題是在建立XML資料索引樹的過程中,同一階層的節點最多只能標記25個,超過26個以上就做不下去。第二個問題是對索引樹作更動時,若在兩個相鄰節點A和B之間,新增一個節點C後,又在A和C之間再新增一個節點D,則會造成新增節點D的標記無法編碼的窘況。在本論文中,我們提出兩個標記編碼規則來徹底解決上述的兩個問題。在建立索引樹時,讓同一階層欲標記編碼的節點個數沒有限制。在更動索引樹時任何新增的節點,其標記都有適當之編碼對應,而且以不增加新增節點的標記編碼長度為前提,若連續新增的節點不超過五個,其編碼長度和相鄰的兄弟節點一樣,超過五個以後,每隔五個編碼才會增加一個位數。
透過多重關鍵字來對大量的XML資料做查詢,有了良好的索引結構之後,可以加快查詢速度,但是資料搜尋得到的回傳結果,有可能是純粹簡單元素的內容、複雜元素的部份子元素或整份XML文件,若回傳結果單元太小,則資料可能不夠詳細。若回傳結果單元太大,可能會包含過多不相關的資料,因此有必要事先決定適當的回傳結果的資料大小單元。本論文對於XML多重關鍵字查詢,提出一個適當的XML資料元素片段擷取方法。使用者只需輸入多個關鍵字來查詢資料而不必描述關鍵字們之間的階層關係。我們的方法先從XML文件附屬的DTD中,摘取出若干個邏輯的XML元素片段,叫做ACS-D,然後再利用每一個ACS-D從XML文件中辨識出符合這個ACS-D結構的若干個實際XML資料元素片段,叫做ACS-X。每個ACS-X是XML實際資料的一個適當的資料大小單元,包含使用者所下的所有關鍵字以及關鍵字之間的階層架構,讓使用者得以瀏覽適當範圍內有興趣的資料。
Keyword search is an effective approach for most users to search information because they neither need to learn complex query languages nor need to understand the underlying structure of data. Recently, the eXtensible Mark-up Language (XML) has become a popular medium for data exchange or storage. Therefore, keyword search on XML documents has become one of the important researches. To speed up query on large amount of data, one of the most efficient methods is to build a good index. In order to facilitate query of XML data, some approaches such as path indexing, labeling, and numbering scheme have been proposed. However, if XML data is updated, the indexes derived by these approaches must be rebuilt. It is a time-consuming and performance-degrading job. Although the method LSDX (Labeling Scheme for Dynamic XML data) proposed by some scholars can dynamically add new labels without reconstructing the index structure or updating existing old labels. However, there are two defects in LSDX which affect the practicality of LSDX. The first defect is that the nodes within the same level can be labeled no more than twenty five when LSDX builds the XML index tree. The second defect is described as follows. When updating the index tree, if LSDX inserts a new node C between two adjacent nodes A, B and then inserts a new node D between the two nodes A and C, the label of the node D cannot be coded because of the label coding strategy of LSDX. In this thesis, we propose two label coding rules to completely solve the two defects of LSDX. With our method, unlimited nodes within the same level can be labeled when LSDX builds the index. When updating the index tree, our method can give a suitable label code to any new-inserted node. Our method does not increase the label length of a new-inserted node as possible. If the number of the new-inserted nodes is not more than five, the label length is as the same as its sibling nodes. If the number of new-inserted nodes is more than five, every five codes will just increase one digit.
To query on large amount of XML data with multiple keywords, a good index can be used to speed up query. The returned results of an XML document search may be the content of the simple type element, part subelement of the complexity type element or the whole XML document. If the returned granularity is too small, then the data may be insufficient to the user. If the returned granularity is too large, it may contain too much unrelated data. Therefore, one of the difficulties for keyword search is how to determine the appropriate granularity of returned results. This thesis proposes a method to abstract appropriate segments of XML elements for multiple keywords queries. The user only needs to input several related keywords without describing the hierarchical relationship among these keywords. From the associated DTD of an XML document, our method can abstract several logical XML element segments called ACS-D’s. These ACS-D’s then can be used to identify some practical XML element segments, called ACS-X’s, from the XML document. As an appropriate granularity of XML instance data, each ACS-X contains all inputted keywords and the hierarchy architecture of the keywords. Users can browse interesting information from the returned ACS-X’s in an appropriate range.
目錄
中文摘要.....................................I
Abstract...................................III
誌謝.........................................V
目錄........................................VI
表目錄......................................IX
圖目錄.......................................X
第一章、緒論.................................1
1.1 研究背景和動機......................1
1.2 研究目的和架構......................4
第二章、文獻探討.............................7
2.1 XML文件.............................7
2.2 Document Type Definition (DTD)......9
2.3 XML索引研究之探討..................12
2.4 XML關鍵字搜尋研究之探討............18
第三章、研究架構與方法......................21
3.1 索引樹建立與更動技術...............21
3.1.1 XML資料索引樹的節點結構............21
3.1.2 建立索引樹節點標記.................22
3.1.3 新增索引樹節點標記.................26
3.1.4 刪除索引樹節點標記.................34
3.1.5 修改索引樹節點元素的內容...........35
3.2 XML關鍵字搜尋方法描述..............37
3.2.1 ACS的結構描述......................37
3.2.2 ACS的產生..........................39
3.2.3多重關鍵字搜尋.........................43
第四章、實驗與分析..........................63
4.1 實驗目的...........................63
4.2 實驗環境...........................63
4.3 效能評估項目.......................66
4.4 實驗結果分析.......................68
4.4.1 第一組實驗群.......................68
4.4.2 第二組實驗群.......................79
第五章、結論與未來工作......................85
參考文獻....................................88
表目錄
表1、建XML資料索引樹的時間..................69
表2、XML索引樹連續新增子樹的時間............71
表3、XML索引樹隨機新增子樹的時間............74
表4、XML文件索引樹新增子樹兩種狀況的時間....75
表5、刪除XML索引樹子樹的時間................78
表6、狀況一查詢等待時間和回傳結果正確率.....81
表7、狀況二查詢等待時間和回傳結果正確率.....83
圖目錄
圖1、一個RDB書籍資料XML元素片段和XML邏輯樹狀結構......3
圖2、一份圖書館資料XML文件............................3
圖3、一份書店資料XML文件..............................9
圖4、一份書店XML文件附屬DTD檔案......................10
圖5、一份書店XML文件附屬XML Schema檔案...............11
圖6、一份書店資料XML文件和XML邏輯樹狀結構............12
圖7、簡單字首標記方法................................15
圖8、新增節點時,部份節點標記必須更改................15
圖9、杜威字首為基礎的編號方法........................16
圖10、新增節點時,部份節點標記必須更改...............17
圖11、XML Schema的邏輯樹狀結構.......................20
圖12、XML資料索引樹的節點結構........................22
圖13、XML資料索引樹中適當的節點標記編碼..............25
圖14、同一Level的書籍節點個數超過25個的情況..........25
圖15、同一Level節點個數超過25個時,仍可被適當地標記..25
圖16、在XML資料樹中,前無節點的位置新增節點..........30
圖17、索引樹新增節點時,適當的標記編碼...............30
圖18、在XML資料樹中,兩個節點之間新增兩個節點........32
圖19、索引樹新增節點時,LSDX節點標記編碼所產生的問題.32
圖20、索引樹新增節點時,適當的標記編碼...............32
圖21、在XML資料樹中,前有節點且後無節點的位置新增兩個節點....33
圖22、索引樹新增節點時,適當的標記編碼...............34
圖23、在XML資料樹中,刪除一棵子樹....................35
圖24、在XML資料索引樹中,刪除一棵子樹................35
圖25、在XML資料樹中,修改一個節點的內容..............36
圖26、透過索引樹,修改一個XML元素的內容..............36
圖27、ACS的兩種原型..................................38
圖28、產生ACS-D的流程圖..............................41
圖29、兩種類型ACS-X..................................43
圖30、多個關鍵字搜尋XML資料的流程圖..................46
圖31、一份書店資料XML文件............................49
圖32、書店XML文件的邏輯樹狀結構......................50
圖33、一份書店XML文件的附屬DTD檔案...................50
圖34、DTD邏輯樹狀結構................................50
圖35、書店XML文件的索引樹TX..........................53
圖36、書店XML文件附屬DTD的TD.........................54
圖37、MD中ACS-D資訊..................................54
圖38、第一個例子MK的暫時結果.........................57
圖39、第一個例子MK的暫時結果.........................57
圖40、第一個例子MK的暫時結果.........................57
圖41、第一個例子MK的最後結果.........................57
圖42、從TX找出的四個ACS-X以S、T、U、V代表............58
圖43、S、T、U、V子樹對應的四棵XML資料邏輯樹..........59
圖44、第一個例子最後回傳的結果.......................59
圖45、第二個例子MK的最後結果.........................61
圖46、從TX找出的三個ACS-X............................61
圖47、三個ACS-X對應的三棵XML資料邏輯樹...............61
圖48、第二個例子最後回傳的結果.......................62
圖49、DBLP附屬的DTD邏輯樹狀結構......................65
圖50、修改過的DTD邏輯結構............................66
圖51、建索引樹平均時間曲線圖.........................69
圖52(a)、狀況一所需平均時間曲線圖....................72
圖52(b)、狀況一所需平均時間曲線圖....................72
圖53、狀況二所需平均時間曲線圖.......................74
圖54、1MB索引樹,兩個狀況所需平均時間曲線圖..........76
圖55、40MB索引樹,兩個狀況所需平均時間曲線圖.........76
圖56、80MB索引樹,兩個狀況所需平均時間曲線圖.........77
圖57、隨機刪除子樹所需平均時間曲線圖.................78
圖58、狀況一查詢平均等待時間曲線圖...................81
圖59、狀況一查詢結果的正確率曲線圖...................82
圖60、狀況二查詢平均等待時間曲線圖...................83
圖61、狀況二查詢結果的正確率曲線圖...................84
[1]S. Agrawal, S. Chaudhuri, and G. Das(2002), “DBXplorer: A System for Keyword-Based Search over Relationship Databases,” Proceedings of the 18th International Conference on Data Engineering, pp. 5-16.
[2]G. Amato, F. Debole, F. Rabitti, and P. Zezula(2003), “Yet Another Path Index for XML Searching,” Proceedings of the 7th European Conference on Research and Advanced Technology for Digital Libraries, pp. 176-187.
[3]G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, and S. Sudarshan(2002), “Keyword Searching and Browsing in Databases Using BANKS,” Proceedings of the 18th International Conference on Data Engineering, pp. 431-440.
[4]J. K. Chen and J. P. Hsu, “Algorithms for Indexing Dynamic XML Data(2006),” Proceedings of the 9th Joint Conference on Information Sciences.
[5]J. K. Chen and J. P. Hsu, “Appropriate Cutting Segments of XML Elements for Multiple Keywords Queries(2007),” Proceedings of the 14th International Conference on Distributed Multimedia Systems.
[6]E. Cohen, H. Kaplan, and T. Milo(2002), “Labeling Dynamic XML Trees,” Proceedings of the 21th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp. 271-281.
[7]S. Cohen, J. Mamou, Y. Kanza, and Y. Sagiv(2003), “XSEarch: A Semantic Search Engine for XML,” Proceedings of the 29th International Conference on VLDB, pp. 45-56.
[8]F. B. Cooper, N. Sample, J. M. Franklin, R. G. Hjaltason, and M. Shadmon(2001), “A Fast Index for Semistructured Data,” Proceedings of the 27th International Conference on VLDB, pp. 341-350.
[9]M. Duong and Y. Zhang(2005), “LSDX: A New Labelling Schema Dynamically Updating XML Data,” Proceedings of the 16th Australasian Conference on Database Technologies, pp. 185-193.
[10]D. Florescu, D. Kossmann, and I. Manolescu(2000), “Integrating Keyword Search into XML Query Processing,” Proceedings of the 9th International World Wide Web Conference on Computer Networks, Vol. 33, pp. 119-135.
[11]N. Fuhr and K. Grobjohann(2001), “XIRQL: A Query Language for Information Retrieval in XML documents,” Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 172-180.
[12]T. Grust(2002), “Accelerating XPath Location Steps,” Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, pp. 109-120.
[13]L. Guo, F. Shao, C. Botev, and J. Shanmugasundaram(2003), “XRANK: Ranked Keyword Search over XML Documents,” Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 16-27.
[14]V. Hristidis and Y. Papakonstantinou(2002), “DISCOVER: Keyword Search in Relationship Databases,” Proceedings of the 28th International Conference on VLDB, pp. 670-681.
[15]V. Hristidis, Y. Papakonstantinou, and A. Balmin(2003), “Keyword Proximity Search on XML Graphs,” Proceedings of the 19th International Conference on Data Engineering, pp. 367-378.
[16]M. Kaelin, Database Optimization: Increase Query Performance with Indexes and Statistics, TechRepublic, 6 February 2004 http://techrepublic.com.com/5100-10878_11-5146588.html?tag=search.
[17]H. Kaplan, T. Milo, and R. Shabo, A Comparison of Labeling Schemes for Ancestor Queries, http://www.math.tau.ac.il/~haimk/papers/comparison.ps.
[18]M. Ley, DBLP Computer Science Bibliography, 1998, http://dblp.uni-trier.de/.
[19]Q. Li and B. Moon(2001), “Indexing and Querying XML Data for Regular Path Expressions,” Proceedings of the 27th International Conference on VLDB, pp. 361-370.
[20]H. Meuss and M. C. Strohmaier(1999), “Improving Index Structures for Structured Document Retrieval,” Proceedings of the 21st BCS IRSG Colloquium on IR.
[21]I. Tatarinov, S. Viglas, K. Beyer, J. Shanmugasundaram, E. Shekita, and C. Zhang(2005), “Storing and Querying Ordered XML Using a Relational Database System,” Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 204-215.
[22]A. Theobald and G. Weikum(2002), “The Index-Based XXL Search Engine for Querying XML Data with Relevance Ranking,” Proceedings of the 8th International Conference on Extending Database Technology, Vol. 2287, pp. 477-495.
[23]W. Wang, H. Jiang, H. Lu, and X. J. Yu(2003), “PBiTree Coding and Efficient Processing of Containment Joins,” Proceedings of the 19th International Conference on Data Engineering, pp. 391-402.
[24]J. Xu, J. Lu, W. Wang, and B. Shi(2006), “Efficient Keyword Search in XML Documents Based on MIU,” Proceedings of the 11th International Conference on Database Systems for Advanced Applications, Vol. 3882, pp. 702-716.
[25]Y. Xu and Y. Papakonstantinou(2005), “Efficient Keyword Search for Smallest LCAs in XML Databases,” Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 527-538.
[26]X. J. Yu, D. Luo, X. Meng, and H. Lu(2005), “Dynamically Updating XML Data: Numbering Scheme Revisited,” World Wide Web: Internet and Web Information System, Vol. 8, Issue 1, pp. 5-26.
[27]W3C, Extensible Markup Language (XML) 1.0 W3C Recommendation, 10 February 1998, http://www.w3.org/TR/1998/REC-xml-19980210.
[28]W3C, Extensible Markup Language (XML) 1.1 W3C Recommendation, 4 February 2004, http://www.w3.org/TR/2004/REC-xml11-20040204/.
[29]W3C, XML-QL: A Query Language for XML W3C Recommendation, 19 August 1998, http://www.w3.org/TR/1998/NOTE-xml-ql-19980819/.
[30]W3C, XML Schema Part 0: Primer W3C Recommendation, 2 May 2001, http://www.w3.org/TR/2001/REC-xmlschema-0-20010502/.
[31]W3C, XML Schema Part 1: Structures W3C Recommendation, 2 May 2001, http://www.w3.org/TR/2001/REC-xmlschema-1-20010502/.
[32]W3C, XML Schema Part 2: Datatypes W3C Recommendation, 2 May 2001, http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/.
QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top
無相關期刊
 
系統版面圖檔 系統版面圖檔