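Graduate student: Jyun-Jie Yan (顏俊傑)
Thesis title: Retrievable Units Extraction of XML Documents by Element Hierarchical Clustering (使用階層式元素叢集於延伸標記語言文件之可回傳單位粹取)
Advisor: Hahn-Ming Lee (李漢銘)
Degree: Master's
Institution: National Taiwan University of Science and Technology (國立臺灣科技大學)
Department: Department of Computer Science and Information Engineering (資訊工程系)
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2004
Graduation academic year: 92 (ROC calendar)
Language: Chinese
Pages: 60
Keywords (Chinese): 延伸標記語言文件檢索
Keywords (English): XML document retrieval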
As the eXtensible Markup Language (XML) is adopted by more and more application domains, a large quantity of XML documents can be expected to circulate on the Web in the near future. How to query and search such large collections of XML documents efficiently therefore becomes an important problem.
In this thesis, we focus on a problem that we call the Retrievable Units Extraction problem, which arises from the common pattern that current XML documents use to describe information. By observation, an XML document is often composed of several retrievable units, but the document itself carries no explicit information about them. As a result, current retrieval approaches can only treat every sub-tree of an XML document as a possible result. Not all sub-trees are meaningful as retrieval results, however, so returning arbitrary sub-trees is sometimes unsuitable for users. We therefore propose an approach to the Retrievable Units Extraction problem, named Element Hierarchical Clustering. The proposed measure uses the element hierarchy information of an XML document and the semantic relations between its elements as factors for finding suitable retrievable units. We expect that solving the Retrievable Units Extraction problem can improve the performance of current XML retrieval approaches, and the experiments in this thesis show that the proposed measure is workable for that purpose.
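To make the problem statement concrete, the following minimal Python sketch enumerates every sub-tree of a small XML document, which is the candidate set a retrieval approach faces when the document carries no retrievable-unit information. The sample document, its element names, and the helper `candidate_subtrees` are hypothetical illustrations; the sketch does not implement the Element Hierarchical Clustering measure itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are illustrative only.
SAMPLE = """<library>
  <book><title>XML Basics</title><author>A. Writer</author></book>
  <book><title>IR Methods</title><author>B. Reader</author></book>
</library>"""


def candidate_subtrees(xml_text):
    """Return (path, element) pairs, one per element, i.e. one per sub-tree root."""
    root = ET.fromstring(xml_text)
    found = []

    def walk(elem, path):
        here = f"{path}/{elem.tag}"
        found.append((here, elem))
        for child in elem:
            walk(child, here)

    walk(root, "")
    return found


candidates = candidate_subtrees(SAMPLE)
for path, elem in candidates:
    # Collapse whitespace so each sub-tree's text content prints on one line.
    text = " ".join("".join(elem.itertext()).split())
    print(path, "->", text)
```

Every element, from the whole `library` down to a single `title`, appears as a possible result. In this sample a reader would probably regard each `book` as the natural unit to return, but nothing in the document says so.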
Abstract II
Acknowledgement IV
Content VI
List of Figures IX
List of Tables XI
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Issues in XML retrieval 4
1.2.1 Query difficulty 4
1.2.2 Structure retrieval problem 5
1.3 Goal and Design 6
1.4 Organization of the thesis 6
Chapter 2 Background 7
2.1 XML 7
2.1.1 XML syntax 9
2.1.2 XML applications 10
2.2 XML retrieval problem 13
2.3 Approaches for XML retrieval 15
2.3.1 Rule-based approaches 15
2.3.1.1 Ontology Inference Layer (OIL) 15
2.3.2 Statistics-based approaches 17
2.3.2.1 Modified TF-IDF measure 17
2.3.2.2 XIRQL 18
2.3.2.3 XPRES 19
2.3.2.4 Xircus 20
2.3.2.5 Coherent Partial Documents (CPDs) 21
2.4 Summary on the statistics-based approaches 22
Chapter 3 Element Hierarchical Clustering 24
3.1 Concept of element hierarchical clustering 24
3.2 Details of element hierarchical clustering 26
3.2.1 Element coverage measure 26
3.2.2 Sub-element tags variety measure 29
3.2.3 Measure combination 31
3.3 System architecture 32
3.4 Characteristics of proposed measure 35
Chapter 4 Experiments 38
4.1 Characteristics of experimental dataset 38
4.2 Evaluation criteria 41
4.2.1 Score evaluation scheme on the dataset 41
4.2.2 Performance measure scheme 43
4.3 Experimental result 45
4.4 Discussion 51
4.4.1 Tuning ability discussion 51
4.4.2 Limitations 53
4.4.2.1 Tuning ability limitation 53
4.4.2.2 Limited dataset 56
Chapter 5 Conclusion and Future work 57
5.1 Conclusion 57
5.2 Future work 59
5.2.1 Naming rule mining 59
5.2.2 Relation mining for combining the unit information and the content information 60
Chapter 6 References 61
[1] eXtensible Markup Language (XML) http://www.w3.org/TR/REC-xml/
[2] K. Hatano, H. Kinutani, “Determining the Unit of Retrieval Results for XML Documents,” Proceedings of the First Workshop of the INitiative for the Evaluation of XML retrieval (INEX), pp. 57-64, March 2003.
[3] R. Baeza-Yates, B. Ribeiro-Neto, Modern Information Retrieval, Addison-Wesley, 1999.
[4] M. Kobayashi, K. Takeda, “Information Retrieval on the Web,” ACM Computing Surveys, Vol. 32, Issue 2, pp. 144-173, 2000.
[5] E. Kotsakis, “Structured information retrieval in XML documents,” Proceedings of the 2002 ACM symposium on Applied computing, pp. 663-667, 2002.
[6] Document Type Definition (DTD) http://www.w3.org/XML/1998/06/xmlspec-report
[7] XML Schema http://www.w3.org/XML/Schema
[8] P. Fankhauser, “XQuery Formal Semantics State and Challenges,” ACM SIGMOD Record, Vol. 30, Issue 3, September 2001.
[9] A. Bonifati, S. Ceri, “Comparative Analysis of Five XML Query Languages,” ACM SIGMOD Record, Vol. 29, Issue 1, March 2000.
[10] A. Theobald, G. Weikum, “System performance and benchmarking: The XXL search engine: ranked retrieval of XML data using indexes and ontologies,” Proceedings of the 2002 ACM SIGMOD international conference on Management of data, pp. 615-615, 2002.
[11] T.T. Chinenyanga, N. Kushmerick, “Expressive retrieval from XML documents,” Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 163-171, 2001.
[12] D. Carmel, Y. S. Maarek, M. Mandelbrod, Y. Mass, A. Soffer, “Structured documents: Searching XML documents via XML fragments,” Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 151-158, July 2003.
[13] T. Schlieder, H. Meuss, “Querying and ranking XML documents,” Journal of the American Society for Information Science and Technology, Vol. 53, Issue 6, May 2002.
[14] Standard Generalized Markup Language (SGML) http://www.w3.org/MarkUp/SGML/
[15] World Wide Web Consortium (W3C) http://www.w3c.org
[16] W. Treese, “Putting it together: what's all the noise about XML?,” netWorker, Vol. 2, Issue 5, pp. 27-29, 1998.
[17] J. B. Bedunah, “XML: the future of the Web,” Crossroads, Vol. 6, Issue 2, November 1999.
Available at http://www.acm.org/crossroads/xrds6-2/future.html
[18] Really Simple Syndication (RSS) http://blogs.law.harvard.edu/tech/rss
[19] Universal Description, Discovery and Integration (UDDI) http://www.uddi.org/
[20] Scalable Vector Graphics (SVG) http://www.w3.org/TR/SVG/
[21] eXtensible Stylesheet Language Format Object (XSL-FO) http://www.w3.org/TR/xsl/slice6.html#fo-section
[22] XML User Interface Language (XUL) http://www.mozilla.org/projects/xul/
[23] eXtensible Application Markup Language (XAML) http://longhorn.msdn.microsoft.com/lhsdk/core/overviews/about xaml.aspx
[24] Resource Description Framework (RDF) http://www.w3.org/RDF/
[25] Topic map http://www.topicmaps.org/
[26] XPath http://www.w3.org/TR/xpath
[27] XQuery http://www.w3.org/TR/xquery/
[28] Google http://www.google.com
[29] Semantic Web http://www.semanticweb.org/
[30] Ontology Inference Layer (OIL) http://www.ontoknowledge.org/oil/
[31] DARPA Agent Markup Language (DAML) http://www.daml.org
[32] Y. Y. Shen, “Extending RDF in distributed knowledge-intensive applications,” Future Computer Systems Journal, Vol. 20, Issue 1, Jan 2004.
[33] U. Shah, T. Finin, A. Joshi, “Information Retrieval On The Semantic Web,” Proceedings of the eleventh international conference on Information and knowledge management, pp. 461-468, 2002.
[34] K. Yokota, T. Kunishima, T. B. Liu, “Semantic Extensions of XML for Advanced Applications,” Information Technology for Virtual Enterprises, pp. 49-57, 2001.
[35] L. Feng, E. Chang, T. S. Dillon, “A semantic network-based design methodology for XML documents,” ACM Transactions on Information Systems (TOIS), Vol. 20, Issue 4, October 2002.
[36] J. Mayfield, T. Finin, “Information retrieval on the Semantic Web: Integrating inference and retrieval,” Special Interest Group on Information Retrieval (SIGIR) Semantic Web Workshop, 2003.
Available at http://www.umbc.edu/~finin/papers/sigir03.pdf.
[37] N. Fuhr, K. Grosjohann, “XIRQL: An XML query language based on information retrieval concepts,” ACM Transactions on Information Systems (TOIS), Vol. 22, Issue 2, 2004.
[38] M. Lalmas, “Dempster-Shafer's Theory of Evidence Applied to Structured Documents: Modeling Uncertainty,” Proceedings of the 20th Annual International ACM SIGIR, pp. 110-118, 1997.
[39] H. Meyer, I. Bruder, G. Weber, A. Heuer, “The Xircus Search Engine,” Rostocker Informatik-Berichte, 2003.
Available at http://qmir.dcs.qmul.ac.uk/inex/Papers/final_p11_Meyer_etal.pdf.
[40] C. Yu, H. Qi, H. V. Jagadish, “Integration of IR into an XML Database,” Proceedings of the First Workshop of the INitiative for the Evaluation of XML retrieval (INEX), 2002.
Available at http://qmir.dcs.qmul.ac.uk/inex/Papers/final_p15_Yu_etal.pdf.
[41] J.E. Wolff, H. Flörke, A.B. Cremers, “XPRES: a Ranking Approach to Retrieval on Structured Documents,” Technical Report of University of Bonn, 1999.
Available at ftp://ftp3.informatik.uni-bonn.de/pub/paper/tr/IAI-TR-99-12.ps.gz.
[42] L. Guo, F. Shao, C. Botev, J. Shanmugasundaram, “XRANK: ranked keyword search over XML documents,” Proceedings of the ACM SIGMOD international conference on Management of data, June, 2003.
Available at http://www.cs.cornell.edu/People/jai/papers/XRank.pdf.
[43] E. Kotsakis, “Structured information retrieval in XML documents,” Proceedings of the 2002 ACM symposium on Applied computing, pp. 663-667, 2002.
[44] J. Kamps, M. Marx, M. de Rijke, B. Sigurbjörnsson, “XML retrieval: what to retrieve?,” Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 409-417, 2003.
[45] C. E. Shannon, “A mathematical theory of communication,” Bell System Technical Journal, Vol. 27, pp. 379-423 and 623-656, 1948.
[46] G. A. Miller, R. Beckwith, C. Fellbaum, D. Gross, K.J. Miller, “Introduction to WordNet: an on-line lexical database,” International Journal of Lexicography, Vol. 3, Issue 4, Winter 1990.
[47] G. Kazai, M. Lalmas, J. Reid, “Construction of a test collection for the focussed retrieval of structured documents,” European Conference on Information Retrieval (ECIR), pp. 88-103, 2003.
[48] P. Ogilvie, J. Callan, “Language Models and Structured Document Retrieval,” Proceedings of the Initiative for the Evaluation of XML Retrieval Workshop, 2002.
Available at http://qmir.dcs.qmul.ac.uk/inex/Papers/final_p1_Ogilvie_etal.pdf
[49] B. Choi, “What are real DTDs like?,” Fifth International Workshop on the Web and Databases, pp. 43-48, 2002.