National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

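Author: 陳鍾誠
Author (romanized): Chen, Chung Chen
Title (Chinese): 基於欄位填充機制的XML文件檢索方法
Title (English): XML Retrieval - A Slot-Filling Approach
Advisor: 項潔
Advisor (romanized): Jieh Hsiang
Degree: Doctoral
Institution: National Taiwan University
Department: Graduate Institute of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2002
Graduation academic year: 90
Language: English
Number of pages: 101
Keywords (translated from Chinese): markup language; information retrieval; knowledge representation; artificial intelligence; natural language processing; data mining
Keywords (English): XML; information retrieval; Ontology; Artificial Intelligence; Natural Language Processing; Data Mining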
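Since the W3C proposed the Extensible Markup Language (XML) in 1998, it has been widely used for document exchange and knowledge representation. Because XML documents carry semantic tags and are semi-structured, XML retrieval has considerable potential. To take full advantage of these properties, this thesis develops a retrieval mechanism for XML documents built on a specially designed knowledge representation.
Computers do not easily understand natural-language documents, which creates a semantic gap between humans and machines. For an XML retrieval system, the semantic gap can be divided into a query-side gap and a document-side gap. The query-side gap arises because structured query languages are hard to write, while the document-side gap arises because computers cannot understand XML documents. To address this problem, this thesis proposes a knowledge representation centered on the slot-tree (Slot-Tree Ontology) and applies it to the semantic gap in XML retrieval systems.
The slot-tree is an object-based knowledge representation that is particularly well suited to retrieving object-oriented XML documents. In this thesis, we first design the slot-tree to represent background knowledge about objects. We then develop a slot-filling mechanism (Slot-Filling Algorithm) that maps XML documents into the slot-tree in order to capture their semantics. Using the slot-tree and the filling mechanism, we design a semantic retrieval method for XML documents that comprises a multi-slot query interface, a retrieval model that makes full use of semantic tags, and a summarization technique, so that the system can retrieve XML documents precisely and dynamically extract semantic trees for browsing.
Because constructing a slot-tree is difficult, we develop a data mining algorithm (Slot-Mining Algorithm) that automatically extracts a slot-tree from a collection of XML documents. The method statistically analyzes the correlation coefficients between semantic tags and terms in order to find characteristic terms to fill into the slots, building the slot-tree automatically and making its construction easier.
We test the XML retrieval system on two real cases, the Taiwan Butterfly Digital Museum and the protein database Protein Information Resource. We find that the system retrieves XML documents more accurately and organizes the retrieval results for browsing. In addition, the program that builds slot-trees automatically can effectively fill characteristic terms into the slots, although manual revision is still needed to improve the quality of the slot-tree.
Finally, we summarize the contributions of this thesis to XML retrieval and compare it qualitatively with several existing methods to explain its strengths and weaknesses, and we propose possible directions for future research.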

Extensible Markup Language (XML) is widely used in data exchange and knowledge representation. A retrieval system that understands the content of XML documents is highly desirable. To improve the efficiency of XML retrieval systems, we design a set of methods based on an ontology called the slot-tree and use the slot-tree to support the XML retrieval process.
One obstacle to building smart computers is that computers cannot understand natural language as well as humans do. This is called the semantic gap between humans and computers. For XML retrieval systems, the semantic gap lies on both the query side and the document side. On the query side, it arises because structured queries are difficult for humans to write. On the document side, it arises because XML documents are difficult for computers to understand. To reduce these semantic gaps, we design an XML retrieval system based on the slot-tree ontology.
The slot-tree ontology is an object-based knowledge representation. First, we design the slot-tree ontology to represent the inner structure of an object. Next, we design a slot-filling algorithm that maps XML documents into the slot-tree ontology in order to capture their semantics. We then design an XML retrieval system that uses the slot-tree ontology and the slot-filling algorithm to reduce the semantic gap. The system contains a slot-based query interface, a semantic retrieval model for XML, and a program that extracts summaries for browsing.
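The idea of mapping XML documents into a slot-tree can be illustrated with a small sketch. This is not the thesis's slot-filling algorithm, whose details are not given in this record. It is a minimal illustration that assumes a slot-tree can be modelled as nested slots holding cue words, and that an element fills a slot when its tag name or text mentions one of those cue words. The names `SLOT_TREE`, `iter_slots`, `fill_slots`, and the sample document are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical slot-tree: every slot has cue words and optional child slots.
SLOT_TREE = {
    "butterfly": {
        "cues": set(),
        "children": {
            "wing": {"cues": {"wing", "forewing", "hindwing"}, "children": {}},
            "color": {"cues": {"color", "black", "yellow", "blue"}, "children": {}},
        },
    }
}


def iter_slots(tree, prefix=""):
    """Yield (slot_path, cue_words) for every slot in the tree, depth first."""
    for name, node in tree.items():
        path = f"{prefix}/{name}" if prefix else name
        yield path, node["cues"]
        yield from iter_slots(node["children"], path)


def fill_slots(xml_text, slot_tree):
    """Assign each XML element to the slots whose cue words it mentions.

    Assumption: a match is a shared lowercase token between the slot's cue
    words and the element's tag name or whitespace-split text.
    """
    filled = {}
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        text = (elem.text or "").strip()
        words = set(text.lower().split()) | {elem.tag.lower()}
        for path, cues in iter_slots(slot_tree):
            if words & cues:
                filled.setdefault(path, []).append((elem.tag, text))
    return filled


DOC = (
    "<butterfly>"
    "<name>Papilio xuthus</name>"
    "<description>yellow forewing with black veins</description>"
    "</butterfly>"
)
result = fill_slots(DOC, SLOT_TREE)
for slot_path, values in sorted(result.items()):
    print(slot_path, values)
```

In this toy document only the `description` element mentions any cue word, so it fills both the `butterfly/wing` and the `butterfly/color` slots. A real system would need tokenization, phrase matching, and a policy for conflicting slots, none of which this sketch attempts.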
However, constructing a slot-tree is not easy, so we design a slot-mining algorithm to construct slot-trees automatically. The slot-mining algorithm is a statistical approach based on correlation analysis between tags and words. Highly correlated terms are filled into the slot-tree as values. This algorithm eases the construction of the slot-tree.
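A tag-word correlation analysis of this kind might be sketched as follows. The record does not say which correlation measure the thesis uses, so this example substitutes the phi coefficient over a 2x2 contingency table (tag present or absent, word present or absent) as a stand-in. The function names `phi` and `mine_slots`, the 0.9 threshold, and the sample fragments are assumptions made for illustration.

```python
import math
from collections import defaultdict


def phi(n11, n10, n01, n00):
    """Phi coefficient of a 2x2 contingency table (0.0 if undefined)."""
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


def mine_slots(fragments, threshold=0.9):
    """Return {tag: sorted words whose phi with the tag is >= threshold}.

    fragments is a list of (tag, text) pairs, one per tagged text fragment.
    """
    docs = [(tag, set(text.lower().split())) for tag, text in fragments]
    vocab = set().union(*(words for _, words in docs))
    tags = {tag for tag, _ in docs}
    slots = defaultdict(list)
    for tag in tags:
        for word in vocab:
            n11 = sum(1 for t, ws in docs if t == tag and word in ws)
            n10 = sum(1 for t, ws in docs if t == tag and word not in ws)
            n01 = sum(1 for t, ws in docs if t != tag and word in ws)
            n00 = len(docs) - n11 - n10 - n01
            if phi(n11, n10, n01, n00) >= threshold:
                slots[tag].append(word)
    return {tag: sorted(words) for tag, words in slots.items()}


# Hypothetical tagged fragments, as if extracted from XML elements.
FRAGMENTS = [
    ("color", "black and yellow"),
    ("color", "yellow with blue spots"),
    ("habitat", "forest and mountain"),
    ("habitat", "lowland forest"),
]
mined = mine_slots(FRAGMENTS)
print(mined)
```

With these four fragments, only `yellow` is perfectly correlated with the `color` tag and only `forest` with the `habitat` tag, so those are the words filled into the two slots. Lowering the threshold admits weaker and noisier terms, so mined output may still need manual revision.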
Two XML collections are used as the test bed for our XML retrieval system, one about butterflies and the other about proteins. We found that the system is easy to use and performs well in both retrieval effectiveness and browsing quality. In addition, the slot-mining algorithm can fill important words into each slot. However, the mining results still need to be revised to improve the quality of the slot-tree.
Finally, we summarize our contributions to XML retrieval and compare our methods with some other methods. A qualitative analysis is given in the last chapter, after which we propose directions for future research.

Contents
Part 1 : Tutorial
1 Introduction
1.1 Motivation - Reducing the semantic gap in XML retrieval
1.2 Research problem - The semantic gap problem of XML
1.3 Research approaches
1.4 Outline of this thesis
2 Background - XML and Information Retrieval
2.1 Introduction
2.2 XML
2.3 Information retrieval
2.4 XML and information retrieval
2.5 Discussion
Part 2 : Methods
3 Slot-Tree Ontology and Slot-Filling Algorithm
3.1 Introduction
3.2 Background
3.3 Slot-tree ontology
3.4 Slot-filling algorithm
3.5 Discussion
4 An Ontology Based Approach for XML Querying, Retrieval and Browsing
4.1 Introduction
4.2 XML data - An example
4.3 Indexing structure
4.4 Query language and query interface
4.5 Ranking strategy
4.6 Browsing XML documents
4.7 Discussion
5 Building the Slot-Tree Ontology
5.1 Introduction
5.2 Background
5.3 The construction of slot-tree ontology
5.4 Slot-mining algorithm
5.5 Discussion
Part 3 : Case Study
6 Case Study - A Digital Museum of Butterflies
6.1 Introduction
6.2 Data representation in XML
6.3 Slot-tree ontology for butterflies
6.4 Query interface
6.5 Mapping XML documents into ontology
6.6 XML retrieval for butterflies
6.7 Mining slot-tree from XML documents
6.8 Discussion
7 Case Study - Protein Information Resource (PIR)
7.1 Introduction
7.2 Data representation in XML
7.3 Slot-tree ontology for protein
7.4 Mapping data into ontology
7.5 XML retrieval for proteins
7.6 Mining slot-tree from XML documents
7.7 Discussion
Part 4 : Conclusions
8 Conclusions
8.1 Comparison
8.2 Contribution
8.3 Conclusions and future research
References

