National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

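Graduate student: 呂永祥
Graduate student (romanized): Yung-hsiang Lu
Thesis title: 以最適化需求為導向之XML資料擷取技術
Thesis title (English): A User-Expectation-Oriented XML Data Retrieval Technique
Advisor: 趙景明
Degree: Master's
Institution: Soochow University (東吳大學)
Department: Department of Information Science (資訊科學系)
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2005
Academic year of graduation: 93 (ROC calendar; 2004-2005)
Language: Chinese
Number of pages: 85
Keywords (Chinese): 可能文件集合; 語意矩陣; 資料擷取; XML; 結構檢索
Keywords (English): possible document set; data retrieval; XML; structure indexing; semantic matrix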
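Abstract (translated from the Chinese original): Extensible Markup Language (XML) lets authors define the types and formats of the elements in a document. With a shared DTD, document exchange follows a common standard, so different organizations and units can exchange messages smoothly. XML has replaced the traditional Hypertext Markup Language (HTML) as the mainstream way of presenting web data. As large numbers of XML documents circulate on the Internet, problems of information overload and data complexity arise. This thesis therefore proposes a User-Expectation-Oriented XML Retrieval Technique, and designs and builds a prototype XML Retrieval System that lets users find the documents that best match their needs and intended semantics. The thesis makes three main contributions. First, we propose an XML structure reforming mechanism and use it to analyze the internal structure and content of XML documents. Second, we propose a semantic analysis module that analyzes the semantic relationships between elements, so that the degree of semantic relatedness between keywords is clearly defined and quantified. Third, we propose a data exploration module that re-examines each XML document that could become a result, in order to find the documents that best match the user's intent. In experiments, we compared the method used in this thesis, retrieval based on both semantic and structural information, with retrieval based on structural information alone. The results show that the former performs better both in user satisfaction and in reducing the memory occupied by index data.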
Abstract (author's English version): Because the format and element types of an XML document can be clearly defined, different organizations and communities can exchange information based on shared DTD definitions. XML has replaced HTML as the preferred platform for information exchange on the Internet. However, the growing volume of XML documents also causes problems such as increasingly complex data. In this research we propose a User-Expectation-Oriented XML Retrieval Technique and implement a prototype XML data retrieval system that finds the documents that best match the user's semantic expectations and needs. This research makes three major contributions. First, we propose a structure reforming mechanism and its sub-components, which analyze the structure and content of XML documents at the start of the retrieval process. Second, we propose a semantic analysis module that uses the correlations between key terms, together with document structure information, to quantify the degree of semantic correlation. Third, we propose an information exploration module that synthesizes the information coming from the previous modules and picks the most suitable results from the possible document set. In experiments, we compared our approach with one that uses only structure-based retrieval. The performance evaluation shows that our approach is more effective in terms of user satisfaction and the memory occupied by the index representation.
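As a rough illustration of the pipeline the abstract outlines (structure analysis, quantified relatedness between terms, then re-ranking of candidate documents), here is a minimal Python sketch. It is not the thesis's implementation: the function names, the co-occurrence count standing in for a semantic matrix, the scoring rule, and the sample documents are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from itertools import combinations


def structure_paths(xml_text):
    """Flatten an XML document into {element path: [text values]}."""
    paths = defaultdict(list)

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}" if prefix else node.tag
        text = (node.text or "").strip()
        if text:
            paths[path].append(text)
        for child in node:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return dict(paths)


def terms_of(text):
    """Split element text into a set of lowercase terms."""
    return set(text.lower().split())


def cooccurrence(all_paths):
    """Count how many elements contain both terms of each pair.
    Illustrative stand-in for a semantic matrix, not the thesis's measure."""
    counts = defaultdict(int)
    for paths in all_paths:
        for texts in paths.values():
            for text in texts:
                for pair in combinations(sorted(terms_of(text)), 2):
                    counts[pair] += 1
    return counts


def score(paths, path, query, counts):
    """Hypothetical score: number of query terms found under `path`, plus
    the co-occurrence weight of each query-term pair found there."""
    found = set()
    for text in paths.get(path, []):
        found |= terms_of(text) & set(query)
    pair_weight = sum(counts[p] for p in combinations(sorted(found), 2))
    return len(found) + pair_weight


# Made-up sample documents, for illustration only.
docs = {
    "a.xml": "<book><title>xml data retrieval</title><note>index</note></book>",
    "b.xml": "<book><title>cooking basics</title><note>xml retrieval</note></book>",
}
parsed = {name: structure_paths(text) for name, text in docs.items()}
counts = cooccurrence(parsed.values())
ranked = sorted(
    parsed,
    key=lambda n: score(parsed[n], "book/title", ["xml", "retrieval"], counts),
    reverse=True,
)
print(ranked)
```

In this sketch, a purely structural match would only check whether the query terms occur under `book/title`. The co-occurrence weight adds a crude signal of how strongly the matched terms are related across the collection, which is the kind of combination of structural and semantic information the abstract describes.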
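Acknowledgments
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
1. Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives
1.4 Research Process
1.5 Thesis Organization
2. Basic Concepts
2.1 Electronic Documents and XML
2.1.1 Origins and Goals of XML
2.1.2 Features and Trends of XML
2.1.2.1 XML Is Self-Describing
2.1.2.2 XML Supports Document Integration
2.1.2.3 XML Is Structured
2.1.2.4 XML Separates Content from Presentation
2.1.2.5 XML Is Versatile and Extensible
2.2 Data Retrieval Techniques
2.2.1 Overview of Information Retrieval Systems
2.2.1.1 Electronic Information Retrieval Systems
2.2.1.2 Advantages and Disadvantages of Electronic Information Retrieval
2.2.2 Development of Electronic Information Retrieval Systems
2.2.2.1 Development of Western-Language Electronic Information Retrieval Systems
2.2.2.2 Development of Chinese-Language Electronic Information Retrieval Systems
2.2.3 Retrieval Models
2.2.4 Retrieval Strategies
2.2.5 Retrieval Evaluation
2.2.5.1 Retrieval Quality Evaluation
2.2.5.2 Evaluation Surveys
3. Literature Review
3.1 Document Retrieval Systems
3.2 Related Research on Search Terms
3.3 The Importance of Sentence Judgment
3.4 Chinese Word Segmentation
3.5 Indexing Techniques for Structured Documents
3.6 The Semantic Web
3.7 Natural Language Processing and Semantic Analysis
4. Architecture of the Data Retrieval System
4.1 XML Document Structure Reconstruction and Indexability
4.2 Building the Semantic Index
4.3 Usability Assessment for Data Retrieval
4.4 Prototype of the Data Retrieval System
5. Structure Reforming
5.1 Structure Reforming Module
5.1.1 Content Parser
5.1.2 Structure Tree Generator
5.1.3 Element Locator
5.2 Structure Reforming Example
5.2.1 Example of Semantic Strength and Weight Strength
5.2.2 Example of Reforming Results
6. Semantic Analysis
6.1 Semantic Analysis Module
6.2 Semantic Analysis Example
7. Data Exploration
7.1 Data Exploration Module
7.2 Mode of Operation
8. Experimental Method and Performance Analysis
8.1 Experimental Method
8.2 Performance Analysis
8.2.1 System Satisfaction Test
8.2.2 Operational Performance Test
9. Conclusions and Future Research
References
Appendix A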