National Digital Library of Theses and Dissertations in Taiwan

Graduate student: 范瓊文
Graduate student (English name): Tiffany Fan
Thesis title (Chinese): 主題概念階層模型:概念式搜尋
Thesis title (English): Topic Concept-Hierarchy Model: Concept-Based Search
Advisor: 楊鎮華
Advisor (English name): Stephen J.H. Yang
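Degree: Master's
Institution: National Central University (國立中央大學)
Department: Graduate Institute of Network Learning Technology
Discipline: Education
Field: Educational Technology
Thesis type: Academic thesis
Publication year: 2005
Graduation academic year: 93 (ROC calendar, i.e. 2004-2005)
Language: Chinese
Number of pages: 49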
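Chinese keywords (translated): information retrieval; concept-based search; domain concept hierarchy; information retrieval model
English keywords: Information Extraction; Domain Conceptual Hierarchy; Concept-Based Information Retrieval; Information Retrieval; Retrieval Model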
Usage statistics:
  • Times cited: 5
  • Views: 515
  • Downloads: 33
  • Bookmarked: 3
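Abstract (translated from Chinese):
  From an academic perspective, information retrieval (IR) is used mainly to help users find document content in a library through a search mechanism. In a library, digital and physical documents must be classified and organized. Each document must also be processed manually so that its basic information is stored in a database as structured and unstructured data. That information includes the document (book) title, authors, publication date, publisher, abstract, category, and keywords, together with the digital content itself. An IR application therefore takes the seeker's keywords, finds all potentially relevant digital content in these structured and unstructured databases, and presents it in order of priority.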
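The keyword-based retrieval setting described above can be sketched in Python. This is a minimal baseline under stated assumptions and is not the thesis's system. The following are all hypothetical choices made for illustration:

- the `Record` class and its subset of catalogue fields;
- the sample records;
- the rule of ranking records by how many query keywords they contain.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical catalogue record holding a few of the descriptive fields."""
    title: str
    authors: list[str]
    abstract: str
    keywords: set[str] = field(default_factory=set)

    def terms(self) -> set[str]:
        # Lower-cased word set drawn from both structured fields and free text.
        text = " ".join([self.title, self.abstract, *self.authors, *self.keywords])
        return set(re.findall(r"\w+", text.lower()))


def keyword_search(records: list[Record], query: str) -> list[tuple[Record, int]]:
    """Return (record, matched-keyword count) pairs, best match first."""
    query_terms = set(re.findall(r"\w+", query.lower()))
    hits = [(r, len(query_terms & r.terms())) for r in records]
    hits = [pair for pair in hits if pair[1] > 0]  # keep records matching at least one keyword
    # Priority order: records that match more query keywords come first.
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits


records = [
    Record("Vector space retrieval", ["A. Author"], "Ranking documents by cosine similarity.", {"retrieval"}),
    Record("Library cataloguing", ["B. Author"], "Organizing books on shelves.", {"library"}),
    Record("Boolean retrieval", ["C. Author"], "Exact-match retrieval of documents.", set()),
]
for record, matched in keyword_search(records, "cosine retrieval documents"):
    print(matched, record.title)
# 3 Vector space retrieval
# 2 Boolean retrieval
```

A record that uses only a synonym of the query word (for example "automobile" instead of "car") scores zero under this scheme. That is the recall problem the next paragraph describes.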
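  Digital content is usually retrieved by keyword matching or by one of various similarity formulas. With these approaches, however, it is difficult to retrieve "all relevant material" from a digital library. People express themselves in natural language and use different words for the same concept, so the recall of keyword matching cannot be raised.

  The Topic/Domain Concept-Hierarchy Model proposed in this thesis classifies domain knowledge concepts hierarchically to form a domain concept hierarchy. Each category is a concept node associated with a set of related documents, and the node's keywords are drawn from that document set. The seeker's query keywords are searched for among the concept nodes of the domain hierarchy. A match indicates that the seeker wants to learn about the content of that node, which is called a relevant conceptual node (RCN); the nodes below it are called relevant conceptual subnodes (RCS).

  The weights and similarity values of the relevant conceptual nodes and subnodes are then adjusted using five variable factors:
  • the level of the node in the hierarchy;
  • the number of the user's keywords that fall in the node;
  • the cosine similarity between the user's keywords and the relevant conceptual node;
  • the distance between a concept node and its subnode;
  • the node's branching degree.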
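The matching step and the five factors listed above can be sketched as follows. This is an illustrative sketch, not the thesis's model, and the names `ConceptNode` and `rcn_factors` are hypothetical. The following modelling choices are all assumptions:

- a node's keywords are represented as term frequencies (`collections.Counter`);
- levels are counted from the root as 0;
- distance is measured in edges;
- branching degree is the number of children;
- a matching node below another RCN counts both as that RCN's RCS and as an RCN itself.

The record does not give the weighting formulas that combine these factors, so the sketch stops after computing them.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ConceptNode:
    """Concept node; keywords maps each term to its frequency in the node's document set (assumed)."""
    name: str
    keywords: Counter
    children: list[ConceptNode] = field(default_factory=list)


def walk(node: ConceptNode, level: int = 0):
    """Yield (node, level) for the node and every descendant, depth first."""
    yield node, level
    for child in node.children:
        yield from walk(child, level + 1)


def cosine(query: Counter, keywords: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(query[t] * keywords[t] for t in query)
    norm_q = math.sqrt(sum(v * v for v in query.values()))
    norm_k = math.sqrt(sum(v * v for v in keywords.values()))
    return dot / (norm_q * norm_k) if norm_q and norm_k else 0.0


def rcn_factors(root: ConceptNode, query_terms: list[str]) -> list[dict]:
    """For each RCN and every node in its subtree, collect the five factors named in the abstract."""
    query = Counter(query_terms)
    rows = []
    for rcn, rcn_level in walk(root):
        hits = sum(1 for t in query if t in rcn.keywords)  # query keywords falling in the node
        if hits == 0:
            continue  # not a relevant conceptual node
        sim = cosine(query, rcn.keywords)
        for node, level in walk(rcn, rcn_level):  # the RCN itself (distance 0) and its RCS
            rows.append({
                "rcn": rcn.name,
                "node": node.name,
                "level": level,
                "hits": hits,
                "cosine": sim,
                "distance": level - rcn_level,
                "branching": len(node.children),
            })
    return rows


root = ConceptNode("Information retrieval", Counter(retrieval=3, search=2), [
    ConceptNode("Models", Counter(model=2, vector=1, boolean=1), [
        ConceptNode("Vector space", Counter(vector=3, cosine=2)),
        ConceptNode("Boolean", Counter(boolean=3)),
    ]),
    ConceptNode("Evaluation", Counter(recall=2, precision=2)),
])
for row in rcn_factors(root, ["vector", "cosine"]):
    print(row["rcn"], "->", row["node"], row["distance"], round(row["cosine"], 3))
```

In this toy hierarchy, "Models" becomes an RCN because it contains "vector". "Vector space" is one of its RCS and also an RCN in its own right, with a much higher cosine score. How such overlaps are weighted is left to the thesis's own formulas.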
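  Experimental data show that the Topic Concept-Hierarchy Model can be applied effectively to information retrieval. It brings up the target the seeker is looking for together with related digital content, and ranks the results by their suitability and relevance to the user. The user can therefore retrieve the desired digital content quickly.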
English abstract:
From an academic viewpoint, information retrieval (IR) methods are used to facilitate content search in a library environment. In a library, librarians need to create descriptive information for digital or physical content before it is stored. This description is stored in a repository. It includes the title, authors, publication date, publisher, abstract, category, terms, and the content itself. Retrieval is therefore implemented by comparing the user's query against the repository.
Keyword matching is a common approach in information retrieval research. However, it cannot always retrieve all of the relevant information, mainly because people may use different words to refer to the same information. As a result, keyword matching has poor recall.

In this study, we propose the Topic/Domain Concept-Hierarchy Model, which organizes domain knowledge into hierarchical categories in a domain hierarchy. Each category is a concept node with a corresponding content set, and the node's representative keywords are extracted from that content set. Matching is executed over the domain hierarchy by computing the similarity between the user's query and the keywords of its nodes. A match means that the user intends to browse the corresponding content set. The matched node is called a relevant conceptual node (RCN), and the nodes below it are relevant conceptual sub-nodes (RCS).
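The claim that matching against a concept node's keywords improves recall can be illustrated with a toy example. The extraction rule here is an assumption, not the thesis's method: a node's representative keywords are simply every term in its content set. A query that hits any document in the set therefore retrieves the whole set. The document IDs, texts, and node names are hypothetical.

```python
from __future__ import annotations

import re


def terms(text: str) -> set[str]:
    """Lower-cased word set of a text."""
    return set(re.findall(r"\w+", text.lower()))


def keyword_match(docs: dict[str, str], query: str) -> set[str]:
    """Baseline: return only documents that literally contain a query word."""
    q = terms(query)
    return {doc_id for doc_id, text in docs.items() if q & terms(text)}


def node_keywords(docs: dict[str, str], content_set: set[str]) -> set[str]:
    """Assumed extraction rule: a node is represented by every term in its content set."""
    keywords: set[str] = set()
    for doc_id in content_set:
        keywords |= terms(docs[doc_id])
    return keywords


def concept_match(docs: dict[str, str], nodes: dict[str, set[str]], query: str) -> set[str]:
    """Return the whole content set of each node whose keywords match the query."""
    q = terms(query)
    retrieved: set[str] = set()
    for content_set in nodes.values():
        if q & node_keywords(docs, content_set):
            retrieved |= content_set
    return retrieved


docs = {
    "d1": "car engine repair",
    "d2": "automobile engine maintenance",
    "d3": "vintage car restoration",
    "d4": "bread baking at home",
}
nodes = {"Vehicles": {"d1", "d2", "d3"}, "Cooking": {"d4"}}
relevant = {"d1", "d2", "d3"}  # what a seeker asking about automobiles wants

for name, retrieved in [("keyword", keyword_match(docs, "automobile")),
                        ("concept", concept_match(docs, nodes, "automobile"))]:
    recall = len(retrieved & relevant) / len(relevant)
    print(name, sorted(retrieved), f"recall={recall:.2f}")
```

Keyword matching returns only d2 (recall 0.33), while node-level matching returns all three vehicle documents (recall 1.00). Returning a whole content set can also pull in less relevant documents. That is presumably why the model ranks RCNs and RCSs by weighted similarity rather than returning them unordered.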
Experimental results show that the proposed Topic/Domain Concept-Hierarchy Model can be applied effectively to information retrieval. Recall and precision are significantly improved compared with the traditional method. Results are ranked according to their relevance within the domain hierarchy, so users can retrieve suitable material in a short time.
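Recall and precision, and the F-measure listed in Section 5.1, follow the standard set-based definitions:

- Precision is the fraction of retrieved documents that are relevant.
- Recall is the fraction of relevant documents that are retrieved.
- F-beta = (1 + beta^2) * P * R / (beta^2 * P + R).

A minimal Python sketch follows. The record does not say which beta the thesis used, so beta = 1 (the balanced F-measure) is an assumption, and the document sets are hypothetical.

```python
from __future__ import annotations


def precision_recall_f(retrieved: set, relevant: set, beta: float = 1.0) -> tuple[float, float, float]:
    """Set-based precision, recall and F-beta (beta = 1 gives the balanced F-measure)."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0  # avoid 0/0 when nothing relevant is retrieved
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Hypothetical run: 4 documents returned, 3 of them among the 4 relevant ones.
p, r, f = precision_recall_f({"d1", "d2", "d3", "d5"}, {"d1", "d2", "d3", "d4"})
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # P=0.75 R=0.75 F1=0.75
```

The harmonic mean in F1 penalizes imbalance. A system that returns one correct document out of three relevant ones scores P = 1.0 but R = 0.33, giving F1 = 0.5.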
Chinese Abstract
English Abstract
Acknowledgements
Table of Contents
List of Tables and Figures
List of Algorithms, Topic Concept-Hierarchy Model Formulas, and Citations
Chapter 1. Introduction
1.1 Research Motivation
1.2 Research Objectives
1.3 Research Methods
Chapter 2. Related Work
2.1 Classic Information Retrieval Models
2.1.1 Boolean Model
2.1.2 Vector Space Model
2.2 Search Recall (Complete) and Precision (Sound)
2.3 Concept Hierarchy
2.4 Summary of Related Work
Chapter 3. Topic Concept-Hierarchy Model: Concept-Based Search
3.1 Definition of Topic Conceptual Hierarchy
3.1.1 Analysis of Seeker's Behavior
3.1.2 Query Preprocessing
3.2 Definitions of Terms in the Topic Concept-Hierarchy Model
3.3 Weighting Strategy: Relevant Conceptual Nodes and Relevant Conceptual Subnodes
3.4 Calculating Similarity Measures and Ranking
3.5 System Architecture and Workflow
Chapter 4. System Implementation
4.1 Experimental Environment
4.2 Experimental Objectives
4.3 Experimental Results
Chapter 5. Performance Analysis and Comparison
5.1 F-Measure (Topic Concept-Hierarchy Model, TCHM)
5.2 Average Satisfaction Score of the Top Ten Results
5.3 Average Response Time
Chapter 6. Conclusions and Future Work
6.1 Conclusions and Contributions
6.2 Future Work
Chapter 7. References
Chapter 8. Appendix