National Digital Library of Theses and Dissertations in Taiwan

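Student: 陳皙彥
Student (English name): Hsi-Yen Chen
Thesis title: 一個有效的文件檢索索引結構-關鍵詞繼承結構
Thesis title (English): An index structure for efficient document retrieval-Term Inheritance Structure
Advisors: 楊朝成, 張簡尚偉
Advisors (English names): Chou-Chen Yang, Shang-Wei Changchien
Degree: Master's
Institution: Chaoyang University of Technology
Department: Department of Information Management, Master's Program
Discipline: Computing
Field: General Computing
Thesis type: Academic thesis
Year of publication: 2003
Graduation academic year: 91 (ROC calendar)
Language: Chinese
Number of pages: 80
Chinese keywords: 倒轉檔 (inverted file); 文件檢索 (document retrieval); 關鍵詞權重 (term weight)
English keywords: document retrieval; inverted file; term weight; index term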
Usage statistics (as of 2024/12/03): cited 4 times; viewed 308 times; downloaded 20 times; bookmarked 0 times.
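In an age of information overload, the rapid growth of documents of every kind has drawn increasing attention to the efficiency of document management and retrieval. With the traditional inverted file structure, when a user queries with multiple index terms, the system must search a large document database several times, which is inefficient. This thesis proposes an efficient index structure for document retrieval, the Term Inheritance Structure (TIS), to improve the efficiency of queries that contain several index terms at once. Experimental results show that TIS achieves this improvement: its retrieval time does not increase as the number of query terms or documents grows, and its retrieval times are more tightly clustered and more stable than those of the inverted file. We also propose an algorithm for computing document similarity that uses the query terms and their weights to measure the relevance of retrieved documents more precisely. Although the TIS data structure takes more time and space to build, it needs to be built only once; afterwards, it saves a great deal of query and retrieval time when searching a large document database. In addition, new documents can easily be added to the TIS.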
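The inefficiency described above can be illustrated with a toy inverted file. The following is a minimal Python sketch, assuming each document has already been reduced to a set of index terms; the document IDs, terms, and function names are hypothetical. It models only the conventional inverted-file lookup, not the TIS, whose construction is not described in this record.

```python
from collections import defaultdict

def build_inverted_file(docs):
    """Map each index term to the sorted list of IDs of documents that contain it."""
    postings = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

def documents_examined(inverted_file, query_terms):
    """Perform one postings-list lookup per query term and collect every
    document that contains at least one of the terms; all of these are
    candidates that must be checked before filtering or ranking."""
    lookups = 0
    examined = set()
    for term in query_terms:
        lookups += 1
        examined.update(inverted_file.get(term, []))
    return lookups, examined

def documents_with_all_terms(inverted_file, query_terms):
    """Intersect the postings lists: documents containing every query term."""
    lists = [set(inverted_file.get(term, [])) for term in query_terms]
    return set.intersection(*lists) if lists else set()

# Hypothetical toy collection: document ID -> set of index terms.
docs = {
    1: {"retrieval", "index"},
    2: {"retrieval", "weight"},
    3: {"retrieval", "index", "weight"},
    4: {"signature"},
    5: {"index"},
}
query = ["retrieval", "index", "weight"]

inverted_file = build_inverted_file(docs)
lookups, examined = documents_examined(inverted_file, query)
print(lookups, sorted(examined), sorted(documents_with_all_terms(inverted_file, query)))
```

Here three lookups touch four of the five documents even though only document 3 contains all three terms. The number of lookups grows with the number of query terms, and the candidate set can only grow as terms are added.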
As more and more electronic documents are generated, the management of structured documents and the efficiency of retrieval systems have become increasingly important. With the popular inverted file structure, a large number of documents must still be searched and examined when a query contains multiple index terms; that is, every document that contains any one of the index terms is checked. This thesis proposes a new index structure, the Term Inheritance Structure (TIS), which reduces search time for queries with multiple index terms. In addition, a new query-document similarity algorithm is defined to measure the relevance of retrieved documents more precisely, based on the user's index terms and term weights. Although constructing the TIS data structure takes time and space, it needs to be done only once. Thereafter, the TIS can save considerable search and retrieval time when a large document base is queried frequently.
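To make "similarity based on the user's index terms and term weights" concrete, here is a minimal sketch of the standard vector-space approach: tf-idf document weights compared by cosine similarity against a query whose term weights the user supplies. This is a stand-in for illustration, not the thesis's own similarity algorithm (Section 3.4) or its automatic weighting method (Section 3.6), neither of which is given in this record; the documents, tokens, and weights are hypothetical.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Weight each term in each document as tf * log(N / df)."""
    n_docs = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # count each term once per document
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    return {
        doc_id: {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for doc_id, tokens in docs.items()
    }

def cosine(query_weights, doc_weights):
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(w * doc_weights.get(term, 0.0) for term, w in query_weights.items())
    q_norm = math.sqrt(sum(w * w for w in query_weights.values()))
    d_norm = math.sqrt(sum(w * w for w in doc_weights.values()))
    return dot / (q_norm * d_norm) if q_norm and d_norm else 0.0

def rank(docs, query_weights):
    """Score every document against a user-weighted query, best first."""
    vectors = tf_idf_vectors(docs)
    scores = {doc_id: cosine(query_weights, vec) for doc_id, vec in vectors.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical documents (token lists) and user-assigned query term weights.
docs = {
    "d1": ["inverted", "file", "index", "file"],
    "d2": ["signature", "file"],
    "d3": ["suffix", "array", "index"],
}
query_weights = {"inverted": 1.0, "file": 0.5}

for doc_id, score in rank(docs, query_weights):
    print(doc_id, round(score, 3))
```

With these inputs d1 ranks first, d2 (which shares only "file" with the query) second, and d3 (no query terms) scores 0. Note that a term occurring in every document gets idf = 0 and therefore contributes no weight.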
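Chinese Abstract I
English Abstract II
Table of Contents IV
List of Tables V
List of Figures VI
Chapter 1. Introduction 1
1.1. Research Background and Motivation 1
1.2. Research Objectives and Scope 2
1.3. Thesis Organization 3
Chapter 2. Literature Review 4
2.1. Inverted Files 5
2.2. Signature Files 7
2.3. Suffix Arrays 9
2.4. Retrieval Models 9
Chapter 3. Research Methods 13
3.1. Building the TIS 15
3.2. Rules for Building the TIS 16
3.3. Retrieving Documents with the TIS 19
3.4. Computing Similarity 21
3.5. An Example of Building a TIS 27
3.6. Automatic Generation of Document Term Weights 34
Chapter 4. Experimental Results 37
4.1. TIS Retrieval Experiments 37
4.2. Experiments with System-Assigned Weights 50
Chapter 5. Conclusions 60
References 62
Appendix A
Appendix B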