Author (English name): Hsi-Yen Chen
Thesis title (English): An Index Structure for Efficient Document Retrieval: Term Inheritance Structure
Advisors (English names): Chou-Chen Yang, Shang-Wei Changchien
Keywords (English): document retrieval; inverted file; term weight; index term
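In an era of information overload, the rapid growth of all kinds of documents has drawn increasing attention to the efficiency of document management and retrieval. With the traditional inverted file structure, when a user queries with several keywords, the system must search a large document database multiple times, which is not very efficient. This thesis proposes an effective index structure for document retrieval, the Term Inheritance Structure (TIS), to improve the efficiency of queries that use several keywords at once. Experimental results show that TIS improves the efficiency of such queries: its retrieval time does not grow as the number of keywords or documents increases, and it is more concentrated and stable than that of the inverted file. We also propose an algorithm for computing document similarity that uses the query keywords and their weights to measure the relevance of the retrieved documents more precisely. Although the TIS data structure takes more time and space to build, it needs to be built only once; afterwards it saves considerable query and retrieval time when searching a large document database. In addition, new documents can easily be added to the TIS structure.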
As more and more electronic documents are generated, the management of structured documents and the efficiency of the retrieval system become increasingly important. With the popular inverted file structure, a large number of documents must still be searched and examined when a query contains multiple index terms; that is, any document that contains any one of the index terms will be checked. This thesis proposes an improved index structure, the Term Inheritance Structure (TIS), which can reduce the search time for queries with multiple index terms. In addition, a new query-document similarity algorithm is defined to measure the relevance of retrieved documents more precisely, based on the user's index terms and term weights. Although it takes time and space to construct the TIS data structure, it needs to be built only once. Thereafter it may save a lot of search and retrieval time when a large document base is queried frequently.
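
To make the inverted-file baseline concrete, the following is a minimal Python sketch of the behaviour the abstract criticizes: a query with several index terms looks up one posting list per term and examines every document that contains any of them. It does not implement TIS, whose internals the abstract does not describe. The function names (build_inverted_file, candidates, matches_all) and the three sample documents are assumptions made only for illustration.

```python
from collections import defaultdict


def build_inverted_file(docs):
    """Map each index term to the set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def candidates(index, query_terms):
    """Documents containing at least one query term.

    One posting-list lookup per term; every document in the union
    has to be examined by the baseline.
    """
    found = set()
    for term in query_terms:
        found |= index.get(term, set())
    return found


def matches_all(index, query_terms):
    """Documents containing every query term (intersection of posting lists)."""
    lists = [index.get(term, set()) for term in query_terms]
    return set.intersection(*lists) if lists else set()


# Hypothetical sample documents, not taken from the thesis.
docs = {
    1: "inverted file index",
    2: "term weight index",
    3: "document retrieval",
}
index = build_inverted_file(docs)
query = ["index", "term"]
examined = candidates(index, query)   # documents the baseline must check
relevant = matches_all(index, query)  # documents that contain both terms
```

In this toy collection the baseline examines documents 1 and 2 although only document 2 contains both terms; the gap between the two sets is the overhead that, according to the abstract, TIS is designed to reduce.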
