With the development of information technology and the growth of information flow, everyone has to deal with more and more information. Without technology for filtering information, it would be very difficult to find what is needed, and information retrieval was developed to address this difficulty. Users expect retrieval technology to help them find what they really need: a query result must not only match the user's requirements but also be meaningful to the user. If a result contains only a small portion of meaningful information, it is of little value.

Current information retrieval systems aim to return the most relevant documents to users. Sometimes, however, users expect more precise results, such as paragraphs or lists. These "passages" are what is actually meaningful to users, but current information retrieval algorithms are not designed for this kind of application, so they must be modified to meet these requirements.

To address the problem of passage retrieval, this research implements a passage retrieval system using LSI (Latent Semantic Indexing) and investigates the properties of LSI in passage retrieval. These properties include the optimal query length, the optimal word segmentation, the optimal document segmentation, the impact of appending new documents, and the benefit of relevance feedback.

The passage retrieval system works best when documents are segmented into paragraphs, when longer Chinese words are used as index terms, and when queries are of adequate length. The experiments on appending documents show that, with the folding-in technique, new documents can be appended without recomputing the SVD of the document index, and a ratio of new documents was found up to which recomputing the matrix is unnecessary. In addition, the document vector matrix can be used for passage retrieval.
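The LSI operations described above (a truncated SVD of a term-by-passage matrix, projecting a query into the reduced space, and folding in new passages without recomputing the SVD) can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the system built in this research: the function names, raw term-frequency weighting, k = 2, the toy vocabulary, and the `needs_resvd` policy with its `max_ratio` parameter are all hypothetical. Folding-in maps a term vector d to d^T U_k S_k^{-1}, the standard LSI projection.

```python
import numpy as np


def build_lsi(a: np.ndarray, k: int):
    """Rank-k LSI of a term-by-passage matrix A (rows = terms, columns = passages).

    Returns U_k (terms x k), the k largest singular values s_k, and the passage
    vectors V_k (passages x k), so that A is approximated by U_k diag(s_k) V_k^T.
    """
    u, s, vt = np.linalg.svd(a, full_matrices=False)  # s is sorted in descending order
    return u[:, :k], s[:k], vt[:k, :].T


def fold_in(term_vectors: np.ndarray, u_k: np.ndarray, s_k: np.ndarray) -> np.ndarray:
    """Project term-frequency columns (a query or new passages) into LSI space.

    Each column d becomes the row d^T U_k S_k^{-1}; the SVD is not recomputed.
    """
    return (term_vectors.T @ u_k) / s_k


def rank_passages(query: np.ndarray, u_k: np.ndarray, s_k: np.ndarray, passage_vecs: np.ndarray):
    """Return (passage indices sorted best-first, cosine similarities)."""
    q_hat = fold_in(query.reshape(-1, 1), u_k, s_k)[0]
    sims = passage_vecs @ q_hat / (np.linalg.norm(passage_vecs, axis=1) * np.linalg.norm(q_hat))
    return [int(i) for i in np.argsort(-sims)], sims


def needs_resvd(n_folded: int, n_indexed: int, max_ratio: float) -> bool:
    """Hypothetical policy: rebuild the SVD once folded-in passages exceed max_ratio."""
    return n_folded / n_indexed > max_ratio


# Toy term-by-passage matrix with raw term frequencies (hypothetical data).
terms = ["retrieval", "passage", "semantic", "index", "music", "guitar"]
a = np.array([
    [1, 1, 0, 0],  # retrieval
    [1, 0, 0, 0],  # passage
    [0, 1, 0, 0],  # semantic
    [0, 1, 0, 0],  # index
    [0, 0, 1, 2],  # music
    [0, 0, 1, 1],  # guitar
], dtype=float)

u_k, s_k, v_k = build_lsi(a, k=2)
query = np.array([1, 1, 0, 0, 0, 0], dtype=float)  # "passage retrieval"
order, sims = rank_passages(query, u_k, s_k, v_k)

# Fold in a new passage about "semantic index" without recomputing the SVD.
new_passage = np.array([[0], [0], [1], [1], [0], [0]], dtype=float)
v_k = np.vstack([v_k, fold_in(new_passage, u_k, s_k)])
order2, sims2 = rank_passages(query, u_k, s_k, v_k)
print(order2, np.round(sims2, 3))
```

In this toy data the folded-in passage shares no term with the query, yet it scores close to the two retrieval passages because it lands in the same latent dimension. Folded-in vectors leave U_k and S_k unchanged, so the basis gradually goes stale as more passages are appended, which is why some threshold for an eventual re-SVD is needed.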
Finally, the experiments on relevance feedback show that this technique is useful. In conclusion, LSI is well suited to passage and concept retrieval, especially when searching for relevant documents from a few passages, and it is therefore a feasible approach to passage retrieval.
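The abstract does not state how relevance feedback was applied. One common option, shown here as an assumption rather than as the method used in this research, is a Rocchio-style update performed in the reduced LSI space: the query vector is moved toward passages the user marks relevant and away from those marked non-relevant. The weights `alpha`, `beta`, `gamma` and the 2-D passage vectors are illustrative values only.

```python
import numpy as np


def rocchio_feedback(q_hat, passage_vecs, relevant, nonrelevant=(),
                     alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio-style update of an LSI query vector (assumed method, illustrative weights).

    q' = alpha * q + beta * mean(relevant passages) - gamma * mean(non-relevant passages)
    """
    q_new = alpha * q_hat
    if len(relevant) > 0:
        q_new = q_new + beta * passage_vecs[list(relevant)].mean(axis=0)
    if len(nonrelevant) > 0:
        q_new = q_new - gamma * passage_vecs[list(nonrelevant)].mean(axis=0)
    return q_new


def cosine_scores(q_hat, passage_vecs):
    """Cosine similarity between the query and every passage vector."""
    norms = np.linalg.norm(passage_vecs, axis=1) * np.linalg.norm(q_hat)
    return passage_vecs @ q_hat / norms


# Hypothetical 2-D LSI passage vectors (rows) and an ambiguous query vector.
passages = np.array([
    [1.0, 0.1],
    [0.9, 0.4],
    [0.2, 1.0],
    [0.1, 0.9],
])
q_hat = np.array([0.6, 0.5])

before = cosine_scores(q_hat, passages)
# The user marks passage 2 as relevant and passage 0 as non-relevant.
q_fb = rocchio_feedback(q_hat, passages, relevant=[2], nonrelevant=[0])
after = cosine_scores(q_fb, passages)
print(np.round(before, 3), np.round(after, 3))
```

Before feedback, passage 0 outranks passage 2; after one round of feedback the order is reversed, illustrating how user judgments can steer an LSI query toward the intended concept.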