National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

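Author: Chih-Chuan Yen (顏志娟)
Thesis title: Intelligent Chinese Information Retrieval with Text Mining (應用文件探勘技術於智慧型中文資訊檢索系統)
Advisor: Chein-Shung Hwang (黃謙順)
Degree: Master's
Institution: Chinese Culture University (中國文化大學)
Department: Graduate Institute of Information Management
Discipline: Computer science
Field: General computer science
Thesis type: Academic thesis
Publication year: 2004
Graduation academic year: 93 (ROC calendar)
Language: Chinese
Pages: 67
Keywords (translated from Chinese): Chinese word segmentation; information extraction; text mining; information retrieval
Keywords (English, as recorded): Chinese word-segmented; Information Extraction; Text Mining; Information Retrieval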
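Abstract (translated from Chinese):
With advances in information technology, the spread of the Internet, and the arrival of the knowledge economy, knowledge has become the key to competitive advantage, and users now place higher and more varied demands on how information is retrieved and on the results returned.
Traditional information retrieval systems build index terms manually and find matching documents by comparing query terms with index terms. This approach is time-consuming and labor-intensive, and for users who cannot state their search concept precisely or who are unfamiliar with the indexing vocabulary, it often causes relevant information to be missed.
This study applies text mining techniques to build an intelligent information retrieval system. The mechanism uses weighted hybrid word segmentation and automatic term extraction to pull out the relatively important terms in each article and place them, by position, into a predefined template format, where they serve as that document's index terms. Association rules are then applied to discover the degree of association between terms. When a user searches for documents, the system expands the query with related terms according to the query terms used and outputs the document list ranked by similarity. The mechanism therefore not only automates construction of the related-term dictionary but also moves information retrieval toward intelligent operation, giving users a more convenient and effective retrieval environment.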
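The record does not describe how the weighted hybrid segmentation works, so the following is only a minimal sketch of plain forward maximum matching, a common dictionary-based baseline for Chinese word segmentation, swapped in to illustrate the general idea. The function name, the toy dictionary, and the `max_len` parameter are all assumptions made for illustration and are not taken from the thesis.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy segmentation: at each position take the longest dictionary word.

    Illustrative baseline only; this is NOT the thesis's weighted hybrid method.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a match is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback token.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy dictionary (assumption, for illustration only).
DICTIONARY = {"資訊", "檢索", "資訊檢索", "系統", "文件", "探勘"}

tokens = forward_max_match("文件探勘資訊檢索系統", DICTIONARY)
print(tokens)  # ['文件', '探勘', '資訊檢索', '系統']
```

Because the longest match wins, "資訊檢索" is kept as one term instead of being split into "資訊" and "檢索". A weighted or hybrid scheme would presumably combine evidence like this with statistical scores, but those details are not given in this record.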
Abstract (English):
With the development of information technology, the spread of the Internet, and the advent of the knowledge economy, knowledge has become a key factor in gaining competitive advantage. Users therefore expect more versatile and powerful information retrieval methods.
In traditional information retrieval systems, index phrases are often constructed manually, and documents are found by matching query phrases against index phrases. This is time-consuming and laborious, and users often miss important information when they cannot express their search concepts clearly.
In this study, we apply text mining to construct an intelligent Chinese information retrieval system. The system uses weighted hybrid word segmentation and automatic term extraction to feed extracted data into the fields of predefined templates. The extracted data serve as the index phrases for each document. Association rule mining is then applied to find co-occurrence relationships between index phrases. When users search for documents, the system automatically expands the query phrases and ranks the related documents. The system thus offers users a more convenient and efficient document retrieval environment and moves Chinese information retrieval toward being "intelligent".
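The pipeline summarized in the abstract (index phrases per document, association rules between phrases, query expansion, ranked output) can be sketched roughly as follows. This is a hedged illustration under stated assumptions: the thresholds, the toy documents, the restriction to pairwise rules, and the use of Jaccard similarity for ranking are all hypothetical choices, since the record does not specify the thesis's actual rule-mining settings or similarity measure.

```python
from itertools import permutations


def mine_pair_rules(doc_terms, min_support=0.3, min_confidence=0.6):
    """Mine pairwise rules a -> b from per-document index term sets.

    support(a, b) = docs containing both / all docs
    confidence(a -> b) = docs containing both / docs containing a
    """
    n_docs = len(doc_terms)
    term_count, pair_count = {}, {}
    for terms in doc_terms.values():
        for t in terms:
            term_count[t] = term_count.get(t, 0) + 1
        for a, b in permutations(sorted(terms), 2):
            pair_count[(a, b)] = pair_count.get((a, b), 0) + 1
    rules = {}
    for (a, b), count in pair_count.items():
        if count / n_docs >= min_support and count / term_count[a] >= min_confidence:
            rules.setdefault(a, []).append((b, count / term_count[a]))
    return rules


def expand_query(query_terms, rules):
    """Add every consequent of a rule whose antecedent is a query term."""
    expanded = set(query_terms)
    for t in query_terms:
        expanded.update(b for b, _ in rules.get(t, []))
    return expanded


def rank_documents(query_terms, doc_terms):
    """Rank documents by Jaccard similarity to the query (assumed measure)."""
    scored = []
    for doc_id, terms in doc_terms.items():
        overlap = len(query_terms & terms)
        if overlap:
            scored.append((doc_id, overlap / len(query_terms | terms)))
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical index terms for five documents (assumption, for illustration).
DOCS = {
    "d1": {"text mining", "association rule", "retrieval"},
    "d2": {"text mining", "association rule"},
    "d3": {"retrieval", "segmentation"},
    "d4": {"text mining", "segmentation"},
    "d5": {"association rule", "text mining", "segmentation"},
}

rules = mine_pair_rules(DOCS)
expanded = expand_query({"association rule"}, rules)
ranking = [doc_id for doc_id, _ in rank_documents(expanded, DOCS)]
print(expanded)  # {'association rule', 'text mining'} (set order may vary)
print(ranking)   # ['d2', 'd1', 'd5', 'd4']
```

In this toy data the unexpanded query "association rule" would never reach d4, while the expanded query retrieves it through the mined rule "association rule" -> "text mining". That is the kind of recall gain the abstract attributes to automatic query expansion.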
Chinese Abstract
English Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1  Introduction
  Section 1  Research Background and Motivation
  Section 2  Research Objectives
  Section 3  Research Scope and Structure
  Section 4  Research Methods
  Section 5  Research Limitations
Chapter 2  Literature Review
  Section 1  Information Retrieval
  Section 2  Chinese Information Retrieval
  Section 3  Text Mining
  Section 4  Association Rules
Chapter 3  Design of the Intelligent Chinese Information Retrieval Model
  Section 1  Security Control Submodel
  Section 2  Parameter Setting Submodel
  Section 3  Text Mining Submodel
  Section 4  Information Retrieval Submodel
Chapter 4  Implementation of the Intelligent Chinese Information Retrieval Model
  Section 1  Implementation Environment and Experimental Data
  Section 2  Parameter Settings
  Section 3  Text Mining
  Section 4  Information Retrieval
Chapter 5  Conclusions and Suggestions
  Section 1  Conclusions
  Section 2  Future Directions
References