National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

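Graduate student: 丁肇君
Thesis title: 線上新聞之中英文跨語系資訊檢索系統
Thesis title (English): A Chinese-English Cross-Language Information Retrieval System for On-line News Articles
Advisor: 林敏勝
Degree: Master's
Institution: National Taipei University of Technology (國立臺北科技大學)
Department: Department of Electrical Engineering, Master's Program
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2002
Academic year of graduation: 90 (ROC calendar)
Language: Chinese
Number of pages: 65
Keywords (Chinese): 資訊檢索; 跨語系資訊檢索; 新聞檢索; 查詢語擴充
Keywords (English): information retrieval; cross-language information retrieval; news retrieval; query expansion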
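Abstract (translated from the Chinese):
In recent years, with the globalization of the Internet and the rapid growth in the volume of English news, it is becoming increasingly common for readers in Taiwan to browse English news. However, because most of these readers find it hard to formulate a correct English query, they often fail to retrieve the relevant English news. Moreover, when they encounter an English word they cannot spell, finding the relevant English news becomes all but impossible.
The purpose of this study is to propose a Chinese-English cross-language on-line news retrieval system that lets non-native English speakers retrieve relevant English news with Chinese queries. The system fetches the Chinese and English electronic news published on the Web at scheduled times each day, then segments the news into sentences according to the Chinese query. From the segmentation results, the Chinese query is expanded into other related Chinese queries. Finally, these Chinese queries are translated into English queries, which are used to select the relevant English news. In addition, the study takes into account the correlation between the publication times of Chinese and English news about the same event in order to improve the system's retrieval accuracy.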

The accelerated growth of the Internet and of on-line news in English allows non-native English speakers to access English on-line news more frequently. However, Chinese-speaking Internet users often cannot retrieve relevant articles from the enormous volume of news, because formulating a precise query in English is difficult. Moreover, a limited English vocabulary further hinders non-native speakers from retrieving relevant English news.
This study proposes a Chinese-English cross-language information retrieval system for on-line news articles, so that Chinese-speaking Internet users can formulate queries in Chinese and retrieve relevant news in English. The system first collects on-line news from Chinese and English news web sites daily. Sentence segmentation is then performed on the collected news using the Chinese query, and the segmentation results are used to expand the original Chinese query into additional related Chinese queries. Finally, these Chinese queries are translated into English queries, which are used to retrieve the relevant English news. Additionally, the relation between the publication dates of Chinese and English news about the same event is considered in order to improve the precision of the system.
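
The pipeline described in the abstract (sentence segmentation around the query, query expansion, translation, and date-aware retrieval) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names, the co-occurrence counting used for expansion, the candidate vocabulary, the bilingual dictionary format, and the date-window weighting (`window_days` and the 0.5 penalty) are all hypothetical choices made for the example.

```python
import re
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Article:
    published: date
    text: str


def split_sentences(text):
    """Split text on common Chinese and Western sentence-ending punctuation."""
    return [s for s in re.split(r"[。!?!?]", text) if s.strip()]


def expand_query(query, zh_articles, vocabulary, top_k=2):
    """Assumed expansion rule: keep the vocabulary terms that most often
    share a sentence with the query in the Chinese news."""
    counts = Counter()
    for article in zh_articles:
        for sentence in split_sentences(article.text):
            if query not in sentence:
                continue
            for term in vocabulary:
                if term != query and term in sentence:
                    counts[term] += 1
    return [query] + [term for term, _ in counts.most_common(top_k)]


def translate(terms, dictionary):
    """Look up each Chinese term in a (hypothetical) bilingual dictionary."""
    english = []
    for term in terms:
        english.extend(dictionary.get(term, []))
    return english


def rank_english(en_terms, en_articles, event_date, window_days=3):
    """Score English articles by matched terms, down-weighting articles
    published far from the date of the Chinese coverage (an assumption)."""
    scored = []
    for article in en_articles:
        text = article.text.lower()
        hits = sum(1 for term in en_terms if term.lower() in text)
        if hits == 0:
            continue
        gap = abs((article.published - event_date).days)
        weight = 1.0 if gap <= window_days else 0.5
        scored.append((hits * weight, article))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [article for _, article in scored]
```

In this sketch a query such as 颱風 would be expanded with terms that share a sentence with it in the Chinese news, each term would be mapped to its English candidates, and English articles published within a few days of the Chinese coverage would rank above older ones with the same number of matches.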

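Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Foreword
1.2 Research Background and Motivation
1.3 Research Objectives
1.4 Thesis Organization
Chapter 2 Literature Review
2.1 Information Retrieval Techniques
2.2 Cross-Language Information Retrieval
2.3 Comparative Evaluation of Retrieval Systems
2.4 Chinese Word Segmentation
2.5 Query Expansion
Chapter 3 Chinese-English Cross-Language News Retrieval
3.1 System Architecture
3.2 News Collection
3.3 Data Processing and Filtering
3.4 Query Expansion
3.5 Common-Word Filtering and English Term Weighting
3.6 Search and Ranking
Chapter 4 Results and Evaluation
4.1 Experimental Environment
4.2 Evaluation Method
4.3 Results and Analysis
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
Appendix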
