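Author: 曾彥博
Author (English): Yen-Po Tseng
Thesis title: 詞彙共現關係在跨語言檢索之應用
Thesis title (English): The word co-occurrence relationship for cross-language information retrieval
Advisor: 邊國維
Advisor (English): Guo-Wei Bian
Degree: Master's
Institution: Huafan University (華梵大學)
Department: Master's Program, Department of Information Management
Discipline: Computer Science
Field: General Computer Science
Thesis type: Academic thesis
Graduation academic year: 98 (ROC calendar)
Language: Chinese
Keywords (Chinese): 資訊檢索, 跨語言資訊檢索, 詞彙共現關係, 動態規劃
Keywords (English): Information Retrieval, Cross-Language Information Retrieval, CLIR, Word Co-occurrence, Dynamic Programming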
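This thesis examines how three query translation methods affect retrieval effectiveness under different index construction methods in cross-language retrieval. The English queries are translated with Google Translate, Yahoo Babel Fish, and a method that combines word co-occurrence relationships with dynamic programming; the Japanese document collection is indexed in three ways: Bigram, a word-based method, and part-of-speech (POS) filtering.
The experimental results show that, measured as the average percentage of monolingual retrieval precision that cross-language retrieval reaches, Google Translate reaches 89.00%, ahead of the co-occurrence and dynamic programming method at 73.76% and Yahoo Babel Fish at 59.33%. With queries translated by the co-occurrence and dynamic programming method, retrieval against the word-based index achieves a precision of 0.115, better than Google Translate (0.104) and Yahoo Babel Fish (0.065). Against the POS-filtered index, Google Translate achieves 0.144, better than the co-occurrence and dynamic programming method (0.119) and Yahoo Babel Fish (0.109).
In addition, for long queries, POS filtering substantially improves precision over the word-based method. With Google Translate, TDNC-field queries reach 267.64% of the word-based precision and DESC-field queries reach 165.76%; with Yahoo Babel Fish, TDNC-field queries reach 506.49% and DESC-field queries reach 245.11%. POS filtering effectively filters out noise in long-query retrieval and clearly improves precision.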
In this research, we build indices of a Japanese newspaper collection with three indexing methods (Bigram, word-based, and word-based with part-of-speech (POS) filtering) to examine how query translation based on word co-occurrence relationships affects retrieval performance in cross-language information retrieval (CLIR). Three translation systems are used to translate the English topics into Japanese: Google translation, Yahoo Babel Fish translation, and a translation method based on mutual information.
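As an illustration of the Bigram indexing method named above, the following is a minimal sketch of overlapping character bigram tokenization. The function name `char_bigrams` and the whitespace handling are assumptions made for this example; the record does not describe the thesis's actual preprocessing.

```python
from typing import List


def char_bigrams(text: str) -> List[str]:
    """Split text into overlapping character bigrams, ignoring whitespace.

    Hypothetical helper for illustration only; not the thesis's code.
    """
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        # A single character (or nothing) cannot form a bigram.
        return ["".join(chars)] if chars else []
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


if __name__ == "__main__":
    # Each adjacent character pair becomes one index term.
    print(char_bigrams("情報検索"))
```

Bigram indexing needs no dictionary or segmenter, which is why it is often used as a baseline for Japanese text; the word-based and POS filtering methods instead rely on a morphological analyzer.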
According to the experimental results, the best result is obtained by Google translation, which reaches 89% of the average performance of monolingual retrieval. Query translation based on mutual information with dynamic programming reaches about 73.76% of monolingual performance, and Yahoo Babel Fish translation about 59.33%. For the word-based index, the translation based on mutual information obtains the best result, with an average precision of 0.115. For the word-based index with POS filtering, Google translation obtains the best result (0.144).
Compared with word-based indexing, the POS filtering method improves retrieval performance for long queries. For the TDNC run, Google translation searching the POS-filtering index obtains 267.64% of the performance achieved on the word-based index. For the D run (Description), Google translation searching the POS-filtering index reaches 165.76% of the performance achieved on the word-based index.
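To make the idea of translation selection by mutual information with dynamic programming concrete, here is a minimal sketch under stated assumptions. It scores only adjacent pairs of candidate translations by pointwise mutual information and picks the best sequence with a Viterbi-style pass. The function names, the `MISSING_PMI` penalty, and the toy counts are all hypothetical; the thesis's actual scoring and candidate database are not given in this record.

```python
import math
from typing import Dict, FrozenSet, List

MISSING_PMI = -10.0  # assumed penalty for pairs never seen together


def pmi(a: str, b: str, unigram: Dict[str, int],
        pair: Dict[FrozenSet[str], int], total: int) -> float:
    """Pointwise mutual information log2(P(a,b) / (P(a) P(b))) from counts."""
    joint = pair.get(frozenset((a, b)), 0)
    if joint == 0 or a not in unigram or b not in unigram:
        return MISSING_PMI
    return math.log2(joint * total / (unigram[a] * unigram[b]))


def select_translations(candidates: List[List[str]], unigram: Dict[str, int],
                        pair: Dict[FrozenSet[str], int], total: int) -> List[str]:
    """Choose one candidate per query term, maximizing summed adjacent PMI.

    Assumes every inner candidate list is non-empty.
    """
    if not candidates:
        return []
    # layers[i][c] = (best score of a path ending in c at position i, backpointer)
    layers = [{c: (0.0, None) for c in candidates[0]}]
    for options in candidates[1:]:
        layer = {}
        for c in options:
            prev, score = max(
                ((p, s + pmi(p, c, unigram, pair, total))
                 for p, (s, _) in layers[-1].items()),
                key=lambda item: item[1],
            )
            layer[c] = (score, prev)
        layers.append(layer)
    # Backtrack from the best final candidate.
    current = max(layers[-1], key=lambda c: layers[-1][c][0])
    path = [current]
    for i in range(len(layers) - 1, 0, -1):
        current = layers[i][current][1]
        path.append(current)
    return path[::-1]


if __name__ == "__main__":
    # Toy, made-up counts for the English query "bank interest".
    unigram = {"銀行": 50, "土手": 10, "利子": 20, "興味": 40}
    pair = {frozenset(("銀行", "利子")): 15, frozenset(("銀行", "興味")): 2}
    print(select_translations([["銀行", "土手"], ["利子", "興味"]],
                              unigram, pair, total=1000))
```

The dynamic programming pass avoids enumerating every combination of candidates: with n query terms and k candidates each, it evaluates on the order of n·k² pairs instead of kⁿ sequences.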
Acknowledgements
Abstract (Chinese)
Abstract
Table of Contents
List of Tables
List of Figures
1. Introduction
1.1 Research Overview
2. Literature Review
2.1 Tokenization Methods
2.2 Word Co-occurrence Relationships
3. Research Framework
3.1 Document Collection
3.2 Query Set
3.3 Tokenization
3.3.1 Bi-gram
3.3.2 Word-Based Method
3.3.3 POS Filtering Method
3.4 Query Translation
3.4.1 Google Translate Online Translation
3.4.2 Yahoo Babel Fish Online Translation
3.4.3 Translation Combining Word Co-occurrence Relationships and Dynamic Programming
3.4.3.1 Candidate Term Database
3.4.3.2 Word Co-occurrence Relationships
3.4.3.3 Dynamic Programming
3.5 Index Construction and Retrieval
3.6 Score Evaluation
4. Experiments and Results
4.1 Index Files
4.2 Query Set (Topics)
4.3 Query Translation
4.3.1 Expanding the Candidate Term Database
4.3.2 Revising the Candidate Term Database
4.4 Number of Experimental Runs
4.5 Experimental Results
4.5.1 Monolingual Retrieval Results
4.5.2 Cross-Language Retrieval Results
4.5.2.1 Bigram Method Results
4.5.2.2 Word-Based Method Results
4.5.2.3 POS Filtering Method Results
4.6 Discussion
4.6.1 Effect of Personal Names and Proper Nouns
4.6.2 Discussion of Tokenization Methods
5. Conclusion
References