
National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)


Detailed Record

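Author: 溫振丞
Author (English): Wen Chen Cheng
Thesis title (Chinese): 詞彙擷取對統計式日英翻譯系統之影響
Thesis title (English): Statistical Japanese-English Machine Translation System Using Term Extraction
Advisor: 邊國維
Degree: Master's
Institution: Huafan University (華梵大學)
Department: Master's Program, Department of Information Management
Discipline: Computer Science
Sub-discipline: General Computer Science
Thesis type: Academic thesis
Year of publication: 2009
Graduation academic year: 97 (ROC calendar)
Language: Chinese
Pages: 50
Keywords (Chinese): 自然語言處理; 統計式機器翻譯; 詞彙擷取; 詞彙對齊
Keywords (English): natural language processing; statistical machine translation; term extraction; word alignment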
Abstract:
This thesis uses a term extraction tool to identify multi-word terms in the English side of the corpus before word alignment, and treats each extracted term as a single word during alignment and translation. On top of this, we design a statistical machine translation system that combines word alignment, part-of-speech tagging, word frequency statistics, part-of-speech mapping, translation models, and statistical word selection to produce suggested translations.
The training data come from the English-Japanese bilingual corpus of the NTCIR-7 Patent Translation Task. We select 100,000 aligned Japanese-English sentence pairs, extract terms of two to six words, and train six different models. The test data are taken from the NTCIR-7 Patent Translation Formal Run: 1,380 sentences each in English and Japanese.
We compare the models trained with term extraction against a model trained without it. For English-to-Japanese translation, the models with term extraction achieve better NIST and BLEU results, including higher n-gram precision, which indicates that term extraction can improve the accuracy of word selection in translation.
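The core preprocessing idea in the abstract, treating each extracted multi-word term as a single word before alignment, can be illustrated with a minimal sketch. This record does not describe the thesis's actual extraction tool or matching strategy, so everything here is an assumption made for illustration: the function name `merge_terms`, the greedy longest-match rule, the underscore joiner, and the sample term list are all hypothetical. Only the upper limit of six words per term comes from the abstract.

```python
from typing import List, Sequence, Set, Tuple


def merge_terms(
    tokens: Sequence[str],
    terms: Set[Tuple[str, ...]],
    max_len: int = 6,      # the abstract reports term lengths from 2 to 6
    joiner: str = "_",     # hypothetical choice of joining character
) -> List[str]:
    """Replace each known multi-word term with one joined token.

    Uses greedy longest-match from left to right (an assumed strategy).
    """
    merged: List[str] = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate first so longer terms win over sub-terms.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in terms:
                merged.append(joiner.join(candidate))
                i += n
                matched = True
                break
        if not matched:
            merged.append(tokens[i])
            i += 1
    return merged


# Hypothetical term list, standing in for the output of a term extraction tool.
terms = {("semiconductor", "device"), ("liquid", "crystal", "display")}
sentence = "the liquid crystal display is a semiconductor device".split()
print(merge_terms(sentence, terms))
# ['the', 'liquid_crystal_display', 'is', 'a', 'semiconductor_device']
```

After this step, a word aligner sees `liquid_crystal_display` as one source-side word, so it can be aligned as a unit to its Japanese counterpart instead of being split across three separate alignments.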
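Table of contents:
Acknowledgments
Abstract (Chinese)
Abstract (English)
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Research Process
Chapter 2 Related Work
2.1 Rule-Based Approaches
2.1.1 Direct Translation
2.1.2 Transfer-Based Translation
2.1.3 Interlingua-Based Translation
2.2 Corpus-Based Approaches
2.2.1 Statistical Machine Translation
2.2.2 Example-Based Machine Translation
2.3 Phrase-Based Statistical Translation
2.4 Bilingual Parallel Corpora
2.4.1 Word Alignment with GIZA++
2.5 Term Extraction
2.6 Evaluation Methods for Machine Translation
2.6.1 The BLEU Metric
2.6.2 The NIST Metric
Chapter 3 Statistical Translation Using Term Extraction
3.1 System Architecture and Workflow
3.2 Corpus Preprocessing
3.2.1 Preprocessing of English Documents
3.2.2 Preprocessing of Japanese Documents
3.3 Term Extraction
3.4 Training the Statistical Data
3.4.1 Word Alignment
3.4.2 Word Frequency Statistics
3.4.3 Part-of-Speech Tagging
3.5 Translation Procedure
Chapter 4 System Experiments
4.1 Experimental Data
4.1.1 Training and Test Data
4.2 List of Experiments
4.2.1 English-to-Japanese
4.2.2 Japanese-to-English
4.3 Discussion
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References
Appendix 1
Appendix 2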