National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

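Graduate student: Kevin Chih-Cheng Yeh (葉至程)
Thesis title: Using Punctuation Marks for Bilingual Sentence Alignment (利用標點之雙語句子對應研究)
Advisor: Jason S. Chang (張俊盛)
Degree: Master's
Institution: National Tsing Hua University (國立清華大學)
Department: Department of Computer Science (資訊工程學系)
Field: Engineering
Discipline: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2003
Academic year of graduation: 91 (ROC calendar)
Language: English
Number of pages: 41
Keywords: Sentence Alignment; Punctuation Marks; Machine Translation; Bilingual Parallel Corpus; Probabilistic Model; Cognate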
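This thesis studies the use of Chinese and English punctuation marks to extract corresponding sentences from texts that are translations of each other. Past statistical approaches mostly rely on the property that mutually translated sentences are also roughly similar in length. Length-based approaches work well for closely related languages such as English and French, but perform worse on very different languages such as English and Chinese. The literature also uses cognates to strengthen length-based approaches. However, English and Chinese use different writing systems and lack cognates, so a cognate-based approach cannot be applied. We propose a probabilistic model of punctuation marks that uses ordered, flexible matching to determine how compatible the punctuation of a Chinese sentence is with that of an English sentence, and from this infers the likelihood that the two sentences are translations of each other. Punctuation-based sentence alignment proves feasible. In experiments on bilingual corpora from Sinorama Magazine (光華雜誌) and Scientific American (科學人), it outperforms the length-based approach. With minor changes, the method can also handle sentence alignment between Japanese and English and between Japanese and Chinese.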

In this paper, we present a new approach to aligning English and Chinese sentences in parallel corpora based solely on punctuation marks. Although the length-based approach achieves high sentence-alignment accuracy for clean parallel corpora written in two Western languages, such as French and English or German and English, it does not fare as well for parallel corpora written in two disparate languages such as Chinese and English. Cognates can be used on top of the length-based approach to increase alignment accuracy. However, cognates do not exist between two disparate languages, which limits the applicability of the cognate-based approach. We therefore examine the feasibility of exploiting soft, ordered matching of punctuation marks in two languages for high-accuracy sentence alignment. We implemented the proposed method and ran it on two Chinese-English parallel corpora, the Sinorama Magazine Corpus and the Scientific American Magazine Corpus, with satisfactory results. We compared our method with the length-based method and evaluated the results in terms of precision and recall, with good results. We also demonstrated that the method is applicable to other language pairs, such as English and Japanese, with minimal additional effort.
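The idea of soft, ordered matching of punctuation marks can be illustrated with a small sketch. This is not the thesis's probabilistic model, whose parameters are not given in this record; it swaps in a simple heuristic (a longest-common-subsequence match, normalized as a Dice score). The mapping table `ZH_TO_EN` and the function names `punct_sequence`, `ordered_match_count`, and `compatibility` are hypothetical and chosen only for illustration.

```python
# Assumed mapping from Chinese punctuation to English counterparts
# (hypothetical; the thesis would estimate such correspondences from data).
ZH_TO_EN = {
    "，": ",", "、": ",", "。": ".", "；": ";", "：": ":",
    "？": "?", "！": "!", "「": '"', "」": '"', "（": "(", "）": ")",
}
EN_PUNCT = set(',.;:?!"()')


def punct_sequence(sentence):
    """Return the sentence's punctuation marks in order, normalized to English marks."""
    seq = []
    for ch in sentence:
        if ch in ZH_TO_EN:
            seq.append(ZH_TO_EN[ch])
        elif ch in EN_PUNCT:
            seq.append(ch)
    return seq


def ordered_match_count(a, b):
    """Longest common subsequence length: marks matched in order, with gaps allowed."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]


def compatibility(en_sentence, zh_sentence):
    """Heuristic score in [0, 1]; higher means more compatible punctuation."""
    a = punct_sequence(en_sentence)
    b = punct_sequence(zh_sentence)
    if not a and not b:
        return 1.0  # no punctuation on either side: nothing contradicts a match
    return 2 * ordered_match_count(a, b) / (len(a) + len(b))


if __name__ == "__main__":
    print(compatibility("However, he left.", "然而，他離開了。"))
    print(compatibility("Yes, but why?", "是的，但是、為什麼？"))
```

An aligner could combine such a score with a dynamic-programming search over candidate sentence pairings, in the way length-based aligners use a length score; that combination is likewise an assumption here, not a description of the thesis's implementation.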

Abstract (Chinese)
Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
Chapter 2 Approach
2.1 Training
2.2 The Model
Chapter 3 Experiments and Evaluation
3.1 First Experiment and Evaluation
3.2 Second Experiment and Evaluation
3.3 Demonstration
Chapter 4 Discussion
Chapter 5 Conclusion and Future Work
References

