



Graduate student: Kevin Chih-Cheng Yeh
Thesis title: Using Punctuation Marks for Bilingual Sentence Alignment
Advisor: Jason S. Chang
Keywords: Sentence Alignment, Punctuation Marks, Machine Translation, Bilingual Parallel Corpus, Probabilistic Model, Cognate

In this paper, we present a new approach to aligning English and Chinese sentences in parallel corpora based solely on punctuation marks. Although the length-based approach achieves high sentence-alignment accuracy on clean parallel corpora written in two Western languages, such as French and English or German and English, it does not fare as well on parallel corpora written in two disparate languages, such as Chinese and English. Cognates can be used on top of the length-based approach to increase alignment accuracy. However, cognates are largely absent between two disparate languages, which limits the applicability of the cognate-based approach. In this paper, we examine the feasibility of exploiting soft, ordered matching of punctuation marks in the two languages to achieve high-accuracy sentence alignment. We implemented the proposed method and experimented with it on two Chinese-English parallel corpora, the Sinorama Magazine Corpus and the Scientific American Magazine Corpus. Compared against the length-based method and evaluated in terms of precision and recall, it produced good results. We also demonstrate that the method can be applied to other language pairs, such as English and Japanese, with minimal additional effort.
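The abstract describes sentence alignment by soft, ordered matching of punctuation marks, compared with the length-based approach and scored by precision and recall. The Python sketch below illustrates that idea under several assumptions and is not the thesis's own model: it keeps a dynamic-programming search over a subset of Gale and Church's bead types (1-1, 1-0, 0-1, 2-1, 1-2) but swaps in a punctuation-similarity cost for their length cost. The mark-normalization table `CANON`, the partial-credit group `SOFT`, the bead penalties in `BEADS`, and all function names are hypothetical, hand-set values rather than trained ones. Precision and recall are computed at the bead level, counting a predicted bead as correct only if it exactly matches a gold bead.

```python
import unicodedata

# Hypothetical mapping of full-width Chinese marks onto English counterparts.
CANON = {"，": ",", "、": ",", "。": ".", "！": "!", "？": "?",
         "：": ":", "；": ";", "（": "(", "）": ")", "「": '"', "」": '"'}
# Hypothetical "soft" group: clause-level marks that earn partial credit when mismatched.
SOFT = {",", ";", ":"}
# Hypothetical bead penalties; (di, dj) = sentences consumed on each side.
BEADS = {(1, 1): 0.0, (2, 1): 0.3, (1, 2): 0.3, (1, 0): 1.0, (0, 1): 1.0}


def puncs(sentence):
    """Ordered, canonicalized punctuation marks (Unicode category P*) of a sentence."""
    return [CANON.get(ch, ch) for ch in sentence
            if unicodedata.category(ch).startswith("P")]


def mark_sim(a, b):
    """1.0 for identical marks, 0.5 for two marks in the soft group, else 0.0."""
    if a == b:
        return 1.0
    return 0.5 if a in SOFT and b in SOFT else 0.0


def punct_sim(p, q):
    """Soft, order-preserving match in [0, 1]: weighted LCS divided by the longer length."""
    if not p and not q:
        return 1.0
    if not p or not q:
        return 0.0
    dp = [[0.0] * (len(q) + 1) for _ in range(len(p) + 1)]
    for i in range(1, len(p) + 1):
        for j in range(1, len(q) + 1):
            dp[i][j] = max(dp[i - 1][j], dp[i][j - 1],
                           dp[i - 1][j - 1] + mark_sim(p[i - 1], q[j - 1]))
    return dp[len(p)][len(q)] / max(len(p), len(q))


def align(src, tgt):
    """Minimum-cost bead sequence; each bead is (source indices, target indices)."""
    INF = float("inf")
    n, m = len(src), len(tgt)
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order guarantees every predecessor of (i, j) is final before (i, j) expands.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == INF:
                continue
            for (di, dj), penalty in BEADS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                if di and dj:
                    # Concatenate the punctuation of all sentences inside the bead.
                    p = [x for s in src[i:ni] for x in puncs(s)]
                    q = [x for t in tgt[j:nj] for x in puncs(t)]
                    c = penalty + 1.0 - punct_sim(p, q)
                else:
                    c = penalty  # insertion or deletion bead
                if cost[i][j] + c < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + c
                    back[ni][nj] = (i, j)
    beads, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((tuple(range(pi, i)), tuple(range(pj, j))))
        i, j = pi, pj
    return beads[::-1]


def precision_recall(predicted, gold):
    """Bead-level precision and recall: a bead counts only on an exact match."""
    hits = len(set(predicted) & set(gold))
    return hits / len(predicted), hits / len(gold)
```

For example, `align(["He came, and he saw.", "He left."], ["他來了，看見了，就走了。"])` returns `[((0, 1), (0,))]`: merging both English sentences into one 2-1 bead matches the Chinese punctuation sequence better than any split does. The thesis lists a probabilistic model among its keywords and a training step in its contents, so a faithful implementation would presumably estimate such weights from data rather than fixing them by hand.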

Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
Chapter 2 Approach
  2.1 Training
  2.2 The Model
Chapter 3 Experiments and Evaluation
  3.1 First Experiment and Evaluation
  3.2 Second Experiment and Evaluation
  3.3 Demonstration
Chapter 4 Discussion
Chapter 5 Conclusion and Future Work

