
National Digital Library of Theses and Dissertations in Taiwan

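Graduate student: 邱筱涵
Graduate student (English name): Sheau-Harn Chiou
Thesis title: 法律翻譯語料庫建置及分析
Thesis title (English): Corpora for Legal Translation: Compilation and Analysis
Advisor: 高照明
Oral defense committee: 王世平, 王珊珊
Oral defense date: 2016-01-19
Degree: Master's
Institution: National Taiwan University
Department: Graduate Program in Translation and Interpretation (翻譯碩士學位學程)
Discipline: Humanities
Field: Translation
Thesis type: Academic thesis
Publication year: 2016
Academic year of graduation: 104 (ROC calendar)
Language: English
Pages: 110
Keywords (Chinese): 平行及單語語料; 語料庫工具; 翻譯參考資源; 法律翻譯; 法律英語
Keywords (English): parallel and monolingual corpora; corpus analysis tools; translation reference tool; legal translation; legal English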
Usage statistics:
  • Cited by: 3
  • Views: 626
  • Downloads: 0
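Chinese abstract (translated): Corpus linguistics and related technologies are increasingly important in translation, and specialized corpora are highly valuable as translation reference resources for specific professional domains. To address the shortage of reference resources that translators may face in specialized translation work, this study uses existing software to compile a legal corpus that combines Chinese-English parallel texts with monolingual English texts. The compilation relied on software tools and semi-automated methods for batch text processing, automatic word segmentation, part-of-speech tagging, paragraph (sentence) alignment, and extraction of corresponding phrases. The completed corpus was then analyzed with corpus software for keywords, corresponding phrases, n-grams, bilingual keyword searches, and monolingual keyword searches. The results show that the corpus analysis methods tried in this study can effectively help translators obtain many kinds of reference examples needed during translation. The keywords, phrase translations, common expressions, and translation strategies obtained in the process can also be accumulated and applied to building other forms of translation resources. The compilation and analysis methods used here can further be applied to translation in other specialized domains to support translators' needs. Many phenomena observed during the analysis also merit further exploration, in the hope of contributing to future translation practice and research.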

Corpora, well-organized bodies of “naturally occurring language data” sampled to represent a variety of language (McEnery, 2003, p. 449), have had a growing impact on the field of translation (Bernardini, Stewart, & Zanettin, 2003). Scholars have asserted the immense value of corpora as reference tools for translation practice in specialized subject domains (Bowker & Pearson, 2002; Varantola, 2003), where intrinsic features of the language may cause difficulties for the translator. To address the potential lack of reference tools for specialized translation assignments, this study explores a number of methods and computerized tools for compiling and analyzing a parallel and monolingual corpus of Chinese and English legislation. Incorporating semi-automated tools for text processing, part-of-speech tagging, sentence alignment, and phrasal alignment, this study uses keyword analysis, n-grams and n-gram part-of-speech sequences, and bilingual and monolingual concordance searches to identify terminology equivalents, stylistic features, usage patterns, and translation strategies for legal contexts. Findings suggest that, with the proposed methods, the corpus compiled in this study can effectively provide a range of information to aid the work of legal translators. The information identified can also be applied to compiling other forms of translation resources. It is hoped that in future research the corpus tools and approaches employed in this study can be applied to other specialized fields of translation, and that the preliminary findings observed here can be explored further to benefit future work in this discipline.
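The three analyses named in the abstract (keyword analysis, n-grams, and concordance search) can be illustrated with a minimal Python sketch. This is not the thesis's own pipeline, which relied on existing corpus software. The tokenizer, the function names, and the two tiny sample texts are assumptions made up for illustration, and the keyness score is the two-corpus log-likelihood statistic commonly used for frequency comparison.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split English text into word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def ngrams(tokens, n):
    """Count every contiguous sequence of n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood keyness of one word, given its counts in two corpora."""
    total = freq_study + freq_ref
    expected_study = size_study * total / (size_study + size_ref)
    expected_ref = size_ref * total / (size_study + size_ref)
    score = 0.0
    for observed, expected in ((freq_study, expected_study), (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


def keywords(study_tokens, ref_tokens, top=5):
    """Rank words that are relatively more frequent in the study corpus."""
    study, ref = Counter(study_tokens), Counter(ref_tokens)
    n_study, n_ref = len(study_tokens), len(ref_tokens)
    scored = [
        (word, log_likelihood(freq, n_study, ref[word], n_ref))
        for word, freq in study.items()
        if freq / n_study > ref[word] / n_ref
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]


def kwic(tokens, node, width=3):
    """Keyword-in-context lines: (left context, node word, right context)."""
    return [
        (" ".join(tokens[max(0, i - width):i]), tok, " ".join(tokens[i + 1:i + 1 + width]))
        for i, tok in enumerate(tokens)
        if tok == node
    ]


# Hypothetical sample texts, invented for this sketch (not taken from the thesis corpus).
LEGAL_SAMPLE = (
    "The competent authority shall notify the applicant in writing. "
    "The applicant shall submit the application within thirty days. "
    "The competent authority may reject the application."
)
GENERAL_SAMPLE = (
    "The weather was fine and the children went to the park. "
    "They may come back later if the rain stops."
)

legal_tokens = tokenize(LEGAL_SAMPLE)
general_tokens = tokenize(GENERAL_SAMPLE)

if __name__ == "__main__":
    print(keywords(legal_tokens, general_tokens))
    print(ngrams(legal_tokens, 2).most_common(3))
    for left, node, right in kwic(legal_tokens, "shall"):
        print(f"{left:>30} [{node}] {right}")
```

On real data the reference corpus would be a large general-language corpus, and a segmenter would be needed before any of these functions could be applied to Chinese text.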

Acknowledgements
Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Background of Study
1.1.1 Technologies in Translation
1.1.2 References for Legal Translation
1.2 Purpose and Research Questions
1.3 Significance of Study
1.4 Terminology
1.5 Outline of Thesis
Chapter 2 Literature Review
2.1 Statistical Techniques in Corpus Analysis
2.1.1 Frequency Data, Keywords and Keyness
2.1.2 Phraseology and N-grams
2.1.3 Pattern Grammar and Concordances
2.2 Corpora as Translation Reference Tool
2.2.1 Corpus Typology and Usage
2.2.2 Designing of Specialized Corpora
2.3 Computational Linguistics and Corpus Processing
2.3.1 Part-of-Speech Tagging
2.3.2 Statistical Machine Translation
2.3.3 Sentence Alignment
2.3.4 Word and Phrasal Alignment
2.4 Corpus-based Studies on Legal Language and Translation
2.4.1 Legal Language and External Variation
2.4.2 Internal Variation of Legal English
2.4.3 Legal Language in Chinese and English Contracts
Chapter 3 Method
3.1 Corpus Selection
3.2 Corpus Processing and Annotation
3.2.1 Text Processing
3.2.2 Part-of-Speech Tagging
3.2.3 Sentence and Phrasal Alignment
3.3 Corpora Analysis
3.3.1 Keyword Analysis
3.3.2 Terminology Equivalents and Translation Units
3.3.3 Exploring Stylistic Features
3.3.4 Utilizing Translational and Non-translational Corpora
Chapter 4 Results and Discussion
4.1 Keyword Analysis and Preliminary Observation
4.2 Terminology Equivalents and Translation Units
4.2.1 Theme-related Terminology
4.2.2 Taiwan-specific Proper-nouns
4.3 Stylistic Features: Modals
4.3.1 Shall
4.3.2 May
4.4 Translational and Non-translational Corpora in Conjunction
4.4.1 Extended Terminology Information
4.4.2 Writing Style
Chapter 5 Conclusion
5.1 Summary of Findings and Implications
5.1.1 Terminology Equivalents and Computerized Tools
5.1.2 Stylistic Features and Patterns
5.1.3 Translational and Non-translational Corpora
5.1.4 Implications
5.2 Limitations and Suggestions for Future Research
References
Appendix

Anthony, L. (2014a). AntConc (3.4.3) [Computer Software]. Tokyo, Japan: Waseda University. Available from http://www.antlab.sci.waseda.ac.jp/
Anthony, L. (2014b). AntConc help file. Version 001. Laurence Anthony. Retrieved from http://www.laurenceanthony.net/software/antconc/releases/AntConc344/help.pdf
Baker, M. (1995). Corpora in translation studies: An overview and some suggestions for future research. Target, 7(2), 223-243.
Barnett, B. (2015). Sed - An introduction and tutorial by Bruce Barnett. The Grymoire - home for UNIX wizards. Retrieved from http://www.grymoire.com/Unix/sed.html
Bernardini, S., Stewart, D., & Zanettin, F. (2003). Corpora in translator education: An introduction. In F. Zanettin, S. Bernardini, & D. Stewart (Eds.), Corpora in translator education (pp. 1-14). UK: St. Jerome Publishing.
Biber, D. (1988). Variation across speech and writing. Cambridge University Press.
Biber, D. (2004). Conversation text types: A multi-dimensional analysis. In G. Purnelle, C. Fairon, & A. Dister (Eds.), Proceedings from JADT 2004: The 7th International Conference on Textual Data Statistical Analysis (pp. 926-936). Louvain-la-Neuve, Belgium: Presses universitaires de Louvain.
Biber, D., & Conrad, S. (1999). Lexical bundles in conversation and academic prose. In H. Hasselgård & S. Oksefjell (Eds.), Out of corpora: Studies in honour of Stig Johansson (pp. 181-189). Amsterdam: Rodopi.
Biber, D., Conrad, S., & Cortes, V. (2004). If you look at...: Lexical bundles in university teaching and textbooks. Applied Linguistics, 25(3), 371-405.
Biel, Ł. (2010). Corpus-based studies of legal language for translation purposes: Methodological and practical potential. In C. Heine & J. Engberg (Eds.), Reconceptualizing LSP, Online proceedings of the XVII European LSP Symposium 2009.
Bondi, M., & Scott, M. (Eds.). (2010). Keyness in texts. Philadelphia: J. Benjamins.
Bowker, L. (2002). Computer-aided translation technology: A practical introduction. Ottawa: University of Ottawa Press.
Bowker, L., & Pearson, J. (2002). Working with specialized language: A practical guide to using corpora. London; New York: Routledge.
Brown, P., Cocke, J., Della Pietra, S., Della Pietra, V., Jelinek, F., Mercer, R., & Roossin, P. (1990). A statistical approach to language translation. Proceedings of the 12th Conference on Computational Linguistics, 71-76. Stroudsburg: Association for Computational Linguistics.
Brown, P. F., Lai, J. C., & Mercer, R. L. (1991). Aligning sentences in parallel corpora. Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics, 169-176. Berkeley.
Cao, D. (2007). Translating law. Clevedon: Multilingual Matters.
Chen, P. (2012). Using comparable specialized corpora with machine translation for extracting n-gram translation equivalents: A case study of Chinese and English contracts (Doctoral dissertation). Available from National Digital Library of Theses and Dissertations in Taiwan.
Cheng, N. (2013). CUC_ParaConc (0.3) [Computer Software]. Beijing, China: Communication University of China. Available from http://ling.cuc.edu.cn/chs/News_View.asp?NewsID=244
Cheng, N., & Hou, M. (2012). Parallel corpus retrieval technology research. Computer Engineering and Applications, 48(31), 134-139.
Cheng, Y.-C. (2013). A corpus-based analysis of character usage in Chinese transliteration: A case study of newspapers in Taiwan and Mainland China (Master’s thesis). Available from National Digital Library of Theses and Dissertations in Taiwan.
Church, K. W., & Gale, W. A. (1991). Concordances for parallel text. Using corpora: Proceedings of the Eighth Annual Conference of the UW Centre for the New OED and Text Research, 40-62. Oxford.
Coulthard, M., & Johnson, A. (2007). An introduction to forensic linguistics: Language in evidence. London; New York: Routledge.
Dagan, I., Church, K. W., & Gale, W. A. (1993). Robust bilingual word alignment for machine aided translation. Proceedings of the Workshop on Very Large Corpora: Academic and Industrial Perspectives, 1-8. Ohio.
Dunning, T. (1993). Accurate methods for the statistics of surprise and coincidence. Computational Linguistics, 19(1), 61-74.
European Association for Machine Translation. (n.d.). What is machine translation? Retrieved from http://www.eamt.org/mt.php
Farkas, A. (2015). LF Aligner (4.1) [Computer Software]. Available from http://sourceforge.net/projects/aligner/
Flowerdew, L. (2012). Corpora and language education. New York: Palgrave Macmillan.
Francis, W. N., & Kucera, H. (1964). Brown corpus. Retrieved from http://www.nltk.org/nltk_data/packages/corpora/brown.zip
Frankenberg-Garcia, A., & Santos, D. (2003). Introducing Compara: The Portuguese-English parallel corpus. In F. Zanettin, S. Bernardini, & D. Stewart (Eds.), Corpora in translator education (pp. 71-89). UK: St. Jerome Publishing.
Gale, W. A., & Church, K. W. (1991a). Identifying word correspondences in parallel texts. Proceedings of the DARPA Workshop on Speech and Natural Language, 152-157. Stroudsburg: Association for Computational Linguistics.
Gale, W. A., & Church, K. W. (1991b). A program for aligning sentences in bilingual corpora. Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics (ACL), 177-184. Berkeley.
Gardner, D., & Davies, M. (2013). A new academic vocabulary list. Applied Linguistics, 35, 1-24.
Gozdz-Roszkowski, S. (2011). Patterns of linguistic variation in American legal English: A corpus-based study. Frankfurt am Main; New York: Peter Lang AG.
Huddleston, R., & Pullum, G. K. (2002). The Cambridge grammar of the English language. Cambridge, U.K.; New York: Cambridge University Press.
Hunston, S., & Francis, G. (2000). Pattern grammar: A corpus-driven approach to the lexical grammar of English. Philadelphia, PA: John Benjamins Publishing.
Ji, M. (2012). Hypothesis testing in corpus-based literary translation studies. In M. P. Oakes & M. Ji (Eds.), Quantitative methods in corpus-based translation studies: A practical guide to descriptive translation research (pp. 53-72). Philadelphia, PA: John Benjamins.
Kay, M., & Röscheisen, M. (1993). Text-translation alignment. Computational Linguistics, 19(1), 121-142.
Laviosa, S. (1998). Core patterns of lexical use in a comparable corpus of English narrative prose. Meta: Translators’ Journal, 43(4), 557-570.
Lee, D. Y. W. (2010). What corpora are available? In A. O’Keeffe & M. McCarthy (Eds.), The Routledge handbook of corpus linguistics (pp. 107-121). London; New York: Routledge.
Liu, T.-J. (2014). PTT Corpus: Construction and applications (Master’s thesis). Available from National Digital Library of Theses and Dissertations in Taiwan.
Lu, X. (2014). Computational methods for corpus annotation and analysis. Netherlands: Springer.
Maia, B. (2003). Training translators in terminology and information retrieval using comparable and parallel corpora. In F. Zanettin, S. Bernardini, & D. Stewart (Eds.), Corpora in translator education (pp. 43-54). UK: St. Jerome Publishing.
Marcus, M. P., Santorini, B., & Marcinkiewicz, M. A. (1993). Building a large annotated corpus of English: The Penn Treebank. Technical Report MS-CIS-93-87, Department of Computer and Information Science, University of Pennsylvania.
McEnery, T. (2003). Corpus linguistics. In R. Mitkov (Ed.), The Oxford handbook of computational linguistics (pp. 448-463). New York: Oxford University Press.
McEnery, T., & Wilson, A. (2001). Corpus linguistics: An introduction. Edinburgh: Edinburgh University Press.
Mikheev, A. (2003). Text segmentation. In R. Mitkov (Ed.), The Oxford handbook of computational linguistics (pp. 201-218). New York: Oxford University Press.
Ministry of Justice. (2015). Laws & Regulations Database of the Republic of China. Available from http://law.moj.gov.tw/
Mitkov, R. (Ed.) (2003). The Oxford handbook of computational linguistics. New York: Oxford University Press.
Neubig, G. (2012). pialign (0.2.4) [Computer Software]. Retrieved from http://phontron.com/pialign/download/pialign-0.2.4.tar.gz
Neubig, G., Watanabe, T., Sumita, E., Mori, S., & Kawahara, T. (2011). An unsupervised model for joint phrase alignment and extraction. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, 632-641. Portland.
Office of the Law Revision Counsel. (2015). United States code. Available from http://uscode.house.gov/browse.xhtml
Pearson, J. (2003). Using parallel texts in the translator training environment. In F. Zanettin, S. Bernardini, & D. Stewart (Eds.), Corpora in translator education (pp. 15-24). UK: St. Jerome Publishing.
Quah, C. K. (2006). Translation and technology. New York: Palgrave Macmillan.
Rayson, P., Berridge, D., & Francis, B. (2004). Extending the Cochran rule for the comparison of word frequencies between corpora. In G. Purnelle, C. Fairon, & A. Dister (Eds.), Proceedings from JADT 2004: The 7th International Conference on Statistical Analysis of Textual Data (pp. 15-34). Louvain-la-Neuve, Belgium: Presses universitaires de Louvain.
Rayson, P., & Garside, R. (2000). Comparing corpora using frequency profiling. Proceedings of the Workshop on Comparing Corpora, 9, 1-6. Stroudsburg: Association for Computational Linguistics.
Red Hat, Inc. (2015). Cygwin user’s guide. Retrieved from https://cygwin.com/cygwin-ug-net/cygwin-ug-net.html
Reppen, R. (2010). Building a corpus: What are the key considerations? In A. O’Keeffe & M. McCarthy (Eds.), The Routledge handbook of corpus linguistics (pp. 31-37). London; New York: Routledge.
Rossini Favretti, R., Tamburini, F., & Martelli, E. (2007). Words from BOnonia Legal Corpus. In W. Teubert (Ed.), Text corpora and multilingual lexicography (pp. 11-30). Amsterdam; Philadelphia: John Benjamins Publishing Company.
Santorini, B. (1990). Part-of-speech tagging guidelines for the Penn Treebank Project. Technical report MS-CIS-90-47, Department of Computer and Information Science, University of Pennsylvania.
Šarčević, S. (1997). New approach to legal translation. The Hague: Kluwer Law International.
Scott, M. (1997). PC analysis of key words – And key key words. System, 25(2), 233-245.
Scott, M. (2000). WordSmith tools help manual. Version 3.0. Mike Scott and Oxford University Press.
Scott, M., & Tribble, C. (2006). Textual patterns: Key words and corpus analysis in language education. Philadelphia: J. Benjamins.
Shuttleworth, M., & Lagoudaki, E. (2006). Translation memory systems: Technology in the service of the translation professional. Paper presented at 1st Athens International Conference of Translation and Interpretation, Athens, Greece. Retrieved from http://project2007.hau.gr/telamon/files/MarkShuttleworth_ElinaLagoudaki_PaperAICTI.pdf
Sinclair, J. (2003). Reading concordances: An introduction. London; New York: Pearson/Longman.
Somers, H. (2003). Machine translation: Latest developments. In R. Mitkov (Ed.), The Oxford handbook of computational linguistics (pp. 512-528). New York: Oxford University Press.
Stenberg, D. (2015). cURL (7.41.0) [Computer Software]. Available from http://curl.haxx.se/
Stubbs, M. (2001). Words and phrases: Corpus studies of lexical semantics. Oxford; Malden, MA: Blackwell Publishers.
Stubbs, M. (2007). An example of frequent English phraseology: Distributions, structures and functions. In R. Facchinetti (Ed.), Corpus linguistics 25 years on (pp. 89-105). Amsterdam; New York: Rodopi.
Toutanova, K., Klein, D., Manning, C., & Singer, Y. (2003). Feature-rich part-of-speech tagging with a cyclic dependency network. Proceedings of HLT-NAACL 2003, 252-259.
Toyama, K. (2011). Brief introduction to Bilingual KWIC for Taiwan Laws [PowerPoint slides]. Retrieved from http://www.slidefinder.net/t/taiwanlii-workshop-toyamaenglish20110607/32657725
Varantola, K. (2002). Disposable corpora as intelligent tools in translation. Cadernos de Tradução, 1(9), 171-189.
Varantola, K. (2003). Translators and disposable corpora. In F. Zanettin, S. Bernardini, & D. Stewart (Eds.), Corpora in translator education (pp. 55-70). UK: St. Jerome Publishing.
Varga, D., Németh, L., Halácsy, P., Kornai, A., Trón, V., & Nagy, V. (2005). Parallel corpora for medium density languages. Proceedings of the RANLP 2005, 590-596.
Véronis, J. (2000). From the Rosetta Stone to the information society: A survey of parallel text processing. In J. Véronis (Ed.), Parallel text processing: Alignment and use of translation corpora (pp. 1-24). Boston: Kluwer Academic Publishers.
Voutilainen, A. (2003). Part-of-speech tagging. In R. Mitkov (Ed.), The Oxford handbook of computational linguistics (pp. 219-232). New York: Oxford University Press.
Wu, D. (1995a). Stochastic inversion transduction grammars, with application to segmentation, bracketing, and alignment of parallel corpora. Technical Report HKUST-CS95-30, Department of Computer Science, Hong Kong University of Science and Technology.
Wu, D. (1995b). Grammarless extraction of phrasal translation examples from parallel texts. Proceedings of the Sixth International Conference on Theoretical and Methodological Issues in Machine Translation, 2, 354-372. Leuven, Belgium.
