Graduate student (English name): Jian-Cheng Wu
Thesis title (English): Bilingual Collocation Extraction Based on Syntactic and Statistical Analyses
Advisor (English name): Jason S. Chang
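In this thesis, we propose an algorithm that extracts bilingual collocations from a parallel corpus by means of linguistic and statistical analyses. We obtain the typical part-of-speech patterns of collocations from the collocations and idioms contained in a machine-readable dictionary. From the aligned Chinese and English sentences of the parallel corpus, we extract contiguous word sequences that match these part-of-speech patterns and show high statistical association, and treat them as candidate collocations. The collocations extracted for the two languages are then matched up using statistics of association between English and Chinese. During matching, both the association between the two collocations as wholes and the relations between the individual words of one collocation and the words of the other language are taken into account. We implemented the method and tested it on the Sinorama Magazine (光華雜誌) Chinese-English parallel corpus to evaluate the system's performance. The experimental results are quite satisfactory: about 80% of the extracted bilingual collocations are correct (precision), and the system extracts about 60% of the true collocations in the documents (recall).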

In this paper, we describe an algorithm that employs linguistic and statistical analyses to extract bilingual collocations from a parallel corpus. Preferred bilingual syntactic patterns of collocations are obtained from the idioms and collocations in a machine-readable dictionary. Phrases matching these patterns are extracted from aligned sentences in a parallel corpus. Those phrases are subsequently matched up based on cross-linguistic statistical association. Statistical associations between whole collocations, as well as between the words within them, are used jointly to link a collocation with its counterpart in the other language. We experimented with an implementation of the proposed method on a very large Chinese-English parallel corpus, with satisfactory results.
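
The matching step described above can be illustrated with a minimal sketch. The abstract does not name the association measure, so the Dice coefficient used here is an assumption chosen only for illustration, as are the function name `dice_links`, the `min_score` threshold, and the toy phrase pairs. The sketch covers only the association between whole candidate collocations; the word-level evidence that the thesis combines with it is not modeled.

```python
from collections import Counter
from itertools import product


def dice_links(aligned_candidates, min_score=0.5):
    """Link each English candidate to its best-scoring Chinese candidate.

    aligned_candidates: one (english_phrases, chinese_phrases) tuple per
    aligned sentence pair. The Dice coefficient is an assumed stand-in for
    the unspecified association statistic.
    """
    en_count, zh_count, pair_count = Counter(), Counter(), Counter()
    for en_phrases, zh_phrases in aligned_candidates:
        en_set, zh_set = set(en_phrases), set(zh_phrases)
        en_count.update(en_set)   # sentence pairs containing each English phrase
        zh_count.update(zh_set)   # sentence pairs containing each Chinese phrase
        pair_count.update(product(en_set, zh_set))  # co-occurrence counts

    links = {}
    for (en, zh), both in pair_count.items():
        score = 2 * both / (en_count[en] + zh_count[zh])
        # Keep the highest-scoring counterpart at or above the threshold.
        if score >= min_score and score > links.get(en, ("", 0.0))[1]:
            links[en] = (zh, score)
    return links


# Hypothetical toy data, not taken from the thesis corpus.
pairs = [
    (["heavy rain", "take place"], ["大雨", "舉行"]),
    (["heavy rain"], ["大雨"]),
    (["take place"], ["舉行"]),
]
links = dice_links(pairs)
for en, (zh, score) in sorted(links.items()):
    print(en, zh, score)
```

In this toy setting each English phrase co-occurs more often with its true counterpart than with the other candidate, so the higher Dice score selects the intended pair. A fuller version would add the word-level association the abstract mentions.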

Chapter 1 Introduction
Chapter 2 Extraction of Bilingual Collocations
2.1 An Example of Extracting Bilingual Collocations
2.2 The Method
Chapter 3 Experiments and Evaluation
Chapter 4 Discussions
Chapter 5 Conclusion and Future Work
Appendix I: Test Data

