National Digital Library of Theses and Dissertations in Taiwan

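Author: 吳鑑城
Author (English): Jian-Cheng Wu
Thesis title (Chinese): 語言及統計分析為本之雙語搭配擷取
Thesis title (English): Bilingual Collocation Extraction Based on Syntactic and Statistical Analyses
Advisor: 張俊盛
Advisor (English): Jason S. Chang
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Computer Science
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2003
Graduation academic year: 91 (ROC calendar, 2002-2003)
Language: Chinese
Pages: 68
Keywords: collocation; bilingual collocation; statistical linguistic analysis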
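In this thesis, we propose an algorithm that uses linguistic and statistical analyses to extract bilingual collocations from a parallel corpus. We first obtain the part-of-speech patterns that collocations typically follow from the collocations and idioms in a machine-readable dictionary. From the aligned Chinese and English sentences of the parallel corpus, we then extract contiguous word sequences that match these patterns and have high statistical association as candidate collocations. The candidates extracted in the two languages are then matched using cross-language association statistics between English and Chinese. During matching, both the association between the two collocations as wholes and the relations between their individual words and words in the other language are taken into account. We implemented the method and tested it on the Sinorama (光華雜誌) Chinese-English parallel corpus to evaluate the system's performance. The results are satisfactory: about 80% of the extracted bilingual collocations are correct (precision), and the system extracts about 60% of the true collocations in the documents (recall).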
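The precision and recall figures quoted in the abstract can be read with the help of a small sketch. The function name `precision_recall` and the toy collocation pairs below are made up for illustration; they are not the thesis's test data or evaluation code.

```python
def precision_recall(extracted, gold):
    """Compute precision and recall of extracted items against a gold set."""
    extracted, gold = set(extracted), set(gold)
    correct = extracted & gold
    # Precision: share of extracted items that are correct.
    precision = len(correct) / len(extracted) if extracted else 0.0
    # Recall: share of gold items that were extracted.
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data: four extracted pairs, three of which are in the gold set.
extracted = {
    ("take medicine", "吃藥"),
    ("make money", "賺錢"),
    ("pay attention", "注意"),
    ("open door", "開會"),  # deliberately wrong pair
}
gold = {
    ("take medicine", "吃藥"),
    ("make money", "賺錢"),
    ("pay attention", "注意"),
    ("take time", "花時間"),
    ("play role", "扮演角色"),
    ("solve problem", "解決問題"),
}

precision, recall = precision_recall(extracted, gold)
print(precision, recall)
```

In the abstract's terms, precision is measured over the bilingual collocations the system outputs, while recall is measured over the true collocations present in the documents.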

In this paper, we describe an algorithm that employs linguistic and statistical analyses to extract bilingual collocations from a parallel corpus. Preferred bilingual syntactic patterns of collocations are obtained from idioms and collocations in a machine-readable dictionary. Phrases matching the patterns are extracted from aligned sentences in a parallel corpus. Those phrases are subsequently matched up based on cross-linguistic statistical association. Statistical associations between whole collocations, as well as between the words in those collocations, are used jointly to link a collocation with its counterpart in the other language. We experimented with an implementation of the proposed method on a very large Chinese-English parallel corpus, with satisfactory results.
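The two steps outlined above (pattern-based candidate extraction, then cross-language linking) might be sketched roughly as follows. This is a simplified illustration under several assumptions, not the thesis's method: the POS patterns, the tag names, the Dice coefficient as the association measure, and the toy sentences are all hypothetical, and the sketch scores only whole collocations, omitting the word-level associations that the abstract says are used jointly.

```python
from collections import Counter

# Hypothetical POS patterns; the thesis derives its patterns from a dictionary.
EN_PATTERNS = {("V", "N")}
ZH_PATTERNS = {("V", "N")}


def candidates(tagged_sentence, patterns):
    """Return adjacent word pairs whose POS tags match an allowed pattern."""
    return {
        (w1, w2)
        for (w1, t1), (w2, t2) in zip(tagged_sentence, tagged_sentence[1:])
        if (t1, t2) in patterns
    }


def link_collocations(aligned_pairs):
    """Link each English candidate to the Chinese candidate with the top Dice score."""
    en_count, zh_count, joint = Counter(), Counter(), Counter()
    for en_sent, zh_sent in aligned_pairs:
        en_cands = candidates(en_sent, EN_PATTERNS)
        zh_cands = candidates(zh_sent, ZH_PATTERNS)
        # Count in how many aligned sentence pairs each candidate occurs.
        en_count.update(en_cands)
        zh_count.update(zh_cands)
        joint.update((e, z) for e in en_cands for z in zh_cands)

    best = {}
    for (e, z), n in joint.items():
        # Dice coefficient (an assumed measure): 2 * joint / (count_e + count_z).
        dice = 2 * n / (en_count[e] + zh_count[z])
        if e not in best or dice > best[e][1]:
            best[e] = (z, dice)
    return best


# Hypothetical POS-tagged, sentence-aligned toy corpus.
aligned = [
    ([("take", "V"), ("medicine", "N")],
     [("吃", "V"), ("藥", "N")]),
    ([("take", "V"), ("medicine", "N"), ("daily", "ADV")],
     [("每天", "ADV"), ("吃", "V"), ("藥", "N")]),
    ([("take", "V"), ("time", "N")],
     [("花", "V"), ("時間", "N")]),
]

links = link_collocations(aligned)
for en, (zh, score) in links.items():
    print(en, zh, score)
```

A fuller version would also filter candidates by a monolingual association score before linking and would combine the collocation-level score with word-level translation evidence, as the abstract describes.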

Chapter 1 Introduction
Chapter 2 Extraction of Bilingual Collocations
2.1 An Example of Extracting Bilingual Collocations
2.2 The Method
Chapter 3 Experiments and Evaluation
Chapter 4 Discussions
Chapter 5 Conclusion and Future Work
Reference
Appendix I: Test Data

