|
[1] S.F. Adafre, and M. de Rijke, Finding similar sentences across multiple languages in Wikipedia, Proceedings of the EACL Workshop on New Text, 2006.
[2] D. Andrade, T. Matsuzaki, and J. Tsujii, Learning the Optimal Use of Dependency-parsing Information for Finding Translations with Comparable Corpora, 4th Workshop on Building and Using Comparable Corpora, USA, 2011.
[3] P.F. Brown, V.J. Della Pietra, S.A. Della Pietra, and R.L. Mercer, The mathematics of statistical machine translation: parameter estimation, Computational Linguistics 19 (1993).
[4] P. Cheung, and P. Fung, Sentence alignment in parallel, comparable, and quasi-comparable corpora, LREC 2004 Workshop, 2004.
[5] J. Civera, and A. Juan, Unigram-IBM model 1 mixtures for bilingual text classification, Proceedings of LREC'08, 2008.
[6] M. Collins, and Y. Singer, Unsupervised models for named entity classification, Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, 1999.
[7] S. Cucerzan, Large-scale named entity disambiguation based on Wikipedia data, Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 2007.
[8] D. Varga, P. Halácsy, A. Kornai, V. Nagy, L. Németh, and V. Trón, Parallel corpora for medium density languages, Proceedings of RANLP'2005, Bulgaria, 2005, pp. 590-596.
[9] G. Doddington, Automatic evaluation of machine translation quality using n-gram co-occurrence statistics, Proceedings of the Second International Conference on Human Language Technology Research, 2002.
[10] W.A. Gale, and K.W. Church, A program for aligning sentences in bilingual corpora, Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics, California, 1991, pp. 177-184.
[11] J. Giles, Internet encyclopaedias go head to head, Nature, 2005.
[12] J. Goodman, A Bit of Progress in Language Modeling, Technical report, Microsoft Research, 2001.
[13] M. Hepp, K. Siorpaes, and D. Bachlechner, Harvesting Wiki Consensus: Using Wikipedia Entries as Vocabulary for Knowledge Management, IEEE Internet Computing, 2007, pp. 54-65.
[14] S. Hewavitharana, and S. Vogel, Extracting Parallel Phrases from Comparable Data, Proceedings of the 4th Workshop on Building and Using Comparable Corpora, 2011, pp. 61-68.
[15] C.-C. Hsu, Y.-T. Li, Y.-W. Chen, and S.-H. Wu, Query Expansion via Link Analysis of Wikipedia for CLIR, Proceedings of NTCIR-7, 2008.
[16] T. Joachims, Text categorization with support vector machines, 1998.
[17] S. Katz, Estimation of Probabilities from Sparse Data for the Language Model Component of a Speech Recognizer, IEEE Transactions on Acoustics, Speech, and Signal Processing (1987).
[18] J. Kazama, and K. Torisawa, Exploiting Wikipedia as external knowledge for named entity recognition, Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 2007.
[19] P. Koehn, A parallel corpus for statistical machine translation, Proceedings of MT-Summit, 2005.
[20] P. Koehn, H. Hoang, A. Birch, C. Callison-Burch, M. Federico, N. Bertoldi, B. Cowan, W. Shen, C. Moran, R. Zens, C. Dyer, O. Bojar, A. Constantin, and E. Herbst, Moses: Open Source Toolkit for Statistical Machine Translation, Annual Meeting of the Association for Computational Linguistics (ACL), 2007.
[21] D.D. Lewis, Naive (Bayes) at forty: The independence assumption in information retrieval, Tenth European Conference on Machine Learning, 1998.
[22] M.-C. Lin, M.-X. Li, C.-C. Hsu, and S.-H. Wu, Query Expansion from Wikipedia and Topic Web Crawler on CLIR, Proceedings of NTCIR-8 Workshop, 2010.
[23] X. Ma, Champollion: A Robust Parallel Text Sentence Aligner, Proceedings of LREC, 2006.
[24] X. Ma, and C. Cieri, Corpus support for machine translation at LDC, Proceedings of LREC-2006, 2006.
[25] X. Ma, and M. Liberman, BITS: A method for bilingual text search over the web, Proceedings of the Machine Translation Summit VII, 1999.
[26] K. Maeda, X. Ma, and S. Strassel, Creating Sentence-Aligned Parallel Text Corpora from a Large Archive of Potential Parallel Text using BITS and Champollion, the Sixth Language Resources and Evaluation Conference, 2008, pp. 26-30.
[27] F.J. Och, An Efficient Method for Determining Bilingual Word Classes, Proceedings of the European Chapter of the Association for Computational Linguistics, 1999.
[28] F.J. Och, and H. Ney, A Systematic Comparison of Various Statistical Alignment Models, Computational Linguistics 29 (2003) 19-51.
[29] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, BLEU: a method for automatic evaluation of machine translation, Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 2002.
[30] D. Pinto, J. Civera, A. Juan, P. Rosso, and A. Barrón-Cedeño, A statistical approach to crosslingual natural language tasks, Algorithms Cognition, Informatics and Logic 64 (2009) 51-60.
[31] C.J. van Rijsbergen, S.E. Robertson, and M.F. Porter, New models in probabilistic information retrieval, British Library Research and Development Report, London, 1980.
[32] J.R. Smith, C. Quirk, and K. Toutanova, Extracting parallel sentences from comparable corpora using document level alignment, HLT '10 Human Language Technologies, 2010.
[33] R. Steinberger, B. Pouliquen, A. Widiger, C. Ignat, T. Erjavec, D. Tufiş, and D. Varga, The JRC-Acquis: A Multilingual Aligned Parallel Corpus with 20+ Languages, Proceedings of the 5th International Conference on Language Resources and Evaluation, 2006.
[34] S. Strassel, M. Przybocki, K. Peterson, Z. Song, and K. Maeda, Linguistic Resources and Evaluation Techniques for Evaluation of Cross-Document Automatic Content Extraction, Proceedings of the 6th International Conference on Language Resources and Evaluation, 2008.
[35] C.-Y. Su, S.-H. Wu, and T.-C. Lin, Using Wikipedia to translate OOV term on MLIR, Proceedings of NTCIR-6 Workshop, 2007.
[36] J. Tiedemann, and L. Nygaard, The OPUS corpus - parallel & free, Proceedings of LREC'04, 2004, pp. 1183-1186.
[37] F.M. Tyers, and J.A. Pienaar, Extracting bilingual word pairs from Wikipedia, Proceedings of the SALTMIL Workshop at Language Resources and Evaluation Conference, 2008.
[38] M. Utiyama, and H. Isahara, A Japanese-English patent parallel corpus, Proceedings of MT Summit XI, 2007.
[39] T. Utsuro, H. Ikeda, M. Yamane, Y. Matsumoto, and M. Nagao, Bilingual text matching using bilingual dictionary and statistics, COLING'94, 1994, pp. 1076-1082.
[40] E.M. Voorhees, The TREC-8 question answering track report, Proceedings of the 8th Text Retrieval Conference, 1999, pp. 77-82.
[41] W.A. Gale, and K.W. Church, A Program for Aligning Sentences in Bilingual Corpora, Computational Linguistics 19 (1993) 75-102.
[42] F. Wong, M. Dong, and D. Hu, Machine translation based on translation corresponding tree structure, Tsinghua Science & Technology, 2006.
[43] P. Wong, and C. Chan, Chinese word segmentation based on maximum matching and word binding force, COLING'96, Copenhagen, 1996.
[44] D. Wu, A polynomial-time algorithm for statistical machine translation, Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL), California, USA, 1996, pp. 152-158.
[45] T. Xiao, H. Zhang, Q. Li, Q. Lu, J. Zhu, F. Ren, and H. Wang, The NiuTrans Machine Translation System for CWMT2011, Proc. of the 6th China Workshop on Machine Translation (CWMT), China, 2011, pp. 59-66.
[46] Y. Yang, and J.O. Pedersen, A comparative study on feature selection in text categorization, International Conference on Machine Learning, 1997.
[47] H.-P. Zhang, H.-K. Yu, D.-Y. Xiong, and Q. Liu, HHMM-based Chinese lexical analyzer ICTCLAS, Proceedings of the SIGHAN Workshop, 2003.
|