|
Current computational models for Natural Language Processing tend to combine statistics-based and rule-based models, since such hybrid models can readily retain the benefits of both. Word segmentation systems also tend to be implemented in this way. Although word segmentation systems are more mature than before, a serious problem remains: proper noun identification is a bottleneck that must be resolved. In this thesis, we propose models to identify several kinds of proper nouns. The key point is feature assignment: the proper nouns are not only identified but also assigned special features. Chinese surname-names occur more frequently than other proper nouns in general Chinese texts, and we propose a new model to identify them and to assign them features. Transliterated person-names form part of the person-names found in general texts. Compared with Chinese surname-names, the structures of transliterated person-names are more complex, and we propose a completely new approach to identifying them. Judging by the structures of proper nouns, organization names are much more complex than the former two types. An organization name may sometimes be composed of other proper nouns, e.g., Chinese surname-names, transliterated person-names, place names, and so on. In this thesis, we propose a new model to identify these proper nouns as well. Finally, we present some examples to explain our approaches and discuss the experimental results.
|