Graduate student (English name): Shi-Chi Li
Thesis title (English): A Study of Chinese Word Sense Disambiguation Based on Concept Primarily
Advisor (English name): Sue J. Ker
Word sense disambiguation (WSD) aims to determine the most appropriate sense of a polysemous word. Among the many approaches proposed, supervised learning has been the most successful. Supervised learning requires a manually sense-tagged corpus, from which information is extracted as features for disambiguating polysemous words. The common practice in feature selection is to use co-occurring context words as features. Considering only word types, however, ignores both the lexical variety of language and the polysemy of words: the information gathered while collecting features cannot be concentrated, or the feature set picks up noise that degrades disambiguation performance. This study therefore first maps each context word, through a sense-division dictionary, to the concepts that word represents, and then groups identical concepts together as features for disambiguating the target word. In the experiments, we use the Senseval-3 Chinese corpus as the experimental corpus and the HowNet knowledge base as the sense-division dictionary. The results show that the proposed method achieves good disambiguation performance and high applicability, and they support the hypothesis that converting context words into concepts before collecting them as features does concentrate information more effectively.
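The core idea, collecting concepts rather than word types as features, can be illustrated with a small sketch. Everything in it is hypothetical: `CONCEPT_DICT` is a toy stand-in for HowNet (whose entries define words through sememes), English words replace Chinese ones for readability, and scoring a sense by summing its collected concept counts is an assumed rule, not the thesis's actual scoring formula. "Applicability" is assumed here to mean the fraction of test instances the system tags at all.

```python
from collections import Counter, defaultdict

# Hypothetical toy sense-division dictionary: word -> concepts it can denote.
# In the thesis this role is played by HowNet; these entries are invented.
CONCEPT_DICT = {
    "deposit": {"finance", "money"},
    "loan": {"finance", "money"},
    "interest": {"finance", "money"},
    "river": {"waters"},
    "fishing": {"catch", "fish"},
    "shore": {"waters", "land"},
}


def to_concepts(context_words):
    """Replace each context word by its concepts; unknown words are dropped."""
    concepts = []
    for word in context_words:
        concepts.extend(sorted(CONCEPT_DICT.get(word, ())))
    return concepts


def train(tagged_instances):
    """Collect concept counts per sense from (context, sense) training pairs."""
    features = defaultdict(Counter)
    for context, sense in tagged_instances:
        features[sense].update(to_concepts(context))
    return features


def tag(features, context):
    """Return the sense with the highest summed concept count (assumed rule),
    or None when no concept in the context was seen in training."""
    concepts = to_concepts(context)
    scores = {s: sum(counts[c] for c in concepts) for s, counts in features.items()}
    best = max(scores, key=scores.get, default=None)
    if best is None or scores[best] == 0:
        return None
    return best


def evaluate(features, test_instances):
    """Precision over tagged instances, and applicability over all instances."""
    answered = correct = 0
    for context, gold in test_instances:
        pred = tag(features, context)
        if pred is not None:
            answered += 1
            correct += pred == gold
    precision = correct / answered if answered else 0.0
    applicability = answered / len(test_instances)
    return precision, applicability


# Toy data for an ambiguous target word "bank" (hypothetical labels).
TRAIN = [(["deposit", "loan"], "bank:finance"), (["river", "fishing"], "bank:shore")]
TEST = [(["interest"], "bank:finance"), (["shore"], "bank:shore"), (["weather"], "bank:finance")]

if __name__ == "__main__":
    features = train(TRAIN)
    print(evaluate(features, TEST))
```

In this toy run, "interest" never occurs in the training contexts, so a word-type feature would carry no evidence; mapped to concepts, it shares "finance" and "money" with "deposit" and "loan" and is tagged correctly. This is the information-concentrating effect the abstract describes. The unknown word "weather" yields no concepts and is left untagged, which lowers applicability rather than precision.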
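Acknowledgments
Table of Contents
1. Introduction
2. Literature Review
2.1 Common Lexical Resources
2.2 Word Sense Disambiguation Techniques
2.2.1 Supervised Learning
2.2.2 Unsupervised Learning
2.2.3 Partially Supervised Learning
2.3 Chinese Word Sense Disambiguation
3. Research Resources
3.1 The Senseval Corpora
3.1.1 The Senseval Workshops
3.1.2 The Senseval-3 Chinese Corpus
3.2 The HowNet Knowledge Base
4. Research Method
4.1 Observations
4.2 Using the HowNet Sememes of Co-occurring Context Words as Tagging Features
4.2.1 Collecting Sense Features
4.2.2 Tagging the Target Word
5. Experiments
5.1 Experimental Data
5.2 Parameter Settings
5.3 Experimental Design
5.4 Experimental Results
5.5 Analysis and Discussion
6. Conclusion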
