Author (English name): Jia-Cing Ruan
Thesis title (English): A Learning Methodology for English Grammar-based Ontologies
Advisors (English names): Daniel J. Buehrer; James Myers
Oral defense committee (English names): Jiun-Shiung Wu; Shu-Kai Hsieh; Zhao-Ming Gao
Keywords (English): Grammar-based ontology; Corpus Pattern Analysis; Construction Grammar; patterns; lexical-based
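This exploratory study attempts to introduce a grammar-based ontology and to build one in practice from data that combine syntactic and semantic structures. The background is the idea of Hanks (2004; 2013), who holds that the meaning of any word, especially a polysemous word, cannot be separated from the context in which it occurs. The context of a phrase or a sentence can therefore help to distinguish the meaning of a single word. Because WordNet has been criticized by Ide and Wilks (2006) for not distinguishing the multiple senses of words, Hanks uses patterns to achieve this purpose.
Hanks is not alone in holding that patterns themselves carry meaning: Construction Grammar (Goldberg, 1995), Construction Morphology (Booij, 2010), and Distributed Morphology (Halle and Marantz, 1993; Harley and Noyer, 1999; Embick and Noyer, 2007) also share this view. Present-day ontologies are all lexicon-based and describe information from the world or from a particular domain. Language, however, is not simply vocabulary; it also includes the meanings expressed by grammar, structure, and so on. Therefore, as the interface between language and information from the world or from a particular domain, ontologies should also be extended to grammar-based ontologies.
This dissertation implements a grammar-based ontology in program code, proposes related structural queries, uses the grammar-based ontology to automate Corpus Pattern Analysis, and uses it to provide an alternative method for building a Word Sketch system. Through these examples it illustrates the inadequacy of traditional lexicon-based ontologies and thereby makes its contribution.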

This preliminary study introduces the concept of a grammar-based ontology by building an ontology from patterns that conjoin syntactic and semantic structures. The background is Hanks’ (2004, 2013) view that no word, especially a polysemous word, has a clear meaning unless it occurs in a context. Consequently, phraseological patterns and collocations make it possible to disambiguate word meaning. Since WordNet (Fellbaum, 1998) is considered unreliable because it does not provide contrastive analyses of word senses (Ide and Wilks, 2006), using patterns instead of words to achieve this goal seems more reasonable.
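As a rough illustration of what a pattern that conjoins syntactic and semantic structure might look like, the following minimal sketch pairs each word with one syntactic tag and one semantic tag. The `Token` class, the `merge` and `pattern` helpers, and the hand-written tags (in the style of Penn Treebank and USAS labels) are assumptions made for this example; they are not tagger output and are not taken from the dissertation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    word: str
    pos: str  # syntactic tag, Penn Treebank style (hand-written placeholder)
    sem: str  # semantic tag, USAS style (hand-written placeholder)


def merge(words, pos_tags, sem_tags):
    """Zip parallel word, syntactic-tag and semantic-tag sequences into tokens."""
    if not (len(words) == len(pos_tags) == len(sem_tags)):
        raise ValueError("tag sequences must align with the words")
    return [Token(w, p, s) for w, p, s in zip(words, pos_tags, sem_tags)]


def pattern(tokens):
    """Render the merged sequence as a pattern of syntactic/semantic tag pairs."""
    return " ".join(f"{t.pos}/{t.sem}" for t in tokens)


tokens = merge(
    ["The", "dog", "ate", "food"],
    ["DT", "NN", "VBD", "NN"],
    ["Z5", "L2", "F1", "F1"],
)
merged_pattern = pattern(tokens)
print(merged_pattern)
```

In this sketch, two sentences with different words but the same sequence of tag pairs would fall under the same pattern, which is the sense in which the pattern, rather than the individual word, becomes the unit of description.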
Using Corpus Pattern Analysis (CPA) (Hanks, 2004), Hanks has produced a pattern dictionary of English verbs. This dictionary contains pattern information compiled from a corpus and is complementary to Construction Grammar (Goldberg, 1995) and FrameNet (Baker, Fillmore and Lowe, 1998). Hanks points out that Construction Grammar studies need corpus data as evidence rather than invented examples. FrameNet provides numerous frame structures, whereas Corpus Pattern Analysis focuses on systematic analysis of the relations between patterns and verbs.
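The idea that a verb's sense is selected by the pattern it occurs in can be sketched with a toy lookup. The semantic types, the two patterns for the verb "fire", and the sense glosses below are hypothetical entries invented for illustration; they are not copied from the Pattern Dictionary of English Verbs.

```python
# Hypothetical semantic types for a handful of nouns (illustration only).
SEMANTIC_TYPE = {
    "soldier": "Human",
    "manager": "Human",
    "employee": "Human",
    "rifle": "Firearm",
}

# Hypothetical patterns: verb -> list of ((subject type, object type), sense gloss).
PATTERNS = {
    "fire": [
        (("Human", "Firearm"), "discharge a weapon"),
        (("Human", "Human"), "dismiss from employment"),
    ],
}


def disambiguate(subject, verb, obj):
    """Return the sense whose pattern matches the argument types, or None."""
    key = (SEMANTIC_TYPE.get(subject), SEMANTIC_TYPE.get(obj))
    for slots, sense in PATTERNS.get(verb, []):
        if slots == key:
            return sense
    return None


print(disambiguate("soldier", "fire", "rifle"))
print(disambiguate("manager", "fire", "employee"))
```

The verb alone does not decide the sense here; the semantic types of its arguments do. A clause whose arguments match no listed pattern returns `None`, which in a corpus setting would be a candidate for manual inspection.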
Not only Corpus Pattern Analysis and one of its foundations, the Theory of Norms and Exploitations (TNE) (Hanks, 2013), but also Construction Grammar, Construction Morphology (Booij, 2010) and Distributed Morphology (Halle and Marantz, 1993; Harley and Noyer, 1999; Embick and Noyer, 2007) acknowledge that patterns or constructions carry meanings of their own. Thus, an ontology ought to be not only lexicon-based but also grammar-based or pattern-based.
The need for a grammar-based ontology has three dimensions. First, according to Construction Grammar, words, phrases and sentences are all form-meaning pairs, which shows that any grammatical (syntactic or semantic) element has a meaning corresponding to its form. These grammatical meanings, which express implicit abstract knowledge of the world, have long been ignored in ontology building. Second, according to Construction Morphology and Distributed Morphology, the meaning of a word formation (especially a compound or complex word) is predetermined by its syntactic pattern or construction. This indicates that patterns contribute to human knowledge. Third, without grammatical knowledge (whether conscious or not), a language cannot be used properly. This shows that traditional lexicon-based ontologies fail to reflect knowledge of real language usage, and thus also fail to reflect the meanings that language usage links to objects in the world.
Queries over a grammar-based ontology, and word-sketch-like systems built from one, were implemented. Neither the systems nor the query results can be reproduced with traditional ontologies, which implies the need for a grammar-based ontology. Furthermore, the grammar-based ontology has been applied to preliminary machine sentence generation, which traditional ontologies cannot handle either. Additionally, a grammar-based ontology can be applied to the study of typological issues in linguistics and to the analysis of language-learning reference books for educational purposes.
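The kinds of queries listed in the table of contents (the concepts and hypernym concepts of an instance, and the similarity or difference between two instances) can be sketched as set operations over an IS-A hierarchy. The concept names, the instances, and the helper functions below are hypothetical and stand in for whatever concepts the actual ontology contains.

```python
# Hypothetical IS-A links: concept -> parent concept.
IS_A = {
    "CommonNoun": "Noun",
    "ProperNoun": "Noun",
    "Noun": "Word",
    "Verb": "Word",
}

# Hypothetical instance-to-concept assignments.
INSTANCE_OF = {
    "dog": {"CommonNoun"},
    "London": {"ProperNoun"},
    "run": {"CommonNoun", "Verb"},
}


def concepts_of(instance):
    """Concepts that the instance belongs to directly."""
    return set(INSTANCE_OF.get(instance, set()))


def hypernyms_of(instance):
    """All ancestor concepts reached by following IS-A links upwards."""
    result = set()
    for concept in concepts_of(instance):
        parent = IS_A.get(concept)
        while parent is not None:
            result.add(parent)
            parent = IS_A.get(parent)
    return result


def similarity(a, b, query=concepts_of):
    """Concepts (or hypernym concepts) shared by two instances."""
    return query(a) & query(b)


def difference(a, b, query=concepts_of):
    """Concepts (or hypernym concepts) that a has and b lacks."""
    return query(a) - query(b)


print(similarity("dog", "run"))
print(difference("run", "dog"))
print(similarity("dog", "London", query=hypernyms_of))
```

Passing `hypernyms_of` as the `query` argument gives the hypernym-level versions of the same comparisons, so the five query types reduce to two lookups combined with set intersection and set difference.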
Compared with traditional ontologies, a grammar-based ontology is much more straightforward to evaluate, because it combines several systems whose precision is clearly known. However, the precision or quality of a grammar-based ontology changes if the tags change in the constituency parse results, dependency parse results or semantic tagging results. Applying different tagsets results in different performance.
The limitations of the grammar-based ontology are as follows. First, in the design proposed in the present dissertation, pragmatic information cannot be handled because non-taxonomic relations are lacking. In English, for example, coordinating and subordinating conjunctions play an important role as non-taxonomic relations between two concepts. Second, the grammar-based ontology does not provide much information for evaluating CPA.

Chapter 1: Introduction
1.1 Purpose and significance of the present study
1.2 The structure of this dissertation
1.3 Definitions of terms
1.3.1 Set theory
1.3.2 IS-A hierarchy
1.3.3 Constituency and dependency
1.3.4 Semantic Grammar
1.4 Summary
Chapter 2: Theoretical Foundations
2.1 Corpus Pattern Analysis
2.1.1 The concept of meanings in CPA
2.1.2 CPA steps
2.1.3 Pattern Dictionary of English Verbs
2.2 Relevant linguistic theories
2.2.1 Construction Grammar (Goldberg, 1995)
2.2.2 Construction Morphology (Booij, 2010)
2.2.3 Distributed Morphology (Halle and Marantz, 1993)
2.3 Ontologies
2.3.1 Background on ontology
2.3.2 Ontology in philosophy
2.3.3 Ontology in computer (information) science
2.3.4 Ontology presentation languages
2.4 Methods for building ontologies
2.5 Ontologies learning techniques
2.5.1 The linguistics technique
2.5.2 The statistical approach
2.5.3 Machine learning techniques
2.6 Research gaps
2.7 Summary
Chapter 3: Grammar-based Ontology and Building Methodology
3.1 Grammar-based ontology
3.2 Tools
3.2.1 NLTK 3.0
3.2.2 Stanford Parser 3.5.0
3.2.3 USAS semantic tagger
3.3 Procedures for building a grammar-based ontology
3.3.1 Manually type raw sentences in ASWE
3.3.2 Constituency parsing
3.3.3 Dependency parsing
3.3.4 Semantic tagging
3.3.5 Merging parsing and tagging
3.3.6 Extracting of concepts and instances
3.3.7 Extracting of the schema subordinates
3.3.8 Generating frames
3.4 Summary
Chapter 4: Results and applications
4.1 Descriptive results
4.2 Queries of the grammar-based ontology
4.2.1 Finding the concepts and the hypernym concepts of an instance
4.2.2 Finding the similarity of the concepts of two instances
4.2.3 Finding the difference of the concepts of two instances
4.2.4 Finding the similarity of the hypernym concepts of two instances
4.2.5 Finding the difference of the hypernym concepts of two instances
4.3 Automation of the Corpus Pattern Analysis (CPA)
4.4 Word-sketch-like systems
4.4.1 The syntactic word-sketch-like system
4.4.2 The semantic word-sketch-like system
4.5 The preliminary machine sentence generation
4.5.1 Generated word-by-word
4.5.2 Generated by phrases
4.5.3 Generated by phrases with ontological knowledge
4.5.4 A list of semantic grammar with syntactic tags
4.6 Evaluations
4.6.1 Evaluation of the grammar-based ontology
4.6.2 Evaluation of the automation of Corpus Pattern Analysis
4.6.3 Evaluation of the word-sketch-like systems
4.7 Summary
Chapter 5: Conclusions
References

