Author (English):Ya-Ching Lee
Thesis Title (English):A Study on Adding New Concepts to Domain Ontology
Advisor (English):Rung-Ching Chen
Keywords (English):Semantic Web; Domain Ontology; CKIP Segmentation; LSA Matrix
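In recent years, the application of ontologies has become a popular research topic on the Internet. Because an ontology can share and reuse the knowledge of a specific domain, it can serve as a communication channel between people and application systems, and it should therefore be updated as information changes rapidly. However, as information technology develops and new knowledge is continually proposed and discovered, an ontology that has already been constructed may no longer meet users' needs, so updating the ontology is necessary. In view of this, this study proposes an ontology update method that collects web pages of the relevant domain from the Internet and adds the terms whose meaning matches the domain ontology. First, the collected web pages are sent to the CKIP online word segmentation service, and nouns are extracted using POS (part-of-speech) tags. Next, a term-document matrix is built, the weight of each term is computed with the TF-IDF formula, and low-weight terms are removed. The matrix is then processed with Latent Semantic Analysis (LSA) to strengthen the latent semantic features of each term, producing an LSA matrix. We transform the LSA matrix into a high-dimensional space and use a Hamming distance function to compute the semantic similarity between terms and ontology concepts. This yields the new terms that are semantically most similar to the concepts of the domain ontology; each new term is then inserted at an appropriate position in the ontology, completing the update. According to the experimental results, the proposed method achieves 73% accuracy. We believe this approach can effectively address the large amount of time and labor required to construct and update ontologies manually.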
In recent years, ontology applications have become one of the most popular research topics on the Internet. Many researchers use domain ontologies to facilitate knowledge sharing and reuse. However, ontology specification is still incomplete, because no standard representation or construction method exists. Information on the Internet changes quickly, so the knowledge in an ontology often cannot meet users' requirements. An ontology that can be renewed is therefore necessary. In this paper, we propose an ontology renewal method that inserts new keywords into the corresponding constructed ontology. The method uses TF-IDF (Term Frequency-Inverse Document Frequency) and LSA (Latent Semantic Analysis) to strengthen the semantic characteristics of keywords, and transforms the LSA matrix, built from collected web pages, into a high-dimensional space. Similarity values between keywords and concepts are obtained by comparing each high-dimensional keyword with the concepts of the constructed ontology, based on group-center similarity. The keyword with the smallest distance value, that is, the one most similar to a concept, is inserted as a child of that concept in the domain ontology and becomes a domain concept. Preliminary experimental results indicate that the method achieves 73% accuracy.
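The weighting and matching steps described above can be illustrated with a minimal sketch. It is not the thesis implementation: the term counts are invented toy data, the LSA (SVD) step is omitted to keep the example dependency-free, and binarizing the TF-IDF weights before taking the Hamming distance is an assumption about how the "high-dimensional space" is encoded. The function names `tf_idf`, `binarize`, `hamming`, and `nearest_concept` are hypothetical.

```python
import math

def tf_idf(term_doc_counts):
    """Weight raw counts by log(N / df). Input: term -> list of counts per document."""
    n_docs = len(next(iter(term_doc_counts.values())))
    weights = {}
    for term, counts in term_doc_counts.items():
        df = sum(1 for c in counts if c > 0)          # documents containing the term
        idf = math.log(n_docs / df) if df else 0.0
        weights[term] = [c * idf for c in counts]
    return weights

def binarize(vector, threshold=0.0):
    """Assumed encoding: 1 where the weight exceeds the threshold, else 0."""
    return [1 if w > threshold else 0 for w in vector]

def hamming(a, b):
    """Number of positions at which two equal-length bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def nearest_concept(term_bits, concept_bits):
    """Return the concept whose bit vector is closest to the candidate term."""
    return min(concept_bits, key=lambda c: hamming(term_bits, concept_bits[c]))

# Toy term-document counts over four web pages (invented for illustration).
counts = {
    "ontology": [3, 2, 0, 0],   # existing concept
    "database": [0, 0, 4, 1],   # existing concept
    "taxonomy": [1, 2, 0, 0],   # candidate new term
}

weights = tf_idf(counts)
bits = {term: binarize(vec) for term, vec in weights.items()}
concepts = {name: bits[name] for name in ("ontology", "database")}

# The candidate would be attached under the concept at the smallest distance.
best = nearest_concept(bits["taxonomy"], concepts)
print(best)
```

In this toy data the candidate term occurs in the same pages as one concept and in none of the pages of the other, so the smallest Hamming distance picks out the concept it would be attached under. A fuller version would apply a truncated SVD to the weighted matrix before the comparison.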
