Author (English): Bing-Sheng Huang
Thesis title (English): A Study on Semantic Annotation of Web Pages Using Self-Organizing Maps
Advisor (English): Hsin-Chang Yang
Keywords (English): Semantic Annotation; Semantic Web; Metadata; Self-Organizing Map
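The creation of the World Wide Web has made information easy to obtain, store, and disseminate. The possibility of e-commerce built on top of it has further led the Web to be regarded as one of the key factors that determine an enterprise's profitability and even its survival. However, e-commerce and other Web applications have not fully replaced traditional modes of application. The main reason is that application programs on the Web, such as web agents, cannot understand the content of web pages in a simple and consistent way. Automated information exchange therefore cannot take place, and the resulting need for human effort or domain-specific applications makes systems overly complex. To address the difficulty of understanding web page semantics, the Semantic Web architecture emerged, and it has gradually become the successor standard to the World Wide Web.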
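The focus of this study is to develop an automatic semantic annotation technique that adds semantic information to ordinary web pages. Giving web pages various forms of semantic annotation makes the meaning they represent easier for people to understand and lets computers process them more efficiently. Such a capability is an important foundational technology for the automatic data extraction and data exchange that the Semantic Web emphasizes.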
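The annotation approach in this study uses the self-organizing map (SOM) technique to automatically discover the keywords in a document and to find their related keywords and the documents that correspond to those keywords. It also automatically produces the other documents related to that document. Semantic information of this kind is then added to the web page, as appropriate, in the form of metadata or semantic tags. This makes automated semantic understanding possible and further promotes the construction of the Semantic Web.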
The World Wide Web (WWW, the Web) provides a uniform framework for knowledge storage and sharing. The possibility of e-commerce, which operates on the WWW, has been recognized as one of the key factors that may affect the survival of an enterprise. However, e-commerce and other WWW applications have not been able to replace traditional models of merchandising and transactions. The major cause is that applications cannot understand the semantics of a web page, which makes automated information exchange impossible. To remedy this deficiency in semantic understanding, the Semantic Web architecture has been proposed and is emerging as the successor standard to the WWW.
The aim of the Semantic Web is to provide machine-accessible metadata that describes the semantics of resources, so that knowledge can be searched, filtered, condensed, or negotiated on behalf of human users. A core technology for making the Semantic Web happen, and for supporting application areas such as knowledge management and e-commerce, is semantic annotation, which turns human-understandable content into a machine-understandable form. For newly created Semantic Web resources, the annotation can be done manually or with the help of authoring tools. However, it is not practical to semantically annotate existing web pages by hand because there are so many of them. To promote the Web to the Semantic Web, we need an automated process that discovers the semantics of a web page and explicitly adds them to the page, generally in the form of XML-based metadata, for future use. Automatic semantic metadata generation thus plays an important role in ensuring the success of the Semantic Web.
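As a rough illustration of what XML-based metadata for a page might look like, the following Python sketch serializes a few discovered items into a small XML fragment. The element and attribute names (`annotation`, `about`, `keywords`, `keyword`, `relatedPages`, `page`, `href`), the function `build_annotation`, and the example URLs are all hypothetical and chosen only for this sketch; they are not the schema used in this work, nor a standard vocabulary such as RDF.

```python
import xml.etree.ElementTree as ET


def build_annotation(page_url, keywords, related_pages):
    """Serialize discovered semantics for one page as an XML string.

    The tag names used here are illustrative assumptions only.
    """
    root = ET.Element("annotation", {"about": page_url})

    # One <keyword> child per keyword describing the page's theme.
    kw_el = ET.SubElement(root, "keywords")
    for kw in keywords:
        ET.SubElement(kw_el, "keyword").text = kw

    # One <page> child per related page, identified by its URL.
    rel_el = ET.SubElement(root, "relatedPages")
    for url in related_pages:
        ET.SubElement(rel_el, "page", {"href": url})

    return ET.tostring(root, encoding="unicode")


xml_text = build_annotation(
    "http://example.org/som.html",
    ["self-organizing map", "clustering"],
    ["http://example.org/text-mining.html"],
)
print(xml_text)
```

A fragment like this could be embedded in the page or stored alongside it; which of the two is preferable depends on the consuming application.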
In this work, we present a semantic metadata generation method that uses a text mining approach. First, we cluster the training web pages: the self-organizing map (SOM) algorithm is trained on these pages to generate a feature map, namely the keyword cluster map (KCM). A semantics extraction process is then applied to the KCM to identify a set of keywords that describe the main theme of each page. The process also reveals the relationships among these keywords. The identified keywords and relationships may then be used to construct an ontology that serves as metadata and annotations for the web pages.
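The clustering step can be sketched in Python as follows. This is a minimal SOM under stated assumptions, not the implementation used in this work: pages are encoded as binary keyword vectors, the learning rate and neighborhood width decay linearly, and each keyword is assigned to the neuron whose weight for that keyword is largest. The labeling rule, the function names `train_som` and `keyword_cluster_map`, the toy vocabulary, and all parameter values are assumptions made for illustration.

```python
import numpy as np


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a small SOM; returns weights of shape (rows * cols, dim)."""
    rng = np.random.default_rng(seed)
    weights = rng.random((rows * cols, data.shape[1]))
    # Grid position of every neuron, used by the neighborhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    sigma0 = max(rows, cols) / 2.0
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3     # linearly shrinking neighborhood
            # Best-matching unit: the neuron closest to the input vector.
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
            dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-dist2 / (2.0 * sigma ** 2))  # Gaussian neighborhood
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights


def keyword_cluster_map(weights, vocabulary):
    """Map neuron index -> keywords (assumed rule: largest weight wins)."""
    kcm = {}
    for k, word in enumerate(vocabulary):
        unit = int(np.argmax(weights[:, k]))
        kcm.setdefault(unit, []).append(word)
    return kcm


# Toy corpus: each row is one page, each column one keyword (1 = present).
vocabulary = ["som", "neuron", "map", "stock", "market", "price"]
docs = np.array([
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 1],
], dtype=float)

weights = train_som(docs, rows=2, cols=2)
kcm = keyword_cluster_map(weights, vocabulary)
for unit, words in sorted(kcm.items()):
    print(unit, words)
```

Keywords that land on the same or neighboring neurons can be read as related, which is the kind of relationship the extraction process described above would draw on.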
