Graduate student (English name): Sin-Jhih Chen
Thesis title (English): A Hybrid-Approach Method for Schema Matching Problem in Data Exchange
Advisor (English name): Wei-Jung Shiang
Keywords (English): similarity flooding, XML, data exchange, schema matching
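This study proposes a hybrid algorithm that combines linguistic matching with structural matching, a modified SF (Similarity Flooding) algorithm, to solve the one-to-one schema conflict problem in the exchange of business transaction data, so that business transaction data in a supply chain environment can be matched quickly and correctly. The original SF algorithm performs structural matching in four phases. The first phase represents the two schemas to be matched as OEM structures. The second phase combines the OEM graphs of the two schemas into a graph connecting all possible pairs (Pairwise Connecting Graph, PCG), which restructures the graphs. The third phase computes structural similarity. The fourth phase passes the structural matching results through a filtering mechanism to find the most likely pairs, which are offered to the user for reference and a final decision. The modified SF algorithm mainly improves the second phase: while combining the OEM graphs it consults linguistic matching information and excludes the less likely pairs, which simplifies the structure of the PCG.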
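The second phase and its modification can be illustrated with a minimal Python sketch. It assumes each schema is given as a list of labeled edges (source, label, target), and it uses `difflib.SequenceMatcher` on element names as a stand-in for linguistic matching. The abstract does not specify the linguistic measure or the pruning rule, so `name_similarity`, `build_pcg`, the `threshold` parameter, and the sample schemas are all hypothetical.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Stand-in linguistic score in [0, 1] (assumption, not the thesis's measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def build_pcg(edges_a, edges_b, threshold=0.0):
    """Build pairwise connecting graph edges ((a, b), label, (a2, b2)).

    A PCG edge exists when schema A has (a, label, a2) and schema B has
    (b, label, b2). With threshold > 0, a pair of nodes is kept only if
    its names are linguistically similar enough (hypothetical pruning rule).
    """
    pcg = []
    for src_a, label_a, dst_a in edges_a:
        for src_b, label_b, dst_b in edges_b:
            if label_a != label_b:
                continue
            if name_similarity(src_a, src_b) < threshold:
                continue
            if name_similarity(dst_a, dst_b) < threshold:
                continue
            pcg.append(((src_a, src_b), label_a, (dst_a, dst_b)))
    return pcg


# Hypothetical schemas, for illustration only.
schema_a = [("PurchaseOrder", "child", "OrderDate"),
            ("PurchaseOrder", "child", "Customer")]
schema_b = [("Order", "child", "Date"),
            ("Order", "child", "Client")]

full_pcg = build_pcg(schema_a, schema_b)                   # original SF: all pairs
pruned_pcg = build_pcg(schema_a, schema_b, threshold=0.5)  # modified: pruned pairs
```

A purely string-based score drops the Customer/Client pair here, which suggests why a real linguistic matcher would likely need a dictionary or thesaurus rather than character overlap alone.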

Data exchange between companies in a supply chain environment requires a common data format and a common data representation to ensure accurate communication. XML has emerged as a common data format for cross-platform information exchange over the Internet. Because information systems are developed independently, identical data is commonly represented with different schemas in different systems, so a receiving system may not understand the true meaning of the exchanged data. This kind of communication problem is called schema conflict. The core technique for resolving schema conflict in data exchange is to correctly match imported XML documents to internal relational database schemas.
There are two major approaches to schema matching: linguistic matching and structural matching. Previous research indicates that a single approach used alone cannot effectively solve linguistic matching problems in one-to-many and many-to-one cases. Similarity flooding (SF) is originally a purely structure-oriented algorithm that uses a pairwise connecting graph (PCG), a propagation graph, and a fixpoint computation to detect similar schema structures. This study proposes a modified similarity flooding method that uses linguistic similarity values to simplify the PCG and thereby improve the effectiveness of schema matching.
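The fixpoint computation can be sketched as follows, assuming the basic similarity flooding update in which each pair's score is increased by the weighted scores of its neighbors and then normalized by the largest value. The function names, the inverse-count edge weights, the initial score of 1.0, and the stopping parameters `eps` and `max_iter` are illustrative assumptions and are not taken from the thesis.

```python
import math
from collections import defaultdict


def propagation_graph(pcg_edges):
    """Turn PCG edges into weighted, bidirectional propagation edges.

    Each edge's weight is split evenly among same-label edges leaving
    (forward) or entering (backward) a node. This is an assumed weighting.
    """
    out_count = defaultdict(int)
    in_count = defaultdict(int)
    for src, label, dst in pcg_edges:
        out_count[(src, label)] += 1
        in_count[(dst, label)] += 1
    weights = defaultdict(float)  # (from_node, to_node) -> coefficient
    for src, label, dst in pcg_edges:
        weights[(src, dst)] += 1.0 / out_count[(src, label)]
        weights[(dst, src)] += 1.0 / in_count[(dst, label)]
    return weights


def similarity_flooding(pcg_edges, initial=None, eps=1e-4, max_iter=100):
    """Iterate sigma <- normalize(sigma + propagated sigma) until it settles."""
    weights = propagation_graph(pcg_edges)
    nodes = {node for edge in weights for node in edge}
    if not nodes:
        return {}
    sigma = {node: (initial or {}).get(node, 1.0) for node in nodes}
    for _ in range(max_iter):
        nxt = dict(sigma)
        for (u, v), w in weights.items():
            nxt[v] += sigma[u] * w  # neighbor u pushes similarity to v
        top = max(nxt.values()) or 1.0
        nxt = {node: value / top for node, value in nxt.items()}
        residual = math.sqrt(sum((nxt[n] - sigma[n]) ** 2 for n in nodes))
        sigma = nxt
        if residual < eps:  # fixpoint reached, within tolerance
            break
    return sigma
```

The `initial` argument is where linguistic similarity values could seed the scores. Because each iteration visits every propagation edge, a PCG with fewer pairs costs less per iteration.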
With a simplified PCG, this hybrid method reduces the computational effort of matching schemas. In the experimental results, the method in most cases achieves higher matching accuracy with less computing time than the original SF method. A likely reason is that only linguistically qualified candidates are included in the PCG, which may increase the matching ability of the proposed method.

Keywords: data exchange, schema matching, similarity flooding, XML.
