Graduate student: Bing-Sheng Huang
Thesis title: A Study on Semantic Annotation of Web Pages Using Self-Organizing Maps
Advisor: Hsin-Chang Yang
Keywords: Semantic Annotation; Semantic Web; Metadata; Self-Organizing Map
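The creation of the World Wide Web has made information easy to obtain, store, and disseminate. The possibility of e-commerce built on top of it has further led the Web to be regarded as one of the key factors that determine an enterprise's profitability and even its survival. However, e-commerce and other Web applications have not fully replaced traditional application models. The main reason is that applications on the Web, such as network agents, cannot understand the content of web pages in a simple and consistent way, so automated information exchange cannot take place. Human labor or domain-specific applications must then step in, which makes systems overly complex. The Semantic Web architecture emerged to resolve the difficulty of understanding web page semantics, and it is gradually becoming the successor standard of the World Wide Web.
The focus of this research is to develop an automatic semantic annotation technique that adds semantic information to ordinary web pages. Giving web pages various forms of semantic annotation makes the meaning they carry easier for people to understand and lets computers process them more efficiently. This capability is an important foundational technology for the automatic data extraction and data exchange that the Semantic Web emphasizes.
The annotation approach in this research uses the Self-Organizing Map (SOM) technique to automatically discover the keywords in a document, and to find their related keywords and the documents that correspond to those keywords. It also automatically identifies other documents related to the document. This semantic information is then added to the web page, as appropriate, in the form of metadata or semantic tags. Automated semantic understanding thus becomes possible, which further promotes the construction of the Semantic Web.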
The World Wide Web (WWW, the Web) provides a uniform framework for knowledge storage and sharing. The possibility of e-commerce, which operates on the WWW, has been recognized as one of the key factors that may affect the survival of an enterprise. However, e-commerce and other WWW applications have not been able to replace traditional models of merchandising and transactions. The main reason is that such applications cannot understand the semantics of a web page, which makes automated information exchange impossible. To remedy this deficiency in semantic understanding, the Semantic Web architecture has been proposed and is emerging as the successor standard of the WWW.
The aim of the Semantic Web is to provide machine-accessible metadata that describes the semantics of resources, so that knowledge can be searched, filtered, condensed, or negotiated on behalf of human users. A core technology for realizing the Semantic Web, and for supporting application areas such as knowledge management and e-commerce, is semantic annotation, which turns human-understandable content into a machine-understandable form. For newly created Semantic Web resources, the annotation can be done manually or with the help of authoring tools. However, it is not practical to semantically annotate existing Web pages by hand because of their sheer number. To promote the Web to the Semantic Web, we need an automated process that discovers the semantics of a web page and explicitly adds them to the page, generally in the form of XML-based metadata, for future use. Automatic semantic metadata generation thus plays an important role in ensuring the success of the Semantic Web.
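As a rough illustration of what XML-based metadata for a page might look like, the following Python sketch serializes a list of keywords and related pages. The element names (`annotation`, `keywords`, `keyword`, `relatedPages`, `page`), the function name `build_annotation`, and the example.org URLs are all hypothetical choices made for this example; the abstract does not specify the schema used in the thesis.

```python
import xml.etree.ElementTree as ET


def build_annotation(page_url, keywords, related_pages):
    """Serialize keywords and related pages as a small XML fragment.

    The element and attribute names are illustrative assumptions,
    not the schema used in the thesis.
    """
    root = ET.Element("annotation", {"about": page_url})
    keywords_el = ET.SubElement(root, "keywords")
    for word in keywords:
        ET.SubElement(keywords_el, "keyword").text = word
    related_el = ET.SubElement(root, "relatedPages")
    for url in related_pages:
        ET.SubElement(related_el, "page", {"href": url})
    # encoding="unicode" makes tostring return a str instead of bytes.
    return ET.tostring(root, encoding="unicode")


# Hypothetical page and annotation data, for illustration only.
xml_text = build_annotation(
    "http://example.org/page.html",
    ["semantic web", "metadata"],
    ["http://example.org/related.html"],
)
print(xml_text)
```

A fragment like this could be embedded in, or linked from, the annotated page so that software agents can read the keywords without parsing the page text.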
In this work, we present a semantic metadata generation method based on a text mining approach. First, we cluster the training Web pages: a self-organizing map (SOM) is trained on the pages to generate a feature map, namely the keyword cluster map (KCM). A semantics extraction process is then applied to the KCM to identify a set of keywords that describe the main theme of each page. The process also reveals the relationships among these keywords. The identified keywords and relationships may then be used to construct an ontology that serves as metadata and annotations for the web pages.
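The clustering and labeling steps described above can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in the thesis: pages are assumed to be binary keyword vectors, the map is a small rectangular grid with a Gaussian neighborhood and linearly decaying learning rate, and a keyword is assigned to the neuron whose weight for that keyword is largest. The function names, parameters, and toy data are all hypothetical.

```python
import math
import random


def best_matching_unit(weights, vec):
    """Index of the neuron whose weight vector is closest to vec."""
    return min(
        range(len(weights)),
        key=lambda i: sum((w - x) ** 2 for w, x in zip(weights[i], vec)),
    )


def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols SOM; returns one weight vector per neuron."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(rows * cols)]
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs          # linear decay (an assumption)
        lr = lr0 * decay
        radius = max(radius0 * decay, 0.5)
        for vec in vectors:
            b_row, b_col = divmod(best_matching_unit(weights, vec), cols)
            for idx, w in enumerate(weights):
                row, col = divmod(idx, cols)
                dist2 = (row - b_row) ** 2 + (col - b_col) ** 2
                # Gaussian neighborhood centered on the winning neuron.
                h = math.exp(-dist2 / (2.0 * radius * radius))
                for k in range(dim):
                    w[k] += lr * h * (vec[k] - w[k])
    return weights


def document_cluster_map(weights, vectors):
    """Label each document onto its best matching neuron."""
    dcm = {}
    for doc_id, vec in enumerate(vectors):
        dcm.setdefault(best_matching_unit(weights, vec), []).append(doc_id)
    return dcm


def keyword_cluster_map(weights, vocabulary):
    """Label each keyword onto the neuron with the largest weight for it
    (an assumed labeling rule)."""
    kcm = {}
    for k, word in enumerate(vocabulary):
        neuron = max(range(len(weights)), key=lambda i: weights[i][k])
        kcm.setdefault(neuron, []).append(word)
    return kcm


# Toy vocabulary and binary page vectors, for illustration only.
vocabulary = ["som", "neuron", "cluster", "xml", "rdf", "metadata"]
pages = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 0, 1],
]
weights = train_som(pages, rows=2, cols=2)
dcm = document_cluster_map(weights, pages)
kcm = keyword_cluster_map(weights, vocabulary)
print("document cluster map:", dcm)
print("keyword cluster map:", kcm)
```

Keywords that land on the same or neighboring neurons can be read as related, and the documents labeled onto a keyword's neuron as the documents that correspond to it.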
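Abstract (Chinese)
Abstract (English)
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives
1.4 Thesis Organization
Chapter 2 Literature Review
2.1 Text Mining
2.2 Artificial Neural Networks
2.4 Self-Organizing Maps
2.5 Semantic Annotation
Chapter 3 Methodology
3.1 Preprocessing
3.1.1 Chinese Word Segmentation
3.1.2 Keyword Selection
3.1.3 Converting Documents to the Vector Model
3.2 Clustering with Self-Organizing Maps
3.3 Labeling Process
3.3.1 Generating the Document Cluster Map
3.3.2 Generating the Keyword Cluster Map
3.4 Obtaining Semantic Annotation Data
Chapter 4 Empirical Analysis
4.1 System Platform
4.2 Preprocessing
4.3 Self-Organizing Map Training
4.4 Labeling Process
4.5 Annotation Generation
4.6 Experimental Evaluation
Chapter 5 Conclusion
5.1 Research Contributions
5.2 Future Research Directions
References
English References
Chinese References
Appendix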