Keywords: document clustering, data mining, association rules, dimensionality reduction techniques
In recent years, the rapid development of the Internet, the World Wide Web, and communication networks has produced an overwhelming amount of information on Web pages. As a result, users often cannot find precisely the information they want through search engines, and the flood of information forces individuals to spend more and more time filtering it. Users are now also interested in multimedia information, because homepages have become visually rich. However, current search engines usually return results in a single medium, which does not satisfy users. To meet users' needs, specific and precise categories should be created that fit those needs.
In this paper, a new method for clustering multimedia documents is proposed. First, keywords and low-level image features are extracted from a set of sample multimedia documents. Then, association rules among keywords, among image features, and between keywords and image features are mined using a data mining technique. These rules help compensate for the semantic deficiency of individual documents. Finally, the documents are clustered by the proposed algorithm, which uses dimensionality reduction techniques.
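The abstract does not name the specific mining technique. As a minimal sketch, assuming each sample document has already been reduced to a set of keyword labels and discretized image-feature labels (the `kw:`/`img:` prefixes and the function name `mine_pair_rules` are hypothetical), the Python code below mines Apriori-style association rules between pairs of items using support and confidence thresholds. It illustrates the general idea of rules spanning keywords and features; it is not the thesis's own algorithm.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Mine rules X -> Y between single items (2-itemsets only).

    transactions: list of sets; each set holds the keyword and discretized
    image-feature labels of one sample document (hypothetical format).
    Returns a sorted list of (antecedent, consequent, support, confidence).
    """
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    # Apriori pruning: only frequent single items can form frequent pairs.
    frequent = {i for i, c in item_counts.items() if c / n >= min_support}

    pair_counts = Counter()
    for t in transactions:
        for a, b in combinations(sorted(frequent & t), 2):
            pair_counts[(a, b)] += 1

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # A frequent pair yields up to two rules, one in each direction.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


# Toy sample documents (made-up data for illustration only).
docs = [
    {"kw:beach", "kw:sea", "img:blue"},
    {"kw:beach", "img:blue", "img:bright"},
    {"kw:sea", "img:blue"},
    {"kw:mountain", "img:green"},
]
rules = mine_pair_rules(docs)
for x, y, s, c in rules:
    print(f"{x} -> {y}  support={s:.2f}  confidence={c:.2f}")
```

Restricting rules to single-item antecedents and consequents keeps the sketch short; a full Apriori implementation would iterate to larger itemsets. A rule such as `kw:beach -> img:blue` could, for example, be used to add an implied feature to a document that lacks it, which is one plausible way such rules might offset a document's semantic deficiency.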
Two experimental trials are conducted. In the homo-web-page trial, Web pages from different subcategories of the same knowledge domain are selected; in the hetero-web-page trial, Web pages from different knowledge domains are selected. Together, the two trials are used to validate that the clustering rules perform well in classifying multimedia Web pages.
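Chinese Abstract
English Abstract
Acknowledgements
List of Tables
List of Figures
Chapter 1 Introduction
Section 1 Background
Section 2 Research Motivation
Section 3 Research Objectives
Section 4 Thesis Organization
Chapter 2 Literature Review
Section 1 Classification of Text Documents
Section 2 Image Classification
Section 3 Integration of Images and Text
Section 4 Knowledge Extraction
Chapter 3 Design Methodology
Section 1 Web Page Downloader
Section 2 Parser
Section 3 Extraction and Selection of Document Features
Section 4 Extraction and Selection of Image Features
Section 5 Knowledge Extraction
Section 6 Web Page Clustering Techniques
Section 7 System Architecture and Data Structures
Chapter 4 Experimental Results and Discussion
Section 1 Experimental Environment
Section 2 Experimental Procedure
Section 3 Experimental Results and Analysis
Chapter 5 Conclusions and Future Work
References
Author Biography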
