臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)

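Author: 許暉煌 (Shiu, Huei-Huang)
Thesis title: 應用資料探勘技巧於多媒體文件分類法則之研究 (translated: A Study of Multimedia Document Classification Rules Using Data Mining Techniques)
Advisors: 鄭錫齊 (Cheng, Shyi-Chyi); 楊健貴 (Yang, Chen-Kuei)
Degree: Master's
Institution: 銘傳大學 (Ming Chuan University)
Department: Graduate Institute of Information Management
Discipline: Computer science
Field: General computer science
Thesis type: Academic thesis
Year of publication: 2000
Academic year of graduation: 88
Language: Chinese
Number of pages: 82
Keywords (Chinese, translated): feature extraction; data mining; dimensionality reduction techniques; document classification
Keywords (English): document clustering; data mining; association rules; dimensionality reduction techniques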
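Abstract (translated from the Chinese): The rapid growth of the Internet in recent years has caused a flood of information. Users who query search engines often cannot find the information they need precisely and within a short time. Moreover, users increasingly demand multimedia information and are no longer limited to text, so current search engines cannot meet their needs. To address these problems, this thesis classifies multimedia documents, with the aim that multimedia information can also serve as a search target and that users can find the information they need more easily.
This study proposes a classification framework for multimedia documents. The main approach integrates images and text: document keywords and the representative colors of images are extracted, and data mining is used to generate association rules between keywords, between image features, and between keywords and image features. These rules are then applied to fill in semantics that may be missing from a multimedia document's semantic model and to strengthen the relationship between images and text. After this semantic supplementation, the documents are classified using a hierarchical decision tree together with a fast clustering method based on the dimensionality reduction mechanism proposed in this study, and experiments verify the feasibility of the classification rules.
The experiments use two kinds of samples. The first is a homogeneous web-page classification experiment, which uses web pages from the same knowledge domain but different subcategories. The second is a heterogeneous web-page classification experiment, which uses web pages from different knowledge domains. The experiments verify that the proposed method yields good classification results for web pages of both similar and dissimilar nature.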
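The image side of the feature extraction described above can be illustrated with a minimal sketch. The abstract does not say how representative colors are computed, so the uniform RGB binning below is a stand-in chosen for brevity, not the thesis's quantization method. The function name `representative_colors`, the parameters `levels` and `top_k`, and the `color_r_g_b` feature naming are all hypothetical.

```python
from collections import Counter

def representative_colors(pixels, levels=4, top_k=2):
    """Return the top_k most frequent quantized colors as feature names.

    pixels: iterable of (r, g, b) tuples with channels in 0..255.
    levels: bins per channel (assumed to divide 256 evenly).
    """
    step = 256 // levels
    # Map every pixel to its coarse color bin and count bin frequencies.
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    # Sort by descending frequency, then by bin index for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"color_{r}_{g}_{b}" for (r, g, b), _ in ranked[:top_k]]

# A toy 6-pixel "image": mostly red, some blue, one green pixel.
toy_image = [(250, 10, 10), (240, 20, 5), (255, 0, 0),
             (10, 10, 250), (0, 20, 240), (10, 250, 10)]
features = representative_colors(toy_image)
print(features)
```

Turning colors into discrete feature names lets them sit in the same item set as keywords, which is what makes keyword-to-color association rules possible in the first place.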
Abstract (author's English): In recent years, the rapid development of the Internet, the World Wide Web, and communication networks has flooded users with Web-page information. As a result, users often cannot find information precisely through search engines, and they spend more and more time filtering results. Users are also increasingly interested in multimedia information, attracted by colorful home pages, yet current search engines typically handle a single medium and cannot satisfy them. To meet users' needs, documents should be organized into specific, precise categories.
In this paper, a new method for clustering multimedia documents is proposed. First, keywords and low-level image features are extracted from a sample set of multimedia documents. Association rules among keywords, among features, and between keywords and features are then extracted with a data mining technique. These rules are used to compensate for the semantic deficiencies of a document. Finally, the documents are clustered by the proposed algorithm, which uses dimensionality reduction techniques.
Two experiments are conducted. The first is a homogeneous web-page trial, which uses web pages from different subcategories of the same knowledge domain. The second is a heterogeneous web-page trial, which uses web pages from different knowledge domains. Together they validate that the clustering rule performs well in classifying multimedia web pages.
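The rule-mining and semantic-supplementation steps in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the thesis's algorithm: it mines only one-to-one rules (item A implies item B) from support and confidence counts, and it applies the rules in a single pass. The function names `mine_pair_rules` and `enrich`, the thresholds, and the toy items are hypothetical.

```python
from collections import Counter
from itertools import combinations

def mine_pair_rules(docs, min_support=0.5, min_confidence=0.8):
    """Mine rules lhs -> rhs between single items (keywords or image features).

    docs: list of sets; each set holds the items found in one document.
    Returns a dict mapping (lhs, rhs) to the rule's confidence.
    """
    n = len(docs)
    item_count = Counter(item for doc in docs for item in doc)
    pair_count = Counter(
        pair for doc in docs for pair in combinations(sorted(doc), 2)
    )
    rules = {}
    for (a, b), count in pair_count.items():
        if count / n < min_support:  # the pair co-occurs too rarely
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_count[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules[(lhs, rhs)] = confidence
    return rules

def enrich(doc, rules):
    """Add the consequent of every rule whose antecedent is in doc (one pass)."""
    return set(doc) | {rhs for (lhs, rhs) in rules if lhs in doc}

# Toy corpus: each document mixes keywords with a quantized color feature.
docs = [
    {"beach", "sea", "color_blue"},
    {"beach", "color_blue"},
    {"sea", "color_blue", "boat"},
    {"forest", "color_green"},
]
rules = mine_pair_rules(docs)
enriched = enrich({"beach", "hotel"}, rules)
print(rules, enriched)
```

In this toy corpus, a page that mentions "beach" but has no usable image picks up the `color_blue` feature from the mined rule, which is the kind of gap-filling the abstract describes before clustering.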
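Chinese Abstract
English Abstract
Acknowledgments
List of Tables
List of Figures
Chapter 1 Introduction
Section 1 Background
Section 2 Research Motivation
Section 3 Research Objectives
Section 4 Thesis Organization
Chapter 2 Literature Review
Section 1 Classification of Text Documents
Section 2 Image Classification
Section 3 Integration of Images and Text
Section 4 Knowledge Extraction
Chapter 3 Design Method
Section 1 Web Page Downloader
Section 2 Analyzer
Section 3 Extraction and Selection of Document Features
Section 4 Extraction and Selection of Image Features
Section 5 Knowledge Extraction
Section 6 Web Page Clustering Techniques
Section 7 System Architecture and Data Structures
Chapter 4 Experimental Results, Analysis, and Discussion
Section 1 Experimental Environment
Section 2 Experimental Procedure
Section 3 Experimental Results and Analysis
Chapter 5 Conclusions and Future Work
References
Biographical Sketch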