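Author: 湯佾達
Author (English): Tang, Yi-Ta
Thesis title: 運用文字探勘及關鍵字相似度於標籤雲之研究
Thesis title (English): The Application of Text Mining and Keyword Similarity on Tag Cloud
Advisor: 葉介山
Advisor (English): Yeh, Jieh-Shan
Oral defense committee: 李金鳳, 王孝熙
Oral defense committee (English): Lee, Chin-Feng; Wang, Hsiao-hsi
Oral defense date: 2011-07-29
Degree: Master's
Institution: Providence University (靜宜大學)
Department: In-Service Master's Program in Information (資訊碩士在職專班)
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Publication year: 2011
Graduation academic year: 99 (ROC calendar, 2010-2011)
Language: Chinese
Pages: 58
Keywords (Chinese): 標籤雲, 相似度, 資料探勘, 關聯規則
Keywords (English): Tag Cloud, Similarity, Data Mining, Association Rules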
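  As technology advances, the Internet has become an indispensable tool. As portal sites keep developing new applications for the information they provide, huge volumes of data are continually generated, and users face information overload, longer search times, and greater difficulty finding what they need. While search sites actively develop smarter and more capable search engines, presenting keywords visually as a tag cloud has also become mainstream in general website design, letting users quickly find related pages and information while browsing.
  At present, tag clouds mostly use color or font size to convey the importance or popularity of tag terms, while the relationships among tags are neither emphasized nor shown. With the goal of helping users find the data they need, this study implements a prototype system to improve the usability of tag clouds. The study first retrieves online articles with a crawler program and segments them with the Chinese word segmentation service of the Chinese Knowledge and Information Processing (CKIP) group at the Institute of Information Science, Academia Sinica. After keywords are extracted, it computes their TF-IDF (Term Frequency-Inverse Document Frequency) weights, compares their similarity, and mines association rules, then supplies the results to the tag cloud display so that users can find the data they need more conveniently. The proposed method can express the importance and degree of association of keywords through font size, color, shade, and tag clustering, making the individual tags in a tag cloud easier to distinguish.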
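The TF-IDF weighting and the font-size encoding described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the formula variant (relative term frequency times the natural log of inverse document frequency), the linear size mapping, the pixel range, the function names `tf_idf` and `font_sizes`, and the sample tokens are all assumptions chosen for illustration. The input is assumed to be already segmented into words.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists (text already segmented into words).
    Weight = (count / document length) * ln(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def font_sizes(scores, min_px=12, max_px=36):
    """Linearly map scores onto font sizes between min_px and max_px."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {
        term: max_px if span == 0
        else round(min_px + (score - lo) / span * (max_px - min_px))
        for term, score in scores.items()
    }


# Hypothetical pre-segmented articles (illustrative data only).
docs = [
    ["tag", "cloud", "keyword", "tag"],
    ["keyword", "mining", "rule"],
    ["tag", "keyword", "similarity"],
]

# Sum each term's weight over all articles to get one score per tag.
totals = {}
for doc_weights in tf_idf(docs):
    for term, weight in doc_weights.items():
        totals[term] = totals.get(term, 0.0) + weight

sizes = font_sizes(totals)
```

In this toy corpus "keyword" occurs in every article, so its inverse document frequency is zero and it receives the smallest font size, which is the intended effect of TF-IDF: terms that appear everywhere carry little discriminating value.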
As technology advances, the Internet has become an indispensable tool. With portal sites continually developing new applications for the information they provide, a vast amount of data is generated, and users face excessive information, longer search times, and greater difficulty finding what they need. While search sites actively develop more intelligent and capable search engines, the visual presentation of keywords as a tag cloud has become mainstream in website design, so that users can find relevant pages and information more efficiently while browsing. Currently, tag clouds mostly use color and font size to convey the prominence and ranking of tags, yet the interconnection between tags is neither emphasized nor fully represented.
This study implements a prototype system that improves the usability of tag clouds, with the aim of helping users find what they need. The study first applied the concept of the tag cloud to identify keywords, and then selected relevant articles with a crawler program. After segmenting words with the Chinese word segmentation system developed by the Chinese Knowledge and Information Processing (CKIP) group, Institute of Information Science, Academia Sinica, the study weighted keywords by TF-IDF (Term Frequency-Inverse Document Frequency), compared the similarity of keywords, and linked keywords in the tag cloud through association rules. The prototype system thereby makes searching for target information more convenient and efficient. Moreover, to make tags easier to recognize, the proposed method can present the significance and relatedness of keywords through font size, color, intensity, and tag clustering.
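As a rough illustration of how association rules can link keywords, the following Python sketch computes support and confidence for keyword pairs that co-occur in the same article. It is a simplified stand-in, not the procedure used in the thesis: it considers only two-keyword itemsets rather than running a full Apriori search, and the function name `pair_rules`, the thresholds, and the sample articles are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) tuples.

    transactions is a list of keyword collections, one per article.
    support(A, B)  = articles containing both A and B / all articles
    confidence(A -> B) = articles containing both / articles containing A
    """
    n = len(transactions)
    item_count = Counter()
    pair_count = Counter()
    for keywords in transactions:
        items = sorted(set(keywords))
        item_count.update(items)
        pair_count.update(combinations(items, 2))

    rules = []
    for (a, b), both in pair_count.items():
        support = both / n
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for x, y in ((a, b), (b, a)):
            confidence = both / item_count[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules


# Hypothetical keyword sets extracted from four articles.
articles = [
    {"tag", "cloud", "keyword"},
    {"tag", "cloud"},
    {"tag", "mining"},
    {"cloud", "tag", "similarity"},
]
rules = pair_rules(articles)
```

Here "cloud" and "tag" co-occur in three of four articles (support 0.75). Every article containing "cloud" also contains "tag", so the rule cloud -> tag has confidence 1.0, while tag -> cloud has confidence 0.75. A tag cloud could use such rules to place or color associated tags together.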
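Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation and Objectives
1.3 Research Limitations
1.4 Research Contributions
1.5 Thesis Structure and Workflow
Chapter 2 Literature Review
2.1 Tag Clouds for Web Page Classification
2.2 Web Data Retrieval
2.3 Data Processing
2.4 Data Mining
2.5 Association Rule Mining
Chapter 3 System Design
3.1 System Concept
3.2 System Architecture
3.3 Data Retrieval
3.4 Article Word Segmentation
3.5 Similarity Computation
3.6 Keyword Association Rule Computation
3.7 Term Filtering and Possible Applications
Chapter 4 System Implementation
4.1 System Development Environment
4.2 Case Data Sources
4.3 Chinese Word Segmentation Service
4.4 Weight Computation
4.5 Similarity Computation
4.6 Implementing Association Rule Mining
4.7 Result Analysis
4.8 Tag Cloud
Chapter 5 Conclusion
5.1 Research Conclusions and Contributions
5.2 Future Research Directions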