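Graduate student: 黃威穎 (Wei-Yin Huang)
Thesis title: 網頁搜尋結果的階層式動態分群之研究 (Hierarchically Dynamic Clustering of Web Search Results)
Advisor: 陸承志 (Cheng-Jye Luh)
Degree: Master's
Institution: Yuan Ze University
Department: Department of Information Management
Discipline: Computer Science
Sub-discipline: General Computer Science
Thesis type: Academic thesis
Year of publication: 2006
Academic year of graduation: 94 (ROC calendar)
Language: Chinese
Number of pages: 51
Keywords: Document Clustering; Web Search; Hierarchical Clustering; Dynamic Clustering; Overlap Clustering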
This study proposes a hierarchical clustering method that dynamically clusters web search results, so that users can browse the resulting cluster tree and quickly locate the pages they are interested in. The method extracts feature tokens from the titles and snippets of the search results, then uses a composite indicator, computed from each token's page coverage and distinctiveness, to select the clustering concepts, the cluster labels, and the number of clusters. A web page may be assigned to several clusters, and pages that ranked higher in the original results are placed in the leading clusters wherever possible.
The clustering termination condition was chosen from preliminary reach-time measurements of an implemented system on popular Chinese and English search keywords. The system was then compared with existing systems through a user satisfaction test and through reach-time measurements on popular Chinese and English keywords. In the user study, participants were more satisfied with the proposed system than with the commercial clustering system Vivisimo, and slightly more satisfied with it than with DisCover, a related hierarchical clustering method. The reach-time measurements confirmed that the proposed system clearly outperforms Vivisimo and performs slightly better than DisCover.
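The record does not give the thesis's formulas, so the following Python sketch only illustrates the general idea described in the abstract: greedily pick label tokens by a combined coverage and distinctiveness score, let pages fall into several clusters, and recurse to build a tree. The scoring formula, the `weight`, `min_size`, `min_score`, and `max_clusters` parameters, the English-only tokenizer, and all function names are assumptions made for illustration, not the author's method. The rank-based placement of pages is not modelled beyond keeping members in their original rank order.

```python
import re

def tokenize(text):
    # Assumed tokenizer: lowercase alphabetic words longer than two letters.
    # Chinese text would need a proper word segmenter instead.
    return {w.lower() for w in re.findall(r"[A-Za-z]+", text) if len(w) > 2}

def pick_clusters(pages, tokens, used, max_clusters=5, min_size=2,
                  min_score=0.3, weight=0.5):
    """Greedily choose label tokens for one level of the tree.

    pages  : page ids in original rank order
    tokens : dict mapping page id -> set of feature tokens
    used   : tokens that may not be labels here (query terms, ancestor labels)
    """
    covered, clusters = set(), []
    candidates = set().union(*(tokens[p] for p in pages)) - used
    while candidates and len(clusters) < max_clusters and len(covered) < len(pages):
        best, best_score, best_members = None, 0.0, []
        for t in sorted(candidates):  # sorted for deterministic tie-breaking
            members = [p for p in pages if t in tokens[p]]
            if len(members) < min_size:
                continue
            # Coverage: share of this level's pages that contain the token.
            coverage = len(members) / len(pages)
            # Distinctiveness (assumed): share of members not yet in any cluster.
            distinct = sum(p not in covered for p in members) / len(members)
            score = weight * coverage + (1 - weight) * distinct
            if score > best_score:
                best, best_score, best_members = t, score, members
        if best is None or best_score < min_score:
            break  # assumed stopping condition
        clusters.append((best, best_members))
        covered.update(best_members)
        candidates.discard(best)
    return clusters

def build_tree(pages, tokens, used, depth=2, **kw):
    # Recurse into each cluster, excluding its own label from the sub-levels.
    if depth == 0:
        return []
    return [
        {"label": label, "pages": members,
         "children": build_tree(members, tokens, used | {label}, depth - 1, **kw)}
        for label, members in pick_clusters(pages, tokens, used, **kw)
    ]

# Made-up titles/snippets for the query "jaguar", listed in rank order.
results = [
    "Jaguar car price list",
    "Jaguar car dealer locator",
    "Jaguar cat animal facts",
    "Jaguar cat on a car hood",
    "Jaguar apple release notes",
]
tokens = {i: tokenize(s) for i, s in enumerate(results)}
tree = build_tree(list(tokens), tokens, used={"jaguar"})
for node in tree:
    print(node["label"], node["pages"])
```

In this toy input the fourth result mentions both "cat" and "car", so it appears in two clusters, which is the overlapping behaviour the abstract describes. The fifth result shares no token with any other page and stays unclustered; a real system would likely collect such pages in a catch-all group.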
Table of Contents
Title Page
Chinese Abstract
English Abstract
Acknowledgments
Table of Contents
List of Tables
List of Figures

Chapter 1. Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Thesis Organization
Chapter 2. Literature Review
2.1 Key Requirements for Web Page Clustering
2.2 Basic Properties of Cluster Trees
2.3 Comparison of Document Clustering Methods
2.3.1 Non-Hierarchical Clustering Methods
2.3.2 Hierarchical Clustering Methods
2.3.3 Comparison of Document Clustering Methods
2.4 Applications of Document Clustering in Search Engines
2.4.1 Vivisimo Clustering Search Engine
2.4.2 DisCover Hierarchical Clustering System
2.4.3 Grouper Automatic Clustering System
2.4.4 Hoskinson's Research Assistant System
Chapter 3. System Analysis and Design
3.1 System Characteristics
3.2 System Architecture
3.2.1 Data Collection (Web Crawling) and Analysis
3.2.2 Tokenizing
3.2.3 Web Page Clustering
3.2.4 Result Presentation
3.2.5 Related Query List
Chapter 4. System Evaluation
4.1 Reach Time
4.2 Choosing the Stopping Condition for Web Page Clustering
4.3 User Satisfaction Test
4.4 System Performance Evaluation
Chapter 5. Conclusion
References