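Graduate student: Jung-Shian Liu (呂忠憲)
Thesis title: Design and Evaluation of Focused Crawling System (集中式資訊收集系統的設計與評估)
Advisor: Jyh-Jong Tsay (蔡志忠)
Degree: Master's
Institution: National Chung Cheng University (國立中正大學)
Department: Institute of Computer Science and Information Engineering (資訊工程研究所)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Publication year: 2002
Graduation academic year: 90
Language: Chinese
Pages: 49
Chinese keywords: 資訊收集系統 (information gathering system)
English keywords: context graph; focused crawler; web crawler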
The rapid growth of the World Wide Web in recent years has made efficient resource discovery increasingly important. In this thesis, we develop and evaluate approaches to focused crawling, whose objective is to collect web pages on a particular topic quickly without having to visit every page on the Web. Our approach is based on the assumption that pages on a given topic tend to link to other pages on related topics. We propose ordering strategies that combine information from page contents, hyperlinks, query words, and context graphs. Experiments show that our approach obtains relevant pages faster than breadth-first search (BFS) and faster than a focused crawler that relies solely on context graphs.
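
To make the idea of an ordering strategy concrete, the following is a minimal sketch of a crawl frontier that visits the highest-scoring URL first. It is an illustration only, not the thesis's implementation: the abstract does not say how the four signals are combined, so the weighted sum, the weight values, the signal names, and the `Frontier` class are all assumptions.

```python
import heapq
from itertools import count

# Hypothetical weights; the abstract does not say how the signals are combined.
WEIGHTS = {"content": 0.4, "link": 0.2, "query": 0.2, "context": 0.2}


def combined_score(signals, weights=WEIGHTS):
    """Weighted sum of per-URL relevance signals, each assumed to lie in [0, 1]."""
    return sum(weights[name] * signals.get(name, 0.0) for name in weights)


class Frontier:
    """Priority queue of unvisited URLs, highest combined score first."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._tie = count()  # insertion counter keeps ordering stable on ties

    def push(self, url, signals):
        # Ignore URLs that were already queued once.
        if url in self._seen:
            return
        self._seen.add(url)
        # heapq is a min-heap, so store the negated score.
        heapq.heappush(self._heap, (-combined_score(signals), next(self._tie), url))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


def crawl_order(frontier):
    """Drain the frontier and return URLs in the order a crawler would visit them."""
    order = []
    while len(frontier):
        order.append(frontier.pop())
    return order
```

In this sketch, giving every URL the same score reduces the frontier to a first-in, first-out queue, which corresponds to the BFS baseline mentioned in the abstract.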

Chapter 1 Introduction
Chapter 2 Preliminaries
Chapter 3 Our Approach
Chapter 4 System Architecture
Chapter 5 Evaluation
Chapter 6 Conclusion

