|
[1] I. H. Witten, A. Moffat, and T. C. Bell. Managing Gigabytes: Compressing and Indexing Documents and Images. Van Nostrand Reinhold, New York, 1994.
[2] A. Z. Broder, S. C. Glassman, M. S. Manasse, and G. Zweig. Syntactic clustering of the Web. In Proc. of the 6th International World Wide Web Conference [WWW6], pages 391-404.
[3] K. Bharat and A. Z. Broder. A study of host pairs with replicated content. In Proc. of the 8th International Conference on World Wide Web [WWW99], May 1999.
[4] N. Shivakumar and H. Garcia-Molina. SCAM: a copy detection mechanism for digital documents. In Proc. of the 2nd International Conference in Theory and Practice of Digital Libraries, June 1995.
[5] N. Shivakumar and H. Garcia-Molina. Building a scalable and accurate copy detection mechanism. In Proc. of the 1st ACM Conference on Digital Libraries (DL’96), March 1996.
[6] S. Brin and L. Page. The anatomy of a large-scale hypertextual Web search engine. In Proceedings of the 7th International World Wide Web Conference, pages 107-117, 1998.
[7] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the Web. Stanford Digital Library Technologies, Working Paper 1999-0120, 1998.
[8] B. Yuwono and D. L. Lee. Server ranking for distributed text retrieval systems on the Internet. In Proceedings of the 5th International Conference on Data Engineering (ICDE), pages 164-171, New Orleans, USA, 1996.
[9] J. Kleinberg. Authoritative sources in a hyperlinked environment. In Proceedings of the 9th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 668-677, January 1998.
[10] GAIS. http://gais.cs.ccu.edu.tw
[11] M. O. Rabin. Fingerprinting by random polynomials. Center for Research in Computing Technology, Harvard University, Report TR-15-81, 1981.
[12] D. Bitton and D. J. DeWitt. Duplicate record elimination in large data files. ACM Transactions on Database Systems (TODS), 8(2):255-265, June 1983.
[13] M. Y. Ivory, R. Sinha, and M. A. Hearst. Preliminary findings on quantitative measures for distinguishing highly rated information-centric web pages. In Proceedings of the 6th Conference on Human Factors and the Web, 2000.
[14] S.-H. Lin and J.-M. Ho. Discovering informative content blocks from Web documents. KDD-02, 2002.
[15] L. Yi and B. Liu. Eliminating noisy information in Web pages for data mining. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 296-305, ACM Press, 2003.
[16] U. Y. Nahm, M. Bilenko, and R. J. Mooney. Two approaches to handling noisy variation in text mining. ICML-2002 Workshop on Text Learning, 2002.
[17] E. H. Chi, P. Pirolli, and J. Pitkow. The scent of a site: A system for analyzing and predicting information scent, usage, and usability of a web site. In Proceedings of ACM CHI00 Conference on Human Factors in Computing Systems, 2000.
[18] M. C. Drott. Using web server logs to improve site design. In ACM 16th International Conference on Systems Documentation. Getting Feedback on your Web Site, pages 43-50, 1998.
[19] J. Scholtz and S. Laskowski. Developing usability tools and techniques for designing and testing web sites. In Proceedings of the 4th Conference on Human Factors and the Web, 1998.
[20] Y. L. Theng and G. Marsden. Authoring tools: Towards continuous usability testing of web documents. In Proceedings of the 1st International Workshop on Hypermedia Development, 1998.
[21] H. Thimbleby. Gentler: A tool for systematic web authoring. International Journal of Human-Computer Studies, 47(1):139-168, 1997.
|