[1]. S. Brin and L. Page, “The Anatomy of a Large-Scale Hypertextual Web Search Engine,” In Proceedings of the Seventh World Wide Web Conference, pp. 107-117, 1998.
[2]. M.S. Chen, J.S. Park, and P.S. Yu, “Efficient Data Mining for Path Traversal Patterns,” IEEE Transactions on Knowledge and Data Engineering, vol. 10, no. 2, pp. 209-221, April 1998.
[3]. R. Song, H. Liu, J.R. Wen, and W.Y. Ma, “Learning Block Importance Models for Web Pages,” In Proceedings of the 13th World Wide Web Conference, 2004.
[4]. J. Han and K.C.C. Chang, “Data Mining for Web Intelligence,” IEEE Computer, vol. 35, no. 11, pp. 64-70, November 2002.
[5]. Y. Yang, S. Slattery, and R. Ghani, “A Study of Approaches to Hypertext Categorization,” Journal of Intelligent Information Systems, 2002.
[6]. H. Yu, J. Han, and K.C.C. Chang, “PEBL: Web Page Classification without Negative Examples,” IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 1, January 2004.
[7]. A. Sun, E.P. Lim, and W.K. Ng, “Web Classification Using Support Vector Machine,” In Proceedings of the Fourth International Workshop on Web Information and Data Management, pp. 96-99, 2002.
[8]. J. Fürnkranz, “Exploiting Structural Information for Text Classification on the WWW,” In Proceedings of the Third Symposium on Intelligent Data Analysis, 1999.
[9]. H.J. Oh, S.H. Myaeng, and M.H. Lee, “A Practical Hypertext Categorization Method Using Links and Incrementally Available Class Information,” In Proceedings of the 23rd ACM SIGIR International Conference on Research and Development in Information Retrieval, pp. 264-271, 2000.
[10]. E. Glover, K. Tsioutsiouliklis, S. Lawrence, D. Pennock, and G. Flake, “Using Web Structure for Classifying and Describing Web Pages,” In Proceedings of the 11th World Wide Web Conference, 2002.
[11]. L.K. Shih and D.R. Karger, “Using URLs and Table Layout for Web Classification Tasks,” In Proceedings of the 13th World Wide Web Conference, 2004.
[12]. L. Page, S. Brin, R. Motwani, and T. Winograd, “The PageRank Citation Ranking: Bringing Order to the Web,” Technical Report, Department of Computer Science, Stanford University, 1998.
[13]. S. Chakrabarti, M. van den Berg, and B. Dom, “Focused Crawling: A New Approach to Topic-Specific Web Resource Discovery,” In Proceedings of the 8th World Wide Web Conference, 1999.
[14]. M. Diligenti, F.M. Coetzee, S. Lawrence, C.L. Giles, and M. Gori, “Focused Crawling Using Context Graphs,” In Proceedings of the 26th International Conference on Very Large Data Bases, pp. 527-534, 2000.
[15]. S. Chakrabarti, K. Punera, and M. Subramanyam, “Accelerated Focused Crawling Through Online Relevance Feedback,” In Proceedings of the 11th World Wide Web Conference, pp. 148-159, 2002.
[16]. C. Cardie, “Empirical Methods in Information Extraction,” AI Magazine, vol. 18, no. 4, pp. 65-79, 1997.
[17]. D. Buttler, L. Liu, and C. Pu, “A Fully Automated Object Extraction System for the World Wide Web,” In Proceedings of the International Conference on Distributed Computing Systems, pp. 361-370, May 2001.
[18]. C.H. Chang and S.C. Lui, “IEPAD: Information Extraction Based on Pattern Discovery,” In Proceedings of the 10th World Wide Web Conference, pp. 681-688, 2001.
[19]. D. Embley, Y. Jiang, and Y.K. Ng, “Record-Boundary Discovery in Web Documents,” In Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 467-478, 1999.
[20]. N. Kushmerick, D. Weld, and R. Doorenbos, “Wrapper Induction for Information Extraction,” In Proceedings of the 15th International Joint Conference on Artificial Intelligence, 1997.
[21]. I. Muslea, S. Minton, and C. Knoblock, “A Hierarchical Approach to Wrapper Induction,” In Proceedings of the Third International Conference on Autonomous Agents, 1999.
[22]. K. Wang and H. Liu, “Discovering Structural Association of Semistructured Data,” IEEE Transactions on Knowledge and Data Engineering, vol. 12, no. 3, pp. 353-371, 2000.
[23]. J. Hou and Y. Zhang, “Effectively Finding Relevant Web Pages from Linkage Information,” IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 4, 2003.
[24]. S. Chakrabarti, “Integrating the Document Object Model with Hyperlinks for Enhanced Topic Distillation and Information Extraction,” In Proceedings of the 10th World Wide Web Conference, pp. 210-220, 2001.
[25]. K. Bharat and M.R. Henzinger, “Improved Algorithms for Topic Distillation in a Hyperlinked Environment,” In Proceedings of the 21st ACM SIGIR International Conference on Research and Development in Information Retrieval, pp. 104-111, 1998.
[26]. S. Chakrabarti, B. Dom, P. Raghavan, S. Rajagopalan, D. Gibson, and J. Kleinberg, “Automatic Resource Compilation by Analyzing Hyperlink Structure and Associated Text,” In Proceedings of the Seventh World Wide Web Conference, pp. 65-74, 1998.
[27]. J. Kleinberg, “Authoritative Sources in a Hyperlinked Environment,” Journal of the ACM, 1999.
[28]. S. Chakrabarti, M. Joshi, and V. Tawde, “Enhanced Topic Distillation Using Text, Markup Tags, and Hyperlinks,” In Proceedings of the 24th ACM SIGIR International Conference on Research and Development in Information Retrieval, pp. 208-216, 2001.
[29]. D. Gibson, J. Kleinberg, and P. Raghavan, “Inferring Web Communities from Link Topology,” In Proceedings of the 9th ACM Conference on Hypertext and Hypermedia, pp. 225-234, 1998.
[30]. C. Clifton, “TopCat: Data Mining for Topic Identification in a Text Corpus,” IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 8, August 2004.
[31]. O. Buyukkokten, H. Garcia-Molina, A. Paepcke, and T. Winograd, “Power Browser: Efficient Web Browsing for PDAs,” In Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems, pp. 430-437, 2000.
[32]. S.H. Lin and J.M. Ho, “Discovering Informative Content Blocks from Web Documents,” In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2002.
[33]. H.Y. Kao, S.H. Lin, J.M. Ho, and M.S. Chen, “Mining Web Informative Structures and Contents Based on Entropy Analysis,” IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 1, January 2004.
[34]. L. Yi, B. Liu, and X. Li, “Eliminating Noisy Information in Web Pages for Data Mining,” In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2003.
[35]. L. Yi and B. Liu, “Web Page Cleaning for Web Mining Through Feature Weighting,” In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, August 2003.
[36]. J. Wang and F.H. Lochovsky, “Data-Rich Section Extraction from HTML Pages,” In Proceedings of the IEEE International Conference on Web Information Systems Engineering, 2002.
[37]. Z. Bar-Yossef and S. Rajagopalan, “Template Detection via Data Mining and its Applications,” In Proceedings of the 11th World Wide Web Conference, 2002.
[38]. M. Kovacevic, M. Diligenti, M. Gori, and V. Milutinovic, “Searching for Web Information More Efficiently Using Presentational Layout Analysis,” Journal of Electronic Business, vol. 1, no. 3, pp. 310-326, 2003.
[39]. N. Kushmerick, “Learning to Remove Internet Advertisements,” In Proceedings of the 3rd International Conference on Autonomous Agents, pp. 175-181, 1999.
[40]. T. Mitchell, Machine Learning, McGraw Hill, 1997.
[41]. D. Cai, S. Yu, J.R. Wen, and W.Y. Ma, “Extracting Content Structure for Web Pages Based on Visual Representation,” In Proceedings of the Fifth Asia Pacific Web Conference, 2003.
[42]. V.N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, 1995.
[43]. J. Chen, B. Zhou, J. Shi, H. Zhang, and Q. Wu, “Function-Based Object Model Towards Website Adaptation,” In Proceedings of the 10th World Wide Web Conference, 2001.
[44]. C. Shannon, “A Mathematical Theory of Communication,” Bell System Technical Journal, vol. 27, pp. 398-403, 1948.
URL List:
[45]. W3C DOM, Document Object Model (DOM), http://www.w3c.org/DOM/, 2003.
[46]. CNN web site, http://www.cnn.com, 2005.
[47]. BBC news web site, http://news.bbc.co.uk, 2005.
[48]. ABC news web site, http://abcnews.go.com, 2005.
[49]. Yahoo news web site, http://news.yahoo.com, 2005.