|
[1] A. Arasu and H. Garcia-Molina, “Extracting structured data from Web pages,” ACM SIGMOD International Conference on Management of Data, San Diego, California, pp.337-348, 2003.
[2] M. Álvarez, A. Pan, J. Raposo, F. Bellas, and F. Cacheda, “Extracting lists of data records from semi-structured web pages,” Data and Knowledge Engineering, Vol.64, No.2, pp.491-509, 2008.
[3] R. Baeza-Yates and B. Ribeiro-Neto, “Modern Information Retrieval,” Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 1999.
[4] D. Buttler, L. Liu, and C. Pu, “A Fully Automated Object Extraction System for the World Wide Web,” 2001 International Conference on Distributed Computing Systems, Phoenix, Arizona, 2001.
[5] H. Carrillo and D. Lipman, “The Multiple Sequence Alignment Problem in Biology,” SIAM Journal on Applied Mathematics, Vol.48, No.5, pp.1073-1082, 1988.
[6] C. H. Chang and C. N. Hsu, “Automatic Extraction of Information Blocks Using PAT Trees,” National Computer Symposium, Taipei, Taiwan, 1999.
[7] C. H. Chang, S. C. Lui, and Y. C. Wu, “Applying pattern mining to Web information extraction,” The Fifth Pacific Asia Conference on Knowledge Discovery and Data Mining, pp.4-16, Hong Kong, 2001.
[8] C.H. Chang and S.C. Lui, “IEPAD: Information Extraction based on Pattern Discovery,” The 10th International World Wide Web Conference, Hong Kong, 2001.
[9] V. Crescenzi, G. Mecca, and P. Merialdo, “RoadRunner: towards automatic data extraction from large Web sites,” The 26th International Conference on Very Large Database Systems, Rome, Italy, pp.109-118, 2001.
[10] D. Cai, S. Yu, J.R. Wen, and W.Y. Ma, “VIPS: a Vision-based Page Segmentation Algorithm,” Microsoft Technical Report, MSR-TR-2003-79, 2003.
[11] C.H. Chang and S.C. Kuo, “OLERA: Semisupervised Web-Data Extraction with Visual Support,” IEEE Intelligent Systems, Vol.19, No.6, pp.56-64, 2004.
[12] D. Embley, Y. Jiang, and Y.-K. Ng, “Record-boundary discovery in Web documents,” ACM SIGMOD Conference on Management of Data, pp.467-478, 1999.
[13] D. Gusfield, “Algorithms on Strings, Trees and Sequences,” Cambridge University Press, 1997.
[14] X. Gu, J. Chen, W.-Y. Ma, and G. Chen, “Visual based content understanding towards web adaptation,” The Second International Conference on Adaptive Hypermedia and Adaptive Web-based Systems, pp.164-173, Spain, 2002.
[15] S. Gupta, G. Kaiser, D. Neistadt, and P. Grimm, “DOM-based content extraction of HTML documents,” The 12th International Conference on World Wide Web, pp.207-214, 2003.
[16] C. N. Hsu and M. Dung, “Generating finite-state transducers for semi-structured data extraction from the web,” Journal of Information Systems, Vol.23, No.8, pp.521-538, 1998.
[17] N. Kushmerick, D. Weld, and R. Doorenbos, “Wrapper Induction for Information Extraction,” The Fifteenth International Joint Conference on Artificial Intelligence, pp.729-737, 1997.
[18] B. Liu, R. Grossman, and Y. Zhai, “Mining data records in Web Pages,” The Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp.601-606, 2003.
[19] L. Li, Y. Liu, A. Obregon, and M. Weatherston, “Visual Segmentation-Based Data Record Extraction from Web Documents,” IEEE International Conference on Information Reuse and Integration, pp.502-507, 2007.
[20] B. Liu, “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data,” Springer Verlag, 2007.
[21] I. Muslea, S. Minton, and C. Knoblock, “A hierarchical approach to wrapper induction,” The 3rd International Conference on Autonomous Agents, pp.190-197, 1999.
[22] D.C. Reis, P.B. Golgher, A. S. Silva, and A. F. Laender, “Automatic web news extraction using tree edit distance,” The 13th International Conference on the World Wide Web, pp.502-511, New York, 2004.
[23] S. Sarawagi, “Automation in Information Extraction and Data Integration (Tutorial),” The 28th International Conference on Very Large Data Bases, 2002.
[24] R. Song, H. Liu, J.R. Wen, and W.Y. Ma, “Learning block importance models for web pages,” The 13th International Conference on World Wide Web, pp.203-211, 2004.
[25] K. Simon and G. Lausen, “ViPER: Augmenting Automatic Information Extraction with Visual Perceptions,” The 14th ACM International Conference on Information and Knowledge Management, pp.381-388, 2005.
[26] Y.F. Tseng and H. Y. Kao, “The Mining and Extraction of Primary Informative Blocks and Data Objects from Systematic Web Pages,” The 2006 IEEE/WIC/ACM International Conference on Web Intelligence, pp.370-373, 2006.
[27] J. Wang and F.H. Lochovsky, “Data extraction and label assignment for Web databases,” The Twelfth International Conference on World Wide Web, Budapest, Hungary, pp.187-196, 2003.
[28] Y. Yang and H.-J. Zhang, “HTML page analysis based on visual cues,” The 6th International Conference on Document Analysis and Recognition, pp.859-864, 2001.
[29] H. Zhao, W. Meng, Z. Wu, V. Raghavan, and C. Yu, “Fully Automatic Wrapper Generation for Search Engines,” The 14th International Conference on World Wide Web, pp.66-75, 2005.
[30] H. Zhao, W. Meng, and C. Yu, “Automatic extraction of dynamic record sections from search engine result pages,” The 32nd International Conference on Very Large Data Bases, pp.989-1000, 2006.
[31] Y. Zhai and B. Liu, “Structured data extraction from the web based on partial tree alignment,” IEEE Transactions on Knowledge and Data Engineering, Vol.18, No.12, pp.1614-1628, 2006.
[32] Y. Zhai and B. Liu, “Extracting Web Data Using Instance-Based Learning,” World Wide Web, Vol.10, No.2, pp.113-132, 2007.
|