跳到主要內容

臺灣博碩士論文加值系統

(216.73.217.144) 您好!臺灣時間:2026/04/25 20:45
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::

詳目顯示

: 
twitterline
研究生:王威澤
研究生(外文):Wei-Tse Wang
論文名稱:原生型XML資料庫關聯規則之探勘與利用關聯規則探勘壓縮資料庫
論文名稱(外文):A native XML database association rules mining method and a database compression approach using association rules mining
指導教授:陳靖國陳靖國引用關係
學位類別:碩士
校院名稱:朝陽科技大學
系所名稱:資訊管理系碩士班
學門:電算機學門
學類:電算機一般學類
論文種類:學術論文
論文出版年:2003
畢業學年度:91
語文別:英文
論文頁數:110
中文關鍵詞:原生型XML資料庫關聯規則資料探勘資料庫壓縮
外文關鍵詞:Native XML DatabaseAssociation RulesData MiningDatabase Compression
相關次數:
  • 被引用被引用:3
  • 點閱點閱:444
  • 評分評分:
  • 下載下載:103
  • 收藏至我的研究室書目清單書目收藏:3
資訊科技的進步,企業資訊系統的應用普及,企業所必須處理的資料量隨之愈來愈龐大,為了能夠正確的處理與保存資料,資料庫的地位就顯得很重要了。然而在龐大的資料庫中找出有用的資訊,更是一門很大的學問了。因此這也是近年來,資料探勘研究會備受重視的原因。此外,面對著日益增加的資料量,我們需要更大的資料儲存體來存取這些龐大的資料量。為了因應對資料永無止境的需求與增加,首要解決之道是提供有效的資料庫壓縮技術,節省資料儲存的成本。
本論文在資料探勘的議題提出了相關的研究,首先不同於資料探勘研究領域以關聯式資料庫為主的想法,對於日漸增加的XML文件,我們提出一套針對原生型XML資料庫的資料探勘方法,利用資料探勘方法能夠從中找出一些知識。其次,提出了一套語意的關聯規則方法,利用資料探勘方法所探勘出來的規則,經過所提出的程序,將關聯規則轉換成具有語意的關聯規則,使關聯規則更具可讀性,進而更容易提供給使用者做參考之用。最後,提出一種以資料探勘為基礎的資料庫壓縮方法,利用關聯規則資料探勘方法,進行資料庫壓縮。不但能夠壓縮資料庫,節省儲存空間,降低資料儲存的成本,且利用關聯規則探勘方法,能夠找出資料與資料間的關聯,這些關聯規則將可以更進一步提供給組織做為組織策略的參考。
With the advancement of technology and popularity of applications in enterprises’ information system, greater and greater amount of data is generated everyday. To properly store and access these data, database applications have come into play and become crucial. The main task of data mining is to help enterprises make their decisions by extracting useful information from the large amount of complicated data storage for reference, so this is why data mining has been recently paid more attention than ever. Also, more storage media for data is required for the increasing amount of data. For unlimited needs of increasing amount of data, it will be wise to provide an efficient data compression technique to reduce the cost.
The thesis proposes the related research on data mining. First of all, it is different from data mining fields based primarily on relational database. We propose a data mining method for native XML database. It can extract some knowledge from native XML database. Secondly, propose a semantic association rule – the rule that is extracted from data mining method. Convert it to the semantic association rule from the proposed procedures so as to make it more legible and easier to users as reference. Finally, propose a database compression using association rules mining. The method compresses the database for reducing the cost of storage. And from the association rules mining, it finds the association among these data. These association rules are further taken as reference for the organizations when making their strategic steps.
中文摘要 I
ABSTRACT II
致謝 III
TABLE OF CONTENTS V
LIST OF FIGURES VIII
LIST OF TABLES X
CHAPTER 1 INTRODUCTION 1
1.1 BACKGROUND AND MOTIVATION 1
1.2 OBJECTIVES 2
CHAPTER 2 ASSOCIATION RULES MINING 4
2.1 INTRODUCTION 4
2.2 RELATED LITERATURE 5
2.2.1 Data Mining 5
2.2.2 Class Inheritance Tree 7
2.2.3 Extensible Markup Language (XML) 11
2.2.4 XML Database 13
2.3 ASSOCIATION RULES MINING FOR NATIVE XML DATABASE 15
2.3.1 Proposed Association Rules Mining for Native XML Database
Method 15
2.3.1.1 Coding Process 17
2.3.1.2 Process of Mining Single Simultaneously Occurring Nodes
Association Rules (Mining Single SONARs) 23
2.3.1.3 Process of Mining Multiple Simultaneously Occurring
Nodes Association Rules (Mining Multiple SONARs) 29
2.3.2 Experiments 31
2.3.3 Summary 36
2.4 SEMANTIC ASSOCIATION RULES MINING 37
2.4.1 Proposed Semantic Association Rules Mining Method 38
2.4.2 An Example 43
2.4.3 Summary 50
CHAPTER 3 DATABASE COMPRESSION USING DATA MINING
51
3.1 INTRODUCTION 51
3.2 RELATED LITERATURE 52
3.2.1 Data and Database Compression 52
3.2.2 Database Compression Using Apriori 54
3.2.3 Tuple Differential Coding Database Compression 56
3.3 PROPOSED DATABASE COMPRESSION METHOD USING ASSOCIATION
RULES MINING 59
3.3.1 Mining All Association Rules 59
3.3.2 Generating Compression Rules 60
3.3.3 Calculating Compressed Space for Each Compression Rule 63
3.3.4 Applying a Proposed Heuristic Method to Resolve the
Conflicts of Compression Rules 64
3.3.5 Computing Compression Ratio 67
3.4 AN EXAMPLE 68
3.4.1 Mining All Association Rules 69
3.4.2 Generating Compression Rules 73
3.4.3 Calculating Compressed Space for Each Compression Rule 74
3.4.4 Applying the Proposed Heuristic Method to Resolve the
Conflicts of Compression Rules 75
3.4.5 Computing Compression Ratio 77
3.5 EXPERIMENT RESULTS 77
3.5.1 A Comparison with Database Compression Using Apriori 78
3.5.2 A Comparison with TDC 81
3.6 CONCLUSIONS 83
CHAPTER 4 CONCLUSIONS AND FUTURE WORKS 85
4.1 DISCUSSIONS AND CONCLUSIONS 85
4.2 FUTURE WORKS 86
BIBLIOGRAPHY 87
[1]Agostino, S. D. (2001), “Parallelism and Dictionary Based Data Compression,” Information Sciences, Vol. 135, pp. 43-56.
[2]Agrawal, R., and Srikant, R. (1994), “Fast Algorithms for Mining Association Rules,” Proc. 20th Int’l Conf. on Very Large Data Bases (VLDB’94), Santiago, Chile, pp. 487-499.
[3]Agrawal, R., Imielinski, T., and Swami, A. (1993), “Mining Association Rules between Sets of Items in Large Databases,” Proc. 1993 ACM-SIGMOD Int’l Conf. on Management of Data (SIGMOD’93), Washington, DC, pp. 207-216.
[4]Ankerst, M., Breunig, M., Kriegel, H. P., and Sander, J. (1999), “OPTICS: Ordering Points to Identify the Clustering Structure,” Proc. 1999 ACM-SIGMOD Int’l Conf. on Management of Data (SIGMOD’99), Philadelphia, PA, pp. 49-60.
[5]Apache, “Apache Xindice,” http://xml.apache.org/xindice/
[6]Asai, T., Abe, K., Kawasoe, S., Sakamoto, H., Arimura, H., and Arikawa, S. (2002), “Efficiently Mining Frequent Substructures from Semi-structured Data,” Proceedings of International Workshop on Informations & Electrical Engineering (IWIE2002), 16-17 May 2002, Suwon, Korea, pp. 59-64.
[7]Babu, S., Garofalakis, M., and Rastogi, R. (2001), “SPARTAN: A Model-Based Semantic Compression System for Massive Data Tables,” Proc. 2001 ACM-SIGMOD Int’l Conf. on Management of Data (SIGMOD’01), pp. 283-294.
[8]Balkenhol, B., and Kurtz, S. (2000), “Universal Data Compression Based on the Burrows-Wheeler Transformation: Theory and Practice,” IEEE Trans. on Computers, Vol. 49, No. 10, pp. 1043-1053.
[9]Banerjee, S., Krishnamurthy V., Krishnaprasad M., and Murthy R. (2000), “Oracle8i-the XML enabled data management system,” Proceedings of the International Conference on Data Engineering (ICDE), California, March 2000, pp. 561-568.
[10]Bassiouni, M. A. (1985), “Data Compression in Scientific and Statistical Databases,” IEEE Trans. on Software Engineering, Vol. SE-11, No. 10, pp. 1047-1058.
[11]Bell, T. C., Witten, I. H., and Cleary, J. G. (1989), “Modeling for Text Compression,” ACM Computing Surveys, Vol. 21, pp. 557-591.
[12]Bentley, J. L., Sleator, D. D., Tarjan, R. E., and Wei, V. K. (1986), “A Locally Adaptive Data Compression Scheme,” Communications of the ACM, Vol. 29, No. 4, pp. 320-330.
[13]Bertino, E., and Catania, B. (2001), “Integrating XML and Database,” IEEE Internet Computing, Vol. 5, No. 4, pp.84-88.
[14]Bourret, R., “XML and Databases,” http://www.rpbourret.com/xml/XMLAndDatabases.htm
[15]Bourret, R., Bornhovd, C., and Buchmann, A. (2000), “A Generic Load/Extract Utility for Data Transfer between XML Documents and Relational Databases,” Proceedings of the Second International Workshop on Advanced Issues of E-Commerce and Web-based Information Systems, pp. 134-143.
[16]Braga, D., Campi, A., Ceri, S., Klemettinen, M., and Lanzi, P. (2002), “A Tool for Extracting XML Association Rules from XML Documents,” Research paper in Proceedings of IEEE-ICTAI 2002, Washington DC, USA, Nov. 2002, pp. 57-64.
[17]Braga, D., Campi, A., Klemettinen, M., and Lanzi, P. (2002), “Mining Association Rules from XML Data,” Proceedings of the Fourth International Conference on Data Warehousing and Knowledge Discovery (DaWaK’02), September 4-6, 2002.
[18]Büchner, A. G., Baumgarten, M., Mulvenna, M. D., Böhm, R., and Anand, S. S. (2000), “Data Mining and XML: Current and Future Issues,” International Conference on Web Information System Engineering, pp. 131-135.
[19]Cannane, A., and Williams, H. E. (2000), “A Compression Scheme for Large Databases,” Australasian Database Conf., Vol. 22, No. 2, pp. 6-11.
[20]Chan, S., Dillon, T., and Siu, A. (2002), “Applying a Mediator Architecture Employing XML to Retailing Inventory Control,” The Journal of Systems and Software, Vol. 60, pp. 239-248.
[21]Chang, C. C. and Wang, C. H. (1997), “A Locally Adaptive Data Compression Strategy for Chinese-English Characters,” Journal of Systems and Software, Vol. 36, No. 2, pp. 167-179.
[22]Chang, H. K. C., and Chen, S. H. (1993), “A New Locally Adaptive Data Compression Scheme using Multilist Structure,” The Computer Journal, Vol. 36, No. 6, pp. 570-578.
[23]Changchien, S. W., and Lu, T. C. (2001), “Knowledge Discovery from Object-Oriented Databases using an Association Rules Mining Algorithm,” Proc. of the 5th Int’l Conf. on Knowledge-Based Intelligent Information Engineering Systems & Allied Technologies 6, 7 & 8 (KES’2001).
[24]Chen, E., and Wang, X. (1999), “Semi-Structured Data Extraction And Schema Knowledge Mining,” Proceedings 25th EUROMICRO Conference, Vol. 2, pp. 310 -317.
[25]Cockshott, W.P., McGregor, D., Kotsis, N., and Wilson, J. (1998), “Data Compression in Database Systems,” Proc. Int’l Database Engineering and Applications Symposium, pp. 111-120.
[26]CoIL Challenge 2000 Report, http://www.liacs.nl/~putten/library/cc2000/
[27]Connack, G. V., and Horspool, R. N. S. (1987), “Data Compression using Dynamic Markov Modeling,” The Computer Journal, Vol. 30, pp. 541-550.
[28]Cooper, B., Sample, N., Franklin, M., Hjaltason, G., and Shadmon, M. (2001), “A Fast Index for Semistructured Data,” Proceedings of the 27th International Conference on Very Large Databases (VLDB’01), Roma, Italy.
[29]Cormack, G. V. (1985), “Data Compression on a Database System,” Communications of the ACM, Vol. 28, pp. 1336-1342.
[30]Crochemore, M., Mignosi, F., Restivo, A., and Salemi, S. (2000), “Data Compression using Antidictionaries,” Proc. of the IEEE, Vol. 88, No. 11, pp. 1756-1768.
[31]Dayen I., “Storing XML in Relational Databases,” http://www.xml.com/pub/a/2001/06/20/databases.html
[32]Deogun, S., Raghavan, V., Sarkar, A., and Sever, H. (1997), Rough Sets and Data Mining – Analysis of Imprecise Data. Kluwer Academic.
[33]Deutsch, A., Fernandez, M., and Suciu, D. (1999), “Storing Semistructured Data with STORED,” In Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 431-442.
[34]Effros, M. (2000), “PPM Performance with BWT Complexity: A Fast and Effective Data Compression Algorithm,” Proc. of the IEEE, Vol. 88, No. 11, pp. 1703-1712.
[35]Elias, P. (1975), “Universal Codeword Sets and Representations of the Integers,” IEEE Trans. on Information Theory, Vol. IT-21, pp. 194-203.
[36]Ester, M., Kriegel, H. P., Sander, J., and Xu, X. (1996), “A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases,” Proc. 1996 Int’l Conf. on Knowledge Discovery and Data Mining (KDD’96), Portland, Oregon, pp. 226-231.
[37]Florescu, D., and Kossmann, D. (1999), “Storing and Querying XML Data Using an RDBMS,” IEEE Data Engineering Bulletin, Vol. 22, No. 3, pp. 27-34.
[38]Fong, J., Wong, H. K., and Cheng, Z. (2003), “Converting Relational Database into XML Documents with DOM,” Information and Software Technology, Vol. 45, pp. 335-355.
[39]Gallagher, R. G. (1978), “Variations on a Theme by Huffman,” IEEE Trans. on Information Theory, Vol. IT-24, No. 6, pp. 668-674.
[40]Ghazizadeh, S., and Chawathe, S. (2002), “Discovering Frequent Structures using Summaries,” Proceeding of the 5th International Conference on Discovery Science, Germany. Nov. 2002.
[41]Gibson, J. D. (1980), “Adaptive Prediction in Speech Differential Encoding System,” Proc. of the IEEE, Vol. 68, pp. 488-525.
[42]Goh, C. L., Aisaka, K., Tsukamoto, M., Harumoto, K., and Nishio, S. (1998), “Database Compression with Data Mining Methods,” Proc. 5th Int’l Conf. on Foundations of Data Organization (FODO'98), Kobe, Japan, pp. 97-106.
[43]Han, J., and Kamber, M. (2001), Data Mining: Concepts and Techniques. Morgan Kaufmann.
[44]Han, J., Pei, J., and Yin, Y. (2000), “Mining Frequent Patterns without Candidate Generation,” Proc. 2000 ACM-SIGMOD Int’l Conf. on Management of Data (SIGMOD’00), Dallas, TX, pp. 1-12.
[45]Huang, Z. (1998), “Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values,” Data Mining and Knowledge Discovery, Vol. 2, pp. 283-304.
[46]Huffman, D. (1951), “A Method for the Construction of Minimum Redundancy Codes,” Proc. of the Institute of Radio Engineers, Vol. 40, pp. 1098-1101.
[47]Ipedo, “Ipedo XML Database,” http://www.ipedo.com/html/products_xml_dat.html
[48]Jain, A. K., Murty, M. N., and Flynn, P. J. (1999), “Data Clustering: A Review,” ACM Computing Surveys, Vol. 31, No. 3, pp.264-323.
[49]Kaufman, L., and Rousseeuw, P. J. (1990), Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley and Sons.
[50]Knuth, D. E. (1985), “Dynamic Huffman Coding,” Journal of Algorithms, Vol. 6, pp. 163-180.
[51]Langa, K., and Burnett, M. (2000), “XML, Metadata and Efficient Knowledge Discovery,” Knowledge-Based Systems, Vol. 13, pp. 321-331.
[52]Lee, J. W., Lee, K., and Kim, W. (2001), “Preparations for Semantics-Based XML Mining,” Proceedings of IEEE International Conference on Data Mining (ICDM’01), pp. 345-352.
[53]Linoff, G., and Stanfill, C. (1993), “Compression of Indexes with Full Positional Information in Very Large Text Databases,” Proc. of the 16th Int’l. ACM SIGIR Conf. on Research and Development in Information Retrieval, pp. 88-95.
[54]Liu, B., Hsu, W., and Chen, S. (1997), “Using General Impressions to Analyze Discovered Classification Rules,” Proc. 1997 Int’l Conf. on Knowledge Discovery and Data Mining (KDD’97), Newport Beach, CA, pp. 31-36.
[55]Lu, E. J. L., and Jung, Y. M. (2003), “XDSearch: An Efficient Search Engine for XML Document Schemata,” Expert System with Applications, Vol. 24, pp. 213-224.
[56]Moffat, A., and Zobel, J. (1997), “Text Compression for Dynamic Document Databaes,” IEEE Trans. on Knowledge and Data Engineering, Vol. 9, No. 2, pp. 302-313.
[57]Moh, C. H., Lim, E. P., and Ng, W. K. (2000), “DTD-Miner: A Tool for Mining DTD from XML Documents,” Proceedings of the 2nd International Workshop on Advanced Issues of E-Commerce and Web-based Information Systems (WECWIS 2000), June 2000, San Jose.
[58]Murthy, S. K. (1998), “Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey,” Data Mining and Knowledge Discovery, Vol. 2, pp. 345-389.
[59]Nayak, R., Witt, R., and Tonev, A. (2002), “Data Mining and XML documents,” Proceedings of the 2002 International Conference on Internet Computing, Nevada, USA, June 24-27, 2002.
[60]Ng, W. K., and Ravishankar, C. V. (1997), “Block-Oriented Compression Techniques for Large Statistical Databases,” IEEE Trans. on Knowledge and Data Engineering, Vol. 9, No. 2, pp. 314-328.
[61]Padmanabhan, B., and Tuzhilin, A. (1999), “Unexpectedness as a Measure of Interestingness in Knowledge Discovery,” Decision Support System, Vol. 27, pp. 303-318.
[62]Pawlak, Z. (1982), “Rough sets,” International Journal of Computer and Information Science, Vol. 11, pp. 341-356.
[63]Quinlan, J. R. (1986), “Induction of Decision Trees,” Machine Learning, Vol. 1, pp. 81-106.
[64]Quinlan, J. R. (1993), C4.5: Programs for Machine Learning. Morgan Kaufmann.
[65]Rastogi, R., and Shim, K. (1998), “PUBLIC: A Decision Tree Classifier that Integrates Building and Pruning,” Proc. 1998 Int’l Conf. on Very Large Data Bases (VLDB’98), Valencia, Spain, pp. 263-277.
[66]Rys, M. (2001), “Bringing the Internet to Your Database: Using SQL Server 2000 and XML to Build Loosely-Coupled Systems,” In Proceedings of the International Conference on Data Engineering, pp. 465-472.
[67]Shanmugasundaram, J., Shekita, E., Barr, R., Carey, M., Lindsay, B., Pirahesh, H., and Reinwald, B. (2000), “Efficiently Publishing Relational Data as XML Documents,” Proceedings of the 26th International Conference on Very Large Databases (VLDB’00), Cairo, Egipt, September 2000, pp. 65-76.
[68]Shanmugasundaram, J., Tufte, K., He, G., Zhang, C., De-Witt, D., and Naughton, J. (1999), “Relational Databases for Querying XML Documents: Limitations and Opportunities,” Proceedings of the 25th International Conference on Very Large Databases (VLDB’99), Edinburgh, UK, September 1999, pp. 302-314.
[69]Software AG, “Tamino XML Server,” http://www.softwareag.com/tamino/default.htm
[70]Ströbel, M. (2002), “An XML Schema Representation for the Communication Design of Electronic Negotiations,” Computer Networks, Vol. 39, pp. 661-680.
[71]Termier, A., Rousset, M. C., and Sebag, M. (2002), “TreeFinder, a First Step towards XML Data Mining,” Proceedings of IEEE International Conference on Data Mining (ICDM’02), Maebashi, Japan, December 9-12, 2002, pp 450-457.
[72]Vitter, J. S. (1987), “Design and Analysis of Dynamic Huffman Codes,” Journal of ACM, Vol. 34, pp. 825-845.
[73]Welch, T. A. (1984), “A Technique for High-Performance Data Compression,” IEEE Computer, Vol. 17, pp. 8-19.
[74]Witten, I. H., Neal, R. M., and Cleary, J.G. (1987), “Arithmetic Coding for Data Compression,” Communications of ACM, Vol. 30, No. 6, pp. 520-540.
[75]World Wide Web Consortium. “Extensible Markup Language (XML) Version 1.0 (W3C Recommendation),” http://www.w3.org/XML/, Feb. 1998.
[76]World Wide Web Consortium. “Guide to the W3C XML Specification ("XMLspec") DTD, Version 2.1,” http://www.w3.org/XML/1998/06/xmlspec-report.
[77]Xiao, Y., and Dunham, M. H. (2001), “Efficient Mining of Traversal Patterns,” Data & Knowledge Engineering, Vol. 39, pp. 191-214.
[78]XML.org, http://xml.org/.
[79]Yang, E. H., Kaltchenko, A., and Kieffer, J. C. (2001), “Universal Lossless Data Compression with Side Information by using a Conditional MPM Grammar Transform,” IEEE Trans. on Information Theory, Vol. 47, No. 6, pp. 2130-2150,.
[80]Yen, S. J., and Chen, A. L. P. (2001), “A Graph-Based Approach for Discovering Various Types of Association Rules,” IEEE Trans. on Knowledge and Data Engineering, Vol. 13, No. 5, pp. 839-845.Zadeh, L. A. (1965), “Fuzzy Sets,” Information and Control, Vol.8, No.3, pp. 338-353.
[82]Ziv, J., and Lempel, A. (1977), “A Universal Algorithm for Sequential Data Compression,” IEEE Trans. on Information Theory, Vol. IT-23, pp. 337-343.
[83]Ziv, J., and Lempel, A. (1978), “Compression of Individual Sequences via Variable-Rate Coding,” IEEE Trans. on Information Theory, Vol. IT-24, pp. 530-536.
QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top