Author: Chou, HaoChing
Thesis title: A Case Study of Porting Relational Database to Hbase
Advisor: Ren-Hung Hwang
Oral examination committee: Ben-Jye Chang, Jen-Yi Pan, Ren-Hung Hwang
Keywords: Hbase, Cloud Computing, de-normalization
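As the Internet has matured, data volumes have grown enormously, and relational databases, which were designed to process comparatively small (terabyte-scale) volumes of data, can no longer meet users' needs. This has indirectly driven the rise of cloud computing. Hbase is one of the more mature open-source databases in cloud computing; it uses a key-value architecture and can handle large (petabyte-scale) volumes of data. It relies on a distributed file system (HDFS) to pool the resources of multiple servers and achieve high performance. Whereas relational databases are row-oriented, Hbase accesses data in a column-oriented manner. Because its access model differs from that of a relational database, its table design does not have to follow normalization either. The design must instead account for the characteristics of the Hbase database and of cloud computing, and adopt a new strategy. This thesis therefore proposes a design method that applies de-normalization and analyzes both website usage behavior and the characteristics of Hbase in order to redesign a database, based on the relationships among its tables, so that it fits the key-value model. Simulation results show that the proposed method is suitable as a guideline for table design for databases in a distributed environment.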
With the rapid growth of Internet traffic, researchers have found that relational databases fail to manage large volumes of data, e.g., datasets beyond the terabyte scale. Among the methods that can manage large volumes of data, cloud computing with a distributed file system is a popular one that has proven to be scalable. Hbase is an open-source distributed database project that stores large volumes of data and aggregates the resources of a cluster of servers to process them in parallel. However, Hbase stores data in a column-oriented style, in contrast to row-oriented relational databases; therefore normalization, which is the guideline for relational database schema design, is not suitable for Hbase. In this thesis, we propose a new strategy for redesigning schemas for Hbase. Based on an analysis of user activities and the relationships among tables in the original relational database system, the proposed de-normalization mechanism effectively reduces data access time. Our numerical results suggest that our approach is a good choice for schema design in Hbase.
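The de-normalization idea summarized in the abstract can be illustrated with a minimal sketch. It assumes two hypothetical normalized tables, `students` and `enrollments`, and folds the child table into the parent row so that one key lookup returns everything a page needs. The table names, the row-key format, and the `family:qualifier` column names are illustrative assumptions, not the schema used in the thesis, and plain Python dictionaries stand in for an Hbase table.

```python
from collections import defaultdict

# Hypothetical normalized relational tables (not taken from the thesis).
students = [
    {"student_id": 1, "name": "Alice"},
    {"student_id": 2, "name": "Bob"},
]
enrollments = [
    {"student_id": 1, "course_id": "c101", "title": "Databases"},
    {"student_id": 1, "course_id": "c205", "title": "Networks"},
    {"student_id": 2, "course_id": "c101", "title": "Databases"},
]


def denormalize(students, enrollments):
    """Fold the child table into the parent row, keyed by student_id.

    Each result row maps "family:qualifier" column names to values,
    mimicking how a wide key-value row could be laid out.
    """
    rows = defaultdict(dict)
    for s in students:
        row_key = f"student:{s['student_id']}"
        rows[row_key]["info:name"] = s["name"]
    for e in enrollments:
        row_key = f"student:{e['student_id']}"
        # One column per enrolled course; the join is paid once, at write time.
        rows[row_key][f"course:{e['course_id']}"] = e["title"]
    return dict(rows)


wide_rows = denormalize(students, enrollments)
```

In this sketch, reading `wide_rows["student:1"]` returns the student's name and all enrolled courses in a single lookup, which is the access pattern a join would otherwise serve in the relational design. The trade-off is duplicated course titles across rows.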
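1. Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives and Contributions
1.4 Thesis Organization
2. Related Work
2.1 HBASE Characteristics
2.1.1 HDFS (Hadoop Distributed File System)
2.1.2 Hadoop
2.1.3 Hbase: CAP Theorem, ACID and BASE Properties, Data Model
2.2 Normalization
2.3 Converting Relational Tables to HBASE Tables and Comparing Their Performance in a Cluster Environment
2.3.1 Converting Relational Tables to Hbase Tables
2.3.2 Performance Comparison of Relational Tables and Hbase Tables in a Cluster Environment
2.4 Using De-normalization to Convert Relational Tables to HBASE Tables
3. System Architecture Design
3.1 Analysis of HBASE Characteristics
3.2 System Architecture Design
4. Experimental Analysis
4.1 Environment and Parameters
4.2 Performance Metrics
4.2.1 Net I/O
4.2.2 Response time
4.3 Experimental Results and Discussion
4.3.1 Net I/O
4.3.2 Response time
5. Conclusions and Future Work
6. References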