Graduate student (English name): Chou, HaoChing
Thesis title (English): A Case Study of Porting Relational Database to Hbase
Advisor (English name): Ren-Hung Hwang
Oral defense committee (English names): Ben-Jye Chang, Jen-Yi Pan, Ren-Hung Hwang
Keywords (English): Hbase, Cloud Computing, de-normalization
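As the Internet has matured, data volumes have grown enormously, and relational databases, which are designed to process comparatively small (terabyte-scale) data, can no longer satisfy today's users. This has indirectly driven the rise of cloud computing. Within cloud computing, Hbase is one of the more mature open-source databases; it uses a key-value architecture and can handle large (petabyte-scale) data. It uses a distributed file system (HDFS) to combine resources across servers and achieve high performance, and, in contrast to row-oriented relational databases, Hbase accesses data in a column-oriented manner. Because its access pattern differs from that of relational databases, its table design also need not follow normalization as relational design does. The design must instead take the characteristics of Hbase and of cloud computing into account and adopt a new strategy. This thesis therefore proposes a design method that applies de-normalization, analyzes website usage behavior and the characteristics of Hbase, and redesigns the database to fit the key-value model according to the relationships among tables. Finally, simulation results show that the proposed method is suitable as a guideline for designing tables for databases in a distributed environment.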
With the rapid growth of Internet traffic, researchers have found that relational databases fail to manage large volumes of data, e.g., datasets above the terabyte scale. Among the methods that can manage such volumes, cloud computing with a distributed file system is a popular one that has proven to be a scalable solution. Hbase is an open-source, distributed database project that stores large volumes of data and parallelizes and aggregates resources across a cluster of servers. However, Hbase stores data in a column-oriented style, in contrast to row-oriented relational databases; therefore normalization, which is the guideline for relational database schema design, is not suitable for Hbase. In this thesis, we propose a new strategy for redesigning schemas for Hbase. Based on an analysis of user activities and the relationships among tables in the original relational database system, the proposed de-normalization mechanism effectively reduces data access time. Our numerical results suggest that our approach is a good choice for schema design in Hbase.
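As a rough illustration of the de-normalization idea described above, the following Python sketch folds a one-to-many relational table into the rows of its parent table, so that one key lookup returns everything a page view might need. It is not the thesis's method or schema: the `users` and `posts` tables, the `info` and `posts` column-family names, and the row-key format are all hypothetical, and plain dictionaries stand in for an Hbase table.

```python
# Hypothetical normalized relational tables (illustrative only, not from the thesis).
users = [
    {"user_id": 1, "name": "alice"},
    {"user_id": 2, "name": "bob"},
]
posts = [
    {"post_id": 10, "user_id": 1, "title": "hello"},
    {"post_id": 11, "user_id": 1, "title": "hbase notes"},
    {"post_id": 12, "user_id": 2, "title": "first post"},
]


def denormalize(users, posts):
    """Fold the one-to-many posts table into the rows of the users table.

    Returns {row_key: {"family:qualifier": value}}, mimicking how a
    key-value, column-oriented store keeps all cells of a row under one key.
    """
    table = {}
    for u in users:
        # Assumed row-key format: zero-padded so keys sort in id order.
        row_key = f"user{u['user_id']:08d}"
        table[row_key] = {"info:name": u["name"]}
    for p in posts:
        row_key = f"user{p['user_id']:08d}"
        # Each child row becomes one column in the parent's row,
        # replacing the foreign-key join a relational query would need.
        table[row_key][f"posts:{p['post_id']}"] = p["title"]
    return table


table = denormalize(users, posts)
row = table["user00000001"]  # single lookup by key, no join
```

The trade-off in this sketch is the usual one for de-normalized designs: reads of a user and their posts become a single key lookup, at the cost of wider rows and of deciding up front which access pattern the row layout should serve.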
