跳到主要內容

臺灣博碩士論文加值系統

(18.204.48.69) 您好!臺灣時間:2021/07/27 23:23
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::

詳目顯示

我願授權國圖
: 
twitterline
研究生:林千琇
研究生(外文):Chien-Hsiu Lin
論文名稱:一個以計數分配為基礎來設計用於資料倉儲之動態區域查詢的位元索引方法
論文名稱(外文):A Count-Based Partition Approach to the Design of the Range-Based Bitmap Indexes for Data Warehouses
指導教授:張玉盈
指導教授(外文):Ye-in Chang
學位類別:碩士
校院名稱:國立中山大學
系所名稱:資訊工程學系研究所
學門:工程學門
學類:電資工程學類
論文種類:學術論文
論文出版年:2004
畢業學年度:92
語文別:英文
論文頁數:98
中文關鍵詞:位元索引線上資料分析處理系統壓縮資料倉儲區域查詢
外文關鍵詞:bitmap indexrange querydata warehousecompressOLAP
相關次數:
  • 被引用被引用:0
  • 點閱點閱:116
  • 評分評分:
  • 下載下載:0
  • 收藏至我的研究室書目清單書目收藏:0
資料倉儲(Data Warehouse)提供了整合多個操作型資料庫(Operational Database)一致性正確歷史資料,因此適用於資料的分析應用。線上資料分析處理系統(OLAP)進一步提供了分析用的工具,以便從資料倉儲萃取有用的資訊。在線上支援決策系統中,快速的回應時間是必要的。而在以讀取為主的環境裡,為了達到此目的,位元索引是最常用的方法。當資料屬性有高的基數(cardinality),以區域為基礎的位元索引法(RBI)將資料分成數個部份(range-based),利用位元向量(bitmap vector)去記錄,是較有效率的做法。然而,RBI在每段範圍含的資料個數不平均,導致不同查詢硬碟搜尋次數不一致的問題。Wu等人提出的DBEC演算法中,已進一步考慮資料的分佈情況。但卻不能保證切割結果符合需要位元向量的個數。而且,對有相同值的不同筆紀錄,可能被切割至不同的位元向量,導致硬碟輸入輸出時間增加。因此,我們提出IPDF、CP、CP*三種演算法,這三種演算法考慮了具有高基數的資料屬性及非均勻的資料分佈的情況,用來建立以區域為基礎的位元索引。在IPDF演算法中,我們根據機率密度函數(p.d.f.)切割每個部份,使得每個切割的結果包含的資料量趨於一致。在CP演算法中,我們將資料排序,然後一定數量分成同一部份。CP*演算法為CP演算法之改進,它調整分割點使得有相同值之不同紀錄被分割至同一部份。在另一方面,我們考慮查詢的歷史。根據貪婪演算法,我們提出GreedyExt和GreedyRange演算法。GreedyExt演算法針對精確查詢所設計,而GreedyRange演算法則是針對區域查詢設計。這兩種演算法挑選最常用的查詢建立位元向量,使得回答查詢的平均回應時間降低。除此之外,位元索引包含了一組位元向量,而位元索引大小可能大於硬碟的容量。為了節省儲存空間,我們提出了FZ演算法,將每個位元向量壓縮。而且為了保持二元運算的快速性,也提出了有效的演算法,可直接將位元向量作二元運算,而不需解壓縮。最後,跟據我們的分析,在硬碟存取次數上,CP*演算法優於CP演算法。而跟據我們的模擬結果,IPDF和CP*切割出來的部份,比DBEC更加平均。GreedyExt和GreedyRange這兩個演算法,在大部份的情況,都可以得到更快速的回應時間。在壓縮方面,我們提出的FZ演算法,則可比WAH演算法節省更多的儲存空間。
Data warehouses contain data consolidated from several operational databases and provide the historical, and summarized data which is more appropriate for analysis than detail, individual records. On-Line Analytical Processing (OLAP) provides advanced analysis tools to extract information from data stored in a data warehouse. Fast response time is essential for on-line decision support. A bitmap index could reach this goal in read-mostly environments. When data has high cardinality, we prefer to use the Range-Based Index (RBI), which divides the attributes values into several partitions and a bitmap vector is used to represent a range. With RBI, however, the number of records assigned to different ranges can be highly unbalanced, resulting in different search times of disk accesses for different queries. Wu et al proposed an algorithm for RBI, DBEC, which takes the data distribution into consideration. But the DBEC strategy could not guarantee to get the partition result with the given number of bitmap vectors, PN. Moreover, for different data records with the same value, they may be partitioned into different bitmap vectors which takes long disk I/O time. Therefore, we propose the IPDF, CP, CP* strategies for constructing the dynamic range-based indexes concerning with the case that data has high cardinality and is not uniformly distributed. The IPDF strategy decides each partition according to the Probability Density Function (p.d.f.). The CP strategy sorts the data and partitions them into PN groups for every w continuous records. The CP* strategy is an improved version of the CP strategy by adjusting the cutting points such that data records with the same value will be assigned into the same partition. On the other hand, we could take the history of users'' queries into consideration. Based on the greedy approach, we propose the GreedyExt and GreedyRange strategies. The GreedyExt strategy is used for answering exact queries and the GreedyRange strategy is used for answering range queries. The two strategies decide the set of queries to construct the bitmap vectors such that the average response time of answering queries could be reduced. Moreover, a bitmap index consists of a set of bitmap vectors and the size of the bitmap index could be much larger than the capacity of the disk. We propose the FZ strategy to compress each bitmap vector to reduce the size of the storage space and provide efficient bitwise operations without decompressing these bitmap vectors. Finally, from our performance analysis, the performance of the CP* strategy could be better than the CP strategy in terms of the number of disk accesses. From our simulation, we show that the ranges divided by the IPDF and CP* strategies are more uniform than those divided by the DBEC strategy. The GreedyExt and GreedyRange strategies could provide fast response time in most of situations. Moreover, the FZ strategy could reduce the storage space more than the WAH strategy.
ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 DataWarehouse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Decision Support Systems . . . . . . . . . . . . . . . . . . . . 2
1.1.2 The DimensionalModel . . . . . . . . . . . . . . . . . . . . . 2
1.1.3 OLAP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2 Query Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3 Variant Index Strategies . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3.1 TheMultidimensional Array-Based Strategy . . . . . . . . . . 11
1.3.2 The Hierarchical Indexing Strategy . . . . . . . . . . . . . . . 12
1.3.3 TheMultidimensional Indices Strategy . . . . . . . . . . . . . 12
1.3.4 The Bitmap Index Strategy . . . . . . . . . . . . . . . . . . . 13
1.4 RelatedWork of the Bitmap Index . . . . . . . . . . . . . . . . . . . 13
1.5 Motivations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.6 Organization of the Thesis . . . . . . . . . . . . . . . . . . . . . . . . 24
2. A Survey of Strategies for Bitmap Indexing . . . . . . . . . . . . . 25
2.1 Simple Bitmap Index . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.2 (Binary) Bit-Sliced Index . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.3 Encoded Bitmap Index . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.4 Equality, Range, Interval Encoding . . . . . . . . . . . . . . . . . . . 31
2.5 Range-Based Bitmap Index . . . . . . . . . . . . . . . . . . . . . . . 35
2.6 Compressing Bitmap Index . . . . . . . . . . . . . . . . . . . . . . . . 37
2.6.1 Byte-Aligned Bitmap Codes . . . . . . . . . . . . . . . . . . . 38
2.6.2 Word-Aligned Hybrid Run-Length Code . . . . . . . . . . . . 39
3. Dynamic Range-Based Bitmap Indexing
Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.1 Strategies for Non-uniformData Distribution. . . . . . . . . . . . . . 42
3.1.1 The Integral Probability Density Function-Based Strategy (IPDF) 43
3.1.2 The Count-Based Partition Strategy (CP) . . . . . . . . . . . 46
3.1.3 The Count-Based Partition* Strategy (CP*) . . . . . . . . . . 48
3.2 Strategies for the History of Queries . . . . . . . . . . . . . . . . . . . 51
3.2.1 The GreedyExt Strategy for Exact Queries . . . . . . . . . . . 51
3.2.2 The GreedyRange Strategy for Range Queries . . . . . . . . . 54
3.3 The Filtering-Out-Zeros (FZ) Strategy for compressing the Bitmap
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4. Performance Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.1 The CP and CP* Strategies . . . . . . . . . . . . . . . . . . . . . . . 70
4.2 The Performance of the DBEC, IPDF, and CP* Strategies . . . . . . 76
4.2.1 The Performance Model . . . . . . . . . . . . . . . . . . . . . 76
4.2.2 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . 77
4.3 Performance Analysis of the GreedyExt and GreedyRnage Strategies 82
4.4 The Performance of the WAH and FZ Strategies . . . . . . . . . . . . 84
4.4.1 The PerformanceModel . . . . . . . . . . . . . . . . . . . . . 84
4.4.2 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . 85
5. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
5.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
5.2 FutureWork . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
A. Different Types of Data Distributions . . . . . . . . . . . . . . . . . 94
[1] R. Agrawal, A. Gupta, and S. Sarawagi, “Modeling Multidimensional
Databases,” Proc. of the 13th IEEE Int. Conf. on Data Eng., pp. 232–243, 1997.
[2] G. Antoshenkov, “Byte-Aligned Bitmap Compression,” tech. rep., Oracle Corp.,
1994.
[3] G. Casella and R. L. Berger, Statistical Inference. 511 Forest Lodge Road, Pacific
Grove, CA 93950, USA: the Wadsworth Group, second ed., 2001.
[4] C. Y. Chan and Y. E. Ioannidis, “Bitmap Index Design and Evaluation,” Proc. of
the ACM SIGMOD Int. Conf. on Management of Data, pp. 355–366, 1998.
[5] C. Y. Chan and Y. E. Ioannidis, “An Efficient Bitmap Encoding Scheme for
Selection Queries,” Proc. of the ACM SIGMOD Int. Conf. on Management of
Data, pp. 215–226, 1999.
[6] S. Chaudhuri and U. Dayal, “An Overview of Data Warehouse and OLAP Technology,”
ACM SIGMOD Record, Vol. 26, No. 1, pp. 65–74, 1997.
[7] E. F. Codd, “Providing OLAP (On-Line Analytical Processing) to User-Analysts:
An IT Mandate,” tech. rep., E. F. Codd and Associates, 1993.
[8] C. I. Ezeife, “Selecting and Materializing Horizontally Partitioned Warehouse
Views,” Data and Knowledge Eng., Vol. 36, No. 2, pp. 185–210, Jan. 2001.
[9] J. Gray, A. Bosworth, A. Layman, and H. Pirahesh, “Data Cube: A Relational
Aggregation Operator Generalizing Group-by, Cross-Tabs and Sub-
Totals,” Proc. of the 12th IEEE Int. Conf. on Data Eng., pp. 152–159, 1996.
[10] A. Gupta and I. S. Mumick, “Maintenance of Materialized Views: Problems,
Techniques, and Applications,” IEEE Data Eng. Bulletin, Vol. 18, No. 2, pp. 3–
18, June 1995.
[11] J. Han, “OLAP Mining: An Integration of OLAP with Data Mining,” Proc. of
IFIP Conf. on Data Semantics, pp. 1–11, 1997.
[12] V. Harinarayan, A. Rajaraman, and J. D. Ullman, “Implementing Data Cubes
Efficiently,” Proc. of the ACM SIGMOD Int. Conf. on Management of Data,
pp. 205–216, 1996.
[13] K. J. Hastings, Probability and Statistics. Addison Wesley Longman, Inc.,
first ed., 1997.
[14] B. Inmon, H. William, and C. Kelley, “The 12 Rules of Data Warehouse for A
Client/Server World,” Data Management Review, pp. 6–16, May 1994.
[15] M. Jurgens and H.-J. Lenz, “Tree Based Indexes vs. Bitmap Indexes: A Performance
Study,” Int. Journal of Cooperative Information Systems, Vol. 10, No. 1-2,
pp. 355–376, Mar. 2001.
[16] P. Kalnis, N. Mamoulis, and D. Papadias, “View Selection Using Randomized
Search,” Data and Knowledge Eng., Vol. 42, No. 1, pp. 89–11, July 2002.
[17] M. Levene and G. Loizou, “Why is the Snowflake Schema A Good Data Warehouse
Design?,” Proc. of the 10th Int. Conf. on Tools with Artificial Intelligence,
pp. 225–240, 2003.
[18] T. W. Ling and E. K. Sze, “Materialized View Maintenance Using Version Numbers,”
Proc. of the 6th Int. Conf. on Database Systems for Advanced Applications,
pp. 263–270, 1999.
[19] P. O’Neil, “Model 204 Architecture and Performance,” Springer-Verlag Lecture
Notes in Computer Science 359, Proc. of the 2nd Int. Workshop on High Performance
Transactions Systems, pp. 40–59, 1987.
[20] P. O’Neil and D. Quass, “Improved Query Performance with Variant Indexes,”
Proc. of the ACM SIGMOD Int. Conf. on Management of Data, pp. 38–49, 1997.
[21] T. Palpanas, “Knowledge Discovery in Data Warehouses,” ACM SIGMOD
Record, Vol. 29, No. 3, pp. 88–100, Sept. 2000.
[22] R. Ramakrishnan and J. Gehrke, “Database Management Systems,” tech. rep.,
The McGraw-Hill Companies, 2000.
[23] S. Sarawagi, “Indexing OLAP Data,” IEEE Data Eng. Bulletin, Vol. 202, No. 1,
pp. 36–43, 1997.
[24] A. Shoshani, L. M. Bernardo, H. Nordberg, D. Rotem, and A. Sim, “Multidimensional
Indexing and Query Coordination for Tertiary Storage Management,”
Proc. of the 11th Int. Conf. on Scientific and Statistical Database Management,
pp. 214–225, 1999.
[25] A. Y. Sihem and T. Johnson, “Optimizing Queries on Compressed Bitmap,”
Proc. of the 26th Int. Conf. on Very Large Data Bases, pp. 329–338, 2000.
[26] N. Stefanovic, J. Han, and K. Koperski, “Object-Based Selective Materialization
for Efficient Implementation of Spatial Data Cubes,” IEEE Trans. on Knowledge
and Data Eng., Vol. 12, No. 6, pp. 938–958, Nov./Dec. 2000.
[27] K. Stockinger, “Design and Implementation of Bitmap Indices for Scientific
Data,” Proc. of the Int. Database Eng. Applications Symposium, pp. 47–57, 2001.
[28] K. Stockinger, “Bitmap Indices for Speeding Up High-Dimensional Data Analysis,”
Proc. of the 13th Int. Conf. on Database and Expert Systems Applications,
pp. 881–890, 2002.
[29] K. Stockinger, D. Dullmann, W. Hoschek, and E. Schikuta, “Improving the
Performance of High-Energy Physics Analysis Through Bitmap Indices,” Proc. of
the 11th Int. Conf. on Database and Expert Systems Applications, pp. 835–845,
2000.
[30] D. Theodoratos, “Detecting Redundant Materialized Views in Data Warehouse
Evolution,” Information Systems, Vol. 26, No. 5, pp. 363–381, July 2001.
[31] P. Westerman, Data Warehousing. Morgan Kaufmann, first ed., 2001.
[32] H. K. T. Wong, H.-F. Liz, F. Olken, D. Rotem, and L. Wong, “Bit Tranposed
Files,” Proc. of the 11th Int. Conf. on Very Large Data Bases, pp. 448–457, 1985.
[33] K. Wu, E. J. Otoo, and A. Shoshani, “Compressed Bitmap Indices for Efficient
Query Processing,” tech. rep., LBNL-47807, Lawrence Berkeley National Laboratory,
Berkeley, CA, 1999.
[34] K. Wu, E. J. Otoo, and A. Shoshani, “Compressing Bitmap Indexes for Faster
Search Operations,” Proc. of the 14th Int. Conf. on Scientific and Statistical
Database Management, pp. 625–658, 2002.
[35] K.-L. Wu and P. S. Yu, “Range-Based Bitmap Indexing for High-Cardinality Attributes
with Skew,” Proc. of the 22th Int. Computer Software and Applications
Conf., pp. 61–67, 1998.
[36] M. C. Wu, “Query Optimization for Selections Using Bitmaps,” Proc. of the
ACM SIGMOD Int. Conf. on Management of Data, pp. 227–238, 1999.
[37] M. C. Wu and A. P. Buchmann, “Encoded Bitmap Indexing for Data Warehouses,”
Proc. of the 14th Int. Conf. on Data Eng., pp. 220–230, 1998.
[38] W. S. Yang, Y. D. Chung, and M. H. Kim, “The RD-Tree: a Structure for Processing
Partial-MAX,MIN Queries in OLAP,” Information Sciences, Vol. 146,
No. 1-4, pp. 137–149, Oct. 2002.
QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top
1. 魏麗敏(1996):國小學生學習動機、數學焦慮與數學成就之研究。國立台中師範學院國民教育研究集刊,4,133-155。
2. 譚寧君、涂金堂(2000)。國小數學實驗班與非實驗班學生數學學習成效之比較研究。國立臺北師範學院學報,13,P397-427。
3. 賴賢宗(1999)。共識與共識理論危機的當代論諍。思與言,37(3),119-139。
4. 廖春文(1990)。哈伯瑪斯溝通行動理論及其在教育行政的啟示。彰化師範大學學報,1,59-86。
5. 黃秀文(1996)。從傳統到變通:教學評量的省思。國民教育研究學報,12,1-26。
6. 曹常仁(1994)。親師關係之經營。國教之聲,27(3),8-17 。
7. 陳美玉(1999)。教師專業發展途徑之探討---以教師專業經驗合作反省為例。教育研究資訊,7(2),80-99。
8. 胡夢鯨(1993)。哈伯瑪斯溝通行動理論探微:貢獻與限制。國立中正大學學報,4(1),33-70。
9. 林碧珍(2001a)協助教師實踐學生數學學習歷程檔案之行動研究。新竹師院學報,14,163-213。
10. 林俊瑩、林淑華(2000)。哈伯瑪斯(J.HabeRmas)「溝通行動理論」在親師溝通歷程中的啟示。研習資訊,17(2),70-75。
11. 林明地(1999)。家長參與學校教育的研究與實際:對教育改革的啟示。教育研究資訊,7(2),61-79。
12. 吳元良(1996):不同數學課程、性別、社經地位的國小學生在數學態度及成就上比較之研究。國立屏東師範學院國民教育研究所碩士論文,未出版,屏東縣。
13. 李俊輝(2000)。公共領域在社區成人教育上的意義。社區發展季刊,89,188-200。
14. 何明修(1998)。談哈伯瑪斯的系統與生活世界之區分:回到康德的批判理論。哲學雜誌,26,150-167。
15. 古明峰(2000)。數學評量的現況與發展。國教世紀,189。