跳到主要內容

臺灣博碩士論文加值系統

(3.239.4.127) 您好!臺灣時間:2022/08/20 07:05
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::

詳目顯示

我願授權國圖
: 
twitterline
研究生:吳發榮
研究生(外文):Fa-Jung Wu 
論文名稱:一個資料儲倉中查詢區域和的遞迴相對前置和方法
論文名稱(外文):A Recursive Relative Prefix Sum Approach to Range Queries in Data Warehouses
指導教授:張玉盈
指導教授(外文):Ye-In Chang
學位類別:碩士
校院名稱:國立中山大學
系所名稱:資訊工程學系研究所
學門:工程學門
學類:電資工程學類
論文種類:學術論文
論文出版年:2002
畢業學年度:90
語文別:英文
論文頁數:101
中文關鍵詞:資料方體資料更新資料儲倉加總運算查詢區域和
外文關鍵詞:data warehouseaggregation operationrange sum queryupdatedata cube
相關次數:
  • 被引用被引用:0
  • 點閱點閱:110
  • 評分評分:
  • 下載下載:0
  • 收藏至我的研究室書目清單書目收藏:0

資料儲倉(Data Warehouse)提供了整合多個操作型資料庫(Operational Database)的一致性正確歷史資料,因此適合用於資料的分析應用。線上資料分析處理系統(OLAP)進一步提供了分析用的工具,如加總過後的資料,以便從資料儲倉萃取有用的資訊。區域查詢(Range Query)是我們常在線上資料分析處理系統上,對特定的數值維度區域作加總的運算。其中查詢區域和運算對於探索資料庫中不同屬性之間的關聯性,以及在做趨勢分析上非常的有用。藉由預先計算好的輔助資料,前置和方法(Prefix Sum Method)對於在資料方體上作查詢區域和的回應,只需要常數時間。然而,前置和方法對於資料更新的處理效能卻不甚理想。目前一些需要當前資訊的應用,要求的是迅速的回應時間,以及合理的更新處理時間。然而由於資料方體的大小,是和它的維度數量成指數關係,因此重新建立資料方體的成本是很驚人的,而且不切實際。為了處理動態資料方體上的問題,目前有幾種解決方法。這些方法都是利用特殊的資料結構,以額外的空間儲存相關資料,來對查詢區域和運算做快速的回應。例如雙重相對前置和方法(Double Relative Prefix Sum Method)使用三個特殊的資料結構來儲存輔助的資料。雖然雙重相對前置和方法改善了資料更新處理的時間,然而其代價是增加查詢所需的時間。我們在這篇論文裡,提出了一個新的方法,叫做遞迴相對前置和方法(Recursive Relative Prefix Sum Method);這方法對於資料的查詢與更新處理提供了一個折衷的方式。在遞迴k層的相對前置和方法中,我們使用了一個相對前置陣列,以及k個相對覆蓋值陣列來儲存一些輔助的資料。從效能分析的結果來看,我們所提出的方法在資料更新的處理上,所花的成本小於前置和方法,以及相對前置和方法(Relative Prefix Sum Method);此外,在大多數的情況下,我們所提出的方法在查詢上所花的成本小於雙重相對前置和方法;而相較於動態資料方體方法(Dynamic Data Cub Method),我們所提出的方法所需的額外儲存空間較少,並且有較短的查詢回應時間。因此,在我們所提出的遞迴相對前置和方法上,對於查詢區域和運算我們提供了合理的回應時間,並且大幅度的減少了資料更新處理的時間。同時,對於一些應用領域,資料更新的發生頻率在某些區域會小於其他的區域,我們也提供了一個方法─加權相對前置和方法(Weighted Relative Prefix Sum Method),它是遞迴相對前置和方法的一個延伸。在考慮區域的資料更新發生機率不同的情況下,加權相對前置和方法同樣的對於查詢區域和以及資料更新處理的成本,提供了一個折衷的方案。


Data warehouses contain data consolidated from several operational databases and provide the historical, and summarized data which is more appropriate for analysis than detail, individual records. On-Line Analytical Processing (OLAP) provides advanced analysis tools to extract information from data stored in a Data Warehouse.
OLAP is designed to provide aggregate information that can be used to analyze the contents of databases and data warehouses. A range query applies an aggregation operation over all selected cells of an OLAP data cube where the selection is specified by providing
ranges of values for numeric dimensions. Range sum queries are very useful in finding trends and in discovering relationships between attributes in the database. There is a method, prefix sum method, promises that any range sum query on a data cube can be answered in constant time by precomputing some auxiliary information. However, it is hampered by its update cost. For
today's applications, interactive data analysis applications which provide current or "near current" information will require fast
response time and have reasonable update time. Since the size of a data cube is exponential in the number of its dimensions, rebuilding the entire data cube can be very costly and is not
realistic. To cope with this dynamic data cube problem, several strategies have been proposed. They all use specific data structures, which require extra storage cost, to response range
sum query fast. For example, the double relative prefix sum method makes use of three components: a block prefix array, a relative overlay array and a relative prefix array to store auxiliary
information. Although the double relative prefix sum method improves the update cost, it increases the query time. In the thesis, we present a method, called the recursive relative
prefix sum method, which tries to provide a compromise between query and update cost. In the recursive relative prefix sum method with k levels, we use a relative prefix array and k
relative overlay arrays. From our performance study, we show that the update cost of our method is always less than that of the prefix sum method. In most of cases, the update cost of our method is less than that of the relative prefix sum method. Moreover, in most of cases, the query cost of our method is less than that of
the double relative prefix sum method. Compared with the dynamic data cube method, our method has lower storage cost and shorter query time. Consequently, our recursive relative prefix sum method has a reasonable response time for ad hoc range queries on the data cube, while at the same time, greatly reduces the update cost. In some applications, however, updating in some regions may happen more frequently than others. We also provide a solution, called the weighted relative prefix sum} method, for this situation. Therefore, this method can also provide a compromise between the range sum query cost and the update cost, when the update probabilities of different regions are considered.


Page
ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . .. . . iv
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Data Warehouses . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 OLAP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Range Sum Queries on OLAP . . . . . . . . . . . . . . . . . . . ..9
1.4 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.5 The Organization of the Thesis . . . . . . . . . . . . . . . . . 14
2. A Survey of Strategies for Range Sum Queries . . . . . . . . . . . 15
2.1 The Model . . . . . . . . . . . . . . . . . . . . . . . . . . . ..15
2.2 Pre x Sum . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.3 Relative Pre x Sum . . . . . . . . . . . . . . . . . . . . . .. . 19
2.4 The Double Relative Pre x Sum Approach . . . . . . . . . . . . . 23
2.5 The Dynamic Data Cube . . . . . . . . . . . . . . . . . . . . . . 29
3. The Recursive Relative Pre x Sum Approach . . . . . . . . . . . . 34
3.1 Level-k Relative Overlay Arrays . . . . . . . . . . . . . . . . . 34
3.2 Relative Pre x Arrays . . . . . . . . . . . . . . . . . . . . . . 40
3.3 Queries and Updates . . . . . . . . . . . . . . . . . . . . . . . 42
3.3.1 Range Sum Queries . . . . . . . . . . . . . . . . . . . . . . . 43
3.3.2 Updates . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.4 The Recursive Relative Pre x Sum Method with 3-Levels . . . . . . 46
3.5 A Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4. The Weighted Relative Pre x Sum Approach . . . . . . . . . . . . 52
4.1 The Weighted Relative Pre x Sum Method with 2-Levels . . . . . . 52
4.2 Queries and Updates . . . . . . . . . . . . . . . . . . . . . . . 56
4.2.1 Range Sum Queries . . . . . . . . . . . . . . . . . . . . . . . 56
4.2.2 Updates . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.3 A Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5. Performance Study . . . . . . . . . . . . . . . . . . . . . . . . 64
5.1 Performance Analysis . . . . . . . . . . . . . . . . . . . . . . 64
5.2 Simulation Study . . . . . . . . . . . . . . . . . . . . . . . . 70
5.3 Improving Updates . . . . . . . . . . . . . . . . . . . . . . . . 78
6. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
6.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
6.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
A. Details of Calculation . . . . . . . . . . . . . . . . . . . . . . 94
B. Discussion of Simulation . . . . . . . . . . . . . . . . . . . . . 98


[1] S. Agarwal, R. Agrawal, P. M. Deshpande, A. Gupta, J. F. Naughton, R. Ra- makrishnan, and S. Sarawagi, "On the Computation of Multidimensional Aggre- gates," In Proc. of the 22nd VLDB Conf., pp. 506-521, 1996. [2] R. Agrawal. A. Gupta, and S. Sarawagi, "Modeling Multidimensional Databases," In Proc. of the 13th IEEE Int'l. Conf. on Data Engineering, pp. 232-243, 1997. [3] S. Chaudhuri, and U. Dayal, "An Overview of Data Warehouse and OLAP Tech- nology." ACM SIGMOD Record, Vol. 26, No. 1, pp. 65-74, 1997. [4] E. F. Codd, "Providing OLAP (On-Line Analytical Processing) to User-Analysts: An IT Mandate." Technical Report, E. F. Codd and Associates, 1993. [5] S. Ge ner, D. Agrawal, A. El Abbadi, and T. Smith, "Relative Pre x Sums: An Efficient Approach for Querying Dynamic OLAP Data Cubes," In Proc. of the 15th IEEE Int'l. Conf. on Data Engineering, pp. 328-335, 1999. [6] S. Ge ner, D. Agrawal, and A. El Abbadi, "The Dynamic Data Cube," In Proc. of the 7th Int'l Conf. on Extending Database Technology, pp.237-248, 2000. [7] J. Gray, A. Bosworth, A. Layman, and H. Pirahesh, "Data Cube: A Relational Aggregation Operator Generalizing Group-by, Cross-Tabs and Sub-Totals," In Proc. of the 12th IEEE Int'l Conf. on Data Engineering, pp. 152-159, 1996. [8] H. Gupta, V. Harinarayan, A. Rajaraman, and J. D. Ullman, "Index Selection for OLAP," In Proc. of the 13rd IEEE Int'l. Conf. on Data Engineering, pp. 208-219, 1997. [9] V. Harinarayan, A. Rajaraman, and J. D. Ullman, "Implementing Data Cube Efficiently," In Proc. of the ACM SIGMOD Conf. on the Management of Data, pp. 205-216, 1996. [10] C. Ho, R. Agrawal, N. Megiddo, and R. Srikant, "Range Queries in OLAP Data Cubes," In Proc. of the ACM SIGMOD Conf. on the Management of Data, pp. 73-88, 1997.[11] B. Inmon, H. William, and C. Kelley, "The 12 Rules of Data Warehouse For A Client/Server World," Data Management Review, Vol. 4, pp. 6-16, May, 1994. [12] T. Johnson, and D. Shasha, "Hierarchically Split Cube Forests for Decision Support: Description and Tuned Design," Working paper, 1996. [13] W. Liang, H. Wang, and M. E. Orlowska, "Range Queries in Dynamic OLAP Data Cubes," Data and Knowledge Engineering, Vol. No. 34, pp. 21-38, Mar. 2000.[14] I. S. Mumick, D. Quass, and B. S. Mumick, "Maintenance of Data Cubes and Summary Tables in a Warehouse." In Proc. of the ACM SIGMOD Conf. on the Management of Data, pp. 100-111, 1997. [15] The OLAP Council. MD-API the OLAP Application Program Inerface Version 5.0 Speci cation, September 1996. [16] T. Palpanas, "Knowledge Discovery in Data Warehouses," ACM SIGMOD Record, Vol. 29, No. 3, pp. 88-100, 2000.[17] K. A. Ross, and D. Srivastava, "Fast Computation of Sparse Datacubes," In Proc. of 23rd VLDB Conf., pp. 116-125, 1997. [18] B. Salzberg, and A. Reuter, "Indexing for Aggregation." Working paper, 1996.[19] A. Shukla, P. M. Deshpande, J. F. Naughton, and K. Ramasamy, "Storage Estimation for Multidimensional Aggregates in the Presence of Hierarchies." In Proc. of 22nd VLDB Conf., pp. 522-531, 1996. [20] Y. Zhao, P. M. Deshpande, and J. F. Naughton, "An Array-Based Algorithm for Simultaneous Multidimensional Aggregates," In Proc. of the ACM SIGMOD Conf. on the Management of Data , pp. 159-170, 1997.

QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top
無相關期刊