跳到主要內容

臺灣博碩士論文加值系統

(44.212.99.208) 您好!臺灣時間:2024/04/23 21:29
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::

詳目顯示

我願授權國圖
: 
twitterline
研究生:李圭章
研究生(外文):Kuei-chang Lee
論文名稱:在資料庫系統中Skyline運算之研究
論文名稱(外文):A Study on Skyline Computation in Database Systems
指導教授:陳維美
指導教授(外文):Wei-mei Chen
學位類別:碩士
校院名稱:國立臺灣科技大學
系所名稱:電子工程系
學門:工程學門
學類:電資工程學類
論文種類:學術論文
論文出版年:2008
畢業學年度:96
語文別:中文
論文頁數:45
外文關鍵詞:Skyline algorithmdominatedR-treeBranch-and-Bound Skyline
相關次數:
  • 被引用被引用:0
  • 點閱點閱:522
  • 評分評分:
  • 下載下載:0
  • 收藏至我的研究室書目清單書目收藏:0
處理多重條件間的決策問題時,Skyline 是一非常重要的運算。在維度為 d 的資料集合內,Skyline 中的點具有在所有維度下皆不會被其他點所支配的特性。特別在資料庫的研究領域方面,已經將 Skyline 運算納為基本的語法之一。

本篇論文將研究重點放在 Skyline 演算法本身,希望能改善演算法的內容以求達到更佳的執行效率。在此,我們針對目前 Skyline 演算法中整體效率最佳的 Branch-and-Bound ( BBS ) 演算法進行研究:第一步是改善演算法的內容,利用降低演算法中判斷式的使用來降低整體的比較次數;第二步則是對於原本 BBS 演算法中用來儲存資料的 R-tree 資料結構,提出輔助資料結構來保存有用的資料,使得效率改善更加明顯。由實驗模擬之結果可知,當資料分佈型態為 independent 時在執行效率上能得到明顯提升;而當資料分佈型態為 anti-correlated 時則有些許改善,主要原因為在此種資料分佈中大多數之計算量皆為必須,所能精簡的比例自然有限。未來將以計算量大之資料型態為主,研究出新的改善方法以達到更佳的執行效率。
Skyline is an important computation for handling problems involving multicriteria decision making. With a d-dimensional dataset, the points in the skyline have a common characteristic that contains the points which are not dominated by any other point on all dimensions. Skyline can be widely applied, and even became a fundamental syntax in the research area of database.

In the thesis, we focus on improving the skyline algorithm for better execution efficiency. This thesis is aimed at the Branch-and-Bound Skyline ( BBS ) algorithm which has a good performance in database systems. First, we improve the algorithm by reducing the number of the dominated comparison. Second, we use an auxiliary data structure to keep some promising data and make the algorithm more efficient. By the result in the experiment section, we have that the simulation improves the execution efficiency for independent data, and has a little improvement for anti-correlated data. Because most of the computation are necessary, the reduced ratio for anti-correlated data type is not much consequently. In the future, we hope to study some improved method on the data type with large computation and make the skyline computation more efficient.
摘要 i
Abstract ii
目錄 iii
圖片索引 v
表格索引 vi

第一章. 序論 1

第二章. 相關貢獻 4
2.1 Divide-and-Conquer ( D&C ) 演算法 4
2.2 Block Nested Loop ( BNL ) 演算法 5
2.3 Sort First Skyline ( SFS ) 演算法 6
2.4 Bitmap演算法 7
2.5 Index演算法 7
2.6 Nearest Neighbor ( NN ) 演算法 7
2.6.1 以空間為基底的資料結構:R-tree 8
2.6.2 Nearest Neighbor演算法 10
2.7 各種 Skyline 演算法之綜合討論 11

第三章. BBS 演算法之改進 12
3.1 Branch-and-Bound Skyline ( BBS ) 演算法 12
3.1.1 BBS 演算法內容 12
3.1.2 BBS 演算法範例 15
3.1.3 BBS 演算法之探討 16
3.2 藉由降低 dominated 判斷次數以改善 BBS 演算法之效率 18
3.2.1 改善動機 18
3.2.2 基本概念 18
3.2.3 改善 BBS 演算法中 dominated 判斷次數 22
3.3 藉由輔助資料型態以改善 BBS 演算法之效率 24
3.3.1 基本概念 25
3.3.2 Sieve演算法 25
3.3.3 藉由 Sieve 演算法建立輔助 R-tree 以改善 BBS 演算法 26

第四章. 模擬實驗與數據 30
4.1 改善 BBS 演算法中 dominated 判斷次數之模擬 30
4.2 以輔助資料型態改善 BBS 演算法之模擬 35
4.3 Node Capacity 對於 BBS 演算法的影響 39

第五章. 結論 41

參考文獻 42

圖 1. Skyline 範例 1
圖 2. Divide-and-Conquer 5
圖 3. BNL 範例 6
圖 4. R-tree 範例 9
圖 5. NN 範例 10
圖 6. BBS 中建立 R–tree 範例 14
圖 7. BBS 演算法 15
圖 8. Skyline 中前 20 點的分佈情況 17
圖 9. BBS 演算法中第一點範例 19
圖 10. BBS 的第一種改善法 20
圖 11. 點 S1 所支配的 entries 範例 21
圖 12. dominated 範例 22
圖 13. BBS 改善後的演算法 24
圖 14. Sieve 演算法 26
圖 15. 建立輔助 R-tree 範例 27
圖 16. 建立輔助 R-tree 的演算法 28
圖 17. 建立輔助 R-tree 實例 29
圖 18. Independent,Dimension = 3 之改進圖 31
圖 19. Anti-Correlated,Dimension = 3 ~ 6 35
圖 20. Anti-Correlated,Dimension = 3 ~ 6 39
圖 21. 節點容量與改善效果 39

表 I. BBS 演算法中 Heap 所儲存的內容 16
表 II. 模擬環境相關參數 30
表 III. Independent,Dimension = 4 ~ 6 32
表 IV. Independent,Data Size = 103 ~ 106 33
表 V. Independent,Dimension = 3 ~ 6 36
表 VI. Independent,Data Size = 103 ~ 106 37
[1] R. Andonov, V. Poirriez, and S. Rajopadhye, Unbounded Knapsack Problem: Dynamic Programming Revisited, European Journal of Operational Research, vol. 123, pages 394–407, 2000.

[2] A. Apostolico and C. Guerra, The Longest Common Subsequence Problem Revisited, Algorithmica, vol. 2, pages 315–336, 1987.

[3] N. Beckmann, H. Kriegel, R. Schneider, and B. Seeger, The R*-tree: An Efficient and Robust Access Method for Points and Rectangles, Proc. of the ACM SIGMOD Conference, vol. 19, pages 322–331, 1990.

[4] J. Bentley, K. Clarkson, and D. Levine, Fast Linear Expected-time Algorithms for Computing Maxima and Convex Hulls, Algorithmica, vol. 9, pages 168–183, 1993.

[5] P. Berman, Z. Zhang, Y. I. Wolf, E. V. Koonin, and W. Miller, Winnowing Sequences From a Database Search, Journal of Computational Biology, vol. 7, pages 293–302, 2000.

[6] S. Bespamyatnikh and D. Kirpatrick, Rectilinear 2-center Problems, Proceedings of the 11th Canadian Conference on Computational Geometry (CCCG’99), pages 68–71, 1999.

[7] C. Bohm and H. Kriegel, Determining the Convex Hull in Large Multidimensional databases, Proceedings of the International Conference on Data Warehousing and Knowledge Discovery, pages 294–306, 2001.

[8] S. Borzsonyi, D. Kossmann, and K. Stocker, The Skyline Operator, Proceedings of the IEEE International Conference on Data Engineering, pages 421–430, 2001.

[9] W. M. Chen, H. K. Hwang, and T. H. Tsai, Efficient Maxima-finding Algorithms for Random Planar Samples, Discrete Mathematics and Theoretical Computer Science, vol. 6, page 107–122, 2003.

[10] J. Chomicki, P. Godfrey, J. Gryz, and D. Liang, Skyline With Pre-sorting, Proceedings of the IEEE International Conference on Data Engineering, pages 717–719, 2003.

[11] M. E. Dyer and J. Walker, Dominance in Multi-dimensional Multiple-choice Knapsack Problems, Asia-Pacific Journal of Operational Research, vol. 15, pages 159–168, 1998.

[12] H. Edelsbrunner and M. H. Overmars, On the Equivalence of Some Rectangle Problems, Information Processing Letters, vol. 14, pages 124–127, 1982.

[13] T. Fukuda, Y. Morimoto, S. Morishita, and T. Tokuyama, Interval Finding and Its Application to Data Mining, IEICE Transaction on Fundamentals of Electronics, Communications and Computer Sciences, vol. E80-A, pages 620–626, 1997.

[14] M. S. Gelfand, L. I. Podolsky, T. V. Astakhova, and M. A. Roytberg, Recognition of Genes in Human DNA Sequences, Journal of Computational Biology, vol. 3, pages 223–234, 1996.

[15] A. Guttman, R-trees: A Dynamic Index Structure for Spatial Searching, Proceedings of the ACM Conference on the Management of Data, pages 47–57, 1984.

[16] K. Hakata and H. Imai, Algorithms for the Longest Common Subsequence Problem for Multiple Strings Based on Geometric Maxima, Optimization Methods and Software, vol. 10, pages 233–260, 1998.

[17] A. Henrich, A Distance Scan Algorithm for Spatial Access Structures, Proceeding of the ACM Workshop on Geographic Information Systems, pages 136–143, 1994.

[18] A. Hero and G. Fleury, Pareto-Optimal Methods for Gene Analysis, preprint; available at www.eecs.umich.edu/~hero/bioinfo.html, 2002.

[19] G. Hjaltason and H. Samet, Distance Browsing in Spatial Databases, ACM Trans. Database System, vol. 24, pages 265–318, 1999.

[20] R. E. Johnston and L. R. Khan, A Note on Dominance in Unbounded Knapsack Problems, Asia-Pacific Journal of Operational Research, vol. 12, pages 145–160, 1995.

[21] V. Kathail, S. Aditya, R. Schreiber, B. R. Rau, D. C. Cronquist, and M. Sivaraman, PICO: Automatically Designing Custom Computers, IEEE Computer, vol. 35, pages 39–47, 2002.

[22] D. Kossmann, F. Ramsak, and S. Rost, Shooting Stars in the Sky: An Online Algorithm for Skyline Queries, Proceedings of the Very Large Data Bases Conference, pages 275–286, 2002.

[23] J. Lillis and C. K. Cheng, Timing Optimization for Multisource Nets: Characterization and Optimal Repeater Insertion, IEEE Transations on Computer Aided Design of Integrated Circuits and Systems, vol. 18, pages 322–331, 1999.


[24] J. Matousek, Computing Dominances in En, Information Processing Letters, vol.38, pages 277–278, 1991.

[25] D. Papadias, Y. Tao, G. Fu, and B. Seeger, Progressive Skyline Computation in Database Systems, ACM Transactions on Database Systems, vol. 30, pages 41–82, 2005.

[26] N. Roussopoulos, S. Kelly, and F. Vincent, Nearest Neighbor Queries, Proceeding of the ACM Conference on the Management of Data, pages 71–79, 1995.

[27] T. Sellis, N. Roussopoulos, and C. Faloutsos, The R+-tree: A Dynamic Index for Multidimensional Objects, Proceedings of the Very Large Data Bases Conference, pages 507–518, 1987.

[28] Y. Sun and C. L. Liu, An Area Minimizer for Floorplans With L-shaped Regions, Proceedings of the IEEE ICCD’92, pages 383–386, 1992.

[29] K. Tan, P. Eng, and B. Ooi, Efficient Progressive Skyline Computation, Proceedings of the Very Large Data Bases Conference, pages 301–310, 2001.

[30] D. A. Van Veldhuizen and G. B. Lamont, Multiobjective Evolutionary Algorithms: Analyzing the State-of-the-art, Evolutionary Computation, vol. 8, pages 125–147, 2000.

[31] T. Xia and D. Zhang, Improving the R*-tree With Outlier Handling Techniques, Processing of the 13th Annual ACM International Workship on Geographic Information Systems, pages 125–134, 2005.

[32] E. Zitzler, K. Deb, L. Thiele, C. A. Coello Coello, and D. Corne, Proceedings of the First International Conference on Evolutionary Multi-criterion Optimization, Lecture Notes in Computer Science, vol. 1993, 2001.
QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top