跳到主要內容

臺灣博碩士論文加值系統

(216.73.216.107) 您好!臺灣時間:2025/12/18 19:09
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::

詳目顯示

: 
twitterline
研究生:林靖琨
論文名稱:具限制條件聚合連結之高效率天際線查詢方法
論文名稱(外文):Efficient Processing of Skyline Queries in Aggregate-Join with Constraints
指導教授:林明言
口試委員:蘇宗安彭文志
口試日期:2016-06-14
學位類別:碩士
校院名稱:逢甲大學
系所名稱:資訊工程學系
學門:工程學門
學類:電資工程學類
論文種類:學術論文
論文出版年:2016
畢業學年度:104
語文別:英文
論文頁數:82
中文關鍵詞:天際線查詢連結天際線匯總連結限制條件
外文關鍵詞:Skyline QuerySkyline JoinAggregate-JoinConstraint
相關次數:
  • 被引用被引用:0
  • 點閱點閱:245
  • 評分評分:
  • 下載下載:11
  • 收藏至我的研究室書目清單書目收藏:0
天際線查詢(Skyline Query)是資料庫系統的重要研究主題之一。天際線查詢主要是藉由支配的關係從資料集中回傳使用者有興趣的集合,稱為天際線(Skyline)。過去天際線相關的研究通常假設天際線查詢僅執行於單一的表格上,但資料庫應用中,表格連結(Table Join)的查詢所在多有,因此連結兩個以上的表格的連結天際線查詢(Skyline Join)也非常重要。
由於單一表格天際線查詢在高維度資料中已經相當複雜,因此連結天際線查詢的研究並不多。我們發現,實務應用上,表格連結的天際線查詢,經常會有匯總(Aggregate)的需求,而且更多時候會對匯總屬性有進一步的條件限制(Constraint),以找出符合使用者需求的集合。例如:連結同一地區的飯店表格與餐廳表格以找出最佳組合,通常會在組合的匯總價格有預算限制;找出前往不同城市旅遊的最佳搭車方案,可能在旅行匯總時間有時間限制;找出不同產品綁售最佳組合,可能對產品生產總時間限制等。因此,解決具限制條件之匯總連結天際線查詢更重要。
本論文中,我們明確定義此問題並提出一個有效率的演算法稱SAJC (Skyline in Aggregate-Join with Constraints)加以解決。SAJC設計排序(Sorting)與提前剔除(Early Pruning)的技巧提早減少連結表格內會被支配的資料項目,再進一步以限制型連結(Constrained Join)技巧大量減少連結匯總資料的產生,最後進行支配檢查(Dominance Check)運算最終答案。透過模擬資料與真實資料的廣泛實驗,結果顯示,SAJC演算法與近年較佳的SEPT演算法處理後再挑選符合限制的方法比較,執行效率快9到40倍。與最接近的MSC演算法比較,執行效率平均快2倍。此外,SAJC演算法也具有良好的擴展性。
Skyline query is an important issue in database research. The query uses a dominance relationship to return an interesting set, called skyline, for the user. Previous researches usually assume that the skyline query is applied on a single table only. However, table join is so common in database queries that finding the skyline in joining tables, called skyline join, becomes an essential problem.
The problem of finding the skyline in a single table of high dimension is complicated so that the algorithms for solving skyline joins hardly can be found. In practice, table join often generates a new attribute by aggregation, and a user generally specifies a constraint on the aggregated attribute for the skyline join. For example, finding the skyline of joining a hotel table and a restaurant table on the same location for best hotel-restaurant combinations usually comes with a budget on the total price; business trips travelling across cities would be constrained in the total travelling distance after joining the traffic tables on the same city for finding the best travelling plan; finding best sales for product bundling demands skyline join on the production area with a constraint of total production time. Thus, discovering skylines in aggregate-join with constraints is more important in practice.
In this thesis, we propose an algorithm called SAJC (Skyline in Aggregate-Join with Constraints) to solve the problem. SAJC uses sorting and early-pruning techniques to eliminate data before aggregate-join. SAJC then uses a constrained-join technique to reduce the tuples in the join and computes the answer by the dominance-check technique. Our comprehensive experiments using synthetic datasets and real datasets show that SAJC is 9 to 40 times faster than the SEPT algorithm, 2 times faster than the MSC algorithm in average, and has excellent scalability.
誌謝 i
摘要 ii
Abstract iii
Table of Contents iv
List of Figures vii
List of Tables ix
Chapter 1 Introduction 1
1.1 Background 1
1.2 Motivation 4
1.3 Objective of the Research 4
1.4 Problem Definition 5
1.5 Organization of the Thesis 8
Chapter 2 Related Work 9
2.1 Skyline Query 9
2.1.1 Block Nested Loops (BNL) 9
2.1.2 The Divide and Conquer (DC) Algorithm 10
2.1.3 The Sort-Filter-Skyline (SFS) Algorithm 11
2.1.4 The Branch and Bound Skyline (BBS) Algorithm 12
2.2 Skyline Join 14
2.2.1 The Sort-First-Skyline-Join (SFSJ) Algorithm 14
2.2.2 The Skyline-Sensitive Join (S2J) Algorithm 15
2.2.3 Symmetric Skyline-Sensitive Join (S3J) 18
2.2.4 Skyline-join with Early Pruning and Termination (SEPT) 19
2.2.5 Aggregate Skyline Join Query 21
2.3 Skyline Query with Constraint 25
2.4 Summary 26
Chapter 3 The Proposed Algorithm 28
3.1 Phase One: Sorting 29
3.2 Phase Two: Early Pruning 30
3.3 Phase Three: Constrained Join 32
3.4 Phase Four: Dominance Check 35
3.5 Pseudo Codes of the SAJC Algorithm 36
3.6 Discussion 38
3.6.1 More than One Aggregate Attribute 39
3.6.2 More than One Join Attribute 40
3.6.3 More than Two Tables 41
3.6.4 Other Aggregate Functions 42
3.6.5 Semantics of the Greater-than Constraint 43
Chapter 4 Performance Evaluation 44
4.1 Setup of the Experiments 44
4.2 Evaluations on Small-Scale Datasets 46
4.3 Experiments on Real Datasets 50
4.4 Scale-up Experiments 56
4.4.1 Varying Number of Tuples 57
4.4.2 Varying Constraint Value 59
4.4.3 Varying Number of Skyline Attributes 60
4.4.4 Varying Join Selectivity 62
Chapter 5 Conclusions 65
5.1 Contributions 65
5.2 Future Work 66
References 67
[1]M. Bai, J. Xin, G. Wang, R. Zimmermann, and X. Wang, “Skyline-join query processing in distributed databases,” Frontiers of Computer Science, Volume 10, Issue 2, pp. 330-352, 2016.
[2]I. Bartolini, P. Ciaccia, and M. Patella, “SaLSa: computing the skyline without scanning the whole sky,” Proceedings of the 2006 ACM International Conference on Information and Knowledge Management, pp. 405-414, 2006.
[3]I. Bartolini, P. Ciaccia, and M. Patella, “Efficient Sort-Based Skyline Evaluation,” ACM Transactions on Database Systems, Volume 33, Issue 4, pp. 31-49, 2008.
[4]A. Bhattacharya and B. P. Teja, “Aggregate Skyline Join Queries: Skylines with Aggregate Operations over Multiple Relations,” Proceedings of the 16th International Conference on Management of Data, 2010.
[5]S. Börzsönyi, D. Kossmann, and K. Stocker, “The Skyline Operator,” Proceedings of the 17th International Conference on Data Engineering, pp. 421-430, 2001.
[6]S. Chaudhuri, N. N. Dalvi, and R. Kaushik, “Robust Cardinality and Cost Estimation for Skyline Operator,” Proceedings of the 22nd International Conference on Data Engineering, pp. 64-74, 2006.
[7]J. Chomicki, P. Ciaccia, and N. Meneghetti, “Skyline Queries, Front and Back,” ACM SIGMOD Record, Volume 42, Issue 3, pp. 6-18, 2013.
[8]J. Chomicki, P. Godfrey, J. Gryz, and D. Liang, “Skyline with Presorting,” Proceedings of the 19th International Conference on Data Engineering, pp. 717-719, 2003.
[9]J. Chomicki, P. Godfrey, J. Gryz, and D. Liang, “Skyline with Presorting: Theory and Optimizations,” Proceedings of the Intelligent Information Systems Conference, pp. 595-604, 2005.
[10]M. Endres and W. KieBling, “Semi-Skyline Optimization of Constrained Skyline Queries,” Twenty-Second Australasian Database Conference, pp. 7-16, 2011.
[11]M. Han and K. S. Candan, “Skyline-sensitive joins with LR-pruning,” Proceedings of the 15th International Conference on Extending Database Technology, pp. 252-263, 2012.
[12]X. Han, J. Li, H. Gao, and C. Yang, “SEPT: an efficient skyline join algorithm on massive data,” Knowledge and Information Systems, Volume 43, Issue 2, pp. 355-388, 2015.
[13]J. Huang, J. Chen, Q. Du, and J. Yin, “A load balancing skyline query algorithm in high bandwidth distributed systems,” Proceedings of the 7th International Conference on Fuzzy Systems and Knowledge Discovery, pp. 2076-2080, 2010.
[14]W. Jin, M. Ester, Z. Hu, and J. Han, “The Multi-Relational Skyline Operator,” Proceedings of the 23rd International Conference on Data Engineering, pp. 1276-1280, 2007.
[15]M. E. Khalefa, M. F. Mokbel, and J. J. Levandoski, “PrefJoin: An Efficient Preference-aware Join Operator,” Proceedings of the 27th International Conference on Data Engineering, pp. 995-1006, 2011.
[16]D. Kossmann, F. Ramsak, and S. Rost, “Shooting Stars in the Sky: An Online Algorithm for Skyline Queries,” Proceedings of 28th International Conference on Very Large Data Bases, pp. 275-286, 2002.
[17]K. C. K. Lee, B. Zheng, H. Li, and W. C. Lee, “Approaching the Skyline in Z Order,” Proceedings of the 33rd International Conference on Very Large Data Bases, pp. 279-290, 2007.
[18]M. Magnani and I. Assent, “From stars to galaxies: skyline queries on aggregate data,” Proceedings of the 16th International Conference on Extending Database Technology, pp. 477-488, 2013.
[19]M. L. Mortensen, S. Chester, I. Assent, and M. Magnani, “Efficient caching for constrained skyline queries,” Proceedings of the 18th International Conference on Extending Database Technology, pp. 337-348, 2015.
[20]M. Nagendra and K. S. Candan, “Efficient Processing of Skyline-Join Queries over Multiple Data Sources,” ACM Transactions on Database Systems, Volume 40, Issue 2, pp. 10:1-10:46, 2015.
[21]A. Nagendra, C. Doulkeridis, and N. Polyzotis, “Skyline Query Processing over Joins,” Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 73-84, 2011.
[22]D. Papadias, Y. Tao, G. Fu, and B. Seeger, “An Optimal and Progressive Algorithm for Skyline Queries,” Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, pp. 467-478, 2003.
[23]D. Papadias, Y. Tao, G. Fu, and B. Seeger, “Progressive Skyline Computation in Database Systems,” ACM Transactions on Database Systems, Volume 30, Issue 1, pp. 41-82, 2005.
[24]V. Raghavan and E. A. Rundensteiner, “Progressive Result Generation for Multi-Criteria Decision Support Queries,” Proceedings of the 26th International Conference on Data Engineering, pp. 733-744, 2010.
[25]V. Raghavan, E. A. Rundensteiner, and S. Srivastava, “Skyline and mapping aware join query evaluation,” Information Systems, Volume 36, Issue 6, pp. 917-936, 2011.
[26]H. Shang and M. Kitsuregawa, “Skyline Operator on Anti-correlated Distributions,” Proceedings of the VLDB Endowment, Volume 6, Issue 9, pp. 649-660, 2013.
[27]D. Sun, S. Wu, J. Li, and A. K Papadias. H. Tung, “Skyline-join in distributed databases,” Proceedings of the 24th International Conference on Data Engineering Workshops, pp. 176-181, 2008.
[28]K. L. Tan, P. K. Eng, and B. C. Ooi, “Efficient Progressive Skyline Computation,” Proceedings of the 27th International Conference on Very Large Data Bases, pp. 301-310, 2001.
[29]Y. Tao, X. Xiao, and J. Pei, “SUBSKY: Efficient Computation of Skylines in Subspaces,” Proceedings of the 22nd International Conference on Data Engineering, p65, 2006.
[30]G. Trimponias, I. Bartolini, D. Papadias, and Y. Yang, “Skyline Processing on Distributed Vertical Decompositions,” IEEE Transactions on Knowledge and Data Engineering, Volume 25, Number 4, pp. 850-862, 2013.
[31]W. Vlachou, M. D. Morse, J. M. Patel, M. Ester, and Z. Hu, “Evaluating skylines in the presence of equijoins,” Proceedings of the 26th International Conference on Data Engineering, pp. 249-260, 2010.
[32]M. Zhang and R. Alhajj, “Skyline queries with constraints: Integrating skyline and traditional query operators,” Data and Knowledge Engineering, Volume 69, Issue 1, pp. 153-168, 2010.
連結至畢業學校之論文網頁點我開啟連結
註: 此連結為研究生畢業學校所提供,不一定有電子全文可供下載,若連結有誤,請點選上方之〝勘誤回報〞功能,我們會盡快修正,謝謝!
QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top