跳到主要內容

臺灣博碩士論文加值系統

(44.200.171.156) 您好!臺灣時間:2023/03/27 10:18
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::

詳目顯示

: 
twitterline
研究生:Heri Wijayanto
研究生(外文):HERI WIJAYANTO
論文名稱:Distributed Processing of Skyline Queries and the Applications for Upgrading Products to Maximize Profits
論文名稱(外文):Distributed Processing of Skyline Queries and the Applications for Upgrading Products to Maximize Profits
指導教授:陳良弼陳良弼引用關係
指導教授(外文):CHEN, ARBEE L. P.
口試委員:顏嗣鈞曾新穆柯佳伶薛榮銀陳良弼
口試委員(外文):YEN, HSU-CHUNTSENG, VINCENT SHI-MUKOH, JIA-LINGSHAE, ZON-YINCHEN, ARBEE L. P
口試日期:2021-06-29
學位類別:博士
校院名稱:亞洲大學
系所名稱:資訊工程學系
學門:工程學門
學類:電資工程學類
論文種類:學術論文
論文出版年:2021
畢業學年度:109
語文別:英文
論文頁數:73
中文關鍵詞:Skyline QueryMapReduceDistributed ProcessingDominating RelationshipUpgrading Product Recommendations
外文關鍵詞:Skyline QueryMapReduceDistributed ProcessingDominating RelationshipUpgrading Product Recommendations
相關次數:
  • 被引用被引用:0
  • 點閱點閱:74
  • 評分評分:
  • 下載下載:1
  • 收藏至我的研究室書目清單書目收藏:0
A skyline query determines the data points in a dataset that are not dominated by others. It is widely used for many applications which require multi-criteria decision-making. However, skyline query processing is considerably time-consuming for a high-dimensional large-scale dataset. This study consists of two main tasks. The first is to design an efficient parallel computing technique to address the computational problem of skyline queries for large high-dimensional datasets. It is based on MapReduce frameworks to process large datasets. The second study focuses on the recommendation of upgrading products based on the skyline data points.
A large number of efficient MapReduce skyline algorithms have been proposed in the literature. However, there are still opportunities for further parallelism. Our method to divide a large dataset is called by LShape partitioning strategy and we propose an effective filtering algorithm named propagation filtering. We verify that our algorithms outperformed the state-of-the-art approaches by extensive experiments, especially for high-dimensional large-scale datasets.
The manufacturer often needs to make a proper decision to gain maximal profits from the product upgrading. The goal of upgrading products is to maximize the profit by increasing the total number of expected customers with a certain upgrading cost. Our algorithms in this study are based on the dominating relationships among products. It has been proved that finding the dominating relationship of products with three or more features is NP-hard. We first propose an optimal algorithm for upgrading a single product and then modify it to be an efficient heuristic algorithm with a percentage error approaching 20%. We also extend this heuristic algorithm for simultaneously upgrading multiple products.

A skyline query determines the data points in a dataset that are not dominated by others. It is widely used for many applications which require multi-criteria decision-making. However, skyline query processing is considerably time-consuming for a high-dimensional large-scale dataset. This study consists of two main tasks. The first is to design an efficient parallel computing technique to address the computational problem of skyline queries for large high-dimensional datasets. It is based on MapReduce frameworks to process large datasets. The second study focuses on the recommendation of upgrading products based on the skyline data points.
A large number of efficient MapReduce skyline algorithms have been proposed in the literature. However, there are still opportunities for further parallelism. Our method to divide a large dataset is called by LShape partitioning strategy and we propose an effective filtering algorithm named propagation filtering. We verify that our algorithms outperformed the state-of-the-art approaches by extensive experiments, especially for high-dimensional large-scale datasets.
The manufacturer often needs to make a proper decision to gain maximal profits from the product upgrading. The goal of upgrading products is to maximize the profit by increasing the total number of expected customers with a certain upgrading cost. Our algorithms in this study are based on the dominating relationships among products. It has been proved that finding the dominating relationship of products with three or more features is NP-hard. We first propose an optimal algorithm for upgrading a single product and then modify it to be an efficient heuristic algorithm with a percentage error approaching 20%. We also extend this heuristic algorithm for simultaneously upgrading multiple products.

Contents
Abstract i
Contents ii
List of Figures v
List of Tables vii
Chapter 1 Introduction 1
1.1. Skyline Query of Large Dataset 1
1.2. Recommendation of Upgrading Products 3
1.3. Conceptual Diagram for the research issues 5
1.4. Research Purposes 6
1.5. Organization of Dissertation 6
Chapter 2 Related Works 7
2.1 MapReduce Algorithms for Large Dataset Skyline Processing 7
2.2 Upgrading Products Based on Skyline Data 10
Chapter 3 Large Dataset Skyline Processing 12
3.1 Preliminaries 12
3.2 The Two-Phases LShape Framework 14
3.3 The One-Phase LShape Framework 16
3.4 Propagation Filtering Algorithm 17
3.5 The Two-Phases LShape Framework Algorithms 20
3.6 The One-Phase LShape Algorithms 22
3.7 Experimental Results 24
3.7.1 Varying Data Dimensions 25
3.7.2 Varying Data Sizes 26
3.7.3 Varying Number of Computer Nodes 28
3.7.4 Varying Cell Sizes 30
3.7.5 Two-Phases and One-Phase LShape Algorithms 30
3.7.6 Real-World Dataset 31
3.7.7 Effectiveness of Sampling 32
Chapter 4 Calculating the Number of Expected Customers 34
4.1 Preliminaries 34
4.2 Dominating Graph of Intersections (DGI) and its complexity 36
4.3 Properties of DGI 38
4.4 Naïve Approach to Construct DGI 39
4.5 Top-Down Recursive Depth First Search Algorithm (TDRDFS) 40
4.6 Calculating TEC(pi) based on DGI 49
4.7 Experimental Results 49
Chapter 5 Recommendation of Upgrading Products based on DGI 54
5.1 Preliminaries 54
5.1.1 Computing the Profit of Product Upgrading 54
5.1.2 Target Point Surface (TPS) 54
5.2 Grid-Based Find and Refine (GBFR) algorithm 55
5.2.1 Upper Bound and Lower Bound of GBFR 57
5.2.2 Best First Search Processing 58
5.2.3 Threshold 59
5.3 Upgrading Single Product 59
5.3.1 The O-GBFR-S 59
5.3.2 The G-GBFR-S 61
5.4 Greedy GBFR for Multiple Products Upgrading (G-GBFR-M) 61
5.5 Experimental Results 62
5.5.1 The Profits of Points on the TPS 63
5.5.2 The single product upgrading 63
5.5.2.1 The comparison of O-GBFR-S and Naïve Approach 63
5.5.2.2 Execution Time of O-GBFR-S and G-GBFR-S 64
5.5.2.3 Percentage Error of G-GBFR-S 66
5.5.3 Execution Time of G-GBFR-M 66
Chapter 6 Conclusion and Future Research 68
6.1 Conclusion 68
6.2 Future Research Direction 68
References 69
Acknowledgment 73

List of Figures
Fig.1.1. Skyline example 3
Fig. 1.2. Dominating regions, expected customers, and product upgrading 5
Fig. 1.3. Conceptual diagram for the research issues 5
Fig. 3.1. The LShape partitions 14
Fig. 3.2. The first phase of the two-phases MapReduce LShape framework 14
Fig. 3.3. The second phase of the two-phases MapReduce LShape framework 15
Fig. 3.4. The one-phase MapReduce LShape framework 16
Fig. 3.5. Propagation filtering in a 2-dimensional dataset 17
Fig. 3.6. Example of intersection cell in LShape partitions 22
Fig. 3.7. Comparison by number of dimensions 25
Fig. 3.8. Comparison by data sizes for anti-correlated data 27
Fig. 3.9. Comparison by data sizes for independent data 27
Fig. 3.10. Comparisons by the number of computer nodes for anti-correlated data 29
Fig. 3.11. Comparisons by the number of computer nodes for independent data 29
Fig. 3.12. Cell sizes experiment 31
Fig. 3.13. one-phase vs. two-phases LShape algorithms 31
Fig. 3.14. Real-world dataset 32
Fig. 3.15. Sampling effect of three data distributions 33
Fig. 4.1. (a) 2D dominating regions; (b) 2D DGI representation 34
Fig. 4.2. (a) 3D dominating regions. (b) 3D DGI representation 37
Fig. 4.3. A node of DGI data structure 42
Fig. 4.4. Illustration of TDRDFS of DGI algorithm 45
Fig. 4.5. The Naïve vs TDRDFS construction time 51
Fig. 4.6. DGI construction time compared to the number of dimensions 51
Fig. 4.7. The number of intersection points vs. the number of skyline points 52
Fig. 4.8. The number of intersection points vs. the number of dimensions 52
Fig. 5.1 The two-dimensional TPS 55
Fig. 5.2. GBFR visualization in 3-dimensional dataset 57
Fig. 5.3. Cell properties 58
Fig.5.4. Profit of points on the Target Point Surface 63
Fig. 5.5 The O-GBFR-S vs Naïve approach 64
Fig. 5.6 The execution time of O-GBFR-S in a varied number of the product 64
Fig. 5.7 The execution time of O-GBFR-S vs. G-GBFR-S 65
Fig. 5.8. The execution time of O-GBFR-S vs. G-GBFR-S in varied depth 66
Fig. 5.9 The percentage error of G-GBFR-S for varied depth (threshold) 66
Fig. 5.10. Execution time of G-GBFR-M with varying number products 67
Fig. 5.11. Execution time of G-GBFR-M with varying number of competitor 67

List of Tables
Table 5.1 Threshold of GBFR 58


[1]J. Chomicki, P. Godfrey, J. Gryz, and D. Liang, “Skyline with Presorting,” in ICDE, 2003.
[2]S. Borzsony, D. Kossmann, and K. Stocker, “The Skyline operator,” Proc. 17th Int. Conf. Data Eng., pp. 421–430, 2001.
[3]G. Wang, J. Xin, L. Chen, and Y. Liu, “Energy-efficient reverse skyline query processing over wireless sensor networks,” IEEE Trans. Knowl. Data Eng., vol. 24, no. 7, pp. 1259–1275, 2012.
[4]C. Y. Lin, J. L. Koh, and A. L. P. Chen, “Determining k-most demanding products with maximum expected number of total customers,” IEEE Trans. Knowl. Data Eng., vol. 25, no. 8, pp. 1732–1747, 2013.
[5]G. Xiao, K. Li, and K. Li, “Reporting L Most Favorite Objects in Uncertain Databases with Probabilistic Reverse Top-k Queries,” Proc. - 15th IEEE Int. Conf. Data Min. Work. ICDMW 2015, pp. 1592–1599, 2016.
[6]J.-L. Koh, C.-Y. Lin, and A. L. P. Chen, “Finding k most favorite products based on reverse top-t queries,” VLDB J., vol. 23, no. 4, pp. 541–564, 2014.
[7]B. C. Tan, K.-L.; Eng, P.-K. Eng; Ooi, “Efficient progressive skyline computation,” in VLDB, 2001.
[8]D. Kossmann, F. Ramsak, and S. Rost, “Shooting Stars in the Sky: An Online Algorithm for Skyline Queries,” {VLDB} 2002, Proc. 28th Int. Conf. Very Large Data Bases, August 20-23, 2002, Hong Kong, China, pp. 275–286, 2002.
[9]D. Papadias, Y. Tao, G. Fu, and S. Bernhard, “An Optimal and Progressive Algorithm for Skyline Queries,” in ACM SIGMOD, 2003.
[10]D. Papadias, G. Fu, and B. Seeger, “An Optimal and Progresive Algorithm for Skyline Queries,” in ACM SIGMOID, 2003.
[11]D. Sacharidis, P. Bouros, and T. Sellis, “Caching dynamic skyline queries,” Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol. 5069 LNCS, pp. 455–472, 2008.
[12]D. Papadias, Y. Tao, G. Fu, and B. Seeger, “Progressive skyline computation in database systems,” ACM Trans. Database Syst., vol. 30, no. 1, pp. 41–82, 2005.
[13]W. C. Wang, E. T. Wang, and A. L. P. Chen, “Dynamic skylines considering range queries,” Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol. 6588 LNCS, no. PART 2, pp. 235–250, 2011.
[14]E. Dellis and B. Seeger, “Efficient Computation of Reverse Skyline Queries,” VLDB 07 Proc. 33rd Int. Conf. Very large data bases, pp. 291–302, 2007.
[15]L. Xiang and L. Chen, “Monochromatic and bichromatic reverse skyline search over uncertain databases,” in Proceedings of the 2008 ACM SIGMOD international conference on Management of data, 2008, pp. 213–226.
[16]P. M. Deshpande and P. Deepak, “Efficient reverse skyline retrieval with arbitrary non-metric similarity measures,” Proc. 14th Int. Conf. Extending Database Technol. - EDBT/ICDT ’11, p. 319, 2011.
[17]J. Pei, B. Jiang, X. Lin, and Y. Yuan, “Probabilistic Skylines on Uncertain Data,” 33rd Int. Conf. Very Large Data Bases, pp. 15–26, 2007.
[18]Y. Tao and D. Papadias, “Maintaining Sliding Window Skylines on Data Streams,” IEEE Trans. Knowl. Data Eng., vol. 18, no. 3, pp. 377–391, 2006.
[19]H. Z. Su, E. T. Wang, and A. L. P. Chen, “Continuous Probabilistic Skyline Queries over Uncertain Data Streams,” pp. 105–121, 2010.
[20]M. Parsian, Data Algorithms Recipes for Scaling up with Hadoop and Spark. Boston; Farnhan; Sebastapol; Tokyo: O’Reilly, 2015.
[21]B. Zhang, S. Zhou, and J. Guan, “Adapting Skyline Computation to the MapReduce Framework: Algorithms and Experiments,” Lect. Notes Comput. Sci., vol. 6637, no. 60873040, pp. 403–414, 2011.
[22]L. Chen, K. Hwang, and J. Wu, “MapReduce skyline query processing with a new angular partitioning approach,” Proc. 2012 IEEE 26th Int. Parallel Distrib. Process. Symp. Work. IPDPSW 2012, pp. 2262–2270, 2012.
[23]K. Mullesgaard, J. L. Pedersen, H. Lu, and Y. Zhou, “Efficient Skyline Computation in MapReduce,” Proc. 17th Int. Conf. Extending Database Technol., no. c, pp. 37–48, 2014.
[24]J. Zhang, X. Jiang, W. S. Ku, and X. Qin, “Efficient parallel skyline evaluation using MapReduce,” IEEE Trans. Parallel Distrib. Syst., vol. 27, no. 7, pp. 1996–2009, 2016.
[25]Y. Park, J.-K. Min, and K. Shim, “Parallel computation of skyline and reverse skyline queries using mapreduce,” Proc. VLDB Endow., vol. 6, no. 14, pp. 2002–2013, 2013.
[26]Y. Park, J. K. Min, and K. Shim, “Efficient Processing of Skyline Queries Using MapReduce,” IEEE Trans. Knowl. Data Eng., vol. 29, no. 5, pp. 1031–1044, 2017.
[27]J. L. Koh, C. C. Chen, C. Y. Chan, and A. L. P. Chen, “MapReduce skyline query processing with partitioning and distributed dominance tests,” Inf. Sci. (Ny)., vol. 375, pp. 114–137, 2017.
[28]M. Tang, Y. Yu, W. G. Aref, Q. M. Malluhi, and M. Ouzzani, “Efficient Parallel Skyline Query Processing for High-Dimensional Data,” IEEE Trans. Knowl. Data Eng., vol. 30, no. 10, pp. 1838–1851, 2018.
[29]H. Wijayanto, W. Wang, W.-S. Ku, and A. Chen, “LShape Partitioning: Parallel Skyline Query Processing using MapReduce,” IEEE Trans. Knowl. Data Eng., 2020.
[30]X. Lin, Y. Yuan, Q. Zhang, and Y. Zhang, “Selecting stars: The k most representative skyline operator,” Proc. - Int. Conf. Data Eng., pp. 86–95, 2007.
[31]X. Lin, Q. Liu, Y. Yuan, X. Zhou, and H. Lu, “Summarizing level-two topological relations in large spatial datasets,” ACM Trans. Database Syst., vol. 31, no. 2, pp. 584–630, 2006.
[32]M. L. Yiu and N. Mamoulis, “Efficient processing of top-k dominating queries on multi-dimensional data,” VLDB ’07 Proc. 33rd Int. Conf. Very large data bases, pp. 483–494, 2007.
[33]M. L. Yiu and N. Mamoulis, “Multi-dimensional top- k dominating queries,” VLDB J., vol. 18, pp. 695–718, 2009.
[34]C. Li, B. C. Ooi, A. K. H. Tung, and S. Wang, “DADA : A Data Cube for Dominant Relationship Analysis,” in SIGMOD’06, June 26–29, 2006.
[35]J. Gray, A. Bosworth, A. Layman, and H. Pirahesh, “Data Cube: {A} Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total,” in Proceedings of the Twelfth International Conference on Data Engineering, February 26 - March 1, 1996, New Orleans, Louisiana, {USA}, 1996, pp. 152–159.
[36]L. Zou and L. Chen, “Pareto-Based Dominant Graph : An Efficient Indexing Structure to Answer Top-K Queries,” IEEE Trans. Knowl. Data Eng., vol. 23, no. 5, pp. 727–741, 2011.
[37]B. J. Santoso, G. Chiu, and I. C. Society, “Close Dominance Graph : An Efficient Framework for Answering Continuous Top- k Dominating Queries,” IEEE Trans. Knowl. Data Eng., vol. 26, no. 8, pp. 1853–1865, 2014.
[38]H. Wijayanto, S. Thamrin, and A. L. P. Chen, “Upgrading Products based on Existing Dominant Competitors,” in Proceedings of the 54th Hawaii International Conference on System Sciences, pp. 1738–1747.
[39]H. Lu and C. S. Jensen, “Upgrading uncompetitive products economically,” Proc. - Int. Conf. Data Eng., pp. 977–988, 2012.
[40]S. Ge, L. Hou U, N. Mamoulis, and D. W. L. Cheung, “Dominance Relationship Analysis with Budget Constraints,” Knowl. Inf. Syst., 2013.
[41]B. Yin, K. Gu, X. Wei, S. Zhou, and Y. Liu, “A cost-efficient framework for finding prospective customers based on reverse skyline queries,” Knowledge-Based Syst., vol. 152, pp. 117–135, 2018.
[42]M. S. Islam and C. Liu, “Know your customer: computing k-most promising products for targeted marketing,” VLDB J., vol. 25, no. 4, pp. 545–570, 2016.
[43]X. Zhou, K. Li, Z. Yang, and K. Li, “Finding Optimal Skyline Product Combinations under Price Promotion,” IEEE Trans. Knowl. Data Eng., vol. 4347, no. c, pp. 1–14, 2019.
[44]P. Baldi, K. Cranmer, T. Faucett, P. Sadowski, and D. Whiteson, “Parameterized neural networks for high-energy physics,” Eur. Phys. J. C, vol. 76, no. 5, Apr. 2016.


QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top