(3.236.122.9) 您好!臺灣時間:2021/05/14 06:20
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果

詳目顯示:::

我願授權國圖
: 
twitterline
研究生:林與絜
研究生(外文):Yu-Chieh Lin
論文名稱:於機率性資料庫中選擇具影響力物件之技術
論文名稱(外文):Techniques for Selecting Influential Objects in Probabilistic Databases
指導教授:陳銘憲陳銘憲引用關係
指導教授(外文):Ming-Syan Chen
口試委員:陳良弼李官陵李旺謙曾新穆楊得年葉彌妍
口試委員(外文):Arbee L.P. ChenGuanling LeeWang-Chien LeeVincent S. TsengDe-Nian YangMi-Yen Yeh
口試日期:2013-07-04
學位類別:博士
校院名稱:國立臺灣大學
系所名稱:電機工程學研究所
學門:工程學門
學類:電資工程學類
論文種類:學術論文
論文出版年:2013
畢業學年度:101
語文別:英文
論文頁數:108
中文關鍵詞:機率性資料庫群集分析最鄰近k 點搜索社群網路
外文關鍵詞:probabilistic databaseclusteringnearest-neighbor querysocial networks
相關次數:
  • 被引用被引用:0
  • 點閱點閱:111
  • 評分評分:系統版面圖檔系統版面圖檔系統版面圖檔系統版面圖檔系統版面圖檔
  • 下載下載:0
  • 收藏至我的研究室書目清單書目收藏:0
In this dissertation, we study how to select influential objects in probabilistic databases. For an uncertain dataset with probabilistic attribute values, we would like to tell which objects can best improve query or mining results if we can acquire their exact attribute values. The problem is explored on both clustering and the Probabilistic k-Nearest-Neighbor (k-PNN) query. We carefully define the metrics for evaluating the quality of the results of clustering and k-PNN query, and then we design algorithms to find the solutions according to the metrics correspondingly. For the k-PNN query, we provide optimal solutions of acquisition for nearest-neighbor query (1-PNN), and we propose a scalable algorithm solving the acquisition for k-PNN query with k > 1. Besides, for a social network dataset with edge probabilities, we would like to tell which neighboring nodes of the query node can best help gather specific information if these nodes are asked. We carefully formulate the problem according to the motivated scenario, and the proposed approach considers both the strength and the diversity of a node’s influence. We conduct experiments on various datasets, and the experimental results demonstrate the effectiveness and the efficiency of the proposed approaches.

1 Introduction 5
1.1 Motivation 5
1.2 Overview 6
1.2.1 Data Acquisition for Uncertain Clustering 6
1.2.2 Data Acquisition for Probabilistic Nearest-Neighbor Query 7
1.2.3 Guide Query in Sociel Networks 7
1.3 Organization 8
2 Data Acquisition for Uncertain Clustering 9
2.1 Introduction 9
2.2 Related Works 11
2.3 Problem Description 12
2.4 Selective Exact Value Acquisition for Clustering 14
2.4.1 Metric for Selection 15
2.4.2 Localized Metric for Selection 16
2.5 Experiments 17
2.5.1 The FDBSCAN Algorithm 17
2.5.2 Quality Evaluation 18
2.5.3 Experimental Setup and Results 19
2.6 Summary 22
3 Data Acquisition for Probabilistic Nearest-Neighbor Query 23
3.1 Introduction 23
3.2 Related Work 26
3.3 Problem Description 27
3.3.1 Problem Definition 28
3.3.2 Candidate Identification 30
3.4 Single Object Acquisition for 1-PNN 33
3.4.1 Problem Analysis 34
3.4.2 Representative New Answer of di,j ≤ d(1)max 37
3.4.3 Representative New Answer of di,j > d(1)max 40
3.4.4 Algorithm Design 42
3.5 Multiple Object Acquisition for 1-PNN 45
3.5.1 s-acquisition for 1-PNN 45
3.5.2 Set Quality Calculation 48
3.5.3 Set Quality Estimation 55
3.6 Acquisition for k-PNN 61
3.6.1 Major Answers and 1-Acquisition 62
3.6.2 Approach for s-Acquisition 65
3.6.3 Discussion 73
3.7 Experiments 75
3.7.1 Experimental Setup 76
3.7.2 Quality of Data Acquisition 76
3.7.3 Efficiency of Data Acquisition 78
3.7.4 Supplementary Experimental Results 81
3.8 Summary 85
4 Guide Query in Social Networks 86
4.1 Introduction 86
4.2 Proposed Framework 88
4.2.1 Problem Definition 88
4.2.2 The Basic Framework 89
4.2.3 The Advanced Framework 93
4.3 Experimental Evaluation 95
4.3.1 Implementation 95
4.3.2 Experimental Results 95
4.4 Related Work 99
4.5 Summary 100
5 Conclusion 101

[1] S. Abiteboul, P. C. Kanellakis, and G. Grahne, ”On the Representation and Querying of Sets of Possible Worlds,” Proc. ACM SIGMOD Int’l Conf. on Management of Data (SIGMOD), 1987.
[2] P. K. Agarwal, A. Efrat, S. Sankararaman, and W. Zhang, ”Nearest-Neighbor Searching under Uncertainty,” Proc. 31st ACM Symposium on Principles of Database Systems (PODS), 2012.
[3] C. C. Aggarwal, J. Han, J. Wang and P. S. Yu, ”A Framework for Clustering Evolving Data Streams,” Proc. 29th Int’l Conf. on Very Large Data Bases (VLDB), 2003.
[4] C. C. Aggarwal and P. S. Yu, ”A Framework for Clustering Uncertain Data Streams,” Proc. 24th IEEE Int’l Conf. on Data Engineering (ICDE), 2008.
[5] C.C. Aggarwal and P.S. Yu, ”Outlier Detection with Uncertain Data,” Proc. 8th SIAM Int’l Conf. on Data Mining (SDM), 2008.
[6] C. C. Aggarwal and P. S. Yu, ”A Survey of Uncertain Data Algorithms and Applications,” IEEE Trans. on Knowledge and Data Engineering (TKDE), vol. 21, no. 5, pp. 609-623, May 2009.
[7] R. Akbarinia, P. Valduriez, and G. Verger, ”Efficient Evaluation of SUM Queries over Probabilistic Data,” IEEE Trans. on Knowledge and Data Engineering (TKDE), vol.25, no. 4, pp.764-775, Apr. 2013.
[8] M. Ankerst, M. M. Breunig, H.-P. Kriegel and J. Sander, ”OPTICS: Ordering Points to Identify the Clustering Structure,” Proc. ACM SIGMOD Int’l Conf. on Management of Data (SIGMOD), 1999.
[9] M. Azizyan, I. Constandache, and R. Roy Choudhury, ”SurroundSense: Mobile Phone Localization via Ambience Fingerprinting,” Proc. 15th Annual Int’l Conf. on Mobile Computing and Networking (MobiCom), 2009.
[10] P. Bahl and V. Padmanabhan, ”RADAR: An In-Building RF-based User Location and Tracking System,” Proc. 19th IEEE Int’l Conf. on Computer Communications (INFOCOM), 2000.
[11] G. Beskales, M. A. Soliman, and I. F. Ilyas, ”Efficient Search for the Top-k Probable Nearest Neighbors in Uncertain Databases,” Proc. 34th Int’l Conf. on Very Large Data Bases (VLDB), 2008.
[12] J. Bi and T. Zhang, ”Support Vector Machines with Input Data Uncertainty,” Proc. Advances in Neural Information Processing Systems (NIPS), 2004.
[13] C. Bohm, F. Fiedler, A. Oswald, C. Plant, and B. Wackersreuther, ”Probabilistic Skyline Queries,” Proc. 18th ACM Conf. on Information and Knowledge Management (CIKM), 2009.
[14] K. Chang and S. Hwang, ”Minimal Probing: Supporting Expensive Predicates for Top-k Queries,” Proc. ACM SIGMOD Int’l Conf. on Management of Data (SIGMOD), 2002.
[15] J. Chen and R. Cheng, ”Quality-Aware Probing of Uncertain Data with Resource Constraints,” Proc. 20th Int’l Conf. on Scientific and Statistical Database Management (SSDBM), 2008.
[16] J. Chen, R. Cheng, M. F. Mokbel, and C.-Y. Chow, ”Scalable Processing of Snapshot and Continuous Nearest-Neighbor Queries over One-Dimensional Uncertain Data,” Proc. 35th Int’l Conf. on Very Large Data Bases (VLDB), 2009.
[17] W. Chen, C. Wang, and Y. Wang, ”Scalable Influence Maximization for Prevalent Viral Marketing in Large-Scale Social Networks,” Proc. 16th ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining (KDD), 2010.
[18] R. Cheng, L. Chen, J. Chen, and X. Xie, ”Evaluating Probability Threshold k-Nearest-Neighbor Queries over Uncertain Data,” Proc. 12th Int’l Conf. on Extending Database Technology (EDBT), 2009.
[19] R. Cheng, J. Chen, and X. Xie, ”Cleaning Uncertain Data with Quality Guarantees,” Proc. 34th Int’l Conf. on Very Large Data Bases (VLDB), 2008.
[20] R. Cheng, D.V. Kalashnikov, and S. Prabhakar, ”Evaluating Probabilistic Queries over Imprecise Data,” Proc. ACM SIGMOD Int’l Conf. on Management of Data (SIGMOD), 2003.
[21] R. Cheng, E. Lo, X. S. Yang, M.-H. Luk, X. Li, and X. Xie, ”Explore or Exploit? Effective Strategies for Disambiguating Large Databases,” Proc. 36th Int’l Conf. on Very Large Data Bases (VLDB), 2010.
[22] R. Cheng, D. V. Kalashnikov, and S. Prabhakar, ”Querying Imprecise Data in Moving Object Environments,” IEEE Trans. on Knowledge and Data Engineering (TKDE), vol. 16, no. 9, pp. 1112-1127, Sep. 2004.
[23] R. Cheng, Y. Xia, S. Prabhakar, R. Shah, and J. S. Vitter, ”Efficient Indexing Methods for Probabilistic Threshold Queries over Uncertain Data,” Proc. 30th Int’l Conf. on Very Large Data Bases (VLDB), 2004.
[24] C. Chui, B. Kao, and E. Hung, ”Mining Frequent Itemsets from Uncertain Data,” Proc. 11th Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD), 2007.
[25] H. Deng, I. King, and M.R. Lyu, ”Formal Models for Expert Finding on DBLP Bibliography Data,” Proc. of 8th IEEE Int’l Conf. on Data Mining (ICDM), 2008.
[26] A. Deshpande, C. Guestrin, S. R. Madden, J. M. Hellerstein and W. Hong, ”Model-Driven Data Acquisition in Sensor Networks,” Proc. 30th Int’l Conf. on Very Large Data Bases (VLDB), 2004.
[27] J. Dougherty, R. Kohavi, and M. Sahami, ”Supervised and Unsupervised Discretization of Continuous Features,” Proc. 12th Int’l Conf. on Machine Learning (ICML), 1995.
[28] W. Elmenreich, ”Fusion of Continuous-valued Sensor Measurements using Confidence Weighted Averaging,” Journal of Vibration and Control (JVC), vol. 13, no. 9-10, pp.1303-1312, Sep. 2007.
[29] M. Ester, H.-P. Kriegel, J. Sander and X. Xu, ”A Density Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise,” Proc. 2nd ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining (KDD), 1996.
[30] C. Faloutsos, K.S. McCurley, and A. Tomkins, ”Fast Discovery of Connection Subgraphs,” Proc. 10th ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining (KDD), 2004.
[31] N. Fuhr and T. Rolleke, ”A Probabilistic Relational Algebra for the Integration of Information Retrieval and Database Systems,” ACM Trans. on Information Systems (TOIS), vol. 15, pp. 32-66, 1997.
[32] T. Ge and S. B. Zdonik, ”Handling Uncertain Data in Array Database Systems,” Proc. 24th IEEE Int’l Conf. on Data Engineering (ICDE), 2008.
[33] A. Goel, S. Guha, and K. Munagala, ”Asking the right questions: Model-driven Optimization using Probes,” Proc. 25th ACM Symposium on Principles of Database Systems (PODS), 2006.
[34] A. Goel, S. Guha, and K. Munagala, ”How to Probe for an Extreme Value,” ACM Trans. on Algorithms (TALG), vol. 7, issue 1, Nov. 2010.
[35] S. Gunnemann, H. Kremer, and T. Seidl, ”Subspace Clustering for Uncertain Data,” Proc. 10th SIAM Int’l Conf. on Data Mining (SDM), 2010.
[36] D. Kempe, J. Kleinberg, and E. Tardos, ”Maximizing the Spread of Influence Through a Social Network,” Proc. 9th ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining (KDD), 2003.
[37] M. E. Khalefa, M. F. Mokbel, and J. J. Levandoski, ”Skyline Query Processing for Uncertain Data,” Proc. 19th ACM Conf. on Information and Knowledge Management (CIKM), 2010.
[38] M. Kimura, K. Saito, and R. Nakano, ”Extracting Influential Nodes for Information Diffusion on a Social Network,” Proc. 22nd AAAI Conf. on Artificial Intelligence (AAAI), 2007.
[39] Y. Koren, S.C. North, and C. Volinsky, ”Measuring and Extracting Proximity in Networks,” Proc. 12th ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining (KDD), 2006.
[40] H.-P. Kriegel, P. Kunath, and M. Renz, ”Probabilistic Nearest-Neighbor Query on Uncertain Objects,” Proc. 12th Int’l Conf. on Database Systems for Advanced Applications (DASFAA), 2007.
[41] H.-P. Kriegel and M. Pfeifle, ”Density-Based Clustering of Uncertain Data,” Proc. 11th ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining (KDD), 2005.
[42] H.-P. Kriegel and M. Pfeifle, ”Hierarchical Density Based Clustering of Uncertain Data,” Proc. of 5th IEEE Int’l Conf. on Data Mining (ICDM), 2005.
[43] H.-P. Kriegel and M. Pfeifle, ”Measuring the Quality of Approximated Clusterings,” Proc. BTW’05, 2005. Proc. 11th GI-Symposium Database Systems for Business, Technology and Web (BTW), 2005.
[44] T. Lappas, K. Liu, and E. Terzi, ”Finding a Team of Experts in Social Networks,” Proc. 15th ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining (KDD), 2009.
[45] M. Ley, The DBLP (Digital Bibliography and Library Project) Computer Science Bibliography, http://www.informatik.uni-trier.de/˜ley/db/
[46] J. Li and A. Deshpande, ”Ranking Continuous Probabilistic Datasets,” Proc. 36th Int’l Conf. on Very Large Data Bases (VLDB), 2010.
[47] X. Lian and L. Chen, ”Efficient Processing of Probabilistic Reverse Nearest Neighbor Queries over Uncertain Data,” Int’l Journal on Very Karge Data Bases (VLDB Journal), vol. 18, no. 3, pp. 787-808, June 2009.
[48] X. Liu, M. Ye, J. Xu, Y. Tian, and W.-C. Lee, ”K-Selection Query over Uncertain Data,” Proc. 15th Int’l Conf. on Database Systems for Advanced Applications (DASFAA), 2010.
[49] H. Lu, W. Pan, N. Lane, T. Choudhury, and A. Campbell, ”Soundsense: Scalable Sound Sensing for People-Centric Applications on Mobile Phones,” Proc. 7th Int’l Conf. on Mobile Systems, Applications, and Services (MobiSys), 2009.
[50] L. V. S. Lakshmanan, N. Leone, R. Ross, and V. S. Subrahmanian, ”ProbView: A Flexible Probabilistic Database System,” ACM Trans. on Database Systems (TODS), vol. 22, no. 3, pp. 419-469, Sep. 1997.
[51] Y. Ma, R. Hankins, and D. Racz, ”iLoc: a Framework for Incremental Locationstate Acquisition and Prediction Based on Mobile Sensors,” Proc. 18th ACM Conf. on Information and Knowledge Management (CIKM), 2009.
[52] W. Ngai, B. Kao, C. Chui, R. Cheng, M. Chau and K. Y. Yip, ”Efficient Clustering of Uncertain Data,” Proc. of 6th IEEE Int’l Conf. on Data Mining (ICDM), 2006.
[53] C. Olston, J. Jiang, and J. Widom, ”Adaptive Filters for Continuous Queries over Distributed Data Streams,” Proc. ACM SIGMOD Int’l Conf. on Management of Data (SIGMOD), 2003.
[54] M. Potamias, F. Bonchi, A. Gionis, and G. Kollios, ”K-Nearest Neighbors in Uncertain Graphs,” Proc. 36th Int’l Conf. on Very Large Data Bases (VLDB), 2010.
[55] S. Singh, C. Mayfield, R. Shah, S. Prabhakar, S. E. Hambrusch, J. Neville, and R. Cheng, ”Database Support for Probabilistic Attributes and Tuples,” Proc. 24th IEEE Int’l Conf. on Data Engineering (ICDE), 2008.
[56] M.A.Soliman and I.F. Ilyas, ”Ranking with Uncertain Scores,” Proc. 25th IEEE Int’l Conf. on Data Engineering (ICDE), 2009.
[57] M. A. Soliman, I. F. Ilyas, and K. C. -C. Chang, ”Top-k Query Processing in Uncertain Databases,” Proc. 23th IEEE Int’l Conf. on Data Engineering (ICDE), 2007.
[58] H. Tong, S. Papadimitriou, C. Faloutsos, P.S. Yu and T. Eliassi-Rad, ”BASSET: Scalable Gateway Finder in Large Graphs,” Proc. 14th Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD), 2010.
[59] Y. Wang, G. Cong, G. Song, and K. Xie, ”Community-based Greedy Algorithm for Mining Top-K Influential Nodes in Mobile Social Networks,” Proc. 16th ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining (KDD), 2010.
[60] S. Wu, J. Tang, and B. Gao, ”Instant Social Graph Search,” Proc. 16th Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD), 2012.
[61] B. Yang, H. Lu, and C. S. Jensen, ”Probabilistic Threshold K Nearest Neighbor Queries over Moving Objects in Symbolic Indoor Space,” Proc. 13th Int’l Conf. on Extending Database Technology (EDBT), 2010.
[62] S. M. Yuen, Y. Tao, X. Xiao, J. Pei, and D. Zhang, ”Superseding Nearest Neighbor Search on Uncertain Spatial Databases,” IEEE Trans. on Knowledge and Data Engineering (TKDE), vol. 22, no. 7, pp. 1041-1055, July 2010.
[63] Q. Zhang and F. Li and K. Yi, ”Finding Frequent Items in Probailistic Data,” Proc. ACM SIGMOD Int’l Conf. on Management of Data (SIGMOD), 2008.
[64] Y. Zhang, X. Lin, G. Zhu,W. Zhang, and Q. Lin, ”Efficient Rank Based KNN Query Processing over Uncertain Data,” Proc. 26th IEEE Int’l Conf. on Data Engineering (ICDE), 2010.
[65] K. Zheng, Z. Huang, A. Zhou, and X. Zhou, ”Discovering the Most Influential Sites over Uncertain Data: A Rank-Based Approach,” IEEE Trans. on Knowledge and Data Engineering (TKDE), vol. 24, no. 12, pp. 2156-2169, Dec. 2012.
[66] Z. Zou, J. Li, H. Gao, and S. Zhang, ”Frequent Subgraph Pattern Mining on Uncertain Graph Data,” Proc. 18th ACM Conf. on Information and Knowledge Management (CIKM), 2009.

QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top
系統版面圖檔 系統版面圖檔