(3.235.245.219) 您好!臺灣時間:2021/05/07 21:39
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果

詳目顯示:::

我願授權國圖
: 
twitterline
研究生:李宗翰
研究生(外文):Tsung-Han Li
論文名稱:具擴展性的高效能分散式圖資料處理系統
論文名稱(外文):Kylin: An Efficient and Scalable Graph Data ProcessingSystem
指導教授:劉邦鋒
指導教授(外文):Pangfeng Liu
口試委員:吳真貞洪士灝
口試委員(外文):Jan-Jan WuShih-Hao Hung
口試日期:2013-07-30
學位類別:碩士
校院名稱:國立臺灣大學
系所名稱:資訊工程學研究所
學門:工程學門
學類:電資工程學類
論文種類:學術論文
論文出版年:2013
畢業學年度:101
語文別:英文
論文頁數:27
中文關鍵詞:圖資料處理圖資料切割負載平衡動態載入
外文關鍵詞:graph data processinggraph data partitionload balancingpull messagingdynamic loading
相關次數:
  • 被引用被引用:0
  • 點閱點閱:156
  • 評分評分:系統版面圖檔系統版面圖檔系統版面圖檔系統版面圖檔系統版面圖檔
  • 下載下載:0
  • 收藏至我的研究室書目清單書目收藏:0
We introduce Kylin, an efficient and scalable graph data processing system. Kylin is based
on Bulk Synchronous Parallel (BSP) model to process graph data. Although there have
been some BSP-based graph processing systems, Kylin is different from these systems
in two-fold. First, Kylin cooperates with HBase to achieve scalable data manipulation.
Second, We propose three techniques to optimize the performance of Kylin. The proposed
techniques are pull messaging, lazy vertex loading and vertex-weighted partitioning. We
demonstrate Kylin outperforms other BSP-based systems, i.e. Hama and Giraph, in the
experiments.

Contents
Certification I
Acknowledgement II
Chinese Abstract III
Abstract IV
1 Introduction 1
2 Related Works 4
3 Architecture 6
3.1 Partition Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.2 Query Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.3 Data Graph and Data Loader . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.4 Query Processor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
4 Optimizations 9
4.1 Pull Messaging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
4.2 Lazy Vertex Loading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
4.3 Vertex-weighted Partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . 11
5 Implementation 14
5.1 Data Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
5.2 Communication and Synchronization . . . . . . . . . . . . . . . . . . . . . . 15
5.3 Vertex Program Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
6 Benchmark Suite 17
6.1 Max Value Propagation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
6.2 Single Source Shortest Path . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
6.3 Bipartite Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
6.4 Influence Spreading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
6.5 PageRank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
6.6 N-steps neighbors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
6.7 Label propagation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
7 Experiments 20
7.1 Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
7.2 Pull model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
7.3 Lazy Vertex Loading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
7.4 Partitioning strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
7.5 Overall performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
8 Conclusion 25
Bibliography 26

[1] Hbase. http://hbase.apache.org/.
[2] Jeffrey Dean and Sanjay Ghemawat. Mapreduce: simplified data processing on large
clusters. In Proceedings of the 6th conference on Symposium on Opearting Systems
Design and Implementation.
[3] Leslie G. Valiant. A bridging model for parallel computation. Commun. ACM,
33(8):103–111, August 1990.
[4] Storm. http://storm-project.net/.
[5] Grzegorz Malewicz, Matthew H. Austern, Aart J.C Bik, James C. Dehnert, Ilan
Horn, Naty Leiser, and Grzegorz Czajkowski. Pregel: a system for large-scale graph
processing. In Proceedings of the 2010 international conference on Management of
data.
[6] Hama. http://hama.apache.org/.
[7] Giraph. http://giraph.apache.org/.
[8] Wei Dong, Charikar Moses, and Kai Li. Efficient k-nearest neighbor graph con-
struction for generic similarity measures. In Proceedings of the 20th international
conference on World wide web.
[9] Kyle H. Ambert and Aaron M. Cohen. k-information gain scaled nearest neigh-
bors: A novel approach to classifying protein-protein interaction-related documents.
IEEE/ACM Trans. Comput. Biol. Bioinformatics.
[10] Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The pagerank
citation ranking: Bringing order to the web. Technical report, 1999.
[11] G. Karypis and V. Kumar. Multilevel algorithms for multi-constraint graph parti-
tioning. In Supercomputing, 1998. SC98. IEEE/ACM Conference on.
[12] Li-Yung Ho, Jan-Jan Wu, and Pangfeng Liu. Distributed graph database for large-
scale social computing. In Cloud Computing (CLOUD), 2012 IEEE 5th International
Conference on, 2012.
[13] Wei Chen, Yifei Yuan, and Li Zhang. Scalable influence maximization in social net-
works under the linear threshold model. In Proceedings of the 2010 IEEE International
Conference on Data Mining.
[14] David Kempe, Jon Kleinberg, and

Eva Tardos. Maximizing the spread of influence
through a social network. In Proceedings of the ninth ACM SIGKDD international
conference on Knowledge discovery and data mining.
[15] Xiaohong Wang, Aaron Smalter, Jun Huan, and Gerald H. Lushington. G-hash:
towards fast kernel-based similarity search in large graph databases. In Proceedings
of the 12th International Conference on Extending Database Technology: Advances in
Database Technology, EDBT ’09, pages 472–480, New York, NY, USA, 2009. ACM.
[16] Luke Wang, Yanghua Xiao, and Haixun Wang. How to partition a billion-node graph.
Technical report, Microsoft Research Asia, 2013.
[17] Michele Coscia, Giulio Rossetti, Fosca Giannotti, and Dino Pedreschi. Demon: a
local-first discovery method for overlapping communities. In Proceedings of the 18th
ACM SIGKDD international conference on Knowledge discovery and data mining,
KDD ’12, pages 615–623, New York, NY, USA, 2012. ACM.
[18] Alan Mislove, Massimiliano Marcon, Krishna P. Gummadi, Peter Druschel, and
Bobby Bhattacharjee. Measurement and Analysis of Online Social Networks. In
Proceedings of the 5th ACM/Usenix Internet Measurement Conference (IMC’07).

QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top
系統版面圖檔 系統版面圖檔