National Digital Library of Theses and Dissertations in Taiwan

Graduate student: Wei-Shen Ou (歐韋伸)
Thesis title: Construction of a Virtual Cluster by Integrating PCI Pass-Through for GPU and InfiniBand Virtualization in Cloud
Thesis title (Chinese): 整合PCI穿透技術於GPU與InfiniBand虛擬化於雲端虛擬叢集建置
Advisor: Chao-Tung Yang (楊朝棟)
Oral defense committee: Kuan-Ching Li (李冠憬), Cheng-Chung Chu (朱正忠), Nien-Lin Hsueh (薛念林), Jung-Chun Liu (劉榮春)
Oral defense date: 2013-07-02
Degree: Master's
University: Tunghai University (東海大學)
Department: Department of Computer Science (資訊工程學系)
Discipline: Engineering
Field: Electrical and Information Engineering
Thesis type: Academic thesis
Year of publication: 2013
Graduation academic year: 101 (ROC calendar; 2012-2013)
Language: English
Number of pages: 114
Keywords: Virtual Cluster, CUDA, InfiniBand, virtualization, MPI, KVM, XEN, VMware, HPL

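Abstract (translated from the Chinese):
NVIDIA's CUDA is a platform for writing highly parallel applications. It rests on a few parallel constructs: a hierarchy of thread blocks, shared memory, and barrier synchronization. Programs developed with CUDA can achieve remarkable speedups. Graphics processors can therefore play an important role in cloud computing on clusters, because they make it possible to build a more powerful high-performance computing environment.
Virtualization is an essential part of cloud architecture. When a virtual machine uses an NVIDIA graphics card, it can use CUDA for high-performance computing. The virtual machine then has not only virtual CPUs but also a physical graphics processor for computation, which greatly improves its performance. InfiniBand is a high-bandwidth, low-latency network and is therefore widely used in high-performance computing. Xen, KVM, and VMware are familiar virtualization platforms with good performance and stability. On these platforms, I aim to build a high-performance cluster computing environment that combines the computing power of multiple graphics processors with InfiniBand, which is faster than Ethernet, as the interconnect. At the end of the experiments, I use HPL to demonstrate the capability of this cluster environment.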

Abstract (English):
NVIDIA's CUDA lets programmers develop highly parallel applications. It relies on a few parallel constructs: a hierarchy of thread blocks, shared memory, and barrier synchronization. Programs developed with CUDA can achieve substantial speedups. Graphics processors can therefore play an important role in cluster-based cloud computing, because they can be used to build a high-performance computing environment.
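The three constructs named above can be illustrated without a GPU. The following is a minimal sketch in plain Python, not CUDA code and not taken from the thesis: Python threads stand in for the threads of one block, a list stands in for shared memory, and `threading.Barrier` stands in for barrier synchronization. The function names `block_sum` and `grid_sum` and the block size of 8 are assumptions made for this illustration.

```python
import threading


def block_sum(values):
    """Sum `values` the way one thread block might: one thread per element,
    a shared scratch buffer, and a barrier between reduction steps."""
    n = len(values)                      # threads per block
    assert n > 0 and n & (n - 1) == 0, "block size must be a power of two"
    shared = [0] * n                     # stands in for per-block shared memory
    barrier = threading.Barrier(n)       # stands in for barrier synchronization

    def thread(tid):
        shared[tid] = values[tid]        # each thread loads one element
        barrier.wait()                   # wait until every element is loaded
        stride = n // 2
        while stride > 0:
            if tid < stride:
                shared[tid] += shared[tid + stride]
            barrier.wait()               # finish this step before starting the next
            stride //= 2

    threads = [threading.Thread(target=thread, args=(t,)) for t in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared[0]


def grid_sum(data, block_size=8):
    """Split `data` into blocks (the grid), reduce each block, add the partial sums."""
    padded = list(data) + [0] * (-len(data) % block_size)
    partial = [block_sum(padded[i:i + block_size])
               for i in range(0, len(padded), block_size)]
    return sum(partial)


if __name__ == "__main__":
    print(grid_sum(range(100)))
```

On a real GPU the blocks would run concurrently and the final combination of partial sums would itself be a parallel step; here the blocks run one after another to keep the sketch short.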
Virtualization is a key part of cloud architecture. A virtual machine given access to an NVIDIA graphics card gains CUDA high-performance computing capability. The virtual machine then has not only virtual CPUs but also a physical graphics processor for computation, which considerably improves its performance. InfiniBand, a high-bandwidth, low-latency interconnect, is widely used in high-performance computing. Xen, KVM, and VMware are well-known virtualization platforms with good performance and stability. In this work, a high-performance cluster computing environment is built on the Xen, KVM, and VMware platforms, using graphics processors for their computing power and InfiniBand, which is faster than Ethernet, as the interconnect. Finally, the High Performance Linpack (HPL) benchmark is used to measure the computing capability of the cluster.
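To make the HPL measurement concrete, the sketch below shows how an HPL run time is commonly turned into a Gflop/s figure and how a virtualized run could be compared with a native one. It assumes the operation count 2/3 N^3 + 3/2 N^2 that HPL is generally understood to use for an N x N system. The helper names and the timings in the usage section are hypothetical and are not results from the thesis.

```python
def hpl_gflops(n, seconds):
    """Gflop/s for solving an n x n dense linear system in `seconds`,
    assuming the operation count 2/3 n^3 + 3/2 n^2."""
    flops = (2.0 / 3.0) * n ** 3 + 1.5 * n ** 2
    return flops / seconds / 1e9


def relative_overhead(native_gflops, virtual_gflops):
    """Fraction of native performance lost when running inside a virtual machine."""
    return 1.0 - virtual_gflops / native_gflops


if __name__ == "__main__":
    # Hypothetical timings, for illustration only; not measurements from the thesis.
    native = hpl_gflops(20000, 60.0)
    virtual = hpl_gflops(20000, 66.0)
    print(f"native:   {native:.1f} Gflop/s")
    print(f"virtual:  {virtual:.1f} Gflop/s")
    print(f"overhead: {relative_overhead(native, virtual):.1%}")
```

Reporting overhead as a fraction of native performance keeps the comparison independent of the problem size chosen for a given run.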

Thesis contents:
Abstract (Chinese)
Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
1 Introduction
1.1 Motivation
1.2 Thesis Goal and Contributions
1.3 Thesis Organization
2 Background Review and Related Work
2.1 CUDA
2.2 MPI
2.3 InfiniBand
2.4 Virtualization
2.4.1 Full-Virtualization
2.4.2 Para-Virtualization
2.4.3 KVM
2.4.4 XEN
2.5 Virtualization on GPU
2.5.1 Hypervisor-based device emulation
2.5.2 User space device emulation
2.5.3 Passthrough within the hypervisor
2.6 Related Work
3 System Design and Implementation
3.1 System Architecture
3.2 Graphic Processing Unit
3.2.1 Tesla C2050/C2070/C2075 Computing Processor Board
3.2.2 Tesla K20 Computing Processor Board
3.3 System Implementation - Setting PCI-Passthrough
4 Experimental Results
4.1 Experimental Environment
4.1.1 Experimental Environment - InfiniBand Test Case
4.1.2 Experimental Environment - GPU Test Case
4.1.3 Experimental Environment - HPL
4.2 Results and Discussion
4.2.1 Results and Discussion - InfiniBand Test Case
4.2.2 Results and Discussion - GPU Test Case
4.2.3 Results and Discussion - HPL
5 Conclusions and Future Work
5.1 Concluding Remarks
5.2 Further Work
Bibliography
Appendix
A Set PCI-Passthrough on Xen
A.1 Setup Xen on CentOS
A.2 Set PCI passthrough
B Set PCI-Passthrough on KVM
B.1 Change Linux-Kernel on CentOS
B.2 Set PCI passthrough
C Set PCI-Passthrough on VMware vSphere 5.1
C.1 GPU Driver on VMware-vSphere-ESXi-5.1
C.2 Mellanox InfiniBand OFED Driver for VMware vSphere 5.1
C.3 InfiniBand SRP Target on CentOS 6
D CUDA Installation
E InfiniBand Installation
F Parallel Studio XE 2013 Installation
G HPL Installation

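Related theses (titles translated from the Chinese):

1.	A Study of Adopting Virtualized Information Systems in Educational Institutions
2.	Virtualizing a Test Data Center: The Case of a Large Software Design Company Adopting VMware
3.	A Virtual Cluster Deployment and Management System: The Case of KMLN
4.	A Study on Designing a Virtual Computer Classroom with Load Balancing on a PC Cluster Architecture
5.	A Study on Optimizing Clustered Virtual Machine Disk Systems
6.	Optimizing Virtual CPU Resource Allocation under Virtualization
7.	A High-Efficiency Xen vCPU Scheduler Using the DWRR Scheduling Algorithm
8.	A Study of Virtual Machine Adoption: The Case of Hosted Servers at M Bank
9.	InfiniBand Virtualization on KVM Virtual Machines
10.	ARMvisor I/O Performance Optimization and Analysis: The Cases of Virtio and irqchip
11.	Parallel Design and Analysis of Comparison-Based Local Fault Diagnosis Algorithms
12.	Development and Implementation of a Fault-Tolerant KMLN Virtual Cluster Management Platform
13.	Implementing Load Balancing for Virtual Server Clusters with Open vSwitch
14.	A Performance Comparison of Three Major Virtual Machines: VMware, XEN and KVM
15.	A Performance Comparison of OpenMP and CUDA Parallel Programs on Multicore Systems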
