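Graduate student (English name): Po-Yao Huang
Thesis title (translated from Chinese): A Fast and Scalable Cluster Network Performance Simulator for HPC Applications
Thesis title (English): A Fast and Scalable Cluster Simulator for Network Performance Analysis of HPC Applications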
Keywords (English): Profiling tool; Network Simulator; Accelerated Simulator; High-Performance Computing; Network Topology
A computer cluster is a set of loosely or tightly connected computers that work together and can be regarded as a single system. Clusters are used to solve large-scale scientific problems and to process big data. In a large-scale system, the interconnect plays an important role in overall performance and directly affects the performance of HPC applications. Achieving the best performance may require modifying the network settings or topology. However, modifying the network settings or topology of a physical cluster is difficult. Hence, a network simulator is needed.

In this study, we developed a simulator based on ns-3 to simulate the network latency of HPC applications and analyze their network performance. We also investigated ways to accelerate the simulator in order to reduce simulation time. The simulator can rapidly provide simulated performance metrics for a large-scale system with an appropriate trade-off between simulation accuracy and simulation time. With accelerated simulation, the maximum speedup can reach 18.5 times. Compared with the physical machine, the overall error rate of the simulation can be less than 16.5%. Finally, the simulator can predict the network performance of a rack-scale system of up to 64 nodes.

Chapter 1 Introduction
Chapter 2 Background and Related Work
2.1 High Performance Computing Applications
2.1.1 High Performance Computing
2.1.2 Message Passing Interface
2.2 High Performance Computing Networks
2.2.1 InfiniBand versus Ethernet
2.2.2 End-to-end delay
2.2.3 Fat Tree
2.2.4 3D Torus
2.3 Simulation of Networks
2.3.1 ns-3
2.4 Accelerated Simulation
2.5 Related Work
2.5.1 dist-gem5
2.5.2 CloudSim
2.5.3 NTU-DSI-DCN
2.5.4 DCNSim
2.5.5 XSim
Chapter 3 Methodology
3.1 Concurrency Preservation of Network Traces
3.1.1 Network Traces Collects from Virtual Cluster
3.1.2 Per-flow Timing Components
3.1.3 Concurrency Preservation
3.2 Trace-driven Network Simulation
3.2.1 Traffic Matrix Preprocessing
3.2.2 NS-3 Simulation Procedure
3.3 Accelerated Simulation
Chapter 4 Evaluation
4.1 Experimental Setup
4.2 Benchmark
4.3 Evaluation of Single-thread Multi-node Simulation
4.4 Evaluation of Multi-thread Multi-node Simulation
4.5 Accelerated Simulation
4.6 Case study: 64-Node Rack Scale Simulation
Chapter 5 Conclusion and Future Work
Bibliography
