National Digital Library of Theses and Dissertations in Taiwan

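Author: 劉季燁
Author (English): Chi-Yeh Liu
Thesis title: 混合開放式多處理與進階矢量擴展的加速在分裂HLL方法解尤拉方程式
Thesis title (English): Hybrid OpenMP/AVX Acceleration of a Split Harten, Lax and van Leer Method for the Euler Equations
Advisor: 李汶樺
Advisor (English): Matthew R. Smith
Degree: Master's
Institution: National Cheng Kung University
Department: Department of Mechanical Engineering
Discipline: Engineering
Field: Mechanical Engineering
Thesis type: Academic thesis
Year of publication: 2014
Academic year of graduation: 102 (ROC calendar)
Language: English
Number of pages: 147
Keywords (translated from Chinese): Parallel computing; Advanced Vector Extensions (AVX); OpenMP; computational fluid dynamics; finite volume method; Split HLL method; HLL method
Keywords (English): Parallel Computing; Advanced Vector eXtensions (AVX); OpenMP; Computational Fluid Dynamics; Finite Volume Method; SHLL (Split HLL) and HLL method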
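Chinese abstract (translated): This study presents the Split HLL (SHLL) method applied to parallel computation using a hybrid OpenMP/AVX (Advanced Vector Extensions) parallelization paradigm. For the purpose of parallelization, the governing equations of the SHLL method are mathematically vector split in each coordinate direction. The resulting high degree of locality has previously been applied successfully to parallel computation on graphics processing units (GPUs). The same principles that give high performance on GPU devices also give high performance with AVX in this thesis. High performance was obtained by ensuring that all flux computations use AVX intrinsic functions rather than serial computation. The main feature of AVX is the ability to perform single-instruction, multiple-data (SIMD) operations on eight floating point variables in parallel, extending the earlier Streaming SIMD Extensions (SSE), which operate on four. Through intrinsic functions, these eight parallel computation registers can be treated as eight additional cores on each physical core. Because modern CPUs have many cores, performance can be extended further by using the AVX registers on every available CPU core together with OpenMP shared-memory parallelization, effectively employing 8P cores, where P is the number of physical cores.
In addition to the development of this efficient parallel computing tool, the SHLL equations have been reformulated from the single dissipation coefficient previously believed present into a form with two distinct dissipation coefficients. Error analysis of one-dimensional problems, the shock tube and the shock-acoustic wave interaction, was used to find the ideal dissipation coefficients of the SHLL method. The ideal dissipation coefficients for two-dimensional problems, including the Euler four-shock interaction and Euler four-contact interaction problems, are also presented and discussed. Parallel performance at various cell counts was measured with first- and second-order simulations of the Euler four-shock interaction problem, comparing a single workstation with dual Xeon CPUs (16 cores) against an Intel i7-3930K and an Intel i3-3220K. The results show a best speedup of over 326 times relative to single-core flux computation, obtained with dual Xeon E5-2670 CPUs.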
Presented is the Split Harten, Lax and van Leer (SHLL) method applied to parallel computation using a hybrid OpenMP/AVX (Advanced Vector eXtensions) parallelization paradigm. The governing equations of the SHLL method have been mathematically vector split in each coordinate direction for the purpose of parallelization. This splitting results in a high degree of locality and has previously been applied successfully to parallel computation using Graphics Processing Units (GPUs). The same principles that allow high performance on GPU devices also permit high performance using AVX, as demonstrated in the present study. High performance was obtained by ensuring that all flux computations were performed using only AVX intrinsic functions, with no computations performed in serial. The major feature of AVX is the capacity to perform SIMD operations on 8 floating point variables in parallel, which extends the earlier Streaming SIMD Extensions (SSE), which operate on 4 floating point variables. Through the use of intrinsic functions, these 8 parallel computation registers may be treated as 8 additional computing cores for each physical core. Since modern CPUs employ a large number of physical cores, the performance can be further extended by using all 8 AVX computational registers on each available CPU core together with shared-memory OpenMP parallelization, effectively employing 8P cores, where P is the number of physical cores available.
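The locality that vector splitting provides can be illustrated with a minimal sketch for the one-dimensional Euler equations. It is not the thesis's implementation: it assumes the HLL flux is split using the local wave-speed estimates S_L = u - a and S_R = u + a evaluated separately in each cell, it clamps the Mach number so the flux becomes fully upwind in supersonic flow, and it omits the two dissipation coefficients introduced in the thesis. The names `euler_flux`, `split_fluxes` and `interface_flux`, and the value of `GAMMA`, are hypothetical choices made for illustration.

```python
import math

GAMMA = 1.4  # assumed ratio of specific heats (ideal diatomic gas)


def euler_flux(U):
    """Return the 1D Euler flux F(U), velocity u and sound speed a for U = [rho, rho*u, E]."""
    rho, mom, E = U
    u = mom / rho
    p = (GAMMA - 1.0) * (E - 0.5 * rho * u * u)
    a = math.sqrt(GAMMA * p / rho)
    return [mom, mom * u + p, u * (E + p)], u, a


def split_fluxes(U):
    """Return (F_plus, F_minus) computed from this cell's state only."""
    F, u, a = euler_flux(U)
    M = max(-1.0, min(1.0, u / a))   # clamped local Mach number
    diss = 0.5 * a * (1.0 - M * M)   # dissipation factor, zero when supersonic
    F_plus = [0.5 * (1.0 + M) * f + diss * q for f, q in zip(F, U)]
    F_minus = [0.5 * (1.0 - M) * f - diss * q for f, q in zip(F, U)]
    return F_plus, F_minus


def interface_flux(U_left, U_right):
    """Interface flux: forward flux of the left cell plus backward flux of the right cell."""
    F_plus, _ = split_fluxes(U_left)
    _, F_minus = split_fluxes(U_right)
    return [fp + fm for fp, fm in zip(F_plus, F_minus)]
```

Because `split_fluxes` reads a single cell, every cell can be processed independently, which is the property that makes the scheme suitable for SIMD lanes and threads. For identical left and right states, `interface_flux` reduces to the physical flux.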
In addition to the development of this highly efficient parallel computing tool, the SHLL equations have been reformulated and are shown to possess two dissipation coefficients, as opposed to the single dissipation coefficient previously believed present. The ideal dissipation coefficients α_1 and α_2 of the SHLL scheme for one-dimensional problems, including the shock tube and shock-acoustic wave interaction problems, were investigated through the use of error analysis. Additionally, several two-dimensional problems, including the Euler four-shock interaction and Euler four-contact interaction problems, are shown and discussed with regard to the ideal dissipation coefficients. Careful manipulation of these coefficients leads to performance approaching 4th order spatial accuracy, and results are shown to be an improvement upon previously published 3rd order accurate techniques.
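Statements about spatial order of accuracy are commonly checked by comparing error norms on two grids. The following is a minimal sketch of that standard calculation, not the thesis's own error analysis, which the abstract does not detail. The function names `l1_error` and `observed_order` are hypothetical, and the use of the L1 norm is an assumption.

```python
import math


def l1_error(numerical, reference, dx):
    """Discrete L1 error between a numerical solution and a reference on a uniform grid."""
    return dx * sum(abs(n - r) for n, r in zip(numerical, reference))


def observed_order(err_coarse, err_fine, refinement=2.0):
    """Observed order p, assuming error ~ C * dx**p and dx_coarse / dx_fine = refinement."""
    return math.log(err_coarse / err_fine) / math.log(refinement)
```

If halving the cell size reduces the error by a factor of about 16, `observed_order` returns a value near 4, which is what "approaching 4th order" would look like in such a test.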
The parallel performance for various problems at first and second order, using a varying number of cells, is presented for a single workstation with dual Xeon CPUs (16 physical cores), an Intel i7-3930K and an Intel i3-3220K. The best reported speedup, measured relative to a single core, was over 326 times, obtained with dual Xeon E5-2670 CPUs when computing the flux evaluation kernel.
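The two quantities used above, the nominal 8P lane count and speedup, can be written out as a short sketch. The function names are hypothetical, and the lane count of 8 is taken from the abstract's description of AVX.

```python
AVX_LANES = 8  # from the abstract: 8 floating point values per AVX operation


def nominal_lanes(physical_cores, lanes_per_core=AVX_LANES):
    """Nominal number of simultaneous floating point lanes (8P in the abstract)."""
    return lanes_per_core * physical_cores


def speedup(reference_seconds, parallel_seconds):
    """Speedup: reference (single-core) runtime divided by parallel runtime."""
    return reference_seconds / parallel_seconds


print(nominal_lanes(16))  # dual Xeon workstation with 16 physical cores -> 128
```

The nominal lane count is only a rough yardstick for the reported figure of over 326 times; the abstract does not break that number down further.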

Chinese Abstract
Abstract
Acknowledgements
Findings/Publications
Conference Presentations
Table of Contents
List of Tables
List of Figures
Nomenclature
Chapter 1 Introduction
1.1 Background and Motivation
1.2 Parallel Computing
1.2.1 Parallelization Theory
1.2.2 Message Passing Interface (MPI)
1.2.3 Open Multi-Processing (OpenMP)
1.2.4 Graphics Processing Unit (GPU)
1.3 History of CPU Architecture
1.3.1 Intel’s MMX
1.3.2 Streaming SIMD Extensions (SSE)
1.3.3 Advanced Vector eXtensions (AVX)
1.4 Governing Equations
1.4.1 Isothermal Euler Equations
1.4.2 Euler and Navier-Stokes Equations
1.5 Finite Volume Method
1.5.1 CFL Number
1.5.2 Total Variation Diminishing Schemes
1.6 TVD-MUSCL Scheme
1.7 Rusanov Method
1.8 Harten, Lax and van Leer (HLL) Method
1.9 SHLL (Split HLL) Method
1.10 Fifth-Order WENO Scheme
Chapter 2 Numerical Methods
2.1 Dissipation Control SHLL (Split HLL) Method
2.2 Extension to a Second-Order SHLL Scheme
2.3 Execution Process and Concept for OpenMP/AVX Parallelization
Chapter 3 Results and Discussion
3.1 Euler-Four-Shock Interaction
3.2 Euler-Four-Contact Interaction
3.3 Dissipation Control Analysis
3.3.1 Shock Tube Problem
3.3.2 Shock-Acoustic-Wave Interaction
3.3.3 First-Order Euler-Four-Shock Interaction and Euler-Four-Contact Interaction Problems
3.3.4 Second-Order Euler-Four-Shock Interaction and Euler-Four-Contact Interaction Problems
3.4 Parallel Performance
Chapter 4 Conclusion
References
Appendix A
Tables
Figures