National Digital Library of Theses and Dissertations in Taiwan


Author: 劉季燁 (Chi-Yeh Liu)
Title: Hybrid OpenMP/AVX Acceleration of a Split Harten, Lax and van Leer Method for the Euler Equations
Title (Chinese): 混合開放式多處理與進階矢量擴展的加速在分裂HLL方法解尤拉方程式
Advisor: 李汶樺 (Matthew R. Smith)
Degree: Master's
Institution: National Cheng Kung University
Department: Department of Mechanical Engineering
Discipline: Engineering
Field: Mechanical Engineering
Thesis type: Academic thesis
Year of publication: 2014
Graduation academic year: 102 (ROC calendar, 2013-2014)
Language: English
Pages: 147
Keywords: Parallel Computing, Advanced Vector eXtensions (AVX), OpenMP, Computational Fluid Dynamics, Finite Volume Method, SHLL (Split HLL) method, HLL method


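This study presents parallel computation with the Split HLL (SHLL) method using a hybrid OpenMP and Advanced Vector eXtensions (AVX) parallelization model. For the purpose of parallelization, the governing equations of the SHLL method are mathematically vector split in each coordinate direction. The resulting high degree of locality has already been applied successfully to parallel computation on graphics processing units (GPUs). The same principles that give high performance on GPU devices also give high performance with AVX in this thesis. High performance is obtained by ensuring that all flux computations are carried out with AVX intrinsic functions rather than in serial. The main feature of AVX is the ability to perform single-instruction, multiple-data (SIMD) operations on eight floating-point variables in parallel, extending the earlier Streaming SIMD Extensions (SSE), which operate on four floating-point variables. Through intrinsic functions, these eight parallel computation registers can be treated as eight additional cores on each physical core. Because modern CPUs have many cores, performance can be extended further by using the AVX registers on every available CPU core together with shared-memory OpenMP parallelization, effectively employing 8P cores, where P is the number of physical cores.
In addition to the development of this efficient parallel computing tool, the SHLL equations have been rewritten to show two distinct dissipation coefficients in place of the single dissipation coefficient previously believed to be present. Error analysis of two one-dimensional problems, the shock tube and the shock-acoustic wave interaction, is used to find the ideal dissipation coefficients of the SHLL method. Ideal dissipation coefficients for two-dimensional problems, including the Euler four-shock interaction and Euler four-contact interaction problems, are also presented and discussed. Parallel performance at different grid sizes is compared using the first- and second-order Euler four-shock interaction problem on a single workstation with dual Xeon CPUs (16 cores), an Intel i7-3930K and an Intel i3-3220K. The final results show that the best speedup, relative to a single-core flux computation, exceeds 326 times using dual E5-2670 Xeon CPUs.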

Presented is the Split Harten, Lax and van Leer (SHLL) method applied to parallel computation using a hybrid OpenMP/AVX (Advanced Vector eXtensions) parallelization paradigm. The governing equations of the SHLL method have been mathematically vector split in each coordinate direction for the purpose of parallelization. This splitting results in a high degree of locality and has previously been applied successfully to parallel computation using Graphics Processing Units (GPUs). The same principles that allow high performance on GPU devices also permit high performance using AVX, as demonstrated in the present study. High performance was obtained by ensuring that all flux computations were performed using only AVX intrinsic functions, with no computations performed in serial. The major feature of AVX is the capacity to perform SIMD operations on 8 floating-point variables in parallel, which extends the earlier Streaming SIMD Extensions (SSE) that operate on 4 floating-point variables. Through the use of intrinsic functions, these 8 parallel computation registers may be treated as 8 additional computing cores per physical core. Since modern CPUs employ a large number of physical cores, the performance can be further extended by using all 8 AVX computational registers on each available CPU core together with shared-memory OpenMP parallelization, effectively employing 8P cores, where P is the number of physical cores available.
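The locality argument can be illustrated with the standard HLL flux. For wave-speed estimates S_L < 0 < S_R, the HLL interface flux is algebraically the sum of one term that needs only the left cell and one term that needs only the right cell. The following is a minimal sketch of that identity, assuming the 1D Euler equations with an ideal gas (gamma = 1.4) and wave speeds passed in as plain inputs. The function names are illustrative, and the sketch does not reproduce how the thesis chooses its wave speeds or its dissipation coefficients.

```python
GAMMA = 1.4  # ideal-gas ratio of specific heats (assumed for this sketch)


def euler_flux(U):
    """Physical flux F(U) of the 1D Euler equations, U = (rho, rho*u, E)."""
    rho, mom, E = U
    u = mom / rho
    p = (GAMMA - 1.0) * (E - 0.5 * rho * u * u)
    return (mom, mom * u + p, (E + p) * u)


def hll_flux(UL, UR, SL, SR):
    """Standard HLL interface flux, valid for SL < 0 < SR."""
    FL, FR = euler_flux(UL), euler_flux(UR)
    return tuple(
        (SR * fl - SL * fr + SL * SR * (ur - ul)) / (SR - SL)
        for fl, fr, ul, ur in zip(FL, FR, UL, UR)
    )


def split_plus(U, SL, SR):
    """Forward-going part of the HLL flux; uses the left cell only."""
    return tuple(SR * (f - SL * u) / (SR - SL)
                 for f, u in zip(euler_flux(U), U))


def split_minus(U, SL, SR):
    """Backward-going part of the HLL flux; uses the right cell only."""
    return tuple(-SL * (f - SR * u) / (SR - SL)
                 for f, u in zip(euler_flux(U), U))


if __name__ == "__main__":
    # Sod shock-tube states; the wave speeds are hypothetical values.
    UL, UR = (1.0, 0.0, 2.5), (0.125, 0.0, 0.25)
    SL, SR = -1.5, 2.0
    print(hll_flux(UL, UR, SL, SR))
    print(tuple(a + b for a, b in zip(split_plus(UL, SL, SR),
                                      split_minus(UR, SL, SR))))
```

Once each split term depends on a single cell, every cell can evaluate its own contributions independently, which is the kind of data-parallel work that maps onto SIMD lanes and threads.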
In addition to the development of this highly efficient parallel computing tool, the SHLL equations have been reformulated and are shown to possess two dissipation coefficients, as opposed to the single dissipation coefficient previously believed to be present. The ideal dissipation coefficients α_1 and α_2 of the SHLL scheme for one-dimensional problems, including the shock-tube and shock-acoustic wave interaction problems, were investigated through error analysis. Additionally, several two-dimensional problems, including the Euler four-shock interaction and Euler four-contact interaction problems, are shown and discussed with regard to the ideal dissipation coefficients. Careful manipulation of these coefficients leads to performance approaching 4th-order spatial accuracy, and the results are shown to improve upon previously published 3rd-order accurate techniques.
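An error analysis of this kind usually rests on two ingredients: an error norm against a reference solution, and the observed order of accuracy estimated from two grid resolutions. The sketch below assumes a discrete L1 norm on a uniform grid and a simple search over candidate coefficient pairs. The abstract does not state which norm, reference solution or search procedure the thesis uses, so all names here are hypothetical.

```python
import math


def l1_error(numerical, reference, dx):
    """Discrete L1 error norm on a uniform grid with spacing dx."""
    return dx * sum(abs(a - b) for a, b in zip(numerical, reference))


def observed_order(err_coarse, err_fine, dx_coarse, dx_fine):
    """Observed order p, assuming the error scales as C * dx**p."""
    return math.log(err_coarse / err_fine) / math.log(dx_coarse / dx_fine)


def best_pair(candidates, error_of):
    """Return the (alpha_1, alpha_2) candidate with the smallest error.

    error_of is a caller-supplied function that runs a simulation with
    the given pair and returns its error norm (hypothetical interface).
    """
    return min(candidates, key=error_of)
```

For example, made-up errors of 1.6e-3 at dx = 0.02 and 1.0e-4 at dx = 0.01 give `observed_order(1.6e-3, 1.0e-4, 0.02, 0.01)` of approximately 4, which is what 4th-order convergence would look like under grid halving.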
The parallel performance for various problems at first and second order, using a varying number of cells, is presented for a single workstation with dual Xeon CPUs (16 physical cores), an Intel i7-3930K and an Intel i3-3220K. The best reported speedup, defined as the computational performance relative to a single core, was over 326 times, obtained using dual E5-2670 Xeon CPUs when computing the flux evaluation kernel.
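The quantities in this paragraph can be written down in a few lines. The sketch below computes the 8P lane count described in the abstract, the speedup from two timings, and the Amdahl's law bound, which shows why it matters that no part of the flux computation stays serial. The function names are illustrative and any timings fed to them would be hypothetical; the thesis's own measurements are not reproduced here.

```python
AVX_FLOAT_LANES = 8  # single-precision values per 256-bit AVX register


def effective_lanes(physical_cores, lanes=AVX_FLOAT_LANES):
    """The '8P' figure: SIMD lanes times physical cores."""
    return lanes * physical_cores


def speedup(t_baseline, t_parallel):
    """Speedup as the ratio of baseline time to parallel time."""
    return t_baseline / t_parallel


def amdahl_speedup(parallel_fraction, n):
    """Amdahl's law: upper bound on speedup with n parallel units."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n)


if __name__ == "__main__":
    n = effective_lanes(16)  # dual 8-core Xeon workstation -> 128
    for f in (0.90, 0.99, 1.00):
        print(f"parallel fraction {f:.2f}: bound {amdahl_speedup(f, n):.1f}x")
```

With P = 16 the lane count is 128, and leaving even 1% of the work serial caps the Amdahl bound at roughly 56x. The reported figure of over 326 is larger than 128; the abstract does not state how the single-core baseline was implemented, so the two numbers should not be compared directly.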

Chinese Abstract
Abstract
Acknowledgements
Findings/Publications
Conference Presentations
Table of Contents
List of Tables
List of Figures
Nomenclature
Chapter 1 Introduction
1.1 Background and Motivation
1.2 Parallel Computing
1.2.1 Parallelization Theory
1.2.2 Message Passing Interface (MPI)
1.2.3 Open Multi-Processing (OpenMP)
1.2.4 Graphics Processing Unit (GPU)
1.3 History of CPU Architecture
1.3.1 Intel's MMX
1.3.2 Streaming SIMD Extensions (SSE)
1.3.3 Advanced Vector eXtensions (AVX)
1.4 Governing Equations
1.4.1 Isothermal Euler Equations
1.4.2 Euler and Navier-Stokes Equations
1.5 Finite Volume Method
1.5.1 CFL Number
1.5.2 Total Variation Diminishing Schemes
1.6 TVD-MUSCL Scheme
1.7 Rusanov Method
1.8 Harten, Lax and van Leer (HLL) Method
1.9 SHLL (Split HLL) Method
1.10 Fifth-Order WENO Scheme
Chapter 2 Numerical Methods
2.1 Dissipation Control SHLL (Split HLL) Method
2.2 Extension to a Second-Order SHLL Scheme
2.3 Execution Process and Concept for OpenMP-AVX Parallelization
Chapter 3 Results and Discussion
3.1 Euler-Four-Shock Interaction
3.2 Euler-Four-Contact Interaction
3.3 Dissipation Control Analysis
3.3.1 Shock Tube Problem
3.3.2 Shock-Acoustic-Wave Interaction
3.3.3 First-Order Euler-Four-Shock Interaction & Euler-Four-Contact Interaction Problem
3.3.4 Second-Order Euler-Four-Shock Interaction & Euler-Four-Contact Interaction Problem
3.4 Parallel Performance
Chapter 4 Conclusion
References
Appendix A
Tables
Figures

