
National Digital Library of Theses and Dissertations in Taiwan


Detailed Record

Author: 巫濟帆
Author (English): Chi-Fan Wu
Title (Chinese): 一種在共享記憶體多處理器系統中之執行時期迴圈平行化技術
Title (English): A Run-Time Loop Parallelization Technique on Shared-Memory Multiprocessor Systems
Advisor: 黃宗傳
Advisor (English): Tsung-Chuan Huang
Degree: Master's
Institution: National Sun Yat-sen University (國立中山大學)
Department: Department of Electrical Engineering
Discipline: Engineering
Field: Electrical and Information Engineering
Document Type: Academic thesis
Year of Publication: 2000
Academic Year of Graduation: 88 (1999–2000)
Language: Chinese
Number of Pages: 49
Keywords (Chinese): 多處理器系統, 平行編譯器, 波前排程, 執行時期平行法
Keywords (English): Run-time parallelization, Parallelizing compiler, Multiprocessor system, Wavefront scheduling
Statistics:
  • Cited by: 0
  • Views: 362
  • Downloads: 9
  • Bookmarked: 0
Abstract (translated from the Chinese): Current scientific computation relies on high-performance computing power, and parallel computation on multiprocessor systems is the main source of that power. A parallelizing compiler translates an input sequential program into a parallel program for execution on a multiprocessor machine, chiefly by examining the data dependences in program loops to determine which loops can be executed in parallel. When the array accesses in a loop exhibit irregular, nonlinear, or dynamic access patterns, however, the parallelizing compiler cannot determine the loop's data dependences at compile time, so a run-time algorithm must be employed to discover the data dependences in such loops and extract their parallelism. In this thesis, we propose an efficient run-time parallelization technique that schedules loops of the above forms for parallel execution. The new technique first uses parallel processing to find, for each loop iteration, its immediate predecessor iterations; consulting this information, it then quickly and efficiently places the iterations, again in parallel, into appropriate wavefronts, so that the loop can be executed in parallel according to the resulting wavefront order. Both theoretical analysis and experimental results show that the proposed technique achieves high speedup and low overhead. Moreover, its high scalability makes it especially well suited to multiprocessor systems.
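The kind of loop targeted here can be sketched in a few lines: when array subscripts come from index arrays whose contents are known only at run time, the compiler cannot tell at compile time whether two iterations touch the same element. The following minimal Python sketch (function names and index arrays are illustrative, not from the thesis) shows such a loop and a run-time conflict test between two iterations:

```python
def irregular_loop(A, w, r):
    """Sequential reference loop: A[w[i]] = A[r[i]] + 1.
    The subscripts come from index arrays w and r, so the loop's data
    dependences depend on run-time input values."""
    for i in range(len(w)):
        A[w[i]] = A[r[i]] + 1
    return A

def conflict(i, j, w, r):
    """True if iterations i and j access a common array element with at
    least one write (a flow, anti, or output dependence)."""
    return w[i] == w[j] or w[i] == r[j] or r[i] == w[j]

# With these hypothetical index arrays, iteration 1 reads A[2], which
# iteration 0 writes, so the loop is only partially parallel.
w, r = [2, 0, 2, 3], [1, 2, 0, 3]
print(irregular_loop([0, 0, 0, 0], w, r))  # -> [2, 0, 3, 1]
print(conflict(0, 1, w, r))                # -> True
print(conflict(0, 3, w, r))                # -> False
```

Because `w` and `r` are program inputs, only a run-time inspection of their actual contents can reveal which iteration pairs conflict.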
Abstract (English): High-performance computing power is important for current advanced scientific calculations, and a multiprocessor system obtains its high performance from the fact that some computations can proceed in parallel. A parallelizing compiler takes a sequential program as input and automatically translates it into a parallel form for the target multiprocessor system. But for loops containing arrays with irregular, nonlinear, or dynamic access patterns, no current parallelizing compiler can determine at compile time whether data dependences exist. A run-time parallel algorithm must therefore be utilized to determine the dependences and extract the potential parallelism of such loops. In this thesis, we propose an efficient run-time parallelization technique that computes a proper parallel execution schedule for those loops. The new method first detects the immediate predecessor iterations of each loop iteration and constructs an immediate predecessor table, then efficiently schedules all loop iterations into wavefronts for parallel execution. Both theoretical analysis and experimental results show that the new run-time parallelization technique achieves high speedup and low processing overhead. Furthermore, its high scalability makes the technique well suited to implementation on multiprocessor systems.
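The inspector/scheduler/executor pipeline described in the abstract can be sketched roughly as follows. This is a simplified serial model under stated assumptions, not the thesis's algorithm: the thesis computes the predecessor table and the wavefront assignments themselves in parallel, whereas this sketch only illustrates how immediate-predecessor information yields a wavefront schedule. The function names and the loop body `A[w[i]] = A[r[i]] + 1` are illustrative assumptions.

```python
from collections import defaultdict

def wavefront_schedule(w, r):
    """Inspector + scheduler sketch: assign every iteration of the loop
    'A[w[i]] = A[r[i]] + 1' to a wavefront.  For each array element we
    track the last writer and the readers since that write; an
    iteration's predecessors are the earlier iterations it depends on
    (flow, anti, or output dependence), and its wavefront number is one
    more than the largest wavefront among those predecessors."""
    n = len(w)
    wf = [0] * n
    last_write = {}              # element -> last iteration that wrote it
    readers = defaultdict(list)  # element -> readers since the last write
    for i in range(n):
        preds = []
        if r[i] in last_write:         # flow dependence on the last writer
            preds.append(last_write[r[i]])
        preds.extend(readers[w[i]])    # anti dependences on earlier readers
        if w[i] in last_write:         # output dependence on the last writer
            preds.append(last_write[w[i]])
        wf[i] = 1 + max((wf[p] for p in preds), default=0)
        readers[w[i]] = []       # later writers are ordered via i anyway
        last_write[w[i]] = i
        readers[r[i]].append(i)
    return wf

def execute_by_wavefront(A, w, r, wf):
    """Executor sketch: run wavefronts in increasing order.  Iterations
    within one wavefront are mutually independent and could run on
    different processors; they are executed serially here for brevity."""
    for level in range(1, max(wf) + 1):
        for i in (i for i in range(len(w)) if wf[i] == level):
            A[w[i]] = A[r[i]] + 1
    return A

# Hypothetical index arrays: iterations 0 and 3 are independent and
# share wavefront 1, while 1 and 2 must wait for their predecessors.
w, r = [2, 0, 2, 3], [1, 2, 0, 3]
wf = wavefront_schedule(w, r)
print(wf)                                            # -> [1, 2, 3, 1]
print(execute_by_wavefront([0, 0, 0, 0], w, r, wf))  # -> [2, 0, 3, 1]
```

The executor reproduces the sequential result because every dependence edge crosses from a lower-numbered wavefront to a higher one; the speedup comes from running each wavefront's iterations concurrently.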
Table of Contents
Chinese Abstract i
English Abstract iii
Table of Contents v
List of Figures vii
List of Tables ix
Chapter 1  Introduction 1
  1.1  Problem Description 1
  1.2  Survey of Run-Time Loop Scheduling Techniques 3
  1.3  Design Strategy Considerations 6
  1.4  Organization of the Thesis 7
Chapter 2  Related Work 8
Chapter 3  A New Run-Time Loop Parallelization Method 16
  3.1  Motivation 16
  3.2  An Illustrative Example 21
  3.3  Inspector Phase 23
  3.4  Scheduler Phase 31
  3.5  Executor Phase 34
Chapter 4  Experimental Results 35
  4.1  Experimental Environment 35
  4.2  Experimental Data 36
  4.3  Comparison with Related Work 43
Chapter 5  Conclusions 45
References 47
[1]Chen, D. K., Yew, P. C., and Torrellas, J., “An efficient algorithm for the run-time parallelization of doacross loops,” in Proc. 1994 Supercomputing, 518-527, 1994.
[2]Huang, T. C. and Hsu, P. H., “The SPNT test: a new technology for run-time speculative parallelization of loops,” in Lecture Notes in Computer Science No. 1366 (Z. Li, P. C. Yew, S. Chatterjee, C. H. Huang, P. Sadayappan, and D. Sehr, eds.), 177-191, Springer-Verlag, 1998.
[3]Huang, T. C., Hsu, P. H., and Sheng, T. N., “Efficient run-time scheduling for parallelizing partially parallel loops,” J. Information Science and Engineering 14(1), 255-264, 1998.
[4]Leung, S. T. and Zahorjan, J., “Improving the performance of run-time parallelization,” in Proc. 4th ACM SIGPLAN Symp. Principles and Practice of Parallel Programming, 83-91, 1993.
[5]Leung, S. T. and Zahorjan, J., “Extending the applicability and improving the performance of run-time parallelization,” Tech. Rep. 95-01-08, Dept. CSE, Univ. of Washington, 1995.
[6]Midkiff, S. and Padua, D., “Compiler algorithms for synchronization,” IEEE Trans. Comput. C-36, 12, 1485-1495, 1987.
[7]Polychronopoulos, C., “Compiler optimizations for enhancing parallelism and their impact on architecture design,” IEEE Trans. Comput. C-37, 8, 991-1004, 1988.
[8]Rauchwerger, L., Amato, N., and Padua, D., “A scalable method for run-time loop parallelization,” Int. J. Parallel Processing, 26(6), 537-576, 1995.
[9]Rauchwerger, L. and Padua, D., “The LRPD test: Speculative run-time parallelization of loops with privatization and reduction parallelization,” IEEE Trans. Parallel and Distributed Systems; 10(2), 160-180, 1999.
[10]Saltz, J., Mirchandaney, R., and Crowley, K., “Run-time parallelization and scheduling of loops,” IEEE Trans. Comput. 40(5), 603-612, 1991.
[11]Xu, C. and Chaudhary, V., “Time-stamping algorithms for parallelization of loops at run-time,” in Proc. 11th Int. Parallel Processing Symp., 1997.
[12]Zhu, C. Q. and Yew, P. C., “A scheme to enforce data dependence on large multiprocessor systems,” IEEE Trans. Software Eng. 13(6), 726-739, 1987.
[13]Shen, Z., Li, Z., and Yew, P. C., “An empirical study on array subscripts and data dependencies,” in Proc. of ICPP, II-145 to II-152, 1989.
[14]Rauchwerger, L., “Run-time parallelization: A framework for parallel computation,” Ph.D. thesis, Dept. of Computer Science, Univ. of Illinois, 1995.
[15]Zima, H., Supercompilers for Parallel and Vector Computers. ACM Press, New York, 1991.
[16]Wolfe, M., High Performance Compilers for Parallel Computing. Addison-Wesley, 1996.