National Digital Library of Theses and Dissertations in Taiwan

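Graduate student: 王鉉崴
Graduate student (English name): Wang, Hsuan-Wei
Thesis title (Chinese): 以CUDA 為基礎快速靜態時序分析引擎以及其應用
Thesis title (English): A Fast CUDA-Based Static Timing Analysis (STA) Engine and Its Application
Advisor: 溫宏斌
Advisor (English name): Wen, Hung-Pin
Degree: Master's
Institution: National Chiao Tung University (國立交通大學)
Department: Department of Electrical Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2013
Academic year of graduation: 102
Language: English
Number of pages: 39
Keywords (Chinese): GPU; static timing analysis; parallel
Keywords (English): GPU; STA; Parallel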
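Graphics processing units (GPUs) make it possible to apply parallel computing to static timing analysis (STA). Because a GPU has hundreds of cores, it can deliver more speedup than conventional parallel hardware. However, the many-core design makes memory access and synchronization difficult, which limits the speedup a GPU can achieve. This thesis therefore proposes a fast CUDA-based STA engine that uses levelization with sorting by gate type, gate signal structure restructuring, table index reorganization, and hardware-accelerated rendering to address long memory access times. Levelization with sorting by gate type orders the gates within each level by type, so that cores processing gates of the same type only need to read the data for one gate type. Gate signal structure restructuring packs several small signals into a unit that matches the amount the GPU fetches in a single access, which raises memory throughput. Table index reorganization allows more data to be accessed jointly through texture memory. Hardware-accelerated rendering expands the tables so that the range covered by each look-up table spans the entire signal domain; only interpolation is then needed, and the execution branches that extrapolation would cause are avoided. Experimental results show that the proposed CUDA-based STA engine is 12.85 times faster than the CPU version, with a 29.35-fold speedup on the largest circuit, netcard. Compared with the commercial tool PrimeTime, it achieves a 3229-fold speedup, and an 8117-fold speedup on netcard.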
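The table-expansion idea described in the abstract can be illustrated with a small CPU-side sketch. It is not the thesis's implementation: it assumes a one-dimensional table (the thesis's tables may be multi-dimensional), and the names `expand_table` and `lookup` are hypothetical. The lookup clamps the query to the table range and only interpolates; padding the table once in advance is what lets out-of-range queries still return sensible values without a separate extrapolation branch.

```python
from bisect import bisect_right

def expand_table(xs, ys, lo, hi):
    """Pad a 1-D look-up table so that it covers the whole range [lo, hi].

    The padded end points are computed once, by extending the first and
    last segments linearly. This is an illustrative assumption, not the
    expansion rule used in the thesis.
    """
    if len(xs) < 2 or len(xs) != len(ys):
        raise ValueError("need at least two matching table points")
    new_xs, new_ys = list(xs), list(ys)
    if lo < xs[0]:
        slope = (ys[1] - ys[0]) / (xs[1] - xs[0])
        new_xs.insert(0, lo)
        new_ys.insert(0, ys[0] + slope * (lo - xs[0]))
    if hi > xs[-1]:
        slope = (ys[-1] - ys[-2]) / (xs[-1] - xs[-2])
        new_xs.append(hi)
        new_ys.append(ys[-1] + slope * (hi - xs[-1]))
    return new_xs, new_ys

def lookup(xs, ys, x):
    """Clamp x to the table range, then interpolate linearly.

    There is no extrapolation path: queries outside the table simply
    return the value at the nearest edge.
    """
    x = min(max(x, xs[0]), xs[-1])
    i = min(bisect_right(xs, x) - 1, len(xs) - 2)
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])

# Hypothetical delay table indexed by a single parameter.
xs = [1.0, 2.0, 4.0]
ys = [10.0, 20.0, 50.0]
wide_xs, wide_ys = expand_table(xs, ys, lo=0.0, hi=8.0)
clamped = lookup(xs, ys, 6.0)             # original table: stuck at the edge value
extended = lookup(wide_xs, wide_ys, 6.0)  # expanded table: interpolated value
```

With the original table, a query at 6.0 is clamped to the last entry; with the expanded table, the same interpolation-only lookup follows the slope of the last segment.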
Graphics processing units (GPUs) enable parallel computing for static timing analysis (STA). However, memory access and synchronization between cores become more difficult in STA, so its algorithm needs to be redesigned. In this work, we developed a CUDA-based STA engine that incorporates cell levelization and type sorting (CLTS), timing table restructuring (TTR), table indexing by texture (TIT), and hardware-accelerated rendering (HAR) for high parallelism. CLTS levelizes cells and sorts them by type so that the same timing library is accessed efficiently. TTR modifies the signal structure of a cell to increase throughput. TIT combines the axes of each table to access data jointly, while HAR expands look-up tables (LUTs) so that no extrapolation is needed. As a result, our fast CUDA-based STA engine shows an average 12.85X speedup over the CPU version on the experimental circuits. The proposed work outperforms PrimeTime by three orders of magnitude in speed.
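Cell levelization and type sorting can be sketched on the host side as a topological levelization followed by a per-level sort. The sketch below is a minimal illustration under assumptions, not the thesis's code: the netlist representation (`cells` mapping a cell name to its type, `fanin` mapping a cell name to its driver cells) and the function name `levelize_and_sort` are hypothetical.

```python
from collections import defaultdict, deque

def levelize_and_sort(cells, fanin):
    """Group cells into topological levels and sort each level by cell type.

    cells: dict mapping cell name -> cell type (e.g. "NAND2").
    fanin: dict mapping cell name -> list of driver cell names.
    Returns a list of levels; each level lists cell names ordered by type.
    """
    fanout = defaultdict(list)
    pending = {}
    for name in cells:
        drivers = [d for d in fanin.get(name, []) if d in cells]
        pending[name] = len(drivers)
        for d in drivers:
            fanout[d].append(name)

    # Cells without drivers inside the netlist start at level 0.
    level = {name: 0 for name, count in pending.items() if count == 0}
    queue = deque(level)
    processed = 0
    while queue:
        cur = queue.popleft()
        processed += 1
        for nxt in fanout[cur]:
            # A cell sits one level after its deepest driver.
            level[nxt] = max(level.get(nxt, 0), level[cur] + 1)
            pending[nxt] -= 1
            if pending[nxt] == 0:
                queue.append(nxt)
    if processed != len(cells):
        raise ValueError("netlist contains a combinational loop")

    depth = max(level.values(), default=-1) + 1
    levels = [[] for _ in range(depth)]
    for name in cells:
        levels[level[name]].append(name)
    for group in levels:
        # Same-type cells end up adjacent, so neighbouring workers
        # would read the same timing data.
        group.sort(key=lambda n: (cells[n], n))
    return levels

# Hypothetical five-cell netlist.
cells = {"u1": "NAND2", "u2": "INV", "u3": "NAND2", "u4": "NOR2", "u5": "INV"}
fanin = {"u4": ["u1", "u2"], "u5": ["u3"]}
levels = levelize_and_sort(cells, fanin)
```

Cells in one level have no dependencies on each other, so a level could be handed to parallel workers as a batch; sorting by type within the level keeps cells that share timing data next to each other.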
Contents
List of Tables
List of Figures
1 Introduction
2 Related Work
2.1 STA Overview
2.2 GPU Architecture
2.2.1 Hardware Model
2.2.2 Memory Model
2.2.3 Programming Model
3 Fast CUDA-Based Static Timing Analysis
3.1 Method
3.1.1 Overall Flow
3.1.2 Table Indexing by Texture (TIT)
3.1.3 Hardware-Accelerated Rendering (HAR)
3.1.4 Cell Levelization & Type Sorting (CLTS)
3.1.5 Timing Table Restructuring (TTR)
4 Experimental Results
4.1 Comparison between CUDA-based STA, CPU-based STA and PrimeTime
4.1.1 Runtime Result
4.1.2 Single-thread versus Multi-threads on GPU
4.1.3 CUDA-based versus CPU-based STA
4.1.4 CUDA-based STA versus PrimeTime
4.2 Application: Gate Sizing by Simulated Annealing
4.2.1 SA times
5 Conclusion
5.1 Conclusions
5.2 Future Work
Bibliography