National Digital Library of Theses and Dissertations in Taiwan

Detailed Record

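Graduate student: Kuan, Wei-Chieh (管偉傑)
Thesis title: Fast Early-phase Predictive Placer Using Cross-platform Acceleration Framework (利用跨平台加速框架實現快速初期預測性擺置器)
Advisor: Huang, Po-Tsang (黃柏蒼)
Oral defense committee: Huang, Po-Tsang (黃柏蒼); Chen, Hung-Ming (陳宏明); Liu, Chien-Nan (劉建男)
Oral defense date: 2022-03-08
Degree: Master's
Institution: National Yang Ming Chiao Tung University
Department: International College of Semiconductor Technology
Discipline: Engineering
Field: Electrical and computer engineering
Thesis type: Academic thesis
Publication year: 2022
Graduation academic year: 110 (ROC calendar)
Language: English
Number of pages: 48
Keywords: analytical placement (可解析擺置器); GPU-accelerated placement (GPU 加速的擺放演算法); CPU parallel technique (CPU 平行化技巧); cross-platform acceleration framework (跨平台加速的架構); sparse matrix format (稀疏矩陣格式)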
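As very-large-scale integrated circuits grow in scale, placement runtime rises accordingly. In addition, the design flow has to rerun the placement step to find the best locations for the cells, which means that design closure is increasingly hard to reach. Furthermore, macro locations are currently decided by hand, so a fast placer can provide an early-phase performance estimate.
In this thesis, we propose a cross-platform acceleration framework that coordinates the CPU and GPU with the help of the deep learning toolkit PyTorch. We apply SimPL, an analytical placement algorithm based on quadratic optimization, to this framework. We use the GPU's sparse library to accelerate the conjugate gradient method and CPU parallelization techniques to accelerate the remaining parts. We then combine the GPU and CPU through our cross-platform framework to realize the fast early-phase predictive placer.
In conclusion, our cross-platform acceleration framework speeds up the conjugate gradient algorithm, originally the bottleneck, by about 4.34x, and the overall flow by about 2.64x.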
As very-large-scale integration (VLSI) circuits grow, placement runtime grows with them. Moreover, the placement phase must be run repeatedly to find better cell locations, so design closure becomes harder to reach. In addition, macro locations are determined manually. A fast placer can therefore give an early-phase estimate of performance.
In this thesis, we develop a cross-platform acceleration framework that coordinates the CPU and GPU with the assistance of the deep learning toolkit PyTorch. On this framework we implement SimPL, an analytical placement algorithm based on quadratic optimization.
We leverage the GPU sparse library to accelerate the conjugate gradient method on the GPU and introduce CPU parallel techniques to accelerate the other parts. We then integrate these two parts into our cross-platform framework to implement the fast early-phase predictive placer.
In conclusion, with our cross-platform acceleration framework we obtain about a 4.34x speedup on the conjugate gradient part, which is the bottleneck of SimPL, and a 2.64x speedup over the original SimPL flow.
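The abstract names the conjugate gradient solve over a sparse matrix as the bottleneck of the quadratic placement step. The following is a minimal plain-Python sketch of that numerical kernel, not the thesis implementation (which, per the abstract, runs on the GPU through PyTorch and a sparse library). The CSR layout, the function names (`dense_to_csr`, `csr_matvec`, `conjugate_gradient`), and the toy three-cell netlist are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed CSR (compressed sparse row) layout: (values, column indices, row pointers).
CSR = Tuple[List[float], List[int], List[int]]


def dense_to_csr(dense: Sequence[Sequence[float]]) -> CSR:
    """Convert a small dense matrix to CSR, keeping only nonzero entries."""
    values: List[float] = []
    cols: List[int] = []
    row_ptr: List[int] = [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0.0:
                values.append(float(v))
                cols.append(j)
        row_ptr.append(len(values))
    return values, cols, row_ptr


def csr_matvec(a: CSR, x: Sequence[float]) -> List[float]:
    """Sparse matrix-vector product y = A x, the dominant cost inside CG."""
    values, cols, row_ptr = a
    return [
        sum(values[k] * x[cols[k]] for k in range(row_ptr[i], row_ptr[i + 1]))
        for i in range(len(row_ptr) - 1)
    ]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def conjugate_gradient(a: CSR, b: Sequence[float],
                       tol: float = 1e-10, max_iter: int = 1000) -> List[float]:
    """Solve A x = b for a symmetric positive-definite A, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x, equal to b because x starts at zero
    p = list(r)            # initial search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = csr_matvec(a, p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Toy example (hypothetical): three movable cells chained between fixed pads
# at x = 0 and x = 4 with unit net weights, giving a tridiagonal system Q x = b.
q = dense_to_csr([[2, -1, 0], [-1, 2, -1], [0, -1, 2]])
positions = conjugate_gradient(q, [0.0, 0.0, 4.0])
print(positions)
```

For this toy chain the cells should end up evenly spaced between the two pads, at approximately x = 1, 2, 3. Each iteration is dominated by one sparse matrix-vector product plus a few vector updates, which is the kind of work a GPU sparse library parallelizes well.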
Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
1 Introduction
1.1 Motivation
1.2 Contribution
1.3 Thesis Organization
2 Previous and Related Work
2.1 Analytical Placement
2.1.1 Application of Placement
2.1.2 Global Placement
2.1.3 Overlap Reducing Technique
2.1.4 Integration of the Wirelength Models and Overlap Reduction Techniques
2.2 SimPL
2.2.1 Bound2Bound Wirelength Model
2.2.2 Quadratic Optimization
2.2.3 Overlap Reducing Technique
2.3 DREAMPlace
2.3.1 Overview of ePlace/RePlAce Algorithm
2.3.2 Similarities Between Analytical Placement and Deep Learning
2.3.3 Summary
3 Cross-platform Acceleration Framework
3.1 Modeling SimPL on Deep Neural Network and Classification ops to CPU/GPU
3.1.1 SimPL Algorithm Flow
3.1.2 GPU Architecture and Deep Learning Toolkit Architecture
3.1.3 Classification of ops to CPU/GPU
3.1.4 Modeling Forward and Backward Propagation
3.2 Analysis of acceleration on individual technique
3.2.1 Data Type Transformation on Cross-platform
3.2.2 Acceleration of Conjugate Gradient
3.2.3 Analysis on Matrix Construction
3.2.4 Analysis on Overlap Reduction Technique
3.3 Summary
4 Experimental Result
4.1 Visualization of Placement
4.2 Single-thread SimPL
4.3 Multi-thread SimPL
4.4 Our Proposed Framework
4.5 Pie Chart and Bar Chart
4.6 Summary
5 Conclusion and Future work
5.1 Conclusion
5.2 Future Work
References