National Digital Library of Theses and Dissertations in Taiwan

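Graduate student: Tu, Chiao-Yun (杜巧韻)
Thesis title: Run-Time Frequency Scaling of On-Chip Networks for GPGPU
Thesis title (Chinese): 通用圖形處理器內晶片網路之動態頻率調整機制
Advisor: King, Chung-Ta (金仲達)
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Computer Science
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2013
Academic year of graduation: 101 (ROC calendar)
Language: English
Number of pages: 43
Keywords: GPGPU; Network-on-Chip; Dynamic Frequency Scaling; Many-to-Few-to-Many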
Modern general-purpose GPUs (GPGPUs) provide orders of magnitude more computing power than general-purpose processors (CPUs) for highly parallel applications. Because GPGPU traffic on the network-on-chip (NoC) behaves considerably differently from CPU traffic, conventional interconnection network designs for CPUs are not well suited to GPGPUs. This thesis proposes a run-time dynamic frequency scaling (DFS) mechanism that meets the bandwidth demands of different applications by tuning the network frequency in response to the network load.

We first investigate the characteristics of GPGPU traffic and classify the traffic patterns into three types. Under each type, the request network and the reply network require different bandwidth because their loads differ. Second, we leverage this property to regulate the network frequency dynamically: we monitor a subset of the shader cores, predict the network load, and adjust the frequency according to the traffic type. Evaluation shows that this dynamic frequency tuning design improves performance by up to 27% over the baseline setting (7.4% on average).
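The monitor, predict and adjust loop described in the abstract can be illustrated with a minimal sketch. It is not the thesis's algorithm: the frequency levels, the load thresholds, the counter names and the flits-per-cycle load estimate are all hypothetical choices made here for illustration. The only ideas taken from the abstract are that a subset of shader cores is sampled over a time window and that the request and reply networks are tuned separately.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical frequency levels (MHz) and load thresholds; not from the thesis.
FREQ_LEVELS_MHZ = (350, 700, 1050)
LOW_LOAD = 0.2   # flits/cycle/core below which the lowest level is chosen
HIGH_LOAD = 0.6  # flits/cycle/core at or above which the highest level is chosen


@dataclass
class WindowSample:
    """Counters read from the monitored shader cores over one time window."""
    request_flits: int    # flits injected into the request network
    reply_flits: int      # flits received from the reply network
    cycles: int           # window length in network cycles
    monitored_cores: int  # number of shader cores sampled


def per_core_load(flits: int, sample: WindowSample) -> float:
    """Average flits per cycle per monitored core, used as the load estimate."""
    return flits / (sample.cycles * sample.monitored_cores)


def pick_frequency(load: float) -> int:
    """Map an estimated load to one of the discrete frequency levels."""
    if load < LOW_LOAD:
        return FREQ_LEVELS_MHZ[0]
    if load < HIGH_LOAD:
        return FREQ_LEVELS_MHZ[1]
    return FREQ_LEVELS_MHZ[2]


def next_frequencies(sample: WindowSample) -> Tuple[int, int]:
    """Return (request_mhz, reply_mhz) to apply in the next window."""
    request_mhz = pick_frequency(per_core_load(sample.request_flits, sample))
    reply_mhz = pick_frequency(per_core_load(sample.reply_flits, sample))
    return request_mhz, reply_mhz


# A window in which the reply network is far busier than the request network.
sample = WindowSample(request_flits=400, reply_flits=3200,
                      cycles=1000, monitored_cores=4)
print(next_frequencies(sample))
```

In this example the request network is lightly loaded and the reply network heavily loaded, so the sketch picks the lowest level for the former and the highest for the latter. A real controller would also have to account for the scaling overhead and the window size, which the thesis treats in separate sections.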
1 Introduction
2 Background
2.1 Baseline GPGPU Architecture
2.2 Characteristics of GPGPU Applications
2.2.1 Many-to-Few-to-Many Traffic Pattern
2.2.2 Characteristic of GPGPU Traffic Pattern
3 Methodology
3.1 DFS Policy Overview
3.1.1 Data collection
3.1.2 Parameter computation
3.1.3 Dynamic Frequency Control Algorithm
3.2 Frequency Tuning Rationale
3.2.1 Ideal Frequency Tuning Technique
3.2.2 Practical Frequency Tuning Technique
3.3 Scaling Overhead and Hardware Implementation
3.3.1 Scaling Overhead
3.3.2 Hardware Implementation
4 Experimental Evaluation
4.1 Simulation Setup
4.2 Evaluation Result
4.2.1 Network Limit Exploration
4.2.2 Core Monitor Exploration
4.2.3 Time Window Size
4.2.4 Performance Result
4.2.5 Power Consumption
4.2.6 Performance Gain by Ideal DFS Mechanism
5 Related Work
6 Conclusion and Future Work
6.1 Conclusion
6.2 Future Work