National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)


Detailed record

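Author: Ling-Yan Du (杜領諺)
Title (Chinese): 增進效能的超執行緒指令排程機制設計與實現
Title (English): Design of Instructions Scheduling Mechanism in Hyper-Threading Architecture for Improving Performance
Advisor: Jih-ching Chiu (邱日清)
Degree: Master's
Institution: National Sun Yat-sen University (國立中山大學)
Department: Department of Electrical Engineering (graduate program)
Discipline: Engineering
Field: Electrical engineering and computer science
Thesis type: Academic thesis
Year of publication: 2004
Academic year of graduation: 92 (ROC calendar)
Language: English
Number of pages: 58
Keywords (translated from Chinese): parallel processing; instruction scheduling
Keywords (English): ILP; scheduling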
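Abstract (translated from the Chinese): In a microprocessor system, instruction-level parallelism (ILP) is the main factor that determines system performance. Making the instruction scheduling mechanism more complex in order to raise ILP also raises the hardware cost considerably. To keep that cost down, current processor architectures issue instructions from multiple queues. With this scheduling method, several adjacent, mutually dependent instructions can block the ready instructions queued behind them from reaching the execution units, so the execution units are not fully utilized. In an architecture that can run several threads at once, the instructions in the scheduler queue have more parallelism than those of a single thread. If the probability of issuing adjacent, dependent instructions can be reduced, the utilization of the execution units rises. We therefore propose a scheduling mechanism called the priority-scheduling buffer to replace the original scheduler queue of each execution unit. The mechanism uses the dependences among the instructions accumulated in the scheduler queue to partition it virtually into multiple scheduler queues, so that instructions can be issued from different virtual scheduler queues. This reduces the consecutive issue of dependent instructions and raises the utilization of the execution units. Simulation results on SPEC CINT2000, with the Intel Pentium 4 as the baseline architecture, show that as the scheduler queue holds more instructions, the mechanism improves performance by 7.14% over the original scheduler queue when five threads execute simultaneously.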
In a microprocessor system, exploiting ILP is key to improving performance. As the instruction scheduling mechanism is made more complex to exploit ILP more efficiently, however, the hardware cost grows. Current processors therefore adopt multiple scheduler queues to issue instructions, which keeps the hardware cost down. But this scheduling mechanism can issue dependent instructions in succession, which leaves the execution units underutilized. In the Hyper-Threading architecture, the instructions in the scheduler queue have a high degree of parallelism. If we can decrease the probability of issuing dependent instructions in succession, the utilization of the execution units will rise. In this paper, we propose a scheduling mechanism called the priority-scheduling buffer to replace the original scheduler queues. The mechanism divides an original scheduler queue into multiple virtual scheduler queues according to the dependences among instructions: instructions that depend on one another are dispatched into the same virtual scheduler queue, and instructions can be issued from the heads of different virtual scheduler queues. This reduces the probability of issuing dependent instructions in succession. Our simulation uses SPEC CINT2000 with the Intel Pentium 4 as the baseline architecture. With five threads executing simultaneously, performance increases by 7.14% on average compared with the original scheduler queue.
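The virtual-queue idea described in the abstract can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not the thesis's hardware design: the names `Instr`, `partition_by_dependence`, and `cycles_to_drain` are hypothetical, every instruction has a one-cycle latency, each destination register is written only once (as after register renaming), and an instruction joins the virtual queue of the first of its source operands that has a producer in the buffer. The thesis's dependence tags, dynamic queue adjustment, and deadlock handling are not modeled.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: one destination, zero or more sources."""
    name: str
    dest: str
    srcs: tuple = ()


def partition_by_dependence(instrs):
    """Split a program-ordered instruction list into virtual queues.

    An instruction joins the queue that holds the producer of its first
    in-buffer source register; otherwise it starts a new virtual queue.
    """
    queues = []
    producer_queue = {}  # register -> index of the queue that produces it
    for ins in instrs:
        idx = next((producer_queue[s] for s in ins.srcs if s in producer_queue), None)
        if idx is None:
            idx = len(queues)
            queues.append(deque())
        queues[idx].append(ins)
        producer_queue[ins.dest] = idx
    return queues


def cycles_to_drain(queues, width):
    """Count cycles needed to issue everything, at most `width` per cycle.

    Each queue issues strictly in order from its head and stalls at the
    first instruction whose source is still pending. Results become
    visible to consumers in the cycle after issue.
    """
    queues = [deque(q) for q in queues]  # work on copies
    pending = {ins.dest for q in queues for ins in q}
    cycles = 0
    while any(queues):
        done = []
        for q in queues:
            while q and len(done) < width and not (set(q[0].srcs) & pending):
                done.append(q.popleft().dest)
        pending.difference_update(done)
        cycles += 1
    return cycles


# Two independent dependence chains, e.g. from two different threads.
program = [
    Instr("a1", "r1"), Instr("a2", "r2", ("r1",)), Instr("a3", "r3", ("r2",)),
    Instr("b1", "r4"), Instr("b2", "r5", ("r4",)), Instr("b3", "r6", ("r5",)),
]

single_fifo_cycles = cycles_to_drain([deque(program)], width=2)
virtual_queue_cycles = cycles_to_drain(partition_by_dependence(program), width=2)
print(single_fifo_cycles, virtual_queue_cycles)
```

In this toy example, the single in-order queue stalls behind each dependent instruction of the first chain, while the two virtual queues let both chains advance in the same cycle, so the partitioned buffer drains in fewer cycles. The numbers say nothing about the 7.14% figure reported in the thesis, which comes from its own simulator and benchmarks.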
Abstract (Chinese)
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 The Problems of Instruction Window
1.2 Motivations and Purposes
1.3 Organization of This Thesis
Chapter 2 Survey
2.1 Summary of MTA
2.1.1 MTA
2.1.2 Hyper-Threading Technology
2.2 Relative Research and Technology
2.2.1 First-Use Issue Logic
2.2.2 Palacharla’s Dependence-Based FIFO Schedulers
2.2.3 LeBeck’s WIB Scheduler
Chapter 3 Design of the Priority-Scheduling Buffer
3.1 The Concept of Priority-Scheduling Buffer
3.1.1 Dynamic Adjusting Virtual Scheduler Queue
3.2 The Architecture of Priority-Scheduling Buffer
3.2.1 The Tag of Dependence
3.2.2 The Procedure of Operations in our Scheduler
3.2.3 Deadlock Free
3.3 The Hardware of Priority-Scheduling Buffer
Chapter 4 Simulation and Analysis
4.1 Simulation Environment
4.1.1 Simulator
4.1.2 Benchmark Programs
4.2 The Result of Simulation and Analysis
4.2.1 The Effect of Buffer Size and Number of Threads
4.2.2 The Effect of Retirement
4.2.3 The Effect of Number of ALU
4.3 Compare with the other Scheduling Strategies
4.4 Verification
Chapter 5 Conclusion
Reference