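Graduate student: Chia-Hung Shih (石佳弘)
Thesis title: Design and Implementation of the Internal Connection Network for a Superscalar Microprocessor (超純量微處理機之內部聯結網路設計與實作)
Advisor: Hsin-Chou Chi (紀新洲)
Degree: Master's
Institution: National Dong Hwa University (國立東華大學)
Department: Department of Computer Science and Information Engineering (資訊工程學系)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 1999
Graduation academic year: 87 (ROC calendar)
Language: Chinese
Number of pages: 78
Keywords: chip; transistor; instruction-level parallelism; pipelining; throughput; register renaming; result switching network; functional unit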
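Steady, rapid advances in electronics and process technology have increased the number of transistors that fit on a chip and have greatly increased signal propagation speed. With an abundance of transistors available, computer architecture has ample room to develop. In the design of pipelined microprocessors, raising instruction-level parallelism (ILP) improves execution performance. From a computer architect's point of view, if several instructions can be issued in each clock cycle and executed simultaneously, the instruction throughput of the microprocessor increases. The superscalar mechanism adds dynamic scheduling circuitry, namely reservation stations and a reorder buffer (reorder associative buffer), together with multiple functional units, so that several instructions can execute at once, and it uses techniques such as register renaming to resolve the hazards caused by out-of-order execution. The hardware added to raise ILP occupies a large amount of chip area.
The result switching network is the bridge between the functional units and the reservation stations and reorder buffer: it carries the results produced by the functional units to the reservation stations and the reorder buffer. It therefore plays a key role in overall system performance. The main topic of this thesis is how to design a high-performance internal connection network, that is, the result switching network. The thesis discusses how to speed up arbitration and data transfer and how to adjust the arbitration priority weights dynamically. It also considers low-power circuit design, examined at the computer architecture level. Finally, it describes the implementation process and the related verification and simulation tests, and presents the experimental data obtained from them.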

With the rapid advance of VLSI technology, microprocessor performance has improved significantly in recent years. The improvement stems from both the reduction in feature size and the increased number of transistors that fit on a chip. The reduced feature size provides faster raw circuit speed. Furthermore, according to Moore's Law, the transistor count on a chip doubles every one and a half years. Hence, more devices are available to enhance the VLSI architecture.
Computer architecture benefits from the rapid improvement of VLSI technology. Pipelining is a typical technique that employs more transistors to improve microprocessor performance significantly. With pipelining, the execution of different instructions can be overlapped. The potential overlap among instructions is called instruction-level parallelism (ILP). ILP can be further exploited with a superscalar approach. In a superscalar microprocessor, there can be multiple functional units of the same kind, and multiple instructions can be issued concurrently. A superscalar microprocessor typically includes hardware modules that dynamically schedule instructions that are ready to be issued. These modules include the reservation stations and the reorder buffer. Some superscalar microprocessors support out-of-order execution, which implies out-of-order completion and requires a register renaming mechanism.
In this thesis, we focus on the result switching network (or internal connection network), which delivers the results produced by the functional units to all the reservation stations and the reorder buffer. The result switching network is therefore a critical element of a superscalar microprocessor and strongly affects its performance. Our proposed design aims to maximize the performance and reduce the power consumption of the result switching network. It uses a multiple-bus scheme, which fits the superscalar architecture. By allocating multiple bus resources to different functional units, it improves the effective bandwidth of the result switching network. Furthermore, the proposed circuit design speeds up the arbitration process. In our design, the circuit signals become stable as early as possible so that power-consuming glitches are mitigated. We have implemented the result switching network in VLSI. The results show that the circuit operates at a 133 MHz clock rate. The gate count of the circuit is 27,000 and the maximum delay is 4.99 ns.
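The multiple-bus idea in the paragraph above can be illustrated with a small behavioral model. The sketch below is not the circuit from the thesis; it assumes a fixed priority order and a hypothetical function name `arbitrate`, and it only shows how several functional units competing for a smaller number of result buses might be granted in one cycle.

```python
from typing import Dict, List


def arbitrate(requests: List[bool], priority: List[int], num_buses: int) -> Dict[int, int]:
    """Grant up to num_buses result buses to requesting functional units.

    requests[i] is True when functional unit i has a result to deliver.
    priority lists unit indices from highest to lowest priority (assumed fixed here).
    Returns a mapping {unit index: bus index} for the units granted this cycle.
    """
    grants: Dict[int, int] = {}
    for unit in priority:
        if len(grants) == num_buses:
            break  # every bus is already taken this cycle
        if requests[unit]:
            grants[unit] = len(grants)  # hand out the next free bus
    return grants


# Three of four units request a bus, but only two buses exist.
print(arbitrate([True, False, True, True], priority=[0, 1, 2, 3], num_buses=2))
# prints {0: 0, 2: 1}
```

In this model, unit 3 loses the arbitration and would have to request again in a later cycle; how the actual design handles such results is not stated in the abstract.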
The performance of the result switching network can be further improved by dynamically tuning the arbitration priority. Tuning the arbitration priority can also reduce the power consumption of the result switching network. Finally, we describe the verification of our design and show how it works.
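The abstract does not say how the arbitration priority is tuned. As a rough illustration of dynamic priority in general, the sketch below uses a simple rotating scheme, chosen here as an assumption: units that win a bus move to the back of the priority order, so a persistent requester cannot starve the others. The name `arbitrate_and_rotate` is hypothetical.

```python
from typing import List, Tuple


def arbitrate_and_rotate(requests: List[bool], priority: List[int],
                         num_buses: int) -> Tuple[List[int], List[int]]:
    """Grant buses in priority order, then demote the winners.

    Returns (granted units, updated priority order). This rotating policy
    is an illustrative assumption, not the scheme used in the thesis.
    """
    # Requesting units in current priority order, limited to the bus count.
    granted = [u for u in priority if requests[u]][:num_buses]
    # Units that were not granted keep their relative order at the front.
    remaining = [u for u in priority if u not in granted]
    return granted, remaining + granted


priority = [0, 1, 2, 3]
requests = [True, True, True, False]  # units 0, 1, 2 request every cycle
for cycle in range(3):
    granted, priority = arbitrate_and_rotate(requests, priority, num_buses=1)
    print(cycle, granted, priority)
```

With a single bus and three persistent requesters, units 0, 1, and 2 are granted in successive cycles rather than unit 0 winning every time.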

1. Introduction
1.1. Research motivation
1.2. Research objectives
1.3. Research method
1.4. Research results
1.5. Thesis organization
2. Internal organization of the microprocessor
2.1. Relationships among the internal components of the microprocessor
2.2. Pipelining and hazards
2.2.1. Pipelining
2.2.2. Hazards
2.3. Highly parallel instruction execution and dynamic scheduling
2.4. Superscalar microprocessors
3. Result switching network and arbitration decisions
3.1. Functions of the result switching network
3.2. Types of result switching network
3.2.1. Time-shared common bus result switching network
3.2.2. Crossbar bus result switching network
3.2.3. Multistage switching network
3.2.4. Multiple-bus result switching network
3.3. Types of arbiter architecture
3.3.1. Serial arbitration procedure
3.3.2. Parallel arbitration logic
3.3.3. Dynamic arbitration algorithms
4. Design and implementation of the result switching network for a superscalar microprocessor
4.1. NSC98 result switching network structure and specifications
4.2. Input buffer design of the result switching network
4.3. Arbiter organization design of the result switching network
4.3.1. Group arbitration design of the arbitration circuit
4.3.2. Bus arbitration design of the arbitration circuit
4.4. Switching circuit design of the result switching network
4.5. Advanced study of the arbitration circuit of the result switching network
4.5.1. Arithmetic logic unit arbitration module
4.5.2. Floating-point unit arbitration module
4.5.3. Branch, load/store, and multimedia instruction unit arbitration module
4.5.4. Multimedia instruction unit arbitration module
4.6. Arbitration priority decision of the result switching network
4.7. Implementation and testing of the result switching network
4.7.1. Computer simulation verification platform
4.7.2. Verification items
4.7.3. Implementation test and simulation data
5. Conclusion
References

This thesis received the Li-Ching (立青) Research Paper Scholarship.
