National Digital Library of Theses and Dissertations in Taiwan

Author: Wu-an Kuo (郭武安)
Title: Synthesis Techniques for High-Performance and Low-Power Processor Designs (適合高效能及低功率處理器設計的合成技術)
Advisors: Allen C.-H. Wu (吳中浩), Ting-Ting Hwang (黃婷婷)
Degree: Doctoral
Institution: National Tsing Hua University
Department: Department of Computer Science
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2006
Graduation academic year: 94 (ROC calendar)
Language: English
Number of pages: 90
Keywords: High performance; Low power; Processor; Crosstalk; Compiler; Instruction bus; Instruction decoder; Multiplier
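Advances in integrated-circuit process technology keep shrinking device sizes. As devices shrink, more and more of them can be placed on a single chip. To make the large number of components on a chip work together, processor-based design architectures have become mainstream. In processor design, performance and power consumption are the two main issues, and both are closely tied to how an instruction is completed. Completing an instruction involves three main steps: fetch, decode, and execute. We propose methods for each of these three steps to improve processor performance and power consumption.
First, for the fetch step: process scaling has brought neighboring wires ever closer together, and the resulting crosstalk makes wire delay a serious problem. We observe that the data transmitted on the instruction bus can be controlled at compile time, so we propose two compiler algorithms, instruction rescheduling and register renaming, that eliminate crosstalk effects on the instruction bus and thereby improve its performance.
Second, for the decode step, we observe that instructions are not executed with equal probability. This means that, most of the time, the full instruction decoder is not needed to decode an instruction. By analyzing program execution sequences, we partition the instruction decoder into several sub-decoders. Most of the time only one sub-decoder is active, which reduces power consumption.
Finally, for the execute step, multiplication instructions have always been among the most power-hungry and longest-running instructions in an instruction set. Based on a dual-multiplier structure with configurable width, we propose a method for generating multiplication instructions that reduces the power consumption of application-specific instruction-set processors. The method analyzes the execution order of multiplication instructions and the effective bit-widths of multiplication operands to reduce the power wasted on redundant bits.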
Significant advances in VLSI process technology have scaled down the feature size. As transistors shrink, more and more of them can be integrated into a single chip. To make the large number of components on a chip work together, processor-based design methods have become one of the best choices. In processor design, power consumption and performance are two important issues, and both are related to the execution of an instruction. Executing an instruction involves three main steps: fetching, decoding, and executing. We propose techniques to improve performance and power in each of these three steps.
First, for the instruction fetch step, advances in process technology have shortened the distances between wires, so crosstalk now seriously affects wire delay. Since the data sequences on an instruction bus are known at compile time, we present two compiler algorithms, instruction rescheduling and register renaming, that improve performance by eliminating crosstalk effects on the instruction bus.
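As a rough illustration of why compile-time reordering can matter for crosstalk, the Python sketch below counts adjacent bus wires that switch in opposite directions between consecutive instruction words, then searches for the ordering with the fewest such events. The 4-bit bus width, the cost model, the function names, and the assumption that every instruction in the group is independent (so any order is legal) are illustrative choices, not details taken from the thesis.

```python
from itertools import permutations

WIDTH = 4  # hypothetical bus width, kept small for readability


def opposing_pairs(prev: int, curr: int, width: int = WIDTH) -> int:
    """Count adjacent wire pairs that switch in opposite directions."""
    rising = ~prev & curr    # wires going 0 -> 1
    falling = prev & ~curr   # wires going 1 -> 0
    pair_mask = (1 << (width - 1)) - 1  # one bit per adjacent pair (i, i+1)
    up_down = rising & (falling >> 1)   # wire i rises, wire i+1 falls
    down_up = falling & (rising >> 1)   # wire i falls, wire i+1 rises
    return bin((up_down | down_up) & pair_mask).count("1")


def sequence_cost(words, width: int = WIDTH) -> int:
    """Total opposing transitions over a sequence of bus words."""
    return sum(opposing_pairs(a, b, width) for a, b in zip(words, words[1:]))


def best_order(words, width: int = WIDTH):
    """Brute-force the cheapest ordering.

    Assumes the instructions are mutually independent, so every
    permutation is a legal schedule (an assumption of this sketch).
    """
    return list(min(permutations(words), key=lambda p: sequence_cost(p, width)))
```

For the words `0b0101`, `0b1010`, `0b0000`, the original order has three opposing transitions, while an order that places the all-zero word between the two complementary words has none. A real scheduler would also have to respect data dependencies, and the brute-force search is only practical for small groups.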
Second, for the instruction decode step, we found that the execution frequency of instructions is uneven. This means that, most of the time, the whole instruction decoder is not needed to decode an instruction. By tracing program execution sequences, we decompose the instruction decoder into several coupled sub-decoders. Most of the time, only one sub-decoder is activated, so power consumption is reduced.
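The idea of serving most instructions from one small sub-decoder can be sketched with a frequency split over an execution trace. The Python below is a minimal sketch under assumptions: it uses a simple "most frequent opcodes go to the hot sub-decoder" rule with only two sub-decoders, which is not necessarily the decomposition method of the thesis, and the opcode names, `hot_size` parameter, and function names are hypothetical.

```python
from collections import Counter


def split_by_frequency(trace, hot_size):
    """Put the `hot_size` most frequent opcodes in a 'hot' sub-decoder."""
    counts = Counter(trace)
    hot = {op for op, _ in counts.most_common(hot_size)}
    cold = set(counts) - hot
    return hot, cold


def hot_coverage(trace, hot):
    """Fraction of executed instructions decoded by the hot sub-decoder."""
    return sum(op in hot for op in trace) / len(trace)


def decoder_switches(trace, hot):
    """Times consecutive instructions need different sub-decoders."""
    return sum((a in hot) != (b in hot) for a, b in zip(trace, trace[1:]))
```

Coverage estimates how often only the small sub-decoder needs to be active, and the switch count hints at the cost of moving between sub-decoders, which is one reason a trace (rather than bare frequencies) is useful input.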
Finally, for the instruction execute step, multiplication is the most power-consuming operation in an instruction set. Based on a dual-&-configurable-multiplier structure, our proposed method devises a multiplication instruction set for low-power ASIPs (application-specific instruction-set processors). The method exploits the execution sequences of multiplication instructions and the effective bit-widths of variables to reduce the power consumed by redundant multiplication bits.
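To make "effective bit-width" concrete, the sketch below computes the minimum two's-complement width of an operand and uses it to pick between a narrow and a wide multiplier. The 8-bit threshold, the two-way narrow/wide choice, and the function names are assumptions for illustration only; the abstract does not specify how the dual-&-configurable-multiplier structure is organized.

```python
def effective_width(value: int) -> int:
    """Minimum two's-complement width that holds `value`, sign bit included."""
    if value >= 0:
        return value.bit_length() + 1
    return (~value).bit_length() + 1


def choose_multiplier(a: int, b: int, narrow_bits: int = 8) -> str:
    """Pick the narrow unit when both operands fit in `narrow_bits` bits.

    `narrow_bits` is a hypothetical parameter of this sketch.
    """
    widest = max(effective_width(a), effective_width(b))
    return "narrow" if widest <= narrow_bits else "wide"
```

Bits above the effective width are only copies of the sign bit, so a multiplier that processes them does work that cannot change the result; routing small operands to a narrower unit is one way to avoid that wasted switching.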
ABSTRACT
CHAPTER 1 INTRODUCTION
CHAPTER 2 RELATED WORK
CHAPTER 3 PERFORMANCE-DRIVEN CROSSTALK ELIMINATION AT POST-COMPILER LEVEL
CHAPTER 4 DECOMPOSITION OF INSTRUCTION DECODER FOR LOW POWER DESIGNS
CHAPTER 5 A POWER-DRIVEN MULTIPLICATION INSTRUCTION-SET DESIGN METHOD FOR ASIPS
CHAPTER 6 CONCLUSIONS AND FUTURE WORK
REFERENCES
APPENDIX A PUBLICATION LISTS