( 您好!臺灣時間:2024/05/22 22:54
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::


研究生(外文):Chung-Lin Tang
論文名稱(外文):Code Generation for Complex Processors by Machine Learning
指導教授(外文):Jenq-Kuen LeeWei-Kuan Shih
外文關鍵詞:CompilersCode GenerationVLIWDSP ProcessorsMachine Learning
  • 被引用被引用:1
  • 點閱點閱:256
  • 評分評分:
  • 下載下載:40
  • 收藏至我的研究室書目清單書目收藏:1
本論文之研究主題為複雜架構處理器之編譯技術。此類複雜處理器,為達到現代嵌入式系統之高效能、低耗電的艱鉅需求,在架構設計上產生許多奇特之處:不規則之資料運算路徑設計、分散式暫存器、簡化之內部資料線路、等。這些設計反應在其指令集架構,對於編譯器之設計造成相當的挑戰。此論文中實作之部分是以Open Research Compiler (ORC) 開放原始碼編譯器為基礎。我們也將詳細介紹此論文中主題之複雜處理器,叫做Parallel Architecture Core (PAC)。
In this thesis, we describe a method of instruction scheduling and operand placement for complex processor architectures. Such processors possess irregular datapaths, multiple register banks, and incomplete internal connection networks, hindering the abilities of classical compilation techniques in generating efficient code. We discuss a port of the Open Research Compiler (ORC) to such a complex processor, a new VLIW DSP, called the Parallel Architecture Core (PAC). We examine PAC, observe and characterize the machine models of such architectures, and how they are different from contemporary processors. We find that for such architectures, the restrictions on operand transport,
represents a new class of machine resource models. Such resource models are due to incomplete interconnection networks in the design, making structural properties emerge in such processors. Using the PAC processor as an example,
we describe a machine learning method that formulates the problem state space as a storage mapping of data operands,
and generate code for the PAC by doing combined instruction scheduling and operand storage assignment. We evaluate our algorithm using a benchmark suite for DSP processors, and find our technique to obtain approximately 35% to 40% improvement over a prior simplistic algorithm.
Title Page
1 Introduction
1.1 Overview
1.2 Outline of This Thesis
2 The Open Research Compiler
2.1 Origins of ORC
2.2 Overview of ORC/Open64
2.3 Porting ORC
2.3.1 Target Information Descriptions
2.3.2 Code Expansion/Instruction Selection
3 The Parallel Architecture Core
3.1 The PAC VLIW DSP Processor
3.2 Clustered Design
3.3 Port Constrained Register Files
3.4 DSP Specific Features
3.5 ORC Porting Issues
3.5.1 Runtime Environment Registers
3.5.2 Other Porting Issues
4 Generating Code for PAC
4.1 Resources, Structure, and Parallelism
4.1.1 Operand Placement and Instruction Scheduling
4.1.2 Transport Resources
4.1.3 Structural Processors
4.1.4 Examination of Some Processors
4.2 Machine Learning: Simulated Annealing
4.2.1 Simulated Annealing
4.2.2 Cluster Partitioning of Operations
4.2.3 The Evaluation Function
4.3 Reformulation of the Problem
4.3.1 Optimizing Assignments: Transport Resources
4.3.2 Optimizing Assignments: Operand Locations
4.3.3 Register Allocation
4.4 The Simulated Annealing Algorithm
4.5 The Scheduler Algorithm
5 Evaluation
5.1 Benchmark Environment
5.2 DSPStone Results
6 Conclusion
6.1 Summary
6.2 Future Work
[1] The GNU Compiler Collection.
[2] Open Research Compiler for Itanium processor family.
[3] Open64 Compiler Tools.
[4] WHIRL Intermediate Language and Symbol Table Speci‾cation.
[5] Michael Chu, Kevin Fan, and Scott Mahlke. Region-based hierarchical operation
partitioning for multicluster processors. In PLDI '03: Proceedings of the ACM
SIGPLAN 2003 conference on Programming language design and implementa-
tion, pages 300{311, New York, NY, USA, 2003. ACM Press.
[6] Ron Cytron, Jeanne Ferrante, Barry K. Rosen, Mark N. Wegman, and F. Ken-
neth Zadeck. E±ciently computing static single assignment form and the control
dependence graph. ACM Trans. Program. Lang. Syst., 13(4):451{490, 1991.
[7] Analog Devices. Black‾n Processor Instruction Set Reference.
[8] J. R. Goodman and W.C. Hsu. Code scheduling and register allocation in large
basic blocks. In ICS '88: Proceedings of the 2nd international conference on
Supercomputing, pages 442{452, New York, NY, USA, 1988. ACM Press.
[9] Texas Instruments. TMS320C6000 CPU and Instruction Set Reference Guide.
[10] Krishnan Kailas, Ashok Agrawala, and Kemal Ebcioglu. CARS: A new code gen-
eration framework for clustered ILP processors. In HPCA '01: Proceedings of the
Seventh International Symposium on High-Performance Computer Architecture
(HPCA'01), page 133, Washington, DC, USA, 2001. IEEE Computer Society.
[11] Robert Kennedy, Sun Chan, Shin-Ming Liu, Raymond Lo, Peng Tu, and Fred
Chow. Partial redundancy elimination in SSA form. ACM Trans. Program. Lang.
Syst., 21(3):627{676, 1999.
[12] S. Kirkpatrick, C.D. Gerlatt Jr., and M.P. Vecchi. Optimization by simulated
annealing. Science, 220:671{680, 1983.
[13] Viktor S. Lapinskii, Margarida F. Jacome, and Gustavo A. De Veciana. Cluster
assignment for high-performance embedded VLIW processors. ACM Trans. Des.
Autom. Electron. Syst., 7(3):430{454, 2002.
[14] Rainer Leupers. Instruction scheduling for clustered VLIW DSPs. In IEEE
PACT, pages 291{300, 2000.
[15] S. A. Mahlke, D. C. Lin, W. Y. Chen, R. E. Hank, and R. A. Bringmann.
E®ective compiler support for predicated execution using the hyperblock. In
25th Annual International Symposium on Microarchitecture, 1992.
[16] Srinivas Mantripragada, Suneel Jain, and James C. Dehnert. A new framework
for integrated global local scheduling. In IEEE PACT, pages 167{, 1998.
[17] Emre Ä Ozer, Sanjeev Banerjia, and Thomas M. Conte. Uni‾ed assign and sched-
ule: a new approach to scheduling for clustered register ‾le microarchitectures.
In MICRO 31: Proceedings of the 31st annual ACM/IEEE international sympo-
sium on Microarchitecture, pages 308{315, Los Alamitos, CA, USA, 1998. IEEE
Computer Society Press.
[18] M. Pincus. A Monte Carlo method for the approximate solution of certain types
of constrained optimization problems. Oper. Res., 18:1225{1228, 1970.
[19] B. Ramakrishna Rau. Iterative modulo scheduling: an algorithm for software
pipelining loops. In MICRO 27: Proceedings of the 27th annual international
symposium on Microarchitecture, pages 63{74, New York, NY, USA, 1994. ACM
[20] Michael Bedford Taylor, Jason Kim, Jason Miller, David Wentzla®, Fae Gho-
drat, Ben Greenwald, Henry Ho®man, Paul Johnson, Jae-Wook Lee, Walter Lee,
Albert Ma, Arvind Saraf, Mark Seneski, Nathan Shnidman, Volker Strumpen,
Matt Frank, Saman Amarasinghe, and Anant Agarwal. The Raw Microprocessor:
A Computational Fabric for Software Circuits and General-Purpose Programs.
IEEE Micro, 22(2):25{35, 2002.
[21] V. Zivojnovic, J.M. Velarde, C. SchlÄager, and H. Meyr. DSPstone: A DSP-
oriented benchmarking methodology. In Proceedings of Signal Processing Appli-
cations & Technology, 1994.
第一頁 上一頁 下一頁 最後一頁 top