Graduate student (English name): Chung-Lin Tang
Thesis title (English): Code Generation for Complex Processors by Machine Learning
Advisors (English names): Jenq-Kuen Lee, Wei-Kuan Shih
Keywords (English): Compilers, Code Generation, VLIW, DSP Processors, Machine Learning
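The research topic of this thesis is compilation techniques for processors with complex architectures. To meet the demanding high-performance, low-power requirements of modern embedded systems, such processors exhibit many peculiarities in their architectural design: irregular datapaths, distributed registers, simplified internal data wiring, and so on. These designs are reflected in the instruction set architecture and pose considerable challenges for compiler design. The implementation in this thesis is based on the open-source Open Research Compiler (ORC). We also describe in detail the complex processor that is the subject of this thesis, the Parallel Architecture Core (PAC).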
In this thesis, we describe a method of instruction scheduling and operand placement for complex processor architectures. Such processors possess irregular datapaths, multiple register banks, and incomplete internal connection networks, which hinder classical compilation techniques from generating efficient code. We discuss a port of the Open Research Compiler (ORC) to one such complex processor, a new VLIW DSP called the Parallel Architecture Core (PAC). We examine PAC, observe and characterize the machine models of such architectures, and show how they differ from contemporary processors. We find that for such architectures, the restrictions on operand transport represent a new class of machine resource models. These resource models arise from incomplete interconnection networks in the design, which cause structural properties to emerge in such processors. Using the PAC processor as an example, we describe a machine learning method that formulates the problem state space as a storage mapping of data operands and generates code for the PAC by performing combined instruction scheduling and operand storage assignment. We evaluate our algorithm using a benchmark suite for DSP processors and find that our technique obtains approximately 35% to 40% improvement over a prior simplistic algorithm.
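As a rough illustration of the search the abstract describes, the following is a minimal sketch of simulated annealing over a storage mapping of operands to register banks. It is not the thesis's algorithm: the cost function (cross-bank copies plus a penalty for exceeding a per-bank capacity), the bank names, the cooling schedule, and all identifiers are assumptions chosen for illustration, and instruction scheduling is not modelled at all.

```python
import math
import random


def placement_cost(placement, ops, banks, capacity, overflow_penalty=3):
    """Toy evaluation function (an assumption, not the thesis's).

    ops is a list of (destination, [sources]) operations. Every extra bank
    touched by one operation counts as one cross-bank copy; operands beyond
    a bank's capacity are penalised.
    """
    copies = 0
    for dst, srcs in ops:
        used = {placement[dst]} | {placement[s] for s in srcs}
        copies += len(used) - 1
    assigned = list(placement.values())
    overflow = sum(max(0, assigned.count(b) - capacity) for b in banks)
    return copies + overflow_penalty * overflow


def anneal(ops, operands, banks, capacity,
           steps=5000, t0=2.0, alpha=0.999, seed=0):
    """Search operand-to-bank mappings with simulated annealing."""
    rng = random.Random(seed)
    # The state is a storage mapping: operand name -> bank name.
    state = {v: rng.choice(banks) for v in operands}
    cost = placement_cost(state, ops, banks, capacity)
    best, best_cost = dict(state), cost
    t = t0
    for _ in range(steps):
        # Neighbour move: reassign one randomly chosen operand.
        v = rng.choice(operands)
        old_bank = state[v]
        state[v] = rng.choice(banks)
        new_cost = placement_cost(state, ops, banks, capacity)
        delta = new_cost - cost
        # Always accept improvements; accept worse states with
        # probability exp(-delta / t) (Metropolis criterion).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = dict(state), cost
        else:
            state[v] = old_bank  # reject the move
        t *= alpha  # geometric cooling
    return best, best_cost


if __name__ == "__main__":
    # Hypothetical three-operation block: t2 = (a op b) op (c op d).
    ops = [("t0", ["a", "b"]), ("t1", ["c", "d"]), ("t2", ["t0", "t1"])]
    operands = ["a", "b", "c", "d", "t0", "t1", "t2"]
    banks = ["bank0", "bank1"]  # hypothetical bank names
    mapping, cost = anneal(ops, operands, banks, capacity=4)
    print(mapping, cost)
```

The capacity penalty is there only so that placing every operand in a single bank is not trivially optimal; a realistic evaluation function would instead estimate the schedule length of the generated code.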
Title Page
1 Introduction
1.1 Overview
1.2 Outline of This Thesis
2 The Open Research Compiler
2.1 Origins of ORC
2.2 Overview of ORC/Open64
2.3 Porting ORC
2.3.1 Target Information Descriptions
2.3.2 Code Expansion/Instruction Selection
3 The Parallel Architecture Core
3.1 The PAC VLIW DSP Processor
3.2 Clustered Design
3.3 Port Constrained Register Files
3.4 DSP Specific Features
3.5 ORC Porting Issues
3.5.1 Runtime Environment Registers
3.5.2 Other Porting Issues
4 Generating Code for PAC
4.1 Resources, Structure, and Parallelism
4.1.1 Operand Placement and Instruction Scheduling
4.1.2 Transport Resources
4.1.3 Structural Processors
4.1.4 Examination of Some Processors
4.2 Machine Learning: Simulated Annealing
4.2.1 Simulated Annealing
4.2.2 Cluster Partitioning of Operations
4.2.3 The Evaluation Function
4.3 Reformulation of the Problem
4.3.1 Optimizing Assignments: Transport Resources
4.3.2 Optimizing Assignments: Operand Locations
4.3.3 Register Allocation
4.4 The Simulated Annealing Algorithm
4.5 The Scheduler Algorithm
5 Evaluation
5.1 Benchmark Environment
5.2 DSPStone Results
6 Conclusion
6.1 Summary
6.2 Future Work
