Author (English): Jiun-You Chen
Thesis title (English): An Implementation of MIPS32 Floating-Point Co-processor with Rounding Mechanism
Chinese keywords: Rounding, FPALU, floating point, Count Leading Zero, floating-point co-processor
English keywords: Count Leading Zero, FPALU, Rounding
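With the rapid growth of multimedia applications, today's embedded handheld devices need more computing performance to handle complex computation, and the floating-point capability required by multimedia processing is especially important. In the past, most embedded systems had no hardware floating-point capability; when floating-point computation was needed, it could only be emulated with integer functions. This not only lengthens program execution time but also raises the total energy consumed to complete the work. This study therefore uses the Verilog hardware description language to design a floating-point co-processor that supports the complete set of MIPS32 floating-point instructions. The co-processor implements 52 floating-point instructions, including floating-point arithmetic and logic, branch, comparison, memory access, conversion, and move instructions, and complies with the IEEE 754 single-precision and double-precision standards. It contains a self-developed fast multi-cycle floating-point arithmetic logic unit (FPALU) and a fast rounding mechanism, together with a dedicated exception handling mechanism for the exception cases required by IEEE 754 and MIPS32. Together these speed up floating-point operations and thereby improve overall system performance.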
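During the design process, this study used the MIPS32 software simulator SPIM as the reference for verifying the functionality of the floating-point processor, and used the Verilog simulator Mentor Graphics ModelSim together with Novas nLint for functional verification and synthesizable-syntax checking. After integration with the MIPS32 integer processor developed in our laboratory, applications were compiled in the hardware simulation model using the previously developed simulated I/O library and the MIPS SDE Lite/GCC cross compiler, in combination with the Softfloat library, to produce test files with and without floating-point instructions. These files verify the functionality of the floating-point co-processor and yield the difference in execution time of the same program under hardware floating-point operation and software-emulated floating-point operation.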
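After functional verification passed, this study used Synopsys Design Compiler with the TSMC 0.13μm process technology to synthesize the RTL Verilog floating-point unit described above, implemented the design on an FPGA chip using the ARM Integrator development board, and carried out hardware/software co-verification to confirm that the hardware functions correctly.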
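The experimental results show that the floating-point co-processor developed in this study reaches an operating frequency of 113.3 MHz in the TSMC 0.13μm process technology. They also confirm that, in a computer system equipped with the hardware floating-point unit, programs that require floating-point operations gain more than 823% in performance, which demonstrates the correctness of the design and its practical value.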
With the rapid growth of multimedia applications, modern embedded handheld devices need more computing power to process complex problems, especially floating-point computation in multimedia workloads. In the past, most embedded systems did not contain a hardware floating-point unit, so applications that required floating-point capability had to emulate it with integer functions, which costs more execution time and energy. Accordingly, this study designs a hardware floating-point co-processor that supports the complete set of MIPS32 floating-point instructions. The co-processor implements 52 floating-point instructions, including floating-point arithmetic and logic, branch, comparison, memory access, conversion, and move instructions, and is compliant with the IEEE 754 single-precision and double-precision standards. It contains a fast multi-cycle floating-point arithmetic logic unit (FPALU) and a fast rounding mechanism. To handle the exception cases required by IEEE 754 and MIPS32, this study also proposes a dedicated exception handling mechanism. These mechanisms accelerate floating-point operations and improve overall system performance.
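The abstract names rounding and Count Leading Zero as key parts of the FPALU but does not show the design itself. The following is only a minimal software sketch of how the two steps usually fit together in IEEE 754 single precision: count leading zeros to normalize a raw significand, then apply round-to-nearest-even using a guard bit and a sticky bit. It is not the thesis's Verilog implementation; the function names (`clz`, `round_pack_single`, `mul_single`), the 64-bit working width, and the restriction to normal (non-subnormal, non-overflowing) results are all assumptions made for illustration.

```python
import struct

def clz(x: int, width: int) -> int:
    """Count leading zeros of x when viewed as a width-bit field."""
    return width - x.bit_length()

def round_pack_single(sign: int, exp: int, sig: int) -> int:
    """Encode (-1)**sign * sig * 2**exp as IEEE 754 single-precision bits.

    Uses round-to-nearest-even. Hypothetical helper: only zero and normal
    results are handled; subnormals and overflow raise OverflowError.
    """
    if sig == 0:
        return sign << 31
    width = 64                      # assumed working width of the datapath
    assert sig < (1 << width)
    # Normalize: shift left until the most significant 1 sits at bit 63.
    shift = clz(sig, width)
    sig <<= shift
    exp -= shift
    # Keep the top 24 bits (hidden bit + 23 fraction bits).
    kept = sig >> 40
    rest = sig & ((1 << 40) - 1)
    guard = rest >> 39                       # first discarded bit
    sticky = (rest & ((1 << 39) - 1)) != 0   # OR of all lower discarded bits
    exp += 40
    # Round to nearest; on an exact tie, round to the even significand.
    if guard and (sticky or (kept & 1)):
        kept += 1
        if kept == (1 << 24):       # rounding carried out of the significand
            kept >>= 1
            exp += 1
    biased = exp + 23 + 127
    if not 1 <= biased <= 254:
        raise OverflowError("subnormal or overflow result is outside this sketch")
    return (sign << 31) | (biased << 23) | (kept & 0x7FFFFF)

def float_to_bits(x: float) -> int:
    """Bit pattern of x after conversion to single precision."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def bits_to_float(bits: int) -> float:
    """Value of a single-precision bit pattern."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def unpack_single(bits: int):
    """Split a normal single into (sign, exponent of the LSB, 24-bit significand)."""
    sign = bits >> 31
    biased = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    return sign, biased - 127 - 23, frac | (1 << 23)

def mul_single(a_bits: int, b_bits: int) -> int:
    """Multiply two normal singles using integer operations only."""
    sa, ea, ma = unpack_single(a_bits)
    sb, eb, mb = unpack_single(b_bits)
    return round_pack_single(sa ^ sb, ea + eb, ma * mb)

# 2**24 + 1 lies exactly halfway between two singles; the tie goes to the even one.
print(hex(round_pack_single(0, 0, (1 << 24) + 1)))  # 0x4b800000
```

In this sketch the integer-only `mul_single` plays the role that a software floating-point library plays on a processor without an FPU, which is the cost the co-processor is meant to remove.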
During development, this study uses the MIPS32 software simulator SPIM as the golden model for verifying the functionality of the floating-point co-processor. The Verilog design is simulated with Mentor Graphics ModelSim and linted with Novas nLint to check its functional correctness and synthesizable syntax. After integrating the co-processor with the MIPS32 integer processor developed by our lab, this study uses the resulting simulation model to evaluate the performance difference between benchmarks with and without floating-point instructions. Both versions of each benchmark are compiled from the same application using our simulated I/O routines and the MIPS SDE Lite/GCC cross compiler, together with the Softfloat library.
After passing functional verification, the proposed RTL Verilog floating-point co-processor is synthesized with Synopsys Design Compiler using the TSMC 0.13μm technology library. The design is then downloaded into the FPGA of an ARM Integrator board for hardware/software co-verification to confirm its correctness.
According to the experimental results, the proposed floating-point co-processor achieves an operating frequency of 113.3 MHz in TSMC 0.13μm technology. The results also show that the speedup reaches 820% when the given benchmarks require floating-point operations and run on a computer system that includes the proposed co-processor. These results demonstrate the correctness and functionality of the designed floating-point co-processor.
