National Digital Library of Theses and Dissertations in Taiwan

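Author: 洪銘彥
Author (English): Ming-Yen Hong
Title: 運用於嵌入式訊號處理器之向量暫存器架構設計與模擬
Title (English): Vector Register Architecture Design and Simulation on Embedded DSP Processor
Advisor: 吳仁銘
Advisor (English): Jen-Ming Wu
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Electrical Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2008
Graduation academic year: 96 (ROC calendar)
Language: English
Number of pages: 73
Chinese keywords: register architecture (暫存器架構)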
Single Instruction Multiple Data (SIMD) processing is powerful in multimedia applications. On a conventional 32-bit machine, if one data unit is 8 bits wide, a single SIMD instruction can operate on four units at a time and thus reach a data parallelism of four. These data units are usually called subwords in SIMD processing. However, SIMD performance is limited when the subwords are poorly arranged in the register file. We therefore propose a new register file architecture, named the Vector Register (VR) architecture. With Vector Register, subwords can be permuted within the register file without generating heavy traffic between memory and the register file. We have designed three benchmarks based on H.264/AVC: matrix transposition, the deblocking filter, and the discrete cosine transform (DCT), and set up a well-defined simulation flow on the Starfish DSP (digital signal processor) simulator. The simulation results show that, on average, the cycle count and instruction count of these benchmarks improve by 30.865% and 30.606%, respectively.
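In multimedia processing, the nature of the data makes Single Instruction Multiple Data (SIMD) processing both effective and widely used. On a 32-bit machine, if one data unit is 8 bits wide, a single SIMD instruction can operate on four data units at once, raising the degree of parallelism to four. In SIMD processing these data units are usually called subwords. However, SIMD performance is often limited by how the subwords are arranged across registers. To solve this subword arrangement problem, we propose a new register architecture called the Vector Register Architecture. With it, subwords can be arranged and regrouped among registers more freely, without generating heavy data traffic between the registers and memory. To simulate and verify the performance of the Vector Register, we designed three benchmarks based on H.264/AVC, a new-generation video compression technology: matrix transposition, the deblocking filter, and the discrete cosine transform. We also designed a clear simulation flow for simulating the Vector Register Architecture. The results obtained with this flow show that the Vector Register Architecture effectively reduces both the number of cycles consumed (cycle count) and the number of instructions required (instruction count). On average, the Vector Register Architecture improves the cycle count by 30.865% and the instruction count by 30.606%.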
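As a rough illustration of the subword idea in the abstract, the sketch below emulates in plain Python a 32-bit word holding four 8-bit subwords, a lane-wise add that processes all four at once, and a 4x4 byte transposition across four such words. It is not the Starfish DSP instruction set or the thesis's Vector Register ISA, which this record does not describe; the names `pack4`, `unpack4`, `add4x8`, and `transpose4x4` are hypothetical and chosen only for this example.

```python
MASK32 = 0xFFFFFFFF  # keep results within one 32-bit word


def pack4(subwords):
    """Pack four 8-bit subwords into one 32-bit word (subwords[0] in the low byte)."""
    assert len(subwords) == 4 and all(0 <= s <= 0xFF for s in subwords)
    return (subwords[0] | (subwords[1] << 8)
            | (subwords[2] << 16) | (subwords[3] << 24))


def unpack4(word):
    """Split a 32-bit word back into its four 8-bit subwords."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]


def add4x8(a, b):
    """Add four 8-bit lanes at once, wrapping modulo 256 in each lane.

    The low 7 bits of every lane are added first so that no carry can
    cross into the neighbouring lane; the top bit of each lane is then
    fixed up with XOR.
    """
    low = (a & 0x7F7F7F7F) + (b & 0x7F7F7F7F)
    return (low ^ ((a ^ b) & 0x80808080)) & MASK32


def transpose4x4(rows):
    """Transpose a 4x4 byte matrix stored as four 32-bit words, one row per word.

    Done here by unpacking and repacking, i.e. the kind of subword
    rearrangement that costs extra instructions on a plain register file.
    """
    m = [unpack4(r) for r in rows]
    return [pack4([m[r][c] for r in range(4)]) for c in range(4)]


a = pack4([1, 200, 255, 16])
b = pack4([2, 100, 1, 240])
print(unpack4(add4x8(a, b)))

rows = [pack4([4 * r + c for c in range(4)]) for r in range(4)]
print([unpack4(w) for w in transpose4x4(rows)])
```

The add needs no rearrangement because the operands already line up lane by lane, whereas the transposition has to move every subword to a different word; that second case is the kind of permutation cost the abstract refers to.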
Contents
1 Introduction
1.1 Research Motivation
1.2 Organization of This Thesis
2 Starfish DSP Architecture
2.1 Introduction
2.2 Architecture of Starfish DSP
2.2.1 Register File Architecture
2.2.2 Pipeline Architecture
2.2.3 Instruction Set Architecture
2.3 Software Toolkit of Starfish DSP
2.3.1 Toolchain
2.3.2 Simulator
2.3.3 Debugger
3 H.264/AVC Video Coding
3.1 Introduction
3.1.1 Terminologies in H.264/AVC
3.1.2 H.264/AVC Encoder
3.1.3 H.264/AVC Decoder
3.2 Critical Functions in H.264/AVC
3.2.1 Discrete Cosine Transform (DCT)
3.2.2 Deblocking Filter
4 Vector Register Architecture
4.1 Introduction
4.2 Previous Studies and Works
4.3 Principle of Vector Register
4.4 Hardware Architecture of Vector Register
4.4.1 Register File
4.4.2 Status Flag
4.5 ISA of Vector Register
4.6 Issues Regarding Vector Register
4.6.1 Register Pressure
4.6.2 Pipeline Data Hazard Detection and Register Bypassing
5 Simulation
5.1 Introduction
5.1.1 Assumptions and Terminologies
5.2 An Overview of Simulation Flow
5.3 Modification of Starfish DSP Simulator
5.3.1 Principle of Starfish DSP Simulator
5.3.2 Implementations of VR Instructions
5.4 Composing Vector Register Benchmarks
5.4.1 Matrix Transposition
5.4.2 Deblocking Filter
5.4.3 DCT
5.5 Code Generation
5.6 Simulation Results
5.6.1 Matrix Transposition
5.6.2 Deblocking Filter
5.6.3 DCT
5.6.4 Summary
6 Conclusion