National Digital Library of Theses and Dissertations in Taiwan

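Author: Lin, Chih-Yuan (林志遠)
Thesis title: Research on the Performance Optimization of DSP Programs, with a Case Study of the H.264 Encoder
Thesis title (original): DSP程式效能最佳化研究-以H.264編碼器為範例
Advisors: Chen, Yen-Jen (陳延禎); Huang, Jr-Jen (黃植振)
Degree: Master's
Institution: Ming Chi University of Technology (明志科技大學)
Department: Graduate Institute of Electrical Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2010
Graduation academic year: 98 (ROC calendar; 2009-2010)
Language: Chinese
Number of pages: 59
Keywords: H.264; digital signal processor (DSP); TMS320DM6437; optimization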
This thesis studies performance optimization of programs running on a digital signal processor (DSP), using an H.264 video encoder implemented by our team as the case study.
The platform is the Texas Instruments (TI) DM6437 DSP development board, and the encoder is tested with the officially released H.264 video test sequences.
Our encoder was rewritten from the source code of JM8.0, the official H.264 reference encoder software. Because JM8.0 was written for x86 CPUs, we also ported JM8.0 to the DM6437 to measure how it performs on that platform. Feeding a QCIF (176x144) test sequence to three H.264 encoders on the DM6437, namely the JM8.0 port, the team-built version, and the enhanced version from this study, yields encoding rates of 0.16, 1.31, and 6.73 frames per second, respectively.
The optimization proceeds along two lines. The first is compiler optimization with TI Code Composer Studio (CCS) 3.3, which offers four levels, o0 to o3: o0 applies the least optimization and o3 the most, restructuring the code for software pipelining and parallel execution, although compiling at this level requires a large amount of memory on the DM6437 platform. The second is configuring the DM6437's internal cache so that frequently used code and data stay in cache as much as possible, or reside there permanently, which reduces the number of main-memory accesses the DSP makes.
This thesis analyzes how much compiler optimization and cache optimization each contribute to the overall improvement, using the highly complex H.264 encoder as the worked example.
Rewriting the bottlenecks in assembly language and using EDMA for further optimization should leave considerable room to improve H.264 encoding performance in future work.
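As a quick check on the frame rates quoted in the abstract, the following C sketch derives the speedups between the three encoders, the average time per frame, and the number of macroblocks in a QCIF frame. It is only an illustration: the function and macro names are hypothetical and do not come from the thesis. The inputs are the three fps figures and the QCIF resolution stated in the abstract, plus the 16x16 macroblock size defined by H.264.

```c
#include <assert.h>
#include <stdio.h>

/* Frame rates reported in the abstract (frames per second, QCIF input). */
#define FPS_JM80_PORT 0.16 /* JM8.0 ported to the DM6437 */
#define FPS_TEAM      1.31 /* team-built encoder */
#define FPS_ENHANCED  6.73 /* encoder after this study's optimizations */

/* QCIF luma dimensions and the H.264 macroblock size, in pixels. */
enum { QCIF_WIDTH = 176, QCIF_HEIGHT = 144, MB_SIZE = 16 };

/* Speedup of one encoder over another; both rates must be positive. */
double speedup(double before_fps, double after_fps)
{
    assert(before_fps > 0.0 && after_fps > 0.0);
    return after_fps / before_fps;
}

/* Average number of seconds spent on one frame at a given frame rate. */
double seconds_per_frame(double fps)
{
    assert(fps > 0.0);
    return 1.0 / fps;
}

/* Number of 16x16 macroblocks in one QCIF frame. */
int qcif_macroblocks(void)
{
    return (QCIF_WIDTH / MB_SIZE) * (QCIF_HEIGHT / MB_SIZE);
}

/* Hypothetical helper that prints the derived figures. */
void print_summary(void)
{
    printf("JM8.0 port -> team-built: %.2fx\n",
           speedup(FPS_JM80_PORT, FPS_TEAM));
    printf("team-built -> enhanced:   %.2fx\n",
           speedup(FPS_TEAM, FPS_ENHANCED));
    printf("JM8.0 port -> enhanced:   %.2fx\n",
           speedup(FPS_JM80_PORT, FPS_ENHANCED));
    printf("enhanced encoder: %.3f s per frame, %d macroblocks per frame\n",
           seconds_per_frame(FPS_ENHANCED), qcif_macroblocks());
}
```

With the reported figures, the team-built encoder is roughly 8.2 times faster than the JM8.0 port, the enhanced version is roughly 5.1 times faster than the team-built one, and the overall gain is about 42 times. These ratios follow from the abstract's numbers alone and say nothing about how much each optimization contributed.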
Advisor's Recommendation Letter for Master's Thesis, Ming Chi University of Technology
Oral Examination Committee Approval Certificate for Master's Thesis, Ming Chi University of Technology
Thesis Authorization Form, Ming Chi University of Technology
Authorization Form for Online Release of Electronic Thesis Files
Acknowledgments
Chinese Abstract
English Abstract
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Thesis Organization
Chapter 2 Video Compression
2.1 Prediction Coding
2.2 Transform Coding
2.3 Entropy Coding
2.4 The Team-Built H.264 Encoder
Chapter 3 Hardware and Software Development Environment of the DM6437 EVM Digital Signal Processor
3.1 TI Hardware Development Platform
3.1.1 Memory Architecture
3.2 TI Software Development Platform
3.2.1 Comparison and Use of DSP/BIOS and cmd Files
3.3 Experimental Setup
3.3.1 Hardware Experimental Environment
3.3.2 Software Experimental Architecture
Chapter 4 Code Optimization
4.1 C/C++ Compiler Options
4.2 CPU Cycle Count and Code Size
4.3 C/C++ Compilation Flow
4.4 Directives
4.5 Optimizing Memory Allocation
Chapter 5 Experiments and Analysis of Results
5.1 Computational Load Percentage Analysis of the Team-Built H.264 Encoder
5.1.1 Computational Load Percentage Analysis of Intra Prediction
5.1.2 Computational Load Percentage Analysis of Inter Prediction
5.1.3 Comparison and Analysis of Intra and Inter Prediction
5.2 Performance Analysis of Compiler Optimization
5.2.1 Code Size Improvement from Compiler Optimization
5.2.2 Program Performance Improvement from Compiler Optimization
5.2.3 Summary of Compiler Optimization
5.3 Analysis of the Effect of Memory Allocation on Performance
Chapter 6 Conclusions and Future Work
References
Appendix
A. DSP - Transform Program

List of Tables
Table 2.1 Quantization Step Table
Table 2.2 Example Data Coding Table
Table 2.3 Feature List of the Team-Built H.264 Encoder
Table 2.4 Inter Prediction Data Comparison (JM8.0 Test)
Table 3.1 C64x+ DSP Memory Configuration
Table 4.1 Fixed Configurable Cache Sizes
Table 5.1 Team-Built H.264 - Intra Feature List
Table 5.2 Team-Built H.264 - Intra Experimental Data (1 Frame)
Table 5.3 Team-Built H.264 - Intra Experimental Data (10 Frames)
Table 5.4 Team-Built H.264 - Intra Feature List
Table 5.5 Team-Built H.264 - Inter Experimental Data (1 Frame)
Table 5.6 Team-Built H.264 - Inter Experimental Data (10 Frames)
Table 5.7 Comparison of H.264 Compression Results
Table 5.8 Code Size Experimental Data
Table 5.9 Program Performance (CPU Cycle) Experimental Data
Table 5.10 H.264 Program Execution Time (fps)
Table 5.11 Internal Memory Allocation
Table 6.1 Image Test Results (100 Frames)

List of Figures
Figure 2.1 H.264 Video Compression Encoding Flowchart
Figure 2.2 Intra Luma 16×16 Prediction Modes
Figure 2.3 The Seven Block Partition Sizes Used in Inter Prediction
Figure 2.4 Picture Changes with Quantization Step
Figure 2.5 Basic Structure of a Baseline Profile Bitstream
Figure 3.1 DM6437 EVM Development Board
Figure 3.2 TI eXpress DSP
Figure 3.3 DM6437 EVM Internal Hardware Block Diagram
Figure 3.4 C64x+ Cache Architecture
Figure 3.5 Memory Allocation of the Example Program
Figure 3.6 Hardware Platform Interface Settings
Figure 3.7 Basic Program Development Flow in Code Composer Studio
Figure 3.8 Code Composer Studio 3.3
Figure 3.9 Consultant System Compile Menu
Figure 3.10 Consultant System Hints
Figure 3.11 Advanced Features Window
Figure 3.12 DSP/BIOS Configuration Environment
Figure 3.13 Basic cmd File Settings
Figure 3.14 Hardware Experimental Environment for Data Access
Figure 3.15 Hardware Experimental Environment for Real-Time Video
Figure 3.16 Software Experimental Architecture
Figure 4.1 TMS320C6000 Software Development Flow
Figure 4.2 Setting Build Options
Figure 4.3 Consultant System
Figure 4.4 Software Pipelining
Figure 4.5 Initial Performance Optimization Settings
Figure 4.6 C/C++ Compilation and Program Development Flowchart
Figure 4.7 Software Performance Optimization Flowchart
Figure 4.8 DM6437 Memory Allocation Diagram
Figure 4.9 DM6437 Hardware JP2 Memory Placement Block Settings
Figure 5.1 Foreman Test Frame
Figure 5.2 Coastguard Test Frame
Figure 5.3 Salesman Test Frame
Figure 5.4 H.264 - Intra Computational Load Percentages
Figure 5.5 H.264 - Inter Computational Load Percentages (1 Frame)
Figure 5.6 H.264 - Inter Computational Load Percentages (10 Frames)
Figure 5.7 CPU Cycle Comparison of Intra and Inter Prediction
Figure 5.8 Code Size Experimental Data Curve
Figure 5.9 Program Performance (CPU Cycle) Experimental Data Curve
Figure 5.10 H.264 Program Execution Time (fps) Curve
Figure 5.11 L1D Cache Execution Time Variation
Figure 5.12 L2 Cache Execution Time Variation
Figure 5.13 Memory Allocation Location Map