National Digital Library of Theses and Dissertations in Taiwan

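Graduate student: 蘇則仲
Graduate student (English name): Tse-Chung Su
Thesis title (Chinese): CUDA架構下針對低密度奇偶校驗碼為基礎之分散式編碼的近即時解碼設計
Thesis title (English): A Near Real Time Decoding for LDPC Based Distributed Video Coding Using CUDA
Advisor: 吳家麟
Oral defense committee: 陳宏銘, 陳維超, 鄭羽伸
Oral defense date: 2011-06-09
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Publication year: 2011
Academic year of graduation: 99 (ROC calendar, i.e. 2010-2011)
Language: English
Number of pages: 52
Keywords (translated from Chinese): distributed video coding; WZ video coding; low-density parity-check codes; SPA; parallel computing; cloud computing; CUDA; general-purpose graphics processing unit
Keywords (English): Distributed video coding; Wyner-Ziv video coding; low density parity check codes; sum-product algorithm; parallel computing; cloud computing; CUDA; GPGPU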
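Wyner-Ziv (WZ) video coding is one implementation of distributed video coding (DVC). Based on the Wyner-Ziv theorem, it performs lossy compression by exploiting the correlation among video data. This new compression approach has drawn attention because, in terms of computational complexity, it pairs a simple encoder with a highly complex decoder; the decoder's complexity comes from Slepian–Wolf decoding. Although many methods that effectively improve the compression efficiency of WZ video coding have been proposed in recent years, most proposed WZ video codecs still have a very long decoder-side delay, which costs WZ video coding its practical value in applications with strict real-time requirements. In this thesis, we use the CUDA architecture to propose a highly parallelized design for the sum-product algorithm (SPA) used in low-density parity-check codes (currently the Slepian–Wolf decoder with the best compression performance). In addition, the convergence detection mechanism we propose on CUDA eliminates the transfer latency between the CPU and the GPU. Experimental results show that, at QCIF resolution, (surveillance) video can be decoded in real time, and other formats reach a decoding rate of at least five frames per second. Compared with the decoder before parallelization, all videos retain a very high compression ratio and very low distortion after decoding.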

Wyner-Ziv (WZ) video coding, a particular case of distributed video coding (DVC), is based on the Wyner-Ziv theorem for lossy coding of correlated video sources. This coding paradigm is well known for its low-complexity encoding and high-complexity decoding, where the high decoding complexity is mainly due to the intricate procedures of Slepian–Wolf decoding. Although considerable work has been done in recent years, especially on improving coding efficiency, most reported WZ codecs have a long decoder delay, which limits their practical value for applications with critical timing constraints. In this thesis, a fully parallelized sum-product algorithm (SPA) for low-density parity-check accumulate (LDPCA) codes is implemented with the Compute Unified Device Architecture (CUDA) on a general-purpose graphics processing unit (GPGPU). Furthermore, we propose a novel early stop detection mechanism, implemented on CUDA, which substantially eliminates the communication latency between the CPU and the GPU. Experimental results show that QCIF (surveillance) videos can be decoded in real time and that videos in other formats reach a decoding speed of at least 5.01 frames per second. All videos are decoded with extremely high quality and negligible rate-distortion (RD) loss.
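To make the idea of early stop detection more concrete, here is a minimal Python sketch of the usual success test in syndrome-based Slepian–Wolf decoding: decoding can stop once hard decisions on the current log-likelihood ratios (LLRs) reproduce the received syndrome. This is an illustration only, not the thesis's CUDA kernel. The names `syndrome`, `is_decoded`, `CHECK_ROWS`, and `SOURCE_BITS`, the toy parity-check structure, and the sign convention (non-negative LLR means bit 0) are all assumptions made for the example.

```python
def syndrome(check_rows, bits):
    """Compute the syndrome of `bits` under a sparse parity-check matrix.

    check_rows[i] lists the bit positions that take part in check i.
    """
    return [sum(bits[j] for j in row) % 2 for row in check_rows]


def is_decoded(check_rows, llr, target_syndrome):
    """Return True when hard decisions on `llr` reproduce `target_syndrome`.

    Assumed convention: a non-negative LLR is decided as bit 0.
    """
    hard_bits = [0 if value >= 0.0 else 1 for value in llr]
    return syndrome(check_rows, hard_bits) == list(target_syndrome)


# Hypothetical toy code: 3 checks over 6 bits (not taken from the thesis).
CHECK_ROWS = [[0, 1, 3], [1, 2, 4], [0, 2, 5]]
SOURCE_BITS = [1, 0, 1, 0, 1, 0]
TARGET = syndrome(CHECK_ROWS, SOURCE_BITS)

# LLRs whose signs agree with SOURCE_BITS pass the test ...
print(is_decoded(CHECK_ROWS, [-2.0, 1.5, -0.7, 3.0, -1.1, 0.4], TARGET))
# ... while flipping the sign of the first LLR makes it fail.
print(is_decoded(CHECK_ROWS, [2.0, 1.5, -0.7, 3.0, -1.1, 0.4], TARGET))
```

One plausible reading of the abstract is that running a test of this kind on the GPU lets the host receive only a pass/fail flag per iteration rather than the full set of messages; that reading is an assumption, not a statement of the thesis's design.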

Oral Defense Committee Certification i
Acknowledgements ii
Chinese Abstract iii
ABSTRACT iv
CONTENTS v
LIST OF FIGURES vii
LIST OF TABLES viii
Chapter 1 Introduction 1
Chapter 2 Sum-Product Algorithm 5
Chapter 3 Parallel LDPCA Decoding In CUDA 8
3.1 Reduction φ Function Computation 8
3.2 Fully Parallelized SPA – Parallel Partial Reduction in HPK 12
3.3 Combination of VPK and HPK 15
3.4 Check Node Rearrangement and Loop Unrolling 16
3.4.1 Ballot Based Reduction in Fermi 19
3.4.2 Half Float (16-bit) for Data Storage 20
Chapter 4 LDPCA Early Stop Detection Mechanism Using CUDA 22
4.1 Early Jump Detection Kernel (EDK) 22
4.1.1 Successfully Decoded Codeword Detection in CUDA (Condition 1) 23
4.1.2 Un-decodable Codeword Detection in CPU (Condition 2) 24
4.1.3 The Overhead of Early Stop Detection in CUDA 25
4.2 Complexity Reduction of Early Stop Detection 27
4.2.1 Combination of EDK and UMK 27
4.2.2 Latency Reduction of Data Transfer and Data Initialization 27
4.3 Concurrent Kernel Execution and Data Transfer 29
4.3.1 Practical CUDA Implementation for Early Stopping Detection 30
4.3.2 Early Stop Detection Using CUDA Driver API 31
4.3.3 Fixed Number of overlapping UMKs 33
Chapter 5 Experimental Results 34
5.1 Speed-up Ratios for various LDPCA Decoder Versions 34
5.2 Decoding Time of the Optimized LDPCA Decoder per Feedback 36
5.3 Speed-up ratio of early stop detection against theoretical bound 38
5.4 Overall WZ Decoder Complexity and RD Performance 39
Chapter 6 Conclusion and Future Work 48
REFERENCE 50
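
Section 3.1 of the outline above refers to a reduction-based φ function computation. As a hedged illustration, the Python sketch below assumes that φ is the standard function φ(x) = -ln(tanh(x/2)) used in the sum-product check-node update, where the magnitude of each outgoing message is φ applied to a sum of φ values and the sign is a product of signs. The sum over all edges is the step that a parallel reduction would compute. The function names, the clamping bounds, and the handling of the syndrome bit are assumptions for this example, not details taken from the thesis.

```python
import math


def phi(x):
    """phi(x) = -ln(tanh(x / 2)); x is clamped to keep the result finite."""
    x = min(max(x, 1e-9), 30.0)  # assumed clamping bounds
    return -math.log(math.tanh(x / 2.0))


def check_node_update(incoming, syndrome_bit=0):
    """Return the outgoing LLR on each edge of one check node.

    Each outgoing message excludes the incoming message on its own edge.
    A syndrome bit of 1 flips the sign of every outgoing message.
    """
    signs = [-1.0 if value < 0.0 else 1.0 for value in incoming]
    phis = [phi(abs(value)) for value in incoming]

    total_sign = 1.0 - 2.0 * syndrome_bit
    for sign in signs:
        total_sign *= sign
    total_phi = sum(phis)  # the reduction over all edges

    # Removing an edge's own term: subtract its phi, multiply by its sign.
    return [total_sign * s * phi(total_phi - p) for s, p in zip(signs, phis)]


messages = [1.2, -0.8, 2.5]
print(check_node_update(messages))
```

Computing the total once and then removing each edge's own contribution avoids recomputing a separate sum for every edge, which is one common reason the φ formulation pairs well with a reduction.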



