



Graduate student (English name): Li Chang
Thesis title (English): GPU-Aware Sphere Decoder Implementation in Software-Defined Radio
Keywords (English): Software-defined radio; Graphics processing unit; Sphere decoding; Multi-input multi-output
Modern communication protocols are becoming increasingly complicated. As general-purpose processors grow more powerful, software-defined radio offers an attractive alternative because of its high flexibility. Graphics processing units (GPUs) let software-defined radio exploit a massively parallel computing paradigm. While most baseband processing algorithms are highly parallel, the sphere decoder is less regular than the other components because of its depth-first search nature. According to Amdahl's law, the efficiency of a software-defined LTE system implementation may therefore be limited by the sphere decoding stage. In this thesis, we propose a preprocessing stage that significantly improves the warp execution efficiency of the sphere decoding algorithm on the GPU. With this stage, the sphere decoder runs 1.6 times faster on average in a middle-to-low SNR environment. Together with memory-hierarchy-related optimizations, the overall performance of the sphere decoder improves by a factor of 2 to 3.5.
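The two ideas in the abstract, warp execution efficiency and Amdahl's law, can be illustrated with a small numerical sketch. This is not the preprocessing stage proposed in the thesis, whose details are not given in this record. It assumes a simple model in which each thread decodes one symbol vector, a warp of 32 threads runs until its slowest lane finishes, and reordering means sorting the inputs by a known per-thread workload. The workload values, the function names `warp_efficiency` and `amdahl_speedup`, and the 40% stage fraction are all hypothetical.

```python
import random

WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def warp_efficiency(work):
    """Return useful lane-steps divided by issued lane-steps.

    Model assumption: every warp keeps all of its lanes occupied until
    the lane with the most work (e.g. tree nodes visited) finishes.
    """
    useful = sum(work)
    issued = 0
    for start in range(0, len(work), WARP_SIZE):
        warp = work[start:start + WARP_SIZE]
        issued += max(warp) * WARP_SIZE
    return useful / issued


def amdahl_speedup(fraction, stage_speedup):
    """Overall speedup when `fraction` of the runtime is sped up by
    `stage_speedup` and the rest is unchanged (Amdahl's law)."""
    return 1.0 / ((1.0 - fraction) + fraction / stage_speedup)


# Hypothetical workloads: search steps needed per received vector.
random.seed(0)
work = [random.randint(1, 64) for _ in range(1024)]

unordered = warp_efficiency(work)
reordered = warp_efficiency(sorted(work))  # similar workloads share a warp

print(f"warp efficiency, original order: {unordered:.2f}")
print(f"warp efficiency, sorted order:   {reordered:.2f}")

# Hypothetical: sphere decoding takes 40% of the frame time and is made 2x faster.
print(f"overall speedup: {amdahl_speedup(0.4, 2.0):.2f}")
```

In this model, grouping threads with similar workloads into the same warp reduces the number of idle lane-steps, which is the intuition behind reordering the input to reduce divergence. A real decoder does not know the workload in advance, so an actual preprocessing stage would have to rely on an estimate.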
Acknowledgements
Abstract (Chinese)
Abstract
Contents
List of Figures
List of Tables
1 Introduction
2 Background and Motivation
2.1 Overview of MIMO Detection Algorithms
2.1.1 Maximum Likelihood Detection
2.2 Graphics Processing Unit
2.2.1 Overview of CUDA
2.2.2 GPU Memory Hierarchy
2.2.3 Irregular Program
2.3 MIMO Detection on GPU
3 Memory-aware Implementation and Divergence Reduction Preprocessing for Sphere Decoding
3.1 Schnorr-Euchner Sphere Decoding Algorithm
3.2 Memory-aware Implementation
3.2.1 Minimize the Main Memory Access
3.3 Divergence Issue in Schnorr-Euchner Sphere Decoding
3.3.1 Observation of Input Data Pattern
3.3.2 Full Frame Data Size
3.3.3 Divergence Reduction Through Reorder
3.4 Proposed Preprocessing Stages
4 Experiments
4.1 Simulation Setup
4.2 Results
4.3 Limitation
5 Conclusions
Bibliography