|
[1] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, 2nd ed. MA: The MIT Press, 2001. [2] S. G. Akl, Parallel Sorting Algorithms. Orlando: Academic Press, 1985. [3] M. A. Weiss, Data structures and algorithm analysis in C, 2nd ed. CA:Addison-Wesley, 1997. [4] C. Breshears, The Art of Concurrency: A Thread Monkey's Guide to Writing Parallel Applications, Cambridge: O'Reilly Media, 2009. [5] D. B. Kirk and W.-M. W. Hwu, Programming Massively Parallel Processors: A Hands-on Approach, MA: Morgan Kaufmann Publishers, 2010. [6] D. E. Knuth, The Art of Computer Programming, Vol 3, Addison-Wesley, 1973. [7] T. R. Halfhill, \Parallel Processing with CUDA," Microprocessor Report, Jan. 2008. [8] W. A. Martin, \Sorting," ACM Comput. Surv., vol. 3, no. 4, pp. 147-174, Dec. 1971. [9] D. L. Shell, \A high-speed sorting procedure," Commun. ACM, vol. 2, no. 7, pp. 30-32, July 1959. [10] H. W. Lang. (2010). Shellsort [Online]. Available: http://www.inf.fh- ensburg.de/lang/algorithmen/sortieren/shell/shellen.htm. [11] V. R. Pratt, \Shellsort and Sorting Networks," Ph.D. dissertation. Stanford University, Standford, CA, USA, 1972. [12] M. Ciura, \Best Increments for the Average Case of Shellsort," 13th International Symposium on Fundamentals of Computation Theory, vol. 2138, pp. 106-117, August 2001. [13] R. Sedgewick, \Analysis of Shellsort and Related Algorithms," In Proceedings of the Fourth Annual European Symposium on Algorithms, vol. 1136, pp. 1-11, 1996. [14] Wikipedia. (2010). WIKI: Shell sort [online]. Available: http://en.wikipedia.org/wiki/Shellsort [15] NVIDIA CUDA Programming Guide, NVIDIA Corporation, 2009, version 2.3. [16] NVIDIA CUDA C Programming Best Practices Guide, NVIDIA corporation, 2009, version 2.3. [17] M. Harris, S. Sengupta, J. D. Owens, Parallel Prex Sum (Scan) with CUDA. In GPU Gems 3, Nguyen H., (Ed.), Addison Wesley, Aug. 2007, chapter 31. [18] R. Baraglia, G. Capannini, F. M. Nardini, and F. Silvestri, Sorting using Bitonic Network with CUDA. In th 7th Workshop on Large Scale Distributed Systems for Information Retrieval, Boston, USA, July, 2009. [19] N. Satish, M. Harris, and M. Garland, "Designing Ecient Sorting Algorithms for Manycore GPUs," IPDPS, 2009, pp. 1-10. [20] J. Chhugani, A. D. Nguyen, V. W. Lee, W. Macy, M. Hagog, Y.-K. Chen, A. Baransi, S. Kumar, and P. Dubey. "Efficient Implementationof Sorting on Multi-core SIMD CPU Architecture," PVLDB, vol. 1, no. 2, pp. 1313-1324, August 2008. [21] N. Leischner, V. Osipov, and P. Sanders, "GPU Sample Sort," IPDPS, 2010, pp. 1-10. [22] D. R. Helman, D. A. Bader, and J. JaJa, "A randomized parallel sorting algorithm with an experimental study," J. of Parallel and Distributed Computing, vol. 52, no. 1, pp. 1-23, 1998. [23] S. Sengupta, M. Harris, and M. Garland, \Ecient Parallel Scan Algorithms for GPUs," NVIDIA Technical Report, 2008. [24] S. Sengupta, M. Harris, Y. Zhang, and J.D. Owens, "Scan primitives for GPU computing." in Graphics Hardware 2007, August 2007, pp. 97-106. [25] D. Cederman and P. Tsigas, "A practical quicksort algorithm for graphics processors," in Proc. 16th Annual European Symposium on Algorithms, Sep. 2008, pp. 246-258. [26] S. Chen, J. Qin, Y. Xie, J. Zhao, and P.-A. Heng, \A fast and exible sorting algorithm with CUDA," Proceedings of the 9th International Conference on Algorithms and Architectures for Parallel Processing Springer-Verlag, 2009, pp. 281-290. [27] E. Sintorn and U. Assarsson, \Fast parallel GPU-sorting using a hybrid algorithm," J. Parallel Distrib. Comput., vol. 68, no. 10, pp. 1381-1388, 2008. [28] F. Dehne and H. Zaboli. \Deterministic Sample Sort For GPUs," arXiv:1002.4464, 2010. [29] J. Vitter, "External memory algorithms and data structures: Dealing with massive data," ACM Computing Surveys, pages 209-271, 2001. [30] G. E. Blelloch, C. E. Leiserson, B. M. Maggs, C. G. Plaxton, S. J. Smith, and M. Zagha, "An experimental analysis of parallel sorting algorithms," Theory of Computing Systems, Vol. 31, No. 2, March/April 1998, pp. 135-167. [31] J. Nickolls, I. Buck, M. Garland, and K. Skadron, "Scalable Parallel Programming with CUDA," Queue, Vol. 6, No. 2, pp. 40-53, Mar/Apr 2008. [32] V. Volkov, and Demmel, J. W, "Benchmarking GPUs to tune dense linear algebra, "ACM/IEEE Conference on Supercomputing, Austin, 2008. pp. 1-11. [33] M. Matsumoto and T. Nishimura, "Mersenne twister: A 623-dimensionally equidistributed uniform pseudo-random number genera-tor," ACM Transactions on Modeling and Computer Simulation, vol. 8, No. 1, pp. 330, 1998. [34] S. Ryoo, C. I. Rodrigues, S. S. Baghsorkhi, S. S. Stone, D. B. Kirk, and W. W. Hwu, "Optimization principles and application performance evaluation of a multithreaded GPU using CUDA," In Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008, pp. 73-82. [35] N. Govindaraju, J. Gray, R. Kumar, and D. Manocha, "GPUTeraSort: high performance graphics coprocessor sorting for large database management," In Proceedings of the 2006 ACM SIGMOD international Conference on Management of Data, June 2006.
|