跳到主要內容

臺灣博碩士論文加值系統

(18.97.9.172) 您好!臺灣時間:2024/12/03 06:44
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::

詳目顯示

我願授權國圖
: 
twitterline
研究生:陳學中
研究生(外文):Hsieh-Chung Chen
論文名稱:顯示晶片高速模算術
論文名稱(外文):MAGE : High-throughput Modular Arithmetic on GPUand its Application to ECM
指導教授:鄭振牟鄭振牟引用關係
指導教授(外文):Chen-Mou Cheng
學位類別:碩士
校院名稱:國立臺灣大學
系所名稱:電機工程學研究所
學門:工程學門
學類:電資工程學類
論文種類:學術論文
論文出版年:2009
畢業學年度:97
語文別:英文
論文頁數:65
中文關鍵詞:密碼學橢圓曲線因數分解顯示卡模算術CUDA
外文關鍵詞:cryptography and cryptanalysisCUDAelliptic curve method (ECM)GPGPUinteger factorizationmodular arithmeticnumber field sieves (NFS)
相關次數:
  • 被引用被引用:0
  • 點閱點閱:290
  • 評分評分:
  • 下載下載:0
  • 收藏至我的研究室書目清單書目收藏:2
模算術在現代密碼學中舉足輕重,不但直接決定了RSA 及許多橢圓曲線密碼系統之運算速度,同時也常用於大數分解演算法。舉例來說,橢圓曲線質因數分解方法 (Elliptic Curve Method, ECM) 的運算速度就完全取決於模算術處理能力。近年來質因數分解演算法的持續進步,迫使 RSA 的金鑰長度不斷增加,其中 ECM就扮演了重要的角色。

在這些密碼學的應用中,往往運算量都非常可觀,尤其以 ECM 等質因數分解相關演算法為最。考慮到這些應用中的大量運算,我們認為使用多核運算器是比較合適的,並因此選擇顯示晶片為目標。

這篇論文中討論了模算術的實作方法,亦即大數在蒙氏空間 (Montgomery space)中的操作狀況,並討論在顯示晶片的運算/空間資源分配及利用。除了討論模算術器的設計,論文中也說明了橢圓曲線質因數分解演算法,以及使用前述之模算術器實作ECM 的方法。

最後,本篇論文提出把顯示晶片轉化為高速模算術運算器的方法,並且進而組合成高速 ECM 軟體,寫下了數倍於以往的新記錄。我們預期這種技巧除了大幅提升 ECM 的速度之外,也可嘉惠其他受限於模算術速度之演算法。
Modular arithmetic is the core operation of many cryptographic applications such as RSA, elliptic curve cryptography, and the elliptic curve method of integer factorization (ECM). The speed of ECM, as well as many other applications, is directly determined by the capability of executing modular operations.

ECM is one of the fastest algorithms for factoring large general numbers. It is also an important subroutine in the number field sieves (NFS), the champion method for factoring
RSA numbers and solving discrete logarithm problems. ECM is of practical interest today because it is becoming more and more important in NFS as the problem size increases.

Graphics processing unit (GPU) represents a new class of massively parallel computer architectures that provide immense aggregated throughput of computation. These massively parallel computers are scalable and efficient at exploiting the enormous transistor budget afforded by Moore’s law. GPU in particular has a competitive edge in terms of price-performance ratio, thanks to video game industry.

In this thesis, we present MAGE, a GPU-based, high-throughput, massively parallel modular arithmetic unit. With the design and techniques presented in this thesis, we are able to deliver more than 480 million 210-bit modular multiplications per second on a single graphics card, setting a new record in terms of the highest single-chip performance, the highest performance per dollar, as well as the highest performance per watt among all known implementations up to date.

We also report a record-setting ECM implementation based on MAGE, delivering a throughput of 4928 curves per second for ECM stage 1 with B1 = 8192 for 210-bit integers on a single graphics card. With the help of MAGE, our ECM implementation outperforms the best previous results published in EUROCRYPT 2009 by a factor of more than four
times.
Abstract i
Table of Contents iii
List of Tables iv
List of Figures 1
1 Introduction 2
1.1 Motivation 2
1.2 Challenges 4
1.2.1 Task Assignment 5
1.2.2 Memory Management 6
1.2.3 Resource Utilization 6
1.2.4 Efficiency vs. Flexibility 7
1.3 Contributions 7
1.4 Thesis Organization 8
2 Background 9
2.1 The Elliptic Curve Method 9
2.1.1 Pollard’s p 􀀀 1 Method 10
2.1.2 ECM Algorithm 11
2.2 The GPU Architecture14
2.3 CUDA Programming 16
3 Modular Arithmetic Units 21
3.1 Design for Flexibility 22
3.2 Interface Design 23
3.3 Modular Arithmetic 25
3.3.1 Modular Addition and Subtraction 25
3.3.2 Modular Multiplication 26
3.3.2.1 Problem Statement 26
3.3.2.2 Montgomery Coordinate 26
3.3.2.3 Montgomery Reduction 28
3.4 Operand Representation 30
3.4.1 Limb Data Type and Size 30
3.4.2 Storage Compaction . 32
3.5 Mapping to CUDA 33
3.5.1 Task Decomposition 33
3.5.2 Data Movement 34
3.5.3 Memory Utilization 36
3.5.4 Processor Utilization 38
4 ECM Implementation 40
4.1 Elliptic-Curve Arithmetic 40
4.2 Scaler Multiplication over Elliptic Curve 42
4.3 ECM Stage 1 44
4.4 ECM Stage 2 45
5 Experiment Results 47
5.1 Effect of Number of Thread Blocks 48
5.2 Effect of Size of Moduli 50
5.3 Analysis of Resource Utilization Rate 52
5.4 Comparison 54
6 Conclusion 59
Bibliography 65
[1] Daniel J. Bernstein, Tien-Ren Chen, Chen-Mou Cheng, Tanja Lange, and Bo-Yin
Yang. Ecm on graphics cards. In EUROCRYPT, pages 483–501, 2009.

[2] NVIDIA. The cuda compiler driver nvcc 2.0, 2008.

[3] M. Simka, J. Pelzl, T. Kleinjung, J. Franke, C. Priplata, C. Stahlke, M. Drutarovsky,
V. Fischer, and C. Paar. Hardware factorization based on elliptic curve method. In
Proceedings of the 13th Annual IEEE Symposium on Field-Programmable Custom
Computing Machines (FCCM 2005), pages 107–116, 2005.

[4] Eric Landquist. The number field sieve factoring algorithm. University of Illinois at
Urbana-Champaign, 2002.

[5] Intel VP Patrick Gelsinger. Microprocessors for the new millennium — challenges,
opportunities and new frontiers. ISSCC, San Francisco, 2001.

[6] Hendrik W. Lenstra. Factoring integers with elliptic curves. The Annals of Mathematics,
126(3):649–673, November 1987.

[7] Peter L. Montgomery. Speeding the pollard and elliptic curve methods of factorization.
Mathematics of Computation, 48(177):243–264, 1987.
61

[8] Kirsten Eisenträger, Kristin Lauter, and Peter L. Montgomery. Fast elliptic curve
arithmetic and improved weil pairing evaluation. In Marc Joye, editor, CT-RSA, volume
2612 of Lecture Notes in Computer Science, pages 343–354. Springer, 2003.

[9] Paul Zimmermann and Bruce Dodson. 20 Years of ECM, volume 4076, pages 525–
542. Springer, 2006.

[10] 50 largest factors found by ecm. http://www.loria.fr/~zimmerma/
records/top50.html/.

[11] Kazumaro Aoki, Jens Franke, Thorsten Kleinjung, Arjen K. Lenstra, and Dag Osvik.
A kilobit special number field sieve factorization. In Advances in Cryptology —
ASIACRYPT 2007, pages 1–12, 2008.

[12] Adi Shamir and Eran Tromer. Factoring large number with the twirl device. In Dan
Boneh, editor, CRYPTO, volume 2729 of Lecture Notes in Computer Science, pages
1–26. Springer, 2003.

[13] Arjen K. Lenstra, Eran Tromer, Adi Shamir, Wil Kortsmit, Bruce Dodson, James
Hughes, and Paul Leyland. Factoring estimates for a 1024-bit RSA modulus. In
Advances in Cryptology — ASIACRYPT 2003, pages 55–74. 2003.

[14] Jens Franke, Thorsten Kleinjung, Christof Paar, Jan Pelzl, Christine Priplata, and
Colin Stahlke. SHARK: A realizable special hardware sieving device for factoring
1024-bit integers. In Cryptographic Hardware and Embedded Systems — CHES
2005, pages 119–130. 2005.
62

[15] Willi Geiselmann, Adi Shamir, Rainer Steinwandt, and Eran Tromer. Scalable hardware
for sparse systems of linear equations, with applications to integer factorization.
In Cryptographic Hardware and Embedded Systems — CHES 2005, pages 131–146.
2005.

[16] S. Cavallar, W. M. Lioen, H. J. J. Te Riele, B. Dodson, A. K. Lenstra, P. L. Montgomery,
B. Murphy Et Al, Mathematisch Centrum (smc, The Dutch Foundation, Stefania
Cavallar, Walter Lioen, Herman Te Riele, Bruce Dodson, Arjen K. Lenstra, Peter
L. Montgomery, Brian Murphy, Karen Aardal, Je Gilchrist, and Gerard Guillerm.
Factorization of a 512-bit rsa modulus. In Proceedings of Eurocrypt ’2000, pages
1–18. Springer-Verlag, 2000.

[17] Optimal parameters for ecm. http://www.loria.fr/~zimmerma/
records/ecm/params.html.

[18] A. O. L. Atkin and Francois Morain. Finding suitable curves for the elliptic curve
method of factorization. Mathematics of Computation, 60:399–405, 1993.

[19] D. Cook, J. Ioannidis, A. Keromytis, and J. Luck. Cryptographics: Secret key cryptography
using graphics cards, 2005.

[20] Angelos D. Keromytis Debra L. Cook. Cryptographics: Exploiting graphics cards for
security. Advances in Information Security, 20, 2006.

[21] S.A. Manavski. Cuda compatible gpu as an efficient hardware accelerator for aes
cryptography. In Signal Processing and Communications, 2007. ICSPC 2007. IEEE
International Conference on, pages 65–68, Nov. 2007.
63

[22] Robert Szerwinski and Tim Güneysu. Exploiting the power of GPUs for asymmetric
cryptography. In Cryptographic Hardware and Embedded Systems — CHES 2008,
pages 79–99. 2008.

[23] Peter L. Montgomery. Modular multiplication without trial division. Mathematics of
Computation, 44(170):519–521, 1985.

[24] NVIDIA. Cuda programming guide 2.2, 2009.

[25] Harold M. Edwards. A normal form for elliptic curves. Bulletin of the American
Mathematical Society, 44(3):393–422, July 2007.

[26] Daniel Bernstein and Tanja Lange. Faster addition and doubling on elliptic curves. In
Advances in Cryptology — ASIACRYPT 2007, pages 29–50. 2008.

[27] Daniel J. Bernstein, Peter Birkner, Tanja Lange, and Christiane Peters. Optimizing
double-base elliptic-curve single-scalar multiplication. In INDOCRYPT, pages 167–
182, 2007.

[28] Daniel J. Bernstein and Tanja Lange. Inverted edwards coordinates. Cryptology ePrint
Archive, Report 2007/410, 2007. http://eprint.iacr.org/.

[29] Daniel J. Bernstein and Tanja Lange. Analysis and optimization of elliptic-curve
single-scalar multiplication. Cryptology ePrint Archive, Report 2007/455, 2007.
http://eprint.iacr.org/.

[30] Huseyin Hisil, Kenneth Koon-Ho Wong, Gary Carter, and Ed Dawson. Faster group
operations on elliptic curves. In Ljiljana Brankovic andWilly Susilo, editors, Seventh
64
Australasian Information Security Conference (AISC 2009), volume 98 of CRPIT,
pages 7–19, Wellington, New Zealand, 2009. ACS.

[31] Daniel J. Bernstein, Peter Birkner, Tanja Lange, and Christiane Peters. Ecm using
edwards curves. Cryptology ePrint Archive, Report 2008/016, 2008. http:
//eprint.iacr.org/.

[32] The billion-mulmod-per-second pc, 2009. manuscript.
QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top
無相關期刊