Giter Club home page Giter Club logo

secp256k1-cl's Introduction

SECP256K1-CL

SECP256K1-CL is a fork of sipa's (Pieter Wuille) optimized ECDSA library for Bitcoin.

The original SECP256K1 is the fastest crypto library working on Bitcoin's curve. On my computer (i7 3770K), it is able to verify a signature in 0.08 ms.

It achieves this considerable speed by a combination of mathematical optimizations and good old programming tricks.

For example, the verification involves calculating hG + xQ where G is the Generator point and Q is the public key. It uses the windowed non adjacent form (WNAF) to reduce the number of multiplication required. For G a large array of precomputed multiple are used since G is a constant. hG is calculated in cartesian coordinates and xQ in Jacobian coordinates because cartesian coordinates are faster to add but slower to compute multiples, and Jacobian coordinates are the opposite.

For this OpenCL version, I have done the following modifications. First of all, verifications are batched. The latency of the GPU is very bad. It excels at doing calculations in parallel so we need to give them in a big batch.

I split the verifications into 3 phases. Phase 1 performs the setup of h and x and finds their WNAF. Phase 2 computes the array of multiples of Q and calculates hG + xQ. Phase 3 checks that the x-coord of the resulting point matches r. Only phase 2 is run on the GPU because it is by far the most expensive operation.

Also splitting the task into CPU/GPU reduces the size of the kernels. On the GPU, we have the group multiplication code and the field operations. The multiple precision arithmetic library (GMP) is not required. All the math operators are run as auxilliary functions and use local storage. However, openCL does not have a native 128 bit integer type so I had to use the smaller size (FIELD10x26).

One big problem remains though. The original code optimizes for a single execution at a time so the size of the data structures are mostly irrelevant. Unfortunately, once we create a large batch they matter. The WNAF is sparse. It is guaranteed to have non zero values separated by at least the window size. For G, it's 14 and for Q it's 5. So we have big arrays that have many zeroes and they have to be transferred to the GPU. I believe memory bandwidth could be an issue here.

Another point of concern is the trade off between precalculation and memory usage. It is possible that for a GPU, it is worth using smaller window sizes. This should be profiled.

Some results: On my Nvidia GTX 560Ti, 262144 samples took

  • prepare -> 2518 (ms)
  • compute -> 15201
  • check -> 35
  • total -> 17755

So about 0.067733 ms per sample.

TODO:

  • profile
  • look at using the endormorphism code

secp256k1-cl's People

Contributors

hhanh00 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.