
ferveo's Introduction

Cryptographic primitives, hosted on the decentralized nodes of the Threshold network, offering accessible, intuitive, and extensible runtimes and interfaces for secrets management and dynamic access control.


Threshold Access Control (TACo)

TACo is end-to-end encrypted data sharing and communication, without requiring trust in a centralized authority, which might unilaterally deny service or even decrypt private user data. It is the only access control layer available to Web3 developers that can offer a decentralized service through a live, well-collateralized and battle-tested network. See more here: https://docs.threshold.network/applications/threshold-access-control

Getting Involved

NuCypher is a community-driven project and we're very open to outside contributions.

All our development discussions happen in our Discord server, where we're happy to answer technical questions, discuss feature requests, and accept bug reports.

If you're interested in contributing code, please check out our Contribution Guide and browse our Open Issues for potential areas to contribute.

Security

If you identify vulnerabilities in any nucypher code, please email [email protected] with the relevant details of your findings. We will work with researchers to coordinate vulnerability disclosure between our stakers, partners, and users to ensure successful mitigation of vulnerabilities.

Throughout the reporting process, we expect researchers to honor an embargo period that may vary depending on the severity of the disclosure. This ensures that we have the opportunity to fix any issues, identify further issues (if any), and inform our users.

Sometimes vulnerabilities are of a more sensitive nature and require extra precautions. We are happy to work together to use a more secure medium, such as Signal. Email [email protected] and we will coordinate a communication channel that we're both comfortable with.

A great place to begin your research is by working on our testnet. Please see our documentation to get started. We ask that you please respect testnet machines and their owners. If you find a vulnerability that you suspect has given you access to a machine against the owner's permission, stop what you're doing and immediately email [email protected].

ferveo's Issues

Add Additional Authenticated Data (AAD) to API

Additional Authenticated Data (AAD) is an arbitrary, non-private bytestring that's included at encryption and decryption time for further validation of ciphertexts. In the context of the Ferveo design, it's also possible to publicly validate a ciphertext, including the AAD in the process. For us, the AAD is the simplest way to attach conditions and other metadata to ciphertexts.

However, for some reason the Ferveo implementation didn't include the AAD as an input to the encryption API (and therefore to the decryption and ciphertext validation APIs); they're just not using it. See https://github.com/nucypher/ferveo/blob/main/tpke/src/ciphertext.rs#L56

We need to add the AAD as an input to the encryption, decryption and ciphertext validation functions, as well as using this AAD when creating the ciphertext authentication tag.
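As a starting point, here is a minimal sketch of what an AAD-aware surface could look like. The names and types below are illustrative assumptions, not the actual tpke signatures:

```rust
/// Hypothetical placeholder types; the real tpke types differ.
pub struct PublicKey(pub Vec<u8>);

pub struct Ciphertext {
    pub commitment: Vec<u8>, // U
    pub auth_tag: Vec<u8>,   // W, computed over the ciphertext *and* the AAD
    pub payload: Vec<u8>,    // symmetric ciphertext
}

/// Encrypt `message` under the DKG public key, binding `aad` into the
/// authentication tag so the same AAD must be presented again at
/// validation and decryption time.
pub fn encrypt(message: &[u8], aad: &[u8], public_key: &PublicKey) -> Ciphertext {
    let _ = (message, aad, public_key);
    unimplemented!("sketch only")
}

/// Publicly validate a ciphertext; should fail if `aad` differs from the
/// one supplied at encryption time.
pub fn check_ciphertext_validity(ciphertext: &Ciphertext, aad: &[u8]) -> bool {
    let _ = (ciphertext, aad);
    unimplemented!("sketch only")
}
```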

Validate 'small count cohort' DKG for memory optimization

Note that this could skip the centralized coordination version.

  • Tests and benchmarks showing that the DKG and tDec work with N=8 and N=16, and t from 1 to N (?), including the size in bytes of the $A_i$ and $Y_i$ vectors (see the sketch after this list)
  • Work out the viability of a proactive key refresh 'back-up' strategy to inform the value of high-redundancy groups (both for m and n constraints) (#78)
  • Raise the constraint of 8, 16, 32 and 64 DKG group sizes with adopters
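A rough sketch of how such a test matrix could be driven is below; `run_dkg` and `tdec_roundtrip` are hypothetical stand-ins for whatever the ferveo/tpke test utilities end up exposing:

```rust
// Hypothetical stand-ins for the real DKG/tDec test utilities.
struct DkgOutput;
fn run_dkg(_shares_num: usize, _threshold: usize) -> DkgOutput {
    DkgOutput
}
fn tdec_roundtrip(_dkg: &DkgOutput, _msg: &[u8]) -> bool {
    true
}

#[test]
fn small_cohort_dkg_tdec_roundtrip() {
    for shares_num in [8usize, 16] {
        for threshold in 1..=shares_num {
            let dkg = run_dkg(shares_num, threshold);
            assert!(tdec_roundtrip(&dkg, b"test message"));
            // Also record the serialized size of the A_i / Y_i vectors here
            // to feed the memory-optimization analysis.
        }
    }
}
```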

Ensure presence and correctness of validity checks

Review Ferveo's whitepaper and docs. List all validity checks for the protocol. Review the source code and make sure that those checks are present, correct, and tested. Devise green-path and red-path scenarios.

List of checks:

  • #42
  • #44 - moved, postponed
  • #45 - moved, postponed

  • Ciphertext Validity Checking - must be done before creating a decryption share
  • #38


  • Verifying Decryption Shares - must be done before combining decryption shares
    • batch_verify_decryption_shares
      • Used to be called in the original tpke benchmark
      • Checks that $e([\sum_j \alpha_{i,j} ] D_i, B_i)$ matches $e(\sum_j [ \sum_i \alpha_{i,j} ] U_j, -H)$ (?)
        - $B_i = [b_i] H$, blinding_key_prepared
    • Also described in docs
      • #38
      • Existing implementation works for the fast tDec variant
      • Check for simple tDec variant is backlogged #42


ferveo:

  • verify_optimistic
    • Usage example
    • Already tested with positive and negative examples.
    • This one is effectively done; we just need to confirm what it does and whether we're happy using it

Is Ferveo viable for our MVP?

TODO: Group into concrete questions, write sensible questions with acceptance criteria

  • Is Ferveo encryption & combination performance acceptable on the client side (in the browser)? Acceptable: <1s (?)
  • Can we directly achieve trust-minimized coordination for Ferveo DKG (via Ethereum or a L2)?
  • Follow-up: Is it possible to verify participant payloads of DKG (a.k.a. transcripts) on Ethereum/L2?
    • #7 (Nice-to-have)
    • #28 (Nice-to-have)
  • Feasibility of proactive secret sharing

Expose client methods in `tpke::api`

  • Expose the tpke methods used on the client side in the tpke::api module (a sketch follows this list):
    • (List of methods TBD)
    • encrypt
    • decrypt_with_shared_secret
    • decrypt_with_private_key
    • Consider methods that could be used in binding testing
  • Implement and test Python and/or TypeScript bindings for these methods
    • Consider how bindings would be tested in nucypher and nucypher-ts
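A minimal sketch of what the re-export surface might look like; the item paths below are assumptions about the current module layout, not the actual code:

```rust
// Sketch of a `tpke::api` module that re-exports the client-side surface.
// The exact item paths are assumed for illustration.
pub mod api {
    pub use crate::{
        encrypt,                     // client-side encryption
        decrypt_with_shared_secret,  // after combining decryption shares
        decrypt_with_private_key,    // direct decryption (testing / bindings)
    };
}
```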

"Fast" vs "Simple" decryption method

Ferveo's paper and implementation use what they call a "fast" approach for Threshold Decryption, as opposed to the "Simple" method ("Fast" and "Simple" being the terms used in their docs). The reality is a bit more nuanced and it has implications for us.

The "Fast" method is actually a trade-off where they optimize the threshold decryption step over the combine step. That makes sense for their use case, where Ferveo is used during a block production stage to decrypt encrypted TXs; since in that stage the validator set that decrypts is fixed and reused for many TXs, it makes sense to reduce the individual decryption time, while reusing intermediate computations in the combine step that are depended on the validator set (i.e., Lagrange coefficients). The way to do this is to kick the can down the road at decryption time, so validators don't actually create decryption shares (which would require computing a pairing) but some blinded values (which only require an EC multiplication) that can be later combined with known information from the DKG ritual at the combine step. In short, the implications of the "fast" method are:

  • Optimizing decryption over combination
  • Combination is now heavier (as the pairings you're not doing at decryption time need to be done anyway at combine time) and requires knowledge of some DKG ritual information

The "simple" method is a more balanced approach, where pairings are done at each individual decryption request, and the combination is "merely" a bunch of EC multiplications (although in the much harder group $\mathbb G_T$). For our use case, it makes sense to try to reduce the combine step as much as possible as it will be done by Bob in the browser (potentially by a Porter instance if it's trusted enough).

To me this points towards changing Ferveo to implement the "simple" version, which is also... simpler, as you don't need to introduce the intermediate blinding step. Since they prepare these blinding values at the DKG stage, this also has implications for the DKG implementation.
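For reference, a rough sketch of the simple-variant flow, with notation assumed here (not taken from the Ferveo docs): $U$ is the ciphertext commitment in $\mathbb G_1$, $Z_i$ is node $i$'s private key share in $\mathbb G_2$, and $\lambda_i$ are the Lagrange coefficients for the participating set $T$:

  • Decryption share (one pairing per node per request): $D_i = e(U, Z_i)$
  • Combination (only exponentiations in $\mathbb G_T$): $S = \prod_{i \in T} D_i^{\lambda_i}$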

Dependency on block time in `ferveo`

  • PubliclyVerifiableDkg implementation in ferveo has a dependency on the current block time in its state transition function.
    • It also shows up in other places, such as message-processing retries: 1, 2
  • Do we foresee a similar dependency in our DKG design? Do we want to remove or retrofit this code? (One possible retrofit is sketched below.)
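If we keep the dependency, one option (just a sketch with hypothetical names, not existing ferveo APIs) is to hide the time source behind a trait so the state machine can be driven by a real chain clock in production and a fixed clock in tests:

```rust
// Hypothetical abstraction over the block-time source; not an existing
// ferveo API, only a sketch of one possible retrofit.
pub trait BlockClock {
    fn current_block_time(&self) -> u64;
}

/// Test double that always reports the same block time.
pub struct FixedClock(pub u64);

impl BlockClock for FixedClock {
    fn current_block_time(&self) -> u64 {
        self.0
    }
}

// The DKG state transition function would then take `&impl BlockClock`
// instead of reading the block time directly.
```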

Replace `blinded` usage with a proper nonce

Currently, Ciphertext::nonce (the ChaCha nonce) is set to the blinded private key share.

Considerations:

  • Rename Ciphertext::nonce to Ciphertext::blinded.
    • blinded is used in impl<E: PairingEngine> Ciphertext<E>, so we may not be able to remove it (source).
  • Add a Ciphertext::nonce field, where nonce is a random 12-byte value (see the sketch below).
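A minimal sketch of the resulting layout, assuming the rand crate for nonce generation; field names are illustrative:

```rust
use rand::RngCore;

// Sketch: the ciphertext carries its own random 96-bit ChaCha20-Poly1305 nonce
// instead of reusing the blinded key share. Field names are illustrative.
pub struct Ciphertext {
    pub blinded: Vec<u8>, // formerly (mis)named `nonce`
    pub nonce: [u8; 12],  // fresh random nonce, stored alongside the payload
    pub payload: Vec<u8>,
}

/// Draw a fresh 12-byte nonce from the caller's RNG.
pub fn fresh_nonce<R: RngCore>(rng: &mut R) -> [u8; 12] {
    let mut nonce = [0u8; 12];
    rng.fill_bytes(&mut nonce);
    nonce
}
```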

Expand benchmark suite in `tpke` crate

Currently, we only have benchmarks for the share combination method of the fast threshold decryption variant.

We want to add more benchmarks to the tpke crate for the following (a criterion skeleton is sketched after this list):

  • Fast & simple threshold decryption, #25
    • Creating a decryption share
    • Preparing share combination
    • Share combination
  • Encryption & decryption, #25
  • Key share refreshing & recovery (delayed, to be revised later; #41)
    • Computing share updates
    • Refreshing a share

RFC @cygnusv

Verify blinding of the key shares

  • Implemented in the tpke crate, verify_blinding
  • Checks that $e(g, \sum_i(Y_i)) = e(\sum_i(A_i), [b] H)$
  • Used to be a part of fast threshold decryption flow
  • Currently doesn't work
    • At one point we "fixed" that code by removing the blinding (source). Those changes were temporary - the blinding on the main branch is done using a random factor, which corresponds to the validator's key in ferveo.
    • Removing the blinding factor didn't change verifiability (verification still didn't work).
  • Missing from operation summary in docs

On-chain BLS curve cryptography support

Research support for running BLS curve cryptography methods on-chain, either on the Ethereum EVM (check the status of EIP-2537) or on other EVM-compatible rollups. The rationale is to verify the validity of PVSS instances (i.e., DKG ritual metadata) on-chain.

Handle pessimistic cases in the light tDec variant

Design a variation of this scheme that is robust to a pessimistic case.

There's a caveat with the light approach: it works as long as all t requests succeed, but if one of them fails, then all the Lagrange coefficients you created, and that all the nodes used before the pairing, are incorrect.

Example with a 2-of-3 cohort:

  • If Nodes 1 and 2 participate, the Lagrange coefficients are $L_1$ and $L_2$; if Nodes 1 and 3 participate, they are $L'_1$ and $L'_3$.
  • Optimistic case: Node 1 returns $C_1 = e([L_1] U, Z_1)$ and Node 2 returns $C_2 = e([L_2] U, Z_2)$.
  • If Node 2 fails, Node 3 steps in with $C_3 = e(U, Z_3)$, and the combiner can still recover the result as $C_1^{L'_1 / L_1} \cdot C_3^{L'_3}$.

Use the low-latency path (optimistic/light variant) first; if it fails, fall back to regular simple tDec.

Refers to #30

Update serialization to use `serde`
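Curve elements don't derive serde directly, so one common pattern (an assumption here, not a decision) is to convert them to canonical bytes and derive serde only on a byte-level struct; any serde backend then works uniformly:

```rust
use serde::{Deserialize, Serialize};

// Sketch of a serde-friendly wire format: curve points are stored as the
// canonical byte strings produced by the curve library, and serde is derived
// only on this byte-level struct. Field names are illustrative.
#[derive(Serialize, Deserialize)]
pub struct CiphertextBytes {
    pub commitment: Vec<u8>,
    pub auth_tag: Vec<u8>,
    pub payload: Vec<u8>,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let ct = CiphertextBytes {
        commitment: vec![1, 2, 3],
        auth_tag: vec![4, 5, 6],
        payload: vec![7, 8, 9],
    };
    // e.g. bincode for a compact binary encoding, or serde_json for debugging.
    let bytes = bincode::serialize(&ct)?;
    let _roundtrip: CiphertextBytes = bincode::deserialize(&bytes)?;
    Ok(())
}
```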

Share replacement procedures & naming conventions, based on user/staker actions & application logic

Context
A replacement key share – compatible with the persistent (whole) public key – will need to be generated in a variety of scenarios, most of which are unscheduled – i.e. 'external' share generation prompts, including staker actions (and inaction), end-user actions, and the execution of arbitrary application logic.

Relevant issues:
#26
#70

Prompt categories

(1) Orderly staker unbonding
This comes in four flavors:
(a) Staker commences unbonding of entire stake in order to depart network
(b) Staker commences unbonding of some number of T tokens which drops them below a minimum requirement for one or more of their current cohorts. This can include unbonding such that the remaining amount is lower than the global minimum stake size.
(c) Staker bonds some number of T tokens that takes them ABOVE the maximum allowed in a given cohort (corner case).
(d) Staker commences unbonding of some number of T tokens which does not disqualify them from participation in any of their current cohorts.

(2) Inactive or defunct* node
Inactivity could be theoretically detected by:
(a) a tBTCv2 slashing event (on-chain verifiable)
(b) the absence of some predefined interactive check-in requirement – e.g. developer sets cohort compositional parameter to require a signature from all cohort members at least once every 3 months (on-chain verifiable). This is basically confirmActivity but customizable and potentially incurring a higher fee (adopters pay more), useful for cases where non-responsiveness is intolerable.
(c) challenge protocol – by other members of cohort (off-chain verifiable)



For early versions of CBD, the inactivity check with the best cost-benefit is (a). Note that the motivation to split the operation of a tBTCv2 and CBD node between two separate Ethereum addresses to evade this double punishment could be counterbalanced by providing adopters with the option of filtering for addresses with both apps authorized, which would be advertised as providing superior availability (in theory). 

The simplest implementation of (a) would be maximally punitive – a node would be removed from a Cohort if there was a slashing event of any size in any other Threshold application to which they have authorized T.



*A node which has lost or had their key share corrupted could be considered inactive by the protocol and dealt the same punishment. 


(3) Application-level cohort refresh
(This would increase collusion-resistance & redundancy, which becomes more critical if DKG & decryption cohorts are always the same in early versions of the CBD MVP.)
This can be driven by either:
(a) A pre-specified schedule
e.g. For all sharing patterns: once per 7-day cycle, a random 10% of the longest-serving nodes are recycled, with the exception of three 'hard-coded' known nodes.
(b) A user action and/or business logic.
e.g. if NFT sold for greater than 5 ETH, initiate share refreshing such that n - m + 2 cohort members are replaced.
(c) A schedule based on action/logic.
e.g. if over 100 separate Ethereum addresses request access (and sign to prove ownership), increase frequency of cohort refresh to weekly.

(4) Emergency/safety cohort refresh
e.g. If some sub-threshold minority of operators attempt to bribe or coerce the remaining members, the honest threshold could 'vote' to prompt a cohort refresh and remove offending operators from the cohort. This prompt needs more thought, particularly with regard to malicious misuse of emergency refresh and the lack of provability of a collusion attempt.

Procedures

For prompts 1(a-c) and 2(a), one entirely new key share is required to onboard a new staker into the Cohort. This share is replacing an old share belonging to a previous member of the cohort – therefore that old share must no longer be valid in the context of decrypting the underlying data. This is procedurally simple.

Conversely, prompts (3) & (4) will sometimes require all the members of the cohort to be replaced – or at least a greater number than the threshold. However, you need at least a threshold of nodes to execute any kind of share generation/replacement. Therefore the maximum number of nodes that can be replaced in a single procedure, or execution of any share generation function, is n - m. And to maintain the original cohort size, the protocol must simultaneously enlist and assign new nodes to take the newly generated replacement shares.

Hence a 'total cohort' refresh could be achieved in three steps – using a 9-of-16 cohort as an example:

  1. 9 of the 'original' nodes replace the other 7 original nodes, and 7 new nodes are onboarded.
  2. 7 of the new nodes + 2 of the original nodes replace the other 7 original nodes and another 7 new nodes are onboarded.
  3. Any 9 of the 14 new nodes replace the 2 remaining original nodes.
    Note that the new Cohort may end up with some of the same node addresses as before, unless this is disallowed by the protocol and/or application-level parameters, but they would all hold fresh key shares, pertaining to the same whole public key.

Note that a dishonest threshold of nodes can choose to dump the rest of the nodes out of a Cohort at any time with the execution of a single method. This is congruent with the Honest Threshold trust assumption.

Naming

For prompts 1(a-c) and 2(a), we might call the corresponding method ‘Share Replacement’ or 'Share Substitution'. The existing term, 'share recovery' is misleading as it sort of implies that you'd end up with the same exact key share, same node, or both. It also makes sense to me to reserve the word ‘refresh’ for broader/higher-level changes to the cohort composition – see below.

For prompts 3(a) and 3(b), the protocol could in theory individually and sequentially replace nodes in the cohort, provided that the composition recycling parameters are abided by. However, it is more efficient and safer to generate multiple new shares in one fell swoop. To distinguish this method from individual share replacement, we might call it 'Multi-share Replacement', 'Multi-share Refresh', or 'Cohort Refresh'. The latter is a weaker name because there will be plenty of scenarios where multiple shares are replaced at once without the entire cohort changing.

Hash function selection for `tpke`

We need to be careful with hash function selection, especially if some of it has to be executed later on-chain. Although there's a precompile for Blake2's internal compression function F (https://eips.ethereum.org/EIPS/eip-152), we still need to consider this carefully. We don't want to end up with a hash function salad like our first Umbral implementation, where Blake2b, SHA-256 and keccak256 had to coexist. In rust-umbral we switched to SHA-256 for everything.

In any case, this is not a comment for this PR, but an issue to be opened.

Originally posted by @cygnusv in #8 (comment)
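If we standardize on a single hash as rust-umbral did with SHA-256, usage stays uniform across the crate; a trivial sketch using the sha2 crate (the domain-separation scheme here is only illustrative):

```rust
use sha2::{Digest, Sha256};

// Trivial sketch: one hash function (SHA-256 here, mirroring rust-umbral)
// with explicit domain separation, used everywhere in the crate.
pub fn hash_with_domain(domain: &[u8], data: &[u8]) -> [u8; 32] {
    let mut hasher = Sha256::new();
    hasher.update(domain);
    hasher.update(data);
    hasher.finalize().into()
}
```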

Expose server methods in `ferveo::api`

  • Expose the ferveo methods used on the server side in the ferveo::api module (a sketch follows this list):
    • encrypt,
    • combine_decryption_shares,
    • decrypt_with_shared_secret,
    • Keypair::random,
    • Dkg::generate_transcript,
    • Dkg::aggregate_transcripts,
    • AggregatedTranscript::validate
    • AggregatedTranscript::create_decryption_share
    • Serialization where needed
    • Consider methods that could be used in binding testing
  • Implement and test Python and/or TypeScript bindings for these methods
    • Consider how bindings would be tested in nucypher and nucypher-ts
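As with the client-side issue, a minimal sketch of the re-export surface; the module paths below are assumptions about where these items currently live, not the actual layout:

```rust
// Sketch of a `ferveo::api` module re-exporting the server-side surface.
// The paths below are assumed for illustration.
pub mod api {
    pub use crate::{
        encrypt,
        combine_decryption_shares,
        decrypt_with_shared_secret,
    };
    pub use crate::keypair::Keypair;                  // Keypair::random
    pub use crate::dkg::Dkg;                          // generate_transcript, aggregate_transcripts
    pub use crate::transcript::AggregatedTranscript;  // validate, create_decryption_share
}
```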
