ethereum / builder-specs
Specification for the external block builders.
Home Page: https://ethereum.github.io/builder-specs/
License: Creative Commons Zero v1.0 Universal
To compute the state root for a Capella blinded block, the full withdrawals list is required in the state transition. Currently, a blinded block contains a header with only a withdrawals root, not the list itself. That information is not sufficient to construct the state root. One solution is for the client to call get_expected_withdrawals on the parent state and pass the expected withdrawals to process_withdrawals. I think we might want to clarify this in the builder specs.
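The flow above can be sketched with a toy model. All types and the 32 ETH sweep rule here are simplified, hypothetical stand-ins for the consensus-spec get_expected_withdrawals / process_withdrawals logic, not the real state transition:

```go
package main

import "fmt"

// Withdrawal and State are illustrative; a real client operates on the
// full BeaconState defined in the consensus specs.
type Withdrawal struct {
	Index     uint64
	Validator uint64
	Amount    uint64
}

type State struct {
	NextWithdrawalIndex uint64
	Balances            []uint64 // Gwei
}

// getExpectedWithdrawals mirrors the idea of the spec's
// get_expected_withdrawals: derive the withdrawal list deterministically
// from the parent state (here: sweep any balance above 32 ETH).
func getExpectedWithdrawals(s *State) []Withdrawal {
	var ws []Withdrawal
	for v, bal := range s.Balances {
		if bal > 32_000_000_000 {
			ws = append(ws, Withdrawal{
				Index:     s.NextWithdrawalIndex + uint64(len(ws)),
				Validator: uint64(v),
				Amount:    bal - 32_000_000_000,
			})
		}
	}
	return ws
}

// processWithdrawals applies the expected list, as the spec's
// process_withdrawals would during the state transition.
func processWithdrawals(s *State, ws []Withdrawal) {
	for _, w := range ws {
		s.Balances[w.Validator] -= w.Amount
		s.NextWithdrawalIndex = w.Index + 1
	}
}

func main() {
	parent := &State{Balances: []uint64{32_000_000_000, 33_000_000_000}}
	ws := getExpectedWithdrawals(parent)
	processWithdrawals(parent, ws)
	fmt.Println(len(ws), parent.Balances[1])
}
```

The point is only that the parent state plus the deterministic expected-withdrawals rule is enough to reconstruct the list the header's withdrawals root commits to.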
I've been looking at Teku's builder endpoint recently. After sending the signed blinded block to the builder, Teku will wait 8 seconds for a payload before erroring out. Just to be clear, it will not fall back to the local payload here; it's too late for that. I'm wondering if that is an appropriate value and/or if we should define a timeout for this in the spec. I think that 8 seconds makes sense. In the spec, I only see a timeout for getting the payload header.
These are all of the builder timeouts defined in Teku:
Duration EL_BUILDER_CALL_TIMEOUT = Duration.ofSeconds(8);
Duration EL_BUILDER_STATUS_TIMEOUT = Duration.ofSeconds(1);
Duration EL_BUILDER_REGISTER_VALIDATOR_TIMEOUT = Duration.ofSeconds(8);
Duration EL_BUILDER_GET_HEADER_TIMEOUT = Duration.ofSeconds(1);
Duration EL_BUILDER_GET_PAYLOAD_TIMEOUT = Duration.ofSeconds(8);
Correct me if I'm wrong, but I believe validators only see the value of the block in the bids, not other important properties like the gas used. Thus they cannot trade off gas usage against the fee. Since larger blocks increase the orphan risk, a validator may want to specify a minimum miner tip per gas.
I propose two changes, such that the bid for which

value - minimum_tip * gas_used

is maximal wins. Otherwise, we will see more and more transactions with just a 1 wei miner tip. Block builders have no incentive not to include them (their orphan risk is negligible as they will still have the best bid in the next block). And validators don't even see the size of the block before accepting the bid.
According to the current spec, builder software can accept registrations from validators who are currently active but also slashed.
Given that slashed proposers cannot produce valid blocks, we could tighten the spec to disallow this set of possible registrations.
It is a bit of an edge case but does reduce the possible load on builder software so I wanted to open this issue to see if there is any interest to modify the spec here.
I've benchmarked encoding and decoding of signed-blinded-beacon-block payloads with SSZ vs JSON: flashbots/go-boost-utils#50
SSZ seems 40-50x faster:
BenchmarkJSONvsSSZEncoding/Encode_JSON-10 18561 62939 ns/op 50754 B/op 274 allocs/op
BenchmarkJSONvsSSZEncoding/Encode_SSZ-10 855457 1368 ns/op 9472 B/op 1 allocs/op
BenchmarkJSONvsSSZEncoding/Decode_JSON-10 7621 153292 ns/op 30433 B/op 613 allocs/op
BenchmarkJSONvsSSZEncoding/Decode_SSZ-10 276904 3968 ns/op 10116 B/op 152 allocs/op
This would be particularly relevant for getPayload calls, because of the payload size and timing constraints (but perhaps also interesting for getHeader and registerValidator).
Each getPayload call currently does 8 JSON encoding and decoding steps (if we include mev-boost in the mix):
- getPayload request body (signed blinded beacon block): encoded, decoded, re-encoded, and re-decoded along the proposer → mev-boost → relay path (4 codings)
- getPayload response body (execution payload): the same 4 codings on the way back
Considering 20-40ms per coding on average, that's up to 200-300ms of JSON latency (or more).
Sending the data SSZ encoded could reliably shave 200-250ms off each getPayload roundtrip.
I propose we add SSZ encoded payloads as first-class citizens.
Questions:
- Should mev-boost first attempt getPayload with an SSZ body, and if the relay responds with an error then try with JSON again? Or should it send an accept-encoding: ssz header to getHeader, and if it receives the header in SSZ encoding then use SSZ for getPayload too, otherwise fall back to JSON?
I think it makes sense to write the error conditions first. They are essentially the abort conditions
- if any of these, error
- otherwise, return this (with these conditions on values)
rather than
- return this!
- but don't do that if any of these error conditions
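The abort-conditions-first style argued for above could look like this; the struct fields and specific checks are purely illustrative:

```go
package main

import (
	"errors"
	"fmt"
)

// Registration is a hypothetical stand-in for a message being validated.
type Registration struct {
	Timestamp uint64
	GasLimit  uint64
	SigValid  bool
}

// validateRegistration lists the abort conditions first: if any of these,
// error; otherwise, return success.
func validateRegistration(r Registration, now uint64) error {
	if r.Timestamp > now {
		return errors.New("timestamp in the future")
	}
	if r.GasLimit == 0 {
		return errors.New("zero gas limit")
	}
	if !r.SigValid {
		return errors.New("invalid signature")
	}
	return nil
}

func main() {
	ok := Registration{Timestamp: 5, GasLimit: 30_000_000, SigValid: true}
	fmt.Println(validateRegistration(ok, 10))
}
```

Each failure path is exhausted up front, so the success path at the bottom carries no residual caveats.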
The validator registration endpoint currently takes a single registration with the understanding that relay muxers will send the registration to each relay they are aware of.
The spec for this endpoint returns a binary success/error response, which precludes the case where some relays succeed while others fail when processing a registration.
I think we want to erase this level of detail from the API, in which case we don't need to change anything. For example, a caller of this endpoint who receives an error should assume the registration has completely failed and they need to try again (e.g. the beacon node queues a retry for later). If they get a success, they should assume the responsibility for promulgating their registration now lies with the muxer software (and the implication is that internally, the muxer may retry against transient relay failures, etc.)
If this is not the intended semantics, then we should extend the response so many errors can be returned.
This proposal removes immediate signature verification of new validator registrations, making the verification asynchronous. The information about verification status would no longer be returned from the registration endpoint and would instead be queried from the data API.
This change removes a CPU bottleneck in the relay, along with possible DoS attack vectors, and allows the registration process to be resilient to high loads.
With the current process, relays don't have an even load, and relay operators need to use expensive infrastructure to cover the load spikes. As the signature verification failure status is not used in the flow, smoothing out the spikes should greatly reduce a relay's maintenance costs, increasing the number of people who can afford to run relays.
This change addresses a number of problems and threats to the relay ecosystem that are the effect of verifying registration signatures on register validator submission.
The mainnet right now has more than 400k validators and we expect this number to grow with Ethereum adoption. There are existing mechanisms that re-register existing validators every epoch, resending hundreds of thousands of validator registrations in the first few seconds of an epoch.
Verifying signatures is not a computationally trivial operation - to carry the CPU load, a single relay instance needs to run on an oversized server whose computing power sits idle for the rest of the epoch.
The current implementation of [mev-boost](https://github.com/flashbots/mev-boost/blob/main/server/service.go#L276-L284) does not return error contents back to the validator on failed registration.
The verification of registration signatures is also not immediately used as the registrations are only used to be returned later by the api endpoint ([link](https://github.com/flashbots/mev-boost-relay/blob/174a4a66280aa0289551f61dbabbb17ec202c18d/services/api/service.go#L1420)).
Current network traffic characteristics are similar to a DDoS attack, and the current mechanism creates an attack vector where a bad actor sends its own registration slightly ahead of time and then floods the server with incorrect registrations. It's also possible to completely clog the relay with just the volume of new registrations.
The recurrent registration problem was reported a few times before, but the core of the problem was never satisfactorily resolved.
Relays need to be designed in a way that can handle the load. Some variance might be nice but couldn't be relied upon anyway.
This change is meant to eliminate the CPU-bound performance problems, without changing the entire network’s behavior.
Current Validator registration process:
This proposal aims to make the signature verification (step 4) asynchronous.
The only time a good, lawful, and honest validator may be concerned about its signature state is at its initial or subsequent deployments, i.e. when configuration can change. There is no benefit to knowing it is still correct on every deployment.
Therefore, the information about the state of verification may be offloaded into a separate endpoint and removed from (POST) /relay/v1/builder/validators. It can be achieved by extending /relay/v1/data/validator_registration with an additional enum field - status.
Go (possible implementation):

type Status int64

const (
	Unverified Status = iota
	Verified
	Invalid
)

// SignedValidatorRegistration https://github.com/ethereum/beacon-APIs/blob/master/types/registration.yaml#L18
type SignedValidatorRegistration struct {
	Message   *RegisterValidatorRequestMessage `json:"message"`
	Signature Signature                        `json:"signature" ssz-size:"96"`
	Status    Status                           `json:"status"`
}
The endpoint would then return:
{
  "message": {
    "fee_recipient": "0xabcf8e0d4e9587369b2301d0790347320302cc09",
    "gas_limit": "1",
    "timestamp": "1",
    "pubkey": "0x93247f2209abcacf57b75a51dafae777f9dd38bc7053d1af526f220a7489a6d3a2753e5f3e8b1cfe39b56f43611df74a"
  },
  "signature": "0x1b66ac1fb663c9bc59509846d6ec05345bd908eda73e670af888da41af171505cc411d61252fb6cb3fa0017b679f8bb2305b26a285fa2737f175668d0dff91cc1b66ac1fb663c9bc59509846d6ec05345bd908eda73e670af888da41af171505",
  "status": 1 // or possibly in string form, "verified"
}
The new process would assume that after a successful initial pre-check (e.g. the submission time is plausible and the validator is known), every registration would be persisted with the default unverified state.
It should be left for the relay development team to decide on implementation details, however, the goal for the verification itself is to become “eventually verified”. This new flow would allow various improvements not limited to verifying signatures in the background process, throttling the number of parallel calculations, or calculating signatures only upon request.
For existing relay implementations it would still be possible to preserve verification calculations on submission and save as already verified.
This change introduces a weak inconsistency for people who were expecting to find an error on incorrect signature - that should no longer be returned.
The change to the /relay/v1/data/validator_registration endpoint is additive - meaning there are no protocol-breaking changes.
In existing codebases, the value can still be calculated in the same place as it was before and saved as already verified. So no big immediate changes should be needed.
This proposal doesn’t depend on any other work.
There is no standard for the multi-validator registration process - it is unclear how relays should behave upon the failure of one validator. From the user perspective, it's undesirable to fail all validators in the payload if one has a broken signature. This change allows all validators that passed the pre-check to be registered and discarded only when needed.
As described above this change targets performance improvements for good, lawful and honest validators, as signatures may only change during the deployment of a new configuration, followed by a process restart. There is no benefit to verifying that one’s signature is still correct for every request. Furthermore, current implementations of processes like mev-boost would not return this information back to the validator either.
This simple change can allow relays to use smaller servers, utilizing more of the CPU idle time - as we no longer have a 3s window for verifying >400k signatures - allowing more people to afford running relays.
Issue for discussing various parts of the error responses.
A proposer may submit the SignedBlindedBeaconBlock in SSZ format, using a Content-Type: application/octet-stream request header. If mev-boost / the relay does not accept that format, should it return status code 415? https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/415
This is a bit of an edge case but I think it is worth specifying that if a proposer (with the boost software acting as a proxy) receives multiple signed builder bids with the same value but differing payload headers, the proposer should break the tie with the hash tree root of the signed bid (e.g. choose the lesser root, interpreting the root as a 256-bit number).
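The tie-break described can be as simple as an unsigned big-endian comparison of the two roots, which bytes.Compare performs directly:

```go
package main

import (
	"bytes"
	"fmt"
)

// pickBid implements the proposed tie-break for equal-value bids: keep
// the bid whose hash tree root is the lesser 256-bit number. Roots are
// big-endian, so a lexicographic byte comparison gives the same order.
func pickBid(rootA, rootB [32]byte) [32]byte {
	if bytes.Compare(rootA[:], rootB[:]) <= 0 {
		return rootA
	}
	return rootB
}

func main() {
	a := [32]byte{0x01}
	b := [32]byte{0x02}
	fmt.Printf("%x\n", pickBid(a, b)[0])
}
```

Being deterministic, both the proxy and the proposer converge on the same bid without any coordination.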
Here in builder-specs/builder-oapi.yaml (line 22 in b66471c), either at release time or in master, the spec should indicate the release version.
follow up from the work in #76 to ensure the capella and deneb specs make sense
There is an ambiguity right now with the pubkey contained in the SignedBuilderBid and the signature included in that message.
We want to know which builder produced a given bid/block for accountability reasons and so the signed message should include an identifier (and ideally a signature) for the builder.
Proposers also want to be able to know (and importantly, prove to others) which relays they use when obtaining blocks from the external builder network. We ideally have a separate identifier and signature to bind a particular relay to a bid they forward to validators.
Right now, there is only one set of (pubkey, signature) in the SignedBuilderBid message and it is not clear in the specs which set of actors they refer to.
To remedy, one solution is:
- the builder signs a BuilderBid message that signs over the header and value data
- the relay wraps this into a SignedBuilderBid message
And ideally the relay validates the signature from the builder and the proposer validates the signature from the relay and builder. Proposers could skip builder verification if they want, or perhaps do it offline in situations where it becomes relevant.
Another solution that reduces the overhead here compared to the status quo is to make the builder opaque to the proposer (so that the SignedBuilderBid only has the pubkey and signature for the relay), with an option to also have the relay sign over the builder identity (so include a second pubkey for the builder in the BuilderBid message). If it is not clear from the messages exchanged in the builder APIs which builder was responsible for a block, then I think there should be a separate protocol put in place to increase accountability between builders and relays, but if we go this route then this concern can be handled elsewhere.
If we refer to the current execution Engine API spec for the build inputs: https://github.com/ethereum/execution-apis/blob/main/src/engine/specification.md#payloadattributesv1 we see that there is no way to specify a gas limit.
It is critical for the security of the network that this parameter remains under the control of the validator set and not the builder set (which should be much smaller and possibly not as aligned with ethereum overall).
I think an easy fix is to specify a PayloadAttributesV2 message (h/t @mkalinin) that simply appends a gas_limit field:
* gasLimit: QUANTITY, 64 bits - value for the block gas limit of the new payload
The current design of the API is focused on simplicity, with the goal of providing an interface for receiving blocks from a single, trusted builder. This scenario is contrived though, because in reality there will be additional software involved, such as a multiplexer like mev-boost and relays which create a buffer between builders and proposers.
Expanding the scope of the API beyond the contrived architectural view may make sense to better accommodate these designs, so I would like others to weigh in.
status Endpoint
The status endpoint is defined to return whether the builder is operating normally and able to produce blocks. It's undefined what the response for this endpoint should be when called against a multiplexer. It could return "OK" as long as one connected relay is up, or maybe only return "OK" if all configured relays are up.
I am okay with this being an implementation detail of the multiplexer. As long as that software believes it can retrieve blocks, it should return "OK".
Bid
Expanding the definition of the Bid object to include signatures from both the relay and the builder would give the proposer additional insight into the entity that actually built the block.
I think an issue with this is that the relay can simply fill out the builder signature with another key pair they have on hand and continue masking the builder's real identity.
--
My current opinion is to not expand the scope, but rather focus on the functionality provided by the API to make sure it is adequate to build more complicated systems on top of. Right now, only the most basic protocol for retrieving external blocks is defined. This allows a great deal of flexibility. For example, CLs who want to implement mev-boost natively do not have specialized fields for mev-boost that they need to worry about.
If there are other compelling examples / reasons for expanding the scope, please share them here.
The above validation results in duplicates of valid messages returning: {"code":400,"message":"invalid timestamp"}
I was under the impression registrations needed to be re-signed infrequently, and the once-per-epoch registration we are sending would be a duplicate.
This timestamp requirement also creates a once-per-epoch re-signing requirement (or however frequently we re-send registrations), which could add a lot of load for a client running a lot of validators.
The spec currently says validators should submit their registrations once per epoch.
However, it seems like the mev-boost component has the responsibility to keep registrations it learns about and upstream them to connected relays, in which case the validator client may be posting data every ~6 mins for no additional gain.
I'm opening this issue to seek feedback on expectations around this process and if we can reduce the frequency I'm happy to change the spec to reflect that.
Based on my current understanding, the validator client only needs to call the registration API when they boot or if any of their registration data changes (which practically may be synonymous with boot if clients only allow changes to config during validator client process boot).
The signing routines for the Builder API messages are under-specified.
The spec currently says to compute a message to sign over using compute_signing_root, which takes the SSZ data and a Domain. Rather than specify a Domain, the spec says to simply use the DomainType 0x00000001.
There are two degrees of freedom to go from a DomainType to a Domain which we should decide how to handle:
The basic question to answer is if we want to version messages across instances of the protocol or not. The degrees of freedom here are to allow for messages (like deposits) where the relevant information either is not known or we don't want to restrict a message to a given fork.
If we follow the current layout of the builder spec here, we should version (so use state.genesis_validators_root and the appropriate fork_version) the builder bid types but not version (so use "default" values as defined in the spec) the validator registration types. Moreover, this maps to the "deposits vs. other protocol messages" usage in the consensus specs.
the ethereum consensus specs are executable, such that the specs are testable, and test vectors can be generated for consensus clients to consume. the builder specs should be executable in a similar way.
the scope of this issue is to set up the basic scaffolding for executable builder specs.
for example:
- Makefile
Right now we're trusting that the bid from the builder will pay the value it claims. This can and should be provable with a state proof.
Suppose we're building block N+1 -- the builder bid object can be extended with proofs of the balance of the fee_recipient at blocks N and N+1. The proofs would be rooted against the headers for N and N+1. This would allow the validator to verify that their account does receive the value as expected.
Currently block proposers outsource block building to remote builders via relays to maximize block proposal income. While this makes sense for both solo stakers and pooled stakers as they want to have access to MEV income, it comes at a great cost as it provides relays/builders full control over what transactions can be included in a payload. Block builders might choose to censor certain types of transactions even if they are perfectly valid and economically attractive. The end result would be that block proposers would inadvertently also censor these transactions as full control over transaction inclusion has been handed over to builders as the current APIs currently stand.
The current state of the builder APIs can be improved to allow for stronger censorship resistance guarantees while still allowing validators to access MEV income. Instead of the block proposer allowing the builder to build the whole payload, the block proposer now submits a list of full transactions that it wants added to the payload by the builder when requesting the execution payload header in GET /eth/v2/builder/header/{slot}/{parent_hash}/{pubkey}. This would be a new endpoint, as the expected request/response is different (even if it is easily extendable from the current data structures).
The builder now has to include these transactions in the payload; they act as a constraint on the block building process. Once the builder has built the execution payload, it returns the execution payload header along with a multiproof that verifies the existence of the desired transactions under the payload's transactions root.
Once the block proposer receives the response back it verifies that the transactions were all included by the builder:
Once this is verified, the block proposer can then sign the blinded block and request the full block from the builder and then broadcast it. This proposal accomplishes a few things:
The transactions to be force-included can simply be retrieved from the local node's mempool, e.g. the top N most valuable transactions seen. In the event that the relay/builder attempts to censor, the proposer simply falls back to local block building. This allows honest validators to detect when relays attempt to censor and fall back to building locally.
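Conceptually, the proposer-side check is a subset test of the forced transactions against the payload's transaction list. In practice the proposer only sees the header, so this test is carried out via the multiproof against the transactions root; the direct form below is an illustrative simplification:

```go
package main

import "fmt"

// allIncluded reports whether every forced transaction appears in the
// payload's transaction list. Transactions are compared by raw bytes
// here; a production check compares the leaves proven by the multiproof
// against the payload header's transactions root instead.
func allIncluded(forced, payloadTxs [][]byte) bool {
	seen := make(map[string]bool, len(payloadTxs))
	for _, tx := range payloadTxs {
		seen[string(tx)] = true
	}
	for _, tx := range forced {
		if !seen[string(tx)] {
			return false
		}
	}
	return true
}

func main() {
	payload := [][]byte{[]byte("tx1"), []byte("tx2"), []byte("tx3")}
	fmt.Println(allIncluded([][]byte{[]byte("tx2")}, payload))
	fmt.Println(allIncluded([][]byte{[]byte("tx9")}, payload))
}
```

If the check fails, the proposer discards the bid and builds locally, which is exactly the censorship-detection fallback described above.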
An advantage of this proposal is that it is pretty straightforward for consensus clients to implement on top of the existing builder api.
Remaining testnets before the Merge:
@metachris and I would like to consider CL clients running version v0.1.0 of these specs during the upcoming Sepolia merge as a dress rehearsal for later testnet merges and ultimately mainnet.
There is a trade-off here between shipping a safe merge with reduced code surface and deploying an untested software stack a few epochs into the finalized mainnet merge.
I'm raising this issue as a place for discussion of these various trade offs.
I'll open with the following extension to the builder-specs:
`MERGE_DELAY_EPOCHS` | `16` | `epoch`
Proposers **MUST** wait until `MERGE_DELAY_EPOCHS` epochs after the merge transition has been finalized before they can start querying the external builder network.
* NOTE: if the merge transition happens in epoch N and is finalized in epoch N+2, then proposers **MUST NOT** use the external builder network until epoch N + 2 + `MERGE_DELAY_EPOCHS`.
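The rule above can be implemented as a simple epoch check, e.g.:

```go
package main

import "fmt"

// MergeDelayEpochs mirrors the proposed MERGE_DELAY_EPOCHS constant.
const MergeDelayEpochs = 16

// canQueryBuilders returns true once MERGE_DELAY_EPOCHS epochs have
// passed since the epoch in which the merge transition was finalized.
func canQueryBuilders(currentEpoch, mergeFinalizedEpoch uint64) bool {
	return currentEpoch >= mergeFinalizedEpoch+MergeDelayEpochs
}

func main() {
	// Merge in epoch 100, finalized in epoch 102: builders usable from 118.
	fmt.Println(canQueryBuilders(117, 102), canQueryBuilders(118, 102))
}
```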
Repo seems to be missing a /dist dir with CSS, image, etc. assets.
If you look at the beacon-APIs repo, this folder is just checked in (so I don't think it can be generated locally by some tooling).
I fixed the website render by cloning this repo along w/ the beacon APIs submodule and just copying /dist next to index.html.
We should probably do the same here.
A builder provides an ExecutionPayloadHeader when offering bids to a given proposer.
The proposer (and specifically the local beacon node the proposer is running) should check the header against their local execution state to ensure the header is not inaccurate or even malicious.
We should specify the set of validations that should be performed in the validator guide.
There are some validations relevant for how implementers of the spec use the builder network that are "hidden" in the API documentation.
This organization is a bit different than how similar spec repos (like the consensus-specs) are styled, so it may lead to some confusion.
e.g. I suspect the root cause for this: flashbots/mev-boost#245
Let's pull all validations out of the API docs (or at least put them into the "literate specs")
I wanted to ask to what extent (or whether it is up to the discretion of validator clients) we should warn users about invalid gas limits, or whether the builder will substitute something more sensible and the responsibility is on the builder.
As discussed previously, I believe client teams are using 30M as the default, but perhaps that's simply an implementation detail (instead of the previous block's limit? or whichever is higher?).
note: we would need a release to avoid lowering the limit by default if the network settles on something different.
For user overrides, what were we thinking (I just hope an error isn't thrown back to us) for the intended flow on a bad range for the gas limit? Sending something very large will result in not being able to include that many transactions in a block, and sending too little makes the block unprofitable. Will those be adjusted somehow, or is it totally up to the validator client?
there is a well-known attack on the BLS signature scheme called a "rogue public key" attack
you can read more about it here: https://hackmd.io/@benjaminion/bls12-381#Rogue-key-attacks
the mitigation is straightforward: publish a "proof of possession" along w/ the public key.
given that this spec currently requires builders to sign over their messages, we should also specify that builders publish a "proof-of-possession" alongside their public key and any other configuration info required to connect.
concretely, the "proof-of-possession" can just sign over the message that is the encoding of the builder's BLS public key according to the SSZ spec defined in this repo: https://github.com/ethereum/consensus-specs
Validators are expected to periodically (once per epoch? once per N epochs?) submit a SignedValidatorRegistration to builder software, but I do not think this is documented anywhere outside of a comment in https://github.com/ethereum/beacon-APIs
We should craft a spec/validator.md that contains some light prose on the expected validator flow to be able to participate in the external builder network.
not sure what is going on but I think something broke the rendering of https://ethereum.github.io/builder-specs/ in this commit: 58e2c66