
blobs's Introduction

Ikura

Blobchains on Polkadot and Kusama

Project Structure

blobs: The Ikura monorepo.
├── adapters: Adapters for various rollup development kits (RDKs).
│   └── sovereign: An adapter connecting Sovereign to Ikura.
├── demo: Projects showcasing integration of RDKs with Ikura.
│   ├── rollkit: Rollkit's GM rollup.
│   └── sovereign: Sovereign Demo Rollup.
├── docs-site: Documentation site source, using Docusaurus.
└── ikura: Ikura source code.
    ├── ikura-chain: Implementation of the Ikura parachain.
    ├── ikura-nmt: Namespaced Merkle Trie definitions.
    ├── ikura-serde-util: Various utilities for serde.
    ├── ikura-shim: Shim between the Ikura parachain RPC and RDK adapters.
    └── ikura-subxt: Bindings to the Ikura RPC.

Running Demos

Prerequisites

In general, you need to have the following components running.

First, build ikura-chain:

cd ikura/chain
cargo build --release

Make sure that the zombienet and ikura-chain binaries are in your PATH.

Now you can launch two Polkadot validators and one ikura-chain collator:

./zombienet.sh

Sovereign Demo

Launch ikura-shim with:

cargo run -p ikura-shim -- serve sov --submit-dev-alice

Then launch the demo rollup with:

cd demo/sovereign/demo-rollup
cargo run

Finally, execute the test:

cd demo/sovereign/demo-rollup
./test_create_token.sh

At the end you should see that a batch of two transactions was correctly pushed to the DA layer, fetched back, and then executed in the rollup to create and mint 4000 new tokens.

If you want to rerun the demo, you need to reset zombienet and the demo-rollup:

rm -r zombienet
cd demo/sovereign/demo-rollup
# clean the ledger db
make clean

Rollkit Demo

Launch the ikura-shim for Rollkit with:

cargo run -p ikura-shim -- serve rollkit --port 26650 --submit-dev-alice --namespace 01

The original Rollkit instructions should work; make sure to check them for prerequisites and other details. Below is a quick summary for reference.

Make sure that the Go bin folder is in your PATH:

export PATH=$PATH:$(go env GOPATH)/bin

Go to the rollkit demo folder and launch ./init-local.sh:

cd demo/rollkit
./init-local.sh

Then use the following command to get the demo keys:

gmd keys list --keyring-backend test

Save them into environment variables:

export KEY1=gm1sa3xvrkvwhktjppxzaayst7s7z4ar06rk37jq7
export KEY2=gm13nf52x452c527nycahthqq4y9phcmvat9nejl2

Then you can send a transaction and check the results:

# it will ask for confirmation, type "y" and press enter
gmd tx bank send $KEY1 $KEY2 42069stake --keyring-backend test \
--node tcp://127.0.0.1:36657


gmd query bank balances $KEY2 --node tcp://127.0.0.1:36657
gmd query bank balances $KEY1 --node tcp://127.0.0.1:36657

If you see the amounts:

10000000000000000000042069
9999999999999999999957931

that means it worked!

To reset the chain:

rm -r zombienet
cd demo/rollkit
gmd tendermint unsafe-reset-all

blobs's People

Contributors

pepyakin, rphmeier, gabriele-0201, mrisholukamba, muddlebee, tomusdrw

blobs's Issues

persistent connection

A problem with subxt/jsonrpsee is that once the connection is dropped, you are expected to recreate the whole connection yourself; auto-reconnection will have to be implemented manually.
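A minimal sketch of what such a manual reconnect loop could look like with subxt, purely illustrative (the retry policy and logging are assumptions, not the shim's actual code):

use std::time::Duration;
use subxt::{OnlineClient, PolkadotConfig};

// Hypothetical reconnect helper: keep retrying until a fresh client can be
// built, since subxt/jsonrpsee does not reconnect on its own.
async fn reconnect(url: &str) -> OnlineClient<PolkadotConfig> {
    loop {
        match OnlineClient::<PolkadotConfig>::from_url(url).await {
            Ok(client) => return client,
            Err(err) => {
                eprintln!("connection to {url} failed: {err}, retrying in 1s");
                tokio::time::sleep(Duration::from_secs(1)).await;
            }
        }
    }
}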

Make blob submission more robust

Right now, and even after #76, blob submission is not very robust. Specifically, I fear that users will expect it to be fire-and-forget, and it certainly is not. We should not forget that this code is going to be used by sequencers and thus should be of appropriate quality.

Unit Tests for Blobs Pallet

The stub tests that were there didn't compile and should be replaced with real ones.

To test:

  • Individual blob size limit is respected
  • Maximum number of blobs is respected
  • Total blob size limit is respected
  • NMT calculation

Blob submission keys

Right now, when submitting a transaction, we are using the Alice dev key. Of course, this won't fly. We should come up with a plausible story for this.

shim/docker: handle SIGTERM & SIGINT?

Normally, SIGTERM and SIGINT terminate the receiving process unless overridden. However, the PID 1 process is special-cased and doesn't get this default behavior, so those signals are ignored.

In Docker, a container's entrypoint is spawned as PID 1 and thus is not killable by default.

One workaround is to use tini or similar. We could also just install our own signal handlers. We should, however, ensure there are no surprises when running under Docker.
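For reference, a minimal sketch of installing our own handlers with tokio, assuming the shim runs on a tokio runtime built with the "signal" feature; the shutdown behavior itself is an assumption:

use tokio::signal::unix::{signal, SignalKind};

// Wait for either SIGTERM or SIGINT and then begin a graceful shutdown,
// so the process stays killable even when running as PID 1.
async fn wait_for_shutdown() -> std::io::Result<()> {
    let mut sigterm = signal(SignalKind::terminate())?;
    let mut sigint = signal(SignalKind::interrupt())?;
    tokio::select! {
        _ = sigterm.recv() => eprintln!("received SIGTERM, shutting down"),
        _ = sigint.recv() => eprintln!("received SIGINT, shutting down"),
    }
    Ok(())
}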

Avoid waste of blockspace with invalid blobs

Right now the submit_blob extrinsic returns an error if the preconditions for blob submission are not met.

Specifically, those preconditions are:

  • the blob itself is less than or equal to the maximum (actually, enforced by encoding with BoundedVec)
  • the ordinal number of the blob is less than or equal to the maximum number of blobs.
  • the total blob size (including the one being submitted) must not exceed the maximum size.

Those preconditions may make sense, but they are not doing what they are supposed to do. Specifically, it looks like they are trying to allocate the blockspace for blobs, but the way it's done is very weak. Let's take a closer look.

Returning an error from the extrinsic has the following effects:

  1. The sender is still charged with fees and the collator still gets paid.
  2. The extrinsic data is left in the block.
  3. The blob is not committed in the NMT.

First of all, even if the preconditions are not met, it doesn't mean that there is absolutely no space for another blob. It may still be possible (although, IMO, unlikely) that another blob can be put into the block by the collator. The blob will end up in the block but won't affect the NMT trie. In this case, the collator gets its inclusion fee regardless, and as such is actually incentivised to shove as many blobs as possible into a block, potentially robbing poor sequencers of their fees.

At the same time, those limits are present to make sure that there is space left in the block for other extrinsics and that there is some space left for witness data in the PoV. However, I would argue that they don't actually achieve any of that.

If the limits are meant to make sure there is enough space for the witness in the PoV even when the blob slots are fully packed, they won't actually achieve that, because other types of extrinsics or logic may already fill the PoV with witness data. Furthermore, there is not much sense in limiting the consumption of PoV data in this way: anything goes as long as the block builder is able to fit into the block size limit and the PoV size limit.

If the limits are meant to make sure the block space is not consumed, then they are not fit for that either, because a transaction that failed blob inclusion is still present in the block and occupies space. As mentioned above, the collator is not incentivized to avoid overfilling the blockspace. The same dynamic also applies here: in the end it doesn't matter how exactly the collator decides to allocate the block as long as it's valid (and after all, the collator may forfeit its block creation privilege anyway).

The actual effect, or the "punishment", for including a blob when it's not supposed to be included is not committing the blob into the NMT. This is silly, because:

  1. It's just the runtime being needlessly strict towards, I guess, the block builder. It refuses to do work it is perfectly able to do, over something akin to making a minor mistake when filling out a form. Everything that's needed is there; it should just do the job.
  2. It's misattributed. The punishment is directed at the blob submitter for the result of an action of the block producer.
  3. Even setting the misattribution aside, it fails as a punishment. Not every rollup is interested in creating inclusion proofs, so they may as well use the failed extrinsics anyway.

So the current construction clearly doesn't work. On top of that, it also gives rise to some complications. For example, see #71.

All of that suggests that we should probably treat the blobs present in the block as unconditionally successful. It should be up to the collator to decide how much PoV and block space it wants to allocate to each part of the block, and the collator should be incentivized to do the "right thing", which is to include more blobs.

However, there are some constraints that should probably still be enforced. One is the number of blobs. Firstly, they spend some compute; sure, that could be handled by weight. However, in the future we may get to sugondat multicore, and an unbounded number of blobs in a worker may overwhelm the aggregator.

The maximum size of a blob may also be worth enforcing.

There is another constraint we discussed before: the blobs should be ideally ordered by their namespace.

So we have those block validity constraints; how do we want to satisfy them?

Panicking is one option. It would actually help keep precondition-violating blobs out of a block. However, panics typically pose a DoS vector: an extrinsic that panics is not included in the block, so the sender is not charged fees and can create lots of such blobs. I am also not sure about the behavior of the mempool in this case. When a transaction is validated and its dispatchable fails, how should that be treated? I guess the blob transaction should be banned, because transactions are always included based off a fresh block state (and thus only the blob's own properties can lead to panics, and those won't change with revalidation). Then, how would that work during block building? If the submit_blob extrinsic panics, should the transaction be retried or banned? If the transaction panicked because the block was full, the blob could still fit into the next block, and banning it would mean the blob is forgotten by the network. If, however, the blob is too big, not banning it means the adversary gets free tries. Despite all that, panicking won't let us enforce the namespace-ordering constraint anyway.

So it's clear to me that panicking is not the right tool for that.

Perhaps something like ValidateUnsigned would work. Well, in our case there is a signer; I just don't know if there is a corresponding API for that. Such an API would definitely be preferable, because the constraints should be the same as for a normal signed transaction, plus some extra ones.

Next, the nuclear option is custom block authorship. That would provide us with the greatest flexibility. For example, maybe we want to split submit_blob from the blob data: submit_blob would only carry the blob hash, a full block has a payload space for the blob preimages, and a valid block always has all the blob preimages. The PoV probably still has to contain all blob preimages. On the p2p layer between sugondat nodes, the blobs could sometimes perhaps be omitted. Those kinds of tricks.

That said, this is probably more for the future.

Design: Horizontally Scaling Data Availability Across Cores

Problem Statement:

A single Polkadot core provides 5MB of data per parachain block at a frequency of 6s, for a total of 0.83MB/s data throughput per core. To be truly competitive as a data availability service, Sugondat must utilize multiple cores.

Design Goals:

  1. Scale to use as many cores as possible
  2. Use the existing Parachain framework to access cores rather than waiting for CoreJam
  3. Light-client friendly observing and fetching
  4. Censorship Resistant Blob Submission
  5. User fee payment per blob
  6. Nice to have: hide load balancing from the user
  7. Nice to have: support auto scaling
  8. Nice to have: fast paths for data fetching (to avoid hitting the Polkadot validator set)

The nice-to-have goals are not necessary for any initial design but are worth thinking about now. Furthermore, we should be comfortable using a centralized architecture in an initial rollout, as long as there is a clear path to a decentralized architecture.


Approach 1 draft: Coordinated Worker model

In this model, we would have N Worker chains and 1 Aggregator chain. Each worker would simply make blobs available by bundling them in blocks. A single aggregator chain would scrape the worker chains and construct some kind of indexed data structure (such as an NMT) in each block which contains a reference to an application identifier, the blob hash, and a reference to the worker chain ID and block hash. Users would submit blobs in a p2p mempool which spans all the worker chains, with some mechanism which ensures that each blob is only posted to a single worker chain.

Compared against design goals:

With respect to goal (1), this scales up to the point where the aggregator chain or worker chain collation is the bottleneck. This depends on how expensive the scraping is.

Light clients, for goal (3), would need to follow only the aggregator chain and then fetch block bodies for various worker chains. Bridge-based light clients only need the data availability commitment, which is guaranteed by bridging over the aggregator chain alone. The worker approach also assists with goal (8), as in the fast path the light client can fetch the block body from any full node of the worker chain. In the case of a Polkadot DA fallback, the light client needs the candidate hash to make the ChunkFetchingRequest to relay chain validators, and can only get the candidate hash by scraping relay chain block bodies.

Goals (4), (5), and (6) highlight the difficulty of this approach:

  1. User balances must be global across all workers
  2. Collators for each worker chain should be coordinated to avoid submitting blobs to multiple workers simultaneously

This implies that the worker chains should be advanced by the same pre-consensus process and collator set which is shared among all workers. This pre-consensus process would be responsible for determining what goes into each worker chain block at each moment, and would need to adapt to Polkadot faults such as inactive backing groups.


Approach 2 draft: Uncoordinated Worker Model

The uncoordinated worker model is similar to the coordinated worker model, but sidesteps the need for a pre-consensus by having workers operate independently without any consensus-level mechanism to avoid blobs being posted twice. Users would keep balances on each worker chain and submit blobs only to a specific worker chain. An aggregator chain would still track the blobs published on each worker chain for easy indexing, but load-balancing would become a user-level concern.

Runtime Integration Test for pallet-blobs Signed Extension

After #103 we have unit tests for the signed extension functionality itself, but we need to add integration tests for the runtime which

  1. Ensure the correct behavior when invoking TaggedTransactionQueue::validate_transaction
  2. Ensure the correct behavior when invoking BlockBuilder::apply_extrinsic

Figure out running nodes

Storage:

  • Rollup stacks are very immature, so all historical blocks must be kept and served. Calibrate the maximum block size so that a 2 TB SSD is not overwhelmed within one year.

CPU:

  • 2 / 3 vCPUs

Network:

  • Validators: 500 Mbps; RPC nodes: 1 Gbps

Memory:

  • 16 GB / 32 GB? The mempool is likely the largest contributor.

cleanup demo adapter

Right now, the demo still mentions the Celestia adapter in a few places. Here, for example, is rollup_config.toml:

[da]
# The JWT used to authenticate with the celestia light client. Instructions for generating this token can be found in the README
celestia_rpc_auth_token = "MY.SECRET.TOKEN"
# The address of the *trusted* Celestia light client to interact with
celestia_rpc_address = "http://127.0.0.1:26658"
# The largest response the rollup will accept from the Celestia node. Defaults to 100 MB
max_celestia_response_body_size = 104_857_600
# The maximum time to wait for a response to an RPC query against Celestia node. Defaults to 60 seconds.
celestia_rpc_timeout_seconds = 60

We should clean it up.

shim: Nicer submit key message

"no submit key provided, will not be able to submit blobs" is printed when there is no submit key. Perhaps, for DX, we could additionally print "use --submit-dev-alice or --submit-private-key=<..> to fix".

shim/sovereign: remove greedy polling

Right now each get_block request awaits the needed height, and each request runs its own polling loop, which is not ideal. We could solve this by placing a condition variable (or a similar notification primitive) in the sovereign adapter code.
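A rough sketch of the idea using a tokio watch channel as the notification primitive; the names and structure are assumptions, not the actual adapter code. One background task publishes the latest finalized height, and every get_block handler awaits the height it needs:

use tokio::sync::watch;

// A single producer task would call `tx.send(height)` whenever a new
// finalized height is observed.
pub fn height_channel() -> (watch::Sender<u64>, watch::Receiver<u64>) {
    watch::channel(0)
}

/// Resolve once the published finalized height reaches `target`.
pub async fn wait_for_height(mut rx: watch::Receiver<u64>, target: u64) {
    loop {
        if *rx.borrow() >= target {
            return;
        }
        // `changed()` resolves when a new height is published; an error means
        // the sender side was dropped, in which case we stop waiting.
        if rx.changed().await.is_err() {
            return;
        }
    }
}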

The common API

There are several rollup stacks out there. E.g. #19 #8 #9 and sovereign-sdk.

Sov is in Rust; the others are in Go. There may well be others in completely different languages (although that is not very likely). We don't want to reimplement all the intricacies in every adapter, so it may be worthwhile to push most of the heavy lifting into what we previously called an API.

There are several approaches to build such an API:

  • One of them is to try to come up with a common denominator API that could be used to implement every adapter we would encounter.
  • Another is to provide an API specific to each adapter, i.e. an ad-hoc API, e.g. /sovereign_v1/finalized_at/:height and /rollkit_v2/get_blob/:id.

While the common denominator approach seems cleaner/more elegant/more aesthetically pleasing, I think the ad-hoc API approach is the way to go, at least for the beginning.

I think getting the common denominator API is tricky. Every new adapter may change the whole thing completely and we may still need to add the ad-hoc things for specific adapters.

I am pretty sure we will have some breaking changes initially. With ad-hoc APIs we don't have to spend too much time foreseeing all the possible use-cases; we just implement the current API in the form of RPC and that's it. There may still be a new adapter that challenges some foundational assumptions, but that would affect the internal API and implementations, which are easier to change than the common denominator RPC, which is a public API. Such a change could affect adapters that were perfectly happy with the previous version of the API (most of them, probably).

The elegance of the common denominator API also comes at a cost: while an ad-hoc API will return all the required information for an adapter call in one go, the common denominator API may require several calls and further logic on the adapter side.

OTOH, the ad-hoc APIs allow making the adapters dumb. All the heavy lifting is performed in the implementation of the API. Making the adapters dumb is desirable because it makes the whole system more amenable to testing. Arguably, dumb adapters don't require much testing besides some integration tests, and we can cover the common Rust logic more extensively than we could ever do in adapters.

The ad-hoc API has a downside: the adapters will probably have to be in-tree. I think this is fine. We can provide the adapters for select rollup stacks; for example, I wonder if #9, #8, and #19 already cover a big chunk of rollups. Realistically, we cannot expect that stack devs will line up to implement integrations with sugondat, so we will have to do it ourselves, at least initially.

Also, picking the ad-hoc API way doesn't mean we should stick forever to it. Maybe, if there's enough demand, we could add a common denominator API later on, when we gain enough experience and things may stabilize.

With all that said, I will present my current vision of the common denominator API below. The reason is that it should be obvious how to structure the ad-hoc APIs, whereas, as I mentioned, the common denominator API is less obvious. Also, this API may perhaps inform the under-the-hood abstractions for the ad-hoc API approach.

header(height) → AbridgedHeader

/// Get the blob by its ID.
get_blob(id) → Option<Blob>

/// Returns the indices of blob extrinsics for the given namespace.
get_blobs(height, Namespace) → [ExtrinsicIndex]

/// Same as the above, but also returns a proof that the returned extrinsic index vector:
/// (a) doesn't omit any extrinsic index (exhaustive)
/// (b) every extrinsic index represents a blob of the specified namespace.
get_blobs_with_proof(height, Namespace) → [ExtrinsicIndex], Proof

validate(id, proof) → bool


submit(Blob, Namespace) → id

Notes:

  • height — a finalized block number. In every API so far it was u64 and I am ok with it. In JSON we may consider using String to encode it.
  • id is an identifier of an included blob, e.g. height and extrinsic index or height and extrinsic hash.
    • There is a subtle difference between picking the index or the hash for the ID: the hash can be obtained before submitting, but the index can only be obtained after the blob has landed. Arguably, it doesn't matter much, because the height has to be obtained after landing anyway.
    • Substrate default RPC doesn't provide extrinsic-by-hash API and it's not trivial to add since that would require integrating the corresponding index. Hence height is required.
  • AbridgedHeader consists of:
    • hash
    • parent
    • nmt_root
    • timestamp
    • ...
  • Blob = Vec<u8>
  • Namespace, TBD by #12
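To make the shape of this surface a bit more concrete, here is a rough Rust sketch of the common denominator API as a trait. The trait name, associated types, and the use of the async_trait crate are illustrative assumptions, not the actual Ikura definitions:

/// Hypothetical sketch only; names are assumptions.
pub type Blob = Vec<u8>;

pub struct AbridgedHeader {
    pub hash: [u8; 32],
    pub parent: [u8; 32],
    pub nmt_root: [u8; 32],
    pub timestamp: u64,
    // ... further fields as needed.
}

#[async_trait::async_trait]
pub trait DaLayer {
    type BlobId;
    type Namespace;
    type Proof;
    type Error;

    /// Abridged header at a finalized height.
    async fn header(&self, height: u64) -> Result<AbridgedHeader, Self::Error>;
    /// Get the blob by its ID.
    async fn get_blob(&self, id: Self::BlobId) -> Result<Option<Blob>, Self::Error>;
    /// Indices of blob extrinsics for the given namespace at the given height.
    async fn get_blobs(&self, height: u64, ns: Self::Namespace) -> Result<Vec<u32>, Self::Error>;
    /// Same, plus a proof of exhaustiveness and namespace membership.
    async fn get_blobs_with_proof(
        &self,
        height: u64,
        ns: Self::Namespace,
    ) -> Result<(Vec<u32>, Self::Proof), Self::Error>;
    async fn validate(&self, id: Self::BlobId, proof: Self::Proof) -> Result<bool, Self::Error>;
    async fn submit(&self, blob: Blob, ns: Self::Namespace) -> Result<Self::BlobId, Self::Error>;
}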

docs: sovereign adapter

https://github.com/thrumdev/blobs/tree/main/adapters/sovereign

Update README.md for the sovereign adapter. Right now it is just copy-pasted from the template. The README should try harder to be useful. It may describe the architecture and perhaps link to a copy of the template, but I think it would be better if it talked about things like how to integrate it into a project: anything that a developer looking to integrate it with a Sovereign-based rollup should know.

sugondat-node: fix warnings

There are quite a few warnings coming from sugondat-node. It seems that the node template is outdated and uses some soon-to-be-deprecated APIs. This should be fixed.

CI: make sure that demos work

Demos are not in CI and it's quite cumbersome to run them manually. They should be checked in CI.

It's a bit heavy though, so maybe we should run them only on master commits or as part of the merge queue, as opposed to every commit on a PR. I am OK with breaking master occasionally for the time being if that improves CI times. When the project is released, we should be stricter about that.

Perhaps that would require us to write docker-compose files (or something similar). Those would give us a one-click way to launch the demos, which is useful for checking things locally. Those files can also double as configuration examples should the user want to build their own test environments.

rename shim?

shim is fine, but sugondat-shim may not be: it is too long to type, and tools and apps like that should try to have short names. Imagine if git, node, npm, go, and even cargo (although that one is already a bit on the longer side) had longer names.

I think we should pick a name that is at most about 8 characters long. Ideally it should be catchy, but it doesn't have to make sense and could be a made-up word.

Test Kusama Parachain using Chopsticks

Chopsticks is a testing tool for creating a parallel reality. We will use this to ensure that we can deploy the runtime onto Kusama safely, so we do not win an auction with a bricked chain like many teams before us.

We want to create a test scenario where:

  1. The Kusama parachain is registered with the desired genesis and Wasm code, with the reserved para-id: 3338
  2. The Kusama parachain has a lease assigned
  3. Kusama has a sudo key for making root-level transactions to emulate governance.

Properties to test:

  • Teleport KSM from relay to para
  • Teleport KSM from para to relay
  • Sudo: perform governance operation
  • XCM: perform governance operation from Kusama
  • Blob: submit blob

shim: proper error handling

Specifically, there are a few unwraps scattered around the codebase. For example, if you call sovereign_getBlock with the node URL set to a non-sugondat node, it will panic with the error:

thread 'tokio-runtime-worker' panicked at 'called Result::unwrap() on an Err value: no tree root found in block header. Are you sure this is a sugondat node?', sugondat-shim/src/adapters/sovereign.rs:27:64
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace

(Another issue is that the Sovereign side doesn't seem to notice that the shim panicked, so it kept waiting.)

shim: Avoid bare serialization of byte buffers and other non-standard json types

Right now we are using bare bones serialization for blobs.

pub struct Blob {
// TODO: This should be a newtype that serializes to hex.
pub sender: [u8; 32],

Basically, it encodes each byte as a decimal number, delimited by commas. This is not very efficient.

A better approach would be to use base64 or hex.

base64 is used by Celestia/Cosmos. hex, prefixed with 0x, is used by Ethereum and thus inherited by Polkadot. I lean towards the latter.
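For illustration, a minimal sketch of a hex-serializing newtype along those lines, using serde and the hex crate; the type name and exact prefix handling are assumptions:

use serde::{Deserialize, Deserializer, Serialize, Serializer};

/// Hypothetical newtype that serializes 32 bytes as 0x-prefixed hex instead
/// of a JSON array of decimal numbers.
#[derive(Debug, Clone, Copy)]
pub struct HexBytes32(pub [u8; 32]);

impl Serialize for HexBytes32 {
    fn serialize<S: Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
        serializer.serialize_str(&format!("0x{}", hex::encode(self.0)))
    }
}

impl<'de> Deserialize<'de> for HexBytes32 {
    fn deserialize<D: Deserializer<'de>>(deserializer: D) -> Result<Self, D::Error> {
        let s = String::deserialize(deserializer)?;
        // Accept both "0x..." and bare hex.
        let s = s.strip_prefix("0x").unwrap_or(s.as_str());
        let bytes = hex::decode(s).map_err(serde::de::Error::custom)?;
        let arr: [u8; 32] = bytes
            .try_into()
            .map_err(|_| serde::de::Error::custom("expected exactly 32 bytes"))?;
        Ok(HexBytes32(arr))
    }
}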

sugondat-shim

So let's say we implement #23

The question is: how? I'd like to propose we consider implementing it as a shim/sidecar server running alongside the rollup and the sugondat-node, instead of implementing it as part of the RPC of sugondat-node.

There are several facets, let's go through each of them.

UX/DX

Yes, that would mean that the user will need to run another thing. IMO it's fine. I agree it would be better to run a single binary, but we are already past this: in the minimal deployment the rollup user would run the rollup node and the sugondat node as well. This already warrants using something like docker-compose. More realistically, the user will run some monitoring tools as well in a typical deployment. Adding another app here doesn't feel like much difference.

In a development environment, the dev would need to run the rollup node, sugondat-node and perhaps one or more polkadot validators, which already kinda assumes there would be some orchestration tooling (zombienet or docker-compose).

Having a shim may actually improve the experience, though. Imagine that we could provide sugondat-shim simulate --data /tmp, where instead of depending on a full-fledged sugondat environment, it would simulate a DA layer. I expect this would improve the DX significantly.

Key management

There is a problem right now (#22): we completely ignore the transaction signing aspect of blob submission in adapters. As I alluded to in #23, I think it's worthwhile to shift the complexity into the common API layer and make the adapters dumb. This would fit perfectly into the shim's set of responsibilities.

E.g. when running in non-simulation mode, the user would be able to specify: sugondat-shim --submit-private-key=/var/secrets/my_key (or sugondat-shim --submit-dev-alice to preserve the existing behavior) and that would enable the blob submission endpoint.

Flexibility

We would decouple running sugondat-node from the rollup client. That:

  1. enables users to pick whether to run a node locally or point to some, potentially public, remote endpoint.
  2. allows us to embed the light node later on.
  3. gives us a point of integration on users' machines. We could add some caching, or maybe promote the node to do some extra work, e.g. portal-network-like stuff but for blobs.

Shortcomings of Substrate RPC

Funny thing, but this approach doesn't address the issue we discussed in Slack: we still have to request full blocks. This is fine for now: for the local use-case it works, and for the remote use-case it's worse, but in the future sugondat-node should expose something more efficient.

Embedding

In case embedding is badly needed, it will be possible to arrange that anyway at relatively low cost through an hourglass pattern. This is how we can achieve it: we link the shim into the adapter directly. The shim publishes a very slim FFI API that configures and sets up a server, plus some FFI functions to send a message to the server and receive a result, very much like an in-process HTTP server (although the API would be more complex if websocket/JSON-RPC is used for the shim transport).

Idea: load balancing by implicit worker-chain locking of global account states

A follow-on from this excerpt in the discussion of https://github.com/thrumdev/sugondat/issues/4#issuecomment-1797062292

A rough writeup of the idea, although in its current state I can't see us wanting to go with it.

There is a possible set of approaches where each blob can actually land on any worker chain, but only a single one per relay-parent, and only if it has not been included on any other worker chain already...but I'm not sure exactly how this would look.

What this essentially asks for is:

  1. A global state of account balances and nonces
  2. A locking protocol which ensures any particular transaction is handled on only a single worker chain

Solutions of this type are possible, but with latency, because worker chains do not run completely independently but are all secured by the same relay chain and they can receive state proofs from the relay chain.

We can have a global state of accounts stored on the aggregator as well as a record of the last relay parent known for each worker. The aggregator updates the nonce of a sender whenever a worker includes a blob from the sender. The worker chains will use state proofs of the aggregator state to implicitly acquire unique locks on the account state.

Each transaction (sender, nonce) pair implies a random ordering of all worker chains such that at any relay chain block number there is a deterministic fn transaction_lock_description(RelayChainBlockNumber) -> { current: WorkerId, previous: WorkerId, prev_release: RelayChainBlockNumber } which describes the current conditions of the transaction lock:

  1. The current worker chain which may host the transaction
  2. The previous worker chain which could have posted the transaction
  3. The last relay parent at which the transaction could have been posted to the previous

Therefore, if tx.nonce == nonce_from_aggregator_state(sender) + 1 and next_possible_relay_parent(previous_worker) > transaction_lock_description(now).prev_release then the lock is owned by the current chain.
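A rough, self-contained sketch of that ownership check; all type names and helper parameters here are assumptions made up for illustration:

type WorkerId = u32;
type RelayChainBlockNumber = u32;

struct TransactionLockDescription {
    current: WorkerId,
    previous: WorkerId,
    prev_release: RelayChainBlockNumber,
}

fn lock_owned_by_current(
    this_worker: WorkerId,
    tx_nonce: u64,
    // Sender nonce as read from a state proof of the aggregator state.
    aggregator_nonce: u64,
    lock: &TransactionLockDescription,
    // Earliest relay parent on which the previous worker could still build.
    next_possible_relay_parent_of_previous: RelayChainBlockNumber,
) -> bool {
    this_worker == lock.current
        && tx_nonce == aggregator_nonce + 1
        && next_possible_relay_parent_of_previous > lock.prev_release
}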

However, there is an edge case when the previous(previous) worker chain has had a block stuck in availability: such a block may be included with a very old relay parent and may include the transaction. So although the previous chain would not have acquired the lock by the above conditions, the current chain assumes that it has exclusive access, and this results in a double spend when the old block from two hops before is finally included. This can be fixed by ensuring that the transaction_lock_description changes slowly, so that by the time of prev_release, the next_possible_relay_parent(previous(previous)) must have been released.

Lock handovers are very time consuming. Each worker chain has a view of the aggregator which is 12-18 seconds out of date, due to asynchronous backing. The aggregator in turn has a view of each worker chain which is 12-18 seconds out of date, leading to a total expected case latency of 24-36 seconds for a lock handover. This could be improved by having worker chains directly read each others' state rather than taking the aggregator's view of the state, but this harms horizontal scaling.

The random ordering for each (sender, nonce) pair should vary in ordering as well as in changeover time (but not duration) to ensure a balanced load across the system.

Balance withdrawals from the global state also add some complexity, in that the aggregator chain would need to acquire an exclusive lock on the account state. One approach is to add an additional (bit) flag to each account and execute the withdrawal only at the point where all worker chains would have seen the bit flag and refused to execute any transaction from the user.

CI: ensure subxt bindings are up-to-date

Instead of generating subxt bindings during each build, we pregenerate them and commit them in-tree. However, the bindings may go out of sync with the runtime due to (deliberate or accidental) changes.

Therefore, CI must fail for commits whose bindings do not correspond to the bindings produced by the runtime.

shim: make sure it's connected to sugondat-node

The shim expects a node URL to connect to. It's common to run several nodes on one machine, and it's not impossible to connect to the wrong node, e.g. a relay chain node.

Using subxt, we can validate that the blobs pallet is present upon establishing the connection.
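A rough sketch of such a check, assuming a recent subxt version where Metadata::pallet_by_name is available and assuming the pallet is named "Blobs":

use subxt::{OnlineClient, PolkadotConfig};

// Connect and verify the blobs pallet is exposed in the runtime metadata.
async fn check_blobs_pallet(url: &str) -> anyhow::Result<()> {
    let client = OnlineClient::<PolkadotConfig>::from_url(url).await?;
    if client.metadata().pallet_by_name("Blobs").is_none() {
        anyhow::bail!("connected node does not expose the blobs pallet; is this really a sugondat node?");
    }
    Ok(())
}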

Determine Namespace ID size

Can likely be future-proofed with left padding

  • Collect background reasoning behind Celestia 27 byte namespaces

shim: health endpoint

Docker has a feature called HEALTHCHECK. It allows running a predefined command that checks the health of the container. It may be a nice-to-have for ops. It is also useful for docker-compose, since it allows specifying depends_on for start-up order; ideally the demo containers should start only after the shim is up.
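A minimal sketch of what a dedicated health endpoint could look like using only the standard library, assuming an "always OK once the shim is up" answer is good enough for a HEALTHCHECK probe; the address and behavior are assumptions:

use std::io::{Read, Write};
use std::net::TcpListener;

// Bind a separate port and answer 200 OK to any request. A real
// implementation could first check connectivity to the node.
fn serve_health(addr: &str) -> std::io::Result<()> {
    let listener = TcpListener::bind(addr)?;
    for stream in listener.incoming() {
        let mut stream = stream?;
        // Read (and ignore) whatever request arrived, then answer 200 OK.
        let mut buf = [0u8; 1024];
        let _ = stream.read(&mut buf);
        let _ = stream.write_all(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\nConnection: close\r\n\r\nok");
    }
    Ok(())
}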

fix todo: rollkit adapter multiple blocks submission

Right now, the rollkit adapter API can submit several blobs at once and expects a single "DA layer height" value to be returned. It looks like it's implied that the blobs are all going to be submitted at the same time.

I am not sure why they make such an assumption, but it looks like we could:

  1. leverage the existing batch functionality from the utility pallet. It looks like batch_all variety should work better, so that all blobs are either included or not included.
  2. roll our own implementation of submit_blob_batch and then reimplement submit_blob as a degenerate case of submit_blob_batch with one blob.
  3. fuse the blobs together in the rollkit adapter or dock.

Approach (2) seems better than (1) at first sight, at least because it's more efficient: it performs a single check instead of performing the checks for each blob. However, (2) will require additional benchmarking on our side, whereas (1) won't.

Note

There is a weird quirk: if blob validation fails, the blob will not get into the final NMT, but it will still occupy space in the block.

See https://github.com/thrumdev/sugondat/issues/74

Approach (3) may be a good one for now. It doesn't require any changes on the runtime side and should be pretty simple.

Failed submit_blob will still end up in returned blobs

It's possible that when the shim returns blobs for a specific namespace (for the sovereign dock at minimum), it will attempt to create proofs of inclusion. However, some of the blobs may have failed to be included. Those submit_blob extrinsics would still be present in the block and as such will be returned in the resulting "blobs" field. By default the sovereign adapter would treat all of them as included, while the inclusion proof would not contain the failed blobs. Therefore, I would expect this inclusion proof to fail verification.

This would be solved if we did not allow invalid blobs in blocks, as per #74.

The Ultimate Demo

We discussed how we want to launch with some demos. That's cool and all, but I doubt that people will rush to build stuff on top of it.

So I was wondering if there is something that normal people could interact with. Something where people could send transactions, something tangible.

Well, it's not obvious what could be deployed that works out of the box. Yes, there is #9, but I doubt we can make it work that easily. It's hard to imagine building something that would convince people to ape in immediately, even if all it takes is bridging DOT.

At the beginning of the Kusama chain there was the original pessimistic rollup, called 1m remarks, conceived by Shawn Tabrizi. Basically, it worked with the help of System.remark: users would send a remark with a specifically crafted message to paint a single pixel on a canvas. Special node software (well, more like a script) would scan the remarks, put them on the canvas, and render them as a picture which then went on the Kusama website.

We can pull off the same thing here and revive it as 1m-remarks-9000 (1mr9k). Specifically, craft a pessimistic rollup (i.e. no SCA) that would just scan the transactions and create a picture. 1m remarks was not easily discoverable (it was actually hidden on the website), and 1mr9k should be very discoverable. 1m remarks did not come with software for dropping pictures, and 1mr9k should come with software that can draw entire pictures. Ideally, there should be a website that posts the picture, but maybe a command line tool would suffice.

If that picks up, it would make a good stress test. If it doesn't pick up, well, it would still make a good stress test but locally :p

Probably, https://github.com/thrumdev/sugondat/issues/16 should be taken into account.

shim: simulate

As described here https://github.com/thrumdev/sugondat/issues/24

Mainly, it should try to closely model a running sugondat parachain:

  • it should probably emulate the distinction between finalized and non-finalized blocks.
  • blobs may take some time to land, but can also land immediately.

The user must be able to interact with the simulation somehow. First of all, there must be functionality comparable to sugondat-shim query. That is, the user must be able to submit blobs, inspect blobs, and inspect block contents, at a minimum. Optionally, the user should be able to do some advanced manipulations such as reverting blocks.

Given that, I think we should duplicate those controls under simulate, e.g. (see the sketch after this list):

  • simulate serve – starts the simulation. Optionally, provides a way to store the "blockchain" on disk.
  • simulate submit – similar to query, but operates on the simulation and provides additional settings such as inclusion delay.
  • simulate revert – reverts the simulated blockchain. I guess for simplicity operates on the running simulation as opposed to locally, although that's up for discussion.
  • simulate get-blob
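For illustration, a rough sketch of how these subcommands might be wired up with clap; all flag and variant names are assumptions rather than the shim's actual CLI:

use clap::{Parser, Subcommand};

#[derive(Parser)]
struct Cli {
    #[command(subcommand)]
    command: Command,
}

#[derive(Subcommand)]
enum Command {
    /// Commands operating on the simulated DA layer.
    Simulate {
        #[command(subcommand)]
        command: SimulateCommand,
    },
}

#[derive(Subcommand)]
enum SimulateCommand {
    /// Start the simulation, optionally persisting the "blockchain" to disk.
    Serve {
        #[arg(long)]
        data: Option<std::path::PathBuf>,
    },
    /// Submit a blob, with an optional inclusion delay in blocks.
    Submit {
        #[arg(long)]
        delay: Option<u64>,
    },
    /// Revert the simulated chain by the given number of blocks.
    Revert { blocks: u64 },
    /// Fetch a blob by its ID.
    GetBlob { id: String },
}

fn main() {
    let _cli = Cli::parse();
}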
