(Rough) Design Proposal: Storage & Distribution System
Background
This is not yet an explanation at the level of runtime modules and protocols, and it needs to be split up properly for reusability. It combines designs for working groups, the proposal system, and tranches (here called groups) for both distributors and storage providers.
Overview
The storage and distribution system is responsible for long term storage and distribution of static data objects.
Principles
- All values are immutable unless explicitly listed otherwise
- No value is ever deleted from the state, only marked as no longer in use.
Concepts
Payload Filter
An operationalized way of screening a data object deeply, as defined by
- ID: A unique integer identifier.
- Max Size (optional): If set, the maximum possible data size.
- Min Size (optional): If set, the minimum possible data size.
- Inspection Routine (optional): If set, the raw bytecode encoding of a WASM (or perhaps JavaScript) pure function which is given the raw payload and decides whether the payload is acceptable.
Notice that payload size has been lifted out of the inspection routine for efficiency: it will frequently be the primary dimension to filter on, and specifying it outside the routine allows size checks without the costly exercise of loading a runtime environment for the routine.
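The ordering described above (cheap size bounds first, inspection routine only afterwards) can be sketched roughly as follows. This is a minimal illustration, not a proposed implementation: the `PayloadFilter` class, its method names, and the `DATA` prefix check are hypothetical, and a plain Python callable stands in for the WASM routine.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class PayloadFilter:
    """Hypothetical in-memory model of a payload filter."""
    id: int
    max_size: Optional[int] = None  # bytes; None means "not set"
    min_size: Optional[int] = None  # bytes; None means "not set"
    # Stand-in for the WASM routine: a pure function over the raw payload.
    inspection_routine: Optional[Callable[[bytes], bool]] = None

    def size_ok(self, size: int) -> bool:
        """Cheap check that needs only the size, not the payload itself."""
        if self.max_size is not None and size > self.max_size:
            return False
        if self.min_size is not None and size < self.min_size:
            return False
        return True

    def accepts(self, payload: bytes) -> bool:
        """Check size bounds first; only then pay for the inspection routine."""
        if not self.size_ok(len(payload)):
            return False
        if self.inspection_routine is None:
            return True
        return self.inspection_routine(payload)


# Example: 4 to 1024 bytes, and the payload must start with a made-up prefix.
example_filter = PayloadFilter(
    id=1,
    max_size=1024,
    min_size=4,
    inspection_routine=lambda payload: payload.startswith(b"DATA"),
)
```

Because `size_ok` needs only a number, a runtime could apply it to the size declared in a transaction before any payload exists, which is the efficiency argument made above.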
Data Object Family
A shared profile for how a family of data objects is treated in the system, as defined by
- ID: A unique integer identifier.
- Description: Human readable description of the purpose of the data object type.
- Payload Filter: ID of payload filter to be used.
- Expected Daily Download Frequency: How many times during a representative 24 hour time period is it expected that a download session will be initiated.
- Expected Download Progress: What percentage of the total data object is expected to be downloaded per download session.
- Minimum Distribution Bitrate: The minimum rate at which such an object must be delivered.
- Distribution Breaks: An array of positions in the data stream, with corresponding time durations, where distribution will pause for the given amount of time. E.g. [(0,5s), (10MB, 10s)] means to pause at the beginning for 5 seconds, and at the 10MB mark for 10 seconds.
- Feasible Storage Groups: Either not set, which means any, or set to a list of specific storage group IDs.
- Feasible Distributor Groups: Either not set, which means any, or set to a list of specific distributor group IDs.
NB: Formerly called data object type.
NB: Policy information about who is allowed to download what data object under what circumstances is exogenous to the system
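To make the family parameters concrete, here is a rough sketch of two quantities they imply: an upper bound on delivery time (from Minimum Distribution Bitrate and Distribution Breaks) and the expected bytes served per day (from Expected Daily Download Frequency and Expected Download Progress). The function names are hypothetical, and several units are assumptions not fixed by this proposal: bitrate in bits per second, 1 MB taken as 10^6 bytes, and a break counted only if its position is smaller than the object size.

```python
from typing import List, Tuple

# (byte position in the stream, pause duration in seconds)
Break = Tuple[int, float]


def max_delivery_seconds(size_bytes: int, min_bitrate_bps: float,
                         breaks: List[Break]) -> float:
    """Upper bound on delivery time at the minimum bitrate, pauses included.

    Assumption: a break applies only if its position is below the object size.
    """
    transfer = size_bytes * 8 / min_bitrate_bps
    pauses = sum(duration for position, duration in breaks
                 if position < size_bytes)
    return transfer + pauses


def expected_daily_bytes(size_bytes: int, daily_downloads: float,
                         progress_pct: float) -> float:
    """Expected bytes served per day for one data object of the family."""
    return size_bytes * daily_downloads * progress_pct / 100.0


# The example from the text: pause 5 s at the start and 10 s at the 10 MB mark.
example_breaks = [(0, 5.0), (10_000_000, 10.0)]
```

For a 20 MB object at 8 Mbit/s, `max_delivery_seconds(20_000_000, 8_000_000, example_breaks)` gives 20 seconds of transfer plus 15 seconds of pauses; a 5 MB object never reaches the second break, so only the initial 5 second pause is added.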
Data Object
Presence of a static data blob in the system, as defined by
- ID: A unique integer identifier.
- Size: Number of bytes occupied by data.
- CID: A secure hash commitment over data. This needs unpacking, e.g. to allow chunking, and perhaps even variable chunk sizes for different data types?
- Family: ID of data object family.
- Assigned Storage Groups: List of IDs of storage groups currently responsible for storing the data object.
- Added: Date and time for original upload event.
- Origin: ID of the member who uploaded the data.
- Liaison: ID of storage provider that was assigned the upload from the origin.
- Status: One among
  - Pending Liaison Review: Liaison is validating the object.
  - Rejected By Liaison: Upload was invalid.
  - Accepted By Liaison: Upload was valid.
  - Removed: No longer stored for whatever reason.
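The status field can be read as a small state machine. The sketch below is one possible reading, with loud assumptions: the proposal lists the statuses but not the allowed moves between them, so the transition table here (review resolves to accepted or rejected, only accepted objects can later be removed) is hypothetical, as are all names.

```python
from enum import Enum


class DataObjectStatus(Enum):
    PENDING_LIAISON_REVIEW = "Pending Liaison Review"
    REJECTED_BY_LIAISON = "Rejected By Liaison"
    ACCEPTED_BY_LIAISON = "Accepted By Liaison"
    REMOVED = "Removed"


# Assumed transition table; the proposal names the statuses but not the edges.
ALLOWED_TRANSITIONS = {
    DataObjectStatus.PENDING_LIAISON_REVIEW: {
        DataObjectStatus.REJECTED_BY_LIAISON,
        DataObjectStatus.ACCEPTED_BY_LIAISON,
    },
    DataObjectStatus.ACCEPTED_BY_LIAISON: {DataObjectStatus.REMOVED},
    DataObjectStatus.REJECTED_BY_LIAISON: set(),
    DataObjectStatus.REMOVED: set(),
}


def transition(current: DataObjectStatus,
               new: DataObjectStatus) -> DataObjectStatus:
    """Return the new status, or raise if the move is not in the assumed table."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {new.value}")
    return new
```

Note that "Removed" is a status rather than a deletion, which matches the principle that no value is ever deleted from the state.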
Storage Group
A collection of storage providers, with identical terms of participation and fully replicated storage, as defined by
- ID: A unique integer identifier.
- Status: One among
  - Expired: meaning the group is no longer in use: no members are part of the group, no new members can enter, and no data objects can be added.
  - Active: meaning it is fully operational.
  - Paused: meaning it is temporarily not in use, hence no new member can join, and no new data can be added.
- Slots: The number of providers which can at most be part of the group at any time.
- Storage Utilisation: The total size of all data objects assigned.
- Required Stake: The number of tokens required to currently enter this group.
- Required Storage Capacity: The amount of storage capacity which any participant is expected to be able to store.
- Required Total Downstream Bandwidth: The total amount of downstream bandwidth required.
- Eviction Slashing Percentage: The max percentage which can be slashed during an eviction.
- Eviction Gating: One among
  - Conductor: The conductor can unilaterally evict.
  - Council: The conductor can only recommend eviction to the council; the council decides.
- Exit Terms: The earliest time at which a group member can initiate an exit, and the unstaking period which follows initiation of the exit. During this period an eviction can still occur.
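As a rough illustration of how the group parameters could gate entry, the sketch below checks status, free slots, stake and capacity. It is only a sketch under assumptions: `member_count` is a hypothetical counter (the proposal does not list it as a field), and the order of checks and the reason strings are invented for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class GroupStatus(Enum):
    ACTIVE = "active"
    PAUSED = "paused"
    EXPIRED = "expired"


@dataclass
class StorageGroup:
    id: int
    status: GroupStatus
    slots: int                      # max number of providers in the group
    member_count: int               # hypothetical: current number of members
    required_stake: int             # tokens
    required_storage_capacity: int  # bytes


def entry_rejection_reason(group: StorageGroup, stake: int,
                           capacity: int) -> Optional[str]:
    """Return None if the provider may enter, otherwise the reason why not."""
    if group.status is not GroupStatus.ACTIVE:
        return "group is not active"
    if group.member_count >= group.slots:
        return "no free slots"
    if stake < group.required_stake:
        return "insufficient stake"
    if capacity < group.required_storage_capacity:
        return "insufficient storage capacity"
    return None
```

Returning a reason rather than a bare boolean is a design choice for the example only; it would map naturally onto the rationale recorded with a rejected application.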
Storage Group Entry Application
An application for a member to enter a storage group as a storage provider, as defined by
- ID: A unique integer identifier.
- Storage Group ID: ID of the storage group.
- Applicant: ID of the membership making the application.
- Submitted: When the application was submitted by the applicant.
- Expiry: How long after submission the application will automatically expire.
- Status: One among
  - Pending: Initial status when created.
  - Accepted: Was accepted, storage provider is in the group; includes when, and the rationale.
  - Rejected: Rejected, storage provider not in the group, and cannot enter the group based on this application; includes when, and the rationale.
  - Withdrawn: Application no longer active, initiated by the applicant.
  - Expired: Application no longer active.
NB: Could add a separate possible application staking fee, both to avoid DoS abuse, and also signal the seriousness of applicants
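The application lifecycle, including automatic expiry, might look roughly like the sketch below. The assumptions are significant: time is modelled as a plain integer (a block number or timestamp), expiry is treated as inclusive at `submitted + expiry`, only a pending application can be resolved, and all class and method names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class ApplicationStatus(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    REJECTED = "rejected"
    WITHDRAWN = "withdrawn"
    EXPIRED = "expired"


@dataclass
class EntryApplication:
    id: int
    storage_group_id: int
    applicant: int   # membership ID
    submitted: int   # assumed: block number or timestamp
    expiry: int      # duration after `submitted`
    status: ApplicationStatus = ApplicationStatus.PENDING

    def effective_status(self, now: int) -> ApplicationStatus:
        """A pending application counts as expired once its time is up."""
        if (self.status is ApplicationStatus.PENDING
                and now >= self.submitted + self.expiry):
            return ApplicationStatus.EXPIRED
        return self.status

    def resolve(self, new_status: ApplicationStatus, now: int) -> None:
        """Accept, reject or withdraw; allowed only while still pending."""
        allowed = {ApplicationStatus.ACCEPTED, ApplicationStatus.REJECTED,
                   ApplicationStatus.WITHDRAWN}
        if new_status not in allowed:
            raise ValueError("not a resolution status")
        if self.effective_status(now) is not ApplicationStatus.PENDING:
            raise ValueError("application is no longer pending")
        self.status = new_status
```

Computing expiry lazily in `effective_status` avoids needing a separate transaction just to mark an application as expired, though the proposal itself lists expiry as an explicit state transition.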
Storage Group Membership
Membership of a given provider in a given group, as defined by
- ID: A unique integer identifier.
- Storage Group ID: ID of storage group in which membership applies.
- Membership: ID of membership for application.
- Established: When membership was established.
- Application: ID of application which was accepted.
- Status: One among
  - Entering: In the process of becoming a fully operational member.
  - Normal: Fully operational.
  - Paused: Not actively servicing group or distributors at this time, since some point in time.
  - Exiting: Is in the process of exiting, initiated at some time.
  - Exited: Has completed exiting at some point in time, is no longer part of the group.
  - Eviction: One among
    - Council: ID of eviction proposal
    - Conductor: Conductor has unilaterally evicted provider, with accompanying rationale, at a given time.
NB: Perhaps this can be generalized, this could be the general structure for any working group membership?
Storage Provider Eviction Proposal
Proposal to evict a storage provider from a group, as defined by
- ID: A unique integer identifier.
- Membership: ID of storage group membership where the provider was evicted.
- Time: When it occurred.
- Slashed: Amount slashed.
- Conductor: ID of the conductor.
- Rationale: Description of the underlying cause of eviction.
- Deliberation: ID of deliberation which may have transpired.
- Status: One among
  - Opened: Open for deliberation.
  - Affirmed: Affirmed resolution.
  - Cancelled: Cancelled resolution.
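How eviction gating, the slashing cap and the proposal status might interact is sketched below. This is an interpretation, not something the proposal specifies: it assumes an affirmed proposal results in eviction and a cancelled one does not, that the requested slash is clamped to the group's Eviction Slashing Percentage, and that amounts round down. All names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EvictionGating(Enum):
    CONDUCTOR = "conductor"
    COUNCIL = "council"


class ProposalStatus(Enum):
    OPENED = "opened"
    AFFIRMED = "affirmed"
    CANCELLED = "cancelled"


@dataclass(frozen=True)
class EvictionResult:
    evicted: bool
    slashed: int
    proposal_status: Optional[ProposalStatus]  # None when no proposal is used


def capped_slash(stake: int, requested_pct: int, max_pct: int) -> int:
    """Slash at most `max_pct` percent of the stake, rounding down."""
    pct = max(0, min(requested_pct, max_pct))
    return stake * pct // 100


def request_eviction(gating: EvictionGating, stake: int,
                     requested_pct: int, max_pct: int) -> EvictionResult:
    """Conductor gating evicts at once; council gating only opens a proposal."""
    if gating is EvictionGating.CONDUCTOR:
        return EvictionResult(True, capped_slash(stake, requested_pct, max_pct), None)
    return EvictionResult(False, 0, ProposalStatus.OPENED)


def resolve_proposal(affirm: bool, stake: int,
                     requested_pct: int, max_pct: int) -> EvictionResult:
    """Assumed council outcome: affirmed evicts and slashes, cancelled does not."""
    if affirm:
        return EvictionResult(True, capped_slash(stake, requested_pct, max_pct),
                              ProposalStatus.AFFIRMED)
    return EvictionResult(False, 0, ProposalStatus.CANCELLED)
```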
Deliberation
A discussion thread concerning some topic, as defined by
TBD
Deliberation Post
A post in a deliberation, as defined by
TBD
Distributor Group
A collection of distributors, with identical terms of participation and fully replicated distribution, as defined by a set of properties identical to those of the Storage Group, with the exception that Required Total Downstream Bandwidth is replaced by Required Total Upstream Data Capacity Per Month.
Distributor Group Entry Application
Analogous to Storage Group Entry Application.
Distributor Group Membership
Analogous to Storage Group Membership.
Distributor Group Eviction Proposal
Analogous to Storage Provider Eviction Proposal.
Conductor
Analogous to Storage Group Membership, with the following exceptions:
- Evictions cannot happen via the conductor, only via the council.
- Stake is stored directly, not in group <== this needs more thought
NB: Perhaps this can be generalized, this could be the general structure for any working group lead membership?
Conductor Entry Application Proposal
Analogous to Storage Group Entry Application, except as proposal.
Shared State
NB: Proposals (Storage Provider Eviction Proposal, Distributor Group Eviction Proposal, Conductor Entry Application Proposal) are not listed here; it is unclear how and where to organise them.
State Transitions
Here is a (soon to be) complete list of the supported state transitions.
- Storage group expires
- Storage group membership application expires
- Distributor group membership application expires
- Pending Storage group entry application expires
- Accept, reject, withdraw conductor application proposal
- Add storage group
- Mutate a storage group
  - Pause or unpause group status
- Add distributor group
- Add data object family
- Mutate a data object family
  - Update feasible storage groups
  - Update feasible distributor groups
  - Expected download frequency
  - Expected download progress
  - Distribution Breaks
- Add a new data object
- Mutate a data object
  - Update assigned storage groups
  - Set liaison status
  - Set as removed status
- Add payload filter
- Apply to storage group
- Apply to distributor group
- Exit storage group
- Exit distributor group
- Pause storage group membership
- Pause distributor group membership
- Evict storage group member
- Evict distributor group member
NB: missing proposal related state transitions, also deliberation
Communication Protocols
This section is highly incomplete, as it probably should incorporate some of the already existing design I am not familiar with. But the upload is shown for reference.
User upload
- User issues a transaction for adding a data object, which includes CID, size and object family. The runtime ensures that there is sufficient storage capacity, in that there is at least one feasible, active storage group which has a member with normal status and sufficient space, randomly assigns the object to one among them, and picks a liaison. The space utilization is automatically updated at this time. Otherwise, the transaction is rejected.
- User resolves and connects to a host corresponding to the liaison, and attempts to make an upload by providing
  - A reference to the new data object created
  - The raw payload
  - Optional: Request token
  - Two signatures over the request using the account corresponding to the membership, one including the request token if present, the other without.
- Liaison validates owner, upload, and status of group. If checks pass, any applicable access policy can be applied based on the request token, or otherwise. If the access policy fails, the upload is rejected on chain and the interaction ends; otherwise, the status is set to accepted.
The request token is an optional parameter which can, in a given instantiation of the protocol and system, carry information useful for determining in what context to allow or reject the upload.
NB: Why not let user directly pick liaison at random, based on chain state? It simplifies everything, offloads transactions, etc.
It also means there is no need for cleaning up failed uploads because only successful uploads are added.
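The runtime side of the first upload step (find a feasible group, pick a liaison, update utilisation, or reject) could be sketched as below. Several points are assumptions made for illustration: "sufficient space" is read as utilisation plus object size not exceeding the group's required storage capacity, the `Group` record and `assign_upload` function are hypothetical, and `random.Random` merely stands in for whatever deterministic randomness source a runtime would actually use.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Group:
    id: int
    active: bool
    capacity: int              # Required Storage Capacity, in bytes
    utilisation: int           # Storage Utilisation, in bytes
    normal_members: List[int]  # IDs of memberships with Normal status


def assign_upload(groups: List[Group], feasible_ids: Optional[List[int]],
                  size: int, rng: random.Random) -> Tuple[int, int]:
    """Return (storage group ID, liaison membership ID), or raise to reject.

    `feasible_ids` of None means any group is feasible for the family.
    """
    candidates = [
        g for g in groups
        if g.active
        and g.normal_members
        and g.utilisation + size <= g.capacity
        and (feasible_ids is None or g.id in feasible_ids)
    ]
    if not candidates:
        raise ValueError("transaction rejected: no feasible storage group")
    group = rng.choice(candidates)
    liaison = rng.choice(group.normal_members)
    group.utilisation += size  # space utilisation is updated at assignment
    return group.id, liaison
```

Since utilisation is reserved at assignment time, a failed or abandoned upload leaves space accounted for until the object is marked as rejected or removed, which is part of the cleanup concern raised in the NB above.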
User download
TBD
Distributor download
TBD
Storage provider download
TBD
Upload data object
TBD
Conductor Reporting
A conductor has to operate two separate public endpoints, one for submitting errors and one for submitting non-error utilization information. The purpose of the former is to enable early-stage detection of faults or malicious behavior, which could trigger direct inspection and inquiry. This could be things like
- User is unable to resolve or connect to liaison.
- User upload interrupted.
- Distributor is unable to download from storage provider.
- Storage provider is unable to download from another storage provider.
- Invalid data sent.
The purpose of the latter is to guide how storage and distribution resources are deployed, as well as maintain usage statistics, such as view counts, etc.
This could be things like
- Upload initiated and/or succeeded or failed for a given reason
- Download initiated and/or succeeded or failed for a given reason
It could be useful to have both user and infrastructure software report on the same events. Mismatches could be valuable information to guide policy.
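One simple way a conductor could surface such mismatches is to count reported events from each side and compare. The sketch below assumes a hypothetical event shape of (data object ID, outcome string); the proposal does not define a report format, so both the shape and the outcome names are invented for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical report shape: (data object ID, outcome),
# e.g. (42, "download_succeeded").
Event = Tuple[int, str]


def report_mismatches(user_reports: Iterable[Event],
                      infra_reports: Iterable[Event]) -> Dict[Event, int]:
    """Return events whose counts differ between the two sources.

    A positive value means users reported the event more often than the
    infrastructure did; a negative value means the opposite.
    """
    user = Counter(user_reports)
    infra = Counter(infra_reports)
    return {
        event: user[event] - infra[event]
        for event in set(user) | set(infra)
        if user[event] != infra[event]
    }
```

For example, if users report two failed downloads of an object and the distributor reports none, the result contains that event with value 2, which could be a trigger for the direct inspection mentioned above.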
Thoughts
- Is there room to generalize some concepts? There seems to be a lot of repetition across the different roles and their interactions with the council, e.g. the concept of a working group, the lead, participation, membership in the group, application to the group. All of it could be generalized, and there could even be the capability to dynamically create groups on chain, for more bespoke purposes. It would take more work to give members of such a dynamic group on-chain capabilities, by defining dynamically what one can and cannot do. But that last step could be overkill. At least we could avoid recording the same, almost identical, structure for all the working groups (curation, communication, builders, discovery, storage & distribution, ...).
- Should deliberation be here? Should it be in a generalized working group structure?
- We must make a protocol and module design which allows for someone to reuse this type of subsystem in their Substrate chain.