This is a candidate way forward for the Filecoin storage actor interface.

Actor Storage API Proposal about specs HOT 2 CLOSED

filecoin-project commented on June 9, 2024

Actor Storage API Proposal

from specs.

Comments (2)

phritz commented on June 9, 2024

Thanks for writing this up. Here are a few questions that would help me understand what you're thinking. Dunno if it is easier for you to answer in a comment or reflect the answers into the text above or some combo.

Can we find a way to not call the unit of storage a block? I realize the underlying interface is a blocks.Block, but man is this a source of confusion.
What does the VM ABI look like in this world? To pass data between actors are we passing a cid or the serialized version of the thing (the thing as it would be stored)?
There are no more remnants of the linear array of bytes idea, correct?
There's no automatic chunking of data into blocks in this world right? That is, actor authors manually lay their data out in a way that makes it likely that small data changes result in small output changes. I mean, I know actors could implement automatic chunking, but we don't have any plans to for v1 is what I'm trying to confirm.
WRT not loading the world into memory with the hamt:
- At first glance it looks like the hamt nodes embed values (not cids) in the KV. Values could be really large; should we be storing cids there?
- Suppose I have a huge value in the hamt, maybe the value is a huge array of ints. Is there any way to avoid loading the value entirely into memory?
Could you talk through a simple example of laying the data out to ensure that small input changes lead to small changes in output, say with something like AskSet?
Suppose we find a bug in the hamt compiled into an actor. Suppose it causes the tree to be laid out improperly, we lose keys under certain conditions. How do we fix it?
What happens in between the go struct and the BlockStorage interface? Like, how does an actor transform a go struct into a blocks.Block and how does it get a go struct from a block? Basically I'm curious how huge a dependency tree sits under whatever those interfaces are as they will have to be compiled down to wasm and we have to have a way to update them.
How might versioning work? Say I have a struct foo and I inline some required data bar that is big. Later I realize I'm wasting a lot of space because bar rarely changes. I want to link to bar instead of inline it. How do I do that?

I have a few more questions around what this looks like with user-supplied actors but they can wait, this list is long enough!

whyrusleeping commented on June 9, 2024

Can we find a way to not call the unit of storage a block? I realize the underlying interface is a blocks.Block, but man is this a source of confusion.

Yeah... You're right. Not using "block" would be nice. We can probably alias the blockstore to chunkstore just to save ourselves the confusion.

What does the VM ABI look like in this world? To pass data between actors are we passing a cid or the serialized version of the thing (the thing as it would be stored)?

I think to start, we can just do pass by copy. Later on, we can add fancier ways of 'zero copy' sharing of data between actors. I don't think this is needed early on, and it is likely easiest to just upgrade to it later.

There are no more remnants of the linear array of bytes idea, correct?

Correct. I think we all agree it's a 'neat' abstraction that doesn't work out very well in practice. In Ethereum it's very inefficient.

There's no automatic chunking of data into blocks in this world right? That is, actor authors manually lay their data out in a way that makes it likely that small data changes result in small output changes. I mean, I know actors could implement automatic chunking, but we don't have any plans to for v1 is what I'm trying to confirm.

Right. The intent is to use data structures like the HAMT or 'sharded arrays' that do the chunking for you.

WRT not loading the world into memory with the hamt

Yeah, we should put cids as the values in the HAMT. It should work out just fine (with respect to object linkage for graph traversals).

Suppose I have a huge value in the hamt, maybe the value is a huge array of ints. Is there any way to avoid loading the value entirely into memory?

You should use a data structure that shards the array. One easy option is to implement 'arrays' as just a HAMT with the keys being the index of each element. (This is pretty much how Ethereum does arrays of things in the EVM; it's not great, but it works.)
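To make the sharding idea concrete, here is a minimal sketch of a sharded array where reading one element loads only the shard that holds it. It is a simplification, not a HAMT: the names `chunkStore`, `shardedArray`, and `shardSize` are hypothetical, shards are keyed by shard number rather than by cid, and everything lives in memory.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// shardSize is a hypothetical number of elements per shard, kept tiny for illustration.
const shardSize = 4

// chunkStore is a hypothetical stand-in for the chunk store. It maps a key
// (standing in for a cid) to raw bytes and counts how many chunks were loaded.
type chunkStore struct {
	chunks map[int][]byte
	loads  int
}

func (s *chunkStore) load(key int) []byte {
	s.loads++
	return s.chunks[key]
}

// shardedArray stores a []uint64 as fixed-size shards, one chunk per shard.
type shardedArray struct {
	store  *chunkStore
	length int
}

func newShardedArray(values []uint64) *shardedArray {
	store := &chunkStore{chunks: map[int][]byte{}}
	buf := make([]byte, 8)
	for i, v := range values {
		shard := i / shardSize
		binary.BigEndian.PutUint64(buf, v)
		store.chunks[shard] = append(store.chunks[shard], buf...)
	}
	return &shardedArray{store: store, length: len(values)}
}

// get loads exactly one shard, no matter how long the array is.
func (a *shardedArray) get(i int) (uint64, error) {
	if i < 0 || i >= a.length {
		return 0, fmt.Errorf("index %d out of range [0,%d)", i, a.length)
	}
	shard := a.store.load(i / shardSize)
	off := 8 * (i % shardSize)
	return binary.BigEndian.Uint64(shard[off : off+8]), nil
}

func main() {
	values := make([]uint64, 1000)
	for i := range values {
		values[i] = uint64(i * i)
	}
	arr := newShardedArray(values)
	v, err := arr.get(123)
	if err != nil {
		panic(err)
	}
	// Prints the value, the number of chunks loaded, and the total chunk count.
	fmt.Println(v, arr.store.loads, len(arr.store.chunks)) // 15129 1 250
}
```

A HAMT keyed by index gives the same property with a tree of nodes instead of a flat shard list, at the cost of loading one node per level on each lookup.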

Could you talk through a simple example of laying the data out to ensure that small input changes lead to small changes in output, say with something like AskSet?

For the ask set, I would just use a HAMT, with the keys being the ID and the values being the Ask. (The value could also be a cid of the ask, but it's probably fine to put the ask directly in this case. We will want to develop some conventions around this; Ethereum mixes this inconsistently, to my great dismay. Sometimes there are hashes, sometimes there are the actual objects :/ )
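As a rough illustration of "small input changes lead to small output changes", the sketch below groups asks into buckets by ID and hashes each bucket separately, so updating one ask changes one bucket hash. This is a single-level stand-in for a HAMT, not the real structure. The `Ask` fields, `numBuckets`, and all function names are assumptions made for the example, and `encoding/json` plus SHA-256 stand in for CBOR and cids.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Ask is a hypothetical, simplified ask record; the field names are assumptions.
type Ask struct {
	ID    uint64
	Price uint64
	Size  uint64
}

// numBuckets is a hypothetical fixed fan-out; a real HAMT adds levels as it grows.
const numBuckets = 8

// askSet groups asks into buckets by ID so each bucket serializes to its own chunk.
type askSet struct {
	buckets [numBuckets]map[uint64]Ask
}

func (s *askSet) put(a Ask) {
	b := a.ID % numBuckets
	if s.buckets[b] == nil {
		s.buckets[b] = map[uint64]Ask{}
	}
	s.buckets[b][a.ID] = a
}

// bucketHashes serializes every bucket and returns one hash per bucket.
// encoding/json writes map keys in sorted order, so the output is deterministic.
func (s *askSet) bucketHashes() ([numBuckets]string, error) {
	var out [numBuckets]string
	for i, b := range s.buckets {
		data, err := json.Marshal(b)
		if err != nil {
			return out, err
		}
		sum := sha256.Sum256(data)
		out[i] = hex.EncodeToString(sum[:])
	}
	return out, nil
}

// changedBuckets counts the buckets whose hash differs between two snapshots.
func changedBuckets(a, b [numBuckets]string) int {
	n := 0
	for i := range a {
		if a[i] != b[i] {
			n++
		}
	}
	return n
}

func main() {
	var set askSet
	for id := uint64(0); id < 100; id++ {
		set.put(Ask{ID: id, Price: 10, Size: 1 << 20})
	}
	before, err := set.bucketHashes()
	if err != nil {
		panic(err)
	}
	set.put(Ask{ID: 42, Price: 11, Size: 1 << 20}) // change the price of one ask
	after, err := set.bucketHashes()
	if err != nil {
		panic(err)
	}
	fmt.Println(changedBuckets(before, after)) // 1
}
```

With values stored as cids instead of inline asks, the changed bucket would hold one new link and the new ask would be one extra chunk; the other buckets would still be untouched.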

Suppose we find a bug in the hamt compiled into an actor. Suppose it causes the tree to be laid out improperly, we lose keys under certain conditions. How do we fix it?

For this, I'm going to answer the versioning question:

How might versioning work? Say I have a struct foo and I inline some required data bar that is big. Later I realize I'm wasting a lot of space because bar rarely changes. I want to link to bar instead of inline it. How do I do that?

The way I see actor upgrades (and I'm pretty sure we talked about this in Lisbon) is that we make a call that looks something like:

func (a Actor) Upgrade(from, to CodeCid, dataMigration CodeCid) error {
    // Code here loads the data in the form used by the old code and converts it to the form
    // used by the new code, with user-specified logic. The nice thing here is that since all
    // the inputs and outputs are hashes, this should be easy to test against real chain data
    // and validate that everything is working properly.
    // The 'migration' code could even just be 'set the storage cid to QmFooBar'.
    return nil // placeholder: the migration logic itself is not specified here
}

So in the case that we have an improperly laid out tree, we load the tree with the old code, load up a new tree with the fixed code, and transfer all the data between them. In the case that the current values are too big, we do the same, but split things up before inserting them into the new tree.
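A minimal sketch of such a data migration, using the foo/bar example from the question: the old layout inlines bar, the new layout links to it. Everything here is hypothetical (the `store` type, `fooV1`, `fooV2`, and `migrate` are illustrative names), and `encoding/json` with SHA-256 keys stands in for CBOR and cids.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// store is a hypothetical content-addressed chunk store: key = hash of the bytes.
type store map[string][]byte

func (s store) put(v interface{}) (string, error) {
	data, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(data)
	key := hex.EncodeToString(sum[:])
	s[key] = data
	return key, nil
}

func (s store) get(key string, out interface{}) error {
	data, ok := s[key]
	if !ok {
		return fmt.Errorf("chunk %s not found", key)
	}
	return json.Unmarshal(data, out)
}

// fooV1 is the old layout: bar is inlined, so any change to foo rewrites bar too.
type fooV1 struct {
	Name string
	Bar  []uint64
}

// fooV2 is the new layout: bar lives in its own chunk and foo only links to it.
type fooV2 struct {
	Name    string
	BarLink string
}

// migrate loads the state in its old form and returns the root of the new form.
// Input and output are both hashes, so it can be replayed against recorded state.
func migrate(s store, oldRoot string) (string, error) {
	var old fooV1
	if err := s.get(oldRoot, &old); err != nil {
		return "", err
	}
	barLink, err := s.put(old.Bar)
	if err != nil {
		return "", err
	}
	return s.put(fooV2{Name: old.Name, BarLink: barLink})
}

func main() {
	s := store{}
	oldRoot, err := s.put(fooV1{Name: "foo", Bar: []uint64{1, 2, 3}})
	if err != nil {
		panic(err)
	}
	newRoot, err := migrate(s, oldRoot)
	if err != nil {
		panic(err)
	}
	var migrated fooV2
	if err := s.get(newRoot, &migrated); err != nil {
		panic(err)
	}
	var bar []uint64
	if err := s.get(migrated.BarLink, &bar); err != nil {
		panic(err)
	}
	fmt.Println(migrated.Name, bar, newRoot != oldRoot) // foo [1 2 3] true
}
```

Because `migrate` is a deterministic function from one root hash to another, running it twice on the same old root yields the same new root, which is what makes it straightforward to check against real chain data.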

What happens in between the go struct and the BlockStorage interface? Like, how does an actor transform a go struct into a blocks.Block and how does it get a go struct from a block? Basically I'm curious how huge a dependency tree sits under whatever those interfaces are as they will have to be compiled down to wasm and we have to have a way to update them.

This would be CBOR-marshaling the structs into the raw data and vice versa. It's a sizeable dependency, but it will be reusable between actors. As for updating, that's tricky, but I'm pretty sure we can do it without a hard fork via some voting mechanism, or shared custody of keys responsible for the dependency.

Anyway, that's a first pass over those questions. Let me know what I should elaborate on, and feel free to add more. Tomorrow I'll see about editing the top comment to integrate some of that information.
