wnfs-wg / rs-wnfs
Rust implementation of the WebNative FileSystem (WNFS) specification
Home Page: https://github.com/wnfs-wg
License: Apache License 2.0
Allows us to see differences between two HAMT nodes.
This lets us see what changed between versions of the filesystem and enables checking writes.
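A minimal sketch of the key-level result such a diff could produce, with a BTreeMap standing in for a HAMT's key/value pairs (Change and diff are hypothetical names, not the library's API). A real HAMT diff could additionally skip identical subtrees by comparing their CIDs rather than visiting every key.

```rust
use std::collections::BTreeMap;

/// Hypothetical change record for a single key.
#[derive(Debug, PartialEq)]
enum Change<V> {
    Added(V),
    Removed(V),
    Modified { old: V, new: V },
}

/// Compares two maps and reports, per key, what changed from `old` to `new`.
fn diff<K: Ord + Clone, V: Clone + PartialEq>(
    old: &BTreeMap<K, V>,
    new: &BTreeMap<K, V>,
) -> BTreeMap<K, Change<V>> {
    let mut changes = BTreeMap::new();
    // Keys present in `old`: either removed, modified, or unchanged.
    for (k, old_v) in old {
        match new.get(k) {
            None => {
                changes.insert(k.clone(), Change::Removed(old_v.clone()));
            }
            Some(new_v) if new_v != old_v => {
                changes.insert(
                    k.clone(),
                    Change::Modified { old: old_v.clone(), new: new_v.clone() },
                );
            }
            Some(_) => {} // unchanged
        }
    }
    // Keys only present in `new` were added.
    for (k, new_v) in new {
        if !old.contains_key(k) {
            changes.insert(k.clone(), Change::Added(new_v.clone()));
        }
    }
    changes
}
```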
Hi, I'm describing the issues I ran into while playing around with rs-wnfs. I know it's a work in progress, so issues are expected.
README.md, line 193: running script/rs-wnfs.sh got "permission denied".
rs-wnfs build --all: got "wasm-opt: command not found".
example/graph: following the README.md there, I got some errors in the console.
Output of sh scripts/rs-wnfs.sh setup:
::: Make script executable :::
::: Drop a link to it in /usr/local/bin :::
ln: failed to create symbolic link '/usr/local/bin/rs-wnfs': Permission denied
::: Failed to install :::
/usr/local/bin/rs-wnfs: line 124: wasm-opt: command not found
Suggested fixes:
The README should say scripts/ instead of script/. The setup step links into /usr/local/bin/, an area that requires root privileges, so I recommend using sudo and updating the README accordingly.
wasm-opt is provided by binaryen, which I installed with sudo pacman -Sy binaryen.
I'm happy to help further with the project (e.g. by opening PRs) if that's fine.
What rs-wnfs supports today:
Public File System: read, write, mkdir, ls, mv and rm of public files/directories
Private File System: read, write, mkdir, ls, mv, rm and cp of private files/directories, with a ~262 KB limit on private file size and a ~2500-entry limit on private directories
Public File System:
Private File System:
For both:
This is a great start, but we should add some variants to this, specifically a lookup_node that seeks.
I.e. it does the normal lookup, but then tries to advance the ratchet of what it looked up to the most recent version it can find. That might reveal a newer version of a directory that actually does (or doesn't) have the next path segment.
We want to do seeking lookups because one peer may only have access to private/Apps/flatmate, so when writing the file private/Apps/flatmate/state.json it will only write the state.json and flatmate nodes.
Then, when a peer reads the private/Apps directory, that will still link to the old flatmate directory, not the new one. So that peer needs to seek to newer versions. (Also, I need to write down the seeking algorithm in the spec, and it should probably be implemented in the rs-skip-ratchet repo.)
Originally posted by @matheus23 in #38 (comment)
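A sketch of the seeking step under simplifying assumptions: revisions are plain integers instead of skip-ratchet states, the forest is modeled as a set of (name, revision) pairs, and revisions are written contiguously (if revision r exists, all earlier ones do too). Forest and seek_latest are hypothetical names. It uses a galloping search (exponential steps, then binary search), so catching up n revisions costs O(log n) lookups instead of n.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the private forest: which (name, revision) pairs exist.
struct Forest {
    entries: HashSet<(String, u64)>,
}

impl Forest {
    fn has(&self, name: &str, rev: u64) -> bool {
        self.entries.contains(&(name.to_string(), rev))
    }
}

/// Seeks from a revision known to exist to the latest revision present,
/// assuming revisions are contiguous.
fn seek_latest(forest: &Forest, name: &str, known: u64) -> u64 {
    let mut lo = known; // invariant: `lo` exists
    let mut step = 1u64;
    // Exponential phase: double the step until we find a revision that is missing.
    let mut hi = loop {
        let probe = lo + step;
        if forest.has(name, probe) {
            lo = probe;
            step *= 2;
        } else {
            break probe;
        }
    };
    // Binary phase: invariant `lo` exists and `hi` does not.
    while hi - lo > 1 {
        let mid = lo + (hi - lo) / 2;
        if forest.has(name, mid) {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    lo
}
```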
The different kinds of reads are described in the spec here:
https://github.com/wnfs-wg/spec/blob/main/spec/private-wnfs.md#path-resolving
In the definition of PrivateRef:
Lines 93 to 102 in 11b73a7
Serialization is derived via the macro.
This struct is used in the PrivateDirectorySerde
representation of directories:
rs-wnfs/wnfs/private/directory.rs
Lines 51 to 58 in 11b73a7
Again, serialization is derived using the macros.
However, we actually want the ratchet_key within the entries of directories to be encrypted with the directory's ratchet_key!
Again, we have an issue with passing around keys to enable serialization.
We could define and use a PrivateRefSerde in PrivateDirectorySerde and decrypt it on the fly, similarly to the PrivateDirectorySerde.
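A minimal sketch of the PrivateRefSerde idea: the in-memory PrivateRef keeps the ratchet key in plaintext, and converting to the serde form encrypts that key under the parent directory's key. The field set is simplified, and toy_cipher is a placeholder XOR that is NOT encryption; a real implementation would use an authenticated cipher with a random IV.

```rust
/// Placeholder "cipher" (XOR with the key). NOT secure; it only marks where
/// real encryption/decryption would happen.
fn toy_cipher(key: &[u8; 32], data: &[u8; 32]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for i in 0..32 {
        out[i] = key[i] ^ data[i];
    }
    out
}

/// Simplified in-memory form: the ratchet key is plaintext.
#[derive(Clone, Debug, PartialEq)]
struct PrivateRef {
    saturated_name_hash: [u8; 32],
    content_key: [u8; 32],
    ratchet_key: [u8; 32],
}

/// Hypothetical serialized form: the ratchet key is stored encrypted.
#[derive(Clone, Debug, PartialEq)]
struct PrivateRefSerde {
    saturated_name_hash: [u8; 32],
    content_key: [u8; 32],
    encrypted_ratchet_key: [u8; 32],
}

impl PrivateRef {
    /// Encrypts the ratchet key under the parent directory's key.
    fn to_serde(&self, parent_key: &[u8; 32]) -> PrivateRefSerde {
        PrivateRefSerde {
            saturated_name_hash: self.saturated_name_hash,
            content_key: self.content_key,
            encrypted_ratchet_key: toy_cipher(parent_key, &self.ratchet_key),
        }
    }

    /// Decrypts on the fly when reading a directory's entries.
    fn from_serde(stored: &PrivateRefSerde, parent_key: &[u8; 32]) -> PrivateRef {
        PrivateRef {
            saturated_name_hash: stored.saturated_name_hash,
            content_key: stored.content_key,
            ratchet_key: toy_cipher(parent_key, &stored.encrypted_ratchet_key),
        }
    }
}
```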
Currently we're throwing away the metadata we get from Rust on an ls operation:
It's useful to have, though: it helps us figure out whether entries are files or directories ahead of time.
This is needed to implement a backwards-compatible interface in webnative's ls.
With the completion of the public filesystem, we think rs-wnfs is now suitable for experimentation.
The libraries from this project should be published to their respective registries, with the necessary usage documentation provided.
(wnfs and wasm-wnfs)
(wnfs)
@zeeshanlakhani pointed out an opportunity to use tinyvec for a stack structure in a review.
We use working stacks and other collections in a few places that could be optimized with tinyvec. tinyvec keeps the collection on the stack until a static limit is hit, and then moves it to the heap. For some of these structures we know an upper bound that we are unlikely to exceed, which lets us keep things on the stack except in exceptional cases.
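To illustrate the idea (this is not tinyvec's API), here is a hand-rolled stack that stores up to a fixed number of items inline and spills to a Vec once that limit is exceeded; tinyvec's TinyVec provides this inline-then-heap behavior ready-made. INLINE = 8 is an arbitrary illustrative capacity.

```rust
/// Arbitrary inline capacity for illustration.
const INLINE: usize = 8;

/// Stack that lives in a fixed array until it overflows, then moves to the heap.
enum SmallStack<T> {
    Inline { buf: [T; INLINE], len: usize },
    Heap(Vec<T>),
}

impl<T: Copy + Default> SmallStack<T> {
    fn new() -> Self {
        SmallStack::Inline { buf: [T::default(); INLINE], len: 0 }
    }

    fn push(&mut self, value: T) {
        match self {
            // Room left in the inline buffer: no allocation.
            SmallStack::Inline { buf, len } if *len < INLINE => {
                buf[*len] = value;
                *len += 1;
            }
            // Inline buffer full: copy everything into a Vec (one allocation).
            SmallStack::Inline { buf, len } => {
                let mut spilled = buf[..*len].to_vec();
                spilled.push(value);
                *self = SmallStack::Heap(spilled);
            }
            SmallStack::Heap(v) => v.push(value),
        }
    }

    fn pop(&mut self) -> Option<T> {
        match self {
            SmallStack::Inline { buf, len } => {
                if *len == 0 {
                    None
                } else {
                    *len -= 1;
                    Some(buf[*len])
                }
            }
            SmallStack::Heap(v) => v.pop(),
        }
    }

    fn is_inline(&self) -> bool {
        matches!(self, SmallStack::Inline { .. })
    }
}
```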
Here are some areas in the library that can be improved:
The PrivateNode implementation is missing the as_file and is_file functions.
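A sketch of the missing helpers, assuming PrivateNode is an enum with Rc-wrapped file and directory variants (the struct contents here are placeholders). Returning a Result from as_file lets callers surface a clear error when a path resolves to a directory instead of a file.

```rust
use std::rc::Rc;

/// Placeholder node contents.
#[derive(Debug)]
struct PrivateFile {
    content: Vec<u8>,
}

#[derive(Debug)]
struct PrivateDirectory {
    entries: Vec<String>,
}

#[derive(Debug, Clone)]
enum PrivateNode {
    File(Rc<PrivateFile>),
    Dir(Rc<PrivateDirectory>),
}

impl PrivateNode {
    fn is_file(&self) -> bool {
        matches!(self, PrivateNode::File(_))
    }

    fn is_dir(&self) -> bool {
        matches!(self, PrivateNode::Dir(_))
    }

    /// Returns a shared handle to the file, or an error if this node is a directory.
    fn as_file(&self) -> Result<Rc<PrivateFile>, String> {
        match self {
            PrivateNode::File(file) => Ok(Rc::clone(file)),
            PrivateNode::Dir(_) => Err("node is not a file".to_string()),
        }
    }
}
```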
The library is likely going to be used from other languages in the future. Given the number of languages that support the C calling convention, there will be an expectation of a version of rs-wnfs that compiles to a static or dynamic library exposing such an interface.
We should support this by:
Essentially have the PrivateForest's values be an array of CIDs.
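A minimal sketch of a multivalue forest, with strings standing in for names and CIDs (MultivalueForest and its methods are hypothetical). Keeping a set of CIDs per name means two writers that concurrently produce different nodes under the same name both survive a merge, instead of one overwriting the other.

```rust
use std::collections::{BTreeMap, BTreeSet};

type Name = String; // stand-in for the real name/namefilter key
type Cid = String; // stand-in for a real CID

#[derive(Default)]
struct MultivalueForest {
    map: BTreeMap<Name, BTreeSet<Cid>>,
}

impl MultivalueForest {
    /// Adds a CID under `name` without discarding CIDs already stored there.
    fn put(&mut self, name: &str, cid: &str) {
        self.map.entry(name.to_string()).or_default().insert(cid.to_string());
    }

    /// All CIDs stored under `name` (empty if none).
    fn get(&self, name: &str) -> Vec<&Cid> {
        self.map.get(name).map(|cids| cids.iter().collect()).unwrap_or_default()
    }

    /// Merges another replica via per-name set union.
    fn merge(&mut self, other: &MultivalueForest) {
        for (name, cids) in &other.map {
            for cid in cids {
                self.put(name, cid);
            }
        }
    }
}
```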
It is useful to have a benchmark comparison of ts-based wnfs implementation and wasm-wnfs.
It will help us:
See this article for more information on how to get benchmarking working with playwright.
Let's also create a follow-up issue for changing this according to the spec, i.e. have the header inline, encrypted into the node, and have this link just be a Cid.
Originally posted by @matheus23 in #38 (comment)
The specification changed (Sorry!).
The new format can be seen here:
https://github.com/wnfs-wg/spec/blob/main/spec/private-wnfs.md#the-decrypted-layer
Specifically this:
type PrivateDirectory = {
type: "wnfs/priv/dir"
version: "0.2.0"
// encrypted using deriveKey(ratchet)
header: Encrypted<CBOR<PrivateNodeHeader>>
// userland:
metadata: Metadata
entries: Record<string, {
contentKey: Key // hash(deriveKey(entryRatchet))
revisionKey: Encrypted<Key> // encrypt(deriveKey(ratchet), deriveKey(entryRatchet))
name: Hash<Namefilter> // hash(saturated(add(deriveKey(ratchet), entryBareName)))
// and can be used as the key in the private partition HAMT to lookup
// a (set of) PrivateNode(s) with an entryBareName and entryRatchet from above
}>
}
Enables conflict resolution / dealing with concurrent writes
Apart from the intrinsic memory allocation needed for collections, smart pointers, etc., the library does not depend on many system functions provided by the OS, which makes it a good candidate for a no_std implementation with just the alloc and collection features enabled.
There is really no pressing need for this now and requirements might change as the implementation evolves. But it would be nice to enforce this non-dependency somehow so this issue is here for that as well as other needs for it in the future.
We want to test our fs components based on their properties instead of the specific scenarios we create for them right now. These scenarios are the edge cases we can think of, and it is easy to miss other important cases. Property-based testing lets us lay bare the component and let automation poke at different points in its makeup.
There are two libraries to consider, proptest and quickcheck, but proptest is favored because it has a better "shrinker", even though it can be a tad tedious to write.
Using RefCell in Rust can be dangerous, because it panics at runtime if you don't follow the borrowing rules correctly.
This is where we're using RefCell today:
https://github.com/WebNativeFileSystem/rs-wnfs/blob/8e59f1f056ed3b526ce1f63e789de53fa94c11b3/crates/fs/public/directory.rs#L28-L37
https://github.com/WebNativeFileSystem/rs-wnfs/blob/8e59f1f056ed3b526ce1f63e789de53fa94c11b3/crates/fs/lib.rs#L22-L31
It may be possible to change the Shared<PublicDirectoryInner>, which is essentially an Rc<RefCell<PublicDirectoryInner>>, into an Rc<PublicDirectoryInner>.
However, that would require some changes to a couple of internal algorithms, specifically diverge_and_patch and upsert.
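One way the RefCell could go away, sketched with placeholder types: keep the inner data in a plain Rc and use Rc::make_mut for copy-on-write, so "modifying" a directory yields a new version and can never panic on a conflicting borrow. This illustrates the general technique, not how diverge_and_patch or upsert are actually written.

```rust
use std::rc::Rc;

/// Placeholder for the real directory contents.
#[derive(Clone, Debug, Default)]
struct PublicDirectoryInner {
    entries: Vec<String>,
}

/// Immutable handle: no RefCell, just a shared pointer.
#[derive(Clone, Debug, Default)]
struct PublicDirectory {
    inner: Rc<PublicDirectoryInner>,
}

impl PublicDirectory {
    /// Returns a new directory version with `name` added; `self` is left untouched.
    fn with_entry(&self, name: &str) -> PublicDirectory {
        let mut next = self.clone(); // cheap: only bumps the Rc count
        // Copy-on-write: clones the inner data because it is shared with `self`.
        Rc::make_mut(&mut next.inner).entries.push(name.to_string());
        next
    }
}
```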
Ported from oddsdk/ts-odd#312
Increase namefilters to 2048 bits (256 bytes). This doubles the number of elements that we can fit in the Nyberg accumulator.
This has implications for private file UCAN semantics. See oddsdk/ts-odd#313
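Why doubling the size doubles the capacity: treating the namefilter as a standard Bloom filter with m bits, k hash functions and n inserted elements (an assumption for illustration), the false-positive rate depends on n and m only through their ratio, so going from m = 1024 to m = 2048 bits allows 2n elements at the same rate, with k unchanged.

```latex
p(n, m) \approx \left(1 - e^{-kn/m}\right)^{k}
\qquad\Longrightarrow\qquad
p(2n, 2m) \approx \left(1 - e^{-k(2n)/(2m)}\right)^{k} = \left(1 - e^{-kn/m}\right)^{k} = p(n, m)
```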
Currently, PublicDirectory objects passed from Rust to JS get a generated free() function, which releases the memory they occupy inside WASM memory and must be called to prevent memory leaks.
We should figure out a way to call wasm-bindgen with the --weak-refs option. Unfortunately, wasm-pack doesn't support that yet.
Implement FS APIs:
read
write
mkdir
ls
get_history (or similar)
Contrary to the public side, this will likely not need base_history_on, but will work differently, since stuff isn't content-addressed anymore and we don't necessarily need to "upgrade" nodes on the path anymore.
Whether a node is a file or directory is now determined from the type key outside the metadata record. Same thing with the version field.
Also, metadata is now more explicitly specified as just a DAG-CBOR map. In the code we can probably represent it using the IpldMap type.
We should add functions that allow reading/writing from/to specific keys in the metadata.
The JS API should be adjusted accordingly; a JSValue serde implementation may be helpful?
We should still write some metadata keys "by default". I'd recommend the created and modified fields.
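A sketch of metadata as a plain key/value map with default created and modified keys. Value is a stand-in for an IPLD value, and integer timestamps (e.g. Unix seconds) are an assumption; the real code could use an IPLD map type instead, as suggested above.

```rust
use std::collections::BTreeMap;

/// Stand-in for an IPLD value; only two kinds for illustration.
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Integer(i128),
    String(String),
}

/// Metadata as an arbitrary string-keyed map.
#[derive(Clone, Debug, Default, PartialEq)]
struct Metadata(BTreeMap<String, Value>);

impl Metadata {
    /// Creates metadata with the default `created` and `modified` keys set to `time`.
    fn new(time: i128) -> Self {
        let mut metadata = Metadata::default();
        metadata.put("created", Value::Integer(time));
        metadata.put("modified", Value::Integer(time));
        metadata
    }

    fn get(&self, key: &str) -> Option<&Value> {
        self.0.get(key)
    }

    fn put(&mut self, key: &str, value: Value) {
        self.0.insert(key.to_string(), value);
    }

    /// Bumps `modified`; intended to be called on every write.
    fn touch(&mut self, time: i128) {
        self.put("modified", Value::Integer(time));
    }
}
```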
Let's improve our contribution information and document how to get started developing for rs-wnfs. I'm always heavily influenced
Implement a basic WNFSv2 public filesystem as a proof of concept of running wasm in the browser. This is not supposed to be a full implementation of the public filesystem.
To resolve this issue, the following conditions should be met:
PrivateForest is currently defined as pub type PrivateForest = Hamt<Namefilter, Cid>; where the Cid points to the bytes of an encrypted, serialized PrivateNode. On every access to this structure we have to decrypt the PrivateNode again. This is inefficient, so we want to cache the PrivateNodes in a PrivateLink type, just like PublicLink.
I was working on this but soon found out that the current AsyncSerialize definition is not enough to allow a PrivateLink implementation. We should look into how we can make AsyncSerialize more robust, or have a variant of it for the private filesystem.
Complete the immutable core implementation of the public fs. We need this to get rid of race condition bugs that require locks which can also lead to deadlock situations. Basically, it gives us fearless concurrency. ;-)
To resolve this issue, the following conditions should be met:
Note: It is a forever append-only system but we think it is not a significant problem for now. More on this later.
Figure out IPLD DAG-CBOR serialization with libipld + serde + recursive async functions.
The return type should be Promise<Uint8Array>. The byte array should be a CID, at least judging from the Rust side:
I think the fix is just a return cid.bytes on the last line of putBlock.
Some encoded field names are still encoded in snake case, e.g. content_key in PrivateRef, but should be camel case, so contentKey.
Some other field names just need to be matched, e.g. saturated_name_hash should be encoded as name, or userland for directories generally needs to be renamed to entries.
Ported from oddsdk/ts-odd#310
Especially with #301 (immutable core), the
Sync is network-bound, and the private partition evenly distributes data by design. While we're waiting for GraphSync to be production-ready, the easiest way to improve performance is to flatten the structure. A 1024-weighted HAMT (i.e. 128-byte bitmask headers) hits a sweet spot, efficiently getting a node to a queryable sync state in few round trips and with little overhead.
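A quick way to see why flattening helps: with n evenly distributed entries and branching factor b, the number of HAMT levels (and roughly the number of sequential round trips to reach a leaf) is the smallest d with b^d ≥ n. The 16-way comparison below is an assumption for illustration, not a statement about the current HAMT's parameters; levels_needed is a hypothetical helper.

```rust
/// Smallest depth d such that branching^d >= n.
fn levels_needed(n: u64, branching: u64) -> u32 {
    let mut capacity = 1u64;
    let mut levels = 0;
    while capacity < n {
        // saturating_mul avoids overflow for very large inputs
        capacity = capacity.saturating_mul(branching);
        levels += 1;
    }
    levels
}
```

For a million entries this gives 5 levels at b = 16 (16^5 = 1,048,576) versus 2 at b = 1024 (1024^2 = 1,048,576); a 1024-way bitmask is 1024 bits, i.e. the 128 bytes mentioned above.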
We're thinking of a data structure that may look like this:
enum Link<T> {
Clean {
cid: Cid,
cache: OnceCell<Box<T>>,
},
Dirty(Box<T>),
}
There may be issues using a OnceCell; we need to figure that out!
Also, to Box or not to Box (or Rc?): TBD.
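A minimal sketch of how the Clean variant could lazily cache the loaded value with std::cell::OnceCell (the u64 Cid and the loader closure are stand-ins for a real CID and block store). One concrete issue to keep in mind: std::cell::OnceCell is not Sync, so a Link built this way can't be shared across threads; std::sync::OnceLock is the thread-safe counterpart.

```rust
use std::cell::OnceCell;

/// Stand-in for a real CID.
type Cid = u64;

enum Link<T> {
    /// Stored in the block store; the decoded value is cached on first access.
    Clean { cid: Cid, cache: OnceCell<Box<T>> },
    /// Changed in memory and not yet stored, so there is no CID yet.
    Dirty(Box<T>),
}

impl<T> Link<T> {
    fn from_cid(cid: Cid) -> Self {
        Link::Clean { cid, cache: OnceCell::new() }
    }

    /// Returns the linked value. For a clean link, `load` (e.g. fetch + decode a block)
    /// runs at most once; later calls return the cached value.
    fn resolve(&self, load: impl FnOnce(Cid) -> T) -> &T {
        match self {
            Link::Clean { cid, cache } => &**cache.get_or_init(|| Box::new(load(*cid))),
            Link::Dirty(value) => &**value,
        }
    }
}
```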
mv is much more complicated in the private file system because we need to modify the namefilters of everything in the tree that's moved.
I think it may also be useful to separate mv into cp + rm.
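A toy model of mv as cp + rm over a flat path map (Fs, cp, rm and mv are hypothetical). It makes the cost visible: every entry under the source path is rewritten, just as every moved node's namefilter would need re-deriving in the private file system. It assumes dst is not inside src.

```rust
use std::collections::BTreeMap;

/// Toy file system: full path -> file content.
type Fs = BTreeMap<String, String>;

/// True if `path` is `root` itself or lies below it.
fn is_under(path: &str, root: &str) -> bool {
    path == root || path.starts_with(&format!("{root}/"))
}

/// Copies every entry under `src` to the same relative location under `dst`.
fn cp(fs: &mut Fs, src: &str, dst: &str) {
    let copies: Vec<(String, String)> = fs
        .iter()
        .filter(|(path, _)| is_under(path, src))
        .map(|(path, content)| (format!("{dst}{}", &path[src.len()..]), content.clone()))
        .collect();
    fs.extend(copies);
}

/// Removes `target` and everything below it.
fn rm(fs: &mut Fs, target: &str) {
    fs.retain(|path, _| !is_under(path, target));
}

fn mv(fs: &mut Fs, src: &str, dst: &str) {
    cp(fs, src, dst);
    rm(fs, src);
}
```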
Add support for merging file trees. Resolution is based on file/folder names and the resulting tree should be a union of both trees.
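A sketch of a name-based union of two trees. The conflict policy (same-named directories merge recursively; any other clash keeps the left side) is an assumption, since the issue only specifies that the result is a union by name.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Debug, PartialEq)]
enum Node {
    File(String),
    Dir(BTreeMap<String, Node>),
}

/// Union of two directory listings keyed by name.
fn merge(left: &BTreeMap<String, Node>, right: &BTreeMap<String, Node>) -> BTreeMap<String, Node> {
    let mut out = left.clone();
    for (name, r) in right {
        let merged = match out.get(name) {
            // Only on the right: take it as-is.
            None => r.clone(),
            // Directory on the left: merge recursively if the right is also a directory.
            Some(Node::Dir(l)) => match r {
                Node::Dir(rd) => Node::Dir(merge(l, rd)),
                _ => Node::Dir(l.clone()),
            },
            // Any other clash: keep the left side (assumed policy).
            Some(l) => l.clone(),
        };
        out.insert(name.clone(), merged);
    }
    out
}
```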
NB: Feature requests will only be considered if they solve a pain
The only way to obtain the root private key of WNFS-RS and use it on another device (login) right now is to call .get_private_ref() on the directory on the first device, serialize it, and move it over to the other device. This doesn't allow deriving it deterministically.
Both the current version of the fs used in webnative and WNFS Go provide a constructor through which the programmer can feed in the private key that WNFS uses for the root. However, it is not exposed in the WNFS Rust version. Exposing it would allow application developers to feed in the private key from their own login process (whatever it is), making it deterministic and based on the user's login rather than random.
We could then decrypt the WNFS tree on any of the user's devices without transferring keys, as long as the user logs in on each device and the program can recreate the same keypair. It gives application developers more flexibility in their approach and gives users more control over which private key is used: developers who want to feed a deterministic key can do so, and those who want a random key still can.
Exposing the constructor for PrivateRef
Is your feature request related to a problem? Please describe.
It's frustrating that the login cannot be linked to WNFS encryption, so we still need to transfer the private key between devices even after the user completes a login process.
Describe the solution you'd like
Like the webnative JS and Go versions, I want to be able to construct the PrivateRef myself from a keypair or seed hash (either works).
Describe alternatives you've considered
The only way to obtain the root private key of WNFS-RS and use it on another device (login) right now is to call .get_private_ref() on the directory on the first device, serialize it, and move it over to the other device. This doesn't allow deriving it deterministically.
Additional context
It is also discussed on IPFS Discord on #wnfs with Matheus and Boris
Port of oddsdk/ts-odd#301
Make the internals run on an immutable core
Linking
There is a custom Rng trait we use for random number generation. We should leverage the RngCore trait from the rand crate, which is commonly used for such cases, just like we use serde for serialization.
This is going to make our Rng implementations a tad more complex, because we don't really need the next_u32 and next_u64 methods right now, but it will allow us to use existing RngCore implementations without wrapping them, like proptest's TestRng.
We don't necessarily need --weak-refs flag support in wasm-pack; we can make wasm-bindgen generate weak-ref support via environment variables: rustwasm/wasm-pack#937 (comment)
The typescript types generated from wasm-bindgen aren't very precise, see for example:
/**
* Returns the name and metadata of the direct children of a directory.
* @param {Array<any>} path_segments
* @param {any} store
* @returns {Promise<any>}
*/
ls(path_segments: Array<any>, store: any): Promise<any>;
I think path_segments could be Array<string>, and store: ForeignBlockStore?
This helps orient users using TS types for autocompletion in IDEs, for example. Also makes sure that there are fewer accidental runtime errors.
Not big: only people using the wnfs npm package, which is mostly me at the moment. I think it's fair to expect everyone else to use webnative rather than wnfs directly.
I don't know. Maybe a setting in wasm-bindgen? Something that needs to be configured? Is that perhaps an issue with wasm-bindgen and they need to fix it?
NB: Feature requests will only be considered if they solve a pain
Right now, when we do an operation in WNFS, it returns the new root cid as a result. To understand which cids are added or removed we need to compare the whole HAMT with the last version and traverse manually.
This gives the flexibility to the protocol developers on top of WNFS to choose how they want to keep the replicates of WNFS synced together.
Besides returning just the new root CID, operations would also return the added and removed CIDs.
Is your feature request related to a problem? Please describe.
Let's say a protocol is developed on top of WNFS that keeps the WNFS-generated cids backed up on multiple untrusted devices (like how Filecoin is chunking and keeping the chunks). Right now, there is a way to keep the whole HAMT synced easily, but there is no easy way to back up part of the HAMT and always sync on some untrusted devices.
Describe the solution you'd like
When performing an operation that changes the root CID of the WNFS HAMT, also get a list of the CIDs that were added and a list of the CIDs that were removed as a result of that specific action. For example, the CIDs that were added as a result of mkdir, and the CIDs that no longer belong to the new version and are deprecated.
Describe alternatives you've considered
Traversing the HAMT from the root by the protocol on top of WNFS itself and finding the differences between old and new versions of HAMT.
Additional context
Discussed initially on IPFS discord in #wnfs channel with Boris
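A reference sketch of what the added and removed CID lists would contain, computed the slow way listed above as the alternative: walk every block reachable from each root and take set differences. The DAG model and names are hypothetical; the point of the feature is that the library could report these sets during the write instead of re-walking both versions.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Stand-in for a real CID.
type Cid = &'static str;

/// All blocks reachable from `root`, given each block's outgoing links.
fn reachable(dag: &BTreeMap<Cid, Vec<Cid>>, root: Cid) -> BTreeSet<Cid> {
    let mut seen = BTreeSet::new();
    let mut stack = vec![root];
    while let Some(cid) = stack.pop() {
        // Only expand blocks we haven't visited yet.
        if seen.insert(cid) {
            if let Some(links) = dag.get(&cid) {
                stack.extend(links.iter().copied());
            }
        }
    }
    seen
}

/// Returns (added, removed) block CIDs between two versions.
fn cid_diff(dag: &BTreeMap<Cid, Vec<Cid>>, old_root: Cid, new_root: Cid) -> (BTreeSet<Cid>, BTreeSet<Cid>) {
    let old = reachable(dag, old_root);
    let new = reachable(dag, new_root);
    let added = new.difference(&old).copied().collect();
    let removed = old.difference(&new).copied().collect();
    (added, removed)
}
```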
Implement the WNFSv2 private filesystem. This is part of the initial work to get the WNFSv2 private filesystem working.
Here are some of the things that should be implemented:
mv (#54)
Central to our inconsistency is the fact that we can't implement Serialize and Deserialize for our core structs like PublicDirectory or PrivateFile, since they have additional context requirements during serialization or deserialization:
A Key for serialization and deserialization, used for en/decryption, as well as some impl RngCore for randomness during serialization, to generate the IV (initialization vector) for encryption.
The Link abstraction, which allows postponing serialization of CID references in structures, as well as lazily loading what's behind a CID reference into an in-memory structure. This means we need a BlockStore during serialization to call put_block on for getting the actual CIDs to serialize. It also means that serialization can be asynchronous. We don't technically have an issue implementing Deserialize, since we can just initialize our Link with the deserialized Cids.
There are two dimensions to this issue:
(1) The Serialize or Deserialize traits from serde: since we have extra context requirements, we need to copy & modify these traits to become more "powerful" (i.e. have access to a BlockStore & be async -> AsyncSerialize).
(2) Link<T>, which needs T to be some kind of serializable/deserializable type, but should be flexible enough to support data structures like PublicDirectory.
Since we can't implement Serialize and Deserialize for our structs directly, we also can't directly use #[derive(Deserialize, Serialize)]. We have resorted to different ways of solving this problem, although we're mostly using structs named *Serde that only contain easily de/serializable data.
For (1), we have the AsyncSerialize trait for public data (notice its serialize is an async fn):
rs-wnfs/wnfs/src/common/async_serialize.rs
Lines 34 to 49 in ce7d988
But on the private side we didn't have a need to abstract yet, so we only used custom functions (this serialization only needs some randomness for encryption, since the key can be derived from the ratchet within PrivateDirectory):
rs-wnfs/wnfs/src/private/directory.rs
Lines 1140 to 1147 in ce7d988
rs-wnfs/wnfs/src/private/directory.rs
Lines 1170 to 1173 in ce7d988
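A minimal, synchronous sketch of what "serialize with extra context" means for (1): the trait takes a block store, so a parent can store its children first and embed their CIDs. StoreSerialize, BlockStore, MemoryStore and the u64 "CID" are hypothetical stand-ins; the real AsyncSerialize is async and produces real CIDs.

```rust
use std::collections::HashMap;

/// Hypothetical content-addressed store; the "CID" is a u64 hash stand-in.
trait BlockStore {
    fn put_block(&mut self, bytes: Vec<u8>) -> u64;
}

struct MemoryStore {
    blocks: HashMap<u64, Vec<u8>>,
}

impl BlockStore for MemoryStore {
    fn put_block(&mut self, bytes: Vec<u8>) -> u64 {
        // FNV-1a as a deterministic stand-in for a real multihash.
        let mut hash: u64 = 0xcbf29ce484222325;
        for b in &bytes {
            hash ^= *b as u64;
            hash = hash.wrapping_mul(0x100000001b3);
        }
        self.blocks.insert(hash, bytes);
        hash
    }
}

/// Like serde's Serialize, but with the extra context (a block store) passed in.
trait StoreSerialize {
    fn serialize(&self, store: &mut impl BlockStore) -> Vec<u8>;
}

struct File {
    content: String,
}

struct Directory {
    name: String,
    child: File,
}

impl StoreSerialize for File {
    fn serialize(&self, _store: &mut impl BlockStore) -> Vec<u8> {
        self.content.as_bytes().to_vec()
    }
}

impl StoreSerialize for Directory {
    fn serialize(&self, store: &mut impl BlockStore) -> Vec<u8> {
        // The child must be stored first so that its CID can be embedded.
        let child_bytes = self.child.serialize(store);
        let child_cid = store.put_block(child_bytes);
        format!("{}:{}", self.name, child_cid).into_bytes()
    }
}
```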
And for (2), to make the above easier, we're using these *Serde structs (notice the #[derive(Serialize, Deserialize)]s, and how the *Serde variant uses Cid instead of PublicLink):
rs-wnfs/wnfs/src/public/directory.rs
Lines 41 to 56 in ce7d988
(In PrivateDirectory, the PrivateNodeHeader turns into a Vec<u8>, the ciphertext of a PrivateNodeHeader.)
rs-wnfs/wnfs/src/private/directory.rs
Lines 43 to 58 in ce7d988
One problem with deriving Serialize and Deserialize for a related struct like PrivateDirectorySerde is that this intermediate data structure mostly gets built up only to be torn down immediately afterwards.
Ideally we could skip that work.
While we're looking at these issues, there's a similar concern with building up Ipld as an intermediate data structure, e.g. for deserializing PublicNode into the appropriate variant, depending on what the type field deserializes to:
rs-wnfs/wnfs/src/public/node.rs
Lines 268 to 292 in ce7d988
Right now we have a memory-based block store that works well for testing purposes, but to make the JS bindings useful, we need the generated wasm code to be able to use an imported foreign block store.
The idea is to expose foreign function interfaces that will allow the wasm side to call and use some opaque block store object passed in from JavaScript code. It has to be implemented in a way that won't result in adding wasm or js specific details to the rust core.