ibm / dpfs
DPU-Powered File System Virtualization over virtio-fs
License: GNU Lesser General Public License v2.1
These functions create a struct on the stack that the FUSE implementation has to fill in, and then call a special reply function such as fuse_ll_reply_attr. However, with async handling these stack-resident structs are destroyed as soon as the function returns, before the asynchronous completion can use them.
Broken functions to fix:
setattr
readdir and readdirplus
create
flush (or just don't use the struct fuse_file_info *, then it's fine)
flock
If this project is to be used for more long-term projects, it should move away from C to something safer but still low-level, like Rust. The frequency of memory-related bugs and undefined behavior is very high. The DPU library is in C, so two approaches are possible.
TODO
I posted my question in the discussions a couple of days ago; I think I should move it here for visibility.
Hello,
Currently we are configuring our own environment to evaluate DPFS by following the instructions on GitHub (DPFS/README.md at master · IBM/DPFS · GitHub). According to the document, a prototype firmware is required for the whole system to work properly. Would you please provide more details about this firmware? Is it developed by NVIDIA specifically for the DPFS project? How can we obtain the binary?
Thank you.
In NFS the filehandle returned by OPEN is only valid for the lease period (attribute lease_time, which is in seconds). Currently no consideration of this is made, thus after the lease period has ended we start erroring.
https://www.rfc-editor.org/rfc/rfc5661#section-5.8.1.11
It is likely that this is causing the 10025 (NFS4ERR_BAD_STATEID) and 10020 (NFS4ERR_NOFILEHANDLE) NFS errors that occur some time during the workloads.
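A minimal sketch of the bookkeeping a fix might need, assuming the client records when the lease was last renewed and renews once half of lease_time has elapsed. The `lease_state` struct, the function name and the half-way margin are assumptions for illustration, not part of virtionfs. In NFS 4.1 a successful SEQUENCE operation renews the lease, so the renewal itself could be any compound sent on the session.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical lease bookkeeping; lease_time is the server attribute in seconds. */
struct lease_state {
    time_t last_renewal;  /* when the lease was last renewed */
    unsigned lease_time;  /* lease duration in seconds */
};

/* True once half of the lease has elapsed (the margin is an assumption). */
static bool lease_needs_renewal(const struct lease_state *ls, time_t now)
{
    assert(ls != NULL);
    return (now - ls->last_renewal) >= (time_t)(ls->lease_time / 2);
}
```

A polling thread could call this check on each iteration and update `last_renewal` whenever a request on the session completes successfully.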
The NFS operations (nfs_argop4) that are created for every NFS request are not zeroed out, so they might contain uninitialized memory. This is a very easy mistake to make, as some of the operations have very complex RPC structures. So these need to be zeroed going forward.
Currently, when opening a file, CLAIM_NULL is used in conjunction with CURRENT_FH=parent->fh and the filename of the file to be opened. This requires tracking the parents of files plus the filenames of files inside of virtionfs.
In NFS 4.1 there is a CLAIM_FH claim type that allows you to open the file that the CURRENT_FH points to. Using it would remove this extra bookkeeping overhead from the client.
Just in virtionfs, it seems.
fuser is completely broken right now. The encapsulation of iovecs called iov was removed from fuse_lowlevel. The function pointers of the FUSE implementation functions in fuser do not match those of fuse_lowlevel.
Because of limitations in libnfs and the absence of a need for NFS callbacks support, virtionfs currently doesn't support anything to do with NFS callbacks. NFS callbacks allow the server to notify or ask the client (of) something, mainly used for cache invalidation and delegations (when a file resides on a client). See the NFS 4, 4.1 and 4.2 RFCs for all the callbacks.
Problem
https://elixir.bootlin.com/linux/latest/C/ident/virtio_fs_wake_pending_and_unlock
In the virtio_fs_wake_pending_and_unlock function, the queue on which the request will be put is hardcoded to a single virtio-fs request queue.
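As a rough illustration of what a non-hardcoded choice could look like, the sketch below picks a request queue from the submitting CPU's index. It is a user-space illustration only, not the kernel change itself: `pick_request_queue` and the modulo policy are assumptions, while the queue numbering (queue 0 for high-priority requests, request queues starting at 1) follows the virtio-fs driver's convention.

```c
#include <assert.h>

/* virtio-fs convention: queue 0 is the hiprio queue, request queues start at 1. */
enum { EX_VQ_HIPRIO = 0, EX_VQ_REQUEST = 1 };

/* Hypothetical policy: spread submitting CPUs over the request queues. */
static unsigned pick_request_queue(unsigned cpu, unsigned num_request_queues)
{
    assert(num_request_queues > 0);
    return EX_VQ_REQUEST + (cpu % num_request_queues);
}
```

With a single request queue this degenerates to the current behavior, since every CPU maps to queue 1.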
Execution
Development is happening here: https://github.com/Peter-JanGootzen/linux
TODO:
The NFS RPC connection seems to disconnect after a while of inactivity. One option is to periodically send an NFS:NULL to keep the connection alive. Another might be to use rpc_set_autoreconnect of libnfs (not sure what it does).
This might be non-trivial because the current architecture has no good place for something periodic like this.
These are all non-essential for experiments:
create
setattr
flock
readdir and readdirplus
mkdir
rmdir
mknod
symlink
unlink
fallocate
rename
forget and batch_forget
Speaks for itself.
Currently virtionfs uses a non-random verifier and non-random clientid. This is unreliable and also bad when multiple clients connect to the same server.
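A minimal sketch of generating a random verifier, assuming a Linux-like system where /dev/urandom is available. `random_verifier` is a hypothetical helper, not an existing virtionfs or libnfs function; NFSv4 verifiers are 8 bytes long. The same source of randomness could presumably feed the client identifier as well.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define VERIFIER_SIZE 8  /* NFSv4 verifiers are 8 bytes */

/* Fill out with random bytes from /dev/urandom.
 * Returns 0 on success and -1 on failure. */
static int random_verifier(unsigned char out[VERIFIER_SIZE])
{
    assert(out != NULL);
    FILE *f = fopen("/dev/urandom", "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(out, 1, VERIFIER_SIZE, f);
    fclose(f);
    return n == VERIFIER_SIZE ? 0 : -1;
}
```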
Currently with XLIO, sequential write performance with bs=4k, iodepth=16, numjobs=1 is ~244MB/s, while sequential read only gives ~20MB/s.
Furthermore, block sizes larger than the page size (4k) don't do anything (i.e. no performance increase).
flatten(filename) = key
src_ino not used and wrongly?
Some FUSE requests are fire-and-forget, and thus do not need a reply. Currently, the RVFS implementation sends a reply for every request, even forget. This can be optimized away.
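The check could be as small as the sketch below. `fuse_op_needs_reply` is a hypothetical helper, and the opcode constants repeat the values of FUSE_FORGET and FUSE_BATCH_FORGET from linux/fuse.h so that the example stays self-contained; only the two forget opcodes named above are treated as fire-and-forget here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Opcode values as defined in linux/fuse.h, repeated to stay self-contained. */
enum { EX_FUSE_FORGET = 2, EX_FUSE_BATCH_FORGET = 42 };

/* Returns false for fire-and-forget requests, true for everything else. */
static bool fuse_op_needs_reply(uint32_t opcode)
{
    switch (opcode) {
    case EX_FUSE_FORGET:
    case EX_FUSE_BATCH_FORGET:
        return false;
    default:
        return true;
    }
}
```

The request path would consult this before building and queueing a completion.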
Currently fuser is incredibly slow in metadata operations, as they all occur synchronously. This blocks the Virtio queue (and the currently polling thread) while the remote operation is outstanding, incurring huge latencies.
Apparently io_uring supports metadata operations, so this would be a good fit.
This is a big problem: virtionfs needs to execute asynchronous handshake requests during init.
Currently it just sends the INIT FUSE completion before those are done. This can result in race conditions on boot.
An attempt to fix this was made in this commit; however, it made random operations break. It seems there is some timeout on FUSE:init, or that commit was triggering UB.
Currently the virtionfs implementation uses a single Virtio queue polling thread and a single NFS socket polling thread (called the service thread in libnfs).
The goal is to move to 8 Virtio queue polling threads and 8 NFS socket polling threads. Eight of each is probably optimal because the DPU has eight cores, and of these 16 threads only 8 would need to run at a time (either sending or receiving).
To do this multiple NFS connections are needed, which is supported in NFS 4.1. This feature is called 'session trunking'.
Host NFS vs DPU NFS
statfs and read
Key-Value implementation
Metadata workload blocked by
Currently the UID and GID that are supplied to FUSE:init are used throughout the whole lifetime of the virtio-fs device (fuser and virtionfs).
However each individual FUSE request contains a UID and GID, so each request should be executed under the name of that request's UID and GID.