
dpfs's Issues

Some metadata FUSE functions are not compatible with async

These functions create a struct on the stack that the FUSE implementation has to fill in before calling a special reply function such as fuse_ll_reply_attr. With async handling, however, these stack-resident structs are destroyed before the reply is sent (a sketch of the problem and a fix follows the list below).
Broken functions to fix:

  • setattr
  • readdir and readdirplus
  • create
  • flush (or just don't use the struct fuse_file_info *, then it's fine)
  • flock
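
Below is a minimal sketch of the problem and one possible fix, in the spirit of the lowlevel FUSE handlers used here. The request/reply types and the remote_getattr_async helper are illustrative placeholders, not the actual API of this codebase.

```c
#include <stdint.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Illustrative placeholders for pieces the codebase already has. */
struct req;                                               /* in-flight FUSE request  */
extern void reply_attr(struct req *req, struct stat *s);  /* e.g. fuse_ll_reply_attr */
extern void remote_getattr_async(uint64_t ino, struct stat *out,
                                 void (*done)(void *), void *arg);

/* Per-request context that owns the reply struct for the whole lifetime of
 * the async operation, instead of keeping it on the handler's stack frame. */
struct getattr_ctx {
    struct req *req;
    struct stat attr;
};

static void getattr_done(void *arg)
{
    struct getattr_ctx *ctx = arg;
    reply_attr(ctx->req, &ctx->attr); /* attr is still valid here */
    free(ctx);
}

static void getattr_async(struct req *req, uint64_t ino)
{
    /* Broken variant: a `struct stat attr;` on this stack frame would be
     * destroyed as soon as the function returns, long before getattr_done
     * runs. Heap-allocating the context avoids that. */
    struct getattr_ctx *ctx = calloc(1, sizeof(*ctx));
    ctx->req = req;
    remote_getattr_async(ino, &ctx->attr, getattr_done, ctx);
}
```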

Rust bindings

If this project is to be used for longer-term work, it should move away from C to something safer but still low-level, such as Rust. The frequency of memory-related bugs and undefined behavior is very high. The DPU library is in C, so two approaches are possible.

  • Create Rust bindings for the DPU library
  • Create Rust bindings for the low-level dpu-virtio-fs library and then reimplement the FUSE facilities in Rust

TODO

  • Figure out how to create Rust bindings for an automake project

Help needed: Prototype firmware for DPFS in BF2

I posted my question in the discussions a couple of days ago; I think I should move it here for visibility.

Hello,

Currently we are configuring our own environment to evaluate DPFS by following the instructions on GitHub (DPFS/README.md at master · IBM/DPFS · GitHub). According to the document, a prototype firmware is required for the whole system to work properly. Could you please provide more details about this firmware? Is it developed by NVIDIA specifically for the DPFS project? How can we obtain the binary?

Thank you.

NFS lease period not implemented

In NFS the filehandle returned by OPEN is only valid for the lease period (the lease_time attribute, in seconds). Currently this is not taken into account, so once the lease period has expired the client starts getting errors.
https://www.rfc-editor.org/rfc/rfc5661#section-5.8.1.11

It is likely that this is what causes the 10025 (NFS4ERR_BAD_STATEID) and 10020 (NFS4ERR_NOFILEHANDLE) NFS errors that occur some time into the workloads.
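
A rough sketch of the bookkeeping this would need, with hypothetical names (nfs_conn, send_lease_renewal); in NFS 4.1 any operation carrying SEQUENCE renews the lease, in 4.0 it would be RENEW:

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical connection state; lease_time is the server's lease_time
 * attribute, in seconds. */
struct nfs_conn {
    uint32_t lease_time;
    time_t   last_renewal;   /* updated whenever an op renewed the lease */
};

/* Hypothetical helper that sends a lease-renewing operation (any op carrying
 * SEQUENCE in NFS 4.1, RENEW in 4.0) and bumps last_renewal on success. */
extern int send_lease_renewal(struct nfs_conn *c);

/* Called periodically, or before relying on open state: renew well before
 * the lease actually expires. */
static int maybe_renew_lease(struct nfs_conn *c)
{
    time_t now = time(NULL);
    if (now - c->last_renewal >= c->lease_time / 2)
        return send_lease_renewal(c);
    return 0;
}
```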

Investigate possible uninitialized memory with NFS operations

The nfs_argop4 operations that are created for every NFS request are not zeroed out, so they may contain uninitialized memory. This is an easy mistake to make because some of the operations have very complex RPC structures, so going forward these need to be zeroed.
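
A minimal sketch, assuming libnfs' generated nfsc/libnfs-raw-nfs4.h header provides nfs_argop4:

```c
#include <string.h>
#include <nfsc/libnfs-raw-nfs4.h>   /* nfs_argop4, assuming libnfs' generated NFS4 header */

/* Zero the whole operation array before filling in individual arguments, so
 * any RPC fields that are not explicitly set are at least deterministic. */
static void init_ops(nfs_argop4 *ops, size_t count)
{
    memset(ops, 0, count * sizeof(*ops));
}

/* Usage: nfs_argop4 op[4]; init_ops(op, 4); ... then fill op[0..3] ... */
```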

Use `CLAIM_FH` for NFS:OPEN instead of inefficient `CLAIM_NULL`

Currently when opening a file CLAIM_NULL is used in conjunction with CURRENT_FH=parent->fh and the filename of the file to be opened. This requires tracking the parents of files plus the filenames of files inside of virtionfs.

In NFS 4.1 there is a CLAIM_FH claim type that allows opening the file that CURRENT_FH already points to, removing this extra bookkeeping overhead from the client.
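
A rough sketch of the difference, assuming rpcgen-style types generated from the NFS 4.1 XDR (OPEN4args, open_claim4, component4); the exact field names in libnfs' headers may differ slightly:

```c
/* CLAIM_NULL (current approach): CURRENT_FH must be the parent directory and
 * the file is named explicitly, so virtionfs has to track both. */
static void open_claim_null(OPEN4args *args, component4 *name)
{
    args->claim.claim = CLAIM_NULL;
    args->claim.open_claim4_u.file = *name;   /* filename relative to CURRENT_FH */
}

/* CLAIM_FH (NFS 4.1): CURRENT_FH is the file itself, so no parent filehandle
 * or filename bookkeeping is needed; the CLAIM_FH arm of the union is void. */
static void open_claim_fh(OPEN4args *args)
{
    args->claim.claim = CLAIM_FH;
}
```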

Blocked by #2 and #15 because NFS 4.1 is required.

`io_uring` feature tracker

  • R/W I/O
  • Completion queue polling (i.e. userspace-side polling)
  • getattr
  • fsync
  • Fixed buffers
  • Fixed files
  • Submission queue polling (i.e. kernel-side polling)
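
A minimal liburing sketch covering the first items on the list (an async read plus fixed buffers); the file path and sizes are illustrative. Submission queue polling would additionally pass IORING_SETUP_SQPOLL to io_uring_queue_init.

```c
#include <fcntl.h>
#include <liburing.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/uio.h>

int main(void)
{
    struct io_uring ring;

    /* IORING_SETUP_SQPOLL instead of 0 would enable kernel-side SQ polling. */
    if (io_uring_queue_init(8, &ring, 0) < 0)
        return 1;

    /* Fixed buffers: register once, then reference by index in each SQE. */
    char *buf = malloc(4096);
    struct iovec iov = { .iov_base = buf, .iov_len = 4096 };
    io_uring_register_buffers(&ring, &iov, 1);

    int fd = open("/etc/hostname", O_RDONLY);   /* illustrative file */
    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
    io_uring_prep_read_fixed(sqe, fd, buf, 4096, 0, /*buf_index=*/0);

    io_uring_submit(&ring);

    /* Userspace-side completion polling would spin on io_uring_peek_cqe()
     * instead of blocking here. */
    struct io_uring_cqe *cqe;
    io_uring_wait_cqe(&ring, &cqe);
    printf("read %d bytes\n", cqe->res);
    io_uring_cqe_seen(&ring, cqe);

    io_uring_queue_exit(&ring);
    return 0;
}
```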

Currently not able to receive NFS callbacks

Because of limitations in libnfs, and because there has been no need for them so far, virtionfs currently doesn't support anything to do with NFS callbacks. NFS callbacks allow the server to notify the client of something or request something from it; they are mainly used for cache invalidation and delegations (when a file is delegated to a client). See the NFS 4, 4.1 and 4.2 RFCs for all the callbacks.

Virtio-fs Linux driver does not support multi-queue

Problem
https://elixir.bootlin.com/linux/latest/C/ident/virtio_fs_wake_pending_and_unlock

In the virtio_fs_wake_pending_and_unlock function, the queue on which the request will be put is hardcoded to a single virtio-fs request queue.
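
A rough sketch of the intended direction; names follow fs/fuse/virtio_fs.c (VQ_REQUEST being the index of the first request queue) but are approximate, and the real work lives in the tree linked below.

```c
/* Sketch: pick a request queue based on the CPU issuing the request, instead
 * of the single hardcoded request queue. fs->num_request_queues would be the
 * queue count negotiated with the device. */
static struct virtio_fs_vq *virtio_fs_select_queue(struct virtio_fs *fs)
{
    unsigned int cpu = get_cpu();   /* disables preemption; see the TODO about smp_processor_id() */
    unsigned int qid = VQ_REQUEST + (cpu % fs->num_request_queues);

    put_cpu();
    return &fs->vqs[qid];
}
```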

Execution
Development is happening here: https://github.com/Peter-JanGootzen/linux

TODO:

  • Select queue depending on CPU id
  • Fix needing to use get_cpu (smp_processor_id seems not to be allowed in preemptible task context, but get_cpu disables preemption)
  • Set interrupt affinity
  • Simple cat test (only runs cat on a single core)
  • Big Ubuntu fio test (multi-core, check whether it actually goes into different queues with debugfs)
  • CPU Hotplugging (see virtio-net)
  • Implement round-robin queue scheduling (using atomics) in virtio-fs for comparison
  • Send it in as a formal patch (not a priority, but would be good for the paper)

NFS connection timeout

The NFS RPC connection seems to disconnect after a while of inactivity. One option is to periodically send an NFS:NULL to keep the connection alive; alternatively, maybe use libnfs's rpc_set_autoreconnect (not sure what it does)?
This might be non-trivial because the current architecture has no good place for something periodic like this.
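
A sketch of the keep-alive idea, to be called from the existing NFS polling loop; KEEPALIVE_IDLE_SEC is an assumed tunable and rpc_nfs4_null_async is assumed to be libnfs' raw NFS4 NULL call.

```c
#include <time.h>
#include <nfsc/libnfs.h>
#include <nfsc/libnfs-raw.h>
#include <nfsc/libnfs-raw-nfs4.h>

#define KEEPALIVE_IDLE_SEC 30   /* assumed tunable */

static void null_cb(struct rpc_context *rpc, int status, void *data, void *priv)
{
    /* Nothing to do; the point was only to generate traffic on the socket. */
}

/* To be called from the existing NFS polling loop. last_activity would be
 * bumped whenever a request is sent or a reply received. */
static void maybe_keepalive(struct rpc_context *rpc, time_t *last_activity)
{
    time_t now = time(NULL);
    if (now - *last_activity >= KEEPALIVE_IDLE_SEC) {
        rpc_nfs4_null_async(rpc, null_cb, NULL);   /* assuming libnfs' raw NFS4 NULL call */
        *last_activity = now;
    }
}
```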

Implement missing file system operations in virtionfs

These are all non-essential for experiments:

  • create
    • Send-side programmed but untested, no receive-side
  • setattr
  • flock
  • readdir and readdirplus
  • mkdir
  • rmdir
  • mknod
  • symlink
  • unlink
  • fallocate
  • rename
  • forget and batch_forget

NFS random verifier and clientid

Currently virtionfs uses a non-random verifier and non-random clientid. This is unreliable and also bad when multiple clients connect to the same server.
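
A minimal sketch using getrandom(2); the verifier is 8 bytes in the protocol, while the 16-byte owner id here is just an illustrative length for the client-owner string sent in EXCHANGE_ID/SETCLIENTID.

```c
#include <sys/random.h>

/* Generate a random boot verifier and a random client-owner id instead of
 * the fixed values currently hardcoded in virtionfs. */
static int make_client_identity(unsigned char verifier[8],
                                unsigned char ownerid[16])
{
    if (getrandom(verifier, 8, 0) != 8)
        return -1;
    if (getrandom(ownerid, 16, 0) != 16)
        return -1;
    return 0;
}
```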

Performance of virtionfs

Currently with XLIO sequential write performance with bs=4k, iodepth=16, numjobs=1 is ~244MB/s, while sequential read only gives ~20MB/s.

  • Investigate why this is happening (some flame graphs have already been collected on commit 7fc9e18)
  • Implement a fix

Furthermore, block sizes larger than the page size (4k) don't have any effect (i.e. no performance increase).

  • Investigate why
  • Implement a fix

Investigate fuser problems

  • The fio rand_iops benchmark returns error -14 (EFAULT)
  • src_ino not used, or used wrongly?
  • Inodes not being properly locked (especially during the inode table operations)

Rewrite fuser to use io_uring

Currently fuser is incredibly slow in metadata operations because they are all executed synchronously, blocking the Virtio queue (and the thread currently polling it) while the remote operation is outstanding, which incurs huge latencies.
Apparently io_uring supports metadata operations, so this would be a good fit.
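
fuser itself is Rust, but the mechanism would look roughly like this minimal liburing example of an asynchronous metadata operation (statx), submitted from a polling loop instead of blocking on a synchronous stat():

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <liburing.h>
#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
    struct io_uring ring;
    struct statx stx;

    io_uring_queue_init(8, &ring, 0);

    /* Queue a statx instead of calling it synchronously. */
    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
    io_uring_prep_statx(sqe, AT_FDCWD, "/etc/hostname", 0, STATX_BASIC_STATS, &stx);

    io_uring_submit(&ring);

    struct io_uring_cqe *cqe;
    io_uring_wait_cqe(&ring, &cqe);   /* a polling loop would process other work here instead */
    if (cqe->res == 0)
        printf("size=%llu\n", (unsigned long long)stx.stx_size);
    io_uring_cqe_seen(&ring, cqe);

    io_uring_queue_exit(&ring);
    return 0;
}
```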

FUSE:init is not async

This is a big problem: virtionfs needs to execute asynchronous handshake requests during init.
Currently it just sends the FUSE INIT completion before those are done. This can result in race conditions on boot.

An attempt to fix this was made in this commit; however, it made random operations break randomly. Either there is some timeout on FUSE:init, or that commit was triggering undefined behavior.
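
A sketch of one way to defer the INIT reply, with placeholder names (send_fuse_init_reply, the handshake steps themselves): count the outstanding handshake requests and only complete FUSE:init when the last one finishes.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* The FUSE INIT reply is deferred until all asynchronous handshake requests
 * (e.g. EXCHANGE_ID, CREATE_SESSION, lookup of the export root) are done,
 * instead of being sent immediately. send_fuse_init_reply is a placeholder. */
struct init_ctx {
    atomic_int  outstanding;   /* handshake requests still in flight */
    void       *fuse_req;      /* whatever is needed to complete FUSE:init */
};

extern void send_fuse_init_reply(void *fuse_req);

static void handshake_step_done(struct init_ctx *ctx)
{
    /* Called from each handshake request's completion callback. */
    if (atomic_fetch_sub(&ctx->outstanding, 1) == 1) {
        send_fuse_init_reply(ctx->fuse_req);   /* only now is the device usable */
        free(ctx);
    }
}
```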

Multiple NFS connections

Currently the virtionfs implementation uses a single Virtio queue polling thread and a single NFS socket polling thread (called the service thread in libnfs).
The goal is to move to 8 Virtio queue polling threads and 8 NFS socket polling threads (as our DPU has 8 cores). Eight is probably the optimal number because, of these 16 threads, only 8 need to run at any one time (either sending or receiving).
To do this multiple NFS connections are needed, which is supported in NFS 4.1. This feature is called 'session trunking'.
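
The session-trunking setup itself depends on what libnfs ends up exposing, but the threading side could look roughly like this sketch (queue_poll_loop is a placeholder for the per-queue Virtio polling loop; the NFS pollers would follow the same pattern):

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

#define NUM_QUEUES 8                      /* one per DPU core */

extern void *queue_poll_loop(void *arg);  /* placeholder per-queue polling loop */

/* Spawn one Virtio polling thread per queue and pin each to its own core;
 * the 8 NFS socket polling threads would be started the same way. */
static int start_pollers(pthread_t threads[NUM_QUEUES])
{
    for (long i = 0; i < NUM_QUEUES; i++) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET((int)i, &set);

        if (pthread_create(&threads[i], NULL, queue_poll_loop, (void *)i))
            return -1;
        pthread_setaffinity_np(threads[i], sizeof(set), &set);
    }
    return 0;
}
```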

Comprehensive experimentation suite

Host NFS vs DPU NFS

  • IOPS
  • Latency (statfs and read)
  • Bandwidth
  • Metadata workload
  • CPU utilization per IOP

Key-Value implementation

  • Metadata workload
  • IOPS
  • Bandwidth

Metadata workload blocked by

UID and GID not applied on a per-request basis

Currently the UID and GID that are supplied to FUSE:init are used throughout the whole lifetime of the virtio-fs device (fuser and virtionfs).

However, each individual FUSE request contains a UID and GID, so each request should be executed with that request's UID and GID.
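
A sketch of the classic FUSE-daemon approach using setfsuid/setfsgid; whether this translates to the DPU-side backends (where the actual I/O goes over NFS) is an open question, so the helper below is purely illustrative.

```c
#define _GNU_SOURCE
#include <sys/fsuid.h>

/* Per-request credentials as carried in every FUSE request header. */
struct fuse_creds {
    unsigned int uid;
    unsigned int gid;
};

/* Switch the calling thread's filesystem UID/GID to the request's values
 * before performing the operation, then restore the previous ones.
 * with_request_creds and fuse_creds are illustrative names. */
static void with_request_creds(const struct fuse_creds *c,
                               void (*op)(void *), void *arg)
{
    int old_uid = setfsuid(c->uid);
    int old_gid = setfsgid(c->gid);

    op(arg);

    setfsuid(old_uid);
    setfsgid(old_gid);
}
```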
