
Comments (20)

ffilz commented on June 9, 2024

It is on our short list, but just when we will get to it is very up in the air.

from nfs-ganesha.

mattbenjamin commented on June 9, 2024

fsal ops limit/budget? yes, I think so, please proceed governor :)

ffilz commented on June 9, 2024

That could work, though there's no guarantee the client pauses sending other requests. It would be better to do something that blocks the client's IP stream. That would require a signal back to the RPC layer, or maybe simply a limit on inflight requests per SVCXPRT (basically per client). If that limit is hit, the RPC layer stops reading from that TCP stream until the inflight requests drop below the limit. Might want hi and lo water mark for hysteresis.

ffilz commented on June 9, 2024

Are you using the async I/O mechanism in the FSAL? Which FSAL? You may be exposing an issue that we need some throttling of async requests.

zhitaoli-6 commented on June 9, 2024

Yeah, we do use async read/write in the FSAL of the distributed file system we are developing.

	void (*write2)(struct fsal_obj_handle *obj_hdl,
	               bool bypass,
	               fsal_async_cb done_cb,
	               struct fsal_io_arg *write_arg,
	               void *caller_arg);
	void (*read2)(struct fsal_obj_handle *obj_hdl,
	              bool bypass,
	              fsal_async_cb done_cb,
	              struct fsal_io_arg *read_arg,
	              void *caller_arg);

We run fio with the following parameters from an NFS client. nfs-ganesha runs on a VM with 8 GB of memory and 8 cores. It turns out that nfs-ganesha consumes over 4 GB of memory, and OOM happens eventually.

[global]
filesize=12G
time_based=1
numjobs=32
startdelay=5
exitall_on_error=1
create_serialize=0
filename_format=$jobnum/$filenum/bw.$jobnum.$filenum
directory=/mnt/vd
group_reporting=1
clocksource=gettimeofday
runtime=300
ioengine=psync
disk_util=0
iodepth=1

[read_throughput]
bs=1m
rw=read
direct=1
new_group 

ffilz commented on June 9, 2024

You may want to investigate throttling.

It may well be that async I/O is not ready for production yet.

zhitaoli-6 commented on June 9, 2024

Can we add a simple throttling algorithm by setting a limit on the total number of inflight requests? If the count exceeds the threshold, the thread blocks. This effectively bounds memory use: for example, with a limit of 256 and a request size of 1 MB, about 256 MB of memory is used.

It can be implemented in the alloc_nfs_request() and free_nfs_request() functions.
The pseudocode might look like this:

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int inflight_num;
void alloc_nfs_request(void) {
    pthread_mutex_lock(&lock);
    while (inflight_num >= threshold)      /* block until below the limit */
        pthread_cond_wait(&cond, &lock);
    inflight_num += 1;
    pthread_mutex_unlock(&lock);
}
void free_nfs_request(void) {
    pthread_mutex_lock(&lock);
    inflight_num -= 1;
    pthread_cond_signal(&cond);            /* unblock a waiting thread */
    pthread_mutex_unlock(&lock);
}

ffilz commented on June 9, 2024

If we do throttling, we want to do fair throttling.

zhitaoli-6 commented on June 9, 2024

Is there any plan to add a throttling mechanism to nfs-ganesha?

zhitaoli-6 commented on June 9, 2024

Can we add a simple throttling algorithm by setting a limit on the total number of inflight requests? If the count exceeds the threshold, the thread blocks. This effectively bounds memory use: for example, with a limit of 256 and a request size of 1 MB, about 256 MB of memory is used.

We implemented this mechanism in our environment, and experiments show that it effectively prevents overload from async operations (READ/WRITE). Would this feature be acceptable in the community repository?

mattbenjamin commented on June 9, 2024

something is definitely needed for async; and we aspire to some qos. it seems to me that since the current async dispatch mechanism (and the selection of sync vs async) is at the fsal level, you probably shouldn't be using a global throttle to control it (although that would probably work ok for a lot of bespoke setups). Past that, we have requests for fairness on other dimensions--in particular, exports/shares. We've been treating that as something different.

zhitaoli-6 commented on June 9, 2024

I agree that it is better to add the throttling mechanism at the FSAL layer rather than in global scope. Our goal is to avoid the case where nfs-ganesha uses too much memory because of a flood of async read/write requests, so we would add the limit on the number of inflight async ops in the FSAL layer. Fairness on other dimensions like exports/shares may not be needed right now.

Is this design reasonable? If so, I can implement it, validate it in our environment, and contribute it to the community :)

ffilz commented on June 9, 2024

If you want to submit something for review and discussion, that would be most welcome.

zhitaoli-6 commented on June 9, 2024

The draft implementation above leads to deadlock: all RPC threads end up waiting on the condition variable, so no thread is left to notice that an issued request has finished and wake up the blocked threads.

The libntirpc framework supports both sync and async requests, but it has no limit on request concurrency. I think there are two possible solutions here:

  1. Add throttling to libntirpc. If too many requests are in flight, it stops handling new connections and data-receive events on sockets.
  2. Add throttling to the FSAL layer of nfs-ganesha. There is a limit on async requests, and when it is exceeded the FSAL switches to sync requests. This requires adding more callbacks to the FSAL API, such as sync write/read.

Solution 1 may be better, because then the application no longer needs to worry about overload.

ffilz commented on June 9, 2024

If you have something implemented that is working, please submit it to Gerrithub for review and discussion.

Matt and I had a conversation about this. We will need a solution ourselves soon, so anything you have would be of interest to us.

Your solution of throttling overall in-flight requests is at least a good start. Another option, which might be a bit harder because it requires ntirpc and Ganesha to talk to each other, is to limit the amount of memory tied up in I/O buffers (which is the big killer), so that we stop accepting requests when we hit that limit. Maybe that's hard to do.

Longer term, it would be useful for the FSAL to be involved so it can be smart about what types of requests are backing up.

Eventually we will also want some fairness/QOS so one client doesn't hog all the budget.

zhitaoli-6 commented on June 9, 2024

Thanks for your reply. The draft above has the deadlock bug, so now we have the FSAL return ERR_FSAL_DELAY when there are too many inflight requests. The error code is translated into NFS4ERR_DELAY or NFS3ERR_JUKEBOX, and the NFS client retries according to its mount options.

What's your opinion about this solution?

zhitaoli-6 commented on June 9, 2024

> It would be better to do something that blocks the client's IP stream.

I agree with you that it would be better to add the throttling mechanism in the RPC layer.

mattbenjamin commented on June 9, 2024

Feedback into the RPC layer is an important addition, but in the longer term, I don't think the RPC layer can own all of flow control as it lacks knowledge of the i/o targets (shares, fsals, paths). I am not bringing up client fairness because we agreed earlier this change isn't attempting it.

xiaods commented on June 9, 2024

Is there a patch on Gerrit that we can review? @zhitaoli-6

zhitaoli-6 commented on June 9, 2024

I haven't submitted a patch because the initial design causes a deadlock.
