This would give us the ability to store more jobs than fit in RAM. W

use mmap for writing (and reading) the binlog (beanstalkd issue, closed)

kr commented on August 19, 2024

use mmap for writing (and reading) the binlog

from beanstalkd.

Comments (1)

kr commented on August 19, 2024

Some discussion pasted from http://groups.google.com/group/beanstalk-talk/t/5d318b9847dc93a8:

Zhu Han wrote:

Hi,

I just found beanstalkd is quite simple but very useful. Thank you for
such a great contribution.

I'm not familiar with the design and implementation of beanstalkd
because I only went through the code yesterday. So if I have made any
mistakes or invalid assumptions, please correct me.

I noticed that beanstalkd uses normal read/write style file IO on
the binlog and keeps all the job data in memory, including the
bookkeeping information and body data. If the binlog is activated, is
it a good idea to map the binlog file into memory directly? The
advantages of this approach are:

1. No double caching of jobs. Just leave the job in the binlog and map
   it into memory; the VFS layer will manage it for beanstalkd. The
   current approach caches the job in heap memory, which is backed by
   swap space, and it is also cached in the VFS layer of the file
   system.

2. The number of system calls can be decreased, so the latency of a
   single operation might be better.

3. The binlog can be used as the default option. Even if memory
   cannot hold all the jobs, that's fine. Let the VFS layer do the
   cache management. Varnish uses the same approach and is reported
   to have good performance numbers. See:
   http://varnish-cache.org/wiki/ArchitectNotes

The only disadvantage I can think of is that if the OS swaps out the
memory pages, then because beanstalkd is single-threaded, all
operations from different client connections may be affected... We can
tune the memory settings of the process to avoid this in low-latency
environments, or disable the binlog altogether.
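
To make the proposal concrete, here is a minimal sketch in Python of
reading job bodies straight out of a memory-mapped log file. The
record layout (a 4-byte length followed by the body) and the helper
names append_job and read_job are assumptions made for illustration;
they are not beanstalkd's actual binlog format or API.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical record layout (NOT beanstalkd's real binlog format):
# a 4-byte big-endian body length followed by the body bytes.
HEADER = struct.Struct(">I")


def append_job(fd, body):
    """Append one record with a plain write() and return its offset."""
    offset = os.lseek(fd, 0, os.SEEK_END)
    os.write(fd, HEADER.pack(len(body)) + body)
    return offset


def read_job(mapping, offset):
    """Read the body stored at `offset` through the mapping.

    No read() system call is made here; the kernel pages the file in
    on demand. Slicing copies the body out as a bytes object.
    """
    (length,) = HEADER.unpack_from(mapping, offset)
    start = offset + HEADER.size
    return mapping[start:start + length]


path = os.path.join(tempfile.mkdtemp(), "binlog.demo")
fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
first = append_job(fd, b"resize image 42")
second = append_job(fd, b"send email 7")

# Map the whole (non-empty) file read-only.
mapping = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)
first_body = read_job(mapping, first)
second_body = read_job(mapping, second)
mapping.close()
os.close(fd)
```

In this sketch the only thing a server would need to keep on the heap
is the offset of each job; the bodies live in the page cache and can
be evicted by the kernel when memory is tight.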

(As a tiny bit of background, in case there's any doubt, I usually
assume that deployments of beanstalkd are configured so that they
never swap. If they do swap, performance is likely to go down the
toilet.)

Using mmap is worth thinking about if we want to allow storing more
jobs than fit in memory. Until then, I think the advantages of using
mmap are too slight to worry about. Right now beanstalkd uses the
binlog only for recovery. For this purpose, writing to a file
descriptor is not really easier or harder than mmap. Having fewer
system calls will make things a bit faster, but I think our
performance is already pretty good and our time is probably better
spent on bug fixing and features. Far more important for speed is
maintaining sequential access to avoid disk seeks. I think having an
extra copy of some jobs in the page cache makes little difference for
good or bad.
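
As a rough illustration of why appending is about equally easy either
way, the following Python sketch appends the same records once with
write() and once through a mapping. The function names are
hypothetical and this is not how beanstalkd itself writes its binlog;
it only shows the shape of each approach.

```python
import mmap
import os
import tempfile


def append_with_write(path, record):
    """Append via a file descriptor: one write(), then fsync()."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        os.write(fd, record)
        os.fsync(fd)
    finally:
        os.close(fd)


def append_with_mmap(path, record):
    """Append via mmap: grow the file, map it, copy, then flush.

    A mapping cannot extend a file, so the file is grown with
    ftruncate() first. `record` must be non-empty, because an empty
    file cannot be mapped.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        old_size = os.fstat(fd).st_size
        os.ftruncate(fd, old_size + len(record))
        with mmap.mmap(fd, 0) as mapping:
            mapping[old_size:old_size + len(record)] = record
            mapping.flush()
    finally:
        os.close(fd)


tmp = tempfile.mkdtemp()
write_path = os.path.join(tmp, "write.log")
mmap_path = os.path.join(tmp, "mmap.log")

for record in (b"put job 1\n", b"put job 2\n"):
    append_with_write(write_path, record)
    append_with_mmap(mmap_path, record)

with open(write_path, "rb") as f:
    written = f.read()
with open(mmap_path, "rb") as f:
    mapped = f.read()
```

Both files end up byte-for-byte identical. A real implementation would
map a preallocated region once rather than remapping per record, which
is where any saving in system calls would come from.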

If we want to use the binlog as the primary store of jobs (thus
letting us accept more jobs than memory can hold), we'll be both
reading and writing, which can cause seeks, which will kill
performance. This might be a desirable tradeoff for some, but we must
make sure, if we do it at all, this doesn't cause any problems for
those who don't need the extra storage space.
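
The access pattern described above can be sketched with a toy queue
that keeps only offsets in memory and leaves job bodies on disk. The
class DiskBackedQueue and its methods are hypothetical, written in
Python for brevity, and os.pread assumes a Unix-like system; nothing
here reflects an actual beanstalkd design.

```python
import os
import tempfile
from collections import deque


class DiskBackedQueue:
    """Toy FIFO queue that keeps only (offset, length) pairs in RAM."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_APPEND, 0o600)
        self.index = deque()  # small per-job entries, not job bodies
        self.end = os.fstat(self.fd).st_size

    def put(self, body):
        # Writes stay sequential: every put appends at the tail.
        os.write(self.fd, body)
        self.index.append((self.end, len(body)))
        self.end += len(body)

    def reserve(self):
        # Reads go to wherever the oldest job lives, usually far from
        # the tail. On a spinning disk this is the seek in question.
        offset, length = self.index.popleft()
        return os.pread(self.fd, length, offset)

    def close(self):
        os.close(self.fd)


queue = DiskBackedQueue(os.path.join(tempfile.mkdtemp(), "jobs.log"))
queue.put(b"job-1")
queue.put(b"job-2")
oldest = queue.reserve()       # read near the head of the file
queue.put(b"job-3")            # write back at the tail
next_oldest = queue.reserve()  # read near the head again
remaining = len(queue.index)
queue.close()
```

Interleaving put and reserve makes the disk head alternate between the
tail and the head of the log, which is the cost that users who fit in
memory should not have to pay.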
