beanstalkd / beanstalkd

Beanstalk is a simple, fast work queue.

Home Page: https://beanstalkd.github.io/

License: Other

C 96.61% Shell 2.37% Makefile 0.89% Dockerfile 0.13%

beanstalkd's Introduction

beanstalkd

Simple and fast general purpose work queue.

https://beanstalkd.github.io/

See doc/protocol.txt for details of the network protocol.

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms. See CodeOfConduct.txt for details.

Quick Start

$ make
$ ./beanstalkd

Also try:

$ ./beanstalkd -h
$ ./beanstalkd -VVV
$ make CFLAGS=-O2
$ make CC=clang
$ make check
$ make install
$ make install PREFIX=/usr

Requires Linux (2.6.17 or later), Mac OS X, FreeBSD, or Illumos.

Currently beanstalkd is tested with GCC and clang, but it should work with any compiler that supports C99.

Uses ronn to generate the manual. See http://github.com/rtomayko/ronn.

Subdirectories

adm - files useful for system administrators
ct - testing tool; vendored from https://github.com/kr/ct
doc - documentation
pkg - scripts to make releases

Tests

Unit tests are in test*.c. See https://github.com/kr/ct for information on how to write them.

beanstalkd's People

dustin brosner gbarr kristjan abh steveyen lericson echou ghazel amireh dnlchen causes carlhoerberg camilo ptrmcrthr rainly aleksi floop-inc jakobdamjensen fdr joncooper thaingo mvickers michaeldwan userid gui dolfelt keeyipchan sdboyer sepreece drasch julman99 stephan-hof jonas8 dsmith tonyarkles soup w00w00 seacoastboy deepfryed ebottabi comforx kunalag129 aia keen99 valentinbora nitinahuja eamocanu dp-opensource benvanstaveren shardnit ziyoudefeng guoyunsky chopsui258 tant42 dwoon dumpforjunk minhajuddin ifduyue wahghw seraphlnwu davies lujiang-wed garymonson zerocoolys svoflee cincodenada jasonwiener billhongs nicotaing gollapudi fantasyni charsyam ejmvar jney nathanielc nephics arigemini karenetheridge thisandagain schmichael gavinhwa compendiumsoftware zhoufeng1989 ibank gregoryp tomponline netsweng rottenbytes zhanglei cardinala wl-chuang fudong1127 bmhatfield liexusong kencochrane jensrantil kulv2012 hufon notxarb

beanstalkd's Issues

Fix the build on Solaris

See http://groups.google.com/group/beanstalk-talk/t/e589c79f86f4d0ac for the bug report.

add underscore "_" to list of valid tube name characters

peek buried seems to have a problem

Hi,

I'm using beanstalkd 1.4.3 with PHP, and I found a little problem. The "peek-buried" command seems to be broken.
Here are my commands:

watch test
WATCHING 2

stats-tube test
OK 248
name: test
current-jobs-urgent: 0
current-jobs-ready: 0
current-jobs-reserved: 0
current-jobs-delayed: 0
current-jobs-buried: 4
total-jobs: 4
current-using: 0
current-watching: 1
current-waiting: 0
cmd-pause-tube: 0
pause: 0
pause-time-left: 0

list-tubes-watched
OK 21
default
test

peek-buried
NOT_FOUND

So, I have some buried jobs in my queue, but peek-buried doesn't find them...

Joël

Broken pipe error and heavy CPU use with misbehaving clients

| See below... This happens when clients send reserve
| commands constantly on an empty queue.

I'm running beanstalkd from master on Snow Leopard, and it's been working well, except that I've come across this error:

./beanstalkd: prot.c:672 in check_err: writev(): Broken pipe
./beanstalkd: prot.c:672 in check_err: writev(): Broken pipe
..

Without the clients doing anything else to the queue, this causes beanstalkd to run at 97% CPU usage. I'm not sure what specifically caused this, but it occurs when I have one client adding items to the queue and another removing them (about 10 items per second).

Cheers

Peter

forwarding client

It would be nice to have a client that just forwards jobs from one beanstalkd instance to another. See http://groups.google.com/group/beanstalk-talk/t/984bfc8037847e53 for some motivation for this setup.

carefully allocate disk space for binlog records

We want to handle a full disk with grace.

clear tube command

http://groups.google.com/group/beanstalk-talk/t/4ed1d368d7b3a5a8

Add appropriate fsync calls.

Should have optional throttling. There should be three modes: always fsync (for best reliability), fsync at most once every N seconds, or never fsync (best performance).

Could be controlled by two options: "-s 0" for always fsync, "-s N" for sync at most once every N seconds, and "-S" for never fsync.
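The three proposed modes can be folded into a single period value. A minimal sketch, assuming hypothetical names (`sync_policy`, `sync_policy_new`, `sync_policy_due`) that are not part of beanstalkd, with the current time passed in so the decision logic is easy to test:

```c
#include <stdbool.h>
#include <stdint.h>

#define NS_PER_SEC INT64_C(1000000000)

/* Hypothetical policy object for the proposed -s / -S options. */
typedef struct {
    int64_t period_ns;    /* < 0: never ("-S"), 0: always ("-s 0"), > 0: "-s N" */
    int64_t last_sync_ns; /* monotonic time of the last fsync */
    bool    synced_once;  /* false until the first fsync happens */
} sync_policy;

/* seconds < 0 models "-S"; seconds >= 0 models "-s N". */
static sync_policy sync_policy_new(int seconds)
{
    sync_policy p = { seconds < 0 ? -1 : seconds * NS_PER_SEC, 0, false };
    return p;
}

/* Returns true if the caller should fsync now, and records the sync. */
static bool sync_policy_due(sync_policy *p, int64_t now_ns)
{
    if (p->period_ns < 0) return false; /* never fsync */
    if (p->period_ns == 0 || !p->synced_once ||
        now_ns - p->last_sync_ns >= p->period_ns) {
        p->last_sync_ns = now_ns;
        p->synced_once = true;
        return true;
    }
    return false;
}
```

The caller would check `sync_policy_due()` after each binlog write and call fsync() only when it returns true. One consequence of this design: with "-s N", a write made just after a sync stays unsynced until the next write arrives, so a periodic timer may also be needed to bound that delay.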

make a kick-job command

Kick a job by id.

Format: "kick-job" id CR LF
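A server-side parser for the proposed command might look like the following sketch. `parse_kick_job` is a hypothetical helper, not beanstalkd's actual command dispatcher; it accepts only a non-negative decimal id and requires the CR LF terminator:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Parses a line of the form "kick-job <id>\r\n" into *id. */
static bool parse_kick_job(const char *line, uint64_t *id)
{
    static const char prefix[] = "kick-job ";
    const size_t plen = sizeof prefix - 1;

    if (strncmp(line, prefix, plen) != 0) return false;
    const char *p = line + plen;
    if (*p < '0' || *p > '9') return false; /* reject signs and extra spaces */

    char *end;
    errno = 0;
    unsigned long long v = strtoull(p, &end, 10);
    if (errno == ERANGE) return false;           /* id out of range */
    if (strcmp(end, "\r\n") != 0) return false;  /* require CR LF */

    *id = (uint64_t)v;
    return true;
}
```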

mac os doesn't have posix_fallocate

See http://groups.google.com/group/beanstalk-talk/t/8210428dd5f2c98f for details.
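One portable fallback is to reserve the space by writing zeros. A minimal sketch, assuming a hypothetical `fallocate_by_writing` helper that mirrors posix_fallocate()'s return convention (0 on success, otherwise an errno value). Unlike posix_fallocate(), it overwrites whatever data already lies in the range, so it only suits freshly created files such as a new binlog file:

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fallback for systems without posix_fallocate():
 * allocate [offset, offset + len) by writing zero bytes, so the
 * blocks are claimed up front on typical filesystems. */
static int fallocate_by_writing(int fd, off_t offset, off_t len)
{
    static const char zeros[4096];

    while (len > 0) {
        size_t chunk = len < (off_t)sizeof zeros ? (size_t)len : sizeof zeros;
        ssize_t w = pwrite(fd, zeros, chunk, offset);
        if (w < 0) {
            if (errno == EINTR) continue; /* retry interrupted writes */
            return errno;                 /* e.g. ENOSPC on a full disk */
        }
        if (w == 0) return EIO;           /* avoid looping forever */
        offset += w;
        len -= w;
    }
    return 0;
}
```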

mailbox tubes

Most tubes exist as long as jobs exist in them or a client is watching or using them. Mailbox tubes exist only while a client is watching. If no clients are watching, the tube and all jobs in it vanish.

A common use for this is to indicate completion of a job by a "reply job" sent to a mailbox tube.

Such tubes can be marked as special by prefixing the name with a special char. I think "~" would work fine as that char.

Unreliable reserve on server restart.

Steps to reproduce (using the Ruby client library):

Put a task into an empty queue.
Reserve the task.
Restart beanstalkd.
Reserving the task from another process succeeds, so the task is now executed by two processes simultaneously, neither of which knows about the other.
Deleting the task from the first connection raises EOFError (probably because the server restart closed that connection).
Deleting it from the second connection succeeds.

stateless connections

Stateful connections cause problems, especially when processing long-running jobs
or connecting over long distances or complicated network topologies. We can solve
these problems by learning from HTTP: prefer (rather, require) stateless, disposable,
short-lived connections.

However, statelessness poses challenges, especially with efficiency. For example,
repeating the tube name for every put command would use more bandwidth than
the current protocol. So this is a tradeoff.

The current beanstalkd protocol is highly stateful and its design pervasively assumes
that state is available. It uses this where possible to be more efficient.

We should reevaluate whether the current statefulness of the protocol is the best tradeoff.

Original post follows.

Currently, when a client disconnects while it has a reserved job, that job is instantly released onto the front of the queue, without regard to its remaining TTR.

This means that:

A worker executing long jobs cannot disconnect/reconnect to beanstalkd during execution. Perhaps that would make it too hard to track which client owns which job reservation though...
A job that causes a worker to crash constantly hogs the front of the queue, blocking subsequent jobs from coming through.

Perhaps auto-release due to disconnect should at least release onto the end of the queue, at a lower priority, or with a delay..?

allow "delete" command to work on delayed jobs

Peek commands leak memory

Currently the peek commands do a job_copy() before calling reply_job() presumably to avoid a race between the job being sent back to the client, and it being DELETEd by another client.

reset_conn() should free the job copy however it currently requires the job.id to be 0 before it will call job_free(). This works fine for the h_conn_timeout() case as in http://github.com/kr/beanstalkd/commit/d77814816f54c48b7bcc85c36bb681fd277d3e65 but it overloads the meaning of job.id which is an essential part of the peek responses.

The result is that we can't set a copied job.id to 0 for peek responses which means it won't get freed by reset_conn(), leading to a fast memory leak if any of the peek commands are used.

I'll submit a patch that adds a new job state to mark a job as copied (JOB_STATE_COPY), but I'm not sure if it's the best way to solve this.

http://gist.github.com/139279 will hammer peek-buried sufficiently that you'll see the beanstalkd process size grow very rapidly.
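The JOB_STATE_COPY idea can be illustrated with a simplified model. The struct layouts below are hypothetical and much smaller than beanstalkd's real job and connection structures; only the names `job_copy`, `reset_conn`, and `JOB_STATE_COPY` come from the report. The point is that ownership is decided by state rather than by `id == 0`, so the copy keeps its real id for the peek reply:

```c
#include <stdint.h>
#include <stdlib.h>

/* Simplified, hypothetical job states. */
enum job_state { JOB_STATE_READY, JOB_STATE_BURIED, JOB_STATE_COPY };

typedef struct job {
    uint64_t id;
    enum job_state state;
} job;

typedef struct conn {
    job *out_job; /* job currently being written back to the client */
} conn;

/* Make a private copy for a peek reply; the real id is preserved. */
static job *job_copy(const job *j)
{
    job *c = malloc(sizeof *c);
    if (!c) return NULL;
    *c = *j;
    c->state = JOB_STATE_COPY; /* mark it as owned by the connection */
    return c;
}

/* After the reply is sent: free only private copies, never a job
 * that still lives in a queue. */
static void reset_conn(conn *c)
{
    if (c->out_job && c->out_job->state == JOB_STATE_COPY) free(c->out_job);
    c->out_job = NULL;
}
```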

Jobs list

Is there a way to see all the jobs in the queue, perhaps with peek_ready? I can only see the latest job.

don't use netcat in automated tests

Our jankity shell scripts don't work with all versions of netcat. It's just too hard to use reliably.

I'm leaning toward a really simple, custom python framework. But if there are any really good protocol testing tools out there, I'd love to know.

use high-resolution timeouts internally

Avoid stupid off-by-one errors at one-second granularity.
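A sketch of nanosecond-resolution timekeeping, assuming the platform provides clock_gettime() with CLOCK_MONOTONIC (older Mac OS X releases such as Snow Leopard do not). The helper names are hypothetical. Keeping deadlines in nanoseconds and rounding the remaining wait up to whole milliseconds avoids waking slightly before a deadline and finding nothing due yet:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

#define NS_PER_SEC INT64_C(1000000000)
#define NS_PER_MS  INT64_C(1000000)

/* Current monotonic time in nanoseconds; unaffected by wall-clock jumps. */
static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * NS_PER_SEC + ts.tv_nsec;
}

/* Deadline for a timeout given in whole seconds (e.g. a TTR or delay). */
static int64_t deadline_after(int64_t start_ns, uint32_t seconds)
{
    return start_ns + (int64_t)seconds * NS_PER_SEC;
}

/* Milliseconds to wait, rounded up so the event loop never wakes
 * before the deadline has actually passed. */
static int64_t wait_ms(int64_t deadline_ns, int64_t current_ns)
{
    int64_t d = deadline_ns - current_ns;
    if (d <= 0) return 0;
    return (d + NS_PER_MS - 1) / NS_PER_MS;
}
```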

Build error on Snow Leopard: fdatasync

I get a build error on Mac OS X Snow Leopard where it says it can't find fdatasync.

Output of ./configure:

https://gist.github.com/0290aa815842b888eed4

Error upon running make:

cc1: warnings being treated as errors
binlog.c: In function ‘binlog_write_job’:
binlog.c:491: warning: implicit declaration of function ‘fdatasync’ 
make[1]: *** [binlog.o] Error 1
make: *** [all] Error 2

Using libevent-1.4.12-stable ("make verify" passed).
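A common workaround is to fall back to fsync(), which also flushes file metadata and so may be slower, but is at least as durable. Note that a link-only configure test can report fdatasync as present on Mac OS X even though the system headers do not declare it, which would be consistent with the "implicit declaration" warning above; checking for the declaration avoids that. A minimal sketch, assuming the build uses autoconf's `AC_CHECK_DECLS([fdatasync])` (which defines `HAVE_DECL_FDATASYNC` to 1 or 0); `sync_data` is a hypothetical name:

```c
#include <unistd.h>

/* Flush file data to disk, using fdatasync() only where the headers
 * declare it and falling back to fsync() everywhere else. */
static int sync_data(int fd)
{
#if defined(HAVE_DECL_FDATASYNC) && HAVE_DECL_FDATASYNC
    return fdatasync(fd);
#else
    return fsync(fd);
#endif
}
```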

YAML, newlines and shell tests

I'm looking at getting the shell tests passing, but have a dilemma (quadlemma?) regarding newlines in the YAML output.

The beanstalkd protocol uses CR+LF newlines, but the YAML in stats, stats-job and stats-tube commands uses LF newlines. This makes it very difficult to write *.expected files that contain beanstalkd protocol response lines as well as YAML.

Note that the YAML spec supports any kind of newline; CR+LF, LF or CR:
http://www.yaml.org/spec/1.2/spec.html#id2774608

I see a few possible options:

Option 1: Use CR+LF in YAML response data

This would make the beanstalkd protocol more predictable, and would require no changes to protocol.txt which doesn't specify which newlines YAML uses.
However, it would possibly break beanstalkd clients that assume LF and don't handle CR+LF. I've only recently fixed this in Pheanstalk.

Option 2: Use diff --strip-trailing-cr in check-one.sh

This will convert the diff input from CR+LF to LF.
The downside is that the tests will no longer pick up any actual newline bugs, for example a protocol response ending in LF instead of CR+LF.
Also, I can only confirm that the option is available in GNU diffutils; I don't know whether there are other common diff implementations out there that don't support it.

Option 3: Split .expected files into parts when required

If example.commands expects a beanstalk response line, then YAML data, then another beanstalk response line, split it into example.expected.1 example.expected.2 and example.expected.3, where example.expected.2 is saved with LF newlines, and the others with CR+LF.
This solution makes managing the *.expected files a bit of a pain, but it doesn't impact on beanstalkd itself, nor weaken the test assertions.

Option 4: Define a basic syntax to specify newline type

For example, the *.expected files could be preprocessed to convert all newlines to CR+LF unless the line ends in the string "<LF>".
This would add preprocessing complexity and a new (tiny) syntax, but would leave the .expected files more manageable.
It would also make the tests less fragile against accidentally saving *.expected files with the wrong newline type.
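The preprocessing step for Option 4 is small. A minimal sketch, assuming a hypothetical `expand_newlines` function and the "<LF>" marker described above; it also tolerates a file accidentally saved with CR+LF line endings:

```c
#include <stdlib.h>
#include <string.h>

/* Converts every '\n' to "\r\n", except on lines ending in the marker
 * "<LF>", where the marker is removed and a bare '\n' is kept.
 * Returns a malloc'd string the caller must free, or NULL on failure. */
static char *expand_newlines(const char *in)
{
    static const char marker[] = "<LF>";
    const size_t mlen = sizeof marker - 1;
    size_t n = strlen(in);
    char *out = malloc(2 * n + 1); /* worst case: every byte is '\n' */
    if (!out) return NULL;

    size_t o = 0;        /* bytes written to out */
    size_t line_len = 0; /* bytes of the current line already in out */
    for (size_t i = 0; i < n; i++) {
        if (in[i] != '\n') {
            out[o++] = in[i];
            line_len++;
            continue;
        }
        /* Tolerate files saved with CR+LF endings. */
        if (line_len > 0 && out[o - 1] == '\r') {
            o--;
            line_len--;
        }
        if (line_len >= mlen && memcmp(out + o - mlen, marker, mlen) == 0) {
            o -= mlen;      /* drop the marker, keep a bare LF */
            out[o++] = '\n';
        } else {
            out[o++] = '\r';
            out[o++] = '\n';
        }
        line_len = 0;
    }
    out[o] = '\0';
    return out;
}
```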

Any preferences or better ideas on how I should go about this?

Cheers!
Paul

OUT_OF_MEMORY error with -d AND -b

On my MacBook, beanstalkd 1.4.4 raises an OUT_OF_MEMORY error after ~15k jobs are pushed if I both detach beanstalkd (-d) AND activate the binlog (-b).
If I remove either of these parameters, I see no error even after 1 million jobs are pushed.

http://pastie.org/959225

require 'rubygems'
require 'beanstalk-client'

body = {
  :fdsf => 'gkljfjkldsgjkldsjkgldskljfgjklsdf sdfkljfdslk',
  :jklsf => 'gkljfjkldsgjkldsjkgldskljfgjklsdf 878734',
  :gdfhjk => 'gkljfjkldsgjkldsjkgldskljfgjklsdf (fdskljgfhsdkl)è!ç',
}

conn = Beanstalk::Connection.new('0.0.0.0:11300')
conn.use('test')
conn.watch('test')
conn.ignore('default')

1_000.times do |i|
  puts i * 2000
  2_000.times do
    conn.yput body, 1, 0, 8_035_200
  end
end

puts 'ok'

test binlog-sizelimit fails for some people

It works for me, but two people have reported failure. Can somebody post a log or error message here?

binlog consumes unreasonable disk space with very old jobs

If you have a buried job, the binlog will start to stack up. This is very unfortunate in high-traffic environments, since it will fill both disk and memory very quickly.

There's a good blog post about this here: http://blog.sendapatch.se/2010/may/how-do-you-handle-job-failures-really.html

Allow beanstalkd to reserve arbitrary amounts of binlog space

Currently it can only create one extra binlog file on disk. After that it will deny operations that want to reserve space. (Search for "overextended" in binlog.c.) There just needs to be a linked list of future binlog files; currently there is a pointer to (up to) one future file. In practice this only happens when the queue holds more than about 10 MB of jobs.
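The bookkeeping change can be sketched as a queue of future files instead of a single pointer. This is a simplified, hypothetical model (`binlog_file`, `binlog_queue`, `binlog_reserve`), not beanstalkd's binlog.c: it tracks only reserved byte counts, creates no real files, and simply wastes the tail of a file when the next record does not fit:

```c
#include <stdint.h>
#include <stdlib.h>

/* One future binlog file and the space still unreserved in it. */
typedef struct binlog_file {
    int seq;                  /* file sequence number, e.g. binlog.<seq> */
    int64_t free_bytes;       /* space not yet reserved in this file */
    struct binlog_file *next;
} binlog_file;

/* Linked list of future files, oldest first. */
typedef struct {
    binlog_file *head;
    binlog_file *tail;
    int next_seq;             /* sequence number for the next new file */
    int64_t file_size;        /* fixed size of each binlog file */
} binlog_queue;

/* Reserve n bytes, appending new files as needed instead of refusing.
 * Returns the sequence number of the file holding the reservation,
 * or -1 if the record can never fit or allocation fails. */
static int binlog_reserve(binlog_queue *q, int64_t n)
{
    if (n > q->file_size) return -1;
    if (!q->tail || q->tail->free_bytes < n) {
        binlog_file *f = malloc(sizeof *f);
        if (!f) return -1;
        f->seq = q->next_seq++;
        f->free_bytes = q->file_size;
        f->next = NULL;
        if (q->tail) q->tail->next = f; else q->head = f;
        q->tail = f;
    }
    q->tail->free_bytes -= n;
    return q->tail->seq;
}
```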

add SASL support

beanstalkd sometimes exits on snow leopard

There is a mailing-list thread about this:
http://groups.google.com/group/beanstalk-talk/t/b14ef253bcaec197

memory leak in list-tubes and list-tubes-watched with beanstalkd 1.4.3

The following Python code triggers a memory leak:

from beanstalkc import Connection
c = Connection('127.0.0.1')
while True:
    c.watching()

The fix is in prot.c:

--- prot.c~     2009-11-29 01:55:35.000000000 +0100
+++ prot.c      2009-12-28 15:00:22.308353126 +0100
@@ -1001,6 +1001,9 @@
 c->out_job = allocate_job(resp_z); /* fake job to hold response data */
 if (!c->out_job) return reply_serr(c, MSG_OUT_OF_MEMORY);

+    /* Mark this job as a copy so it can be appropriately freed later on */
+    c->out_job->state = JOB_STATE_COPY;
+
 /* now actually format the response */
 buf = c->out_job->body;
 buf += snprintf(buf, 5, "---\n");

please add ipv6 support

Sequence numbers / Synchronous clients?

I just read the protocol description for Beanstalkd. The project looks great and the protocol is really simple; it should be very straightforward to implement in a language such as Erlang. The only problem I can see is the lack of sequence numbers. Wouldn't this basically force clients of Beanstalkd to be synchronous? Do you have any plans for a binary protocol with sequence numbers?

does not handle SIGTERM

beanstalkd only handles SIGINT, on which it performs a clean shutdown. The following patch adds the same behavior for SIGTERM. Tested against version 1.3.

Patch:
--- beanstalkd.c.orig 2009-08-15 13:48:46.000000000 +0200
+++ beanstalkd.c 2009-08-15 13:49:11.000000000 +0200
@@ -120,6 +120,10 @@
     sa.sa_handler = exit_cleanly;
     r = sigaction(SIGINT, &sa, 0);
     if (r == -1) twarn("sigaction(SIGINT)"), exit(111);
+
+    sa.sa_handler = exit_cleanly;
+    r = sigaction(SIGTERM, &sa, 0);
+    if (r == -1) twarn("sigaction(SIGTERM)"), exit(111);
 }
 
 /* This is a workaround for a mystifying workaround in libevent's epoll

beanstalkd fails to build against libevent2

Beanstalkd (at least 1.4.6) fails to build against libevent 2 series. Build log and more info here: https://bugs.gentoo.org/show_bug.cgi?id=333091

accept LF in addition to CR LF to terminate commands
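A sketch of a terminator scan that accepts both forms, using a hypothetical `scan_line` helper rather than beanstalkd's real command reader. It only concerns command lines; job bodies would still be read by their declared byte count:

```c
#include <stddef.h>
#include <string.h>

/* Finds the end of the first command in buf[0..len), accepting either
 * "\r\n" or a bare "\n". Returns the command length (excluding the
 * terminator) and stores the terminator length in *term_len, or
 * returns -1 if no complete line has been buffered yet. */
static ptrdiff_t scan_line(const char *buf, size_t len, size_t *term_len)
{
    const char *nl = memchr(buf, '\n', len);
    if (!nl) return -1;

    size_t end = (size_t)(nl - buf);
    if (end > 0 && buf[end - 1] == '\r') {
        *term_len = 2;                 /* CR LF */
        return (ptrdiff_t)(end - 1);
    }
    *term_len = 1;                     /* bare LF */
    return (ptrdiff_t)end;
}
```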

OUT_OF_MEMORY after crash

Updated steps to reproduce in v1.4.6:

$ mkdir x
$ ./beanstalkd -b x &
[1] 4369
$ kill -9 4369
$ ./beanstalkd -b x &
[2] 4371
[1]   Killed                  ./beanstalkd -b x
$ printf 'put 0 0 0 0\r\n\r\n' | nc localhost 11300
./beanstalkd: binlog.c:589 in maintain_invariants_iter: newest binlog has invalid 155 reserved
./beanstalkd: prot.c:841 in enqueue_incoming_job: server error: OUT_OF_MEMORY

$

The same symptoms can be reproduced by working with a queue for some time and leaving it empty before the crash. A non-empty queue replays fine. So, generally, this looks like a bug in replaying an empty binlog, or something closely related.

print helpful warnings about command-line errors

Running beanstalkd -b -l 127.0.0.1 should produce a warning message:

Warning: "-l" looks like an option flag, but it will be interpreted as a path name by -b.

This is in addition to the error message we already print ("unknown option: 127.0.0.1").
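A minimal sketch of such a check, run right after an option has consumed its argument. `warn_if_flag_like` is a hypothetical helper; the caller supplies a description of how the argument will be used, so for -b the output matches the message above:

```c
#include <stdio.h>

/* Warns when the argument an option consumed (for example the
 * directory given to -b) itself looks like an option flag.
 * `what` describes how the argument will be used, e.g. "a path name".
 * Returns 1 if a warning was printed, 0 otherwise. */
static int warn_if_flag_like(FILE *out, char opt, const char *arg,
                             const char *what)
{
    /* A lone "-" is not treated as flag-like. */
    if (arg != NULL && arg[0] == '-' && arg[1] != '\0') {
        fprintf(out,
                "Warning: \"%s\" looks like an option flag, but it will be "
                "interpreted as %s by -%c.\n",
                arg, what, opt);
        return 1;
    }
    return 0;
}
```

For the command line above, the -b handler would call `warn_if_flag_like(stderr, 'b', "-l", "a path name")`, and the existing "unknown option" error would still be printed afterwards.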

Slowly increasing memory usage

On a machine with 16GB of RAM, we need to restart the Beanstalkd daemon at least 3 times a week because it slowly eats all of the memory on the server.
We are using the ruby beanstalk client with beanstalkd version 1.4.3 on Ubuntu 9.10 (karmic), kernel version 2.6.31-14-server.
We are definitely calling job.delete from the client when we're done with each job. My feeling is that the memory leak has to do with connections.
Our application makes about 2k-3k connections at any given time, reaching about 10,000,000 total connections around the time it runs out of memory. It uses about 5 tubes and pushes upwards of 100,000,000 jobs through the queue per day.

Does not compile on FreeBSD 7 amd64

reeler% git clone git://github.com/kr/beanstalkd.git
reeler% cd beanstalkd
reeler% ./autogen.sh
configure.in:10: installing `./install-sh'
configure.in:10: installing `./missing'
reeler% ./configure
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... ./install-sh -c -d
checking for gawk... no
checking for mawk... no
checking for nawk... nawk
checking whether make sets $(MAKE)... yes
checking whether to enable maintainer-specific portions of Makefiles... no
checking for style of include used by make... GNU
checking for gcc... gcc
checking for C compiler default output file name... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables...
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking dependency style of gcc... none
checking for supported compiler flags... -Wall -Werror
checking for gcc... (cached) gcc
checking whether we are using the GNU C compiler... (cached) yes
checking whether gcc accepts -g... (cached) yes
checking for gcc option to accept ISO C89... (cached) none needed
checking dependency style of gcc... (cached) none
checking for a BSD-compatible install... /usr/bin/install -c
checking if compiler supports -R... yes
checking libevent install prefix... /usr/local
checking for posix_fallocate... no
checking for fdatasync... no
checking for bind in -lsocket... no
checking for inet_aton in -lnsl... no
checking for event_get_version in -levent... yes
checking for event_reinit in -levent... yes
checking how to run the C preprocessor... gcc -E
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking for uint16_t... yes
configure: creating ./config.status
config.status: creating Makefile
config.status: creating beanstalkd.spec
config.status: creating config.h
config.status: executing depfiles commands
reeler% make
make all-am
gcc -DHAVE_CONFIG_H -I. -g -O2 -Wall -Werror -I/usr/local/include -c -o beanstalkd.o beanstalkd.c
gcc -DHAVE_CONFIG_H -I. -g -O2 -Wall -Werror -I/usr/local/include -c -o binlog.o binlog.c
gcc -DHAVE_CONFIG_H -I. -g -O2 -Wall -Werror -I/usr/local/include -c -o conn.o conn.c
gcc -DHAVE_CONFIG_H -I. -g -O2 -Wall -Werror -I/usr/local/include -c -o job.o job.c
gcc -DHAVE_CONFIG_H -I. -g -O2 -Wall -Werror -I/usr/local/include -c -o ms.o ms.c
gcc -DHAVE_CONFIG_H -I. -g -O2 -Wall -Werror -I/usr/local/include -c -o net.o net.c
In file included from net.h:22,
from net.c:22:
/usr/include/netinet/tcp.h:40: error: expected '=', ',', ';', 'asm' or 'attribute' before 'tcp_seq'
/usr/include/netinet/tcp.h:50: error: expected specifier-qualifier-list before 'u_short'
/usr/include/netinet/tcp.h:175: error: expected specifier-qualifier-list before 'u_int8_t'
*** Error code 1

Stop in /usr/home/peter/beanstalkd.
*** Error code 1

Stop in /usr/home/peter/beanstalkd.
reeler%
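The errors point at types (`u_short`, `u_int8_t`, and the type behind `tcp_seq`) that are not in scope when <netinet/tcp.h> is parsed. On FreeBSD these BSD typedefs come from <sys/types.h>, so a likely fix is to include it before the networking headers. A sketch of a portable include order; `set_nodelay` exists only to exercise a tcp.h constant and is not taken from beanstalkd:

```c
/* <netinet/tcp.h> on BSD systems relies on typedefs from <sys/types.h>
 * and on <netinet/in.h>, so include those first. */
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Illustration only: disable Nagle's algorithm on a TCP socket. */
static int set_nodelay(int fd)
{
    int on = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof on);
}
```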

1.4.4 Memory Leak?

I keep getting these errors every 10-15 min in dev mode.

beanstalkd: prot.c:1785 in h_accept: accept(): Too many open files
beanstalkd: net.c:72 in brake: too many connections; putting on the brakes

Any ideas?

named jobs

Can be used, e.g., for deduplication of work.

Re-enable compiler warnings and -Werror

These accidentally got disabled in the switch to autoconf.

Make binlog format robust against errors.

Probably just means adding a CRC to stored records.
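A CRC per record is cheap to compute. A sketch of the standard bitwise CRC-32 (the reflected polynomial 0xEDB88320, as used by zlib and gzip), plus a hypothetical `record_ok` check for replay. The record layout it implies (payload followed by its stored CRC) is an assumption, not beanstalkd's binlog format, and a table-driven CRC would be faster in practice:

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320). */
static uint32_t crc32_ieee(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t crc = 0xFFFFFFFFu;

    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* On replay, a mismatch indicates a torn or corrupted record. */
static int record_ok(const void *payload, size_t len, uint32_t stored_crc)
{
    return crc32_ieee(payload, len) == stored_crc;
}
```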

cutgen crashes on Solaris

list existing jobs

There are currently several ways to get a list of existing jobs, but most either modify the jobs directly or make copies, which is dangerous on a large tube. I found a branch of beanstalkd that implements listing commands in the C code, written by amireh, and I think it would be a great feature to include in the next release: http://github.com/amireh/beanstalkd

If those commands are implemented, managing the beanstalkd service would be considerably easier.

Feature Request: tube latency and age of oldest job

For monitoring purposes, it would be great to have the following information as part of stats and stats-tube:

latency: average time that a job spends (in sec, over all queues, and for each tube)
oldest-age: age of oldest job (in sec, over all queues, and for each tube)
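One way to compute these numbers, assuming "latency" means the time a job waits between put and its reservation, and that a creation timestamp is kept for each job. All names are hypothetical; a real server would more likely maintain running aggregates per tube and globally than scan an array:

```c
#include <stdint.h>

/* Running totals for the average put-to-reserve latency. */
typedef struct {
    int64_t total_wait_ns; /* sum of (reserve time - creation time) */
    uint64_t reserved;     /* number of reservations in the sum */
} latency_stats;

static void latency_record(latency_stats *s, int64_t created_ns,
                           int64_t reserved_ns)
{
    s->total_wait_ns += reserved_ns - created_ns;
    s->reserved++;
}

/* Average latency in seconds, or 0 if nothing has been reserved. */
static double latency_seconds(const latency_stats *s)
{
    if (s->reserved == 0) return 0.0;
    return (double)s->total_wait_ns / (double)s->reserved / 1e9;
}

/* Age in seconds of the oldest of n jobs, given their creation times. */
static double oldest_age_seconds(const int64_t *created_ns, int n,
                                 int64_t now_ns)
{
    if (n <= 0) return 0.0;
    int64_t oldest = created_ns[0];
    for (int i = 1; i < n; i++)
        if (created_ns[i] < oldest) oldest = created_ns[i];
    return (double)(now_ns - oldest) / 1e9;
}
```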

make unit tests portable

One of them breaks on Mac OS.

slow performance with large jobs

When submitting jobs around 2MB, beanstalkd gets really slow. See here for the original report:

http://groups.google.com/group/beanstalk-talk/t/f27bd53d62d23f0c

internal error when inserting/deleting many jobs with binlog

http://groups.google.com/group/beanstalk-talk/t/8edacfb9781c5adc

No rule to make target `NEWS.md'

I cannot successfully build beanstalkd:

make  all-am
make[1]: *** No rule to make target `NEWS.md', needed by `all-am'.  Stop. 
make: *** [all] Error 2

The binary builds successfully, but this error prevents running 'make install'. The source is the latest master.

$ uname -a

Darwin 10.4.0 Darwin Kernel Version 10.4.0: Fri Apr 23 18:28:53 PDT 2010; root:xnu-1504.7.4~1/RELEASE_I386 i386

use mmap for writing (and reading) the binlog

This would give us the ability to store more jobs than fit in RAM.

We don't necessarily want to do this, but it's worth discussing.

add a "reserves" counter

Currently, if a client reserves a job and then crashes, the job will be put in the ready queue with no indication that it had ever been reserved. This is not good; clients should be able to distinguish such a job from one that has never been reserved.

So we should add a "reserves" counter and/or increment the "timeouts" counter in this situation and/or introduce a separate counter just for this situation.

See http://groups.google.com/group/beanstalk-talk/t/951b6c35752257a4 for more info.
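A sketch of the counter bookkeeping, with hypothetical names, showing both proposals from the issue: a "reserves" counter and a separate counter for disconnects of the reserving client:

```c
#include <stdint.h>

/* Hypothetical per-job counters. */
typedef struct {
    uint32_t reserves;    /* how many times the job has been reserved */
    uint32_t disconnects; /* how many times its reserving client went away */
} job_counters;

/* Called whenever a client successfully reserves the job. */
static void on_reserve(job_counters *c) { c->reserves++; }

/* Called when the reserving client disconnects and the job goes back
 * to the ready queue. */
static void on_reserver_disconnect(job_counters *c) { c->disconnects++; }

/* True if the job was handed out before the current reservation,
 * which is what a worker cannot tell today. */
static int reserved_before(const job_counters *c) { return c->reserves > 1; }
```

A worker that could read these values (for example through stats-job, if they were exposed there) could then treat a job that keeps crashing its workers differently from a fresh one.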

Reprioritise existing jobs

Given a job id, I would like to be able to reprioritize it. Currently I can only achieve this by calling peek, delete, then recreating the job.

One possibility would be to allow release to be called without first having reserved the job. Currently the interface for delete and release is inconsistent:

put 0 0 60 3
foo
INSERTED 1
delete 1
DELETED
put 0 0 60 3
bar
INSERTED 2
release 2 100 0
NOT_FOUND
delete 2
DELETED