The OS: CentOS 7 64-bit
Memory installed: 32 GB
The bug: Jobberd crashes occasionally on my servers. If I run "jobber status" I get:
Couldn't connect to daemon: dial unix /var/jobber_daemon.sock: connect: connection refused
After running "systemctl start jobber" (since it's shown as dead) I get this:
Starting jobber (via systemctl): Job for jobber.service failed because the control process exited with error code. See "systemctl status jobber.service" and "journalctl -xe" for details.
Running "systemctl status jobber.service" as suggested, I get this:
● jobber.service - LSB: jobber
Loaded: loaded (/etc/rc.d/init.d/jobber)
Active: failed (Result: exit-code) since Tue 2016-10-11 01:19:47 EDT; 10s ago
Docs: man:systemd-sysv-generator(8)
Process: 12236 ExecStop=/etc/rc.d/init.d/jobber stop (code=exited, status=0/SUCCESS)
Process: 12233 ExecStart=/etc/rc.d/init.d/jobber start (code=exited, status=1/FAILURE)
Oct 11 01:19:47 localhost.localdomain systemd[1]: Starting LSB: jobber...
Oct 11 01:19:47 localhost.localdomain jobber[12233]: Starting jobberd: /bin/bash: fork: Cannot allocate memory
Oct 11 01:19:47 localhost.localdomain jobber[12233]: [FAILED]
Oct 11 01:19:47 localhost.localdomain systemd[1]: jobber.service: control process exited, code=exited status=1
Oct 11 01:19:47 localhost.localdomain systemd[1]: Failed to start LSB: jobber.
Oct 11 01:19:47 localhost.localdomain systemd[1]: Unit jobber.service entered failed state.
Oct 11 01:19:47 localhost.localdomain systemd[1]: jobber.service failed.
Memory does not appear to be the issue; running "free -g" returns:
              total        used        free      shared  buff/cache   available
Mem:             31           9          17           0           4          17
Swap:             7           0           7
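Healthy free memory does not rule out a resource limit. "pthread_create failed: Resource temporarily unavailable" is EAGAIN, which usually points at a thread or process limit, and on some kernels fork can reportedly fail with "Cannot allocate memory" when PIDs run out rather than RAM. Below is a minimal sketch for checking this on an affected server, assuming a Linux /proc filesystem. The function names are hypothetical and nothing here is part of jobber:

```python
#!/usr/bin/env python3
"""Compare live process/thread counts against kernel-wide limits (Linux only)."""
import os


def parse_threads(status_text):
    """Return the 'Threads:' value from the text of /proc/<pid>/status, or 0."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    return 0


def count_tasks(proc_root="/proc"):
    """Return (process_count, thread_count) by walking numeric dirs in proc_root."""
    processes = threads = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                threads += parse_threads(fh.read())
            processes += 1
        except OSError:
            # The process exited between listdir() and open(); skip it.
            continue
    return processes, threads


def read_limit(path):
    """Read a single integer from a /proc/sys file."""
    with open(path) as fh:
        return int(fh.read().strip())


if __name__ == "__main__":
    procs, threads = count_tasks()
    print("processes:", procs)
    print("threads:", threads)
    print("kernel.threads-max:", read_limit("/proc/sys/kernel/threads-max"))
    print("kernel.pid_max:", read_limit("/proc/sys/kernel/pid_max"))
```

If either count sits close to its limit at the moment jobberd fails to start, that would suggest a leak of processes or threads rather than a shortage of memory. Per-user limits ("ulimit -u") are not covered by this sketch and would need a separate check.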
After a stop, restart, stop, and then start, I get this:
● jobber.service - LSB: jobber
Loaded: loaded (/etc/rc.d/init.d/jobber)
Active: active (exited) since Tue 2016-10-11 01:21:14 EDT; 4s ago
Docs: man:systemd-sysv-generator(8)
Process: 12236 ExecStop=/etc/rc.d/init.d/jobber stop (code=exited, status=0/SUCCESS)
Process: 1850 ExecStart=/etc/rc.d/init.d/jobber start (code=exited, status=0/SUCCESS)
Oct 11 01:21:14 localhost.localdomain systemd[1]: Starting LSB: jobber...
Oct 11 01:21:14 localhost.localdomain systemd[1]: Started LSB: jobber.
Then running "jobber status":
After some more stops and starts, I managed to get it to fail when running "jobber":
runtime/cgo: pthread_create failed: Resource temporarily unavailable
SIGABRT: abort
PC=0x7f877dbe55f7 m=0
goroutine 0 [idle]:
goroutine 1 [running, locked to thread]:
runtime.systemstack_switch()
/usr/lib/golang/src/runtime/asm_amd64.s:216 fp=0xc82004fe90 sp=0xc82004fe88
runtime.newproc(0x0, 0x89d720)
/usr/lib/golang/src/runtime/proc1.go:2213 +0x62 fp=0xc82004fed8 sp=0xc82004fe90
runtime.init.4()
/usr/lib/golang/src/runtime/proc.go:141 +0x2b fp=0xc82004fef0 sp=0xc82004fed8
runtime.init()
/usr/lib/golang/src/runtime/zversion.go:9 +0x378 fp=0xc82004ff50 sp=0xc82004fef0
runtime.main()
/usr/lib/golang/src/runtime/proc.go:63 +0x103 fp=0xc82004ffa0 sp=0xc82004ff50
runtime.goexit()
/usr/lib/golang/src/runtime/asm_amd64.s:1696 +0x1 fp=0xc82004ffa8 sp=0xc82004ffa0
goroutine 17 [syscall, locked to thread]:
runtime.goexit()
/usr/lib/golang/src/runtime/asm_amd64.s:1696 +0x1
Reproducing: I cannot reliably reproduce the problem. Jobber is installed on 10 servers, and it crashes on them at random.
Expected behavior: jobberd keeps running without crashing.
Extra info: I have a single jobber task running every 20 minutes (0 */20 * * * *) which runs many commands. The output of those commands should not eat up any memory, maybe 2 MB at most. The commands are mostly network related, followed by 2 PHP scripts.
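To make the schedule concrete, here is a small sketch of how the first three fields expand, assuming Jobber's six-field time format (sec min hour mday month wday). The expand helper is hypothetical and only handles "*", "*/N", and plain integers; it is not Jobber's own parser:

```python
def expand(field, lo, hi):
    """Expand '*', '*/N', or a plain integer into the list of matching values."""
    if field == "*":
        return list(range(lo, hi + 1))
    if field.startswith("*/"):
        return list(range(lo, hi + 1, int(field[2:])))
    return [int(field)]


# Take the sec, min, and hour fields of the schedule from this report.
sec, minute, hour = "0 */20 * * * *".split()[:3]

seconds = expand(sec, 0, 59)
minutes = expand(minute, 0, 59)
hours = expand(hour, 0, 23)

# The remaining fields are all '*', so every day matches.
runs_per_day = len(seconds) * len(minutes) * len(hours)

print(minutes)       # [0, 20, 40]
print(runs_per_day)  # 72
```

Under that assumption the task starts 72 times a day, so anything it leaves behind on each run (child processes, threads) would accumulate fairly quickly.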