Comments (6)
Hi there!
We use Pivotal Tracker to provide visibility into what our team is working on. A story for this issue has been automatically created.
The current status is as follows:
- #129667821 Investigate: runc "no space left on device" cgroup errors on concourse worker
This comment, as well as the labels on the issue, will be automatically updated as the status in Tracker changes.
from garden-runc-release.
Thanks for the clear description and info on how to access the env, @JesseTAlford!
I'm having difficulty SSHing to the worker. Is the SSH port for the BOSH director locked down to a specific set of IP addresses? I'm getting an Errno::ETIMEDOUT from here.
Yeah, it looks like there's a security group rule that restricts access; you hit the nail on the head, it only allows traffic from SF.
I've added a rule that allows traffic from 80.169.160.158/30, which IOPS tells me is the Pivotal London office. If that doesn't work, you might try VPNing into the SF office, or letting me know what range you need a rule for.
Please let me know if that gets you in or not!
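For anyone double-checking which source addresses a rule like the one above admits, here is a minimal sketch using Python's standard `ipaddress` module. The helper name `covered_range` is hypothetical, and the sketch assumes the security group normalises the CIDR to its enclosing network, which is not confirmed anywhere in this thread.

```python
import ipaddress


def covered_range(cidr: str):
    """Return (network, first_address, last_address) for a CIDR string.

    strict=False lets us pass a CIDR whose host bits are set (as in the
    rule quoted above); the module then reports the enclosing network.
    """
    network = ipaddress.ip_network(cidr, strict=False)
    return network, network[0], network[-1]


# CIDR copied from the comment above; everything else is illustrative.
RULE = "80.169.160.158/30"

network, first, last = covered_range(RULE)
print(f"{RULE} normalises to {network} ({network.num_addresses} addresses)")
print(f"range: {first} to {last}")
```

If the address you are connecting from falls outside the printed range, the rule would not match it and a connection timeout is the likely symptom.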
Hi @JesseTAlford,
Thanks for sorting that out, we've now been able to gain access to the worker VM to take a look around.
We have some good news and some not so good news.
The good news is that we've found the cause of the "no space left on device" error you're seeing.
The reason for the error is that the maximum number of memory cgroups has been reached on the VM:
$ cat /proc/cgroups
#subsys_name hierarchy num_cgroups enabled
cpuset 1 1524 1
cpu 2 168 1
cpuacct 3 168 1
memory 4 **65535** 1
Unfortunately we're not sure if this is something that can be fixed immediately...
Long story short, we believe this feature needs to be implemented in runc to prevent this from happening again.
It looks like memory cgroups are not getting cleaned up as efficiently as they could be, so in environments that create lots and lots of containers, the aforementioned limit can be reached.
An interim solution is to upgrade the stemcell for the deployment, which, at the very least, will reset the count back down to 0.
It's also possible that the kernel in the newer stemcell will help with the memory cgroup cleanup as well (but we're not 100% sure on that).
Thanks,
Ed & Petar
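The check described in the comment above can be scripted so the count is watched before it hits the ceiling. What follows is a rough sketch, not something taken from garden or runc: the function names, the 90% threshold, and the treatment of 65535 as the limit (it is simply the value observed in the output above) are all assumptions.

```python
from typing import Dict

# Value observed in the /proc/cgroups output above; assumed to be the ceiling.
MEMORY_CGROUP_LIMIT = 65535

# Sample in the same four-column layout as the output quoted in the thread.
SAMPLE = """\
#subsys_name hierarchy num_cgroups enabled
cpuset 1 1524 1
cpu 2 168 1
cpuacct 3 168 1
memory 4 65535 1
"""


def parse_proc_cgroups(text: str) -> Dict[str, int]:
    """Map each subsystem name to its num_cgroups column."""
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#subsys_name ...' header.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        name, _hierarchy, num_cgroups, _enabled = fields[:4]
        counts[name] = int(num_cgroups)
    return counts


def memory_cgroups_near_limit(text: str, threshold: float = 0.9) -> bool:
    """True when the memory cgroup count is at or above threshold * limit."""
    count = parse_proc_cgroups(text).get("memory", 0)
    return count >= threshold * MEMORY_CGROUP_LIMIT


def read_proc_cgroups(path: str = "/proc/cgroups") -> str:
    """Read the live file on a Linux host (not called at import time)."""
    with open(path) as handle:
        return handle.read()


print(parse_proc_cgroups(SAMPLE)["memory"])
print(memory_cgroups_near_limit(SAMPLE))
```

On a worker VM, passing `read_proc_cgroups()` to `memory_cgroups_near_limit` instead of `SAMPLE` would give an early warning that a recreate or stemcell upgrade is due.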
Hello again!
All stories related to this issue have been accepted, so I'm going to automatically close this issue.
At the time of writing, the following stories have been accepted:
- #129667821 Investigate: runc "no space left on device" cgroup errors on concourse worker
If you feel there is still more to be done, or if you have any questions, leave a comment and we'll reopen if necessary!
Cool, we used bosh recreate worker 1 to resolve the problem. We'll just keep applying that balm until that runc-feature-based fix lands.