Comments (13)
We have created an issue in Pivotal Tracker to manage this:
https://www.pivotaltracker.com/story/show/156001041
The labels on this github issue will be updated when the story is started.
from garden-runc-release.
Hi @JY-Lee, it looks like we weren't able to create the xfs filesystem we use to enforce container quotas. Could you check which stemcell version (and especially what kernel is in the stemcell) you are using? Thanks!
Hi @julz ,
This was with stemcell bosh-vsphere-esxi-ubuntu-trusty-go_agent | ubuntu-trusty | 3468.21. I am not sure about the kernel, as I have already deleted Diego. Stemcell v3445.2 gave the same result.
The thing is, I am working with BOSH version 261 and have successfully deployed on OpenStack using the same manifest.
Thank you!!!
Hi @JY-Lee - it looks like the latest vsphere stemcell is 3541.9, could you please try upgrading the stemcell to the latest version and let me know if the problem still occurs? If it does, could you also please `bosh ssh` in to a VM and run `uname -r` to check the kernel version. Thanks!
Hi @JY-Lee, I recently came across the same issue. I was using kernel version "3.13.0-142-generic", but when I upgraded to 4.4.0-105-generic it worked for me.
Thanks
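A minimal sketch of the kernel check discussed above, meant to be run on the VM itself. The 4.4 threshold is only what this thread reports (3.13.0-142 failed, 4.4.0-105 worked) and is not a documented requirement; `kernel_at_least` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: the 4.4 threshold comes from the kernel versions reported in
# this thread, not from any documented garden-runc-release requirement.

# kernel_at_least RELEASE MAJOR MINOR
# Succeeds if RELEASE (e.g. "4.4.0-105-generic") is at least MAJOR.MINOR.
kernel_at_least() {
  kal_major="${1%%.*}"          # text before the first dot
  kal_rest="${1#*.}"            # text after the first dot
  kal_minor="${kal_rest%%.*}"   # text up to the next dot
  if [ "$kal_major" -gt "$2" ]; then return 0; fi
  if [ "$kal_major" -eq "$2" ] && [ "$kal_minor" -ge "$3" ]; then return 0; fi
  return 1
}

release="$(uname -r)"
if kernel_at_least "$release" 4 4; then
  echo "kernel $release: 4.4 or newer"
else
  echo "kernel $release: older than 4.4"
fi

# /proc/filesystems lists the filesystems the running kernel currently supports.
if grep -qw xfs /proc/filesystems 2>/dev/null; then
  echo "xfs: available in the running kernel"
else
  echo "xfs: not listed (the module may simply not be loaded yet)"
fi
```

A kernel that passes this check can still hit the mount failure, so treat the result as one data point rather than a diagnosis.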
Thank you, @julz and @AmitRoushan.
For stemcell v3468.21, the Diego cell kernel version was "4.4.0-111-generic",
and for stemcell v3541.9, the Diego cell kernel version was "4.4.0-116-generic".
The deployment did succeed, but only when I deploy twice (it succeeds only on the second run). Otherwise, it takes some time after the error occurs for the cell state shown by $ bosh vms to change to "running".
In both cases, the first deployment produces the following:
####################################################
/cb4a956c-ce5d-41cd-ad96-51e90adcab6f:/var/vcap/sys/log/monit$ dmesg | tail
dmesg: klogctl failed: Operation not permitted
/cb4a956c-ce5d-41cd-ad96-51e90adcab6f:/var/vcap/sys/log/monit$ sudo dmesg | tail
[sudo] password for vcap:
[ 839.720424] bridge: automatic filtering via arp/ip/ip6tables has been deprecated. Update your scripts to load br_netfilter if you need this.
[ 839.746738] device w7224skiqi4t-0 entered promiscuous mode
[ 839.746935] wbrdg-0afe0000: port 1(w7224skiqi4t-0) entered forwarding state
[ 839.746946] wbrdg-0afe0000: port 1(w7224skiqi4t-0) entered forwarding state
[ 839.768064] wbrdg-0afe0000: port 1(w7224skiqi4t-0) entered disabled state
[ 839.869393] wbrdg-0afe0000: port 1(w7224skiqi4t-0) entered forwarding state
[ 839.869404] wbrdg-0afe0000: port 1(w7224skiqi4t-0) entered forwarding state
[ 840.057631] wbrdg-0afe0000: port 1(w7224skiqi4t-0) entered disabled state
[ 840.061700] device w7224skiqi4t-0 left promiscuous mode
[ 840.061719] wbrdg-0afe0000: port 1(w7224skiqi4t-0) entered disabled state
/cb4a956c-ce5d-41cd-ad96-51e90adcab6f:/var/vcap/sys/log/monit$
/cb4a956c-ce5d-41cd-ad96-51e90adcab6f:/var/vcap/sys/log/monit$ uname -r
4.4.0-116-generic
####################################################
## I would like to know if there is any way to make it succeed on the first deployment (that is, deploying only once)?
Thank you very much!!
Also, ## does the vSphere Standard environment support grootfs? I am curious, as I am working on a vSphere Standard environment for this. Thanks.
Hey @JY-Lee, we are still looking into your initial mounting failure, and will let you know if we make any progress.
As for garden not succeeding on a first deploy but then succeeding on a second: we have seen recently that some slow-to-deploy environments hit a harsh default timeout. Garden does not report as running within that time, which makes the deployment fail with no visible error. We have bumped this timeout in a newer release. Have you seen any other errors recently?
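A rough sketch of how one might watch for garden to come up on a slow cell after a first deploy has failed. It does not change monit's own timeout; it only polls. `wait_for` and `garden_running` are hypothetical helpers, and both the monit path and the format of its summary line are assumptions about the VM, not something stated in this thread.

```shell
#!/bin/sh
# Sketch: poll a condition until it holds or a deadline passes.
# wait_for and garden_running are hypothetical helpers, not part of BOSH
# or garden-runc-release.

# wait_for TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# The timeout counts only the time spent sleeping, so it is approximate.
wait_for() {
  wf_timeout="$1"; wf_interval="$2"; shift 2
  wf_waited=0
  while ! "$@"; do
    if [ "$wf_waited" -ge "$wf_timeout" ]; then return 1; fi
    sleep "$wf_interval"
    wf_waited=$((wf_waited + wf_interval))
  done
  return 0
}

# Assumption: monit lives at /var/vcap/bosh/bin/monit on the VM and its
# summary prints a line such as: Process 'garden'   running
garden_running() {
  /var/vcap/bosh/bin/monit summary 2>/dev/null |
    grep -q "'garden'[[:space:]]*running"
}

# Poll only when asked to, so that loading the functions has no side effects.
if [ "${1:-}" = "--wait" ]; then
  if wait_for 300 10 garden_running; then
    echo "garden is running"
  else
    echo "garden did not report running within 300 seconds"
  fi
fi
```

Saved under a name of your choosing and run as root with `--wait`, it checks every 10 seconds for up to roughly 5 minutes; both numbers are arbitrary.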
Hi @Callisto13, thank you for your time.
As you suggested, I have updated to 'garden-runc v1.12.1' and tested on a few different environments.
However, the deployment failed with the error shown below.
#################################################################
Are you sure you want to deploy? (type 'yes' to continue): yes
Director task 104
Deprecation: Ignoring cloud config. Manifest contains 'networks' section.
Started preparing deployment > Preparing deployment. Done (00:00:02)
Started preparing package compilation > Finding packages to compile. Done (00:00:00)
Started creating missing vms
Started creating missing vms > database_z1/ea20cde2-2698-4a3d-81b2-04616c9b3742 (0)
Started creating missing vms > cc_bridge_z1/0b15bc81-b942-49a3-8620-e0a791a1711d (0)
Started creating missing vms > route_emitter_z1/b0c4fadf-39e9-4f31-ad91-01819e7f18da (0)
Started creating missing vms > cell_z1/f81dfd88-fe7b-4c9f-9249-0d3ec7bd0ca8 (0)
Started creating missing vms > brain_z1/5e6bd0ca-d531-4bf8-95c3-ca5a8f097076 (0)
Started creating missing vms > access_z1/f5ca3015-c0b2-4173-9850-33d968c3e870 (0)
Done creating missing vms > database_z1/ea20cde2-2698-4a3d-81b2-04616c9b3742 (0) (00:05:16)
Done creating missing vms > access_z1/f5ca3015-c0b2-4173-9850-33d968c3e870 (0) (00:05:16)
Done creating missing vms > brain_z1/5e6bd0ca-d531-4bf8-95c3-ca5a8f097076 (0) (00:05:54)
Done creating missing vms > cc_bridge_z1/0b15bc81-b942-49a3-8620-e0a791a1711d (0) (00:05:56)
Done creating missing vms > route_emitter_z1/b0c4fadf-39e9-4f31-ad91-01819e7f18da (0) (00:05:57)
Done creating missing vms > cell_z1/f81dfd88-fe7b-4c9f-9249-0d3ec7bd0ca8 (0) (00:06:07)
Done creating missing vms (00:06:07)
Started updating instance database_z1 > database_z1/ea20cde2-2698-4a3d-81b2-04616c9b3742 (0) (canary). Done (00:01:24)
Started updating instance brain_z1 > brain_z1/5e6bd0ca-d531-4bf8-95c3-ca5a8f097076 (0) (canary). Done (00:00:44)
Started updating instance cc_bridge_z1 > cc_bridge_z1/0b15bc81-b942-49a3-8620-e0a791a1711d (0) (canary)
Started updating instance route_emitter_z1 > route_emitter_z1/b0c4fadf-39e9-4f31-ad91-01819e7f18da (0) (canary)
Started updating instance access_z1 > access_z1/f5ca3015-c0b2-4173-9850-33d968c3e870 (0) (canary)
Started updating instance cell_z1 > cell_z1/f81dfd88-fe7b-4c9f-9249-0d3ec7bd0ca8 (0) (canary)
Done updating instance route_emitter_z1 > route_emitter_z1/b0c4fadf-39e9-4f31-ad91-01819e7f18da (0) (canary) (00:01:09)
Done updating instance cc_bridge_z1 > cc_bridge_z1/0b15bc81-b942-49a3-8620-e0a791a1711d (0) (canary) (00:01:30)
Done updating instance access_z1 > access_z1/f5ca3015-c0b2-4173-9850-33d968c3e870 (0) (canary) (00:01:40)
Failed updating instance cell_z1 > cell_z1/f81dfd88-fe7b-4c9f-9249-0d3ec7bd0ca8 (0) (canary): 'cell_z1/0 (f81dfd88-fe7b-4c9f-9249-0d3ec7bd0ca8)' is not running after update. Review logs for failed jobs: consul_agent, rep, garden, metron_agent (00:05:26)
Error 400007: 'cell_z1/0 (f81dfd88-fe7b-4c9f-9249-0d3ec7bd0ca8)' is not running after update. Review logs for failed jobs: consul_agent, rep, garden, metron_agent
#################################################################
Unlike before, the list of failed jobs has changed,
but the /var/vcap/data/sys/log/monit/garden.err.log file still shows the error below.
#################################################################
{"timestamp":"1521677187.187122345","source":"grootfs","message":"grootfs.init-store.store-manager-init-store.overlayxfs-init-filesystem.mounting-filesystem-failed","log_level":2,"data":{"error":"exit status 32: mount: wrong fs type, bad option, bad superblock on /dev/loop0,\n missing codepage or helper program, or other error\n In some cases useful info is found in syslog - try\n dmesg | tail or so\n\n","filesystemPath":"/var/vcap/data/grootfs/store/unprivileged.backing-store","session":"1.1.2","spec":{"UIDMappings":[{"HostID":4294967294,"NamespaceID":0,"Size":1},{"HostID":1,"NamespaceID":1,"Size":4294967293}],"GIDMappings":[{"HostID":4294967294,"NamespaceID":0,"Size":1},{"HostID":1,"NamespaceID":1,"Size":4294967293}],"StoreSizeBytes":17118453760},"storePath":"/var/vcap/data/grootfs/store/unprivileged"}}
{"timestamp":"1521677187.187420130","source":"grootfs","message":"grootfs.init-store.store-manager-init-store.initializing-filesystem-failed","log_level":2,"data":{"backingstoreFile":"/var/vcap/data/grootfs/store/unprivileged.backing-store","error":"Mounting filesystem: exit status 32: mount: wrong fs type, bad option, bad superblock on /dev/loop0,\n missing codepage or helper program, or other error\n In some cases useful info is found in syslog - try\n dmesg | tail or so\n\n","session":"1.1","spec":{"UIDMappings":[{"HostID":4294967294,"NamespaceID":0,"Size":1},{"HostID":1,"NamespaceID":1,"Size":4294967293}],"GIDMappings":[{"HostID":4294967294,"NamespaceID":0,"Size":1},{"HostID":1,"NamespaceID":1,"Size":4294967293}],"StoreSizeBytes":17118453760},"storePath":"/var/vcap/data/grootfs/store/unprivileged"}}
{"timestamp":"1521677187.187511206","source":"grootfs","message":"grootfs.init-store.cleaning-up-store-failed","log_level":2,"data":{"error":"initializing filesyztem: Mounting filesystem: exit status 32: mount: wrong fs type, bad option, bad superblock on /dev/loop0,\n missing codepage or helper program, or other error\n In some cases useful info is found in syslog - try\n dmesg | tail or so\n\n","session":"1"}}
#################################################################
Before, this kind of error was thrown at least twice; now it is thrown only once.
The listing below shows how the size of garden.err.log changed between the two versions.
#################################################################
garden-runc v 1.11.1 == -rw-r--r-- 1 root root 35612 Mar 22 00:06 garden.err.log
garden-runc v 1.12.1 == -rw-r--r-- 1 root root 17468 Mar 22 00:06 garden.err.log
#################################################################
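Comparing file sizes only hints at how often the error occurred. As a sketch, the mount failures can be counted directly instead. The message string and the default log path are copied from the log lines and path quoted in this thread; the `GARDEN_ERR_LOG` override and the function name are hypothetical.

```shell
#!/bin/sh
# Sketch: count grootfs mount failures in a garden.err.log-style file.
# The message string is copied from the log lines quoted in this thread.
MOUNT_FAILED='overlayxfs-init-filesystem.mounting-filesystem-failed'

# count_mount_failures FILE: prints the number of lines containing the message.
count_mount_failures() {
  # -F matches the string literally; -c prints a count (0 when nothing matches).
  # grep exits non-zero on zero matches, so "|| true" keeps the status clean.
  grep -cF "$MOUNT_FAILED" "$1" || true
}

# GARDEN_ERR_LOG is a hypothetical override for trying this on a copied log.
LOG="${GARDEN_ERR_LOG:-/var/vcap/data/sys/log/monit/garden.err.log}"
if [ -r "$LOG" ]; then
  echo "mount failures in $LOG: $(count_mount_failures "$LOG")"
else
  echo "cannot read $LOG (run on the cell VM, possibly with sudo)"
fi
```

Running it against the logs from both release versions would give a like-for-like count of how many times the mount was attempted and failed.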
I suppose that changing the timeout from 30 seconds to 2 minutes in 'garden-runc-release v1.12.1' has affected the number of errors.
If so, ## is there any way (or function) to adjust the 'timeout' in the Diego installation manifest file?
FYI, I have also tested adjusting 'garden_healthcheck.timeout', and it threw the same error.
Thank you very much!!
Hey @JY-Lee, I suggested the monit timeout as a potential cause for the second issue you mentioned:
"I did success deployment, but it only happens when i deploy it twice. ( succeeded only in second execution )"
I am afraid there is no way to configure the monit timeout from the manifest. The increased monit timeout has had the unintended side effect of starting the garden ctl script more than once, which is why you are seeing the bad superblock error more frequently now.
The `garden_healthcheck.timeout` is a different setting and would not have led to either of your problems.
We are still trying to reproduce the issue. Is this a production environment?
I have also changed the title of this issue, since the original was very generic and not search-friendly for others who may have come across the same thing.
Hi @Callisto13, this is a testing environment.
Thank you very much for your time and effort.
@JY-Lee we are still unable to reproduce, so could you try to deploy again? Then, right after it fails (assuming it fails for the same reason), ssh in and get the following debug information:
- the contents of `/var/log/messages` and all of `dmesg` (not with `| tail`). These may be quite long, so please attach them as files.
- the outputs from:
  - `uname -a`
  - `blkid`
  - `modprobe xfs` and `lsmod | grep xfs`
- If `modprobe` exited 0 and `lsmod` returned at least one line, please also get the outputs from:
  - `file -s /var/vcap/data/grootfs/store/unprivileged.backing-store`
  - `xfs_check /var/vcap/data/grootfs/store/unprivileged.backing-store`
  - `cat /proc/self/mountinfo`
Thanks!
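The requested steps could be gathered in one pass with something like the sketch below, run as root on the cell right after the failure. `OUT_DIR`, the output file names, and the `collect` helper are illustrative choices rather than anything provided by garden-runc-release. If a tool such as `xfs_check` is not installed on the VM, the failure is recorded in the output file instead of stopping the script.

```shell
#!/bin/sh
# Sketch: gather the debug information requested above into one directory.
# OUT_DIR and the file names are illustrative, not part of garden-runc-release.
OUT_DIR="${OUT_DIR:-/tmp/grootfs-debug}"
STORE="/var/vcap/data/grootfs/store/unprivileged.backing-store"
mkdir -p "$OUT_DIR"

# collect NAME COMMAND [ARGS...]
# Saves the command line, its stdout and stderr, and its exit status to
# $OUT_DIR/NAME.txt. Failures are recorded rather than aborting the script.
collect() {
  name="$1"; shift
  {
    printf '%s\n' "\$ $*"
    status=0
    "$@" 2>&1 || status=$?
    echo "exit status: $status"
  } > "$OUT_DIR/$name.txt"
}

# Full logs (not just the tail), since they may be long.
if [ -r /var/log/messages ]; then
  cp /var/log/messages "$OUT_DIR/messages" || true
fi
collect dmesg dmesg

collect uname uname -a
collect blkid blkid
collect modprobe_xfs modprobe xfs
collect lsmod_xfs sh -c 'lsmod | grep xfs'

# Only inspect the backing store if the xfs module loads and is listed.
if modprobe xfs >/dev/null 2>&1 && lsmod 2>/dev/null | grep -q xfs; then
  collect file_backing_store file -s "$STORE"
  collect xfs_check xfs_check "$STORE"
  collect mountinfo cat /proc/self/mountinfo
fi

echo "debug output written to $OUT_DIR"
```

The files under `$OUT_DIR` can then be attached to the issue as they are.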