Comments (13)

cf-gitbot commented on July 21, 2024

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/156001041

The labels on this github issue will be updated when the story is started.

from garden-runc-release.

julz commented on July 21, 2024

Hi @JY-Lee, it looks like we weren't able to create the xfs filesystem we use to enforce container quotas. Could you check which stemcell version (and especially what kernel is in the stemcell) you are using? Thanks!


JY-Lee commented on July 21, 2024

Hi @julz,
This was with stemcell bosh-vsphere-esxi-ubuntu-trusty-go_agent | ubuntu-trusty | 3468.21. I am not sure about the kernel, as I have already deleted Diego. The v3445.2 stemcell gave the same result.
The thing is, I am working with BOSH version 261 and have deployed successfully on OpenStack using the same manifest.

Thank you!!!


julz commented on July 21, 2024

Hi @JY-Lee - it looks like the latest vsphere stemcell is 3541.9, could you please try upgrading the stemcell to the latest version and let me know if the problem still occurs? If it does, could you also please bosh ssh in to a VM and run uname -r to check the kernel version. Thanks!


AmitRoushan commented on July 21, 2024

Hi @JY-Lee, I recently came across the same issue. I was using kernel version "3.13.0-142-generic", but when I upgraded to 4.4.0-105-generic, it worked for me.

Thanks


JY-Lee commented on July 21, 2024

Thank you, @julz and @AmitRoushan.
For the v3468.21 stemcell, the Diego cell kernel version was "4.4.0-111-generic",
and for the v3541.9 stemcell, the Diego cell kernel version was "4.4.0-116-generic".

The deployment did succeed, but only when I deployed twice (it succeeded only on the second run). Otherwise, it takes some time after the error occurs for the cell state shown by $ bosh vms to change to "running".

In both cases, the first deployment produces the following output:
####################################################
/cb4a956c-ce5d-41cd-ad96-51e90adcab6f:/var/vcap/sys/log/monit$ dmesg | tail
dmesg: klogctl failed: Operation not permitted
/cb4a956c-ce5d-41cd-ad96-51e90adcab6f:/var/vcap/sys/log/monit$ sudo dmesg | tail
[sudo] password for vcap:
[  839.720424] bridge: automatic filtering via arp/ip/ip6tables has been deprecated. Update your scripts to load br_netfilter if you need this.
[  839.746738] device w7224skiqi4t-0 entered promiscuous mode
[  839.746935] wbrdg-0afe0000: port 1(w7224skiqi4t-0) entered forwarding state
[  839.746946] wbrdg-0afe0000: port 1(w7224skiqi4t-0) entered forwarding state
[  839.768064] wbrdg-0afe0000: port 1(w7224skiqi4t-0) entered disabled state
[  839.869393] wbrdg-0afe0000: port 1(w7224skiqi4t-0) entered forwarding state
[  839.869404] wbrdg-0afe0000: port 1(w7224skiqi4t-0) entered forwarding state
[  840.057631] wbrdg-0afe0000: port 1(w7224skiqi4t-0) entered disabled state
[  840.061700] device w7224skiqi4t-0 left promiscuous mode
[  840.061719] wbrdg-0afe0000: port 1(w7224skiqi4t-0) entered disabled state
/cb4a956c-ce5d-41cd-ad96-51e90adcab6f:/var/vcap/sys/log/monit$
/cb4a956c-ce5d-41cd-ad96-51e90adcab6f:/var/vcap/sys/log/monit$ uname -r
4.4.0-116-generic
####################################################

## Is there any way to make the deployment succeed on the first attempt (deploying only once)?

Thank you very much!!


JY-Lee commented on July 21, 2024

Also, ## does the vSphere Standard environment support grootfs? I am curious because I am working on a vSphere Standard environment for this. Thanks.


Callisto13 commented on July 21, 2024

Hey @JY-Lee, we are still looking into your initial mounting failure and will let you know if we make any progress.
As for garden not succeeding on the first deploy but succeeding on the second: we have recently seen some slow-to-deploy environments hit a harsh default timeout. Garden does not report as running within that time, which makes the deployment fail with no visible error. We have bumped this timeout in a newer release. Have you seen any other errors recently?


JY-Lee commented on July 21, 2024

Hi @Callisto13 , thank you for your time.

As you suggested, I updated to 'garden-runc v1.12.1' and tested it in a few different environments.

However, it failed with the error below.
#################################################################
Are you sure you want to deploy? (type 'yes' to continue): yes

Director task 104
Deprecation: Ignoring cloud config. Manifest contains 'networks' section.

  Started preparing deployment > Preparing deployment. Done (00:00:02)

  Started preparing package compilation > Finding packages to compile. Done (00:00:00)

  Started creating missing vms
  Started creating missing vms > database_z1/ea20cde2-2698-4a3d-81b2-04616c9b3742 (0)
  Started creating missing vms > cc_bridge_z1/0b15bc81-b942-49a3-8620-e0a791a1711d (0)
  Started creating missing vms > route_emitter_z1/b0c4fadf-39e9-4f31-ad91-01819e7f18da (0)
  Started creating missing vms > cell_z1/f81dfd88-fe7b-4c9f-9249-0d3ec7bd0ca8 (0)
  Started creating missing vms > brain_z1/5e6bd0ca-d531-4bf8-95c3-ca5a8f097076 (0)
  Started creating missing vms > access_z1/f5ca3015-c0b2-4173-9850-33d968c3e870 (0)
     Done creating missing vms > database_z1/ea20cde2-2698-4a3d-81b2-04616c9b3742 (0) (00:05:16)
     Done creating missing vms > access_z1/f5ca3015-c0b2-4173-9850-33d968c3e870 (0) (00:05:16)
     Done creating missing vms > brain_z1/5e6bd0ca-d531-4bf8-95c3-ca5a8f097076 (0) (00:05:54)
     Done creating missing vms > cc_bridge_z1/0b15bc81-b942-49a3-8620-e0a791a1711d (0) (00:05:56)
     Done creating missing vms > route_emitter_z1/b0c4fadf-39e9-4f31-ad91-01819e7f18da (0) (00:05:57)
     Done creating missing vms > cell_z1/f81dfd88-fe7b-4c9f-9249-0d3ec7bd0ca8 (0) (00:06:07)
     Done creating missing vms (00:06:07)

  Started updating instance database_z1 > database_z1/ea20cde2-2698-4a3d-81b2-04616c9b3742 (0) (canary). Done (00:01:24)
  Started updating instance brain_z1 > brain_z1/5e6bd0ca-d531-4bf8-95c3-ca5a8f097076 (0) (canary). Done (00:00:44)
  Started updating instance cc_bridge_z1 > cc_bridge_z1/0b15bc81-b942-49a3-8620-e0a791a1711d (0) (canary)
  Started updating instance route_emitter_z1 > route_emitter_z1/b0c4fadf-39e9-4f31-ad91-01819e7f18da (0) (canary)
  Started updating instance access_z1 > access_z1/f5ca3015-c0b2-4173-9850-33d968c3e870 (0) (canary)
  Started updating instance cell_z1 > cell_z1/f81dfd88-fe7b-4c9f-9249-0d3ec7bd0ca8 (0) (canary)
     Done updating instance route_emitter_z1 > route_emitter_z1/b0c4fadf-39e9-4f31-ad91-01819e7f18da (0) (canary) (00:01:09)
     Done updating instance cc_bridge_z1 > cc_bridge_z1/0b15bc81-b942-49a3-8620-e0a791a1711d (0) (canary) (00:01:30)
     Done updating instance access_z1 > access_z1/f5ca3015-c0b2-4173-9850-33d968c3e870 (0) (canary) (00:01:40)
   Failed updating instance cell_z1 > cell_z1/f81dfd88-fe7b-4c9f-9249-0d3ec7bd0ca8 (0) (canary): 'cell_z1/0 (f81dfd88-fe7b-4c9f-9249-0d3ec7bd0ca8)' is not running after update. Review logs for failed jobs: consul_agent, rep, garden, metron_agent (00:05:26)

Error 400007: 'cell_z1/0 (f81dfd88-fe7b-4c9f-9249-0d3ec7bd0ca8)' is not running after update. Review logs for failed jobs: consul_agent, rep, garden, metron_agent
#################################################################
Unlike before, the list of failed jobs has changed,

but the /var/vcap/data/sys/log/monit/garden.err.log file still shows the error below.
#################################################################
{"timestamp":"1521677187.187122345","source":"grootfs","message":"grootfs.init-store.store-manager-init-store.overlayxfs-init-filesystem.mounting-filesystem-failed","log_level":2,"data":{"error":"exit status 32: mount: wrong fs type, bad option, bad superblock on /dev/loop0,\n       missing codepage or helper program, or other error\n       In some cases useful info is found in syslog - try\n       dmesg | tail  or so\n\n","filesystemPath":"/var/vcap/data/grootfs/store/unprivileged.backing-store","session":"1.1.2","spec":{"UIDMappings":[{"HostID":4294967294,"NamespaceID":0,"Size":1},{"HostID":1,"NamespaceID":1,"Size":4294967293}],"GIDMappings":[{"HostID":4294967294,"NamespaceID":0,"Size":1},{"HostID":1,"NamespaceID":1,"Size":4294967293}],"StoreSizeBytes":17118453760},"storePath":"/var/vcap/data/grootfs/store/unprivileged"}}
{"timestamp":"1521677187.187420130","source":"grootfs","message":"grootfs.init-store.store-manager-init-store.initializing-filesystem-failed","log_level":2,"data":{"backingstoreFile":"/var/vcap/data/grootfs/store/unprivileged.backing-store","error":"Mounting filesystem: exit status 32: mount: wrong fs type, bad option, bad superblock on /dev/loop0,\n       missing codepage or helper program, or other error\n       In some cases useful info is found in syslog - try\n       dmesg | tail  or so\n\n","session":"1.1","spec":{"UIDMappings":[{"HostID":4294967294,"NamespaceID":0,"Size":1},{"HostID":1,"NamespaceID":1,"Size":4294967293}],"GIDMappings":[{"HostID":4294967294,"NamespaceID":0,"Size":1},{"HostID":1,"NamespaceID":1,"Size":4294967293}],"StoreSizeBytes":17118453760},"storePath":"/var/vcap/data/grootfs/store/unprivileged"}}
{"timestamp":"1521677187.187511206","source":"grootfs","message":"grootfs.init-store.cleaning-up-store-failed","log_level":2,"data":{"error":"initializing filesyztem: Mounting filesystem: exit status 32: mount: wrong fs type, bad option, bad superblock on /dev/loop0,\n       missing codepage or helper program, or other error\n       In some cases useful info is found in syslog - try\n       dmesg | tail  or so\n\n","session":"1"}}
#################################################################
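
For anyone triaging a similar garden.err.log: each line above is a JSON object, so a few lines of Python can pull out the time, the message and the first line of the error. This is only a rough sketch. The function name summarize_log_line and the shortened SAMPLE line are illustrative assumptions, not part of garden or grootfs.

```python
import json
from datetime import datetime, timezone

# Shortened copy of the third log line above (error text truncated for brevity).
SAMPLE = r'{"timestamp":"1521677187.187511206","source":"grootfs","message":"grootfs.init-store.cleaning-up-store-failed","log_level":2,"data":{"error":"initializing filesyztem: Mounting filesystem: exit status 32: mount: wrong fs type, bad option, bad superblock on /dev/loop0,\n       missing codepage or helper program, or other error","session":"1"}}'


def summarize_log_line(line):
    """Return (utc_time, message, first_error_line), or None if the line is not a JSON log entry."""
    try:
        entry = json.loads(line)
        # The timestamp field is a string holding epoch seconds with a fractional part.
        when = datetime.fromtimestamp(float(entry["timestamp"]), tz=timezone.utc)
    except (ValueError, KeyError, TypeError):
        return None
    error = entry.get("data", {}).get("error", "")
    # The mount error spans several lines; keep only the first one.
    first_line = error.strip().splitlines()[0] if error.strip() else ""
    return when.strftime("%Y-%m-%d %H:%M:%S"), entry.get("message", ""), first_line


print(summarize_log_line(SAMPLE))
```

For the sample line the timestamp decodes to 2018-03-22 00:06:27 UTC. To scan a whole file, call the function on each line and skip the None results.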

Previously this error was logged at least twice; now it is logged only once.

The listing below shows the change in log file size.
#################################################################
garden-runc v1.11.1 == -rw-r--r-- 1 root root 35612 Mar 22 00:06 garden.err.log

garden-runc v1.12.1 == -rw-r--r-- 1 root root 17468 Mar 22 00:06 garden.err.log
#################################################################

I suppose that changing the timeout from 30 seconds to 2 minutes in 'garden-runc-release v1.12.1' has affected the number of errors that occur.
If so, ## is there any way (or function) to adjust the 'timeout' in the Diego installation manifest file?

FYI, I have tested with 'garden_healthcheck.timeout' by adjusting the time, and it threw the same error.

Thank you very much!!


Callisto13 commented on July 21, 2024

Hey @JY-Lee, I suggested the monit timeout as a potential cause for the second issue you mentioned:

I did success deployment, but it only happens when i deploy it twice. ( succeeded only in second execution )

There is no way to configure the monit timeout from the manifest, I am afraid. The increased monit timeout has produced the unintended side-effect of starting the garden ctl script more than once, which is why you are seeing the bad superblock error more frequently now.
The garden_healthcheck.timeout is a different setting and would not have led to either of your problems.

We are still trying to reproduce the issue. Is this a production environment?


Callisto13 commented on July 21, 2024

I have also changed the title of this issue, since the original was very generic and not search-friendly for others who may have come across the same thing.


JY-Lee commented on July 21, 2024

Hi @Callisto13, this is a testing environment.
Thank you very much for your time and effort.


Callisto13 commented on July 21, 2024

@JY-Lee we are still unable to reproduce this, so could you try to deploy again? Then, right after it fails (assuming it fails for the same reason), SSH in and collect the following debug information:

  1. the contents of /var/log/messages and all of dmesg (not with | tail). These may be quite long, so please attach them as files.
  2. the outputs from:
    • uname -a
    • blkid
    • modprobe xfs and lsmod | grep xfs
  3. If modprobe exited 0 and lsmod returned at least one line, please also get the outputs from:
    • file -s /var/vcap/data/grootfs/store/unprivileged.backing-store
    • xfs_check /var/vcap/data/grootfs/store/unprivileged.backing-store
    • cat /proc/self/mountinfo

Thanks!
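
If it helps, steps 2 and 3 above can be scripted. The following is a minimal sketch, not an official tool: the names run_shell and collect are made up for illustration, the commands are the ones listed above, and it assumes Python 3.7+ is available and that it is run as root on the cell. Step 1 (/var/log/messages and the full dmesg) still needs to be attached separately.

```python
import subprocess

# Backing store path as it appears in the garden.err.log output.
STORE = "/var/vcap/data/grootfs/store/unprivileged.backing-store"

# Step 2: always collected.
BASIC = ["uname -a", "blkid", "modprobe xfs", "lsmod | grep xfs"]
# Step 3: only collected when the xfs module loads and is listed.
XFS_ONLY = [f"file -s {STORE}", f"xfs_check {STORE}", "cat /proc/self/mountinfo"]


def run_shell(cmd):
    """Run one command through the shell; return (exit_code, combined stdout and stderr)."""
    proc = subprocess.run(cmd, shell=True, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, text=True)
    return proc.returncode, proc.stdout


def collect(run=run_shell):
    """Run the step 2 commands, then the step 3 commands if xfs is available."""
    results = {cmd: run(cmd) for cmd in BASIC}
    modprobe_ok = results["modprobe xfs"][0] == 0
    xfs_listed = bool(results["lsmod | grep xfs"][1].strip())
    if modprobe_ok and xfs_listed:
        for cmd in XFS_ONLY:
            results[cmd] = run(cmd)
    return results
```

Calling collect() returns a dict that maps each command to its exit code and output, which can then be printed or written to a file and attached here. The run parameter exists only so the logic can be tried without touching a real cell.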
