
bosh-linux-stemcell-builder's Introduction

BOSH Linux Stemcell Builder

This repo contains tools for creating BOSH stemcells. A stemcell is a bootable disk image that is used as a template by a BOSH Director to create VMs.

Quick Start: Building a Stemcell Locally

git clone git@github.com:cloudfoundry/bosh-linux-stemcell-builder.git
cd bosh-linux-stemcell-builder
git checkout ubuntu-jammy/master
mkdir -p tmp
docker run \
   --privileged \
   -v "$(pwd):/opt/bosh" \
   --workdir /opt/bosh \
   --user=1000:1000 \
   -it \
   bosh/os-image-stemcell-builder:jammy
# You're now in the Docker container
gem install bundler
bundle
# build the OS image
bundle exec rake stemcell:build_os_image[ubuntu,jammy,$PWD/tmp/ubuntu_base_image.tgz]
# build a vSphere stemcell
bundle exec rake stemcell:build_with_local_os_image[vsphere,esxi,ubuntu,jammy,$PWD/tmp/ubuntu_base_image.tgz]

When building a vSphere stemcell, you must download VMware-ovftool-*.bundle and place it in the ci/docker/os-image-stemcell-builder-jammy/ directory. See External Assets for download instructions.
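For example, a minimal sketch of staging the installer before building or running the Docker image (the ~/Downloads location is an assumption; the exact VMware-ovftool-*.bundle filename depends on the version you download):

# run on the host, from the repository root
cp ~/Downloads/VMware-ovftool-*.bundle ci/docker/os-image-stemcell-builder-jammy/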

OS image

An OS image is a tarball that contains a snapshot of an OS filesystem, including the libraries and system utilities needed by the BOSH agent. It does not contain the BOSH agent or the virtualization tools themselves: a subsequent Rake task adds the BOSH agent and a set of virtualization tools to the base OS image to produce a stemcell.
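Because the OS image is just a tarball, you can sanity-check it without booting anything; a minimal sketch, assuming it was built at tmp/ubuntu_base_image.tgz as in the Quick Start:

# list the first few entries of the OS filesystem snapshot
tar tzf tmp/ubuntu_base_image.tgz | head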

The OS Image should be rebuilt when you are making changes to the packages installed in the operating system or when making changes to the configuration of those packages.

bundle exec rake stemcell:build_os_image[ubuntu,jammy,$PWD/tmp/ubuntu_base_image.tgz]

The arguments to the stemcell:build_os_image rake task follow:

  1. operating_system_name (ubuntu): identifies which type of OS to fetch. Determines which package repository and packaging tool will be used to download and assemble the files. Currently, only ubuntu is recognized.
  2. operating_system_version (jammy): an identifier that the system may use to decide which release of the OS to download. Acceptable values depend on the operating system. For ubuntu, use jammy.
  3. os_image_path ($PWD/tmp/ubuntu_base_image.tgz): the path to write the finished OS image tarball to. If a file exists at this path already, it will be overwritten without warning.

Building a Stemcell

Rebuild the stemcell when you are making and testing BOSH-specific changes such as a new BOSH agent.

bundle exec rake stemcell:build_with_local_os_image[vsphere,esxi,ubuntu,jammy,$PWD/tmp/ubuntu_base_image.tgz,"0.0.8"]

The arguments to stemcell:build_with_local_os_image are:

  1. infrastructure_name: Which IaaS you are producing the stemcell for. Determines which virtualization tools to package on top of the stemcell.
  2. hypervisor_name: Which hypervisor to target, depending on what the IaaS supports: aws: xen-hvm; azure: hyperv; google: kvm; openstack: kvm; vsphere: esxi.
  3. operating_system_name (ubuntu): Type of OS. Same as for stemcell:build_os_image.
  4. operating_system_version (jammy): OS release. Same as for stemcell:build_os_image. Can optionally include a variant suffix (jammy-fips).
  5. os_image_path ($PWD/tmp/ubuntu_base_image.tgz): Path to the base OS image produced by stemcell:build_os_image.
  6. build_number (0.0.8): Stemcell version. Pro-tip: take the version number of the most recent release and add one, e.g. "0.0.7" → "0.0.8". If not specified, it defaults to "0000".
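For example, a hypothetical invocation that builds an AWS stemcell from the same locally built OS image (the IaaS/hypervisor pair is taken from the list above; the version number is illustrative):

bundle exec rake stemcell:build_with_local_os_image[aws,xen-hvm,ubuntu,jammy,$PWD/tmp/ubuntu_base_image.tgz,"0.0.8"]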

The Resulting Stemcell

You can find the resulting stemcell in the tmp/ directory of the host, or in the /opt/bosh/tmp directory in the Docker container. Using the above example, the stemcell would be at tmp/bosh-stemcell-0.0.8-vsphere-esxi-ubuntu-jammy-go_agent.tgz. You can upload the stemcell to a vSphere BOSH Director:

bosh upload-stemcell tmp/bosh-stemcell-0.0.8-vsphere-esxi-ubuntu-jammy-go_agent.tgz
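If you want to inspect the artifact before uploading it, the stemcell tarball contains a stemcell.MF manifest describing the name, version, and cloud properties; a minimal sketch for printing it (path taken from the example above):

tar xzOf tmp/bosh-stemcell-0.0.8-vsphere-esxi-ubuntu-jammy-go_agent.tgz stemcell.MF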

Testing

[Fixme: update Testing section to Jammy]

How to run tests for OS Images

The OS tests are meant to be run against the OS environment to which they belong. When you run the stemcell:build_os_image rake task, it creates a .raw OS image and runs the OS-specific tests against it. You need to run the rake task the first time you create your Docker container; after that, as long as you do not destroy the container, you can rerun the specific tests on their own.

To run the ubuntu_jammy_spec.rb tests (assuming you've already built the OS image at tmp/ubuntu_base_image.tgz and you're within the Docker container):

cd /opt/bosh/bosh-stemcell
OS_IMAGE=/opt/bosh/tmp/ubuntu_base_image.tgz bundle exec rspec -fd spec/os_image/ubuntu_jammy_spec.rb

How to Run Tests for Stemcell

When you run the stemcell:build_with_local_os_image or stemcell:build rake task, it creates a stemcell and runs the stemcell-specific tests against it. You need to run the rake task the first time you create your Docker container; after that, as long as you do not destroy the container, you can rerun the specific tests on their own:

cd /opt/bosh/bosh-stemcell; \
STEMCELL_IMAGE=/mnt/stemcells/vsphere/esxi/ubuntu/work/work/vsphere-esxi-ubuntu.raw \
STEMCELL_WORKDIR=/mnt/stemcells/vsphere/esxi/ubuntu/work/work/chroot \
OS_NAME=ubuntu \
bundle exec rspec -fd --tag ~exclude_on_vsphere \
spec/os_image/ubuntu_jammy_spec.rb \
spec/stemcells/ubuntu_jammy_spec.rb \
spec/stemcells/go_agent_spec.rb \
spec/stemcells/vsphere_spec.rb \
spec/stemcells/stig_spec.rb \
spec/stemcells/cis_spec.rb

How to run tests for ShelloutTypes

In pursuit of more robust testing, we wrote our own testing library for stemcell contents, called ShelloutTypes.

The ShelloutTypes code has its own unit tests, but they require root privileges and an Ubuntu chroot environment to run. For this reason, we use the bosh/main-ubuntu-chroot Docker image for unit tests. To run these unit tests locally, run:

bundle install --local
cd /opt/bosh/bosh-stemcell
OS_IMAGE=/opt/bosh/tmp/ubuntu_base_image.tgz bundle exec rspec spec/ --tag shellout_types

If on macOS, run:

OSX=true OS_IMAGE=/opt/bosh/tmp/ubuntu_base_image.tgz bundle exec rspec spec/ --tag shellout_types

How to run tests for BOSH Linux Stemcell Builder

The BOSH Linux Stemcell Builder code itself can be tested with the following commands:

bundle install --local
cd /opt/bosh/bosh-stemcell
bundle exec rspec spec/

Troubleshooting

If you find yourself debugging any of the above processes, here is what you need to know:

  1. Most of the action happens in Bash scripts, which are referred to as stages, and can be found in stemcell_builder/stages/<stage_name>/apply.sh.

  2. While debugging a particular stage that is failing, you can resume the process from that stage by adding resume_from=<stage_name> to the end of your bundle exec rake command. When a stage's apply.sh fails, you should see a message of the form Can't find stage '<stage>' to resume from. Aborting. so you know which stage failed and where you can resume from after fixing the problem. Please use caution as stages are not guaranteed to be idempotent.

    Example usage:

    bundle exec rake stemcell:build_os_image[ubuntu,jammy,$PWD/tmp/ubuntu_base_image.tgz] resume_from=rsyslog_config
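Since every stage lives at stemcell_builder/stages/<stage_name>/apply.sh, you can list the valid names to pass to resume_from straight from the repository; a minimal sketch:

# run from the repository root (/opt/bosh inside the container)
ls stemcell_builder/stages/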

Pro Tips

  • If the OS image has already been built and you are only modifying test cases, you can rerun the tests without rebuilding the OS image. Details are in the section How to run tests for OS Images.
  • If the stemcell has already been built and you are only updating tests, you do not need to rebuild the stemcell; simply rerun the tests without rebuilding it. Details are in the section How to Run Tests for Stemcell.
  • It's possible to verify OS/stemcell changes without deploying the stemcell. For a vSphere-specific Ubuntu stemcell, the filesystem is available at /mnt/stemcells/vsphere/esxi/ubuntu/work/work/chroot (see the sketch below).
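As a sketch of that kind of spot check (assuming the container from the Quick Start is still running and the vSphere Ubuntu build above has completed), you can poke around the chroot directly:

# inspect the stemcell filesystem without deploying it
sudo ls /mnt/stemcells/vsphere/esxi/ubuntu/work/work/chroot
sudo chroot /mnt/stemcells/vsphere/esxi/ubuntu/work/work/chroot cat /etc/os-release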

External Assets

The ovftool installer from VMware can be found at my.vmware.com.

The ovftool installer must be copied into the ci/docker/os-image-stemcell-builder-jammy/ directory next to the Dockerfile, or you will receive an error like:

Step 24/30 : ADD ${OVF_TOOL_INSTALLER} /tmp/ovftool_installer.bundle
ADD failed: stat /var/lib/docker/tmp/docker-builder389354746/VMware-ovftool-4.1.0-2459827-lin.x86_64.bundle: no such file or directory

Rebuilding the Docker Image

The Docker image is published to bosh/os-image-stemcell-builder. You will need the ovftool installer present on your filesystem.

Rebuild the container with the build script...

./build os-image-stemcell-builder

When ready, push to DockerHub and use the credentials from LastPass...

cd os-image-stemcell-builder
./push


bosh-linux-stemcell-builder's Issues

Base os image file name mismatches in ubuntu-xenial.meta4 and Rakefile

I'm building ubuntu xenial stemcell 456.40 for SoftLayer and found the following issue.

The file name of the base OS image specified in https://github.com/cloudfoundry/bosh-linux-stemcell-builder/blob/ubuntu-xenial/v456.40/bosh-stemcell/image-metalinks/ubuntu-xenial.meta4#L2 has been changed to ubuntu-xenial.tgz from the previous bosh-ubuntu-xenial-os-image.tgz. But in the Rakefile https://github.com/cloudfoundry/bosh-linux-stemcell-builder/blob/ubuntu-xenial/v456.40/Rakefile#L61, the resolved meta4 download filename specified after --file is still bosh-ubuntu-xenial-os-image.tgz, which makes the download fail with the error File does not exist.

Would you please fix the issue? Thanks.

Unable to pass the smoke test

Hi, there is an error when running smoke test:

Task 93 | 04:44:40 | Updating instance default: default/c43d44ed-e043-45df-b315-5e9debbead6f (0) (canary) (00:02:13)
[Cmd Runner] 2019/05/13 04:46:58 DEBUG - Stderr: 
[Cmd Runner] 2019/05/13 04:46:58 DEBUG - Successful: true (0)
•! Panic [155.717 seconds]
Stemcell #164749230, when using targeted blobstores when deploying with a invalid logs blobstore [It] should fail to get logs, but the deploy should succeed 
/tmp/build/82c27cda/bosh-linux-stemcell-builder/src/github.com/cloudfoundry/stemcell-acceptance-tests/ipv4director/smoke/smoke_test.go:309

  Test Panicked
  runtime error: index out of range
  /usr/local/go/src/runtime/panic.go:44

  Full Stack Trace
  	/usr/local/go/src/runtime/panic.go:522 +0x1b5
  github.com/cloudfoundry/stemcell-acceptance-tests/ipv4director/smoke_test.glob..func3.11.2.1()
  	/tmp/build/82c27cda/bosh-linux-stemcell-builder/src/github.com/cloudfoundry/stemcell-acceptance-tests/ipv4director/smoke/smoke_test.go:330 +0x185e
  github.com/cloudfoundry/stemcell-acceptance-tests/vendor/github.com/onsi/ginkgo/internal/leafnodes.(*runner).runSync(0xc0000734a0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
  	/tmp/build/82c27cda/bosh-linux-stemcell-builder/src/github.com/cloudfoundry/stemcell-acceptance-tests/ipv4director/smoke/smoke_suite_test.go:18 +0x64
  testing.tRunner(0xc000128d00, 0x899e08)
  	/usr/local/go/src/testing/testing.go:865 +0xc0
  created by testing.(*T).Run
  	/usr/local/go/src/testing/testing.go:916 +0x35a

I think this issue comes from invalid stdout parsing in smoke_test.go. In that line, config is an agentSetting, but the stdout contains more content than the agentSetting defines, like the following:

"{\"agent_id\":\"4095d873-fb34-48dd-9ad7-5131e14878c2\",\"blobstore\":{\"provider\":\"dav\",\"options\":{\"endpoint\":\"http://172.16.0.6:25250\",\"password\":\"41b4fl2yr*****\",\"user\":\"agent\"},\"name\":\"\"},\"disks\":{\"system\":\"/dev/vda\",\"ephemeral\":\"/dev/vdb\",\"persistent\":{},\"raw_ephemeral\":null},\"env\":{\"bosh\":{\"agent\":{\"settings\":{\"tmpfs\":false}},\"password\":\"$6$69bb48*******jx9rCLJmo7z10\",\"keep_root_password\":false,\"remove_dev_tools\":true,\"remove_static_libraries\":true,\"authorized_keys\":null,\"swap_size\":null,\"mbus\":{\"cert\":{\"ca\":\"-----BEGIN CERTIFICATE-----\\nMIIEijCCA*******me6MI0VoB6U23UJzKppcsWwABrfGKRyRkuM7mYe5KkxDP5PP0SEUtBlxGx+zco\\nTgb+LDlpnYHcexHgmzk=\\n-----END CERTIFICATE-----\\n\\n\",\"private_key\":\"-----BEGIN RSA PRIVATE KEY-----\\nMIIG5AIBAAKC********ZTWeSdA==\\n-----END RSA PRIVATE KEY-----\\n\",\"certificate\":\"-----BEGIN CERTIFICATE-----\\nMIIEeTCCAuGgAw*******lYcaMO6Sx+9nHR5D0gSl\\n-----END CERTIFICATE-----\\n\"},\"urls\":null},\"ipv6\":{\"enable\":false},\"job_dir\":{\"tmpfs\":false,\"tmpfs_size\":\"\"},\"blobstores\":null,\"ntp\":null,\"parallel\":null,\"targeted_blobstores\":{\"packages\":\"\",\"logs\":\"\"}},\"persistent_disk_fs\":\"\",\"persistent_disk_mount_options\":null,\"persistent_disk_partitioner\":\"\"},\"networks\":{\"default\":{\"type\":\"manual\",\"ip\":\"172.16.0.4\",\"netmask\":\"255.255.0.0\",\"gateway\":\"172.16.0.1\",\"resolved\":false,\"use_dhcp\":false,\"default\":[\"dns\",\"gateway\"],\"dns\":[\"8.8.8.8\",\"172.16.0.6\"],\"mac\":\"\",\"preconfigured\":false}},\"ntp\":[\"server 0.cn.pool.ntp.org\",\"server 1.cn.pool.ntp.org\",\"server 2.cn.pool.ntp.org\",\"server 3.cn.pool.ntp.org\"],\"mbus\":\"nats://nats:[email protected]:4222\",\"vm\":{\"name\":\"i-gw8i2woxrw*****j\"}}\t" 

faillog command can't work on Xenial stemcell

See the error below in the new Xenial stemcell:

# faillog
faillog: Cannot open /var/log/faillog: No such file or directory

The workaround is to create an empty file /var/log/faillog with 600 permissions. Please help check why /var/log/faillog isn't in the base image. Thanks.

/cc @maximilien

rotate the wtmp/btmp logs timeout after 122.096s.

When I use the Alibaba Cloud BOSH CPI and the latest Alibaba Cloud stemcell to run the smoke test, a timeout error always occurs:

• Failure [147.178 seconds]
Stemcell when logrotate wtmp/btmp logs [It] should rotate the wtmp/btmp logs 
/tmp/build/82c27cda/bosh-linux-stemcell-builder/src/github.com/cloudfoundry/stemcell-acceptance-tests/ipv4director/smoke/smoke_test.go:34

  Timed out after 122.096s.
  Logfile '/var/log/wtmp' was larger than expected. It should have been rotated.

Even if I shorten the interval from 15 seconds to 5 seconds, it still fails after several retries:

default/76a7fe46-6c8b-41c7-b1ad-aa001ee1e624: stderr | Connection to 172.16.0.4 closed.
[Cmd Runner] 2019/07/22 19:24:03 DEBUG - Stderr: 
[Cmd Runner] 2019/07/22 19:24:03 DEBUG - Successful: true (0)
[Cmd Runner] 2019/07/22 19:24:03 DEBUG - Running command '/usr/local/bin/bosh -n -d stemcell-acceptance-tests ssh --column=stdout --results default/0 sudo du /var/log/wtmp | cut -f1'
[Cmd Runner] 2019/07/22 19:24:14 DEBUG - Stdout: 4	
	
[Cmd Runner] 2019/07/22 19:24:14 DEBUG - Stderr: 
[Cmd Runner] 2019/07/22 19:24:14 DEBUG - Successful: true (0)
[Cmd Runner] 2019/07/22 19:24:19 DEBUG - Running command '/usr/local/bin/bosh -n -d stemcell-acceptance-tests ssh --column=stdout --results default/0 sudo du /var/log/wtmp | cut -f1'
[Cmd Runner] 2019/07/22 19:24:29 DEBUG - Stdout: 4	
	
[Cmd Runner] 2019/07/22 19:24:29 DEBUG - Stderr: 
[Cmd Runner] 2019/07/22 19:24:29 DEBUG - Successful: true (0)
[Cmd Runner] 2019/07/22 19:24:34 DEBUG - Running command '/usr/local/bin/bosh -n -d stemcell-acceptance-tests ssh --column=stdout --results default/0 sudo du /var/log/wtmp | cut -f1'
[Cmd Runner] 2019/07/22 19:24:47 DEBUG - Stdout: 4	
	
[Cmd Runner] 2019/07/22 19:24:47 DEBUG - Stderr: 
[Cmd Runner] 2019/07/22 19:24:47 DEBUG - Successful: true (0)
[Cmd Runner] 2019/07/22 19:24:52 DEBUG - Running command '/usr/local/bin/bosh -n -d stemcell-acceptance-tests ssh --column=stdout --results default/0 sudo du /var/log/wtmp | cut -f1'
[Cmd Runner] 2019/07/22 19:25:02 DEBUG - Stdout: 8	
	
[Cmd Runner] 2019/07/22 19:25:02 DEBUG - Stderr: 
[Cmd Runner] 2019/07/22 19:25:02 DEBUG - Successful: true (0)
[Cmd Runner] 2019/07/22 19:25:07 DEBUG - Running command '/usr/local/bin/bosh -n -d stemcell-acceptance-tests ssh --column=stdout --results default/0 sudo du /var/log/wtmp | cut -f1'
[Cmd Runner] 2019/07/22 19:25:18 DEBUG - Stdout: 8	
	
[Cmd Runner] 2019/07/22 19:25:18 DEBUG - Stderr: 
[Cmd Runner] 2019/07/22 19:25:18 DEBUG - Successful: true (0)
[Cmd Runner] 2019/07/22 19:25:23 DEBUG - Running command '/usr/local/bin/bosh -n -d stemcell-acceptance-tests ssh --column=stdout --results default/0 sudo du /var/log/wtmp | cut -f1'
[Cmd Runner] 2019/07/22 19:25:33 DEBUG - Stdout: 12	
	
[Cmd Runner] 2019/07/22 19:25:33 DEBUG - Stderr: 
[Cmd Runner] 2019/07/22 19:25:33 DEBUG - Successful: true (0)
[Cmd Runner] 2019/07/22 19:25:38 DEBUG - Running command '/usr/local/bin/bosh -n -d stemcell-acceptance-tests ssh --column=stdout --results default/0 sudo du /var/log/wtmp | cut -f1'
[Cmd Runner] 2019/07/22 19:25:49 DEBUG - Stdout: 12	
	
[Cmd Runner] 2019/07/22 19:25:49 DEBUG - Stderr: 
[Cmd Runner] 2019/07/22 19:25:49 DEBUG - Successful: true (0)
[Cmd Runner] 2019/07/22 19:25:54 DEBUG - Running command '/usr/local/bin/bosh -n -d stemcell-acceptance-tests ssh --column=stdout --results default/0 sudo du /var/log/wtmp | cut -f1'

------------------------------
[Cmd Runner] 2019/07/22 19:26:05 DEBUG - Stdout: 12	
	
[Cmd Runner] 2019/07/22 19:26:05 DEBUG - Stderr: 
[Cmd Runner] 2019/07/22 19:26:05 DEBUG - Successful: true (0)
• Failure [147.178 seconds]
Stemcell when logrotate wtmp/btmp logs [It] should rotate the wtmp/btmp logs 
/tmp/build/82c27cda/bosh-linux-stemcell-builder/src/github.com/cloudfoundry/stemcell-acceptance-tests/ipv4director/smoke/smoke_test.go:34

  Timed out after 122.096s.
  Logfile '/var/log/wtmp' was larger than expected. It should have been rotated.

Are there any ideas on how to fix this?

In addition, this CI pipeline is bosh-cpi-certification of alicloud.

Error in environment.os_image_rspec_command when running build_os_image target

Followed the instructions in the README to build an Ubuntu trusty image using a Docker container.
Ran
$ bundle exec rake stemcell:build_os_image[ubuntu,trusty,$PWD/tmp/ubuntu_base_image.tgz]

The image builds fine but verification on

https://github.com/cloudfoundry/bosh-linux-stemcell-builder/blob/master/Rakefile#L39

fails with

/opt/bosh/bosh-stemcell/spec/support/spec_assets.rb:11:in `block in <top (required)>': uninitialized constant ShelloutTypes (NameError)
Did you mean?  Shellwords
	from /home/ubuntu/.gem/ruby/2.3.1/gems/rspec-core-3.4.4/lib/rspec/core.rb:97:in `configure'
	from /opt/bosh/bosh-stemcell/spec/support/spec_assets.rb:9:in `<top (required)>'
	from /opt/bosh/bosh-stemcell/spec/spec_helper.rb:5:in `require'
	from /opt/bosh/bosh-stemcell/spec/spec_helper.rb:5:in `block in <top (required)>'
	from /opt/bosh/bosh-stemcell/spec/spec_helper.rb:5:in `each'
	from /opt/bosh/bosh-stemcell/spec/spec_helper.rb:5:in `<top (required)>'
	from /opt/bosh/bosh-stemcell/spec/os_image/ubuntu_trusty_spec.rb:2:in `require'
	from /opt/bosh/bosh-stemcell/spec/os_image/ubuntu_trusty_spec.rb:2:in `<top (required)>'
	from /home/ubuntu/.gem/ruby/2.3.1/gems/rspec-core-3.4.4/lib/rspec/core/configuration.rb:1361:in `load'
	from /home/ubuntu/.gem/ruby/2.3.1/gems/rspec-core-3.4.4/lib/rspec/core/configuration.rb:1361:in `block in load_spec_files'
	from /home/ubuntu/.gem/ruby/2.3.1/gems/rspec-core-3.4.4/lib/rspec/core/configuration.rb:1359:in `each'
	from /home/ubuntu/.gem/ruby/2.3.1/gems/rspec-core-3.4.4/lib/rspec/core/configuration.rb:1359:in `load_spec_files'
	from /home/ubuntu/.gem/ruby/2.3.1/gems/rspec-core-3.4.4/lib/rspec/core/runner.rb:106:in `setup'
	from /home/ubuntu/.gem/ruby/2.3.1/gems/rspec-core-3.4.4/lib/rspec/core/runner.rb:92:in `run'
	from /home/ubuntu/.gem/ruby/2.3.1/gems/rspec-core-3.4.4/lib/rspec/core/runner.rb:78:in `run'
	from /home/ubuntu/.gem/ruby/2.3.1/gems/rspec-core-3.4.4/lib/rspec/core/runner.rb:45:in `invoke'
	from /home/ubuntu/.gem/ruby/2.3.1/gems/rspec-core-3.4.4/exe/rspec:4:in `<top (required)>'
	from /home/ubuntu/.gem/ruby/2.3.1/bin/rspec:22:in `load'
	from /home/ubuntu/.gem/ruby/2.3.1/bin/rspec:22:in `<main>'

Running the verification directly from the command line has the same result

ubuntu@73e58c136147:/opt/bosh$ cd /opt/bosh/bosh-stemcell; OS_IMAGE=/opt/bosh/tmp/ubuntu_base_image.tgz bundle exec rspec -fd spec/os_image/ubuntu_trusty_spec.rb
All stemcell_tarball tests are being skipped. STEMCELL_WORKDIR needs to be set
/opt/bosh/bosh-stemcell/spec/support/spec_assets.rb:11:in `block in <top (required)>': uninitialized constant ShelloutTypes (NameError)
Did you mean?  Shellwords
	from /home/ubuntu/.gem/ruby/2.3.1/gems/rspec-core-3.4.4/lib/rspec/core.rb:97:in `configure'
	from /opt/bosh/bosh-stemcell/spec/support/spec_assets.rb:9:in `<top (required)>'
	from /opt/bosh/bosh-stemcell/spec/spec_helper.rb:5:in `require'
	from /opt/bosh/bosh-stemcell/spec/spec_helper.rb:5:in `block in <top (required)>'
	from /opt/bosh/bosh-stemcell/spec/spec_helper.rb:5:in `each'
	from /opt/bosh/bosh-stemcell/spec/spec_helper.rb:5:in `<top (required)>'
	from /opt/bosh/bosh-stemcell/spec/os_image/ubuntu_trusty_spec.rb:2:in `require'
	from /opt/bosh/bosh-stemcell/spec/os_image/ubuntu_trusty_spec.rb:2:in `<top (required)>'
	from /home/ubuntu/.gem/ruby/2.3.1/gems/rspec-core-3.4.4/lib/rspec/core/configuration.rb:1361:in `load'
	from /home/ubuntu/.gem/ruby/2.3.1/gems/rspec-core-3.4.4/lib/rspec/core/configuration.rb:1361:in `block in load_spec_files'
	from /home/ubuntu/.gem/ruby/2.3.1/gems/rspec-core-3.4.4/lib/rspec/core/configuration.rb:1359:in `each'
	from /home/ubuntu/.gem/ruby/2.3.1/gems/rspec-core-3.4.4/lib/rspec/core/configuration.rb:1359:in `load_spec_files'
	from /home/ubuntu/.gem/ruby/2.3.1/gems/rspec-core-3.4.4/lib/rspec/core/runner.rb:106:in `setup'
	from /home/ubuntu/.gem/ruby/2.3.1/gems/rspec-core-3.4.4/lib/rspec/core/runner.rb:92:in `run'
	from /home/ubuntu/.gem/ruby/2.3.1/gems/rspec-core-3.4.4/lib/rspec/core/runner.rb:78:in `run'
	from /home/ubuntu/.gem/ruby/2.3.1/gems/rspec-core-3.4.4/lib/rspec/core/runner.rb:45:in `invoke'
	from /home/ubuntu/.gem/ruby/2.3.1/gems/rspec-core-3.4.4/exe/rspec:4:in `<top (required)>'
	from /home/ubuntu/.gem/ruby/2.3.1/bin/rspec:22:in `load'
	from /home/ubuntu/.gem/ruby/2.3.1/bin/rspec:22:in `<main>'

Docker setup in use

ubuntu@ubuntu:~$ docker images
REPOSITORY                       TAG                 IMAGE ID            CREATED             SIZE
bosh/os-image-stemcell-builder   latest              8674f852b6e4        6 months ago        2.41 GB
ubuntu@ubuntu:~$ docker ps
CONTAINER ID        IMAGE                            COMMAND             CREATED             STATUS              PORTS               NAMES
73e58c136147        bosh/os-image-stemcell-builder   "/bin/bash"         9 hours ago         Up 9 hours                              friendly_euclid

aws stemcell - limited network connectivity due to network driver restarts

Symptoms:
On AWS on the affected VMs we observe problems with the network connectivity. Frequently the network connectivity is down, coming back up shortly after. This is probably related to a restart of the NIC as seen in the logs below. It is independent from the releases running on the managed bosh vm.

The problem will stay indefinitely. To solve it we delete the vm via bosh. We once rebooted a VM which also solved it.

Logs:
kern.log:

2019-07-22T11:01:55.096109+00:00 localhost kernel: [2008934.869922] ixgbevf 0000:00:03.0: NIC Link is Up 10 Gbps
2019-07-22T11:02:06.108096+00:00 localhost kernel: [2008945.880952] ixgbevf 0000:00:03.0: NIC Link is Up 10 Gbps
2019-07-22T11:02:11.992109+00:00 localhost kernel: [2008951.765109] ixgbevf 0000:00:03.0: NIC Link is Up 10 Gbps
2019-07-22T11:02:21.972103+00:00 localhost kernel: [2008961.745862] ixgbevf 0000:00:03.0: NIC Link is Up 10 Gbps
2019-07-22T11:02:32.216097+00:00 localhost kernel: [2008971.989764] ixgbevf 0000:00:03.0: NIC Link is Up 10 Gbps
2019-07-22T11:02:42.964105+00:00 localhost kernel: [2008982.737861] ixgbevf 0000:00:03.0: NIC Link is Up 10 Gbps
2019-07-22T11:02:53.208095+00:00 localhost kernel: [2008992.981733] ixgbevf 0000:00:03.0: NIC Link is Up 10 Gbps
2019-07-22T11:02:59.100107+00:00 localhost kernel: [2008998.873855] ixgbevf 0000:00:03.0: NIC Link is Up 10 Gbps
2019-07-22T11:03:10.104101+00:00 localhost kernel: [2009009.877387] ixgbevf 0000:00:03.0: NIC Link is Up 10 Gbps
(...)

There are no correlating log entries in the agent or audit.log.

Impact
Network calls will occasionally not go through. Usually the VM will report as healthy as the occasional health check goes through. Jobs will still be reported as running. VMs that rely on frequent network communication like Diego cells are strongly affected.

As the problem will stay indefinitely we currently need to spot the instances and then use delete-vm to delete them manually.

Reproducibility
We see this issue only on large installations (>200 Instances).
We were able to reproduce it by creating a large deployments with 200 instances. Then we recreated instances until the problem occurred.

Further information
Because of this problem the network driver already has been updated but this didn't solve or improve the situation.

We communicated this to AWS support. They suggested we contact the AMI vendor and gave the following hint on debugging:

Regarding your set debug mode for ixgbevf driver question, I found the links below and want to share with you on a best effort basis.
You may try to enable the kernel debug or if the driver supports you may use msglvl flag with ethtool.
http://mails.dpdk.org/archives/dev/2017-September/076962.html
https://techedemic.com/2015/01/22/ubuntugrub2-verbose-booting/

AWS support suggested using the AWS kernel, which includes the ixgbevf driver, but our assumption is that this is not possible on a stemcell.

cc @friegger @achawki

ncurses 6.0 fails to compile on bosh-google-kvm-ubuntu-xenial-go_agent v50

I'm getting the following error when compiling ncurses 6.0:

Task 46996 | 22:02:16 | Compiling packages: ncurses/a458e4c499edb33b98aa7b833ce8e8af082ea9c5 (00:03:46)
                     L Error: Action Failed get_task: Task 0988e16c-92db-4664-60b1-a4de71d46d37 result: Compiling package ncurses: Running packaging script: Running packaging script: Command exited with 2; Truncated stdout: checking for fork... yes
checking for vfork... yes
checking for working fork... (cached) yes
checking for working vfork... (cached) yes
checking for openpty in -lutil... yes
checking for openpty header... pty.h
checking if we should include stdbool.h... yes
checking for builtin bool type... no
checking for library stdc++... no
checking whether /usr/bin/g++ understands -c and -o together... yes
checking how to run the C++ preprocessor... /usr/bin/g++ -E
checking for typeinfo... yes
checking for iostream... yes
checking if iostream uses std-namespace... yes
checking if we should include stdbool.h... (cached) yes
checking for builtin bool type... yes
checking for size of bool... unsigned char
checking for special defines needed for etip.h...
checking if /usr/bin/g++ accepts parameter initialization... no
checking if /usr/bin/g++ accepts static_cast... yes
checking for gnatmake... no
checking for library subsets... ticlib+termlib+ext_tinfo+base+ext_funcs
checking default library suffix...
checking default library-dependency suffix... .a
checking default object directory... objects
checking c++ library-dependency suffix... .a
checking if linker supports switching between static/dynamic... no
checking where we will install curses.h... ${prefix}/include/ncurses
checking for src modules... ncurses progs panel menu form
checking for tic... /usr/bin/tic
checking for defines to add to ncurses6-config script...  -D_GNU_SOURCE
package: ncurses
configure: creating ./config.status
config.status: creating include/MKterm.h.awk
config.status: creating include/curses.head
config.status: creating include/ncurses_dll.h
config.status: creating include/termcap.h
config.status: creating include/unctrl.h
config.status: creating man/Makefile
config.status: creating include/Makefile
config.status: creating ncurses/Makefile
config.status: creating progs/Makefile
config.status: creating panel/Makefile
config.status: creating menu/Makefile
config.status: creating form/Makefile
config.status: creating test/Makefile
config.status: creating misc/Makefile
config.status: creating c++/Makefile
config.status: creating misc/run_tic.sh
config.status: creating misc/ncurses-config
config.status: creating man/ncurses6-config.1
config.status: creating Makefile
config.status: creating include/ncurses_cfg.h
Appending rules for normal model (ncurses: ticlib+termlib+ext_tinfo+base+ext_funcs)
Appending rules for debug model (ncurses: ticlib+termlib+ext_tinfo+base+ext_funcs)
Appending rules for normal model (progs: ticlib+termlib+ext_tinfo+base+ext_funcs)
Appending rules for debug model (progs: ticlib+termlib+ext_tinfo+base+ext_funcs)
Appending rules for normal model (panel: ticlib+termlib+ext_tinfo+base+ext_funcs)
Appending rules for debug model (panel: ticlib+termlib+ext_tinfo+base+ext_funcs)
Appending rules for normal model (menu: ticlib+termlib+ext_tinfo+base+ext_funcs)
Appending rules for debug model (menu: ticlib+termlib+ext_tinfo+base+ext_funcs)
Appending rules for normal model (form: ticlib+termlib+ext_tinfo+base+ext_funcs)
Appending rules for debug model (form: ticlib+termlib+ext_tinfo+base+ext_funcs)
Appending rules for normal model (test: ticlib+termlib+ext_tinfo+base+ext_funcs)
Appending rules for debug model (test: ticlib+termlib+ext_tinfo+base+ext_funcs)
Appending rules for normal model (c++: ticlib+termlib+ext_tinfo+base+ext_funcs)
Appending rules for debug model (c++: ticlib+termlib+ext_tinfo+base+ext_funcs)
creating headers.sh

** Configuration summary for NCURSES 6.0 20150808:

       extended funcs: yes
       xterm terminfo: xterm-new

        bin directory: /var/vcap/packages/ncurses/bin
        lib directory: /var/vcap/packages/ncurses/lib
    include directory: /var/vcap/packages/ncurses/include/ncurses
        man directory: /var/vcap/packages/ncurses/share/man
   terminfo directory: /var/vcap/packages/ncurses/share/terminfo

** Include-directory is not in a standard location
cd man && make DESTDIR="" RPATH_LIST="/var/vcap/packages/ncurses/lib" all
make[1]: Entering directory '/var/vcap/data/compile/ncurses/ncurses/ncurses-6.0/man'
/bin/sh ./MKterminfo.sh ./terminfo.head ./../include/Caps ./terminfo.tail >terminfo.5
make[1]: Leaving directory '/var/vcap/data/compile/ncurses/ncurses/ncurses-6.0/man'
cd include && make DESTDIR="" RPATH_LIST="/var/vcap/packages/ncurses/lib" all
make[1]: Entering directory '/var/vcap/data/compile/ncurses/ncurses/ncurses-6.0/include'
cat curses.head >curses.h
AWK=mawk /bin/sh ./MKkey_defs.sh ./Caps >>curses.h
/bin/sh -c 'if test "chtype" = "cchar_t" ; then cat ./curses.wide >>curses.h ; fi'
cat ./curses.tail >>curses.h
/bin/sh ./MKhashsize.sh ./Caps >hashsize.h
AWK=mawk /bin/sh ./MKncurses_def.sh ./ncurses_defs >ncurses_def.h
AWK=mawk /bin/sh ./MKparametrized.sh ./Caps >parametrized.h
touch config.h
mawk -f MKterm.h.awk ./Caps > term.h
/bin/sh ./edit_cfg.sh ../include/ncurses_cfg.h term.h
** edit: HAVE_TCGETATTR 1
** edit: HAVE_TERMIOS_H 1
** edit: HAVE_TERMIO_H 1
** edit: BROKEN_LINKER 0
make[1]: Leaving directory '/var/vcap/data/compile/ncurses/ncurses/ncurses-6.0/include'
cd ncurses && make DESTDIR="" RPATH_LIST="/var/vcap/packages/ncurses/lib" all
make[1]: Entering directory '/var/vcap/data/compile/ncurses/ncurses/ncurses-6.0/ncurses'
mawk -f ./tinfo/MKcodes.awk bigstrings=1 ./../include/Caps >codes.c
gcc -o make_hash -DHAVE_CONFIG_H -DUSE_BUILD_CC -I../ncurses -I. -I../include -I./../include -DHAVE_CONFIG_H -I. -I../include  -D_GNU_SOURCE -DNDEBUG  -O2 --param max-inline-insns-single=1200 ./tinfo/make_hash.c
/bin/sh -e ./tinfo/MKcaptab.sh mawk 1 ./tinfo/MKcaptab.awk ./../include/Caps > comp_captab.c
/bin/sh -e ./tty/MKexpanded.sh "gcc -E" -DHAVE_CONFIG_H -I. -I../include  -D_GNU_SOURCE -DNDEBUG > expanded.c
/bin/sh -e ./tinfo/MKfallback.sh /var/vcap/packages/ncurses/share/terminfo ../misc/terminfo.src /usr/bin/tic  >fallback.c
/bin/sh -e ./base/MKlib_gen.sh "gcc -E -DHAVE_CONFIG_H -I. -I../include  -D_GNU_SOURCE -DNDEBUG" "mawk" generated <../include/curses.h >lib_gen.c
AWK=mawk /bin/sh ./tinfo/MKkeys_list.sh ../include/Caps | sort >keys.list
mawk -f ./base/MKkeyname.awk bigstrings=1 keys.list > lib_keyname.c
/bin/sh -e ./base/MKlib_gen.sh "gcc -E -DHAVE_CONFIG_H -I. -I../include  -D_GNU_SOURCE -DNDEBUG" "mawk" implemented <../include/curses.h >link_test.c
mawk -f ./tinfo/MKnames.awk bigstrings=1 ./../include/Caps >names.c
echo | mawk -f ./base/MKunctrl.awk bigstrings=1 >unctrl.c
gcc -o make_keys -DHAVE_CONFIG_H -DUSE_BUILD_CC -I../ncurses -I. -I../include -I./../include -DHAVE_CONFIG_H -I. -I../include  -D_GNU_SOURCE -DNDEBUG  -O2 --param max-inline-insns-single=1200 ./tinfo/make_keys.c
./make_keys keys.list > init_keytry.h
gcc -DHAVE_CONFIG_H -I. -I../include  -D_GNU_SOURCE -DNDEBUG -O2 --param max-inline-insns-single=1200 -c ../ncurses/./tty/hardscroll.c -o ../objects/hardscroll.o
gcc -DHAVE_CONFIG_H -I. -I../include  -D_GNU_SOURCE -DNDEBUG -O2 --param max-inline-insns-single=1200 -c ../ncurses/./tty/hashmap.c -o ../objects/hashmap.o
gcc -DHAVE_CONFIG_H -I. -I../include  -D_GNU_SOURCE -DNDEBUG -O2 --param max-inline-insns-single=1200 -c ../ncurses/./base/lib_addch.c -o ../objects/lib_addch.o
gcc -DHAVE_CONFIG_H -I. -I../include  -D_GNU_SOURCE -DNDEBUG -O2 --param max-inline-insns-single=1200 -c ../ncurses/./base/lib_addstr.c -o ../objects/lib_addstr.o
gcc -DHAVE_CONFIG_H -I. -I../include  -D_GNU_SOURCE -DNDEBUG -O2 --param max-inline-insns-single=1200 -c ../ncurses/./base/lib_beep.c -o ../objects/lib_beep.o
gcc -DHAVE_CONFIG_H -I. -I../include  -D_GNU_SOURCE -DNDEBUG -O2 --param max-inline-insns-single=1200 -c ../ncurses/./base/lib_bkgd.c -o ../objects/lib_bkgd.o
gcc -DHAVE_CONFIG_H -I. -I../include  -D_GNU_SOURCE -DNDEBUG -O2 --param max-inline-insns-single=1200 -c ../ncurses/./base/lib_box.c -o ../objects/lib_box.o
gcc -DHAVE_CONFIG_H -I. -I../include  -D_GNU_SOURCE -DNDEBUG -O2 --param max-inline-insns-single=1200 -c ../ncurses/./base/lib_chgat.c -o ../objects/lib_chgat.o
gcc -DHAVE_CONFIG_H -I. -I../include  -D_GNU_SOURCE -DNDEBUG -O2 --param max-inline-insns-single=1200 -c ../ncurses/./base/lib_clear.c -o ../objects/lib_clear.o
gcc -DHAVE_CONFIG_H -I. -I../include  -D_GNU_SOURCE -DNDEBUG -O2 --param max-inline-insns-single=1200 -c ../ncurses/./base/lib_clearok.c -o ../objects/lib_clearok.o
gcc -DHAVE_CONFIG_H -I. -I../include  -D_GNU_SOURCE -DNDEBUG -O2 --param max-inline-insns-single=1200 -c ../ncurses/./base/lib_clrbot.c -o ../objects/lib_clrbot.o
gcc -DHAVE_CONFIG_H -I. -I../include  -D_GNU_SOURCE -DNDEBUG -O2 --param max-inline-insns-single=1200 -c ../ncurses/./base/lib_clreol.c -o ../objects/lib_clreol.o
gcc -DHAVE_CONFIG_H -I. -I../include  -D_GNU_SOURCE -DNDEBUG -O2 --param max-inline-insns-single=1200 -c ../ncurses/./base/lib_color.c -o ../objects/lib_color.o
gcc -DHAVE_CONFIG_H -I. -I../include  -D_GNU_SOURCE -DNDEBUG -O2 --param max-inline-insns-single=1200 -c ../ncurses/./base/lib_colorset.c -o ../objects/lib_colorset.o
gcc -DHAVE_CONFIG_H -I. -I../include  -D_GNU_SOURCE -DNDEBUG -O2 --param max-inline-insns-single=1200 -c ../ncurses/./base/lib_delch.c -o ../objects/lib_delch.o
gcc -DHAVE_CONFIG_H -I. -I../include  -D_GNU_SOURCE -DNDEBUG -O2 --param max-inline-insns-single=1200 -c ../ncurses/./base/lib_delwin.c -o ../objects/lib_delwin.o
gcc -DHAVE_CONFIG_H -I. -I../include  -D_GNU_SOURCE -DNDEBUG -O2 --param max-inline-insns-single=1200 -c ../ncurses/./base/lib_echo.c -o ../objects/lib_echo.o
gcc -DHAVE_CONFIG_H -I. -I../include  -D_GNU_SOURCE -DNDEBUG -O2 --param max-inline-insns-single=1200 -c ../ncurses/./base/lib_endwin.c -o ../objects/lib_endwin.o
gcc -DHAVE_CONFIG_H -I. -I../include  -D_GNU_SOURCE -DNDEBUG -O2 --param max-inline-insns-single=1200 -c ../ncurses/./base/lib_erase.c -o ../objects/lib_erase.o
gcc -DHAVE_CONFIG_H -I. -I../include  -D_GNU_SOURCE -DNDEBUG -O2 --param max-inline-insns-single=1200 -c ../ncurses/./base/lib_flash.c -o ../objects/lib_flash.o
gcc -DHAVE_CONFIG_H -I. -I../include  -D_GNU_SOURCE -DNDEBUG -O2 --param max-inline-insns-single=1200 -c ../ncurses/lib_gen.c -o ../objects/lib_gen.o
Makefile:962: recipe for target '../objects/lib_gen.o' failed
make[1]: Leaving directory '/var/vcap/data/compile/ncurses/ncurses/ncurses-6.0/ncurses'
Makefile:113: recipe for target 'all' failed
, Stderr: + export PREFIX=/var/vcap/packages/ncurses
+ PREFIX=/var/vcap/packages/ncurses
+ cd ncurses
+ tar xzf ncurses-6.0.tar.gz
+ cd ncurses-6.0
+ ./configure --prefix=/var/vcap/packages/ncurses
configure: WARNING: pkg-config is not installed
configure: WARNING: This option applies only to wide-character library
+ make
In file included from ./curses.priv.h:325:0,
                 from ../ncurses/lib_gen.c:19:
_7805.c:843:15: error: expected ')' before 'int'
../include/curses.h:1631:56: note: in definition of macro 'mouse_trafo'
 #define mouse_trafo(y,x,to_screen) wmouse_trafo(stdscr,y,x,to_screen)
                                                        ^
make[1]: *** [../objects/lib_gen.o] Error 1
make: *** [all] Error 2

Any ideas on how to fix this?

cc @cppforlife @emalm

Add a limit to avoid endless bosh-agent setup

Hi, we ran into a problem with the 'agent' runit service. The agent process restarts endlessly when it runs into errors and gets stuck there, because the runit service has a fixed auto-restart policy. I know we need to cancel this VM or re-deploy, but I thought we could change the restart policy to stop the agent via a runit service setting such as a retry count or something.

This is a simple '/etc/sv/agent/finish' example for limiting bosh-agent restarts.

#!/bin/bash
set -e

exit_code=$1
counter_file="./agent_counter"

if [[ ${exit_code} -eq 0 ]]; then
    rm -f ${counter_file}
    exit 0
fi

[[ ! -f "$counter_file" ]] && echo 0 > ${counter_file}

read index < ${counter_file}
if [ ${index} -gt 200 ]; then       # 200 times or more
    rm -f ${counter_file}
    sv stop agent
    exit 0
else
    declare -i tmp=${index}+1
    echo ${tmp} > ${counter_file}
fi

exec sleep 5

Thank you. :)

[xenial] v4.15 kernel hangs on Azure if SR-IOV is enabled

I use the stemcell xenial v40 to deploy CF on Azure with accelerated networking (SR-IOV) enabled. The VM hangs with watchdog: BUG: soft lockup - CPU#2 stuck for 22s!.
Detailed logs: 09d9a3c0-2ea6-4e68-9e98-0f1c00928c31.860db4a3-01fe-4f10-a97b-38283bc4cd64.serialconsole.txt

This should be fixed by torvalds/linux@de0aa7b#diff-a9cd8181fe90643b51b4a77e48556743, which is released as 4.17-rc1.
Is there a plan to upgrade the kernel in the ubuntu xenial stemcell? Thanks.

Update util-linux to 2.27 or later

The logger command that we use extensively to forward logs from non-syslog-aware components to syslog gained in 2.27 the ability to raise the maximum log line length with the --size switch (default 1K). Some of our components love to emit long structured lines and being able to forward them without fragmentation is very important.

In addition, 2.26 gained the --id switch that allows specifying the process ID of the emitting process.
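As a sketch of how we would use these switches once util-linux is updated (the tag, size, and message below are illustrative, not from an existing component):

logger --size 8192 --id=$$ -t my-component "a long structured log line that should not be fragmented"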

[xenial] ip6tables v1.6.0: can't initialize ip6tables table `filter': Address family not supported by protocol

On Xenial I'm getting:

# ip6tables  -w 30  -I  OUTPUT  1  -p  tcp  --dport 19080  -m comment --comment  'WindowsFabric:azure-service-fabric0 WindowsFabric Http Gateway (TCP-Public-Out)' -j ACCEPT
ip6tables v1.6.0: can't initialize ip6tables table `filter': Address family not supported by protocol
Perhaps ip6tables or your kernel needs to be upgraded.

The iptables equivalent command works as expected.

Is this something fixable by me/my BOSH release, something fixable in the stemcell, or elsewhere?

[It] should fail to get logs, but the deploy should succeed

When I use the Alibaba Cloud BOSH CPI and the latest Alibaba Cloud stemcell to run the smoke test, a panic error always occurs:

Task 91 | 10:46:53 | Updating instance default: default/6331b3fd-e16c-4316-80e6-566742343562 (0) (canary) (00:02:22)
[Cmd Runner] 2019/07/20 10:49:19 DEBUG - Stderr: 
[Cmd Runner] 2019/07/20 10:49:19 DEBUG - Successful: true (0)
•! Panic [161.830 seconds]
Stemcell #164749230, when using targeted blobstores when deploying with a invalid logs blobstore [It] should fail to get logs, but the deploy should succeed 
/tmp/build/82c27cda/bosh-linux-stemcell-builder/src/github.com/cloudfoundry/stemcell-acceptance-tests/ipv4director/smoke/smoke_test.go:309

  Test Panicked
  runtime error: index out of range
  /usr/local/go/src/runtime/panic.go:44

  Full Stack Trace
  	/usr/local/go/src/runtime/panic.go:522 +0x1b5
  github.com/cloudfoundry/stemcell-acceptance-tests/ipv4director/smoke_test.glob..func3.11.2.1()
  	/tmp/build/82c27cda/bosh-linux-stemcell-builder/src/github.com/cloudfoundry/stemcell-acceptance-tests/ipv4director/smoke/smoke_test.go:328 +0x1664
  github.com/cloudfoundry/stemcell-acceptance-tests/vendor/github.com/onsi/ginkgo/internal/leafnodes.(*runner).runSync(0xc0000734a0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
  	/tmp/build/82c27cda/bosh-linux-stemcell-builder/src/github.com/cloudfoundry/stemcell-acceptance-tests/ipv4director/smoke/smoke_suite_test.go:18 +0x64
  testing.tRunner(0xc0000ece00, 0x898db0)
  	/usr/local/go/src/testing/testing.go:865 +0xc0
  created by testing.(*T).Run
  	/usr/local/go/src/testing/testing.go:916 +0x35a

By debugging, I noticed the config is always a nil struct and the length of Blobstores is always 0. So I changed the blobStoreVars as follows:

bsv := blobStoreVars{}
if len(config.Env.BoshEnv.Blobstores) > 0 {
	bsv.Endpoint = config.Env.BoshEnv.Blobstores[0].Options.Endpoint
	bsv.BlobstoreAgentPassword = config.Env.BoshEnv.Blobstores[0].Options.Password
	bsv.BlobstoreCaCertificate = config.Env.BoshEnv.Blobstores[0].Options.Tls.Cert.Ca
}

After that, I retried and got another error:

Task 95 | 12:23:18 | Updating instance default: default/4caea898-9773-4f9b-906e-b2abedd3a50e (0) (canary) (00:02:13)
[Cmd Runner] 2019/07/21 12:25:33 DEBUG - Stderr: 
[Cmd Runner] 2019/07/21 12:25:33 DEBUG - Successful: true (0)
• Failure [305.064 seconds]
Stemcell #164749230, when using targeted blobstores when deploying with a invalid logs blobstore [It] should fail to get logs, but the deploy should succeed 
/tmp/build/82c27cda/bosh-linux-stemcell-builder/src/github.com/cloudfoundry/stemcell-acceptance-tests/ipv4director/smoke/smoke_test.go:309

  Expected an error to have occurred.  Got:
      <nil>: nil

  /tmp/build/82c27cda/bosh-linux-stemcell-builder/src/github.com/cloudfoundry/stemcell-acceptance-tests/ipv4director/smoke/smoke_test.go:354

I also found that it always succeeds no matter what values I set in add-invalid-logs-blobstore.yml, and there is no error.

Are there any ideas on how to fix this?

In addition, this CI pipeline is the bosh-cpi-certification pipeline of alicloud, and this error is the same as for the invalid packages blobstore.

Partition "/" full caused by Windows Azure Linux Agent(waagent)

Hi,

When creating the stemcell, shall we consider installing the Windows Azure Linux Agent (waagent) in another partition, for example /var/vcap/data?
Or shall we give a larger size for the "/" partition?

In VMs deployed in Azure, the Windows Azure Linux Agent (waagent) tries to pull OMS agent update zip files under /var/lib/waagent. As the "/" partition is quite small (only 3GB), it is unable to extract the zip file due to the space issue.

This is an example of a UAA VM. You can see the "/" partition is around 3GB.
uaa/bf85e3bd-2e76-4a74-8b9d-00ae1c5c70fc:/var/lib/waagent# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sdb 8:16 0 8G 0 disk
└─sdb1 8:17 0 8G 0 part
sr0 11:0 1 632K 0 rom
fd0 2:0 1 4K 0 disk
sda 8:0 0 35G 0 disk
├─sda2 8:2 0 3.8G 0 part [SWAP]
├─sda3 8:3 0 28.3G 0 part /var/vcap/data
└─sda1 8:1 0 2.9G 0 part /

While installing/updating the OMS agent for Linux, it fills up the /dev/sda1 mount across all PCF VMs, causing high CPU I/O wait issues on the customer side.
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 2.8G 2.3G 328M 88% /

Regards,
David

[xenial] Could not execute 'apt-key' to verify signature

Inside a xenial 1.97 vm I tried to run apt-get update and got the errors:

Err:1 http://security.ubuntu.com/ubuntu xenial-security InRelease
  Could not execute 'apt-key' to verify signature (is gnupg installed?)
Err:5 http://archive.ubuntu.com/ubuntu xenial-updates InRelease
  Could not execute 'apt-key' to verify signature (is gnupg installed?)

The solution from https://bugs.launchpad.net/ubuntu/+source/apt/+bug/1577926 was to first run:

chmod 1777 /tmp

Perhaps /tmp needs these settings permanently; or am I doing something wrong?

BOSH Stemcells are very slow to download

When downloading stemcells from https://bosh.io/stemcells I'm getting download speeds a few orders of magnitude slower than what my connection is capable of. I'm on wired office internet and I can download files from Google Cloud Storage at around 70 megabytes per second (~500 megabits per second).

I've graphed the download speed of a typical stemcell download session speed below.

stemcell_download_speed

The specific stemcell I downloaded was: https://s3.amazonaws.com/bosh-core-stemcells/315.13/bosh-stemcell-315.13-google-kvm-ubuntu-xenial-go_agent.tgz

As you can see this is always an order of magnitude slower than my connection and sometimes drops to 2 orders of magnitude slower. This is really frustrating as these files are so large.

Do you have any idea what could be causing this?

Unable to create loopback devices in Ubuntu Xenial

I am trying to run a Concourse deployment on a BOSH Lite director, which uses the Ubuntu Xenial stemcell (downloaded from https://bosh.cloudfoundry.org/stemcells/bosh-warden-boshlite-ubuntu-xenial-go_agent).

My problem is that the OS doesn't seem to support loopback devices:

worker/5ad0b4d9-1e30-46e3-892c-14554dec5629:~$ losetup -l
worker/5ad0b4d9-1e30-46e3-892c-14554dec5629:~$ losetup -f
losetup: cannot find an unused loop device: No such file or directory

This means that the Concourse worker fails. This appears to be a regression, since I am following the instructions from https://github.com/concourse/concourse-bosh-deployment/tree/master/cluster which suggest that it is possible to run in this configuration.

image-metalinks hash mismatch

Since ubuntu-xenial/v621.51, I have had trouble downloading the pre-built os-image because meta4 reports a hash mismatch.

$ git checkout ubuntu-xenial/v621.60
$ meta4 file-download --metalink ubuntu-xenial.meta4 --file ubuntu-xenial.tgz ~/ubuntu-xenial.tgz 

481.29 MiB / 481.30 MiB [-------------------------------->] 100.00% 3.63 MiB p/s
ubuntu-xenial.tgz: sha-512: INVALID: incorrect hash: ea20acc5e42f744e0dbd711cc884f1505fd928f80cd69aa54d7e4bf9a45b10c80a6f5c6aac63d9c8ed1ff4cc0d0190553c4f00ca25d401efd3f6f3610abeb416
Verifying file: expected hash: 71f24b7c9cd1d2e77b451ff257cd488d1d8ddcb313ddef80cc4317624ca3b1a85ae4b98acc662e381c816a2a9e2723dd4734d3e2883a152b2eb432c5b7254935

I'm puzzled because downloading the file multiple times gives different hashes.

(try1, 06:51:51): incorrect hash: ea20acc5e42f744e0dbd711cc884f1505fd928f80cd69aa54d7e4bf9a45b10c80a6f5c6aac63d9c8ed1ff4cc0d0190553c4f00ca25d401efd3f6f3610abeb416
(try2, 06:54:14): incorrect hash: cf4bc7422683adcfd61467183774c7ec727a6de0bce464e0100c5e06b539e36b239c89c22f78b13ee5cc7889c18adbe6287229e918ae73e4e89d311e440d3bcd
(try3, 07:01:05): incorrect hash: c62b08b3a0aaaa514c0ea24e8bb0cf6575613c5b4b4fbbce64fed4fb203d857b4ffc2df68b6ee4c69c996ec96623854cf836c89a6ac5a130338e1f46065b18c6

I have the same behaviour with curl

  • using url: https://bosh-os-images.s3.amazonaws.com/621.x/ubuntu-xenial.tgz
  • reproduced on multiple hosts (no rproxy on the way)
  • amazon response headers are the same on each try
    • ETag: "46681df253fd089bb3de69f0f8a7ccee-8"
    • x-amz-version-id: vKhv2ZCas1Wa9BNT.rxHqi7vf6tcHgdj
  • sha-512 of downloaded files differ on each try

Is this normal Amazon S3 behaviour? If so, how should I check the integrity of the pre-built image?

Update

The sha-512 has stopped moving on each download and has stabilized on cf4bc7422683adcfd61467183774c7ec727a6de0bce464e0100c5e06b539e36b239c89c22f78b13ee5cc7889c18adbe6287229e918ae73e4e89d311e440d3bcd
It seems to be the value expected for the commit 5abd846 which is not tagged yet.

Hypothesis:

  • I'm having "moving" checksums if I download the file while it is being uploaded by Pivotal's Concourse
  • The meta4 file is only capable of downloading the latest version of the file (i.e. the version field is unused), which makes tagged meta-links unusable

Update outdated s3cli version

Please bump s3cli to the latest version. This is needed as the currently included version causes issues when using alicloud OSS as blobstore for the BOSH director.

While compiling you'll receive the error performing operation get: AccessDenied: OSS authentication requires a valid Date.

This is caused by a wrong Date format ( using UTC instead of GMT ) in the request header.
This issue was fixed with version v0.0.80, which is already part of the latest version of the BOSH Director.

[xenial] Support for multipath and open-iscsi

We would like to ease the effort required to integrate CFCR with iscsi storage.

An operations team I am working with is hoping to use Pure Storage with iscsi through vSphere datastores. To enable this, we need the following packages:

  • multipath-tools_0.5.0+git1.656f8865-5ubuntu2_amd64.deb
  • open-iscsi_2.0.873+git0.3b4b4500-14ubuntu3_amd64.deb

I would be happy to provide a PR but wanted to discuss with you if that should be contributed in the upstream stemcell or in the CFCR bosh release.

RANDOM_DELAY in anacrontab is not applied

RANDOM_DELAY is currently appended to the end of anacrontab (at least on ubuntu stemcells):

$ cat /etc/anacrontab
# /etc/anacrontab: configuration file for anacron

# See anacron(8) and anacrontab(5) for details.

SHELL=/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
HOME=/root
LOGNAME=root

# These replace cron's entries
1       5       cron.daily      run-parts --report /etc/cron.daily
7       10      cron.weekly     run-parts --report /etc/cron.weekly
@monthly        15      cron.monthly    run-parts --report /etc/cron.monthly

RANDOM_DELAY=60

as can be seen in the anacron source code, RANDOM_DELAY only applies to cron entries that follow it: https://github.com/cronie-crond/cronie/blob/b836b5789cae7d608346d64e64ade121d8608518/anacron/readtab.c#L304 (this is not clear from the documentation, but it's worth pointing out that all anacrontab(5) examples in the docs put RANDOM_DELAY before the cron entries)
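Given that, a minimal sketch of the anacrontab layout we would expect to work (same entries as above, with RANDOM_DELAY simply moved ahead of the cron entries; this is our reading of the fix, not a tested patch):

SHELL=/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
HOME=/root
LOGNAME=root
RANDOM_DELAY=60

# These replace cron's entries
1       5       cron.daily      run-parts --report /etc/cron.daily
7       10      cron.weekly     run-parts --report /etc/cron.weekly
@monthly        15      cron.monthly    run-parts --report /etc/cron.monthly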

We noticed this because, if a RANDOM_DELAY=60 was really in effect, the following would return a much more even distribution (just for context, runsrv dea runs the command against all of our 9 lab DEAs):

$ runsrv dea 'sudo zgrep /etc/cron.hourly /var/log/syslog*' | cut -d: -f3 | sort | uniq -c
   5214 17
      9 22
$ runsrv dea 'sudo zgrep /etc/cron.daily /var/log/syslog*' | cut -d: -f3 | sort | uniq -c
      9 23
    215 25
$ runsrv dea 'sudo zgrep /etc/cron.weekly /var/log/syslog*' | cut -d: -f3 | sort | uniq -c
      9 23
     34 47
$ runsrv dea 'sudo zgrep /etc/cron.monthly /var/log/syslog*' | cut -d: -f3 | sort | uniq -c
      9 23
      9 52

When running on on-premise infrastructure it's pretty important for cronjobs not to run at the same time to avoid gratuitous spikes. The most desirable behaviour, at least from our perspective, would be for all VMs to run the cronjobs at slightly different times.

Project quota support missing from e2fsprogs

Xenial has support for project quotas (directory quotas) using ext4 in the kernel, but the version of e2fsprogs available on xenial is too low to make use of this feature (v1.43+ required).

We are exploring how we deal with container quotas on garden. Rather than quotas working through a xfs formatted sparse file on a loopback device, we would like /var/vcap/data to be mounted with the project quota option enabled. How might it be possible to get a newer version of e2fsprogs compiled on new stemcells?

Add support for Cloudstack/XenServer stemcell

We're currently using CPI for Cloudstack IaaS.

We have forked bosh repo in 2015 to add stemcell support for this IaaS, but never went into a PR.

We take the opportunity with this new repo to create a PR adding support for Apache Cloudstack IaaS.

This issue is there to track and support this future PR.

[xenial] sync-time should not block bosh-agent

In https://github.com/cloudfoundry/bosh-agent/blob/de93f41fba875f70253b0b8af7e590481e7751f4/platform/linux_platform.go#L548-L549, bosh-agent should make a best effort to sync time but not block. But in https://github.com/cloudfoundry/bosh-linux-stemcell-builder/blob/ubuntu-xenial/v87/stemcell_builder/stages/bosh_ntp/assets/chrony-updater#L24, max-retries is not set. This means chronyc will wait forever if time fails to sync for some reason, and bosh-agent will be blocked. I suggest adding max-retries, like chronyc waitsync 10.

Readme documentation

In the Build an OS Image section, the link on "there is a separate Rake task" does not redirect you anywhere.

Consider disabling port forwarding by default

There are recommendations to turn off port forwarding by default, e.g. https://www.ssh.com/ssh/sshd_config/.

AllowTcpForwarding no
AllowStreamLocalForwarding no
GatewayPorts no
PermitTunnel no

Since the stemcell is a general purpose image it might be necessary to change this for certain scenarios. I assume that os-conf-release could learn to provide that possibility.

Exposure of /var/vcap/jobs/*/config

We are worried about the exposure of /var/vcap/jobs/*/config allowing anyone to read the contents.

It is mentioned on BOSH Slack that the upcoming stemcell line will fix its permissions. What is the expected date, and how will it be fixed? Thanks!

/cc @maximilien @cppforlife

Context:

sandycash 
[12:17 AM] 
@sf-bosh @toronto-bosh Is there a way to specify the permissions for directories like `/var/vcap/jobs/<jobname>/config`?  I know we can likely just run something in the pre-start/post-start scripts, but is there any kind of OOTB feature supporting this?  I couldn't find anything in the docs, but if I'm being obtuse, I apologize.


dkalinin 
[12:23 AM] 
@sandycash though in upcoming stemcell line we will be settig it in the agent?


[12:23] 
what are you trying to set them to?


sandycash 
[12:28 AM] 
@dkalinin we'd like to remove world-readability, basically


[12:28] 
So 750 instead of 755


[12:29] 
but that could be an issue for jobs which don't run as the user owning the dir, so it's not 100% straightforward, I know


dkalinin 
[12:29 AM] 
@sandycash next stemcell line should do just that


sandycash 
[12:29 AM] 
@tschultz thanks


dkalinin 
[12:29 AM] 
if its not urgent i would just wait for that


sandycash 
[12:29 AM] 
@dkalinin I'll find out, thanks man

GCC version out of date

The gcc version provided by the stemcell is out of date and prevents compilation of recent packages (for example, mongodb >3.2 needs at least gcc 5.3).
Recompiling gcc from source in a package takes around 1 hour, so it would be good if a more recent version of gcc could be embedded directly in the bosh stemcell.

[xenial] nvidia gpu support

Hi,

I was interested in understanding the possibility of using nvidia GPUs with CFCR/PKS; however, I had issues getting nvidia drivers running on the Xenial stemcells. The nouveau kernel module seems to be enabled by default, and runtime removal of the module seems to cause a panic on the stemcell.

I wanted to open a thread to discuss if it would be possible to blacklist nouveau
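
For context, the conventional way to blacklist nouveau on Ubuntu looks roughly like the sketch below; this is what a stemcell-level change (or a manual workaround) could do. The file name is conventional, not something the stemcell ships today:

    # Prevent the nouveau module from loading at boot
    printf 'blacklist nouveau\noptions nouveau modeset=0\n' | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
    sudo update-initramfs -u   # rebuild the initramfs so the blacklist applies from early boot
    # A reboot is then required; as noted above, unloading the module at runtime can panic the VM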

The date of the release notes

In the release notes of Stemcell 3586.24 and other versions, is the date a typo?
`Periodic Ubuntu Trusty stemcell bump (July 18, 2018)` => `Periodic Ubuntu Trusty stemcell bump (June 18, 2018)`
@cppforlife

Include rng-tools package for increasing low entropy available

Most IaaS providers, such as GCP, Azure, and AWS, support "real" hardware RNGs, but the relevant rng-tools package is not installed.

The stemcell builder should include this package so that entropy can be fetched directly from /dev/hwrng.

Installing this package manually on GCP- and Azure-based stemcells (apt-get install rng-tools) drastically increased the amount of available entropy, from the 42-600 range to ~3000, and also sped up regeneration of new entropy.

The impact of not having enough entropy is severe: applications or functions that rely on random number generation, such as
secureRandom = SecureRandom.getInstanceStrong();
can take 20-30 minutes to complete on GCP and 2-5 minutes on Azure and AWS.

Including the rng-tools package (https://wiki.archlinux.org/index.php/Rng-tools) solves this issue by increasing the amount of entropy in the kernel and making /dev/random faster; it allows the use of faster entropy sources, mainly hardware random number generators (TRNGs).
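
A rough way to observe the effect on an existing VM (the package and device are those described above; exact numbers vary by IaaS):

    cat /proc/sys/kernel/random/entropy_avail   # available entropy before (often only a few hundred bits)
    sudo apt-get install -y rng-tools           # installs and starts rngd, which feeds /dev/hwrng into the kernel pool
    sleep 5
    cat /proc/sys/kernel/random/entropy_avail   # typically far higher once rngd is running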

Build RHEL 7.4 stemcell

Hi guys

We're trying to build a stemcell based on RHEL 7.4 since some software we use requires at least 7.3.
However, this repo's README states that it only works with RHEL 7.0, not with RHEL 7.1.

The first issue we ran into (newer packages provided by the distro) could be fixed by updating the references in stemcell_builder/stages/base_rhel/apply.sh to

    release_package_url="/mnt/rhel/Packages/redhat-release-server-7.4-18.el7.x86_64.rpm"
    epel_package_url="http://dl.fedoraproject.org/pub/epel/7/x86_64/Packages/e/epel-release-7-11.noarch.rpm"

We fixed the second issue (subscription-manager is not installed by default) by installing it with yum in the same script:

...
 run_in_chroot $chroot "yum -c /custom_rhel_yum.conf update --assumeyes"
 run_in_chroot $chroot "yum -c /custom_rhel_yum.conf --verbose --assumeyes groupinstall Base"
 run_in_chroot $chroot "yum -c /custom_rhel_yum.conf --verbose --assumeyes groupinstall 'Development Tools'"
+run_in_chroot $chroot "yum -c /custom_rhel_yum.conf --verbose --assumeyes install subscription-manager"
...

However, we're now stuck with the following:

...
subscription-manager register --username=<removed> --password=<removed> --auto-attach
subscription-manager repos --enable=rhel-7-server-optional-rpms
'
+ disable /mnt/stemcells/null/null/rhel/work/work/chroot/sbin/initctl
+ '[' -e /mnt/stemcells/null/null/rhel/work/work/chroot/sbin/initctl ']'
+ disable /mnt/stemcells/null/null/rhel/work/work/chroot/usr/sbin/invoke-rc.d
+ '[' -e /mnt/stemcells/null/null/rhel/work/work/chroot/usr/sbin/invoke-rc.d ']'
+ unshare -f -p -m /bin/sh
lscpu: failed to determine number of CPUs: /sys/devices/system/cpu/possible: No such file or directory
lscpu: failed to determine number of CPUs: /sys/devices/system/cpu/possible: No such file or directory
Error updating system data on the server, see /var/log/rhsm/rhsm.log for more details.
Registering to: subscription.rhsm.redhat.com:443/subscription
The system has been registered with ID: <removed>

The mentioned file /var/log/rhsm/rhsm.log shows the following:

2018-01-11 17:28:21,055 [WARNING] subscription-manager:15:MainThread @hwprobe.py:545 - Error with lscpu (/usr/bin/lscpu) subprocess: Command '['/usr/bin/lscpu']' returned non-zero exit status 1
2018-01-11 17:28:24,973 [INFO] subscription-manager:15:MainThread @connection.py:552 - Response: status=200, requestUuid=bb208b66-bc9a-44c8-8120-866eeb40687e, request="GET /subscription/users/<removed>/owners"
2018-01-11 17:28:25,997 [INFO] subscription-manager:15:MainThread @connection.py:552 - Response: status=200, request="GET /subscription/"
2018-01-11 17:28:29,255 [INFO] subscription-manager:15:MainThread @connection.py:552 - Response: status=200, requestUuid=2d3a0863-a1af-48f1-ba2a-71bd0b82ffbd, request="POST /subscription/consumers?owner=<removed>"
2018-01-11 17:28:29,257 [INFO] subscription-manager:15:MainThread @managerlib.py:74 - Consumer created: {'consumer_name': u'<removed>', 'uuid': '<removed>'}
2018-01-11 17:28:29,258 [INFO] subscription-manager:15:MainThread @connection.py:822 - Connection built: host=subscription.rhsm.redhat.com port=443 handler=/subscription auth=identity_cert ca_dir=/etc/rhsm/ca/ insecure=False
2018-01-11 17:28:30,476 [INFO] subscription-manager:15:MainThread @connection.py:552 - Response: status=200, request="GET /subscription/"
2018-01-11 17:28:32,010 [ERROR] subscription-manager:15:MainThread @utils.py:271 - Error while checking server version: EOF occurred in violation of protocol (_ssl.c:579)
2018-01-11 17:28:32,011 [ERROR] subscription-manager:15:MainThread @utils.py:273 - EOF occurred in violation of protocol (_ssl.c:579)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/subscription_manager/utils.py", line 252, in get_server_versions
    status = cp.getStatus()
  File "/usr/lib64/python2.7/site-packages/rhsm/connection.py", line 1410, in getStatus
    return self.conn.request_get(method)
  File "/usr/lib64/python2.7/site-packages/rhsm/connection.py", line 646, in request_get
    return self._request("GET", method, headers=headers)
  File "/usr/lib64/python2.7/site-packages/rhsm/connection.py", line 672, in _request
    info=info, headers=headers)
  File "/usr/lib64/python2.7/site-packages/rhsm/connection.py", line 528, in _request
    conn.request(request_type, handler, body=body, headers=final_headers)
  File "/usr/lib64/python2.7/httplib.py", line 1017, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib64/python2.7/httplib.py", line 1051, in _send_request
    self.endheaders(body)
  File "/usr/lib64/python2.7/httplib.py", line 1013, in endheaders
    self._send_output(message_body)
  File "/usr/lib64/python2.7/httplib.py", line 864, in _send_output
    self.send(msg)
  File "/usr/lib64/python2.7/httplib.py", line 826, in send
    self.connect()
  File "/usr/lib64/python2.7/httplib.py", line 1236, in connect
    server_hostname=sni_hostname)
  File "/usr/lib64/python2.7/ssl.py", line 350, in wrap_socket
    _context=self)
  File "/usr/lib64/python2.7/ssl.py", line 611, in __init__
    self.do_handshake()
  File "/usr/lib64/python2.7/ssl.py", line 833, in do_handshake
    self._sslobj.do_handshake()
SSLEOFError: EOF occurred in violation of protocol (_ssl.c:579)
2018-01-11 17:28:32,014 [INFO] subscription-manager:15:MainThread @managercli.py:418 - Server Versions: {'rules-version': u'Unknown', 'candlepin': u'Unknown', 'server-type': u'Red Hat Subscription Management'}
2018-01-11 17:28:33,132 [INFO] subscription-manager:15:MainThread @connection.py:552 - Response: status=200, request="GET /subscription/"
2018-01-11 17:28:34,666 [ERROR] subscription-manager:15:MainThread @cache.py:178 - Error updating system data on the server
2018-01-11 17:28:34,666 [ERROR] subscription-manager:15:MainThread @cache.py:179 - EOF occurred in violation of protocol (_ssl.c:579)

Can you help us with that? Do you know of anybody who built a RHEL 7.4 stemcell?

Failed to run "sudo apt-get update" in the xenial stemcell

uaa/1e6130cd-b7ff-425b-a4f2-4eb9effeeb5b:~$ cat /etc/apt/sources.list
deb http://archive.ubuntu.com/ubuntu xenial main universe multiverse
deb http://archive.ubuntu.com/ubuntu xenial-updates main universe multiverse
deb http://security.ubuntu.com/ubuntu xenial-security main universe multiverse
uaa/1e6130cd-b7ff-425b-a4f2-4eb9effeeb5b:~$ sudo apt-get update
Get:1 http://security.ubuntu.com/ubuntu xenial-security InRelease [107 kB]
Hit:2 http://archive.ubuntu.com/ubuntu xenial InRelease
Err:2 http://archive.ubuntu.com/ubuntu xenial InRelease                            Couldn't create tempfiles for splitting up /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_xenial_InRelease
  Could not execute 'apt-key' to verify signature (is gnupg installed?)
Get:3 http://archive.ubuntu.com/ubuntu xenial-updates InRelease [109 kB]
Err:1 http://security.ubuntu.com/ubuntu xenial-security InReleasesplitting up /var/lib/apt/lists/partial/security.ubuntu.com_ubuntu_dists_xenial-security_InRelease
  Could not execute 'apt-key' to verify signature (is gnupg installed?)
Err:3 http://archive.ubuntu.com/ubuntu xenial-updates InReleaseting up /var/lib/apt/lists/partial/archive.ubuntu.com_ubuntu_dists_xenial-updates_InRelease
  Could not execute 'apt-key' to verify signature (is gnupg installed?)
Fetched 216 kB in 2s (99.9 kB/s)
Reading package lists... Done
W: An error occurred during the signature verification. The repository is not updated and the previous index files will be used. GPG error: http://archive.ubuntu.com/ubuntu xenial InRelease: Could not execute 'apt-key' to verify signature (is gnupg installed?)
W: An error occurred during the signature verification. The repository is not updated and the previous index files will be used. GPG error: http://security.ubuntu.com/ubuntu xenial-security InRelease: Could not execute 'apt-key' to verify signature (is gnupg installed?)
W: An error occurred during the signature verification. The repository is not updated and the previous index files will be used. GPG error: http://archive.ubuntu.com/ubuntu xenial-updates InRelease: Could not execute 'apt-key' to verify signature (is gnupg installed?)
W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/xenial/InRelease  Could not execute 'apt-key' to verify signature (is gnupg installed?)
W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/xenial-updates/InRelease  Could not execute 'apt-key' to verify signature (is gnupg installed?)
W: Failed to fetch http://security.ubuntu.com/ubuntu/dists/xenial-security/InRelease  Could not execute 'apt-key' to verify signature (is gnupg installed?)
W: Some index files failed to download. They have been ignored, or old ones used instead.

Stemcell version: 97.19
platform: Azure

chmod 1777 /tmp can fix the issue.
#39
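
A workaround sketch on an affected VM (this patches only the running VM, not the stemcell itself):

    stat -c '%a %U:%G' /tmp   # apt-key needs a world-writable, sticky /tmp, i.e. 1777
    sudo chmod 1777 /tmp
    sudo apt-get update       # signature verification should now succeed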

Failed to run 'sfdisk -d /dev/xvdc' when bosh-agent partitions ephemeral disk

Hi, we were trying to build a SoftLayer Xenial stemcell based on this OS image: https://github.com/cloudfoundry/bosh-linux-stemcell-builder/blob/master/bosh-stemcell/os_image_versions.json#L4. It failed during compilation VM creation:

02:11:30 | Preparing deployment: Preparing deployment (00:00:00)
02:11:30 | Preparing package compilation: Finding packages to compile (00:00:00)
02:11:30 | Compiling packages: batlight/dd894d36ab4d50ec504c09be6715c04f4077afee (0)
02:26:50 | Creating missing vms: mutable/ad72b6a1-a1bd-4546-9ba3-1781595aedef (0) (00:15:20)
            L Error: Timed out pinging to e4526a3f-9639-4b9d-a24c-51fdb45c01d2 after 600 seconds

02:26:51 | Error: Timed out pinging to e4526a3f-9639-4b9d-a24c-51fdb45c01d2 after 600 seconds

This error was caused by bosh-agent (v2.65.0) failing to partition the ephemeral disk in the compilation VM:
[main] 2018/02/08 02:43:01 ERROR - Agent exited with error: Running bootstrap: Setting up ephemeral disk: Partitioning ephemeral disk: Partitioning ephemeral disk '/dev/xvdc': Getting partitions for /dev/xvdc: Shelling out to sfdisk when getting partitions: Running command: 'sfdisk -d /dev/xvdc', stdout: '', stderr: 'sfdisk: failed to dump partition table: Success

And the 'sfdisk -l' output looks normal:

/:/var/vcap/bosh# sfdisk -l /dev/xvdc
Disk /dev/xvdc: 100 GiB, 107374182400 bytes, 209715200 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
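
In other words, 'sfdisk -d' (dump the existing partition table) fails on the brand-new, unpartitioned disk even though 'sfdisk -l' succeeds. The failing call can be reproduced directly on the VM (a sketch):

    # Run the same command the agent runs and show its exit status; per the agent
    # log below, it prints "sfdisk: failed to dump partition table: Success" and exits 1.
    sudo sfdisk -d /dev/xvdc; echo "exit status: $?"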

Hopefully these detailed logs help:

2018-02-08_02:42:49.10993 ********************
2018-02-08_02:42:49.11021 [settingsService] 2018/02/08 02:42:49 DEBUG - Loading settings from fetcher
2018-02-08_02:42:49.11023 [File System] 2018/02/08 02:42:49 DEBUG - Reading file
2018-02-08_02:42:49.11023 [registryProvider] 2018/02/08 02:42:49 DEBUG - Using file registry at /var/vcap/bosh/user_data.json
2018-02-08_02:42:49.11023 [File System] 2018/02/08 02:42:49 DEBUG - Reading file /var/vcap/bosh/user_data.json
2018-02-08_02:42:49.11024 [File System] 2018/02/08 02:42:49 DEBUG - Read content
2018-02-08_02:42:49.11024 ********************
2018-02-08_02:42:49.11024 {"agent_id":"91787558-2d2b-4d64-a99e-7b1736f75cf9","vm":{"name":"vm-91787558-2d2b-4d64-a99e-7b1736f75cf9","id":"vm-91787558-2d2b-4d64-a99e-7b1736f75cf9"},"mbus":"nats://nats:[email protected]:4222","ntp":["time1.google.com","time2.google.com","time3.google.com","time4.google.com"],"blobstore":{"provider":"dav","options":{"endpoint":"http://10.112.116.16:25250","password":"sp4uvvhgop3nrv9387nc","user":"agent"}},"networks":{"default":{"type":"dynamic","dns":["10.112.166.140"],"default":["dns","gateway"],"preconfigured":true,"cloud_properties":{"PrimaryBackendNetworkComponent":{"NetworkVlan":{"Id":1.292651e+06}},"PrimaryNetworkComponent":{"NetworkVlan":{"Id":1.292653e+06}}}}},"disks":{"ephemeral":"/dev/xvdc","persistent":null},"env":{"bosh":{"group":"bats-director-bat-compilation-36bfae6b-a4fd-4445-a51a-9e384c850217","groups":["bats-director","bat","compilation-36bfae6b-a4fd-4445-a51a-9e384c850217","bats-director-bat","bat-compilation-36bfae6b-a4fd-4445-a51a-9e384c850217","bats-director-bat-compilation-36bfae6b-a4fd-4445-a51a-9e384c850217"],"keep_root_password":true,"password":"$6$4cbef42d659bd6ea$6EvArRU9G9haaLC.1UTbNMyP7JNWxoddj6jVgGsF8fglPPUWGv1jJ2lteG3mmdoe6EA6W5m0lfagvICYn0zF8/"}}}
2018-02-08_02:42:49.11026 ********************
2018-02-08_02:42:49.11066 [settingsService] 2018/02/08 02:42:49 DEBUG - Successfully received settings from fetcher
2018-02-08_02:42:49.11099 [File System] 2018/02/08 02:42:49 DEBUG - Making dir /var/vcap/bosh with perm 0777
2018-02-08_02:42:49.11100 [Cmd Runner] 2018/02/08 02:42:49 DEBUG - Running command 'route -n'
2018-02-08_02:42:49.11194 [Cmd Runner] 2018/02/08 02:42:49 DEBUG - Successful: true (0)
2018-02-08_02:42:49.11250 [Cmd Runner] 2018/02/08 02:42:49 DEBUG - Running command 'usermod -p $6$4cbef42d659bd6ea$6EvArRU9G9haaLC.1UTbNMyP7JNWxoddj6jVgGsF8fglPPUWGv1jJ2lteG3mmdoe6EA6W5m0lfagvICYn0zF8/ vcap'
2018-02-08_02:42:49.13426 [Cmd Runner] 2018/02/08 02:42:49 DEBUG - Stdout:
2018-02-08_02:42:49.13428 [Cmd Runner] 2018/02/08 02:42:49 DEBUG - Stderr:
2018-02-08_02:42:49.13428 [Cmd Runner] 2018/02/08 02:42:49 DEBUG - Successful: true (0)
2018-02-08_02:42:49.13453 [File System] 2018/02/08 02:42:49 DEBUG - Writing /etc/resolvconf/resolv.conf.d/base
2018-02-08_02:42:49.13454 [File System] 2018/02/08 02:42:49 DEBUG - Making dir /etc/resolvconf/resolv.conf.d with perm 0777
2018-02-08_02:42:49.13454 [File System] 2018/02/08 02:42:49 DEBUG - Write content
2018-02-08_02:42:49.13455 ********************
2018-02-08_02:42:49.13455 # Generated by bosh-agent
2018-02-08_02:42:49.13455 nameserver 10.112.166.140
2018-02-08_02:42:49.13456
2018-02-08_02:42:49.13456 ********************
2018-02-08_02:42:49.13456 [File System] 2018/02/08 02:42:49 DEBUG - Symlinking oldPath /run/resolvconf/resolv.conf with newPath /etc/resolv.conf
2018-02-08_02:42:49.13456 [File System] 2018/02/08 02:42:49 DEBUG - Lstat '/etc/resolv.conf'
2018-02-08_02:42:49.13500 [Cmd Runner] 2018/02/08 02:42:49 DEBUG - Running command 'resolvconf -u'
2018-02-08_02:42:49.14989 [Cmd Runner] 2018/02/08 02:42:49 DEBUG - Stdout:
2018-02-08_02:42:49.14991 [Cmd Runner] 2018/02/08 02:42:49 DEBUG - Stderr:
2018-02-08_02:42:49.14991 [Cmd Runner] 2018/02/08 02:42:49 DEBUG - Successful: true (0)
2018-02-08_02:42:49.14991 [File System] 2018/02/08 02:42:49 DEBUG - Writing /var/vcap/bosh/etc/ntpserver
2018-02-08_02:42:49.15016 [File System] 2018/02/08 02:42:49 DEBUG - Making dir /var/vcap/bosh/etc with perm 0777
2018-02-08_02:42:49.15017 [File System] 2018/02/08 02:42:49 DEBUG - Write content
2018-02-08_02:42:49.15018 ********************
2018-02-08_02:42:49.15019 time1.google.com time2.google.com time3.google.com time4.google.com
2018-02-08_02:42:49.15019 ********************
2018-02-08_02:42:49.15019 [Cmd Runner] 2018/02/08 02:42:49 DEBUG - Running command 'sync-time'
2018-02-08_02:43:01.21297 [Cmd Runner] 2018/02/08 02:43:01 DEBUG - Stdout:
2018-02-08_02:43:01.21300 [Cmd Runner] 2018/02/08 02:43:01 DEBUG - Stderr:
2018-02-08_02:43:01.21300 [Cmd Runner] 2018/02/08 02:43:01 DEBUG - Successful: true (0)
2018-02-08_02:43:01.21300 [linuxPlatform] 2018/02/08 02:43:01 INFO - Setting up raw ephemeral disks
2018-02-08_02:43:01.21301 [linuxPlatform] 2018/02/08 02:43:01 INFO - Setting up ephemeral disk...
2018-02-08_02:43:01.21301 [File System] 2018/02/08 02:43:01 DEBUG - Glob '/var/vcap/data/*'
2018-02-08_02:43:01.21301 [File System] 2018/02/08 02:43:01 DEBUG - Making dir /var/vcap/data with perm 0750
2018-02-08_02:43:01.21302 [linuxPlatform] 2018/02/08 02:43:01 INFO - Creating swap & ephemeral partitions on ephemeral disk...
2018-02-08_02:43:01.21302 [linuxPlatform] 2018/02/08 02:43:01 DEBUG - Getting device size of `/dev/xvdc'
2018-02-08_02:43:01.21303 [Cmd Runner] 2018/02/08 02:43:01 DEBUG - Running command 'sfdisk -s /dev/xvdc'
2018-02-08_02:43:01.21453 [Cmd Runner] 2018/02/08 02:43:01 DEBUG - Stdout: 104857600
2018-02-08_02:43:01.21455 [Cmd Runner] 2018/02/08 02:43:01 DEBUG - Stderr:
2018-02-08_02:43:01.21456 [Cmd Runner] 2018/02/08 02:43:01 DEBUG - Successful: true (0)
2018-02-08_02:43:01.21456 [linuxPlatform] 2018/02/08 02:43:01 DEBUG - Calculating partition sizes of `/dev/xvdc', with available size 107374182400B
2018-02-08_02:43:01.21558 [linuxPlatform] 2018/02/08 02:43:01 INFO - Partitioning `/dev/xvdc' with [[Type: swap, SizeInBytes: 8368222208] [Type: linux, SizeInBytes: 99005960192]]
2018-02-08_02:43:01.21559 [Cmd Runner] 2018/02/08 02:43:01 DEBUG - Running command 'sfdisk -d /dev/xvdc'
2018-02-08_02:43:01.22030 [Cmd Runner] 2018/02/08 02:43:01 DEBUG - Stdout:
2018-02-08_02:43:01.22031 [Cmd Runner] 2018/02/08 02:43:01 DEBUG - Stderr: sfdisk: failed to dump partition table: Success
2018-02-08_02:43:01.22032 [Cmd Runner] 2018/02/08 02:43:01 DEBUG - Successful: false (1)
2018-02-08_02:43:01.22033 [main] 2018/02/08 02:43:01 ERROR - App setup Running bootstrap: Setting up ephemeral disk: Partitioning ephemeral disk: Partitioning ephemeral disk `/dev/xvdc': Getting partitions for /dev/xvdc: Shelling out to sfdisk when getting partitions: Running command: 'sfdisk -d /dev/xvdc', stdout: '', stderr: 'sfdisk: failed to dump partition table: Success
2018-02-08_02:43:01.22034 ': exit status 1
2018-02-08_02:43:01.22035 [main] 2018/02/08 02:43:01 ERROR - Agent exited with error: Running bootstrap: Setting up ephemeral disk: Partitioning ephemeral disk: Partitioning ephemeral disk `/dev/xvdc': Getting partitions for /dev/xvdc: Shelling out to sfdisk when getting partitions: Running command: 'sfdisk -d /dev/xvdc', stdout: '', stderr: 'sfdisk: failed to dump partition table: Success
2018-02-08_02:43:01.22037 ': exit status 1

Thank you so much. :)

Customizing a stemcell

Is there documentation that tells us how to modify a stemcell and rebuild it for Ubuntu?
name: bosh-aws-xen-hvm-ubuntu-trusty-go_agent
version: '3312.49'

VM not reliably starting up on deployment

Hello,

we are experiencing issues with the bosh-aws-xen-hvm-ubuntu-trusty-go_agent/3586.24 stemcell. When we create a VM from it on a t2.nano instance, there is a one-in-three chance that one of these two things will happen:

  • Time out because bosh never gets an agent connection
  • Time out on blob upload or get task

After we stopped and started the instance on the IaaS, we saw that the instance had been rebooting continuously while not being reachable via the network. We were also able to reproduce this 5 times today.

Where did it occur?
eu-central-1c

Please contact me if you have further questions or are unable to reproduce, I will then send a syslog to you.

logger could not write to `/var/log/syslog` file

Hi, I am building a stemcell with this stemcell builder: https://github.com/bluebosh/bosh-linux-stemcell-builder/tree/ubuntu-xenial-ng

My stemcell could not pass the smoke test "when syslog threshold limit is reached should rotate the logs".

I ran these commands manually and logrotate worked well (rotating logs larger than 10MB), but the logger command could not write to /var/log/syslog, so this test case failed.

I could not find your builder pipeline logs, so I have to ask: did you see the same issue?

Thank you!

Syslog logging not possible on Xenial with unprivileged user

We observe that we cannot write to syslog on Xenial stemcells with a non-root user that belongs to no group.

Trusty > 3586.x

zookeeper/df901ee6-87b2-4a83-8aec-8b68574f0f98:~# su - testuser
zookeeper/df901ee6-87b2-4a83-8aec-8b68574f0f98:~$ whoami
testuser
zookeeper/df901ee6-87b2-4a83-8aec-8b68574f0f98:~$ logger message
zookeeper/df901ee6-87b2-4a83-8aec-8b68574f0f98:~$

Xenial > 170.x

bosh/cec4c995-5f80-4b41-8262-f604b9db1a97:~$ su - testuser
bosh/cec4c995-5f80-4b41-8262-f604b9db1a97:~$ whoami
testuser
bosh/cec4c995-5f80-4b41-8262-f604b9db1a97:~$ logger testmessage
logger: socket /dev/log: Permission denied

Steps to reproduce

  1. log into a VM using a Xenial stemcell
  2. create a new user, e.g. adduser testuser
  3. switch to that user, su - testuser
  4. execute logger <message>
  5. fails with logger: socket /dev/log: Permission denied

Please note that this is not reproducible on bosh-lite.
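
A quick way to compare the two stemcell lines, plus a possible interim workaround; the owning group is whatever ls reports on your VM (vcap here is only an assumption):

    ls -lL /dev/log                        # compare owner, group, and mode between the Trusty and Xenial VMs
    sudo usermod -aG vcap testuser         # hypothetical workaround: add the user to the socket's owning group
    su - testuser -c 'logger testmessage'  # should succeed once the new group membership is picked up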

50-default.conf in rsyslog.d results in errors

2018-04-03T17:05:02.358719+00:00 localhost rsyslogd-2039: Could not open output pipe '/dev/xconsole':: No such file or directory [v8.22.0 try http://www.rsyslog.com/e/2039 ]

As you can see above, rsyslog is trying to write logs to /dev/xconsole, but the stemcell does not have an xconsole or an X server running (nor would I expect it to). This probably comes from the default Ubuntu config, but it does not make sense for a headless BOSH VM.
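
For reference, the stanza in the stock Ubuntu /etc/rsyslog.d/50-default.conf that produces this error looks roughly like the block below; commenting it out (or dropping it from the stemcell) silences the error on headless VMs:

    #daemon.*;mail.*;\
    #        news.err;\
    #        *.=debug;*.=info;\
    #        *.=notice;*.=warn       |/dev/xconsole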

Stage system_ixgbvef failing in building opensuse image on s390x

Hi,

I am trying to build an openSUSE Tumbleweed stemcell. I am using the base Docker image opensuse/tumbleweed for the s390x architecture.

I am getting the following error while running the rake command
bundle exec rake stemcell:build_os_image[opensuse,tumbleweed,$PWD/tmp/os_tumbleweed_base_image.tgz]
dkmsixgbevf

Any help appreciated.

Thanks :)

Can't use OpenJDK 11 because of glibc 2.23 on Bionic

Hi there! I'm on the CredHub team and we would like to use OpenJDK 11 on the bionic stemcell (https://bosh.io/stemcells/bosh-google-kvm-ubuntu-xenial-go_agent#v170.21). However, we are seeing a glibc error when we try to run anything on it:

 Error: failed /var/vcap/data/packages/openjdk_11/51d047a74a5aad28ae960fd578494c169acbe270/jre/lib/server/libjvm.so, because /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.27' not found (required by /var/vcap/data/packages/openjdk_11/51d047a74a5aad28ae960fd578494c169acbe270/jre/lib/server/libjvm.so)

Could you please update the xenial stemcell to use glibc 2.27 or later? Thanks!
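
A quick way to confirm which glibc a given stemcell ships (run on a VM created from it):

    ldd --version | head -n1                     # e.g. reports glibc 2.23 on Xenial-based stemcells
    /lib/x86_64-linux-gnu/libc.so.6 | head -n1   # executing libc directly also prints its release banner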

/run/shm should be noexec

Currently /run/shm is mounted with rw,nosuid,nodev but should additionally have the noexec flag set. This is recommended by common security scanning tools. From OpenSCAP[1]:

"The noexec mount option can be used to prevent binaries from being executed out of /dev/shm. It can be dangerous to allow the execution of binaries from world-writable temporary storage directories such as /dev/shm. Add the noexec option to the fourth column of /etc/fstab for the line which controls mounting of /dev/shm."

  1. OpenSCAP: https://static.open-scap.org/ssg-guides/ssg-rhel7-guide-C2S.html
  2. Ubuntu: https://help.ubuntu.com/community/StricterDefaults
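
A sketch of the change being requested; the exact mount point (/run/shm vs /dev/shm) differs between Ubuntu releases, so check first:

    mount | grep -E '/(run|dev)/shm'                                       # confirm where the tmpfs is actually mounted
    sudo mount -o remount,rw,nosuid,nodev,noexec /run/shm                  # apply immediately (adjust the path as needed)
    echo 'tmpfs /run/shm tmpfs rw,nosuid,nodev,noexec 0 0' | sudo tee -a /etc/fstab   # persist via the fstab options column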
