cloudfoundry / bosh-vsphere-cpi-release

BOSH	vSphere
Availability Zone	Clusters/Resource Pools
Virtual Machine	Virtual Machine
VM Config Metadata	Virtual Device ISO
Network Subnet	Networking
Persistent Disk	Virtual Hard Disk
Stemcell	Virtual Machine

Feature Support

The following sections describe some specific BOSH features supported by the CPI.

Network

The CPI supports multiple NICs being attached to a single VM.

Network Type	Support
Manual	Multiple networks per instance
Dynamic	Not Supported
VIP	Not Supported

Encryption

vSphere supports disk encryption and customer-managed keys when they are managed through policy configuration in vCenter 6.5+. With this functionality, encryption occurs at the hypervisor level and is transparent to the VM. Once encryption is enabled in vCenter, the CPI requires no additional configuration.

Disk Type	Encryption
Root Disk	Supported
Ephemeral Disk	Supported
Persistent Disk	Supported

Miscellaneous

Feature	Support
Multi-CPI	Supported, v34+
Native Disk Resize	Not Supported

bosh-vsphere-cpi-release's Issues

improve comm error messages to include destination

creating stemcell (bosh-vsphere-esxi-ubuntu-trusty-go_agent 3541.25):
  CPI 'create_stemcell' method responded with error: CmdError{"type":"Unknown","message":"execution expired","ok_to_retry":false}

The error message above does not say during which operation the execution expired, or to which endpoint the request was made.
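Below is a minimal sketch of the kind of wrapper this issue asks for. It assumes the CPI's outbound HTTP calls can be wrapped in a block. `with_destination` and `CommunicationError` are hypothetical names and are not part of the CPI. "execution expired" is the default message of Ruby's `Timeout::Error`, which would explain why the original message says nothing about the endpoint.

```ruby
require 'timeout'
require 'uri'

# Hypothetical error type; the real CPI does not define this class.
class CommunicationError < StandardError; end

# Runs the block. If it times out or a socket call fails, re-raises with the
# operation name and destination host. The original exception stays on #cause.
def with_destination(operation, url)
  yield
rescue Timeout::Error, SystemCallError => e
  host = URI(url).host
  raise CommunicationError, "#{operation} against #{host} (#{url}) failed: #{e.message}"
end

begin
  with_destination('upload stemcell disk', 'https://esxi-01.example.com/nfc/lease/disk-0.vmdk') do
    raise Timeout::Error, 'execution expired' # simulated timeout
  end
rescue CommunicationError => e
  puts e.message
end
```

Wrapping the error, rather than only logging the URL, keeps the destination in the message that finally reaches `bosh create-env`.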

[feature request] support vSphere Virtual Machine Encryption

discussion in #bosh: https://cloudfoundry.slack.com/archives/C02HPPYQ2/p1518042060000030

https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/vm-encryption-vsphere65-perf.pdf

https://docs.vmware.com/en/VMware-vSphere/6.5/com.vmware.vsphere.security.doc/GUID-A29066CD-8EF8-4A4E-9FC9-8628E05FC859.html

https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/products/vsphere/vmw-wp-vsphere-virtual-machin-encryp.pdf

Fix for [ warning] [guestinfo] Failed to get vmstats log spam

I don't know whether this is specific to my vSphere environment, but in /var/vcap/messages on my VMs I get a constant stream of:

Jan 31 22:07:47 localhost vmsvc[28642]: [ warning] [guestinfo] Failed to get vmstats.
Jan 31 22:08:17 localhost vmsvc[28642]: [ warning] [guestinfo] Failed to get vmstats.
Jan 31 22:08:47 localhost vmsvc[28642]: [ warning] [guestinfo] Failed to get vmstats.
Jan 31 22:09:17 localhost vmsvc[28642]: [ warning] [guestinfo] Failed to get vmstats.

I did some digging to see whether this was a problem and whether it was fixable. It is fixable by setting the vmx parameter isolation.tools.setinfo.disable = "false".

Whether that parameter should be set, and what exactly it does, was less clear. Letting the VM send data to the host could be a minor security concern. Someone with VM access might use it to exploit a VMware vulnerability and possibly compromise the host.

Also, by default there appears to be no limit on tools.setInfo.sizeLimit, which could be a problem.

See the note under tools.setInfo.sizeLimit in https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/vi35_security_hardening_wp.pdf and https://goingvirtual.wordpress.com/2009/07/11/locking-down-vmware-tools/

So, this issue could be solved in two ways:

1. Decide that allowing setinfo (isolation.tools.setinfo.disable = "false") with a reasonable tools.setInfo.sizeLimit is not a big deal, and enable it by default in the vSphere CPI.
2. Fix #17: support configuring arbitrary vmx parameters per vm_type, and let customers decide what they want to do.

Thoughts?
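Option 2 could look roughly like the minimal sketch below. It assumes a hypothetical `vmx_options` key in a vm_type's cloud_properties, which is not an existing CPI property. The output is key/value pairs of the kind a VM's extraConfig takes. An allow-list keeps arbitrary keys out, and the sizeLimit value is illustrative only.

```ruby
# Hypothetical allow-list of vmx keys a vm_type may set.
ALLOWED_VMX_KEYS = %w[
  isolation.tools.setinfo.disable
  tools.setInfo.sizeLimit
].freeze

# Turns a hypothetical `vmx_options` hash from a vm_type's cloud_properties
# into key/value pairs. vmx values are strings, so everything is stringified.
def vmx_extra_config(cloud_properties)
  options = cloud_properties.fetch('vmx_options', {})
  unknown = options.keys - ALLOWED_VMX_KEYS
  raise ArgumentError, "unsupported vmx options: #{unknown.join(', ')}" unless unknown.empty?

  options.map { |key, value| { key: key, value: value.to_s } }
end

vm_type_cloud_properties = {
  'cpu' => 2,
  'vmx_options' => {
    'isolation.tools.setinfo.disable' => false,
    'tools.setInfo.sizeLimit' => 1_048_576 # illustrative value only
  }
}
p vmx_extra_config(vm_type_cloud_properties)
```

In the CPI, these pairs would presumably become `OptionValue` entries in the VM config spec's `extraConfig`. That wiring is not shown here.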

CPI does not pay attention to ESXI servers disconnected or in maintenance mode

We are running CPI version 38 (though I think this issue affects all versions) with BOSH director v261.4 against vCenter 5.5, with ESXi 5.5 on the hypervisor hosts. The vCenter has several clusters, but we define only three in the CPI configuration and deploy VMs to them. Each cluster maps to one AZ, which gives us three AZs. These clusters have both vSphere HA and DRS disabled and use local storage (SSDs). When a VM is deployed, the CPI selects a random datastore matching datastore_pattern and continues with the remaining steps to create the VM. Each ESXi server has a local datastore named after it; for example, the ESXi server cf-dogo-esxi-01 has a local datastore called local-ssd-dogo-01.

This is the CPI config for bosh-init:

    vcenter:
      address: X.X.X.X
      user: user
      password: pass
      datacenters:
      - name: XXXXX
        vm_folder: CF_Test/Bosh_VMs
        template_folder: CF_Test/Bosh_Templates
        datastore_pattern: local-ssd*
        persistent_datastore_pattern: data_cf_gold_*
        disk_path: bosh_test_disks
        clusters:
          - CF_rone: {}
          - CF_dogo: {}
          - Online_Prod: {}

The main issue arises when an ESXi server is in maintenance mode (or disconnected). The CPI still sees that hypervisor and tries to deploy VMs to it, which of course fails. The CPI does not seem to take the server's status (maintenance mode and/or disconnected) into account.

The workaround we found is to restrict the list of datastores with a datastore pattern such as (local-ssd-rone-[0-9]{2})|(local-ssd-dogo-[0-9]{1}[0-2,4-9]{1}). In this case, the goal is to avoid the server/datastore local-ssd-dogo-03. We would rather not do it this way, because it requires redeploying BOSH with bosh-init (or hacking the BOSH director VM and the CPI JSON config).
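You can sanity-check a pattern like this against sample names before redeploying, assuming the CPI evaluates datastore_pattern as a Ruby regular expression. The class `[0-2,4-9]` also excludes dogo-13, dogo-23 and so on. The comma inside the brackets is matched literally.

```ruby
# The workaround's datastore_pattern, checked against sample datastore names.
pattern = /(local-ssd-rone-[0-9]{2})|(local-ssd-dogo-[0-9]{1}[0-2,4-9]{1})/
names = %w[local-ssd-rone-01 local-ssd-dogo-01 local-ssd-dogo-03 local-ssd-dogo-13]

p names.grep(pattern)                        # datastores the CPI could pick
p names.reject { |n| n.match?(pattern) }     # datastores excluded by the pattern
```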

Is there a way to filter out ESXi servers that are in maintenance mode (or disconnected)?

Thanks!
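The vSphere API exposes `runtime.connectionState` and `runtime.inMaintenanceMode` on each HostSystem, so the CPI could skip such hosts with a filter along these lines. This is a minimal sketch in which a plain `Host` struct stands in for the API objects. It is not the CPI's code, and `placement_datastores` is a hypothetical name.

```ruby
# Simplified stand-in for a HostSystem and the datastores mounted on it.
Host = Struct.new(:name, :connection_state, :in_maintenance_mode, :datastores)

# A host is usable only if vCenter reports it connected and not in maintenance mode.
def usable_host?(host)
  host.connection_state == 'connected' && !host.in_maintenance_mode
end

# Datastores reachable from at least one usable host, filtered by the pattern.
def placement_datastores(hosts, pattern)
  hosts.select { |h| usable_host?(h) }
       .flat_map(&:datastores)
       .uniq
       .grep(pattern)
end

hosts = [
  Host.new('cf-dogo-esxi-01', 'connected', false, ['local-ssd-dogo-01']),
  Host.new('cf-dogo-esxi-02', 'disconnected', false, ['local-ssd-dogo-02']),
  Host.new('cf-dogo-esxi-03', 'connected', true, ['local-ssd-dogo-03'])
]
p placement_datastores(hosts, /\Alocal-ssd/) # prints ["local-ssd-dogo-01"]
```

With local storage, each datastore belongs to exactly one host, so dropping unusable hosts also drops their datastores from placement.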

CPI tries to connect to ESX hosts - is there any other way?

We're trying to deploy a BOSH director to our vSphere/ESX cluster. However, it fails when uploading the stemcell because the CPI tries to connect to the ESX hosts:

bosh create-env bosh-deployment/bosh.yml \
     --state=state.json \
     --vars-store=creds.yml \
     -o bosh-deployment/vsphere/cpi.yml \
...
     -v vcenter_ip=<IP of VCenter> \
...

Error:

...
Starting registry... Finished (00:00:00)
Uploading stemcell 'bosh-vsphere-esxi-ubuntu-trusty-go_agent/3421.4'... Failed (00:00:08)
Stopping registry... Finished (00:00:00)
Cleaning up rendered CPI jobs... Finished (00:00:00)

creating stemcell (bosh-vsphere-esxi-ubuntu-trusty-go_agent 3421.4):
  CPI 'create_stemcell' method responded with error: CmdError{"type":"Unknown","message":"Connection refused - Connection refused - connect(2) for \"<hostname of ESX host>\" port 443 (<hostname of ESX host>:443)","ok_to_retry":false}

Exit code 1

The docs mention this:

The vSphere CPI requires access to port 80/443 for all the ESXi hosts in your vSphere resource pool(s). In order to upload stemcells to vSphere, the vSphere CPI makes use of an API call that returns a URL that the CPI should make a POST request to in order to upload the stemcell. This URL could have a hostname that resolves to any one of the ESXi hosts that are associated with your vSphere resource pool(s).

We expected the CPI to need to communicate only with the vSphere API. In our case, communication with the ESX hosts is not possible because they are on a completely different network segment. I suspect many large enterprises have a similar setup.

Is there a way to get the CPI running without direct interaction to the ESX hosts?
Any suggestions would be greatly appreciated!

disk_cid might exceed 255 characters

Hi there

I've found a potential bug when using custom datastore placement for disk types in my cloud-config.
The result is this error:
Error: Unknown CPI error 'InvalidCall' with message 'Arguments are not correct, details: 'invalid base64'' in 'attach_disk' CPI method

I've traced the problem back to the DB schema definition. With MySQL, the disk_cid column is of type varchar(255). MySQL cuts off any characters after the 255th, so data is lost (the metadata, to be precise).

The database also has a cloud_properties_json column, which seems to contain the same data as the base64-encoded metadata. So I'm not sure the base64-encoded metadata in the cid field is really needed.

Can you point me to a workaround or is this a thing which can easily be fixed?
Thanks for the help!

Used CPI-release: 45.1.0
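The minimal sketch below shows why a cid that carries encoded metadata can overflow varchar(255), and why truncation surfaces as "invalid base64". The cid format here (`disk-<uuid>.<base64 JSON>`) is hypothetical and may not match the CPI's exact encoding. Ruby's strict base64 decoding (`unpack1('m0')`) does raise ArgumentError with the message "invalid base64" on malformed input.

```ruby
require 'json'
require 'securerandom'

MAX_CID_LENGTH = 255 # disk_cid is varchar(255) in the director's MySQL schema

# Hypothetical cid format: "disk-<uuid>.<strict base64 of the JSON metadata>".
def build_disk_cid(metadata)
  encoded = [JSON.generate(metadata)].pack('m0') # strict base64, no newlines
  "disk-#{SecureRandom.uuid}.#{encoded}"
end

def decode_metadata(cid)
  JSON.parse(cid.split('.', 2).last.unpack1('m0'))
end

metadata = { 'datastores' => (1..12).map { |i| "persistent-ds-#{i}" } }
cid = build_disk_cid(metadata)
puts "cid length: #{cid.length}, fits varchar(255): #{cid.length <= MAX_CID_LENGTH}"

# What a column that silently truncates at 255 characters would hand back.
truncated = cid[0, MAX_CID_LENGTH]
begin
  decode_metadata(truncated)
rescue ArgumentError => e
  puts "decoding the truncated cid failed: #{e.message}"
end
```

Base64 grows the metadata by about a third, so a modest list of datastore names is enough to cross the limit.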

Allow memory reservation - by default or parameterized

In src/vsphere_cpi/lib/cloud/vsphere/vm_creator.rb, line 46 would look something like this:

config_hash = {memory_mb: @memory, num_cpus: @cpu, memory_allocation: VimSdk::Vim::ResourceAllocationInfo.new(reservation: @memory)}

for forced reservation.

We were short on disk space in our cluster, and forcing memory reservation helped by eliminating the swap files.
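The disk-space effect follows from how vSphere sizes the per-VM swap (.vswp) file: configured memory minus the memory reservation. Below is a minimal sketch of a parameterized version. It assumes a hypothetical `memory_reservation_fraction` cloud property, where 1.0 corresponds to the forced full reservation in the config_hash above.

```ruby
# vSphere sizes a VM's .vswp file as configured memory minus its memory
# reservation (both in MB), so a full reservation needs no swap file space.
def swap_file_mb(memory_mb, reservation_mb)
  raise ArgumentError, 'reservation exceeds configured memory' if reservation_mb > memory_mb

  memory_mb - reservation_mb
end

# Hypothetical cloud property: fraction of the VM's memory to reserve (0.0-1.0).
def reservation_mb(memory_mb, cloud_properties)
  fraction = cloud_properties.fetch('memory_reservation_fraction', 0.0)
  (memory_mb * fraction).floor
end

memory = 8192
[0.0, 0.5, 1.0].each do |fraction|
  reserved = reservation_mb(memory, { 'memory_reservation_fraction' => fraction })
  puts "reserve #{reserved} MB -> swap file #{swap_file_mb(memory, reserved)} MB"
end
```

A parameter rather than an unconditional default lets operators trade cluster memory headroom for datastore space per vm_type.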

CPI 'create_vm' method responded with error: CmdError{"type":"Unknown","message":"Invalid configuration for device '11'

Recently we've been hitting this issue with both bosh-init and bosh deploy using the vSphere CPI: VM creation fails during the clone phase of the CPI call.

We tried multiple CPI versions; the debug logs below are from CPI v38.

For bosh-init
The first deploy has no issues. Deploying again after changing the deployment manifest makes bosh-init fail with the device error.

If the stemcell block is removed from the state file and bosh-init deploy is run again, everything goes smoothly. In other words, it works every time a fresh stemcell is uploaded.

While debugging, we got this stack trace from the bosh-init debug logs:

I, [2017-03-02T03:46:09.504663 #18705]  INFO -- : Cloning vm: (VSphereCloud::Resources::VM (cid="sc-7b961d98-857b-4590-9ff5-8d2c020471e4")) to vm-78b3a003-3e92-46a5-96be-68579ce27a37
D, [2017-03-02T03:46:09.504803 #18705] DEBUG -- : Running method 'FindByInventoryPath'...
D, [2017-03-02T03:46:09.508589 #18705] DEBUG -- : Running method 'RetrieveProperties'...
D, [2017-03-02T03:46:09.513017 #18705] DEBUG -- : Running method 'CloneVM_Task'...
D, [2017-03-02T03:46:09.534150 #18705] DEBUG -- : Running method 'RetrievePropertiesEx'...
D, [2017-03-02T03:46:09.539643 #18705] DEBUG -- : Starting task 'VirtualMachine.clone'...
D, [2017-03-02T03:46:09.540111 #18705] DEBUG -- : Running method 'RetrievePropertiesEx'...
D, [2017-03-02T03:46:10.546503 #18705] DEBUG -- : Running method 'RetrievePropertiesEx'...
D, [2017-03-02T03:46:11.553379 #18705] DEBUG -- : Running method 'RetrievePropertiesEx'...
D, [2017-03-02T03:46:12.599832 #18705] DEBUG -- : Running method 'RetrievePropertiesEx'...
D, [2017-03-02T03:46:14.184733 #18705] DEBUG -- : Running method 'RetrievePropertiesEx'...
D, [2017-03-02T03:46:16.583993 #18705] DEBUG -- : Running method 'RetrievePropertiesEx'...
D, [2017-03-02T03:46:20.216892 #18705] DEBUG -- : Running method 'RetrievePropertiesEx'...
W, [2017-03-02T03:46:20.225565 #18705]  WARN -- : Error running task 'VirtualMachine.clone'. Failed with message 'Invalid configuration for device '11'.'.

<RetrievePropertiesExResponse xmlns="urn:vim25"><returnval><objects><obj type="Task">task-322042</obj><propSet><name>info.descriptionId</name><val xsi:type="xsd:string">VirtualMachine.clone</val></propSet><propSet><name>info.entity</name><val type="VirtualMachine" xsi:type="ManagedObjectReference">vm-60846</val></propSet><propSet><name>info.error</name><val xsi:type="LocalizedMethodFault"><fault xsi:type="InvalidDeviceSpec"><property>virtualDeviceSpec.device.backing.parent.fileName</property><deviceIndex>11</deviceIndex></fault><localizedMessage>Invalid configuration for device &apos;11&apos;.</localizedMessage></val></propSet><propSet><name>info.name</name><val xsi:type="xsd:string">CloneVM_Task</val></propSet><propSet><name>info.state</name><val xsi:type="TaskInfoState">error</val></propSet></objects></returnval></RetrievePropertiesExResponse>
</soapenv:Body>
</soapenv:Envelope>
Rescued Unknown: Invalid configuration for device '11'.. backtrace: /home/.bosh_init/installations/fde44370-8f88-45df-7524-6d792d706eb1/packages/vsphere_cpi/lib/cloud/vsphere/retryer.rb:13:in `try'
/home/.bosh_init/installations/fde44370-8f88-45df-7524-6d792d706eb1/packages/vsphere_cpi/lib/cloud/vsphere/task_runner.rb:12:in `run'
/home/.bosh_init/installations/fde44370-8f88-45df-7524-6d792d706eb1/packages/vsphere_cpi/lib/cloud/vsphere/vcenter_client.rb:50:in `wait_for_task'
/home/.bosh_init/installations/fde44370-8f88-45df-7524-6d792d706eb1/packages/vsphere_cpi/lib/cloud/vsphere/vm_creator.rb:88:in `create'
/home/.bosh_init/installations/fde44370-8f88-45df-7524-6d792d706eb1/packages/vsphere_cpi/lib/cloud/vsphere/cloud.rb:259:in `block in create_vm'
/home/.bosh_init/installations/fde44370-8f88-45df-7524-6d792d706eb1/packages/vsphere_cpi/vendor/bundle/ruby/2.2.0/gems/bosh_common-1.3262.24.0/lib/common/thread_formatter.rb:49:in `with_thread_name'
/home/.bosh_init/installations/fde44370-8f88-45df-7524-6d792d706eb1/packages/vsphere_cpi/lib/cloud/vsphere/cloud.rb:206:in `create_vm'
/home/.bosh_init/installations/fde44370-8f88-45df-7524-6d792d706eb1/packages/vsphere_cpi/vendor/bundle/ruby/2.2.0/gems/bosh_cpi-2.0.1/lib/bosh/cpi/cli.rb:71:in `public_send'
/home/.bosh_init/installations/fde44370-8f88-45df-7524-6d792d706eb1/packages/vsphere_cpi/vendor/bundle/ruby/2.2.0/gems/bosh_cpi-2.0.1/lib/bosh/cpi/cli.rb:71:in `run'
/home/.bosh_init/installations/fde44370-8f88-45df-7524-6d792d706eb1/packages/vsphere_cpi/bin/vsphere_cpi:42:in `<main>'
[File System] 2017/03/06 09:05:36 DEBUG - Remove all /home/.bosh_init/installations/fde44370-8f88-45df-7524-6d792d706eb1/jobs/vsphere_cpi
[File System] 2017/03/06 09:05:36 DEBUG - Remove all /home/.bosh_init/installations/fde44370-8f88-45df-7524-6d792d706eb1/blobs/5a85bf97-a29c-4bf6-40cc-ca4be17221a5
[File System] 2017/03/06 09:05:36 DEBUG - Remove all /home/.bosh_init/installations/fde44370-8f88-45df-7524-6d792d706eb1/tmp/stemcell-manager347632596
[File System] 2017/03/06 09:05:36 DEBUG - Remove all /home/.bosh_init/installations/fde44370-8f88-45df-7524-6d792d706eb1/tmp/bosh-init-release065385170
[File System] 2017/03/06 09:05:36 DEBUG - Remove all /home/.bosh_init/installations/fde44370-8f88-45df-7524-6d792d706eb1/tmp/bosh-init-release627218441
[main] 2017/03/06 09:05:36 ERROR - Command 'deploy' failed: Deploying: Creating instance 'bosh/0': Creating VM: Creating vm with stemcell cid 'sc-7b961d98-857b-4590-9ff5-8d2c020471e4': CPI 'create_vm' method responded with error: CmdError{"type":"Unknown","message":"Invalid configuration for device '11'.","ok_to_retry":false}

The fault names <fault xsi:type="InvalidDeviceSpec"><property>virtualDeviceSpec.device.backing.parent.fileName</property><deviceIndex>11</deviceIndex> as the invalid device spec.

Going through a few VMware docs, we found that this property is used when Performing Advanced Manipulation of Delta Disks for linked clones.

References:
Vmware linked clone notes
Vmware spec - parent property is described here in detail which is used during clone phase
CPI code for backing spec: the optional parent property is currently not set in this spec

We observed another thing that is probably the cause of the above issue:

When updating the VM with bosh-init, the delete phase also deletes the .vmdk file in the base stemcell directory. That shouldn't happen, because the same vmdk is needed again in the next VM creation phase.

Contents of stemcell dir on datastore after first Bosh-init deploy

Contents of stemcell dir on datastore after manifest update & Bosh-init deploy

Started deploying
  Waiting for the agent on VM 'vm-7cf258dd-85d8-4a6c-8267-fa75a07b4ddf'... Finished (00:00:00)
  Stopping jobs on instance 'unknown/0'... Finished (00:00:01)
  Unmounting disk 'disk-e8ebd8de-da10-4feb-ad9b-180efa333662'... Finished (00:00:01)
  Deleting VM 'vm-7cf258dd-85d8-4a6c-8267-fa75a07b4ddf'... Finished (00:01:01)
  Creating VM for instance 'bosh/0' from stemcell 'sc-4c722f61-e085-4bea-a377-4ab6518818fd'... Failed (00:02:16)
Failed deploying (00:03:23)

Stopping registry... Finished (00:00:00)
Cleaning up rendered CPI jobs... Finished (00:00:00)

Command 'deploy' failed:
  Deploying:
    Creating instance 'bosh/0':
      Creating VM:
        Creating vm with stemcell cid 'sc-4c722f61-e085-4bea-a377-4ab6518818fd':
          CPI 'create_vm' method responded with error: CmdError{"type":"Unknown","message":"Invalid configuration for device '11'.","ok_to_retry":false}

unknown variable in an error message

#{datacenters.inspect} references a variable that does not exist.

if vm_datacenters.size > 1
  raise "stemcell VM #{vm.inspect} found in multiple datacenters #{datacenters.inspect}"
end

https://github.com/cloudfoundry-incubator/bosh-vsphere-cpi-release/blob/v46.1.0/src/vsphere_cpi/lib/cloud/vsphere/cloud.rb#L583-L585
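The fix is presumably to interpolate `vm_datacenters`, the variable the condition already checks. Below, the snippet is wrapped in a hypothetical `ensure_single_datacenter!` method so it can run on its own.

```ruby
# Hypothetical wrapper around the snippet, with the interpolation fixed to use
# vm_datacenters, the variable the condition already checks.
def ensure_single_datacenter!(vm, vm_datacenters)
  if vm_datacenters.size > 1
    raise "stemcell VM #{vm.inspect} found in multiple datacenters #{vm_datacenters.inspect}"
  end
end

ensure_single_datacenter!('sc-1234', ['dc1']) # one datacenter: no error
begin
  ensure_single_datacenter!('sc-1234', %w[dc1 dc2])
rescue RuntimeError => e
  puts e.message
end
```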

line 38: #<RuntimeError: cluters property must be an array

Assuming you meant clusters?

The CPI doesn't delete the NSX security group created for a particular VM after that VM is deleted.

Because of this, we currently have 3000+ security groups. This sometimes causes new deployments to fail because this method times out.

Screenshots for NSX blog post

Support configuring vsphere memory ballooning settings

We recently ran into an issue where our bosh deployed VMs were having their memory ballooned away causing issues for our deployment.

It would be nice to be able to configure the sched.mem.maxmemctl parameter per vm_type. This would let us make sure that BOSH-deployed VMs sensitive to memory ballooning only have an appropriate amount of memory ballooned away from them.

Here are some links talking about ballooning and how to disable/configure it.

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1002586

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1003586
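Below is a minimal sketch of a per-vm_type setting. It assumes a hypothetical `max_balloon_percent` cloud property. sched.mem.maxmemctl is expressed in MB, and setting it to 0 disables ballooning for the VM, as described in the KB articles above. The helper only builds the key/value pair and does not talk to vCenter.

```ruby
# Hypothetical sketch: turn a per-vm_type percentage into a sched.mem.maxmemctl
# value, i.e. the most memory (in MB) ballooning may reclaim from the VM.
def maxmemctl_mb(memory_mb, max_balloon_percent)
  unless (0..100).cover?(max_balloon_percent)
    raise ArgumentError, 'max_balloon_percent must be between 0 and 100'
  end

  (memory_mb * max_balloon_percent / 100).floor
end

# Returns extraConfig-style key/value pairs; an empty list keeps vSphere's default.
def balloon_extra_config(memory_mb, cloud_properties)
  percent = cloud_properties['max_balloon_percent']
  return [] if percent.nil?

  [{ key: 'sched.mem.maxmemctl', value: maxmemctl_mb(memory_mb, percent).to_s }]
end

p balloon_extra_config(8192, { 'max_balloon_percent' => 25 }) # at most 2048 MB ballooned
p balloon_extra_config(8192, { 'max_balloon_percent' => 0 })  # ballooning disabled
p balloon_extra_config(8192, {})                              # vSphere default
```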

Could not find 'VimSdk::Vim::ResourcePool

Taken from cloudfoundry/cf-mysql-release#134

Could anyone help me on this please?

Started compiling packages
Started compiling packages > cluster-schema-verifier/a2f8659250e5b51a7f5d0b0fdff5dbdc5ee98895
Started compiling packages > cli/41edd6a70de171fcefd9328c5a616a03e9c88677
Started compiling packages > mysqlclient/ce95f8ac566f76b650992987d5282ee473356e43
Started compiling packages > ruby/a57562f234ecf56581a73f2e18715f19ea74e9f1
Failed compiling packages > mysqlclient/ce95f8ac566f76b650992987d5282ee473356e43: Unknown CPI error 'Unknown' with message 'Could not find 'VimSdk::Vim::ResourcePool': {:root=>
#<VimSdk::Vim::ResourcePool:0x005559933b4380

Datastore disk placement issue in v47

Hi,

We deployed a bosh director specifying the parameters listed at https://bosh.io/docs/init-vsphere/. What we noticed though was a strange behavior when deploying VMs.
Every ephemeral disk used the same datastore until it filled up. At that point we got the error Error: Unknown CPI error 'Unknown' with message 'Module 'MonitorLoop' power on failed. ' in 'create_vm' CPI method.

Following the documentation, we tried multiple configurations. These ranged from changing the regex in the init command to specifying the datastores/datastore cluster for persistent and ephemeral disks in the cloud config, as per https://bosh.io/docs/vsphere-cpi/#resource-pools. None of those options worked; we saw the same behavior every time.

Our final regex looked like ^(DS0|DS1|DS2|DS3|DS4|DS5|DS6)$, and even then only a single datastore was picked every time.

The storage configuration consists of a single datastore cluster with 6 datastores, with SDRS enabled.
Every physical host has access to all 6 datastores.

Interestingly, switching the CPI version from v47 to v45.1.0 with the same regex gave the expected disk placement, as implemented in https://github.com/cloudfoundry-incubator/bosh-vsphere-cpi-release/blob/master/src/vsphere_cpi/lib/cloud/vsphere/datastore_picker.rb#L14.

v47 added support for SDRS, and I wonder whether that affects the behavior described above. Is additional configuration required?

We didn't check if the same applies for v48 or v49.

Cluster placement not working correctly for local storage persistent disks

We seem to be running into a bug where, if local storage is specified, the CPI does not check whether the host it places the VM on has enough persistent disk space. Ephemeral disks are on shared storage and persistent disks are on local storage. We attached our BOSH log for further clarification.
bosh_deployment_log.txt

Based on the documentation this should be a supported configuration:

VMs are placed on clusters and datastores based on a weighted random algorithm. The weights are calculated by how many times the requested memory, ephemeral and persistent disk could fit on the cluster.

Source: https://bosh.io/docs/vsphere-cpi.html#vm-placement

bosh: 264.5.0
bosh-vsphere-cpi: 45
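Below is one reading of the placement rule quoted above, as a minimal sketch. Each cluster's weight is the number of times the whole request fits, bounded by the scarcest resource. A cluster is then drawn at random in proportion to its weight. The `Cluster` struct and its field names are assumptions, not the CPI's data model.

```ruby
# Simplified stand-in for a cluster's free capacity (all values in MB).
Cluster = Struct.new(:name, :free_memory_mb, :free_ephemeral_mb, :free_persistent_mb)

# How many times the whole request fits; the scarcest resource sets the weight.
# Resources the request does not need (0 MB) are ignored; memory is assumed > 0.
def fit_count(cluster, request)
  [
    [cluster.free_memory_mb, request[:memory_mb]],
    [cluster.free_ephemeral_mb, request[:ephemeral_mb]],
    [cluster.free_persistent_mb, request[:persistent_mb]]
  ].reject { |_, wanted| wanted.zero? }
   .map { |free, wanted| free / wanted }
   .min
end

# Weighted random choice among clusters that can fit the request at least once.
def pick_cluster(clusters, request, rng: Random.new)
  weighted = clusters.map { |c| [c, fit_count(c, request)] }.select { |_, w| w > 0 }
  raise 'no cluster can fit the request' if weighted.empty?

  point = rng.rand(weighted.sum { |_, w| w }) # integer in [0, total)
  weighted.each do |cluster, weight|
    return cluster if point < weight
    point -= weight
  end
end

clusters = [
  Cluster.new('cluster-a', 65_536, 500_000, 200_000),
  Cluster.new('cluster-b', 65_536, 500_000, 10_000) # little persistent space left
]
request = { memory_mb: 4096, ephemeral_mb: 10_240, persistent_mb: 20_480 }
p clusters.map { |c| [c.name, fit_count(c, request)] }
puts pick_cluster(clusters, request).name
```

Suppose persistent disks live on host-local datastores. Then `free_persistent_mb` would have to be computed for the host the VM actually lands on, not summed across the cluster. That per-host check is what the report suggests is missing.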

Need to handle opaque network backing for virtual nic

Since VMware vSphere 6.5 supports opaque networks, please add a virtual NIC backing for opaque network attachment when creating a virtual NIC.

https://pubs.vmware.com/vsphere-65/index.jsp#com.vmware.vspsdk.apiref.doc/vim.vm.device.VirtualDevice.BackingInfo.html
https://pubs.vmware.com/vsphere-65/index.jsp#com.vmware.vspsdk.apiref.doc/vim.vm.device.VirtualEthernetCard.OpaqueNetworkBackingInfo.html

NSX integration returns error "The object vm-86479 is already present in the system."

Steps to Reproduce

1. Deploy a director (v258+) with NSX credentials correctly configured in global properties.
2. Create a deployment where the deployment name and the instance_group name are the same.
3. Verify that the deploy fails with:

Error: Unknown CPI error 'Unknown' with message 'Failed to add VM to Security Group with unknown NSX error: '<?xml version="1.0" encoding="UTF-8"?>
<error><details>The object vm-86478 is already present in the system.</details><errorCode>203</errorCode><moduleName>core-services</moduleName></error>'' in 'create_vm' CPI method

Expected Result

The CPI realizes that the VM is already in that group and continues without error.
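Below is a minimal sketch of the expected behavior, assuming the NSX-V error body has the shape shown above. `NsxError`, `parse_nsx_error` and `ignore_already_present` are hypothetical helpers, not the CPI's code. Error code 203 is taken from the report.

```ruby
require 'rexml/document'

# Error code from the report: "The object ... is already present in the system."
ALREADY_PRESENT = 203

# Hypothetical error type carrying the NSX-V error code.
class NsxError < StandardError
  attr_reader :code

  def initialize(code, details)
    @code = code
    super("NSX error #{code}: #{details}")
  end
end

# Parses an NSX-V <error> response body into an NsxError.
def parse_nsx_error(body)
  root = REXML::Document.new(body).root
  NsxError.new(root.elements['errorCode'].text.to_i, root.elements['details'].text)
end

# Runs the block; "already present" counts as success, anything else re-raises.
def ignore_already_present
  yield
rescue NsxError => e
  raise unless e.code == ALREADY_PRESENT
  :already_member
end

body = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <error><details>The object vm-86478 is already present in the system.</details><errorCode>203</errorCode><moduleName>core-services</moduleName></error>
XML
p ignore_already_present { raise parse_nsx_error(body) }
```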

VMFork with BOSH

With vSphere 6.7, VMware made it possible to invoke VMFork through the API. This means one can power on hundreds of VMs at once using linked-clone technology.
This could be of great value for BOSH. Anyone working with BOSH knows the pain of waiting minutes for each VM to come up, which is unacceptable compared to public clouds. With VMFork, BOSH could create and power on many VMs at the same time, greatly shrinking deployment and operations time.
I believe this would be of great value to our customers and have a real impact for them.

Thanks,
Niran

ability to specify drs rules without overriding datacenter and cluster

Currently, DRS rules have to be specified as follows (nested under the cluster):

cloud_properties:
  datacenters:
  - name: dc1
    clusters:
    - cluster1:
        drs_rules:
        - name: rule1
          type: separate_vms

This means one has to know the datacenter and cluster names when specifying rules (the dc/cluster configuration is typically specified under an AZ).

Ideally, drs_rules could be specified at the top level of cloud_properties so that one does not need to know about the dc/cluster. Example:

cloud_properties:
  datacenters:
  - name: dc1
    clusters:
    - cluster1:
  drs_rules:
  - name: rule1
    type: separate_vms

This way, drs_rules can be specified in a vm_extension without clobbering the AZ configuration (dc/cluster).

cc @cppforlife
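Below is a minimal sketch of how top-level drs_rules could be pushed down into each cluster before the existing per-cluster handling runs. It assumes cloud_properties arrive as string-keyed hashes shaped like the YAML above. `push_down_drs_rules` is hypothetical. Letting cluster-level rules win is a design choice here, not documented behavior.

```ruby
# Hypothetical sketch: copy top-level drs_rules into every cluster entry so a
# vm_extension can supply rules without repeating dc/cluster names.
def push_down_drs_rules(cloud_properties)
  rules = cloud_properties['drs_rules']
  return cloud_properties if rules.nil?

  result = cloud_properties.reject { |key, _| key == 'drs_rules' }
  result['datacenters'] = cloud_properties.fetch('datacenters', []).map do |dc|
    clusters = dc.fetch('clusters', []).map do |cluster|
      # Each cluster entry is a one-key hash: { name => config-or-nil }.
      # Cluster-level drs_rules, if present, take precedence.
      cluster.to_h { |name, config| [name, { 'drs_rules' => rules }.merge(config || {})] }
    end
    dc.merge('clusters' => clusters)
  end
  result
end

cloud_properties = {
  'datacenters' => [{ 'name' => 'dc1', 'clusters' => [{ 'cluster1' => nil }] }],
  'drs_rules' => [{ 'name' => 'rule1', 'type' => 'separate_vms' }]
}
p push_down_drs_rules(cloud_properties)
```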

Bump ruby version to 2.4-r3

As part of a recent BOSH story, we bumped the Ruby version for the director, the ruby bosh-package, and the AWS CPI to 2.4-r3. This change should also happen in bosh-vsphere-cpi-release. The steps are:

1. Check out the bosh-packages ruby release.
2. cd into bosh-vsphere-cpi-release.
3. With a recent version of the bosh CLI, run bosh vendor-package ruby-2.4-r3 path/to/bosh-packages/ruby-release
4. Change all ruby-2.4 references in bosh-vsphere-cpi-release to ruby-2.4-r3. Find-and-replace should be good enough.
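For the find-and-replace step, a plain replacement has one trap: running it twice turns ruby-2.4-r3 into ruby-2.4-r3-r3. Below is a minimal sketch of an idempotent replacement. The helper names are hypothetical.

```ruby
# Matches "ruby-2.4" only when it is not followed by a word character, dot, or
# dash, so "ruby-2.4-r3" and "ruby-2.4.1" are left alone.
OLD_RUBY_REF = /ruby-2\.4(?![\w.-])/

def bump_ruby_refs(text)
  text.gsub(OLD_RUBY_REF, 'ruby-2.4-r3')
end

# Rewrites the given files in place and returns the paths that changed.
def bump_ruby_refs_in(paths)
  paths.select do |path|
    original = File.read(path)
    updated = bump_ruby_refs(original)
    next false if updated == original

    File.write(path, updated)
    true
  end
end

puts bump_ruby_refs("dependencies:\n- ruby-2.4\n")
puts bump_ruby_refs('ruby-2.4-r3') # unchanged on a second run
```

For example, from the release root, `bump_ruby_refs_in(Dir.glob('{jobs,packages}/**/*').select { |f| File.file?(f) })` would touch job and package files. Adjust the glob to the files you actually want changed.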

Bosh-init deployment to vsphere environment fails due to encoding issues

We've been trying to deploy bosh into our vSphere environment for a few weeks without much success. HUGE THANK YOU goes out to @mjavault for tracking down and documenting this bug. All of the credit for the content below goes to @mjavault.

When we try to run bosh-init deploy manifest.yml against our vSphere deployment, we get the following error:

Deployment manifest: '/git/manifest.yml'
Deployment state: '/git/manifest-state.json'

Started validating
  Downloading release 'bosh'... Skipped [Found in local cache] (00:00:00)
  Validating release 'bosh'... Finished (00:00:02)
  Downloading release 'bosh-vsphere-cpi'... Skipped [Found in local cache] (00:00:00)
  Validating release 'bosh-vsphere-cpi'... Finished (00:00:00)
  Validating cpi release... Finished (00:00:00)
  Validating deployment manifest... Finished (00:00:00)
  Downloading stemcell... Finished (00:01:03)
  Validating stemcell... Finished (00:00:03)
Finished validating (00:01:09)

Started installing CPI
  Compiling package 'vsphere_cpi_mkisofs/b3ebe039dae6a312784ece4da34d66053d1dfbba'... Finished (00:02:43)
  Compiling package 'vsphere_cpi_ruby/3ce375f2863799664bff235e2c778a3131f1e981'... Finished (00:01:47)
  Compiling package 'vsphere_cpi/6cce7d152770ee8a2d2309d2278cc83af878757d'... Finished (00:00:52)
  Installing packages... Finished (00:00:00)
  Rendering job templates... Finished (00:00:00)
  Installing job 'vsphere_cpi'... Finished (00:00:00)
Finished installing CPI (00:05:24)

Starting registry... Finished (00:00:00)
Uploading stemcell 'bosh-vsphere-esxi-ubuntu-trusty-go_agent/3149'... Failed (00:01:56)
Stopping registry... Finished (00:00:00)
Cleaning up rendered CPI jobs... Finished (00:00:00)

Command 'deploy' failed:
  creating stemcell (bosh-vsphere-esxi-ubuntu-trusty-go_agent 3149):
    Unmarshalling external CPI command output: STDOUT: '', STDERR: 'at depth 1 - 19: self signed certificate in certificate chain
I, [2016-01-07T16:06:31.576680 #26031]  INFO -- : Extracting stemcell to: /tmp/d20160107-26031-f6jcqs
I, [2016-01-07T16:06:36.033533 #26031]  INFO -- : Generated name: sc-f8687c5b-30bf-4f10-99e4-4ad8f6cf9901
D, [2016-01-07T16:06:36.033812 #26031] DEBUG -- : All clusters provided: {"cluster1"=>#<VSphereCloud::ClusterConfig:0x007f182922c4d0 @name="cluster1", @config={}>}
at depth 1 - 19: self signed certificate in certificate chain
at depth 1 - 19: self signed certificate in certificate chain
D, [2016-01-07T16:08:04.610612 #26031] DEBUG -- : cluster1 ephemeral disk bound
D, [2016-01-07T16:08:04.610796 #26031] DEBUG -- : Acceptable clusters: [[<Cluster: <[Vim.ClusterComputeResource] domain-c63525> / cluster1>, 5929]]
D, [2016-01-07T16:08:04.610881 #26031] DEBUG -- : Choosing cluster by weighted random
D, [2016-01-07T16:08:04.610955 #26031] DEBUG -- : Selected cluster 'cluster1'
D, [2016-01-07T16:08:04.611021 #26031] DEBUG -- : Looking for a ephemeral datastore in cluster1 with 529MB free space.
D, [2016-01-07T16:08:04.611094 #26031] DEBUG -- : All datastores within cluster cluster1: ["Somerville 3PAR 2 VM 5 (3137714MB free of 4194048MB capacity)"]
D, [2016-01-07T16:08:04.611172 #26031] DEBUG -- : Datastores with enough space: ["Somerville 3PAR 2 VM 5 (3137714MB free of 4194048MB capacity)"]
I, [2016-01-07T16:08:04.611252 #26031]  INFO -- : Deploying to: <[Vim.ClusterComputeResource] domain-c63525> / <[Vim.Datastore] datastore-188772>
I, [2016-01-07T16:08:05.045048 #26031]  INFO -- : Importing VApp
I, [2016-01-07T16:08:05.141158 #26031]  INFO -- : Waiting for NFC lease to become ready
I, [2016-01-07T16:08:07.168564 #26031]  INFO -- : Uploading
I, [2016-01-07T16:08:07.178566 #26031]  INFO -- : Uploading disk to: https://vmhost/nfc/52b32078-1cfe-c4f3-a2e8-94769d02a483/disk-0.vmdk
at depth 0 - 20: unable to get local issuer certificate
I, [2016-01-07T16:08:22.260389 #26031]  INFO -- : Removing NICs
I, [2016-01-07T16:08:23.315795 #26031]  INFO -- : Taking initial snapshot
/home/user/.bosh_init/installations/7f8327b7-578b-4d35-7a62-2ee15c504e92/packages/vsphere_cpi_ruby/lib/ruby/2.1.0/json/common.rb:223:in `encode': "\xC2" on US-ASCII (Encoding::InvalidByteSequenceError)
    from /home/user/.bosh_init/installations/7f8327b7-578b-4d35-7a62-2ee15c504e92/packages/vsphere_cpi_ruby/lib/ruby/2.1.0/json/common.rb:223:in `generate'
    from /home/user/.bosh_init/installations/7f8327b7-578b-4d35-7a62-2ee15c504e92/packages/vsphere_cpi_ruby/lib/ruby/2.1.0/json/common.rb:223:in `generate'
    from /home/user/.bosh_init/installations/7f8327b7-578b-4d35-7a62-
        from /home/user/.bosh_init/installations/7f8327b7-578b-4d35-7a62-2ee15c504e92/packages/vsphere_cpi/vendor/bundle/ruby/2.1.0/gems/bosh_cpi-1.3093.0/lib/bosh/cpi/cli.rb:114:in `result_response'
        from /home/user/.bosh_init/installations/7f8327b7-578b-4d35-7a62-2ee15c504e92/packages/vsphere_cpi/vendor/bundle/ruby/2.1.0/gems/bosh_cpi-1.3093.0/lib/bosh/cpi/cli.rb:82:in `run'
        from /home/user/.bosh_init/installations/7f8327b7-578b-4d35-7a62-2ee15c504e92/packages/vsphere_cpi/bin/vsphere_cpi:37:in `<main>'
':
      unexpected end of JSON input

The deploy process downloads a huge XML file that contains UTF-8 characters. Ruby parses that file properly, but the code also adds the full output as a string to a hash, for logging purposes. Later, that hash is converted to JSON, and the encoder fails because it finds invalid characters: UTF-8 characters in an ASCII string.

code extract, cli.rb:

  def result_response(result)
    hash = {
      result: result,
      error: nil,
      log: @logs_string_io.string,
    }
    @result_io.print(JSON.dump(hash)); nil
  end

The problem is log: @logs_string_io.string, where the UTF-8 encoding is lost. I doubt the variable is actually very useful, but to be safe, I patched it this way:
log: @logs_string_io.string.force_encoding(Encoding::UTF_8)

I have never coded in Ruby in my life, so Ruby people would probably come up with a cleaner fix. Still, this forces the string to UTF-8, and the later calls to generate() work just fine! With that file patched, the process goes all the way through. I'm still getting some warnings, which I assume come from the way I hacked my patch into the archives.
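A slightly more defensive variant of the same patch is sketched below. `StringIO#string` returns the buffer itself, so calling `force_encoding` on it mutates the logger's buffer, and duplicating first avoids that. `scrub` replaces any bytes that are not valid UTF-8 either. `utf8_log` is a hypothetical helper name.

```ruby
require 'json'

# Re-tags a copy of the captured log as UTF-8 (the bytes already are UTF-8),
# then replaces any byte sequences that are still invalid with "?".
def utf8_log(log)
  log.dup.force_encoding(Encoding::UTF_8).scrub('?')
end

# Simulates a log buffer tagged US-ASCII that holds UTF-8 bytes plus one stray byte.
raw = String.new("caf\xC3\xA9 \xFF", encoding: Encoding::US_ASCII)
puts raw.valid_encoding?   # false
log = utf8_log(raw)
puts log.valid_encoding?   # true
puts JSON.dump({ result: nil, error: nil, log: log })
```

In cli.rb, the patched line would then read `log: utf8_log(@logs_string_io.string),`.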

So, until we come up with a cleaner, official fix, here is a step by step guide on how to patch it yourself.

1. Run `bosh-init deploy manifest.yml` once, and let it fail
2. Go into `~/.bosh_init/installations/{install uuid}/blobs` (the `{install uuid}` can be found in the error message you got in step 1)
3. Look for the largest file in that folder (there should be three files), and note its name (it should look like a UUID)
4. Create a `tmp` folder here, and extract the file: `tar -xvzf {file} -C tmp`
5. Edit `tmp/vendor/bundle/ruby/2.1.0/gems/bosh_cpi-1.3093.0/lib/bosh/cpi/cli.rb` (note: if the file does not exist, you probably didn't pick the right file out of the three)
6. Patch line 112 to `log: @logs_string_io.string.force_encoding(Encoding::UTF_8),` and save the file
7. From inside the `tmp` folder: `tar -cvzf ../patched.tgz *`
8. Rename `patched.tgz` to the original file name from step 3 (you might want to rename the original file first as a backup)
9. Generate the SHA1: `sha1sum {file}` and write it down
10. Edit `~/.bosh_init/installations/{install uuid}/compiled_packages.json`, search for the `BlobID` that matches the file name from step 3, and update the corresponding `BlobSHA1`
11. That's it. Go back to your working folder and run `bosh-init deploy manifest.yml` one more time.
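Steps 9 and 10 can be scripted. Below is a minimal sketch. It assumes compiled_packages.json contains hashes with `BlobID` and `BlobSHA1` keys, as the steps describe. The surrounding structure is not known here, so the walker searches recursively. The helper names are hypothetical.

```ruby
require 'digest/sha1'
require 'json'

# Recursively walks parsed JSON and sets BlobSHA1 wherever BlobID matches.
def update_blob_sha1!(node, blob_id, sha1)
  case node
  when Hash
    node['BlobSHA1'] = sha1 if node['BlobID'] == blob_id
    node.each_value { |value| update_blob_sha1!(value, blob_id, sha1) }
  when Array
    node.each { |value| update_blob_sha1!(value, blob_id, sha1) }
  end
  node
end

# Hashes the patched blob (already renamed to its BlobID in step 8) and
# rewrites compiled_packages.json in place. Returns the new SHA1.
def patch_compiled_packages(json_path, blob_path)
  blob_id = File.basename(blob_path)
  sha1 = Digest::SHA1.file(blob_path).hexdigest
  data = update_blob_sha1!(JSON.parse(File.read(json_path)), blob_id, sha1)
  File.write(json_path, JSON.pretty_generate(data))
  sha1
end
```

Call `patch_compiled_packages` with the two paths from steps 3 and 10, and keep a backup of compiled_packages.json first.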

Disk Snapshot Implementation

Hey, all.

With BOSH you can snapshot an instance's disk with the following command: bosh [OPTIONS] take-snapshot [INSTANCE-GROUP/INSTANCE-ID].

For this to work, the CPI must implement the snapshot_disk method.

Are there specific concerns behind the vSphere CPI not implementing this method?

Thank you,
Alex L.

create_vm failure

Guys

Could someone please help me debug this issue? I keep getting this error when I try to deploy VMs.

L Error: Unknown CPI error 'Unknown' with message 'Could not transfer file 'https://10.234.36.102/folder/vm-c8fa7d28-67ea-4538-a067-c3c137cd76f7/env.json?dcPath=ppc&dsName=vol-NFScf4', received status code '500'' in 'create_vm' CPI method
Task 162 | 16:42:55 | Creating missing vms: access_z1/02480489-74d1-4605-b53f-41e8e4c8da3c (0) (00:06:32)
Task 162 | 16:42:55 | Error: Unknown CPI error 'Unknown' with message 'Could not transfer file 'https://10.234.36.102/folder/vm-c8fa7d28-67ea-4538-a067-c3c137cd76f7/env.json?dcPath=ppc&dsName=vol-NFScf4', received status code '500'' in 'create_vm' CPI method

This is not an issue

Hi everyone, does this CPI require NSX, or is NSX optional?

Thanks in advance

Abhilash

retry on NSX-V errors

Rescued Unknown: Failed to add VM to Security Group with unknown NSX error: '<?xml version="1.0" encoding="UTF-8"?>
<error><details>Concurrent object access error. Refresh UI or fetch the latest copy of the object and retry the operation.</details><errorCode>101</errorCode></error>'. backtrace: /var/vcap/packages/vsphere_cpi/lib/cloud/vsphere/nsx.rb:42:in `block in add_vm_to_security_group'
/var/vcap/packages/vsphere_cpi_ruby/lib/ruby/gems/2.2.0/gems/bundler-1.15.0/lib/bundler/friendly_errors.rb:121:in `with_friendly_errors'
I, [2017-12-04T17:17:11.315955 #24487]  INFO -- [req_id cpi-532525]: Failed to apply NSX properties to VM 'vm-4b4c07e2-7683-4ae8-bbc2-33f999bb9eb0' with error: Failed to add VM to Security Group with unknown NSX error: '<?xml version="1.0" encoding="UTF-8"?>
<error><details>Concurrent object access error. Refresh UI or fetch the latest copy of the object and retry the operation.</details><errorCode>101</errorCode></error>'
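A minimal sketch of retrying on this specific NSX-V error, assuming a hypothetical `NSXError` whose message carries the XML response body as in the log above. Error code 101 is taken from that log; the method name, attempt count and backoff are illustrative, not the CPI's actual code.

```ruby
# Hypothetical error class standing in for whatever the CPI raises on NSX failures.
class NSXError < StandardError; end

# Matches the NSX-V "Concurrent object access error" code seen in the log.
CONCURRENT_ACCESS = %r{<errorCode>101</errorCode>}

# Retry the block while NSX reports a concurrent-access conflict, with a
# simple linear backoff. Any other error, or a failure on the last attempt,
# propagates to the caller.
def with_nsx_retry(attempts: 5, base_delay: 1.0, sleeper: ->(s) { sleep(s) })
  tries = 0
  begin
    tries += 1
    yield
  rescue NSXError => e
    raise unless e.message.match?(CONCURRENT_ACCESS) && tries < attempts
    sleeper.call(base_delay * tries)
    retry
  end
end
```

Only error 101 is retried, since it is the one NSX itself describes as transient ("fetch the latest copy of the object and retry"); other NSX errors still fail fast.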

Cluster selection includes non-specified datastores in its calculation instead of only the specified ones

We have a problem in our environment where VMs appear to be getting placed in clusters we wouldn't expect.

Our scenario

We have 2 clusters.
We have configured BOSH to use a single datastore that is shared between the 2 clusters.

Behavior

Our VMs are repeatedly picking a single cluster causing that cluster to run hot on memory.

Expected Behavior

Since we use the same datastore for both clusters, I'd expect the picker algorithm to pick the cluster with more memory available.

What I found while digging through the code and debugging

Cluster selection is taking place here: https://github.com/cloudfoundry-incubator/bosh-vsphere-cpi-release/blob/7fb6677a6a8dd9d4221dd57c171c9b0932d152fb/src/vsphere_cpi/lib/cloud/vsphere/cluster_picker.rb#L15

Browsing through the code it appears that the method is supposed to be checking:

placements_with_minimum_disk_migrations
placements_with_max_free_space
placements_with_max_free_memory

Debugging the code, it appears we have a problem with the max-free-space disk calculation. The main code is: https://github.com/cloudfoundry-incubator/bosh-vsphere-cpi-release/blob/027c3f4c8f769806b86fda88375dbee21688a4f0/src/vsphere_cpi/lib/cloud/vsphere/datastore_picker.rb#L21

It appears that balance_score includes all of the cluster's accessible datastores in its calculation, not just the datastore(s) we selected via datastore_pattern.

This causes the cluster picker to always pick the cluster with the most available disk space regardless of the datastores we actually have configured.

What I would expect is that if we use the same datastore for both clusters then the disk selection process of cluster_picker would result in a tie for the 2 clusters and would therefore move on to picking based on memory.
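A minimal sketch of the expected behavior under a simplified model: only datastores matching `datastore_pattern` count toward free space, and ties fall through to free memory. `Cluster`, `free_space_for` and `pick_cluster` are hypothetical names; the CPI's real picker also weighs disk migrations and uses a balance score, which this ignores.

```ruby
# Hypothetical, simplified model of a cluster: name, free memory (MB) and
# the free space (MB) of every datastore the cluster can access.
Cluster = Struct.new(:name, :free_memory_mb, :datastores)

# Sum free space only over datastores matching the configured pattern,
# rather than over every datastore the cluster can see.
def free_space_for(cluster, datastore_pattern)
  cluster.datastores
         .select { |name, _free| name.match?(datastore_pattern) }
         .sum { |_name, free| free }
end

# Prefer max free space on the configured datastores; on a tie, compare free memory.
def pick_cluster(clusters, datastore_pattern)
  clusters.max_by { |c| [free_space_for(c, datastore_pattern), c.free_memory_mb] }
end
```

With a shared datastore, both clusters score the same on disk, so the comparison falls through to memory. Counting every accessible datastore instead lets a large local datastore decide the outcome.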

Thoughts?

[feature request] automatically upgrade hw version

Some customers want to run stemcells on the latest virtual hardware version, either to take advantage of performance features or to be compatible with new ESXi features. We recently had another request for it, so I'm filing this issue on their behalf. I think it would be nice to add a feature flag (defaulting to false) so that some users can opt in as a trial run. Eventually, maybe enable it for all by default?

We previously ran an experiment to see whether this was feasible; here is our attempt: 346417b. The story in which we investigated this feature: https://www.pivotaltracker.com/story/show/154215922

Promote out of incubator

@chipchilders @cppforlife AWS CPI is the oldest CPI. Can this repo be promoted to @cloudfoundry org?

Host anti-affinity for instances in the same instance group and AZ (cluster/resource pool)

I understand that if I have an instance_group with several azs, BOSH will try to balance instances across AZs to provide AZ-level anti-affinity. For the VMs that will be scheduled within the same AZ (which, on vSphere, corresponds to a cluster and optional resource pool), are there any guarantees around host-level anti-affinity? Or, is there some option that could be passed to the CPI to request that it schedules in an anti-affine manner?

/cc @cppforlife @zaksoup

Does the multi-cluster-per-AZ feature load-balance, and how?

In recent versions of the CPI, BOSH supports multiple vSphere clusters as part of a single AZ.

Does the CPI balance VM placement across each member cluster equally, or via some other method? Is this tuneable?

Could this also be documented on https://bosh.io/docs/vsphere-cpi.html?

cc @cppforlife

Deploying multiple CFs to one NSX breaks security groups

Currently, security groups are automatically created for all instance groups in a deployment when NSX is enabled. In fact, you get one security group namespaced to the deployment and one namespaced just to the instance group. As a result, two deployments can very easily step on each other.

At a minimum, a security_group prefix could be defined to ensure jobs are namespaced appropriately.
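A minimal sketch of the suggested prefix, assuming a hypothetical `security_group_names` helper (not part of the CPI). Without a prefix, two directors that each deploy a `cf` deployment with a `router` instance group to the same NSX manager would produce identical group names.

```ruby
# Hypothetical naming helper: derive the NSX security group names for an
# instance group (one per deployment, one per instance group), optionally
# namespaced by an operator-supplied prefix.
def security_group_names(deployment:, instance_group:, prefix: nil)
  [deployment, instance_group].map do |name|
    # Drop a nil or empty prefix so unprefixed names stay unchanged.
    [prefix, name].reject { |part| part.nil? || part.empty? }.join('-')
  end
end
```

For example, `security_group_names(deployment: 'cf', instance_group: 'router', prefix: 'cf-east')` keeps the second foundation's groups apart from an unprefixed `cf`/`router` pair.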

provide better error message about network connectivity

current error message:

CPI 'create_stemcell' method responded with error: CmdError{"type":"Unknown","message":"execution expired","ok_to_retry":false}

In the above scenario, the CPI is trying to talk to one of the ESXi hosts. It's not clear from the error message which endpoint it's trying to reach. Let's provide a more verbose error message with the endpoint information, being careful not to expose any credentials.
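A minimal sketch of the requested behavior, assuming a hypothetical `with_endpoint_context` wrapper around each remote call. On failure it re-raises with the operation and a sanitized endpoint: only scheme, host, port and path are kept, so userinfo and query strings (which may carry tokens) are dropped. The `execution expired` text above matches the default message of Ruby's `Timeout::Error`, which suggests, but does not prove, a timeout.

```ruby
require 'uri'

# Hypothetical error type carrying the endpoint the CPI was talking to.
class EndpointError < StandardError; end

# Describe an endpoint without userinfo or query string, so credentials or
# tokens embedded in the URL never end up in the error message.
def safe_endpoint(url)
  uri = URI.parse(url)
  "#{uri.scheme}://#{uri.host}:#{uri.port}#{uri.path}"
end

# Run the block; if it fails, re-raise with the operation and endpoint
# attached. Ruby records the original exception as `cause` automatically.
def with_endpoint_context(operation, url)
  yield
rescue StandardError => e
  raise EndpointError,
        "#{operation} against #{safe_endpoint(url)} failed: #{e.class}: #{e.message}"
end
```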

allow cpi to use copied stemcells as source for copying stemcells into new datastores

Scenarios for Permission “tightening”

There are scenarios in which users would like to restrict the out-of-the-box vSphere permissions outlined above, reducing the vSphere user supplied to Pivotal Cloud Foundry/BOSH to the bare minimum of scope and capability, particularly in vSphere environments that are “shared” cluster infrastructure rather than dedicated to PCF alone. The following example describes how one user was able to limit the RBAC permissions of the BOSH user. However, testing showed that 9 controls are still needed and must remain part of the default recommendation.

Users were trying to set up multiple CloudFoundries in a single vCenter. To achieve the best isolation possible, we would like to have a separate vCenter user for each CloudFoundry, and each user should be granted only the least privileges possible. The goal would be that the user for CloudFoundry A cannot access stuff from CloudFoundry B.

The steps are described in the Pivotal KB article “How to restrict permissions in a multi-tenant Pivotal Cloud Foundry deployment on vSphere”.

However, if we grant one user these privileges at a global level, it can access other users' data in the datastores or other users' networks, breaking the separation.

Is there a way to still achieve the desired isolation?

We're aware that for some of these privileges we could work around the problem by separating resources (e.g. creating separate datastores for each CloudFoundry), but then we lose some of the 'Cloudy' properties.

Cannot complete login due to an incorrect user name or password

Hi, I am trying to deploy MicroBOSH on vSphere 6.5, but I am encountering the error below:

creating stemcell (bosh-vsphere-esxi-ubuntu-trusty-go_agent 3468.13):
CPI 'create_stemcell' method responded with error: CmdError{"type":"Unknown","message":"Cannot complete login due to an incorrect user name or password.","ok_to_retry":false

My BOSH details are as follows:

bosh create-env /home/ubuntu/workspace/bosh-deployment/bosh.yml \
--state=state.json \
--vars-store=creds.yml \
-o /home/ubuntu/workspace/bosh-deployment/vsphere/cpi.yml \
-v director_name=bosh \
-v internal_cidr=192.168.203.0/24 \
-v internal_gw=192.168.203.1 \
-v internal_ip=192.168.203.18 \
-v network_name="SBP Network" \
-v vcenter_dc=cloudservice \
-v vcenter_ds=datastore1 \
-v vcenter_ip=192.168.203.20 \
-v vcenter_user=[email protected] \
-v vcenter_password=$D$32@bhi07- \
-v vcenter_templates=bosh-templates \
-v vcenter_vms=bosh-vms \
-v vcenter_disks=bosh-disks \
-v vcenter_cluster=cluster1
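One thing that may be worth ruling out (an assumption, not something confirmed in this thread): in the command above the password is passed unquoted, and in bash `$D` and `$3` are parameter expansions, so the shell likely hands BOSH a different string than the one typed into the Web Client. Wrapping the value in single quotes avoids this. The Ruby sketch below shows the equivalent escaping with the standard library's `Shellwords`.

```ruby
require 'shellwords'

# The password as typed in the -v flag above (single-quoted here, so Ruby
# keeps the '$' characters literally).
password = '$D$32@bhi07-'

# Shellwords.escape backslash-escapes characters the shell would interpret,
# including '$', so the value reaches bosh create-env unchanged.
arg = "vcenter_password=#{Shellwords.escape(password)}"
puts "-v #{arg}" # prints the flag with each '$' backslash-escaped
```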

But when I log in from the vSphere Web Client single sign-on dashboard, I can sign in successfully with the username and password above.

So I checked the Events in the vSphere Web Client dashboard, and I see the error below:

Do I need to configure and enable Active Directory?

vSphere Web Client Active Directory

vSphere ESXi Active Directory

Thanks in advance!

Slack integration test issue

Testing

Assets for NSX blog post

Required privileges on datacenter level for multi-tenancy within a VCenter

Hi guys

We are trying to set up multiple CloudFoundries in a single vCenter. To achieve the best isolation possible, we would like to have a separate vCenter user for each CloudFoundry, and each user should be granted only the least privileges possible. The goal would be that the user for CloudFoundry A cannot access stuff from CloudFoundry B.

To separate the different CloudFoundries, we created a folder in vCenter for each CloudFoundry and applied the necessary permissions listed in the docs to each folder.

However, some of the permissions cannot be set at the folder level but must be set at the datacenter level. We had to grant the users these privileges at the datacenter level to get basic deployments working:

Datastore.AllocateSpace
Datastore.Browse
Datastore.FileManagement
Datastore.DeleteFile
Datastore.UpdateVirtualMachineFiles
Network.Assign
Resource.AssignVMToPool 
VirtualMachine.Config.AddNewDisk
VApp.Import

However, if we grant one user these privileges at a global level, it can access other users' data in the datastores or other users' networks, breaking the separation.

Is there a way to still achieve the desired isolation?

We're aware that for some of these privileges we could work around the problem by separating resources (e.g. creating separate datastores for each CloudFoundry), but then we lose some of the 'Cloudy' properties.

Name or service not known

Hi everyone, I am trying to deploy MicroBOSH on vSphere 6.5.

But I encounter the error below:

creating stemcell (bosh-vsphere-esxi-ubuntu-trusty-go_agent 3468.21):
CPI 'create_stemcell' method responded with error: CmdError{"type":"Unknown","message":"getaddrinfo: Name or service not known (xxxxxxxx:443)","ok_to_retry":false}

Exit code 1

Thanks in advance.

Abhilash S

Datacenter not found

Hi, I am trying to deploy Cloud Foundry across multiple datacenters, and I configured the datastores (in a datastore cluster) and networks for each datacenter.

Unfortunately, I am encountering this error while deploying MicroBOSH:

My Command

Is it possible to deploy MicroBOSH and CF across multiple datacenters?

Thanks in advance

bump to use ruby-2.4-r4 from bosh-packages/ruby-release

To pick up the latest Ruby version for the CPI.

$ git clone https://github.com/bosh-packages/ruby-release ~/workspace/ruby-release
$ bosh vendor-package ruby-2.4-r4 ~/workspace/ruby-release
Then replace all references to ruby-2.4-r3 with ruby-2.4-r4.

No valid placement found for disks

Hi everyone, I am trying to deploy Cloud Foundry on 3 clusters (each with a single host) in a single datacenter, and I created a datastore cluster for the three datastores. Moreover, my vSphere and vCenter are version 6.5 with a Standard license.

My MicroBOSH command:

But I am encountering the error below:

Is there a solution for this error?

Please refer to my cloud-config and CF YAML files:
cf-deployment.txt
cloud-config.txt

Intermittent `HTTPClient::KeepAliveDisconnected` errors when creating VMs

In our CI pipeline, we frequently see deploys fail with the following error:

Error 100: Unknown CPI error 'Unknown' with message 'HTTPClient::KeepAliveDisconnected: ' in 'create_vm' CPI method

The CPI logs show that it failed in the file provider:

Rescued Unknown: HTTPClient::KeepAliveDisconnected: . backtrace: /var/vcap/packages/vsphere_cpi/vendor/bundle/ruby/2.2.0/gems/httpclient-2.7.1/lib/httpclient/session.rb:795:in `block in parse_header'
/var/vcap/packages/vsphere_cpi_ruby/lib/ruby/2.2.0/timeout.rb:88:in `block in timeout'
/var/vcap/packages/vsphere_cpi_ruby/lib/ruby/2.2.0/timeout.rb:98:in `call'
/var/vcap/packages/vsphere_cpi_ruby/lib/ruby/2.2.0/timeout.rb:98:in `timeout'
/var/vcap/packages/vsphere_cpi/vendor/bundle/ruby/2.2.0/gems/httpclient-2.7.1/lib/httpclient/session.rb:788:in `parse_header'
/var/vcap/packages/vsphere_cpi/vendor/bundle/ruby/2.2.0/gems/httpclient-2.7.1/lib/httpclient/session.rb:771:in `read_header'
/var/vcap/packages/vsphere_cpi/vendor/bundle/ruby/2.2.0/gems/httpclient-2.7.1/lib/httpclient/session.rb:547:in `get_header'
/var/vcap/packages/vsphere_cpi/vendor/bundle/ruby/2.2.0/gems/httpclient-2.7.1/lib/httpclient.rb:1294:in `do_get_header'
/var/vcap/packages/vsphere_cpi/vendor/bundle/ruby/2.2.0/gems/httpclient-2.7.1/lib/httpclient.rb:1241:in `do_get_block'
/var/vcap/packages/vsphere_cpi/vendor/bundle/ruby/2.2.0/gems/httpclient-2.7.1/lib/httpclient.rb:1021:in `block in do_request'
/var/vcap/packages/vsphere_cpi/vendor/bundle/ruby/2.2.0/gems/httpclient-2.7.1/lib/httpclient.rb:1134:in `rescue in protect_keep_alive_disconnected'
/var/vcap/packages/vsphere_cpi/vendor/bundle/ruby/2.2.0/gems/httpclient-2.7.1/lib/httpclient.rb:1128:in `protect_keep_alive_disconnected'
/var/vcap/packages/vsphere_cpi/vendor/bundle/ruby/2.2.0/gems/httpclient-2.7.1/lib/httpclient.rb:1016:in `do_request'
/var/vcap/packages/vsphere_cpi/vendor/bundle/ruby/2.2.0/gems/httpclient-2.7.1/lib/httpclient.rb:858:in `request'
/var/vcap/packages/vsphere_cpi/vendor/bundle/ruby/2.2.0/gems/httpclient-2.7.1/lib/httpclient.rb:771:in `put'
/var/vcap/packages/vsphere_cpi/lib/cloud/vsphere/base_http_client.rb:74:in `do_request'
/var/vcap/packages/vsphere_cpi/lib/cloud/vsphere/base_http_client.rb:39:in `put'
/var/vcap/packages/vsphere_cpi/lib/cloud/vsphere/file_provider.rb:66:in `block in do_request'
/var/vcap/packages/vsphere_cpi/lib/cloud/vsphere/file_provider.rb:72:in `call'
/var/vcap/packages/vsphere_cpi/lib/cloud/vsphere/file_provider.rb:72:in `block in do_request'
/var/vcap/packages/vsphere_cpi/lib/cloud/vsphere/retryer.rb:9:in `block in try'
/var/vcap/packages/vsphere_cpi/lib/cloud/vsphere/retryer.rb:8:in `times'
/var/vcap/packages/vsphere_cpi/lib/cloud/vsphere/retryer.rb:8:in `try'
/var/vcap/packages/vsphere_cpi/lib/cloud/vsphere/file_provider.rb:71:in `do_request'
/var/vcap/packages/vsphere_cpi/lib/cloud/vsphere/file_provider.rb:32:in `upload_file_to_datastore'
/var/vcap/packages/vsphere_cpi/lib/cloud/vsphere/agent_env.rb:37:in `set_env'
/var/vcap/packages/vsphere_cpi/lib/cloud/vsphere/cloud.rb:754:in `add_disk_to_agent_env'
/var/vcap/packages/vsphere_cpi/lib/cloud/vsphere/cloud.rb:403:in `block in attach_disk'
/var/vcap/packages/vsphere_cpi/vendor/bundle/ruby/2.2.0/gems/bosh_common-1.3262.24.0/lib/common/thread_formatter.rb:49:in `with_thread_name'
/var/vcap/packages/vsphere_cpi/lib/cloud/vsphere/cloud.rb:366:in `attach_disk'
/var/vcap/packages/vsphere_cpi/vendor/bundle/ruby/2.2.0/gems/bosh_cpi-2.0.5/lib/bosh/cpi/cli.rb:79:in `public_send'
/var/vcap/packages/vsphere_cpi/vendor/bundle/ruby/2.2.0/gems/bosh_cpi-2.0.5/lib/bosh/cpi/cli.rb:79:in `run'
/var/vcap/packages/vsphere_cpi/bin/vsphere_cpi:42:in `<main>'
I, [2017-12-19T19:40:42.454284 #10823]  INFO -- [req_id 932491]: Starting attach_disk...

We are following up with our vSphere admins to see if there are any networking issues. But we noticed that the CPI already retries on HTTPClient::KeepAliveDisconnected errors, as shown here. However, the file provider only retries if it gets a 500 from vSphere, as shown here. Should the file provider (or even base_http_client) retry on the same HTTP errors that the stub adapter retries on?
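A minimal sketch of what the question proposes: one retry policy that covers both transient exceptions and HTTP 500 responses. It assumes a hypothetical `Response` struct with a `status` field and a local `KeepAliveDisconnected` class standing in for `HTTPClient::KeepAliveDisconnected`, so it runs without the httpclient gem; it is not the CPI's Retryer or file provider code.

```ruby
# Stand-in for HTTPClient::KeepAliveDisconnected; in the CPI you would list
# the real class in retry_on instead.
class KeepAliveDisconnected < StandardError; end

# Hypothetical minimal response object.
Response = Struct.new(:status)

# Retry on any exception class in retry_on, or when the response status is
# 500. The last attempt's response is returned, and its exception re-raised.
def with_transient_retries(attempts: 3, retry_on: [KeepAliveDisconnected])
  attempts.times do |i|
    last = (i == attempts - 1)
    begin
      response = yield
      return response unless response.status == 500 && !last
    rescue *retry_on
      raise if last
    end
  end
end
```

Passing the exception list in as a parameter lets the file provider and the stub adapter share the same list of retryable errors instead of each hard-coding its own. No backoff is included here.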

Thanks!
@michelleheh && @ljfranklin

This is not an issue

Hi, thanks for your open-source work. Does this CPI support VMware vSphere 6.5? If not, please let me know up to which version of VMware vSphere it supports. Thanks in advance.

CPI ignores maintenance mode of datastore

Datastore accessibility is mentioned in #29, but this is not quite the same.

We have a range of datastores, at any point in time one or more could be in maintenance mode. As it stands, the CPI falls over when it tries to create a disk inside a datastore that is in maintenance mode.

Within datastore.rb, the object returned by Vim::Datastore + Datastore::PROPERTIES should include a 'summary.maintenanceMode' value (a string: 'normal', 'enteringMaintenance' or 'inMaintenance'). It should be possible to check this value before deciding whether a datastore can be used.

My Ruby is quite poor, but I would imagine changing https://github.com/cloudfoundry-incubator/bosh-vsphere-cpi-release/blob/master/src/vsphere_cpi/lib/cloud/vsphere/resources/datastore.rb#L18

from:

ds_properties['summary.accessible'],

to:

ds_properties['summary.accessible'] && !%w[enteringMaintenance inMaintenance].include?(ds_properties['summary.maintenanceMode']),

would cause the datastore not to be selected, leaving the CPI free to move on to the next datastore.
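A minimal sketch of the proposed check as a standalone predicate, assuming `ds_properties` is a hash keyed by property path as in datastore.rb, and that 'summary.maintenanceMode' is among the fetched properties. The method name is hypothetical. Because the vSphere API reports maintenance mode as a string rather than a boolean, negating it directly would treat every datastore (including 'normal' ones) as unusable, so the sketch compares against the maintenance states instead.

```ruby
# Values the vSphere API uses in DatastoreSummary.maintenanceMode for a
# datastore that should not receive new placements.
UNUSABLE_MAINTENANCE_STATES = %w[enteringMaintenance inMaintenance].freeze

# A datastore is a placement candidate only if it is accessible and not
# entering or in maintenance mode. A missing value (nil) is treated as normal.
def usable_datastore?(ds_properties)
  ds_properties['summary.accessible'] == true &&
    !UNUSABLE_MAINTENANCE_STATES.include?(ds_properties['summary.maintenanceMode'])
end
```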

Upgrade hardware version really working?

Hi,
We are just getting started with Cloud Foundry on vSphere. As the stemcell's virtual hardware version 9 is prehistoric and has a lot of issues, we would like to bump it to 13. We are trying to put `upgrade_hw_version: true` in the vm_types section:

vm_types:
- name: default
  cloud_properties:
    cpu: 2
    upgrade_hw_version: true
    ram: 1024
    disk: 3240

Is this the wrong place?

Free disk space calculation in datastore_picker.rb doesn't take vmware swapfiles into account

There is a hardcoded headroom value in datastore_picker.rb that is used when selecting a datastore to place a VM or disk. The CPI's choice is based on free space plus the headroom value, but unfortunately it doesn't take into account that VMware creates a swap file the same size as the VM's RAM. This can lead to BOSH trying to create this swap file on a datastore that does not have enough free space for it. You'll see an error message like this:

Error: Unknown CPI error 'Unknown' with message 'Failed to extend swap file from 0 KB to XXXX KB

We've worked around this issue by temporarily raising the headroom value to at least the RAM size of the biggest instance, editing the vsphere-cpi code directly on the BOSH Director.

Could you please fix this bug by changing the free space calculation for datastores to account for the additional space needed for the VM's RAM? Ideally, this would be handled similarly to the way the headroom value is added in the free disk space calculation.
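A minimal sketch of the requested calculation, assuming the VM's files and its swap file land on the same datastore. `DISK_HEADROOM_MB` is a hypothetical placeholder for the hardcoded headroom in datastore_picker.rb (its real value is not stated here), and the method names are illustrative. vSphere sizes the .vswp file as the VM's configured memory minus its memory reservation, which is why a VM without a reservation needs its full RAM size in extra space.

```ruby
# Hypothetical headroom (MB), standing in for the hardcoded value in datastore_picker.rb.
DISK_HEADROOM_MB = 1024

# Space a VM needs on the datastore holding its files: the requested disk,
# plus a swap file sized to RAM minus any memory reservation, plus headroom.
def required_space_mb(disk_mb:, ram_mb:, memory_reservation_mb: 0, headroom_mb: DISK_HEADROOM_MB)
  swap_mb = [ram_mb - memory_reservation_mb, 0].max
  disk_mb + swap_mb + headroom_mb
end

# True if a datastore with free_space_mb can hold the VM including its swap file.
def fits?(free_space_mb, **vm)
  free_space_mb >= required_space_mb(**vm)
end
```

For the vm_type shown earlier (disk 3240 MB, RAM 1024 MB) this asks for 1024 MB more than a disk-plus-headroom check would, which is exactly the gap that produces the "Failed to extend swap file" error.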

Missing creds from vSphere admin account

I just tested, permission by permission, the roster listed in the docs for creating an administrative user in vCenter, and found eight permissions that are required when building PCF v1.8 with vSphere 6.0 but are missing from the list.

Those are:

Virtual Machine: Inventory: Register
Virtual Machine: Inventory: Unregister
Virtual Machine: Interact: Console Interaction
Virtual Machine: Interact: Guest Control with VIX
Virtual Machine: Interact: Defragment All Disks
Virtual Machine: Guest Operations: Query
Virtual Machine: Guest Operations: Modify
Virtual Machine: Guest Operations: Execute

Moving persistent disk on VM delete breaks if datastore has a space in the name

Before a VM with a persistent disk is deleted, the CPI checks the vAppConfig property to determine the original path of the persistent VMDK and will move it back in cases where the VM has been migrated to a new datastore and the disk has been moved into the VM folder. We've been trying to troubleshoot why the move function fails in some cases with this error:

CPI 'delete_vm' method responded with error: CmdError{"type":"Unknown","message":"Invalid datastore path '[some-datastore Folder'.","ok_to_retry":false}

Looking at the code at https://github.com/cloudfoundry-incubator/bosh-vsphere-cpi-release/blob/master/src/vsphere_cpi/lib/cloud/vsphere/resources/vm.rb#L297-L299, you are splitting the path on a space character, which breaks if the datastore name contains a space. For example, this will work fine:

[some-datastore-path] Folder/disk-guid.vmdk

This will break:

[some datastore path] Folder/disk-guid.vmdk

Elsewhere in the code you use a regular expression to grab the text between the brackets to determine the name; splitting on a space is not safe.
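A minimal sketch of bracket-based parsing, as the issue suggests. The regex and method name are illustrative, not the CPI's actual helper. A naive `split(' ')` on `[some datastore path] Folder/disk-guid.vmdk` yields `[some` as its first token, while matching up to the closing bracket keeps the full datastore name.

```ruby
# Parse a vSphere datastore path of the form "[datastore name] folder/file.vmdk".
# The datastore name is everything between the brackets, so spaces are fine.
DATASTORE_PATH = /\A\[(?<datastore>[^\]]+)\]\s*(?<path>.*)\z/

# Returns [datastore_name, relative_path], or raises for malformed input.
def parse_datastore_path(full_path)
  match = DATASTORE_PATH.match(full_path)
  raise ArgumentError, "Invalid datastore path '#{full_path}'" unless match
  [match[:datastore], match[:path]]
end
```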


bosh-vsphere-cpi-release's Introduction

BOSH vSphere CPI Release

Development

Requirements

Concepts