suse-cloud / automation
Automation scripts for development, testing, and CI
License: Apache License 2.0
With: cloudsource=GM5+up want_sles12=1 want_ceph=1 mkcloud .. proposal
Finished proposal heat(default) at: Fri Sep 11 07:22:43 UTC 2015
Usage: crowbar <area> <subcommand>
Areas: batch ceilometer ceph cinder crowbar database deployer dns glance heat ipmi keystone logging machines network neutron nfs_client node_state nova nova_dashboard ntp pacemaker provisioner rabbitmq reset reset_nodes reset_proposal suse_manager_client swift tempest trove updater
Starting proposal manila(default) at: Fri Sep 11 07:22:53 UTC 2015
No hooks defined for service: manila
Usage: crowbar <area> <subcommand>
Areas: batch ceilometer ceph cinder crowbar database deployer dns glance heat ipmi keystone logging machines network neutron nfs_client node_state nova nova_dashboard ntp pacemaker provisioner rabbitmq reset reset_nodes reset_proposal suse_manager_client swift tempest trove updater
Error: 'crowbar manila proposal --file=/root/manila.default.proposal edit default' failed with exit code: 255
$h1!!
Error detected. Stopping mkcloud.
When mkcloud creates the virtual networks in libvirt, it should first check that none of them overlaps with an existing virtual network. Skipping this pre-check may cause network requests to fail in ways that are difficult to diagnose, especially when the overlap is only partial, i.e. requests to/from some interfaces work properly and others do not.
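A rough sketch of such a pre-check, written in Python with only the standard library. It assumes the CIDRs of the existing libvirt networks have already been collected somehow; the function name and the example addresses are made up for illustration and are not taken from mkcloud.

```python
import ipaddress


def find_overlaps(candidate, existing):
    """Return the CIDRs from `existing` that overlap with `candidate`."""
    # strict=False accepts addresses with host bits set, e.g. 192.168.124.1/24
    new_net = ipaddress.ip_network(candidate, strict=False)
    return [
        cidr
        for cidr in existing
        if ipaddress.ip_network(cidr, strict=False).overlaps(new_net)
    ]


if __name__ == "__main__":
    # Illustrative values only; real values would come from libvirt.
    existing = ["192.168.122.0/24", "192.168.124.0/23"]
    clashes = find_overlaps("192.168.125.0/24", existing)
    if clashes:
        print("refusing to create network, overlaps with:", clashes)
```

A partial overlap (a /24 inside an existing /23, as in the example) is reported the same way as an exact duplicate, which is the case that is hardest to spot by eye.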
From https://ci.suse.de/job/cloud-update-ci/741/console
INFO:jenkins_jobs.builder:Reconfiguring jenkins job openstack-trackupstream
INFO:requests.packages.urllib3.connectionpool:Resetting dropped connection: ci.opensuse.org
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/jenkins_jobs/parallel.py", line 62, in run
**task['kwargs'])
File "/usr/lib/python2.7/site-packages/jenkins_jobs/builder.py", line 355, in parallel_update_job
self.update_job(job.name, job.output().decode('utf-8'))
File "/usr/lib/python2.7/site-packages/jenkins_jobs/builder.py", line 139, in update_job
self.jenkins.reconfig_job(job_name, xml)
File "/usr/lib/python2.7/site-packages/jenkins/__init__.py", line 1223, in reconfig_job
headers=DEFAULT_HEADERS
File "/usr/lib/python2.7/site-packages/jenkins/__init__.py", line 541, in jenkins_open
return self.jenkins_request(req, add_crumb, resolve_auth).text
File "/usr/lib/python2.7/site-packages/jenkins/__init__.py", line 560, in jenkins_request
self._request(req))
File "/usr/lib/python2.7/site-packages/jenkins/__init__.py", line 520, in _response_handler
"empty response" % self.server)
EmptyResponseException: Error communicating with server[https://ci.opensuse.org/]: empty response
Traceback (most recent call last):
File "/usr/bin/jenkins-jobs", line 10, in <module>
sys.exit(main())
File "/usr/lib/python2.7/site-packages/jenkins_jobs/cli/entry.py", line 158, in main
jjb.execute()
File "/usr/lib/python2.7/site-packages/jenkins_jobs/cli/entry.py", line 139, in execute
ext.obj.execute(self.options, self.jjb_config)
File "/usr/lib/python2.7/site-packages/jenkins_jobs/cli/subcommand/update.py", line 150, in execute
existing_only=options.existing_only)
File "/usr/lib/python2.7/site-packages/jenkins_jobs/builder.py", line 340, in update_jobs
raise result
jenkins.EmptyResponseException: Error communicating with server[https://ci.opensuse.org/]: empty response
When I want to create a test environment for let's say ceilometer, I currently have to deploy all barclamps up to ceilometer manually. Then I can fetch the updated ceilometer barclamp and deploy it.
It should be possible to tell mkcloud which proposals to deploy. This could be done with an exported environment variable containing a string that lists the requested proposals, or by using crowbar batch in some way.
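To make the environment-variable idea concrete, here is a small Python sketch. The variable name `mkcloud_proposals` and the proposal order in the example are assumptions for illustration only; mkcloud does not define them as far as this issue shows.

```python
import os


def requested_proposals(default_order, env=os.environ, var="mkcloud_proposals"):
    """Return the proposals to deploy, keeping the usual deployment order.

    `var` is a hypothetical variable holding a whitespace-separated list.
    If it is unset or empty, every proposal in `default_order` is returned.
    """
    raw = env.get(var, "").split()
    if not raw:
        return list(default_order)
    unknown = [name for name in raw if name not in default_order]
    if unknown:
        raise ValueError("unknown proposals: " + ", ".join(unknown))
    wanted = set(raw)
    return [name for name in default_order if name in wanted]


if __name__ == "__main__":
    # Illustrative order only, not the real mkcloud proposal list.
    order = ["database", "rabbitmq", "keystone", "ceilometer"]
    print(requested_proposals(order))
```

Filtering the default order, rather than trusting the order the user typed, keeps dependencies such as keystone ahead of the barclamps that need them.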
Traceback (most recent call last):
File "./scripts/lib/libvirt/cleanup.py", line 79, in <module>
main()
File "./scripts/lib/libvirt/cleanup.py", line 68, in main
network.undefine()
UnboundLocalError: local variable 'network' referenced before assignment
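The traceback suggests that `network` is only assigned on some code path before `network.undefine()` is reached, for example when the lookup fails or no matching network exists. I have not checked the surrounding code in cleanup.py, so the following is only a sketch of the usual fix: return early when the lookup fails. `networkLookupByName`, `isActive`, `destroy` and `undefine` are the libvirt Python binding calls; the function name and the `lookup_error` parameter are made up here so the sketch runs without libvirt installed.

```python
def remove_network(conn, name, lookup_error=Exception):
    """Destroy and undefine the named network; return False if it is absent.

    Real code would pass lookup_error=libvirt.libvirtError, which is what
    the libvirt bindings raise when no network with that name exists.
    """
    try:
        network = conn.networkLookupByName(name)
    except lookup_error:
        # Nothing to clean up, and `network` is never touched unassigned.
        return False
    if network.isActive():
        network.destroy()
    network.undefine()
    return True
```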
Obviously mkcloud needs to retain the ability to output to the CLI, but the same output should also be preserved in a log file for future reference. We could use this trick.
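The trick linked above is not reproduced here. As a generic illustration of the idea (same output to the terminal and to a log file), a minimal Python sketch could look like the following. The class and function names are made up, and it only duplicates text written through Python's `sys.stdout`, not output written directly by child processes.

```python
import sys


class Tee:
    """File-like object that copies every write to several streams."""

    def __init__(self, *streams):
        self.streams = streams

    def write(self, text):
        for stream in self.streams:
            stream.write(text)
        return len(text)

    def flush(self):
        for stream in self.streams:
            stream.flush()


def log_stdout_to(path):
    """Send stdout to the terminal and append it to `path`; return the log file."""
    log_file = open(path, "a")
    sys.stdout = Tee(sys.__stdout__, log_file)
    return log_file
```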
When a cloud is built via Jenkins, ideally a URL to the job should be included in /etc/motd on at least the Crowbar node (maybe the others too) so that when sshing to a worker cloud, it's easy to see where it came from. This mechanism could be implemented in a way which supports arbitrary cloud descriptions, not just Jenkins URLs etc. E.g. if I manually invoke mkcloud to test a particular feature such as compute node HA, I should be able to set a clouddescription variable or similar with a value like testing compute node HA feature. Then anyone else who sshes to my cloud can see what its purpose is.
Implementation would be in two parts:
clouddescription variable to mkcloud, so that the description appears in /etc/motd
We should make mkcloud write a /var/log/mkcloud.txt containing details of the mkcloud run (including Jenkins job number), and make the supportconfig plugin capture this. So then if you have a mystery supportconfig, you can tell where it came from and how mkcloud was configured.
The setupcompute and instcompute steps don't just set up and install compute nodes, they also do controller and storage nodes. So they should be renamed to something more general. I can't yet think of the right word though, sorry ;-)
The extra noise is causing problems with login to the Horizon Dashboard.
Comment out the noise as shown below to get back to a healthy state.
Listen 5000
Listen 35357
<VirtualHost *:5000>
WSGIDaemonProcess keystone-public processes=2 threads=1 user=keystone group=keystone display-name=%{GROUP}
WSGIProcessGroup keystone-public
WSGIScriptAlias / /usr/bin/keystone-wsgi-public
WSGIApplicationGroup %{GLOBAL}
WSGIPassAuthorization On
LimitRequestBody 114688
ErrorLogFormat "%{cu}t %M"
ErrorLog /var/log/keystone/keystone.log
CustomLog /var/log/keystone/keystone_access.log combined
<Directory /usr/bin>
Require all granted
</Directory>
</VirtualHost>
<VirtualHost *:35357>
WSGIDaemonProcess keystone-admin processes=2 threads=1 user=keystone group=keystone display-name=%{GROUP}
WSGIProcessGroup keystone-admin
WSGIScriptAlias / /usr/bin/keystone-wsgi-admin
WSGIApplicationGroup %{GLOBAL}
WSGIPassAuthorization On
LimitRequestBody 114688
ErrorLogFormat "%{cu}t %M"
ErrorLog /var/log/keystone/keystone.log
CustomLog /var/log/keystone/keystone_access.log combined
<Directory /usr/bin>
Require all granted
</Directory>
</VirtualHost>
#Alias /identity /usr/bin/keystone-wsgi-public
#<Location /identity>
#
#Alias /identity_admin /usr/bin/keystone-wsgi-admin
#<Location /identity_admin>
#
cloudpv=/dev/loop0 cloudsource=GM5+up nodenumber=2 compute_node_memory=4194304 want_sles12=1 tempestoptions="-N -t" ./scripts/mkcloud plain
will fail with:
Starting proposal swift(default) at: Pá zář 11 20:46:18 UTC 2015
Failed to edit: default : Errors in data
Failed to validate proposal: Role swift-storage can't be used for suse 12.0, windows /.*/ platform(s).
Error: 'crowbar swift proposal --file=/root/swift.default.proposal edit default' failed with exit code: 1
$h1!!
Error detected. Stopping mkcloud.
The step 'proposal' returned with exit code 88
Please refer to the proposal function in this script when debugging the issue.
The documentation for mkcloud, docs/mkcloud.md, contains a sample bash script in the section "Using with local repositories". The biggest problem with this script is that it suggests it can be used to create a full cloud, and on casual reading it seems to execute the additional setup steps mentioned at the top, such as creating the disk and loopback. However, it skips steps (like setuphost) that have to be run in order for it to work properly. It also has a hardcoded path (/home/tom...) that makes it unusable without tweaking, and it changes several network values for no apparent reason.
It appears that the whole point of the script was to demonstrate caching, so instead of having a script, it would be better to document how the caching options work, i.e. cache_clouddata and cache_dir.
docs/mkcloud.md has a number of one-time setup tasks that really make sense to incorporate into the setuphost target. This includes:
losetup, if necessary
setuphost should be idempotent, of course, so that it can be re-run without harm.
setuphost were included in the default list of those targets that expand to lots of steps, e.g. all, plain, etc.
Currently you have to cd to the directory containing mkcloud before running it. That shouldn't be necessary.
It seems that recently some mkcloud hosts (e.g. mkchm) were changed to run mkcloud as the non-root user jenkins. With such a setup it's no longer possible to reserve a built slot using soc-ci worker-pool-reserve, as that tool currently only handles the case where the pool directory is located in /root.
According to @jdsn soc-ci was supposed to be fixed like this:
"the basic idea was, that soc-ci tries to connect as jenkins first, and falls back to root (long term goal is to have all workers run as non-root), to save ssh connections, soc-ci caches its results in a local dot-file and in case of an error or after an expiry period the host is probed again (first jenkins then root) "
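A minimal sketch of the probing logic described in that quote, in Python. This is not soc-ci's code: the function name, the JSON cache layout and the `probe` callable (which would wrap an ssh connection attempt) are all assumptions for illustration.

```python
import json
import time


def pick_user(host, probe, cache_path, users=("jenkins", "root"),
              max_age=3600, refresh=False, now=time.time):
    """Return the first user in `users` for which probe(user, host) is true.

    The answer is cached in a JSON file and reused until it is older than
    `max_age` seconds. Pass refresh=True after an error to probe again.
    """
    try:
        with open(cache_path) as fh:
            cache = json.load(fh)
    except (OSError, ValueError):
        cache = {}  # missing or corrupt cache: start from scratch

    entry = cache.get(host)
    if entry and not refresh and now() - entry["checked"] < max_age:
        return entry["user"]

    for user in users:
        if probe(user, host):
            cache[host] = {"user": user, "checked": now()}
            with open(cache_path, "w") as fh:
                json.dump(cache, fh)
            return user
    raise RuntimeError("no working login found for " + host)
```

Trying jenkins before root matches the stated long-term goal of having all workers run as non-root, and the cache avoids an extra ssh connection on every call.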
Unfortunately, setting up a test environment with mkcloud takes quite some time, and mkcloud only allows one snapshot at a time, which makes it difficult to test and iterate quickly on a local machine.
We should try to identify alternative routes to faster mkcloud deployments for development purposes.
Some bullet points from my (still ignorant) point of view of mkcloud:
Looking for some comments here to see if it's feasible or not, and what else could be looked at.
Currently stuff like the SUSE-SLE12-CLOUD-5-COMPUTE .iso gets downloaded onto the admin node on every run. It should be downloaded into /var/cache/mkcloud on the mkcloud host to avoid this.
mkcloud writes mkcloud.pid and mkcloud.config temporary files into whichever directory you run it from, but these should go under /tmp or /var to avoid littering the git working tree (best case) or other random directories (worst case) with temporary files. This happens even before the sanity_check function is hit, which is currently causing #222.
Using commit a2de743 and the following settings:
export cloudsource=develcloud6
export debug_qa_crowbarsetup=1
export cephvolumenumber=1
export want_neutronsles12=1
export want_mtu_size=8900
export clusterconfig=data+services+network=2
with mkcloud plain, downloading manila-service-image times out.
+ wget -N --progress=dot:mega http://149.44.176.43/images/other/manila-service-image.qcow2
--2016-04-22 10:41:40-- http://149.44.176.43/images/other/manila-service-image.qcow2
Connecting to 149.44.176.43:80... failed: Connection timed out.
When I manually log into the cloud node I can reproduce this behavior. From the crowbar node, wget completes just fine. This looks like some network forwarding is not set up correctly from within the cloud node.
I should mention that this is on tumbleweed.
After looking at #2679, it seems to me from the logs that when creating a job fails because the HTTPS certificate has expired, no output is printed. But there should be. Also, I'm unsure whether a failure exit code is set correctly.
The request is made around here: https://github.com/openstack/python-jenkins/blob/1.2.1/jenkins/__init__.py#L543
I would expect an SSLError exception or something to be printed.
There are a bunch of expandable steps called alias_* and I have no idea why.
The README.md in the root of this repo could also show the contents of docs/mkcloud.md
It would be nice to be able to update the crowbar codebase to the newest version from D:C:S:X without having to reinstall everything from scratch.
Something like ./mkcloud rebase
This is already (manually) possible with the devsetup, and updating local git clones manually, but for production environments this doesn't exist yet.
I remember having done an rsync from the newest ISO to the repo on the admin node, but this required a download of the newest ISO.
Maybe there is a way (also for remote workers) to update the cloud like that?
See attached log for errors. Note that this might be an error in my configuration as well.
manila.txt
The scripts/jenkins/jenkins-job-trigger script fails in our Jenkins job with a LookupError because of "unknown encoding: idna".
The following stack trace shows the error:
Triggering jenkins job with url http://<address>/mko/sap-oc:crowbar-openstack:37:db8e9b8ac89b7c5b0ab98bd402f115871f2783c9:stable/sap/3.0/ and directory /srv/mkcloud/mko/sap-oc:crowbar-openstack:37:db8e9b8ac89b7c5b0ab98bd402f115871f2783c9:stable/sap/3.0
Traceback (most recent call last):
File "/root/github.com/SUSE-Cloud/automation/scripts/jenkins/jenkins-job-trigger", line 71, in <module>
jenkins_build_job(sys.argv[1], args)
File "/root/github.com/SUSE-Cloud/automation/scripts/jenkins/jenkins-job-trigger", line 63, in jenkins_build_job
server.build_job(job_name, job_parameters)
File "/usr/lib/python2.7/site-packages/jenkins/__init__.py", line 915, in build_job
self.build_job_url(name, parameters, token), b''))
File "/usr/lib/python2.7/site-packages/jenkins/__init__.py", line 344, in jenkins_open
self.maybe_add_crumb(req)
File "/usr/lib/python2.7/site-packages/jenkins/__init__.py", line 258, in maybe_add_crumb
self._build_url(CRUMB_URL)), add_crumb=False)
File "/usr/lib/python2.7/site-packages/jenkins/__init__.py", line 345, in jenkins_open
response = urlopen(req, timeout=self.timeout).read()
File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib64/python2.7/urllib2.py", line 431, in open
response = self._open(req, data)
File "/usr/lib64/python2.7/urllib2.py", line 449, in _open
'_open', req)
File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain
result = func(*args)
File "/usr/lib64/python2.7/urllib2.py", line 1227, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib64/python2.7/urllib2.py", line 1194, in do_open
h.request(req.get_method(), req.get_selector(), req.data, headers)
File "/usr/lib64/python2.7/httplib.py", line 1041, in request
self._send_request(method, url, body, headers)
File "/usr/lib64/python2.7/httplib.py", line 1075, in _send_request
self.endheaders(body)
File "/usr/lib64/python2.7/httplib.py", line 1037, in endheaders
self._send_output(message_body)
File "/usr/lib64/python2.7/httplib.py", line 881, in _send_output
self.send(msg)
File "/usr/lib64/python2.7/httplib.py", line 843, in send
self.connect()
File "/usr/lib64/python2.7/httplib.py", line 824, in connect
self.timeout, self.source_address)
File "/usr/lib64/python2.7/socket.py", line 554, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
LookupError: unknown encoding: idna
/root/github.com/SUSE-Cloud/automation/scripts/crowbar-testbuild.rb:124:in `trigger_jenkins_job'
/root/github.com/SUSE-Cloud/automation/scripts/crowbar-testbuild.rb:137:in `block in trigger_jenkins_jobs'
/root/github.com/SUSE-Cloud/automation/scripts/crowbar-testbuild.rb:136:in `each'
/root/github.com/SUSE-Cloud/automation/scripts/crowbar-testbuild.rb:136:in `trigger_jenkins_jobs'
/root/github.com/SUSE-Cloud/automation/scripts/crowbar-testbuild.rb:289:in `<main>'
We try to build & test the following PR sap-oc/crowbar-openstack#37
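A workaround often suggested for this particular LookupError is to import the idna codec explicitly at startup, so that it is already registered by the time the socket layer needs it. Whether that is the actual cause in this Jenkins job is an assumption; the sketch below (with a made-up helper name) only shows the import and a quick self-check.

```python
import codecs
import encodings.idna  # noqa: F401  (imported for its side effect: the codec becomes resolvable)


def idna_available():
    """Return True if the 'idna' codec can be looked up in this interpreter."""
    try:
        codecs.lookup("idna")
    except LookupError:
        return False
    return True


if __name__ == "__main__":
    print("idna codec available:", idna_available())
```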
Uncommenting OPENSTACK_API_VERSIONS in /srv/www/openstack-dashboard/openstack_dashboard/local/local_settings.py seems to fix this.
Currently mkcloud relies on a bunch of shell environment variables to control its behaviour:
docs/mkcloud.md
../mkcloud with no arguments. (It's also supposed to be output when you run ./mkcloud help, but that's broken.)
How did we get ourselves in this mess? I suggest that the answers include (but are not limited to) the following reasons:
This is not the fault of any one person, and anyway my goal is not to blame people. The priorities are:
My suggestions are:
docs/mkcloud.md and remove it from the code.
docs/mkcloud.md.
Thoughts welcome.
During onadmin_prepareinstallcrowbar two lines are written to /opt/dell/chef/cookbooks/nfs-server/templates/default/exports.erb (see lines qa_crowbarsetup.sh#L1546 and qa_crowbarsetup.sh#L1550).
The file /opt/dell/chef/cookbooks/nfs-server/templates/default/exports.erb belongs to the crowbar-core package. If this package gets updated after onadmin_prepareinstallcrowbar, all changes to this file are overwritten.
We run the following steps for testing: cleanup prepare setupadmin addupdaterepo prepareinstcrowbar runupdate bootstrapcrowbar instcrowbar setupnodes instnodes setup_aliases proposal testsetup cct addupdaterepo+0 onadmin+allow_vendor_change_at_nodes onadmin+zypper_update onadmin+cloudupgrade_clients testsetup cct
In our case the runupdate step that runs after prepareinstcrowbar performs such an update of crowbar-core. Because of this, the mkcloud run fails at step proposal when trying to mount the /var/lib/glance nfs share. More specifically, the command crowbarctl proposal commit nfs_client data fails with the message:
---- Begin output of mount -t nfs -o nofail,comment="managed-by-crowbar-barclamp-nfs-client" <hostname>:/var/lib/glance/images /var/lib/glance/images ----
STDOUT:
STDERR: mount.nfs: access denied by server while mounting <hostname>:/var/lib/glance/images
---- End output of mount -t nfs -o nofail,comment="managed-by-crowbar-barclamp-nfs-client" <hostname>:/var/lib/glance/images /var/lib/glance/images ----
This is caused by the fact that /var/lib/glance/images is not part of /etc/exports on the nfs server.
A possible hot fix for this could be to move the code that alters the crowbar template to a later mkcloud step (will provide a PR to showcase this). A real solution IMHO should avoid changing the template at all, since it is owned by an rpm package that might change the file.
e.g. https://ci.suse.de/job/openstack-mkcloud/12920/
mkcloud should have its own timeout handler which ensures that supportconfigs are always collected if the build takes too long.
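One possible shape for such a handler, sketched in Python: run the step with a deadline and call a collection hook in a `finally` block, so it runs whether the step succeeded, failed or was killed for taking too long. The function name and the `on_finish` hook (which would gather the supportconfigs) are hypothetical and not part of mkcloud.

```python
import subprocess


def run_with_deadline(cmd, timeout, on_finish):
    """Run `cmd`, giving up after `timeout` seconds.

    on_finish(timed_out) is always called, which is where supportconfigs
    would be collected. Returns the exit code, or 124 on timeout (the same
    value coreutils `timeout` uses).
    """
    timed_out = False
    try:
        return subprocess.run(cmd, timeout=timeout).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the direct child at this point.
        timed_out = True
        return 124
    finally:
        on_finish(timed_out)
```

Note that only the direct child is killed on timeout; a real implementation would also have to deal with processes started by that child.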
Even a 10-line README would be a good start.
This is a sister issue to #223, covering the case when download of a new image is actually required.
Back in 2014 I wrote the dl-ibs utility and announced it on cloud-devel. Dirk replied suggesting makedeltaiso / applydeltaiso instead.
In this issue, let's track all possible options, decide on one, and then ensure it's used by mkcloud.
The function do_testsetup() https://github.com/SUSE-Cloud/automation/blob/master/scripts/qa_crowbarsetup.sh#L1138 has a string of 186, which is a shell script with a lot of logic passed as an argument to ssh to be run on $novacontroller.
This function should at least scp a script to novacontroller and run it in a similar way to how qa_crowbarsetup.sh is executed.
It seems like a pretty common use case to want to run mkcloud inside screen. I guess there are a few reasons, e.g.
mkcloud runs generally take a long time, so this protects against network outages
mkcloud runs generate a lot of output, and screen/tmux can capture output in the scroll-back buffer or even in a log file (although #1191 could take care of that separately)
mkcloud currently needs to run in a dedicated directory (see #224), so this helps keep separation between multiple clouds on the same machine.
We could either write a wrapper around mkcloud which supports creation of a new screen session for the cloud and reuse of any existing session, or we could add native support for this directly into mkcloud itself. For example:
exec $0 "$@" or similar.
A similar approach could be used for tmux.
Is this worth doing? @vuntz This arose from looking at /root/manual.vuntz/start-screen.sh etc. on mkcloud1.
/cc @jdsn @bmwiedemann
One remaining question is how to handle non-interactive invocations, e.g. if you were batch-starting 10 new clouds in parallel.
I get the following error trying to run crowbar-prep.sh from the current master:
# cat crowbar-prep.sh | ssh root@crowbar bash -s - -p 4 nue-nfs
Password:
WARNING: Removing 192.168.124.10 entry already in /etc/hosts:
192.168.124.10 pebbles.crowbar.dev pebbles
Not using 9p
/srv/tftpboot/suse-11.3/install already mounted; umounting ...
mounted /srv/tftpboot/suse-11.3/install
/srv/tftpboot/repos/SLES11-SP3-Pool already mounted; umounting ...
mounted /srv/tftpboot/repos/SLES11-SP3-Pool
/srv/tftpboot/repos/SLES11-SP3-Updates already mounted; umounting ...
mounted /srv/tftpboot/repos/SLES11-SP3-Updates
/srv/tftpboot/repos/Cloud already mounted; umounting ...
mounted /srv/tftpboot/repos/Cloud
mount: can't find /srv/tftpboot/repos/SUSE-Cloud-4-Updates in /etc/fstab or /etc/mtab
Couldn't mount /srv/tftpboot/repos/SUSE-Cloud-4-Updates
It outputs bash help instead.
qa_crowbarsetup.sh has loads of sleep calls. This is unreliable and also typically results in everything taking longer than it should. They should all be replaced with busy waits.
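The general pattern, sketched in Python rather than in the shell of qa_crowbarsetup.sh: poll a condition at a short interval and stop as soon as it holds, with an upper bound so a stuck step still fails. The function name and defaults are made up for illustration.

```python
import time


def wait_for(condition, timeout=300.0, interval=2.0,
             sleep=time.sleep, clock=time.monotonic):
    """Poll `condition` until it returns true or `timeout` seconds pass.

    Returns True as soon as the condition holds and False on timeout, so
    the caller waits no longer than necessary and can fail loudly.
    """
    deadline = clock() + timeout
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Compared with a fixed sleep, this returns immediately in the fast case and turns the slow case into an explicit, reportable timeout instead of a silent race.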
If the addupdaterepo step is run before the prepareinstcrowbar step (e.g. via alias_new_admin), then the newly added PTF repo gets removed again.
The code really should be more robust than this.
+ echo '============> MKCLOUD STEP START: rebootcloud <============'
============> MKCLOUD STEP START: rebootcloud <============
+ echo
+ sleep 2
+ echo rebootcloud
+ cmd_parameters=rebootcloud
+ cmd=rebootcloud
+ rebootcloud rebootcloud
+ onadmin rebootcloud
+ local cmd=rebootcloud
+ shift
+ sshrun onadmin_rebootcloud
+ cat
+ env
+ grep -e '^debug_' -e '^pre_' -e '^want_' -e '^net_' -e '^nodenumber' -e '^clusterconfig'
+ sort
+ scp -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null ./qa_crowbarsetup.sh mkcloud.config [email protected]:
ssh: connect to host 192.168.217.10 port 22: Connection timed out
lost connection
+ [[ '' = 1 ]]
++ hostname
+ ssh -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null [email protected] 'echo mkcloud1 > cloud ; . qa_crowbarsetup.sh ; onadmin_rebootcloud'
ssh: connect to host 192.168.217.10 port 22: Connection timed out
+ return 255
+ return 255
+ ret=255
+ '[' 255 '!=' 0 ']'
+ set +x
$h1!!
Error detected. Stopping mkcloud.
The step 'rebootcloud' returned with exit code 255
Please refer to the rebootcloud function in this script when debugging the issue.
ssh: connect to host 192.168.217.10 port 22: Connection timed out
ssh: connect to host 192.168.217.10 port 22: Connection timed out
Environment Details
-------------------------------
hostname: mkcloud1.cloud.suse.de
started: Mon 23 Nov 11:07:20 UTC 2015
ended: Mon 23 Nov 11:15:53 UTC 2015
-------------------------------
cloudsource: develcloud6
TESTHEAD: 1
want_test_updates: 1
scenario:
nodenumber: 4
cloudpv:
UPDATEREPOS:
cephvolumenumber: 0
upgrade_cloudsource:
-------------------------------
want_ipmi=false
want_sles12sp1=1
want_test_updates=1
want_sles12=1
-------------------------------
allocpool is a small but critical part of our CI. It would be helpful if it was documented and also written in a language which doesn't violate our policy. This is increasingly important as new people join the team who have been hired as Python/Ruby/shell hackers, not Perl hackers.
nova_with_ssl and maybe more options are not documented anywhere, not even in the undocumented options