controller's Issues

Improvements to the Scheduler Client

From @helgi on September 15, 2016 22:0

This is an umbrella ticket, and not all of these things need to be done urgently. I just didn't want to create a bunch of tiny tickets yet.

  • When fetching / creating resources, can the data be stored internally on the object so that certain functions can run without the structs being passed in (move the response into an attribute)? This would amount to being able to do pod = scheduler.pod.get('blah'); pod['metadata']['name'] == 'foo'; pod.update(); note the lack of .json() and such, as the response would no longer be returned by default. See the sketch after this list.
  • Can we store pod/container objects but turn them into a dict for requests (create / update / etc.)? That makes it all flow nicer and could get rid of manifest().
  • Memoize the results of version and more! This can be applied per function.
  • Have the client understand throttling from Kubernetes. Does anything need to be done? Backoff?
  • Start validating inputs? Or make the API server catch all of those and just bubble the errors up... our models should catch them anyway.
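
For illustration, a minimal sketch of the first bullet's interface (all names hypothetical; this is not current scheduler code):

class PodResource(dict):
    """Wraps a parsed API response so callers never touch .json()."""

    def __init__(self, client, data):
        super().__init__(data)   # pod['metadata']['name'] works directly
        self._client = client

    def update(self):
        # PUT the (possibly mutated) dict back to the API server
        return self._client.put(self['metadata']['name'], dict(self))

scheduler.pod.get('blah') would then return a PodResource, making pod['metadata']['name'] == 'foo'; pod.update() read exactly as in the bullet above.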

Removing dependency on Django / DRF in Scheduler:

  • remove django settings usage (testing only atm)
  • remove reliance on django cache (testing only)

Copied from original issue: deis/controller#1072

Create Kubernetes specific validation functions to use all over

From @helgi on September 15, 2016 21:13

Create validation functions for the various Kubernetes requirements

For example, helpers to validate the various length and character requirements; basically DNS validation.

Labels are limited to 63 characters, must adhere to the DNS specs, and so on.

This would allow a few models to share more unified validation functions; the tags validation in particular could do with it.

The question is whether this lives inside the scheduler or not. The problem with it living there is that it ultimately needs to raise ValidationError for the serializers and friends, but it is worth investigating.
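
For illustration, a shared helper could look like this (a sketch; the 63-character limit and allowed characters follow the RFC 1123 DNS label rules that Kubernetes enforces):

import re

# RFC 1123 label: 1-63 chars, lowercase alphanumerics or '-',
# starting and ending with an alphanumeric character
DNS_LABEL = re.compile(r'^[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?$')

def validate_dns_label(value):
    if not DNS_LABEL.match(value):
        raise ValueError('"{}" is not a valid DNS label'.format(value))

If it lived in the scheduler, it could raise a plain ValueError like this and let the serializer layer wrap it in the ValidationError that DRF expects.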

Copied from original issue: deis/controller#1070

router.deis.io/ssl.enforce annotation not persisted on service creation

From @mattk42 on June 5, 2017 19:57

If a service somehow gets deleted, the router.deis.io/ssl.enforce annotation is not set to match the state in the database when the controller recreates that service.

To reproduce, I had an app with tls:enable set and deleted the namespace. I restarted the controller to force the app to be re-created and then ran the following:

$ deis tls:info -a test-app
=== test-app TLS
HTTPS Enforced: true

$ kubectl get service -n test-app -o json | grep enforce

$ deis tls:enable -a test-app
Enabling https-only requests for test-app... Error: Unknown Error (409): {"detail":"mattk42 changed nothing"}

$ deis tls:disable -a test-app
Disabling https-only requests for test-app... done

$ deis tls:enable -a test-app
Enabling https-only requests for test-app... done

$ kubectl get service -n test-app -o json | grep enforce
                    "router.deis.io/ssl.enforce": "True"

Copied from original issue: deis/controller#1304

Scaling an app down while a build is running leads to unpredictable results

From @deis-admin on January 19, 2017 23:41

From @jeff-lee on November 5, 2015 22:47

I'm running into an issue in v1.12.0 where scaling down an app while a build is running can result in either:

a) The new containers getting shut down and the build hanging
b) Zero running containers

I started a new cluster and scaled the example-go app up to 3.

$ fleetctl list-units|grep jefftest
jefftest_v74.web.1.service  a5ea5dc1.../10.10.17.144    active      running
jefftest_v74.web.2.service  6b548706.../10.10.19.9      active      running
jefftest_v74.web.3.service  6b548706.../10.10.19.9      active      running

I then started a build (v75) and scaled the app down from 3 to 2 as the node started pulling the new containers.

$ deis ps:scale web=2 -a jefftest
Scaling processes... but first, coffee!
done in 5s
=== jefftest Processes
--- web:
web.1 up (v74)
web.2 up (v74)

At this point, the v75 container gets stopped and the build (with HEALTHCHECK_URL set) hangs.

Thu Nov  5 22:06:45 UTC 2015
cda30e1fda8e        10.10.16.243:5000/jefftest:v75   "/runner/init start    1 seconds ago        Up Less than a second   0.0.0.0:32901->5000/tcp   jefftest_v75.web.1
2598c80e0985        10.10.16.243:5000/jefftest:v74   "/runner/init start    About a minute ago   Up About a minute       0.0.0.0:32900->5000/tcp   jefftest_v74.web.3
9d4614e6fb3f        10.10.16.243:5000/jefftest:v74   "/runner/init start    2 minutes ago        Up 2 minutes            0.0.0.0:32899->5000/tcp   jefftest_v74.web.2
Thu Nov  5 22:06:46 UTC 2015
9d4614e6fb3f        10.10.16.243:5000/jefftest:v74   "/runner/init start    2 minutes ago       Up 2 minutes        0.0.0.0:32899->5000/tcp   jefftest_v74.web.2
Thu Nov  5 22:06:47 UTC 2015
9d4614e6fb3f        10.10.16.243:5000/jefftest:v74   "/runner/init start    2 minutes ago       Up 2 minutes        0.0.0.0:32899->5000/tcp   jefftest_v74.web.2
Thu Nov  5 22:06:48 UTC 2015
9d4614e6fb3f        10.10.16.243:5000/jefftest:v74   "/runner/init start    2 minutes ago       Up 2 minutes        0.0.0.0:32899->5000/tcp   jefftest_v74.web.2
Thu Nov  5 22:06:49 UTC 2015
9d4614e6fb3f        10.10.16.243:5000/jefftest:v74   "/runner/init start    2 minutes ago       Up 2 minutes        0.0.0.0:32899->5000/tcp   jefftest_v74.web.2

I have also seen all of the containers get stopped when scaling from 3 to 2, though so far I have only been able to reproduce this when HEALTHCHECK_URL is not set.

14:42:50 [ds12] - /Users/jefflee
$ deis ps:scale web=2 -a jefftest
Scaling processes... but first, coffee!
done in 6s
=== jefftest Processes

14:44:49 [ds12] - /Users/jefflee
$ deis info -a jefftest
=== jefftest Application
updated:  2015-11-05T22:44:49UTC
uuid:     20949ab0-ffbd-4442-b490-f7b01951976b
created:  2015-11-05T18:43:43UTC
url:      jefftest.ds12.therealreal.com
owner:    jefflee
id:       jefftest

=== jefftest Processes

=== jefftest Domains

Copied from original issue: deis/deis#4719

Copied from original issue: deis/controller#1224

Proposal: Fine Grained Permissions

From @Joshua-Anderson on July 30, 2015 20:52

Right now, there are two permission levels in deis:

administrators
    full access
normal users
    full access to create apps and manage their own apps.

This proposal would overhaul the permission system by allowing much finer control over what users can do.

Cluster Permissions

certs
    add or remove certs from the cluster
apps
    create/destroy
    app management: view, modify, share, and transfer other users' apps.

App permissions

config
    read or modify config
push
    can push code or create a release
domains
    add or remove domains
scale
    scale an app up or down

Default Permissions

Administrators have all permissions granted.

An ETCD setting would set the default permissions for new users:

Example key layout:

/deis/controller/permissions/apps true
/deis/controller/permissions/certs true
...

Example Usage

an admin shares user cert permission with user foo

$ deis perms:create foo --cert --apps

an app owner removes config permission from user tester

$ deis perms:delete tester --config

users can view what permissions they have

$ deis perms:view
Cluster Wide Permissions

certs

App epic-app Permissions

config
push
scale

admins and app owners can also view a user's permissions

$ deis perms:list --username=foo
App epic-app Permissions

config
push
scale

Testing

Almost all of this code resides in the controller, so it would mostly involve lots of tests in the controller to make sure all the edge cases are covered.
Migration

Migration should be pretty simple: admins would still have all access, and a migration script would grant all existing users their current permissions.

The same would apply to apps: the app owner would have all access, and users the app was shared with would get the subset of permissions they already had.

Copied from original issue: deis/deis#4150

Copied from original issue: deis/controller#1226

Missing global settings customization in helm chart

From @felixbuenemann on May 24, 2017 17:44

There is currently no way to set the following global controller environment variables without customizing helm charts:

  • DEIS_DEPLOY_BATCHES
  • DEIS_DEPLOY_TIMEOUT
  • KUBERNETES_DEPLOYMENTS_REVISION_HISTORY_LIMIT
  • KUBERNETES_POD_TERMINATION_GRACE_PERIOD_SECONDS

(There are probably more; all of the above can be overridden using the identically named environment variables in each app.)

It would be great if the controller helm chart had an environment_variables section to allow setting these variables, similar to router.deployment_annotations in the deis/router chart:

controller:
  environment_variables:
    KUBERNETES_DEPLOYMENTS_REVISION_HISTORY_LIMIT: "10"
    KUBERNETES_POD_TERMINATION_GRACE_PERIOD_SECONDS: "60"

Alternatively, separate values could be added for each setting, but I think this approach is more flexible and avoids the need to document rarely customized settings.

Copied from original issue: deis/controller#1301

Some portion of failed requests (502) during application deployment (regression since deis 1.x)

From @rvadim on March 7, 2017 7:36

Hello,
thank you for your work.
We have a problem with zero-downtime deployment in Deis 2, due to http://stackoverflow.com/questions/40545581/do-kubernetes-pods-still-receive-requests-after-receiving-sigterm
I tested it with Gatling on different applications (pull, config:set, etc.) and got 5-100 errors (502) under 100 RPS.
We work around this issue in the native k8s interface with a preStop hook /bin/sh -c "sleep 5", but I don't know how to fix this in the deis controller because, for example, images with a Go binary built from scratch don't have any shell.
Does anyone have an idea how to fix this?
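
For images that do contain a shell, one conceivable controller-side approach would be to inject the same preStop hook into the container spec it generates (a sketch only; container stands for the dict the scheduler builds for the pod manifest, and injecting this is an assumption, not current behavior):

# Delay SIGTERM so the endpoints controller can remove the pod from the
# service before the app stops accepting traffic. Requires /bin/sh in
# the image, which is exactly what scratch-built Go images lack.
container['lifecycle'] = {
    'preStop': {
        'exec': {'command': ['/bin/sh', '-c', 'sleep 5']}
    }
}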

Thank you.

Copied from original issue: deis/controller#1254

config:set --global for common variables

From @deis-admin on January 19, 2017 23:24

From @boffbowsh on September 10, 2014 14:8

It'd be extremely useful for a Deis deployment of common applications to have the concept of cluster-wide environment variables, for example a REDIS_URL if all apps hit a common Redis server, or perhaps some email service credentials.

Regarding the action that would take place when these get updated, I think it would be best to allow the operators to re-release and thus restart apps at their own pace.
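
Precedence would presumably be app-over-global, which is a one-liner wherever the controller assembles the release environment (a sketch; both names hypothetical):

# later keys win, so an app-level REDIS_URL overrides the cluster-wide one
env = {**cluster_config, **app_config}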

Copied from original issue: deis/deis#1804

Copied from original issue: deis/controller#1219

Attach host devices to containerized application

Is there a way to attach a webcam device to a containerized app that is managed via deis workflow, e.g. like the --device option in docker?

I see that there is no way to attach volumes to the container using deis, and that this was a work in progress via addons.

Ability to use volumes within deployed apps

From @jchauncey on October 11, 2016 20:56

I've recently needed the ability to mount a volume from the host into a pod. It actually made me question the use of deis for this particular app, and whether I should just use deployments instead. It would be awesome if we could make a persistentvolumeclaim or use a regular volume mount.

Copied from original issue: deis/controller#1111

Deployment timeouts set using k8s API

Look at what the Kubernetes API provides in terms of deployment checking. Right now, deis does not seem to check whether there is already a deployment going on via the kubernetes API. This creates problems when deployments are triggered from the kubernetes side simultaneously with a hephy-controller deployment. There is a default timeout, with an app-specific override provided as an environment variable, DEIS_DEPLOY_TIMEOUT.

Using the k8s deployment API might allow for smoother deployments in cases where the hephy-controller is trying to deploy an app that is already being deployed.
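
For reference, a sketch of the kind of check the Deployment API makes possible (assuming a helper on the controller's scheduler client that GETs the Deployment object; the status fields are the same ones kubectl rollout status relies on):

def deployment_in_progress(namespace, name):
    # GET the Deployment object from the apps API group
    d = scheduler.deployment.get(namespace, name).json()
    status = d.get('status', {})
    done = (
        status.get('observedGeneration', 0) >= d['metadata']['generation']
        and status.get('updatedReplicas', 0) == d['spec'].get('replicas', 0)
        and status.get('availableReplicas', 0) == status.get('updatedReplicas', 0)
    )
    return not done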

deis run command taking over 60 seconds to complete

From @vdice on October 5, 2016 21:37

CI is seeing failures when running deis run env -a <appName> due to said command not returning within the default timeout of 60 seconds. (Original issue/background info found in deis/workflow-e2e#329)

As it is reasonable to expect such a command to return much sooner, I've filed the issue here. cc @kmala

If this issue is addressed, a follow-up TODO on the fix would be to decrease e2e's bumped threshold for any test involving deis run.

Copied from original issue: deis/controller#1106

add deis domains:transfer support

From @deis-admin on January 19, 2017 23:23

From @olalonde on August 9, 2015 17:50

I'm trying to do blue-green deployment with Deis. So I have two Deis apps (app-green and app-blue). At any time either app-green OR app-blue should be live (e.g. serve traffic for app.domain.com). Everything good so far except for one thing: is there any way to atomically change app-green and app-blue's DNS configs so that I can redirect traffic to one app or the other for my domain name atomically?

e.g:

deis domains:remove app.domain.com --app app-green && deis domains:add app.domain.com --app app-blue

Any way to make this atomic so there is no downtime? Other suggestions?

Copied from original issue: deis/deis#4237

Copied from original issue: deis/controller#1218

feat(client): `deis logs -f`

From @deis-admin on January 19, 2017 20:53

From @mboersma on January 15, 2014 21:6

Currently with deis you can run deis logs, but it's a single request-response that returns the current app logs and ends. Nor does deis run tail -f work since that isn't (yet) a long-running, duplex connection. Users should be able to get live tailing of logs.

This was requested by user kfarrell on IRC. See also #117.

Copied from original issue: deis/deis#465

Copied from original issue: deis/controller#1210

mitigate the number of times we tag an image

From @bacongobbler on September 18, 2016 17:42

According to @lavalamp, Workflow clusters in GKE are triggering this bug. This issue occurs because Workflow will re-tag an image even if no changes were made. We used to do this because we would inject environment metadata into the image when deis config:set was called. Now, we just inject that environment metadata directly into the k8s manifest, so no image modification (and therefore no image re-tagging) is required.

While the upstream bug will eventually get fixed, in the meantime we should investigate how we trigger it and see whether there is a way to mitigate the issue.

Copied from original issue: deis/controller#1082


When namespace is missing, apps:destroy should delete from postgres db

From @Cryptophobia on June 26, 2017 15:11

There is an issue with the deis-controller when it comes to deleting applications. Let's say I create a new app in deis-controller called testing-app. The app is created in the data model for deis-controller and everything works correctly.

But let's say I am a kubernetes fanboi and I delete the entire namespace (testing-app) of the application instead of using the good old deis-controller api to do the exact same job. This will no doubt achieve what I want, because the application will disappear. However, when I query the api, the application is still there. So the next thing I want to do is delete the application testing-app from deis using deis apps:destroy -a testing-app, or delete it using DELETE on the v2/apps/testing-app route. However, that call does not succeed because it cannot find the namespace, when in reality everything is already done for me; I just want this app to disappear from the deis db as well.

Is this as simple as adding more if logic to handle this case, or am I missing something extra...?
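
If it is that simple, the shape of the fix would be roughly this (a sketch, not the actual controller code; the scheduler call and exception attributes are illustrative):

def destroy(self):
    try:
        # delete the app's namespace via the scheduler client
        self._scheduler.ns.delete(self.id)
    except KubeHTTPException as e:
        if e.response.status_code != 404:
            raise          # real failures still bubble up
        # namespace already gone; nothing left to clean up in Kubernetes
    super().delete()       # always remove the app row from postgres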

Copied from original issue: deis/controller#1310

DEIS_DEPLOY_HOOK_URLS too early?

From @Cryptophobia on August 10, 2017 22:4

Is DEIS_DEPLOY_HOOK_URLS pretty much useless...? The hook actually fires before the deployment starts; in what case would it ever be that useful?

Deis-controller Logs:

172.17.0.8 "POST /v2/apps/testing-application/config HTTP/1.1" 201 5084 "PostmanRuntime/6.1.6"
172.17.0.8 "POST /v2/auth/login/ HTTP/1.1" 200 52 "Typhoeus - https://github.com/typhoeus/typhoeus"
172.17.0.8 "GET /v2/apps/testing-application/releases HTTP/1.1" 200 17307 "Typhoeus - https://github.com/typhoeus/typhoeus"
172.17.0.8 "POST /v2/auth/login/ HTTP/1.1" 200 52 "Typhoeus - https://github.com/typhoeus/typhoeus"
172.17.0.8 "GET /v2/apps/testing-application/releases HTTP/1.1" 200 17307 "Typhoeus - https://github.com/typhoeus/typhoeus"
INFO [testing-application]: config testing-application-2277815 updated
INFO [testing-application]: admin changed DEIS_DEPLOY_TEST
INFO [testing-application]: Deploy hook sent to https://requestb.in/asd89iuowq
INFO Pulling Docker image deis/example-go:latest
INFO Tagging Docker image deis/example-go:latest as 127.0.0.1:5555/testing-application:v54
INFO Pushing Docker image 127.0.0.1:5555/testing-application:v54
INFO Pulling Docker image 127.0.0.1:5555/testing-application:v54
INFO Pulling Docker image 127.0.0.1:5555/testing-application:v54
INFO [testing-application]: adding 5s on to the original 120s timeout to account for the initial delay specified in the liveness / readiness probe
INFO [testing-application]: This deployments overall timeout is 125s - batch timeout is 125s and there are 1 batches to deploy with a total of 1 pods

As you can see from the above, the DEIS_DEPLOY_HOOK fires without waiting for the liveness and readiness probes...

We need a POST_DEIS_DEPLOY_HOOK to fire so that we can take actions with another API that our application might depend on. What was the idea behind the original deploy hook? Was it meant just for Slack integrations?

This is a feature request.

Copied from original issue: deis/controller#1319

LDAP authentication requires a valid group filter and group basedn

From @hankjacobs on June 7, 2017 19:46

Hello,

I recently upgraded from 2.13.0 to 2.15.0. After upgrading, deis login (which is configured to use LDAP) started to fail with Error: Internal Server Error. This stack trace appeared in the logs of deis-controller:

ERROR:root:Uncaught Exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/ldap/ldapobject.py", line 265, in _ldap_call
    result = func(*args,**kwargs)
ldap.FILTER_ERROR: {'desc': 'Bad search filter'}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/views.py", line 486, in dispatch
    response = handler(request, *args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/authtoken/views.py", line 17, in post
    serializer.is_valid(raise_exception=True)
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/serializers.py", line 237, in is_valid
    self._validated_data = self.run_validation(self.initial_data)
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/serializers.py", line 435, in run_validation
    value = self.validate(value)
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/authtoken/serializers.py", line 16, in validate
    user = authenticate(username=username, password=password)
  File "/usr/local/lib/python3.5/dist-packages/django/contrib/auth/__init__.py", line 100, in authenticate
    user = backend.authenticate(*args, **credentials)
  File "/usr/local/lib/python3.5/dist-packages/django_auth_ldap/backend.py", line 171, in authenticate
    user = ldap_user.authenticate(password)
  File "/usr/local/lib/python3.5/dist-packages/django_auth_ldap/backend.py", line 346, in authenticate
    self._get_or_create_user()
  File "/usr/local/lib/python3.5/dist-packages/django_auth_ldap/backend.py", line 574, in _get_or_create_user
    self._mirror_groups()
  File "/usr/local/lib/python3.5/dist-packages/django_auth_ldap/backend.py", line 704, in _mirror_groups
    target_group_names = frozenset(self._get_groups().get_group_names())
  File "/usr/local/lib/python3.5/dist-packages/django_auth_ldap/backend.py", line 827, in get_group_names
    group_infos = self._get_group_infos()
  File "/usr/local/lib/python3.5/dist-packages/django_auth_ldap/backend.py", line 875, in _get_group_infos
    self._group_search)
  File "/usr/local/lib/python3.5/dist-packages/django_auth_ldap/config.py", line 467, in user_groups
    groups = search.execute(ldap_user.connection)
  File "/usr/local/lib/python3.5/dist-packages/django_auth_ldap/config.py", line 168, in execute
    self.attrlist)
  File "/usr/local/lib/python3.5/dist-packages/ldap/ldapobject.py", line 770, in search_s
    return self.search_ext_s(base,scope,filterstr,attrlist,attrsonly,None,None,timeout=self.timeout)
  File "/usr/local/lib/python3.5/dist-packages/ldap/ldapobject.py", line 763, in search_ext_s
    msgid = self.search_ext(base,scope,filterstr,attrlist,attrsonly,serverctrls,clientctrls,timeout,sizelimit)
  File "/usr/local/lib/python3.5/dist-packages/ldap/ldapobject.py", line 759, in search_ext
    timeout,sizelimit,
  File "/usr/local/lib/python3.5/dist-packages/ldap/ldapobject.py", line 273, in _ldap_call
    e.args[0]['info'] = strerror(e.args[0]['errno'])
KeyError: 'errno'
10.68.167.81 "POST /v2/auth/login/ HTTP/1.1" 500 25 "Deis Client v2.15.0"

I was able to determine that the issue had to do with LDAP_GROUP_BASEDN and LDAP_GROUP_FILTER being empty (as per the default settings). This had worked on 2.13.0 but broke on 2.15.0. Setting the above to a valid basedn and filter solved the issue, but should be unnecessary since we do not use groups.

Copied from original issue: deis/controller#1306

Ability to cancel a deploy

From @helgi on September 15, 2016 21:19

Add the ability to cancel a deployment (deis deploy:cancel); the open question is whether cancelling will create:

  • a new release in Deis (allows for doing a "log" but release is still kept around)
  • rollback in k8s (no log / audit trail)

This will require a CLI change as well

Copied from original issue: deis/controller#1071

apps to namespace mapping

Today, apps are mapped to namespaces one-to-one. Can we extend this to enable running multiple apps, and hence services, under a single namespace? Kubernetes and Helm allow this. I can understand why it is done the way it is now, but would it be a possibility to extend this to run multiple apps/services under one namespace rather than having a one-to-one connection between app and namespace?

Deployments allowed when PORT not set and private docker repository in use.

From @mattk42 on November 9, 2016 20:53

It seems that I have hit an edge case which has allowed me to attempt to deploy a docker app from a private repository even though I have not set the $PORT environment variable.

In this scenario I have set up a brand new cluster configured with a new private repository pointing at a replica of the block storage from a previous cluster which used the internal repository.

Once the cluster started, I had existing apps that were docker based and could not launch since the images were unavailable. If I attempt to push to such an application without setting the PORT variable, I am allowed to, but then get stuck: the app can't launch because PORT is not set, and I am unable to set it.

Copied from original issue: deis/controller#1134

Deploy hooks don't inform of failed deploys

From @mhulscher on January 19, 2017 15:1

The controller posts a callback as soon as a request comes in. However, a failed deploy, which does a rollback to the previous release, is not posted. This can cause confusion. Is this something that would be within the scope of the deployment hooks?

Copied from original issue: deis/controller#1209

Wrong service selector after switch from Dockerfile to Docker Image

From @felixbuenemann on August 7, 2017 19:19

Switching an app from Dockerfile to Docker Image deploys can cause the app to become unroutable, because the kubernetes service uses the type=web selector instead of type=cmd.

Looking at the api_app table in the deis database shows that the structure column of the affected app contains {"web":0,"cmd":1} which causes the controller to choose the wrong selector.

Manually updating the structure column to {"cmd":1} and re-pulling the image fixes the problem.

The problem was observed in a cluster running workflow v2.14.0 for both the initial Dockerfile deploy and all following releases of the app:

v7    2017-08-07T17:29:32Z    buenemann deployed registry.example.com/buenemann/shop-api-cache
v6    2017-08-07T17:25:01Z    buenemann deployed registry.example.com/buenemann/shop-api-cache
v5    2017-08-07T17:22:29Z    buenemann added IMAGE_PULL_POLICY
v4    2017-05-30T20:01:51Z    buenemann deployed registry.example.com/buenemann/shop-api-cache
v3    2017-05-30T20:01:16Z    buenemann added registry info password, username
v2    2017-05-30T20:00:52Z    buenemann added CACHE_SIZE, PURGE_NET, BACKEND_PORT, BACKEND_HOST, PORT
v1    2017-05-30T19:59:33Z    buenemann created initial release

The app was running fine on v6, but I likely fixed the kubernetes service selector using kubectl edit previously and forgot about it, so the problem came back due to the stale proctype config in the deis database.

This problem looks similar to what was reported in deis/workflow#658 and should have been fixed in workflow v2.12 according to the PR #1201.

So either this fix was not sufficient or there was a regression.

Copied from original issue: deis/controller#1316

deis tags per process type

From @deis-admin on January 19, 2017 21:9

From @azurewraith on April 3, 2015 20:32

It could be useful if you could configure process types to run in particular places based on fleet metadata with deis tags.

For example, if you have asynchronous workers that don't necessarily have to be on a fast machine but you want your web processes on better hardware. Or you have asynchronous workers which require a lot of CPU/Memory and you want to schedule them only on hardware that could support it.

Right now, you have to have separate apps for them, which can lead to a lot of redundancy, multiple pushes, duplicated environment config, etc.

I understand not everyone is running a heterogeneous cloud environment but there is some aspect of cluster cost maintenance untapped here where you could have redundant large hosts paired with redundant smaller hosts and reserve the large hosts for process types that need them. Case in point here from a deis core perspective, the deis-builder.

Copied from original issue: deis/deis#3416

Copied from original issue: deis/controller#1212

Improvement: display current number of processes in `deis ps`

From @ineu on May 21, 2017 7:28

In Deis 1, processes were named like type-number, so you could look at the highest number to get the current count. But in K8s, pods have unique hashes instead of numbers, so it's not easy to tell what the current count is when you want to scale. A nice improvement would be to display the count next to the process name, e.g.

=== foo Processes
--- web (10):
...
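
On the controller side this is essentially a label query plus a tally (a sketch; it assumes each pod carries a type label identifying its process type):

from collections import Counter

def process_counts(pods):
    # pods: the pod manifests returned for one app's namespace
    return Counter(p['metadata']['labels'].get('type', 'unknown')
                   for p in pods)

Counter({'web': 10}) would then render as "--- web (10):".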

Copied from original issue: deis/controller#1295

Cannot delete a user who owns releases.

From @mattk42 on February 14, 2017 22:17

I have a user who was an admin on our cluster and had pushed to multiple apps that he was not the owner for.

After transferring ownership of the apps that he owns to another user, I get the following when trying to delete him.

$ deis auth:cancel --username=someuser
cancel account someuser at https://deis.mycluster.com? (y/N): y
Error: Unknown Error (409): {"detail":"(\"Cannot delete some instances of model 'User' because they are referenced through a protected foreign key: 'Release.owner'\", <QuerySet [<Release: some-app-v7>, <Release: another-app-v10>]
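
The 409 comes from Django refusing to cascade over a protected foreign key; the Release model presumably declares something like this (a sketch):

# Deleting a User fails while any Release still references them.
owner = models.ForeignKey(User, on_delete=models.PROTECT)

So releases would need to be reassigned (or the relation relaxed, e.g. to SET_NULL) before an account can be cancelled.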

Copied from original issue: deis/controller#1244

proposal: atomic cert update

From @deis-admin on January 19, 2017 23:40

From @szymonpk on September 29, 2015 11:29

There is no way to do an 'atomic' certificate update: if I do deis certs:remove tld.com && deis certs:add tld.crt tld.key, the cert is changed on the router's disk but nginx isn't restarted. It's required to have a few minutes' pause between each command, or to do some strange workarounds (restart the routers by hand, or add/remove certs for other apps where downtime is acceptable, so the configuration is reloaded as one).

I'm not sure which component should be modified to achieve this; can the controller instrument the routers (request a certificate refresh)?

Copied from original issue: deis/deis#4544

Copied from original issue: deis/controller#1223

feat(controller): add `deis certs:transfer` endpoint

From @deis-admin on January 19, 2017 23:39

From @stuszynski on October 7, 2015 15:7

Hi,

This project is quite new to me but it seems that I'll be hanging around here for a while. ;) I encountered a strange issue and I'm not sure if this is desired behavior.

We have several users on our deis platform, including 3 admins (with superuser privileges). If one of them adds an application certificate, it can't be listed or removed by another admin.

First admin (who added certificates):

$ deis certs:list
Common Name          Expires
-------------------  ----------------------
*.myawesomecert.com  2016-08-24T23:59:59UTC
www.yolo.io          2020-05-30T10:48:38UTC

Second admin:

$ deis certs:list
No certs
$ deis perms:list --admin
{
  "count": 3,
  "previous": null,
  "results": [
    {
      "username": "admin1",
      "is_superuser": true
    },
    {
      "username": "admin2",
      "is_superuser": true
    },
    {
      "username": "admin3",
      "is_superuser": true
    }
  ],
  "next": null
}

I figured out that if I (the second admin) add another certificate, it will be visible in certs:list, but only to me; the first admin will not see this certificate either. All certificates will still be correctly rendered on the routers.

Unfortunately I couldn't find anything suspicious in the api code. I tested it on the 1.8.0 and 1.9.1 platforms and also updated the client to 1.11.0, with no luck.

What do you think of it? Is this a bug or some sort of user/owner relation for application certs that I'm not aware of?

Copied from original issue: deis/deis#4576

Copied from original issue: deis/controller#1221

deis controller throws 500 with badly padded key

From @deis-admin on January 19, 2017 23:44

From @bfosberry on May 21, 2015 21:2

May 21 20:58:58 ip-10-12-13-195.ec2.internal sh[3317]: Traceback (most recent call last):
May 21 20:58:58 ip-10-12-13-195.ec2.internal sh[3317]: File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/base.py", line 112, in get_response
May 21 20:58:58 ip-10-12-13-195.ec2.internal sh[3317]: response = wrapped_callback(request, *callback_args, **callback_kwargs)
May 21 20:58:58 ip-10-12-13-195.ec2.internal sh[3317]: File "/usr/local/lib/python2.7/dist-packages/django/views/decorators/csrf.py", line 57, in wrapped_view
May 21 20:58:58 ip-10-12-13-195.ec2.internal sh[3317]: return view_func(*args, **kwargs)
May 21 20:58:58 ip-10-12-13-195.ec2.internal sh[3317]: File "/usr/local/lib/python2.7/dist-packages/rest_framework/viewsets.py", line 85, in view
May 21 20:58:58 ip-10-12-13-195.ec2.internal sh[3317]: return self.dispatch(request, *args, **kwargs)
May 21 20:58:58 ip-10-12-13-195.ec2.internal sh[3317]: File "/usr/local/lib/python2.7/dist-packages/rest_framework/views.py", line 407, in dispatch
May 21 20:58:58 ip-10-12-13-195.ec2.internal sh[3317]: response = self.handle_exception(exc)
May 21 20:58:58 ip-10-12-13-195.ec2.internal sh[3317]: File "/usr/local/lib/python2.7/dist-packages/rest_framework/views.py", line 404, in dispatch
May 21 20:58:58 ip-10-12-13-195.ec2.internal sh[3317]: response = handler(request, *args, **kwargs)
May 21 20:58:58 ip-10-12-13-195.ec2.internal sh[3317]: File "/app/api/views.py", line 163, in run
May 21 20:58:58 ip-10-12-13-195.ec2.internal sh[3317]: output_and_rc = app.run(self.request.user, request.data['command'])
May 21 20:58:58 ip-10-12-13-195.ec2.internal sh[3317]: File "/app/api/models.py", line 372, in run
May 21 20:58:58 ip-10-12-13-195.ec2.internal sh[3317]: return c.run(escaped_command)
May 21 20:58:58 ip-10-12-13-195.ec2.internal sh[3317]: File "/app/api/models.py", line 521, in run
May 21 20:58:58 ip-10-12-13-195.ec2.internal sh[3317]: rc, output = self._scheduler.run(job_id, image, entrypoint, command)
May 21 20:58:58 ip-10-12-13-195.ec2.internal sh[3317]: File "/app/scheduler/fleet.py", line 240, in run
May 21 20:58:58 ip-10-12-13-195.ec2.internal sh[3317]: file_obj = cStringIO.StringIO(base64.b64decode(self.pkey))
May 21 20:58:58 ip-10-12-13-195.ec2.internal sh[3317]: File "/usr/lib/python2.7/base64.py", line 76, in b64decode
May 21 20:58:58 ip-10-12-13-195.ec2.internal sh[3317]: raise TypeError(msg)
May 21 20:58:58 ip-10-12-13-195.ec2.internal sh[3317]: TypeError: Incorrect padding

Copied from original issue: deis/deis#3738

Copied from original issue: deis/controller#1228

Look into whether modeling data differently in the DB has benefits

From @helgi on September 15, 2016 23:13

Instead of JSONField, consider a parent/child model to handle proc type information (and other nested information): e.g. an Autoscales model with an Autoscale record each for web and worker. This will only work in some cases; it might not map to health checks very well, given that is a very nested structure.

There are two reasons why I want to see some exploration on this

  1. To do validations outside of the JSON Schema hack and instead utilise built-in validations from Django / DRF, or at least function-based validations
  2. If swagger (deis/controller#811) happens (or any auto-generated schema format), then having the data concretely described is going to help a lot

This may not be viable at all, or may only be viable when dealing with very simple structures; at best when we know it only needs to map down to the level of the process type.
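
For illustration, the parent/child shape could look like this, using autoscale settings as the example (models hypothetical):

from django.db import models

class Autoscale(models.Model):
    app = models.ForeignKey('App', on_delete=models.CASCADE)
    proc_type = models.CharField(max_length=63)   # 'web', 'worker', ...
    min_replicas = models.PositiveIntegerField()
    max_replicas = models.PositiveIntegerField()
    cpu_percent = models.PositiveIntegerField(default=80)

    class Meta:
        unique_together = ('app', 'proc_type')

Field-level validators and DRF serializers would then replace the JSON Schema hack for this slice of the data.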

Copied from original issue: deis/controller#1075

Soft delete resources instead of hard

From @helgi on September 15, 2016 22:18

When deleting apps / resources in the DB we currently do a hard delete, but what if we did a soft delete? It would make it easier to trace things through time.

http://stefan.haflidason.com/safer-soft-deletion-in-django/
http://www.akshayshah.org/post/django-soft-deletion/

These could be used instead to keep an audit log without keeping the resource itself around: https://github.com/jjkester/django-auditlog (https://github.com/shtalinberg/django-actions-logger is a fork of that), https://github.com/kajic/django-model-changes, or https://pypi.python.org/pypi/django-reversion/2.0.6

An audit log is only useful up to a point if there is no data around to introspect.
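
For reference, the usual Django pattern from the linked posts looks like this (a sketch):

from django.db import models
from django.utils import timezone

class SoftDeleteModel(models.Model):
    deleted_at = models.DateTimeField(null=True, blank=True)

    class Meta:
        abstract = True

    def delete(self, *args, **kwargs):
        # Flip a flag instead of issuing a DELETE; the row stays traceable.
        self.deleted_at = timezone.now()
        self.save(update_fields=['deleted_at'])

A default manager filtering on deleted_at__isnull=True usually accompanies this so soft-deleted rows stay out of normal queries.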

Copied from original issue: deis/controller#1073

LDAP authentication with incorrect password causes a 500

From @hankjacobs on June 7, 2017 19:48

I recently upgraded from 2.13.0 to 2.15.0. After upgrading, deis login with a valid LDAP user but an incorrect password started to fail with Error: Internal Server Error.

The following stack trace appears in the deis-controller logs:

ERROR:root:Uncaught Exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/ldap/ldapobject.py", line 265, in _ldap_call
    result = func(*args,**kwargs)
ldap.INVALID_CREDENTIALS: {'desc': 'Invalid credentials'}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/views.py", line 486, in dispatch
    response = handler(request, *args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/authtoken/views.py", line 17, in post
    serializer.is_valid(raise_exception=True)
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/serializers.py", line 237, in is_valid
    self._validated_data = self.run_validation(self.initial_data)
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/serializers.py", line 435, in run_validation
    value = self.validate(value)
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/authtoken/serializers.py", line 16, in validate
    user = authenticate(username=username, password=password)
  File "/usr/local/lib/python3.5/dist-packages/django/contrib/auth/__init__.py", line 100, in authenticate
    user = backend.authenticate(*args, **credentials)
  File "/usr/local/lib/python3.5/dist-packages/django_auth_ldap/backend.py", line 171, in authenticate
    user = ldap_user.authenticate(password)
  File "/usr/local/lib/python3.5/dist-packages/django_auth_ldap/backend.py", line 344, in authenticate
    self._authenticate_user_dn(password)
  File "/usr/local/lib/python3.5/dist-packages/django_auth_ldap/backend.py", line 460, in _authenticate_user_dn
    self._bind_as(self.dn, password, sticky=sticky)
  File "/usr/local/lib/python3.5/dist-packages/django_auth_ldap/backend.py", line 765, in _bind_as
    force_str(bind_password))
  File "/usr/local/lib/python3.5/dist-packages/ldap/ldapobject.py", line 386, in simple_bind_s
    resp_type, resp_data, resp_msgid, resp_ctrls = self.result3(msgid,all=1,timeout=self.timeout)
  File "/usr/local/lib/python3.5/dist-packages/ldap/ldapobject.py", line 682, in result3
    resp_ctrl_classes=resp_ctrl_classes
  File "/usr/local/lib/python3.5/dist-packages/ldap/ldapobject.py", line 689, in result4
    ldap_result = self._ldap_call(self._l.result4,msgid,all,timeout,add_ctrls,add_intermediates,add_extop)
  File "/usr/local/lib/python3.5/dist-packages/ldap/ldapobject.py", line 273, in _ldap_call
    e.args[0]['info'] = strerror(e.args[0]['errno'])
KeyError: 'errno'
10.68.167.81 "POST /v2/auth/login/ HTTP/1.1" 500 25 "Deis Client v2.15.0"

Copied from original issue: deis/controller#1307

Use build args to capture build time data

From @jchauncey on November 10, 2016 17:52

Acceptance Criteria:

  • Use the --build-arg flag when running docker build to pass in the following items (and more if needed)
  • BUILD_DATE
  • VERSION

You will need to do the following in the Dockerfile to persist the data into the image:

ARG VERSION
ARG BUILD_DATE
ENV VERSION $VERSION
ENV BUILD_DATE $BUILD_DATE
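
and pass the values in at build time, e.g. (values illustrative):

$ docker build --build-arg VERSION=v2.9.0 \
    --build-arg BUILD_DATE=$(date -u +%Y-%m-%dT%H:%M:%SZ) .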

Copied from original issue: deis/controller#1135

Refresh the objectstorage-keyfile across namespaces

The controller does not appear to refresh the objectstorage-keyfile across application namespaces when the objectstorage-keyfile is updated in the main deis namespace.

More information on object storage in deis components in the docs: https://deis.com/docs/workflow/en/v2.2.0/installing-workflow/configuring-object-storage/

This is not necessarily a bug, but in an enterprise environment S3 objectstorage keys may be rotated often. This is a problem because the key then has to be updated manually across all the namespaces.

The fix would be to refresh the objectstorage-keyfile if it has changed in the main deis namespace.
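
Until that exists, a stopgap after each key rotation is to re-copy the secret into every app namespace by hand, along these lines (a sketch; namespace names are illustrative and jq is assumed to be available):

for ns in my-app-one my-app-two; do
  kubectl -n deis get secret objectstorage-keyfile -o json \
    | jq 'del(.metadata.namespace, .metadata.resourceVersion, .metadata.uid)' \
    | kubectl -n "$ns" apply -f -
done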

Come up with a better way to handle interrupted connections

From @helgi on September 15, 2016 22:52

Most of the time Deployments continue when the HTTP connection is severed between our CLI and our API. However, if a bad config/image etc. is pushed then we are not in as nice a spot. Things like DB records and so on are not cleaned up, and you end up with an "In Progress" Deployment (see below for one possible solution).

Deployments have a few things on the Kubernetes roadmap (https://github.com/kubernetes/community/wiki/Roadmap:-Deployment) but there is nothing we can depend on in the near term

See deis/controller#1071 for more context

Copied from original issue: deis/controller#1074

`deis pull` should be more informative in error cases

From @rimusz on November 14, 2016 15:35

I was deploying a Go app which was missing a required env var, and deis pull just gave this strange error:

Creating build... Error: Post https://deis.staging.clearbit.io/v2/apps/logo/builds/: EOF

deis logs did not show any more detail either. I was only able to get a more detailed app log via kubectl logs

The same goes for a release via curl against the API, with a very informative error message:

curl: (52) Empty reply from server

Copied from original issue: deis/controller#1139

Only able to create 200 releases?

From @nathansamson on July 28, 2017 15:2

Some of our apps stopped deploying after a while (see controller logs).

In at least 2 of the 3 cases, the failures started exactly at release v200 (v200 works, v201 didn't)

We are using workflow 2.11 with kubernetes 1.5.2

ERROR [bplpp-stef]: (app::deploy): 'NoneType' object is not subscriptable
ERROR:root:(app::deploy): 'NoneType' object is not subscriptable
Traceback (most recent call last):
  File "/app/api/models/app.py", line 526, in deploy
    async_run(tasks)
  File "/app/api/utils.py", line 169, in async_run
    raise error
  File "/usr/lib/python3.5/asyncio/tasks.py", line 241, in _step
    result = coro.throw(exc)
  File "/app/api/utils.py", line 181, in async_task
    await loop.run_in_executor(None, params)
  File "/usr/lib/python3.5/asyncio/futures.py", line 361, in __iter__
    yield self  # This tells Task to wait for completion.
  File "/usr/lib/python3.5/asyncio/tasks.py", line 296, in _wakeup
    future.result()
  File "/usr/lib/python3.5/asyncio/futures.py", line 274, in result
    raise self._exception
  File "/usr/lib/python3.5/concurrent/futures/thread.py", line 55, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/app/scheduler/__init__.py", line 270, in deploy
    namespace, name, image, entrypoint, command, **kwargs
  File "/app/scheduler/resources/deployment.py", line 138, in update
    self.wait_until_ready(namespace, name, **kwargs)
  File "/app/scheduler/resources/deployment.py", line 336, in wait_until_ready
    self._check_for_failed_events(namespace, labels=labels)
  File "/app/scheduler/resources/deployment.py", line 373, in _check_for_failed_events
    'involvedObject.name': data['items'][0]['metadata']['name'],
TypeError: 'NoneType' object is not subscriptable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/api/models/build.py", line 65, in create
    self.app.deploy(new_release)
  File "/app/api/models/app.py", line 545, in deploy
    raise ServiceUnavailable(err) from e
api.exceptions.ServiceUnavailable: (app::deploy): 'NoneType' object is not subscriptable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/views.py", line 480, in dispatch
    response = handler(request, *args, **kwargs)
  File "/app/api/views.py", line 527, in create
    super(BuildHookViewSet, self).create(request, *args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/mixins.py", line 21, in create
    self.perform_create(serializer)
  File "/app/api/viewsets.py", line 21, in perform_create
    self.post_save(obj)
  File "/app/api/views.py", line 533, in post_save
    build.create(self.user)
  File "/app/api/models/build.py", line 79, in create
    raise DeisException(str(e)) from e
api.exceptions.DeisException: (app::deploy): 'NoneType' object is not subscriptable
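
The subscript failure in _check_for_failed_events suggests the events query came back with no items; a guard of roughly this shape would avoid the crash, though it would not by itself explain the v200 boundary (a sketch, not the actual fix):

# 'items' can be null in the events response
items = data.get('items') or []
if items:
    # fields: the selector dict being built here (hypothetical name)
    fields['involvedObject.name'] = items[0]['metadata']['name']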

Copied from original issue: deis/controller#1315

deis run needs pods-create permissions (rbac)

From @dmcnaught on September 21, 2017 18:41

deis run needs create permissions on pods in charts/controller/templates/controller-clusterrole.yaml (see the sketch after the logs below).

Error:

Running 'bundle exec rake db:migrate'...
Error: Unknown Error (503): {"detail":"assessments-service-config-run-f1dc5 (run): Expecting value: line 1 column 1 (char 0)"}

Logs:

INFO:scheduler:[assessments-service-config]: run assessments-service-config-run-5ouaj, img quay.io/welltok/assessments:f56e975, entrypoint ['/bin/bash', '-c'], cmd "['bundle exec rake db:migrate']"
ERROR:root:Expecting value: line 1 column 1 (char 0)
Traceback (most recent call last):
  File "/app/api/models/app.py", line 749, in run
    **data
  File "/app/scheduler/__init__.py", line 311, in run
    self.pod.create(namespace, name, image, **kwargs)
  File "/app/scheduler/resources/pod.py", line 42, in create
    raise KubeHTTPException(response, 'create Pod in Namespace "{}"', namespace)
  File "/app/scheduler/exceptions.py", line 10, in __init__
    data = response.json()
  File "/usr/local/lib/python3.5/dist-packages/requests/models.py", line 885, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/lib/python3.5/json/__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.5/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.5/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/views.py", line 486, in dispatch
    response = handler(request, *args, **kwargs)
  File "/app/api/views.py", line 241, in run
    rc, output = app.run(self.request.user, request.data['command'])
  File "/app/api/models/app.py", line 755, in run
    raise ServiceUnavailable(err) from e
api.exceptions.ServiceUnavailable: assessments-service-config-run-5ouaj (run): Expecting value: line 1 column 1 (char 0)
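
The missing grant would look roughly like this in charts/controller/templates/controller-clusterrole.yaml (a sketch; surrounding rules are omitted, and the exact verb list beyond create is an assumption):

- apiGroups: [""]
  resources: ["pods"]
  verbs: ["create", "get", "list", "delete"]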

Copied from original issue: deis/controller#1326
