
ndslabs's Introduction

NDS Labs

Join the chat at https://gitter.im/nds-org/ndslabs

This is the main repository for the National Data Service Labs Workbench service.

Labs Workbench is an experimental space for evaluating, developing, and exploring interoperability between research data management services. This is achieved using applications containerized with Docker, orchestrated with Kubernetes, and deployed on an OpenStack cluster.

For more information, see the architecture documentation.

On top of this foundation, NDS Labs provides a user interface, command line interface, and an API server used to manage the configuration and deployment of containerized services. This repository includes documentation for the platform, as well as references to the following:

  • helm-chart: Instructions and templates for deploying Workbench to a Kubernetes cluster
  • specs: Catalog of containerized services offered by the Workbench platform
  • apiserver: REST API server, a thin management layer over etcd and Kubernetes, as well as the ndslabsctl command-line utility
  • webui: Project management UI implemented in AngularJS

Helm Chart

Labs Workbench offers a Helm chart to ease deployment to a Kubernetes cluster.

The workbench-helm-chart repository offers steps to:

  • Deploy a Kubernetes cluster
  • Prepare your cluster to run the Labs Workbench platform
  • Deploy pre-built containers to run Workbench in your cluster

Service Catalog

The Labs Workbench includes a catalog of service specifications (specs) managed via the ndslabs-specs repository. The catalog currently contains over 50 services (and growing).

API Server and Web UI

The two main components of the Workbench platform are the apiserver and the webui.

The workbench-apiserver-python and workbench-webui have their own source repositories.

Documentation

The apis and docs directories contain instructions for generating the documentation for the platform.

  • apis: Swagger documentation about the REST API offered by Workbench
  • docs: Sphinx documentation about general platform usage

Resources

The Labs Workbench is a hosted service and is not intended for general installation. For Labs Workbench developers, however, it is possible to run the complete system on a single virtual machine or laptop.

Where to Get Help

Contributing

For more information on our Developer Workflows, see Developer Workflows.

In short, fork this repository and make a pull request. We will review and give feedback.

ndslabs's People

Contributors

bengalewsky, bodom0015, craig-willis, davidraila, dependabot[bot], gitter-badger, kevincoakley, manning-ncsa, max-zilla, robkooper

ndslabs's Issues

Ability to collect feedback from users

Users of the NDS Labs Workbench services should be able to send feedback about the system. We could follow the model of CyVerse/DE and provide a short questionnaire, or simply a free-text box.

Exec access to containers

We need to determine whether/how to support exec access to individual containers. Launching a console via the browser is attractive, as is using kubectl from the command line. Either way, we'll need to address security -- specifically through Kubernetes (i.e., API certificate access).

Create/Delete PVCs on StackCreate / StackDelete

The work-in-progress PVC support currently creates PVCs when the stack starts up.

If we create these when the stack is created, then the start/stop operations are idempotent and non-destructive. A user can start and stop their stack until it is deleted.

We should also be thinking about how this pattern would affect edge cases, such as "Editing the Spec of an Existing Stack". Previously, we would just delete and recreate the stack with the assumption that the data would remain on disk, but now we have a choice:

  1. Assume that StackDelete means that we no longer want the data, and delete the data with the stack - we lose a use case with this choice, but I'm not sure it's widely used
  2. Add API endpoints to manually clean up PVCs - decide on either single or batch (e.g. "delete the volume named username-sxxxxx-vol#" vs "delete the volumes of stack sxxxxx")

This ticket is complete when:

  • the ndslabs API server has been adjusted to create PVCs on CreateStack, instead of StartStack
  • the ndslabs API server has been modified to delete PVCs on DeleteStack

Better error handling in API server

There are a number of problems with the API server if things are stopped and started outside of ndslabsctl, or under certain error conditions. Specifically:

  • Server is rebooted
  • Kubernetes, etcd, OpenStack API stopped
  • Services stopped via docker kill or kubectl delete
  • Containers in pending states (invalid image names, drained nodes)

Shared data should be configurable

With #261 merged, we have an open question about how to handle shared data. Today, it's assumed to be an NFS mount via hostPath (à la the TERRA-REF use case), but it really should be an NFS read-write-many mount (not hostPath).

Automated unit and/or integration/end-to-end tests

Unit and integration tests are something that NDS Labs currently lacks. This can cause new and old issues alike to bubble up unnoticed until a manual round of testing reveals them. While I am not familiar enough with Go to recommend technologies, there seem to be some existing methodologies for testing Go applications.

There are several options regarding unit or integration/e2e testing in AngularJS applications. Many such options are enumerated here: http://stackoverflow.com/questions/300855/javascript-unit-test-tools-for-tdd

Two of the more common choices in this area are either Jasmine or Mocha + Chai + Sinon for writing test specs (Ruby style), with Karma + Protractor for running test specs.

Support multiple replicas of each service

Kubernetes allows you to specify the number of replicas that each replication controller should enforce. Requests to the Service wrapping the controller are then round-robined across the replicas for load balancing.

Currently the API server assumes only one replica, but it should be fairly easy to allow the user to scale up a particular service.

It should be noted that some services may not benefit from this type of scaling (e.g. Elasticsearch, MongoDB, RabbitMQ). Further investigation may be necessary in order to scale these types of services.
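As a rough illustration of what this could look like, the sketch below updates the replica count on the replication controller backing a stack service using client-go. The function name, signature, and names like rcName are hypothetical, not the apiserver's actual code.

package stack

import (
    "context"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// scaleService sets the replica count on the replication controller backing a
// stack service. The Service wrapping the controller then load-balances
// requests across the replicas.
func scaleService(ctx context.Context, clientset *kubernetes.Clientset, namespace, rcName string, replicas int32) error {
    rcs := clientset.CoreV1().ReplicationControllers(namespace)
    rc, err := rcs.Get(ctx, rcName, metav1.GetOptions{})
    if err != nil {
        return err
    }
    rc.Spec.Replicas = &replicas // user-requested count; currently hard-wired to 1
    _, err = rcs.Update(ctx, rc, metav1.UpdateOptions{})
    return err
}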

Cauth assumes "www" prefix but helm chart does not

The helm chart lets me specify a subdomain prefix other than "www":
https://github.com/nds-org/workbench-helm-chart/blob/master/values.yaml#L11

The Cauth endpoint runs check_token validation if the host prefix isn't "www", which always results in a 403 when logging into the system:
https://github.com/nds-org/ndslabs/blob/develop/gui/server.js#L226

API
I0204 17:51:48.014334      21 server.go:287] remoteAddr: 10.48.0.13:33212
04/Feb/2019:17:51:48 +0000 200 124μs "GET /api/version HTTP/1.1" - "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:60.0) Gecko/20100101 Firefox/60.0"
I0204 17:51:52.248844      21 server.go:287] remoteAddr: 10.48.0.13:33212
04/Feb/2019:17:51:52 +0000 200 5533μs "POST /api/authenticate HTTP/1.1" - "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:60.0) Gecko/20100101 Firefox/60.0"
I0204 17:51:52.300503      21 server.go:287] remoteAddr: 127.0.0.1:49580
04/Feb/2019:17:51:52 +0000 403 12691μs "GET /api/check_token?host=dev.mbwb.k117.org HTTP/1.1" demo "-"

WEBUI
2019-02-04T17:52:58.739Z - error: Request Failed.
Status Code: 403
::ffff:10.48.0.13 - GET /cauth/auth HTTP/1.1 403 9 - 12.616 ms
::ffff:10.48.0.1 - GET /asset/png/favicon-2-32x32.png HTTP/1.1 200 2209 - 0.449 ms

Ideally, we either parameterize cauth based on the helm chart or remove the ability to specify the subdomain.

Revisit Globus auth after adding PVC support

Our previous implementation of Globus auth involved writing the user's Globus auth token to a file in their home directory. With the advent of PVC support in Workbench, we no longer have direct filesystem access to this directory from the API server.

We need to find a way to get this auth token to the user within the container.

Some methods proposed:

  1. Put the auth token in a Secret or ConfigMap in the user's namespace - either could then be injected via env var and/or file into all containers (see the sketch below)
  2. Run a small Kubernetes Job to write the file to disk - file will be mounted into all containers via the user's PVC
  3. Add an initContainer to each pod/container that the user starts - file will be manually written into all containers upon startup

This issue is complete when we have implemented one of the above (or a comparable approach) to asynchronously pass the user's Globus auth token from the Workbench apiserver's login handler into the pods that the user subsequently starts.
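As a minimal sketch of option 1 above (assuming client-go; the Secret name "globus-auth" and key "token" are hypothetical), the login handler could write the token into the user's namespace, where it can later be injected into pods as an env var or file:

package auth

import (
    "context"

    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// writeGlobusSecret stores the user's Globus auth token in a Secret in their
// namespace; pods started later can mount it or read it via an env var.
func writeGlobusSecret(ctx context.Context, clientset *kubernetes.Clientset, namespace, token string) error {
    secret := &corev1.Secret{
        ObjectMeta: metav1.ObjectMeta{Name: "globus-auth"}, // hypothetical name
        StringData: map[string]string{"token": token},
    }
    _, err := clientset.CoreV1().Secrets(namespace).Create(ctx, secret, metav1.CreateOptions{})
    return err
}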

Explore spec validation

The only way we can currently test that a spec works is by configuring it and running a stack. This is tedious when attempting to rapidly test changes to a spec.

We should investigate whether it would be possible / useful to attempt to validate the spec when adding it to the service list, rather than attempting to do it when the user runs an instance of the stack described by the spec.
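As a sketch of the idea, validation at catalog-add time could start with cheap structural checks before any stack is ever launched. The ServiceSpec fields below are simplified and hypothetical:

package spec

import "fmt"

// ServiceSpec is a simplified stand-in for the real catalog spec type.
type ServiceSpec struct {
    Key   string
    Image string
    Ports []int32
}

// Validate runs structural checks when a spec is added to the service list,
// rather than waiting for a user to run a stack built from it.
func Validate(s ServiceSpec) error {
    if s.Key == "" {
        return fmt.Errorf("spec is missing a key")
    }
    if s.Image == "" {
        return fmt.Errorf("spec %q has no image", s.Key)
    }
    for _, p := range s.Ports {
        if p < 1 || p > 65535 {
            return fmt.Errorf("spec %q declares invalid port %d", s.Key, p)
        }
    }
    return nil
}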

Create PVCs on StackCreate

The work-in-progress PVC support currently creates PVCs when the stack starts up.

It would make more sense to create these when the stack is created, so that start/stop are idempotent and non-destructive.

This ticket is complete when the ndslabs API server has been adjusted to create PVCs on CreateStack, instead of StartStack.

"View Logs" throwing 404

Ran into a problem with "View Logs" on a stack service.

CLI:

[root@lambert-test NDSLabsSystem ] bin # ndslabsctl list stacks
STACK        SERVICE         STATUS    SID
Clowder-01                   started   sr4nbm
             clowder         ready     sr4nbm-clowder
             mongo           ready     sr4nbm-mongo
             rabbitmq        ready     sr4nbm-rabbitmq
             plantcv         ready     sr4nbm-plantcv
             imagepreview    ready     sr4nbm-imagepreview
             elasticsearch   ready     sr4nbm-elasticsearch
[root@lambert-test NDSLabsSystem ] bin # ndslabsctl logs sr4nbm-rabbitmq
Error getting logs sr4nbm-rabbitmq: 404 Not Found

GUI:

GET http://141.142.209.135:30001/projects/terra/logs/sr4nbm-rabbitmq 404 (Not Found)
error grabbing logs for service sr4nbm-rabbitmq

Support more fine-grained security capabilities

As an administrator of Workbench, it would be nice to be able to define more fine-grained security permissions for each service. The API currently allows us to specify "privileged", even if the UI does not offer this option directly. It would be nice to be able to specify particular permissions, instead of giving full root access with privileged.

It may be as simple as tracking down anywhere that privileged is specified and replacing it with a "securityContext" block (to share the exact syntax with the Kubernetes spec) defining zero or more permissions. We would need to replace this in each spec, as well as adjust the handling for this field in the API server.

See Security Context for more details.
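As an illustrative sketch (not the apiserver's actual code), a spec-level capability list could map onto a Kubernetes SecurityContext like this, replacing the blanket privileged flag:

package stack

import corev1 "k8s.io/api/core/v1"

// securityContextFor grants only the specific Linux capabilities a spec asks
// for (e.g. "NET_ADMIN") instead of full root access via privileged.
func securityContextFor(caps []string) *corev1.SecurityContext {
    add := make([]corev1.Capability, 0, len(caps))
    for _, c := range caps {
        add = append(add, corev1.Capability(c))
    }
    privileged := false
    return &corev1.SecurityContext{
        Privileged:   &privileged,
        Capabilities: &corev1.Capabilities{Add: add},
    }
}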

This task is complete when:

  • ndslabs-specs has been updated where necessary to tune down the allowed permissions of privileged specs
  • ndslabs has been updated to handle the new securityContext block appropriately and to copy these capabilities into the user's stack service container(s)

Specify Configs / Volumes when adding a service to an existing stack

After creating a stack with the wizard (which handles configuration), the user has the option to Add/Remove optional dependencies without recreating the entire stack. While this is helpful in simple cases, it creates a problem in more complex services.

A perfect example encompassing both of these problems can be seen in ownCloud + MySQL. Specifying ownCloud does not prompt for configuration, but the optional MySQL service requires the user to enter several passwords for proper functionality. Without entering a MySQL password/root password, the service can never successfully start. Furthermore, since MySQL was never allocated a volume, it will not persist its contents between restarts (which seems suboptimal for a database).

minikube's default storage-provisioner RBAC is missing some permissions

Received this in the logs after creating a PVC:

$ kubectl logs -f storage-provisioner -n kube-system
Error watching for provisioning success, can't provision for claim "test/s4rdfk-cloudcmd": events is forbidden: User "system:serviceaccount:kube-system:storage-provisioner" cannot list events in the namespace "test"
Error watching for provisioning success, can't provision for claim "test/spd9xt-cloudcmd": events is forbidden: User "system:serviceaccount:kube-system:storage-provisioner" cannot list events in the namespace "test"

Manual workaround that worked for now:

# Edit to add the "list" verb to the "events" resource
$ kubectl edit clusterrole -n kube-system system:persistent-volume-provisioner
$ kubectl get pvc -n test
NAME              STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
s4rdfk-cloudcmd   Bound     pvc-e794fa3e-b6ac-11e8-8044-080027add193   1Mi        RWX            standard       12m
spd9xt-cloudcmd   Bound     pvc-6baa04ab-b6ad-11e8-8044-080027add193   1Mi        RWX            standard       8m
src67q-cloudcmd   Bound     pvc-2243a82c-b6ae-11e8-8044-080027add193   1Mi        RWX            standard       3m

We should probably submit this as an issue to minikube's official issue tracker.

"Additional Configuration" page only displays when coming from "Optional Services" page

Currently, the discovery of "extraConfigs" in the GUI only happens when navigating from "optional" to "config". Since some services do not have optional dependencies, they will skip the "optional" page.

This leads to an issue where services that require extra configuration do not allow the user to specify that configuration unless they have optional dependencies. This is problematic for standalone services, like MySQL, that require such configuration.

One example of where this is handled properly is with the volume requirements. Following this pattern, one viable solution would be to abstract all "config discovery" into a helper function, similar to the "discoverVolumeRequirements" function.

Refresh token on wizard page navigation

The current timeout for an auth token is 30 minutes. In extreme cases, this timeout can occur during the configuration of a stack. This will bounce the user to the login page without creating their stack, and force them to reconfigure it using the wizard.

To combat this problem, the wizard should refresh the user's token whenever they change between each discrete step of the wizard.
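The JWT middleware the apiserver uses may already expose a refresh endpoint; purely as a standalone illustration of the idea, this sketch uses github.com/golang-jwt/jwt/v5 (an assumed stand-in, not the apiserver's real dependency) to re-issue a token with a fresh 30-minute expiry:

package auth

import (
    "net/http"
    "time"

    "github.com/golang-jwt/jwt/v5"
)

var signingKey = []byte("replace-me") // hypothetical; the real key comes from config

// RefreshToken re-issues the caller's token with a new 30-minute expiry. The
// wizard would hit this endpoint on every step transition.
func RefreshToken(w http.ResponseWriter, r *http.Request, username string) {
    claims := jwt.MapClaims{
        "id":  username,
        "exp": time.Now().Add(30 * time.Minute).Unix(),
    }
    token := jwt.NewWithClaims(jwt.SigningMethodHS256, claims)
    signed, err := token.SignedString(signingKey)
    if err != nil {
        http.Error(w, "could not refresh token", http.StatusInternalServerError)
        return
    }
    w.Write([]byte(signed))
}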

Explore support for real PVCs

Preliminary work to determine what it will take to replace our current Gluster/hostPath mounts with a per-user read-write-many PVC. Ideally, we implement the minimum changes needed to support current functionality with a PVC, and create new tasks to document any improvements this suggests.

Change spec isYYY to "access" and "display"

The boolean spec properties isService/Public/Standalone are burdensome. The new approach will be access {external, internal}, which determines whether a NodePort is created, and display {stack, standalone}, which determines how a service is displayed.

NetworkPolicies not working as expected

Per the discussion on #285, NetworkPolicies are not working as expected with the full Workbench deployment and Weave. The primary symptom seems to be related to the ingress load balancer running on the host not being able to access pods running on other hosts. If the ILB is on master and the pod is on worker0, Weave will not allow traffic through.

Better log handling

We use the logspout container to pull logs from Docker containers within a namespace. Unfortunately, this only works on a single host. We need a way to pull logs from containers across hosts (probably via the Kubernetes API).

Also, it would be nice to have a clean way to pull logs that aren't sent to stdout/stderr. Some containers symlink known log files to /dev/stdout or /dev/stderr. Maybe do this for everything instead of requiring the container writer to do it?

Squelch debug logging in GUI

The frontend logs are currently fairly noisy. Since we are using AngularJS's $log service, this should be fairly simple, and can be accomplished by calling the following during app.config():

$logProvider.debugEnabled(false);

This will prevent $log from propagating $log.debug() events, and should only show the other levels of log activity. Furthermore, the start.sh -d command could substitute true (via sed), to allow you to see the debug output when starting in "developer mode".

Use secrets for passwords

Passwords are currently handled via environment variables. We should be using Kubernetes Secrets.

Fix Globus auth payload

#261 breaks writing the Globus auth payload (writeAuthPayload), since the apiserver would now need to mount the user's home PVC. We need to find another solution for writing the auth information to the user's home PVC.

Design platform instrumentation

Per the proposal:

"In addition to specific course-based evaluations, general usage analytics will also be conducted for the platform. While traditional usage geographic distribution and counts for individual applications can be collected using tools such as Google Analytics, we will also instrument our platform to collect data on complex usage patterns such as application usage chains. Such information can be used in recommending applications for users. Application specific instrumentation will also be added for contributions from project members. We will explore integrating progress tracking within the Jupyter instructional notebook to analyze the effectiveness of this instruction method."

At a minimum, this should include per-application usage information.

Dependency shutdown(/startup) ordering

There's currently no way to specify dependency shutdown order. There are cases where shutdown order matters, such as RabbitMQ and extractors. Shutting down RabbitMQ before the extractors causes extractor errors during shutdown.

Orphaned volumes add to 'Projected Usage'

Currently, the UI gives the option to reuse existing (detached) volumes that the user has already allocated for a particular service. Even though these volumes have already been allocated, they are still reflected as contributing to the 'Projected Usage' in the volumes page of the configuration wizard.

This is clearly a mistake, and needs to be addressed as we flesh out more of the strategy for enforcing storage quotas in NDS Labs.

Asynchronous updates from API server

Currently, when performing a start/stop the "Auto Refresh" functionality is enabled. This simply performs a GET on the following 2 URLs every 1-2 seconds:

  • /projects/{project-id}/stacks
  • /projects/{project-id}/volumes

While this usually works well enough, I have a strong suspicion that this is related to some of the quirks that the UI experiences (such as issue #21).

The current scheme also does not cover the case of multiple users editing the same project. For example: let's say users A and B are both logged into the same project. User A adds a stack slightly before user B adds their stack.

User A will only see the stack that they created, and not User B's stack, due to the race condition.
User B, however, will see both stacks since a successful POST leads to soft refresh that will retrieve both created stacks.

Ideally, we should investigate asynchronous technologies (e.g. WebSockets) to handle this behavior.
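For illustration, a minimal push-based endpoint using github.com/gorilla/websocket (an assumption; the apiserver does not currently depend on it). The StackEvent payload is hypothetical:

package events

import (
    "net/http"

    "github.com/gorilla/websocket"
)

var upgrader = websocket.Upgrader{}

// StackEvent describes a change made by any user in a project.
type StackEvent struct {
    Project string `json:"project"`
    Stack   string `json:"stack"`
    Action  string `json:"action"` // e.g. "created", "started", "stopped"
}

// ServeEvents upgrades the request to a WebSocket and streams stack events,
// so the UI no longer needs to poll /stacks and /volumes every 1-2 seconds,
// and user B's changes reach user A without a race.
func ServeEvents(w http.ResponseWriter, r *http.Request, events <-chan StackEvent) {
    conn, err := upgrader.Upgrade(w, r, nil)
    if err != nil {
        return
    }
    defer conn.Close()
    for ev := range events {
        if err := conn.WriteJSON(ev); err != nil {
            return
        }
    }
}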

Add network policy support

Original requirements were based on the arpspoof case:

  • Pods in the same stack need to run on the same node (node affinity)
  • Pods in the same stack can only communicate with each other (see the sketch below)

Additional requirements added after the fact:

  • Disable network policy support with apiserver configuration flag
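A minimal sketch of the second original requirement: a NetworkPolicy that only admits ingress from pods in the same stack, keyed on a hypothetical "stack" label (the apiserver flag above would decide whether this object gets created at all):

package policy

import (
    networkingv1 "k8s.io/api/networking/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// stackPolicy builds a NetworkPolicy allowing pods labeled with this stack to
// receive traffic only from other pods carrying the same label.
func stackPolicy(namespace, stack string) *networkingv1.NetworkPolicy {
    selector := metav1.LabelSelector{MatchLabels: map[string]string{"stack": stack}}
    return &networkingv1.NetworkPolicy{
        ObjectMeta: metav1.ObjectMeta{Name: stack + "-isolate", Namespace: namespace},
        Spec: networkingv1.NetworkPolicySpec{
            PodSelector: selector, // applies to all pods in the stack
            Ingress: []networkingv1.NetworkPolicyIngressRule{{
                From: []networkingv1.NetworkPolicyPeer{{PodSelector: &selector}},
            }},
        },
    }
}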

Race condition in API server tests

First run of the API server tests will create an account/namespace called test.
Second run of the tests will attempt to delete this account before recreating it to reset the test state.

With PVCs and running pods in the namespace, this can create a race condition - since we don't wait for the namespace to be fully deleted before continuing the tests, the account creation can fail because the namespace is still being cleaned up:

24/Sep/2018:10:26:07 -0500 200 8874μs "GET /api/accounts HTTP/1.1" admin "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
E0924 10:26:08.162690   16237 server.go:1070] Operation cannot be fulfilled on namespaces "test": The system is ensuring all content is removed from this namespace.  Upon completion, this namespace will automatically be purged by the system.
24/Sep/2018:10:26:08 -0500 500 17416μs "DELETE /api/accounts/test HTTP/1.1" admin "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
24/Sep/2018:10:26:09 -0500 409 7774μs "POST /api/register HTTP/1.1" - "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
24/Sep/2018:10:26:10 -0500 404 1478μs "PUT /api/register/verify HTTP/1.1" - "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
24/Sep/2018:10:26:11 -0500 200 4244μs "GET /api/accounts HTTP/1.1" admin "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
24/Sep/2018:10:26:12 -0500 404 1574μs "GET /api/register/approve?u=test&t=vKq8XWqIB5iOn6RXO8ucgcKS8yM HTTP/1.1" - "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
24/Sep/2018:10:26:13 -0500 200 4472μs "GET /api/accounts HTTP/1.1" admin "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
runtime error: index out of range
goroutine 247 [running]:
runtime/debug.Stack(0xc4204454b8, 0x1e42b80, 0x2803b50)
	/usr/local/Cellar/go/1.8/libexec/src/runtime/debug/stack.go:24 +0x79
github.com/ndslabs/apiserver/vendor/github.com/ant0ine/go-json-rest/rest.(*RecoverMiddleware).MiddlewareFunc.func1.1(0x2803750, 0x27c1540, 0xc4203a2690)
	/Users/lambert8/go/src/github.com/ndslabs/apiserver/vendor/github.com/ant0ine/go-json-rest/rest/recover.go:41 +0x6e
panic(0x1e42b80, 0x2803b50)
	/usr/local/Cellar/go/1.8/libexec/src/runtime/panic.go:489 +0x2cf
main.(*Server).GetAccount(0xc420098fd0, 0x27c1500, 0xc42016f180, 0xc420148e80)
	/Users/lambert8/go/src/github.com/ndslabs/apiserver/cmd/server/server.go:628 +0xc79
main.(*Server).GetAccount-fm(0x27c1500, 0xc42016f180, 0xc420148e80)
	/Users/lambert8/go/src/github.com/ndslabs/apiserver/cmd/server/server.go:331 +0x48
github.com/ndslabs/apiserver/vendor/github.com/ant0ine/go-json-rest/rest.(*router).AppFunc.func1(0x27c1500, 0xc42016f180, 0xc420148e80)
	/Users/lambert8/go/src/github.com/ndslabs/apiserver/vendor/github.com/ant0ine/go-json-rest/rest/router.go:56 +0xa3
github.com/ndslabs/apiserver/vendor/github.com/StephanDollberg/go-json-rest-middleware-jwt.(*JWTMiddleware).middlewareImpl(0xc420018480, 0x27c1500, 0xc42016f180, 0xc420148e80, 0xc42020ca90)
	/Users/lambert8/go/src/github.com/ndslabs/apiserver/vendor/github.com/StephanDollberg/go-json-rest-middleware-jwt/auth_jwt.go:103 +0x226
github.com/ndslabs/apiserver/vendor/github.com/StephanDollberg/go-json-rest-middleware-jwt.(*JWTMiddleware).MiddlewareFunc.func2(0x27c1500, 0xc42016f180, 0xc420148e80)
	/Users/lambert8/go/src/github.com/ndslabs/apiserver/vendor/github.com/StephanDollberg/go-json-rest-middleware-jwt/auth_jwt.go:82 +0x51
github.com/ndslabs/apiserver/vendor/github.com/ant0ine/go-json-rest/rest.(*IfMiddleware).MiddlewareFunc.func1(0x27c1500, 0xc42016f180, 0xc420148e80)
	/Users/lambert8/go/src/github.com/ndslabs/apiserver/vendor/github.com/ant0ine/go-json-rest/rest/if.go:47 +0x78
github.com/ndslabs/apiserver/pkg/middleware.(*NoCacheMiddleware).MiddlewareFunc.func1(0x27c1500, 0xc42016f180, 0xc420148e80)
	/Users/lambert8/go/src/github.com/ndslabs/apiserver/pkg/middleware/no_cache.go:17 +0x135
github.com/ndslabs/apiserver/vendor/github.com/ant0ine/go-json-rest/rest.(*ContentTypeCheckerMiddleware).MiddlewareFunc.func1(0x27c1500, 0xc42016f180, 0xc420148e80)
	/Users/lambert8/go/src/github.com/ndslabs/apiserver/vendor/github.com/ant0ine/go-json-rest/rest/content_type_checker.go:38 +0x200
github.com/ndslabs/apiserver/vendor/github.com/ant0ine/go-json-rest/rest.(*JsonIndentMiddleware).MiddlewareFunc.func1(0x27c1540, 0xc4203a2690, 0xc420148e80)
	/Users/lambert8/go/src/github.com/ndslabs/apiserver/vendor/github.com/ant0ine/go-json-rest/rest/json_indent.go:35 +0xd8
github.com/ndslabs/apiserver/vendor/github.com/ant0ine/go-json-rest/rest.(*RecoverMiddleware).MiddlewareFunc.func1(0x27c1540, 0xc4203a2690, 0xc420148e80)
	/Users/lambert8/go/src/github.com/ndslabs/apiserver/vendor/github.com/ant0ine/go-json-rest/rest/recover.go:57 +0x8a
github.com/ndslabs/apiserver/vendor/github.com/ant0ine/go-json-rest/rest.(*PoweredByMiddleware).MiddlewareFunc.func1(0x27c1540, 0xc4203a2690, 0xc420148e80)
	/Users/lambert8/go/src/github.com/ndslabs/apiserver/vendor/github.com/ant0ine/go-json-rest/rest/powered_by.go:26 +0xae
github.com/ndslabs/apiserver/vendor/github.com/ant0ine/go-json-rest/rest.(*RecorderMiddleware).MiddlewareFunc.func1(0x27c1580, 0xc420148ea0, 0xc420148e80)
	/Users/lambert8/go/src/github.com/ndslabs/apiserver/vendor/github.com/ant0ine/go-json-rest/rest/recorder.go:22 +0xa2
github.com/ndslabs/apiserver/vendor/github.com/ant0ine/go-json-rest/rest.(*TimerMiddleware).MiddlewareFunc.func1(0x27c1580, 0xc420148ea0, 0xc420148e80)
	/Users/lambert8/go/src/github.com/ndslabs/apiserver/vendor/github.com/ant0ine/go-json-rest/rest/timer.go:20 +0x122
github.com/ndslabs/apiserver/vendor/github.com/ant0ine/go-json-rest/rest.(*AccessLogApacheMiddleware).MiddlewareFunc.func1(0x27c1580, 0xc420148ea0, 0xc420148e80)
	/Users/lambert8/go/src/github.com/ndslabs/apiserver/vendor/github.com/ant0ine/go-json-rest/rest/access_log_apache.go:87 +0x54
github.com/ndslabs/apiserver/vendor/github.com/ant0ine/go-json-rest/rest.adapterFunc.func1(0x27c03c0, 0xc420456700, 0xc4204a4000)
	/Users/lambert8/go/src/github.com/ndslabs/apiserver/vendor/github.com/ant0ine/go-json-rest/rest/middleware.go:70 +0x122
net/http.HandlerFunc.ServeHTTP(0xc42020cde0, 0x27c03c0, 0xc420456700, 0xc4204a4000)
	/usr/local/Cellar/go/1.8/libexec/src/net/http/server.go:1942 +0x44
net/http.serverHandler.ServeHTTP(0xc420098e70, 0x27c03c0, 0xc420456700, 0xc4204a4000)
	/usr/local/Cellar/go/1.8/libexec/src/net/http/server.go:2568 +0x92
net/http.(*conn).serve(0xc4201368c0, 0x27c1280, 0xc420277980)
	/usr/local/Cellar/go/1.8/libexec/src/net/http/server.go:1825 +0x612
created by net/http.(*Server).Serve
	/usr/local/Cellar/go/1.8/libexec/src/net/http/server.go:2668 +0x2ce
24/Sep/2018:10:26:14 -0500 500 19321μs "GET /api/accounts/test HTTP/1.1" admin "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"

If it is simple enough, we should look into modifying the "Delete Account" test case to wait until the namespace has been fully deleted before moving on to the next test.

Another pattern might be to see if newman has a Ruby-esque "afterAll" hook (such as the one offered by protractor and karma) that will let us specify cleanup code that should run once all test cases have completed.
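If the wait ends up living in the apiserver's DeleteAccount handler rather than in the test suite, the logic is a simple poll. A sketch assuming client-go, with hypothetical names and timeout:

package tests

import (
    "context"
    "fmt"
    "time"

    apierrors "k8s.io/apimachinery/pkg/api/errors"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// waitForNamespaceGone polls until the namespace returns 404, so the next
// test run can safely re-register the "test" account.
func waitForNamespaceGone(ctx context.Context, clientset *kubernetes.Clientset, name string, timeout time.Duration) error {
    deadline := time.Now().Add(timeout)
    for time.Now().Before(deadline) {
        _, err := clientset.CoreV1().Namespaces().Get(ctx, name, metav1.GetOptions{})
        if apierrors.IsNotFound(err) {
            return nil // fully deleted
        }
        if err != nil {
            return err
        }
        time.Sleep(2 * time.Second)
    }
    return fmt.Errorf("namespace %q still terminating after %s", name, timeout)
}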

Add "Change Password" function to GUI

The CLI allows you to change your account password, but the GUI does not offer a similar function. It should be fairly easy to add, since a REST endpoint likely exists to provide this functionality to the CLI.

Multiple volume support

Currently, services can only have one volume. We need to figure out how best to handle services with multiple volumes -- or whether this is really a good requirement.

Pods in the same stack should (optionally) run on the same node

The CHEESE arpspoof case requires that hacker, victim, and server all be collocated on the same node. We should be able to achieve this with the current NDSLabs pod-centric design by using pod affinity.

The following example demonstrates how to use labels to ensure that pods run on the same node:
https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#more-practical-use-cases

We should also include a spec-level configuration option to enable this behavior, which should be disabled by default.
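A minimal sketch of what the generated affinity could look like, keyed on a hypothetical "stack" label; the spec-level toggle would simply decide whether this block is attached to the pod:

package stack

import (
    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// sameNodeAffinity forces every pod carrying the same stack label onto one
// node, so hacker, victim, and server are collocated.
func sameNodeAffinity(stack string) *corev1.Affinity {
    return &corev1.Affinity{
        PodAffinity: &corev1.PodAffinity{
            RequiredDuringSchedulingIgnoredDuringExecution: []corev1.PodAffinityTerm{{
                LabelSelector: &metav1.LabelSelector{
                    MatchLabels: map[string]string{"stack": stack},
                },
                TopologyKey: "kubernetes.io/hostname", // same hostname == same node
            }},
        },
    }
}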

Upgrade etcd or move to something else

We initially used the in-cluster etcd because it was easy, and have since moved to our own etcd2 Pod for Workbench configuration. This is problematic on OpenStack because of a known issue with etcd on NFS. We either need to upgrade to etcd3 or consider moving to another store. We don't do anything fancy that couldn't be done with SQLite, MySQL, Mongo, etc.

Add support for workbench OS user

We periodically encounter problems with filesystem permissions on the home directory (e.g., terraref/computing-pipeline#545). One solution we've discussed is having a workbench OS user that maps to the common UID/GID used by applications like Jupyter and Rstudio.

Also add this to v1.0.12 for TERRA-REF.

Intermittent: stacks/volumes rarely will fail to POST

While I have not been able to reliably reproduce the behavior, I have noticed that sometimes creating a stack fails with a 404. More rarely, creating the stack is successful but creating and/or attaching its associated volumes may fail.

Obviously both cases are undesirable, but the volumes failing poses the bigger problem of leaving the user's project in a bad state: with no volumes allocated to a stack, the data will disappear when the stack is restarted. Furthermore the user is given no indication that the failure has occurred (unless they happen to be watching the Developer Console), giving them no chance to rectify the problem.

More investigation is needed as to why this issue occurs and how to prevent it.

Split service list into functional categories

The current list of services is rather small, but once we start getting more specs into NDS Labs, I suspect that this list will quickly become unwieldy. We have already hidden "standalone" services behind a checkbox, but this is only delaying the inevitable.

One idea to combat this is to categorize all services by function (e.g. storage, transport, etc.). More discussion may be necessary to discover all of the categories we can offer, as well as where each service would fit into that model.

Support apiserver.json as configmap

The current approach of generating apiserver.json via entrypoint.sh is annoying and should be replaced with a ConfigMap. This would allow for customization without requiring translation into bash.
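With a ConfigMap mounted as a file, the apiserver could read its JSON directly instead of templating it through bash. A sketch with hypothetical config fields:

package config

import (
    "encoding/json"
    "os"
)

// Config holds a couple of illustrative fields; the real apiserver.json has more.
type Config struct {
    Origin  string `json:"origin"`
    Timeout int    `json:"timeout"`
}

// Load reads apiserver.json from the path where the ConfigMap volume is
// mounted (e.g. /etc/workbench/apiserver.json).
func Load(path string) (*Config, error) {
    data, err := os.ReadFile(path)
    if err != nil {
        return nil, err
    }
    var c Config
    if err := json.Unmarshal(data, &c); err != nil {
        return nil, err
    }
    return &c, nil
}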

initExistingUsers needs to create ResourceQuotas and Services

Using the cluster-backup repo, I was able to migrate the users from one instance of Workbench to another. The API server throws a 500 exception when trying to pull from /accounts, since the user's ResourceQuota does not exist and cannot be read. Furthermore, any existing stacks needed to be wiped out, since their services were not migrated properly.

This could easily be fixed by having the API server create these resources where they do not exist.
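A sketch of the create-if-missing behavior, assuming client-go; the quota name and limits below are hypothetical placeholders:

package accounts

import (
    "context"

    corev1 "k8s.io/api/core/v1"
    apierrors "k8s.io/apimachinery/pkg/api/errors"
    "k8s.io/apimachinery/pkg/api/resource"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// ensureResourceQuota creates the per-user quota if it is missing, so that
// initExistingUsers can run safely against migrated accounts.
func ensureResourceQuota(ctx context.Context, clientset *kubernetes.Clientset, namespace string) error {
    quotas := clientset.CoreV1().ResourceQuotas(namespace)
    if _, err := quotas.Get(ctx, "quota", metav1.GetOptions{}); err == nil || !apierrors.IsNotFound(err) {
        return err // nil if it already exists, otherwise a real error
    }
    quota := &corev1.ResourceQuota{
        ObjectMeta: metav1.ObjectMeta{Name: "quota"}, // hypothetical name
        Spec: corev1.ResourceQuotaSpec{
            Hard: corev1.ResourceList{
                corev1.ResourceCPU:    resource.MustParse("2"),
                corev1.ResourceMemory: resource.MustParse("8Gi"),
            },
        },
    }
    _, err := quotas.Create(ctx, quota, metav1.CreateOptions{})
    return err
}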
