
cpho-phase2's People

Contributors

alexcleduc, andguo, dependabot[bot], github-actions[bot], kdompaul, kingbain, lilakelland, msarar, najsaqib, samiatoui, simardeep1792, sleepycat, stephen-oneil, szeckirjr, tomcdona, vedantthapa, vickiszhang


cpho-phase2's Issues

speed up tests

The conftest is super slow, which makes TDD really annoying. Maybe we shouldn't create all of those users.
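Assuming the slowness really is the bulk user creation, one cheap win is building those shared users once and reusing them across tests. A pytest-independent sketch of the caching idea (with pytest this would be a session-scoped fixture instead):

```python
import functools


@functools.lru_cache(maxsize=None)
def expensive_shared_users():
    # Stand-in for the slow user creation currently done in conftest;
    # cached so repeated calls (i.e. repeated tests) pay the cost once.
    return tuple("user-%d" % i for i in range(10))
```

With pytest, the equivalent is decorating the fixture with `@pytest.fixture(scope="session")` so the user setup runs once per test session rather than per test.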

Pre-prod (dev/test) environment

There's going to be a need for a dev/test environment to (user) test new features in a prod-like environment.

GitOps wise, I'd suggest having the main branch syncing to the dev/test environment and a separate prod branch for the prod state. @AlexCLeduc any considerations from your perspective?

The dev/test environment would ideally use fake data, re-seeded periodically (on every deploy maybe). That'd put the onus on the devs to maintain seeding scripts or fake data factories, so that's also @AlexCLeduc's call. Could also just leave the manual management of the dev/test environment's database to the devs.

On the DevOps side, we'll need to figure out how best to enable this. There's a lot about the k8s side of this that isn't obvious to me, so if you have a good idea, chime in @simardeep1792 @vedantthapa!

Age group records shouldn't be deleted

Currently we rely on the formset logic to delete age groups that get checked as "deleted". Right now it actually deletes records and all of their versions. We need to override this behaviour, and probably modify the data-model, to keep the versions around.
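A sketch of one way to do the override: Django's `BaseModelFormSet.delete_existing` hook is the natural seam for swapping hard deletes for a soft-delete flag. This assumes a new `deleted` field on the model, and is shown here against a stand-in object rather than the real formset:

```python
class SoftDeleteFormSetMixin:
    # Mix into the age-group formset (e.g. alongside BaseInlineFormSet) so
    # "deleted" rows are flagged rather than removed, keeping versions intact.
    def delete_existing(self, obj, commit=True):
        obj.deleted = True  # assumes a new `deleted` field on the model
        if commit:
            obj.save()


class _StubRecord:
    # Minimal stand-in for an age-group record, for illustration only.
    deleted = False
    saved = False

    def save(self):
        self.saved = True


record = _StubRecord()
SoftDeleteFormSetMixin().delete_existing(record)
```

Any queries that list age groups would then need to filter out `deleted=True` rows, which is the usual cost of the soft-delete approach.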

Scripts and fixtures as migrations

Longer term we need to be using the migration engine to push "data updates" (aka scripts and fixture updates).

The official way to do this is to use data-migrations, but those involve writing scripts in a very rigid fashion. These are only supposed to access "serializable" parts of models, not methods or utility functions that may move around.

Using official "data-migrations" is realistic for very simple operations like creating groups or small groups of lookup-records, but it is cumbersome for larger fixtures and complex scripts.

Another issue with these data-migrations is that they would also run in tests, which we certainly don't want in every case.

However, migrations are ideal because migrations and scripts often depend on one another, and migrations are already built around a dependency queue.

I'd like to propose subclassing the django migration class to create custom behaviour.

  • Our subclass would specify in which environments it should run (test, localdev, prod)
  • If it gets run in a non-applicable environment, it does nothing, so that the migration still gets recorded in django's migration table
  • If a migration class gets run in tests or localdev, its dependencies should be considered permanent parts of the repo
  • Migrations can still run arbitrary python, so we should be able to keep using fixtures, e.g. call_command("loaddata", "fixture.json")
    • Note that this can cause problems if we add columns and the migration is run before those columns are added! In this case, we could go and "invalidate" the old migration by wiping out its code and create a new migration.
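A rough sketch of the gating logic described above, in pure python (a real version would subclass django.db.migrations.Migration; the environment names and the APP_ENV variable are assumptions):

```python
import os


class EnvironmentGatedMigration:
    # Environments this migration's operations actually run in; elsewhere
    # it no-ops but still gets recorded in django's migration table.
    run_in = ("prod", "localdev", "test")

    def apply(self, env=None):
        env = env or os.environ.get("APP_ENV", "localdev")
        if env not in self.run_in:
            return "recorded-but-skipped"
        return self.run_operations()

    def run_operations(self):
        # In the real subclass this would defer to django's operation
        # runner, e.g. a RunPython calling call_command("loaddata", ...).
        return "applied"


class ProdOnlySeed(EnvironmentGatedMigration):
    # Example: a fixture load that should only touch the prod database.
    run_in = ("prod",)
```

The "recorded-but-skipped" path is what lets the migration history stay consistent across environments even when the operations themselves are environment-specific.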

Note that this still doesn't cover everything, and we'll need manual prod DB access for exceptions. For instance, creating the first user can't be done through the UI, and we probably don't want people's emails in this github repo.

Load test the k8s configuration

Doesn't need to be very fancy or too realistic, although if you pick a solution that can manage some automation then I'd recommend talking to @AlexCLeduc and hashing out some test credentials and basic workflows to properly put the app nodes through their paces.

Ideally pick a tool where we can re-run the load tests whenever we want, as we'll be needing load testing at least until the configuration is fully stable (and even then it could be useful to the devs if the app requirements include heavier routes down the line, etc).

We want to do this soon to put the experimental-ish 1-to-1 app node to django process configuration through its paces. It works when it's just me poking around the test deployment, but it's definitely suboptimal for higher traffic volumes. Adding some sort of request counting to the horizontal scaling and making sure the load balancing is smart could be necessary, but we should see how bad it is to start with before going down the rabbit hole. Compare it to nodes with more resources, each running multiple django processes via gunicorn.

Consider an even more minimal run-time container image

From here:

While the current image is already using python-slim, it'd be good to reduce the attack surface further with some kind of minimalist docker image. While Alpine linux is a popular choice, it doesn't go quite as far as the Distroless images do, and historically doesn't play nice with Python which seems to need Glibc to work properly.

Chainguard's Python image seems pretty ideal to use as a base image for our Django setup, taking the ideas of distroless even further.

We can explore/verify the contents of a docker image with dive.

Going slimmer might be marginal at this point, since we're already on python-slim, but it might be worth a look at some point. Slimmer means better security and better cold start times. The only downside is if the image ends up very different from dev or harder to debug.
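For reference, a sketch of what a Chainguard-based multi-stage build might look like (the tags, the venv layout, and the gunicorn entrypoint module are assumptions, not tested against this repo):

```dockerfile
# Build stage: the -dev variant has pip and a shell available
FROM cgr.dev/chainguard/python:latest-dev AS builder
WORKDIR /app
RUN python -m venv /venv
COPY requirements.txt .
RUN /venv/bin/pip install --no-cache-dir -r requirements.txt

# Runtime stage: minimal image, no shell or package manager
FROM cgr.dev/chainguard/python:latest
WORKDIR /app
COPY --from=builder /venv /venv
COPY . .
# "server.wsgi" is a placeholder for the project's actual wsgi module
ENTRYPOINT ["/venv/bin/gunicorn", "server.wsgi", "--bind", "0.0.0.0:8080"]
```

The lack of a shell in the runtime stage is exactly the "harder to debug" trade-off mentioned above; dive can at least confirm what did and didn't make it into each layer.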

Create a PHAC signature component

CPHO needs an agency signature component that works for responsive web applications and dark mode.
Similar to the Wordmark component from a72c45c, this component should be able to display differently based on props to allow it to be used both in CPHO and lots of other contexts. Behaviour we're looking for:

  • show just the flag
  • show flag plus one language (FR/EN)
  • show flag plus both languages (in the correct order, determined by the current language)
  • able to support monochrome usage
  • alt text that matches whatever is displayed

As always, an optimised svg, tests, and a solid accessibility story are key! Inspirational source material can be found here and in gcui.
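To make the ordering behaviour concrete, here's a hypothetical helper capturing the prop logic described above (the function name and mode values are assumptions for illustration, not an existing API):

```python
def signature_languages(current_lang, mode="both"):
    # Which language labels to render beside the flag, in display order.
    #   "flag": flag only
    #   "one":  flag plus the current language
    #   "both": flag plus both languages, current language first
    if mode == "flag":
        return []
    other = "fr" if current_lang == "en" else "en"
    if mode == "one":
        return [current_lang]
    return [current_lang, other]
```

The alt text requirement would then follow the same switch: whatever this returns is what the alt text needs to describe.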

Add PHAC signature/wordmark

See #39 which was working on this for the old React frontend. Presume that this is still needed in the new Django monolith app so recover that work.

I assume all of the Django apps around here will need this, so maybe it should be a templatetag in django-phac_aspc-helpers.

Consider adding some Metrics to our OpenTelemetry instrumentation

Use opentelemetry-exporter-gcp-monitoring for exporting metrics directly to GCP Cloud Trace. TBC, do the predefined aggregation types require a collector, or can the GCP Cloud Trace backend perform aggregation for them?

We may want a Cloud Run sidecar to run an OpenTelemetry Collector, assuming we want metrics with aggregation logic, which is the super power of OpenTelemetry metrics. See the google docs here and here.

If we don't need a collector, this is pretty simple. If/when we have metrics that need the side car, then this becomes more complicated (and bloats up our infrastructure a bit).
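If the sidecar route does turn out to be needed, a minimal collector config might look like this (a sketch; the `googlecloud` exporter ships with the opentelemetry-collector-contrib distribution, and all values are starting-point assumptions):

```yaml
receivers:
  otlp:
    protocols:
      grpc:

processors:
  batch:  # batching is where collector-side aggregation logic would hang off

exporters:
  googlecloud:

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [googlecloud]
```

The app would then export OTLP to the sidecar instead of talking to GCP directly, which is the "bloats up our infrastructure a bit" part.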

Infrastructure as Code/Data for non-k8s infrastructure

TODO:

  • determine choice of IAC/IAD tooling. KCC?
    • don't have to get it right the first time, just pick one that everyone at least agrees on in theory
    • going with config connector
  • convert the remaining infrastructure represented by deploy/gcloud_init_setup.sh
    • DNS managed zone
    • Artifact registry for app server images
    • Cloud storage for test coverage reports
    • Cloud build IAM (read/write to the test coverage bucket)
  • GitHub trigger, although that's tricky. The repository connection needs to be made first, and may not be something we can automate
    • Cloud trace (well, all we need is for the API to be enabled, but this needs to be captured for cold starts in new GCP projects)
  • Uptime monitoring. This one's already in Pulumi; we might be able to generate the k8s yaml, although note the caveats
  • try to identify and capture any other pieces of necessary infrastructure which were click-ops-ed into the current project

Add linting to the server

Adding linting to projects is a great way to catch errors and ensure the code quality stays high.
Ruff is a linter for python that is written in Rust and extremely fast.
Adding it to the server (as a pdm script initially) would be useful both during development (where tools like Ale or a VSCode extension surface linting errors as you type), and also in the CI pipeline that we'll be creating shortly.
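As a starting point, the pdm script could be as small as this (a sketch; the script name and target path are assumptions):

```toml
# pyproject.toml
[tool.pdm.scripts]
lint = "ruff check ."
```

Then `pdm run lint` works locally, and the same command can be reused verbatim in the CI step.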

Automatically run server migrations

We need to figure out a way of automatically running migrations for the server.

There is some existing code from an initial attempt in the deployment for the server. The thinking was to use an initContainer, but it wasn't working for some reason. Historically istio's sidecars can conflict with running migrations from an initContainer, but the exact cause isn't clear.

Totally open to other solutions like jobs here too.
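If the initContainer route stays blocked, a plain Job is one alternative. A sketch (the image and names are placeholders), with istio injection disabled so the sidecar can't keep the Job from completing:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: django-migrate
spec:
  template:
    metadata:
      annotations:
        # keep the istio sidecar out of the migration pod
        sidecar.istio.io/inject: "false"
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: SERVER_IMAGE  # placeholder for the app server image
          command: ["python", "manage.py", "migrate"]
```

The remaining question with Jobs is sequencing: the deployment rollout would need to wait on the Job, which is where the initContainer approach was more convenient.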

Add coverage reports for templates.

This issue came out of PR #73 (Testing as a Cloud Build step and code coverage reporting.)

Stephen made this comment "...there's a coverage plugin called django_coverage_plugin we can add to possibly get coverage reports on django templates as well. Note, the example I saw this in was using django's built in templates, while this project is using jinja for templating. Haven't checked if this specific plugin can also track coverage for jinja files, or if some jinja alternative exists."

Configure Istio HTTP health checks

As per these docs, we can configure HTTP-based health checks via Istio for readiness/liveness of our nodes. The app provides a simple health check route at /healthcheck; use that. Tune the initial delay and period; the delay will probably be relatively stable going forward, but the appropriate period may depend a lot on whether we stick with low-resource, single-django-process nodes (related: #133).
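As a starting point, a sketch of the container probes (the port and all timings are placeholders to tune; with istio sidecars, probe rewriting may also need to be enabled as per the docs above):

```yaml
readinessProbe:
  httpGet:
    path: /healthcheck
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 15
livenessProbe:
  httpGet:
    path: /healthcheck
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 30
```

Keeping the liveness period longer than readiness is the usual conservative default, since failed liveness checks restart the pod rather than just pulling it out of rotation.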

Configure k8s postgres to autoscale storage capacity

Pgaudit logs are stored to the DB, so from those alone we can expect it to fill up gradually.

Failing a good option for configuring automatic storage scaling, we should look into setting up warnings for when disk space starts running out.

Production DB management mechanism & playbook

We need to keep the production database fenced off and assure its security and integrity, but we'll also need a mechanism/playbook for executing data management scripts and (non-trivial) migrations against it. Acceptance criteria will mean something that can pass a protected-B level security assessment while also being sufficiently flexible and confidence-inspiring for devs.

The GitOps + zero trust ideal would be to carefully plan out django migrations, and to use data migrations instead of scripts, so that they can be run hands-off in the cloud as part of the project's CD infrastructure. This no-one-touches-the-database approach theoretically breezes past a prot-B assessment (as much as anything can breeze past the gauntlet).

Fully hands-off DB management is less flexible and seems to not inspire developer confidence. To my understanding, lack of confidence in the approach is in large part because no hands-on-keyboard means no ability to steer the ship when things go sideways.

Personally, I'd like to try and work out the hands-free approach, but I also think some sort of break-glass direct access system for emergencies is sensible. Of course, that means doing all the work and maintenance for both (although it'd be easier to get break-glass console access past security than saying it will be the default management method).

Alpha domain name and DNS infrastructure

We probably want a new domain to reflect the HoPiC brand, rather than the existing CPHO name. This will just be for the alpha deployments, but we should start answering some related bilingualism questions now.

Do we want one bilingual acronym-based name, or one plain-language domain name per official language? The former is the standard, but isn't great, while the latter will require slightly more bookkeeping and might put us at odds with policy enforcers. It'll be the business owner's call ultimately, although we could do either or both while still in the dev/alpha stage.

As part of this issue, update the relevant steps in gcloud_init_setup.sh, the ALLOWED_HOSTS configuration in the GCP Secret Manager prod env vars, etc.

The domain name will be provisioned via https://github.com/PHACDataHub/dns

Consider CDN for static content

Something to look into down the road. Whitenoise is nice for simplicity and consistency between dev and prod, but having all static content requests hit Cloud Run isn't great. Cloud Armor should help block DDoS once we have that in place, so that part of the Whitenoise trade-off isn't a big concern. Still, a CDN would probably be a nice-to-have for performance, if the cache busting isn't too annoying to wrangle.

breadcrumb trail and back buttons

indicators > (indicator name) > period > stratifier > (edit/review)

Also add back buttons, but back buttons shouldn't be too redundant with the breadcrumb trail. Maybe just on form pages.

Upload coverage reports & make them more visible.

This issue comes out of PR #73 Testing as a Cloud Build step and code coverage reporting.

The test coverage report is currently printed deep in the cloud build logs. Let's pull this out and ideally have it printed back to GitHub, but saved to Google Cloud Storage is a good first step.

Link to comment.

Consider incorporating Identity Aware Proxy down the line

Since the (non-API) portions of this project are intended to be accessible only by internal users, and the application may eventually integrate with AD for SSO down the line, it will likely make sense to go one step further and slap an IAP along the network path for incoming requests.

This video might be a good starting point.

This is low-ish priority for now, might be worth waiting to see if the AD SSO pans out first.

Streamline project specific configurations as Kustomize patch

There are some "project specific" components that can be consolidated into a kustomize patch to house all configurations in a single file. This would make deployments easier for other gcp environments / projects.

For example, the project field in the cert-manager/issuers.yaml could be a kustomization patch.

cloudDNS:
  # The ID of the GCP project
  project: phx-01h4rr1468rj3v5k60b1vserd3
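A sketch of what the kustomization patch might look like (the issuer name, patch path, and project ID are placeholders to verify against cert-manager/issuers.yaml):

```yaml
# kustomization.yaml
patches:
  - target:
      kind: ClusterIssuer
      name: letsencrypt  # placeholder issuer name
    patch: |-
      - op: replace
        path: /spec/acme/solvers/0/dns01/cloudDNS/project
        value: YOUR_GCP_PROJECT_ID
```

With all such project-specific values gathered into one kustomization file, standing up a new gcp environment becomes a matter of editing that single file.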

Review the k8s logs

There are a lot of logs coming out of the k8s cluster.

  • some might indicate missing or bad configuration. Prioritize fixing those
    • there are info level logs that also seem to imply configuration issues (screenshot of an example log omitted)
  • others might just be pure noise with no applicable "fix". If possible, do something to silence those

System check warning re AxesModelBackend was renamed

Getting this warning: ?: (axes.W003) You do not have 'axes.backends.AxesStandaloneBackend' or a subclass in your settings.AUTHENTICATION_BACKENDS. HINT: AxesModelBackend was renamed to AxesStandaloneBackend in django-axes version 5.0. It appears when running each of:

python ./manage.py loaddata cpho/fixtures/periods.yaml
python ./manage.py loaddata cpho/fixtures/dimension_lookups.yaml
python ./manage.py runscript cpho.scripts.dev

multiple measures per indicator

Often an indicator has multiple things being measured, e.g. a relative measure and an absolute one. This sounds tricky; we may have to add a new model.

A cheap alternative is to force people to create one indicator per measure. If the only issue with this is cosmetic (e.g. indicator names) then we can create a very lightweight "indicator grouping" as a parent to indicator and rebrand indicators as "measures".

Add step to enable Postgres audit logs in gcloud_init_setup.sh

As described here. Setting/updating database flags is easy, but you also need to run CREATE EXTENSION pgaudit; in the database. So we're back to needing a good script-able way to connect to the prod DB or a good strategy for DB script Cloud Run jobs.

Could be Cloud SQL Auth Proxy, but that's an additional on-machine dependency and might be difficult to integrate smoothly in the script. It also can't currently connect to the DB; either the dev machine needs to be able to connect to the VPC to use the private IP, or we need to temporarily enable a public IP before connecting via Cloud SQL Proxy. Hm. See #64.

Much less hacky would be to make a one-time DB initialization Cloud Job that we kick off from the init script, although I know there's still a desire to have a way to get a hands-on DB shell, at least during this pre-prod stage.

... see also these limitations and warnings. Could need some caution and careful configuration (especially if we want to use this on a busier application in the future).
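For reference, the two halves of the setup described above might look something like this (a sketch; the instance name is a placeholder, and the flags should be checked against the Cloud SQL pgaudit docs):

```shell
# 1. Set the database flags on the instance
gcloud sql instances patch INSTANCE_NAME \
  --database-flags=cloudsql.enable_pgaudit=on,pgaudit.log=all

# 2. Then, from a psql session connected to the database:
#      CREATE EXTENSION pgaudit;
```

Step 2 is exactly the part that needs the script-able DB connection or Cloud Run job discussed above.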

Approvals and submission flow - data-model and UI

Only records that have been approved by both the program and the HSO can be published via the API. Records can still be modified after approval, but we don't want to publish edits that haven't been approved.

How do we do this? By only serving approved versions in the API.

We can add is_program_approved and is_hso_approved boolean fields on the indicator-datum VERSION model. There's already an approved flag on the abstract ApprovableCustomVersionModelWithEditor, we just need to split up that field into 2.

Also, for metadata purposes, we can have a Submission model that just notes who approved something and when. This submission model will also contain a type=HSO|Program choice field.

For the user interface, one idea is to have a POST endpoint (triggered by a submit button) that scopes an (indicator, period) pair. This endpoint triggers a services.py function that iterates over all the latest versions of indicator-data within that scope and sets is_program_approved=True. Similarly, another endpoint/service/button for the HSO approval. These services will also create a Submission record.

Note that if someone has already submitted, but wants to make a correction to a single record and clicks "re-submit", the service will iterate over versions it has previously approved. This isn't an issue or anything, just thought I'd point it out.
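A rough sketch of the service shape, using plain python with stand-in version objects (real code would query the latest indicator-datum versions for the (indicator, period) scope and save through the ORM):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class _Version:
    # Stand-in for an indicator-datum version row.
    is_program_approved: bool = False
    is_hso_approved: bool = False


@dataclass
class Submission:
    # Metadata record: who approved, as which role, and when.
    submission_type: str  # "program" or "hso"
    submitted_by: str
    submitted_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


def approve_for_scope(latest_versions, submission_type, user):
    # Flag every latest version in the (indicator, period) scope,
    # then record who/when in a Submission.
    flag = f"is_{submission_type}_approved"
    for version in latest_versions:
        setattr(version, flag, True)
    return Submission(submission_type=submission_type, submitted_by=user)


scope_versions = [_Version(), _Version()]
submission = approve_for_scope(scope_versions, "program", "example-user")
```

The re-submit case falls out naturally: setting an already-True flag back to True is a harmless no-op, matching the note above.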

Add linting and formatting steps to Cloud Build CI/CD

Maybe straightforward, unless there are a lot of outstanding lint/format errors to resolve.

Should run black, isort, and djlint. Will need a container with requirements_formatting.txt installed, could be done with the test image or even during the tests step.

Can be done right now, but will have interplay with #78.

Infrastructure as Code/Data

The current run-once gcloud_init_setup.sh script is better than undocumented click-ops, but it's not ideal (and won't be too useful when it's time to update the infrastructure).

@tcaky will be working on a generic Kubernetes Config Connector IaC for a more framework-agnostic Cloud Run + Cloud SQL infrastructure. We'll likely wait for that as our starting point, but this task for this issue is to help Keith out with that.

Alternatively, we could take time now to experiment with some other IaC/D solutions, but that's not a priority for now.
